cascading.tap.hadoop
Class ZipInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<LongWritable,Text>
cascading.tap.hadoop.ZipInputFormat
- All Implemented Interfaces:
- InputFormat<LongWritable,Text>, JobConfigurable
public class ZipInputFormat
- extends FileInputFormat<LongWritable,Text>
- implements JobConfigurable
Class ZipInputFormat is an InputFormat for zip files. Each file within a zip
file is broken into lines. Either a line feed or a carriage return signals the
end of a line. Keys are the position in the file, and values are the line of text.
If the underlying FileSystem is HDFS or FILE, each ZipEntry is returned as a
unique split. Otherwise this input format returns false for isSplitable and
subsequently iterates over each ZipEntry, treating all internal files as the 'same' file.
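The record layout described above can be illustrated with a minimal sketch that uses only java.util.zip. It is not the Cascading implementation. The class ZipLineSketch, its Record type, and the buildZip helper are hypothetical names introduced for illustration. The sketch also assumes that a CRLF pair counts as a single terminator and that, when entries are treated as the 'same' file, the position keeps counting across entries. Neither detail is stated in this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: reads every entry of a zip as (position, line) records. */
public class ZipLineSketch {

    /** One record: offset of the line start (key) and the line text (value). */
    public static final class Record {
        public final long position;
        public final String line;

        private Record(long position, String line) {
            this.position = position;
            this.line = line;
        }
    }

    /** Iterates over each ZipEntry and splits its uncompressed bytes into lines. */
    public static List<Record> readAll(byte[] zipBytes) {
        List<Record> records = new ArrayList<>();
        long position = 0; // assumption: offset runs on across entries
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no lines
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                long lineStart = position;
                boolean previousWasCR = false;
                int b;
                while ((b = zip.read()) != -1) {
                    position++;
                    if (b == '\n' && previousWasCR) {
                        // Second half of a CRLF pair: the line was already emitted.
                        previousWasCR = false;
                        lineStart = position;
                        continue;
                    }
                    previousWasCR = (b == '\r');
                    if (b == '\n' || b == '\r') {
                        records.add(new Record(lineStart, text(buffer)));
                        buffer.reset();
                        lineStart = position;
                    } else {
                        buffer.write(b);
                    }
                }
                if (buffer.size() > 0) {
                    // Last line of the entry had no terminator.
                    records.add(new Record(lineStart, text(buffer)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    private static String text(ByteArrayOutputStream buffer) {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Builds an in-memory zip from (entry name, content) pairs, for experiments. */
    public static byte[] buildZip(String... namesAndContents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                zip.putNextEntry(new ZipEntry(namesAndContents[i]));
                zip.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Calling ZipLineSketch.readAll(ZipLineSketch.buildZip("a.txt", "one\ntwo\r\n", "b.txt", "three")) yields one record per line, each keyed by the offset at which the line starts.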
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ZipInputFormat
public ZipInputFormat()
configure
public void configure(JobConf conf)
- Specified by:
configure
in interface JobConfigurable
isSplitable
protected boolean isSplitable(FileSystem fs,
Path file)
- Return true only if the file is in ZIP format.
- Overrides:
isSplitable
in class FileInputFormat<LongWritable,Text>
- Parameters:
fs
- the file system that the file is on
file
- the path that represents this file
- Returns:
- true if this file is splitable
listPathsInternal
protected Path[] listPathsInternal(JobConf jobConf)
throws IOException
- Throws:
IOException
listStatus
protected FileStatus[] listStatus(JobConf jobConf)
throws IOException
- Overrides:
listStatus
in class FileInputFormat<LongWritable,Text>
- Throws:
IOException
getSplits
public InputSplit[] getSplits(JobConf job,
int numSplits)
throws IOException
- Splits files returned by listPathsInternal(JobConf). Each file is
expected to be in zip format and each split corresponds to a ZipEntry.
- Specified by:
getSplits
in interface InputFormat<LongWritable,Text>
- Overrides:
getSplits
in class FileInputFormat<LongWritable,Text>
- Parameters:
job
- the JobConf data structure, see JobConf
numSplits
- the number of splits required. Ignored here
- Throws:
IOException
- if input files are not in zip format
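The one-split-per-ZipEntry idea can be sketched with the JDK alone. This is a hedged illustration, not the code behind getSplits: ZipSplitSketch, EntrySplit, splitsFor, and buildZip are hypothetical names, and the sketch works on an in-memory byte array rather than on a Hadoop FileSystem path. Skipping directory entries is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: derives one split descriptor per ZipEntry. */
public class ZipSplitSketch {

    /** Identifies a single entry inside a single archive. */
    public static final class EntrySplit {
        public final String archive;
        public final String entryName;

        private EntrySplit(String archive, String entryName) {
            this.archive = archive;
            this.entryName = entryName;
        }
    }

    /** Walks the archive once and emits a split for every non-directory entry. */
    public static List<EntrySplit> splitsFor(String archive, byte[] zipBytes) {
        List<EntrySplit> splits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) { // assumption: directories produce no split
                    splits.add(new EntrySplit(archive, entry.getName()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return splits;
    }

    /** Builds an in-memory zip from (entry name, content) pairs, for experiments. */
    public static byte[] buildZip(String... namesAndContents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                zip.putNextEntry(new ZipEntry(namesAndContents[i]));
                zip.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

In this sketch the number of splits depends only on the number of entries in the archive, which is consistent with numSplits being ignored.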
getRecordReader
public RecordReader<LongWritable,Text> getRecordReader(InputSplit genericSplit,
JobConf job,
Reporter reporter)
throws IOException
- Specified by:
getRecordReader
in interface InputFormat<LongWritable,Text>
- Specified by:
getRecordReader
in class FileInputFormat<LongWritable,Text>
- Throws:
IOException
isAllowSplits
protected boolean isAllowSplits(FileSystem fs)
Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.