java.lang.Object
  cascading.scheme.Scheme
    cascading.scheme.TextLine

public class TextLine extends Scheme
A TextLine is a type of Scheme for plain text files. Files are broken into lines; either a line feed or a carriage return signals the end of a line. By default, each line is returned as a Tuple with two fields, "offset" and "line".
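
To make the "offset" and "line" pairing concrete, here is a minimal plain-Java sketch of splitting a byte buffer into lines while tracking the byte offset at which each line starts. It does not use Cascading or Hadoop; the class `OffsetLineSketch`, the holder `OffsetLine`, and the method `split` are hypothetical names introduced for illustration, and treating a CR followed by LF as one terminator is an assumption of the sketch rather than something stated above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; none of these names come from Cascading.
class OffsetLineSketch {

    /** Hypothetical stand-in for a two-field tuple ("offset", "line"). */
    static final class OffsetLine {
        final long offset;
        final String line;

        OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits raw bytes into lines, recording the byte offset where each line starts. */
    static List<OffsetLine> split(byte[] data) {
        List<OffsetLine> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(new OffsetLine(start, decode(data, start, i)));
                // Assumption: a CR immediately followed by LF counts as one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        // A final line without a terminator is still emitted.
        if (start < data.length) {
            out.add(new OffsetLine(start, decode(data, start, data.length)));
        }
        return out;
    }

    private static String decode(byte[] data, int from, int to) {
        return new String(data, from, to - from, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\r\ngamma".getBytes(StandardCharsets.UTF_8);
        for (OffsetLine ol : split(data)) {
            System.out.println(ol.offset + "\t" + ol.line);
        }
    }
}
```

In a real job the splitting is done by the underlying Hadoop input format, so this only models the shape of the values that arrive in the two fields.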
Many of the constructors take both "sourceFields" and "sinkFields". sourceFields denotes the field names to be used instead of the names "offset" and "line". sinkFields is a selector and defaults to Fields.ALL. Any available field names can be given if only a subset of the incoming fields should be used.
If a Fields instance with only one field is passed to the constructor as sourceFields, the returned tuples will contain only the "line" value, under the given field name.
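
The effect of the number of source fields can be sketched in plain Java as follows. This is an illustration of the rule described above, not Cascading code: `SourceFieldsSketch` and `toTuple` are hypothetical names, a `Map` stands in for a Tuple with named fields, and rejecting any count other than one or two is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; a Map stands in for a Tuple with named fields.
class SourceFieldsSketch {

    /** Maps one source record onto the declared source field names. */
    static Map<String, Object> toTuple(String[] sourceFields, long offset, String line) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        if (sourceFields.length == 1) {
            // One field: only the line value is kept, under the given name.
            tuple.put(sourceFields[0], line);
        } else if (sourceFields.length == 2) {
            // Two fields: the names replace "offset" and "line", in that order.
            tuple.put(sourceFields[0], offset);
            tuple.put(sourceFields[1], line);
        } else {
            // Assumption of this sketch: other field counts are rejected.
            throw new IllegalArgumentException(
                "sketch supports one or two source fields, given " + sourceFields.length);
        }
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(toTuple(new String[] {"text"}, 42L, "hello"));
        System.out.println(toTuple(new String[] {"pos", "text"}, 42L, "hello"));
    }
}
```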
Note that TextLine will concatenate all the Tuple values for the selected fields with a TAB delimiter before
writing out the line.
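
As a rough model of the sink side, the following plain-Java sketch selects a subset of fields from a record and joins their values with a TAB. The names `TabJoinSketch` and `toLine` are hypothetical, a `Map` stands in for the incoming Tuple, and rendering each value with `String.valueOf` is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; a Map stands in for the incoming Tuple.
class TabJoinSketch {

    /** Joins the values of the selected fields, in selector order, with a TAB. */
    static String toLine(Map<String, Object> tuple, List<String> sinkFields) {
        List<String> values = new ArrayList<>();
        for (String field : sinkFields) {
            // Assumption: each value is rendered with String.valueOf.
            values.add(String.valueOf(tuple.get(field)));
        }
        return String.join("\t", values);
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 7);
        tuple.put("name", "ann");
        tuple.put("city", "oslo");
        // Selecting a subset of the incoming fields.
        System.out.println(toLine(tuple, Arrays.asList("id", "city")));
        // Selecting every field, loosely analogous to the default selector.
        System.out.println(toLine(tuple, new ArrayList<>(tuple.keySet())));
    }
}
```

Because the delimiter is fixed, a value that itself contains a TAB would be indistinguishable from a field boundary when the line is read back.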
Note that sink compression is TextLine.Compress.DISABLE by default. If null is passed to the constructor for the compression value, it will remain disabled.
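
The null handling described above can be modeled with a small plain-Java sketch. It is not the TextLine source: `CompressionDefaultSketch` is a hypothetical class, and its nested `Compress` enum is a stand-in in which only DISABLE is taken from the text, while ENABLE is assumed for illustration.

```java
// Plain-Java illustration only; this is not the TextLine implementation.
class CompressionDefaultSketch {

    /** Hypothetical stand-in for TextLine.Compress; ENABLE is assumed here. */
    enum Compress { ENABLE, DISABLE }

    // Compression starts out disabled.
    private Compress sinkCompression = Compress.DISABLE;

    CompressionDefaultSketch(Compress sinkCompression) {
        // A null argument is ignored, so the disabled default is kept.
        if (sinkCompression != null) {
            this.sinkCompression = sinkCompression;
        }
    }

    Compress getSinkCompression() {
        return sinkCompression;
    }

    public static void main(String[] args) {
        System.out.println(new CompressionDefaultSketch(null).getSinkCompression());
        System.out.println(new CompressionDefaultSketch(Compress.ENABLE).getSinkCompression());
    }
}
```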
If all the input files end with ".zip", the ZipInputFormat will be used. This is not bi-directional, so zip files cannot be written.
Nested Class Summary | |
---|---|
static class | TextLine.Compress |
Field Summary | |
---|---|
static Fields | DEFAULT_SOURCE_FIELDS: Field DEFAULT_SOURCE_FIELDS |
Constructor Summary | |
---|---|
TextLine() | Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. |
TextLine(Fields sourceFields) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields, int numSinkParts) | Creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression) | Constructor TextLine creates a new TextLine instance. |
TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression, int numSinkParts) | Constructor TextLine creates a new TextLine instance. |
TextLine(Fields sourceFields, int numSinkParts) | Creates a new TextLine instance. |
TextLine(int numSinkParts) | Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. |
TextLine(TextLine.Compress sinkCompression) | Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. |
Method Summary | |
---|---|
TextLine.Compress | getSinkCompression(): Method getSinkCompression returns the sinkCompression of this TextLine object. |
void | setSinkCompression(TextLine.Compress sinkCompression): Method setSinkCompression sets the sinkCompression of this TextLine object. |
void | sink(TupleEntry tupleEntry, OutputCollector outputCollector): Method sink writes out the given Tuple instance to the outputCollector. |
void | sinkInit(Tap tap, JobConf conf): Method sinkInit initializes this instance as a sink. |
Tuple | source(Object key, Object value): Method source takes the given Hadoop key and value and returns a new Tuple instance. |
void | sourceInit(Tap tap, JobConf conf): Method sourceInit initializes this instance as a source. |
Methods inherited from class cascading.scheme.Scheme |
---|
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, isWriteDirect, setNumSinkParts, setSinkFields, setSourceFields, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final Fields DEFAULT_SOURCE_FIELDS
Constructor Detail |
---|
public TextLine()

@ConstructorProperties(value="numSinkParts")
public TextLine(int numSinkParts)
Parameters:
numSinkParts - of type int

@ConstructorProperties(value="sinkCompression")
public TextLine(TextLine.Compress sinkCompression)
Parameters:
sinkCompression - of type Compress

@ConstructorProperties(value={"sourceFields","sinkFields"})
public TextLine(Fields sourceFields, Fields sinkFields)
Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme

@ConstructorProperties(value={"sourceFields","sinkFields","numSinkParts"})
public TextLine(Fields sourceFields, Fields sinkFields, int numSinkParts)
Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
numSinkParts - of type int

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression"})
public TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression)
Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
sinkCompression - of type Compress

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression","numSinkParts"})
public TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression, int numSinkParts)
Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
sinkCompression - of type Compress
numSinkParts - of type int

@ConstructorProperties(value="sourceFields")
public TextLine(Fields sourceFields)
Parameters:
sourceFields - the source fields for this scheme

@ConstructorProperties(value={"sourceFields","numSinkParts"})
public TextLine(Fields sourceFields, int numSinkParts)
Parameters:
sourceFields - the source fields for this scheme
numSinkParts - of type int

Method Detail |
---|
public TextLine.Compress getSinkCompression()

public void setSinkCompression(TextLine.Compress sinkCompression)
Parameters:
sinkCompression - the sinkCompression of this TextLine object.

public void sourceInit(Tap tap, JobConf conf)
Description copied from class: Scheme
Method sourceInit initializes this instance as a source.
sourceInit in class Scheme
Parameters:
tap - of type Tap
conf - of type JobConf

public void sinkInit(Tap tap, JobConf conf) throws IOException
Description copied from class: Scheme
Method sinkInit initializes this instance as a sink.
sinkInit in class Scheme
Parameters:
tap - of type Tap
conf - of type JobConf
Throws:
IOException - on initialization failure

public Tuple source(Object key, Object value)
Description copied from class: Scheme
Method source takes the given Hadoop key and value and returns a new Tuple instance.
source in class Scheme
Parameters:
key - of type WritableComparable
value - of type Writable

public void sink(TupleEntry tupleEntry, OutputCollector outputCollector) throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple instance to the outputCollector.
sink in class Scheme
Parameters:
outputCollector - of type OutputCollector
Throws:
IOException