java.lang.Object
  cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
    cascading.scheme.hadoop.TextLine
public class TextLine
A TextLine is a type of Scheme for plain text files. Files are broken into
lines. Either a line-feed or a carriage-return signals the end of a line.
By default, this scheme returns a Tuple with two fields, "offset" and "line".
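To make the ("offset", "line") pair concrete, the following sketch splits a byte buffer into lines and records the byte offset at which each line starts. It is plain Java and does not use Cascading or Hadoop. The names OffsetLineSketch, OffsetLine, and split are hypothetical, and treating "\r\n" as a single terminator is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, library-free model of how a text file maps to ("offset", "line") pairs.
// This is not the Cascading implementation.
class OffsetLineSketch {
    // One record: byte offset of the line start, and the line without its terminator.
    static final class OffsetLine {
        final long offset;
        final String line;

        OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    static List<OffsetLine> split(byte[] data) {
        List<OffsetLine> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                out.add(new OffsetLine(start, new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Assumption: "\r\n" counts as a single terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        // A final line without a terminator is still emitted.
        if (start < data.length) {
            out.add(new OffsetLine(start, new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "first\nsecond\r\nthird".getBytes(StandardCharsets.UTF_8);
        for (OffsetLine r : split(data)) {
            System.out.println(r.offset + "\t" + r.line); // offsets 0, 6, 14
        }
    }
}
```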
Many of the constructors take both "sourceFields" and "sinkFields". sourceFields names the fields
to be used instead of the default "offset" and "line". sinkFields is a selector and is Fields.ALL by default.
Any available field names can be given if only a subset of the incoming fields should be used.
If the Fields instance passed to the constructor as sourceFields has only one field, the returned tuples
will contain only the "line" value, under the given field name.
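The effect of passing one versus two source field names can be sketched without the library. In the model below a Map stands in for a Tuple with named fields. SourceFieldsSketch and toTuple are hypothetical names, and rejecting anything other than one or two names is an assumption about what verify(Fields) enforces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of source field naming; a Map stands in for a named Tuple.
class SourceFieldsSketch {
    static Map<String, Object> toTuple(String[] sourceFields, long offset, String line) {
        // Assumption: only one or two source field names are meaningful.
        if (sourceFields.length < 1 || sourceFields.length > 2) {
            throw new IllegalArgumentException("expected one or two source field names");
        }
        Map<String, Object> tuple = new LinkedHashMap<>();
        if (sourceFields.length == 2) {
            tuple.put(sourceFields[0], offset); // first name replaces "offset"
        }
        // The last (or only) name always carries the line text.
        tuple.put(sourceFields[sourceFields.length - 1], line);
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(toTuple(new String[]{"offset", "line"}, 0L, "hello")); // {offset=0, line=hello}
        System.out.println(toTuple(new String[]{"text"}, 0L, "hello"));           // {text=hello}
    }
}
```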
Note that TextLine will concatenate all the Tuple values for the selected fields with a TAB delimiter before
writing out the line.
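As a rough illustration of the TAB concatenation described above, the sketch below joins the values of one tuple into a single output line. It is plain Java, not the Cascading code path; TabJoinSketch and toLine are hypothetical names, and a List stands in for the selected Tuple values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the sink side: selected values are joined with TAB characters.
class TabJoinSketch {
    static String toLine(List<?> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append('\t'); // delimiter goes between values, not after the last one
            }
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the three values separated by TAB characters.
        System.out.println(toLine(Arrays.asList("alice", 42, 3.5)));
    }
}
```

One consequence worth keeping in mind: a value that itself contains a TAB cannot be told apart from a field boundary when the line is read back.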
Note sink compression is TextLine.Compress.DISABLE by default. If null is passed to the constructor
for the compression value, it will remain disabled.
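The null handling described above amounts to keeping the default unless a non-null value is supplied. A minimal sketch of that pattern follows. CompressDefaultSketch is a hypothetical stand-in for TextLine, and its nested enum only mirrors the idea of TextLine.Compress: DISABLE is the value named in the text, while ENABLE is included as an assumed counterpart.

```java
// Hypothetical model of the "null keeps compression disabled" rule.
class CompressDefaultSketch {
    // Stand-in for TextLine.Compress; ENABLE is an assumption, DISABLE comes from the text.
    enum Compress { ENABLE, DISABLE }

    private Compress sinkCompression = Compress.DISABLE; // default

    CompressDefaultSketch(Compress sinkCompression) {
        // A null argument leaves the default untouched.
        if (sinkCompression != null) {
            this.sinkCompression = sinkCompression;
        }
    }

    Compress getSinkCompression() {
        return sinkCompression;
    }

    public static void main(String[] args) {
        System.out.println(new CompressDefaultSketch(null).getSinkCompression());            // DISABLE
        System.out.println(new CompressDefaultSketch(Compress.ENABLE).getSinkCompression()); // ENABLE
    }
}
```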
If any of the input files end with ".zip", an error will be thrown.
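A caller who wants to fail early, before a flow is planned, could apply the same ".zip" rule to the input paths up front. The sketch below is a hypothetical pre-check in plain Java. ZipCheckSketch and rejectZipInputs are made-up names, and the choice of IllegalStateException is an assumption, since the text above does not name the error type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check that mirrors the ".zip" restriction on input files.
class ZipCheckSketch {
    static void rejectZipInputs(List<String> inputPaths) {
        for (String path : inputPaths) {
            if (path.endsWith(".zip")) {
                // Assumption: the exception type is illustrative only.
                throw new IllegalStateException("cannot read zip files: " + path);
            }
        }
    }

    public static void main(String[] args) {
        rejectZipInputs(Arrays.asList("logs/part-00000", "logs/part-00001.gz"));
        System.out.println("inputs accepted");
    }
}
```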
| Nested Class Summary | |
|---|---|
| static class | TextLine.Compress |
| Field Summary | | |
|---|---|---|
| static Fields | DEFAULT_SOURCE_FIELDS | The default source fields, "offset" and "line". |
| Constructor Summary | |
|---|---|
| TextLine() | Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. |
| TextLine(Fields sourceFields) | Creates a new TextLine instance. |
| TextLine(Fields sourceFields, Fields sinkFields) | Creates a new TextLine instance. |
| TextLine(Fields sourceFields, Fields sinkFields, int numSinkParts) | Creates a new TextLine instance. |
| TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression) | Creates a new TextLine instance. |
| TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression, int numSinkParts) | Creates a new TextLine instance. |
| TextLine(Fields sourceFields, int numSinkParts) | Creates a new TextLine instance. |
| TextLine(int numSinkParts) | Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. |
| TextLine(TextLine.Compress sinkCompression) | Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. |
| Method Summary | | |
|---|---|---|
| TextLine.Compress | getSinkCompression() | Method getSinkCompression returns the sinkCompression of this TextLine object. |
| void | presentSinkFields(FlowProcess<JobConf> flowProcess, Tap tap, Fields fields) | Method presentSinkFields is called after the planner is invoked and all fields are resolved. |
| void | presentSourceFields(FlowProcess<JobConf> flowProcess, Tap tap, Fields fields) | Method presentSourceFields is called after the planner is invoked and all fields are resolved. |
| void | setSinkCompression(TextLine.Compress sinkCompression) | Method setSinkCompression sets the sinkCompression of this TextLine object. |
| void | sink(FlowProcess<JobConf> flowProcess, SinkCall<Object[],OutputCollector> sinkCall) | Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput(). |
| void | sinkConfInit(FlowProcess<JobConf> flowProcess, Tap<JobConf,RecordReader,OutputCollector> tap, JobConf conf) | Method sinkConfInit initializes this instance as a sink. |
| boolean | source(FlowProcess<JobConf> flowProcess, SourceCall<Object[],RecordReader> sourceCall) | Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry(), returning true on success or false if no more values are available. |
| void | sourceCleanup(FlowProcess<JobConf> flowProcess, SourceCall<Object[],RecordReader> sourceCall) | Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall). |
| void | sourceConfInit(FlowProcess<JobConf> flowProcess, Tap<JobConf,RecordReader,OutputCollector> tap, JobConf conf) | Method sourceConfInit initializes this instance as a source. |
| protected void | sourceHandleInput(SourceCall<Object[],RecordReader> sourceCall) | |
| void | sourcePrepare(FlowProcess<JobConf> flowProcess, SourceCall<Object[],RecordReader> sourceCall) | Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall). |
| protected void | verify(Fields sourceFields) | |
| Methods inherited from class cascading.scheme.Scheme |
|---|
| equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, sinkCleanup, sinkPrepare, toString |
| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final Fields DEFAULT_SOURCE_FIELDS
| Constructor Detail |
|---|
public TextLine()
@ConstructorProperties(value="numSinkParts")
public TextLine(int numSinkParts)
Parameters:
numSinkParts - of type int
@ConstructorProperties(value="sinkCompression")
public TextLine(TextLine.Compress sinkCompression)
Parameters:
sinkCompression - of type Compress
@ConstructorProperties(value={"sourceFields","sinkFields"})
public TextLine(Fields sourceFields,
Fields sinkFields)
Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
@ConstructorProperties(value={"sourceFields","sinkFields","numSinkParts"})
public TextLine(Fields sourceFields,
Fields sinkFields,
int numSinkParts)
Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
numSinkParts - of type int
@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression"})
public TextLine(Fields sourceFields,
Fields sinkFields,
TextLine.Compress sinkCompression)
Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
sinkCompression - of type Compress
@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression","numSinkParts"})
public TextLine(Fields sourceFields,
Fields sinkFields,
TextLine.Compress sinkCompression,
int numSinkParts)
Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
sinkCompression - of type Compress
numSinkParts - of type int
@ConstructorProperties(value="sourceFields")
public TextLine(Fields sourceFields)
Parameters:
sourceFields - the source fields for this scheme
@ConstructorProperties(value={"sourceFields","numSinkParts"})
public TextLine(Fields sourceFields,
int numSinkParts)
Parameters:
sourceFields - the source fields for this scheme
numSinkParts - of type int
| Method Detail |
|---|
protected void verify(Fields sourceFields)
public TextLine.Compress getSinkCompression()
public void setSinkCompression(TextLine.Compress sinkCompression)
Parameters:
sinkCompression - the sinkCompression of this TextLine object.
public void sourceConfInit(FlowProcess<JobConf> flowProcess,
Tap<JobConf,RecordReader,OutputCollector> tap,
JobConf conf)
Description copied from class: Scheme
Method sourceConfInit initializes this instance as a source.
See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized
before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be
destroyed after use.
sourceConfInit in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
tap - of type Tap
conf - of type JobConf
public void presentSourceFields(FlowProcess<JobConf> flowProcess,
Tap tap,
Fields fields)
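Description copied from class: Scheme
Method presentSourceFields is called after the planner is invoked and all fields are resolved.
See Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSourceFields in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields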
public void presentSinkFields(FlowProcess<JobConf> flowProcess,
Tap tap,
Fields fields)
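Description copied from class: Scheme
Method presentSinkFields is called after the planner is invoked and all fields are resolved.
See Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSinkFields in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields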
public void sinkConfInit(FlowProcess<JobConf> flowProcess,
Tap<JobConf,RecordReader,OutputCollector> tap,
JobConf conf)
Description copied from class: Scheme
Method sinkConfInit initializes this instance as a sink.
See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized
before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be
destroyed after use.
sinkConfInit in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
tap - of type Tap
conf - of type JobConf
public void sourcePrepare(FlowProcess<JobConf> flowProcess,
SourceCall<Object[],RecordReader> sourceCall)
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of
Scheme.source(cascading.flow.FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance
will remain threadsafe.
sourcePrepare in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
public boolean source(FlowProcess<JobConf> flowProcess,
SourceCall<Object[],RecordReader> sourceCall)
throws IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput() and populate
the available Tuple via SourceCall.getIncomingEntry(), returning true
on success or false if no more values are available.
It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or
to simply re-use the existing instance.
Note this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular
instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to
any applicable failure trap Tap.
source in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Returns:
true when a Tuple was successfully read
Throws:
IOException
protected void sourceHandleInput(SourceCall<Object[],RecordReader> sourceCall)
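The true/false contract of source, together with re-using the existing entry, can be modelled without the library. In the sketch below an Iterator stands in for SourceCall.getInput() and a one-element array stands in for the reusable incoming entry. SourceContractSketch and readAll are hypothetical names, and none of this is Cascading code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, library-free model of the source(...) contract.
class SourceContractSketch {
    private final Iterator<String> input;         // stands in for SourceCall.getInput()
    private final String[] entry = new String[1]; // stands in for the reusable incoming entry

    SourceContractSketch(Iterator<String> input) {
        this.input = input;
    }

    // Returns true after filling the entry, false once the input is exhausted.
    boolean source() {
        if (!input.hasNext()) {
            return false;
        }
        entry[0] = input.next(); // overwrite in place instead of allocating a new holder
        return true;
    }

    // Driver loop: keep calling source() until it reports no more values.
    static List<String> readAll(List<String> lines) {
        SourceContractSketch s = new SourceContractSketch(lines.iterator());
        List<String> seen = new ArrayList<>();
        while (s.source()) {
            seen.add(s.entry[0]); // copy the value out before the next call overwrites it
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("a", "b", "c"))); // [a, b, c]
    }
}
```

Because the entry is overwritten on every call, the driver copies each value out before calling source() again.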
public void sourceCleanup(FlowProcess<JobConf> flowProcess,
SourceCall<Object[],RecordReader> sourceCall)
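Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by
Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
sourceCleanup in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall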
public void sink(FlowProcess<JobConf> flowProcess,
SinkCall<Object[],OutputCollector> sinkCall)
throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to
the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular
instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to
any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
sink in class Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException