cascading.scheme.local
Class TextLine

java.lang.Object
  extended by cascading.scheme.Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
      extended by cascading.scheme.local.TextLine
All Implemented Interfaces:
Serializable

public class TextLine
extends Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>

A TextLine is a type of Scheme for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line.

By default, this scheme returns a Tuple with two fields, "num" and "line", where "num" is the line number for "line".
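The default behaviour can be approximated with the JDK alone. The sketch below is not Cascading's implementation: the class `NumLineSketch`, the holder `NumLine`, and the choice to start numbering at 0 are assumptions made for illustration. It relies on `java.io.LineNumberReader`, which treats a line feed, a carriage return, or a carriage return followed by a line feed as a line terminator.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class NumLineSketch {
    /** Hypothetical holder for one ("num", "line") pair. */
    static final class NumLine {
        final int num;
        final String line;

        NumLine(int num, String line) {
            this.num = num;
            this.line = line;
        }
    }

    /** Splits the text into lines and pairs each line with its number (0-based by assumption). */
    static List<NumLine> read(String text) {
        List<NumLine> rows = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            while (true) {
                int num = reader.getLineNumber(); // lines consumed so far
                String line = reader.readLine();  // terminator is stripped
                if (line == null) {
                    break;                        // end of input
                }
                rows.add(new NumLine(num, line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Mixed terminators: LF, CR, and CR followed by LF.
        for (NumLine row : read("alpha\nbeta\rgamma\r\ndelta")) {
            System.out.println(row.num + "\t" + row.line);
        }
    }
}
```

Capturing the number before calling `readLine()` keeps the result independent of how the reader counts a final line that has no terminator.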

Many of the constructors take both "sourceFields" and "sinkFields". sourceFields denotes the field names to be used instead of the names "num" and "line". sinkFields is a selector and is by default Fields.ALL. Any available field names can be given if only a subset of the incoming fields should be used.

If a Fields instance having only one field is passed to the constructor as sourceFields, the returned tuples will simply contain the "line" value under the given field name.
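The effect of the number of source fields can be illustrated without Cascading on the classpath. In this sketch a `List<String>` stands in for a Fields instance and a `Map` stands in for the returned tuple; both stand-ins, the class name `SourceFieldsSketch`, and the rejection of any other field count are assumptions, not the library's behaviour.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SourceFieldsSketch {
    /**
     * Builds a name-to-value view of one record.
     * One field name: only the line text is kept.
     * Two field names: the line number first, then the line text.
     */
    static Map<String, Object> toRecord(List<String> sourceFields, int num, String line) {
        Map<String, Object> record = new LinkedHashMap<>();
        if (sourceFields.size() == 1) {
            record.put(sourceFields.get(0), line);
        } else if (sourceFields.size() == 2) {
            record.put(sourceFields.get(0), num);
            record.put(sourceFields.get(1), line);
        } else {
            // Assumption: this sketch simply refuses other sizes.
            throw new IllegalArgumentException(
                "expected one or two field names, got " + sourceFields.size());
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(toRecord(Arrays.asList("text"), 7, "hello"));
        System.out.println(toRecord(Arrays.asList("offset", "text"), 7, "hello"));
    }
}
```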

Note that TextLine will concatenate all the Tuple values for the selected fields with a TAB delimiter before writing out the line.
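The sink side, selecting fields and joining their values with a TAB, might look roughly like the following. This is a JDK-only sketch under stated assumptions: a `Map` stands in for the outgoing tuple, a `List<String>` stands in for the sinkFields selector, and `TabJoinSketch` and `toLine` are hypothetical names. How the real class renders null values is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TabJoinSketch {
    /** Keeps only the selected fields, in selector order, and joins their values with a TAB. */
    static String toLine(Map<String, Object> tuple, List<String> sinkFields) {
        List<String> values = new ArrayList<>();
        for (String name : sinkFields) {
            if (!tuple.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            values.add(String.valueOf(tuple.get(name)));
        }
        return String.join("\t", values);
    }

    /** Selects every incoming field, in the spirit of Fields.ALL. */
    static String toLine(Map<String, Object> tuple) {
        return toLine(tuple, new ArrayList<>(tuple.keySet()));
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 42);
        tuple.put("name", "ada");
        tuple.put("city", "london");

        System.out.println(toLine(tuple));                             // all fields
        System.out.println(toLine(tuple, Arrays.asList("name", "id"))); // a subset
    }
}
```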

By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
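To make the role of a charset name concrete, the sketch below wraps raw streams in a `LineNumberReader` and a `PrintWriter` using a named charset, which is one plausible way to obtain the reader and writer types this class works with. It is an illustration only: `CharsetSketch`, its helper methods, and the assumption that the default name is the string "UTF-8" are not taken from the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

final class CharsetSketch {
    /** Assumed default, mirroring the documented UTF-8 behaviour. */
    static final String DEFAULT_CHARSET = "UTF-8";

    /** Wraps a byte stream in a line-oriented reader that decodes with the named charset. */
    static LineNumberReader openReader(InputStream in, String charsetName) {
        try {
            return new LineNumberReader(new InputStreamReader(in, charsetName));
        } catch (IOException e) { // UnsupportedEncodingException for unknown names
            throw new UncheckedIOException(e);
        }
    }

    /** Wraps a byte stream in a writer that encodes with the named charset. */
    static PrintWriter openWriter(OutputStream out, String charsetName) {
        try {
            return new PrintWriter(new OutputStreamWriter(out, charsetName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Encodes one line of text to bytes (no line terminator is appended). */
    static byte[] encode(String line, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter writer = openWriter(bytes, charsetName)) {
            writer.print(line);
        } // closing the writer flushes it
        return bytes.toByteArray();
    }

    /** Decodes the first line found in the given bytes. */
    static String decodeFirstLine(byte[] data, String charsetName) {
        try (LineNumberReader reader = openReader(new ByteArrayInputStream(data), charsetName)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        System.out.println(encode(text, DEFAULT_CHARSET).length + " bytes as UTF-8");
        System.out.println(encode(text, "ISO-8859-1").length + " bytes as ISO-8859-1");
        System.out.println(decodeFirstLine(encode(text, DEFAULT_CHARSET), DEFAULT_CHARSET));
    }
}
```

The same text yields different byte sequences under different charsets, so a file must be read with the charset it was written in.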

See Also:
Serialized Form

Field Summary
static String DEFAULT_CHARSET
           
 
Constructor Summary
TextLine()
          Creates a new TextLine instance that sources "num" and "line" fields, and sinks all incoming fields, where "num" is the line number of the line in the input file.
TextLine(Fields sourceFields)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, Fields sinkFields)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, Fields sinkFields, String charsetName)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, String charsetName)
          Creates a new TextLine instance.
 
Method Summary
 LineNumberReader createInput(InputStream inputStream)
           
 PrintWriter createOutput(OutputStream outputStream)
           
 void presentSinkFields(FlowProcess<Properties> process, Tap tap, Fields fields)
          Method presentSinkFields is called after the planner is invoked and all fields are resolved.
 void presentSourceFields(FlowProcess<Properties> process, Tap tap, Fields fields)
          Method presentSourceFields is called after the planner is invoked and all fields are resolved.
 void sink(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
          Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
 void sinkCleanup(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
          Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).
 void sinkConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
          Method sinkConfInit initializes this instance as a sink.
 void sinkPrepare(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
          Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).
 boolean source(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall)
          Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.
 void sourceCleanup(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall)
          Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
 void sourceConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
          Method sourceConfInit initializes this instance as a source.
 void sourcePrepare(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall)
          Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
protected  void verify(Fields sourceFields)
           
 
Methods inherited from class cascading.scheme.Scheme
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_CHARSET

public static final String DEFAULT_CHARSET
See Also:
Constant Field Values
Constructor Detail

TextLine

public TextLine()
Creates a new TextLine instance that sources "num" and "line" fields, and sinks all incoming fields, where "num" is the line number of the line in the input file.


TextLine

@ConstructorProperties(value="sourceFields")
public TextLine(Fields sourceFields)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - of type Fields

TextLine

@ConstructorProperties(value={"sourceFields","charsetName"})
public TextLine(Fields sourceFields,
                String charsetName)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - of type Fields
charsetName - of type String

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields"})
public TextLine(Fields sourceFields,
                Fields sinkFields)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","charsetName"})
public TextLine(Fields sourceFields,
                Fields sinkFields,
                String charsetName)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
charsetName - of type String
Method Detail

verify

protected void verify(Fields sourceFields)

createInput

public LineNumberReader createInput(InputStream inputStream)

createOutput

public PrintWriter createOutput(OutputStream outputStream)

presentSourceFields

public void presentSourceFields(FlowProcess<Properties> process,
                                Tap tap,
                                Fields fields)
Description copied from class: Scheme
Method presentSourceFields is called after the planner is invoked and all fields are resolved. This method presents to the Scheme the actual source fields after any planner intervention.

This method is called after Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).

Overrides:
presentSourceFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
fields - of type Fields

presentSinkFields

public void presentSinkFields(FlowProcess<Properties> process,
                              Tap tap,
                              Fields fields)
Description copied from class: Scheme
Method presentSinkFields is called after the planner is invoked and all fields are resolved. This method presents to the Scheme the actual sink fields after any planner intervention.

This method is called after Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).

Overrides:
presentSinkFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
fields - of type Fields

sourceConfInit

public void sourceConfInit(FlowProcess<Properties> flowProcess,
                           Tap<Properties,InputStream,OutputStream> tap,
                           Properties conf)
Description copied from class: Scheme
Method sourceConfInit initializes this instance as a source.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.

Specified by:
sourceConfInit in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Properties

sinkConfInit

public void sinkConfInit(FlowProcess<Properties> flowProcess,
                         Tap<Properties,InputStream,OutputStream> tap,
                         Properties conf)
Description copied from class: Scheme
Method sinkConfInit initializes this instance as a sink.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.

Specified by:
sinkConfInit in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Properties

sourcePrepare

public void sourcePrepare(FlowProcess<Properties> flowProcess,
                          SourceCall<LineNumberReader,InputStream> sourceCall)
                   throws IOException
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).

Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.

Overrides:
sourcePrepare in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException

source

public boolean source(FlowProcess<Properties> flowProcess,
                      SourceCall<LineNumberReader,InputStream> sourceCall)
               throws IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.

It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.

Note that this is the only time it is safe to modify a Tuple instance handed over via a method call.

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
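The boolean contract described above, together with re-use of a single entry instance, can be sketched in plain Java. Everything here is a stand-in: `Entry` plays the role of the incoming TupleEntry, the static `source` method mimics the shape of the call, and try-with-resources takes the place of the prepare and cleanup steps. None of these names come from Cascading, and the 0-based numbering is an assumption.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SourceLoopSketch {
    /** Hypothetical stand-in for the reusable incoming entry. */
    static final class Entry {
        int num;
        String line;
    }

    /** Reads one record into the entry; returns false once the input is exhausted. */
    static boolean source(LineNumberReader input, Entry entry) throws IOException {
        int num = input.getLineNumber();
        String line = input.readLine();
        if (line == null) {
            return false; // no more values available
        }
        entry.num = num;  // overwrite the existing instance in place
        entry.line = line;
        return true;
    }

    /** Drives the loop: open the reader, call source until it returns false, then close. */
    static List<String> readAll(String text) {
        List<String> seen = new ArrayList<>();
        Entry entry = new Entry(); // one instance, overwritten for every record
        try (LineNumberReader input = new LineNumberReader(new StringReader(text))) {
            while (source(input, entry)) {
                // Copy the values out before the next call overwrites them.
                seen.add(entry.num + ":" + entry.line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readAll("first\nsecond\nthird"));
    }
}
```

Because the entry is overwritten on each call, a caller that needs to retain a record has to copy it, which is what the loop does.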

Specified by:
source in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Returns:
true when a Tuple was successfully read
Throws:
IOException

sourceCleanup

public void sourceCleanup(FlowProcess<Properties> flowProcess,
                          SourceCall<LineNumberReader,InputStream> sourceCall)
                   throws IOException
Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).

Overrides:
sourceCleanup in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException

sinkPrepare

public void sinkPrepare(FlowProcess<Properties> flowProcess,
                        SinkCall<PrintWriter,OutputStream> sinkCall)
                 throws IOException
Description copied from class: Scheme
Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).

Be sure to place any initialized objects in the SinkContext so each instance will remain threadsafe.

Overrides:
sinkPrepare in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException

sink

public void sink(FlowProcess<Properties> flowProcess,
                 SinkCall<PrintWriter,OutputStream> sinkCall)
          throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
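The payload rule for failure traps can be modelled in a few lines of plain Java. This sketch is an analogy rather than Cascading code: `PayloadException` stands in for TapException, a `List<Object>` stands in for a Tuple, a `StringBuilder` stands in for the output, and the two failure conditions in `sink` are invented purely to exercise both branches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class TrapSketch {
    /** Hypothetical stand-in for TapException; the payload may be null. */
    static final class PayloadException extends RuntimeException {
        final List<Object> payload;

        PayloadException(String message, List<Object> payload) {
            super(message);
            this.payload = payload;
        }
    }

    /** Writes one tuple as a TAB-joined line, or fails with or without a payload. */
    static void sink(List<Object> tuple, StringBuilder out) {
        if (tuple.isEmpty()) {
            throw new PayloadException("empty tuple", null); // no payload set
        }
        if (tuple.contains(null)) {
            List<Object> payload = new ArrayList<>(tuple);
            payload.removeIf(Objects::isNull);
            throw new PayloadException("null value", payload); // payload set
        }
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                out.append('\t');
            }
            out.append(tuple.get(i));
        }
        out.append('\n');
    }

    /** Sinks every tuple; failures go to the trap, using the payload when one was set. */
    static void sinkAll(List<List<Object>> tuples, StringBuilder out, List<List<Object>> trap) {
        for (List<Object> tuple : tuples) {
            try {
                sink(tuple, out);
            } catch (PayloadException e) {
                trap.add(e.payload != null ? e.payload : tuple);
            }
        }
    }

    public static void main(String[] args) {
        List<List<Object>> tuples = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Collections.<Object>emptyList(),
            Arrays.<Object>asList("b", null));
        StringBuilder out = new StringBuilder();
        List<List<Object>> trap = new ArrayList<>();
        sinkAll(tuples, out, trap);
        System.out.print(out);
        System.out.println("trapped: " + trap);
    }
}
```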

Specified by:
sink in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException

sinkCleanup

public void sinkCleanup(FlowProcess<Properties> flowProcess,
                        SinkCall<PrintWriter,OutputStream> sinkCall)
                 throws IOException
Description copied from class: Scheme
Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).

Overrides:
sinkCleanup in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException


Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.