cascading.scheme.local
Class TextDelimited

java.lang.Object
  extended by cascading.scheme.Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
      extended by cascading.scheme.local.TextDelimited
All Implemented Interfaces:
Serializable

public class TextDelimited
extends Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>

Class TextDelimited provides direct support for delimited text files, like TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.

TextDelimited may also be used to skip the "header" in a file, where the header is defined as the very first line in every input file. That is, if the byte offset of the current line from the input is zero (0), that line will be skipped.

It is assumed if sink/source fields is set to either Fields.ALL or Fields.UNKNOWN and skipHeader or hasHeader is true, the field names will be retrieved from the header of the file and used during planning. The header will parsed with the same rules as the body of the file.

By default headers are not skipped.

TextDelimited may also be used to write a "header" in a file. The fields names for the header are taken directly from the declared fields. Or if the declared fields are Fields.ALL or Fields.UNKNOWN, the resolved field names will be used, if any.

By default headers are not written.

If hasHeaders is set to true on a constructor, both skipHeader and writeHeader will be set to true.

By default this Scheme is both strict and safe.

Strict meaning if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, then Tuple will be returned with null values for the missing fields.

Safe meaning if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.

Also by default, quote strings are not searched for to improve processing speed. If a file is COMMA delimited but may have COMMA's in a value, the whole value should be surrounded by the quote string, typically double quotes (").

Note all empty fields in a line will be returned as null unless coerced into a new type.

This Scheme may source/sink Fields.ALL, when given on the constructor the new instance will automatically default to strict == false as the number of fields parsed are arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.

See Also:
TextLine, Serialized Form

Constructor Summary
TextDelimited()
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
TextDelimited(boolean hasHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
TextDelimited(boolean hasHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
TextDelimited(Fields fields)
          Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, String quote)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, String quote, Class[] types)
          Constructor TextDelimited creates a new TextDelimited instance.
TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe)
          Constructor TextDelimited creates a new TextDelimited instance.
 
Method Summary
 LineNumberReader createInput(InputStream inputStream)
           
 PrintWriter createOutput(OutputStream outputStream)
           
 boolean isSymmetrical()
          Method isSymmetrical returns true if the sink fields equal the source fields.
 void presentSinkFields(FlowProcess<Properties> flowProcess, Tap tap, Fields fields)
          Method presentSinkFields is called after the planner is invoked and all fields are resolved.
 void presentSourceFields(FlowProcess<Properties> process, Tap tap, Fields fields)
          Method presentSourceFields is called after the planner is invoked and all fields are resolved.
 Fields retrieveSourceFields(FlowProcess<Properties> process, Tap tap)
          Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources.
 void sink(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
          Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
 void sinkCleanup(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
          Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).
 void sinkConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
          Method sinkInit initializes this instance as a sink.
 void sinkPrepare(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
          Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).
 boolean source(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall)
          Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.
 void sourceCleanup(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall)
          Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
 void sourceConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
          Method sourceInit initializes this instance as a source.
 void sourcePrepare(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall)
          Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
 
Methods inherited from class cascading.scheme.Scheme
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, setNumSinkParts, setSinkFields, setSourceFields, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TextDelimited

public TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.


TextDelimited

@ConstructorProperties(value={"hasHeader","delimiter"})
public TextDelimited(boolean hasHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
hasHeader -
delimiter -

TextDelimited

@ConstructorProperties(value={"hasHeader","delimiter","quote"})
public TextDelimited(boolean hasHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.

Use this constructor if the source and sink fields will be resolved during planning, for example, when using with a Checkpoint Tap.

Parameters:
hasHeader -
delimiter -
quote -

TextDelimited

@ConstructorProperties(value="fields")
public TextDelimited(Fields fields)
Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.

Parameters:
fields - of type Fields

TextDelimited

@ConstructorProperties(value={"fields","delimiter"})
public TextDelimited(Fields fields,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
delimiter - of type String

TextDelimited

@ConstructorProperties(value={"fields","delimiter","types"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","types"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","delimiter","types"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

TextDelimited

@ConstructorProperties(value={"fields","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

TextDelimited

@ConstructorProperties(value={"fields","delimiter","quote"})
public TextDelimited(Fields fields,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","hasHeader","delimiter","quote"})
public TextDelimited(Fields fields,
                                                boolean hasHeader,
                                                String delimiter,
                                                String quote)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String

TextDelimited

@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","strict","quote","types","safe"})
public TextDelimited(Fields fields,
                                                boolean skipHeader,
                                                boolean writeHeader,
                                                String delimiter,
                                                boolean strict,
                                                String quote,
                                                Class[] types,
                                                boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.

Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
Method Detail

createInput

public LineNumberReader createInput(InputStream inputStream)

createOutput

public PrintWriter createOutput(OutputStream outputStream)

isSymmetrical

public boolean isSymmetrical()
Description copied from class: Scheme
Method isSymmetrical returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.

Overrides:
isSymmetrical in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Returns:
the symmetrical (type boolean) of this Scheme object.

retrieveSourceFields

public Fields retrieveSourceFields(FlowProcess<Properties> process,
                                   Tap tap)
Description copied from class: Scheme
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources. By default the current declared fields are returned.

The FlowProcess presents all known properties resolved by the current planner.

The instance is the parent Tap for this Scheme instance.

Overrides:
retrieveSourceFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
Returns:
Fields

presentSourceFields

public void presentSourceFields(FlowProcess<Properties> process,
                                Tap tap,
                                Fields fields)
Description copied from class: Scheme
Method presentSourceFields is called after the planner is invoked and all fields are resolved. This method presents to the Scheme the actual source fields after any planner intervention.

This method is called after Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).

Overrides:
presentSourceFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
fields - of type Fields

presentSinkFields

public void presentSinkFields(FlowProcess<Properties> flowProcess,
                              Tap tap,
                              Fields fields)
Description copied from class: Scheme
Method presentSinkFields is called after the planner is invoked and all fields are resolved. This method presents to the Scheme the actual source fields after any planner intervention.

This method is called after Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).

Overrides:
presentSinkFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields

sourceConfInit

public void sourceConfInit(FlowProcess<Properties> flowProcess,
                           Tap<Properties,InputStream,OutputStream> tap,
                           Properties conf)
Description copied from class: Scheme
Method sourceInit initializes this instance as a source.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources much be initialized before use. And Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.

Specified by:
sourceConfInit in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
tap - of type Tap
conf - of type JobConf @throws IOException on initialization failure

sourcePrepare

public void sourcePrepare(FlowProcess<Properties> flowProcess,
                          SourceCall<LineNumberReader,InputStream> sourceCall)
                   throws IOException
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).

Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.

Overrides:
sourcePrepare in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of Process
sourceCall - of SourceCall
Throws:
IOException

source

public boolean source(FlowProcess<Properties> flowProcess,
                      SourceCall<LineNumberReader,InputStream> sourceCall)
               throws IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.

It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.

Note this is only time it is safe to modify a Tuple instance handed over via a method call.

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.

Specified by:
source in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of Process
sourceCall - of SourceCall
Returns:
returns true when a Tuple was successfully read
Throws:
IOException

sourceCleanup

public void sourceCleanup(FlowProcess<Properties> flowProcess,
                          SourceCall<LineNumberReader,InputStream> sourceCall)
                   throws IOException
Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).

Overrides:
sourceCleanup in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of Process
sourceCall - of SourceCall
Throws:
IOException

sinkConfInit

public void sinkConfInit(FlowProcess<Properties> flowProcess,
                         Tap<Properties,InputStream,OutputStream> tap,
                         Properties conf)
Description copied from class: Scheme
Method sinkInit initializes this instance as a sink.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources much be initialized before use. And Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.

Specified by:
sinkConfInit in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
tap - of type Tap
conf - of type JobConf @throws IOException on initialization failure

sinkPrepare

public void sinkPrepare(FlowProcess<Properties> flowProcess,
                        SinkCall<PrintWriter,OutputStream> sinkCall)
Description copied from class: Scheme
Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).

Be sure to place any initialized objects in the SinkContext so each instance will remain threadsafe.

Overrides:
sinkPrepare in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of Process
sinkCall - of SinkCall

sink

public void sink(FlowProcess<Properties> flowProcess,
                 SinkCall<PrintWriter,OutputStream> sinkCall)
          throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.

Specified by:
sink in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of Process
sinkCall - of SinkCall
Throws:
IOException

sinkCleanup

public void sinkCleanup(FlowProcess<Properties> flowProcess,
                        SinkCall<PrintWriter,OutputStream> sinkCall)
Description copied from class: Scheme
Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).

Overrides:
sinkCleanup in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of Process
sinkCall - of SinkCall


Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.