cascading.scheme.hadoop
Class TextLine

java.lang.Object
  extended by cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
      extended by cascading.scheme.hadoop.TextLine
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
TextDelimited

public class TextLine
extends cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

A TextLine is a Scheme for plain text files. Files are broken into lines, and either a line feed (LF) or a carriage return (CR) signals the end of a line.
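The line-breaking rule above can be shown with a small stdlib-only Java sketch. It is not TextLine's implementation, which hands reading off to Hadoop. `LineSplitSketch` and `splitLines` are hypothetical names, and treating CR followed by LF as a single terminator is an assumption borrowed from the convention of Hadoop's `LineReader`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits text into lines, ending a line at LF ('\n') or CR ('\r').
// Assumption: CR immediately followed by LF counts as ONE terminator,
// following the convention of Hadoop's LineReader.
class LineSplitSketch {
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                lines.add(current.toString()); // a terminator ends the current line
                current.setLength(0);
                // Skip the LF of a CR LF pair so it does not yield an extra empty line.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
        }
        // Trailing text without a final terminator still forms a line.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints [alpha, beta, gamma, delta]
        System.out.println(splitLines("alpha\nbeta\rgamma\r\ndelta"));
    }
}
```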

By default, this scheme returns a Tuple with two fields: "offset", the byte offset of the line in the input file, and "line", the text of the line.
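Because "offset" counts bytes rather than characters, it can differ from a character index once multi-byte text appears. A minimal sketch, assuming UTF-8 input and exactly one LF after every line; `ByteOffsetSketch` and `lineOffsets` are hypothetical names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch only: computes the "offset" value for each line as a BYTE offset.
// Assumptions: the file is UTF-8 and every line is followed by one LF byte.
class ByteOffsetSketch {
    static long[] lineOffsets(List<String> lines) {
        long[] offsets = new long[lines.size()];
        long position = 0;
        for (int i = 0; i < lines.size(); i++) {
            offsets[i] = position; // where this line starts in the file
            // Advance past the encoded line and its single-byte LF terminator.
            position += lines.get(i).getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The first line is 5 characters but 6 UTF-8 bytes, so the second line starts at byte 7.
        System.out.println(Arrays.toString(lineOffsets(Arrays.asList("h\u00e9llo", "world"))));
    }
}
```

A character-based count would place "world" at position 6; the byte-based "offset" places it at 7.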

Many of the constructors take both "sourceFields" and "sinkFields". sourceFields gives the field names to use instead of "offset" and "line". sinkFields is a selector and defaults to Fields.ALL; to write only a subset of the incoming fields, give just those field names.
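To illustrate what a sink selector does, here is a minimal Java sketch that picks named values out of one incoming tuple. It models the idea only: `SinkSelectorSketch` and `select` are hypothetical, a `null` selector stands in for `Fields.ALL`, and returning values in selector order and rejecting unknown names are this sketch's own choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: models a sink-field selector over one incoming tuple.
// Hypothetical convention: a null selector stands in for Fields.ALL.
class SinkSelectorSketch {
    static List<Object> select(List<String> names, List<Object> values, List<String> selector) {
        if (selector == null) {
            return new ArrayList<>(values); // "ALL": keep every incoming value
        }
        List<Object> selected = new ArrayList<>();
        for (String field : selector) {
            int index = names.indexOf(field);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + field);
            }
            selected.add(values.get(index)); // values come out in selector order
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("name", "age", "city");
        List<Object> values = Arrays.asList("ann", 34, "Oslo");
        System.out.println(select(names, values, null));                          // [ann, 34, Oslo]
        System.out.println(select(names, values, Arrays.asList("city", "name"))); // [Oslo, ann]
    }
}
```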

If the sourceFields passed to the constructor contain only one field, each returned tuple holds only the "line" value, under the given field name.
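The source-side rules described above (custom names via sourceFields, and the one-field case) can be sketched with a `LinkedHashMap` standing in for a Cascading `Tuple`. `SourceTupleSketch` and `toTuple` are hypothetical; the real scheme is configured through its constructors, not through a method like this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: shapes one source tuple; the map stands in for a Cascading Tuple.
class SourceTupleSketch {
    static final List<String> DEFAULT_SOURCE_FIELDS = Arrays.asList("offset", "line");

    static Map<String, Object> toTuple(List<String> sourceFields, long offset, String line) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        if (sourceFields.size() == 2) {
            tuple.put(sourceFields.get(0), offset); // first name replaces "offset"
            tuple.put(sourceFields.get(1), line);   // second name replaces "line"
        } else if (sourceFields.size() == 1) {
            tuple.put(sourceFields.get(0), line);   // one field: only the line value
        } else {
            // The text covers only one or two fields; rejecting others is this sketch's choice.
            throw new IllegalArgumentException("expected one or two source fields: " + sourceFields);
        }
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(toTuple(DEFAULT_SOURCE_FIELDS, 0L, "hello"));        // {offset=0, line=hello}
        System.out.println(toTuple(Arrays.asList("num", "text"), 6L, "world")); // {num=6, text=world}
        System.out.println(toTuple(Arrays.asList("text"), 6L, "world"));        // {text=world}
    }
}
```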

Note that, when sinking, TextLine joins the Tuple values of the selected fields with a TAB delimiter and writes the result out as a single line.
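A minimal sketch of the TAB concatenation, assuming each selected value is rendered with `String.valueOf`; the text above does not say how nulls or other types are formatted. `TabJoinSketch` and `toLine` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Sketch only: joins the selected tuple values with TAB into one output line.
// Assumption: values are rendered with String.valueOf.
class TabJoinSketch {
    static String toLine(List<?> selectedValues) {
        StringJoiner joiner = new StringJoiner("\t");
        for (Object value : selectedValues) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints Oslo, ann and 34 separated by TAB characters
        System.out.println(toLine(Arrays.asList("Oslo", "ann", 34)));
    }
}
```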

Note that sink compression is TextLine.Compress.DISABLE by default. If null is passed to the constructor as the compression value, compression remains disabled.
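The default-and-null rule can be modeled in a few lines. This is a sketch, not the library source: `CompressSketch` is a hypothetical stand-in for `TextLine.Compress` (only `DISABLE` is named above; `ENABLE` is assumed), and ignoring `null` in the setter is one way to keep compression disabled, chosen here for illustration.

```java
// Sketch only. CompressSketch is a hypothetical stand-in for TextLine.Compress:
// DISABLE is named in the documentation, ENABLE is assumed for illustration.
enum CompressSketch { ENABLE, DISABLE }

final class SinkCompressionSketch {
    // Sink compression starts out disabled.
    private CompressSketch sinkCompression = CompressSketch.DISABLE;

    SinkCompressionSketch(CompressSketch sinkCompression) {
        setSinkCompression(sinkCompression);
    }

    void setSinkCompression(CompressSketch sinkCompression) {
        // Assumption: null is ignored, so an instance given null stays DISABLE.
        if (sinkCompression != null) {
            this.sinkCompression = sinkCompression;
        }
    }

    CompressSketch getSinkCompression() {
        return sinkCompression;
    }

    public static void main(String[] args) {
        System.out.println(new SinkCompressionSketch(null).getSinkCompression());                  // DISABLE
        System.out.println(new SinkCompressionSketch(CompressSketch.ENABLE).getSinkCompression()); // ENABLE
    }
}
```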

If the name of any input file ends with ".zip", an error will be thrown.
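A sketch of the ".zip" rule as a pre-check over input paths. Where and when TextLine performs this check is not stated above; `ZipInputCheckSketch` is hypothetical, and both the case-sensitive suffix match and the choice of `IllegalArgumentException` are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: rejects any input path whose name ends in ".zip".
// Assumptions: case-sensitive suffix match; the error type is IllegalArgumentException.
class ZipInputCheckSketch {
    static void checkInputs(List<String> inputPaths) {
        for (String path : inputPaths) {
            if (path.endsWith(".zip")) {
                throw new IllegalArgumentException("zip input is not supported: " + path);
            }
        }
    }

    public static void main(String[] args) {
        checkInputs(Arrays.asList("logs/day1.txt", "logs/day2.txt")); // passes silently
        try {
            checkInputs(Arrays.asList("logs/day1.txt", "archive/logs.zip"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // zip input is not supported: archive/logs.zip
        }
    }
}
```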

By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
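To show why the charset matters, this sketch decodes the same two bytes with UTF-8 and with ISO-8859-1 using only `java.nio.charset.Charset`. `CharsetSketch` is hypothetical; its `DEFAULT_CHARSET` merely mirrors the field of that name documented on this page.

```java
import java.nio.charset.Charset;

// Sketch only: decoding and encoding text with a named charset, defaulting to UTF-8.
class CharsetSketch {
    static final String DEFAULT_CHARSET = "UTF-8";

    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};                  // UTF-8 encoding of U+00E9
        System.out.println(decode(raw, DEFAULT_CHARSET).length()); // 1 character
        System.out.println(decode(raw, "ISO-8859-1").length());    // 2 characters
    }
}
```

Reading UTF-8 data with the wrong charsetName does not fail; it silently yields different characters, which is why the default matters.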

See Also:
Serialized Form

Nested Class Summary
static class TextLine.Compress
           
 
Field Summary
static String DEFAULT_CHARSET
          The default charset name, "UTF-8".
static cascading.tuple.Fields DEFAULT_SOURCE_FIELDS
          The default source fields, "offset" and "line".
 
Constructor Summary
TextLine()
          Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.
TextLine(cascading.tuple.Fields sourceFields)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields, int numSinkParts)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields, String charsetName)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields, TextLine.Compress sinkCompression)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields, TextLine.Compress sinkCompression, int numSinkParts)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields, TextLine.Compress sinkCompression, int numSinkParts, String charsetName)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, cascading.tuple.Fields sinkFields, TextLine.Compress sinkCompression, String charsetName)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, int numSinkParts)
          Creates a new TextLine instance.
TextLine(cascading.tuple.Fields sourceFields, String charsetName)
          Creates a new TextLine instance.
TextLine(int numSinkParts)
          Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.
TextLine(TextLine.Compress sinkCompression)
          Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.
 
Method Summary
 TextLine.Compress getSinkCompression()
          Returns the sink compression setting of this TextLine.
protected  String makeEncodedString(Object[] context)
           
 void presentSinkFields(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.tap.Tap tap, cascading.tuple.Fields fields)
           
 void presentSourceFields(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.tap.Tap tap, cascading.tuple.Fields fields)
           
protected  void setCharsetName(String charsetName)
           
 void setSinkCompression(TextLine.Compress sinkCompression)
          Sets the sink compression setting of this TextLine.
 void sink(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.scheme.SinkCall<Object[],OutputCollector> sinkCall)
           
 void sinkConfInit(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.tap.Tap<JobConf,RecordReader,OutputCollector> tap, JobConf conf)
           
 void sinkPrepare(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.scheme.SinkCall<Object[],OutputCollector> sinkCall)
           
 boolean source(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
           
 void sourceCleanup(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
           
 void sourceConfInit(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.tap.Tap<JobConf,RecordReader,OutputCollector> tap, JobConf conf)
           
protected  void sourceHandleInput(cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
           
 void sourcePrepare(cascading.flow.FlowProcess<JobConf> flowProcess, cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
           
protected  void verify(cascading.tuple.Fields sourceFields)
           
 
Methods inherited from class cascading.scheme.Scheme
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, sinkCleanup, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_CHARSET

public static final String DEFAULT_CHARSET
See Also:
Constant Field Values

DEFAULT_SOURCE_FIELDS

public static final cascading.tuple.Fields DEFAULT_SOURCE_FIELDS
The default source fields, "offset" and "line".

Constructor Detail

TextLine

public TextLine()
Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.


TextLine

@ConstructorProperties(value="numSinkParts")
public TextLine(int numSinkParts)
Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file. The resulting data set will have numSinkParts parts.

Parameters:
numSinkParts - the number of parts the resulting data set should have

TextLine

@ConstructorProperties(value="sinkCompression")
public TextLine(TextLine.Compress sinkCompression)
Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.

Parameters:
sinkCompression - the compression setting for sink output; if null, compression remains disabled

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","charsetName"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields,
                String charsetName)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
charsetName - the name of the charset used to encode and decode text

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","numSinkParts"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields,
                int numSinkParts)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
numSinkParts - the number of parts the resulting data set should have

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields,
                TextLine.Compress sinkCompression)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
sinkCompression - the compression setting for sink output; if null, compression remains disabled

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression","charsetName"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields,
                TextLine.Compress sinkCompression,
                String charsetName)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
sinkCompression - the compression setting for sink output; if null, compression remains disabled
charsetName - the name of the charset used to encode and decode text

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression","numSinkParts"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields,
                TextLine.Compress sinkCompression,
                int numSinkParts)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
sinkCompression - the compression setting for sink output; if null, compression remains disabled
numSinkParts - the number of parts the resulting data set should have

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression","numSinkParts","charsetName"})
public TextLine(cascading.tuple.Fields sourceFields,
                cascading.tuple.Fields sinkFields,
                TextLine.Compress sinkCompression,
                int numSinkParts,
                String charsetName)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
sinkCompression - the compression setting for sink output; if null, compression remains disabled
numSinkParts - the number of parts the resulting data set should have
charsetName - the name of the charset used to encode and decode text

TextLine

@ConstructorProperties(value="sourceFields")
public TextLine(cascading.tuple.Fields sourceFields)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme

TextLine

@ConstructorProperties(value={"sourceFields","charsetName"})
public TextLine(cascading.tuple.Fields sourceFields,
                String charsetName)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
charsetName - the name of the charset used to encode and decode text

TextLine

@ConstructorProperties(value={"sourceFields","numSinkParts"})
public TextLine(cascading.tuple.Fields sourceFields,
                int numSinkParts)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples. The resulting data set will have numSinkParts parts.

Parameters:
sourceFields - the source fields for this scheme
numSinkParts - the number of parts the resulting data set should have

Method Detail

setCharsetName

protected void setCharsetName(String charsetName)

verify

protected void verify(cascading.tuple.Fields sourceFields)

getSinkCompression

public TextLine.Compress getSinkCompression()
Returns the sink compression setting of this TextLine.

Returns:
the sink compression setting (a TextLine.Compress value) of this TextLine

setSinkCompression

public void setSinkCompression(TextLine.Compress sinkCompression)
Sets the sink compression setting of this TextLine. If sinkCompression is null, compression remains disabled.

Parameters:
sinkCompression - the sink compression setting of this TextLine

sourceConfInit

public void sourceConfInit(cascading.flow.FlowProcess<JobConf> flowProcess,
                           cascading.tap.Tap<JobConf,RecordReader,OutputCollector> tap,
                           JobConf conf)
Specified by:
sourceConfInit in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

presentSourceFields

public void presentSourceFields(cascading.flow.FlowProcess<JobConf> flowProcess,
                                cascading.tap.Tap tap,
                                cascading.tuple.Fields fields)
Overrides:
presentSourceFields in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

presentSinkFields

public void presentSinkFields(cascading.flow.FlowProcess<JobConf> flowProcess,
                              cascading.tap.Tap tap,
                              cascading.tuple.Fields fields)
Overrides:
presentSinkFields in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

sinkConfInit

public void sinkConfInit(cascading.flow.FlowProcess<JobConf> flowProcess,
                         cascading.tap.Tap<JobConf,RecordReader,OutputCollector> tap,
                         JobConf conf)
Specified by:
sinkConfInit in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

sourcePrepare

public void sourcePrepare(cascading.flow.FlowProcess<JobConf> flowProcess,
                          cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
Overrides:
sourcePrepare in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

source

public boolean source(cascading.flow.FlowProcess<JobConf> flowProcess,
                      cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
               throws IOException
Specified by:
source in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Throws:
IOException

sourceHandleInput

protected void sourceHandleInput(cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)

makeEncodedString

protected String makeEncodedString(Object[] context)

sourceCleanup

public void sourceCleanup(cascading.flow.FlowProcess<JobConf> flowProcess,
                          cascading.scheme.SourceCall<Object[],RecordReader> sourceCall)
Overrides:
sourceCleanup in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>

sinkPrepare

public void sinkPrepare(cascading.flow.FlowProcess<JobConf> flowProcess,
                        cascading.scheme.SinkCall<Object[],OutputCollector> sinkCall)
                 throws IOException
Overrides:
sinkPrepare in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Throws:
IOException

sink

public void sink(cascading.flow.FlowProcess<JobConf> flowProcess,
                 cascading.scheme.SinkCall<Object[],OutputCollector> sinkCall)
          throws IOException
Specified by:
sink in class cascading.scheme.Scheme<JobConf,RecordReader,OutputCollector,Object[],Object[]>
Throws:
IOException


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.