cascading.scheme
Class TextLine

java.lang.Object
  extended by cascading.scheme.Scheme
      extended by cascading.scheme.TextLine
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
TextDelimited

public class TextLine
extends Scheme

A TextLine is a type of Scheme for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line.

By default, this scheme returns a Tuple with two fields, "offset" and "line".
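The default "offset" and "line" pair can be pictured with a small plain-Java sketch. It is not Cascading or Hadoop code: the class `OffsetLineSketch`, its `Record` type, and the `split` method are hypothetical names used only for illustration, and treating a carriage return followed by a line feed as a single terminator is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only; not part of Cascading or Hadoop.
final class OffsetLineSketch {

    /** One record: the byte offset where the line starts, plus its text. */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits UTF-8 text into (offset, line) records on LF or CR. */
    static List<Record> split(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        List<Record> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new Record(start,
                        new String(data, start, i - start, StandardCharsets.UTF_8)));
                // Assumption: CR immediately followed by LF is one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                start = i;
            } else {
                i++;
            }
        }
        // Emit a final line that has no trailing terminator.
        if (start < data.length) {
            records.add(new Record(start,
                    new String(data, start, data.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        for (Record r : split("ab\ncd\r\nef")) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

For the input `"ab\ncd\r\nef"` the sketch yields three records starting at byte offsets 0, 3, and 7.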

Many of the constructors take both "sourceFields" and "sinkFields". sourceFields gives the field names to use in place of "offset" and "line". sinkFields is a selector that defaults to Fields.ALL; pass specific field names when only a subset of the incoming fields should be written.

If the Fields instance passed to the constructor as sourceFields has only one field, the returned tuples will contain only the "line" value, under the given field name.
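The effect of the number of source fields can be modeled as follows. This is a plain-Java sketch, not the Cascading implementation: `SourceFieldsSketch` and `toTuple` are hypothetical names, a `Map` stands in for a Tuple, and rejecting any field count other than one or two is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only; a Map plays the role of a Cascading Tuple.
final class SourceFieldsSketch {

    /** Builds the record a source would emit for the given field names. */
    static Map<String, Object> toTuple(List<String> sourceFields, long offset, String line) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        if (sourceFields.size() == 1) {
            // One field: only the line text is kept, under the given name.
            tuple.put(sourceFields.get(0), line);
        } else if (sourceFields.size() == 2) {
            // Two fields: the names replace "offset" and "line", in that order.
            tuple.put(sourceFields.get(0), offset);
            tuple.put(sourceFields.get(1), line);
        } else {
            // Assumption of this sketch: other field counts are rejected.
            throw new IllegalArgumentException(
                    "expected one or two source fields, got " + sourceFields);
        }
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(toTuple(Arrays.asList("text"), 7L, "ef"));
        System.out.println(toTuple(Arrays.asList("pos", "text"), 7L, "ef"));
    }
}
```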

Note that TextLine will concatenate all the Tuple values for the selected fields with a TAB delimiter before writing out the line.
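The sink-side behavior (select fields, then join with TAB) might look roughly like this in plain Java. The names `TabSinkSketch` and `toLine` are hypothetical, a `Map` stands in for the incoming Tuple, and a `null` selector is used here as a stand-in for Fields.ALL. How the real class renders null values is not covered by this page, so the sketch's use of `String.valueOf` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative stand-in only; not the Cascading sink implementation.
final class TabSinkSketch {

    /** Joins the selected field values with a TAB to form one output line. */
    static String toLine(Map<String, Object> tuple, List<String> sinkFields) {
        // Assumption: a null selector plays the role of Fields.ALL.
        List<String> selected =
                sinkFields == null ? new ArrayList<>(tuple.keySet()) : sinkFields;
        return selected.stream()
                .map(name -> String.valueOf(tuple.get(name)))
                .collect(Collectors.joining("\t"));
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 1);
        tuple.put("name", "alice");
        tuple.put("city", "Oslo");
        System.out.println(toLine(tuple, Arrays.asList("name", "city")));
        System.out.println(toLine(tuple, null));
    }
}
```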

Note that sink compression is TextLine.Compress.DISABLE by default. If null is passed to the constructor for the compression value, compression remains disabled.
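The null handling described above can be sketched with a local stand-in. The class `SinkCompressionSketch` and its nested `Compress` enum are hypothetical; only DISABLE is named on this page, so the ENABLE constant is an assumption, as is the choice to leave the current value untouched when null is passed.

```java
// Illustrative stand-in only; the enum below is local to this sketch.
final class SinkCompressionSketch {

    /** Hypothetical mirror of a compression setting; ENABLE is assumed. */
    enum Compress { ENABLE, DISABLE }

    // Compression starts out disabled, as the page states for the default.
    private Compress sinkCompression = Compress.DISABLE;

    Compress getSinkCompression() {
        return sinkCompression;
    }

    void setSinkCompression(Compress value) {
        // Assumption: null is ignored, so a fresh instance stays disabled.
        if (value != null) {
            this.sinkCompression = value;
        }
    }

    public static void main(String[] args) {
        SinkCompressionSketch scheme = new SinkCompressionSketch();
        scheme.setSinkCompression(null);
        System.out.println(scheme.getSinkCompression());
    }
}
```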

If the names of all the input files end with ".zip", the ZipInputFormat will be used. This is not bi-directional, so zip files can be read but cannot be written.
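The "all input files end with .zip" condition can be expressed as a simple check. This is a sketch under stated assumptions, not the library's detection code: `ZipInputSketch` and `allZip` are hypothetical names, the suffix match is case-sensitive, and an empty path list is treated as not zipped.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only; not the Cascading input-format selection code.
final class ZipInputSketch {

    /** Returns true only when every path ends with ".zip". */
    static boolean allZip(List<String> paths) {
        // Assumption: no input paths means the zip format is not selected.
        return !paths.isEmpty()
                && paths.stream().allMatch(p -> p.endsWith(".zip"));
    }

    public static void main(String[] args) {
        System.out.println(allZip(Arrays.asList("a.zip", "b.zip")));
        System.out.println(allZip(Arrays.asList("a.zip", "b.txt")));
    }
}
```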

See Also:
Serialized Form

Nested Class Summary
static class TextLine.Compress
           
 
Field Summary
static Fields DEFAULT_SOURCE_FIELDS
          The default source fields, "offset" and "line".
 
Constructor Summary
TextLine()
          Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.
TextLine(Fields sourceFields)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, Fields sinkFields)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, Fields sinkFields, int numSinkParts)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, Fields sinkFields, TextLine.Compress sinkCompression, int numSinkParts)
          Creates a new TextLine instance.
TextLine(Fields sourceFields, int numSinkParts)
          Creates a new TextLine instance.
TextLine(int numSinkParts)
          Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.
TextLine(TextLine.Compress sinkCompression)
          Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.
 
Method Summary
 TextLine.Compress getSinkCompression()
          Method getSinkCompression returns the sinkCompression of this TextLine object.
 void setSinkCompression(TextLine.Compress sinkCompression)
          Method setSinkCompression sets the sinkCompression of this TextLine object.
 void sink(TupleEntry tupleEntry, OutputCollector outputCollector)
          Method sink writes out the given Tuple instance to the outputCollector.
 void sinkInit(Tap tap, JobConf conf)
          Method sinkInit initializes this instance as a sink.
 Tuple source(Object key, Object value)
          Method source takes the given Hadoop key and value and returns a new Tuple instance.
 void sourceInit(Tap tap, JobConf conf)
          Method sourceInit initializes this instance as a source.
 
Methods inherited from class cascading.scheme.Scheme
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, isWriteDirect, setNumSinkParts, setSinkFields, setSourceFields, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_SOURCE_FIELDS

public static final Fields DEFAULT_SOURCE_FIELDS
The default source fields, "offset" and "line".

Constructor Detail

TextLine

public TextLine()
Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.


TextLine

@ConstructorProperties(value="numSinkParts")
public TextLine(int numSinkParts)
Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.

Parameters:
numSinkParts - of type int

TextLine

@ConstructorProperties(value="sinkCompression")
public TextLine(TextLine.Compress sinkCompression)
Creates a new TextLine instance that sources "offset" and "line" fields, and sinks all incoming fields, where "offset" is the byte offset in the input file.

Parameters:
sinkCompression - of type Compress

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields"})
public TextLine(Fields sourceFields,
                Fields sinkFields)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","numSinkParts"})
public TextLine(Fields sourceFields,
                Fields sinkFields,
                int numSinkParts)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
numSinkParts - of type int

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression"})
public TextLine(Fields sourceFields,
                Fields sinkFields,
                TextLine.Compress sinkCompression)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
sinkCompression - of type Compress

TextLine

@ConstructorProperties(value={"sourceFields","sinkFields","sinkCompression","numSinkParts"})
public TextLine(Fields sourceFields,
                Fields sinkFields,
                TextLine.Compress sinkCompression,
                int numSinkParts)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme
sinkFields - the sink fields for this scheme
sinkCompression - of type Compress
numSinkParts - of type int

TextLine

@ConstructorProperties(value="sourceFields")
public TextLine(Fields sourceFields)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples.

Parameters:
sourceFields - the source fields for this scheme

TextLine

@ConstructorProperties(value={"sourceFields","numSinkParts"})
public TextLine(Fields sourceFields,
                int numSinkParts)
Creates a new TextLine instance. If sourceFields has one field, only the text line will be returned in the subsequent tuples. The resulting data set will be written in numSinkParts parts.

Parameters:
sourceFields - the source fields for this scheme
numSinkParts - of type int

Method Detail

getSinkCompression

public TextLine.Compress getSinkCompression()
Method getSinkCompression returns the sinkCompression of this TextLine object.

Returns:
the sinkCompression (type Compress) of this TextLine object.

setSinkCompression

public void setSinkCompression(TextLine.Compress sinkCompression)
Method setSinkCompression sets the sinkCompression of this TextLine object. If null, compression will remain disabled.

Parameters:
sinkCompression - the sinkCompression of this TextLine object.

sourceInit

public void sourceInit(Tap tap,
                       JobConf conf)
Description copied from class: Scheme
Method sourceInit initializes this instance as a source.

Specified by:
sourceInit in class Scheme
Parameters:
tap - of type Tap
conf - of type JobConf

sinkInit

public void sinkInit(Tap tap,
                     JobConf conf)
              throws IOException
Description copied from class: Scheme
Method sinkInit initializes this instance as a sink.

Specified by:
sinkInit in class Scheme
Parameters:
tap - of type Tap
conf - of type JobConf
Throws:
IOException - on initialization failure

source

public Tuple source(Object key,
                    Object value)
Description copied from class: Scheme
Method source takes the given Hadoop key and value and returns a new Tuple instance.

Specified by:
source in class Scheme
Parameters:
key - of type WritableComparable
value - of type Writable
Returns:
Tuple

sink

public void sink(TupleEntry tupleEntry,
                 OutputCollector outputCollector)
          throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple instance to the outputCollector.

Specified by:
sink in class Scheme
Parameters:
tupleEntry - of type TupleEntry
outputCollector - of type OutputCollector
Throws:
IOException


Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.