cascading.scheme.hadoop
Class SequenceFile

java.lang.Object
  extended by cascading.scheme.Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
      extended by cascading.scheme.hadoop.SequenceFile
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
WritableSequenceFile

public class SequenceFile
extends Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

A SequenceFile is a type of Scheme, which is a flat file consisting of binary key/value pairs. This is a space and time efficient means to store data.

See Also:
Serialized Form

Constructor Summary
protected SequenceFile()
          Protected for use by TempDfs and other subclasses.
  SequenceFile(Fields fields)
          Creates a new SequenceFile instance that stores the given field names.
 
Method Summary
 void sink(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SinkCall<Void,org.apache.hadoop.mapred.OutputCollector> sinkCall)
          Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
 void sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.mapred.JobConf conf)
          Method sinkInit initializes this instance as a sink.
 boolean source(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
          Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.
 void sourceCleanup(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
          Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
 void sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.mapred.JobConf conf)
          Method sourceInit initializes this instance as a source.
 void sourcePrepare(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
          Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
 
Methods inherited from class cascading.scheme.Scheme
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFields, presentSinkFieldsInternal, presentSourceFields, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, sinkCleanup, sinkPrepare, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SequenceFile

protected SequenceFile()
Protected for use by TempDfs and other subclasses. Not for general consumption.


SequenceFile

@ConstructorProperties(value="fields")
public SequenceFile(Fields fields)
Creates a new SequenceFile instance that stores the given field names.

Parameters:
fields -
Method Detail

sourceConfInit

public void sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                           Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap,
                           org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Scheme
Method sourceInit initializes this instance as a source.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources much be initialized before use. And Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.

Specified by:
sourceConfInit in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

sinkConfInit

public void sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                         Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap,
                         org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Scheme
Method sinkInit initializes this instance as a sink.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources much be initialized before use. And Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.

Specified by:
sinkConfInit in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

sourcePrepare

public void sourcePrepare(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                          SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).

Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.

Overrides:
sourcePrepare in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall

source

public boolean source(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                      SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
               throws IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.

It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.

Note this is only time it is safe to modify a Tuple instance handed over via a method call.

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.

Specified by:
source in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of SourceCall
Returns:
returns true when a Tuple was successfully read
Throws:
IOException

sourceCleanup

public void sourceCleanup(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                          SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).

Overrides:
sourceCleanup in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
Parameters:
flowProcess - of Process
sourceCall - of type SourceCall

sink

public void sink(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                 SinkCall<Void,org.apache.hadoop.mapred.OutputCollector> sinkCall)
          throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.

Specified by:
sink in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
Parameters:
flowProcess - of Process
sinkCall - of SinkCall
Throws:
IOException


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.