cascading.scheme.hadoop
Class WritableSequenceFile

java.lang.Object
  extended by cascading.scheme.Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
      extended by cascading.scheme.hadoop.SequenceFile
          extended by cascading.scheme.hadoop.WritableSequenceFile
All Implemented Interfaces:
Serializable

public class WritableSequenceFile
extends SequenceFile

Class WritableSequenceFile is a sub-class of SequenceFile that reads and writes keys and values of the given Writable types, instead of the Tuple instances used by default in SequenceFile.

This Class is a convenience for those who need to read/write specific Writable types from existing sequence files without them being wrapped in a Tuple instance.

Note that due to the nature of sequence files, only one type can be stored in each of the key and value positions, though the two positions may hold different types (for example, LongWritable and Text).

If keyType is null, valueType must not be null, and vice versa; use the single-type form when you only wish to store a single value.

NullWritable is used as the empty type for either a null keyType or valueType.
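
For illustration only, a minimal sketch of wiring this scheme into an Hfs tap; the field names and path below are hypothetical, and the key/value classes simply match the types stored in the sequence file.

 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import cascading.scheme.hadoop.WritableSequenceFile;
 import cascading.tap.Tap;
 import cascading.tap.hadoop.Hfs;
 import cascading.tuple.Fields;

 public class ReadExample
   {
   public static void main( String[] args )
     {
     // two fields: the first maps to the sequence file key, the second to its value
     WritableSequenceFile scheme =
       new WritableSequenceFile( new Fields( "offset", "line" ), LongWritable.class, Text.class );

     // hypothetical input path holding <LongWritable, Text> pairs
     Tap source = new Hfs( scheme, "hdfs:/path/to/input" );
     }
   }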

See Also:
Serialized Form

Field Summary
protected  Class<? extends org.apache.hadoop.io.Writable> keyType
           
protected  Class<? extends org.apache.hadoop.io.Writable> valueType
           
 
Constructor Summary
WritableSequenceFile(Fields fields, Class<? extends org.apache.hadoop.io.Writable> valueType)
          Constructor WritableSequenceFile creates a new WritableSequenceFile instance.
WritableSequenceFile(Fields fields, Class<? extends org.apache.hadoop.io.Writable> keyType, Class<? extends org.apache.hadoop.io.Writable> valueType)
          Constructor WritableSequenceFile creates a new WritableSequenceFile instance.
 
Method Summary
 boolean equals(Object object)
           
 int hashCode()
           
 void sink(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SinkCall<Void,org.apache.hadoop.mapred.OutputCollector> sinkCall)
          Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to SinkCall.getOutput().
 void sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.mapred.JobConf conf)
          Method sinkConfInit initializes this instance as a sink.
 boolean source(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
          Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available.
 
Methods inherited from class cascading.scheme.hadoop.SequenceFile
sourceCleanup, sourceConfInit, sourcePrepare
 
Methods inherited from class cascading.scheme.Scheme
getNumSinkParts, getSinkFields, getSourceFields, getTrace, isSink, isSource, isSymmetrical, presentSinkFields, presentSinkFieldsInternal, presentSourceFields, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, sinkCleanup, sinkPrepare, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

keyType

protected final Class<? extends org.apache.hadoop.io.Writable> keyType

valueType

protected final Class<? extends org.apache.hadoop.io.Writable> valueType
Constructor Detail

WritableSequenceFile

@ConstructorProperties(value={"fields","valueType"})
public WritableSequenceFile(Fields fields,
                            Class<? extends org.apache.hadoop.io.Writable> valueType)
Constructor WritableSequenceFile creates a new WritableSequenceFile instance.

Parameters:
fields - of type Fields
valueType - of type Class, may not be null

WritableSequenceFile

@ConstructorProperties(value={"fields","keyType","valueType"})
public WritableSequenceFile(Fields fields,
                            Class<? extends org.apache.hadoop.io.Writable> keyType,
                            Class<? extends org.apache.hadoop.io.Writable> valueType)
Constructor WritableSequenceFile creates a new WritableSequenceFile instance.

Parameters:
fields - of type Fields
keyType - of type Class
valueType - of type Class
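
As a hedged sketch of the WritableSequenceFile(Fields, valueType) form in use (the output path is hypothetical): only a value field is declared, so the sequence file key falls back to NullWritable as described in the class comment.

 import org.apache.hadoop.io.Text;
 import cascading.scheme.hadoop.WritableSequenceFile;
 import cascading.tap.SinkMode;
 import cascading.tap.Tap;
 import cascading.tap.hadoop.Hfs;
 import cascading.tuple.Fields;

 public class WriteExample
   {
   public static void main( String[] args )
     {
     // one field: written as the sequence file value; the key position is NullWritable
     WritableSequenceFile scheme = new WritableSequenceFile( new Fields( "line" ), Text.class );

     // hypothetical output path
     Tap sink = new Hfs( scheme, "hdfs:/path/to/output", SinkMode.REPLACE );
     }
   }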
Method Detail

sinkConfInit

public void sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                         Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap,
                         org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Scheme
Method sinkConfInit initializes this instance as a sink.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.

Overrides:
sinkConfInit in class SequenceFile
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config
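
For orientation only, a rough sketch of the kind of client-side configuration such an override is responsible for; this is an assumption about the behavior and a hypothetical helper, not the actual implementation.

 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.SequenceFileOutputFormat;

 class SinkConfSketch
   {
   // hypothetical helper: record the Writable key/value classes and the
   // sequence file output format on the JobConf, falling back to NullWritable
   // when one of the types was not given
   static void configure( JobConf conf,
                          Class<? extends Writable> keyType,
                          Class<? extends Writable> valueType )
     {
     conf.setOutputKeyClass( keyType != null ? keyType : NullWritable.class );
     conf.setOutputValueClass( valueType != null ? valueType : NullWritable.class );
     conf.setOutputFormat( SequenceFileOutputFormat.class );
     }
   }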

source

public boolean source(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                      SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
               throws IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available.

It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.

Note this is the only time it is safe to modify a Tuple instance handed over via a method call.

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.

Overrides:
source in class SequenceFile
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Returns:
true when a Tuple was successfully read
Throws:
IOException
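
A rough sketch of the flow the description above implies, expressed as a hypothetical helper rather than the actual code: read one raw key/value pair from the RecordReader and hand it to the incoming TupleEntry.

 import java.io.IOException;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapred.RecordReader;
 import cascading.scheme.SourceCall;
 import cascading.tuple.Tuple;

 class SourceSketch
   {
   // hypothetical helper illustrating one read
   @SuppressWarnings("unchecked")
   static boolean readOne( SourceCall<Object[], RecordReader> sourceCall ) throws IOException
     {
     RecordReader reader = sourceCall.getInput();

     Writable key = (Writable) reader.createKey();
     Writable value = (Writable) reader.createValue();

     if( !reader.next( key, value ) )
       return false; // no more records in this split

     // both positions populated here; a value-only scheme would emit a one-element Tuple
     sourceCall.getIncomingEntry().setTuple( new Tuple( key, value ) );

     return true;
     }
   }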

sink

public void sink(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                 SinkCall<Void,org.apache.hadoop.mapred.OutputCollector> sinkCall)
          throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to SinkCall.getOutput().

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.

Overrides:
sink in class SequenceFile
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException
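
Correspondingly, a rough sketch of what the description above implies for this scheme, again as a hypothetical helper and not the actual code: pull the key and value positions off the outgoing Tuple and pass them to the OutputCollector.

 import java.io.IOException;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.mapred.OutputCollector;
 import cascading.scheme.SinkCall;
 import cascading.tuple.Tuple;

 class SinkSketch
   {
   // hypothetical helper illustrating one write; hasKey mirrors whether a keyType was given
   @SuppressWarnings("unchecked")
   static void writeOne( SinkCall<Void, OutputCollector> sinkCall, boolean hasKey ) throws IOException
     {
     Tuple tuple = sinkCall.getOutgoingEntry().getTuple();

     Object key = hasKey ? tuple.getObject( 0 ) : NullWritable.get();
     Object value = hasKey ? tuple.getObject( 1 ) : tuple.getObject( 0 );

     sinkCall.getOutput().collect( key, value );
     }
   }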

equals

public boolean equals(Object object)
Overrides:
equals in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

hashCode

public int hashCode()
Overrides:
hashCode in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.