SequenceFile (Cascading 2.2.1)

cascading.scheme.hadoop
Class SequenceFile

java.lang.Object
  cascading.scheme.Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>
      cascading.scheme.hadoop.SequenceFile

All Implemented Interfaces: Serializable

Direct Known Subclasses: WritableSequenceFile

public class SequenceFile
extends Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

A SequenceFile is a type of Scheme that reads and writes a flat file consisting of binary key/value pairs. It is a space- and time-efficient means of storing data.

See Also: Serialized Form

Constructor Summary
`protected`	`SequenceFile()` Protected for use by TempDfs and other subclasses.
	`SequenceFile(Fields fields)` Creates a new SequenceFile instance that stores the given field names.

Method Summary
`void`	`sink(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SinkCall<Void,org.apache.hadoop.mapred.OutputCollector> sinkCall)` Method sink writes out the given `Tuple` found on `SinkCall.getOutgoingEntry()` to the `SinkCall.getOutput()`.
`void`	`sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.mapred.JobConf conf)` Method sinkConfInit initializes this instance as a sink.
`boolean`	`source(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)` Method source will read a new "record" or value from `SourceCall.getInput()` and populate the available `Tuple` via `SourceCall.getIncomingEntry()` and return `true` on success or `false` if no more values are available.
`void`	`sourceCleanup(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)` Method sourceCleanup is used to destroy resources created by `Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall)`.
`void`	`sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap, org.apache.hadoop.mapred.JobConf conf)` Method sourceConfInit initializes this instance as a source.
`void`	`sourcePrepare(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)` Method sourcePrepare is used to initialize resources needed during each call of `Scheme.source(cascading.flow.FlowProcess, SourceCall)`.

Methods inherited from class cascading.scheme.Scheme
`equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, presentSinkFields, presentSinkFieldsInternal, presentSourceFields, presentSourceFieldsInternal, retrieveSinkFields, retrieveSourceFields, setNumSinkParts, setSinkFields, setSourceFields, sinkCleanup, sinkPrepare, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

SequenceFile

protected SequenceFile()

Protected for use by TempDfs and other subclasses. Not for general consumption.

SequenceFile

@ConstructorProperties(value="fields")
public SequenceFile(Fields fields)

Creates a new SequenceFile instance that stores the given field names.

Parameters: fields - the field names to store

Method Detail

sourceConfInit

public void sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                           Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap,
                           org.apache.hadoop.mapred.JobConf conf)

Description copied from class: Scheme

Method sourceConfInit initializes this instance as a source.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.

Specified by: sourceConfInit in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

Parameters: flowProcess - of type FlowProcess; tap - of type Tap; conf - of type JobConf

sinkConfInit

public void sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                         Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector> tap,
                         org.apache.hadoop.mapred.JobConf conf)

Description copied from class: Scheme

Method sinkConfInit initializes this instance as a sink.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.

Specified by: sinkConfInit in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

Parameters: flowProcess - of type FlowProcess; tap - of type Tap; conf - of type JobConf

sourcePrepare

public void sourcePrepare(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                          SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)

Description copied from class: Scheme

Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).

Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.

Overrides: sourcePrepare in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

Parameters: flowProcess - of type FlowProcess; sourceCall - of type SourceCall
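
The advice above about keeping initialized objects in the call's context can be illustrated with a minimal sketch. It does not use any Cascading classes: `CallSketch` and `PrepareSketch` are hypothetical stand-ins, and the two-element `Object[]` context is an assumption chosen only to echo the `Object[]` context type in this class's signature.

```java
// Hypothetical stand-in for a source call: it only carries per-call context.
class CallSketch<C> {
    private C context;

    C getContext() {
        return context;
    }

    void setContext(C context) {
        this.context = context;
    }
}

class PrepareSketch {
    // Analogue of sourcePrepare: allocate per-call working state and store it
    // in the call's context instead of in a field shared by all calls.
    static void prepare(CallSketch<Object[]> call) {
        // Assumption for this sketch: one holder for a key, one for a value.
        call.setContext(new Object[] { new StringBuilder(), new StringBuilder() });
    }

    // Analogue of sourceCleanup: release whatever prepare created.
    static void cleanup(CallSketch<Object[]> call) {
        call.setContext(null);
    }

    public static void main(String[] args) {
        CallSketch<Object[]> first = new CallSketch<>();
        CallSketch<Object[]> second = new CallSketch<>();
        prepare(first);
        prepare(second);
        // Each call owns a distinct context, so calls do not share mutable state.
        System.out.println(first.getContext() != second.getContext());
        cleanup(first);
        System.out.println(first.getContext() == null);
    }
}
```

Because the working state lives on the call rather than on the scheme instance, the same scheme object can serve several calls without them interfering with each other.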

source

public boolean source(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                      SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)
               throws IOException

Description copied from class: Scheme

Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values are available.

It's OK to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.

Note that this is the only time it is safe to modify a Tuple instance handed over via a method call.

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.

Specified by: source in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

Parameters: flowProcess - of type FlowProcess; sourceCall - of type SourceCall
Returns: true when a Tuple was successfully read
Throws: IOException
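
The true/false contract described for source can be sketched in plain Java. This is an illustration only, not the Cascading implementation: `SourceLoopSketch` is a hypothetical name, an `Iterator<String[]>` stands in for the input, and a reusable `List<Object>` stands in for the Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SourceLoopSketch {
    // Analogue of source: copy the next record into the reusable tuple and
    // return true, or return false when the input is exhausted.
    static boolean source(Iterator<String[]> input, List<Object> tuple) {
        if (!input.hasNext()) {
            return false;
        }
        tuple.clear(); // re-use the existing instance instead of allocating
        tuple.addAll(Arrays.asList(input.next()));
        return true;
    }

    // Analogue of the caller: keep calling source until it reports false.
    static List<String> drain(Iterator<String[]> input) {
        List<Object> tuple = new ArrayList<>();
        List<String> seen = new ArrayList<>();
        while (source(input, tuple)) {
            seen.add(tuple.toString()); // snapshot, because the tuple is reused
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] { "k1", "v1" },
            new String[] { "k2", "v2" });
        System.out.println(drain(records.iterator())); // prints [[k1, v1], [k2, v2]]
    }
}
```

The caller takes a snapshot of each tuple before the next call, since the same instance is overwritten on every successful read.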

sourceCleanup

public void sourceCleanup(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                          SourceCall<Object[],org.apache.hadoop.mapred.RecordReader> sourceCall)

Description copied from class: Scheme

Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).

Overrides: sourceCleanup in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

Parameters: flowProcess - of type FlowProcess; sourceCall - of type SourceCall

sink

public void sink(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                 SinkCall<Void,org.apache.hadoop.mapred.OutputCollector> sinkCall)
          throws IOException

Description copied from class: Scheme

Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().

Specified by: sink in class Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,Object[],Void>

Parameters: flowProcess - of type FlowProcess; sinkCall - of type SinkCall
Throws: IOException
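
The sink step, which hands the outgoing Tuple to the output, might be sketched as follows. None of this is Cascading or Hadoop code: `CollectorSketch` is a hypothetical interface loosely modeled on a key/value collector, and writing the whole tuple as the value under a null key is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an output that accepts key/value pairs.
interface CollectorSketch<K, V> {
    void collect(K key, V value);
}

class SinkSketch {
    // Analogue of sink: pass the outgoing tuple to the output.
    // Assumption: the tuple is written as the value and the key is unused.
    static void sink(List<Object> outgoingTuple, CollectorSketch<Object, List<Object>> output) {
        // Copy defensively, because the caller may reuse the tuple instance.
        output.collect(null, new ArrayList<>(outgoingTuple));
    }

    public static void main(String[] args) {
        List<List<Object>> written = new ArrayList<>();
        CollectorSketch<Object, List<Object>> output = (key, value) -> written.add(value);
        sink(Arrays.<Object>asList("k1", 1), output);
        sink(Arrays.<Object>asList("k2", 2), output);
        System.out.println(written); // prints [[k1, 1], [k2, 2]]
    }
}
```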
