cascading.scheme
Class Scheme<Config,Input,Output,SourceContext,SinkContext>

java.lang.Object
  extended by cascading.scheme.Scheme<Config,Input,Output,SourceContext,SinkContext>
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
BasePartitionTap.PartitionScheme, BaseTemplateTap.TemplateScheme, NullScheme

public abstract class Scheme<Config,Input,Output,SourceContext,SinkContext>
extends Object
implements Serializable

A Scheme defines what is stored in a Tap instance by declaring the Tuple field names, and alternately parsing or rendering the incoming or outgoing Tuple stream, respectively.

A Scheme defines the type of resource data will be sourced from or sinked to.

The default sourceFields are Fields.UNKNOWN and the default sinkFields are Fields.ALL.

Any given sourceFields only label the values in the Tuples as they are sourced. It does not necessarily filter the output since a given implementation may choose to collapse values and ignore keys depending on the format.

If the sinkFields are Fields.ALL, the Cascading planner will attempt to resolve the actual field names and make them available via the SinkCall.getOutgoingEntry() method. Sometimes this may not be possible (in the case the Tap.openForWrite(cascading.flow.FlowProcess) method is called from user code directly (without planner intervention).

If the sinkFields are a valid selector, the sink(cascading.flow.FlowProcess, SinkCall) method will only see the fields expected.

Setting the numSinkParts value to 1 (one) attempts to ensure the output resource has only one part. In the case of MapReduce, this is only a suggestion for the Map side, on the Reduce side it does this by setting the number of reducers to the given value. This may affect performance, so be cautioned.

Note that setting numSinkParts does not force the planner to insert a final Reduce operation in the job, so numSinkParts may be ignored entirely if the final job is Map only. To force the Flow to have a final Reduce, add a GroupBy to the assembly before sinking.

See Also:
Serialized Form

Constructor Summary
protected Scheme()
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, Fields sinkFields)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, Fields sinkFields, int numSinkParts)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, int numSinkParts)
          Constructor Scheme creates a new Scheme instance.
 
Method Summary
 boolean equals(Object object)
           
 int getNumSinkParts()
          Method getNumSinkParts returns the numSinkParts of this Scheme object.
 Fields getSinkFields()
          Method getSinkFields returns the sinkFields of this Scheme object.
 Fields getSourceFields()
          Method getSourceFields returns the sourceFields of this Scheme object.
 String getTrace()
          Method getTrace returns a String that pinpoint where this instance was created for debugging.
 int hashCode()
           
 boolean isSink()
          Method isSink returns true if this Scheme instance can be used as a sink.
 boolean isSource()
          Method isSource returns true if this Scheme instance can be used as a source.
 boolean isSymmetrical()
          Method isSymmetrical returns true if the sink fields equal the source fields.
 void presentSinkFields(FlowProcess<Config> flowProcess, Tap tap, Fields fields)
          Method presentSinkFields is called after the planner is invoked and all fields are resolved.
protected  void presentSinkFieldsInternal(Fields fields)
           
 void presentSourceFields(FlowProcess<Config> flowProcess, Tap tap, Fields fields)
          Method presentSourceFields is called after the planner is invoked and all fields are resolved.
protected  void presentSourceFieldsInternal(Fields fields)
           
 Fields retrieveSinkFields(FlowProcess<Config> flowProcess, Tap tap)
          Method retrieveSinkFields notifies a Scheme when it is appropriate to dynamically update the fields it sources.
 Fields retrieveSourceFields(FlowProcess<Config> flowProcess, Tap tap)
          Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources.
 void setNumSinkParts(int numSinkParts)
          Method setNumSinkParts sets the numSinkParts of this Scheme object.
 void setSinkFields(Fields sinkFields)
          Method setSinkFields sets the sinkFields of this Scheme object.
 void setSourceFields(Fields sourceFields)
          Method setSourceFields sets the sourceFields of this Scheme object.
abstract  void sink(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall)
          Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
 void sinkCleanup(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall)
          Method sinkCleanup is used to destroy resources created by sinkPrepare(cascading.flow.FlowProcess, SinkCall).
abstract  void sinkConfInit(FlowProcess<Config> flowProcess, Tap<Config,Input,Output> tap, Config conf)
          Method sinkInit initializes this instance as a sink.
 void sinkPrepare(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall)
          Method sinkPrepare is used to initialize resources needed during each call of sink(cascading.flow.FlowProcess, SinkCall).
abstract  boolean source(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall)
          Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.
 void sourceCleanup(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall)
          Method sourceCleanup is used to destroy resources created by sourcePrepare(cascading.flow.FlowProcess, SourceCall).
abstract  void sourceConfInit(FlowProcess<Config> flowProcess, Tap<Config,Input,Output> tap, Config conf)
          Method sourceInit initializes this instance as a source.
 void sourcePrepare(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall)
          Method sourcePrepare is used to initialize resources needed during each call of source(cascading.flow.FlowProcess, SourceCall).
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Scheme

protected Scheme()
Constructor Scheme creates a new Scheme instance.


Scheme

protected Scheme(Fields sourceFields)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields

Scheme

protected Scheme(Fields sourceFields,
                 int numSinkParts)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
numSinkParts - of type int

Scheme

protected Scheme(Fields sourceFields,
                 Fields sinkFields)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields

Scheme

protected Scheme(Fields sourceFields,
                 Fields sinkFields,
                 int numSinkParts)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
numSinkParts - of type int
Method Detail

getSinkFields

public Fields getSinkFields()
Method getSinkFields returns the sinkFields of this Scheme object.

Returns:
the sinkFields (type Fields) of this Scheme object.

setSinkFields

public void setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.

Parameters:
sinkFields - the sinkFields of this Scheme object.

getSourceFields

public Fields getSourceFields()
Method getSourceFields returns the sourceFields of this Scheme object.

Returns:
the sourceFields (type Fields) of this Scheme object.

setSourceFields

public void setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.

Parameters:
sourceFields - the sourceFields of this Scheme object.

getNumSinkParts

public int getNumSinkParts()
Method getNumSinkParts returns the numSinkParts of this Scheme object.

Returns:
the numSinkParts (type int) of this Scheme object.

setNumSinkParts

public void setNumSinkParts(int numSinkParts)
Method setNumSinkParts sets the numSinkParts of this Scheme object.

Parameters:
numSinkParts - the numSinkParts of this Scheme object.

getTrace

public String getTrace()
Method getTrace returns a String that pinpoint where this instance was created for debugging.

Returns:
String

isSymmetrical

public boolean isSymmetrical()
Method isSymmetrical returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.

Returns:
the symmetrical (type boolean) of this Scheme object.

isSource

public boolean isSource()
Method isSource returns true if this Scheme instance can be used as a source.

Returns:
boolean

isSink

public boolean isSink()
Method isSink returns true if this Scheme instance can be used as a sink.

Returns:
boolean

retrieveSourceFields

public Fields retrieveSourceFields(FlowProcess<Config> flowProcess,
                                   Tap tap)
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources. By default the current declared fields are returned.

The FlowProcess presents all known properties resolved by the current planner.

The tap instance is the parent Tap for this Scheme instance.

Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
Returns:
Fields

presentSourceFields

public void presentSourceFields(FlowProcess<Config> flowProcess,
                                Tap tap,
                                Fields fields)
Method presentSourceFields is called after the planner is invoked and all fields are resolved. This method presents to the Scheme the actual source fields after any planner intervention.

This method is called after retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).

Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields

presentSourceFieldsInternal

protected void presentSourceFieldsInternal(Fields fields)

retrieveSinkFields

public Fields retrieveSinkFields(FlowProcess<Config> flowProcess,
                                 Tap tap)
Method retrieveSinkFields notifies a Scheme when it is appropriate to dynamically update the fields it sources. By default the current declared fields are returned.

The FlowProcess presents all known properties resolved by the current planner.

The tap instance is the parent Tap for this Scheme instance.

Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
Returns:
Fields

presentSinkFields

public void presentSinkFields(FlowProcess<Config> flowProcess,
                              Tap tap,
                              Fields fields)
Method presentSinkFields is called after the planner is invoked and all fields are resolved. This method presents to the Scheme the actual source fields after any planner intervention.

This method is called after retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).

Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields

presentSinkFieldsInternal

protected void presentSinkFieldsInternal(Fields fields)

sourceConfInit

public abstract void sourceConfInit(FlowProcess<Config> flowProcess,
                                    Tap<Config,Input,Output> tap,
                                    Config conf)
Method sourceInit initializes this instance as a source.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources much be initialized before use. And sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.

Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

sinkConfInit

public abstract void sinkConfInit(FlowProcess<Config> flowProcess,
                                  Tap<Config,Input,Output> tap,
                                  Config conf)
Method sinkInit initializes this instance as a sink.

This method is executed client side as a means to provide necessary configuration parameters used by the underlying platform.

It is not intended to initialize resources that would be necessary during the execution of this class, like a "formatter" or "parser".

See sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources much be initialized before use. And sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.

Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

sourcePrepare

public void sourcePrepare(FlowProcess<Config> flowProcess,
                          SourceCall<SourceContext,Input> sourceCall)
                   throws IOException
Method sourcePrepare is used to initialize resources needed during each call of source(cascading.flow.FlowProcess, SourceCall).

Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.

Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException

source

public abstract boolean source(FlowProcess<Config> flowProcess,
                               SourceCall<SourceContext,Input> sourceCall)
                        throws IOException
Method source will read a new "record" or value from SourceCall.getInput() and populate the available Tuple via SourceCall.getIncomingEntry() and return true on success or false if no more values available.

It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.

Note this is only time it is safe to modify a Tuple instance handed over via a method call.

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.

Parameters:
flowProcess - of type FlowProcess
sourceCall - of SourceCall
Returns:
returns true when a Tuple was successfully read
Throws:
IOException

sourceCleanup

public void sourceCleanup(FlowProcess<Config> flowProcess,
                          SourceCall<SourceContext,Input> sourceCall)
                   throws IOException
Method sourceCleanup is used to destroy resources created by sourcePrepare(cascading.flow.FlowProcess, SourceCall).

Parameters:
flowProcess - of Process
sourceCall - of type SourceCall
Throws:
IOException

sinkPrepare

public void sinkPrepare(FlowProcess<Config> flowProcess,
                        SinkCall<SinkContext,Output> sinkCall)
                 throws IOException
Method sinkPrepare is used to initialize resources needed during each call of sink(cascading.flow.FlowProcess, SinkCall).

Be sure to place any initialized objects in the SinkContext so each instance will remain threadsafe.

Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException

sink

public abstract void sink(FlowProcess<Config> flowProcess,
                          SinkCall<SinkContext,Output> sinkCall)
                   throws IOException
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().

This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.

Parameters:
flowProcess - of Process
sinkCall - of SinkCall
Throws:
IOException

sinkCleanup

public void sinkCleanup(FlowProcess<Config> flowProcess,
                        SinkCall<SinkContext,Output> sinkCall)
                 throws IOException
Method sinkCleanup is used to destroy resources created by sinkPrepare(cascading.flow.FlowProcess, SinkCall).

Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException

equals

public boolean equals(Object object)
Overrides:
equals in class Object

toString

public String toString()
Overrides:
toString in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.