cascading.scheme
Class Scheme

java.lang.Object
  extended by cascading.scheme.Scheme
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
SequenceFile, TemplateTap.TemplateScheme, TextLine

public abstract class Scheme
extends Object
implements Serializable

A Scheme defines what is stored in a Tap instance by declaring the Tuple field names, and by either parsing the incoming Tuple stream or rendering the outgoing one.

A Scheme also defines the type of resource that data will be sourced from or sunk to.

The given sourceFields only label the values in the Tuples as they are sourced. They do not necessarily filter the output, since a given implementation may choose to collapse values and ignore keys depending on the format.
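
The labeling behavior described above can be illustrated with a small, self-contained sketch. It does not use the Cascading API: FieldLabelSketch, label, and collapse are hypothetical names, and plain Java collections stand in for Fields and Tuple. It assumes a format that ignores the key and emits only the value, which is one possible choice an implementation might make.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the Cascading API: models positional labeling only.
final class FieldLabelSketch {

    // Pairs each declared field name with the value at the same position.
    // The names label the values; they do not select or filter them.
    static Map<String, Object> label(List<String> sourceFields, List<Object> values) {
        if (sourceFields.size() != values.size()) {
            throw new IllegalArgumentException(
                "expected " + sourceFields.size() + " values, got " + values.size());
        }
        Map<String, Object> labeled = new LinkedHashMap<>();
        for (int i = 0; i < sourceFields.size(); i++) {
            labeled.put(sourceFields.get(i), values.get(i));
        }
        return labeled;
    }

    // Assumed format: the key is ignored and only the value is kept.
    static List<Object> collapse(Object key, Object value) {
        return Collections.<Object>singletonList(value);
    }
}
```

For example, label(Arrays.asList("line"), collapse(offset, text)) yields a single entry named "line" holding the text, regardless of what the key contained.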

Setting the numSinkParts value to 1 (one) ensures the output resource has only one part. In the case of MapReduce, it does this by setting the number of reducers to the given value. This may affect performance, so use it with caution.

Note that setting numSinkParts does not force the planner to insert a final Reduce operation in the job, so numSinkParts may be ignored entirely if the final job is Map only. To force the Flow to have a final Reduce, add a GroupBy to the assembly before sinking.
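
The interaction between numSinkParts and the shape of the final job can be modeled as below. This is a minimal sketch under stated assumptions, not Cascading or Hadoop code: SinkPartsSketch and outputParts are hypothetical names, the sketch assumes that a value of 0 means numSinkParts was never set, and it relies on the usual Hadoop behavior that a Map-only job writes one part per map task.

```java
// Hypothetical model of the behavior described above; not Cascading code.
final class SinkPartsSketch {

    // Returns the number of output parts the final job would produce.
    static int outputParts(boolean hasFinalReduce, int numSinkParts,
                           int mapTasks, int defaultReducers) {
        if (!hasFinalReduce) {
            // Map-only job: there are no reducers to configure, so
            // numSinkParts has no effect and each map task writes a part.
            return mapTasks;
        }
        // Assumption: 0 means "not set", so the default reducer count applies.
        return numSinkParts > 0 ? numSinkParts : defaultReducers;
    }
}
```

With eight map tasks, outputParts(false, 1, 8, 4) returns 8 while outputParts(true, 1, 8, 4) returns 1, which is why adding a GroupBy before the sink makes the setting take effect.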

See Also:
Serialized Form

Constructor Summary
protected Scheme()
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, Fields sinkFields)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, Fields sinkFields, int numSinkParts)
          Constructor Scheme creates a new Scheme instance.
protected Scheme(Fields sourceFields, int numSinkParts)
          Constructor Scheme creates a new Scheme instance.
 
Method Summary
 boolean equals(Object object)
           
 int getNumSinkParts()
          Method getNumSinkParts returns the numSinkParts of this Scheme object.
 Fields getSinkFields()
          Method getSinkFields returns the sinkFields of this Scheme object.
 Fields getSourceFields()
          Method getSourceFields returns the sourceFields of this Scheme object.
 String getTrace()
          Method getTrace returns a String that pinpoints where this instance was created for debugging.
 int hashCode()
           
 boolean isSink()
          Method isSink returns true if this Scheme instance can be used as a sink.
 boolean isSource()
          Method isSource returns true if this Scheme instance can be used as a source.
 boolean isSymmetrical()
          Method isSymmetrical returns true if the sink fields equal the source fields.
 boolean isWriteDirect()
          Method isWriteDirect returns true if the parent Tap instance's TupleEntryCollector should be used to sink values.
 void setNumSinkParts(int numSinkParts)
          Method setNumSinkParts sets the numSinkParts of this Scheme object.
 void setSinkFields(Fields sinkFields)
          Method setSinkFields sets the sinkFields of this Scheme object.
 void setSourceFields(Fields sourceFields)
          Method setSourceFields sets the sourceFields of this Scheme object.
abstract  void sink(TupleEntry tupleEntry, OutputCollector outputCollector)
          Method sink writes out the given Tuple instance to the outputCollector.
abstract  void sinkInit(Tap tap, JobConf conf)
          Method sinkInit initializes this instance as a sink.
abstract  Tuple source(Object key, Object value)
          Method source takes the given Hadoop key and value and returns a new Tuple instance.
abstract  void sourceInit(Tap tap, JobConf conf)
          Method sourceInit initializes this instance as a source.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Scheme

protected Scheme()
Constructor Scheme creates a new Scheme instance.


Scheme

protected Scheme(Fields sourceFields)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields

Scheme

protected Scheme(Fields sourceFields,
                 int numSinkParts)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
numSinkParts - of type int

Scheme

protected Scheme(Fields sourceFields,
                 Fields sinkFields)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields

Scheme

protected Scheme(Fields sourceFields,
                 Fields sinkFields,
                 int numSinkParts)
Constructor Scheme creates a new Scheme instance.

Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
numSinkParts - of type int

Method Detail

getSinkFields

public Fields getSinkFields()
Method getSinkFields returns the sinkFields of this Scheme object.

Returns:
the sinkFields (type Fields) of this Scheme object.

setSinkFields

public void setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.

Parameters:
sinkFields - the sinkFields of this Scheme object.

getSourceFields

public Fields getSourceFields()
Method getSourceFields returns the sourceFields of this Scheme object.

Returns:
the sourceFields (type Fields) of this Scheme object.

setSourceFields

public void setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.

Parameters:
sourceFields - the sourceFields of this Scheme object.

getNumSinkParts

public int getNumSinkParts()
Method getNumSinkParts returns the numSinkParts of this Scheme object.

Returns:
the numSinkParts (type int) of this Scheme object.

setNumSinkParts

public void setNumSinkParts(int numSinkParts)
Method setNumSinkParts sets the numSinkParts of this Scheme object.

Parameters:
numSinkParts - the numSinkParts of this Scheme object.

getTrace

public String getTrace()
Method getTrace returns a String that pinpoints where this instance was created for debugging.

Returns:
String

isWriteDirect

public boolean isWriteDirect()
Method isWriteDirect returns true if the parent Tap instance's TupleEntryCollector should be used to sink values.

Returns:
the writeDirect (type boolean) of this Scheme object.

isSymmetrical

public boolean isSymmetrical()
Method isSymmetrical returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.

Returns:
the symmetrical (type boolean) of this Scheme object.

isSource

public boolean isSource()
Method isSource returns true if this Scheme instance can be used as a source.

Returns:
boolean

isSink

public boolean isSink()
Method isSink returns true if this Scheme instance can be used as a sink.

Returns:
boolean

sourceInit

public abstract void sourceInit(Tap tap,
                                JobConf conf)
                         throws IOException
Method sourceInit initializes this instance as a source.

Parameters:
tap - of type Tap
conf - of type JobConf
Throws:
IOException - on initialization failure

sinkInit

public abstract void sinkInit(Tap tap,
                              JobConf conf)
                       throws IOException
Method sinkInit initializes this instance as a sink.

Parameters:
tap - of type Tap
conf - of type JobConf
Throws:
IOException - on initialization failure

source

public abstract Tuple source(Object key,
                             Object value)
Method source takes the given Hadoop key and value and returns a new Tuple instance.

Parameters:
key - of type Object, typically a Hadoop WritableComparable
value - of type Object, typically a Hadoop Writable
Returns:
Tuple

sink

public abstract void sink(TupleEntry tupleEntry,
                          OutputCollector outputCollector)
                   throws IOException
Method sink writes out the given Tuple instance to the outputCollector.

Parameters:
tupleEntry - of type TupleEntry
outputCollector - of type OutputCollector
Throws:
IOException
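
The parse and render roles of source and sink can be sketched with a simplified tab-delimited format. The example is self-contained and does not use Cascading or Hadoop types: DelimitedSchemeSketch is a hypothetical name, a List<String> stands in for both Tuple and TupleEntry, and a second List<String> stands in for OutputCollector. A real subclass would implement the abstract methods with the signatures shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tab-delimited scheme; the types used here are
// simplified models, not Cascading's Tuple, TupleEntry, or OutputCollector.
final class DelimitedSchemeSketch {
    private static final String DELIMITER = "\t";

    // Models source(key, value): ignores the key and parses the value
    // into one tuple element per delimited column.
    static List<String> source(Object key, Object value) {
        // The -1 limit keeps trailing empty columns instead of dropping them.
        return new ArrayList<>(Arrays.asList(value.toString().split(DELIMITER, -1)));
    }

    // Models sink(tupleEntry, outputCollector): renders the tuple as a
    // single delimited line and passes it to the collector.
    static void sink(List<String> tuple, List<String> collector) {
        collector.add(String.join(DELIMITER, tuple));
    }
}
```

Because source and sink use the same delimiter and keep empty columns, a line that is sourced and then sunk is written back unchanged in this sketch.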

equals

public boolean equals(Object object)
Overrides:
equals in class Object

toString

public String toString()
Overrides:
toString in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.