public abstract class Scheme<Config,Input,Output,SourceContext,SinkContext> extends java.lang.Object implements java.io.Serializable, Traceable
Tap
instance by declaring the Tuple
field names, and alternately parsing or rendering the incoming or outgoing Tuple
stream, respectively.
A Scheme defines the type of resource data will be sourced from or sinked to.
The default sourceFields are Fields.UNKNOWN
and the default sinkFields are Fields.ALL
.
Any given sourceFields only label the values in the Tuple
s as they are sourced.
It does not necessarily filter the output since a given implementation may choose to
collapse values and ignore keys depending on the format.
If the sinkFields are Fields.ALL
, the Cascading planner will attempt to resolve the actual field names
and make them available via the SinkCall.getOutgoingEntry()
method. Sometimes this may
not be possible (in the case the Tap.openForWrite(cascading.flow.FlowProcess)
method is called from user
code directly (without planner intervention).
If the sinkFields are a valid selector, the sink(cascading.flow.FlowProcess, SinkCall)
method will
only see the fields expected.
Setting the numSinkParts
value to 1 (one) attempts to ensure the output resource has only one part.
In the case of MapReduce, this is only a suggestion for the Map side, on the Reduce side it does this by
setting the number of reducers to the given value. This may affect performance, so be cautioned.
Note that setting numSinkParts does not force the planner to insert a final Reduce operation in the job, so
numSinkParts may be ignored entirely if the final job is Map only. To force the Flow to have a final Reduce,
add a GroupBy
to the assembly before sinking.Modifier | Constructor and Description |
---|---|
protected |
Scheme()
Constructor Scheme creates a new Scheme instance.
|
protected |
Scheme(Fields sourceFields)
Constructor Scheme creates a new Scheme instance.
|
protected |
Scheme(Fields sourceFields,
Fields sinkFields)
Constructor Scheme creates a new Scheme instance.
|
protected |
Scheme(Fields sourceFields,
Fields sinkFields,
int numSinkParts)
Constructor Scheme creates a new Scheme instance.
|
protected |
Scheme(Fields sourceFields,
int numSinkParts)
Constructor Scheme creates a new Scheme instance.
|
Modifier and Type | Method and Description |
---|---|
boolean |
equals(java.lang.Object object) |
int |
getNumSinkParts()
Method getNumSinkParts returns the numSinkParts of this Scheme object.
|
Fields |
getSinkFields()
Method getSinkFields returns the sinkFields of this Scheme object.
|
Fields |
getSourceFields()
Method getSourceFields returns the sourceFields of this Scheme object.
|
java.lang.String |
getTrace()
Method getTrace returns a String that pinpoints the caller that created this instance.
|
int |
hashCode() |
boolean |
isSink()
Method isSink returns true if this Scheme instance can be used as a sink.
|
boolean |
isSource()
Method isSource returns true if this Scheme instance can be used as a source.
|
boolean |
isSymmetrical()
Method isSymmetrical returns
true if the sink fields equal the source fields. |
void |
presentSinkFields(FlowProcess<? extends Config> flowProcess,
Tap tap,
Fields fields)
Method presentSinkFields is called after the planner is invoked and all fields are resolved.
|
protected void |
presentSinkFieldsInternal(Fields fields) |
void |
presentSourceFields(FlowProcess<? extends Config> flowProcess,
Tap tap,
Fields fields)
Method presentSourceFields is called after the planner is invoked and all fields are resolved.
|
protected void |
presentSourceFieldsInternal(Fields fields) |
Fields |
retrieveSinkFields(FlowProcess<? extends Config> flowProcess,
Tap tap)
Method retrieveSinkFields notifies a Scheme when it is appropriate to dynamically
update the fields it sources.
|
Fields |
retrieveSourceFields(FlowProcess<? extends Config> flowProcess,
Tap tap)
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically
update the fields it sources.
|
void |
setNumSinkParts(int numSinkParts)
Method setNumSinkParts sets the numSinkParts of this Scheme object.
|
void |
setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.
|
void |
setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.
|
abstract void |
sink(FlowProcess<? extends Config> flowProcess,
SinkCall<SinkContext,Output> sinkCall)
Method sink writes out the given
Tuple found on SinkCall.getOutgoingEntry() to
the SinkCall.getOutput() . |
void |
sinkCleanup(FlowProcess<? extends Config> flowProcess,
SinkCall<SinkContext,Output> sinkCall)
Method sinkCleanup is used to destroy resources created by
sinkPrepare(cascading.flow.FlowProcess, SinkCall) . |
abstract void |
sinkConfInit(FlowProcess<? extends Config> flowProcess,
Tap<Config,Input,Output> tap,
Config conf)
Method sinkInit initializes this instance as a sink.
|
void |
sinkPrepare(FlowProcess<? extends Config> flowProcess,
SinkCall<SinkContext,Output> sinkCall)
Method sinkPrepare is used to initialize resources needed during each call of
sink(cascading.flow.FlowProcess, SinkCall) . |
abstract boolean |
source(FlowProcess<? extends Config> flowProcess,
SourceCall<SourceContext,Input> sourceCall)
Method source will read a new "record" or value from
SourceCall.getInput() and populate
the available Tuple via SourceCall.getIncomingEntry() and return true
on success or false if no more values available. |
void |
sourceCleanup(FlowProcess<? extends Config> flowProcess,
SourceCall<SourceContext,Input> sourceCall)
Method sourceCleanup is used to destroy resources created by
sourcePrepare(cascading.flow.FlowProcess, SourceCall) . |
abstract void |
sourceConfInit(FlowProcess<? extends Config> flowProcess,
Tap<Config,Input,Output> tap,
Config conf)
Method sourceInit initializes this instance as a source.
|
void |
sourcePrepare(FlowProcess<? extends Config> flowProcess,
SourceCall<SourceContext,Input> sourceCall)
Method sourcePrepare is used to initialize resources needed during each call of
source(cascading.flow.FlowProcess, SourceCall) . |
void |
sourceRePrepare(FlowProcess<? extends Config> flowProcess,
SourceCall<SourceContext,Input> sourceCall)
Method sourceRePrepare is used to re-initialize resources needed during each call of
source(cascading.flow.FlowProcess, SourceCall) after the Input object
has been changed, if needed. |
java.lang.String |
toString() |
protected Scheme()
protected Scheme(Fields sourceFields)
sourceFields
- of type Fieldsprotected Scheme(Fields sourceFields, int numSinkParts)
sourceFields
- of type FieldsnumSinkParts
- of type intprotected Scheme(Fields sourceFields, Fields sinkFields)
sourceFields
- of type FieldssinkFields
- of type Fieldspublic Fields getSinkFields()
public void setSinkFields(Fields sinkFields)
sinkFields
- the sinkFields of this Scheme object.public Fields getSourceFields()
public void setSourceFields(Fields sourceFields)
sourceFields
- the sourceFields of this Scheme object.public int getNumSinkParts()
public void setNumSinkParts(int numSinkParts)
numSinkParts
- the numSinkParts of this Scheme object.public java.lang.String getTrace()
Traceable
public boolean isSymmetrical()
true
if the sink fields equal the source fields. That is, this
scheme sources the same fields as it sinks.public boolean isSource()
public boolean isSink()
public Fields retrieveSourceFields(FlowProcess<? extends Config> flowProcess, Tap tap)
FlowProcess
presents all known properties resolved by the current planner.
The tap
instance is the parent Tap
for this Scheme instance.flowProcess
- of type FlowProcesstap
- of type Tappublic void presentSourceFields(FlowProcess<? extends Config> flowProcess, Tap tap, Fields fields)
retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap)
.flowProcess
- of type FlowProcesstap
- of type Tapfields
- of type Fieldsprotected void presentSourceFieldsInternal(Fields fields)
public Fields retrieveSinkFields(FlowProcess<? extends Config> flowProcess, Tap tap)
FlowProcess
presents all known properties resolved by the current planner.
The tap
instance is the parent Tap
for this Scheme instance.flowProcess
- of type FlowProcesstap
- of type Tappublic void presentSinkFields(FlowProcess<? extends Config> flowProcess, Tap tap, Fields fields)
retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap)
.flowProcess
- of type FlowProcesstap
- of type Tapfields
- of type Fieldsprotected void presentSinkFieldsInternal(Fields fields)
public abstract void sourceConfInit(FlowProcess<? extends Config> flowProcess, Tap<Config,Input,Output> tap, Config conf)
sourcePrepare(cascading.flow.FlowProcess, SourceCall)
if resources much be initialized
before use. And sourceCleanup(cascading.flow.FlowProcess, SourceCall)
if resources must be
destroyed after use.flowProcess
- of type FlowProcesstap
- of type Tapconf
- of type Configpublic abstract void sinkConfInit(FlowProcess<? extends Config> flowProcess, Tap<Config,Input,Output> tap, Config conf)
sinkPrepare(cascading.flow.FlowProcess, SinkCall)
if resources much be initialized
before use. And sinkCleanup(cascading.flow.FlowProcess, SinkCall)
if resources must be
destroyed after use.flowProcess
- of type FlowProcesstap
- of type Tapconf
- of type Configpublic void sourcePrepare(FlowProcess<? extends Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws java.io.IOException
source(cascading.flow.FlowProcess, SourceCall)
.
This method is guaranteed to be called once before the first invocation of source(FlowProcess, SourceCall)
.
Be sure to place any initialized objects in the SourceContext
so each instance
will remain thread-safe.flowProcess
- of type FlowProcesssourceCall
- of type SourceCalljava.io.IOException
public void sourceRePrepare(FlowProcess<? extends Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws java.io.IOException
source(cascading.flow.FlowProcess, SourceCall)
after the Input
object
has been changed, if needed.
This method may be called zero or more times. Note sourcePrepare(FlowProcess, SourceCall)
will always
be called before any source(FlowProcess, SourceCall)
invocation.flowProcess
- of type FlowProcesssourceCall
- of type SourceCalljava.io.IOException
public abstract boolean source(FlowProcess<? extends Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws java.io.IOException
SourceCall.getInput()
and populate
the available Tuple
via SourceCall.getIncomingEntry()
and return true
on success or false
if no more values available.
It's ok to set a new Tuple instance on the incomingEntry
TupleEntry
, or
to simply re-use the existing instance.
Note this is only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException
if it cannot process a particular
instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to
any applicable failure trap Tap.flowProcess
- of type FlowProcesssourceCall
- of SourceCalltrue
when a Tuple was successfully readjava.io.IOException
public void sourceCleanup(FlowProcess<? extends Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws java.io.IOException
sourcePrepare(cascading.flow.FlowProcess, SourceCall)
.flowProcess
- of ProcesssourceCall
- of type SourceCalljava.io.IOException
public void sinkPrepare(FlowProcess<? extends Config> flowProcess, SinkCall<SinkContext,Output> sinkCall) throws java.io.IOException
sink(cascading.flow.FlowProcess, SinkCall)
.
This method is guaranteed to be called once before the first invocation of sink(FlowProcess, SinkCall)
.
Be sure to place any initialized objects in the SinkContext
so each instance
will remain threadsafe.
flowProcess
- of type FlowProcesssinkCall
- of type SinkCalljava.io.IOException
public abstract void sink(FlowProcess<? extends Config> flowProcess, SinkCall<SinkContext,Output> sinkCall) throws java.io.IOException
Tuple
found on SinkCall.getOutgoingEntry()
to
the SinkCall.getOutput()
.
This method may optionally throw a TapException
if it cannot process a particular
instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to
any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.flowProcess
- of ProcesssinkCall
- of SinkCalljava.io.IOException
public void sinkCleanup(FlowProcess<? extends Config> flowProcess, SinkCall<SinkContext,Output> sinkCall) throws java.io.IOException
sinkPrepare(cascading.flow.FlowProcess, SinkCall)
.flowProcess
- of type FlowProcesssinkCall
- of type SinkCalljava.io.IOException
public boolean equals(java.lang.Object object)
equals
in class java.lang.Object
public java.lang.String toString()
toString
in class java.lang.Object
public int hashCode()
hashCode
in class java.lang.Object
Copyright © 2007-2015 Xplenty, Inc. All Rights Reserved.