java.lang.Object
  cascading.scheme.Scheme<Config,Input,Output,SourceContext,SinkContext>
public abstract class Scheme<Config,Input,Output,SourceContext,SinkContext>
A Scheme defines what is stored in a Tap instance by declaring the Tuple field names, and alternately parsing or rendering the incoming or outgoing Tuple stream, respectively.

The default sourceFields are Fields.UNKNOWN and the default sinkFields are Fields.ALL.

Any given sourceFields only label the values in the Tuples as they are sourced. They do not necessarily filter the output, since a given implementation may choose to collapse values and ignore keys depending on the format.

If the sinkFields are Fields.ALL, the Cascading planner will attempt to resolve the actual field names and make them available via the SinkCall.getOutgoingEntry() method. Sometimes this may not be possible, for example when the Tap.openForWrite(cascading.flow.FlowProcess) method is called directly from user code, without planner intervention.

If the sinkFields are a valid selector, the sink(cascading.flow.FlowProcess, SinkCall) method will only see the fields expected.

Setting the numSinkParts value to 1 (one) attempts to ensure the output resource has only one part. In the case of MapReduce, this is only a suggestion on the Map side; on the Reduce side it does this by setting the number of reducers to the given value. This may affect performance, so use it with caution.

Note that setting numSinkParts does not force the planner to insert a final Reduce operation in the job, so numSinkParts may be ignored entirely if the final job is Map only. To force the Flow to have a final Reduce, add a GroupBy to the assembly before sinking.
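As a rough illustration of the sinkFields behaviour described above, the following sketch projects an incoming record onto a selector. It does not use the Cascading classes: `SinkFieldsSketch`, `outgoingEntry`, and the use of `null` to stand in for Fields.ALL are assumptions made for this example, with plain maps and lists standing in for TupleEntry and Fields.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in, not a Cascading class: a Map plays the role of a TupleEntry
// and a List<String> plays the role of a Fields selector.
final class SinkFieldsSketch {

    // A null selector stands in for Fields.ALL (an assumption of this sketch):
    // every incoming field is passed through unchanged.
    static Map<String, Object> outgoingEntry(Map<String, Object> incoming, List<String> sinkFields) {
        if (sinkFields == null) {
            return new LinkedHashMap<>(incoming);
        }
        // With an explicit selector, only the named fields are visible, in selector order.
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String name : sinkFields) {
            if (!incoming.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            selected.put(name, incoming.get(name));
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("id", 1);
        incoming.put("name", "ann");
        incoming.put("age", 30);

        System.out.println(outgoingEntry(incoming, null));                        // all three fields
        System.out.println(outgoingEntry(incoming, Arrays.asList("name", "id"))); // only name and id
    }
}
```

In the real API the planner performs this resolution; the sketch only shows why a sink written against a selector never sees the unselected fields.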
Constructor Summary | |
---|---|
protected | Scheme(): Constructor Scheme creates a new Scheme instance. |
protected | Scheme(Fields sourceFields): Constructor Scheme creates a new Scheme instance. |
protected | Scheme(Fields sourceFields, Fields sinkFields): Constructor Scheme creates a new Scheme instance. |
protected | Scheme(Fields sourceFields, Fields sinkFields, int numSinkParts): Constructor Scheme creates a new Scheme instance. |
protected | Scheme(Fields sourceFields, int numSinkParts): Constructor Scheme creates a new Scheme instance. |
Method Summary | |
---|---|
boolean | equals(Object object) |
int | getNumSinkParts(): Method getNumSinkParts returns the numSinkParts of this Scheme object. |
Fields | getSinkFields(): Method getSinkFields returns the sinkFields of this Scheme object. |
Fields | getSourceFields(): Method getSourceFields returns the sourceFields of this Scheme object. |
String | getTrace(): Method getTrace returns a String that pinpoints where this instance was created, for debugging. |
int | hashCode() |
boolean | isSink(): Method isSink returns true if this Scheme instance can be used as a sink. |
boolean | isSource(): Method isSource returns true if this Scheme instance can be used as a source. |
boolean | isSymmetrical(): Method isSymmetrical returns true if the sink fields equal the source fields. |
void | presentSinkFields(FlowProcess<Config> flowProcess, Tap tap, Fields fields): Method presentSinkFields is called after the planner is invoked and all fields are resolved. |
protected void | presentSinkFieldsInternal(Fields fields) |
void | presentSourceFields(FlowProcess<Config> flowProcess, Tap tap, Fields fields): Method presentSourceFields is called after the planner is invoked and all fields are resolved. |
protected void | presentSourceFieldsInternal(Fields fields) |
Fields | retrieveSinkFields(FlowProcess<Config> flowProcess, Tap tap): Method retrieveSinkFields notifies a Scheme when it is appropriate to dynamically update the fields it sinks. |
Fields | retrieveSourceFields(FlowProcess<Config> flowProcess, Tap tap): Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources. |
void | setNumSinkParts(int numSinkParts): Method setNumSinkParts sets the numSinkParts of this Scheme object. |
void | setSinkFields(Fields sinkFields): Method setSinkFields sets the sinkFields of this Scheme object. |
void | setSourceFields(Fields sourceFields): Method setSourceFields sets the sourceFields of this Scheme object. |
abstract void | sink(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall): Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput(). |
void | sinkCleanup(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall): Method sinkCleanup is used to destroy resources created by sinkPrepare(cascading.flow.FlowProcess, SinkCall). |
abstract void | sinkConfInit(FlowProcess<Config> flowProcess, Tap<Config,Input,Output> tap, Config conf): Method sinkConfInit initializes this instance as a sink. |
void | sinkPrepare(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall): Method sinkPrepare is used to initialize resources needed during each call of sink(cascading.flow.FlowProcess, SinkCall). |
abstract boolean | source(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall): Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available. |
void | sourceCleanup(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall): Method sourceCleanup is used to destroy resources created by sourcePrepare(cascading.flow.FlowProcess, SourceCall). |
abstract void | sourceConfInit(FlowProcess<Config> flowProcess, Tap<Config,Input,Output> tap, Config conf): Method sourceConfInit initializes this instance as a source. |
void | sourcePrepare(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall): Method sourcePrepare is used to initialize resources needed during each call of source(cascading.flow.FlowProcess, SourceCall). |
String | toString() |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
protected Scheme()

protected Scheme(Fields sourceFields)
Parameters:
sourceFields - of type Fields

protected Scheme(Fields sourceFields, int numSinkParts)
Parameters:
sourceFields - of type Fields
numSinkParts - of type int

protected Scheme(Fields sourceFields, Fields sinkFields)
Parameters:
sourceFields - of type Fields
sinkFields - of type Fields

protected Scheme(Fields sourceFields, Fields sinkFields, int numSinkParts)
Parameters:
sourceFields - of type Fields
sinkFields - of type Fields
numSinkParts - of type int

Method Detail |
---|
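public Fields getSinkFields()
Method getSinkFields returns the sinkFields of this Scheme object.

public void setSinkFields(Fields sinkFields)
Method setSinkFields sets the sinkFields of this Scheme object.
Parameters:
sinkFields - the sinkFields of this Scheme object.

public Fields getSourceFields()
Method getSourceFields returns the sourceFields of this Scheme object.

public void setSourceFields(Fields sourceFields)
Method setSourceFields sets the sourceFields of this Scheme object.
Parameters:
sourceFields - the sourceFields of this Scheme object.

public int getNumSinkParts()
Method getNumSinkParts returns the numSinkParts of this Scheme object.

public void setNumSinkParts(int numSinkParts)
Method setNumSinkParts sets the numSinkParts of this Scheme object.
Parameters:
numSinkParts - the numSinkParts of this Scheme object.

public String getTrace()
Method getTrace returns a String that pinpoints where this instance was created, for debugging.

public boolean isSymmetrical()
Method isSymmetrical returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.

public boolean isSource()
Method isSource returns true if this Scheme instance can be used as a source.

public boolean isSink()
Method isSink returns true if this Scheme instance can be used as a sink.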
public Fields retrieveSourceFields(FlowProcess<Config> flowProcess, Tap tap)
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources.
The FlowProcess presents all known properties resolved by the current planner. The tap instance is the parent Tap for this Scheme instance.
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap

public void presentSourceFields(FlowProcess<Config> flowProcess, Tap tap, Fields fields)
Method presentSourceFields is called after the planner is invoked and all fields are resolved. See retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields

protected void presentSourceFieldsInternal(Fields fields)

public Fields retrieveSinkFields(FlowProcess<Config> flowProcess, Tap tap)
Method retrieveSinkFields notifies a Scheme when it is appropriate to dynamically update the fields it sinks.
The FlowProcess presents all known properties resolved by the current planner. The tap instance is the parent Tap for this Scheme instance.
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap

public void presentSinkFields(FlowProcess<Config> flowProcess, Tap tap, Fields fields)
Method presentSinkFields is called after the planner is invoked and all fields are resolved. See retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields

protected void presentSinkFieldsInternal(Fields fields)
public abstract void sourceConfInit(FlowProcess<Config> flowProcess, Tap<Config,Input,Output> tap, Config conf)
Method sourceConfInit initializes this instance as a source.
See sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

public abstract void sinkConfInit(FlowProcess<Config> flowProcess, Tap<Config,Input,Output> tap, Config conf)
Method sinkConfInit initializes this instance as a sink.
See sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
conf - of type Config

public void sourcePrepare(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws IOException
Method sourcePrepare is used to initialize resources needed during each call of source(cascading.flow.FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException
public abstract boolean source(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws IOException
Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available.
It's ok to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.
Note this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Returns:
true when a Tuple was successfully read
Throws:
IOException
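The true/false contract of source can be sketched without the Cascading classes. In the example below, `SketchSourceCall` and `SourceContractSketch` are hypothetical stand-ins: an `Iterator<String>` plays the role of the input and a reusable `List<Object>` plays the role of the incoming entry. The tab-delimited parsing is an assumption chosen only to give the method something to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for SourceCall: exposes the input and one reusable entry.
final class SketchSourceCall {
    private final Iterator<String> input;
    private final List<Object> incomingEntry = new ArrayList<>();

    SketchSourceCall(Iterator<String> input) {
        this.input = input;
    }

    Iterator<String> getInput() {
        return input;
    }

    List<Object> getIncomingEntry() {
        return incomingEntry;
    }
}

final class SourceContractSketch {

    // Mirrors the source() contract: fill the entry and return true,
    // or return false once the input has no more records.
    static boolean source(SketchSourceCall call) {
        if (!call.getInput().hasNext()) {
            return false;
        }
        String line = call.getInput().next();
        List<Object> entry = call.getIncomingEntry();
        entry.clear(); // re-use the existing instance instead of allocating a new one
        entry.addAll(Arrays.asList(line.split("\t", -1)));
        return true;
    }

    // Drains the input the way a caller would loop over source().
    static List<List<Object>> readAll(Iterator<String> input) {
        SketchSourceCall call = new SketchSourceCall(input);
        List<List<Object>> rows = new ArrayList<>();
        while (source(call)) {
            rows.add(new ArrayList<>(call.getIncomingEntry())); // copy, because the entry is re-used
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("a\t1", "b\t2").iterator())); // prints [[a, 1], [b, 2]]
    }
}
```

Note that the caller copies each entry before the next call, since the sketch overwrites the same instance on every read.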
public void sourceCleanup(FlowProcess<Config> flowProcess, SourceCall<SourceContext,Input> sourceCall) throws IOException
Method sourceCleanup is used to destroy resources created by sourcePrepare(cascading.flow.FlowProcess, SourceCall).
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException
public void sinkPrepare(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall) throws IOException
Method sinkPrepare is used to initialize resources needed during each call of sink(cascading.flow.FlowProcess, SinkCall).
Be sure to place any initialized objects in the SinkContext so each instance will remain threadsafe.
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException
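The advice to keep initialized objects in the context rather than in the Scheme itself can be sketched as a prepare / sink / cleanup cycle. The classes below are hypothetical stand-ins, not the Cascading API: `SketchSinkCall` carries a per-call context and a list that plays the role of the output, and the `StringBuilder` is just an example of a resource worth creating once per call sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SinkCall: carries a per-call context and the output.
final class SketchSinkCall {
    private Object context;
    private final List<String> output = new ArrayList<>();

    Object getContext() {
        return context;
    }

    void setContext(Object context) {
        this.context = context;
    }

    List<String> getOutput() {
        return output;
    }
}

final class SinkLifecycleSketch {

    // The scheme holds no mutable state; the buffer lives in the call's context,
    // so two concurrent calls never share it.
    static void sinkPrepare(SketchSinkCall call) {
        call.setContext(new StringBuilder());
    }

    // Renders one record as a tab-delimited line using the prepared buffer.
    static void sink(SketchSinkCall call, Object... values) {
        StringBuilder buffer = (StringBuilder) call.getContext();
        buffer.setLength(0);
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                buffer.append('\t');
            }
            buffer.append(values[i]);
        }
        call.getOutput().add(buffer.toString());
    }

    // Releases whatever sinkPrepare created.
    static void sinkCleanup(SketchSinkCall call) {
        call.setContext(null);
    }

    // Runs the full cycle the way a caller would: prepare once, sink many, clean up once.
    static List<String> writeAll(Object[][] rows) {
        SketchSinkCall call = new SketchSinkCall();
        sinkPrepare(call);
        try {
            for (Object[] row : rows) {
                sink(call, row);
            }
        } finally {
            sinkCleanup(call);
        }
        return call.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(writeAll(new Object[][] {{"a", 1}, {"b", 2}}));
    }
}
```

The same shape applies on the source side with sourcePrepare, source, and sourceCleanup.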
public abstract void sink(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall) throws IOException
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException
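The payload rule for failure traps can be illustrated with a small model. Everything here is a stand-in written for this example: `SketchTapException` imitates an exception with an optional payload, `SinkTrapSketch.write` imitates the caller that routes failures to a trap, and the "second value must be an Integer" check is an arbitrary validation rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for TapException: an unchecked exception with an optional payload.
final class SketchTapException extends RuntimeException {
    private final List<Object> payload;

    SketchTapException(String message, List<Object> payload) {
        super(message);
        this.payload = payload;
    }

    List<Object> getPayload() {
        return payload;
    }
}

final class SinkTrapSketch {
    final List<String> output = new ArrayList<>();
    final List<List<Object>> trap = new ArrayList<>();

    // Sinks one record; rejects records whose second value is not an Integer.
    void sink(List<Object> outgoing) {
        Object count = outgoing.get(1);
        if (count == null) {
            // No payload set: the caller falls back to the incoming record.
            throw new SketchTapException("missing count", null);
        }
        if (!(count instanceof Integer)) {
            // Payload set: only the offending value is sent to the trap.
            throw new SketchTapException("not an integer", Collections.singletonList(count));
        }
        output.add(outgoing.get(0) + "\t" + count);
    }

    // Emulates the caller: on failure, trap the payload if set, else the incoming record.
    void write(List<Object> outgoing) {
        try {
            sink(outgoing);
        } catch (SketchTapException e) {
            trap.add(e.getPayload() != null ? e.getPayload() : outgoing);
        }
    }
}
```

A record such as `("a", 1)` reaches the output, `("b", "oops")` traps only `"oops"`, and `("c", null)` traps the whole incoming record.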
public void sinkCleanup(FlowProcess<Config> flowProcess, SinkCall<SinkContext,Output> sinkCall) throws IOException
Method sinkCleanup is used to destroy resources created by sinkPrepare(cascading.flow.FlowProcess, SinkCall).
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException
public boolean equals(Object object)
Overrides:
equals in class Object

public String toString()
Overrides:
toString in class Object

public int hashCode()
Overrides:
hashCode in class Object