cascading.flow.hadoop
Class HadoopFlowProcess

java.lang.Object
  extended by cascading.flow.FlowProcess<org.apache.hadoop.mapred.JobConf>
      extended by cascading.flow.hadoop.HadoopFlowProcess

public class HadoopFlowProcess
extends FlowProcess<org.apache.hadoop.mapred.JobConf>

Class HadoopFlowProcess is an implementation of FlowProcess for Hadoop. Use this interface to get direct access to the Hadoop JobConf and Reporter interfaces.

Be warned that coupling to this implementation will cause custom Operations to fail if they are executed on a system other than Hadoop.

See Also:
FlowSession, JobConf, Reporter

Nested Class Summary
 
Nested classes/interfaces inherited from class cascading.flow.FlowProcess
FlowProcess.NullFlowProcess
 
Field Summary
 
Fields inherited from class cascading.flow.FlowProcess
NULL
 
Constructor Summary
HadoopFlowProcess()
           
HadoopFlowProcess(FlowSession flowSession, org.apache.hadoop.mapred.JobConf jobConf)
           
HadoopFlowProcess(FlowSession flowSession, org.apache.hadoop.mapred.JobConf jobConf, boolean isMapper)
          Constructor HadoopFlowProcess creates a new HadoopFlowProcess instance.
HadoopFlowProcess(HadoopFlowProcess flowProcess, org.apache.hadoop.mapred.JobConf jobConf)
           
HadoopFlowProcess(org.apache.hadoop.mapred.JobConf jobConf)
           
 
Method Summary
 org.apache.hadoop.mapred.JobConf copyConfig(org.apache.hadoop.mapred.JobConf jobConf)
           
 FlowProcess copyWith(org.apache.hadoop.mapred.JobConf jobConf)
           
 Map<String,String> diffConfigIntoMap(org.apache.hadoop.mapred.JobConf defaultConfig, org.apache.hadoop.mapred.JobConf updatedConfig)
           
 org.apache.hadoop.mapred.JobConf getConfigCopy()
           
 int getCurrentNumMappers()
           
 int getCurrentNumReducers()
           
 int getCurrentSliceNum()
          Method getCurrentTaskNum returns the task number of this task.
 org.apache.hadoop.mapred.JobConf getJobConf()
          Method getJobConf returns the jobConf of this HadoopFlowProcess object.
 int getNumProcessSlices()
          Method getNumProcessSlices returns the number of parallel slices or tasks allocated for this process execution.
 org.apache.hadoop.mapred.OutputCollector getOutputCollector()
           
 Object getProperty(String key)
          Method getProperty should be used to return configuration parameters from the underlying system.
 Collection<String> getPropertyKeys()
          Method getPropertyKeys returns an immutable collection of all available property key values.
 org.apache.hadoop.mapred.Reporter getReporter()
          Method getReporter returns the reporter of this HadoopFlowProcess object.
 void increment(Enum counter, long amount)
          Method increment is used to increment a custom counter.
 void increment(String group, String counter, long amount)
          Method increment is used to increment a custom counter.
 boolean isCounterStatusInitialized()
          Method isCounterStatusInitialized returns true if it is safe to increment a counter or set a status.
 boolean isMapper()
          Method isMapper returns true if this part of the FlowProcess is a MapReduce mapper.
 void keepAlive()
          Method keepAlive notifies the system that the current process is still alive.
 org.apache.hadoop.mapred.JobConf mergeMapIntoConfig(org.apache.hadoop.mapred.JobConf defaultConfig, Map<String,String> map)
           
 Object newInstance(String className)
          Method newInstance creates a new object instance from the given className argument delegating to any platform specific instantiation and configuration routines.
 TupleEntryCollector openSystemIntermediateForWrite()
           
 TupleEntryIterator openTapForRead(Tap tap)
          Method openTapForRead return a TupleEntryIterator for the given Tap instance.
 TupleEntryCollector openTapForWrite(Tap tap)
          Method openTapForWrite returns a (@link TupleCollector} for the given Tap instance.
 TupleEntryCollector openTrapForWrite(Tap trap)
          Method openTrapForWrite returns a (@link TupleCollector} for the given Tap instance.
 void setOutputCollector(org.apache.hadoop.mapred.OutputCollector outputCollector)
           
 void setReporter(org.apache.hadoop.mapred.Reporter reporter)
          Method setReporter sets the reporter of this HadoopFlowProcess object.
 void setStatus(String status)
          Method setStatus is used to set the status of the current operation.
 
Methods inherited from class cascading.flow.FlowProcess
getCurrentSession, getID, getStringProperty, setCurrentSession
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HadoopFlowProcess

public HadoopFlowProcess()

HadoopFlowProcess

public HadoopFlowProcess(org.apache.hadoop.mapred.JobConf jobConf)

HadoopFlowProcess

public HadoopFlowProcess(FlowSession flowSession,
                         org.apache.hadoop.mapred.JobConf jobConf)

HadoopFlowProcess

public HadoopFlowProcess(FlowSession flowSession,
                         org.apache.hadoop.mapred.JobConf jobConf,
                         boolean isMapper)
Constructor HadoopFlowProcess creates a new HadoopFlowProcess instance.

Parameters:
flowSession - of type FlowSession
jobConf - of type JobConf

HadoopFlowProcess

public HadoopFlowProcess(HadoopFlowProcess flowProcess,
                         org.apache.hadoop.mapred.JobConf jobConf)
Method Detail

copyWith

public FlowProcess copyWith(org.apache.hadoop.mapred.JobConf jobConf)
Specified by:
copyWith in class FlowProcess<org.apache.hadoop.mapred.JobConf>

getJobConf

public org.apache.hadoop.mapred.JobConf getJobConf()
Method getJobConf returns the jobConf of this HadoopFlowProcess object.

Returns:
the jobConf (type JobConf) of this HadoopFlowProcess object.

getConfigCopy

public org.apache.hadoop.mapred.JobConf getConfigCopy()
Specified by:
getConfigCopy in class FlowProcess<org.apache.hadoop.mapred.JobConf>

isMapper

public boolean isMapper()
Method isMapper returns true if this part of the FlowProcess is a MapReduce mapper. If false, it is a reducer.

Returns:
boolean

getCurrentNumMappers

public int getCurrentNumMappers()

getCurrentNumReducers

public int getCurrentNumReducers()

getCurrentSliceNum

public int getCurrentSliceNum()
Method getCurrentTaskNum returns the task number of this task. Task 0 is the first task.

Specified by:
getCurrentSliceNum in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Returns:
int

getNumProcessSlices

public int getNumProcessSlices()
Description copied from class: FlowProcess
Method getNumProcessSlices returns the number of parallel slices or tasks allocated for this process execution.

For MapReduce platforms, this is the same as the number of tasks for a given MapReduce job.

Specified by:
getNumProcessSlices in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Returns:
an int

setReporter

public void setReporter(org.apache.hadoop.mapred.Reporter reporter)
Method setReporter sets the reporter of this HadoopFlowProcess object.

Parameters:
reporter - the reporter of this HadoopFlowProcess object.

getReporter

public org.apache.hadoop.mapred.Reporter getReporter()
Method getReporter returns the reporter of this HadoopFlowProcess object.

Returns:
the reporter (type Reporter) of this HadoopFlowProcess object.

setOutputCollector

public void setOutputCollector(org.apache.hadoop.mapred.OutputCollector outputCollector)

getOutputCollector

public org.apache.hadoop.mapred.OutputCollector getOutputCollector()

getProperty

public Object getProperty(String key)
Description copied from class: FlowProcess
Method getProperty should be used to return configuration parameters from the underlying system.

In the case of Hadoop, the current Configuration will be queried.

Specified by:
getProperty in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
key - of type String
Returns:
an Object

getPropertyKeys

public Collection<String> getPropertyKeys()
Description copied from class: FlowProcess
Method getPropertyKeys returns an immutable collection of all available property key values.

Specified by:
getPropertyKeys in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Returns:
a Collection

newInstance

public Object newInstance(String className)
Description copied from class: FlowProcess
Method newInstance creates a new object instance from the given className argument delegating to any platform specific instantiation and configuration routines.

Specified by:
newInstance in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Returns:
an instance of className

keepAlive

public void keepAlive()
Description copied from class: FlowProcess
Method keepAlive notifies the system that the current process is still alive. Use this method if a particular Operation takes some moments to complete. Each system is different, so calling ping every few seconds to every minute or so would be best.

This method will fail silently if the underlying mechanism to notify keepAlive status are not initialized.

Specified by:
keepAlive in class FlowProcess<org.apache.hadoop.mapred.JobConf>

increment

public void increment(Enum counter,
                      long amount)
Description copied from class: FlowProcess
Method increment is used to increment a custom counter. Counters must be of type Enum. The amount to increment must be a integer value.

This method will fail if the underlying counter infrastructure is unavailable. See FlowProcess.isCounterStatusInitialized().

Specified by:
increment in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
counter - of type Enum
amount - of type int

increment

public void increment(String group,
                      String counter,
                      long amount)
Description copied from class: FlowProcess
Method increment is used to increment a custom counter. The amount to increment must be a integer value.

This method will fail if the underlying counter infrastructure is unavailable. See FlowProcess.isCounterStatusInitialized().

Specified by:
increment in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
group - of type String
counter - of type String
amount - of type int

setStatus

public void setStatus(String status)
Description copied from class: FlowProcess
Method setStatus is used to set the status of the current operation.

This method will fail if the underlying counter infrastructure is unavailable. See FlowProcess.isCounterStatusInitialized().

Specified by:
setStatus in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
status - of type String

isCounterStatusInitialized

public boolean isCounterStatusInitialized()
Description copied from class: FlowProcess
Method isCounterStatusInitialized returns true if it is safe to increment a counter or set a status.

Specified by:
isCounterStatusInitialized in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Returns:
boolean

openTapForRead

public TupleEntryIterator openTapForRead(Tap tap)
                                  throws IOException
Description copied from class: FlowProcess
Method openTapForRead return a TupleEntryIterator for the given Tap instance.

Note the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.

Specified by:
openTapForRead in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
tap - of type Tap
Returns:
TupleIterator
Throws:
IOException - when there is a failure opening the resource

openTapForWrite

public TupleEntryCollector openTapForWrite(Tap tap)
                                    throws IOException
Description copied from class: FlowProcess
Method openTapForWrite returns a (@link TupleCollector} for the given Tap instance.

Specified by:
openTapForWrite in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
tap - of type Tap
Returns:
TupleCollector
Throws:
IOException - when there is a failure opening the resource

openTrapForWrite

public TupleEntryCollector openTrapForWrite(Tap trap)
                                     throws IOException
Description copied from class: FlowProcess
Method openTrapForWrite returns a (@link TupleCollector} for the given Tap instance.

Specified by:
openTrapForWrite in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Parameters:
trap - of type Tap
Returns:
TupleCollector
Throws:
IOException - when there is a failure opening the resource

openSystemIntermediateForWrite

public TupleEntryCollector openSystemIntermediateForWrite()
                                                   throws IOException
Specified by:
openSystemIntermediateForWrite in class FlowProcess<org.apache.hadoop.mapred.JobConf>
Throws:
IOException

copyConfig

public org.apache.hadoop.mapred.JobConf copyConfig(org.apache.hadoop.mapred.JobConf jobConf)
Specified by:
copyConfig in class FlowProcess<org.apache.hadoop.mapred.JobConf>

diffConfigIntoMap

public Map<String,String> diffConfigIntoMap(org.apache.hadoop.mapred.JobConf defaultConfig,
                                            org.apache.hadoop.mapred.JobConf updatedConfig)
Specified by:
diffConfigIntoMap in class FlowProcess<org.apache.hadoop.mapred.JobConf>

mergeMapIntoConfig

public org.apache.hadoop.mapred.JobConf mergeMapIntoConfig(org.apache.hadoop.mapred.JobConf defaultConfig,
                                                           Map<String,String> map)
Specified by:
mergeMapIntoConfig in class FlowProcess<org.apache.hadoop.mapred.JobConf>


Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.