cascading.flow.hadoop
Class HadoopFlow

java.lang.Object
  extended by cascading.flow.BaseFlow<org.apache.hadoop.mapred.JobConf>
      extended by cascading.flow.hadoop.HadoopFlow
All Implemented Interfaces:
Flow<org.apache.hadoop.mapred.JobConf>, UnitOfWork<FlowStats>
Direct Known Subclasses:
MapReduceFlow, ProcessFlow

public class HadoopFlow
extends BaseFlow<org.apache.hadoop.mapred.JobConf>

Class HadoopFlow is the Apache Hadoop specific implementation of a Flow.

HadoopFlow must be created through a HadoopFlowConnector instance.

If classpath paths are provided on the FlowDef, the Hadoop distributed cache mechanism will be used to augment the remote classpath.

Any path elements that are relative will be uploaded to HDFS, and the HDFS URI will be used on the JobConf. Note all paths are added as "files" to the JobConf, not archives, so they aren't needlessly uncompressed cluster side.

See Also:
HadoopFlowConnector

Nested Class Summary
 
Nested classes/interfaces inherited from class cascading.flow.BaseFlow
BaseFlow.FlowHolder
 
Field Summary
 
Fields inherited from class cascading.flow.BaseFlow
flowStats, sinks, sources, stop, stopJobsOnExit, thread
 
Fields inherited from interface cascading.flow.Flow
CASCADING_FLOW_ID
 
Constructor Summary
protected HadoopFlow()
           
  HadoopFlow(cascading.flow.planner.PlatformInfo platformInfo, Map<Object,Object> properties, org.apache.hadoop.mapred.JobConf jobConf, FlowDef flowDef)
           
protected HadoopFlow(cascading.flow.planner.PlatformInfo platformInfo, Map<Object,Object> properties, org.apache.hadoop.mapred.JobConf jobConf, String name, Map<String,String> flowDescriptor)
           
 
Method Summary
 org.apache.hadoop.mapred.JobConf getConfig()
          Method getConfig returns the internal configuration object.
 Map<Object,Object> getConfigAsProperties()
          Method getConfiAsProperties converts the internal configuration object into a Map of key value pairs.
 org.apache.hadoop.mapred.JobConf getConfigCopy()
          Method getConfigCopy returns a copy of the internal configuration object.
 FlowProcess<org.apache.hadoop.mapred.JobConf> getFlowProcess()
           
protected  int getMaxNumParallelSteps()
           
 String getProperty(String key)
          Method getProperty returns the value associated with the given key from the underlying properties system.
protected  void initConfig(Map<Object,Object> properties, org.apache.hadoop.mapred.JobConf parentConfig)
          This method creates a new internal Config with the parentConfig as defaults using the properties to override the defaults.
protected  void initFromProperties(Map<Object,Object> properties)
           
protected  void internalClean(boolean stop)
           
protected  void internalShutdown()
           
protected  void internalStart()
           
 boolean isPreserveTemporaryFiles()
          Method isPreserveTemporaryFiles returns false if temporary files will be cleaned when this Flow completes.
protected  org.apache.hadoop.mapred.JobConf newConfig(org.apache.hadoop.mapred.JobConf defaultConfig)
           
protected  void setConfigProperty(org.apache.hadoop.mapred.JobConf config, Object key, Object value)
           
 boolean stepsAreLocal()
          Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.
 
Methods inherited from class cascading.flow.BaseFlow
addListener, addStepListener, areSinksStale, areSourcesNewer, cleanup, complete, createConfig, createFlowThread, deleteCheckpointsIfNotUpdate, deleteCheckpointsIfReplace, deleteSinks, deleteSinksIfNotUpdate, deleteSinksIfReplace, deleteTrapsIfNotUpdate, deleteTrapsIfReplace, fireOnCompleted, fireOnStarting, fireOnStopping, fireOnThrowable, getCascadeID, getCascadingServices, getCheckpointNames, getCheckpoints, getCheckpointsCollection, getClassPath, getFieldsFor, getFlowDescriptor, getFlowSession, getFlowSkipStrategy, getFlowStats, getFlowSteps, getFlowStepStrategy, getHolder, getID, getName, getPlatformInfo, getRunID, getSink, getSink, getSinkModified, getSinkNames, getSinks, getSinksCollection, getSource, getSourceNames, getSources, getSourcesCollection, getSpawnStrategy, getStats, getSubmitPriority, getTags, getTrapNames, getTraps, getTrapsCollection, handleExecutorShutdown, hasListeners, hasStepListeners, initialize, initializeNewJobsMap, initSteps, internalStopAllJobs, isSkipFlow, isStopJobsOnExit, logInfo, openSink, openSink, openSource, openSource, openTapForRead, openTapForWrite, openTrap, openTrap, prepare, presentSinkFields, presentSourceFields, registerShutdownHook, removeListener, removeStepListener, resourceExists, retrieveSinkFields, retrieveSourceFields, setCascade, setCheckpoints, setFlowSkipStrategy, setFlowStepGraph, setFlowStepStrategy, setName, setSinks, setSources, setSpawnStrategy, setSubmitPriority, setTraps, start, stop, toString, updateSchemes, writeDOT, writeStepsDOT
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HadoopFlow

protected HadoopFlow()

HadoopFlow

protected HadoopFlow(cascading.flow.planner.PlatformInfo platformInfo,
                     Map<Object,Object> properties,
                     org.apache.hadoop.mapred.JobConf jobConf,
                     String name,
                     Map<String,String> flowDescriptor)

HadoopFlow

public HadoopFlow(cascading.flow.planner.PlatformInfo platformInfo,
                  Map<Object,Object> properties,
                  org.apache.hadoop.mapred.JobConf jobConf,
                  FlowDef flowDef)
Method Detail

initFromProperties

protected void initFromProperties(Map<Object,Object> properties)
Overrides:
initFromProperties in class BaseFlow<org.apache.hadoop.mapred.JobConf>

initConfig

protected void initConfig(Map<Object,Object> properties,
                          org.apache.hadoop.mapred.JobConf parentConfig)
Description copied from class: BaseFlow
This method creates a new internal Config with the parentConfig as defaults using the properties to override the defaults.

Specified by:
initConfig in class BaseFlow<org.apache.hadoop.mapred.JobConf>
Parameters:
properties - of type Map
parentConfig - of type Config

setConfigProperty

protected void setConfigProperty(org.apache.hadoop.mapred.JobConf config,
                                 Object key,
                                 Object value)
Specified by:
setConfigProperty in class BaseFlow<org.apache.hadoop.mapred.JobConf>

newConfig

protected org.apache.hadoop.mapred.JobConf newConfig(org.apache.hadoop.mapred.JobConf defaultConfig)
Specified by:
newConfig in class BaseFlow<org.apache.hadoop.mapred.JobConf>

getConfig

public org.apache.hadoop.mapred.JobConf getConfig()
Description copied from interface: Flow
Method getConfig returns the internal configuration object.

Any changes to this object will not be reflected in child steps. See FlowConnector for setting default properties visible to children. Or see FlowStepStrategy for setting properties on individual steps before they are executed.

Returns:
the default configuration of this Flow

getConfigCopy

public org.apache.hadoop.mapred.JobConf getConfigCopy()
Description copied from interface: Flow
Method getConfigCopy returns a copy of the internal configuration object. This object can be safely modified.

Returns:
a copy of the default configuration of this Flow

getConfigAsProperties

public Map<Object,Object> getConfigAsProperties()
Description copied from interface: Flow
Method getConfiAsProperties converts the internal configuration object into a Map of key value pairs.

Returns:
a Map of key/value pairs

getProperty

public String getProperty(String key)
Method getProperty returns the value associated with the given key from the underlying properties system.

Parameters:
key - of type String
Returns:
String

getFlowProcess

public FlowProcess<org.apache.hadoop.mapred.JobConf> getFlowProcess()

isPreserveTemporaryFiles

public boolean isPreserveTemporaryFiles()
Method isPreserveTemporaryFiles returns false if temporary files will be cleaned when this Flow completes.

Returns:
the preserveTemporaryFiles (type boolean) of this Flow object.

internalStart

protected void internalStart()
Specified by:
internalStart in class BaseFlow<org.apache.hadoop.mapred.JobConf>

stepsAreLocal

public boolean stepsAreLocal()
Description copied from interface: Flow
Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.

Returns:
boolean

internalClean

protected void internalClean(boolean stop)
Specified by:
internalClean in class BaseFlow<org.apache.hadoop.mapred.JobConf>

internalShutdown

protected void internalShutdown()
Specified by:
internalShutdown in class BaseFlow<org.apache.hadoop.mapred.JobConf>

getMaxNumParallelSteps

protected int getMaxNumParallelSteps()
Specified by:
getMaxNumParallelSteps in class BaseFlow<org.apache.hadoop.mapred.JobConf>


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.