cascading.flow
Class Flow

java.lang.Object
  extended by cascading.flow.Flow
All Implemented Interfaces:
Runnable
Direct Known Subclasses:
MapReduceFlow, ProcessFlow

@Process
public class Flow
extends Object
implements Runnable

A Pipe assembly is connected to the necessary number of Tap sinks and sources into a Flow. A Flow is then executed to push the incoming source data through the assembly into one or more sinks.

Note that Pipe assemblies can be reused in multiple Flow instances. They maintain no state regarding the Flow execution. Pipe assemblies can also be given parameters through their calling Flow, so they can be built in a generic fashion.

When a Flow is created, an optimized internal representation is created that is then executed within the cluster. Thus any overhead inherent to a given Pipe assembly will be removed once it's placed in context with the actual execution environment.

Flows are submitted in order of dependency. If two or more steps do not share the same dependencies and all can be scheduled simultaneously, the getSubmitPriority() value determines the order in which all steps will be submitted for execution. The default submit priority is 5.
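
The priority rule can be illustrated with a small stand-alone sketch. It does not use any Cascading classes: PendingStep and submitOrder are hypothetical names, and keeping equal priorities in their original order is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a schedulable unit; not a Cascading class.
class PendingStep {
    final String name;
    final int submitPriority; // 1 is highest, 10 is lowest

    PendingStep(String name, int submitPriority) {
        this.name = name;
        this.submitPriority = submitPriority;
    }

    PendingStep(String name) {
        this(name, 5); // the documented default priority
    }
}

class SubmitOrderSketch {
    // Orders units that are all ready to run: lower priority number first.
    // List.sort is stable, so equal priorities keep their original order
    // (an assumption of this sketch).
    static List<String> submitOrder(List<PendingStep> ready) {
        List<PendingStep> sorted = new ArrayList<PendingStep>(ready);
        sorted.sort(Comparator.comparingInt((PendingStep s) -> s.submitPriority));
        List<String> names = new ArrayList<String>();
        for (PendingStep step : sorted) {
            names.add(step.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<PendingStep> ready = new ArrayList<PendingStep>();
        ready.add(new PendingStep("report"));       // default priority 5
        ready.add(new PendingStep("cleanup", 10));  // lowest priority
        ready.add(new PendingStep("ingest", 1));    // highest priority
        System.out.println(submitOrder(ready));     // prints [ingest, report, cleanup]
    }
}
```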

Properties

See Also:
FlowConnector

Nested Class Summary
static class Flow.FlowHolder
          Class FlowHolder is a helper class for wrapping Flow instances.
 
Field Summary
protected  Map<String,Tap> sources
          Field sources
protected  boolean stopJobsOnExit
          Field stopJobsOnExit
 
Constructor Summary
protected Flow()
          Used for testing.
protected Flow(Map<Object,Object> properties, JobConf jobConf, String name)
           
protected Flow(Map<Object,Object> properties, JobConf jobConf, String name, ElementGraph pipeGraph, StepGraph stepGraph, Map<String,Tap> sources, Map<String,Tap> sinks, Map<String,Tap> traps)
           
protected Flow(Map<Object,Object> properties, JobConf jobConf, String name, StepGraph stepGraph, Map<String,Tap> sources, Map<String,Tap> sinks, Map<String,Tap> traps)
           
 
Method Summary
 void addListener(FlowListener flowListener)
          Method addListener registers the given flowListener with this instance.
 boolean areSinksStale()
          Method areSinksStale returns true if any of the sinks referenced are out of date in relation to the sources.
 boolean areSourcesNewer(long sinkModified)
          Method areSourcesNewer returns true if any source is newer than the given sinkModified date value.
 void cleanup()
           
 void complete()
          Method complete starts the current Flow instance if it has not been previously started, then blocks until completion.
 void deleteSinks()
          Method deleteSinks deletes all sinks, whether or not they are configured for SinkMode.UPDATE.
 void deleteSinksIfNotAppend()
          Deprecated. 
 void deleteSinksIfNotUpdate()
          Method deleteSinksIfNotUpdate deletes all sinks if they are not configured with the SinkMode.UPDATE flag.
 FlowSkipStrategy getFlowSkipStrategy()
          Method getFlowSkipStrategy returns the current FlowSkipStrategy used by this Flow.
 FlowStats getFlowStats()
          Method getFlowStats returns the flowStats of this Flow object.
 Flow.FlowHolder getHolder()
          Used to return a simple wrapper for use as an edge in a graph where there can only be one instance of every edge.
 String getID()
          Method getID returns the ID of this Flow object.
 JobConf getJobConf()
          Method getJobConf returns the jobConf of this Flow object.
static long getJobPollingInterval(JobConf jobConf)
           
static long getJobPollingInterval(Map<Object,Object> properties)
          Returns property jobPollingInterval.
 String getName()
          Method getName returns the name of this Flow object.
static boolean getPreserveTemporaryFiles(Map<Object,Object> properties)
          Returns property preserveTemporaryFiles.
 String getProperty(String key)
          Method getProperty returns the value associated with the given key from the underlying properties system.
 Tap getSink()
          Method getSink returns the first sink of this Flow object.
 long getSinkModified()
          Method getSinkModified returns the youngest modified date of any sink Tap managed by this Flow instance.
 Map<String,Tap> getSinks()
          Method getSinks returns the sinks of this Flow object.
 Collection<Tap> getSinksCollection()
          Method getSinksCollection returns a Collection of sink Taps for this Flow object.
 Map<String,Tap> getSources()
          Method getSources returns the sources of this Flow object.
 Collection<Tap> getSourcesCollection()
          Method getSourcesCollection returns a Collection of source Taps for this Flow object.
 List<FlowStep> getSteps()
          Method getSteps returns the steps of this Flow object.
static boolean getStopJobsOnExit(Map<Object,Object> properties)
          Returns property stopJobsOnExit.
 int getSubmitPriority()
          Method getSubmitPriority returns the submitPriority of this Flow object.
 Map<String,Tap> getTraps()
          Method getTraps returns the traps of this Flow object.
 Collection<Tap> getTrapsCollection()
          Method getTrapsCollection returns a Collection of trap Taps for this Flow object.
 boolean hasListeners()
          Method hasListeners returns true if FlowListener instances have been registered.
 boolean isPreserveTemporaryFiles()
          Method isPreserveTemporaryFiles returns false if temporary files will be cleaned when this Flow completes.
 boolean isSkipFlow()
          Method isSkipFlow returns true if the parent Cascade should skip this Flow instance.
 boolean isStopJobsOnExit()
          Method isStopJobsOnExit returns the stopJobsOnExit of this Flow object.
 boolean jobsAreLocal()
          Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.
 TupleEntryIterator openSink()
          Method openSink opens the first sink Tap.
 TupleEntryIterator openSink(String name)
          Method openSink opens the named sink Tap.
 TupleEntryIterator openSource()
          Method openSource opens the first source Tap.
 TupleEntryIterator openSource(String name)
          Method openSource opens the named source Tap.
 TupleEntryIterator openTapForRead(Tap tap)
          Method openTapForRead returns a TupleEntryIterator for the given Tap instance.
 TupleEntryCollector openTapForWrite(Tap tap)
          Method openTapForWrite returns a TupleEntryCollector for the given Tap instance.
 TupleEntryIterator openTrap()
          Method openTrap opens the first trap Tap.
 TupleEntryIterator openTrap(String name)
          Method openTrap opens the named trap Tap.
 void prepare()
          Method prepare is used by a Cascade to notify the given Flow it should initialize or clear any resources necessary for start() to be called successfully.
 boolean removeListener(FlowListener flowListener)
          Method removeListener removes the given flowListener from this instance.
 void run()
          Method run implements the Runnable run method and should not be called by users.
 FlowSkipStrategy setFlowSkipStrategy(FlowSkipStrategy flowSkipStrategy)
          Method setFlowSkipStrategy sets a new FlowSkipStrategy and returns the strategy that was current before the call.
static void setJobPollingInterval(Map<Object,Object> properties, long interval)
          Property jobPollingInterval will set the time to wait between polling the remote server for the status of a job.
protected  void setName(String name)
           
static void setPreserveTemporaryFiles(Map<Object,Object> properties, boolean preserveTemporaryFiles)
          Property preserveTemporaryFiles forces the Flow instance to keep any temporary intermediate data sets.
 void setProperty(String key, String value)
          Method setProperty sets the given key and value on the underlying properties system.
protected  void setSinks(Map<String,Tap> sinks)
           
protected  void setSources(Map<String,Tap> sources)
           
protected  void setStepGraph(StepGraph stepGraph)
           
static void setStopJobsOnExit(Map<Object,Object> properties, boolean stopJobsOnExit)
          Property stopJobsOnExit will tell the Flow to add a JVM shutdown hook that will kill all running processes if the underlying computing system supports it.
 void setSubmitPriority(int submitPriority)
          Method setSubmitPriority sets the submitPriority of this Flow object.
protected  void setTraps(Map<String,Tap> traps)
           
 void start()
          Method start begins the execution of this Flow instance.
 void stop()
          Method stop stops all running jobs, killing any currently executing.
 boolean tapPathExists(Tap tap)
          Method tapPathExists returns true if the resource represented by the given Tap instance exists.
 String toString()
           
 void writeDOT(String filename)
          Method writeDOT writes this Flow instance to the given filename as a DOT file for import into a graphics package.
 void writeStepsDOT(String filename)
          Method writeStepsDOT writes this Flow step graph to the given filename as a DOT file for import into a graphics package.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

sources

protected Map<String,Tap> sources
Field sources


stopJobsOnExit

protected boolean stopJobsOnExit
Field stopJobsOnExit

Constructor Detail

Flow

protected Flow()
Used for testing.


Flow

protected Flow(Map<Object,Object> properties,
               JobConf jobConf,
               String name)

Flow

protected Flow(Map<Object,Object> properties,
               JobConf jobConf,
               String name,
               ElementGraph pipeGraph,
               StepGraph stepGraph,
               Map<String,Tap> sources,
               Map<String,Tap> sinks,
               Map<String,Tap> traps)

Flow

protected Flow(Map<Object,Object> properties,
               JobConf jobConf,
               String name,
               StepGraph stepGraph,
               Map<String,Tap> sources,
               Map<String,Tap> sinks,
               Map<String,Tap> traps)
Method Detail

setPreserveTemporaryFiles

public static void setPreserveTemporaryFiles(Map<Object,Object> properties,
                                             boolean preserveTemporaryFiles)
Property preserveTemporaryFiles forces the Flow instance to keep any temporary intermediate data sets. Useful for debugging. Defaults to false.

Parameters:
properties - of type Map
preserveTemporaryFiles - of type boolean

getPreserveTemporaryFiles

public static boolean getPreserveTemporaryFiles(Map<Object,Object> properties)
Returns property preserveTemporaryFiles.

Parameters:
properties - of type Map
Returns:
a boolean

setStopJobsOnExit

public static void setStopJobsOnExit(Map<Object,Object> properties,
                                     boolean stopJobsOnExit)
Property stopJobsOnExit will tell the Flow to add a JVM shutdown hook that will kill all running processes if the underlying computing system supports it. Defaults to true.

Parameters:
properties - of type Map
stopJobsOnExit - of type boolean
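
For readers unfamiliar with JVM shutdown hooks, the following is a minimal sketch of the mechanism using only the standard Runtime API. ShutdownHookSketch and registerStopHook are hypothetical names, and the Runnable stands in for whatever stop logic the Flow installs; how Flow registers its own hook is not shown in this document.

```java
class ShutdownHookSketch {
    // Registers a hook that runs the given action when the JVM exits.
    // The hook thread is returned so the caller can deregister it later.
    static Thread registerStopHook(Runnable stopAction) {
        Thread hook = new Thread(stopAction, "flow-stop-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerStopHook(new Runnable() {
            public void run() {
                // Placeholder for killing any jobs that are still running.
                System.out.println("stopping running jobs");
            }
        });
        System.out.println("main finished");
    }
}
```

A hook registered this way runs on normal exit and on interruption such as Ctrl-C, but not if the JVM is killed forcibly.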

getStopJobsOnExit

public static boolean getStopJobsOnExit(Map<Object,Object> properties)
Returns property stopJobsOnExit.

Parameters:
properties - of type Map
Returns:
a boolean

setJobPollingInterval

public static void setJobPollingInterval(Map<Object,Object> properties,
                                         long interval)
Property jobPollingInterval will set the time to wait between polling the remote server for the status of a job. The default value is 5000 msec (5 seconds).

Parameters:
properties - of type Map
interval - of type long

getJobPollingInterval

public static long getJobPollingInterval(Map<Object,Object> properties)
Returns property jobPollingInterval. The default is 5000 (5 sec).

Parameters:
properties - of type Map
Returns:
a long
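
The static property accessors all follow the same pattern: a value is written into a shared Map<Object,Object> and read back with a default. The sketch below imitates that pattern for jobPollingInterval without using Cascading. The key string is a placeholder (the real key is not given in this document), and storing the value as a String is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class FlowPropertiesSketch {
    // Placeholder key; the key Cascading actually uses is not stated here.
    static final String POLLING_KEY = "example.flow.job.pollinginterval";

    static void setJobPollingInterval(Map<Object, Object> properties, long interval) {
        // Stored as a String so the map can be copied into text-based configuration.
        properties.put(POLLING_KEY, Long.toString(interval));
    }

    static long getJobPollingInterval(Map<Object, Object> properties) {
        Object value = properties.get(POLLING_KEY);
        // Fall back to the documented default of 5000 msec when unset.
        return value == null ? 5000L : Long.parseLong(value.toString());
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<Object, Object>();
        System.out.println(getJobPollingInterval(properties)); // prints 5000
        setJobPollingInterval(properties, 1000L);
        System.out.println(getJobPollingInterval(properties)); // prints 1000
    }
}
```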

getJobPollingInterval

public static long getJobPollingInterval(JobConf jobConf)

getName

public String getName()
Method getName returns the name of this Flow object.

Returns:
the name (type String) of this Flow object.

setName

protected void setName(String name)

getID

public String getID()
Method getID returns the ID of this Flow object.

The ID value is a long HEX String used to identify this instance globally. Subsequent Flow instances created with identical parameters will not return the same ID.

Returns:
the ID (type String) of this Flow object.

getSubmitPriority

public int getSubmitPriority()
Method getSubmitPriority returns the submitPriority of this Flow object.

10 is lowest, 1 is the highest, 5 is the default.

Returns:
the submitPriority (type int) of this Flow object.

setSubmitPriority

public void setSubmitPriority(int submitPriority)
Method setSubmitPriority sets the submitPriority of this Flow object.

10 is lowest, 1 is the highest, 5 is the default.

Parameters:
submitPriority - the submitPriority of this Flow object.

setSources

protected void setSources(Map<String,Tap> sources)

setSinks

protected void setSinks(Map<String,Tap> sinks)

setTraps

protected void setTraps(Map<String,Tap> traps)

setStepGraph

protected void setStepGraph(StepGraph stepGraph)

getJobConf

public JobConf getJobConf()
Method getJobConf returns the jobConf of this Flow object.

Returns:
the jobConf (type JobConf) of this Flow object.

setProperty

public void setProperty(String key,
                        String value)
Method setProperty sets the given key and value on the underlying properties system.

Parameters:
key - of type String
value - of type String

getProperty

public String getProperty(String key)
Method getProperty returns the value associated with the given key from the underlying properties system.

Parameters:
key - of type String
Returns:
String

getFlowStats

public FlowStats getFlowStats()
Method getFlowStats returns the flowStats of this Flow object.

Returns:
the flowStats (type FlowStats) of this Flow object.

hasListeners

public boolean hasListeners()
Method hasListeners returns true if FlowListener instances have been registered.

Returns:
boolean

addListener

public void addListener(FlowListener flowListener)
Method addListener registers the given flowListener with this instance.

Parameters:
flowListener - of type FlowListener

removeListener

public boolean removeListener(FlowListener flowListener)
Method removeListener removes the given flowListener from this instance.

Parameters:
flowListener - of type FlowListener
Returns:
true if the listener was removed

getSources

public Map<String,Tap> getSources()
Method getSources returns the sources of this Flow object.

Returns:
the sources (type Map) of this Flow object.

getSourcesCollection

@DependencyIncoming
public Collection<Tap> getSourcesCollection()
Method getSourcesCollection returns a Collection of source Taps for this Flow object.

Returns:
the sourcesCollection (type Collection) of this Flow object.

getSinks

public Map<String,Tap> getSinks()
Method getSinks returns the sinks of this Flow object.

Returns:
the sinks (type Map) of this Flow object.

getSinksCollection

@DependencyOutgoing
public Collection<Tap> getSinksCollection()
Method getSinksCollection returns a Collection of sink Taps for this Flow object.

Returns:
the sinksCollection (type Collection) of this Flow object.

getTraps

public Map<String,Tap> getTraps()
Method getTraps returns the traps of this Flow object.

Returns:
the traps (type Map) of this Flow object.

getTrapsCollection

public Collection<Tap> getTrapsCollection()
Method getTrapsCollection returns a Collection of trap Taps for this Flow object.

Returns:
the trapsCollection (type Collection) of this Flow object.

getSink

public Tap getSink()
Method getSink returns the first sink of this Flow object.

Returns:
the sink (type Tap) of this Flow object.

isPreserveTemporaryFiles

public boolean isPreserveTemporaryFiles()
Method isPreserveTemporaryFiles returns false if temporary files will be cleaned when this Flow completes.

Returns:
the preserveTemporaryFiles (type boolean) of this Flow object.

isStopJobsOnExit

public boolean isStopJobsOnExit()
Method isStopJobsOnExit returns the stopJobsOnExit of this Flow object. Defaults to true.

Returns:
the stopJobsOnExit (type boolean) of this Flow object.

getFlowSkipStrategy

public FlowSkipStrategy getFlowSkipStrategy()
Method getFlowSkipStrategy returns the current FlowSkipStrategy used by this Flow.

Returns:
FlowSkipStrategy

setFlowSkipStrategy

public FlowSkipStrategy setFlowSkipStrategy(FlowSkipStrategy flowSkipStrategy)
Method setFlowSkipStrategy sets a new FlowSkipStrategy and returns the strategy that was current before the call.

FlowSkipStrategy instances define when a Flow instance should be skipped. The default strategy is FlowSkipIfSinkStale. An alternative strategy would be FlowSkipIfSinkExists.

A FlowSkipStrategy is not consulted when a Flow is executed directly through start() or complete(); it is consulted only when the Flow is executed through a Cascade instance.

Parameters:
flowSkipStrategy - of type FlowSkipStrategy
Returns:
FlowSkipStrategy
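
The strategy swap can be sketched without Cascading as follows. SkipStrategy and SkipStrategySketch are hypothetical stand-ins, the two rules shown are only plausible readings of an "exists" and a "stale" strategy, and the sketch assumes the setter returns the strategy that was in effect before the call.

```java
// Hypothetical stand-in for a skip strategy; not the Cascading interface.
interface SkipStrategy {
    boolean skipFlow(boolean sinkExists, boolean sinkStale);
}

class SkipStrategySketch {
    // Skip whenever the sink is already present.
    static final SkipStrategy SKIP_IF_SINK_EXISTS = new SkipStrategy() {
        public boolean skipFlow(boolean sinkExists, boolean sinkStale) {
            return sinkExists;
        }
    };

    // Skip only when the sink is present and up to date (assumed rule).
    static final SkipStrategy SKIP_UNLESS_STALE = new SkipStrategy() {
        public boolean skipFlow(boolean sinkExists, boolean sinkStale) {
            return sinkExists && !sinkStale;
        }
    };

    private SkipStrategy current = SKIP_UNLESS_STALE;

    // Installs a new strategy and hands back the one it replaced.
    SkipStrategy setSkipStrategy(SkipStrategy next) {
        SkipStrategy previous = current;
        current = next;
        return previous;
    }

    boolean isSkip(boolean sinkExists, boolean sinkStale) {
        return current.skipFlow(sinkExists, sinkStale);
    }

    public static void main(String[] args) {
        SkipStrategySketch flow = new SkipStrategySketch();
        System.out.println(flow.isSkip(true, true));  // prints false: a stale sink is rebuilt
        SkipStrategy previous = flow.setSkipStrategy(SKIP_IF_SINK_EXISTS);
        System.out.println(flow.isSkip(true, true));  // prints true: an existing sink is enough
        flow.setSkipStrategy(previous);               // restore the earlier strategy
    }
}
```

Returning the replaced strategy lets a caller override the behavior temporarily and restore it afterwards.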

isSkipFlow

public boolean isSkipFlow()
                   throws IOException
Method isSkipFlow returns true if the parent Cascade should skip this Flow instance. True is returned if the current FlowSkipStrategy returns true.

Returns:
the skipFlow (type boolean) of this Flow object.
Throws:
IOException - when

areSinksStale

public boolean areSinksStale()
                      throws IOException
Method areSinksStale returns true if any of the sinks referenced are out of date in relation to the sources, or if Tap.isReplace() returns true for any sink.

Returns:
boolean
Throws:
IOException - when

areSourcesNewer

public boolean areSourcesNewer(long sinkModified)
                        throws IOException
Method areSourcesNewer returns true if any source is newer than the given sinkModified date value.

Parameters:
sinkModified - of type long
Returns:
boolean
Throws:
IOException - when

getSinkModified

public long getSinkModified()
                     throws IOException
Method getSinkModified returns the youngest modified date of any sink Tap managed by this Flow instance.

If zero (0) is returned, at least one of the sink resources does not exist. If minus one (-1) is returned, at least one of the sinks is marked for delete (Tap.isReplace() returns true).

Returns:
the sinkModified (type long) of this Flow object.
Throws:
IOException - when
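
A caller might interpret the sentinel values together with areSourcesNewer roughly as sketched below. This is a stand-alone illustration that works on plain timestamps rather than Tap instances: StalenessSketch and shouldRebuild are hypothetical names, and treating a missing sink as needing a rebuild is an assumption.

```java
class StalenessSketch {
    // True if any source timestamp is later than the sink timestamp.
    static boolean areSourcesNewer(long[] sourceModified, long sinkModified) {
        for (long modified : sourceModified) {
            if (modified > sinkModified) {
                return true;
            }
        }
        return false;
    }

    // Decides whether the sinks should be rebuilt from the sources.
    static boolean shouldRebuild(long sinkModified, long[] sourceModified) {
        if (sinkModified == 0L) {
            return true; // at least one sink resource does not exist
        }
        if (sinkModified == -1L) {
            return true; // at least one sink is marked for delete
        }
        return areSourcesNewer(sourceModified, sinkModified);
    }

    public static void main(String[] args) {
        long[] sources = {1000L, 2000L};
        System.out.println(shouldRebuild(0L, sources));    // prints true
        System.out.println(shouldRebuild(-1L, sources));   // prints true
        System.out.println(shouldRebuild(1500L, sources)); // prints true, 2000 > 1500
        System.out.println(shouldRebuild(2500L, sources)); // prints false
    }
}
```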

getSteps

public List<FlowStep> getSteps()
Method getSteps returns the steps of this Flow object. They will be in topological order.

Returns:
the steps (type List) of this Flow object.
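
Topological order means every step appears after all of the steps it depends on. The sketch below shows one standard way to compute such an order (Kahn's algorithm) over step names. It is not the Cascading implementation; TopologicalOrderSketch is a hypothetical name, and the input map must list every step as a key, including steps with no prerequisites.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopologicalOrderSketch {
    // prerequisites maps each step to the steps that must finish before it.
    static List<String> topologicalOrder(Map<String, List<String>> prerequisites) {
        Map<String, Integer> remaining = new LinkedHashMap<String, Integer>();
        Map<String, List<String>> dependents = new HashMap<String, List<String>>();
        for (Map.Entry<String, List<String>> entry : prerequisites.entrySet()) {
            remaining.put(entry.getKey(), entry.getValue().size());
            for (String prerequisite : entry.getValue()) {
                List<String> list = dependents.get(prerequisite);
                if (list == null) {
                    list = new ArrayList<String>();
                    dependents.put(prerequisite, list);
                }
                list.add(entry.getKey());
            }
        }
        // Start with the steps that have no unmet prerequisites.
        Deque<String> ready = new ArrayDeque<String>();
        for (Map.Entry<String, Integer> entry : remaining.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<String>();
        while (!ready.isEmpty()) {
            String step = ready.remove();
            order.add(step);
            List<String> next = dependents.get(step);
            if (next == null) {
                continue;
            }
            for (String dependent : next) {
                int left = remaining.get(dependent) - 1;
                remaining.put(dependent, left);
                if (left == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("cycle or unknown prerequisite detected");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> steps = new LinkedHashMap<String, List<String>>();
        steps.put("parse", Collections.<String>emptyList());
        steps.put("count", Arrays.asList("parse"));
        steps.put("join", Arrays.asList("parse"));
        steps.put("report", Arrays.asList("count", "join"));
        System.out.println(topologicalOrder(steps)); // prints [parse, count, join, report]
    }
}
```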

prepare

@ProcessPrepare
public void prepare()
Method prepare is used by a Cascade to notify the given Flow it should initialize or clear any resources necessary for start() to be called successfully.

Specifically, this implementation calls deleteSinksIfNotUpdate().

Throws:
IOException - when

start

@ProcessStart
public void start()
Method start begins the execution of this Flow instance. It will return immediately. Use the method complete() to block until this Flow completes.
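
The relationship between a non-blocking start() and a blocking complete() can be sketched with a plain thread. StartCompleteSketch is a hypothetical stand-in whose run() does no real work; it only illustrates the calling pattern and is not how Flow is implemented.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class StartCompleteSketch implements Runnable {
    private Thread thread;
    final AtomicBoolean finished = new AtomicBoolean(false);

    public void run() {
        // Placeholder for submitting and monitoring the underlying jobs.
        finished.set(true);
    }

    // Returns immediately; the work continues on a background thread.
    synchronized void start() {
        if (thread != null) {
            return; // already started
        }
        thread = new Thread(this, "flow-sketch");
        thread.start();
    }

    // Starts the work if necessary, then blocks until it has finished.
    void complete() {
        start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        StartCompleteSketch flow = new StartCompleteSketch();
        flow.start();     // does not wait for the work
        flow.complete();  // waits for the work
        System.out.println(flow.finished.get()); // prints true
    }
}
```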


stop

@ProcessStop
public void stop()
Method stop stops all running jobs, killing any currently executing.


complete

@ProcessComplete
public void complete()
Method complete starts the current Flow instance if it has not been previously started, then blocks until completion.


cleanup

@ProcessCleanup
public void cleanup()

openSource

public TupleEntryIterator openSource()
                              throws IOException
Method openSource opens the first source Tap.

Returns:
TupleEntryIterator
Throws:
IOException - when

openSource

public TupleEntryIterator openSource(String name)
                              throws IOException
Method openSource opens the named source Tap.

Parameters:
name - of type String
Returns:
TupleEntryIterator
Throws:
IOException - when

openSink

public TupleEntryIterator openSink()
                            throws IOException
Method openSink opens the first sink Tap.

Returns:
TupleEntryIterator
Throws:
IOException - when

openSink

public TupleEntryIterator openSink(String name)
                            throws IOException
Method openSink opens the named sink Tap.

Parameters:
name - of type String
Returns:
TupleEntryIterator
Throws:
IOException - when

openTrap

public TupleEntryIterator openTrap()
                            throws IOException
Method openTrap opens the first trap Tap.

Returns:
TupleEntryIterator
Throws:
IOException - when

openTrap

public TupleEntryIterator openTrap(String name)
                            throws IOException
Method openTrap opens the named trap Tap.

Parameters:
name - of type String
Returns:
TupleEntryIterator
Throws:
IOException - when

deleteSinks

public void deleteSinks()
                 throws IOException
Method deleteSinks deletes all sinks, whether or not they are configured for SinkMode.UPDATE.

Use with caution.

Throws:
IOException - when
See Also:
deleteSinksIfNotUpdate()

deleteSinksIfNotAppend

@Deprecated
public void deleteSinksIfNotAppend()
                            throws IOException
Deprecated. 

Method deleteSinksIfNotAppend deletes all sinks if they are not configured with the SinkMode.APPEND flag.

Typically used by a Cascade before executing the flow if the sinks are stale.

Use with caution.

Throws:
IOException - when

deleteSinksIfNotUpdate

public void deleteSinksIfNotUpdate()
                            throws IOException
Method deleteSinksIfNotUpdate deletes all sinks if they are not configured with the SinkMode.UPDATE flag.

Typically used by a Cascade before executing the flow if the sinks are stale.

Use with caution.

Throws:
IOException - when

tapPathExists

public boolean tapPathExists(Tap tap)
                      throws IOException
Method tapPathExists returns true if the resource represented by the given Tap instance exists.

Parameters:
tap - of type Tap
Returns:
boolean
Throws:
IOException - when

openTapForRead

public TupleEntryIterator openTapForRead(Tap tap)
                                  throws IOException
Method openTapForRead returns a TupleEntryIterator for the given Tap instance.

Parameters:
tap - of type Tap
Returns:
TupleEntryIterator
Throws:
IOException - when there is an error opening the resource

openTapForWrite

public TupleEntryCollector openTapForWrite(Tap tap)
                                    throws IOException
Method openTapForWrite returns a TupleEntryCollector for the given Tap instance.

Parameters:
tap - of type Tap
Returns:
TupleEntryCollector
Throws:
IOException - when there is an error opening the resource

jobsAreLocal

public boolean jobsAreLocal()
Method jobsAreLocal returns true if all jobs are executed in-process as a single map and reduce task.

Returns:
boolean

run

public void run()
Method run implements the Runnable run method and should not be called by users.

Specified by:
run in interface Runnable

toString

public String toString()
Overrides:
toString in class Object

writeDOT

public void writeDOT(String filename)
Method writeDOT writes this Flow instance to the given filename as a DOT file for import into a graphics package.

Parameters:
filename - of type String
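
For readers who have not seen the DOT format, the sketch below renders a few directed edges as DOT text. It only illustrates the file format; the node names and layout that writeDOT actually emits are not described in this document, and DotSketch and toDot are hypothetical names. Using a map keeps the sketch short but allows only one outgoing edge per node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DotSketch {
    // Renders directed edges as DOT text; each map entry is one "from -> to" edge.
    static String toDot(String graphName, Map<String, String> edges) {
        StringBuilder dot = new StringBuilder();
        dot.append("digraph ").append(graphName).append(" {\n");
        for (Map.Entry<String, String> edge : edges.entrySet()) {
            dot.append("  \"").append(edge.getKey()).append("\" -> \"")
               .append(edge.getValue()).append("\";\n");
        }
        dot.append("}\n");
        return dot.toString();
    }

    public static void main(String[] args) {
        Map<String, String> edges = new LinkedHashMap<String, String>();
        edges.put("source", "pipe");
        edges.put("pipe", "sink");
        System.out.print(toDot("flow", edges));
    }
}
```

A file in this format can be rendered with Graphviz, for example with dot -Tpng flow.dot -o flow.png.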

writeStepsDOT

public void writeStepsDOT(String filename)
Method writeStepsDOT writes this Flow step graph to the given filename as a DOT file for import into a graphics package.

Parameters:
filename - of type String

getHolder

public Flow.FlowHolder getHolder()
Used to return a simple wrapper for use as an edge in a graph where there can only be one instance of every edge.

Returns:
FlowHolder


Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.