3.9 Cascades


A Cascade allows multiple Flow instances to be executed as a single logical unit. If there are dependencies between the Flows, they are executed in the correct order. Further, a Cascade behaves like an Ant build or a Unix makefile: it executes only those Flows whose sinks are stale (i.e., whose output data is older than the input data). For more on this, see Skipping Flows.
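
The make-style staleness test described above can be sketched as a timestamp comparison. This is a minimal illustration, not Cascading's implementation: the class StaleSinkCheck and the method isSinkStale are hypothetical names, and the convention that a missing sink has a modification time of 0 is an assumption made for this sketch.

```java
import java.util.List;

final class StaleSinkCheck {
    // Hypothetical helper: decides whether a sink must be rebuilt.
    // Assumption: a sink that does not exist yet is reported with a
    // modification time of 0 and is therefore always stale.
    static boolean isSinkStale(long sinkModified, List<Long> sourceModified) {
        if (sinkModified == 0L) {
            return true;
        }
        // The sink is stale if any source was modified after it was written.
        for (long source : sourceModified) {
            if (source > sinkModified) {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule, a Flow whose sink is newer than all of its sources would be skipped, while a Flow with even one newer source would run again.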

Example 3.15. Creating a new Cascade

// flowFirst, flowSecond and flowThird are existing Flow instances
CascadeConnector connector = new CascadeConnector();
Cascade cascade = connector.connect( flowFirst, flowSecond, flowThird );

When passing Flows to the CascadeConnector, order is not important. The CascadeConnector automatically identifies the dependencies between the given Flows and creates a scheduler that starts each Flow as its data sources become available. If two or more Flow instances have no interdependencies, they are submitted together so that they can execute in parallel.
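
The dependency detection described above can be approximated by matching each Flow's sources against the sinks of the other Flows. The following is a simplified sketch under stated assumptions, not the CascadeConnector's actual algorithm: FlowSpec, FlowOrderSketch and schedule are hypothetical names, Flows are reduced to sets of path strings, and Flows are grouped into "waves" rather than started individually as each source becomes available.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FlowOrderSketch {
    // Hypothetical stand-in for a Flow: a name plus the paths it reads and writes.
    static final class FlowSpec {
        final String name;
        final Set<String> sources;
        final Set<String> sinks;

        FlowSpec(String name, Set<String> sources, Set<String> sinks) {
            this.name = name;
            this.sources = sources;
            this.sinks = sinks;
        }
    }

    // Groups flows into waves; every flow in a wave has all of its sources
    // available, so the flows within one wave could run in parallel.
    static List<List<String>> schedule(List<FlowSpec> flows) {
        // Paths that some flow still has to produce. Any source not in this
        // set is assumed to exist before the first flow starts.
        Set<String> pending = new HashSet<>();
        for (FlowSpec flow : flows) {
            pending.addAll(flow.sinks);
        }

        List<FlowSpec> remaining = new ArrayList<>(flows);
        List<List<String>> waves = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // A flow is ready when none of its sources is still pending.
            List<FlowSpec> ready = new ArrayList<>();
            for (FlowSpec flow : remaining) {
                boolean blocked = false;
                for (String source : flow.sources) {
                    if (pending.contains(source)) {
                        blocked = true;
                        break;
                    }
                }
                if (!blocked) {
                    ready.add(flow);
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("cyclic dependency between flows");
            }
            // Record the wave and mark its sinks as produced.
            List<String> wave = new ArrayList<>();
            for (FlowSpec flow : ready) {
                wave.add(flow.name);
                pending.removeAll(flow.sinks);
            }
            remaining.removeAll(ready);
            waves.add(wave);
        }
        return waves;
    }
}
```

Because the order is derived from the source and sink paths alone, the order in which the Flows are passed in does not affect which Flow runs before which.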

For more information, see the section on Topological Scheduling.

If an instance of cascading.flow.FlowSkipStrategy is given to a Cascade instance (via the Cascade.setFlowSkipStrategy() method), it is consulted for every Flow instance managed by that Cascade, and all skip strategies on those Flow instances are ignored. For more information on skip strategies, see Skipping Flows.
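
The precedence rule above can be sketched as follows. This is an illustration only: SkipStrategy is a hypothetical interface loosely modeled on cascading.flow.FlowSkipStrategy, effectiveStrategy is a hypothetical helper, and the fallback to the Flow's own strategy when the Cascade has none is an assumption of this sketch.

```java
final class SkipPrecedenceSketch {
    // Hypothetical stand-in for cascading.flow.FlowSkipStrategy.
    interface SkipStrategy {
        boolean skipFlow(String flowName);
    }

    // Returns the strategy that decides whether a Flow is skipped:
    // the Cascade-level strategy when one is set, otherwise (by assumption)
    // the strategy configured on the Flow itself.
    static SkipStrategy effectiveStrategy(SkipStrategy cascadeStrategy, SkipStrategy flowStrategy) {
        return cascadeStrategy != null ? cascadeStrategy : flowStrategy;
    }
}
```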
