Cascading supports pluggable planners that
allow it to execute on differing platforms. Planners are invoked by a
FlowConnector subclass. Currently,
three planners are provided, as described below:
A "local" mode planner is provided for running Cascading completely
in memory on the current computer. This allows for fast
execution of Flows against local files or any other compatible data sources.
The local mode planner and platform were not designed to
scale beyond available memory, CPU, or disk on the current
machine. Thus any memory-intensive processes that use
HashJoin are likely to fail against
moderately large files.
Local mode is useful for development, testing, and interactive data exploration against sample sets.
HadoopFlowConnector provides a planner for running Cascading on an Apache Hadoop 1
cluster. This allows Cascading to execute against extremely
large data sets over a cluster of computing nodes.
A third planner is provided for running Cascading on an Apache Hadoop 2
cluster. Its connector class is roughly equivalent to the above
HadoopFlowConnector, except that it uses Hadoop
2-specific properties and is compiled against the Hadoop 2 API.
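The local-mode note above warns that processes using HashJoin are memory-intensive. The following minimal sketch shows the general hash-join technique (it is not Cascading's implementation, and the class and method names are hypothetical): one side is loaded entirely into an in-memory map, and the other side is streamed past it. Memory use therefore grows with the size of the buffered side, which is why such a join can exhaust a single machine on moderately large inputs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an in-memory hash join; not Cascading's code.
public class HashJoinSketch {

    // Inner-joins left rows to right rows on the key in column 0.
    // The entire right side is buffered in a HashMap, so memory grows with its size.
    public static List<String[]> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] row : right) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }

        List<String[]> out = new ArrayList<>();
        // The left side is streamed: each row is probed against the in-memory index.
        for (String[] l : left) {
            List<String[]> matches = index.get(l[0]);
            if (matches == null) {
                continue; // inner join: unmatched left rows are dropped
            }
            for (String[] r : matches) {
                out.add(new String[] {l[0], l[1], r[1]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = List.of(
            new String[] {"1", "alice"},
            new String[] {"2", "bob"},
            new String[] {"3", "carol"});
        List<String[]> right = List.of(
            new String[] {"1", "NY"},
            new String[] {"3", "SF"});
        for (String[] row : join(left, right)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Running main prints `1,alice,NY` followed by `3,carol,SF`. Only the buffered (right) side must fit in memory; the streamed side can be arbitrarily long, so a sketch like this behaves well only when the buffered input is small.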
Cascading's support for pluggable planners allows a
pipe assembly to be executed on an arbitrary platform, using
platform-specific Tap and Scheme classes that hide the platform-related
I/O details from the developer. For example, Hadoop uses
org.apache.hadoop.mapred.InputFormat to read
data, but local mode simply reads from a
java.io.FileInputStream. This detail is hidden
from developers unless they are creating custom Tap and Scheme classes.
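The separation between platform-independent logic and platform-specific I/O can be sketched in plain Java. Everything here is hypothetical and far simpler than Cascading's real Tap, Scheme, and FlowConnector classes: a `SourceTap` interface hides how records are read, `LocalFileTap` reads lines through a java.io.FileInputStream in the spirit of local mode, and `InMemoryTap` stands in for some other platform's input. The same `countWords` logic runs unchanged against either one.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of platform-specific taps behind one interface; not Cascading's API.
public class PluggableTapSketch {

    // Hides where and how records are read.
    interface SourceTap {
        List<String> readLines();
    }

    // Local-mode style tap: reads a file through java.io.FileInputStream.
    static class LocalFileTap implements SourceTap {
        private final String path;

        LocalFileTap(String path) { this.path = path; }

        @Override
        public List<String> readLines() {
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return lines;
        }
    }

    // Stand-in for another platform's tap; a real one would wrap that platform's input API.
    static class InMemoryTap implements SourceTap {
        private final List<String> lines;

        InMemoryTap(List<String> lines) { this.lines = lines; }

        @Override
        public List<String> readLines() { return lines; }
    }

    // Platform-independent logic: it only ever sees the SourceTap interface.
    static Map<String, Integer> countWords(SourceTap source) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : source.readLines()) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Choose the "platform" at startup; the counting logic does not change.
        SourceTap tap = args.length > 0
            ? new LocalFileTap(args[0])
            : new InMemoryTap(List.of("a b a", "b c"));
        System.out.println(countWords(tap));
    }
}
```

Run with no arguments, main uses the in-memory tap and prints `{a=2, b=2, c=1}`; pass a file path to run the same logic through `LocalFileTap`. Only the tap implementation knows about the underlying I/O, which mirrors how the guide describes the developer being shielded from InputFormat versus FileInputStream details.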
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.