To use Cascading, it is not strictly necessary to create custom Operations. There are a number of Operations in the Cascading library that can be combined into very robust applications. In the same way you can chain sed, grep, sort, uniq, awk, etc. in Unix, you can chain existing Cascading Operations. But developing custom Operations is also very simple in Cascading.
There are four kinds of Operations: Function, Filter, Aggregator, and Buffer.
All Operations operate on an input argument Tuple, and all Operations other than Filter may return zero or more Tuple object results. That is, a Function can parse a string and return a new Tuple for every value parsed out (one Tuple for each 'word'), or it may create a single Tuple with every parsed value as an element in the Tuple object (one Tuple with "first-name" and "last-name" fields).
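To make the two result styles concrete, here is a minimal plain-Java sketch that does not use the Cascading API at all. A "tuple" is modeled as a List<Object>, and the class and method names (ParseSketch, oneTuplePerWord, oneTupleWithFields) are hypothetical. It only illustrates one input producing zero, one, or many result tuples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Cascading-free model: each "tuple" is a List<Object>,
// and an operation returns zero or more of them for one input value.
final class ParseSketch {

    // Style 1: one single-field tuple (a 'word') per value parsed out.
    static List<List<Object>> oneTuplePerWord(String line) {
        List<List<Object>> results = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) { // a blank line yields a single empty token
                results.add(List.<Object>of(word));
            }
        }
        return results; // may be empty: zero results is allowed
    }

    // Style 2: one tuple holding every parsed value as an element,
    // i.e. ("first-name", "last-name").
    static List<List<Object>> oneTupleWithFields(String fullName) {
        String[] parts = fullName.trim().split("\\s+", 2);
        List<List<Object>> results = new ArrayList<>();
        if (parts.length == 2) {
            results.add(List.<Object>of(parts[0], parts[1]));
        }
        return results; // empty when no last name could be parsed
    }
}
```

Calling oneTuplePerWord("the quick fox") yields three one-element tuples, while oneTupleWithFields("Ada Lovelace") yields a single two-element tuple.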
In practice, a Function that returns no results is a Filter, but the Filter type has been optimized and can be combined with "logical" filter Operations like Not, And, and Or.
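The distinction can be sketched in plain Java, assuming (for illustration only) that a filter is a predicate answering "should this tuple be removed?". FilterSketch, IS_BLANK, IS_STOP_WORD, and REMOVE are hypothetical names, and java.util.function.Predicate's or() and negate() merely stand in for logical filter Operations; this is not how Cascading implements them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: a "filter" answers one question per tuple,
// "should this tuple be removed?", and produces no new tuples.
final class FilterSketch {

    static final Predicate<List<Object>> IS_BLANK =
            t -> t.get(0).toString().trim().isEmpty();
    static final Predicate<List<Object>> IS_STOP_WORD =
            t -> "the".equals(t.get(0));

    // Logical combination in the spirit of Or; negate() plays the role of Not.
    static final Predicate<List<Object>> REMOVE = IS_BLANK.or(IS_STOP_WORD);

    // A function that emits zero results for unwanted tuples behaves like a filter...
    static List<List<Object>> functionThatDrops(List<Object> tuple,
                                                Predicate<List<Object>> remove) {
        List<List<Object>> results = new ArrayList<>();
        if (!remove.test(tuple)) {
            results.add(tuple);
        }
        return results;
    }

    // ...but a dedicated filter only needs a boolean, which composes cleanly.
    static List<List<Object>> applyFilter(List<List<Object>> tuples,
                                          Predicate<List<Object>> remove) {
        List<List<Object>> kept = new ArrayList<>();
        for (List<Object> t : tuples) {
            if (!remove.test(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Because each filter reduces to a yes/no decision, combining them needs no knowledge of the tuples they would otherwise emit, which is the kind of composition the text attributes to the Filter type.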
During runtime, Operations actually receive arguments as an instance of the TupleEntry object. The TupleEntry object holds both an instance of Fields and the current Tuple that the Fields object defines fields for.
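A TupleEntry can be pictured as field names paired with positional values. The hypothetical EntrySketch class below models that idea with two lists so that a value can be looked up by field name; it is an illustration of the concept, not Cascading's TupleEntry class or its API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a TupleEntry: field names (the "Fields")
// paired with the current values (the "Tuple") they describe.
final class EntrySketch {
    private final List<String> fields;
    private final List<Object> tuple;

    EntrySketch(List<String> fields, List<Object> tuple) {
        if (fields.size() != tuple.size()) {
            throw new IllegalArgumentException(
                    "expected " + fields.size() + " values, got " + tuple.size());
        }
        this.fields = new ArrayList<>(fields);
        this.tuple = new ArrayList<>(tuple); // plain copy, so values may be null
    }

    // Resolve a field name to its position, then read the value there.
    Object get(String fieldName) {
        int pos = fields.indexOf(fieldName);
        if (pos < 0) {
            throw new IllegalArgumentException("unknown field: " + fieldName);
        }
        return tuple.get(pos);
    }
}
```

With fields ("first-name", "last-name") and the tuple ("Ada", "Lovelace"), get("last-name") resolves to position 1 and returns "Lovelace".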
All Operations, other than Filter, must declare result Fields. For example, if a Function was written to parse words out of a String and return a new Tuple for each word, this Function must declare that it intends to return a Tuple with one field named "word". If the Function mistakenly returns more values in the Tuple than just the 'word', the process will fail. Operations that do return an arbitrary number of values in a result Tuple may declare Fields.UNKNOWN.
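The declare-then-check contract can be sketched as follows. DeclaredFieldsSketch and its methods are hypothetical, and its unknown() factory only mimics the intent of declaring Fields.UNKNOWN; Cascading's own checks are not shown here.

```java
import java.util.List;

// Hypothetical model of an operation that declares its result fields
// up front and has every result checked against that declaration.
final class DeclaredFieldsSketch {
    private final List<String> declared;
    private final boolean anyArity; // true: accept any number of values

    private DeclaredFieldsSketch(List<String> declared, boolean anyArity) {
        this.declared = declared;
        this.anyArity = anyArity;
    }

    // e.g. declaring("word") for a word-splitting function
    static DeclaredFieldsSketch declaring(String... names) {
        return new DeclaredFieldsSketch(List.of(names), false);
    }

    // Mimics the intent of declaring Fields.UNKNOWN.
    static DeclaredFieldsSketch unknown() {
        return new DeclaredFieldsSketch(List.of(), true);
    }

    // Fails when a result carries a different number of values than declared.
    List<Object> checkResult(List<Object> result) {
        if (!anyArity && result.size() != declared.size()) {
            throw new IllegalStateException("declared " + declared
                    + " but result has " + result.size() + " values");
        }
        return result;
    }
}
```

A word splitter declared with declaring("word") passes a one-value result but fails on a two-value one, while an operation declared with unknown() accepts results of any size.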
The Cascading planner always attempts to "fail fast" where possible by checking the field name dependencies between Pipes and Operations, but there are some cases the planner cannot account for.
All Operations must be wrapped by either an Each or an Every pipe instance. The pipe is responsible for passing in an argument Tuple and accepting the result Tuple.
Operations, by default, are "safe". A safe Operation can be executed multiple times on the same Tuple without harm; that is, it has no side effects and is idempotent. If an Operation is not idempotent, its isSafe() method must return false. This value influences how the Cascading planner renders the Flow under certain circumstances.
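Why idempotence matters can be sketched with a toy model. Everything below is hypothetical: SafetySketch, its Op interface with a default isSafe(), and especially feedTwoBranches, which assumes one simple policy (re-run a safe operation for each downstream branch, but run an unsafe one once and share its result). That policy is an assumption chosen for illustration, not a description of what the Cascading planner actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class SafetySketch {
    // Hypothetical operation type; like the text says, operations are safe by default.
    interface Op extends UnaryOperator<String> {
        default boolean isSafe() { return true; }
    }

    // Safe: no side effects, so running it twice on the same input is harmless.
    static final Op UPPER = s -> s.toUpperCase();

    // Not idempotent: every call consumes a new sequence number.
    static final class Numbering implements Op {
        int next = 0;

        @Override
        public String apply(String s) { return (next++) + ":" + s; }

        @Override
        public boolean isSafe() { return false; }
    }

    // Assumed policy for a pipe feeding two branches: a safe op may be
    // re-run per branch; an unsafe op runs once and its result is shared.
    static List<String> feedTwoBranches(Op op, String input) {
        List<String> branches = new ArrayList<>();
        if (op.isSafe()) {
            branches.add(op.apply(input));
            branches.add(op.apply(input));
        } else {
            String once = op.apply(input);
            branches.add(once);
            branches.add(once);
        }
        return branches;
    }
}
```

Under this policy UPPER yields "A" in both branches, and a Numbering instance is invoked only once, so both branches see "0:a" instead of diverging to "0:a" and "1:a".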
Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.