In all the above sections, the cascading.operation.BaseOperation class was subclassed. This class is an implementation of the cascading.operation.Operation interface and provides a few default method implementations. It is not strictly required to extend BaseOperation, but it is very convenient to do so.
When developing custom operations, the developer may need to initialize and destroy a resource. For example, when doing pattern matching, a java.util.regex.Matcher may need to be initialized and used in a thread-safe way. Or a remote connection may need to be opened and eventually closed. But for performance reasons, the operation should not create/destroy the connection for each Tuple or every Tuple group that passes through.
The Operation interface declares two methods, prepare() and cleanup(). In the case of Hadoop and MapReduce, the prepare() and cleanup() methods are called once per Map or Reduce task. prepare() is called before any argument Tuple is passed in, and cleanup() is called after all Tuple arguments have been operated on. In prepare(), the developer can initialize a "context" object that holds, for example, an open socket connection or a Matcher instance; in cleanup(), that context can be released. The "context" is user defined and is the same mechanism used by the Aggregator operation, except the Aggregator is also given the opportunity to initialize and destroy its context via the start() and complete() methods.
If a "context" object is used, its type should be declared as a type parameter in the subclass's class declaration, using Java generics notation.
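The lifecycle described above can be sketched in plain Java. This is a minimal illustration, not Cascading code: SketchOperation, SketchCall, DigitFilter and run() are hypothetical stand-ins, and the real prepare() and cleanup() methods take Cascading-specific arguments that are not reproduced here. The sketch only shows the shape of the idea: the context type is named as a generic parameter, the context is built once before any value arrives, reused for every value, and released once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for illustration only; these are NOT Cascading types.
class ContextLifecycleSketch {

    // Carries the user-defined context for one task; C is the context type.
    static class SketchCall<C> {
        private C context;
        C getContext() { return context; }
        void setContext(C context) { this.context = context; }
    }

    // Simplified contract: prepare once, operate per value, cleanup once.
    interface SketchOperation<C> {
        void prepare(SketchCall<C> call);
        boolean operate(SketchCall<C> call, String value);
        void cleanup(SketchCall<C> call);
    }

    // The context type (Matcher) is declared in the class declaration via generics.
    static class DigitFilter implements SketchOperation<Matcher> {
        int prepareCount = 0;  // counters exist only to make the lifecycle observable
        int cleanupCount = 0;

        @Override
        public void prepare(SketchCall<Matcher> call) {
            prepareCount++;
            // Build the Matcher once; it is reused for every value in this task.
            call.setContext(Pattern.compile("\\d+").matcher(""));
        }

        @Override
        public boolean operate(SketchCall<Matcher> call, String value) {
            // reset(...) points the existing Matcher at the new input
            // instead of allocating a new Matcher per value.
            return call.getContext().reset(value).find();
        }

        @Override
        public void cleanup(SketchCall<Matcher> call) {
            cleanupCount++;
            call.setContext(null); // release the resource held by the context
        }
    }

    // Drives a single "task": one prepare, many operate calls, one cleanup.
    static <C> List<String> run(SketchOperation<C> op, List<String> values) {
        SketchCall<C> call = new SketchCall<>();
        List<String> kept = new ArrayList<>();
        op.prepare(call);
        try {
            for (String value : values) {
                if (op.operate(call, value)) {
                    kept.add(value);
                }
            }
        } finally {
            op.cleanup(call); // runs even if operate() throws
        }
        return kept;
    }
}
```

Calling ContextLifecycleSketch.run(new ContextLifecycleSketch.DigitFilter(), values) keeps only the values that contain at least one digit, while prepare() and cleanup() each run exactly once regardless of how many values pass through. Because each task gets its own context object, the Matcher is never shared between threads in this sketch.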
Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.