In all of the above sections, the cascading.operation.BaseOperation class was subclassed. This class is an implementation of the cascading.operation.Operation interface and provides a few default method implementations. It is not strictly required to extend BaseOperation when implementing this interface, but it is very convenient to do so.
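
The convenience of a base class can be sketched in plain Java. The names below (SketchOperation, SketchBaseOperation, UpperCaseOperation, BaseOperationDemo) are hypothetical stand-ins, not Cascading's API; the real Cascading methods take framework-supplied arguments that this sketch omits. The idea shown is the same, though: the base class supplies no-op defaults, so a subclass overrides only what it needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operation contract (NOT cascading.operation.Operation).
interface SketchOperation {
  void prepare();                                   // once, before any Tuple arrives
  void operate(String input, List<String> output);  // once per Tuple
  void cleanup();                                   // once, after the last Tuple
}

// Hypothetical base class in the spirit of BaseOperation: no-op lifecycle defaults.
abstract class SketchBaseOperation implements SketchOperation {
  @Override public void prepare() { }
  @Override public void cleanup() { }
}

// Needs only per-Tuple logic, so it overrides operate() and nothing else.
class UpperCaseOperation extends SketchBaseOperation {
  @Override public void operate(String input, List<String> output) {
    output.add(input.toUpperCase());
  }
}

class BaseOperationDemo {
  // Drives an operation the way a task would: prepare, operate per input, cleanup.
  static List<String> run(SketchOperation op, List<String> inputs) {
    List<String> output = new ArrayList<>();
    op.prepare();
    for (String input : inputs) op.operate(input, output);
    op.cleanup();
    return output;
  }
}
```

Implementing SketchOperation directly would force every operation to write empty prepare() and cleanup() bodies; the base class removes that boilerplate.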
When developing a custom operation, you may need to initialize a resource and later release it. For example, a pattern-matching operation might create a java.util.regex.Matcher, which must be used in a thread-safe way because Matcher instances are not safe for concurrent use. Or an operation might need to open, and eventually close, a remote connection. For performance reasons, the operation should not create and destroy such a resource for every Tuple or Tuple group that passes through.
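
A minimal sketch of the Matcher case, using only java.util.regex: the Pattern is compiled once and a single Matcher is reused for every input via reset(), rather than allocating a new one per Tuple. The class name ReusableMatcher and the word pattern are hypothetical choices for illustration. Pattern instances are safe to share across threads, while each thread (for example, each task) should hold its own Matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one Matcher per instance, reused across many inputs.
class ReusableMatcher {
  // Compiled once; Pattern is immutable and safe to share between threads.
  private static final Pattern WORD = Pattern.compile("[a-z]+");

  // Created once; Matcher is NOT thread-safe, so keep one per thread/task.
  private final Matcher matcher = WORD.matcher("");

  // Counts runs of lowercase letters in one input line, reusing the same Matcher.
  int countWords(String line) {
    matcher.reset(line);
    int count = 0;
    while (matcher.find()) count++;
    return count;
  }
}
```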
For this reason, the Operation interface declares two methods: prepare() and cleanup(). In the case of Hadoop and MapReduce, prepare() and cleanup() are called once per Map or Reduce task. The prepare() method is called before any argument Tuple is passed in, and the cleanup() method is called after all Tuple arguments have been operated on. In prepare(), the developer can create a "context" object that holds, for example, an open socket connection or a Matcher instance, and in cleanup() the developer can release it. This context is user-defined, and it is the same mechanism used by the Aggregator operation, except that the Aggregator is also given the opportunity to initialize and destroy its context for each group via the start() and complete() methods.
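
The lifecycle can be simulated in plain Java to show the call pattern: one prepare(), one operate() per Tuple, and one cleanup() per task, with the context stored on a call object between them. SketchCall, RegexExtract, and TaskSimulation are hypothetical names; Cascading's real methods have different, framework-supplied signatures. The context here is a Matcher; an open connection would be handled the same way, closed in cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for a user-defined context (illustrative, not Cascading's API).
class SketchCall<C> {
  private C context;
  C getContext() { return context; }
  void setContext(C context) { this.context = context; }
}

// Hypothetical operation whose context is a Matcher.
class RegexExtract {
  int prepareCalls = 0;
  int cleanupCalls = 0;

  // Called once per task: build the expensive resource and store it as the context.
  void prepare(SketchCall<Matcher> call) {
    prepareCalls++;
    call.setContext(Pattern.compile("\\d+").matcher(""));
  }

  // Called once per Tuple: reuse the context instead of recreating it.
  void operate(SketchCall<Matcher> call, String input, List<String> output) {
    Matcher matcher = call.getContext().reset(input);
    if (matcher.find()) output.add(matcher.group());
  }

  // Called once per task: release the resource (a connection would be closed here).
  void cleanup(SketchCall<Matcher> call) {
    cleanupCalls++;
    call.setContext(null);
  }
}

class TaskSimulation {
  // Simulates one Map or Reduce task over a list of inputs.
  static List<String> runTask(RegexExtract op, List<String> inputs) {
    SketchCall<Matcher> call = new SketchCall<>();
    List<String> output = new ArrayList<>();
    op.prepare(call);
    try {
      for (String input : inputs) op.operate(call, input, output);
    } finally {
      op.cleanup(call); // runs even if operate() throws
    }
    return output;
  }
}
```

However many inputs the task processes, prepare() and cleanup() each run exactly once, which is the property that makes them the right place for costly setup and teardown.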
Note that if a "context" object is used, its type should be declared in the subclass declaration using Java generics notation (for example, extends BaseOperation<Matcher>).
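
The sketch below shows both the generics declaration and the per-group Aggregator-style context, again in plain Java with hypothetical names (SketchBaseAggregator, SumAggregator, GroupSimulation) whose signatures are not Cascading's. The context type, a one-element long[] holding a running sum, is fixed in the subclass declaration, so the compiler checks every use of it; start() creates a fresh context for each group and complete() turns it into that group's result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic base class; the type parameter is the user-defined context.
abstract class SketchBaseAggregator<Context> {
  abstract Context start();                             // once per group: create the context
  abstract void aggregate(Context context, long value); // once per Tuple in the group
  abstract long complete(Context context);              // once per group: produce the result
}

// The context type is declared here, in the subclass declaration, via generics.
class SumAggregator extends SketchBaseAggregator<long[]> {
  @Override long[] start() { return new long[] {0L}; }
  @Override void aggregate(long[] context, long value) { context[0] += value; }
  @Override long complete(long[] context) { return context[0]; }
}

class GroupSimulation {
  // Processes each group independently: start(), aggregate() per value, complete().
  static <C> List<Long> run(SketchBaseAggregator<C> aggregator, List<List<Long>> groups) {
    List<Long> results = new ArrayList<>();
    for (List<Long> group : groups) {
      C context = aggregator.start();
      for (long value : group) aggregator.aggregate(context, value);
      results.add(aggregator.complete(context));
    }
    return results;
  }
}
```

Because a new context is created per group, sums never leak from one group into the next.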
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.