5.6 Operation and BaseOperation

Every example in the preceding sections subclassed cascading.operation.BaseOperation. This class implements the cascading.operation.Operation interface and provides default implementations for several of its methods. Extending BaseOperation is not strictly required when implementing Operation, but it is very convenient to do so.

When developing custom operations, the developer may need to initialize and later destroy a resource. For example, an operation that does pattern matching might need to initialize a java.util.regex.Matcher and use it in a thread-safe way, or it might need to open, and eventually close, a remote connection. For performance reasons, the operation should not create and destroy such a resource for each Tuple or each Tuple group that passes through.

For this reason, the Operation interface declares two methods: prepare() and cleanup(). In the case of Hadoop and MapReduce, the prepare() and cleanup() methods are called once per Map or Reduce task. The prepare() method is called before any argument Tuple is passed in, and the cleanup() method is called after all Tuple arguments have been operated on. Within prepare(), the developer can initialize a "context" object that holds, for example, an open socket connection or a Matcher instance; within cleanup(), the developer can release whatever the context holds. This context is user defined, and is the same mechanism used by the Aggregator operation, except that the Aggregator is also given the opportunity to initialize and destroy its context via the start() and complete() methods.

Note that if a "context" object is used, its type should be declared as a type parameter in the subclass's class declaration, using Java generics notation.
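
The lifecycle described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Cascading classes: SketchCall, SketchOperation, DigitsOnly, and LifecycleSketch are hypothetical stand-ins invented for this illustration, and their method signatures are assumptions that do not match the real Operation interface. The sketch only shows the pattern: the context type is declared as a type parameter, prepare() builds the context once, each value reuses it, and cleanup() releases it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a call object that carries the user-defined context. */
class SketchCall<C> {
    private C context;

    C getContext() { return context; }

    void setContext(C context) { this.context = context; }
}

/** Hypothetical stand-in for an operation base class; not Cascading's API. */
abstract class SketchOperation<C> {
    // Default no-op implementations, so subclasses override only what they need.
    void prepare(SketchCall<C> call) {}

    void cleanup(SketchCall<C> call) {}

    abstract boolean operate(SketchCall<C> call, String value);
}

/** The context type (Matcher) is declared as the type parameter of the subclass. */
class DigitsOnly extends SketchOperation<Matcher> {
    // Counters exist only so the sketch can show how often each method runs.
    int prepareCalls = 0;
    int cleanupCalls = 0;

    @Override
    void prepare(SketchCall<Matcher> call) {
        prepareCalls++;
        // Compile the pattern and create the Matcher once, before any value arrives.
        call.setContext(Pattern.compile("\\d+").matcher(""));
    }

    @Override
    boolean operate(SketchCall<Matcher> call, String value) {
        // Reuse the prepared Matcher instead of creating a new one per value.
        return call.getContext().reset(value).matches();
    }

    @Override
    void cleanup(SketchCall<Matcher> call) {
        cleanupCalls++;
        // Drop the context; a real resource such as a socket would be closed here.
        call.setContext(null);
    }
}

class LifecycleSketch {
    /** Simulates one task: prepare once, operate on every value, clean up once. */
    static List<Boolean> runTask(DigitsOnly op, List<String> values) {
        SketchCall<Matcher> call = new SketchCall<>();
        List<Boolean> results = new ArrayList<>();
        op.prepare(call);
        try {
            for (String value : values) {
                results.add(op.operate(call, value));
            }
        } finally {
            op.cleanup(call);
        }
        return results;
    }

    public static void main(String[] args) {
        DigitsOnly op = new DigitsOnly();
        System.out.println(runTask(op, Arrays.asList("123", "abc", "42")));
        System.out.println(op.prepareCalls + " prepare, " + op.cleanupCalls + " cleanup");
    }
}
```

Keeping the Matcher in the context rather than in a shared static field means each simulated task works with its own instance, which matters because a Matcher is not safe for concurrent use. In a real operation, the same roles would be played by the prepare() and cleanup() methods declared on Operation.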

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.