All custom Scheme classes must subclass the
cascading.scheme.Scheme abstract class and
implement the required methods.
A Scheme is ultimately responsible for
sourcing and sinking Tuples of data. Consequently it must know what
Fields it presents during sourcing, and what
Fields it accepts during sinking. Thus the
constructors of the base
Scheme type must be given
the source and sink Fields.
A Scheme is allowed to source different Fields than it sinks. The
TextLine Scheme does just this. (The TextDelimited
Scheme, on the other hand, forces the source and
sink Fields to be the same.)
The retrieveSourceFields() and
retrieveSinkFields() methods allow a custom
Scheme to fetch its source and sink
Fields immediately before the planner is invoked
- for example, from the header of a file, as is the case with
TextDelimited. Likewise, the presentSourceFields() and
presentSinkFields() methods notify the
Scheme of the Fields that
the planner expects the Scheme to handle - for example, to write the
field names as a header, as is the case with TextDelimited.
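As a rough illustration of deriving field names from a file header and writing them back out, the sketch below models the idea in plain Java, without the Cascading API. The class HeaderFieldsSketch and its methods fieldsFromHeader() and headerFromFields() are hypothetical names invented here; a real Scheme would do the equivalent work with Fields objects in its retrieve and present methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of Cascading: models how a delimited-text
// scheme could derive field names from a header and write them back out.
class HeaderFieldsSketch {
    // Analogous to retrieving source fields: split the first line of a file.
    static List<String> fieldsFromHeader(String headerLine, String delimiter) {
        List<String> names = new ArrayList<>();
        // Quote the delimiter so characters such as '|' are taken literally.
        for (String name : headerLine.split(Pattern.quote(delimiter), -1)) {
            names.add(name.trim());
        }
        return names;
    }

    // Analogous to being presented sink fields: join the names into a header.
    static String headerFromFields(List<String> fieldNames, String delimiter) {
        return String.join(delimiter, fieldNames);
    }
}
```

The two methods are deliberately symmetric: whatever header one writes, the other should be able to read back into the same field names.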
Every Scheme is presented the opportunity
to set any custom properties the underlying platform requires, via the
sourceConfInit() (for a Tuple source tap) and
sinkConfInit() (for a Tuple sink tap) methods.
These methods may be called more than once with new configuration
objects, and should be idempotent.
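To make the idempotency requirement concrete, here is a minimal sketch in plain Java. It does not use the Cascading API: java.util.Properties stands in for the platform configuration object, and the class ConfInitSketch and the key "sketch.input.format" are made up for illustration.

```java
import java.util.Properties;

// Hypothetical stand-in, not the Cascading API: Properties plays the role of
// the platform configuration object handed to a ConfInit-style method.
class ConfInitSketch {
    static final String INPUT_FORMAT_KEY = "sketch.input.format"; // made-up key
    static final String INPUT_FORMAT = "text";

    // Idempotent: sets keys to fixed values instead of appending to them,
    // and keeps no state between calls, so repeated calls are harmless.
    static void sourceConfInit(Properties conf) {
        conf.setProperty(INPUT_FORMAT_KEY, INPUT_FORMAT);
    }
}
```

Calling sourceConfInit() twice on one configuration object, or once each on two fresh objects, leaves them in the same state. Code that appends to an existing value or counts invocations would not have this property.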
On the Hadoop platform, these methods should be used to configure
the appropriate org.apache.hadoop.mapred.InputFormat and
org.apache.hadoop.mapred.OutputFormat.
A Scheme is always sourced via the
source() method, and is always sunk to via the
sink() method.
Prior to a source() or
sink() call, the sourcePrepare() and
sinkPrepare() methods are called. After all
values have been read or written, the sourceCleanup() and
sinkCleanup() methods are called.
The *Prepare() methods allow a Scheme to
initialize any state necessary (for example, to create a new
java.util.regex.Matcher instance for use against
all record reads). Conversely, the *Cleanup()
methods allow for clearing up any resources.
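The prepare, read, cleanup ordering can be sketched in plain Java as follows. This is a simplified model, not the Cascading API: the class RegexSourceSketch, its readAll() driver, and the key=value record format are assumptions made for illustration, and the real method signatures differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the Cascading API: mimics the
// sourcePrepare() -> source() ... -> sourceCleanup() call order.
class RegexSourceSketch {
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");
    private Matcher matcher; // state created once in sourcePrepare()

    void sourcePrepare() {
        matcher = PAIR.matcher(""); // one Matcher reused for every record
    }

    // Returns the parsed key and value, or null if the line does not match.
    String[] source(String line) {
        matcher.reset(line);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    void sourceCleanup() {
        matcher = null; // release state once all records are read
    }

    // Drives the lifecycle: prepare once, read every record, always clean up.
    static List<String[]> readAll(List<String> lines) {
        RegexSourceSketch scheme = new RegexSourceSketch();
        List<String[]> tuples = new ArrayList<>();
        scheme.sourcePrepare();
        try {
            for (String line : lines) {
                String[] tuple = scheme.source(line);
                if (tuple != null) {
                    tuples.add(tuple);
                }
            }
        } finally {
            scheme.sourceCleanup();
        }
        return tuples;
    }
}
```

Resetting a single Matcher per record avoids allocating a new one for every read, which is the kind of per-task state the prepare step is meant to hold.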
These methods are always called in the same process space as their
associated source() and sink() calls. In the case of the Hadoop
platform, this will likely be on the cluster side, unlike calls to
*ConfInit(), which will likely be on the client side.
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.