As of Cascading 2.2, the Fields class can hold type information for each field, and the Cascading planner can propagate that information from source Tap instances to downstream Operations through to sink Tap instances.
This allows Taps to read and store type information for external systems and applications, enables error detection during joins (detecting non-comparable types), enforces canonical representations within the Tuple (preventing a field from switching arbitrarily between String and Integer types), and allows for pluggable coercion from one type to another, even if either is not a Java primitive.
To declare types, simply pass type information to the Fields instance either through the constructor or via a fluent API.
Example 7.2. Fluent
Fields resultFields = new Fields( "count" ).applyTypes( long.class ); // null becomes 0
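Note the difference between declaring Long.class and long.class. Since Long is an object type, passing Long.class, for instance through the constructor as in new Fields( "count", Long.class ), lets Cascading know that the field may hold a null value. If the field is declared long (a primitive), as in the fluent example above, then null becomes zero.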
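The null-handling rule above can be modeled in plain Java. The following is only a sketch of the described behavior, not Cascading's implementation: the class NullCoercionSketch and the method valueForNull are hypothetical names, and the int and double branches are an assumption that extrapolates the stated rule for long to other numeric primitives.

```java
public class NullCoercionSketch {

    // Hypothetical helper (not a Cascading API): the value stored for a
    // field of the declared type when the incoming value is null.
    public static Object valueForNull(Class<?> declaredType) {
        if (!declaredType.isPrimitive()) {
            return null; // object types such as Long.class may hold null
        }
        if (declaredType == long.class) {
            return 0L; // the case described in the text: null becomes zero
        }
        // Assumption: other numeric primitives follow the same rule.
        if (declaredType == int.class) {
            return 0;
        }
        if (declaredType == double.class) {
            return 0.0d;
        }
        throw new IllegalArgumentException(
            "unhandled primitive in this sketch: " + declaredType);
    }

    public static void main(String[] args) {
        System.out.println(valueForNull(Long.class)); // prints null
        System.out.println(valueForNull(long.class)); // prints 0
    }
}
```

The distinction rests on Class.isPrimitive(), which is true for long.class and false for Long.class, so a declared type carries enough information to decide whether null is a legal value.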
In practice, typed fields can only be used when they declare the results of an operation, for example:
Example 7.3. Declaring Typed Results
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
// note we do not pass the parent assembly Pipe in to SumBy or CountBy
Fields valueField = new Fields( "size" );
// the result field declares its type, so SumBy needs no separate type argument
Fields sumField = new Fields( "total-size", long.class );
SumBy sumBy = new SumBy( valueField, sumField );
Fields countField = new Fields( "num-events" );
CountBy countBy = new CountBy( countField );
// AggregateBy binds both partial aggregations to the parent assembly
assembly = new AggregateBy( assembly, groupingFields, sumBy, countBy );
Here the type information serves two roles. First, it allows a downstream consumer of the field value to know the type maintained in the tuple. Second, the SumBy sub-assembly now has a simpler API and can get the type information it needs internally to perform the aggregation directly from the Fields instance.
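To illustrate the second role, the sketch below shows how an aggregation could choose its arithmetic from a declared result type alone. It is a plain-Java model under stated assumptions, not the SumBy implementation: TypedSumSketch and its sum method are hypothetical, and only long and double result types are handled.

```java
import java.util.Arrays;
import java.util.List;

public class TypedSumSketch {

    // Hypothetical stand-in for an aggregator that consults the declared
    // result type instead of taking a separate type parameter.
    public static Object sum(List<? extends Number> values, Class<?> resultType) {
        if (resultType == long.class || resultType == Long.class) {
            long total = 0L;
            for (Number value : values) {
                total += value.longValue();
            }
            return total; // boxed to Long
        }
        if (resultType == double.class || resultType == Double.class) {
            double total = 0.0d;
            for (Number value : values) {
                total += value.doubleValue();
            }
            return total; // boxed to Double
        }
        throw new IllegalArgumentException(
            "unsupported result type in this sketch: " + resultType);
    }

    public static void main(String[] args) {
        List<Integer> sizes = Arrays.asList(10, 20, 12);
        System.out.println(sum(sizes, long.class));   // prints 42
        System.out.println(sum(sizes, double.class)); // prints 42.0
    }
}
```

Because the result type travels with the field declaration, a caller only states it once, and a downstream consumer can rely on the same declaration to know what the tuple holds.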
Note that the TextDelimited and other Scheme classes should have any type information declared so it can be maintained by the Cascading planner. Custom Scheme types also have the opportunity to read type information from any field or data sources they represent so it can be handed to the planner during runtime.
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.