Cascading applications can perform complex manipulation, or "field algebra", on the fields stored in tuples by using Fields sets. Fields sets are a feature of the Fields class that acts as a kind of wildcard for referencing groups of field values. These predefined Fields sets are constants on the Fields class, and they can be used in many places where a Fields instance is expected. They are:
The cascading.tuple.Fields.ALL constant is a wildcard that represents all currently available fields.
// incoming -> first, last, age
String expression = "first + \" \" + last";
Fields fields = new Fields( "full" );
ExpressionFunction full =
new ExpressionFunction( fields, expression, String.class );
assembly =
new Each( assembly, new Fields( "first", "last" ), full, Fields.ALL );
// outgoing -> first, last, age, full
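To make the ALL rule concrete, here is a minimal, stdlib-only sketch that models a tuple as an ordered map from field name to value. The class and method names (`AllSelectorSketch`, `selectAll`) are hypothetical and are not part of the Cascading API. The sketch only mirrors the documented effect of appending the operation results to all incoming fields, and it assumes, as a simplification, that a result name colliding with an incoming name is an error.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the Fields.ALL output selector; not Cascading code.
final class AllSelectorSketch {

    // Keep every incoming field in order, then append the operation results.
    static Map<String, Object> selectAll(Map<String, Object> incoming,
                                         Map<String, Object> results) {
        Map<String, Object> out = new LinkedHashMap<>(incoming);
        for (Map.Entry<String, Object> e : results.entrySet()) {
            // Simplifying assumption of this sketch: a result name may not
            // collide with an incoming field name.
            if (out.containsKey(e.getKey())) {
                throw new IllegalArgumentException("duplicate field: " + e.getKey());
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("first", "John");
        incoming.put("last", "Doe");
        incoming.put("age", "42");

        // Stand-in for the ExpressionFunction "first + \" \" + last".
        Map<String, Object> results = new LinkedHashMap<>();
        results.put("full", incoming.get("first") + " " + incoming.get("last"));

        System.out.println(selectAll(incoming, results));
        // prints {first=John, last=Doe, age=42, full=John Doe}
    }
}
```

An insertion-ordered map is used so that the printed field order matches the "outgoing" comment in the example above.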
The cascading.tuple.Fields.RESULTS constant represents the field names of the current operation's return values. This Fields set may only be used as an output selector on a pipe, and it causes the pipe to output a tuple containing only the operation results.
// incoming -> first, last, age
String expression = "first + \" \" + last";
Fields fields = new Fields( "full" );
ExpressionFunction full =
new ExpressionFunction( fields, expression, String.class );
Fields firstLast = new Fields( "first", "last" );
assembly =
new Each( assembly, firstLast, full, Fields.RESULTS );
// outgoing -> full
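The following sketch shows the two steps behind this example under a simplified model: the argument selector projects `first` and `last` out of the incoming tuple, and RESULTS then discards everything except what the operation returned. `ResultsSelectorSketch`, `selectArguments`, and `eachWithResults` are hypothetical names for illustration only, and the lambda is merely a stand-in for the ExpressionFunction.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the Fields.RESULTS output selector; not Cascading code.
final class ResultsSelectorSketch {

    // Project the named argument fields out of the incoming tuple, in selector order.
    static Map<String, Object> selectArguments(Map<String, Object> incoming,
                                               List<String> argumentSelector) {
        Map<String, Object> arguments = new LinkedHashMap<>();
        for (String name : argumentSelector) {
            if (!incoming.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            arguments.put(name, incoming.get(name));
        }
        return arguments;
    }

    // RESULTS: the operation sees only its arguments, and only its results are emitted.
    static Map<String, Object> eachWithResults(
            Map<String, Object> incoming,
            List<String> argumentSelector,
            Function<Map<String, Object>, Map<String, Object>> operation) {
        return new LinkedHashMap<>(operation.apply(selectArguments(incoming, argumentSelector)));
    }

    public static void main(String[] argv) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("first", "John");
        incoming.put("last", "Doe");
        incoming.put("age", "42");

        // Stand-in for the ExpressionFunction that builds "full".
        Function<Map<String, Object>, Map<String, Object>> full = args -> {
            Map<String, Object> result = new LinkedHashMap<>();
            result.put("full", args.get("first") + " " + args.get("last"));
            return result;
        };

        System.out.println(eachWithResults(incoming, List.of("first", "last"), full));
        // prints {full=John Doe}
    }
}
```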
The cascading.tuple.Fields.REPLACE constant is used as an output selector to replace values in the incoming tuple, in place, with the results of an operation. This convenient Fields set allows an operation to overwrite the values stored in the specified fields. The operation must either declare field names identical to those in the pipe's argument selector, or use the ARGS Fields set.
// incoming -> first, last, age
// coerce to int
Identity function = new Identity( Fields.ARGS, Integer.class );
Fields age = new Fields( "age" );
assembly = new Each( assembly, age, function, Fields.REPLACE );
// outgoing -> first, last, age
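The in-place behavior of REPLACE can be sketched with an insertion-ordered map: re-putting an existing key overwrites its value without moving it. `ReplaceSelectorSketch` and `replace` are hypothetical names, not Cascading API. The sketch enforces the documented rule that the result names must match the argument selector, and it fakes the Identity coercion by parsing the String directly.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Fields.REPLACE output selector; not Cascading code.
final class ReplaceSelectorSketch {

    // REPLACE: the result names must be exactly the argument names, and each
    // result value overwrites the incoming value at the same position.
    static Map<String, Object> replace(Map<String, Object> incoming,
                                       List<String> argumentSelector,
                                       Map<String, Object> results) {
        if (!incoming.keySet().containsAll(argumentSelector)) {
            throw new IllegalArgumentException("unknown argument in " + argumentSelector);
        }
        if (!results.keySet().equals(new HashSet<>(argumentSelector))) {
            throw new IllegalArgumentException(
                "results " + results.keySet() + " must match arguments " + argumentSelector);
        }
        Map<String, Object> out = new LinkedHashMap<>(incoming);
        // Putting an existing key keeps its original position in a LinkedHashMap.
        out.putAll(results);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("first", "John");
        incoming.put("last", "Doe");
        incoming.put("age", "42");

        // Stand-in for Identity coercing the "age" String to an Integer.
        Map<String, Object> results = new LinkedHashMap<>();
        results.put("age", Integer.valueOf((String) incoming.get("age")));

        Map<String, Object> out = replace(incoming, List.of("age"), results);
        System.out.println(out + ", age is " + out.get("age").getClass().getSimpleName());
        // prints {first=John, last=Doe, age=42}, age is Integer
    }
}
```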
The cascading.tuple.Fields.SWAP constant is used as an output selector to swap the operation arguments with its results. The argument and result field names do not need to match, and neither do their sizes. This is useful when the operation arguments are no longer necessary and the result fields and values should be appended to the remaining input fields and tuple values.
// incoming -> first, last, age
String expression = "first + \" \" + last";
Fields fields = new Fields( "full" );
ExpressionFunction full =
new ExpressionFunction( fields, expression, String.class );
Fields firstLast = new Fields( "first", "last" );
assembly = new Each( assembly, firstLast, full, Fields.SWAP );
// outgoing -> age, full
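Under the same ordered-map model, SWAP amounts to "remove the arguments, then append the results". The sketch below, using the hypothetical names `SwapSelectorSketch` and `swap`, reproduces the outgoing `age, full` from the example. Like the other sketches, it rejects name collisions as a simplifying assumption rather than a statement about Cascading's exact checks.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the Fields.SWAP output selector; not Cascading code.
final class SwapSelectorSketch {

    // SWAP: drop the argument fields, then append the results to what remains.
    // Result names and count are independent of the argument names and count.
    static Map<String, Object> swap(Map<String, Object> incoming,
                                    List<String> argumentSelector,
                                    Map<String, Object> results) {
        Map<String, Object> out = new LinkedHashMap<>(incoming);
        for (String name : argumentSelector) {
            if (!out.containsKey(name)) {
                throw new IllegalArgumentException("unknown argument: " + name);
            }
            out.remove(name);
        }
        for (Map.Entry<String, Object> e : results.entrySet()) {
            // Simplifying assumption: a result may not reuse a remaining field name.
            if (out.containsKey(e.getKey())) {
                throw new IllegalArgumentException("duplicate field: " + e.getKey());
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("first", "John");
        incoming.put("last", "Doe");
        incoming.put("age", "42");

        Map<String, Object> results = new LinkedHashMap<>();
        results.put("full", "John Doe");

        System.out.println(swap(incoming, List.of("first", "last"), results));
        // prints {age=42, full=John Doe}
    }
}
```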
The cascading.tuple.Fields.ARGS constant lets an operation inherit the field names of its argument tuple. This Fields set is a convenience and is typically used when the pipe output selector is RESULTS or REPLACE. In particular, it is used with the Identity function when coercing values from Strings to primitive types.
// incoming -> first, last, age
// coerce to int
Identity function = new Identity( Fields.ARGS, Integer.class );
Fields age = new Fields( "age" );
assembly = new Each( assembly, age, function, Fields.REPLACE );
// outgoing -> first, last, age
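The point of ARGS is that the operation's output names are not fixed in advance but copied from whatever arguments it receives. This hypothetical `ArgsDeclarationSketch` illustrates the idea outside Cascading. Its `coerceToInteger` method is an illustrative stand-in for an Identity-style coercion, not the real Identity implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of an operation whose declared fields are Fields.ARGS;
// not Cascading code.
final class ArgsDeclarationSketch {

    // The output field names are copied from whatever arguments arrive, so the
    // same operation can be reused on any field. Each value is parsed from its
    // String form into an Integer, mimicking the coercion in the example above.
    static Map<String, Object> coerceToInteger(Map<String, Object> arguments) {
        Map<String, Object> results = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : arguments.entrySet()) {
            results.put(e.getKey(), Integer.valueOf(e.getValue().toString().trim()));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(coerceToInteger(Map.of("age", "42")));    // prints {age=42}
        System.out.println(coerceToInteger(Map.of("zip", "77373"))); // prints {zip=77373}
    }
}
```

Because the result names always equal the argument names, such an operation automatically satisfies the REPLACE requirement described earlier.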
The cascading.tuple.Fields.GROUP constant represents all the fields used as the grouping key in the most recent grouping. If no previous grouping exists in the pipe assembly, GROUP represents all the current field names.
// incoming -> first, last, age
assembly = new GroupBy( assembly, new Fields( "first", "last" ) );
FieldJoiner full = new FieldJoiner( new Fields( "full" ), " " );
assembly = new Each( assembly, Fields.GROUP, full, Fields.ALL );
// outgoing -> first, last, age, full
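The resolution rule for GROUP, including its fallback when nothing has been grouped yet, fits in a few lines. In this sketch, `GroupResolutionSketch`, `resolveGroup`, and `join` are hypothetical names. An `Optional` models "there may be no previous grouping", and `join` is only a stand-in for what FieldJoiner does in the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.StringJoiner;

// Hypothetical model of how Fields.GROUP resolves; not Cascading code.
final class GroupResolutionSketch {

    // GROUP is the key of the most recent grouping, or every current field
    // when the assembly has not been grouped yet.
    static List<String> resolveGroup(List<String> currentFields,
                                     Optional<List<String>> lastGroupKey) {
        return new ArrayList<>(lastGroupKey.orElse(currentFields));
    }

    // Stand-in for FieldJoiner: join the selected values with a delimiter.
    static String join(Map<String, Object> tuple, List<String> fields, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (String field : fields) {
            joiner.add(String.valueOf(tuple.get(field)));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("first", "John");
        tuple.put("last", "Doe");
        tuple.put("age", "42");
        List<String> current = new ArrayList<>(tuple.keySet());

        List<String> grouped = resolveGroup(current, Optional.of(List.of("first", "last")));
        System.out.println(grouped + " -> " + join(tuple, grouped, " "));
        // prints [first, last] -> John Doe

        System.out.println(resolveGroup(current, Optional.empty()));
        // prints [first, last, age]
    }
}
```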
The cascading.tuple.Fields.VALUES constant represents all the fields not used as grouping fields in a previous grouping. For example, if you have fields "a", "b", and "c", and group on "a", Fields.VALUES resolves to "b" and "c".
// incoming -> first, last, age
assembly = new GroupBy( assembly, new Fields( "age" ) );
FieldJoiner full = new FieldJoiner( new Fields( "full" ), " " );
assembly = new Each( assembly, Fields.VALUES, full, Fields.ALL );
// outgoing -> first, last, age, full
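VALUES is the complement of the grouping key within the current fields. The hypothetical `ValuesResolutionSketch.resolveValues` method below computes that complement while preserving field order. It reproduces both the "a", "b", "c" case from the text and the `age` grouping from the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how Fields.VALUES resolves; not Cascading code.
final class ValuesResolutionSketch {

    // VALUES is every current field that is not part of the grouping key,
    // in the order the fields currently appear.
    static List<String> resolveValues(List<String> currentFields, List<String> groupKey) {
        List<String> values = new ArrayList<>();
        for (String field : currentFields) {
            if (!groupKey.contains(field)) {
                values.add(field);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(resolveValues(List.of("a", "b", "c"), List.of("a")));
        // prints [b, c]
        System.out.println(resolveValues(List.of("first", "last", "age"), List.of("age")));
        // prints [first, last]
    }
}
```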
The cascading.tuple.Fields.UNKNOWN constant is used when Fields must be declared but neither the number of fields nor their names are known in advance. This allows for processing tuples of arbitrary length from an input source or an operation. Use this Fields set with caution.
// incoming -> line
RegexSplitter function = new RegexSplitter( Fields.UNKNOWN, "\t" );
Fields fields = new Fields( "line" );
assembly =
new Each( assembly, fields, function, Fields.RESULTS );
// outgoing -> unknown
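Part of the reason for caution is that a tuple with UNKNOWN fields carries no names, and its width can change from line to line. The plain-Java sketch below (hypothetical `UnknownFieldsSketch`, using `String.split` rather than Cascading's RegexSplitter) shows that downstream code can then only address values by position. Keeping trailing empty values with a `-1` limit is a choice made for this sketch, not a claim about RegexSplitter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of splitting a line into undeclared (UNKNOWN) fields;
// not Cascading code.
final class UnknownFieldsSketch {

    // Nothing about the result is declared up front: the number of values
    // depends on each line, and values can only be addressed by position.
    // The -1 limit keeps trailing empty values (a choice of this sketch).
    static List<String> splitUnknown(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        for (String line : List.of("a\tb\tc", "1\t2\t3\t4\t5")) {
            List<String> values = splitUnknown(line);
            System.out.println(values.size() + " fields, first = " + values.get(0));
        }
        // prints "3 fields, first = a" then "5 fields, first = 1"
    }
}
```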
The cascading.tuple.Fields.NONE constant specifies no fields. It is typically used as an argument selector for operations that do not process any tuple values, such as cascading.operation.Insert.
// incoming -> first, last, age
Insert constant = new Insert( new Fields( "zip" ), "77373" );
assembly = new Each( assembly, Fields.NONE, constant, Fields.ALL );
// outgoing -> first, last, age, zip
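Combining NONE as the argument selector with ALL as the output selector means the operation sees an empty argument tuple while every incoming field still passes through. The hypothetical `NoneArgumentsSketch.eachNoneToAll` method models that pairing. The lambda is only a stand-in for Insert, and the sketch assumes result names do not collide with incoming names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of Each( ..., Fields.NONE, operation, Fields.ALL ); not Cascading code.
final class NoneArgumentsSketch {

    // NONE as argument selector: the operation receives an empty argument tuple.
    // ALL as output selector: keep every incoming field and append the results
    // (this sketch assumes result names do not collide with incoming names).
    static Map<String, Object> eachNoneToAll(
            Map<String, Object> incoming,
            Function<Map<String, Object>, Map<String, Object>> operation) {
        Map<String, Object> noArguments = Collections.emptyMap();
        Map<String, Object> out = new LinkedHashMap<>(incoming);
        out.putAll(operation.apply(noArguments));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("first", "John");
        incoming.put("last", "Doe");
        incoming.put("age", "42");

        // Stand-in for Insert: ignores its (empty) arguments and emits a constant.
        Map<String, Object> out = eachNoneToAll(incoming, arguments -> Map.of("zip", "77373"));
        System.out.println(out);
        // prints {first=John, last=Doe, age=42, zip=77373}
    }
}
```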
The chart below shows common ways to merge input and result fields to produce the desired output fields. A few minutes with this chart may help clarify the discussion of fields, tuples, and pipes. Also see Each and Every Pipes for details on the different columns and their relationships to the Each and Every pipes and to Functions, Aggregators, and Buffers.
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.