3.4 Field Algebra

As shown above, the Each and Every Pipe classes provide a means to merge input Tuple values with Operation result Tuple values to create a final output Tuple, which is used as the input to the next Pipe instance. This merging is performed through a type of "field algebra", and can get rather complicated when factoring in Fields sets, a kind of wildcard for specifying certain field values.

Fields sets are constant values on the Fields class and can be used in many places where a Fields instance is expected. They are:

Fields.ALL

The cascading.tuple.Fields.ALL constant is a "wildcard" that represents all the currently available fields.

Fields.RESULTS

The cascading.tuple.Fields.RESULTS constant is used to represent the field names of the current Operation's return values. This Fields set may only be used as an output selector on a Pipe, where it replaces the input Tuple with the Operation result Tuple in the stream.
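The difference between ALL and RESULTS as output selectors can be sketched in plain Java. This is only a model of the field-name bookkeeping, not the Cascading API: the class OutputSelectorSketch, its methods, and the sample field names are hypothetical, and it assumes that ALL as an output selector yields the input fields followed by the result fields and that duplicate names are rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of two output selectors; hypothetical, NOT the Cascading API.
final class OutputSelectorSketch {

    // Models Fields.ALL as an output selector (assumption: input fields
    // followed by result fields, with duplicate names rejected).
    static List<String> all(List<String> input, List<String> result) {
        List<String> out = new ArrayList<>(input);
        for (String name : result) {
            if (out.contains(name)) {
                throw new IllegalArgumentException("duplicate field name: " + name);
            }
            out.add(name);
        }
        return out;
    }

    // Models Fields.RESULTS: the input fields are dropped and only the
    // Operation result fields continue down the stream.
    static List<String> results(List<String> input, List<String> result) {
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> input = List.of("ip", "line");
        List<String> result = List.of("method", "url");
        System.out.println(all(input, result));     // [ip, line, method, url]
        System.out.println(results(input, result)); // [method, url]
    }
}
```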

Fields.REPLACE

The cascading.tuple.Fields.REPLACE constant is used as an output selector to inline-replace values in the incoming Tuple with the results of an Operation. This is a convenience Fields set that allows subsequent Operations to 'step' on the value with a given field name. The current Operation must always use the exact same field names, or the ARGS Fields set.
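As a rough illustration of the in-place behaviour described for REPLACE, the following plain-Java sketch models a Tuple as an ordered map from field name to value. The class ReplaceSketch and the sample fields are hypothetical and are not part of Cascading; the check that every result field already exists in the input is an assumption drawn from the requirement that the Operation use the same field names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the REPLACE output selector; hypothetical, NOT the Cascading API.
final class ReplaceSketch {

    // Result values overwrite the same-named input values. Field order and
    // the set of field names stay exactly as they were in the input.
    static Map<String, Object> replace(Map<String, Object> input, Map<String, Object> result) {
        Map<String, Object> out = new LinkedHashMap<>(input);
        for (Map.Entry<String, Object> entry : result.entrySet()) {
            if (!out.containsKey(entry.getKey())) {
                // Assumption: a result field that is not in the input is an error.
                throw new IllegalArgumentException("result field not in input: " + entry.getKey());
            }
            out.put(entry.getKey(), entry.getValue()); // keeps the original position
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("ip", "10.0.0.1");
        input.put("status", "200"); // a String before the Operation runs

        Map<String, Object> result = new LinkedHashMap<>();
        result.put("status", 200);  // the same field name, now an Integer

        System.out.println(replace(input, result)); // {ip=10.0.0.1, status=200}
    }
}
```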

Fields.SWAP

The cascading.tuple.Fields.SWAP constant is used as an output selector to swap out Operation arguments with its results. Neither the field names nor the number of fields need to match between the arguments and the results. This is useful when the Operation arguments are no longer necessary and the result Fields and values should be appended to the remainder of the input field names and Tuple.
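The SWAP behaviour can be modelled on field names alone: remove the argument fields from the input, then append the result fields to what remains. The class SwapSketch and the sample field names below are hypothetical and only illustrate the description above; they are not the Cascading API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the SWAP output selector; hypothetical, NOT the Cascading API.
final class SwapSketch {

    // Drops the argument fields from the input and appends the result fields
    // to the remainder. Argument and result lists may differ in names and size.
    static List<String> swap(List<String> input, List<String> arguments, List<String> result) {
        List<String> out = new ArrayList<>(input);
        out.removeAll(arguments);
        out.addAll(result);
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("ip", "line", "time");
        List<String> arguments = List.of("line");        // one argument field
        List<String> result = List.of("method", "url");  // two result fields
        System.out.println(swap(input, arguments, result)); // [ip, time, method, url]
    }
}
```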

Fields.ARGS

The cascading.tuple.Fields.ARGS constant is used to let a given Operation inherit the field names of its argument Tuple. This Fields set is a convenience and is typically used when the Pipe output selector is RESULTS or REPLACE. It is specifically used by the Identity Function when coercing values from Strings to primitive types.
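The Identity coercion case mentioned above can be pictured with a small plain-Java model: the result Tuple carries exactly the field names of the argument Tuple, and only the value types change. The class ArgsSketch, the method identityCoerce, and the choice of Integer as the target type are all hypothetical; this is not how Cascading's Identity Function is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of an Operation declaring ARGS; hypothetical, NOT the Cascading API.
final class ArgsSketch {

    // The result inherits the argument field names unchanged (the ARGS idea);
    // each String value is coerced to an Integer for illustration.
    static Map<String, Object> identityCoerce(Map<String, String> arguments) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : arguments.entrySet()) {
            result.put(entry.getKey(), Integer.valueOf(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("status", "200");
        arguments.put("bytes", "512");
        System.out.println(identityCoerce(arguments)); // {status=200, bytes=512}
    }
}
```

Because the result names match the argument names, such an Operation fits naturally with a RESULTS or REPLACE output selector.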

Fields.GROUP

The cascading.tuple.Fields.GROUP constant represents all the fields used as grouping values in a previous Group. If there is no previous Group in the pipe assembly, GROUP represents all the current field names.

Fields.VALUES

The cascading.tuple.Fields.VALUES constant represents all the fields not used as grouping fields in a previous Group.
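GROUP and VALUES together partition the current fields, which a short plain-Java sketch can make concrete. The class GroupValuesSketch and the sample field names are hypothetical and not part of Cascading. A null grouping list stands in for "no previous Group"; the text above does not say what VALUES means in that case, so the sketch does not model it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the GROUP and VALUES Fields sets; hypothetical, NOT the Cascading API.
final class GroupValuesSketch {

    // GROUP: the grouping fields of the previous Group, or all current
    // fields when there is no previous Group (modelled here as null).
    static List<String> group(List<String> current, List<String> grouping) {
        return new ArrayList<>(grouping == null ? current : grouping);
    }

    // VALUES: the current fields that were not used as grouping fields.
    // Only defined in this sketch when a previous Group exists.
    static List<String> values(List<String> current, List<String> grouping) {
        List<String> out = new ArrayList<>(current);
        out.removeAll(grouping);
        return out;
    }

    public static void main(String[] args) {
        List<String> current = List.of("ip", "time", "url");
        List<String> grouping = List.of("ip");
        System.out.println(group(current, grouping));  // [ip]
        System.out.println(values(current, grouping)); // [time, url]
        System.out.println(group(current, null));      // [ip, time, url]
    }
}
```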

Fields.UNKNOWN

The cascading.tuple.Fields.UNKNOWN constant is used when Fields must be declared, but their number and names are unknown. This allows for arbitrary-length Tuples from an input source or some Operation. Use this Fields set with caution.

Below is a reference chart showing common ways to merge input and result fields for the desired output fields. See the section on Each and Every Pipes for details on the different columns and their relationships to the Each and Every Pipes and Functions, Aggregators, and Buffers.

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.