There are a number of helper SubAssemblies provided by the core cascading library.
As of Cascading 2.2, many of the below assemblies can optionally ignore null values. This allows for an optional but closer resemblance to how similar functions in SQL perform.
The cascading.pipe.assembly.AggregateBy
SubAssembly is an implementation of the Partial Aggregation pattern, and
is the base class for built-in and custom partial aggregation
implementations like AverageBy
or
CountBy
.
Generally the AggregateBy class is used to combine multiple AggregateBy subclasses into a single Pipe.
Example 10.1. Composing partials with AggregateBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
// note we do not pass the parent assembly Pipe in
Fields valueField = new Fields( "size" );
Fields sumField = new Fields( "total-size", long.class );
SumBy sumBy = new SumBy( valueField, sumField );
Fields countField = new Fields( "num-events" );
CountBy countBy = new CountBy( countField );
assembly = new AggregateBy( assembly, groupingFields, sumBy, countBy );
To create a custom partial aggregation, subclass the AggregateBy class and implement the appropriate internal interfaces. See the Javadoc for details.
The cascading.pipe.assembly.AverageBy
SubAssembly performs an average over the given
valueFields
and returns the result in the
averageField
field. AverageBy may be combined with other
AggregateBy subclasses so they may be executed simultaneously over the
same grouping.
Example 10.2. Using AverageBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields valueField = new Fields( "size" );
Fields avgField = new Fields( "avg-size" );
assembly = new AverageBy( assembly, groupingFields, valueField, avgField );
The cascading.pipe.assembly.CountBy
SubAssembly performs a count over the given
groupingFields
and returns the result in the
countField
field. CountBy may be combined with other
AggregateBy subclasses so they may be executed simultaneously over the
same grouping.
Example 10.3. Using CountBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields countField = new Fields( "count" );
assembly = new CountBy( assembly, groupingFields, countField );
The cascading.pipe.assembly.SumBy
SubAssembly performs a sum over the given valueFields
and
returns the result in the sumField
field. SumBy may be
combined with other AggregateBy subclasses so they may be executed
simultaneously over the same grouping.
Example 10.4. Using SumBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields valueField = new Fields( "size" );
Fields sumField = new Fields( "total-size" );
assembly =
new SumBy( assembly, groupingFields, valueField, sumField, long.class );
The cascading.pipe.assembly.FirstBy
SubAssembly is used to return the first encountered value in the given
valueFields
. FirstBy may be combined with other
AggregateBy subclasses so they may be executed simultaneously over the
same grouping.
Example 10.5. Using FirstBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields valueField = new Fields( "size" );
// we want the largest size in this grouping
valueField.setComparator( "size", new LongComparator() );
assembly =
new FirstBy( assembly, groupingFields, valueField );
Note if the valueFields
Fields instance has field
comparators, they will be used to sort the argument values to
influence what values are seen first. Otherwise the fields will not be
sorted in any deterministic order.
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.