There are a number of helper SubAssemblies provided by the core cascading library.
The cascading.pipe.assembly.AggregateBy
SubAssembly is an implementation of the Partial Aggregation pattern, and
is the base class for built-in and custom partial aggregation
implementations like AverageBy
or
CountBy
.
Generally the AggregateBy class is used to combine multiple AggregateBy subclasses into a single Pipe.
Example 9.1. Composing partials with AggregateBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
// note we do not pass the parent assembly Pipe in
Fields valueField = new Fields( "size" );
Fields sumField = new Fields( "total-size" );
SumBy sumBy = new SumBy( valueField, sumField, long.class );
Fields countField = new Fields( "num-events" );
CountBy countBy = new CountBy( countField );
assembly = new AggregateBy( assembly, groupingFields, sumBy, countBy );
To create a custom partial aggregation, subclass the AggregateBy class and implement the appropriate internal interfaces. See the Javadoc for details.
The cascading.pipe.assembly.AverageBy
SubAssembly performs an average over the given
valueFields
and returns the result in the
averageField
field. AverageBy may be combined with other
AggregateBy subclasses so they may be executed simultaneously over the
same grouping.
Example 9.2. Using AverageBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields valueField = new Fields( "size" );
Fields avgField = new Fields( "avg-size" );
assembly = new AverageBy( assembly, groupingFields, valueField, avgField );
The cascading.pipe.assembly.CountBy
SubAssembly performs a count over the given
groupingFields
and returns the result in the
countField
field. CountBy may be combined with other
AggregateBy subclasses so they may be executed simultaneously over the
same grouping.
Example 9.3. Using CountBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields countField = new Fields( "count" );
assembly = new CountBy( assembly, groupingFields, countField );
The cascading.pipe.assembly.SumBy
SubAssembly performs a sum over the given valueFields
and
returns the result in the sumField
field. SumBy may be
combined with other AggregateBy subclasses so they may be executed
simultaneously over the same grouping.
Example 9.4. Using SumBy
Pipe assembly = new Pipe( "assembly" );
// ...
Fields groupingFields = new Fields( "date" );
Fields valueField = new Fields( "size" );
Fields sumField = new Fields( "total-size" );
assembly =
new SumBy( assembly, groupingFields, valueField, sumField, long.class );
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.