cascading.operation
Interface Aggregator<Context>

All Superinterfaces:
DeclaresResults, Operation<Context>
All Known Implementing Classes:
Average, AverageBy.AverageFinal, Count, ExtentBase, ExtremaBase, ExtremaValueBase, First, Last, Max, MaxValue, Min, MinValue, Sum

public interface Aggregator<Context>
extends Operation<Context>

An Aggregator takes the set of all values associated with a unique grouping and returns zero or more values. MaxValue, MinValue, Count, and Average are good examples.

Aggregator implementations should be reentrant. There is no guarantee an Aggregator instance will be executed in a unique vm, or by a single thread. The start(cascading.flow.FlowProcess, AggregatorCall) method provides a mechanism for maintaining a 'context' object to hold intermediate values.

Note TupleEntry instances are reused internally so should not be stored. Instead use the TupleEntry or Tuple copy constructors to make safe copies.

Since Aggregators can be chained, and Cascading pipelines all operation results, any Aggregators coming ahead of the current Aggregator must return a value before the complete(cascading.flow.FlowProcess, AggregatorCall) method on this Aggregator is called. Subsequently, if any previous Aggregators return more than one Tuple result, this complete() method will be called for each Tuple emitted.

Thus it is a best practice to implement a Buffer when emitting more than one, or zero Tuple results.

See Also:
AggregatorCall, OperationCall

Field Summary
 
Fields inherited from interface cascading.operation.Operation
ANY
 
Method Summary
 void aggregate(FlowProcess flowProcess, AggregatorCall<Context> aggregatorCall)
          Method aggregate is called for each TupleEntry value in the current grouping.
 void complete(FlowProcess flowProcess, AggregatorCall<Context> aggregatorCall)
          Method complete will be issued last after every TupleEntry has been passed to the aggregate(cascading.flow.FlowProcess, AggregatorCall) method.
 void start(FlowProcess flowProcess, AggregatorCall<Context> aggregatorCall)
          Method start initializes the aggregation procedure and is called for every unique grouping.
 
Methods inherited from interface cascading.operation.Operation
cleanup, flush, getFieldDeclaration, getNumArgs, isSafe, prepare
 

Method Detail

start

void start(FlowProcess flowProcess,
           AggregatorCall<Context> aggregatorCall)
Method start initializes the aggregation procedure and is called for every unique grouping.

The AggregatorCall context should be initialized here if necessary.

The first time this method is called for a given 'process', the AggregatorCall context will be null. This method should set a new instance of the user defined context object. When the AggregatorCall context is not null, it is up to the developer to create a new instance, or 'recycle' the given instance. If recycled, it must be re-initialized to remove any previous state/values.

For example, if a Map is used to hold the intermediate data for each subsequent aggregate(cascading.flow.FlowProcess, AggregatorCall) call, new HashMap() should be set on the AggregatorCall instance when OperationCall.getContext() is null. On the next grouping, start() will be called again, but this time with the old Map instance. In this case, map.clear() should be invoked before returning.

Parameters:
flowProcess - of type FlowProcess
aggregatorCall - of type AggregatorCall

aggregate

void aggregate(FlowProcess flowProcess,
               AggregatorCall<Context> aggregatorCall)
Method aggregate is called for each TupleEntry value in the current grouping.

TupleEntry entry, or entry.getTuple() should not be stored directly in the context. A copy of the tuple should be made via the new Tuple( entry.getTuple() ) copy constructor.

Parameters:
flowProcess - of type FlowProcess
aggregatorCall - of type AggregatorCall

complete

void complete(FlowProcess flowProcess,
              AggregatorCall<Context> aggregatorCall)
Method complete will be issued last after every TupleEntry has been passed to the aggregate(cascading.flow.FlowProcess, AggregatorCall) method. Any final calculation should be completed here and passed to the outputCollector.

Parameters:
flowProcess - of type FlowProcess
aggregatorCall - of type AggregatorCall


Copyright © 2007-2014 Concurrent, Inc. All Rights Reserved.