cascading.operation
Interface Buffer<Context>

All Superinterfaces:
DeclaresResults, Operation<Context>
All Known Implementing Classes:
FirstNBuffer

public interface Buffer<Context>
extends Operation<Context>

A Buffer is similar to an Aggregator by the fact that it operates on unique groups of values. It differs by the fact that an Iterator is provided and it is the responsibility of the operate(cascading.flow.FlowProcess, BufferCall) method to iterate overall all the input arguments returned by this Iterator, if any.

For the case where a Buffer follows a CoGroup, the method operate(cascading.flow.FlowProcess, BufferCall) will be called for every unique group whether or not there are values available to iterate over. This may be counter-intuitive for the case of an 'inner join' where the left or right stream may have a null grouping key value. Regardless, the current grouping value can be retrieved through BufferCall.getGroup().

Buffer is very useful when header or footer values need to be inserted into a grouping, or if values need to be inserted into the middle of the group values. For example, consider a stream of timestamps. A Buffer could be used to add missing entries, or to calculate running or moving averages over a smaller "window" within the grouping.

By default, if a result is emitted from the Buffer before the argumentsIterator is started or after it is completed (argumentsIterator.hasNext() == false), non-grouping values are forced to null (to allow for header and footer tuple results).

By setting BufferCall.setRetainValues(boolean) to true in the Operation.prepare(cascading.flow.FlowProcess, OperationCall) method, the last seen Tuple values will not be nulled after completion and will be treated as the current incoming Tuple when merged with the Buffer result Tuple via the Every outgoing selector.

There may be only one Buffer after a GroupBy or CoGroup. And there may not be any additional Every pipes before or after the buffers Every pipe instance. A PlannerException will be thrown if these rules are violated.

Buffer implementations should be re-entrant. There is no guarantee a Buffer instance will be executed in a unique vm, or by a single thread. Also, note the Iterator will return the same TupleEntry instance, but with new values in its child Tuple.

As of Cascading 2.5, if the previous CoGroup uses a BufferJoin as the Joiner, a Buffer may be used to implement differing Joiner strategies.

Instead of calling BufferCall.getArgumentsIterator() (which will return null), BufferCall.getJoinerClosure() will return an JoinerClosure instance with direct access to each CoGrouped Iterator.


Field Summary
 
Fields inherited from interface cascading.operation.Operation
ANY
 
Method Summary
 void operate(FlowProcess flowProcess, BufferCall<Context> bufferCall)
          Method operate is called once for each grouping.
 
Methods inherited from interface cascading.operation.Operation
cleanup, flush, getFieldDeclaration, getNumArgs, isSafe, prepare
 

Method Detail

operate

void operate(FlowProcess flowProcess,
             BufferCall<Context> bufferCall)
Method operate is called once for each grouping. BufferCall passes in an Iterator that returns an argument TupleEntry for each value in the grouping defined by the argument selector on the parent Every pipe instance.

TupleEntry entry, or entry.getTuple() should not be stored directly in a collection or modified. A copy of the tuple should be made via the new Tuple( entry.getTuple() ) copy constructor.

This method is called for every unique group, whether or not there are values in the arguments Iterator.

Parameters:
flowProcess - of type FlowProcess
bufferCall - of type BufferCall


Copyright © 2007-2014 Concurrent, Inc. All Rights Reserved.