cascading.operation
Interface Buffer<Context>
- All Superinterfaces:
- DeclaresResults, Operation<Context>
- All Known Implementing Classes:
- FirstNBuffer
public interface Buffer<Context>
- extends Operation<Context>
A Buffer is similar to an Aggregator
by the fact that it operates on unique groups of values. It differs
by the fact that an Iterator
is provided and it is the responsibility
of the operate(cascading.flow.FlowProcess, BufferCall)
method to iterate overall all the input
arguments returned by this Iterator, if any.
For the case where a Buffer follows a CoGroup, the method operate(cascading.flow.FlowProcess, BufferCall)
will be called for every unique group whether or not there are values available to iterate over. This may be
counter-intuitive for the case of an 'inner join' where the left or right stream may have a null grouping key value.
Regardless, the current grouping value can be retrieved through BufferCall.getGroup()
.
Buffer is very useful when header or footer values need to be inserted into a grouping, or if values need to be
inserted into the middle of the group values. For example, consider a stream of timestamps. A Buffer could
be used to add missing entries, or to calculate running or moving averages over a smaller "window" within the grouping.
By default, if a result is emitted from the Buffer before the argumentsIterator is started or after it is
completed (argumentsIterator.hasNext() == false
), non-grouping values are forced to null (to allow for header
and footer tuple results).
By setting BufferCall.setRetainValues(boolean)
to true
in the
Operation.prepare(cascading.flow.FlowProcess, OperationCall)
method, the last seen Tuple values will not be
nulled after completion and will be treated as the current incoming Tuple when merged with the Buffer result Tuple
via the Every outgoing selector.
There may be only one Buffer after a GroupBy
or CoGroup
. And there
may not be any additional Every
pipes before or after the buffers Every pipe instance. A
PlannerException
will be thrown if these rules are violated.
Buffer implementations should be re-entrant. There is no guarantee a Buffer instance will be executed in a
unique vm, or by a single thread. Also, note the Iterator will return the same TupleEntry
instance, but with new values in its child Tuple
.
As of Cascading 2.5, if the previous CoGroup uses a BufferJoin
as the
Joiner
, a Buffer may be used to implement differing Joiner strategies.
Instead of calling BufferCall.getArgumentsIterator()
(which will return null),
BufferCall.getJoinerClosure()
will return an JoinerClosure
instance with direct access to each CoGrouped Iterator.
Fields inherited from interface cascading.operation.Operation |
ANY |
operate
void operate(FlowProcess flowProcess,
BufferCall<Context> bufferCall)
- Method operate is called once for each grouping.
BufferCall
passes in an Iterator
that returns an argument TupleEntry
for each value in the grouping defined by the
argument selector on the parent Every pipe instance.
TupleEntry entry, or entry.getTuple() should not be stored directly in a collection or modified.
A copy of the tuple should be made via the new Tuple( entry.getTuple() )
copy constructor.
This method is called for every unique group, whether or not there are values in the arguments Iterator.
- Parameters:
flowProcess
- of type FlowProcessbufferCall
- of type BufferCall
Copyright © 2007-2014 Concurrent, Inc. All Rights Reserved.