The Tuple class is a generic container for
java.lang.Object instances. Thus any
primitive value or custom class can be stored in a
Tuple instance and, for example, returned by a
Buffer as a result value.
For this to work in the Cascading Hadoop mode, however, any
class that is not a primitive type or a Hadoop
Writable type requires a corresponding Hadoop
serialization class registered in the Hadoop configuration files for
your cluster. Hadoop
Writable types work because
there is already a generic serialization implementation built into
Hadoop. See the Hadoop documentation for information on registering a
new serialization helper or creating Writable
types. Registered serialization implementations are automatically
inherited by Cascading.
During serialization and deserialization of
Tuple instances that contain custom types, the
Tuple serialization framework must
store the class name (as a String) before
serializing the custom object. This can be very space-inefficient. To
overcome this, add the
SerializationToken Java annotation to the custom
type class. The SerializationToken annotation
expects two arrays: one of integers that are used as tokens, and one of
class name strings. Both arrays must be the same size. The integer
tokens must all have values of 128 or greater, since the first 128
values (0 through 127) are reserved for internal use.
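As a minimal sketch of the annotation shape described above, the following self-contained Java example declares a local stand-in annotation with an integer array and a class name array, then checks the two rules (equal array sizes, tokens of 128 or greater). The stand-in annotation, its element names tokens and classNames, the Money type, and the readTokens helper are assumptions made for illustration only; consult the Javadoc for the real annotation's package and element names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

// Demo entry point: reads the stand-in annotation from a type and validates it.
class TokenAnnotationSketch {

  // Returns token -> class name, enforcing the two rules from the text.
  static Map<Integer, String> readTokens(Class<?> type) {
    Map<Integer, String> result = new LinkedHashMap<>();
    SerializationToken annotation = type.getAnnotation(SerializationToken.class);
    if (annotation == null) {
      return result; // type declares no tokens
    }
    int[] tokens = annotation.tokens();
    String[] classNames = annotation.classNames();
    if (tokens.length != classNames.length) {
      throw new IllegalArgumentException("tokens and classNames must be the same size");
    }
    for (int i = 0; i < tokens.length; i++) {
      if (tokens[i] < 128) {
        throw new IllegalArgumentException("token " + tokens[i] + " is reserved; use 128 or greater");
      }
      result.put(tokens[i], classNames[i]);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(readTokens(Money.class));
  }
}

// Local stand-in for illustration only (assumption): NOT the Cascading annotation itself.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SerializationToken {
  int[] tokens();
  String[] classNames();
}

// Hypothetical custom type that maps token 222 to its own class name.
@SerializationToken(tokens = {222}, classNames = {"Money"})
class Money {
  final long cents;

  Money(long cents) {
    this.cents = cents;
  }
}
```

Keeping each token next to the class it stands for makes it harder for two types to claim the same token by accident.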
During serialization and deserialization, the token values are
used instead of the
String class names, in order
to reduce the amount of storage used.
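To make the space saving concrete, here is a rough sketch that compares the bytes needed to write a class name string with the bytes needed to write an integer token. It uses java.io.DataOutputStream purely for illustration and does not reproduce Cascading's actual wire format; the class name com.example.Money and the helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TokenSizeSketch {

  // Bytes used when the class name itself is written as a string.
  static int bytesForClassName(String className) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(className); // 2-byte length prefix plus the encoded characters
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  // Bytes used when a fixed-width integer token is written instead.
  static int bytesForToken(int token) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(token); // always 4 bytes with writeInt
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  public static void main(String[] args) {
    String className = "com.example.Money"; // hypothetical custom type
    System.out.println("class name bytes: " + bytesForClassName(className));
    System.out.println("token bytes:      " + bytesForToken(222));
  }
}
```

The overhead is paid once per custom object in every serialized Tuple, so the difference grows with the length of the class name and the number of records.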
Serialization tokens may also be stored in the Hadoop config files
or set as a property. The value of this property is a comma-separated
list of token and class name pairs.
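The sketch below shows one way such a comma-separated property value could be parsed and checked before it is handed to a job. The pair syntax token=classname is an assumption made for this example, as are the class names and the parse helper; verify the property name and exact format against the Javadoc before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TokenPropertySketch {

  // Parses "222=com.example.Money,223=com.example.Account" into token -> class name.
  // The token=classname pair syntax is an assumption for illustration.
  static Map<Integer, String> parse(String value) {
    Map<Integer, String> tokens = new LinkedHashMap<>();
    if (value == null || value.trim().isEmpty()) {
      return tokens;
    }
    for (String pair : value.split(",")) {
      String[] parts = pair.trim().split("=", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("expected token=classname, got: " + pair);
      }
      int token = Integer.parseInt(parts[0].trim());
      if (token < 128) {
        throw new IllegalArgumentException("token " + token + " is reserved; use 128 or greater");
      }
      if (tokens.put(token, parts[1].trim()) != null) {
        throw new IllegalArgumentException("duplicate token: " + token);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Hypothetical class names, used only to exercise the parser.
    System.out.println(parse("222=com.example.Money, 223=com.example.Account"));
  }
}
```

Validating the list up front surfaces reserved or duplicate tokens at configuration time rather than during a running job.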
Note that Cascading natively serializes and deserializes all
primitives and byte arrays (byte[]), if the developer registers the
BytesSerialization class by using
BytesSerialization.class.getName(). The token 127 is used for the
By default, Cascading uses lazy deserialization on Tuple elements during comparisons when Hadoop sorts keys during the "shuffle" phase.
Cascading supports custom serialization for custom types, as well
as lazy deserialization of custom types during comparisons. This is
accomplished by implementing the
interface. See the Javadoc for detailed instructions on implementation,
and the unit tests for examples.
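The idea behind lazy deserialization during comparisons can be sketched as follows: the comparator reads only as many bytes from each serialized value as it needs to decide the order, and never rebuilds the full object. The RawStreamComparator interface here is a hypothetical stand-in, not the Cascading interface, and the serialized layout (a long followed by a string) is assumed for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class LazyCompareSketch {

  // Hypothetical stand-in for a comparator that works on serialized streams.
  interface RawStreamComparator {
    int compare(InputStream lhs, InputStream rhs);
  }

  // Assumed layout for a custom value: sort key (long) first, payload (string) second.
  static byte[] serialize(long cents, String note) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(cents);
      out.writeUTF(note);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads only the leading long from each stream; the note string is never decoded.
  static final RawStreamComparator BY_CENTS = (lhs, rhs) -> {
    try {
      long left = new DataInputStream(lhs).readLong();
      long right = new DataInputStream(rhs).readLong();
      return Long.compare(left, right);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  };

  public static void main(String[] args) {
    byte[] a = serialize(150, "coffee");
    byte[] b = serialize(9900, "conference ticket");
    int order = BY_CENTS.compare(new ByteArrayInputStream(a), new ByteArrayInputStream(b));
    System.out.println(order < 0 ? "a sorts first" : "b sorts first or ties");
  }
}
```

Writing the sort key at the front of the serialized form is what lets the comparison stop early; a layout that buries the key after variable-length fields would force more of the value to be read.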
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.