12.3 Custom Comparators and Hashing

Frequently, objects in one Tuple are compared to objects in a second Tuple. This is especially true during the sort phase of GroupBy and CoGroup in Cascading Hadoop mode . By default, Hadoop and Cascading use the native Object methods equals() and hashCode() to compare two values and get a consistent hash code for a given value, respectively.

To override this default behavior, you can create a custom java.util.Comparator class to perform comparisons on a given field in a Tuple. For instance, to secondary-sort a collection of custom Person objects in a GroupBy, use the Fields.setComparator() method to designate the custom Comparator to the Fields instance that specifies the sort fields.

Alternatively, you can set a default Comparator to be used by a Flow, or used locally on a given Pipe instance. There are two ways to do this. Call FlowProps.setDefaultTupleElementComparator() on a Properties instance, or use the property key cascading.flow.tuple.element.comparator.

If the hash code must also be customized, the custom Comparator can implement the interface cascading.tuple.Hasher. For more information, see the Javadoc.

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.