12.3 Custom Comparators and Hashing

Frequently, objects in one Tuple are compared to objects in a second Tuple. This is especially true during the sort phase of GroupBy and CoGroup in Cascading Hadoop mode. By default, Hadoop and Cascading use the native Object methods equals() and hashCode() to compare two values and to get a consistent hash code for a given value, respectively.

To override this default behavior, you can create a custom java.util.Comparator class to perform comparisons on a given field in a Tuple. For instance, to secondary-sort a collection of custom Person objects in a GroupBy, use the Fields.setComparator() method to assign the custom Comparator to the Fields instance that specifies the sort fields.
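A minimal sketch of such a Comparator follows. The Person class, its lastName and age fields, and the chosen ordering are hypothetical and exist only for illustration; the sketch uses the standard library alone so that it compiles without Cascading on the classpath. It also implements java.io.Serializable on the assumption that the comparator has to be serialized along with the Flow.

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical value type stored in a Tuple field; not part of Cascading.
final class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    final String lastName;
    final int age;

    Person(String lastName, int age) {
        this.lastName = lastName;
        this.age = age;
    }
}

// Orders Person values by last name (ignoring case), then by age.
// Assumption: the comparator must be Serializable to travel with the Flow.
// Null values are not handled in this sketch.
final class PersonComparator implements Comparator<Person>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(Person a, Person b) {
        int byName = a.lastName.compareToIgnoreCase(b.lastName);
        return byName != 0 ? byName : Integer.compare(a.age, b.age);
    }
}
```

With Cascading available, a comparator like this would be attached to the sort fields before they are passed to GroupBy, for example with a call along the lines of sortFields.setComparator("person", new PersonComparator()), where "person" is a hypothetical field name.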

Alternatively, you can set a default Comparator to be used by a Flow, or used locally on a given Pipe instance. There are two ways to do this: call FlowProps.setDefaultTupleElementComparator() on a Properties instance, or set the property key cascading.flow.tuple.element.comparator directly.
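The property-key route can be sketched with plain java.util.Properties. The ComparatorConfig helper and the class name passed to it are hypothetical, and the sketch assumes the property value is the fully qualified class name of the Comparator; check the FlowProps Javadoc before relying on that.

```java
import java.util.Properties;

// Hypothetical helper, not part of Cascading.
final class ComparatorConfig {
    // Property key named in the text above.
    static final String COMPARATOR_KEY = "cascading.flow.tuple.element.comparator";

    private ComparatorConfig() {
    }

    // Builds a Properties instance that names the default element comparator.
    // Assumption: the value is the fully qualified class name of the Comparator.
    static Properties withDefaultComparator(String comparatorClassName) {
        Properties properties = new Properties();
        properties.setProperty(COMPARATOR_KEY, comparatorClassName);
        return properties;
    }
}
```

The resulting Properties instance would then be supplied to whatever connector builds the Flow, so that the comparator applies to every element comparison that has no field-specific Comparator.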

If the hash code must also be customized, the custom Comparator can implement the interface cascading.tuple.Hasher. For more information, see the Javadoc.
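The following sketch shows why the comparison and the hash code usually need to be customized together: values that compare as equal should also hash identically, otherwise they may be routed to different groups. The Hasher interface declared here is a local stand-in so that the example compiles on its own; its single-method shape is an assumption about cascading.tuple.Hasher, and the class name is hypothetical.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.Locale;

// Local stand-in for cascading.tuple.Hasher (assumed shape; see the Javadoc).
interface Hasher<V> {
    int hashCode(V value);
}

// Treats strings that differ only by case as equal and hashes them identically.
// Null values are not handled in this sketch.
final class CaseInsensitiveComparator
        implements Comparator<String>, Hasher<String>, Serializable {
    private static final long serialVersionUID = 1L;

    // Single normalization shared by compare() and hashCode() so the two stay consistent.
    private static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    @Override
    public int compare(String a, String b) {
        return normalize(a).compareTo(normalize(b));
    }

    @Override
    public int hashCode(String value) {
        return normalize(value).hashCode();
    }
}
```

Routing both methods through one normalization step is a design choice that keeps them from drifting apart: any two values for which compare() returns 0 are guaranteed to produce the same hash code.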
