7.2 Type Coercion

Type coercion is a means of converting one data type to another. For example, parsing the Java String "42" into the Integer 42 is a coercion, as is, more simply, converting the Long 42 into the Double 42.0. Cascading supports primitive type coercions natively through the cascading.tuple.coerce.Coercions class.
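Both conversions can be sketched in plain Java. The SimpleCoercions class below is hypothetical and exists only for illustration. It is not Cascading's Coercions class, and it handles just a few boxed types: Strings are parsed, and Numbers are widened or narrowed to the requested type.

```java
// Hypothetical helper for illustration only; this is NOT cascading.tuple.coerce.Coercions.
public class SimpleCoercions {

  // Coerce a value to the requested boxed type. Strings are parsed and
  // Numbers are converted; only a handful of target types are supported.
  public static Object coerce(Object value, Class<?> to) {
    if (value == null || to.isInstance(value)) {
      return value; // nothing to convert
    }
    if (to == Integer.class) {
      return value instanceof Number ? ((Number) value).intValue() : Integer.parseInt(value.toString());
    }
    if (to == Long.class) {
      return value instanceof Number ? ((Number) value).longValue() : Long.parseLong(value.toString());
    }
    if (to == Double.class) {
      return value instanceof Number ? ((Number) value).doubleValue() : Double.parseDouble(value.toString());
    }
    if (to == String.class) {
      return value.toString();
    }
    throw new IllegalArgumentException("unsupported coercion to " + to.getName());
  }

  public static void main(String[] args) {
    Object fromString = coerce("42", Integer.class); // String "42" -> Integer 42
    Object fromLong = coerce(42L, Double.class);     // Long 42 -> Double 42.0
    System.out.println(fromString + " " + fromString.getClass().getSimpleName()); // 42 Integer
    System.out.println(fromLong + " " + fromLong.getClass().getSimpleName());     // 42.0 Double
  }
}
```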

In practice, developers invoke coercions implicitly through the cascading.tuple.TupleEntry class, by requesting a Long or String representation of a field via TupleEntry.getLong() or TupleEntry.getString(), respectively.

Coercions are also invoked when data is set on a Tuple, for example via TupleEntry.setLong() or TupleEntry.setString(). If the field was declared as an Integer and TupleEntry.setString( "someField", "42" ) is called, the value of "someField" is coerced into its canonical form, the Integer 42.
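The get/set behavior can be illustrated with a minimal sketch, assuming a hypothetical TypedEntry class (not part of Cascading, and far simpler than TupleEntry) that records a declared type per field. Setters store values in the declared canonical form; getters convert the stored value into the requested representation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a TupleEntry; NOT Cascading's implementation.
public class TypedEntry {
  private final Map<String, Class<?>> declared = new LinkedHashMap<>();
  private final Map<String, Object> values = new LinkedHashMap<>();

  // Declare the canonical type of a field.
  public TypedEntry declare(String field, Class<?> type) {
    declared.put(field, type);
    return this;
  }

  // Setters: coerce the incoming value into the field's declared (canonical) type.
  public void setString(String field, String value) {
    values.put(field, toCanonical(value, declared.get(field)));
  }

  public void setLong(String field, long value) {
    values.put(field, toCanonical(value, declared.get(field)));
  }

  // Getters: coerce the stored canonical value into the requested representation.
  public String getString(String field) {
    Object v = values.get(field);
    return v == null ? null : v.toString();
  }

  public long getLong(String field) {
    Object v = values.get(field);
    return v instanceof Number ? ((Number) v).longValue() : Long.parseLong(v.toString());
  }

  public Object getObject(String field) {
    return values.get(field);
  }

  private static Object toCanonical(Object value, Class<?> type) {
    if (type == Integer.class) {
      return value instanceof Number ? ((Number) value).intValue() : Integer.parseInt(value.toString());
    }
    if (type == Long.class) {
      return value instanceof Number ? ((Number) value).longValue() : Long.parseLong(value.toString());
    }
    return value; // other or undeclared types are stored as given
  }

  public static void main(String[] args) {
    TypedEntry entry = new TypedEntry().declare("someField", Integer.class);
    entry.setString("someField", "42");                   // a String goes in...
    Object stored = entry.getObject("someField");          // ...and an Integer is stored
    System.out.println(stored.getClass().getSimpleName()); // Integer
    System.out.println(entry.getLong("someField") + 1);    // 43
  }
}
```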

To create a custom coercion, implement the cascading.tuple.type.CoercibleType interface. Because CoercibleType extends java.lang.reflect.Type, a CoercibleType instance can be used anywhere the Fields API accepts a Type.
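A sketch of a custom type follows. So that it compiles without Cascading on the classpath, it declares a local SketchCoercibleType interface. Its three methods (getCanonicalType, canonical, coerce) are modeled on Cascading 2.x's CoercibleType as we understand it, so check the CoercibleType Javadoc for the exact signatures in your version. The YesNoType itself is hypothetical: it holds a Boolean canonically but reads and writes "yes"/"no" strings. Because the interface extends java.lang.reflect.Type, an instance could be passed where a Type is expected, as Example 7.4 does with DateType and Fields.applyTypes().

```java
import java.lang.reflect.Type;

public class YesNoTypeExample {

  // Local stand-in modeled on cascading.tuple.type.CoercibleType (an assumption, not the real interface).
  public interface SketchCoercibleType<Canonical> extends Type {
    Class<Canonical> getCanonicalType();  // type held inside the Tuple
    Canonical canonical(Object value);    // incoming value -> canonical form
    Object coerce(Object value, Type to); // canonical form -> requested type
  }

  // Hypothetical custom type: canonical Boolean, textual form "yes"/"no".
  public static final class YesNoType implements SketchCoercibleType<Boolean> {
    @Override
    public Class<Boolean> getCanonicalType() {
      return Boolean.class;
    }

    @Override
    public Boolean canonical(Object value) {
      if (value == null || value instanceof Boolean) {
        return (Boolean) value;
      }
      String s = value.toString().trim();
      if (s.equalsIgnoreCase("yes")) return Boolean.TRUE;
      if (s.equalsIgnoreCase("no")) return Boolean.FALSE;
      throw new IllegalArgumentException("not yes/no: " + value);
    }

    @Override
    public Object coerce(Object value, Type to) {
      Boolean b = canonical(value);
      if (b == null) return null;
      if (to == String.class) return b ? "yes" : "no";
      if (to == Boolean.class || to == boolean.class) return b;
      throw new IllegalArgumentException("cannot coerce to " + to.getTypeName());
    }
  }

  public static void main(String[] args) {
    YesNoType type = new YesNoType();
    Type asReflectType = type; // compiles because the interface extends java.lang.reflect.Type
    System.out.println(type.canonical("YES"));            // true
    System.out.println(type.coerce(false, String.class)); // no
  }
}
```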

Cascading provides a cascading.tuple.type.DateType implementation that coerces between date strings and the canonical Long type. For example:

Example 7.4. Date Type

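import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;
import cascading.tuple.type.CoercibleType;
import cascading.tuple.type.DateType;

SimpleDateFormat dateFormat = new SimpleDateFormat( "dd/MMM/yyyy:HH:mm:ss:SSS Z" );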
Date firstDate = new Date();
String stringFirstDate = dateFormat.format( firstDate );

CoercibleType coercible = new DateType( "dd/MMM/yyyy:HH:mm:ss:SSS Z", TimeZone.getDefault() );

// create the Fields, Tuple, and TupleEntry
Fields fields = new Fields( "dateString", "dateValue" ).applyTypes( coercible, long.class );
Tuple tuple = new Tuple( firstDate.getTime(), firstDate.getTime() );
TupleEntry results = new TupleEntry( fields, tuple );

// test the results
assert results.getObject( "dateString" ).equals( firstDate.getTime() );
assert results.getLong( "dateString" ) == firstDate.getTime();
assert results.getString( "dateString" ).equals( stringFirstDate );
assert !results.getString( "dateString" ).equals( results.getString( "dateValue" ) ); // not equals

// one minute later; set "dateString" from a String and let DateType parse it back into a long
Date secondDate = new Date( firstDate.getTime() + ( 60 * 1000 ) );
String stringSecondDate = dateFormat.format( secondDate );

results.setString( "dateString", stringSecondDate );
results.setLong( "dateValue", secondDate.getTime() );

assert !results.getObject( "dateString" ).equals( firstDate.getTime() ); // not equals
assert results.getObject( "dateString" ).equals( secondDate.getTime() ); // equals

In this example the "dateString" field is declared as a DateType. DateType stores the field's value internally as a long, but when a String is set or requested, it converts the value using the given SimpleDateFormat pattern and TimeZone.

For a TextDelimited CSV file in which one column holds a date, DateType can declare that column's format. TextDelimited then reads and writes the value as a String, while the Tuple holds it internally as a long, which is much more efficient to store and compare than a String.
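The read-as-String, hold-as-long pattern can be sketched in plain Java. DateColumnSketch is a hypothetical class, not Cascading's DateType, and the CSV line and "yyyy-MM-dd HH:mm:ss" pattern are made-up inputs. The sketch parses the date column into epoch milliseconds with a SimpleDateFormat bound to a TimeZone, does arithmetic on the long without re-parsing, and formats the long back to text only when the record is written.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch of DateType-style behavior; NOT Cascading's DateType.
public class DateColumnSketch {
  private final String pattern;
  private final TimeZone zone;

  public DateColumnSketch(String pattern, TimeZone zone) {
    this.pattern = pattern;
    this.zone = zone;
  }

  // A fresh SimpleDateFormat per call, since SimpleDateFormat is not thread-safe.
  private SimpleDateFormat format() {
    SimpleDateFormat f = new SimpleDateFormat(pattern);
    f.setTimeZone(zone);
    return f;
  }

  // Date String read from the file -> canonical long (epoch milliseconds).
  public long canonical(String text) {
    try {
      return format().parse(text).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException("unparseable date: " + text, e);
    }
  }

  // Canonical long -> date String written back to the file.
  public String toText(long millis) {
    return format().format(new Date(millis));
  }

  public static void main(String[] args) {
    DateColumnSketch dates = new DateColumnSketch("yyyy-MM-dd HH:mm:ss", TimeZone.getTimeZone("UTC"));

    String[] columns = "7,2012-06-01 12:00:00".split(","); // hypothetical record: id, timestamp
    long stored = dates.canonical(columns[1]);             // held as a long in memory
    long oneHourLater = stored + 60L * 60 * 1000;          // arithmetic on the long, no re-parsing

    System.out.println(stored);                                        // 1338552000000
    System.out.println(columns[0] + "," + dates.toText(oneHourLater)); // 7,2012-06-01 13:00:00
  }
}
```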

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.