Type coercion is a means to convert one data type to another. For example, parsing the Java String "42" into the Integer 42 is a coercion. Converting a Long 42 to a Double 42.0 is an even simpler one. Cascading supports primitive type coercions natively through the cascading.tuple.coerce.Coercions class.
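To make the idea concrete, the following plain-Java sketch performs the two conversions just described. It does not use Cascading: the class PrimitiveCoercionSketch and its coerce method are hypothetical illustrations, not the API of cascading.tuple.coerce.Coercions.

```java
import java.lang.reflect.Type;

public class PrimitiveCoercionSketch {

    // Hypothetical helper, NOT Cascading's Coercions API: converts a value to one
    // of a few boxed targets, parsing Strings and narrowing/widening Numbers.
    static Object coerce(Object value, Type to) {
        if (value == null) {
            return null;
        }
        if (to == Integer.class) {
            return value instanceof Number ? ((Number) value).intValue() : Integer.parseInt(value.toString());
        }
        if (to == Long.class) {
            return value instanceof Number ? ((Number) value).longValue() : Long.parseLong(value.toString());
        }
        if (to == Double.class) {
            return value instanceof Number ? ((Number) value).doubleValue() : Double.parseDouble(value.toString());
        }
        if (to == String.class) {
            return value.toString();
        }
        throw new IllegalArgumentException("unsupported target type: " + to);
    }

    public static void main(String[] args) {
        Object parsed = coerce("42", Integer.class); // String "42" -> Integer 42
        Object widened = coerce(42L, Double.class);  // Long 42 -> Double 42.0
        System.out.println(parsed + " " + widened);
    }
}
```

Running main prints 42 42.0. A real coercion library must also cover primitive targets such as long.class, booleans, and more types; this sketch deliberately keeps only the cases mentioned above.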
In practice, developers invoke coercions implicitly through the cascading.tuple.TupleEntry class, by requesting a Long or String representation of a field via TupleEntry.getLong() or TupleEntry.getString(), respectively. Coercions are also applied when data is set on a Tuple, for example via TupleEntry.setLong() or TupleEntry.setString(). If the field was declared as an Integer and TupleEntry.setString( "someField", "42" ) was called, the value of "someField" is coerced into its canonical form, the Integer 42.
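The following plain-Java sketch models this behavior without Cascading: each field has a declared type, setString converts the incoming String into that canonical type before storing it, and getLong converts on the way out. TypedEntrySketch and its methods are hypothetical stand-ins that only mimic the shape of TupleEntry.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedEntrySketch {

    // Hypothetical stand-in for a TupleEntry: each field has a declared type,
    // and values are stored in that type's canonical form.
    private final Map<String, Class<?>> declared = new HashMap<>();
    private final Map<String, Object> values = new HashMap<>();

    TypedEntrySketch declare(String field, Class<?> type) {
        declared.put(field, type);
        return this;
    }

    // Coerce an incoming String into the declared canonical type before storing it.
    void setString(String field, String value) {
        Class<?> type = declared.get(field);
        if (type == Integer.class) {
            values.put(field, Integer.valueOf(value));
        } else if (type == Long.class) {
            values.put(field, Long.valueOf(value));
        } else {
            values.put(field, value); // undeclared or String fields are kept as-is
        }
    }

    Object getObject(String field) {
        return values.get(field);
    }

    // Coerce on the way out: the stored value is returned as a long.
    long getLong(String field) {
        Object value = values.get(field);
        return value instanceof Number ? ((Number) value).longValue() : Long.parseLong(value.toString());
    }

    public static void main(String[] args) {
        TypedEntrySketch entry = new TypedEntrySketch().declare("someField", Integer.class);
        entry.setString("someField", "42");
        Object stored = entry.getObject("someField");
        System.out.println(stored.getClass().getSimpleName() + " " + stored);
        System.out.println(entry.getLong("someField"));
    }
}
```

Here the String "42" is stored as an Integer, so main prints Integer 42 followed by 42. The conversion happens once, on set, rather than every time the field is read.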
To create custom coercions, implement the cascading.tuple.type.CoercibleType interface. Because CoercibleType extends java.lang.reflect.Type, CoercibleType instances can be used wherever the Fields API accepts a Type.
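As a sketch under stated assumptions, the code below defines a hypothetical SketchCoercibleType interface that extends java.lang.reflect.Type and exposes a canonical type, a canonical(...) conversion for incoming values, and a coerce(...) conversion for outgoing values. These names are illustrative and are not guaranteed to match CoercibleType's exact method signatures in any given Cascading version. The example YesNoType stores "yes"/"no" strings canonically as a Boolean.

```java
import java.lang.reflect.Type;

public class CustomTypeSketch {

    // Hypothetical interface loosely modeled on the idea behind CoercibleType;
    // it is NOT Cascading's actual interface. Extending java.lang.reflect.Type
    // lets instances appear wherever a Type is expected.
    interface SketchCoercibleType extends Type {
        Class<?> getCanonicalType();

        Object canonical(Object value);        // incoming value -> canonical form

        Object coerce(Object value, Type to);  // canonical form -> requested type
    }

    // Example custom type: "yes"/"no" strings stored canonically as Boolean.
    static final class YesNoType implements SketchCoercibleType {
        @Override
        public Class<?> getCanonicalType() {
            return Boolean.class;
        }

        @Override
        public Object canonical(Object value) {
            if (value == null || value instanceof Boolean) {
                return value;
            }
            // any String other than "yes" (case-insensitive) maps to false
            return "yes".equalsIgnoreCase(value.toString());
        }

        @Override
        public Object coerce(Object value, Type to) {
            if (value == null) {
                return null;
            }
            if (to == String.class) {
                return Boolean.TRUE.equals(value) ? "yes" : "no";
            }
            return value; // other targets receive the canonical Boolean unchanged
        }
    }

    public static void main(String[] args) {
        YesNoType yesNo = new YesNoType();
        // Being a Type, the custom type can sit next to ordinary classes in a Type[].
        Type[] fieldTypes = {String.class, yesNo};
        Object stored = yesNo.canonical("YES");
        System.out.println(fieldTypes.length + " " + stored + " " + yesNo.coerce(stored, String.class));
    }
}
```

Running main prints 2 true yes. The design point is the supertype: because the interface extends Type, a custom type can be mixed freely with built-in classes such as String.class in any array of field types.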
Cascading provides a cascading.tuple.type.DateType implementation that allows coercions between date strings and the Long canonical type. For example:
Example 7.4. Date Type
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;
import cascading.tuple.type.CoercibleType;
import cascading.tuple.type.DateType;

// the same pattern is used to format the expected strings and to configure DateType
SimpleDateFormat dateFormat = new SimpleDateFormat( "dd/MMM/yyyy:HH:mm:ss:SSS Z" );
Date firstDate = new Date();
String stringFirstDate = dateFormat.format( firstDate );
CoercibleType coercible = new DateType( "dd/MMM/yyyy:HH:mm:ss:SSS Z", TimeZone.getDefault() );

// create the Fields, Tuple, and TupleEntry
Fields fields = new Fields( "dateString", "dateValue" ).applyTypes( coercible, long.class );
Tuple tuple = new Tuple( firstDate.getTime(), firstDate.getTime() );
TupleEntry results = new TupleEntry( fields, tuple );

// test the results
assert results.getObject( "dateString" ).equals( firstDate.getTime() );
assert results.getLong( "dateString" ) == firstDate.getTime();
assert results.getString( "dateString" ).equals( stringFirstDate );
assert !results.getString( "dateString" ).equals( results.getString( "dateValue" ) ); // not equals

// set a date one minute later, as a String on the DateType field and as a long on the other
Date secondDate = new Date( firstDate.getTime() + ( 60 * 1000 ) );
String stringSecondDate = dateFormat.format( secondDate );
results.setString( "dateString", stringSecondDate );
results.setLong( "dateValue", secondDate.getTime() );

assert !results.getObject( "dateString" ).equals( firstDate.getTime() ); // not equals
assert results.getObject( "dateString" ).equals( secondDate.getTime() ); // equals
In this example, the "dateString" field is declared as a DateType. DateType maintains the field's value internally as a long, but if a String is set or requested, it is converted using the given SimpleDateFormat pattern and TimeZone. For a TextDelimited CSV file in which one column holds a date value, DateType can be used to declare that column's format. TextDelimited can then read and write the value as a String while the value is used internally (in the Tuple) as a long, which is typically more compact and cheaper to compare than the String form.
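The CSV scenario can be illustrated with plain JDK classes. The hypothetical DateFieldSketch below keeps the canonical value as a long (epoch milliseconds) and parses or formats Strings with a fixed pattern and TimeZone, which is the same idea DateType applies; it is not Cascading's implementation, and the sample CSV line is invented for illustration. Locale.US and UTC are fixed here so the month abbreviation and offset are predictable.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFieldSketch {

    // Hypothetical helper illustrating the DateType idea, NOT Cascading's DateType:
    // the canonical value is a long (epoch millis); Strings are parsed and
    // formatted with a fixed pattern and TimeZone.
    private final String pattern;
    private final TimeZone zone;

    DateFieldSketch(String pattern, TimeZone zone) {
        this.pattern = pattern;
        this.zone = zone;
    }

    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    private SimpleDateFormat format() {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(zone);
        return format;
    }

    // Reading a CSV column: date String -> canonical long.
    long canonical(String text) {
        try {
            return format().parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + text, e);
        }
    }

    // Writing a CSV column: canonical long -> date String.
    String toText(long millis) {
        return format().format(new Date(millis));
    }

    public static void main(String[] args) {
        DateFieldSketch dateField = new DateFieldSketch("dd/MMM/yyyy:HH:mm:ss:SSS Z", TimeZone.getTimeZone("UTC"));
        String[] columns = "1,01/Jan/2012:00:00:00:000 +0000".split(",");
        long millis = dateField.canonical(columns[1]); // held internally as a long
        System.out.println(millis);
        System.out.println(columns[0] + "," + dateField.toText(millis));
    }
}
```

Parsing 01/Jan/2012:00:00:00:000 +0000 yields 1325376000000, and formatting that long again reproduces the original column text, so the value round-trips through its String form while all in-memory work uses the long.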
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.