Cascading includes a number of text functions in the cascading.operation.text package.
The cascading.operation.text.FieldJoiner function joins all the values in a Tuple with a given delimiter and places the result in a new field.
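To make the joining behavior concrete, here is a minimal plain-Java sketch of the idea, not Cascading's own implementation. The class name FieldJoinSketch and the method joinValues are hypothetical and introduced only for illustration; a List stands in for the Tuple.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FieldJoinSketch {
    // Hypothetical helper: joins the string form of each value, in order,
    // placing the delimiter between neighbouring values.
    public static String joinValues(List<Object> values, String delimiter) {
        return values.stream()
                .map(value -> String.valueOf(value))
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        // A List stands in for the values of a Tuple.
        List<Object> values = Arrays.asList("2007", 9, "01");
        System.out.println(joinValues(values, "-")); // prints 2007-9-01
    }
}
```

In a Cascading flow, the joined String would be emitted as the single value of the new field.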
The cascading.operation.text.FieldFormatter function formats Tuple values with a given String format and places the result in a new field. The java.util.Formatter class is used to create the formatted String.
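Since the format String follows java.util.Formatter syntax, it can help to try a format in plain Java first. The sketch below uses only the JDK; the class name FieldFormatSketch, the method formatValues, and the sample format are assumptions made for illustration, and the fixed Locale.US is a choice of this sketch rather than documented FieldFormatter behavior.

```java
import java.util.Formatter;
import java.util.Locale;

public class FieldFormatSketch {
    // Hypothetical helper: applies a java.util.Formatter format String
    // to the given values and returns the formatted result.
    public static String formatValues(String format, Object... values) {
        try (Formatter formatter = new Formatter(new StringBuilder(), Locale.US)) {
            return formatter.format(format, values).toString();
        }
    }

    public static void main(String[] args) {
        // The values play the role of the incoming Tuple values.
        System.out.println(formatValues("%s scored %d points", "alice", 42));
        // prints: alice scored 42 points
    }
}
```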
The cascading.operation.text.DateParser function converts a text date String to a timestamp using the java.text.SimpleDateFormat syntax. The timestamp is a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT. By default, the function emits a field named "ts" (for timestamp), but this can be overridden by passing a declared Fields value.
    // "time" -> 01/Sep/2007:00:01:03 +0000
    DateParser dateParser = new DateParser( "dd/MMM/yyyy:HH:mm:ss Z" );
    pipe = new Each( pipe, new Fields( "time" ), dateParser );
    // outgoing -> "ts" -> 1188604863000
Above, we convert an Apache log-style date-time field into a long timestamp.
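The pattern and the resulting value can be checked outside of Cascading with java.text.SimpleDateFormat directly. This is a plain-JDK sketch, not DateParser's implementation; the class name DateParseSketch and the method toTimestamp are hypothetical, and Locale.US is assumed so that the month abbreviation "Sep" parses regardless of the machine's default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DateParseSketch {
    // Hypothetical helper: parses text with a SimpleDateFormat pattern and
    // returns milliseconds since January 1, 1970, 00:00:00 GMT.
    public static long toTimestamp(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // The "+0000" offset in the text fixes the zone, so the result does
        // not depend on the default time zone of the machine.
        long ts = toTimestamp("dd/MMM/yyyy:HH:mm:ss Z", "01/Sep/2007:00:01:03 +0000");
        System.out.println(ts); // prints 1188604863000
    }
}
```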
The cascading.operation.text.DateFormatter function converts a date timestamp to a formatted String. It expects a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT, and uses the java.text.SimpleDateFormat syntax.
    // "ts" -> 1188604863000
    DateFormatter formatter = new DateFormatter( new Fields( "date" ), "dd/MMM/yyyy" );
    pipe = new Each( pipe, new Fields( "ts" ), formatter );
    // outgoing -> "date" -> 31/Aug/2007 when formatted in a time zone west of GMT
    // (this instant is 01/Sep/2007 00:01:03 in GMT)
Above, we convert a long timestamp ("ts") to a date String.
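Because a timestamp is an instant rather than a calendar date, the formatted String depends on the time zone applied during formatting. The plain-JDK sketch below illustrates this with java.text.SimpleDateFormat; it is not DateFormatter's implementation, and the class name DateFormatSketch, the method formatTimestamp, and the chosen zone IDs are assumptions made for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatSketch {
    // Hypothetical helper: formats a millisecond timestamp with a
    // SimpleDateFormat pattern, evaluated in the given time zone.
    public static String formatTimestamp(long timestamp, String pattern, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.format(new Date(timestamp));
    }

    public static void main(String[] args) {
        long ts = 1188604863000L; // 01/Sep/2007 00:01:03 GMT

        System.out.println(formatTimestamp(ts, "dd/MMM/yyyy", "GMT"));
        // prints 01/Sep/2007

        System.out.println(formatTimestamp(ts, "dd/MMM/yyyy", "America/Los_Angeles"));
        // prints 31/Aug/2007 (seven hours behind GMT on that date)
    }
}
```

When the calendar date matters, it is worth confirming which time zone the formatting step applies rather than relying on the default.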
Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.