7.5 Text Functions

Cascading includes a number of text functions in the cascading.operation.text package.

FieldJoiner

The cascading.operation.text.FieldJoiner function joins all the values in a Tuple, separated by a given delimiter, and places the result in a new field.
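
To make the joining behavior concrete, here is a minimal plain-Java sketch of the idea. It is not Cascading's implementation: the class FieldJoinerSketch and its join method are hypothetical names used only for illustration, and a java.util.List stands in for the Tuple.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for illustration only; not part of Cascading.
final class FieldJoinerSketch {

    // Concatenates the String form of each value, in order, separated by the delimiter.
    static String join(List<?> values, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Object value : values) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // A List plays the role of a Tuple holding three values.
        List<Object> tuple = Arrays.asList("2007", 9, "01");
        System.out.println(join(tuple, "-")); // prints 2007-9-01
    }
}
```

In a pipe assembly the joined String would be emitted as a single new field, as described above.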

FieldFormatter

The cascading.operation.text.FieldFormatter function formats Tuple values with a given String format and places the result in a new field. The java.util.Formatter class is used to create the formatted String, so the format follows the java.util.Formatter syntax.
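
Since the format String is handed to java.util.Formatter, it can be tried out in plain Java before it is used in a pipe. The sketch below assumes nothing about Cascading itself: FieldFormatterSketch and its format method are hypothetical names, and the varargs stand in for the Tuple values.

```java
import java.util.Formatter;
import java.util.Locale;

// Hypothetical stand-in for illustration only; not part of Cascading.
final class FieldFormatterSketch {

    // Applies a java.util.Formatter format String to the given values.
    // Locale.ROOT is an assumption made here to keep the output stable across machines.
    static String format(String pattern, Object... values) {
        try (Formatter formatter = new Formatter(Locale.ROOT)) {
            return formatter.format(pattern, values).toString();
        }
    }

    public static void main(String[] args) {
        // Two values are combined into one formatted String.
        System.out.println(format("%s-%03d", "part", 7)); // prints part-007
    }
}
```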

DateParser

The cascading.operation.text.DateParser function is used to convert a text date String to a timestamp using the java.text.SimpleDateFormat syntax. The timestamp is a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT. By default it emits a field with the name "ts" for timestamp, but this can be overridden by passing a declared Fields value.

// "time" -> 01/Sep/2007:00:01:03 +0000

DateParser dateParser = new DateParser( "dd/MMM/yyyy:HH:mm:ss Z" );
pipe = new Each( pipe, new Fields( "time" ), dateParser );

// outgoing -> "ts" -> 1188604863000

Above we convert an Apache log style date-time field into a long timestamp.
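
The same pattern can be checked directly against java.text.SimpleDateFormat. The following is a plain-Java sketch, not Cascading's implementation; DateParserSketch and toTimestamp are hypothetical names, and Locale.ENGLISH is an assumption made so that the month abbreviation "Sep" parses on any machine.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical stand-in for illustration only; not part of Cascading.
final class DateParserSketch {

    // Parses the text with a SimpleDateFormat pattern and returns
    // milliseconds since January 1, 1970, 00:00:00 GMT.
    static long toTimestamp(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        try {
            return format.parse(text).getTime();
        } catch (ParseException exception) {
            throw new IllegalArgumentException("unparseable date: " + text, exception);
        }
    }

    public static void main(String[] args) {
        long ts = toTimestamp("dd/MMM/yyyy:HH:mm:ss Z", "01/Sep/2007:00:01:03 +0000");
        System.out.println(ts); // prints 1188604863000
    }
}
```

Because the input carries an explicit offset (+0000), the resulting timestamp does not depend on the default time zone of the machine doing the parsing.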

DateFormatter

The cascading.operation.text.DateFormatter function is used to convert a date timestamp to a formatted String. This function expects a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT, and uses the java.text.SimpleDateFormat syntax for the output format.

// "ts" -> 1188604863000

DateFormatter formatter =
  new DateFormatter( new Fields("date"), "dd/MMM/yyyy" );
pipe = new Each( pipe, new Fields( "ts" ), formatter );

// outgoing -> "date" -> 31/Aug/2007 in a time zone west of GMT (01/Sep/2007 in GMT)

Above we convert a long timestamp ("ts") to a date String.
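
The formatted date depends on the time zone applied to the timestamp: 1188604863000 is 00:01:03 on 01/Sep/2007 in GMT, which is still 31/Aug/2007 in zones several hours west of GMT. The plain-Java sketch below illustrates this with java.text.SimpleDateFormat. It is not Cascading's implementation; DateFormatterSketch and its format method are hypothetical names, and which time zone DateFormatter applies by default is not stated here and should be checked against the Cascading Javadoc.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for illustration only; not part of Cascading.
final class DateFormatterSketch {

    // Formats a millisecond timestamp with a SimpleDateFormat pattern in the given zone.
    static String format(String pattern, long timestamp, TimeZone zone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setTimeZone(zone);
        return format.format(new Date(timestamp));
    }

    public static void main(String[] args) {
        long ts = 1188604863000L;
        // Same instant, two zones, two calendar dates.
        System.out.println(format("dd/MMM/yyyy", ts, TimeZone.getTimeZone("GMT")));       // prints 01/Sep/2007
        System.out.println(format("dd/MMM/yyyy", ts, TimeZone.getTimeZone("GMT-07:00"))); // prints 31/Aug/2007
    }
}
```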

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.