8.5 Text Functions

Cascading includes a number of text functions in the cascading.operation.text package.

DateFormatter

The cascading.operation.text.DateFormatter function is used to convert a date timestamp to a formatted String. This function expects a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT/UTC, and formats the output using java.text.SimpleDateFormat syntax.

// "ts" -> 1188604863000

DateFormatter formatter =
  new DateFormatter( new Fields( "date" ), "dd/MMM/yyyy" );
pipe = new Each( pipe, new Fields( "ts" ), formatter );

// outgoing -> "date" -> 01/Sep/2007 (formatted in UTC)

The example above converts a long timestamp ("ts") to a date String.
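
The formatting step itself can be reproduced outside of a Cascading flow. The following is a minimal sketch in plain Java, not the DateFormatter implementation: the class name DateFormatSketch and the helper format() are hypothetical, and the UTC time zone and US locale are assumptions made so that the output is reproducible. The formatted date depends on the time zone; in a zone west of UTC the same instant falls on 31/Aug/2007.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration class; it is not part of Cascading.
class DateFormatSketch {

  // Formats milliseconds since the epoch using SimpleDateFormat syntax.
  // Assumption: UTC and Locale.US are fixed here to make the result stable.
  static String format(long millis, String pattern) {
    SimpleDateFormat dateFormat = new SimpleDateFormat(pattern, Locale.US);
    dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
    return dateFormat.format(new Date(millis));
  }

  public static void main(String[] args) {
    // prints 01/Sep/2007
    System.out.println(format(1188604863000L, "dd/MMM/yyyy"));
    // prints 01/Sep/2007:00:01:03 +0000
    System.out.println(format(1188604863000L, "dd/MMM/yyyy:HH:mm:ss Z"));
  }
}
```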

DateParser

The cascading.operation.text.DateParser function is used to convert a text date String to a timestamp, using the java.text.SimpleDateFormat syntax. The timestamp is a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT/UTC. By default, the output is a field with the name "ts" (for timestamp), but this can be overridden by passing a declared Fields value.

// "time" -> 01/Sep/2007:00:01:03 +0000

DateParser dateParser = new DateParser( "dd/MMM/yyyy:HH:mm:ss Z" );
pipe = new Each( pipe, new Fields( "time" ), dateParser );

// outgoing -> "ts" -> 1188604863000

In the example above, an Apache log-style date-time field is converted into a long timestamp in UTC.
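
The same conversion can be checked with java.text.SimpleDateFormat directly. This is a minimal sketch rather than the DateParser implementation; the class name DateParseSketch and the helper parse() are hypothetical, and the US locale is an assumption so that the month abbreviation "Sep" is recognized. Because the input carries an explicit offset (the "Z" pattern letter), the result does not depend on the default time zone of the JVM.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical illustration class; it is not part of Cascading.
class DateParseSketch {

  // Parses a date String with SimpleDateFormat syntax and returns
  // the number of milliseconds since January 1, 1970, 00:00:00 UTC.
  static long parse(String text, String pattern) throws ParseException {
    SimpleDateFormat dateFormat = new SimpleDateFormat(pattern, Locale.US);
    return dateFormat.parse(text).getTime();
  }

  public static void main(String[] args) throws ParseException {
    String pattern = "dd/MMM/yyyy:HH:mm:ss Z";
    // prints 1188604863000
    System.out.println(parse("01/Sep/2007:00:01:03 +0000", pattern));
    // The same instant written with a -0700 offset also prints 1188604863000
    System.out.println(parse("31/Aug/2007:17:01:03 -0700", pattern));
  }
}
```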

FieldJoiner

The cascading.operation.text.FieldJoiner function joins all the values in a Tuple with a specified delimiter and places the result into a new field. (For the opposite effect, see the RegexSplitter function.)
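
The effect of joining values with a delimiter can be sketched in plain Java. This is an illustration of the idea only, not the FieldJoiner implementation: the class name FieldJoinSketch and the helper join() are hypothetical, the sample values are invented, and the handling of null values (rendered here as the text "null") is an assumption of this sketch rather than documented Cascading behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it is not part of Cascading.
class FieldJoinSketch {

  // Concatenates the String form of each value, separated by the delimiter.
  // Assumption: a null value is appended as the text "null".
  static String join(List<?> values, String delimiter) {
    StringBuilder joined = new StringBuilder();
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        joined.append(delimiter);
      }
      joined.append(values.get(i));
    }
    return joined.toString();
  }

  public static void main(String[] args) {
    // Three values of mixed type joined with a tab character
    System.out.println(join(Arrays.asList("GET", "/index.html", 200), "\t"));
  }
}
```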

FieldFormatter

The cascading.operation.text.FieldFormatter function formats Tuple values with a given String format and places the result into a new field. The java.util.Formatter class is used to create the formatted String.
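
The format String follows java.util.Formatter syntax, which can be tried out in isolation. The following is a minimal sketch, not the FieldFormatter implementation: the class name FieldFormatSketch and the helper format() are hypothetical, the sample values are invented, and the US locale is an assumption made to keep the output stable.

```java
import java.util.Formatter;
import java.util.Locale;

// Hypothetical illustration class; it is not part of Cascading.
class FieldFormatSketch {

  // Applies a java.util.Formatter format String to the given values.
  // Assumption: Locale.US is fixed here for reproducible output.
  static String format(String pattern, Object... values) {
    try (Formatter formatter = new Formatter(Locale.US)) {
      return formatter.format(pattern, values).toString();
    }
  }

  public static void main(String[] args) {
    // prints id:007
    System.out.println(format("%s:%03d", "id", 7));
    // prints [left  ][ right]
    System.out.println(format("[%-6s][%6s]", "left", "right"));
  }
}
```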

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.