Cascading includes a number of text functions in the cascading.operation.text package.
The cascading.operation.text.DateFormatter function converts a timestamp to a formatted String. It expects a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT/UTC, and formats the output using java.text.SimpleDateFormat syntax.
// "ts" -> 1188604863000
DateFormatter formatter =
  new DateFormatter( new Fields( "date" ), "dd/MMM/yyyy" );
pipe = new Each( pipe, new Fields( "ts" ), formatter );
// outgoing -> "date" -> 01/Sep/2007 (UTC)
The example above converts a long timestamp ("ts") to a date String.
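The formatting step itself can be tried outside a pipe with plain java.text.SimpleDateFormat. The following is a minimal sketch, not Cascading's implementation: the class DateFormatSketch and its format helper are hypothetical names introduced for illustration. It also shows that the resulting date depends on the time zone applied to the formatter, since the same instant falls on different calendar days in different zones.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class for illustration; not part of Cascading.
class DateFormatSketch {

    // Formats epoch milliseconds with a SimpleDateFormat pattern in the given time zone.
    static String format(long epochMillis, String pattern, String zoneId) {
        SimpleDateFormat dateFormat = new SimpleDateFormat(pattern, Locale.US);
        dateFormat.setTimeZone(TimeZone.getTimeZone(zoneId));
        return dateFormat.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 1188604863000L; // 2007-09-01 00:01:03 UTC

        // Same instant, two zones, two different calendar days.
        System.out.println(format(ts, "dd/MMM/yyyy", "UTC"));                 // 01/Sep/2007
        System.out.println(format(ts, "dd/MMM/yyyy", "America/Los_Angeles")); // 31/Aug/2007
    }
}
```

Fixing the Locale keeps the month abbreviation ("Sep") independent of the machine's default language settings.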
The cascading.operation.text.DateParser function converts a date String to a timestamp, using java.text.SimpleDateFormat syntax. The timestamp is a long value representing the number of milliseconds since January 1, 1970, 00:00:00 GMT/UTC. By default, the output is a field named "ts" (for timestamp), but this can be overridden by passing a declared Fields value.
// "time" -> 01/Sep/2007:00:01:03 +0000
DateParser dateParser = new DateParser( "dd/MMM/yyyy:HH:mm:ss Z" );
pipe = new Each( pipe, new Fields( "time" ), dateParser );
// outgoing -> "ts" -> 1188604863000
In the example above, an Apache log-style date-time field is converted into a long timestamp in UTC.
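The parsing step can likewise be reproduced with plain java.text.SimpleDateFormat. This is a minimal sketch under stated assumptions, not Cascading's implementation: DateParseSketch and its parse helper are hypothetical names. Because the pattern includes the Z offset, the offset in the text determines the instant, so two strings that describe the same moment in different zones produce the same timestamp.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Cascading.
class DateParseSketch {

    // Parses a date String with a SimpleDateFormat pattern and returns epoch milliseconds.
    static long parse(String text, String pattern) {
        SimpleDateFormat dateFormat = new SimpleDateFormat(pattern, Locale.US);
        try {
            return dateFormat.parse(text).getTime();
        } catch (ParseException exception) {
            throw new IllegalArgumentException("unparseable date: " + text, exception);
        }
    }

    public static void main(String[] args) {
        String pattern = "dd/MMM/yyyy:HH:mm:ss Z";

        System.out.println(parse("01/Sep/2007:00:01:03 +0000", pattern)); // 1188604863000
        // The same instant written with a -0700 offset yields the same timestamp.
        System.out.println(parse("31/Aug/2007:17:01:03 -0700", pattern)); // 1188604863000
    }
}
```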
The cascading.operation.text.FieldJoiner function joins all the values in a Tuple with a specified delimiter and places the result into a new field. (For the opposite effect, see the RegexSplitter function.)
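The effect of joining values with a delimiter can be illustrated in plain Java. The sketch below only models the behavior described above and is not Cascading's code: FieldJoinSketch and its join helper are hypothetical names, a List stands in for the Tuple, and null handling is not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration; not part of Cascading.
class FieldJoinSketch {

    // Converts each value to a String and joins the results with the delimiter.
    static String join(List<?> values, String delimiter) {
        return values.stream()
            .map(String::valueOf)
            .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        // A list of mixed values stands in for the incoming Tuple.
        List<Object> values = Arrays.asList("2007", 9, 1);

        String joined = join(values, "-");
        System.out.println(joined); // 2007-9-1

        // Splitting on the same delimiter reverses the join.
        System.out.println(Arrays.toString(joined.split("-"))); // [2007, 9, 1]
    }
}
```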
The cascading.operation.text.FieldFormatter function formats Tuple values with a given String format and places the result into a new field. The java.util.Formatter class is used to create the new formatted String.
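The format syntax in question is that of java.util.Formatter, which can be exercised directly. The following is a minimal sketch and not Cascading's implementation: FieldFormatSketch and its format helper are hypothetical names, and the varargs stand in for the Tuple values passed to the format String in order.

```java
import java.util.Formatter;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Cascading.
class FieldFormatSketch {

    // Applies a java.util.Formatter format String to the given values.
    static String format(String pattern, Object... values) {
        try (Formatter formatter = new Formatter(Locale.US)) {
            return formatter.format(pattern, values).toString();
        }
    }

    public static void main(String[] args) {
        // %s takes the first value as text, %03d zero-pads the second to three digits.
        System.out.println(format("%s-%03d", "host", 7)); // host-007
    }
}
```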
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.