All XML operations are kept in a module separate from core. To use them
in a Cascading application, include cascading-xml-x.y.z.jar in the
project. This module has one dependency, the TagSoup library, which
"tidies" HTML and XML. More about TagSoup can be read on its website,
http://home.ccil.org/~cowan/XML/tagsoup/.
The cascading.operation.xml.XPathParser function extracts values from
the passed Tuple argument into new Tuple field values, creating one
value for every given XPath expression. This function effectively
converts an XML document into a table. If an expression returns a
NodeList, only the first Node is used. That Node is converted to a new
XML document and then serialized to a String. If only the text values
are required, search on the text() nodes. To handle every Node in a
NodeList, consider using XPathGenerator instead.
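The first-Node-only behavior described above can be sketched with the JDK's own javax.xml.xpath API. This is an illustration of the semantics, not the Cascading implementation: the class XPathRowSketch and its method extractRow are hypothetical names, and returning null when an expression matches nothing is an assumption of the sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Cascading.
public class XPathRowSketch {

  // Text and attribute nodes yield their value directly;
  // any other node is written out as an XML fragment.
  static String serialize(Node node) throws Exception {
    if (node.getNodeType() == Node.TEXT_NODE
        || node.getNodeType() == Node.ATTRIBUTE_NODE) {
      return node.getNodeValue();
    }
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    transformer.transform(new DOMSource(node), new StreamResult(out));
    return out.toString();
  }

  // Produces one value per XPath expression, like one table row per document.
  public static List<String> extractRow(String xml, String... paths) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      List<String> row = new ArrayList<>();
      for (String path : paths) {
        NodeList nodes =
            (NodeList) xpath.evaluate(path, doc, XPathConstants.NODESET);
        // Only the first Node is used; null for "no match" is an assumption.
        row.add(nodes.getLength() == 0 ? null : serialize(nodes.item(0)));
      }
      return row;
    } catch (Exception e) {
      throw new IllegalStateException("XPath extraction failed", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<people><person><name>Ann</name></person>"
        + "<person><name>Bob</name></person></people>";
    // The second person is ignored by both expressions.
    System.out.println(extractRow(xml, "//person/name", "//person/name/text()"));
  }
}
```

In this sketch the first expression yields the whole name element as markup, while the text() expression yields only its character data, which is why searching on text() nodes is the simpler choice when markup is not wanted.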
The cascading.operation.xml.XPathGenerator function is a generator
function that emits a new Tuple for every Node returned by the given
XPath expression.
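The one-output-per-Node idea can be sketched with the JDK XPath API as follows. The class XPathEachNodeSketch and its method emitAll are hypothetical names, not Cascading classes, and the sketch returns each node's text content for brevity rather than reproducing how the real function builds its Tuples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Cascading.
public class XPathEachNodeSketch {

  // Returns one entry per matching Node instead of only the first one.
  public static List<String> emitAll(String xml, String path) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(path, doc, XPathConstants.NODESET);
      List<String> emitted = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        // Simplification: text content only, not a serialized XML document.
        emitted.add(nodes.item(i).getTextContent());
      }
      return emitted;
    } catch (Exception e) {
      throw new IllegalStateException("XPath evaluation failed", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<people><person><name>Ann</name></person>"
        + "<person><name>Bob</name></person></people>";
    System.out.println(emitAll(xml, "//person/name")); // prints [Ann, Bob]
  }
}
```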
The cascading.operation.xml.XPathFilter filter removes a Tuple if the
given XPath expression returns false. Set the removeMatch parameter to
true to reverse the filter, so that matching Tuples are removed instead.
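The keep-or-remove decision can be sketched as a single boolean test using the JDK XPath API. The class XPathFilterSketch and its method isRemove are hypothetical names chosen for this illustration, and the reading of removeMatch as "remove the Tuples that match" is an assumption drawn from the description above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Cascading.
public class XPathFilterSketch {

  // Returns true when the document should be removed from the stream.
  public static boolean isRemove(String xml, String path, boolean removeMatch) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      // XPath boolean conversion: a non-empty node set evaluates to true.
      boolean matches = (Boolean) XPathFactory.newInstance().newXPath()
          .evaluate(path, doc, XPathConstants.BOOLEAN);
      // removeMatch == false: remove non-matches; removeMatch == true: remove matches.
      return removeMatch == matches;
    } catch (Exception e) {
      throw new IllegalStateException("XPath evaluation failed", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<doc><status>ok</status></doc>";
    System.out.println(isRemove(xml, "/doc/status[text()='ok']", false)); // prints false
    System.out.println(isRemove(xml, "/doc/error", false));               // prints true
    System.out.println(isRemove(xml, "/doc/status[text()='ok']", true));  // prints true
  }
}
```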
The cascading.operation.xml.TagSoupParser function uses the TagSoup
library to convert incoming HTML to clean XHTML. Use the
setFeature( feature, value ) method to set TagSoup-specific features
(as documented on the TagSoup website listed above).
Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.