7.8 XML Operations

All XML operations are kept in a module separate from core and can be added to a Cascading application by including cascading-xml-x.y.z.jar in the project. This module has one dependency, the TagSoup library, which allows for HTML and XML "tidying". More about TagSoup can be read on its website, http://home.ccil.org/~cowan/XML/tagsoup/.

XPathParser

The cascading.operation.xml.XPathParser function extracts a value from the passed Tuple argument into a new Tuple field value. One Tuple value is created for every given XPath expression, so this function effectively converts an XML document into a table. If the value returned by an expression is a NodeList, only the first Node is used. The Node is converted to a new XML document and then to a String. If only the text values are required, search on the text() nodes, or consider using XPathGenerator to handle multiple NodeList values.
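
The "first Node only" rule described above can be illustrated with the XPath support that ships with the JDK. The following is a minimal sketch, not the Cascading implementation: the class XPathParserSketch and its parse method are hypothetical names, and returning null for an expression with no match, as well as returning the plain value for non-element nodes, are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Cascading: mimics the described behavior with JDK APIs only.
final class XPathParserSketch {

    // Returns one value per XPath expression, analogous to one Tuple value per expression.
    static String[] parse(String xml, String... paths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            String[] row = new String[paths.length];
            for (int i = 0; i < paths.length; i++) {
                NodeList nodes = (NodeList) xpath.evaluate(paths[i], doc, XPathConstants.NODESET);
                if (nodes.getLength() == 0) {
                    continue; // assumption: an expression with no match leaves the value null
                }
                Node first = nodes.item(0); // only the first Node of the NodeList is used
                if (first.getNodeType() == Node.ELEMENT_NODE) {
                    // Serialize the element as its own small XML document.
                    StringWriter out = new StringWriter();
                    serializer.transform(new DOMSource(first), new StreamResult(out));
                    row[i] = out.toString();
                } else {
                    // Assumption: text() and attribute nodes are returned as their plain value.
                    row[i] = first.getNodeValue();
                }
            }
            return row;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people><person><name>Alice</name></person>"
            + "<person><name>Bob</name></person></people>";
        // Two expressions produce two columns; the second person is ignored by both.
        for (String value : parse(xml, "//name", "//name/text()")) {
            System.out.println(value);
        }
    }
}
```

In this sketch the second person never appears in the result, which is why the text() search or XPathGenerator is the better fit when every matching Node matters.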

XPathGenerator

The cascading.operation.xml.XPathGenerator function is a generator function that will emit a new Tuple for every Node returned by the given XPath expression.
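
The one-result-per-Node behavior can be sketched with the JDK's own XPath classes. This is an illustration under stated assumptions rather than the Cascading code: XPathGeneratorSketch and generate are hypothetical names, and each matching Node is assumed to be an element that is serialized to a String.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Cascading: one output row per matching Node.
final class XPathGeneratorSketch {

    // Returns one serialized String per Node, analogous to one emitted Tuple per Node.
    static List<String> generate(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> rows = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Assumption: the expression selects element nodes.
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                rows.add(out.toString());
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people><person><name>Alice</name></person>"
            + "<person><name>Bob</name></person></people>";
        // Every matching name element becomes its own row.
        generate(xml, "//name").forEach(System.out::println);
    }
}
```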

XPathFilter

The cascading.operation.xml.XPathFilter filter will filter out a Tuple if the given XPath expression returns false. Set the removeMatch parameter to true if the filter should be reversed, so that Tuples for which the expression returns true are filtered out instead.
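
The effect of removeMatch can be written out as a small truth table in code. The sketch below uses only JDK classes and follows the description above; XPathFilterSketch and isRemoved are hypothetical names, and the XML documents stand in for the Tuple argument.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Cascading: decides whether a document would be filtered out.
final class XPathFilterSketch {

    // Returns true when the document should be removed from the stream.
    static boolean isRemoved(String xml, String path, boolean removeMatch) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Evaluate the expression as an XPath boolean.
            boolean matches = (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.BOOLEAN);
            // Default: remove when the expression is false; removeMatch reverses that.
            return removeMatch ? matches : !matches;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String adult = "<person><age>30</age></person>";
        String minor = "<person><age>12</age></person>";
        String path = "/person/age > 18";
        System.out.println(isRemoved(adult, path, false)); // kept: expression is true
        System.out.println(isRemoved(minor, path, false)); // removed: expression is false
        System.out.println(isRemoved(adult, path, true));  // removed: filter is reversed
    }
}
```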

TagSoupParser

The cascading.operation.xml.TagSoupParser function uses the TagSoup library to convert incoming HTML to clean XHTML. Use the setFeature( feature, value ) method to set TagSoup-specific features (as documented on the TagSoup website listed above).

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.