Stream assertions are a mechanism for asserting that one or more values in a tuple stream meet certain criteria. They are similar to the Java language "assert" keyword, or to a unit test. Examples include "assert not null" and "assert matches".
Assertions are treated like any other function or aggregator in Cascading: the developer embeds them directly in the pipe assembly. By default, if an assertion fails, processing fails. Alternatively, an assertion failure can be caught by a failure Trap.
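The two failure behaviors described above can be modeled in a few lines of plain Java. The sketch below is an illustration only: AssertionFailureModel, run(), and AssertionFailedException are hypothetical names, not Cascading classes, and a simple list stands in for the Tap that a real failure Trap would write to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical plain-Java model of "processing fails" versus "failure Trap".
// None of these names are Cascading classes.
class AssertionFailureModel {

    // Thrown when a tuple fails its assertion and no trap is configured.
    static class AssertionFailedException extends RuntimeException {
        AssertionFailedException(String message) { super(message); }
    }

    // Passes tuples that satisfy the assertion downstream. A failing tuple
    // either stops processing (trap == null) or is diverted into the trap list.
    static List<List<String>> run(List<List<String>> tuples,
                                  Predicate<List<String>> assertion,
                                  List<List<String>> trap) {
        List<List<String>> passed = new ArrayList<>();
        for (List<String> tuple : tuples) {
            if (assertion.test(tuple)) {
                passed.add(tuple);
            } else if (trap != null) {
                trap.add(tuple);
            } else {
                throw new AssertionFailedException("assertion failed on " + tuple);
            }
        }
        return passed;
    }

    // "assert not null": every value in the tuple must be non-null.
    static boolean notNull(List<String> tuple) {
        return tuple.stream().allMatch(v -> v != null);
    }
}
```

With a trap supplied, the tuple containing a null is set aside and the clean tuple continues; without one, the same input aborts the run.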
Assertions may be more, or less, desirable in different contexts. For this reason, stream assertions can be treated as either "strict" or "validating". Strict assertions make sense when running tests against regression data, which should be small and should represent many of the edge cases that the processing assembly must robustly support. Validating assertions, on the other hand, make more sense when running tests in staging, or when using data whose quality may vary because it comes from an unmanaged source.
And of course there are cases where assertions are unnecessary and only slow processing down, so it is best to bypass them altogether.
To handle all three of these situations, Cascading can be instructed to plan out (i.e., omit) strict assertions, validating assertions, or both when building the Flow. For best performance, Cascading does this by leaving the undesired assertions out of the final Flow entirely, rather than merely switching them off.
Example 8.5. Adding Assertions
// incoming -> "ip", "time", "method", "event", "status", "size"
AssertNotNull notNull = new AssertNotNull();
assembly = new Each( assembly, AssertionLevel.STRICT, notNull );
AssertSizeEquals equals = new AssertSizeEquals( 6 );
assembly = new Each( assembly, AssertionLevel.STRICT, equals );
AssertMatchesAll matchesAll = new AssertMatchesAll( "(GET|HEAD|POST)" );
assembly = new Each( assembly, new Fields( "method" ),
  AssertionLevel.STRICT, matchesAll );
// outgoing -> "ip", "time", "method", "event", "status", "size"
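To make the three checks in Example 8.5 concrete, the following plain-Java sketch applies equivalent logic to a single log tuple. LogTupleChecks and its methods are hypothetical, not Cascading's implementation, and the sketch assumes AssertMatchesAll requires the whole value to match the pattern (Matcher.matches()), which may differ from Cascading's own matching rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical plain-Java model of the checks in Example 8.5.
class LogTupleChecks {

    // Same field order as the "incoming" comment in Example 8.5.
    static final List<String> FIELDS =
        Arrays.asList("ip", "time", "method", "event", "status", "size");

    static final Pattern METHOD = Pattern.compile("(GET|HEAD|POST)");

    // Modeled on AssertNotNull: no value may be null.
    static boolean notNull(List<String> tuple) {
        return tuple.stream().allMatch(Objects::nonNull);
    }

    // Modeled on AssertSizeEquals(6): the tuple must hold exactly `size` values.
    static boolean sizeEquals(List<String> tuple, int size) {
        return tuple.size() == size;
    }

    // Modeled on AssertMatchesAll applied to the "method" field only.
    // Assumption: the whole value must match the pattern.
    static boolean methodMatches(List<String> tuple) {
        String method = tuple.get(FIELDS.indexOf("method"));
        return method != null && METHOD.matcher(method).matches();
    }

    // Runs the checks in the same order as the example; the size check
    // runs before the field lookup, so short tuples never reach get().
    static boolean allPass(List<String> tuple) {
        return notNull(tuple) && sizeEquals(tuple, 6) && methodMatches(tuple);
    }
}
```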
Again, assertions are added to a pipe assembly like any other operation, except that the AssertionLevel must be set to tell the planner how to treat the assertion during planning.
Example 8.6. Planning Out Assertions
// FlowDef is a fluent way to define a Flow
FlowDef flowDef = new FlowDef();
// bind the taps and pipes
flowDef
.addSource( assembly.getName(), source )
.addSink( assembly.getName(), sink )
.addTail( assembly );
// removes all assertions from the Flow
flowDef
.setAssertionLevel( AssertionLevel.NONE );
Flow flow = new HadoopFlowConnector().connect( flowDef );
To configure the planner to remove some or all assertions, a property can be set via the FlowConnectorProps.setAssertionLevel() method, or the level can be set directly on the FlowDef instance, as shown above.
Setting AssertionLevel.NONE removes all assertions. AssertionLevel.VALID keeps VALID assertions but removes STRICT ones. AssertionLevel.STRICT keeps all assertions and is the planner's default value.
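The level rules above amount to an ordering, NONE < VALID < STRICT: an assertion is kept only if its level does not exceed the level given to the planner. The sketch below models that rule in plain Java. PlannerLevel, Assertion, and plan() are hypothetical stand-ins, not Cascading's AssertionLevel or planner, and the returned list simply names the assertions that would remain in the Flow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a planner level decides which assertions survive.
// PlannerLevel mirrors the three constants described above but is not Cascading's enum.
class AssertionPlanningModel {

    // Declaration order defines the ordering NONE < VALID < STRICT.
    enum PlannerLevel { NONE, VALID, STRICT }

    // A named assertion tagged with the level it was added at (VALID or STRICT).
    static final class Assertion {
        final String name;
        final PlannerLevel level;

        Assertion(String name, PlannerLevel level) {
            this.name = name;
            this.level = level;
        }
    }

    // Keeps an assertion only if its level does not exceed the planner level:
    // NONE keeps nothing, VALID keeps VALID only, STRICT keeps both.
    static List<String> plan(List<Assertion> assertions, PlannerLevel plannerLevel) {
        List<String> kept = new ArrayList<>();
        for (Assertion a : assertions) {
            if (a.level != PlannerLevel.NONE && a.level.compareTo(plannerLevel) <= 0) {
                kept.add(a.name);
            }
        }
        return kept;
    }
}
```

Dropping an assertion from the result, rather than flagging it as disabled, mirrors the point made earlier that planned-out assertions are left out of the Flow entirely.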
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.