8.2 Stream Assertions

Stream assertions are a mechanism for asserting that one or more values in a tuple stream meet certain criteria. They are similar to the Java language "assert" keyword or to a unit test. Examples include "assert not null" and "assert matches".

Assertions are treated like any other function or aggregator in Cascading. They are embedded directly into the pipe assembly by the developer. By default, if an assertion fails, the processing fails. As an alternative, an assertion failure can be caught by a failure Trap.
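The two failure behaviors described above can be illustrated with a small plain-Java model. This is only a sketch, not Cascading code: TrapSketch, assertNotNull, and process are hypothetical names, and the "trap" is modeled as a plain list that receives the offending value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrapSketch {

  // Hypothetical model of an assertion: throws when a value fails the check.
  static void assertNotNull( String value ) {
    if( value == null )
      throw new IllegalStateException( "value is null" );
  }

  // With no trap (trap == null) the first failure propagates and processing stops.
  // With a trap, the offending value is diverted to it and processing continues.
  static List<String> process( List<String> input, List<String> trap ) {
    List<String> output = new ArrayList<>();

    for( String value : input ) {
      try {
        assertNotNull( value );
        output.add( value );
      } catch( IllegalStateException failure ) {
        if( trap == null )
          throw failure;

        trap.add( value );
      }
    }

    return output;
  }

  public static void main( String[] args ) {
    List<String> input = Arrays.asList( "GET", null, "POST" );
    List<String> trap = new ArrayList<>();

    System.out.println( process( input, trap ) ); // prints [GET, POST]
    System.out.println( trap.size() );            // prints 1
  }
}
```

Calling process( input, null ) in this model throws on the null value, which corresponds to the default behavior where a failed assertion fails the processing.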

Assertions may be more, or less, desirable in different contexts. For this reason, stream assertions can be treated as either "strict" or "validating". Strict assertions make sense when running tests against regression data - which should be small, and should represent many of the edge cases that the processing assembly must robustly support. Validating assertions, on the other hand, make more sense when running tests in staging, or when using data that may vary in quality due to an unmanaged source.

And of course there are cases where assertions are unnecessary and only impede processing, and it would be best to just bypass them altogether.

To handle all three of these situations, Cascading can be instructed to plan out (i.e., omit) strict assertions, validating assertions, or both when building the Flow. For best performance, Cascading implements this by leaving the undesired assertions out of the final Flow entirely, rather than merely switching them off.

Example 8.5. Adding Assertions

// incoming -> "ip", "time", "method", "event", "status", "size"

AssertNotNull notNull = new AssertNotNull();
assembly = new Each( assembly, AssertionLevel.STRICT, notNull );

AssertSizeEquals equals = new AssertSizeEquals( 6 );
assembly = new Each( assembly, AssertionLevel.STRICT, equals );

AssertMatchesAll matchesAll = new AssertMatchesAll( "(GET|HEAD|POST)" );
assembly = new Each( assembly, new Fields( "method" ),
  AssertionLevel.STRICT, matchesAll );

// outgoing -> "ip", "time", "method", "event", "status", "size"

Again, assertions are added to a pipe assembly like any other operation, except that the AssertionLevel must be set to tell the planner how to treat the assertion during planning.

Example 8.6. Planning Out Assertions

// FlowDef is a fluent way to define a Flow
FlowDef flowDef = new FlowDef();

// bind the taps and pipes
flowDef
  .addSource( assembly.getName(), source )
  .addSink( assembly.getName(), sink )
  .addTail( assembly );

// removes all assertions from the Flow
flowDef
  .setAssertionLevel( AssertionLevel.NONE );

Flow flow = new HadoopFlowConnector().connect( flowDef );

To have the planner remove some or all assertions, set the assertion level either as a property via the FlowConnectorProps.setAssertionLevel() method or directly on the FlowDef instance, as shown above. AssertionLevel.NONE removes all assertions. AssertionLevel.VALID keeps VALID assertions but removes STRICT ones. AssertionLevel.STRICT, the planner default, keeps all assertions.
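The retention rule for the three levels can be modeled in a few lines of plain Java. The sketch below is not the planner's actual code: Level, PlannedAssertion, and plan are hypothetical names, and the rule is assumed to be "keep an assertion only when its own level is at or below the level given to the planner".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AssertionPlanningSketch {

  // Hypothetical stand-in for AssertionLevel, ordered from least to most strict.
  enum Level { NONE, VALID, STRICT }

  // Hypothetical stand-in for an assertion embedded in a pipe assembly.
  static final class PlannedAssertion {
    final String name;
    final Level level;

    PlannedAssertion( String name, Level level ) {
      this.name = name;
      this.level = level;
    }
  }

  // Returns the names of the assertions that survive planning at the given level.
  static List<String> plan( List<PlannedAssertion> assertions, Level plannerLevel ) {
    List<String> kept = new ArrayList<>();

    for( PlannedAssertion assertion : assertions ) {
      // enum compareTo follows declaration order: NONE < VALID < STRICT
      if( assertion.level.compareTo( plannerLevel ) <= 0 )
        kept.add( assertion.name );
    }

    return kept;
  }

  public static void main( String[] args ) {
    List<PlannedAssertion> assertions = Arrays.asList(
      new PlannedAssertion( "notNull", Level.STRICT ),
      new PlannedAssertion( "sizeEquals", Level.VALID ) );

    System.out.println( plan( assertions, Level.NONE ) );   // prints []
    System.out.println( plan( assertions, Level.VALID ) );  // prints [sizeEquals]
    System.out.println( plan( assertions, Level.STRICT ) ); // prints [notNull, sizeEquals]
  }
}
```

In this model, assertions are only ever declared as VALID or STRICT; NONE is meaningful only as a planner setting.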

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.