6.2 Stream Assertions

A Flow with Stream Assertions

Stream assertions are a mechanism to 'assert' that one or more values in a tuple stream meet certain criteria. This is similar to the Java language 'assert' keyword, or to a unit test. Examples are 'assert not null' and 'assert matches'.

Assertions are treated like any other function or aggregator in Cascading. They are embedded directly into the pipe assembly by the developer. By default, processing stops when an assertion fails. Alternatively, a failed assertion can trigger a Failure Trap.
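To make the two failure modes concrete, here is a minimal plain-Java sketch. It is not Cascading code: the class name, the `process` method, the null check, and the `trap` list are all hypothetical stand-ins. It assumes that a trap, when present, receives the failing tuple and lets processing continue, while the absence of a trap causes the first failure to stop processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two failure modes; not Cascading's own classes.
class AssertionFailureSketch {

  // Passes tuples through an "assert not null" check.
  // If trap is null, the first failing tuple aborts processing.
  // Otherwise the failing tuple is diverted to the trap and processing continues.
  static List<String[]> process(List<String[]> tuples, List<String[]> trap) {
    List<String[]> passed = new ArrayList<>();
    for (String[] tuple : tuples) {
      boolean hasNull = Arrays.asList(tuple).contains(null);
      if (hasNull) {
        if (trap == null) {
          throw new IllegalStateException("assertion failed: null value in tuple");
        }
        trap.add(tuple);
        continue;
      }
      passed.add(tuple);
    }
    return passed;
  }

  public static void main(String[] args) {
    List<String[]> tuples = Arrays.asList(
        new String[] {"10.0.0.1", "GET"},
        new String[] {"10.0.0.2", null},
        new String[] {"10.0.0.3", "POST"});

    List<String[]> trap = new ArrayList<>();
    List<String[]> passed = process(tuples, trap);
    System.out.println("passed=" + passed.size() + " trapped=" + trap.size());
  }
}
```

Calling `process(tuples, null)` on the same input would throw on the second tuple, which corresponds to the default behaviour described above.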

As with any test, sometimes they are wanted, and sometimes they are unnecessary. Thus stream assertions are embedded as either 'strict' or 'validating'.

When running tests against regression data, it makes sense to use strict assertions. This regression data should be small and represent many of the edge cases the processing assembly must support robustly. When running tests in staging, or with data that may vary in quality because it comes from an unmanaged source, validating assertions make more sense. Then there are cases where assertions only get in the way and slow down processing, and it is better to bypass them entirely.

At runtime, Cascading can be instructed to plan out strict, validating, or all assertions before building the final MapReduce jobs via the MapReduce Job Planner. The assertions are truly planned out of the resulting job, not just switched off, which gives the best performance.

This is just one feature of lazily building MapReduce jobs via a planner, instead of hard coding them.

Example 6.5. Adding Assertions

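import cascading.operation.AssertionLevel;
import cascading.operation.assertion.AssertMatchesAll;
import cascading.operation.assertion.AssertNotNull;
import cascading.operation.assertion.AssertSizeEquals;
import cascading.pipe.Each;
import cascading.tuple.Fields;

// incoming -> "ip", "time", "method", "event", "status", "size"

// fail if any value in the tuple is null
AssertNotNull notNull = new AssertNotNull();
assembly = new Each( assembly, AssertionLevel.STRICT, notNull );

// fail unless the tuple has exactly six values
AssertSizeEquals equals = new AssertSizeEquals( 6 );
assembly = new Each( assembly, AssertionLevel.STRICT, equals );

// fail unless the "method" value matches the pattern
AssertMatchesAll matchesAll = new AssertMatchesAll( "(GET|HEAD|POST)" );
assembly = new Each( assembly, new Fields("method"),
                     AssertionLevel.STRICT, matchesAll );

// outgoing -> "ip", "time", "method", "event", "status", "size"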

Again, assertions are added to a pipe assembly like any other operation, except that the AssertionLevel must be set so the planner knows how to treat the assertion during planning.

Example 6.6. Planning Out Assertions

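import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.operation.AssertionLevel;

Properties properties = new Properties();

// removes all assertions from the Flow
FlowConnector.setAssertionLevel( properties, AssertionLevel.NONE );

FlowConnector flowConnector = new FlowConnector( properties );

Flow flow = flowConnector.connect( source, sink, assembly );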

To configure the planner to remove some or all assertions, set a property via the FlowConnector#setAssertionLevel() method. AssertionLevel.NONE removes all assertions. AssertionLevel.VALID keeps VALID assertions but removes STRICT ones. AssertionLevel.STRICT, the planner default, keeps all assertions.
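The level rule described above can be modelled in a few lines of plain Java. This is a minimal sketch and not Cascading's implementation: the `Level` enum, the `Step` class, and the `plan` method are hypothetical names chosen to mirror the documented behaviour. It assumes an assertion survives planning only when its level does not exceed the planner's level, and that ordinary operations are always kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the planner rule; not Cascading's own classes.
class AssertionPlanSketch {

  // Declared from least to most strict, so compareTo reflects NONE < VALID < STRICT.
  enum Level { NONE, VALID, STRICT }

  // A pipeline step; level is null for ordinary (non-assertion) operations.
  static final class Step {
    final String name;
    final Level level;

    Step(String name, Level level) {
      this.name = name;
      this.level = level;
    }
  }

  // Returns the names of the steps that remain after planning at plannerLevel.
  // Removed assertions never appear in the result, so they cost nothing at run time.
  static List<String> plan(List<Step> steps, Level plannerLevel) {
    List<String> kept = new ArrayList<>();
    for (Step step : steps) {
      boolean isAssertion = step.level != null;
      if (!isAssertion || step.level.compareTo(plannerLevel) <= 0) {
        kept.add(step.name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Step> steps = Arrays.asList(
        new Step("parse", null),
        new Step("assertNotNull", Level.STRICT),
        new Step("assertSizeEquals", Level.VALID),
        new Step("count", null));

    for (Level level : Level.values()) {
      System.out.println(level + " -> " + plan(steps, level));
    }
  }
}
```

In this model, planning at `Level.VALID` drops `assertNotNull` but keeps `assertSizeEquals`, while `Level.NONE` leaves only the two ordinary operations.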

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.