When processing raw data streams, it is very common to encounter data that is corrupt or malformed in some way. For instance, an upstream crawler may fetch bad content from the web, or a bug may have leaked into a browser widget that sends user behavior information back for analysis. Whatever the cause, it is good practice to define a set of rules for identifying and discarding questionable records.
It is tempting to simply throw an exception and have a Trap capture the offending Tuple, but Traps were not designed as a filtering mechanism, and consequently much valuable information would be lost.
Instead of Traps, use filters. Create a SubAssembly that applies the rules to the stream, with each rule setting a Boolean field that marks the Tuple as good or bad. After all the rules are applied, split the stream based on the value of that field. Consider also setting a reason field that states why the Tuple was marked bad, so the discarded branch can be inspected later.
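The mark-then-split pattern can be sketched in plain Java as follows. This is only an illustration of the logic and does not use the Cascading API: the class MarkAndSplit, the Marked and Rule types, and the url and duration fields are all hypothetical names chosen for the example. In a real flow, each rule would presumably be an operation inside the SubAssembly, and the split would be two branches that filter on the good field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java illustration only; none of these types come from Cascading.
class MarkAndSplit {

    /** A record together with the "good" flag and "reason" field. */
    static final class Marked {
        final Map<String, String> fields;
        boolean good = true;
        String reason = "";

        Marked(Map<String, String> fields) {
            this.fields = fields;
        }
    }

    /** One rule: a reason string and a test that is true when the record violates the rule. */
    static final class Rule {
        final String reason;
        final Predicate<Map<String, String>> violates;

        Rule(String reason, Predicate<Map<String, String>> violates) {
            this.reason = reason;
            this.violates = violates;
        }
    }

    /** Hypothetical rules for a record with "url" and "duration" fields. */
    static List<Rule> defaultRules() {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("missing url",
                r -> r.get("url") == null || r.get("url").isEmpty()));
        rules.add(new Rule("non-numeric duration",
                r -> r.get("duration") == null || !r.get("duration").matches("\\d+")));
        return rules;
    }

    /** Applies every rule to every record; the first violated rule sets the reason. */
    static List<Marked> applyRules(List<Map<String, String>> records, List<Rule> rules) {
        List<Marked> out = new ArrayList<>();
        for (Map<String, String> record : records) {
            Marked m = new Marked(record);
            for (Rule rule : rules) {
                if (m.good && rule.violates.test(record)) {
                    m.good = false;
                    m.reason = rule.reason;
                }
            }
            out.add(m);
        }
        return out;
    }

    /** Splits the marked stream: returns only the records whose flag equals wantGood. */
    static List<Marked> select(List<Marked> marked, boolean wantGood) {
        List<Marked> out = new ArrayList<>();
        for (Marked m : marked) {
            if (m.good == wantGood) {
                out.add(m);
            }
        }
        return out;
    }

    /** Convenience builder for the example records. */
    static Map<String, String> record(String url, String duration) {
        Map<String, String> r = new HashMap<>();
        r.put("url", url);
        r.put("duration", duration);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> input = new ArrayList<>();
        input.add(record("http://example.com/a", "120"));
        input.add(record("", "45"));
        input.add(record("http://example.com/b", "abc"));

        List<Marked> marked = applyRules(input, defaultRules());
        for (Marked m : select(marked, false)) {
            System.out.println("bad: " + m.fields + " reason: " + m.reason);
        }
        System.out.println("good count: " + select(marked, true).size());
    }
}
```

Because no record is thrown away until the final split, the bad branch keeps every original field along with the reason, which is the information a Trap-based approach would lose.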
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.