It's very common when processing raw data streams to encounter data that is corrupt or malformed in some way. For instance, bad content may be fetched from the web via a crawler upstream, or a bug may have leaked into a browser widget somewhere that sends user behavior information back for analysis. Whatever the cause, it's a good practice to define a set of rules for identifying and discarding questionable records.
It is tempting to simply throw an exception and have a Trap capture the offending Tuple, but Traps were not designed as a filtering mechanism, and consequently much valuable information would be lost.
Instead of traps, use filters. Create a SubAssembly that applies rules to the stream by setting a Boolean field that marks each Tuple as good or bad. After all the rules are applied, split the stream based on the value of that field. Consider also setting a reason field that states why the Tuple was marked bad.
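The mark-then-split idea can be sketched in plain Java, independent of any particular API. This is a minimal illustration and not Cascading code: the class `StreamCleaner`, the `Marked` wrapper, the field names `url` and `status`, and the two rules are all hypothetical and chosen only for the example. In a real flow the same logic would live inside the SubAssembly described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Library-independent sketch of the mark-then-split pattern (not the Cascading API). */
class StreamCleaner {

    /** A record plus the two fields the rules set: a good/bad flag and a reason. */
    static final class Marked {
        final Map<String, String> record;
        final boolean good;
        final String reason; // null when the record is good

        Marked(Map<String, String> record, boolean good, String reason) {
            this.record = record;
            this.good = good;
            this.reason = reason;
        }
    }

    /** Hypothetical rules: each key is a reason, each value is true when the record is bad. */
    static Map<String, Predicate<Map<String, String>>> rules() {
        Map<String, Predicate<Map<String, String>>> rules = new LinkedHashMap<>();
        rules.put("missing url",
                r -> r.get("url") == null || r.get("url").isEmpty());
        rules.put("non-numeric status",
                r -> r.get("status") == null || !r.get("status").matches("\\d+"));
        return rules;
    }

    /** Check the rules in order; the first rule that matches supplies the reason. */
    static Marked mark(Map<String, String> record) {
        for (Map.Entry<String, Predicate<Map<String, String>>> rule : rules().entrySet()) {
            if (rule.getValue().test(record)) {
                return new Marked(record, false, rule.getKey());
            }
        }
        return new Marked(record, true, null);
    }

    /** Split on the flag: index 0 holds the good records, index 1 the bad ones. */
    static List<List<Marked>> split(List<Map<String, String>> records) {
        List<Marked> good = new ArrayList<>();
        List<Marked> bad = new ArrayList<>();
        for (Map<String, String> record : records) {
            Marked marked = mark(record);
            (marked.good ? good : bad).add(marked);
        }
        List<List<Marked>> branches = new ArrayList<>();
        branches.add(good);
        branches.add(bad);
        return branches;
    }
}
```

Because the bad branch keeps both the original record and the reason, rejected data can be written to its own sink and reviewed later, which is the information a Trap would not preserve in a convenient form.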
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.