cascading.flow
Class FlowConnector

java.lang.Object
  extended by cascading.flow.FlowConnector

public class FlowConnector
extends Object

Use the FlowConnector to link source and sink Tap instances with an assembly of Pipe instances into an executable Flow.

FlowConnector invokes a planner for the target execution environment. Currently only MultiMapReducePlanner is supported (for Hadoop). If you have just one pre-existing custom Hadoop job to execute, see MapReduceFlow.

Note that all connect methods take a single tail or an array of tail Pipe instances. "tail" refers to the last connected Pipe instances in a pipe-assembly. Pipe-assemblies are graphs of object with "heads" and "tails". From a given "tail", all connected heads can be found, but not the reverse. So "tails" must be supplied by the user.

The FlowConnector, resulting Flow, and the underlying execution framework (Hadoop) can be configured via a Map or Properties instance given to the constructor. This properties map can be populated before constructing a FlowConnector instance through static methods on FlowConnector and MultiMapReducePlanner. These properties are used to influence the current planner and are also passed down to the execution framework (Hadoop) to override any default values (the number of reducers or mappers, etc. by using application specific properties).

Custom operations (Functions, Filter, etc) may also retrieve these property values at runtime through calls to FlowProcess.getProperty(String).

Most applications will need to call setApplicationJarClass(java.util.Map, Class) or setApplicationJarPath(java.util.Map, String) so that the correct application jar file is passed through to all child processes. The Class or path must reference the custom application jar, not a Cascading library class or jar. The easiest thing to do is give setApplicationJarClass the Class with your static main function and let Cascading figure out which jar to use.

Note that Map is compatible with the Properties class, so properties can be loaded at runtime from a configuration file.

By default, all Assertions are planned into the resulting Flow instance. This can be changed by calling setAssertionLevel(java.util.Map, cascading.operation.AssertionLevel).

Also by default, all Debugs are planned into the resulting Flow instance. This can be changed by calling setDebugLevel(java.util.Map, cascading.operation.DebugLevel).

Properties

See Also:
MapReduceFlow

Constructor Summary
FlowConnector()
          Constructor FlowConnector creates a new FlowConnector instance.
FlowConnector(Map<Object,Object> properties)
          Constructor FlowConnector creates a new FlowConnector instance using the given Properties instance as default value for the underlying jobs.
 
Method Summary
 Flow connect(Map<String,Tap> sources, Map<String,Tap> sinks, Pipe... tails)
          Method connect links the named sources and sinks to the given pipe assembly.
 Flow connect(Map<String,Tap> sources, Tap sink, Pipe tail)
          Method connect links the named source Taps and sink Tap to the given pipe assembly.
 Flow connect(String name, Map<String,Tap> sources, Map<String,Tap> sinks, Map<String,Tap> traps, Pipe... tails)
          Method connect links the named sources, sinks and traps to the given pipe assembly.
 Flow connect(String name, Map<String,Tap> sources, Map<String,Tap> sinks, Pipe... tails)
          Method connect links the named sources and sinks to the given pipe assembly.
 Flow connect(String name, Map<String,Tap> sources, Tap sink, Map<String,Tap> traps, Pipe tail)
          Method connect links the named source and trap Taps and sink Tap to the given pipe assembly.
 Flow connect(String name, Map<String,Tap> sources, Tap sink, Pipe tail)
          Method connect links the named source Taps and sink Tap to the given pipe assembly.
 Flow connect(String name, Tap source, Map<String,Tap> sinks, Collection<Pipe> tails)
          Method connect links the named source Taps and sink Tap to the given pipe assembly.
 Flow connect(String name, Tap source, Map<String,Tap> sinks, Pipe... tails)
          Method connect links the named source Taps and sink Tap to the given pipe assembly.
 Flow connect(String name, Tap source, Tap sink, Map<String,Tap> traps, Pipe tail)
          Method connect links the named trap Taps, source and sink Tap to the given pipe assembly.
 Flow connect(String name, Tap source, Tap sink, Pipe tail)
          Method connect links the given source and sink Taps to the given pipe assembly.
 Flow connect(String name, Tap source, Tap sink, Tap trap, Pipe tail)
          Method connect links the given source, sink, and trap Taps to the given pipe assembly.
 Flow connect(Tap source, Map<String,Tap> sinks, Collection<Pipe> tails)
          Method connect links the named source Taps and sink Tap to the given pipe assembly.
 Flow connect(Tap source, Map<String,Tap> sinks, Pipe... tails)
          Method connect links the named source Taps and sink Tap to the given pipe assembly.
 Flow connect(Tap source, Tap sink, Pipe tail)
          Method connect links the given source and sink Taps to the given pipe assembly.
static Class getApplicationJarClass(Map<Object,Object> properties)
          Method getApplicationJarClass returns the Class set by the setApplicationJarClass method.
static String getApplicationJarPath(Map<Object,Object> properties)
          Method getApplicationJarPath return the path set by the setApplicationJarPath method.
static AssertionLevel getAssertionLevel(Map<Object,Object> properties)
          Method getAssertionLevel returns the configured target planner AssertionLevel.
static DebugLevel getDebugLevel(Map<Object,Object> properties)
          Method getDebugLevel returns the configured target planner DebugLevel.
static Class getIntermediateSchemeClass(Map<Object,Object> properties)
          Method getIntermediateSchemeClass is used for debugging.
 Map<Object,Object> getProperties()
          Method getProperties returns the properties of this FlowConnector object.
static void setApplicationJarClass(Map<Object,Object> properties, Class type)
          Method setApplicationJarClass is used to set the application jar file.
static void setApplicationJarPath(Map<Object,Object> properties, String path)
          Method setApplicationJarPath is used to set the application jar file.
static void setAssertionLevel(Map<Object,Object> properties, AssertionLevel assertionLevel)
          Method setAssertionLevel sets the target planner AssertionLevel.
static void setDebugLevel(Map<Object,Object> properties, DebugLevel debugLevel)
          Method setDebugLevel sets the target planner DebugLevel.
static void setIntermediateSchemeClass(Map<Object,Object> properties, Class intermediateSchemeClass)
          Method setIntermediateSchemeClass is used for debugging.
static void setIntermediateSchemeClass(Map<Object,Object> properties, String intermediateSchemeClass)
          Method setIntermediateSchemeClass is used for debugging.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FlowConnector

public FlowConnector()
Constructor FlowConnector creates a new FlowConnector instance.


FlowConnector

@ConstructorProperties(value="properties")
public FlowConnector(Map<Object,Object> properties)
Constructor FlowConnector creates a new FlowConnector instance using the given Properties instance as default value for the underlying jobs. All properties are copied to a new JobConf instance.

Parameters:
properties - of type Properties
Method Detail

setAssertionLevel

public static void setAssertionLevel(Map<Object,Object> properties,
                                     AssertionLevel assertionLevel)
Method setAssertionLevel sets the target planner AssertionLevel.

Parameters:
properties - of type Map
assertionLevel - of type AssertionLevel

getAssertionLevel

public static AssertionLevel getAssertionLevel(Map<Object,Object> properties)
Method getAssertionLevel returns the configured target planner AssertionLevel.

Parameters:
properties - of type Map
Returns:
AssertionLevel the configured AssertionLevel

setDebugLevel

public static void setDebugLevel(Map<Object,Object> properties,
                                 DebugLevel debugLevel)
Method setDebugLevel sets the target planner DebugLevel.

Parameters:
properties - of type Map
debugLevel - of type DebugLevel

getDebugLevel

public static DebugLevel getDebugLevel(Map<Object,Object> properties)
Method getDebugLevel returns the configured target planner DebugLevel.

Parameters:
properties - of type Map
Returns:
DebugLevel the configured DebugLevel

setIntermediateSchemeClass

public static void setIntermediateSchemeClass(Map<Object,Object> properties,
                                              Class intermediateSchemeClass)
Method setIntermediateSchemeClass is used for debugging. The default Scheme for intermediate files is SequenceFile.

Parameters:
properties - of type Map
intermediateSchemeClass - of type Class

setIntermediateSchemeClass

public static void setIntermediateSchemeClass(Map<Object,Object> properties,
                                              String intermediateSchemeClass)
Method setIntermediateSchemeClass is used for debugging. The default Scheme for intermediate files is SequenceFile.

Parameters:
properties - of type Map
intermediateSchemeClass - of type String

getIntermediateSchemeClass

public static Class getIntermediateSchemeClass(Map<Object,Object> properties)
Method getIntermediateSchemeClass is used for debugging. The default Scheme for intermediate files is SequenceFile.

Parameters:
properties - of type Map
Returns:
Class

setApplicationJarClass

public static void setApplicationJarClass(Map<Object,Object> properties,
                                          Class type)
Method setApplicationJarClass is used to set the application jar file.

All cluster executed Cascading applications need to call setApplicationJarClass(java.util.Map, Class) or setApplicationJarPath(java.util.Map, String), otherwise ClassNotFound exceptions are likely.

Parameters:
properties - of type Map
type - of type Class

getApplicationJarClass

public static Class getApplicationJarClass(Map<Object,Object> properties)
Method getApplicationJarClass returns the Class set by the setApplicationJarClass method.

Parameters:
properties - of type Map
Returns:
Class

setApplicationJarPath

public static void setApplicationJarPath(Map<Object,Object> properties,
                                         String path)
Method setApplicationJarPath is used to set the application jar file.

All cluster executed Cascading applications need to call setApplicationJarClass(java.util.Map, Class) or setApplicationJarPath(java.util.Map, String), otherwise ClassNotFound exceptions are likely.

Parameters:
properties - of type Map
path - of type String

getApplicationJarPath

public static String getApplicationJarPath(Map<Object,Object> properties)
Method getApplicationJarPath return the path set by the setApplicationJarPath method.

Parameters:
properties - of type Map
Returns:
String

getProperties

public Map<Object,Object> getProperties()
Method getProperties returns the properties of this FlowConnector object. The returned Map instance is immutable to prevent changes to the underlying property values in this FlowConnector instance.

Returns:
the properties (type Map) of this FlowConnector object.

connect

public Flow connect(Tap source,
                    Tap sink,
                    Pipe tail)
Method connect links the given source and sink Taps to the given pipe assembly.

Parameters:
source - source Tap to bind to the head of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Tap source,
                    Tap sink,
                    Pipe tail)
Method connect links the given source and sink Taps to the given pipe assembly.

Parameters:
name - name to give the resulting Flow
source - source Tap to bind to the head of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Tap source,
                    Tap sink,
                    Tap trap,
                    Pipe tail)
Method connect links the given source, sink, and trap Taps to the given pipe assembly. The given trap will be linked to the assembly head along with the source.

Parameters:
name - name to give the resulting Flow
source - source Tap to bind to the head of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
trap - trap Tap to sink all failed Tuples into
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(Map<String,Tap> sources,
                    Tap sink,
                    Pipe tail)
Method connect links the named source Taps and sink Tap to the given pipe assembly.

Parameters:
sources - all head names and source Taps to bind to the heads of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Map<String,Tap> sources,
                    Tap sink,
                    Pipe tail)
Method connect links the named source Taps and sink Tap to the given pipe assembly.

Parameters:
name - name to give the resulting Flow
sources - all head names and source Taps to bind to the heads of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Map<String,Tap> sources,
                    Tap sink,
                    Map<String,Tap> traps,
                    Pipe tail)
Method connect links the named source and trap Taps and sink Tap to the given pipe assembly.

Parameters:
name - name to give the resulting Flow
sources - all head names and source Taps to bind to the heads of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
traps - all pipe names and trap Taps to sink all failed Tuples into
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Tap source,
                    Tap sink,
                    Map<String,Tap> traps,
                    Pipe tail)
Method connect links the named trap Taps, source and sink Tap to the given pipe assembly.

Parameters:
name - name to give the resulting Flow
source - source Tap to bind to the head of the given tail Pipe
sink - sink Tap to bind to the given tail Pipe
traps - all pipe names and trap Taps to sink all failed Tuples into
tail - tail end of a pipe assembly
Returns:
Flow

connect

public Flow connect(Tap source,
                    Map<String,Tap> sinks,
                    Collection<Pipe> tails)
Method connect links the named source Taps and sink Tap to the given pipe assembly.

Since only once source Tap is given, it is assumed to be associated with the 'head' pipe. So the head pipe does not need to be included as an argument.

Parameters:
source - source Tap to bind to the head of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
tails - all tail ends of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Tap source,
                    Map<String,Tap> sinks,
                    Collection<Pipe> tails)
Method connect links the named source Taps and sink Tap to the given pipe assembly.

Since only once source Tap is given, it is assumed to be associated with the 'head' pipe. So the head pipe does not need to be included as an argument.

Parameters:
name - name to give the resulting Flow
source - source Tap to bind to the head of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
tails - all tail ends of a pipe assembly
Returns:
Flow

connect

public Flow connect(Tap source,
                    Map<String,Tap> sinks,
                    Pipe... tails)
Method connect links the named source Taps and sink Tap to the given pipe assembly.

Since only once source Tap is given, it is assumed to be associated with the 'head' pipe. So the head pipe does not need to be included as an argument.

Parameters:
source - source Tap to bind to the head of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
tails - all tail ends of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Tap source,
                    Map<String,Tap> sinks,
                    Pipe... tails)
Method connect links the named source Taps and sink Tap to the given pipe assembly.

Since only once source Tap is given, it is assumed to be associated with the 'head' pipe. So the head pipe does not need to be included as an argument.

Parameters:
name - name to give the resulting Flow
source - source Tap to bind to the head of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
tails - all tail ends of a pipe assembly
Returns:
Flow

connect

public Flow connect(Map<String,Tap> sources,
                    Map<String,Tap> sinks,
                    Pipe... tails)
Method connect links the named sources and sinks to the given pipe assembly.

Parameters:
sources - all head names and source Taps to bind to the heads of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
tails - all tail ends of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Map<String,Tap> sources,
                    Map<String,Tap> sinks,
                    Pipe... tails)
Method connect links the named sources and sinks to the given pipe assembly.

Parameters:
name - name to give the resulting Flow
sources - all head names and source Taps to bind to the heads of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
tails - all tail ends of a pipe assembly
Returns:
Flow

connect

public Flow connect(String name,
                    Map<String,Tap> sources,
                    Map<String,Tap> sinks,
                    Map<String,Tap> traps,
                    Pipe... tails)
Method connect links the named sources, sinks and traps to the given pipe assembly.

Parameters:
name - name to give the resulting Flow
sources - all head names and source Taps to bind to the heads of the given tail Pipes
sinks - all tail names and sink Taps to bind to the given tail Pipes
traps - all pipe names and trap Taps to sink all failed Tuples into
tails - all tail ends of a pipe assembly
Returns:
Flow


Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.