During runtime, Hadoop must be told which application jar file
should be pushed to the cluster. Typically, this is done via the Hadoop
API's JobConf object.
Cascading offers a shorthand for configuring this parameter, demonstrated here:
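import java.util.Properties;
import cascading.flow.FlowConnector;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.property.AppProps;

Properties properties = new Properties();
// pass in the main class of your application;
// the parent jar containing it is found at runtime
AppProps.setApplicationJarClass( properties, Main.class );
// ALTERNATIVELY (only one of these two calls is needed) ...
// pass in the path to the parent jar, as a String
AppProps.setApplicationJarPath( properties, pathToJar );
// pass properties to the connector
FlowConnector flowConnector = new HadoopFlowConnector( properties );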
Above we see two ways to set the same property: via the
setApplicationJarClass() method, and via the
setApplicationJarPath() method. One is based on
a Class, and the other is based on a literal path.
The first method takes the Class object that owns the main()
method for this application. This assumes that Main.class is not
packaged inside a jar stored in the lib folder of the application
jar. If it is, that nested jar is pushed to the cluster instead of
the parent application jar.
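To make the Main.class assumption concrete, the following is a minimal, stdlib-only sketch of one common JDK technique for finding the jar a class was loaded from: look up the class file as a classloader resource and read the jar path out of a jar:file: URL. It is not claimed to be how Cascading or Hadoop implement setApplicationJarClass(); the class JarLocator and its methods findContainingJar and jarPathFromResourceUrl are hypothetical names for illustration. The sketch reports whichever jar the class file was actually loaded from, which illustrates why a Main.class packaged in a lib jar would point at that jar rather than at the parent.

```java
import java.net.URL;

class JarLocator {

    // Hypothetical helper: given the URL string of a .class resource, return the
    // path of the jar that contains it, or null if it was not loaded from a jar.
    // Percent-escapes (e.g. %20 for spaces) are left undecoded in this sketch.
    static String jarPathFromResourceUrl(String resourceUrl) {
        if (resourceUrl == null || !resourceUrl.startsWith("jar:file:")) {
            return null; // e.g. a "file:" URL pointing into a classes directory
        }
        int separator = resourceUrl.indexOf("!/");
        if (separator < 0) {
            return null; // malformed jar URL: no entry separator
        }
        return resourceUrl.substring("jar:file:".length(), separator);
    }

    // Hypothetical helper: resolve the class file through the class's own loader
    // and report the jar it came from, if any.
    static String findContainingJar(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes such as java.lang.String
        }
        String resourceName = type.getName().replace('.', '/') + ".class";
        URL url = loader.getResource(resourceName);
        return jarPathFromResourceUrl(url == null ? null : url.toString());
    }
}
```

For example, a resource URL of jar:file:/opt/app/app.jar!/com/example/Main.class yields /opt/app/app.jar, while a plain file: URL from a build directory yields null.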
The second method simply sets the path to the parent application jar as a property.
In your application, only one of these methods needs to be called, but one of them must be called to properly configure Hadoop.
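Since both methods amount to recording a property, the rule that at least one of them must be called can be sketched as a fail-fast check on the Properties object before it is handed to a connector. This is a hypothetical illustration only: the key names and the helpers setJarClass, setJarPath and requireJarConfigured are invented for this sketch and are not the keys or API of Cascading's AppProps.

```java
import java.util.Properties;

class AppJarConfigCheck {

    // Hypothetical keys used only by this sketch; they are not claimed to be
    // the property names Cascading's AppProps actually writes.
    static final String JAR_CLASS_KEY = "example.app.jar.class";
    static final String JAR_PATH_KEY = "example.app.jar.path";

    // Stand-in for the class-based setter: stores the Class object itself.
    static void setJarClass(Properties props, Class<?> type) {
        props.put(JAR_CLASS_KEY, type);
    }

    // Stand-in for the path-based setter: stores the literal jar path.
    static void setJarPath(Properties props, String path) {
        props.setProperty(JAR_PATH_KEY, path);
    }

    // Throws when neither setting is present, making the
    // "one of them must be called" rule explicit.
    static void requireJarConfigured(Properties props) {
        if (!props.containsKey(JAR_CLASS_KEY) && !props.containsKey(JAR_PATH_KEY)) {
            throw new IllegalStateException(
                "application jar not configured: set a jar class or a jar path");
        }
    }
}
```

In a real application the AppProps calls shown earlier do the configuring; this check only makes the "at least one of the two" rule visible.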
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.