4.3 Configuring

During runtime, Hadoop must be told which application jar file should be pushed to the cluster. Typically, this is done via the Hadoop API JobConf object.

Cascading offers a shorthand for configuring this parameter, demonstrated here:

Properties properties = new Properties();

// pass in the class name of your application
// this will find the parent jar at runtime
AppProps.setApplicationJarClass( properties, Main.class );

// ALTERNATIVELY ...

// pass in the path to the parent jar
AppProps.setApplicationJarPath( properties, pathToJar );


// pass properties to the connector
FlowConnector flowConnector = new HadoopFlowConnector( properties );

Above we see two ways to set the same property - via the setApplicationJarClass() method, and via the setApplicationJarPath() method. One is based on a Class name, and the other is based on a literal path.

The first method takes a Class object that owns the "main" function for this application. The assumption here is that Main.class is not located in a Java Jar that is stored in the lib folder of the application Jar. If it is, that Jar is pushed to the cluster, not the parent application jar.

The second method simply sets the path to the parent Class as a property.

In your application, only one of these methods needs to be called, but one of them must be called to properly configure Hadoop.

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.