4.3 Configuring

At runtime, Hadoop must be told which application jar file to push to the cluster. Typically this is done through the JobConf object of the Hadoop API.

Cascading offers a shorthand for configuring this parameter.

Properties properties = new Properties();

// pass in the class that holds your application's main method
// the jar containing that class will be found at runtime
FlowConnector.setApplicationJarClass( properties, Main.class );

// or pass in the path to the parent jar
FlowConnector.setApplicationJarPath( properties, pathToJar );

FlowConnector flowConnector = new FlowConnector( properties );

The example above sets the same property in two ways: first through the setApplicationJarClass() method, then through the setApplicationJarPath() method. The first method takes the Class object that owns the 'main' method of the application, and the jar containing that class is pushed to the cluster. The second method takes the path to the application jar directly. The first approach assumes that Main.class is not located in a jar stored in the lib folder of the application jar. If it is, that nested jar will be pushed to the cluster instead of the parent application jar.
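To make the lookup behind the class-based approach more concrete, here is a minimal sketch of how the jar that contains a class can be located at runtime using only the JDK. It is an illustration under stated assumptions, not Cascading's or Hadoop's actual implementation: the JarLocator class and its stripJarEntry() and findContainingJar() methods are hypothetical names introduced for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLDecoder;

// Hypothetical helper, not part of Cascading or Hadoop.
public class JarLocator {

    // Reduces the path part of a "jar:" URL, such as
    // "file:/tmp/app.jar!/com/example/Main.class", to "/tmp/app.jar".
    static String stripJarEntry(String urlPath) {
        String path = urlPath.startsWith("file:")
            ? urlPath.substring("file:".length())
            : urlPath;
        int bang = path.indexOf('!');
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    // Returns the path of the jar the given class was loaded from, or null
    // when the class did not come from a jar (for example, a classes directory).
    static String findContainingJar(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        URL url = type.getResource(resource);
        if (url == null || !"jar".equals(url.getProtocol())) {
            return null;
        }
        try {
            // URLDecoder turns '+' into a space, so protect literal '+' first.
            String encoded = url.getPath().replace("+", "%2B");
            return stripJarEntry(URLDecoder.decode(encoded, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Prints a jar path when run from a jar, otherwise "null".
        System.out.println(findContainingJar(JarLocator.class));
    }
}
```

A lookup of this kind reports whichever jar the class was actually loaded from. If that is a jar taken from the lib folder of the application jar, the result is the nested jar rather than the parent, which is the situation the paragraph above warns about.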

Your application only needs to call one of these methods, but it must call one of them for Hadoop to be configured properly.

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.