Cascading ships with a handful of jars.
All relevant Cascading class files and libraries, with a Hadoop-friendly lib folder containing all third-party dependencies.
All Cascading Core class files; should be packaged with lib/*.jar.
All Cascading XML module class files; should be packaged with lib/xml/*.jar.
All Cascading unit tests. If you are writing custom modules for Cascading, subclassing cascading.CascadingTestCase might be helpful.
Cascading runs with Hadoop either in its default 'local' (standalone) mode or on a Hadoop installation configured as a distributed cluster.
When used on a cluster, a Hadoop job jar must be created with the Cascading jars and dependent third-party jars in the job jar's lib directory, per the Hadoop documentation.
Example 4.1. Sample Ant Build - Properties
<!-- Common ant build properties, included here for completeness -->
<property name="build.dir" location="${basedir}/build"/>
<property name="build.classes" location="${build.dir}/classes"/>

<!-- Cascading specific properties -->
<property name="cascading.home" location="${basedir}/../cascading"/>
<property file="${cascading.home}/version.properties"/>
<property name="cascading.release.version" value="x.y.z"/>
<property name="cascading.filename.core" value="cascading-core-${cascading.release.version}.jar"/>
<property name="cascading.filename.xml" value="cascading-xml-${cascading.release.version}.jar"/>
<property name="cascading.libs" value="${cascading.home}/lib"/>
<property name="cascading.libs.core" value="${cascading.libs}"/>
<property name="cascading.libs.xml" value="${cascading.libs}/xml"/>

<condition property="cascading.path" value="${cascading.home}/" else="${cascading.home}/build">
  <available file="${cascading.home}/${cascading.filename.core}"/>
</condition>

<property name="cascading.lib.core" value="${cascading.path}/${cascading.filename.core}"/>
<property name="cascading.lib.xml" value="${cascading.path}/${cascading.filename.xml}"/>
Example 4.2. Sample Ant Build - Target
<!-- A sample target to jar project classes and Cascading libraries
     into a single Hadoop compatible jar file. -->
<target name="jar" description="creates a Hadoop ready jar w/dependencies">

  <!-- copy Cascading classes and libraries -->
  <copy todir="${build.classes}/lib" file="${cascading.lib.core}"/>
  <copy todir="${build.classes}/lib" file="${cascading.lib.xml}"/>
  <copy todir="${build.classes}/lib">
    <fileset dir="${cascading.libs.core}" includes="*.jar"/>
    <fileset dir="${cascading.libs.xml}" includes="*.jar"/>
  </copy>

  <jar jarfile="${build.dir}/${ant.project.name}.jar">
    <fileset dir="${build.classes}"/>
    <fileset dir="${basedir}" includes="lib/"/>
    <manifest>
      <!-- the project Main class, by default assumes Main -->
      <attribute name="Main-Class" value="${ant.project.name}/Main"/>
    </manifest>
  </jar>

</target>
The above Ant snippets can be used in your project to create a Hadoop jar for submission to your cluster. Again, all Hadoop applications that are intended to run on a cluster must be packaged with all third-party libraries in a directory named lib in the final application jar file, regardless of whether they are Cascading applications or raw Hadoop MapReduce applications.
Note that the snippets above are only intended to show how to include the Cascading libraries; you still need to compile your project into the build.classes path.
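The compile step is not part of the examples above. As a rough sketch, it might look like the following target, which reuses the properties from Example 4.1. The names src.dir, hadoop.home, and project.class.path are hypothetical and not part of the Cascading distribution, and the locations assigned to them are assumptions to adjust for your own project layout and Hadoop installation.

```xml
<!-- Assumption: project sources live in ${basedir}/src; adjust to your layout -->
<property name="src.dir" location="${basedir}/src"/>
<!-- Assumption: a local Hadoop installation supplies the Hadoop jars at compile time -->
<property name="hadoop.home" location="${basedir}/../hadoop"/>

<!-- Hypothetical compile classpath built from the Example 4.1 properties -->
<path id="project.class.path">
  <pathelement location="${cascading.lib.core}"/>
  <pathelement location="${cascading.lib.xml}"/>
  <fileset dir="${cascading.libs.core}" includes="*.jar"/>
  <fileset dir="${cascading.libs.xml}" includes="*.jar"/>
  <fileset dir="${hadoop.home}" includes="*.jar,lib/*.jar"/>
</path>

<!-- Compiles project sources into the directory that the jar target packages -->
<target name="compile" description="compiles project sources into build.classes">
  <mkdir dir="${build.classes}"/>
  <javac srcdir="${src.dir}" destdir="${build.classes}"
         classpathref="project.class.path" includeantruntime="false"/>
</target>
```

With a target like this in place, the jar target from Example 4.2 could declare depends="compile" so that running ant jar compiles the project first. The Hadoop jars are only on the compile classpath here and are not copied into the job jar's lib directory, since the cluster already provides them.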
Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.