4.2 Building

The Cascading distribution ships with the following jars.

cascading-1.2.x.jar

all relevant Cascading class files and libraries, with a Hadoop-friendly lib folder containing all third-party dependencies

cascading-core-1.2.x.jar

all Cascading Core class files; should be packaged with lib/*.jar

cascading-xml-1.2.x.jar

all Cascading XML module class files; should be packaged with lib/xml/*.jar

cascading-test-1.2.x.jar

all Cascading unit tests. If you are writing custom modules for Cascading, sub-classing cascading.CascadingTestCase might be helpful

Cascading will run with Hadoop either in its default 'local' (or 'standalone') mode, or with Hadoop configured as a distributed cluster.

When used on a cluster, a Hadoop job jar must be created with the Cascading jars and their dependent third-party jars in the job jar's lib directory, per the Hadoop documentation.

Example 4.1. Sample Ant Build - Properties

<!-- Common ant build properties, included here for completeness -->
<property name="build.dir" location="${basedir}/build"/>
<property name="build.classes" location="${build.dir}/classes"/>

<!-- Cascading specific properties -->
<property name="cascading.home" location="${basedir}/../cascading"/>
<property file="${cascading.home}/version.properties"/>
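<!-- Fallback value: Ant properties are immutable, so this only takes
     effect if the properties file above did not already set it -->
<property name="cascading.release.version" value="x.y.z"/>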
<property name="cascading.filename.core"
          value="cascading-core-${cascading.release.version}.jar"/>
<property name="cascading.filename.xml"
          value="cascading-xml-${cascading.release.version}.jar"/>
<property name="cascading.libs" value="${cascading.home}/lib"/>
<property name="cascading.libs.core" value="${cascading.libs}"/>
<property name="cascading.libs.xml" value="${cascading.libs}/xml"/>

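<!-- Use the jar in cascading.home if it is present there,
     otherwise fall back to the one under cascading.home/build -->
<condition property="cascading.path" value="${cascading.home}"
           else="${cascading.home}/build">
  <available file="${cascading.home}/${cascading.filename.core}"/>
</condition>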

<property name="cascading.lib.core"
          value="${cascading.path}/${cascading.filename.core}"/>
<property name="cascading.lib.xml"
          value="${cascading.path}/${cascading.filename.xml}"/>

Example 4.2. Sample Ant Build - Target

<!--
  A sample target to jar project classes and Cascading
  libraries into a single Hadoop compatible jar file.
 -->

<target name="jar" description="creates a Hadoop ready jar w/dependencies">

  <!-- copy Cascading classes and libraries -->
  <copy todir="${build.classes}/lib" file="${cascading.lib.core}"/>
  <copy todir="${build.classes}/lib" file="${cascading.lib.xml}"/>
  <copy todir="${build.classes}/lib">
    <fileset dir="${cascading.libs.core}" includes="*.jar"/>
    <fileset dir="${cascading.libs.xml}" includes="*.jar"/>
  </copy>

  <jar jarfile="${build.dir}/${ant.project.name}.jar">
    <fileset dir="${build.classes}"/>
    <fileset dir="${basedir}" includes="lib/"/>
    <manifest>
      <!-- the project Main class; by default assumes a class named Main
           in a package named after the Ant project -->
      <attribute name="Main-Class" value="${ant.project.name}.Main"/>
    </manifest>
  </jar>

</target>

The Ant snippets above can be used in your project to create a Hadoop jar for submission to your cluster. Again, every Hadoop application intended to run on a cluster must be packaged with all of its third-party libraries in a directory named lib inside the final application jar file, regardless of whether it is a Cascading application or a raw Hadoop MapReduce application.
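As an illustration of the submission step, the following is a minimal sketch of an Ant target that hands the finished jar to the hadoop command line tool. The target name run is hypothetical, and the sketch assumes that the hadoop executable is on the PATH and that the jar's manifest names the main class, as in the jar target above.

```xml
<!-- Hypothetical target: builds the job jar, then submits it with "hadoop jar" -->
<target name="run" depends="jar"
        description="submits the job jar using the hadoop command">
  <!-- assumes the hadoop executable can be found on the PATH -->
  <exec executable="hadoop" failonerror="true">
    <arg value="jar"/>
    <arg file="${build.dir}/${ant.project.name}.jar"/>
    <!-- application arguments, if any, would follow as further arg elements -->
  </exec>
</target>
```

Because the manifest supplies the Main-Class, no class name needs to be passed on the command line. The same command can of course be run directly from a shell instead.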

Note that the snippets above are only intended to show how to include the Cascading libraries; you still need to compile your project into the build.classes path.
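A compile step might look like the following minimal sketch. The src.dir property, the project.class.path id, and the compile target name are hypothetical and not part of the Cascading distribution; the remaining properties are the ones defined in the properties example above. If your code references Hadoop classes directly, the Hadoop jars would also need to be added to this classpath.

```xml
<!-- Hypothetical source location; adjust to your project layout -->
<property name="src.dir" location="${basedir}/src/java"/>

<!-- Compile classpath: the Cascading jars plus their third-party libraries -->
<path id="project.class.path">
  <pathelement location="${cascading.lib.core}"/>
  <pathelement location="${cascading.lib.xml}"/>
  <fileset dir="${cascading.libs.core}" includes="*.jar"/>
  <fileset dir="${cascading.libs.xml}" includes="*.jar"/>
</path>

<target name="compile"
        description="compiles project sources into build.classes">
  <mkdir dir="${build.classes}"/>
  <!-- includeantruntime requires Ant 1.8 or later -->
  <javac srcdir="${src.dir}" destdir="${build.classes}"
         classpathref="project.class.path" includeantruntime="false"/>
</target>
```

With a target like this in place, the jar target could declare depends="compile" so that the classes are always built before they are packaged.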

Copyright © 2007-2008 Concurrent, Inc. All Rights Reserved.