cascading.tap
Class Hfs

java.lang.Object
  extended by cascading.tap.Tap
      extended by cascading.tap.Hfs
All Implemented Interfaces:
FlowElement, Serializable
Direct Known Subclasses:
Dfs, Lfs, S3fs, TempHfs

public class Hfs
extends Tap

Class Hfs is the base class for all Hadoop file system access. Use Dfs, Lfs, or S3fs for resources specific to Hadoop Distributed file system, the Local file system, or Amazon S3, respectively.

Use the Hfs class if the 'kind' of resource is unknown at design time. To use, prefix a scheme to the 'stringPath'. Where hdfs://... will denonte Dfs, file://... will denote Lfs, and s3://aws_id:aws_secret@bucket/... will denote S3fs.

Call setTemporaryDirectory(java.util.Map, String) to use a different temporary file directory path other than the current Hadoop default path.

See Also:
Serialized Form

Constructor Summary
protected Hfs()
           
  Hfs(Fields fields, String stringPath)
          Constructor Hfs creates a new Hfs instance.
  Hfs(Fields fields, String stringPath, boolean replace)
          Constructor Hfs creates a new Hfs instance.
  Hfs(Fields fields, String stringPath, SinkMode sinkMode)
          Constructor Hfs creates a new Hfs instance.
protected Hfs(Scheme scheme)
           
  Hfs(Scheme scheme, String stringPath)
          Constructor Hfs creates a new Hfs instance.
  Hfs(Scheme scheme, String stringPath, boolean replace)
          Constructor Hfs creates a new Hfs instance.
  Hfs(Scheme scheme, String stringPath, SinkMode sinkMode)
          Constructor Hfs creates a new Hfs instance.
 
Method Summary
 boolean deletePath(JobConf conf)
          Method deletePath deletes the resource represented by this instance.
 boolean equals(Object object)
           
protected  FileSystem getDefaultFileSystem(JobConf jobConf)
           
 URI getDefaultFileSystemURIScheme(JobConf jobConf)
          Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem.
protected  FileSystem getFileSystem(JobConf jobConf)
           
 Path getPath()
          Method getPath returns the Hadoop path to the resource represented by this Tap instance.
 long getPathModified(JobConf conf)
          Method getPathModified returns the date this resource was last modified.
 Path getQualifiedPath(JobConf conf)
          Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path.
static String getTemporaryDirectory(Map<Object,Object> properties)
          Methdo getTemporaryDirectory returns the configured temporary directory from the given properties object.
protected  Path getTempPath(JobConf conf)
           
 URI getURIScheme(JobConf jobConf)
           
 int hashCode()
           
 boolean isWriteDirect()
          Method isWriteDirect returns true if this instances TupleEntryCollector should be used to sink values.
 boolean makeDirs(JobConf conf)
          Method makeDirs makes all the directories this Tap instance represents.
protected  String makeTemporaryPathDir(String name)
           
protected  URI makeURIScheme(JobConf jobConf)
           
 TupleEntryIterator openForRead(JobConf conf)
          Method openForRead opens the resource represented by this Tap instance.
 TupleEntryCollector openForWrite(JobConf conf)
          Method openForWrite opens the resource represented by this Tap instance.
 boolean pathExists(JobConf conf)
          Method pathExists return true if the path represented by this instance exists.
protected  void setStringPath(String stringPath)
           
static void setTemporaryDirectory(Map<Object,Object> properties, String tempDir)
          Method setTemporaryDirectory sets the temporary directory on the given properties object.
protected  void setUriScheme(URI uriScheme)
           
 void sinkInit(JobConf conf)
          Method sinkInit initializes this instance as a sink.
 void sourceInit(JobConf conf)
          Method sourceInit initializes this instance as a source.
 String toString()
           
 
Methods inherited from class cascading.tap.Tap
flowInit, getIdentifier, getScheme, getSinkFields, getSinkMode, getSourceFields, isAppend, isEquivalentTo, isKeep, isReplace, isSink, isSource, isUpdate, outgoingScopeFor, resolveFields, resolveIncomingOperationFields, setScheme, setWriteDirect, sink, source, taps
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Hfs

protected Hfs()

Hfs

@ConstructorProperties(value="scheme")
protected Hfs(Scheme scheme)

Hfs

@ConstructorProperties(value={"fields","stringPath"})
public Hfs(Fields fields,
                                      String stringPath)
Constructor Hfs creates a new Hfs instance.

Parameters:
fields - of type Fields
stringPath - of type String

Hfs

@ConstructorProperties(value={"fields","stringPath","replace"})
public Hfs(Fields fields,
                                      String stringPath,
                                      boolean replace)
Constructor Hfs creates a new Hfs instance.

Parameters:
fields - of type Fields
stringPath - of type String
replace - of type boolean

Hfs

@ConstructorProperties(value={"fields","stringPath","sinkMode"})
public Hfs(Fields fields,
                                      String stringPath,
                                      SinkMode sinkMode)
Constructor Hfs creates a new Hfs instance.

Parameters:
fields - of type Fields
stringPath - of type String
sinkMode - of type SinkMode

Hfs

@ConstructorProperties(value={"scheme","stringPath"})
public Hfs(Scheme scheme,
                                      String stringPath)
Constructor Hfs creates a new Hfs instance.

Parameters:
scheme - of type Scheme
stringPath - of type String

Hfs

@ConstructorProperties(value={"scheme","stringPath","replace"})
public Hfs(Scheme scheme,
                                      String stringPath,
                                      boolean replace)
Constructor Hfs creates a new Hfs instance.

Parameters:
scheme - of type Scheme
stringPath - of type String
replace - of type boolean

Hfs

@ConstructorProperties(value={"scheme","stringPath","sinkMode"})
public Hfs(Scheme scheme,
                                      String stringPath,
                                      SinkMode sinkMode)
Constructor Hfs creates a new Hfs instance.

Parameters:
scheme - of type Scheme
stringPath - of type String
sinkMode - of type SinkMode
Method Detail

setTemporaryDirectory

public static void setTemporaryDirectory(Map<Object,Object> properties,
                                         String tempDir)
Method setTemporaryDirectory sets the temporary directory on the given properties object.

Parameters:
properties - of type Map
tempDir - of type String

getTemporaryDirectory

public static String getTemporaryDirectory(Map<Object,Object> properties)
Methdo getTemporaryDirectory returns the configured temporary directory from the given properties object.

Parameters:
properties - of type Map
Returns:
a String or null if not set

setStringPath

protected void setStringPath(String stringPath)

setUriScheme

protected void setUriScheme(URI uriScheme)

getURIScheme

public URI getURIScheme(JobConf jobConf)
                 throws IOException
Throws:
IOException

makeURIScheme

protected URI makeURIScheme(JobConf jobConf)
                     throws IOException
Throws:
IOException

getDefaultFileSystemURIScheme

public URI getDefaultFileSystemURIScheme(JobConf jobConf)
                                  throws IOException
Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem.

Parameters:
jobConf - of type JobConf
Returns:
URI
Throws:
IOException - when

isWriteDirect

public boolean isWriteDirect()
Description copied from class: Tap
Method isWriteDirect returns true if this instances TupleEntryCollector should be used to sink values.

Overrides:
isWriteDirect in class Tap
Returns:
the writeDirect (type boolean) of this Tap object.

getDefaultFileSystem

protected FileSystem getDefaultFileSystem(JobConf jobConf)
                                   throws IOException
Throws:
IOException

getFileSystem

protected FileSystem getFileSystem(JobConf jobConf)
                            throws IOException
Throws:
IOException

getPath

public Path getPath()
Description copied from class: Tap
Method getPath returns the Hadoop path to the resource represented by this Tap instance.

Specified by:
getPath in class Tap
Returns:
Path
See Also:
Tap.getPath()

getQualifiedPath

public Path getQualifiedPath(JobConf conf)
                      throws IOException
Description copied from class: Tap
Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path.

Overrides:
getQualifiedPath in class Tap
Parameters:
conf - of type JobConf
Returns:
Path
Throws:
IOException - when

sourceInit

public void sourceInit(JobConf conf)
                throws IOException
Description copied from class: Tap
Method sourceInit initializes this instance as a source.

This method maybe called more than once if this Tap instance is used outside the scope of a Flow instance or if it participates in multiple times in a given Flow or across different Flows in a Cascade.

In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow)

Overrides:
sourceInit in class Tap
Parameters:
conf - of type JobConf
Throws:
IOException - on resource initialization failure.

sinkInit

public void sinkInit(JobConf conf)
              throws IOException
Description copied from class: Tap
Method sinkInit initializes this instance as a sink.

This method maybe called more than once if this Tap instance is used outside the scope of a Flow instance or if it participates in multiple times in a given Flow or across different Flows in a Cascade.

Note this method will be called in context of this Tap being used as a traditional 'sink' and as a 'trap'.

In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow)

Overrides:
sinkInit in class Tap
Parameters:
conf - of type JobConf
Throws:
IOException - on resource initialization failure.

makeDirs

public boolean makeDirs(JobConf conf)
                 throws IOException
Description copied from class: Tap
Method makeDirs makes all the directories this Tap instance represents.

Specified by:
makeDirs in class Tap
Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when there is an error making directories

deletePath

public boolean deletePath(JobConf conf)
                   throws IOException
Description copied from class: Tap
Method deletePath deletes the resource represented by this instance.

Specified by:
deletePath in class Tap
Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when the resource cannot be deleted

pathExists

public boolean pathExists(JobConf conf)
                   throws IOException
Description copied from class: Tap
Method pathExists return true if the path represented by this instance exists.

Specified by:
pathExists in class Tap
Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when the status cannot be determined

getPathModified

public long getPathModified(JobConf conf)
                     throws IOException
Description copied from class: Tap
Method getPathModified returns the date this resource was last modified.

Specified by:
getPathModified in class Tap
Parameters:
conf - of type JobConf
Returns:
long
Throws:
IOException - when the modified date cannot be determined

getTempPath

protected Path getTempPath(JobConf conf)

makeTemporaryPathDir

protected String makeTemporaryPathDir(String name)

toString

public String toString()
Overrides:
toString in class Object
See Also:
Object.toString()

equals

public boolean equals(Object object)
Overrides:
equals in class Tap
See Also:
Tap.equals(Object)

hashCode

public int hashCode()
Overrides:
hashCode in class Tap
See Also:
Tap.hashCode()

openForRead

public TupleEntryIterator openForRead(JobConf conf)
                               throws IOException
Description copied from class: Tap
Method openForRead opens the resource represented by this Tap instance.

Specified by:
openForRead in class Tap
Parameters:
conf - of type JobConf
Returns:
TupleEntryIterator
Throws:
IOException - when the resource cannot be opened

openForWrite

public TupleEntryCollector openForWrite(JobConf conf)
                                 throws IOException
Description copied from class: Tap
Method openForWrite opens the resource represented by this Tap instance.

Specified by:
openForWrite in class Tap
Parameters:
conf - of type JobConf
Returns:
TupleEntryCollector
Throws:
IOException - when


Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.