Class GlobHfs

  extended by cascading.tap.Tap<Config,Input,Void>
      extended by cascading.tap.SourceTap<Config,Input>
          extended by cascading.tap.MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>
              extended by cascading.tap.hadoop.GlobHfs
All Implemented Interfaces:
FlowElement, CompositeTap<Hfs>, Serializable

public class GlobHfs
extends MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>

Class GlobHfs is a type of MultiSourceTap that accepts Hadoop-style file globbing expressions, so that multiple files matching the given pattern may be used as the input sources for a given Flow.

See FileSystem.globStatus(org.apache.hadoop.fs.Path) for details on the globbing syntax. In short, it is similar to standard regular expressions, except that alternation is written {foo,bar} instead of (foo|bar).
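The brace alternation described above can be tried out without a Hadoop cluster. The following is a minimal sketch that uses the JDK's own java.nio glob matcher as a stand-in; it is not Hadoop's implementation, and FileSystem.globStatus may differ in details. The class name, method name, and sample paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobAlternationSketch {

    // Returns true when the candidate path matches the glob pattern.
    // Assumption: java.nio glob matching is used only as an illustration;
    // it is not the matcher that GlobHfs or Hadoop uses internally.
    public static boolean matches(String glob, String candidate) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(candidate));
    }

    public static void main(String[] args) {
        // {01,02} plays the role that (01|02) would play in a regular expression.
        String glob = "logs/2012-{01,02}/part-*";

        System.out.println(matches(glob, "logs/2012-01/part-00000")); // true
        System.out.println(matches(glob, "logs/2012-02/part-00001")); // true
        System.out.println(matches(glob, "logs/2012-03/part-00000")); // false
    }
}
```

A pattern written this way would be passed to GlobHfs as the pathPattern argument; the exact set of files it resolves to depends on the Hadoop file system in use.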

Note that a Flow sourcing from GlobHfs is not currently compatible with the Cascade scheduler. GlobHfs expects the files and paths to already exist so that the wildcards can be resolved into concrete values; the scheduler needs those concrete values to order the Flows properly.

Note that globbing can match files or directories. It may consume fewer resources to match directories and let Hadoop include all files immediately contained in each directory, instead of enumerating every individual file. Ending the glob path with a / should match only directories.
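To make the difference in granularity concrete, the sketch below builds a small temporary directory tree on the local file system and counts how many entries a directory-level glob resolves to compared with a file-level glob. It again relies on the JDK's java.nio glob matcher as an assumption for illustration, not on Hadoop, and it does not demonstrate the trailing / behavior. All names and paths are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.stream.Stream;

public class GlobGranularitySketch {

    // Counts entries under root whose path, relative to root, matches the glob.
    public static long countMatches(Path root, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> entries = Files.walk(root)) {
            return entries
                .filter(p -> !p.equals(root))   // skip the root itself
                .map(root::relativize)          // match against relative paths
                .filter(matcher::matches)
                .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway tree: two day directories with three part files each.
    public static Path buildSampleTree() {
        try {
            Path root = Files.createTempDirectory("glob-sketch");
            for (String day : new String[] {"2012-01-01", "2012-01-02"}) {
                Path dir = Files.createDirectories(root.resolve(day));
                for (int i = 0; i < 3; i++) {
                    Files.createFile(dir.resolve("part-0000" + i));
                }
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = buildSampleTree();
        System.out.println(countMatches(root, "2012-*"));        // 2 (directories)
        System.out.println(countMatches(root, "2012-*/part-*")); // 6 (individual files)
    }
}
```

With the directory-level pattern only two paths need to be resolved, and the files inside each directory are left for the framework to pick up; with the file-level pattern every file is enumerated individually.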

See Also:
Hfs, MultiSourceTap, FileSystem, Serialized Form

Field Summary
Fields inherited from class cascading.tap.MultiSourceTap
Constructor Summary
GlobHfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,?,?,?> scheme, String pathPattern)
          Constructor GlobHfs creates a new GlobHfs instance.
GlobHfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,?,?,?> scheme, String pathPattern, org.apache.hadoop.fs.PathFilter pathFilter)
          Constructor GlobHfs creates a new GlobHfs instance.
Method Summary
 boolean equals(Object object)
 String getIdentifier()
          Method getIdentifier returns a String representing the resource this Tap instance represents.
protected  Hfs[] getTaps()
          Method getTaps returns the taps of this MultiTap object.
 int hashCode()
 void sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> process, org.apache.hadoop.mapred.JobConf conf)
          Method sourceConfInit initializes this instance as a source.
 String toString()
Methods inherited from class cascading.tap.MultiSourceTap
getChildTaps, getModifiedTime, getNumChildTaps, getScheme, isReplace, openForRead, resourceExists
Methods inherited from class cascading.tap.SourceTap
commitResource, createResource, deleteResource, getSinkFields, isSink, openForWrite, rollbackResource, sinkConfInit
Methods inherited from class cascading.tap.Tap
flowConfInit, getConfigDef, getFullIdentifier, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasStepConfigDef, id, isEquivalentTo, isKeep, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, retrieveSinkFields, retrieveSourceFields, setScheme, taps
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail


public GlobHfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,?,?,?> scheme,
                                          String pathPattern)
Constructor GlobHfs creates a new GlobHfs instance.

scheme - of type Scheme
pathPattern - of type String


public GlobHfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,?,?,?> scheme,
                                          String pathPattern,
                                          org.apache.hadoop.fs.PathFilter pathFilter)
Constructor GlobHfs creates a new GlobHfs instance.

scheme - of type Scheme
pathPattern - of type String
pathFilter - of type PathFilter
Method Detail


public String getIdentifier()
Description copied from class: Tap
Method getIdentifier returns a String representing the resource this Tap instance represents.

Often, if the tap accesses a filesystem, the identifier is nothing more than the path to the file or directory. In other cases it may be a URL or URI representing a connection string or remote resource.

Any two Tap instances having the same value for the identifier are considered equal.
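The identifier-based equality rule stated above can be illustrated with a small stand-alone sketch. ResourceHandle is a hypothetical stand-in for a Tap and is not a Cascading class; the actual Tap.equals implementation may take additional state into account, so treat this only as an illustration of the rule as documented here.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class IdentifierEqualitySketch {

    // Hypothetical stand-in for a Tap; it is not part of Cascading.
    public static final class ResourceHandle {
        private final String identifier;

        public ResourceHandle(String identifier) {
            this.identifier = Objects.requireNonNull(identifier);
        }

        public String getIdentifier() {
            return identifier;
        }

        // Two handles are equal exactly when their identifiers are equal.
        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof ResourceHandle)) return false;
            return identifier.equals(((ResourceHandle) other).identifier);
        }

        // hashCode is derived from the same field so hash-based collections agree with equals.
        @Override
        public int hashCode() {
            return identifier.hashCode();
        }
    }

    public static void main(String[] args) {
        Set<ResourceHandle> handles = new HashSet<>();
        handles.add(new ResourceHandle("hdfs://namenode/logs/2012-*"));
        handles.add(new ResourceHandle("hdfs://namenode/logs/2012-*")); // duplicate identifier
        handles.add(new ResourceHandle("hdfs://namenode/logs/2011-*"));
        System.out.println(handles.size()); // 2
    }
}
```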

getIdentifier in class MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>


protected Hfs[] getTaps()
Description copied from class: MultiSourceTap
Method getTaps returns the taps of this MultiTap object.

getTaps in class MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>
the taps (type Tap[]) of this MultiTap object.


public void sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> process,
                           org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Tap
Method sourceConfInit initializes this instance as a source.

This method may be called more than once if this Tap instance is used outside the scope of a Flow instance, or if it participates multiple times in a given Flow or across different Flows in a Cascade.

In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).

Note that no resources or services should be modified by this method.

sourceConfInit in class MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>
process - of type FlowProcess
conf - of type Config


public boolean equals(Object object)
equals in class MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>


public int hashCode()
hashCode in class MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>


public String toString()
toString in class MultiSourceTap<Hfs,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader>

Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.