cascading.tap.hadoop
Class GlobHfs

java.lang.Object
  extended by cascading.tap.Tap<Config,Input,Void>
      extended by cascading.tap.SourceTap<Config,Input>
          extended by cascading.tap.MultiSourceTap<Hfs,JobConf,RecordReader>
              extended by cascading.tap.hadoop.GlobHfs
All Implemented Interfaces:
FlowElement, CompositeTap<Hfs>, Traceable, Serializable

public class GlobHfs
extends MultiSourceTap<Hfs,JobConf,RecordReader>

Class GlobHfs is a type of MultiSourceTap that accepts Hadoop-style 'file globbing' expressions, so that multiple files matching the given pattern may be used as the input sources for a given Flow.

See FileSystem.globStatus(org.apache.hadoop.fs.Path) for details on the globbing syntax. In short, it is similar to standard regular expressions, except that alternation is written {foo,bar} instead of (foo|bar).
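As a rough illustration of the alternation difference, the sketch below uses the JDK's own java.nio glob matcher as a stand-in. This is an assumption made for illustration only: it is not Hadoop's FileSystem.globStatus implementation, and the two may differ in other details. The class name GlobAlternationSketch and the sample paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical sketch: compares glob-style {a,b} alternation with regex (a|b).
// Uses java.nio globs as a stand-in for Hadoop globs (an assumption).
final class GlobAlternationSketch {

    // True if 'path' matches 'glob' under the JDK's "glob:" syntax.
    static boolean globMatches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    // True if the whole of 'path' matches the regular expression 'regex'.
    static boolean regexMatches(String regex, String path) {
        return Pattern.matches(regex, path);
    }

    public static void main(String[] args) {
        String january = "logs/2014-01/part-00000";
        String march = "logs/2014-03/part-00000";

        // Glob alternation uses braces and commas.
        String glob = "logs/2014-{01,02}/part-*";
        System.out.println("glob, january:  " + globMatches(glob, january));
        System.out.println("glob, march:    " + globMatches(glob, march));

        // The regular-expression equivalent uses parentheses and a bar.
        String regex = "logs/2014-(01|02)/part-.*";
        System.out.println("regex, january: " + regexMatches(regex, january));
        System.out.println("regex, march:   " + regexMatches(regex, march));
    }
}
```

With GlobHfs, a pattern of the first form would be passed as the pathPattern constructor argument.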

Note that a Flow sourcing from GlobHfs is not currently compatible with the Cascade scheduler. The scheduler needs concrete paths in order to order the Flows properly, but GlobHfs can resolve its wildcards into concrete values only if the matching files and paths already exist.

Note that globbing can match files or directories. It may consume fewer resources to match directories and let Hadoop include all files immediately contained in each directory, instead of enumerating every individual file. Ending the glob path with a / should match only directories.
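The difference in granularity can be sketched with plain JDK file APIs. The example below is an illustration under stated assumptions, not Cascading or Hadoop code: it builds a hypothetical temporary directory layout and counts how many paths a directory-level pattern yields compared with a file-level one. The class name GlobGranularitySketch, the directory names, and the file names are all invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: directory-level versus file-level glob matching,
// using java.nio as a stand-in for Hadoop's file system (an assumption).
final class GlobGranularitySketch {

    // Creates root/2014-01 and root/2014-02, each holding three part files.
    // The temporary tree is not deleted by this sketch.
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("glob-sketch");
            for (String month : new String[] {"2014-01", "2014-02"}) {
                Path dir = Files.createDirectory(root.resolve(month));
                for (int i = 0; i < 3; i++) {
                    Files.createFile(dir.resolve("part-0000" + i));
                }
            }
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists the entries directly under 'dir' whose names match 'glob'.
    static List<Path> list(Path dir, String glob) {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                matches.add(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    // Directory-level match: one path per month directory.
    static List<Path> matchDirectories(Path root) {
        return list(root, "2014-*");
    }

    // File-level match: one path per part file in every month directory.
    static List<Path> matchFiles(Path root) {
        List<Path> files = new ArrayList<>();
        for (Path dir : matchDirectories(root)) {
            files.addAll(list(dir, "part-*"));
        }
        return files;
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        System.out.println("directory-level paths: " + matchDirectories(root).size());
        System.out.println("file-level paths:      " + matchFiles(root).size());
    }
}
```

In this layout the directory-level pattern resolves to fewer paths than the file-level one, which is the saving the note above refers to. How much it matters in practice depends on the number of files per directory.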

See Also:
Hfs, MultiSourceTap, FileSystem, Serialized Form

Field Summary
 
Fields inherited from class cascading.tap.MultiSourceTap
taps
 
Constructor Summary
GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme, String pathPattern)
          Constructor GlobHfs creates a new GlobHfs instance.
GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme, String pathPattern, PathFilter pathFilter)
          Constructor GlobHfs creates a new GlobHfs instance.
 
Method Summary
 boolean equals(Object object)
           
 String getIdentifier()
           
protected  Hfs[] getTaps()
           
 int hashCode()
           
 void sourceConfInit(FlowProcess<JobConf> process, JobConf conf)
           
 String toString()
           
 
Methods inherited from class cascading.tap.MultiSourceTap
getChildTaps, getModifiedTime, getNumChildTaps, getScheme, isReplace, openForRead, resourceExists
 
Methods inherited from class cascading.tap.SourceTap
commitResource, createResource, deleteResource, getSinkFields, isSink, openForWrite, rollbackResource, sinkConfInit
 
Methods inherited from class cascading.tap.Tap
createResource, deleteResource, flowConfInit, getConfigDef, getFullIdentifier, getFullIdentifier, getModifiedTime, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasStepConfigDef, id, isEquivalentTo, isKeep, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, resourceExists, retrieveSinkFields, retrieveSourceFields, setScheme, taps
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

GlobHfs

@ConstructorProperties(value={"scheme","pathPattern"})
public GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme,
                                          String pathPattern)
Constructor GlobHfs creates a new GlobHfs instance.

Parameters:
scheme - of type Scheme
pathPattern - of type String

GlobHfs

@ConstructorProperties(value={"scheme","pathPattern","pathFilter"})
public GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme,
                                          String pathPattern,
                                          PathFilter pathFilter)
Constructor GlobHfs creates a new GlobHfs instance.

Parameters:
scheme - of type Scheme
pathPattern - of type String
pathFilter - of type PathFilter

Method Detail

getIdentifier

public String getIdentifier()
Overrides:
getIdentifier in class MultiSourceTap<Hfs,JobConf,RecordReader>

getTaps

protected Hfs[] getTaps()
Overrides:
getTaps in class MultiSourceTap<Hfs,JobConf,RecordReader>

sourceConfInit

public void sourceConfInit(FlowProcess<JobConf> process,
                           JobConf conf)
Overrides:
sourceConfInit in class MultiSourceTap<Hfs,JobConf,RecordReader>

equals

public boolean equals(Object object)
Overrides:
equals in class MultiSourceTap<Hfs,JobConf,RecordReader>

hashCode

public int hashCode()
Overrides:
hashCode in class MultiSourceTap<Hfs,JobConf,RecordReader>

toString

public String toString()
Overrides:
toString in class MultiSourceTap<Hfs,JobConf,RecordReader>


Copyright © 2007-2014 Concurrent, Inc. All Rights Reserved.