cascading.tap.hadoop
Class GlobHfs
java.lang.Object
cascading.tap.Tap<Config,Input,Void>
cascading.tap.SourceTap<Config,Input>
cascading.tap.MultiSourceTap<Hfs,JobConf,RecordReader>
cascading.tap.hadoop.GlobHfs
- All Implemented Interfaces:
- FlowElement, CompositeTap<Hfs>, Serializable
public class GlobHfs
- extends MultiSourceTap<Hfs,JobConf,RecordReader>
Class GlobHfs is a type of MultiSourceTap that accepts Hadoop-style file globbing expressions, so that
multiple files matching the given pattern may be used as the input sources for a given Flow.
See FileSystem.globStatus(org.apache.hadoop.fs.Path) for details on the globbing syntax. In short,
it is similar to standard regular expressions, except that alternation is written {foo,bar} instead of (foo|bar)
and the wildcards differ: * matches zero or more characters and ? matches exactly one character.
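The difference in alternation syntax can be seen side by side in a small sketch. It uses the JDK's own glob matcher (java.nio.file.PathMatcher) as a stand-in, since it shares the {foo,bar} form; it is not Hadoop's implementation, and the class name, method names, and sample paths are assumptions made for illustration only.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class GlobAlternationSketch {

    // Matches a path against a glob with the JDK matcher.
    // Assumption: used here only as a stand-in for Hadoop's globbing.
    public static boolean globMatches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    // Matches the whole path string against a regular expression.
    public static boolean regexMatches(String regex, String path) {
        return Pattern.matches(regex, path);
    }

    public static void main(String[] args) {
        // Glob alternation uses braces and commas.
        String glob = "logs/{2011,2012}/part-*";
        // The roughly equivalent regular expression uses parentheses and pipes.
        String regex = "logs/(2011|2012)/part-[^/]*";

        String[] samples = {
            "logs/2011/part-00000",
            "logs/2012/part-00001",
            "logs/2010/part-00000"
        };
        for (String sample : samples) {
            System.out.println(sample
                + " glob=" + globMatches(glob, sample)
                + " regex=" + regexMatches(regex, sample));
        }
    }
}
```

With GlobHfs, a pattern of the same shape would be passed as the pathPattern constructor argument; the exact matching rules are those of FileSystem.globStatus, which may differ from the JDK matcher in details.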
Note that a Flow sourcing from GlobHfs is not currently compatible with the Cascade scheduler.
GlobHfs expects the files and paths to exist so that the wildcards can be resolved into concrete values,
which the scheduler needs in order to order the Flows properly.
Note that globbing can match files or directories. It may consume fewer resources to match directories and let
Hadoop include all files immediately contained in each directory instead of enumerating every individual file.
Ending the glob path with a / should match only directories.
- See Also:
Hfs, MultiSourceTap, FileSystem, Serialized Form
Constructor Summary
GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme,
String pathPattern)
Constructor GlobHfs creates a new GlobHfs instance.
GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme,
String pathPattern,
PathFilter pathFilter)
Constructor GlobHfs creates a new GlobHfs instance.
Methods inherited from class cascading.tap.Tap
flowConfInit, getConfigDef, getFullIdentifier, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasStepConfigDef, id, isEquivalentTo, isKeep, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveFields, resolveIncomingOperationFields, retrieveSinkFields, retrieveSourceFields, setScheme, taps
GlobHfs
@ConstructorProperties(value={"scheme","pathPattern"})
public GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme,
String pathPattern)
- Constructor GlobHfs creates a new GlobHfs instance.
- Parameters:
scheme
- of type Scheme
pathPattern
- of type String
GlobHfs
@ConstructorProperties(value={"scheme","pathPattern","pathFilter"})
public GlobHfs(Scheme<JobConf,RecordReader,?,?,?> scheme,
String pathPattern,
PathFilter pathFilter)
- Constructor GlobHfs creates a new GlobHfs instance.
- Parameters:
scheme
- of type Scheme
pathPattern
- of type String
pathFilter
- of type PathFilter
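The effect of combining a glob with a filter can be sketched with JDK classes alone. This is an illustration under assumptions, not the GlobHfs implementation: the Predicate plays the role of a PathFilter's accept method, the JDK glob matcher stands in for Hadoop's, and the resolve and isVisible helpers and the sample paths are hypothetical. The idea shown is that only paths that both match the pattern and pass the filter are kept.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class GlobFilterSketch {

    // Hypothetical helper: keeps the candidates that match the glob AND pass the filter.
    public static List<String> resolve(String glob, Predicate<Path> filter, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> accepted = new ArrayList<>();
        for (String candidate : candidates) {
            Path path = Paths.get(candidate);
            if (matcher.matches(path) && filter.test(path)) {
                accepted.add(candidate);
            }
        }
        return accepted;
    }

    // Example filter (an assumption): reject names that start with '_' or '.',
    // such as job marker files and checksum files.
    public static boolean isVisible(Path path) {
        String name = path.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
            "out/2012/part-00000",
            "out/2012/_SUCCESS",
            "out/2012/.part-00000.crc",
            "out/2011/part-00000");

        // The glob alone matches everything under out/2012; the filter then narrows the set.
        System.out.println(resolve("out/2012/*", GlobFilterSketch::isVisible, candidates));
    }
}
```

With the real API, the equivalent filter would be an org.apache.hadoop.fs.PathFilter passed as the third constructor argument.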
getIdentifier
public String getIdentifier()
- Description copied from class:
Tap
- Method getIdentifier returns a String representing the resource this Tap instance represents.
Often, if the tap accesses a filesystem, the identifier is nothing more than the path to the file or directory.
In other cases it may be a URL or URI representing a connection string or remote resource.
Any two Tap instances having the same value for the identifier are considered equal.
- Overrides:
getIdentifier
in class MultiSourceTap<Hfs,JobConf,RecordReader>
- Returns:
- String
getTaps
protected Hfs[] getTaps()
- Description copied from class:
MultiSourceTap
- Method getTaps returns the taps of this MultiTap object.
- Overrides:
getTaps
in class MultiSourceTap<Hfs,JobConf,RecordReader>
- Returns:
- the taps (type Tap[]) of this MultiTap object.
sourceConfInit
public void sourceConfInit(FlowProcess<JobConf> process,
JobConf conf)
- Description copied from class:
Tap
- Method sourceConfInit initializes this instance as a source.
This method may be called more than once if this Tap instance is used outside the scope of a Flow instance,
or if it participates multiple times in a given Flow or across different Flows in a Cascade.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
- Overrides:
sourceConfInit
in class MultiSourceTap<Hfs,JobConf,RecordReader>
- Parameters:
conf
- of type JobConf
- Throws:
IOException
- on resource initialization failure.
equals
public boolean equals(Object object)
- Overrides:
equals
in class MultiSourceTap<Hfs,JobConf,RecordReader>
hashCode
public int hashCode()
- Overrides:
hashCode
in class MultiSourceTap<Hfs,JobConf,RecordReader>
toString
public String toString()
- Overrides:
toString
in class MultiSourceTap<Hfs,JobConf,RecordReader>
Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.