public class GlobHfs extends cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
Class GlobHfs is a MultiSourceTap that accepts Hadoop-style file globbing expressions, so that multiple files matching the given pattern may be used as the input sources for a given Flow.
See FileSystem.globStatus(org.apache.hadoop.fs.Path) for details on the globbing syntax. In short, it is similar to standard regular expressions, except that alternation is written {foo,bar} instead of (foo|bar).
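The {foo,bar} alternation form can be tried out without a Hadoop cluster. The sketch below uses the JDK's own java.nio glob matcher as a stand-in; it is not Hadoop's implementation, and the two glob dialects differ in details, but both accept brace alternation. The class name, pattern, and paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobAlternationSketch {
    // Returns true when the given relative path matches the glob pattern.
    // Uses the JDK glob dialect as an analog of Hadoop's, not Hadoop itself.
    public static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // Alternation is written {01,02}; the regex form (01|02) is not used in globs.
        String pattern = "logs/2015-{01,02}/part-*";

        System.out.println(matches(pattern, "logs/2015-01/part-00000"));
        System.out.println(matches(pattern, "logs/2015-02/part-00001"));
        System.out.println(matches(pattern, "logs/2015-03/part-00000"));
    }
}
```

The first two paths fall inside the alternation and match; the third names a month outside it and does not.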
Note that a Flow sourcing from GlobHfs is not currently compatible with the cascading.cascade.Cascade scheduler. GlobHfs expects the files and paths to already exist so that the wildcards can be resolved into concrete values, which the scheduler needs in order to order the Flows properly.
Note that globbing can match files or directories. It may consume fewer resources to match directories and let Hadoop include all files immediately contained in each directory, instead of enumerating every individual file.
Ending the glob path with a / should match only directories.
See Also: Hfs, MultiSourceTap, FileSystem, Serialized Form
Constructor and Description |
---|
GlobHfs(cascading.scheme.Scheme<Configuration,RecordReader,?,?,?> scheme, java.lang.String pathPattern) Constructor GlobHfs creates a new GlobHfs instance. |
GlobHfs(cascading.scheme.Scheme<Configuration,RecordReader,?,?,?> scheme, java.lang.String pathPattern, PathFilter pathFilter) Constructor GlobHfs creates a new GlobHfs instance. |
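As a rough illustration of the two constructors, the sketch below builds a GlobHfs source with and without a PathFilter. It assumes Cascading's Hadoop module and Hadoop itself are on the classpath, and that TextLine from cascading.scheme.hadoop is an acceptable Scheme here. The class name, directory layout, pattern, and filter are hypothetical. The filter is presumably applied while the glob is resolved, but that is not stated on this page.

```java
import cascading.scheme.hadoop.TextLine;
import cascading.tap.hadoop.GlobHfs;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class GlobHfsSketch {
    // Hypothetical layout: one directory per day under logs/.
    // The trailing slash asks the glob to match directories only.
    public static final String PATTERN = "logs/2015-{01,02}-*/";

    // Hypothetical filter: skip entries whose names start with "_" or ".".
    public static final PathFilter VISIBLE_ONLY = new PathFilter() {
        @Override
        public boolean accept(Path path) {
            String name = path.getName();
            return !name.startsWith("_") && !name.startsWith(".");
        }
    };

    // Two-argument constructor: scheme plus glob pattern.
    public static GlobHfs plainSource() {
        return new GlobHfs(new TextLine(), PATTERN);
    }

    // Three-argument constructor: scheme, glob pattern, and path filter.
    public static GlobHfs filteredSource() {
        return new GlobHfs(new TextLine(), PATTERN, VISIBLE_ONLY);
    }
}
```

Either tap would then be passed as a source when connecting a Flow. Because the pattern must resolve to existing paths, the matched directories need to be present before the Flow is planned.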
Modifier and Type | Method and Description |
---|---|
boolean | equals(java.lang.Object object) |
java.lang.String | getIdentifier() |
protected Hfs[] | getTaps() |
int | hashCode() |
void | sourceConfInit(cascading.flow.FlowProcess<? extends Configuration> process, Configuration conf) |
java.lang.String | toString() |
Methods inherited from class cascading.tap.MultiSourceTap:
findMatchingTap, getChildTaps, getModifiedTime, getNumChildTaps, getScheme, getTapPrefixMap, getTapsCommonPrefix, isReplace, openForRead, resourceExists
Methods inherited from class cascading.tap.SourceTap:
commitResource, createResource, deleteResource, getSinkFields, isSink, openForWrite, prepareResourceForWrite, rollbackResource, sinkConfInit
Methods inherited from class cascading.tap.Tap:
createResource, deleteResource, flowConfInit, getConfigDef, getFullIdentifier, getFullIdentifier, getModifiedTime, getNodeConfigDef, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasNodeConfigDef, hasStepConfigDef, id, isKeep, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, prepareResourceForRead, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, resourceExists, retrieveSinkFields, retrieveSourceFields, setScheme, taps
@ConstructorProperties(value={"scheme","pathPattern"})
public GlobHfs(cascading.scheme.Scheme<Configuration,RecordReader,?,?,?> scheme, java.lang.String pathPattern)
Parameters:
scheme - of type Scheme
pathPattern - of type String
@ConstructorProperties(value={"scheme","pathPattern","pathFilter"})
public GlobHfs(cascading.scheme.Scheme<Configuration,RecordReader,?,?,?> scheme, java.lang.String pathPattern, PathFilter pathFilter)
Parameters:
scheme - of type Scheme
pathPattern - of type String
pathFilter - of type PathFilter
public java.lang.String getIdentifier()
Overrides: getIdentifier in class cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
protected Hfs[] getTaps()
Overrides: getTaps in class cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
public void sourceConfInit(cascading.flow.FlowProcess<? extends Configuration> process, Configuration conf)
Overrides: sourceConfInit in class cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
public boolean equals(java.lang.Object object)
Overrides: equals in class cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
public int hashCode()
Overrides: hashCode in class cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
public java.lang.String toString()
Overrides: toString in class cascading.tap.MultiSourceTap<Hfs,Configuration,RecordReader>
Copyright © 2007-2015 Xplenty, Inc. All Rights Reserved.