public class MultiSourceTap<Child extends Tap,Config,Input> extends SourceTap<Config,Input> implements CompositeTap<Child>
Class MultiSourceTap ties multiple Tap instances into a single resource. Effectively, this allows multiple files to be concatenated into the requesting pipe assembly, provided they all share the same Scheme instance.
Note that order is not maintained by virtue of the underlying model. If order is necessary, use a unique sequence key to span the resources, like a line number.
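Since records from the combined resources arrive in no guaranteed order, one way to recover a known order downstream is to carry a sequence key with every record and sort on it. The following is a minimal sketch in plain Java, not Cascading API: `SequencedLine`, its `sequence` field, and `restoreOrder` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a record read through a MultiSourceTap; not a Cascading type.
final class SequencedLine {
    final long sequence; // unique key spanning all resources, e.g. a global line number
    final String text;

    SequencedLine(long sequence, String text) {
        this.sequence = sequence;
        this.text = text;
    }
}

final class SequenceKeySketch {
    // Returns a new list sorted by the sequence key, whatever the arrival order was.
    static List<SequencedLine> restoreOrder(List<SequencedLine> arrived) {
        List<SequencedLine> sorted = new ArrayList<>(arrived);
        sorted.sort(Comparator.comparingLong((SequencedLine line) -> line.sequence));
        return sorted;
    }

    public static void main(String[] args) {
        // Records arrive interleaved from two resources.
        List<SequencedLine> arrived = Arrays.asList(
            new SequencedLine(3L, "third"),
            new SequencedLine(1L, "first"),
            new SequencedLine(2L, "second"));
        for (SequencedLine line : restoreOrder(arrived)) {
            System.out.println(line.sequence + " " + line.text);
        }
    }
}
```

In a real flow the sort would be expressed in the pipe assembly; the sketch only shows that the key, not the read order, carries the ordering information.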
Note that if multiple input files have the same Scheme (like cascading.scheme.hadoop.TextLine), they may not contain the same semi-structure internally. For example, one file might be an Apache log file, and another might be a Log4J log file. If each one should be parsed differently, then they must be handled by different pipe assembly branches.
As of 3.3, MultiSourceTap can aggregate cascading.tap.hadoop.PartitionTap and cascading.tap.local.PartitionTap instances.
However, it may not be safe to aggregate cascading.tap.hadoop.GlobHfs and cascading.tap.hadoop.PartitionTap instances, because GlobHfs identifiers cannot be fully resolved, which prevents the cluster-side runtime from distinguishing which Tap instance to open for reading.
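To see why an unresolved glob can be a problem, consider a simplified model in which the runtime picks the child whose identifier is the longest literal prefix of the path currently being read. This model is an assumption made for illustration, suggested by the `prefixMap` and `findMatchingTap` members, and is not a description of the actual implementation; `PrefixMatchSketch`, `findMatching`, and the paths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class PrefixMatchSketch {
    // Returns the identifier that is the longest literal prefix of path,
    // or null when no identifier matches.
    static String findMatching(List<String> identifiers, String path) {
        String best = null;
        for (String identifier : identifiers) {
            boolean matches = path.startsWith(identifier);
            if (matches && (best == null || identifier.length() > best.length())) {
                best = identifier;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One fully resolved identifier and one glob pattern (hypothetical paths).
        List<String> identifiers = Arrays.asList("/logs/2015", "/logs/*/errors");

        // A resolved identifier matches the file being read.
        System.out.println(findMatching(identifiers, "/logs/2015/part-00000"));

        // The glob is never a literal prefix of a concrete path, so nothing matches.
        System.out.println(findMatching(identifiers, "/logs/2014/errors/part-00000"));
    }
}
```

Under this model the second lookup finds no child, which is the kind of ambiguity the warning above describes.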
Modifier and Type | Field and Description |
---|---|
protected java.lang.String | commonPrefix |
protected Trie<Child> | prefixMap |
protected Child[] | taps |
Modifier | Constructor and Description |
---|---|
public | MultiSourceTap(Child... taps): Constructor MultiSourceTap creates a new MultiSourceTap instance. |
protected | MultiSourceTap(Scheme<Config,Input,?,?,?> scheme) |
Modifier and Type | Method and Description |
---|---|
boolean | equals(java.lang.Object object) |
protected Child | findMatchingTap(FlowProcess<? extends Config> flowProcess) |
java.util.Iterator<Child> | getChildTaps() |
java.lang.String | getIdentifier(): Method getIdentifier returns a String representing the resource this Tap instance represents. |
long | getModifiedTime(Config conf): Returns the most current modified time. |
long | getNumChildTaps() |
Scheme | getScheme(): Method getScheme returns the scheme of this Tap object. |
protected Trie<Child> | getTapPrefixMap(FlowProcess<? extends Config> flowProcess) |
protected Child[] | getTaps(): Method getTaps returns the taps of this MultiTap object. |
protected java.lang.String | getTapsCommonPrefix(FlowProcess<? extends Config> flowProcess) |
int | hashCode() |
boolean | isReplace(): Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started. |
TupleEntryIterator | openForRead(FlowProcess<? extends Config> flowProcess, Input input): Method openForRead opens the resource represented by this Tap instance for reading. |
boolean | resourceExists(Config conf): Method resourceExists returns true if the path represented by this instance exists. |
void | sourceConfInit(FlowProcess<? extends Config> process, Config conf): Method sourceConfInit initializes this instance as a source. |
java.lang.String | toString() |
Methods inherited from class SourceTap: commitResource, createResource, deleteResource, getSinkFields, isSink, openForWrite, prepareResourceForWrite, rollbackResource, sinkConfInit
Methods inherited from class Tap: createResource, deleteResource, flowConfInit, getConfigDef, getFullIdentifier, getFullIdentifier, getModifiedTime, getNodeConfigDef, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hasNodeConfigDef, hasStepConfigDef, id, isKeep, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, prepareResourceForRead, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, resourceExists, retrieveSinkFields, retrieveSourceFields, setScheme, taps
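protected transient java.lang.String commonPrefix

protected MultiSourceTap(Scheme<Config,Input,?,?,?> scheme)

@ConstructorProperties(value="taps")
public MultiSourceTap(Child... taps)
Constructor MultiSourceTap creates a new MultiSourceTap instance.
Parameters:
taps - of type Tap...

protected Child[] getTaps()
Method getTaps returns the taps of this MultiTap object.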

public java.util.Iterator<Child> getChildTaps()
getChildTaps in interface CompositeTap<Child extends Tap>

public long getNumChildTaps()
getNumChildTaps in interface CompositeTap<Child extends Tap>
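
public java.lang.String getIdentifier()
Description copied from class: Tap
Method getIdentifier returns a String representing the resource this Tap instance represents.
getIdentifier in class Tap<Config,Input,java.lang.Void>

public Scheme getScheme()
Description copied from class: Tap
Method getScheme returns the scheme of this Tap object.

public boolean isReplace()
Description copied from class: Tap
Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started.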

public void sourceConfInit(FlowProcess<? extends Config> process, Config conf)
Description copied from class: Tap
Method sourceConfInit initializes this instance as a source.
This method may be called more than once if this Tap instance is used outside the scope of a Flow instance, or if it participates multiple times in a given Flow or across different Flows in a Cascade.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Note that no resources or services should be modified by this method.
sourceConfInit in class Tap<Config,Input,java.lang.Void>
Parameters:
process - of type FlowProcess
conf - of type Config

public boolean resourceExists(Config conf) throws java.io.IOException
Description copied from class: Tap
Method resourceExists returns true if the path represented by this instance exists.
resourceExists in class Tap<Config,Input,java.lang.Void>
Parameters:
conf - of type Config
Throws:
java.io.IOException - when the status cannot be determined

public long getModifiedTime(Config conf) throws java.io.IOException
Returns the most current modified time.
getModifiedTime in class Tap<Config,Input,java.lang.Void>
Parameters:
conf - of type Config
Throws:
java.io.IOException
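
For a composite of several children, "most current" in getModifiedTime can be read as the largest modification time among them. A minimal sketch of that aggregation in plain Java follows. It takes raw timestamps instead of querying child Tap instances, and `ModifiedTimeSketch` and `latest` are hypothetical names, so treat it as an illustration of the idea rather than the class's actual code.

```java
final class ModifiedTimeSketch {
    // Returns the largest of the given child modification times (milliseconds),
    // or 0 when there are no children.
    static long latest(long... childModifiedTimes) {
        long latest = 0L;
        for (long modified : childModifiedTimes) {
            latest = Math.max(latest, modified);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Three hypothetical children modified at different times.
        System.out.println(latest(1000L, 3000L, 2000L)); // prints 3000
    }
}
```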

public TupleEntryIterator openForRead(FlowProcess<? extends Config> flowProcess, Input input) throws java.io.IOException
Description copied from class: Tap
Method openForRead opens the resource represented by this Tap instance for reading.
The input value may be null; if so, sub-classes must inquire with the underlying Scheme via Scheme.sourceConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper input type and instantiate it before calling super.openForRead().
Note the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.
openForRead in class Tap<Config,Input,java.lang.Void>
Parameters:
flowProcess - of type FlowProcess
input - of type Input
Throws:
java.io.IOException - when the resource cannot be opened

protected Child findMatchingTap(FlowProcess<? extends Config> flowProcess)
protected java.lang.String getTapsCommonPrefix(FlowProcess<? extends Config> flowProcess)
protected Trie<Child> getTapPrefixMap(FlowProcess<? extends Config> flowProcess)
public boolean equals(java.lang.Object object)
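
The openForRead note about the iterator returning the same TupleEntry instance on every call describes a common pitfall when collecting results. The sketch below reproduces it with plain Java stand-ins: `ReusedEntry`, `ReusingIterator`, and `CopyOnCollectSketch` are hypothetical classes, not Cascading types, and only mimic the reuse behavior described there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical mutable holder standing in for TupleEntry.
final class ReusedEntry {
    String value;

    ReusedEntry(String value) {
        this.value = value;
    }

    ReusedEntry copy() {
        return new ReusedEntry(value);
    }
}

// Hypothetical iterator that hands back one shared instance on every call.
final class ReusingIterator implements Iterator<ReusedEntry> {
    private final Iterator<String> values;
    private final ReusedEntry shared = new ReusedEntry(null);

    ReusingIterator(List<String> values) {
        this.values = values.iterator();
    }

    @Override
    public boolean hasNext() {
        return values.hasNext();
    }

    @Override
    public ReusedEntry next() {
        shared.value = values.next(); // overwrites the previous contents
        return shared;
    }
}

final class CopyOnCollectSketch {
    // Stores every entry in a list, copying each one first only when copy is true.
    static List<String> collect(boolean copy) {
        Iterator<ReusedEntry> iterator = new ReusingIterator(Arrays.asList("a", "b", "c"));
        List<ReusedEntry> stored = new ArrayList<>();
        while (iterator.hasNext()) {
            ReusedEntry entry = iterator.next();
            stored.add(copy ? entry.copy() : entry);
        }
        List<String> result = new ArrayList<>();
        for (ReusedEntry entry : stored) {
            result.add(entry.value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collect(false)); // prints [c, c, c]
        System.out.println(collect(true));  // prints [a, b, c]
    }
}
```

Without the copy, every stored reference points at the one shared instance, so the collection ends up holding only the last value read.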
Copyright © 2007-2015 Xplenty, Inc. All Rights Reserved.