|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object cascading.tap.Tap
public abstract class Tap
A Tap represents the physical data source or sink in a connected Flow
.
Pipe
and Tuple
stream, and
a sink Tap is the tail end. Kinds of Tap types are used to manage files from a local disk,
distributed disk, remote storage like Amazon S3, or via FTP. It simply abstracts
out the complexity of connecting to these types of data sources.
A Tap takes a Scheme
instance, which is used to identify the type of resource (text file, binary file, etc).
A Tap is responsible for how the resource is reached.
A Tap is not given an explicit name by design. This is so a given Tap instance can be
re-used in different Flow
s that may expect a source or sink by a different
logical name, but are the same physical resource. If a tap had a name other than its path, which would be
used for the tap identity? If the name, then two Tap instances with different names but the same path could
interfere with one another.
Constructor Summary | |
---|---|
protected |
Tap()
|
protected |
Tap(Scheme scheme)
|
protected |
Tap(Scheme scheme,
SinkMode sinkMode)
|
Method Summary | |
---|---|
abstract boolean |
deletePath(JobConf conf)
Method deletePath deletes the resource represented by this instance. |
boolean |
equals(Object object)
|
void |
flowInit(Flow flow)
Method flowInit allows this Tap instance to initalize itself in context of the given Flow instance. |
String |
getIdentifier()
Method getIdentifier returns a String representing the resource identifier this Tap instance represents. |
abstract Path |
getPath()
Method getPath returns the Hadoop path to the resource represented by this Tap instance. |
abstract long |
getPathModified(JobConf conf)
Method getPathModified returns the date this resource was last modified. |
Path |
getQualifiedPath(JobConf conf)
Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path. |
Scheme |
getScheme()
Method getScheme returns the scheme of this Tap object. |
Fields |
getSinkFields()
Method getSinkFields returns the sinkFields of this Tap object. |
SinkMode |
getSinkMode()
Method getSinkMode returns the SinkMode }of this Tap object. |
Fields |
getSourceFields()
Method getSourceFields returns the sourceFields of this Tap object. |
int |
hashCode()
|
boolean |
isAppend()
Deprecated. |
boolean |
isEquivalentTo(FlowElement element)
|
boolean |
isKeep()
Method isKeep indicates whether the resource represented by this instance should be kept if it already exists when the Flow is started. |
boolean |
isReplace()
Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started. |
boolean |
isSink()
Method isSink returns true if this Tap instance can be used as a sink. |
boolean |
isSource()
Method isSource returns true if this Tap instance can be used as a source. |
boolean |
isUpdate()
Method isUpdate indicates whether the resrouce represented by this instance should be updated if it already exists. |
boolean |
isWriteDirect()
Method isWriteDirect returns true if this instances TupleEntryCollector should be used to sink values. |
abstract boolean |
makeDirs(JobConf conf)
Method makeDirs makes all the directories this Tap instance represents. |
abstract TupleEntryIterator |
openForRead(JobConf conf)
Method openForRead opens the resource represented by this Tap instance. |
abstract TupleEntryCollector |
openForWrite(JobConf conf)
Method openForWrite opens the resource represented by this Tap instance. |
Scope |
outgoingScopeFor(Set<Scope> incomingScopes)
Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement. |
abstract boolean |
pathExists(JobConf conf)
Method pathExists return true if the path represented by this instance exists. |
Fields |
resolveFields(Scope scope)
Method resolveFields returns the actual field names represented by the given Scope. |
Fields |
resolveIncomingOperationFields(Scope incomingScope)
Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names. |
protected void |
setScheme(Scheme scheme)
|
void |
setWriteDirect(boolean writeDirect)
Method setWriteDirect should be set to true if this instances TupleEntryCollector should be used to sink values. |
void |
sink(TupleEntry tupleEntry,
OutputCollector outputCollector)
Method sink emits the sink value(s) to the OutputCollector |
void |
sinkInit(JobConf conf)
Method sinkInit initializes this instance as a sink. |
Tuple |
source(Object key,
Object value)
Method source returns the source value as an instance of Tuple |
void |
sourceInit(JobConf conf)
Method sourceInit initializes this instance as a source. |
static Tap[] |
taps(Tap... taps)
Convenience function to make an array of Tap instances. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
protected Tap()
protected Tap(Scheme scheme)
protected Tap(Scheme scheme, SinkMode sinkMode)
Method Detail |
---|
public static Tap[] taps(Tap... taps)
taps
- of type Tap
protected void setScheme(Scheme scheme)
public Scheme getScheme()
public boolean isWriteDirect()
TupleEntryCollector
should be used to sink values.
public void setWriteDirect(boolean writeDirect)
TupleEntryCollector
should be used to sink values.
writeDirect
- the writeDirect of this Tap object.public void flowInit(Flow flow)
Flow
instance.
This method is guaranteed to be called before the Flow is started and the
FlowListener.onStarting(cascading.flow.Flow)
event is fired.
This method will be called once per Flow, and before sourceInit(org.apache.hadoop.mapred.JobConf)
and
sinkInit(org.apache.hadoop.mapred.JobConf)
methods.
flow
- of type Flowpublic void sourceInit(JobConf conf) throws IOException
Flow
instance or if it participates in multiple times in a given Flow or across different Flows in
a Cascade
.
In the context of a Flow, it will be called after
FlowListener.onStarting(cascading.flow.Flow)
conf
- of type JobConf
IOException
- on resource initialization failure.public void sinkInit(JobConf conf) throws IOException
Flow
instance or if it participates in multiple times in a given Flow or across different Flows in
a Cascade
.
Note this method will be called in context of this Tap being used as a traditional 'sink' and as a 'trap'.
In the context of a Flow, it will be called after
FlowListener.onStarting(cascading.flow.Flow)
conf
- of type JobConf
IOException
- on resource initialization failure.public abstract Path getPath()
public String getIdentifier()
getPath().toString()
.
public Fields getSourceFields()
public Fields getSinkFields()
public abstract TupleEntryIterator openForRead(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- when the resource cannot be openedpublic abstract TupleEntryCollector openForWrite(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- whenpublic Tuple source(Object key, Object value)
Tuple
key
- of type WritableComparablevalue
- of type Writable
public void sink(TupleEntry tupleEntry, OutputCollector outputCollector) throws IOException
tupleEntry
- of type TupleEntryoutputCollector
- of type OutputCollector
IOException
- when the resource cannot be written topublic Scope outgoingScopeFor(Set<Scope> incomingScopes)
FlowElement
outgoingScopeFor
in interface FlowElement
incomingScopes
- of type SetFlowElement.outgoingScopeFor(Set)
public Fields resolveIncomingOperationFields(Scope incomingScope)
FlowElement
resolveIncomingOperationFields
in interface FlowElement
incomingScope
- of type Scope
FlowElement.resolveIncomingOperationFields(Scope)
public Fields resolveFields(Scope scope)
FlowElement
resolveFields
in interface FlowElement
scope
- of type Scope
FlowElement.resolveFields(Scope)
public Path getQualifiedPath(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- whenpublic abstract boolean makeDirs(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- when there is an error making directoriespublic abstract boolean deletePath(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- when the resource cannot be deletedpublic abstract boolean pathExists(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- when the status cannot be determinedpublic abstract long getPathModified(JobConf conf) throws IOException
conf
- of type JobConf
IOException
- when the modified date cannot be determinedpublic SinkMode getSinkMode()
SinkMode
}of this Tap object.
public boolean isKeep()
public boolean isReplace()
@Deprecated public boolean isAppend()
public boolean isUpdate()
public boolean isSink()
public boolean isSource()
public boolean isEquivalentTo(FlowElement element)
isEquivalentTo
in interface FlowElement
public boolean equals(Object object)
equals
in class Object
public int hashCode()
hashCode
in class Object
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |