cascading.tap
Class Tap

java.lang.Object
  extended by cascading.tap.Tap
All Implemented Interfaces:
FlowElement, Serializable
Direct Known Subclasses:
Hfs, SinkTap, SourceTap

public abstract class Tap
extends Object
implements FlowElement, Serializable

A Tap represents the physical data source or sink in a connected Flow.

That is, a source Tap is the head end of a connected Pipe and Tuple stream, and a sink Tap is the tail end. Different Tap types manage files on a local disk, a distributed file system, remote storage such as Amazon S3, or an FTP server. A Tap abstracts away the complexity of connecting to these kinds of data sources.

A Tap takes a Scheme instance, which identifies the type of resource (text file, binary file, etc.). The Tap itself is responsible for how the resource is reached.

A Tap is not given an explicit name by design. This is so a given Tap instance can be re-used in different Flows that may expect a source or sink by a different logical name, but are the same physical resource. If a tap had a name other than its path, which would be used for the tap identity? If the name, then two Tap instances with different names but the same path could interfere with one another.
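
The identity argument above can be illustrated with a small sketch. PathIdentifiedTap is a hypothetical stand-in, not the Cascading Tap class, and it assumes identity is derived from the path alone; the real equals(Object) may also take other state, such as the Scheme, into account.

```java
// Hypothetical stand-in for illustration only; not part of cascading.tap.
// Assumption: two instances denote the same resource when their paths match.
final class PathIdentifiedTap {
    private final String path;

    PathIdentifiedTap(String path) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
        this.path = path;
    }

    // The identifier is the path itself; there is no separate name.
    String getIdentifier() {
        return path;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PathIdentifiedTap)) {
            return false;
        }
        return path.equals(((PathIdentifiedTap) other).path);
    }

    @Override
    public int hashCode() {
        return path.hashCode();
    }
}
```

With this model, two instances created for the same path compare equal regardless of the logical name a Flow binds them to, so there is no second identity that could conflict with the path.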

See Also:
Serialized Form

Constructor Summary
protected Tap()
           
protected Tap(Scheme scheme)
           
protected Tap(Scheme scheme, SinkMode sinkMode)
           
 
Method Summary
abstract  boolean deletePath(JobConf conf)
          Method deletePath deletes the resource represented by this instance.
 boolean equals(Object object)
           
 void flowInit(Flow flow)
          Method flowInit allows this Tap instance to initialize itself in the context of the given Flow instance.
 String getIdentifier()
          Method getIdentifier returns a String representing the resource identifier this Tap instance represents.
abstract  Path getPath()
          Method getPath returns the Hadoop path to the resource represented by this Tap instance.
abstract  long getPathModified(JobConf conf)
          Method getPathModified returns the date this resource was last modified.
 Path getQualifiedPath(JobConf conf)
          Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path.
 Scheme getScheme()
          Method getScheme returns the scheme of this Tap object.
 Fields getSinkFields()
          Method getSinkFields returns the sinkFields of this Tap object.
 SinkMode getSinkMode()
          Method getSinkMode returns the SinkMode of this Tap object.
 Fields getSourceFields()
          Method getSourceFields returns the sourceFields of this Tap object.
 int hashCode()
           
 boolean isAppend()
          Deprecated. 
 boolean isEquivalentTo(FlowElement element)
           
 boolean isKeep()
          Method isKeep indicates whether the resource represented by this instance should be kept if it already exists when the Flow is started.
 boolean isReplace()
          Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started.
 boolean isSink()
          Method isSink returns true if this Tap instance can be used as a sink.
 boolean isSource()
          Method isSource returns true if this Tap instance can be used as a source.
 boolean isUpdate()
          Method isUpdate indicates whether the resource represented by this instance should be updated if it already exists.
 boolean isWriteDirect()
          Method isWriteDirect returns true if this instance's TupleEntryCollector should be used to sink values.
abstract  boolean makeDirs(JobConf conf)
          Method makeDirs makes all the directories this Tap instance represents.
abstract  TupleEntryIterator openForRead(JobConf conf)
          Method openForRead opens the resource represented by this Tap instance.
abstract  TupleEntryCollector openForWrite(JobConf conf)
          Method openForWrite opens the resource represented by this Tap instance.
 Scope outgoingScopeFor(Set<Scope> incomingScopes)
          Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement.
abstract  boolean pathExists(JobConf conf)
          Method pathExists returns true if the path represented by this instance exists.
 Fields resolveFields(Scope scope)
          Method resolveFields returns the actual field names represented by the given Scope.
 Fields resolveIncomingOperationFields(Scope incomingScope)
          Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names.
protected  void setScheme(Scheme scheme)
           
 void setWriteDirect(boolean writeDirect)
          Method setWriteDirect should be set to true if this instance's TupleEntryCollector should be used to sink values.
 void sink(TupleEntry tupleEntry, OutputCollector outputCollector)
          Method sink emits the sink value(s) to the OutputCollector.
 void sinkInit(JobConf conf)
          Method sinkInit initializes this instance as a sink.
 Tuple source(Object key, Object value)
          Method source returns the source value as an instance of Tuple.
 void sourceInit(JobConf conf)
          Method sourceInit initializes this instance as a source.
static Tap[] taps(Tap... taps)
          Convenience function to make an array of Tap instances.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tap

protected Tap()

Tap

protected Tap(Scheme scheme)

Tap

protected Tap(Scheme scheme,
              SinkMode sinkMode)
Method Detail

taps

public static Tap[] taps(Tap... taps)
Convenience function to make an array of Tap instances.

Parameters:
taps - of type Tap
Returns:
Tap array
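
A varargs convenience method of this shape can be sketched as follows. DemoTap is a hypothetical stand-in class, and returning the varargs array directly is an assumption about how such a helper might be written, not a statement about the Cascading implementation.

```java
// Hypothetical stand-in for illustration only; not part of cascading.tap.
final class DemoTap {
    final String path;

    DemoTap(String path) {
        this.path = path;
    }

    // The compiler packs the arguments into an array, so the helper
    // spares callers from writing "new DemoTap[] { ... }" themselves.
    static DemoTap[] taps(DemoTap... taps) {
        return taps;
    }
}
```

A caller could then write DemoTap.taps(first, second) wherever an array of taps is expected.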

setScheme

protected void setScheme(Scheme scheme)

getScheme

public Scheme getScheme()
Method getScheme returns the scheme of this Tap object.

Returns:
the scheme (type Scheme) of this Tap object.

isWriteDirect

public boolean isWriteDirect()
Method isWriteDirect returns true if this instance's TupleEntryCollector should be used to sink values.

Returns:
the writeDirect (type boolean) of this Tap object.

setWriteDirect

public void setWriteDirect(boolean writeDirect)
Method setWriteDirect should be set to true if this instance's TupleEntryCollector should be used to sink values.

Parameters:
writeDirect - the writeDirect of this Tap object.

flowInit

public void flowInit(Flow flow)
Method flowInit allows this Tap instance to initialize itself in the context of the given Flow instance. This method is guaranteed to be called before the Flow is started and the FlowListener.onStarting(cascading.flow.Flow) event is fired.

This method will be called once per Flow, and before the sourceInit(org.apache.hadoop.mapred.JobConf) and sinkInit(org.apache.hadoop.mapred.JobConf) methods.

Parameters:
flow - of type Flow
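
The ordering described for flowInit, FlowListener.onStarting, sourceInit, and sinkInit can be pictured with a small sketch. All classes here are hypothetical stand-ins that only record the order of calls; they are not Cascading classes, and the relative order of sourceInit and sinkInit shown is arbitrary because this page does not specify it.

```java
// Hypothetical stand-ins for illustration only; not part of Cascading.
final class LifecycleLog {
    private final StringBuilder events = new StringBuilder();

    void record(String event) {
        if (events.length() > 0) {
            events.append(',');
        }
        events.append(event);
    }

    @Override
    public String toString() {
        return events.toString();
    }
}

final class RecordingTap {
    private final LifecycleLog log;

    RecordingTap(LifecycleLog log) {
        this.log = log;
    }

    void flowInit()   { log.record("flowInit"); }
    void sourceInit() { log.record("sourceInit"); }
    void sinkInit()   { log.record("sinkInit"); }
}

final class MiniFlow {
    // Mimics the documented order: flowInit on every tap, then the
    // onStarting event, then source and sink initialization.
    static String start(LifecycleLog log, RecordingTap source, RecordingTap sink) {
        source.flowInit();
        sink.flowInit();
        log.record("onStarting");
        source.sourceInit();
        sink.sinkInit();
        return log.toString();
    }
}
```

The practical consequence is that state a Tap needs during sourceInit or sinkInit can safely be prepared in flowInit.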

sourceInit

public void sourceInit(JobConf conf)
                throws IOException
Method sourceInit initializes this instance as a source.

This method may be called more than once if this Tap instance is used outside the scope of a Flow instance, or if it participates multiple times in a given Flow or across different Flows in a Cascade.

In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).

Parameters:
conf - of type JobConf
Throws:
IOException - on resource initialization failure.

sinkInit

public void sinkInit(JobConf conf)
              throws IOException
Method sinkInit initializes this instance as a sink.

This method may be called more than once if this Tap instance is used outside the scope of a Flow instance, or if it participates multiple times in a given Flow or across different Flows in a Cascade.

Note that this method is called both when this Tap is used as a traditional 'sink' and when it is used as a 'trap'.

In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).

Parameters:
conf - of type JobConf
Throws:
IOException - on resource initialization failure.

getPath

public abstract Path getPath()
Method getPath returns the Hadoop path to the resource represented by this Tap instance.

Returns:
Path

getIdentifier

public String getIdentifier()
Method getIdentifier returns a String representing the resource identifier this Tap instance represents.

By default, simply calls getPath().toString().

Returns:
the resource (type String) of this Tap object.
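
The default behavior can be sketched as a base class that derives the identifier from the path and lets subclasses override it. Both classes below are hypothetical stand-ins, and a plain String takes the place of the Hadoop Path type; whether real Tap subclasses override getIdentifier is not stated on this page.

```java
// Hypothetical stand-ins for illustration only; not part of cascading.tap.
abstract class IdentifiedResource {
    // Stand-in for the abstract getPath(); a String replaces Hadoop's Path.
    abstract String getPath();

    // Default: the identifier is simply the string form of the path.
    String getIdentifier() {
        return getPath().toString();
    }
}

final class FileResource extends IdentifiedResource {
    private final String path;

    FileResource(String path) {
        this.path = path;
    }

    @Override
    String getPath() {
        return path;
    }
}

final class PrefixedResource extends IdentifiedResource {
    @Override
    String getPath() {
        return "/tables/users";
    }

    // A subclass whose resource is not well described by the bare path
    // can supply its own identifier instead of the default.
    @Override
    String getIdentifier() {
        return "demo:" + getPath();
    }
}
```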

getSourceFields

public Fields getSourceFields()
Method getSourceFields returns the sourceFields of this Tap object.

Returns:
the sourceFields (type Fields) of this Tap object.

getSinkFields

public Fields getSinkFields()
Method getSinkFields returns the sinkFields of this Tap object.

Returns:
the sinkFields (type Fields) of this Tap object.

openForRead

public abstract TupleEntryIterator openForRead(JobConf conf)
                                        throws IOException
Method openForRead opens the resource represented by this Tap instance.

Parameters:
conf - of type JobConf
Returns:
TupleEntryIterator
Throws:
IOException - when the resource cannot be opened

openForWrite

public abstract TupleEntryCollector openForWrite(JobConf conf)
                                          throws IOException
Method openForWrite opens the resource represented by this Tap instance.

Parameters:
conf - of type JobConf
Returns:
TupleEntryCollector
Throws:
IOException - when

source

public Tuple source(Object key,
                    Object value)
Method source returns the source value as an instance of Tuple.

Parameters:
key - of type WritableComparable
value - of type Writable
Returns:
Tuple

sink

public void sink(TupleEntry tupleEntry,
                 OutputCollector outputCollector)
          throws IOException
Method sink emits the sink value(s) to the OutputCollector.

Parameters:
tupleEntry - of type TupleEntry
outputCollector - of type OutputCollector
Throws:
IOException - when the resource cannot be written to

outgoingScopeFor

public Scope outgoingScopeFor(Set<Scope> incomingScopes)
Description copied from interface: FlowElement
Method outgoingScopeFor returns the Scope this FlowElement hands off to the next FlowElement.

Specified by:
outgoingScopeFor in interface FlowElement
Parameters:
incomingScopes - of type Set
Returns:
Scope
See Also:
FlowElement.outgoingScopeFor(Set)

resolveIncomingOperationFields

public Fields resolveIncomingOperationFields(Scope incomingScope)
Description copied from interface: FlowElement
Method resolveIncomingOperationFields resolves the incoming scopes to the actual incoming operation field names.

Specified by:
resolveIncomingOperationFields in interface FlowElement
Parameters:
incomingScope - of type Scope
Returns:
Fields
See Also:
FlowElement.resolveIncomingOperationFields(Scope)

resolveFields

public Fields resolveFields(Scope scope)
Description copied from interface: FlowElement
Method resolveFields returns the actual field names represented by the given Scope. The scope may be incoming or outgoing in relation to this FlowElement instance.

Specified by:
resolveFields in interface FlowElement
Parameters:
scope - of type Scope
Returns:
Fields
See Also:
FlowElement.resolveFields(Scope)

getQualifiedPath

public Path getQualifiedPath(JobConf conf)
                      throws IOException
Method getQualifiedPath returns a FileSystem fully qualified Hadoop Path.

Parameters:
conf - of type JobConf
Returns:
Path
Throws:
IOException - when

makeDirs

public abstract boolean makeDirs(JobConf conf)
                          throws IOException
Method makeDirs makes all the directories this Tap instance represents.

Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when there is an error making directories

deletePath

public abstract boolean deletePath(JobConf conf)
                            throws IOException
Method deletePath deletes the resource represented by this instance.

Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when the resource cannot be deleted

pathExists

public abstract boolean pathExists(JobConf conf)
                            throws IOException
Method pathExists returns true if the path represented by this instance exists.

Parameters:
conf - of type JobConf
Returns:
boolean
Throws:
IOException - when the status cannot be determined

getPathModified

public abstract long getPathModified(JobConf conf)
                              throws IOException
Method getPathModified returns the date this resource was last modified.

Parameters:
conf - of type JobConf
Returns:
long
Throws:
IOException - when the modified date cannot be determined

getSinkMode

public SinkMode getSinkMode()
Method getSinkMode returns the SinkMode of this Tap object.

Returns:
the sinkMode (type SinkMode) of this Tap object.

isKeep

public boolean isKeep()
Method isKeep indicates whether the resource represented by this instance should be kept if it already exists when the Flow is started.

Returns:
boolean

isReplace

public boolean isReplace()
Method isReplace indicates whether the resource represented by this instance should be deleted if it already exists when the Flow is started.

Returns:
boolean

isAppend

@Deprecated
public boolean isAppend()
Deprecated. 

Method isAppend indicates whether the resource represented by this instance should be appended to if it already exists. Otherwise a new resource will be created when the Flow is started.

Returns:
boolean

isUpdate

public boolean isUpdate()
Method isUpdate indicates whether the resource represented by this instance should be updated if it already exists. Otherwise a new resource will be created when the Flow is started.

Returns:
boolean
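
How isKeep, isReplace, and isUpdate relate to a single sink mode can be sketched as below. DemoSinkMode and SinkModeTap are hypothetical stand-ins, the constant names are assumptions chosen to mirror the method names, and planFor only restates the descriptions above as labels; what a Flow actually does with a kept resource is not covered on this page.

```java
// Hypothetical stand-ins for illustration only; not part of cascading.tap.
enum DemoSinkMode { KEEP, REPLACE, UPDATE }

final class SinkModeTap {
    private final DemoSinkMode sinkMode;

    SinkModeTap(DemoSinkMode sinkMode) {
        if (sinkMode == null) {
            throw new IllegalArgumentException("sinkMode must not be null");
        }
        this.sinkMode = sinkMode;
    }

    // Each predicate is a view onto the one configured mode,
    // so exactly one of them is true for a given instance.
    boolean isKeep()    { return sinkMode == DemoSinkMode.KEEP; }
    boolean isReplace() { return sinkMode == DemoSinkMode.REPLACE; }
    boolean isUpdate()  { return sinkMode == DemoSinkMode.UPDATE; }

    // Labels the action implied by the mode when the Flow starts.
    static String planFor(SinkModeTap tap, boolean resourceExists) {
        if (!resourceExists) {
            return "create";
        }
        if (tap.isReplace()) {
            return "delete-then-create";
        }
        if (tap.isUpdate()) {
            return "update-existing";
        }
        return "keep-existing";
    }
}
```

Modeling the mode as one enum value rather than three independent flags rules out contradictory combinations such as keep and replace both being set.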

isSink

public boolean isSink()
Method isSink returns true if this Tap instance can be used as a sink.

Returns:
boolean

isSource

public boolean isSource()
Method isSource returns true if this Tap instance can be used as a source.

Returns:
boolean

isEquivalentTo

public boolean isEquivalentTo(FlowElement element)
Specified by:
isEquivalentTo in interface FlowElement

equals

public boolean equals(Object object)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2007-2010 Concurrent, Inc. All Rights Reserved.