cascading.tap.local
Class FileTap

java.lang.Object
  extended by cascading.tap.Tap<Properties,InputStream,OutputStream>
      extended by cascading.tap.local.FileTap
All Implemented Interfaces:
FlowElement, FileType<Properties>, Serializable

public class FileTap
extends Tap<Properties,InputStream,OutputStream>
implements FileType<Properties>

Class FileTap is a Tap sub-class that allows for direct local file access.

FileTap must be used with the LocalFlowConnector to create Flow instances that run in "local" mode.

See Also:
Serialized Form

Constructor Summary
FileTap(Scheme<Properties,InputStream,OutputStream,?,?> scheme, String path)
          Constructor FileTap creates a new FileTap instance using the given Scheme and file path.
FileTap(Scheme<Properties,InputStream,OutputStream,?,?> scheme, String path, SinkMode sinkMode)
          Constructor FileTap creates a new FileTap instance using the given Scheme, file path, and SinkMode.
 
Method Summary
 boolean commitResource(Properties conf)
          Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed.
 boolean createResource(Properties conf)
          Method createResource creates the underlying resource.
 boolean deleteResource(Properties conf)
          Method deleteResource deletes the resource represented by this instance.
 String[] getChildIdentifiers(Properties conf)
          Method getChildIdentifiers returns an array of child identifiers if this resource is a directory.
 String getFullIdentifier(Properties conf)
          Method getFullIdentifier returns a fully qualified resource identifier.
 String getIdentifier()
          Method getIdentifier returns a String representing the resource this Tap instance represents.
 long getModifiedTime(Properties conf)
          Method getModifiedTime returns the time this resource was last modified.
 long getSize(Properties conf)
          Method getSize returns the size of the file referenced by this tap.
 boolean isDirectory(Properties conf)
          Method isDirectory returns true if the underlying resource represents a directory or folder instead of an individual file.
 TupleEntryIterator openForRead(FlowProcess<Properties> flowProcess, InputStream input)
          Method openForRead opens the resource represented by this Tap instance for reading.
 TupleEntryCollector openForWrite(FlowProcess<Properties> flowProcess, OutputStream output)
          Method openForWrite opens the resource represented by this Tap instance for writing.
 boolean resourceExists(Properties conf)
          Method resourceExists returns true if the path represented by this instance exists.
 
Methods inherited from class cascading.tap.Tap
createResource, deleteResource, equals, flowConfInit, getConfigDef, getFullIdentifier, getModifiedTime, getScheme, getSinkFields, getSinkMode, getSourceFields, getStepConfigDef, getTrace, hasConfigDef, hashCode, hasStepConfigDef, id, isEquivalentTo, isKeep, isReplace, isSink, isSource, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, resourceExists, retrieveSinkFields, retrieveSourceFields, rollbackResource, setScheme, sinkConfInit, sourceConfInit, taps, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

FileTap

public FileTap(Scheme<Properties,InputStream,OutputStream,?,?> scheme,
               String path)
Constructor FileTap creates a new FileTap instance using the given Scheme and file path.

Parameters:
scheme - of type Scheme
path - of type String

FileTap

public FileTap(Scheme<Properties,InputStream,OutputStream,?,?> scheme,
               String path,
               SinkMode sinkMode)
Constructor FileTap creates a new FileTap instance using the given Scheme, file path, and SinkMode.

Parameters:
scheme - of type Scheme
path - of type String
sinkMode - of type SinkMode

Method Detail

getIdentifier

public String getIdentifier()
Description copied from class: Tap
Method getIdentifier returns a String representing the resource this Tap instance represents.

Often, if the tap accesses a filesystem, the identifier is nothing more than the path to the file or directory. In other cases it may be a URL or URI representing a connection string or remote resource.

Any two Tap instances having the same value for the identifier are considered equal.

Specified by:
getIdentifier in class Tap<Properties,InputStream,OutputStream>
Returns:
String

getFullIdentifier

public String getFullIdentifier(Properties conf)
Description copied from class: Tap
Method getFullIdentifier returns a fully qualified resource identifier.

Overrides:
getFullIdentifier in class Tap<Properties,InputStream,OutputStream>
Parameters:
conf - of type Properties
Returns:
String

openForRead

public TupleEntryIterator openForRead(FlowProcess<Properties> flowProcess,
                                      InputStream input)
                               throws IOException
Description copied from class: Tap
Method openForRead opens the resource represented by this Tap instance for reading.

The input value may be null. If so, sub-classes must consult the underlying Scheme via Scheme.sourceConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper input type and instantiate it before calling super.openForRead().

Note that the returned iterator returns the same TupleEntry instance on every call, so a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.

Specified by:
openForRead in class Tap<Properties,InputStream,OutputStream>
Parameters:
flowProcess - of type FlowProcess
input - of type InputStream
Returns:
TupleEntryIterator
Throws:
IOException - when the resource cannot be opened
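
The note above about the iterator reusing one TupleEntry instance can be illustrated without Cascading on the classpath. The following is a minimal sketch in plain Java: ReusingIterator and CopyOnCollect are hypothetical stand-ins, and a String[] plays the role of the shared entry. With the real API the copy would presumably be taken with the TupleEntry copy constructor or TupleEntry.getTupleCopy(); treat that as an assumption to check against the Cascading version in use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an iterator that hands back the same mutable
// object on every call to next(), as described for openForRead above.
final class ReusingIterator implements Iterator<String[]> {
    private final List<String[]> rows;
    private final String[] shared; // the single instance returned every time
    private int index = 0;

    ReusingIterator(List<String[]> rows, int width) {
        this.rows = rows;
        this.shared = new String[width];
    }

    @Override
    public boolean hasNext() {
        return index < rows.size();
    }

    @Override
    public String[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Overwrite the shared buffer in place instead of allocating a new one.
        System.arraycopy(rows.get(index++), 0, shared, 0, shared.length);
        return shared;
    }
}

final class CopyOnCollect {
    // Pitfall: stores the same reference repeatedly, so every stored element
    // ends up showing the last row that was read.
    static List<String[]> collectWithoutCopy(Iterator<String[]> it) {
        List<String[]> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Safe: takes a copy of each value before storing it.
    static List<String[]> collectWithCopy(Iterator<String[]> it) {
        List<String[]> out = new ArrayList<>();
        while (it.hasNext()) {
            String[] current = it.next();
            out.add(current.clone());
        }
        return out;
    }
}
```

Collecting without a copy leaves the list holding several references to one object, which is the situation the note warns about.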

openForWrite

public TupleEntryCollector openForWrite(FlowProcess<Properties> flowProcess,
                                        OutputStream output)
                                 throws IOException
Description copied from class: Tap
Method openForWrite opens the resource represented by this Tap instance for writing.

This method is used internally and does not honor the SinkMode setting. If SinkMode is SinkMode.REPLACE, this call may fail. See Tap.openForWrite(cascading.flow.FlowProcess).

The output value may be null. If so, sub-classes must consult the underlying Scheme via Scheme.sinkConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper output type and instantiate it before calling super.openForWrite().

Specified by:
openForWrite in class Tap<Properties,InputStream,OutputStream>
Parameters:
flowProcess - of type FlowProcess
output - of type OutputStream
Returns:
TupleEntryCollector
Throws:
IOException - when the resource cannot be opened
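
Because openForWrite does not honor the SinkMode setting, a caller that opens a sink directly has to deal with an existing file itself. The sketch below shows one way that could look for a local file, using only the JDK. SinkModeSketch and its Mode enum are hypothetical stand-ins that borrow two SinkMode names; the KEEP behaviour shown (refuse to overwrite) is an assumption, and nothing here claims to mirror what Cascading does internally.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

final class SinkModeSketch {
    // Hypothetical stand-in for two SinkMode values; not the Cascading enum.
    enum Mode { KEEP, REPLACE }

    // Applies the mode first, then opens the file and writes the bytes.
    // IOExceptions are wrapped so the sketch can be called without
    // checked-exception handling.
    static void write(File file, Mode mode, byte[] data) {
        try {
            if (mode == Mode.REPLACE) {
                // REPLACE: remove any previous output before opening.
                Files.deleteIfExists(file.toPath());
            }
            // CREATE_NEW fails if the file still exists, which is the
            // KEEP behaviour assumed in this sketch.
            try (OutputStream out =
                     Files.newOutputStream(file.toPath(), StandardOpenOption.CREATE_NEW)) {
                out.write(data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Deleting before opening keeps the open call itself simple: by the time the stream is created, the only remaining question is whether the path is free.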

getSize

public long getSize(Properties conf)
             throws IOException
Description copied from interface: FileType
Method getSize returns the size of the file referenced by this tap.

Specified by:
getSize in interface FileType<Properties>
Parameters:
conf - of type Properties
Returns:
The size of the file referenced by this tap.
Throws:
IOException

createResource

public boolean createResource(Properties conf)
                       throws IOException
Description copied from class: Tap
Method createResource creates the underlying resource.

Specified by:
createResource in class Tap<Properties,InputStream,OutputStream>
Parameters:
conf - of type Properties
Returns:
boolean
Throws:
IOException - when there is an error making directories

deleteResource

public boolean deleteResource(Properties conf)
                       throws IOException
Description copied from class: Tap
Method deleteResource deletes the resource represented by this instance.

Specified by:
deleteResource in class Tap<Properties,InputStream,OutputStream>
Parameters:
conf - of type Properties
Returns:
boolean
Throws:
IOException - when the resource cannot be deleted
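
For a local file, createResource and deleteResource plausibly reduce to a few java.io.File calls. The following is a sketch under that assumption only: LocalResourceSketch is a hypothetical helper, and whether FileTap creates the parent directories or the path itself is not stated on this page.

```java
import java.io.File;

final class LocalResourceSketch {
    // Makes sure the parent directory of the target file exists.
    // Returns true if it already existed or could be created.
    static boolean createParentDirectories(File target) {
        File parent = target.getAbsoluteFile().getParentFile();
        return parent == null || parent.isDirectory() || parent.mkdirs();
    }

    // Deletes a file or an empty directory. Returns true if nothing is left
    // at that path afterwards (treating "already absent" as success is an
    // assumption of this sketch). File.delete() does not remove non-empty
    // directories, so those yield false here.
    static boolean delete(File target) {
        return !target.exists() || target.delete();
    }
}
```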

commitResource

public boolean commitResource(Properties conf)
                       throws IOException
Description copied from class: Tap
Method commitResource allows the underlying resource to be notified when all write processing is successful so that any additional cleanup or processing may be completed.

See Tap.rollbackResource(Object) to handle cleanup in the face of failures.

This method is invoked once "client side" and not in the cluster, if any.

If another sink Tap instance in a given Flow fails on commitResource after commitResource has been called on this instance, rollbackResource will not be called.

This is an experimental API and is subject to refinement.

Overrides:
commitResource in class Tap<Properties,InputStream,OutputStream>
Parameters:
conf - of type Properties
Returns:
true if successful
Throws:
IOException

resourceExists

public boolean resourceExists(Properties conf)
                       throws IOException
Description copied from class: Tap
Method resourceExists returns true if the path represented by this instance exists.

Specified by:
resourceExists in class Tap<Properties,InputStream,OutputStream>
Parameters:
conf - of type Properties
Returns:
true if the underlying resource already exists
Throws:
IOException - when the status cannot be determined

getModifiedTime

public long getModifiedTime(Properties conf)
                     throws IOException
Description copied from class: Tap
Method getModifiedTime returns the time this resource was last modified.

Specified by:
getModifiedTime in class Tap<Properties,InputStream,OutputStream>
Parameters:
conf - of type Properties
Returns:
The time this resource was last modified, as a long.
Throws:
IOException
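
The page does not say what unit the long returned by getModifiedTime uses. The sketch below assumes milliseconds since the epoch, which is what java.io.File.lastModified() reports for a local file; ModifiedTimeSketch and its methods are hypothetical helpers, not part of the API.

```java
import java.io.File;
import java.time.Instant;

final class ModifiedTimeSketch {
    // Reads the last-modified stamp the way java.io.File reports it:
    // milliseconds since the epoch, or 0L if the file does not exist.
    static long modifiedMillis(File file) {
        return file.lastModified();
    }

    // Converts the raw value to an Instant, assuming epoch milliseconds.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Compares two stamps, for example to see whether an input changed
    // after an output was written.
    static boolean isNewer(long candidate, long reference) {
        return candidate > reference;
    }
}
```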

isDirectory

public boolean isDirectory(Properties conf)
                    throws IOException
Description copied from interface: FileType
Method isDirectory returns true if the underlying resource represents a directory or folder instead of an individual file.

Specified by:
isDirectory in interface FileType<Properties>
Parameters:
conf - of type Properties
Returns:
boolean
Throws:
IOException

getChildIdentifiers

public String[] getChildIdentifiers(Properties conf)
                             throws IOException
Description copied from interface: FileType
Method getChildIdentifiers returns an array of child identifiers if this resource is a directory.

This method will skip Hadoop log directories (_log).

Specified by:
getChildIdentifiers in interface FileType<Properties>
Parameters:
conf - of type Properties
Returns:
String[]
Throws:
IOException
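
The directory listing described for getChildIdentifiers can be approximated with java.io.File. This is a sketch, not FileTap's implementation: ChildIdentifiersSketch is a hypothetical helper, the "_log" name is taken from the description above, and returning an empty array for anything that is not a readable directory is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ChildIdentifiersSketch {
    // Directory name to skip, as given in the method description.
    static final String LOG_DIR = "_log";

    static String[] childIdentifiers(File resource) {
        // listFiles() returns null when the path is not a directory
        // or cannot be read.
        File[] children = resource.listFiles();
        if (children == null) {
            return new String[0];
        }
        List<String> ids = new ArrayList<>();
        for (File child : children) {
            if (child.isDirectory() && child.getName().equals(LOG_DIR)) {
                continue; // skip the log directory
            }
            ids.add(child.getPath());
        }
        // listFiles() gives no ordering guarantee, so sort for stable output.
        Collections.sort(ids);
        return ids.toArray(new String[0]);
    }
}
```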


Copyright © 2007-2013 Concurrent, Inc. All Rights Reserved.