cascading.tap.hadoop
Class TemplateTap

java.lang.Object
  extended by cascading.tap.Tap<Config,Void,Output>
      extended by cascading.tap.SinkTap<Config,Output>
          extended by cascading.tap.BaseTemplateTap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.OutputCollector>
              extended by cascading.tap.hadoop.TemplateTap
All Implemented Interfaces:
FlowElement, Serializable

public class TemplateTap
extends BaseTemplateTap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.OutputCollector>

Class TemplateTap can be used to write tuple streams out to sub-directories based on the values in the Tuple instance.

The constructor takes an Hfs Tap and a String in Formatter format syntax. This allows Tuple values at given positions to be used as directory names. Note that Hadoop can only sink to directories, and all files in those directories are "part-xxxxx" files.
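As a rough illustration of how a Formatter format String can turn Tuple values into a directory name, the sketch below uses only java.util.Formatter. The class PathTemplateSketch and its resolve helper are hypothetical names chosen for this example and are not part of Cascading; the way the parent path and the populated template are joined is likewise an assumption.

```java
import java.util.Formatter;
import java.util.Locale;

// Hypothetical sketch, not Cascading code: shows how Formatter syntax can
// populate a path template from positional values.
final class PathTemplateSketch {

    // Populates pathTemplate with the given values and appends the result
    // to parentPath. Joining with "/" is an assumption made for this example.
    static String resolve(String parentPath, String pathTemplate, Object... values) {
        StringBuilder populated = new StringBuilder();
        try (Formatter formatter = new Formatter(populated, Locale.ROOT)) {
            formatter.format(pathTemplate, values);
        }
        return parentPath + "/" + populated;
    }

    public static void main(String[] args) {
        // Values at positions 0 and 1 become nested directory names.
        System.out.println(resolve("output/logs", "%s/%s", "2012", "June"));
    }
}
```

With the template "%s/%s" and the values "2012" and "June", the helper yields output/logs/2012/June, which would then hold the "part-xxxxx" files.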

openTapsThreshold limits the number of files held open for output at one time. It defaults to 300. Each time the threshold is exceeded, 10% of the least recently used open files will be closed.
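The eviction policy described above can be approximated with an access-ordered LinkedHashMap. This is a minimal sketch of the idea only, not the implementation used by TemplateTap: the class OpenTapsSketch is hypothetical, StringBuilder stands in for an open output file, and taking the 10% from the threshold (rather than from the current open count) is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of threshold-based closing of least recently used outputs.
final class OpenTapsSketch {
    private final int threshold;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, StringBuilder> open =
        new LinkedHashMap<>(16, 0.75f, true);
    // Paths that were closed, in the order they were closed.
    final List<String> closed = new ArrayList<>();

    OpenTapsSketch(int threshold) {
        this.threshold = threshold;
    }

    void write(String path, String line) {
        StringBuilder out = open.get(path); // a hit marks the path as recently used
        if (out == null) {
            out = new StringBuilder();
            open.put(path, out);
            if (open.size() > threshold) {
                closeLeastRecentlyUsed();
            }
        }
        out.append(line).append('\n');
    }

    int openCount() {
        return open.size();
    }

    // Closes 10% of the threshold (at least one), oldest first. Assumption:
    // the percentage is taken from the threshold, not the current open count.
    private void closeLeastRecentlyUsed() {
        int toClose = Math.max(1, threshold / 10);
        Iterator<Map.Entry<String, StringBuilder>> it = open.entrySet().iterator();
        while (toClose-- > 0 && it.hasNext()) {
            closed.add(it.next().getKey());
            it.remove();
        }
    }

    public static void main(String[] args) {
        OpenTapsSketch taps = new OpenTapsSketch(10);
        for (int i = 0; i < 10; i++) {
            taps.write("p" + i, "line");
        }
        taps.write("p0", "again");  // p0 becomes the most recently used
        taps.write("p10", "line");  // exceeds the threshold, triggers closing
        System.out.println(taps.closed + " open=" + taps.openCount());
    }
}
```

A path that is closed and later written to again would simply be reopened as a new entry, which is why a workload touching many distinct paths in random order can cause repeated open and close cycles.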

TemplateTap populates the given pathTemplate without normalizing the case of the values being used. Thus the resulting paths 2012/June/ and 2012/june/ are treated as distinct and will likely result in two open files into the same location. Forcing the case to be consistent with an upstream Function is recommended; see ExpressionFunction.
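The effect of inconsistent case can be seen with plain String formatting. In the sketch below, the class PathCaseSketch and its methods are hypothetical, and the lower-casing step merely stands in for whatever upstream Function is used to normalize the values before they reach the tap.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: shows how values differing only in case produce
// distinct populated paths unless they are normalized first.
final class PathCaseSketch {

    static String populate(String pathTemplate, Object... values) {
        return String.format(Locale.ROOT, pathTemplate, values);
    }

    // Returns the distinct paths produced for one year and several month values.
    // forceLowerCase stands in for an upstream normalizing Function.
    static Set<String> distinctPaths(String pathTemplate, String year,
                                     List<String> months, boolean forceLowerCase) {
        Set<String> paths = new LinkedHashSet<>();
        for (String month : months) {
            String value = forceLowerCase ? month.toLowerCase(Locale.ROOT) : month;
            paths.add(populate(pathTemplate, year, value));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> months = List.of("June", "june");
        System.out.println(distinctPaths("%s/%s/", "2012", months, false)); // two paths
        System.out.println(distinctPaths("%s/%s/", "2012", months, true));  // one path
    }
}
```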

Though Hadoop has no mechanism to prevent simultaneous writes to a directory from multiple jobs, that does not mean it is safe to do so. The same is true of TemplateTap. Interleaving writes to a common parent (root) directory across multiple flows will very likely lead to data loss.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class cascading.tap.BaseTemplateTap
BaseTemplateTap.Counters, BaseTemplateTap.TemplateScheme<Config,Output>
 
Field Summary
 
Fields inherited from class cascading.tap.BaseTemplateTap
keepParentOnDelete, OPEN_TAPS_THRESHOLD_DEFAULT, openTapsThreshold, parent, pathTemplate
 
Constructor Summary
TemplateTap(Hfs parent, String pathTemplate)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, Fields pathFields)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, Fields pathFields, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, Fields pathFields, SinkMode sinkMode)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, Fields pathFields, SinkMode sinkMode, boolean keepParentOnDelete)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, Fields pathFields, SinkMode sinkMode, boolean keepParentOnDelete, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, SinkMode sinkMode)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, SinkMode sinkMode, boolean keepParentOnDelete)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
TemplateTap(Hfs parent, String pathTemplate, SinkMode sinkMode, boolean keepParentOnDelete, int openTapsThreshold)
          Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.
 
Method Summary
protected  TupleEntrySchemeCollector createTupleEntrySchemeCollector(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, Tap parent, String path)
           
 
Methods inherited from class cascading.tap.BaseTemplateTap
commitResource, createResource, deleteResource, equals, getIdentifier, getModifiedTime, getOpenTapsThreshold, getParent, getPathTemplate, hashCode, openForWrite, resourceExists, rollbackResource, toString
 
Methods inherited from class cascading.tap.SinkTap
getSourceFields, isSource, openForRead, sourceConfInit
 
Methods inherited from class cascading.tap.Tap
flowConfInit, getConfigDef, getFullIdentifier, getScheme, getSinkFields, getSinkMode, getStepConfigDef, getTrace, hasConfigDef, hasStepConfigDef, id, isEquivalentTo, isKeep, isReplace, isSink, isTemporary, isUpdate, openForRead, openForWrite, outgoingScopeFor, presentSinkFields, presentSourceFields, resolveIncomingOperationArgumentFields, resolveIncomingOperationPassThroughFields, retrieveSinkFields, retrieveSourceFields, setScheme, sinkConfInit, taps
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate"})
public TemplateTap(Hfs parent,
                                              String pathTemplate)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

Parameters:
parent - of type Hfs
pathTemplate - of type String

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

openTapsThreshold limits the number of files held open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
openTapsThreshold - of type int

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","sinkMode"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              SinkMode sinkMode)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

Parameters:
parent - of type Hfs
pathTemplate - of type String
sinkMode - of type SinkMode

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","sinkMode","keepParentOnDelete"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              SinkMode sinkMode,
                                              boolean keepParentOnDelete)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when BaseTemplateTap.deleteResource(Object) is called; deleting the parent is typically an issue when the tap is used inside a Cascade.

Parameters:
parent - of type Hfs
pathTemplate - of type String
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","sinkMode","keepParentOnDelete","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              SinkMode sinkMode,
                                              boolean keepParentOnDelete,
                                              int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when BaseTemplateTap.deleteResource(Object) is called; deleting the parent is typically an issue when the tap is used inside a Cascade.

openTapsThreshold limits the number of files held open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean
openTapsThreshold - of type int

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              Fields pathFields)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
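
The separation between pathFields and sinkFields described for this constructor can be pictured with plain collections. The sketch below is hypothetical and does not use Cascading types: a Map stands in for a Tuple with named fields, and the class PathFieldsSketch with its select, path and line helpers exists only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one set of fields names the directory, another set
// (possibly disjoint) is written to the result file.
final class PathFieldsSketch {

    // Picks the named fields from a record, in the order given by the selector.
    static Object[] select(Map<String, String> record, List<String> fields) {
        List<String> values = new ArrayList<>();
        for (String field : fields) {
            values.add(record.get(field));
        }
        return values.toArray();
    }

    // Populates the template with the selected path field values.
    static String path(Map<String, String> record, String pathTemplate, List<String> pathFields) {
        return String.format(Locale.ROOT, pathTemplate, select(record, pathFields));
    }

    // Builds the tab-delimited line that would be written to the result file.
    static String line(Map<String, String> record, List<String> sinkFields) {
        List<String> values = new ArrayList<>();
        for (String field : sinkFields) {
            values.add(record.get(field));
        }
        return String.join("\t", values);
    }

    public static void main(String[] args) {
        Map<String, String> record = Map.of("year", "2012", "month", "june", "entry", "hello");
        // year and month name the directory but are not written to the file.
        System.out.println(path(record, "%s/%s", List.of("year", "month")));
        System.out.println(line(record, List.of("entry")));
    }
}
```

Because the selector also fixes the order, listing the fields as month then year with the template "%s-%s" would give june-2012 for the same record.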

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              Fields pathFields,
                                              int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

openTapsThreshold limits the number of files held open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
openTapsThreshold - of type int

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","sinkMode"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              Fields pathFields,
                                              SinkMode sinkMode)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
sinkMode - of type SinkMode

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","sinkMode","keepParentOnDelete"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              Fields pathFields,
                                              SinkMode sinkMode,
                                              boolean keepParentOnDelete)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when BaseTemplateTap.deleteResource(Object) is called; deleting the parent is typically an issue when the tap is used inside a Cascade.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean

TemplateTap

@ConstructorProperties(value={"parent","pathTemplate","pathFields","sinkMode","keepParentOnDelete","openTapsThreshold"})
public TemplateTap(Hfs parent,
                                              String pathTemplate,
                                              Fields pathFields,
                                              SinkMode sinkMode,
                                              boolean keepParentOnDelete,
                                              int openTapsThreshold)
Constructor TemplateTap creates a new TemplateTap instance using the given parent Hfs Tap as the base path and default Scheme, and the pathTemplate as the Formatter format String. pathFields is a selector that selects and orders the fields to be used in the given pathTemplate.

This constructor also allows the sinkFields of the parent Tap to be independent of the pathFields, so data that is not written to the result file can still be used in the template path name.

keepParentOnDelete, when set to true, prevents the parent Tap from being deleted when BaseTemplateTap.deleteResource(Object) is called; deleting the parent is typically an issue when the tap is used inside a Cascade.

openTapsThreshold limits the number of files held open for output at one time.

Parameters:
parent - of type Hfs
pathTemplate - of type String
pathFields - of type Fields
sinkMode - of type SinkMode
keepParentOnDelete - of type boolean
openTapsThreshold - of type int

Method Detail

createTupleEntrySchemeCollector

protected TupleEntrySchemeCollector createTupleEntrySchemeCollector(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess,
                                                                    Tap parent,
                                                                    String path)
                                                             throws IOException
Specified by:
createTupleEntrySchemeCollector in class BaseTemplateTap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.OutputCollector>
Throws:
IOException


Copyright © 2007-2012 Concurrent, Inc. All Rights Reserved.