java.lang.Object
  cascading.tap.Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
    cascading.tap.hadoop.Hfs
public class Hfs
Class Hfs is the base class for all Hadoop file system access. Hfs may only be used with the HadoopFlowConnector when creating Hadoop executable Flow instances.
Use Dfs or Lfs for resources specific to the Hadoop Distributed File System or the local file system, respectively.
Use the Hfs class if the 'kind' of resource is unknown at design time. To use it, prefix a scheme to the 'stringPath', where hdfs://... denotes Dfs and file://... denotes Lfs.
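The scheme-prefix convention can be illustrated with a minimal sketch that uses only java.net.URI. The class PathKindSketch and its kindOf method are hypothetical names for this illustration and are not part of Cascading; the sketch only shows how the prefix of a 'stringPath' distinguishes the two cases and does not reproduce how Hfs resolves a path internally.

```java
import java.net.URI;

/** Illustrative only: classifies a stringPath by its URI scheme. Not part of Cascading. */
final class PathKindSketch {

    /** Returns "Dfs" for hdfs:// paths, "Lfs" for file:// paths, and "unknown" otherwise. */
    static String kindOf(String stringPath) {
        // getScheme() returns null when the path carries no scheme prefix
        String scheme = URI.create(stringPath).getScheme();
        if ("hdfs".equalsIgnoreCase(scheme)) {
            return "Dfs";
        }
        if ("file".equalsIgnoreCase(scheme)) {
            return "Lfs";
        }
        // no prefix or another scheme; how Hfs treats these is outside this sketch
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(kindOf("hdfs://namenode:8020/logs/2012"));
        System.out.println(kindOf("file:///tmp/logs"));
        System.out.println(kindOf("/user/data/logs"));
    }
}
```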
Call setTemporaryDirectory(java.util.Map, String) to use a temporary file directory path other than the current Hadoop default path.
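As a rough model of this property-based configuration, the following self-contained sketch stores and reads a temporary directory in a properties map. The class TempDirSketch and the key "example.tmp.dir" are assumptions made for illustration; the actual value of TEMPORARY_DIRECTORY is not given on this page. With Cascading on the classpath, the corresponding calls would be Hfs.setTemporaryDirectory(properties, tempDir) and Hfs.getTemporaryDirectory(properties), using the signatures listed in the Method Summary.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for the static property helpers on Hfs. Not part of Cascading. */
final class TempDirSketch {

    // Hypothetical key; the real value of Hfs.TEMPORARY_DIRECTORY is not shown on this page.
    static final String TEMPORARY_DIRECTORY_KEY = "example.tmp.dir";

    /** Stores the temporary directory on the given properties object. */
    static void setTemporaryDirectory(Map<Object, Object> properties, String tempDir) {
        properties.put(TEMPORARY_DIRECTORY_KEY, tempDir);
    }

    /** Returns the configured temporary directory, or null if none was set. */
    static String getTemporaryDirectory(Map<Object, Object> properties) {
        Object value = properties.get(TEMPORARY_DIRECTORY_KEY);
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<>();
        setTemporaryDirectory(properties, "/mnt/scratch/cascading");
        System.out.println(getTemporaryDirectory(properties));
    }
}
```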
By default, Cascading on Hadoop assumes that any source or sink Tap using the file:// URI scheme intends to read files from the local client filesystem (for example, when using the Lfs Tap) where the Hadoop job jar is started. It therefore forces any MapReduce job reading from or writing to file:// resources to run in Hadoop "local mode" so that the file can be read.
To change this behavior, call setLocalModeScheme(java.util.Map, String) to set a different scheme value, or set it to "none" to disable the behavior entirely, for the case where the file to be read is available on every Hadoop processing node at the exact same path.
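The local-mode rule described above can be sketched as a small decision function. LocalModeSketch and forcesLocalMode are hypothetical names, and the logic is an approximation of the documented behavior (a tap whose URI scheme matches the configured local mode scheme forces local mode, and "none" disables the check), not the library's actual implementation.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of the local-mode detection rule. Not part of Cascading. */
final class LocalModeSketch {

    /**
     * Returns true if any tap path uses the given local mode scheme.
     * A scheme of "none" disables the check entirely.
     */
    static boolean forcesLocalMode(String localModeScheme, List<String> tapPaths) {
        if ("none".equalsIgnoreCase(localModeScheme)) {
            return false;
        }
        for (String path : tapPaths) {
            // getScheme() is null for paths without a scheme prefix
            String scheme = URI.create(path).getScheme();
            if (localModeScheme.equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> taps = Arrays.asList("hdfs://namenode/input", "file:///tmp/output");
        System.out.println(forcesLocalMode("file", taps)); // a file:// sink is present
        System.out.println(forcesLocalMode("none", taps)); // detection disabled
    }
}
```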
Field Summary | | |
---|---|---|
static String | LOCAL_MODE_SCHEME | Field LOCAL_MODE_SCHEME |
protected String | stringPath | Field stringPath |
static String | TEMPORARY_DIRECTORY | Field TEMPORARY_DIRECTORY |
Constructor Summary | | |
---|---|---|
protected | Hfs() | |
 | Hfs(Fields fields, String stringPath) | Deprecated. |
 | Hfs(Fields fields, String stringPath, boolean replace) | Deprecated. |
 | Hfs(Fields fields, String stringPath, SinkMode sinkMode) | Deprecated. |
protected | Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme) | |
 | Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme, String stringPath) | Constructor Hfs creates a new Hfs instance. |
 | Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme, String stringPath, boolean replace) | Deprecated. |
 | Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme, String stringPath, SinkMode sinkMode) | Constructor Hfs creates a new Hfs instance. |
Method Summary | | |
---|---|---|
boolean | createResource(org.apache.hadoop.mapred.JobConf conf) | Method createResource creates the underlying resource. |
boolean | deleteResource(org.apache.hadoop.mapred.JobConf conf) | Method deleteResource deletes the resource represented by this instance. |
long | getBlockSize(org.apache.hadoop.mapred.JobConf conf) | Method getBlockSize returns the blocksize specified by the underlying file system for this resource. |
String[] | getChildIdentifiers(org.apache.hadoop.mapred.JobConf conf) | Method getChildIdentifiers returns an array of child identifiers if this resource is a directory. |
protected org.apache.hadoop.fs.FileSystem | getDefaultFileSystem(org.apache.hadoop.mapred.JobConf jobConf) | |
URI | getDefaultFileSystemURIScheme(org.apache.hadoop.mapred.JobConf jobConf) | Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem. |
protected org.apache.hadoop.fs.FileSystem | getFileSystem(org.apache.hadoop.mapred.JobConf jobConf) | |
String | getFullIdentifier(org.apache.hadoop.mapred.JobConf conf) | Method getFullIdentifier returns a fully qualified resource identifier. |
String | getIdentifier() | Method getIdentifier returns a String representing the resource this Tap instance represents. |
protected static String | getLocalModeScheme(org.apache.hadoop.mapred.JobConf conf, String defaultValue) | |
long | getModifiedTime(org.apache.hadoop.mapred.JobConf conf) | Method getModifiedTime returns the date this resource was last modified. |
org.apache.hadoop.fs.Path | getPath() | |
int | getReplication(org.apache.hadoop.mapred.JobConf conf) | Method getReplication returns the replication specified by the underlying file system for this resource. |
long | getSize(org.apache.hadoop.mapred.JobConf conf) | Method getSize returns the size of the file referenced by this tap. |
static String | getTemporaryDirectory(Map<Object,Object> properties) | Method getTemporaryDirectory returns the configured temporary directory from the given properties object. |
static org.apache.hadoop.fs.Path | getTempPath(org.apache.hadoop.mapred.JobConf conf) | |
URI | getURIScheme(org.apache.hadoop.mapred.JobConf jobConf) | |
boolean | isDirectory(org.apache.hadoop.mapred.JobConf conf) | Method isDirectory returns true if the underlying resource represents a directory or folder instead of an individual file. |
protected String | makeTemporaryPathDirString(String name) | |
protected URI | makeURIScheme(org.apache.hadoop.mapred.JobConf jobConf) | |
TupleEntryIterator | openForRead(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, org.apache.hadoop.mapred.RecordReader input) | Method openForRead opens the resource represented by this Tap instance for reading. |
TupleEntryCollector | openForWrite(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, org.apache.hadoop.mapred.OutputCollector output) | Method openForWrite opens the resource represented by this Tap instance for writing. |
boolean | resourceExists(org.apache.hadoop.mapred.JobConf conf) | Method resourceExists returns true if the path represented by this instance exists. |
static void | setLocalModeScheme(Map<Object,Object> properties, String scheme) | Method setLocalModeScheme provides a means to change the scheme value used to detect when a MapReduce job should be run in Hadoop local mode. |
protected void | setStringPath(String stringPath) | |
static void | setTemporaryDirectory(Map<Object,Object> properties, String tempDir) | Method setTemporaryDirectory sets the temporary directory on the given properties object. |
protected void | setUriScheme(URI uriScheme) | |
void | sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> process, org.apache.hadoop.mapred.JobConf conf) | Method sinkConfInit initializes this instance as a sink. |
void | sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> process, org.apache.hadoop.mapred.JobConf conf) | Method sourceConfInit initializes this instance as a source. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String TEMPORARY_DIRECTORY
public static final String LOCAL_MODE_SCHEME
protected String stringPath
Constructor Detail |
---|
protected Hfs()
@ConstructorProperties(value="scheme") protected Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme)
@Deprecated @ConstructorProperties(value={"fields","stringPath"}) public Hfs(Fields fields, String stringPath)
Parameters:
fields - of type Fields
stringPath - of type String
@Deprecated @ConstructorProperties(value={"fields","stringPath","replace"}) public Hfs(Fields fields, String stringPath, boolean replace)
Parameters:
fields - of type Fields
stringPath - of type String
replace - of type boolean
@Deprecated @ConstructorProperties(value={"fields","stringPath","sinkMode"}) public Hfs(Fields fields, String stringPath, SinkMode sinkMode)
Parameters:
fields - of type Fields
stringPath - of type String
sinkMode - of type SinkMode
@ConstructorProperties(value={"scheme","stringPath"}) public Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme, String stringPath)
Parameters:
scheme - of type Scheme
stringPath - of type String
@Deprecated @ConstructorProperties(value={"scheme","stringPath","replace"}) public Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme, String stringPath, boolean replace)
Parameters:
scheme - of type Scheme
stringPath - of type String
replace - of type boolean
@ConstructorProperties(value={"scheme","stringPath","sinkMode"}) public Hfs(Scheme<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector,?,?> scheme, String stringPath, SinkMode sinkMode)
Parameters:
scheme - of type Scheme
stringPath - of type String
sinkMode - of type SinkMode
Method Detail |
---|
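public static void setTemporaryDirectory(Map<Object,Object> properties, String tempDir)
Method setTemporaryDirectory sets the temporary directory on the given properties object.
Parameters:
properties - of type Map
public static String getTemporaryDirectory(Map<Object,Object> properties)
Method getTemporaryDirectory returns the configured temporary directory from the given properties object.
Parameters:
properties - of type Map
public static void setLocalModeScheme(Map<Object,Object> properties, String scheme)
Method setLocalModeScheme provides a means to change the scheme value used to detect when a MapReduce job should be run in Hadoop local mode. By default the value is "file"; set to "none" to disable entirely.
Parameters:
properties - of type Map
protected static String getLocalModeScheme(org.apache.hadoop.mapred.JobConf conf, String defaultValue)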
protected void setStringPath(String stringPath)
protected void setUriScheme(URI uriScheme)
public URI getURIScheme(org.apache.hadoop.mapred.JobConf jobConf)
protected URI makeURIScheme(org.apache.hadoop.mapred.JobConf jobConf)
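public URI getDefaultFileSystemURIScheme(org.apache.hadoop.mapred.JobConf jobConf)
Method getDefaultFileSystemURIScheme returns the URI scheme for the default Hadoop FileSystem.
Parameters:
jobConf - of type JobConf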
protected org.apache.hadoop.fs.FileSystem getDefaultFileSystem(org.apache.hadoop.mapred.JobConf jobConf)
protected org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.mapred.JobConf jobConf)
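public String getIdentifier()
Description copied from class: Tap
Method getIdentifier returns a String representing the resource this Tap instance represents.
getIdentifier in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>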
public org.apache.hadoop.fs.Path getPath()
public String getFullIdentifier(org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Tap
Method getFullIdentifier returns a fully qualified resource identifier.
getFullIdentifier in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
conf - of type JobConf
public void sourceConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> process, org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Tap
Method sourceConfInit initializes this instance as a source. This method may be called more than once if this Tap instance is used outside the scope of a Flow instance or if it participates multiple times in a given Flow or across different Flows in a Cascade.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Note that no resources or services should be modified by this method.
sourceConfInit in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
process - of type FlowProcess
conf - of type JobConf
public void sinkConfInit(FlowProcess<org.apache.hadoop.mapred.JobConf> process, org.apache.hadoop.mapred.JobConf conf)
Description copied from class: Tap
Method sinkConfInit initializes this instance as a sink. This method may be called more than once if this Tap instance is used outside the scope of a Flow instance or if it participates multiple times in a given Flow or across different Flows in a Cascade.
Note that this method will be called in the context of this Tap being used as a traditional 'sink' and as a 'trap'.
In the context of a Flow, it will be called after FlowListener.onStarting(cascading.flow.Flow).
Note that no resources or services should be modified by this method. If this Tap instance returns true for Tap.isReplace(), then Tap.deleteResource(Object) will be called by the parent Flow.
sinkConfInit in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
process - of type FlowProcess
conf - of type JobConf
public TupleEntryIterator openForRead(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, org.apache.hadoop.mapred.RecordReader input) throws IOException
Description copied from class: Tap
Method openForRead opens the resource represented by this Tap instance for reading.
The input value may be null; if so, sub-classes must inquire with the underlying Scheme via Scheme.sourceConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper input type and instantiate it before calling super.openForRead().
Note that the returned iterator will return the same instance of TupleEntry on every call, thus a copy must be made of either the TupleEntry or the underlying Tuple instance if they are to be stored in a Collection.
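The note about the reused TupleEntry instance can be demonstrated with a self-contained sketch. The Entry class and reusingIterator method below are hypothetical stand-ins for TupleEntry and the returned iterator, written only to show why storing the returned reference without copying loses data; they do not use the Cascading API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: shows the effect of an iterator that reuses one mutable instance. */
final class ReusedEntrySketch {

    /** Hypothetical mutable record standing in for TupleEntry. */
    static final class Entry {
        String value;

        Entry copy() {
            Entry e = new Entry();
            e.value = value;
            return e;
        }
    }

    /** Returns an iterator that hands back the same Entry instance on every call to next(). */
    static Iterator<Entry> reusingIterator(List<String> values) {
        final Iterator<String> source = values.iterator();
        final Entry shared = new Entry();
        return new Iterator<Entry>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public Entry next() {
                shared.value = source.next(); // overwrites the previous contents
                return shared;
            }
        };
    }

    /** Stores every entry in a list, copying first when requested, and returns the stored values. */
    static List<String> collect(List<String> values, boolean copy) {
        List<Entry> stored = new ArrayList<>();
        Iterator<Entry> it = reusingIterator(values);
        while (it.hasNext()) {
            Entry e = it.next();
            stored.add(copy ? e.copy() : e);
        }
        List<String> result = new ArrayList<>();
        for (Entry e : stored) {
            result.add(e.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");
        System.out.println(collect(values, false)); // prints [c, c, c]
        System.out.println(collect(values, true));  // prints [a, b, c]
    }
}
```

Without the copy, every stored reference points at the one shared instance, so only the last value survives.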
openForRead in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
flowProcess - of type FlowProcess
input - of type RecordReader
Throws:
IOException - when the resource cannot be opened
public TupleEntryCollector openForWrite(FlowProcess<org.apache.hadoop.mapred.JobConf> flowProcess, org.apache.hadoop.mapred.OutputCollector output) throws IOException
Description copied from class: Tap
Method openForWrite opens the resource represented by this Tap instance for writing. This method does not honor the SinkMode setting. If SinkMode is SinkMode.REPLACE, this call may fail. See Tap.openForWrite(cascading.flow.FlowProcess).
The output value may be null; if so, sub-classes must inquire with the underlying Scheme via Scheme.sinkConfInit(cascading.flow.FlowProcess, Tap, Object) to get the proper output type and instantiate it before calling super.openForWrite().
openForWrite in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
flowProcess - of type FlowProcess
output - of type OutputCollector
Throws:
IOException - when the resource cannot be opened
public boolean createResource(org.apache.hadoop.mapred.JobConf conf) throws IOException
Description copied from class: Tap
Method createResource creates the underlying resource.
createResource in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
conf - of type JobConf
Throws:
IOException - when there is an error making directories
public boolean deleteResource(org.apache.hadoop.mapred.JobConf conf) throws IOException
Description copied from class: Tap
Method deleteResource deletes the resource represented by this instance.
deleteResource in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
conf - of type JobConf
Throws:
IOException - when the resource cannot be deleted
public boolean resourceExists(org.apache.hadoop.mapred.JobConf conf) throws IOException
Description copied from class: Tap
Method resourceExists returns true if the path represented by this instance exists.
resourceExists in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
conf - of type JobConf
Throws:
IOException - when the status cannot be determined
public boolean isDirectory(org.apache.hadoop.mapred.JobConf conf) throws IOException
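Method isDirectory returns true if the underlying resource represents a directory or folder instead of an individual file.
Parameters:
conf - of type JobConf
Throws:
IOException
public long getSize(org.apache.hadoop.mapred.JobConf conf) throws IOException
Method getSize returns the size of the file referenced by this tap.
Parameters:
conf - of type JobConf
Throws:
IOException
public long getBlockSize(org.apache.hadoop.mapred.JobConf conf) throws IOException
Method getBlockSize returns the blocksize specified by the underlying file system for this resource.
Parameters:
conf - of type JobConf
Throws:
IOException
public int getReplication(org.apache.hadoop.mapred.JobConf conf) throws IOException
Method getReplication returns the replication specified by the underlying file system for this resource.
Parameters:
conf - of type JobConf
Throws:
IOException
public String[] getChildIdentifiers(org.apache.hadoop.mapred.JobConf conf) throws IOException
Method getChildIdentifiers returns an array of child identifiers if this resource is a directory. This method skips Hadoop log directories (_log).
Parameters:
conf - of type JobConf
Throws:
IOException
public long getModifiedTime(org.apache.hadoop.mapred.JobConf conf) throws IOException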
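Description copied from class: Tap
Method getModifiedTime returns the date this resource was last modified.
getModifiedTime in class Tap<org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.RecordReader,org.apache.hadoop.mapred.OutputCollector>
Parameters:
conf - of type JobConf
Throws:
IOException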
public static org.apache.hadoop.fs.Path getTempPath(org.apache.hadoop.mapred.JobConf conf)
protected String makeTemporaryPathDirString(String name)