java.lang.Object cascading.scheme.Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter> cascading.scheme.local.TextDelimited
public class TextDelimited extends Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Class TextDelimited provides direct support for delimited text files, like TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.
TextDelimited may also be used to skip the "header" in a file, where the header is defined as the very first line in every input file. That is, if the byte offset of the current line from the input is zero (0), that line will be skipped.
If the source/sink fields are set to either Fields.ALL or Fields.UNKNOWN, and skipHeader or hasHeader is true, the field names will be retrieved from the header of the file and used during planning. The header will be parsed with the same rules as the body of the file.
By default headers are not skipped.
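The header rule above can be modeled with a small, dependency-free Java sketch. It is not Cascading's implementation: the class HeaderSkipSketch and its methods are hypothetical, quoting is ignored, and the input is assumed to use '\n' line endings. The first line (the one at byte offset zero) supplies the field names, split with the same rule as the body, and is dropped from the body when skipHeader is true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the header rule; not Cascading's parser (no quoting, '\n' line endings assumed).
class HeaderSkipSketch {
  // Splits a line on the literal delimiter, keeping trailing empty fields.
  static List<String> split(String line, String delimiter) {
    return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
  }

  // Field names come from the first line, parsed with the same rule as the body.
  static List<String> header(String text, String delimiter) {
    String[] lines = text.split("\n");
    return split(lines[0], delimiter);
  }

  // Returns the body rows; the first line (byte offset 0) is skipped when skipHeader is true.
  static List<List<String>> rows(String text, String delimiter, boolean skipHeader) {
    List<List<String>> result = new ArrayList<>();
    String[] lines = text.split("\n");
    for (int i = 0; i < lines.length; i++) {
      if (skipHeader && i == 0)
        continue;
      result.add(split(lines[i], delimiter));
    }
    return result;
  }
}
```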
TextDelimited may also be used to write a "header" in a file. The field names for the header are taken directly from the declared fields. If the declared fields are Fields.ALL or Fields.UNKNOWN, the resolved field names will be used, if any.
By default headers are not written.
If hasHeader is set to true on a constructor, both skipHeader and writeHeader will be set to true.
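A hedged sketch of how the header flags interact when writing: hasHeader simply sets both skipHeader and writeHeader, and the header line is the field names joined with the delimiter. HeaderFlagsSketch is a hypothetical class, not part of Cascading; it writes '\n' line endings and omits the header when no field names are known (mirroring "if any" above).

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;

// Hypothetical model of the header flags; names are illustrative, not Cascading's API.
class HeaderFlagsSketch {
  final boolean skipHeader;
  final boolean writeHeader;

  HeaderFlagsSketch(boolean skipHeader, boolean writeHeader) {
    this.skipHeader = skipHeader;
    this.writeHeader = writeHeader;
  }

  // hasHeader == true turns on both skipping on read and writing on write.
  static HeaderFlagsSketch fromHasHeader(boolean hasHeader) {
    return new HeaderFlagsSketch(hasHeader, hasHeader);
  }

  // Writes the header (if enabled and names are known) followed by each row, joined by the delimiter.
  String write(List<String> fieldNames, List<List<String>> rows, String delimiter) {
    StringWriter buffer = new StringWriter();
    PrintWriter out = new PrintWriter(buffer);
    if (writeHeader && !fieldNames.isEmpty())
      out.print(String.join(delimiter, fieldNames) + "\n");
    for (List<String> row : rows)
      out.print(String.join(delimiter, row) + "\n");
    out.flush();
    return buffer.toString();
  }
}
```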
By default this Scheme is both strict and safe.
Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, then a Tuple will be returned with null values for the missing fields.
Safe means that if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.
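The strict and safe rules can be illustrated with the dependency-free sketch below. SketchTapException stands in for cascading.tap.TapException, only Integer and Double coercions are modeled, and extra fields beyond the expected count are ignored when strict is false; all of these are assumptions of the sketch rather than documented TextDelimited behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for cascading.tap.TapException so the sketch has no dependencies.
class SketchTapException extends RuntimeException {
  SketchTapException(String message) { super(message); }
}

// Illustrates the strict and safe rules; this is not Cascading's parser.
class StrictSafeSketch {
  static List<Object> parse(String line, String delimiter, Class<?>[] types, boolean strict, boolean safe) {
    String[] split = line.split(Pattern.quote(delimiter), -1);
    // Strict: any mismatch in the field count is an error.
    if (strict && split.length != types.length)
      throw new SketchTapException("expected " + types.length + " fields, found " + split.length);
    List<Object> values = new ArrayList<>();
    for (int i = 0; i < types.length; i++) {
      // Non-strict: missing trailing fields become null (extra fields are ignored in this sketch).
      String raw = i < split.length ? split[i] : null;
      values.add(coerce(raw, types[i], safe));
    }
    return values;
  }

  static Object coerce(String raw, Class<?> type, boolean safe) {
    if (raw == null || raw.isEmpty())
      return null; // empty or missing fields become null
    try {
      if (type == Integer.class) return Integer.valueOf(raw);
      if (type == Double.class) return Double.valueOf(raw);
      return raw; // every other type stays a String in this sketch
    } catch (NumberFormatException e) {
      if (safe)
        return null; // safe: an uncoercible value becomes null
      throw new SketchTapException("cannot coerce '" + raw + "' to " + type.getSimpleName());
    }
  }
}
```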
Also by default, quote strings are not searched for, to improve processing speed. If a file is COMMA delimited but may have COMMAs in a value, the whole value should be surrounded by the quote string, typically double quotes (").
Note that all empty fields in a line will be returned as null unless coerced into a new type.
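Quoted values and empty fields can be sketched as a small character scanner. This is an illustration only: QuotedSplitSketch is hypothetical, it treats a doubled quote inside a quoted value as a literal quote (a common CSV convention assumed here, not stated above), and it returns null for every empty field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quote-aware splitter; illustrates the idea, not Cascading's exact parsing rules.
class QuotedSplitSketch {
  static List<String> split(String line, char delimiter, char quote) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == quote) {
        // Assumed CSV-style escaping: a doubled quote inside a quoted value is one literal quote.
        if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
          current.append(quote);
          i++;
        } else {
          inQuotes = !inQuotes;
        }
      } else if (c == delimiter && !inQuotes) {
        fields.add(toValue(current)); // delimiter outside quotes ends the field
        current.setLength(0);
      } else {
        current.append(c); // delimiters inside quotes are kept as data
      }
    }
    fields.add(toValue(current));
    return fields;
  }

  // Empty fields are returned as null, mirroring the note above.
  private static String toValue(StringBuilder sb) {
    return sb.length() == 0 ? null : sb.toString();
  }
}
```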
This Scheme may source/sink Fields.ALL. When Fields.ALL is given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.
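The Fields.ALL rule can be sketched as a constructor check. AllFieldsRuleSketch is hypothetical: it uses a placeholder sentinel list instead of cascading.tuple.Fields.ALL, and rejecting a type array with an IllegalArgumentException is an assumption, since the text only says a type array "may not be given".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Fields.ALL rule. ALL is a placeholder sentinel standing in for
// cascading.tuple.Fields.ALL; it is not the real Fields class.
class AllFieldsRuleSketch {
  static final List<String> ALL = Arrays.asList("<ALL>");

  final boolean strict;

  AllFieldsRuleSketch(List<String> fields, boolean strict, Class<?>[] types) {
    boolean isAll = fields == ALL; // identity comparison against the sentinel
    // Assumption: a type array is rejected outright when sourcing/sinking ALL.
    if (isAll && types != null)
      throw new IllegalArgumentException("a type array may not be given with Fields.ALL");
    // With ALL the field count is arbitrary, so strict is forced to false.
    this.strict = !isAll && strict;
  }
}
```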
See Also:
TextLine, Serialized Form
Constructor Summary | |
---|---|
TextDelimited() | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter. |
TextDelimited(boolean hasHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter. |
TextDelimited(boolean hasHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote string. |
TextDelimited(Fields fields) | Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
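The overloads above differ only in which arguments are supplied. The sketch below shows one plausible way such telescoping constructors can funnel into a single full constructor, using the defaults stated in the class description (TAB delimiter, headers neither skipped nor written, strict and safe, no quote string). DelimitedOptionsSketch is hypothetical, uses List<String> in place of Fields, and is not the Cascading source.

```java
import java.util.List;

// Hypothetical sketch of telescoping constructors; List<String> stands in for cascading.tuple.Fields.
class DelimitedOptionsSketch {
  final List<String> fields;
  final boolean skipHeader;
  final boolean writeHeader;
  final String delimiter;
  final boolean strict;
  final String quote;      // null means quote strings are not searched for
  final Class<?>[] types;  // null means values stay Strings
  final boolean safe;

  // Fields only: TAB delimiter.
  DelimitedOptionsSketch(List<String> fields) {
    this(fields, "\t");
  }

  // Fields and delimiter: headers neither skipped nor written, strict and safe.
  DelimitedOptionsSketch(List<String> fields, String delimiter) {
    this(fields, false, false, delimiter, true, null, null, true);
  }

  // hasHeader sets both skipHeader and writeHeader.
  DelimitedOptionsSketch(List<String> fields, boolean hasHeader, String delimiter) {
    this(fields, hasHeader, hasHeader, delimiter, true, null, null, true);
  }

  // The "full" form every other overload delegates to.
  DelimitedOptionsSketch(List<String> fields, boolean skipHeader, boolean writeHeader, String delimiter,
                         boolean strict, String quote, Class<?>[] types, boolean safe) {
    this.fields = fields;
    this.skipHeader = skipHeader;
    this.writeHeader = writeHeader;
    this.delimiter = delimiter;
    this.strict = strict;
    this.quote = quote;
    this.types = types;
    this.safe = safe;
  }
}
```

Funnelling every overload into one constructor keeps the defaults in a single place, which is why the summary descriptions can stay identical across most overloads.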
Method Summary | | |
---|---|---|
LineNumberReader | createInput(InputStream inputStream) | |
PrintWriter | createOutput(OutputStream outputStream) | |
boolean | isSymmetrical() | Method isSymmetrical returns true if the sink fields equal the source fields. |
void | presentSinkFields(FlowProcess<Properties> flowProcess, Tap tap, Fields fields) | Method presentSinkFields is called after the planner is invoked and all fields are resolved. |
void | presentSourceFields(FlowProcess<Properties> process, Tap tap, Fields fields) | Method presentSourceFields is called after the planner is invoked and all fields are resolved. |
Fields | retrieveSourceFields(FlowProcess<Properties> process, Tap tap) | Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources. |
void | sink(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall) | Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput(). |
void | sinkCleanup(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall) | Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall). |
void | sinkConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf) | Method sinkConfInit initializes this instance as a sink. |
void | sinkPrepare(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall) | Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall). |
boolean | source(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) | Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available. |
void | sourceCleanup(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) | Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall). |
void | sourceConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf) | Method sourceConfInit initializes this instance as a source. |
void | sourcePrepare(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) | Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall). |
Methods inherited from class cascading.scheme.Scheme |
---|
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, setNumSinkParts, setSinkFields, setSourceFields, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
@ConstructorProperties(value={"hasHeader","delimiter"}) public TextDelimited(boolean hasHeader, String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
Parameters:
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"hasHeader","delimiter","quote"}) public TextDelimited(boolean hasHeader, String delimiter, String quote)
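Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote string.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
Parameters:
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value="fields") public TextDelimited(Fields fields)
Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter.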
Parameters:
fields - of type Fields
@ConstructorProperties(value={"fields","delimiter"}) public TextDelimited(Fields fields, String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
delimiter - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","delimiter","types"}) public TextDelimited(Fields fields, String delimiter, Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types"}) public TextDelimited(Fields fields, String delimiter, String quote, Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","delimiter","quote"}) public TextDelimited(Fields fields, String delimiter, String quote)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","strict","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe)
Constructor TextDelimited creates a new TextDelimited instance.
Parameters:
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
Method Detail |
---|
public LineNumberReader createInput(InputStream inputStream)
public PrintWriter createOutput(OutputStream outputStream)
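createInput and createOutput wrap the raw streams in the reader and writer types the rest of this Scheme works with. The stdlib-only sketch below shows that wrapping; StreamWrappersSketch and its countLines helper are hypothetical, and UTF-8 is an assumption, since the charset TextDelimited actually uses is not stated here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical counterparts of createInput/createOutput; UTF-8 is an assumption of this sketch.
class StreamWrappersSketch {
  static LineNumberReader createInput(InputStream inputStream) {
    return new LineNumberReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
  }

  static PrintWriter createOutput(OutputStream outputStream) {
    return new PrintWriter(new OutputStreamWriter(outputStream, StandardCharsets.UTF_8));
  }

  // Reads every line through createInput and returns the reader's line count.
  static int countLines(InputStream inputStream) {
    try (LineNumberReader reader = createInput(inputStream)) {
      while (reader.readLine() != null) {
        // discard the line; only the counter matters here
      }
      return reader.getLineNumber();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```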
public boolean isSymmetrical()
Description copied from class: Scheme
Method isSymmetrical returns true if the sink fields equal the source fields. That is, this scheme sources the same fields as it sinks.
isSymmetrical in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public Fields retrieveSourceFields(FlowProcess<Properties> process, Tap tap)
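Description copied from class: Scheme
Method retrieveSourceFields notifies a Scheme when it is appropriate to dynamically update the fields it sources. The FlowProcess presents all known properties resolved by the current planner. The tap instance is the parent Tap for this Scheme instance.
retrieveSourceFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap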
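For a Scheme that reads its own header, retrieveSourceFields is the natural point to turn unresolved fields into concrete names. The dependency-free sketch below assumes that behavior as described in the class overview (names are read from the header when the declared fields are Fields.ALL or Fields.UNKNOWN and a header is expected). RetrieveFieldsSketch is hypothetical, uses null for the unresolved case, and is not the Cascading code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of header-based field discovery; null stands in for Fields.UNKNOWN / Fields.ALL.
class RetrieveFieldsSketch {
  static List<String> retrieveSourceFields(List<String> declared, boolean skipHeader, Reader input, String delimiter) {
    // Declared names are kept as-is; the header is consulted only when names are unresolved and a header is expected.
    if (declared != null || !skipHeader)
      return declared;
    try (BufferedReader reader = new BufferedReader(input)) {
      String header = reader.readLine();
      // An empty input leaves the fields unresolved; otherwise split with the body's rule.
      return header == null ? null : Arrays.asList(header.split(Pattern.quote(delimiter), -1));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```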
public void presentSourceFields(FlowProcess<Properties> process, Tap tap, Fields fields)
Description copied from class: Scheme
Method presentSourceFields is called after the planner is invoked and all fields are resolved. See also Scheme.retrieveSourceFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSourceFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
process - of type FlowProcess
tap - of type Tap
fields - of type Fields
public void presentSinkFields(FlowProcess<Properties> flowProcess, Tap tap, Fields fields)
Description copied from class: Scheme
Method presentSinkFields is called after the planner is invoked and all fields are resolved. See also Scheme.retrieveSinkFields(cascading.flow.FlowProcess, cascading.tap.Tap).
presentSinkFields in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
tap - of type Tap
fields - of type Fields
public void sourceConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
Description copied from class: Scheme
Method sourceConfInit initializes this instance as a source. See Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall) if resources must be initialized before use, and Scheme.sourceCleanup(cascading.flow.FlowProcess, SourceCall) if resources must be destroyed after use.
sourceConfInit in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
tap - of type Tap
conf - of type Properties
public void sourcePrepare(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) throws IOException
Description copied from class: Scheme
Method sourcePrepare is used to initialize resources needed during each call of Scheme.source(cascading.flow.FlowProcess, SourceCall).
Be sure to place any initialized objects in the SourceContext so each instance will remain threadsafe.
sourcePrepare in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException
public boolean source(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) throws IOException
Description copied from class: Scheme
Method source will read a new "record" or value from SourceCall.getInput(), populate the available Tuple via SourceCall.getIncomingEntry(), and return true on success or false if no more values are available.
It's OK to set a new Tuple instance on the incomingEntry TupleEntry, or to simply re-use the existing instance.
Note that this is the only time it is safe to modify a Tuple instance handed over via a method call.
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap.
source in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Returns:
true when a Tuple was successfully read
Throws:
IOException
public void sourceCleanup(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) throws IOException
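Description copied from class: Scheme
Method sourceCleanup is used to destroy resources created by Scheme.sourcePrepare(cascading.flow.FlowProcess, SourceCall).
sourceCleanup in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sourceCall - of type SourceCall
Throws:
IOException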
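The sourcePrepare, source and sourceCleanup contract (prepare once, call source until it returns false, then clean up) can be modeled without Cascading. In the sketch below, SourceLifecycleSketch, its readAll driver and the incomingEntry field are hypothetical stand-ins for the SourceCall plumbing; header skipping follows the class description, and the precompiled Pattern plays the role of a resource kept in the source context.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, dependency-free model of the sourcePrepare / source / sourceCleanup contract.
// None of these names are Cascading APIs; incomingEntry stands in for SourceCall.getIncomingEntry().
class SourceLifecycleSketch {
  private final String delimiter;
  private final boolean skipHeader;
  private Pattern splitter;           // resource created in sourcePrepare, released in sourceCleanup
  private boolean atStart;            // true until the first line (byte offset 0) has been consumed
  private List<String> incomingEntry;

  SourceLifecycleSketch(String delimiter, boolean skipHeader) {
    this.delimiter = delimiter;
    this.skipHeader = skipHeader;
  }

  void sourcePrepare() {
    splitter = Pattern.compile(Pattern.quote(delimiter));
    atStart = true;
  }

  // Reads one record; returns true on success, false when no more values are available.
  boolean source(LineNumberReader input) throws IOException {
    String line = input.readLine();
    if (line != null && skipHeader && atStart)
      line = input.readLine(); // discard the header line
    atStart = false;
    if (line == null)
      return false;
    // A fresh list per record, like setting a new Tuple on the incoming entry.
    incomingEntry = Arrays.asList(splitter.split(line, -1));
    return true;
  }

  void sourceCleanup() {
    splitter = null;
  }

  // Drives the calls in order: prepare once, source until false, then clean up.
  static List<List<String>> readAll(String text, String delimiter, boolean skipHeader) {
    SourceLifecycleSketch scheme = new SourceLifecycleSketch(delimiter, skipHeader);
    List<List<String>> rows = new ArrayList<>();
    try (LineNumberReader input = new LineNumberReader(new StringReader(text))) {
      scheme.sourcePrepare();
      while (scheme.source(input))
        rows.add(scheme.incomingEntry);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      scheme.sourceCleanup();
    }
    return rows;
  }
}
```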
public void sinkConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
Description copied from class: Scheme
Method sinkConfInit initializes this instance as a sink. See Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall) if resources must be initialized before use, and Scheme.sinkCleanup(cascading.flow.FlowProcess, SinkCall) if resources must be destroyed after use.
sinkConfInit in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
tap - of type Tap
conf - of type Properties
public void sinkPrepare(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
Description copied from class: Scheme
Method sinkPrepare is used to initialize resources needed during each call of Scheme.sink(cascading.flow.FlowProcess, SinkCall).
Be sure to place any initialized objects in the SinkContext so each instance will remain threadsafe.
sinkPrepare in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
public void sink(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall) throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple found on SinkCall.getOutgoingEntry() to the SinkCall.getOutput().
This method may optionally throw a TapException if it cannot process a particular instance of data. If the payload Tuple is set on the TapException, that Tuple will be written to any applicable failure trap Tap. If not set, the incoming Tuple will be written instead.
sink in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall
Throws:
IOException
public void sinkCleanup(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
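Description copied from class: Scheme
Method sinkCleanup is used to destroy resources created by Scheme.sinkPrepare(cascading.flow.FlowProcess, SinkCall).
sinkCleanup in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Parameters:
flowProcess - of type FlowProcess
sinkCall - of type SinkCall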
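The sink side mirrors the source side: prepare per-call resources, write one outgoing entry per sink call, then flush and release. SinkLifecycleSketch is hypothetical and not Cascading's implementation; writing the header once in sinkPrepare and writing null values as empty fields are assumptions of this sketch.

```java
import java.io.PrintWriter;
import java.util.List;

// Hypothetical model of sinkPrepare / sink / sinkCleanup; not Cascading's implementation.
class SinkLifecycleSketch {
  private final List<String> fieldNames;
  private final String delimiter;
  private final boolean writeHeader;
  private StringBuilder line; // reusable buffer: created in sinkPrepare, released in sinkCleanup

  SinkLifecycleSketch(List<String> fieldNames, String delimiter, boolean writeHeader) {
    this.fieldNames = fieldNames;
    this.delimiter = delimiter;
    this.writeHeader = writeHeader;
  }

  void sinkPrepare(PrintWriter output) {
    line = new StringBuilder();
    // Assumption: the header is emitted once, before the first tuple.
    if (writeHeader && !fieldNames.isEmpty())
      output.print(String.join(delimiter, fieldNames) + "\n");
  }

  // Writes one outgoing entry; in this sketch a null value is written as an empty field.
  void sink(List<?> outgoingEntry, PrintWriter output) {
    line.setLength(0);
    for (int i = 0; i < outgoingEntry.size(); i++) {
      if (i > 0)
        line.append(delimiter);
      Object value = outgoingEntry.get(i);
      if (value != null)
        line.append(value);
    }
    line.append('\n');
    output.print(line.toString());
  }

  void sinkCleanup(PrintWriter output) {
    output.flush();
    line = null;
  }
}
```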