java.lang.Object
cascading.scheme.Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
cascading.scheme.local.TextDelimited
public class TextDelimited extends Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Class TextDelimited provides direct support for delimited text files, like TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.
TextDelimited may also be used to skip the "header" in a file, where the header is defined as the very first line in every input file. That is, if the byte offset of the current line from the input is zero (0), that line will be skipped. If the source or sink fields are set to either Fields.ALL or Fields.UNKNOWN, and skipHeader or hasHeader is true, the field names will be retrieved from the header of the file and used during planning. The header will be parsed with the same rules as the body of the file.
By default headers are not skipped.
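The header rule above can be illustrated with a small, self-contained sketch. It is not Cascading code: the class HeaderSketch and its methods are hypothetical, a single String stands in for one input file, and values are split on a literal delimiter without quote handling. It shows the first line supplying the field names and being dropped from the records when skipHeader is true, with header and body split by the same rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not Cascading code: one String stands in for one input file,
// and values are split on a literal delimiter with no quote handling.
final class HeaderSketch {

  // Header and body lines are split with the same rule.
  static List<String> split(String line, String delimiter) {
    return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
  }

  // Field names come from the first line (byte offset 0) of the input.
  static List<String> headerFields(String file, String delimiter) {
    return file.lines().findFirst()
        .map(line -> split(line, delimiter))
        .orElse(List.of());
  }

  // With skipHeader, the first line is dropped before records are parsed.
  static List<List<String>> records(String file, String delimiter, boolean skipHeader) {
    return file.lines()
        .skip(skipHeader ? 1 : 0)
        .map(line -> split(line, delimiter))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String file = "name\tage\nann\t31\nbob\t42\n";
    System.out.println(headerFields(file, "\t"));   // [name, age]
    System.out.println(records(file, "\t", true));  // [[ann, 31], [bob, 42]]
  }
}
```

Because the same split routine handles both the header and the body, any delimiter configuration that affects records also affects how field names are read.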
TextDelimited may also be used to write a "header" in a file. The field names for the header are taken directly from the declared fields. If the declared fields are Fields.ALL or Fields.UNKNOWN, the resolved field names will be used, if any.
By default headers are not written.
If hasHeader is set to true on a constructor, both skipHeader and writeHeader will be set to true.
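A minimal sketch of how the header flags relate, assuming a hypothetical HeaderOptions class rather than the real constructors: a single hasHeader value expands into both skipHeader and writeHeader, and a written header line is simply the field names joined by the delimiter.

```java
import java.util.List;

// Hypothetical sketch, not the Cascading API: models how hasHeader expands into
// skipHeader and writeHeader, and how a header line could be produced.
final class HeaderOptions {
  final boolean skipHeader;
  final boolean writeHeader;

  HeaderOptions(boolean skipHeader, boolean writeHeader) {
    this.skipHeader = skipHeader;
    this.writeHeader = writeHeader;
  }

  // Mirrors the documented rule: hasHeader == true sets both flags to true.
  static HeaderOptions fromHasHeader(boolean hasHeader) {
    return new HeaderOptions(hasHeader, hasHeader);
  }

  // The header is the declared (or resolved) field names joined by the delimiter.
  static String headerLine(List<String> fieldNames, String delimiter) {
    return String.join(delimiter, fieldNames);
  }

  public static void main(String[] args) {
    HeaderOptions options = fromHasHeader(true);
    System.out.println(options.skipHeader + " " + options.writeHeader); // true true
    System.out.println(headerLine(List.of("name", "age"), ","));         // name,age
  }
}
```

Passing false yields neither skipping nor writing, matching the stated defaults.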
By default this Scheme is both strict and safe.
Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, then a Tuple will be returned with null values for the missing fields.
Safe means that if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.
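The strict and safe rules can be sketched as follows. This is a hypothetical illustration, not DelimitedParser: StrictSafeSketch is an invented name, IllegalStateException stands in for TapException, and only Integer and Double coercions are modeled (Cascading's own coercion rules are broader).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the strict/safe rules, not Cascading's DelimitedParser.
// IllegalStateException stands in for cascading.tap.TapException.
final class StrictSafeSketch {

  static List<Object> parse(String line, String delimiter, Class<?>[] types,
                            boolean strict, boolean safe) {
    String[] values = line.split(Pattern.quote(delimiter), -1);

    // Strict: a wrong field count is an error.
    if (strict && values.length != types.length) {
      throw new IllegalStateException(
          "expected " + types.length + " fields, found " + values.length);
    }

    List<Object> result = new ArrayList<>();
    for (int i = 0; i < types.length; i++) {
      // Non-strict: missing trailing fields become null (extra fields are ignored here).
      String raw = i < values.length ? values[i] : null;
      result.add(coerce(raw, types[i], safe));
    }
    return result;
  }

  private static Object coerce(String raw, Class<?> type, boolean safe) {
    if (raw == null || raw.isEmpty()) {
      return null; // empty or missing values stay null
    }
    try {
      if (type == Integer.class) return Integer.valueOf(raw.trim());
      if (type == Double.class) return Double.valueOf(raw.trim());
      return raw; // String (and any other type in this sketch) is kept as text
    } catch (NumberFormatException e) {
      if (safe) return null; // safe: an uncoercible value becomes null
      throw new IllegalStateException("cannot coerce '" + raw + "' to " + type.getSimpleName(), e);
    }
  }

  public static void main(String[] args) {
    Class<?>[] types = {String.class, Integer.class};
    System.out.println(parse("ann,31", ",", types, true, true));  // [ann, 31]
    System.out.println(parse("ann", ",", types, false, true));    // [ann, null]
    System.out.println(parse("ann,x", ",", types, true, true));   // [ann, null]
  }
}
```

The two flags are independent: strict governs the field count of a line, while safe governs what happens when a single value cannot be converted.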
Also by default, quote strings are not searched for, to improve processing speed. If a file is COMMA delimited but may have commas in a value, the whole value should be surrounded by the quote string, typically double quotes (").
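To show what quote-aware splitting involves, here is a simplified, hypothetical splitter (QuotedSplitSketch is an invented name). It uses a character scan, a swapped-in technique that will not match DelimitedParser on every edge case, and it assumes that a doubled quote ("") inside a quoted value stands for one literal quote character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified quote-aware splitter; not Cascading's DelimitedParser.
// Assumption: a doubled quote ("") inside a quoted value is an escaped quote.
final class QuotedSplitSketch {

  static List<String> split(String line, char delimiter, char quote) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;

    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == quote) {
        if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
          current.append(quote); // escaped quote inside a quoted value
          i++;
        } else {
          inQuotes = !inQuotes;  // opening or closing quote, not part of the value
        }
      } else if (c == delimiter && !inQuotes) {
        fields.add(current.toString()); // delimiter outside quotes ends a field
        current.setLength(0);
      } else {
        current.append(c);       // ordinary character, or a delimiter inside quotes
      }
    }
    fields.add(current.toString());
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(split("1,\"Smith, John\",NY", ',', '"').size()); // 3
  }
}
```

The extra per-character state is the cost that the default (no quote searching) avoids.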
Note that all empty fields in a line will be returned as null unless they are coerced into a new type.
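A small sketch of the empty-field rule, using a hypothetical EmptyFieldSketch class (type coercion is not modeled). Note the -1 limit passed to String.split: without it, Java drops trailing empty strings, so a trailing empty field would disappear instead of becoming null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Cascading code: empty fields become null.
final class EmptyFieldSketch {

  static List<String> split(String line, String delimiter) {
    List<String> out = new ArrayList<>();
    // The -1 limit keeps trailing empty fields that String.split would otherwise drop.
    for (String value : line.split(Pattern.quote(delimiter), -1)) {
      out.add(value.isEmpty() ? null : value);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(split("a,,c,", ",")); // [a, null, c, null]
  }
}
```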
This Scheme may source/sink Fields.ALL. When Fields.ALL is given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.
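The defaulting rule for Fields.ALL can be sketched as constructor logic. This is hypothetical: AllFieldsDefaults is an invented class, a null field list stands in for Fields.ALL, and rejecting a type array with IllegalArgumentException is an assumption about how "may not be given" is enforced.

```java
import java.util.List;

// Hypothetical sketch, not Cascading code: a null field list stands in for Fields.ALL.
final class AllFieldsDefaults {
  final List<String> fields;  // null means "all fields, number unknown"
  final boolean strict;
  final Class<?>[] types;     // null means every value is returned as a String

  AllFieldsDefaults(List<String> fields, boolean strict, Class<?>[] types) {
    boolean all = fields == null;
    // Assumption: "may not be given" is enforced by rejecting the argument.
    if (all && types != null) {
      throw new IllegalArgumentException("field types may not be given with all fields");
    }
    this.fields = fields;
    // The number of parsed fields is arbitrary, so strict checking is switched off.
    this.strict = strict && !all;
    this.types = types;
  }

  public static void main(String[] args) {
    System.out.println(new AllFieldsDefaults(null, true, null).strict);              // false
    System.out.println(new AllFieldsDefaults(List.of("a", "b"), true, null).strict); // true
  }
}
```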
By default, all text is encoded/decoded as UTF-8. This can be changed via the charsetName constructor argument.
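The effect of charsetName can be sketched with standard java.io classes. The hypothetical CharsetStreams class below is modeled loosely on the createInput(InputStream) and createOutput(OutputStream) methods listed under Method Detail, but takes the charset name explicitly; it is an illustration, not the actual implementation. Encoding the same text as UTF-8 versus ISO-8859-1 yields different byte counts for non-ASCII characters.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical sketch, not the TextDelimited implementation: wraps raw streams in a
// reader/writer that decode/encode with a configurable charset.
final class CharsetStreams {
  static final String DEFAULT_CHARSET = "UTF-8"; // the documented default encoding

  static LineNumberReader createInput(InputStream in, String charsetName) {
    return new LineNumberReader(new InputStreamReader(in, Charset.forName(charsetName)));
  }

  static PrintWriter createOutput(OutputStream out, String charsetName) {
    return new PrintWriter(new BufferedWriter(new OutputStreamWriter(out, Charset.forName(charsetName))));
  }

  // Encodes text without a line terminator, so the byte count is platform-independent.
  static byte[] write(String text, String charsetName) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (PrintWriter writer = createOutput(bytes, charsetName)) {
      writer.print(text);
    } // close() flushes the buffered writer into the byte array
    return bytes.toByteArray();
  }

  static String readFirstLine(byte[] data, String charsetName) {
    try (LineNumberReader reader = createInput(new ByteArrayInputStream(data), charsetName)) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String text = "caf\u00e9\t1";
    System.out.println(write(text, DEFAULT_CHARSET).length); // 7
    System.out.println(write(text, "ISO-8859-1").length);    // 6
  }
}
```

Reading bytes back with a different charset than the one used to write them would garble non-ASCII characters, which is why the reader and writer share one charsetName.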
To override field and line parsing behaviors, sub-class DelimitedParser or provide a FieldTypeResolver implementation.
Note that there should be no expectation that TextDelimited, or specifically DelimitedParser, can handle all delimited and quoted combinations reliably. Attempting to do so would impair its performance and maintainability.
Further, corrupted files are not supported. Corrupted files may result in exceptions or could trigger edge cases in the underlying Java regular expression engine.
A large part of Cascading was designed to help users clean data. Thus, when faced with this problem, the recommendation is to create Flows that are responsible for cleansing large data sets.
DelimitedParser may be sub-classed and extended if necessary.
See Also: TextLine, Serialized Form
Field Summary | |
---|---|
static String | DEFAULT_CHARSET |
Constructor Summary | |
---|---|
TextDelimited() | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter. |
TextDelimited(boolean hasHeader, DelimitedParser delimitedParser) | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(boolean hasHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter. |
TextDelimited(boolean hasHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote. |
TextDelimited(DelimitedParser delimitedParser) | Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing. |
TextDelimited(Fields fields) | Constructor TextDelimited creates a new TextDelimited instance with TAB as the default delimiter. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe, String charsetName) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String charsetName, DelimitedParser delimitedParser) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, DelimitedParser delimitedParser) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe, String charsetName) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, String charsetName) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
Methods inherited from class cascading.scheme.Scheme |
---|
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, presentSinkFieldsInternal, presentSourceFieldsInternal, retrieveSinkFields, setNumSinkParts, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final String DEFAULT_CHARSET
Constructor Detail |
---|
public TextDelimited()
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using TAB as the default delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
@ConstructorProperties(value={"hasHeader","delimiter"}) public TextDelimited(boolean hasHeader, String delimiter)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"hasHeader","delimiter","quote"}) public TextDelimited(boolean hasHeader, String delimiter, String quote)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimiter and quote.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"hasHeader","delimitedParser"}) public TextDelimited(boolean hasHeader, DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value="delimitedParser") public TextDelimited(DelimitedParser delimitedParser)
Constructor TextDelimited creates a new TextDelimited instance sourcing Fields.UNKNOWN, sinking Fields.ALL and using the given delimitedParser instance for parsing.
Use this constructor if the source and sink fields will be resolved during planning, for example, when used with a Checkpoint Tap.
This constructor will set the skipHeader and writeHeader values to true.
delimitedParser - of type DelimitedParser
@ConstructorProperties(value="fields") public TextDelimited(Fields fields)
fields - of type Fields
@ConstructorProperties(value={"fields","delimiter"}) public TextDelimited(Fields fields, String delimiter)
fields - of type Fields
delimiter - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
@ConstructorProperties(value={"fields","delimiter","types"}) public TextDelimited(Fields fields, String delimiter, Class[] types)
fields - of type Fields
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, Class[] types)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, Class[] types)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types"}) public TextDelimited(Fields fields, String delimiter, String quote, Class[] types)
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe)
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, Class[] types, boolean safe, String charsetName)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, String quote, Class[] types, boolean safe)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","delimiter","quote"}) public TextDelimited(Fields fields, String delimiter, String quote)
fields - of type Fields
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
@ConstructorProperties(value={"fields","hasHeader","delimiter","quote","charsetName"}) public TextDelimited(Fields fields, boolean hasHeader, String delimiter, String quote, String charsetName)
fields - of type Fields
hasHeader - of type boolean
delimiter - of type String
quote - of type String
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","strict","quote","types","safe"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimiter","strict","quote","types","safe","charsetName"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe, String charsetName)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean
charsetName - of type String
@ConstructorProperties(value={"fields","skipHeader","writeHeader","delimitedParser"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, DelimitedParser delimitedParser)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","hasHeader","delimitedParser"}) public TextDelimited(Fields fields, boolean hasHeader, DelimitedParser delimitedParser)
fields - of type Fields
hasHeader - of type boolean
delimitedParser - of type DelimitedParser
@ConstructorProperties(value={"fields","skipHeader","writeHeader","charsetName","delimitedParser"}) public TextDelimited(Fields fields, boolean skipHeader, boolean writeHeader, String charsetName, DelimitedParser delimitedParser)
fields - of type Fields
skipHeader - of type boolean
writeHeader - of type boolean
charsetName - of type String
delimitedParser - of type DelimitedParser
Method Detail |
---|
public String getCharsetName()
public String getDelimiter()
public String getQuote()
public LineNumberReader createInput(InputStream inputStream)
public PrintWriter createOutput(OutputStream outputStream)
public void setSinkFields(Fields sinkFields)
setSinkFields
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void setSourceFields(Fields sourceFields)
setSourceFields
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public boolean isSymmetrical()
isSymmetrical
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public Fields retrieveSourceFields(FlowProcess<Properties> process, Tap tap)
retrieveSourceFields
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void presentSourceFields(FlowProcess<Properties> process, Tap tap, Fields fields)
presentSourceFields
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void presentSinkFields(FlowProcess<Properties> flowProcess, Tap tap, Fields fields)
presentSinkFields
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void sourceConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
sourceConfInit
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void sourcePrepare(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) throws IOException
sourcePrepare
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Throws: IOException
public boolean source(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) throws IOException
source
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Throws: IOException
public void sourceCleanup(FlowProcess<Properties> flowProcess, SourceCall<LineNumberReader,InputStream> sourceCall) throws IOException
sourceCleanup
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Throws: IOException
public void sinkConfInit(FlowProcess<Properties> flowProcess, Tap<Properties,InputStream,OutputStream> tap, Properties conf)
sinkConfInit
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void sinkPrepare(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
sinkPrepare
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
public void sink(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall) throws IOException
sink
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>
Throws: IOException
public void sinkCleanup(FlowProcess<Properties> flowProcess, SinkCall<PrintWriter,OutputStream> sinkCall)
sinkCleanup
in class Scheme<Properties,InputStream,OutputStream,LineNumberReader,PrintWriter>