java.lang.Object
  cascading.scheme.Scheme
    cascading.scheme.TextLine
      cascading.scheme.TextDelimited
public class TextDelimited
Class TextDelimited is a sub-class of TextLine. It provides direct support for delimited text files, like TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.
By default, this Scheme is both strict and safe.
Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, a Tuple will be returned with null values for the missing fields.
Safe means that if a field cannot be coerced into an expected type, a null will be used for the value. If safe is false, a TapException will be thrown.
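The strict and safe rules can be illustrated with a small stand-alone sketch. It does not use Cascading: `StrictSafeSketch`, `parse`, and `ParseFailure` (a stand-in for TapException) are hypothetical names, only String, Integer and Double are handled, and ignoring extra fields when strict is false is an assumption of the sketch, not documented behavior.

```java
/** Stand-alone illustration of the strict and safe rules; this is not Cascading code. */
final class StrictSafeSketch {

    /** Hypothetical stand-in for cascading's TapException. */
    static final class ParseFailure extends RuntimeException {
        ParseFailure(String message) {
            super(message);
        }
    }

    /**
     * Splits a line and coerces each value to the requested type.
     * Only String, Integer and Double are handled in this sketch.
     */
    static Object[] parse(String line, String delimiterRegex, Class<?>[] types,
                          boolean strict, boolean safe) {
        // A limit of -1 keeps trailing empty values instead of dropping them.
        String[] raw = line.split(delimiterRegex, -1);

        if (strict && raw.length != types.length) {
            throw new ParseFailure("expected " + types.length + " fields, found " + raw.length);
        }

        // Not strict: missing fields stay null. Assumption: extra fields are ignored here.
        Object[] values = new Object[types.length];
        for (int i = 0; i < types.length && i < raw.length; i++) {
            values[i] = coerce(raw[i], types[i], safe);
        }
        return values;
    }

    private static Object coerce(String text, Class<?> type, boolean safe) {
        // Simplification: an empty field is always null in this sketch.
        if (text.isEmpty()) {
            return null;
        }
        if (type == String.class) {
            return text;
        }
        try {
            if (type == Integer.class) {
                return Integer.valueOf(text);
            }
            if (type == Double.class) {
                return Double.valueOf(text);
            }
        } catch (NumberFormatException e) {
            if (safe) {
                return null; // safe: a value that cannot be coerced becomes null
            }
            throw new ParseFailure("cannot coerce '" + text + "' to " + type.getSimpleName());
        }
        throw new IllegalArgumentException("type not supported by this sketch: " + type);
    }
}
```

With types `{Integer.class, String.class}`, the line `"1"` yields `[1, null]` when strict is false and raises `ParseFailure` when strict is true; the line `"x\tabc"` yields `[null, "abc"]` when safe is true and raises `ParseFailure` when safe is false.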
Also by default, quote strings are not searched for, to improve processing speed. If a file is COMMA delimited but may have COMMAs in a value, the whole value should be surrounded by the quote string, typically double quotes (").
Note that all empty fields in a line will be returned as null unless coerced into a new type.
This Scheme may source/sink Fields.ALL. When Fields.ALL is given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A type array may not be given either, so all values will be returned as Strings.
See Also: TextLine, Serialized Form
Nested Class Summary |
---|
Nested classes/interfaces inherited from class cascading.scheme.TextLine |
---|
TextLine.Compress |
Field Summary | |
---|---|
protected Pattern | cleanPattern: Field cleanPattern |
protected Pattern | escapePattern: Field escapePattern |
protected Pattern | splitPattern: Field splitPattern |
Fields inherited from class cascading.scheme.TextLine |
---|
DEFAULT_SOURCE_FIELDS |
Constructor Summary | |
---|---|
TextDelimited(Fields fields, boolean skipHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
Method Summary | |
---|---|
static Object[] | cleanSplit(Object[] split, Pattern cleanPattern, Pattern escapePattern, String quote): Method cleanSplit will return a quote-free array of String values; the given split array will be updated in place. |
static Pattern | createCleanPatternFor(String quote): Method createCleanPatternFor creates a regex Pattern for removing quote characters from a String. |
static Pattern | createEscapePatternFor(String quote): Method createEscapePatternFor creates a regex Pattern for cleaning quote escapes from a String. |
static String[] | createSplit(String value, Pattern splitPattern, int numValues): Method createSplit will split the given value with the given splitPattern. |
static Pattern | createSplitPatternFor(String delimiter, String quote): Method createSplitPatternFor creates a regex Pattern for splitting a line of text into its component parts using the given delimiter and quote Strings. |
void | sink(TupleEntry tupleEntry, OutputCollector outputCollector): Method sink writes out the given Tuple instance to the outputCollector. |
Tuple | source(Object key, Object value): Method source takes the given Hadoop key and value and returns a new Tuple instance. |
Methods inherited from class cascading.scheme.TextLine |
---|
getSinkCompression, setSinkCompression, sinkInit, sourceInit |
Methods inherited from class cascading.scheme.Scheme |
---|
equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, isWriteDirect, setNumSinkParts, setSinkFields, setSourceFields, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected Pattern splitPattern
protected Pattern cleanPattern
protected Pattern escapePattern
Constructor Detail |
---|
@ConstructorProperties(value={"fields","delimiter"})
public TextDelimited(Fields fields, String delimiter)
Parameters:
fields - of type Fields
delimiter - of type String

@ConstructorProperties(value={"fields","skipHeader","delimiter"})
public TextDelimited(Fields fields, boolean skipHeader, String delimiter)
Parameters:
fields - of type Fields
skipHeader - of type boolean
delimiter - of type String

@ConstructorProperties(value={"fields","delimiter","types"})
public TextDelimited(Fields fields, String delimiter, Class[] types)
Parameters:
fields - of type Fields
delimiter - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","skipHeader","delimiter","types"})
public TextDelimited(Fields fields, boolean skipHeader, String delimiter, Class[] types)
Parameters:
fields - of type Fields
skipHeader - of type boolean
delimiter - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","delimiter","quote","types"})
public TextDelimited(Fields fields, String delimiter, String quote, Class[] types)
Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","skipHeader","delimiter","quote","types"})
public TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote, Class[] types)
Parameters:
fields - of type Fields
skipHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","delimiter","quote","types","safe"})
public TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe)
Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

@ConstructorProperties(value={"fields","skipHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote, Class[] types, boolean safe)
Parameters:
fields - of type Fields
skipHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

@ConstructorProperties(value={"fields","sinkCompression","delimiter"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String

@ConstructorProperties(value={"fields","sinkCompression","delimiter","types"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","types"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, Class[] types)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","sinkCompression","delimiter","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types, boolean safe)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
types - of type Class[]
safe - of type boolean

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, Class[] types, boolean safe)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String
types - of type Class[]
safe - of type boolean

@ConstructorProperties(value={"fields","delimiter","quote"})
public TextDelimited(Fields fields, String delimiter, String quote)
Parameters:
fields - of type Fields
delimiter - of type String
quote - of type String

@ConstructorProperties(value={"fields","skipHeader","delimiter","quote"})
public TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote)
Parameters:
fields - of type Fields
skipHeader - of type boolean
delimiter - of type String
quote - of type String

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","quote"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String

public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String
quote - of type String

@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","quote","types"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote, Class[] types)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]

@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types, boolean safe)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote, Class[] types, boolean safe)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String
quote - of type String
types - of type Class[]
safe - of type boolean

@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","strict","quote","types","safe"})
public TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe)
Parameters:
fields - of type Fields
sinkCompression - of type Compress
skipHeader - of type boolean
delimiter - of type String
strict - of type boolean
quote - of type String
types - of type Class[]
safe - of type boolean

Method Detail |
---|
public static Pattern createEscapePatternFor(String quote)
Method createEscapePatternFor creates a regex Pattern for cleaning quote escapes from a String.
If quote is null or empty, a null value will be returned.
Parameters:
quote - of type String
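One way such an escape pattern could look is sketched below. It assumes that an escaped quote is written as a doubled quote (the common CSV convention); the page does not state which escape form the class recognizes, so this is an illustration and not the library's implementation. `EscapePatternSketch`, `escapePatternFor`, and `unescape` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; assumes a doubled quote is the escape form. */
final class EscapePatternSketch {

    /** Returns a Pattern matching a doubled quote, or null when quote is null or empty. */
    static Pattern escapePatternFor(String quote) {
        if (quote == null || quote.isEmpty()) {
            return null;
        }
        // Pattern.quote makes the quote string literal, so regex metacharacters are safe.
        String literal = Pattern.quote(quote);
        return Pattern.compile(literal + literal);
    }

    /** Collapses each doubled quote in value into a single quote. */
    static String unescape(String value, Pattern escapePattern, String quote) {
        if (escapePattern == null) {
            return value; // nothing to clean when no quote was configured
        }
        return escapePattern.matcher(value).replaceAll(Matcher.quoteReplacement(quote));
    }
}
```

Returning null for a missing quote mirrors the documented contract, so callers can skip escape cleaning with a simple null check.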
public static Pattern createCleanPatternFor(String quote)
Method createCleanPatternFor creates a regex Pattern for removing quote characters from a String.
If quote is null or empty, a null value will be returned.
Parameters:
quote - of type String
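A minimal sketch of a quote-removing pattern follows, assuming that "removing quote characters" means stripping one leading and one trailing quote from a value. The regex used by the class itself is not shown on this page, and `CleanPatternSketch`, `cleanPatternFor`, and `clean` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; strips one pair of surrounding quotes. */
final class CleanPatternSketch {

    /** Returns a Pattern capturing the text between surrounding quotes, or null for no quote. */
    static Pattern cleanPatternFor(String quote) {
        if (quote == null || quote.isEmpty()) {
            return null;
        }
        String literal = Pattern.quote(quote);
        // DOTALL lets the captured group span embedded line breaks.
        return Pattern.compile("^" + literal + "(.*)" + literal + "$", Pattern.DOTALL);
    }

    /** Returns value without its surrounding quotes, or value unchanged if it is not quoted. */
    static String clean(String value, Pattern cleanPattern) {
        if (cleanPattern == null) {
            return value;
        }
        Matcher matcher = cleanPattern.matcher(value);
        return matcher.matches() ? matcher.group(1) : value;
    }
}
```

For example, cleaning `"a,b"` (including its double quotes) with the pattern for `"` gives `a,b`, while an unquoted value such as `plain` is returned as is.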
public static Pattern createSplitPatternFor(String delimiter, String quote)
Method createSplitPatternFor creates a regex Pattern for splitting a line of text into its component parts using the given delimiter and quote Strings. quote may be null.
Parameters:
delimiter - of type String
quote - of type String
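A quote-aware split pattern can be sketched with a lookahead that accepts a delimiter only when the rest of the line contains an even number of quotes, which means the delimiter is not inside a quoted value. This is one common technique and is not taken from the class's source; `SplitPatternSketch` and `splitPatternFor` are hypothetical names, and the sketch has only been reasoned through for a single-character quote.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; splits on delimiters that are outside quoted values. */
final class SplitPatternSketch {

    static Pattern splitPatternFor(String delimiter, String quote) {
        // Pattern.quote treats the delimiter literally, so "|" or "." are safe.
        String d = Pattern.quote(delimiter);
        if (quote == null || quote.isEmpty()) {
            return Pattern.compile(d); // no quote handling requested
        }
        String q = Pattern.quote(quote);
        // Any run of characters that does not start a quote.
        String notQuote = "(?:(?!" + q + ").)*";
        // Zero or more complete quoted sections (balanced pairs of quotes).
        String quotedPairs = "(?:" + notQuote + q + notQuote + q + ")*";
        // Match the delimiter only if the remainder holds balanced quotes.
        return Pattern.compile(d + "(?=" + quotedPairs + notQuote + "$)", Pattern.DOTALL);
    }
}
```

Splitting `a,"b,c",d` with the pattern for delimiter `,` and quote `"` yields three parts, `a`, `"b,c"` and `d`; the surrounding quotes remain and would be removed in a later cleaning step. The lookahead rescans the rest of the line for each delimiter, which is consistent with the note that quote searching is off by default for speed.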
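public Tuple source(Object key, Object value)
Description copied from class: Scheme
Method source takes the given Hadoop key and value and returns a new Tuple instance.
Overrides: source in class TextLine
Parameters:
key - of type WritableComparable
value - of type Writable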
public static String[] createSplit(String value, Pattern splitPattern, int numValues)
Method createSplit will split the given value with the given splitPattern.
Parameters:
value - of type String
splitPattern - of type Pattern
numValues - of type int
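The page does not say how numValues is used. The sketch below assumes it is passed as the limit argument of `java.util.regex.Pattern.split(CharSequence, int)`, and shows how that limit affects the result; `CreateSplitSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; assumes numValues acts as the split limit. */
final class CreateSplitSketch {

    static String[] split(String value, Pattern splitPattern, int numValues) {
        // limit > 0: at most that many parts, the last part keeps the remainder
        // limit == 0: as many parts as possible, trailing empty strings dropped
        // limit < 0: as many parts as possible, trailing empty strings kept
        return splitPattern.split(value, numValues);
    }
}
```

Under that assumption, splitting `a,b,c,d` on `,` with a limit of 2 gives `a` and `b,c,d`. Splitting `a,,` gives one part with a limit of 0 and three parts with a limit of -1, which matters when trailing empty fields must be counted against the expected number of fields.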
public static Object[] cleanSplit(Object[] split, Pattern cleanPattern, Pattern escapePattern, String quote)
Method cleanSplit will return a quote-free array of String values; the given split array will be updated in place.
If cleanPattern is null, quote cleaning will not be performed, but all empty String values will be replaced with a null value.
Parameters:
split - of type Object[]
cleanPattern - of type Pattern
escapePattern - of type Pattern
quote - of type String
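The in-place cleaning described above can be sketched as follows. The sketch assumes the clean pattern captures the text between surrounding quotes in group 1 and that the escape pattern matches a doubled quote; both are assumptions for illustration, and `CleanSplitSketch` is a hypothetical name rather than the class's own code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; cleans quotes in place and turns empty values into null. */
final class CleanSplitSketch {

    static Object[] cleanSplit(Object[] split, Pattern cleanPattern,
                               Pattern escapePattern, String quote) {
        for (int i = 0; i < split.length; i++) {
            String text = (String) split[i];
            if (text != null && cleanPattern != null) {
                Matcher matcher = cleanPattern.matcher(text);
                if (matcher.matches()) {
                    // Drop the surrounding quotes (assumes group 1 is the inner text).
                    text = matcher.group(1);
                    if (escapePattern != null) {
                        // Collapse escaped quotes inside the value to a single quote.
                        text = escapePattern.matcher(text)
                                .replaceAll(Matcher.quoteReplacement(quote));
                    }
                }
            }
            // Empty values become null whether or not quote cleaning ran.
            split[i] = (text == null || text.isEmpty()) ? null : text;
        }
        return split; // the same array instance, updated in place
    }
}
```

Because the array is modified in place, the returned reference is the array that was passed in. With a null cleanPattern only the empty-to-null replacement runs, so quoted values keep their quotes.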
public void sink(TupleEntry tupleEntry, OutputCollector outputCollector) throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple instance to the outputCollector.
Overrides: sink in class TextLine
Parameters:
outputCollector - of type OutputCollector
Throws:
IOException