java.lang.Object
  cascading.scheme.Scheme
    cascading.scheme.TextLine
      cascading.scheme.TextDelimited

public class TextDelimited
extends TextLine
Class TextDelimited is a sub-class of TextLine. It provides direct support for delimited text files, such as TAB (\t) or COMMA (,) delimited files. It also optionally allows for quoted values.

By default, this Scheme is both strict and safe.

Strict means that if a line of text does not parse into the expected number of fields, this class will throw a TapException. If strict is false, a Tuple will be returned with null values for the missing fields.

Safe means that if a field cannot be coerced into the expected type, a null will be used for the value. If safe is false, a TapException will be thrown.

Also by default, quote strings are not searched for, to improve processing speed. If a file is COMMA delimited but a value may contain commas, the whole value should be surrounded by the quote string, typically double quotes (").

Note that all empty fields in a line will be returned as null unless coerced into a new type.

This Scheme may source/sink Fields.ALL. When Fields.ALL is given on the constructor, the new instance will automatically default to strict == false, as the number of fields parsed is arbitrary or unknown. A types array may not be given in this case either, so all values will be returned as Strings.
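The strict and safe rules above can be illustrated outside Cascading with a small, self-contained Java sketch. It is not the TextDelimited implementation: the class `StrictSafeSketch`, its `parse` method, and the `StrictSafeException` stand-in for TapException are hypothetical. Quote handling is omitted, only `Integer` and `Double` coercion is shown, and empty fields are simply left as null without attempting coercion.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration of the strict/safe rules; not Cascading's implementation.
final class StrictSafeSketch {

  // Stand-in for Cascading's TapException (assumption for this sketch only).
  static final class StrictSafeException extends RuntimeException {
    StrictSafeException(String message) { super(message); }
  }

  // Splits on a literal delimiter (no quote handling) and coerces each field.
  // strict: a wrong field count throws; otherwise missing fields stay null.
  // safe: a failed coercion yields null; otherwise it throws.
  static Object[] parse(String line, String delimiter, Class<?>[] types,
                        boolean strict, boolean safe) {
    // limit -1 keeps trailing empty fields, so "a," yields two fields
    String[] split = line.split(Pattern.quote(delimiter), -1);

    if (strict && split.length != types.length)
      throw new StrictSafeException("expected " + types.length + " fields but found " + split.length);

    Object[] result = new Object[types.length];
    for (int i = 0; i < types.length && i < split.length; i++) {
      if (split[i].isEmpty())
        continue; // empty fields are left as null in this sketch
      result[i] = coerce(split[i], types[i], safe);
    }
    return result; // extra fields on a long line are ignored when strict is false
  }

  private static Object coerce(String value, Class<?> type, boolean safe) {
    try {
      if (type == Integer.class) return Integer.valueOf(value);
      if (type == Double.class) return Double.valueOf(value);
      return value; // other types stay as Strings in this sketch
    } catch (NumberFormatException e) {
      if (safe) return null;
      throw new StrictSafeException("cannot coerce '" + value + "' to " + type.getSimpleName());
    }
  }

  public static void main(String[] args) {
    Class<?>[] types = {String.class, Integer.class};
    System.out.println(Arrays.toString(parse("a,x", ",", types, true, true))); // [a, null]
    System.out.println(Arrays.toString(parse("a", ",", types, false, true)));  // [a, null]
  }
}
```

With `strict` set to true, the second call (a one-field line against two declared types) throws instead, and with `safe` set to false the first call throws on the non-numeric `x`.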
See Also:
TextLine, Serialized Form

| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class cascading.scheme.TextLine: TextLine.Compress |
| Field Summary | |
|---|---|
| protected Pattern | cleanPattern |
| protected Pattern | escapePattern |
| protected Pattern | splitPattern |

| Fields inherited from class cascading.scheme.TextLine |
|---|
| DEFAULT_SOURCE_FIELDS |
| Constructor Summary | |
|---|---|
| TextDelimited(Fields fields, boolean skipHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, boolean skipHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, boolean skipHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, boolean strict, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, boolean skipHeader, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types) | Constructor TextDelimited creates a new TextDelimited instance. |
| TextDelimited(Fields fields, TextLine.Compress sinkCompression, String delimiter, String quote, Class[] types, boolean safe) | Constructor TextDelimited creates a new TextDelimited instance. |
| Method Summary | | |
|---|---|---|
| static Object[] | cleanSplit(Object[] split, Pattern cleanPattern, Pattern escapePattern, String quote) | Method cleanSplit will return a quote-free array of String values; the given split array will be updated in place. |
| static Pattern | createCleanPatternFor(String quote) | Method createCleanPatternFor creates a regex Pattern for removing quote characters from a String. |
| static Pattern | createEscapePatternFor(String quote) | Method createEscapePatternFor creates a regex Pattern for cleaning quote escapes from a String. |
| static String[] | createSplit(String value, Pattern splitPattern, int numValues) | Method createSplit will split the given value with the given splitPattern. |
| static Pattern | createSplitPatternFor(String delimiter, String quote) | Method createSplitPatternFor creates a regex Pattern for splitting a line of text into its component parts using the given delimiter and quote Strings. |
| void | sink(TupleEntry tupleEntry, OutputCollector outputCollector) | Method sink writes out the given Tuple instance to the outputCollector. |
| Tuple | source(Object key, Object value) | Method source takes the given Hadoop key and value and returns a new Tuple instance. |
| Methods inherited from class cascading.scheme.TextLine |
|---|
| getSinkCompression, setSinkCompression, sinkInit, sourceInit |

| Methods inherited from class cascading.scheme.Scheme |
|---|
| equals, getNumSinkParts, getSinkFields, getSourceFields, getTrace, hashCode, isSink, isSource, isSymmetrical, isWriteDirect, setNumSinkParts, setSinkFields, setSourceFields, toString |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
protected Pattern splitPattern
protected Pattern cleanPattern
protected Pattern escapePattern
| Constructor Detail |
|---|
@ConstructorProperties(value={"fields","delimiter"})
public TextDelimited(Fields fields,
String delimiter)
Parameters: fields - of type Fields; delimiter - of type String
@ConstructorProperties(value={"fields","skipHeader","delimiter"})
public TextDelimited(Fields fields,
boolean skipHeader,
String delimiter)
Parameters: fields - of type Fields; skipHeader - of type boolean; delimiter - of type String
@ConstructorProperties(value={"fields","delimiter","types"})
public TextDelimited(Fields fields,
String delimiter,
Class[] types)
Parameters: fields - of type Fields; delimiter - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","delimiter","types"})
public TextDelimited(Fields fields,
boolean skipHeader,
String delimiter,
Class[] types)
Parameters: fields - of type Fields; skipHeader - of type boolean; delimiter - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types"})
public TextDelimited(Fields fields,
String delimiter,
String quote,
Class[] types)
Parameters: fields - of type Fields; delimiter - of type String; quote - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","skipHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
boolean skipHeader,
String delimiter,
String quote,
Class[] types)
Parameters: fields - of type Fields; skipHeader - of type boolean; delimiter - of type String; quote - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
String delimiter,
String quote,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; delimiter - of type String; quote - of type String; types - of type Class[]; safe - of type boolean
@ConstructorProperties(value={"fields","skipHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
boolean skipHeader,
String delimiter,
String quote,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; skipHeader - of type boolean; delimiter - of type String; quote - of type String; types - of type Class[]; safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","delimiter"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
String delimiter)
Parameters: fields - of type Fields; sinkCompression - of type Compress; delimiter - of type String
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String
@ConstructorProperties(value={"fields","sinkCompression","delimiter","types"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
String delimiter,
Class[] types)
Parameters: fields - of type Fields; sinkCompression - of type Compress; delimiter - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","types"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter,
Class[] types)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","delimiter","types","safe"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
String delimiter,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; sinkCompression - of type Compress; delimiter - of type String; types - of type Class[]; safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","types","safe"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String; types - of type Class[]; safe - of type boolean
@ConstructorProperties(value={"fields","delimiter","quote"})
public TextDelimited(Fields fields,
String delimiter,
String quote)
Parameters: fields - of type Fields; delimiter - of type String; quote - of type String
@ConstructorProperties(value={"fields","skipHeader","delimiter","quote"})
public TextDelimited(Fields fields,
boolean skipHeader,
String delimiter,
String quote)
Parameters: fields - of type Fields; skipHeader - of type boolean; delimiter - of type String; quote - of type String
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","quote"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
String delimiter,
String quote)
Parameters: fields - of type Fields; sinkCompression - of type Compress; delimiter - of type String; quote - of type String
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter,
String quote)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String; quote - of type String
@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
String delimiter,
String quote,
Class[] types)
Parameters: fields - of type Fields; sinkCompression - of type Compress; delimiter - of type String; quote - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","quote","types"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter,
String quote,
Class[] types)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String; quote - of type String; types - of type Class[]
@ConstructorProperties(value={"fields","sinkCompression","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
String delimiter,
String quote,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; sinkCompression - of type Compress; delimiter - of type String; quote - of type String; types - of type Class[]; safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","quote","types","safe"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter,
String quote,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String; quote - of type String; types - of type Class[]; safe - of type boolean
@ConstructorProperties(value={"fields","sinkCompression","skipHeader","delimiter","strict","quote","types","safe"})
public TextDelimited(Fields fields,
TextLine.Compress sinkCompression,
boolean skipHeader,
String delimiter,
boolean strict,
String quote,
Class[] types,
boolean safe)
Parameters: fields - of type Fields; sinkCompression - of type Compress; skipHeader - of type boolean; delimiter - of type String; strict - of type boolean; quote - of type String; types - of type Class[]; safe - of type boolean

| Method Detail |
|---|
public static Pattern createEscapePatternFor(String quote)
Method createEscapePatternFor creates a regex Pattern for cleaning quote escapes from a String.
If quote is null or empty, a null value will be returned.
Parameters: quote - of type String
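A minimal sketch of what an escape pattern might look like, assuming the common CSV convention that a literal quote inside a quoted value is written twice (`""`). This page does not state which escape convention TextDelimited uses, so treat `EscapePatternSketch` and its `unescape` helper as hypothetical; only the null/empty rule is taken from the description above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the pattern Cascading actually compiles.
final class EscapePatternSketch {

  // Assumption: an escaped quote is the quote string written twice, e.g. "" for ".
  static Pattern createEscapePatternFor(String quote) {
    if (quote == null || quote.isEmpty())
      return null; // mirrors the documented null/empty behavior
    return Pattern.compile(Pattern.quote(quote + quote)); // match the doubled quote literally
  }

  // Collapses every escaped quote back to a single quote.
  static String unescape(String value, Pattern escapePattern, String quote) {
    if (escapePattern == null)
      return value;
    // quoteReplacement stops '$' or '\' in the quote string being read as group references
    return escapePattern.matcher(value).replaceAll(Matcher.quoteReplacement(quote));
  }

  public static void main(String[] args) {
    Pattern escape = createEscapePatternFor("\"");
    System.out.println(unescape("say \"\"hi\"\"", escape, "\"")); // say "hi"
  }
}
```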
public static Pattern createCleanPatternFor(String quote)
Method createCleanPatternFor creates a regex Pattern for removing quote characters from a String.
If quote is null or empty, a null value will be returned.
Parameters: quote - of type String
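As a hedged illustration, a clean pattern can capture everything between one leading and one trailing quote, leaving unquoted values untouched. `CleanPatternSketch` is hypothetical, and the exact regex Cascading compiles is not given on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; assumes "cleaning" means stripping one surrounding pair of quotes.
final class CleanPatternSketch {

  static Pattern createCleanPatternFor(String quote) {
    if (quote == null || quote.isEmpty())
      return null; // mirrors the documented null/empty behavior
    String q = Pattern.quote(quote);
    // (?s) lets the captured body span embedded newlines, if any
    return Pattern.compile("(?s)^" + q + "(.*)" + q + "$");
  }

  static String clean(String value, Pattern cleanPattern) {
    if (cleanPattern == null)
      return value;
    Matcher m = cleanPattern.matcher(value);
    return m.matches() ? m.group(1) : value; // unquoted values pass through unchanged
  }

  public static void main(String[] args) {
    Pattern clean = createCleanPatternFor("\"");
    System.out.println(clean("\"a,b\"", clean)); // a,b
  }
}
```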
public static Pattern createSplitPatternFor(String delimiter,
String quote)
Method createSplitPatternFor creates a regex Pattern for splitting a line of text into its component parts using the given delimiter and quote Strings. The quote argument may be null.
Parameters: delimiter - of type String; quote - of type String
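Splitting on a delimiter while ignoring delimiters inside quoted values is commonly done with a lookahead that requires an even number of quote characters between the delimiter and the end of the line. The sketch below shows that technique under two assumptions: the quote is a single character, and quotes are balanced on the line. `SplitPatternSketch` is a hypothetical name, not the pattern TextDelimited actually builds.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch of a quote-aware split pattern.
final class SplitPatternSketch {

  static Pattern createSplitPatternFor(String delimiter, String quote) {
    String d = Pattern.quote(delimiter); // treat the delimiter literally, e.g. "|" or "."
    if (quote == null || quote.isEmpty())
      return Pattern.compile(d); // no quote handling: split on every delimiter
    if (quote.length() != 1)
      throw new IllegalArgumentException("this sketch supports one-character quotes only");

    // Backslash-escape punctuation so it is literal both inside and outside [...].
    char c = quote.charAt(0);
    String q = Character.isLetterOrDigit(c) ? quote : "\\" + c;
    String notQ = "[^" + q + "]";

    // Match the delimiter only if an even number of quotes follows it up to end of line,
    // i.e. only if the delimiter sits outside any quoted value.
    return Pattern.compile(d + "(?=(?:" + notQ + "*" + q + notQ + "*" + q + ")*" + notQ + "*$)");
  }

  public static void main(String[] args) {
    Pattern p = createSplitPatternFor(",", "\"");
    System.out.println(Arrays.toString(p.split("a,\"b,c\",d"))); // [a, "b,c", d]
  }
}
```

The quotes survive the split; removing them is the job of a separate cleaning step such as the one cleanSplit describes. The lookahead rescans the rest of the line at every delimiter, so this approach trades speed for correctness, which is consistent with quote handling being optional.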
public Tuple source(Object key,
Object value)
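Description copied from class: Scheme
Method source takes the given Hadoop key and value and returns a new Tuple instance.
Overrides: source in class TextLine
Parameters: key - of type WritableComparable; value - of type Writable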
public static String[] createSplit(String value,
Pattern splitPattern,
int numValues)
Method createSplit will split the given value with the given splitPattern.
Parameters: value - of type String; splitPattern - of type Pattern; numValues - of type int
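The numValues argument matters because of how `java.util.regex.Pattern.split(CharSequence, int)` treats its limit: a positive limit caps the number of pieces and keeps trailing empty strings, while a limit of 0 drops them. The sketch assumes, without confirmation from this page, that createSplit forwards numValues as that limit; `CreateSplitSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch; the real createSplit may differ.
final class CreateSplitSketch {

  // Assumption: numValues is passed through as the limit of Pattern.split.
  static String[] createSplit(String value, Pattern splitPattern, int numValues) {
    return splitPattern.split(value, numValues);
  }

  public static void main(String[] args) {
    Pattern comma = Pattern.compile(",");
    System.out.println(Arrays.toString(createSplit("a,,", comma, 3)));     // [a, , ]
    System.out.println(Arrays.toString(comma.split("a,,", 0)));            // [a]
    System.out.println(Arrays.toString(createSplit("a,b,c,d", comma, 2))); // [a, b,c,d]
  }
}
```

One plausible reason to pass the expected field count: with a limit of 0, a row ending in empty fields such as `a,,` would come back as a single value and look too short to a strict check.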
public static Object[] cleanSplit(Object[] split,
Pattern cleanPattern,
Pattern escapePattern,
String quote)
Method cleanSplit will return a quote-free array of String values; the given split array will be updated in place.
If cleanPattern is null, quote cleaning will not be performed, but all empty String values will be replaced with a null value.
Parameters: split - of type Object[]; cleanPattern - of type Pattern; escapePattern - of type Pattern; quote - of type String
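A hypothetical re-creation of the contract described above, using the same parameter list. It assumes the clean pattern captures the unquoted body in group 1 and that the escape pattern matches a doubled quote; neither detail is confirmed by this page, and `CleanSplitSketch` is not Cascading code.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the documented cleanSplit contract.
final class CleanSplitSketch {

  static Object[] cleanSplit(Object[] split, Pattern cleanPattern, Pattern escapePattern, String quote) {
    for (int i = 0; i < split.length; i++) {
      if (!(split[i] instanceof String))
        continue; // leave non-String entries untouched
      String value = (String) split[i];

      if (cleanPattern != null) {
        Matcher m = cleanPattern.matcher(value);
        if (m.matches())
          value = m.group(1); // strip the surrounding quotes first
        if (escapePattern != null) // then collapse escaped quotes inside the body
          value = escapePattern.matcher(value).replaceAll(Matcher.quoteReplacement(quote));
      }

      split[i] = value.isEmpty() ? null : value; // empty Strings always become null
    }
    return split; // same array instance, updated in place
  }

  public static void main(String[] args) {
    Pattern clean = Pattern.compile("(?s)^\"(.*)\"$");
    Pattern escape = Pattern.compile(Pattern.quote("\"\""));
    Object[] values = {"\"a,b\"", "", "plain"};
    System.out.println(Arrays.toString(cleanSplit(values, clean, escape, "\""))); // [a,b, null, plain]
  }
}
```

Stripping the outer quotes before unescaping matters: unescaping first would turn a quoted empty value `""` into a lone `"` that the clean pattern no longer matches.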
public void sink(TupleEntry tupleEntry,
OutputCollector outputCollector)
throws IOException
Description copied from class: Scheme
Method sink writes out the given Tuple instance to the outputCollector.
Overrides: sink in class TextLine
Parameters: outputCollector - of type OutputCollector
Throws: IOException
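This page does not say how sink formats values, so the following is only a sketch of one common approach for delimited output: join the values with the delimiter, write null as an empty field, and, when a quote string is configured, wrap values containing the delimiter or the quote while doubling embedded quotes. `SinkLineSketch.formatLine` is hypothetical and deliberately avoids Cascading's TupleEntry and Hadoop's OutputCollector so it runs on its own.

```java
import java.util.StringJoiner;

// Hypothetical sketch of formatting one delimited output line.
final class SinkLineSketch {

  static String formatLine(Object[] values, String delimiter, String quote) {
    StringJoiner line = new StringJoiner(delimiter);
    for (Object value : values) {
      String text = value == null ? "" : value.toString(); // null becomes an empty field
      boolean needsQuotes = quote != null && !quote.isEmpty()
          && (text.contains(delimiter) || text.contains(quote));
      if (needsQuotes) // double embedded quotes, then wrap the whole value
        text = quote + text.replace(quote, quote + quote) + quote;
      line.add(text);
    }
    return line.toString();
  }

  public static void main(String[] args) {
    System.out.println(formatLine(new Object[]{"a,b", 1, null}, ",", "\"")); // "a,b",1,
  }
}
```

Writing a doubled quote on output is the mirror image of the escape handling assumed on input, so a line produced this way would round-trip through the split, clean, and unescape steps sketched earlier on this page.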