Cascading 3.0 User Guide
- 1 Introduction
- 2 Diving into the APIs
- 3 Cascading Basic Concepts
  - 3.1 Terminology
  - 3.2 Pipe Assemblies
  - 3.3 Pipes
  - 3.4 Platforms
  - 3.6 Sink Modes
  - 3.7 Flows
- 4 Tuple Fields
  - 4.1 Field Sets
  - 4.2 Field Algebra
  - 4.3 Field Typing
  - 4.4 Type Coercion
- 5 Pipe Assemblies
  - 5.2 Merge
  - 5.3 GroupBy
  - 5.4 CoGroup
  - 5.5 HashJoin
- 6 Flows
  - 6.1 Creating Flows from Pipe Assemblies
  - 6.3 Skipping Flows
  - 6.6 Runtime Metrics
- 7 Cascades
- 8 Configuring
  - 8.1 Introduction
- 9 Local Platform
- 10 The Apache Hadoop Platforms
  - 10.8 Source and Sink Taps
- 11 Apache Hadoop MapReduce Platform
  - 11.3 Building
- 12 Apache Tez Platform
  - 12.2 Building
- 13 Using and Developing Operations
  - 13.1 Introduction
  - 13.2 Functions
  - 13.3 Filters
  - 13.4 Aggregators
  - 13.5 Buffers
- 14 Custom Taps and Schemes
  - 14.1 Introduction
  - 14.2 Custom Taps
  - 14.3 Custom Schemes
- 15 Advanced Processing
  - 15.1 SubAssemblies
  - 15.2 Stream Assertions
  - 15.3 Failure Traps
  - 15.4 Checkpointing
  - 15.7 PartitionTaps
- 16 Built-In Operations
  - 16.1 Identity Function
  - 16.2 Debug Function
  - 16.4 Insert Function
  - 16.5 Text Functions
  - 16.8 XML Operations
  - 16.9 Assertions
  - 16.10 Logical Filter Operators
  - 16.11 Buffers
- 17 Built-in SubAssemblies
  - 17.2 Stream Shaping
- 18 Cascading Best Practices
  - 18.1 Unit Testing
  - 18.2 Flow Granularity
  - 18.7 Optimizing Joins
  - 18.8 Debugging Streams
  - 18.11 Fields Constants
  - 18.12 Checking the Source Code
- 19 Extending Cascading
  - 19.1 Scripting
- 20 Cookbook: Code Examples of Cascading Idioms
  - 20.1 Tuples and Fields
  - 20.2 Stream Shaping
  - 20.3 Common Operations
  - 20.4 Stream Ordering
  - 20.5 API Usage
- 21 The Cascading Process Planner
  - 21.1 FlowConnector
  - 21.2 RuleRegistrySet
  - 21.3 RuleRegistry