Cascading 3.0 User Guide

1 Introduction

1.1 What Is Cascading?

2 Diving into the APIs

2.1 Anatomy of a Word-Count Application

3 Cascading Basic Concepts

3.1 Terminology

3.3 Pipes

3.4 Platforms

3.7 Flows

4 Tuple Fields

4.1 Field Sets

5 Pipe Assemblies

5.1 Each and Every Pipes

5.2 Merge

5.3 GroupBy

5.4 CoGroup

5.5 HashJoin

6 Flows

6.1 Creating Flows from Pipe Assemblies

7 Cascades

7.1 Creating a Cascade

8 Configuring

8.1 Introduction

9 Local Platform

9.1 Building an Application

10 The Apache Hadoop Platforms

10.1 What Is Apache Hadoop?

11 Apache Hadoop MapReduce Platform

11.1 Configuring Applications

11.3 Building

12 Apache Tez Platform

12.1 Configuring Applications

12.2 Building

13 Using and Developing Operations

13.1 Introduction

13.2 Functions

13.3 Filters

13.5 Buffers

14 Custom Taps and Schemes

14.1 Introduction

15 Advanced Processing

15.1 SubAssemblies

16 Built-In Operations

16.1 Identity Function

16.9 Assertions

16.11 Buffers

17 Built-In SubAssemblies

17.1 Optimized Aggregations

18 Cascading Best Practices

18.1 Unit Testing

19 Extending Cascading

19.1 Scripting

20 Cookbook: Code Examples of Cascading Idioms

20.1 Tuples and Fields

20.5 API Usage

21 The Cascading Process Planner

21.1 FlowConnector