Course Outline
Introduction
Understanding Hadoop's Architecture and Key Concepts
Understanding the Hadoop Distributed File System (HDFS)
- Overview of HDFS and its Architectural Design
- Interacting with HDFS
  - Performing Basic File Operations on HDFS
  - Overview of the HDFS Command Reference
- Overview of Snakebite
  - Installing Snakebite
  - Using the Snakebite Client Library
  - Using the CLI Client
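As a rough illustration of HDFS's architectural design, the sketch below shows how a single file is split into fixed-size blocks and how much raw cluster storage its replicas consume. It assumes the Hadoop 2.x and later defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both can be overridden per cluster or per file, and the helper name `hdfs_footprint` is hypothetical.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Return (num_blocks, block_sizes, raw_bytes) for one file.

    Hypothetical helper; the defaults mirror dfs.blocksize=128 MB and
    dfs.replication=3, which a given cluster may override.
    """
    # Integer ceiling division: a partially filled final block still counts.
    num_blocks = (file_size_bytes + block_size - 1) // block_size
    sizes = [block_size] * num_blocks
    if num_blocks:
        # HDFS does not pad the last block, so it holds only the leftover bytes.
        sizes[-1] = file_size_bytes - block_size * (num_blocks - 1)
    # Each block is stored `replication` times on different DataNodes.
    raw_bytes = file_size_bytes * replication
    return num_blocks, sizes, raw_bytes


blocks, sizes, raw = hdfs_footprint(300 * MB)
print(blocks, [s // MB for s in sizes], raw // MB)  # 3 [128, 128, 44] 900
```

A 300 MB file therefore occupies three blocks, and with three replicas it consumes about 900 MB of raw disk across the cluster, not 3 × 3 × 128 MB, because the final block is not padded.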
Learning the MapReduce Programming Model with Python
- Overview of the MapReduce Programming Model
- Understanding Data Flow in the MapReduce Framework
  - Map
  - Shuffle and Sort
  - Reduce
- Using the Hadoop Streaming Utility
  - Understanding How the Hadoop Streaming Utility Works
  - Demo: Implementing the WordCount Application in Python
- Using the mrjob Library
  - Overview of mrjob
  - Installing mrjob
  - Demo: Implementing the WordCount Algorithm Using mrjob
  - Understanding How a MapReduce Job Written with the mrjob Library Works
  - Executing a MapReduce Application with mrjob
- Hands-on: Computing Top Salaries Using mrjob
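The Hadoop Streaming utility runs any executable that reads records from standard input and writes them to standard output, so a Python WordCount reduces to a mapper and a reducer that exchange tab-separated `key<TAB>value` lines. Between the two steps the framework performs the shuffle and sort, which means each reducer sees its keys in sorted order. The sketch below packages both steps in one file; the function names and the `map`/`reduce` command-line switch are hypothetical, and `run_local` simulates map, shuffle and sort, and reduce with Python's `sorted()` so the logic can be checked without a cluster.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map: emit one 'word<TAB>1' record per word in each input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce: sum the counts for each word.

    Relies on the shuffle-and-sort phase having ordered records by key,
    so all records for one word arrive consecutively.
    """
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Hypothetical helper: simulate map -> shuffle and sort -> reduce locally."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical CLI: `python3 wordcount.py map` or `python3 wordcount.py reduce`.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

For example, `run_local(["to be or", "not to be"])` returns `['be\t2', 'not\t1', 'or\t1', 'to\t2']`. On a cluster, the two commands would be passed to the streaming JAR through its `-mapper` and `-reducer` options; the JAR's location depends on the Hadoop installation. mrjob expresses the same two steps as `mapper` and `reducer` methods on an `MRJob` subclass and builds the streaming invocation for you.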
Learning Pig with Python
- Overview of Pig
- Demo: Implementing the WordCount Algorithm in Pig
- Configuring and Running Pig Scripts and Pig Statements
  - Using the Pig Execution Modes
  - Using the Pig Interactive Mode
  - Using the Pig Batch Mode
- Understanding the Basic Concepts of the Pig Latin Language
  - Using Statements
  - Loading Data
  - Transforming Data
  - Storing Data
- Extending Pig's Functionality with Python UDFs
  - Registering a Python UDF File
  - Demo: A Simple Python UDF
  - Demo: String Manipulation Using a Python UDF
- Hands-on: Calculating the 10 Most Recent Movies Using a Python UDF
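A Python UDF file is registered in a Pig script with a statement such as `REGISTER 'udfs.py' USING jython AS myudfs;`, and each function declares its return type in Pig schema syntax through the `@outputSchema` decorator. When Pig executes the file under Jython it supplies `outputSchema` itself; the sketch below assumes that behavior and defines a no-op fallback only so the same file can be imported and unit-tested with ordinary Python. The file name and both function names are hypothetical, and the code avoids f-strings so it stays compatible with Jython's Python 2.7 syntax.

```python
# udfs.py (hypothetical file name)
try:
    outputSchema  # assumed to be injected by Pig when run under Jython
except NameError:
    # Fallback for testing outside Pig: record the schema and return func unchanged.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("reversed:chararray")
def reverse_string(s):
    """Return the string reversed; Pig nulls arrive as None and pass through."""
    if s is None:
        return None
    return s[::-1]


@outputSchema("num_words:int")
def count_words(s):
    """Count whitespace-separated words in a chararray field."""
    if s is None:
        return 0
    return len(s.split())
```

After registration, the functions are called through the alias inside a relational operator, for example `FOREACH titles GENERATE myudfs.reverse_string(title);`, where `titles` and `title` are hypothetical relation and field names.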
Using Spark and PySpark
- Overview of Spark
- Demo: Implementing the WordCount Algorithm in PySpark
- Overview of PySpark
  - Using an Interactive Shell
  - Implementing Self-Contained Applications
- Working with Resilient Distributed Datasets (RDDs)
  - Creating RDDs from a Python Collection
  - Creating RDDs from Files
  - Implementing RDD Transformations
  - Implementing RDD Actions
- Hands-on: Implementing a Text Search Program for Movie Titles with PySpark
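In PySpark, WordCount is usually written as a chain such as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(add)`. Transformations like `flatMap`, `map`, `filter`, and `reduceByKey` only describe a new RDD, and nothing is computed until an action such as `collect()` or `count()` runs the chain. The sketch below mimics that lazy-transformation/eager-action split on a single machine with a hypothetical `LocalRDD` class; it is not Spark and has no partitioning, distribution, or fault tolerance, but it borrows the RDD method names so the pipeline reads the same way.

```python
from operator import add


class LocalRDD:
    """Hypothetical single-process stand-in for a PySpark RDD (not Spark itself)."""

    def __init__(self, compute):
        # `compute` is a zero-argument callable that returns a fresh iterator.
        self._compute = compute

    @classmethod
    def parallelize(cls, data):
        """Analogue of SparkContext.parallelize: wrap a Python collection."""
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new LocalRDD without computing anything (lazy).
    def flatMap(self, f):
        parent = self._compute
        return LocalRDD(lambda: (y for x in parent() for y in f(x)))

    def map(self, f):
        parent = self._compute
        return LocalRDD(lambda: (f(x) for x in parent()))

    def filter(self, pred):
        parent = self._compute
        return LocalRDD(lambda: (x for x in parent() if pred(x)))

    def reduceByKey(self, f):
        parent = self._compute

        def compute():
            totals = {}
            for key, value in parent():
                totals[key] = f(totals[key], value) if key in totals else value
            return iter(totals.items())

        return LocalRDD(compute)

    # Actions: evaluate the whole chain and return a result to the caller.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


lines = LocalRDD.parallelize(["to be or", "not to be"])
counts = lines.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
print(sorted(counts.collect()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

A text search in the same style is a `filter` followed by an action, for example `lines.filter(lambda t: "be" in t).count()`.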
Managing Workflow with Python
- Overview of Apache Oozie and Luigi
- Installing Luigi
- Understanding Luigi Workflow Concepts
  - Tasks
  - Targets
  - Parameters
- Demo: Examining a Workflow that Implements the WordCount Algorithm
- Working with Hadoop Workflows that Control MapReduce and Pig Jobs
  - Using Luigi's Configuration Files
  - Working with MapReduce in Luigi
  - Working with Pig in Luigi
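In Luigi, a workflow is a graph of tasks: each task lists its upstream tasks in `requires()`, declares what it produces as targets in `output()`, does its work in `run()`, and by default counts as complete once its output targets exist, so rerunning a workflow skips steps that already finished. Parameters distinguish one instance of a task from another. The sketch below imitates those ideas in plain Python without Luigi; `MiniTask`, `build`, `InputText`, and `WordCount` are hypothetical names, file paths stand in for targets, and constructor arguments stand in for Luigi parameters.

```python
import os
import tempfile


class MiniTask:
    """Hypothetical Luigi-style task: subclasses override requires/output/run."""

    def requires(self):
        return []  # upstream tasks that must complete first

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task is treated as done once its output file exists.
        return os.path.exists(self.output())


def build(task, ran=None):
    """Depth-first: satisfy dependencies, then run the task if it is incomplete."""
    ran = [] if ran is None else ran
    if task.complete():
        return ran
    for dep in task.requires():
        build(dep, ran)
    task.run()
    ran.append(type(task).__name__)
    return ran


class InputText(MiniTask):
    def __init__(self, path, text):
        self.path, self.text = path, text  # "parameters" identify the instance

    def output(self):
        return self.path

    def run(self):
        with open(self.path, "w") as f:
            f.write(self.text)


class WordCount(MiniTask):
    def __init__(self, src, dst):
        self.src, self.dst = src, dst

    def requires(self):
        return [self.src]

    def output(self):
        return self.dst

    def run(self):
        counts = {}
        with open(self.src.output()) as f:
            for word in f.read().split():
                counts[word] = counts.get(word, 0) + 1
        # Write to a temporary file and rename it, so a crash mid-write never
        # leaves a partial output that would wrongly mark the task complete.
        tmp = self.dst + ".tmp"
        with open(tmp, "w") as f:
            for word in sorted(counts):
                f.write(f"{word}\t{counts[word]}\n")
        os.replace(tmp, self.dst)


workdir = tempfile.mkdtemp()
source = InputText(os.path.join(workdir, "in.txt"), "to be or not to be")
job = WordCount(source, os.path.join(workdir, "counts.tsv"))
print(build(job))  # ['InputText', 'WordCount']
print(build(job))  # [] because both outputs already exist
```

Tying completion to output existence is what makes such a workflow safe to rerun after a failure: only the tasks whose outputs are missing are executed again.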
Summary and Conclusion
Requirements
- Experience with Python programming
- Basic familiarity with Hadoop
28 Hours