Course Outline

Scala Primer
- A concise introduction to Scala
- Labs: Getting Familiar with Scala

Spark Basics
- Background and history
- Spark and Hadoop
- Core concepts and architecture of Spark
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs: Installing and Running Spark

First Look at Spark
- Running Spark in local mode
- Spark Web UI
- Spark Shell
- Analyzing datasets – Part 1
- Inspecting RDDs
- Labs: Exploring the Spark Shell

RDDs
- RDD concepts
- Partitions
- RDD Operations and Transformations
- RDD Types
- Key-Value Pair RDDs
- MapReduce on RDD
- Caching and Persistence
- Labs: Creating and Inspecting RDDs; Caching RDDs
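The MapReduce-on-RDD topic above is usually taught through the classic word-count pipeline: flatMap lines into words, map each word to a (word, 1) pair, then reduceByKey to sum the counts. As a rough sketch of those semantics (plain Python collections standing in for RDDs, so no Spark installation is needed; the sample sentences are made up):

```python
from collections import defaultdict

# Input "partition": one string per element, as an RDD of lines would hold.
lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: each line expands into many words
words = [w for line in lines for w in line.split()]

# map: word -> (word, 1) key-value pair
pairs = [(w, 1) for w in words]

# reduceByKey: sum the values for each key
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))
# {'spark': 2, 'makes': 1, 'big': 2, 'data': 2, 'simple': 1, 'needs': 1}
```

In the Spark shell the same pipeline is the one-liner `lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)`, with the shuffle between map and reduce handled across partitions by Spark.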

Spark API Programming
- Introduction to the Spark API and RDD API
- Submitting the first program to Spark
- Debugging and Logging
- Configuration Properties
- Labs: Programming with the Spark API; Submitting Jobs

Spark SQL
- SQL Support in Spark
- DataFrames
- Defining Tables and Importing Datasets
- Querying DataFrames Using SQL
- Storage Formats: JSON and Parquet
- Labs: Creating and Querying DataFrames; Evaluating Data Formats
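Querying DataFrames with SQL boils down to registering a DataFrame as a temporary view and running SQL text against it (`df.createOrReplaceTempView("people")` followed by `spark.sql(...)` in recent Spark versions). To give a runnable feel for that workflow without a Spark installation, this sketch uses Python's built-in sqlite3 as a stand-in for the SQL engine; the table and column names are invented for illustration:

```python
import sqlite3

# In Spark the source would be a DataFrame registered as a temp view;
# here an in-memory SQLite table stands in for it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("alice", 34), ("bob", 19), ("carol", 27)])

# The same SQL string would run via spark.sql(...) against the temp view.
rows = conn.execute(
    "SELECT name FROM people WHERE age >= 21 ORDER BY age DESC"
).fetchall()
print(rows)  # [('alice',), ('carol',)]
```

The point of the pattern is that analysts can keep writing plain SQL while Spark plans and distributes the query over the underlying storage format (e.g. JSON or Parquet).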

MLlib
- Introduction to MLlib
- MLlib Algorithms
- Labs: Writing MLlib Applications

GraphX
- GraphX Library Overview
- GraphX APIs
- Labs: Processing Graph Data Using Spark

Spark Streaming
- Streaming Overview
- Evaluating Streaming Platforms
- Streaming Operations
- Sliding Window Operations
- Labs: Writing Spark Streaming Applications
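Sliding window operations (e.g. `reduceByWindow` / `reduceByKeyAndWindow` in the DStream API) aggregate the last N micro-batches, emitting a result every M batches; both the window length and the slide interval must be multiples of the batch interval. The windowing arithmetic can be sketched in plain Python (the batch counts and the 3-batch/2-batch values below are illustrative):

```python
# A stream of per-batch event counts, one entry per 10-second micro-batch.
batch_counts = [3, 1, 4, 1, 5, 9, 2, 6]

window_batches = 3  # window length  = 3 batches (e.g. 30 s)
slide_batches = 2   # slide interval = 2 batches (e.g. 20 s)

# Each window sums the last `window_batches` batches, emitted every
# `slide_batches` batches -- the effect of reduceByWindow with a sum reducer.
windows = [
    sum(batch_counts[max(0, end - window_batches):end])
    for end in range(slide_batches, len(batch_counts) + 1, slide_batches)
]
print(windows)  # [4, 6, 15, 17]
```

Note how consecutive windows overlap by one batch here (window longer than slide), which is what makes the aggregation "sliding" rather than tumbling.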

Spark and Hadoop
- Hadoop Introduction (HDFS and YARN)
- Hadoop and Spark Architecture
- Running Spark on Hadoop YARN
- Processing HDFS Files Using Spark

Spark Performance and Tuning
- Broadcast Variables
- Accumulators
- Memory Management and Caching
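Broadcast variables (`SparkContext.broadcast`) ship a small read-only dataset to every executor once, instead of serializing it with every task, and accumulators carry counters from tasks back to the driver. The canonical broadcast use case is a map-side join against a small lookup table; a rough sketch of that pattern with plain Python (the country table and records are invented for illustration):

```python
# Small lookup table the driver would broadcast to every executor.
country_names = {"de": "Germany", "fr": "France", "us": "United States"}

# One "partition" of records to enrich, as a task would see them.
records = [("alice", "de"), ("bob", "fr"), ("carol", "xx")]

# Accumulator-style counter: tasks add to it, only the driver reads the total.
unmatched = 0

enriched = []
for user, code in records:
    name = country_names.get(code)  # map-side join against the broadcast table
    if name is None:
        unmatched += 1
        continue
    enriched.append((user, name))

print(enriched)   # [('alice', 'Germany'), ('bob', 'France')]
print(unmatched)  # 1
```

Because the lookup happens locally on each executor, a broadcast join avoids the shuffle that a regular join of two RDDs or DataFrames would trigger.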

Spark Operations
- Deploying Spark in Production
- Sample Deployment Templates
- Configurations
- Monitoring
- Troubleshooting

Requirements

Pre-requisites:
- Proficiency in Java, Scala, or Python (our hands-on labs use Scala and Python)
- A fundamental understanding of the Linux development environment, including command-line navigation and file editing with vi or nano

Testimonials
Doing similar exercises in different ways really helps with understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine during development versus when it is deployed on a cluster.
Thomas Carcaud - IT Frankfurt GmbH
Course - Spark for Developers
Ajay was very friendly, helpful, and knowledgeable about the topic he was discussing.
Biniam Guulay - ICE International Copyright Enterprise Germany GmbH
Course - Spark for Developers
Ernesto did a great job explaining the high level concepts of using Spark and its various modules.
Michael Nemerouf
Course - Spark for Developers
The trainer made the class interesting and entertaining which helps quite a bit with all day training.
Ryan Speelman
Course - Spark for Developers
We know a lot more about the whole environment.
John Kidd
Course - Spark for Developers
Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.