Course Outline

  1. Scala Primer

    • A concise introduction to Scala
    • Labs: Getting Familiar with Scala
  2. Spark Basics

    • Background and history
    • Spark and Hadoop
    • Core concepts and architecture of Spark
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs: Installing and Running Spark
  3. First Look at Spark

    • Running Spark in local mode
    • Spark Web UI
    • Spark Shell
    • Analyzing datasets – Part 1
    • Inspecting RDDs
    • Labs: Exploring the Spark Shell
  4. RDDs

    • RDD concepts
    • Partitions
    • RDD Operations and Transformations
    • RDD Types
    • Key-Value Pair RDDs
    • MapReduce on RDD
    • Caching and Persistence
    • Labs: Creating and Inspecting RDDs; Caching RDDs
  5. Spark API Programming

    • Introduction to the Spark API and RDD API
    • Submitting the first program to Spark
    • Debugging and Logging
    • Configuration Properties
    • Labs: Programming with the Spark API; Submitting Jobs
  6. Spark SQL

    • SQL Support in Spark
    • DataFrames
    • Defining Tables and Importing Datasets
    • Querying DataFrames Using SQL
    • Storage Formats: JSON and Parquet
    • Labs: Creating and Querying DataFrames; Evaluating Data Formats
  7. MLlib

    • Introduction to MLlib
    • MLlib Algorithms
    • Labs: Writing MLlib Applications
  8. GraphX

    • GraphX Library Overview
    • GraphX APIs
    • Labs: Processing Graph Data Using Spark
  9. Spark Streaming

    • Streaming Overview
    • Evaluating Streaming Platforms
    • Streaming Operations
    • Sliding Window Operations
    • Labs: Writing Spark Streaming Applications
  10. Spark and Hadoop

    • Hadoop Introduction (HDFS and YARN)
    • Hadoop and Spark Architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS Files Using Spark
  11. Spark Performance and Tuning

    • Broadcast Variables
    • Accumulators
    • Memory Management and Caching
  12. Spark Operations

    • Deploying Spark in Production
    • Sample Deployment Templates
    • Configurations
    • Monitoring
    • Troubleshooting

Requirements

PRE-REQUISITES

Proficiency in Java, Scala, or Python (the hands-on labs use Scala and Python)
A basic understanding of the Linux development environment, including command-line navigation and file editing with vi or nano

Duration: 21 hours
