Spark for Developers Training

Course Code

sparkdev

Duration

21 hours (usually 3 days, including breaks)

Prerequisites

Familiarity with Java, Scala, or Python (the labs are in Scala and Python)
Basic understanding of a Linux development environment (command-line navigation, editing files with vi or nano)

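Overview

Objective:

This course introduces Apache Spark. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, the Spark APIs, Spark SQL, Spark Streaming, machine learning, and GraphX.

Audience:

Developers / Data Analysts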

Course Outline

  1. Scala primer

    • A quick introduction to Scala
    • Labs: Getting to know Scala
  2. Spark Basics

    • Background and history
    • Spark and Hadoop
    • Spark concepts and architecture
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs: Installing and running Spark
  3. First Look at Spark

    • Running Spark in local mode
    • Spark web UI
    • Spark shell
    • Analyzing dataset – part 1
    • Inspecting RDDs
    • Labs: Spark shell exploration
  4. RDDs

    • RDD concepts
    • Partitions
    • RDD Operations / transformations
    • RDD types
    • Key-Value pair RDDs
    • MapReduce on RDD
    • Caching and persistence
    • Labs: Creating and inspecting RDDs; caching RDDs
  5. Spark API programming

    • Introduction to Spark API / RDD API
    • Submitting the first program to Spark
    • Debugging / logging
    • Configuration properties
    • Labs: Programming with the Spark API; submitting jobs
  6. Spark SQL

    • SQL support in Spark
    • DataFrames
    • Defining tables and importing datasets
    • Querying DataFrames using SQL
    • Storage formats: JSON / Parquet
    • Labs: Creating and querying DataFrames; evaluating data formats
  7. MLlib

    • MLlib intro
    • MLlib algorithms
    • Labs: Writing MLlib applications
  8. GraphX

    • GraphX library overview
    • GraphX APIs
    • Labs: Processing graph data using Spark
  9. Spark Streaming

    • Streaming overview
    • Evaluating streaming platforms
    • Streaming operations
    • Sliding window operations
    • Labs: Writing Spark Streaming applications
  10. Spark and Hadoop

    • Hadoop Intro (HDFS / YARN)
    • Hadoop + Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
  11. Spark Performance and Tuning

    • Broadcast variables
    • Accumulators
    • Memory management & caching
  12. Spark Operations

    • Deploying Spark in production
    • Sample deployment templates
    • Configurations
    • Monitoring
    • Troubleshooting
