Apache Spark in the Cloud Training

Course Code

sparkcloud

Duration

21 hours (usually 3 days, including breaks)

Requirements

Programming skills (preferably Python or Scala)

SQL basics

Overview

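Apache Spark has a demanding learning curve at the start, and it takes considerable effort before the first results pay off. This course is designed to get participants through that difficult first stage. After completing it, participants will understand the basics of Apache Spark, clearly distinguish RDDs from DataFrames, know the Python and Scala APIs, and understand concepts such as executors and tasks. In line with best practices, the course focuses on cloud deployment, specifically Databricks and AWS. Students will also learn the differences between AWS EMR and AWS Glue, one of the newest Spark services from AWS.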

Audience:

Data Engineers, DevOps, Data Scientists


Course Outline

 

Introduction:

  • Apache Spark in Hadoop Ecosystem
  • Short intro to Python and Scala

Basics (theory):

  • Architecture
  • RDD
  • Transformations and Actions
  • Stage, Task, Dependencies

Understanding the basics using the Databricks environment (hands-on workshop):

  • Exercises using RDD API
  • Basic action and transformation functions
  • PairRDD
  • Join
  • Caching strategies
  • Exercises using DataFrame API
  • SparkSQL
  • DataFrame: select, filter, group, sort
  • UDF (User Defined Function)
  • A look at the Dataset API
  • Streaming

Understanding deployment using the AWS environment (hands-on workshop):

  • Basics of AWS Glue
  • Understand the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Understand the pros and cons of each

Extra:

  • Introduction to Apache Airflow orchestration
