Hadoop with Python Training

Course Code

hadooppython

Duration

28 hours (usually 4 days, including breaks)

Requirements

  • Experience with Python programming
  • Basic familiarity with Hadoop

Overview

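Hadoop is a popular Big Data processing framework. Python is a high-level programming language known for its clear syntax and code readability.

In this instructor-led, live training, participants will learn how to work with Hadoop, MapReduce, Pig, and Spark using Python as they step through multiple examples and use cases.

By the end of this training, participants will be able to: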

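  • Understand the basic concepts behind Hadoop, MapReduce, Pig, and Spark
  • Use Python with the Hadoop Distributed File System (HDFS), MapReduce, Pig, and Spark
  • Use Snakebite to access HDFS programmatically from Python
  • Use mrjob to write MapReduce jobs in Python
  • Write Spark programs in Python
  • Extend the functionality of Pig using Python UDFs
  • Manage MapReduce jobs and Pig scripts using Luigi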

Audience

  • Developers
  • IT professionals

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline

Introduction

Understanding Hadoop's Architecture and Key Concepts

Understanding the Hadoop Distributed File System (HDFS)

  • Overview of HDFS and its Architectural Design
  • Interacting with HDFS
  • Performing Basic File Operations on HDFS
  • Overview of HDFS Command Reference
  • Overview of Snakebite
  • Installing Snakebite
  • Using the Snakebite Client Library
  • Using the CLI Client
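HDFS stores a file as fixed-size blocks, and it replicates each block across DataNodes. Much of the architecture discussion comes down to those two numbers. The plain-Python sketch below works through the storage arithmetic. It assumes the Hadoop 2.x+ defaults of 128 MB blocks (dfs.blocksize) and a replication factor of 3 (dfs.replication). A given cluster may configure either value differently. The function name hdfs_footprint is hypothetical.

```python
# Assumed defaults (Hadoop 2.x and later); clusters may override both.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # bytes, cf. dfs.blocksize
DEFAULT_REPLICATION = 3                 # cf. dfs.replication


def hdfs_footprint(file_size, block_size=DEFAULT_BLOCK_SIZE,
                   replication=DEFAULT_REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for one file."""
    if file_size <= 0:
        return 0, 0, 0
    # Integer ceiling division: a partial last block still counts as a block.
    blocks = (file_size + block_size - 1) // block_size
    # The last block only occupies the bytes it actually holds, so raw
    # usage is file_size * replication, not blocks * block_size * replication.
    return blocks, blocks * replication, file_size * replication


if __name__ == "__main__":
    mb = 1024 * 1024
    print(hdfs_footprint(300 * mb))  # (3, 9, 943718400), i.e. 900 MB raw
```

Under these assumptions, a 300 MB file occupies three blocks (128 + 128 + 44 MB). With three replicas of each block, the cluster holds nine block replicas in total.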

Learning the MapReduce Programming Model with Python

  • Overview of the MapReduce Programming Model
  • Understanding Data Flow in the MapReduce Framework
    • Map
    • Shuffle and Sort
    • Reduce
  • Using the Hadoop Streaming Utility
    • Understanding How the Hadoop Streaming Utility Works
    • Demo: Implementing the WordCount Application in Python
  • Using the mrjob Library
    • Overview of mrjob
    • Installing mrjob
    • Demo: Implementing the WordCount Algorithm Using mrjob
    • Understanding How a MapReduce Job Written with the mrjob Library Works
    • Executing a MapReduce Application with mrjob
    • Hands-on: Computing Top Salaries Using mrjob
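With the Hadoop Streaming utility, the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout. Between the two phases, the framework sorts mapper output by key. The minimal WordCount sketch below follows that contract. It also includes a run_locally helper that imitates map, shuffle/sort and reduce in one process. Packing both roles into one file, selected by a "map"/"reduce" argument, is a choice of this sketch. The course demo may use separate mapper and reducer scripts.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map phase: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce phase: input is sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Streaming entry point: python wordcount.py map | sort | python wordcount.py reduce
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

On a cluster, a script like this would be passed to the hadoop-streaming jar, whose path depends on the installation. The relevant options are -files, -mapper "python wordcount.py map", -reducer "python wordcount.py reduce", -input and -output. The file name and paths here are hypothetical. mrjob wraps the same pattern: a subclass of mrjob.job.MRJob defines mapper(self, _, line) and reducer(self, key, values) methods and calls run(), and mrjob handles the streaming plumbing.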

Learning Pig with Python

  • Overview of Pig
  • Demo: Implementing the WordCount Algorithm in Pig
  • Configuring and Running Pig Scripts and Pig Statements
    • Using the Pig Execution Modes
    • Using the Pig Interactive Mode
    • Using the Pig Batch Mode
  • Understanding the Basic Concepts of the Pig Latin Language
    • Using Statements
    • Loading Data
    • Transforming Data
    • Storing Data
  • Extending Pig's Functionality with Python UDFs
    • Registering a Python UDF File
    • Demo: A Simple Python UDF
    • Demo: String Manipulation Using a Python UDF
    • Hands-on: Calculating the 10 Most Recent Movies Using a Python UDF
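Once registered, a Python UDF file exposes plain functions that Pig Latin can call. Each function carries an output schema declared with the outputSchema decorator. The minimal string-manipulation sketch below assumes the file is registered in streaming_python mode (available since Pig 0.12), where outputSchema is imported from pig_util. The fallback decorator exists only so the functions can be tested without Pig. The file name movie_udfs.py, the function names, and the "Title (YYYY)" title format are hypothetical.

```python
import re

try:
    # Provided by Pig when the file is registered USING streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Local stand-in (not Pig's code) so the UDFs can be imported and tested.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate

# Matches a trailing year in parentheses, e.g. "Toy Story (1995)".
_YEAR_SUFFIX = re.compile(r"\((\d{4})\)\s*$")


@outputSchema("title:chararray")
def clean_title(title):
    """Collapse runs of whitespace; Pig passes null fields as None."""
    if title is None:
        return None
    return " ".join(title.split())


@outputSchema("year:int")
def release_year(title):
    """Return the trailing '(YYYY)' year as an int, or None if absent."""
    if title is None:
        return None
    match = _YEAR_SUFFIX.search(title)
    return int(match.group(1)) if match else None
```

From Pig Latin, registration and a call might look like REGISTER 'movie_udfs.py' USING streaming_python AS movie_udfs; followed by FOREACH movies GENERATE movie_udfs.release_year(title);. The relation movies and its title field are hypothetical.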

Using Spark and PySpark

  • Overview of Spark
  • Demo: Implementing the WordCount Algorithm in PySpark
  • Overview of PySpark
    • Using an Interactive Shell
    • Implementing Self-Contained Applications
  • Working with Resilient Distributed Datasets (RDDs)
    • Creating RDDs from a Python Collection
    • Creating RDDs from Files
    • Implementing RDD Transformations
    • Implementing RDD Actions
  • Hands-on: Implementing a Text Search Program for Movie Titles with PySpark
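The PySpark topics above separate two kinds of RDD operations. Transformations only describe a new RDD, while actions trigger the actual computation. Real PySpark needs a Spark installation, so the sketch below uses a hypothetical single-process toy, MiniRDD, to show that lazy-evaluation idea. Its method names mirror PySpark's map, flatMap, filter, reduceByKey, collect and count, so the WordCount chain reads the same. It does not partition, shuffle or parallelize anything.

```python
class MiniRDD:
    """Hypothetical toy modeled on a PySpark RDD; not the PySpark API.

    Transformations only record a step; nothing runs until an action
    (collect or count) evaluates the recorded pipeline.
    """

    def __init__(self, source, steps=()):
        self._source = source       # iterable, like the data given to sc.parallelize
        self._steps = tuple(steps)  # recorded transformations

    def _then(self, step):
        return MiniRDD(self._source, self._steps + (step,))

    # --- transformations (lazy) ---
    def map(self, func):
        return self._then(lambda it: (func(x) for x in it))

    def flatMap(self, func):
        return self._then(lambda it: (y for x in it for y in func(x)))

    def filter(self, pred):
        return self._then(lambda it: (x for x in it if pred(x)))

    def reduceByKey(self, func):
        def step(it):
            acc = {}
            for key, value in it:
                acc[key] = func(acc[key], value) if key in acc else value
            return iter(acc.items())
        return self._then(step)

    # --- actions (trigger evaluation) ---
    def collect(self):
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)

    def count(self):
        return len(self.collect())


if __name__ == "__main__":
    lines = MiniRDD(["to be or", "not to be"])
    counts = lines.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    print(sorted(counts.collect()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In PySpark, the same chain would start from sc.parallelize(...) or sc.textFile(...) on a SparkContext sc. There, reduceByKey is a wide transformation that shuffles data between partitions, which this toy omits.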

Managing Workflows with Python

  • Overview of Apache Oozie and Luigi
  • Installing Luigi
  • Understanding Luigi Workflow Concepts
    • Tasks
    • Targets
    • Parameters
  • Demo: Examining a Workflow that Implements the WordCount Algorithm
  • Working with Hadoop Workflows that Control MapReduce and Pig Jobs
    • Using Luigi's Configuration Files
    • Working with MapReduce in Luigi
    • Working with Pig in Luigi

Summary and Conclusion
