Apache Hadoop: Manipulation and Transformation of Data Performance培訓

課程代碼

ApHadm1

課程時長

21 時間: 同常來說包括休息是 3天

最低要求

Attendees are not required to have any specific skill as the training is focused on end users skills for both the administration and the manipulation of data under Apache Hadoop

概觀


本課程面向開發人員,架構師,數據科學家或任何需要密集或定期訪問數據的配置文件。

該課程的主要重點是數據處理和轉換。

Hadoop生態系統的工具中,本課程包括Pig和Hive的使用,這兩者都大量用於數據轉換和操作。

此培訓還涉及性能指標和性能優化。

該課程完全是實踐性的,並通過理論方面的介紹來打斷。

Machine Translated

課程簡介

1.1Hadoop Concepts

1.1.1HDFS

  • The Design of HDFS
  • Command line interface
  • Hadoop File System

1.1.2Clusters

  • Anatomy of a cluster
  • Mater Node / Slave node
  • Name Node / Data Node

1.2Data Manipulation

1.2.1MapReduce detailed

  • Map phase
  • Reduce phase
  • Shuffle

1.2.2Analytics with Map Reduce

  • Group-By with MapReduce
  • Frequency distributions and sorting with MapReduce
  • Plotting results (GNU Plot)
  • Histograms with MapReduce
  • Scatter plots with MapReduce
  • Parsing complex datasets
  • Counting with MapReduce and Combiners
  • Build reports

 

1.2.3Data Cleansing

  • Document Cleaning
  • Fuzzy string search
  • Record linkage / data deduplication
  • Transform and sort event dates
  • Validate source reliability
  • Trim Outliers

1.2.4Extracting and Transforming Data

  • Transforming logs
  • Using Apache Pig to filter
  • Using Apache Pig to sort
  • Using Apache Pig to sessionize

1.2.5Advanced Joins

  • Joining data in the Mapper using MapReduce
  • Joining data using Apache Pig replicated join
  • Joining sorted data using Apache Pig merge join
  • Joining skewed data using Apache Pig skewed join
  • Using a map-side join in Apache Hive
  • Using optimized full outer joins in Apache Hive
  • Joining data using an external key value store

1.3Performance Diagnosis and Optimization Techniques

  • Map
    • Investigating spikes in input data
    • Identifying map-side data skew problems
    • Map task throughput
    • Small files
    • Unsplittable files
  • Reduce
    • Too few or too many reducers
    • Reduce-side data skew problems
    • Reduce tasks throughput
    • Slow shuffle and sort
  • Competing jobs and scheduler throttling
  • Stack dumps & unoptimized code
  • Hardware failures
  • CPU contention
  • Tasks
    • Extracting and visualizing task execution times
    • Profiling your map and reduce tasks
  • Avoid the reducer
  • Filter and project
  • Using the combiner
  • Fast sorting with comparators
  • Collecting skewed data
  • Reduce skew mitigation

客戶評論

★★★★★
★★★★★

課程分類

促銷課程

訂閱促銷課程

為尊重您的隱私,我公司不會把您的郵箱地址提供給任何人。您可以享有優先權和隨時取消訂閱的權利。

我們的客戶

is growing fast!

We are looking to expand our presence in Taiwan!

As a Business Development Manager you will:

  • expand business in Taiwan
  • recruit local talent (sales, agents, trainers, consultants)
  • recruit local trainers and consultants

We offer:

  • Artificial Intelligence and Big Data systems to support your local operation
  • high-tech automation
  • continuously upgraded course catalogue and content
  • good fun in international team

If you are interested in running a high-tech, high-quality training and consulting business.

Apply now!