Course Outline
Part 1: Data Management in HDFS
- Data formats (JSON / Avro / Parquet)
- Compression schemes
- Data masking
- Lab: Analyze different data formats; enable compression
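Part 2: Advanced Pig
- User-defined functions
- Introduction to Pig libraries (ElephantBird / DataFu)
- Loading complex structured data with Pig
- Pig tuning
- Lab: Advanced Pig scripting; parsing complex data types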
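Part 3: Advanced Hive
- User-defined functions
- Compressed tables
- Hive performance tuning
- Lab: Create compressed tables; evaluate table formats and configuration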
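Part 4: Advanced HBase
- Advanced schema modeling
- Compression
- Bulk data ingestion
- Wide-table vs. tall-table comparison
- HBase and Pig
- HBase and Hive
- HBase performance tuning
- Lab: Tune HBase; access HBase data from Pig and Hive; data modeling with Phoenix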
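Requirements
- Familiarity with the Java programming language (most programming exercises are in Java)
- Comfort in a Linux environment (able to navigate the Linux command line and edit files with vi / nano)
- Working knowledge of Hadoop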
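Lab Environment
Zero install: There is no need to install Hadoop software on students' machines. A working Hadoop cluster will be provided for students.
Students will need the following:
- An SSH client (Linux and Mac already include an SSH client; PuTTY is recommended for Windows)
- A browser to access the cluster. We recommend Firefox.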
Testimonials (6)
Trainer's preparation & organization, and quality of materials provided on github.
Mateusz Rek - MicroStrategy Poland Sp. z o.o.
Course - Impala for Business Intelligence
I thought he did a great job of tailoring the experience to the audience. This class is mostly designed to cover data analysis with HIVE, but me and my co-worker are doing HIVE administration with no real data analytics responsibilities.
ian reif - Franchise Tax Board
Course - Data Analysis with Hive/HiveQL
Many hands-on sessions.
Jacek Pieczątka
Course - Administrator Training for Apache Hadoop
The VM I liked very much. The teacher was very knowledgeable regarding the topic as well as other topics; he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
The fact that all the data and software was ready to use on an already prepared VM, provided by the trainer in external disks.
vyzVoice
Course - Hadoop for Developers and Administrators
practical things of doing, also theory was served good by Ajay