Course Outline
Introduction
- Why and how project teams adopt Hadoop
- How it all started
- The Project Manager's role in Hadoop projects
Understanding Hadoop's Architecture and Key Concepts
- HDFS
- MapReduce
- Other pieces of the Hadoop ecosystem
What Constitutes Big Data?
Different Approaches to Storing Big Data
HDFS (Hadoop Distributed File System) as the Foundation
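The core HDFS idea covered here can be sketched in miniature: a file is split into fixed-size blocks, and each block is stored on several DataNodes for fault tolerance. The block size, replication factor, and node names below are illustrative stand-ins, not real HDFS APIs (real HDFS defaults to 128 MB blocks and rack-aware placement).

```python
# Conceptual sketch of HDFS storage: split a file into fixed-size blocks
# and assign each block to several DataNodes. All names and sizes here
# are illustrative, not actual HDFS classes or defaults.

BLOCK_SIZE = 8          # bytes, for the demo; real HDFS defaults to 128 MB
REPLICATION = 3         # real HDFS also defaults to 3 replicas
DATANODES = ["node1", "node2", "node3", "node4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split raw bytes into fixed-size blocks, as the HDFS client does."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=DATANODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin here;
    the real NameNode uses rack-aware placement policies)."""
    placement = {}
    for idx in range(len(blocks)):
        placement[idx] = [nodes[(idx + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"a small file stored across the cluster"
blocks = split_into_blocks(data)
placement = place_blocks(blocks)
for idx, block in enumerate(blocks):
    print(idx, block, placement[idx])
```

Losing one node leaves two copies of every block it held, which is why a single machine failure is routine rather than catastrophic in an HDFS cluster.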
How Big Data is Processed
- The power of distributed processing
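The "divide, process in parallel, combine" pattern behind distributed processing can be illustrated on a single machine. In this sketch a thread pool stands in for the cluster (Hadoop of course distributes the chunks across separate machines); the function names and worker count are illustrative.

```python
# Miniature illustration of distributed processing: partition a job into
# independent chunks, let workers handle them in parallel, combine the
# partial results. A thread pool stands in for a Hadoop cluster here.
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """The work one 'worker node' does on its slice of the data."""
    return sum(len(line.split()) for line in chunk)

def distributed_word_count(lines, workers=4):
    # Partition the input into roughly equal chunks, one per worker.
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_words, chunks)
    # Combine: the final answer is just the sum of the partial counts.
    return sum(partials)

lines = ["big data needs big clusters"] * 100
print(distributed_word_count(lines))  # 500
```

The key property is that the chunks are independent: no worker needs another worker's data, so adding machines adds throughput almost linearly.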
Processing Data with MapReduce
- How data is broken down and processed step by step
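The step-by-step flow of a MapReduce job can be sketched in pure Python: a map phase emits key/value pairs, a shuffle phase groups values by key, and a reduce phase combines each group. Word count is the classic example; production Hadoop jobs are usually written in Java, but the data flow is the same.

```python
# Pure-Python sketch of MapReduce's three phases: map, shuffle, reduce.
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs — here, (word, 1) for each word."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between
    mappers and reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}

records = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Because mappers run independently on separate input splits and reducers run independently on separate key groups, both phases parallelize across the cluster.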
The Role of Clustering in Large-Scale Distributed Processing
- Architectural overview
- Clustering approaches
Clustering Your Data and Processes with YARN
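YARN's job, at its simplest, is to track cluster capacity and grant "containers" (slices of memory and CPU) to competing applications. The toy scheduler below illustrates that idea; the class, node names, and memory figures are invented for the sketch, and the real ResourceManager uses far richer scheduling policies (capacity and fair schedulers, queues, locality).

```python
# Toy sketch of YARN-style scheduling: a resource manager tracks free
# memory per node and grants containers until capacity runs out.
# All names and numbers are illustrative, not real YARN APIs.

class ToyResourceManager:
    def __init__(self, node_memory_mb):
        self.free = dict(node_memory_mb)  # node name -> free memory (MB)
        self.allocations = []             # (app, node, mb) grants

    def request_container(self, app, mb):
        """Grant a container on the first node with enough free memory,
        or return None if none has room (the app would wait and retry)."""
        for node, free_mb in self.free.items():
            if free_mb >= mb:
                self.free[node] -= mb
                self.allocations.append((app, node, mb))
                return node
        return None

rm = ToyResourceManager({"node1": 4096, "node2": 4096})
print(rm.request_container("mapreduce-job", 2048))  # node1
print(rm.request_container("hive-query", 3072))     # node2
print(rm.request_container("spark-app", 4096))      # None: cluster full
```

The point for planning purposes: YARN decouples resource management from any one processing framework, so MapReduce, Hive, and newer engines can share one cluster.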
The Role of Non-Relational Databases in Big Data Storage
Working with Hadoop's Non-Relational Database: HBase
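HBase's data model differs from a relational table: rows are looked up by a row key, columns are grouped into column families, and each cell keeps timestamped versions. The plain-dictionary sketch below illustrates that model only; the class, table, and column names are invented, and real access goes through the HBase Java API or shell rather than anything like this.

```python
# Sketch of HBase's data model with plain dictionaries: row key ->
# "family:qualifier" -> timestamped versions. Illustrative names only;
# this is not an HBase client.
import time

class SketchTable:
    def __init__(self):
        # row_key -> {"family:qualifier": [(timestamp, value), ...]}
        self.rows = {}

    def put(self, row_key, column, value, ts=None):
        """Write a new version of one cell; old versions are retained."""
        cell = self.rows.setdefault(row_key, {}).setdefault(column, [])
        cell.append((ts if ts is not None else time.time(), value))

    def get(self, row_key, column):
        """Return the newest version of a cell, like a default HBase Get."""
        versions = self.rows.get(row_key, {}).get(column, [])
        return max(versions, key=lambda v: v[0])[1] if versions else None

table = SketchTable()
table.put("user42", "info:name", "Ada", ts=1)
table.put("user42", "info:name", "Ada L.", ts=2)  # newer version wins
print(table.get("user42", "info:name"))  # Ada L.
```

Because there is no fixed schema beyond the column families, different rows can carry entirely different columns, which is what makes HBase a fit for sparse, wide big-data tables.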
Data Warehousing Architectural Overview
Managing Your Data Warehouse with Hive
Running Hadoop from Shell Scripts
Working with Hadoop Streaming
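Hadoop Streaming runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout as a mapper or reducer. The sketch below writes a word-count mapper and reducer as functions over iterables of lines so they can be exercised without a cluster; in a real streaming job each would read `sys.stdin`, and the framework sorts mapper output by key before the reducer sees it.

```python
# Word count in the Hadoop Streaming style: the mapper emits "word\t1"
# lines, the reducer sums counts for adjacent equal keys. Written over
# iterables of lines here so it runs without a cluster or stdin.
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every word, as a streaming mapper prints."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum counts per word. Hadoop sorts mapper output by key before the
    reducer runs, so equal keys arrive adjacent — groupby relies on that."""
    for word, group in groupby(sorted_lines, key=lambda l: l.split("\t")[0]):
        total = sum(int(l.split("\t")[1]) for l in group)
        yield f"{word}\t{total}"

mapped = sorted(mapper(["to be or not to be"]))  # sorted() mimics the shuffle
print(list(reducer(mapped)))
```

On a cluster, scripts like these are submitted through the hadoop-streaming jar, passed as the job's mapper and reducer programs, which is what lets teams without Java developers still write Hadoop jobs.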
Other Hadoop Tools and Utilities
Getting Started on a Hadoop Project
- Demystifying complexity
Migrating an Existing Project to Hadoop
- Infrastructure considerations
- Scaling beyond your allocated resources
Hadoop Project Stakeholders and Their Toolkits
- Developers, data scientists, business analysts, and project managers
Hadoop as a Foundation for New Technologies and Approaches
Closing Remarks
Requirements
- A general understanding of programming
- An understanding of databases
- Basic knowledge of Linux