Course Outline

Section 1: Introduction to Hadoop

  • Hadoop history, concepts
  • ecosystem
  • distributions
  • high-level architecture
  • Hadoop myths
  • Hadoop challenges
  • hardware / software
  • lab : first look at Hadoop

Section 2: HDFS

  • Design and architecture
  • concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons : NameNode, Secondary NameNode, DataNode
  • communications / heartbeats
  • data integrity
  • read / write path
  • NameNode High Availability (HA), Federation
  • labs : Interacting with HDFS

Section 3: MapReduce

  • concepts and architecture
  • daemons (MRv1) : JobTracker / TaskTracker
  • phases : driver, mapper, shuffle/sort, reducer
  • MapReduce version 1 (MRv1) and version 2 (YARN)
  • Internals of MapReduce
  • Introduction to Java MapReduce programming
  • labs : Running a sample MapReduce program
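
The phases listed above (driver, mapper, shuffle/sort, reducer) can be pictured with a small word count. The sketch below is plain Python that runs in a single process, not the Hadoop Java API used in the course labs; the function names mapper, shuffle_sort, reducer and driver are hypothetical and chosen only to mirror the phase names.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_sort(pairs):
    """Shuffle/sort phase: group values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def driver(lines):
    """Driver: wire the phases together (illustrative stand-in for job setup)."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle_sort(mapped)]


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    for word, count in driver(sample):
        print(word, count)
```

On a real cluster the map and reduce steps run as separate tasks on different nodes, and the framework performs the shuffle/sort over the network; this sketch only shows how data moves between the phases.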

Section 4: Pig

  • Pig vs. Java MapReduce
  • Pig job flow
  • Pig Latin language
  • ETL with Pig
  • Transformations & Joins
  • User defined functions (UDF)
  • labs : writing Pig scripts to analyze data

Section 5: Hive

  • architecture and design
  • data types
  • SQL support in Hive
  • Creating Hive tables and querying
  • partitions
  • joins
  • text processing
  • labs : various labs on processing data with Hive

Section 6: HBase

  • concepts and architecture
  • HBase vs RDBMS vs Cassandra
  • HBase Java API
  • Time series data on HBase
  • schema design
  • labs : interacting with HBase using the shell; programming with the HBase Java API; schema design exercise

Requirements

  • Proficiency in the Java programming language (as most programming exercises are conducted in Java)
  • Familiarity with the Linux environment (including the ability to navigate the Linux command line and edit files using vi or nano)

Lab environment

Zero Install : There is no need to install Hadoop software on students' machines! A fully functional Hadoop cluster will be provided for students.

Students will need the following

  • an SSH client (Linux and Mac systems already include SSH clients; for Windows, PuTTY is recommended)
  • a browser to access the cluster, with Firefox recommended

Duration: 28 hours