
Course Outline

Section 1: Data Management in HDFS

  • Various Data Formats (JSON / Avro / Parquet)
  • Compression Schemes
  • Data Masking
  • Labs: analyzing different data formats; enabling compression

Section 2: Advanced Pig

  • User-defined Functions
  • Introduction to Pig Libraries (Elephant Bird / DataFu)
  • Loading Complex Structured Data using Pig
  • Pig Tuning
  • Labs: advanced Pig scripting, parsing complex data types

Section 3: Advanced Hive

  • User-defined Functions
  • Compressed Tables
  • Hive Performance Tuning
  • Labs: creating compressed tables, evaluating table formats and configuration

Section 4: Advanced HBase

  • Advanced Schema Modeling
  • Compression
  • Bulk Data Ingest
  • Wide-table vs. Tall-table Comparison
  • HBase and Pig
  • HBase and Hive
  • HBase Performance Tuning
  • Labs: tuning HBase; accessing HBase data from Pig and Hive; using Phoenix for data modeling

Requirements

  • Proficiency in the Java programming language (most coding exercises are in Java)
  • Familiarity with the Linux environment (including navigating the command line and editing files with vi or nano)
  • Practical knowledge of Hadoop

Lab environment

Zero install: there is no need to install Hadoop software on your own machine. A fully operational Hadoop cluster is provided for the duration of the course.

Students will need the following

Duration: 21 hours
