Course Outline

1: HDFS (17%)

  • Explain the roles of HDFS Daemons
  • Describe the standard operational procedures of an Apache Hadoop cluster, covering both data storage and processing aspects
  • Recognize current computing system characteristics that necessitate a solution like Apache Hadoop.
  • Categorize the primary objectives behind HDFS Design
  • In a given scenario, identify the suitable use case for HDFS Federation
  • Identify the components and daemons constituting an HDFS HA-Quorum cluster
  • Analyze the role of HDFS security mechanisms (Kerberos)
  • Select the most appropriate data serialization method for a specific scenario
  • Describe the pathways for file reading and writing
  • Recognize the commands required to manipulate files using the Hadoop File System Shell
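
The File System Shell objective above maps to a small set of day-to-day commands. A few illustrative examples follow; the paths and filenames are hypothetical, and the commands assume a running HDFS cluster:

```shell
# Create a directory tree in HDFS and load a local file into it
hadoop fs -mkdir -p /user/alice/data
hadoop fs -put report.csv /user/alice/data/

# Inspect what landed, then read the file back to stdout
hadoop fs -ls /user/alice/data
hadoop fs -cat /user/alice/data/report.csv

# Copy the file back to the local file system, then remove it from HDFS
hadoop fs -get /user/alice/data/report.csv ./report-copy.csv
hadoop fs -rm /user/alice/data/report.csv
```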

2: YARN and MapReduce version 2 (MRv2) (17%)

  • Comprehend how upgrading a cluster from Hadoop 1 to Hadoop 2 impacts cluster configurations
  • Understand the deployment process for MapReduce v2 (MRv2 / YARN), including all associated YARN daemons
  • Grasp the fundamental design strategy of MapReduce v2 (MRv2)
  • Determine how YARN manages resource allocations
  • Identify the workflow of a MapReduce job operating on YARN
  • Determine which configuration files must be modified, and how, to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN
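
The migration objective above centers on a handful of configuration properties. A minimal sketch (the hostname is a placeholder): mapred-site.xml must direct jobs to YARN, and yarn-site.xml must define the ResourceManager and the MRv2 shuffle auxiliary service:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of MRv1 -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: ResourceManager location and the MRv2 shuffle service -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>   <!-- placeholder hostname -->
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```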

3: Hadoop Cluster Planning (16%)

  • Identify the principal considerations when selecting hardware and operating systems to host an Apache Hadoop cluster
  • Analyze options available when selecting an OS
  • Understand kernel tuning and disk swapping processes
  • Given a scenario and workload pattern, identify a hardware configuration suitable for that context
  • Given a scenario, determine the necessary ecosystem components for the cluster to run in order to meet SLA requirements
  • Cluster sizing: given a scenario and execution frequency, identify workload specifics, including CPU, memory, storage, and disk I/O
  • Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements within a cluster
  • Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
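
For the kernel-tuning and swap objectives, a commonly cited baseline on worker nodes is to keep swapping minimal and disable transparent huge page defragmentation; the values below are typical recommendations, not universal settings:

```shell
# /etc/sysctl.conf — discourage swapping out Hadoop daemon heaps
# (vm.swappiness=1 is a common recommendation for Hadoop hosts)
vm.swappiness = 1

# Disable transparent huge page defrag, which can stall Hadoop workloads
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```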

4: Hadoop Cluster Installation and Administration (25%)

  • Given a scenario, identify how the cluster will manage disk and machine failures
  • Analyze logging configuration and logging configuration file formats
  • Understand the fundamentals of Hadoop metrics and cluster health monitoring
  • Identify the functions and purposes of tools available for cluster monitoring
  • Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
  • Identify the functions and purposes of tools available for managing the Apache Hadoop file system
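
The file-system management objective above is typically exercised with the `hdfs` administration commands; a few representative examples (these assume a running cluster and appropriate privileges):

```shell
# Check file system health: missing, corrupt, or under-replicated blocks
hdfs fsck / -files -blocks -locations

# Cluster-wide capacity and DataNode status report
hdfs dfsadmin -report

# Query the NameNode's safe mode state
hdfs dfsadmin -safemode get

# Rebalance block placement across DataNodes (threshold in percent)
hdfs balancer -threshold 10
```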

5: Resource Management (10%)

  • Understand the overall design goals of each Hadoop scheduler
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
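
The three schedulers differ mainly in how queues share cluster capacity. As an illustration of the Fair Scheduler objective, here is a minimal fair-scheduler.xml allocation file (queue names, weights, and limits are hypothetical) giving a production queue three times the share of a development queue:

```xml
<allocations>
  <queue name="prod">
    <weight>3</weight>  <!-- 3x the cluster share of "dev" -->
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="dev">
    <weight>1</weight>
    <maxRunningApps>10</maxRunningApps>  <!-- cap concurrent dev jobs -->
  </queue>
</allocations>
```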

6: Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection capabilities
  • Analyze the NameNode and JobTracker Web UIs
  • Understand how to monitor cluster Daemons
  • Identify and monitor CPU usage on master nodes
  • Describe methods to monitor swap and memory allocation on all nodes
  • Identify how to view and manage Hadoop’s log files
  • Interpret a log file
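
Daemon log verbosity and retention are controlled through Hadoop's log4j.properties; a representative fragment (the rotation sizes and counts are illustrative):

```properties
# Route daemon logs through a rolling file appender
hadoop.root.logger=INFO,RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Illustrative rotation policy: 256 MB per file, 20 rotated files kept
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
```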

Requirements

  • Foundational skills in Linux administration
  • Basic programming proficiency

Duration: 35 Hours
