HadoopConceptsNote

Chapter 3: YARN and MapReduce

Goal

Manage resources across the entire cluster, on which many applications run at the same time.

[Figure: yarn]

Hive, Pig, and similar tools run on top of MapReduce and do not interact with YARN directly.

Terminology

  • Resource Manager
    • Manages the resources across the cluster.
    • ApplicationsManager
      • Accepts job submissions.
      • Negotiates the first container, in which the application's ApplicationMaster runs.
      • Restarts the ApplicationMaster container on failure.
    • Scheduler
      • Allocates resources to the running applications.
      • Pluggable: CapacityScheduler and FairScheduler.
  • Node Manager
    • Runs on every node in the cluster to launch and monitor containers.
    • Sends heartbeats to the Resource Manager that carry information about the NodeManager's running containers and the resources available for new containers.
  • Container
    • A resource allocation (memory, CPU) on a single node, run as a Unix process or a Linux cgroup.
  • ApplicationMaster
    • One per application: manages each instance of an application that runs within YARN.
    • Negotiates resource containers from the Resource Manager (Scheduler).
    • Monitors the execution and resource consumption of its containers, such as their CPU and memory allocations.
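
The interaction between the components above can be sketched as a toy simulation. This is only an illustration of the roles, not the YARN API: the class and method names (submit, allocate, request_containers, heartbeat), the memory-only resource model, and the first-fit placement are all assumptions made for the example. The real schedulers (CapacityScheduler, FairScheduler) use queue-based policies that are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy NodeManager: tracks free memory and the containers it launched."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def heartbeat(self):
        # Report running containers and the resources left for new ones.
        return {"node": self.name,
                "running": list(self.containers),
                "free_mb": self.free_mb}

    def launch(self, container_id, mb):
        self.free_mb -= mb
        self.containers.append(container_id)


class ResourceManager:
    """Toy ResourceManager; allocate() is a first-fit stand-in for the Scheduler."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 0

    def allocate(self, mb):
        # Scheduler role: pick the first node that reports enough free memory.
        for node in self.nodes:
            if node.heartbeat()["free_mb"] >= mb:
                self.next_id += 1
                container_id = f"container_{self.next_id}"
                node.launch(container_id, mb)
                return container_id, node.name
        return None  # no node has enough capacity

    def submit(self, app_name, am_mb):
        # ApplicationsManager role: accept the job and obtain the first
        # container, in which the ApplicationMaster itself runs.
        grant = self.allocate(am_mb)
        if grant is None:
            raise RuntimeError("no capacity for the ApplicationMaster")
        return ApplicationMaster(app_name, self, grant)


class ApplicationMaster:
    """Toy per-application master that negotiates task containers."""

    def __init__(self, app_name, rm, own_container):
        self.app_name = app_name
        self.rm = rm
        self.own_container = own_container
        self.task_containers = []

    def request_containers(self, count, mb):
        # Ask the Resource Manager for containers until it runs out of capacity.
        for _ in range(count):
            grant = self.rm.allocate(mb)
            if grant is None:
                break
            self.task_containers.append(grant)
        return self.task_containers


nodes = [NodeManager("node1", 2048), NodeManager("node2", 2048)]
rm = ResourceManager(nodes)
am = rm.submit("wordcount", am_mb=512)
granted = am.request_containers(count=4, mb=1024)
print(am.own_container, granted)
```

With two 2048 MB nodes, the ApplicationMaster takes 512 MB on node1, and only three of the four requested 1024 MB task containers fit in what remains. The fourth request is refused because no node reports enough free memory.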

TODO

  1. Study cgroups and LXC.
  2. Add more detail on CapacityScheduler and FairScheduler.