HadoopConceptsNote

Scheduling in YARN

  • In the real world, resources are limited, and on a busy cluster an application will often have to wait for some of its requests to be fulfilled.
  • The YARN scheduler allocates resources to applications according to a defined policy.

Scheduler

  • The FIFO Scheduler
    • Places applications in a queue and runs them in the order of submission.
    • For a shared cluster, the Capacity Scheduler or the Fair Scheduler is a better fit.
  • The Capacity Scheduler
    • Allows a Hadoop cluster to be shared along organisational lines.
    • Each organisation is allocated a certain capacity of the overall cluster.
    • Queues may be further divided in a hierarchical fashion.
    • This lets each organisation share its cluster allocation between different groups of users within the organisation.
    • Queue elasticity
      • If there is more than one job in the queue and idle resources are available, the Capacity Scheduler may allocate the spare resources to jobs in the queue, even if that causes the queue’s capacity to be exceeded.
      • If the property yarn.scheduler.capacity.<queue-path>.user-limit-factor is set to a value larger than 1 (the default), a single job is allowed to use more than its queue’s capacity.
      • Queues are configured in the file capacity-scheduler.xml.
      • Within each queue, applications are scheduled in FIFO order.
  • The Fair Scheduler
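
The queue elasticity notes above can be sketched as a small model. This is a deliberately simplified illustration, not the Capacity Scheduler's actual algorithm: the names Queue, max_share_for_single_user and grant are hypothetical, all quantities are fractions of the cluster, and the example values (0.4, 0.75, 2.0) are made up. It assumes that one user's share is bounded by capacity times user-limit-factor, and that the queue's maximum capacity caps any elastic growth.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Simplified model of one queue; all values are fractions of the cluster."""
    capacity: float           # configured share, e.g. 0.4 = 40% of the cluster
    maximum_capacity: float   # ceiling on elastic growth of the queue
    user_limit_factor: float  # multiple of `capacity` one user may consume


def max_share_for_single_user(queue: Queue) -> float:
    """Upper bound on the cluster fraction one user can hold in this queue."""
    return min(queue.capacity * queue.user_limit_factor, queue.maximum_capacity)


def grant(queue: Queue, requested: float, used_by_user: float, idle: float) -> float:
    """Cluster fraction granted for a new request (hypothetical helper).

    The grant is limited by the request itself, by the idle capacity in the
    cluster, and by the per-user bound computed above.
    """
    headroom = max(0.0, max_share_for_single_user(queue) - used_by_user)
    return min(requested, idle, headroom)


# Illustrative values only.
default_queue = Queue(capacity=0.4, maximum_capacity=0.75, user_limit_factor=1.0)
elastic_queue = Queue(capacity=0.4, maximum_capacity=0.75, user_limit_factor=2.0)

print(max_share_for_single_user(default_queue))
print(max_share_for_single_user(elastic_queue))
print(grant(elastic_queue, requested=0.5, used_by_user=0.4, idle=0.3))
```

With a factor of 1 the user stays within the queue's configured capacity however idle the cluster is. With a factor of 2 the same user can grow past it, but only while idle resources exist and only up to the queue's maximum capacity.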

Delay Scheduling

  • Scheduling opportunity
    • Each NodeManager heartbeat is a potential scheduling opportunity for an application.
  • yarn.scheduler.capacity.node-locality-delay is the number of scheduling opportunities the Capacity Scheduler is prepared to miss before relaxing the node-locality constraint.
  • yarn.scheduler.fair.locality.threshold.node is the Fair Scheduler's counterpart, expressed as a fraction of the number of nodes in the cluster.
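
The counting behind delay scheduling can be sketched as follows. This is a minimal illustration under stated assumptions, not YARN's implementation: PendingRequest, on_heartbeat and fair_threshold_to_opportunities are hypothetical names, racks are not modelled (a placement after the delay is simply labelled "relaxed"), and truncating the Fair Scheduler threshold to an integer is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRequest:
    """A container request that prefers one particular node."""
    preferred_node: str
    missed_opportunities: int = 0


def on_heartbeat(request: PendingRequest, node: str, max_missed: int) -> Optional[str]:
    """Handle one scheduling opportunity, i.e. one node heartbeat.

    Returns the locality level of the placement, or None if the scheduler
    passes up this opportunity and keeps waiting for the preferred node.
    """
    if node == request.preferred_node:
        return "node-local"
    if request.missed_opportunities >= max_missed:
        return "relaxed"  # waited long enough: accept a non-preferred node
    request.missed_opportunities += 1
    return None


def fair_threshold_to_opportunities(threshold: float, cluster_nodes: int) -> int:
    """Convert a fraction of the cluster size into a count of opportunities."""
    return int(threshold * cluster_nodes)


# The preferred node n1 never reports in, so the request waits for three
# opportunities and is placed on the fourth.
request = PendingRequest(preferred_node="n1")
for node in ["n2", "n3", "n2", "n4"]:
    print(node, on_heartbeat(request, node, max_missed=3))
```

Here max_missed plays the role of yarn.scheduler.capacity.node-locality-delay. For the Fair Scheduler, a threshold of 0.5 on a 40-node cluster would correspond to 20 opportunities under this conversion.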

Dominant Resource Fairness

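  • When only a single resource type (such as memory) is being scheduled, capacity and fairness are easy to determine; DRF addresses the case where several resource types are in play.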
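
As a rough illustration of the idea behind DRF, the sketch below computes each application's dominant resource: the resource for which its request takes the largest fraction of the cluster. The function name dominant_share and all numbers (a cluster of 100 CPUs and 10,000 GB of memory, two hypothetical applications) are assumptions chosen for the example, and the sketch only measures shares; it does not perform any allocation.

```python
from typing import Dict, Tuple

Resources = Dict[str, float]


def dominant_share(request: Resources, cluster: Resources) -> Tuple[str, float]:
    """Return the dominant resource of a request and its share of the cluster."""
    shares = {name: amount / cluster[name] for name, amount in request.items()}
    dominant = max(shares, key=shares.get)
    return dominant, shares[dominant]


# Illustrative values only.
cluster = {"cpu": 100, "memory_gb": 10_000}
app_a = {"cpu": 2, "memory_gb": 300}   # per-container request of application A
app_b = {"cpu": 6, "memory_gb": 100}   # per-container request of application B

print(dominant_share(app_a, cluster))
print(dominant_share(app_b, cluster))
```

With these numbers, A's dominant resource is memory (3% of the cluster per container) and B's is CPU (6%). Because B's containers are twice as large in terms of dominant share, equalising dominant shares would give B half as many containers as A.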

TODO

  1. Run an experiment comparing the schedulers.
  2. Try DRF.