Scheduling in YARN
- In the real world, resources are limited, and on a busy cluster an
application often has to wait for some of its requests to be fulfilled.
- The YARN scheduler allocates resources to applications according to a defined policy.
Scheduler
- The FIFO Scheduler
- Places applications in a queue and runs them in the order of submission.
- On a shared cluster, the Capacity Scheduler or the Fair Scheduler is a better choice, because under FIFO a large application can occupy the whole cluster while the others wait.
- The Capacity Scheduler
- Allows a Hadoop cluster to be shared along organisational lines.
- Each organisation is allocated a certain capacity of the overall cluster through its own queue.
- Queues may be further divided in a hierarchical fashion, so that each organisation
can share its allocation between different groups of users within the organisation.
- Queue elasticity
- If there is more than one job in the queue and there are idle resources
available, then the Capacity Scheduler may allocate the spare resources to jobs in the
queue even if that causes the queue’s capacity to be exceeded.
- If the property yarn.scheduler.capacity.<queue-path>.user-limit-factor is set to a value larger
than 1 (the default), then a single job is allowed to use more than its queue’s capacity.
- Queues are configured in the file capacity-scheduler.xml
- Within each queue, applications are scheduled in FIFO order.
- Fair Scheduler
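The Capacity Scheduler queue hierarchy described above could be written to capacity-scheduler.xml roughly as follows. This is a minimal sketch, not a recommended configuration: the queue names (prod, dev, eng, science) and all percentages are hypothetical, and the helper functions are illustrative only. The property names follow the yarn.scheduler.capacity.<queue-path>.* pattern, and maximum-capacity is used here to put an upper bound on queue elasticity.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: root -> prod, dev; dev -> eng, science.
PROPERTIES = {
    "yarn.scheduler.capacity.root.queues": "prod,dev",
    "yarn.scheduler.capacity.root.dev.queues": "eng,science",
    "yarn.scheduler.capacity.root.prod.capacity": "40",
    "yarn.scheduler.capacity.root.dev.capacity": "60",
    # Bounds elasticity: dev may borrow idle resources, but only up to 75%.
    "yarn.scheduler.capacity.root.dev.maximum-capacity": "75",
    "yarn.scheduler.capacity.root.dev.eng.capacity": "50",
    "yarn.scheduler.capacity.root.dev.science.capacity": "50",
    # Per the notes above, a value larger than 1 lets a single job
    # use more than the queue's capacity.
    "yarn.scheduler.capacity.root.prod.user-limit-factor": "2",
}


def build_configuration(properties):
    """Build a Hadoop-style <configuration> element from a name/value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def sibling_capacities_sum_to_100(properties, parent):
    """Check that the child queues of `parent` have capacities adding up to 100."""
    prefix = "yarn.scheduler.capacity." + parent
    children = properties[prefix + ".queues"].split(",")
    total = sum(float(properties[prefix + "." + c + ".capacity"]) for c in children)
    return total == 100.0


config = build_configuration(PROPERTIES)
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(config)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

The capacities of sibling queues are percentages of their parent, which is why the sketch checks that each level adds up to 100 before the file is used.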

Delay Scheduling
- Scheduling opportunity
- Each NodeManager heartbeat is a potential scheduling opportunity for an application to run a container.
- yarn.scheduler.capacity.node-locality-delay
is the number of scheduling opportunities the Capacity Scheduler is prepared to miss before it relaxes the node constraint and accepts any node in the same rack.
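- yarn.scheduler.fair.locality.threshold.node
is the proportion of the cluster's nodes (for example 0.5 for half of them) that must have offered scheduling opportunities before the Fair Scheduler accepts another node in the same rack.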
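The delay-scheduling rule above can be illustrated with a small simulation. This is a simplified model rather than the actual scheduler code: the Request class, the offer and place functions, and the node and rack names are all hypothetical. It assumes that every heartbeat from a non-preferred node counts as one missed opportunity, and that the Fair Scheduler threshold is converted to a count by multiplying it by the cluster size.

```python
from dataclasses import dataclass


@dataclass
class Request:
    preferred_node: str
    preferred_rack: str
    missed: int = 0  # scheduling opportunities skipped so far


def offer(request, node, rack, node_locality_delay):
    """Handle one heartbeat: return the locality level used, or None to skip."""
    if node == request.preferred_node:
        return "node-local"
    # Only after enough missed opportunities is a rack-local node accepted.
    if request.missed >= node_locality_delay and rack == request.preferred_rack:
        return "rack-local"
    request.missed += 1
    return None


def place(request, heartbeats, node_locality_delay):
    """Feed heartbeats (node, rack) in order until the request is placed."""
    for node, rack in heartbeats:
        level = offer(request, node, rack, node_locality_delay)
        if level is not None:
            return node, level
    return None, None


def fair_threshold_to_opportunities(threshold, cluster_nodes):
    """Express a Fair Scheduler style threshold as a number of opportunities."""
    return int(threshold * cluster_nodes)


heartbeats = [("n2", "r1"), ("n3", "r1"), ("n4", "r2"), ("n2", "r1")]
print(place(Request("n1", "r1"), heartbeats, node_locality_delay=2))
print(fair_threshold_to_opportunities(0.5, 10))
```

With a delay of 2, the first two heartbeats are skipped, the third is rejected because it comes from another rack, and the fourth is accepted as rack-local.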
Dominant Resource Fairness
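- When only a single resource type (such as memory) is being scheduled, capacity and fairness are easy to determine.
- With multiple resource types (such as memory and CPU), DRF compares applications by their dominant resource, i.e. the resource of which each uses the largest share of the cluster.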
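A small worked example may help when trying DRF. The sketch below is a simplified allocation loop, not YARN's implementation: the cluster size (100 CPUs, 10,000 GB of memory), the two applications, and their container sizes are assumed for illustration, and the function names are hypothetical. At each step it gives one container to the application with the lowest dominant share among those whose next container still fits.

```python
from fractions import Fraction


def dominant_share(demand, capacity):
    """Return (resource, share) for the resource with the largest cluster share."""
    shares = {r: Fraction(demand[r], capacity[r]) for r in capacity}
    resource = max(shares, key=shares.get)
    return resource, shares[resource]


def drf_allocate(capacity, demands):
    """Hand out containers one at a time, favouring the lowest dominant share."""
    used = {r: 0 for r in capacity}
    allocated = {app: {r: 0 for r in capacity} for app in demands}
    containers = {app: 0 for app in demands}

    def current_share(app):
        return max(Fraction(allocated[app][r], capacity[r]) for r in capacity)

    while True:
        # Applications whose next container still fits in the cluster.
        fitting = [
            app for app, d in demands.items()
            if all(used[r] + d[r] <= capacity[r] for r in capacity)
        ]
        if not fitting:
            return containers
        app = min(fitting, key=current_share)  # ties go to the first listed app
        for r in capacity:
            used[r] += demands[app][r]
            allocated[app][r] += demands[app][r]
        containers[app] += 1


capacity = {"cpu": 100, "memory_gb": 10000}
demands = {
    "A": {"cpu": 2, "memory_gb": 300},  # 2% CPU, 3% memory: memory dominant
    "B": {"cpu": 6, "memory_gb": 100},  # 6% CPU, 1% memory: CPU dominant
}
print(dominant_share(demands["A"], capacity))
print(dominant_share(demands["B"], capacity))
print(drf_allocate(capacity, demands))
```

Because B's containers are twice as large in its dominant resource (6% against 3%), B ends up with half as many containers as A while both hold the same dominant share.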
TODO
- Run an experiment comparing the schedulers.
- Try DRF.