HadoopConceptsNote

YARN Application Lifespan

  • Can vary from a few seconds to months.
  • (Simplest) One application per job.
    • MapReduce
  • (Second) One application per workflow.
    • Containers can be reused between jobs.
    • Intermediate data can be cached between jobs.
    • Spark and Tez are examples.
  • (Third) Long-running application shared by different users.
    • Often acts in a coordination role.
    • Apache Slider
      • Has a long-running application master for launching other applications on the cluster.
    • Impala: an always-on ApplicationMaster.
      • Provides a proxy application that the Impala daemons use to request cluster resources.
      • Avoids the overhead of starting a new ApplicationMaster for each request.
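
The trade-off between the three models can be illustrated with a toy cost model. This is only a sketch: the Costs class, the function names, and the startup times are hypothetical values chosen for illustration, not YARN APIs or measurements.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Both values are illustrative assumptions, not measurements.
    am_startup_s: float = 5.0          # time to start one ApplicationMaster
    container_startup_s: float = 1.0   # time to allocate and launch one container


def one_app_per_job(jobs: int, containers: int, c: Costs) -> float:
    # Model 1: every job starts its own ApplicationMaster and fresh containers.
    return jobs * (c.am_startup_s + containers * c.container_startup_s)


def one_app_per_workflow(jobs: int, containers: int, c: Costs) -> float:
    # Model 2: one ApplicationMaster for the whole workflow; containers are
    # launched once and reused by later jobs.
    return c.am_startup_s + containers * c.container_startup_s


def long_running_app(jobs: int, containers: int, c: Costs) -> float:
    # Model 3: the ApplicationMaster is already running, so no job pays for
    # its startup; this toy model still charges container startup per job.
    return jobs * containers * c.container_startup_s


if __name__ == "__main__":
    costs = Costs()
    for model in (one_app_per_job, one_app_per_workflow, long_running_app):
        print(model.__name__, model(10, 4, costs))
```

The model ignores scheduling queues and the running time of the jobs themselves; it only shows which startup costs each lifespan model pays repeatedly and which it pays once or not at all.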

TODO

  1. Practice with Slider.
  2. How does the proxy application work in Impala?
  3. Is it possible to implement a proxy application in Slider?