New Scheduling Algorithms for Improving Performance and Resource Utilization in Hadoop YARN Clusters

Yi Yao, Han Gao, Jiayin Wang, Bo Sheng, Ningfang Mi

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

The MapReduce framework has become the de facto scheme for scalable processing of semi-structured and unstructured data in recent years. The Hadoop ecosystem has evolved into its second generation, Hadoop YARN, which adopts fine-grained resource management schemes for job scheduling. Fairness and efficiency are now the two main concerns in YARN resource management, because resources in YARN are shared and contended for by multiple applications. However, the current scheduling in YARN does not yield the optimal resource arrangement, leaving resources unnecessarily idle and making scheduling inefficient. It ignores both the dependencies between tasks, which are crucial for efficient resource utilization, and the heterogeneous job features found in real application environments. We thus propose a new YARN scheduler that can effectively reduce the makespan (i.e., the total execution time) of a batch of MapReduce jobs in Hadoop YARN clusters by leveraging information about requested resources, resource capacities, and dependencies between tasks. To accommodate heterogeneity in MapReduce jobs, we also extend our scheduler to consider job iteration information in its scheduling decisions. We implemented the new scheduling algorithm as a pluggable scheduler in YARN and evaluated it with a set of classic MapReduce benchmarks. The experimental results demonstrate that our YARN scheduler effectively reduces makespans and improves resource utilization.

Original language: English
Article number: 8624318
Pages (from-to): 1158-1171
Number of pages: 14
Journal: IEEE Transactions on Cloud Computing
Volume: 9
Issue number: 3
State: Published - 1 Jul 2021

Keywords

  • Data Processing
  • MapReduce
  • Resource Management
  • YARN
