Optimizing Internal Overlaps by Self-Adjusting Resource Allocation in Multi-Stage Computing Systems

Allen Yang, Jiayin Wang, Ying Mao, Yi Yao, Ningfang Mi, Bo Sheng

Research output: Contribution to journal, Article, peer-review

1 Scopus citation


With the rise of big data, more and more users launch computing systems to process large volumes of data in various applications. A scheduling algorithm is crucial to the performance of these processing platforms, especially when they concurrently execute a batch of jobs. Such jobs usually consist of multiple stages, each of which produces intermediate data that is piped to the next stage for further processing. However, scheduling in a big data computing system differs from the traditional multi-stage job scheduling problem: for any two consecutive stages, the later stage usually starts before the earlier stage has finished, so that it can 'shuffle' the intermediate data. In this paper, we consider MapReduce/Hadoop as a representative computing system and develop a new strategy named OMO, Optimize MapReduce Overlap with a Good Start (Reduce) and a Good Finish (Map). A MapReduce job contains two consecutive phases: map and reduce. Our general goal is to optimize the internal overlap between these two phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which aim to align the two phases effectively based on the characteristics of the MapReduce process. OMO has been implemented on the Hadoop system and evaluated with extensive experiments. The results show that OMO achieves superior performance in terms of the total completion time (i.e., makespan) of a batch of jobs.

Original language: English
Article number: 9456865
Pages (from-to): 88805-88819
Number of pages: 15
Journal: IEEE Access
State: Published - 2021


  • Hadoop scheduling
  • MapReduce jobs
  • Reduced makespan
  • Resource management

