TY - GEN
T1 - OMO
T2 - 34th IEEE International Performance Computing and Communications Conference, IPCCC 2015
AU - Wang, Jiayin
AU - Yao, Yi
AU - Mao, Ying
AU - Sheng, Bo
AU - Mi, Ningfang
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/2/17
Y1 - 2016/2/17
N2 - MapReduce has become a popular data processing framework in the past few years. The scheduling algorithm is crucial to the performance of a MapReduce cluster, especially when the cluster is concurrently executing a batch of MapReduce jobs. However, the scheduling problem in MapReduce differs from the traditional job scheduling problem because the reduce phase usually starts before the map phase has finished, in order to shuffle the intermediate data. This paper develops a new strategy, named OMO, which aims to optimize the overlap between the map and reduce phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which capture the characteristics of the overlap in a MapReduce process and achieve a good alignment of the two phases. We have implemented OMO on the Hadoop system and evaluated its performance with extensive experiments. The results show that OMO achieves superior performance in terms of the total completion length (i.e., makespan) of a batch of jobs.
UR - http://www.scopus.com/inward/record.url?scp=84969913388&partnerID=8YFLogxK
U2 - 10.1109/PCCC.2015.7410279
DO - 10.1109/PCCC.2015.7410279
M3 - Conference contribution
AN - SCOPUS:84969913388
T3 - 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015
BT - 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 December 2015 through 16 December 2015
ER -