OMO: Optimize MapReduce overlap with a good start (reduce) and a good finish (map)

Jiayin Wang, Yi Yao, Ying Mao, Bo Sheng, Ningfang Mi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

MapReduce has become a popular data processing framework in the past few years. The scheduling algorithm is crucial to the performance of a MapReduce cluster, especially when the cluster is concurrently executing a batch of MapReduce jobs. However, the scheduling problem in MapReduce differs from the traditional job scheduling problem because the reduce phase usually starts before the map phase has finished, in order to shuffle the intermediate data. This paper develops a new strategy, named OMO, which specifically aims to optimize the overlap between the map and reduce phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which capture the characteristics of the overlap in a MapReduce process and achieve a good alignment of the two phases. We have implemented OMO on Hadoop and evaluated its performance with extensive experiments. The results show that OMO achieves superior performance in terms of the total completion length (i.e., makespan) of a batch of jobs.

Original language: English
Title of host publication: 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467385909
DOIs: https://doi.org/10.1109/PCCC.2015.7410279
State: Published - 17 Feb 2016
Event: 34th IEEE International Performance Computing and Communications Conference, IPCCC 2015 - Nanjing, China
Duration: 14 Dec 2015 to 16 Dec 2015

Publication series

Name: 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015

Other

Other: 34th IEEE International Performance Computing and Communications Conference, IPCCC 2015
Country: China
City: Nanjing
Period: 14/12/15 to 16/12/15

Fingerprint

Scheduling
Scheduling algorithms
Experiments

Cite this

Wang, J., Yao, Y., Mao, Y., Sheng, B., & Mi, N. (2016). OMO: Optimize MapReduce overlap with a good start (reduce) and a good finish (map). In 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015 [7410279] (2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PCCC.2015.7410279
Wang, Jiayin ; Yao, Yi ; Mao, Ying ; Sheng, Bo ; Mi, Ningfang. / OMO : Optimize MapReduce overlap with a good start (reduce) and a good finish (map). 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015. Institute of Electrical and Electronics Engineers Inc., 2016. (2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015).
@inproceedings{1c1558e75eb14b4fa5c5f4f3d12b61d0,
title = "OMO: Optimize MapReduce overlap with a good start (reduce) and a good finish (map)",
abstract = "MapReduce has become a popular data processing framework in the past few years. Scheduling algorithm is crucial to the performance of a MapReduce cluster, especially when the cluster is concurrently executing a batch of MapReduce jobs. However, the scheduling problem in MapReduce is different from the traditional job scheduling problem as the reduce phase usually starts before the map phase is finished to shuffle the intermediate data. This paper develops a new strategy, named OMO, which particularly aims to optimize the overlap between the map and reduce phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which catch the characteristics of the overlap in a MapReduce process and achieve a good alignment of the two phases. We have implemented OMO on Hadoop system and evaluated the performance with extensive experiments. The results show that OMO's performance is superior in terms of total completion length (i.e., makespan) of a batch of jobs.",
author = "Jiayin Wang and Yi Yao and Ying Mao and Bo Sheng and Ningfang Mi",
year = "2016",
month = "2",
day = "17",
doi = "10.1109/PCCC.2015.7410279",
language = "English",
series = "2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015",

}

Wang, J, Yao, Y, Mao, Y, Sheng, B & Mi, N 2016, OMO: Optimize MapReduce overlap with a good start (reduce) and a good finish (map). in 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015., 7410279, 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015, Institute of Electrical and Electronics Engineers Inc., 34th IEEE International Performance Computing and Communications Conference, IPCCC 2015, Nanjing, China, 14/12/15. https://doi.org/10.1109/PCCC.2015.7410279

OMO : Optimize MapReduce overlap with a good start (reduce) and a good finish (map). / Wang, Jiayin; Yao, Yi; Mao, Ying; Sheng, Bo; Mi, Ningfang.

2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015. Institute of Electrical and Electronics Engineers Inc., 2016. 7410279 (2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015).

TY - GEN

T1 - OMO

T2 - Optimize MapReduce overlap with a good start (reduce) and a good finish (map)

AU - Wang, Jiayin

AU - Yao, Yi

AU - Mao, Ying

AU - Sheng, Bo

AU - Mi, Ningfang

PY - 2016/2/17

Y1 - 2016/2/17

AB - MapReduce has become a popular data processing framework in the past few years. Scheduling algorithm is crucial to the performance of a MapReduce cluster, especially when the cluster is concurrently executing a batch of MapReduce jobs. However, the scheduling problem in MapReduce is different from the traditional job scheduling problem as the reduce phase usually starts before the map phase is finished to shuffle the intermediate data. This paper develops a new strategy, named OMO, which particularly aims to optimize the overlap between the map and reduce phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which catch the characteristics of the overlap in a MapReduce process and achieve a good alignment of the two phases. We have implemented OMO on Hadoop system and evaluated the performance with extensive experiments. The results show that OMO's performance is superior in terms of total completion length (i.e., makespan) of a batch of jobs.

UR - http://www.scopus.com/inward/record.url?scp=84969913388&partnerID=8YFLogxK

U2 - 10.1109/PCCC.2015.7410279

DO - 10.1109/PCCC.2015.7410279

M3 - Conference contribution

AN - SCOPUS:84969913388

T3 - 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015

BT - 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Wang J, Yao Y, Mao Y, Sheng B, Mi N. OMO: Optimize MapReduce overlap with a good start (reduce) and a good finish (map). In 2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015. Institute of Electrical and Electronics Engineers Inc. 2016. 7410279. (2015 IEEE 34th International Performance Computing and Communications Conference, IPCCC 2015). https://doi.org/10.1109/PCCC.2015.7410279