ESplash

Efficient speculation in large scale heterogeneous computing systems

Jiayin Wang, Teng Wang, Zhengyu Yang, Ningfang Mi, Bo Sheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

11 Citations (Scopus)

Abstract

In this paper, we aim to develop an efficient speculation framework for a heterogeneous cluster. Speculation is a common mechanism that identifies 'slow' nodes in a cluster and starts redundant tasks on other nodes to guarantee reliability. We consider MapReduce/Hadoop as a representative computing platform, and our general goal is to accurately and quickly identify the straggler nodes during job execution. On the one hand, our approach significantly reduces unnecessary speculative executions that occupy system resources but never finish. On the other hand, when a node is prone to failure, our solution is able to detect it at an early stage and effectively launch a speculative task to avoid delaying the job execution. We implement our solution on the Hadoop platform and evaluate it with extensive experiments. The results show that our solution is efficient and effective in handling speculative execution. Job execution time in our system improves on that of the current Hadoop distribution.
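
As a rough illustration of the general speculation mechanism the abstract describes (flag slow task attempts, then start a redundant copy on a different node), a minimal Python sketch follows. It is not the ESplash algorithm, which this record does not detail. The `TaskAttempt` type, the `slow_factor` threshold, and the progress-rate heuristic are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class TaskAttempt:
    """Hypothetical snapshot of one running task attempt."""
    task_id: str
    node: str
    progress: float   # fraction of work completed, 0.0 to 1.0
    elapsed_s: float  # seconds since the attempt started


def progress_rate(t: TaskAttempt) -> float:
    """Progress made per second; 0.0 if the attempt has just started."""
    return t.progress / t.elapsed_s if t.elapsed_s > 0 else 0.0


def find_stragglers(tasks: List[TaskAttempt],
                    slow_factor: float = 0.5) -> List[TaskAttempt]:
    """Flag running attempts whose rate is below slow_factor * mean rate.

    slow_factor is an assumed tuning knob, not a value from the paper.
    """
    running = [t for t in tasks if t.progress < 1.0]
    if len(running) < 2:
        return []  # nothing to compare against
    avg = mean(progress_rate(t) for t in running)
    return [t for t in running if progress_rate(t) < slow_factor * avg]


def pick_backup_node(straggler: TaskAttempt,
                     idle_nodes: List[str]) -> Optional[str]:
    """Choose the first idle node that is not the straggler's own node."""
    for node in idle_nodes:
        if node != straggler.node:
            return node
    return None


if __name__ == "__main__":
    attempts = [
        TaskAttempt("t1", "n1", progress=0.80, elapsed_s=100),
        TaskAttempt("t2", "n2", progress=0.75, elapsed_s=100),
        TaskAttempt("t3", "n3", progress=0.10, elapsed_s=100),
    ]
    for s in find_stragglers(attempts):
        print(s.task_id, "->", pick_backup_node(s, ["n3", "n1"]))
```

A threshold relative to the mean rate is only one simple choice. In a heterogeneous cluster it can mislabel tasks on legitimately slower hardware, which is the kind of unnecessary speculation the abstract says the paper aims to reduce.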

Original language: English
Title of host publication: 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509052523
DOIs: https://doi.org/10.1109/PCCC.2016.7820648
State: Published - 17 Jan 2017
Event: 35th IEEE International Performance Computing and Communications Conference, IPCCC 2016 - Las Vegas, United States
Duration: 9 Dec 2016 to 11 Dec 2016

Publication series

Name: 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016

Other

Other: 35th IEEE International Performance Computing and Communications Conference, IPCCC 2016
Country: United States
City: Las Vegas
Period: 9/12/16 to 11/12/16

Cite this

Wang, J., Wang, T., Yang, Z., Mi, N., & Sheng, B. (2017). ESplash: Efficient speculation in large scale heterogeneous computing systems. In 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016 [7820648] (2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PCCC.2016.7820648
Wang, Jiayin; Wang, Teng; Yang, Zhengyu; Mi, Ningfang; Sheng, Bo. / ESplash: Efficient speculation in large scale heterogeneous computing systems. 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016. Institute of Electrical and Electronics Engineers Inc., 2017. (2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016).
@inproceedings{fd7e0e9e650a470eaad1bb8783ee4c0e,
title = "ESplash: Efficient speculation in large scale heterogeneous computing systems",
abstract = "In this paper, we aim to develop an efficient speculation framework for a heterogeneous cluster. Speculation is a common mechanism that identifies 'slow' nodes in a cluster and starts redundant tasks on other nodes to guarantee reliability. We consider MapReduce/Hadoop as a representative computing platform, and our general goal is to accurately and quickly identify the straggler nodes during job execution. On the one hand, our approach significantly reduces unnecessary speculative executions that occupy system resources but never finish. On the other hand, when a node is prone to failure, our solution is able to detect it at an early stage and effectively launch a speculative task to avoid delaying the job execution. We implement our solution on the Hadoop platform and evaluate it with extensive experiments. The results show that our solution is efficient and effective in handling speculative execution. Job execution time in our system improves on that of the current Hadoop distribution.",
author = "Jiayin Wang and Teng Wang and Zhengyu Yang and Ningfang Mi and Bo Sheng",
year = "2017",
month = "1",
day = "17",
doi = "10.1109/PCCC.2016.7820648",
language = "English",
series = "2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016",

}

Wang, J, Wang, T, Yang, Z, Mi, N & Sheng, B 2017, ESplash: Efficient speculation in large scale heterogeneous computing systems. in 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016., 7820648, 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016, Institute of Electrical and Electronics Engineers Inc., 35th IEEE International Performance Computing and Communications Conference, IPCCC 2016, Las Vegas, United States, 9/12/16. https://doi.org/10.1109/PCCC.2016.7820648

ESplash: Efficient speculation in large scale heterogeneous computing systems. / Wang, Jiayin; Wang, Teng; Yang, Zhengyu; Mi, Ningfang; Sheng, Bo.

2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016. Institute of Electrical and Electronics Engineers Inc., 2017. 7820648 (2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016).

TY  - GEN
T1  - ESplash
T2  - Efficient speculation in large scale heterogeneous computing systems
AU  - Wang, Jiayin
AU  - Wang, Teng
AU  - Yang, Zhengyu
AU  - Mi, Ningfang
AU  - Sheng, Bo
PY  - 2017/1/17
Y1  - 2017/1/17
AB  - In this paper, we aim to develop an efficient speculation framework for a heterogeneous cluster. Speculation is a common mechanism that identifies 'slow' nodes in a cluster and starts redundant tasks on other nodes to guarantee reliability. We consider MapReduce/Hadoop as a representative computing platform, and our general goal is to accurately and quickly identify the straggler nodes during job execution. On the one hand, our approach significantly reduces unnecessary speculative executions that occupy system resources but never finish. On the other hand, when a node is prone to failure, our solution is able to detect it at an early stage and effectively launch a speculative task to avoid delaying the job execution. We implement our solution on the Hadoop platform and evaluate it with extensive experiments. The results show that our solution is efficient and effective in handling speculative execution. Job execution time in our system improves on that of the current Hadoop distribution.
UR  - http://www.scopus.com/inward/record.url?scp=85013421526&partnerID=8YFLogxK
U2  - 10.1109/PCCC.2016.7820648
DO  - 10.1109/PCCC.2016.7820648
M3  - Conference contribution
T3  - 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016
BT  - 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -

Wang J, Wang T, Yang Z, Mi N, Sheng B. ESplash: Efficient speculation in large scale heterogeneous computing systems. In 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016. Institute of Electrical and Electronics Engineers Inc. 2017. 7820648. (2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016). https://doi.org/10.1109/PCCC.2016.7820648