TY - GEN
T1 - ESplash
T2 - 35th IEEE International Performance Computing and Communications Conference, IPCCC 2016
AU - Wang, Jiayin
AU - Wang, Teng
AU - Yang, Zhengyu
AU - Mi, Ningfang
AU - Sheng, Bo
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/1/17
Y1 - 2017/1/17
N2 - In this paper, we aim to develop an efficient speculation framework for a heterogeneous cluster. Speculation is a common mechanism that identifies 'slow' nodes in a cluster and starts redundant tasks on other nodes to guarantee reliability. We consider MapReduce/Hadoop as a representative computing platform, and our general goal is to accurately and quickly identify straggler nodes during job execution. On the one hand, our approach significantly reduces unnecessary speculative executions that occupy system resources but never finish. On the other hand, when a node is prone to failure, our solution detects it at an early stage and launches a speculative task to avoid delaying the job. We implement our solution on the Hadoop platform and evaluate it with extensive experiments. The results show that our solution handles speculative execution efficiently and effectively, and that job execution time in our system is shorter than in the current Hadoop distribution.
AB - In this paper, we aim to develop an efficient speculation framework for a heterogeneous cluster. Speculation is a common mechanism that identifies 'slow' nodes in a cluster and starts redundant tasks on other nodes to guarantee reliability. We consider MapReduce/Hadoop as a representative computing platform, and our general goal is to accurately and quickly identify straggler nodes during job execution. On the one hand, our approach significantly reduces unnecessary speculative executions that occupy system resources but never finish. On the other hand, when a node is prone to failure, our solution detects it at an early stage and launches a speculative task to avoid delaying the job. We implement our solution on the Hadoop platform and evaluate it with extensive experiments. The results show that our solution handles speculative execution efficiently and effectively, and that job execution time in our system is shorter than in the current Hadoop distribution.
UR - http://www.scopus.com/inward/record.url?scp=85013421526&partnerID=8YFLogxK
U2 - 10.1109/PCCC.2016.7820648
DO - 10.1109/PCCC.2016.7820648
M3 - Conference contribution
AN - SCOPUS:85013421526
T3 - 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016
BT - 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 December 2016 through 11 December 2016
ER -