Using a tunable knob for reducing makespan of mapreduce jobs in a hadoop cluster

Yi Yao, Jiayin Wang, Bo Sheng, Ningfang Mi

Research output: Contribution to journalConference article

7 Citations (Scopus)

Abstract

The MapReduce framework and its open source implementation Hadoop have become the defacto platform for scalable analysis on large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose a simple yet effective scheme which uses slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our scheme dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented scheme in Hadoop V0.20.2 and evaluated it with representative MapReduce benchmarks at Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our scheme under both simple workloads and more complex mixed workloads.

Original languageEnglish
Article number6676671
Pages (from-to)1-8
Number of pages8
JournalIEEE International Conference on Cloud Computing, CLOUD
DOIs
StatePublished - 1 Dec 2013
Event2013 IEEE 6th International Conference on Cloud Computing, CLOUD 2013 - Santa Clara, CA, United States
Duration: 27 Jun 20132 Jul 2013

Fingerprint

Knobs

Cite this

@article{83d85b676d4b4306ad4e11df6f664238,
title = "Using a tunable knob for reducing makespan of mapreduce jobs in a hadoop cluster",
abstract = "The MapReduce framework and its open source implementation Hadoop have become the defacto platform for scalable analysis on large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose a simple yet effective scheme which uses slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our scheme dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented scheme in Hadoop V0.20.2 and evaluated it with representative MapReduce benchmarks at Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our scheme under both simple workloads and more complex mixed workloads.",
author = "Yi Yao and Jiayin Wang and Bo Sheng and Ningfang Mi",
year = "2013",
month = "12",
day = "1",
doi = "10.1109/CLOUD.2013.140",
language = "English",
pages = "1--8",
journal = "IEEE International Conference on Cloud Computing, CLOUD",
issn = "2159-6182",

}

Using a tunable knob for reducing makespan of mapreduce jobs in a hadoop cluster. / Yao, Yi; Wang, Jiayin; Sheng, Bo; Mi, Ningfang.

In: IEEE International Conference on Cloud Computing, CLOUD, 01.12.2013, p. 1-8.

Research output: Contribution to journalConference article

TY - JOUR

T1 - Using a tunable knob for reducing makespan of mapreduce jobs in a hadoop cluster

AU - Yao, Yi

AU - Wang, Jiayin

AU - Sheng, Bo

AU - Mi, Ningfang

PY - 2013/12/1

Y1 - 2013/12/1

N2 - The MapReduce framework and its open source implementation Hadoop have become the defacto platform for scalable analysis on large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose a simple yet effective scheme which uses slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our scheme dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented scheme in Hadoop V0.20.2 and evaluated it with representative MapReduce benchmarks at Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our scheme under both simple workloads and more complex mixed workloads.

AB - The MapReduce framework and its open source implementation Hadoop have become the defacto platform for scalable analysis on large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose a simple yet effective scheme which uses slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our scheme dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented scheme in Hadoop V0.20.2 and evaluated it with representative MapReduce benchmarks at Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our scheme under both simple workloads and more complex mixed workloads.

UR - http://www.scopus.com/inward/record.url?scp=84897728337&partnerID=8YFLogxK

U2 - 10.1109/CLOUD.2013.140

DO - 10.1109/CLOUD.2013.140

M3 - Conference article

AN - SCOPUS:84897728337

SP - 1

EP - 8

JO - IEEE International Conference on Cloud Computing, CLOUD

JF - IEEE International Conference on Cloud Computing, CLOUD

SN - 2159-6182

M1 - 6676671

ER -