Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters

Yi Yao, Jiayin Wang, Bo Sheng, Chiu C. Tan, Ningfang Mi

Research output: Contribution to journal › Article › Research › peer-review

9 Citations (Scopus)

Abstract

The MapReduce framework and its open-source implementation, Hadoop, have become the de facto platform for scalable analysis of large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows a static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as a long completion length. Motivated by this, we propose simple yet effective schemes that use the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workload information of recently completed jobs, our schemes dynamically allocate resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop v0.20.2 and evaluated them with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads.
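The abstract treats the map/reduce slot ratio as a knob that is set from the workload of recently completed jobs. The following is a hypothetical illustration of that idea, not the authors' scheme. It assumes that the remaining work of each phase can be estimated as (average task time observed in recent jobs) × (number of pending tasks), and it splits a fixed pool of slots in proportion to those two estimates so that both phases would drain at roughly the same rate. The names `CompletedJob`, `PendingJob`, `estimate_task_times`, and `split_slots` are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class CompletedJob:
    """Hypothetical record of task times observed for a recently completed job."""
    avg_map_time: float     # mean seconds per map task
    avg_reduce_time: float  # mean seconds per reduce task


@dataclass
class PendingJob:
    """Hypothetical description of a job that has not yet run."""
    num_map_tasks: int
    num_reduce_tasks: int


def estimate_task_times(history: Iterable[CompletedJob]) -> tuple[float, float]:
    """Average the per-task map and reduce times over recent completed jobs."""
    jobs = list(history)
    if not jobs:
        raise ValueError("need at least one completed job")
    map_t = sum(j.avg_map_time for j in jobs) / len(jobs)
    red_t = sum(j.avg_reduce_time for j in jobs) / len(jobs)
    return map_t, red_t


def split_slots(total_slots: int,
                history: Iterable[CompletedJob],
                pending: Iterable[PendingJob]) -> tuple[int, int]:
    """Return (map_slots, reduce_slots) proportional to the estimated remaining work.

    Assumption: work of a phase = average task time x number of pending tasks.
    """
    if total_slots < 2:
        raise ValueError("need at least one map slot and one reduce slot")
    pending = list(pending)
    map_t, red_t = estimate_task_times(history)
    map_work = map_t * sum(j.num_map_tasks for j in pending)
    red_work = red_t * sum(j.num_reduce_tasks for j in pending)
    if map_work + red_work == 0:
        # No pending work: fall back to an even split.
        half = total_slots // 2
        return total_slots - half, half
    map_slots = round(total_slots * map_work / (map_work + red_work))
    # Keep at least one slot of each type so neither phase is starved.
    map_slots = min(max(map_slots, 1), total_slots - 1)
    return map_slots, total_slots - map_slots


history = [CompletedJob(20.0, 60.0), CompletedJob(30.0, 40.0)]
pending = [PendingJob(40, 10), PendingJob(20, 10)]
print(split_slots(10, history, pending))  # (6, 4)
```

In the example, recent jobs average 25 s per map task and 50 s per reduce task. The 60 pending map tasks therefore represent about 1500 s of work and the 20 pending reduce tasks about 1000 s, so 10 slots split 6:4 in favor of map. Re-running the split whenever a job completes would make the ratio adapt over time, in contrast with the static configuration the abstract criticizes.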

Original language: English
Pages (from-to): 344-357
Number of pages: 14
Journal: IEEE Transactions on Cloud Computing
Volume: 5
Issue number: 2
DOI: 10.1109/TCC.2015.2415802
State: Published, 1 Apr 2017

Keywords

  • Hadoop scheduling
  • MapReduce jobs
  • reduced makespan
  • slot configuration

Cite this

Yao, Yi; Wang, Jiayin; Sheng, Bo; Tan, Chiu C.; Mi, Ningfang. / Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters. In: IEEE Transactions on Cloud Computing. 2017; Vol. 5, No. 2. pp. 344-357.
@article{5026c51540d24b8d8cd2039188845c4f,
title = "Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters",
keywords = "Hadoop scheduling, MapReduce jobs, reduced makespan, slot configuration",
author = "Yi Yao and Jiayin Wang and Bo Sheng and Tan, {Chiu C.} and Ningfang Mi",
year = "2017",
month = "4",
day = "1",
doi = "10.1109/TCC.2015.2415802",
language = "English",
volume = "5",
pages = "344--357",
journal = "IEEE Transactions on Cloud Computing",
issn = "2168-7161",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",

}

Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters. / Yao, Yi; Wang, Jiayin; Sheng, Bo; Tan, Chiu C.; Mi, Ningfang.

In: IEEE Transactions on Cloud Computing, Vol. 5, No. 2, 01.04.2017, pp. 344-357.

TY - JOUR

T1 - Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters

AU - Yao, Yi

AU - Wang, Jiayin

AU - Sheng, Bo

AU - Tan, Chiu C.

AU - Mi, Ningfang

PY - 2017/4/1

Y1 - 2017/4/1

AB - The MapReduce framework and its open-source implementation, Hadoop, have become the de facto platform for scalable analysis of large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows a static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as a long completion length. Motivated by this, we propose simple yet effective schemes that use the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workload information of recently completed jobs, our schemes dynamically allocate resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop v0.20.2 and evaluated them with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads.

KW - Hadoop scheduling

KW - MapReduce jobs

KW - reduced makespan

KW - slot configuration

UR - http://www.scopus.com/inward/record.url?scp=85027555628&partnerID=8YFLogxK

U2 - 10.1109/TCC.2015.2415802

DO - 10.1109/TCC.2015.2415802

M3 - Article

VL - 5

SP - 344

EP - 357

JO - IEEE Transactions on Cloud Computing

JF - IEEE Transactions on Cloud Computing

SN - 2168-7161

IS - 2

ER -