EA2S2: An efficient application-aware storage system for big data processing in heterogeneous clusters

Teng Wang, Jiayin Wang, Son Nam Nguyen, Zhengyu Yang, Ningfang Mi, Bo Sheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

Big data processing frameworks such as Hadoop have been widely adopted to process large volumes of data. Much prior work has focused on resource allocation and the execution order of jobs and tasks to improve performance in homogeneous clusters. In this paper, we investigate storage layer design in a heterogeneous system, considering a new type of bundled job in which the input data and the associated application jobs are submitted together. Our goal is to break the barrier between resource management and the underlying storage layer, and to improve data locality, an important performance factor for resource management, from the storage system side. We develop a sampling-based randomized algorithm for the network file system to determine the placement of input data blocks. The main idea is to query a selected set of candidate nodes and estimate their workload at run time by combining centralized and per-node information. The node with the smallest estimated workload is selected to host the data block. Our evaluation is based on a system implementation and comprehensive experiments on NSF CloudLab platforms. We have also conducted simulations for large-scale clusters. The results show significant performance improvements in terms of execution time and data locality.
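
The placement idea in the abstract (sample candidate nodes, estimate each one's workload, pick the least loaded) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The `Node` fields, the `central_pending` map, the default sample size, and the way the two information sources are combined are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a storage node (all fields are illustrative)."""
    name: str
    local_load: float   # assumed per-node metric, reported by the node itself
    speed: float = 1.0  # assumed relative speed, to model a heterogeneous cluster


def estimate_workload(node, central_pending):
    """Assumed estimator: centrally tracked pending work plus node-reported
    load, divided by the node's relative speed."""
    return (central_pending.get(node.name, 0.0) + node.local_load) / node.speed


def place_block(nodes, central_pending, sample_size=3, rng=None):
    """Sample candidate nodes at random and return the one with the
    smallest estimated workload."""
    if rng is None:
        rng = random.Random()
    # Query only a random subset rather than every node in the cluster.
    candidates = rng.sample(nodes, min(sample_size, len(nodes)))
    return min(candidates, key=lambda n: estimate_workload(n, central_pending))


if __name__ == "__main__":
    cluster = [
        Node("a", local_load=4.0, speed=1.0),
        Node("b", local_load=1.0, speed=2.0),
        Node("c", local_load=1.0, speed=1.0),
    ]
    pending = {"a": 0.0, "b": 6.0, "c": 5.0}  # hypothetical centralized view
    print(place_block(cluster, pending, sample_size=2).name)
```

Sampling a small candidate set keeps the number of per-node queries bounded as the cluster grows, at the cost of sometimes missing the globally least-loaded node.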

Original language: English
Title of host publication: 2017 26th International Conference on Computer Communications and Networks, ICCCN 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509029914
State: Published - 14 Sep 2017
Event: 26th International Conference on Computer Communications and Networks, ICCCN 2017 - Vancouver, Canada
Duration: 31 Jul 2017 to 3 Aug 2017

Publication series

Name: 2017 26th International Conference on Computer Communications and Networks, ICCCN 2017

Conference

Conference: 26th International Conference on Computer Communications and Networks, ICCCN 2017
Country/Territory: Canada
City: Vancouver
Period: 31/07/17 to 3/08/17
