AutoPath: Harnessing parallel execution paths for efficient resource allocation in multi-stage big data frameworks

Han Gao, Zhengyu Yang, Janki Bhimani, Teng Wang, Jiayin Wang, Bo Sheng, Ningfang Mi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

27 Scopus citations

Abstract

Due to the flexibility of its data operations and the scalability of its in-memory cache, Spark has shown the potential to replace Hadoop as the standard distributed framework for data-intensive processing in both industry and academia. However, we observe that Spark's built-in scheduling algorithms (i.e., FIFO and FAIR) are not optimized for applications whose stages contain multiple parallel, independent branches. Specifically, a child stage must wait for and collect data from all of its parent branches, but this wait has no guaranteed upper bound, because it is tightly coupled with each branch's workload characteristics, the stage order, and the computing resources allocated to each branch. To address this challenge, we investigate a superior solution that ensures all branches acquire suitable resources according to their workload demands, so that the finish times of all branches are as close to one another as possible. Based on this, we propose a novel scheduling policy, named AutoPath, which effectively reduces the overall makespan of such applications by detecting and leveraging parallel paths and adaptively assigning computing resources based on workload demands estimated at runtime. We implemented the new scheduling scheme in Spark v1.5.0 and evaluated it with selected representative workloads. The experiments demonstrate that, compared with the existing FIFO and FAIR schedulers, our new scheduler effectively reduces the makespan and improves resource utilization for these applications.
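The central idea in the abstract, giving each parallel branch computing resources in proportion to its estimated workload so that all branches feeding a child stage finish at about the same time, can be illustrated with a minimal sketch. This is not AutoPath's implementation: the function names (`parallel_branches`, `allocate_cores`, `join_wait`), the dict-based stage graph, the demand values, and the assumption that a branch's work divides perfectly across its cores (finish time roughly equal to demand / cores) are all hypothetical simplifications.

```python
# Hypothetical sketch: demand-proportional core allocation across parallel
# branches that feed one child (join) stage. Not AutoPath's actual code.


def parallel_branches(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each stage that must wait on more than one parent branch."""
    return {stage: ps for stage, ps in parents.items() if len(ps) > 1}


def allocate_cores(demands: dict[str, float], total_cores: int) -> dict[str, int]:
    """Split total_cores across branches in proportion to estimated demand.

    Uses largest-remainder rounding so the integer counts sum to total_cores.
    """
    total = sum(demands.values())
    if total <= 0 or total_cores < 0:
        raise ValueError("need positive total demand and non-negative cores")
    shares = {b: total_cores * w / total for b, w in demands.items()}
    alloc = {b: int(s) for b, s in shares.items()}  # floor of each share
    leftover = total_cores - sum(alloc.values())
    # Hand the remaining cores to the branches with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda b: shares[b] - alloc[b], reverse=True)
    for b in by_remainder[:leftover]:
        alloc[b] += 1
    return alloc


def join_wait(demands: dict[str, float], alloc: dict[str, int]) -> float:
    """Estimated time before the child stage can start: the slowest branch.

    Assumes a branch's work (in core-seconds) divides perfectly across its cores.
    """
    def finish(b: str) -> float:
        if alloc[b]:
            return demands[b] / alloc[b]
        return 0.0 if demands[b] == 0 else float("inf")  # no cores, work pending

    return max(finish(b) for b in demands)


if __name__ == "__main__":
    dag = {"A": [], "B": [], "C": [], "join": ["A", "B", "C"]}
    demands = {"A": 120.0, "B": 60.0, "C": 20.0}  # hypothetical core-seconds
    print(parallel_branches(dag))
    proportional = allocate_cores(demands, 10)
    even = {"A": 4, "B": 3, "C": 3}
    print(proportional, join_wait(demands, proportional))
    print(even, join_wait(demands, even))
```

With these made-up numbers, the demand-proportional split gives branches A, B, and C 6, 3, and 1 cores, so every branch finishes after about 20 seconds, whereas an even 4/3/3 split leaves the child stage waiting about 30 seconds on branch A. Largest-remainder rounding is used here only as a simple way to turn fractional shares into whole cores; the paper's own estimation and assignment rules may differ.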

Original language: English
Title of host publication: 2017 26th International Conference on Computer Communications and Networks, ICCCN 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509029914
DOI: 10.1109/ICCCN.2017.8038381
State: Published - 14 Sep 2017
Event: 26th International Conference on Computer Communications and Networks, ICCCN 2017 - Vancouver, Canada
Duration: 31 Jul 2017 to 3 Aug 2017

Publication series

Name: 2017 26th International Conference on Computer Communications and Networks, ICCCN 2017

Other

Other: 26th International Conference on Computer Communications and Networks, ICCCN 2017
Country: Canada
City: Vancouver
Period: 31/07/17 to 3/08/17


Keywords

  • Resource management
  • Scheduling
  • Spark
  • Task assignment
  • Workload evaluation & estimation

Cite this

Gao, H., Yang, Z., Bhimani, J., Wang, T., Wang, J., Sheng, B., & Mi, N. (2017). AutoPath: Harnessing parallel execution paths for efficient resource allocation in multi-stage big data frameworks. In 2017 26th International Conference on Computer Communications and Networks, ICCCN 2017 [8038381] (2017 26th International Conference on Computer Communications and Networks, ICCCN 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCCN.2017.8038381