Abstract
In this study, we investigated the problem of scheduling streaming applications on a heterogeneous cluster and, building on our previous work, developed the maximum throughput scheduler (MT-Scheduler) algorithm for streaming applications. The proposed algorithm uses dynamic programming to map the application topology onto the heterogeneous distributed system according to its computing and data transfer requirements, while accounting for the capacity of the underlying cluster resources. It maximizes system throughput by identifying the computing or transfer bottleneck and minimizing the time incurred there. The MT-Scheduler supports applications structured as directed acyclic graphs. We conducted experiments with three Storm microbenchmark topologies in both simulated and real Apache Storm environments, and compared the MT-Scheduler with a simulated round robin scheduler and the default Storm scheduler. The results indicated that the MT-Scheduler outperforms the default round robin approach in both average system latency and throughput.
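The bottleneck idea described above can be illustrated with a small sketch. This is not the MT-Scheduler itself: it assumes a simplified linear pipeline rather than a general directed acyclic graph, an ordered list of nodes with a single uniform link bandwidth, and contiguous groups of stages per node. The function name `min_bottleneck` and all parameters are hypothetical. Under those assumptions, dynamic programming finds the mapping whose slowest computing or transfer step is as fast as possible, and the throughput is the reciprocal of that bottleneck time.

```python
def min_bottleneck(work, data, speed, bandwidth):
    """Illustrative sketch only (not the paper's algorithm).

    work[i]   : compute cost of pipeline stage i (hypothetical units)
    data[i]   : data sent from stage i to stage i + 1 (length len(work) - 1)
    speed[j]  : processing speed of node j (heterogeneous)
    bandwidth : uniform link bandwidth between any two nodes (assumption)

    Stages are mapped in order onto nodes in order; each used node receives
    a contiguous group of stages, and nodes may be left unused.
    Returns the minimal bottleneck time per tuple.
    """
    n, m = len(work), len(speed)

    # prefix[i] = total compute cost of stages 0 .. i-1
    prefix = [0.0]
    for w in work:
        prefix.append(prefix[-1] + w)

    inf = float("inf")
    # best[i][j] = minimal bottleneck when the first i stages
    # are mapped onto the first j nodes
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = 0.0  # no stages to place, so no cost

    for j in range(1, m + 1):
        for i in range(1, n + 1):
            # Option 1: leave node j-1 unused
            best[i][j] = best[i][j - 1]
            # Option 2: place stages k .. i-1 on node j-1
            for k in range(i):
                compute = (prefix[i] - prefix[k]) / speed[j - 1]
                # Data crossing into this node from stage k-1, if any
                transfer = data[k - 1] / bandwidth if k > 0 else 0.0
                candidate = max(best[k][j - 1], compute, transfer)
                if candidate < best[i][j]:
                    best[i][j] = candidate

    return best[n][m]


if __name__ == "__main__":
    bottleneck = min_bottleneck(
        work=[4, 2, 6], data=[1, 1], speed=[2, 4], bandwidth=1
    )
    print("bottleneck time:", bottleneck)
    print("throughput:", 1.0 / bottleneck)
```

In this toy instance, placing the first stage on the slower node and the remaining two on the faster node balances the load. When the link is slow enough, the sketch instead keeps every stage on the faster node so that no data crosses the network.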
Original language | English |
---|---|
Pages (from-to) | 9609-9628 |
Number of pages | 20 |
Journal | Journal of Supercomputing |
Volume | 76 |
Issue number | 12 |
State | Published - 1 Dec 2020 |
Keywords
- Apache Storm
- DAG scheduling
- Data stream
- Distributed systems
- Heterogeneous scheduling