Automatic and scalable data replication manager in distributed computation and storage infrastructure of Cyber-Physical Systems

Zhengyu Yang, Janki Bhimani, Jiayin Wang, David Evans, Ningfang Mi

Research output: Contribution to journal > Article > Research > peer-review

2 Citations (Scopus)

Abstract

A Cyber-Physical System (CPS) is a rising technology that utilizes computation and storage resources for sensing, processing, analyzing, predicting, and understanding field data, then uses communication resources for interaction, intervention, and interface management, and finally provides control for systems so that they can inter-operate, evolve, and run in a stable, evidence-based environment. There are two major demands when building the storage infrastructure for a CPS cluster to support the above-mentioned functionalities: (1) high I/O and network throughput requirements during runtime, and (2) low latency demand for disaster recovery. To address the challenges brought by these demands, in this paper we propose a complete solution called "AutoReplica", an automatic and scalable data replication manager in distributed computation and storage infrastructure of cyber-physical systems, using tiered storage with SSD (solid state disk) and HDD (hard disk drive). Specifically, AutoReplica uses SSD to absorb hot data and to maximize I/Os, and its intelligent replication scheme further helps to recover from disaster. To effectively balance the trade-off between I/O performance and fault tolerance, AutoReplica utilizes the SSDs of remote CPS server nodes (which are connected by high-speed fibers) to replicate hot datasets cached in the SSD tier of the local CPS server node. AutoReplica has three approaches to build the replica cluster in order to support multiple SLAs. AutoReplica automatically balances loads among nodes, and can conduct seamless online migration (i.e., a migrate-on-write scheme) instead of pausing the subsystem and copying the entire dataset from one node to the other. Lastly, AutoReplica supports parallel prefetching from both the primary node and replica node(s) with a new dynamic optimizing streaming technique to improve I/O performance.
We implemented AutoReplica on a real CPS infrastructure, and experimental results show that AutoReplica can significantly reduce the total recovery time with slight overhead compared to the no-replication cluster and traditional replication clusters.

Original language: English
Pages (from-to): 291-311
Number of pages: 21
Journal: Scalable Computing
Volume: 18
Issue number: 4
DOI: 10.12694/scpe.v18i4.1330
State: Published - 1 Jan 2017

Fingerprint

Managers
Recovery
Disasters
Servers
Copying
Hard disk storage
Fault tolerance
Cyber Physical System
Throughput
Fibers
Communication
Processing

Keywords

  • Atomicity
  • Backup
  • Cache and replacement policy
  • Cluster migration
  • Consistency
  • Cyber Physical Systems infrastructure
  • Device failure recovery
  • Distributed storage system
  • Fault tolerance
  • Parallel I/O
  • Replication
  • SLA
  • VM Crash

Cite this

Yang, Zhengyu ; Bhimani, Janki ; Wang, Jiayin ; Evans, David ; Mi, Ningfang. / Automatic and scalable data replication manager in distributed computation and storage infrastructure of Cyber-Physical Systems. In: Scalable Computing. 2017 ; Vol. 18, No. 4. pp. 291-311.
@article{b51a68e811bb42d9a8bf6f6e93afdd1f,
title = "Automatic and scalable data replication manager in distributed computation and storage infrastructure of Cyber-Physical Systems",
abstract = "A Cyber-Physical System (CPS) is a rising technology that utilizes computation and storage resources for sensing, processing, analyzing, predicting, and understanding field data, then uses communication resources for interaction, intervention, and interface management, and finally provides control for systems so that they can inter-operate, evolve, and run in a stable, evidence-based environment. There are two major demands when building the storage infrastructure for a CPS cluster to support the above-mentioned functionalities: (1) high I/O and network throughput requirements during runtime, and (2) low latency demand for disaster recovery. To address the challenges brought by these demands, in this paper we propose a complete solution called {"}AutoReplica{"}, an automatic and scalable data replication manager in distributed computation and storage infrastructure of cyber-physical systems, using tiered storage with SSD (solid state disk) and HDD (hard disk drive). Specifically, AutoReplica uses SSD to absorb hot data and to maximize I/Os, and its intelligent replication scheme further helps to recover from disaster. To effectively balance the trade-off between I/O performance and fault tolerance, AutoReplica utilizes the SSDs of remote CPS server nodes (which are connected by high-speed fibers) to replicate hot datasets cached in the SSD tier of the local CPS server node. AutoReplica has three approaches to build the replica cluster in order to support multiple SLAs. AutoReplica automatically balances loads among nodes, and can conduct seamless online migration (i.e., a migrate-on-write scheme) instead of pausing the subsystem and copying the entire dataset from one node to the other. Lastly, AutoReplica supports parallel prefetching from both the primary node and replica node(s) with a new dynamic optimizing streaming technique to improve I/O performance.
We implemented AutoReplica on a real CPS infrastructure, and experimental results show that AutoReplica can significantly reduce the total recovery time with slight overhead compared to the no-replication cluster and traditional replication clusters.",
keywords = "Atomicity, Backup, Cache and replacement policy, Cluster migration, Consistency, Cyber Physical Systems infrastructure, Device failure recovery, Distributed storage system, Fault tolerance, Parallel I/O, Replication, SLA, VM Crash",
author = "Zhengyu Yang and Janki Bhimani and Jiayin Wang and David Evans and Ningfang Mi",
year = "2017",
month = "1",
day = "1",
doi = "10.12694/scpe.v18i4.1330",
language = "English",
volume = "18",
pages = "291--311",
journal = "Scalable Computing",
issn = "1895-1767",
publisher = "Universitatea de Vest",
number = "4",

}

Automatic and scalable data replication manager in distributed computation and storage infrastructure of Cyber-Physical Systems. / Yang, Zhengyu; Bhimani, Janki; Wang, Jiayin; Evans, David; Mi, Ningfang.

In: Scalable Computing, Vol. 18, No. 4, 01.01.2017, p. 291-311.

TY - JOUR

T1 - Automatic and scalable data replication manager in distributed computation and storage infrastructure of Cyber-Physical Systems

AU - Yang, Zhengyu

AU - Bhimani, Janki

AU - Wang, Jiayin

AU - Evans, David

AU - Mi, Ningfang

PY - 2017/1/1

Y1 - 2017/1/1

AB - A Cyber-Physical System (CPS) is a rising technology that utilizes computation and storage resources for sensing, processing, analyzing, predicting, and understanding field data, then uses communication resources for interaction, intervention, and interface management, and finally provides control for systems so that they can inter-operate, evolve, and run in a stable, evidence-based environment. There are two major demands when building the storage infrastructure for a CPS cluster to support the above-mentioned functionalities: (1) high I/O and network throughput requirements during runtime, and (2) low latency demand for disaster recovery. To address the challenges brought by these demands, in this paper we propose a complete solution called "AutoReplica", an automatic and scalable data replication manager in distributed computation and storage infrastructure of cyber-physical systems, using tiered storage with SSD (solid state disk) and HDD (hard disk drive). Specifically, AutoReplica uses SSD to absorb hot data and to maximize I/Os, and its intelligent replication scheme further helps to recover from disaster. To effectively balance the trade-off between I/O performance and fault tolerance, AutoReplica utilizes the SSDs of remote CPS server nodes (which are connected by high-speed fibers) to replicate hot datasets cached in the SSD tier of the local CPS server node. AutoReplica has three approaches to build the replica cluster in order to support multiple SLAs. AutoReplica automatically balances loads among nodes, and can conduct seamless online migration (i.e., a migrate-on-write scheme) instead of pausing the subsystem and copying the entire dataset from one node to the other. Lastly, AutoReplica supports parallel prefetching from both the primary node and replica node(s) with a new dynamic optimizing streaming technique to improve I/O performance. We implemented AutoReplica on a real CPS infrastructure, and experimental results show that AutoReplica can significantly reduce the total recovery time with slight overhead compared to the no-replication cluster and traditional replication clusters.

KW - Atomicity

KW - Backup

KW - Cache and replacement policy

KW - Cluster migration

KW - Consistency

KW - Cyber Physical Systems infrastructure

KW - Device failure recovery

KW - Distributed storage system

KW - Fault tolerance

KW - Parallel I/O

KW - Replication

KW - SLA

KW - VM Crash

UR - http://www.scopus.com/inward/record.url?scp=85041637462&partnerID=8YFLogxK

U2 - 10.12694/scpe.v18i4.1330

DO - 10.12694/scpe.v18i4.1330

M3 - Article

VL - 18

SP - 291

EP - 311

JO - Scalable Computing

JF - Scalable Computing

SN - 1895-1767

IS - 4

ER -