AutoReplica: Automatic data replica manager in distributed caching and data processing systems

Zhengyu Yang, Jiayin Wang, David Evans, Ningfang Mi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

30 Scopus citations

Abstract

Replication is widely used in data center storage systems for large-scale Cyber-Physical Systems (CPS) to prevent data loss. However, its main side effect is the overhead of extra network and I/O traffic, which inevitably degrades the overall I/O performance of the cluster. To balance the trade-off between I/O performance and fault tolerance, in this paper we propose a complete solution called "AutoReplica", a replica manager for distributed caching and data processing systems with SSD-HDD tiered storage. In detail, AutoReplica uses remote SSDs (connected by high-speed fiber) to replicate local SSD caches and protect data. To balance load among nodes and reduce network overhead, we propose three approaches (i.e., ring, network, and multiple-SLA network) that automatically set up the cross-node replica structure, taking network traffic, I/O speed, and SLAs into account. To improve performance during migrations triggered by load balancing and failure recovery, we propose a migrate-on-write technique called "fusion cache" that seamlessly migrates and prefetches among local and remote replicas without pausing the subsystem. Moreover, AutoReplica can recover from different failure scenarios while limiting the degree of performance degradation. Lastly, AutoReplica supports parallel prefetching from multiple nodes with a new dynamic optimizing streaming technique to improve I/O performance. We are currently implementing AutoReplica so that it can be easily plugged into commonly used distributed caching systems, and solidifying our design and implementation details.

Original language: English
Title of host publication: 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509052523
State: Published - 17 Jan 2017
Event: 35th IEEE International Performance Computing and Communications Conference, IPCCC 2016 - Las Vegas, United States
Duration: 9 Dec 2016 → 11 Dec 2016

Publication series

Name: 2016 IEEE 35th International Performance Computing and Communications Conference, IPCCC 2016

Other

Other: 35th IEEE International Performance Computing and Communications Conference, IPCCC 2016
Country/Territory: United States
City: Las Vegas
Period: 9/12/16 → 11/12/16

Keywords

  • Atomicity
  • Backup
  • Cache and Replacement Policy
  • Cluster Migration
  • Consistency
  • Device Failure Recovery
  • Distributed Storage System
  • Fault Tolerance
  • Parallel I/O
  • Replica
  • SLA
  • VM Crash
