Cyclic Diagnosability and Fault Diagnosis Algorithm of Data Center Network DCell

  • Kaineng Guan
  • Limei Lin
  • Yanze Huang
  • Dajin Wang
  • Sun Yuan Hsieh

Research output: Contribution to journal › Article › peer-review

Abstract

Data center DCell networks are particularly well-suited for large, reliability-critical data centers due to their scalability, fault tolerance, and efficient bandwidth utilization. The reliability and diagnosis of DCell networks are of paramount importance in ensuring smooth operation and continuous availability of data center services. Traditional fault diagnosis models, which focus on global fault detection, are more suited to simpler networks. In contrast, complex DCell networks require fault diagnosis under specific conditions to accommodate dynamic changes and constraints. This paper studies the cyclic diagnosability of DCell networks under different system-level diagnostic models, which is a novel fault diagnosis strategy. Cyclic diagnosability, denoted as ctc(G), is the maximum size of a set D of faulty vertices in a network G such that the self-diagnosis system can identify all vertices in D under the condition that at least two connected components of G - D contain a cycle. We show that for the k-dimensional DCell with n-port switches, DCell_{k,n}, the cyclic diagnosability is 4k + 2n - 5 under the PMC and MM* models when either k ≥ 2 and 3 ≤ n ≤ 5, or k ≥ n/2 + 1 and n ≥ 6. The proof is based on the indistinguishability of the constructed set and the linear multiple fault analysis technique. Additionally, we propose two practical cyclic fault diagnosis algorithms with low time complexity, PMC-Based Cyclic Fault Diagnosis (PMCCFD) for the PMC model and MM*-Based Cyclic Fault Diagnosis (MMCFD) for the MM* model, to improve fault detection and recovery in large-scale DCell networks. We also implement the PMCCFD and MMCFD algorithms on both synthetic and real data. Furthermore, we verify the availability and efficiency of PMCCFD and MMCFD in terms of accuracy rate, recall, false negative rate, negative predictive value, and F1 score.
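The closed-form result and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the PMCCFD or MMCFD algorithm. It only evaluates the stated formula 4k + 2n - 5 under the stated parameter conditions and computes the five metrics from confusion counts using their standard definitions. The function names are hypothetical, and treating "faulty" as the positive class is an assumption, since the abstract does not specify it.

```python
from typing import Optional


def cyclic_diagnosability(k: int, n: int) -> Optional[int]:
    """Return 4k + 2n - 5 when (k, n) falls in the ranges stated in the abstract.

    Returns None outside those ranges, where the abstract makes no claim.
    """
    small_n = k >= 2 and 3 <= n <= 5
    # k >= n/2 + 1 is written as 2k >= n + 2 to stay in integer arithmetic.
    large_n = n >= 6 and 2 * k >= n + 2
    return 4 * k + 2 * n - 5 if (small_n or large_n) else None


def _ratio(num: int, den: int) -> float:
    """Safe division; an undefined ratio is reported as NaN."""
    return num / den if den else float("nan")


def diagnosis_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metric definitions, assuming 'faulty' is the positive class.

    tp: faulty vertices diagnosed as faulty
    fp: fault-free vertices diagnosed as faulty
    tn: fault-free vertices diagnosed as fault-free
    fn: faulty vertices diagnosed as fault-free
    """
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "recall": _ratio(tp, tp + fn),
        "false_negative_rate": _ratio(fn, tp + fn),
        "negative_predictive_value": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }
```

For example, `cyclic_diagnosability(2, 3)` evaluates 4·2 + 2·3 - 5 = 9, while `cyclic_diagnosability(2, 6)` returns `None` because k = 2 does not satisfy k ≥ n/2 + 1 for n = 6.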

Original language: English
Pages (from-to): 888-901
Number of pages: 14
Journal: IEEE Transactions on Networking
Volume: 34
DOIs
State: Published - 2026

Keywords

  • Fault diagnosis
  • cyclic diagnosability
  • data center network
  • reliability
