TY - JOUR
T1 - Cyclic Diagnosability and Fault Diagnosis Algorithm of Data Center Network DCell
AU - Guan, Kaineng
AU - Lin, Limei
AU - Huang, Yanze
AU - Wang, Dajin
AU - Hsieh, Sun Yuan
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2026
Y1 - 2026
N2 - DCell data center networks are particularly well-suited for large, reliability-critical data centers due to their scalability, fault tolerance, and efficient bandwidth utilization. The reliability and diagnosis of DCell networks are essential to ensuring smooth operation and continuous availability of data center services. Traditional fault diagnosis models, which focus on global fault detection, are better suited to simpler networks. In contrast, complex DCell networks require fault diagnosis under specific conditions to accommodate dynamic changes and constraints. This paper studies cyclic diagnosability, a novel fault diagnosis strategy, for DCell networks under different system-level diagnostic models. Cyclic diagnosability, denoted as ctc(G), is the maximum size of a faulty vertex set D in a network G such that the self-diagnosis system can identify all vertices in D, under the condition that at least two connected components of G - D contain a cycle. We show that for the k-dimensional DCell with n-port switches, DCell_{k,n}, when k ≥ 2 and 3 ≤ n ≤ 5, or when k ≥ n/2 + 1 and n ≥ 6, the cyclic diagnosability is 4k + 2n - 5 under the PMC and MM* models, based on the indistinguishability of the constructed set and the linear multiple fault analysis technique. Additionally, we propose two practical cyclic fault diagnosis algorithms with low time complexity, PMC-Based Cyclic Fault Diagnosis (PMCCFD) for the PMC model and MM*-Based Cyclic Fault Diagnosis (MMCFD) for the MM* model, to improve fault detection and recovery in large-scale DCell networks. We also implement the PMCCFD and MMCFD algorithms on both synthetic and real data. Furthermore, we verify the availability and efficiency of PMCCFD and MMCFD in terms of accuracy rate, recall, false negative rate, negative predictive value, and F1 score.
AB - DCell data center networks are particularly well-suited for large, reliability-critical data centers due to their scalability, fault tolerance, and efficient bandwidth utilization. The reliability and diagnosis of DCell networks are essential to ensuring smooth operation and continuous availability of data center services. Traditional fault diagnosis models, which focus on global fault detection, are better suited to simpler networks. In contrast, complex DCell networks require fault diagnosis under specific conditions to accommodate dynamic changes and constraints. This paper studies cyclic diagnosability, a novel fault diagnosis strategy, for DCell networks under different system-level diagnostic models. Cyclic diagnosability, denoted as ctc(G), is the maximum size of a faulty vertex set D in a network G such that the self-diagnosis system can identify all vertices in D, under the condition that at least two connected components of G - D contain a cycle. We show that for the k-dimensional DCell with n-port switches, DCell_{k,n}, when k ≥ 2 and 3 ≤ n ≤ 5, or when k ≥ n/2 + 1 and n ≥ 6, the cyclic diagnosability is 4k + 2n - 5 under the PMC and MM* models, based on the indistinguishability of the constructed set and the linear multiple fault analysis technique. Additionally, we propose two practical cyclic fault diagnosis algorithms with low time complexity, PMC-Based Cyclic Fault Diagnosis (PMCCFD) for the PMC model and MM*-Based Cyclic Fault Diagnosis (MMCFD) for the MM* model, to improve fault detection and recovery in large-scale DCell networks. We also implement the PMCCFD and MMCFD algorithms on both synthetic and real data. Furthermore, we verify the availability and efficiency of PMCCFD and MMCFD in terms of accuracy rate, recall, false negative rate, negative predictive value, and F1 score.
KW - Fault diagnosis
KW - cyclic diagnosability
KW - data center network
KW - reliability
UR - https://www.scopus.com/pages/publications/105017323635
U2 - 10.1109/TON.2025.3610896
DO - 10.1109/TON.2025.3610896
M3 - Article
AN - SCOPUS:105017323635
SN - 2998-4157
VL - 34
SP - 888
EP - 901
JO - IEEE Transactions on Networking
JF - IEEE Transactions on Networking
ER -