TY - JOUR
T1 - A probabilistic data analytics methodology based on Bayesian Belief network for predicting and understanding breast cancer survival
AU - Dag, Asli Z.
AU - Akcam, Zumrut
AU - Kibis, Eyyub
AU - Simsek, Serhat
AU - Delen, Dursun
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/4/22
Y1 - 2022/4/22
N2 - Understanding breast cancer survival has proven to be a challenging problem for practitioners and researchers. Identifying the factors affecting cancer progression, their interrelationships, and their influence on patients’ long-term survival helps make timely treatment decisions. The current study addresses this problem by proposing a Tree-Augmented Bayesian Belief Network (TAN)-based data analytics methodology comprising four steps: data acquisition and preprocessing, variable selection via Genetic Algorithm (GA), data balancing with synthetic minority over-sampling and random under-sampling methods, and finally the development of the TAN model to determine the probabilistic inter-conditional dependency structure among breast cancer-related variables along with the posterior survival probabilities. The proposed model is compared to well-known machine learning models. A what-if analysis has also been conducted to verify the associations among the variables in the TAN model. The relative importance of each variable has been investigated via sensitivity analysis. Finally, a decision support tool is developed to further explore the conditional dependency structure among the cancer-related factors. The results produced by the proposed methodology, namely the patient-specific posterior survival probabilities and the conditional relationships among the variables, can be used by healthcare professionals and physicians to improve the decision-making process in planning and managing breast cancer treatments. Our generic methodology can also accommodate other types of cancer and be applied to manage various medical procedures.
AB - Understanding breast cancer survival has proven to be a challenging problem for practitioners and researchers. Identifying the factors affecting cancer progression, their interrelationships, and their influence on patients’ long-term survival helps make timely treatment decisions. The current study addresses this problem by proposing a Tree-Augmented Bayesian Belief Network (TAN)-based data analytics methodology comprising four steps: data acquisition and preprocessing, variable selection via Genetic Algorithm (GA), data balancing with synthetic minority over-sampling and random under-sampling methods, and finally the development of the TAN model to determine the probabilistic inter-conditional dependency structure among breast cancer-related variables along with the posterior survival probabilities. The proposed model is compared to well-known machine learning models. A what-if analysis has also been conducted to verify the associations among the variables in the TAN model. The relative importance of each variable has been investigated via sensitivity analysis. Finally, a decision support tool is developed to further explore the conditional dependency structure among the cancer-related factors. The results produced by the proposed methodology, namely the patient-specific posterior survival probabilities and the conditional relationships among the variables, can be used by healthcare professionals and physicians to improve the decision-making process in planning and managing breast cancer treatments. Our generic methodology can also accommodate other types of cancer and be applied to manage various medical procedures.
KW - Breast cancer
KW - Data mining
KW - Genetic Algorithm
KW - Machine learning
KW - Sensitivity Analysis
UR - http://www.scopus.com/inward/record.url?scp=85125283691&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2022.108407
DO - 10.1016/j.knosys.2022.108407
M3 - Article
AN - SCOPUS:85125283691
SN - 0950-7051
VL - 242
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 108407
ER -