A hybrid data mining approach for identifying the temporal effects of variables associated with breast cancer survival

Serhat Simsek, Ugur Kursuncu, E. Kibis, Musheera AnisAbdellatif, Ali Dag

Research output: Contribution to journal › Article › peer-review

55 Scopus citations

Abstract

Predicting breast cancer survival is crucial for practitioners to determine possible outcomes and make better treatment plans for patients. In this study, a hybrid data mining based methodology was constructed to differentiate the variables whose importance for survival changes over time. To that end, the importance of variables was determined for three different time periods (i.e., one, five, and ten years). To conduct such an analysis, the most parsimonious models were constructed by employing one regression analysis method, the Least Absolute Shrinkage and Selection Operator (LASSO), and one metaheuristic optimization method, namely a Genetic Algorithm (GA). Due to the high imbalance between the number of surviving and deceased patients, two well-known resampling procedures, Random Under-sampling (RUS) and Synthetic Minority Over-sampling Technique (SMOTE), were applied to increase the performance of the classification models. In the final stage, two data mining models, namely Artificial Neural Networks (ANNs) and Logistic Regression (LR), were utilized along with 10-fold cross-validation. Sensitivity analysis (SA) was conducted for each model to identify the importance of each variable for a given model and time period. The obtained results revealed that certain variables lose their importance over time, while others gain importance. This information can assist medical practitioners in identifying specific subsets of variables to focus on in different periods, which will in turn lead to more effective and efficient cancer care. Moreover, the study findings indicate that extremely parsimonious models can be developed by adopting a purely data-driven approach, rather than eliminating the variables manually. Such a methodology can also be applied in treating other types of cancer.
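
The two resampling procedures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the labels (1 = survived, 0 = died) are hypothetical, it assumes a binary outcome, and `smote_like_over_sample` is a simplified SMOTE-style interpolation rather than a reference implementation.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_under_sample(X, y, seed=0):
    """RUS: randomly drop rows until every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like_over_sample(X, y, k=3, seed=0):
    """SMOTE-style: add synthetic minority rows interpolated between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_needed = max(counts.values()) - counts[minority]
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: squared_distance(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: 9 rows of one class, 3 of the other.
X = [[float(i), float(i % 3)] for i in range(12)]
y = [1] * 9 + [0] * 3

X_rus, y_rus = random_under_sample(X, y)
X_sm, y_sm = smote_like_over_sample(X, y, k=2)
print(Counter(y_rus), Counter(y_sm))
```

RUS balances the classes by discarding majority rows, while the SMOTE-style function balances them by adding interpolated minority rows. The abstract does not say how resampling was combined with the 10-fold cross-validation. A common practice is to resample only the training folds so that the held-out fold keeps its original class distribution.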

Original language: English
Article number: 112863
Journal: Expert Systems with Applications
Volume: 139
State: Published - Jan 2020

Keywords

  • Data mining
  • Healthcare analytics
  • Machine learning
  • Medical decision making
