Classification and Diagnostic Prediction of Colorectal Cancer Mortality Based on Machine Learning Algorithms: A Multicenter National Study

Document Type: Research Articles

Authors

1 Cancer Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

2 Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

3 Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

4 Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

5 Vice Chancellor in Administration and Resources Development Affairs, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

6 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran.

7 Shahid Beheshti University of Medical Sciences, Tehran, Iran.

8 Cardiovascular Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

9 Vice Chancellor for Research & Technology, Shiraz University of Medical Sciences, Shiraz, Iran.

10 Department of Epidemiology, Faculty of Health, Iran University of Medical Sciences, Tehran, Iran.

11 Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

12 Department of Dermatology, Director of Skin Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

13 Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

14 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran.

15 Department of Mathematics at Architecture and Computer Engineering, University of Applied Sciences (unit 10), Tehran, Iran.

16 Health Sciences Research Center, Mazandaran University of Medical Sciences, Sari, Iran.

17 Department of Pediatrics, Faculty of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran.

Abstract

Introduction: Colorectal cancer (CRC) is the second leading cause of cancer-related death. This study aimed to predict survival outcomes of CRC patients using machine learning (ML) methods.

Materials and Methods: This retrospective analysis included 1,853 CRC patients admitted to three prominent tertiary hospitals in Iran from October 2006 to July 2019. Six ML models, namely logistic regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Neural Network (NN), Decision Tree (DT), and Light Gradient Boosting Machine (LGBM), were developed with 10-fold cross-validation. Features were selected with the Random Forest method, ranked by the mean decrease in Gini criterion. Model performance was assessed using the area under the curve (AUC).

Results: Time from diagnosis, age, tumor size, metastatic status, lymph node involvement, and treatment type emerged as the most important predictors of survival based on the mean decrease in Gini. The NB (AUC = 0.70, 95% confidence interval [CI] 0.65–0.75) and LGBM (AUC = 0.70, 95% CI 0.65–0.75) models achieved the highest AUC values for predicting CRC patient survival.

Conclusions: This study highlights the importance of time from diagnosis, age, tumor size, metastatic status, lymph node involvement, and treatment type in predicting CRC survival. The NB model performed best in mortality prediction, with balanced sensitivity and specificity. Policy recommendations include early diagnosis and treatment initiation for CRC patients, improved data collection through digital health records and standardized protocols, support for integrating predictive analytics into clinical decisions, and the inclusion of the identified prognostic variables in treatment guidelines to improve patient outcomes.
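The abstract reports each model's AUC with a 95% confidence interval but does not say how the interval was obtained. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and an approximate interval can be derived with the Hanley-McNeil standard error. The choice of Hanley-McNeil, the function names `auc_score` and `hanley_mcneil_ci`, and the toy labels and scores are all assumptions for illustration, not the authors' procedure or data.

```python
import math


def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley-McNeil standard error).

    Assumption: the source does not state its CI method; this is one
    common closed-form choice. Bounds are clipped to [0, 1].
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy example (hypothetical labels and scores, not study data).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]
auc = auc_score(labels, scores)  # 14 of 16 pairs ranked correctly: 0.875
low, high = hanley_mcneil_ci(auc, n_pos=4, n_neg=4)
print(f"AUC = {auc:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

With only a handful of cases the interval is very wide, which illustrates why a cohort of the size described above is needed to narrow an AUC estimate to a range such as 0.65–0.75.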
