Predicting HER2 Status Associated with Breast Cancer Aggressiveness Using Four Machine Learning Models

Document Type: Research Articles

Authors

1 Laboratory of Applied Biochemistry and Microbiology, Department of Biochemistry, Faculty of Sciences, University of Badji Mokhtar, 23000 Annaba, Algeria.

2 Environmental Biosurveillance Laboratory, Department of Biology, Faculty of Sciences, University of Badji Mokhtar, 23000 Annaba, Algeria.

3 Medical Oncology Service, Cancer Research Centre (CLCC), 23000 Annaba, Algeria.

4 Abdallah Nouaouiria Health Hospital Establishment - El Bouni, 23000 Annaba, Algeria.

Abstract

Objective: Breast cancer (BC) is a heterogeneous disease with various biological and clinical subtypes, and it presents challenges both in understanding its contributing factors and in establishing precise diagnostic methods. Human epidermal growth factor receptor 2 (HER2) status is a crucial biomarker associated with aggressive tumor behavior and poor prognosis. Effective treatment strategies in oncology rely on accurate decision-making and early identification of factors associated with positive outcomes, and advanced algorithmic models can aid in predicting cancer growth and metastasis, serving as valuable clinical tools for classification and treatment. Our research introduces a novel method that uses machine learning (ML) techniques to explore the relationship between various clinical and molecular variables, with a focus on predicting HER2 status, a key aggressiveness biomarker in BC. This objective aligns with leveraging artificial intelligence (AI) to support decision-making and address diagnostic considerations during treatment.
Methods: Four ML models, namely logistic regression, random forest, LightGBM, and CatBoost, were implemented and evaluated using Python. The dataset was compiled by extracting the medical records of BC patients covering the period from 2018 to 2020. Each model's predictive performance was evaluated using accuracy, precision, recall, and F1-score.
Results: The models achieved accuracies ranging from 86.36% to 95.45%. The logistic regression model achieved an accuracy of 90.90%, while the random forest and LightGBM models each achieved an accuracy of 86.36%. The CatBoost model outperformed the others with an accuracy of 95.45%, indicating its superior predictive capability for HER2 status. The ML models demonstrated potential in predicting HER2 status, enabling early detection and facilitating personalized treatment strategies.
Conclusion: Our findings emphasize the significance of AI and ML techniques in improving BC outcomes and guiding decision-making. Further research is required to explore the broader applications of ML in predicting comprehensive BC outcomes in diverse healthcare settings and among heterogeneous populations.
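
The abstract names accuracy, precision, recall, and F1-score as the evaluation metrics but does not show how they were computed. The following is a minimal sketch using the standard textbook definitions for a binary task, assuming HER2-positive is encoded as 1 and HER2-negative as 0. The function name `binary_metrics` and the label lists are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for ten patients (NOT data from the study).
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these illustrative labels there are three true positives, five true negatives, one false positive, and one false negative. Reporting precision and recall alongside accuracy matters when HER2-positive cases are a minority of the dataset, because accuracy alone can look high even if many positive cases are missed.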
