Data Driven for Early Breast Cancer Staging using Integrated Mammography and Biopsy

Document Type: Research Articles


Department of Computer Science, Faculty of Business Administration and Information Technology, Rajamangala University of Technology Tawan-Ok, Thailand.


Objective: Breast cancer patients who are diagnosed early have a better prognosis than those diagnosed late. The most common screening methods are mammography and ultrasound. In recent years, researchers have tried to develop data-driven models that predict early cancer staging from the first screening. However, the screening data are incomplete; for example, lymph node status is not available at that point. Integrating datasets to fill this gap is therefore challenging.

Methods: Because the data elements were not collected from the same source, the mammography and biopsy data were joined using latent variables determined by tumor severity. The datasets consisted of 445 mammography reports and 183 pathology reports. The latent variables of the mammography dataset were determined by the severity of the mass, while those of the pathology dataset were determined by TNM staging. These latent variables were used to join the two datasets. Prediction models were then built using machine learning techniques. The modeling was divided into three steps: staging prediction, lymph node prediction, and prognosis.

Results: The integrated mammography and biopsy dataset provided additional factors, enabling models that predict breast cancer staging during the mammography process. Staging prediction achieved 100% accuracy. Lymph node prediction achieved 72.47% accuracy, 73.94% specificity, 72.5% sensitivity, and an area under the ROC curve (AUC) of 0.74. The prognosis model achieved 72.72% accuracy, 80% specificity, 77% sensitivity, and an AUC of 0.87. Rules for early staging, diagnosis, and prognosis were also derived.

Conclusion: This study aimed to build models for early staging, diagnosis, and prognosis using a less invasive method. The models can (1) predict staging from the first screening, (2) estimate lymph node metastasis to inform the choice between axillary lymph node dissection (ALND) and sentinel lymph node biopsy (SLNB), and (3) estimate overall survival time. These capabilities help physicians plan the best treatment for cancer patients.
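The latent-variable join and the evaluation metrics reported in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python sketch using only the standard library: the category-to-level tables (`MASS_SEVERITY_TO_LEVEL`, `TNM_STAGE_TO_LEVEL`), the record field names (`mass_severity`, `stage`), and the many-to-many pairing rule are illustrative assumptions, not the authors' actual scheme. The metric functions use the standard definitions of accuracy, sensitivity, and specificity, and compute ROC AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
from collections import defaultdict

# Hypothetical latent "severity level" tables. The abstract states only that
# mammography latent variables come from mass severity and pathology latent
# variables from TNM staging; these exact mappings are assumptions.
MASS_SEVERITY_TO_LEVEL = {"benign": 0, "probably_benign": 1,
                          "suspicious": 2, "highly_suspicious": 3}
TNM_STAGE_TO_LEVEL = {"0": 0, "I": 1, "II": 2, "III": 3, "IV": 3}


def join_on_latent(mammo, patho):
    """Inner-join two lists of dict records on a shared latent severity level.

    Each mammography record is paired with every pathology record at the
    same level (a many-to-many join); records without a partner are dropped.
    """
    patho_by_level = defaultdict(list)
    for p in patho:
        patho_by_level[TNM_STAGE_TO_LEVEL[p["stage"]]].append(p)
    joined = []
    for m in mammo:
        level = MASS_SEVERITY_TO_LEVEL[m["mass_severity"]]
        for p in patho_by_level.get(level, []):
            joined.append({**m, **p, "latent_level": level})
    return joined


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels.

    Assumes both classes are present in y_true (otherwise a ratio is undefined).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    mammo = [{"id": "m1", "mass_severity": "suspicious"},
             {"id": "m2", "mass_severity": "benign"}]
    patho = [{"pid": "p1", "stage": "II"},
             {"pid": "p2", "stage": "II"},
             {"pid": "p3", "stage": "0"}]
    print(len(join_on_latent(mammo, patho)))  # 3
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
    # {'accuracy': 0.5, 'sensitivity': 0.5, 'specificity': 0.5}
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.5]))  # 0.75
```

Pairing every record at the same latent level is the simplest possible join; a real pipeline would likely need an additional rule to limit or weight these pairings, which the abstract does not specify.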

