An Automatic Breast Tumor Detection and Classification including Automatic Tumor Volume Estimation Using Deep Learning Technique

Objective: This study aims to develop automatic breast tumor detection and classification, including automatic tumor volume estimation, using deep learning techniques based on computerized analysis of breast ultrasound images. Because detecting and diagnosing a tumor with handheld ultrasound depends on the radiologist's skill and on image quality, this approach is intended to assist the radiologist's decision-making in breast cancer diagnosis. Material and Methods: Breast ultrasound images were provided by the Department of Radiology of Thammasat University and the Queen Sirikit Center of Breast Cancer of Thailand. The dataset consists of 655 images: 445 benign and 210 malignant. Several data augmentation methods, including blur, vertical flip, horizontal flip, and noise, were applied to enlarge the training and testing datasets. Tumor detection, localization, and classification were performed by drawing an appropriate bounding box around each tumor using the YOLOv7 deep learning architecture. Automatic tumor volume estimation was then performed using a simple pixels-per-metric technique. Results: The model demonstrated excellent tumor detection performance, with a confidence score of 0.95. In addition, the model yielded satisfactory predictions on the test sets, with a lesion classification accuracy of 95.07%, a sensitivity of 94.97%, a specificity of 95.24%, a positive predictive value (PPV) of 97.42%, and a negative predictive value (NPV) of 90.91%. Conclusion: Automatic breast tumor detection and classification using a deep learning technique yielded satisfactory predictions in distinguishing benign from malignant breast lesions, and automatic tumor volume estimation was also performed. Our approach could be integrated into a conventional breast ultrasound machine to assist the radiologist's decision-making in breast cancer diagnosis.


Introduction
The number of new breast cancer cases increases every year (Siegel et al., 2022). Because early detection is an effective way to decrease the mortality rate, ultrasound is used to detect and diagnose breast lesions when abnormalities are identified by other imaging modalities or on palpation (Kornecki, 2011). In addition, ultrasound (US) has been reported to have higher sensitivity and diagnostic accuracy (Shen et al., 2015). When handheld ultrasound is used, the radiologist's skill level and image quality are important for detecting and diagnosing the tumor (Komatsu et al., 2021). Therefore, computerized analysis of breast images has been widely introduced to increase the efficiency of breast screening, using a computer system to help radiologists detect and diagnose abnormalities (Jiang et al., 1999; Giger, 2000).
Detector (SSD) are commonly used for real-time object detection. In the experiments, the SSD model obtained the highest F1 score, 0.79. Chiao et al. (2019) used Mask R-CNN, an extension of Faster R-CNN, for tumor segmentation; their model obtained an Intersection over Union (IoU) of 0.75. Amiri et al. (2020) developed a two-stage segmentation method based on the U-Net architecture, which attained F1 scores of 0.86 and 0.80 with and without a test-time augmentation procedure, respectively. The Dense skip U-Net (DsUnet) proposed by Cui et al. (2020) is a segmentation approach based on the U-Net model; in their experiments, DsUnet reached an F1 score of 0.86.
According to our survey, many studies have succeeded in breast ultrasound detection and segmentation. However, none of the surveyed studies proposed automatic tumor volume estimation. Therefore, this study aims to develop automatic breast tumor detection and classification, including automatic tumor volume estimation, using a deep learning technique. Such a model could assist the radiologist's decision-making in breast cancer diagnosis.

Materials
Breast ultrasound images were provided by the Department of Radiology of Thammasat University and Queen Sirikit Center of Breast Cancer of Thailand. The dataset consists of 655 images including 445 benign and 210 malignant. Figure 1 shows some example images from our experiment dataset.
A very small dataset may not be enough for a given problem. The accuracy of deep learning models depends largely on the quality, quantity, and contextual meaning of the training data. However, data scarcity is one of the most common challenges in building deep learning models, and in production use cases collecting such data can be costly and time-consuming. Hence, several data augmentation methods, including blur, vertical flip, horizontal flip, and noise, were applied to improve classification performance. Table 1 lists the data augmentation techniques used in this study, and Table 2 compares the original sample size with the augmented sample size.
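The four augmentations named above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' pipeline: images are taken to be 2-D grayscale `uint8` arrays, labels are YOLO-format rows `(x_center, y_center, width, height)` normalized to [0, 1], and the blur kernel size `k` and noise level `sigma` are hypothetical values. One point matters for detection data in particular: a flip moves the tumor, so the box center must be mirrored as well, whereas blur and noise leave the boxes unchanged.

```python
import numpy as np


def flip_horizontal(image, boxes):
    """Mirror left-right. boxes: (N, 4) YOLO rows (xc, yc, w, h), normalized."""
    out_boxes = boxes.copy()
    out_boxes[:, 0] = 1.0 - boxes[:, 0]  # only the x-center moves
    return image[:, ::-1].copy(), out_boxes


def flip_vertical(image, boxes):
    """Mirror top-bottom; only the y-center of each box moves."""
    out_boxes = boxes.copy()
    out_boxes[:, 1] = 1.0 - boxes[:, 1]
    return image[::-1, :].copy(), out_boxes


def box_blur(image, k=5):
    """k x k mean filter (k odd) with reflected borders; boxes are unchanged."""
    pad = k // 2
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    h, w = image.shape
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):  # sum the k*k shifted copies of the image
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return np.round(acc / (k * k)).astype(np.uint8)


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise and clip back to the 0-255 range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(np.round(noisy), 0, 255).astype(np.uint8)
```

For example, applying `flip_horizontal` to a label whose x_center is 0.30 yields 0.70, while its width and height stay the same.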

Deep learning for object detection and classification
Deep learning for object detection involves not only recognizing and classifying every object in an image but also localizing each one by drawing an appropriate bounding box around it. This technique extends traditional computer vision and image classification. In recent years, several model architectures, such as R-CNN, Faster R-CNN, and YOLO, have been successful for object detection (Redmon et al., 2016; Ren et al., 2017; Virasova et al., 2021). Real-time object detection advanced with the release of YOLO, and YOLOv7 infers faster and more accurately than its previous versions (e.g., YOLOv5). In YOLO, the features extracted by the backbone are combined and mixed in the neck and then passed to the head of the network, which predicts the locations and classes of the objects around which bounding boxes should be drawn.
Figure 2 shows the training process. First, input images were fed to the convolutional layers of the YOLOv7 architecture, and training was iterated until the best model performance was achieved. Second, the model localizes each tumor by drawing an appropriate bounding box around it. Third, the tumor volume is estimated from the size of the minimum bounding box. Finally, each detected tumor is classified as benign or malignant. Once a deep learning model has been trained, it can be used to make predictions on new data: the new data are passed through the network, and the output of the final layer is used to make the predictions.
The object detector is responsible for identifying which pixels in an image belong to an object, and the regressor is responsible for predicting the coordinates of the bounding box around that object. The output of the object detector will typically be a set of bounding boxes around the detected objects, along with a confidence score for each bounding box.
The regressor is then trained on these bounding boxes to learn how to predict the coordinates of the tightest possible bounding box around an object. After both the object detector and regressor have been trained, they can be combined into a single model that can be used to detect and localize objects in new images.
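The raw output described above usually contains several overlapping candidate boxes for the same lesion, each with its own confidence score. YOLO-family detectors reduce these with a confidence threshold followed by non-maximum suppression (NMS). The source does not describe this step, so the sketch below is an illustrative, class-agnostic version in NumPy; the function name `filter_and_nms` and the thresholds `conf_thr=0.25` and `iou_thr=0.45` are assumed example values, and boxes are taken to be pixel corners `(x1, y1, x2, y2)` with positive area.

```python
import numpy as np


def filter_and_nms(boxes, scores, conf_thr=0.25, iou_thr=0.45):
    """Class-agnostic post-processing sketch.

    boxes:  (N, 4) float array of pixel corners (x1, y1, x2, y2), positive area.
    scores: (N,) confidence scores.
    Returns the indices of the kept boxes, highest confidence first.
    """
    # 1) drop low-confidence candidates
    idx = np.flatnonzero(scores >= conf_thr)
    # 2) visit the remaining candidates from most to least confident
    idx = idx[np.argsort(-scores[idx])]
    kept = []
    while idx.size > 0:
        best, rest = idx[0], idx[1:]
        kept.append(int(best))
        # IoU between the current best box and every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        # 3) suppress boxes that overlap the kept box too much
        idx = rest[iou < iou_thr]
    return kept
```

YOLO implementations usually apply NMS per class; running this function separately on the benign and the malignant candidates would give that behavior.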

Performance Evaluation
Evaluation metrics such as precision, recall, and accuracy were used to evaluate model performance. Intersection over Union (IoU) was used to quantify the degree of overlap between two boxes. In object detection and segmentation, IoU evaluates the overlap between the ground-truth region and the predicted region, which measures the correctness of a prediction. Figure 3 shows an example of how IoU is interpreted: the predicted box of model (a) overlaps the ground truth more than that of model (b). Model (c) has an even higher overlap with the ground truth, but it also overlaps a large area of background. The example shows that IoU measures not just whether a prediction covers the ground truth but how closely it matches it; a box that also covers a large background area is penalized because its union with the ground truth is larger.
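IoU as described above can be computed directly from corner coordinates. This sketch assumes boxes given as `(x1, y1, x2, y2)` in pixels with positive area; the example boxes are hypothetical and loosely echo the cases discussed for Figure 3: an exact match, a box covering only half of the tumor, and a box covering the tumor plus a large margin of background.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), x2 > x1, y2 > y1."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    print(iou(gt, (0, 0, 10, 10)))  # exact match -> 1.0
    print(iou(gt, (0, 0, 10, 5)))   # covers half of the tumor -> 0.5
    print(iou(gt, (0, 0, 20, 20)))  # covers the tumor plus background -> 0.25
```

The last case shows why a box that merely contains the tumor is not rewarded: its intersection with the ground truth is complete, but the enlarged union drives the score down to 0.25.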
The bounding-box dimensions were measured in pixels and converted to physical units using the number of pixels per inch. Since 1 inch equals 2.54 cm, at 96 pixels per inch there are 96 pixels per 2.54 cm, so 1 pixel = 2.54/96 cm ≈ 0.026458333 cm. Figure 5 shows the estimated tumor volume.

System integration
Integration of various systems of medicine provides the best available therapeutic care to the patient without undue delay, making way for a better prognosis. In recent years, new approaches have been devised for both existing and newly emerging diseases. However, developing new software, tools, or systems is time-consuming and costly. Instead of developing a new system, this study proposes automatic breast tumor detection and classification, including automatic tumor volume estimation, that could be integrated into a conventional breast ultrasound machine. The proposed system is shown in Figure 6.

Data augmentation result
In this section, we compare the classification performance of the AI model under two approaches (the original dataset and the augmented dataset), as summarized in Table 3. Precision, recall, and accuracy were computed on the testing dataset.
In addition, all detected objects were evaluated against the ground truth using the Intersection over Union (IoU), where $B_p$ is the predicted box and $B_{gt}$ is the ground-truth box:
$$\mathrm{IoU} = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})} \qquad (4)$$
Recall (RE) and the mean Average Precision (mAP) were also computed. The Average Precision (AP) is defined as
$$\mathrm{AP} = \sum_{k} P(k)\,\Delta r(k)$$
where $P(k)$ is the precision at a given threshold $k$ and $\Delta r(k)$ is the change in recall. For multiple object detection, the mAP is the mean of the AP over all $N$ categories:
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$
On the original breast ultrasound dataset, the model achieved an overall lesion classification precision of 0.92, a recall of 0.84, and an accuracy of 0.88. Evaluated per class, the benign class achieved a precision of 0.89, a recall of 0.85, and an accuracy of 0.86, while the malignant class scored higher, with a precision of 0.95, a recall of 0.86, and an accuracy of 0.87.
Data augmentation tended to improve model performance. Across all classes, the model achieved a precision of 0.97, a recall of 0.89, and an accuracy of 0.95. Per class, the benign class achieved a precision of 0.94, a recall of 0.88, and an accuracy of 0.93, while the malignant class again scored higher, with a precision of 1, a recall of 0.89, and an accuracy of 0.95.
The empirical results showed that data augmentation is useful for improving the performance of ABUS models. It could also reduce the cost of the data collection process by generating new synthetic images from existing ones for image classification.

Volume measurement
This section explains how the boxes are extracted from the raw image and how the object size is measured. In YOLO, a bounding box is represented by four values [x_center, y_center, width, height]. The x_center and y_center values are the normalized coordinates of the center of the bounding box: the pixel coordinates x and y of the box center are divided by the width and the height of the image, respectively. The width and height values give the width and height of the bounding box, likewise normalized by the image width and height. Once the box size in pixels is known, the pixel density is used to estimate the object size. Assuming a pixel density of 96 dpi, there are 96 pixels per inch. Figure 4 shows the volume measurement process.
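The conversion from a normalized YOLO label to a physical size, as described above, can be sketched as follows. This is an illustrative sketch, not the authors' code: `yolo_to_pixels` and `pixels_to_cm` are hypothetical helper names, and the default of 96 pixels per inch is the pixel-density assumption stated in the text. If the physical pixel spacing of the ultrasound image is known (it depends on the scanner settings), it would be the more faithful value to pass as `ppi`.

```python
CM_PER_INCH = 2.54


def yolo_to_pixels(xc, yc, w, h, img_w, img_h):
    """Undo YOLO normalization: return (x1, y1, x2, y2, width_px, height_px)."""
    box_w, box_h = w * img_w, h * img_h  # width/height are normalized too
    cx, cy = xc * img_w, yc * img_h      # box center in pixels
    x1, y1 = cx - box_w / 2, cy - box_h / 2
    return x1, y1, x1 + box_w, y1 + box_h, box_w, box_h


def pixels_to_cm(n_pixels, ppi=96.0):
    """At `ppi` pixels per inch, one pixel spans 2.54 / ppi cm."""
    return n_pixels * CM_PER_INCH / ppi


if __name__ == "__main__":
    # Hypothetical label: box centered in a 640 x 480 image, 25% wide, 20% tall.
    x1, y1, x2, y2, bw, bh = yolo_to_pixels(0.5, 0.5, 0.25, 0.20, 640, 480)
    print(bw, bh)  # 160.0 96.0
    print(f"{pixels_to_cm(bw):.2f} x {pixels_to_cm(bh):.2f} cm")  # 4.23 x 2.54 cm
```

At the assumed 96 dpi, a 96-pixel side is exactly one inch (2.54 cm), which is a convenient sanity check for the conversion.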

RoI extraction and bounding-box regression results
Lesion detection in ABUS usually uses a bounding box to describe the spatial location of the tumor. The bounding box is a rectangle determined by the x and y coordinates of its upper-left corner and the corresponding coordinates of its lower-right corner. Another commonly used representation is the (x, y) coordinates of the box center together with the width and height of the box. Figure 7 shows the results of RoI extraction using the minimum bounding box.
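The two box representations above are interchangeable, and the corner form maps directly onto array slicing when the RoI is extracted. The helpers below are a hypothetical sketch (the names are illustrative), assuming pixel coordinates with the origin at the top-left of a 2-D image stored as a NumPy array indexed `[row, column]`, so rows correspond to y and columns to x.

```python
import numpy as np


def corners_to_center(x1, y1, x2, y2):
    """(upper-left, lower-right) corners -> (x_center, y_center, width, height)."""
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def center_to_corners(xc, yc, w, h):
    """(x_center, y_center, width, height) -> upper-left and lower-right corners."""
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def crop_roi(image, x1, y1, x2, y2):
    """Cut the box out of the image; rows are y, columns are x.
    Coordinates are rounded outward and clipped to the image borders."""
    h, w = image.shape[:2]
    c1, r1 = max(0, int(np.floor(x1))), max(0, int(np.floor(y1)))
    c2, r2 = min(w, int(np.ceil(x2))), min(h, int(np.ceil(y2)))
    return image[r1:r2, c1:c2]
```

Rounding outward keeps the whole predicted box inside the crop, and clipping guards against boxes that extend slightly past the image edge.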
In addition, the mAP compares the ground-truth bounding box with the detected box and returns a score; a higher score indicates more accurate detections. Figure 8 shows the mean Average Precision (mAP) score, obtained by taking the mean AP over all classes and/or over IoU thresholds. The results show that mAP@0.5 reached 0.95, while mAP@0.5-0.95 reached 0.75.
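The two mAP figures above differ only in which IoU thresholds count a detection as correct. The sketch below is a simplified single-class illustration, not the YOLOv7 evaluation code: it matches detections to ground-truth boxes greedily in descending score order, computes the non-interpolated AP as the sum of P(k)·Δr(k) described earlier, and averages it over the ten thresholds 0.50, 0.55, ..., 0.95. The data layout (`dets` as `(image_id, score, box)` tuples, `gts` as a dict of boxes per image) is an assumption. YOLO-family evaluation code typically interpolates the precision curve, so numbers from this sketch can differ somewhat from those reported by the training framework.

```python
import numpy as np


def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def average_precision(dets, gts, iou_thr):
    """Non-interpolated AP = sum_k P(k) * delta_r(k) for a single class.

    dets: list of (image_id, score, box); gts: dict image_id -> list of boxes.
    Each ground-truth box can be matched at most once.
    """
    n_gt = sum(len(v) for v in gts.values())
    used = {k: [False] * len(v) for k, v in gts.items()}
    hits = []
    for img, _, box in sorted(dets, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = _iou(box, g)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            hits.append(1.0)  # true positive
        else:
            hits.append(0.0)  # false positive (poor overlap or duplicate)
    hits = np.array(hits)
    cum_tp = np.cumsum(hits)
    cum_fp = np.cumsum(1.0 - hits)
    recall = cum_tp / max(n_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)  # P(k) * delta_r(k)
        prev_r = r
    return float(ap)


def map_50_95(dets, gts):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    return float(np.mean([average_precision(dets, gts, t)
                          for t in np.linspace(0.5, 0.95, 10)]))
```

For the two-class case, the same computation would be run per class (benign, malignant) and the results averaged. A box with IoU 0.68 counts as correct at the four thresholds from 0.50 to 0.65 only, which is why mAP@0.5-0.95 is always at most mAP@0.5.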

Tumor prediction
The confusion matrix of the model for predicting breast cancer on the test set is shown in Figure 9. The ABUS model achieved high performance in distinguishing benign from malignant breast lesions when applied to the breast US images of the test set, with a lesion classification accuracy of 95.07%, a sensitivity of 94.97%, a specificity of 95.24%, a PPV of 97.42%, and an NPV of 90.91% (Table 3).
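The five figures reported above are all ratios of confusion-matrix counts. The function below is an illustrative sketch; it takes malignant as the positive class, which is an assumption because the text does not say which class was treated as positive, and the counts in the example are hypothetical, not those behind Figure 9.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Binary diagnostic metrics from confusion-matrix counts.
    Assumes malignant is the positive class (not stated in the source)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall of the positive class
        "specificity": tn / (tn + fp),  # recall of the negative class
        "ppv": tp / (tp + fp),          # precision of the positive class
        "npv": tn / (tn + fn),          # precision of the negative class
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; they are not the study's data.
    m = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)
    print({k: round(v, 3) for k, v in m.items()})
    # {'accuracy': 0.867, 'sensitivity': 0.8, 'specificity': 0.9, 'ppv': 0.8, 'npv': 0.9}
```

Swapping which class is called positive exchanges sensitivity with specificity and PPV with NPV, so the choice should be stated alongside the reported values.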

Tumor volume estimation
In the previous experiments, the tumor was cropped using the bounding-box coordinates arranged in top-left, top-right, bottom-right, and bottom-left order. This section presents the computed tumor size. The ABUS model measures the size of a tumor in an image using a simple "pixels per metric" technique, which describes the number of pixels that "fit" into a given number of inches, millimeters, meters, etc.
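The pixels-per-metric idea reduces to one calibration ratio followed by a division. In this hypothetical sketch, the ratio comes from a reference of known length; the source does not say how the ratio was obtained beyond the 96 dpi assumption, so the example simply reuses that value, which gives 96 / 25.4 ≈ 3.78 pixels per millimeter. The function names and the box size are illustrative.

```python
def pixels_per_metric(ref_length_px, ref_length_units):
    """Calibration ratio: how many pixels 'fit' into one unit (e.g., 1 mm)."""
    return ref_length_px / ref_length_units


def box_size(width_px, height_px, ppm):
    """Convert a bounding-box width and height from pixels to units."""
    return width_px / ppm, height_px / ppm


if __name__ == "__main__":
    ppm = pixels_per_metric(96, 25.4)    # 96 px per inch -> px per mm
    w_mm, h_mm = box_size(120, 50, ppm)  # hypothetical 120 x 50 px box
    print(f"{w_mm:.2f} mm x {h_mm:.2f} mm")  # 31.75 mm x 13.23 mm
```

Because the ratio is a single parameter, replacing the 96 dpi assumption with a calibrated value (for instance, from a scale marking of known length in the image) changes only the first call.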

Discussion
This paper proposes an artificial intelligence model that not only automatically detects breast tumor lesions but also classifies them as benign or malignant, followed by tumor volume measurement, using an effective deep learning technique. Breast ultrasound images were used for model training and testing. Because the quantity and diversity of data are important factors in the effectiveness of most machine learning models (e.g., deep neural networks), data augmentation was used in this study to increase the amount of data by producing synthetic data from existing data. Our experimental results showed that the augmented dataset tended to improve model performance. This result is consistent with previous papers (Han et al., 2017; Zheng …) … study. They proposed BUSNet, which uses a backbone network for RoI classification and bounding-box regression on breast US images. Similarly, in the Gómez-Flores et al. (2020) study, well-established CNN models developed by the computer vision community were applied to the automatic segmentation of BUS images using semantic segmentation.
Figure 8. The mean Average Precision (mAP) score, calculated by taking the mean AP over all classes and/or over IoU thresholds. mAP@0.5 is calculated at an IoU threshold of 0.5, while mAP@0.5-0.95 is calculated over IoU thresholds from 0.5 to 0.95.
We now discuss the diagnostic performance of the ABUS model. The model yielded satisfactory predictions on the test sets, with a lesion classification accuracy of 95.07%, a sensitivity of 94.97%, a specificity of 95.24%, a PPV of 97.42%, and an NPV of 90.91%. This diagnostic performance was similar to that reported in previous papers on AI methods for breast US analysis (Kim KE et al., 2020; Wan KW et al., 2021; Boumaraf S et al., 2011). In recent years, many new techniques have been developed to compensate for the deficiencies of conventional US (Yampaka and Chongstitvatana, 2020). In particular, automatic breast tumor detection and classification, including automatic tumor volume estimation, using artificial intelligence can provide a second opinion or decision support and significantly improve the efficiency and effectiveness of radiologists' diagnoses (Chan et al., 2020).
In summary, this study proposed automatic breast tumor detection and classification, including automatic tumor volume estimation, for US images. In addition, our ABUS model can measure the size of a tumor in an image using a simple pixels-per-metric technique.
There are several limitations in our study. First, the training dataset consists only of B-mode US images; other US modes, such as Doppler or elastography, could be included in further development. Second, our model classifies tumors only as benign or malignant, whereas the BI-RADS assessment is the standard for breast tumor diagnosis. Consequently, ABUS classification according to the BI-RADS assessment could be developed further into AI-based diagnostic support technologies for breast disease screening.

Author Contribution Statement
All authors contributed equally to this study.