Automated Detection and Classification of Microcalcification Clusters with Enhanced Preprocessing and Fractal Analysis

This paper addresses the automated detection of microcalcification clusters in mammogram images. Enhanced preprocessing operations on digital mammograms automatically extract the breast tissue from the background and remove artefacts occurring during image registration using X-rays; fractal analysis is then applied to the suspicious regions. Identifying whether a breast is left or right and realigning it to a standard position is a basic step in the preprocessing of mammograms. Next, the pectoral muscles are separated. Suspicious regions containing microcalcifications are identified and analysed further to classify them as benign or malignant. Because texture features are indicative of malignancy, fractal analysis was carried out on the extracted suspicious regions to obtain texture features. Principal Component Analysis (PCA) was then used to extract an optimal set of features: ten was found to be the optimal number of reduced texture features that did not compromise classification accuracy. A scaled conjugate gradient backpropagation network was used for classification, with the reduced texture features obtained from PCA as input. The accuracy of the proposed method was analysed while varying the number of hidden layer neurons and reached its maximum at 15 neurons. An accuracy of 96.3% was achieved with 10 fractal features as input to the neural network and 15 hidden layer neurons. The architecture was finalised at the configuration that maximised accuracy in labelling microcalcification clusters as benign or malignant.


Introduction
Breast cancer is among the most likely causes of cancer death in women, so early diagnosis is regarded as life-saving. Early detection of breast cancer from mammograms is an important factor in reducing the fatality rate. Mammography is a widely used and reliable screening technology that helps in the detection of breast cancer (Avalos-Rivera and Pastrana-Palma, 2016). The presence of microcalcifications in the breast is an early indication of breast cancer (Dhawan and Royer, 1988). Identifying and localizing microcalcifications in breast tissue is normally carried out by radiologists during screening. Cheng et al. (2003) attempted to carry out the identification and localization of microcalcifications as a computer-aided process.
Suspicious regions encompassing microcalcifications are chosen for analysis or feature extraction in many papers (Miranda and Felipe, 2015; Avalos-Rivera and Pastrana-Palma, 2016). [...] and shape, or geometric properties of microcalcifications, were extracted to aid in classification by Artificial Neural Networks (Arevalo et al., 2015). Liu et al. (2001) and Eltonsy et al. (2007) showed that multiresolution analysis of mammograms improves the effectiveness of the classification process. Gray Level Co-occurrence Matrix, Gabor wavelets and Contourlets were implemented for texture feature extraction (Tai et al., 2013; Eichkitz et al., 2015). Rashed et al. (2003) carried out multi-resolution analysis of mammograms using wavelets. A comparison of multiwavelet, wavelet, Haralick and shape features was carried out by Soltanian-Zadeh et al. (2004).
Texture features have been shown in the literature to play a significant role in labelling microcalcification clusters as benign or malignant. In the proposed work, Principal Component Analysis is employed to select the features most relevant to the type of microcalcification cluster (Arikidis et al., 2006). The scaled conjugate gradient method, an extension that enhances the performance of the backpropagation algorithm, gives better training results for real-valued cases. Applied to real-world applications, it shows improved results over the complex gradient descent algorithm (Popa, 2015).

The paper is organised as follows
The Materials and Methods section describes the database used, the processing algorithm, and the preprocessing operations: removal of artefacts, removal of labels, separation of breast tissue, computation and realignment of the breast tissue orientation, and separation of suspicious regions containing microcalcification clusters. This is followed by a description of fractal analysis and the extraction of texture features. The application of Principal Component Analysis for feature reduction, and the training and testing of the scaled conjugate gradient backpropagation network for classifying microcalcification clusters as benign or malignant, are then explained. The Discussion section presents the results of preprocessing the mammogram image, the analysis of the Lee filter, and the effect of varying the number of hidden layer neurons on classification accuracy.

Database used
The Mammographic Image Analysis Society (MIAS) maintains a mammogram database of 322 images, representing 161 breast pairs in mediolateral oblique view. The images have been reviewed by a consultant radiologist to identify abnormalities, and truth data is available with the database. Images are numbered from 1 to 322 and are presented as consecutive left-right pairs for each patient, so that odd numbers indicate a left breast image and even numbers a right breast image, with the right breast following the left. For illustrations in this paper, 'mdb005' and 'mdb006' are used: 'mdb005' is the left breast image and 'mdb006' the right breast image of the same patient.
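The numbering convention above can be expressed as a short helper. This is a minimal sketch: the function names `mias_side` and `mias_pair` are hypothetical, and the 'mdbNNN' identifier format is taken from the two examples given in this paper.

```python
import re


def _mias_number(image_id):
    """Parse the number out of an identifier such as 'mdb005'."""
    match = re.fullmatch(r"mdb(\d{3})", image_id)
    if match is None:
        raise ValueError(f"not a MIAS image id: {image_id!r}")
    number = int(match.group(1))
    if not 1 <= number <= 322:
        raise ValueError(f"MIAS image numbers run from 1 to 322, got {number}")
    return number


def mias_side(image_id):
    """Odd numbers are left breast images, even numbers are right."""
    return "left" if _mias_number(image_id) % 2 == 1 else "right"


def mias_pair(image_id):
    """Return the (left, right) identifiers of the pair containing image_id."""
    number = _mias_number(image_id)
    left = number if number % 2 == 1 else number - 1
    return f"mdb{left:03d}", f"mdb{left + 1:03d}"
```

For the illustrations used here, `mias_side("mdb005")` gives 'left' and `mias_pair("mdb006")` gives the pair ('mdb005', 'mdb006').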

Preprocessing of Images to extract Breast Tissue
A mammogram typically includes labels carrying the patient ID and descriptions. These labels contain letters or numbers enclosed in rectangular boxes. These artefacts were removed with a template matching algorithm. The template database consists of letters, numerals and a rectangular box. Applying the template matching algorithm identifies the location of the box containing the patient ID, which is then masked. Figure 1 represents the mammogram image. Figure 2 represents the image with the patient ID removed.
Enhanced Lee filter is applied to suppress noise while preserving image detail and sharpness. Multiplicative noise can be removed effectively using the Enhanced Lee filter with a window size of 5.

Table 1. Window Size Versus Breast Boundary Detection Accuracy

Separation of Suspicious regions
Histogram equalization was applied to the breast tissue to enhance contrast. An exponential operation on the breast tissue further enhances the dynamic range. With this process, microcalcification regions are visually enhanced and separable. To make the process automatic, a filter based on correlation analysis is applied to enhance the distinction between breast tissue and background. An SVM classifier is then applied to give a binary output of breast tissue and background. This binary output is applied as a mask to the original image to extract the breast tissue against a uniform background, so that intensity variations in the background do not affect further analysis of the breast tissue. Figure 3 represents the gamma filtered output of the SVM classified output. The gamma filter is applied to further suppress noise and enhance the boundary. The result is the traced boundary line and hence the separation of breast tissue from the background. Figure 4 represents the separated breast tissue with the background suppressed (masked). Figure 5 represents the boundary line detection on the right breast.

Computing Orientation of Breast Tissue and Realigning
Once the breast tissue has been separated from the background, the region properties of the breast tissue, viz., major axis length, minor axis length and orientation, are calculated. Orientation is calculated as the angle between the horizontal line and the major axis. In addition, the tip of the breast must be identified and aligned uniformly across the whole image database. The breast tip position can be identified by scanning the x and y coordinates of the boundary line separating breast tissue and background. Aligning all images so that the tip of the breast lies on the right side of the image makes identification easier. On the basis of orientation, the image is either flipped or kept as it is for realignment. A study of orientation values on 50 test images covering both left and right breast tissue helps in identifying and confirming the role of orientation. Figure 6 represents the flipped breast tissue. Correcting the alignment by the negation of the orientation angle realigns the breast tissue.
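As an illustration of the orientation property, the angle between the horizontal axis and the major axis of a binary region can be estimated from second-order central moments. The sketch below is an assumption about how such a property may be computed, not the implementation used in this work. The function names are hypothetical, and the flip rule in `align_tip_right` is a simplified stand-in (it compares the amount of tissue in the two image halves) rather than the orientation-based decision described in the paper.

```python
import numpy as np


def orientation_degrees(mask):
    """Angle between the horizontal axis and the region's major axis,
    in degrees, counter-clockwise positive, in the range (-90, 90]."""
    rows, cols = np.nonzero(mask)
    x = cols - cols.mean()
    y = -(rows - rows.mean())  # image rows grow downward, so flip the sign
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    return float(np.degrees(0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)))


def align_tip_right(image, mask):
    """Flip horizontally when most tissue lies in the right half.

    Assumption: the chest-wall side holds more tissue than the tip side,
    so after this step the tip points toward the right image edge.
    Returns (image, mask, flipped).
    """
    half = mask.shape[1] // 2
    left_area = np.count_nonzero(mask[:, :half])
    right_area = np.count_nonzero(mask[:, half:])
    if right_area > left_area:
        return image[:, ::-1], mask[:, ::-1], True
    return image, mask, False
```

A region whose major axis rises from lower left to upper right gives a positive angle under this convention, and its mirror image gives the negated angle.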

Fractal Texture Analysis
Once the mass regions were identified as described in steps I and II, texture features were extracted using fractal analysis. Texture features are intended to capture the granularity and repetitive patterns of regions within an image. Texture feature extraction was carried out using Segmentation-based Fractal Texture Analysis (SFTA). The input image is decomposed into a set of binary images using the two-threshold binary decomposition algorithm. From the binary images, the fractal dimensions of the resulting regions are computed to describe the image. In this work, the input image of a suspicious region was decomposed into a set of four images. Each image is described by a [24 x 1] vector of fractal texture features (Cardona et al., 2014).
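The two building blocks named above, thresholding between a pair of gray levels and measuring the fractal dimension of the resulting binary region, can be sketched as follows. This is a simplified illustration, not the full SFTA pipeline: the function names are hypothetical, the thresholds are supplied by the caller rather than selected automatically, and the fractal dimension is estimated by plain box counting on the region itself.

```python
import numpy as np


def two_threshold_binarize(gray, lower, upper):
    """Binary image that keeps pixels with lower < value <= upper."""
    gray = np.asarray(gray)
    return (gray > lower) & (gray <= upper)


def box_count(binary, size):
    """Number of size-by-size boxes that contain at least one set pixel."""
    h, w = binary.shape
    padded = np.pad(binary, ((0, -h % size), (0, -w % size)))
    blocks = padded.reshape(padded.shape[0] // size, size,
                            padded.shape[1] // size, size)
    return int(blocks.any(axis=(1, 3)).sum())


def fractal_dimension(binary):
    """Box-counting estimate: slope of log N(s) against log(1/s)."""
    binary = np.asarray(binary, dtype=bool)
    if min(binary.shape) < 4:
        raise ValueError("need at least 4 pixels along each axis")
    if not binary.any():
        return 0.0
    # Box sizes 1, 2, 4, ... up to about half the shorter image side.
    sizes = 2 ** np.arange(0, int(np.log2(min(binary.shape))))
    counts = [box_count(binary, int(s)) for s in sizes]
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return float(slope)
```

As a sanity check on the estimator, a completely filled square has a dimension close to 2 and a single straight line a dimension close to 1; textured regions fall in between.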

Principal Component Analysis
Fractal analysis of a region containing microcalcifications yields a [24 x 1] feature vector. Before applying a learning algorithm or neural network for training and testing, it is essential to check whether the feature vector can be optimized. If too much irrelevant or redundant information is present, or the data is noisy and unreliable, learning during the training phase becomes more difficult. Feature subset selection is the process of identifying and removing as much irrelevant and redundant information as possible (Hall, 1999). Subset selection reduces the dimensionality of the problem and allows classification accuracy to be improved. Principal Component Analysis is chosen to compute the reduced feature vectors. Classification accuracy is plotted while varying the number of reduced features and training the classifier for each setting. From this analysis, the optimal number of features is identified to be ten.
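The reduction from 24 fractal features to 10 principal components can be sketched with a singular value decomposition. This is a minimal illustration under stated assumptions: `pca_reduce` is a hypothetical helper, and the feature matrix is assumed to hold one 24-element row per suspicious region.

```python
import numpy as np


def pca_reduce(features, n_components=10):
    """Project row-wise feature vectors onto the leading principal components.

    Returns (scores, components, mean, explained_ratio).
    """
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, components, mean, explained[:n_components]
```

A new region with feature vector `x` is projected with `(x - mean) @ components.T`. Repeating the reduction for different values of `n_components` and retraining the classifier each time gives the accuracy-versus-components curve from which the value of ten was chosen.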

Classifier Network For Labelling Benign And Malignant Microcalcifications
Scaled Conjugate Gradient Backpropagation Network
Figure 7 represents the neural network trained for classification of microcalcifications as benign, malignant, normal and no lesion.

Preprocessing of Input Image
The test image considered for the analysis is labelled 'mdb005' in the MIAS mammogram database. Mammogram images generally include labels carrying patient information, usually the patient ID and an indication of whether the image belongs to the left or right breast, enclosed in a rectangular box. From the analysis of various mammogram images, it is concluded that artifacts generally consist of a rectangular section within which letters and numerals are printed. In some cases, letters and numerals are also printed outside the rectangular label section. Artifacts are removed by applying a template matching algorithm.

Artefacts Removal by Template Matching Algorithm
Template matching is a brute-force algorithm that searches an image for a sub-image matching a predefined template, and it is a preferred strategy for locating the zones of an image that match a template image. The algorithm takes two inputs: a source image from the MIAS database, within which labels and artifacts are to be identified, and a set of template images. Templates for all letters 'a-z' and 'A-Z', the digits '0-9' and a rectangular box are employed as template images. Template matching algorithms normally require the template to be the same size as its occurrence in the source image. In this paper, the Fourier transform of the template is computed so that the matching process is independent of the size and orientation of the template in the source image. Figure 8 (a) represents the source image, 'mdb005', given as input to the template matching algorithm. Figure 8 (b) represents sample templates for L, M and the rectangular box. ENVI + IDL 4.7 is used to run the template matching algorithm. The loci of the matching areas were located; Figure 8 (c) represents the template locations identified in the source image, 'mdb005'. The corresponding regions were then masked using the ENVI tools 'Build mask' and 'Apply mask'. Figure 8 (d) represents the masked image, free of artifacts.
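The locate-then-mask idea can be illustrated with a Fourier-domain cross-correlation. The sketch below is not the ENVI workflow used in this work: it handles translation only and does not reproduce the size and orientation independence described above. The function names `locate_template` and `mask_region` are hypothetical, and a practical version would normally use normalized correlation so that bright tissue does not dominate the response.

```python
import numpy as np


def locate_template(image, template):
    """Top-left (row, col) of the best match, found by FFT cross-correlation."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    zero_mean = template - template.mean()
    # Correlation theorem: corr = IFFT( FFT(image) * conj(FFT(template)) ).
    # The template is zero-padded to the image size by the `s` argument.
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(zero_mean, s=image.shape))
    correlation = np.fft.ifft2(spectrum).real
    row, col = np.unravel_index(np.argmax(correlation), correlation.shape)
    return int(row), int(col)


def mask_region(image, top_left, shape, fill=0):
    """Return a copy of the image with the matched rectangle overwritten."""
    out = np.array(image, copy=True)
    row, col = top_left
    height, width = shape
    out[row:row + height, col:col + width] = fill
    return out
```

In use, each template (letter, digit or rectangular box) would be located in turn and the matching rectangle masked, leaving an image without label artifacts.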

Analysis of Lee filter
The image obtained from a mammogram includes speckle noise, specifically multiplicative noise (Makandar and Halalli, 2015). The Enhanced Lee filter is an adaptive filter. Adaptive filtering uses the standard deviation of the pixels within a local box surrounding each pixel to calculate a new pixel value (Vikhe and Thool, 2016). The Lee filter is a standard-deviation-based sigma filter that filters data using statistics calculated within individual filter windows. The Enhanced Lee filter is an adaptation of the Lee filter and similarly uses local statistics, specifically the coefficient of variation, within individual filter windows. Each pixel is assigned to one of three classes, which are treated as follows: (a) Homogeneous: the pixel value is replaced by the average of the filter window, (b) Heterogeneous: the pixel value is replaced by a weighted average, and (c) Point target: the pixel value is not changed.
Unlike a typical low-pass filter, the Enhanced Lee filter preserves image sharpness and detail while suppressing noise. The masked image is filtered with the adaptive Lee filter to remove multiplicative noise while preserving image sharpness and texture. The window size of the Lee filter is varied over 3x3, 5x5, 7x7 and 9x9, and the accuracy of detecting breast tissue from the background is calculated for each. Applying the Lee filter to suppress noise, followed by the SVM classifier, produces a binary output representing breast tissue and boundary.
From the analysis of varying the window size of the Lee filter, a window size of 5x5 is taken as optimal for suppressing noise in the breast boundary detection algorithm (Figure 9).
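The three-class rule of the Enhanced Lee filter can be sketched as below. This is one common formulation and not necessarily the exact variant used in this work: the thresholds `cu` and `cmax` on the coefficient of variation, the `damping` factor, the exponential weight and the reflective border padding are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def enhanced_lee(img, window=5, cu=0.523, cmax=1.732, damping=1.0):
    """Enhanced Lee filter sketch based on the local coefficient of variation."""
    img = np.asarray(img, dtype=float)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    # Coefficient of variation; defined as 0 where the local mean is 0.
    ci = np.divide(std, mean, out=np.zeros_like(mean), where=mean > 0)

    out = img.copy()                      # (c) point target: value unchanged
    homogeneous = ci <= cu
    out[homogeneous] = mean[homogeneous]  # (a) homogeneous: window average
    hetero = (ci > cu) & (ci < cmax)      # (b) heterogeneous: weighted average
    weight = np.exp(-damping * (ci[hetero] - cu) / (cmax - ci[hetero]))
    out[hetero] = mean[hetero] * weight + img[hetero] * (1.0 - weight)
    return out
```

Calling `enhanced_lee(image, window=w)` for `w` in 3, 5, 7 and 9 and measuring the boundary detection accuracy after each run corresponds to the window-size comparison described above.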
In the design of the neural network classifier, the input layer consists of 10 neurons and the output layer of four neurons, representing the four types of lesions or microcalcifications. The number of neurons in the hidden layer is found to have an impact on classifier accuracy. Classification accuracy is plotted against the number of hidden layer neurons (Figure 10). Based on Table 2, the number of hidden layer neurons was fixed at 15, giving an optimal classifier accuracy of 96.3%.
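The layer sizes stated above (10 inputs, 15 hidden neurons, 4 outputs) can be made concrete with a small forward-pass sketch. Training with scaled conjugate gradient is not shown. The tanh hidden activation, the softmax output, the random initialisation and all function names are assumptions made for illustration only.

```python
import numpy as np


def init_network(n_in=10, n_hidden=15, n_out=4, seed=0):
    """Randomly initialised weights for a single-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Class probabilities for each row of X (softmax over the outputs)."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def parameter_count(params):
    """Total number of trainable weights and biases."""
    return sum(p.size for p in params.values())
```

With h hidden neurons this layout has 10h + h + 4h + 4 = 15h + 4 trainable parameters, that is 229 for h = 15. Sweeping `n_hidden` and retraining for each value is what produces an accuracy-versus-neurons curve of the kind referred to above.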
From the breast tissue separated from the background, suspicious regions were localized. Because texture features are representative of whether breast tissue is benign or malignant, fractal analysis was applied. Fractal analysis resulted in twenty-four features. Used as such, these features carry redundant information and may reduce the accuracy of the results. Table 3 and Figure 11 represent the classification accuracy obtained by varying the number of principal components from PCA. The graph in Figure 11 shows that the optimal number of principal components was ten. With fewer features, the classifier was not able to capture the information significant for malignancy detection, resulting in lower accuracy.
A scaled conjugate gradient backpropagation network was used for classification, with the reduced texture features obtained from PCA as input. The accuracy of the proposed method was analysed while varying the number of hidden layer neurons, and the optimum was reached at 15 neurons. Figure 12 represents the confusion matrix obtained as the performance of the classifier. Class 1 represents malignant tissue, class 2 benign tissue, class 3 normal tissue and class 4 no lesion. The classifier achieved 96.3% accuracy, with testing carried out on 27 regions of breast tissue.
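Overall accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test samples. The matrix in the sketch below is purely hypothetical and is not the content of Figure 12; it is chosen only so that 26 of 27 samples lie on the diagonal, which is the count consistent with the reported 96.3% on 27 test regions.

```python
import numpy as np


def accuracy(confusion):
    """Overall accuracy: correctly classified samples over all samples."""
    confusion = np.asarray(confusion)
    return float(np.trace(confusion) / confusion.sum())


# Hypothetical 4-class matrix (rows: true class, columns: predicted class).
# Illustrative values only, NOT the data behind Figure 12.
example = np.array([
    [7, 0, 0, 0],   # class 1: malignant
    [1, 6, 0, 0],   # class 2: benign
    [0, 0, 7, 0],   # class 3: normal
    [0, 0, 0, 6],   # class 4: no lesion
])

print(f"{accuracy(example):.1%}")  # 26 / 27
```

The same function applied to the actual matrix of Figure 12 would reproduce the reported figure.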
In conclusion, research on the analysis of mammograms has applied intensity, texture and morphological operators to classify microcalcifications. In this paper, we have adopted texture features obtained using fractal analysis. Although the twenty-four features obtained from fractal analysis were effective in classifying synthetic images, for the real-world application of classifying microcalcifications they yielded a classification accuracy of 69.4%. To remove redundancy in the data and obtain fewer features that are representative of the type of microcalcification (MC), classification accuracy was plotted against the number of principal components, and the optimal number of principal components selected as features was found to be 10. The classifier was then trained and tested, achieving an accuracy of 96.3%. Automated analysis was achieved by identifying suspicious regions with the proposed method of tracing the boundary and extracting the suspicious regions, that is, the regions containing microcalcifications. Future work is to identify the type of breast tissue as fatty or dense and to check the classification accuracy of labelling lesions in both types of breast tissue. Tailor-made architectures can be designed to achieve higher accuracy in detecting the types of calcifications in all kinds of breast tissue.