An Effective Two Way Classification of Breast Cancer Images: A Detailed Review

Cancer is a disease in which cells grow at an abnormal rate; the growth spreads through the body and destroys healthy cells. Breast cancer is a highly heterogeneous disease that is common among women in Western countries. Mammography, an X-ray based screening test, is used to diagnose breast cancer. It helps identify breast cancer at an early stage, and early detection allows more women to recover from this serious disease. Medical centres assign highly skilled radiologists to analyse mammography results, but human error is still inevitable. The error rate is high when radiologists are fatigued, which leads to intra-observer and inter-observer variation. Image quality also plays a vital role in mammographic sensitivity and contributes to this variation. Several automated approaches have been tried to streamline and standardise the diagnostic process and to improve the quality of breast cancer images. This paper introduces a two-way algorithm for grouping breast cancer images into two classes: 1. benign (a tumour that grows but is not dangerous) and 2. malignant (a tumour that cannot be controlled and can cause death). Two data mining algorithms are used because abnormal mammograms are sparsely distributed. The first is the k-means algorithm, which partitions the given data elements into a user-specified number of clusters. The second is the Support Vector Machine (SVM), which finds the function that best separates the members of the classes based on the training data.


Introduction
Cancer is the uncontrolled multiplication of a specific group of cells in a particular area of the human body. Rapidly dividing cells form a lump or mass of extra tissue, known as a tumour. Malignant tumours consist of cancer cells. A malignant tumour that originates in breast cells is known as breast cancer. Clusters of microcalcifications, architectural distortions and masses are notable warning signs. The reported incidence of breast cancer has been very high in recent years. At the same time, the survival rate has increased substantially, mainly because of more effective diagnosis and treatment.
Breast cancer screening primarily takes an anatomic approach through X-ray mammography, which requires the breast tumor to have developed to a stage where it is significantly denser than healthy tissue. As a consequence, mammography misses 5%-15% of nonpalpable breast lesions that are not sufficiently denser than healthy tissue (Manoharan et al., 1998; Osteen et al., 1996). In addition, increased density is not always tied to the presence of cancer: dense lesions of tissue that are further investigated via biopsy are often found to be benign (Manoharan et al., 1998). Instead of relying on density changes, cancer can also be detected by using early molecular signatures.
In 2011, the American Cancer Society estimated that about 230,480 new cases of invasive breast cancer and about 57,650 new cases of non-invasive breast cancer would be diagnosed in the United States, and that 39,520 women would die of the disease. Mammography, a well-known procedure for diagnosing breast cancer, uses low-dose X-rays with high-contrast, high-resolution detectors in an X-ray system designed exclusively to image the breasts. It is used for both breast cancer screening and diagnosis. There are two types of mammography: 1. Screen Film Mammography (SFM), in which a film screen acts as the recording device, and 2. Full-Field Digital Mammography (FFDM), in which digital detectors act as the recording medium. For image processing and subsequent grading, the digital images produced by FFDM have advantages over traditional film. Digital mammography is one of the important methods for identifying breast cancer at an early stage. The advantages cited for digital mammography include its non-invasiveness, relatively compact instrumentation, and cost-effectiveness. As Siddiqui et al. (2005) noted, mammography is highly effective and reliable in identifying breast cancer, yet only a small number of radiologists are available to interpret the large volume of mammograms generated by population screening. Wroblewska et al. (2003) reported that there is always a risk of missing breast cancer cases when reading mammographic images, because abnormalities are embedded in and hidden by the varying structures of breast tissue.

Related Work
a. Studies on different techniques
Abou-Chadi et al. (2002) used neural networks to identify candidate circumscribed lesions in digitized mammograms. The networks were trained with the back-propagation algorithm and mainly distinguish the histogram of cancerous tissue from that of normal tissue.
Brake et al. (1999) studied single-scale and multi-scale detection of masses in digital mammograms. Scale plays a vital role in automated mass detection because of the wide range of sizes that masses can have. Their work examined whether masses can be detected at a single scale or whether it is more suitable to combine the results obtained at several scales in a multi-scale scheme.
Chan et al. (1988) introduced a computerized method for detecting microcalcifications in digital mammograms. The method is based on a difference image, in which a signal-suppressed image is subtracted from a signal-enhanced image to remove the structured background of the mammogram. Global and local thresholding techniques are then used to extract the microcalcification signals. Karssemeijer (1905) developed a statistical method for detecting microcalcifications in digital mammograms, in which Bayesian image analysis provides the basis for the statistical models and the general framework.
Nakayama et al. (2005) used a filter bank to identify linear and nodular patterns. With the filter bank, sub-images were generated from the elements of a Hessian matrix at every resolution level. The small and large eigenvalues were then calculated, resulting in a new filter bank with three properties: 1. nodular patterns of various sizes can be enhanced, 2. both nodular and linear patterns of various sizes can be enhanced, and 3. the original image can be rebuilt with these patterns removed. In mammograms, the filter bank is used to enhance microcalcifications.
Yu et al. (2000) proposed a two-step CAD system for the automatic detection of clustered microcalcifications. In the first step, wavelet and gray-level statistical features were used to segment potential microcalcification pixels and group them into potential individual microcalcification objects. In the second step, 31 statistical features were used to check these potential objects, with the support of neural networks. The results were promising but not conclusive, because the training set was also used for testing.
Kim (2016)
Each iteration of the k-means algorithm consists of two steps.
Step 1: Data assignment. Every data point in set D is assigned to its closest centroid, with ties broken arbitrarily. This results in a partitioning of the data.
Step 2: Relocation of "means". Every cluster representative is relocated to the center (mean) of all data points assigned to it. If the data points come with a probability measure (weights), the relocation is to the expectation (weighted mean) of the data partition.
The default measure of closeness is the Euclidean distance, in which case the cost function, the sum of squared distances between each data point and its closest centroid, is always non-negative.
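The two k-means steps can be illustrated with a minimal sketch in plain Python. It is not the implementation used in any of the reviewed papers: the function names, the random-sampling initialisation, the handling of empty clusters and the tie-breaking rule (lowest centroid index) are all assumptions made for illustration. The sketch also records the squared-distance cost after every iteration, which should never increase.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Step 1: index of the closest centroid for each point (ties go to the lowest index)."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def relocate(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def cost(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def k_means(points, k, seed=0, max_iter=100):
    """Alternate the two steps until the assignment stops changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random-sampling initialisation
    labels = assign(points, centroids)
    history = [cost(points, labels, centroids)]
    for _ in range(max_iter):
        centroids = relocate(points, labels, centroids)
        new_labels = assign(points, centroids)
        history.append(cost(points, new_labels, centroids))
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return centroids, labels, history


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors forming two well-separated groups.
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    final_centroids, final_labels, cost_history = k_means(data, k=2)
    print(final_labels)
    print(cost_history)
```

On real mammographic data the points would be feature vectors extracted from the images; the toy coordinates above only exercise the two steps.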

b. Support Vector Machines
Support Vector Machines (SVM) (Vapnik, 1995) are very popular because they offer a robust, reliable and accurate method compared with other algorithms. SVM is insensitive to the number of dimensions, needs as few as a dozen examples for training, and has a sound theoretical foundation. Moreover, efficient methods for training SVMs are being developed rapidly.
A Support Vector Machine takes data from two classes and determines the maximum-margin hyperplane between them. The hyperplane is chosen so that the distance from it to the nearest data points on either side, which are called support vectors, is maximized. By applying a kernel function, the SVM can be extended to non-linearly separable data, which the kernel maps so that they become linearly separable (Muller et al., 2001). In this work, we use the linear kernel, polynomial kernels of orders 1, 2 and 3, and the radial basis function kernel. Similar kernel techniques are used in the wavelet SVM (Shen et al., 2010).
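The three kernel families named above can be written down in a few lines. The sketch below is illustrative only: the function names and the default values of `coef0` and `gamma` are assumptions, not parameters reported in the paper, and the code computes kernel values and a Gram matrix rather than training an SVM.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Linear kernel: the plain inner product."""
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel of the given order; coef0 is an assumed offset."""
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel; gamma is an assumed width parameter."""
    squared_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_dist)


def gram_matrix(points, kernel):
    """Kernel value for every pair of points, as a list of rows."""
    return [[kernel(p, q) for q in points] for p in points]


if __name__ == "__main__":
    x, y = (1.0, 2.0), (3.0, 4.0)
    print(linear_kernel(x, y))
    for order in (1, 2, 3):  # the polynomial orders mentioned in the text
        print(order, polynomial_kernel(x, y, degree=order))
    print(rbf_kernel(x, y))
```

An SVM solver only ever needs these pairwise kernel values, which is why swapping the kernel changes the decision boundary without changing the training procedure.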

System Architecture
A mammogram is taken as input and passed to the preprocessing phase for filtering. Pre-processing is an unavoidable step in low-level image processing. Noise present in the image can be filtered out with various filtering techniques. A low-pass filter passes slow changes in gray level and attenuates rapid ones, whereas a high-pass filter does the opposite; applying a low-pass filter therefore smooths pixel values and removes sharp edges. The median filter is regarded as one of the best low-pass filters. For each pixel, the filter considers a neighbourhood of size 3x3, 5x5, 7x7, etc., and collects all the pixel values in it into an element array. The element array is sorted in ascending order, here with the well-known bubble sort, and the median is taken as the middle element of the sorted array. The median values calculated for all pixels are written to an output image array (Gonzalez and Woods, 2007). The complete output image is obtained by repeating the median filter process for every pixel.
Initially, a weighted SVM was executed, which made it tractable to classify the data set and to address disease subtype identification. Zheng et al. (2014) noted that when the same data mining algorithm is applied to the same data set, the output may differ. Based on an SVM classifier, three approaches, namely the genetic algorithm (GA), ant colony optimization (ACO) and particle swarm optimization (PSO), were used to select the most important features in the data set for training the classification model. Sridevi and Murugan (2014) noted that rough set theory is often applied to feature reduction using the data alone, requiring no additional information, and is widely used as a classification tool in data mining. In their approach, the k-means clustering algorithm is applied to partition the given information system, and rough set theory is then applied to the data set to generate a feature subset. Classification is then performed with an SVM using the remaining features. The Wisconsin Breast Cancer datasets from the UCI machine learning repository are used to test the proposed hybrid model, whose success rate is reported as 99%.
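The median filtering used in the pre-processing phase of this section can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as a list of rows; the function names are hypothetical, and the border handling (using only the part of the window that lies inside the image, and taking the upper middle element when the window has an even number of pixels) is an assumption the text does not specify.

```python
def bubble_sort(values):
    """Return a new list sorted in ascending order using bubble sort."""
    items = list(values)
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return items


def median_filter(image, size=3):
    """Replace every pixel by the median of its size x size neighbourhood."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Assumption: at the borders, use only the pixels inside the image.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            ordered = bubble_sort(window)
            output[r][c] = ordered[len(ordered) // 2]  # middle element
    return output


if __name__ == "__main__":
    # Hypothetical 5x5 patch with a single noisy pixel in the centre.
    patch = [[10] * 5 for _ in range(5)]
    patch[2][2] = 255
    for row in median_filter(patch, size=3):
        print(row)
```

Bubble sort is kept here only to mirror the description; for the small windows involved, any sorting routine yields the same median.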

Algorithms used
a. K-means Algorithm
The k-means algorithm operates on a set of d-dimensional vectors, D = {xi | i = 1, . . . , N}, where xi denotes the ith data point. The algorithm is initialized by choosing k points as the initial k cluster representatives, or "centroids". Techniques for selecting these initial values include sampling at random from the dataset, setting them to the solution of clustering a small subset of the data, or perturbing the global mean of the data k times.
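Two of these initialisation options, random sampling and perturbing the global mean, can be sketched briefly. The function names, the Gaussian form of the perturbation and the `scale` value are assumptions chosen for illustration; the third option (clustering a small subset first) is omitted because it requires a full clustering run.

```python
import random


def init_random_sample(points, k, rng):
    """Pick k distinct data points as the initial centroids."""
    return rng.sample(points, k)


def init_perturbed_mean(points, k, rng, scale=0.1):
    """Perturb the global mean k times; Gaussian noise is an assumed choice."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [tuple(m + rng.gauss(0.0, scale) for m in mean) for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 2-D data points.
    data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    print(init_random_sample(data, 2, rng))
    print(init_perturbed_mean(data, 2, rng))
```

Because k-means converges to a local optimum, the choice of initial centroids can change the final clusters, so it is common to try several initialisations.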
The algorithm then iterates between the data assignment step and the relocation of the means until convergence.
Figure 1. System Architecture
At the end of the pre-processing phase, all processed data are fed into the first algorithm (i.e. the k-means algorithm), which converts the processed data into the specified number of clusters. The clustered data are then given as input to the SVM algorithm, which produces the final classified data.
In conclusion, an extensive review of various papers has been carried out, covering modified versions of the conventional k-means and SVM algorithms. The review shows that these techniques provide a novel framework for two-way classification in mammographic image analysis. From a study of the available literature, we find that two-way classification has rarely been applied to mammographic image analysis. We strongly believe that the performance of the proposed system can be scaled up and further enhanced by designing new features that are better adapted to mammograms.
This article presents a very general overview of a two-way classification architecture. It demonstrates how an abstract structure enables effective classification of breast cancer images.
Because the algorithm is simple and the reported results are encouraging, it will be implemented in future work, which is expected to lead to a real-time breast cancer diagnosis system.