A Comparative Evaluation of Cancer Classification via TP53 Gene Mutations Using Machine Learning

Mikhail, Dina Yousif; Al-Mukhtar, Firas H; Kareem, Shahab Wahab

doi:10.31557/APJCP.2022.23.7.2459

Document Type: Research Articles

Authors

¹ Information System Engineering Department, Technical Engineering College, Erbil Polytechnic University, Erbil, Iraq.

² Department of Information Technology & Computer science, Catholic University in Erbil, Iraq.

³ Department of Information Technology, College of Engineering and Computer Science, Lebanese French University, Erbil, Iraq.

Abstract

Objective: Cancer is one of the most devastating diseases. Cancer classification is based on identifying cancer-causing mutations in gene sequences. Although genetic analysis can predict certain types of cancer, there is currently no generally effective method for predicting cancers. The purpose of this paper is therefore to predict cancer types and to evaluate a data mining approach that uses two different machine learning algorithms to classify cancer. Moreover, earlier detection of a mutated tumor protein P53 (TP53) gene can help guide the choice of treatment and gene therapy techniques.

Methods: The Universal Mutation Database (UMD-2010) is used to identify mutations in genes. The database, however, is very basic and is distributed in Excel format; because of these limitations, it cannot be used on its own to classify cancer. In addition, bioinformatics techniques such as pairwise alignment and BLAST are used, followed in a second stage by machine learning algorithms that use neural networks to classify cancer based on malignant mutations in the TP53 gene, using 12 of the 53 fields in the TP53 gene database. It should be noted that the UMDCell-line2010 database lacks one of these twelve fields (the gene locus field).

Results: Using MLP and SVM to train and test on a fixed set of fields, the machine learning methods were found to be an effective way to classify cancers. The relative absolute error for MLP and SVM is 83.6005% and 65.6605%, and the accuracy is 90% and 93.7%, respectively.

Conclusion: Following the training and testing stages, the mean absolute error (MAE), which is used to measure the errors, was found to be lower for SVM than for MLP. We conclude that SVM performs better than MLP because its accuracy is higher.
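The abstract compares the two classifiers by relative absolute error, mean absolute error, and accuracy, without defining them. The sketch below illustrates these metrics under the commonly used definitions (relative absolute error as total absolute error divided by the error of a baseline that always predicts the mean of the actual values). It is an assumption that the paper uses these definitions; the function names and the toy data are hypothetical and are not taken from the paper or from UMD-2010.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between predicted and actual values."""
    return mean(abs(p - a) for a, p in zip(actual, predicted))


def relative_absolute_error(actual, predicted):
    """Total absolute error as a percentage of the error made by a
    baseline that always predicts the mean of the actual values."""
    baseline = mean(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_error = sum(abs(a - baseline) for a in actual)
    return 100.0 * model_error / baseline_error


def accuracy(actual_labels, predicted_labels):
    """Percentage of instances whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return 100.0 * correct / len(actual_labels)


# Hypothetical toy data: 1.0 = malignant, 0.0 = not malignant.
actual = [1.0, 0.0, 1.0, 0.0]
# Hypothetical classifier scores in [0, 1] for the same four instances.
predicted = [0.9, 0.2, 0.4, 0.1]

# Turn scores into hard labels with an (assumed) 0.5 threshold.
predicted_labels = [1.0 if p >= 0.5 else 0.0 for p in predicted]

print("MAE:", mean_absolute_error(actual, predicted))
print("RAE (%):", relative_absolute_error(actual, predicted))
print("Accuracy (%):", accuracy(actual, predicted_labels))
```

Under these definitions, a relative absolute error below 100% means the classifier's total absolute error is smaller than that of the mean-predicting baseline, so a lower value is better. Accuracy, in contrast, counts only whether the final label is right, which is why the two measures are reported separately.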

Keywords