Prediction of Lung Cancer Based on Serum Biomarkers by Gene Expression Programming Methods

Abstract

In the diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcinoembryonic antigen (CEA), neuron-specific enolase (NSE) and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study, lung tumors were classified on the basis of these biomarkers (measured in 120 NSCLC and 60 SCLC patients) by building optimal joint biomarker models with a powerful computational tool, gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It focuses specifically on relationships between variables in data sets and builds models to explain these relationships, and it has been used successfully in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, and CRP, LDH and Cyfra21-1 also have significant meaning in lung cancer. On the basis of CEA and NSE, we therefore set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH) and GEP3 (CEA, NSE, CRP). The best classification result was obtained with GEP1, which combines CEA, NSE and Cyfra21-1: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, giving accuracy rates of 94.8% and 88.9%, respectively. With GEP2, accuracy decreased significantly, by 1.5% and 6.6% in the training and test sets; with GEP3, it decreased by 0.82% and 4.45%, respectively. Serum Cyfra21-1 is a useful and sensitive biomarker for discriminating between NSCLC and SCLC. GEP modeling is a promising tool for the diagnosis of lung cancer.
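
The accuracy rates reported for the best model follow directly from the counts of correctly classified subjects. The short Python sketch below reproduces that arithmetic; the helper name `accuracy` is ours and purely illustrative, and the code is not part of the GEP modeling itself.

```python
def accuracy(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts reported for the CEA + NSE + Cyfra21-1 model.
train_acc = accuracy(128, 135)  # training set
test_acc = accuracy(40, 45)     # test set

print(f"training accuracy: {train_acc:.1f}%")  # 94.8%
print(f"test accuracy: {test_acc:.1f}%")       # 88.9%
```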

Keywords