Feature Selection Methods for Optimizing Clinicopathologic Input Variables in Oral Cancer Prognosis

Abstract

The incidence of oral cancer is high among people of Indian ethnic origin in Malaysia. Clinical and pathological data are commonly used in oral cancer prognosis. However, because of time, cost and tissue limitations, the number of prognostic variables needs to be reduced. In this study, we demonstrate the use of feature selection methods to select a subset of variables that is highly predictive of oral cancer prognosis. The objective is to reduce the number of input variables and thereby identify the key clinicopathologic (input) variables for oral cancer prognosis, based on data collected in the Malaysian setting. Two feature selection methods, a genetic algorithm (wrapper approach) and Pearson’s correlation coefficient (filter approach), were implemented and compared with single-input models and a full-input model. The results showed that the reduced models obtained through feature selection produced more accurate prognoses than both the full-input model and the single-input models, with Pearson’s correlation coefficient achieving the most promising results.
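The filter approach mentioned above scores each input variable independently of any classifier. The sketch below illustrates one common form of it: rank variables by the absolute value of Pearson’s correlation with the outcome and keep the top `k`. It is a minimal sketch, assuming that every clinicopathologic variable has already been encoded numerically and that prognosis is a binary label. The function names, the variable names, the toy data and the cutoff `k` are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear information
    return sxy / math.sqrt(sxx * syy)


def pearson_filter(features, outcome, k):
    """Filter approach (hypothetical helper): rank variables by |r| with the outcome, keep the top k."""
    scores = {name: abs(pearson_r(values, outcome)) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical, not from the study): three encoded variables, binary outcome
features = {
    "var_a": [1, 2, 5, 6, 7, 2],
    "var_b": [3, 3, 3, 3, 3, 3],
    "var_c": [5, 1, 4, 2, 3, 6],
}
outcome = [0, 0, 1, 1, 1, 0]

selected, scores = pearson_filter(features, outcome, k=2)
print(selected)  # ['var_a', 'var_c']
```

Because the score ignores the downstream classifier, a filter like this is cheap to compute; a wrapper such as a genetic algorithm instead evaluates candidate subsets by training and testing a model, which is costlier but accounts for interactions between variables.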

Keywords