Application of Data Mining Techniques to Explore Predictors of HCC in Egyptian Patients with HCV-related Chronic Liver Disease


Background: Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis that can explore very large volumes of information to discover hidden patterns and relationships. Our aim was to develop a non-invasive algorithm for the prediction of HCC. Such an algorithm should be economical, reliable, easy to apply, and acceptable to domain experts.
Methods: This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV)-related chronic liver disease (CLD): 135 patients with HCC, 116 cirrhotic patients without HCC, and 64 patients with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC.
Results: The decision tree algorithm was able to predict HCC with a recall (sensitivity) of 83.5% and a precision (specificity) of 83.3% using only routine data. Of the 315 instances, 259 (82.2%) were correctly classified and 56 (17.8%) were incorrectly classified. Out of 29 attributes, serum alpha-fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml, was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST >64 U/L, and ascites were variables associated with HCC.
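The performance measures quoted above can be made concrete with a short sketch. The function name and the confusion-matrix counts below are illustrative assumptions only; the abstract does not report the per-class counts, so the example numbers are not study data. The sketch shows how recall (sensitivity), precision, and specificity are each computed from a two-class confusion matrix, and checks the reported accuracy arithmetic (259 of 315).

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics from two-class confusion-matrix counts.

    tp/fn: HCC cases predicted as HCC / as non-HCC
    fp/tn: non-HCC cases predicted as HCC / as non-HCC
    """
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn),        # sensitivity: share of true HCC detected
        "precision": tp / (tp + fp),     # share of HCC predictions that are correct
        "specificity": tn / (tn + fp),   # share of non-HCC correctly ruled out
        "accuracy": (tp + tn) / total,   # share of all instances classified correctly
    }


# Hypothetical counts, chosen only to show that the measures can differ.
example = classification_metrics(tp=45, fp=15, fn=5, tn=35)

# Accuracy as reported in the Results: 259 correct out of 315 instances.
reported_accuracy = 259 / 315
reported_error = 56 / 315

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.3f}")
    print(f"reported accuracy: {reported_accuracy:.1%}")
```

Precision and specificity are distinct quantities: precision depends on the false positives relative to all positive predictions, whereas specificity depends on them relative to all true negatives.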
Conclusion: Data mining analysis allows the discovery of hidden patterns and enables the development of models to predict HCC, utilizing routine data as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP (≥50.3 ng/ml). A score of >2 risk variables (out of 5) can successfully predict HCC with a sensitivity of 96% and a specificity of 82%.
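The five-variable score described above can be sketched as a simple count. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract does not say how the variables are weighted, so the sketch assumes each contributes one point, and the `Patient` class, its field names, and both function names are hypothetical. The cutoffs (AFP ≥50.3 ng/ml, AST >64 U/L) and the >2 threshold are taken from the text.

```python
from dataclasses import dataclass

AFP_CUTOFF_NG_ML = 50.3  # AFP counts as a risk variable when >= this value
AST_CUTOFF_U_L = 64.0    # AST counts as a risk variable when > this value


@dataclass
class Patient:
    """Hypothetical container for the five routine variables in the score."""
    afp_ng_ml: float
    ast_u_l: float
    is_male: bool
    has_cirrhosis: bool
    has_ascites: bool


def risk_score(p: Patient) -> int:
    """Count how many of the five risk variables are present (0 to 5).

    Assumption: equal weighting, one point per variable.
    """
    return sum([
        p.afp_ng_ml >= AFP_CUTOFF_NG_ML,
        p.is_male,
        p.has_cirrhosis,
        p.ast_u_l > AST_CUTOFF_U_L,
        p.has_ascites,
    ])


def predicts_hcc(p: Patient, threshold: int = 2) -> bool:
    """Flag a patient when the score exceeds the threshold (score > 2)."""
    return risk_score(p) > threshold


if __name__ == "__main__":
    patient = Patient(afp_ng_ml=120.0, ast_u_l=80.0, is_male=True,
                      has_cirrhosis=True, has_ascites=False)
    print(risk_score(patient), predicts_hcc(patient))
```

Under this reading, a patient with exactly two risk variables is not flagged, since the stated rule is a score strictly greater than 2.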