Document Type: Research Articles
The Affiliated Hospital of Hangzhou Normal University, Hangzhou Normal University, Hangzhou, Zhejiang, People’s Republic of China.
Department of Applied Psychology and Human Development, Ontario Institute for Studies in Education, University of Toronto, Toronto, Ontario, Canada.
Department of Physiology, University of Toronto, Toronto, Ontario, Canada.
Objective: Serum protein concentrations are diagnostically and prognostically valuable in cancer and other diseases, but measuring them with a blood test is uncomfortable, inconvenient, and costly. This study investigates whether albumin, globulin, and albumin-globulin ratio can be predicted from easily accessible physical characteristics (height, weight, Body Mass Index, age, gender) and vital signs (systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, pulse) using advanced machine learning techniques.

Methods: We obtained albumin concentration, globulin concentration, albumin-globulin ratio, and predictor information (physical characteristics, vital signs) from the physical exam records of 46,951 healthy adult participants in Hangzhou, China. We trained a computational model to predict each serum protein measure from the predictors, then evaluated the predictive accuracy of each model on an independent portion of the dataset that was not used in model training. We also determined the relative importance of each feature within each model.

Results: Prediction accuracies, expressed as the Pearson correlation between predicted and measured values, were r=0.540 (95% CI: 0.539-0.540) for albumin, r=0.250 (95% CI: 0.249-0.251) for globulin, and r=0.373 (95% CI: 0.372-0.374) for albumin-globulin ratio. The most important predictive features (mean ± 95% CI of normalized importance) were age (100% ± 0.0%), gender (34.4% ± 0.7%), pulse (25.6% ± 1.3%) and Body Mass Index (24.4% ± 2.3%) for albumin; pulse (83.7% ± 3.8%) for globulin; and age (99.2% ± 1.0%), gender (59.2% ± 1.7%), Body Mass Index (46.1% ± 4.2%) and height (40.0% ± 3.8%) for albumin-globulin ratio.

Conclusions: Our models predicted serum protein concentrations with appreciable accuracy, showing the promise of this approach. Such models could serve to augment existing tools for identifying “at-risk” individuals for follow-up with a blood test.
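The evaluation described in the Methods (fit on one portion of the data, score on a held-out portion with Pearson r, then report feature importance on a normalized scale) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the abstract does not name the model, so a one-feature least-squares line stands in for it, the data are synthetic, and all function names (`pearson_r`, `split_rows`, `fit_line`, `normalize_importance`, `run_demo`) are hypothetical. Scaling importances so that the top feature equals 100% is an assumption, made because it is consistent with age being reported at 100% for albumin.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def split_rows(rows, test_fraction, seed):
    """Shuffle and split rows into (train, test); test is never used for fitting."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the unspecified model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def normalize_importance(raw):
    """Rescale raw importances so the largest equals 100 (assumed convention)."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


def run_demo(seed=0, n=2000):
    """Return held-out Pearson r on synthetic (age, albumin) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        # Synthetic relationship, invented purely for illustration.
        albumin = 50.0 - 0.1 * age + rng.gauss(0, 3)
        rows.append((age, albumin))
    train, test = split_rows(rows, test_fraction=0.2, seed=seed)
    slope, intercept = fit_line([a for a, _ in train], [y for _, y in train])
    predicted = [slope * a + intercept for a, _ in test]
    measured = [y for _, y in test]
    return pearson_r(predicted, measured)


if __name__ == "__main__":
    print("held-out Pearson r:", round(run_demo(), 3))
    print(normalize_importance({"age": 2.0, "pulse": 0.5}))
```

Scoring only on rows that were excluded from fitting is what makes the reported r an estimate of out-of-sample accuracy rather than a measure of fit to the training data.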