TY  - JOUR
ID  - 89459
TI  - Physical Features and Vital Signs Predict Serum Albumin and Globulin Concentrations Using Machine Learning
JO  - Asian Pacific Journal of Cancer Prevention
JA  - APJCP
LA  - en
SN  - 1513-7368
AU  - Wei, Jing
AU  - Xiang, Jie
AU  - Yasin, Yousef
AU  - Barszczyk, Andrew
AU  - Wah, Deanne Tak On
AU  - Yu, Meifen
AU  - Huang, Wendy Wenyu
AU  - Feng, Zhong-Ping
AU  - Lee, Kang
AU  - Luo, Hong
AD  - The Affiliated Hospital of Hangzhou Normal University, Hangzhou Normal University. Hangzhou, Zhejiang, People’s Republic of China.
AD  - Department of Applied Psychology and Human Development, Ontario Institute for Studies in Education, University of Toronto, Toronto, Ontario, Canada.
AD  - Department of Physiology, University of Toronto, Toronto, Ontario, Canada.
Y1  - 2021
PY  - 2021
VL  - 22
IS  - 2
SP  - 333
EP  - 340
KW  - Anthropometry
KW  - blood pressure
KW  - pulse
KW  - health screening
KW  - blood biomarker prediction
DO  - 10.31557/APJCP.2021.22.2.333
N2  - Objective: Serum protein concentrations are diagnostically and prognostically valuable in cancer and other diseases, but their measurement via blood test is uncomfortable, inconvenient, and costly. This study investigates the possibility of predicting albumin, globulin, and albumin-globulin ratio from easily accessible physical characteristics (height, weight, Body Mass Index, age, gender) and vital signs (systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, pulse) using advanced machine learning techniques. Methods: We obtained albumin concentration, globulin concentration, albumin-globulin ratio and predictor information (physical characteristics, vital signs) from physical exam records of 46,951 healthy adult participants in Hangzhou, China. We trained a computational model to predict each serum protein concentration from the predictors and then evaluated the predictive accuracy of each model on an independent portion of the dataset that was not used in model training. We also determined the relative importance of each feature within the model. Results: Prediction accuracies were r=0.540 (95% CI: 0.539-0.540; Pearson r) for albumin, r=0.250 (95% CI: 0.249-0.251) for globulin, and r=0.373 (95% CI: 0.372-0.374) for albumin-globulin ratio. The most important predictive features were age (100% ± 0.0%; mean ± 95% CI of normalized importance), gender (34.4% ± 0.7%), pulse (25.6% ± 1.3%) and Body Mass Index (24.4% ± 2.3%) for albumin, pulse (83.7% ± 3.8%) for globulin, and age (99.2% ± 1.0%), gender (59.2% ± 1.7%), Body Mass Index (46.1% ± 4.2%) and height (40.0% ± 3.8%) for albumin-globulin ratio. Conclusions: Our models predicted serum protein concentrations with appreciable accuracy showing the promise of this approach. Such models could serve to augment existing tools for identifying “at-risk” individuals for follow-up with a blood test.
UR  - https://journal.waocp.org/article_89459.html
L1  - https://journal.waocp.org/article_89459_3df8746c4001a3a58a1fb699d906dabe.pdf
ER  - 