Modeling Age-specific Cancer Incidences Using Logistic Growth Equations: Implications for Data Collection


Large-scale secular registry and surveillance systems have been accumulating vast amounts of data that allow mathematical modeling of cancer incidence and mortality rates. Most contemporary models in this regard use time series and APC (age-period-cohort) methods and focus primarily on predicting or analyzing cancer epidemiology, with little attention paid to implications for designing cancer registry, surveillance or evaluation initiatives. This research models age-specific cancer incidence rates using logistic growth equations and explores their performance under different scenarios of data completeness, in the hope of deriving clues for reshaping relevant data collection. The study used the China Cancer Registry Report 2012 as the data source. It employed 3-parameter logistic growth equations to model the age-specific incidence rates of all cancers and of the top 10 cancers presented in the registry report. The study performed 3 types of modeling, namely full age-span fitting, multiple 5-year-segment fitting and single-segment fitting. Model performance was measured using an adjusted goodness of fit that combines the sum of squared residuals and relative errors. Both model simulation and performance evaluation used self-developed algorithms programmed in C# with MS Visual Studio 2008. For models built upon full age-span data, predicted age-specific cancer incidence rates fitted very well with observed values for most cancers (cervical and breast cancer being the exceptions), with estimated goodness of fit (Rs) over 0.96. For any given cancer, the R value of the logistic growth model derived from observed data for urban residents was greater than or at least equal to that of the same model built on data from rural residents. For models based on multiple-5-year-segment data, the Rs remained fairly high (over 0.89) until three-fourths of the data segments were excluded.
For models using a fixed-length single segment of observed data, the older the age covered by the corresponding data segment, the higher the resulting Rs. Logistic growth models describe age-specific incidence rates very well for most cancers and may be used to inform data collection for purposes of monitoring and analyzing cancer epidemics. With the help of appropriate logistic growth equations, the workload of contemporary data collection, e.g., cancer registry and surveillance systems, may be reduced substantially.
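To make the fitting idea concrete, the following is a minimal Python sketch, not the authors' C# implementation. It rests on several assumptions that the abstract does not confirm: the 3-parameter logistic curve is taken to be K / (1 + exp(-r (age - a0))), the names `K`, `r`, `a0`, `fit_logistic` and `r_squared` are hypothetical, the ordinary coefficient of determination stands in for the study's adjusted goodness of fit (whose formula is not given here), and the incidence values are synthetic numbers generated inside the script rather than registry data. The fit itself is a crude grid search chosen for transparency.

```python
import math

def logistic(age, K, r, a0):
    """Assumed 3-parameter logistic curve: K / (1 + exp(-r * (age - a0)))."""
    return K / (1.0 + math.exp(-r * (age - a0)))

def fit_logistic(ages, rates, r_grid, a0_grid):
    """Grid search over (r, a0); for each pair, K has a closed-form
    least-squares solution because the curve is linear in K."""
    best = None
    for r in r_grid:
        for a0 in a0_grid:
            g = [logistic(a, 1.0, r, a0) for a in ages]
            K = sum(y * gi for y, gi in zip(rates, g)) / sum(gi * gi for gi in g)
            ssr = sum((y - K * gi) ** 2 for y, gi in zip(rates, g))
            if best is None or ssr < best[0]:
                best = (ssr, K, r, a0)
    return best[1], best[2], best[3]

def r_squared(rates, fitted):
    """Ordinary R^2, used here only as a stand-in for the adjusted measure."""
    mean = sum(rates) / len(rates)
    ss_tot = sum((y - mean) ** 2 for y in rates)
    ss_res = sum((y - f) ** 2 for y, f in zip(rates, fitted))
    return 1.0 - ss_res / ss_tot

# Synthetic example: midpoints of seventeen 5-year age groups (2.5, ..., 82.5)
# and rates drawn from a known curve with a deterministic +/-2% perturbation.
ages = [2.5 + 5 * i for i in range(17)]
rates = [logistic(a, 1500.0, 0.10, 65.0) * (1 + 0.02 * (-1) ** i)
         for i, a in enumerate(ages)]

r_grid = [round(0.02 + 0.01 * i, 2) for i in range(19)]   # 0.02 ... 0.20
a0_grid = [40.0 + i for i in range(41)]                   # 40 ... 80

# Full age-span fit.
K, r, a0 = fit_logistic(ages, rates, r_grid, a0_grid)
r2_full = r_squared(rates, [logistic(a, K, r, a0) for a in ages])

# Fit on every other 5-year segment, then score against all segments.
K_s, r_s, a0_s = fit_logistic(ages[::2], rates[::2], r_grid, a0_grid)
r2_subset = r_squared(rates, [logistic(a, K_s, r_s, a0_s) for a in ages])

print("full-span fit:", K, r, a0, r2_full)
print("half-segment fit:", K_s, r_s, a0_s, r2_subset)
```

Comparing `r2_full` with `r2_subset` mirrors, in miniature, the question the study asks of the multiple-segment scenario: how much fit quality is lost when only part of the age-specific data is collected. A production analysis would more likely use a proper nonlinear least-squares routine than a grid search.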