How Well Have Projected Lung Cancer Rates Predicted the Actual Observed Rates?

Background: While many past studies have constructed projections of future lung cancer rates, little is known about their consistency with the corresponding observed data for the time period covered by the projections. The aim of this study was to assess the agreement between previously published lung cancer incidence and/or mortality rate projections and observed rates.
Methods: Published studies were included in the current study if they projected future lung cancer rates for at least 10 years beyond the period for which rates were used to obtain the projections, and if more recent observed rates for comparison covered a minimum of 10 years from the beginning of the projection period. Projected lung cancer incidence and/or mortality rates from these included studies were extracted from the publications. Observed rates were obtained from cancer registries or the World Health Organization’s Mortality Database. Agreement between projected and observed rates was assessed and the relative difference (RD) for each projected rate was calculated as the percentage difference between the projected and observed rates.
Results: A total of 59 projections reported in 14 studies were included. Nine studies provided projections for 20 years or more. RDs were higher for those projections in which the lung cancer rates peaked during the projection period, and RDs increased substantially with the length of the projection period. When lung cancer rates peaked during the projection period, methods incorporating smoking data were generally more successful at predicting the trend reversal than those which did not incorporate smoking data. Mean RDs for 15-year projections comparing methods with or without smoking data were 12.7% versus 48.0% for males and 8.2% versus 42.3% for females.
Conclusions: The agreement between projected and observed lung cancer rates is dependent on the trends in the observed rates and characteristics of the population, particularly trends in smoking.


Introduction
Lung cancer has been the most commonly diagnosed cancer in the world for several decades, and is the leading cause of cancer death worldwide (Ferlay et al., 2018). Reliable projections of future patterns of lung cancer incidence and mortality are therefore very important for health service planning (Bashir and Esteve, 2001). Projecting future trends in cancer incidence and mortality is always complicated, as the population's risk factor profile will change over time, and in some cases there is a significant latency period between risk factor exposure and cancer development (Bray and Moller, 2006). For lung cancer in particular, the well-documented association between tobacco smoking and cancer risk means that the accuracy of any projections is very reliant on how smoking behaviours are accounted for in the projection methods (Brown and Kessler, 1988; Shibuya et al., 2005; Luo et al., 2018). As detailed data on smoking behaviours are not always available, the selection and implementation of an appropriate projection method is complex.
A systematic review identified 101 studies published between 1st January 1988 and 14th August 2018 that used statistical methods to project lung cancer incidence or mortality rates (Yu et al., 2019). The aims of this study were to compare previously published lung cancer incidence and/or mortality rate projections to observed data that became available since their publication, and to provide insights into key factors that should be considered when selecting methods for projecting lung cancer rates.

Selection criteria
The literature search (Online resource 1) and review protocol for potentially relevant studies are described in detail in our previously published systematic review (Yu et al., 2019), and the full inclusion and exclusion criteria are summarised in Online resource 2. The results of the literature search and the process for selecting studies are described in Online resource 3. Published studies were included in the current study if they projected future lung cancer rates for at least 10 years beyond the period for which rates were used to obtain the projections, and if more recent observed rates for comparison covered a minimum of 10 years from the beginning of the projection period. We defined the 'original observation period', 'projection period', 'evaluation period' and 'observed data for evaluation' as follows (illustrated in Figure 1). For each study, the 'original observation period' is the period for which observed data were used to generate the published projections. The 'projection period' is the period covered by the projections, beyond the observed data used to build the statistical model. 'Observed data for evaluation' are the more recently released observed rates which could be compared with the projected rates. The 'evaluation period' is the period from the beginning of the projection period to the latest observed data available for this current study.
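The two 10-year inclusion thresholds above can be expressed as a simple check. The following is a minimal sketch rather than the procedure actually used in this study: the `Study` record, its field names and the convention of counting whole calendar years from the end of the original observation period are all illustrative assumptions.

```python
from dataclasses import dataclass

MIN_YEARS = 10  # threshold stated in the selection criteria


@dataclass
class Study:
    """Hypothetical record; the field names are illustrative assumptions."""
    observation_end: int   # last year of the original observation period
    projection_end: int    # last year covered by the published projections
    latest_observed: int   # last year of observed data for evaluation


def is_eligible(study: Study) -> bool:
    # Length of the projection period beyond the original observed data.
    projection_years = study.projection_end - study.observation_end
    # Length of the evaluation period, counted from the start of the
    # projection period (the year after observation_end) to the latest
    # observed data, inclusive.
    evaluation_years = study.latest_observed - study.observation_end
    return projection_years >= MIN_YEARS and evaluation_years >= MIN_YEARS
```

For example, under these assumptions a study with data to 1985 and projections to 2005 is eligible if observed data are available to at least 1995, whereas a study projecting only 5 years ahead is not.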

Data extraction
Individual projections were the unit of analysis in this study so that publications that used more than one projection method or multiple datasets for different countries contributed multiple projections. Predicted lung cancer incidence/mortality rates from the published studies, including the fitted values for the data period used for model fitting and the projected values for the period beyond the original observed data, were extracted from the publications (Online resource 4). Newly released observed data for evaluation were obtained from the World Health Organization (WHO) Mortality Database (World Health Organization, 2017), United States Cancer Statistics (SEER, 2016), NORDCAN (Engholm et al., 2017), Cancer Statistics Registrations England at the National Archives (Office for National Statistics, 2016), and the Bulgarian National Cancer Registry (Bulgarian National Cancer Registry, 2018). These observed data were age-standardised to the same standard population used in the published studies, including the WHO World standard population (Ahmad et al., 2001), Segi World standard population, European standard population, the 1970 and 2000 USA standard populations (SEER, 2018), and the 1985 Japanese standard population (Kuroishi et al., 1992). In order to summarise the differences and similarities between the methods used for projections, we applied our previously developed organisational framework to group these methods (Yu et al., 2019).
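Age-standardisation of observed rates to a chosen standard population is commonly done by direct standardisation, in which the age-specific rates are averaged using the standard population's age distribution as weights. The sketch below assumes that approach; the function name is hypothetical, and the weights in the usage note are made-up numbers, not any of the standard populations cited above.

```python
def age_standardised_rate(age_specific_rates, standard_weights):
    """Direct age-standardisation (assumed approach, illustrative only).

    age_specific_rates: rates per 100,000 for each age group.
    standard_weights:   standard population counts (or proportions)
                        for the same age groups, in the same order.
    """
    if len(age_specific_rates) != len(standard_weights):
        raise ValueError("rates and weights must cover the same age groups")
    total_weight = sum(standard_weights)
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")
    # Weighted average of the age-specific rates.
    weighted = sum(r * w for r, w in zip(age_specific_rates, standard_weights))
    return weighted / total_weight
```

With illustrative rates of 10, 20 and 40 per 100,000 and weights of 50, 30 and 20, the function returns 19.0 per 100,000. Using the same weights as the original publication is what makes the observed and projected ASRs comparable.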

Statistical analyses
The aim of this evaluation was not to test the exact agreement between the projected and observed rates. Indeed, a formal test was often not possible as it requires estimates of standard errors for the projected rates, which were not always available. Instead, two measures were used to evaluate the overall performance of each projection: assessment of the agreement in the overall trends, and the relative difference (RD) of the projected age-standardised rate (ASR) compared to the observed ASR.
The graphed projected and observed cancer rates were visually inspected to assess the agreement in the overall trends, and in particular whether the projections predicted the peak in the lung cancer rates. The peak in the cancer rate was defined as the point at which there was a significant change in the lung cancer rates from an increasing trend to a decreasing trend, as identified by the Joinpoint regression program with p<0.05 considered to denote statistical significance (Kim et al., 2000). As most of the studies used 5-year grouped data, we determined that 'lung cancer rates peaked in the original observation period' if the significant change point identified by Joinpoint regression occurred at least 5 years before the end of the original observation period.
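The 5-year rule for deciding whether rates peaked in the original observation period can be written as a small helper. This is a sketch only: it assumes the year of the significant change point has already been obtained from the Joinpoint regression program (it does not perform the regression), and the function and parameter names are hypothetical.

```python
from typing import Optional

MIN_GAP_YEARS = 5  # most studies used 5-year grouped data


def peaked_in_observation_period(change_year: Optional[int],
                                 observation_end: int) -> bool:
    """Apply the 5-year rule to a change point supplied by the caller.

    change_year:     year of the significant change from an increasing
                     to a decreasing trend, or None if none was found.
    observation_end: last year of the original observation period.
    """
    if change_year is None:
        return False
    # The change point must fall at least 5 years before the end of
    # the original observation period.
    return change_year <= observation_end - MIN_GAP_YEARS
```

For an observation period ending in 1985, a change point in 1980 satisfies the rule, while one in 1983 does not.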
The second measure (RD) compared the projected ASR to the observed data, and was defined as:
$$\mathrm{RD}_t = \frac{|E_t - O_t|}{O_t} \times 100\%$$
where $E_t$ is the projected ASR and $O_t$ is the observed ASR, and $t$ is the year of the projection beyond the original observation period. RD was calculated for each year for which projections were available for evaluation, and at the 10-year, 15-year and 20-year points in the projection period where available. The mean RD for a set of projections was calculated as the mean of the RDs for the projections in this set. The RD can be interpreted as a measure of the closeness of the observed and projected values.
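The RD and mean RD calculations can be sketched as follows. The sketch assumes that RD is the absolute percentage difference relative to the observed ASR, which is consistent with both overestimates and underestimates being reported with RD > 10%; the function names and the numbers in the usage note are illustrative, not data from the included studies.

```python
from statistics import mean


def relative_difference(projected: float, observed: float) -> float:
    """Absolute percentage difference of the projected ASR from the
    observed ASR (the absolute value is an assumption of this sketch)."""
    if observed <= 0:
        raise ValueError("observed ASR must be positive")
    return abs(projected - observed) / observed * 100.0


def direction(projected: float, observed: float) -> str:
    """Label a projection as an overestimate or an underestimate."""
    if projected > observed:
        return "overestimate"
    if projected < observed:
        return "underestimate"
    return "exact"


def mean_rd(pairs) -> float:
    """Mean RD over a set of (projected, observed) ASR pairs, e.g. all
    15-year projections that used a given type of method."""
    return mean(relative_difference(e, o) for e, o in pairs)
```

For instance, a projected ASR of 60 against an observed ASR of 50 gives an RD of 20% and is an overestimate; averaging it with a projection of 45 against 50 (RD of 10%) gives a mean RD of 15%.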

Results
A total of 14 eligible studies published between 1988 and 2008 were included, covering 18 countries or regions ranked as very high or high on the Human Development Index (HDI) (Online resource 4) (Brown and Kessler, 1988; Negri et al., 1990a; Negri et al., 1990b; Kuroishi et al., 1992; Engeland et al., 1995; Hristova et al., 1997; Kubik et al., 1998; Moller et al., 2002; Kaneko et al., 2003; O'Lorcain and Comber, 2004; Shibuya et al., 2005; Byers et al., 2006; Moller et al., 2007; Eilstein et al., 2008). Eleven studies reported projections of lung cancer mortality rates and 3 studies reported projections of lung cancer incidence rates. Nine studies provided projections for 20 years or more. Twelve studies used methods that did not incorporate smoking data and 4 studies used methods that did. Two of the 14 studies each used three methods: an age-period-cohort (APC) model with constant period effects, an APC model with linear regression on period effects, and an APC model with a priori coefficients for period effects based on smoking trends (Negri et al., 1990a; Negri et al., 1990b). Three studies reported projections for four or more countries using the same method (Studies 5, 8 and 11). Consequently, there were 30 and 29 individual projections of lung cancer rates for males and females, respectively.
In general, the RDs were higher for projections in which the lung cancer rates peaked during the projection period, and the RD increased substantially with the length of the projection period (Tables 1 and 2). The 6 projections which used a method incorporating smoking data (projections 1, 2C and 3C for males in Figure 2; projections 1, 11B and 11D for females in Figure 3), and 1 of 12 projections which did not incorporate smoking data (projection 5D for males in Figure 2) captured the change in the direction of the trend in the lung cancer rate. The RDs for projections which used a method incorporating smoking data (Tables 1 and 2, e.g. mean RDs for 15-year projections were 12.7% and 8.2% for males and females respectively) were consistently lower than those for projections which used methods which did not incorporate smoking data (Tables 1 and 2, e.g. mean RDs for 15-year projections were 48.0% and 42.3% for males and females respectively). This pattern was also demonstrated in the two studies which reported comparisons of methods which did or did not incorporate smoking data using the same lung cancer mortality data (Figures 2 and 5, studies 2A-C and 3A-C). The RDs for projections for males were higher than those for projections for females. Mean RDs were 13.6%, 25.9% and 50.7% for 10-year, 15-year, and 20-year projections for males (Table 1) and 9.6%, 11.3% and 18.3% for 10-year, 15-year, and 20-year projections for females (Table 2). In each of the three studies that reported projections for multiple countries using the same method, the RDs for 15-year projections varied between countries, with absolute differences of 12-42% (Studies 5, 8 and 11).
In fewer than one-third of all projections (18 of 59 projections: 12 for males and 6 for females) the lung cancer rates peaked during the projection period (Figures 2 and 3), and nearly all of these projections overestimated the true rate (12 of 12 for males and 5 of 6 for females).
A statistically significant peak in lung cancer rates did not occur during the projection periods for 41 projections (18 for males and 23 for females), although in 16 projections for males the rates did peak during the original observation period (11A-D, 5B, 5E, 7A, 8A-C, 8E, 9, 10, 12-14 for males in Figure 4), and in some datasets for females the rates appear to have levelled off during the projection period (11A, 11C, 5A, 5C, 10 for females in Figure 5). The peak in lung cancer rates had not occurred during the original observation period for any of the projections for females. For the 41 projections in which the peak in lung cancer rates did not occur during the projection period, most of the studies appear to have good agreement between the projected rates and the more recent observed rates (Figures 4 and 5). The RDs for this set of projections are similar, even when different methods were originally used, and are consistently lower than for projections where the lung cancer rates peaked during the projection period (Tables 1 and 2). For 15-year projections with an RD > 10%, the majority of projections for males were overestimates (7 out of 9; Table 1), while for females 80% of the projections were underestimates (8 out of 10; Table 2).

Discussion
There have been many studies published which have used various statistical methods to project lung cancer rates, and many of these studies have been frequently referenced in the literature, reflecting the potential high impact and implications for such cancer research. However, very little is known about the consistency between projected and observed lung cancer rates. The statistical methods for projecting lung cancer rates included in this evaluation ranged from simple linear regression to more complex APC models which require specific techniques and software packages. We found that the agreement between published projections and observed actual rates varied by sex and data setting and is largely dependent on whether or not the lung cancer rates peaked during the projection period rather than the original observation period. Our results showed that lung cancer projections for females generally tended to more closely resemble the observed patterns than projections for males. This is likely to be because lung cancer rates for females tended to be more stable throughout the study periods, without the sharp changes that have occurred in the lung cancer rates for males.
We found that for both males and females, almost all of the projections were overestimates when there was a significant change in the actual lung cancer rates during the projection period, and the RDs were much lower for studies that used a method which incorporated smoking data than for those that did not. Given the well-established and strong association between smoking exposure and lung cancer risk (Doll and Hill, 1950), past smoking behaviour in the population should be taken into account when performing lung cancer projections. This is particularly important if sharp changes in smoking trends have occurred (Lopez et al., 1994), since a projection method that does not incorporate smoking data may not reflect the future impact of these changes in smoking behaviour (Brown and Kessler, 1988). Two of the published studies reported projections of lung cancer mortality rates in Italy and Switzerland by comparing three methods using the same data, and their results confirmed that a method incorporating information on smoking trends more successfully predicted changes in lung cancer mortality rates (studies 2A-C and 3A-C in Figures 2 and 5). However, it should be stressed that the use of a method incorporating smoking data by no means guarantees the reliability of the projection (e.g. see projection 11C in Figure 5), possibly due to variation in the quality of the smoking data, or incomplete capture of other factors that contributed to the changes in lung cancer rates.
There is no single "best" method for projecting lung cancer rates that suits all situations, as the influences of changing risk factors, diagnostic practice and treatment are complex and very hard to predict and capture (Cancer Projections Network, 2010). Results from this study show that projections using the same method applied to different study populations still had varying degrees of success (Moller et al., 2003). For example, a large variation in RDs was observed for studies which reported projections for four or more countries using the same method (Studies 5, 8 and 11). In addition, for populations where the tobacco epidemic was fully established early on and in which the peak in lung cancer rates occurred during the original observation period, or for populations in which the peak in lung cancer rates did not occur at any time during the original observation or projection periods, some methods not incorporating smoking data also provided generally reliable projections for 10-15 years beyond the original observation period (Study 12 in Figure 4). Therefore, our study highlights the importance of selecting appropriate methods for projections based on the observation period, length of the projection period beyond the observed data, data quality and availability, and a good understanding of the tobacco epidemic in the population, as well as any other potential factors that may contribute to changes in lung cancer rates. Moreover, an appropriate validation of the selected projection model should be performed and justified whenever this is possible, as such information is useful for checking the specifications of the model and helps researchers understand its potential limitations.
This study has some limitations. As it is an evaluation of past projections in different data settings over different study periods, some of these studies are not directly comparable to each other.
A further potential limitation of this study is that the data for some of the included studies were extracted from figures using computer software. However, to ensure the reliability of this data extraction, it was independently conducted by two authors and the mean values of the two extractions were used for analyses. Also, the agreement between the two extractions was evaluated and found to be high. Finally, it is important to note that this evaluation study is limited to projections of lung cancer incidence or mortality rates only, which are strongly associated with past tobacco exposure. Therefore, the interpretation of the results may not be generalisable to projections of rates for other cancer types.
Despite these limitations this study also has many strengths. It is the first study to provide an objective assessment of previously published projections of lung cancer incidence or mortality rates using newly released observed data. Included studies were identified from a systematic review of statistical methods for projecting lung cancer rates. Furthermore, the measures for evaluation developed in this study provided an objective assessment of the agreement between newly released observed data and the published lung cancer projections. The approach developed in this study may be applicable to evaluations of other disease rate projections.
By comparing newly released cancer statistics with previously reported projected rates for different populations, it is hoped that this study can provide important information for researchers about the applicability and suitability of various methods in different data settings, so that appropriate methods can be chosen to suit the situation and projection requirements for future research.