Estimating Risk of Breast Cancer Occurrences at Different Ages: Application of Survival Techniques
N Rajbongshi 1, D C Nath 2, L B Mahanta 1 *

Background: Awareness is the primary means of controlling breast cancer occurrence. The purpose of the present work is to study the risk of breast cancer occurrence in different age groups in the study area, Assam, India, by means of survival analysis techniques. Methods: Survival and hazard functions are key concepts in survival analysis for describing the distribution of event times. In the present research a new individualized model is proposed for the cumulative hazard function, taking the gamma distribution as the probability distribution of breast cancer occurrences. The Kaplan-Meier survival method was applied to find the probability of disease occurrence in the early menarche and late menarche groups. The data used for implementation were collected from the Record Department of a prime local cancer institute for the period 2010-2012. Information on the risk factor age at menarche was collected from patients registered from August 2011 to February 2012. Results: The study reveals that, in the study area, the cumulative hazard of women aged 35 to 50 years is higher than that of younger and older women. The cumulative hazard plots with shape parameter 0.5, 1 and 10 show that the cumulative risk for younger women is greater than for older women, but when the shape parameter is increased beyond 10, the opposite trend is observed. Further, the median age of disease occurrence is 52 years in the early menarche group and 54 years in the late menarche group. Conclusion: The model developed could successfully identify the age group of women at higher risk of breast cancer occurrence. Additionally, an important risk factor, age at menarche, was effectively applied to supplement this calculation. It is hoped that practical use of this method will enhance not only awareness but also early detection of the disease.


Introduction
Cancer is a complex disease and a long-standing, pressing health problem. It was once assumed to affect mainly prosperous nations, but it is now known to be a problem in developing as well as developed countries. In the case of breast cancer, however, the trend differs slightly between these two categories of nations. While breast cancer remains the leading cause of incident cancer cases among women globally, incidence rates have been stable or declining since the early 2000s in developed countries. In developing countries the reverse is true: incidence rates are lower but rising faster than in developed countries. New cases of virtually all types of cancer are rising in countries across the globe, regardless of income. Breast cancer remained the leading cause of incident cancer cases for women between 1990 and 2013, and the number of new cases doubled during this period (Healthdata; Scienceblog). Breast cancer is the most prevalent kind of cancer among women, and its standardized incidence rates are 86.4, 27.3, 38.9 and 26 in developed countries, developing countries, the entire world and Asia, respectively (Ahmad et al. 2015).
Dr R Ranga Rao, Head of Department and Director-Oncology, Max Super Speciality Hospital, India, mentioned in a report that there has also been a noticeable change in the pattern of the disease in India. Until a few decades ago, breast cancer was observed to affect mainly women above 50 years of age: according to the report, 65 to 70 percent of patients were above 50 years of age and 30 to 35 percent were below 50. The report also mentioned, however, that at present 50 percent of all cases are between 20 and 25 years of age, and that 70 percent of cases present at an advanced stage, with poor survival and high mortality (Timesofindia). In another study by the authors, it was reported that the largest share of breast cancer patients (18.6%) belonged to the 38-43 years age group (Rajbongshi et al., 2016).
Many studies have reported that personalized or model-based approaches to breast cancer screening, although each has its own benefits and harms, may make a vital difference to early detection (Onega et al., 2014; Vilaprinyo et al., 2014). Many authors (Gail et al., 1989; Benichou et al., 1996; Tyrer et al., 2004; Shaik et al., 2015; Usman et al., 2014; Armero et al., 2016) have developed risk models employing different risk factors associated with the disease to estimate an individual's risk of developing breast cancer. However, a review of these models questioned their utility because of their low discriminatory power (Anothaisintawee et al., 2012). Baghestani et al. (2015) employed the Weibull parametric model to evaluate the prognostic factors for survival of patients with breast cancer.
In the present study, a new basic model applying survival analysis techniques has been developed to estimate the risk of female breast cancer occurrence using the age factor of a group of women, based on an earlier study done in this region (Rajbongshi et al., 2016). Survival analysis is generally defined as a set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest. The event can be death, occurrence of a disease, marriage, divorce, etc. The time to event, or survival time, can be measured in days, weeks, years, etc. For example, if the event of interest is a heart attack, then the survival time can be the time in years until a person develops a heart attack.
Apart from age, it has been well reported in different regions of the world that age at menarche is also an important risk factor for the disease (Hsieh et al., 1990; Rautalahti et al., 1993; Meshram et al., 2009; Bhadoria et al., 2013). A study by the authors (Rajbongshi et al., 2015) supported the importance of reproductive factors, mainly age at first childbirth, number of childbirths and age at menarche, as breast cancer risk factors in the region where the study was conducted.
A study by Wyshak and Frisch (1982) pointed out that the age at menarche declined from the 19th to the 20th century; a decline of about 0.3 years per decade could be calculated from Norwegian and Finnish data. This decrease can be attributed to many factors, including genetic and environmental differences, as discussed by Zacharias and Wurtman (1969). The study region, Assam, India, appears to support this finding to some extent. Assam is located in the North-East region of India and, as reported by Rajbongshi et al. (2015), is a confluence of ethnic tribes and races, which calls for a study of the related risk factors. An early study reported the average age at menarche in Assam as 13.21 ± 0.11 years (Foll, 1954). Subsequent studies reported mean ages at menarche of 12.95 years among the Singphou (Kar and Mahanta, 1975), 12.60 years among the Ahom (Sengupta, 1982), 13.23 years among the Brahmin (Das et al., 1984), 13.06 years among the Turung (Das, 1985), 12.80 years among the Khamyang (Das, 1985), 12.76 years among the Kaibarta (Das and Barua, 1999) and 12.78 years among the Maiteis (Das and Barua, 1999). More recently, Sharma and Dutta (2016) reported a mean age at menarche of 11.80 ± 0.926 years among Kalita girls.
Since breast cancer is the most frequently diagnosed cancer among women, estimating relative risk and incidence trends at the regional level is helpful to policymakers and in the screening process (Jafari Koshki et al., 2014). Hence, the objectives considered in developing the new individualized model were: a) to estimate the cumulative risk of breast cancer from the present age of a group of women up to age 70; and b) to find the probability of breast cancer occurrence in the early menarche and late menarche groups of the female population.

Study design and data collection
To address the first objective, secondary data for the period 2010-2012 were used to implement the estimated hazard function. These data were collected from the Hospital Record Department of Dr. B. Borooah Cancer Research Institute, Guwahati, the prime regional cancer care and research institute of Assam and the main source of cancer data for the whole of North East India. Case files of clinically diagnosed breast cancer patients were reviewed and the relevant information was extracted. Information on age at menarche was obtained using a matched case-control study design. In a matched case-control study, controls (without the disease) are matched to cases (with the disease) on one or more attributes to eliminate the effect of confounding factors. Confounding occurs when the relationship between an exposure and the disease is attributable, partly or wholly, to the effect of another risk factor, i.e. the confounder. The details are explained below:

Cases
Data were collected from August 2011 to February 2012 at the Outpatient Department of the same institute. We included newly diagnosed female breast cancer patients (both married and unmarried), confirmed by histopathology during this period and resident in the districts of Assam. A total of 158 cases were diagnosed during the period, of which 58 were excluded for reasons such as incomplete information, male sex or residence outside Assam. Hence, 100 cases were included in the study.

Controls
We took age at diagnosis (±5 years), family income and place of residence as the criteria for matching controls to cases, thereby eliminating the effect of these factors, as all three are well-established risk factors of breast cancer. The present study focused on risk factors influenced by ethnicity and by diversity in customs and culture. Genetic factors were not used as matching criteria because we were not permitted to collect clinical data on the breast cancer cases. The matching ratio was 1:1.
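The text does not describe how individual controls were paired with cases. As a minimal sketch, assuming a greedy rule in which each case takes the closest-in-age unused control that satisfies all three criteria, 1:1 matching on age (±5 years), family income and place of residence could look as follows. The `Person` record, its field names and the categorical codings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; field names and codings are illustrative only.
    pid: str
    age: int
    income_band: str  # e.g. "low" / "middle" / "high" (assumed coding)
    residence: str    # e.g. "urban" / "rural" (assumed coding)


def is_eligible(case: Person, control: Person, age_window: int = 5) -> bool:
    """Control matches the case on age (within +/- age_window years), income and residence."""
    return (abs(case.age - control.age) <= age_window
            and case.income_band == control.income_band
            and case.residence == control.residence)


def match_one_to_one(cases: List[Person], pool: List[Person],
                     age_window: int = 5) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching: each case takes the closest-in-age unused eligible control."""
    used = set()
    pairs: Dict[str, Optional[str]] = {}
    for case in cases:
        candidates = [c for c in pool
                      if c.pid not in used and is_eligible(case, c, age_window)]
        best = min(candidates, key=lambda c: abs(c.age - case.age), default=None)
        if best is not None:
            used.add(best.pid)  # a control can be matched to at most one case
        pairs[case.pid] = best.pid if best is not None else None
    return pairs


cases = [Person("case1", 40, "low", "urban"), Person("case2", 42, "low", "urban")]
pool = [Person("ctl1", 41, "low", "urban"),
        Person("ctl2", 46, "low", "urban"),
        Person("ctl3", 40, "high", "urban")]
print(match_one_to_one(cases, pool))  # {'case1': 'ctl1', 'case2': 'ctl2'}
```

A greedy rule is order-dependent; an optimal assignment that minimizes the total age difference would be an alternative design choice.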
The data for 50 of the controls were collected from respondents who came to the preventive oncology department of the institute for breast cancer screening but were not diagnosed with breast cancer. Information for the remaining 50 controls was collected from members of the community with the same economic status.
Collection of the data and preparation of the structured questionnaire were done under the guidance of an oncologist. The questionnaire was pre-tested on 10 other patients, who were not included in the later implementation of the study, to check whether it would provide the valid and reliable information required (Singh, 1998).

Formulation of the model
We assume that T = age at which the disease occurs, with probability density function f(t) and cumulative distribution function F(t) = Pr{T ≤ t}, giving the probability that the event of interest has occurred by duration t. The complement of the distribution function, S(t) = Pr{T > t} = 1 - F(t), gives the probability that the event of interest has not occurred by duration t.
The hazard function is defined as
$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr\{t \le T < t + dt \mid T \ge t\}}{dt} = \frac{f(t)}{S(t)} \qquad (1)$$
That is, the rate of occurrence of the event at duration t equals the density of the event at t (in our case, the age at diagnosis of breast cancer) divided by the probability of surviving to that duration without experiencing the event. Since f(t) = -dS(t)/dt, integrating equation (1) from 0 to t gives the probability of surviving (in our case, of remaining disease free) to duration t as a function of the hazard at all durations up to t:
$$S(t) = \exp\{-\Lambda(t)\},$$
where Λ(t), the sum of the risks a woman faces from duration 0 to t, is given by
$$\Lambda(t) = \int_0^{t} \lambda(x)\,dx = \int_0^{t_p} \lambda(x)\,dx + \int_{t_p}^{t} \lambda(x)\,dx. \qquad (2)$$
In the present study t_p is the present age of a woman, and we have to find the cumulative hazard from the present age up to age t. Therefore we drop the first integral in equation (2), which covers the time from birth to the present age, and take $\Lambda(t) = \int_{t_p}^{t} \lambda(x)\,dx$. Substituting λ(x) = f(x)/S(x) from equation (1),
$$\Lambda(t) = \int_{t_p}^{t} \frac{f(x)}{S(x)}\,dx. \qquad (3)$$
It was observed in an earlier study that, in the study area, the age at occurrence of breast cancer (the event of interest) follows a gamma probability distribution (Rajbongshi et al., 2016). We therefore take the gamma probability density function as f(t):
$$f(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-\lambda t}, \qquad t > 0,$$
characterized by two parameters, the shape α and the scale λ. In this form λ acts as a rate: the mean is α/λ and the variance is α/λ², so that λ = mean/variance. A large λ therefore indicates a mean that is large relative to the variance, while λ < 1 indicates that the variance is larger than the mean. (The gamma parameter λ should not be confused with the hazard function λ(x).) The survival function is then
$$S(x) = \int_{x}^{\infty} f(u)\,du = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_{x}^{\infty} u^{\alpha-1} e^{-\lambda u}\,du. \qquad (4)$$
Applying the Leibniz integration rule to equation (4) we get
$$\frac{dS(x)}{dx} = -f(x). \qquad (5)$$
Now from equations (3), (4) and (5) we get
$$\Lambda(t) = -\int_{t_p}^{t} \frac{S'(x)}{S(x)}\,dx. \qquad (6)$$
Let
$$I = \int_{t_p}^{t} \frac{S'(x)}{S(x)}\,dx = \ln S(t) - \ln S(t_p).$$
From equation (6), Λ(t) = -I, so that
$$\Lambda(t) = \ln\frac{S(t_p)}{S(t)} = \ln\frac{\int_{t_p}^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx}{\int_{t}^{\infty} x^{\alpha-1} e^{-\lambda x}\,dx}. \qquad (7)$$
The Kaplan-Meier survival method was applied to address the second objective. It may be used to estimate the probability of breast cancer occurrence in the early and late menarche groups. Here, occurrence of breast cancer at a particular age is considered as the event of interest. That is, our event of interest is "disease occurrence" and the waiting time is the disease-free period.
The Kaplan-Meier method is a non-parametric method for describing time-to-event data. Santos Silva (1999) pointed out that when the exact time of death (or of the event of interest) is known, the survival probability can be calculated immediately after each individual event, without grouping the data into intervals of one year or of any other length. This is the preferred approach whenever exact event and censoring times are available (Estève et al., 1994). In this study, the Kaplan-Meier method was applied using SPSS (version 18).

Results
As explained above, a total of 100 pairs of cases and controls were collected and considered in the study, and the analysis was based on the information given by these respondents. Attempts were made to fill in missing information for the cases by calling the patients by phone or by personally visiting those who were accessible. Records of 58 patients could not be completed in this process and were hence excluded.
In an earlier study (Rajbongshi et al., 2016), it was shown that the age distribution of breast cancer occurrences (the event of interest) in the study area gives a good fit to the gamma distribution with shape and scale as the two parameters. The shape parameter describes the form of the probability curve. As derived in that study, the square root of the estimated shape parameter equals the mean of the non-zero observations divided by the standard deviation. The estimated values were 16.5 for the shape parameter and 0.3727 for the scale parameter. It is known from the properties of the gamma distribution that when the shape parameter exceeds 1, the distribution takes a unimodal but skewed shape, and that the skewness decreases as the shape parameter increases. The earlier study (Rajbongshi et al., 2016) also observed that the mean age at breast cancer occurrence in the region was 44.5 years, with a standard deviation of 10.9 years. Since the mean is greater than the standard deviation, the shape parameter, (mean/SD)², is greater than 1, so the probability curve takes a unimodal, skewed form.
To estimate the scale parameter in the same study, the relation that the mean of the non-zero observations equals the shape parameter divided by the scale parameter was used; the variance of the gamma distribution equals the shape parameter divided by the square of the scale parameter. Here the mean (44.5 years) is smaller than the variance (119 years²), which indicates considerable variability in the age at disease occurrence.
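The moment relations above (mean = α/λ, variance = α/λ², hence √α = mean/SD and λ = mean/variance) can be checked with a short method-of-moments sketch. Plugging in the summary statistics quoted in the text (mean 44.5 years, SD 10.9 years) gives values close to, but not identical with, the reported estimates α = 16.5 and λ = 0.3727, which come from the earlier study's own fit. The function name is hypothetical.

```python
import math
from typing import Tuple


def gamma_moment_estimates(mean: float, sd: float) -> Tuple[float, float]:
    """Method-of-moments gamma fit in rate form: alpha = (mean/sd)**2, lam = mean/sd**2."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    alpha = (mean / sd) ** 2  # so sqrt(alpha) = mean / sd; alpha > 1 whenever mean > sd
    lam = mean / sd ** 2      # the "scale" parameter of the text, acting as a rate
    return alpha, lam


alpha, lam = gamma_moment_estimates(44.5, 10.9)
skewness = 2.0 / math.sqrt(alpha)  # gamma skewness depends on the shape alone
print(f"alpha ~ {alpha:.2f}, lambda ~ {lam:.4f}, skewness ~ {skewness:.2f}")
```

Because the skewness of a gamma distribution is 2/√α, a shape near 16.5 implies only mild right skew, consistent with the statement that skewness falls as the shape parameter grows.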
In the present study, the estimated parameters were used to compute the cumulative hazard from equation (7). Further, the changing behavior of the cumulative hazard was examined for different values of the shape parameter, since the shape of the gamma probability curve changes with the shape parameter. Table 1 shows values of the cumulative hazard Λ(t), i.e. the cumulative risk of a woman from her present age up to age 70, taking t = 70 years, α = 16.5 and λ = 0.3727.
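Next, we applied the same method for different values of the shape parameter while keeping the scale parameter constant, as shown in Table 2. Figure 1 depicts the cumulative hazard for shape parameter values 0.5, 1 and 10, for which the cumulative risk of younger women is greater than that of older women; the plot shows the opposite trend for values of α above 10.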
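Equation (7) can be evaluated numerically once S(x) is available. The sketch below assumes the rate form of the gamma density (mean α/λ) and computes S(x) as the regularized upper incomplete gamma function Q(α, λx), using a power series for small arguments and a continued fraction (modified Lentz method) otherwise, a standard textbook scheme; `scipy.special.gammaincc(alpha, lam * x)` returns the same quantity if SciPy is available. The function names are hypothetical, and the sketch is not claimed to reproduce the values in Tables 1 and 2.

```python
import math

# Values reported in the text: shape alpha and the parameter lambda that the text
# calls "scale". It is used as a rate here, so mean = alpha/lam, var = alpha/lam**2.
ALPHA = 16.5
LAM = 0.3727


def gamma_survival(t: float, alpha: float = ALPHA, lam: float = LAM) -> float:
    """S(t) = Q(alpha, lam * t), the regularized upper incomplete gamma function."""
    x = lam * t
    if x <= 0:
        return 1.0
    log_prefactor = alpha * math.log(x) - x - math.lgamma(alpha)
    if x < alpha + 1.0:
        # Power series for the lower function P(alpha, x); then Q = 1 - P.
        term = total = 1.0 / alpha
        n = 1
        while term > 1e-16 * total:
            term *= x / (alpha + n)
            total += term
            n += 1
        return 1.0 - math.exp(log_prefactor) * total
    # Continued fraction for Q(alpha, x) (modified Lentz); avoids 1 - P cancellation.
    tiny = 1e-300
    b = x + 1.0 - alpha
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - alpha)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def cumulative_hazard(tp: float, t: float = 70.0,
                      alpha: float = ALPHA, lam: float = LAM) -> float:
    """Equation (7): hazard accumulated from present age tp to age t, ln S(tp) - ln S(t)."""
    if not 0.0 <= tp <= t:
        raise ValueError("present age tp must satisfy 0 <= tp <= t")
    return math.log(gamma_survival(tp, alpha, lam)) - math.log(gamma_survival(t, alpha, lam))


if __name__ == "__main__":
    # Fixed parameters, then a sweep over the shape with lambda held constant.
    for a in (ALPHA, 0.5, 1.0, 10.0, 20.0):
        row = [f"{cumulative_hazard(tp, 70.0, a):.4f}" for tp in (20, 30, 40, 50, 60)]
        print(f"alpha={a:>5}: " + "  ".join(row))
```

With α = 1 the gamma model reduces to an exponential distribution with constant hazard λ, so Λ(t) = λ(t - t_p); this gives a convenient sanity check for the implementation.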
As mentioned above, the literature on risk factors of female breast cancer consistently reports that early menarche is associated with an increased risk of breast cancer occurrence. In this paper, therefore, the estimated probability of disease occurrence at different ages in the two groups, early menarche and late menarche, was also obtained using the Kaplan-Meier survival method, as shown in Table 3.
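The Kaplan-Meier estimates in Table 3 were computed in SPSS. As a minimal sketch of the product-limit estimator itself, the code below uses a small hypothetical group (not the study data). It assumes, for illustration only, that each woman contributes either her age at diagnosis with an event indicator of 1, or her age when last observed disease free with an indicator of 0 (censored). The median age of occurrence is read off as the smallest event age at which the estimated disease-free probability falls to 0.5 or below. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def kaplan_meier(ages: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) = product over event ages t_i <= t of (1 - d_i / n_i).

    events[k] is 1 if disease occurred at ages[k], 0 if the woman was censored
    (still disease free) at that age. Returns (age, S) at each distinct event age.
    """
    if len(ages) != len(events):
        raise ValueError("ages and events must have the same length")
    exits = Counter(ages)  # everyone leaves the risk set at her recorded age
    occurrences = Counter(a for a, e in zip(ages, events) if e == 1)
    at_risk = len(ages)
    s = 1.0
    curve = []
    for age in sorted(exits):
        d = occurrences.get(age, 0)
        if d:
            # Women censored at this same age are still counted as at risk.
            s *= 1.0 - d / at_risk
            curve.append((age, s))
        at_risk -= exits[age]
    return curve


def median_age(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event age with S(t) <= 0.5; None if the curve never reaches 0.5."""
    for age, s in curve:
        if s <= 0.5:
            return age
    return None


# Hypothetical toy group (not the study data).
ages = [45, 48, 50, 52, 52, 55, 58, 60]
events = [1, 1, 0, 1, 1, 1, 0, 1]
curve = kaplan_meier(ages, events)
print(curve)
print("median age:", median_age(curve))
```

Running the same function separately on an early menarche group and a late menarche group yields the two curves to compare.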

Discussion
Elaborate discussions of disease risk in general, and of some of its properties, have been presented by Rockhill et al. (2000) and Gail et al. (2011). These studies also present applications in breast cancer counselling and prevention. Shaik et al. (2015) reported that women in the age group 30-50 years are 3.704 times more likely to have breast cancer, consistent with our finding for the age group 35-50 years. Also in close agreement with our findings, Abdollabi et al. (2016) reported a change point in age at diagnosis at 50 years. Gail et al. (1989) reported a broader age group with the highest probability (10.1 percent) of breast cancer occurrence. However, none of these studies took age at menarche into account, although it is an important risk factor for breast cancer.
Data compiled from the Population Based Cancer Registry report (PBCR, HBCR Report, 2014) of the Indian Council of Medical Research, Government of India, clearly indicate that the age-adjusted incidence rate (AAR) of breast cancer in North East (NE) India (27.81 per 100,000) is almost double that of the rest of India (13.62 per 100,000). Also, according to the reports mentioned above (which have not been studied in depth earlier and are beyond the scope of this study), the age at menarche in this region is much lower than in the rest of the country. Many different factors can cause menarche to occur late or early. Race is one of them, and the region is a confluence of many races and tribes. Further, it is well known that the earlier menarche is attained, the more menstrual cycles, in theory, occur over a lifetime. Hence, the earlier a woman starts having periods, the longer her breast tissue is exposed to the estrogen released during the menstrual cycle, and the greater her lifetime exposure to estrogen and her risk of estrogen-sensitive cancers such as breast cancer. The study of these two factors is therefore vital to understanding the incidence of the disease and to planning intervention efforts in the region.
Absolute risk models are vital for implementing a "high risk" prevention strategy, which identifies a high-risk subset of the population and focuses intervention efforts on that subset. The present study reveals that women of the region aged 35 to 50 years are at higher hazard of the disease. Further, the second part of the study finds that the probability of disease occurrence is higher in the early menarche group than in the late menarche group, and that the median age of disease occurrence is 52 years in the early menarche group and 54 years in the late menarche group.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.