Utilization of a Mixture Cure Rate Model based on the Generalized Modified Weibull Distribution for the Analysis of Leukemia Patients

Acute lymphocytic leukemia (ALL) is a cancer of the blood and bone marrow in which lymphoid progenitor cells proliferate in the bone marrow, blood, and other sites. While ALL represents 80% of childhood leukemia, it is uncommon in adults (20% of cases). In the United States, the incidence of ALL is estimated at 1.64 per 100,000 persons (National Cancer Institute, 2020). According to the American Cancer Society database (2019), an estimated 5,930 new cases were diagnosed, with 1,500 deaths due to ALL in 2019. Treatment regimens for adult ALL have been adapted from pediatric protocols. However, while the cure rate tends to be 90% for standard-risk pediatric ALL, the long-term survival rate is lower in adults (Terwilliger and Abdul-Hay, 2017). The main treatment of ALL is chemotherapy, which comprises induction, intensification, and long-term maintenance, with central nervous system (CNS) prophylaxis provided at different times during the therapy. Induction therapy aims to achieve complete remission and to re-establish normal hematopoiesis. Following the induction treatment, patients underwent three cycles of consolidation treatment consisting of methotrexate with leucovorin rescue and L-asparaginase. Patients registered with high-risk disease and a matched donor then received allogeneic stem cell transplantation (allo-SCT). The rest were randomly assigned to standard intensification/maintenance or autologous bone marrow transplants.
In survival analysis, analysts generally use product-limit estimates or the log-rank test (Bradburn et al., 2003a), semi-parametric models (for instance, the Cox proportional hazards model), or standard parametric models based on well-known distributions in the presence of covariates (Cox, 1972). The Weibull distribution is widely used in cancer research (Bradburn et al., 2003b) since its risk function is flexible and its parameters are easy to estimate. However, data sets from medical studies often require more flexible parametric models. Consequently, several authors have proposed new classes of parametric distributions based on the Weibull distribution, such as the exponentiated Weibull (EW) (Mudholkar and Srivastava, 1993), the generalized modified Weibull (GMW) (Carrasco et al., 2008), the log-beta Weibull (Ortega et al., 2013), and the generalized alpha power inverse Weibull (Basheer, 2019) distributions. Another common scenario in survival data analysis, particularly in cancer research, arises when a fraction of the population is not susceptible to the event of interest. In this situation, patients are divided into two groups: those who are susceptible to the event under study, and those who are not susceptible to it and, therefore, are not at risk. The latter patients are regarded as cured or immune.
The presence of cured subjects in a sample is commonly suggested by a Kaplan-Meier curve that displays a long and stable plateau with heavy censoring at the right tail (Corbière et al., 2009). Many authors have proposed statistical methods to model the proportion of cured subjects; interested readers are referred, for instance, to Boag (1949), Berkson and Gage (1952), Haybittle (1965), Farewell (1982), Goldman (1984), Maller and Zhou (1992), Abu Bakar et al. (2008), Lu (2010), and López-Cheda et al. (2017). Moreover, the maximum likelihood estimation technique has been suggested by, among others, Farewell (1982), Yamaguchi (1992), Ghitany and Maller (1992), Peng et al. (1998), and Sy and Taylor (2000).

Leukemia data
In this paper, we consider a leukemia data set presented by Kersey et al. (1987) and available in the smcure package for R (Cai et al., 2012). This data set consists of 91 patients with high-risk ALL and is divided into two subsets: the first subset (Group 1) contains 46 patients who received allogeneic bone marrow transplants, and the second subset (Group 2) includes 45 patients who received autologous bone marrow transplants. The event of interest is death, and the response is the time to death. Overall, 24.17% of the observations are censored: 28.26% among the patients who received allogeneic bone marrow transplants and 20% among the patients who received autologous bone marrow transplants.

Mixture cure model
In the context of survival analysis, a mixture cure model assumes that the survival function for the entire population can be expressed as a mixture of the cured and the uncured patients and is given by
$$S(t) = \rho + (1-\rho)\, S_{uc}(t), \tag{1}$$
where ρ is the proportion of "cured patients" or "long-term survivors" and $S_{uc}(t)$ represents the survival function for susceptible patients (Boag, 1949). The density function for the random time T is given by
$$f(t) = (1-\rho)\, f_{uc}(t),$$
where $f_{uc}(t)$ is the probability density function for the uncured subjects.
Assume that, for each subject in a random sample of size m, we observe the pair $(t_j, d_j)$, $j = 1, \ldots, m$. Then the contribution of the j-th subject to the likelihood function is given by
$$L_j = \left[(1-\rho)\, f_{uc}(t_j)\right]^{d_j} \left[\rho + (1-\rho)\, S_{uc}(t_j)\right]^{1-d_j},$$
where $d_j$ is a censoring indicator variable defined as
$$d_j = \begin{cases} 1, & \text{if } t_j \text{ is an observed lifetime}, \\ 0, & \text{if } t_j \text{ is a censored time}. \end{cases}$$
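As a minimal sketch of how the likelihood contributions above combine into a log-likelihood, the following Python code sums the log of each subject's contribution for user-supplied functions `f_uc` and `s_uc`. The function name `mixture_cure_loglik`, the exponential stand-in distribution, and the toy data are assumptions made only for illustration; they are not the leukemia data or the distribution used in this study.

```python
import math

def mixture_cure_loglik(times, events, rho, f_uc, s_uc):
    """Log-likelihood of a mixture cure model.

    times  : observed times t_j
    events : d_j = 1 for an observed death, 0 for a censored time
    rho    : cure fraction, 0 <= rho < 1
    f_uc   : density of the uncured subjects
    s_uc   : survival function of the uncured subjects
    """
    total = 0.0
    for t, d in zip(times, events):
        if d == 1:
            # Observed death: the subject is uncured and fails at time t.
            total += math.log((1.0 - rho) * f_uc(t))
        else:
            # Censored: the subject is either cured or uncured and still alive at t.
            total += math.log(rho + (1.0 - rho) * s_uc(t))
    return total

# Stand-in latency distribution (an assumption for illustration only):
# exponential with rate 0.01 per day.
RATE = 0.01

def f_exp(t):
    return RATE * math.exp(-RATE * t)

def s_exp(t):
    return math.exp(-RATE * t)

# Toy data, not the leukemia data set.
toy_times = [30.0, 120.0, 400.0, 900.0]
toy_events = [1, 1, 0, 0]

ll_no_cure = mixture_cure_loglik(toy_times, toy_events, 0.0, f_exp, s_exp)
ll_cure = mixture_cure_loglik(toy_times, toy_events, 0.3, f_exp, s_exp)
print(ll_no_cure, ll_cure)
```

With ρ = 0 the function reduces to the usual log-likelihood for right-censored data, which is a convenient check on any implementation.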

The generalized modified Weibull distribution (GMW)
In the present study, we consider the GMW distribution for the uncured patients. The GMW distribution, proposed by Carrasco et al. (2008), is a four-parameter distribution with probability density function $f_{uc}(t)$ and survival function $S_{uc}(t)$ given, respectively, by
$$f_{uc}(t) = \frac{\varepsilon \psi\, t^{\eta-1} (\eta + \lambda t) \exp\{\lambda t - \varepsilon t^{\eta} e^{\lambda t}\}}{\left[1 - \exp\{-\varepsilon t^{\eta} e^{\lambda t}\}\right]^{1-\psi}} \tag{2}$$
and
$$S_{uc}(t) = 1 - \left[1 - \exp\{-\varepsilon t^{\eta} e^{\lambda t}\}\right]^{\psi}, \tag{3}$$
where $t > 0$, $\varepsilon > 0$, $\psi > 0$, $\eta \geq 0$, and $\lambda \geq 0$. This distribution can model monotonic and non-monotonic failure rates, and it is denoted by $\mathrm{GMW}(\varepsilon, \psi, \eta, \lambda)$. The parameter ε is the scale parameter, whereas ψ and η are shape parameters. The parameter λ acts as an accelerating factor in the time to failure: as time increases, it works as a fragility factor in the individual's survival (Carrasco et al., 2008). The respective risk function is
$$h_{uc}(t) = \frac{f_{uc}(t)}{S_{uc}(t)} = \frac{\varepsilon \psi\, t^{\eta-1} (\eta + \lambda t) \exp\{\lambda t - \varepsilon t^{\eta} e^{\lambda t}\} \left[1 - \exp\{-\varepsilon t^{\eta} e^{\lambda t}\}\right]^{\psi-1}}{1 - \left[1 - \exp\{-\varepsilon t^{\eta} e^{\lambda t}\}\right]^{\psi}}. \tag{4}$$
The GMW distribution (Carrasco et al., 2008) generalizes several distributions, which arise as the following special cases:
• Weibull distribution (W): when λ=0 and ψ=1, Equation (2) becomes
$$f_{uc}(t) = \varepsilon \eta\, t^{\eta-1} \exp\{-\varepsilon t^{\eta}\},$$
which is the probability density function of a two-parameter Weibull distribution. If, furthermore, η=1 or η=2, the cases correspond to the exponential (E) and Rayleigh (R) distributions, respectively.
• Extreme value distribution (EV): for ψ=1 and η=0, the GMW density reduces to
$$f_{uc}(t) = \varepsilon \lambda \exp\{\lambda t - \varepsilon e^{\lambda t}\},$$
which is the type I extreme value distribution (Kotz and Nadarajah, 2000). Nevertheless, when this distribution is assumed for survival data, some care is needed since its support extends over the entire real line (Lai et al., 2003). From Equation (3), the survival function for this distribution is $S_{uc}(t) = \exp\{-\varepsilon e^{\lambda t}\}$. Consequently, $S_{uc}(0) = e^{-\varepsilon}$ when t=0; in other words, $S_{uc}(0) \neq 1$, whereas $S_{uc}(0) = 1$ is expected for survival outcomes. This distribution is nevertheless included in the present study.
• Exponentiated Weibull distribution (EW): if we let λ=0 in Equation (2), we obtain the density
$$f_{uc}(t) = \varepsilon \psi \eta\, t^{\eta-1} \exp\{-\varepsilon t^{\eta}\} \left[1 - \exp\{-\varepsilon t^{\eta}\}\right]^{\psi-1},$$
which is the probability density function of the EW distribution proposed by Mudholkar and Srivastava (1993). If η=1 in addition to λ=0, the particular case coincides with the exponentiated exponential (EE) distribution. Gupta (2001) discussed some statistical properties of the EE distribution. If η=2 in addition to λ=0, the case coincides with the generalized Rayleigh (GR) distribution (Kundu and Raqab, 2005).
• Modified Weibull distribution (MW): if we take ψ=1, the GMW density becomes
$$f_{uc}(t) = \varepsilon\, t^{\eta-1} (\eta + \lambda t) \exp\{\lambda t - \varepsilon t^{\eta} e^{\lambda t}\},$$
which is the density of the three-parameter MW distribution suggested by Lai et al. (2003). Sarhan (2009) presented the properties of the MW distribution and the estimation of its parameters using the maximum likelihood approach.

The log-likelihood function
Let $\Omega = (\rho, \varepsilon, \psi, \eta, \lambda)$ denote the vector of parameters. Then the log-likelihood function for Ω under the mixture model in Equation (1) takes the form
$$l(\Omega) = \sum_{j=1}^{m} d_j \log\left[(1-\rho)\, f_{uc}(t_j)\right] + \sum_{j=1}^{m} (1-d_j) \log\left[\rho + (1-\rho)\, S_{uc}(t_j)\right], \tag{5}$$
where $f_{uc}$ and $S_{uc}$ are given in Equations (2) and (3).

Covariates Influence
To link the cure fraction ρ with a vector of covariates $x_j$ under the mixture cure model (1), we substitute
$$\rho_j = \frac{\exp(x_j^{\top}\beta)}{1 + \exp(x_j^{\top}\beta)}$$
for ρ in Equation (5), where β represents a vector of unknown coefficients.
To study the effect of the transplant type as a covariate, we assume the following model:
$$\rho_j = \frac{\exp(\beta_0 + \beta_1 x_j)}{1 + \exp(\beta_0 + \beta_1 x_j)},$$
where $x_j$ is a binary variable associated with the transplant (1 for the autologous transplant; 0 for the allogeneic transplant). The parameter $\beta_1$ is linked to the influence of the transplant on the cure fraction. If zero belongs to the 95% confidence interval for $\beta_1$, we conclude that there is no evidence that the transplant type influences the cure fraction.
Moreover, letting $\varphi = (\varphi_0, \varphi_1)$ be a vector of unknown coefficients, we can include the kind of transplant as a covariate in the shape parameter ψ by substituting, for ψ in Equation (5), a function of $\varphi_0 + \varphi_1 x_j$. In this case, the parameter $\varphi_1$ is associated with the transplant effect on the shape of the survival curve.

Model choice
Mixture cure models assuming various distributions were compared using the Akaike information criterion (AIC) introduced by Akaike (1974). The AIC is defined by AIC = -2l + 2q, where l is the maximized log-likelihood and q is the number of free parameters in the model. A lower AIC value indicates a better model fit.

Results
The product-limit estimates of the survival function for the leukemia data are shown in Figure 1a. There is a plateau in the right tail of the curve with a height close to 0.228. Figure 1b presents the estimated survival curves for the allogeneic and autologous groups, where stable plateaus are observed after around 734 days and about 1,256 days of follow-up for the autologous and allogeneic groups, respectively.
Table 1 presents the maximum likelihood estimates for the parameters of the mixture cure model assuming the GMW distribution and its particular cases, excluding covariates, together with 95% confidence intervals for the estimated parameters and AIC values. This table shows that the models that assume the EW and the GMW distributions for susceptible patients provide the smallest AIC values (953.3 and 955.94, respectively). To give a clearer illustration of the model fit under the different probability distributions, Figure 2 displays the product-limit estimates of the survival function versus the corresponding predicted values obtained from the mixture cure models for each proposed distribution (results from Table 1).
Figure 2. Product-limit estimates of the survival function versus the values predicted by the mixture cure models (Table 1, results). Perfect agreement between product-limit estimates and predicted values is shown by the diagonal lines.
The survival functions and the respective risk functions provided by the mixture cure model fits based on the GMW distribution and its sub-distributions (results in Table 1) are shown in Figure 3, panels (a) and (b) and panels (c) and (d), respectively. Figure 3, panel (b), shows that the survival curves obtained by the models based on the EW and GMW distributions are the closest to the Kaplan-Meier curve. Table 2 presents the results of the models based on the GMW distribution and its sub-distributions, excluding the cure rate ρ. This table also shows that the smallest AIC values are obtained with the EW and GMW distributions.
Table 3 presents the results for the mixture cure model based on the GMW distribution with the covariate included in both ρ and ψ. The values of $\rho_0$ and $\rho_1$ in Table 3 were calculated from the formulas
$$\rho_0 = \frac{\exp(\beta_0)}{1 + \exp(\beta_0)} \quad \text{and} \quad \rho_1 = \frac{\exp(\beta_0 + \beta_1)}{1 + \exp(\beta_0 + \beta_1)},$$
and they represent the cure fractions for the patients who received allogeneic transplants and autologous transplants, respectively.
The hazard functions for the model in which the kind of transplant is incorporated in both the cure fraction ρ and the shape parameter ψ (results from Table 3) are shown in Figure 4. This figure shows that the hazard functions cross at around 382 days after the transplant.
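To illustrate how the hazard functions discussed above could be evaluated numerically, the following Python sketch implements the GMW density and survival function in the form assumed here for Equations (2) and (3) (following Carrasco et al., 2008), a logistic link for the cure fraction, and the population hazard implied by the mixture cure model. The function names, the logistic link, and all parameter values are assumptions chosen only for illustration; they are not the estimates reported in Tables 1 to 3.

```python
import math

def gmw_pdf(t, eps, psi, eta, lam):
    """GMW density for t > 0, in the form assumed here for Equation (2)."""
    g = eps * t**eta * math.exp(lam * t)  # eps * t^eta * exp(lam * t)
    return (eps * psi * t**(eta - 1.0) * (eta + lam * t)
            * math.exp(lam * t - g) * (1.0 - math.exp(-g))**(psi - 1.0))

def gmw_sf(t, eps, psi, eta, lam):
    """GMW survival function, in the form assumed here for Equation (3)."""
    g = eps * t**eta * math.exp(lam * t)
    return 1.0 - (1.0 - math.exp(-g))**psi

def gmw_hazard(t, eps, psi, eta, lam):
    """Hazard of the uncured subjects: density divided by survival."""
    return gmw_pdf(t, eps, psi, eta, lam) / gmw_sf(t, eps, psi, eta, lam)

def cure_fraction(beta0, beta1, x):
    """Cure fraction under an assumed logistic link; x = 0 or 1."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def population_hazard(t, rho, eps, psi, eta, lam):
    """Hazard of the whole population (cured and uncured mixed)."""
    num = (1.0 - rho) * gmw_pdf(t, eps, psi, eta, lam)
    den = rho + (1.0 - rho) * gmw_sf(t, eps, psi, eta, lam)
    return num / den

# Hypothetical coefficients and GMW parameters (not estimates from the paper).
BETA0, BETA1 = -1.0, -0.4
PARAMS = dict(eps=0.05, psi=1.8, eta=0.7, lam=0.001)

for x, label in ((0, "allogeneic"), (1, "autologous")):
    rho = cure_fraction(BETA0, BETA1, x)
    h = population_hazard(100.0, rho, **PARAMS)
    print(f"{label}: cure fraction {rho:.3f}, population hazard at day 100 = {h:.5f}")
```

Because a positive cure fraction enlarges the denominator, the population hazard always lies below the hazard of the uncured subjects and tends to zero once the survival curve reaches its plateau.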

Discussion
The present study aims to select a suitable distribution for the survival times of susceptible leukemia patients. To achieve this goal, we proposed a mixture cure model based on the GMW distribution and its sub-distributions. This model extends many distributions widely used in survival data analysis. The GMW distribution is flexible enough to accommodate various forms of the hazard rate function, such as bathtub-shaped failure rates.
Cure fraction models are developed to estimate the probability of being cured. In the absence of cured individuals, cure models can be reduced to the classical survival models. There are two major types of cure models. The first one is the mixture cure model, which is a common approach for modelling data with long-term survivors. In this model, the population is considered as a mixture of cured patients and uncured patients. The advantage of the mixture cure model is that it enables covariates to have different effects on cured patients and on survival times of susceptible patients. It is then possible to consider various covariates in the two parts of the model (incidence and latency) and assess the influence of the same covariate(s) on the two components. This property distinguishes the mixture cure model from the other cure models.
On the other hand, the mixture cure model does not satisfy the proportional hazards property. Besides, it does not appear to have a direct biological interpretation, especially for cancer recurrence. The other type of cure model is known as the promotion time cure model or the non-mixture cure model. It was first proposed by Yakovlev et al. (1993). This model assumes that, after the initial cancer treatment, some cancer cells remain in the patient's body, which may grow slowly over time and produce a detectable relapse of cancer. In some cases, the mixture cure model and the promotion time cure model are mathematically related to each other.
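One way to see this relationship is sketched below under the usual formulation of the promotion time model. The symbols θ (a positive parameter) and F(t) (a proper distribution function) are introduced only for this illustration and are not used elsewhere in this paper.

```latex
% Promotion time (non-mixture) model: population survival function
S_{pop}(t) = \exp\{-\theta F(t)\}, \qquad \lim_{t \to \infty} S_{pop}(t) = e^{-\theta}.

% Define the cure fraction and a proper survival function for the uncured:
\rho = e^{-\theta}, \qquad
S_{uc}(t) = \frac{\exp\{-\theta F(t)\} - e^{-\theta}}{1 - e^{-\theta}}.

% Then the promotion time model has the mixture form of Equation (1):
S_{pop}(t) = \rho + (1 - \rho)\, S_{uc}(t).
```

Here $S_{uc}(0) = 1$ and $S_{uc}(t) \to 0$ as $t \to \infty$, so the promotion time model can be read as a mixture cure model with a particular choice of the survival function for the uncured subjects.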
The Kaplan-Meier survival curves in this study indicate that survival models that disregard the cure fraction ρ would not be appropriate for the analysis of these data. In addition, the curves suggest that the probability of being cured is higher for the allogeneic group than for the autologous group. Moreover, they display stable plateaus at the right tail of each curve, which indicates that there may be unsusceptible patients in the two treatment groups.
The present study found that the mixture cure models based on the EW and the GMW distributions provide a better fit to the leukemia data, since these models give the predicted values closest to the empirical values. However, in a different setting from the present study, Peng et al. (2001) reported that, for the autologous group, the log-normal mixture cure model provides the best fit compared with the exponential, Weibull, and gamma mixture cure models.
The results of the current study show that the estimated values of the cure proportion ρ obtained by the cure models based on the Weibull, exponential, Rayleigh, extreme value, exponentiated exponential, and generalized Rayleigh distributions are greater than the value of ρ suggested by panel (a) of Figure 1. The models based on the EW and GMW distributions give more accurate estimates of ρ, which is further evidence of a good fit under these models.
The risk function curves obtained with the EW and the GMW distributions are very close to one another. These curves indicate that there is a high risk of death during the period immediately after the transplant. After this peak, the hazard continues to decline until the end of the observation period.
Comparing Table 1 and Table 2, we note that the AIC values of the models that exclude ρ are greater than the AIC values of the models that include the cure rate. As anticipated, this also indicates that the mixture cure model is well suited to the data at hand. Kutal et al. (2018) found that the estimated cure rates for the allogeneic group obtained by mixture cure models based on the Weibull and exponentiated exponential distributions are 0.239 and 0.242, respectively. Moreover, Lázaro et al. (2020) reported that the estimated cure fractions for the allogeneic and autologous groups are 0.270 and 0.198, respectively. These results appear reasonably consistent with the present findings in terms of the estimated cure fractions for the allogeneic and autologous patient groups.
It is interesting to note that the 95% confidence interval for $\rho_0/\rho_1$ (1.18614, 1.72955) does not contain the value 1, suggesting that there is evidence of a difference between the population cure rates of patients treated with allogeneic and autologous bone marrow transplants. Furthermore, the 95% confidence interval for $\beta_1$ (-0.56860, -0.36137) does not include zero.
The results of this study reveal that the hazard functions under the mixture cure model with the GMW distribution, with the covariate linked to both ρ and ψ, are not proportional. Thus, the Cox proportional hazards model is not appropriate for analysing these data. This result is in agreement with the findings of Kersey et al. (1987). The standard Cox model does not usually assume the existence of cured individuals (Cox, 1972). The literature presents several extensions of this model that account for cured subjects; however, these approaches are unsuitable for nonproportional risk functions.
The curves of the hazard functions have different shapes, indicating that parametric models based on generalized probability distributions can be very beneficial for the study of lifetime data such as the data at hand (Martinez et al., 2013), because these models can accommodate many types of hazard rate functions.
It is worth mentioning that zero belongs to the 95% confidence interval for $\varphi_1$ (-0.05267, 0.05259) shown in Table 3, indicating no evidence of a significant difference between the shapes of the population risk functions.
The findings of the present study show that the risk of death is highest around 11 days after the transplant for the patients who received allogeneic bone marrow transplants, whereas it is highest around 31 days after the transplant for the patients who received autologous bone marrow transplants. Around 400 days after the transplantation, the hazard of death is almost the same in the two groups.
In conclusion, parametric models incorporating a cure fraction with a specified distribution for the survival times of susceptible subjects are appropriate tools for analysing survival data with long-term survivors, because these models do not assume proportional hazards and they can estimate measures that are readily interpreted by practitioners and health experts, such as the fraction of immune individuals and the mean survival time. The mixture cure model with the EW or the GMW distribution for the survival times of uncured leukemia patients provides the best results compared with the Weibull, exponential, Rayleigh, extreme value, exponentiated exponential, generalized Rayleigh, and modified Weibull distributions.

Author Contribution Statement
Mohamed Elamin Omer was involved in the conception and design of the study, the data analyses, and the interpretation of results, and wrote the first draft of the manuscript. Mohd Rizam Abu Bakar was involved in the conception and design of the study, carried out the data analyses, and contributed to the interpretation of results and the drafting of the manuscript. Mohd Bakri Adam and Mohd Shafie Mustafa were involved in the conception and design of the study, data acquisition, and interpretation of results. All authors read and approved the final manuscript and take responsibility for the work.