Predicting EGFR Mutation in Lung Adenocarcinoma: Development and Validation of the EGFR Mutation Predictive Score (EMPS) in Bali, Indonesia

Introduction: Testing for epidermal growth factor receptor (EGFR) mutations may not be routinely available to all patients because of its limited availability and high cost, especially in areas with limited resources such as Indonesia. Therefore, we aimed to build a nomogram to predict EGFR mutation in patients with lung adenocarcinoma by incorporating significant clinical and radiological parameters. Methods: We conducted an age-matched case–control study of 160 treatment-naïve patients [80 with EGFR-mutated (EGFRmut) and 80 with EGFR-wild-type (EGFRwt) tumors] with pathologically confirmed lung adenocarcinoma and tumor specimens available for genetic analysis, collected from 2017 through 2021 in Bali, Indonesia. Radiomic features were extracted from contrast-enhanced CT images. The cut-off for tumor diameter was defined using a receiver operating characteristic (ROC) curve. A conditional logistic regression model was constructed to identify significant risk factors, and a nomogram was developed to predict the risk of EGFR mutation. A separate cohort was used to validate the nomogram. Result: Female sex, never-smoking status, smaller tumor diameter (<48.5 mm), upper-lobe location, and the presence of bubble-like lucency and air-bronchogram on chest CT were identified as independent risk factors for EGFR mutation in the multivariate logistic regression model. The resulting nomogram had an area under the curve of 0.993 (95% CI = 0.98−1.00) in the development group and 0.91 (95% CI = 0.84−0.99) in the validation group. The calibration curve showed good agreement between predicted and actual probabilities. At a nomogram score cut-off of 246, the sensitivity was 97.5%, the specificity 98.8%, the positive predictive value 99.0%, and the negative predictive value 96.8%.
Conclusion: Our study indicates that the EGFR mutation nomogram could provide a non-invasive way to predict the risk of EGFR mutation in patients with lung adenocarcinoma in clinical practice. This nomogram needs to be validated in other areas of Indonesia.


Introduction
Under normal conditions, EGFR is activated only when it binds to ligands or external stimuli such as epidermal growth factor. However, an EGFR-mutated tumor constitutively drives tumor growth and metastasis regardless of any stimulus (Sukauichai et al., 2022). EGFR mutation is accompanied by dysregulated EGFR tyrosine kinase (TK) activity, which is closely associated with late disease stage and poor prognosis in NSCLC (Omar et al., 2022). Several mutations clustered around the TK domain are the most common mutations in the EGFR gene: point mutations G719X (G719A, G719C, G719S) in exon 18, in-frame deletions and insertions in exon 19, the point mutation T790M and insertions in exon 20, and the point mutations L858R and L861Q in exon 21.
In 2004, tumors harboring EGFR mutations were found to be sensitive to targeted therapies called tyrosine kinase inhibitors (TKIs). Monoclonal antibodies (panitumumab and cetuximab) and TKIs (gefitinib, erlotinib, and afatinib) have been developed as strategies that target EGFR. In particular, TKIs compete with ATP (adenosine triphosphate) for its binding site at the active site of the EGFR kinase, thereby blocking vital EGFR signaling pathways (Liu et al., 2017). TKIs are an established treatment for EGFR-mutated NSCLC in Indonesia (Kementerian Kesehatan, 2017).
Because detecting EGFR mutations is crucial for treating lung adenocarcinoma, this test should be performed routinely in all lung adenocarcinoma patients in Indonesia. However, testing for EGFR mutations in either tissue or circulation requires specialized technology such as polymerase chain reaction (PCR), which is not easy to obtain in resource-limited areas of Indonesia. In addition, the cost of the test is borne by the pharmacy and the patients. These limitations show the importance of finding an easy, inexpensive, and precise method or score for detecting EGFR mutations in lung adenocarcinoma patients. For cost-effectiveness and early treatment planning, predicting mutation status from patient characteristics and CT findings may be valuable. In this study, we aimed to develop and validate a predictive nomogram, the EGFR Mutation Predictive Score (EMPS), that uses the differences in patient characteristics and CT findings between EGFR-mutated and non-mutated lung adenocarcinoma.

Participants Study Group (EMPS model)
A total of 409 patients with lung adenocarcinoma who underwent EGFR mutation testing between January 2017 and January 2021 at Prof Dr. IGNG Ngoerah Hospital (Denpasar, Bali) were initially enrolled in this study. From these, 80 cases (patients with EGFR-mutated lung adenocarcinoma) were selected according to the following inclusion criteria: a pathologically confirmed diagnosis of lung adenocarcinoma and a previous EGFR mutation test; accessible thin-section contrast-enhanced computed tomography (CT) images; and no prior chemotherapy, radiotherapy, or any other therapy. Then, from the 329 patients with wild-type (EGFR mutation-negative) lung adenocarcinoma, 80 were individually selected and matched to each case by age (± 5 years) and geographical location (regency) of residence; these 80 patients were defined as controls. In total, 160 ethnically Balinese patients (80 cases and 80 controls) were included in the analysis (Supplement 1).
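The individual matching step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the paper does not state how a control was chosen when several wild-type patients qualified, so the closest-age greedy rule, the `Patient` record, and the function `match_controls` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Patient:
    patient_id: str  # hypothetical identifier
    age: int         # age in years
    regency: str     # regency (kabupaten/kota) of residence


def match_controls(
    cases: List[Patient], pool: List[Patient], max_age_gap: int = 5
) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching of each EGFR-mutated case to one wild-type control.

    A control is eligible if it lives in the same regency and its age differs
    from the case's age by at most max_age_gap years. The closest-aged eligible
    control is taken and removed from the pool, so no control is reused.
    Cases with no eligible control map to None.
    """
    available = list(pool)  # copy, so the caller's pool is left untouched
    matches: Dict[str, Optional[str]] = {}
    for case in cases:
        eligible = [
            c for c in available
            if c.regency == case.regency and abs(c.age - case.age) <= max_age_gap
        ]
        if not eligible:
            matches[case.patient_id] = None
            continue
        best = min(eligible, key=lambda c: abs(c.age - case.age))
        available.remove(best)
        matches[case.patient_id] = best.patient_id
    return matches
```

For example, a 60-year-old case from Denpasar would be paired with the closest-aged Denpasar control within ±5 years, while a case with no eligible control would remain unmatched. Greedy matching depends on the order in which cases are processed; optimal matching algorithms avoid this but are beyond this sketch.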

Validation Cohort Group
We conducted a cohort study in June 2023 at Prof Dr. IGNG Ngoerah Hospital, Denpasar, Bali. Forty-eight patients were selected according to the following inclusion criteria: a pathologically confirmed diagnosis of lung adenocarcinoma and an EGFR mutation test performed in June 2023; accessible thin-section contrast-enhanced CT images; and no prior chemotherapy, radiotherapy, or any other therapy. Of these 48 patients, 25 had an EGFR mutation and 23 did not (Supplement 1).

Materials and Methods
Smoking status was categorized as never smoker or ever smoker. All CT findings, such as the largest tumor diameter, air-bronchogram, and bubble-like lucency, were measured or assessed by two pulmonologists on contrast-enhanced thoracic CT scans. The cut-off for tumor diameter was determined using a receiver operating characteristic (ROC) curve and Youden's index (Liu et al., 2016; Zou et al., 2017). Tumors in the right upper, right middle, and left upper lobes were categorized as upper lobe; those in the right lower and left lower lobes were categorized as lower lobe. Brain metastasis was determined using contrast-enhanced brain CT. Calibration of the EMPS was assessed by comparing the observed risk with the average probability of EGFR mutation predicted by the score, and the Hosmer–Lemeshow test was used to assess the corresponding goodness of fit. Harrell's concordance index (C-index) was used to assess the score's discriminative ability. Because the EMPS was derived from a conditional logistic regression model, C-index values and the corresponding 95% confidence intervals (CIs) were estimated for each group.
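The Youden's-index step used for the tumor-diameter cut-off can be illustrated with a short sketch. The function name and toy data are hypothetical; taking midpoints between consecutive observed values as candidate cut-offs is one common convention and is assumed here rather than taken from the paper. Because smaller tumors were associated with EGFR mutation, the example calls a value below the cut-off positive.

```python
from typing import Optional, Sequence, Tuple


def youden_cutoff(
    values: Sequence[float],
    labels: Sequence[int],
    higher_is_positive: bool = True,
) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    values: continuous predictor, e.g. largest tumor diameter in mm
    labels: 1 = EGFR-mutated, 0 = EGFR wild type
    higher_is_positive: if False, values below the cutoff are called positive
        (e.g. smaller tumors predicting EGFR mutation).
    Candidate cut-offs are midpoints between consecutive distinct values (assumed).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    best_cut, best_j = None, float("-inf")
    for t in candidates:
        tp = fp = 0
        for v, y in zip(values, labels):
            called_positive = v > t if higher_is_positive else v < t
            if called_positive:
                if y == 1:
                    tp += 1
                else:
                    fp += 1
        j = tp / n_pos + (1 - fp / n_neg) - 1
        if j > best_j:  # strict '>' keeps the first cut-off among ties
            best_cut, best_j = t, j
    return best_cut, best_j


# Hypothetical toy data: smaller diameters (mm) in EGFR-mutated patients.
diameters = [20, 30, 40, 48, 49, 60, 70, 80]
egfr = [1, 1, 1, 1, 0, 0, 0, 0]
print(youden_cutoff(diameters, egfr, higher_is_positive=False))  # (48.5, 1.0)
```

In real data the two groups overlap, so J is below 1 and the chosen cut-off is a trade-off between sensitivity and specificity.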
For external validation, the EMPS was computed in the validation data, and score performance was assessed in newly diagnosed adenocarcinoma patients whose EGFR status was being tested. To corroborate the results in the derivation and validation sets, a confirmatory analysis estimated the area under the ROC curve (AUROC) of the EMPS for predicting EGFR mutation. This also allowed us to identify the best score cut-points, which maximized sensitivity, specificity, and predictive values. In addition, calibration plots were drawn to give a detailed view of calibration by comparing observed and predicted outcomes among patients with the same estimated risk. The observed outcome proportions are plotted against the estimated risks, with deviations from the diagonal signalling miscalibration.
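The risk grouping behind a calibration plot and the Hosmer–Lemeshow statistic mentioned above can be sketched as follows. This is an illustrative implementation with hypothetical function names. Grouping into roughly equal-sized risk groups (deciles by default) is the usual convention but is assumed here, and the p-value step (a chi-square distribution with groups − 2 degrees of freedom) is left to a statistics package.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    probs: Sequence[float], outcomes: Sequence[int], n_groups: int = 10
) -> List[Tuple[int, float, float, float, int]]:
    """Group patients by predicted risk into roughly equal-sized groups.

    Returns one row per non-empty group:
    (n, mean predicted risk, observed proportion, expected events, observed events).
    Plotting mean predicted risk (x) against observed proportion (y) gives the
    points of a calibration plot; a perfectly calibrated model lies on y = x.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        expected = sum(probs[i] for i in idx)
        observed = sum(outcomes[i] for i in idx)
        rows.append((len(idx), expected / len(idx), observed / len(idx), expected, observed))
    return rows


def hosmer_lemeshow(
    probs: Sequence[float], outcomes: Sequence[int], n_groups: int = 10
) -> Tuple[float, int]:
    """Hosmer-Lemeshow statistic and its degrees of freedom (groups - 2).

    H = sum over groups of (O - E)^2 / (n * p_bar * (1 - p_bar)), where O and E
    are observed and expected events and p_bar is the mean predicted risk.
    """
    rows = calibration_table(probs, outcomes, n_groups)
    stat = 0.0
    for n_g, p_bar, _, expected, observed in rows:
        variance = n_g * p_bar * (1 - p_bar)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, len(rows) - 2
```

A small statistic (and a large p-value) indicates no detectable lack of fit; the same table rows supply the points drawn in a calibration plot.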

Statistical Analysis
All statistical analyses were performed using SPSS 16.0. In the bivariate analysis, the McNemar test was used for categorical variables, and variables with P > 0.25 were excluded from further analysis. Multivariate analysis was then performed to build a conditional logistic regression model with stepwise selection of variables, in which effects were both entered into and removed from the model, so one or more backward elimination steps could follow each forward selection step. At each forward selection step, an effect was added to the model if it was significant at the P = 0.05 level; individual parameters were then examined at each backward elimination step, and the least significant effect not meeting the P = 0.05 level was removed. The stepwise selection process terminated when no further effect could be added to the model or when the current model was identical to a previously visited model. The predictive nomogram EMPS (EGFR Mutation Predictive Score) was then created, and its cut-off was determined using a receiver operating characteristic (ROC) curve and Youden's index.

Study Population
A total of 208 patients were enrolled, 160 in the EMPS model and 48 in the validation cohort. This was an observational analytic study using a pair-matched case-control design (pairs matched on the patient's age and district of residence). Most research subjects were over 50 years old (83.75%), female (53.75%), and never smokers (53.75%), and the most common place of residence was Denpasar city (31.75%) (Table 1).
The median largest diameter of the primary lung tumor was 49.50 mm (range 16–140 mm). ROC curve analysis and the Youden index were used to determine the diameter cut-off point. The AUC was 0.827 (Wald test; 95% confidence interval [CI]: 0.758–0.869), and the cut-off that maximized the Youden index was 48.5 mm. Tumor diameter was therefore dichotomized into below and above 48.5 mm; 54.37% of the research samples had a diameter greater than 48.5 mm.
Cerebral metastases were found in 13.75% of the samples, and a family history of lung cancer in 1.25%. Asthma and COPD were the most common pulmonary comorbidities (1.88% of the sample). On computed tomography, the majority of tumors were located in the upper lobe (65.63%) and showed bubble-like lucency (53.13%) and air-bronchogram (53.75%).
Inter-observer agreement in CT interpretation was also assessed. Agreement between the two observers on tumor location, presence of bubble-like lucency, and presence of air-bronchogram was almost perfect, with kappa coefficients ranging from 0.85 to 1.0.
Multivariate logistic regression analysis was performed on the variables with statistical differences in the univariate analysis and found that being female (OR, 13.22; 95% CI …) was an independent predictor of EGFR mutation (Table 2). Based on the results of the multiple logistic regression analysis, we used six independent risk factors to generate an individualized nomogram to predict EGFR mutation status (Figure 1).
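For 1:1 matched pairs, the conditional likelihood used by conditional logistic regression reduces to a logistic model on within-pair differences with no intercept. With a single binary factor, the conditional maximum-likelihood odds ratio equals b/c, where b and c count the two kinds of discordant pairs that McNemar's test also uses. The sketch below shows this for one factor only. It is not the authors' SPSS analysis, the stepwise multi-variable selection is not reproduced, and all names and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(pairs: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """pairs: (case_has_factor, control_has_factor), coded 0/1, one per matched pair.

    b = pairs where only the case has the factor; c = pairs where only the
    control has it. Concordant pairs carry no information in a matched analysis.
    """
    b = sum(1 for case, ctrl in pairs if case == 1 and ctrl == 0)
    c = sum(1 for case, ctrl in pairs if case == 0 and ctrl == 1)
    return b, c


def mcnemar_chi2(b: int, c: int, continuity: bool = True) -> float:
    """McNemar chi-square statistic (1 df), optionally continuity-corrected."""
    diff = abs(b - c) - (1 if continuity else 0)
    return max(diff, 0) ** 2 / (b + c)


def conditional_logit_1to1(diffs: Sequence[float], max_iter: int = 50, tol: float = 1e-10) -> float:
    """Newton-Raphson fit of a one-covariate conditional logistic regression
    for 1:1 matched pairs; diffs[i] = x_case - x_control in pair i.

    Conditional log-likelihood: sum_i log(1 / (1 + exp(-beta * diffs[i]))).
    Returns beta, the log odds ratio.
    """
    if not any(d > 0 for d in diffs) or not any(d < 0 for d in diffs):
        raise ValueError("need discordant pairs in both directions for a finite estimate")
    beta = 0.0
    for _ in range(max_iter):
        grad = 0.0
        hess = 0.0
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - p)          # first derivative of the log-likelihood
            hess -= d * d * p * (1.0 - p)  # second derivative (always negative here)
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


# Hypothetical pairs: 6 (case only), 2 (control only), 12 concordant.
pairs = [(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 5 + [(0, 0)] * 7
b, c = discordant_counts(pairs)
beta = conditional_logit_1to1([case - ctrl for case, ctrl in pairs])
print(b, c, round(math.exp(beta), 3), mcnemar_chi2(b, c))  # 6 2 3.0 1.125
```

The estimated odds ratio is 6/2 = 3 regardless of how many concordant pairs are present, which is why matched designs draw their information only from discordant pairs.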

Calibration and Validation of the EMPS model
In both the EMPS and validation cohorts, the calibration curves did not deviate substantially from the diagonal line (Figure 2a and 2b), indicating that the probabilities predicted by the nomogram were consistent with the actual probabilities and that the model was well calibrated. In the EMPS cohort, the AUC of the nomogram was 0.993 (95% CI: 0.830–1.000) (Figure 3a). The AUC in the external validation cohort was 0.919 (95% CI: 0.843–0.995) (Figure 3b).
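Because the outcome is binary, the C-index named in the methods equals the AUC, which can be computed as the proportion of (mutated, wild-type) patient pairs that the score ranks correctly. The sketch below assumes that definition. The Hanley–McNeil standard error used for the confidence interval is one common approximation and is an assumption here, since the paper does not state how its CIs were obtained; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC (= C-index for a binary outcome): the probability that a randomly
    chosen EGFR-mutated patient scores higher than a randomly chosen
    wild-type patient, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% CI from the Hanley-McNeil (1982) standard error,
    truncated to the [0, 1] range."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

The pairwise loop is O(n_pos × n_neg), which is negligible for cohorts of a few hundred patients.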

Determining the cutoff value for predicting EGFR mutation
The sensitivity, specificity, and predictive values of the nomogram at different cutoff values are shown in Table 3. A lower cutoff value resulted in higher sensitivity and negative predictive value but lower specificity and positive predictive value. The diagnostic odds ratios of the nomogram at different cutoff values are also shown in Table 3. The cutoff values giving good performance ranged from ≥0.1 to ≥0.5. Finally, the optimal cutoff value, determined by the Youden index and diagnostic odds ratio, was 246 (Table 3).

Figure 2. Calibration curve of the nomogram in the EMPS (a) and validation (b) cohorts. The x-axis represents the predicted EGFR mutation risk. The y-axis represents the actual EGFR mutation rate. The green line represents a perfect estimate of the mutation rate by an ideal model. The black line represents the performance of the nomogram; a closer fit to the ideal line represents a better prediction. EGFR, epidermal growth factor receptor; EMPS, EGFR Mutation Predictive Score.

Figure 3. … (Fig. 3a), and the AUC in the validation cohort was 0.92 (95% CI: 0.84–1.00) (Figure 3b). AUC, area under the curve; EGFR, epidermal growth factor receptor; ROC, receiver operating characteristic.
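The quantities reported in Table 3 at a given cutoff all follow from a 2×2 table of predicted versus actual EGFR status. In the sketch below, calling a patient EGFR-mutated when the nomogram score is at or above the cutoff is an assumed direction, and the function names and example scores are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_at_cutoff(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), calling a patient EGFR-mutated when score >= cutoff."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        if s >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, predictive values, and diagnostic odds ratio."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # DOR = (TP x TN) / (FP x FN); reported as infinite when FP or FN is 0
        "dor": (tp * tn) / (fp * fn) if fp * fn > 0 else math.inf,
    }


# Hypothetical scores for five patients, evaluated at a cutoff of 246.
counts = confusion_at_cutoff([300, 250, 246, 200, 100], [1, 1, 0, 1, 0], 246)
print(counts)  # (2, 1, 1, 1)
```

Sweeping the cutoff over all observed scores and recording these metrics reproduces the kind of trade-off table described above.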

Discussion
In this study, we developed and validated a nomogram based on patient characteristics (sex and smoking status) and CT features (tumor diameter, tumor location, bubble-like lucency, and air-bronchogram) for personalized prediction of EGFR mutation status. The case-control development set included 160 patients with lung adenocarcinoma. To further validate the model, we evaluated it in an independent external validation cohort of 48 cases. The AUC of the model in the development and validation cohorts was 0.993 and 0.919, respectively.
Even though EGFR testing is crucial in the management of all lung adenocarcinoma patients in Indonesia, detecting EGFR mutations in practice faces various obstacles: limited material or inadequate samples for EGFR testing (Zhao et al., 2017), the need for advanced testing that may not be available in resource-limited areas of Indonesia, and its high cost. Although tyrosine kinase inhibitor (TKI) treatment is covered by BPJS, the cost of EGFR mutation testing is currently still borne by the pharmacy and the patient at Prof. IGNG Ngoerah Central General Hospital (RSUP). This limitation shows the importance of finding an easy, inexpensive, and accurate method or score for detecting EGFR mutations in lung adenocarcinoma patients. One approach is to analyze the risk factors that predict EGFR mutation status and to develop a nomogram from the significant factors. By focusing on the distinction between EGFR wild-type and EGFR-mutated tumors, such a nomogram should be helpful in clinical treatment.
To construct the nomogram, we carried out multiple logistic regression analyses on the variables that showed statistical differences in the univariate analysis, and then selected the variables that showed significant differences for the final model. Initially, we used 8 candidate variables: 5 CT features and 3 clinical variables. Because data on cerebral metastasis were inadequate, this variable was excluded from the univariate analysis. Hasegawa et al. (2016), Zhao et al. (2017), Zhang et al. (2019), and Liu et al. (2016) used clinical and pathological data from patients with lung adenocarcinoma to build nomograms predicting EGFR mutation status, with AUCs in independent validation datasets ranging from 0.741 to 0.784. However, all patients in those studies were from outside Indonesia, so those models may not be suitable for Indonesian patients. Our model combines variables that are easily obtained before surgery and can be assessed by eye on CT scans; therefore, it is easier to apply in clinical practice in Indonesia.
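The usual way a logistic model is turned into nomogram points can be sketched for binary factors: each factor's points are proportional to its coefficient, scaled so that the strongest factor receives 100 points, and the total is mapped back to a probability. The paper does not report its coefficients or scaling, so the coefficients, the intercept, and the function names below are hypothetical. A conditional logistic model does not estimate an intercept, so obtaining absolute probabilities from it would require some external anchoring; the intercept here is only a placeholder input.

```python
import math
from typing import Dict


def nomogram_points(coefs: Dict[str, float], max_points: float = 100.0) -> Dict[str, float]:
    """Points for each binary risk factor, proportional to its log-odds
    coefficient; the factor with the largest coefficient gets max_points.
    Assumes every factor is coded so that 1 = higher-risk category (beta > 0)."""
    top = max(coefs.values())
    return {name: max_points * beta / top for name, beta in coefs.items()}


def total_points(points: Dict[str, float], patient: Dict[str, int]) -> float:
    """Sum the points of the factors present (coded 1) in a patient."""
    return sum(points[name] for name, present in patient.items() if present)


def points_to_probability(
    total: float, coefs: Dict[str, float], intercept: float, max_points: float = 100.0
) -> float:
    """Convert total points back to the linear predictor, then to a probability.
    The intercept is a hypothetical input (see the note above)."""
    top = max(coefs.values())
    linear_predictor = intercept + total * top / max_points
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for two of the six factors.
coefs = {"female": 2.0, "never_smoker": 1.0}
pts = nomogram_points(coefs)  # {'female': 100.0, 'never_smoker': 50.0}
```

Because points are a linear rescaling of the linear predictor, any cutoff on total points (such as the score of 246 reported above) corresponds to a cutoff on predicted probability.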
One of the most important reasons for using a nomogram is that it can inform individual treatment or care decisions. However, measures of performance, discrimination, and calibration cannot capture the clinical consequences of a given level of discrimination or miscalibration (Van Calster and Vickers, 2015; Collins et al., 2015). Therefore, to demonstrate the usefulness of the nomogram, decision curve analysis was used to evaluate whether decisions based on the nomogram would improve patient outcomes. This novel approach analyzes clinical consequences across threshold probabilities and derives the net benefit from them (Vickers and Elkin, 2006; Balachandran et al., 2015). The AUC of the ROC curve was 0.993, indicating good predictive ability. We validated the model in a separate cohort of patients, and the final prediction accuracy was 91%. Based on the Youden index and diagnostic odds ratios, we assigned an optimal cutoff value of 246 (Table 3). According to our results, female never-smokers whose tumors were located in the upper lobe, were less than 48.5 mm in diameter, and showed air-bronchogram and bubble-like lucency were most likely to have an EGFR mutation.
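The net-benefit calculation behind decision curve analysis can be sketched directly from its definition: at threshold probability p_t, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), which is compared against a treat-all strategy and against treating no one (net benefit 0). The function names and data below are hypothetical, and patients with predicted probability ≥ p_t are assumed to be classified as positive.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    harm_weight = threshold / (1 - threshold)  # odds at the threshold probability
    return tp / n - (fp / n) * harm_weight


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of acting on every patient, regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted risks and true EGFR status for four patients.
probs = [0.9, 0.8, 0.3, 0.2]
egfr = [1, 0, 1, 0]
for t in (0.25, 0.5):
    print(t, round(net_benefit(probs, egfr, t), 4), round(net_benefit_treat_all(egfr, t), 4))
```

A decision curve plots these values over a range of thresholds; the model adds value at thresholds where its curve lies above both the treat-all line and zero.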
To the best of our knowledge, this is the first study describing a comprehensive scoring system to predict EGFR mutation in Bali, Indonesia. Nonetheless, this study has some limitations. First, it is limited to the Balinese population. Second, patients with other mutation subtypes, such as KRAS or TP53 mutations, were not included, and larger cohort studies are required to confirm our observations. Finally, the scoring system derived from this retrospective analysis needs to be confirmed by prospective studies including nonsurgical candidates and different ethnic populations to determine whether the model can be used for treatment decision-making without molecular profiling in Indonesia.
In conclusion, we constructed and validated a nomogram for predicting EGFR mutation in lung adenocarcinoma patients. The proposed nomogram considers six independent risk factors: female sex, never smoking, smaller tumor diameter, upper-lobe tumor location, air-bronchogram, and bubble-like lucency. We confirmed good calibration and excellent discriminative power of our nomogram. Its predictive power may be improved by considering other potentially important factors that could not be obtained in areas with limited resources such as Indonesia, and by further external validation.