Risk Scoring System based on lncRNA Expression for Predicting Survival in Hepatocellular Carcinoma with Cirrhosis

Objective: This study aims to explore the roles of long non-coding RNAs (lncRNAs) for predicting survival in hepatocellular carcinoma (HCC) patients with cirrhosis. Methods: A set of lncRNAs differentially expressed between HCC patients with or without cirrhosis was identified using expression profiles of The Cancer Genome Atlas database, and these lncRNAs were screened for their risk scoring system to predict recurrence-free survival (RFS) or overall survival (OS). Predictive ability of risk scoring systems was confirmed using uni- and multivariate Cox analyses while adjusting for clinical features. Predictive lncRNAs were analyzed by functional enrichment analysis. Results: Our screen identified 22 lncRNAs that were upregulated in the presence of cirrhosis and 59 that were downregulated. To predict OS of HCC patients with cirrhosis, a risk scoring system was developed with four lncRNAs (LINC02086, LINC00880, LINC01549 and AC136475.3); to predict RFS in these patients, the risk scoring system contained five lncRNAs (SH3RF3-AS1, AC104117.3, AC136475.3, LINC00239 and MRPL23-AS1). All risk scoring systems were associated with an area under the receiver operating characteristic curve > 0.7. Based on uni- and multivariate Cox analyses, the risk scoring system could serve as a significant independent predictor for OS in HCC patients with cirrhosis. Functional enrichment analysis suggested that the lncRNAs in the risk scoring systems are involved primarily in the pathway of Wnt signal and cytokine-cytokine receptor interaction. Conclusion: Risk scoring systems based on lncRNAs can effectively predict OS of HCC patients with cirrhosis. The system should be further developed and validated in larger, preferably multi-site patient populations.


Introduction
Liver cancer remains the sixth most frequent cancer globally, with approximately 841,000 new diagnosed cases and 782,000 deaths annually. Hepatocellular carcinoma (HCC) is the most frequent common liver cancer, accounting for 75-85% of liver cancers (Bray et al., 2018;Forner et al., 2018;Kulik and El-Serag, 2019). Despite significant improvements in diagnosing and treating HCC, its heterogeneity means that it continues to be associated with relatively low rates of recurrence-free survival (RFS) and overall survival (OS) (Bruix et al., 2014;Fong and Tanabe, 2014). Currently no biomarkers have proven effective for predicting prognosis, reflecting the complicated nature of the disease. Further efforts are needed to identify biomarkers that can predict prognosis and guide treatment.
Long non-coding RNAs (lncRNAs), located in the cytoplasm and nucleus of eukaryotic cells, are non-coding whether cirrhosis is present (Grazi et al., 2003;Gassmann et al., 2010;Wang et al., 2013;Techathuvanan et al., 2015), we wanted to examine whether some lncRNAs are differentially expressed in the presence or absence of cirrhosis, and whether these might influence risk of recurrence or death. Using expression profile data in The Cancer Genome Atlas (TCGA) database, we developed a risk scoring system based on lncRNA levels and showed that it could predict survival of HCC patients with cirrhosis.

Datasets on HCC patients
Expression profiles of lncRNAs and mRNAs from HCC patients, together with the corresponding clinical information, were taken from TCGA (version 09-14-2017 for HCC) via the UCSC Xena server (https://xenabrowser. net/datapages/). Patient data were included in the present study if their HCC was confirmed by histology, complete lncRNA and mRNA expression profiles were available, cirrhosis status was known, and sufficient data were available to determine OS and RFS. After applying these criteria, 77 HCC patients with cirrhosis and 130 without cirrhosis were selected (Table 1). This study was prepared in accordance with TCGA guidelines (https://cancergenome. nih.gov/publications/publication-guidelines). No ethics approval was required for this study since the data came from TCGA.

Identification of lncRNAs differentially expressed between HCC patients with or without cirrhosis
After removal of lncRNAs showing zero expression in more than 50% of patients, the "edgeR" package in R (Robinson et al., 2010) was used to identify lncRNAs differentially expressed between patients with or without cirrhosis, defined as those associated with |log2fold change (log2FC)| > 1 and false discovery rate (FDR) < 0.05. Volcano and cluster heat maps were created by "gplots" and "heatmap" packages in R.

LncRNA-based risk scoring systems
Firstly, the normalized expression values of multiple samples of the same patient were averaged. Then, univariate Cox analysis was performed to screen differentially expressed lncRNAs for a significant relationship with OS or RFS. Those lncRNAs associated with p < 0.05 were included in subsequent multivariate Cox regression, and the best model was selected using the method of backward stepwise. A risk scoring system was defined using a linear combination of the lncRNA expression levels, each multiplied by a regression coefficient β: Risk score = (β1 * lncRNA1 level) + (β2 * lncRNA2 level) + (β3 * lncRNA3 level) + (β4 * …level) + ...
Using this formula, a risk score was calculated for each patient, and the ability of this score to predict survival was evaluated by time-dependent receiver operating characteristic (ROC) curves in three years. Patients were divided into groups at high or low risk based on the median risk score, and shown on a non-cluster heat map. Kaplan-Meier survival curves were compared between patients at high or low risk. All these analyses were performed using R/Bioconductor (version 3.4.4).

Validation of the prognostic significance of risk scoring systems
Uni-or multivariate analyses were used to verify associations between risk score and survival. If these analyses did not return any significant results, stratified analyses were conducted to identify factors that might be affecting the results, based on the chi-squared test. All these analyses were carried out using SPSS 16.0 (IBM, Chicago, IL, USA), and the threshold for significance was a two-sided p < 0.05.

Co-expression and functional analyses of lncRNA-related mRNAs
Based on data from the 207 HCC patients, mRNAs whose expression co-varied with that of lncRNAs in the risk scoring system were identified based on a two-sided |Pearson correlation coefficient | > 0.40 and a z-test p < 0.01 (Fan and Liu, 2016). These mRNAs were analyzed for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment using the "clusterProfiler" package in R (Yu et al., 2012). Differences in enrichment were considered significant when associated with p < 0.05.

LncRNAs differentially expressed in the presence or absence of cirrhosis in HCC patients
Our analysis included the lncRNA expression profiles in 77 HCC patients with cirrhosis and 130 HCC without cirrhosis obtained from the TCGA database. A total of 81 differentially expressed lncRNAs were identified, 22 (27.16%) of which were up-regulated in cirrhosis and 59 (72.84%) down-regulated. Table 2 shows the first 20 up-and down-regulated lncRNAs, together with the corresponding values for log2FC, p, and FDR. Figure 1 shows all differentially expressed lncRNAs plotted according to -log10 (FDR) and log2FC, while Figure 2 shows a heatmap indicating the relative specificity of differentially expressed lncRNAs.
Based on their risk scores, patients with cirrhosis were and 5 years (20.6% vs 82.2%) ( Figure 4A). The area under the ROC curve (AUC) for the risk scoring system was 0.818 ( Figure 5A). After obtaining these promising results, we wished to check whether risk scoring based on lncRNAs could predict OS independently of clinicodemographic characteristics of the patients. This analysis is important given the heterogeneity of clinical presentation of HCC, and the range of factors that can influence prognosis. First we conducted univariate Cox regression to identify which clinical features were significantly associated with OS, and then we subjected this subset of factors to classified as being at low or high risk of poor OS using the median risk score as a cut-off ( Figure 3A). Kaplan-Meier curves showed that high-risk patients showed significantly lower OS rates at 3 years (65.8% vs 89.1%)
Based on their risk scores, patients with cirrhosis were classified as being at low or high risk of poor RFS using the median risk score as cut-off ( Figure 3B). Kaplan-Meier curves showed that high-risk patients showed significantly lower RFS rates at 3 years (25.3% vs 66.8%) and 5 years (16.9% vs 57.3%) ( Figure 4B). The AUC of the risk scoring system was 0.819 ( Figure 5B).
As we did above for OS, we wished to check whether      Functional analysis of mRNAs related to differentially expressed lncRNAs Correlation in expression levels between the lncRNAs in our risk scoring systems and mRNAs from RNA-seq data was analyzed using Pearson correlation, and several related mRNAs were identified (Supplementary Tables BMI, Body mass index; AFP, Alpha fetoprotein; *TNM staging systems Table 7. Stratified Analyses for the Risk Score of RFS in HCC Patients with Cirrhosis Using Chi-Square Test 1-2). These related mRNAs were analyzed using the KEGG signal pathway databases ( Figure 6). Functional enrichment analysis showed that mRNAs strongly related to the lncRNAs in our risk scoring systems were involved mainly in the pathway of Wnt signal (Supplementary Figure 1) and cytokine-cytokine receptor interaction (Supplementary Figure 2).

Discussion
HCC is associated with high morbidity and poor prognosis, and the factors that contribute to poor outcomes are likely to vary substantially in the presence or absence of cirrhosis (Grazi et al., 2003;Gassmann et al., 2010;Wang et al., 2013;Techathuvanan et al., 2015). Therefore we developed an lncRNA-based scoring system to assess a patient's risk of poor OS or RFS in the presence of cirrhosis. Our results establish the potential of four or five lncRNAs to serve as prognostic biomarkers with respective AUCs of 0.818 and 0.819, suggesting reasonable predictive power.
After adjusting for other clinical variables, uni-and/ or multivariate Cox analyses showed the lncRNA-based scoring system to be significantly associated with OS of HCC patients with cirrhosis, suggesting the lncRNA-based scoring system can serve as a significant independent predictor. However, the system for predicting RFS was not significantly associated with RFS in univariate analysis. Therefore, we stratified patients by low or high risk score and found the following factors to influence risk: AFP, new tumor event and cancer status. Our results suggest that these factors should be taken into account when predicting RFS in HCC patients with cirrhosis. Furthermore, other clinical factors also influenced the survival of HCC patients in our sample. Our study showed that age was an independent factor for the prediction of OS in HCC patients with cirrhosis, which was similar with previous studies (Wang et al., 2013;Techathuvanan et al., 2015;Liu et al., 2018). To gain insight into the prognostic significance of our risk scoring systems, we analyzed the molecular functions of genes highly related to these lncRNAs. We found that the lncRNAs associated with prognosis in HCC are involved mainly in Wnt signaling and cytokine-cytokine receptor interactions.
Several previous studies constructed risk scoring systems to predict the prognosis of HCC patients (Gu et al., 2018;Liao et al., 2018;Ma et al., 2018;Shi et al., 2018;Sui et al., 2018;Wu et al., 2018b;Zhao et al., 2018;Bai et al., 2019;Yan et al., 2019;Ye et al., 2019;Zhang et al., 2019). However, all these risk scoring systems are based on lncRNAs differentially expressed between HCC and normal samples, while our systems are based on lncRNAs differentially expressed between HCC patients with or without cirrhosis. Therefore, our systems may be more specific for HCC involving cirrhosis. One previous study identified a four-lncRNA signature as a prognostic indicator in cirrhotic HCC (Ma and Deng, 2019), with no overlap with our four or five-lncRNA signatures. Those authors extracted their data from a different expression profile dataset, and they used only stratified analyses based on clinical characteristics to validate the prognostic significance of their risk scoring system. In contrast, we applied multivariate Cox regression, which may be more rigorous. This may help explain why none of the lncRNAs in our risk scoring systems occurs in the previously published risk scoring systems for HCC prognosis. The importance of developing prognostic tools for HCC with cirrhosis is underscored by the observation of several microRNAs differentially expressed between patients with or without cirrhosis, particularly mir-149 (Mei et al., 2018), miR-24 and miR-27a (Salvi et al., 2013). These differences may help explain the different prognosis in the two types of HCC. Together, these studies suggest that non-coding RNAs may hold the key to unlocking new strategies in the diagnosis, treatment and management of HCC in the presence of cirrhosis.
Our results should be interpreted with caution in light of several limitations. First, we could not include type of HCC treatment in our multivariate Cox regression for lack of data, and this variable may affect OS and RFS. Second, some Cox analyses may have been biased because certain clinical data were unavailable for some patients. Third, although the AUC was 0.819 for the system to score risk of poor RFS in HCC patients with cirrhosis, univariate Cox analysis showed that the risk score was not a significant independent predictive factor of HCC prognosis. This model, in particular, should be re-examined carefully with a larger dataset. Last but not least, we could not divide the samples into training and test dataset, which may result in some bias. Indeed, future work should validate all our findings in larger, preferably multi-site patient populations.
In conclusion, we have defined lncRNA-based risk scoring systems that can effectively predict OS in HCC patients with cirrhosis. Our work highlights the potential usefulness of lncRNAs as prognostic biomarkers, and it provides several leads for the development of novel therapies.