Finding Genes Discriminating Smokers from Non-smokers by Applying a Growing Self-organizing Clustering Method to Large Airway Epithelium Cell Microarray Data

Abstract

Background: Cigarette smoking is the major risk factor for the development of lung cancer. Identifying the effects of tobacco on airway gene expression may provide insight into the causes. This research aimed to compare gene expression in large airway epithelium cells of normal smokers (n=13) and non-smokers (n=9) in order to find genes that discriminate between the two groups and to assess the effects of cigarette smoking on large airway epithelium cells.
Materials and Methods: Genes discriminating smokers from non-smokers were identified by applying a neural network clustering method, growing self-organizing maps (GSOM), to microarray data according to class discrimination scores. An index was computed based on the difference between the mean expression of each gene in the two groups. This clustering approach made it possible to compare thousands of genes simultaneously.
Results: The approach compared the mean expression of 7,129 genes in smokers and non-smokers simultaneously and classified the genes of large airway epithelium cells that were differentially expressed in smokers compared with non-smokers. Seven genes were identified with the largest expression differences between smokers and non-smokers: NQO1, H19, ALDH3A1, AKR1C1, ABHD2, GPX2 and ADH7. Most (NQO1, ALDH3A1, AKR1C1, H19 and GPX2) are known to be clinically notable in lung cancer studies. Furthermore, statistical discriminant analysis showed that these genes classified samples as smokers or non-smokers with 100% accuracy. In the resulting GSOM map, other nodes with high average discrimination scores included genes with alterations strongly related to lung cancer, such as AKR1C3, CYP1B1, UCHL1 and AKR1B10.
Conclusions: By comparing the expression of thousands of genes at the same time, this clustering approach revealed gene expression alterations in normal smokers. Most of the identified genes are strongly linked to lung cancer in the existing literature. The genes may be utilized to identify smokers at increased risk of lung cancer. A large-sample study is now recommended to determine the relationships between the genes ABHD2 and ADH7 and smoking.
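
The abstract says that a class discrimination index was computed from the difference between group means, but it does not give the formula. The sketch below is therefore only one plausible reading: it assumes a signal-to-noise style score (difference of means divided by the sum of the group standard deviations), which may differ from the index actually used. The function names, the gene labels and the toy expression values are all hypothetical and are not taken from the study data.

```python
import math
from statistics import mean, stdev


def discrimination_score(smokers, non_smokers):
    """Assumed score: (mean_smokers - mean_non_smokers) / (sd_smokers + sd_non_smokers).

    This signal-to-noise form is an assumption; the abstract only states that
    the index is based on the difference between the two group means.
    """
    diff = mean(smokers) - mean(non_smokers)
    spread = stdev(smokers) + stdev(non_smokers)
    if spread == 0:
        # Degenerate case: no within-group variation in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / spread


def rank_genes(expression, top_n=7):
    """Return the top_n gene labels ordered by absolute discrimination score."""
    scores = {
        gene: discrimination_score(smokers, non_smokers)
        for gene, (smokers, non_smokers) in expression.items()
    }
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_n]


# Hypothetical toy data: gene label -> (smoker values, non-smoker values).
toy_expression = {
    "GENE_A": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "GENE_B": ([5.0, 6.0, 7.0], [5.0, 6.0, 7.0]),
    "GENE_C": ([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]),
}

top_genes = rank_genes(toy_expression, top_n=2)
print(top_genes)
```

Ranking by absolute score treats up-regulated and down-regulated genes alike. In the approach described above, such scores feed the GSOM clustering, and they are not used as a stand-alone ranking as they are in this sketch.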

Keywords