Distributed ICSA Clustering Approach for Large Scale Protein Sequences and Cancer Diagnosis

Document Type: Research Articles

Authors

1 Department of Computer Applications, Selvam College of Technology, Namakkal, India.

2 Department of Computer Science, Government Arts and Science College, Kangayam, India.

3 Department of Computer Applications, Kongu Engineering College, Erode, Tamilnadu, India.

Abstract

Objective: With the rapid growth of biological sequence databases, handling such volumes of data has become
increasingly difficult. Clustering has become one of the principal research objectives in structural and functional
genomics. However, exact clustering algorithms, such as partitional and hierarchical clustering, scale relatively poorly
in run time and memory usage on large sets of sequences. Methods: To overcome these performance limits, heuristic
optimizations such as the Cuckoo Search Algorithm with genetic operators (ICSA) have been implemented in a
distributed computing environment. The proposed ICSA is a global optimization algorithm that can cluster large numbers
of protein sequences by running on distributed computing hardware. Results: It allocates both memory and computing
resources efficiently. Compared with the latest research results, our method requires only 15% of the execution time and
obtains higher-quality information about protein sequences. Conclusion: The experimental analysis shows that
clustering large protein sequence data sets with the ICSA technique, instead of alignment methods alone, greatly
reduces execution time and improves the efficiency of this important task in molecular biology. Moreover, the
new era of proteomics is providing extensive knowledge of mutations and other alterations in cancer studies.
