Authors:
Taysir H. A. Soliman [1]; Tarek F. Gharib [2]; Alshaimaa Abo-Alian [2] and Mohammed Alsharkawy [2]
Affiliations:
[1] Faculty of Computer and Information, Assiut University, Egypt
[2] Faculty of Computer and Information Sciences, Ain Shams University, Egypt
Keyword(s):
Lossless Compression Algorithm, encoding, approximate repeats, palindrome.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Homology search is the starting point for both genomics and proteomics research. However, the growing volume of DNA sequence data requires efficient computational algorithms for sequence comparison and analysis. Standard compression algorithms are of little help here: they cannot compress DNA sequences effectively because they do not take the special characteristics of DNA into account (i.e., DNA sequences contain many approximate repeats, and complementary palindromes are frequent). Recently, new algorithms have been proposed to compress DNA sequences, often based on detecting long approximate repeats. The current work proposes a Lossless Compression Algorithm (LCA) that provides a new encoding method. On nine different datasets, LCA achieves a better compression ratio than the existing DNA-oriented compression algorithms GenCompress and DNACompress.
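To make the two DNA-specific characteristics named in the abstract concrete, the following is a minimal illustrative sketch, not the LCA encoding itself, which the abstract does not describe. All function names are hypothetical. It assumes sequences over the alphabet A, C, G, T only, and it treats an "approximate repeat" as an equal-length match with a bounded number of substitutions; DNA compressors may also allow insertions and deletions, which this sketch does not cover.

```python
# Illustrative only: hypothetical helpers, not the method proposed in the paper.

# Watson-Crick base pairing: A<->T, C<->G (assumes an A/C/G/T-only alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def is_complementary_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_approximate_repeat(a: str, b: str, max_mismatches: int) -> bool:
    """Substitution-only notion of an approximate repeat (an assumption)."""
    return len(a) == len(b) and hamming_distance(a, b) <= max_mismatches
```

For example, `GAATTC` is a complementary palindrome because its reverse complement is again `GAATTC`, and `ACGTACGT` and `ACGTTCGT` are approximate repeats of each other with one mismatch. A general-purpose compressor that only looks for exact, forward-direction repeats would miss both kinds of redundancy.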