Article overview
DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database
Heba Afify; Muhammad Islam; Manal Abdel Wahed
Date: 1 Sep 2011
Abstract: Modern biological science produces vast amounts of genomic sequence data.
This is fuelling the need for efficient algorithms for sequence compression and
analysis. Data compression and the associated techniques coming from
information theory are often perceived as being of interest for data
communication and storage. In recent years, a substantial effort has been made
for the application of textual data compression techniques to various
computational biology tasks, ranging from storage and indexing of large
datasets to comparison of genomic databases. This paper presents a differential
compression algorithm that produces difference sequences according to an
op-code table in order to optimize the compression of homologous sequences in
a dataset. The stored data therefore consist of a reference sequence, the set
of differences, and the locations of those differences, instead of each
sequence stored individually. This algorithm does not require a priori knowledge
about the statistics of the sequence set. The algorithm was applied to three
different datasets of genomic sequences and achieved up to a 195-fold compression
rate, corresponding to a 99.4% space saving.
Source: arXiv, 1109.0094
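The idea described in the abstract, storing one reference sequence plus the differences and their locations for each homologous sequence, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the op-code table, so the three op-codes used here (`S` for substitution, `I` for insertion, `D` for deletion), the function names, and the use of the standard library's `difflib.SequenceMatcher` to find the differences are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical op-codes; the paper's actual op-code table is not given in the abstract.
OPCODES = {"replace": "S", "insert": "I", "delete": "D"}


def encode(reference, target):
    """Describe target as a list of differences against reference.

    Each difference is (op_code, position_in_reference, length_in_reference, payload).
    """
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical regions are not stored at all
        diffs.append((OPCODES[tag], i1, i2 - i1, target[j1:j2]))
    return diffs


def decode(reference, diffs):
    """Rebuild the original sequence from the reference and its differences."""
    parts = []
    pos = 0
    for _op, start, length, payload in diffs:
        parts.append(reference[pos:start])  # unchanged stretch of the reference
        parts.append(payload)               # substituted or inserted bases
        pos = start + length                # skip replaced or deleted bases
    parts.append(reference[pos:])
    return "".join(parts)


def space_saving(compression_ratio):
    """Fraction of space saved for a given compression ratio (e.g. 195-fold)."""
    return 1.0 - 1.0 / compression_ratio


reference = "ACGTACGTACGTACGT"
target = "ACGTACCTACGTAGT"
diffs = encode(reference, target)
restored = decode(reference, diffs)
```

Because `decode(reference, encode(reference, target))` returns `target` exactly, the scheme is lossless. `space_saving(195)` evaluates to 1 - 1/195, about 0.9949, which is in line with the roughly 99.4% space saving quoted for a 195-fold compression rate.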