Genome Sequences as Media Files - Towards Effective, Efficient, and functional Compression of Genomic Data

Tom Paridaens, Wesley De Neve, Peter Lambert, Rik Van De Walle

Abstract

In the past decade, the cost of sequencing a human genome has dropped from millions of dollars to a few thousand dollars. As a result of this exponential drop in cost, the amount of sequencing data is growing exponentially. Storage capacity per dollar, however, only doubles roughly every 18 months, so storage is not keeping pace with the data and storing genomic data is becoming a serious problem. Research has therefore focused on improving compression ratios. Unfortunately, functionality such as random access and compressed-domain analysis is often neglected. Consequently, genomic data files have to be transmitted and decompressed completely before any analysis or editing can be done, even if a researcher is only interested in one specific gene. In this paper, we describe our view on how techniques that are widely used for the compression and management of media files can be introduced into the world of genomic data. With these techniques, we aim to lower the network load through more efficient transmission, to lower the processing load through more efficient (compressed-domain) analysis of genomic data, to provide configurable compression tools, and to provide privacy protection through encryption and DRM.
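To make the random-access idea concrete, the following is a minimal sketch, assuming plain `zlib` compression applied to fixed-size blocks of a nucleotide string; it is not the authors' codec. Each block is compressed independently, loosely like independently decodable frames in a video stream, so a single region (for example, one gene) can be recovered by decompressing only the blocks that overlap it. The names `BLOCK_SIZE`, `BlockCompressedSequence`, `encode`, and `region` are hypothetical and chosen for illustration only.

```python
import random
import zlib
from dataclasses import dataclass, field

# Hypothetical parameter: bases per independently decodable block.
BLOCK_SIZE = 1024


@dataclass
class BlockCompressedSequence:
    """A nucleotide string stored as independently zlib-compressed blocks."""
    length: int
    block_size: int
    blocks: list = field(default_factory=list)  # compressed bytes, one entry per block

    @classmethod
    def encode(cls, sequence: str, block_size: int = BLOCK_SIZE) -> "BlockCompressedSequence":
        # Compress each block on its own so it can later be decoded without its neighbours.
        blocks = [
            zlib.compress(sequence[i:i + block_size].encode("ascii"))
            for i in range(0, len(sequence), block_size)
        ]
        return cls(length=len(sequence), block_size=block_size, blocks=blocks)

    def region(self, start: int, end: int) -> tuple:
        """Return (sequence[start:end], number of blocks decompressed)."""
        if not 0 <= start <= end <= self.length:
            raise ValueError("region out of bounds")
        if start == end:
            return "", 0
        # Only the blocks overlapping [start, end) are touched.
        first = start // self.block_size
        last = (end - 1) // self.block_size
        decoded = "".join(
            zlib.decompress(self.blocks[b]).decode("ascii") for b in range(first, last + 1)
        )
        offset = first * self.block_size
        return decoded[start - offset:end - offset], last - first + 1


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    store = BlockCompressedSequence.encode(genome)
    gene, n_blocks = store.region(50_000, 52_000)
    assert gene == genome[50_000:52_000]
    print(n_blocks, "of", len(store.blocks))  # prints: 3 of 98
```

Smaller blocks make region queries cheaper but usually lower the compression ratio, since each block is compressed without context from its neighbours; this is the same trade-off media codecs make when choosing how often to insert independently decodable frames.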



Paper Citation


in Harvard Style

Paridaens T., De Neve W., Lambert P. and Van De Walle R. (2014). Genome Sequences as Media Files - Towards Effective, Efficient, and functional Compression of Genomic Data. In Doctoral Consortium - DCBIOSTEC (BIOSTEC 2014), ISBN Not Available, pages 3-9.


in Bibtex Style

@conference{dcbiostec14,
author={Tom Paridaens and Wesley De Neve and Peter Lambert and Rik Van De Walle},
title={Genome Sequences as Media Files - Towards Effective, Efficient, and functional Compression of Genomic Data},
booktitle={Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2014)},
year={2014},
pages={3-9},
publisher={SciTePress},
organization={INSTICC},
doi={},
isbn={Not Available},
}


in EndNote Style

TY  - CONF
JO  - Doctoral Consortium - DCBIOSTEC, (BIOSTEC 2014)
TI  - Genome Sequences as Media Files - Towards Effective, Efficient, and functional Compression of Genomic Data
SN  - Not Available
AU  - Paridaens T.
AU  - De Neve W.
AU  - Lambert P.
AU  - Van De Walle R.
PY  - 2014
SP  - 3
EP  - 9
DO  -
ER  -