Bharti, R. K., Verma, A., and Singh, R. (2011). A biological
sequence compression based on cross chromosomal
similarities using variable length LUT. International
Journal of Biometrics and Bioinformatics, 4:217–223.
Bhola, V., Bopardikar, A. S., Narayanan, R., Lee, K., and
Ahn, T. (2011). No-reference compression of genomic
data stored in FASTQ format. In BIBM, pages 147–150.
Boyer, R. S. and Moore, J. S. (1977). A fast string searching
algorithm. Commun. ACM, 20(10):762–772.
Brandon, M. C., Wallace, D. C., and Baldi, P. (2009). Data
structures and compression algorithms for genomic
sequence data. Bioinformatics, 25(14):1731–1738.
Chen, W., Lu, Y., Lai, F., Chien, Y., and Hwu, W. (2011).
Integrating human genome database into electronic
health record with sequence alignment and compression
mechanism. J Med Syst.
Chiang, G.-T., Clapham, P., Qi, G., Sale, K., and Coates, G.
(2011). Implementing a genomic data management
system using iRODS in the Wellcome Trust Sanger
Institute. BMC Bioinformatics, 12(1):361+.
Daily, K., Rigor, P., Christley, S., Xie, X., and Baldi, P.
(2010). Data structures and compression algorithms
for high-throughput sequencing technologies. BMC
Bioinformatics, 11(1):514+.
Deorowicz, S. and Grabowski, S. (2011). Robust Relative
Compression of Genomes with Random Access.
Bioinformatics.
Duc Cao, M., Dix, T. I., Allison, L., and Mears, C. (2007).
A simple statistical algorithm for biological sequence
compression. In Proceedings of the 2007 Data Compression
Conference, pages 43–52, Washington, DC,
USA. IEEE Computer Society.
Grabowski, S. and Deorowicz, S. (2011). Engineering relative
compression of genomes. CoRR, abs/1103.2351.
Hunt, E., Atkinson, M. P., and Irving, R. W. (2002).
Database indexing for large DNA and protein sequence
collections. The VLDB Journal, 11(3):256–271.
Kahn, S. D. (2011). On the future of genomic data. Science,
331(6018):728–729.
Kaipa, K. K., Bopardikar, A. S., Abhilash, S., Venkataraman,
P., Lee, K., Ahn, T., and Narayanan, R.
(2010). Algorithm for DNA sequence compression
based on prediction of mismatch bases and repeat location.
In Bioinformatics and Biomedicine Workshops
(BIBMW).
Kent, W. J. (2002). BLAT: The BLAST-Like Alignment
Tool. Genome Research, 12(4):656–664.
Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M.,
Pringle, T. H., Zahler, A. M., and Haussler, D. (2002).
The human genome browser at UCSC. Genome Res,
12(6):996–1006.
Kuruppu, S., Beresford-Smith, B., Conway, T., and Zobel,
J. (2012). Iterative dictionary construction for compression
of large DNA data sets. IEEE/ACM Trans.
Comput. Biol. Bioinformatics, 9(1):137–149.
Kuruppu, S., Puglisi, S. J., and Zobel, J. (2010). Relative
Lempel-Ziv compression of genomes for large-scale
storage and retrieval. In Proceedings of the 17th International
Conference on String Processing and Information
Retrieval, SPIRE'10, pages 201–206, Berlin,
Heidelberg. Springer-Verlag.
Mishra, K. N., Aaggarwal, D. A., Abdelhadi, D. E., and
Srivastava, D. P. C. (2010). An efficient horizontal
and vertical method for online DNA sequence compression.
International Journal of Computer Applications,
3(1):39–46. Published By Foundation of Computer
Science.
Pande, P. and Matani, D. (2011). Compressing the human
genome against a reference. Technical report, Stony
Brook University.
Peltola, H. and Tarhio, J. (2003). Alternative algorithms for
bit-parallel string matching. In SPIRE, pages 80–94.
Pennisi, E. (2011). Will Computers Crash Genomics?
Science, 331(6018):666–668.
Pratas, D. and Pinho, A. J. (2011). Compressing the human
genome using exclusively Markov models. In
Rocha, M. P., Rodríguez, J. M. C., Fdez-Riverola, F.,
and Valencia, A., editors, PACBB, volume 93 of Advances
in Intelligent and Soft Computing, pages 213–220.
Springer.
Schadt, E. E., Turner, S., and Kasarskis, A. (2010). A window
into third-generation sequencing. Human Molecular
Genetics, 19(R2):R227–R240.
Ukkonen, E. (1995). On-Line Construction of Suffix Trees.
Algorithmica, 14(3):249–260.
Vey, G. (2009). Differential direct coding: a compression
algorithm for nucleotide sequence data. Database: The Journal
of Biological Databases and Curation, 2009.
Välimäki, N., Mäkinen, V., Gerlach, W., and Dixit, K. (2009).
Engineering a compressed suffix tree implementation.
ACM Journal of Experimental Algorithmics, 14.
Wan, R., Anh, V. N., and Asai, K. (2011). Transformations
for the compression of FASTQ quality scores of
next generation sequencing data. Bioinformatics.
KDIR 2012 - International Conference on Knowledge Discovery and Information Retrieval