Authors:
Artur J. Ferreira (1); Arlindo L. Oliveira (2) and Mário A. T. Figueiredo (3)
Affiliations:
(1) Instituto Superior de Engenharia de Lisboa; Instituto de Telecomunicações, Portugal
(2) Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento; Instituto Superior Técnico, Portugal
(3) Instituto de Telecomunicações; Instituto Superior Técnico, Portugal
Keyword(s):
Lempel-Ziv, Lossless Data Compression, Suffix Arrays, Suffix Trees, String Matching.
Related Ontology Subjects/Areas/Topics:
Joint Source/Channel Coding; Multimedia; Multimedia and Communications; Multiple Description Coding; Telecommunications
Abstract:
Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder are highly asymmetric in their time and memory requirements, with the encoder being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, the suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix tree based LZ encoder shows that the compression ratios are roughly the same. The amount of memory required by the suffix array is fixed and much lower than the variable memory requirements of the suffix tree encoder, which depend on the text being encoded. We conclude that suffix arrays offer a very interesting tradeoff between time, memory, and compression ratio when compared with suffix trees, which makes them preferable in some compression scenarios.
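As a rough illustration of the idea summarized in the abstract, the following Python sketch keeps an LZ77-style dictionary (the sliding window) in a suffix array and searches it for the longest match of the lookahead buffer. It is not the algorithm proposed in the paper: the function names (build_suffix_array, longest_match, lz_encode, lz_decode), the token layout (position, length, next symbol), and the window and lookahead sizes are all assumptions made for this example, and the suffix array is rebuilt naively at every step for clarity rather than speed.

```python
def build_suffix_array(text):
    # Naive construction (assumption: clarity over speed): sort the suffix
    # start positions by the suffixes they point to.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _char_at(dictionary, start, k):
    # k-th symbol of the suffix beginning at `start`; "" past the end,
    # which compares lower than any real symbol.
    i = start + k
    return dictionary[i] if i < len(dictionary) else ""


def longest_match(dictionary, sa, lookahead):
    # Narrow the suffix-array interval [lo, hi) one symbol at a time using
    # binary search; returns (position, length) of the longest match.
    lo, hi = 0, len(sa)
    pos, length = 0, 0
    for k, ch in enumerate(lookahead):
        a, b = lo, hi
        while a < b:  # first suffix whose k-th symbol is >= ch
            m = (a + b) // 2
            if _char_at(dictionary, sa[m], k) < ch:
                a = m + 1
            else:
                b = m
        new_lo, b = a, hi
        while a < b:  # first suffix whose k-th symbol is > ch
            m = (a + b) // 2
            if _char_at(dictionary, sa[m], k) <= ch:
                a = m + 1
            else:
                b = m
        if new_lo == a:  # no suffix extends the match
            break
        lo, hi = new_lo, a
        pos, length = sa[lo], k + 1
    return pos, length


def lz_encode(text, window=32, lookahead=8):
    # Hypothetical LZ77-style encoder emitting (position, length, next symbol).
    tokens, i = [], 0
    while i < len(text):
        start = max(0, i - window)
        dictionary = text[start:i]
        sa = build_suffix_array(dictionary)
        # Keep the last lookahead symbol free so a literal always follows.
        pos, length = longest_match(dictionary, sa, text[i:i + lookahead][:-1])
        tokens.append((pos, length, text[i + length]))
        i += length + 1
    return tokens


def lz_decode(tokens, window=32):
    # Mirror of lz_encode: copy `length` symbols from the window, then the literal.
    out = []
    for pos, length, ch in tokens:
        start = max(0, len(out) - window)
        out.extend(out[start + pos:start + pos + length])
        out.append(ch)
    return "".join(out)
```

In this sketch the suffix array occupies one integer per dictionary symbol, so its size is set by the window length alone and not by the content of the text, which is in line with the fixed memory requirement described in the abstract.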