compression techniques by simply employing
other constructions instead of IPC and PFD.
6 CONCLUSIONS AND OPEN PROBLEMS
We have presented a novel technique for
efficiently compressing the document identifiers
stored in inverted files. The proposed
constructions combine smoothly with other
techniques proposed in the literature, such as IPC
and PFD, and produce compression results that are
competitive with, and in the majority of cases
better than, those of previous works. As the careful
reader will have noticed, handling the secondary
index with the extra identifiers constitutes the main
burden of our technique. This burden can be relieved
by using more arithmetic progressions to represent
each initial inverted list. Doing so involves a
trade-off that is worth exploring further, since it
could lead to a whole family of parametric
techniques. Moreover, it would be interesting to
investigate further techniques for handling the
secondary index that could lead to faster
decompression.
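The trade-off between the number of arithmetic progressions and the
number of identifiers that need separate handling can be illustrated
with a toy sketch. It is not the construction proposed in this paper:
it simply splits a sorted list into maximal constant-gap runs, keeps
the k longest runs as arithmetic progressions, and counts the
identifiers left over, which stand in here for the entries that a
secondary structure would have to store. The function names
(split_into_runs, cover, expand) and the sample list are assumptions
made purely for illustration.

```python
from typing import List, Tuple

# (first term, common difference, number of terms); hypothetical encoding
Progression = Tuple[int, int, int]


def split_into_runs(doc_ids: List[int]) -> List[Progression]:
    """Split a strictly increasing list into maximal constant-gap runs."""
    runs: List[Progression] = []
    i, n = 0, len(doc_ids)
    while i < n:
        if i + 1 == n:
            # A single trailing identifier forms a run of length 1.
            runs.append((doc_ids[i], 0, 1))
            break
        step = doc_ids[i + 1] - doc_ids[i]
        j = i + 1
        while j + 1 < n and doc_ids[j + 1] - doc_ids[j] == step:
            j += 1
        runs.append((doc_ids[i], step, j - i + 1))
        i = j + 1
    return runs


def cover(doc_ids: List[int], k: int) -> Tuple[List[Progression], List[int]]:
    """Keep the k longest runs as progressions; return them plus leftovers."""
    runs = split_into_runs(doc_ids)
    kept = set(sorted(runs, key=lambda r: r[2], reverse=True)[:k])
    leftovers = [
        start + t * step
        for (start, step, length) in runs
        if (start, step, length) not in kept
        for t in range(length)
    ]
    return sorted(kept), leftovers


def expand(progressions: List[Progression], leftovers: List[int]) -> List[int]:
    """Rebuild the original sorted list from progressions and leftovers."""
    ids = [s + t * d for (s, d, length) in progressions for t in range(length)]
    return sorted(ids + leftovers)


if __name__ == "__main__":
    ids = [3, 6, 9, 12, 20, 21, 22, 40, 57]  # made-up sample list
    for k in range(4):
        progressions, leftovers = cover(ids, k)
        # Leftover counts shrink as k grows: 9, 5, 2, 0
        print(k, progressions, len(leftovers))
```

In this toy setting each extra progression costs three integers but
removes a whole run from the leftover set, which is the kind of
parametric choice the paragraph above leaves open for further study.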
ACKNOWLEDGEMENTS
This research has been co-financed by the European
Union (European Social Fund-ESF) and Greek
national funds through the Operational Program
“Education and Lifelong Learning” of the National
Strategic Reference Framework (NSRF)-Research
Funding Program: Heracleitus II. Investing in
knowledge society through the European Social
Fund.
This research has been co-financed by the
European Union (European Social Fund-ESF) and
Greek national funds through the Operational
Program “Education and Lifelong Learning” of the
National Strategic Reference Framework
(NSRF)-Research Funding Program: Thales. Investing in
knowledge society through the European Social
Fund.
WEBIST 2013 - 9th International Conference on Web Information Systems and Technologies