XML Labels Compression using Prefix-encodings

Hanaa Al Zadjali, Siobhán North

Abstract

XML is the de facto standard for data representation and communication over the web, so there is considerable interest in querying XML data. Most querying approaches require the data to be labelled so that structural relationships between elements can be determined. Labelling is simple when the data does not change but complex when it does. In the day-to-day management of XML databases on the web, more information is usually inserted over time than deleted, and frequent insertions can produce large labels, which degrade query performance and can cause overflow problems. Many researchers have shown that prefix encoding usually gives the highest compression ratio in comparison with other encoding schemes; nonetheless, none of the existing prefix-encoding methods has been applied to XML labels. This research investigates compressing XML labels with different prefix-encoding methods in order to reduce the occurrence of overflow problems and improve query performance. The paper also compares the performance of several prefix encodings in terms of encoding and decoding time and compressed code size.
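The abstract does not say which prefix encodings are compared or which labelling scheme produces the labels. As an illustrative assumption only, the sketch below uses two classic universal prefix codes, Elias gamma and Fibonacci coding, to compress the integer components of a hypothetical Dewey-style label such as 1.3.2. Because no codeword is a prefix of another, the codewords can be concatenated without separators and still be decoded unambiguously. The label format and all function names (encode_label, decode_label and so on) are hypothetical and are not the authors' implementation.

```python
from typing import Callable, List

# Hypothetical illustration (not the authors' code): compress the integer
# components of a Dewey-style XML label such as "1.3.2" with two classic
# prefix (self-delimiting) codes.


def elias_gamma_encode(n: int) -> str:
    """Elias gamma: (bit length - 1) zeros followed by n in binary."""
    if n < 1:
        raise ValueError("Elias gamma only encodes positive integers")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> List[int]:
    """Split a concatenation of Elias gamma codewords back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def fibonacci_encode(n: int) -> str:
    """Fibonacci code: Zeckendorf bits (weights 1, 2, 3, 5, ...) plus a final 1."""
    if n < 1:
        raise ValueError("Fibonacci coding only encodes positive integers")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # keep only weights <= n
    bits = ["0"] * len(fibs)
    remainder = n
    for idx in range(len(fibs) - 1, -1, -1):  # greedy, largest weight first
        if fibs[idx] <= remainder:
            bits[idx] = "1"
            remainder -= fibs[idx]
    return "".join(bits) + "1"  # the appended 1 forms the "11" terminator


def fibonacci_decode(bits: str) -> List[int]:
    """Split a concatenation of Fibonacci codewords at each "11" terminator."""
    values: List[int] = []
    weight, next_weight = 1, 2
    total, prev = 0, "0"
    for b in bits:
        if b == "1" and prev == "1":  # end of the current codeword
            values.append(total)
            total, weight, next_weight, prev = 0, 1, 2, "0"
            continue
        if b == "1":
            total += weight
        weight, next_weight = next_weight, weight + next_weight
        prev = b
    return values


def encode_label(label: str, encoder: Callable[[int], str]) -> str:
    """Encode every component of a dotted label and concatenate the codewords."""
    return "".join(encoder(int(part)) for part in label.split("."))


def decode_label(bits: str, decoder: Callable[[str], List[int]]) -> str:
    """Rebuild the dotted label from the concatenated codewords."""
    return ".".join(str(v) for v in decoder(bits))


if __name__ == "__main__":
    label = "1.3.2"
    gamma_bits = encode_label(label, elias_gamma_encode)
    fib_bits = encode_label(label, fibonacci_encode)
    print(gamma_bits)  # 1011010
    print(fib_bits)    # 110011011
    print(decode_label(gamma_bits, elias_gamma_decode))  # 1.3.2
    print(decode_label(fib_bits, fibonacci_decode))      # 1.3.2
```

Small components receive short codewords (the value 1 costs one bit in Elias gamma and two in Fibonacci coding), which suits labels whose components are mostly small. Elias gamma spends 2⌊log₂ n⌋ + 1 bits on a component n, while a Fibonacci codeword ends in the pair "11", so a decoder can find codeword boundaries without counting leading zeros.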



Paper Citation


in Harvard Style

Zadjali H. and North S. (2016). XML Labels Compression using Prefix-encodings. In Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-186-1, pages 69-75. DOI: 10.5220/0005755500690075


in Bibtex Style

@conference{webist16,
author={Hanaa Al Zadjali and Siobhán North},
title={XML Labels Compression using Prefix-encodings},
booktitle={Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2016},
pages={69-75},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005755500690075},
isbn={978-989-758-186-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - XML Labels Compression using Prefix-encodings
SN - 978-989-758-186-1
AU - Zadjali H.
AU - North S.
PY - 2016
SP - 69
EP - 75
DO - 10.5220/0005755500690075