A NEW ALGORITHM FOR TWIG PATTERN MATCHING

Yangjun Chen

Abstract

Tree pattern matching is one of the most fundamental tasks in XML query processing. Prior work has typically decomposed a twig pattern into binary structural relationships (parent-child and ancestor-descendant) or into paths, and then stitched these basic matches together with join operations. In this paper, we propose a new algorithm that explores both the document tree and the twig pattern bottom-up, and we show that join operations can be avoided completely. The new algorithm runs in O(|T|×|Q|) time and O(|Q|×leaf_T) space, where T and Q are the document tree and the twig pattern query, respectively, and leaf_T is the number of leaf nodes in T. Our experiments show that the method is effective, scalable and efficient in evaluating twig pattern queries.
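The abstract's central idea is that one bottom-up pass over T, evaluated against Q, can replace join-based stitching of partial matches. The Python sketch below illustrates that general idea with a standard dynamic-programming formulation. It is not the paper's algorithm, and the names DocNode, QueryNode and bottom_up_match are hypothetical. For every document node it computes, for each query node q, whether the query subtree rooted at q can be embedded with q mapped to that node, using only the flags returned by the node's children, so no structural joins are needed. Under these assumptions it runs in O(|T|×|Q|) time, but it keeps O(|Q|) flags per node on the current root-to-node path instead of reproducing the paper's O(|Q|×leaf_T) space bound, and it reports only the document nodes that match the query root, not complete match tuples.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DocNode:
    """Hypothetical document-tree node: an element label plus ordered children."""
    label: str
    children: List["DocNode"] = field(default_factory=list)


@dataclass
class QueryNode:
    """Hypothetical twig-pattern node. Each child edge is a pair (axis, child),
    where axis is "/" (parent-child) or "//" (ancestor-descendant)."""
    label: str
    children: List[Tuple[str, "QueryNode"]] = field(default_factory=list)


def bottom_up_match(doc_root: DocNode, query_root: QueryNode) -> List[DocNode]:
    """Return the document nodes onto which the query root can be mapped
    (generic bottom-up dynamic programming; not the paper's algorithm)."""
    # List query nodes in postorder; the query root therefore ends up last.
    q_nodes: List[QueryNode] = []

    def collect(q: QueryNode) -> None:
        for _, child in q.children:
            collect(child)
        q_nodes.append(q)

    collect(query_root)
    q_index = {id(q): i for i, q in enumerate(q_nodes)}
    nq = len(q_nodes)
    root_matches: List[DocNode] = []

    def visit(v: DocNode) -> Tuple[List[bool], List[bool]]:
        # child_match[i]: some child of v is a match for q_nodes[i]
        # below[i]:       some proper descendant of v is a match for q_nodes[i]
        child_match = [False] * nq
        below = [False] * nq
        for c in v.children:
            c_match, c_below = visit(c)  # children finish before v (bottom-up)
            for i in range(nq):
                if c_match[i]:
                    child_match[i] = True
                    below[i] = True
                elif c_below[i]:
                    below[i] = True
        # match[i]: the query subtree rooted at q_nodes[i] embeds with q_nodes[i] -> v
        match = [False] * nq
        for i, q in enumerate(q_nodes):
            if q.label != v.label:
                continue
            match[i] = all(
                (child_match if axis == "/" else below)[q_index[id(qc)]]
                for axis, qc in q.children
            )
        if match[-1]:  # last entry is the query root
            root_matches.append(v)
        return match, below

    visit(doc_root)
    return root_matches
```

For example, the twig a[b/c]//d (an a with a child b that itself has a child c, plus a descendant d at any depth) would be written as QueryNode("a", [("/", QueryNode("b", [("/", QueryNode("c"))])), ("//", QueryNode("d"))]). Because visit is recursive, very deep documents would need an explicit stack rather than Python recursion.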


Paper Citation


in Harvard Style

Chen Y. (2007). A NEW ALGORITHM FOR TWIG PATTERN MATCHING. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 44-51. DOI: 10.5220/0002356300440051


in BibTeX Style

@conference{iceis07,
author={Yangjun Chen},
title={A NEW ALGORITHM FOR TWIG PATTERN MATCHING},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2007},
pages={44-51},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002356300440051},
isbn={978-972-8865-88-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - A NEW ALGORITHM FOR TWIG PATTERN MATCHING
SN  - 978-972-8865-88-7
AU  - Chen Y.
PY  - 2007
SP  - 44
EP  - 51
DO  - 10.5220/0002356300440051
ER  - 