On the other hand, there are approaches that
concentrate on computing the set of query results
more efficiently.
Early approaches computed the LCA for a set of
given keywords on the fly (Schmidt et al., 2001).
Other approaches try to enhance the query
performance by using a pre-computed index. The
approach of Florescu et al. (2000) is based on
storing the inverted element lists within a
relational database.
Li et al. (2004) present an approach based on
computing the MLCA with the help of XQuery
operations and a second approach that, similar to
XKSearch (Xu and Papakonstantinou, 2005),
processes the document bottom-up in order to
compute the index and stores all nodes not yet
completely parsed on a stack. Whenever a node is
found to be a result, all its ancestors are removed
from the stack, as they can no longer form a result.
JDeweyJoin (Chen and Papakonstantinou, 2010)
returns the top-k most relevant results. It computes
the results bottom-up based on a kind of join on
the lists of DeweyIDs of the nodes in the inverted
element lists. The list entries are sorted according
to a weight function, so that the computation can
stop after k results have been found.
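To illustrate why DeweyIDs suit this kind of bottom-up computation, the following minimal sketch assumes the classic Dewey labeling, in which each node is identified by the path of child positions from the root (e.g. 1.2.3). Under this assumption, the LCA of two nodes is the longest common prefix of their labels. The function names and sample labels are hypothetical and are not taken from JDeweyJoin, which uses its own labeling variant.

```python
from itertools import takewhile


def parse_dewey(label):
    """Turn a Dewey label such as '1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in label.split("."))


def dewey_lca(a, b):
    """Return the Dewey ID of the lowest common ancestor of a and b.

    With Dewey labeling, an ancestor's ID is a prefix of each
    descendant's ID, so the LCA is the longest common prefix.
    """
    common = takewhile(lambda pair: pair[0] == pair[1], zip(a, b))
    return tuple(x for x, _ in common)


if __name__ == "__main__":
    print(dewey_lca(parse_dewey("1.2.3"), parse_dewey("1.2.5.1")))
```

Because the LCA can be read off the labels alone, the inverted element lists can be joined without accessing the document itself.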
Zhou et al. (2012) present an approach that
enriches the inverted element lists with all ancestor
nodes of the nodes that have the keyword as label.
Therefore, they can compute the SLCAs by
intersecting the inverted element lists of the query
keywords and by finally removing each result
candidate that has a descendant which is also a
result candidate.
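The two steps described above (intersection, then removal of candidates that have a descendant candidate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Zhou et al.: nodes are assumed to be identified by Dewey-style tuples, the ancestor enrichment is done at query time rather than in the index, and the final filter is a simple quadratic check. All function names are hypothetical.

```python
def is_ancestor(a, b):
    """True if Dewey ID a is a proper ancestor of Dewey ID b."""
    return len(a) < len(b) and b[:len(a)] == a


def enrich(nodes):
    """Return the given nodes together with all of their ancestors."""
    enriched = set()
    for node in nodes:
        for depth in range(1, len(node) + 1):
            enriched.add(node[:depth])
    return enriched


def slca(inverted_lists):
    """Compute the SLCAs from one list of matching nodes per keyword."""
    if not inverted_lists:
        return []
    enriched = [enrich(nodes) for nodes in inverted_lists]
    # Every node in the intersection contains all keywords in its sub-tree.
    candidates = set.intersection(*enriched)
    # Keep only candidates without a descendant that is also a candidate.
    return sorted(
        c for c in candidates
        if not any(is_ancestor(c, other) for other in candidates)
    )


if __name__ == "__main__":
    keyword_a = [(1, 1, 1), (1, 2, 1)]
    keyword_b = [(1, 1, 2), (1, 3)]
    print(slca([keyword_a, keyword_b]))
```

In the sample data, both the root (1) and the node (1, 1) contain both keywords, but only (1, 1) remains because it is a descendant of the root.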
Our paper focuses on efficient result computation.
It follows the anchor-based approach presented by
Sun et al. (2007). However, in contrast to all other
contributions, instead of computing an XML index,
we compute a DAG-Index. This enables us to
compute several keyword search results in parallel,
and thereby speeds up the SLCA computation. To
the best of our knowledge, DAG-Index is the first
approach that improves keyword search by using
XML compression before computing the search
index.
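The idea of sharing common sub-trees can be illustrated with a standard technique, hash-consing of sub-trees. The following is a minimal sketch under the assumption that a tree is given as nested (label, children) pairs; it is not the DAG-Index construction itself, and the function name and the sample document are hypothetical.

```python
def compress_to_dag(tree, table=None):
    """Map a tree to a DAG in which equal sub-trees share one node.

    A tree is a (label, [children]) pair. The returned table maps each
    distinct (label, child ids) signature to a node id, so structurally
    identical sub-trees receive the same id and are stored only once.
    """
    if table is None:
        table = {}
    label, children = tree
    child_ids = tuple(compress_to_dag(child, table)[0] for child in children)
    signature = (label, child_ids)
    if signature not in table:
        table[signature] = len(table)
    return table[signature], table


if __name__ == "__main__":
    book = ("book", [("title", []), ("author", [])])
    library = ("library", [book, book])
    root_id, table = compress_to_dag(library)
    # The tree has 7 nodes; the DAG needs only 4 distinct ones.
    print(root_id, len(table))
```

Any index entry computed for a shared DAG node is then valid for every occurrence of the corresponding sub-tree in the original document.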
5 SUMMARY AND CONCLUSIONS
Keyword search is of increasing interest for finding
relevant data within large XML document
collections, especially for the large majority of
non-expert users. As the amount of publicly
available data in the XML format grows, so does
the need for fast keyword search techniques. We
have presented DAG-Index, an indexing and
keyword search strategy for large XML documents
that compresses an XML tree and the search index
in such a way that common sub-trees have to be
indexed only once. As a consequence, a repeated
keyword search within a repeated sub-tree can be
avoided. We consider our DAG-Index-based
keyword search to be a significant contribution to
improving search performance, especially for the
majority of non-expert users.
REFERENCES
Chen, L. J., & Papakonstantinou, Y. (2010). Supporting
top-K keyword search in XML databases. Proceedings
of the 26th International Conference on Data
Engineering. Long Beach, CA, USA.
Florescu, D., Kossmann, D., & Manolescu, I. (2000).
Integrating keyword search into XML query
processing. Computer Networks, 33.
Guo, L., Shao, F., Botev, C., & Shanmugasundaram, J.
(2003). XRANK: Ranked Keyword Search over XML
Documents. Proceedings of the 2003 ACM SIGMOD
International Conference on Management of Data.
San Diego, California, USA.
Li, J., Liu, C., Zhou, R., & Wang, W. (2010). Suggestion
of promising result types for XML keyword search.
13th International Conference on Extending Database
Technology. Lausanne, Switzerland.
Li, Y., Yu, C., & Jagadish, H. V. (2004). Schema-Free
XQuery. (e)Proceedings of the Thirtieth International
Conference on Very Large Data Bases. Toronto,
Canada.
Petkova, D., Croft, W. B., & Diao, Y. (2009). Refining
Keyword Queries for XML Retrieval by Combining
Content and Structure. Advances in Information
Retrieval, 31th European Conference on IR Research.
Toulouse, France.
Schmidt, A., Kersten, M. L., & Windhouwer, M. (2001).
Querying XML Documents Made Easy: Nearest
Concept Queries. Proceedings of the 17th
International Conference on Data Engineering.
Heidelberg, Germany.
Sun, C., Chan, C. Y., & Goenka, A. K. (2007). Multiway
SLCA-based keyword search in XML data.
Proceedings of the 16th International Conference on
World Wide Web. Banff, Alberta, Canada.
Xu, Y., & Papakonstantinou, Y. (2005). Efficient
Keyword Search for Smallest LCAs in XML
Databases. Proceedings of the ACM SIGMOD
International Conference on Management of Data.
Baltimore, Maryland, USA.
Zhou, J., Bao, Z., Wang, W., Ling, T. W., Chen, Z., Lin,
X., et al. (2012). Fast SLCA and ELCA Computation
for XML Keyword Queries Based on Set Intersection.
IEEE 28th International Conference on Data
Engineering. Washington, DC, USA.
WEBIST 2013 - 9th International Conference on Web Information Systems and Technologies