A Virtual Document Approach for Keyword Search in Databases

Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo

Abstract

In recent years, the amount of information available in a variety of data sources, such as those found on the Web, has grown rapidly. This information can be classified by its structure into three forms: unstructured (free-text documents), semi-structured (XML documents) and structured (a relational database or XML database). Keyword search, which is simple for anyone familiar with Web search engines, has gained wide acceptance for searching massive data sources such as the Web. It has also become an alternative for users who have no knowledge of the formal query languages and schemas used with structured data. Traditional approaches to keyword search over relational databases include Steiner trees, candidate networks and, more recently, tuple units. Nevertheless, these methods have some limitations. In this paper we propose a Virtual Document (VD) approach for keyword search in databases. We represent the structured information as graphs and propose an index that captures the structural relationships of the information. This approach produces fast and accurate search results. We have conducted extensive experiments on large-scale real databases, and the results demonstrate that our approach achieves high search efficiency and high accuracy for keyword search in databases.
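The abstract describes the approach only at a high level (a graph representation of the data plus an index that captures structural relationships). The Python sketch below illustrates one plausible reading of the general virtual-document idea; it is not the authors' implementation. Assumptions: each tuple's virtual document is its own text plus the text of every tuple within `radius` foreign-key hops in the tuple graph; an inverted index maps terms to virtual documents; queries use AND semantics and are ranked by a simple tf-idf sum. The toy tables, `FOREIGN_KEYS`, `radius`, and the scoring function are all hypothetical.

```python
# Minimal sketch of a generic virtual-document keyword index.
# Hypothetical illustration only; not the paper's actual method or data.
import math
import re
from collections import Counter, defaultdict, deque

# Hypothetical toy database: each table is a list of rows; every row has an "id" key.
TABLES = {
    "author": [{"id": 1, "name": "Lopez"}, {"id": 2, "name": "Sosa"}],
    "paper": [
        {"id": 10, "title": "Keyword search in databases", "author_id": 1},
        {"id": 11, "title": "Graph indexing", "author_id": 2},
        {"id": 12, "title": "Keyword ranking", "author_id": 1},
    ],
}
# Hypothetical foreign keys: (child table, FK column, parent table, referenced column).
FOREIGN_KEYS = [("paper", "author_id", "author", "id")]


def tokenize(text):
    """Lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tuple_graph(tables, foreign_keys):
    """Nodes are (table, id); an undirected edge joins tuples linked by a foreign key."""
    rows = {(t, r["id"]): r for t, rs in tables.items() for r in rs}
    adj = defaultdict(set)
    for child, fk_col, parent, pk_col in foreign_keys:
        parent_of = {r[pk_col]: (parent, r["id"]) for r in tables[parent]}
        for r in tables[child]:
            target = parent_of.get(r[fk_col])
            if target is not None:
                adj[(child, r["id"])].add(target)
                adj[target].add((child, r["id"]))
    return rows, adj


def virtual_document(node, rows, adj, radius=1):
    """Term counts of `node` plus every tuple within `radius` foreign-key hops (BFS)."""
    seen, queue, terms = {node}, deque([(node, 0)]), Counter()
    while queue:
        cur, depth = queue.popleft()
        for value in rows[cur].values():
            if isinstance(value, str):  # only textual attributes are indexed
                terms.update(tokenize(value))
        if depth < radius:
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return terms


def build_index(rows, adj, radius=1):
    """Inverted index: term -> {virtual-document id: term frequency}."""
    index = defaultdict(dict)
    for node in rows:
        for term, tf in virtual_document(node, rows, adj, radius).items():
            index[term][node] = tf
    return index


def search(index, n_docs, query, k=3):
    """AND semantics; rank by sum of tf * idf over query terms; return top-k ids."""
    terms = tokenize(query)
    if not terms or any(t not in index for t in terms):
        return []
    candidates = set.intersection(*(set(index[t]) for t in terms))

    def score(doc):
        return sum(index[t][doc] * math.log(1 + n_docs / len(index[t])) for t in terms)

    # Ties are broken by the (table, id) key so results are deterministic.
    return sorted(candidates, key=lambda d: (-score(d), d))[:k]


rows, adj = build_tuple_graph(TABLES, FOREIGN_KEYS)
index = build_index(rows, adj)
print(search(index, len(rows), "lopez keyword"))
# [('author', 1), ('paper', 10), ('paper', 12)]
```

In this toy data no single tuple contains both "lopez" and "keyword", yet the virtual documents of author 1 and papers 10 and 12 do, so the query is answered from the inverted index alone, without enumerating join trees or candidate networks at query time. The trade-off in this sketch is index size: neighbouring tuples share terms, so the same text is indexed several times, and the hypothetical `radius` parameter controls how much.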


Paper Citation


in Harvard Style

Lopez-Veyna J. I., Sosa-Sosa V. J. and Lopez-Arevalo I. (2012). A Virtual Document Approach for Keyword Search in Databases. In Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-18-1, pages 39-48. DOI: 10.5220/0004048700390048


in Bibtex Style

@conference{data12,
author={Jaime I. Lopez-Veyna and Victor J. Sosa-Sosa and Ivan Lopez-Arevalo},
title={A Virtual Document Approach for Keyword Search in Databases},
booktitle={Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA},
year={2012},
pages={39-48},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004048700390048},
isbn={978-989-8565-18-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA
TI - A Virtual Document Approach for Keyword Search in Databases
SN - 978-989-8565-18-1
AU - Lopez-Veyna J. I.
AU - Sosa-Sosa V. J.
AU - Lopez-Arevalo I.
PY - 2012
SP - 39
EP - 48
DO - 10.5220/0004048700390048