An Approach for Semantic Search over Lithuanian News Website Corpus

Tomas Vileiniškis, Algirdas Šukys, Rita Butkienė

2015

Abstract

The continuous growth of unstructured textual information on the web calls for novel, semantically aware content processing and information retrieval (IR) methods. Following the evolution and wide adoption of Semantic Web technology, a number of approaches have been proposed to overcome the limitations of traditional keyword-based search techniques. However, most of this research concentrates on English and other widely studied languages that are rich in linguistic resources. Hence, this paper presents an approach to semantic search over domain-specific Lithuanian web documents. We introduce an ontology-based semantic search framework capable of answering structured questions in natural Lithuanian and discuss its language-dependent design decisions. The findings of a recent case study show that the proposed framework can be applied to meaning-based IR with significant results, even when the underlying language is morphologically rich and has limited linguistic resources.

Paper Citation


in Harvard Style

Vileiniškis T., Šukys A. and Butkienė R. (2015). An Approach for Semantic Search over Lithuanian News Website Corpus. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015), ISBN 978-989-758-158-8, pages 57-66. DOI: 10.5220/0005596800570066


in Bibtex Style

@conference{kdir15,
author={Tomas Vileiniškis and Algirdas Šukys and Rita Butkienė},
title={An Approach for Semantic Search over Lithuanian News Website Corpus},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={57-66},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005596800570066},
isbn={978-989-758-158-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI  - An Approach for Semantic Search over Lithuanian News Website Corpus
SN  - 978-989-758-158-8
AU  - Vileiniškis T.
AU  - Šukys A.
AU  - Butkienė R.
PY  - 2015
SP  - 57
EP  - 66
DO  - 10.5220/0005596800570066
ER  - 