POPULATING A DOMAIN ONTOLOGY FROM A WEB BIOGRAPHICAL DICTIONARY OF MUSIC - An Unsupervised Rule-based Method to Handle Brazilian Portuguese Texts

Eduardo Motta, Sean Siqueira, Alexandre Andreatta

Abstract

An increasing amount of information is available on the web, usually expressed as text, that is, as unstructured or semi-structured data. The semantic information in these texts is implicit, since they are intended mainly for human consumption and interpretation. Because unstructured information is not easily handled automatically, an information extraction process is needed to identify concepts and establish relations among them. The outcome of information extraction can be represented as a domain ontology. Ontologies are an appropriate way to represent structured knowledge bases, enabling sharing, reuse and inference. In this paper, an information extraction process is used to populate a domain ontology. It targets Brazilian Portuguese texts from a biographical dictionary of music, which require specific tools because of language-specific aspects. An unsupervised rule-based method is proposed. Through this process, latent concepts and relations expressed in natural language can be extracted and represented as an ontology, allowing new uses and visualizations of the content, such as semantic browsing and the inference of new knowledge.
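
To make the idea of rule-based ontology population more concrete, the following is a minimal sketch and not the authors' actual method. It assumes a single hand-written pattern for Portuguese sentences of the form "<Name> nasceu em <Place> em <Year>" ("<Name> was born in <Place> in <Year>"). The rule, the function name `extract_triples`, the class and property labels (`Artist`, `bornIn`, `birthYear`), and the sample sentence are all hypothetical and are not taken from the paper or from the dictionary.

```python
import re

# Hypothetical extraction rule (an assumption, not from the paper):
# "<Name> nasceu em <Place> em <Year>".
# Names and places are runs of capitalized words; places may also
# contain the lowercase connectives "de", "do" or "da".
BIRTH_RULE = re.compile(
    r"(?P<name>[A-ZÀ-Ú]\w*(?: [A-ZÀ-Ú]\w*)*) nasceu em "
    r"(?P<place>[A-ZÀ-Ú]\w*(?: (?:de |do |da )?[A-ZÀ-Ú]\w*)*) "
    r"em (?P<year>\d{4})"
)


def extract_triples(text):
    """Return (subject, predicate, object) triples for every rule match."""
    triples = []
    for match in BIRTH_RULE.finditer(text):
        name = match.group("name")
        # Illustrative class and property labels, not a real ontology.
        triples.append((name, "type", "Artist"))
        triples.append((name, "bornIn", match.group("place")))
        triples.append((name, "birthYear", int(match.group("year"))))
    return triples


# Invented sample sentence, used only to exercise the rule.
sample = "Fulano Silva nasceu em Rio de Janeiro em 1930."
for triple in extract_triples(sample):
    print(triple)
```

Each match yields an individual plus two property assertions, which is the kind of instance data an ontology population step would add to a domain ontology. A realistic system would need many such rules and the language-specific preprocessing the abstract alludes to.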


Paper Citation


in Harvard Style

Motta E., Siqueira S. and Andreatta A. (2009). POPULATING A DOMAIN ONTOLOGY FROM A WEB BIOGRAPHICAL DICTIONARY OF MUSIC - An Unsupervised Rule-based Method to Handle Brazilian Portuguese Texts. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 192-199. DOI: 10.5220/0001842301920199


in Bibtex Style

@conference{webist09,
author={Eduardo Motta and Sean Siqueira and Alexandre Andreatta},
title={POPULATING A DOMAIN ONTOLOGY FROM A WEB BIOGRAPHICAL DICTIONARY OF MUSIC - An Unsupervised Rule-based Method to Handle Brazilian Portuguese Texts},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={192-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001842301920199},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - POPULATING A DOMAIN ONTOLOGY FROM A WEB BIOGRAPHICAL DICTIONARY OF MUSIC - An Unsupervised Rule-based Method to Handle Brazilian Portuguese Texts
SN - 978-989-8111-81-4
AU - Motta E.
AU - Siqueira S.
AU - Andreatta A.
PY - 2009
SP - 192
EP - 199
DO - 10.5220/0001842301920199