Inference Approach to Enhance a Portuguese Open Information Extraction

Cleiton Fernando Lima Sena, Rafael Glauber, Daniela Barreiro Claro

2017

Abstract

Open Information Extraction (Open IE) enables the extraction of facts from large quantities of text written in natural language. Most research has been done on English texts, and methods and techniques for other languages are less frequent. However, languages other than English account for 48% of the content available on websites around the world. In this work, we propose a method for extracting facts in Portuguese without pre-determining the types of the facts. Additionally, we increase the quantity of extracted facts by using an inference approach. Our inference method is composed of two mechanisms: a transitive one and a symmetric one. To the best of our knowledge, this is the first time that an inference approach has been used to extract facts from Portuguese texts. Our proposal yielded a 36% increase in the quantity of valid facts extracted by a Portuguese Open IE system, and it is comparable in the quality of facts to English approaches.
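
The abstract does not say how the transitive and symmetric mechanisms are implemented, so the following is only a minimal sketch of the general idea, assuming facts are (arg1, relation, arg2) triples and that the sets of symmetric and transitive relations are supplied by hand. The function name `infer_facts`, the relation phrases, and the example facts are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# A fact is an (arg1, relation, arg2) triple, as is common in Open IE.
Fact = Tuple[str, str, str]


def infer_facts(
    facts: Iterable[Fact],
    symmetric: Set[str],
    transitive: Set[str],
) -> Set[Fact]:
    """Return only the NEW facts derivable by symmetry and transitivity.

    Assumption: `symmetric` and `transitive` are hand-picked relation
    phrases; the paper may decide this differently.
    """
    known = set(facts)
    inferred: Set[Fact] = set()
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        current = known | inferred
        new: Set[Fact] = set()
        for a, rel, b in current:
            # Symmetric rule: rel(a, b) implies rel(b, a).
            if rel in symmetric and a != b:
                new.add((b, rel, a))
            # Transitive rule: rel(a, b) and rel(b, d) imply rel(a, d).
            if rel in transitive:
                for c, rel2, d in current:
                    if rel2 == rel and c == b and a != d:
                        new.add((a, rel, d))
        new -= current
        if new:
            inferred |= new
            changed = True
    return inferred


# Hypothetical example facts in Portuguese.
extracted = {
    ("Salvador", "fica em", "Bahia"),
    ("Bahia", "fica em", "Brasil"),
    ("Bahia", "faz fronteira com", "Sergipe"),
}
new_facts = infer_facts(
    extracted,
    symmetric={"faz fronteira com"},
    transitive={"fica em"},
)
for fact in sorted(new_facts):
    print(fact)
```

In this sketch the loop runs to a fixed point, so chains longer than two facts are also closed. A real system would likely need to validate inferred facts, since a relation phrase that looks transitive or symmetric is not always so in context.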


Paper Citation


in Harvard Style

Lima Sena C., Glauber R. and Barreiro Claro D. (2017). Inference Approach to Enhance a Portuguese Open Information Extraction. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 442-451. DOI: 10.5220/0006338204420451


in Bibtex Style

@conference{iceis17,
author={Cleiton Fernando Lima Sena and Rafael Glauber and Daniela Barreiro Claro},
title={Inference Approach to Enhance a Portuguese Open Information Extraction},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2017},
pages={442-451},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006338204420451},
isbn={978-989-758-247-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Inference Approach to Enhance a Portuguese Open Information Extraction
SN - 978-989-758-247-9
AU - Lima Sena C.
AU - Glauber R.
AU - Barreiro Claro D.
PY - 2017
SP - 442
EP - 451
DO - 10.5220/0006338204420451