ArabRelat: Arabic Relation Extraction using Distant Supervision

Reham Mohamed, Nagwa M. El-Makky, Khaled Nagi

Abstract

Relation extraction is an important preprocessing task for many text mining applications, including information retrieval, question answering, and ontology building. In this paper, we propose a novel Arabic relation extraction method that leverages linguistic features of the Arabic language in Web data to infer relations between entities. Because labeled Arabic corpora are scarce, we adopt distant supervision: DBpedia, a large database of semantic relations extracted from Wikipedia, is used together with a large unlabeled text corpus to build the training data. We extract sentences from the unlabeled corpus and tag them with the corresponding DBpedia relations. Finally, we train a relation classifier on this data to predict the relation type of new instances. Our experimental results show that the system reaches an F-measure of 70% in detecting relations.
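The distant-supervision labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `KB` dictionary is a hypothetical stand-in for DBpedia triples, the English toy sentences stand in for the Arabic corpus, and entity matching is reduced to plain substring search, whereas the paper relies on Arabic linguistic features that are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge-base triples standing in for DBpedia relations:
# (subject entity, object entity) -> relation name.
KB: Dict[Tuple[str, str], str] = {
    ("Cairo", "Egypt"): "capitalOf",
    ("Naguib Mahfouz", "Cairo"): "birthPlace",
}

LabeledExample = Tuple[str, str, str, str]  # (sentence, subject, object, relation)


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[LabeledExample]:
    """Tag each sentence that mentions both entities of a known triple.

    Assumption of distant supervision: a sentence containing both entities
    of a knowledge-base pair is treated as expressing that pair's relation.
    """
    labeled: List[LabeledExample] = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            # Naive substring matching; a real system would use entity recognition.
            if subj in sentence and obj in sentence:
                labeled.append((sentence, subj, obj, relation))
    return labeled


# Toy unlabeled corpus (illustrative only).
corpus = [
    "Cairo is the capital of Egypt.",
    "Naguib Mahfouz was born in Cairo in 1911.",
    "Alexandria lies on the Mediterranean coast.",
]

training_data = label_sentences(corpus, KB)
for sentence, subj, obj, relation in training_data:
    print(f"{relation}({subj}, {obj}) <- {sentence}")
```

The labeled tuples produced this way would then serve as noisy training data for a relation classifier. Sentences that mention no known entity pair, such as the third one, are simply left out.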


Paper Citation


in Harvard Style

Mohamed R., El-Makky N. M. and Nagi K. (2015). ArabRelat: Arabic Relation Extraction using Distant Supervision. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, (IC3K 2015) ISBN 978-989-758-158-8, pages 410-417. DOI: 10.5220/0005636604100417


in Bibtex Style

@conference{keod15,
author={Reham Mohamed and Nagwa M. El-Makky and Khaled Nagi},
title={ArabRelat: Arabic Relation Extraction using Distant Supervision},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, (IC3K 2015)},
year={2015},
pages={410-417},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005636604100417},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD, (IC3K 2015)
TI - ArabRelat: Arabic Relation Extraction using Distant Supervision
SN - 978-989-758-158-8
AU - Mohamed R.
AU - El-Makky N. M.
AU - Nagi K.
PY - 2015
SP - 410
EP - 417
DO - 10.5220/0005636604100417