Domain-Specific Relation Extraction - Using Distant Supervision Machine Learning

Abduladem Aljamel, Taha Osman, Giovanni Acampora

2015

Abstract

The increasing accessibility and availability of online data provide a valuable knowledge source for information analysis and decision-making processes. In this paper we argue that extracting information from this data is better guided by domain knowledge of the targeted use case, and we investigate integrating a knowledge-driven approach with Machine Learning techniques to improve the quality of the Relation Extraction process. Targeting the financial domain, we use Semantic Web technologies to build the domain knowledge base, which is in turn exploited to collect distant supervision training data from linked semantic datasets such as DBpedia and Freebase. We conducted a series of experiments with a number of Machine Learning algorithms to identify the most favourable implementations and configurations for successful Information Extraction in our targeted domain.
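Distant supervision rests on a simple labelling assumption: any sentence that mentions both entities of a fact stored in a knowledge base is taken as a (noisy) training example of that fact's relation. The Python sketch below illustrates only that labelling step and is not the authors' implementation. It assumes hypothetical seed triples (the entity names and the relation names `ceo_of` and `subsidiary_of` are placeholders for facts that would be retrieved from DBpedia or Freebase), matches entities by naive exact substring search instead of named entity recognition, and keeps co-occurring entity pairs with no known relation as `NONE` negatives, which is one simple choice among several.

```python
# Minimal sketch of the distant-supervision labelling heuristic.
# Assumptions (hypothetical, not taken from the paper): seed triples are
# hard-coded instead of being queried from DBpedia/Freebase, entities are
# matched by exact substring search, and co-occurring entity pairs with no
# KB relation become "NONE" negative examples.
from typing import Iterable, List, NamedTuple

NO_RELATION = "NONE"


class Triple(NamedTuple):
    """A (subject, relation, object) fact from the knowledge base."""
    subject: str
    relation: str
    obj: str


class Example(NamedTuple):
    """One automatically labelled training instance."""
    sentence: str
    subject: str
    obj: str
    label: str


def label_sentences(sentences: Iterable[str], kb: Iterable[Triple]) -> List[Example]:
    """Emit a labelled example for every ordered pair of KB entities
    that co-occur in a sentence."""
    facts = list(kb)
    # Map each ordered entity pair to its KB relation.
    # (If a pair had several relations, this simple dict keeps only the last.)
    relations = {(t.subject, t.obj): t.relation for t in facts}
    entities = {t.subject for t in facts} | {t.obj for t in facts}
    examples = []
    for sentence in sentences:
        # Naive entity detection: exact substring match, sorted for determinism.
        mentioned = sorted(e for e in entities if e in sentence)
        for subj in mentioned:
            for obj in mentioned:
                if subj != obj:
                    label = relations.get((subj, obj), NO_RELATION)
                    examples.append(Example(sentence, subj, obj, label))
    return examples


if __name__ == "__main__":
    # Hypothetical finance-domain seed facts and sentences.
    kb = [
        Triple("Jane Doe", "ceo_of", "Acme Corp"),
        Triple("Acme Bank", "subsidiary_of", "Acme Corp"),
    ]
    sentences = [
        "Jane Doe, chief executive of Acme Corp, announced record profits.",
        "Shares in Acme Corp rose after the report.",
    ]
    for ex in label_sentences(sentences, kb):
        print(ex.label, ex.subject, "->", ex.obj)
    # Prints:
    # NONE Acme Corp -> Jane Doe
    # ceo_of Jane Doe -> Acme Corp
```

The second sentence yields no examples because it mentions only one known entity. Labels produced this way are noisy, since a sentence can mention both entities without actually expressing the relation; this noise is part of why the choice of learning algorithm and configuration matters for the downstream classifier.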



Paper Citation


in Harvard Style

Aljamel A., Osman T. and Acampora G. (2015). Domain-Specific Relation Extraction - Using Distant Supervision Machine Learning. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 92-103. DOI: 10.5220/0005615100920103


in Bibtex Style

@conference{kdir15,
author={Abduladem Aljamel and Taha Osman and Giovanni Acampora},
title={Domain-Specific Relation Extraction - Using Distant Supervision Machine Learning},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={92-103},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005615100920103},
isbn={978-989-758-158-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI  - Domain-Specific Relation Extraction - Using Distant Supervision Machine Learning
SN  - 978-989-758-158-8
AU  - Aljamel A.
AU  - Osman T.
AU  - Acampora G.
PY  - 2015
SP  - 92
EP  - 103
DO  - 10.5220/0005615100920103
ER  - 