A WEAKLY SUPERVISED APPROACH FOR LARGE-SCALE RELATION EXTRACTION

Ludovic Jean-Louis; Romaric Besançon; Olivier Ferret; Adrien Durand

doi:10.5220/0003661200940103

2011

Abstract

Standard Information Extraction (IE) systems are designed for a specific domain and a limited number of relations. Recent work has addressed large-scale IE systems, which are characterized by a large number of relations and no restriction on the domain. This makes it difficult to build manual resources or to apply supervised techniques. In this paper, we present a large-scale IE system based on a weakly supervised pattern-learning method. This method uses pairs of entities known to be in a given relation to automatically extract example sentences, from which the patterns are learned. We present the results of this system on the data from the Knowledge Base Population (KBP) task of the TAC 2010 evaluation campaign.
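The core idea of the abstract is to use entity pairs already known to stand in a relation to harvest example sentences, then learn patterns from those sentences. It can be illustrated with a minimal Python sketch. This is not the authors' implementation. The seed pairs, the toy corpus, the function names (`collect_examples`, `middle_pattern`, `learn_patterns`) and the `min_count` threshold are all hypothetical. The pattern-induction step is also a deliberately simple substitute: it keeps the text between the two entity mentions and counts how often that text recurs. The abstract does not describe how the system actually generalizes patterns.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical seed pairs for a "born in" relation; illustrative only,
# not taken from the paper or from the TAC-KBP knowledge base.
SEED_PAIRS = [("Barack Obama", "Honolulu"), ("Marie Curie", "Warsaw")]


def collect_examples(sentences: Iterable[str], pairs: list[tuple[str, str]]):
    """Keep (sentence, e1, e2) for every seed pair whose two entities both occur.

    Plain substring matching is a simplification (it would also match "Ann"
    inside "Annapolis"); a real system would rely on entity recognition and
    an index rather than a linear scan.
    """
    return [(s, e1, e2) for s in sentences for e1, e2 in pairs if e1 in s and e2 in s]


def middle_pattern(sentence: str, e1: str, e2: str) -> Optional[str]:
    """Turn an example into a surface pattern: entities become slots,
    and the text between the two mentions is kept verbatim."""
    i, j = sentence.find(e1), sentence.find(e2)
    if i < j:
        if i + len(e1) > j:  # overlapping mentions: no usable context
            return None
        return "<E1>" + sentence[i + len(e1):j] + "<E2>"
    if j + len(e2) > i:  # overlapping (or identical) mentions
        return None
    return "<E2>" + sentence[j + len(e2):i] + "<E1>"


def learn_patterns(sentences: Iterable[str], pairs: list[tuple[str, str]],
                   min_count: int = 2) -> dict[str, int]:
    """Count patterns over all examples and keep those seen at least min_count times."""
    patterns = (middle_pattern(s, e1, e2) for s, e1, e2 in collect_examples(sentences, pairs))
    counts = Counter(p for p in patterns if p is not None)
    return {p: c for p, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    # Toy corpus, invented for illustration.
    corpus = [
        "Barack Obama was born in Honolulu.",
        "Marie Curie was born in Warsaw.",
        "Honolulu is where Barack Obama grew up.",
        "Paris is the capital of France.",
    ]
    print(learn_patterns(corpus, SEED_PAIRS))
```

Running the script prints `{'<E1> was born in <E2>': 2}`. The pattern shared by two different seed pairs passes the threshold, while the reversed pattern `<E2> is where <E1>`, seen only once, is discarded. Requiring several supporting examples is one simple way to limit the noise from sentences that mention both entities without expressing the relation, a known weakness of this kind of weak supervision.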

Paper Citation

in Harvard Style

Jean-Louis L., Besançon R., Ferret O. and Durand A. (2011). A WEAKLY SUPERVISED APPROACH FOR LARGE-SCALE RELATION EXTRACTION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 94-103. DOI: 10.5220/0003661200940103

in Bibtex Style

@conference{kdir11,
author={Ludovic Jean-Louis and Romaric Besançon and Olivier Ferret and Adrien Durand},
title={A WEAKLY SUPERVISED APPROACH FOR LARGE-SCALE RELATION EXTRACTION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={94-103},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003661200940103},
isbn={978-989-8425-79-9},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI  - A WEAKLY SUPERVISED APPROACH FOR LARGE-SCALE RELATION EXTRACTION
SN  - 978-989-8425-79-9
AU  - Jean-Louis L.
AU  - Besançon R.
AU  - Ferret O.
AU  - Durand A.
PY  - 2011
SP  - 94
EP  - 103
DO  - 10.5220/0003661200940103
ER  - 