7 CONCLUDING REMARKS
Manual population of ontologies by domain experts
and knowledge engineers is an expensive and
time-consuming task, so automatic and semi-automatic
approaches are needed.
This article gives an overview of a
domain-independent process for automatic ontology
population from textual resources and details the
phase where candidate instances of an ontology are
identified. The process is based on natural language
processing and supervised machine learning
techniques and consists of four phases: corpus
creation, identification of candidate instances,
instance creation and instance representation.
Two experiments were performed to evaluate the
proposed approach. The first experiment used
natural language processing techniques and the IDF
statistical measure. Candidate instances were
extracted from a corpus in the family law domain,
and a precision of 54% was obtained. The second
experiment added WordNet to the previously
described techniques and improved the precision
value by 10%.
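As an illustration of the two quantities mentioned above, the following is a minimal sketch of an IDF computation and of the precision measure. It is not the implementation used in the experiments: the formula idf(t) = log(N / df(t)) is the common textbook form and is assumed here, while the toy corpus, the gold set, the 0.5 threshold and the function names are hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: idf}, with idf(t) = log(N / df(t)) (assumed textbook form)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def precision(extracted, gold):
    """Fraction of extracted candidates that also appear in the gold set."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(gold)) / len(extracted)


# Hypothetical toy corpus of tokenized documents (not the family law corpus).
corpus = [
    ["court", "grants", "custody", "to", "mother"],
    ["court", "orders", "alimony", "payment"],
    ["court", "reviews", "custody", "agreement"],
]

idf = inverse_document_frequency(corpus)

# Keep terms whose IDF exceeds an arbitrary, illustrative threshold.
candidates = {term for term, score in idf.items() if score > 0.5}

# Hypothetical gold standard of correct instances.
gold = {"custody", "alimony", "mother"}

print(sorted(candidates))
print(precision(candidates, gold))
```

In this sketch, terms that occur in every document (such as "court") receive an IDF of zero and are filtered out, which is the intuition behind using IDF to discard uninformative terms before candidate instances are scored against a gold standard.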
Given the initial results reported here, the
combination of natural language processing
techniques and statistical measures seems to be a
promising approach for the automatic extraction of
ontology instances. However, more experimentation
is needed.
We are currently evaluating different supervised
machine learning algorithms (Bayesian networks,
decision trees and statistical relational learning,
among others) in order to select a suitable approach
for classifying instances into ontology classes.
We are also evaluating the advantages of
combining information extraction techniques with
the proposed approach to improve its effectiveness.
ACKNOWLEDGEMENTS
This work is supported by CNPq, CAPES and
FAPEMA, institutions of the Brazilian Government
for scientific and technological development.