Authors:
Eduardo Motta, Sean Siqueira and Alexandre Andreatta
Affiliation:
Federal University of the State of Rio de Janeiro (UNIRIO), Brazil
Keyword(s):
Information extraction, Ontology population, Natural language processing, Brazilian Portuguese.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Ontology and the Semantic Web; Searching and Browsing; Soft Computing; Symbolic Systems; Web Information Systems and Technologies; Web Interfaces and Applications; Web Mining
Abstract:
An increasing amount of information is available on the web, usually expressed as text that represents unstructured or semi-structured data. The semantic information in these texts is implicit, since they are mainly intended for human consumption and interpretation. Because unstructured information is not easily handled automatically, an information extraction process is needed to identify concepts and establish relations among them. The outcome of information extraction can be represented as a domain ontology. Ontologies are an appropriate way to represent structured knowledge bases, enabling sharing, reuse and inference. In this paper, an information extraction process is used to populate a domain ontology. It targets Brazilian Portuguese texts from a biographical dictionary of music, which require specific tools because of aspects unique to the language. An unsupervised, rule-based method is proposed. Through this process, latent concepts and relations expressed in natural language can be extracted and represented as an ontology, allowing new uses and visualizations of the content, such as semantic browsing and the inference of new knowledge.
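The abstract does not spell out the extraction rules, so the following is only a rough illustration of the general idea of rule-based ontology population, not the authors' method. Hand-written surface patterns match Portuguese biographical sentences, and each match asserts a triple into a minimal in-memory ontology. The `RULES` list, the relation names `bornIn` and `composed`, the classes `Person`, `Place` and `Work`, and the sample sentences are all hypothetical. A realistic system would rely on Portuguese NLP tools (tokenization, part-of-speech tagging, named-entity recognition) and an OWL/RDF store rather than regular expressions and a Python dict.

```python
import re
from dataclasses import dataclass, field

# Hypothetical extraction rules (illustrative only, not the paper's rules):
# each entry is (surface pattern, relation name, class of the object).
RULES = [
    # "X nasceu em Y." ("X was born in Y") -> bornIn(X, Y)
    (re.compile(r"(?P<subj>[A-ZÀ-Ú][\w\s]*?) nasceu em (?P<obj>[A-ZÀ-Ú][\w\s]*?)[,.]"),
     "bornIn", "Place"),
    # "X compôs Y." ("X composed Y") -> composed(X, Y)
    (re.compile(r"(?P<subj>[A-ZÀ-Ú][\w\s]*?) compôs (?P<obj>[A-ZÀ-Ú][\w\s]*?)[,.]"),
     "composed", "Work"),
]


@dataclass
class Ontology:
    # individual name -> class name, e.g. "Campinas" -> "Place"
    individuals: dict = field(default_factory=dict)
    # set of (subject, relation, object) assertions
    triples: set = field(default_factory=set)

    def add(self, subj, relation, obj, obj_class):
        # In this toy domain every subject of a biographical entry is a Person.
        self.individuals.setdefault(subj, "Person")
        self.individuals.setdefault(obj, obj_class)
        self.triples.add((subj, relation, obj))


def populate(text, onto):
    """Split text into sentences and apply every rule to each sentence."""
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for pattern, relation, obj_class in RULES:
            m = pattern.search(sentence)
            if m:
                onto.add(m.group("subj").strip(), relation,
                         m.group("obj").strip(), obj_class)
    return onto


if __name__ == "__main__":
    sample = "Carlos Gomes nasceu em Campinas. Carlos Gomes compôs O Guarani."
    onto = populate(sample, Ontology())
    print(sorted(onto.triples))
    # [('Carlos Gomes', 'bornIn', 'Campinas'), ('Carlos Gomes', 'composed', 'O Guarani')]
```

Because Python 3 regular expressions treat `\w` as Unicode-aware, accented Portuguese letters are matched without extra configuration. The lazy subject and object groups stop at the first verb phrase and at the sentence-final punctuation, respectively.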