Authors:
Giacomo Frisoni, Gianluca Moro and Antonella Carbonaro
Affiliation:
Department of Computer Science and Engineering – DISI, University of Bologna, Via dell’Università 50, I-47522 Cesena, Italy
Keyword(s):
Text Mining, Knowledge Graphs, Unsupervised Learning, Semantic Web, Ontology Learning, Rare Diseases.
Abstract:
The use of knowledge graphs (KGs) in advanced applications is constantly growing, owing to their ability to model large collections of semantically interconnected data. Extracting relational facts from plain text is currently one of the main approaches to constructing and expanding KGs. In this paper, we introduce a novel unsupervised and automatic technique for learning KGs from corpora of short, unstructured, and unlabeled texts. Our approach is unique in that, starting from raw textual data, it is able to: i) identify a set of relevant domain-dependent terms; ii) extract aggregate and statistically significant semantic relationships between terms, documents, and classes; iii) represent this accurate probabilistic knowledge as a KG; iv) extend and integrate the KG according to the Linked Open Data vision. The proposed solution is easily transferable to many domains and languages, provided that data are available. As a case study, we demonstrate how it is possible to automatically learn a KG representing the knowledge contained in the conversational messages shared on social networks such as Facebook by patients with rare diseases, and the impact this can have on creating resources aimed at capturing the “voice of patients”.
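The abstract names the pipeline stages but not their internals. As a rough illustration of steps i) to iii), the following Python sketch keeps frequent terms, scores each term-class pair with a 2x2 chi-square test, and emits the significant positive associations as N-Triples lines. This is not the authors' method. It assumes each text has already been assigned a class (for example, a cluster id from an earlier unsupervised step). The document-frequency filter (a stand-in for term identification), the chi-square test, the `http://example.org/kg/` namespace, the `associatedWith` predicate, and every function name are hypothetical choices made only for this sketch.

```python
import re
from collections import Counter

# Hypothetical namespace and predicate: illustration only, not taken from the paper.
BASE = "http://example.org/kg/"

# Critical value of the chi-square distribution with 1 degree of freedom at p = 0.05.
CHI2_CRITICAL_005 = 3.841


def tokenize(text):
    """Lowercase and split on non-letter characters (a deliberately naive tokenizer)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def chi_square_2x2(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 contingency table (no continuity correction)."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0


def term_class_triples(docs, min_df=2, threshold=CHI2_CRITICAL_005):
    """Return N-Triples lines linking terms to the classes they are significantly associated with.

    docs: list of (text, class_label) pairs; labels are assumed to come from a prior step.
    """
    doc_terms = [(set(tokenize(text)), label) for text, label in docs]
    n = len(doc_terms)
    df = Counter(t for terms, _ in doc_terms for t in terms)
    class_count = Counter(label for _, label in doc_terms)

    # Simplified stand-in for term identification: keep terms found in >= min_df documents.
    vocab = sorted(t for t, c in df.items() if c >= min_df)

    triples = []
    for term in vocab:
        for label in sorted(class_count):
            n11 = sum(1 for terms, lab in doc_terms if term in terms and lab == label)
            n10 = df[term] - n11             # term present, other classes
            n01 = class_count[label] - n11   # term absent, this class
            n00 = n - n11 - n10 - n01        # term absent, other classes
            # Keep only positive associations (observed count at least the expected count).
            if n11 * n < df[term] * class_count[label]:
                continue
            if chi_square_2x2(n11, n10, n01, n00) >= threshold:
                triples.append(
                    f"<{BASE}term/{term}> <{BASE}associatedWith> <{BASE}class/{label}> ."
                )
    return triples
```

On a tiny invented corpus of six messages, three labeled `symptom` (each mentioning "fatigue") and three labeled `therapy` (each mentioning "drug"), only those two terms pass the threshold. With so few documents the chi-square approximation is unreliable, so a real corpus would need larger counts or an exact test.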