Ontology Learning from Twitter Data

Saad Alajlan, Frans Coenen, Boris Konev, Angrosh Mandya

Abstract

This paper presents and compares three mechanisms for learning an ontology describing a domain of discourse as defined in a collection of tweets. The task in part involves identifying entities and relations in the free-text data, which can then be used to produce a set of RDF triples from which an ontology can be generated. The first mechanism is founded on the Stanford CoreNLP Toolkit, in particular the Named Entity Recognition and Relation Extraction mechanisms that come with this toolkit. The second is founded on GATE (General Architecture for Text Engineering), which provides an alternative mechanism for relation extraction from text. Both require a substantial amount of training data. To reduce the training data requirement, the third mechanism is founded on the concept of Regular Expressions extracted from a training data “seed set”. Although the third mechanism still requires training data, the amount required is significantly reduced without adversely affecting the quality of the ontologies generated.
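
As a rough illustration of the third mechanism, the sketch below applies a regular expression to tweet text and emits subject-relation-object triples in an N-Triples-like form. It is a minimal sketch under stated assumptions, not the authors' implementation: the single pattern, the relation name `locatedIn`, the function names, and the `http://example.org/` namespace are all hypothetical, and in the paper the patterns are derived from a seed set rather than written by hand.

```python
import re

# Hypothetical seed pattern of the form "<Entity> is located in <Entity>".
# An entity is assumed here to be a run of capitalised words; the real
# patterns in the paper are learned from a training "seed set".
ENTITY = r"[A-Z]\w*(?: [A-Z]\w*)*"
SEED_PATTERNS = [
    (re.compile(rf"(?P<subj>{ENTITY}) is located in (?P<obj>{ENTITY})"), "locatedIn"),
]


def extract_triples(tweets):
    """Return (subject, relation, object) tuples matched in the tweets."""
    triples = []
    for text in tweets:
        for pattern, relation in SEED_PATTERNS:
            for match in pattern.finditer(text):
                triples.append((match.group("subj"), relation, match.group("obj")))
    return triples


def to_ntriples(triples, base="http://example.org/"):
    """Serialise triples as N-Triples-style lines under an assumed namespace."""
    lines = []
    for subj, rel, obj in triples:
        s = subj.replace(" ", "_")
        o = obj.replace(" ", "_")
        lines.append(f"<{base}{s}> <{base}{rel}> <{base}{o}> .")
    return lines


if __name__ == "__main__":
    sample = ["Liverpool is located in England"]
    for line in to_ntriples(extract_triples(sample)):
        print(line)
```

Triples produced this way could then be loaded into an RDF toolkit of choice as input to ontology generation.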
