which a more complete training set could be generated and input to
either of the two previous mechanisms. All three mechanisms were
compared using a car pollution scenario comprising 300 tweets as the
training set and 313 tweets to create the ontologies. For the first
two mechanisms the Relation Extraction models were compared, and
Stanford Relation Extraction was found to outperform GATE relation
extraction: an average F-score of 79.56 compared to 67.94. The third
mechanism, the Regular Expression mechanism, was therefore coupled
with Stanford Relation Extraction. For the first two mechanisms 270
records were used for training (30 held back for testing), while for
the third only 100 records were used for the initial seed training
set, a reduction of 65%, without adversely affecting the quality of
the generated ontologies. The generated ontologies were evaluated
using a W3C validation tool, which focused on the syntax of the
ontology, whilst the utility of the ontologies was evaluated by
populating them and querying them using SPARQL test queries. The
results were very encouraging, and the authors are now embarking on a
large-scale evaluation of the proposed mechanisms, directed at more
sophisticated ontologies and using larger collections of tweets.
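As a rough illustration of the idea behind the Regular Expression mechanism summarised above, the following Python sketch shows how a small set of hand-written patterns might be used to label further tweets and so enlarge a seed training set. It is a minimal sketch under stated assumptions, not the authors' implementation: the patterns, the relation names (hasPollutionLevel, locatedIn) and the function label_tweets are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical seed patterns, one per relation. Each regex exposes the two
# entities through the named groups "subject" and "object". The patterns and
# relation names are illustrative assumptions, not those used in the paper.
SEED_PATTERNS = {
    "hasPollutionLevel": re.compile(
        r"(?P<subject>[A-Z][a-z]+) (?:has|reports) "
        r"(?P<object>high|low|moderate) pollution"
    ),
    "locatedIn": re.compile(
        r"(?P<subject>[A-Z][a-z]+) (?:is )?in (?P<object>[A-Z][a-z]+)"
    ),
}


def label_tweets(tweets, patterns=SEED_PATTERNS):
    """Return (tweet, subject, relation, object) for every pattern match."""
    records = []
    for tweet in tweets:
        for relation, pattern in patterns.items():
            for match in pattern.finditer(tweet):
                records.append(
                    (tweet, match.group("subject"), relation, match.group("object"))
                )
    return records


if __name__ == "__main__":
    sample = [
        "Liverpool reports high pollution today",
        "Anfield is in Liverpool",
        "Nothing to see here",
    ]
    for record in label_tweets(sample):
        print(record)
```

Records produced in this way could then be passed as additional training examples to a relation extraction model. In practice the patterns would need to be far more tolerant of the informal language found in tweets.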
Ontology Learning from Twitter Data