hand, the entity Whitney (another common named en-
tity) has been linked to a DBpedia resource of type
Other since the corresponding tweet does not provide
sufficient evidence to identify the correct KB resource.
Lastly, a small percentage of classifiable but
non-linkable entities remain for which no entity link
could be established (1.21%, also included in the 8% of
missed entities from step 1).
Step 3: Entity Recognition Enhancement
Here, we analyze step 3 of the proposed system, described
in Section 3. Using equation (2), we re-classify the named
entities that are both classifiable and linkable, i.e.,
those classified in step 1 and linked in step 2,
irrespective of the entity types assigned in step 1. This
approach improves the performance of the entity
recognition step of our entity linking pipeline. We denote
the improved entity recognition system as T-NER+.
Table 5: Example: Re-classification of entities.
Entity           Ground-Truth  T-NER     T-NER+
30stm            Band          Product   Band
Yahoo            Company       Band      Company
Southgate House  Facility      Band      Facility
Canada           Geo-Location  Person    Geo-Location
Camp rock 2      Movie         Person    Movie
Thanksgiving     Other         Person    Other
John Acuff       Person        Facility  Person
iphone           Product       Company   Product
Lions            Sportsteam    Person    Sportsteam
TMZ              TVshow        Band      TVshow
Table 2 summarizes the results of this step and compares
them with T-NER. As the table shows, T-NER+ improves the
class-wise classification accuracy for a majority of
entity types. The exception is the entity type TVshow, for
which classification accuracy declines by almost 7%, while
the entity types Geo-Location and Other experience a
marginal decline. Table 5 presents examples of entities
re-classified into the correct entity types with respect
to the ground truth.
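The class-wise comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `class_wise_accuracy` and `corrected_entities` are hypothetical, and the data are just the ten rows of Table 5. Because those rows are hand-picked examples in which T-NER is wrong and T-NER+ is right, the resulting accuracies (0 and 1) illustrate the computation only and do not reproduce the figures reported in Table 2.

```python
from collections import defaultdict

# Rows of Table 5: (entity, ground truth, T-NER label, T-NER+ label).
TABLE_5 = [
    ("30stm", "Band", "Product", "Band"),
    ("Yahoo", "Company", "Band", "Company"),
    ("Southgate House", "Facility", "Band", "Facility"),
    ("Canada", "Geo-Location", "Person", "Geo-Location"),
    ("Camp rock 2", "Movie", "Person", "Movie"),
    ("Thanksgiving", "Other", "Person", "Other"),
    ("John Acuff", "Person", "Facility", "Person"),
    ("iphone", "Product", "Company", "Product"),
    ("Lions", "Sportsteam", "Person", "Sportsteam"),
    ("TMZ", "TVshow", "Band", "TVshow"),
]


def class_wise_accuracy(rows, system_column):
    """Map each ground-truth type to the fraction of its entities labelled correctly.

    system_column is the index of the system's label in each row
    (2 for T-NER, 3 for T-NER+). Hypothetical helper, not from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        truth, predicted = row[1], row[system_column]
        total[truth] += 1
        if predicted == truth:
            correct[truth] += 1
    return {etype: correct[etype] / total[etype] for etype in total}


def corrected_entities(rows):
    """Entities that T-NER mislabels and T-NER+ labels correctly."""
    return [
        entity
        for entity, truth, tner, tner_plus in rows
        if tner != truth and tner_plus == truth
    ]


tner_acc = class_wise_accuracy(TABLE_5, system_column=2)
tner_plus_acc = class_wise_accuracy(TABLE_5, system_column=3)

for etype in sorted(tner_acc):
    print(f"{etype:<13} T-NER={tner_acc[etype]:.2f}  T-NER+={tner_plus_acc[etype]:.2f}")
print("corrected by re-classification:", corrected_entities(TABLE_5))
```

Applied to a full evaluation set instead of the Table 5 examples, the same per-type ratio would show both the gains and the declines (e.g., for TVshow) discussed above.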
5 CONCLUSIONS
In this paper, we have presented an end-to-end entity
linking pipeline for short textual formats, in particular
tweets. We have also presented an approach that improves
the entity recognition performance of a NER system by
means of re-classification. Our approach enhances the
classification performance of the NER system; however, the
scale of this enhancement can still be improved. One
outcome of our work is that newly emerging knowledge (new
entities or new mentions of existing entities) on the Web,
in particular on social media platforms, can be extracted
when it is not covered by an existing KB. During entity
recognition and classification, 8% of the entities are not
identified by the system. These comprise newly emerging
entities as well as other entities that were not
identified, and hence not classified. During entity
linking, in turn, no match could be found for ≈ 2.4% of
the entities with any resource in the DBpedia KB, owing
either to the non-existence of such entities in the KB or
to the non-coverage of their surface forms in the KB
vocabulary.
Our next step in this field is to extract information from
the Web as well as from social media platforms about new
entities discovered in the entity recognition and entity
linking phases, in order not only to improve NER and NEL
but also to work towards real-time lexical extensions of a
KB. A further possible direction for future work is to
compare the performance of the proposed approach with the
most relevant related work (Yamada et al., 2015) on a
common dataset, and to evaluate it on additional datasets
(Rizzo et al., 2015).
REFERENCES
Cohen, W., Ravikumar, P., and Fienberg, S. (2003). A comparison of string metrics for matching names and records. In KDD Workshop on Data Cleaning and Object Consolidation, volume 3, pages 73–78.
Cucerzan, S. (2007). Large-scale named entity disambiguation based on Wikipedia data. In EMNLP-CoNLL, volume 7, pages 708–716.
Cunningham, H., Maynard, D., Bontcheva, K., and Tablan, V. (2002). A framework and graphical development environment for robust NLP tools and applications. In ACL, pages 168–175.
Damljanovic, D. and Bontcheva, K. (2012). Named entity disambiguation using linked data. In Proceedings of the 9th Extended Semantic Web Conference.
Derczynski, L., Maynard, D., Rizzo, G., van Erp, M., Gorrell, G., Troncy, R., Petrak, J., and Bontcheva, K. (2015). Analysis of named entity recognition and linking for tweets. Information Processing & Management.
Ferragina, P. and Scaiella, U. (2010). Tagme: on-the-fly annotation of short text fragments (by Wikipedia entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management. ACM.
Finin, T., Murnane, W., Karandikar, A., Keller, N., Martineau, J., and Dredze, M. (2010). Annotating named entities in Twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Cre-
KDIR 2015 - 7th International Conference on Knowledge Discovery and Information Retrieval