would be to choose as root node a category lower in
the Wikipedia hierarchy than “Main Topic Classifications”,
possibly a node that still encompasses the
overall topics of the entity, but not as generic as the
one chosen here. The values of the parameters are
being fine-tuned in ongoing work in order to further
improve the proposed measure.
5 CONCLUSIONS
In this paper we presented a new semantic related-
ness measure between entities, using Wikipedia as a
hierarchy of scientific categories. The devised mea-
sure examines the Wikipedia category paths between
all the possible concept pairs of two distinct entities,
assigning weights according to the category’s rele-
vance in the resulting path set and in the Wikipedia
graph. We examined the values of the proposed measure
for selected entities and observed that they match the
intuitive human assessment of the entities’ semantic
similarity. We therefore conclude that this is a valid
approach for automatically assessing the proximity of
scientific researchers and other scientific entities,
such as conferences and journals.
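As an illustration of the pairwise path examination summarised above, the following is a minimal sketch under stated assumptions, not the measure proposed in this paper. The toy hierarchy `PARENTS`, the helper names, and the inverse-path-length score are all hypothetical; in particular, the plain `1 / (1 + length)` score merely stands in for the category weighting described earlier, whose parameters are not reproduced here.

```python
from collections import deque
from itertools import product

# Hypothetical toy hierarchy (category -> parent categories); not real Wikipedia data.
PARENTS = {
    "Machine learning": ["Artificial intelligence"],
    "Computer vision": ["Artificial intelligence"],
    "Artificial intelligence": ["Computer science"],
    "Databases": ["Computer science"],
    "Computer science": [],
}


def ancestor_depths(category):
    """Walk up the hierarchy breadth-first; map each reachable category to its edge distance."""
    depths = {category: 0}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def path_length(a, b):
    """Edges on the shortest path from a to b through a common ancestor, or None if none exists."""
    up_a, up_b = ancestor_depths(a), ancestor_depths(b)
    common = up_a.keys() & up_b.keys()
    if not common:
        return None
    return min(up_a[c] + up_b[c] for c in common)


def relatedness(concepts_a, concepts_b):
    """Mean of 1 / (1 + path length) over all concept pairs; unconnected pairs score 0.

    Assumption: this unweighted score replaces the paper's category weighting.
    """
    scores = []
    for a, b in product(concepts_a, concepts_b):
        length = path_length(a, b)
        scores.append(0.0 if length is None else 1.0 / (1.0 + length))
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `relatedness(["Machine learning"], ["Computer vision", "Databases"])` averages the scores of the two concept pairs. Replacing the per-pair score with a weighted one would be the natural place to plug in category relevance.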
Future Work. Ongoing work includes comparing the results
obtained with manual annotations made by volunteers,
using a website specifically deployed for this task
(www.insticc.org/SemanticDistance.aspx). Further work
includes exploring the measure for other entity pairs,
comparing it with other state-of-the-art metrics,
devising semantic disambiguation tasks for Wikipedia
articles, and clustering concept sets so that an entity
may be represented by several subsets of scientific
topics, each subset representing a particular area.
ACKNOWLEDGEMENTS
The authors wish to acknowledge the support of
the Instituto de Telecomunicações (IT-IST) and Escola
Superior de Tecnologia, Instituto Politécnico de
Setúbal (EST-IPS).