intelligible words, the method can also be used to
help wikify possibly erroneous text from real-world
sources such as the Web, optical character recogni-
tion, or speech recognition. The numerical evalu-
ations demonstrated that the method works consis-
tently even in large-scale experiments, when disam-
biguating between up to 1000 Wikipedia articles.
A possible future application of the presented
method is the verification of links to Wikipedia. The
method can assign a single weight to each candidate
article: the sum of the weights in its group in the
representation vector α. If the weight corresponding to
the target of the link is small compared with the weights
of other articles, the link is probably incorrect.
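The link verification idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `groups` mapping from article to coefficient indices, and the `ratio` threshold are hypothetical and do not come from the presented method, and the coefficients of α are assumed to be non-negative so that a plain sum is meaningful.

```python
from typing import Dict, Sequence


def group_weights(alpha: Sequence[float],
                  groups: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Sum the coefficients of alpha inside each candidate article's group.

    `groups` maps an article name to the indices of alpha that belong to it
    (a hypothetical layout chosen for this sketch).
    """
    return {article: sum(alpha[i] for i in idx)
            for article, idx in groups.items()}


def link_is_suspicious(alpha: Sequence[float],
                       groups: Dict[str, Sequence[int]],
                       target: str,
                       ratio: float = 0.5) -> bool:
    """Flag a link whose target weight is small relative to the best rival.

    `ratio` is an assumed threshold: the link is flagged when the target's
    weight is below `ratio` times the largest weight of any other article.
    """
    weights = group_weights(alpha, groups)
    best_other = max((w for a, w in weights.items() if a != target),
                     default=0.0)
    return weights[target] < ratio * best_other


# Toy representation vector with three candidate articles.
alpha = [0.0, 0.1, 0.7, 0.2, 0.0]
groups = {"Jaguar_(animal)": [0, 1], "Jaguar_Cars": [2, 3], "Other": [4]}
print(link_is_suspicious(alpha, groups, "Jaguar_(animal)"))
print(link_is_suspicious(alpha, groups, "Jaguar_Cars"))
```

In practice the threshold would have to be tuned on links with known correctness; the value 0.5 here is arbitrary.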
The presented method can be generalized, as it
can work with arbitrarily labeled text fragments as
well as contexts of Wikipedia links. This more gen-
eral framework may have further applications, as the
idea of distributional similarity offers solutions to
many natural language processing problems. For ex-
ample, topics might be assigned to documents as
in centroid-based document classification (Han and
Karypis, 2000).
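For orientation, centroid-based classification in the spirit of Han and Karypis (2000) can be sketched as follows. This is a minimal illustration, not the presented method: the toy vectors, the label names, and the function names are assumptions, and real document vectors would typically be weighted term vectors (e.g. tf-idf) rather than the hand-made values used here.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(doc: Sequence[float],
             labeled: Dict[str, List[Sequence[float]]]) -> str:
    """Return the label whose class centroid is most similar to `doc`."""
    centroids = {label: centroid(vecs) for label, vecs in labeled.items()}
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Toy training data: two topics over a three-term vocabulary.
labeled = {
    "sports": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "politics": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
}
print(classify([0.9, 0.1, 0.0], labeled))
```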
ACKNOWLEDGEMENTS
The research has been supported by the ‘European
Robotic Surgery’ EC FP7 grant (no.: 288233). Any
opinions, findings and conclusions or recommenda-
tions expressed in this material are those of the au-
thors and do not necessarily reflect the views of other
members of the consortium or the European Commis-
sion.
REFERENCES
Bach, F., Jenatton, R., Mairal, J., and Obozinski, G. (2012).
Optimization with sparsity-inducing penalties. Foun-
dations and Trends in Machine Learning, 4(1):1–106.
BNC Consortium (2001). The British National Corpus, ver-
sion 2 (BNC World).
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for
support vector machines.
Garofolo, J. S., Auzanne, C. G. P., and Voorhees, E. M.
(2000). The TREC Spoken Document Retrieval
Track: A Success Story. In RIAO, pages 1–20.
Han, E.-H. and Karypis, G. (2000). Centroid-based doc-
ument classification: Analysis and experimental re-
sults. In PKDD, pages 116–123.
Harris, Z. (1954). Distributional structure. Word,
10(2-3):146–162.
Jenatton, R., Mairal, J., Obozinski, G., and Bach, F.
(2011). Proximal methods for hierarchical sparse cod-
ing. Journal of Machine Learning Research, 12:2297–
2334.
Kantor, P. B. and Voorhees, E. M. (2000). The TREC-5
Confusion Track: Comparing Retrieval Methods for
Scanned Text. Information Retrieval, 2:165–176.
Kukich, K. (1992). Techniques for automatically correcting
words in text. ACM Computing Surveys, 24(4):377–
439.
Kulkarni, S., Singh, A., Ramakrishnan, G., and
Chakrabarti, S. (2009). Collective annotation of
Wikipedia entities in web text. In KDD, pages 457–
466.
Leacock, C., Chodorow, M., Gamon, M., and Tetreault, J.
(2010). Automated Grammatical Error Detection for
Language Learners. Synthesis Lectures on Human
Language Technologies. Morgan & Claypool Publish-
ers.
Lee, Y. K. and Ng, H. T. (2002). An empirical evaluation of
knowledge sources and learning algorithms for word
sense disambiguation. In EMNLP, pages 41–48.
Liu, J., Ji, S., and Ye, J. (2009). SLEP: Sparse Learning
with Efficient Projections. Arizona State University.
Martins, A. F. T., Smith, N. A., Aguiar, P. M. Q., and
Figueiredo, M. A. T. (2011). Structured Sparsity in
Structured Prediction. In EMNLP, pages 1500–1511.
Mihalcea, R. and Csomai, A. (2007). Wikify!: linking doc-
uments to encyclopedic knowledge. In CIKM, pages
233–242.
Miller, G. A. (1995). WordNet: A lexical database for En-
glish. Communications of the ACM, 38:39–41.
Milne, D. and Witten, I. H. (2008). Learning to link with
Wikipedia. In CIKM, pages 509–518.
Porter, M. F. (1997). An algorithm for suffix stripping, pages
313–316. Morgan Kaufmann Publishers Inc.
Ratinov, L., Roth, D., Downey, D., and Anderson, M.
(2011). Local and global algorithms for disambigua-
tion to Wikipedia. In ACL-HLT, pages 1375–1384.
Schütze, H. (1998). Automatic word sense discrimination.
Computational Linguistics, 24(1):97–123.
Tibshirani, R. (1994). Regression shrinkage and selection
via the lasso. Journal of the Royal Statistical Society,
Series B, 58:267–288.
Turney, P. D. and Pantel, P. (2010). From frequency to
meaning: vector space models of semantics. Journal
of Artificial Intelligence Research, 37(1):141–188.
Yuan, M. and Lin, Y. (2006). Model
selection and estimation in regression with grouped
variables. Journal of the Royal Statistical Society, Se-
ries B, 68:49–67.