in-depth analysis of what happens to individual cases
during the process.
We claim that this approach can be adapted to any kind
of named entity. It only requires a mechanism for finding
highly ambiguous false positives among the extracted named
entities. Coherency measures can be used to find such highly
ambiguous named entities. For future research, we plan to
apply and enhance our approach for other types of named
entities and other domains. Furthermore, the approach appears
to be fully language independent; we would therefore like to
verify that this is the case and investigate its effect on
texts in multiple and mixed languages.
KDIR 2012 - International Conference on Knowledge Discovery and Information Retrieval