as well as phonetic similarity measures. We will also incorporate other heuristics, such as those proposed by Wong et al. (2006). Other collections of errors, such as the one used by Aspell (Hirst and St-Onge, 1998), will be included, as well as collections of documents tagged with errors, such as the one used by Pedler (2007). The platform can then be used to determine optimal parameters for combining different approaches and heuristics. We also wish to evaluate complete error correction systems on the same platform. These results will be more difficult to interpret because we do not control the resources (including dictionaries) these systems rely on, but they will provide reference results against which the raw performance of the evaluated approaches can be situated.
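To illustrate what determining optimal combination parameters could look like, the following is a minimal sketch and not the platform's implementation. It assumes a single weight blending a string similarity with a phonetic similarity, tuned by grid search on top-1 accuracy. The error list, the dictionary, and the `crude_phonetic_key` function are hypothetical stand-ins chosen for illustration only.

```python
from difflib import SequenceMatcher

# Toy data (assumption): (misspelling, intended word) pairs and a tiny dictionary.
ERRORS = [("recieve", "receive"), ("fone", "phone"),
          ("nite", "night"), ("adress", "address")]
DICTIONARY = ["receive", "relieve", "phone", "fine",
              "night", "note", "address", "adores"]


def string_similarity(a, b):
    """Character-level similarity in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()


def crude_phonetic_key(word):
    """Hypothetical stand-in for a phonetic code: rewrite two digraphs,
    then drop every vowel except a leading one."""
    w = word.lower().replace("ph", "f").replace("gh", "")
    return w[:1] + "".join(c for c in w[1:] if c not in "aeiou")


def phonetic_similarity(a, b):
    """Similarity of the crude phonetic keys, in [0, 1]."""
    return SequenceMatcher(None, crude_phonetic_key(a),
                           crude_phonetic_key(b)).ratio()


def best_candidate(error, weight):
    """Return the dictionary word with the highest blended score."""
    def score(candidate):
        return (weight * string_similarity(error, candidate)
                + (1 - weight) * phonetic_similarity(error, candidate))
    return max(DICTIONARY, key=score)


def accuracy(weight):
    """Fraction of errors whose top suggestion is the intended word."""
    hits = sum(best_candidate(e, weight) == target for e, target in ERRORS)
    return hits / len(ERRORS)


def tune(weights):
    """Grid search: return the first weight reaching the best accuracy."""
    return max(weights, key=accuracy)


GRID = [i / 10 for i in range(11)]
BEST_WEIGHT = tune(GRID)
```

With more measures, the single weight would become a weight vector and the grid search could be replaced by any standard optimisation procedure. The evaluation loop itself would stay the same.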
An Indexation Time Error Correction (ITEC) system can be used during document analysis to correct the errors that documents contain, allowing the creation of more representative indexes. We wish to evaluate error correction approaches indirectly by comparing the results obtained by information retrieval systems on evaluation campaigns such as TREC (Kantor and Voorhees, 2000) or INEX, with and without ITEC enabled.
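The indirect evaluation can be sketched on a toy scale as follows. This is an illustration under stated assumptions, not the planned experimental setup: the documents, the correction table standing in for an error corrector, the query, and the use of mean recall in place of official campaign metrics are all hypothetical.

```python
from collections import defaultdict

# Toy collection (assumption): d1 contains a misspelling of "government".
DOCS = {
    "d1": "the goverment published a report",
    "d2": "the government rejected the report",
    "d3": "weather report for monday",
}
# Hypothetical corrector output, applied at indexation time.
CORRECTIONS = {"goverment": "government"}
# Query term -> set of relevant document identifiers (assumption).
QUERIES = {"government": {"d1", "d2"}}


def correct(token):
    """Return the corrected form of a token, or the token itself."""
    return CORRECTIONS.get(token, token)


def build_index(docs, use_itec):
    """Build an inverted index, optionally correcting tokens first."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[correct(token) if use_itec else token].add(doc_id)
    return index


def mean_recall(index, queries):
    """Average, over queries, of the fraction of relevant docs retrieved."""
    total = 0.0
    for query, relevant in queries.items():
        retrieved = index.get(query, set())
        total += len(retrieved & relevant) / len(relevant)
    return total / len(queries)


baseline = mean_recall(build_index(DOCS, use_itec=False), QUERIES)
with_itec = mean_recall(build_index(DOCS, use_itec=True), QUERIES)
```

In this toy collection the misspelled document is only reachable once correction is applied at indexation time, so the difference between `with_itec` and `baseline` serves as an indirect measure of the corrector's usefulness.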
REFERENCES
Atkinson, K. (2012). Aspell Spellchecker. http://aspell.net. Last access 15 Jan. 2012.
Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press, Cambridge.
Hirst, G. and Budanitsky, A. (2005). Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering, 11(1):87–111.
Hirst, G. and St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In Fellbaum, C., editor, WordNet: An Electronic Lexical Database, volume 305, chapter 13, pages 305–332. The MIT Press.
Kantor, P. B. and Voorhees, E. M. (2000). The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text. Information Retrieval, 2(2):165–176.
Kukich, K. (1992). Techniques for Automatically Correcting Words in Text. ACM Computing Surveys (CSUR), 24(4):439.
Mays, E., Damerau, F. J., and Mercer, R. L. (1991). Context based spelling correction. Information Processing & Management, 27(5):517–522.
Miller, G. A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39–41.
Mitton, R. (2008). Ordering the suggestions of a spellchecker without using context. Natural Language Engineering, 15(2):173–192.
Mudge, R. (2012). After the Deadline. http://static.afterthedeadline.com. Last access 15 Jan. 2012.
OSGi-Alliance (2012). Open Services Gateway initiative. http://www.osgi.org. Last access 15 Jan. 2012.
Pedler, J. (2007). Computer Correction of Real-word Spelling Errors in Dyslexic Text. PhD thesis, Birkbeck, London University.
Rosnay, J. and Revelli, C. (2006). Pronetarian Revolution.
Ruch, P. (2002). Using contextual spelling correction to improve retrieval effectiveness in degraded text collections. In Proceedings of the 19th International Conference on Computational Linguistics, volume 1, page 7. Association for Computational Linguistics.
Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656.
Subramaniam, L. V., Roy, S., Faruquie, T. A., and Negi, S. (2009). A Survey of Types of Text Noise and Techniques to Handle Noisy Text. Language, pages 115–122.
Varnhagen, C. K., McFall, G. P., Figueredo, L., Takach, B. S., Daniels, J., and Cuthbertson, H. (2009). Spelling and the Web. Journal of Applied Developmental Psychology, 30(4):454–462.
Voorhees, E. M., Garofolo, J., and Sparck Jones, K. (2000). The TREC-6 Spoken Document Retrieval Track. Bulletin of the American Society for Information Science and Technology, 26(5):18–19.
Wikipedia Community (2012). Wikipedia List of Common Misspellings. http://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings. Last access 15 Jan. 2012.
Wiktionary Community (2012). Wiktionary Online Collaborative Dictionary. http://en.wiktionary.org/wiki/Wiktionary:Main_Page. Last access 15 Jan. 2012.
Wilcox-O’Hearn, A., Hirst, G., and Budanitsky, A. (2008). Real-Word Spelling Correction with Trigrams: A Reconsideration of the Mays, Damerau, and Mercer Model. In Gelbukh, A., editor, Proceedings of CICLing-2008, LNCS 4919, pages 605–616. Springer-Verlag.
Wong, W., Liu, W., and Bennamoun, M. (2006). Integrated Scoring for Spelling Error Correction, Abbreviation Expansion and Case Restoration in Dirty Text. In 5th Australasian Conference on Data Mining and Analytics (AusDM’06), pages 83–89, Sydney, Australia. Australian Computer Society.