The pages parameter from the formula above
represents the number of indexed pages written in
the language in use. For English, this number can
easily be estimated by sending the word “the” to the
search engine and noting the number of hits. At the
time this paper was written, more than 11 billion
pages written in English were indexed by Google.
This value is detected automatically every time the
application is launched.
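As a minimal sketch of this start-up step (the paper does not name the search API, so the hit-count lookup below is a hypothetical hook rather than a real library call):

```python
def estimate_pages(hit_count, frequent_word="the"):
    # At application launch, query the search engine for a very frequent
    # word in the target language and take the reported number of hits as
    # the `pages` parameter. `hit_count` is a hypothetical hook standing
    # in for whatever search-engine API the deployed system actually uses.
    return hit_count(frequent_word)
```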
The second filter is applied when a low number
of co-occurrences is obtained (at most 500 hits).
Here, the parameter beta takes the value 1.05 in
order to provide stricter filtering than normal, since
fewer hits for the co-occurrence of the two chunks
imply a greater probability of a malapropism. The
filtering therefore depends not on the input text, but
on the number of hits for the co-occurrence of the
two chunks.
The third filter applies to co-occurrences that
have between 501 and 12,000 hits, and is the filter
used most often. Here, beta takes the value 1 instead
of 1.05, because this is considered the regular filter
in terms of permissiveness. Beyond this point, the
filters become increasingly permissive: as the
number of hits for the co-occurrence of the two
chunks grows, the probability of a malapropism
decreases.
The fourth filter is applied when no_combined is
between 12,001 and 14,000; here, beta takes the
value 0.95. The fifth filter lowers the probability of
signalling a malapropism further, setting beta to 0.9,
and is applied to chunks whose no_combined is
between 14,001 and 15,000. Finally, the most
permissive filter is applied when no_combined is
between 15,001 and 16,000, with beta set to 0.8.
Above this final threshold (16,000), no possible
malapropisms are signalled: with such a large
number of hits, one cannot reliably tell whether a
malapropism occurred or the text merely contains a
less frequent combination of two very popular
chunks.
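The whole cascade amounts to a mapping from no_combined to beta. The sketch below simply restates the thresholds from the text in Python; how beta then enters the detection formula is given by the formula above and is not repeated here:

```python
def beta_for(no_combined):
    """Map the co-occurrence hit count to the filtering coefficient beta,
    using the empirically determined thresholds from the text."""
    if no_combined <= 500:      # second filter: strictest
        return 1.05
    if no_combined <= 12000:    # third filter: the regular case
        return 1.0
    if no_combined <= 14000:    # fourth filter
        return 0.95
    if no_combined <= 15000:    # fifth filter
        return 0.9
    if no_combined <= 16000:    # sixth, most permissive filter
        return 0.8
    return None                 # above 16,000 hits, nothing is signalled
```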
The thresholds presented above and the
coefficient for the co-occurrence of the two chunks
on which these filters depend have been determined
empirically; they are language and time dependent,
but text independent. First, the values depend on the
language, because the number of pages written in
different languages is not the same. These values
were determined for English; if the language is
changed, the value of the pages parameter changes,
and so do the values of no_pages1, no_pages2 and
no_combined, so the thresholds are no longer
accurate. The values are also time dependent,
because the Internet is continuously expanding: the
number of written pages available to the search
engines keeps increasing and, at the same time, the
probability of finding incorrect text also increases,
affecting the thresholds of the presented filters.
Considering the large number of queries that are
sent to the search engine, we also investigated the
possibility of using the Google 5-grams corpus
“Web 1T 5-gram Version 1” (Brants and Franz,
2006) instead of sending our queries to the search
engine. Besides its very large size (30 GB of
compressed text), which makes it difficult to
integrate into any application, we observed another
drawback of this corpus: the document n-grams
were not completely covered by the corpus n-grams;
the coverage varied from 90% in the case of bigrams
down to 15% in the case of 5-grams. Moreover, the
nature of our application made us abandon this
corpus, because the application does not know a
priori the degree of the n-grams that are going to be
used, since this is determined dynamically by the
pseudo-chunker.
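A coverage figure of this kind can be measured with a simple set intersection. The sketch below assumes the document's n-grams and the corpus n-grams are already available as plain collections, which glosses over the lookup into the actual 30 GB corpus:

```python
def ngram_coverage(doc_ngrams, corpus_ngrams):
    # Fraction of the document's distinct n-grams that also occur in the
    # corpus (e.g. roughly 0.90 for bigrams and 0.15 for 5-grams against
    # Web 1T, per the figures reported above).
    doc = set(doc_ngrams)
    return len(doc & set(corpus_ngrams)) / len(doc) if doc else 0.0
```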
The purpose of this module is to limit the number
of misses in malapropism detection as much as
possible. The malapropisms signalled by this
module should cover all the real malapropisms that
exist in the text. The module also signals many false
alarms, but these are evaluated in the next module
and some of them are discarded.
3.2 Malapropisms Correction
This module has two main purposes. The first is
to determine which of the malapropisms signalled in
the previous step are false alarms, in order to
eliminate them. The second is to detect the most
probable candidates for the remaining malapropisms,
in order to correct the errors. The module uses all
three technologies: the paronyms dictionary, to
identify candidates for correcting the possible
malapropisms; lexical chains, to filter the list of
candidates down to those that fit the context; and,
finally, the search engine, to decide which candidate
is best when several fit the lexical chains.
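A hedged sketch of this candidate-selection logic follows; all five parameters are hypothetical hooks, since the paper does not expose the internal interfaces of the paronyms dictionary, the lexical-chain filter or the search-engine wrapper:

```python
def pick_correction(suspect, context, paronyms, fits_chains, hits):
    # suspect:     the word flagged as a possible malapropism
    # context:     the surrounding chunk pair as a format string with one
    #              slot for a candidate word, e.g. "a {} of errors"
    # paronyms:    dict mapping a word to its paronym candidates
    # fits_chains: predicate telling whether a candidate fits the
    #              document's lexical chains (the WordNet-based filter)
    # hits:        search-engine hit count for a query string

    # Lexical chains narrow the paronym candidates down to those that
    # fit the context.
    candidates = [c for c in paronyms.get(suspect, []) if fits_chains(c)]
    if not candidates:
        return None  # no candidate fits; the alarm may well be false
    # If several candidates fit the lexical chains, the search engine
    # breaks the tie: keep the candidate whose substituted chunk pair is
    # most frequent on the web.
    return max(candidates, key=lambda c: hits(context.format(c)))
```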
The module also works sequentially, analyzing
every pair of two chunks of words and deciding
whether a malapropism or a false alarm has been
found and, in the case of a malapropism, what the
replacement word should be. If the pair contains no
signalled malapropism, the process continues with
the next chunk, until a signalled malapropism is
reached.