humorous DEW is defined as a word with two mean-
ings, one of which is, at the same time, the least com-
mon and most interesting one. More specifically, a
DEW can be characterized by the following attributes:
• WORD is the lexical unit (e.g. a single word or a
phrase).
• AMBIGUITY is a list of two or more “meanings”
associated with the WORD.
• DEPTH expresses the different typicality of the
two meanings. For example, a twofold ambiguity
will be associated with a main meaning (called
surface meaning, with depth 1) and a secondary
meaning (called hidden meaning, with depth 2).
• SLANT is a set of additional semantic labels associated
with the hidden meaning, characterizing it
as potentially humorous. Slant labels can be used
to emphasize the humorous role of the hidden meaning.
For example, slant labels can be selected in
order to evoke ridiculous traits of people.
Two main operations are associated with a database of
DEWs: 1) extraction of the attribute values of the DEW
associated with an input word and 2) selection of the
subset of DEWs corresponding to an input slant.
Proper indexing of a large database of DEWs according
to the slant values is crucial for the efficient
retrieval of items for creative applications.
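The four attributes and the two operations can be illustrated with a minimal sketch. The class names, the method names, and the sample entry (including its slant label) are hypothetical and chosen only for illustration; they are not part of DEWN. The sketch assumes that a slant index is simply a mapping from each label to the words carrying it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    gloss: str
    depth: int                      # 1 = surface meaning, 2 = hidden meaning
    slant: frozenset = frozenset()  # humorous labels, usually on hidden meanings


@dataclass
class DEW:
    word: str        # WORD: a single word or a phrase
    ambiguity: list  # AMBIGUITY: two or more Meaning objects


class DEWDatabase:
    """Toy store indexed by word and by slant label (illustrative only)."""

    def __init__(self):
        self._by_word = {}
        self._by_slant = defaultdict(set)

    def add(self, dew):
        if len(dew.ambiguity) < 2:
            raise ValueError("a DEW needs at least two meanings")
        self._by_word[dew.word] = dew
        for meaning in dew.ambiguity:
            for label in meaning.slant:
                self._by_slant[label].add(dew.word)

    def attributes(self, word):
        """Operation 1: attribute values of the DEW for an input word."""
        dew = self._by_word[word]
        ordered = sorted(dew.ambiguity, key=lambda m: m.depth)
        return {
            "word": dew.word,
            "ambiguity": [m.gloss for m in ordered],
            "depth": [m.depth for m in ordered],
            "slant": sorted(set().union(*(m.slant for m in ordered))),
        }

    def select_by_slant(self, label):
        """Operation 2: words of all DEWs carrying the input slant label."""
        return sorted(self._by_slant.get(label, ()))


# Made-up entry: glosses and slant label are assumptions for the example.
db = DEWDatabase()
db.add(DEW("crank", [
    Meaning("a handle used to turn a shaft", depth=1),
    Meaning("an eccentric person", depth=2, slant=frozenset({"eccentricity"})),
]))
```

Because the slant index is built at insertion time, selecting by slant is a single dictionary lookup rather than a scan over all entries.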
4 RESOURCE DESCRIPTION
The development of DEWN was performed according
to three different types of DEWs, described below. Each of them
corresponds to a different form of lexical ambiguity:
homonymy, homophony, and idiomatic ambiguity.
4.1 Homonymic DEWs
Homonymy is defined as the relation between words
that share the same spelling and pronunciation but
have different meanings. This is the most typically
recognized form of lexical ambiguity and the one employed
to define word meanings in a monolingual English
dictionary. The term is used here as a synonym of
polysemy, even though the latter is often used to
indicate words whose meanings have at least some feature in common
(Blank, 1999). In WordNet each word meaning
is represented by a set of synonyms (synset) and associated
with a specific ID in the database. Each word is
associated with one or more senses (i.e. ranked synsets).
Senses are ranked according to their occurrence
frequency in a reference corpus annotated
with WordNet senses. It is therefore natural to identify
homonymic DEWs as words in WordNet with at
least two senses. The sense number expresses the
DEPTH attribute. A list of 24167 DEWs was extracted
from WordNet 3.1.
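The selection criterion can be sketched on a toy sense inventory. The function name and the synset IDs below are invented for illustration and do not come from WordNet; the only assumption is that each word maps to its synsets already ordered by sense rank.

```python
def homonymic_dews(sense_inventory):
    """Keep words with at least two senses; the sense number gives the DEPTH."""
    dews = {}
    for word, ranked_synsets in sense_inventory.items():
        if len(ranked_synsets) >= 2:
            dews[word] = [
                (depth, synset_id)
                for depth, synset_id in enumerate(ranked_synsets, start=1)
            ]
    return dews


# Hypothetical inventory: the IDs are placeholders, not real WordNet IDs.
toy_inventory = {
    "bank": ["synset-a", "synset-b", "synset-c"],
    "oxygen": ["synset-d"],
}
toy_dews = homonymic_dews(toy_inventory)
```

Here "bank" is kept with depths 1 to 3, while the single-sense "oxygen" is discarded.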
4.2 Homophonic DEWs
Homophony is defined here as the relation between
words that are phonetically identical (complete homo-
phones) or similar (partial homophones) but with dif-
ferent spelling.
The algorithm for the measure of the phonetic dis-
tance is a specific implementation of the Levenshtein
distance (Levenshtein, 1966). It is based on a sequence
of elementary operations applied to the phonetic
transcription of a word in order to obtain another
word. Each step (i.e. application of an operation) is
associated with the value of a cost function. The sequence
of steps that transforms the first word
into the second one with the minimum
total cost defines the distance between the two
words. Three types of elementary operations are con-
sidered: substitution, insertion and deletion.
The cost value associated with the substitution operator
was assigned according to the phonetic type,
tonic accent, and vowel length. The algorithm reduces
the phonetic distance between words to the distance
between syllables, and the syllabic distance to
the distance between single phonemes.
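A weighted Levenshtein distance over phoneme sequences can be sketched as follows. This is a simplified illustration, not the implementation used for DEWN: it works on a flat phoneme sequence and omits the syllable level and vowel length, and all cost values are assumptions. Phonemes are written in the ARPAbet notation of the CMU dictionary, where a trailing digit on a vowel marks stress.

```python
# ARPAbet vowel symbols (without stress digits).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def strip_stress(phoneme):
    """Remove a trailing stress digit, e.g. 'AY1' -> 'AY'."""
    return phoneme[:-1] if phoneme[-1].isdigit() else phoneme


def substitution_cost(p, q):
    """Assumed costs: cheaper when the phonemes are more alike."""
    if p == q:
        return 0.0
    base_p, base_q = strip_stress(p), strip_stress(q)
    if base_p == base_q:
        return 0.25  # same vowel, different stress
    if (base_p in VOWELS) == (base_q in VOWELS):
        return 0.5   # same broad type (vowel/vowel or consonant/consonant)
    return 1.0       # vowel replaced by consonant or vice versa


def phonetic_distance(source, target, indel_cost=1.0):
    """Minimum total cost of substitutions, insertions and deletions."""
    previous = [j * indel_cost for j in range(len(target) + 1)]
    for i, p in enumerate(source, start=1):
        current = [i * indel_cost]
        for j, q in enumerate(target, start=1):
            current.append(min(
                previous[j] + indel_cost,                 # delete p
                current[j - 1] + indel_cost,              # insert q
                previous[j - 1] + substitution_cost(p, q),
            ))
        previous = current
    return previous[-1]
```

With these assumed costs, identical transcriptions are at distance 0 (complete homophones), while small positive distances would indicate candidate partial homophones.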
The information on mapping between words
and their phonetic transcription was extracted from
the CMU pronouncing dictionary (available at
http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
A measure of the phonetic distance described above
was calculated for all pairs of words in WordNet,
in order to collect sets of homophones. A total
of 5400 complete homophonic sets and 23050 partial
homophonic sets were obtained.
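Collecting complete homophonic sets amounts to grouping words that share exactly the same transcription. The sketch below assumes input lines in the plain-text layout of the CMU dictionary (a word, optionally followed by a variant marker such as "(1)", then its phonemes); the sample lines are illustrative and should not be read as verified dictionary entries. The function names are hypothetical, and the restriction to words in WordNet and the thresholding needed for partial sets are not shown.

```python
import re
from collections import defaultdict


def parse_cmudict_lines(lines):
    """Yield (word, phonemes) pairs, skipping blank and ';;;' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        head, *phonemes = line.split()
        word = re.sub(r"\(\d+\)$", "", head).lower()  # drop variant marker
        yield word, tuple(phonemes)


def complete_homophone_sets(entries):
    """Group differently spelled words with identical transcriptions."""
    by_pronunciation = defaultdict(set)
    for word, phonemes in entries:
        by_pronunciation[phonemes].add(word)
    return sorted(
        sorted(words) for words in by_pronunciation.values() if len(words) > 1
    )


# Illustrative lines in the assumed format.
sample_lines = [
    ";;; sample data",
    "KNIGHT  N AY1 T",
    "NIGHT  N AY1 T",
    "FLOUR  F L AW1 ER0",
    "FLOUR(1)  F L AW1 R",
    "FLOWER  F L AW1 ER0",
    "CAT  K AE1 T",
]
homophone_sets = complete_homophone_sets(parse_cmudict_lines(sample_lines))
```

Grouping by transcription avoids comparing all word pairs for the complete case; pairwise distances are only needed for partial homophones.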
4.3 Idiomatic DEWs
Idiomatic ambiguity is a specific type of ambiguity
between literal and figurative language. Idioms are
defined here as multiword expressions whose meaning
cannot be inferred from the meanings of the component
words. The idiomatic meaning of a word is the
meaning associated with the idiom in which the word is
included.
A manual annotation of WordNet was performed
in order to identify lexical idioms (i.e. idioms consisting
of a compound word). The collection includes
3541 WordNet synsets. For each of them, one or more
component words were selected. For each idiomatically
ambiguous word, the surface meaning (or literal
meaning) was defined as its first sense in Word-
AMBIGUOUS LEXICAL RESOURCES FOR COMPUTATIONAL HUMOR GENERATION