AMBIGUOUS LEXICAL RESOURCES
FOR COMPUTATIONAL HUMOR GENERATION
Alessandro Valitutti
Department of Computer Science and HIIT, University of Helsinki, Helsinki, Finland
Keywords:
Ambiguity, Computational Humor, Creative Lexical Resources.
Abstract:
The ongoing work presented here investigates to what extent ambiguous text can feasibly be used in computational humor generation. The first core of a lexical database was developed
in order to collect ambiguous terms of the English lexicon. An exploratory use of the resource for
computational humor generation was then performed: three existing prototypes of humor generators were
simulated in order to generate different forms of humorous messages from the same lexical resource.
1 INTRODUCTION
Humor and creativity are closely connected. As wittily
pointed out by Joel Goodman, “there is a connection
between HA HA and AHA” (Goodman, 1995).
In the generation of a joke, a typical creative process
consists of inventing new ways to violate the recipients'
expectations and thus induce surprise. A more
specific form of creativity lies in the discovery of connections
that allow the humorist to emphasize ridiculous
aspects of people.
Nevertheless, in most cases part of the information
necessary for creating humorous surprise effects
is already present in common sense knowledge
and in linguistic usage. Creativity, in this case,
consists of the appropriate reuse of pre-existing pieces
of knowledge encoded in the language.
This paper focuses on linguistic ambiguity. The
use of ambiguous text is a common and effective
way to achieve the surprise effect. More specifically,
the ongoing work presented here investigates
to what extent ambiguous text can feasibly be used in computational
humor generation. As a first step, the focus is on
the lexical level. The first core of a lexical database,
characterized as an extension of WORDNET 3.1
(Fellbaum, 1998), was developed in order to collect
ambiguous terms of the English lexicon. Items are
defined according to three different types
of lexical ambiguity (homonymy, homophony and
idiomatic ambiguity) and are called double-edged words
(DEWs). The database is accordingly called DOUBLE-EDGED
WORDNET (DEWN).
As a second step, an exploratory use of the resource
was started: three existing prototypes of humor generators
were simulated in order to generate different
forms of humorous messages from the same lexical resource.
In this way, the aim is to take some steps
toward a more general model of humor generation, in
which part of the linguistic knowledge can be reused
and extended over time.
2 BACKGROUND
To date, there is only a limited amount of research
on the computational generation of humorous texts.
Ritchie provides a systematic review of the most notable
verbal humor generators developed in the
last 20 years (Ritchie, 2004). Among them are
LIBJOG, a program for the generation
of light bulb jokes (Raskin and Attardo, 1994), JAPE,
a program producing a specific type of punning riddles
(Binsted et al., 1997), and HAHAcronym, a generator
of humorous acronyms (Stock and Strapparava, 2002).
3 CHARACTERIZATION OF
DOUBLE-EDGED WORDS
The design and development of DEWN is based on
the idea of the double-edged word (from now on called
DEW), an abstract data structure introduced for modeling
a specific type of ambiguous lexical unit.
A humorous DEW is defined as a word with two meanings,
one of which is, at the same time, the least common
and the most interesting one. More specifically, a
DEW can be characterized by the following attributes:
WORD is the lexical unit (e.g. a single word or a
phrase).
AMBIGUITY is a list of two or more “meanings”
associated with the WORD.
DEPTH expresses the different typicality of the
meanings. For example, a twofold ambiguity
will be associated with a main meaning (called
surface meaning, with depth 1) and a secondary
meaning (called hidden meaning, with depth 2).
SLANT is a set of additional semantic labels associated
with the hidden meaning, characterizing it
as potentially humorous. Slant labels can be used
to emphasize the humorous role of the hidden meaning.
For example, slant labels can be selected in
order to evoke ridiculous traits of people.
Two main operations are associated with a database of
DEWs: 1) extraction of the attribute values of the DEWs
associated with an input word and 2) selection of the subset
of DEWs corresponding to an input slant. The
proper indexing of a large database of DEWs according
to the slant values is crucial for efficient retrieval
of items for creative applications.
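To make the data structure concrete, the following Python sketch shows one possible encoding of a DEW record together with the two operations just described. It is a minimal, hypothetical illustration: the field names, the toy database and the helper functions are assumptions for exposition, not taken from the actual DEWN implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DEW:
    word: str                        # WORD: the lexical unit (single word or phrase)
    meanings: Dict[int, str]         # AMBIGUITY + DEPTH: depth -> meaning identifier
                                     # (depth 1 = surface meaning, depth 2 = hidden meaning)
    slant: List[str] = field(default_factory=list)  # SLANT: labels on the hidden meaning

# Toy database with a single homonymic DEW (sense IDs are WordNet-style names).
DEW_DB: List[DEW] = [
    DEW("pig", {1: "pig.n.01", 2: "pig.n.03"}, slant=["ANIMAL->PERSON"]),
]

def get_dews(word: str) -> List[DEW]:
    """Operation 1: extract the DEW records (and their attribute values)
    associated with an input word."""
    return [d for d in DEW_DB if d.word == word]

def select_by_slant(label: str) -> List[DEW]:
    """Operation 2: select the subset of DEWs carrying a given slant label."""
    return [d for d in DEW_DB if label in d.slant]

print(get_dews("pig"))
print(select_by_slant("ANIMAL->PERSON"))
```

In a realistic setting the second operation would be backed by an inverted index from slant labels to DEW records, which is what makes retrieval efficient when the database grows.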
4 RESOURCE DESCRIPTION
The development of DEWN was performed according
to the three DEW types described below. Each of them
corresponds to a different form of lexical ambiguity:
homonymy, homophony, and idiomatic ambiguity.
4.1 Homonymic DEWs
Homonymy is defined as the relation between words
that share the same spelling and pronunciation but
have different meanings. This is the most commonly
recognized form of lexical ambiguity and the one employed
to define word meanings in a monolingual English
dictionary. The term is used here as a synonym of
polysemy, even though the latter is often used to
indicate words whose meanings have at least some feature in common
(Blank, 1999). In WordNet, each word meaning
is represented by a set of synonyms (synset) associated
with a specific ID in the database. Each word is
associated with one or more senses (i.e. ranked synsets).
The sense ranking is performed according to occurrence
frequency in a reference corpus annotated
with WordNet senses. It is therefore natural to identify
homonymic DEWs as words in WordNet with at
least two senses. The sense number expresses the
DEPTH attribute. A list of 24167 DEWs was extracted
from WordNet 3.1.
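A minimal sketch of this extraction step, using the NLTK interface to WordNet, could look as follows. Note that NLTK wraps WordNet 3.0 rather than 3.1, so the resulting counts and sense numbers may differ from those reported above.

```python
from nltk.corpus import wordnet as wn

def homonymic_dews(pos='n'):
    """Collect lemmas with at least two WordNet senses.

    The position of a sense in the frequency-ranked sense list
    plays the role of the DEPTH attribute (1 = surface meaning,
    2, 3, ... = hidden meanings)."""
    dews = {}
    for lemma in wn.all_lemma_names(pos=pos):
        senses = wn.synsets(lemma, pos=pos)
        if len(senses) >= 2:
            dews[lemma] = {rank: s.name() for rank, s in enumerate(senses, start=1)}
    return dews

dews = homonymic_dews()
print(len(dews))     # number of homonymic noun DEWs in this WordNet release
print(dews['pig'])   # the frequency-ranked senses of 'pig'
```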
4.2 Homophonic DEWs
Homophony is defined here as the relation between
words that are phonetically identical (complete homo-
phones) or similar (partial homophones) but with dif-
ferent spelling.
The algorithm for measuring phonetic distance
is a specific implementation of the Levenshtein
distance (Levenshtein, 1966). It is based on a sequence
of elementary operations applied to the phonetic
representation of a word in order to obtain another
word. Each step (i.e. application of an operation) is
associated with the value of a cost function. The sequence
of steps required to transform the first word
into the second, and corresponding to the minimum
total cost, defines the distance between the two
words. Three types of elementary operations are considered:
substitution, insertion and deletion.
The cost value associated with the substitution operator
was assigned according to phonetic type,
tonic accent, and vowel length. The algorithm reduces
the phonetic distance between words to the distance
between syllables, and the syllabic distance to
the distance between single phonemes.
The mapping between words and their phonetic
transcriptions was extracted from the CMU pronouncing
dictionary (available at
http://www.speech.cs.cmu.edu/cgi-bin/cmudict).
The above described phonetic distance was calculated
for all pairs of words in WordNet,
in order to collect sets of homophones. A total
of 5400 complete homophone sets and 23050 partial
homophone sets were obtained.
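The weighted cost function described above is not reproduced here, but the following sketch illustrates the underlying idea with uniform costs, using the CMU pronouncing dictionary as distributed with NLTK (an assumption for illustration; the original system accessed the dictionary directly and worked syllable by syllable).

```python
from nltk.corpus import cmudict

PRON = cmudict.dict()   # word -> list of possible phoneme sequences

def phone_distance(p1, p2):
    """Levenshtein distance over two phoneme sequences with uniform costs
    (the actual system weights substitutions by phonetic type, tonic
    accent and vowel length)."""
    m, n = len(p1), len(p2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # deletions
    for j in range(n + 1):
        d[0][j] = j                                # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p1[i - 1] == p2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def word_distance(w1, w2):
    """Minimum phonetic distance over all pronunciation variants of two words."""
    return min(phone_distance(a, b) for a in PRON[w1] for b in PRON[w2])

# Distance 0 would indicate complete homophones; small values, partial homophones.
print(word_distance("processing", "professing"))
```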
4.3 Idiomatic DEWs
Idiomatic ambiguity is a specific type of ambiguity
between literal and figurative language. Idioms are
defined here as multiword expressions whose meaning
cannot be inferred from the meanings of the component
words. The idiomatic meaning of a word is the
meaning associated with the idiom in which the word is
included.
A manual annotation of WordNet was performed
in order to identify lexical idioms (i.e. idioms stored
as compound entries). The collection includes
3541 WordNet synsets. For each of them, one or more
component words were selected. For each idiomatically
ambiguous word, the surface meaning (or literal
meaning) was defined as its first sense in Word-
Net, and the hidden meaning (or idiomatic meaning) as the first
sense of the idiom in which the word is included.
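Although the annotation described above was manual, the pairing of literal and idiomatic senses can be approximated automatically over the multiword entries of WordNet. The following NLTK sketch is only an assumption about how such a construction could be reproduced, and its output would not match the 3541 manually annotated synsets.

```python
from nltk.corpus import wordnet as wn

def idiomatic_dews(pos='n'):
    """For each multiword WordNet entry, pair every component word's first
    sense (surface/literal meaning) with the idiom's first sense
    (hidden/idiomatic meaning)."""
    dews = []
    for entry in wn.all_lemma_names(pos=pos):
        if '_' not in entry:                 # keep only multiword expressions
            continue
        idiom_senses = wn.synsets(entry, pos=pos)
        if not idiom_senses:
            continue
        for word in entry.split('_'):
            word_senses = wn.synsets(word, pos=pos)
            if word_senses:
                dews.append({
                    'word': word,
                    'idiom': entry,
                    'surface': word_senses[0].name(),  # literal meaning (depth 1)
                    'hidden': idiom_senses[0].name(),  # idiomatic meaning (depth 2)
                })
    return dews

print(idiomatic_dews()[:3])
```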
4.4 Slant Indexing
In order to implement the SLANT attribute (characterizing
potentially “interesting/relevant” meanings for
creative/humorous applications), a number of semantic
constraints were considered. Semantic constraints
can be classified into two categories: 1) absolute (i.e.
applied to a single meaning) and 2) relational (i.e. applied
to a pair of meanings of the same word).
An exploratory annotation of the previously collected
ambiguous terms was then performed, exploiting
three lexical collections: WORDNET 3.1,
WORDNET-DOMAINS (Magnini and Cavaglià, 2000)
and a list of positive/negative/polarized words. A
first semantic labeling of DEWs was performed taking
advantage of WordNet-Domains, an extension of
WordNet in which synsets are tagged according to
a list of semantic domains. Since the latest release of
WordNet-Domains is interfaced to WordNet 3.0, a
mapping to the 3.1 release was applied.
Other constraints were applied (and additional
lists of labeled words employed) in order to emphasize
the following two types of semantic opposition:
Polarized Words. Lists of positive and negative
words, collected from the Web and from the WordNet-Affect
lexical database (Strapparava and Valitutti,
2004), were employed to filter ambiguous
words whose meanings carry opposite polarity
values.
Metaphors. A list of metaphors for people was
automatically built exploiting the hypernym hier-
archy in WordNet. A list of high-level synsets
(called here metaphor categories) was defined.
The list includes categories such as ANIMAL (see
the example in the next section, based on the def-
inition of ‘pig’), FOOD and TOOL. The criterion
for the selection of DEWs is that the default sense
is a descendant, in the hypernym hierarchy, of a
metaphor category, and the hidden sense is a de-
scendant of the category PERSON.
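The metaphor criterion can be sketched directly on the WordNet hypernym hierarchy. In the NLTK version below, the category synsets for ANIMAL, FOOD, TOOL and PERSON are plausible identifiers chosen for illustration, not necessarily the ones used in DEWN.

```python
from nltk.corpus import wordnet as wn

PERSON = wn.synset('person.n.01')
METAPHOR_CATEGORIES = [wn.synset('animal.n.01'),
                       wn.synset('food.n.01'),
                       wn.synset('tool.n.01')]

def is_descendant(synset, ancestor):
    """True if `ancestor` appears on some hypernym path of `synset`."""
    return any(ancestor in path for path in synset.hypernym_paths())

def metaphor_dews():
    """Select words whose default (first) sense falls under a metaphor
    category and whose hidden (later) sense falls under PERSON."""
    selected = []
    for lemma in wn.all_lemma_names(pos='n'):
        senses = wn.synsets(lemma, pos='n')
        if len(senses) < 2:
            continue
        default = senses[0]
        if not any(is_descendant(default, cat) for cat in METAPHOR_CATEGORIES):
            continue
        for depth, hidden in enumerate(senses[1:], start=2):
            if is_descendant(hidden, PERSON):
                selected.append((lemma, default.name(), hidden.name(), depth))
                break
    return selected
```

Under this criterion a word like “pig” qualifies: its default sense descends from ANIMAL, while a later sense descends from PERSON.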
5 USE OF DEWN IN HUMOR
GENERATORS
As a first exploratory use of DEWN in computational
humor, a few examples are analyzed below. They were
obtained by applying procedures that simulate
a number of well-known computational humor
generators.
5.1 Examples of Punning Riddles
How do you define a pig?
It is a stout-bodied short-legged omnivorous
policeman.
In order to obtain this joke, the homonymic DEW
“pig” was selected. The definition (given in the form of an
answer) is the gloss of the default meaning (i.e. the first
WordNet sense of the corresponding noun), in which
the word “animal” was substituted with the first synonym
(“policeman”) of the hidden meaning (i.e. the third
WordNet sense).
The creation of a punning riddle starting from a
“lexical core” is inspired by the JAPE system (Binsted
and Ritchie, 1994), in which the joke is generally
based on a pair of phonetically similar words. An
analogous example is:
Who is a working girl?
A young streetwalker who is employed.
In this case, the definition is obtained by replacing
“woman” (in the gloss of the default meaning)
with “a young streetwalker” (from the hidden meaning).
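The gloss-substitution step underlying both riddles can be reconstructed in a few lines. The helper below merely imitates the described procedure (it is not the JAPE system), and the sense numbers and glosses follow the paper's example, so they may not be reproduced exactly by the WordNet release shipped with NLTK.

```python
from nltk.corpus import wordnet as wn

def riddle(word, hidden_rank, target, question, pos='n'):
    """Build a definition-style riddle: take the gloss of the default
    (first) sense and substitute `target` with the first synonym of
    the hidden sense (e.g. 'animal' -> 'policeman' for 'pig')."""
    senses = wn.synsets(word, pos=pos)
    surface, hidden = senses[0], senses[hidden_rank - 1]
    replacement = hidden.lemma_names()[0].replace('_', ' ')
    answer = surface.definition().replace(target, replacement)
    return f"{question}\nIt is a {answer}."

# Sense numbers follow the paper's example and may differ across WordNet releases.
print(riddle('pig', hidden_rank=3, target='animal',
             question='How do you define a pig?'))
```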
5.2 Examples of Funny Acronyms
This type of acronym generation is modeled on the
HAHAcronym system (Stock and Strapparava, 2002):
CPU = Celibate Professing Untied
The acronym is generated through the replace-
ment of each word in the original expansion (Cen-
tral Processing Unit) according to phonetic similarity
(“processing” vs. “professing”) and semantic opposi-
tion (“computer” vs. “religion”).
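A self-contained approximation of this replacement step, which scores candidates from a semantically opposed (slanted) word list by a crude phonetic similarity, might look as follows. Here difflib over CMU phoneme sequences stands in for the weighted distance of Section 4.2, and the candidate list is hypothetical; none of this is the original HAHAcronym code.

```python
from difflib import SequenceMatcher
from nltk.corpus import cmudict

PRON = cmudict.dict()

def phonetic_similarity(w1, w2):
    """Similarity ratio between the first CMU pronunciations of two words
    (a crude stand-in for the weighted phonetic distance of Section 4.2)."""
    return SequenceMatcher(None, PRON[w1][0], PRON[w2][0]).ratio()

def funny_substitute(word, slanted_candidates, threshold=0.6):
    """Replace `word` with the phonetically closest word from a list of
    candidates slanted toward an opposing semantic domain, if close enough."""
    scored = [(phonetic_similarity(word, c), c)
              for c in slanted_candidates if c in PRON]
    best = max(scored, default=(0.0, word))
    return best[1] if best[0] >= threshold else word

# Hypothetical RELIGION-slanted candidates opposing the COMPUTER domain.
print(funny_substitute("processing", ["professing", "celibate", "praying"]))
```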
The following “hand-made” example, instead,
cannot be generated with the present resource, because
it involves a model of ambiguity propagated at the
phrase level:
IBM = Interpreting Bible Machines
(from the original International Business Machines)
5.3 Variation of Familiar Expressions
The following example is based on the FEVER pro-
gram (Valitutti, 2011):
A chapel a day keeps the malefactor away.
This pun is obtained through two word replace-
ments in which both phonetic similarity and domain
slanting (RELIGION) constraints were applied.
The following hand-made expression, instead, cannot
be generated without a model describing ambiguity
at the sentence level:
An onion a day keeps everyone away.
6 CONCLUSIONS AND FUTURE
WORK
Through the development and description of DEWN,
this work emphasizes the advantage of using a collection
of ambiguous lexical items in computational humor
generation. The resource is based on the definition of
an abstract data structure (the DEW) and aims to simplify
and standardize a set of lexical operations employed in
existing systems for generating creative text. The example
applications were selected to support the idea
of reusing available creative operations and integrating
them through access to a shared lexical resource.
A crucial aspect in the future development of this
type of lexical resource is the indexing of items according
to specific semantic dimensions, especially
when the number of items is large enough to slow
down search.
The sharing of linguistic resources specialized for
creative applications, and the effort to integrate different
specialized humor generators into a more general
tool, can be seen as a form of adjacent possible. According
to this notion, coined by Stuart Kauffman (Kauffman,
2000), the creative achievements possible
at a given time are based on existing resources
and shared innovation. The proposed approach
aims to contribute to extending the space of
creative possibilities.
ACKNOWLEDGEMENTS
This work has been supported by the Algorithmic
Data Analysis (Algodan) Centre of Excellence of the
Academy of Finland.
REFERENCES
Binsted, K., Pain, H., and Ritchie, G. (1997). Children's evaluation of computer-generated punning riddles. Pragmatics and Cognition, 5(2):305–354.
Binsted, K. and Ritchie, G. (1994). An implemented model of punning riddles. In Proc. of the 12th National Conference on Artificial Intelligence (AAAI-94), Seattle.
Blank, A. (1999). Polysemy in the lexicon. In Eckardt,
R. and von Heusinger, K., editors, Meaning Change –
Meaning Variation, volume 1, pages 11–29.
Fellbaum, C. (1998). WordNet. An Electronic Lexical
Database. The MIT Press.
Goodman, J. (1995). 1,001 Ways to Add Humor to Your Life
and Work. Health Communications.
Kauffman, S. A. (2000). Investigations. OUP.
Levenshtein, V. I. (1966). Binary codes capable of correct-
ing deletions, insertions, and reversals. Soviet Physics
Doklady, 10(8):707–710.
Magnini, B. and Cavaglià, G. (2000). Integrating subject field codes into WordNet. In Proc. of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece.
Raskin, V. and Attardo, S. (1994). Non-literalness and non-
bona-fide in language: approaches to formal and com-
putational treatments of humor. Pragmatics and Cog-
nition, 2(1):31–69.
Ritchie, G. (2004). The Linguistic Analysis of Jokes. Rout-
ledge, London.
Stock, O. and Strapparava, C. (2002). HAHAcronym: Hu-
morous agents for humorous acronyms. In (Stock
et al., 2002).
Stock, O., Strapparava, C., and Nijholt, A., editors (2002).
Proceedings of the The April Fools Day Workshop on
Computational Humour (TWLT20), Trento.
Strapparava, C. and Valitutti, A. (2004). WordNet-Affect: an affective extension of WordNet. In Proc. of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon.
Valitutti, A. (2011). How many jokes are really funny? Towards a new approach to the evaluation of computational humour generators. In Proc. of the 8th International Workshop on Natural Language Processing and Cognitive Science, Copenhagen.