[Figure 1 example question: The word "bright" in paragraph 2 is closest in meaning to (A) smart (B) cheerful and lively (C) dazzling (D) valuable]
Research indicates that questioning is second only to lecturing in popularity as a teaching method and that classroom teachers spend anywhere from 35% to 50% of their instructional time conducting questioning sessions (Cotton, 1988). Multiple-choice and open-ended questions (why, what, how, etc.) are two of the most popular types of questions for knowledge evaluation. However, the manual construction of such questions requires high-level skill and is a hard, time-consuming task. Recent research has therefore investigated how natural language processing can contribute to generating questions automatically, and this line of work has attracted considerable attention.
For instance, Narendra et al. (2013) attempted to generate cloze (fill-in-the-blank) questions with a semi-structured approach, using a knowledge base extracted from the Cricket World Cup portal data, while Agarwal and Mannem (2011) generated factual cloze questions from a biology textbook. Liu and Calvo (2009) and Chen et al. (2009) worked on generating open-ended questions from essays or informational texts. Concerning the target domain, many attempts have focused on language learning, particularly English language learning (Sumita et al., 2005; Lee and Seneff, 2007; Lin et al., 2007; Smith et al., 2010).
This paper also addresses automatic question generation for English language learning. As the demand for communication across diverse communities has grown in recent years, English has increasingly been used as the main international language for interacting with different societies in both business and academic settings. Consequently, English proficiency tests such as TOEFL and TOEIC have become essential for measuring the English communication skills of non-native English speakers. However, since the past questions of those tests are not freely distributed, test takers can rely only on a limited number of test samples and preparation books. This is our main motivation for generating questions for English proficiency test practice. We focus on the multiple-choice vocabulary question because it makes up a substantial portion of the TOEFL iBT reading section (2-4 out of 12 questions per reading passage) and also appears in other English proficiency tests such as TOEIC.
In the area of vocabulary questions, much research has been done in the domain of English language learning, e.g. the generation of fill-in-the-blank questions for completing a sentence and questions on word collocations, synonyms, antonyms, etc. Questions have been generated to test students' knowledge of English in using the correct verbs (Sumita et al., 2005), prepositions (Lee and Seneff, 2007), and adjectives (Lin et al., 2007) in sentences. Pino et al. (2008) and Smith et al. (2010) have generated questions to teach and evaluate students' English vocabulary.
Figure 1: Four components in a vocabulary question asking for the closest in meaning of a word. [Figure content: passage excerpt "She was a bright young PhD graduate from Yale University, and her research on thermal dynamics …" with labeled components: (1) target word, (2) reading passage, (3) correct answer, (4) distractors]
In this research, we adopt TOEFL vocabulary questions as the format. This type of vocabulary question asks for the option closest in meaning to a given word. As shown in Figure 1, such a question is composed of four components: (1) a target word, (2) a reading passage in which the target word appears, (3) a correct answer, and (4) distractors (incorrect options). To generate a question, we need to produce these four components.
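The four components can be represented by a simple container. The sketch below (names and structure are our illustration, not part of the original system) mirrors the example in Figure 1:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VocabQuestion:
    """The four components of a closest-in-meaning vocabulary question."""
    target_word: str      # (1) the word being tested
    passage: str          # (2) reading passage in which the target word appears
    correct_answer: str   # (3) option closest in meaning to the target word
    distractors: List[str] = field(default_factory=list)  # (4) incorrect options

    def options(self) -> List[str]:
        # All answer options as presented to the learner;
        # in practice these would be shuffled before display.
        return [self.correct_answer] + self.distractors

q = VocabQuestion(
    target_word="bright",
    passage="She was a bright young PhD graduate from Yale University, ...",
    correct_answer="smart",
    distractors=["cheerful and lively", "dazzling", "valuable"],
)
```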
One possible approach to generating such questions is to use a manually created lexical knowledge base such as WordNet (Fellbaum, 1998). Brown et al. (2005) generated multiple-choice questions by taking their components from WordNet. Lin et al. (2007) also adopted WordNet to produce English adjective questions from a given text; the candidate options (correct answer and distractors) are taken from WordNet and filtered by Web search. Unlike previous work, we propose a method for question generation that utilises Web texts from the Internet in addition to information from WordNet. Producing the reading passage from Internet materials enables us to provide learners with fresh, up-to-date, and high-quality English reading passages.
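To make the WordNet-style approach concrete, the sketch below uses a tiny hand-made sense inventory in place of WordNet itself (all entries are hypothetical). It illustrates one common strategy, not necessarily the one used by the cited systems: correct-answer candidates come from synonyms of the target word in its passage sense, while distractor candidates come from synonyms of the target word's other senses.

```python
# Toy sense inventory: (word, sense label) -> list of synonyms.
# A real system would query WordNet synsets instead.
TOY_SYNONYMS = {
    ("bright", "intelligent"): ["smart", "clever"],
    ("bright", "luminous"): ["dazzling", "shining"],
}

def candidate_options(target, sense):
    """Correct-answer candidates: synonyms of the target word in its sense."""
    return TOY_SYNONYMS.get((target, sense), [])

def candidate_distractors(target, sense):
    """Distractor candidates: synonyms of the target word's *other* senses,
    which are plausible but wrong in the given passage context."""
    return [syn
            for (word, s), syns in TOY_SYNONYMS.items()
            if word == target and s != sense
            for syn in syns]
```

In a full pipeline, both candidate lists would then be filtered (e.g. by Web search frequency, as in Lin et al., 2007) before assembling the question.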
Another focus of this research is generating not only single-word options for the correct answer and distractors, but also multiple-word options, which past research did not deal with. Since multiple-word options are often used in actual English vocabulary tests like TOEFL, introducing them makes the generated questions more natural and closer to human-generated questions.
As shown in Figure 1, a multiple-choice vocabulary question consists of four components. Thus, given a target word with its part of speech (noun, verb, adjective, or adverb) as input, generating this kind of question can be broken down into three tasks: (1) reading passage generation, (2) correct answer generation, and (3) distractor generation. In the next three sections, we describe each task in detail, followed by an evaluation experiment. Finally, we conclude the paper and look at future directions.
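The three-task decomposition can be sketched as a pipeline. The stubs below are placeholders of our own (the real components are described in the following sections); only the overall flow is meant to be informative:

```python
def generate_passage(target_word, corpus):
    # Task (1): pick a text from the corpus that uses the target word.
    return next(t for t in corpus if target_word in t.split())

def generate_correct_answer(target_word, pos, passage):
    # Task (2): stub; the real system disambiguates the word sense
    # in the passage before choosing a synonym.
    return {"bright": "smart"}.get(target_word, "?")

def generate_distractors(target_word, pos, answer):
    # Task (3): stub distractor list; the real system draws on WordNet
    # and Web texts, and may produce multiple-word options.
    return ["cheerful and lively", "dazzling", "valuable"]

def generate_question(target_word, pos, corpus):
    passage = generate_passage(target_word, corpus)            # task (1)
    answer = generate_correct_answer(target_word, pos, passage)  # task (2)
    distractors = generate_distractors(target_word, pos, answer)  # task (3)
    return passage, answer, distractors
```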
2 READING PASSAGE GENERATION
In English proficiency tests such as TOEFL, the reading passage is taken from university-level academic texts on various subjects such as biology, sociology, history, etc. In this work we generate similar passages, but do not limit ourselves to academic texts; the Internet is used as the source for the reading passages. In addition, the text domain can be controlled by choosing the target sites from which texts are retrieved. Users, e.g. English teachers, can choose the sites depending on their interests: for example, if they prefer news articles on the subject of technology, they can choose sites such as www.nytimes.com under the “Technology” category. Utilising a reading passage retrieved from the Internet (especially from news portals) offers many benefits, because such texts tend to be new and up to date in terms of both content and writing style. They also cover broad genres and topics, making them well suited to English language learning.
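A minimal filter for the retrieved texts might look as follows. The length thresholds are assumptions of ours (roughly TOEFL-like passage sizes), not values from the paper:

```python
import re

def select_passages(texts, target_word, min_words=80, max_words=250):
    """Keep candidate passages that contain the target word as a whole
    word and fall within an assumed TOEFL-like length range."""
    # Whole-word, case-insensitive match so "bright" does not hit "brightness".
    pattern = re.compile(r"\b%s\b" % re.escape(target_word), re.IGNORECASE)
    return [t for t in texts
            if pattern.search(t) and min_words <= len(t.split()) <= max_words]
```

A real system would add further checks, e.g. readability level or removal of boilerplate from the crawled pages.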
A straightforward approach to generating the questions would be to choose a passage that contains the target word, and then identify its word sense in order to generate the correct answer and distractors. In general, a word in a dictionary has several meanings, while the same word in a given passage represents one specific meaning among them. The task of identifying the correct word sense within a context has been studied in natural language processing under the name of “word sense disambiguation” (WSD).
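As an illustration of WSD, the sketch below implements a simplified Lesk-style disambiguator: pick the sense whose dictionary gloss shares the most words with the target word's context. The tiny sense inventory is our own toy example, not WordNet data:

```python
# Toy gloss inventory: sense label -> dictionary gloss text.
TOY_GLOSSES = {
    "bright/intelligent": "quick to learn smart clever mentally sharp",
    "bright/luminous": "emitting much light shining dazzling vivid",
}

def simplified_lesk(context, glosses):
    """Return the sense whose gloss has the largest word overlap
    with the context (a crude but classic WSD baseline)."""
    ctx = set(context.lower().split())
    def overlap(item):
        sense, gloss = item
        return len(ctx & set(gloss.lower().split()))
    return max(glosses.items(), key=overlap)[0]

sense = simplified_lesk(
    "She was a bright young PhD graduate and quick to learn",
    TOY_GLOSSES)
# The context shares "quick", "to", "learn" with the first gloss,
# so the "intelligent" sense wins.
```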
2.1 Word Sense Disambiguation
Word sense disambiguation (WSD) is the task of iden-
tifying the meaning of a word in context in a com-
putational manner (Navigli, 2009). Vocabulary ques-
CSEDU 2015 - 7th International Conference on Computer Supported Education