Item Difficulty Analysis of English Vocabulary Questions
Yuni Susanti¹, Hitoshi Nishikawa¹, Takenobu Tokunaga¹ and Obari Hiroyuki²
¹Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
²College of Economics, Aoyama Gakuin University, Tokyo, Japan
Keywords:
English Vocabulary Test, Item Difficulty, Multiple-choice Question.
Abstract:
This study investigates the relations between several factors of question items in English vocabulary tests and the corresponding item difficulty. Controlling the item difficulty of a test impacts the quality of the test itself. Our goal is to suggest a way to control the item difficulty of questions generated by computers. To achieve this goal, we conducted correlation and regression analyses on several potential factors of question items and their item difficulty obtained through experiments. The analyses revealed that several item factors correlate with the item difficulty, and that up to 59% of the item difficulty can be explained by a combination of item factors.
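As a rough illustration of the kind of analysis described above (Pearson correlation between a single item factor and item difficulty, and the share of difficulty variance, R², explained by a multiple linear regression), the sketch below uses invented numbers: the factor names and all values are hypothetical, not the study's data.

```python
# Illustrative sketch only: synthetic data standing in for real item factors.
# Computes (a) the Pearson correlation between one item factor and item
# difficulty, and (b) the coefficient of determination R^2 of a two-factor
# linear regression fitted by ordinary least squares (normal equations).
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r_squared(X, y):
    # Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X)b = X'y
    # with Gaussian elimination, then report R^2 = 1 - SSE/SST.
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    k = len(rows[0])
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    b = [sum(rows[i][p] * y[i] for i in range(len(rows))) for p in range(k)]
    for c in range(k):                           # elimination with pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for q in range(c, k):
                A[r][q] -= f * A[c][q]
            b[r] -= f * b[c]
    coef = [0.0] * k                             # back substitution
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][q] * coef[q]
                              for q in range(r + 1, k))) / A[r][r]
    pred = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    my = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical item factors: target-word frequency rank and passage length.
freq_rank = [120, 450, 80, 900, 300, 700, 150, 620]
passage_len = [95, 140, 80, 180, 120, 160, 100, 150]
difficulty = [0.21, 0.55, 0.15, 0.82, 0.40, 0.70, 0.25, 0.63]

print(round(pearson(freq_rank, difficulty), 3))
print(round(r_squared(list(zip(freq_rank, passage_len)), difficulty), 3))
```

In practice one would use a statistics library rather than hand-rolled OLS; the point here is only how a correlation coefficient and a regression R² relate to "explaining" item difficulty from item factors.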
1 INTRODUCTION
English proficiency tests such as TOEFL® and TOEIC® are imperative in measuring the English communication skills of non-native English speakers.
Manual construction of questions for such tests, how-
ever, requires high-level skills, and is a hard and
time-consuming task. Recent research has investi-
gated how natural language processing (NLP) can
contribute to automatically generating such questions,
and, more generally, research on Computer-Assisted Language Testing (CALT) has received immense attention lately. Open-ended questions, which ask for the "why", "what" and "how" of something, and vocabulary questions are two of the most popular question types for evaluating English proficiency. Figure 1 shows an example of a TOEFL-like multiple-choice vocabulary question, which asks for the option with the closest meaning to the target word in the reading passage.
Automatic question generation for evaluating lan-
guage proficiency is an emerging application since it
has been made possible only recently with the avail-
ability of NLP technologies and resources such as
word sense disambiguation (WSD) techniques (Mc-
Carthy, 2009) and WordNet (Fellbaum, 1998), a
machine-readable lexical dictionary. To generate a
question as in Figure 1, one needs to produce four
components: (1) a target word, (2) a reading passage, (3) a correct answer and (4) distractors. Susanti et al. (2015) generated closest-in-meaning vocabulary questions employing Web news articles for the reading passage and WordNet for the correct answer; the distractors, or incorrect options, were generated using both the retrieved reading passage and the WordNet lexical dictionary. Brown et al. (2005) generated multiple-choice questions by taking their components from WordNet.
Figure 1: An example of a TOEFL-like closest-in-meaning vocabulary question.

The word "bright" in paragraph 2 is closest in meaning to
(A) smart
(B) cheerful and lively
(C) dazzling
(D) valuable
Reading passage (excerpt): "She was a bright young PhD graduate from Yale University, and her research on thermal dynamics …"

Question components: (1) target word, (2) reading passage, (3) correct answer, (4) distractors.
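The four question components can be assembled into a minimal pipeline sketch. Everything in the snippet below is an invented illustration: `MINI_WORDNET` is a toy stand-in for WordNet, `make_question` is a hypothetical helper, and the distractor strategy shown (synonyms of the target word's other senses) is one plausible choice rather than the exact method of the systems cited above.

```python
# Minimal sketch of assembling a closest-in-meaning vocabulary question.
# MINI_WORDNET is an invented toy stand-in for WordNet: each word maps to
# a list of senses, and each sense carries its synonyms.
MINI_WORDNET = {
    "bright": [
        {"sense": "intelligent", "synonyms": ["smart", "clever"]},
        {"sense": "shining", "synonyms": ["dazzling", "radiant"]},
    ],
}

def make_question(passage, target, sense_in_passage):
    """Build the stem, correct answer and distractors for one target word."""
    senses = MINI_WORDNET[target]
    # (3) correct answer: a synonym of the sense the passage actually uses
    # (in a real system, word sense disambiguation would pick this sense).
    correct = next(s for s in senses
                   if s["sense"] == sense_in_passage)["synonyms"][0]
    # (4) distractors: here, synonyms of the target word's *other* senses,
    # which are plausible but wrong in this context.
    distractors = [s["synonyms"][0]
                   for s in senses if s["sense"] != sense_in_passage]
    # (1) the target word and (2) the reading passage anchor the stem.
    stem = f'The word "{target}" in the passage is closest in meaning to'
    return stem, correct, distractors

passage = "She was a bright young PhD graduate."  # (2) reading passage
stem, answer, distractors = make_question(passage, "bright", "intelligent")
print(stem)
print("answer:", answer)            # prints "answer: smart"
print("distractors:", distractors)  # prints "distractors: ['dazzling']"
```

A real generator would retrieve the passage (e.g. from Web news articles), disambiguate the target word's sense in context, and draw distractors from both the passage and the lexical resource, as described in the introduction.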