5 CALIBRATION
In (Rudner, 2002; Rudner, 2009; Rudner, 2010) Rudner spends only a few words on MDT calibration (the estimation of the priors: vector $\vec{p}$ and set $P$), but for practical purposes the calibration process is essential. In this section we present methods of calibration together with the results of our experiments, showing characteristics of the calibration process that are important for implementing MDT in real-world testing.
Here we focus entirely on the estimation of set $P$, because in the worst case, if we were unable to estimate $\vec{p}$, we could set $\vec{p} = \left(\frac{1}{|M|}, \frac{1}{|M|}, \ldots, \frac{1}{|M|}\right)$ (categories equally distributed in the population) without fatal consequences for the precision of the method (see (Rudner, 2009)).
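This fallback prior is trivial to construct; a minimal Python sketch (the function name uniform_prior is ours):

```python
def uniform_prior(num_categories):
    """Fallback prior: every category equally likely."""
    return [1.0 / num_categories] * num_categories

uniform_prior(4)  # [0.25, 0.25, 0.25, 0.25]
```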
5.1 Basic Calibration
As we have already mentioned, MDT is an instance of the well-known Naive Bayes Classifier (NBC). NBC is widely used in Artificial Intelligence, where the calibration process (“classifier training”) is well developed. In AI there is a “training set” of objects whose attributes and classifications are both known. The equivalent of a training set in Educational Measurement is pilot testing performed on a set of examinees (“pre-testees”) of known classification (typically obtained from external sources, e.g. existing certifications). Once we have a set of objects (pre-testees), their attributes (responses to items) and their classifications, we are able to compute the parameters of the attributes (items).
More precisely: let us have a set of categories $M$, a set of items $U$, a set of examinees $Z$, a set of appropriate responses $R$ and a vector of appropriate classifications $\vec{c}$. Our task is to obtain an appropriate set $P$. Once again we can describe the process as a function $B : (R, \vec{c}) \rightarrow P$. Since $P$ is a set of $P(u_i \mid m_k)$ elements, we can simplify the computation of function $B$ to the computation of each element. Equations 6, 7 and 8 describe the evaluation of $P(u_i \mid m_k)$ in three steps.
$$R^{m_k} = \{\vec{z}_j \mid \vec{z}_j \in R \wedge c_j = m_k\} \quad (6)$$
$$T^{m_k}_i = \{z_{ji} \mid \vec{z}_j \in R^{m_k} \wedge z_{ji} = 1\} \quad (7)$$
$$P(u_i \mid m_k) = \frac{|T^{m_k}_i|}{|R^{m_k}|} \quad (8)$$
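Equations 6–8 translate directly into code. The following Python sketch of function $B$ is ours (the name calibrate and the data layout are assumptions); the set cardinalities become simple counts:

```python
from collections import defaultdict

def calibrate(responses, classifications):
    """Compute set P per equations 6-8.

    responses[j][i] -- 1 iff pre-testee j answered item i correctly;
    classifications[j] -- known category of pre-testee j.
    Returns P[k][i] = P(u_i | m_k) as a dict of lists.
    """
    # Equation 6: partition response vectors by category (the sets R^{m_k}).
    by_category = defaultdict(list)
    for z_j, c_j in zip(responses, classifications):
        by_category[c_j].append(z_j)

    P = {}
    for m_k, vectors in by_category.items():
        n_items = len(vectors[0])
        # Equations 7 and 8: count correct responses per item (|T^{m_k}_i|)
        # and divide by the category size (|R^{m_k}|).
        P[m_k] = [sum(z_j[i] for z_j in vectors) / len(vectors)
                  for i in range(n_items)]
    return P
```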
The crucial difference between the usage of NBC in Artificial Intelligence and in Educational Measurement is the size of the training set. In AI we typically operate with large training sets, often even larger than the set of objects we want to classify (see examples in (Caruana and Niculescu-Mizil, 2006)). In contrast, in Educational Measurement we are very limited in the number of pre-testees. It is an expensive process to recruit persons of known classification, especially if we are developing a brand-new test. Therefore there is a strong motivation to keep the number of pre-testees as small as possible. The next two sections are dedicated to the analysis of the required number of pre-testees (section 5.2) and to the description of a particular calibration improvement technique (section 5.3).
5.2 Items or Categories
Two approaches are possible when specifying a sufficient number of pre-testees: per-item (used by Rudner in (Rudner, 2010)) or per-category. In this section we show which approach is more appropriate. To answer this question we have constructed two sets of experiments. The experiments are repeated several times and share a common framework; their results are again presented as box plots.
5.2.1 Framework
Let us have a given number of categories $m$, a number of items $u$, a number of pre-testees $z_p$, and a number of examinees $z = 200$, defining the sets $M$, $U$, $Z^p$, $Z$, together with a number of selected items $u_s \le u$.
Step 1. Let us again have a set of item parameters $P_U = \bigcup_{u_i \in U} GI()$; a vector $\vec{c}^{Z^p}$ of categories of the pre-testees $z^p_j \in Z^p$, such that $c^{Z^p}_j = RAN(\{1, 2, \ldots, m\}, \vec{p})$ where $\vec{p} = \left(\frac{1}{m}, \frac{1}{m}, \ldots, \frac{1}{m}\right)$ (again an equal distribution of $\vec{p}$); a set $R^p$ of responses of pre-testees such that $z^p_{ji} = RAN\left(\{0, 1\}, \left\{1 - P_U(u_i \mid c^{Z^p}_j),\; P_U(u_i \mid c^{Z^p}_j)\right\}\right)$; a set $U^s \subseteq U$ of $u_s$ randomly (equally distributed) selected items $u_i \in U$; and a set $R^s$ of responses of examinees to the items of $U^s$ such that $z^s_{ji} = RAN\left(\{0, 1\}, \left\{1 - P_U(u^s_i \mid c^{Z^p}_j),\; P_U(u^s_i \mid c^{Z^p}_j)\right\}\right)$.
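Step 1 can be simulated roughly as follows. This Python sketch is ours and makes explicit assumptions: $GI()$ is approximated by uniform random draws from $(0, 1)$ (the paper's item-parameter generator is defined elsewhere), categories are shifted to 0-based labels for indexing, and $RAN$ is realized by standard random draws:

```python
import random

def generate_pretest_data(m, u, z_p, P_U=None):
    """Rough simulation of Step 1 (a sketch, not the paper's code).

    P_U[k][i] plays the role of P_U(u_i | m_k).
    """
    if P_U is None:
        # Stand-in for P_U = union of GI() over all items (assumption:
        # uniform draws instead of the paper's generator GI()).
        P_U = [[random.random() for _ in range(u)] for _ in range(m)]
    # c_j = RAN({1,...,m}, p) with uniform p, shifted to 0-based labels.
    categories = [random.randrange(m) for _ in range(z_p)]
    # z_ji = RAN({0,1}, {1 - P_U(u_i|c_j), P_U(u_i|c_j)}).
    responses = [[1 if random.random() < P_U[c_j][i] else 0
                  for i in range(u)]
                 for c_j in categories]
    return P_U, categories, responses
```

The selected item set $U^s$ can then be drawn with, e.g., `random.sample(range(u), u_s)`, and the examinee responses $R^s$ generated in the same way, restricted to the selected items.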
Step 2. Let $P = B(R^p, \vec{c}^{Z^p})$ and then let $\vec{c} = F(\vec{p}, P, R^s)$.
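Function $F$ is defined earlier in the paper; since MDT is an instance of NBC, a plausible per-examinee sketch of the decision (ours; the names and the clamping of estimated parameters are our additions) is:

```python
import math

def classify(priors, P, response):
    """Pick the category with the highest posterior.

    priors[k] corresponds to p_k, P[k][i] to P(u_i | m_k), response[i]
    to z_i in {0, 1}.  Log-space sums avoid floating-point underflow;
    the clamping is needed because equation 8 can yield exact 0 or 1.
    """
    best_k, best_score = None, -math.inf
    for k, p_k in enumerate(priors):
        score = math.log(p_k)
        for z_i, p_ki in zip(response, P[k]):
            p_ki = min(max(p_ki, 1e-9), 1.0 - 1e-9)
            score += math.log(p_ki if z_i == 1 else 1.0 - p_ki)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```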
Step 3. Let the classification error rate again be $e = E(\vec{c}, \vec{c}^Z)$, and let the difference of the calibration from the real parameters be $d = D(P, P_U)$, where function $D$ is defined by equation 9.
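Assuming $E$ is the misclassification fraction (our reading of the earlier definition), a minimal sketch:

```python
def error_rate(predicted, actual):
    """Fraction of examinees whose classification differs from the truth."""
    return sum(1 for c_hat, c in zip(predicted, actual) if c_hat != c) / len(actual)
```

We do not sketch $D$, since it depends on equation 9 given below.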