the conventional MC and single alternate choice,
true-false (TF), formats.”
Table 1: Four G9a-non-compliant MC items that cover the
same content as the G9a-compliant MAC item displayed in
Figure 1.
MCQ01
TRS = Task record Sheet
Can you start completing TRSs before you’ve
completed the relevant training?
A) Yes
B) No
MCQ02
TRS = Task record Sheet
Is it acceptable to get a TRS signed a week
after the task was completed?
A) Yes
B) No
MCQ03
TRS = Task record Sheet
Is an apprentice permitted to get several TRSs
for the same task signed off on the same date?
A) Yes
B) No
MCQ04
TRS = Task record Sheet
Is an apprentice permitted to get several TRSs
for different tasks signed off on the same date?
A) Yes
B) No
3.2 Method
The hypothesis was tested by delivering both
G9a-non-compliant and G9a-compliant test items to
28 new entrants to the featured UK company. Two
parallel experiments were conducted to test G9a in
the featured domain. Group A (14 members) took an
assessment routine containing 8 G9a-non-compliant
items (MCQ01, MCQ02, MCQ03, etc.) and 8
G9a-compliant items (MAC09, MAC10, MAC11, etc.).
Meanwhile, Group B (14 members) was presented
with 8 G9a-non-compliant items (MCQ09, MCQ10,
MCQ11, etc.), which tested content equivalent to
MAC09, MAC10, MAC11, etc., and 8 G9a-compliant
items (MAC01, MAC02, MAC03, etc.), which tested
content equivalent to MCQ01, MCQ02, MCQ03, etc.
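The counterbalanced allocation described above can be sketched as follows. This is a minimal illustration, not the authors' delivery software. It assumes that the "etc." in each list runs over eight consecutively numbered items (01 to 08 and 09 to 16); the helper names `item_ids` and `build_routine` are hypothetical.

```python
def item_ids(prefix, first, last):
    """Return zero-padded item identifiers such as MCQ01 ... MCQ08."""
    return [f"{prefix}{n:02d}" for n in range(first, last + 1)]


def build_routine(group):
    """Return the 16 items delivered to a group (assumed numbering 01-16)."""
    if group == "A":
        # Group A: non-compliant MCQ01-08, compliant MAC09-16.
        return item_ids("MCQ", 1, 8) + item_ids("MAC", 9, 16)
    if group == "B":
        # Group B: non-compliant MCQ09-16, compliant MAC01-08.
        return item_ids("MCQ", 9, 16) + item_ids("MAC", 1, 8)
    raise ValueError(f"unknown group: {group!r}")


routine_a = build_routine("A")
routine_b = build_routine("B")

# Each content area (01-16) is seen once by each group, in opposite formats.
numbers_a = sorted(item[-2:] for item in routine_a)
numbers_b = sorted(item[-2:] for item in routine_b)
```

Under this assumed numbering, every content area is answered in the non-compliant format by one group and in the compliant format by the other, so no apprentice sees the same content twice.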
3.3 Evaluation
The adapted Guideline will be assessed as
acceptable if the change in item difficulty between a
G9a-compliant item and a G9a-non-compliant item
is not significant and the response time for the
G9a-non-compliant item is the same as or less than
the response time for the G9a-compliant item.
4 RESULTS
For each test item, a record was made of the option
selected by each apprentice, along with the time
taken to respond. For each experiment, a total of 28
sets of responses for the featured test items was
retained for analysis, consisting of 14 sets of
responses for G9a-compliant items and 14 sets of
responses for G9a-non-compliant items. All tests
were conducted under controlled conditions; however,
some candidates completed the test without
recording a response to some components of some
of the MAC items. All responses from any candidate
whose response record included one or more ‘no
response’ records were excluded from the analysis
of results in order to facilitate comparisons.
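The exclusion rule just described amounts to removing a candidate entirely as soon as any one response is missing. A minimal sketch follows, assuming that each candidate's record is a list of responses and that `None` encodes ‘no response’. Both the data layout and the function name `exclude_incomplete` are hypothetical, and the example records are invented for illustration.

```python
def exclude_incomplete(records):
    """Drop every candidate whose record contains at least one missing response.

    `records` maps a candidate id to a list of responses; None stands for
    'no response' (an assumed encoding).
    """
    return {
        candidate: responses
        for candidate, responses in records.items()
        if all(response is not None for response in responses)
    }


# Invented example data, for illustration only.
records = {
    "cand01": ["A", "B", "B"],
    "cand02": ["A", None, "B"],  # one missing component: excluded entirely
    "cand03": ["B", "B", "A"],
}
complete = exclude_incomplete(records)
```

Removing the whole record, rather than only the missing component, keeps the set of candidates identical across all items being compared.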
Table 2 summarizes the comparison between
G9a-compliant and G9a-non-compliant test item
response data, where ID is the Item Difficulty
calculated using the technique described in Swanson,
D.B., Holtzman, K.Z., Allbee, K., and Clauser, B.E. (2006).
Table 2: Results from the comparison of response data
between G9a-compliant and G9a-non-compliant items.
Item   IDn – IDc   RTn – RTc (s)
 1        0.05         -86
 2        0.10        -315
 3       -0.11         -88
 4        0.03        -103
 5       -0.01          18
 6        0.31         101
 7       -0.14         -95
 8        0.15          45
 9        0.09         103
10        0.31          45
11        0.06         -15
12        0.09           5
13       -0.08         -16
14       -0.24         -30
15        0.32        -102
16        0.20         -96
IDc indicates that the measurement has been
calculated from item response data for
G9a-compliant items, and IDn indicates that it has
been calculated for G9a-non-compliant items; RTc
and RTn are labelled in the same way. The IDn – IDc
column therefore contains the difference between
these two Item Difficulty values, and the RTn – RTc
column contains the difference, in seconds, between
the total recorded response times for non-compliant
and compliant items.
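To make the sign convention concrete, the following sketch recomputes simple descriptive summaries from the Table 2 values. It is an illustration only and not part of the authors' analysis. The variable names are hypothetical, and no significance test is implied.

```python
# Differences transcribed from Table 2 (non-compliant minus compliant).
id_diff = [0.05, 0.10, -0.11, 0.03, -0.01, 0.31, -0.14, 0.15,
           0.09, 0.31, 0.06, 0.09, -0.08, -0.24, 0.32, 0.20]
rt_diff = [-86, -315, -88, -103, 18, 101, -95, 45,
           103, 45, -15, 5, -16, -30, -102, -96]  # seconds

# A negative RTn - RTc means the non-compliant items took less total time.
faster_noncompliant = sum(1 for d in rt_diff if d < 0)
faster_compliant = sum(1 for d in rt_diff if d > 0)

mean_id_diff = sum(id_diff) / len(id_diff)
mean_rt_diff = sum(rt_diff) / len(rt_diff)

print(f"items with RTn < RTc: {faster_noncompliant}")
print(f"items with RTn > RTc: {faster_compliant}")
print(f"mean IDn - IDc: {mean_id_diff:.3f}")
print(f"mean RTn - RTc: {mean_rt_diff:.1f} s")
```

Counting signs and averaging the differences only describes the table; whether any difference is significant depends on the test applied to the underlying response data.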
The method chosen to calculate Item Difficulty
(ID) for this experiment is the one used by Swanson
et al. (2006). The psychometric characteristics used
in Swanson et al. (2006) also include the Logit