2. Median Absolute Error (MdAE), median(|e|)
3. Root Mean Square Error (RMSE), $\sqrt{\mathrm{mean}(e^2)}$
4. Root Median Square Error (RMdSE), $\sqrt{\mathrm{median}(e^2)}$.
Applying one of the measures produces a value $val_i$
for each strategy $t_i$, $1 \le i \le n$. Since the true progress
is always the same for each strategy and all values are
on the percentage scale, statistics allows us to compare
the different values. The strategy with the lowest value
is the best one of the considered strategies.
A value of 0 is perfect for all measures: it means
that the error between the true and predicted progress
is zero. The RMSE and RMdSE have the disadvantage
that they are infinite, undefined, or skewed if
all observed values are 0 or close to 0 (Hyndman and
Koehler, 2006). Since the true progress takes values in
the range from 0 to 100%, this disadvantage does not
affect them.
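
The comparison of strategies described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the error e is the per-page difference between true and predicted progress (in percent); the function names and the data layout are hypothetical and not taken from the paper.

```python
import math
import statistics


def mdae(errors):
    """Median Absolute Error: median(|e|)."""
    return statistics.median([abs(e) for e in errors])


def rmse(errors):
    """Root Mean Square Error: sqrt(mean(e^2))."""
    return math.sqrt(statistics.fmean([e * e for e in errors]))


def rmdse(errors):
    """Root Median Square Error: sqrt(median(e^2))."""
    return math.sqrt(statistics.median([e * e for e in errors]))


def best_strategy(true_progress, predicted_by_strategy, measure):
    """Return the strategy with the lowest measure value, plus all values.

    `predicted_by_strategy` maps a strategy name to its predicted progresses
    (hypothetical layout; one value per observed page, in percent).
    """
    values = {}
    for name, predicted in predicted_by_strategy.items():
        # Assumption: the error is the difference true - predicted per page.
        errors = [t - p for t, p in zip(true_progress, predicted)]
        values[name] = measure(errors)
    best = min(values, key=values.get)
    return best, values


# Made-up illustration data, not results from the paper.
true_progress = [0, 25, 50, 75, 100]
predicted = {
    "min": [0, 60, 80, 90, 100],
    "mean": [0, 30, 50, 70, 100],
    "max": [0, 10, 30, 60, 100],
}
best, values = best_strategy(true_progress, predicted, rmse)
```

Because all three measures share the percentage scale, the same `best_strategy` call can be repeated with `mdae` or `rmdse` to see whether the ranking of strategies depends on the chosen measure.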
The approach relies on knowledge of the true
progress and, therefore, on empirical data. Unfortunately,
as with any empirical study, these data are usually
not available before the survey starts. To overcome
this problem, data can be generated by, for example,
pilot studies, simulations, or path explorations of the
survey. Pilot studies refer to conducting the survey
with a subset of the population, whereas in simulations
virtual participants answer the questionnaire. In a path
exploration, an algorithm computes all (or most) paths
of the survey and computes sample progresses for each
path. However, adaptive surveys may have an exponentially
large number of such paths. Furthermore, all three
options have in common that they should represent
a “realistic” usage of the different paths: the paths
carry different weights, and these weights influence the
measure. The researcher should be aware of this.
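
A path exploration can be illustrated with a small sketch. It assumes that the survey is given as an acyclic page graph in the form of a successor dictionary; this representation and the helper names are hypothetical. The second helper builds a chain of k two-way branches, which shows why the number of paths can grow exponentially (2^k paths for k branches).

```python
def enumerate_paths(successors, start, end):
    """Yield every path from `start` to `end` in an acyclic page graph.

    `successors` maps a page to the list of its direct successor pages
    (hypothetical representation of a survey).
    """
    stack = [(start, [start])]
    while stack:
        page, path = stack.pop()
        if page == end:
            yield path
            continue
        for nxt in successors.get(page, []):
            stack.append((nxt, path + [nxt]))


def diamond_chain(k):
    """Build a graph of k consecutive two-way branches from p0 to p<k>."""
    successors = {}
    for i in range(k):
        successors[f"p{i}"] = [f"p{i}a", f"p{i}b"]
        successors[f"p{i}a"] = [f"p{i + 1}"]
        successors[f"p{i}b"] = [f"p{i + 1}"]
    return successors


# Ten consecutive two-way branches already give 2**10 distinct paths.
n_paths = sum(1 for _ in enumerate_paths(diamond_chain(10), "p0", "p10"))
```

Enumerating every path gives each path the same weight. If some paths are far more common among real participants, the exploration would have to weight or sample them accordingly to represent a realistic usage.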
4 EXPERIMENTS AND LESSONS LEARNED
In our department, we conduct large surveys with hundreds
of variables and items and many adaptive paths.
The survey engine we use stores the paths on
which the participants “walk” through the surveys. For
each participant, it is therefore possible to compute the
true number of remaining items for each visited page. Besides
the true progress, we can also compute the predicted
progress for different prediction strategies in retrospect
with Equation 1 and the algorithm of Figure 2. As a
result, we get data sets with the true and displayed
progresses for each strategy and each participant. With
these, it is possible to determine the most suitable measure
and the best strategy.
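
As an illustration of the retrospective computation, the following sketch derives the true number of remaining items for each visited page from a stored walk. It assumes that a walk is available as a list of item counts per visited page and that "remaining" counts the items on all later pages. Equation 1 is not reproduced here, so the percentage formula in `true_progress` is an assumed stand-in rather than the paper's definition.

```python
def true_remaining_items(items_per_visited_page):
    """For each visited page, count the items on all later pages of the walk."""
    remaining = []
    total_after = 0
    # Walk backwards so that each page sees the sum of the pages after it.
    for n_items in reversed(items_per_visited_page):
        remaining.append(total_after)
        total_after += n_items
    remaining.reverse()
    return remaining


def true_progress(items_per_visited_page):
    """Assumed progress in percent: share of items not remaining."""
    total = sum(items_per_visited_page)
    if total == 0:
        return [100.0 for _ in items_per_visited_page]
    remaining = true_remaining_items(items_per_visited_page)
    return [100.0 * (total - rem) / total for rem in remaining]


# Hypothetical walk over three pages with 2, 3, and 5 items.
walk = [2, 3, 5]
remaining = true_remaining_items(walk)
progress = true_progress(walk)
```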
4.1 Experimental Settings
We took two of our surveys, survey A and survey B.
Table 1 describes the structure of the surveys based
on empirical data. In the table, $N_{\mathrm{Branches}}$ refers to the
number of pages with branches, $|\mathrm{Path}|$ is the number of
pages within a path, and $N_{\mathrm{Items}}$ refers to the number of
items a participant has seen. Both surveys have similar
characteristics except for $N_{\mathrm{Participants}}$ and $N_{\mathrm{Branches}}$. For
survey A, more data sets are available, whereas
survey B has many more branches.
For both surveys, we produced data sets for three
different prediction strategies: minimum (min), mean,
and maximum (max). If a page has two or more
direct successor pages, the minimum strategy takes the
smallest number of remaining items among them. The maximum
strategy takes the largest number of remaining items
in such a case, whereas the mean strategy computes
the mean number. Note that this mean is not the
empirical mean of items over all empirical paths; it is
the selection operator mean used in the general
algorithm. The expected remaining items on the start
page differ between the two surveys (cf. Table 1,
rem(start)) and are higher for survey B, except for the
min strategy, for which the value is very small (7).
The values in parentheses represent adjustments to the
surveys, which are explained in the following.
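
The three strategies differ only in the selection operator applied at a branch. The sketch below illustrates this under stated assumptions: the survey is an acyclic page graph given as a successor dictionary, each page has a known item count, and the remaining items of a page are the items on all pages after it. The recursion is a simplified stand-in and not the algorithm of Figure 2; all names and numbers are hypothetical.

```python
from statistics import fmean


def remaining_items(page, successors, items, select, _cache=None):
    """Predicted number of items after `page` for a selection operator.

    `select` is min, max, or a mean function; it is applied to the candidate
    values of all direct successor pages (assumed acyclic graph).
    """
    cache = {} if _cache is None else _cache
    if page in cache:
        return cache[page]
    nexts = successors.get(page, [])
    if not nexts:
        result = 0  # last page: nothing remains
    else:
        candidates = [
            items[n] + remaining_items(n, successors, items, select, cache)
            for n in nexts
        ]
        result = select(candidates)
    cache[page] = result
    return result


# Hypothetical survey: a screening page with an exit path and two main parts.
SUCCESSORS = {
    "start": ["screen"],
    "screen": ["end", "main1", "main2"],
    "main1": ["end"],
    "main2": ["end"],
}
ITEMS = {"start": 1, "screen": 2, "main1": 10, "main2": 20, "end": 0}

rem_min = remaining_items("start", SUCCESSORS, ITEMS, min)
rem_mean = remaining_items("start", SUCCESSORS, ITEMS, fmean)
rem_max = remaining_items("start", SUCCESSORS, ITEMS, max)
```

In this toy graph the min operator always follows the short exit path, so its estimate on the start page is far smaller than the mean and max estimates.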
4.2 Lessons Learned
Screening paths are paths at the start of a survey in
which a participant receives a few key questions to
determine whether they are part of the specific target
population. Depending on their answers, the survey either
continues to the main part or ends quickly. Therefore,
there is an exit path to the end without many items.
The first lesson we learned was that including
screening paths in the progress calculation usually
produces bad predictions. With the min strategy, the
exit path has the fewest remaining items and, therefore,
decreases the number of remaining items on all
paths at the beginning of the survey (cf. rem(start) in
Table 1). In survey B, this leads to progresses near
100% after passing the page where the screening path
ends. For the mean and max strategies, the screening
path does not have a great impact.
Adaptive page chains are subpaths with many adaptive
pages, of which each participant sees only a small
number. In survey B, there are many such
pages, which contain items about special topics. In
general, each participant has seen only one or two of
these approx. 30 pages. For min, such chains disappear,
which skews the results, as most participants
see at least one page. The max strategy includes each
WEBIST 2019 - 15th International Conference on Web Information Systems and Technologies