A Benchmark for the Short Version of the User Experience Questionnaire

Andreas Hinderks, Martin Schrepp and Jörg Thomaschewski

ETS Ingeniería Informática, Universidad de Sevilla, Spain

SAP SE, Germany

University of Applied Sciences Emden/Leer, Germany

Keywords: User Experience Questionnaire Short, UEQ-S, Benchmark.

Abstract: To enable an interactive product to provide an adequate user experience (UX), this quality must be quantitatively measurable. The User Experience Questionnaire (UEQ) is a well-known and popular method for such UX measurement. One of its key features is a benchmark that helps to interpret measurement results by comparing them with a large data set of results obtained for other products. For situations where filling out the entire UEQ is too time-consuming, there is a short version (UEQ-S). However, there are currently not enough data available to construct an independent and interpretable benchmark for this short version. This paper examines whether a modified version of the existing benchmark of the full UEQ can be used for this purpose. The paper also presents some additional evaluation results concerning the UEQ-S.

1 INTRODUCTION

Providing a high or at least adequate user experience for a product is indispensable for success in today's highly competitive market environments. Modern devices, like smartphones or tablets, with their quite intuitive and easy-to-use interfaces, have drastically raised users' general expectations concerning the quality of user interfaces in the last couple of years. Today's users place high expectations on the user interfaces of their products (Hassenzahl, 2003).

This is not only true for the classical usability aspects, such as efficiency of the interaction, ease of learning or the ability to control the interaction. In addition, hedonic and non-task-related quality aspects, for example visual appeal of the user interface, fun of use, or originality and novelty of the design, play an important role (Preece et al., 2015). This holds not only for products intended for leisure activities, but also for products that are used professionally, like business software or programming tools (Schrepp et al., 2006).

Deciding on the adequacy of the user experience of any given interface requires that this parameter be measurable. Based on such measurements we can then draw conclusions, for example that the user experience is sufficiently high to launch the product or that an investment in improvements of the user experience is required in order to be competitive on the market (Schrepp et al., 2014).

User experience is a quite subjective matter (Boy, 2017). If several users are asked for their opinion about the user experience of the same product, their impressions may differ widely. One reason is that they may use the product in different ways, i.e. they try to reach different goals and thus work with different parts of the user interface. Another cause of divergent opinions can be differences in age or gender. But even quite individual aspects, for example familiarity with products similar to the evaluated product, can massively influence the impressions concerning the UX of a product.

Because of these differences in the opinions of different users, it is very important to collect data from larger groups of users. Questionnaires are a simple and cost-effective method to collect user feedback (Schrepp et al., 2014), since they can be distributed rather efficiently to larger groups of users, especially if they are designed as online questionnaires. In addition, analysing the data from such questionnaires is highly standardised and thus efficient as well.

However, the success of this method of collecting user feedback also has its drawbacks. Nowadays, we are all flooded with all kinds of questionnaires. This naturally reduces the willingness to take part in such a study, especially if answering the questionnaire requires too much time (Batinic et al., 2002). In addition, there are several cases in which the time a participant is able or willing to spend on answering questions for a UX evaluation is quite limited. Therefore, it is important in many cases to use questionnaires that contain only a few items and can be answered extremely fast, for example, in less than a minute.

Hinderks, A., Schrepp, M. and Thomaschewski, J.

A Benchmark for the Short Version of the User Experience Questionnaire.

DOI: 10.5220/0007188303730377

In Proceedings of the 14th International Conference on Web Information Systems and Technologies (WEBIST 2018), pages 373-377

ISBN: 978-989-758-324-7

The User Experience Questionnaire (UEQ) (Laugwitz et al., 2008) is an established and frequently used questionnaire for the evaluation of the UX of interactive products. The questionnaire consists of 26 items and takes 3-5 minutes to fill out. For special circumstances, a short version, the UEQ-S (Schrepp et al., 2017b), was developed, which contains only 8 items.

This paper presents some further evaluation results for the short version and a method to make the UEQ benchmark usable for the UEQ-S. Section 2 presents the structure of the UEQ and the adapted version UEQ-S. Section 3 presents a further evaluation of the UEQ-S. Section 4 describes an adaption of the UEQ benchmark to the short version. Section 5 concludes the article.

2 STRUCTURE OF THE UEQ AND THE UEQ-S

The UEQ (Laugwitz et al., 2008) is a frequently used questionnaire to measure the subjective user experience of interactive products. Each item of the UEQ consists of a pair of antonyms, for example:

Figure 1: User Experience Questionnaire - Item example.

Participants can rate each item on a 7-point Likert scale. The answers are scaled from -3 (fully agree with negative term) to +3 (fully agree with positive term). Half of the items start with the positive term, the others with the negative term (in randomized order).

The 26 items are grouped into 6 scales:

• Attractiveness: Describes the overall impression of the product. Do users like or dislike it? Is it attractive, enjoyable or pleasing?

• Perspicuity: Describes how easy it is to get familiar with the product. Is it easy to learn? Is the product easy to understand and unambiguous?

• Efficiency: Can users solve their tasks without unnecessary effort? Is the interaction efficient and fast? Does the product react quickly to user input?

• Dependability: Does the user feel in control of the interaction? Can he or she predict the system's behaviour? Does the user feel confident when working with the product?

• Stimulation: Is it exciting and motivating to use the product? Is it enjoyable to use?

• Novelty: Is the product innovative and creative? Does it capture the user's attention?

Attractiveness is a pure valence dimension. Perspicuity, Efficiency, and Dependability are task- or goal-related (pragmatic) quality aspects. Stimulation and Novelty are hedonic quality aspects, i.e. they relate to pleasure or fun while using the product; see (Schrepp et al., 2006) and (Schrepp et al., 2017a).

Figure 2: Adopted scale structure of the User Experience Questionnaire (UEQ).

Details concerning the structure and evaluation of the UEQ can be found in (Laugwitz et al., 2008). The UEQ in all available languages (currently 20), an Excel sheet for data analysis, and the UEQ Handbook are available free of charge at www.ueq-online.org.

Filling out the UEQ does not require much time (usually 3-5 minutes are sufficient). However, some special application scenarios require an even shorter questionnaire. The scenarios listed below are based on our practical experience in applying the UEQ and illustrate the need for a shorter version.

1. Data is often collected when a user leaves an application, for example, a web shop. If the questionnaire contains too many items, users might refuse to give feedback (they have finished their tasks and are in the process of leaving, so motivating them to spend some more time on feedback is difficult).

2. Some questions concerning UX are to be included in an already existing product experience questionnaire. Such questionnaires are sent out after a customer has purchased a product and has already used it for some time. The questionnaires try to collect data about the entire product experience and are thus often quite lengthy. It is therefore difficult to add a full 26-item user experience questionnaire in such cases.

3. Sometimes participants of a study must judge the user experience of several products in one session. Products are presented to the participant in a random order one after the other, and participants must fill out a UX questionnaire for each of them. In such a setting, the number of items must be kept to a minimum, since otherwise the quality of answers will decrease quickly.

APMDWE 2018 - 3rd International Special Session on Advanced practices in Model-Driven Web Engineering

A short version of the UEQ, called UEQ-S, was designed to address such scenarios. This short version measures only the two meta-dimensions pragmatic and hedonic quality, each with 4 items. Thus, the UEQ-S consists of only eight items grouped into two scales. In addition, the mean value of the eight items is used as an overall UX value.
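As a minimal sketch of this scoring scheme, the three UEQ-S values for a single participant could be computed as shown below. The sketch assumes that the item answers have already been converted to the range -3 to +3; the function name `score_ueq_s` and the dictionary keys are hypothetical and chosen only for illustration.

```python
from statistics import mean


def score_ueq_s(pragmatic: list[float], hedonic: list[float]) -> dict[str, float]:
    """Return the two scale means and the overall mean for one participant.

    pragmatic, hedonic: the 4 item values of each scale, already in -3..+3.
    """
    if len(pragmatic) != 4 or len(hedonic) != 4:
        raise ValueError("the UEQ-S has 4 pragmatic and 4 hedonic items")
    return {
        "pragmatic": mean(pragmatic),
        "hedonic": mean(hedonic),
        # Overall UX value: mean of all eight items.
        "overall": mean(pragmatic + hedonic),
    }


# Hypothetical answers of one participant (not taken from any data set).
example = score_ueq_s([2, 1, 3, 2], [0, 1, -1, 2])
```

Scale values for a product would then be obtained by averaging these per-participant values over all participants.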

The structure of the short version relies on a data-analytical approach. A data set with 1867 rows of the German UEQ was collected. Each row reflects the evaluation of a product with the UEQ by one participant. In total, 21 different products were evaluated (business software, web shops, household appliances, etc.). From this data set, the 8 items that best represented the two meta-dimensions were isolated by a principal component analysis (varimax rotation). The selected items and their respective loadings on the two meta-dimensions are shown in Table 1. Details of the design process and results can be found in (Schrepp et al., 2017b).

Table 1: Loadings of the items of the short version UEQ-S on the factors pragmatic and hedonic quality.

Item                           Pragmatic  Hedonic
clear / confusing              0.71       0.21
inefficient / efficient        0.63       0.39
complicated / easy             0.79       0.10
obstructive / supportive       0.69       0.41
boring / exciting              0.29       0.74
not interesting / interesting  0.36       0.75
conventional / inventive       0.19       0.82
usual / leading edge           0.19       0.86

Thus, the items show the intended scale structure. Only the item obstructive/supportive yields a relevant, but still relatively small, cross-loading on the other factor. The other items load strongly on the factor they belong to and only weakly on the other factor. This could also be confirmed in a first evaluation study that compared evaluations with the full UEQ to evaluations with the UEQ-S (Schrepp et al., 2017b).

In addition, it was shown with some smaller data sets that the scales of the short version (UEQ-S) approximate the corresponding scales of the full version (UEQ) reasonably well. The next section presents a study that replicates these results for the German, English and Spanish language versions with larger target groups.

3 FURTHER EVALUATIONS OF THE UEQ-S

In order to get a better understanding of the relation between the UEQ-S and the full UEQ, a study was set up that measures the user experience of Amazon and Skype with German, Spanish and English users. Amazon and Skype were selected for the study because they are widely known products that are quite popular in all three countries. In addition, apart from the translation of the terms on the user interface, both products show no differences between these countries, i.e. the user interface is quite similar in all three countries.

3.1 Setup of the Study

The data for the study were collected in England, Spain and Germany with an online version (England, Germany) and a pen-and-paper version (Spain) of the UEQ. In Spain, the data were collected at the University of Seville during a lecture. The English and German participants were recruited via a social platform (Prolific Academic).

Participants could choose whether they wished to evaluate Amazon or Skype. After this decision, they received a short standard instruction concerning the UEQ and then filled out the questionnaire. Finally, the participants were asked to state their age and gender. Each participant evaluated only one product. For the Spanish data set, age and gender were not recorded.

3.2 Participants

In England, 238 participants evaluated Amazon and 215 Skype (330 females, 178 males, average age 34 years). In Germany, 144 participants chose to evaluate Amazon and 85 Skype (92 females, 154 males, average age 29). All participants were native speakers and had sufficient experience with Amazon and Skype. For Spain, 51 evaluations of Amazon and 62 evaluations of Skype could be collected. It is difficult and costly to recruit practitioners from companies (Escalona et al., 2016), so we decided to recruit university students. All Spanish participants were students at the University of Seville, enrolled either in the Master's program or in the third year of the degree program in computer science.

3.3 Results

The items of the UEQ-S are a subset of the items of the UEQ. The goal was to investigate whether the short version of the UEQ can be used to predict the results for the meta-dimensions of the full UEQ. Thus, the mean value of all 12 UEQ items belonging to the dimensions Efficiency, Perspicuity and Dependability was calculated to get a value PQ(UEQ), and the mean value of the 8 items belonging to Stimulation and Novelty to get a value HQ(UEQ). The mean value of all 26 items of the UEQ is denoted as OV(UEQ). These values are then compared with the mean values of the 4 pragmatic items of the UEQ-S (denoted as PQ(UEQ-S)), the 4 hedonic items of the UEQ-S (HQ(UEQ-S)) and all 8 items of the UEQ-S (OV(UEQ-S)).

Table 2 shows these data (the numbers in brackets are the standard deviations).

As the results show, the values for the meta-dimensions pragmatic quality and hedonic quality obtained from all items of the full version and from just the items of the UEQ-S are quite close. This confirms, with bigger data sets, the results already described in (Schrepp et al., 2017b).
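To make the notion of "quite close" more tangible, the following sketch computes the absolute differences between the full-version and short-version means for each product and country. The numbers are copied from Table 2; the names `TABLE_2` and `absolute_differences` are hypothetical and only serve this illustration.

```python
# (product, country): (PQ(UEQ), PQ(UEQ-S), HQ(UEQ), HQ(UEQ-S)), copied from Table 2
TABLE_2 = {
    ("Amazon", "England"): (1.50, 1.54, 0.95, 1.008),
    ("Amazon", "Spain"): (1.03, 0.99, 1.03, 0.990),
    ("Amazon", "Germany"): (1.36, 1.550, 0.73, 0.701),
    ("Skype", "England"): (1.06, 1.167, 0.50, 0.403),
    ("Skype", "Spain"): (0.93, 1.118, 0.77, 0.723),
    ("Skype", "Germany"): (0.77, 0.997, 0.44, 0.412),
}


def absolute_differences(table):
    """Return {(product, country): (|PQ difference|, |HQ difference|)}."""
    return {
        key: (abs(pq_full - pq_short), abs(hq_full - hq_short))
        for key, (pq_full, pq_short, hq_full, hq_short) in table.items()
    }


diffs = absolute_differences(TABLE_2)
max_pq_diff = max(pq for pq, _ in diffs.values())  # largest pragmatic gap
max_hq_diff = max(hq for _, hq in diffs.values())  # largest hedonic gap
```

With the values as printed in Table 2, the largest gap occurs for the pragmatic quality of Skype in Germany (0.77 versus 0.997), and the gaps for hedonic quality all stay below 0.1 scale points.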

4 ADAPTION OF THE BENCHMARK TO THE SHORT VERSION

The UEQ contains a benchmark (Schrepp et al., 2017a) that helps to judge how good or bad a measured product is in comparison to other products. The benchmark contains data from 9905 persons who evaluated 246 different interactive products. It would therefore be beneficial to be able to use this benchmark for the short version UEQ-S as well.

Many of the data sets contained in the benchmark are from industry projects. These data are quite often confidential, and thus their authors contributed the measured scale means to the benchmark, but not the raw data. Therefore, it is currently not possible to synthesise a benchmark containing only the values for the 8 items of the UEQ-S directly from the available benchmark data set.

Given that the items of the short version UEQ-S approximate the full UEQ well, it seems plausible to simply calculate the benchmark for the short version from the scale means available in the benchmark of the full version. Thus, the scale means for Efficiency, Perspicuity and Dependability were used to calculate a value for Pragmatic Quality, and the scale means for Stimulation and Novelty were used to calculate a value for Hedonic Quality, for each product evaluation in the existing benchmark. The Overall value was calculated from all 6 UEQ scale means, i.e. including Attractiveness.
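The derivation for a single product evaluation in the benchmark can be sketched as follows. The text does not state how the scale means are combined, so this sketch assumes an unweighted arithmetic mean of the respective scale means; this is an assumption for illustration, as are the function name `derive_short_version_values` and the example numbers.

```python
from statistics import mean

PRAGMATIC_SCALES = ("Efficiency", "Perspicuity", "Dependability")
HEDONIC_SCALES = ("Stimulation", "Novelty")
ALL_SCALES = ("Attractiveness",) + PRAGMATIC_SCALES + HEDONIC_SCALES


def derive_short_version_values(scale_means: dict[str, float]) -> dict[str, float]:
    """Derive Pragmatic Quality, Hedonic Quality and Overall from 6 UEQ scale means.

    Assumption: each value is the unweighted mean of the scale means involved.
    """
    missing = [scale for scale in ALL_SCALES if scale not in scale_means]
    if missing:
        raise ValueError(f"missing scale means: {missing}")
    return {
        "pragmatic": mean([scale_means[s] for s in PRAGMATIC_SCALES]),
        "hedonic": mean([scale_means[s] for s in HEDONIC_SCALES]),
        "overall": mean([scale_means[s] for s in ALL_SCALES]),
    }


# Hypothetical scale means of one product evaluation (not real benchmark data).
derived = derive_short_version_values({
    "Attractiveness": 1.2, "Perspicuity": 1.5, "Efficiency": 1.2,
    "Dependability": 0.9, "Stimulation": 1.0, "Novelty": 0.6,
})
```

Because Efficiency, Perspicuity, Dependability, Stimulation and Novelty are measured with the same number of items in the full UEQ (12 pragmatic and 8 hedonic items in total), the unweighted mean of the scale means equals the mean over the underlying items for the two meta-dimensions. For the Overall value the two differ slightly, since the remaining 6 of the 26 items belong to Attractiveness.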

Since it was already shown that the items of the short version allow a relatively accurate approximation of the values for the meta-dimensions of the full version, the results of this procedure can be used as a reasonable benchmark for the short version, at least until enough data are available to calculate a benchmark for the UEQ-S directly.

This procedure yields the following result:

• Excellent (in the range of the 10% best results): Pragmatic Q. greater than 1.73, Hedonic Q. greater than 1.55, Overall greater than 1.58.

• Good (10% of the results in the benchmark data set are better and 75% of the results are worse): Pragmatic Q. between 1.55 and 1.73, Hedonic Q. between 1.25 and 1.55, Overall between 1.4 and 1.58.

• Above average (25% of the results in the benchmark are better than the result for the evaluated product, 50% of the results are worse): Pragmatic Q. between 1.15 and 1.54, Hedonic Q. between 0.88 and 1.24, Overall between 1.02 and 1.39.

• Below average (50% of the results in the benchmark are better than the result for the evaluated product, 25% of the results are worse): Pragmatic Q. between 0.73 and 1.14, Hedonic Q. between 0.57 and 0.87, Overall between 0.68 and 1.01.

• Bad (in the range of the 25% worst results): Pragmatic Q. less than 0.73, Hedonic Q. less than 0.57, Overall less than 0.68.
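The category limits listed above can be turned into a small lookup, sketched below. The limits are taken from the list; how exact boundary values are assigned is not specified there, so treating each lower limit as inclusive (and the Excellent limit as exclusive) is an assumption, as are the names `THRESHOLDS` and `benchmark_category`.

```python
# Lower limits per dimension, taken from the list above, in the order
# (Excellent, Good, Above average, Below average).
THRESHOLDS = {
    "pragmatic": (1.73, 1.55, 1.15, 0.73),
    "hedonic": (1.55, 1.25, 0.88, 0.57),
    "overall": (1.58, 1.40, 1.02, 0.68),
}


def benchmark_category(dimension: str, value: float) -> str:
    """Return the benchmark category of a measured UEQ-S scale mean.

    Assumption: "greater than" is strict for Excellent, and the lower limits
    of the other categories are inclusive.
    """
    excellent, good, above, below = THRESHOLDS[dimension]
    if value > excellent:
        return "Excellent"
    if value >= good:
        return "Good"
    if value >= above:
        return "Above average"
    if value >= below:
        return "Below average"
    return "Bad"
```

For example, a hypothetical product with a pragmatic quality of 1.6 would be classified as Good, while a hedonic quality of 0.5 would fall into the category Bad.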

Figure 3: Benchmark for the UEQ-S. Results for a hypothetical example product.


Table 2: Mean values and standard deviations (in brackets) for the UEQ and UEQ-S.

                 Pragmatic Quality           Hedonic Quality             Overall
Product Country  PQ(UEQ)       PQ(UEQ-S)     HQ(UEQ)       HQ(UEQ-S)     OV(UEQ)       OV(UEQ-S)
Amazon  England  1.50 (1.05)   1.54 (1.04)   0.95 (1.08)   1.008 (1.03)  1.32 (1.11)   1.272 (0.93)
Amazon  Spain    1.03 (1.05)   0.99 (1.02)   1.03 (1.02)   0.990 (0.99)  1.08 (1.04)   0.98 (0.96)
Amazon  Germany  1.36 (0.91)   1.550 (0.95)  0.73 (0.98)   0.701 (0.97)  1.17 (1.00)   1.126 (0.85)
Skype   England  1.06 (1.13)   1.167 (1.12)  0.50 (1.08)   0.403 (1.04)  0.90 (1.15)   0.787 (0.94)
Skype   Spain    0.93 (0.07)   1.118 (0.93)  0.77 (1.05)   0.723 (1.02)  0.94 (1.01)   0.911 (0.84)
Skype   Germany  0.77 (1.12)   0.997 (1.24)  0.44 (1.05)   0.412 (1.04)  0.68 (1.12)   0.698 (1.03)

5 CONCLUSIONS

The short version UEQ-S of the UEQ is intended to be used in cases where filling out the complete 26-item UEQ is not possible. The UEQ-S was designed according to a data-analytical approach on the basis of the full UEQ. Some first evaluation studies showed that the items of the UEQ-S are a good approximation of the UEQ results for pragmatic and hedonic quality, in the sense that the 4 items of each UEQ-S scale are quite good predictors for the mean values of the 12 (pragmatic) and 8 (hedonic) items of the full UEQ assigned to these meta-dimensions.

However, more data are required to get a deeper understanding of the relation between the full and the short version. In future work, the results for the Spanish data set must be checked to see whether they are generally valid for the Spanish version of the UEQ, since only students took part. This paper presents the results of three additional validation studies for different language versions, which confirm the scale structure of the UEQ-S and again show a reasonable congruity of the short and full versions of the questionnaire.

One of the key features of the UEQ is a large benchmark data set. The benchmark helps to interpret results obtained with the UEQ by comparing them to the results of other products in the benchmark. It is not directly possible to recalculate a benchmark for the UEQ-S, since only the scale means are available for many of the data points in the UEQ benchmark, i.e. the raw data are not available due to data privacy issues. This paper showed that, because the 8 items of the short version approximate the meta-dimensions pragmatic and hedonic quality well, it is possible to use a natural transformation of the UEQ benchmark for the UEQ-S.

REFERENCES

Batinic, B., Reips, U.-D., and Bosnjak, M., editors (2002). Online social sciences. Hogrefe & Huber Publishers, Seattle.

Boy, G. A. (2017). The Handbook of Human-Machine Interaction: A Human-Centered Design Approach. CRC Press, Milton, 1st edition.

Escalona, M. J., Lopez, G., Vegas, S., García-Borgoñón, L., Garcia-Garcia, J. A., and Juristo, N. (2016). A software engineering experiments to value MDE in testing. Learning lessons.

Hassenzahl, M. (2003). The thing and I: Understanding the relationship between user and product. In Blythe, M. A., editor, Funology, pages 31–42. Kluwer Academic Publishers, Boston [etc.].

Laugwitz, B., Held, T., and Schrepp, M. (2008). Construction and evaluation of a user experience questionnaire. In Holzinger, A., editor, HCI and Usability for Education and Work, volume 5298 of Lecture Notes in Computer Science, pages 63–76. Springer Berlin Heidelberg, Berlin, Heidelberg.

Preece, J., Rogers, Y., and Sharp, H. (2015). Interaction design: Beyond human-computer interaction. Wiley, Chichester, 4th edition.

Schrepp, M., Held, T., and Laugwitz, B. (2006). The influence of hedonic quality on the attractiveness of user interfaces of business management software. Interacting with Computers, 18(5):1055–1069.

Schrepp, M., Hinderks, A., and Thomaschewski, J. (2014). Applying the user experience questionnaire (UEQ) in different evaluation scenarios. In Marcus, A., editor, Design, User Experience, and Usability. Theories, Methods, and Tools for Designing the User Experience, volume 8517 of Lecture Notes in Computer Science, pages 383–392. Springer International Publishing, Cham.

Schrepp, M., Hinderks, A., and Thomaschewski, J. (2017a). Construction of a benchmark for the user experience questionnaire (UEQ). International Journal of Interactive Multimedia and Artificial Intelligence, 4(4):40–44.

Schrepp, M., Hinderks, A., and Thomaschewski, J. (2017b). Design and evaluation of a short version of the user experience questionnaire (UEQ-S). International Journal of Interactive Multimedia and Artificial Intelligence, 4(6):103.
