6 CONCLUSIONS
It is important to remind ourselves that we do not take for granted that implementing the principles in accordance with any of the definitions above is, a priori, in itself virtuous or necessary (although it seems reasonable) for achieving usability. We see this as an empirical question, which needs to be assessed independently. That assessment will, however, be far more tractable in the first place if, as we have suggested, a precise and formal definition of what each principle entails is available. Moreover, the availability of a tool which can identify fulfillment or violation of consistency criteria will be necessary for any quantitative assessment of the correlation between practice and theory. A subjective or example-dependent qualification of each principle may be sufficient for teaching the notion embodied in each principle, but it will not do as a point of departure for experiments of a more quantitative nature. We believe that the latter will be a strong supplement to the existing body of work.
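To make the idea of a machine-checkable criterion concrete, the following is a minimal sketch, not the tool discussed in this paper, of how one consistency criterion might be tested over a declarative dialog model. The model format (states with labelled actions) and the criterion itself (the same action label must trigger the same operation in every state that offers it) are illustrative assumptions of our own.

```python
# Minimal sketch (illustrative, not the actual tool): a dialog model given
# as states with labelled actions, and one example consistency criterion:
# an action label must map to the same operation wherever it is available.

from collections import defaultdict

# Hypothetical declarative model: state -> {action label: operation}
dialog_model = {
    "main":    {"Save": "persist", "Quit": "exit"},
    "editor":  {"Save": "persist", "Quit": "exit"},
    "preview": {"Save": "export"},   # deliberately inconsistent
}

def consistency_violations(model):
    """Return action labels bound to more than one operation."""
    bindings = defaultdict(set)
    for state, actions in model.items():
        for label, operation in actions.items():
            bindings[label].add(operation)
    return {label: ops for label, ops in bindings.items() if len(ops) > 1}

print(consistency_violations(dialog_model))
# -> {'Save': {'persist', 'export'}}
```

A check of this kind produces a yes/no verdict per criterion rather than a subjective judgement, which is exactly the property a quantitative comparison between practice and theory requires.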
Another equally important contribution of the re-
search presented in this paper is that it documents
shortcomings of modeling techniques when their ob-
jective has not been taken properly into account. We
know that many of the more generic frameworks for
describing user interfaces are not suitable for the dual
task of development and formal analysis. In many re-
spects that we have touched upon in this paper, they
are not suited for formal validation of usability de-
sign principles. There are many drawbacks. The vol-
ume and verbose nature of the specifications make
them hard to write and understand for the “human model checker,” who at least has to be able to check
the model checker. We are of course aware of the
irony in this, but improvement of practice must be
seen as desirable even if it is stepwise rather than to-
tal, in our opinion.
It will, as we see it, be a great advantage compared to most other model-based automatic usability evaluation methods if one can devise an approach whose input does not need to be “made to match” an existing artifact, i.e. a dedicated format or tool. Such approaches suffer from an “impedance mismatch” problem, by which we mean that the representation of the artifact intended for checking may itself be an inaccurate image of it (or the mapping may not be one-to-one). By definition, using a declarative product from the software life-cycle product chain itself will make our “substrate” correspond more accurately with the manifest artifact that one aims to implement in the next instance, namely the dynamic user interface. The result may still not be exactly what the users wanted, but at least we can check it properly and know that it correctly represents the artifact, since the relationship between them is one-to-one. On the other hand, this may represent a problem for the specification of the search strategies that perform the model checking.
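As a rough illustration of what such checking over a declarative substrate could look like, the sketch below treats the user interface as a plain state graph and exhaustively searches it for one robustness-flavoured property: that every reachable state offers a path back to the start state. The graph, the property, and all names are illustrative assumptions, not the format or search strategy of any particular tool.

```python
# Illustrative sketch: exhaustive search over a UI dialog graph for a
# robustness-style property (every reachable state can return to "start").
# The graph and the property are assumptions for illustration only.

from collections import deque

# Hypothetical dialog graph: state -> set of directly reachable states
ui_graph = {
    "start":    {"settings", "editor"},
    "settings": {"start"},
    "editor":   {"preview"},
    "preview":  set(),                 # dead end: no way back
}

def reachable(graph, origin):
    """All states reachable from origin, via breadth-first search."""
    seen, queue = {origin}, deque([origin])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def unrecoverable_states(graph, start="start"):
    """Reachable states from which the start state cannot be regained."""
    return {s for s in reachable(graph, start)
            if start not in reachable(graph, s)}

print(unrecoverable_states(ui_graph))
# -> {'editor', 'preview'}
```

Because the graph is derived one-to-one from the declarative product, a violation reported here points directly at the artifact itself rather than at a hand-made approximation of it.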
Finally, we need to state that, in our opinion, the prospect of a framework and associated toolkit for logical and precise analysis of usability principles in an interactive application does not pre-empt the need to work closely with users. Notwithstanding the internal validity of our contribution, which to some extent depends only on our efforts to formulate an abstract world, the usefulness of such a framework depends wholly on the “real world”. Thus, we look forward to being able to compare the predictions of a formal analysis with traditional usability evaluations of the same systems. Only when correlation at this level has been established can one conclude that this type of approach is really viable.