experiments; they allow gathering much more objective data than field studies.
Subjective data include individual user experiences. The most convenient way to gather them is through questionnaires and structured interviews with the users. While objective data are usually good at showing “what” happened, subjective data often help to clarify “why” it happened or to filter out biased users. If the user group is large enough, all of the data should be analyzed statistically.
So far, we have no metric for tool productivity other than the time required to solve the tasks.
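To make the last point concrete, the sketch below shows one way such a comparison of task completion times between two groups of participants could be run in Python. The group names, the data, and the choice of the Mann-Whitney U test are illustrative assumptions only and are not taken from our studies.

```python
# Hypothetical sketch: comparing task completion times (the only
# productivity metric discussed here) between two groups of scripters,
# one using the evaluated tool and one using a baseline tool.
# All data below are illustrative, not experimental results.
from scipy import stats

# Completion times in minutes for each participant (illustrative).
tool_times = [34.5, 41.0, 29.8, 37.2, 45.1, 33.0, 38.4]
baseline_times = [48.2, 52.7, 44.9, 57.3, 50.1, 46.8, 61.0]

# Completion times are often not normally distributed, so a
# non-parametric test such as Mann-Whitney U is a reasonable default
# for small groups instead of a t-test.
u_stat, p_value = stats.mannwhitneyu(tool_times, baseline_times,
                                     alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

With larger groups or additional controlled factors, the same data could instead be fed into a parametric test or a regression model; the point is only that the raw times, once collected, admit a straightforward statistical comparison.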
8 CONCLUSIONS
We have presented our position on AI tool design; in particular, we have stressed the importance of user evaluation of the tools. We believe that comparative controlled experiments should become the main academic approach to demonstrating the usefulness of AI tools to the game industry and to driving tool improvements.
In general, designing proper comparative experiments for usability and productivity evaluation is tricky, as there are many factors affecting user performance that are difficult to control. Since this research area is relatively new, there are no definitive “best practices” to follow. In this paper, we have presented what we consider a candidate for such a “best practice” for languages and tools for designing game AI. We believe that our experience is to a large extent transferable to other real-world applications of AI and their supporting tools.
The experience we have gathered throughout the development of SPOSH and yaPOSH has let us design a new tool for developing NPC behaviors that has been adopted for real-world deployment in the design process of an upcoming AAA computer game. Although the project leads were initially skeptical about the idea, thanks to our studies we were able to show that we know how to create a practical tool. The tool itself was ultimately received very well by both the project leads and its users (i.e., scripters) and has fully replaced their previous behavior design solution.
ACKNOWLEDGEMENTS
Human data were collected with APA principles in mind. This research is supported by the Czech Science Foundation under contract P103/10/1287 (GACR), by student grant GA UK No. 655012/2012/A-INF/MFF, and partially by SVV project number 267 314.