were ∼ 59 (with a response rate of 71%) per user.
As an aside, there were 4 users out of 60 (3 in the first
group, 1 in the second) who completely disabled the
tutoring system because they found it too distracting.
The results shown in Table 1 indicate that, when the fine-tuned policy is used, the tutoring system generally provides helpful feedback on most aspects without being overly disruptive.
Table 1: Live feedback evaluation results, averaged over the number of responses received for each category during play sessions, using both the intermediate (π^v1_Advice) and the fine-tuned (π_Advice) policy.

Live feedback response category      | Percent of users
                                     | π^v1_Advice (intermediate) | π_Advice (fine-tuned)
Useful                               | 38%                        | 65%
Too many in a short time             | 33%                        | 13%
Too repetitive                       | 19%                        | 19%
Users disabling the tutoring system  | 10%                        | 3%
R3 Evaluation. In production scenarios, where applications such as games need to run as fast as possible and achieve high frame rates, it is important to profile the inference time of the proposed model. Considering this, we profiled the average time required per frame to run inference (CPU only, no GPU) on an Intel Core i5-9400 processor. The average evaluation per frame took ∼0.27 milliseconds (ms), with variations of ±0.12 ms. The memory footprint of the model was less than 24 MB. These results suggest that the methods described in this paper may be suitable for lower-specification systems with limited resource budgets, without compromising the application frame rate.
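As an illustration of how such a per-frame measurement can be obtained, the sketch below times repeated forward passes of a small stand-in network on the CPU and reports the mean and standard deviation per call. The network architecture, input size, and iteration count are placeholder assumptions for illustration only, not the exact model or protocol used in our profiling.

```python
import time
import torch

# Stand-in policy network: the real model's architecture and input size are
# not reproduced here; only the per-frame timing procedure is illustrated.
policy_net = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 8),
)
policy_net.eval()

obs = torch.randn(1, 128)  # one frame's observation (assumed shape)
timings_ms = []

with torch.no_grad():  # CPU-only inference, no gradient bookkeeping
    for _ in range(10_000):
        start = time.perf_counter()
        policy_net(obs)
        timings_ms.append((time.perf_counter() - start) * 1e3)

t = torch.tensor(timings_ms)
print(f"per-frame inference: {t.mean():.3f} ms +/- {t.std():.3f} ms")
```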
5 CONCLUSIONS
In this paper, we presented a method to provide live suggestions to users while they use an application. These suggestions aim to give users a better understanding of the controls, interface, and dynamics, and to show them how to use the application to their advantage with less effort. Our method uses Reinforcement Learning techniques at its core. The evaluation conducted has shown that our tutoring system is simultaneously effective for human users, unobtrusive, adaptive to multimodal human behavior, and fast enough to be deployed in real time with a limited resource budget. In future work, we plan to improve the method further by automating the manual effort currently required to build the database of suggestions; one idea is to apply modern NLP techniques here, since text generation is the most expensive part of this step. Our method is also being used in other games that are planned for release, which will allow us to further evaluate and improve it.
ACKNOWLEDGEMENTS
This research was supported by the European Re-
gional Development Fund, Competitiveness Oper-
ational Program 2014-2020 through project IDBC
(code SMIS 2014+: 121512).