
ations that stimulate the cognitive development of its users. The authors also aim to explore deep learning models based on transformers (Khan et al., 2022) to extract high-level representations from gameplay footage.
ACKNOWLEDGMENTS
The authors thank CAPES for financial support.
REFERENCES
Boyle, E., Connolly, T. M., and Hainey, T. (2011). The role of psychology in understanding the impact of computer games. Entertainment Computing, 2(2):69–74. Serious Games Development and Applications.
Faria, M. P. P., Julia, E. S., Nascimento, M. Z. d., and Julia, R. M. S. (2022). Investigating the performance of various deep neural networks-based approaches designed to identify game events in gameplay footage. Volume 5, New York, NY, USA. Association for Computing Machinery.
Feng, J. and Zhou, Z.-H. (2017). Deep MIML network. In AAAI.
Gao, J., Yang, Z., and Nevatia, R. (2017). RED: reinforced encoder-decoder networks for action anticipation. CoRR, abs/1707.04818.
Geest, R. D., Gavves, E., Ghodrati, A., Li, Z., Snoek, C., and Tuytelaars, T. (2016). Online action detection. CoRR, abs/1604.06506.
Global Data (2021). Video games market set to become a 300bn-plus industry by 2025. https://www.globaldata.com/video-games-market-set-to-become-a-300bn-plus-industry-by-2025.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Janarthanan, V. (2012). Serious video games: Games for education and health. In 2012 Ninth International Conference on Information Technology - New Generations.
Karakovskiy, S. and Togelius, J. (2012). The Mario AI benchmark and competitions. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):55–67.
Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., and Shah, M. (2022). Transformers in vision: A survey. ACM Computing Surveys (CSUR), 54(10s):1–41.
Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
Kozlov, M. D. and Johansen, M. K. (2010). Real behavior in virtual environments: Psychology experiments in a simple virtual-reality paradigm using video games. Cyberpsychology, Behavior, and Social Networking.
Luo, Z., Guzdial, M., Liao, N., and Riedl, M. (2018). Player experience extraction from gameplay video. CoRR, abs/1809.06201.
Luo, Z., Guzdial, M., and Riedl, M. (2019). Making CNNs for video parsing accessible. CoRR, abs/1906.11877.
Neto, H. C. and Julia, R. M. S. (2018). ACE-RL-Checkers: decision-making adaptability through integration of automatic case elicitation, reinforcement learning, and sequential pattern mining. Knowledge and Information Systems, 57(3):603–634.
Prena, K. and Sherry, J. (2018). Parental perspectives on video game genre preferences and motivations of children with Down syndrome. Journal of Enabling Technologies, 12:00–00.
Ravanbakhsh, M., Nabi, M., Sangineto, E., Marcenaro, L., Regazzoni, C., and Sebe, N. (2017). Abnormal event detection in videos using generative adversarial nets.
Roettl, J. and Terlutter, R. (2018). The same video game in 2D, 3D or virtual reality – how does technology impact game evaluation and brand placements? PLOS ONE, 13:1–24.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Song, L., Liu, J., Qian, B., Sun, M., Yang, K., Sun, M., and Abbas, S. (2018). A deep multi-modal CNN for multi-instance multi-label image classification. IEEE Transactions on Image Processing, 27(12):6025–6038.
Squire, K. (2003). Video games in education. International Journal of Intelligent Simulations and Gaming, 2:49–62.
Xu, M., Gao, M., Chen, Y., Davis, L. S., and Crandall, D. J. (2018). Temporal recurrent networks for online action detection.
Yu, M., Bambacus, M., Cervone, G., Clarke, K., Duffy, D., Huang, Q., Li, J., Li, W., Li, Z., Liu, Q., Resch, B., Yang, J., and Yang, C. (2020). Spatiotemporal event detection: a review. International Journal of Digital Earth, 13(12):1339–1365.
Zeiler, M. D. (2012). ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.
Zhou, Z.-H., Zhang, M.-L., Huang, S.-J., and Li, Y.-F. (2012). Multi-instance multi-label learning. Artificial Intelligence, 176(1):2291–2320.
Deep Learning-Based Models for Performing Multi-Instance Multi-Label Event Classification in Gameplay Footage