how well the model generalizes to other datasets is
yet to be studied. Another limitation is the small size
of the dataset, which makes it difficult to apply
more advanced deep learning techniques. Thus, the
next step is to run an experiment to collect a larger
dataset.
ACKNOWLEDGEMENTS
This work was supported by the DAISI project, co-funded
by the European Union with the European
Regional Development Fund (ERDF), by the French
Agence Nationale de la Recherche and by the
Regional Council of Normandie.
ICAART 2019 - 11th International Conference on Agents and Artificial Intelligence