ably allowing the agent to intervene in the target
task. The definitions of the timings were validated
with Web-based questionnaires. Second, an automatic
timing identification method was developed to identify
four of the timings using only nonverbal cues, in-
cluding face direction and speaking state. The method
achieved moderate performance (F-measure 0.53).
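To make the evaluation concrete, the sketch below shows how such a detector could be scored. The rule (flag a timing when a participant looks at the agent while no one is speaking), the data, and all names are illustrative assumptions, not the method or corpus used in this work; only the F-measure formula itself is standard.

```python
def detect_timing(frame):
    """Toy rule (an assumption, not the paper's classifier): flag a
    candidate intervention timing when the participant faces the agent
    while no one is speaking."""
    return frame["face"] == "agent" and not frame["speaking"]

def f_measure(gold, predicted):
    """Standard F-measure: F1 = 2PR / (P + R) over binary labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Synthetic annotated frames: face direction, speaking state, gold label.
frames = [
    {"face": "agent", "speaking": False, "timing": True},
    {"face": "agent", "speaking": True,  "timing": False},
    {"face": "peer",  "speaking": False, "timing": False},
    {"face": "agent", "speaking": False, "timing": False},
    {"face": "peer",  "speaking": True,  "timing": False},
    {"face": "agent", "speaking": False, "timing": True},
]
gold = [f["timing"] for f in frames]
pred = [detect_timing(f) for f in frames]
print(round(f_measure(gold, pred), 2))  # → 0.8 on this toy data
```

In practice the detector would be a trained classifier over per-frame cue vectors rather than a hand-written rule, but the scoring loop is the same.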
The current automatic estimation method is still
relatively ad hoc due to the small corpus. We would
like to increase the corpus size and explore machine
learning techniques to improve its performance. The
performance should also improve with additional
context information. In the future, we would like to
introduce a mechanism for context management and
understanding; this should help to estimate the four
remaining timings as well. Finally, we would like to
incorporate the intervention timing estimation feature
into an ECA system and test it in a real-world
application.
ACKNOWLEDGEMENTS
This work was partially funded by JSPS Grants-in-
Aid for Scientific Research (B) (24300039 and
25280076).