timate the gaze target of a human subject seated in front of a computer using a single webcam. Possible applications of this system include enhancements to videoconferencing technology, enabling communication channels that are otherwise absent from telematic conversation and interaction. Using publicly available Python libraries, we read participant user names and video cell locations directly from the computer screen, creating a system that both recognizes and communicates the subject of a person’s gaze.
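As a rough illustration of this screen-reading step, the sketch below captures the screen and uses OCR to associate each visible participant name with an on-screen position. It assumes the publicly available mss (screen capture) and pytesseract (OCR) packages alongside OpenCV (Bradski, 2000); the libraries, layout parsing, and matching logic in the actual system may differ.

```python
# Minimal sketch of the screen-reading step, assuming the mss and pytesseract
# packages; it stands in for, rather than reproduces, the system's pipeline.
import cv2                      # OpenCV (Bradski, 2000)
import mss                      # cross-platform screen capture (assumed)
import numpy as np
import pytesseract              # wrapper for the Tesseract OCR engine (assumed)
from pytesseract import Output

with mss.mss() as screen:
    # Grab the primary monitor as a BGRA pixel array.
    frame = np.array(screen.grab(screen.monitors[1]))

# OCR the screenshot, keeping a bounding box for each recognized word.
gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)
words = pytesseract.image_to_data(gray, output_type=Output.DICT)

# Associate each on-screen name label with the center of its bounding box,
# which serves as a proxy for that participant's video cell location.
name_positions = {}
for text, x, y, w, h in zip(words["text"], words["left"], words["top"],
                            words["width"], words["height"]):
    if text.strip():
        name_positions[text] = (x + w // 2, y + h // 2)

print(name_positions)  # e.g. {'Alice': (312, 640), 'Bob': (948, 640)}
```

A downstream step could then compare these positions against the estimated gaze zone to report which participant the subject is looking at.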
This system was then shown to be capable of restoring nonverbal communication to an environment that relies heavily on such a channel: the music classroom. Through telematic performances of Walter Thompson’s Soundpainting, student musicians were able to respond to conducted cues and experienced a greater sense of communication and connection with the restoration of eye contact to their virtual music environment.
ACKNOWLEDGEMENTS
The authors would like to thank students and teaching staff from the Sound in Time introductory music course at the University of California, San Diego; musicians from the UCSD Symphonic Student Association; and students of the Southwestern College Orchestra under the direction of Dr. Matt Kline.

This research was supported by the UCOP Innovative Learning Technology Initiative Grant (University of California Office of the President, 2020).
REFERENCES
Baratè, A., Haus, G., and Ludovico, L. A. (2020). Learning, teaching, and making music together in the COVID-19 era through IEEE 1599. In 2020 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pages 1–5.

Baydin, A. G., Pearlmutter, B. A., Radul, A. A., and Siskind, J. M. (2017). Automatic differentiation in machine learning: A survey. The Journal of Machine Learning Research, 18(1):5595–5637.

Biasutti, M., Concina, E., Wasley, D., and Williamon, A. (2013). Behavioral coordination among chamber musicians: A study of visual synchrony and communication in two string quartets.

Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.

Byo, J. L. and Lethco, L.-A. (2001). Student musicians’ eye contact with the conductor: An exploratory investigation. Contributions to Music Education, pages 21–35.

Cha, X., Yang, X., Feng, Z., Xu, T., Fan, X., and Tian, J. (2018). Calibration-free gaze zone estimation using convolutional neural network. In 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), pages 481–484.

Davidson, J. W. and Good, J. M. (2002). Social and musical co-ordination between members of a string quartet: An exploratory study. Psychology of Music, 30(2):186–201.

de Oliveira Dias, M., Lopes, R. d. O. A., and Teles, A. C. (2020). Will virtual replace classroom teaching? Lessons from virtual classes via Zoom in the times of COVID-19. Journal of Advances in Education and Philosophy.

Frischen, A., Bayliss, A. P., and Tipper, S. P. (2007). Gaze cueing of attention: Visual attention, social cognition, and individual differences. Psychological Bulletin, 133(4):694.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.

Hoffman, G. and Weinberg, G. (2010). Shimon: An interactive improvisational robotic marimba player. In CHI ’10 Extended Abstracts on Human Factors in Computing Systems, pages 3097–3102.

Isikdogan, F., Gerasimow, T., and Michael, G. (2020). Eye contact correction using deep neural networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3318–3326.

Kingma, D. P. and Ba, J. (2015). Adam: A method for stochastic optimization. In Bengio, Y. and LeCun, Y., editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.

Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., and Torralba, A. (2016). Eye tracking for everyone. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2176–2184.

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer.

Marchetti, E. and Jensen, K. (2010). A meta-study of musicians’ non-verbal interaction. International Journal of Technology, Knowledge & Society, 6(5).

Peplowski, K. (1998). The process of improvisation. Organization Science, 9(5):560–561.

Rao, A. S., Georgeff, M. P., et al. (1995). BDI agents: From theory to practice. In ICMAS, volume 95, pages 312–319.

Regenbrecht, H. and Langlotz, T. (2015). Mutual gaze support in videoconferencing reviewed. Communications of the Association for Information Systems, 37(1):45.

Seddon, F. A. (2005). Modes of communication during jazz improvisation. British Journal of Music Education, 22(1):47–61.

Sugano, Y., Matsushita, Y., and Sato, Y. (2012). Appearance-based gaze estimation using visual saliency. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):329–341.