Eating and Drinking Recognition via Integrated Information of Head Directions and Joint Positions in a Group

Naoto Ienaga, Yuko Ozasa, Hideo Saito

2017

Abstract

Recent years have seen the introduction of service robots as waiters or waitresses in restaurants and cafes. In such venues, customers commonly visit in groups and converse while eating and drinking. To wait on tables at appropriate times, a robotic server must recognize whether customers are currently eating or drinking. In this paper, we present a method by which robots can recognize the eating and drinking actions of individuals in a group. Our approach uses the positions of human body joints as features and long short-term memory (LSTM) networks to perform recognition on the resulting time-series data. We also use head direction, which we assume to be an effective cue for recognition in a group setting. The information from head directions and joint positions is integrated via logistic regression and used for recognition. Experimental results show that this integration yields the highest recognition accuracy and supports the robots' serving tasks.
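The abstract describes late fusion: a per-frame score from the head-direction cue and one from the joint-position LSTM are combined by logistic regression. The paper itself gives the details; the following is only a minimal, self-contained sketch of that fusion step, with hypothetical toy scores and a plain gradient-descent fit standing in for whatever training procedure the authors used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion(samples, lr=0.5, epochs=500):
    """Fit logistic-regression weights that fuse two model scores.

    samples: list of ((p_head, p_joint), label) pairs, label in {0, 1},
    where p_head is a head-direction score and p_joint a joint-position
    (e.g. LSTM) score for the same frame.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def fuse(p_head, p_joint, w, b):
    """Fused probability that the person is eating or drinking."""
    return sigmoid(w[0] * p_head + w[1] * p_joint + b)

# Hypothetical toy data: high scores from both cues mean eating/drinking.
data = [((0.9, 0.8), 1), ((0.7, 0.9), 1), ((0.2, 0.3), 0), ((0.1, 0.2), 0)]
w, b = train_fusion(data)
```

The learned weights let the fused classifier down-weight whichever cue is less reliable, which is the usual motivation for score-level fusion over a fixed average.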



Paper Citation


in Harvard Style

Ienaga N., Ozasa Y. and Saito H. (2017). Eating and Drinking Recognition via Integrated Information of Head Directions and Joint Positions in a Group. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 527-533. DOI: 10.5220/0006200305270533


in Bibtex Style

@conference{icpram17,
author={Naoto Ienaga and Yuko Ozasa and Hideo Saito},
title={Eating and Drinking Recognition via Integrated Information of Head Directions and Joint Positions in a Group},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={527-533},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006200305270533},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Eating and Drinking Recognition via Integrated Information of Head Directions and Joint Positions in a Group
SN - 978-989-758-222-6
AU - Ienaga N.
AU - Ozasa Y.
AU - Saito H.
PY - 2017
SP - 527
EP - 533
DO - 10.5220/0006200305270533