Our goal is to enable cyber serving staff to determine appropriate times for waiting on tables. One such time is when a customer in a group finishes eating or drinking. In future work, we will predict this time using our proposed method.
ACKNOWLEDGEMENTS
This work was partially supported by JST CREST “Intelligent Information Processing Systems Creating Co-Experience Knowledge and Wisdom with Human-Machine Harmonious Collaboration.”