Damen, D., Doughty, H., Farinella, G. M., Fidler, S., Furnari, A., Kazakos, E., Moltisanti, D., Munro, J., Perrett, T., Price, W., and Wray, M. (2018). Scaling egocentric vision: The EPIC-Kitchens dataset. In European Conference on Computer Vision (ECCV).
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.
Duarte, N., Tasevski, J., Coco, M. I., Raković, M., and Santos-Victor, J. (2018). Action anticipation: Reading the intentions of humans and robots. IEEE Robotics and Automation Letters, 3(4):4132–4139.
Furnari, A., Battiato, S., Grauman, K., and Farinella, G. M. (2017). Next-active-object prediction from egocentric videos. Journal of Visual Communication and Image Representation, 49:401–411.
Gao, J., Yang, Z., and Nevatia, R. (2017). RED: Reinforced encoder-decoder networks for action anticipation. In British Machine Vision Conference (BMVC).
Hadsell, R., Chopra, S., and LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 1735–1742.
Kiranyaz, S., Ince, T., and Gabbouj, M. (2016). Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Transactions on Biomedical Engineering, 63(3):664–675.
Koch, G., Zemel, R., and Salakhutdinov, R. (2015). Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop.
Koppula, H. S., Jain, A., and Saxena, A. (2016). Anticipatory planning for human-robot teams. In Experimental Robotics: The 14th International Symposium on Experimental Robotics.
Koppula, H. S. and Saxena, A. (2016). Anticipating human activities using object affordances for reactive robotic response. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):14–29.
Lan, T., Chen, T.-C., and Savarese, S. (2014). A hierarchical representation for future action prediction. In European Conference on Computer Vision (ECCV), pages 689–704, Cham. Springer International Publishing.
Lee, S.-M., Yoon, S. M., and Cho, H. (2017). Human activity recognition from accelerometer data using convolutional neural network. In IEEE International Conference on Big Data and Smart Computing (BigComp), pages 131–134.
Ma, S., Sigal, L., and Sclaroff, S. (2016). Learning activity progression in LSTMs for activity detection and early detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1942–1950.
Mainprice, J. and Berenson, D. (2013). Human-robot collaborative manipulation planning using early prediction of human motion. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 299–306.
Nakamura, K., Yeung, S., Alahi, A., and Fei-Fei, L. (2017). Jointly learning energy expenditures and activities using egocentric multimodal signals. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6817–6826.
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A. Y. (2011). Multimodal deep learning. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 689–696, USA. Omnipress.
Pirsiavash, H. and Ramanan, D. (2012). Detecting activities of daily living in first-person camera views. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2847–2854.
Song, S., Cheung, N., Chandrasekhar, V., Mandal, B., and Lin, J. (2016). Egocentric activity recognition with multimodal Fisher vector. arXiv preprint arXiv:1601.06603.
Srivastava, N. and Salakhutdinov, R. (2014). Multimodal learning with deep Boltzmann machines. Journal of Machine Learning Research, 15:2949–2980.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9.
Torre, F. D., Hodgins, J. K., Montano, J., and Valcarcel, S. (2009). Detailed human data acquisition of kitchen activities: The CMU-Multimodal Activity Database (CMU-MMAC). In CHI 2009 Workshop on Developing Shared Home Behavior Datasets to Advance HCI and Ubiquitous Computing Research.
Wu, T., Chien, T., Chan, C., Hu, C., and Sun, M. (2017). Anticipating daily intention using on-wrist motion triggered sensing. In IEEE International Conference on Computer Vision (ICCV).