Jain, M., van Gemert, J. C., Mensink, T., and Snoek, C. G. (2015). Objects2action: Classifying and localizing actions without any video example. In IEEE ICCV, pages 4588–4596.
Junior, V. L. E., Pedrini, H., and Menotti, D. (2019). Zero-shot action recognition in videos: A survey. arXiv preprint arXiv:1909.06423.
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In IEEE CVPR, pages 1725–1732.
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. (2011). HMDB: A large video database for human motion recognition. In IEEE ICCV, pages 2556–2563.
Liu, H., Yao, L., Zheng, Q., Luo, M., Zhao, H., and Lyu, Y. (2020). Dual-stream generative adversarial networks for distributionally robust zero-shot learning. Information Sciences, 519:407–422.
Liu, J., Kuipers, B., and Savarese, S. (2011). Recognizing human actions by attributes. In IEEE CVPR, pages 3337–3344.
Liu, K., Liu, W., Ma, H., Huang, W., and Dong, X. (2019). Generalized zero-shot learning for action recognition with web-scale video data. World Wide Web, 22(2):807–824.
Mandal, D., Narayan, S., Dwivedi, S. K., Gupta, V., Ahmed, S., Khan, F. S., and Shao, L. (2019). Out-of-distribution detection for generalized zero-shot action recognition. In IEEE CVPR, pages 9985–9993.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in NIPS, pages 3111–3119.
Mishra, A., Pandey, A., and Murthy, H. A. (2020). Zero-shot learning for action recognition using synthesized features. Neurocomputing.
Mishra, A., Verma, V. K., Reddy, M. S. K., Arulkumar, S., Rai, P., and Mittal, A. (2018). A generative approach to zero-shot and few-shot action recognition. In IEEE WACV, pages 372–380.
Narayan, S., Gupta, A., Khan, F. S., Snoek, C. G., and Shao, L. (2020). Latent embedding feedback and discriminative features for zero-shot classification. arXiv preprint arXiv:2003.07833.
Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., Corrado, G. S., and Dean, J. (2013). Zero-shot learning by convex combination of semantic embeddings. arXiv preprint arXiv:1312.5650.
Palatucci, M., Pomerleau, D., Hinton, G. E., and Mitchell, T. M. (2009). Zero-shot learning with semantic output codes. In Advances in NIPS, pages 1410–1418.
Piergiovanni, A. and Ryoo, M. S. (2018). Learning shared
multimodal embeddings with unpaired data. CoRR.
Qin, J., Liu, L., Shao, L., Shen, F., Ni, B., Chen, J., and Wang, Y. (2017). Zero-shot action recognition with error-correcting output codes. In IEEE CVPR, pages 2833–2842.
Roitberg, A., Martinez, M., Haurilet, M., and Stiefelhagen, R. (2018). Towards a fair evaluation of zero-shot action recognition using external data. In ECCV Workshops.
Simonyan, K. and Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In Advances in NIPS, pages 568–576.
Soomro, K., Zamir, A. R., and Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
Tian, Y., Kong, Y., Ruan, Q., An, G., and Fu, Y. (2019). Aligned dynamic-preserving embedding for zero-shot action recognition. IEEE Transactions on Circuits and Systems for Video Technology.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In IEEE ICCV, pages 4489–4497.
Wang, H. and Schmid, C. (2013). Action recognition with improved trajectories. In IEEE ICCV, pages 3551–3558.
Wang, Q. and Chen, K. (2017a). Alternative semantic representations for zero-shot human action recognition. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 87–102.
Wang, Q. and Chen, K. (2017b). Multi-label zero-shot
human action recognition via joint latent embedding.
arXiv preprint arXiv:1709.05107.
Wang, Q. and Chen, K. (2017c). Zero-shot visual recognition via bidirectional latent embedding. IJCV, 124(3):356–383.
Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., and Schiele, B. (2016). Latent embeddings for zero-shot classification. In IEEE CVPR, pages 69–77.
Xian, Y., Schiele, B., and Akata, Z. (2017). Zero-shot learning - the good, the bad and the ugly. In IEEE CVPR, pages 4582–4591.
Xu, X., Hospedales, T., and Gong, S. (2015). Semantic embedding space for zero-shot action recognition. In IEEE ICIP, pages 63–67.
Xu, X., Hospedales, T., and Gong, S. (2017). Transductive zero-shot action recognition by word-vector embedding. IJCV, 123(3):309–333.
Xu, X., Hospedales, T. M., and Gong, S. (2016). Multi-task zero-shot action recognition with prioritised data augmentation. In ECCV, pages 343–359.
Zhang, B., Hu, H., and Sha, F. (2018). Cross-modal and hierarchical modeling of video and text. In ECCV, pages 374–390.
Zhang, C. and Peng, Y. (2018). Visual data synthesis via GAN for zero-shot video classification. arXiv preprint arXiv:1804.10073.
Zhang, Z. and Saligrama, V. (2016). Zero-shot learning via joint latent similarity embedding. In IEEE CVPR, pages 6034–6042.
Zhu, Y., Long, Y., Guan, Y., Newsam, S., and Shao, L. (2018). Towards universal representation for unseen action recognition. In IEEE CVPR, pages 9436–9445.
Fairer Evaluation of Zero Shot Action Recognition in Videos