information to the feature map via the skip connection is
effective for visualizing important frames.
5 CONCLUSION
This paper proposed Integration-Net, which integrates
different networks for classifying wild-type and mutant
sperm of the liverwort. This allows more accurate
classification than conventional video classification
methods. We discovered the difference between the two
types of sperm by using the gradients of the network;
our method located the difference in the flagella
automatically.
However, the resulting heat maps can be ambiguous
or blurry. Therefore, we would like to use visualization
methods that do not depend on gradient computation,
such as Score-CAM (Wang et al., 2020), or to develop
more detailed visualization methods based on it.
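As a rough illustration of this gradient-free idea, the sketch below computes a Score-CAM-style saliency map in NumPy. The `model` callable is a placeholder standing in for the trained network's class-score function, and the activation maps are assumed to be already upsampled to the input resolution; this is a minimal sketch, not our implementation.

```python
import numpy as np

def score_cam(model, x, activations, target_class):
    """Gradient-free, Score-CAM-style saliency (Wang et al., 2020).

    model:        callable mapping an input array to a vector of class
                  scores (placeholder for the trained network)
    x:            input image, shape (H, W)
    activations:  conv feature maps for x, shape (K, H, W), assumed
                  already upsampled to the input resolution
    """
    scores = np.empty(len(activations))
    for k, a in enumerate(activations):
        # Normalize each activation map to [0, 1] to use it as a soft mask.
        mask = (a - a.min()) / (a.max() - a.min() + 1e-8)
        # Score = target-class output for the masked input; no gradients used.
        scores[k] = model(x * mask)[target_class]
    # Softmax over the per-channel scores gives the channel weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Rectified weighted sum of the activation maps is the saliency map.
    return np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
```

Because the channel weights come from forward passes on masked inputs rather than from backpropagated gradients, the map avoids gradient-saturation noise.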
In addition, we would like to use ConvLSTM
(Shi et al., 2015) to exploit motion information
more effectively.
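For reference, the ConvLSTM cell of Shi et al. (2015) replaces the matrix products of a standard LSTM with convolutions ($\ast$), keeping Hadamard products ($\circ$) for the peephole connections:

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_{xi} \ast X_t + W_{hi} \ast H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right)\\
f_t &= \sigma\!\left(W_{xf} \ast X_t + W_{hf} \ast H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right)\\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\!\left(W_{xc} \ast X_t + W_{hc} \ast H_{t-1} + b_c\right)\\
o_t &= \sigma\!\left(W_{xo} \ast X_t + W_{ho} \ast H_{t-1} + W_{co} \circ C_{t-1} + b_o\right)\\
H_t &= o_t \circ \tanh\!\left(C_t\right)
\end{aligned}
```

Since the inputs $X_t$, hidden states $H_t$, and cell states $C_t$ are all spatial feature maps, this recurrence preserves the spatial structure of motion across frames.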
ACKNOWLEDGEMENT
This research was partially supported by JSPS
KAKENHI Grant Number 20H05427.
REFERENCES
Ji, S., Xu, W., Yang, M., & Yu, K., “3D convolutional
neural networks for human action recognition”, IEEE
Transactions on Pattern Analysis and Machine
Intelligence, Vol. 35, pp. 221-231, 2013.
Dalal, N., Triggs, B., & Schmid, C., “Human detection
using oriented histograms of flow and appearance”, In
European Conference on Computer Vision, pp. 428-
441, 2006.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., & Batra, D., “Grad-CAM: Visual
explanations from deep networks via gradient-based
localization”, In IEEE International Conference on
Computer Vision, pp. 618-626, 2017.
Chattopadhay, A., Sarkar, A., Howlader, P., &
Balasubramanian, V. N., “Grad-CAM++: Generalized
gradient-based visual explanations for deep
convolutional networks”, In IEEE Winter Conference
on Applications of Computer Vision, pp. 839-847, 2018.
Anderson, D. J., & Perona, P., “Toward a science of
computational ethology”, Neuron, Vol. 84, pp. 18-31,
2014.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri,
M., “Learning spatiotemporal features with 3D
convolutional networks”, In IEEE International
Conference on Computer Vision, pp. 4489-4497, 2015.
Springenberg, J. T., Dosovitskiy, A., Brox, T., &
Riedmiller, M., “Striving for simplicity: The all
convolutional net”, International Conference on
Learning Representations Workshops, 2015.
Zeiler, M. D., & Fergus, R., “Visualizing and understanding
convolutional networks”, In European Conference on
Computer Vision, pp. 818-833, 2014.
Ma, N., Zhang, X., & Sun, J., “Funnel activation for visual
recognition”, In European Conference on Computer
Vision, pp. 351-368, 2020.
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P.,
“Focal loss for dense object detection”, In IEEE
International Conference on Computer Vision, pp.
2980-2988, 2017.
Hara, K., Kataoka, H., & Satoh, Y., “Learning spatio-
temporal features with 3D residual networks for action
recognition”, In IEEE International Conference on
Computer Vision Workshops, pp. 3154-3160, 2017.
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S.,
Mardziel, P., & Hu, X., “Score-CAM: Score-weighted
visual explanations for convolutional neural networks”,
In IEEE Conference on Computer Vision and Pattern
Recognition Workshops, pp. 24-25, 2020.
Shi, X., Chen, Z., Wang, H., Yeung, D. Y., Wong,
W. K., & Woo, W. C., “Convolutional LSTM network:
A machine learning approach for precipitation
nowcasting”, In Advances in Neural Information
Processing Systems, pp. 802-810, 2015.