model 1. On the other hand, model 2 is able to predict
with high probability up to the last prediction time. Model 2
also achieves higher prediction probability than model 1
at the first prediction time on the Town Centre dataset,
as shown in Figure 13. Figure 14 shows that all models
produce predictions along each ground-truth path. From
these results, the recursive model (model 2) predicts better
than the non-recursive model (model 1).
5 CONCLUSIONS
We proposed a probabilistic path prediction method
based on an encoder-predictor model. The proposed
method uses context images that visually represent
the state and movement of people's positions. The
conventional encoder-predictor model is applied to
image generation; we applied it to path prediction
by expressing a probability distribution. We con-
structed context images to capture effective informa-
tion, and evaluated two types of images and two mod-
els in one-person and multi-person prediction. The
experimental results show that optical flow images
achieve better scores than RGB images. We also show
that model 2, which takes the last image of the en-
coder as input and recursively feeds predicted images
into the predictor, outperforms the non-recursive
model. Our future work includes predicting individ-
ual paths in multi-person prediction.
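The difference between the two predictor styles can be sketched as a toy loop. This is a minimal illustration with made-up linear "networks" in place of the paper's learned models; all weight names, shapes, and functions below are hypothetical, not the actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned networks (hypothetical shapes and weights).
W_enc = rng.standard_normal((8, 8)) * 0.1   # hidden-state update
W_in  = rng.standard_normal((8, 4)) * 0.1   # maps a frame into the state
W_out = rng.standard_normal((4, 8)) * 0.1   # decodes a state into a frame

def encode(frames):
    """Fold the observed frames into a single hidden state."""
    h = np.zeros(8)
    for x in frames:
        h = np.tanh(W_enc @ h + W_in @ x)
    return h

def predict_nonrecursive(h, steps):
    """Model 1 style: every future frame is decoded from the same
    encoder state; predictions never feed back into the predictor."""
    return [W_out @ np.tanh(W_enc @ h) for _ in range(steps)]

def predict_recursive(h, last_frame, steps):
    """Model 2 style: the last observed frame seeds the predictor and
    each predicted frame is fed back as the next input."""
    x, out = last_frame, []
    for _ in range(steps):
        h = np.tanh(W_enc @ h + W_in @ x)
        x = W_out @ h  # predicted frame becomes the next input
        out.append(x)
    return out

observed = [rng.standard_normal(4) for _ in range(5)]
h = encode(observed)
p1 = predict_nonrecursive(h, steps=3)
p2 = predict_recursive(h, observed[-1], steps=3)
print(len(p1), len(p2))
```

Because the non-recursive predictor reuses a frozen encoder state, its outputs cannot track how the scene evolves over the prediction horizon, while the recursive predictor updates its state at every step; this mirrors why model 2 maintains high probability up to the last prediction time.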