Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188.
Ba, J., Mnih, V., and Kavukcuoglu, K. (2014). Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755.
Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Baydin, A. G. and Le, T. A. (2018). pyprob.
Baydin, A. G., Shao, L., Bhimji, W., Heinrich, L., Naderiparizi, S., Munk, A., Liu, J., Gram-Hansen, B., Louppe, G., Meadows, L., et al. (2019). Efficient probabilistic inference in the quest for physics beyond the standard model. In Advances in Neural Information Processing Systems, pages 5460–5473.
Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. (2018). Pyro: Deep universal probabilistic programming. arXiv preprint arXiv:1810.09538.
Del Moral, P., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436.
Gershman, S. and Goodman, N. (2014). Amortized inference in probabilistic reasoning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 36.
Goodman, N. D., Mansinghka, V. K., Roy, D., Bonawitz, K., and Tenenbaum, J. B. (2008). Church: A language for generative models. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008, pages 220–229.
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., and Wierstra, D. (2015). DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623.
Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214):1158–1161.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Computation, 9(8):1735–1780.
Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Kulkarni, T. D., Kohli, P., Tenenbaum, J. B., and Mansinghka, V. (2015). Picture: A probabilistic programming language for scene perception. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4390–4399.
Le, T. A., Baydin, A. G., and Wood, F. (2017). Inference
compilation and universal probabilistic programming.
In Proceedings of the 20th International Conference
on Artificial Intelligence and Statistics, volume 54
of Proceedings of Machine Learning Research, pages
1338–1348, Fort Lauderdale, FL, USA. PMLR.
Luong, M.-T., Pham, H., and Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
Mansinghka, V., Selsam, D., and Perov, Y. (2014). Venture: a higher-order probabilistic programming platform with programmable inference. arXiv preprint arXiv:1404.0099.
Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005). BLOG: Probabilistic models with unknown objects. In IJCAI International Joint Conference on Artificial Intelligence, pages 1352–1359.
Minka, T., Winn, J., Guiver, J., Zaykov, Y., Fabian, D., and Bronskill, J. (2018). Infer.NET 0.3. Microsoft Research Cambridge. http://dotnet.github.io/infer.
Munk, A., Ścibior, A., Baydin, A. G., Stewart, A., Fernlund, G., Poursartip, A., and Wood, F. (2019). Deep probabilistic surrogate networks for universal simulator approximation. arXiv preprint arXiv:1910.11950.
Opgen-Rhein, R. and Strimmer, K. (2007). From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Systems Biology, 1(1):37.
Seo, M., Kembhavi, A., Farhadi, A., and Hajishirzi, H. (2016). Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
Tran, D., Kucukelbir, A., Dieng, A. B., Rudolph, M., Liang,
D., and Blei, D. M. (2016). Edward: A library for
probabilistic modeling, inference, and criticism. arXiv
preprint arXiv:1610.09787.
van de Meent, J.-W., Paige, B., Yang, H., and Wood, F.
(2018). An introduction to probabilistic programming.
arXiv preprint arXiv:1809.10756.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. In Advances in
Neural Information Processing Systems, pages 5998–
6008.
Venturini, G., Daniher, I., Crowther, R., and KOLANICH
(2017). ahkab.
Wingate, D., Stuhlmüller, A., and Goodman, N. D. (2011). Lightweight implementations of probabilistic programming languages via transformational compilation. Journal of Machine Learning Research, 15:770–778.
Wood, F., van de Meent, J.-W., and Mansinghka, V. (2014). A new approach to probabilistic programming inference. In Artificial Intelligence and Statistics, pages 1024–1032.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057.