
REFERENCES
Alger, B. E. (2002). Retrograde signaling in the regulation of synaptic transmission: Focus on endocannabinoids. Progress in Neurobiology, 68(4):247–286.
Amari, S. (1993). Backpropagation and stochastic gradient descent method. Neurocomputing, 5:185–196.
Amunts, K., Lenzen, M., Friederici, A. D., Schleicher, A., Morosan, P., Palomero-Gallagher, N., and Zilles, K. (2010). Broca’s region: Novel organizational principles and multiple receptor mapping. PLOS Biology, 8(9):1–16.
Amunts, K. and Zilles, K. (2015). Architectonic mapping of the human brain beyond Brodmann. Neuron, 88(6):1086–1107.
Banerjee, S. and Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan. Association for Computational Linguistics.
Black, S., Gao, L., Wang, P., Leahy, C., and Biderman, S. (2021). GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow.
Block, H. D., Knight Jr, B., and Rosenblatt, F. (1962). Analysis of a four-layer series-coupled perceptron. II. Reviews of Modern Physics, 34(1):135.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Burke, D., Kiernan, M. C., and Bostock, H. (2001). Excitability of human axons. Clinical Neurophysiology, 112(9):1575–1585.
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-end object detection with transformers. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I, pages 213–229. Springer.
Cettolo, M., Federico, M., Bentivogli, L., Niehues, J., Stüker, S., Sudoh, K., Yoshino, K., and Federmann, C. (2017). Overview of the IWSLT 2017 evaluation campaign. In Proceedings of the 14th International Conference on Spoken Language Translation, pages 2–14, Tokyo, Japan. International Workshop on Spoken Language Translation.
Chen, T., Xu, B., Zhang, C., and Guestrin, C. (2016). Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174.
Chen, X., Wu, Y., Wang, Z., Liu, S., and Li, J. (2021). Developing real-time streaming transformer transducer for speech recognition on large-scale dataset. In ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5904–5908. IEEE.
Cohen, G., Afshar, S., Tapson, J., and Van Schaik, A. (2017). EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE.
Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142.
Dong, L., Xu, S., and Xu, B. (2018). Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5884–5888. IEEE.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Elliott, D., Frank, S., Sima’an, K., and Specia, L. (2016). Multi30K: Multilingual English-German image descriptions. In Proceedings of the 5th Workshop on Vision and Language, pages 70–74. Association for Computational Linguistics.
The PyTorch Foundation (2023). Multiprocessing package - torch.multiprocessing. https://pytorch.org/docs/stable/multiprocessing.html.
Gerdeman, G. L., Ronesi, J., and Lovinger, D. M. (2002). Postsynaptic endocannabinoid release is critical to long-term depression in the striatum. Nature Neuroscience, 5(5):446–451.
Gidon, A., Zolnik, T. A., Fidzinski, P., Bolduan, F., Papoutsi, A., Poirazi, P., Holtkamp, M., Vida, I., and Larkum, M. E. (2020). Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science, 367(6473):83–87.
Günther, S., Ruthotto, L., Schroder, J. B., Cyr, E. C., and Gauger, N. R. (2019). Layer-parallel training of deep residual neural networks. arXiv preprint arXiv:1812.04352.
Gregor Koehler, A. A. and Markovics, P. (2020). MNIST handwritten digit recognition in PyTorch. https://nextjournal.com/gkoehler/pytorch-mnist.
Griewank, A. and Walther, A. (2000). Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Transactions on Mathematical Software (TOMS), 26(1):19–45.
Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y., and Pang, R. (2020). Conformer: Convolution-augmented Transformer for Speech Recognition. In Proc. Interspeech 2020, pages 5036–5040.
Hardingham, N., Dachtler, J., and Fox, K. (2013). The role of nitric oxide in pre-synaptic plasticity and homeostasis. Frontiers in Cellular Neuroscience, 7.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.