
APPENDIX
Linguistic Annotation of Sentences
Linguistic Annotation of (S1):
token(root, root, nner, o, root, 0);
token(if, in, nner, o, if, 1);
token(signs, nns, nner, o, sign, 2);
token(are, vbp, nner, o, be, 3);
token(blurry, jj, nner, o, blurry, 4);
token(activate, vbp, nner, o, activate, 6);
token(esp, nnp, ner, system, esp, 7);
root(root, 0, activate, 6);
mark(blurry, 4, if, 1);
nsubj(blurry, 4, sign, 2);
cop(blurry, 4, be, 3);
advcl(activate, 6, blurry, 4);
dobj(activate, 6, esp, 7);
Linguistic Annotation of (S2):
token(root, root, nner, o, root, 0);
token(when, wrb, nner, o, when, 1);
token(esp, nnp, ner, system, esp, 2);
token(is, vbz, nner, o, be, 3);
token(inactive, jj, nner, o, inactive, 4);
token(decrease, vb, nner, o, decrease, 6);
token(speed, nn, nner, o, speed, 7);
root(root, 0, decrease, 6);
advmod(inactive, 4, when, 1);
nsubj(inactive, 4, esp, 2);
cop(inactive, 4, be, 3);
advcl(decrease, 6, inactive, 4);
dobj(decrease, 6, speed, 7);
Linguistic Annotation of (S3):
token(root, root, nner, o, root, 0);
token(increase, vb, nner, o, increase, 1);
token(the, dt, nner, o, the, 2);
token(distance, nn, nner, o, distance, 3);
token(when, wrb, nner, o, when, 4);
token(joe, nnp, ner, person, joe, 5);
token(is, vbz, nner, o, be, 6);
token(driving, vbg, nner, o, drive, 7);
root(root, 0, increase, 1);
det(distance, 3, the, 2);
dobj(increase, 1, distance, 3);
advmod(drive, 7, when, 4);
nsubj(drive, 7, joe, 5);
aux(drive, 7, be, 6);
advcl(increase, 1, drive, 7);
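The facts above follow a uniform predicate format: `token(word, pos, ner_flag, ner_type, lemma, index)` for tokens, and `relation(head, head_index, dependent, dep_index)` for dependency edges such as `nsubj` or `advcl`. As a minimal sketch of how such facts could be read back into structured form, the following Python snippet parses one fact per line. The helper names (`parse_fact`, `tokens_and_deps`) are hypothetical, introduced here for illustration, and are not part of the paper's system.

```python
import re

# Hypothetical helper (not from the paper): matches a fact like
# "nsubj(blurry, 4, sign, 2);" and captures predicate and arguments.
FACT_RE = re.compile(r"(\w+)\(([^)]*)\);")

def parse_fact(line):
    """Parse a single annotation fact into (predicate, [args]), or None."""
    m = FACT_RE.match(line.strip())
    if not m:
        return None
    args = [a.strip() for a in m.group(2).split(",")]
    return m.group(1), args

def tokens_and_deps(lines):
    """Split parsed facts into token tuples and dependency relations."""
    tokens, deps = [], []
    for line in lines:
        parsed = parse_fact(line)
        if parsed is None:
            continue
        pred, args = parsed
        if pred == "token":
            # token(word, pos, ner_flag, ner_type, lemma, index)
            tokens.append((args[0], args[1], args[4], int(args[5])))
        else:
            # relation(head, head_index, dependent, dep_index)
            deps.append((pred, args[0], int(args[1]), args[2], int(args[3])))
    return tokens, deps
```

For example, feeding the two facts `token(signs, nns, nner, o, sign, 2);` and `nsubj(blurry, 4, sign, 2);` from (S1) yields the token tuple `("signs", "nns", "sign", 2)` and the relation `("nsubj", "blurry", 4, "sign", 2)`.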
Linguistic Annotation of (S4):
token(root, root, nner, o, root, 0);
token(engage, vb, nner, o, engage, 1);
token(the, dt, nner, o, the, 2);
token(lights, nns, nner, o, light, 3);
token(when, wrb, nner, o, when, 4);
token(the, dt, nner, o, the, 5);
token(road, nn, nner, o, road, 6);
token(is, vbz, nner, o, be, 7);
A Coachable Parser of Natural Language Advice