
environment and compared it with the existing algorithms and the worst-case baseline. We conclude that the new algorithm is effective at solving the teaching problem.
In future work, it would be interesting to study such teacher-learner interaction in more complex environments, for example, environments with more states and a non-linear reward function, possibly represented by a neural network. Another question yet to be addressed is whether the proposed algorithms enjoy convergence guarantees. It would also be worth examining whether the MT module of the algorithm could be improved by taking into account the uncertainty of the estimated learner policy. A further direction is to find more sophisticated ways of weighing the learner's older trajectories. For example, if the environment consists of several isolated regions and each feature is confined to a single region, then a teaching demonstration in one region might not change the learner's behavior in the others; in that case, the learner's earlier trajectories from those other regions need not be down-weighted.
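To make the last point concrete, the following is a minimal sketch of such region-aware down-weighting, written in Python for illustration only; the names (Trajectory, region_of, reweight) and the decay factor are our own assumptions, not part of the proposed algorithms. After a demonstration in one region, only the learner's past trajectories from that region are down-weighted, while trajectories from other, isolated regions keep their original weight.

    from dataclasses import dataclass

    @dataclass
    class Trajectory:
        states: list        # sequence of visited states
        weight: float = 1.0 # importance when estimating the learner policy

    def region_of(traj, state_to_region):
        # Hypothetical helper: assuming each trajectory stays inside one
        # isolated region, the region of its first state identifies it.
        return state_to_region[traj.states[0]]

    def reweight(trajectories, demo_region, state_to_region, decay=0.5):
        # Down-weight only the trajectories collected in the region where
        # the new teaching demonstration was given; evidence from other
        # regions is still up to date and keeps its full weight.
        for traj in trajectories:
            if region_of(traj, state_to_region) == demo_region:
                traj.weight *= decay
        return trajectories

For instance, with state_to_region = {0: "A", 1: "A", 2: "B"}, a demonstration in region "A" halves the weight of a past trajectory through states [0, 1] but leaves a trajectory through state 2 untouched.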
ACKNOWLEDGEMENTS
This work was partially supported by national funds through Fundação para a Ciência e a Tecnologia, under project UIDB/50021/2020 (INESC-ID multi-annual funding) and the RELEvaNT project, with reference PTDC/CCI-COM/5060/2021.
Rustam Zayanov would also like to thank Open
Philanthropy for their scholarship, which facilitated
his dedicated involvement in this project.