Modelling Agent Policies with Interpretable Imitation Learning

Published in Trustworthy AI - Integrating Learning, Optimization and Reasoning (also 1st TAILOR Workshop at ECAI 2020), 2020

Recommended citation: Bewley T., Lawry J., Richards A. (2021) Modelling Agent Policies with Interpretable Imitation Learning. In: Heintz F., Milano M., O'Sullivan B. (eds) Trustworthy AI - Integrating Learning, Optimization and Reasoning. TAILOR 2020. Lecture Notes in Computer Science, vol 12641. Springer, Cham. [PDF]

As we deploy autonomous agents in safety-critical domains, it becomes important to develop an understanding of their internal mechanisms and representations. We outline an approach to imitation learning for reverse-engineering black box agent policies in MDP environments, yielding simplified, interpretable models in the form of decision trees. As part of this process, we explicitly model and learn agents' latent state representations by selecting from a large space of candidate features constructed from the Markov state. We present initial promising results from an implementation in a multi-agent traffic environment.
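
To make the idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical two-car scenario in which a black box policy brakes whenever the gap to the car ahead is below 5 units. Candidate features are built from the Markov state (the raw variables plus their pairwise differences), and a depth-one decision tree is fitted to observed state-action pairs by exhaustive search. All names here (`black_box_policy`, `candidate_features`, `fit_stump`) and the choice of pairwise differences as the feature space are illustrative assumptions.

```python
import random
from itertools import combinations

NAMES = ["own_pos", "lead_pos", "speed"]  # hypothetical Markov state variables

def black_box_policy(state):
    # Hypothetical target agent: brake (1) if the gap to the lead car is small.
    own_pos, lead_pos, _speed = state
    return 1 if lead_pos - own_pos < 5.0 else 0

def candidate_features(state):
    # Candidate features: raw state variables plus all pairwise differences.
    feats = dict(zip(NAMES, state))
    for (i, a), (j, b) in combinations(enumerate(NAMES), 2):
        feats[f"{b}-{a}"] = state[j] - state[i]
    return feats

def fit_stump(X, y):
    # Exhaustively search (feature, threshold, left action) for the split
    # with the fewest imitation errors. Returns (errors, name, threshold, action).
    best = None
    for name in X[0]:
        values = sorted({x[name] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for left_action in (0, 1):
                errors = 0
                for x, a in zip(X, y):
                    predicted = left_action if x[name] < t else 1 - left_action
                    errors += predicted != a
                if best is None or errors < best[0]:
                    best = (errors, name, t, left_action)
    return best

# Observe the black box on randomly sampled states.
rng = random.Random(0)
states = [(rng.uniform(0, 50), rng.uniform(0, 50), rng.uniform(0, 15))
          for _ in range(200)]
X = [candidate_features(s) for s in states]
y = [black_box_policy(s) for s in states]

errors, feature, threshold, left_action = fit_stump(X, y)
print(f"if {feature} < {threshold:.2f}: action {left_action} "
      f"else: action {1 - left_action}  ({errors} errors)")
```

In this toy setting the search selects the gap feature `lead_pos-own_pos`, which none of the raw state variables can match on its own. This illustrates why choosing a representation from a space of candidate features can yield a much smaller tree than splitting on the raw state directly. A practical version would grow deeper trees and would need a far more scalable search than this exhaustive one.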