Start Here


A Go player makes a weak move that loses her the game. A company’s hiring policy appears to show gender biases. A person crashes their car. Our immediate question in each case is: why? Our social norms, laws and fundamental ethical principles rely on the assumption that decisions and actions have explanations that give insight into their context, causes and consequences.

❓ Why?

Explanation enables trust and understanding. It’s how we learn about the world and communicate our knowledge to others. It’s also how we diagnose flaws in any social or technical system to change it for the better. But ‘context, causes and consequences’ is a very broad scope, and our current understanding of what makes a good explanation is limited.

Each of the domains in my opening sentences is a target of recent research in the AI community. Today’s AI performance is often impressive and headline-grabbing, but not perfect. Machine learning models make mistakes, and we may rightly ask why. We should still demand explanations from AI systems, but it’s very difficult to actually get them. This is because today’s AI largely consists of so-called black boxes: data from the real world goes in, lots of complex mathematical things happen to make sense of it, and predictions, decisions or actions come out the other side. Only those outputs that are requested are produced; they come with no surrounding rationale or clarification. They may be appropriate, but they may be dangerously flawed.

You can try to look inside a black box, but you’ll obtain little insight. Fundamentally, AI is built of equations whose parameters are modified using statistics and differential calculus. There are no semantics here; no building blocks for explanations that a person would find meaningful. It’s numbers and equations all the way down.

So far, the AI field hasn’t really had to worry about this. Research takes place away from the real world, in the realm of video games and narrow ‘toy’ problems. Commercial uses are also pretty inconsequential. When my smart speaker mishears a voice command I don’t demand an explanation because, while frustrating, the mistake barely matters. Where the risks are low we can bury our problems.

A notable exception is the content recommender systems on social networks. It seems clear that some of the fractures in modern politics are due to how these complex algorithms work to maximise ad revenue, in a way that their creators could not easily foresee or control. This is really the first AI application with much freedom to interact with humans, and the results are scary.

And we can look forward to far more ambitious uses of AI than this, more deeply integrated with our social fabric (as part of our legal system, for example) or with the physical world through robotic embodiment. In these inevitable future applications, human welfare, and even human lives, will regularly be at stake, so transparency, trust and verifiable safety are essential.

🎓 My Research

My PhD research will explore what it means to obtain genuine explanation and understanding from artificial intelligence deployed in safety-critical environments, specifically in the form of multi-agent systems. These consist of multiple actors interacting in a common environment, each working towards a goal that may be individual, shared or adversarial. Examples include fleets of vehicles (cars, trains, ships, drones…), human-robot teams collaborating on a factory floor, and stock market trading algorithms.

Some questions that I might seek to answer include:

  • What is the nature of the putative trade-off between model comprehensibility and performance in autonomous intelligent systems? Could simplicity improve generalisation?
  • Can post-hoc explanation of black box models ever provide the level of verification and validation needed for safety-critical applications, or is inherent interpretability the only viable approach?
  • Does generalised and trustworthy explanation require explicit causal and predictive models? Does it require a theory of mind?