Posts by Collection

blog

notes

A Mathematical Framework for Transformer Circuits

2 minute read

2021 #Content/Paper by Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. There's also an accompanying YouTube playlist here and a walkthrough by Neel Nanda here.

Attention

3 minute read

Note: traditional attention was first used as a mechanism to improve the performance of RNNs. The Attention is All You Need paper proposed self-attention as a complete replacement for the RNN itself, which has since proven to be both more performant and faster, as it does not rely on sequential processing.
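
To make the contrast concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention; the weight matrices, dimensions and function name are illustrative placeholders rather than any particular model's parameters. Every position interacts with every other position in a single matrix product, so there is no recurrence over time steps.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                        # project into query/key/value spaces
    scores = q @ k.T / np.sqrt(k.shape[-1])                    # all position-pair interactions at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    return weights @ v                                          # each output mixes values from all positions

# Toy usage: 4 tokens, model width 8 (sizes are arbitrary)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape (4, 8)
```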

Linear Representation Hypothesis

less than 1 minute read

The hypothesis that monosemantic concepts are represented in neural networks as directions (vectors) in activation space, that conjunctions of concepts are represented as linear combinations of these vectors, and that the computations models perform between layers can be understood as nonlinear operations on these vectors.
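
A toy numerical sketch of this picture, with made-up concept directions and coefficients (nothing here is extracted from a real model): each concept is a unit direction in a hypothetical activation space, a conjunction is a weighted sum of those directions, and each concept's strength can be approximately read back off with a dot product.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                                    # width of the activation space (arbitrary)

def unit(v):
    return v / np.linalg.norm(v)

# Hypothetical concept directions; in practice these would be found by probing or dictionary learning
plural = unit(rng.normal(size=d))
french = unit(rng.normal(size=d))

# A conjunction of concepts is modelled as a linear combination of their directions
activation = 0.9 * plural + 0.7 * french

# Reading a concept back off is a dot product with its direction; the small error
# comes from the two random directions being only approximately orthogonal
print(activation @ plural)   # roughly 0.9
print(activation @ french)   # roughly 0.7
```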

Mechanistic Interpretability

less than 1 minute read

Embodies a fundamental belief that neural networks are only black boxes by default, and that we can understand their function in terms of interpretable concepts and algorithms if we try hard enough.

Toy Models of Superposition

3 minute read

2022 #Content/Blog by Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, and Chris Olah.

Transformer

1 minute read

The term "transformer" doesn't have a fully precise definition, but it is generally used to refer to any neural sequence-to-sequence model in which the only interaction between positions is through a sequence of multi-head attention layers that iteratively update a central embedding, called the residual stream, by addition.
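
A short sketch of the residual-stream bookkeeping this describes; the layer computation below is a deliberately crude placeholder (a single matrix product and nonlinearity), not a faithful multi-head attention implementation, and all names and sizes are illustrative.

```python
import numpy as np

def attention_layer(residual, w):
    """Placeholder for a multi-head attention layer; only the read-then-add
    pattern around it matters for this sketch."""
    return np.tanh(residual @ w)

def transformer_forward(embeddings, layer_weights):
    residual = embeddings                                         # central per-position embedding
    for w in layer_weights:
        residual = residual + attention_layer(residual, w)       # each layer updates the stream by addition
    return residual

# Toy usage: 5 positions, width 16, 3 layers (all sizes arbitrary)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
layers = [0.1 * rng.normal(size=(16, 16)) for _ in range(3)]
out = transformer_forward(x, layers)                              # shape (5, 16)
```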

portfolio

publications

On the Combination of Gamification and Crowd Computation in Industrial Automation and Robotics Applications

Published in 2019 International Conference on Robotics and Automation (ICRA), 2019

A theoretical framework for embedding robotics problems into video games, enabling distributed control by a crowd of players.

Recommended citation: Bewley, Tom, and Liarokapis, Minas. "On the Combination of Gamification and Crowd Computation in Industrial Automation and Robotics Applications." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019. /files/gamification.pdf

On Tour: Harnessing Social Tourism Data for City and Point of Interest Recommendation

Published in DSRS-Turing 2019: 1st International 'Alan Turing' Conference on Decision Support and Recommender Systems, 2019

A data-driven recommender system for tourism.

Recommended citation: Bewley, Tom, and Carrascosa, Ivan Palomares. "On Tour: Harnessing Social Tourism Data for City and Point of Interest Recommendation." 1st International 'Alan Turing' Conference on Decision Support and Recommender Systems (DSRS-Turing 2019). 2019. https://www.researchgate.net/profile/Tom_Bewley2/publication/338581229_On_Tour_Harnessing_Social_Tourism_Data_for_City_and_Point_of_Interest_Recommendation/links/5e1dd8e6458515d2b46d3eb6/On-Tour-Harnessing-Social-Tourism-Data-for-City-and-Point-of-Interest-Recommendation.pdf

Modelling Agent Policies with Interpretable Imitation Learning

Published in Trustworthy AI - Integrating Learning, Optimization and Reasoning (also 1st TAILOR Workshop at ECAI 2020), 2020

Introducing I2L and preliminary results.

Recommended citation: Bewley T., Lawry J., Richards A. (2021) Modelling Agent Policies with Interpretable Imitation Learning. In: Heintz F., Milano M., O'Sullivan B. (eds) Trustworthy AI - Integrating Learning, Optimization and Reasoning. TAILOR 2020. Lecture Notes in Computer Science, vol 12641. Springer, Cham. https://link.springer.com/chapter/10.1007%2F978-3-030-73959-1_16

TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments

Published in 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 2020

Introducing a new decision tree model of black box agent behaviour, which jointly captures the policy, value function and temporal dynamics.

Recommended citation: Bewley, Tom and Lawry, Jonathan. "TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments" 35th AAAI Conference on Artificial Intelligence (AAAI 2021). 2021. https://arxiv.org/abs/2009.04743

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

Published in 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022), 2021

An online, active learning algorithm that uses human preferences to construct reward functions with intrinsically interpretable, compositional tree structures.

Recommended citation: Bewley, Tom and Lecue, Freddy. "Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions" Proc. of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022). 2022. http://arxiv.org/abs/2112.11230

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

Published in Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022), 2022

We generalise reward modelling for reinforcement learning to handle non-Markovian rewards, and propose new interpretable multiple instance learning models for this problem.

Recommended citation: Early, Joseph, Tom Bewley, Christine Evers, and Sarvapali Ramchurn. "Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning" Proc. of the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022). 2022. https://arxiv.org/abs/2205.15367

Summarising and Comparing Agent Dynamics with Contrastive Spatiotemporal Abstraction

Published in XAI-IJCAI22 Workshop, 2022

A data-driven, model-agnostic technique for generating a human-interpretable summary of the salient points of contrast within an evolving dynamical system.

Recommended citation: Bewley, Tom and Lawry, Jonathan and Richards, Arthur. "Summarising and Comparing Agent Dynamics with Contrastive Spatiotemporal Abstraction" XAI-IJCAI22 Workshop. 2022. https://arxiv.org/abs/2201.07749

Reward Learning with Trees: Methods and Evaluation

Published in arXiv, 2022

We show that reward learning with tree models can be competitive with neural networks, and demonstrate some of its interpretability benefits.

Recommended citation: Bewley, Tom, Jonathan Lawry, Arthur Richards, Rachel Craddock, and Ian Henderson. "Reward Learning with Trees: Methods and Evaluation" arXiv preprint 2210.01007. 2022. https://arxiv.org/abs/2210.01007

Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

Published in AIAA SciTech Forum, 2023

We show that reward learning with tree models can be competitive with neural networks in an aircraft handling domain, and demonstrate some of its interpretability benefits.

Recommended citation: Bewley, Tom, Jonathan Lawry, and Arthur Richards. "Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback" AIAA SciTech Forum. 2023. https://arxiv.org/abs/2305.16924

Conservative World Models

Published in Seventh Workshop on Generalization in Planning, 2023

We propose methods for improving the performance of zero-shot RL methods when trained on sub-optimal offline datasets.

Recommended citation: Jeen, Scott and Tom Bewley and Jonathan M. Cullen. "Conservative World Models" Seventh Workshop on Generalization in Planning. 2023. https://openreview.net/pdf?id=6oPZcdzFjW

Counterfactual Metarules for Local and Global Recourse

Published in International Conference on Machine Learning (ICML), 2024

A model-agnostic method for learning local and global counterfactual explanations in the form of generalised rules.

Recommended citation: Bewley, Tom, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, and Manuela Veloso. "Counterfactual Metarules for Local and Global Recourse" International Conference on Machine Learning (ICML). 2024. https://arxiv.org/abs/2405.18875

readings

Weekly Readings #1

19 minute read

As it stands I'm precisely 13 days into my PhD, which means a lot of reading, and I thought I'd kick this blog off with a weekly rolling 'diary' of things I read, watch and otherwise consume which may have some influence on my PhD topic. Most of the papers have words pertaining to explanation in there, and that's because I did a massive scrape of papers with that keyword. I figured that would be a reasonable start.

Weekly Readings #2

23 minute read

Approximately three weeks in, I'm starting to work on a case study project that will allow me to explore some of the key ideas around multi-agent explainability – collision avoidance within a population of autonomous vehicles on road / track networks. As a result, more of my reading this week has focused specifically on the multi-agent context.

Weekly Readings #3

8 minute read

This week didn't involve very much reading since I focused instead on my practical investigation of the traffic coordination problem. Nonetheless, I encountered a variety of fascinating ideas.

Weekly Readings #4

11 minute read

Decision trees for state space segmentation; lightweight manual labelling as a 'seed' for interpretability; the dangers of homogeneous distributed control; AI and the climate crisis.

Weekly Readings #5

18 minute read

The theory of why-questions; fidelity versus accuracy; trees and programs as RL policies; partially-interpretable hybrids.

Weekly Readings #6

9 minute read

Modelling other agents; DAGGER; evaluating feature importance visualisations; self, soul and circular ethics.

Weekly Readings #7

8 minute read

Meta learning causal relations; decomposing explanation questions; misleading explanations; the critical influence of metrics.

Weekly Readings #8

17 minute read

State representation learning; emotions and qualitative regions for heuristic explanation; causal reasoning as a middle ground between statistics and mechanics; deep learning and neuroscientific discovery.

Weekly Readings #9

11 minute read

Model extraction; world models and representations; a MAS taxonomy.

Weekly Readings #10

7 minute read

State representation learning in Atari; AI shortcuts and ethical debt; cloning swarms.

Weekly Readings #11

11 minute read

Distillation and cloning; onboard swarm evolution; The Mind's I chapters.

Weekly Readings #12

14 minute read

Goal hierarchies as rule sets; mutual information and auxiliary tasks for representation learning; model-based understanding.

Weekly Readings #13

14 minute read

Theory-of-mind as a general solution; factual and counterfactual explanation; semantic development in neural networks; cloning without action knowledge; intuition pumps.

Weekly Readings #14

15 minute read

Integrating knowledge and machine learning; folk psychology and intentionality; soft decision trees; conceptual spaces.

Weekly Readings #15

12 minute read

Confident execution framework; explananda as differences; online decision tree induction; hybrid AI design patterns.

Weekly Readings #16

11 minute read

Imitation by coaching; GAIL; human-centric vs robot-centric; DeepMimic.

Weekly Readings #17

12 minute read

Using $Q$ for imitation; differentiable decision trees and their application to RL; interactive explanations with Glass-Box.

Weekly Readings #18

22 minute read

Formalising interpretation and explanation; operationally-meaningful representations; Conceptual Spaces book.

Weekly Readings #19

10 minute read

Symbols and cognition; robust AI through hybridisation; causal modelling via RL interventions; environment as an engineered system.

Weekly Readings #20

7 minute read

Rule-based regularisation; DeepSHAP for augmenting GAN training; image schemas as conceptual primitives; imitating DDPG with a fuzzy rule-based system.

Weekly Readings #21

12 minute read

Constraining embeddings with side information; latent actions; RL with abstract representations and models; fuzzy state prototypes; index-free imitation.

Weekly Readings #22

7 minute read

Explanation-based tuning; saliency maps for vision-based policies; RL with differentiable decision trees.

Weekly Readings #23

10 minute read

Explanatory debugging; latent canonicalisations; the perceptual user interface; automatic curriculum learning.

Weekly Readings #24

11 minute read

Trustworthy AI; unifying imitation and policy gradient; soft decision trees; SRL with dimension specialisation.

Weekly Readings #25

13 minute read

Imitation learning using value or reward.

Weekly Readings #26

17 minute read

Terminological quagmires; (mis)interpreting interpolation; interaction is paramount.

talks

Explainable AI for Black Box Autonomous Agents

Talk presenting an overview of my PhD research so far, as part of a one-day conference showcasing student work within the School of Computer Science, Electrical and Electronic Engineering, and Engineering Maths (SCEEM) at the University of Bristol.

teaching

Engineering Design: Design Project 2

Undergraduate course, Faculty of Engineering, University of Bristol, 2019

Technical assistance for the second-year concept generation and MATLAB modelling project.

Engineering Design: Design Project 4

Undergraduate course, Faculty of Engineering, University of Bristol, 2019

Technical assistance for fourth year research and modelling activities, which form the first half of the centrepiece two-year group design project in the latter half of the Engineering Design programme.