# 📝 Notes

## Full List
- A Mathematical Framework for Transformer Circuits
- Attention
- Codebook Features - Sparse and Discrete Interpretability for Neural Networks
- Dictionary Learning
- Interpreting Neural Networks through the Polytope Lens
- Linear Representation Hypothesis
- Mechanistic Interpretability
- Not All Language Model Features Are Linear
- Scaling Monosemanticity - Extracting Interpretable Features from Claude 3 Sonnet
- Toy Models of Superposition
- Transformer