    Tom Bewley

    Research Scientist at the J.P. Morgan Explainable AI Center of Excellence. Working to build aligned and interpretable AI.

    • London, UK

    📝 Notes

    Full List

    • A Mathematical Framework for Transformer Circuits
    • Attention
    • Codebook Features - Sparse and Discrete Interpretability for Neural Networks
    • Dictionary Learning
    • Interpreting Neural Networks through the Polytope Lens
    • Linear Representation Hypothesis
    • Mechanistic Interpretability
    • Not All Language Model Features Are Linear
    • Scaling Monosemanticity - Extracting Interpretable Features from Claude 3 Sonnet
    • Toy Models of Superposition
    • Transformer
    © 2024 Tom Bewley.