Talk Advanced

XAI - Tracing Hooks for Peek inside Transformers

Rejected
Session Description

In this talk, I will discuss the problem of building explainable AI (XAI) and compare two approaches to it: causal and correlational.

I will talk about what mechanistic interpretability ("mech interp") is in large language models like Gemma. It is a way to understand how models answer questions by looking inside them and checking which neurons activate, and when.

I will discuss circuit-tracer, a Python module open-sourced by Anthropic, and the Neuronpedia portal, which helps us find neurons linked to real-world concepts. We will examine specific prompts on transformers and follow the various paths and "thoughts" that lead the model to its output. (I find this very interesting.)

I will also talk about my own work on mech interp tooling (modelrecon): the "activation cube" data structure (not a standard; I came up with it) as a means to share and visualize activation data, and the "counterfactual" library that I am working on to implement intervention testing correctly.

I will also cover the basic problem of causal vs. correlational techniques and the limitations of correlational ones.
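A minimal sketch of why correlational evidence can mislead, assuming a hand-written two-unit toy model (not a real transformer, and not the modelrecon "counterfactual" library). The names `toy_model`, `pearson`, and `ablation_effect` are hypothetical. Both hidden units correlate perfectly with the output, but only one of them actually feeds it, and only an intervention (zeroing a unit) reveals that.

```python
import random

def toy_model(x, ablate=None):
    """Toy 'network': h_a and h_b both track the input, but only h_b feeds the output.

    `ablate` optionally names a hidden unit to zero out (an intervention).
    """
    hidden = {"h_a": 2.0 * x, "h_b": 3.0 * x}
    if ablate is not None:
        hidden[ablate] = 0.0
    output = hidden["h_b"]  # h_a is never read
    return hidden, output

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5

def ablation_effect(inputs, unit):
    """Mean absolute change in the output when `unit` is zeroed."""
    total = 0.0
    for x in inputs:
        _, clean = toy_model(x)
        _, ablated = toy_model(x, ablate=unit)
        total += abs(clean - ablated)
    return total / len(inputs)

random.seed(0)
inputs = [random.uniform(-1.0, 1.0) for _ in range(200)]
runs = [toy_model(x) for x in inputs]
outputs = [out for _, out in runs]

# Correlational view: how closely does each unit track the output?
correlation = {u: pearson([h[u] for h, _ in runs], outputs) for u in ("h_a", "h_b")}
# Causal view: how much does the output move when the unit is removed?
effect = {u: ablation_effect(inputs, u) for u in ("h_a", "h_b")}

print("correlation:", correlation)
print("ablation effect:", effect)
```

In this sketch `h_a` looks just as important as `h_b` by correlation, yet ablating it changes nothing, which is the kind of gap intervention testing is meant to expose.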

## Why Explainability Matters 2-3 mins

We need to understand why AI models make certain choices, not just what answers they give. Without this, the model feels like a black box. Here I will include an example from human behavior.

---

## What Transformers Hide 2-3 mins

I will talk about the basic internal steps of a transformer and the features that are hard to see, highlighting that tools only show the final output, not the thinking process. In fact, I will point out that it comes as a surprise to most people that we don't know how models "actually" arrive at specific answers.

---

## How Circuit Tracer Helps 3 - 5 mins

I will talk about how Anthropic’s Circuit Tracer shows the inside connections of the model.

It turns hidden activations into easy-to-understand features and shows how they link together. It is not that easy, but we can get used to the graphs (like Link in the Matrix movies, who could understand what was happening just by looking at the Matrix runtime code). I will show some graphs and walk through the reasoning path on Colab.

---

## Seeing the Reasoning Path 10 minutes or more

The tool draws a clear path from

input → inner features → final output

This lets everyone see which parts of the model caused the answer. This should be fun, as the paths a model takes are sometimes weird.
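To make the idea of a path concrete, here is a toy sketch that treats the graph as a small weighted DAG and picks the strongest route from one input token to the output. This is not circuit-tracer's API or output: the node names, the edge weights, and the helper `strongest_path` are all made up for illustration, and scoring a path by the product of its edge weights is an assumption of this sketch.

```python
# Hypothetical attribution graph. Edge weights are invented numbers,
# not values produced by any tool.
EDGES = {
    "token:Dallas": {"feature:Texas": 0.9, "feature:city": 0.4},
    "token:capital": {"feature:say-a-capital": 0.8},
    "feature:Texas": {"logit:Austin": 0.7},
    "feature:city": {"logit:Austin": 0.1},
    "feature:say-a-capital": {"logit:Austin": 0.6},
}

def strongest_path(graph, start, target):
    """Return (score, path) maximizing the product of edge weights.

    Assumes the graph is acyclic. Returns (0.0, []) if target is unreachable.
    """
    if start == target:
        return 1.0, [start]
    best_score, best_path = 0.0, []
    for nxt, weight in graph.get(start, {}).items():
        score, path = strongest_path(graph, nxt, target)
        if path and weight * score > best_score:
            best_score, best_path = weight * score, [start] + path
    return best_score, best_path

score, path = strongest_path(EDGES, "token:Dallas", "logit:Austin")
print(" -> ".join(path), round(score, 2))
```

With these invented weights the route through `feature:Texas` wins over the route through `feature:city`, mirroring the input → inner features → final output reading of the graph.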

---

## Why This Is Important 2 minutes

With this method, we can:

* check if the model is behaving safely

* fix mistakes inside the model

* build trust by seeing how it thinks

I will finish with a description of some of my work on activation cube data, the PyTorch hook mechanism, and other options for logging data.
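As a rough sketch of the hook mechanism, the snippet below uses PyTorch's `register_forward_hook` to record the output of each linear layer in a small stand-in model and stacks the results into one tensor. The (layer, token, neuron) layout is only a guess at what an "activation cube" could look like, not the modelrecon format; the model, the names `captured` and `make_hook`, and the sizes are all assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model: same-width layers so the activations stack cleanly.
hidden = 8
model = nn.Sequential(
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden),
)

captured = {}  # layer index -> activation tensor of shape (tokens, hidden)

def make_hook(index):
    # A forward hook receives (module, inputs, output) after each forward call.
    def hook(module, inputs, output):
        captured[index] = output.detach()
    return hook

linear_layers = [m for m in model if isinstance(m, nn.Linear)]
handles = [
    layer.register_forward_hook(make_hook(i))
    for i, layer in enumerate(linear_layers)
]

tokens = torch.randn(5, hidden)  # 5 random vectors standing in for token embeddings
with torch.no_grad():
    model(tokens)

# Remove the hooks so later forward passes are not logged.
for handle in handles:
    handle.remove()

# Assumed cube layout: (layer, token, neuron).
cube = torch.stack([captured[i] for i in range(len(linear_layers))])
print(cube.shape)
```

A forward hook may also return a replacement output, which is one way an intervention such as zeroing a neuron could be applied during the forward pass.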

Key Takeaways
  • The idea of explainability in AI

  • Tools and techniques available

  • Current trends in XAI

  • Open source tooling available

My Slides:

https://docs.google.com/presentation/d/1FNd37jW3nB95lko2imfk6A7VGVG0S0H53hUgWocYJ1g/edit?usp=sharing

Code:
https://github.com/modelrecon

Session Categories

Tutorial about using a FOSS project
Introducing a FOSS project or a new version of a popular project
Technology architecture

Speakers

Viraj Sharma Student | Presidium School Indirapuram

I am a passionate technologist with a strong interest in Python, artificial intelligence, and edge computing. I am currently studying in Class 9 at Presidium School, Delhi, India. I have worked in areas such as torch.nn visualization, Anthropic technologies (MCP, Skills), large concept models, TensorCore/CUDA benchmarks, and edge AI on a Raspberry Pi running small models with sensors. Recently I have been working on XAI (AI explainability) and publishing my work as a project on my AI lab, modelrecon.com. As an active member of the Python and AI communities, I enjoy learning from experienced developers and sharing my insights with others. I attend major tech events including PyCons, Linux Fests, OS Summits, GDG events, p99conf, and various AI conferences, where I actively present my projects and ideas.

Viraj Sharma
https://sharmaviraj.com

Reviews

I'm not sure how this fits at a FOSS conference

Reviewer #1 Rejected