Circuit Tracing in Transformers: Peeking Inside the Black Box

Withdrawn

Session Description

In this talk, I will explain what circuit tracing is in large language models like Gemma. It's a new way to understand how a model answers a question by looking inside it and checking which neurons activate, and when. Anthropic open-sourced a library called circuit-tracer, and the Neuronpedia website hosts an interactive frontend for exploring the results, which helps us find neurons linked to real-world concepts like "Texas" or "capital".

I'll show how this works with a live demo from their Jupyter notebook. We'll see what nodes and supernodes are, and how they connect to form reasoning paths in the model. This can help us debug, understand, and make models safer. We will be able to see how AI can be more transparent with the help of FOSS tools!

Slides:
https://docs.google.com/presentation/d/1FNd37jW3nB95lko2imfk6A7VGVG0S0H53hUgWocYJ1g/edit?usp=sharing

Key Takeaways

Circuit tracing helps us understand what’s going on inside AI models.
Instead of guessing how a model gives answers, we can now see the reasoning path neuron by neuron.
Nodes and supernodes are like the Lego blocks of AI thinking.
A single unit (a neuron or learned feature) is a node, and a group of related ones forms a supernode that stands for a concept such as "Texas".
You can trace which parts of the model fire for each part of the prompt.
For example, you can trace how "Dallas" leads to "Austin" through internal circuits.
circuit-tracer, released by Anthropic, and Neuronpedia are powerful open source tools.
Anyone can use them to experiment and explore model internals.
FOSS makes advanced AI safety research accessible.
Now we can explore, patch, and visualize how LLMs work.
This technique helps with safety, debugging, and understanding model failures.
Imagine catching a wrong answer before it happens — by watching the circuit.
Even students and indie devs can now do deep interpretability research.
You don’t need a lab — just a notebook, model weights, and curiosity.
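
To make the node and supernode idea concrete, here is a minimal toy sketch in plain Python. It does not use the circuit-tracer API, and every node name and edge weight below is invented for illustration; real attribution graphs are computed from the model itself. The sketch groups nodes into supernodes and then finds the strongest path from the "Dallas" token to the "Austin" output, scoring a path by the product of its edge weights (a simplifying assumption, not necessarily how the library measures influence).

```python
from collections import defaultdict

# Hypothetical attribution graph: (source, target) -> invented edge weight.
EDGES = {
    ("tok:Dallas", "feat:texas_a"): 0.6,
    ("tok:Dallas", "feat:texas_b"): 0.3,
    ("tok:capital", "feat:capital_a"): 0.7,
    ("feat:texas_a", "feat:say_austin"): 0.5,
    ("feat:texas_b", "feat:say_austin"): 0.2,
    ("feat:capital_a", "feat:say_austin"): 0.4,
    ("feat:say_austin", "logit:Austin"): 0.9,
}

# Hypothetical grouping of individual nodes into named supernodes.
SUPERNODES = {
    "feat:texas_a": "Texas",
    "feat:texas_b": "Texas",
    "feat:capital_a": "capital",
    "feat:say_austin": "say Austin",
}


def collapse(edges, groups):
    """Merge nodes into supernodes, summing the weights of parallel edges."""
    merged = defaultdict(float)
    for (src, dst), weight in edges.items():
        s = groups.get(src, src)
        d = groups.get(dst, dst)
        if s != d:  # drop edges that end up inside one supernode
            merged[(s, d)] += weight
    return dict(merged)


def strongest_path(edges, start, goal):
    """Return (score, path) with the largest product of weights in a DAG."""
    children = defaultdict(list)
    for (src, dst), weight in edges.items():
        children[src].append((dst, weight))

    def visit(node):
        if node == goal:
            return 1.0, [node]
        best = (0.0, [])
        for nxt, weight in children[node]:
            score, path = visit(nxt)
            if path and weight * score > best[0]:
                best = (weight * score, [node] + path)
        return best

    return visit(start)


if __name__ == "__main__":
    graph = collapse(EDGES, SUPERNODES)
    score, path = strongest_path(graph, "tok:Dallas", "logit:Austin")
    print(" -> ".join(path), f"(score {score:.3f})")
```

Collapsing first keeps the picture readable: two separate "Texas" nodes become one supernode, so the traced path reads as Dallas, Texas, say Austin, Austin rather than a tangle of individual units.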

References

https://www.anthropic.com/research/open-source-circuit-tracing

https://colab.research.google.com/github/safety-research/circuit-tracer/blob/main/demos/circuit_tracing_tutorial.ipynb

Session Categories

Tutorial about using a FOSS project

Which track are you applying for?

Main track

Speakers

Viraj Sharma

Student Presidium Indirapuram Delhi

https://sharmaviraj.com

Reviews

Approvability: 66 %

This is up-and-coming research, and it'd be nice to have a presentation on it.

Reviewer #1

Approved

This sounds interesting, however I'm concerned because of the large number of LLM-related talks being submitted that other topics will be overshadowed.

Reviewer #2

Not Sure

Excellent proposal! Would be interested to learn more about circuit tracing.

Reviewer #3

Approved

Thank you for submitting your proposal for IndiaFOSS 2025. Your submission was well-received and progressed to our final review stages.

Unfortunately, due to the high volume of excellent proposals this year, we were unable to select your talk for the final program. We appreciate the effort you put into your submission and encourage you to apply again for future events.

Reviewer #4

Rejected