In this talk, I will explain what circuit tracing is in large language models such as Gemma. It's a new way to understand how a model arrives at an answer by looking inside it and checking which internal features (neuron-like units) activate, and when. Anthropic open-sourced a library called circuit-tracer, and the graphs it produces can be explored on Neuronpedia, an open interpretability website that helps us find features linked to real-world concepts like "Texas" or "capital".
I'll show how this works with a live demo using their Jupyter notebook. We'll see what nodes and supernodes are and how they connect to form reasoning paths in the model. This can help us debug models, understand them, and make them safer. We'll also see how FOSS tools can make AI more transparent!
Slides:
https://docs.google.com/presentation/d/1FNd37jW3nB95lko2imfk6A7VGVG0S0H53hUgWocYJ1g/edit?usp=sharing
Circuit tracing helps us understand what’s going on inside AI models.
Instead of guessing how a model produces its answers, we can now see the reasoning path, feature by feature.
Nodes and supernodes are like the Lego blocks of AI thinking.
Each feature (roughly, a neuron) is a node, and a group of related nodes forms a supernode that represents a concept such as "Texas".
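To make the node and supernode idea concrete, here is a minimal Python sketch of the data structure. The `Node` and `Supernode` classes, their fields, and the feature IDs and labels are hypothetical illustrations, not the schema used by circuit-tracer or Neuronpedia. The sketch also assumes each feature has already been assigned a concept name, for example by someone inspecting the graph.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One active feature in a toy attribution graph (hypothetical schema)."""
    node_id: str
    layer: int
    token_pos: int
    label: str  # human-readable description of what the feature responds to


@dataclass
class Supernode:
    """A named group of related nodes that together stand for one concept."""
    name: str
    members: list[Node] = field(default_factory=list)

    def layers(self) -> set[int]:
        """Layers in which this concept's member features appear."""
        return {n.layer for n in self.members}


def group_into_supernodes(nodes: list[Node], concept_of: dict[str, str]) -> dict[str, Supernode]:
    """Group nodes whose IDs map to the same concept name."""
    groups: dict[str, Supernode] = {}
    for node in nodes:
        concept = concept_of.get(node.node_id)
        if concept is None:
            continue  # unlabelled nodes stay ungrouped
        groups.setdefault(concept, Supernode(concept)).members.append(node)
    return groups


# Hypothetical features for a prompt mentioning Dallas.
nodes = [
    Node("f1", layer=4, token_pos=7, label="Texas (state)"),
    Node("f2", layer=6, token_pos=7, label="Texas-related places"),
    Node("f3", layer=10, token_pos=9, label="say a capital city"),
]
concept_of = {"f1": "Texas", "f2": "Texas", "f3": "capital"}
supernodes = group_into_supernodes(nodes, concept_of)
print({name: len(s.members) for name, s in supernodes.items()})  # {'Texas': 2, 'capital': 1}
```

Grouping by an explicit concept map keeps the sketch simple; it says nothing about how a real tool decides which features belong together.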
You can trace which parts of the model fire for each part of the prompt.
For example, you can trace how "Dallas" leads to "Austin" (the capital of Texas) through internal circuits.
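The "Dallas" to "Austin" trace can be pictured as a path search through a weighted graph. The sketch below is a simplified stand-in, not the algorithm circuit-tracer uses: the real tools derive edge weights from the model and keep the most influential parts of the graph, whereas this toy picks the single chain whose edge weights have the largest product. All node names and weights are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical toy attribution graph: (source, target) -> edge weight.
# The weights are invented for illustration, not taken from a real model.
edges = {
    ("tok:Dallas", "Texas"): 0.8,
    ("tok:Dallas", "city"): 0.3,
    ("tok:capital", "say a capital"): 0.7,
    ("Texas", "say Austin"): 0.6,
    ("say a capital", "say Austin"): 0.5,
    ("city", "say Austin"): 0.1,
    ("say Austin", "logit:Austin"): 0.9,
}


def strongest_path(edges, source, target):
    """Return (score, path) for the source->target path whose edge weights
    have the largest product, or None if target is unreachable."""
    preds = {}  # node -> set of direct predecessors
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    best = {source: (1.0, [source])}
    # Visit nodes so that every predecessor is handled before its successors.
    for node in TopologicalSorter(preds).static_order():
        for u in preds[node]:
            if u in best:
                score = best[u][0] * edges[(u, node)]
                if node not in best or score > best[node][0]:
                    best[node] = (score, best[u][1] + [node])
    return best.get(target)


score, path = strongest_path(edges, "tok:Dallas", "logit:Austin")
print(round(score, 3), " -> ".join(path))  # 0.432 tok:Dallas -> Texas -> say Austin -> logit:Austin
```

The "capital" token also feeds "say Austin" in this toy graph, but it is not reachable from "Dallas", which is one reason real attribution graphs are read as a whole rather than as a single path.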
Anthropic’s circuit-tracer and Neuronpedia are powerful open-source tools.
Anyone can use them to experiment and explore model internals.
FOSS makes advanced AI safety research accessible.
Now we can explore, patch, and visualize how LLMs work.
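"Patch" here means intervening on the model: forcing a feature to a chosen value and watching how the output changes. The sketch below shows the idea on a hypothetical, hand-written linear toy model in plain Python. The feature names and weights are invented, and it does not use circuit-tracer's own intervention API.

```python
# Hypothetical toy "model": each feature's activation is a weighted sum of
# upstream activations (linear, no nonlinearity, for simplicity).
weights = {
    "Texas": {"tok:Dallas": 0.8},
    "say a capital": {"tok:capital": 0.7},
    "say Austin": {"Texas": 0.6, "say a capital": 0.5},
    "logit:Austin": {"say Austin": 0.9},
}
order = ["Texas", "say a capital", "say Austin", "logit:Austin"]  # upstream first


def run(inputs, patches=None):
    """Forward pass; `patches` maps feature name -> value to force (an intervention)."""
    patches = patches or {}
    acts = dict(inputs)
    for name in order:
        if name in patches:
            acts[name] = patches[name]  # override instead of computing
        else:
            acts[name] = sum(w * acts.get(src, 0.0) for src, w in weights[name].items())
    return acts


inputs = {"tok:Dallas": 1.0, "tok:capital": 1.0}
baseline = run(inputs)["logit:Austin"]
ablated = run(inputs, patches={"Texas": 0.0})["logit:Austin"]  # turn "Texas" off
print(f"baseline={baseline:.3f} ablated={ablated:.3f}")  # baseline=0.747 ablated=0.315
```

Turning off "Texas" lowers the "Austin" output without removing it, because the "say a capital" route still contributes in this toy.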
This technique helps with safety, debugging, and understanding model failures.
Imagine catching a wrong answer before it happens, just by watching the circuit.
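One way to read "watching the circuit" is as a simple monitor: check that the intermediate concepts you expect (here "Texas" before answering with a capital) actually activated. The function name, the activation values, and the 0.1 threshold below are all hypothetical assumptions for illustration.

```python
def check_reasoning(activations: dict[str, float], expected: list[str],
                    threshold: float = 0.1) -> list[str]:
    """Return the expected intermediate concepts whose activation is below
    `threshold` (a hypothetical cutoff), i.e. steps the model seems to have skipped."""
    return [c for c in expected if activations.get(c, 0.0) < threshold]


# Made-up supernode activations for two hypothetical prompts.
acts_ok = {"Dallas": 0.9, "Texas": 0.7, "say a capital": 0.6}
acts_bad = {"Dallas": 0.9, "Texas": 0.02, "say a capital": 0.6}

for name, acts in [("ok", acts_ok), ("suspicious", acts_bad)]:
    missing = check_reasoning(acts, ["Texas", "say a capital"])
    print(name, "missing:", missing)
# ok missing: []
# suspicious missing: ['Texas']
```

A check like this only flags that an expected step looks absent; deciding whether the answer is actually wrong still needs a human looking at the graph.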
Even students and indie devs can now do deep interpretability research.
You don’t need a lab: just a notebook, model weights, and curiosity.
This is up-and-coming research, and it'd be nice to have a presentation on it.
This sounds interesting; however, given the large number of LLM-related talks being submitted, I'm concerned that other topics will be overshadowed.
Excellent proposal! Would be interested to learn more about circuit tracing.
Thank you for submitting your proposal for IndiaFOSS 2025. Your submission was well-received and progressed to our final review stages.
Unfortunately, due to the high volume of excellent proposals this year, we were unable to select your talk for the final program. We appreciate the effort you put into your submission and encourage you to apply again for future events.