Talk
Beginner

Breaking into the Black Box: Making LLMs Transparent for Science

Review Pending

What if we could peer inside the black box of LLMs and understand exactly how they reason through scientific problems? This session aims to reframe LLMs from mysterious neural networks into interpretable, debuggable systems, using entirely open-source tools.

We'll explore how the open architecture of models like DeepSeek and Evo-2, unlike their closed-source alternatives, allows us to trace data flow, examine attention patterns, and understand decision-making processes at a granular level. We'll see how DeepSeek's reasoning pathways derive equations and how Evo-2's genomic knowledge can be interpreted to reveal cross-species correlations!
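
To make that tracing concrete, here is a minimal sketch of pulling attention patterns out of a transformer with TransformerLens. It loads GPT-2 small purely as a stand-in (support for specific DeepSeek or Evo-2 checkpoints depends on TransformerLens's model registry), and the prompt is illustrative:

    # Minimal sketch: inspecting attention patterns with TransformerLens.
    # GPT-2 small is used as a stand-in; whether a given DeepSeek or Evo-2
    # checkpoint can be loaded depends on TransformerLens's supported models.
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained("gpt2")

    prompt = "The derivative of x^2 with respect to x is"
    logits, cache = model.run_with_cache(prompt)

    # Attention patterns for layer 0: [batch, head, query_pos, key_pos].
    attn = cache["pattern", 0]
    print(attn.shape)

    # For head 0, which earlier token does the final position attend to most?
    tokens = model.to_str_tokens(prompt)
    top_key = attn[0, 0, -1].argmax().item()
    print(tokens[top_key])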

We'll introduce FOSS evaluation frameworks like Promptfoo and Comet's Opik to systematically audit model performance, along with mechanistic interpretability tools like TransformerLens and Prisma to visualize internal representations and understand how these models process scientific concepts, from protein folding predictions to mathematical theorem proving.
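
As a taste of the evaluation side, below is a minimal sketch of scoring a model answer with one of Opik's built-in heuristic metrics. The metric and its arguments follow Opik's Python SDK; treat the exact signature, the prompt, and the expected substring as illustrative assumptions to check against the current docs:

    # Minimal sketch: scoring an LLM answer with an Opik heuristic metric.
    # Contains() checks whether a reference substring appears in the output;
    # the prompt and expected substring here are illustrative assumptions.
    from opik.evaluation.metrics import Contains

    metric = Contains(case_sensitive=False)

    # A hypothetical model answer to a physics prompt.
    model_output = "Kinetic energy is given by E = (1/2) * m * v^2."

    # Score the answer: expected to yield 1.0 if the reference is found.
    result = metric.score(output=model_output, reference="1/2")
    print(result.value)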

This session bridges the gap between AI transparency and scientific discovery, showing how open-source interpretability tools can make LLMs accountable in research.

  • Discover how open-source LLMs provide unprecedented transparency compared to closed alternatives, enabling scientific validation of AI reasoning

  • Put LLM internals under the microscope with TransformerLens and Prisma

  • Use FOSS evaluation tools like Promptfoo and Opik to systematically assess LLM performance in research

  • Explore real scientific applications through models like Evo-2 for genomics and DeepSeek's breakthrough reasoning models

Technology architecture
Which track are you applying for?
FOSS in Science Devroom

Approvability: 100%
Approvals: 2
Rejections: 0
Not Sure: 0
Reviewer #1
Approved

The proposal is unique and relevant, but it would have been better if the proposer had provided a talk outline. The proposal outlines the models and model evaluation frameworks, so it'll be useful to understand if equal time will be spent covering these two topics. Personally, speaking about understanding and using models might be more useful for the audience than model evaluation.

The proposer provided two examples - derive equations and cross-species correlation - and it'll be good to know if these are the motivating examples that the talk will be based around or if other examples will also be introduced.

Reviewer #2
Approved