Lightning Talk Beginner

Breaking into the Black Box: Making LLMs Transparent for Science

Approved
Session Description

What if we could peer inside the black box of LLMs and understand how they reason through scientific problems? This session aims to change the perception of LLMs from mysterious neural networks to interpretable, debuggable systems, using entirely open-source tools.

Unlike closed-source alternatives, models like DeepSeek and Evo-2 have an open architecture that lets us trace data flow, examine attention patterns, and understand decision-making processes at a granular level. We'll see how DeepSeek's reasoning pathways derive equations and how Evo-2's genomic knowledge can be interpreted to reveal cross-species correlations.
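
As a rough illustration of what "examining attention patterns" means, here is a minimal pure-Python sketch. It does not use TransformerLens, Prisma, or any model named in this session; the toy vectors and the function names `softmax` and `causal_attention_pattern` are hypothetical and chosen only for this example. It computes scaled dot-product attention weights with causal masking, which is the kind of per-token weight matrix that interpretability tools typically let you inspect.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_pattern(queries, keys):
    """Return one row of attention weights per query position.

    Position i may only attend to positions 0..i (causal masking),
    as in decoder-only language models.
    """
    d_head = len(keys[0])
    pattern = []
    for i, q in enumerate(queries):
        # Scaled dot product between this query and every visible key.
        scores = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d_head)
            for k in keys[: i + 1]
        ]
        weights = softmax(scores)
        # Pad masked (future) positions with zero weight.
        pattern.append(weights + [0.0] * (len(keys) - i - 1))
    return pattern

# Hypothetical toy vectors for three token positions (not from a real model).
toy_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

pattern = causal_attention_pattern(toy_queries, toy_keys)
for row in pattern:
    print(["%.2f" % w for w in row])
```

Each printed row shows how much one token position attends to itself and to earlier positions. In a real model these weights would come from learned query and key projections rather than hand-written vectors.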

We'll introduce FOSS evaluation frameworks like Promptfoo and Comet's Opik for systematically auditing model performance, along with mechanistic interpretability tools like TransformerLens and Prisma for visualizing internal representations. Together, they help us understand how these models process scientific concepts, from protein folding predictions to mathematical theorem proving.

This session bridges the gap between AI transparency and scientific discovery, showing how open-source interpretability tools can make LLMs accountable when they are used in research.

Key Takeaways
  • Discover how open-source LLMs provide unprecedented transparency compared to closed alternatives, enabling scientific validation of AI reasoning

  • Look directly inside LLMs with TransformerLens and Prisma

  • Understand how FOSS evaluation tools like Promptfoo and Opik can systematically assess LLM performance in research

  • Explore real scientific applications through models like Evo-2 for genomics and DeepSeek's breakthrough reasoning mode

References

Session Categories

Technology architecture
Which track are you applying for?
FOSS in Science Devroom

Speakers

Alosh Denny
AI Engineer Discern Security
https://linkedin.com/in/aloshdenny
I am an AGI developer, hobbyist, and NVIDIA Certified Associate in AI Infra & Ops. My expertise ranges from embedded IoT and robotics to generative AI. I placed 11th worldwide in the 2023 and 2024 European Rover Challenge in Kielce, Poland.

With a track record of participating in over 30 hackathons and reaching the finals in more than 10 national-level events, I love working on new and innovative technology. As the CEO and CTO of two startups and a mentor at more than 10 hackathons, I have made it my mission to lead and follow.

Recently, I have expanded into the medical and healthcare field, connecting domain experts such as doctors, clinicians, and nurse practitioners with AI to revolutionize RPM and EHRs. I am also a public speaker who closely follows open-source AI development, covering LLMs, application building, and deployment.

If you ever feel like you've got a bright idea up your sleeve, let's sit and discuss over a cup of coffee :)

Reviews

Reviewer #1 Approved

The proposal is unique and relevant, but it would have been better if the proposer had provided a talk outline. The proposal outlines the models and model evaluation frameworks, so it'll be useful to understand if equal time will be spent covering these two topics. Personally, speaking about understanding and using models might be more useful for the audience than model evaluation.

The proposer provided two examples - deriving equations and cross-species correlations - and it'll be good to know if these are the motivating examples that the talk will be based around or if other examples will also be introduced.

Reviewer #2 Approved