Talk
Intermediate
First Talk

marimo: DAG-based Reactive Python Notebooks for Reproducible Computing

Notebooks transformed how we write documentation (Markdown), write code and explore data all in one place, but they came with hidden costs. We've all experienced the frustration of a notebook that mysteriously breaks overnight or the pain of trying to decipher someone else's notebook whose cells were run out of order.

Of ~1M Jupyter notebooks on GitHub analyzed in a 2019 study, approximately 24% were runnable and only 4% were reproducible; with the AI/ML boom, we can only assume these numbers have gotten worse. The fundamental issue? Traditional notebooks don't enforce execution order, which leads to hidden-state bugs and execution-order dependencies. On top of that, their JSON-based file format makes version control painful.
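To make the hidden-state problem concrete, here is a minimal sketch that imitates a notebook kernel with a plain dictionary and `exec`. The cell contents and the names `kernel_state`, `run_cell` and `run_fresh` are made up for illustration; this is a toy model, not how Jupyter is implemented.

```python
# A toy "kernel": one shared namespace that outlives any edits to the cells.
kernel_state = {}


def run_cell(source, namespace):
    """Execute one cell's source in the given namespace, like a notebook kernel."""
    exec(source, namespace)


# Session history: the author ran a cell defining `threshold`, then deleted it.
run_cell("threshold = 10", kernel_state)  # this cell is later deleted from the file
run_cell("data = [3, 12, 7, 25]", kernel_state)
run_cell("kept = [v for v in data if v > threshold]", kernel_state)
print(kernel_state["kept"])  # works in the live session: [12, 25]

# What is saved on disk: only the two surviving cells.
saved_cells = [
    "data = [3, 12, 7, 25]",
    "kept = [v for v in data if v > threshold]",
]


def run_fresh(cells):
    """Re-run the saved cells top to bottom in a brand-new namespace."""
    fresh = {}
    try:
        for source in cells:
            run_cell(source, fresh)
    except NameError as err:
        return f"NameError: {err}"
    return "ok"


fresh_result = run_fresh(saved_cells)
print(fresh_result)  # the deleted cell's variable is missing on a fresh run
```

The live session keeps working because `threshold` still lives in memory, while anyone who opens the saved file and runs it from the top hits a `NameError`.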

I'll start by showing real examples of these reproducibility issues that plague traditional notebooks. Then, I'll show how marimo tackles these problems by treating Python notebooks as compiler targets - transforming cell-based code into executable dataflow graphs through static analysis.

marimo creates a dataflow graph where each cell becomes a node and edges represent variable dependencies. The graph is constructed statically from the code rather than by runtime tracing, so when a cell changes, everything that depends on it updates automatically.

![See relevant diagram](Image)

The runtime enforces a simple rule: when a cell runs, all of its descendants (cells that reference its definitions, directly or transitively) are automatically marked for execution. Unlike other reactive platforms that re-run the whole program (yes, Streamlit, we see you), marimo only runs what actually needs updating.
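The rule can be sketched in a few lines, assuming the graph is already available as a mapping from each cell to the cells that reference its definitions. The cell names and the helpers `descendants` and `cells_to_run` are hypothetical; this illustrates the idea and is not marimo's actual runtime code.

```python
from collections import deque

# Hypothetical graph: cell id -> ids of cells that reference its definitions.
children = {
    "load": ["clean"],
    "clean": ["plot", "stats"],
    "plot": [],
    "stats": ["report"],
    "report": [],
    "unrelated": [],
}


def descendants(graph, start):
    """Return every cell reachable from `start` (excluding `start` itself)."""
    seen = set()
    queue = deque(graph[start])
    while queue:
        cell = queue.popleft()
        if cell in seen:
            continue
        seen.add(cell)
        queue.extend(graph[cell])
    return seen


def cells_to_run(graph, changed):
    """The changed cell plus its descendants, ordered so dependencies run first."""
    stale = descendants(graph, changed) | {changed}
    # Kahn's algorithm on the stale subgraph. Every child of a stale cell
    # is itself stale, so no membership checks are needed.
    indegree = {cell: 0 for cell in stale}
    for cell in stale:
        for child in graph[cell]:
            indegree[child] += 1
    ready = deque(sorted(cell for cell in stale if indegree[cell] == 0))
    order = []
    while ready:
        cell = ready.popleft()
        order.append(cell)
        for child in sorted(graph[cell]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order


run_order = cells_to_run(children, "clean")
print(run_order)  # ['clean', 'plot', 'stats', 'report']
```

Editing `clean` leaves `load` and `unrelated` untouched, which is the difference from re-running the whole program.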

marimo also solves another critical reproducibility challenge: package management. With tight uv integration, dependencies get inlined at the top of notebooks, ensuring everyone runs your code with exactly the same package versions. No more "works on my machine" issues.

I'll walk through the technical architecture: how marimo statically analyzes each cell to extract definitions and references, how it wires the dataflow graph based on variables and how the reactive runtime maintains graph topology.
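The two analysis steps can be approximated with Python's standard `ast` module. This is a deliberately simplified sketch, not marimo's implementation: `analyze` and `build_graph` are hypothetical helpers, scoping inside function bodies and comprehensions is ignored, and the sketch assumes each variable is defined by exactly one cell so that every edge has an unambiguous source.

```python
import ast
import builtins


def analyze(source):
    """Return (definitions, references) for one cell's code.

    Simplified: handles assignments, imports and def/class statements,
    and treats names assigned anywhere in the cell as definitions.
    """
    tree = ast.parse(source)
    defs, loads = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else loads).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
    # References are names read by the cell that it does not define itself.
    refs = loads - defs - set(dir(builtins))
    return defs, refs


def build_graph(cells):
    """Map each cell name to the set of cells that reference its definitions."""
    analysis = {name: analyze(source) for name, source in cells.items()}
    definer = {}
    for name, (defs, _) in analysis.items():
        for var in defs:
            if var in definer:
                raise ValueError(
                    f"{var!r} is defined in both {definer[var]!r} and {name!r}"
                )
            definer[var] = name
    edges = {name: set() for name in cells}
    for name, (_, refs) in analysis.items():
        for var in refs:
            if var in definer:
                edges[definer[var]].add(name)
    return edges


# Hypothetical notebook: cell name -> cell source.
cells = {
    "imports": "import math",
    "inputs": "radius = 2.0",
    "area": "area = math.pi * radius ** 2",
    "label": "label = f'area = {area:.2f}'",
}
graph = build_graph(cells)
for name in cells:
    print(name, "->", sorted(graph[name]))
```

Nothing is executed to build the graph: the edges come purely from reading the source, so the result does not depend on the order in which the cells appear or were run.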

But marimo goes beyond what users call "fixing" notebooks - I'll show how it bridges the gap between exploration and production by turning notebooks into interactive web apps with a single click. We'll see examples of embedding these reactive notebooks in browsers for documentation and dashboards. With WebAssembly and Pyodide, these apps run Python directly in browsers without complex server setup.

You'll walk away understanding how static analysis and dataflow graphs can solve the reproducibility crisis that's been plaguing notebooks for years. More importantly, you'll see how thoughtful constraints can actually improve developer experience - sometimes limitations aren't bugs, they're features that make your code more reliable and your workflows more predictable.

Introducing a FOSS project or a new version of a popular project
Technology architecture
