DiscoveryBench: Data-Driven Scientific Discovery with LLM Agents


Session Description

This talk introduces data-driven discovery (ICML 24), focusing on LLM-driven scientific discovery with observational data [1]. I will present our work on DiscoveryBench (ICLR 25), an open-source benchmark designed to systematically evaluate how well LLM agents can generate scientific insights. The benchmark includes 264 real-world discovery tasks across six domains, such as biology, economics, and sociology, each requiring both statistical analysis and semantic reasoning.

The talk will cover the motivation and the agentic methods used to evaluate and improve LLM performance on these tasks. I will also share a preview of our upcoming research [2] and applied takeaways on agentic workflows for the audience. The open-source benchmark has been built upon and cited by leading research groups at Stanford, UC Berkeley, and Microsoft Research.

—

[1] Here, data, rather than literature or theorems, is the primary foundation for hypotheses in scientific discovery.

[2] Joint work with Ai2 Seattle, UMass Amherst, and UWash, among others.

Key Takeaways

Takeaways:

How generative AI is accelerating scientific discovery
Why robust evaluation matters for building reliable agentic AI systems
Applied frameworks for building robust agentic systems that reason at scale
Examples of scientifically vetted insights discovered by our LLM agents
The journey of doing fundamental LLM research, collaborating with global labs, and influencing the broader research direction from India

Intended Audience:

While the material is primarily aimed at ML conferences and foundation research labs, I intend to adjust the level of the talk to the expected audience and their background knowledge.

My broader hope is to spark interest in fundamental LLM/AI research that comes, at least in part, from India. This work is one example, and I would be happy if it helps motivate professionals and students to contribute to core advances in AI.

References

https://github.com/allenai/discoverybench/

https://allenai.org/blog/data-driven-discovery-with-large-generative-models-e1a062e99390

https://arxiv.org/abs/2402.13610

https://x.com/mbodhisattwa/status/1811524569410531333

https://arxiv.org/abs/2407.01725

Session Categories

Introducing a FOSS project or a new version of a popular project

Knowledge Commons (Open Hardware, Open Science, Open Data etc.)

Story of a FOSS project - from inception to growth

Engineering practice - productivity, debugging