Talk
Intermediate

DiscoveryBench: Data-Driven Scientific Discovery with LLM Agents


This talk introduces data-driven discovery (ICML 24), focusing on LLM-driven scientific discovery from observational data [1]. I will present our work on DiscoveryBench (ICLR 25), an open-source benchmark designed to systematically evaluate how well LLM agents can generate scientific insights. It comprises 264 real-world discovery tasks across six domains, including biology, economics, and sociology, that require both statistical analysis and semantic reasoning.

The talk will cover the motivation and the agentic methods used to evaluate and improve LLM performance on these tasks. I will also share a preview of our upcoming research [2] and practical takeaways on agentic workflows for the audience. The open-source benchmark has been built upon and cited by leading research groups at Stanford, UC Berkeley, and Microsoft Research.

[1] Here, data is the primary basis for hypotheses in scientific discovery, as opposed to literature or theorems.

[2] Joint work with Ai2 Seattle, UMass Amherst, and UWash, among others.

Takeaways:

  • Learn how generative AI is accelerating scientific discovery

  • Understand the importance of robust evaluation for building reliable agentic AI systems

  • Applied frameworks for building robust agentic systems that reason at scale

  • Examples of scientifically vetted insights discovered by our LLM agents

  • The journey of doing fundamental LLM research, collaborating with global labs, and influencing the broader research direction from India


Intended Audience:

While the material is aimed primarily at ML conferences and foundational research labs, I will adjust the level of the talk to the expected audience and their background knowledge.

My broader hope is to spark interest in fundamental LLM/AI research coming (partially) from India. This work is one example, and I would be happy if it helps motivate professionals and students to contribute to core advances in AI.

Introducing a FOSS project or a new version of a popular project
Knowledge Commons (Open Hardware, Open Science, Open Data etc.)
Story of a FOSS project - from inception to growth
Engineering practice - productivity, debugging
Which track are you applying for?
Main track

Harshit Surana
Visiting Research Scientist, Ai2 Seattle / Founder, OpenLocus
https://www.linkedin.com/in/surana