This talk introduces data-driven discovery (ICML 24), focusing on LLM-driven scientific discovery from observational data [1]. I will present our work on DiscoveryBench (ICLR 25), an open-source benchmark designed to systematically evaluate how well LLM agents can generate scientific insights. The benchmark comprises 264 real-world discovery tasks across six domains, including biology, economics, and sociology, that require both statistical analysis and semantic reasoning.
The talk will cover the motivation and the agentic methods used to evaluate and improve LLM performance on these tasks. I will also share a preview of our upcoming research [2] and practical takeaways on agentic workflows for the audience. The open-source benchmark has been built upon and cited by leading research groups at Stanford, UC Berkeley, and Microsoft Research.
—
[1] Here, data is the primary foundation for generating hypotheses in scientific discovery, as opposed to literature or theorems.
[2] Joint work with Ai2 Seattle, UMass Amherst, and UWash, among others.
Takeaways:
Learn how generative AI is accelerating scientific discovery
Understand the importance of robust evaluation for building reliable agentic AI systems
Applied frameworks for building robust agentic systems that reason at scale
Examples of scientifically vetted insights discovered by our LLM agents
The journey of doing fundamental LLM research, collaborating with global labs, and influencing the broader research direction from India
Intended Audience:
Although the material is primarily aimed at ML conferences and foundation research labs, I will adjust the level of the talk to the expected audience and their background knowledge.
My broader hope is to spark interest in fundamental LLM/AI research coming (at least partially) from India. This work is one example, and I would be happy if it helps motivate professionals and students to contribute to core advances in AI.