Talk Intermediate

Scaling Real-Time Upserts in Apache Pinot: Reliability and Efficiency at Any Scale

Review Pending
Session Description

Modern data-driven organizations require analytics platforms that can not only ingest high-velocity data streams but also support rapid updates and corrections on mutable datasets—a challenge that traditional OLAP engines struggle to meet. In this talk, we’ll explore how Apache Pinot delivers industry-leading throughput for upserts without compromising on efficiency or reliability.

We’ll explain how Pinot enables both high-throughput ingestion and low-latency queries without sacrificing correctness, covering the metadata design, segment-level bitmaps, and partitioning strategies that keep ingestion and query overhead minimal. We’ll also address real-world scaling bottlenecks, such as memory constraints and server restarts, along with solutions introduced by the community and StarTree Cloud: off-heap metadata management with RocksDB, minion-based metadata prebuilding, and efficient compaction strategies.
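
To make the metadata and bitmap idea concrete, here is a minimal toy sketch in Python. It is not Pinot's actual code: the class and field names (`ToyUpsertPartition`, `RecordLocation`, `comparison`) are hypothetical, a plain `set` stands in for a real bitmap, and the model assumes that all records for a given primary key arrive in the same partition. Each partition keeps a map from primary key to the location of the latest record, plus a per-segment set of valid document ids that queries would use as a filter.

```python
from dataclasses import dataclass


@dataclass
class RecordLocation:
    segment: str      # segment holding the latest version of the key
    doc_id: int       # row position inside that segment
    comparison: int   # e.g. an event timestamp used to pick the winner


class ToyUpsertPartition:
    """Toy model of one partition's upsert state (illustrative only)."""

    def __init__(self):
        self.key_to_location = {}  # primary key -> RecordLocation
        self.valid_doc_ids = {}    # segment -> set of valid doc ids (stands in for a bitmap)
        self.next_doc_id = {}      # segment -> next row position

    def ingest(self, segment, key, comparison):
        # Every record is appended to its segment, whether or not it wins.
        doc_id = self.next_doc_id.get(segment, 0)
        self.next_doc_id[segment] = doc_id + 1
        valid = self.valid_doc_ids.setdefault(segment, set())

        current = self.key_to_location.get(key)
        if current is not None and comparison < current.comparison:
            # Out-of-order older record: stored, but never marked valid.
            return doc_id
        if current is not None:
            # Invalidate the previous version, possibly in another segment.
            self.valid_doc_ids[current.segment].discard(current.doc_id)
        valid.add(doc_id)
        self.key_to_location[key] = RecordLocation(segment, doc_id, comparison)
        return doc_id

    def valid_docs(self, segment):
        # Queries would read only these rows from the segment.
        return sorted(self.valid_doc_ids.get(segment, set()))
```

In this sketch a newer record for a key clears the old row's entry in its segment's valid set, so a query that filters on the valid set sees exactly one row per key. The memory cost of `key_to_location` growing with the number of keys is the kind of bottleneck that the off-heap approach mentioned above is meant to relieve.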

Key Takeaways
  • Learn the internal architecture that makes upserts in Pinot efficient and reliable

  • Discover practical strategies for scaling upserts to billions of keys per server

  • Understand the features that ensure query accuracy and performance under heavy, concurrent workloads

  • See how StarTree Cloud extends Pinot’s upsert capabilities for massive scale

Session Categories

Technology architecture

Speakers

Krishan Goyal Staff Software Engineer | StarTree

Krishan Goyal is a Staff Software Engineer at StarTree, where he works on scaling and improving Apache Pinot for real-time analytics at massive scale. With over a decade of experience in building and operating large-scale distributed systems, Krishan has led engineering teams at StarTree, LinkedIn, Flipkart, Moonfrog Labs, and Practo. His expertise spans data infrastructure, big data compute, and high-performance analytics. Krishan is passionate about advancing open-source technologies and enjoys sharing practical insights on building reliable, efficient, and scalable data platforms.

Krishan Goyal
https://www.linkedin.com/in/krishan1390/

Reviews

We'd have liked to host this talk, but because the aim of this meetup is to conduct mock presentations for the upcoming IndiaFOSS conference, we'll have to reject it. We look forward to having you as a speaker at a future meetup.

Reviewer #1 Rejected