Talk Intermediate First Talk

Key Observability Takeaways from the SRE handbook that every developer should know

Rejected

Session Description

In this talk, we will explore the concepts of observability and availability in modern production applications. We'll look at how meaningful signals (metrics, logs, and traces) can speed up incident debugging and help teams make proactive, data-driven infrastructure decisions. Attendees will walk away with a practical understanding of how to build resilient systems by investing in the right observability strategies.

Key Takeaways

Understanding Observability & Availability: Gain a clear understanding of what observability and availability mean in the context of production systems, and why they are critical for application reliability.
SLIs, SLOs, and SLAs Demystified: Learn how Service Level Indicators, Objectives, and Agreements connect the dots between technical metrics and user experience, and how they drive business-aligned reliability goals.
Beyond Metrics & Alerts: Discover why relying solely on metrics and alerts is not enough, and how integrating structured logging, distributed tracing, and effective monitoring practices can significantly reduce both mean time to detect (MTTD) and mean time to resolve (MTTR) for incidents.
The “Three Pillars” of Observability: Understand how metrics, logs, and traces each serve a different purpose, and how they work together for complete system visibility.
Tooling Ecosystem: Get a brief introduction to popular FOSS tools (e.g., Prometheus).
Proactive vs. Reactive Monitoring: See how observability helps identify patterns or anomalies before they turn into full-blown outages.
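
As a rough illustration of how an SLI, an SLO, and the resulting error budget relate, here is a minimal Python sketch. The names `evaluate_slo` and `SloReport`, and the request counts in the usage example, are hypothetical and chosen only for illustration; they do not come from the SRE book or from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of successful requests
    allowed_failures: float  # failures the SLO tolerates over this window
    budget_remaining: float  # fraction of the error budget left (negative if overspent)


def evaluate_slo(total_requests: int, failed_requests: int, slo_target: float) -> SloReport:
    """Compare a request-based availability SLI against an SLO target.

    Assumes a simple "good events / total events" SLI over one window.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    # SLI: the fraction of requests that succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the number of failures the SLO permits in this window.
    allowed_failures = (1 - slo_target) * total_requests
    # Share of the budget that has not been spent yet.
    budget_remaining = 1 - failed_requests / allowed_failures
    return SloReport(sli, allowed_failures, budget_remaining)


if __name__ == "__main__":
    # Hypothetical window: one million requests, 250 failures, 99.9% SLO.
    report = evaluate_slo(1_000_000, 250, 0.999)
    print(f"SLI: {report.sli:.5f}")
    print(f"Allowed failures: {report.allowed_failures:.0f}")
    print(f"Error budget remaining: {report.budget_remaining:.0%}")
```

Under these assumptions, a negative `budget_remaining` would indicate that the window has already violated its SLO, which is the kind of signal a team might use to prioritize reliability work over new features.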

References

https://sre.google/sre-book/table-of-contents/

https://www.brendangregg.com/usemethod.html

https://github.com/prometheus/prometheus

Session Categories

Technology architecture

Engineering practice - productivity, debugging

Knowledge Commons (Open Hardware, Open Science, Open Data etc.)

Which track are you applying for?

Main track

Speakers

Sankararaman Software Engineer | Nielsen

I am a backend engineer and systems reliability enthusiast with deep experience in building and scaling high-availability infrastructure. I have worked extensively in observability, monitoring, and incident response—helping teams move from reactive firefighting to proactive system design. My work focuses on creating robust observability pipelines, crafting actionable SLIs/SLOs, and reducing mean time to recovery in complex, distributed environments.

Having worked on mission-critical systems serving millions of users, I bring a practitioner’s perspective to designing production-ready platforms. I am passionate about making infrastructure topics more approachable, and regularly share insights on bridging the gap between raw telemetry and real business impact.

Reviews

This seems like a generic talk.

Reviewer #1 Rejected

I don't see anything novel here that is not already available online.

Reviewer #2 Rejected

More about concepts which are available online than a FOSS tool

Reviewer #3 Rejected

The topic was too generic, and the content is readily available online. Our conference prioritizes talks that present novel ideas, specific experiences with FOSS tools, or a unique contribution to the FOSS community. We encourage you to resubmit a proposal in the future that focuses on a more specific FOSS project or your personal experience contributing to it.

Reviewer #4 Rejected