In this talk, I’ll walk through building a self-healing system for Kubernetes using Prometheus and Keptn — both powerful CNCF-hosted FOSS projects. We’ll begin by defining Service Level Objectives (SLOs), then use Prometheus to track metrics, and finally automate incident response using Keptn’s remediation workflows.
This isn’t just theory — the talk includes practical guidance, architecture diagrams, and a live or recorded demo showing how to integrate these tools for real-world resiliency. Whether you're a contributor, a DevOps practitioner, or someone exploring cloud-native tooling, you'll walk away with a blueprint for building production-grade, SLO-driven automation with FOSS projects.
Along the way, I’ll share my learnings from working with these tools — how I reported issues, contributed feedback, and helped other developers understand Keptn's remediation workflows.
What SLOs are and why they matter in Kubernetes environments
How to use Prometheus to track SLOs and alert on violations
Automating remediation using Keptn’s FOSS workflows
Architectural overview of an SLO-driven system
How contributing back (even non-code!) improves the FOSS ecosystem
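The SLO check behind the takeaways above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the talk's actual setup: the `Slo` class, the function names, and the `checkout-availability` example are hypothetical, and the request counts are hard-coded where a real system would read them from Prometheus and leave the remediation decision to Keptn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical availability SLO: `target` is the required success ratio."""
    name: str
    target: float  # e.g. 0.99 means 99% of requests must succeed

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def sli(good: int, total: int) -> float:
    """Measured success ratio; a window with no traffic counts as compliant."""
    return 1.0 if total == 0 else good / total


def budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total == 0:
        return 1.0
    allowed_failures = (1.0 - slo.target) * total
    actual_failures = total - good
    return 1.0 - actual_failures / allowed_failures


def should_remediate(slo: Slo, good: int, total: int) -> bool:
    """True when the measured SLI falls below the SLO target."""
    return sli(good, total) < slo.target


if __name__ == "__main__":
    # Assumed sample window: 1000 requests, 980 of them successful.
    slo = Slo(name="checkout-availability", target=0.99)
    print(should_remediate(slo, good=980, total=1000))
    print(budget_remaining(slo, good=980, total=1000))
```

In a real pipeline the counts would come from a Prometheus query over a time window, and a violation would start a remediation sequence instead of printing a boolean.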
This is a k8s talk about implementing existing projects, not contributing back. While this is likely insightful to k8s professionals, it does not directly support the main goal of our organization: to promote and strengthen the FOSS ecosystem of India. Please try to propose this talk at a local city chapter meetup.
While the reviewers found the topic to be insightful for Kubernetes professionals, a key concern was that the talk focuses on implementing existing open-source projects rather than contributing back to them. The feedback noted that this approach does not directly support the conference's goal of promoting and strengthening the FOSS ecosystem in India.