Talk Advanced

Engineering Resilient Systems with Chaos Engineering using Open Source Tools

Approved
Session Description

Modern cloud-native systems are inherently complex, and failures are not a matter of 'if', but 'when'. Chaos Engineering enables teams to proactively identify weaknesses by injecting controlled failures into their systems.

In this talk, I will go beyond theory and explore how Chaos Engineering can be practically implemented using open-source tools such as Chaos Mesh, LitmusChaos, and Chaos Toolkit.

I will cover real-world scenarios such as pod failures, network latency injection, and infrastructure disruptions in Kubernetes environments. I also plan to include a live demo showing how chaos experiments can be safely executed in production-like environments.
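To make the pod-failure scenario concrete, here is a minimal sketch in plain Python, assuming a simulated pool of replicas rather than a real cluster. It does not use the Chaos Mesh or LitmusChaos APIs; the names `pick_victims`, `kill_pods`, and `max_fraction` are hypothetical and only illustrate the idea of capping how many pods an experiment may terminate.

```python
import random


def pick_victims(pods, max_fraction, seed=None):
    """Return the pods to terminate, capped by the allowed fraction."""
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    # Target at least one pod, but never the last surviving replica.
    limit = min(len(pods) - 1, max(1, int(len(pods) * max_fraction)))
    if limit <= 0:
        return []
    # A seeded generator keeps the selection reproducible between runs.
    return random.Random(seed).sample(pods, limit)


def kill_pods(pods, victims):
    """Simulate pod failure by returning only the surviving replicas."""
    doomed = set(victims)
    return [pod for pod in pods if pod not in doomed]
```

For example, with ten simulated replicas and `max_fraction=0.3`, three pods are selected and seven survive. A real experiment would delegate the termination to the chaos tool and keep only the selection limit as a guard.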

Attendees will gain actionable insights into integrating chaos experiments into CI/CD pipelines and adopting a culture of resilience using open-source ecosystems.

Key Takeaways
  • Learn how to proactively uncover hidden system weaknesses before they cause real production outages

  • Understand how Chaos Engineering helps reduce downtime, improve availability, and increase system reliability

  • Gain knowledge of implementing chaos experiments using open-source tools like LitmusChaos, Chaos Mesh, and Chaos Toolkit

  • Discover how to safely run experiments in production using blast radius control, hypothesis-driven testing, and observability

  • Learn how to build confidence in your systems by validating real-world failure scenarios (pod crashes, network latency, infra failures)

  • Understand how to integrate chaos engineering into CI/CD pipelines for continuous resilience testing

  • Walk away with practical strategies to build a culture of resilience across engineering teams
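The hypothesis-driven approach named in the takeaways can be sketched as a small experiment runner. This is an illustrative outline under stated assumptions, not the workflow of any specific tool: `Experiment`, `run_experiment`, and `make_replica_experiment` are hypothetical names, and the "system" is a plain dictionary standing in for a service with replicas.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    name: str
    steady_state: Callable[[], bool]  # probe that encodes the hypothesis
    inject: Callable[[], None]        # introduces the failure
    rollback: Callable[[], None]      # restores the system afterwards


def run_experiment(exp: Experiment) -> str:
    """Return "aborted", "passed", or "failed" for one experiment."""
    # Never inject a failure into a system that is already unhealthy.
    if not exp.steady_state():
        return "aborted"
    try:
        exp.inject()
        held = exp.steady_state()
    finally:
        # Roll back even if the injection or the probe raised an error.
        exp.rollback()
    return "passed" if held else "failed"


def make_replica_experiment(state: dict, required: int) -> Experiment:
    """Build a toy experiment over a dict such as {"replicas": 3}."""
    original = state["replicas"]

    def steady() -> bool:
        return state["replicas"] >= required

    def inject() -> None:
        state["replicas"] -= 1  # simulate losing one replica

    def rollback() -> None:
        state["replicas"] = original

    return Experiment("lose-one-replica", steady, inject, rollback)
```

In a CI/CD pipeline, a step could call `run_experiment` and fail the job whenever the result is not `"passed"`, which turns the resilience hypothesis into a repeatable check.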

Session Categories

Technology architecture
Contributing to FOSS
Engineering practice - productivity, debugging
Technology / FOSS licenses, policy

Speakers

Midhun NS Lead Cloud Security | HID Global

I’m NS Midhun, an AWS Community Builder and a Lead Cloud Security professional. My career has taken me through various engineering disciplines where I've worn multiple hats, from Site Reliability Engineering to Cloud Architecture, DevOps, and DevSecOps. Across these roles, I've implemented a range of practices, which has given me hands-on experience with how they work in different organizational contexts.

I hold three AWS certifications and one Terraform certification, which reflect my deep involvement with cloud infrastructure and automation technologies. One of my achievements has been developing an observability product that's now available on the AWS Marketplace (https://aws.amazon.com/blogs/apn/enhancing-fact-based-decision-making-using-tech-mahindra-smart-observability-on-aws/).

I'm passionate about sharing knowledge with the tech community. I regularly write blogs and create content on social media platforms like YouTube to help fellow professionals learn new technologies. My work has taken me to different locations for various projects, which has broadened my perspective on how different organizations approach these challenges.

Speaking at events is something I truly enjoy. I've had the privilege of presenting at various internal and external conferences, including CNCF Chennai, Zinnov events, college events, AWS Community Days, and several in-person conferences in India and Canada. I also organize internal Communities of Practice (CoP) for DevSecOps, where I help foster knowledge sharing and best practices within my organization.

Midhun NS
https://www.linkedin.com/in/nsmidhun/

Reviews

The proposal is thorough, but Chaos Engineering has been discussed extensively in earlier meetups/conferences, and this proposal isn't adding anything new to mainstream discourse.

Reviewer #1 Not Sure