Lightning Talk Intermediate

Engineering Resilient Systems with Chaos Engineering

Review Pending
Session Description

Modern distributed systems are complex, and failures are inevitable. Chaos Engineering is an approach that helps teams proactively test system reliability by deliberately introducing failures and observing how systems respond. This talk will explore the principles behind Chaos Engineering, why it is becoming essential for modern infrastructure, and how organizations can use it to uncover weaknesses, improve resilience, and build confidence in their production environments before real failures occur.

Key Takeaways
  • Learn how to proactively uncover hidden system weaknesses before they cause real production outages

  • Understand how Chaos Engineering helps reduce downtime, improve availability, and increase system reliability

  • Gain knowledge of implementing chaos experiments using open-source tools like LitmusChaos, Chaos Mesh, and Chaos Toolkit

  • Discover how to safely run experiments in production using blast radius control, hypothesis-driven testing, and observability

  • Learn how to build confidence in your systems by validating real-world failure scenarios (pod crashes, network latency, infra failures)
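
The ideas in the takeaways above (a steady-state hypothesis, a capped blast radius, observation, and rollback) can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions: the `Instance` class, the `availability` metric, and the `run_experiment` function are hypothetical names invented here, and the code does not use the API of LitmusChaos, Chaos Mesh, or Chaos Toolkit.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical stand-in for a pod or VM in the system under test.
    name: str
    healthy: bool = True


def availability(instances):
    """Observability stand-in: fraction of instances currently healthy."""
    return sum(i.healthy for i in instances) / len(instances)


def run_experiment(instances, blast_radius, min_availability, seed=0):
    """Run one hypothesis-driven experiment and always roll back."""
    # 1. Verify the steady state before injecting anything.
    if availability(instances) < min_availability:
        return {"status": "aborted", "reason": "steady state not met"}

    # 2. Blast radius control: cap how many instances may be affected.
    max_targets = math.floor(len(instances) * blast_radius)
    targets = random.Random(seed).sample(instances, max_targets)

    try:
        # 3. Inject the failure (here: mark the targets as crashed).
        for inst in targets:
            inst.healthy = False
        # 4. Observe the system while the failure is active.
        observed = availability(instances)
    finally:
        # 5. Roll back, even if the observation step raised an error.
        for inst in targets:
            inst.healthy = True

    return {
        "status": "passed" if observed >= min_availability else "failed",
        "observed": observed,
        "targets": sorted(t.name for t in targets),
    }


fleet = [Instance(f"pod-{n}") for n in range(10)]
result = run_experiment(fleet, blast_radius=0.2, min_availability=0.75)
print(result["status"], result["observed"])
```

In this toy model the hypothesis is "availability stays at or above 75% when 20% of instances crash". Raising `blast_radius` to 0.5 makes the same hypothesis fail, which is the kind of weakness an experiment is meant to surface before a real outage does.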

Session Categories

Knowledge Commons (Open Hardware, Open Science, Open Data etc.)

Speakers

Midhun NS Lead Cloud Security

I'm NS Midhun, currently working as a Lead Cloud Security professional at HID Global. My career has taken me through various engineering disciplines where I've worn multiple hats, from Site Reliability Engineering to Cloud Architecture, DevOps, and DevSecOps. Across these roles, I've had the opportunity to implement various practices, which has given me hands-on experience with how they work in different organizational contexts.
I hold three AWS certifications and one Terraform certification, which reflect my deep involvement with cloud infrastructure and automation technologies. One of my achievements has been developing an observability product that's now available on the AWS Marketplace (https://aws.amazon.com/blogs/apn/enhancing-fact-based-decision-making-using-tech-mahindra-smart-observability-on-aws/).
I'm passionate about sharing knowledge with the tech community. I regularly write blogs and create content on social media platforms like YouTube to help fellow professionals learn new technologies. My work has taken me to different locations for various projects, which has broadened my perspective on how different organizations approach these challenges.
Speaking at events is something I truly enjoy. I've had the privilege of presenting at various internal and external conferences, including CNCF Chennai, Zinnov events, college events, AWS Community Days, and several in-person conferences in India and Canada. I also organize internal Communities of Practice (COP) for DevSecOps, where I help foster knowledge sharing and best practices within my organization.


Reviews

The proposal is very light on details, and tangential to FOSS at best. I would've liked it much more if the speaker mentioned FOSS tools that they might use for chaos engineering in the proposal.

Reviewer #1 Not Sure

A lot of existing talks cover chaos engineering, including at past FOSS United events, and it's not clear how this proposal stands out or improves upon the existing mainstream discourse on chaos engineering.

Reviewer #2 Not Sure