Talk
Intermediate
First Talk

Reducing AWS Glue bills by 25 times using Airflow!

Review Pending

When your AWS Glue bill hits $10,000/month for 80 ETL pipelines, it's time to think differently. This talk shares the brutal yet rewarding journey of migrating from AWS's proprietary Glue service to Apache Airflow - achieving a staggering 96% cost reduction while maintaining performance.
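As a quick sanity check on the headline numbers, the sketch below shows why a 96% reduction and a "25 times" smaller bill are the same claim. It assumes the 96% applies to the whole $10,000 monthly bill; the resulting monthly figure is derived from that assumption and is not stated in the proposal.

```python
# Figures quoted in the abstract.
glue_bill_usd = 10_000      # monthly AWS Glue bill
reduction_percent = 96      # claimed cost reduction

# Assumption: the reduction applies to the entire monthly bill.
remaining_percent = 100 - reduction_percent
airflow_bill_usd = glue_bill_usd * remaining_percent / 100

# How many times smaller the new bill is.
reduction_factor = glue_bill_usd / airflow_bill_usd

print(f"Implied monthly bill after migration: ${airflow_bill_usd:,.0f}")
print(f"Reduction factor: {reduction_factor:.0f}x")
```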

You will learn how we migrated our ETL workloads to Apache Airflow running on ECS with EC2 instances, with the entire stack provisioned through Terraform. Airflow is a powerful alternative, but clear documentation on setting it up with Terraform and the Celery Executor, especially with cost optimization in mind, is hard to find. In this talk, I'll walk you through the entire process: from configuring the Airflow Webserver, Scheduler, and Workers to integrating Redis and RDS, and even baking DAGs directly into Docker images. I'll share the challenges I encountered and how you can replicate this approach to significantly reduce your AWS Glue expenses.
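To make the wiring between these components concrete, here is a minimal sketch of the environment variables that the Webserver, Scheduler, and Worker containers would typically share in a Celery Executor setup. It is an illustration under assumptions, not the configuration from the talk: it assumes Airflow 2.3 or later (where the metadata database setting lives in the `database` section), assumes the RDS instance runs PostgreSQL, and uses hypothetical host names, user names, and a hypothetical helper called `celery_executor_env`.

```python
from urllib.parse import quote


def celery_executor_env(db_user, db_password, db_host, db_name,
                        redis_host, db_port=5432, redis_port=6379):
    """Build Airflow settings as AIRFLOW__SECTION__KEY environment variables.

    Hypothetical helper: assumes a PostgreSQL metadata database (for example
    on RDS) and Redis as the Celery message broker.
    """
    # Percent-encode the password so characters like "@" do not break the URL.
    password = quote(db_password, safe="")
    postgres = f"{db_user}:{password}@{db_host}:{db_port}/{db_name}"
    return {
        # Run tasks on Celery workers instead of inside the scheduler.
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # Metadata database used by the webserver, scheduler, and workers.
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": f"postgresql+psycopg2://{postgres}",
        # Redis queue through which the scheduler hands tasks to workers.
        "AIRFLOW__CELERY__BROKER_URL": f"redis://{redis_host}:{redis_port}/0",
        # Celery stores task state in the same database.
        "AIRFLOW__CELERY__RESULT_BACKEND": f"db+postgresql://{postgres}",
    }


# Placeholder values only; real endpoints would come from the provisioned infrastructure.
env = celery_executor_env(
    db_user="airflow",
    db_password="p@ss",
    db_host="db.example.internal",
    db_name="airflow",
    redis_host="redis.example.internal",
)
for name, value in env.items():
    print(f"{name}={value}")
```

In a Terraform-managed deployment, values like these would usually be injected into each ECS task definition, with the password supplied from a secrets store rather than written in plain text.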

  • Slash AWS Glue Costs: Learn how to cut AWS Glue expenses by a massive 96%.

  • Identify Glue's Drawbacks: Understand why Glue's serverless nature often leads to high, unsustainable costs.

  • Migrate to Airflow: Discover Apache Airflow as a cost-effective and powerful ETL orchestration alternative.

  • Leverage Terraform: See how Terraform automates the entire Airflow infrastructure setup on AWS.

  • Overcome Setup Hurdles: Get practical tips on tackling common challenges like Redis integration and DAG deployment.

Tutorial about using a FOSS project
Technology architecture
Engineering practice - productivity, debugging
Which track are you applying for?
Open Data Devroom

Approvability: 0 %
Approvals: 0
Rejections: 1
Not Sure: 0

Sorry, this talk does not fit our devroom. Our CFP mentions that talks about Big Data tooling are out of scope: https://fossunited.org/indiafoss/2025/devrooms/data

The devroom is focused on open-data and open-data tooling.

Reviewer #1
Rejected