Talk
Intermediate

Building a Real-Time Fraud Detection System at Massive Scale Using FOSS

Withdrawn

Session Description

  • This talk is a practical case study of designing a real-time fraud detection system from scratch using a FOSS-first stack including Python, Kafka, Redis, PostgreSQL, and open source ML tooling.

  • The session focuses on a real enterprise problem: how to inspect transaction streams at very high scale, generate meaningful fraud signals in real time, and take action with low latency and high explainability.

  • I will walk through the full architecture:

    • event ingestion and schema validation

    • Kafka-based stream processing

    • feature engineering with hot-state lookups

    • rule engine plus ML scoring

    • decision engine for allow, review, or block

    • audit trails, reviewer workflows, and feedback loops

  • The talk will explain why these technologies were chosen, what trade-offs they introduced, and how Python was used extensively despite throughput concerns, with careful design around batching, async flows, multiprocessing, and service isolation.

  • I will also cover how open source helped in practical ways beyond cost:

    • deeper control over infra and data paths

    • better observability and debugging

    • portability across environments

    • transparent and explainable system behaviour

  • The session will include lessons from actual implementation, including bottlenecks, partition strategy, latency budgeting, false positive control, and balancing rules with ML.

  • This talk is meant for backend engineers, platform teams, ML practitioners, fintech builders, students, and FOSS enthusiasts who want to understand how open source tools can power large-scale, business-critical systems in production.
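
To make the "rule engine plus ML scoring" and "allow, review, or block" steps above concrete, here is a minimal sketch of how such a decision step could look. It is not the code of the system described in the talk: the thresholds, feature names (`txn_count_5m`, `device_trusted`, `ip_risk`), the two rules, and the `score_model` stub are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; a real system would tune these against
# fraud catch rate and false-positive targets.
REVIEW_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.9


@dataclass
class Decision:
    action: str                                   # "allow", "review", or "block"
    score: float                                  # model score in [0, 1]
    reasons: list = field(default_factory=list)   # human-readable audit trail


def apply_rules(features):
    """Return the names of the (illustrative) rules that fired."""
    fired = []
    if features.get("txn_count_5m", 0) > 10:
        fired.append("high_velocity")
    if not features.get("device_trusted", True):
        fired.append("untrusted_device")
    return fired


def score_model(features):
    """Stand-in for an ML model: a fixed linear score clipped to [0, 1]."""
    raw = 0.05 * features.get("txn_count_5m", 0) + 0.4 * features.get("ip_risk", 0.0)
    return max(0.0, min(1.0, raw))


def decide(features):
    """Combine rule hits and the model score into one explainable decision."""
    fired = apply_rules(features)
    score = score_model(features)
    reasons = [f"rule:{name}" for name in fired]
    if score >= BLOCK_THRESHOLD or len(fired) >= 2:
        action = "block"
    elif score >= REVIEW_THRESHOLD or fired:
        action = "review"
    else:
        action = "allow"
    reasons.append(f"model_score={score:.2f}")
    return Decision(action, score, reasons)
```

Returning the fired rule names alongside the score is one simple way to give reviewers and auditors a reason for each decision, rather than a bare model output.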


    Previous Talks/Links:

    https://www.youtube.com/watch?v=i6vKEo12KfE&t=596s

    https://www.youtube.com/watch?v=mz97xjV2TQM

    Workshop at PyCon India 2017 [https://in.pycon.org/cfp/2017/proposals/creating-captive-portal-with-tornado-and-raspberry-pi~dwngb/]

    https://curiousdtu.wordpress.com/2014/03/24/bootconf-2014/

    https://www.youtube.com/watch?v=n5xUTcsrRns [Property Based Testing] [PyDelhi]

Key Takeaways

  • How to think about fraud detection as a distributed systems problem, not just an ML problem.

  • How a FOSS-first stack using Python, Kafka, Redis, PostgreSQL, and open source ML tooling can be used to build a real-world, enterprise-grade fraud platform.

  • Why Python was chosen for large parts of the system, where it worked well, and how its performance limits were handled through architecture decisions instead of hype or guesswork.

  • How to design the end-to-end event flow:

    • ingestion

    • stream processing

    • feature generation

    • rule checks

    • ML scoring

    • final decisioning

  • How to combine rules and ML models in a practical way for fraud detection, instead of depending on a black-box model alone.

  • How to build for high throughput and low latency using Kafka partitioning, batching, async processing, hot-state lookups, and service isolation.

  • How to generate and serve real-time fraud features such as velocity, device trust, behaviour deviation, and IP or merchant risk.

  • Why explainability matters in fraud systems, and how to produce decisions that are useful for reviewers, auditors, and business teams.

  • The trade-offs between:

    • scale and ordering

    • accuracy and latency

    • fraud catch rate and false positives

    • engineering simplicity and model sophistication
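
As an illustration of a real-time "velocity" feature backed by a hot-state lookup, the sketch below counts events per key over a sliding time window. It is an in-memory stand-in written for this example, not the talk's implementation: in the stack described above the state would more plausibly live in Redis, and the class name, method name, and window size are assumptions. It also assumes timestamps arrive in non-decreasing order for each key.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Count events per key over a sliding time window.

    In-memory stand-in for a hot-state store such as Redis.
    Assumes timestamps for a given key arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key, timestamp):
        """Record one event and return the event count inside the window."""
        events = self._events[key]
        events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)
```

For example, with `VelocityCounter(60)`, three calls to `record("card-1", t)` at `t = 0`, `30`, and `61` return `1`, `2`, and `2`: the event at `t = 0` has left the 60-second window by the third call.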

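The "scale and ordering" trade-off listed above can be sketched with a keyed partitioner: routing every event for one key to the same partition preserves per-key order, but a very active key can overload a single partition. The function below is a hypothetical helper for illustration only; `zlib.crc32` is a stand-in hash, and Kafka clients apply their own partitioning logic.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash.

    zlib.crc32 is a stand-in chosen for this sketch; it is not the
    hash function used by Kafka clients.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Keying by something like a card or account identifier keeps each entity's events in order on one partition, at the cost of uneven load when a few keys dominate traffic.
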
References

Session Categories

Engineering practice - productivity, debugging
Technology architecture

Speakers


Reviews

Approvability: 66 %
Approvals: 2
Rejections: 1
Not Sure: 0

This seems like a good talk, but it might be overwhelming to some. Ramping up the technical detail gradually, so that the whole talk doesn't go over people's heads, would be good. The audience will be all over the spectrum, so there will also be some who really want the details and would benefit from them.

Reviewer #1
Approved

Reviewer #2
Approved

I'm unable to find the project, or gauge the "scale" of it from the proposal

Reviewer #3
Rejected