This talk is a practical case study of designing a real-time fraud detection system from scratch using a FOSS-first stack including Python, Kafka, Redis, PostgreSQL, and open source ML tooling.
The session focuses on a real enterprise problem: how to inspect transaction streams at very high scale, generate meaningful fraud signals in real time, and take action with low latency and high explainability.
I will walk through the full architecture:
event ingestion and schema validation
Kafka-based stream processing
feature engineering with hot-state lookups
rule engine plus ML scoring
decision engine for allow, review, or block
audit trails, reviewer workflows, and feedback loops
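As a rough illustration of the first stage in the list above, the sketch below validates an incoming transaction event before it enters the stream. The field names (`txn_id`, `account_id`, `amount`, `currency`) and the `validate_event` helper are hypothetical; the proposal does not describe the actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionEvent:
    # Hypothetical fields; the real schema is not described in this proposal.
    txn_id: str
    account_id: str
    amount: float
    currency: str


def validate_event(raw: dict) -> TransactionEvent:
    """Return a typed event, or raise ValueError describing the first problem."""
    for name in ("txn_id", "account_id", "amount", "currency"):
        if name not in raw:
            raise ValueError(f"missing field: {name}")
    amount = raw["amount"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("amount must be a number")
    if amount <= 0:
        raise ValueError("amount must be positive")
    currency = raw["currency"]
    if not (isinstance(currency, str) and len(currency) == 3):
        raise ValueError("currency must be a 3-letter code")
    return TransactionEvent(
        txn_id=str(raw["txn_id"]),
        account_id=str(raw["account_id"]),
        amount=float(amount),
        currency=currency.upper(),
    )
```

Rejecting malformed events at the edge keeps every later stage free to assume well-typed input.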
The talk will explain why these technologies were chosen and what trade-offs they introduced. It will also show how Python was used extensively despite throughput concerns, through careful design around batching, async flows, multiprocessing, and service isolation.
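One way the batching and async ideas mentioned above could look in Python is a micro-batching consumer. This is only a sketch under assumptions: `batch_consumer`, its parameters, and the use of `None` as a shutdown signal are illustrative choices, and an in-process `asyncio.Queue` stands in for a real Kafka consumer.

```python
import asyncio


async def batch_consumer(queue, handle_batch, max_batch=100, max_wait=0.05):
    """Group queue items into batches before handing them to `handle_batch`.

    A batch is flushed when it reaches `max_batch` items, or when the queue
    has been idle for `max_wait` seconds. A `None` item means "shut down".
    """
    batch = []
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=max_wait)
        except asyncio.TimeoutError:
            # Queue went quiet: flush whatever is pending instead of waiting.
            if batch:
                await handle_batch(batch)
                batch = []
            continue
        if item is None:
            # Shutdown signal: flush the remainder and stop.
            if batch:
                await handle_batch(batch)
            return
        batch.append(item)
        if len(batch) >= max_batch:
            await handle_batch(batch)
            batch = []
```

Batching amortises per-call overhead (network round trips, model invocation) across many events, which is one common way to get acceptable throughput out of Python. The idle timeout limits how long a partial batch waits when traffic is light.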
I will also cover how open source helped in practical ways beyond cost:
deeper control over infra and data paths
better observability and debugging
portability across environments
transparent and explainable system behaviour
The session will include lessons from actual implementation, including bottlenecks, partition strategy, latency budgeting, false positive control, and balancing rules with ML.
This talk is meant for backend engineers, platform teams, ML practitioners, fintech builders, students, and FOSS enthusiasts who want to understand how open source tools can power large-scale, business-critical systems in production.
Previous Talks/Links:
https://www.youtube.com/watch?v=i6vKEo12KfE&t=596s
https://www.youtube.com/watch?v=mz97xjV2TQM
Gave a workshop at PyCon India 2017 [https://in.pycon.org/cfp/2017/proposals/creating-captive-portal-with-tornado-and-raspberry-pi~dwngb/]
https://curiousdtu.wordpress.com/2014/03/24/bootconf-2014/
https://www.youtube.com/watch?v=n5xUTcsrRns [Property Based Testing] [PyDelhi]
How to think about fraud detection as a distributed systems problem, not just an ML problem.
How a FOSS-first stack using Python, Kafka, Redis, PostgreSQL, and open source ML tooling can be used to build a real-world, enterprise-grade fraud platform.
Why Python was chosen for large parts of the system, where it worked well, and how its performance limits were handled through architecture decisions instead of hype or guesswork.
How to design the end-to-end event flow:
ingestion
stream processing
feature generation
rule checks
ML scoring
final decisioning
How to combine rules and ML models in a practical way for fraud detection, instead of depending on a black-box model alone.
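A minimal sketch of combining rule hits with a model score, assuming a simple policy that is not taken from the actual system: any "hard" rule or a very high score blocks, any other rule hit or a moderately high score sends the transaction to review, and everything else is allowed. The `decide` function, the severity labels, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str  # "allow", "review", or "block"
    reasons: list = field(default_factory=list)


def decide(rule_hits, ml_score, review_at=0.5, block_at=0.9):
    """Combine rule hits and a model score into one explainable decision.

    rule_hits: list of (rule_name, severity), severity is "hard" or "soft".
    ml_score:  model fraud probability in [0, 1].
    """
    # Every signal that contributed is recorded so a reviewer can see why.
    reasons = [f"rule:{name}" for name, _ in rule_hits]
    if ml_score >= review_at:
        reasons.append(f"ml_score={ml_score:.2f}")

    has_hard_rule = any(severity == "hard" for _, severity in rule_hits)
    if has_hard_rule or ml_score >= block_at:
        return Decision("block", reasons)
    if rule_hits or ml_score >= review_at:
        return Decision("review", reasons)
    return Decision("allow", reasons)
```

Because the decision carries its reasons, the same object can feed an audit trail or a reviewer screen without re-deriving why a transaction was flagged.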
How to build for high throughput and low latency using Kafka partitioning, batching, async processing, hot-state lookups, and service isolation.
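To make the partitioning point concrete, the sketch below shows key-based partition assignment in plain Python. It is an illustration only: `partition_for` is a hypothetical helper, and SHA-256 is a stand-in, since Kafka clients apply their own hash function to the record key.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 4 bytes as an unsigned integer, then wrap to the range.
    return int.from_bytes(digest[:4], "big") % num_partitions


def route(events, num_partitions):
    """Group events by partition, keyed on a hypothetical `account_id` field."""
    partitions = {}
    for event in events:
        p = partition_for(event["account_id"], num_partitions)
        partitions.setdefault(p, []).append(event)
    return partitions
```

Keying on the account means all events for one account land on the same partition and keep their relative order, while different accounts spread across partitions for parallelism. The cost is that a single very busy key can create a hot partition.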
How to generate and serve real-time fraud features such as velocity, device trust, behaviour deviation, and IP or merchant risk.
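As an example of one such feature, velocity can be expressed as "number of events for a key within the last N seconds". The sketch below keeps the state in memory; `VelocityCounter` is a hypothetical stand-in for the hot-state store (the proposal lists Redis in the stack), and it assumes timestamps arrive in non-decreasing order per key.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Count events per key within a sliding time window (in-memory stand-in)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self._events = defaultdict(deque)

    def record(self, key: str, ts: float) -> int:
        """Record an event at time `ts` and return the count inside the window.

        Assumes `ts` values are non-decreasing for a given key.
        """
        timestamps = self._events[key]
        timestamps.append(ts)
        # Evict timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= ts - self.window:
            timestamps.popleft()
        return len(timestamps)
```

A usage sketch: `VelocityCounter(60).record("card-1", now)` returns how many transactions that card has made in the last minute, which a rule or model can then consume as a feature.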
Why explainability matters in fraud systems, and how to produce decisions that are useful for reviewers, auditors, and business teams.
The trade-offs between:
scale and ordering
accuracy and latency
fraud catch rate and false positives
engineering simplicity and model sophistication
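The trade-off between fraud catch rate and false positives can be illustrated by sweeping a score threshold over labelled examples. The helper name and the toy data below are made up for illustration and are not results from the system described in this talk.

```python
def rates_at_threshold(scored, threshold):
    """Return (catch_rate, false_positive_rate) for a score threshold.

    scored: list of (score, is_fraud) pairs. Assumes at least one fraudulent
    and one legitimate example are present.
    """
    fraud = [score for score, is_fraud in scored if is_fraud]
    legit = [score for score, is_fraud in scored if not is_fraud]
    catch_rate = sum(score >= threshold for score in fraud) / len(fraud)
    fp_rate = sum(score >= threshold for score in legit) / len(legit)
    return catch_rate, fp_rate


# Made-up scores purely for illustration: 3 fraudulent, 4 legitimate.
TOY_SCORES = [
    (0.95, True), (0.70, True), (0.40, True),
    (0.80, False), (0.30, False), (0.20, False), (0.10, False),
]

for threshold in (0.5, 0.9):
    catch, fp = rates_at_threshold(TOY_SCORES, threshold)
    print(f"threshold={threshold}: catch_rate={catch:.2f}, fp_rate={fp:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.9 removes the one false positive but also halves the number of fraud cases caught, which is the tension the talk refers to.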
This seems like a good talk, but it might be overwhelming for some. It would be good to have it ramp up gradually in technical detail so that the whole talk doesn't go over people's heads. The audience will be all over the spectrum, so there will also be some who really want, and would benefit from, the details.
I'm unable to find the project or gauge its "scale" from the proposal.