This project builds a realtime data platform using open-source technology: a software system that ingests data from multiple sources with minimal latency and makes it possible to derive actionable insights quickly.
The aim of this project is to create a simple, open-source, realtime, self-service data platform. Simple, so that the system is easy to understand and build. Open-source, so that we can leverage some of the best technology available. Realtime, so that we can query and analyse data as soon as it is generated. Self-service, so that the teams that rely on data (the data analysts and the data scientists) can meet their own needs with minimal involvement from the data engineering team.