Talk
Intermediate
First Talk

How Intuit leverages and contributes to OpenLineage

Rejected

Data lineage is a crucial aspect of big data: it helps teams understand how data flows and manage its lifecycle. Lineage also improves operational excellence and makes impact analysis more robust than manual analysis or institutional knowledge, reducing data incidents caused by a lack of visibility into data dependencies.


OpenLineage defines an OpenAPI specification for modeling lineage, and also provides ways to capture lineage out of the box from a variety of big data processing systems using event-driven listeners. Leveraging OpenLineage to capture data lineage gave us a great head start and improved our development velocity compared with building a lineage system from scratch.
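
As a rough illustration of the event model mentioned above, a lineage event can be thought of as a JSON document describing one run of a job together with its input and output datasets. The sketch below is illustrative only and is not how Intuit or the OpenLineage integrations build events: `make_run_event`, the namespace, the job name, the dataset names, and the producer URL are all hypothetical, and only a simplified subset of the fields in the spec is shown (the full specification requires more, such as a schema URL).

```python
import json
from datetime import datetime, timezone


def make_run_event(event_type, job_namespace, job_name, run_id, inputs, outputs):
    """Build a dict shaped like a simplified OpenLineage run event.

    `inputs` and `outputs` are lists of (namespace, name) pairs.
    This is a hand-rolled sketch, not the official client library.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",  # placeholder
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
    }


# Hypothetical job that reads one table and writes another.
event = make_run_event(
    "COMPLETE",
    "demo_namespace",
    "daily_orders_job",
    "run-0001",  # placeholder; real run IDs are UUIDs
    inputs=[("warehouse", "raw.orders")],
    outputs=[("warehouse", "analytics.orders_daily")],
)
print(json.dumps(event, indent=2))
```

A listener attached to a processing engine would emit events of this general shape as jobs start and finish, and a lineage backend can stitch them into a dependency graph by matching dataset names across jobs.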


While productionizing OpenLineage across the big data jobs at Intuit, we faced challenges of scale and edge cases that the open source OpenLineage community had not yet tackled. With production systems at stake, we came up with solutions that worked for Intuit and have since contributed them back to OpenLineage for the wider community.


This talk walks through our year-long journey with OpenLineage.

None
FOSS

Athitya Kumar
Senior Software Engineer, Intuit India

Approvability: 0%
Approvals: 0
Rejections: 1
Not Sure: 0
I think this will be too niche a talk.
Reviewer #1
Rejected