DuckDB: An In-Process Analytical Data Management System

Approved

Session Description

Data management systems and data analysts have a troubled relationship: common systems such as Postgres or Spark are unwieldy, hard to set up and maintain, hard to move data into and out of, and hard to integrate into complex end-to-end workflows.

DuckDB is a new analytical data management system that is built for an in-process use case. DuckDB speaks SQL, has no external dependencies, and is deeply integrated into the Python ecosystem. DuckDB uses state-of-the-art query processing techniques with vectorized execution, lightweight compression, and morsel-driven automatic parallelism. DuckDB is out-of-core capable, meaning that it can not only read but also process datasets that are bigger than main memory. This allows for analysis of far larger datasets and in many cases removes the need to run separate infrastructure.

DuckDB is free and open source and one of the fastest-growing data systems to date. DuckDB was created in the academic environment of Centrum Wiskunde & Informatica (CWI) in Amsterdam, not entirely coincidentally the same place where Python was created.

Key Takeaways

None

References

Session Categories

FOSS

Speakers

Hannes Mühleisen

Co-Founder & CEO DuckDB Labs

Reviews

100 %

Approvability

Approvals

Rejections

Not Sure

It is not very clear which points they are going to present in the talk, though a topic about DuckDB is useful.

Reviewer #1

Not Sure

Reviewer #2

Approved