Zarr: Cloud-Native, Chunked & Compressed N-Dimensional Arrays

Sanket Verma
Approved

A key feature of the Python data ecosystem is its reliance on simple but efficient primitives that follow well-defined interfaces, so that tools work seamlessly together (cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask parallelises access to tensors. Xarray provides metadata that labels tensor dimensions. Zarr provides a missing piece: scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large, out-of-memory datasets both locally and in the cloud. Implementations exist in C++, C, Java, JavaScript, Julia, and Python.
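To make the "well-defined interfaces" point concrete for storage: zarr-python 2.x accepted any Python `MutableMapping` as a store, so the same array could live in a dict, a directory, or an object store behind a mapping. The class below is a minimal sketch under that assumption. `ToyDirectoryStore` is a hypothetical name, not one of Zarr's built-in stores, and it simply maps each key to a file under a local directory. Newer zarr-python 3 releases define their own store interface instead.

```python
import os
import tempfile
from collections.abc import MutableMapping


class ToyDirectoryStore(MutableMapping):
    """Hypothetical key-value store: each key is kept in its own file.

    Illustrative only; not part of Zarr.
    """

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # "temperature/0.0" -> <root>/temperature/0.0
        return os.path.join(self.root, *key.split("/"))

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        path = self._path(key)
        # Create intermediate directories for hierarchical keys.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # Walk the tree and yield keys relative to root with "/" separators.
        for dirpath, _, filenames in os.walk(self.root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                yield rel.replace(os.sep, "/")

    def __len__(self):
        return sum(1 for _ in self)


# Usage: store two chunk-like values under hierarchical keys.
with tempfile.TemporaryDirectory() as tmp:
    store = ToyDirectoryStore(tmp)
    store["temperature/0.0"] = b"chunk bytes"
    store["temperature/0.1"] = b"more chunk bytes"
    print(sorted(store))             # ['temperature/0.0', 'temperature/0.1']
    print(store["temperature/0.0"])  # b'chunk bytes'
```

Because the class only implements the five abstract mapping methods, anything written against the mapping interface works unchanged whether the bytes end up in memory, on disk, or (with a different backend) in a cloud bucket.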


In this talk, I’ll introduce Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. I’ll take a systematic approach to understanding and implementing Zarr, showing how it works and why you might need it. Zarr is based on an open technical specification, which makes implementations in several languages possible. I’ll focus mainly on Zarr’s Python implementation and show how well it interoperates with the existing libraries in the PyData stack.
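As a rough illustration of what "chunked, compressed" means, the sketch below splits a 2-D array of float64 values into fixed-size chunks, compresses each chunk with `zlib`, and writes it to a key-value store under a key built from its chunk coordinates. Reading one element then only touches the single chunk that contains it. `ToyChunkedArray2D` is a hypothetical, stdlib-only class: the dot-separated keys resemble Zarr v2's default chunk naming, but the byte layout, metadata handling, and codec choice are simplifications, not the Zarr specification.

```python
import math
import struct
import zlib


class ToyChunkedArray2D:
    """Hypothetical 2-D float64 array stored as compressed chunks.

    Illustrative only; not Zarr's implementation or on-disk format.
    """

    def __init__(self, shape, chunks, store=None, fill_value=0.0):
        self.shape = shape
        self.chunks = chunks
        self.fill_value = fill_value
        # Any dict-like key-value store works here.
        self.store = {} if store is None else store

    def grid(self):
        # Number of chunks along each axis (ceiling division).
        return tuple(math.ceil(s / c) for s, c in zip(self.shape, self.chunks))

    @staticmethod
    def key(ci, cj):
        # Dot-separated chunk coordinates, e.g. "1.2".
        return f"{ci}.{cj}"

    def write(self, data):
        rows, cols = self.chunks
        gi, gj = self.grid()
        for ci in range(gi):
            for cj in range(gj):
                values = []
                for r in range(ci * rows, (ci + 1) * rows):
                    for c in range(cj * cols, (cj + 1) * cols):
                        inside = r < self.shape[0] and c < self.shape[1]
                        # Edge chunks are padded to full size with fill_value.
                        values.append(data[r][c] if inside else self.fill_value)
                raw = struct.pack(f"<{len(values)}d", *values)
                self.store[self.key(ci, cj)] = zlib.compress(raw)

    def read(self, i, j):
        # Fetch and decompress only the chunk that contains element (i, j).
        rows, cols = self.chunks
        raw = zlib.decompress(self.store[self.key(i // rows, j // cols)])
        values = struct.unpack(f"<{rows * cols}d", raw)
        return values[(i % rows) * cols + (j % cols)]


# Usage: a 7 x 10 array split into 4 x 4 chunks.
data = [[float(r * 10 + c) for c in range(10)] for r in range(7)]
arr = ToyChunkedArray2D(shape=(7, 10), chunks=(4, 4))
arr.write(data)
print(arr.grid())         # (2, 3)
print(sorted(arr.store))  # ['0.0', '0.1', '0.2', '1.0', '1.1', '1.2']
print(arr.read(5, 9))     # 59.0
```

Chunk shape is the main design lever in a layout like this: larger chunks generally mean fewer keys and better compression, while smaller chunks let a reader fetch less data for a small selection.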


I will also briefly discuss the evolution of Zarr: the development of the Zarr Enhancement Proposal (ZEP) process, its use in defining the next major version of the specification (V3), and the uptake of the format across the research landscape.

About the Speaker

Sanket is a data scientist based in New Delhi, India. He likes building data science tools and products and has worked with startups, governments, and other organisations. He loves building communities and bringing people together, and he is Chair of PyData Delhi and PyData Global.

He currently looks after community and open-source work at Zarr as its Community Manager.

When he’s not working, he likes to play the violin and computer games and sometimes thinks of saving the world!

Proposal Overview
2 people approved this proposal (100%)
0 people rejected this proposal (0%)
0 people marked unsure (0%)
Approvability of proposal: 100%
The proposal is accepted from my side because it is descriptive enough, has all the required content, and the speaker is part of the OSS project.
Approved by delta231, 2 months ago
An introductory talk about Zarr will be interesting, but in my personal opinion, the details of how the technical specification is put together to enable implementations in different languages deserve a talk of their own. The Zarr Enhancement Proposal (ZEP) process is also worth an entire talk, in my humble opinion, as it highlights the value of a slow but methodical process in improving a library that is used directly by thousands (maybe tens of thousands) of projects. Overall, the proposal is good to go.
Approved by rahulporuri, 2 months ago