Another rechunking algorithm

TomNicholas · July 3, 2025, 2:37pm

Thanks for the clarification! Yes, that does seem like a crucial difference.

The cubed issue I linked is still relevant, though - plugging your algorithm into cubed could be very powerful, especially if it can be efficiently parallelized.

You might also be interested in dask’s algorithm for rechunking in constant memory. (Note that I said constant memory, not bounded memory like yours. IIUC dask completes the whole rechunk using a constant amount of memory irrespective of the number of chunks, but you don’t know what that constant is until you run it.)
