I wrote a blog post comparing the performance of ML training pipelines that read NetCDF against ones that read TFRecords and some other file formats. Seems like it might be of interest to some here: Loading NetCDFs in TensorFlow | Noah Brenowitz
What kinds of data formats do others in this community use for training ML models?
Great post Noah! Thanks so much for sharing.
In our group we use Zarr pretty heavily for ML training datasets. I’d love to see Zarr added to the comparison. If you can align chunks with batches, I imagine it should go pretty fast.
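To make the chunk/batch alignment point concrete, here is a rough sketch in plain Python. It does not call Zarr at all; it only counts how many chunks along the sample axis each batch read would overlap, assuming contiguous (unshuffled) batches and a store chunked only along that axis. The function names and the sizes used below are made up for illustration.

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, stop) index pairs for consecutive, unshuffled batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def chunks_touched(start, stop, chunk_size):
    """Count the chunks along the sample axis that a [start, stop) read overlaps."""
    first_chunk = start // chunk_size
    last_chunk = (stop - 1) // chunk_size
    return last_chunk - first_chunk + 1


def total_chunk_reads(n_samples, batch_size, chunk_size):
    """Sum chunk overlaps over one epoch, ignoring any caching between reads."""
    return sum(
        chunks_touched(start, stop, chunk_size)
        for start, stop in batch_slices(n_samples, batch_size)
    )


if __name__ == "__main__":
    # Hypothetical sizes: 1024 samples stored in chunks of 64 samples each.
    for batch_size in (64, 48):
        reads = total_chunk_reads(1024, batch_size, chunk_size=64)
        print(f"batch_size={batch_size}: {reads} chunk reads per epoch")
```

With these made-up sizes, a batch size of 64 lines up with the chunks and touches 16 chunks per epoch, while a batch size of 48 straddles chunk boundaries and touches 32. Real throughput also depends on compression, caching, and shuffling, so this is only a way to reason about the layout, not a benchmark.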
Thank you for sharing this - very timely. We’ve just started to migrate our workflows over to NetCDF from an old GeoTIFF-based setup.
Zarr would be very interesting to see.