Without getting into much detail, the following chart shows how the various combinations of the cache_type, block_size (in MiB), and fill_cache (bool) arguments to s3fs.S3FileSystem.open affect the read performance of .h5 files.
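For concreteness, here is roughly how one (cache_type, block_size, fill_cache) combination gets passed to open; the bucket/key below is hypothetical, and note that s3fs takes block_size in bytes, so the MiB values from the chart are multiplied out:

```python
import s3fs

fs = s3fs.S3FileSystem()

# One example combination, e.g. ("blockcache", 8 MiB, True).
with fs.open(
    "s3://my-bucket/GEDI04_A_example.h5",  # hypothetical object
    mode="rb",
    cache_type="blockcache",
    block_size=8 * 2**20,  # 8 MiB, in bytes
    fill_cache=True,
) as f:
    header = f.read(1024)  # reads go through the chosen caching strategy
```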
I gathered these metrics in the context of running an algorithm for subsetting a set of .h5 GEDI L4A files in basic Python multiprocessing code (i.e., no use of any specific libraries for scaling, such as dask, ray, etc.). The gist of the code is that it takes a list of .h5 files in S3 (in this case, ~1200), and for each file it reads (via h5py) a handful of datasets (the same handful for each file) out of the dozens available, doing so across 32 CPUs, each with 64 GiB of RAM (far more than actually needed).
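In case it helps, here's a stripped-down sketch of that gist; the dataset paths and URL list are hypothetical stand-ins for what the real code uses:

```python
import multiprocessing as mp

import h5py
import s3fs

# Hypothetical dataset paths; the real job reads the same handful of
# datasets out of every file.
DATASETS = ["BEAM0101/lat_lowestmode", "BEAM0101/agbd"]

def subset_one(url: str) -> dict:
    """Read the target datasets from one .h5 file directly in S3."""
    # Construct the filesystem in the worker rather than sharing one
    # instance across forked processes.
    fs = s3fs.S3FileSystem()
    with fs.open(url, mode="rb", cache_type="blockcache",
                 block_size=8 * 2**20, fill_cache=True) as f:
        with h5py.File(f, mode="r") as h5:
            return {name: h5[name][:] for name in DATASETS}

if __name__ == "__main__":
    urls: list[str] = []  # the ~1200 s3:// URLs of the GEDI L4A files
    with mp.Pool(processes=32) as pool:
        results = pool.map(subset_one, urls)
```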
I also compared this to simply downloading the files and reading them from the local filesystem, rather than reading directly from S3. This case is the one where the y-axis tick is labeled ('download', 0.0, False). I ran each combination ~30 times (some combinations had 1 or 2 jobs fail).
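The download baseline looks roughly like this (same hypothetical DATASETS as above; fs.get is a plain full-object download):

```python
import os
import tempfile

import h5py
import s3fs

DATASETS = ["BEAM0101/lat_lowestmode", "BEAM0101/agbd"]  # hypothetical, as above

def subset_one_downloaded(url: str) -> dict:
    """Baseline: download the whole file to local disk, then read it there."""
    fs = s3fs.S3FileSystem()
    local = os.path.join(tempfile.gettempdir(), os.path.basename(url))
    fs.get(url, local)  # full-file download, no byte-range reads
    try:
        with h5py.File(local, mode="r") as h5:
            return {name: h5[name][:] for name in DATASETS}
    finally:
        os.remove(local)
```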
As you can see, all combinations using cache_type="first" perform significantly worse than everything else, including downloading. Of course, the .h5 files are not cloud-optimized, so nothing performs more than marginally better than downloading, but that’s sort of the point.
I’m no expert on any of this stuff, but I’m happy to provide more details if anybody has questions.
