Dear Pangeo community, it was recommended that I ask here for advice. I come from X-ray physics, where we scan samples with an X-ray beam in raster mode. In my current case, a scan consists of thousands of HDF5 files, and each HDF5 file contains 3-10 numpy arrays; these are my 2d images. I need to process these images with another Python library to extract data from them, but I also want to keep the processed data. In addition, I have other HDF5 files from which I need to extract some numbers and store them with the processed data. These are motor positions, in essence x,y values telling me where the X-ray beam was. As there are so many files, it would be nice if this could be easily parallelised, which is why I was looking at dask and dask dataframes. But I am not sure whether this would work.
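To make this concrete, here is a minimal sketch of the kind of parallel loop I have in mind, using h5py and dask.delayed. The file pattern, the `entry/data` dataset path, and the `process_image` function are placeholders for my real layout and analysis library:

```python
import glob

import dask
import h5py
import numpy as np


def process_image(image):
    # Stand-in for the real analysis done by the other Python library.
    return float(np.sum(image))


@dask.delayed
def load_and_process(path):
    # Open one HDF5 file and process every 2d image inside it.
    results = []
    with h5py.File(path, "r") as f:
        group = f["entry/data"]        # hypothetical group holding the 3-10 arrays
        for name in group:
            image = group[name][()]    # read the dataset as a numpy array
            results.append(process_image(image))
    return results


paths = sorted(glob.glob("scan_0001/*.h5"))   # thousands of files per scan
tasks = [load_and_process(p) for p in paths]
all_results = dask.compute(*tasks)            # executes the tasks in parallel
```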
My aim is to have each scan as a dataframe: each column then contains a different quantity, and each row holds one scan point. I hope this makes sense.
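Something like the following sketch is what I imagine, assembling that dataframe lazily with `dask.dataframe.from_delayed`. The motor file layout, the dataset paths, and the column names are made up for illustration:

```python
import glob

import dask
import dask.dataframe as dd
import h5py
import pandas as pd


@dask.delayed
def point_as_frame(image_path, motor_path):
    # One row per scan point: motor positions plus a value extracted from the image.
    with h5py.File(motor_path, "r") as f:
        x = float(f["entry/motors/x"][()])   # hypothetical dataset paths
        y = float(f["entry/motors/y"][()])
    with h5py.File(image_path, "r") as f:
        image = f["entry/data/image_0"][()]
    return pd.DataFrame({"x": [x], "y": [y], "intensity": [float(image.sum())]})


image_paths = sorted(glob.glob("scan_0001/images/*.h5"))
motor_paths = sorted(glob.glob("scan_0001/motors/*.h5"))

# meta tells dask the column names and dtypes without computing anything
meta = pd.DataFrame({"x": [0.0], "y": [0.0], "intensity": [0.0]})
frames = [point_as_frame(ip, mp) for ip, mp in zip(image_paths, motor_paths)]
scan_df = dd.from_delayed(frames, meta=meta)
```

I realise one-row partitions are probably inefficient, so batching several scan points per delayed call may be the better design; I would welcome advice on that.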
All files are stored on a cluster, and I also work on that cluster, where I can create my own conda environment.
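For running on the cluster, I imagine scaling out with dask-jobqueue, roughly like the sketch below, which assumes the cluster runs SLURM (other schedulers have analogous classes such as `PBSCluster`; the resource numbers are placeholders):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=8,             # cores per batch job
    memory="16GB",       # memory per batch job
    walltime="01:00:00",
)
cluster.scale(jobs=10)   # submit 10 batch jobs as dask workers
client = Client(cluster)
```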
Hence, while asking around for advice, somebody kindly pointed me in the direction of Pangeo. If you have any code examples that perhaps fit my plan, I would be grateful to look at them and learn. Thank you for your guidance.