I am a grad student working on tropical climate dynamics. I was excited to learn about Pangeo, as it seems like an incredible way to do more in-depth analysis of CMIP6 data without having to download large 4d ocean variables over a slow connection.
So I went ahead and developed some code that extracts ocean velocities and temperature (along with wind stress and heat fluxes) in a tropical Pacific box, with the aim of assessing ENSO-related feedbacks across a selection of CMIP6 models.
While the code runs fine for 3d fields (like wind stress), it almost always crashes for 4d ocean variables. I get errors like:
CancelledError: ('truediv-5a4b2c564a815459f94e3086e8d792ac', 112)
distributed.client - ERROR - Failed to reconnect to scheduler after 50.00 seconds, closing client
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
This happens seemingly regardless of how big my cluster is. I’ve tried changing chunk sizes, but in my experience, whenever the cluster touches a 4d variable, it almost always becomes unresponsive.
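One way to sanity-check chunk choices is to estimate how much memory each chunk needs, since every dask worker has to hold whole chunks in memory while it works on them. The sketch below uses only the standard library. The array shape (1980 monthly steps, 50 levels, a 1-degree 180 x 360 grid) and the float32 assumption are hypothetical, illustrative values, not taken from any specific CMIP6 model; the functions `chunk_nbytes` and `n_chunks` are helpers made up for this example. The dask documentation's general guidance is to aim for chunks on the order of 100 MB, so very large chunks (or very many tiny ones) are worth ruling out first, though this alone may not explain the crashes.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize=4):
    """Bytes held in memory by one chunk of the given shape.

    itemsize=4 assumes float32 data; use 8 for float64.
    """
    return prod(chunk_shape) * itemsize


def n_chunks(full_shape, chunk_shape):
    """Total number of chunks, rounding up where the last chunk is partial."""
    counts = [-(-n // c) for n, c in zip(full_shape, chunk_shape)]
    return prod(counts)


# Hypothetical monthly 4d ocean field laid out as (time, lev, lat, lon).
full_shape = (1980, 50, 180, 360)

# Compare a few candidate chunkings of the same field.
for chunk_shape in [(12, 50, 180, 360), (1980, 1, 180, 360), (600, 50, 180, 360)]:
    mib = chunk_nbytes(chunk_shape) / 2**20
    print(chunk_shape, f"{mib:8.1f} MiB per chunk,",
          n_chunks(full_shape, chunk_shape), "chunks")
```

Under these assumptions, one year of the full-depth global field per chunk stays in the low hundreds of MiB, while a 600-step chunk runs to several GiB, which could easily exceed a single worker's memory.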
So my questions are:
Is this a problem with how I have set up dask and the chunks, something I could fix with more knowledge? Or
Is my aim of carrying out (heavy) calculations with 4d ocean variables even well suited to Pangeo Cloud?
I also have a proposal: if anyone is interested in developing a straightforward online tool for assessing ENSO dynamics in CMIP6 and how they change with global warming, and has experience with dask/Pangeo/etc., I would really love to collaborate. I think my idea is awesome and could be very useful for the climate science community, but I’ve also realized that I may need some help turning it into a reality, at least using Pangeo, because right now I feel a little bit like giving up on it.
Thanks for reading my post!