Challenges in Accessing 3D CMIP6 Ocean Variables (e.g., thetao, so)

minminfu · June 13, 2024, 4:17pm

I am trying to download (and regrid) CMIP6 climate data through Google Cloud.

I am roughly following the Pangeo CMIP6 tutorial.

One issue I encounter: when I download and save 2D ocean variables (e.g. sea surface temperature, “tos”, or sea surface salinity, “sos”), the download finishes very quickly (within a minute). However, if I instead download and save only the top level of a 3D ocean field such as “thetao” or “so”, the code hangs and never completes. A 2D ocean field for one CMIP6 historical ensemble member is around 980 MB, and a single level of a 3D field should be about the same size, so it shouldn’t take very long on a decent internet connection.

I am really not sure why there is a difference in performance here. I tried chunking the levels into chunks of size 1, but it didn’t help.

Any advice would be much appreciated! Here is the code I am using, which works for “tos” but not for the top level of “thetao”: download_CMIP6_minimal_working.py (GitHub)

jbusecke · June 14, 2024, 5:12pm

Hey @minminfu,

I suspect the issue here is that by writing to a single NetCDF file you might be losing parallelism. Can you try writing to Zarr to test this?
