I am trying to save a large xarray Dataset to CSV, but I have been struggling with memory issues and dataset formatting. Since my data is quite large, I have been using Dask and opening my netCDF files with:
xarray_data_ssp126 = xr.open_mfdataset('filename*.nc', combine='by_coords', parallel=True)
I do not want to use every single point in the data above, so below I import my desired lat/lon points and convert them from CSV (gem_df) to xarray (gem):
gem = gem_df.to_xarray()
gem = gem.assign_coords(lat = ("lat", gem_df.lat), lon = ("lon", gem_df.lon))
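For context, here is a minimal, self-contained sketch of this conversion step. The gem_df contents are made up, and in this sketch I attach lat/lon along the existing "index" dimension (a simplification of the assign_coords call above):

```python
import pandas as pd
import xarray as xr

# Hypothetical stand-in for gem_df: a table of point locations
# with one extra column, as read from a CSV.
gem_df = pd.DataFrame({
    "lat": [40.0, 41.5, 42.25],
    "lon": [-105.0, -104.5, -103.75],
    "site_id": ["a", "b", "c"],
})

# to_xarray() turns the DataFrame index into an "index" dimension,
# with lat/lon/site_id as data variables along it.
gem = gem_df.to_xarray()

# Promote lat/lon to coordinates along the existing "index" dimension.
gem = gem.assign_coords(lat=("index", gem_df.lat.values),
                        lon=("index", gem_df.lon.values))
```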
I combined the gem dataset and the xarray_data_ssp126 dataset using nearest-neighbor interpolation:
data = xarray_data_ssp126.interp(lat = gem.lat, lon = gem.lon, method = 'nearest')
I then wanted to combine the 'data' dataset with gem to get some of my other data variables back:
data = data.combine_first(gem)
data
Which leaves me with exactly the data I want (yay!). However, I have been struggling to export this data due to memory issues. I have read that you have previously suggested exporting to Zarr, but unfortunately that did not work for me. I also tried shrinking the data from the full United States down to state level, which did not help either. The data is daily, and I am trying to save it in chunks of 25 years.
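For reference, the chunked export I have been attempting looks roughly like this. Everything here is a sketch with made-up data and filenames; in my real case the blocks are 25 years, but I use 2-year blocks below so the example is small:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for my merged dataset: daily values at a few points.
times = pd.date_range("2000-01-01", "2003-12-31", freq="D")
data = xr.Dataset(
    {"tas": (("time", "index"), np.random.rand(len(times), 3))},
    coords={"time": times, "index": [0, 1, 2]},
)

# Write one CSV per block of years so only one slice is in memory at a time.
block_years = 2
years = np.unique(data.time.dt.year)
for start in years[::block_years]:
    stop = start + block_years - 1
    chunk = data.sel(time=slice(str(start), str(stop)))
    # .load() forces Dask to compute just this slice before exporting;
    # it is a no-op on an already in-memory dataset like this one.
    chunk.load().to_dataframe().to_csv(f"output_{start}_{stop}.csv")
```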
I was able to export successfully in the past when I ignored the coordinates (not sure if that is the right terminology): instead of the assign_coords call above I used set_coords(("lat", "lon")), and instead of interpolating I used .sel() with the nearest-neighbor method. By "ignoring the coordinates" I mean that when I did not assign coordinates on gem, my data variables were indexed by "index" instead of time, lat, lon.
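A minimal sketch of that earlier approach, with a made-up point table and a tiny synthetic grid standing in for xarray_data_ssp126:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical point table, as before.
gem_df = pd.DataFrame({"lat": [40.2, 41.5], "lon": [-105.0, -104.4]})

# Keep everything on the "index" dimension: set_coords promotes the
# lat/lon data variables to coordinates without creating new dimensions.
gem = gem_df.to_xarray().set_coords(("lat", "lon"))

# Synthetic gridded dataset standing in for xarray_data_ssp126.
grid = xr.Dataset(
    {"tas": (("lat", "lon"), np.arange(12.0).reshape(3, 4))},
    coords={"lat": [39.0, 41.0, 43.0],
            "lon": [-106.0, -105.0, -104.0, -103.0]},
)

# Pointwise nearest-neighbor selection: both selectors share the "index"
# dimension, so the result has one value per point rather than a
# full lat x lon grid.
data = grid.sel(lat=gem.lat, lon=gem.lon, method="nearest")
```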
I was wondering if you have any advice on how to go about saving this dataset to CSV?