Himawari 9 data from netcdf into raster

Hello everyone,

I have been trying to convert this netcdf into a raster file to read in QGIS, more specifically the variable CldTopTemp. I tried to read the netcdf directly into QGIS, but the dimensions are set between 1 to 5500, so I did not manage to gereference it.

My question is, how can create a raster file from the variable CldTopTemp in wgs84 using python ?

file link:
https://noaa-himawari9.s3.amazonaws.com/AHI-L2-FLDK-Clouds/2024/03/04/0030/AHI-CHGT_v1r1_h09_s202403040030204_e202403040039398_c202403040047497.nc

Thank you in advance!

1 Like

Hey there @bruno_castro,

Iā€™m not a tiff expert, but if you want to use Xarray and rioxarray, you should be able read the NetCDF file and write to tiff.

  • Install some python deps (you might need others)

# !pip install 'xarray[io]' fsspec rioxarray dask

import xarray as xr 
import rioxarray 

path = 'https://noaa-himawari9.s3.amazonaws.com/AHI-L2-FLDK-Clouds/2024/03/04/0030/AHI-CHGT_v1r1_h09_s202403040030204_e202403040039398_c202403040047497.nc'

ds = xr.open_dataset(path, chunks={})

# write a single variable to tiff
ds['CldOptDpthAWIPS'].rio.to_raster('himawari9_CldOptDpthAWIPS.tiff')

Some potentially helpful resources:
https://tutorial.xarray.dev/intermediate/remote_data/remote-data.html#

Good luck!

from a GDAL perspective, that raster is described like this

NETCDF:"/vsicurl/https://noaa-himawari9.s3.amazonaws.com/AHI-L2-FLDK-Clouds/2024/03/04/0030/AHI-CHGT_v1r1_h09_s202403040030204_e202403040039398_c202403040047497.nc":CldTopTemp

So, you could do at the command line

gdalwarp NETCDF:"/vsicurl/https://noaa-himawari9.s3.amazonaws.com/AHI-L2-FLDK-Clouds/2024/03/04/0030/AHI-CHGT_v1r1_h09_s202403040030204_e202403040039398_c202403040047497.nc":CldTopTemp CldTopTemp_longlat.tif

which will produce a best-guess longlat grid appropriate for the data.

(I wouldnā€™t actually recommend that because it will take a while to figure everything out and write to a mostly-global grid spec).

You can choose a different CRS with ā€œ-t_srsā€ and set the resolution with ā€˜-trā€™ and bbox with ā€˜-teā€™ for a repeatable grid specification.

That subdataset is georeferenced via the longitude/latitude arrays, which you can see from gdalinfo. Thereā€™s possibly a way to do that all dynamically within QGIS, but Iā€™m not experienced with that.

(But, isnā€™t himawari natively in a map projection? Probably itā€™s best to fix that upfront and read it without modification, and Iā€™d be very interested to explore fixing that).

hereā€™s an example of something similar, itā€™s the ā€˜geosā€™ projection

https://proj.org/en/stable/operations/projections/geos.html

Thatā€™s what would be preferable, I canā€™t yet see how to determine what longitude and ellipsoid parameters is from this file, but Iā€™ll explore. (Figuring this out would be the way to augment and finalize the to ā€˜rio.to_raster()ā€™ example of @norlandrhagen, you would assign the extent and crs to the raster and then get a full GeoTIFF.)

Thank you so much for the insights!
@norlandrhagen, your code works, I have tried it before, the problem is because there is no latitude and longitude in the xarray dimensions, only the number of rows and columns with 5500 each. Once you export it as raster, it works and you can load in QGIS, but the dimensions are not correct (Dimensions X: 5500 Y: 5500 Bands: 1) and the raster is georeferenced in row and col instead of lat and lon.

As @Michael_Sumner describe, I need first to figure out the best set of parameters to generate the most appropriate grid for the image and then use your solution. I will continue trying and I get back when I have some advancement. Thank you again!

I came up with a code to transform the nc into geotiff. I am still a bit skeptical, but this is what I have:

I needed to clip the only Australia from the full disk image, So I found the bbox for Australia (https://www.data.jma.go.jp/mscweb/data/himawari/). So I could filter the coordinates inside the netcdf using xarray. After filtering the netcdf I went from a xarray with dimensions 5500x5500 to 1591x2324. Based on that I calculated the pixel height, pixel width and subsequently I created the range of lat and lon taking into account the boundaries provided on the website. See the code below:

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
from affine import Affine
import rioxarray
import numpy as np

# Specify the path to your netCDF file
filename = "himawari9_sample_files\AHI-CHGT_v1r1_h09_s202502070840211_e202502070849405_c202502070858053.nc"

# Open the dataset using xarray

ds = xr.open_dataset(filename)

# Print the min and max values of the geolocation data variables
lat_pc_min = ds["Latitude_Pc"].min().values
lat_pc_max = ds["Latitude_Pc"].max().values
lon_pc_min = ds["Longitude_Pc"].min().values
lon_pc_max = ds["Longitude_Pc"].max().values

print("Latitude_Pc ranges from", lat_pc_min, "to", lat_pc_max)
print("Longitude_Pc ranges from", lon_pc_min, "to", lon_pc_max)

# Define the approximate bounding box for Australia.
#Australia BBOX https://www.data.jma.go.jp/mscweb/data/himawari/

lat_min, lat_max = -45.0, -10.0

lon_min, lon_max = 110.0, 155.0

# Retrieve the full geolocation arrays (assumed to be 2D: Rows x Columns)
lat_pc = ds["Latitude_Pc"].values
lon_pc = ds["Longitude_Pc"].values

# Create a boolean mask for the Australia bounding box condition.
condition = (lat_pc >= lat_min) & (lat_pc <= lat_max) & \
(lon_pc >= lon_min) & (lon_pc <= lon_max)

# Use np.where to obtain the row and column indices where the condition is True.
indices = np.where(condition)

if indices[0].size == 0:

print("No points found within the specified Australia bounding box. Please check your bounds.")

exit()

# Determine the rectangular slice edges from the indices.

row_min, row_max = indices[0].min(), indices[0].max()

col_min, col_max = indices[1].min(), indices[1].max()

print(f"Subsetting rows from {row_min} to {row_max}, and columns from {col_min} to {col_max}.")

# Subset the dataset using isel on the corresponding dimensions.

# Adjust dimension names if needed (here assumed as 'Rows' and 'Columns').

subset_ds = ds.isel(Rows=slice(row_min, row_max+1), Columns=slice(col_min, col_max+1))

###Estimating the coordinates from the suBset file, considering the australia

# Calculate pixel width

pixel_width_lon = (lon_max-lon_min) / subset_ds.dims['Columns']

# Generate the range for longitudes

lon_values = np.arange(lon_min, lon_max, pixel_width_lon)

# Calculate pixel width

pixel_height_lat = (lat_max-lat_min) / subset_ds.dims['Rows']

# Generate the range

lon_values = np.arange(lon_min, lon_max, pixel_width_lon)

lat_values = np.flip(np.arange(lat_min, lat_max, pixel_height_lat))

# Extract the temperature variable and rename dimensions.

da = subset_ds["CldTopTemp"].rename({"Rows": "y", "Columns": "x"})

# # Extract 1D coordinate arrays from the first row/column.

# x_coords = subset_ds["Longitude_Pc"].isel(Rows=0).values

# y_coords = subset_ds["Latitude_Pc"].isel(Columns=0).values

x_coords=lon_values

y_coords=lat_values

# # Assign 1D coordinates.

da = da.assign_coords(x=("x", x_coords), y=("y", y_coords))

# # Ensure y is in descending order.

if y_coords[0] < y_coords[-1]:

da = da.sortby("y", ascending=False)

y_coords = da["y"].values

# # Calculate pixel resolutions.

res_x = (x_coords[1] - x_coords[0]).item()

res_y = (y_coords[0] - y_coords[1]).item()

# # Compute the affine transform.

x_min = x_coords[0]

y_max = y_coords[0]

transform = Affine.translation(x_min - res_x/2, y_max + res_y/2) * Affine.scale(res_x, -res_y)

# # Set spatial dimensions, transform, and CRS.

da.rio.set_spatial_dims(x_dim="x", y_dim="y", inplace=True)

da.rio.write_transform(transform, inplace=True)

da.rio.write_crs("EPSG:4326", inplace=True)

output_tif = "suBset_value_12_aus.tif"

#Export to GeoTIFF.

da.rio.to_raster(output_tif)

print("GeoTIFF saved to:", output_tif) ```

this is what I get from warping to longlat, top panel, and in the bottom is with the correct georeferencing set via a_ullr and a_srs, I also have to flip it vertically so Iā€™ll figure out how to do that without kludging and share the code

cli command produces the bottom dataset by non-destructive metadata assignment (but this doesnā€™t flip it, probably to do that we use a_gt which Iā€™ll follow up later with various options and in python and R)

(why are these simple georeference schemes being smashed into degenerate longitude,latitude arrays still ā€¦ I just donā€™t understand why this keeps happening, I will report to the producers).

this produces an actual tif, but could be simplified and templated with VRT

gdal_translate -a_srs "+proj=geos +h=35785863 +a=6378137.0 +b=6356752.3 +lon_0=140.7 +no_defs" -a_ullr -5500000 5500000 5500000 -5500000 NETCDF:AHI-CHGT_v1r1_h09_s202403040030204_e202403040039398_c202403040047497.nc:CldTopTemp temp.tif
1 Like

this is the most compact syntax for a fix, requires GDAL version >=3.7 also needs the GDAL config setting, which I save and then reset

import os
gnbset = os.environ["GDAL_NETCDF_BOTTOMUP"] 
os.environ["GDAL_NETCDF_BOTTOMUP"] = "NO"
dsn = 'vrt://NETCDF:"AHI-CHGT_v1r1_h09_s202403040030204_e202403040039398_c202403040047497.nc":CldTopTemp?a_srs=+proj=geos +lon_0=140.7 +h=35785863 +x_0=0 +y_0=0 +a=6378137 +b=6356752.3 +units=m +no_defs&a_ullr=-5500000,5500000,5500000,-5500000'

import xarray
import rioxarray
ds = xarray.open_dataset(dsn, engine = "rasterio")
os.environ["GDAL_NETCDF_BOTTOMUP"] = gnbset

ds.CldTopTemp.rio.to_raster("CldTopTemp.tif", driver = "COG")

itā€™s a bit awful having a proj string in the vrt:// arguments but itā€™s fine, thereā€™s no epsg code for that (it could possibly be written to WKT in a file to make it slightly more compact syntax)

for earlier GDAL if you need that we can write an actual VRT file which you could substitute in the dsn ā€œNETCDF:{file}:{varname}ā€ for reuse.

generates a nice GeoTIFF

2 Likes

Thank you very much for the code. I believe the current a_ullr is setting the image upside-down. I compared the output to the image on this website:

https://www.data.jma.go.jp/mscweb/data/himawari/sat_img.php?area=fd_

I believe the correct sequence should be: -a_ullr= -5500000, -5500000, 5500000, 5500000

2 Likes

that works because you are giving it an upside down extent (ā€œassign upper left lower rightā€ is the mnemonic for a_ullr). Thatā€™s fine if it works just makes me uncomfortable because youā€™ll have a geotransform with positive y-delta (I think). I achieved same with ā€œGDAL_NETCDF_BOTTOMUPā€ config at read time. I can follow up with more if needed.

Thatā€™s true, thank you again @Michael_Sumner :slightly_smiling_face: