Altimetric Data

fbriol · November 15, 2019, 9:29am

The altimetry data are in the “pangeo-cnes” bucket (gs://pangeo-cnes) and cover the period from 31 December 1992 to 13 May 2019. The available missions (datasets) are:

al: Altika
alg: Altika Drifting Phase
c2: Cryosat-2
e1: ERS-1
e1g: ERS-1 Geodetic Phase
e2: ERS-2
en: ENVISAT
enn: ENVISAT Extension Phase
g2: Geosat Follow On
h2: Haiyang-2A
j1: Jason-1
j1g: Jason-1 Geodetic Phase
j1n: Jason-1 Interleaved
j2: OSTM/Jason-2
j2g: OSTM/Jason-2 Geodetic Phase
j2n: OSTM/Jason-2 Interleaved
j3: Jason-3
s3a: Sentinel-3A
s3b: Sentinel-3B
tp: Topex/Poseidon
tpn: Topex/Poseidon Interleaved

Access Example

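import gcsfs
import xarray

# token=None lets gcsfs look for credentials automatically
fs = gcsfs.GCSFileSystem(project='pangeo-cnes', token=None)
# Expose the Zarr store of the Altika mission ("al") as a key-value mapping
gcsmap = fs.get_mapper('pangeo-cnes/alti/al')
ds = xarray.open_zarr(gcsmap)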
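
To open another mission, only the last path component should need to change. The sketch below assumes every dataset code listed above is stored under the same `alti/<code>` prefix as `al`; only `alti/al` appears in this post, so treat the other paths as unverified. The names `MISSIONS`, `store_path` and `open_mission` are hypothetical helpers, not part of any library.

```python
# Dataset codes copied from the mission list above
MISSIONS = {
    "al": "Altika", "alg": "Altika Drifting Phase", "c2": "Cryosat-2",
    "e1": "ERS-1", "e1g": "ERS-1 Geodetic Phase", "e2": "ERS-2",
    "en": "ENVISAT", "enn": "ENVISAT Extension Phase",
    "g2": "Geosat Follow On", "h2": "Haiyang-2A",
    "j1": "Jason-1", "j1g": "Jason-1 Geodetic Phase",
    "j1n": "Jason-1 Interleaved", "j2": "OSTM/Jason-2",
    "j2g": "OSTM/Jason-2 Geodetic Phase", "j2n": "OSTM/Jason-2 Interleaved",
    "j3": "Jason-3", "s3a": "Sentinel-3A", "s3b": "Sentinel-3B",
    "tp": "Topex/Poseidon", "tpn": "Topex/Poseidon Interleaved",
}
BUCKET = "pangeo-cnes"


def store_path(mission):
    """Return the assumed Zarr store path for a dataset code."""
    if mission not in MISSIONS:
        raise KeyError(f"unknown mission code: {mission!r}")
    # Assumption: same layout as the 'pangeo-cnes/alti/al' example
    return f"{BUCKET}/alti/{mission}"


def open_mission(mission, fs):
    """Open one mission lazily, given an existing gcsfs file system."""
    import xarray  # imported here so store_path works without xarray
    return xarray.open_zarr(fs.get_mapper(store_path(mission)))


print(store_path("j3"))
```

With the `fs` object from the example above, `open_mission("j3", fs)` would then return the Jason-3 dataset, provided the assumed layout holds.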

rabernat · November 16, 2019, 2:17pm

This is fantastic! Thanks so much. I’m sure people will really love having access to these data.

We are trying to track all cloud datasets in our catalog: https://pangeo-data.github.io/pangeo-datastore/index.html

Could you make a PR here to add entries for your new datasets: https://github.com/pangeo-data/pangeo-datastore

fbriol · December 9, 2019, 2:05pm

The L3 altimetry data have been updated on the “pangeo-cnes” bucket. In this update, I modified the encoding settings so that the dataset can be accessed more quickly. I used Zarr filters and other encoding options to significantly improve compression and read performance.

Since version 2 of Zarr, it is possible to apply filters to encode the data before writing it. The idea is to transform the data in order to improve compression.

In the case of altimetry data, we have a time axis encoded as 64-bit integers, giving the date of each measurement to the nearest microsecond.

If the zarr.Delta filter is applied, the data are transformed to store only the difference between two successive items, which reduces the entropy of the binary representation. For example, an array containing 45723 different dates for one day contains only 133 different values after the filter is applied. This makes compression much more efficient. For the Topex mission (the time axis represents nearly 10 years of data), we obtain a compression factor of 62.6 with this filter vs. 5 without. In other words, to read a time axis of 1.2 GB we only need to read 16.5 MB.
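
The effect is easy to reproduce on synthetic data. The sketch below does not use Zarr itself: it emulates delta encoding with NumPy and compresses with the standard-library `zlib`, so the figures it prints are only illustrative. The time axis (one sample about every 50 ms with a few microseconds of jitter) is an assumption, not real altimetry data.

```python
import zlib

import numpy as np

# Synthetic time axis (assumption): int64 microseconds, ~50 ms sampling
rng = np.random.default_rng(0)
steps = 50_000 + rng.integers(-3, 4, size=100_000)  # jitter of -3..3 us
time_us = np.int64(1_500_000_000_000_000) + np.cumsum(steps, dtype=np.int64)


def delta_encode(values):
    """Keep the first item, then store the difference to the previous item."""
    encoded = np.empty_like(values)
    encoded[0] = values[0]
    encoded[1:] = np.diff(values)
    return encoded


def delta_decode(encoded):
    """Invert delta_encode with a cumulative sum."""
    return np.cumsum(encoded, dtype=encoded.dtype)


deltas = delta_encode(time_us)
raw_size = len(zlib.compress(time_us.tobytes()))
delta_size = len(zlib.compress(deltas.tobytes()))

print("distinct values before:", np.unique(time_us).size)
print("distinct values after: ", np.unique(deltas).size)
print("zlib bytes before/after:", raw_size, delta_size)
```

The encoded array holds the first timestamp followed by a handful of distinct step values, which is why a general-purpose compressor does so much better on it.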

It is also possible to combine several filters. For example, for the other variables, I used the zarr.FixedScaleOffset and zarr.Delta filters. The first filter packs the data using a scale factor and an offset, as is done in the CF conventions and by Xarray. The advantage of using this filter is that the data are decoded natively when read with Dask or Zarr. These filters raise the compression factor from 1.3 to 3.2. This is less impressive than for the time variable, but it reduces the storage from 1.9 GB to 783 MB for the entire Topex mission dataset.

In short, it is very useful to play with these filters to compress the data more efficiently.

Ref:
http://alimanfoo.github.io/2016/09/09/zarr-2-groups-filters.html
http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
