Altimetric Data

The altimetry data are on the “pangeo-cnes” bucket (gs://pangeo-cnes). These data cover the period from 31 December 1992 to 13 May 2019. The list of available missions (datasets) is as follows:

  • al: Altika
  • alg: Altika Drifting Phase
  • c2: Cryosat-2
  • e1: ERS-1
  • e1g: ERS-1 Geodetic Phase
  • e2: ERS-2
  • en: ENVISAT
  • enn: ENVISAT Extension Phase
  • g2: Geosat Follow-On
  • h2: Haiyang-2A
  • j1: Jason-1
  • j1g: Jason-1 Geodetic Phase
  • j1n: Jason-1 Interleaved
  • j2: OSTM/Jason-2
  • j2g: OSTM/Jason-2 Geodetic Phase
  • j2n: OSTM/Jason-2 Interleaved
  • j3: Jason-3
  • s3a: Sentinel-3A
  • s3b: Sentinel-3B
  • tp: Topex/Poseidon
  • tpn: Topex/Poseidon Interleaved

Access Example

import xarray
import gcsfs.mapping

# token=None falls back to the default Google credentials lookup
fs = gcsfs.GCSFileSystem(project='pangeo-cnes', token=None)
# map the Zarr store of the Altika ("al") mission from the bucket
gcsmap = gcsfs.mapping.GCSMap('pangeo-cnes/alti/al', gcs=fs)
# open it lazily as an xarray Dataset
ds = xarray.open_zarr(gcsmap)

This is fantastic! Thanks so much. I’m sure people will really love having access to these data.

We are trying to track all cloud datasets in our catalog: https://pangeo-data.github.io/pangeo-datastore/index.html

Could you make a PR to https://github.com/pangeo-data/pangeo-datastore to add entries for your new datasets?

The L3 altimetry data on the “pangeo-cnes” bucket have been updated. In this update, I modified the encoding settings so the dataset can be accessed more quickly: I used Zarr filters and other encoding options to significantly improve compression and read performance.

Since version 2 of Zarr, it has been possible to apply filters that encode the data before they are written. The idea is to transform the data in order to make them more compressible.
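As a minimal sketch of what this looks like with the Zarr Python API (in current Zarr 2.x releases the codecs live in the numcodecs package, and the array contents below are made up):

import numpy as np
import zarr
from numcodecs import Delta, Blosc

# made-up, monotonically increasing values standing in for real data
data = np.arange(0, 10_000_000, 7, dtype='i8')

# each filter transforms a chunk before the compressor sees it
z = zarr.array(data,
               chunks=1_000_000,
               filters=[Delta(dtype='i8')],
               compressor=Blosc(cname='zstd', clevel=5))
print(z.info)  # reports the filters used and the storage ratio achieved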

In the case of altimetry data, we have a time axis encoded as 64-bit integers, representing the date of each measurement to the nearest microsecond.
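As a sketch of how such an encoding can be requested from Xarray when writing to Zarr (the one-day time axis and the epoch in the units string are assumptions for illustration, not the actual dataset settings):

import pandas as pd
import xarray as xr

# hypothetical one-day time axis sampled at 1 Hz
times = pd.date_range('2019-01-01', periods=86_400, freq='s')
ds = xr.Dataset(coords={'time': times})

# store the axis as 64-bit integer microseconds since an epoch
encoding = {'time': {'dtype': 'int64',
                     'units': 'microseconds since 1985-01-01'}}
ds.to_zarr('example.zarr', mode='w', encoding=encoding)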

If the zarr.Delta filter is applied, the data are transformed to store only the delta between successive items, which reduces the entropy of the binary representation. For example, an array containing 45723 different dates for one day contains only 133 different values after applying the filter, which compresses much more efficiently. For the Topex mission (whose time axis represents nearly 10 years of data), this filter gives a compression factor of 62.6, versus 5 without it. In other words, to read a 1.2 GB time axis we only need to read 16.5 MB.
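A toy reconstruction of that effect (the strictly regular 1 Hz sampling below is made up, not the actual mission data): the raw values are all distinct, but after differencing a single value remains.

import numpy as np
import zarr
from numcodecs import Delta, Blosc

# made-up 1 Hz sampling expressed as int64 microseconds since the epoch
t0 = np.datetime64('2019-01-01', 'us').astype('i8')
times = t0 + np.arange(45_723, dtype='i8') * 1_000_000

print(np.unique(times).size)           # 45723 distinct raw values
print(np.unique(np.diff(times)).size)  # 1 distinct delta

z = zarr.array(times,
               filters=[Delta(dtype='i8')],
               compressor=Blosc(cname='zstd', clevel=5))
print(z.info)  # nbytes vs. nbytes_stored gives the compression factor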

It is also possible to chain several filters. For example, for the other variables, I used the zarr.FixedScaleOffset and zarr.Delta filters. The first filter packs the data using a scale factor and an offset, as is done in the CF conventions and by Xarray. The advantage of using this filter is that data read with Dask or Zarr are decoded natively. These filters raise the compression factor from 1.3 to 3.2. This is less impressive than for the time variable, but it reduces the storage needed for the entire Topex mission dataset from 1.9 GB to 783 MB.
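A minimal sketch of such a chain (the variable values and the 1 mm quantization below are assumptions for illustration, not the encoding actually used on the bucket):

import numpy as np
import zarr
from numcodecs import FixedScaleOffset, Delta, Blosc

# made-up, smoothly varying values standing in for an altimetry variable
rng = np.random.default_rng(0)
values = np.cumsum(rng.normal(0, 0.001, size=100_000))

# quantize to integers with a scale and offset (CF-style packing),
# then delta-encode the integers; filters are applied in this order
filters = [FixedScaleOffset(offset=0, scale=1000, dtype='f8', astype='i4'),
           Delta(dtype='i4')]
z = zarr.array(values, filters=filters,
               compressor=Blosc(cname='zstd', clevel=5))
print(z.info)

Reading the array back through Zarr or Xarray reverses the chain automatically, which is why the packed data are decoded natively.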

In short, it is very useful to play with these filters to compress the data more efficiently.

Ref:
http://alimanfoo.github.io/2016/09/09/zarr-2-groups-filters.html
http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
