Cloud computing using NASA Earthdata with Earthdata login

Hey All,

Much NASA Earthdata, especially data from PO.DAAC, is now hosted on S3. Parallel computing is the key to unlocking the potential of hosting the data in the cloud. I could not find much information here, but has anybody used Dask (or MPI) on an AWS cluster directly against NASA Earthdata, which requires a (free) Earthdata Login (EDL)? How was your experience?

Jinbo


@scottyhq has an example in "Skip the download! Stream NASA data directly into Python objects" (by Scott Henderson, on the pangeo Medium blog). I’m not sure if that’s the state of the art or not, but it might get you started.


Thanks @scottyhq, it is useful, but it does not touch on Dask or parallel computing. Not sure whether that is because it is too trivial.


Hi, yes, that example is now a bit dated and doesn’t go into dask. You might also have a look at materials from the more recent ICESat-2 hackweek: Cloud Computing Tutorial — ICESat-2 Hackweek 2022

Not sure what dataset you’re interested in. Working with Dask on a single machine is straightforward because you can read credentials from a local file (~/.netrc), but on a distributed cluster you need some way of moving credentials (ideally temporary ones) across machines. Here is another example for raster data, using data from LP DAAC: CloudDAAC_Binders/s3_v_http.ipynb at main · rmg55/CloudDAAC_Binders · GitHub
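To make the credential-moving step concrete, here is a minimal sketch, not a tested recipe. It assumes the PO.DAAC temporary-credential endpoint shown below, assumes the JSON response carries `accessKeyId`, `secretAccessKey`, and `sessionToken`, and assumes the cluster runs in the same AWS region as the data. The helper names are made up for illustration. The idea is to log in once on the client (where ~/.netrc lives) and send the short-lived credentials to the workers as a plain dict.

```python
# Assumption: PO.DAAC's temporary-credential endpoint; other DAACs publish their own.
S3_CREDENTIALS_URL = "https://archive.podaac.earthdata.nasa.gov/s3credentials"


def fetch_temporary_credentials(url=S3_CREDENTIALS_URL):
    """Run on the client only: requests picks up the EDL login from ~/.netrc."""
    import requests  # lazy import so the pure helpers below work without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def to_storage_options(creds):
    """Map the (assumed) JSON keys onto s3fs.S3FileSystem keyword arguments."""
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


def read_first_bytes(s3_url, storage_options, nbytes=8):
    """Runs on a worker: builds its own filesystem from the dict it was sent."""
    import fsspec

    fs = fsspec.filesystem("s3", **storage_options)
    with fs.open(s3_url, "rb") as f:
        return f.read(nbytes)


def map_over_cluster(client, s3_urls, storage_options):
    """Ship the credentials with each task instead of relying on worker files.

    `client` is a dask.distributed.Client; extra keyword arguments given to
    Client.map are forwarded to the mapped function.
    """
    futures = client.map(read_first_bytes, s3_urls, storage_options=storage_options)
    return client.gather(futures)
```

Usage would be roughly `opts = to_storage_options(fetch_temporary_credentials())` followed by `map_over_cluster(client, urls, opts)`. The credentials are temporary, so a long-running job would need to fetch fresh ones and resubmit.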


I took a look at the tutorial. It is very useful for laying out the process. But it seemed to be more complicated than necessary. Is this the reason why not many existing tutorials use earthdata to demo large-scale computation? What is your experience @rabernat?

@scottyhq I am interested in using a cluster with Earthdata in general. MUR SST at 1 km is one example, because of its size.


Fortunately a completely public version of MUR exists outside the Earthdata enclave: Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) - Registry of Open Data on AWS.
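For the public copy, no Earthdata login should be needed, so every Dask worker can read it anonymously. A rough sketch follows, assuming the Zarr store lives at `s3://mur-sst/zarr` and that the SST variable is named `analysed_sst` with `time`, `lat`, and `lon` coordinates; please check both against the registry page. The function names are just for illustration.

```python
# Assumption: bucket and prefix as listed on the Registry of Open Data page.
MUR_ZARR_URL = "s3://mur-sst/zarr"


def anonymous_storage_options():
    """s3fs option for unsigned requests, so no credentials reach the workers."""
    return {"anon": True}


def open_public_mur(url=MUR_ZARR_URL):
    """Open the store lazily; the result is backed by Dask arrays."""
    import xarray as xr  # lazy import; also needs s3fs and zarr installed

    return xr.open_zarr(
        url,
        storage_options=anonymous_storage_options(),
        consolidated=True,
    )


def regional_mean_sst(ds, lat_bounds, lon_bounds, time_bounds):
    """Build a lazy area-mean time series; nothing is read until .compute()."""
    subset = ds["analysed_sst"].sel(  # assumed variable name
        time=slice(*time_bounds),
        lat=slice(*lat_bounds),
        lon=slice(*lon_bounds),
    )
    return subset.mean(dim=("lat", "lon"))
```

With a Dask client already running, calling `.compute()` on the result of `regional_mean_sst(open_public_mur(), (20, 30), (-160, -150), ("2019-01-01", "2019-01-31"))` would spread the chunk reads over the workers.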

I don’t have any insights on Earthdata login that Scott and Tom do not. We have usually been able to make it work by using fsspec to pass the credentials through.

It would be great if someone would write a definitive guide on how to use Earthdata data with Pangeo. Why not just as a forum post here on Discourse?

To be honest, this is an issue to raise with NASA, not Pangeo. NASA has created a pretty complicated wall around their data with Earthdata login. We are doing our best to deal with it given those constraints.


A belated reply that this is on my and @betolink’s long to-do list, because it’s a pain point for icepyx as well (I know Luis’ earthdata library handles S3 authentication, but I'm not sure if it’s set up for parallel computing). I’m anticipating that I can dig in on this sometime this summer (we have a very crude start here).


@JessicaS11 and @betolink, I’ve also had this on my to-do list. I’d be happy to team up this summer if you want to organize a mini-hackathon of sorts. :smile:
