Astronomical Data Analysis Software and Systems (ADASS) Conference

Hi everyone,

I plan to submit something about Pangeo and CNES work to ADASS. I’ll recycle some slides I already have. I think the astronomical community will benefit, and actually already is benefiting, from the Pangeo ecosystem, judging by some examples I’ve seen.

I wonder if anyone else would be interested?

You might check in with Stuart Mumford (stuart@cadair.com). He did some early work with Pangeo and knows some other people in this space.

Despite some early activity, Pangeo has never gotten much traction in astro applications. I never understood why. It may have to do with the dependence on astropy and the FITS format. Neither of these is particularly compatible with the Pangeo (xarray + dask + netCDF/zarr) stack.

I was wondering whether this early work had been continued; it is a pity it has not…

I tried to re-run https://github.com/pangeo-data/pangeo-astro-examples. The notebook starts (which I was grateful for), but it still uses dask-kubernetes, and I think this no longer works (I’ve not been able to follow all the dask-gateway and hub refactoring work!).

Anyway, I’ve submitted something with recycled content:
https://adass2020.es/adass2020/talk/review/SKDCEQ8RMRUGZMHLZCFEA9Y8HMNKV97Y

I’ll try to contact Stuart Mumford and Will Barnes.

Hi @geynard! Thanks for roping us in on this.

Ryan is basically right. The layout of our data is certainly an impediment to the sort of workflows that Pangeo has enabled in the Earth sciences, particularly in the cloud. We are tied to the FITS format for the foreseeable future, and there has not been a ton of progress toward understanding how this data format can be used with more scalable compute, either on HPC or in the cloud.

I cannot speak much for the nighttime astronomy community, but there is definitely interest in the solar physics community, especially among ECRs, in these types of workflows. If you’re looking for examples to add to a talk, I gave a poster on using Dask (and dask-jobqueue) to accelerate EUV image analysis on HPC (specifically NASA Pleiades). You can find the ADS entry here and a PDF of the poster here. A lot of these ideas were born out of the early work that Stuart and I, assisted by Matt and others, did with the demo Pangeo cluster a few years ago.

Also, I tried to click the link to your talk abstract but it is giving me a 404.

Thanks @wtbarnes for your inputs.

It seems the sharing link I provided is not working, but there’s not much I can do. Anyway, there’s not much in it. I’ll take a close look at your resources, thanks!

There’s a lot of interest at CNES from astronomy missions like LISA or ATHENA, at least for Jupyter but also for scalable compute. And I think pointing out the problem of the FITS format is important.

Note that intake-astro offers dask-parallel access to uncompressed FITS tables and arrays, so long as they are simple types (i.e., no embedded arrays in rows). The pangeo-astro derived example notebook above ( @wtbarnes ) used a similar, but more specific approach. Yes, this is outside of typical xarray, but the general needs are rather similar to Pangeo’s. The one big difference, really, is the use of generalised, curved coordinate systems specified as analytic matrices (WCS), rather than arrays of coordinate labels. That ends up not mattering for many of the processing tasks that astros actually do, such as merging many images into a master image with rejection.
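To make the WCS point concrete, here is a minimal sketch of the difference between an analytic coordinate description and per-pixel coordinate labels. The header values are made up for illustration, the function names are hypothetical, and only the linear part of a FITS WCS (reference pixel, reference value, CD matrix) is shown. Celestial axes would also need a spherical projection step, which is skipped here, so this is not a substitute for astropy’s WCS handling.

```python
# Hypothetical header values for a 4 x 3 pixel image (not from any real file).
CRPIX = (2.0, 1.5)      # reference pixel, 1-based as in FITS
CRVAL = (150.0, 2.0)    # world coordinate at the reference pixel
CD = ((-0.001, 0.0),    # linear transformation matrix, world units per pixel
      (0.0, 0.001))


def pixel_to_offset(px, py):
    """Linear part of a WCS: pixel offset from CRPIX multiplied by the CD matrix."""
    dx, dy = px - CRPIX[0], py - CRPIX[1]
    return (CD[0][0] * dx + CD[0][1] * dy,
            CD[1][0] * dx + CD[1][1] * dy)


def pixel_to_world_flat(px, py):
    """Add CRVAL to the linear offsets.

    Assumption: axes are treated as purely linear. The spherical projection
    used for sky coordinates (e.g. TAN) is deliberately left out.
    """
    ox, oy = pixel_to_offset(px, py)
    return (CRVAL[0] + ox, CRVAL[1] + oy)


# Label-style equivalent: evaluate the analytic mapping once per pixel and
# store the results explicitly, which is closer to what xarray expects.
nx, ny = 4, 3
x_labels = [[pixel_to_world_flat(i, j)[0] for i in range(1, nx + 1)]
            for j in range(1, ny + 1)]
y_labels = [[pixel_to_world_flat(i, j)[1] for i in range(1, nx + 1)]
            for j in range(1, ny + 1)]
```

The analytic form needs only a handful of numbers regardless of image size, while the label form grows with the number of pixels (and becomes two-dimensional as soon as the CD matrix has a rotation).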

I know there are, or have been, efforts in astropy to use dask directly too, but I don’t know the current status there.

I’d be happy to offer some thoughts from an ex-astro’s point of view, but I probably can’t contribute effort.

Hi all,

For information, the talk has been accepted. And I think I have a link that is working:

If anyone is interested in giving the talk with me, it would be a pleasure. It is actually a 30-minute demo (I think I’ll do a 10-minute slide introduction, and then an actual demonstration).

Thanks so much for sharing @geynard!

We miss you! :heart_eyes:

I’m still around, in the shadows! I’m following every step of Pangeo (so beware :smile:) even if I don’t find much time to actively contribute.

I still try to talk about Pangeo on any occasion, like ADASS, or the Teratec (French HPC community) forum next week.

I hope I can make it to the weekly meetings from time to time, to let you know that Pangeo use is growing more and more at CNES and with our local (French, for the time being) partners!

Hey all, I’ve put together a paper for the proceedings:

@wtbarnes it would be awesome if you could have a look at it.

@TomAugspurger @wtbarnes, it would be even more awesome if we could manage to make the solar example work on Binder!

For information, I see a number of other talks around the Pangeo ecosystem, or at least Dask:

I see we already had a discussion with Simon Perkins in https://github.com/pangeo-data/pangeo/issues/255.

Hi All - sorry for reviving the thread here, but it has been super helpful. We have a couple of proposals in or going in related to WCS-Xarray and FITS-Xarray. We are trying to work with @jbednar to get WCS into Xarray, and are further looking to adapt Xarray to work more efficiently with FITS if it’s possible. NASA is hoping to put 8 PB in the cloud, with much of it WCS-dependent, so it is pretty critical we get this sorted ASAP. Most of it will likely be FITS, hence we need an angle/efficiency there.
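As a rough illustration of why uncompressed FITS arrays can, in principle, be read in independent chunks: the FITS standard lays a file out in 2880-byte blocks, with the header as 80-character cards ending in an END card, followed by big-endian pixel data whose first axis varies fastest. So the byte range of any row can be computed from the header alone. The sketch below uses only the standard library and a tiny synthetic HDU built in memory. All helper names are hypothetical, and this is not how intake-astro, astropy or Xarray implement FITS access.

```python
import struct

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header record ("card") is 80 ASCII characters


def make_card(key, value=None):
    """Format one header card; values are right-justified to column 30."""
    text = f"{key:<8}" if value is None else f"{key:<8}= {value:>20}"
    return text.ljust(CARD).encode("ascii")


def build_primary_hdu(width, height):
    """Build a tiny synthetic primary HDU with 16-bit pixels 0, 1, 2, ..."""
    cards = [make_card("SIMPLE", "T"), make_card("BITPIX", 16),
             make_card("NAXIS", 2), make_card("NAXIS1", width),
             make_card("NAXIS2", height), make_card("END")]
    header = b"".join(cards)
    header += b" " * (-len(header) % BLOCK)    # pad header with spaces
    data = b"".join(struct.pack(">h", v) for v in range(width * height))
    data += b"\0" * (-len(data) % BLOCK)       # pad data with zeros
    return header + data


def parse_header(raw):
    """Return (header dict, data byte offset). Handles simple numeric/logical cards only."""
    header = {}
    for pos in range(0, len(raw), CARD):
        card = raw[pos:pos + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            # Data starts at the next block boundary after the END card.
            return header, -(-(pos + CARD) // BLOCK) * BLOCK
        if card[8:10] == "= ":
            header[key] = card[10:].split("/")[0].strip()
    raise ValueError("no END card found")


def row_byte_range(header, data_offset, row):
    """Byte range of one image row (0-based), computed without touching the data."""
    bytes_per_pixel = abs(int(header["BITPIX"])) // 8
    row_bytes = int(header["NAXIS1"]) * bytes_per_pixel
    start = data_offset + row * row_bytes
    return start, start + row_bytes


raw = build_primary_hdu(4, 3)
header, offset = parse_header(raw)
start, stop = row_byte_range(header, offset, 1)
second_row = struct.unpack(">4h", raw[start:stop])
```

Each worker could then fetch only its own byte range (for example with an HTTP range request against object storage). Compressed or variable-length data breaks this simple arithmetic, which is presumably why the uncompressed case is the easy one.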

FYI - we are building “Panhelio” in the footsteps of Pangeo. If anyone wants a tour of the JupyterHub, let me know! :slight_smile: @rabernat, I would like to sync up with you sometime!

@wtbarnes, if your Dask-HPC-AIA workflow uses the FDL dataset or something similar (reduced size?), perhaps we can work with it in the Panhelio Hub to test your workflow while we try to get WCS and Xarray to play together? No HPC as of yet, but we could at least play around with it. Your poster is helpful, along with the GitLab.

That sounds pretty cool!