Astronomical Data Analysis Software and Systems (ADASS) Conference

Hi everyone,

I plan to submit something around Pangeo and CNES stuff for ADASS. I’ll recycle some slides I have already used. I think the astronomical community will benefit, and actually already is benefitting, from the Pangeo ecosystem, judging by some examples I’ve seen.

I wonder if anyone else would be interested?

You might check in with Stuart Mumford (stuart@cadair.com). He did some early work with Pangeo and knows some other people in this space.

Despite some early activity, Pangeo has never gotten much traction in astro applications. I never understood why. It may have to do with the dependence on astropy and the FITS format. Neither of these is particularly compatible with the Pangeo (xarray + dask + netCDF/zarr) stack.

I was wondering if this early work had been continued. It is a pity it has not…

I tried to re-run https://github.com/pangeo-data/pangeo-astro-examples. The notebook starts (which I was grateful for), but it still uses dask-kubernetes, and I think this no longer works (I have not been able to follow all the dask-gateway and hub refactoring work!).

Anyway, I’ve submitted something with recycled content:
https://adass2020.es/adass2020/talk/review/SKDCEQ8RMRUGZMHLZCFEA9Y8HMNKV97Y

I’ll try to contact Stuart Mumford and Will Barnes.

Hi @geynard! Thanks for roping us in on this.

Ryan is basically right. The layout of our data is certainly an impediment to the sort of workflows that Pangeo has enabled in the Earth sciences, particularly in the cloud. We are tied to the FITS format for the foreseeable future, and there has not been a ton of progress in understanding how this data format can be used with more scalable compute, either on HPC or in the cloud.

I cannot speak much for the nighttime astronomy community, but there is definitely interest in the solar physics community, especially among ECRs, in these types of workflows. If you’re looking for examples to add to a talk, I gave a poster on using Dask (and dask-jobqueue) to accelerate EUV image analysis on HPC (specifically NASA Pleiades). You can find the ADS entry here and a PDF of the poster here. A lot of these ideas were born out of the early work that Stuart and I, assisted by Matt and others, did with the demo Pangeo cluster a few years ago.

Also, I tried to click the link to your talk abstract but it is giving me a 404.

Thanks @wtbarnes for your inputs.

It seems the sharing link provided is not working, but there’s not much I can do. Anyway, there’s not much in it. I’ll have a close look at your resources, thanks!

There’s a lot of interest at CNES from astronomy missions like LISA or ATHENA, at least for Jupyter but also for scalable compute. And I think pointing out the problem of the FITS format is important.

Note that intake-astro offers dask-parallel access to uncompressed FITS tables and arrays, so long as they are simple types (i.e., no embedded arrays in rows). The pangeo-astro-derived example notebook above ( @wtbarnes ) used a similar, but more specific, approach. Yes, this is outside of typical xarray, but the general needs are rather similar to Pangeo’s. The one big difference, really, is the use of generalised, curved coordinate systems specified as analytic matrices (WCS), rather than arrays of coordinate labels. That ends up not mattering for many of the processing tasks that astros actually do, such as merging many images into a master image with rejection.
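
As a rough illustration of that last point, here is a minimal sketch of a per-pixel stack with rejection. It assumes the frames are already aligned on a common pixel grid, so WCS never enters. It uses plain NumPy rather than intake-astro or dask, and the function name `stack_with_rejection`, the median/MAD clipping rule, and the synthetic frames are all my own choices for the example, not taken from the notebook above.

```python
import numpy as np


def stack_with_rejection(images, n_sigma=3.0):
    """Combine aligned images into one master image, rejecting outliers.

    images: array-like of shape (n_images, ny, nx), assumed to share a pixel grid.
    """
    cube = np.asarray(images, dtype=float)
    # Robust per-pixel centre and spread, computed across the image axis.
    centre = np.median(cube, axis=0)
    mad = np.median(np.abs(cube - centre), axis=0)
    sigma = 1.4826 * mad  # scales the MAD to a Gaussian standard deviation
    # Flag samples lying more than n_sigma from the per-pixel median.
    rejected = np.abs(cube - centre) > n_sigma * sigma
    # Average the surviving samples at each pixel (NaN if none survive).
    stacked = np.ma.masked_array(cube, mask=rejected).mean(axis=0)
    return stacked.filled(np.nan)


# Synthetic data: nine noisy frames of a flat field, one with a bright outlier.
rng = np.random.default_rng(0)
frames = 100.0 + rng.normal(0.0, 1.0, size=(9, 32, 32))
frames[4, 10, 10] += 500.0  # simulated cosmic-ray hit in a single frame

master = stack_with_rejection(frames)
naive = frames.mean(axis=0)  # plain mean, for comparison
```

Every step here is a reduction along the image axis only, so each pixel is independent of its neighbours. That is why this kind of task should split naturally into chunks over the image plane, without any coordinate labels.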

I know there are, or have been, efforts in astropy to use dask directly too, but I don’t know the current status there.

I’d be happy to offer some thoughts from an ex-astro’s point of view, but I probably can’t contribute effort.