This is an announcement, request for help and call to spread the word about the UK Met Office’s response to the UK Royal Society’s RAMP initiative (the same one mentioned by Tom Nicholas on this Discourse a few weeks ago). Details in this blogpost.
The UK Met Office have been working with a couple of groups of epidemiologists (UK and USA) to provide Met Office data to help understand any links between the spread of COVID-19 and environmental factors. To find out more please refer to our blogpost announcement. You can contact us at email@example.com, but if you want guidance on accessing the data or the platform please refer to the blogpost (our processes for granting access may change). You can also subscribe to our Google Groups mailing list for updates.
We have made available global hourly and daily meteorological data for 01 Jan 2020 - 12 April 2020*, as well as high resolution UK data. These are available as NetCDF files. Variables include air temperature, sunshine, specific humidity, precipitation and air pressure, but may include more in the future.
Part of our work has also involved collating daily spatial mean values for these variables for each reporting region in the UK and state county in the USA (as CSV files).
Microsoft are providing us storage on Azure blob storage and we aim to have this available on Azure Open Datasets soon.
*We will be updating with new data on a weekly basis.
In addition we have spun up our Pangeo instance on Microsoft Azure to allow scientists to process data themselves - https://covid19-response.informaticslab.co.uk/.
Request for help:
If anybody wants to provide some helping hands, please contact us! (firstname.lastname@example.org)
Due to limited bandwidth, we are currently seeking help with:
- Responding to user queries and requests for specific processing.
- Representing/storing the NetCDF data as Zarrs, and keeping them updated.
- Automating our data processing pipeline (NetCDF to CSV).
- Subsetting NetCDF with Shapefiles using Xarray (currently using Iris and Cartopy).
- Help/information on what the COVID-19 response community need/want.
And if you have ideas or comments please let us know!
Spread the word:
We are keen to spread the word amongst the scientific community. If you know any scientists working on COVID-19 who would be interested in access to meteorological data please share our blogpost or put them in contact with us on email@example.com.
Great work @kaedonkers! I’d be keen to help out as part of the team.
Great, thanks @NickMortimer! If you sign up to our Google Groups mailing list we can coordinate responses for those interested in helping. If you have any specific suggestions or offers of expertise please send us an email at firstname.lastname@example.org and we can go from there.
This is great! I’m working with a group that is interested in understanding the implications of the COVID-19 epidemic for air quality and climate responses to emissions perturbations. We are working to identify and prioritize a variety of data useful for this research, and it seems like the data we are interested in will overlap a lot with your group’s. We are having a call to discuss this on 4/28; perhaps you (or anyone interested) can join? (Email me.)
Hi @cgentemann, great to hear that you are interested in the data! While the data we have provided in this instance is specifically weather observation data, there is some work at the Met Office on air quality and climate responses as a result of the pandemic. If you email us on email@example.com with some information about your research and how to join the call we can coordinate the right people to attend the meeting. Thanks!
Regarding shapefiles and xarray: with regionmask https://github.com/mathause/regionmask it’s possible to get spatial aggregations within the boundaries set by a geopandas dataframe, and geopandas reads .shp shapefiles. If you have shapefiles, you are welcome to PM me and I can try to turn the xarray NetCDF data into regional spatial means for a given shapefile.
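To make the idea concrete, here is a rough sketch of the aggregation step only, on a synthetic grid. The `mask` below is hand-built and merely stands in for what a shapefile-derived mask might look like (one region number per grid cell, NaN outside every region); the variable names, the grid and the `regional_means` helper are all made up for illustration, and the cos(latitude) weighting is my assumption about how to approximate cell area on a regular lat/lon grid.

```python
import numpy as np
import xarray as xr

# Synthetic 2-degree grid standing in for one time step of a NetCDF field.
lat = np.arange(49.0, 61.0, 2.0)
lon = np.arange(-8.0, 4.0, 2.0)
temp = xr.DataArray(
    np.full((lat.size, lon.size), 10.0),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
    name="air_temperature",
)
# Make the eastern half warmer so the two regions give different answers.
temp = temp.where(temp.lon < -2, 14.0)

# Hypothetical region mask: region 0 in the west, region 1 in the east.
# A real mask would come from rasterising the shapefile polygons.
mask = xr.zeros_like(temp).where(temp.lon < -2, 1.0).rename("region")


def regional_means(field, mask):
    """Area-weighted mean of `field` for each region number found in `mask`."""
    # Assumption: cos(latitude) approximates relative cell area.
    weights = np.cos(np.deg2rad(field.lat))
    region_ids = np.unique(mask.values[~np.isnan(mask.values)])
    means = {}
    for region_id in region_ids:
        # Cells outside the region become NaN and are skipped by the mean.
        inside = field.where(mask == region_id)
        means[int(region_id)] = float(inside.weighted(weights).mean(("lat", "lon")))
    return means


means = regional_means(temp, mask)
print(means)
```

With real data the same loop should work per time step or over a whole `time` dimension, since the mean is only taken over `lat` and `lon`; in that case the `float(...)` conversion would need to go.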
@aaronspring @cgentemann @NickMortimer -
Thanks for the offers of help. Here are a few bits of work that would help; please shout if you’re keen to tackle any of them:
We are releasing the data under the Microsoft Open Datasets initiative. I’ve created an example of accessing and using the data here - https://github.com/informatics-lab/covid19-ai4earth-examples. It has a “launch in My Binder” link if you want to test it out. This could really do with some more examples to help the epidemiologist community, which is much less familiar with this type of data. It would be great to have some R examples too (I don’t know if that can also work with Binder?).
We have an academic from Penn State Uni doing some interesting research who could do with help processing the gridded data with shapefiles and population density information to get population-weighted regional aggregations. Get in touch if this is something you could help with.
Analytics on the platform. As you can see in this thread, Capturing / recording usage analytics, I’ve got some advice but haven’t had the time to implement it.
We are likely building more platforms, hopefully with Microsoft support, but a lot of the target users will be R users. We don’t have a lot of R expertise in our team, so if you can help or advise in this space let me know.
The data is in NetCDF files. It would be good to offer it as Zarr or another more appropriate cloud format, but we’ve not found the time to set up a pipeline to do that (we update the data approximately each week).
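For anyone picking up the population-weighted aggregation item above, this is roughly the arithmetic involved, as a minimal NumPy sketch. It assumes the population data has already been regridded onto the same grid as the weather field and that a boolean `in_region` array has been derived from the shapefile; the function name and the tiny arrays are invented for illustration and are not taken from the actual dataset.

```python
import numpy as np


def population_weighted_mean(values, population, in_region):
    """Mean of `values` over one region, weighted by population per cell."""
    values = np.asarray(values, dtype=float)
    population = np.asarray(population, dtype=float)
    # Cells outside the region, or with missing data, get zero weight.
    valid = np.asarray(in_region, dtype=bool) & ~np.isnan(values)
    weights = np.where(valid, population, 0.0)
    total = weights.sum()
    if total == 0:
        return float("nan")  # nobody lives in the valid cells of this region
    return float((np.where(valid, values, 0.0) * weights).sum() / total)


# Made-up 2x2 example: three cells in the region, one of them unpopulated.
temperature = np.array([[10.0, 12.0], [14.0, 16.0]])
population = np.array([[100.0, 300.0], [0.0, 600.0]])
in_region = np.array([[True, True], [True, False]])

result = population_weighted_mean(temperature, population, in_region)
print(result)  # (10*100 + 12*300) / 400 = 11.5
```

The unpopulated cell contributes nothing, which is the point of the weighting: the regional value reflects the conditions where people actually are rather than a plain spatial average.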
We are trying to spin up a project board here so feel free to add issues or comment there too.
Thanks for everyone’s support.
Hope you and yours are all keeping safe and well.
All the best.
@Theo_McCaie sorry, I have been busy. I could look at NetCDF to Zarr.
@NickMortimer sorry for the slow reply. We’d love you to take a look at converting to Zarr. The data is available on Azure Open Datasets, or you can go to the index page, whichever suits.
Things to consider:
- We update the dataset daily so we would need to update the Zarr too.
- We continue to add parameters, so we would need to be able to update the Zarr (or create a new one) to have additional parameters.
- There are some ‘gotchas’ in the data, I think. Nothing awful, but for example, the daily values all have the correct date while the time associated with that date may vary, i.e. the time stamp in one file would be 12:00 01/01/20 and in the next file might be 13:00 02/01/20, then maybe back to 12:00 again…
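One possible way to handle the time stamp gotcha and the regular updates, sketched with xarray. Flooring every time stamp to midnight is just my assumption about how the daily files could be reconciled (snapping to 12:00 would work equally well), and the helper names, the variable `t` and the two synthetic "files" are hypothetical. Adding new parameters later is not covered here and would need separate handling.

```python
import os

import pandas as pd
import xarray as xr


def normalise_daily_time(ds):
    """Snap each time stamp to midnight so daily files share one time axis."""
    floored = ds["time"].dt.floor("D")
    return ds.assign_coords(time=("time", floored.values))


def write_or_append(ds, store):
    """Create the Zarr store on first use, otherwise append along time.

    Not called below; it needs the zarr package and a writable path.
    """
    if os.path.exists(store):
        ds.to_zarr(store, append_dim="time")
    else:
        ds.to_zarr(store, mode="w")


# Two synthetic daily datasets with the inconsistent hours described above.
day1 = xr.Dataset(
    {"t": ("time", [1.0])},
    coords={"time": pd.to_datetime(["2020-01-01 12:00"])},
)
day2 = xr.Dataset(
    {"t": ("time", [2.0])},
    coords={"time": pd.to_datetime(["2020-01-02 13:00"])},
)

combined = xr.concat([normalise_daily_time(d) for d in (day1, day2)], dim="time")
print(combined["time"].values)
```

Each update would then be a call like `write_or_append(normalise_daily_time(new_day), "some_store.zarr")`, where the store path is a placeholder.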
So, would love you to take a look at this, I don’t think it’s trivial but it would be great if we could do it well.
Of course, feel free to drop us a line anytime you have issues.
@Theo_McCaie Hope all is going well. Sorry for my slowness, things have been crazy at work! Our end of financial year is fast approaching!