Pipelines for downscaling CMIP6 data


Scientific Motivation

Statistical downscaling and post-processing have been part of both weather forecasting and climate modeling applications for decades. These statistical techniques are used to correct persistent biases in atmospheric model outputs. While many methods for statistical downscaling have been proposed over the years, few have been made open source. This has made it difficult to systematically explore the performance of different methods across the broad range of applications their output is used for. Furthermore, reproducing previous downscaling efforts has proved impossible in most cases and has led to many reinventions of the downscaling wheel.

The recent growth in popularity of machine learning, and deep learning in particular, is leading to new downscaling methods that apply machine learning techniques to climate model output. These approaches are appealing in part because they have the potential to capture higher-order relationships between modeled and observed climate. Additionally, the deep learning community has developed a productive culture of benchmarking, which has led to demonstrable improvements on target ML problems. We hope to bring the same culture to downscaling.

Proposed Hacking

We plan to focus on four areas of development:

  1. Development of end-to-end pipelines for applying downscaling methods to CMIP6 data
  2. Development of methods for comparing downscaled data against other downscaled climate data products
  3. Development of metrics to help inform the selection of different methods
  4. Implementation of new and existing downscaling methods in a common framework (this work is already underway)

Anticipated Data Needs

We plan to focus on downscaling the traditional target variables first (i.e. daily precipitation and temperature). However, if collaborators are interested in downscaling other variables, the toolset we hope to develop should support that. We have requested that daily precipitation, temperature, wind, humidity, etc. be made available before the workshop.

Anticipated Software Tools

Xarray, Dask, Scikit-learn, xsd (https://github.com/jhamman/xsd)

Note: xsd is a new tool we are developing to provide a consistent scikit-learn-like API on top of common downscaling approaches (e.g. BCSD, Analog-Regression, Quantile-Mapping). We are actively looking for collaborators on this project. More information on the xsd roadmap is available here.
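To make "a scikit-learn-like API on top of downscaling approaches" concrete, here is a minimal sketch of empirical quantile mapping written in the fit/predict style. This is NOT the xsd API: the class name `QuantileMapping`, the `n_quantiles` parameter, and the synthetic temperature data are all hypothetical and chosen only for illustration. It uses plain NumPy so it runs without xsd installed.

```python
import numpy as np


class QuantileMapping:
    """Hypothetical scikit-learn-style empirical quantile mapping.

    Illustrative only; this is NOT the xsd API.
    """

    def __init__(self, n_quantiles=100):
        self.n_quantiles = n_quantiles

    def fit(self, X, y):
        # X: modeled values for the training period; y: observed values
        X = np.asarray(X, dtype=float).ravel()
        y = np.asarray(y, dtype=float).ravel()
        probs = np.linspace(0.0, 1.0, self.n_quantiles)
        # The learned transfer function is a table of paired quantiles
        self.model_quantiles_ = np.quantile(X, probs)
        self.obs_quantiles_ = np.quantile(y, probs)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float).ravel()
        # Locate each value among the model quantiles and read off the
        # matching observed quantile; np.interp holds the end values
        # constant outside the training range.
        return np.interp(X, self.model_quantiles_, self.obs_quantiles_)


rng = np.random.default_rng(0)
obs = rng.normal(15.0, 3.0, size=5000)    # synthetic "observed" temperature
model = rng.normal(17.0, 5.0, size=5000)  # synthetic warm-biased, too-variable "model"

qm = QuantileMapping(n_quantiles=101).fit(model, obs)
corrected = qm.predict(model)
print(f"raw bias: {model.mean() - obs.mean():.2f}, "
      f"corrected bias: {corrected.mean() - obs.mean():.2f}")
```

Keeping learned state in trailing-underscore attributes set by `fit` follows the scikit-learn convention, which is what would let methods like this slot into pipelines and shared evaluation code.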

Desired Collaborators

Anyone interested in statistical downscaling, machine learning, or climate applications.


@jhamman I like this project idea, and the proposed hacking would be very useful for the community! I am particularly interested in downscaling precipitation and wind speed/direction, thus I would be happy to contribute.

I’m not very familiar with the time investment required to make the proposed hacking happen, but it seems ambitious for 3 days. Do you have specific goals for the hackathon?


Hi @eric_keenan. Glad to hear this seems interesting to you. My personal goals are to establish some momentum around the xsd project and to produce a few sample pipelines using relatively simple downscaling methods. I don’t expect to walk away from the event with all of CMIP6 downscaled but I hope we can show some significant progress in this area. Some specific hacking activities that may interest people:

  • build out a tool for evaluating and comparing multiple downscaling methods. Hopefully this takes the form of something like sklearn.metrics.
  • develop a benchmark problem to help the community measure the effectiveness of individual models
  • (re)implement a traditional downscaling method (e.g. BCSD, ARRM, QM) in a sklearn-like framework
  • develop a new downscaling pipeline using sklearn. Something like a RandomForestRegressor would be a great place to start.
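As a rough starting point for the RandomForestRegressor idea, the sketch below trains a forest to map coarse model values to local observations at a single point. Everything about the data is an assumption: the "coarse" and "obs" series are synthetic stand-ins, and the predictors (coarse temperature plus a sin/cos seasonal cycle) are just one plausible choice. The train/test split is temporal (`shuffle=False`) so the forest is scored on a later period than it was trained on.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_days = 3650
doy = np.arange(n_days) % 365

# Synthetic coarse-grid model temperature with a seasonal cycle
coarse = 10.0 + 8.0 * np.sin(2 * np.pi * doy / 365) + rng.normal(0.0, 2.0, n_days)
# Synthetic local "observations": a biased, nonlinear function of the coarse value
obs = 0.8 * coarse - 3.0 + 0.02 * coarse**2 + rng.normal(0.0, 0.5, n_days)

# Hypothetical predictors: coarse value plus seasonal-cycle encodings
X = np.column_stack([coarse,
                     np.sin(2 * np.pi * doy / 365),
                     np.cos(2 * np.pi * doy / 365)])

# Temporal split: train on the first 70% of days, test on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, obs, test_size=0.3, shuffle=False)

rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

# Baseline: use the raw coarse value directly as the local estimate
mae_raw = mean_absolute_error(y_test, X_test[:, 0])
mae_rf = mean_absolute_error(y_test, pred)
r2_rf = r2_score(y_test, pred)
print(f"MAE raw: {mae_raw:.2f}  MAE RF: {mae_rf:.2f}  R2 RF: {r2_rf:.3f}")
```

A real pipeline would swap in gridded training and observation data and fit per grid cell or on spatial features, but the fit/predict/score shape stays the same.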

All of these are fairly well contained efforts and significant progress could be made on them in a few days. I expect any complete downscaling efforts will happen after the hack though.

@jhamman Thanks for the clarification! Count me in. Do we have a slack channel?

I just created a slack channel and I’ve added this project to the signup spreadsheet.

What may be useful at this point is to hear from others what their specific downscaling applications look like. I’ll provide a brief description of what I’m going for:

  • Target variables: daily precipitation/temperature
  • Target region: CONUS+
  • Obs data: GMET.v2
  • Training data: ERA-5
  • Methods: benchmark methods including BCSD, ARRM, and QM, and new prototype methods such as RandomForestRegressor

A great outcome for me would be an end-to-end workflow that compares 3 or 4 methods using some basic metrics we can develop during the hack week.
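One possible shape for "some basic metrics", sketched under assumptions: each metric is a plain function with the `(y_true, y_pred)` signature that sklearn.metrics uses, and a small helper scores every method against the same observations. The metric names (`mean_bias`, `rmse`, `quantile_bias`), the helper `compare_methods`, and the synthetic precipitation-like data are all hypothetical. The "scaled" entry is a simple multiplicative correction standing in for a real method such as BCSD or QM.

```python
import numpy as np


# Hypothetical sklearn.metrics-style functions: each takes (y_true, y_pred)
def mean_bias(y_true, y_pred):
    return float(np.mean(y_pred) - np.mean(y_true))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def quantile_bias(y_true, y_pred, q=0.99):
    # Compares the upper tails of the two distributions (relevant for extremes)
    return float(np.quantile(y_pred, q) - np.quantile(y_true, q))


METRICS = {"bias": mean_bias, "rmse": rmse, "q99_bias": quantile_bias}


def compare_methods(y_true, predictions, metrics=METRICS):
    """Score each method's predictions with every metric."""
    y_true = np.asarray(y_true, dtype=float)
    return {
        name: {m: fn(y_true, np.asarray(pred, dtype=float))
               for m, fn in metrics.items()}
        for name, pred in predictions.items()
    }


rng = np.random.default_rng(1)
obs = rng.gamma(shape=2.0, scale=3.0, size=10_000)   # synthetic daily precip-like values
raw = obs * 1.3 + rng.normal(0.0, 1.0, obs.size)     # synthetic wet-biased "model"
scaled = raw * (obs.mean() / raw.mean())             # stand-in multiplicative correction

scores = compare_methods(obs, {"raw": raw, "scaled": scaled})
for name, s in scores.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

Keeping metrics as independent functions makes it easy to add tail- or spell-based metrics later without touching the comparison loop.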

Hi @jhamman and @eric_keenan. Here’s a brief description of what I’m going for:

  • Target variables: daily temperature and precip (but may just start off with temp)
  • Target region: global
  • Obs data: ERA-5
  • Methods: BCSD with possible mods at the tails

A great outcome for me would also be an end-to-end workflow for one method (BCSD), ideally one that shows how the tails differ depending on the type of quantile mapping used.
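To illustrate why the choice of quantile mapping matters at the tails, here is a hedged sketch comparing two tail treatments for values outside the historical model range, which is common in warmer future scenarios. The function `quantile_map` and its `tails` options are hypothetical: "clip" maps out-of-range values to the most extreme observed quantile, while "delta" carries the correction from the nearest end quantile forward as an additive shift. These are two of several possible choices and are not claimed to be what any particular BCSD implementation does. The data are synthetic.

```python
import numpy as np


def quantile_map(x, model_ref, obs_ref, n_quantiles=100, tails="clip"):
    """Empirical quantile mapping with a choice of tail handling (illustrative).

    tails="clip":  values beyond the training range map to the extreme observed quantile.
    tails="delta": beyond the range, apply the end-quantile correction
                   (obs_q - model_q) as an additive shift.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    mq = np.quantile(model_ref, probs)
    oq = np.quantile(obs_ref, probs)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.interp(x, mq, oq)  # np.interp clips to oq[0] / oq[-1] outside mq
    if tails == "delta":
        lo, hi = x < mq[0], x > mq[-1]
        out[lo] = x[lo] + (oq[0] - mq[0])
        out[hi] = x[hi] + (oq[-1] - mq[-1])
    elif tails != "clip":
        raise ValueError(f"unknown tails option: {tails!r}")
    return out


rng = np.random.default_rng(2)
obs_hist = rng.normal(20.0, 4.0, 5000)    # synthetic historical observations
model_hist = rng.normal(22.0, 4.0, 5000)  # synthetic warm-biased historical model
# One in-range value and one "future" value 3 degrees past the historical model max
future = np.array([22.0, model_hist.max() + 3.0])

clip = quantile_map(future, model_hist, obs_hist, tails="clip")
delta = quantile_map(future, model_hist, obs_hist, tails="delta")
print("clip: ", clip)
print("delta:", delta)
```

Both options agree inside the training range; they diverge only for out-of-range values, where "clip" discards the 3-degree excess and "delta" preserves it.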
