We’ve discussed serverless compute here and there over the years, but I thought it might be nice to start a dedicated topic. Here’s a timeline of some relevant activities I know of:
November 2017: AWS Workshop on Massive parallel data processing using pywren and AWS Lambda, featuring a Landsat8 NDVI processing example notebook! This used pywren.default_executor(). (It requires Python 2.7, and pywren has since been supplanted by Lithops.) Thanks to @RichardScottOZ for discovering this!
> … the future of distributed computation will make use of highly abstracted function execution services like Lambda. However, with the tools we have today, there are too many issues to get this working without major limitations.
May 2020: @tomwhite creates the Dask Executor Scheduler, a Dask scheduler that uses concurrent.futures.Executor to run tasks. He then used it for a Zarr rechunking workflow with pywren on GCP (also using rechunker).
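The appeal of building on concurrent.futures.Executor is that it is a tiny, pluggable interface: a thread pool, a process pool, or a serverless backend can all be swapped in without changing the task-submission code. Here is a minimal standard-library sketch of that pattern (my own illustration, not tomwhite's actual scheduler code):

```python
from concurrent.futures import Executor, ThreadPoolExecutor

def run_tasks(executor: Executor, func, inputs):
    """Run func over inputs on any Executor implementation."""
    # Submit each input as an independent task, then gather results in order.
    futures = [executor.submit(func, x) for x in inputs]
    return [f.result() for f in futures]

# Any Executor works here; a serverless executor could be dropped in instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_tasks(pool, lambda x: x * x, range(8))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because both pywren and Lithops expose executor-like map/submit semantics, this is roughly the seam where a serverless backend plugs into a Dask graph.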
November 2020: The GitHub project pywren-ibm-cloud becomes Lithops.
January 2021: Blog post by IBM discussing integration of Lithops with IBM Cloud: Using Serverless to Run Your Python Code on 1000 Cores by Changing Two Lines of Code.
It would be cool to have an updated workshop showing how this stuff can work for Pangeo workflows. Perhaps it would just take updating the old 2017 AWS workshop to use Python 3 and Lithops, via the Dask Executor Scheduler?
I look forward to seeing what people have to say who actually understand this stuff.