I’m following the Pangeo Forge recipe tutorial “Xarray-to-Zarr Sequential Recipe: NOAA OISST” to create a recipe for CEDA monthly daytime land surface temperature data, but I’m running into issues with pangeo-forge-recipes version 0.10.1 (installed from conda-forge).

Here’s my code (I’m using Python 3.11):

import os
from tempfile import TemporaryDirectory

import apache_beam as beam
import pandas as pd
import xarray as xr
from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.transforms import (

url_pattern = (
months = pd.date_range("1995-08", "2020-12", freq=pd.offsets.MonthBegin())
urls = tuple(url_pattern.format(time=month) for month in months)
pattern = pattern_from_file_sequence(urls, "time", nitems_per_file=1).prune(1)

temp_dir = TemporaryDirectory()
target_root =
store_name = "output.zarr"
target_store = os.path.join(target_root, store_name)

transforms = (
    | OpenURLWithFSSpec()
    | OpenWithXarray(file_type=pattern.file_type)
    | StoreToZarr(
        target_chunks={"time": 1, "lat": 5, "lon": 5},


with beam.Pipeline() as p:
    p | transforms  # type: ignore[reportUnusedExpression]

with xr.open_zarr(target_store) as ds:

NOTE: In my attempt to reduce the likelihood of a memory problem, my pattern is pruned to a single element. Unfortunately, this didn’t help.

When I run this, the process is eventually killed because it consumes an enormous amount of memory. I saw the Python process exceed 40 GB of memory (on my 16 GB machine), and it may well have gone beyond that while I wasn’t watching. It ran for about 3.5 hours:

$ time python
.../python3.11/site-packages/xarray/core/ SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(  # type: ignore[call-overload,misc]
Killed: 9

real    216m31.108s
user    76m14.794s
sys     90m21.965s
.../python3.11/multiprocessing/ UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I’m going to downgrade pangeo-forge-recipes to a version prior to the recently introduced breaking API changes to see if I encounter the same problem with the old API, but in the meantime, is there anything glaringly wrong with what I’ve written above that would cause the memory issue?
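
One thing that can be checked independently of the recipe is how many chunks the `target_chunks` setting above implies. The following is only a minimal sketch: the grid shape is an assumption (a hypothetical global 0.01-degree grid of 18000 x 36000 cells per time step), not something read from the dataset, and `chunk_count` is a helper made up for this illustration. It does not show that chunking is the cause of the memory use, only how quickly small chunks multiply.

```python
import math


def chunk_count(shape, chunks):
    """Return the number of chunks per dimension and the total chunk count."""
    # Each dimension is split into ceil(size / chunk_size) pieces.
    per_dim = {dim: math.ceil(shape[dim] / chunks[dim]) for dim in chunks}
    return per_dim, math.prod(per_dim.values())


# ASSUMPTION: hypothetical grid shape for one time step; replace it with the
# real dimension sizes of the dataset (e.g. from ds.sizes) before drawing
# any conclusions.
assumed_shape = {"time": 1, "lat": 18000, "lon": 36000}

# Chunking taken from the StoreToZarr call in the recipe above.
target_chunks = {"time": 1, "lat": 5, "lon": 5}

per_dim, total = chunk_count(assumed_shape, target_chunks)
print(per_dim)  # {'time': 1, 'lat': 3600, 'lon': 7200}
print(total)    # 25920000
```

Under that assumed shape, a single time step would be split into roughly 26 million chunks per variable, so comparing the real dimension sizes against the chunk sizes seems like a reasonable first check.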

Thanks for reporting this! Definitely not normal or expected behavior.

For Pangeo Forge Recipes support, you will likely get a much better response on the issue tracker than on this forum: Issues · pangeo-forge/pangeo-forge-recipes · GitHub

Thanks @rabernat. I wasn’t sure if I was just doing something wrong, so I didn’t want to open an issue against the repo in that case, but as you suggest, I’ll go that route.

Here’s the issue I created: Xarray-to-Zarr recipe runs out of memory · Issue #614 · pangeo-forge/pangeo-forge-recipes · GitHub