Efficient access of ensemble data on AWS

Hi @Peter_Marsh - thanks for your help! This is very useful and exactly what I was looking for. And thank you for your work on the kerchunk project as a whole.

The first version of your code worked for me with no problems. Creating the virtual dataset is faster than concatenating the xarray datasets.

I tried to get the second snippet you posted working, but ran into a problem here:

mzz = MultiZarrToZarr(flist, 
                    remote_protocol='s3',
                    remote_options={'anon':True},
                    coo_map={'ensemble' : ex},
                    concat_dims = ['ensemble'],
                    identical_dims = ['feature_id', 'reference_time', 'time'],
                     )
out = mzz.translate()

I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [40], in <cell line: 8>()
      1 mzz = MultiZarrToZarr(flist, 
      2                     remote_protocol='s3',
      3                     remote_options={'anon':True},
   (...)
      6                     identical_dims = ['feature_id', 'reference_time', 'time'],
      7                      )
----> 8 out = mzz.translate()

File ~/python3/miniconda3/envs/rain2/lib/python3.10/site-packages/kerchunk/combine.py:394, in MultiZarrToZarr.translate(self, filename, storage_options)
    392 """Perform all stages and return the resultant references dict"""
    393 if 1 not in self.done:
--> 394     self.first_pass()
    395 if 2 not in self.done:
    396     self.store_coords()

File ~/python3/miniconda3/envs/rain2/lib/python3.10/site-packages/kerchunk/combine.py:200, in MultiZarrToZarr.first_pass(self)
    198 z = zarr.open_group(fs.get_mapper(""))
    199 for var in self.concat_dims:
--> 200     value = self._get_value(i, z, var, fn=self._paths[i])
    201     if isinstance(value, np.ndarray):
    202         value = value.ravel()

File ~/python3/miniconda3/envs/rain2/lib/python3.10/site-packages/kerchunk/combine.py:150, in MultiZarrToZarr._get_value(self, index, z, var, fn)
    148     o = selector[index]
    149 elif isinstance(selector, re.Pattern):
--> 150     o = selector.match(fn).groups[0]  # may raise
    151 elif not isinstance(selector, str):
    152     # constant, should be int or float
    153     o = selector

TypeError: 'builtin_function_or_method' object is not subscriptable
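
The last frame of the traceback points at `selector.match(fn).groups[0]`. In Python's `re` module, `groups` is a method on the match object, so indexing it without calling it raises this kind of TypeError. Below is a minimal sketch outside kerchunk, where the pattern and file name are made-up stand-ins (not the real `ex` or an entry of `flist`):

```python
import re

# Hypothetical stand-ins for illustration only: not the real `ex`
# regex and not an actual path from `flist`.
pattern = re.compile(r".*mem(\d+)")
fn = "s3://example-bucket/mem3/file.nc"

m = pattern.match(fn)

# `groups` is a method; subscripting it without calling it fails.
try:
    m.groups[0]
except TypeError as err:
    message = str(err)
    print(message)

# Calling the method, or using group(1), returns the captured text.
first = m.groups()[0]
same = m.group(1)
print(first, same)
```

If that reading is right, the cause could be a missing call in that line of the installed `combine.py` rather than something wrong with the inputs, though that is a guess from the traceback alone.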

I ran this in a separate environment with the latest packages installed, and everything appears to work up to this point. Can you help me out again?