Hi,

I hope I am using this forum in a correct way.

I am struggling with a thing that should be simple. Calculate a correlation!

But I have >300Gb of data and this gets very tricky.

I have a big matrix `A`

that has the following dimensions `(time,kx,ky) = (400,4096,4096)`

. The first thing I would like to do is to calculate an autocorrelation in time for each `kx`

and `ky`

, but I would also like to be able to do the same for cross-correlation, as I actually have many matrices like `A`

that I would like to correlate with.

To give an example, I will create an arbitrary smaller matrix `A`

that is basically a cosine in time adding a noise that depends on the magnitude of `k`

. In this example, the fake dataset will be loaded in the memory, but in my real-world example I am loading the datasets from netcdf files using `xr.open_dataset`

.

```
import xarray as xr
import numpy as np
time = xr.DataArray(np.arange(200), dims=["time"])
kx = xr.DataArray(np.arange(1000), dims=["kx"])
ky = xr.DataArray(np.arange(1000), dims=["ky"])
K = np.abs(kx + 1j*ky)
A = (
(np.cos(time)*np.exp(-time/50))+
((np.random.randn(time.size)*xr.ones_like(time)*K/2e3))
).assign_coords(time=time, kx=kx, ky=ky)
```

Anyone has any idea on what would be the fastest way to wrap `scipy.signal.correlate`

in a way that I could give any two matrices like `A`

and return a matrix `C(lag,kx,ky)`

without loading everything in the memory all at once?

The fastest way I could do was to select and load each `kx`

and then loop over `ky`

selecting each time series, running `scipy.signal.correlate`

and then saving into a `numpy.array`

that later becomes my `xarray.DataArray`

object `C(lag,kx,ky)`

.