I hope I am using this forum in the right way. I am struggling with something that should be simple: calculating a correlation. But I have >300 GB of data, and that makes it tricky.
I have a big array A with dimensions (time, kx, ky) = (400, 4096, 4096). The first thing I would like to do is compute the autocorrelation in time for each (kx, ky), but I would also like to be able to do the same for cross-correlations, since I actually have many arrays like A that I would like to correlate against each other.
To give an example, I will create an arbitrary smaller array A that is basically a cosine in time plus noise whose amplitude depends on the magnitude of k. In this example the fake dataset lives in memory, but in my real-world case I am loading the datasets from netCDF files with xarray:
```python
import xarray as xr
import numpy as np

time = xr.DataArray(np.arange(200), dims=["time"])
kx = xr.DataArray(np.arange(1000), dims=["kx"])
ky = xr.DataArray(np.arange(1000), dims=["ky"])

K = np.abs(kx + 1j * ky)  # magnitude of the wavevector, dims (kx, ky)

A = (
    np.cos(time) * np.exp(-time / 50)
    + np.random.randn(time.size) * xr.ones_like(time) * K / 2e3
).assign_coords(time=time, kx=kx, ky=ky)
```
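(In the real case the step that creates A is just a lazy open, something like the line below; the file name and chunk sizes are made-up placeholders:)

```python
# hypothetical: lazily open the real ~300 GB dataset with dask chunks;
# "A.nc" and the chunk sizes are placeholders, not my real setup
A = xr.open_dataarray("A.nc", chunks={"kx": 256, "ky": 256})
```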
Does anyone have an idea of the fastest way to wrap scipy.signal.correlate so that I can give it any two arrays like A and get back an array C(lag, kx, ky), without loading everything into memory at once?
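For concreteness, here is a minimal sketch of the kind of wrapper I have in mind, built on xr.apply_ufunc to map the 1-D scipy.signal.correlate over kx and ky (whether this is actually a fast way to do it is exactly my question, and it assumes time sits in a single chunk):

```python
import scipy.signal
import xarray as xr

def xcorr(a, b):
    """Correlate a and b along "time" for every (kx, ky) point."""
    nlag = a.sizes["time"] + b.sizes["time"] - 1  # size of mode="full" output
    return xr.apply_ufunc(
        lambda x, y: scipy.signal.correlate(x, y, mode="full"),
        a, b,
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[["lag"]],
        vectorize=True,          # loop the 1-D correlation over kx and ky
        dask="parallelized",     # stay lazy if a and b are dask-backed
        output_dtypes=[float],
        dask_gufunc_kwargs={"output_sizes": {"lag": nlag}},
    )

C = xcorr(A, A)  # autocorrelation; dims come back as (kx, ky, lag)
```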
The fastest way I have found is to select and load one kx at a time, then loop over ky selecting each time series, run scipy.signal.correlate on it, and save the result into a numpy.array that later becomes my C(lag, kx, ky).
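Roughly, that approach looks like the sketch below (variable names are illustrative):

```python
import numpy as np
import scipy.signal

# "full" correlation of two length-n series has 2*n - 1 lags
nlag = 2 * A.sizes["time"] - 1
C = np.empty((nlag, A.sizes["kx"], A.sizes["ky"]))

for i in range(A.sizes["kx"]):
    slab = A.isel(kx=i).load()          # pull one kx slab into memory
    for j in range(A.sizes["ky"]):
        ts = slab.isel(ky=j).values     # one (time,) series
        C[:, i, j] = scipy.signal.correlate(ts, ts, mode="full")
```

This is the baseline I am trying to beat.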