Numba guvectorize and distributed

Hi,

I’m trying to compute a @guvectorized function over a cluster created by dask distributed, particularly dask-cloudprovider.

I’ve seen there are some open issues regarding this problem, among possible others

numba/numba/issues#4314
xgcm/fastjmd95/issues#1
dask/distributed/issues#3450

which don’t paint an optimistic picture that there’s a working solution or fix soon.

I was just wondering if anyone here has some more experiences with this, or a possible workaround that hasn’t been documented in the issues.

Thanks!

Val

1 Like

Hi Val,

I don’t believe that anyone is investigating this. IIRC, we loosely determined that we’d need changes to NumPy’s ufunc machinery to enable this.

Just to make sure, did you see the comment that using array.map_blocks(gufunc), rather than gufunc(array) should work?

Hi Tom,

thanks for your answer. I already got this sense reading the issues.

Ideally, I’d like to use xarray.apply_ufunc, which works just fine on the local cluster.

Actually, on the local cluster, all of the “fixes” work - just in the production setup on Fargate, it fails and I can’t figure out a way around it.

To boil things down, I’ve tried this:

import dask.array as da
from dask_cloudprovider import FargateCluster
from distributed import LocalCluster, Client
from numba import guvectorize
...

@guvectorize("(uint8[:], uint8[:])", "(n) -> ()", nopython=True)
def gf_test(data, out):
    ...

def main():

 arr = da.random.randint(0, 2, (10,10,100), dtype="uint8")
 task = arr.map_blocks(gf_test)

 result = client.compute(task).result()  

 ...

if __name__ == "__main__":
 main()

Which works fine when I tested it using a local cluster like

cluster = LocalCluster(ip='0.0.0.0', n_workers=2)
client = Client(cluster)

but fails in production using a FargateCluster from dask-cloudprovider like

cluster = FargateCluster(...)
client = Client(cluster)

Same with external imports etc …

Edit:

Clearly the difference seems to be the cluster, but that goes way beyond my understanding of dask and/or distributed.

The error msg I get is

AttributeError: module ‘mp_main’ has no attribute ‘gf_test’

as compared to the one you’ve had in the issue which was

AttributeError: module ‘main’ has no attribute ‘test_numba’

maybe that helps?