Hi Tom,
thanks for your answer - I'd already gotten that sense from reading the issues.
Ideally, I’d like to use xarray.apply_ufunc, which works just fine on the local cluster.
Actually, on the local cluster all of the “fixes” work - it’s only in the production setup on Fargate that it fails, and I can’t figure out a way around it.
To boil things down, I’ve tried this:
import dask.array as da
from dask_cloudprovider import FargateCluster
from distributed import LocalCluster, Client
from numba import guvectorize
...
@guvectorize(["void(uint8[:], uint8[:])"], "(n)->()", nopython=True)
def gf_test(data, out):
    ...

def main():
    arr = da.random.randint(0, 2, (10, 10, 100), dtype="uint8")
    task = arr.map_blocks(gf_test)
    result = client.compute(task).result()
    ...

if __name__ == "__main__":
    main()
This works fine when I test it on a local cluster like
cluster = LocalCluster(ip='0.0.0.0', n_workers=2)
client = Client(cluster)
but fails in production using a FargateCluster from dask-cloudprovider like
cluster = FargateCluster(...)
client = Client(cluster)
The same happens with external imports, etc.
Edit:
The difference clearly seems to be the cluster, but that goes well beyond my understanding of dask and/or distributed.
The error message I get is
AttributeError: module '__mp_main__' has no attribute 'gf_test'
as compared to the one you had in the issue, which was
AttributeError: module '__main__' has no attribute 'test_numba'
Maybe that helps?