Arkouda will not be able to handle cases where the aggregate array memory exceeds the cluster memory, whereas Dask has no problem with this.
@rabernat: I’m interested in confirming something that I’ve only heard second-hand: Is it true that Dask array indices (and therefore sizes) are limited to 32-bit integers?
And then, out of curiosity: What size arrays do you work with (or want to work with) in practice? And is the sticking point for your computations typically the overall size of the arrays, or having large numbers of arrays? (or both?)
I ask because I’ve yet to run into use cases where a cluster’s memory size has been a limiting factor for array size in Arkouda/Chapel (though, granted, we tend to have access to fairly large systems at Cray/HPE; but then again, so do most people nowadays, via the cloud). And if the 32-bit limit is real, it seems that a user would typically hit that limit long before they’d run out of cluster memory?
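To make the question concrete, something like the following probe is what I have in mind (illustrative only; it assumes `dask.array` is installed, and the sizes/chunking are arbitrary choices on my part, not a statement about what Dask actually supports):

```python
# Hypothetical probe: build a lazy Dask array with more than 2**31 elements
# and see whether indexing near the top of that range is accepted.
import dask.array as da

n = 2**33                      # ~8.6 billion elements, well past a 32-bit index limit
x = da.ones(n, chunks=2**26)   # lazy construction; nothing is materialized yet

print(x.size)                  # total element count as reported by Dask
print(x[n - 1].compute())      # index with an offset that needs 64 bits to represent
```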
With Arkouda, that would be impossible (AFAIK).
With the current implementation, you’re right that things are not evaluated lazily. With effort, this could be added, but it hasn’t been a pain point for Arkouda’s user community that I’m aware of.
While I understand the principles of lazy evaluation, I’m not seeing what benefit it’s getting you here vs. eager evaluation. Can you clarify, maybe through an example, what the value-add is for your use cases?
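For reference, here's the rough eager-vs-lazy distinction I'm picturing, as a minimal illustrative sketch (the NumPy half stands in for Arkouda's eager, server-side evaluation; none of this is meant as the actual Arkouda API):

```python
import numpy as np
import dask.array as da

# Eager (NumPy-style, standing in for Arkouda): each expression runs immediately,
# allocating its result (and any temporaries) as soon as it is written.
a = np.arange(10**6)
eager = (a + 1) * 2            # (a + 1) is materialized, then the final result

# Lazy (Dask-style): expressions only build a task graph; nothing executes
# until compute() is called, at which point the scheduler runs the graph.
d = da.arange(10**6, chunks=10**5)
lazy = (d + 1) * 2             # still just a graph of deferred tasks
result = lazy.compute()        # execution happens here
```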