If conservative aggregation of arbitrary polygons at a given finite resolution is acceptable, I think an approach based on s2geometry (or H3) cells could offer large speed-ups. Both libraries support resolutions ranging from very coarse to very fine (cf. @darothen’s comment).
Like xESMF, a solution based on s2geometry or H3 would also introduce a fairly heavy dependency, though no heavier than GEOS, which shapely relies on.
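To make the idea concrete, here is a rough sketch of cell-based aggregation in pure Python. It does not use s2geometry or H3: a planar square grid stands in for their cells, and every name in it (`cell_id`, `polygon_to_cells`, `aggregate`) is made up for illustration. The point is only that once polygons and data share a discrete cell index, aggregation reduces to set lookups rather than geometric intersections.

```python
def cell_id(x, y, size):
    """Map a point to the id of the square cell containing it.

    Hypothetical stand-in for an S2 or H3 cell id at a fixed resolution.
    """
    return (int(x // size), int(y // size))


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_cells(ring, size):
    """Cover a polygon with the cells whose centres fall inside it."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    i0, j0 = cell_id(min(xs), min(ys), size)
    i1, j1 = cell_id(max(xs), max(ys), size)
    cells = set()
    for i in range(i0, i1 + 1):
        for j in range(j0, j1 + 1):
            cx, cy = (i + 0.5) * size, (j + 0.5) * size
            if point_in_polygon(cx, cy, ring):
                cells.add((i, j))
    return cells


def aggregate(cell_values, polygons, size):
    """Mean of the cell values covered by each polygon (None if no data)."""
    result = {}
    for name, ring in polygons.items():
        cells = polygon_to_cells(ring, size)
        vals = [cell_values[c] for c in cells if c in cell_values]
        result[name] = sum(vals) / len(vals) if vals else None
    return result


# Toy data: a 4 x 4 grid of unit cells whose value equals the column index.
values = {(i, j): float(i) for i in range(4) for j in range(4)}
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
means = aggregate(values, {"square": square}, size=1.0)
```

Because the cells here all have the same area, the plain mean is already area-weighted. Real S2 or H3 cells are only approximately equal in area, so an actual implementation would probably weight each cell by its area, and the accuracy along polygon boundaries would depend on the chosen resolution.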