Blog post: Cloud‑native pipelines for scientific data processing with prefect and dask

andreirusu · September 24, 2025, 10:21am

Hey everyone!

I hope you don’t mind the self-promo too much, but I recently published an extended entry-level tutorial on building an end-to-end pipeline with Prefect and Dask. It uses xarray and Zarr on AWS S3 to convert raw hydroacoustic data from the NOAA NCEI archive.

The data we used is a small subset of the 2019 Fall Bottom Trawl Survey conducted by the Northeast Fisheries Science Center. It is a fisheries-independent, multi-species survey that provides the primary scientific data for fisheries assessments in the U.S. mid-Atlantic and New England regions.

The raw data was recorded using an EK60 scientific echosounder, which is a very common narrow-band split-beam sonar. The survey was conducted on board the Henry B. Bigelow.

To process the raw acoustic data, we used echopype, the primary Python-based open-source tool for fisheries acoustics data analysis.
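For anyone who wants a feel for the pattern before opening the tutorial, here is a minimal sketch of how these pieces could fit together. It is not the tutorial's actual code: the helper `zarr_target`, the builder `build_flow`, the flow and task names, and any file or bucket names are hypothetical, and it assumes `prefect`, `prefect-dask`, and `echopype` are installed. Each raw EK60 file becomes one Prefect task, and the tasks are fanned out on a Dask cluster.

```python
from pathlib import PurePosixPath


def zarr_target(raw_key: str, out_prefix: str) -> str:
    """Map a raw file path or key to the Zarr store it should be written to.

    Hypothetical naming scheme: one store per raw file, named after the file stem.
    """
    stem = PurePosixPath(raw_key).stem
    return f"{out_prefix.rstrip('/')}/{stem}.zarr"


def build_flow():
    """Build the flow; the heavy imports live here so the path helper
    above can be used and tested without Prefect or echopype installed."""
    import echopype as ep
    from prefect import flow, task
    from prefect_dask import DaskTaskRunner

    @task(retries=2)
    def convert(raw_path: str, out_prefix: str) -> str:
        # Parse one EK60 .raw file and write it out as a Zarr store.
        target = zarr_target(raw_path, out_prefix)
        echodata = ep.open_raw(raw_path, sonar_model="EK60")
        echodata.to_zarr(save_path=target, overwrite=True)
        return target

    # With no arguments, DaskTaskRunner starts a temporary local Dask cluster.
    @flow(task_runner=DaskTaskRunner())
    def convert_survey(raw_paths: list[str], out_prefix: str) -> list[str]:
        # Submit one task per raw file, then wait for all of them.
        futures = [convert.submit(path, out_prefix) for path in raw_paths]
        return [future.result() for future in futures]

    return convert_survey
```

Calling `build_flow()(["some_file.raw"], "converted")` would run the conversion locally (the file name here is a placeholder). Reading from or writing to S3 would additionally need the storage options that echopype accepts, which are left out of this sketch.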

More info on the dataset: NOAA Northeast Fisheries Science Center. 2019. ‘EK60 Water Column Sonar Data Collected During HB1906’. NOAA National Centers for Environmental Information. https://doi.org/10.25921/vt45-sa66

Thanks for reading!

jhamman · September 25, 2025, 1:22am

Thanks for sharing @andreirusu! Reminds me a lot of the early architecture of Pangeo-Forge, when we used Prefect.

P.S. I think you missed a chance to call the output data sonzarr.

andreirusu · September 25, 2025, 7:46am

That’s a good point @jhamman!

Pangeo-Forge does look quite similar. I guess Prefect (presumably v2) was used in the bakery?
I’m not a strong advocate of Prefect, but the folks at UW who are developing echopype are also using it (v3), so I’m curious why Pangeo-Forge dropped it, if you don’t mind sharing.

For us, a big bonus of using Prefect is its slick UI, which we can fork and extend. The prefect-dask runner also seems to be good enough.

jhamman · September 25, 2025, 4:04pm

Pangeo Forge switched to Apache Beam for the execution engine. I’m not sure it was worth it in the end, but it’s water under the bridge!
