Large Scale Networking (LSN) Workshop on Huge Data: A Computing, Networking and Distributed Systems Perspective

rabernat · February 11, 2020, 8:08pm

It would be great to have a Pangeo application to this workshop:
https://protocols.netlab.uky.edu/~hugedata2020/
Is anyone available to go to Chicago on the proposed dates?

Large Scale Networking (LSN) Workshop on Huge Data: A Computing, Networking and Distributed Systems Perspective

Sponsored by the National Science Foundation (NSF)

Chicago, IL, April 13 – 14, 2020

Co-located with the FABRIC Community Visioning Workshop

There is an ever-increasing demand in science and engineering, and arguably in all areas of research, for the creation, analysis, archiving, and sharing of extremely large data sets, often referred to as “huge data”. For example, the black hole image was produced from 5 petabytes of data collected by the Event Horizon Telescope over a period of 7 days. Scientific instruments such as confocal and multiphoton microscopes generate images on the order of 10 GB each, and the total size grows quickly as the number of images increases. The Large Hadron Collider generates 2000 petabytes of data over a typical 12-hour run. These data sets reside at the high end of the “big data” spectrum and can include data sets that grow continuously without bound. They are often collected from distributed devices (e.g., sensors), potentially processed on-site or in distributed clouds, and can be intentionally placed or duplicated at distributed sites for reliability, scalability, and/or availability. Data created through measurement, generation, and transformation at distributed locations is stressing the contemporary computing paradigm. Efficient processing, persistent availability, and timely delivery (especially over wide-area networks) of huge data have become critical to the success of scientific research.

While distributed systems and networking research has thoroughly explored the fundamental challenges and solution space for a broad spectrum of distributed computing models operating on large data sets, the sheer size of the data in question today far exceeds what prior research assumed. To date, most computing systems and applications operate on a clear separation between data movement and data computation: data is moved from one or more data stores to a computing system and then computed “locally” on that system. This paradigm consumes significant storage capacity at each computing system to hold the transferred data and the data generated by the computation, as well as significant time for data transfer before and after the computation. Looking forward, researchers have begun to discuss the potential benefits of a completely new computing paradigm that more efficiently supports “in situ” computation on extremely large data at unprecedented scales across distributed computing systems interconnected by high-speed networks, with high-performance data transfer functions integrated more closely into software (e.g., operating systems) and hardware infrastructure than they have been so far. Such a paradigm could remove bottlenecks to scientific discovery and engineering innovation by enabling much faster, more efficient, and more scalable computation across a globally distributed, highly interconnected, and vast collection of data and computation infrastructure.
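To put the "significant time for data transfer" point in perspective, here is a rough back-of-envelope sketch (in Python, since that's what most of us in Pangeo use) of how long it would take just to move the 5 PB Event Horizon Telescope data set over a single wide-area link before any computation can start. The 100 Gbps link speed, the efficiency factor, and the function name `transfer_time_seconds` are my own illustrative assumptions, not figures or tools from the workshop call.

```python
# Back-of-envelope estimate of wide-area transfer time for a huge data set.
# Assumptions (illustrative only): decimal SI units, a single dedicated link,
# and a constant fraction ("efficiency") of the nominal link rate achieved.

PETABYTE = 1e15  # bytes, decimal (SI) petabyte
SECONDS_PER_DAY = 86_400


def transfer_time_seconds(size_bytes: float, link_gbps: float,
                          efficiency: float = 1.0) -> float:
    """Return idealized transfer time: bits / (link rate * efficiency)."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # 5 PB (the EHT figure from the call) over an assumed 100 Gbps link,
    # at full and at half of the nominal rate.
    for eff in (1.0, 0.5):
        t = transfer_time_seconds(5 * PETABYTE, link_gbps=100, efficiency=eff)
        print(f"efficiency={eff:.0%}: {t:,.0f} s, "
              f"about {t / SECONDS_PER_DAY:.1f} days")
```

Under these assumptions it prints roughly 4.6 days at 100% efficiency and 9.3 days at 50%, before a single byte has been analyzed, which is exactly the cost the "in situ" paradigm is trying to avoid.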

This workshop intends to bring together domain scientists, network and systems researchers, and infrastructure providers to understand the challenges and requirements of “huge data” science and engineering research and to explore new paradigms for processing, storing, and transferring huge data. Topics of interest include, but are not limited to:

huge data applications, requirements and challenges
challenges of designing and working with devices for huge data generation
storage systems for huge data
software systems and network protocols for huge data
in-network computing/storage for huge data
software-defined networking and infrastructure for huge data
infrastructure support for huge data
debugging and troubleshooting of huge data infrastructure
AI/ML technologies for huge data
measurement of huge data transfer and computation
scientific workflows for huge data
access to (portions of) huge data sets
protecting/securing (portions of) huge data sets

Submission of White Papers

Individuals interested in attending should submit a 1-2 page white paper that addresses a problem related to huge data transfer and processing. White papers should be submitted as PDF attachments by email to hugedata@netlab.uky.edu no later than February 15, 2020.

Registration and Travel Grant

A limited number of travel grants are available to support attendance at the workshop by authors of accepted white papers. Registration and travel grant application information can be found under the “Registration/Travel Grant” tab at the top of the workshop web page. The deadline is February 25, 2020.

Important Dates

Deadline for submission of white papers:	February 15, 2020
Acceptance notification:	February 20, 2020
Registration and travel grant applications:	February 25, 2020
Notification of travel grant approval:	March 1, 2020
Workshop dates:	April 13-14, 2020

Organizing Committee

Kuang-Ching Wang, Clemson University
James Griffioen, University of Kentucky
Ronald Hutchins, University of Virginia
Zongming Fei, University of Kentucky
