Like some (or even many?) of you, I work at a university.
In my research group, we use the university HPC for most of our model simulations, and then use our own servers to archive and host the data, since it is difficult to get enough storage on the HPC machines we use. From there, we use the Pangeo stack of tools to analyze the model results.
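For concreteness, here is a minimal sketch of what that analysis step looks like with xarray, the core of the Pangeo stack. The variable names, dimensions, and synthetic data below are purely illustrative stand-ins for archived model output, and it assumes numpy, pandas, and xarray are installed:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a model-output dataset pulled from the archive.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
ds = xr.Dataset(
    {"sst": (("time", "lat", "lon"), np.random.rand(24, 5, 8))},
    coords={
        "time": time,
        "lat": np.linspace(-2.0, 2.0, 5),
        "lon": np.linspace(0.0, 7.0, 8),
    },
)

# A typical reduction: a monthly climatology over the two-year record.
clim = ds["sst"].groupby("time.month").mean("time")
print(clim.sizes)
```

In practice the `xr.Dataset` would come from `xr.open_dataset` (or `open_zarr` for cloud-friendly chunked stores) pointed at the archive, and the same reduction code works unchanged whether the data sit on a local server, the HPC filesystem, or cloud object storage.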
We use the university HPC machines because they are free (for us), resources are plentiful for researchers with funded projects, and the High Performance Research Computing group at TAMU is fantastic. But I still wonder if there are changes I could make to improve the workflow. Keep the data on the cluster? Move the data to the cloud instead of local machines? Put all my data on a desktop RAID array?
I would like to start a discussion about this sort of workflow. I’m interested to hear what others do: what works, and what doesn’t? To inspire some discussion, see this Twitter thread (I laughed out loud at take number 2).