The Met Office Informatics Lab has set up a dataset and a Pangeo platform to help COVID-19 researchers who are interested in relationships with environmental data. Details are in the blog post.
My question is: what is the best or easiest way to capture analytics on who is using our Pangeo instance, how often, for how long, and so on? It sits behind GitHub authentication, so users are not anonymous. At the moment I can look at the hub's admin page and see when people were last on the platform, but that's about it.
If you're using a recent enough version of JupyterHub, you can use the experimental telemetry / event-logging system to capture user server start and stop events. From those you can work out session lengths, etc. If you have a persistent hub PVC that stores the SQLite database, you can write the telemetry there with a FileHandler. Just make sure the PVC is big enough.
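A minimal sketch of the hub configuration, assuming the event-logging options described in the JupyterHub docs (`c.EventLog.handlers` and `c.EventLog.allowed_schemas`) and the `hub.jupyter.org/server-action` schema for server start/stop events. It also assumes the hub PVC is mounted at `/srv/jupyterhub`, as in the Zero to JupyterHub chart; the `events.log` file name is a hypothetical choice. This is a config fragment for `jupyterhub_config.py` (or a `hub.extraConfig` entry), where `c` is provided by JupyterHub, so it won't run on its own.

```python
# jupyterhub_config.py fragment: `c` is the config object JupyterHub provides.
import logging

# Write telemetry events as JSON lines to a file on the hub's persistent volume.
# ASSUMPTION: /srv/jupyterhub is where the hub PVC (and jupyterhub.sqlite) is
# mounted, as in Zero to JupyterHub; adjust the path for your deployment.
c.EventLog.handlers = [
    logging.FileHandler("/srv/jupyterhub/events.log"),
]

# Only record the hub's user-server start/stop events.
c.EventLog.allowed_schemas = [
    "hub.jupyter.org/server-action",
]
```

Each event should then land in the file as one JSON object per line, so pairing each user's start and stop events gives you session lengths.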
There is also a PR in the notebook package itself that should provide more events, but I don't think you need those.
I probably also have some scripts that parse JupyterHub logs and emit conformant telemetry logs. Let me see if I can find those.
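A rough sketch of that kind of script, under several assumptions. It assumes hub log lines look like `[I 2020-04-20 10:15:02.123 JupyterHub base:904] User alice took 5.123 seconds to start` (and `User alice server took ... seconds to stop`), that the hub logs in UTC, and that the output only needs the `action`, `username` and `servername` fields of the `hub.jupyter.org/server-action` schema plus the `__timestamp__`/`__schema__`/`__schema_version__` envelope. The function names `parse_hub_log` and `session_lengths` are hypothetical. Check all of these against your own hub logs and JupyterHub version before relying on them.

```python
import re
from datetime import datetime

# ASSUMED hub log format; verify against your own `kubectl logs` output.
LINE_RE = re.compile(
    r"^\[\w (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) JupyterHub [^\]]*\] "
    r"User (?P<user>\S+)(?: server)? took [\d.]+ seconds to (?P<action>start|stop)"
)


def parse_hub_log(lines):
    """Turn assumed hub start/stop log lines into server-action-style event dicts."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a server start/stop line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        # Named servers appear as "user:servername"; the default server has none.
        username, _, servername = m["user"].partition(":")
        events.append({
            "__timestamp__": ts.isoformat() + "Z",  # assumes hub logs in UTC
            "__schema__": "hub.jupyter.org/server-action",
            "__schema_version__": 1,
            "action": m["action"],
            "username": username,
            "servername": servername,
        })
    return events


def session_lengths(events):
    """Pair each start with the next stop for the same server; return seconds."""
    open_starts = {}
    sessions = []
    for ev in events:
        key = (ev["username"], ev["servername"])
        ts = datetime.fromisoformat(ev["__timestamp__"].rstrip("Z"))
        if ev["action"] == "start":
            open_starts[key] = ts
        elif key in open_starts:  # ignore stops with no recorded start
            start = open_starts.pop(key)
            sessions.append((key[0], key[1], (ts - start).total_seconds()))
    return sessions
```

For example, feeding in a start line at 10:15:02.123 and a stop line for the same user at 11:15:02.123 yields one one-hour session. Writing each event out with `json.dumps` gives the same one-object-per-line shape as the FileHandler output, so the same `session_lengths` pairing works on either source.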