Capturing / recording usage analytics

Hi all,

The Met Office Informatics Lab has set up a dataset and a Pangeo platform to help COVID-19 researchers who are interested in relationships with environmental data; details are in the blog post.

My question is: what’s the best/easiest way to capture analytics on who is using our Pangeo instance, how often, for how long, etc.? It’s backed by GitHub auth (so not anonymous). At the moment I can look at the hub admin screen and see when people were last on the platform, but that’s about it.

Any suggestions?

Thanks.


:clap:

Fantastic work Theo and colleagues!

Nice @Theo_McCaie! Didn’t Jacob Tomlinson have a Grafana dashboard set up for your Pangeo clusters?

Cross referencing a few other posts:

and


Thanks @jhamman. You are right about Jacob and Grafana. Unfortunately that stopped working when we re-architected, but it’s a good shout; I could see about resurrecting it.

Thanks for the other links too!

If you’re using a recent enough version of JupyterHub, you can use the experimental telemetry / eventlogging system to capture user server start and stop events. These can then be used to work out session lengths, etc. If you have a persistent hub PVC that is storing the SQLite database, you can write the telemetry there with a FileHandler. Just make sure the PVC is big enough :slight_smile:
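To make the session-length idea concrete, here is a minimal sketch of how the event file could be post-processed. It assumes the hub has been configured along the lines of `c.EventLog.handlers = [logging.FileHandler(...)]` with `c.EventLog.allowed_schemas = ['hub.jupyter.org/server-action']`, and that each line of the file is a JSON object carrying `__timestamp__`, `__schema__`, `action`, `username` and `servername`. Those field names and the timestamp format are my assumptions about the output, so check them against a real log line first. The sample log and the `session_lengths` helper are made up for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed schema name for hub server start/stop events.
SERVER_ACTION_SCHEMA = "hub.jupyter.org/server-action"

# Made-up sample data in the assumed one-JSON-object-per-line format.
SAMPLE_LOG = """\
{"__timestamp__": "2020-04-01T09:00:00.000000Z", "__schema__": "hub.jupyter.org/server-action", "action": "start", "username": "alice", "servername": ""}
{"__timestamp__": "2020-04-01T10:30:00.000000Z", "__schema__": "hub.jupyter.org/server-action", "action": "stop", "username": "alice", "servername": ""}
"""


def parse_timestamp(value):
    # Assumed format: ISO 8601 with a trailing "Z" marking UTC.
    return datetime.fromisoformat(value.rstrip("Z"))


def session_lengths(lines):
    """Pair each 'start' with the next 'stop' for the same user and server.

    Returns a dict mapping username to a list of timedelta durations.
    Sessions that never see a 'stop' event are ignored.
    """
    open_sessions = {}
    durations = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("__schema__") != SERVER_ACTION_SCHEMA:
            continue
        key = (event["username"], event.get("servername", ""))
        when = parse_timestamp(event["__timestamp__"])
        if event["action"] == "start":
            open_sessions[key] = when
        elif event["action"] == "stop" and key in open_sessions:
            durations[event["username"]].append(when - open_sessions.pop(key))
    return dict(durations)


if __name__ == "__main__":
    for user, sessions in session_lengths(SAMPLE_LOG.splitlines()).items():
        total = sum(sessions, timedelta())
        print(user, len(sessions), total)
```

With a real file you would pass the open file object instead of `SAMPLE_LOG.splitlines()`. Since the hub uses GitHub auth, `username` should map straight onto GitHub handles, which would answer the "who, how often, for how long" part.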

There is a PR in notebook itself that should provide more events, but I don’t think you need those.

I probably also have some scripts that can parse jupyterhub logs to emit conformant Telemetry logs. Let me see if I can find those.

Hope this is useful!
