Python: taking statistics over each latitude and longitude pair

Hi All,

I have a year’s worth of data and this is just a sample of how it is formatted: https://i.stack.imgur.com/KssjM.png

For each lat and lon pair I need to compute statistics over a time period. How do I do this with Python?

As a particular example, there are thousands of temperature values at lat = 25.313 and lon = -108.813 alone.

The entire dataset is mapped on a grid of the US, and at each lat and lon I want to compute a statistic over time for one particular variable, let’s say temperature. I have not done something like this before and I need to figure out a method to do it.

To be more specific, I want to take the mean. I have used pandas before, but never with a large dataset. lat runs from 25.063 to 52.938 and lon from -124.938 to -67.688, each with spatial step x. All data is represented in those 4 columns. I want to do this so I can then use a geospatial Python package to map these values with a color map.

Thanks!

If it is really big, you could use a dask dataframe, group by your coordinates to take the mean, then map the result?
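
Here is a minimal sketch of that groupby, written with plain pandas on a tiny made-up table. The column names `time`, `lat`, `lon`, and `temperature` are assumptions, since the real headers are only visible in the screenshot, and the values are invented. As far as I know, a dask dataframe supports the same `groupby(...).mean()` pattern, with a `.compute()` call at the end to get the result.

```python
import pandas as pd

# Hypothetical sample in long format: one row per (time, lat, lon) observation.
# Column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"]
    ),
    "lat": [25.313, 25.313, 25.438, 25.438],
    "lon": [-108.813, -108.813, -108.813, -108.813],
    "temperature": [20.0, 22.0, 18.0, 19.0],
})

# Optionally restrict to the time period of interest first.
start, end = pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-02")
window = df[(df["time"] >= start) & (df["time"] <= end)]

# One row of statistics per (lat, lon) pair.
stats = (
    window.groupby(["lat", "lon"])["temperature"]
    .agg(["mean", "std", "count"])
    .reset_index()
)
print(stats)
```

The resulting `stats` table has one row per grid cell, which is the shape most mapping tools expect for a color map.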

What format is the original data in? Is it just a large CSV/Excel file? You say that the data is already mapped on a grid; does that mean that it is stored in some sort of 3D array (lat, lon, time)?
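
If it does turn out to be a 3D array, the per-cell mean is just a reduction over the time axis. A rough sketch, assuming a hypothetical NumPy array with axes ordered (time, lat, lon) and random values standing in for the real data:

```python
import numpy as np

# Hypothetical gridded data with axes ordered (time, lat, lon).
# The shape and values are made up for illustration.
rng = np.random.default_rng(0)
temps = rng.normal(loc=15.0, scale=5.0, size=(365, 4, 6))
temps[10, 0, 0] = np.nan  # pretend one observation is missing

# Reduce over the time axis; nanmean skips missing values.
mean_map = np.nanmean(temps, axis=0)
print(mean_map.shape)
```

If the axes are in a different order, only the `axis` argument needs to change.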

Maybe this notebook could help: https://gallery.pangeo.io/repos/pangeo-gallery/physical-oceanography/01_sea-surface-height.html#Sea-Level-Variability, in particular the sea level variability part. The only catch is that it uses an xarray.Dataset, I think, but if you can load the data with pandas, maybe you can push it to xarray like this: https://xarray.pydata.org/en/stable/pandas.html#dataset-and-dataframe
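
Roughly, the pandas to xarray route could look like the sketch below. It assumes xarray is installed and that the table has columns named `time`, `lat`, `lon`, and `temperature`; those names and the values are made up, not taken from the original data.

```python
import pandas as pd

# Hypothetical long-format table; column names and values are assumptions.
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01"] * 4 + ["2020-01-02"] * 4),
    "lat": [25.313, 25.313, 25.438, 25.438] * 2,
    "lon": [-108.813, -108.688] * 4,
    "temperature": [20.0, 21.0, 18.0, 19.0, 22.0, 23.0, 20.0, 21.0],
})

# Index by the coordinates, then convert to an xarray.Dataset
# (DataFrame.to_xarray needs the xarray package to be installed).
ds = df.set_index(["time", "lat", "lon"]).to_xarray()

# Reduce over time to get one value per (lat, lon) cell.
mean_map = ds["temperature"].mean(dim="time")
std_map = ds["temperature"].std(dim="time")
print(mean_map)
```

With matplotlib available, `mean_map.plot()` should then draw the 2D field with a color map, which is close to what the notebook does for sea level variability.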
