Re: [Tutor] summary stats grouped by month year

Andre' Walker-Loud Mon, 07 May 2012 23:43:58 -0700

Hello anonymous questioner,

First comment - you may want to look into the HDF5 data format


http://www.hdfgroup.org/HDF5/

and the python tools to play with them

pytables - http://www.pytables.org/moin
h5py - http://code.google.com/p/h5py/

I have personally used pytables more - but not for any good reason.  If you 
happen to have the Enthought Python distribution, both come with the package, 
as well as an installation of HDF5.

HDF5 is a very nice binary file format for storing large amounts of data along 
with descriptive metadata.  Also, numpy plays very nicely with HDF5.  Given all 
your questions here, I suspect you would benefit from learning about these 
tools and experimenting with them.

Now to your specific question.

> I would like to calculate summary statistics of rainfall based on year and 
> month.
> I have the data in a text file (although could put in any format if it helps) 
> extending over approx 40 years:
> YEAR MONTH    MeanRain
> 1972 Jan    12.7083199
> 1972 Feb    14.17007142
> 1972 Mar    14.5659302
> 1972 Apr    1.508517302
> 1972 May    2.780009889
> 1972 Jun    1.609619287
> 1972 Jul    0.138150181
> 1972 Aug    0.214346148
> 1972 Sep    1.322102228
> 
> I would like to be able to calculate the total rain annually:
> 
> YEAR   Annualrainfall
> 1972    400
> 1973    300
> 1974    350
> ....
> 2011     400
> 
> and also the monthly mean rainfall for all years:
> 
> YEAR  MonthlyMeanRain
> Jan      13
> Feb      15
> Mar       8
> .....
> Dec       13
> 
> 
> Is this something I can easily do?

Yes - this should be very easy.  Imagine importing all this data into a numpy 
array with one row per year and one column per month:

===
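import numpy as np

# your_data is the path to your text file
# skip the header line (YEAR MONTH MeanRain) and any blank lines
with open(your_data) as f:
        data = [line.split() for line in f.readlines()[1:] if line.strip()]
years = []
for fields in data:
        if fields[0] not in years:
                years.append(fields[0])
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

# this indexing assumes every year has all 12 months, in calendar order
rain_fall = np.zeros([len(years),len(months)])
for y,year in enumerate(years):
        for m,month in enumerate(months):
                rain_fall[y,m] = float(data[y * 12 + m][2])

# to get average per year - average over months - axis=1
# (use np.sum instead of np.mean for the annual total)
print(np.mean(rain_fall,axis=1))

# to get average per month - average over years - axis=0
print(np.mean(rain_fall,axis=0))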

===
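
One caveat on the snippet above: the y * 12 + m indexing only works if every 
year has all twelve months, in calendar order.  If that might not hold, here is 
a minimal sketch that places each value by looking up its year and month 
instead.  It uses the nine rows you posted as stand-in input (a real script 
would open your file), and it leaves months with no record as NaN, which is my 
assumption about how you would want gaps handled.

```python
import io
import warnings
import numpy as np

# The nine sample rows from the question, standing in for the real file.
sample = io.StringIO("""YEAR MONTH    MeanRain
1972 Jan    12.7083199
1972 Feb    14.17007142
1972 Mar    14.5659302
1972 Apr    1.508517302
1972 May    2.780009889
1972 Jun    1.609619287
1972 Jul    0.138150181
1972 Aug    0.214346148
1972 Sep    1.322102228
""")

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Skip the header line and any blank lines, then split into fields.
rows = [line.split() for line in sample.readlines()[1:] if line.strip()]
years = sorted(set(fields[0] for fields in rows))

# Start from NaN so that months without a record are easy to spot.
rain_fall = np.full((len(years), len(months)), np.nan)
for year, month, value in rows:
    rain_fall[years.index(year), months.index(month)] = float(value)

# Total per year: sum over months (axis=1), ignoring missing months.
annual_total = np.nansum(rain_fall, axis=1)

# Mean per month: average over years (axis=0), ignoring missing years.
# numpy warns when a month has no data at all; the result there is NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    monthly_mean = np.nanmean(rain_fall, axis=0)

print(annual_total)
print(monthly_mean)
```

With only part of 1972 present, the annual total covers just those nine months, 
so you would still want to check for incomplete years before trusting the sums.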

Now imagine doing this with dictionaries, so that you can request an average 
for the year 1972 or for the month of March.  That is why I used the enumerate 
function above to walk the indices - you can build the dictionaries in the 
same loops.

years = {'1972':0, '1973':1, ....}
months = {'Jan':0,'Feb':1,...'Dec':11}

Then you can read from and write to the array using these dictionaries:

print(rain_fall[years['1984'], months['Mar']])
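
As a rough sketch of the dictionary idea described above: the names year_index 
and month_index are my own, the two-year list is deliberately short, and the 
array holds zeros plus one made-up value, purely to show the lookups.

```python
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = ['1972', '1973']  # hypothetical short list, for illustration only

# Build the label -> index dictionaries while walking the lists with enumerate.
year_index = {}
for y, year in enumerate(years):
    year_index[year] = y
month_index = {}
for m, month in enumerate(months):
    month_index[month] = m

# Placeholder array: zeros standing in for the real rainfall values.
rain_fall = np.zeros([len(years), len(months)])
rain_fall[year_index['1973'], month_index['Mar']] = 8.0  # made-up value

# A single entry, looked up by label.
print(rain_fall[year_index['1973'], month_index['Mar']])

# Average for March over all years (one column of the array).
print(np.mean(rain_fall[:, month_index['Mar']]))

# Total for 1973 over all months (one row of the array).
print(np.sum(rain_fall[year_index['1973'], :]))
```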


Andre





> I have started by simply importing the text file but data is not represented 
> as time so that is probably my first problem and then I am not sure how to 
> group them by month/year. 
> 
> textfile=r"textfile.txt"
> f=np.genfromtxt(textfile,skip_header=1)
> 
> Any feedback will be greatly appreciated.
> 
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

