Hello anonymous questioner, first comment - you may want to look into hdf5 data structures
http://www.hdfgroup.org/HDF5/

and the python tools to play with them

pytables - http://www.pytables.org/moin
h5py - http://code.google.com/p/h5py/

I have personally used pytables more - but not for any good reason. If you happen to have the Enthought python distribution - these come with the package, as well as an installation of hdf5.

hdf5 is a very nice file format for storing large amounts of data (binary) with descriptive meta-data. Also, numpy plays very nice with hdf5. Given all your questions here, I suspect you would benefit from learning about these and learning to play with them.

Now to your specific question.

> I would like to calculate summary statistics of rainfall based on year and
> month.
> I have the data in a text file (although could put in any format if it helps)
> extending over approx 40 years:
> YEAR MONTH MeanRain
> 1972 Jan 12.7083199
> 1972 Feb 14.17007142
> 1972 Mar 14.5659302
> 1972 Apr 1.508517302
> 1972 May 2.780009889
> 1972 Jun 1.609619287
> 1972 Jul 0.138150181
> 1972 Aug 0.214346148
> 1972 Sep 1.322102228
>
> I would like to be able to calculate the total rain annually:
>
> YEAR Annualrainfall
> 1972 400
> 1973 300
> 1974 350
> ....
> 2011 400
>
> and also the monthly mean rainfall for all years:
>
> YEAR MonthlyMeanRain
> Jan 13
> Feb 15
> Mar 8
> .....
> Dec 13
>
>
> Is this something I can easily do?

Yes - this should be very easy.
Imagine importing all this data into a numpy array

===
import numpy as np

# skip the header line (YEAR MONTH MeanRain)
data = open(your_data).readlines()[1:]

# collect the years in the order they appear in the file
years = []
for line in data:
    if line.split()[0] not in years:
        years.append(line.split()[0])

months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

rain_fall = np.zeros([len(years),len(months)])
# this indexing assumes every year has all 12 months, in calendar order
for y,year in enumerate(years):
    for m,month in enumerate(months):
        rain_fall[y,m] = float(data[y * 12 + m].split()[2])

# to get total per year - sum over months - axis=1
print(np.sum(rain_fall,axis=1))
# to get average per year - average over months - axis=1
print(np.mean(rain_fall,axis=1))
# to get average per month - average over years - axis=0
print(np.mean(rain_fall,axis=0))
===

Now you should imagine doing this by setting up dictionaries, so that you can request an average for the year 1972 or for the month of March. That is why I used the enumerate function before to walk the indices - so that you can imagine building the dictionaries simultaneously.

    years = {'1972':0, '1973':1, ....}
    months = {'Jan':0,'Feb':1,...'Dec':11}

Then you can access and store the data in the array using these dictionaries. Note that the keys have to match the file, so March is 'Mar'.

    print(rain_fall[years['1984'], months['Mar']])

Andre

> I have started by simply importing the text file but data is not represented
> as time so that is probably my first problem and then I am not sure how to
> group them by month/year.
>
> textfile=r"textfile.txt"
> f=np.genfromtxt(textfile,skip_header=1)
>
> Any feedback will be greatly appreciated.
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
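
As a rough sketch of the dictionary idea above: the version below keys everything on the year and month strings directly, using plain Python dictionaries instead of a numpy array, so it does not need every year to have all twelve months (the quoted sample stops at September 1972). It assumes one header line and whitespace-separated columns as in the quoted sample. The name summarize is hypothetical, and the two 1973 rows are made-up values so that there is more than one year to group.

```python
import io
from collections import defaultdict

# The 1972 rows are taken from the quoted sample; the 1973 rows are made-up
# values, added only so the example has a second year.
sample = io.StringIO("""YEAR MONTH MeanRain
1972 Jan 12.7083199
1972 Feb 14.17007142
1972 Mar 14.5659302
1973 Jan 10.0
1973 Feb 20.0
""")

def summarize(lines):
    """Return (annual_total, monthly_mean) dictionaries keyed by year / month."""
    annual_total = defaultdict(float)
    month_sum = defaultdict(float)
    month_count = defaultdict(int)
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        year, month, rain = fields[0], fields[1], float(fields[2])
        annual_total[year] += rain
        month_sum[month] += rain
        month_count[month] += 1
    # mean per month = sum over all years / number of years that have that month
    monthly_mean = {m: month_sum[m] / month_count[m] for m in month_sum}
    return dict(annual_total), monthly_mean

annual_total, monthly_mean = summarize(sample)
print(annual_total['1973'])  # total rain for one year
print(monthly_mean['Jan'])   # mean rain for one month over all years
```

To run it on a real file, pass an open file object, e.g. summarize(open("textfile.txt")). A missing month simply contributes nothing to that year's total, which may or may not be what you want for incomplete years.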