Thanks again for all the useful tips.
I settled on R for now. As Oscar said, the dataset is not massive, so I could
have done it using a dictionary. However, some of the more frequent requests
will involve finding data for certain times on certain days, for specific
months, or for weekdays vs. weekends, etc. I believe this would have required
some indexing (which is what made me think of using databases in the first
place). All of this seems to be quite easy in R as well:
# Column 3 of weather holds the weekday as a string, e.g. "Sat"
wkdays = which(!(weather[[3]] %in% c("Sat", "Sun")))
I guess that would be easy enough with a list comprehension in Python too.
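A minimal sketch of that list comprehension. It assumes the rows were read into a list of dicts (for example with csv.DictReader) and that the weekday is stored in a hypothetical "Weekday" column with strings like "Sat" and "Sun"; the sample rows are made up.

```python
# Hypothetical rows, shaped like csv.DictReader output; "Weekday" and
# "Heating_load" are assumed column names, not taken from the real file.
rows = [
    {"Weekday": "Fri", "Heating_load": "12.5"},
    {"Weekday": "Sat", "Heating_load": "3.0"},
    {"Weekday": "Sun", "Heating_load": "0.0"},
    {"Weekday": "Mon", "Heating_load": "20.1"},
]

WEEKEND = {"Sat", "Sun"}

# Row indices of weekday hours, the counterpart of R's which(...)
wkdays = [i for i, row in enumerate(rows) if row["Weekday"] not in WEEKEND]

# Or keep the matching rows themselves
weekday_rows = [row for row in rows if row["Weekday"] not in WEEKEND]

print(wkdays)  # [0, 3]
```

Note that Python indices start at 0, whereas R's which() returns 1-based positions.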
Binning looks like this:
heat = lib[[15]]
# Ten equal-width bins from 0 to the maximum, each labelled by its upper edge
heatcut = cut(heat, breaks = max(heat) * seq(0, 1, by = 0.1),
              labels = paste0(seq(10, 100, by = 10), "%"))
This can be wrapped in a function, so the call would look something like
bin_me(lib[15], breaks=default, labels=default).
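For comparison, a rough Python equivalent of such a bin_me function, using only the standard library. It is a sketch under the assumption that the column is already a list of floats; bin_me here is a hypothetical helper, and like R's cut() it uses right-closed bins (lo, hi] and leaves zero-load hours out.

```python
import bisect


def bin_me(values, n_bins=10):
    """Count values in n_bins equal-width bins spanning 0 .. max(values).

    Bins are right-closed (lo, hi], as in R's cut(); values <= 0 get no bin.
    Since each row is one simulated hour, the counts are hours per bin.
    """
    top = max(values)
    # Upper edges of the bins; the last edge is set to max exactly so that
    # floating-point rounding cannot push the maximum outside every bin.
    edges = [top * (k + 1) / n_bins for k in range(n_bins - 1)] + [top]
    labels = [f"{100 * (k + 1) // n_bins}%" for k in range(n_bins)]
    counts = dict.fromkeys(labels, 0)
    for v in values:
        if v <= 0:
            continue  # like an NA from cut(): outside (0, max]
        # bisect_left finds the first edge >= v, i.e. the right-closed bin
        counts[labels[bisect.bisect_left(edges, v)]] += 1
    return counts


hours = bin_me([0, 5, 10, 15, 55, 100, 99])
print(hours["10%"], hours["100%"])  # 2 2
```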
To get one bin in SQLite I wrote this (not sure if there is an
easier way):
select count(Heating_plant_sensible_load) from LibraryMain where
Heating_plant_sensible_load > (select max(Heating_plant_sensible_load)*0.3 from LibraryMain)
and Heating_plant_sensible_load <= (select max(Heating_plant_sensible_load)*0.4 from
LibraryMain);
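There is a shorter route in SQL: instead of one query per bin, each row's bin number can be computed in a single expression and counted with GROUP BY. A sketch using Python's built-in sqlite3 module, with an in-memory table standing in for LibraryMain; the sample loads are made up, and the bins are right-closed like the query above.

```python
import sqlite3

# Bin k (1..n) holds loads in (max*(k-1)/n, max*k/n]; zero loads are skipped.
# ceil() only exists in SQLite 3.35+ builds with math functions enabled, so
# ceil(x) is spelled CAST(x AS INTEGER) + (x > CAST(x AS INTEGER)) here.
BIN_QUERY = """
WITH scaled AS (
    SELECT Heating_plant_sensible_load * ? /
           (SELECT MAX(Heating_plant_sensible_load) FROM LibraryMain) AS x
    FROM LibraryMain
    WHERE Heating_plant_sensible_load > 0
)
SELECT CAST(x AS INTEGER) + (x > CAST(x AS INTEGER)) AS bin, COUNT(*) AS hours
FROM scaled
GROUP BY bin
ORDER BY bin
"""


def hours_per_bin(conn, n_bins=10):
    """Return (bin number, hour count) pairs for the non-empty bins."""
    return conn.execute(BIN_QUERY, (float(n_bins),)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LibraryMain (Heating_plant_sensible_load REAL)")
conn.executemany("INSERT INTO LibraryMain VALUES (?)",
                 [(0,), (5,), (15,), (25,), (35,), (95,), (100,)])
print(hours_per_bin(conn))  # [(1, 1), (2, 1), (3, 1), (4, 1), (10, 2)]
```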
Indexing for certain times would add even more lines with my approach; on
top of that, I believe you would have to put this either in a view or in a new
table...
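For what it's worth, a time filter in SQL is usually just more conditions in the WHERE clause; a view is only needed to reuse the selection by name. A sketch with sqlite3 that assumes hypothetical Month, Weekday and Hour columns (the real table's time columns may look different) and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per simulated hour with month, weekday, hour.
conn.execute("""CREATE TABLE LibraryMain (
    Month INTEGER, Weekday TEXT, Hour INTEGER,
    Heating_plant_sensible_load REAL)""")
conn.executemany("INSERT INTO LibraryMain VALUES (?, ?, ?, ?)", [
    (1, "Mon", 9, 40.0),
    (1, "Mon", 22, 10.0),
    (1, "Sat", 10, 30.0),
    (7, "Tue", 14, 0.0),
])

# Optional: store the selection as a view so later queries can reuse it.
conn.execute("""
CREATE VIEW JanuaryWorkingHours AS
SELECT * FROM LibraryMain
WHERE Month = 1
  AND Weekday NOT IN ('Sat', 'Sun')
  AND Hour BETWEEN 8 AND 18
""")

count, peak = conn.execute(
    "SELECT COUNT(*), MAX(Heating_plant_sensible_load) FROM JanuaryWorkingHours"
).fetchone()
print(count, peak)  # 1 40.0
```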
So R seems to be clearer and more concise than SQL/SQLite (at least for
me). On top of that, it makes it possible to do additional analysis for
specific cases later on. That it can connect with Python is another plus.
Thanks again for everyone's ideas,
dm
> Date: Wed, 14 Nov 2012 13:59:25 +
> Subject: Re: [Tutor] data analysis with python
> From: oscar.j.benja...@gmail.com
> To: awesome.me...@outlook.com
> CC: tutor@python.org
>
> On 14 November 2012 03:17, David Martins wrote:
> > Hi All
> >
> > I'm trying to use Python for analysing data from building energy simulations
> > and was wondering whether there is a way to do this without using anything
> > SQL-like.
>
> There are many ways to do this.
>
> >
> > The simulations are typically run for a full year, every hour, i.e. there
> > are 8760 rows and about 100+ variables such as external air temperature,
> > internal air temperature, humidity, heating load, ... making roughly a
> > million data points. I've got the data in a csv file and also managed to
> > write it to an SQLite db.
>
> This dataset is not so big that you can't just load it all into memory.
>
> >
> > I would like to make requests like the following:
> >
> > Show the number of hours the aircon is running at 10%, 20%, ..., 100%
> > Show me the average, min, max air temperature, humidity, solar gains,
> > when the aircon is running at 10%, 20%,...,100%
> >
> > Eventually I'd also like to generate an automated html or pdf report with
> > graphs. Creating graphs is actually somewhat essential.
>
> Do you mean graphs or plots? I would use matplotlib for plotting. It
> can automatically generate image files of plots. There are also ways
> to generate output for visualising graphs but I guess that's not what
> you mean. Probably I would create a PDF report using LaTeX and
> matplotlib, but that's not the only way.
> http://en.wikipedia.org/wiki/Graph_(mathematics)
> http://en.wikipedia.org/wiki/Plot_(graphics)
>
> > I tried SQL and find it horrible: error prone, too much to write, the logic
> > somehow seems to work differently from my brain, and I couldn't find
> > particularly good documentation (the API documentation in particular is
> > terrible, in my humble opinion). I heard about Zope DB, which might be an
> > alternative. Would you mind pointing me towards an appropriate way to solve
> > my problem? Is there a way for me to avoid having to learn SQL, or am I
> > doomed?
>
> There are many ways to avoid learning SQL. I'll suggest the simplest
> one: Can you not just read all the data into memory and then perform
> the computations you want?
>
> For example:
>
> $ cat tmp.csv
> Temp,Humidity
> 23,85
> 25,87
> 26,89
> 23,90
> 24,81
> 24,80
>
> $ cat tmp.py
> #!/usr/bin/env python
>
> import csv
>
> with open('tmp.csv', 'rb') as f:
>     reader = csv.DictReader(f)
>     data =