I have some data in an SQL database (PostgreSQL, to be exact). One table in
particular has a lot of data in it: years' worth, in fact (2.3 million rows
since 2012). That's just for background.

My question is this: I am going to create 5 separate graphs, one for each
of the previous 5 weeks (here a week runs Sunday to Saturday). Is it better
to read all 5 weeks' worth of data with a single dbGetQuery, whose SELECT
has a WHERE clause selecting exactly those 5 weeks, and then subset in R?
Or is it better to fetch a single week's worth of data per dbGetQuery, with
the appropriate SELECT ... WHERE, and process each week in turn? There seem
to be anywhere from 7,500 to 9,000 rows for a single week. The calculations
for each week are independent of any other week's data; basically, I am
just creating a simple bar chart.
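For concreteness, the five Sunday-to-Saturday windows can be computed in
base R. This is a minimal sketch: the helper name previous_weeks() is
hypothetical, and it assumes "the previous 5 weeks" means the five complete
weeks before the week that contains today's date.

```r
# Hypothetical helper: the last n complete Sunday-to-Saturday weeks
# before the week containing `today`, oldest week first.
previous_weeks <- function(today = Sys.Date(), n = 5) {
  today <- as.Date(today)
  # POSIXlt$wday is 0 for Sunday, so this is the Sunday starting today's week
  this_sunday <- today - as.POSIXlt(today)$wday
  starts <- this_sunday - 7 * (n:1)
  data.frame(start = starts, end = starts + 6)  # each Saturday = Sunday + 6
}

weeks <- previous_weeks(as.Date("2016-06-15"))
```

The overall range for a single query is then weeks$start[1] through
weeks$end[nrow(weeks)].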

In the first case, I do one SELECT and then subset the data.frame in the
for() loop. In the second case, I still use a for() loop, but I do a SELECT
in each iteration and don't need to subset the data.frame.
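A minimal sketch of the two approaches using DBI, assuming a hypothetical
table `events` with a text `day` column (ISO-8601 dates) and a `category`
column to be bar-charted. An in-memory SQLite database (RSQLite package)
stands in for PostgreSQL so the example is self-contained; with RPostgres
the connection call differs and the placeholders are written $1, $2
instead of ?.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the real PostgreSQL database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table `events`: one row per entry, date stored as text
set.seed(1)
days <- seq(as.Date("2016-05-01"), as.Date("2016-06-14"), by = "day")
events <- data.frame(
  day = format(sample(days, 2000, replace = TRUE)),
  category = sample(c("A", "B", "C"), 2000, replace = TRUE),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "events", events)

starts <- as.Date("2016-05-08") + 7 * (0:4)  # five Sundays
ends <- starts + 6                           # matching Saturdays
lv <- c("A", "B", "C")
sql <- "SELECT day, category FROM events WHERE day BETWEEN ? AND ?"

# Approach 1: one SELECT covering all five weeks, then subset in R
all_rows <- dbGetQuery(con, sql,
                       params = list(format(starts[1]), format(ends[5])))
all_rows$day <- as.Date(all_rows$day)
counts1 <- lapply(seq_along(starts), function(i) {
  wk <- all_rows[all_rows$day >= starts[i] & all_rows$day <= ends[i], ]
  table(factor(wk$category, levels = lv))
})

# Approach 2: one SELECT per week, no subsetting in R
counts2 <- lapply(seq_along(starts), function(i) {
  wk <- dbGetQuery(con, sql,
                   params = list(format(starts[i]), format(ends[i])))
  table(factor(wk$category, levels = lv))
})

dbDisconnect(con)
```

Both lists hold the same per-week counts, so either can feed
barplot(counts1[[i]]). The difference is one query returning roughly
37,500 to 45,000 rows versus five queries of 7,500 to 9,000 rows each.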

I have read the "R Data Import/Export" manual. The only relevant advice I
can find is this passage:
<quote>
The sort of statistical applications for which DBMS might be used are to
extract a 10% sample of the data, to cross-tabulate data to produce a
multi-dimensional contingency table, and to extract data group by group
from a database for separate analysis.
</quote>

The "extract data group by group ..." seems, to me, to say that I should
extract and process each week's data separately, using a dbGetQuery in my
loop. Am I interpreting this correctly?

Thanks for your advice.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
