On Fri, 1 Apr 2011, Henri Mone wrote:
Dear R Users,
For my data crunching I use a combination of MySQL and GNU R. I have
to handle huge to medium-sized data sets which are stored in a MySQL
database; R executes an SQL command to fetch the data and does the
plotting with the built-in R plotting functions.
The (low-level) calculations like summing, dividing, grouping, sorting
etc. can be done either with the SQL command on the MySQL side or on
the R side.
My question is: which is faster for these low-level calculations and
data rearrangements, MySQL or R? Is there a general rule of thumb for
what to shift to the MySQL side and what to the R side?
The data transfer costs almost always dominate here: since such
low-level computations would almost always be a trivial part of the
total costs, you should do in the DBMS whatever reduces the size of
the data to be transferred (e.g. summarizations).
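
As a rough illustration of that rule of thumb, here is a minimal sketch
comparing the two options. Everything specific in it is an assumption
made up for the example: the table name `sales`, its columns `region`
and `amount`, and the use of an in-memory RSQLite database as a
stand-in so that the code runs without a MySQL server. With MySQL only
the dbConnect() call should need to change.

```r
library(DBI)

# Stand-in connection (assumption). With MySQL this would be something like
# dbConnect(RMySQL::MySQL(), dbname = ..., user = ..., password = ...).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table: 30000 rows falling into 3 groups.
set.seed(1)
sales <- data.frame(
  region = sample(c("north", "south", "west"), 30000, replace = TRUE),
  amount = round(runif(30000, 1, 100), 2),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "sales", sales)

# Option 1: summarize in the DBMS, so only one row per group is transferred.
agg_db <- dbGetQuery(con, "
  SELECT region, SUM(amount) AS total
  FROM sales
  GROUP BY region
  ORDER BY region")

# Option 2: transfer every row, then summarize on the R side.
raw   <- dbGetQuery(con, "SELECT region, amount FROM sales")
agg_r <- aggregate(amount ~ region, data = raw, FUN = sum)

dbDisconnect(con)

# Compare how many rows each option had to move across the connection.
nrow(agg_db)
nrow(raw)
```

Both options give the same per-group totals; the difference is the
number of rows fetched, which is what matters once the database sits
on the other end of a network connection.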
I do wonder what you think the R-sig-db list is for if not questions
such as this one. Please subscribe and use it next time.
Thanks
Henri
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.