On Fri, 18 Jan 2008, [EMAIL PROTECTED] wrote: > For cost reasons, I'd like to replace SAS on my PC under Win XP Pro. > > Nearly all my work involves medium-size datasets (100k-10M) records which I > cleanup, relate, fliter and get into shape for analysis using SAS/SQL > followed by standard statistical procedures, e.g. regression using SAS proc > reg. > > It seems to me that this type of analysis could be done in MySQL followed > by R, but I'd like some advice about the best way to pass datasets from > MySQL to R. I understand there are various connectivity packages > avaialble, RODBC and RMySQL, but I'd appreciate some advice about where to > dig in first. > > I'd like to avoid additional syntax in my SQL code -- so it seems better > not to coat SQL queries in R wrappers and pass them to MySQL -- probably > better for my way of working to finish all the SQL work and pass a clean > table ready to analyze to R.
The way these work (using RODBC is an example) is - If necessary, send the data to MySQL via sqlSave(). - Use sqlQuery() to send SQL statements verbatim to the RDBMS (here MySQL) - Retrieve a table via sqlFetch(). - Do the analysis on the fetched table. If the table is very large, you can fetch in junks and use the facilities in the 'biglm' package to do a regression a block of data at a time. However, I am not sure of the value of using more than 10,000 cases in a regression, as well before that non-sampling errors will dominate the error distribution: e.g. the systematic error from model misfit may be larger than the nominal standard errors. I can see why experienced SAS users like to use it for data cleanup, but it seem generally true that the user is a more important variable than the tool: people work best with the tools they understand best (and personal preference comes into it). > One of the great advantages for me using SAS is that I can beat the data > into shape using proc SQL and then call proc STAT_OF_THE_DAY all in the > same batch file with no plumbing, data conversion or additional > machination. The way I am sketching above is using R as the scripting language. It's a pretty powerful one, certainly powerful enuough to do the text processing needed to prepare SQL queries. > But if I could do this in MySQL + R, well in a few years I'd have the down > payment for an Audi TT instead of having given it to SAS Institute. > > If someone found a practical way to make this work -- pls let me know. And > thanks in advance. > > Jack Williamson > LECG > 2049 Century Park East, Suite 2300 > Los Angeles CA 90067 > 323-683-5004 > [EMAIL PROTECTED] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.