Hi!

Santosh Srinivas wrote:
Dear R-helpers,

Considering that a substantial part of analysis is related to data
manipulation, I'm just wondering whether I should do the basic data part in a
database server (currently I have the data in a .txt file).
For this purpose, I am planning to use MySQL. Is MySQL a good way to go
about it? Are there any anticipated problems that I need to be aware of?

I'm afraid I have no real answers to your questions, only more questions and, perhaps, another point of view from a different world!
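
That said, for what it's worth: talking to MySQL from R is painless through the DBI interface plus the RMySQL package, so choosing MySQL would not lock you out of R at all. A minimal sketch, where the connection details and the table and column names are invented for illustration:

## Hypothetical MySQL connection; adjust names to your own setup
library(DBI)
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "markets", user = "analyst",
                 password = "secret", host = "localhost")

## Pull only the rows you need into R, not the whole table
prices <- dbGetQuery(con,
    "SELECT day, ticker, close FROM daily_prices
     WHERE day >= '2010-01-01'")

dbDisconnect(con)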


Considering that many users here use large datasets: do you typically store
the data in databases and query relevant portions for your analysis?
Does it speed up the entire process? Is it neater to do things in a
database? (For example, errors could be corrected at the data import stage
itself, by constraints defined on the data in the database, rather than being
discovered only when you do the analysis in R and realize something is
wrong in the output.)

Please, what do you mean by "large datasets"?

I wouldn't consider only processing speed, but also how the global repository of data is constructed. I mean, the problem is not only access speed now, but being able to identify any given set of data in the future. For that, an RDBMS could be as useful as a hierarchical folder structure plus well-designed file names.
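
For instance, with a disciplined naming scheme such as <instrument>_<project>_<date>.txt (purely an invented example), plain R can already locate any dataset without a database:

## Hypothetical layout: data/<year>/<instrument>_<project>_<YYYY-MM-DD>.txt
files <- list.files("data",
                    pattern = "^hplc_proj42_2010-11-[0-9]{2}\\.txt$",
                    recursive = TRUE, full.names = TRUE)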


This is vis-à-vis using the built-in SQLite, indexing, etc., capabilities in
R itself? Does performance work better with a database backend (especially
for simple but large datasets)?

As you said, R itself has powerful tools for data filtering and rearrangement. I have only seen problems in genomics analyses, where an external tool was required to manage some huge matrices; and that was some time ago, with patches already on their way to solve the problem within R.
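
For moderate sizes the base tools go a long way, and the RSQLite package gives a file-backed database with no server to administer if you want to try the database route cheaply first. A rough sketch of both approaches, with invented file and column names:

## Filtering and rearranging in plain R
dat <- read.table("prices.txt", header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
## ISO dates ("YYYY-MM-DD") sort and compare correctly as strings
recent <- subset(dat, day >= "2010-01-01" & volume > 0)
recent <- recent[order(recent$day), ]

## The same selection through a file-backed SQLite database
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "prices.db")
dbWriteTable(con, "prices", dat, overwrite = TRUE)
recent2 <- dbGetQuery(con,
    "SELECT * FROM prices
     WHERE day >= '2010-01-01' AND volume > 0
     ORDER BY day")
dbDisconnect(con)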

What I see here, with data from experimental designs pouring into Excel spreadsheets and analytical facilities generating big (around 1 GB/day) plain-text files, is that we have so much variability in model structure that it would be quite expensive to program interfaces for all these processes to store their data in a central repository managed by an RDBMS.
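
For those big plain-text dumps, at least, one trick that keeps R usable is to read and reduce the file in chunks through a connection, so the whole gigabyte never sits in memory at once. A sketch, assuming a headerless tab-separated file with an invented name:

## Read a large plain-text file in chunks of 100,000 lines
con <- file("instrument_dump.txt", open = "r")
while (length(lines <- readLines(con, n = 100000)) > 0) {
    tc <- textConnection(lines)
    chunk <- read.table(tc, sep = "\t", stringsAsFactors = FALSE)
    close(tc)
    ## ... summarise or store this chunk, keeping only the reduction ...
}
close(con)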

Even worse! Over the last few years some of our information has been moved to object-oriented databases, so the problem is becoming even more complex.


The financial applications that I am thinking of are not exactly real-time,
but quick response and fast performance would definitely help.

As an aside, I want to take things to a cloud environment at some point,
just because it will be easier and cheaper to deliver.

Kind of an open question, but any input will help.

As you see, there are no answers here, only more doubts, as I'm in a similar situation. So any ideas will be extremely welcome to us!

Thanks!



--
Ricardo Rodríguez
Your XEN ICT Team

