Unfortunately, I have a lot of errors with RMySQL -- but that is another thread...
Ralf On Thu, Jun 24, 2010 at 10:31 AM, James W. MacDonald <jmac...@med.umich.edu> wrote: > Hi Ralf, > > Ralf B wrote: >> >> Sorry for the lack of details. Since I run the same SQL first directly >> on MySQL (using the MySQL Query Browser) and then again using R >> through the RJDBC interface, I assume that I won't simply have a badly >> constructed SQL query. However, just to clear possible objection, here >> the SQL: >> >> >> # Extracts vector of data points >> getData <- function(connection) { >> queryStart <- "SELECT id1, id2, x, y FROM `mytable` " >> queryEnd <- ";" >> query <- paste(queryStart, " WHERE id1 IN(", id1s, ") AND id2 IN(", >> id2s, ") AND subtype='TYPE1'", queryEnd) >> # execute query >> data = dbGetQuery(connection, query) >> return(data) >> } >> >> When running this method using either RGUI or the command line, I have >> a runtime that reaches an incredible 10 minutes (!) for selecting >> about 50k - 80k data points (which I consider not much) based on the >> range of IDs I choose. The table size is about 5-8 million data points >> total. The same SQL query directly executed in MySQL Query Browser >> takes about 20 seconds which I would consider fine. There are no >> indices created for any of the fields but since the query runs a lot >> faster in the query browser I don't suspect this to be the main >> reason. >> >> Any ideas? > > Well, the RJDBC rforge page has this note: > > Note: The current implementation of RJDBC is done entirely in R, no Java > code is used. This means that it may not be extremely efficient and could be > potentially sped up by using Java native code. However, it was sufficient > for most tasks we tested. If you have performance issues with RJDBC, please > let us know and tell us more details about your test case. > > And from my quick peek at the page, it appears RJDBC is designed to allow > one to query any DBMS. Since RMySQL is MySQL-specific, it may be more > efficient. Anyway, why don't you just try it and see? > > Best, > > Jim > > >> >> Best, >> Ralf >> >> >> >> >> On Wed, Jun 23, 2010 at 4:36 PM, James W. MacDonald >> <jmac...@med.umich.edu> wrote: >>> >>> Hi Ralf, >>> >>> Ralf B wrote: >>>> >>>> I am running a simple SQL SELECT statement that involvs 50k + data >>>> points using R and the RJDBC interface. I am facing very slow response >>>> times in both the RGUI and the R console. When running this SQL >>>> statement directly in a SQL client I have processing times that are a >>>> lot lot faster (which means that the SQL statement itself is not the >>>> problem). >>>> >>>> Did any of you compare RJDBC vs RMySQL or is there a better, more >>>> efficient way to extract large data from databases using R? Would you >>>> recommend dumping data out completely into flat files and working with >>>> flat files instead? I expected that this would not be such a problem >>>> given that businesses maintain their data in DBs and R is supposed to >>>> be good in shifting around data. Am I doing something wrong? >>> >>> Well, if you don't show people what you have done, how can anybody tell >>> if >>> you are doing something wrong or not? >>> >>> I have no experience with RJDBC, so cannot say anything about that. >>> However, >>> I have always found RMySQL to be speedy enough. As an example: >>> >>>> library(RMySQL) >>> >>> Loading required package: DBI >>>> >>>> con <- dbConnect("MySQL", host="genome-mysql.cse.ucsc.edu", user = >>>> "genome", dbname = "hg18") >>>> system.time(a <- dbGetQuery(con, "select name, chromEnd from snp129 >>>> where >>>> chrom='chr1' and chromStart between 1 and 1e8;") >>> >>> + ) >>> user system elapsed >>> 7.95 0.06 38.59 >>>> >>>> dim(a) >>> >>> [1] 508676 2 >>> >>> So 40 seconds to get half a million records. Since this is via the >>> internet, >>> I have to imagine things would be much faster querying a local DB. >>> >>> But then you never say what constitutes 'slow' for you, so maybe this is >>> slow as well? >>> >>> Best, >>> >>> Jim >>> >>> >>>> Ralf >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> Douglas Lab >>> University of Michigan >>> Department of Human Genetics >>> 5912 Buhl >>> 1241 E. Catherine St. >>> Ann Arbor MI 48109-5618 >>> 734-615-7826 >>> ********************************************************** >>> Electronic Mail is not secure, may not be read every day, and should not >>> be >>> used for urgent or sensitive issues >>> > > -- > James W. MacDonald, M.S. > Biostatistician > Douglas Lab > University of Michigan > Department of Human Genetics > 5912 Buhl > 1241 E. Catherine St. > Ann Arbor MI 48109-5618 > 734-615-7826 > ********************************************************** > Electronic Mail is not secure, may not be read every day, and should not be > used for urgent or sensitive issues > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.