Re: [R] tips to speed up sqlSave with MS SQL Server?

Prof Brian Ripley Mon, 28 Apr 2008 22:34:08 -0700

Sorry, but Thomas needs a format something else can read, e.g. 'a flatfile'.

write.table() is better than write(), and you may want to write insections of, say, 1million rows.

The comparison you quote is for ca 7-14 secs to impoort 1m rows of 4integers. 6 secs is what write.table(DF, "foo.dat", row.names=FALSE)is doing for me, so looks like export can be faster than import.


On Mon, 28 Apr 2008, jim holtman wrote:

?save

This will write the object to a file in the fastest manner.  Here is an example:

x <- runif(125000000/8)
object.size(x)

[1] 125000024

system.time(save(x, file='/tempxx.Rdata'))

  user  system elapsed
 56.84    0.86   87.97


This wound up to be 79MB on disk after compression.

Without compression, most of the time is my slow disk:

system.time(save(x, file='/tempxx.Rdata', compress=FALSE))

  user  system elapsed
  5.07    1.27   56.66

The size on disk was 119MB.

On Mon, Apr 28, 2008 at 8:51 PM, Thomas Pujol <[EMAIL PROTECTED]> wrote:

Prof,
Thanks for your generous assistance.

I'm unsure, but an thinking that to utilize one of MS SQL Server's bulkimport utilities, I'll need to export my dataframe to a "flat-file".

Any tips on the best approach for exporting such a large dataframe to aflat-file? Is write() or write.table() the "best" function to use, orare there others I should consider?

Also, not specifc to R, but I came across this:
"Flat File Bulk Import methods speed comparison in SQL Server 2005"
http://weblogs.sqlteam.com/mladenp/archive/2006/07/17/10634.aspx

Thanks again.

Prof Brian Ripley <[EMAIL PROTECTED]> wrote:

   I think the short answer is that RODBC is not designed for that, because
   ODBC is not. There seems to be an ODBC extension specific to SQL Server
   to do so (somewhere said 'SQL Server version 7.0 or later', which may not
   apply to you).

   I'm pretty unlikely to add support for just one database, especially one
   that requires files from SQL Server. Also, I don't know of any RODBC /SQL
   Server users who might be motivated to do so.

   There is work in progress to implement SQLBulkOperations, but that is a
   different concept (and not yet wired up to sqlSave).

   On Mon, 28 Apr 2008, Thomas Pujol wrote:

  > I am using R2.6.0 on ?Windows Small Business Server 2003?. I apologize
  > if the answer to my question is available?I have searched but have not
  > found anything that I thought helped me.
  >
  > I have a dataframe that contains ~4.5 million rows and 5 columns.
  > (see memory and df details below). I am trying to save the dataframe to
  > a MS SQL Server database, using the ?sqlSave? function. The code below
  > seems to work, but takes several hours.
  >
  > ?sqlSave(channel, dat=idxdata, varTypes=c(ddates="datetime") )?
  >
  > Any tips how I can speed things up? Or is my dataframe so large that it
  > is going to take a while? (I have ~20 dataframes that I need to save to
  > SQL, so speed is somewhat important.) Is there an altogether different
  > approach I should consider taking?

   Use a different client that does implement bulk copy operations? At least
   SQL Server 2005 comes with a bcp.exe command-line client to do this. See
   http://msdn2.microsoft.com/en-us/library/ms188728.aspx

  > FYI, here is information re: the dataframe and memory on my system.
  > Please let me know if there is any further information I should provide.
  >
  >> memory.size(max = F) #reports amount of memory currently in use
  > [1] 131.8365
  >
  >> str(idxdata)
  > 'data.frame': 4474553 obs. of 5 variables:
  > $ idkey : int 1003 1003 1003 1003 1003 1003 1003 1003 1003 1003 ...
  > $ nnd : Factor w/ 25 levels "01","01C","02",..: 1 1 1 1 1 1 1 1 1 1 ...
  > $ curcdd : Factor w/ 2 levels "CAD","USD": 2 2 2 2 2 2 2 2 2 2 ...
  > $ ddates:Class 'Date' num [1:4474553] 6942 6943 6944 6945 6948 ...
  > $ idx : num 100 100 100 100 100 100 100 100 100 100 ...
  >
  >> object.size(idxdata)
  > [1] 125289688

   --
   Brian D. Ripley, [EMAIL PROTECTED]
   Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
   University of Oxford, Tel: +44 1865 272861 (self)
   1 South Parks Road, +44 1865 272866 (PA)
   Oxford OX1 3TG, UK Fax: +44 1865 272595

---------------------------------
[[elided Yahoo spam]]
       [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] tips to speed up sqlSave with MS SQL Server?

Reply via email to