Many thanks for confirming my impression. I use dump for storing large data.frames with a number of attributes for each variable. save/load is much faster, but I am unsure whether such files will still be readable by R versions years from now. What format/functions would you suggest for data storage/transfer between different (future) R versions?

best regards,
Heinz
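
For illustration only, a minimal sketch of a binary round trip for a data.frame that carries per-variable attributes. It uses saveRDS()/readRDS(), which are not mentioned in the message above and are an assumption here, as are the column names and the "label" attribute. Whether such files remain readable by R versions years from now is exactly the open question, and this sketch does not settle it.

```r
# Hypothetical data frame with a per-variable attribute
# ("label" is an assumed attribute name, chosen for illustration).
df0 <- data.frame(x = c("source", "file", "URL"), n = 1:3)
attr(df0$x, "label") <- "sample word"

# Write the single object to a binary file and read it back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(df0, rds_file)
df1 <- readRDS(rds_file)
unlink(rds_file)

# Check that the column attribute and the data survived the round trip.
print(identical(attr(df1$x, "label"), "sample word"))
print(identical(df0, df1))
```

Unlike dump/source, nothing is parsed as R code on the way back in, so reading does not go through parse().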

On 30.10.2013 20:11, William Dunlap wrote:
I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
when it is parsing long vectors of numeric data.  dump/source has never been an
efficient way of transferring data between different R sessions, but it is much
worse now for long vectors.  In 2.15.2 doubling the size of the vector (of lengths
in the range 10^4 to 10^7) makes the time to parse go up by a factor of about 2.1.
In 3.0.2 that factor is more like 4.4.

        n elapsed-2.15.2 elapsed-3.0.2
     2048          0.003         0.018
     4096          0.006         0.065
     8192          0.013         0.254
    16384          0.025         1.067
    32768          0.050         4.114
    65536          0.100        16.236
   131072          0.219        66.013
   262144          0.808       291.883
   524288          2.022      1285.265
  1048576          4.918            NA
  2097152          9.857            NA
  4194304         22.916            NA
  8388608         49.671            NA
16777216        101.042            NA
33554432        512.719            NA

I tried this with 64-bit R on a Linux box.  The NA's represent sizes that did not
finish while I was at a 1 1/2 hour dentist's appointment.  The timing function
was:
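   # Time parse() on dput() output for numeric vectors of increasing length.
   test <- function(n = 2^(11:25))
   {
       tf <- tempfile()
       on.exit(unlink(tf))   # remove the temporary file on exit
       t(sapply(n, function(n){
           # write a length-n numeric vector to the file as R source
           dput(log(seq_len(n)), file=tf)
           # report user, system and elapsed time for parsing it
           print(c(n=n, system.time(parse(file=tf))[1:3]))
       }))
   }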
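
The doubling factors quoted above can be checked against the table with a short calculation. This is a sketch, not part of the original timing code: the elapsed times are copied from the table, and the helper name doubling_ratio is hypothetical. A factor near 2 per doubling corresponds to roughly linear growth in n, and a factor near 4 to roughly quadratic growth.

```r
# Elapsed parse() times in seconds, copied from the table above.
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                  291.883, 1285.265)   # the larger sizes did not finish

# Ratio of each timing to the previous one, i.e. the cost of doubling n.
# (doubling_ratio is a hypothetical helper name.)
doubling_ratio <- function(elapsed) elapsed[-1] / elapsed[-length(elapsed)]

ratio_2152 <- doubling_ratio(elapsed_2152)
ratio_302  <- doubling_ratio(elapsed_302)

# log2 of the ratio estimates the exponent k in time ~ n^k:
# a ratio of 2 gives k = 1 (linear), a ratio of 4 gives k = 2 (quadratic).
print(round(median(ratio_2152), 2))
print(round(median(ratio_302), 2))
print(round(log2(c(median(ratio_2152), median(ratio_302))), 2))
```

The median is used here rather than the mean because a few of the 2.15.2 timings jump by far more than the typical factor.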

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Carl Witthoft
Sent: Wednesday, October 30, 2013 5:29 AM
To: r-help@r-project.org
Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Did you run the identical code on the identical machine, and did you verify
there were no other tasks running which might have limited the RAM available
to R?  And equally important, did you run these tests in the reverse order
(in case R was storing large objects from the first run, thus chewing up
RAM)?



Dear All,

Is it known that source works much faster in R 2.15.2 than in R 3.0.2?
In the example below I observe e.g. for a data.frame with 10^7 rows the
following timings:

R version 2.15.2 Patched (2012-11-29 r61184)
length: 1e+07
     user  system elapsed
    62.04    0.22   62.26

R version 3.0.2 Patched (2013-10-27 r64116)
length: 1e+07
     user  system elapsed
   388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R
version 2.15.2?

best regards,

Heinz Tüchler


example:
sessionInfo()
sample.vec <-
    c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
      'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
    df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
    dump('df0', file='testdump')
    cat('length:', i, '\n')
    print(system.time(source('testdump', keep.source = FALSE,
                             encoding='')))
}

output for R version 2.15.2 Patched (2012-11-29 r61184):
sessionInfo()
R version 2.15.2 Patched (2012-11-29 r61184)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
     user  system elapsed
        0       0       0
length: 100
     user  system elapsed
        0       0       0
length: 1000
     user  system elapsed
        0       0       0
length: 10000
     user  system elapsed
     0.02    0.00    0.01
length: 1e+05
     user  system elapsed
     0.21    0.00    0.20
length: 1e+06
     user  system elapsed
     4.47    0.04    4.51
length: 1e+07
     user  system elapsed
    62.04    0.22   62.26



output for R version 3.0.2 Patched (2013-10-27 r64116):
sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
     user  system elapsed
        0       0       0
length: 100
     user  system elapsed
        0       0       0
length: 1000
     user  system elapsed
        0       0       0
length: 10000
     user  system elapsed
     0.01    0.00    0.01
length: 1e+05
     user  system elapsed
     0.36    0.06    0.42
length: 1e+06
     user  system elapsed
     6.02    1.86    7.88
length: 1e+07
     user  system elapsed
   388.63  176.42  566.41






--
View this message in context: 
http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.