[R] Fwd: Re: speed up process

Ivan Calandra Mon, 28 Feb 2011 01:25:05 -0800

Dear Jim,

Here is again exactly what I did and with the output of Rprof (with thisreduced dataset and with a simpler function, it is here much faster thanin real life).


Thanks you again for your help!


## CODE ##

mydata1<- structure(list(species = structure(1:8, .Label =c("alsen","gogor", "loalb", "mafas", "pacyn", "patro", "poabe","thgel"), class = "factor"), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35,0.9, 0.68, 0), Asfc = c(207.463765, 138.5533755, 70.4391735,160.9742745, 41.455809, 119.155109, 26.241441, 148.337377), Tfv =c(47068.1437773483, 43743.8087431582, 40323.5209129239,23420.9455581495, 29382.6947428651, 50460.2202192311, 21810.1456510625,41747.6053810881)), .Names = c("species", "fruit", "Asfc", "Tfv"),row.names = c(NA, 8L), class = "data.frame")


mydata2<- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
mydata3<- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
mydata_list<- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

library(WRS)
foo_reg<- function(dat, xvar, yvar, mycol, pos, name.dat){
 tsts<- tstsreg(dat[[xvar]], dat[[yvar]])
 tsts_inter<- signif(tsts$coef[1], digits=3)
 tsts_slope<- signif(tsts$coef[2], digits=3)
 abline(tsts$coef, lty=1, col=mycol)

legend(x=pos, legend=c(paste("TSTS ",name.dat,":Y=",tsts_inter,"+",tsts_slope,"X",sep="")), lty=1, col=mycol)

}

ind.xvar<- 2
seq.yvar<- 3:4
mypos<- c("topleft", "topright","bottomleft")

par(mfrow=c(2,1))
Rprof()
for (i in seq_along(seq.yvar)){
  k<- seq.yvar[i]

plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])

  for (j in seq_along(mydata_list)){

foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,pos=mypos[j], name.dat=names(mydata_list)[j])

  }
}
Rprof(NULL)

summaryRprof()
$by.self
         self.time self.pct total.time total.pct
pt            0.04    18.18       0.04     18.18
plot          0.02     9.09       0.08     36.36
sc            0.02     9.09       0.08     36.36
mean          0.02     9.09       0.04     18.18
|             0.02     9.09       0.02      9.09
axis          0.02     9.09       0.02      9.09
box           0.02     9.09       0.02      9.09
ifelse        0.02     9.09       0.02      9.09
plot.new      0.02     9.09       0.02      9.09
sort          0.02     9.09       0.02      9.09

$by.total
               total.time total.pct self.time self.pct
foo_reg              0.14     63.64      0.00     0.00
tstsreg              0.12     54.55      0.00     0.00
plot                 0.08     36.36      0.02     9.09
sc                   0.08     36.36      0.02     9.09
do.call              0.06     27.27      0.00     0.00
plot.default         0.06     27.27      0.00     0.00
plot.formula         0.06     27.27      0.00     0.00
pt                   0.04     18.18      0.04    18.18
mean                 0.04     18.18      0.02     9.09
corfun               0.04     18.18      0.00     0.00
median               0.04     18.18      0.00     0.00
median.default       0.04     18.18      0.00     0.00
tsreg                0.04     18.18      0.00     0.00
|                    0.02      9.09      0.02     9.09
axis                 0.02      9.09      0.02     9.09
box                  0.02      9.09      0.02     9.09
ifelse               0.02      9.09      0.02     9.09
plot.new             0.02      9.09      0.02     9.09
sort                 0.02      9.09      0.02     9.09
Axis                 0.02      9.09      0.00     0.00
Axis.default         0.02      9.09      0.00     0.00
legend               0.02      9.09      0.00     0.00
localAxis            0.02      9.09      0.00     0.00
localBox             0.02      9.09      0.00     0.00
par                  0.02      9.09      0.00     0.00
rect                 0.02      9.09      0.00     0.00
rect2                0.02      9.09      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 0.22

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] WRS_0.12.1       robustbase_0.6-2 akima_0.5-4      MASS_7.3-11



-------- Message original --------
Sujet:  Re: [R] speed up process
Date :  Fri, 25 Feb 2011 13:42:58 -0500
De :    jim holtman <jholt...@gmail.com>
Pour :  ivan.calan...@uni-hamburg.de



What did the output from Rprof look like?  How much time is the rest
of the script taking in proportion of foo_reg?  That would indicate if
it is worth it to spend the time in trying to improve it.  Time to
change the code vs. the number of time that you expect to run it with
its current "slowness"?

On Fri, Feb 25, 2011 at 8:38 AM, Ivan Calandra
<ivan.calan...@uni-hamburg.de>  wrote:

 Ha... it was way too simple!
 I thought it would be like system.time()... my bad. Thanks for the tip!

 As we thought, foo_reg() takes most of the computing time, and I cannot
 improve that.
 Any ideas of how to improve the rest?

 Thanks again for your help
 Ivan


 Le 2/25/2011 14:29, jim holtman a écrit :


 You invoke Rprof, run your code and then terminate it:


 Rprof()
 ....... code you want to profile
 Rprof(NULL)  # generate output
 summaryRprof()

 example:

 Rprof()
 for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
 Rprof(NULL)
 summaryRprof()


 $by.self
       self.time self.pct total.time total.pct
 sin       0.24    30.77       0.24     30.77
 sqrt      0.22    28.21       0.22     28.21
 cos       0.16    20.51       0.16     20.51
 +         0.14    17.95       0.14     17.95
 :         0.02     2.56       0.02      2.56

 $by.total
       total.time total.pct self.time self.pct
 sin        0.24     30.77      0.24    30.77
 sqrt       0.22     28.21      0.22    28.21
 cos        0.16     20.51      0.16    20.51
 +          0.14     17.95      0.14    17.95
 :          0.02      2.56      0.02     2.56

 $sample.interval
 [1] 0.02

 $sampling.time
 [1] 0.78


 On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra
 <ivan.calan...@uni-hamburg.de>    wrote:


 Dear Jim,

 I've tried to use Rprof() as you advised me, but I don't understand how
 it
 works.
 I've done this:
 Rprof(for (i in seq_along(seq.yvar)){
   all_my_commands
 })
 summaryRprof()

 But I got this error:
 Error in summaryRprof() : no lines found in ‘Rprof.out’

 I couldn't really understand from the help page what I should do.

 In any case, it's sure that the function tstsreg(), is what takes the
 most
 computing time. But I wanted to optimize the rest of the code to gain as
 much speed as possible.

 Ivan

 Le 2/25/2011 12:30, Jim Holtman a écrit :


 use Rprof to find where time is being spent.  probably in 'plot' which
 might imply it is not the 'for' loop and therefore beyond your control.

 Sent from my iPad

 On Feb 25, 2011, at 6:19, Ivan Calandra<ivan.calan...@uni-hamburg.de>
   wrote:

 Thanks Nick for your quick answer.
 It does work (no missed bracket!) but unfortunately doesn't really
 speed
 up anything: with my real data, it takes 82.78 seconds with the double
 lapply() instead of 83.59s with the double loop (about 0.8 s).

 It looks like my double loop was not that bad. Does anyone know another
 faster way to do this?

 Thanks again in advance,
 Ivan

 Le 2/25/2011 11:41, Nick Sabbe a écrit :


 Simply avoiding the for loops by using lapply (I may have missed a
 bracket
 here or there cause I did this without opening R)...
 Haven't checked the speed up, though.

 lapply(seq.yvar, function(k){
     plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
 xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
     lapply(seq_along(mydata_list), function(j){
       foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
 pos=mypos[j], name.dat=names(mydata_list)[j])
       return(NULL)
     })
     invisible(NULL)
 })

 HTH,

 Nick Sabbe
 --
 ping: nick.sa...@ugent.be
 link: http://biomath.ugent.be
 wink: A1.056, Coupure Links 653, 9000 Gent
 ring: 09/264.59.36

 -- Do Not Disapprove




 -----Original Message-----
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Ivan Calandra
 Sent: vrijdag 25 februari 2011 11:20
 To: r-help
 Subject: [R] speed up process

 Dear users,

 I have a double for loop that does exactly what I want, but is quite
 slow. It is not so much with this simplified example, but IRL it is
 slow.
 Can anyone help me improve it?

 The data and code for foo_reg() are available at the end of the email;
 I
 preferred going directly into the problematic part.
 Here is the code (I tried to simplify it but I cannot do it too much
 or
 else it wouldn't represent my problem). It might also look too complex
 for what it is intended to do, but my colleagues who are also supposed
 to use it don't know much about R. So I wrote it so that they don't
 have
 to modify the critical parts to run the script for their needs.

 #column indexes for function
 ind.xvar<- 2
 seq.yvar<- 3:4
 #position vector for legend(), stupid positioning but it doesn't
 matter
 here
 mypos<- c("topleft", "topright","bottomleft")

 #run the function for columns 3&4 as y (seq.yvar) with column 2 as x
 (ind.xvar) for all 3 datasets (mydata_list)
 par(mfrow=c(2,1))
 for (i in seq_along(seq.yvar)){
     k<- seq.yvar[i]
     plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
 xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
     for (j in seq_along(mydata_list)){
       foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
 pos=mypos[j], name.dat=names(mydata_list)[j])
     }
 }

 I tried with lapply() or mapply() but couldn't manage to pass the
 arguments for names() and col= correctly, e.g. for the 2nd loop:
 lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
 yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
 mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
 mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar,
 yvar=k))

 Thanks in advance for any hints.
 Ivan




 #create data (it looks horrible with these datasets but it doesn't
 matter here)
 mydata1<- structure(list(species = structure(1:8, .Label = c("alsen",
 "gogor", "loalb", "mafas", "pacyn", "patro", "poabe", "thgel"), class
 =
 "factor"), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc
 =
 c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
 119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483,
 43743.8087431582, 40323.5209129239, 23420.9455581495,
 29382.6947428651,
 50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names =
 c("species", "fruit", "Asfc", "Tfv"), row.names = c(NA, 8L), class =
 "data.frame")

 mydata2<- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
 mydata3<- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
 mydata_list<- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)

 #function for regression
 library(WRS)
 foo_reg<- function(dat, xvar, yvar, mycol, pos, name.dat){
    tsts<- tstsreg(dat[[xvar]], dat[[yvar]])
    tsts_inter<- signif(tsts$coef[1], digits=3)
    tsts_slope<- signif(tsts$coef[2], digits=3)
    abline(tsts$coef, lty=1, col=mycol)
    legend(x=pos, legend=c(paste("TSTS ",name.dat,":
 Y=",tsts_inter,"+",tsts_slope,"X",sep="")), lty=1, col=mycol)
 }

 --
 Ivan CALANDRA
 PhD Student
 University of Hamburg
 Biozentrum Grindel und Zoologisches Museum
 Abt. Säugetiere
 Martin-Luther-King-Platz 3
 D-20146 Hamburg, GERMANY
 +49(0)40 42838 6231
 ivan.calan...@uni-hamburg.de

 **********
 http://www.for771.uni-bonn.de
 http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

 ______________________________________________
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 --
 Ivan CALANDRA
 PhD Student
 University of Hamburg
 Biozentrum Grindel und Zoologisches Museum
 Abt. Säugetiere
 Martin-Luther-King-Platz 3
 D-20146 Hamburg, GERMANY
 +49(0)40 42838 6231
 ivan.calan...@uni-hamburg.de

 **********
 http://www.for771.uni-bonn.de
 http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php


 --
 Ivan CALANDRA
 PhD Student
 University of Hamburg
 Biozentrum Grindel und Zoologisches Museum
 Abt. Säugetiere
 Martin-Luther-King-Platz 3
 D-20146 Hamburg, GERMANY
 +49(0)40 42838 6231
 ivan.calan...@uni-hamburg.de

 **********
 http://www.for771.uni-bonn.de
 http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fwd: Re: speed up process

Reply via email to