Thank you, Mike, for looking into the problem and for your helpful advice; I
really appreciate it.  Also, thank you, Bill, for pointing out the bad
data.frame code.

I modified the code per your suggestions and ran some timing tests on n=2000
(I increased the number of observations and groups because I actually have a
much bigger dataset and a more complicated getResult function with 100 lines
of code):

original code:
> proc.time() - ptm
   user  system elapsed
  72.37    0.06   72.51

modified code:
> proc.time() - ptm
   user  system elapsed
  73.21    0.20   81.26

Surprisingly, lapply doesn't appear to save time.  Perhaps I should use a dplyr
function to extract the groups, but I tried that before and it didn't save
time either.  I read that data.table is faster (where applicable); perhaps I
should go in that direction?  Any thoughts on speeding up these operations are
much appreciated.

Thank you,
Anne


--------------------------------------------- CODE below
# example getResult() from the original post
getResult <- function(d) weighted.mean(x=d[,1], w=d[,2])
#sample data setup
n=2000;
set.seed(1)
g=rep(1:500,each=4)
df=data.frame(s=sort(rnorm(mean=15,sd=10, n)), w=runif(n), h=rbinom(n, 1, 0.4) , g ); df
getResult(df)
#i0=c(1,2,4,5,5)
i0=rep(c(1,2,4,5,5),100)
ng= length(unique(g))
#initialization of result matrix
A=B=matrix(Inf, ng,  ng); A
## my code (Anne)
ptm = proc.time()
for(i in 1:ng) {
  #cat("i:",i," ")
  for(j in i0[i]:ng) {
    ok = !is.na(match(g,i:j));
    #cat("j:",j,"\n");
    A[i,j]=getResult(d=df[ok,])
  } #end for (j)
} #end for (i)
proc.time() - ptm
## Mike's code
ptm = proc.time()
invisible(lapply(1:ng, function(i) {
  lapply(i0[i]:ng, function(j) {
    ok <- !is.na(match(g, i:j))
    B[i, j] <<- getResult(df[ok, ])
  })
}))
proc.time() - ptm
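Since the timings suggest that for vs. lapply is not the bottleneck, here is a minimal sketch of one way to cut the cost of each iteration instead. It assumes (as in the example data) that rows are sorted by group and groups are coded 1..ng, so the rows for groups i..j always form one contiguous block. Under that assumption the O(n) match() scan in every iteration can be replaced by precomputed first/last row indices, and, following Bill's suggestion, rows are taken from a numeric matrix rather than a data.frame. The helper fill_by_ranges() is hypothetical, not from the thread; the sketch is only checked against the original loop on the small example and has not been timed here.

```r
# Example function from the thread
getResult <- function(d) weighted.mean(x = d[, 1], w = d[, 2])

# Hypothetical helper. ASSUMPTION: rows are sorted by group and groups are
# coded 1..ng, so each group occupies one contiguous block of rows.
fill_by_ranges <- function(m, g, i0) {
  ng    <- length(unique(g))
  sizes <- tabulate(g, nbins = ng)  # number of rows in each group
  last  <- cumsum(sizes)            # last row of each group
  first <- last - sizes + 1         # first row of each group
  res   <- matrix(Inf, ng, ng)
  for (i in 1:ng) {
    for (j in i0[i]:ng) {
      lo <- min(i, j); hi <- max(i, j)  # i:j selects the same groups in either order
      res[i, j] <- getResult(m[first[lo]:last[hi], , drop = FALSE])
    }
  }
  res
}

# Small example from the original post, built with data.frame() as Bill suggests
n <- 20
set.seed(1)
g  <- rep(1:5, each = 4)
df <- data.frame(s = sort(rnorm(mean = 15, sd = 10, n)),
                 w = runif(n), h = rbinom(n, 1, 0.4), g)
i0 <- c(1, 2, 4, 5, 5)
ng <- length(unique(g))

# Reference result: the original loop with match() and data.frame subsetting
A <- matrix(Inf, ng, ng)
for (i in 1:ng) for (j in i0[i]:ng) A[i, j] <- getResult(df[!is.na(match(g, i:j)), ])

C <- fill_by_ranges(as.matrix(df), g, i0)
all.equal(A, C)
```

To time it on the larger data, build df with n=2000 as in the code above and call fill_by_ranges(as.matrix(df), g, i0) between the two proc.time() calls. Note that as.matrix() keeps the column names, so d[, "w"] still works inside getResult, but d$w does not work on a matrix.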

 


On Wednesday, October 14, 2015 11:35 AM, William Dunlap <wdun...@tibco.com> wrote:

> df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g ))

This is a lousy way to make a data.frame - the cbind forces all columns to be
the same type and forces them into one vector, then as.data.frame splits them
up into separate columns again.  You also get weird names for your columns.
If you want to make a data.frame, use
   df <- data.frame(ColA = sort(rnorm(mean=15,sd=10, n)), ColB = runif(n),
                    ColC = rbinom(n, 1, 0.4) , g = g)

However, since the columns you are passing to getResult are both numeric, a
matrix (made with cbind) would work just as well, and selecting rows from it
will probably be faster.  You will have to have a large number of groups
before you notice the difference.
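A quick way to see the column-type point, using the small example from the original post (ColA, ColB, ColC are the placeholder names from the message above). Re-seeding before each construction makes both objects hold the same numbers, so only the container differs. The last lines illustrate that, because getResult only reads numeric columns, a plain matrix gives the same answer.

```r
n <- 20
g <- rep(1:5, each = 4)
getResult <- function(d) weighted.mean(x = d[, 1], w = d[, 2])

# cbind() first: everything is coerced into one double matrix
set.seed(1)
bad <- as.data.frame(cbind(sort(rnorm(mean = 15, sd = 10, n)), runif(n),
                           rbinom(n, 1, 0.4), g))

# data.frame() directly: each column keeps its own type and a chosen name
set.seed(1)
good <- data.frame(ColA = sort(rnorm(mean = 15, sd = 10, n)), ColB = runif(n),
                   ColC = rbinom(n, 1, 0.4), g = g)

sapply(bad, class)   # all "numeric": the integers from rbinom() and g were converted
sapply(good, class)  # ColC and g stay "integer"
names(bad)           # automatically generated names, not ones you chose

# getResult() only uses the first two (numeric) columns, so a matrix works too
m <- cbind(good$ColA, good$ColB)
all.equal(getResult(good), getResult(m))
```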

Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Oct 14, 2015 at 2:02 AM, Michael Hannon <jmhannon.ucda...@gmail.com> 
wrote:

I've done a simple-minded transliteration of your code into code using nested
lapply's.  I doubt that it buys you much in terms of performance (or even
clarity, which is really one of the main advantages of the `apply` family).


> A
        [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 3.06097  6.507521 10.99610 12.05556 15.10388
[2,]     Inf 11.818495 15.85044 16.69465 19.70425
[3,]     Inf       Inf      Inf 19.14779 22.30343
[4,]     Inf       Inf      Inf      Inf 26.11170
[5,]     Inf       Inf      Inf      Inf 28.29882

> B
        [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 3.06097  6.507521 10.99610 12.05556 15.10388
[2,]     Inf 11.818495 15.85044 16.69465 19.70425
[3,]     Inf       Inf      Inf 19.14779 22.30343
[4,]     Inf       Inf      Inf      Inf 26.11170
[5,]     Inf       Inf      Inf      Inf 28.29882
> all.equal(A, B)
[1] TRUE

If I happen to think of a more-elegant approach, I'll let you know.

-- Mike

Appendix: code
==============

###### Anne's code

getResult <- function(d) {
  # example function
  weighted.mean(x=d[,1], w=d[,2])
}

#example data setup
n=20;
set.seed(1)
g=rep(1:5,each=4)
df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g )); df
getResult(df)
i0=c(1,2,4,5,5)
ng= length(unique(g))

#initialization of result matrix
A=matrix(Inf, ng, ng); A
for(i in 1:ng) {
  cat("i:",i,"")
  for(j in i0[i]:ng) {
    ok= !is.na(match(g,i:j)); cat("j:",j,"\n");
    A[i,j]=getResult(d=df[ok,])
  } #end for (j)
} #end for (i)
A

###### Mike's code

n <- 20;
set.seed(1)
g <- rep(1:5,each=4)
df <- as.data.frame(cbind(sort(rnorm(mean=15,sd=10, n)),
                          runif(n),
                          rbinom(n, 1, 0.4),
                          g )); df
getResult(df)
i0 <- c(1,2,4,5,5)
ng <- length(unique(g))

B <- matrix(Inf, ng, ng);

invisible(lapply(1:ng, function(i) {
                     lapply(i0[i]:ng, function(j) {
                                ok <- !is.na(match(g, i:j))
                                B[i, j] <<- getResult(df[ok, ])
                            })
                 }))

B
all.equal(A, B)
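A side-effect-free variant of the nested lapply, sketched as an alternative to `<<-`: first list every (i, j) cell the loops visit, then compute one result per cell with mapply(), then fill the matrix in one step with two-column matrix indexing. This is the "flat" layout asked about in the original post. It is probably no faster than the for loop, since getResult() and the row subsetting still dominate; the gain is only that each step returns a value instead of modifying B from inside a function. The names pairs, vals and B2 are illustrative.

```r
getResult <- function(d) weighted.mean(x = d[, 1], w = d[, 2])

n <- 20
set.seed(1)
g  <- rep(1:5, each = 4)
df <- data.frame(s = sort(rnorm(mean = 15, sd = 10, n)),
                 w = runif(n), h = rbinom(n, 1, 0.4), g)
i0 <- c(1, 2, 4, 5, 5)
ng <- length(unique(g))

# 1. Enumerate every (i, j) cell the nested loops would visit, as a two-column matrix
pairs <- do.call(rbind, lapply(1:ng, function(i) cbind(i = i, j = i0[i]:ng)))

# 2. Compute one result per pair; nothing is written outside the function
vals <- mapply(function(i, j) getResult(df[!is.na(match(g, i:j)), ]),
               pairs[, "i"], pairs[, "j"])

# 3. Fill the matrix in one step: a two-column index matrix addresses (row, col) cells
B2 <- matrix(Inf, ng, ng)
B2[pairs] <- vals
B2
```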


On Mon, Oct 12, 2015 at 5:55 PM, Annie Hawk via R-help
<r-help@r-project.org> wrote:
> Hi R experts,
>
>
> I am trying to speed up my calculation of the A results below and replace the
> for loop with some functionals like lapply.  After a lot of reading, trial and
> error, I still have not succeeded.  Would anyone please give me some hints
> on that?
>
> Thank you in advance.
>
> Anne
>
>
> The problem is this: I have a complicated function that needs to operate on
> some subsets of a dataset many times, depending on the values of the group
> variable.  I have simplified the function and dataset for this example.
>
> getResult <- function(d) {
>   # example function
>   weighted.mean(x=d[,1], w=d[,2])
> }
>
> #example data setup
> n=20;
> set.seed(1)
> g=rep(1:5,each=4)
> df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g )); df
> getResult(df)
> i0=c(1,2,4,5,5)
> ng= length(unique(g))
>
> #initialization of result matrix
> A=matrix(Inf, ng, ng); A
> for(i in 1:ng) {
>   cat("i:",i,"")
>   for(j in i0[i]:ng) {
>     ok= !is.na(match(g,i:j)); cat("j:",j,"\n");
>     A[i,j]=getResult(d=df[ok,])
>   } #end for (j)
> } #end for (i)
>
> Is there an elegant way to remove the for loop here?  I tried to flatten it
> so it would run faster, but I cannot figure out how to subset the observations
> quickly, and without errors, before applying the function getResult.  Any hint
> is appreciated.
>
>
>
>
>
> On another note, is there a more elegant way to initialize the list as follows?
>
> mylist=list(); w=rep(4,5)
>
> for (i in 1:5) mylist[[i]]=w[i:5]
>
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
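For the second question in the original post (initializing mylist), a minimal sketch: lapply() already returns a list, so neither the empty list() nor the loop is needed. Using seq_along(w) and length(w) instead of the literal 5 is a small generalization, not something from the thread.

```r
w <- rep(4, 5)

# Loop from the original post: grows the list one element at a time
mylist <- list()
for (i in 1:5) mylist[[i]] <- w[i:5]

# lapply() builds the same list in one call
mylist2 <- lapply(seq_along(w), function(i) w[i:length(w)])

identical(mylist, mylist2)
```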
