Re: [R] How do I use "tapply" in this case ?

David Winsemius Fri, 05 Feb 2010 05:35:10 -0800


On Feb 5, 2010, at 1:50 AM, Bert Gunter wrote:

Folks:
You can make use of matrix subscripting and avoid R-level loops and applys
altogether. This will end up being many times faster.

Here's your original code:

Z=matrix(rnorm(20), nrow=4)
index=replicate(4, sample(1:5, 3))
P=4
tmpr=list()
for (i in 1:P)
{
 tmp = Z[i,index[,i]]
 tmpr[[i]]=tmp
}

For clarity, here's the index matrix I got:
index
     [,1] [,2] [,3] [,4]
[1,]    5    1    2    3
[2,]    2    2    4    4
[3,]    1    5    5    5

Here's what I got for tmpr when I used your code:
tmpr
[[1]]
[1] -0.6246316 -0.8695538 -0.4136176

[[2]]
[1]  0.02885345 -1.89837071  0.43195955

[[3]]
[1]  0.2453368 -0.1788287 -0.6620405

[[4]]
[1] -0.87077697 -1.62554371  0.04464793

So the ith component of tmpr is just what the indices in the ith column of
index pick out of the ith row of Z. That is, the first component of tmpr
consists of the (1,5), (1,2), and (1,1) elements of Z. Matrix (in general,
array) indexing -- read the man page for "[" carefully: it's documented in
the "Matrices and Arrays" section -- allows you to "stack" these pairs (for
n-dim arrays, n-tuples) row-wise into a matrix and use this matrix as an
index:

Z[cbind(c(1,1,1),index[,1])]
[1] -0.6246316 -0.8695538 -0.4136176
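
To make the addressing easier to check by eye, here is a minimal sketch of the same kind of matrix index on a small deterministic matrix. The names M, pairs and picked are made up for this illustration and are not from the thread.

```r
# Deterministic 4 x 5 matrix, filled column-wise: M[i, j] == i + 4 * (j - 1).
M <- matrix(1:20, nrow = 4)

# Each row of `pairs` is one (row, column) address into M.
pairs <- cbind(c(1, 1, 1), c(5, 2, 1))

# A two-column matrix used as the single subscript picks one element per row
# of `pairs`: here the (1,5), (1,2) and (1,1) elements.
picked <- M[pairs]
print(picked)   # 17 5 1
```

Because the index has one column per dimension of M, R treats each row as an address rather than as a set of linear positions.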

So you can do everything at once (making use of R's columnwise storage of
arrays) as:

result <- Z[cbind(as.vector(col(index)), as.vector(index))]

which gives:
[1] -0.62463163 -0.86955383 -0.41361765  0.02885345 -1.89837071  0.43195955  0.24533679
[8] -0.17882867 -0.66204048 -0.87077697 -1.62554371  0.04464793

Note that this vector is the same as: unlist(tmpr). So you can turn it into
a matrix, e.g. where column i is the ith component of tmpr, by:

dim(result) <- dim(index)

As I said, for large problems, this should be wayyyyy faster than explicit
loops or the hidden (and optimized, but still) loops of apply functions.
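
The equivalence claimed above can be checked with a short self-contained sketch. The seed is an assumption added only so the run is repeatable; the numbers will not match the output quoted in this message, which was produced without a recorded seed.

```r
set.seed(1)                               # assumed seed, for repeatability only
Z     <- matrix(rnorm(20), nrow = 4)      # 4 x 5 data matrix
index <- replicate(4, sample(1:5, 3))     # 3 x 4: column i holds the picks for row i of Z

# Explicit loop, as in the original question.
tmpr <- vector("list", ncol(index))
for (i in seq_len(ncol(index))) {
  tmpr[[i]] <- Z[i, index[, i]]
}

# Matrix-index version: col(index) supplies the row of Z, index the column.
result <- Z[cbind(as.vector(col(index)), as.vector(index))]
dim(result) <- dim(index)                 # column i of result is tmpr[[i]]

same_values  <- identical(unlist(tmpr), as.vector(result))
same_columns <- all(vapply(seq_len(ncol(index)),
                           function(i) identical(tmpr[[i]], result[, i]),
                           logical(1)))
print(c(same_values = same_values, same_columns = same_columns))
```

Both flags should come out TRUE, since the two versions extract exactly the same elements of Z in the same order.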


Well, twice as fast as the explicit loop anyway:

> system.time( replicate(10000, {
+     result <- Z[cbind(as.vector(col(index)), as.vector(index))]
+     dim(result) <- dim(index)
+ } ) )
   user  system elapsed
  0.164   0.001   0.171

> system.time( replicate(10000, for (i in 1:P) { tmpr[[i]] <- Z[i, index[,i]] } ) )
   user  system elapsed
  0.267   0.049   0.330

Which was in turn twice as fast as the lapply approach:

> system.time( replicate(10000, tmpr[1:4] <- lapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ) )
   user  system elapsed
  0.628   0.015   0.646
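
The timings above are for a 4 x 5 matrix, where per-call overhead dominates. A rough sketch of the same comparison on a larger problem follows; the sizes, the seed and the helper names by_loop and by_matrix_index are all made up for illustration, and the timings will vary by machine, so none are quoted here.

```r
set.seed(42)                                   # assumed seed, for repeatability only
n_rows <- 2000; n_cols <- 50; n_pick <- 10     # hypothetical problem size
Z     <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)
index <- replicate(n_rows, sample(seq_len(n_cols), n_pick))  # n_pick x n_rows

# Explicit loop: one extraction per row of Z.
by_loop <- function(Z, index) {
  out <- vector("list", ncol(index))
  for (i in seq_len(ncol(index))) out[[i]] <- Z[i, index[, i]]
  out
}

# Single matrix-index extraction for all rows at once.
by_matrix_index <- function(Z, index) {
  res <- Z[cbind(as.vector(col(index)), as.vector(index))]
  dim(res) <- dim(index)
  res
}

t_loop <- system.time(for (k in 1:20) loop_res <- by_loop(Z, index))
t_mat  <- system.time(for (k in 1:20) mat_res  <- by_matrix_index(Z, index))

# Elapsed seconds for 20 repetitions of each approach.
print(c(loop = t_loop[["elapsed"]], matrix_index = t_mat[["elapsed"]]))

# Both approaches must return the same values.
stopifnot(identical(unlist(loop_res), as.vector(mat_res)))
```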

--
David.



Bert Gunter
Genentech Nonclinical Statistics




-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of RICHARD M. HEIBERGER
Sent: Thursday, February 04, 2010 9:10 PM
To: Carrie Li
Cc: r-help
Subject: Re: [R] How do I use "tapply" in this case ?

lapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ## reproduces your results

sapply(1:4, function(i, x, y) {x[i,y[,i]]}, Z, index ) ## collapses your list into a set of columns
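
A small sketch of how the two calls differ in the shape of what they return. It takes column i of index for row i of Z, as the original loop does; the seed and the helper name pick are assumptions added for this illustration.

```r
set.seed(1)                               # assumed seed, for repeatability only
Z     <- matrix(rnorm(20), nrow = 4)
index <- replicate(4, sample(1:5, 3))

# Hypothetical helper: elements of row i of x picked by column i of y.
pick <- function(i, x, y) x[i, y[, i]]

as_list   <- lapply(1:4, pick, Z, index)  # list of 4 numeric vectors, each of length 3
as_matrix <- sapply(1:4, pick, Z, index)  # 3 x 4 matrix, one column per list component

str(as_list)
print(dim(as_matrix))                     # 3 4
```

sapply simplifies to a matrix here only because every component has the same length; with unequal lengths it would fall back to returning a list.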

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT
