Try running Rprof on the two examples to see where the difference comes from. You will probably see that much of the time on the data frame is spent indexing it like a matrix (the '[' function). Rprof is very helpful for seeing where time is spent in your scripts.
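
A minimal sketch of such a profiling run, in case it helps. It assumes a smaller random matrix as a stand-in for the rmvnorm() data and repeats the nested-loop distance from the quoted message so that it runs on its own; the names m, d.frame and prof_file are only illustrative.

```r
# Profile the nested-loop distance on a data frame with Rprof.
set.seed(1)
m <- matrix(rnorm(60 * 10), nrow = 60)   # stand-in for the rmvnorm() data
d.frame <- as.data.frame(m)

# Same computation as dist.for in the quoted message
dist.for <- function(data) {
  r <- nrow(data)
  n <- ncol(data)
  d <- matrix(0, nrow = r, ncol = r)
  for (i in seq_len(r)) {
    for (j in seq_len(r)) {
      d[i, j] <- sum(abs(data[i, ] - data[j, ])) / n
    }
  }
  as.dist(d)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
res <- dist.for(d.frame)
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.total, 10))      # functions ranked by total time
```

If data-frame methods such as "[.data.frame" rank near the top, converting the data once with as.matrix() before the loops is one option worth trying.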

On Oct 21, 2009, at 17:17, Roberto Perdisci <roberto.perdi...@gmail.com> wrote:

Hi everybody,
I noticed a strange behavior when using loops versus apply() on a data frame.
The example below "explicitly" computes a distance matrix given a
dataset. When the dataset is a matrix, everything works fine. But when
the dataset is a data.frame, the dist.for function written using
nested loops takes a lot longer than dist.apply.

######## USING FOR #######

dist.for <- function(data) {

 d <- matrix(0,nrow=nrow(data),ncol=nrow(data))
 n <- ncol(data)
 r <- nrow(data)

 for(i in 1:r) {
    for(j in 1:r) {
       d[i,j] <- sum(abs(data[i,]-data[j,]))/n
    }
 }

 return(as.dist(d))
}

######## USING APPLY #######

f <- function(data.row,data.rest) {

 # distances from data.row to every row of data.rest
 return(as.double(apply(data.rest,1,g,data.row)))

}

g <- function(row2,row1) {
 return(sum(abs(row1-row2))/length(row1))
}

dist.apply <- function(data) {
 d <- apply(data,1,f,data)

 return(as.dist(d))
}


######## TESTING #######

library(mvtnorm)
data <- rmvnorm(100,mean=seq(1,10),sigma=diag(1,nrow=10,ncol=10))

tf <- system.time(df <- dist.for(data))
ta <- system.time(da <- dist.apply(data))

print(paste('diff = ',sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)

print('----------------------------------')
print('Same experiment on data.frame...')
data2 <- as.data.frame(data)

tf <- system.time(df <- dist.for(data2))
ta <- system.time(da <- dist.apply(data2))

print(paste('diff = ',sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)

########################

Here is the output I get on my system (R version 2.7.1 on Debian lenny):

[1] "diff =  0"
[1] "tf = "
  user  system elapsed
 0.088   0.000   0.087
[1] "ta = "
  user  system elapsed
 0.128   0.000   0.128
[1] "----------------------------------"
[1] "Same experiment on data.frame..."
[1] "diff =  0"
[1] "tf = "
  user  system elapsed
35.031   0.000  35.029
[1] "ta = "
  user  system elapsed
 0.184   0.000   0.185

Could you explain why that happens?

thank you,
regards

Roberto

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
