Hi everybody, I noticed some strange behavior when using loops versus apply() on a data frame. The example below "explicitly" computes a distance matrix for a given dataset. When the dataset is a matrix, everything works fine. But when the dataset is a data.frame, the dist.for function, written with nested loops, takes far longer than dist.apply.
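To be concrete about what each entry of the result holds: d[i,j] is the sum of absolute differences between rows i and j, divided by the number of columns. Here is a tiny sketch with made-up vectors; the name mean_abs_diff and the values are only for illustration and are not part of the script itself:

```r
# Hypothetical helper, for illustration only: mean absolute difference
# between two numeric vectors of equal length.
mean_abs_diff <- function(x, y) {
  sum(abs(x - y)) / length(x)
}

x <- c(1, 2, 3)
y <- c(2, 4, 6)
print(mean_abs_diff(x, y))  # (1 + 2 + 3) / 3 = 2
```

Both dist.for and dist.apply evaluate this same quantity for every pair of rows.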

######## USING FOR #######
dist.for <- function(data) {
  d <- matrix(0, nrow=nrow(data), ncol=nrow(data))
  n <- ncol(data)
  r <- nrow(data)
  # fill every (i,j) entry with the mean absolute difference of rows i and j
  for(i in 1:r) {
    for(j in 1:r) {
      d[i,j] <- sum(abs(data[i,]-data[j,]))/n
    }
  }
  return(as.dist(d))
}

######## USING APPLY #######
# distances from one row (data.row) to every row of data.rest
f <- function(data.row, data.rest) {
  r2 <- as.double(apply(data.rest, 1, g, data.row))
}

# mean absolute difference between two rows
g <- function(row2, row1) {
  return(sum(abs(row1-row2))/length(row1))
}

dist.apply <- function(data) {
  d <- apply(data, 1, f, data)
  return(as.dist(d))
}

######## TESTING #######
library(mvtnorm)
data <- rmvnorm(100, mean=seq(1,10), sigma=diag(1,nrow=10,ncol=10))

tf <- system.time(df <- dist.for(data))
ta <- system.time(da <- dist.apply(data))
print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)

print('----------------------------------')
print('Same experiment on data.frame...')
data2 <- as.data.frame(data)

tf <- system.time(df <- dist.for(data2))
ta <- system.time(da <- dist.apply(data2))
print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)
########################

Here is the output I get on my system (R version 2.7.1 on Debian lenny):

[1] "diff = 0"
[1] "tf = "
   user  system elapsed
  0.088   0.000   0.087
[1] "ta = "
   user  system elapsed
  0.128   0.000   0.128
[1] "----------------------------------"
[1] "Same experiment on data.frame..."
[1] "diff = 0"
[1] "tf = "
   user  system elapsed
 35.031   0.000  35.029
[1] "ta = "
   user  system elapsed
  0.184   0.000   0.185

Could you explain why this happens?

Thank you,
regards,
Roberto

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help