On Wed, Oct 1, 2008 at 7:21 AM, Cézar Freitas <[EMAIL PROTECTED]> wrote:
> Hi. I searched the list and didn't find anything similar to this. I
> simplified my example as follows:
>
> # I need to calculate a correlation (for example) between 2 columns of a
> # data.frame, grouped by a third column, like below:
>
> # number of rows
> nr = 10
>
> # the extra column V3 is there to stress that I need the correlation of only two variables
> dataf =
>   as.data.frame(matrix(c(rnorm(nr),rnorm(nr)*2,runif(nr),sort(c(1,1,2,2,3,3,sample(1:3,nr-6,replace=TRUE)))),ncol=4))
> names(dataf)[4] = "class"
>
> #> dataf
> #            V1         V2         V3 class
> #1   0.56933020  1.2529931 0.30774422     1
> #2   0.41702299 -1.6441547 0.76140046     1
> #3  -1.07671647 -4.8747575 0.43706944     1
> #4  -1.97701167  1.3015196 0.04390175     2
> #5   0.56501325  1.8597720 0.08174124     2
> #6   0.70068638  1.7922641 0.74730126     2
> #7  -1.39956177 -1.9918904 0.64521918     3
> #8   0.27086664  0.3745362 0.61026133     3
> #9   0.04282347  3.7360407 0.48696109     3
> #10 -0.34262654  0.7933674 0.09824913     3
>
> # I tried:
>
> tapply(dataf$V1, dataf$class, cor, dataf$V2)
> # Error in FUN(X[[1L]], ...) : incompatible dimensions
>
> tapply(dataf$V1, dataf$class, cor, tapply(dataf$V2, dataf$class))
> # Error in FUN(X[[1L]], ...) : incompatible dimensions
>
> # But using "by" I obtain:
>
> by(dataf[,c("V1","V2")], dataf$class, cor)
>
> # dataf$class: 1
> #         V1      V2
> # V1 1.00000 0.91777
> # V2 0.91777 1.00000
> # ------------------------------------------------------------------------
> # dataf$class: 2
> #          V1       V2
> # V1 1.000000 0.987857
> # V2 0.987857 1.000000
> # ------------------------------------------------------------------------
> # dataf$class: 3
> #           V1        V2
> # V1 1.0000000 0.7318938
> # V2 0.7318938 1.0000000
>
> # I am only interested in cor(V1,V2)[1,2], i.e. 0.91777, 0.987857 and
> # 0.7318938, but I think tapply could do this more neatly, if I can solve
> # the problem.
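The two tapply calls above fail because tapply only splits its first argument: within each group, cor() receives that group's slice of V1 but the full 10-element dataf$V2 (or, in the second attempt, the length-10 vector of group codes that tapply(dataf$V2, dataf$class) returns when no FUN is given), hence "incompatible dimensions". A minimal base-R sketch of three workarounds follows. It assumes the same dataf construction as in the question; the set.seed() call is an addition made only so the run is repeatable, and the r_* names are illustrative.

```r
# Same data shape as the question; set.seed() is an added assumption for repeatability
set.seed(1)
nr <- 10
dataf <- as.data.frame(matrix(c(rnorm(nr), rnorm(nr) * 2, runif(nr),
                                sort(c(1, 1, 2, 2, 3, 3,
                                       sample(1:3, nr - 6, replace = TRUE)))),
                              ncol = 4))
names(dataf)[4] <- "class"

# 1. tapply over row indices: FUN receives the row numbers of one group,
#    so V1 and V2 are subset consistently before cor() is called
r_tapply <- tapply(seq_len(nrow(dataf)), dataf$class,
                   function(i) cor(dataf$V1[i], dataf$V2[i]))

# 2. split the data frame by class, then compute one correlation per piece
r_split <- sapply(split(dataf, dataf$class),
                  function(d) cor(d$V1, d$V2))

# 3. keep the by() call from the question and pull out each matrix's [1, 2] element
r_by <- sapply(by(dataf[, c("V1", "V2")], dataf$class, cor),
               function(m) m[1, 2])

print(r_tapply)
print(r_split)
print(r_by)
```

All three should give the same three correlations, labelled by class; the first is probably the closest to the tapply approach the question was aiming for.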
You might want to have a look at the plyr package:

install.packages("plyr")
library(plyr)

# You can easily control the output data type:
# d = data.frame, a = array, l = list
ddply(dataf, .(class), function(df) data.frame(cor(df[, 1:2])))
daply(dataf, .(class), function(df) cor(df[, 1:2]))
dlply(dataf, .(class), function(df) cor(df[, 1:2]))

# Or, to get just the single value you want per class:
ddply(dataf, .(class), function(df) cor(df$V1, df$V2))

# Note that plyr preserves the group labels, so it's easier to match the
# results up with the original data
# Learn more at http://had.co.nz/plyr

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.