On Thu, Nov 20, 2008 at 10:04 AM, Dieter Menne <[EMAIL PROTECTED]> wrote:
> hadley wickham <h.wickham <at> gmail.com> writes:
>
>> > library(plyr)
>> > dat = data.frame(SUBJECT_ID=sample(letters[1:5],100,TRUE),HR=rnorm(100))
>> > daply(dat,.(SUBJECT_ID),sd)
>> > ddply(dat,.(SUBJECT_ID),sd)
>>
>> Well that calculates sd on the whole data frame. (Like sd(dat)).
>
> Not really, it looks like the breakdown is somehow done:
>
>> library(plyr)
>> dat = data.frame(SUBJECT_ID=sample(letters[1:5],100,TRUE),HR=rnorm(100))
>> daply(dat,.(SUBJECT_ID),sd)
>
> SUBJECT_ID SUBJECT_ID        HR
>          a         NA 1.0488930
>          b         NA 0.9110685
>          c         NA 1.0776996
>          d         NA 1.1724009
>          e         NA 0.9455105
> Warning messages:
> 1: In var(as.vector(x), na.rm = na.rm) : NAs introduced by coercion
> ..more warnings
>
>> ddply(dat,.(SUBJECT_ID),sd)
>   SUBJECT_ID        HR
> 1         NA 1.0488930
> 2         NA 0.9110685
> 3         NA 1.0776996
> 4         NA 1.1724009
> 5         NA 0.9455105
> Warning messages:
> 1: In var(as.vector(x), na.rm = na.rm) : NAs introduced by coercion
>
> That's what I meant by "almost correct". Your suggestion works, but wouldn't it
> be a good default to make numcolwise(sd) the default with this close miss?

I have considered it, but I think it would make plyr harder to use for the more complicated problems where it really shines. Being able to work with the whole data frame, instead of just a subset of the columns, makes it possible to do much more. For example, because aggregate operates on one column at a time, you can't use it to calculate the correlation between variables. Given a data frame, you can always operate on one column at a time; given only one column at a time, you cannot operate on the data frame as a whole.

Plyr therefore supplies your aggregation function with the whole data frame for each group, and provides functions (colwise, numcolwise, catcolwise) that make it easy to operate column-wise.

Hadley

-- 
http://had.co.nz/
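
To make the distinction concrete, here is a minimal sketch. It assumes a hypothetical second numeric column, BP, which is not in the data from the thread, and an arbitrary seed. The first call works column-wise through numcolwise(sd). The second needs two columns of each group at once, which is only possible because the function is handed the whole piece.

```r
library(plyr)

set.seed(1)  # arbitrary seed so the sketch is repeatable
dat <- data.frame(
  SUBJECT_ID = sample(letters[1:5], 100, TRUE),
  HR = rnorm(100),
  BP = rnorm(100)  # hypothetical second measurement, added for illustration only
)

# Column-wise: numcolwise(sd) applies sd to each numeric column within each group,
# skipping the non-numeric SUBJECT_ID column that caused the NAs and warnings
sds <- ddply(dat, .(SUBJECT_ID), numcolwise(sd))

# Whole-piece: the function receives every column of the group,
# so it can relate two of them to each other
cors <- ddply(dat, .(SUBJECT_ID), function(df) data.frame(r = cor(df$HR, df$BP)))
```

Here sds has one row per subject with an sd for each of HR and BP, and cors has one row per subject with a single correlation r. The second result cannot be built from per-column summaries alone.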