It is currently not possible to pass weights in summaryBy.

Regards
Søren

________________________________________
From: Joshua Wiley [jwiley.ps...@gmail.com]
Sent: 17 January 2011 08:16
To: Solomon Messing
Cc: r-help@r-project.org; Søren Højsgaard
Subject: Re: [R] Using summaryBy with weighted data

Dear Solomon,

On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing <solomon.mess...@gmail.com> wrote:
> Dear Soren and R users:
>
> I am trying to use the summaryBy function with weights.  Is this possible?
> An example that illustrates what I am trying to do follows:
>
> library(doBy)
> ## make up some data
> response = rnorm(100)
> group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
> weights = runif(100, 0, 1)
> mydata = data.frame(response,group,weights)
>
> ## run summaryBy without weights:
> summaryBy(response~group, data = mydata, FUN = mean)
>
> ## attempt to run summaryBy with weights, throws error
> summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights )
>
> ## throws the error:
> # Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
> #  arguments must have same length
>
> My guess is that summaryBy is not giving weighted.mean() each group of
> weights, but instead is passing all of the weights in the data set each time
> it calls weighted.mean().

Yes, of course.  It has no way of knowing that the weights should also be broken down by group; they are not in the formula.

> Do you know if there is some way to get summaryBy to pass weights to
> weighted.mean() only for each group?

Ideally there would be a way to pass more than one variable to a function (e.g., response and weights) or just an entire object (mydata) broken down by group.  Then you would just make a wrapper function to pass the right values to the x and w arguments of weighted.mean.

Instead here is a somewhat hacked version:

library(doBy)
## make up some data (easier)
mydata <- data.frame(response = rnorm(100),
  group = rep(1:5, each = 20), weights = runif(100, 0, 1))

## manually compute weighted mean
tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
tmp ## weighted means

## here's the 'problem', if you will, even with +, they are passed one at a time
summaryBy(response + weights ~ group, data = mydata, FUN = str)
summaryBy(mydata ~ group, data = mydata, FUN = str)

## here is an option using by():
xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights))
xy
## if you don't like the formatting....
data.frame(group = names(c(xy)), weighted.mean = c(xy))

HTH,

Josh

>
> I suspect this functionality would be a tremendous benefit to R users who
> regularly work with weighted data, such as myself.
>
> Thanks,
>
> Solomon Messing
> www.stanford.edu/~messing
>
> PS I know this basic example can be done using lapply(split(...)) approach
> referenced here:
>
> http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg12349.html
>
> but for more complex tasks the lapply approach will mean writing a lot of
> extra code to run everything and then to get things formatted as nicely as
> summaryBy() was designed to do.
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
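
The wrapper idea mentioned in the reply (hand the function each group's response and weights together) can be sketched in base R without doBy. This is only an illustration under stated assumptions: the helper name weighted_mean_by and the seed are made up for the example and are not part of doBy or of the code posted in the thread.

```r
## Hypothetical helper (an assumption for illustration, not part of doBy):
## weighted mean of one column per group.
weighted_mean_by <- function(data, value, weight, group) {
  ## split() gives the wrapper a whole sub-data-frame per group,
  ## so each group's values and weights stay aligned.
  pieces <- split(data, data[[group]])
  wm <- vapply(pieces,
               function(z) weighted.mean(z[[value]], z[[weight]]),
               numeric(1))
  data.frame(group = names(wm), weighted.mean = unname(wm))
}

set.seed(1)  # arbitrary seed, only so the example is repeatable
mydata <- data.frame(response = rnorm(100),
                     group = rep(1:5, each = 20),
                     weights = runif(100, 0, 1))

res <- weighted_mean_by(mydata, "response", "weights", "group")
res

## Cross-check against the manual sum(response * weights) / sum(weights) per group
manual <- with(mydata, tapply(response * weights, group, sum) /
                       tapply(weights, group, sum))
all.equal(res$weighted.mean, as.numeric(manual))
```

The cross-check at the end compares the helper's result with the manual per-group ratio used in the hacked summaryBy version, so the two approaches can be confirmed to agree on the same data.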