> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Seth W Bigelow > Sent: Tuesday, September 21, 2010 4:22 PM > To: bill.venab...@csiro.au > Cc: r-help@r-project.org > Subject: Re: [R] Doing operations by grouping variable > > Aah, that is the sort of truly elegant solution I have been > seeking. And > it's wrapped up in a nice programming shortcut to boot (i.e., > the within > statement). I retract anything I may have said about tapply > being clunky. > > Many thanks > > --Seth > > Dr. Seth W. Bigelow > Biologist, USDA-FS Pacific Southwest Research Station > 1731 Research Park Drive, Davis California > > <bill.venab...@csiro.au> > 09/21/2010 03:15 PM > > To <sbige...@fs.fed.us> > > You left out the subscript. Why not just do > > d <- within(data.frame(group = rep(1:5, each = 5), > variable = rnorm(25)), > scaled <- variable/tapply(variable, group, max)[group])
This approach can be tricky when there is more than one grouping variable. E.g., suppose we have grouping variables g1 and g2: > d <- data.frame(x=1:10, g1=LETTERS[rep(11:12,each=5)], g2=letters[rep(21:23,c(3,3,4))]) > d x g1 g2 1 1 K u 2 2 K u 3 3 K u 4 4 K v 5 5 K v 6 6 L v 7 7 L w 8 8 L w 9 9 L w 10 10 L w and we want to divide each x value by it max for each g1*g2 group (6 possible groups, of which 4 are in the data). You can extend Bill V.'s approach with > with(d, x/tapply(x, list(g1,g2), FUN=max)[cbind(g1,g2)]) [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 That would fail if g1 and g2 were not factors but were integer vectors. Try it with > di <- data.frame(x=1:10, g1=rep(11:12,each=5), g2=rep(21:23,c(3,3,4))) > with(di, x/tapply(x, list(g1,g2), FUN=max)[cbind(g1,g2)]) Error in tapply(x, list(g1, g2), FUN = max)[cbind(g1, g2)] : subscript out of bounds To avoid that problem you can call tapply with no FUN to get the indices to subscript by > with(d, x/tapply(x, list(g1,g2), FUN=max)[tapply(x, list(g1, g2))]) [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 The misleadingly named ave() can avoid the need to do the subscripting after tapply but has other problems > with(d, x/ave(x, g1, g2, FUN=max)) [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 Warning messages: 1: In FUN(X[[6L]], ...) : no non-missing arguments to max; returning -Inf 2: In FUN(X[[6L]], ...) : no non-missing arguments to max; returning -Inf It gives the right answer but it is calling FUN even for the empty interaction groups. For some FUN's this would abort the call, not just give a warning. In any case it is a waste of time. In either case you can also use the interaction() function to change the multiple grouping vectors into one: > d <- within(d, interaction(g1, g2, drop=TRUE)) > with(d, x/ave(x, g1g2, FUN=max)) [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > with(d, x/tapply(x, g1g2, FUN=max)[g1g2]) K.u K.u K.u K.v K.v L.v 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 1.0000000 L.w L.w L.w L.w 0.7000000 0.8000000 0.9000000 1.0000000 > with(d, x/tapply(x, g1g2, FUN=max)[tapply(x, g1g2)]) K.u K.u K.u K.v K.v L.v 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 1.0000000 L.w L.w L.w L.w 0.7000000 0.8000000 0.9000000 1.0000000 The names are probably unwanted in the tapply cases; use unname to get rid of them. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > and be done with it? > > (Warning: if you replace the second '<-' above by '=', it > will not work. > It is NOT true that you can always replace '<-' by '=' for > assignment. > Why?) > > Bill Venables. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.