Another option for doing opertions by grouping variables is to the the plyr package.
d <- data.frame(x=1:10, g1=LETTERS[rep(11:12,each=5)], g2=letters[rep(21:23,c(3,3,4))] ) library(plyr) ddply(d, c("g1", "g2"), function(z){ z$x <- z$x / max(z$x) z }) ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey > -----Oorspronkelijk bericht----- > Van: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] Namens William Dunlap > Verzonden: woensdag 22 september 2010 19:51 > Aan: Seth W Bigelow; bill.venab...@csiro.au > CC: r-help@r-project.org > Onderwerp: Re: [R] Doing operations by grouping variable > > > -----Original Message----- > > From: r-help-boun...@r-project.org > > [mailto:r-help-boun...@r-project.org] On Behalf Of Seth W Bigelow > > Sent: Tuesday, September 21, 2010 4:22 PM > > To: bill.venab...@csiro.au > > Cc: r-help@r-project.org > > Subject: Re: [R] Doing operations by grouping variable > > > > Aah, that is the sort of truly elegant solution I have been > seeking. > > And it's wrapped up in a nice programming shortcut to boot > (i.e., the > > within statement). I retract anything I may have said about tapply > > being clunky. > > > > Many thanks > > > > --Seth > > > > Dr. Seth W. Bigelow > > Biologist, USDA-FS Pacific Southwest Research Station > > 1731 Research Park Drive, Davis California > > > > <bill.venab...@csiro.au> > > 09/21/2010 03:15 PM > > > > To <sbige...@fs.fed.us> > > > > You left out the subscript. Why not just do > > > > d <- within(data.frame(group = rep(1:5, each = 5), > > variable = rnorm(25)), > > scaled <- variable/tapply(variable, group, max)[group]) > > This approach can be tricky when there is more than one > grouping variable. E.g., suppose we have grouping variables > g1 and g2: > > d <- data.frame(x=1:10, > g1=LETTERS[rep(11:12,each=5)], > g2=letters[rep(21:23,c(3,3,4))]) > > d > x g1 g2 > 1 1 K u > 2 2 K u > 3 3 K u > 4 4 K v > 5 5 K v > 6 6 L v > 7 7 L w > 8 8 L w > 9 9 L w > 10 10 L w > and we want to divide each x value by it max for each > g1*g2 group (6 possible groups, of which 4 are in the data). > > You can extend Bill V.'s approach with > > with(d, x/tapply(x, list(g1,g2), FUN=max)[cbind(g1,g2)]) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 That > would fail if g1 and g2 were not factors but were integer > vectors. Try it with > > di <- data.frame(x=1:10, > g1=rep(11:12,each=5), > g2=rep(21:23,c(3,3,4))) > > with(di, x/tapply(x, list(g1,g2), FUN=max)[cbind(g1,g2)]) > Error in tapply(x, list(g1, g2), FUN = max)[cbind(g1, g2)] : > subscript out of bounds > > To avoid that problem you can call tapply with no FUN to get > the indices to subscript by > > with(d, x/tapply(x, list(g1,g2), FUN=max)[tapply(x, > list(g1, g2))]) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > > The misleadingly named ave() can avoid the need to do the > subscripting after tapply but has other problems > > with(d, x/ave(x, g1, g2, FUN=max)) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > Warning messages: > 1: In FUN(X[[6L]], ...) : no non-missing arguments to max; > returning -Inf > 2: In FUN(X[[6L]], ...) : no non-missing arguments to max; > returning -Inf It gives the right answer but it is calling > FUN even for the empty interaction groups. For some FUN's this would > abort the call, not just give a warning. In any case it > is a waste of time. > > In either case you can also use the interaction() function to > change the multiple grouping vectors into one: > > d <- within(d, interaction(g1, g2, drop=TRUE)) > > with(d, x/ave(x, g1g2, FUN=max)) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > > with(d, x/tapply(x, g1g2, FUN=max)[g1g2]) > K.u K.u K.u K.v K.v L.v > 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 1.0000000 > L.w L.w L.w L.w > 0.7000000 0.8000000 0.9000000 1.0000000 > > with(d, x/tapply(x, g1g2, FUN=max)[tapply(x, g1g2)]) > K.u K.u K.u K.v K.v L.v > 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 1.0000000 > L.w L.w L.w L.w > 0.7000000 0.8000000 0.9000000 1.0000000 The names are > probably unwanted in the tapply cases; use unname to get rid of them. > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > > and be done with it? > > > > (Warning: if you replace the second '<-' above by '=', it will not > > work. > > It is NOT true that you can always replace '<-' by '=' for > assignment. > > Why?) > > > > Bill Venables. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > Druk dit bericht a.u.b. niet onnodig af. Please do not print this message unnecessarily. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.