OK, this is great, Jim. Last question: what if I want the one copy of each id to be selected at random rather than taking the first one?
Thank you,

AC

> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholt...@gmail.com> wrote:
>> I am not sure what you mean by eliminating a row. Now if you want only one
>> copy of each 'id', and it is the first one, then you can use 'duplicated':
>>
>> > x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> > x
>>    id mod1      r
>> 1   1    1  0.980
>> 2   4    1  0.640
>> 3   7    1  0.490
>> 4  10    1  0.180
>> 5   1    2  0.295
>> 6   5    2  0.490
>> 7   8    2  0.330
>> 8  11    2  0.600
>> 9   6    3 -0.040
>> 10  9    3  0.580
>> 11 12    3  0.210
>> > subset(x, !duplicated(id))
>>    id mod1     r
>> 1   1    1  0.98
>> 2   4    1  0.64
>> 3   7    1  0.49
>> 4  10    1  0.18
>> 6   5    2  0.49
>> 7   8    2  0.33
>> 8  11    2  0.60
>> 9   6    3 -0.04
>> 10  9    3  0.58
>> 11 12    3  0.21
>>
>>
>> On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <de...@wisc.edu> wrote:
>>>
>>> Perfect! Thanks Jim.
>>>
>>> Do you know how I could then reduce the data even further?
>>> Specifically, reducing it to one row per id? In this dataset, id 1 would
>>> have one row eliminated.
>>> Assume the data is much larger, so rows cannot be removed one at a time
>>> by visual inspection.
>>>
>>>
>>> Thank you,
>>>
>>> AC
>>>
>>> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholt...@gmail.com> wrote:
>>> > This seems to work fine (notice the missing 'c(...)'; why did you think
>>> > you needed it?):
>>> >
>>> > > with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>>> >    id mod1      r
>>> > 1   1    1  0.980
>>> > 2   4    1  0.640
>>> > 3   7    1  0.490
>>> > 4  10    1  0.180
>>> > 5   1    2  0.295
>>> > 6   5    2  0.490
>>> > 7   8    2  0.330
>>> > 8  11    2  0.600
>>> > 9   6    3 -0.040
>>> > 10  9    3  0.580
>>> > 11 12    3  0.210
>>> >
>>> >
>>> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <de...@wisc.edu> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I am interested in aggregating a data frame based on 2
>>> >> categories: the mean effect size (r) for each combination of 'id'
>>> >> and 'mod1'. The 'with' function works well when aggregating on one
>>> >> category (e.g., based on 'id' below) but doesn't work if I try 2
>>> >> categories. How can this be accomplished?
>>> >>
>>> >> # sample data
>>> >>
>>> >> id<-c(1,1,1,rep(4:12))
>>> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
>>> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
>>> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
>>> >> mod2<-c(1,2,15,rep(3,9))
>>> >> datas<-data.frame(id,n,r,mod1,mod2)
>>> >>
>>> >> # one category works perfectly:
>>> >>
>>> >> with(datas, aggregate(list(r = r), by = list(id = id),mean))
>>> >>
>>> >>    id          r
>>> >> 1   1  0.5233333
>>> >> 2   4  0.6400000
>>> >> 3   5  0.4900000
>>> >> 4   6 -0.0400000
>>> >> 5   7  0.4900000
>>> >> 6   8  0.3300000
>>> >> 7   9  0.5800000
>>> >> 8  10  0.1800000
>>> >> 9  11  0.6000000
>>> >> 10 12  0.2100000
>>> >>
>>> >> # trying with 2 categories:
>>> >>
>>> >> with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)),mean))
>>> >>
>>> >> Error in FUN(X[[1L]], ...) : arguments must have same length
>>> >>
>>> >> Thank you,
>>> >>
>>> >> AC
>>> >>
>>> >> ______________________________________________
>>> >> R-help@r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> >
>>> >
>>> >
>>> > --
>>> > Jim Holtman
>>> > Cincinnati, OH
>>> > +1 513 646 9390
>>> >
>>> > What is the problem that you are trying to solve?
>>> >
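
For the random-selection question at the top of this message, one possible approach is sketched here; it is not an answer given in the thread. The idea is to shuffle the rows of the aggregated data frame first and then apply the same 'duplicated' filter, so that the "first" row kept for each id is effectively a random one. The sample data and the aggregate call are copied from the messages above; the names 'shuffled' and 'one_per_id' and the seed value are illustrative assumptions.

```r
# Sample data from the original post
id   <- c(1, 1, 1, 4:12)
n    <- c(10, 20, 13, 22, 28, 12, 12, 36, 19, 12, 15, 8)
r    <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1 <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
mod2 <- c(1, 2, 15, rep(3, 9))
datas <- data.frame(id, n, r, mod1, mod2)

# Mean r for each id/mod1 combination, as in Jim's reply
x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

# Assumption: an arbitrary seed, used only so the draw is repeatable
set.seed(42)

# Put the rows in random order, then keep the first row seen for each id.
# Because the order is random, the surviving row per id is a random pick.
shuffled   <- x[sample(nrow(x)), ]
one_per_id <- shuffled[!duplicated(shuffled$id), ]

# Optional: sort by id again for readability
one_per_id <- one_per_id[order(one_per_id$id), ]
one_per_id
```

Dropping the set.seed() call gives a different draw on each run. In this dataset only id 1 has more than one row after aggregation, so it is the only id whose retained row can vary.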