This will do it. You can see two different values for id=1:

> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
> x
   id mod1      r
1   1    1  0.980
2   4    1  0.640
3   7    1  0.490
4  10    1  0.180
5   1    2  0.295
6   5    2  0.490
7   8    2  0.330
8  11    2  0.600
9   6    3 -0.040
10  9    3  0.580
11 12    3  0.210
> # choose random duplicate to use
> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1),]))
   id mod1     r
1   1    1  0.98
4   4    1  0.64
5   5    2  0.49
6   6    3 -0.04
7   7    1  0.49
8   8    2  0.33
9   9    3  0.58
10 10    1  0.18
11 11    2  0.60
12 12    3  0.21
>
> # choose random duplicate to use - try to see if a different one comes up
> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1),]))
   id mod1      r
1   1    2  0.295
4   4    1  0.640
5   5    2  0.490
6   6    3 -0.040
7   7    1  0.490
8   8    2  0.330
9   9    3  0.580
10 10    1  0.180
11 11    2  0.600
12 12    3  0.210
>
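As a possible alternative, here is a minimal sketch that combines the random choice with the 'duplicated' idea from earlier in the thread: shuffle the rows first, then keep the first row seen for each id. The sample data are rebuilt from the original post (only the columns needed here) so the block runs on its own. The set.seed() call and the names 'shuffled' and 'picked' are my own additions for illustration, not something from the thread.

```r
# Rebuild the sample data from the original post (only the columns used here)
id    <- c(1, 1, 1, 4:12)
r     <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1  <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
datas <- data.frame(id, r, mod1)

# Mean r for each id/mod1 combination (id 1 ends up with two rows)
x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

# Assumption: set.seed() is only here to make the draw repeatable
set.seed(1)

# Shuffle the rows, then keep the first row encountered for each id
shuffled <- x[sample(nrow(x)), ]
picked   <- shuffled[!duplicated(shuffled$id), ]

# Put the result back in id order
picked <- picked[order(picked$id), ]
picked
```

Dropping the set.seed() line gives a fresh draw on each run, so id 1 should come back with either its mod1 = 1 row or its mod1 = 2 row.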
On Sat, Feb 20, 2010 at 9:50 PM, AC Del Re <acde...@gmail.com> wrote:
> OK, this is great, Jim. Last question: How about if I want the 1 copy
> of each id to be selected randomly versus taking the first one?
>
> AC
>
> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholt...@gmail.com> wrote:
> > I am not sure what you mean by eliminating a row. Now if you want only one
> > copy of each 'id', and it is the first one, then you can use 'duplicated':
> >
> >> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1),mean))
> >> x
> >    id mod1      r
> > 1   1    1  0.980
> > 2   4    1  0.640
> > 3   7    1  0.490
> > 4  10    1  0.180
> > 5   1    2  0.295
> > 6   5    2  0.490
> > 7   8    2  0.330
> > 8  11    2  0.600
> > 9   6    3 -0.040
> > 10  9    3  0.580
> > 11 12    3  0.210
> >> subset(x, !duplicated(id))
> >    id mod1     r
> > 1   1    1  0.98
> > 2   4    1  0.64
> > 3   7    1  0.49
> > 4  10    1  0.18
> > 6   5    2  0.49
> > 7   8    2  0.33
> > 8  11    2  0.60
> > 9   6    3 -0.04
> > 10  9    3  0.58
> > 11 12    3  0.21
> >
> >
> > On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <de...@wisc.edu> wrote:
> >>
> >> Perfect! Thanks Jim.
> >>
> >> Do you know how I could then reduce the data even further?
> >> Specifically, reducing it to 1 id per row? In this dataset, id 1 would
> >> have one row eliminated.
> >> Assume the data is much larger and cannot be deleted by visual
> >> inspection and elimination one row at a time.
> >>
> >> Thank you,
> >>
> >> AC
> >>
> >> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholt...@gmail.com> wrote:
> >> > This seems to work fine (notice the missing 'c(...)'; why did you think
> >> > you needed it?):
> >> >
> >> >> with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1),mean))
> >> >    id mod1      r
> >> > 1   1    1  0.980
> >> > 2   4    1  0.640
> >> > 3   7    1  0.490
> >> > 4  10    1  0.180
> >> > 5   1    2  0.295
> >> > 6   5    2  0.490
> >> > 7   8    2  0.330
> >> > 8  11    2  0.600
> >> > 9   6    3 -0.040
> >> > 10  9    3  0.580
> >> > 11 12    3  0.210
> >> >>
> >> >
> >> >
> >> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <de...@wisc.edu> wrote:
> >> >>
> >> >> Hi All,
> >> >>
> >> >> I am interested in aggregating a data frame based on 2
> >> >> categories--mean effect size (r) for each 'id's' 'mod1'. The
> >> >> 'with' function works well when aggregating on one category (e.g.,
> >> >> based on 'id' below) but doesn't work if I try 2 categories. How can
> >> >> this be accomplished?
> >> >>
> >> >> # sample data
> >> >>
> >> >> id<-c(1,1,1,rep(4:12))
> >> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
> >> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
> >> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
> >> >> mod2<-c(1,2,15,rep(3,9))
> >> >> datas<-data.frame(id,n,r,mod1,mod2)
> >> >>
> >> >> # one category works perfect:
> >> >>
> >> >> with(datas, aggregate(list(r = r), by = list(id = id),mean))
> >> >>
> >> >>    id          r
> >> >> 1   1  0.5233333
> >> >> 2   4  0.6400000
> >> >> 3   5  0.4900000
> >> >> 4   6 -0.0400000
> >> >> 5   7  0.4900000
> >> >> 6   8  0.3300000
> >> >> 7   9  0.5800000
> >> >> 8  10  0.1800000
> >> >> 9  11  0.6000000
> >> >> 10 12  0.2100000
> >> >>
> >> >> # trying with 2 categories:
> >> >>
> >> >> with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)),mean))
> >> >>
> >> >> Error in FUN(X[[1L]], ...) : arguments must have same length
> >> >>
> >> >> Thank you,
> >> >>
> >> >> AC
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> > --
> >> > Jim Holtman
> >> > Cincinnati, OH
> >> > +1 513 646 9390
> >> >
> >> > What is the problem that you are trying to solve?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
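
For reference, a small sketch of why the original two-category call raised the "arguments must have same length" error. c() joins id and mod1 end to end into one vector of length 24, which no longer lines up with the 12 values of r. list(id = id, mod1 = mod1) instead supplies two separate grouping vectors of length 12 each. The names 'combined', 'separate', 'failed' and 'ok' are mine, used only for illustration.

```r
# Sample data from the original post (only the columns used here)
id   <- c(1, 1, 1, 4:12)
r    <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1 <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))

# c() concatenates the two vectors into a single one
combined <- c(id = id, mod1 = mod1)
length(combined)    # 24, twice as long as r

# list() keeps them as two grouping variables of the right length
separate <- list(id = id, mod1 = mod1)
lengths(separate)   # 12 for each element

# Grouping by the single length-24 vector fails ...
failed <- inherits(
  try(aggregate(list(r = r), by = list(combined), mean), silent = TRUE),
  "try-error"
)

# ... while grouping by the two length-12 vectors works
ok <- aggregate(list(r = r), by = separate, mean)
```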