This will do it.  You can see two different values for id=1:

> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
> x
   id mod1      r
1   1    1  0.980
2   4    1  0.640
3   7    1  0.490
4  10    1  0.180
5   1    2  0.295
6   5    2  0.490
7   8    2  0.330
8  11    2  0.600
9   6    3 -0.040
10  9    3  0.580
11 12    3  0.210
> # choose random duplicate to use
> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1), ]))
   id mod1     r
1   1    1  0.98
4   4    1  0.64
5   5    2  0.49
6   6    3 -0.04
7   7    1  0.49
8   8    2  0.33
9   9    3  0.58
10 10    1  0.18
11 11    2  0.60
12 12    3  0.21
>
> # choose random duplicate to use - try to see if a different one comes up
> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1), ]))
   id mod1      r
1   1    2  0.295
4   4    1  0.640
5   5    2  0.490
6   6    3 -0.040
7   7    1  0.490
8   8    2  0.330
9   9    3  0.580
10 10    1  0.180
11 11    2  0.600
12 12    3  0.210
>
>
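
An alternative that avoids split/lapply, offered only as a sketch and not something from the earlier replies: shuffle the rows of 'x' once, then keep the first row seen for each 'id' with 'duplicated'. Because the shuffle is uniform, each duplicate of an id is equally likely to come first. The names 'shuffled' and 'picked' and the seed value are my own choices for illustration; the data are rebuilt from the sample posted earlier in the thread.

```r
# Rebuild the aggregated data frame from the thread's sample data.
id   <- c(1, 1, 1, 4:12)
r    <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1 <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
datas <- data.frame(id, r, mod1)
x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

# Arbitrary seed (an assumption), set only so the random draw can be repeated.
set.seed(42)

# Put the rows in random order, then keep the first row encountered per id.
shuffled <- x[sample(nrow(x)), ]
picked   <- shuffled[!duplicated(shuffled$id), ]

# Restore ascending id order for readability.
picked <- picked[order(picked$id), ]
picked
```

Dropping the set.seed() call gives a fresh random choice on every run, which matches the behaviour of the two do.call() runs above.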


On Sat, Feb 20, 2010 at 9:50 PM, AC Del Re <acde...@gmail.com> wrote:

> OK, this is great, Jim. Last question: How about if I want the 1 copy
> of each id to be selected randomly versus taking the first one?
>
> AC
>
> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholt...@gmail.com> wrote:
> > I am not sure what you mean by eliminating a row.  Now if you want only one
> > copy of each 'id', and it is the first one, then you can use 'duplicated':
> >
> >> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
> >> x
> >    id mod1      r
> > 1   1    1  0.980
> > 2   4    1  0.640
> > 3   7    1  0.490
> > 4  10    1  0.180
> > 5   1    2  0.295
> > 6   5    2  0.490
> > 7   8    2  0.330
> > 8  11    2  0.600
> > 9   6    3 -0.040
> > 10  9    3  0.580
> > 11 12    3  0.210
> >> subset(x, !duplicated(id))
> >    id mod1     r
> > 1   1    1  0.98
> > 2   4    1  0.64
> > 3   7    1  0.49
> > 4  10    1  0.18
> > 6   5    2  0.49
> > 7   8    2  0.33
> > 8  11    2  0.60
> > 9   6    3 -0.04
> > 10  9    3  0.58
> > 11 12    3  0.21
> >
> >
> > On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <de...@wisc.edu> wrote:
> >>
> >> Perfect! Thanks Jim.
> >>
> >> Do you know how I could then reduce the data even further?
> >> Specifically, reducing it to 1 id per row? In this dataset, id 1 would
> >> have one row eliminated.
> >> Assume the data is much larger and cannot be deleted by visual
> >> inspection and elimination one row at a time.
> >>
> >>
> >> Thank you,
> >>
> >> AC
> >>
> >> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholt...@gmail.com> wrote:
> >> > This seems to work fine (notice the missing 'c(...)'; why did you think you
> >> > needed it?):
> >> >
> >> >> with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
> >> >    id mod1      r
> >> > 1   1    1  0.980
> >> > 2   4    1  0.640
> >> > 3   7    1  0.490
> >> > 4  10    1  0.180
> >> > 5   1    2  0.295
> >> > 6   5    2  0.490
> >> > 7   8    2  0.330
> >> > 8  11    2  0.600
> >> > 9   6    3 -0.040
> >> > 10  9    3  0.580
> >> > 11 12    3  0.210
> >> >>
> >> >
> >> >
> >> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <de...@wisc.edu> wrote:
> >> >>
> >> >> Hi All,
> >> >>
> >> >> I am interested in aggregating a data frame based on 2
> >> >> categories--mean effect size (r) for each 'id's' 'mod1'. The
> >> >> 'with' function works well when aggregating on one category (e.g.,
> >> >> based on 'id' below) but doesn't work if I try 2 categories. How can
> >> >> this be accomplished?
> >> >>
> >> >> # sample data
> >> >>
> >> >> id<-c(1,1,1,rep(4:12))
> >> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
> >> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
> >> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
> >> >> mod2<-c(1,2,15,rep(3,9))
> >> >> datas<-data.frame(id,n,r,mod1,mod2)
> >> >>
> >> >> # one category works perfectly:
> >> >>
> >> >> with(datas,  aggregate(list(r = r),  by = list(id = id),mean))
> >> >>
> >> >>  id          r
> >> >> 1   1  0.5233333
> >> >> 2   4  0.6400000
> >> >> 3   5  0.4900000
> >> >> 4   6 -0.0400000
> >> >> 5   7  0.4900000
> >> >> 6   8  0.3300000
> >> >> 7   9  0.5800000
> >> >> 8  10  0.1800000
> >> >> 9  11  0.6000000
> >> >> 10 12  0.2100000
> >> >>
> >> >> # trying with 2 categories:
> >> >>
> >> >> with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)), mean))
> >> >>
> >> >> Error in FUN(X[[1L]], ...) : arguments must have same length
> >> >>
> >> >> Thank you,
> >> >>
> >> >> AC
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> >
> >> >
> >> > --
> >> > Jim Holtman
> >> > Cincinnati, OH
> >> > +1 513 646 9390
> >> >
> >> > What is the problem that you are trying to solve?
> >> >
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?
> >
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

