Thanks, everybody!

*Felipe Nunes*
CAPES/Fulbright Fellow
PhD Student Political Science - UCLA
Web: felipenunes.bol.ucla.edu



On Mon, Jan 16, 2012 at 2:33 PM, Marc Schwartz <marc_schwa...@me.com> wrote:

>
> On Jan 16, 2012, at 4:13 PM, Felipe Nunes wrote:
>
> > Dear all,
> >
> > I have a data set in which the same unit appears 2 or 3 or 4 times. I
> need
> > to aggregate this data to maintain only one unit by row. But I need to do
> > that based on a comparison between the values of such units. I can't
> find a
> > function to help me on that. I appreciate any help. Below I provide an
> > example of what I want:
> >
> > This is my data:
> >
> > Units  Var1 Var2 Var3
> >  1          B       2        2
> >  1          C       1       3
> >  2          D      3        3
> >  2          C      2        1
> >  2          A      1        2
> >  3          C      2        1
> >  3          A      1        3
> >  4          B      3        3
> >  4          C      2        2
> >  4          A      1        1
> >
> > I want to produce a new data in which I keep, for each unit, the row with
> > max number for Var2. For instance,
> >
> > Units  Var1 Var2 Var3
> >  1          B       2       2
> >  2          D      3        3
> >  3          C      2        1
> >  4          B      3        3
> >
> > Thanks,
>
>
> There are likely to be multiple approaches, but the first that comes to
> mind is combining ?merge with ?aggregate.
>
> > DF
>   Units Var1 Var2 Var3
> 1      1    B    2    2
> 2      1    C    1    3
> 3      2    D    3    3
> 4      2    C    2    1
> 5      2    A    1    2
> 6      3    C    2    1
> 7      3    A    1    3
> 8      4    B    3    3
> 9      4    C    2    2
> 10     4    A    1    1
>
> > merge(aggregate(Var2 ~ Units, max, data = DF), DF)
>  Units Var2 Var1 Var3
> 1     1    2    B    2
> 2     2    3    D    3
> 3     3    2    C    1
> 4     4    3    B    3
>
>
> The use of aggregate() gets you the max value of Var2 for each group
> within Units:
>
> > aggregate(Var2 ~ Units, max, data = DF)
>  Units Var2
> 1     1    2
> 2     2    3
> 3     3    2
> 4     4    3
>
> merge() then allows you to add back the columns from 'DF', matching on
> 'Units' and 'Var2', which are the common columns between the two data
> frames. Note that the ordering of the columns is changed a bit here. That
> is easily adjusted with:
>
> > merge(aggregate(Var2 ~ Units, max, data = DF), DF)[, c(1, 3, 2, 4)]
>  Units Var1 Var2 Var3
> 1     1    B    2    2
> 2     2    D    3    3
> 3     3    C    2    1
> 4     4    B    3    3
>
>
> Note that you would want to use the 'na.rm = TRUE' argument to max() in
> the call to aggregate() if in fact any of your Var2 arguments are NA.
>
> HTH,
>
> Marc Schwartz
>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to