Thanks, everybody! *Felipe Nunes* CAPES/Fulbright Fellow PhD Student Political Science - UCLA Web: felipenunes.bol.ucla.edu
On Mon, Jan 16, 2012 at 2:33 PM, Marc Schwartz <marc_schwa...@me.com> wrote: > > On Jan 16, 2012, at 4:13 PM, Felipe Nunes wrote: > > > Dear all, > > > > I have a data set in which the same unit appears 2 or 3 or 4 times. I > need > > to aggregate this data to maintain only one unit by row. But I need to do > > that based on a comparison between the values of such units. I can't > find a > > function to help me on that. I appreciate any help. Below I provide an > > example of what I want: > > > > This is my data: > > > > Units Var1 Var2 Var3 > > 1 B 2 2 > > 1 C 1 3 > > 2 D 3 3 > > 2 C 2 1 > > 2 A 1 2 > > 3 C 2 1 > > 3 A 1 3 > > 4 B 3 3 > > 4 C 2 2 > > 4 A 1 1 > > > > I want to produce a new data in which I keep, for each unit, the row with > > max number for Var2. For instance, > > > > Units Var1 Var2 Var3 > > 1 B 2 2 > > 2 D 3 3 > > 3 C 2 1 > > 4 B 3 3 > > > > Thanks, > > > There are likely to be multiple approaches, but the first that comes to > mind is combining ?merge with ?aggregate. > > > DF > Units Var1 Var2 Var3 > 1 1 B 2 2 > 2 1 C 1 3 > 3 2 D 3 3 > 4 2 C 2 1 > 5 2 A 1 2 > 6 3 C 2 1 > 7 3 A 1 3 > 8 4 B 3 3 > 9 4 C 2 2 > 10 4 A 1 1 > > > merge(aggregate(Var2 ~ Units, max, data = DF), DF) > Units Var2 Var1 Var3 > 1 1 2 B 2 > 2 2 3 D 3 > 3 3 2 C 1 > 4 4 3 B 3 > > > The use of aggregate() gets you the max value of Var2 for each group > within Units: > > > aggregate(Var2 ~ Units, max, data = DF) > Units Var2 > 1 1 2 > 2 2 3 > 3 3 2 > 4 4 3 > > merge() then allows you to add back the columns from 'DF', matching on > 'Units' and 'Var2', which are the common columns between the two data > frames. Note that the ordering of the columns is changed a bit here. That > is easily adjusted with: > > > merge(aggregate(Var2 ~ Units, max, data = DF), DF)[, c(1, 3, 2, 4)] > Units Var1 Var2 Var3 > 1 1 B 2 2 > 2 2 D 3 3 > 3 3 C 2 1 > 4 4 B 3 3 > > > Note that you would want to use the 'na.rm = TRUE' argument to max() in > the call to aggregate() if in fact any of your Var2 arguments are NA. > > HTH, > > Marc Schwartz > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.