My solution:

SP <- split(df, df[, 1:2])
minner <- function(x, col = 'numMiss') {
    x[which.min(unlist(x[, col])), , drop = FALSE]
}
NEW <- do.call('rbind', lapply(SP, minner))
SP2 <- split(NEW, NEW[, 'id'])
do.call('rbind', lapply(SP2, function(x) minner(x, 'A')))

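For comparison, the same two-step selection can probably also be sketched with order() and duplicated() in base R. This is an alternative technique, not the split/lapply approach above, and the names ord, step1 and step2 are made up for illustration. It assumes that when rows tie on numMiss, keeping whichever comes first is acceptable.

```r
# The data frame from the original question.
df <- data.frame(
  id = c("id1", "id1", "id1", "id2", "id2", "id2"),
  A = c(11905, 11907, 11907, 11829, 11829, 11829),
  v1 = c(NA, 3, NA, 1, 2, NA),
  v2 = c(NA, 2, NA, 2, NA, NA),
  v3 = c(NA, 1, NA, 1, NA, NA),
  v4 = c("N", "Y", "N", "Y", "N", "N"),
  v5 = c(0, 0, 0, 1, 0, 0),
  numMiss = c(3, 0, 3, 0, 2, 3)
)

# Step 1: sort by id, then A, then numMiss, so that within each (id, A)
# pair the row with the fewest missing values comes first; then keep
# only the first row of each pair.
ord <- df[order(df$id, df$A, df$numMiss), ]
step1 <- ord[!duplicated(ord[, c("id", "A")]), ]

# Step 2: step1 is still sorted by id and then A, so the first row
# for each id is the one with the smallest A.
step2 <- step1[!duplicated(step1$id), ]
step2
```

On the example data, step1 keeps rows 1, 2 and 4, and step2 keeps rows 1 and 4, which matches the output asked for in the question.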
Cheers,
Tyler

> Date: Sat, 14 Apr 2012 12:03:36 -0700
> From: francy.casal...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Choose between duplicated rows
>
> Dear r experts,
>
> Sorry for this basic question, but I can't seem to find a solution…
>
> I have this data frame:
> df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"), A =
> c(11905, 11907, 11907, 11829, 11829, 11829), v1 = c(NA, 3, NA,1,2,NA), v2 =
> c(NA,2,NA, 2, NA,NA), v3 = c(NA,1,NA,1,NA,NA), v4 = c("N", "Y", "N", "Y",
> "N","N"), v5 = c(0,0,0,1,0,0), numMiss=c(3,0,3,0,2,3))
>
> > df
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 3 id1 11907 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
> 5 id2 11829  2 NA NA  N  0       2
> 6 id2 11829 NA NA NA  N  0       3
>
> And I need to keep, of the rows that have the same value for "A" by id, only
> the ones with the least amount of missing values for all the variables (with
> min(numMiss)) to get this:
>
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 4 id2 11829  1  2  1  Y  1       0
>
> Then I have to choose the records with the least value of "A" of the rows
> that have the same id like this:
>
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
>
> For groupings I have used the package "plyr" before, but this would involve
> a sort of double-grouping by id and by duplicated values of A…Could you
> please help me understand how this can be done?
>
> Thank you very much.
> -f
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Choose-between-duplicated-rows-tp4557833p4557833.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.