Dear R experts,

Sorry for this basic question, but I can't seem to find a solution.
I have this data frame:

df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
                 A = c(11905, 11907, 11907, 11829, 11829, 11829),
                 v1 = c(NA, 3, NA, 1, 2, NA),
                 v2 = c(NA, 2, NA, 2, NA, NA),
                 v3 = c(NA, 1, NA, 1, NA, NA),
                 v4 = c("N", "Y", "N", "Y", "N", "N"),
                 v5 = c(0, 0, 0, 1, 0, 0),
                 numMiss = c(3, 0, 3, 0, 2, 3))

> df
   id     A v1 v2 v3 v4 v5 numMiss
1 id1 11905 NA NA NA  N  0       3
2 id1 11907  3  2  1  Y  0       0
3 id1 11907 NA NA NA  N  0       3
4 id2 11829  1  2  1  Y  1       0
5 id2 11829  2 NA NA  N  0       2
6 id2 11829 NA NA NA  N  0       3

Among the rows that share the same value of "A" within an id, I need to keep only those with the fewest missing values across all the variables (that is, with min(numMiss)), to get this:

   id     A v1 v2 v3 v4 v5 numMiss
1 id1 11905 NA NA NA  N  0       3
2 id1 11907  3  2  1  Y  0       0
4 id2 11829  1  2  1  Y  1       0

Then, among the rows that share the same id, I have to choose the records with the smallest value of "A", like this:

   id     A v1 v2 v3 v4 v5 numMiss
1 id1 11905 NA NA NA  N  0       3
4 id2 11829  1  2  1  Y  1       0

For grouping I have used the "plyr" package before, but this would involve a sort of double grouping, by id and by duplicated values of A. Could you please help me understand how this can be done?

Thank you very much.

-f

--
View this message in context: http://r.789695.n4.nabble.com/Choose-between-duplicated-rows-tp4557833p4557833.html
Sent from the R help mailing list archive at Nabble.com.
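One possible way to do the two-step selection described above, offered as a sketch in base R rather than a definitive answer. It uses ave() to compute a per-group minimum aligned with the rows, first by (id, A) and then by id. The names grp, min_miss, step1, min_A and step2 are made up for this illustration and are not part of the original post. Rows that tie on the minimum are all kept.

```r
# Data frame from the question.
df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
                 A = c(11905, 11907, 11907, 11829, 11829, 11829),
                 v1 = c(NA, 3, NA, 1, 2, NA),
                 v2 = c(NA, 2, NA, 2, NA, NA),
                 v3 = c(NA, 1, NA, 1, NA, NA),
                 v4 = c("N", "Y", "N", "Y", "N", "N"),
                 v5 = c(0, 0, 0, 1, 0, 0),
                 numMiss = c(3, 0, 3, 0, 2, 3))

# Step 1: within each (id, A) group, keep the rows whose numMiss equals
# the group minimum. drop = TRUE discards (id, A) combinations that never
# occur, so min() is never called on an empty group.
grp <- interaction(df$id, df$A, drop = TRUE)
min_miss <- ave(df$numMiss, grp, FUN = min)
step1 <- df[df$numMiss == min_miss, ]

# Step 2: within each id, keep the rows with the smallest A.
min_A <- ave(step1$A, step1$id, FUN = min)
step2 <- step1[step1$A == min_A, ]

print(step1)
print(step2)
```

On the example data, step1 keeps rows 1, 2 and 4, and step2 keeps rows 1 and 4, which matches the two desired outputs in the question. The same "double grouping" could presumably be written with plyr as well; base R is used here only to avoid an extra dependency.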