Hi,

example <- data.frame(id1, id2, GENDER, ETH, stringsAsFactors = FALSE)
res <- unique(example[!(grepl("UNK", example$GENDER) | grepl("UNK", example$ETH)), ])
res
#   id1 id2 GENDER  ETH
#1    1  22    G-M E-VT
#3    2  34    G-M E-AF
#5    3  15    G-M E-AF
#7    4  76    G-F E-VT
#8    5  45    G-F E-VT
#12   7  37    G-F E-AF
#13   8  52    G-F E-AF
#14   9  66    G-F E-AF
#16  10  91    G-F E-VT

The condition for id1 6 is a bit unclear. If I include both of its rows, nrow will be 11; now it is 9.
10   6  84  G-UNK  E-AF
11   6  84    G-F E-UNK

A.K.

----- Original Message -----
From: Robert Lynch <robert.b.ly...@gmail.com>
To: R help <r-help@r-project.org>
Cc:
Sent: Saturday, September 7, 2013 3:02 AM
Subject: [R] finding both rows that are duplicated in a data frame

I have a data frame that looks like

id1<-c(1,1,2,2,3,3,4,5,5,6,6,7,8,9,9,10)
id2<-c(22,22,34,34,15,15,76,45,45,84,84,37,52,66,66,91)
GENDER<-sample(c("G-UNK","G-M","G-F"),16, replace = TRUE)
ETH <-sample(c("E-AF","E-UNK","E-VT"),16, replace = TRUE)
example<-cbind(id1,id2,GENDER,ETH)

where there are two IDs and some duplicate entries for IDs that have
different GENDER or ETH(nicity).

I would like to get a data frame that doesn't have the duplicates, but the
rows that are kept are whichever GENDER is not G-UNK (unknown), and the
kept ETH is whatever is not E-UNK.

The resultant data frame should have 10 rows with no *-UNK in either of the
last two columns (unless both entries were UNK).

Yes, the example data may have some impossible results, but it does capture
important aspects:
1) G-UNK is alphabetically last of G-F, G-M & G-UNK
2) E-UNK is in the middle alphabetically
3) sometimes the first entry is the unknown gender, sometimes it is the
   second (likely to happen with a random sample)
4) sometimes both entries for one variable, GENDER or ETH, are unknown
5) there only appear to be two of each row (not 100% sure)

Thanks!
Robert

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
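
One possible reading of the request quoted above is to collapse each id1/id2 pair to a single row, taking the known value column by column, so that id1 6 becomes G-F / E-AF rather than being dropped. The following is only a sketch under that assumption, in base R. The GENDER and ETH vectors are hand-picked for illustration (the original post uses sample(), so the actual draw is not known), and pick_known is a hypothetical helper name, not something from the thread.

```r
id1 <- c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10)
id2 <- c(22, 22, 34, 34, 15, 15, 76, 45, 45, 84, 84, 37, 52, 66, 66, 91)

# Assumption: fixed, hand-picked values standing in for the random sample() draw
GENDER <- c("G-M", "G-UNK", "G-M", "G-M", "G-M", "G-M", "G-F", "G-F",
            "G-UNK", "G-UNK", "G-F", "G-F", "G-F", "G-F", "G-F", "G-F")
ETH <- c("E-VT", "E-VT", "E-AF", "E-UNK", "E-AF", "E-AF", "E-VT", "E-VT",
         "E-UNK", "E-AF", "E-UNK", "E-AF", "E-AF", "E-AF", "E-UNK", "E-VT")

example <- data.frame(id1, id2, GENDER, ETH, stringsAsFactors = FALSE)

# Hypothetical helper: first value that does not contain "UNK";
# if every value in the group is unknown, fall back to the first one
pick_known <- function(x) {
  known <- x[!grepl("UNK", x, fixed = TRUE)]
  if (length(known) > 0) known[1] else x[1]
}

# Apply the helper to GENDER and ETH separately within each id1/id2 group
res <- aggregate(example[c("GENDER", "ETH")],
                 by = example[c("id1", "id2")],
                 FUN = pick_known)

# aggregate() orders rows by the grouping variables; re-sort by id1 for readability
res <- res[order(res$id1), ]
rownames(res) <- NULL
res
```

On this particular data the result has one row per id1 (10 rows). A group where both entries are unknown for a column would keep the *-UNK value, which matches the "unless both entries were UNK" caveat in the question. If two rows of a group carry different known values, this sketch silently keeps the first one, which may or may not be what is wanted.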