Hello. I am trying to remove from my data frame those rows in which the first 7 columns are duplicated, even if subsequent columns make those rows unique.
df <- data.frame(id=rep(c('amy','bob','joe'), each=5),
                 pet1=sample(LETTERS[1:3], 15, replace=T),
                 pet2=sample(LETTERS[1:3], 15, replace=T),
                 pet3=sample(LETTERS[1:5], 15, replace=T))

> df
    id pet1 pet2 pet3
1  amy    C    B    A
2  amy    B    A    A
3  amy    A    A    D
4  amy    B    C    A
5  amy    C    B    B
6  bob    B    A    A
7  bob    C    A    C
8  bob    C    C    A
9  bob    B    C    E
10 bob    C    B    C
11 joe    C    B    A
12 joe    A    B    E
13 joe    C    C    B
14 joe    C    A    D
15 joe    A    C    C

I am trying to identify and remove the rows of df that are duplicates in df[,1:3].

culled.df <- unique(df[,1:3])

> culled.df
    id pet1 pet2
1  amy    A    A
2  amy    C    C
3  amy    C    A
5  amy    A    B
6  bob    A    B
7  bob    C    C
8  bob    B    C
10 bob    B    A
11 joe    B    B
13 joe    B    C
14 joe    B    A

This is where I'm hung up. I've been trying match() or %in% to get the rows of df where df[,1:3] matches culled.df:

> df[culled.df %in% df[,1:3],]

Is this a reasonable solution, or am I making it more difficult than it needs to be?

Thanks for your suggestions,
Jason
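
For comparison, here is a minimal sketch of one way the goal above might be reached without match() or %in%. As far as I know, %in% applied to two data frames compares whole columns rather than rows, so it does not give a row-wise index. The sketch uses base R's duplicated() on the key columns instead. The small fixed data frame and the names is.dup, all.dups, and strict.df are made up for illustration (the sample() data in the post changes on every run), and the two variants reflect two possible readings of "remove the duplicates".

```r
# Fixed toy data, assumed for illustration only (not the random draw in the post)
df <- data.frame(
  id   = c("amy", "amy", "amy", "bob", "bob"),
  pet1 = c("A", "A", "B", "C", "C"),
  pet2 = c("B", "B", "B", "A", "A"),
  pet3 = c("X", "Y", "Z", "X", "X"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame flags each row that repeats an earlier row;
# only the first three (key) columns are compared here
is.dup <- duplicated(df[, 1:3])

# Reading 1: keep the first occurrence of each key, with all columns retained
culled.df <- df[!is.dup, ]

# Reading 2: drop every row whose key occurs more than once, first copy included
all.dups  <- is.dup | duplicated(df[, 1:3], fromLast = TRUE)
strict.df <- df[!all.dups, ]
```

Unlike unique(df[,1:3]), indexing df with the logical vector keeps pet3 and any later columns. For the 7-column case, df[, 1:7] would presumably take the place of df[, 1:3].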