Dear R users,

I would like to filter the following arbitrary data set:

a <- data.frame(id=c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc=c("B1","B2","B3","B4","B5"),
                clm=c(rep("General",6),rep("Life",4)))

> a
   id loc     clm
1  A1  B1 General
2  A2  B2 General
3  A3  B3 General
4  A4  B4 General
5  A5  B5 General
6  A3  B1 General
7  A2  B2    Life
8  A3  B3    Life
9  A4  B4    Life
10 A5  B5    Life

I want to remove the records (rows 2-5 and 7-10 above) that have identical values in both the "id" and "loc" fields but different values of "clm", i.e. records that appear under more than one category. For reference, the counts by category are:

> categ <- table(a$id,a$clm)
> categ
     General Life
  A1       1    0
  A2       1    1
  A3       2    1
  A4       1    1
  A5       1    1

The desired output is

  id loc     clm
1 A1  B1 General
6 A3  B1 General

Because the data set I am working on is quite big (~800,000 x 20), with the majority of the field values being long strings, looping to compare individual rows turned out to be very inefficient. Are there any more efficient alternatives for solving this problem?

I greatly appreciate your expertise.

Steven
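
For illustration, here is a minimal sketch of one possible loop-free approach in base R. It is not a definitive answer and has not been timed on data of the size described. It assumes that rows which are exact duplicates (same "id", "loc" and "clm") should be kept, and that the field values never contain a tab character, which is used as the key separator. The names key, pairs, conflicting and result are introduced only for this example.

```r
# Example data from the question
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))

# One string key per (id, loc) combination.
# Assumption: the values themselves never contain a tab.
key <- paste(a$id, a$loc, sep = "\t")

# Distinct (key, clm) pairs: a key that still occurs more than once here
# is associated with more than one value of clm.
pairs <- unique(data.frame(key = key, clm = as.character(a$clm),
                           stringsAsFactors = FALSE))
conflicting <- pairs$key[duplicated(pairs$key)]

# Keep only the rows whose (id, loc) key is not conflicting.
result <- a[!(key %in% conflicting), ]
print(result)
#   id loc     clm
# 1 A1  B1 General
# 6 A3  B1 General
```

unique(), duplicated() and %in% all operate on whole vectors, so no explicit row-by-row comparison is needed. On a 20-column data set, only the columns that define the key and the category have to be pasted and compared.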