Hi everybody,

I have found something that is (for me at least) strange with duplicated(). I will first give a reproducible example of the behaviour I find odd, and then a sample of unexpected results from my own data. I hope someone can help me understand this.
Consider the following:

# this works as expected
ex=sample(1:20, replace=TRUE)
ex
duplicated(ex)
ex=sort(ex)
ex
duplicated(ex)

# but why does duplicated() not work after order()?
ex=sample(1:20, replace=TRUE)
ex
duplicated(ex)
ex=order(ex)
duplicated(ex)

Why does duplicated() not work after order() has been applied, while it works fine after sort()? Is this an error, or is there something I don't understand?

I have been getting very strange results from duplicated() and unique() in a dataset I am analysing. Here is a little sample of my real-life problem:

> str(Masechaba$PROPDESC)
 Factor w/ 24545 levels " 06"," 71Hemilton str",..: 14527 8043 16113 16054 13875 15780 12522 7771 14824 12314 ...
> # Create an indicator of whether the PROPDESC is unique. Default FALSE
> Masechaba$unique=FALSE
> Masechaba$unique[which(is.na(unique(Masechaba$PROPDESC))==FALSE)]=TRUE
> # Check if something happened
> length(which(Masechaba$unique==TRUE))
[1] 2174
> length(which(Masechaba$unique==FALSE))
[1] 476
> Masechaba$duplicate=FALSE
> Masechaba$duplicate[which(duplicated(Masechaba$PROPDESC)==TRUE)]=TRUE
> length(which(Masechaba$duplicate==TRUE))
[1] 476
> length(which(Masechaba$duplicate==FALSE))
[1] 2174
> # Looks OK so far
> # Test on a known duplicate. I expect one to be true and one to be false
> Masechaba[which(Masechaba$PROPDESC==2363),10:12]
      PROPDESC unique duplicate
24874     2363   TRUE     FALSE
31280     2363   TRUE      TRUE

# This is strange. I expected that unique() and duplicated() would give
# consistent results. The variable PROPDESC is clearly not unique in both cases.
# The totals are the same but not the individual results

> table(Masechaba$unique,Masechaba$duplicate)
        FALSE TRUE
  FALSE   342  134
  TRUE   1832  342

I don't understand this. Is there something I am missing?
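For what it is worth, here is a smaller deterministic sketch of both observations, using made-up vectors rather than my data. My reading of the help pages (an assumption on my part, not something I have confirmed) is that order() returns index positions rather than values, and that unique() returns a vector shorter than its input, so the positions it yields do not line up with the rows of the data frame. The names sorted_values, index_perm, flag and occurs_once are only illustrative.

```r
# Toy vector with known duplicates (hypothetical data, not from Masechaba)
ex <- c(5, 3, 5, 1, 3)

sorted_values <- sort(ex)    # the values themselves, rearranged: 1 3 3 5 5
index_perm    <- order(ex)   # the positions that would sort ex: 4 2 5 1 3

dup_sorted <- duplicated(sorted_values)  # FALSE FALSE TRUE FALSE TRUE
dup_perm   <- duplicated(index_perm)     # all FALSE: a permutation of 1..5 has no repeats

# To sort via order(), index the original vector with the permutation
same_result <- identical(ex[index_perm], sorted_values)

# Second observation: unique() drops repeats, so its result is shorter than x
x <- factor(c("a", "b", "a", "c", "b"))
n_unique <- length(unique(x))            # 3, while length(x) is 5

# Mimics the indicator from the post: this marks positions 1..n_unique,
# regardless of which elements are actually repeated
flag <- rep(FALSE, length(x))
flag[which(!is.na(unique(x)))] <- TRUE   # TRUE TRUE TRUE FALSE FALSE

# One possible way to flag values that occur exactly once
occurs_once <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
```

If that reading is right, it would also be consistent with the counts above: 2174 would simply be the number of distinct PROPDESC values, and the first 2174 rows were the ones set to TRUE.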
Best regards
Christaan

P.S.

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] plyr_0.1.9      maptools_0.7-34 lattice_0.18-8  foreign_0.8-40  Hmisc_3.8-0     survival_2.35-8 rgdal_0.6-26
[8] sp_0.9-64

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.11.1    tools_2.11.1

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.