Dear R-Helpers, I have a dataframe (g10df) formatted like this:

    GENE        PVAL
1 KCTD12 4.06904e-22
2 UNC93A 9.91852e-22
3  CDKN3 1.24695e-21
4 CLEC2B 4.71759e-21
5   DAB2 1.12062e-20

The rows are ranked in ascending order by PVAL, and I need to end up with the same relative order. There are duplicate entries for genes in the first column, each with a corresponding p-value in the second, but the p-values themselves are unique.

I had intended to use the plyr package to remove these duplicates:

    ddply(g10df, "GENE", summarise, PVAL = mean(PVAL))

But it occurred to me that rather than averaging the p-values for each set of duplicates, I should select one duplicate at random and remove the rest. I am relatively new to R, and I have not been able to find a way to do this, with plyr or otherwise.

Any help would be greatly appreciated.

Thanks and best regards,
Jeff