>>>>> Duncan Murdoch <murdoch.dun...@gmail.com>
>>>>>     on Tue, 18 Nov 2014 10:40:16 -0500 writes:

    > On 18/11/2014 10:23 AM, Dennis Fisher wrote:
    >> R 3.1.1
    >> OS X
    >>
    >> Colleagues
    >>
    >> When I use the duplicated function, I often need to find both
    >> the duplicates and the original element that was duplicated.
    >> This can be accomplished with:
    >>     duplicated(OBJECT) | duplicated(OBJECT, fromLast=TRUE)
    >>
    >> From my perspective, an improvement in the duplicated function
    >> would be an option that accomplishes this with a single call to
    >> the function. This could either be:
    >>   1. a new option: all=TRUE (pick whatever name makes sense)
    >>   2. allowing fromLast to take a new value (e.g., NA, in the
    >>      spirit of the xpd option in par())
    >>
    >> If my suggestion would yield unintended consequences, it can
    >> certainly be ignored.

    > The duplicated() function is pretty fast, so what's wrong with your
    > original version?  If you find it to be too much typing, wouldn't it be
    > simplest to write your own function, e.g.

    >     nonunique <- function(x) duplicated(x) | duplicated(x, fromLast=TRUE)

    > ?

    > Something I've wanted more than once is a variation on duplicated that
    > returns the index of the duplicated element, so for example

    >     dupindex(c(7,7,7,2,3,2))

    > would return

    >     0 1 1 0 0 4

    > or possibly

    >     1 1 1 4 5 4

    > Duncan Murdoch

In our CRAN package 'sfsmisc'
(http://cran.r-project.org/web/packages/sfsmisc)
we have had a function Duplicated() for a while now
with the following "feature":

  > x <- c(9:12, 1:4, 3:6, 0:7)
  > data.frame(x, dup = duplicated(x),
  +            dupL= duplicated(x, fromLast=TRUE),
  +            Dup = Duplicated(x),
  +            DupL= Duplicated(x, fromLast=TRUE))
      x   dup  dupL Dup DupL
  1   9 FALSE FALSE  NA   NA
  2  10 FALSE FALSE  NA   NA
  3  11 FALSE FALSE  NA   NA
  4  12 FALSE FALSE  NA   NA
  5   1 FALSE  TRUE   3    1
  6   2 FALSE  TRUE   4    2
  7   3 FALSE  TRUE   1    3
  8   4 FALSE  TRUE   2    4
  9   3  TRUE  TRUE   1    3
  10  4  TRUE  TRUE   2    4
  11  5 FALSE  TRUE   7    7
  12  6 FALSE  TRUE   8    8
  13  0 FALSE FALSE  NA   NA
  14  1  TRUE FALSE   3    1
  15  2  TRUE FALSE   4    2
  16  3  TRUE FALSE   1    3
  17  4  TRUE FALSE   2    4
  18  5  TRUE FALSE   7    7
  19  6  TRUE FALSE   8    8
  20  7 FALSE FALSE  NA   NA
  >

---- help page
--------------------------------------------------------

Duplicated               package:sfsmisc               R Documentation

Counting-Generalization of duplicated()

Description:

     Duplicated() generalizes the ‘duplicated’ method for vectors, by
     returning indices of “equivalence classes” for duplicated entries
     and returning ‘nomatch’ (‘NA’ by default) for unique entries.

     Note that ‘duplicated()’ is not ‘TRUE’ for the first time a
     duplicate appears, whereas ‘Duplicated()’ only marks unique
     entries with ‘nomatch’ (‘NA’).

Usage:

     Duplicated(v, incomparables = FALSE, fromLast = FALSE,
                nomatch = NA_integer_)

Arguments:

       v: a vector, often character, factor, or numeric.

incomparables: a vector of values that cannot be compared, passed to
          both ‘duplicated()’ and ‘match()’.  ‘FALSE’ is a special
          value, meaning that all values can be compared, and may be
          the only value accepted for methods other than the default.
          It will be coerced internally to the same type as ‘x’.

fromLast: logical indicating if duplication should be considered from
          the reverse side, i.e., the last (or rightmost) of identical
          elements would correspond to ‘duplicated=FALSE’.

 nomatch: passed to ‘match()’: the value to be returned in the case
          when no match is found.  Note that it is coerced to
          ‘integer’.

Value:

     an integer vector of the same length as ‘v’.  Can be used as a
     ‘factor’, e.g., in ‘split’, ‘tapply’, etc.

Author(s):

     Christoph Buser and Martin Maechler, Seminar fuer Statistik, ETH
     Zurich, Sep.2007

See Also:

     ‘uniqueL’ (also in ‘sfsmisc’); ‘duplicated’, ‘match’.

Examples:

     x <- c(9:12, 1:4, 3:6, 0:7)
     data.frame(x, dup = duplicated(x),
                dupL= duplicated(x, fromLast=TRUE),
                Dup = Duplicated(x),
                DupL= Duplicated(x, fromLast=TRUE))
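
For readers who only need what was asked for earlier in this thread, here
is a minimal base-R sketch of the two ideas quoted above: Duncan's
nonunique() one-liner, and one possible reading of his dupindex() wish.
The dupindex() body and its 'zero_first' argument are assumptions made
for illustration; they are not part of base R or of sfsmisc, and they do
not reproduce the numbering that Duplicated() uses.

```r
## Flag every element whose value occurs more than once,
## including the first occurrence (Duncan's one-liner from the thread).
nonunique <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

## Hypothetical dupindex(): index of the first occurrence of each value.
## match(x, x) returns, for every element, the position of its first match.
## 'zero_first' is an assumed argument name: when TRUE, first occurrences
## (and unique values) get 0 instead of their own position.
dupindex <- function(x, zero_first = TRUE) {
  idx <- match(x, x)
  if (zero_first) idx[!duplicated(x)] <- 0L
  idx
}

x <- c(7, 7, 7, 2, 3, 2)
nonunique(x)                     # TRUE TRUE TRUE TRUE FALSE TRUE
dupindex(x)                      # 0 1 1 0 0 4
dupindex(x, zero_first = FALSE)  # 1 1 1 4 5 4
```

Both variants rely only on duplicated() and match(), so they inherit the
way those functions treat NA (all NAs count as equal to each other).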