Dimitris Rizopoulos wrote:
> Wacek Kusnierczyk wrote:
>> Wacek Kusnierczyk wrote:
>>> Michael Dewey wrote:
>>>
>>>> At 05:07 30/03/2009, Aaron M. Swoboda wrote:
>>>>
>>>>> I would like to know which rows are duplicates of each other, not
>>>>> simply that a row is duplicate of another row. In the following
>>>>> example rows 1 and 3 are duplicates.
>>>>>
>>>>>
>>>>>> x <- c(1,3,1)
>>>>>> y <- c(2,4,2)
>>>>>> z <- c(3,4,3)
>>>>>> data <- data.frame(x,y,z)
>>>>>>
>>>>>   x y z
>>>>> 1 1 2 3
>>>>> 2 3 4 4
>>>>> 3 1 2 3
>>>>>
>>> i don't have any solution significantly better than what you have
>>> already been given.
>>
>> i now seem to have one:
>>
>> # dummy data
>> data = data.frame(x=sample(1:2, 5, replace=TRUE), y=sample(1:2, 5,
>> replace=TRUE))
>> # add a class column; identical rows have the same class id
>> data$class = local({
>>     rows = do.call('paste', c(data, sep='\r'))
>>     with(
>>         rle(sort(rows)),
>>         rep(1:length(values), lengths)[rank(rows)] ) })
>>
>> data
>> #   x y class
>> # 1 2 2 3
>> # 2 2 1 2
>> # 3 2 1 2
>> # 4 1 2 1
>> # 5 2 2 3
>>
>
> another approach (maybe a bit cleaner) seems to be:
>
> data <- data.frame(x=sample(1:2, 5, replace=TRUE), y=sample(1:2, 5,
> replace = TRUE))
>
> vals <- do.call('paste', c(data, sep = '\r'))
> data$class <- match(vals, unique(vals))
> data
>
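
a minimal sketch of the match() approach quoted above, run on the original poster's three-row example rather than on random data. the names `groups` and `dups` are illustrative and do not come from the thread, and the '\r' separator is assumed not to occur in the data itself:

```r
# the original poster's example: rows 1 and 3 are identical
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# one string key per row; '\r' is assumed never to appear in the values
vals <- do.call(paste, c(data, sep = "\r"))

# class id = position of the row's key among the unique keys (first-seen order)
data$class <- match(vals, unique(vals))

# row numbers collected per class id (illustrative helper names)
groups <- split(seq_len(nrow(data)), data$class)

# keep only the classes with more than one member, i.e. the duplicated rows
dups <- groups[lengths(groups) > 1]
```

printing `dups` lists the row numbers that share a class, which is what the original question asked for: which rows are duplicates of each other, not just whether a row is a duplicate.
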
wow, cool! this seems unbeatable ;) i guess it can't be slower than any
of the others.

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.