Dimitris Rizopoulos wrote:
>> another approach (maybe a bit cleaner) seems to be:
>>
>> data <- data.frame(x=sample(1:2, 5, replace=TRUE),
>>                    y=sample(1:2, 5, replace=TRUE))
>>
>> vals <- do.call('paste', c(data, sep = '\r'))
>> data$class <- match(vals, unique(vals))
>> data
>>
>> I have tried benchmarking it.
>
> sorry, I wanted to write: I have *not* tried benchmarking it.
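to make the quoted idea concrete, here is a minimal sketch on a fixed toy data frame (the values are my own illustrative assumption, chosen so the result does not depend on sample()). each row is pasted into one string key, and match() against the unique keys numbers the classes in order of first appearance:

```r
# fixed toy data instead of sample(), so the result is reproducible
# (these particular values are illustrative, not from the original post)
data <- data.frame(x = c(1, 2, 1, 2, 1), y = c(1, 1, 1, 2, 1))

# build one string key per row; '\r' is the separator from the quoted code
vals <- do.call('paste', c(data, sep = '\r'))

# class id = position of the row's key among the distinct keys,
# so ids follow the order in which distinct rows first appear
data$class <- match(vals, unique(vals))
data
```

here rows 1, 3 and 5 are identical and get class 1, row 2 gets class 2, and row 4 gets class 3. note that this relies on the separator not occurring inside the values themselves; otherwise distinct rows could end up with the same key.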
# dummy data frame, just integers
n = 100; m = 100
data = as.data.frame(
   matrix(nrow=n, ncol=m, sample(n, m*n, replace=TRUE)))

# do a simple benchmarking
library(rbenchmark)
benchmark(replications=100, order='elapsed', columns=c('test', 'elapsed'),
   waku=local({
      rows = do.call('paste', c(data, sep='\r'))
      data$class = with(
         rle(sort(rows)),
         rep(1:length(values), lengths)[rank(rows)] )
   }),
   diri=local({
      values = do.call('paste', c(data, sep='\r'))
      data$class = match(values, unique(values))
   }) )

#   test elapsed
# 2 diri    0.43
# 1 waku    0.52

comparable for m=n=100 (and even better for n >> m), but way cleaner
code, and the class ids are now better sorted.

that's collaborative problem solving ;)

best,
vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.