> * Sam Steingold <f...@tah.bet> [2012-10-16 11:03:27 -0400]: > > I need an analogue of "uniq -c" for a data frame.
Summary of options: 1. William: isFirstInRun <- function(x) UseMethod("isFirstInRun") isFirstInRun.default <- function(x) c(TRUE, x[-1] != x[-length(x)]) isFirstInRun.data.frame <- function(x) { stopifnot(ncol(x)>0) retval <- isFirstInRun(x[[1]]) for(column in x) { retval <- retval | isFirstInRun(column) } retval } row.count.1 <- function (x) { i <- which(isFirstInRun(x)) data.frame(x[i,], count=diff(c(i, 1L+nrow(x)))) } 147 seconds 2. http://orgmode.org/worg/org-contrib/babel/examples/Rpackage.html#sec-6-1 row.count.2 <- function (x) { equal.to.previous <- rowSums( x[2:nrow(x),] != x[1:(nrow(x)-1),] )==0 tf.runs <- rle(equal.to.previous) counts <- c(1, unlist(mapply(function(x,y) if (y) x+1 else (rep(1,x)), tf.runs$length, tf.runs$value))) counts <- counts[ c( diff( counts ) <= 0, TRUE ) ] unique.rows <- which( c(TRUE, !equal.to.previous ) ) cbind(x[ unique.rows, ,drop=FALSE ], counts) } 136 seconds 3. Micael: paste/strsplit row.count.3 <- function (x) { pa <- do.call(paste,x) rl <- rle(p) sp <- strsplit(as.character(rl$values)," ") data.frame(user = sapply(sp,"[",1), country = sapply(sp,"[",2), language = sapply(sp,"[",3), count = rl$length) } here I know the columns and rely on absense of spaces in values. 27 seconds. Thanks to all who answered. -- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/ http://thereligionofpeace.com http://ffii.org http://camera.org A slave dreams not of Freedom, but of owning his own slaves. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.