Here are two related approaches to your problem. The first uses a logical vector, "keep", to say which rows to keep. The second uses an integer vector of row numbers. It can be considerably faster when the columns are not well correlated with one another (so the desired rows are a small proportion of the input rows), because each pass compares only the rows that survived the previous pass.
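To make the two representations concrete, here is one step of each sieve traced by hand on the three-column example from the question quoted further down. This is only an illustration of the idea, not part of the functions themselves; the names keep_lgl and keep_int are hypothetical.

```r
# Example data frame taken from the question quoted below
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# Logical sieve: one TRUE/FALSE/NA entry per row, always of length nrow(f)
keep_lgl <- f$a == f$b                  # TRUE NA NA FALSE
keep_lgl <- keep_lgl & f$b == f$c       # TRUE NA FALSE FALSE (NA & FALSE is FALSE)
keep_lgl <- !is.na(keep_lgl) & keep_lgl # TRUE FALSE FALSE FALSE

# Integer sieve: row numbers of the survivors; which() drops NA rows up front
keep_int <- which(f$a == f$b)           # 1
# later comparisons only look at the surviving rows
keep_int <- keep_int[which(f$b[keep_int] == f$c[keep_int])]  # 1
seq_len(nrow(f)) %in% keep_int          # TRUE FALSE FALSE FALSE
```

The logical version touches every row on every pass and has to clean up NAs at the end; the integer version shrinks the work as rows drop out, which is where its speed advantage comes from.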
f1 <- function (x) { # sieve with logical 'keep' vector
    stopifnot(is.data.frame(x), ncol(x) > 1)
    keep <- x[[1]] == x[[2]]  # TRUE/FALSE/NA for every row
    for (i in seq_len(ncol(x))[-(1:2)]) {
        # compare each column with its neighbour, over all rows
        keep <- keep & x[[i - 1]] == x[[i]]
    }
    !is.na(keep) & keep  # rows still NA become FALSE
}

f2 <- function (x) { # sieve with integer 'keep' vector
    stopifnot(is.data.frame(x), ncol(x) > 1)
    keep <- which(x[[1]] == x[[2]])  # row numbers; which() drops NAs
    for (i in seq_len(ncol(x))[-(1:2)]) {
        # compare only the rows that are still candidates
        keep <- keep[which(x[[i - 1]][keep] == x[[i]][keep])]
    }
    seq_len(nrow(x)) %in% keep  # convert back to a logical row selector
}

E.g., for a 10 million by 10 data.frame I get:

> x <- data.frame(lapply(structure(1:10,names=letters[1:10]),
>     function(i)sample(c(NA,1,1,1,2,2,2,3), replace=TRUE, size=1e7)))
> system.time(v1 <- f1(x))
   user  system elapsed
   4.04    0.16    4.19
> system.time(v2 <- f2(x))
   user  system elapsed
   0.80    0.00    0.79
> identical(v1, v2)
[1] TRUE
> head(x[v1,])
      a b c d e f g h i j
4811  2 2 2 2 2 2 2 2 2 2
41706 1 1 1 1 1 1 1 1 1 1
56633 1 1 1 1 1 1 1 1 1 1
70859 1 1 1 1 1 1 1 1 1 1
83848 1 1 1 1 1 1 1 1 1 1
84767 1 1 1 1 1 1 1 1 1 1

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Sam Steingold
> Sent: Friday, January 18, 2013 12:53 PM
> To: r-help@r-project.org
> Subject: [R] select rows with identical columns from a data frame
>
> I have a data frame with several columns.
> I want to select the rows with no NAs (as with complete.cases)
> and all columns identical.
> E.g., for
>
> --8<---------------cut here---------------start------------->8---
> > f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
> > f
>    a  b  c
> 1  1  1  1
> 2 NA NA NA
> 3 NA  3  5
> 4  4 40 40
> --8<---------------cut here---------------end--------------->8---
>
> I want the vector TRUE,FALSE,FALSE,FALSE selecting just the first
> row because there all 3 columns are the same and none is NA.
>
> thanks!
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.