> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
> Sent: Monday, February 21, 2011 3:11 PM
> To: David Winsemius
> Cc: r-help@r-project.org; IgnacioQM
> Subject: Re: [R] How to delete rows with specific values on
> all columns(variables)?
>
>
> On Feb 21, 2011, at 6:05 PM, David Winsemius wrote:
>
> >
> > On Feb 21, 2011, at 4:03 PM, IgnacioQM wrote:
> >
> >> I need to filter my data:
> >> I think its easy but i'm stuck so i'll appreciate some help:
> >>
> >> I have a data frame with 14 variables and 6 million rows. About
> >> half of this rows have a value of "0" in 12 variables (the other
> >> two variables always have values). How can I delete the rows in
> >> which all 12 variables have the value of "0".
> >>
> >> example (from my data, variable 14 is missing):
> >>
> >> 1783 81 85 78 89 71 97 76 66 88 95 95 98 -57.48258
> >> 1784 81 86 79 90 71 97 77 66 88 95 95 98 -57.43768
> >> 1785 81 86 79 90 71 98 77 66 89 95 94 98 -57.39278
> >> 1786  0  0  0  0  0  0  0  0  0  0  0  0 -57.34788
> >> 1787  0  0  0  0  0  0  0  0  0  0  0  0 -57.30298
> >> 1788 80 86 80 90 72 98 78 66 88 93 93 96 -57.25808
> >> 1789 77 83 78 88 70 95 76 63 86 91 90 93 -57.21318
> >> 1790 77 84 79 89 70 96 76 64 87 91 90 93 -57.16828
> >>
> >> I would need to delete rows 1786 & 1787.
> >
> > something along the lines of:
> >
> > dfrm[ -apply(dfrm, 1, function(x) all(x==0) ), ]
>
> Looking at a second time, I see the qualification of only the first 12
> rows, so
>
> dfrm[ -apply(dfrm[, 1:12], 1, function(x) all(x==0) ), ]

I think you want !apply, not -apply (negating the logical vector
turns it into 0s and -1s, which as a row subscript can drop at most
row 1), as in

  f0 <- function (dfrm) {
      dfrm[!apply(dfrm[, 1:12], 1, function(x) all(x == 0)), ]
  }

>
> Email obscured that compounded by the fact that you didn't post a
> reproducible data object.

A faster and safer way would be to operate a column at a time
(faster when there are many more rows than columns) and to
avoid apply (safer, as it turns the data.frame into a matrix
whose storage.mode might surprise you and lead to errors
in the x==0 test).  E.g.,

  f1 <- function (dfrm) {
      # TRUE where x is a non-missing zero; NA counts as "not zero"
      isZero <- function(x) !is.na(x) & x == 0
      areAllColsZero <- isZero(dfrm[, 1])
      for (col in dfrm[, 2:12]) areAllColsZero <- areAllColsZero & isZero(col)
      dfrm[!areAllColsZero, , drop = FALSE]
  }

You can use Reduce() instead of the loop, but the loop
code is easy to understand.

I made some fake data with the following function, which makes
all but 12 rows be all-zero:

  makeData <- function (nrow) {
      rowNum <- seq_len(nrow)
      data.frame(lapply(structure(1:12, names = paste("X", 1:12, sep = "")),
                        function(i) as.integer(rowNum == (2 * i))),
                 Data1 = 1:nrow,
                 Data2 = sqrt(1:nrow))
  }

To test this out:

  > dfrm <- makeData(1e6) # million rows, 12 to keep
  > system.time(r0 <- f0(dfrm))
     user  system elapsed
    21.45    0.60   22.55
  > system.time(r1 <- f1(dfrm)) # faster
     user  system elapsed
     0.87    0.07    0.88
  > identical(r0, r1) # gives same results
  [1] TRUE
  > dim(r0)
  [1] 12 14
  > r0
     X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 Data1    Data2
  2   1  0  0  0  0  0  0  0  0   0   0   0     2 1.414214
  4   0  1  0  0  0  0  0  0  0   0   0   0     4 2.000000
  ...

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> >
> >>
> >> I tried subset with variable1>"0"&variable2>"0", but it wasn't
> >> useful 'cause it only took the rows that didn't have a 0 in any
> >> of the variables; I only need in ALL of the variables
> >> simultaneously.
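
On the subset() attempt quoted just above: joining the per-column
tests with & keeps only rows in which every variable is nonzero,
which is why too many rows disappeared. Joining them with | keeps
any row with at least one nonzero variable, so only the all-zero
rows are dropped. A minimal sketch on a made-up 3-variable data
frame (the names d, v1, v2, v3 and id are hypothetical, not from
the original data):

```r
# Hypothetical data: row 2 is zero in all three variables.
d <- data.frame(v1 = c(5, 0, 0, 7),
                v2 = c(3, 0, 4, 0),
                v3 = c(9, 0, 0, 1),
                id = 1:4)

# '&' keeps only rows where every variable is nonzero
# (the behaviour described in the quoted attempt).
keepAllNonzero <- subset(d, v1 > 0 & v2 > 0 & v3 > 0)

# '|' keeps rows where at least one variable is nonzero,
# i.e. it drops only the rows that are zero everywhere.
keepAnyNonzero <- subset(d, v1 != 0 | v2 != 0 | v3 != 0)

keepAllNonzero$id
keepAnyNonzero$id
```

With 12 variables the | chain gets long, which is one reason to
build the logical vector programmatically instead.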
> >>
> >> Thanks,
> >>
> >> Ignacio
> >
> >
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
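
For reference, the Reduce() variant mentioned in the reply could
look roughly like the sketch below. The function name f2 and the
tiny data frame d are assumptions made for illustration; they do
not appear in the thread. It keeps the same isZero() test and the
same hard-coded first 12 columns as the loop version.

```r
# Sketch: same logic as the column-at-a-time loop, folded with Reduce().
# f2 is a hypothetical name chosen for this example.
f2 <- function (dfrm) {
    isZero <- function(x) !is.na(x) & x == 0
    # lapply() gives one logical vector per column;
    # Reduce() combines them with elementwise '&'.
    areAllColsZero <- Reduce(`&`, lapply(dfrm[, 1:12], isZero))
    dfrm[!areAllColsZero, , drop = FALSE]
}

# Tiny made-up data set: rows 2 and 3 are zero in all of the first 12 columns.
zeros <- c(1, 0, 0, 2)
d <- data.frame(matrix(rep(zeros, 12), ncol = 12),
                Data1 = 1:4,
                Data2 = sqrt(1:4))
r2 <- f2(d)
rownames(r2)
```

Like the loop, this never converts the data.frame to a matrix, so
each column is tested in its own storage mode.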