On Fri, Feb 24, 2012 at 01:00:00PM +0530, Apoorva Gupta wrote:
> Dear R users,
>
> I have a data.frame as follows
>
>       a b c d e
>  [1,] 1 1 1 0 0
>  [2,] 1 1 0 0 0
>  [3,] 1 1 0 0 0
>  [4,] 0 1 1 1 1
>  [5,] 0 1 1 1 1
>  [6,] 1 1 1 1 1
>  [7,] 1 1 1 0 1
>  [8,] 1 1 1 0 1
>  [9,] 1 1 1 0 0
> [10,] 1 1 1 0 0
>
> Within these 5 vectors, I want to choose those vectors in which
> the pattern (0,0,1,1,1,1) occurs anywhere in the vector.
> This means I want vectors a, c, e and not b and d.
Hi.

A related thread was "[R] matching a sequence in a vector?", which started at

  https://stat.ethz.ch/pipermail/r-help/2012-February/303608.html
  https://stat.ethz.ch/pipermail/r-help/attachments/20120215/989a2e88/attachment.pl

and a summary of the suggested solutions was posted at

  https://stat.ethz.ch/pipermail/r-help/2012-February/303756.html

Try the following, where any of the occur* functions described there may
be used instead of occur1. The original function returned the vector
"candidate" of the indices at which an occurrence of "patrn" in "exmpl"
starts. For your purposes, the function has to be modified in two ways.

1. The output is the condition length(candidate) != 0 instead of "candidate".
2. The argument "exmpl" is the first argument, so that lapply() and apply()
   can pass each column to it, with "patrn" supplied by name.

  # your data frame
  df <- structure(list(a = c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
                       b = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                       c = c(1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                       d = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
                       e = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L)),
                  .Names = c("a", "b", "c", "d", "e"), class = "data.frame",
                  row.names = c(NA, -10L))

  # modified function occur1
  testoccur1 <- function(exmpl, patrn) {
      m <- length(patrn)
      n <- length(exmpl)
      # all positions where an occurrence of patrn could start
      candidate <- seq.int(length.out = n - m + 1)
      for (i in seq.int(length.out = m)) {
          # keep only the starts whose i-th element matches patrn[i]
          candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
      }
      # TRUE if at least one start position survived
      length(candidate) != 0
  }

  selection <- unlist(lapply(df, testoccur1, patrn=c(0,0,1,1,1,1)))
  selection

      a     b     c     d     e
   TRUE FALSE  TRUE FALSE  TRUE

  df[, selection]

     a c e
  1  1 1 0
  2  1 0 0
  3  1 0 0
  4  0 1 1
  5  0 1 1
  6  1 1 1
  7  1 1 1
  8  1 1 1
  9  1 1 0
  10 1 1 0

In your post, you printed a matrix rather than a data frame (note the
[1,] row labels).
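To see how the candidate-filtering loop in testoccur1 narrows down the
possible start positions, here is a minimal sketch of a tracing variant.
The name occur1_trace is hypothetical (it is not one of the occur*
functions from the thread); it prints the surviving start positions after
each element of the pattern and returns them, as the original occur1 did.
The max(n - m + 1, 0) guard is an added assumption, so that a vector
shorter than the pattern simply yields no candidates.

```r
# Hypothetical tracing variant of occur1 (not from the original thread):
# the same candidate-filtering loop, but it prints the surviving start
# positions after each pattern element and returns them at the end.
occur1_trace <- function(exmpl, patrn) {
  m <- length(patrn)
  n <- length(exmpl)
  # every position where an occurrence of patrn could start
  # (assumed guard: no candidates if exmpl is shorter than patrn)
  candidate <- seq_len(max(n - m + 1, 0))
  for (i in seq_len(m)) {
    # keep only the starts whose i-th element equals patrn[i]
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    cat("after element", i, ":", candidate, "\n")
  }
  candidate
}

a <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1)  # column a of the example data
occur1_trace(a, c(0, 0, 1, 1, 1, 1))
```

For column a, the surviving starts shrink from 4 and 5 after the first
element to just 4, the position where 0,0,1,1,1,1 begins; testoccur1
reduces the same information to a single TRUE.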
If your structure is a matrix, try the following.

  # your matrix
  mat <- as.matrix(df)
  mat

        a b c d e
   [1,] 1 1 1 0 0
   [2,] 1 1 0 0 0
   [3,] 1 1 0 0 0
   [4,] 0 1 1 1 1
   [5,] 0 1 1 1 1
   [6,] 1 1 1 1 1
   [7,] 1 1 1 0 1
   [8,] 1 1 1 0 1
   [9,] 1 1 1 0 0
  [10,] 1 1 1 0 0

  # selection of columns
  sel <- apply(mat, 2, testoccur1, patrn=c(0,0,1,1,1,1))
  mat[, sel]

        a c e
   [1,] 1 1 0
   [2,] 1 0 0
   [3,] 1 0 0
   [4,] 0 1 1
   [5,] 0 1 1
   [6,] 1 1 1
   [7,] 1 1 1
   [8,] 1 1 1
   [9,] 1 1 0
  [10,] 1 1 0

Hope this helps.

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.