Thanks, I see that it is working in the sample data. My data, however, gives me an error message:

data <- strapplyc(text, batch[[l]])
Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = "tclObj") :
  [tcl] couldn't compile regular expression pattern: parentheses () not balanced.

batch[[l]] is similar to your "re" string except that there is a larger variety of characters. I haven't been able to figure out which characters are causing trouble here. Any thoughts?

Thank you very much.

Math


Gabor Grothendieck wrote
> On Fri, Jul 6, 2012 at 10:45 AM, mdvaan <mathijsdevaan@> wrote:
>> Hi,
>>
>> I am using R's grep function to find patterns in vectors of strings. The
>> number of patterns I would like to match is 7,700 (of different sizes). I
>> noticed that I get an error message when I do the following:
>>
>> data <- array()
>> for (j in 1:length(x))
>> {
>> array[j] <- length(grep(paste(patterns[1:7700], collapse = "|"), x[j],
>> value = T))
>> }
>>
>> When I break this up into 4 chunks of patterns it works:
>>
>> data <- array()
>> for (j in 1:length(x))
>> {
>> array$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"),
>> x[j], value = T))
>> array$chunk1[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"),
>> x[j], value = T))
>> array$chunk1[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"),
>> x[j], value = T))
>> array$chunk1[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"),
>> x[j], value = T))
>> }
>>
>> My questions: what's the maximum size of the patterns argument in grep?
>> Is there a way to do this faster? It is very slow.
>
> Try strapplyc in gsubfn and see
> http://gsubfn.googlecode.com
> for more info.
>
> # test data
> x <- c("abcd", "z", "dbef")
>
> # re is regexp with 7700 alternatives
> # to test with
> g <- expand.grid(letters, letters, letters)
> gp <- do.call("paste0", g)
> gp7700 <- head(gp, 7700)
> re <- paste(gp7700, collapse = "|")
>
> # grep gives error message
> grep.out <- grep(re, x)
>
> # strapplyc works
> library(gsubfn)
> which(sapply(strapplyc(x, re), length) > 0)
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613p4636437.html
Sent from the R help mailing list archive at Nabble.com.
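
One possible explanation for the "parentheses () not balanced" error, offered as a sketch rather than a diagnosis: if some of the patterns contain regex metacharacters such as "(", the alternation built from them may no longer be a valid regular expression. The example below uses made-up patterns and strings (patterns, x, compiles and hit are hypothetical names, not the original batch[[l]] data). It first flags patterns that fail to compile on their own under R's default regex engine, which is not the tcl engine that strapplyc reported the error from and may differ in details. It then matches every pattern literally with fixed = TRUE, so that no escaping is needed.

```r
# Hypothetical data: some patterns contain regex metacharacters
patterns <- c("foo (bar", "a+b", "c.d")
x <- c("see foo (bar here", "nothing", "a+b and c.d")

# Step 1: find patterns that do not compile as regular expressions.
# grepl() raises an error (and a warning) for an invalid pattern,
# so each call is wrapped and the outcome recorded as TRUE/FALSE.
compiles <- vapply(patterns, function(p) {
  tryCatch({
    suppressWarnings(grepl(p, ""))
    TRUE
  }, error = function(e) FALSE)
}, logical(1))

# Patterns that failed to compile (candidates for escaping or fixed matching)
bad <- patterns[!compiles]
print(bad)

# Step 2: treat every pattern as a literal string instead of a regex.
# grepl() is vectorised over x, so the loop runs once per pattern.
hit <- rep(FALSE, length(x))
for (p in patterns) {
  hit <- hit | grepl(p, x, fixed = TRUE)
}

# Indices of the strings that contain at least one pattern
print(which(hit))
```

If the patterns really are meant as regular expressions rather than literal strings, fixed = TRUE changes what they match, so in that case only step 1 applies and the offending entries would need to be corrected by hand.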