Hi,

I am using R's grep function to find patterns in vectors of strings. I
would like to match 7,700 patterns (of varying lengths). I get an error
message when I combine all of them into a single regular expression:

data <- integer(length(x))
for (j in seq_along(x))
{
  # x[j] is a single string, so this is 1 on a match and 0 otherwise
  data[j] <- length(grep(paste(patterns[1:7700], collapse = "|"), x[j],
                         value = TRUE))
}

When I break this up into 4 chunks of patterns it works:

data <- list()
for (j in seq_along(x))
{
  # one result vector per chunk of patterns
  data$chunk1[j] <- length(grep(paste(patterns[1:2500], collapse = "|"),
                                x[j], value = TRUE))
  data$chunk2[j] <- length(grep(paste(patterns[2501:5000], collapse = "|"),
                                x[j], value = TRUE))
  data$chunk3[j] <- length(grep(paste(patterns[5001:7500], collapse = "|"),
                                x[j], value = TRUE))
  data$chunk4[j] <- length(grep(paste(patterns[7501:7700], collapse = "|"),
                                x[j], value = TRUE))
}
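
The same chunking can be written more generally. The following is only a
sketch: match_by_chunk is a made-up helper name, and the default chunk size
of 2500 is simply the value that happened to work for me, not a documented
limit. Because each x[j] is a single string, the count in the loop is either
0 or 1, so the sketch uses grepl, which is vectorised over x.

```r
# Hypothetical helper: for each element of x, record whether it matches
# any pattern in each chunk of at most chunk_size patterns.
match_by_chunk <- function(patterns, x, chunk_size = 2500) {
  # Assign consecutive patterns to chunks 1, 2, 3, ...
  chunk_id <- ceiling(seq_along(patterns) / chunk_size)
  chunks <- split(patterns, chunk_id)

  # One alternation regex per chunk; grepl() handles all of x at once,
  # so no explicit loop over x is needed.
  hits <- vapply(
    chunks,
    function(p) grepl(paste(p, collapse = "|"), x),
    logical(length(x))
  )

  # vapply() returns a plain vector when length(x) == 1, so force a matrix
  # with one row per string and one column per chunk.
  matrix(hits, nrow = length(x),
         dimnames = list(NULL, paste0("chunk", seq_along(chunks))))
}

# Tiny made-up example, using chunks of two patterns.
patterns <- c("cat", "dog", "bird", "fish", "horse")
x <- c("hotdog stand", "goldfish bowl", "nothing here")
res <- match_by_chunk(patterns, x, chunk_size = 2)

# TRUE where a string matches at least one pattern in any chunk.
any_match <- rowSums(res) > 0
```

The columns of res correspond to chunk1, chunk2, and so on in the loop
above, and any_match collapses them into a single match / no match flag per
string.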

My questions: what is the maximum size of the pattern argument in grep? And
is there a faster way to do this? The loop is very slow.

Thanks.

Math

Sorry for not providing a reproducible example. It's a size issue which
makes it difficult to provide an example.
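
For what it is worth, here is a scaled-down sketch of the setup using
made-up data. The random_words helper, the sizes, and the random strings are
all invented for illustration; my real patterns and strings are different,
and this version is too small to show the size problem.

```r
set.seed(1)

# Hypothetical helper: n random lowercase "words" with lengths drawn
# uniformly from min_len..max_len.
random_words <- function(n, min_len, max_len) {
  lens <- sample(min_len:max_len, n, replace = TRUE)
  vapply(
    lens,
    function(k) paste(sample(letters, k, replace = TRUE), collapse = ""),
    character(1)
  )
}

patterns <- unique(random_words(200, 3, 8))  # stand-in for the 7,700 patterns
x <- random_words(50, 20, 40)                # stand-in for the strings to search

# Same structure as the loop in my post: one combined regex, one string
# at a time, recording 1 for a match and 0 otherwise.
counts <- integer(length(x))
for (j in seq_along(x)) {
  counts[j] <- length(grep(paste(patterns, collapse = "|"), x[j],
                           value = TRUE))
}
```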

 

--
View this message in context: 
http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613.html
Sent from the R help mailing list archive at Nabble.com.
