On Fri, Jul 13, 2012 at 1:41 PM, mdvaan <mathijsdev...@gmail.com> wrote:
> Here's some data (which should give you the error messages):
>
> # read in data
> data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv", header = T, sep = ",")
>
> # first paste all data
> data1 <- paste(data[,1], collapse = "|")
>
> # second paste subsets of the data
> data2a <- paste(data[1:750,1], collapse = "|")
> data2b <- paste(data[751:1500,1], collapse = "|")
>
> # define the object to be searched
> text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
>
> # match
> strapplyc(text, data1)
> strapplyc(text, data2a)
> strapplyc(text, data2b)
>
> Thanks in advance!
>

Although strapplyc seems to handle larger regular expressions than grep
in R, it seems neither can handle one as large as in your example, so
process it in chunks:

k <- 3000 # chunk size

# match text against the alternation built from rows from..(from + k - 1)
f <- function(from, text) {
  to <- min(from + k - 1, nrow(data))
  r <- paste(data[seq(from, to), 1], collapse = "|")
  # remove regex metacharacters from the names before matching
  r <- gsub("[().*?+{}]", "", r)
  strapply(text, r)
}

# starting row of each chunk
ix <- seq(1, nrow(data), k)

# for each string in text, collect the matches from every chunk
out <- lapply(text, function(text) unlist(lapply(ix, f, text)))

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com