Hi, I have a minor follow-up question:
In the example below, "ann" and "nn" are matched in the third element of text. I would like to ignore every match in which the character immediately following the match is one of [:alpha:]. How do I do this without removing the "ignore.case = TRUE" argument of the strapply function?

So the output should be:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

[[3]]
NULL

Rather than:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

[[3]]
[1] "ann" "nn"

Thanks!

require(gsubfn)

# read in data
data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv", header = TRUE, sep = ",")

# define the object to be searched
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings",
          "the annual earnings exceed those of last year")

k <- 3000 # chunk size

# search text for one chunk of k names from the first column of data
f <- function(from, text) {
  to <- min(from + k - 1, nrow(data))
  r <- paste(data[seq(from, to), 1], collapse = "|")
  r <- gsub("[().*?+{}]", "", r)  # drop regex metacharacters from the names
  strapply(text, r, ignore.case = TRUE)
}

ix <- seq(1, nrow(data), k)
out <- lapply(text, function(text) unlist(lapply(ix, f, text)))

--
View this message in context: http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613p4637458.html
Sent from the R help mailing list archive at Nabble.com.
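
For reference, the requirement above (drop any match that is followed by an alphabetic character, while still matching case-insensitively) can be sketched with a negative lookahead. This is only a minimal illustration in base R, not a confirmed strapply solution: it uses gregexpr and regmatches with perl = TRUE instead of strapply, and the vector "pats" is a hypothetical stand-in for one chunk of names from the first column of data.

```r
# text as defined in the post
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings",
          "the annual earnings exceed those of last year")

# hypothetical stand-in for one chunk of names from data[, 1]
pats <- c("Santa Fe Gold Corp", "Starpharma Holdings", "ann", "nn")

# same clean-up of regex metacharacters as in the post
alt <- gsub("[().*?+{}]", "", paste(pats, collapse = "|"))

# wrap the alternation in a non-capturing group and reject any match
# whose next character is alphabetic (negative lookahead, PCRE syntax)
r <- paste0("(?:", alt, ")(?![[:alpha:]])")

m <- gregexpr(r, text, ignore.case = TRUE, perl = TRUE)
res <- regmatches(text, m)
print(res)
```

With these inputs, "ann" and "nn" in "annual" are both followed by "u", so the third element comes back empty (character(0) here rather than NULL). Whether the same pattern can be passed unchanged to strapply depends on the regex engine gsubfn uses, which I have not verified.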