Re: [R] Maximum number of patterns and speed in grep

mdvaan Mon, 23 Jul 2012 11:44:17 -0700

Hi,

I have a minor follow-up question:


In the example below, "ann" and "nn" are matched inside "annual" in the third
element of text. I would like to ignore every match that is immediately
followed by an alphabetic character ([[:alpha:]]). How do I do this while
keeping the "ignore.case = TRUE" argument of the strapply function?

So the output should be:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

[[3]]
NULL

Rather than:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

[[3]]
[1] "ann" "nn"

Thanks!


require(gsubfn)

# read in data 
data <- read.csv("https://dl.dropbox.com/u/13631687/data.csv", header = TRUE,
sep = ",") 

# define the object to be searched 
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings",
          "the annual earnings exceed those of last year") 

k <- 3000 # chunk size 

f <- function(from, text) { 
  to <- min(from + k - 1, nrow(data)) 
  r <- paste(data[seq(from, to), 1], collapse = "|") 
  r <- gsub("[().*?+{}]", "", r) 
  strapply(text, r, ignore.case = TRUE) 
} 
ix <- seq(1, nrow(data), k) 
out <- lapply(text, function(text) unlist(lapply(ix, f, text))) 
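
For reference, the behaviour I am after can be sketched in base R with a
negative lookahead. This is only a sketch under assumptions: "patterns" is a
hypothetical stand-in for the first column of data.csv, and it uses
gregexpr/regmatches with perl = TRUE instead of strapply, since I have not
verified whether the same lookahead works with the regex engine strapply uses.

```r
# Hypothetical stand-in for the first column of data.csv (assumption).
patterns <- c("Santa Fe Gold Corp", "Starpharma Holdings", "ann", "nn")

text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings",
          "the annual earnings exceed those of last year")

# Strip the same metacharacters as in f(), then join into one alternation.
alt <- gsub("[().*?+{}]", "", paste(patterns, collapse = "|"))

# Negative lookahead: reject a match if the next character is a letter.
r <- paste0("(?:", alt, ")(?![[:alpha:]])")

# perl = TRUE is needed for the lookahead; ignore.case is kept as before.
m <- gregexpr(r, text, ignore.case = TRUE, perl = TRUE)
out <- regmatches(text, m)
```

Note that regmatches gives character(0) rather than NULL for an element with
no match, so the third element would need converting if NULL is required.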



--
View this message in context: 
http://r.789695.n4.nabble.com/Maximum-number-of-patterns-and-speed-in-grep-tp4635613p4637458.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
