I have a database of text documents (letter sequences): several thousand documents with approx. 1000-2000 letters each.
I need to find exact matches of short sequences (3-15 letters) in those documents. Even without any regexp patterns, searching for one 3-15 letter "word" takes on the order of 1 s, so for a database with several thousand documents it is on the order of hours. The naive approach would be to use mcmapply, but on standard hardware I would still be in the same order of magnitude, and since R is an interactive programming environment this isn't a solution I would go for.

Are there faster algorithmic solutions? Can anyone please point me to an implementation available in R?

Thank you,
Witold

-- 
Witold Eryk Wolski

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
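
To make the question concrete, here is a minimal base-R sketch of the kind of exact (non-regexp) search described above. Everything in it is an assumption for illustration: the toy `docs` vector, the 20-letter alphabet, the `queries` vector and the helper `make_doc` are hypothetical stand-ins, not the real database or the code that was timed. It passes the whole character vector of documents to `grepl()` with `fixed = TRUE` and loops only over the query words.

```r
# Hypothetical toy data standing in for the real database:
# 200 random "documents" of 1000-2000 letters each.
set.seed(1)
alphabet <- LETTERS[1:20]
make_doc <- function(n) paste(sample(alphabet, n, replace = TRUE), collapse = "")
docs <- vapply(sample(1000:2000, 200, replace = TRUE), make_doc, character(1))

# Hypothetical query "words": three are cut out of known documents, so each
# is guaranteed to occur at least once; "ZZZ" cannot occur because the
# letter Z is not part of the alphabet used above.
queries <- c(substr(docs[1], 10, 14),
             substr(docs[50], 100, 112),
             substr(docs[200], 1, 3),
             "ZZZ")

# grepl() is vectorised over the documents; fixed = TRUE treats the query
# as a literal string instead of a regular expression.
# Result: logical matrix, one row per document, one column per query.
hits <- vapply(queries,
               function(q) grepl(q, docs, fixed = TRUE),
               logical(length(docs)))

# Start positions of the (non-overlapping) occurrences of one query in
# every document; gregexpr() returns -1 for documents without a match.
positions <- gregexpr(queries[1], docs, fixed = TRUE)

# Number of documents containing each query.
n_docs_with_hit <- colSums(hits)
```

Whether this is fast enough for the real data is something only a timing on that data can show. If it is not, the fixed-pattern functions in the stringi package (for example `stri_detect_fixed`) or, for biological sequences, the dictionary matching in Bioconductor's Biostrings package (`matchPDict`) may be worth a look.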