I have a database of text documents (letter sequences): several thousand
documents with approx. 1000-2000 letters each.

I need to find exact matches of short sequences of 3-15 letters in those
documents.

Without any regexp patterns, the search for one 3-15 letter "word" takes on
the order of 1s.
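
To make the task concrete, a minimal sketch of such a fixed-string search in
base R might look like the following. The data and the names `docs`, `words`
and `hits` are made up for illustration and are not the actual code or data.

```r
# Toy data: hypothetical stand-ins for the real documents and query words.
docs <- c(doc1 = "ACDEFGHIKLMNPQRSTVWY",
          doc2 = "MKTAYIAKQRQISFVKSHFS",
          doc3 = "GHIKLACDEFGHIKQRQ")
words <- c("GHIKL", "QRQ", "WWW")

# fixed = TRUE makes grepl() treat each word as a literal string, not a regex.
# The result is a logical matrix: one row per document, one column per word.
hits <- vapply(words,
               function(w) grepl(w, docs, fixed = TRUE),
               logical(length(docs)))
rownames(hits) <- names(docs)

print(hits)
```

Each call to grepl() scans every document once, so the total cost of this
approach grows with the number of words times the total text length.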

So for a database with several thousand documents it's on the order of
hours.
The naive approach would be to use mcmapply, but then on standard
hardware I am still in the same order of magnitude, and since R is an
interactive programming environment this isn't a solution I would go for.

But aren't there faster algorithmic solutions? Can anyone please point me
to an implementation available in R?

Thank you
Witold




-- 
Witold Eryk Wolski