Gabor,

Thanks for the suggestion, I'll try it out tonight or tomorrow.

Regards,
Richard

_____________________
Richard R. Liu
Dittingerstr. 33
CH-4053 Basel
Switzerland
Tel. +41 79 708 67 66

Sent from my iPhone 3GS

On Apr 29, 2010, at 13:06, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

Using charmatch partial matches of 10,000 5 leters words to the same
list can be done in 10 seconds on my machine and 10,000 5 letter words
to 100,000 10 letter words in 1 minute.  Is that good enough?  Try
this simulation:

# generate N random words each k long
rwords <- function(N, k) {
  L <- sample(letters, N*k, replace = TRUE)
  apply(matrix(L, k), 2, paste, collapse = "")
}
w1 <- rwords(1e5, 10)
w2 <- rwords(1e4, 5)

system.time(charmatch(w2, w2))

system.time(charmatch(w2, w1))


On Thu, Apr 29, 2010 at 4:05 AM, Richard R. Liu <richard....@pueo-owl.ch > wrote:
I have an application that a long list of character strings to determine which occur at the beginning of a given word. A straight forward R script using grep takes a long time to run. Rewriting it to use substr and match might be an option, but I have the impression that preparing the list as a trie and performing trie searches might lead to dramatic improvements in performance.


I have searched the CRAN packages and find no packages that support Compact
Patricia Trees.  Does anybody know of such?


Thanks,
Richard

Richard R. Liu
richard....@pueo-owl.ch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to