At 12:26 PM +0100 2/10/00, [EMAIL PROTECTED] wrote:
>by the code. But if it's activated, a list of unique words is maintained
>in the index. I use this a lot in a context other than htdig so I'm really
>sure it works well. But it takes a bit more space, of course.
>  The 'substring' search could browse this list instead of the complete index
>and that would give a list of candidates much more quickly.

This would speed up things for now, but the algorithm I'm thinking of 
would speed things up considerably more than this. Basically, you 
make a list of all the trigrams (or n-grams) in your query, then you 
use a pre-computed trigram database (like the metaphone and soundex 
ones) to narrow down the search space to only the words that have all 
the trigrams in your query. A word can contain every trigram without 
containing the query itself, so you'll still need to run the substring 
check on these candidates, but you'll be doing it over a small subset 
of the word database.

e.g.

query: metaphone -> met eta tap aph pho hon one

There are obviously few words with all of these trigrams, so the 
candidate list is short and the final substring check is cheap.
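
To make the idea concrete, here is a rough sketch in Python rather than anything taken from the htdig code. The names trigrams, build_index and substring_search are made up for illustration, and the "database" is just an in-memory dict standing in for a pre-computed on-disk one. Queries shorter than three characters have no trigrams, so this sketch simply falls back to scanning the whole word list.

```python
from collections import defaultdict

def trigrams(word):
    """Return the set of overlapping 3-character substrings of word."""
    return {word[i:i + 3] for i in range(len(word) - 2)}

def build_index(words):
    """Map each trigram to the set of words containing it.

    Stand-in for the pre-computed trigram database (hypothetical layout).
    """
    index = defaultdict(set)
    for w in words:
        for g in trigrams(w):
            index[g].add(w)
    return index

def substring_search(query, index, words):
    """Return the sorted words that contain query as a substring."""
    grams = trigrams(query)
    if not grams:
        # Query shorter than 3 characters: nothing to filter on, scan everything.
        candidates = set(words)
    else:
        # Keep only words that have every trigram of the query.
        postings = sorted((index.get(g, set()) for g in grams), key=len)
        candidates = set.intersection(*postings)
    # Candidates may still be mismatches, so verify the actual substring.
    return sorted(w for w in candidates if query in w)

words = ["metaphone", "metaphor", "telephone", "phone", "soundex", "cabca"]
index = build_index(words)
print(substring_search("phone", index, words))
```

Intersecting the smallest trigram lists first keeps the intermediate sets small. The last step still matters: "cabca" has every trigram of "abcab" but does not contain it.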

-Geoff


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
