If you want to match any letter and any possible substring, you might be
better off breaking every word into single letters, with a special token
between words, i.e.:

the quick brown fox

becomes

t h e ZZ q u i c k ZZ b r o w n ZZ f o x
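As a rough sketch, the transformation could be done as a preprocessing step before text is sent to the index. The helper name `to_letter_tokens`, the whitespace splitting and the lowercasing are assumptions for illustration only; an analyzer could do the same job at index time.

```python
SEPARATOR = "ZZ"  # word-boundary token, as in the example above


def to_letter_tokens(text: str) -> list[str]:
    """Split text into single-letter tokens, with SEPARATOR between words.

    Hypothetical helper: whitespace splitting and lowercasing are
    assumptions, not something Solr does for you out of the box.
    """
    tokens: list[str] = []
    for i, word in enumerate(text.lower().split()):
        if i > 0:
            tokens.append(SEPARATOR)  # mark the boundary between two words
        tokens.extend(word)  # one token per character
    return tokens


print(" ".join(to_letter_tokens("the quick brown fox")))
# t h e ZZ q u i c k ZZ b r o w n ZZ f o x
```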

Then you can do all the single-letter searches directly, and multi-letter
searches turn into phrase searches, i.e.:

uic (from quick)

would be rewritten as

"u i c"

And so on.  Depending on the size and complexity of your data, this should
give you better performance and more predictable results than wildcard
searches.  Relevancy would be horrible, since tf/idf would always share a
common denominator determined by the character set, but there are ways
around that as well.
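The query side can be sketched the same way. `to_phrase_query` and `substring_matches` are hypothetical helpers: the first rewrites a substring into a phrase query string, and the second is a naive in-memory stand-in for what an exact (zero-slop) phrase query does against indexed token positions. It also shows why the `ZZ` separator matters: without it, "eq" would wrongly match across the boundary between "the" and "quick".

```python
# Token stream for "the quick brown fox", as produced by the scheme above.
DOC_TOKENS = "t h e ZZ q u i c k ZZ b r o w n ZZ f o x".split()


def to_phrase_query(substring: str) -> str:
    """Rewrite a substring search as a phrase query string.

    Hypothetical helper: "uic" -> '"u i c"'. Assumes a single word (no spaces).
    """
    return '"' + " ".join(substring.lower()) + '"'


def substring_matches(substring: str, tokens: list[str]) -> bool:
    """Naive stand-in for an exact phrase query over letter tokens:
    true if the letters occur at consecutive positions, in order."""
    phrase = list(substring.lower())
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


print(to_phrase_query("uic"))               # "u i c"
print(substring_matches("uic", DOC_TOKENS))  # True
print(substring_matches("own", DOC_TOKENS))  # True
print(substring_matches("eq", DOC_TOKENS))   # False: "ZZ" sits between "the" and "quick"
```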

- will 

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 30, 2007 7:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Tips for searching

On 30-Nov-07, at 4:43 PM, Dave C. wrote:

>
> Thanks for the quick response Mike...
> Ideally it should match more than just a single character, i.e.  
> "the" in "weather" or "pro" in "profile" or "000" in "18000".
>
> Would these cases be taken care of by the StopFilterFactory?

No... you are looking for a variant of WildcardQuery.  Prefix
wildcards are supported (pro* -> profile), but generalized wildcard
queries aren't enabled by default.  You'll find lots of discussion
of this on the list if you do a search.

-Mike
