Hi All,

I am trying to set up autosuggest using Solr 1.4 for my site and need some 
pointers. Basically, we provide autosuggest for the characters a user types 
into the search box. The autosuggest index is built from earlier user-typed 
search queries that returned > 0 results. We do some lazy writing to 
store this information in the DB and then export it to Solr on a nightly 
basis. As far as I know, there are 3 ways (apart from wildcard search) of 
achieving autosuggest using Solr 1.4:

1. Use EdgeNGrams.
2. Use shingles and prefix query.
3. Use the new Terms component.

For now I am more inclined towards using EdgeNGrams (no method to the madness) 
and just wanted to know whether any of the 3 approaches is recommended in terms 
of performance, since the user expects the suggestions to be almost 
instantaneous. We do some heavy caching at our end to avoid hitting Solr 
every time, but is any of these 3 approaches faster than the others?

I would also like to return a suggestion when the user's input matches in the 
middle of a stored query: for instance, if I have the query "chicken pasta" in 
my index and the user types in "pasta", I would like this query to be 
returned as part of the suggestions (à la Yahoo!). Below is my field definition:

        <fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>
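
To make the behaviour of this field type concrete, here is a rough Python sketch (not Solr code) that mimics what the index analyzer above would emit: the keyword tokenizer keeps the whole query as one token, which is lowercased and then expanded into front-edge n-grams of 2 to 50 characters. The helper names edge_ngrams and index_terms_keyword are made up for illustration, and the sketch assumes the filter's default front-side grams.

```python
def edge_ngrams(token, min_size=2, max_size=50):
    """Front-edge n-grams of a single token, shortest first."""
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]


def index_terms_keyword(query):
    """Mimic keyword tokenizer + lowercase + edge n-grams.

    The whole query is treated as ONE token, so every gram
    starts at the first character of the query.
    """
    return set(edge_ngrams(query.lower()))


terms = index_terms_keyword("Chicken Pasta")

# A prefix of the whole query is an indexed term ...
print("chick" in terms)
# ... but a word from the middle of the query is not.
print("pasta" in terms)
```

Under these assumptions the lowercased user input "pasta" is compared as a single term against grams that all begin with "ch", which is consistent with the in-between match not being returned.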


I tried replacing KeywordTokenizerFactory with LetterTokenizerFactory, and 
though it works great for the above scenario (it does an in-between match), it 
has the side effect of removing everything that is not a letter, so if the user 
types in "123" he gets absolutely no suggestions. Is there anything that I'm 
missing in my configuration? Is this even achievable using EdgeNGrams, or 
should I look at using the TermsComponent after applying the regex patch 
from 1.5 and do something like ".*user-typed-in-chars.*"?
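
The side effect described above can be reproduced outside Solr with a small Python sketch. It is only an approximation of the analysis chain: letter_tokens imitates a letter-only tokenizer, while whitespace_tokens is an assumption added purely for comparison (roughly what a whitespace-based tokenizer such as solr.WhitespaceTokenizerFactory would do) and is not something tried in the configuration above. All function names are hypothetical.

```python
import re


def edge_ngrams(token, min_size=2, max_size=50):
    """Front-edge n-grams of a single token, shortest first."""
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]


def letter_tokens(text):
    """Letter-only tokenizer: runs of letters, everything else dropped."""
    return re.findall(r"[^\W\d_]+", text.lower())


def whitespace_tokens(text):
    """Comparison case (assumption): split on whitespace, keep digits."""
    return text.lower().split()


def index_terms(text, tokenize):
    """Edge n-grams of every token produced by the given tokenizer."""
    terms = set()
    for token in tokenize(text):
        terms.update(edge_ngrams(token))
    return terms


stored = "chicken pasta 123"
by_letters = index_terms(stored, letter_tokens)
by_whitespace = index_terms(stored, whitespace_tokens)

# Per-word grams allow the in-between match in both cases.
print("pasta" in by_letters, "pasta" in by_whitespace)
# Digits survive only when the tokenizer does not discard them.
print("123" in by_letters, "123" in by_whitespace)
# A purely numeric user input yields no tokens with the letter tokenizer.
print(letter_tokens("123"))
```

This only models term generation; it says nothing about the relative speed of the approaches or about how Solr scores the matches.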

Thanks!


 
