On Feb 24, 2010, at 1:17 AM, Sachin wrote:

> Hi All,
> 
> I am trying to set up autosuggest using Solr 1.4 for my site and need some 
> pointers. Basically, we provide autosuggest for the characters a user types 
> into the search box. The autosuggest index is built from past user search 
> queries that returned > 0 results. We do some lazy writing to store this 
> information in the db and then export it to Solr on a nightly basis. As far 
> as I know, there are 3 ways (apart from wildcard search) of achieving 
> autosuggest using Solr 1.4:
> 
> 1. Use EdgeNGrams
> 2. Use shingles and prefix query.
> 3. Use the new Terms component.

Another option you did not list is the approach I recommend in my book 
(p. 156): faceting with facet.prefix.  There's a poor example of it on the 
wiki: 
http://wiki.apache.org/solr/SimpleFacetParameters#Facet_prefix_.28term_suggest.29
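
To make the facet-prefix idea concrete, here is a minimal sketch of how such a request could be assembled. It is only an illustration under assumptions: the base URL and the field name "suggest_text" are hypothetical, the field is assumed to be tokenized and lowercased at index time, and no query escaping is done. The parameters themselves (facet, facet.field, facet.prefix, facet.limit, facet.mincount, rows) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode

def facet_prefix_url(base_url, field, typed, limit=10):
    """Build a select URL that asks Solr for facet values starting with
    the right-most word the user has typed so far."""
    words = typed.lower().split()
    # Words before the right-most one narrow the result set (the context);
    # the right-most word is used only as the facet prefix.
    context = words[:-1]
    prefix = words[-1] if words else ""
    params = {
        "q": " AND ".join(f"{field}:{w}" for w in context) or "*:*",
        "rows": 0,              # only facet counts are needed, not documents
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.limit": limit,
        "facet.mincount": 1,    # skip terms that do not occur in the context
    }
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host and field name, for illustration only.
print(facet_prefix_url("http://localhost:8983/solr", "suggest_text", "chicken pa"))
```

Because the earlier words go into q, the suggested completions are limited to terms that co-occur with them, which is what lets this approach take the context of the query into account.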

> For now I am more inclined towards using the EdgeNGrams (no method to 
> madness) and just wanted to know whether there is a recommended approach out 
> of the 3 in terms of performance, since the user expects the suggestions to 
> be almost instantaneous. We do some heavy caching at our end to avoid hitting 
> Solr every time, but is any of these 3 approaches faster than the others?

The Terms component should be the fastest since it has the most direct access 
to the underlying data.  But I don't understand why people use it for 
auto-suggest, because it ignores the context of the query, that is, the words 
before the right-most term.  However, if you combine KeywordTokenizer and 
EdgeNGram with Terms, that addresses the problem somewhat.  You apparently are 
not interested in matching the case where someone once queried "a b c" and a 
user now types "b c".  Personally, that would bug me.  I like the faceting 
approach, but admittedly I have not used it at scale.
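
As a rough illustration of the "a b c" versus "b c" point, the sketch below mimics in plain Python what a KeywordTokenizer followed by a lowercase filter and an edge n-gram filter would index. It is a simplified model, not Solr code; the function names are made up for this example, and the gram sizes are the ones from the field definition quoted further down.

```python
def edge_ngrams(token, min_size=2, max_size=50):
    """Return the leading substrings of token, from min_size up to max_size
    characters (or the token length, whichever is smaller)."""
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]

def index_keyword(query):
    """Model of KeywordTokenizer + lowercase + edge n-grams: the whole
    query is a single token, so every gram starts at its first character."""
    return set(edge_ngrams(query.lower()))

indexed = index_keyword("a b c")
print("a b" in indexed)   # True: a prefix of the stored query
print("b c" in indexed)   # False: starts in the middle of the stored query
```

Since every indexed gram begins at the first character of the stored query, only input that starts the same way as the stored query can match.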

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/


> I would also like to return the suggestion even if what the user typed 
> matches in the middle of a query: for instance, if I have the query "chicken 
> pasta" in my index and the user types in "pasta", I would like this query to 
> be returned as part of the suggestions (à la Yahoo!). Below is my field 
> definition:
> 
>        <fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
>            <analyzer type="index">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" />
>            </analyzer>
>            <analyzer type="query">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>        </fieldType>
> 
> 
> I tried replacing the KeywordTokenizerFactory with LetterTokenizerFactory, 
> and though it works great for the above scenario (it does an in-between 
> match), it has the side effect of removing everything that is not a letter, 
> so if the user types in "123" he gets absolutely no suggestions. Is there 
> anything that I'm missing in my configuration? Is this even achievable using 
> EdgeNGrams, or should I look at using the TermsComponent after applying the 
> regex patch from 1.5 and do something like ".*user-typed-in-chars.*"?
> 
> Thanks!
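
The tokenizer trade-off described above can be sketched in a few lines of Python. This is a simplified model rather than Solr itself, and splitting on whitespace is an assumption introduced here for comparison, not something proposed in the thread: it yields one set of edge n-grams per word, so "pasta" matches inside "chicken pasta" while digits such as "123" are kept.

```python
import re

def edge_ngrams(token, min_size=2, max_size=50):
    """Leading substrings of token with lengths min_size..max_size."""
    return [token[:n] for n in range(min_size, min(len(token), max_size) + 1)]

def letter_tokens(text):
    """Roughly what a letter-only tokenizer does: keep runs of letters,
    so digits and punctuation disappear."""
    return re.findall(r"[^\W\d_]+", text)

def whitespace_tokens(text):
    """Split on whitespace only, so tokens such as '123' survive."""
    return text.split()

def indexed_grams(text, tokenize):
    """Lowercase, tokenize, then collect the edge n-grams of every token."""
    return {g for tok in tokenize(text.lower()) for g in edge_ngrams(tok)}

stored = "Chicken Pasta 123"
print("pasta" in indexed_grams(stored, letter_tokens))      # True
print("123" in indexed_grams(stored, letter_tokens))        # False
print("pasta" in indexed_grams(stored, whitespace_tokens))  # True
print("123" in indexed_grams(stored, whitespace_tokens))    # True
```

One caveat of per-word grams: multi-word input such as "chicken pa" would have to be tokenized the same way at query time, with each word matched separately.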
> 
> 
> 


