You might also look at http://issues.apache.org/jira/browse/SOLR-1316
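
For the in-between match asked about below, one option that may be worth trying is to tokenize on whitespace instead of on letters, so that digits such as "123" survive, and to build the edge n-grams per word. This is only an untested sketch: the field type name "suggest_infix" is made up for illustration, while the tokenizer and filter factories are the stock ones that ship with Solr 1.4.

```xml
<!-- Hypothetical variant of the "suggest" type quoted below; the name is illustrative. -->
<fieldType name="suggest_infix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace only, so digits and punctuation stay in the tokens. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Prefixes of each word: "pasta" yields "pa", "pas", "past", "pasta". -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the typed text is matched against the indexed prefixes. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this, typing "pas" or "123" should match any stored query containing a word that starts with those characters. A multi-word input such as "chicken pa" is split into two terms, so how the terms combine depends on your default operator, and word order is not enforced. As with your current definition, a single typed character matches nothing while minGramSize is 2.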

On Feb 24, 2010, at 1:17 AM, Sachin wrote:

> 
> 
> Hi All,
> 
> I am trying to set up autosuggest using Solr 1.4 for my site and need some 
> pointers. We provide autosuggest for the characters a user types into the 
> search box. The autosuggest index is built from earlier user search queries 
> that returned more than 0 results. We lazily write this information to the 
> db and export it to Solr on a nightly basis. As far as I know, there are 3 
> ways (apart from wildcard search) of achieving autosuggest using Solr 1.4:
> 
> 1. Use EdgeNGrams
> 2. Use shingles and prefix query.
> 3. Use the new Terms component.
> 
> For now I am more inclined towards using EdgeNGrams (for no particular 
> reason) and wanted to know whether any of the 3 approaches is recommended in 
> terms of performance, since the user expects the suggestions to be almost 
> instantaneous. We do some heavy caching at our end to avoid hitting Solr 
> every time, but is any of these 3 approaches faster than the others?
> 
> I would also like to return a suggestion when the user's input matches in 
> the middle of a stored query: for instance, if I have the query "chicken 
> pasta" in my index and the user types "pasta", I would like this query to be 
> returned as a suggestion (a la Yahoo!). Below is my field definition:
> 
>        <fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
>            <analyzer type="index">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50" />
>            </analyzer>
>            <analyzer type="query">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>        </fieldType>
> 
> 
> I tried replacing KeywordTokenizerFactory with LetterTokenizerFactory, and 
> though it works great for the above scenario (it does an in-between match), 
> it has the side effect of discarding every character that is not a letter, so 
> if the user types "123" he gets no suggestions at all. Is there anything I'm 
> missing in my configuration? Is this even achievable with EdgeNGrams, or 
> should I look at using the TermsComponent after applying the regex patch 
> from 1.5 and do something like ".*user-typed-in-chars.*"?
> 
> Thanks!
> 
> 
> 

