You might also look at http://issues.apache.org/jira/browse/SOLR-1316
On Feb 24, 2010, at 1:17 AM, Sachin wrote:

> Hi All,
>
> I am trying to set up autosuggest using Solr 1.4 for my site and need some
> pointers. Basically, we provide autosuggest for the characters a user types
> into the search box. The autosuggest index is built from earlier user-typed
> search queries that returned more than 0 results. We do some lazy writing
> to store this information in the db and then export it to Solr on a nightly
> basis. As far as I know, there are 3 ways (apart from wildcard search) of
> achieving autosuggest using Solr 1.4:
>
> 1. Use EdgeNGrams.
> 2. Use shingles and a prefix query.
> 3. Use the new Terms component.
>
> For now I am more inclined towards using EdgeNGrams (no method to the
> madness) and just wanted to know whether any of the 3 approaches is
> recommended in terms of performance, since the user expects the suggestions
> to be almost instantaneous. We do some heavy caching at our end to avoid
> hitting Solr every time, but is any of these 3 approaches faster than the
> others?
>
> I would also like to return a suggestion even if the user-typed query
> matches in between: for instance, if I have the query "chicken pasta" in
> my index and the user types in "pasta", I would like this query to be
> returned as part of the suggestions (a la Yahoo!).
>
> Below is my field definition:
>
> <fieldType name="suggest" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="50"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I tried replacing the KeywordTokenizerFactory with LetterTokenizerFactory,
> and though it works great for the above scenario (it does an in-between
> match), it has the side effect of removing everything that is not a letter,
> so if the user types in "123" he gets absolutely no suggestions. Is there
> anything I'm missing in my configuration? Is this even achievable using
> EdgeNGrams, or should I look at using the TermsComponent after applying the
> regex patch from 1.5 and do something like ".*user-typed-in-chars.*"?
>
> Thanks!
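
To make the tokenizer trade-off in the quoted question concrete, here is a rough Python sketch that imitates the index-time chain (tokenize, lowercase, front-edge n-grams with minGramSize=2 and maxGramSize=50) and the query-time chain (whole input as one lowercased term). It is only a simulation, not Solr code: the function names are made up for illustration, the letter tokenizer is approximated with a regex, the whitespace variant is my own assumption about what a tokenizer such as solr.WhitespaceTokenizerFactory would do and has not been tested against this schema, and "123 main street" is a hypothetical indexed query.

```python
import re

def edge_ngrams(token, min_size=2, max_size=50):
    """Front-edge n-grams of one token, e.g. 'pasta' -> 'pa', 'pas', 'past', 'pasta'."""
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]

def keyword_tokens(text):
    """Imitates a keyword tokenizer: the whole input is a single token."""
    return [text]

def letter_tokens(text):
    """Approximates a letter tokenizer: runs of letters only, so digits vanish."""
    return re.findall(r"[^\W\d_]+", text)

def whitespace_tokens(text):
    """Assumed alternative: split on whitespace, keeping digits and punctuation."""
    return text.split()

def index_terms(text, tokenize):
    """Index-time chain: tokenize, lowercase, then expand each token to edge n-grams."""
    terms = set()
    for token in tokenize(text):
        terms.update(edge_ngrams(token.lower()))
    return terms

def matches(typed, indexed_text, tokenize):
    """Query-time chain: the typed input is one lowercased term looked up as-is."""
    return typed.lower() in index_terms(indexed_text, tokenize)

if __name__ == "__main__":
    for name, tok in [("keyword", keyword_tokens),
                      ("letter", letter_tokens),
                      ("whitespace", whitespace_tokens)]:
        print(name,
              matches("pasta", "chicken pasta", tok),
              matches("123", "123 main street", tok),
              matches("chicken pa", "chicken pasta", tok))
```

In this simulation the keyword chain only matches from the start of the whole query, the letter chain matches "pasta" but loses "123", and the whitespace chain matches both. The catch is that with a keyword tokenizer on the query side, a multi-word input such as "chicken pa" no longer matches any single indexed gram once the index side splits into words, so the query analyzer would probably need to change as well.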