I've been indexing and reindexing things here with shingles. I don't believe it's the best approach: the results are interesting, but I don't think this is how the Suggester is meant to be used.
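For anyone unfamiliar with shingles: a shingle filter emits word n-grams next to the single tokens, so one field value turns into several indexed terms. Below is a rough Python sketch of that idea, assuming a maximum shingle size of 4 with unigrams kept. It is only an illustration, not Lucene's ShingleFilter; the `shingles` function and the sample phrase are made up for the example.

```python
def shingles(tokens, max_size=4, output_unigrams=True):
    """Return word n-grams of 2..max_size tokens, plus the single tokens if requested.

    Hypothetical helper for illustration only; it mimics the idea of a
    shingle filter, not the exact token stream Lucene produces.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(2, max_size + 1):
            # Only emit a shingle if enough tokens remain from this position.
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


# Lowercase and split on whitespace as a stand-in for tokenizer + lowercase filter.
terms = shingles("Drivers Nvidia GeForce".lower().split())
print(terms)
```

Every one of those terms becomes a candidate for the suggester, which would explain why a short prefix such as "dri" can bring back both single words and multi-word phrases.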
I tried

  <fieldType name="textSuggestion" class="solr.TextField" positionIncrementGap="10" stored="false" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

but I got compound words in the suggestions themselves. If I query it like

  http://localhost:8983/solr/{mycore}/suggest/?q=dri

I get

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="dri">
          <int name="numFound">6</int>
          <int name="startOffset">0</int>
          <int name="endOffset">3</int>
          <arr name="suggestion">
            <str>drivers</str>
            <str>drivers nvidia</str>
            <str>drivers intel</str>
            <str>drivers nvidia geforce</str>
            <str>drive</str>
            <str>driver</str>
          </arr>
        </lst>
        <str name="collation">drivers</str>
      </lst>
    </lst>
  </response>

but when I enter the second word,

  http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n

it scrambles everything: the query is split into "drivers" and "n", and each token gets its own suggestion list.

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="drivers">
          <int name="numFound">4</int>
          <int name="startOffset">0</int>
          <int name="endOffset">7</int>
          <arr name="suggestion">
            <str>drivers</str>
            <str>drivers nvidia</str>
            <str>drivers intel</str>
            <str>drivers nvidia geforce</str>
          </arr>
        </lst>
        <lst name="n">
          <int name="numFound">10</int>
          <int name="startOffset">8</int>
          <int name="endOffset">9</int>
          <arr name="suggestion">
            <str>nvidia</str>
            <str>net</str>
            <str>nvidia geforce</str>
            <str>network</str>
            <str>new</str>
            <str>n</str>
            <str>ninja</str>
          </arr>
        </lst>
        <str name="collation">drivers nvidia</str>
      </lst>
    </lst>
  </response>

Although the collation seems fine for this query, it's not exactly what the suggester is supposed to do. Any thoughts?

2011/8/17 Alexei Martchenko <ale...@superdownloads.com.br>

> I have the very very very same problem. I could copy+paste your message as
> mine. I've discovered so far that bigger dictionaries work better for me,
> and that controlling the threshold is much better than not indexing one or
> two fields. Of course I'm still polishing this.
>
> At this very moment I was looking into Shingles, are you using them?
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
> How are your fields defined?
>
> 2011/8/17 Kuba Krzemień <krzemien.k...@gmail.com>
>
> >> Hello, I am working on creating an auto-complete functionality for my
> >> platform, which indexes large amounts of text (title + contents) - there
> >> is too much data for a dictionary. I am using the latest version of Solr
> >> (3.3) and I am trying to take advantage of the Suggester functionality.
> >> Unfortunately so far the outcome isn't that great.
> >>
> >> The Suggester works only for single words or whole phrases (depending on
> >> the tokenizer). When using the first option, I am unable to suggest any
> >> combined queries. For example the suggestion for 'ne' will be 'new'. The
> >> suggestion for 'new y' will be two separate lists, one for 'new' and one
> >> for 'y'. What's worse, querying 'new AND y' gives the same results (also
> >> when using collate), which means that the returned suggestion may give no
> >> results - what makes sense separately often doesn't work combined. I need
> >> a way to find only those suggestions that will return results when doing
> >> an AND query (for example 'new AND york', 'new AND year', as long as they
> >> give results upon querying - 'new AND yeti' shouldn't be returned as a
> >> suggestion).
> >>
> >> When I use the second tokenizer and the suggestions return phrases, for
> >> 'ne' I will get 'new york' and 'new year', but for 'new y' I will get
> >> nothing. Also, for 'y' I will get nothing, so the issue remains.
> >>
> >> If someone has some experience working with the Suggester, or if someone
> >> has created a well-working auto-suggester based on Solr, please help me.
> >> I've been trying to find a solution for this for quite some time.
> >>
> >> Yours sincerely,
> >> Jackob K

--
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533