Thanks, this is very helpful. The suggester config is quite under-documented, and it took me longer than I expected to get it working.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jul 10, 2015, at 6:30 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> Hi guys,
> I just wrote a blog post that builds on Erick's post and explains in detail,
> with practical examples, all the main Lookup implementations:
>
> http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html
>
> I think this can be useful for Edwin to finally fix the config for the
> FreeTextSuggester (which I finally clarified, Erick, thanks to Mike's answer
> on dev, plus deep code analysis and testing :) )
>
> Cheers
>
> 2015-06-27 23:51 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>
>> Thanks, Erick, I didn't have time to go through the code again.
>> But I will forward this to the dev list.
>> Thank you for your time!
>>
>> Cheers
>>
>> 2015-06-27 16:19 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>>
>>> Alessandro:
>>>
>>> Going to have to defer to Mike McCandless et al., they're the
>>> authorities here. Don't quite know whether they monitor this list,
>>> consider the dev list?
>>>
>>> Best,
>>> Erick
>>>
>>> On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
>>> <benedetti.ale...@gmail.com> wrote:
>>>> Up. Can anyone kindly take a look at my considerations about the
>>>> FreeText Suggester?
>>>> I am curious to have more insight.
>>>> Eventually I will analyse the code in depth to understand my errors.
>>>>
>>>> Cheers
>>>>
>>>> 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>>>>
>>>>> Actually the documentation is not clear enough.
>>>>> Let's try to understand this suggester.
>>>>>
>>>>> *Building*
>>>>> This suggester builds an FST that it uses to provide the autocomplete
>>>>> feature by running prefix searches on it.
>>>>> The terms it uses to generate the FST are the tokens produced by the
>>>>> "suggestFreeTextAnalyzerFieldType".
>>>>>
>>>>> And this should be correct.
>>>>> So, to keep it simple, if we have a shingle token filter [1-3] in our
>>>>> analysis (we produce unigrams as well), from these original field values:
>>>>> "mp3 ipod"
>>>>> "mp3 player"
>>>>> "mp3 player ipod"
>>>>> "player of Real"
>>>>>
>>>>> -> we produce this list of possible suggestions in our FST:
>>>>>
>>>>> <mp3>
>>>>> <player>
>>>>> <ipod>
>>>>> <real>
>>>>> <of>
>>>>>
>>>>> <mp3 ipod>
>>>>> <mp3 player>
>>>>> <player ipod>
>>>>> <player of>
>>>>> <of real>
>>>>>
>>>>> <mp3 player ipod>
>>>>> <player of real>
>>>>>
>>>>> From the documentation I read:
>>>>>
>>>>>> " ngrams: The max number of tokens out of which singles will be make the
>>>>>> dictionary. The default value is 2. Increasing this would mean you want
>>>>>> more than the previous 2 tokens to be taken into consideration when making
>>>>>> the suggestions. "
>>>>>
>>>>> This confuses me, as I was not expecting this param to affect the
>>>>> suggestion dictionary.
>>>>> So I would like a clarification here from our masters :)
>>>>> At this point let's see what happens at query time.
>>>>>
>>>>> *Query Time*
>>>>> As I understand it, with the ngrams param the suggester considers only
>>>>> the last N-1 tokens the user typed, separated by spaces.
>>>>>
>>>>>> "Builds an ngram model from the text sent to {@link
>>>>>> #build} and predicts based on the last grams-1 tokens in
>>>>>> the request sent to {@link #lookup}. This tries to
>>>>>> handle the "long tail" of suggestions for when the
>>>>>> incoming query is a never before seen query string."
>>>>>
>>>>> Example: grams=3 should consider only the last 2 tokens
>>>>>
>>>>> special mp3 p -> mp3 p
>>>>>
>>>>> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
>>>>> We produce 3 tokens:
>>>>> <mp3>
>>>>> <p>
>>>>> <mp3 p>
>>>>>
>>>>> And we run the prefix matching on the FST.
>>>>>
>>>>> *Conclusion*
>>>>> My understanding is surely wrong at some point, as the behaviour I get
>>>>> is different.
>>>>> Can we discuss this, clarify it, and eventually put it in the official
>>>>> documentation?
>>>>>
>>>>> Cheers
>>>>>
>>>>> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>>>>>
>>>>>> I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
>>>>>> the following:
>>>>>>
>>>>>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
>>>>>> "mp3 nano" and "mp3 music".
>>>>>> When the user enters "mp3 p", the suggestion should narrow down to
>>>>>> "mp3 player".
>>>>>>
>>>>>> Currently, when I type "mp3 p", the suggester returns only words that
>>>>>> start with the letter "p", and I'm getting results like "plan",
>>>>>> "production", etc.; it does not take the "mp3" token into
>>>>>> consideration.
>>>>>>
>>>>>> I'm using Solr 5.1 and below is my configuration:
>>>>>>
>>>>>> In solrconfig.xml:
>>>>>>
>>>>>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>>>>>   <lst name="suggester">
>>>>>>
>>>>>>     <str name="lookupImpl">FreeTextLookupFactory</str>
>>>>>>     <str name="indexPath">suggester_freetext_dir</str>
>>>>>>
>>>>>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>>>>>     <str name="field">Suggestion</str>
>>>>>>     <str name="weightField">Project</str>
>>>>>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>>>>>>     <int name="ngrams">5</int>
>>>>>>     <str name="buildOnStartup">false</str>
>>>>>>     <str name="buildOnCommit">false</str>
>>>>>>   </lst>
>>>>>> </searchComponent>
>>>>>>
>>>>>>
>>>>>> In schema.xml:
>>>>>>
>>>>>> <fieldType name="suggestType" class="solr.TextField"
>>>>>>            positionIncrementGap="100">
>>>>>>   <analyzer type="index">
>>>>>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>>>>>>             maxShingleSize="6" outputUnigrams="false"/>
>>>>>>   </analyzer>
>>>>>>   <analyzer type="query">
>>>>>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>>>>>>             maxShingleSize="6" outputUnigrams="true"/>
>>>>>>   </analyzer>
>>>>>> </fieldType>
>>>>>>
>>>>>>
>>>>>> Is there anything that I configured wrongly?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Edwin
>>>>>
>>>>> --
>>>>> --------------------------
>>>>>
>>>>> Benedetti Alessandro
>>>>> Visiting card : http://about.me/alessandro_benedetti
>>>>>
>>>>> "Tyger, tyger burning bright
>>>>> In the forests of the night,
>>>>> What immortal hand or eye
>>>>> Could frame thy fearful symmetry?"
>>>>>
>>>>> William Blake - Songs of Experience -1794 England
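
For anyone following the quoted walkthrough, here is a small Python sketch of the two steps it describes: expanding field values into [1-3] shingles, and keeping only the last grams-1 tokens of the query. It is a toy model of the reasoning in the quoted message only, not Lucene's FreeTextSuggester (the message itself says that understanding is wrong at some point). The function names `shingles`, `build_dictionary` and `last_tokens` are made up for illustration, and lowercasing plus whitespace splitting are assumptions standing in for the real analyzer.

```python
def shingles(text, min_size=1, max_size=3):
    """Return all word n-grams of min_size..max_size tokens, shortest first.

    Assumption: tokens are lowercased and split on whitespace, which only
    approximates a real Solr analysis chain.
    """
    tokens = text.lower().split()
    out = []
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def build_dictionary(values, max_size=3):
    """Collect the distinct shingles (unigrams included) over all field values."""
    terms = set()
    for value in values:
        terms.update(shingles(value, 1, max_size))
    return terms


def last_tokens(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    tokens = query.lower().split()
    keep = grams - 1
    # tokens[-0:] would return the whole list, so handle keep == 0 explicitly.
    return " ".join(tokens[-keep:]) if keep > 0 else ""


values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
dictionary = build_dictionary(values)
print(sorted(dictionary, key=lambda t: (t.count(" "), t)))

truncated = last_tokens("special mp3 p", grams=3)
print(truncated)            # mp3 p
print(shingles(truncated))  # ['mp3', 'p', 'mp3 p']
```

Under these assumptions the dictionary holds the same twelve entries listed in the quoted message, and the query "special mp3 p" with grams=3 reduces to the three tokens given there.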