Thanks, Erick. I didn't have time to go through the code again, but I will forward this to the dev list. Thank you for your time!
Cheers

2015-06-27 16:19 GMT+01:00 Erick Erickson <[email protected]>:
> Alessandro:
>
> Going to have to defer to Mike McCandless et al., they're the
> authorities here. I don't quite know whether they monitor this list;
> consider the dev list?
>
> Best,
> Erick
>
> On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
> <[email protected]> wrote:
> > Up! Could anyone kindly take a look at my considerations regarding the
> > FreeText Suggester?
> > I am curious to get more insight.
> > Eventually I will analyse the code in depth to understand my errors.
> >
> > Cheers
> >
> > 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <[email protected]>:
> >
> >> Actually, the documentation is not clear enough.
> >> Let's try to understand this suggester.
> >>
> >> *Building*
> >> This suggester builds an FST that it uses to provide the autocomplete
> >> feature by running prefix searches on it.
> >> The terms it uses to generate the FST are the tokens produced by the
> >> "suggestFreeTextAnalyzerFieldType".
> >>
> >> And this should be correct.
> >> So, to keep it simple, suppose our analysis has a shingle token filter
> >> [1-3] (so we produce unigrams as well). From these original field values:
> >> "mp3 ipod"
> >> "mp3 player"
> >> "mp3 player ipod"
> >> "player of Real"
> >>
> >> -> we produce this list of possible suggestions in our FST:
> >>
> >> <mp3>
> >> <player>
> >> <ipod>
> >> <real>
> >> <of>
> >>
> >> <mp3 ipod>
> >> <mp3 player>
> >> <player ipod>
> >> <player of>
> >> <of real>
> >>
> >> <mp3 player ipod>
> >> <player of real>
> >>
> >> From the documentation I read:
> >>
> >>> "ngrams: The max number of tokens out of which singles will be make the
> >>> dictionary. The default value is 2. Increasing this would mean you want
> >>> more than the previous 2 tokens to be taken into consideration when making
> >>> the suggestions."
> >>
> >>
> >> This confuses me, as I was not expecting this param to affect the
> >> suggestion dictionary.
> >> So I would like a clarification here from our masters :)
> >> At this point, let's see what happens at query time.
> >>
> >> *Query Time*
> >> As I understand it, the ngrams param makes the suggester consider the last
> >> N-1 tokens the user typed, split on the space separator.
> >>
> >>> "Builds an ngram model from the text sent to {@link
> >>> * #build} and predicts based on the last grams-1 tokens in
> >>> * the request sent to {@link #lookup}. This tries to
> >>> * handle the "long tail" of suggestions for when the
> >>> * incoming query is a never before seen query string."
> >>
> >>
> >> For example, grams=3 should consider only the last 2 tokens:
> >>
> >> special mp3 p -> mp3 p
> >>
> >> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> >> We produce 3 tokens:
> >> <mp3>
> >> <p>
> >> <mp3 p>
> >>
> >> And we run the prefix matching on the FST.
> >>
> >> *Conclusion*
> >> My understanding is certainly wrong at some point, as the behaviour I get
> >> is different.
> >> Can we discuss this, clarify it and eventually put it in the official
> >> documentation?
> >>
> >> Cheers
> >>
> >> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <[email protected]>:
> >>
> >>> I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
> >>> the following:
> >>>
> >>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
> >>> "mp3 nano" and "mp3 music".
> >>> When the user enters "mp3 p", the suggestions should narrow down to
> >>> "mp3 player".
> >>>
> >>> Currently, when I type "mp3 p", the suggester returns only words that
> >>> start with the letter "p", so I'm getting results like "plan",
> >>> "production", etc., and it does not take the "mp3" token into
> >>> consideration.
> >>>
> >>> I'm using Solr 5.1, and below is my configuration:
> >>>
> >>> In solrconfig.xml:
> >>>
> >>> <searchComponent name="suggest" class="solr.SuggestComponent">
> >>>   <lst name="suggester">
> >>>
> >>>     <str name="lookupImpl">FreeTextLookupFactory</str>
> >>>     <str name="indexPath">suggester_freetext_dir</str>
> >>>
> >>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >>>     <str name="field">Suggestion</str>
> >>>     <str name="weightField">Project</str>
> >>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
> >>>     <int name="ngrams">5</int>
> >>>     <str name="buildOnStartup">false</str>
> >>>     <str name="buildOnCommit">false</str>
> >>>   </lst>
> >>> </searchComponent>
> >>>
> >>>
> >>> In schema.xml:
> >>>
> >>> <fieldType name="suggestType" class="solr.TextField"
> >>>            positionIncrementGap="100">
> >>>   <analyzer type="index">
> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> >>>             maxShingleSize="6" outputUnigrams="false"/>
> >>>   </analyzer>
> >>>   <analyzer type="query">
> >>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
> >>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> >>>             maxShingleSize="6" outputUnigrams="true"/>
> >>>   </analyzer>
> >>> </fieldType>
> >>>
> >>>
> >>> Is there anything that I configured wrongly?
> >>>
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>
> >>
> >>
> >> --
> >> --------------------------
> >>
> >> Benedetti Alessandro
> >> Visiting card : http://about.me/alessandro_benedetti
> >>
> >> "Tyger, tyger burning bright
> >> In the forests of the night,
> >> What immortal hand or eye
> >> Could frame thy fearful symmetry?"
> >>
> >> William Blake - Songs of Experience -1794 England
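Here is a minimal plain-Java sketch of the build-time expansion described in the 2015-06-19 message. It assumes whitespace tokenization, lowercasing (so "Real" becomes "real"), and shingles of 1 to 3 tokens, and it reproduces the 12 entries listed there. It only illustrates that reading. It is not Lucene's ShingleFilter or FST code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Lucene code: emulates whitespace tokenization,
// lowercasing and a [1-3] shingle filter, and collects the distinct shingles
// that the quoted message expects to end up in the suggester's FST.
public class ShingleDictionarySketch {

    // Every run of minSize..maxSize consecutive tokens, joined by a space.
    public static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    // Distinct suggestion entries for all field values (insertion order kept).
    public static Set<String> buildDictionary(List<String> fieldValues, int maxGram) {
        Set<String> dictionary = new LinkedHashSet<>();
        for (String value : fieldValues) {
            // Assumption: lowercasing is part of the analysis ("Real" -> "real").
            String[] tokens = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
            dictionary.addAll(shingles(Arrays.asList(tokens), 1, maxGram));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real");
        Set<String> dictionary = buildDictionary(values, 3);
        for (String entry : dictionary) {
            System.out.println("<" + entry + ">");
        }
        System.out.println(dictionary.size() + " entries"); // 12 for these values
    }
}
```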
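The query-time reading from the same message can be sketched the same way:

1. Keep the last grams-1 tokens.
2. Expand them into shingles.
3. Prefix-match each shingle against the 12 entries listed in the 2015-06-19 message.

This models the hypothesis exactly as stated in the thread, which its author says does not match the behaviour he observes. It is not FreeTextSuggester's actual lookup, and `QueryTimeSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the query-time reading proposed in the thread,
// not Lucene's FreeTextSuggester: keep the last (grams - 1) tokens, expand
// them into shingles, then prefix-match each shingle against the dictionary.
public class QueryTimeSketch {

    // The 12 entries listed in the quoted 2015-06-19 message.
    public static final List<String> DICTIONARY = Arrays.asList(
            "mp3", "player", "ipod", "real", "of",
            "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
            "mp3 player ipod", "player of real");

    // Keeps only the last (grams - 1) whitespace-separated tokens (grams >= 1).
    public static List<String> lastTokens(String query, int grams) {
        List<String> tokens = Arrays.asList(query.trim().split("\\s+"));
        int keep = Math.min(grams - 1, tokens.size());
        return tokens.subList(tokens.size() - keep, tokens.size());
    }

    // Every shingle of the kept tokens: for [mp3, p] -> mp3, "mp3 p", p.
    public static List<String> analyse(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int end = start + 1; end <= tokens.size(); end++) {
                out.add(String.join(" ", tokens.subList(start, end)));
            }
        }
        return out;
    }

    // Dictionary entries that start with any of the analysed query tokens.
    public static Set<String> prefixMatches(List<String> queryTokens, List<String> dictionary) {
        Set<String> matches = new LinkedHashSet<>();
        for (String token : queryTokens) {
            for (String entry : dictionary) {
                if (entry.startsWith(token)) {
                    matches.add(entry);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> kept = lastTokens("special mp3 p", 3);
        List<String> analysed = analyse(kept);
        System.out.println("kept: " + kept);         // [mp3, p]
        System.out.println("tokens: " + analysed);   // [mp3, mp3 p, p]
        System.out.println("matches: " + prefixMatches(analysed, DICTIONARY));
    }
}
```

Under this literal reading, "special mp3 p" with grams=3 yields eight candidates. These include the "player ..." entries, which match only because of the bare "p" token.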
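Edwin's two analyzers in "suggestType" differ only in `outputUnigrams`. A plain-Java emulation of each chain makes that asymmetry visible. Each chain applies the pattern char filter, then whitespace tokenization, then shingles of size 2 to 6. With `outputUnigrams="false"` (index side), a value like "mp3 player ipod" yields only multi-token entries. With `outputUnigrams="true"` (query side), "mp3 p" yields the single tokens as well.

This is a hypothetical sketch (the class name is made up), not Lucene's ShingleFilter. It deliberately leaves out whatever extra processing FreeTextLookupFactory applies on top of the field type, which is exactly what the thread is trying to pin down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emulation of the "suggestType" analyzer chains quoted above:
// PatternReplaceCharFilter([^a-zA-Z0-9] -> " "), WhitespaceTokenizer, then
// ShingleFilter(minShingleSize=2, maxShingleSize=6, outputUnigrams=...).
// Plain Java, not Lucene; token positions and offsets are ignored.
public class SuggestTypeChainsSketch {

    static List<String> analyse(String text, int minShingle, int maxShingle, boolean outputUnigrams) {
        // Char filter: every non-alphanumeric character becomes a space.
        String filtered = text.replaceAll("[^a-zA-Z0-9]", " ");
        // Whitespace tokenizer; no lowercasing, since the field type has no LowerCaseFilter.
        List<String> tokens = new ArrayList<>();
        for (String t : filtered.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        // Shingles: optional unigram, then runs of minShingle..maxShingle tokens.
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            for (int size = minShingle; size <= maxShingle && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    // analyzer type="index": outputUnigrams="false"
    public static List<String> indexChain(String text) {
        return analyse(text, 2, 6, false);
    }

    // analyzer type="query": outputUnigrams="true"
    public static List<String> queryChain(String text) {
        return analyse(text, 2, 6, true);
    }

    public static void main(String[] args) {
        // [mp3 player, mp3 player ipod, player ipod]
        System.out.println("index: " + indexChain("mp3 player ipod"));
        // [mp3, mp3 p, p]
        System.out.println("query: " + queryChain("mp3 p"));
    }
}
```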
