Up. Could anyone kindly take a look at my considerations regarding the FreeText Suggester? I am curious to get more insight. Eventually I will analyse the code in depth to understand where I went wrong.
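To make my current understanding concrete, here is a minimal Python sketch of the two steps described in the quoted message below: building the dictionary from [1-3] shingles, and keeping only the last grams-1 tokens at query time. It is only a model of what I expect, not the actual FreeTextSuggester code. The function names (`shingles`, `build_dictionary`, `last_tokens`, `suggest`) are hypothetical, a plain Python set stands in for the FST, and the query-time analysis is reduced to a single prefix match on the truncated query.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive tokens, space-joined."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def build_dictionary(values, max_size=3):
    """Assumed build step: lowercase, split on whitespace, collect all shingles."""
    entries = set()
    for value in values:
        entries.update(shingles(value.lower().split(), 1, max_size))
    return entries


def last_tokens(query, grams):
    """Assumed query step: keep only the last grams-1 whitespace-separated tokens."""
    keep = max(grams - 1, 1)
    return query.lower().split()[-keep:]


def suggest(dictionary, query, grams=3):
    """Prefix-match the truncated query against the dictionary (FST stand-in)."""
    prefix = " ".join(last_tokens(query, grams))
    return sorted(entry for entry in dictionary if entry.startswith(prefix))


values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
dictionary = build_dictionary(values)

print(sorted(dictionary))
print(last_tokens("special mp3 p", grams=3))
print(suggest(dictionary, "special mp3 p", grams=3))
```

With these four field values the sketch yields the same twelve entries I listed, and "special mp3 p" is reduced to "mp3 p" before matching. The real suggester behaves differently for me, which is exactly what I would like to understand.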
Cheers

2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:

> Actually the documentation is not clear enough.
> Let's try to understand this suggester.
>
> *Building*
> This suggester builds an FST that it uses to provide the autocomplete
> feature by running prefix searches on it.
> The terms it uses to generate the FST are the tokens produced by the
> "suggestFreeTextAnalyzerFieldType".
>
> And this should be correct.
> So, to keep it simple, if we have a shingle token filter [1-3] (we produce
> unigrams as well) in our analysis, then from these original field values:
>
> "mp3 ipod"
> "mp3 player"
> "mp3 player ipod"
> "player of Real"
>
> we produce this list of possible suggestions in our FST:
>
> <mp3>
> <player>
> <ipod>
> <real>
> <of>
>
> <mp3 ipod>
> <mp3 player>
> <player ipod>
> <player of>
> <of real>
>
> <mp3 player ipod>
> <player of real>
>
> From the documentation I read:
>
>> "ngrams: The max number of tokens out of which singles will be make the
>> dictionary. The default value is 2. Increasing this would mean you want
>> more than the previous 2 tokens to be taken into consideration when making
>> the suggestions."
>
> This confuses me, as I was not expecting this param to affect the
> suggestion dictionary.
> So I would like a clarification here from our masters :)
> At this point let's see what happens at query time.
>
> *Query Time*
> As I understand it, the ngrams param makes the suggester consider only the
> last N-1 tokens the user typed, separated by spaces.
>
>> "Builds an ngram model from the text sent to {@link #build} and predicts
>> based on the last grams-1 tokens in the request sent to {@link #lookup}.
>> This tries to handle the "long tail" of suggestions for when the incoming
>> query is a never before seen query string."
>
> Example: grams=3 should consider only the last 2 tokens
>
> special mp3 p -> mp3 p
>
> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> We produce 3 tokens:
>
> <mp3>
> <p>
> <mp3 p>
>
> And we run the prefix matching on the FST.
>
> *Conclusion*
> My understanding is surely wrong at some point, as the behaviour I get
> is different.
> Can we discuss this, clarify it and eventually put it in the official
> documentation?
>
> Cheers
>
> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
>> I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
>> the following:
>>
>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
>> "mp3 nano" and "mp3 music".
>> When the user enters "mp3 p", the suggestions should narrow down to "mp3
>> player".
>>
>> Currently, when I type "mp3 p", the suggester returns words that
>> start with the letter "p" only, and I'm getting results like "plan",
>> "production", etc. It does not take the "mp3" token into
>> consideration.
>>
>> I'm using Solr 5.1 and below is my configuration:
>>
>> In solrconfig.xml:
>>
>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>   <lst name="suggester">
>>
>>     <str name="lookupImpl">FreeTextLookupFactory</str>
>>     <str name="indexPath">suggester_freetext_dir</str>
>>
>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>     <str name="field">Suggestion</str>
>>     <str name="weightField">Project</str>
>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>>     <int name="ngrams">5</int>
>>     <str name="buildOnStartup">false</str>
>>     <str name="buildOnCommit">false</str>
>>   </lst>
>> </searchComponent>
>>
>>
>> In schema.xml
>>
>> <fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="false"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="6" outputUnigrams="true"/>
>>   </analyzer>
>> </fieldType>
>>
>>
>> Is there anything that I configured wrongly?
>>
>>
>> Regards,
>> Edwin
>>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England