Thanks, this is very helpful.

Suggester config is quite under-documented. It took me longer than I expected
to get it working.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jul 10, 2015, at 6:30 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> 
wrote:

> Hi guys,
> I just wrote a blog post that builds on Erick's post and explains in detail,
> with practical examples, all the main Lookup implementations:
> 
> http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html
> 
> I think this can be useful for Edwin to finally fix the config for the
> FreeTextSuggester (which I have finally clarified, Erick, thanks to Mike's
> answer on the dev list, plus deep code analysis and testing :) )
> 
> Cheers
> 
> 2015-06-27 23:51 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> 
>> Thanks, Erick, I didn't have time to go through the code again.
>> But I will forward this to the dev list.
>> Thank you for your time!
>> 
>> Cheers
>> 
>> 2015-06-27 16:19 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>> 
>>> Alessandro:
>>> 
>>> I'm going to have to defer to Mike McCandless et al.; they're the
>>> authorities here. I don't quite know whether they monitor this list.
>>> Maybe try the dev list?
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Fri, Jun 26, 2015 at 4:53 AM, Alessandro Benedetti
>>> <benedetti.ale...@gmail.com> wrote:
>>>> Up. Could anyone kindly take a look at my considerations about the
>>>> FreeText Suggester?
>>>> I am curious to get more insight.
>>>> Eventually I will analyse the code in depth to understand my errors.
>>>> 
>>>> Cheers
>>>> 
>>>> 2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
>>>> 
>>>>> Actually the documentation is not clear enough.
>>>>> Let's try to understand this suggester.
>>>>> 
>>>>> *Building*
>>>>> This suggester builds an FST that it uses to provide the autocomplete
>>>>> feature by running prefix searches on it.
>>>>> The terms it uses to generate the FST are the tokens produced by the
>>>>> "suggestFreeTextAnalyzerFieldType".
>>>>> 
>>>>> And this should be correct.
>>>>> So, to keep it simple, if we have a shingle token filter [1-3] in our
>>>>> analysis (so we produce unigrams as well), from these original field
>>>>> values:
>>>>> "mp3 ipod"
>>>>> "mp3 player"
>>>>> "mp3 player ipod"
>>>>> "player of Real"
>>>>> 
>>>>> -> we produce this list of possible suggestions in our FST:
>>>>> 
>>>>> <mp3>
>>>>> <player>
>>>>> <ipod>
>>>>> <real>
>>>>> <of>
>>>>> 
>>>>> <mp3 ipod>
>>>>> <mp3 player>
>>>>> <player ipod>
>>>>> <player of>
>>>>> <of real>
>>>>> 
>>>>> <mp3 player ipod>
>>>>> <player of real>
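
The dictionary listed above can be reproduced with a short Python sketch. It assumes plain whitespace tokenization plus lowercasing (the list shows <real> in lowercase) and shingle sizes 1 to 3; the function names are made up for illustration, and this is not the FreeTextSuggester code itself.

```python
def shingles(text, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive lowercased tokens."""
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        # range() is empty when the value has fewer tokens than `size`
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


# Field values taken from the message above.
FIELD_VALUES = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]


def build_dictionary(values, max_size=3):
    """Collect the distinct shingles of all values, shortest entries first."""
    entries = set()
    for value in values:
        entries.update(shingles(value, 1, max_size))
    return sorted(entries, key=lambda e: (len(e.split()), e))


if __name__ == "__main__":
    for entry in build_dictionary(FIELD_VALUES):
        print(f"<{entry}>")
```

Under these assumptions the sketch yields the same twelve entries as the list: five unigrams, five bigrams and two trigrams.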
>>>>> 
>>>>> From the documentation I read :
>>>>> 
>>>>>> " ngrams: The max number of tokens out of which singles will be make the
>>>>>> dictionary. The default value is 2. Increasing this would mean you want
>>>>>> more than the previous 2 tokens to be taken into consideration when making
>>>>>> the suggestions. "
>>>>> 
>>>>> 
>>>>> This confuses me, as I was not expecting this param to affect the
>>>>> suggestion dictionary.
>>>>> So I would like a clarification here from our masters :)
>>>>> At this point let's see what happens at query time.
>>>>> 
>>>>> *Query Time*
>>>>> As I understand it, the ngrams param makes the suggester consider only
>>>>> the last N-1 tokens the user typed, separated by spaces.
>>>>> 
>>>>> "Builds an ngram model from the text sent to {@link
>>>>>> * #build} and predicts based on the last grams-1 tokens in
>>>>>> * the request sent to {@link #lookup}. This tries to
>>>>>> * handle the "long tail" of suggestions for when the
>>>>>> * incoming query is a never before seen query string."
>>>>> 
>>>>> 
>>>>> Example: grams=3 should consider only the last 2 tokens
>>>>> 
>>>>> special mp3 p -> mp3 p
>>>>> 
>>>>> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
>>>>> We produce 3 tokens:
>>>>> <mp3>
>>>>> <p>
>>>>> <mp3 p>
>>>>> 
>>>>> And we run the prefix matching on the FST .
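
The query-time model described in this message can be sketched in Python as follows. It only mirrors the steps as the message states them (keep the last grams-1 tokens, shingle them, prefix-match against the dictionary). The message itself says the observed behaviour differs, so treat this as a picture of the assumption, not of what FreeTextSuggester really does. All names are hypothetical, and the dictionary is the example list from the Building section.

```python
# Example dictionary from the Building section of the message.
DICTIONARY = [
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
]


def last_tokens(query, grams):
    """Keep only the last grams-1 whitespace-separated tokens of the query."""
    tokens = query.lower().split()
    keep = max(grams - 1, 1)
    return tokens[-keep:]


def query_shingles(tokens):
    """All runs of consecutive tokens, from unigrams up to the full sequence."""
    out = []
    for size in range(1, len(tokens) + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def prefix_matches(prefix, dictionary):
    """Plain prefix lookup; stands in for the prefix search on the FST."""
    return sorted(entry for entry in dictionary if entry.startswith(prefix))


if __name__ == "__main__":
    kept = last_tokens("special mp3 p", grams=3)
    print(kept)
    for shingle in query_shingles(kept):
        print(shingle, "->", prefix_matches(shingle, DICTIONARY))
```

In this toy model the shingle "mp3 p" matches only "mp3 player" and "mp3 player ipod", while the unigram "p" alone matches every entry starting with "p", which resembles the unwanted results reported in the original question.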
>>>>> 
>>>>> *Conclusion*
>>>>> My understanding must be wrong at some point, as the behaviour I get
>>>>> is different.
>>>>> Can we discuss this, clarify it, and eventually put it in the official
>>>>> documentation?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>>>>> 
>>>>>> I'm implementing an auto-suggest feature in Solr, and I'd like to
>>>>>> achieve the following:
>>>>>> 
>>>>>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
>>>>>> "mp3 nano" and "mp3 music".
>>>>>> When the user enters "mp3 p", the suggestions should narrow down to
>>>>>> "mp3 player".
>>>>>> 
>>>>>> Currently, when I type "mp3 p", the suggester returns words that
>>>>>> start with the letter "p" only, and I'm getting results like "plan",
>>>>>> "production", etc.; it does not take the "mp3" token into
>>>>>> consideration.
>>>>>> 
>>>>>> I'm using Solr 5.1 and below is my configuration:
>>>>>> 
>>>>>> In solrconfig.xml:
>>>>>> 
>>>>>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>>>>>   <lst name="suggester">
>>>>>>     <str name="lookupImpl">FreeTextLookupFactory</str>
>>>>>>     <str name="indexPath">suggester_freetext_dir</str>
>>>>>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>>>>>     <str name="field">Suggestion</str>
>>>>>>     <str name="weightField">Project</str>
>>>>>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>>>>>>     <int name="ngrams">5</int>
>>>>>>     <str name="buildOnStartup">false</str>
>>>>>>     <str name="buildOnCommit">false</str>
>>>>>>   </lst>
>>>>>> </searchComponent>
>>>>>> 
>>>>>> 
>>>>>> In schema.xml
>>>>>> 
>>>>>> <fieldType name="suggestType" class="solr.TextField"
>>>>>>            positionIncrementGap="100">
>>>>>>   <analyzer type="index">
>>>>>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>>>>>>             maxShingleSize="6" outputUnigrams="false"/>
>>>>>>   </analyzer>
>>>>>>   <analyzer type="query">
>>>>>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>>>>>>             maxShingleSize="6" outputUnigrams="true"/>
>>>>>>   </analyzer>
>>>>>> </fieldType>
>>>>>> 
>>>>>> 
>>>>>> Is there anything that I configured wrongly?
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Edwin
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> --------------------------
>>>>> 
>>>>> Benedetti Alessandro
>>>>> Visiting card : http://about.me/alessandro_benedetti
>>>>> 
>>>>> "Tyger, tyger burning bright
>>>>> In the forests of the night,
>>>>> What immortal hand or eye
>>>>> Could frame thy fearful symmetry?"
>>>>> 
>>>>> William Blake - Songs of Experience -1794 England
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 
