Bump. Could anyone kindly take a look at my considerations regarding the
FreeText Suggester?
I am curious to get more insight.
If needed, I will dig deeper into the code to understand my errors.

Cheers

2015-06-19 11:53 GMT+01:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:

> Actually the documentation is not clear enough.
> Let's try to understand this suggester.
>
> *Building*
> This suggester builds an FST, which it uses to provide the autocomplete
> feature by running prefix searches on it.
> The terms it uses to generate the FST are the tokens produced by the
> "suggestFreeTextAnalyzerFieldType".
>
> And this should be correct.
> So, to keep it simple, if we have a shingle token filter [1-3] in our
> analysis (so we produce unigrams as well), from these original field values:
> "mp3 ipod"
> "mp3 player"
> "mp3 player ipod"
> "player of Real"
>
> -> we produce this list of possible suggestions in our FST:
>
> <mp3>
> <player>
> <ipod>
> <real>
> <of>
>
> <mp3 ipod>
> <mp3 player>
> <player ipod>
> <player of>
> <of real>
>
> <mp3 player ipod>
> <player of real>
>
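To make the expected dictionary concrete, here is a minimal Python sketch of the
shingling described above. It only models my understanding and is not the
suggester's actual code: the `shingles` helper is hypothetical, and the
lowercasing step is an assumption (I expect "Real" to end up as <real>).

```python
def shingles(text, min_size=1, max_size=3):
    """Return all word n-grams of min_size..max_size tokens from text.

    Assumption: whitespace tokenization plus lowercasing, standing in for
    whatever the configured analyzer really does.
    """
    tokens = text.lower().split()
    result = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


field_values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]

# De-duplicate across all field values, as a dictionary of terms would.
dictionary = sorted({s for value in field_values for s in shingles(value)})

for entry in dictionary:
    print("<%s>" % entry)
```

The resulting set contains the same unigrams, bigrams and trigrams listed above.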
> From the documentation I read :
>
>> " ngrams: The max number of tokens out of which singles will be make the
>> dictionary. The default value is 2. Increasing this would mean you want
>> more than the previous 2 tokens to be taken into consideration when making
>> the suggestions. "
>
>
> This confuses me, as I was not expecting this param to affect the
> suggestion dictionary.
> So I would like a clarification here from our masters :)
> At this point let's see what happens at query time.
>
> *Query Time*
> As I understand it, the ngrams param means that only the last N-1 tokens
> typed by the user (separated by spaces) are considered.
>
>> "Builds an ngram model from the text sent to {@link #build} and predicts
>> based on the last grams-1 tokens in the request sent to {@link #lookup}.
>> This tries to handle the "long tail" of suggestions for when the incoming
>> query is a never before seen query string."
>
>
> Example: grams=3 should consider only the last 2 tokens:
>
> special mp3 p -> mp3 p
>
> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> We produce 3 tokens:
> <mp3>
> <p>
> <mp3 p>
>
> And we run the prefix matching on the FST.
>
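The query-time steps above can be sketched in Python as follows. Again, this is
only a model of what I expect, not what FreeTextSuggester really does (the
behaviour I observe is different). The helper names are hypothetical, and a
plain set with `startswith` stands in for the FST prefix search.

```python
def shingles(tokens, max_size=3):
    """All word n-grams of 1..max_size tokens, shortest first."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_size + 1)
            for i in range(len(tokens) - n + 1)]


def query_shingles(query, grams=3):
    """Keep the last grams-1 tokens of the query, then shingle them."""
    keep = grams - 1
    tokens = query.lower().split()
    tail = tokens[-keep:] if keep > 0 else []
    return shingles(tail, max_size=grams)


def prefix_matches(prefix, dictionary):
    """Stand-in for the FST lookup: entries starting with the prefix."""
    return sorted(entry for entry in dictionary if entry.startswith(prefix))


# The suggestion dictionary from the Building section, written out by hand.
dictionary = {
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
}

for token in query_shingles("special mp3 p", grams=3):
    print(token, "->", prefix_matches(token, dictionary))
```

Under this model, the <mp3 p> token is the one that should narrow the
suggestions to entries starting with "mp3 player".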
> *Conclusion*
> My understanding is surely wrong at some point, as the behaviour I get
> is different.
> Can we discuss this, clarify it, and possibly put it in the official
> documentation?
>
> Cheers
>
> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:
>
>> I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
>> the following:
>>
>> For example, if the user enters "mp3", Solr might suggest "mp3 player",
>> "mp3 nano" and "mp3 music".
>> When the user enters "mp3 p", the suggestion should narrow down to "mp3
>> player".
>>
>> Currently, when I type "mp3 p", the suggester returns words that
>> start with the letter "p" only, and I'm getting results like "plan",
>> "production", etc.; it does not take the "mp3" token into
>> consideration.
>>
>> I'm using Solr 5.1 and below is my configuration:
>>
>> In solrconfig.xml:
>>
>> <searchComponent name="suggest" class="solr.SuggestComponent">
>>   <lst name="suggester">
>>     <str name="lookupImpl">FreeTextLookupFactory</str>
>>     <str name="indexPath">suggester_freetext_dir</str>
>>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>     <str name="field">Suggestion</str>
>>     <str name="weightField">Project</str>
>>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>>     <int name="ngrams">5</int>
>>     <str name="buildOnStartup">false</str>
>>     <str name="buildOnCommit">false</str>
>>   </lst>
>> </searchComponent>
>>
>>
>> In schema.xml
>>
>> <fieldType name="suggestType" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer type="index">
>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>>             maxShingleSize="6" outputUnigrams="false"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>>             maxShingleSize="6" outputUnigrams="true"/>
>>   </analyzer>
>> </fieldType>
>>
>>
>> Is there anything that I configured wrongly?
>>
>>
>> Regards,
>> Edwin
>>
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



