Actually the documentation is not clear enough.
Let's try to understand this suggester.

*Building*
This suggester builds an FST that it uses to provide the autocomplete
feature by running prefix searches on it.
The terms it uses to generate the FST are the tokens produced by the
"suggestFreeTextAnalyzerFieldType".

And this should be correct.
So, to keep it simple, if we have a shingle token filter [1-3] in our
analysis (so we produce unigrams as well), from these original field values:
"mp3 ipod"
"mp3 player"
"mp3 player ipod"
"player of Real"

-> we produce this list of possible suggestions in our FST:

<mp3>
<player>
<ipod>
<real>
<of>

<mp3 ipod>
<mp3 player>
<player ipod>
<player of>
<of real>

<mp3 player ipod>
<player of real>
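
To make the list above reproducible, here is a minimal sketch in plain
Python (not Solr code). It assumes lowercasing plus whitespace splitting as
a stand-in for the real analyzer, and the helper name shingles is made up
for illustration.

```python
def shingles(text, min_size=1, max_size=3):
    """Return the word n-grams (shingles) of min_size..max_size tokens."""
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    tokens = text.lower().split()
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


field_values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]

# Distinct shingles across all field values: the candidate dictionary entries.
dictionary = sorted({s for value in field_values for s in shingles(value)})

for entry in dictionary:
    print(f"<{entry}>")
```

This yields the same 12 entries listed above (5 unigrams, 5 bigrams, 3-gram
pairs "mp3 player ipod" and "player of real").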

From the documentation I read:

> " ngrams: The max number of tokens out of which singles will be make the
> dictionary. The default value is 2. Increasing this would mean you want
> more than the previous 2 tokens to be taken into consideration when making
> the suggestions. "


This confuses me, as I was not expecting this param to affect the
suggestion dictionary.
So I would like a clarification here from our masters :)
At this point let's see what happens at query time.

*Query Time*
As I understand it, the ngrams param makes the suggester consider only the
last N-1 tokens the user typed, split on whitespace. The Javadoc says:

"Builds an ngram model from the text sent to {@link
#build} and predicts based on the last grams-1 tokens in
the request sent to {@link #lookup}. This tries to
handle the "long tail" of suggestions for when the
incoming query is a never before seen query string."


Example: grams=3 should consider only the last 2 tokens

special mp3 p -> mp3 p

Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
We produce 3 tokens:
<mp3>
<p>
<mp3 p>

And we run the prefix matching on the FST.
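
The query-time steps as I picture them can be sketched in plain Python.
This only models my understanding described above, not what
FreeTextLookupFactory actually does. The names shingles and lookup are
hypothetical, and a sorted list with startswith() stands in for the prefix
search on the FST.

```python
def shingles(tokens, max_size):
    """Word n-grams of 1..max_size tokens from a token list."""
    return [
        " ".join(tokens[start:start + size])
        for size in range(1, max_size + 1)
        for start in range(len(tokens) - size + 1)
    ]


def lookup(query, dictionary, grams=3):
    """Model of the assumed query-time flow (assumes grams >= 2)."""
    # Keep only the last grams-1 whitespace-separated tokens of the query.
    tokens = query.lower().split()[-(grams - 1):]
    # Analyse what is left with the same shingling assumed at build time.
    query_terms = shingles(tokens, grams)
    # Plain prefix match per query term, standing in for the FST lookup.
    return {
        term: [entry for entry in dictionary if entry.startswith(term)]
        for term in query_terms
    }


# The dictionary entries from the building example in this mail.
dictionary = sorted([
    "mp3", "player", "ipod", "real", "of",
    "mp3 ipod", "mp3 player", "player ipod", "player of", "of real",
    "mp3 player ipod", "player of real",
])

matches = lookup("special mp3 p", dictionary, grams=3)
for term, entries in matches.items():
    print(f"<{term}> -> {entries}")
```

Under this model, "special mp3 p" is cut down to "mp3 p", and the <mp3 p>
token only matches "mp3 player" and "mp3 player ipod", which is the
narrowing Edwin is asking for.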

*Conclusion*
My understanding is surely wrong at some point, as the behaviour I get is
different.
Can we discuss this, clarify it and eventually put it in the official
documentation?

Cheers

2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <edwinye...@gmail.com>:

> I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
> the following:
>
> For example, if the user enters "mp3", Solr might suggest "mp3 player",
> "mp3 nano" and "mp3 music".
> When the user enters "mp3 p", the suggestion should narrow down to "mp3
> player".
>
> Currently, when I type "mp3 p", the suggester is returning words that
> start with the letter "p" only, and I'm getting results like "plan",
> "production", etc. It does not take the "mp3" token into consideration.
>
> I'm using Solr 5.1 and below is my configuration:
>
> In solrconfig.xml:
>
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="lookupImpl">FreeTextLookupFactory</str>
>     <str name="indexPath">suggester_freetext_dir</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">Suggestion</str>
>     <str name="weightField">Project</str>
>     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
>     <int name="ngrams">5</int>
>     <str name="buildOnStartup">false</str>
>     <str name="buildOnCommit">false</str>
>   </lst>
> </searchComponent>
>
>
> In schema.xml
>
> <fieldType name="suggestType" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>             maxShingleSize="6" outputUnigrams="false"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>                 pattern="[^a-zA-Z0-9]" replacement=" " />
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>             maxShingleSize="6" outputUnigrams="true"/>
>   </analyzer>
> </fieldType>
>
>
> Is there anything that I configured wrongly?
>
>
> Regards,
> Edwin
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
