Re: Auto-suggest in Solr

Zheng Lin Edwin Yeo Fri, 19 Jun 2015 22:15:07 -0700

Ok sure.

> " ngrams: The max number of tokens out of which singles will be make the
> dictionary. The default value is 2. Increasing this would mean you want
> more than the previous 2 tokens to be taken into consideration when making
> the suggestions. "


This confused me, as I could not reproduce that behaviour with the
suggester. If the default value is 2, a search for "mp3 p" should return
only suggestions that contain "mp3 ...", not everything that starts with
the letter "p". But I have only been getting suggestions that start with
"p".
Even when I try a bigger ngrams value with a longer query, I get the same
results: the suggester only considers the last token when giving the
suggestions.

I still have not managed to get any suggestions that take 2 or more tokens
into account.
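
To make concrete what I was expecting, here is a small Python sketch. It is
only a toy model of my reading of your explanation below, not how
FreeTextLookupFactory actually works. The function names and the "context"
parameter are made up for illustration, and the sample values are the four
field values from your mail.

```python
def shingles(tokens, min_size=1, max_size=3):
    """Return every run of min_size..max_size consecutive tokens, space-joined."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def build_dictionary(values, max_size=3):
    """Collect the lowercased shingles of every field value into one set."""
    entries = set()
    for value in values:
        entries.update(shingles(value.lower().split(), 1, max_size))
    return entries


def suggest(dictionary, query, context=1):
    """Prefix-match on the last `context` full tokens plus the token being typed.

    `context` is a hypothetical knob for this sketch; it is NOT claimed to
    be the same thing as Solr's `ngrams` parameter.
    """
    tokens = query.lower().split()
    prefix = " ".join(tokens[-(context + 1):])
    return sorted(entry for entry in dictionary if entry.startswith(prefix))


if __name__ == "__main__":
    values = ["mp3 ipod", "mp3 player", "mp3 player ipod", "player of Real"]
    dictionary = build_dictionary(values)
    # What I expected: "mp3" is kept as context.
    print(suggest(dictionary, "special mp3 p", context=1))
    # -> ['mp3 player', 'mp3 player ipod']
    # What I seem to get: only the last token is used.
    print(suggest(dictionary, "mp3 p", context=0))
    # -> ['player', 'player ipod', 'player of', 'player of real']
```

The second call is closest to what I actually see in Solr: only entries
starting with "p", as if the "mp3" token were ignored.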

So am I actually heading in the right direction with this?

Regards,
Edwin



On 19 June 2015 at 18:53, Alessandro Benedetti <[email protected]>
wrote:

> Actually the documentation is not clear enough.
> Let's try to understand this suggester.
>
> *Building*
> This suggester builds an FST, which it uses to provide the autocomplete
> feature by running prefix searches on it.
> The terms it uses to generate the FST are the tokens produced by the
> "suggestFreeTextAnalyzerFieldType".
>
> And this should be correct.
> So, to keep it simple, if we have a shingle token filter [1-3] (producing
> unigrams as well) in our analysis, then from these original field values:
> "mp3 ipod"
> "mp3 player"
> "mp3 player ipod"
> "player of Real"
>
> -> we produce this list of possible suggestions in our FST:
>
> <mp3>
> <player>
> <ipod>
> <real>
> <of>
>
> <mp3 ipod>
> <mp3 player>
> <player ipod>
> <player of>
> <of real>
>
> <mp3 player ipod>
> <player of real>
>
> From the documentation I read :
>
> > " ngrams: The max number of tokens out of which singles will be make the
> > dictionary. The default value is 2. Increasing this would mean you want
> > more than the previous 2 tokens to be taken into consideration when
> > making the suggestions. "
>
>
> This confuses me, as I was not expecting this param to affect the
> suggestion dictionary.
> So I would like a clarification here from our masters :)
> At this point let's see what happens at query time.
>
> *Query Time*
> As I understand it, the ngrams param makes the suggester consider only
> the last N-1 tokens the user typed, separated by spaces.
>
> > "Builds an ngram model from the text sent to {@link #build} and
> > predicts based on the last grams-1 tokens in the request sent to
> > {@link #lookup}. This tries to handle the "long tail" of suggestions
> > for when the incoming query is a never before seen query string."
>
>
> Example: grams=3 should consider only the last 2 tokens:
>
> special mp3 p -> mp3 p
>
> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> We produce 3 tokens:
> <mp3>
> <p>
> <mp3 p>
>
> And we run the prefix matching on the FST.
>
> *Conclusion*
> My understanding must be wrong at some point, as the behaviour I get is
> different.
> Can we discuss this, clarify it and eventually put it in the official
> documentation?
>
> Cheers
>
> 2015-06-19 6:40 GMT+01:00 Zheng Lin Edwin Yeo <[email protected]>:
>
> > I'm implementing an auto-suggest feature in Solr, and I'd like to
> > achieve the following:
> >
> > For example, if the user enters "mp3", Solr might suggest "mp3 player",
> > "mp3 nano" and "mp3 music".
> > When the user enters "mp3 p", the suggestion should narrow down to "mp3
> > player".
> >
> > Currently, when I type "mp3 p", the suggester returns only words that
> > start with the letter "p", so I'm getting results like "plan",
> > "production", etc., and it does not take the "mp3" token into
> > consideration.
> >
> > I'm using Solr 5.1 and below is my configuration:
> >
> > In solrconfig.xml:
> >
> > <searchComponent name="suggest" class="solr.SuggestComponent">
> >   <lst name="suggester">
> >     <str name="lookupImpl">FreeTextLookupFactory</str>
> >     <str name="indexPath">suggester_freetext_dir</str>
> >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >     <str name="field">Suggestion</str>
> >     <str name="weightField">Project</str>
> >     <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
> >     <int name="ngrams">5</int>
> >     <str name="buildOnStartup">false</str>
> >     <str name="buildOnCommit">false</str>
> >   </lst>
> > </searchComponent>
> >
> >
> > In schema.xml:
> >
> > <fieldType name="suggestType" class="solr.TextField"
> >            positionIncrementGap="100">
> >   <analyzer type="index">
> >     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >                 pattern="[^a-zA-Z0-9]" replacement=" " />
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> >             maxShingleSize="6" outputUnigrams="false"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <charFilter class="solr.PatternReplaceCharFilterFactory"
> >                 pattern="[^a-zA-Z0-9]" replacement=" " />
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> >             maxShingleSize="6" outputUnigrams="true"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > Is there anything that I configured wrongly?
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
