Re: Suggester duplicating values

Alessandro Benedetti Thu, 02 Jul 2015 02:42:08 -0700

Hi Rafael,
Your problem is clear, and it has been discussed a few times in the
past.
At first glance, I agree with you.


The basic unit of information for a Suggester is a term, not a document.
This means it does not make much sense to return duplicate terms (which
appear only because they come from different docs).
The term id should be the term itself, as there is no way for a human to
perceive any difference between two identical terms returned by the
Suggester.

That consideration aside, are you using an intermediate API to query
Solr? (You definitely should.)
If you are using a client, its language should provide a data structure
you can use to avoid duplicates.
Java, for example, gives you HashSet, TreeSet and all the related
classes.
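
As a minimal sketch of that idea, assuming the suggestion terms have already
been pulled out of the Solr response into a list of strings (the class name,
method name and sample values below are hypothetical, not part of Solr or
SolrJ). I use LinkedHashSet here rather than HashSet or TreeSet only because
it keeps the order in which Solr returned the terms:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: not part of Solr or SolrJ.
final class SuggestionDeduplicator {

    // Drops repeated terms; the first occurrence of each term is kept,
    // so the original ordering of the suggestions is preserved.
    static List<String> uniqueTerms(List<String> terms) {
        return new ArrayList<>(new LinkedHashSet<>(terms));
    }

    public static void main(String[] args) {
        // Made-up terms, as they might come back for the prefix "J."
        List<String> fromSolr = Arrays.asList(
                "J. R. R. Tolkien",
                "J. R. R. Tolkien",
                "J. K. Rowling",
                "J. R. R. Tolkien");

        // Prints: [J. R. R. Tolkien, J. K. Rowling]
        System.out.println(uniqueTerms(fromSolr));
    }
}
```

Keep in mind that suggest.count applies before this step, so you may want to
request more suggestions than you plan to show, otherwise you can end up with
fewer unique terms than expected.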

Hope this helps,

Cheers

2015-07-01 18:40 GMT+01:00 Rafael <rafael.man...@gmail.com>:

> Hi, I'm building an autocomplete solution on top of Solr for an ebook
> seller, but my database is completely denormalized. For example, I have this
> kind of record:
>
> *author           | title                       | price*
> -----------------+-----------------------------+---------
> J. R. R. Tolkien | Lord of the Rings           | $10.0
> J. R. R. Tolkien | Lord of the Rings Vol. 3    | $12.0
> J. R. R. Tolkien | Lord of the Rings           | $11.0
> J. R. R. Tolkien | Lord of the Rings Vol. 3    | $7.5
> J. R. R. Tolkien | Lord of the Rings Hardcover | $30.5
>
> *We are already spending effort to normalize the database, but it will
> take a while.*
>
>
> Thus, when I try to implement a suggester on the author field, for example, if I
> type "*J.*" I get "*J. R. R. Tolkien*" 4 times.
>
> My Suggester Configuration is pretty standard:
>
> <!-- schema -->
>     <fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>
> <!-- Solrconfig -->
>   <searchComponent name="suggest" class="solr.SuggestComponent">
>     <lst name="suggester">
>       <str name="name">mySuggester</str>
>       <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>       <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>       <str name="field">author</str>
>       <str name="suggestAnalyzerFieldType">textSuggest</str>
>     </lst>
>   </searchComponent>
>
>   <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
>     <lst name="defaults">
>       <str name="suggest">true</str>
>       <str name="suggest.count">20</str>
>       <str name="suggest.dictionary">mySuggester</str>
>     </lst>
>     <arr name="components">
>       <str>suggest</str>
>     </arr>
>   </requestHandler>
>
>
> And I'm using Solr 5.2.1.
>
> *Question:* Is there a way to get only unique values for suggestions? Or
> would it be simpler to export a file (or even a new table in the database)
> without duplicated values?
>
> Thanks.
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
