[ 
https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017458#comment-17017458
 ] 

Michal Hlavac commented on LUCENE-6336:
---------------------------------------

This is not a general solution, but I tried overriding the add method so that 
it creates or updates the existing document, and it works. Of course, it 
doesn't work with weightField and payloadField, but in my scenario, where only 
field is used, it works:
{code:java}
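import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class DedupAnalyzingInfixSuggester extends AnalyzingInfixSuggester {

    public DedupAnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException {
        super(dir, analyzer);
    }

    // ... Other constructors ...

    // Route every add through update(), which replaces an existing
    // suggestion with the same exact text instead of adding a second one.
    @Override
    public void add(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException {
        update(text, contexts, weight, payload);
    }
}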
{code}
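
For intuition, the effect of routing add through update can be modeled without Lucene as an upsert keyed by the suggestion text. This is only a simplified sketch, not the suggester's implementation: the class UpsertModel and its methods are hypothetical names, and a plain map stands in for the suggest index. One consequence worth noting is that, in this model, the most recently added weight survives, which is not necessarily the highest one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "add delegates to update": one entry per exact text.
public class UpsertModel {
    // Stand-in for the suggest index, keyed by the exact suggestion text.
    private final Map<String, Long> weightByText = new LinkedHashMap<>();

    // Replaces any earlier entry with the same text (last write wins).
    public void add(String text, long weight) {
        weightByText.put(text, weight);
    }

    public int size() {
        return weightByText.size();
    }

    public long weightOf(String text) {
        return weightByText.get(text);
    }

    public static void main(String[] args) {
        UpsertModel model = new UpsertModel();
        model.add("English", 100);
        model.add("English", 42);
        // Prints "1 42": a single entry, carrying the last weight added.
        System.out.println(model.size() + " " + model.weightOf("English"));
    }
}
```

If the highest weight should win instead, the input would have to be fed in ascending weight order, or the weights compared before each add.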

> AnalyzingInfixSuggester needs duplicate handling
> ------------------------------------------------
>
>                 Key: LUCENE-6336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6336
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.10.3, 5.0
>            Reporter: Jan Høydahl
>            Priority: Major
>              Labels: lookup, suggester
>         Attachments: LUCENE-6336.patch
>
>
> Spinoff from LUCENE-5833 but otherwise unrelated.
> I am using {{AnalyzingInfixSuggester}}, which is backed by a Lucene index and 
> stores payload and score together with the suggest text.
> I did some testing with Solr, producing the DocumentDictionary from an index 
> with multiple documents containing the same text, but with random weights 
> between 0 and 100. Then I got duplicate identical suggestions sorted by weight:
> {code}
> {
>   "suggest":{"languages":{
>       "engl":{
>         "numFound":101,
>         "suggestions":[{
>             "term":"<b>Engl</b>ish",
>             "weight":100,
>             "payload":"0"},
>           {
>             "term":"<b>Engl</b>ish",
>             "weight":99,
>             "payload":"0"},
>           {
>             "term":"<b>Engl</b>ish",
>             "weight":98,
>             "payload":"0"},
> ---etc all the way down to 0---
> {code}
> I also reproduced the same behavior in AnalyzingInfixSuggester directly. So 
> there is a need for some duplicate removal here, either while building the 
> local suggest index or during lookup. Only the highest weight suggestion for 
> a given term should be returned.
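
The lookup-time variant of the duplicate removal described above could look roughly like the following. This is a minimal sketch under stated assumptions, not code from the attached patch: the SuggestionDedup class and its Suggestion holder are hypothetical and simply mirror the term/weight/payload fields of the JSON output above, and the input is assumed to be already sorted by descending weight, so the first occurrence of each term is the one with the highest weight.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical lookup-time filter: keep one suggestion per term.
public class SuggestionDedup {

    // Hypothetical result holder mirroring the JSON fields shown above.
    public static final class Suggestion {
        public final String term;
        public final long weight;
        public final String payload;

        public Suggestion(String term, long weight, String payload) {
            this.term = term;
            this.weight = weight;
            this.payload = payload;
        }
    }

    // Assumes 'sorted' is ordered by descending weight. The first time a
    // term is seen it is kept; later (lower-weight) duplicates are skipped.
    public static List<Suggestion> dedup(List<Suggestion> sorted) {
        Set<String> seenTerms = new HashSet<>();
        List<Suggestion> result = new ArrayList<>();
        for (Suggestion s : sorted) {
            if (seenTerms.add(s.term)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Filtering at lookup time leaves the index untouched, but the lookup may then need to fetch more than the requested number of hits to fill the result list after duplicates are dropped.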



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
