Emiliyan Sinigerov created LUCENE-10645:
-------------------------------------------

             Summary: Wrong autocomplete suggestion
                 Key: LUCENE-10645
                 URL: https://issues.apache.org/jira/browse/LUCENE-10645
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Emiliyan Sinigerov


I have a problem with autocomplete suggestions. I am using your own test to
show where the bug is:
https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java

This is your test and everything works fine:

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
}

 

But if I add the following line to the test, the test fails:

suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    // Added line: a second entry that differs only in its last word.
    suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
}

We want to find every entry that contains "pen p". Only one entry should
match, "the pen is pretty", but the results contain two matches: "the pen is
pretty" and "the pen is fretty".

I think the problem occurs when the query consists of a full word (here "pen")
followed by a one-letter prefix that is also the first letter of that word
(here "p"). The suggester first matches the word "pen" and then matches the
prefix "p" against "pen" again, which is incorrect. The prefix "p" should have
to match a word other than "pen".
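
To illustrate the suspected behavior, here is a minimal sketch in plain Java.
It is not Lucene code: the class InfixMatchSketch and both of its methods are
hypothetical names made up for this example, and it assumes whitespace
tokenization with the last query word treated as a prefix. The first method
checks every query clause against the whole entry on its own, which would
explain the extra result; the second requires the prefix to match a word that
an exact term has not already used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of this is Lucene's implementation.
class InfixMatchSketch {

    // Split on whitespace, similar in spirit to MockTokenizer.WHITESPACE.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Every clause is evaluated independently, so one word in the entry
    // ("pen") can satisfy both the exact term and the trailing prefix.
    static boolean matchesIndependently(String entry, String query) {
        List<String> tokens = tokenize(entry);
        List<String> terms = tokenize(query);
        String prefix = terms.get(terms.size() - 1);
        for (String term : terms.subList(0, terms.size() - 1)) {
            if (!tokens.contains(term)) {
                return false;
            }
        }
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Stricter variant: each exact term consumes one word of the entry, and
    // the prefix must match one of the remaining words.
    static boolean matchesDistinctTokens(String entry, String query) {
        List<String> remaining = new ArrayList<>(tokenize(entry));
        List<String> terms = tokenize(query);
        String prefix = terms.get(terms.size() - 1);
        for (String term : terms.subList(0, terms.size() - 1)) {
            if (!remaining.remove(term)) {
                return false;
            }
        }
        for (String token : remaining) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the query "pen p", matchesIndependently accepts both "the pen is pretty"
and "the pen is fretty", while matchesDistinctTokens accepts only "the pen is
pretty", which is the result I expected from the suggester.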

 

Thank you,

 

Emiliyan.


