[jira] [Updated] (LUCENE-10645) Wrong autocomplete suggestion

Emiliyan Sinigerov (Jira) Mon, 11 Jul 2022 00:19:29 -0700


     [ https://issues.apache.org/jira/browse/LUCENE-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Emiliyan Sinigerov updated LUCENE-10645:
----------------------------------------
    Description: 
I have a problem with autocomplete suggestions. I use your test to show you where 
the bug is 
[https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java].

This is your test, and it passes:

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
}

 

But if I add the following line to the test, {*}suggester.add(new BytesRef("the pen is 
fretty"), null, 10, new BytesRef("foobaz")){*}, the test fails.

public void testBothExactAndPrefix() throws Exception {
  Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
  AnalyzingInfixSuggester suggester =
      new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
  suggester.build(new InputArrayIterator(new Input[0]));
  suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
  // Added line: with this second entry present, the test fails
  suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
  suggester.refresh();

  List<LookupResult> results =
      suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
  assertEquals(1, results.size()); // fails: two results are returned
  assertEquals("the pen is pretty", results.get(0).key);
  assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
  assertEquals(10, results.get(0).value);
  assertEquals(new BytesRef("foobaz"), results.get(0).payload);
  suggester.close();
  a.close();
}

We want to find every entry that contains "pen p". Only one entry matches, 
"the pen is pretty", but the results contain two matches: "the pen is pretty" 
and "the pen is fretty".
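To make the expected behaviour concrete, here is a minimal, self-contained Java sketch of the matching rule described in this report. It does not use Lucene, and the class `ExpectedInfixMatch` and its methods are hypothetical names. It assumes simple whitespace tokenization and encodes the rule as: every complete query word must equal a word of the entry, the last query word only needs to be a prefix of a word, and no entry word may be used by more than one query word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the expected matching rule; this is not Lucene code.
public class ExpectedInfixMatch {

  // True if every query token can be assigned to a *different* word of the entry:
  // complete tokens must equal a word, the last token only needs to be a prefix of one.
  public static boolean matches(String entry, String query) {
    String[] words = entry.trim().split("\\s+"); // assumption: whitespace tokenization
    String[] tokens = query.trim().split("\\s+");
    return assign(tokens, 0, words, new boolean[words.length]);
  }

  // Backtracking search: try every unused entry word for token i.
  private static boolean assign(String[] tokens, int i, String[] words, boolean[] used) {
    if (i == tokens.length) {
      return true; // each token has been given its own word
    }
    boolean isLast = i == tokens.length - 1;
    for (int w = 0; w < words.length; w++) {
      if (used[w]) {
        continue; // this word is already taken by an earlier token
      }
      boolean ok = isLast ? words[w].startsWith(tokens[i]) : words[w].equals(tokens[i]);
      if (ok) {
        used[w] = true;
        if (assign(tokens, i + 1, words, used)) {
          return true;
        }
        used[w] = false; // undo and try the next candidate word
      }
    }
    return false;
  }

  // Returns the entries that match the query, in input order.
  public static List<String> lookup(List<String> entries, String query) {
    List<String> out = new ArrayList<>();
    for (String entry : entries) {
      if (matches(entry, query)) {
        out.add(entry);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> entries = Arrays.asList("the pen is pretty", "the pen is fretty");
    System.out.println(lookup(entries, "pen p")); // prints [the pen is pretty]
  }
}
```

Under this rule "pen" is consumed by the exact token, so the prefix "p" has to find another word; only "pretty" qualifies, and "the pen is fretty" is rejected.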

I think that when the query contains a word (here "pen") followed by a 
one-letter prefix that is the same as the first letter of that word (here "p"), 
the suggester first matches the word "pen" and then matches the prefix "p" 
against "pen" as well, which is incorrect. We want "p" to match a word other 
than "pen".
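The suspected mechanism can be modelled with a second hypothetical sketch (again plain Java, not Lucene code; `IndependentClauseMatch` is an illustrative name). Here each query word is checked on its own against all words of the entry, the way each required clause of a Boolean query is satisfied independently. Whether AnalyzingInfixSuggester builds its query exactly like this is an assumption, not something the report confirms; the sketch only reproduces the symptom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the reported behaviour; this is not Lucene code.
public class IndependentClauseMatch {

  // Every query token is a required clause, but each clause is checked against
  // all words of the entry on its own, so one word may satisfy several clauses.
  public static boolean matches(String entry, String query) {
    String[] words = entry.trim().split("\\s+"); // assumption: whitespace tokenization
    String[] tokens = query.trim().split("\\s+");
    for (int i = 0; i < tokens.length; i++) {
      boolean isLast = i == tokens.length - 1;
      boolean found = false;
      for (String word : words) {
        // complete tokens need an exact word, the last token is a prefix
        if (isLast ? word.startsWith(tokens[i]) : word.equals(tokens[i])) {
          found = true;
          break;
        }
      }
      if (!found) {
        return false; // a required clause is not satisfied
      }
    }
    return true;
  }

  // Returns the entries that match the query, in input order.
  public static List<String> lookup(List<String> entries, String query) {
    List<String> out = new ArrayList<>();
    for (String entry : entries) {
      if (matches(entry, query)) {
        out.add(entry);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> entries = Arrays.asList("the pen is pretty", "the pen is fretty");
    System.out.println(lookup(entries, "pen p")); // prints [the pen is pretty, the pen is fretty]
  }
}
```

With this rule the word "pen" satisfies both the exact clause and the prefix clause, so "the pen is fretty" is returned alongside "the pen is pretty", which matches the two results observed in the failing test.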

  was:
I have a problem with autocomplete suggestions. I use your test to show you where 
the bug is 
https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java.

This is your test, and it passes:

public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
}

 

But if I add the following line to the test, {*}suggester.add(new BytesRef("the pen is 
fretty"), null, 10, new BytesRef("foobaz")){*}, the test fails.

public void testBothExactAndPrefix() throws Exception {
  Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
  AnalyzingInfixSuggester suggester =
      new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
  suggester.build(new InputArrayIterator(new Input[0]));
  suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
  // Added line: with this second entry present, the test fails
  suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));

  suggester.refresh();

  List<LookupResult> results =
      suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
  assertEquals(1, results.size()); // fails: two results are returned
  assertEquals("the pen is pretty", results.get(0).key);
  assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
  assertEquals(10, results.get(0).value);
  assertEquals(new BytesRef("foobaz"), results.get(0).payload);
  suggester.close();
  a.close();
}

We want to find every entry that contains "pen p". Only one entry matches, 
"the pen is pretty", but the results contain two matches: "the pen is pretty" 
and "the pen is fretty".

I think that when the query contains a word (here "pen") followed by a 
one-letter prefix that is the same as the first letter of that word (here "p"), 
the suggester first matches the word "pen" and then matches the prefix "p" 
against "pen" as well, which is incorrect. We want "p" to match a word other 
than "pen".

 

Thank you,

 

Emiliyan.


> Wrong autocomplete suggestion
> -----------------------------
>
>                 Key: LUCENE-10645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10645
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Emiliyan Sinigerov
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
