[ https://issues.apache.org/jira/browse/LUCENE-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emiliyan Sinigerov updated LUCENE-10645:
----------------------------------------
    Description:
I have a problem with autocomplete suggestions. I am using your own test to show where the bug is: [https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java].

This is your test, and everything works fine:

    public void testBothExactAndPrefix() throws Exception {
      Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
      AnalyzingInfixSuggester suggester =
          new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
      // Start from an empty index, then add a single suggestion.
      suggester.build(new InputArrayIterator(new Input[0]));
      suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
      suggester.refresh();

      // Look up "pen p" with allTermsRequired = true and doHighlight = true.
      List<LookupResult> results =
          suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
      assertEquals(1, results.size());
      assertEquals("the pen is pretty", results.get(0).key);
      assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
      assertEquals(10, results.get(0).value);
      assertEquals(new BytesRef("foobaz"), results.get(0).payload);

      suggester.close();
      a.close();
    }

But if I add this line to the test, the test fails:

    suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));

    public void testBothExactAndPrefix() throws Exception {
      Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
      AnalyzingInfixSuggester suggester =
          new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
      suggester.build(new InputArrayIterator(new Input[0]));
      suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
      // Added line: a second suggestion that should not match "pen p".
      suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
      suggester.refresh();

      List<LookupResult> results =
          suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
      assertEquals(1, results.size());
      assertEquals("the pen is pretty", results.get(0).key);
      assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
      assertEquals(10, results.get(0).value);
      assertEquals(new BytesRef("foobaz"), results.get(0).payload);

      suggester.close();
      a.close();
    }

We want to find everything that contains "pen p". Only one entry, "the pen is pretty", should match, but the results contain two matches: "the pen is pretty" and "the pen is fretty".

I think the problem arises when the query consists of a whole word (here "pen") followed by a single letter that is also the first letter of that word (here "p"). The suggester first matches the word "pen" and then matches "p" against "pen" again, which is incorrect. We want "p" to match a word other than "pen".

> Wrong autocomplete suggestion
> -----------------------------
>
>                 Key: LUCENE-10645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10645
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Emiliyan Sinigerov
>            Priority: Major
>
> I have a problem with autocomplete suggestions. I am using your own test to
> show where the bug is:
> [https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java].
>
> This is your test, and everything works fine:
>
>     public void testBothExactAndPrefix() throws Exception {
>       Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
>       AnalyzingInfixSuggester suggester =
>           new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
>       suggester.build(new InputArrayIterator(new Input[0]));
>       suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
>       suggester.refresh();
>
>       List<LookupResult> results =
>           suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
>       assertEquals(1, results.size());
>       assertEquals("the pen is pretty", results.get(0).key);
>       assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
>       assertEquals(10, results.get(0).value);
>       assertEquals(new BytesRef("foobaz"), results.get(0).payload);
>
>       suggester.close();
>       a.close();
>     }
>
> But if I add this line to the test, the test fails:
>
>     suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
>
>     public void testBothExactAndPrefix() throws Exception {
>       Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
>       AnalyzingInfixSuggester suggester =
>           new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
>       suggester.build(new InputArrayIterator(new Input[0]));
>       suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
>       // Added line: a second suggestion that should not match "pen p".
>       suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
>       suggester.refresh();
>
>       List<LookupResult> results =
>           suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
>       assertEquals(1, results.size());
>       assertEquals("the pen is pretty", results.get(0).key);
>       assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
>       assertEquals(10, results.get(0).value);
>       assertEquals(new BytesRef("foobaz"), results.get(0).payload);
>
>       suggester.close();
>       a.close();
>     }
>
> We want to find everything that contains "pen p". Only one entry, "the pen is
> pretty", should match, but the results contain two matches: "the pen is
> pretty" and "the pen is fretty".
>
> I think the problem arises when the query consists of a whole word (here
> "pen") followed by a single letter that is also the first letter of that word
> (here "p"). The suggester first matches the word "pen" and then matches "p"
> against "pen" again, which is incorrect. We want "p" to match a word other
> than "pen".
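
The behaviour described in the report can be illustrated without Lucene. The sketch below is a simplified model, not the suggester's actual code: it assumes whitespace tokenization, case-sensitive comparison, and a query with no trailing space, so that every token except the last is an exact term and the last is a prefix. The class `InfixMatchSketch` and both method names are hypothetical. The first method checks each query token independently, which is the behaviour the report attributes to the suggester. The second requires the prefix to match a word not already used by an exact term, which is what the reporter expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the matching rules discussed in the report; not Lucene code. */
public class InfixMatchSketch {

  /** Each query token is checked on its own, so one word may satisfy several tokens. */
  public static boolean matchesIndependently(String suggestion, String query) {
    List<String> words = Arrays.asList(suggestion.split(" "));
    String[] terms = query.split(" ");
    for (int i = 0; i < terms.length; i++) {
      String term = terms[i];
      boolean isLast = i == terms.length - 1;
      // Last token is a prefix; earlier tokens must equal a whole word.
      boolean found = isLast
          ? words.stream().anyMatch(w -> w.startsWith(term))
          : words.contains(term);
      if (!found) {
        return false;
      }
    }
    return true;
  }

  /** Exact terms consume the word they match; the prefix must match a remaining word. */
  public static boolean matchesDistinctWords(String suggestion, String query) {
    List<String> remaining = new ArrayList<>(Arrays.asList(suggestion.split(" ")));
    String[] terms = query.split(" ");
    for (int i = 0; i < terms.length - 1; i++) {
      // remove(Object) drops the first equal word and reports whether one existed.
      if (!remaining.remove(terms[i])) {
        return false;
      }
    }
    String prefix = terms[terms.length - 1];
    return remaining.stream().anyMatch(w -> w.startsWith(prefix));
  }

  public static void main(String[] args) {
    String query = "pen p";
    for (String s : new String[] {"the pen is pretty", "the pen is fretty"}) {
      System.out.println(s + " | independent=" + matchesIndependently(s, query)
          + " distinct=" + matchesDistinctWords(s, query));
    }
  }
}
```

Under the independent rule both suggestions match "pen p", because the prefix "p" is satisfied by "pen" itself. Under the distinct-word rule only "the pen is pretty" matches.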