Emiliyan Sinigerov created LUCENE-10645:
-------------------------------------------
             Summary: Wrong autocomplete suggestion
                 Key: LUCENE-10645
                 URL: https://issues.apache.org/jira/browse/LUCENE-10645
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Emiliyan Sinigerov

I have a problem with autocomplete suggestions. To show where the bug is, I use one of your own tests from TestAnalyzingInfixSuggester:
https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java

This is your test, and it passes:

    public void testBothExactAndPrefix() throws Exception {
      Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
      AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
      suggester.build(new InputArrayIterator(new Input[0]));
      suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
      suggester.refresh();

      List<LookupResult> results =
          suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
      assertEquals(1, results.size());
      assertEquals("the pen is pretty", results.get(0).key);
      assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
      assertEquals(10, results.get(0).value);
      assertEquals(new BytesRef("foobaz"), results.get(0).payload);
      suggester.close();
      a.close();
    }

But if I add the following line to the test, the test fails:

    suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));

    public void testBothExactAndPrefix() throws Exception {
      Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
      AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
      suggester.build(new InputArrayIterator(new Input[0]));
      suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
      // Added line: a second entry that should not match "pen p"
      suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
      suggester.refresh();

      List<LookupResult> results =
          suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
      assertEquals(1, results.size());
      assertEquals("the pen is pretty", results.get(0).key);
      assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
      assertEquals(10, results.get(0).value);
      assertEquals(new BytesRef("foobaz"), results.get(0).payload);
      suggester.close();
      a.close();
    }

We want to find every entry that matches "pen p". Only one entry should match, "the pen is pretty", but the results contain two matches: "the pen is pretty" and "the pen is fretty".

I think the problem appears when the query consists of a complete word (here "pen") followed by a one-letter prefix that is also the first letter of that word (here "p"). The suggester first matches the word "pen" and then matches the prefix "p" against "pen" again, which is incorrect. The prefix "p" should be matched against a word other than "pen".

Thank you,
Emiliyan.
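
The suspected behaviour can be illustrated without Lucene. The following is only a minimal sketch under stated assumptions: it models a lookup as "every complete query word must equal some word in the entry, and the trailing query word must be a prefix of some word in the entry". It is not Lucene's implementation, and the class and method names (InfixMatchSketch, looseMatch, strictMatch) are hypothetical and chosen for illustration. The strict variant shows the matching the report asks for, where the prefix has to be satisfied by a word that an exact match has not already used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the matching described in the report.
// It does not use or reproduce any Lucene code.
public class InfixMatchSketch {

    // Whitespace tokenization, standing in for MockTokenizer.WHITESPACE.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Loose model: each complete query word must equal some entry word, and
    // the trailing query word must be a prefix of ANY entry word, including
    // a word that already satisfied an exact match.
    static boolean looseMatch(String entry, String query) {
        List<String> e = tokens(entry);
        List<String> q = tokens(query);
        String prefix = q.get(q.size() - 1);
        for (String exact : q.subList(0, q.size() - 1)) {
            if (!e.contains(exact)) {
                return false;
            }
        }
        return e.stream().anyMatch(t -> t.startsWith(prefix));
    }

    // Strict model: the prefix must match an entry word at a position that
    // no exact match has claimed.
    static boolean strictMatch(String entry, String query) {
        List<String> e = tokens(entry);
        List<String> q = tokens(query);
        String prefix = q.get(q.size() - 1);
        boolean[] used = new boolean[e.size()];
        for (String exact : q.subList(0, q.size() - 1)) {
            int hit = -1;
            for (int i = 0; i < e.size(); i++) {
                if (!used[i] && e.get(i).equals(exact)) {
                    hit = i;
                    break;
                }
            }
            if (hit < 0) {
                return false;
            }
            used[hit] = true; // this position can no longer serve the prefix
        }
        for (int i = 0; i < e.size(); i++) {
            if (!used[i] && e.get(i).startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "pen p";
        for (String entry : new String[] {"the pen is pretty", "the pen is fretty"}) {
            System.out.println(entry + " -> loose=" + looseMatch(entry, query)
                + ", strict=" + strictMatch(entry, query));
        }
    }
}
```

Under the loose model both entries match "pen p", because "p" is a prefix of "pen" itself; under the strict model only "the pen is pretty" matches. Whether the strict behaviour is the intended contract of the suggester is a question for the maintainers; the sketch only makes the difference between the two interpretations concrete.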