Emiliyan Sinigerov created LUCENE-10645:
-------------------------------------------
Summary: Wrong autocomplete suggestion
Key: LUCENE-10645
URL: https://issues.apache.org/jira/browse/LUCENE-10645
Project: Lucene - Core
Issue Type: Bug
Reporter: Emiliyan Sinigerov
I have a problem with autocomplete suggestions. I am using one of your own
tests to show where the bug is:
https://github.com/apache/lucene/blob/698f40ad51af0c42b0a4a8321ab89968e8d0860b/lucene/suggest/src/test/org/apache/lucene/search/suggest/analyzing/TestAnalyzingInfixSuggester.java
This is your test, and it passes:
  public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
  }
But if I add this line to the test, the test fails:
  suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
Here is the modified test:
  public void testBothExactAndPrefix() throws Exception {
    Analyzer a = new MockAnalyzer(random(), MockTokenizer.WHITESPACE, false);
    AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(newDirectory(), a, a, 3, false);
    suggester.build(new InputArrayIterator(new Input[0]));
    suggester.add(new BytesRef("the pen is pretty"), null, 10, new BytesRef("foobaz"));
    // Added line: a second suggestion that should NOT match "pen p".
    suggester.add(new BytesRef("the pen is fretty"), null, 10, new BytesRef("foobaz"));
    suggester.refresh();

    List<LookupResult> results =
        suggester.lookup(TestUtil.stringToCharSequence("pen p", random()), 10, true, true);
    assertEquals(1, results.size());
    assertEquals("the pen is pretty", results.get(0).key);
    assertEquals("the <b>pen</b> is <b>p</b>retty", results.get(0).highlightKey);
    assertEquals(10, results.get(0).value);
    assertEquals(new BytesRef("foobaz"), results.get(0).payload);
    suggester.close();
    a.close();
  }
We want to find every suggestion that matches "pen p". Only one entry,
"the pen is pretty", should match, but the results contain two matches:
"the pen is pretty" and "the pen is fretty".
I think the problem occurs when the query contains a full word (here "pen")
followed by a one-letter prefix that is also the first letter of that word
(here "p"). The suggester first matches the word "pen" and then matches the
prefix "p" against "pen" again, which is incorrect. The prefix "p" should have
to match a word other than "pen".
Thank you,
Emiliyan.
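
To make the suspected behaviour concrete, here is a minimal plain-Java sketch of the two matching rules described above. It is only a toy model and does not use or reproduce Lucene's code: the class InfixMatchSketch and the methods looseMatch and strictMatch are hypothetical names introduced for illustration. It assumes the query is split on whitespace, that every token except the last must occur as a whole word, and that the last token is treated as a prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of "exact terms plus trailing prefix" matching. Not Lucene code. */
final class InfixMatchSketch {

    /** Splits on whitespace, similar in spirit to MockTokenizer.WHITESPACE in the test. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /**
     * Loose rule (the behaviour the report suspects): every exact term must occur,
     * and the prefix may match ANY token, including one already matched exactly.
     */
    static boolean looseMatch(String suggestion, String query) {
        List<String> doc = tokens(suggestion);
        List<String> q = tokens(query);
        String prefix = q.get(q.size() - 1);
        for (String exact : q.subList(0, q.size() - 1)) {
            if (!doc.contains(exact)) {
                return false;
            }
        }
        return doc.stream().anyMatch(t -> t.startsWith(prefix));
    }

    /**
     * Strict rule (the behaviour the report asks for): each exact term consumes
     * one token, and the prefix must match one of the remaining tokens.
     */
    static boolean strictMatch(String suggestion, String query) {
        List<String> remaining = new ArrayList<>(tokens(suggestion));
        List<String> q = tokens(query);
        String prefix = q.get(q.size() - 1);
        for (String exact : q.subList(0, q.size() - 1)) {
            if (!remaining.remove(exact)) { // removes the first occurrence, if any
                return false;
            }
        }
        return remaining.stream().anyMatch(t -> t.startsWith(prefix));
    }

    public static void main(String[] args) {
        String query = "pen p";
        for (String s : new String[] {"the pen is pretty", "the pen is fretty"}) {
            System.out.println(s + " | loose=" + looseMatch(s, query)
                    + " | strict=" + strictMatch(s, query));
        }
    }
}
```

Under this model the loose rule accepts both suggestions, because "pen" itself starts with "p", while the strict rule accepts only "the pen is pretty". Whether the strict rule is the right contract for AnalyzingInfixSuggester is a separate question that this sketch does not settle.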