[
https://issues.apache.org/jira/browse/LUCENE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106265#comment-17106265
]
Michael McCandless commented on LUCENE-9365:
--------------------------------------------
{quote}
bq. so +1 to make FuzzyQuery lenient to these cases and rewrite itself to
PrefixQuery or RegexpQuery instead.
Would this mean we need to add a max length option to PrefixQuery?
{quote}
OK, let me narrow my +1 a bit ;)
I'm +1 to having {{FuzzyQuery}} be lenient by allowing this strange case where
{{prefix == term.text().length()}} and implementing it "correctly", to make it
less trappy for users.
But I'm less clear on how exactly we should implement that. You're right, if
we rewrite to {{PrefixQuery}} then we must also add a max length option to it.
Maybe that is indeed a useful option to expose publicly to {{PrefixQuery}}
users? That would let users cap how many characters are allowed after the
prefix.
Alternatively, we could just rewrite to an anonymous {{AutomatonQuery}} that
accepts exactly the term as a prefix, followed by at most {{edit-distance}}
additional arbitrary characters?
I'm not sure which approach is better ... I think I would favor the first
option.
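
To make the two options above concrete, here is a rough plain-Java sketch of the set of terms each one would accept. It does not use any Lucene classes: {{FullPrefixFuzzySketch}}, {{cappedPrefixMatch}} and {{termPlusUpTo}} are hypothetical names invented for illustration, and {{java.util.regex}} only stands in for the automaton. Under those assumptions, both options describe the same language: the term itself followed by at most {{maxExtra}} further characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Lucene: models which indexed terms should
// match when prefixLength == term.length() and maxExtra edits are allowed.
class FullPrefixFuzzySketch {

    // Option 1 as a predicate: a prefix match with a cap on how many
    // characters (counted as code points) may follow the prefix.
    static boolean cappedPrefixMatch(String term, int maxExtra, String candidate) {
        if (!candidate.startsWith(term)) {
            return false;
        }
        int extra = candidate.codePointCount(term.length(), candidate.length());
        return extra <= maxExtra;
    }

    // Option 2 as a pattern: the literal term, then 0..maxExtra arbitrary
    // characters. A regex is used here only as a stand-in for an automaton.
    static Pattern termPlusUpTo(String term, int maxExtra) {
        return Pattern.compile(Pattern.quote(term) + ".{0," + maxExtra + "}", Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = termPlusUpTo("bba", 1);
        for (String candidate : new String[] {"bba", "bbab", "bbabc", "bbb"}) {
            System.out.println(candidate
                + " capped=" + cappedPrefixMatch("bba", 1, candidate)
                + " pattern=" + p.matcher(candidate).matches());
        }
    }
}
```

With the term {{bba}} and one allowed edit, both checks accept {{bba}} and {{bbab}} and reject {{bbabc}} and {{bbb}}, so the choice between the two options looks like a question of API surface rather than of matching behaviour.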
> Fuzzy query has a false negative when prefix length == search term length
> --------------------------------------------------------------------------
>
> Key: LUCENE-9365
> URL: https://issues.apache.org/jira/browse/LUCENE-9365
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/query/scoring
> Reporter: Mark Harwood
> Priority: Major
>
> When using FuzzyQuery the search string `bba` does not match doc value `bbab`
> with an edit distance of 1 and prefix length of 3.
> In FuzzyQuery an automaton is created for the "suffix" part of the search
> string which in this case is an empty string.
> In this scenario maybe the FuzzyQuery should rewrite to a WildcardQuery of
> the following form:
> {code:java}
> searchString + "?"
> {code}
> ... where there's an appropriate number of ? characters according to the edit
> distance.
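
As a rough illustration of the report above, the following plain-Java sketch checks that {{bba}} and {{bbab}} really are one edit apart, and builds the wildcard patterns the suggested rewrite would need. It uses no Lucene classes: {{WildcardRewriteSketch}}, {{levenshtein}} and {{wildcardPatterns}} are hypothetical names, and the distance is plain Levenshtein (no transpositions), which is enough for this example. Because {{?}} in {{WildcardQuery}} syntax matches exactly one character, the sketch assumes one pattern per number of trailing characters from 0 up to the edit distance, and it assumes the search string contains no wildcard special characters that would need escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Lucene.
class WildcardRewriteSketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Builds searchString followed by 0, 1, ..., maxEdits '?' characters.
    // Assumption: searchString has no characters that need wildcard escaping.
    static List<String> wildcardPatterns(String searchString, int maxEdits) {
        List<String> patterns = new ArrayList<>();
        StringBuilder sb = new StringBuilder(searchString);
        patterns.add(sb.toString());
        for (int i = 1; i <= maxEdits; i++) {
            sb.append('?');
            patterns.add(sb.toString());
        }
        return patterns;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("bba", "bbab"));
        System.out.println(wildcardPatterns("bba", 1));
    }
}
```

For the reported case the distance is 1, so {{bbab}} is within the requested edit distance, and the patterns produced are {{bba}} and {{bba?}}. A single {{searchString + "?"}} pattern on its own would no longer match the exact term {{bba}}.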