jpountz commented on issue #14168: URL: https://github.com/apache/lucene/issues/14168#issuecomment-2615747790
`PhraseWildcardSearch` is appealing, but its implementation makes trade-offs to work around the fact that it is inefficient whenever any of the wildcards expands to many terms. With a low-cardinality vocabulary this is probably fine. Otherwise (e.g. English content), your queries may either be extremely costly if `maxMultiTermExpansions` is high, or miss matches (possibly all of them) if `maxMultiTermExpansions` is low. This makes me a bit uneasy about exposing it out of the box, as it could take users by surprise.

For reference, other approaches to wildcard search make different trade-offs. One is indexing (edge) n-grams, so that wildcard expressions can be searched as simple terms (what Elasticsearch does when you configure `text` fields with `index_prefixes: true`). Another is indexing n-grams (typically n=3) of the whole input, using those n-grams to find a superset of the matches, and then verifying the wildcard phrase against the raw data of that superset (what the Elasticsearch `wildcard` field does under the hood).
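To make the second approach concrete, here is a minimal, hypothetical sketch of the "n-grams for a candidate superset, then verify against raw data" idea. It is not Lucene or Elasticsearch code: the class name `TrigramWildcardIndex` and its methods are made up for illustration. It assumes an in-memory index, no case folding, `*`/`?` as the only wildcard characters, and a pattern that must match the whole stored value (wrap it in `*...*` to match anywhere). Note that it never expands wildcards into terms, so there is no `maxMultiTermExpansions`-style cap that could silently drop matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical trigram-filter + verification sketch (not Lucene/Elasticsearch code). */
public class TrigramWildcardIndex {
    private static final int N = 3; // trigram size, the typical choice

    private final List<String> docs = new ArrayList<>();                 // raw values, kept for verification
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // trigram -> doc ids

    /** Stores the raw value and indexes every trigram of it; returns the doc id. */
    public int add(String value) {
        int id = docs.size();
        docs.add(value);
        for (int i = 0; i + N <= value.length(); i++) {
            postings.computeIfAbsent(value.substring(i, i + N), k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Phase 1: intersect postings of every trigram found in the pattern's literal fragments. */
    public Set<Integer> candidates(String pattern) {
        Set<Integer> result = null; // null = no trigram constraint seen yet
        for (String fragment : pattern.split("[*?]")) {
            for (int i = 0; i + N <= fragment.length(); i++) {
                Set<Integer> ids = postings.getOrDefault(fragment.substring(i, i + N), Set.of());
                if (result == null) {
                    result = new TreeSet<>(ids);
                } else {
                    result.retainAll(ids);
                }
                if (result.isEmpty()) {
                    return result; // no document can match, stop early
                }
            }
        }
        if (result == null) { // fragments too short for trigrams: every doc is a candidate
            result = new TreeSet<>();
            for (int id = 0; id < docs.size(); id++) result.add(id);
        }
        return result;
    }

    /** Phase 2: check each candidate's raw value against the full wildcard pattern. */
    public List<Integer> search(String pattern) {
        Pattern regex = toRegex(pattern);
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates(pattern)) {
            if (regex.matcher(docs.get(id)).matches()) hits.add(id);
        }
        return hits;
    }

    /** '*' matches any run of characters, '?' exactly one; everything else is literal. */
    static Pattern toRegex(String pattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        TrigramWildcardIndex index = new TrigramWildcardIndex();
        for (String doc : new String[] {"the quick brown fox", "quick brown dogs",
                "the quack brown fox", "brown fox quick", "dogs chase the cat"}) {
            index.add(doc);
        }
        System.out.println(index.search("*quick brown*")); // [0, 1]
        System.out.println(index.search("*qu?ck brown*")); // [0, 1, 2]
        System.out.println(index.candidates("the*dogs") + " -> " + index.search("the*dogs")); // [4] -> []
    }
}
```

The last line shows why verification is needed: document 4 contains every trigram of `the*dogs`, so the trigram filter keeps it, but its raw text does not actually match the pattern. The cost of this design is storing the raw values and scanning candidates, in exchange for queries whose correctness does not depend on a term-expansion limit.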