Re: [I] Query parser support for wildcards in phrase queries [lucene]

via GitHub Mon, 27 Jan 2025 05:39:23 -0800


jpountz commented on issue #14168:
URL: https://github.com/apache/lucene/issues/14168#issuecomment-2615747790


   `PhraseWildcardSearch` is appealing, but its implementation makes trade-offs 
to work around the fact that it doesn't work efficiently if any of the 
wildcards expands to many terms. If you have a low-cardinality vocabulary, this 
is probably fine, but otherwise (e.g. English content), your queries may either 
be extremely costly if `maxMultiTermExpansions` is high, or miss matches 
(possibly all of them) if `maxMultiTermExpansions` is low. This makes me a bit 
uneasy about exposing it out of the box as it could take users by surprise.
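
To make the trade-off concrete, here is a toy sketch of how an expansion cap interacts with a phrase such as `se* engine`. It is not Lucene's implementation: `expand_wildcard`, `phrase_search`, and the sorted-order truncation are assumptions made for illustration, and only the idea of a cap comes from `maxMultiTermExpansions` above.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, vocabulary, max_expansions):
    """Expand a wildcard into concrete terms, keeping at most max_expansions.

    Truncating in sorted order is an assumption made for this sketch.
    """
    matching = sorted(t for t in vocabulary if fnmatchcase(t, pattern))
    return matching[:max_expansions]


def phrase_search(docs, first_pattern, second_term, max_expansions):
    """Match the two-term phrase '<first_pattern> <second_term>' in docs."""
    vocabulary = {tok for text in docs.values() for tok in text.split()}
    expansions = set(expand_wildcard(first_pattern, vocabulary, max_expansions))
    hits = set()
    for doc_id, text in docs.items():
        tokens = text.split()
        # A hit needs an expanded term immediately followed by second_term.
        for left, right in zip(tokens, tokens[1:]):
            if left in expansions and right == second_term:
                hits.add(doc_id)
    return sorted(hits)


DOCS = {1: "search engine", 2: "second engine", 3: "secret engine"}

# A low cap drops "secret" from the expansion, so document 3 is silently missed.
low_cap_hits = phrase_search(DOCS, "se*", "engine", max_expansions=2)
# A high cap finds everything, but the work grows with the number of expansions.
high_cap_hits = phrase_search(DOCS, "se*", "engine", max_expansions=10)
```

With three documents the difference is one missed hit; with a large vocabulary the same cap can drop most or all of the matching terms.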
   
   For reference, other approaches to wildcard search make different 
trade-offs. One is to index (edge) n-grams, so that your wildcard expressions 
can actually be indexed and searched as simple terms (what Elasticsearch does 
when you configure `text` fields with `index_prefixes`). Another is to index 
n-grams (typically with n=3) for the whole input, use these n-grams to find a 
superset of the matches, and then verify the wildcard phrase against the raw 
data of that superset (what the Elasticsearch `wildcard` field does under the 
hood).
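
The n-gram-then-verify approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Elasticsearch implements its `wildcard` field: the helper names (`trigrams`, `build_index`, `wildcard_search`) are hypothetical, `*` is taken to match within a single whitespace-delimited token, and the index is a plain in-memory dictionary.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def wildcard_search(docs, index, pattern):
    """Return (candidates, matches) for a wildcard phrase such as 'sea* eng*'."""
    parts = pattern.split("*")
    # Step 1: trigrams of the literal chunks must all appear in a matching doc.
    required = set().union(*(trigrams(p) for p in parts if p))
    if required:
        candidates = set.intersection(*(index.get(g, set()) for g in required))
    else:
        candidates = set(docs)  # no usable trigram: every doc is a candidate
    # Step 2: verify each candidate against the raw text. The lookarounds
    # anchor the match to token boundaries; '*' becomes '\S*' (assumption).
    regex = re.compile(
        r"(?<!\S)" + r"\S*".join(re.escape(p) for p in parts) + r"(?!\S)"
    )
    matches = {d for d in candidates if regex.search(docs[d])}
    return sorted(candidates), sorted(matches)


DOCS = {
    1: "search engine basics",
    2: "the sea has engines",
    3: "seasonal engineering",
    4: "research engine",
    5: "deep learning",
}
candidates, matches = wildcard_search(DOCS, build_index(DOCS), "sea* eng*")
```

Documents 2 and 4 contain every required trigram but not the phrase itself, so they survive the first step and are rejected by verification. The index lookup only narrows the set of documents whose raw text has to be checked.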


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

