[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542440#comment-17542440 ]
Uwe Schindler edited comment on LUCENE-10562 at 5/26/22 10:58 AM:
------------------------------------------------------------------

Hi [~zhuming],

this question is better asked on the user mailing list. The short answer: if you use {{TopTermsScoringBooleanQueryRewrite}}, you have to live with the consequences.

As said several times in this issue: if you need wildcard queries, think about changing your analysis so that you can run the same queries in a performant way (e.g., by using ngrams in the analysis). It is impossible to implement wildcard queries efficiently in an inverted index, because the term expansion is always done before the query is executed and cannot take any other query clauses into account. There is no way to select only those terms in the first query that would also produce a hit for the second query (your filter), as there is no relationship between them at all.

In addition: scoring wildcard queries like that, "hoping for something", does not look like the right way to solve your problem.

was (Author: thetaphi):

Hi [~zhuming],

this question is better asked on the user mailing list. The short answer: if you use {{TopTermsScoringBooleanQueryRewrite}}, you have to live with the consequences.

As said several times in this issue: if you need wildcard queries, think about changing your analysis so that you can run the same queries in a performant way (e.g., by using ngrams in the analysis). It is impossible to implement wildcard queries efficiently in an inverted index, because the term expansion is always done before the query is executed and cannot take any other query clauses into account. There is no way to select only those terms in the first query that would also produce a hit for the second query (your filter), as there is no relationship between them at all.

In addition: scoring wildcard queries like that is not the right way to solve your problem.
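To illustrate the point made in the comment above, here is a minimal sketch in plain Java. It does not use the Lucene API; the class {{WildcardExpansionSketch}} and all of its method names are hypothetical and exist only for this illustration. It assumes a toy inverted index held in a sorted map. The infix expansion has to visit every term in the dictionary and has no access to any filter, whereas an index built from n-grams (trigrams are assumed here) turns the same search into direct term lookups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class WildcardExpansionSketch {

    /** Splits a term into overlapping n-grams: ("search", 3) gives sea, ear, arc, rch. */
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    /** Toy inverted index: whole term -> ids of the documents containing it. */
    static Map<String, Set<Integer>> buildTermIndex(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    /** Toy n-gram index: every n-gram of every term -> ids of the documents containing it. */
    static Map<String, Set<Integer>> buildNgramIndex(Map<Integer, String> docs, int n) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\s+")) {
                for (String gram : ngrams(term, n)) {
                    index.computeIfAbsent(gram, g -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    /**
     * Expands *infix* into concrete terms. Every term in the dictionary is visited,
     * and no filter is available at this stage, so the cost is independent of how
     * few documents a separate filter clause would match.
     */
    static List<String> expandInfix(Map<String, Set<Integer>> termIndex, String infix) {
        List<String> matches = new ArrayList<>();
        for (String term : termIndex.keySet()) {
            if (term.contains(infix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    /**
     * Looks up the n-grams of the infix directly and intersects their postings.
     * The result is a candidate set: grams may come from different positions or
     * words, so a real system would still verify each candidate. An infix shorter
     * than n produces no grams and therefore an empty result.
     */
    static Set<Integer> ngramCandidates(Map<String, Set<Integer>> ngramIndex, String infix, int n) {
        Set<Integer> result = null;
        for (String gram : ngrams(infix, n)) {
            Set<Integer> hits = ngramIndex.getOrDefault(gram, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "full text search engine");
        docs.put(2, "research paper archive");
        docs.put(3, "unrelated content");

        System.out.println(expandInfix(buildTermIndex(docs), "search"));            // [research, search]
        System.out.println(ngramCandidates(buildNgramIndex(docs, 3), "search", 3)); // [1, 2]
    }
}
```

The trade-off in this sketch is a larger index in exchange for lookups that no longer scan the term dictionary. Whether that is acceptable for a given installation depends on the data and is a question for the user mailing list.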
> Large system: Wildcard search leads to full index scan despite filter query
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10562
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10562
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.11.1
>            Reporter: Henrik Hertel
>            Priority: Major
>              Labels: performance
>
> I use Solr and have a large system with 1TB in one core and about 5 million
> documents. The textual content of large PDF files is indexed there. My query
> is extremely slow (more than 30 seconds) as soon as I use wildcards e.g.
> {code:java}
> *searchvalue*
> {code}
> , even though I put a filter query in front of it that reduces to less than
> 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> My query:
> {code:java}
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fl=id&rows=50&start=0
> {code}
> I've tried everything imaginable. It doesn't make sense to me why a search
> over a small subset should take so long. If I omit the filter query
> metadataitemids_is:20950, so search the entire inventory, then it also takes
> the same amount of time. Therefore, I suspect that despite the filter query,
> the main query runs over the entire index.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)