[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542341#comment-17542341 ]
Ming Zhu edited comment on LUCENE-10562 at 5/26/22 4:39 AM:
------------------------------------------------------------

I'm encountering a similar issue, but the impact goes beyond performance.

My case: I have a wildcard query with a filter, say wildcard:'*searchvalue*' with a term filter 'status':'open', and I'm using TopTermsScoringBooleanQueryRewrite in the hope of getting meaningful relevance scores to sort the hits.

In my data set there are millions of documents where status is NOT open, and only a handful with status:open. The problem is that in the top-terms rewrite, all the terms that are relevant for the documents with status:open are ranked very low (because of their low frequencies). I can't keep increasing the number of terms taken in the rewrite phase, since that may run into the max clause limit. So this query + filter ends up matching nothing.

Any idea how to get out of this situation? Thanks. [~uschindler] [~tomoko]
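To make the failure mode in the comment concrete, here is a toy model in plain Java. It does not use Lucene. The class, the sample terms and the document IDs are all hypothetical, and ranking terms by document frequency is taken from the comment's description rather than checked against Lucene's actual rewrite implementation. The sketch only shows how cutting the term list to a fixed size before the filter is applied can leave nothing for the filter to match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TopTermsFilterSketch {

    /** Keeps at most maxTerms terms containing the infix, most frequent first (assumed ranking). */
    static List<String> topTerms(Map<String, Set<Integer>> postings, String infix, int maxTerms) {
        List<String> matching = new ArrayList<>();
        for (String term : postings.keySet()) {
            if (term.contains(infix)) {
                matching.add(term);
            }
        }
        // Highest document frequency first; ties are broken alphabetically.
        matching.sort(Comparator
                .comparingInt((String t) -> postings.get(t).size()).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return matching.subList(0, Math.min(maxTerms, matching.size()));
    }

    /** Unions the postings of the surviving terms, then applies the filter. */
    static Set<Integer> search(Map<String, Set<Integer>> postings, String infix,
                               int maxTerms, Set<Integer> filterDocs) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : topTerms(postings, infix, maxTerms)) {
            hits.addAll(postings.get(term));
        }
        hits.retainAll(filterDocs); // the filter only sees documents of terms that survived the cut
        return hits;
    }

    /** Hypothetical index: documents 1-6 are "closed", document 100 is "open". */
    static Map<String, Set<Integer>> sampleIndex() {
        Map<String, Set<Integer>> postings = new TreeMap<>();
        postings.put("xsearchvaluea", Set.of(1, 2, 3)); // frequent, closed documents only
        postings.put("xsearchvalueb", Set.of(4, 5));    // frequent, closed documents only
        postings.put("ysearchvaluez", Set.of(100));     // rare, the only term in an open document
        postings.put("unrelated", Set.of(6, 100));      // does not match the wildcard
        return postings;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = sampleIndex();
        Set<Integer> openDocs = Set.of(100);
        System.out.println(search(postings, "searchvalue", 2, openDocs)); // prints []
        System.out.println(search(postings, "searchvalue", 3, openDocs)); // prints [100]
    }
}
```

With a limit of two terms, the only term that occurs in an open document is cut before the filter runs, so the result is empty. Raising the limit to three brings the hit back, which in this model corresponds to the option the comment rules out because of the max clause limit.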
> Large system: Wildcard search leads to full index scan despite filter query
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10562
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10562
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.11.1
>            Reporter: Henrik Hertel
>            Priority: Major
>              Labels: performance
>
> I use Solr and have a large system with 1TB in one core and about 5 million
> documents. The textual content of large PDF files is indexed there. My query
> is extremely slow (more than 30 seconds) as soon as I use wildcards e.g.
> {code:java}
> *searchvalue*
> {code}
> , even though I put a filter query in front of it that reduces to less than
> 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> My query:
> {code:java}
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fl=id&rows=50&start=0
> {code}
> I've tried everything imaginable. It doesn't make sense to me why a search
> over a small subset should take so long. If I omit the filter query
> metadataitemids_is:20950, so search the entire inventory, then it also takes
> the same amount of time. Therefore, I suspect that despite the filter query,
> the main query runs over the entire index.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)