[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533982#comment-17533982 ]
Tomoko Uchida edited comment on LUCENE-10562 at 5/9/22 7:43 PM:
----------------------------------------------------------------

Yes, it's not about "query execution (retrieving the inverted index)" but "which postings you are traversing": there is no way to optimize when so many postings match the wildcard term. As a practical tip, I usually enforce two or three leading literal characters for wildcard queries when the search space needs to be reduced.

{quote}Consider using the reverse wildcard filter in Solr (there's documentation about this). But this won't help if you need a wildcard on both sides of the term{quote}

-I think you could run a conjunction query over two fields (one for the normal text field, one for the reversed field) to support the infix wildcard query- - not sure it is worth adding another text field when the index is already large.

Anyway, dictionary-based decomposition looks promising to me in the case of German (though I have little knowledge of it beyond the basics I learned in university lectures).

Correction: a conjunction query does not work in this situation - sorry. nGram or a more sophisticated term decomposition will be needed.
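The tip about requiring two or three leading characters could be enforced with a small guard before the wildcard query is built. A minimal sketch, assuming a minimum of three literal characters; the class and method names ({{WildcardGuard}}, {{hasUsablePrefix}}, {{MIN_PREFIX}}) are made up for illustration and are not part of the Lucene or Solr API:

{code:java}
// Hypothetical guard: reject wildcard terms whose literal prefix is too
// short to narrow the term dictionary scan. Names are illustrative only.
public class WildcardGuard {
    static final int MIN_PREFIX = 3;

    // Count the literal characters before the first '*' or '?'.
    static boolean hasUsablePrefix(String term) {
        int literal = 0;
        while (literal < term.length()
                && term.charAt(literal) != '*'
                && term.charAt(literal) != '?') {
            literal++;
        }
        return literal >= MIN_PREFIX;
    }

    public static void main(String[] args) {
        System.out.println(hasUsablePrefix("search*"));  // true: 6 literal chars
        System.out.println(hasUsablePrefix("*search*")); // false: leading wildcard
        System.out.println(hasUsablePrefix("se*"));      // false: only 2 literal chars
    }
}
{code}

A guard like this keeps patterns such as {{*searchvalue*}} from expanding against the entire term dictionary in the first place.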
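To illustrate the nGram route: decomposing each token into character trigrams turns an infix pattern like {{*uch*}} into an exact lookup of the term "uch" (plus a verification step), instead of a scan over the whole term dictionary. The sketch below is a simplified stand-in for what an nGram token filter configured with minGram=maxGram=3 produces; the class and method names are illustrative only:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch of character trigram decomposition (illustrative names; not a
// Lucene/Solr API). Each trigram becomes its own indexed term.
public class Trigrams {
    static List<String> trigrams(String token) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= token.length(); i++) {
            grams.add(token.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        // e.g. a German token decomposed into overlapping trigrams
        System.out.println(trigrams("suchwert")); // [suc, uch, chw, hwe, wer, ert]
    }
}
{code}

The trade-off is index size: every token contributes roughly (length - 2) extra terms, which matters for a 1 TB core like the one described in this issue.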
> Large system: Wildcard search leads to full index scan despite filter query
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-10562
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10562
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.11.1
>            Reporter: Henrik Hertel
>            Priority: Major
>              Labels: performance
>
> I use Solr and have a large system with 1 TB in one core and about 5 million documents. The textual content of large PDF files is indexed there. My query becomes extremely slow (more than 30 seconds) as soon as I use wildcards, e.g.
> {code:java}
> *searchvalue*
> {code}
> even though I put a filter query in front of it that reduces the result set to fewer than 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> My query:
> {code:java}
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fl=id&rows=50&start=0
> {code}
> I've tried everything imaginable. It doesn't make sense to me why a search over a small subset should take so long. If I omit the filter query metadataitemids_is:20950, i.e. search the entire inventory, it takes the same amount of time. Therefore, I suspect that despite the filter query, the main query runs over the entire index.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)