[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

Uwe Schindler (Jira) Thu, 26 May 2022 03:59:31 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542440#comment-17542440
 ]


Uwe Schindler edited comment on LUCENE-10562 at 5/26/22 10:58 AM:
------------------------------------------------------------------

Hi [~zhuming],
this question is better asked on the user mailing list.

The short answer: if you use {{TopTermsScoringBooleanQueryRewrite}}, you have 
to live with the consequences. As said several times in this issue: if you need 
wildcard queries, think about changing your analysis (e.g., by using ngrams) so 
that you can run the same queries in a performant way. It is impossible to 
implement wildcard queries efficiently in an inverted index, because the term 
expansion always happens before the query executes and cannot take any other 
query clauses into account: there is no way to select only those terms of the 
first query that would also produce a hit for the second query (your filter), 
as there is no relationship between them at all.
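
To illustrate the point above, here is a toy sketch in Python. It is not Lucene 
code and does not reflect Lucene's actual data structures; all function names 
and the sample documents are made up for illustration. It assumes a plain 
term-to-documents map as the "inverted index" and shows that an infix wildcard 
has to visit every term in the dictionary regardless of the filter, whereas an 
ngram-analyzed index only needs a handful of direct lookups.

```python
from collections import defaultdict


def words(text):
    # Hypothetical "analysis" step: lowercase and split on whitespace.
    return text.lower().split()


def trigrams(text):
    # Alternative analysis: every 3-character substring of every word.
    return [w[i:i + 3] for w in words(text) for i in range(len(w) - 2)]


def build_index(docs, analyze):
    # Toy inverted index: term -> set of ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def wildcard_search(index, infix, allowed=None):
    # *infix*: every term in the dictionary has to be tested, because the
    # dictionary knows nothing about which documents the filter allows.
    terms_visited = 0
    hits = set()
    for term, postings in index.items():
        terms_visited += 1
        if infix in term:
            hits |= postings
    if allowed is not None:
        hits &= allowed  # the filter only applies after the expansion
    return hits, terms_visited


def ngram_search(index, docs, infix):
    # Look up only the trigrams of the search value and intersect postings.
    grams = trigrams(infix)
    if not grams:
        raise ValueError("search value must have at least 3 characters")
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Matching trigrams may come from different words, so verify candidates.
    hits = {d for d in candidates if any(infix in w for w in words(docs[d]))}
    return hits, len(grams)


docs = {
    1: "quarterly searchvalue report",
    2: "presearchvaluepost appendix",
    3: "unrelated invoice text",
}
word_index = build_index(docs, words)
gram_index = build_index(docs, trigrams)

print(wildcard_search(word_index, "searchvalue"))
print(wildcard_search(word_index, "searchvalue", allowed={2}))
print(ngram_search(gram_index, docs, "searchvalue"))
```

In this toy model the number of visited terms is the same with and without the 
filter, while the ngram variant performs one lookup per trigram of the search 
value. The price is a larger index, since every word is stored as many grams.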

In addition: Scoring of wildcard queries like that - "hoping for something" - 
does not look like the right way to solve your problem.


was (Author: thetaphi):
Hi [~zhuming],
this question is better asked on the user mailing list.

The short answer: if you use {{TopTermsScoringBooleanQueryRewrite}}, you have 
to live with the consequences. As said several times in this issue: if you need 
wildcard queries, think about changing your analysis (e.g., by using ngrams) so 
that you can run the same queries in a performant way. It is impossible to 
implement wildcard queries efficiently in an inverted index, because the term 
expansion always happens before the query executes and cannot take any other 
query clauses into account: there is no way to select only those terms of the 
first query that would also produce a hit for the second query (your filter), 
as there is no relationship between them at all.

In addition: scoring wildcard queries like that is not the right way to 
solve your problem.

> Large system: Wildcard search leads to full index scan despite filter query
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10562
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10562
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.11.1
>            Reporter: Henrik Hertel
>            Priority: Major
>              Labels: performance
>
> I use Solr and have a large system with 1TB in one core and about 5 million 
> documents. The textual content of large PDF files is indexed there. My query 
> is extremely slow (more than 30 seconds)  as soon as I use wildcards e.g. 
> {code:java}
> *searchvalue*
> {code}
> , even though I put a filter query in front of it that reduces to less than 
> 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> My query:
> {code:java}
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fl=id&rows=50&start=0
>  {code}
> I've tried everything imaginable. It doesn't make sense to me why a search 
> over a small subset should take so long. If I omit the filter query 
> metadataitemids_is:20950 and thus search the entire index, it takes the 
> same amount of time. Therefore, I suspect that despite the filter query, 
> the main query runs over the entire index.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
