[jira] [Comment Edited] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

Uwe Schindler (Jira) Mon, 09 May 2022 08:39:32 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533857#comment-17533857
 ]


Uwe Schindler edited comment on LUCENE-10562 at 5/9/22 3:38 PM:
----------------------------------------------------------------

To explain why this is slow: it has nothing to do with filters or with which 
query runs first. The problem arises earlier: wildcard queries are expanded to 
filter bitsets / large OR queries during query preprocessing (rewrite mode). 
This happens before the actual query is executed. So as soon as you have a 
wildcard with many matching terms, the preprocessing takes a significant amount 
of time. The actual query execution is fast and can be optimized. Because of the 
way an inverted index is built, there's no way to use another query to 
limit the amount of preprocessing work. The preprocessing time is linear in the 
total number of terms in a field, not in the size of the index or the number of 
documents.
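
A rough way to picture this is the toy model below. It is only a sketch under 
stated assumptions, not Lucene's actual implementation: the names POSTINGS, 
TERMS, expand_infix, expand_prefix and search are hypothetical, and a plain 
sorted list stands in for the field's term dictionary. It illustrates that 
expanding an infix wildcard has to inspect the terms of the field before any 
document is looked at, while a prefix can seek into the sorted terms, and that 
the filter only comes into play afterwards.

```python
import bisect
import fnmatch

# Toy inverted index for a single field: term -> set of document ids.
# Illustrative stand-in only; Lucene's real term dictionary is different.
POSTINGS = {
    "alpha": {1, 2},
    "research": {3},
    "search": {2, 4},
    "searchvalue": {5},
    "xsearchvaluey": {6},
    "zeta": {7},
}
TERMS = sorted(POSTINGS)  # the term dictionary is kept in sorted order


def expand_infix(pattern):
    """Expand a pattern such as *foo*: every term of the field is inspected."""
    visited = 0
    matches = []
    for term in TERMS:
        visited += 1
        if fnmatch.fnmatchcase(term, pattern):
            matches.append(term)
    return matches, visited


def expand_prefix(prefix):
    """Expand foo*: seek to the prefix in the sorted terms, stop at first miss."""
    visited = 0
    matches = []
    start = bisect.bisect_left(TERMS, prefix)
    for term in TERMS[start:]:
        visited += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, visited


def search(matching_terms, filter_docs):
    """Execution step: union the postings, then intersect with the filter."""
    docs = set()
    for term in matching_terms:
        docs |= POSTINGS[term]
    return docs & filter_docs
```

In this model the filter set is only consulted in search(), after the 
expansion has already walked the terms, so a very selective filter does not 
reduce the expansion cost.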


was (Author: thetaphi):
To explain why this is slow: it has nothing to do with filters or with which 
query runs first. The problem arises earlier: wildcard queries are expanded to 
filter bitsets / large OR queries during query preprocessing (rewrite mode). 
This happens before the actual query is executed. So as soon as you have a 
wildcard with many matching terms, the preprocessing takes a significant amount 
of time. The actual query execution is fast and can be optimized. Because of the 
way an inverted index is built, there's no way to use another query to 
limit the amount of preprocessing work.

> Large system: Wildcard search leads to full index scan despite filter query
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10562
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10562
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.11.1
>            Reporter: Henrik Hertel
>            Priority: Major
>              Labels: performance
>
> I use Solr and have a large system with 1TB in one core and about 5 million 
> documents. The textual content of large PDF files is indexed there. My query 
> is extremely slow (more than 30 seconds) as soon as I use wildcards, e.g. 
> {code:java}
> *searchvalue*
> {code}
> even though I put a filter query in front of it that reduces the result set 
> to fewer than 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> My query:
> {code:java}
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fl=id&rows=50&start=0
>  {code}
> I've tried everything imaginable. It doesn't make sense to me why a search 
> over a small subset should take so long. If I omit the filter query 
> metadataitemids_is:20950 and thus search the entire collection, it takes 
> the same amount of time. Therefore, I suspect that despite the filter query, 
> the main query runs over the entire index.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
