[jira] [Updated] (LUCENE-10562) Large system: Wildcard search leads to full index scan despite filter query

Henrik Hertel (Jira) Sun, 08 May 2022 09:56:03 -0700


     [ 
https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Henrik Hertel updated LUCENE-10562:
-----------------------------------
    Description: 
I use Solr and have a large system with 1TB in one core and about 5 million 
documents. The textual content of large PDF files is indexed there. My query is 
extremely slow as soon as I use a leading and trailing wildcard, e.g. 
{code:java}
*searchvalue*
{code}
even though I apply a filter query that reduces the result set to fewer than 
20 documents.

searchvalue -> less than 1 second
searchvalue* -> less than 1 second
*searchvalue* -> more than 30 seconds

My query:
select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fq=renditions_ss%3A&fl=id&rows=50&start=0
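
For readability, the same request can be assembled from its individual parameters. This is only a sketch using Python's standard library: the variable names are hypothetical, and the renditions_ss filter is left without a value after the colon because that is how it appears in the query above.

```python
from urllib.parse import urlencode, parse_qs

# Parameters taken from the query in the report. The value after
# "renditions_ss:" is absent in the report and is left that way here.
params = [
    ("defType", "lucene"),
    ("q", "content_t:*searchvalue*"),      # main query: leading and trailing wildcard
    ("fq", "metadataitemids_is:20950"),    # filter query that narrows the result set
    ("fq", "renditions_ss:"),
    ("fl", "id"),
    ("rows", "50"),
    ("start", "0"),
]

# urlencode percent-encodes reserved characters such as ':' and '*'.
query_string = urlencode(params)
request_path = "select?" + query_string

# Decoding the string again recovers the original parameter values.
decoded = parse_qs(query_string)
print(request_path)
```

Keeping the main query (q) and the filter queries (fq) as separate entries makes it easy to drop or swap one of them when comparing timings.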

I've tried everything imaginable. It doesn't make sense to me why a search over 
a small subset should take so long. If I omit the filter query 
metadataitemids_is:20950 and search the entire index, the query takes the same 
amount of time. I therefore suspect that, despite the filter query, the main 
query runs over the entire index.
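
One possible explanation for the timings above, offered as an illustration and not as a confirmed diagnosis: a prefix pattern can seek into a sorted list of terms, while a pattern with a leading wildcard has to test every term, no matter how few documents the filter leaves. The sketch below is a toy model in Python, not Lucene's implementation; the function names and the sample terms are hypothetical.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Find terms matching prefix* and count how many terms were examined."""
    # Binary search jumps straight to the first candidate term.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    examined = 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches, examined


def infix_matches(sorted_terms, needle):
    """Find terms matching *needle* and count how many terms were examined."""
    # There is no position to seek to, so every term is tested.
    matches = [term for term in sorted_terms if needle in term]
    return matches, len(sorted_terms)


# Toy term list: 10,000 filler terms plus three terms containing the needle.
terms = sorted(
    [f"t{i:05d}" for i in range(10000)]
    + ["searchvalue", "searchvalues", "xsearchvaluey"]
)

prefix_hits, prefix_cost = prefix_matches(terms, "searchvalue")
infix_hits, infix_cost = infix_matches(terms, "searchvalue")
print(prefix_cost, infix_cost)
```

In this toy model the amount of work for the infix pattern depends only on the number of distinct terms, which would be consistent with the observation that removing the filter query does not change the response time.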

  was:
I use Solr and have a large system with 1TB in one core and about 5 million 
documents. The textual content of large PDF files is indexed there. My query is 
extremely slow as soon as I use wildcards e.g. *{*}searchvalue{*}*, even though 
I put a filter query in front of it that reduces to less than 20 documents.

searchvalue -> less than 1 second
searchvalue* -> less than 1 second
*{*}searchvalue{*}*-> more than 30 seconds

My query:
select?defType=lucene&q=content_t:*{*}searchvalue{*}*&fq=metadataitemids_is:20950&fq=renditions_ss%3A&fl=id&rows=50&start=0

I've tried everything imaginable. It doesn't make sense to me why a search over 
a small subset should take so long. If I omit the filter query 
metadataitemids_is:20950, so search the entire inventory, then it also takes 
the same amount of time. Therefore, I suspect that despite the filter query, 
the main query runs over the entire index.


> Large system: Wildcard search leads to full index scan despite filter query
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-10562
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10562
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.11.1
>            Reporter: Henrik Hertel
>            Priority: Major
>              Labels: performance
>
> I use Solr and have a large system with 1TB in one core and about 5 million 
> documents. The textual content of large PDF files is indexed there. My query 
> is extremely slow as soon as I use a leading and trailing wildcard, e.g. 
> {code:java}
> *searchvalue*
> {code}
> even though I apply a filter query that reduces the result set to fewer than 
> 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> *searchvalue* -> more than 30 seconds
> My query:
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fq=renditions_ss%3A&fl=id&rows=50&start=0
> I've tried everything imaginable. It doesn't make sense to me why a search 
> over a small subset should take so long. If I omit the filter query 
> metadataitemids_is:20950 and search the entire index, the query takes the 
> same amount of time. I therefore suspect that, despite the filter query, 
> the main query runs over the entire index.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

