[ https://issues.apache.org/jira/browse/LUCENE-10562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533856#comment-17533856 ]
Uwe Schindler commented on LUCENE-10562:
----------------------------------------

Hi,
I think these questions do not relate to Lucene and are not issues at all. They should be asked on the Solr mailing list: us...@solr.apache.org. This is not a bug, and there is no way to improve this situation inside Lucene.

Some additional hints:
- Consider using the reverse wildcard filter in Solr (there's documentation about this). But this won't help if you need a wildcard on both sides of the term.
- Consider disabling wildcards for end users in your case (the flexible or dismax query parser in Solr can do this).

In general, using wildcards in a full-text search engine shows that text analysis is not working correctly. Based on your name and profile, it looks like this is a typical "German language problem". In German, compounds are common ("Donaudampfschiffahrtskapitän", the captain of a steam-powered ship on the German river Donau), and people using wildcards is always a sign of missing decompounding. Decompounding can be done with the hyphenation-compound token filter in combination with dictionaries. An example and minimized data files for the German language are here: https://github.com/uschindler/german-decompounder

When you do decompounding, wildcards should not be needed.

> Large system: Wildcard search leads to full index scan despite filter query
> ---------------------------------------------------------------------------
>
> Key: LUCENE-10562
> URL: https://issues.apache.org/jira/browse/LUCENE-10562
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 8.11.1
> Reporter: Henrik Hertel
> Priority: Major
> Labels: performance
>
> I use Solr and have a large system with 1TB in one core and about 5 million
> documents. The textual content of large PDF files is indexed there. My query
> is extremely slow (more than 30 seconds) as soon as I use wildcards e.g.
> {code:java}
> *searchvalue*
> {code}
> , even though I put a filter query in front of it that reduces to less than
> 20 documents.
> searchvalue -> less than 1 second
> searchvalue* -> less than 1 second
> My query:
> {code:java}
> select?defType=lucene&q=content_t:*searchvalue*&fq=metadataitemids_is:20950&fl=id&rows=50&start=0
> {code}
> I've tried everything imaginable. It doesn't make sense to me why a search
> over a small subset should take so long. If I omit the filter query
> metadataitemids_is:20950, so search the entire inventory, then it also takes
> the same amount of time. Therefore, I suspect that despite the filter query,
> the main query runs over the entire index.
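
To illustrate the reverse wildcard hint from the comment above, here is a minimal Python sketch. It is not Lucene's or Solr's actual implementation: the sorted list standing in for a term dictionary, the function names, and the sample terms are all assumptions made for illustration. It shows why a trailing wildcard can seek directly to its matches, why a wildcard on both sides has to look at every term, and how indexing reversed terms turns a leading wildcard into a prefix lookup.

```python
from bisect import bisect_left

def prefix_matches(sorted_terms, prefix):
    """Answer `prefix*`: binary-search to the first candidate, then stop
    at the first term that no longer starts with the prefix.
    Returns (matches, number_of_terms_examined)."""
    start = bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    for term in sorted_terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined

def infix_matches(sorted_terms, infix):
    """Answer `*infix*`: there is no position to seek to, so every term
    in the (toy) dictionary is examined."""
    matches = [t for t in sorted_terms if infix in t]
    return matches, len(sorted_terms)

def suffix_matches_reversed(sorted_reversed_terms, suffix):
    """Answer `*suffix` as a prefix lookup on reversed terms
    (the idea behind a reversed-wildcard index, heavily simplified)."""
    rev, examined = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return sorted(t[::-1] for t in rev), examined

# Hypothetical sample terms, sorted like a term dictionary.
terms = sorted(["dampf", "dampfschiff", "donau", "kapitaen", "schiff", "schiffahrt"])
reversed_terms = sorted(t[::-1] for t in terms)

print(prefix_matches(terms, "schiff"))                    # seeks, examines few terms
print(infix_matches(terms, "schiff"))                     # examines all terms
print(suffix_matches_reversed(reversed_terms, "schiff"))  # seeks on reversed terms
```

In this toy model the both-sided wildcard always examines the whole dictionary regardless of how few documents a filter would leave, which matches the observation in the issue that `*searchvalue*` is slow while `searchvalue*` is fast.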