If the range query is so much better, shouldn't the Solr query parser
create a range query for a token query that contains only the wildcard?
It already has a special path for the *:* case.
On 20.07.2017 21:00, Shawn Heisey wrote:
On 7/20/2017 7:20 AM, Hendrik Haddorp wrote:
the Solr 6.6 ref guide states that, to find "all documents without a
value for field", you can use:
-field:[* TO *]
While this is true, I'm wondering why it is recommended to use a range
query instead of simply:
-field:*
Performance.
A wildcard is expanded to all possible term values for that field. If
the field has millions of possible terms, then the query object created
at the Lucene level will quite literally have millions of terms in it.
No matter how you approach a query with those characteristics, it's
going to be slow, for both getting the terms list and executing the query.
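
To make the cost concrete, here is a toy model of the expansion described
above. It is only a sketch in Python, not Lucene code: the term list, the
term names and the expand_wildcard helper are all hypothetical, and real
query rewriting is more involved. The point is just that the expanded
clause list grows with the number of distinct terms in the field, whereas
an open-ended range is described by its two bounds.

```python
import fnmatch


def expand_wildcard(terms, pattern):
    """Toy expansion: collect every indexed term that matches the pattern."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]


# Hypothetical term dictionary for one field (100,000 distinct values).
terms = [f"value{i:06d}" for i in range(100_000)]

# A bare "*" matches every term, so the expanded list is as large as
# the term dictionary itself.
expanded = expand_wildcard(terms, "*")
print(len(expanded), "term clauses for the wildcard")

# An open-ended range [* TO *] needs only its two (unbounded) endpoints.
range_bounds = (None, None)
print(len(range_bounds), "bounds for the range")
```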
A full range query might be somewhat slow when there are many possible
values, but it's a lot faster than a wildcard in those cases.
If the field is used by only a handful of documents and has very few
possible values, then a wildcard might be faster than a range query ...
but this is not common, so the recommended way to do this is with a
range query.
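
For reference, a minimal sketch of sending both forms as a filter query,
written in Python. The missing_field_params helper and the field name
"field" are made up for illustration; q and fq are the usual Solr request
parameters. The range form is the default here because it is the
recommended one.

```python
from urllib.parse import urlencode


def missing_field_params(field, use_range=True):
    """Build request parameters that match documents with no value for field."""
    if use_range:
        clause = f"-{field}:[* TO *]"  # recommended range form
    else:
        clause = f"-{field}:*"         # wildcard form, usually slower
    return {"q": "*:*", "fq": clause}


# URL-encode the parameters so they can be appended to a select request.
print(urlencode(missing_field_params("field")))
print(urlencode(missing_field_params("field", use_range=False)))
```

Timing both variants against the same index is the easiest way to check
which one is faster for a particular field.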
Thanks,
Shawn