On 4/17/2019 10:51 AM, John Davis wrote:
Can you clarify why field:[* TO *] is a lot more efficient than field:*?
It's a range query. For every document, Lucene just has to answer two questions -- is the value above the lowest possible bound, and is the value below the highest possible bound? The answer will be yes if the field exists, and no if it doesn't. With one million documents, there are two million questions that Lucene has to answer. Which probably seems like a lot ... but keep reading. (Side note: It wouldn't surprise me if Lucene has an optimization specifically for the all-inclusive range, such that it actually only asks one question, not two.)
With a wildcard query, there are as many questions as there are values in the field. Every question is asked for every single document. So if you have a million documents and there are three hundred thousand different values contained in the field across the whole index, that's 300 billion questions.
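To put the two numbers side by side, here is a rough back-of-the-envelope sketch in Python. It only reproduces the simplified "questions" model described above; it is not a measurement of what Lucene actually does internally, and the function names are made up for illustration.

```python
# Simplified cost model from the explanation above (an assumption,
# not Lucene's real execution strategy).

def range_query_questions(num_docs: int, questions_per_doc: int = 2) -> int:
    """field:[* TO *] -- a fixed number of bound checks per document."""
    return num_docs * questions_per_doc


def wildcard_query_questions(num_docs: int, distinct_values: int) -> int:
    """field:* -- one check per distinct value in the field, per document."""
    return num_docs * distinct_values


num_docs = 1_000_000       # documents in the index
distinct_values = 300_000  # different values in the field across the index

range_cost = range_query_questions(num_docs)
wildcard_cost = wildcard_query_questions(num_docs, distinct_values)
ratio = wildcard_cost // range_cost

print(f"range query:    {range_cost:,} questions")
print(f"wildcard query: {wildcard_cost:,} questions")
print(f"wildcard asks {ratio:,} times as many questions")
```

Under this model the range query's cost depends only on the number of documents, while the wildcard query's cost also grows with the number of distinct values in the field.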
Thanks, Shawn