Yeah, there are optimizations there. BTW, these two queries are subtly 
different.

Well, they’ll be exactly the same if (and only if) every document has a tag.
Otherwise, the first one will exclude a doc that has no tag and the second one
will include it.
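
To make that difference concrete, here is a toy Python model of the two filters. It is only a sketch of the set logic, not anything Solr actually runs; the three sample docs and the helper names are made up for illustration.

```python
# Toy corpus (hypothetical): doc 3 has no "tag" field at all.
docs = {
    1: {"tag": ["email"]},
    2: {"tag": ["sms"]},
    3: {},
}

def has_tag(doc):
    """Models tag:* (the doc has at least one value in the tag field)."""
    return bool(doc.get("tag"))

def tagged(doc, value):
    """Models tag:<value>."""
    return value in doc.get("tag", [])

# tag:* -tag:email
first_form = {i for i, d in docs.items() if has_tag(d) and not tagged(d, "email")}

# *:* -tag:email
second_form = {i for i, d in docs.items() if not tagged(d, "email")}

print(sorted(first_form))   # doc 3 is missing here
print(sorted(second_form))  # doc 3 is included here
```

The two sets only coincide once every doc in the corpus carries at least one tag.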

How slow is “very slow”?

The second form only has to index into the terms dictionary for the tag field
value “email”, then zip down the postings list for all the docs that have it.
The first form has to first identify all the docs that have a tag, accumulate
that list, _then_ find the “email” value and zip down the postings list.
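
A rough way to picture the extra work, as a Python sketch. This is a deliberately simplified model of the description above, not Lucene's real code: the postings dict, `max_doc`, and both function names are assumptions for illustration, and the "visited" counter only counts postings entries read for the tag field.

```python
# Hypothetical terms dictionary for the tag field: term -> sorted doc ids.
postings = {
    "email": [1, 4],
    "sms": [2, 4],
    "push": [5],
}
max_doc = 6  # doc ids 0..5; docs 0 and 3 have no tag

def match_all_minus(term):
    """Models *:* -tag:<term>: one dictionary lookup, one postings list."""
    excluded = set(postings.get(term, []))
    hits = [d for d in range(max_doc) if d not in excluded]
    return hits, len(excluded)

def field_exists_minus(term):
    """Models tag:* -tag:<term>: walk every term's postings first."""
    have_field = set()
    visited = 0
    for plist in postings.values():
        have_field.update(plist)   # accumulate docs that have any tag
        visited += len(plist)
    excluded = postings.get(term, [])
    visited += len(excluded)       # then walk the excluded term's postings
    return sorted(have_field - set(excluded)), visited

print(match_all_minus("email"))
print(field_exists_minus("email"))
```

In this toy model the first form's cost grows with the total number of postings in the field, while the second form's cost depends only on the one excluded term.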

You could get around this, if you require the first form’s functionality, by,
say, including a boolean field “has_tags”. Then the first one would be

fq=has_tags:true -tag:email
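
One possible way to populate such a field is to set it on the client side before the doc is sent for indexing. A minimal Python sketch, assuming the field is called `tag` as in Chris's query and that docs are plain dicts; the function name is made up:

```python
def with_has_tags(doc):
    """Return a copy of doc with a boolean has_tags field added.

    Assumption: "tag" is multi-valued and may be missing or empty.
    """
    enriched = dict(doc)  # leave the caller's dict untouched
    enriched["has_tags"] = bool(doc.get("tag"))
    return enriched

batch = [
    {"id": "1", "tag": ["email"]},
    {"id": "2", "tag": ["sms"]},
    {"id": "3"},
]
enriched_batch = [with_has_tags(d) for d in batch]
for d in enriched_batch:
    print(d["id"], d["has_tags"])
```

With the flag stored at index time, the "has any tag" part of the filter becomes a single term lookup instead of a scan over the whole field.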

Best,
Erick

> On Jul 14, 2020, at 8:05 AM, Emir Arnautović <emir.arnauto...@sematext.com> 
> wrote:
> 
> Hi Chris,
> tag:* is a wildcard query while *:* is match all query. I believe that 
> adjusting pure negative is turned on by default so you can safely just use 
> -tag:email and it’ll be translated to *:* -tag:email.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 14 Jul 2020, at 14:00, Chris Dempsey <cdal...@gmail.com> wrote:
>> 
>> I'm trying to understand the difference between something like
>> fq={!cache=false}(tag:* -tag:email) which is very slow compared to
>> fq={!cache=false}(*:* -tag:email) on Solr 7.7.1.
>> 
>> I believe in the case of `tag:*` Solr spends some effort to gather all of
>> the documents that have a value for `tag` and then removes those with
>> `-tag:email` while in the `*:*` Solr simply uses the document set as-is
>> and then removes those with `-tag:email` (*and I believe Erick mentioned
>> there were special optimizations for `*:*`*)?
> 
