Re: &fq degrades qtime in a 20million doc collection

Shawn Heisey Thu, 14 Jan 2016 12:49:43 -0800

On 1/14/2016 12:07 PM, Anria B. wrote:
> Here are some Actual examples, if it helps
>
> wt=json&q=*:*&indent=on&fq=SolrDocumentType:"invalidValue"&fl=timestamp&rows=0&start=0&debug=timing
<snip>
>         "QTime": 590,
<snip>
> Now we wipe out all caches, and put the filter in q.
>
> wt=json&q=SolrDocumentType:"invalidValue"&indent=on&fl=timestamp&rows=0&start=0&debug=timing
<snip>
>         "QTime": 266,


For uncached queries on an index with 20+ million documents that takes
up 121GB of disk space, these are pretty good times.

When the query is not cached, a filter query will *always* be slower
than the same thing in the q parameter.  The reason for this is very
simple -- the *result* of a filter query is a bitset where every
document in the index is represented, with a zero for no match and a one
for a match.  Solr must touch every single document in the index
(including deleted documents) to build this bitset. The bitset for a 20
million document index is 2.5 million bytes long.  This bitset is what
gets put into the filterCache.
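
As a rough check on that figure, here is a small Python sketch of the
arithmetic. It assumes a plain one-bit-per-document bitset and ignores
object overhead and any more compact form Solr might choose for small
result sets. The 512-entry cache size is a hypothetical value for
illustration only, not taken from the configuration in this thread.

```python
def bitset_size_bytes(max_doc: int) -> int:
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (max_doc + 7) // 8


def filter_cache_bytes(entries: int, max_doc: int) -> int:
    """Approximate memory if every cache entry holds a full bitset."""
    return entries * bitset_size_bytes(max_doc)


# 20 million documents, one bit each
print(bitset_size_bytes(20_000_000))        # 2500000

# Hypothetical filterCache holding 512 full bitsets (assumed size)
print(filter_cache_bytes(512, 20_000_000))  # 1280000000
```

The second number is worth keeping in mind when sizing the filterCache:
a completely full cache of large bitsets can take a noticeable share of
the heap.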

When a query is in the q parameter, there are shortcuts in Lucene that
Solr uses to find *only* the number of results requested in the rows
parameter, so it takes less time.

Filter queries are *lightning* fast when they are cached, because Solr
does not need to do the work of checking every document in the index to
see if it's in the result list.  That is the reason that you will
commonly see advice to move things from q to fq ... but that advice
should only be followed if you expect filters to be re-used often enough
to result in cache hits.
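
To illustrate why re-use matters, here is a toy Python model of a
filter cache. It is only a sketch under simplifying assumptions, not how
Solr or Lucene implement it: the class name, the in-memory list of
documents, and the scan counter are all hypothetical. The first request
for a filter pays for a pass over every document to build the bitset;
later requests for the same filter are a plain lookup.

```python
class ToyFilterCache:
    """Toy model: one bitset per (field, value) filter, built on first use."""

    def __init__(self, docs):
        self.docs = docs    # list of dicts standing in for indexed documents
        self.cache = {}     # (field, value) -> bytearray bitset
        self.scans = 0      # how many full passes over the documents we made

    def matches(self, field, value):
        key = (field, value)
        if key not in self.cache:
            # Cache miss: visit every document and set its bit on a match.
            self.scans += 1
            bits = bytearray((len(self.docs) + 7) // 8)
            for doc_id, doc in enumerate(self.docs):
                if doc.get(field) == value:
                    bits[doc_id >> 3] |= 1 << (doc_id & 7)
            self.cache[key] = bits
        # Cache hit (or freshly built entry): no per-document work needed.
        return self.cache[key]


docs = [
    {"SolrDocumentType": "a"},
    {"SolrDocumentType": "b"},
    {"SolrDocumentType": "a"},
]
cache = ToyFilterCache(docs)
first = cache.matches("SolrDocumentType", "a")   # builds the bitset
second = cache.matches("SolrDocumentType", "a")  # served from the cache
print(cache.scans)  # 1
```

If a filter value is only ever used once, the model makes the cost
visible: every request is a miss, so every request pays for the full
pass and the cached bitset is never read again.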

Thanks,
Shawn
