On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> We are comparing the performance of fq versus q for queries that are 
> actually filters and should not be cached.
> In part of queries we see strange behavior where q performs 5-10x better 
> than fq. The question is why?
> 
> An example1:
> q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1 
> to DATE2}
> sort=maildate_sort* desc

<snip>

> <field name="maildate" stored="true" indexed="true" type="tdate"/>
> <field name="maildate_sort" stored="false" indexed="false" type="tdate" 
> docValues="true"/>

For simplicity, I would probably just use one field for that, rather
than a separate sort field.  The disk space required would probably be
the same either way, but your interaction with the index will not be as
complex.  There's nothing wrong with doing it the way you have, though.

I'm not at all an expert, but I've been a member of this community for a
long time.  Here's my guess about why your query is faster in the q
parameter than a non-cached filter:

The result of a standard query is the stored fields from the top N
documents, where N is the value in the rows parameter.  The default for
rows is typically set to 10, and for most people it will normally be 200
or less.

The result of a filter is very different -- it is a bitset of all the
documents in your entire index, with binary 0 for documents that don't
match the filter and binary 1 for documents that do match.
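
To make the bitset idea concrete, here is a toy sketch in Python.  It is
not Solr or Lucene code: the function name build_filter_bitset and the
sample data are made up for illustration, and a plain Python int stands
in for the bitset.  The point is only that the structure has one bit per
document in the index, whether or not that document matches.

```python
def build_filter_bitset(docs, predicate):
    """Return an int used as a bitset: bit i is 1 if docs[i] matches."""
    bits = 0
    for doc_id, doc in enumerate(docs):  # every document is visited once
        if predicate(doc):
            bits |= 1 << doc_id
    return bits


# Hypothetical index of five documents, each reduced to a maildate year.
docs = [2011, 2014, 2015, 2012, 2015]
bits = build_filter_bitset(docs, lambda year: year >= 2014)

# Read the bitset back as a list of matching document ids.
matching_ids = [i for i in range(len(docs)) if (bits >> i) & 1]
```

With this sample data, documents 1, 2 and 4 have their bits set and the
other two bits stay 0.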

If your index has 100 million documents, every single one of those 100
million documents must be checked against the filter query to produce a
filter bitset, but when it's in the q parameter, shortcuts can be taken
which will get the top N results quickly.
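
As a rough illustration of the "top N" side of this guess, the sketch
below keeps only the N best entries in a bounded heap while streaming
over matches.  It is a toy model in Python, not how Solr actually
collects results; the function name top_n and the sample
(sort value, doc id) pairs are assumptions made for the example.  What
it saves in this sketch is memory: it holds N entries rather than a
structure sized to the whole index.  Whether Solr takes further
shortcuts is not shown here.

```python
import heapq


def top_n(matches, n):
    """Keep only the n largest (sort_value, doc_id) pairs from a stream."""
    heap = []  # min-heap; never grows beyond n entries
    for entry in matches:
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Replace the current worst of the kept entries.
            heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)  # best first, like a desc sort


# Hypothetical matches: (maildate as yyyymmdd, document id).
matches = [(20150601, 0), (20150624, 1), (20150101, 2), (20150615, 3)]
best = top_n(matches, 2)
```

Here best holds the two most recent entries, documents 1 and 3, in
descending date order.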

The filterCache levels the playing field when filters are re-used.  If a
requested filter is already in the cache, it can be retrieved and
applied to a result VERY quickly.
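
A minimal sketch of why re-use helps, again as toy Python rather than
Solr's real filterCache: the class FilterCache, its get method, and the
bit patterns are all hypothetical.  The expensive step (building the
bitset) runs once per distinct filter string; later requests only do a
lookup and a bitwise AND against the query's matches.

```python
class FilterCache:
    """Toy cache mapping a filter string to its computed bitset."""

    def __init__(self):
        self.entries = {}
        self.computations = 0  # how many times a bitset had to be built

    def get(self, fq, compute):
        if fq not in self.entries:
            self.computations += 1
            self.entries[fq] = compute()  # expensive: scans the index
        return self.entries[fq]


cache = FilterCache()
fq = "maildate:{DATE1 to DATE2}"

first = cache.get(fq, lambda: 0b10110)   # computed on first use
second = cache.get(fq, lambda: 0b10110)  # served from the cache

# Applying the cached filter to a query's matches is one bitwise AND.
query_bits = 0b00111
combined = second & query_bits
```

After both calls the bitset has been built only once, and combined
contains just the documents that match both the query and the filter.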

You have turned off the caching for your filter.  I'm not sure why you
did this, but you know your use case a lot better than I do.  If it were
me, I would use filter queries and do everything possible to re-use the
same filters, and I would cache them.

Thanks,
Shawn
