On 9/30/2014 4:38 AM, Charlie Hull wrote:
> We've just found a very similar issue at a client installation. They have
> around 27 million documents and are faceting on fields with high
> cardinality, and are unhappy with query performance and the server hardware
> necessary to make this performance acceptable. Last night we noticed the
> filter cache had a pretty low hit rate and seemed to be filling up with
> many unexpected items (we were testing with only a *single* actual filter
> query). Diagnosing this with the showItems flag set on the Solr admin
> statistics, we could see entries relating to facets, even though we were
> sure we were using the default facet.method=fc setting, which should
> prevent filters from being constructed. We're thus seeing similar cache
> pollution to Ken and Anca.
> 
> We're trying a different type of cache (LFUCache) now, and may also try
> tweaking cache sizes, as the filter creation seems to be something we
> can't easily get around.

Since I was the one who wrote the current LFUCache implementation you'll
find in Solr, I can tell you that the implementation is very naive.  It
correctly implements LFU, but it does so in a "beginning programming
student" way: to decide which entry to evict, it must essentially sort
the list of entries by the number of times each has been used.  Because
those counts change continually, the sort cannot be reused and must be
redone every time an eviction happens.
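
To make that concrete, here is a hypothetical sketch of the pattern in
Java (illustration only, not the actual Solr source; all names are made
up):

  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Illustration of sort-based LFU eviction, not the real Solr code.
  class NaiveLFUCache<K, V> {
      private static class Holder<V> {
          final V value;
          long hits;
          Holder(V value) { this.value = value; }
      }

      private final int maxSize;
      private final Map<K, Holder<V>> map = new HashMap<>();

      NaiveLFUCache(int maxSize) { this.maxSize = maxSize; }

      V get(K key) {
          Holder<V> h = map.get(key);
          if (h == null) return null;
          h.hits++;  // the count changes on every hit ...
          return h.value;
      }

      void put(K key, V value) {
          if (!map.containsKey(key) && map.size() >= maxSize) {
              // ... so the sort has to be redone for every eviction.
              List<Map.Entry<K, Holder<V>>> entries =
                  new ArrayList<>(map.entrySet());
              entries.sort(Comparator.comparingLong(e -> e.getValue().hits));
              map.remove(entries.get(0).getKey());  // evict least-used
          }
          map.put(key, new Holder<>(value));
      }
  }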

Unless the cache size is very small, I would not expect performance to
be good once the cache fills up and has to decide which entries to
evict.  I don't know exactly what number qualifies as "very small" ...
I'm not sure I'd go above 32 or 64.  Since every eviction pays for a
full sort (roughly O(n log n) in the number of entries), the cost of
adding a new entry to a full cache grows as the size goes up.

I've got a very efficient new cache implementation in Jira, but haven't
had the time to devote to getting it polished and committed.
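
For contrast, the textbook way to make LFU eviction cheap is to keep
keys grouped in frequency buckets, so the least-used entry can be found
in constant time instead of by sorting.  The sketch below shows that
general technique; I'm not claiming it is what the Jira patch does:

  import java.util.HashMap;
  import java.util.LinkedHashSet;
  import java.util.Map;

  // Hypothetical sketch of O(1) LFU eviction via frequency buckets.
  class BucketLFUCache<K, V> {
      private final int maxSize;
      private final Map<K, V> values = new HashMap<>();
      private final Map<K, Long> freq = new HashMap<>();
      // frequency -> keys seen that many times, in insertion order
      private final Map<Long, LinkedHashSet<K>> buckets = new HashMap<>();
      private long minFreq = 0;

      BucketLFUCache(int maxSize) { this.maxSize = maxSize; }

      V get(K key) {
          if (!values.containsKey(key)) return null;
          touch(key);
          return values.get(key);
      }

      void put(K key, V value) {
          if (maxSize <= 0) return;
          if (values.containsKey(key)) {
              values.put(key, value);
              touch(key);
              return;
          }
          if (values.size() >= maxSize) {
              // O(1) eviction: take the oldest key from the lowest
              // frequency bucket; no sorting anywhere.
              K victim = buckets.get(minFreq).iterator().next();
              buckets.get(minFreq).remove(victim);
              values.remove(victim);
              freq.remove(victim);
          }
          values.put(key, value);
          freq.put(key, 1L);
          buckets.computeIfAbsent(1L, f -> new LinkedHashSet<>()).add(key);
          minFreq = 1;
      }

      // Move a key from its current frequency bucket to the next one.
      private void touch(K key) {
          long f = freq.get(key);
          buckets.get(f).remove(key);
          if (f == minFreq && buckets.get(f).isEmpty()) minFreq = f + 1;
          freq.put(key, f + 1);
          buckets.computeIfAbsent(f + 1, x -> new LinkedHashSet<>()).add(key);
      }
  }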

Thanks,
Shawn
