On 9/8/2015 9:10 AM, adfel70 wrote:
> I am trying to understand why faceting on a field with lots of unique values
> has a great impact on query performance. Since Googling for the Solr facet
> algorithm did not yield anything, I looked at how facets are implemented in
> Lucene. I found out that there are 2 methods - taxonomy-based and
> SortedSetDocValues-based. Are Solr's facet capabilities based on one of
> those methods? If so, I still can't understand why unique values impact
> query performance...

Lucene's facet implementation is completely separate (and different)
from Solr's implementation.  I am not familiar with the inner workings
of either implementation.  Solr implemented faceting long before Lucene
did.  I think *Solr* actually contains at least two different facet
implementations, used for different kinds of facets.

Faceting on a field with many unique values uses a HUGE amount of heap
memory, which is likely why query performance is impacted.
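
As a rough illustration only (this is NOT Solr's actual algorithm, just a
toy model with made-up data): any approach that counts documents per value
has to keep one counter for every distinct value it sees among the matching
docs, so the working set grows with the number of unique values rather than
staying small.  The function and field names below are hypothetical.

```python
from collections import Counter

def facet_counts(matching_docs, field):
    """Toy facet: count how many matching docs carry each value of `field`."""
    counts = Counter()
    for doc in matching_docs:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1  # one counter entry per distinct value
    return counts

# Hypothetical data: a two-valued field next to a fully unique one.
docs = [{"type": "a" if i % 2 else "b", "id_field": f"id-{i}"} for i in range(1000)]

low = facet_counts(docs, "type")       # few distinct values, few counters
high = facet_counts(docs, "id_field")  # one counter per document

print(len(low), len(high))
```

With the same 1000 docs, the two-valued field needs two counters while the
unique field needs one per document.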

I have a dev system with all my indexes (each of which has dedicated
hardware for production) on it.  Normally it requires 15GB of heap to
operate properly.  Every now and then, I get asked to do a duplicate
check on a field that *should* be unique, on an index with 250 million
docs in it.  The query that I am asked to do for the facet matches about
100 million docs.  This facet query, on a field that DOES have
docValues, will throw OOM if my heap is less than 27GB.  The dev machine
only has 32GB of RAM, so as you might imagine, performance is really
terrible when I do this query.  Thankfully it's a dev machine.  When I
was doing these queries, it was running 4.9.1.  I have since upgraded it
to 5.2.1, as a proof of concept for upgrading our production indexes ...
but I have not attempted the facet query since the upgrade.
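
For reference, a duplicate check of this kind can be written as a field
facet with facet.mincount=2, so that only values occurring more than once
come back.  The sketch below only builds the request URL.  The host, core
name, field name, and query are placeholders, not taken from the setup
described above.

```python
from urllib.parse import urlencode

def duplicate_check_url(base_url, query, field):
    """Build a Solr select URL that facets on `field` and keeps only
    values that appear in at least two matching documents."""
    params = {
        "q": query,
        "rows": 0,            # only the facet counts are wanted, not docs
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,  # drop values that occur only once
        "facet.limit": -1,    # return every qualifying value (unbounded)
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"

# Placeholder host, core, and field names for illustration.
url = duplicate_check_url("http://localhost:8983/solr/mycore", "*:*", "unique_id")
print(url)
```

Note that facet.limit=-1 on a field with this many unique values is
exactly the sort of request that needs the large heap mentioned above.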

Thanks,
Shawn
