Facet optimization for facet.method=enum and "exists" case

Alexey Kozhemiakin Mon, 10 Feb 2014 08:31:56 -0800

Dear All,

Background:
We have a dataset containing hundreds of millions of records. We facet on 
dozens of fields, with many facet excludes, and each field has a relatively 
small number of unique values, around thousands.
Before executing a search, our users work with an "advanced search" form, and 
the goal is to populate dozens of filters with only those values that are 
applicable together with the other selected values. So basically this is a use 
case for facets with mincount=1, but without any need for the actual counts.
Our performance tests showed that facet.method=enum works much better than 
fc/fcs, probably due to our specific ratio of "docset" to "unique terms count". 
For example, the average query execution time was 1500ms with method fc, 2600ms 
with fcs, and 280ms with enum. Profiling indicated that the majority of the 
time for enum was spent on intersecting docsets.


So...
We've implemented a patch that introduces an extension to facet calculation for 
method=enum. Basically, it uses docSetA.intersects(docSetB) instead of 
docSetA.intersectionSize(docSetB).
As a result, we were able to reduce our average query time from 280ms to 60ms.
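
To illustrate the idea (this is not the patch itself), here is a minimal 
sketch that uses java.util.BitSet as a stand-in for Solr docsets. The class 
ExistsFacetSketch and its helpers docs() and termsWithMatches() are 
hypothetical names made up for this example. It only shows why an existence 
check can be cheaper than a count: intersects() may stop at the first shared 
document, while counting has to look at the whole set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExistsFacetSketch {

    // Counting variant: builds the full intersection, then counts its bits.
    public static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // "Exists" variant: keep a term if its docset shares at least one
    // document with the base docset. No count is computed.
    public static List<String> termsWithMatches(Map<String, BitSet> termDocs,
                                                BitSet baseDocs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : termDocs.entrySet()) {
            if (e.getValue().intersects(baseDocs)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Hypothetical helper: builds a docset from a list of document ids.
    public static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet base = docs(1, 4, 7); // documents matching the current query
        Map<String, BitSet> termDocs = new LinkedHashMap<>();
        termDocs.put("red", docs(1, 2, 3));
        termDocs.put("green", docs(5, 6));
        termDocs.put("blue", docs(4, 7, 9));

        System.out.println(termsWithMatches(termDocs, base)); // [red, blue]
        System.out.println(intersectionSize(termDocs.get("blue"), base)); // 2
    }
}
```

With mincount=1 and no need for counts, only the yes/no answer per term 
matters, which is what the second variant computes.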

What name would you suggest for such a parameter?
For now we call it "facet.enum.exists", but I'm not sure it's a good name.
Once this little detail is clarified, I'll create a JIRA issue and attach the 
patch for review. Is anybody willing to review and commit it?

Thanks,

Alexey
