Re: Facet optimization for facet.method=enum and "exists" case

Annette Newton Thu, 13 Feb 2014 02:46:19 -0800

Hi Alexey,

I would be very interested in your progress with this.  Your use case seems
to match ours: we found enum to be much quicker than fc, particularly for
multivalued fields.  We also found that fc caused memory issues, which led
to us frequently losing nodes.  Like you, we have no interest in the counts;
we just need a distinct list of values.


Thanks.

Netty Newton.


On 10 February 2014 19:30, Erick Erickson <erickerick...@gmail.com> wrote:

> Alexey:
>
> There's no need to wait to create a JIRA! It's perfectly reasonable to
> create it and attach a patch before it's completely polished. People often
> include a note when posting the patch like "for review, not ready for
> commit". Also, including comments in the code like
> //nocommit
> will cause it to fail the "ant precommit" step. This is often useful to get
> other eyeballs on the code early.
>
> But it's up to you.
>
> Best,
> Erick
>
>
> On Mon, Feb 10, 2014 at 8:29 AM, Alexey Kozhemiakin <
> alexey_kozhemia...@epam.com> wrote:
>
> > Dear All,
> >
> > Background:
> > We have a dataset containing hundreds of millions of records. We facet by
> > dozens of fields with many facet excludes, and the fields have a
> > relatively small number of unique values, around thousands.
> > Before executing a search, our users work with "advanced search", and the
> > goal is to populate dozens of filters with the values that are applicable
> > given the other selected values. So basically this is a use case for
> > facets with mincount=1, but without any need for the actual counts.
> > Our performance tests showed that facet.method=enum works much better
> > than fc/fcs, probably due to a specific ratio of "docset" to "unique
> > terms count". For example, the average query execution time was 1500ms
> > with fc, 2600ms with fcs, and 280ms with enum. Profiling indicated that
> > the majority of the time for enum was spent on intersecting docsets.
> >
> > So...
> > We've implemented a patch that introduces an extension to facet
> > calculation for method=enum. Basically, it uses
> > docSetA.intersects(docSetB) instead of docSetA.intersectionSize(docSetB).
> > As a result, we were able to reduce our average query time from 280ms to
> > 60ms.
> >
> > What would you suggest naming such a parameter?
> > For now we call it "facet.enum.exists", but I'm not sure it's a good name.
> > Once we have clarified this small point, I'll create a JIRA issue and
> > attach the patch for review. Is anybody willing to review and commit?
> >
> > Thanks
> >
> > Alexey
> >
>
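
To illustrate why an "exists" check can be cheaper than a count, here is a
minimal sketch. It is not the patch discussed above and does not use Solr's
DocSet classes; it assumes doc sets are plain sorted int arrays with no
duplicates, and the class name ExistsVsCount is made up for this example.
The counting version has to walk both lists to the end, while the existence
version can stop at the first shared document id.

```java
final class ExistsVsCount {
    // Counts the ids common to two sorted, duplicate-free arrays.
    // The loop runs until one array is exhausted, even after a match.
    static int intersectionSize(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                count++;
                i++;
                j++;
            }
        }
        return count;
    }

    // Same merge-style walk, but returns as soon as one common id is found.
    static boolean intersects(int[] a, int[] b) {
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                return true; // early exit: the facet value "exists"
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] baseDocs = {1, 4, 7, 9, 12}; // hypothetical result set of the query
        int[] termDocs = {4, 9, 20};       // hypothetical docs containing one term
        System.out.println("count  = " + intersectionSize(baseDocs, termDocs));
        System.out.println("exists = " + intersects(baseDocs, termDocs));
    }
}
```

With mincount=1 and no interest in the counts, only the boolean answer is
needed per term, so the early exit is where any saving would come from. How
much it helps in practice depends on the real DocSet implementations and on
how early a match tends to occur.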



-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com <http://www.sessioncam.com>*

