RE: Facet optimization for facet.method=enum and "exists" case

Alexey Kozhemiakin Thu, 13 Feb 2014 13:45:02 -0800

Hi Annette, 

You can find the initial version of the patch attached to
https://issues.apache.org/jira/browse/SOLR-5725


I'd be happy to hear what performance improvement you see on your setup; let me
know if you need help patching your version of Solr.

--
Alexey 

-----Original Message-----
From: Annette Newton [mailto:annette.new...@servicetick.com] 
Sent: Thursday, February 13, 2014 13:46
To: solr-user@lucene.apache.org
Subject: Re: Facet optimization for facet.method=enum and "exists" case

Hi Alexey,

I would be very interested in your progress with this.  Your use case seems to 
match ours: we found enum to be much quicker than fc, particularly for 
multivalued fields.  We found that fc caused memory issues and caused us to 
frequently lose nodes.  Like you, we have no interest in the counts; we just 
need a distinct list of values.

Thanks.

Netty Newton.


On 10 February 2014 19:30, Erick Erickson <erickerick...@gmail.com> wrote:

> Alexey:
>
> There's no need to wait to create a JIRA! It's perfectly reasonable to 
> create it and attach a patch before it's completely polished. People 
> often include a note when posting the patch like "for review, not 
> ready for commit". Also, including comments in the code like 
> //nocommit will cause it to fail the "ant precommit" step. This is 
> often useful to get other eyeballs on the code early.
>
> But it's up to you.
>
> Best,
> Erick
>
>
> On Mon, Feb 10, 2014 at 8:29 AM, Alexey Kozhemiakin
> <alexey_kozhemia...@epam.com> wrote:
>
> > Dear All,
> >
> > Background:
> > We have a dataset containing hundreds of millions of records. We
> > facet on dozens of fields with many facet excludes, and the fields have
> > a relatively small number of unique values, on the order of thousands.
> > Before executing a search, our users work with "advanced search", and
> > the goal is to populate dozens of filters with the values that are
> > applicable given the other selected values. So basically this is a use
> > case for facets with mincount=1, but with no need for the actual counts.
> > Our performance tests showed that facet.method=enum works much better
> > than fc/fcs, probably due to our specific ratio of "docset" to
> > "unique terms count". For example, the average query execution time was
> > 1500ms with fc, 2600ms with fcs, and 280ms with enum. Profiling
> > indicated that the majority of the time for enum was spent on
> > intersecting docsets.
> >
> > So...
> > We've implemented a patch that introduces an extension to facet
> > calculation for method=enum. Basically, it uses
> > docSetA.intersects(docSetB) instead of
> > docSetA.intersectionSize(docSetB).
> > As a result, we were able to reduce our average query time from 280ms
> > to 60ms.
> >
> > What would you suggest naming this parameter?
> > For now we call it "facet.enum.exists", but I'm not sure it's a good name.
> > Once this detail is settled, I'll create a JIRA issue and
> > attach the patch for review. Is anybody willing to review and commit?
> >
> > Thanks,
> >
> > Alexey
> >
>
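
The difference between the two docset calls named in the quoted message can be sketched as follows. This is only an illustration, not the SOLR-5725 patch: it uses `java.util.BitSet` from the JDK as a stand-in for Solr's docsets, and the class name `FacetExistsSketch`, the helper `docs`, and the sample terms are hypothetical. The counting variant has to materialise and measure the whole intersection, while the "exists" variant can stop as soon as it finds one shared document, which is likely where a saving of the kind reported in the thread would come from.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: BitSet stands in for a Solr docset.
class FacetExistsSketch {

    // Counting variant: builds the full intersection and returns its size.
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone(); // do not mutate the caller's set
        copy.and(b);
        return copy.cardinality();
    }

    // "Exists" variant: only answers whether the sets share any document.
    // BitSet.intersects returns as soon as it finds a common set bit.
    static boolean intersects(BitSet a, BitSet b) {
        return a.intersects(b);
    }

    // Hypothetical helper that builds a docset from document ids.
    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        // Documents matching the user's current query and filters.
        BitSet matches = docs(1, 5, 9);

        // One docset per unique term of the faceted field (sample data).
        Map<String, BitSet> termDocs = new LinkedHashMap<>();
        termDocs.put("red", docs(1, 2, 3, 5));
        termDocs.put("green", docs(4, 6));
        termDocs.put("blue", docs(9));

        for (Map.Entry<String, BitSet> e : termDocs.entrySet()) {
            int count = intersectionSize(matches, e.getValue());
            boolean exists = intersects(matches, e.getValue());
            System.out.println(e.getKey() + " count=" + count + " exists=" + exists);
        }
    }
}
```

With mincount=1 and no interest in the counts, a term is listed exactly when `exists` is true, which is the same set of terms for which `count` is at least 1.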



-- 

Annette Newton
Database Administrator
ServiceTick Ltd

T:+44(0)1603 618326
Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
www.servicetick.com
*www.sessioncam.com <http://www.sessioncam.com>*

