Re: How to speed up field collapsing on large number of groups

Joel Bernstein Thu, 14 Jul 2016 08:35:56 -0700

The top_fc hint doesn't come into play until Solr 5. With Solr 4.x the
CollapsingQParserPlugin always uses a top-level field cache.


Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jul 4, 2016 at 9:20 AM, Alessandro Benedetti <abenede...@apache.org> wrote:

> Have you tried using docValues for the fields involved in the collapse
> group head selection?
>
> With a group head selection of "min", "max" and "sort", it should work
> quite well. Of course, it depends on your formula.
>
> Does your index change often?
> If the warming time is not a problem, you could try:
>
> hint
>
> Currently there is only one hint available, "top_fc", which stands for
> top-level FieldCache. The top_fc hint is only available when collapsing on
> String fields. top_fc provides the best query-time speed but takes the
> longest to warm on startup or following a commit. top_fc will also result
> in the collapsed field being cached in memory twice if it's used for
> faceting or sorting.
>
> Cheers
>
> On Wed, Jun 29, 2016 at 1:59 AM, Jichi Guo <jichi...@gmail.com> wrote:
>
>> Thanks for the quick response, Joel!
>>
>> I am hoping to delay sharding if possible, which might involve more
>> things to consider :)
>>
>> 1) What is the size of the result set before the collapse?
>>
>> When searching with q=*:*, for example, numFound is around 5 million
>> before the collapse and 2 million after it.
>>
>> I only return about the top 30 documents in the result.
>>
>> 2) Have you tested without the long formula, just using a field for the
>> min/max? It would be good to understand the impact of the formula on
>> performance.
>>
>> The performance seems to be affected by the number of fields appearing in
>> the max formula.
>>
>> For example, that expensive query with about 5 million hits takes 4.4 sec.
>>
>> For both {!collapse field=productGroupId} and
>> {!collapse field=productGroupId max=only_one_field}, the query time drops
>> to around 2.4 sec.
>>
>> If I remove the entire collapse fq, the query takes only 1.3 sec.
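The three timings above separate the cost of collapsing from the cost of evaluating the formula. A small worked breakdown, using only the numbers reported in this message (the variable names are just labels):

```python
# Query times reported above, in seconds.
no_collapse = 1.3       # collapse fq removed entirely
collapse_simple = 2.4   # {!collapse field=...}, with or without max=<one field>
collapse_formula = 4.4  # {!collapse field=... max=<long formula>}

collapse_cost = collapse_simple - no_collapse      # cost of collapsing alone
formula_cost = collapse_formula - collapse_simple  # extra cost of the formula
total_cost = collapse_formula - no_collapse        # all collapse-related cost

print(f"collapsing alone:      {collapse_cost:.1f} s")
print(f"formula on top:        {formula_cost:.1f} s")
print(f"formula share of cost: {formula_cost / total_cost:.0%}")
```

That is roughly 1.1 s for the collapse itself and 2.0 s (about 65% of the 3.1 s total) for the formula, which agrees with the observation that the number of fields in the max formula drives the cost.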
>>
>> 3) How much memory do you have on the server and for the heap? Memory use
>> rises with the cardinality of the collapse field, so you'll want to be
>> sure there is enough memory to comfortably perform the collapse.
>>
>> I am setting Xmx to 24G. The total index size on disk is 50G.
>>
>> In solrconfig.xml, I use solr.FastLRUCache for the filterCache (size
>> 2048), solr.LRUCache for the documentCache (size 32768), and
>> solr.LRUCache for the queryResultCache (size 4096). I am using the
>> default fieldValueCache.
>>
>> I found that the CollapsingQParserPlugin explicitly uses Lucene's
>> FieldCache. Maybe increasing the field cache would help? But I am not
>> sure how to increase it in Solr.
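Joel's point that memory use rises with the cardinality of the collapse field can be made concrete with a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not Solr's actual data layout: it supposes the collapse keeps roughly one 4-byte int (current group-head document) and one 4-byte float (current min/max value) per group, plus one bit per document, and it ignores the field cache itself. The inputs (2 million groups, 20 million documents) come from the original question; `estimate_collapse_bytes` is a hypothetical helper.

```python
def estimate_collapse_bytes(num_groups, max_doc, bytes_per_group=8):
    """Rough per-request working set of a collapse (ASSUMED layout).

    bytes_per_group=8 assumes one int (group-head doc) plus one float
    (current min/max value) per group; the bitset assumes one bit per doc.
    The field cache used to look up group values is NOT included.
    """
    per_group_arrays = num_groups * bytes_per_group
    doc_bitset = (max_doc + 7) // 8  # round up to whole bytes
    return per_group_arrays + doc_bitset


est = estimate_collapse_bytes(num_groups=2_000_000, max_doc=20_000_000)
print(f"{est:,} bytes, about {est / 2**20:.1f} MiB")
```

Under these assumptions the per-group part grows linearly with the number of groups, so doubling the group count adds another 16 MB.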
>>
>> On Jun 28 2016, at 4:48 pm, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> Sharding will help, but you'll need to co-locate documents by group ID.
>>> A few questions / suggestions:
>>>
>>> 1) What is the size of the result set before the collapse?
>>>
>>> 2) Have you tested without the long formula, just using a field for the
>>> min/max? It would be good to understand the impact of the formula on
>>> performance.
>>>
>>> 3) How much memory do you have on the server and for the heap? Memory
>>> use rises with the cardinality of the collapse field, so you'll want to
>>> be sure there is enough memory to comfortably perform the collapse.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
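Joel's remark that sharding requires co-locating documents by group ID is usually handled in SolrCloud with the compositeId router, where an ID of the form `shardKey!docId` routes documents sharing the same prefix to the same shard. The Python sketch below only demonstrates the idea: `routed_id` and `toy_shard_for` are hypothetical helpers, and the crc32-modulo assignment is a stand-in for Solr's MurmurHash3-based hash ranges, not a reproduction of them.

```python
import zlib
from collections import defaultdict


def routed_id(group_id, doc_id):
    # compositeId-style ID: the part before "!" acts as the shard key.
    return f"{group_id}!{doc_id}"


def toy_shard_for(composite_id, num_shards):
    # ILLUSTRATION ONLY: hash just the shard-key prefix, so every document
    # of a group gets the same shard. Solr itself uses MurmurHash3 and
    # hash ranges, not crc32 modulo the shard count.
    shard_key = composite_id.split("!", 1)[0]
    return zlib.crc32(shard_key.encode("utf-8")) % num_shards


docs = [("g1", "a"), ("g1", "b"), ("g2", "c"), ("g1", "d"), ("g2", "e")]
shards_per_group = defaultdict(set)
for group_id, doc_id in docs:
    shard = toy_shard_for(routed_id(group_id, doc_id), num_shards=4)
    shards_per_group[group_id].add(shard)

# Each group lands on exactly one shard, so each shard can collapse its
# own groups without needing documents from other shards.
print(dict(shards_per_group))
```

The design point is that only the prefix feeds the shard choice, so the collapse on each shard sees every member of the groups it owns.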
>>> On Tue, Jun 28, 2016 at 4:08 PM, jichi <jichi...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am using Solr 4.10 to index 20 million documents without sharding.
>>>> Each document has a groupId field, and there are about 2 million groups.
>>>> I found that searching with collapsing on groupId is significantly slower
>>>> than searching without collapsing, especially when combined with facet
>>>> queries.
>>>>
>>>> I am wondering what the general approach would be to speed up field
>>>> collapsing by 2~4 times.
>>>> Would sharding the index help?
>>>> Is it possible to optimize collapsing without sharding?
>>>>
>>>> The filter parameter for collapsing looks like this:
>>>>
>>>>     q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
>>>>
>>>> I also put this fq into the warmup queries XML to warm up the caches.
>>>> But still, when q changes and more fq are added, the collapsing search
>>>> takes about 3~5 seconds. Without collapsing, the search finishes within
>>>> 2 seconds.
>>>>
>>>> I am thinking of manually optimizing the CollapsingQParserPlugin through
>>>> parallelization or extra caching.
>>>> For example, is it possible to parallelize the collapsing collector
>>>> across different Lucene index segments?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> jichi
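On the parallelization question: because picking the max per group is associative, group-head selection can in principle be computed per segment and then merged. The Python sketch below shows only that decomposition; the function names and the tuple-based "segments" are hypothetical. It assumes group keys are values comparable across segments (such as the string values themselves), whereas Lucene's per-segment ordinals would first have to be mapped to global ordinals. It is not how the Solr 4.10 CollapsingQParserPlugin is implemented, and in CPython a thread pool will not speed up CPU-bound work, so it illustrates structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def _better(candidate, current):
    # Higher value wins; ties go to the lower doc id for determinism.
    if current is None:
        return True
    return (candidate[0], -candidate[1]) > (current[0], -current[1])


def segment_heads(segment):
    """Group heads for one segment: {group: (value, doc_id)}.

    `segment` is a hypothetical list of (doc_id, group_key, value) tuples
    standing in for one Lucene segment.
    """
    heads = {}
    for doc_id, group, value in segment:
        cand = (value, doc_id)
        if _better(cand, heads.get(group)):
            heads[group] = cand
    return heads


def merge_heads(partials):
    # max is associative, so per-segment winners can be merged in any order.
    merged = {}
    for heads in partials:
        for group, cand in heads.items():
            if _better(cand, merged.get(group)):
                merged[group] = cand
    return merged


def parallel_collapse(segments, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(segment_heads, segments))
    return merge_heads(partials)


segments = [
    [(0, "g1", 3.0), (1, "g2", 5.0), (2, "g1", 4.0)],
    [(3, "g1", 4.5), (4, "g3", 1.0)],
    [(5, "g2", 2.0), (6, "g3", 1.0)],
]
print(parallel_collapse(segments))
# {'g1': (4.5, 3), 'g2': (5.0, 1), 'g3': (1.0, 4)}
```

The merge step is cheap relative to the scan (one entry per group per segment), which is what makes the per-segment split attractive when the scan itself dominates.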
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>

Re: How to speed up field collapsing on large number of groups

Reply via email to