The top_fc hint doesn't come into play until Solr 5. With Solr 4.x the CollapsingQParserPlugin always uses a top-level field cache.
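
For anyone following along on Solr 5 or later, here is a minimal sketch of how the collapse filter with the hint could be assembled as request parameters. The helper names (collapse_fq, select_params) are hypothetical and not part of Solr; groupId is the field name from the original question, and rows=30 matches the roughly 30 documents returned per query mentioned in the thread.

```python
from urllib.parse import urlencode


def collapse_fq(field, max_field=None, hint=None):
    """Build a {!collapse ...} filter query string (hypothetical helper)."""
    parts = ["field=" + field]
    if max_field is not None:
        # Group head selection by the maximum value of a field or function.
        parts.append("max=" + max_field)
    if hint is not None:
        # Per the note above, the hint only has an effect on Solr 5 and later.
        parts.append("hint=" + hint)
    return "{!collapse " + " ".join(parts) + "}"


def select_params(q, fq, rows=30):
    """URL-encode the parameters for a select request (hypothetical helper)."""
    return urlencode({"q": q, "fq": fq, "rows": rows})


fq = collapse_fq("groupId", hint="top_fc")
print(fq)
print(select_params("*:*", fq))
```

On Solr 4.x the same fq without the hint parameter behaves the same way, since the top-level field cache is always used there.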

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Jul 4, 2016 at 9:20 AM, Alessandro Benedetti <abenede...@apache.org> wrote:

> Have you tried with docValues for the fields involved in the collapse
> group head selection?
>
> A group head selection of "min", "max" and "sort" should work quite well.
> Of course it depends on your formula.
>
> Does your index change often?
> If the warming time is not a problem you could try with:
>
> hint
>
> Currently there is only one hint available "top_fc", which stands for top
> level FieldCache. The top_fc hint is only available when collapsing on
> String fields. top_fc provides the best query time speed but takes the
> longest to warm on startup or following a commit. top_fc also will result
> in having the collapsed field cached in memory twice if it's used for
> faceting or sorting.
>
> Cheers
>
> On Wed, Jun 29, 2016 at 1:59 AM, Jichi Guo <jichi...@gmail.com> wrote:
>
>> Thanks for the quick response, Joel!
>>
>> I am hoping to delay sharding if possible, which might involve more
>> things to consider :)
>>
>> 1) What is the size of the result set before the collapse?
>>
>> When searching with q=*:* for example, numFound before collapse is around
>> 5 million, and after collapse it is 2 million.
>>
>> I only return about the top 30 documents in the result.
>>
>> 2) Have you tested without the long formula, just using a field for the
>> min/max. It would be good to understand the impact of the formula on
>> performance.
>>
>> The performance seems to be affected by the number of fields appearing in
>> the max formula.
>>
>> For example, that expensive 5 million query would take 4.4 sec.
>>
>> For both {!collapse field=productGroupId} and
>> {!collapse field=productGroupId max=only_one_field}, the query time would
>> reduce to around 2.4 sec.
>>
>> If I remove the entire collapse fq, then the query only took 1.3 sec.
>>
>> 3) How much memory do you have on the server and for the heap. Memory use
>> rises with the cardinality of the collapse field. So you'll want to be
>> sure there is enough memory to comfortably perform the collapse.
>>
>> I am setting Xmx to 24G. The total index size on disk is 50G.
>>
>> In solrconfig.xml, I use solr.FastLRUCache for filterCache with cache size
>> 2048, solr.LRUCache for documentCache with cache size 32768, and
>> solr.LRUCache for queryResultCache with cache size 4096. I am using the
>> default fieldValueCache.
>>
>> I found that the Collapsing QParser plugin explicitly uses Lucene's field
>> cache.
>>
>> Maybe increasing fieldCache would help? But I am not sure how to increase
>> it in Solr.
>>
>> Sent from Nylas N1, the extensible, open source mail client.
>>
>> On Jun 28 2016, at 4:48 pm, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> Sharding will help, but you'll need to co-locate documents by group ID.
>>> A few questions / suggestions:
>>>
>>> 1) What is the size of the result set before the collapse?
>>>
>>> 2) Have you tested without the long formula, just using a field for the
>>> min/max. It would be good to understand the impact of the formula on
>>> performance.
>>>
>>> 3) How much memory do you have on the server and for the heap. Memory
>>> use rises with the cardinality of the collapse field. So you'll want to
>>> be sure there is enough memory to comfortably perform the collapse.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Tue, Jun 28, 2016 at 4:08 PM, jichi <jichi...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am using Solr 4.10 to index 20 million documents without sharding.
>>>> Each document has a groupId field, and there are about 2 million groups.
>>>> I found the search with collapsing on groupId significantly slower
>>>> compared to without collapsing, especially when combined with facet
>>>> queries.
>>>>
>>>> I am wondering what would be the general approach to speed up field
>>>> collapsing by 2~4 times?
>>>> Would sharding the index help?
>>>> Is it possible to optimize collapsing without sharding?
>>>>
>>>> The filter parameter for collapsing is like this:
>>>>
>>>> q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
>>>>
>>>> I also put this fq into the warmup queries xml to warm up caches. But
>>>> still, when q changes and more fq are added, the collapsing search would
>>>> take about 3~5 seconds. Without collapsing, the search can finish within
>>>> 2 seconds.
>>>>
>>>> I am thinking of manually optimizing CollapsingQParserPlugin through
>>>> parallelization or extra caching.
>>>> For example, is it possible to parallelize the collapsing collector by
>>>> different Lucene index segments?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> jichi
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England