Sharding will help, but you'll need to co-locate documents by group ID
(more on that below). A few questions / suggestions:

1) What is the size of the result set before the collapse?
2) Have you tested without the long formula, using just a plain field for
the min/max? It would be good to understand the impact of the formula on
performance (see the example after these questions).
3) How much memory do you have on the server, and how large is the heap?
Memory use rises with the cardinality of the collapse field, so you'll want
to be sure there is enough memory to comfortably perform the collapse.
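
On point 2, one way to isolate the cost of the formula is to collapse on a
plain numeric field and compare the timings. The field name below is just a
placeholder:

    q=*:*&fq={!collapse field=groupId max=someNumericField}

If that version is fast and the formula version is slow, the formula itself
is the main cost rather than the collapse.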



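On the sharding point: collapsing in a distributed setup only works when
all documents of a group live on the same shard. One way to get that,
assuming a move to SolrCloud with the compositeId router, is to prefix each
document id with its group id (the ids below are made up):

    id = <groupId>!<docId>    e.g.  42!doc-9876

Documents that share the same prefix hash to the same shard, so each group
stays together and the collapse can run per shard.
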
Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 28, 2016 at 4:08 PM, jichi <jichi...@gmail.com> wrote:

> Hi everyone,
>
> I am using Solr 4.10 to index 20 million documents without sharding.
> Each document has a groupId field, and there are about 2 million groups.
> I found that search with collapsing on groupId is significantly slower
> compared to search without collapsing, especially when combined with facet
> queries.
>
> I am wondering what the general approach would be to speed up field
> collapsing by 2~4 times?
> Would sharding the index help?
> Is it possible to optimize collapsing without sharding?
>
> The filter parameter for collapsing is like this:
>
>     q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
>
> I also put this fq into the warm-up queries XML to warm up the caches. But
> still, when q changes and more fq are added, the collapsing search takes
> about 3~5 seconds. Without collapsing, the search can finish within 2
> seconds.
>
> I am thinking of manually optimizing CollapsingQParserPlugin through
> parallelization or extra caching.
> For example, is it possible to parallelize the collapsing collector across
> different Lucene index segments?
>
> Thanks!
>
> --
> jichi
>
