Hi everyone, I am using Solr 4.10 to index 20 million documents without sharding. Each document has a groupId field, and there are about 2 million groups. I have found that searching with collapsing on groupId is significantly slower than searching without collapsing, especially when combined with facet queries.
I am wondering what the general approaches are to speed up field collapsing by 2~4 times. Would sharding the index help? Is it possible to optimize collapsing without sharding?

The filter parameter for collapsing looks like this:

  q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}

I also put this fq into the warmup queries in solrconfig.xml to warm the caches. Still, when q changes and more fq clauses are added, the collapsing search takes about 3~5 seconds, while the same search without collapsing finishes within 2 seconds.

I am considering manually optimizing CollapsingQParserPlugin through parallelization or extra caching. For example, is it possible to parallelize the collapsing collector across the different Lucene index segments?

Thanks!

-- jichi
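P.S. To make the per-segment idea concrete, here is a toy Java sketch of what I have in mind. This is not Solr/Lucene code and the names (ParallelCollapse, Doc, collapseSegment) are my own: each "segment" is just a list of (docId, groupId, score) tuples, segments are collapsed in parallel, and the per-segment winners are then merged into one winner per group.

```java
import java.util.*;
import java.util.stream.*;

// Toy model of per-segment collapsing (names are hypothetical, not from
// CollapsingQParserPlugin). Collapse keeps, for each group, the doc with
// the max score; segments are processed in parallel and then merged.
public class ParallelCollapse {
    record Doc(int id, int group, double score) {}

    // Collapse a single segment: best doc per group within that segment.
    static Map<Integer, Doc> collapseSegment(List<Doc> segment) {
        Map<Integer, Doc> best = new HashMap<>();
        for (Doc d : segment) {
            best.merge(d.group(), d, (a, b) -> a.score() >= b.score() ? a : b);
        }
        return best;
    }

    // Collapse all segments in parallel, then merge the per-segment winners.
    static Map<Integer, Doc> collapse(List<List<Doc>> segments) {
        return segments.parallelStream()
                .map(ParallelCollapse::collapseSegment)
                .flatMap(m -> m.values().stream())
                .collect(Collectors.toMap(Doc::group, d -> d,
                        (a, b) -> a.score() >= b.score() ? a : b));
    }

    public static void main(String[] args) {
        List<List<Doc>> segments = List.of(
                List.of(new Doc(1, 10, 1.0), new Doc(2, 10, 3.0)),
                List.of(new Doc(3, 10, 2.0), new Doc(4, 20, 5.0)));
        Map<Integer, Doc> result = collapse(segments);
        // Group 10 collapses to doc 2 (score 3.0), group 20 to doc 4.
        System.out.println(result.get(10).id() + " " + result.get(20).id());
    }
}
```

Since the per-segment maps are independent, this is embarrassingly parallel up to the final merge; my question is whether the real collector can be restructured the same way, given that it currently sees docs in a single pass.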