RE: Solr cloud performance degradation with billions of documents

Toke Eskildsen Fri, 15 Aug 2014 13:47:43 -0700

Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote:
> You make some very good valid points. Let me clear a few things up, though.
> We are not trying to put 7B docs into one single shard, because we are using
> collections, created daily, which spread the index across the 32 shards that
> make up the cloud/collection.


Just to be sure I understand: You make a new collection, consisting of 32 
shards, each day? And when you do, the old collection is not updated anymore?

As your primary problem is indexing speed degradation, dividing your machines 
into a dedicated search pool and a dedicated index pool (which also serves 
searches in the collection being built) might work. This would require you to 
move finished collections from the indexers to the searchers, but it would 
give you quite fine-grained control over how much power is given to each of 
the two jobs, by adjusting the pool sizes. Furthermore, having shards that are 
no longer updated allows for optimization down to a single segment, which 
might also help with performance.
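
As a rough sketch of the optimization step: once a daily collection is no 
longer updated, an optimize request can be sent to its update handler with 
maxSegments=1. The host and the collection name below are made up for 
illustration, and the script only builds the URL; sending it is left to you, 
as an optimize on a large index is heavy on I/O and should be timed carefully.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments=1):
    """Build the update-handler URL that asks Solr to merge the
    collection's index down to at most `max_segments` segments."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


if __name__ == "__main__":
    # Hypothetical host and collection name, not taken from the thread.
    print(optimize_url("http://localhost:8983/solr", "logs_2014_08_14"))
```

Issuing a GET against the printed URL (with curl, for example) triggers the 
merge. Only do this for collections that have stopped receiving updates.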

> You are very correct about the memory issues. In fact, we cannot do any
> complicated searches or faceting without Solr returning memory errors.

Could you describe the field(s) you would like to facet on? Number/string? 
Single-/multi-value? Have you tried with DocValues? Under the right 
circumstances, faceting can be done surprisingly cheaply.
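
For reference, a minimal sketch of a facet-only request, assuming the field 
has been declared with docValues="true" in schema.xml and the data has been 
reindexed. The field name device_type, the host and the collection name are 
hypothetical. Setting rows=0 skips document retrieval, so only the facet 
counts are computed and returned.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, collection, field, limit=10):
    """Build a select URL that returns only facet counts for `field`."""
    params = urlencode({
        "q": "*:*",            # match all documents
        "rows": 0,             # no documents in the response, only counts
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # number of top facet values to return
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


if __name__ == "__main__":
    # Hypothetical host, collection and field names.
    print(facet_query_url("http://localhost:8983/solr", "logs_2014_08_14", "device_type"))
```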

- Toke Eskildsen
