Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote:
> Hardware wise, I have a 32-node Hadoop cluster that I use to run all of the 
> Solr shards and
> each node has 128GB of memory. The current SolrCloud setup is split into 4 
> separate and
> individual clouds of 32 shards each thereby giving four running shards per 
> cloud or one
> cloud per eight nodes.

You mean 4 running shards per node, right? With 6GB/shard that leaves about 
100GB RAM for everything else on each node.

[Snip: 10 billion insertions/day]

That is nearly 4000 insertions/second per node. Quite a lot.
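
For reference, a quick sketch of that arithmetic. It assumes the 10 billion 
insertions/day are spread evenly over all 32 nodes; the function name is only 
illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def inserts_per_second_per_node(docs_per_day, nodes):
    # Assumption: the load is distributed evenly across the nodes.
    return docs_per_day / SECONDS_PER_DAY / nodes

rate = inserts_per_second_per_node(10_000_000_000, 32)
print(f"{rate:.0f} insertions/second per node")
```

This comes out at roughly 3600 insertions/second per node.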

> A single shard index in one collection in the busiest cloud currently takes 
> up 30G disk space
> or 960G for the entire collection. The documents are being auto committed 
> with a hard
> commit time of 4 minutes (opensearcher = false) and soft commit time of 8 
> minutes.

And you have 4 of these collections, so each node holds about 120GB of index 
with heavy updating?

> In the initial load testing, I was able to achieve a projected indexing rate 
> of 10 Billion
> documents per cloud per day for a grand total of 40 Billion per day. However, 
> the initial load
> testing was done on fairly empty clouds with just a few small collections. 
> Now that there have
> been several days of documents being indexed, I am starting to see a fairly 
> steep drop-off in
> indexing performance once the clouds reached about 15 full collections [...]

If a single collection is 30GB and you have 15 now, that means your indexes 
take up about 450GB on each node, which has less than 100GB of free memory. 
Not everything is disk cached, and since you are doing searches while you 
index, your indexer must compete for the disk cache. It seems natural that 
this would slow down indexing, with the slowdown getting progressively worse 
as you fill the storage with active indexes.
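
A rough sketch of the cache pressure, using the figures above (15 collections 
of about 30GB each, about 100GB of free memory per node). It assumes all free 
memory is available as disk cache, which is optimistic; the function name is 
only illustrative.

```python
def cacheable_fraction(index_gb, free_ram_gb):
    # Upper bound: assumes every byte of free RAM serves as disk cache.
    return min(1.0, free_ram_gb / index_gb)

index_gb = 15 * 30  # collections * GB per collection on a node
frac = cacheable_fraction(index_gb, 100)
print(f"{index_gb} GB of index, at most {frac:.0%} fits in the disk cache")
```

Each additional full collection lowers that fraction further, which matches a 
gradual drop-off rather than a sudden one.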

If you could isolate the "old" collections from the ones being updated, you 
could avoid this cache competition. Or you could of course throw more hardware 
at the problem: Are you stuck on spinning drives or are you using SSDs?

- Toke Eskildsen
