Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Here is a small snippet that I copy-pasted from Shawn Heisey (who is a core contributor, I think; he's good): > One thing to note: SolrCloud begins to have performance issues when the > number of collections in the cloud reaches the low hundreds. It's not > going to scale very well with a collecti

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
> > And this overhead depends on what? I mean, if I create an empty collection > will it take up much heap just for "being there"? Yes. You can search the Elasticsearch/Solr/Lucene mailing lists and see that it's true. But nobody has `empty` collections, so yours will have a schema and some

Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
The way the data is spread across the cluster is not really uniform. Most of the shards hold far less than 50GB; I would say about 15% of the total shards have more than 50GB. Dorian Hoxha wrote > Each shard is a Lucene index which has a lot of overhead. And this overhead depends on what? I mean,

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
What I'm suggesting is that you should aim for max(50GB) per shard of data (a rough sizing sketch follows below). How much is it currently? Each shard is a Lucene index which has a lot of overhead. If you can, try to have 20x-50x-100x fewer shards than you currently do and you'll see lower heap requirements. I don't know about static/d
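A minimal sketch of the arithmetic behind that suggestion, using the 3TB and 18K-shard figures quoted elsewhere in this thread (jpereira notes the real cluster holds more than 3TB, so treat the numbers as illustrative, not a measured plan):

    // How many shards would a 50GB-per-shard cap imply for ~3TB of data?
    public class ShardCap {
        public static void main(String[] args) {
            double totalGb = 3000;    // ~3TB, the figure quoted in the thread
            double capGb = 50;        // suggested per-shard ceiling

            long shards = (long) Math.ceil(totalGb / capGb);
            System.out.println("shards needed at 50GB cap: " + shards);              // 60
            System.out.println("reduction vs 18K shards: " + (18000 / shards) + "x"); // 300x
        }
    }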

Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
Dorian Hoxha wrote > Isn't 18K Lucene indexes (1 for each shard, not counting the replicas) a > little too much for 3TB of data? > Something like 0.167GB for each shard? > Isn't that too much overhead (I've mostly worked with ES, but it's still Lucene > underneath)? I don't have only 3TB, I have 3TB

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Also you should change the heap from 32GB to 30GB so you're guaranteed to get pointer compression (compressed oops); a quick way to verify is sketched below. I think you should have no need to increase it beyond that, since most things have moved off-heap, like docValues etc. On Tue, Apr 11, 2017 at 12:07 PM, Dorian Hoxha wrote: > Isn't 18K luce
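A minimal sketch of one way to check whether the JVM actually enabled compressed oops at a given heap size. This uses the HotSpot diagnostic MXBean, which is HotSpot-specific and not something discussed in the thread itself:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    // Prints whether compressed ordinary object pointers (oops) are in
    // effect. Run with -Xmx30g and then -Xmx32g to see the difference:
    // above roughly 32GB, HotSpot silently disables compression and
    // every object reference doubles in size.
    public class CheckCompressedOops {
        public static void main(String[] args) {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            System.out.println("max heap: "
                    + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
            System.out.println(hotspot.getVMOption("UseCompressedOops"));
        }
    }

Alternatively, `java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops` shows the same thing without writing any code.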

Re: Dynamic schema memory consumption

2017-04-11 Thread Dorian Hoxha
Isn't 18K Lucene indexes (1 for each shard, not counting the replicas) a little too much for 3TB of data? Something like 0.167GB for each shard? Isn't that too much overhead (I've mostly worked with ES, but it's still Lucene underneath)? Can't you use 1/100 of the current number of collections? On Mon
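A sketch of where the 0.167GB figure comes from, and why so many tiny shards inflates heap use. The per-shard overhead constant below is a made-up placeholder for illustration only, not a measured number; the real overhead depends on schema, segment count, caches, and so on:

    // Why 18K shards for 3TB looks suspicious: data per shard is tiny,
    // so fixed per-index overhead dominates total heap consumption.
    public class ShardOverhead {
        public static void main(String[] args) {
            double totalGb = 3000;   // ~3TB of data (thread figure)
            int shards = 18_000;     // shard count (thread figure)
            System.out.printf("avg data per shard: %.3f GB%n",
                    totalGb / shards);                                   // ~0.167

            // Hypothetical fixed heap overhead per open Lucene index.
            double overheadMbPerShard = 5;
            System.out.printf("fixed overhead at %d shards: ~%.0f GB of heap%n",
                    shards, shards * overheadMbPerShard / 1024);         // ~88 GB
        }
    }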