On Fri, 2016-12-16 at 09:31 +0100, Dorian Hoxha wrote:
> I'm researching solr for a project that would require a max-inserts
> (10M/s) and some heavy facet+fq on top of that, though on low qps.
You don't ask for much, do you :-) If you add a high commit rate to the list, you have a serious candidate for worst-case.

> And I'm trying to find blogs/slides where people have used some big
> machines instead of hundreds of small ones.
>
> 1. Largest I've found is this
> <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/>
> with 16cores + 384GB ram but they were using 25! solr4 instances /
> server which seems wasteful to me ?

The way those machines are set up is (nearly) the same as having 16 quad-core machines with 96GB of RAM, each running about 6 Solr instances. I say nearly because the shared memory is a plus, as it averages out fluctuations in Solr requirements, and a minus, because of the cross-socket penalties in NUMA. I digress, sorry. The point is that they are not really run as large machines.

The choice of box size vs. box count was driven largely by purchase and maintenance cost. Also, as that setup is highly optimized towards serving a static index, I don't think it would fit your very high update requirements.

As for your argument for fewer Solrs, each serving multiple shards, it is entirely valid. I have answered your question about this on the blog, but the short story is: it works now, and optimizing hardware utilization is not high on our priority list.

> I know that 1 solr can have max ~29-30GB heap because GC is
> wasteful/sucks after that, and you should leave the other amount to
> the os for file-cache.

We try hard to stay below 32GB, but for some setups the penalty of crossing the boundary (at roughly 32GB the JVM stops using compressed object pointers, so the same data takes up more heap) is worth it. If, for example, having everything in 1 shard means a heap requirement of 50GB, it can be a better solution than a multi-shard setup with 2*25GB heap.

> 2. But do you think 1 instance will be able to fully-use a
> 256GB/20core machine ?
I think (you should verify this) that there are some congestion issues in the indexing part of Solr: feeding a single Solr with X threads will give you a lower index rate than feeding 2 separate Solrs (running on the same machine) with X/2 threads each.

- Toke Eskildsen, State and University Library, Denmark
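One way to verify the congestion claim on your own hardware is to time the same set of batches fed to one Solr with X threads, and then to two Solrs on the same machine with X/2 threads each. The Python below is a minimal sketch of such a feeder. It assumes each Solr exposes a JSON update handler at a URL you pass in (any URLs you use are your own; none are given here), and it sends no explicit commits. `feed` and `post_batch` are hypothetical names for illustration, not part of any Solr client library.

```python
import itertools
import json
import urllib.request
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence

Doc = Dict[str, object]
Sender = Callable[[str, List[Doc]], None]


def post_batch(endpoint: str, batch: List[Doc]) -> None:
    """POST one batch as a JSON array to a Solr update handler, without committing."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # drain the body; urlopen raises HTTPError on 4xx/5xx


def feed(
    batches: Iterable[List[Doc]],
    endpoints: Sequence[str],
    threads_per_endpoint: int,
    send: Sender = post_batch,
) -> Dict[str, int]:
    """Round-robin batches over endpoints, each with its own bounded thread pool.

    One endpoint with X threads is the "single Solr" case; two endpoints with
    X/2 threads each is the "2 separate Solrs" case.
    Returns the number of documents sent to each endpoint.
    """
    # A separate pool per endpoint caps concurrent requests to each Solr.
    pools = {e: ThreadPoolExecutor(max_workers=threads_per_endpoint) for e in endpoints}
    counts = {e: 0 for e in endpoints}
    futures: List[Future] = []
    try:
        for endpoint, batch in zip(itertools.cycle(endpoints), batches):
            futures.append(pools[endpoint].submit(send, endpoint, batch))
            counts[endpoint] += len(batch)
        for f in futures:
            f.result()  # re-raise the exception of a failed send, if any
    finally:
        for pool in pools.values():
            pool.shutdown(wait=True)
    return counts
```

Timing `feed(batches, [url_a], 8)` against `feed(batches, [url_a, url_b], 4)` with the same batches gives the comparison described above. The sketch queues every batch up front, which is fine for a modest benchmark but would need back-pressure for a sustained feed.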