On Fri, 2016-12-16 at 09:31 +0100, Dorian Hoxha wrote:
> I'm researching solr for a project that would require a max-
> inserts(10M/s) and some heavy facet+fq on top of that, though on low
> qps.

You don't ask for much, do you :-) If you add high commit rate to the
list, you have a serious candidate for worst-case.

> And I'm trying to find blogs/slides where people have used some big
> machines instead of hundreds of small ones.
> 
> 1. Largest I've found is this
> <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/>
> with 16cores + 384GB ram but they were using 25! solr4 instances /
> server which seems wasteful to me ?

The way those machines are set up is (nearly) the same as having 16
quadcore machines with 96GB of RAM, each running 6 Solr instances.
I say nearly because the shared memory is a plus as it averages
fluctuations in Solr requirements and a minus because of the cross-
socket penalties in NUMA.
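
The arithmetic behind that comparison, as a small sketch. The figures are the ones from the blog post (4 machines, each with 16 cores, 384GB of RAM and 25 Solr instances); the variable names are only illustrative.

```python
# Figures from the blog post: 4 machines, each 16 cores, 384GB RAM, 25 Solrs.
machines, cores, ram_gb, solrs = 4, 16, 384, 25

# Slice the total hardware into hypothetical quadcore-sized boxes.
quad_boxes = machines * cores // 4               # 16 boxes
ram_per_box = machines * ram_gb / quad_boxes     # 96.0 GB per box
solrs_per_box = machines * solrs / quad_boxes    # 6.25, i.e. roughly 6

print(quad_boxes, ram_per_box, solrs_per_box)
```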

I digress, sorry. Point is that they are not really run as large
machines. The choice of box size vs. box count was hugely driven by
purchase & maintenance cost. Also, as that setup is highly optimized
towards serving a static index, I don't think it would fit your very
high update requirements.

As for your argument for fewer Solrs, each serving multiple shards, it
is entirely valid. I have answered your question about this on the
blog, but the short story is: It works now and optimizing hardware
utilization is not high on our priority list.

> I know that 1 solr can have max ~29-30GB heap because GC is
> wasteful/sucks after that, and you should leave the other amount to
> the os for file-cache.

We try hard to stay below 32GB, but for some setups the penalty of
crossing the boundary is worth it. If, for example, having everything
in 1 shard means a heap requirement of 50GB, it can be a better
solution than a multi-shard setup with 2*25GB heap.

> 2. But do you think 1 instance will be able to fully-use a
> 256GB/20core machine ?

I think (you should verify this) that there are some congestion issues
in the indexing part of Solr: Feeding a single Solr with X threads will
give you a lower index rate than feeding 2 separate Solrs (running on
the same machine) with X/2 threads each.
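
One way to verify this is to time the same set of batches against both layouts. The following is a minimal sketch, not a tuned benchmark: it assumes Solr's JSON update handler at `/update`, the function names `post_batch` and `index_rate` are made up for illustration, and the URLs in the usage comment are hypothetical. It issues no commits, so commit cost is not measured.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_batch(update_url, docs):
    """POST one batch of documents as a JSON array to a Solr update handler."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def index_rate(update_urls, batches, threads_per_solr, send=post_batch):
    """Feed every Solr in update_urls with its own thread pool.

    Batches are dealt round-robin to the Solrs. Returns documents per second.
    """
    shares = [batches[i::len(update_urls)] for i in range(len(update_urls))]

    def feed(url, share):
        with ThreadPoolExecutor(max_workers=threads_per_solr) as pool:
            # list() forces completion and re-raises any worker exception.
            list(pool.map(lambda docs: send(url, docs), share))

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(update_urls)) as outer:
        list(outer.map(feed, update_urls, shares))
    elapsed = time.perf_counter() - start
    return sum(len(b) for b in batches) / elapsed


# Usage (hypothetical hosts, ports and collection name), with X = 16:
# one = index_rate(["http://localhost:8983/solr/test/update"], batches, 16)
# two = index_rate(["http://localhost:8983/solr/test/update",
#                   "http://localhost:8984/solr/test/update"], batches, 8)
```

Run both variants with identical batches on an otherwise idle machine, and repeat a few times, before reading anything into the difference.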

- Toke Eskildsen, State and University Library, Denmark
