On Fri, Dec 16, 2016 at 10:45 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Fri, 2016-12-16 at 09:31 +0100, Dorian Hoxha wrote:
> > I'm researching solr for a project that would require max
> > inserts (10M/s) and some heavy facet+fq on top of that, though at
> > low qps.
>
> You don't ask for much, do you :-) If you add a high commit rate to
> the list, you have a serious candidate for worst-case.

I'm sorry, the commit will be every 1-2 seconds :( . But this is
expiring data, so it won't grow to petabytes. I can also relax disk
activity. I don't see a config for relaxing the translog persistence,
though? Like I wrote: Solr returns 'ok' once the document is in the
translog, but before the translog has gotten an 'ok' from the
filesystem. (See the commit-tuning sketch at the bottom of this mail.)

> > And I'm trying to find blogs/slides where people have used a few
> > big machines instead of hundreds of small ones.
> >
> > 1. The largest I've found is this
> > <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/>
> > with 16 cores + 384GB RAM, but they were using 25 (!) solr4
> > instances per server, which seems wasteful to me?
>
> The way those machines are set up is (nearly) the same as having 16
> quad-core machines with 96GB of RAM, each running 6 Solr instances.
> I say nearly, because the shared memory is a plus, as it averages out
> fluctuations in Solr requirements, and a minus, because of the
> cross-socket penalties in NUMA.
>
> I digress, sorry. The point is that they are not really run as large
> machines. The choice of box size vs. box count was hugely driven by
> purchase & maintenance cost. Also, as that setup is highly optimized
> towards serving a static index, I don't think it would fit your very
> high update requirements.
>
> As for your argument for fewer Solrs, each serving multiple shards:
> it is entirely valid. I have answered your question about this on the
> blog, but the short story is: it works now, and optimizing hardware
> utilization is not high on our priority list.
>
> > I know that 1 solr can have a max of ~29-30GB heap because GC is
> > wasteful/sucks after that (the JVM loses compressed object pointers
> > above ~32GB), and you should leave the rest to the OS for the file
> > cache.
>
> We try hard to stay below 32GB, but for some setups the penalty of
> crossing the boundary is worth it. If, for example, having everything
> in 1 shard means a heap requirement of 50GB, it can be a better
> solution than a multi-shard setup with 2*25GB heaps.

The heap is for the instance, not for each shard. Yeah, having fewer
shards is ~more efficient, since the terms dictionary, caches etc. are
duplicated less.

> > 2. But do you think 1 instance will be able to fully use a
> > 256GB/20-core machine?
>
> I think (you should verify this) that there are some congestion
> issues in the indexing part of Solr: feeding a single Solr with X
> threads will give you a lower index rate than feeding 2 separate
> Solrs (running on the same machine) with X/2 threads each.

That means the thread pools aren't ~very scalable with the number of
cores - assuming we compare 2 shards on 1 Solr vs. 2 Solrs, each with
1 shard. (A sketch of that feeding pattern is also at the bottom of
this mail.)

> - Toke Eskildsen, State and University Library, Denmark

Thanks Toke!
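
P.S. To make the commit side concrete, here is a minimal sketch of the
relevant solrconfig.xml knobs, assuming a stock Solr 6 setup (the
values are illustrative, not tested at this scale). As far as I know
there is no switch to skip the per-request tlog write itself; the hard
commit is what fsyncs and rolls the tlog:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- transaction log: updates land here before Solr acks them -->
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <!-- hard commit: fsyncs segments and rolls the tlog; keep it
           infrequent and don't open a new searcher on it -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- soft commit: visibility only, no fsync - the 1-2s part -->
      <autoSoftCommit>
        <maxTime>2000</maxTime>
      </autoSoftCommit>
    </updateHandler>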
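
P.P.S. And a sketch of the "2 separate Solrs, X/2 threads each"
feeding pattern with SolrJ (hostnames, ports and the collection name
are made up; ConcurrentUpdateSolrClient buffers adds and drains them
with its own pool of sender threads):

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class TwoInstanceFeeder {
      public static void main(String[] args) throws Exception {
        final int QUEUE = 10_000;  // buffered docs per client
        final int THREADS = 6;     // X/2 sender threads per instance
        // one client per Solr instance on the same machine
        ConcurrentUpdateSolrClient a = new ConcurrentUpdateSolrClient(
            "http://localhost:8983/solr/mycollection", QUEUE, THREADS);
        ConcurrentUpdateSolrClient b = new ConcurrentUpdateSolrClient(
            "http://localhost:8984/solr/mycollection", QUEUE, THREADS);
        for (int i = 0; i < 1_000_000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          (i % 2 == 0 ? a : b).add(doc);  // alternate instances
        }
        a.blockUntilFinished(); b.blockUntilFinished();
        a.close(); b.close();
      }
    }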