On Fri, Dec 16, 2016 at 10:45 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> On Fri, 2016-12-16 at 09:31 +0100, Dorian Hoxha wrote:
> > I'm researching Solr for a project that would require a maximum
> > insert rate of 10M documents/second, plus some heavy facet+fq
> > queries on top of that, though at low QPS.
>
> You don't ask for much, do you :-) If you add high commit rate to the
> list, you have a serious candidate for worst-case.
>
I'm sorry, the commit interval will be 1-2 seconds :( . But this will be
expiring data, so the index won't grow to petabytes. I can also relax
disk activity, but I don't see a config for relaxing the translog
persistence. Something like: I send a write, Solr returns 'ok', the
document is in the translog, but the translog hasn't yet received an
'ok' from the filesystem.
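
For reference, a minimal SolrJ sketch of the kind of batched feeding I
have in mind (the URL, collection and field names are made up). Note
that this only controls commit frequency, not how eagerly the translog
is synced to disk:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FeedSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/test").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("text_t", "example payload");
        // commitWithin = 2000 ms: ask Solr to make the doc searchable
        // within ~2s, letting it batch many adds into one commit
        // instead of committing per document.
        client.add(doc, 2000);
        client.close();
    }
}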

>
> > And I'm trying to find blogs/slides where people have used some big
> > machines instead of hundreds of small ones.
> >
> > 1. The largest I've found is this
> > <https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/>
> > with 16 cores + 384GB RAM, but they were using 25 (!) Solr 4
> > instances per server, which seems wasteful to me?
>
> The way those machines are set up is (nearly) the same as having 16
> quadcore machines with 96GB of RAM, each running 6 Solr instances.
> I say nearly because the shared memory is a plus, as it averages out
> fluctuations in the individual Solrs' requirements, and a minus
> because of the cross-socket penalties on NUMA hardware.
>
> I digress, sorry. Point is that they are not really run as large
> machines. The choice of box size vs. box count was hugely driven by
> purchase & maintenance cost. Also, as that setup is highly optimized
> towards serving a static index, I don't think it would fit your very
> high update requirements.
>
> As for your argument for fewer Solrs, each serving multiple shards:
> it is entirely valid. I have answered your question about this on the
> blog, but the short story is: it works now, and optimizing hardware
> utilization is not high on our priority list.
>
> > I know that one Solr instance should have at most a ~29-30GB heap,
> > because GC is wasteful beyond that, and you should leave the rest
> > of the memory to the OS for the file cache.
>
> We try hard to stay below 32GB, but for some setups the penalty of
> crossing the boundary is worth it. If, for example, having everything
> in 1 shard means a heap requirement of 50GB, it can be a better
> solution than a multi-shard setup with 2*25GB heaps.
>
The heap is per instance, not per shard. Yeah, having fewer shards is
~more efficient, since the terms dictionary, caches, etc. have less
duplication.
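
As a side note on that 32GB boundary, here is a small sketch (it uses
the JDK-specific HotSpot API, so it assumes an Oracle/OpenJDK JVM) to
check whether a running JVM still uses compressed object pointers,
which is the reason the boundary matters:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class OopsCheck {
    public static void main(String[] args) {
        // Heaps below ~32GB let HotSpot use 32-bit compressed object
        // pointers; above the boundary it falls back to 64-bit
        // pointers, so a slightly larger heap can hold fewer objects.
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println(hotspot.getVMOption("UseCompressedOops"));
    }
}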

>
> > 2. But do you think 1 instance will be able to fully use a
> > 256GB / 20-core machine?
>
> I think (you should verify this) that there are some congestion issues
> in the indexing part of Solr: feeding a single Solr with X threads will
> give you a lower index rate than feeding 2 separate Solrs (running on
> the same machine) with X/2 threads each.
>
That means the thread pools don't scale very well with the number of
cores, assuming we compare 2 shards on 1 Solr vs. 2 Solrs with 1 shard
each.
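
If anyone wants to measure that, here is a rough benchmark sketch
(endpoints, doc shape and sizes are all made up) that feeds a fixed
total number of threads into one endpoint, then round-robins the same
number of threads over two endpoints on the same machine:

import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FeedBench {
    static void feed(List<String> urls, int totalThreads, int docsPerThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(totalThreads);
        long start = System.nanoTime();
        for (int t = 0; t < totalThreads; t++) {
            String url = urls.get(t % urls.size()); // round-robin over endpoints
            pool.submit(() -> {
                // One client per thread, so threads don't serialize on
                // a shared connection.
                try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
                    for (int i = 0; i < docsPerThread; i++) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", UUID.randomUUID().toString());
                        doc.addField("text_t", "payload " + i);
                        client.add(doc, 2000); // commitWithin 2s
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.printf("%d threads -> %s: %.1fs%n",
                totalThreads, urls, (System.nanoTime() - start) / 1e9);
    }

    public static void main(String[] args) throws Exception {
        // Same machine, same total thread count: 1 instance vs. 2.
        feed(Arrays.asList("http://localhost:8983/solr/test"), 8, 10000);
        feed(Arrays.asList("http://localhost:8983/solr/test",
                           "http://localhost:8984/solr/test"), 8, 10000);
    }
}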

>
> - Toke Eskildsen, State and University Library, Denmark
>
Thanks Toke!
