Bernd: You rarely have to worry about who the leader is unless and until you get to many hundreds of shards. The extra work a leader does is usually minimal, and spending time trying to control where the leaders live is usually time wasted. Leaders will shift from replica to replica anyway. Say your leader for shard1 is on instance1, shard1_replica1. Then you shut instance1 down. The leader will shift to some other replica in the shard, say shard1_replica4.
If you insist, you can use the Collections API commands BALANCESHARDUNIQUE and REBALANCELEADERS. The former assigns a "preferredLeader" role to one replica of each shard, and the latter tries to make those replicas the actual leaders. If you really want to go all-out, you can use ADDREPLICAPROP to make the replica of your choice the preferredLeader. But this is generally a waste of time and energy. Those capabilities were added for a case where hundreds of leaders wound up in the same JVM and the performance impact was noticeable. And even if you do assign the preferredLeader role, that is just a hint, not a requirement: the collection will tend to have the specified replicas be the leaders, but only "tend".

Best,
Erick

On Tue, May 9, 2017 at 5:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 5/9/2017 1:44 AM, Bernd Fehling wrote:
>> From my point of view it is a good solution to have 5 virtual 64GB
>> servers on 5 different huge physical machines and start 2 instances on
>> each virtual server.
>
> If the total amount of memory in the virtual machine is 64GB, then I
> would run one Solr node on it with a heap size between 8 and 16GB. The
> rest of the memory in the virtual machine would then be available to
> cache whatever index data exists. That caching is extremely important
> for good performance.
>
> If the *heap* size is what would be 64GB (and you actually do need that
> much heap), then it *does* make sense to split that into two instances,
> each with a 31GB heap. I would argue that it's better to have those two
> instances on separate machines.
>
> Assuming that you have a bare-metal server with 256GB of RAM, you would
> *not* want to divide that up into five virtual machines each with 64GB.
> The physical host would not have enough memory for all five virtual
> machines. It would have the option of using its disk space as extra
> memory, but as soon as you start swapping memory to disk, the
> performance of ANY software becomes unacceptable.
> Solr in particular requires actual real memory. Oversubscribing memory
> on VMs might work for some workloads, but it won't work for Solr.
>
> If all your virtual machines are running on the same physical host, then
> you have no redundancy. Modern servers have redundant power supplies,
> redundant hard drives, and other kinds of fault tolerance. Even so,
> there are many components in a server that have no redundancy, like the
> motherboard or the backplane. If one of those components were to die,
> all of the virtual machines would go down.
>
> Thanks,
> Shawn
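[Editor's note: the preferredLeader workflow Erick describes maps to the v1 Collections API over HTTP. A minimal sketch follows; the host, port, collection name (mycollection), and replica name (core_node4) are placeholders to adjust for your cluster.]

```shell
# Base URL of the Collections API on one node of the cluster (placeholder host/port).
SOLR="http://localhost:8983/solr/admin/collections"

# 1) BALANCESHARDUNIQUE: assign the "preferredLeader" property to exactly
#    one replica of each shard, spread across the cluster.
curl "$SOLR?action=BALANCESHARDUNIQUE&collection=mycollection&property=preferredLeader"

# 2) REBALANCELEADERS: try to make the preferredLeader replicas the
#    actual leaders. This is a best effort, not a guarantee.
curl "$SOLR?action=REBALANCELEADERS&collection=mycollection"

# Alternative to step 1: pin a specific replica of your choice with
# ADDREPLICAPROP instead of letting Solr spread the property.
curl "$SOLR?action=ADDREPLICAPROP&collection=mycollection&shard=shard1&replica=core_node4&property=preferredLeader&property.value=true"
```

These are administrative commands that require a live SolrCloud cluster; run step 2 again after topology changes if you want leaders pulled back to the preferred replicas.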
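[Editor's note: Shawn's sizing advice corresponds to the `-m` (heap size) flag of the `bin/solr` start script; the ports and heap sizes below are illustrative. The "just under 32GB" figure is not arbitrary: below roughly 32GB the JVM can use compressed ordinary object pointers, which a 64GB heap would forfeit.]

```shell
# One Solr node per 64GB VM with a modest heap, leaving the remaining
# RAM for the OS page cache over the index files:
bin/solr start -cloud -p 8983 -m 12g

# If you genuinely need ~64GB of total heap, run two instances with
# heaps just under 32GB each so both JVMs keep compressed object
# pointers -- ideally on separate machines, as Shawn suggests:
bin/solr start -cloud -p 8983 -m 31g   # on host A
bin/solr start -cloud -p 8983 -m 31g   # on host B
```

Oversubscribing the physical host defeats this layout: once the hypervisor pushes guest memory to disk, the page cache that Solr depends on is gone.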