Hi Rick,

yes, I have distributed 5 virtual servers across 5 physical machines,
so each virtual server is on a separate physical machine.
Splitting each virtual server (64GB RAM) into two (32GB RAM), which would
give 10 virtual servers across 5 physical machines, is not an option
because there is no gain against hardware failure of a physical machine.
So I'd rather go with two Solr instances per 64GB virtual server as a
first try.

Currently I'm still trying to solve the Rule-based Replica Placement.
There seems to be no way to report whether a node is a "leader" or has
role="leader". Do you know how to create a rule like:
--> "do not create the replica on the same host where its leader lives"

Regards,
Bernd

On 10.05.2017 at 10:54, Rick Leir wrote:
> Bernd,
>
> Yes, cloud, ahhh. As you say, the world changed. Do you have any hint from
> the cloud provider as to which physical machine your virtual server
> is on? If so, you can hopefully distribute your replicas across physical
> machines. This is not just for reliability: in a sharded system, each
> query will cause activity in several virtual servers, and you would prefer
> that they are on separate physical machines, not competing for
> resources. Maybe, for Solr, you should choose a provider which can lease
> you the whole physical machine. You would prefer a 256G machine over
> several shards on 64G virtual machines.
>
> And many cloud providers assume that servers are mostly idle, so they cram
> too many server containers into a machine. Then, very occasionally,
> you get OOM even though you did not exceed your advertised RAM. This is a
> topic for some other forum; where should I look?
>
> With AWS you can choose to locate your virtual machine in US-west-Oregon
> or US-east (I forget the name) or a few other locations, but that is a
> very coarse division. Can you choose the physical machine?
>
> With Google, it might be dynamic?
>
> cheers -- Rick
>
> On 2017-05-09 03:44 AM, Bernd Fehling wrote:
>> I would call your solution more of a workaround, like any similar
>> solution of this kind.
>> The issue SOLR-6027 is now 3 years open and the world has changed.
>> Instead of racks full of blades, where you had many dedicated bare metal
>> servers, you now have huge machines with 256GB RAM and many CPUs.
>> Virtualization has taken over.
>> To get some independence from the physical hardware under these
>> conditions, you have to spread the shards across several physical
>> machines running virtual servers.
>> From my point of view it is a good solution to have 5 virtual 64GB
>> servers on 5 different huge physical machines and to start 2 instances
>> on each virtual server.
>> If I split up each 64GB virtual server into two 32GB virtual servers,
>> there would be no gain. We still don't have 10 huge machines (no gain in
>> resilience), and we would have to administer and monitor 10 virtual
>> servers instead of 5 (plus the ZooKeeper servers).
>>
>> It is state of the art that you don't have to care about the servers
>> within the cloud. That is the main point of a cloud.
>> The leader should always be aware of which nodes are members of its
>> cloud, how to reach them (IP address), and how the users of the cloud
>> (collections) are distributed across the cloud.
>>
>> It would be great if a solution to issue SOLR-6027 led to some kind of
>> "automatic mode" for server distribution, without any special
>> configuration.
>>
>> Regards,
>> Bernd
>>
>> On 08.05.2017 at 17:47, Erick Erickson wrote:
>>> Also, you can specify custom placement rules, see:
>>> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>>>
>>> But Shawn's statement is the nub of what you're seeing: by default,
>>> multiple JVMs on the same physical machine are considered separate
>>> Solr instances.
>>>
>>> Also note that if you want to, you can specify a nodeSet when you
>>> create the collection, and in particular the special value EMPTY.
>>> That'll create a collection with no replicas, and you can ADDREPLICA
>>> to precisely place each one if you require that level of control.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>> On 5/8/2017 5:38 AM, Bernd Fehling wrote:
>>>>> boss ------ shard1 ----- server2:7574
>>>>>               |      |-- server2:8983 (leader)
>>>> The reason this happened is that you've got two nodes running on
>>>> every server. From SolrCloud's perspective, there are ten distinct
>>>> nodes, not five.
>>>>
>>>> SolrCloud doesn't notice that different nodes are running on
>>>> the same server(s). If your reaction to hearing this is that it
>>>> *should* notice, you're probably right, but in the typical use case
>>>> each server runs only one Solr instance, so this situation would
>>>> never arise.
>>>>
>>>> There is only one case I can think of where I would recommend
>>>> running multiple instances per server, and that is when the required
>>>> heap size for a single instance would be VERY large. Running two
>>>> instances with smaller heaps can yield better performance.
>>>>
>>>> See this issue:
>>>>
>>>> https://issues.apache.org/jira/browse/SOLR-6027
>>>>
>>>> Thanks,
>>>> Shawn
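[Editor's note] Regarding Bernd's question about a "leader" rule: as far as the
Rule-based Replica Placement page goes, there is no leader tag at all (the only
value the `role` tag supports is `overseer`). A hedged workaround sketch: since
the leader is itself one of a shard's replicas, a rule that keeps each shard's
replicas on distinct hosts also guarantees no replica shares a host with its
leader. The host names (`vm1`) and collection name (`boss`) below are made up
for illustration, and the call is untested:

```shell
# Untested sketch: create a collection whose placement rule keeps fewer than
# 2 replicas of any single shard on the same host. "host" is one of the tags
# the rules framework provides by default; there is no "leader" tag.
curl -G "http://vm1:8983/solr/admin/collections" \
  --data-urlencode "action=CREATE" \
  --data-urlencode "name=boss" \
  --data-urlencode "numShards=5" \
  --data-urlencode "replicationFactor=2" \
  --data-urlencode "rule=shard:*,replica:<2,host:*"
```

Note that `--data-urlencode` with `-G` is used so the `<` and `*` characters in
the rule reach Solr safely as URL query parameters.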