Myself, I am still in the old camp. For critical machines, I want to know that it is my machine, with my disks, and exactly what software is installed. But maybe the cloud provider's fast network is more important? Cheers--Rick
On May 10, 2017 6:13:27 AM EDT, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
>Hi Rick,
>
>yes, I have distributed 5 virtual servers across 5 physical machines,
>so each virtual server is on a separate physical machine.
>
>Splitting each 64GB virtual server into two 32GB virtual servers, which
>would mean 10 virtual servers across 5 physical machines, is no option
>because there is no gain against hardware failure of a physical machine.
>
>So I would rather go with two Solr instances per 64GB virtual server as
>a first try.
>
>Currently I'm still trying to solve the Rule-based Replica Placement.
>There seems to be no way to tell whether a node is a "leader" or has the
>role "leader".
>
>Do you know how to create a rule like:
>--> "do not create the replica on the same host where its leader exists"
>
>Regards,
>Bernd
>
>
>Am 10.05.2017 um 10:54 schrieb Rick Leir:
>> Bernd,
>>
>> Yes, cloud, ahhh. As you say, the world has changed. Do you have any
>> hint from the cloud provider as to which physical machine your virtual
>> server is on? If so, you can hopefully distribute your replicas across
>> physical machines. This is not just for reliability: in a sharded
>> system, each query will cause activity in several virtual servers, and
>> you would prefer that they are on separate physical machines, not
>> competing for resources. Maybe, for Solr, you should choose a provider
>> which can lease you a whole physical machine. You would prefer a 256GB
>> machine over several shards on 64GB virtual machines.
>>
>> And many cloud providers assume that servers are mostly idle, so they
>> cram too many server containers into a machine. Then, very
>> occasionally, you get OOM even though you did not exceed your
>> advertised RAM. This is a topic for some other forum; where should I
>> look?
>>
>> With AWS you can choose to locate your virtual machine in
>> US-west-Oregon or US-east-i-forget or a few other locations, but that
>> is a very coarse division. Can you choose the physical machine?
>>
>> With Google, it might be dynamic?
>> cheers -- Rick
>>
>>
>> On 2017-05-09 03:44 AM, Bernd Fehling wrote:
>>> I would call your solution more of a workaround, like any similar
>>> solution of this kind.
>>> The issue SOLR-6027 has now been open for 3 years and the world has
>>> changed. Instead of racks full of blades, where you had many
>>> dedicated bare metal servers, you now have huge machines with 256GB
>>> RAM and many CPUs. Virtualization has taken over.
>>> To get some independence from the physical hardware under these
>>> conditions, you have to spread the shards across several physical
>>> machines with virtual servers.
>>> From my point of view it is a good solution to have 5 virtual 64GB
>>> servers on 5 different huge physical machines and start 2 instances
>>> on each virtual server.
>>> If I split up each 64GB virtual server into two 32GB virtual servers,
>>> there would be no gain. We don't have 10 huge machines (no
>>> reliability win) and we would have to administer and monitor 10
>>> virtual servers instead of 5 (plus the ZooKeeper servers).
>>>
>>> It is state of the art that you don't have to care about the servers
>>> within the cloud. That is the main point of a cloud.
>>> The leader should always be aware of who the members of its cloud
>>> are, how to reach them (IP address), and how the users of the cloud
>>> (collections) are distributed across the cloud.
>>>
>>> It would be great if a solution to issue SOLR-6027 led to some kind
>>> of "automatic mode" for server distribution, without any special
>>> configuration.
>>>
>>> Regards,
>>> Bernd
>>>
>>>
>>> Am 08.05.2017 um 17:47 schrieb Erick Erickson:
>>>> Also, you can specify custom placement rules, see:
>>>> https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement
>>>>
>>>> But Shawn's statement is the nub of what you're seeing: by default,
>>>> multiple JVMs on the same physical machine are considered separate
>>>> Solr instances.
>>>>
>>>> Also note that, if you want to, you can specify a nodeSet when you
>>>> create the collection, and in particular the special value EMPTY.
>>>> That'll create a collection with no replicas, and you can then use
>>>> ADDREPLICA to precisely place each one if you require that level of
>>>> control.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Mon, May 8, 2017 at 7:44 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>> On 5/8/2017 5:38 AM, Bernd Fehling wrote:
>>>>>> boss ------ shard1 ----- server2:7574
>>>>>>                     |--- server2:8983 (leader)
>>>>> The reason this happened is that you've got two nodes running on
>>>>> every server. From SolrCloud's perspective, there are ten distinct
>>>>> nodes, not five.
>>>>>
>>>>> SolrCloud doesn't notice that different nodes are running on the
>>>>> same server(s). If your reaction to hearing this is that it
>>>>> *should* notice, you're probably right, but in a typical use case
>>>>> each server should only be running one Solr instance, so this would
>>>>> never happen.
>>>>>
>>>>> There is only one case I can think of where I would recommend
>>>>> running multiple instances per server, and that is when the
>>>>> required heap size for a single instance would be VERY large.
>>>>> Running two instances with smaller heaps can yield better
>>>>> performance.
>>>>>
>>>>> See this issue:
>>>>>
>>>>> https://issues.apache.org/jira/browse/SOLR-6027
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>> -- Sorry for being brief. Alternate email is rickleir at yahoo dot com
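[Editor's note] The Rule-based Replica Placement that Erick and Bernd discuss above can be sketched as a Collections API call. As far as I can tell from the linked documentation, the rule engine of that era exposes tags such as node, host, and port but no "leader" attribute, so Bernd's "not on the same host as its leader" rule cannot be expressed directly; the closest approximation keeps any two replicas of a shard (leader included) on different hosts. The collection name and host below are placeholders, not values from the thread.

```shell
# Sketch only, not verified against a live cluster.
# Rule reading: "for every shard, fewer than 2 replicas on any one host",
# i.e. at most one replica of a shard per host, so a shard's leader and
# its follower can never be co-located.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=boss&numShards=5&replicationFactor=2\
&rule=shard:*,replica:<2,host:*"
```

With two Solr instances per virtual server (as Bernd plans), this keeps a shard's two replicas on different hosts even though each host carries two nodes.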
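[Editor's note] Erick's EMPTY-nodeSet approach can likewise be sketched with the Collections API; the parameter on CREATE is createNodeSet. The host and node names below are hypothetical; real node names come from the live_nodes list in ZooKeeper (visible in the Solr Admin UI).

```shell
# Sketch only: hostnames, ports, and collection name are placeholders.

# 1. Create the collection with shards defined but no replicas placed.
curl "http://localhost:8983/solr/admin/collections?action=CREATE\
&name=boss&numShards=5&replicationFactor=1&createNodeSet=EMPTY"

# 2. Pin each replica to an exact node; repeat for every shard/replica.
#    The node parameter must match an entry under /live_nodes.
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA\
&collection=boss&shard=shard1&node=server1:8983_solr"
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA\
&collection=boss&shard=shard1&node=server3:8983_solr"
```

This gives full manual control of placement, at the cost of scripting one ADDREPLICA call per replica.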