Document updates will fail with fewer than a quorum of ZK nodes, so you won't be able to index anything while one of the two servers is down.

It's the one area that always seems counterintuitive (to me at any rate). After all, you have your 2 Solr instances on 1 server, so you have all the shard data; logically you should be able to index using just that (and if you had a single ZK running on that server it would indeed be fine). However, ZK needs a 3rd instance running somewhere in order to maintain its majority rule.

The consensus I've seen tends to be: run a ZK on each of your cloud servers, and then run one or more "outside" the cloud on other machines. If you had a 3rd VM that just ran ZK and nothing else, you could lose any 1 of the 3 machines and still be OK. But if you lose 2 you are in trouble.
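
For what it's worth, a minimal zoo.cfg sketch for that 3-node ensemble might look like the following (solr-vm1, solr-vm2 and zk-only-vm are just placeholder hostnames for your machines; the same file goes on all three boxes, and each box also needs a myid file matching its server.N line):

  # basic timing and storage settings
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/zookeeper/data
  clientPort=2181
  # the ensemble: two ZKs co-located with Solr, one on the ZK-only VM
  server.1=solr-vm1:2888:3888
  server.2=solr-vm2:2888:3888
  server.3=zk-only-vm:2888:3888

You'd then point each Solr instance at the whole ensemble, e.g. -DzkHost=solr-vm1:2181,solr-vm2:2181,zk-only-vm:2181.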

-----Original Message-----
From: James Dulin
Sent: Friday, May 31, 2013 10:28 PM
To: solr-user@lucene.apache.org
Subject: RE: 2 VM setup for SOLRCLOUD?

Thanks. When you say updates will fail, do you mean document updates will fail, or updates to the cluster, like adding a new node? If adding new data will fail, I will definitely need to figure out a different way to set this up.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, May 31, 2013 4:33 PM
To: solr-user@lucene.apache.org
Subject: Re: 2 VM setup for SOLRCLOUD?

Be really careful here. Zookeeper requires a quorum, which is ((zk nodes)/2) + 1 (integer division). So the problem here is that if (zk nodes) is 2, the quorum is 2 and both of them need to be up. If either of them is down, searches will still work, but updates will fail.
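
To make that concrete, just working the formula: 2 ZK nodes -> quorum = 2/2 + 1 = 2, so no failures tolerated; 3 nodes -> quorum = 3/2 + 1 = 2, so one node can be down; 5 nodes -> quorum = 3, so two can be down. That's why 3 is the practical minimum for any fault tolerance.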

Best
Erick

On Fri, May 31, 2013 at 11:39 AM, James Dulin <jdu...@crelate.com> wrote:

Thanks, I think that the load balancer will be simple enough to set up in Azure. My only other current concern is having the ZooKeepers on the same VMs as Solr. While not ideal, we basically just need simple redundancy, so my theory is that if VM1 goes down, VM2 will have the shard, node, and ZooKeeper to keep everything going smoothly.


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, May 31, 2013 8:07 AM
To: solr-user@lucene.apache.org
Subject: Re: 2 VM setup for SOLRCLOUD?

Actually, you don't technically _need_ a load balancer; you could hard-code all requests to the same node and, internally, everything would "just work". But then you'd be _creating_ a single point of failure if that node went down, so a fronting LB is usually indicated.

Perhaps the thing you're missing is that Zookeeper is there explicitly for the purpose of knowing where all the nodes are and what their state is. Solr communicates with ZK, and any incoming requests (update or query) are handled appropriately; thus Jason's comment that once a request gets to any node in the cluster, things are handled automatically.

All that said, if you're using SolrJ and use CloudSolrServer exclusively, then the load balancer isn't necessary. Internally CloudSolrServer (the client) reads the list of accessible nodes from Zookeeper and will be fault tolerant and load balance internally.
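
If it helps, here's a rough, untested sketch of that against the 4.x SolrJ API (the ZK hosts and collection name are just placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
      // point the client at the ZK ensemble, not at any single Solr node
      CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
      server.setDefaultCollection("collection1");

      // index a document; the client sends it to a live node it found via ZK
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      server.add(doc);
      server.commit();

      // query; requests are load balanced across the live nodes
      System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());

      server.shutdown();
    }
  }

Note there's no load balancer anywhere in that picture; the client gets the cluster state straight from ZK.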

Best
Erick

On Thu, May 30, 2013 at 3:51 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
Jamey,

You will need a load balancer on the front end to direct traffic into one of your SolrCore entry points. It doesn't matter, technically, which one, though you will find benefits to narrowing traffic to fewer of them (for purposes of better cache management).

Internally, SolrCloud will round-robin requests to the other shards once a query begins execution. But you do need an external entry point, defined through your load balancer.

Hope this is useful!

Jason

On May 30, 2013, at 12:48 PM, James Dulin <jdu...@crelate.com> wrote:

Working to set up SolrCloud in Windows Azure. I have read over the SolrCloud wiki, but am a little confused about some of the deployment options. I am attaching an image of what I am thinking we want to do: 2 VMs that will have 2 shards spanning across them, 4 nodes total across the two machines, and a ZooKeeper on each VM. I think this is feasible, but I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the "collection", etc.).



Thanks!

Jamey



