Document updates will fail with less than a quorum of ZK nodes, so you won't
be able to index anything while 1 of the 2 servers is down.
It's the one area that always seems counter-intuitive (to me, at any rate):
after all, you have your 2 Solr instances on 1 server, so you have all the
shard data, and logically you should be able to index using just that (and
if you had a single ZK running on that server, it would indeed be fine).
However, ZK needs a 3rd instance running somewhere in order to maintain its
majority rule.
The consensus I've seen tends to be to run a ZK on each of your cloud
servers, and then run some "outside" the cloud on other machines. If you had
a 3rd VM that just ran ZK and nothing else, you could lose any 1 of the 3
machines and still be OK. But if you lose 2, you are in trouble.
-----Original Message-----
From: James Dulin
Sent: Friday, May 31, 2013 10:28 PM
To: solr-user@lucene.apache.org
Subject: RE: 2 VM setup for SOLRCLOUD?
Thanks. When you say updates will fail, do you mean document updates will
fail, or updates to the cluster, like adding a new node? If adding new data
will fail, I will definitely need to figure out a different way to set this
up.
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, May 31, 2013 4:33 PM
To: solr-user@lucene.apache.org
Subject: Re: 2 VM setup for SOLRCLOUD?
Be really careful here. Zookeeper requires a quorum, which is ((zk nodes) /
2) + 1 (integer division). So the problem here is that if (zk nodes) is 2,
both of them need to be up. If either of them is down, searches will still
work, but updates will fail.
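The quorum rule above can be sketched in a few lines; the class and method
names here are just illustrative, not anything from Solr or ZooKeeper:

```java
// Sketch of ZooKeeper's majority rule: an ensemble stays writable only
// while a strict majority (quorum) of its nodes is up.
public class ZkQuorum {

    // quorum = (ensembleSize / 2) + 1, using integer division
    static int quorum(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many node failures the ensemble can tolerate and still
    // keep a quorum.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorum(ensembleSize);
    }

    public static void main(String[] args) {
        // 2-node ensemble: quorum is 2, so it tolerates zero failures --
        // losing either ZK blocks updates, which is the trap in a 2-VM setup.
        System.out.println(quorum(2) + " node(s) needed, tolerates "
                + tolerableFailures(2) + " failure(s)");
        // 3-node ensemble: quorum is 2, so one machine can die safely.
        System.out.println(quorum(3) + " node(s) needed, tolerates "
                + tolerableFailures(3) + " failure(s)");
    }
}
```

Note that going from 2 to 3 ZK nodes is what buys you the first tolerated
failure; going from 3 to 4 buys you nothing (quorum rises to 3 as well).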
Best
Erick
On Fri, May 31, 2013 at 11:39 AM, James Dulin <jdu...@crelate.com> wrote:
Thanks, I think that the load balancer will be simple enough to set up in
Azure. My only other current concern is having the zookeepers on the same
VMs as Solr. While not ideal, we basically just need simple redundancy, so
my theory is that if VM1 goes down, VM2 will have the shard, node, and
zookeeper to keep everything going smoothly.
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, May 31, 2013 8:07 AM
To: solr-user@lucene.apache.org
Subject: Re: 2 VM setup for SOLRCLOUD?
Actually, you don't technically _need_ a load balancer; you could hard-code
all requests to the same node and, internally, everything would "just
work". But then you'd be _creating_ a single point of failure if that node
went down, so a fronting LB is usually indicated.
Perhaps the thing you're missing is that Zookeeper is there explicitly for
the purpose of knowing where all the nodes are and what their state is.
Solr communicates with ZK, and any incoming requests (update or query) are
handled appropriately; hence Jason's comment that once a request gets to
any node in the cluster, things are handled automatically.
All that said, if you're using SolrJ and use CloudSolrServer exclusively,
then the load balancer isn't necessary. Internally, CloudSolrServer (the
client) reads the list of accessible nodes from Zookeeper and will be
fault-tolerant and load-balance internally.
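A minimal sketch of that usage, assuming a SolrJ 4.x client, a collection
named "collection1", and a 3-node ZK ensemble; the zk1/zk2/zk3 host names
are placeholders, not anything from the setup discussed above:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
        // The client is pointed at the ZK ensemble, not at any one Solr
        // node, so it learns the live-node list from ZK and routes
        // requests itself -- no external load balancer needed.
        CloudSolrServer server =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();

        server.shutdown();
    }
}
```

This requires a live cluster to actually run, of course; the point is only
that the constructor takes the ZK connect string rather than a Solr URL.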
Best
Erick
On Thu, May 30, 2013 at 3:51 PM, Jason Hellman
<jhell...@innoventsolutions.com> wrote:
Jamey,
You will need a load balancer on the front end to direct traffic into one
of your SolrCore entry points. Technically, it doesn't matter which one,
though you will find benefits to narrowing traffic to fewer nodes (for
purposes of better cache management).
Internally SolrCloud will round-robin distribute requests to other shards
once a query begins execution. But you do need an entry point externally
to be defined through your load balancer.
Hope this is useful!
Jason
On May 30, 2013, at 12:48 PM, James Dulin <jdu...@crelate.com> wrote:
Working to set up SolrCloud in Windows Azure. I have read over the
SolrCloud wiki, but am a little confused about some of the deployment
options. I am attaching an image for what I am thinking we want to do:
2 VMs that will have 2 shards spanning across them, 4 nodes total across
the two machines, and a zookeeper on each VM. I think this is feasible,
but I am a little confused about how each node knows how to respond to
requests (do I need a load balancer in front, or can we just reference
the "collection", etc.?)
Thanks!
Jamey