"We need a way to determine that a node is still 'alive' and should be
in the load balancer, and we need a way to know that a new node is now
available and fully ready with its replicas to add to the load
balancer."

Why? If a Solr node is running but its replicas aren't up yet, it'll
pass the request along to a node that _does_ have live replicas, so you
don't have to do anything. As for knowing the node is alive, there are
lots of ways; any API endpoint requires a running Solr to field it, so
perhaps just use the Collections API LIST command?
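For instance, a load balancer health probe could just hit that endpoint
and check for a 200. A minimal sketch in Java (the host, port, and class
name are placeholders, not anything Solr ships):

    // Minimal liveness probe: a node that can answer the Collections API
    // LIST command has a running Solr fielding requests.
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SolrLivenessCheck {
        public static boolean isAlive(String baseUrl) {
            try {
                URL url = new URL(baseUrl + "/admin/collections?action=LIST");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                return conn.getResponseCode() == 200; // 200 == alive and answering
            } catch (Exception e) {
                return false; // unreachable or not serving yet
            }
        }

        public static void main(String[] args) {
            System.out.println(isAlive("http://node1:8983/solr"));
        }
    }

Note this only tells you the node is up, not that its replicas are ready;
more on that below.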

"How does ZooKeeper make this determination?  Does it do something
different if multiple collections are on a single cluster?  And, even
with just one cluster, what is best practice for keeping a current
list of active nodes in the cluster, especially for extremely high
query rates?"

This is a common misconception. ZooKeeper isn't interested in Solr at
all. ZooKeeper will ping the nodes it knows about and, perhaps, remove
a node from the live_nodes list, but that's all. It isn't involved in
Solr's operation in terms of routing queries, updates or anything like
that.

_Solr_ keeps track of all this by _watching_ various znodes. Say a Solr
node hosts some replica in a collection. When it comes up, it sets a
"watch" on the /collections/my_collection/state.json znode. It also
publishes its own state. So say it hosts three replicas for the
collection. As each one is loaded and ready for action, Solr posts an
update to the relevant state.json znode.

ZooKeeper is then responsible for telling any other node that has set a
watch that the znode has changed. ZK doesn't know or care whether those
watchers are Solr nodes or not.
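SolrCloud does all of this internally, but conceptually the watch
mechanics look roughly like this with the plain ZooKeeper client (the
ensemble address and collection name are made up for illustration):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class StateWatchSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder ensemble address, for illustration only.
            ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> {});

            String path = "/collections/my_collection/state.json";
            Watcher watcher = new Watcher() {
                public void process(WatchedEvent event) {
                    // ZK calls this once when the znode changes; the watcher
                    // then re-reads state.json and re-registers the watch.
                    System.out.println("state.json changed: " + event.getType());
                }
            };
            // getData registers the watch and returns the current state bytes.
            byte[] data = zk.getData(path, watcher, null);
            System.out.println(new String(data, "UTF-8"));
        }
    }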

So when a request comes in to a Solr node, that node knows which other
Solr nodes host which replicas and does all the sub-requests itself; ZK
isn't involved at all at that level.
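That's also why a plain SolrJ client pointed at any single node still
gets results for the whole collection. A rough sketch (node and
collection names are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryAnyNode {
        public static void main(String[] args) throws Exception {
            // Point at any one node; that node fans sub-requests out to
            // whichever nodes host the shards and merges the results.
            HttpSolrClient client =
                new HttpSolrClient.Builder("http://node1:8983/solr/collectionA").build();
            QueryResponse rsp = client.query(new SolrQuery("*:*"));
            System.out.println("hits: " + rsp.getResults().getNumFound());
            client.close();
        }
    }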

So imagine node1 hosts S1R1 and S2R1, and node2 hosts S1R2 and S2R2 (for
collection A). When node1 comes up, it updates the state in ZK to say
S1R1 and S2R1 are "active". Now say node2 is coming up but hasn't
loaded its cores yet. If it receives a request, it can forward it on
to node1.

Now node2 loads both its cores. It updates the znode for the
collection, and since node1 is watching, node1 fetches the updated
state.json. From this point forward, both nodes have complete
information about all the replicas in the collection and don't need to
reference ZK any more at all.
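If you want to look at that same cached state from outside Solr, say to
decide when a new node is fully ready before adding it to an external
load balancer, SolrJ's CloudSolrClient exposes it. A rough sketch,
assuming a ZK ensemble at zk1:2181 and a collection named collectionA:

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;

    public class ReplicaReadiness {
        public static void main(String[] args) throws Exception {
            CloudSolrClient cloud = new CloudSolrClient.Builder(
                    Collections.singletonList("zk1:2181"), Optional.empty()).build();
            cloud.connect();

            // The same state.json the nodes watch: list each replica, its node,
            // and whether it's ACTIVE on a node currently in live_nodes.
            DocCollection coll = cloud.getZkStateReader()
                    .getClusterState().getCollection("collectionA");
            for (Replica r : coll.getReplicas()) {
                boolean ready = r.getState() == Replica.State.ACTIVE
                        && cloud.getZkStateReader().getClusterState()
                                .getLiveNodes().contains(r.getNodeName());
                System.out.println(r.getNodeName() + " / " + r.getCoreName()
                        + " ready=" + ready);
            }
            cloud.close();
        }
    }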

In fact, ZK can go away completely and _queries_ will continue to work
off the cached state.json. Updates will fail, though, since a ZK quorum
is required for index updates to prevent "split brain" problems.

Best,
Erick

On Mon, Apr 30, 2018 at 11:03 AM, Monica Skidmore
<monica.skidm...@careerbuilder.com> wrote:
> Thank you, Erick.  That confirms our understanding for a single cluster, or 
> once we select a node from one of the two clusters to query.
>
> As we try to set up an external load balancer to go between two clusters, 
> though, we still have some questions.  We need a way to determine that a node 
> is still 'alive' and should be in the load balancer, and we need a way to 
> know that a new node is now available and fully ready with its replicas to 
> add to the load balancer.
>
> How does ZooKeeper make this determination?  Does it do something different 
> if multiple collections are on a single cluster?  And, even with just one 
> cluster, what is best practice for keeping a current list of active nodes in 
> the cluster, especially for extremely high query rates?
>
> Again, if there's some good documentation on this, I'd love a pointer...
>
> Monica Skidmore
> Senior Software Engineer
>
>
>
> On 4/30/18, 1:09 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>     Multiple clusters with the same dataset aren't load-balanced by Solr,
>     you'll have to accomplish that from "outside", e.g. something that sends
>     queries to each cluster.
>
>     _Within_ a cluster (collection), as long as a request gets to any Solr
>     node, sub-requests are distributed with an internal software LB. As far as
>     a single collection, you're fine just sending any query to any node. Even
>     if you send a query to a node that hosts no replicas for a collection,
>     Solr will "do the right thing" and forward it appropriately.
>
>     HTH,
>     Erick
>
>     On Mon, Apr 30, 2018 at 9:46 AM, Monica Skidmore <
>     monica.skidm...@careerbuilder.com> wrote:
>
>     > We are migrating from a master-slave configuration to Solr cloud (7.3) and
>     > have questions about the preferred way to load balance between the two
>     > clusters.  It looks like we want to use a load balancer that directs
>     > queries to any of the server nodes in either cluster, trusting that node to
>     > handle the query correctly – true?  If we auto-scale nodes into the
>     > cluster, are there considerations about when a node becomes ‘ready’ to
>     > query from a Solr perspective and when it is added to the load balancer?
>     > Also, what is the preferred method of doing a health-check for the load
>     > balancer – would it be “bin/solr healthcheck -c myCollection”?
>     >
>     >
>     >
>     > Pointers in the right direction – especially to any documentation on
>     > running multiple clusters with the same dataset – would be appreciated.
>     >
>     >
>     >
>     > *Monica Skidmore*
>     > *Senior Software Engineer*
>     >
>     >
>     >
>     >
>     >
>     >
>
>
