Thank you, Erick.  This is exactly the information I needed but hadn't
correctly parsed as a new SolrCloud user.  You've just made setting up our new
configuration much easier!!

Monica Skidmore
Senior Software Engineer
 

 
On 4/30/18, 7:29 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

    "We need a way to determine that a node is still 'alive' and should be
    in the load balancer, and we need a way to know that a new node is now
    available and fully ready with its replicas to add to the load
    balancer."
    
    Why? If a Solr node is running but its replicas aren't up yet, it'll
    pass the request along to a node that _does_ have live replicas; you
    don't have to do anything. As for knowing that the node is alive, there
    are lots of ways: any API endpoint has to have a running Solr to field
    it, so perhaps just use the Collections API LIST command?
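
    As an illustration only (not part of the original mail), a minimal Python
    sketch of such a load-balancer health check against the Collections API
    LIST action might look like the following; the host, port, and timeout
    values are placeholders:

        # Hypothetical health check: treat a node as "alive" if it can answer
        # a Collections API LIST request with a well-formed response.
        import requests

        def solr_node_alive(host, port=8983, timeout=2):
            url = "http://{}:{}/solr/admin/collections".format(host, port)
            try:
                resp = requests.get(url,
                                    params={"action": "LIST", "wt": "json"},
                                    timeout=timeout)
                return resp.status_code == 200 and "collections" in resp.json()
            except (requests.RequestException, ValueError):
                return False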
    
    "How does ZooKeeper make this determination?  Does it do something
    different if multiple collections are on a single cluster?  And, even
    with just one cluster, what is best practice for keeping a current
    list of active nodes in the cluster, especially for extremely high
    query rates?"
    
    This is a common misconception. ZooKeeper isn't interested in Solr at
    all. ZooKeeper tracks each node's session and, if that session expires,
    removes the node from the live_nodes list, but that's all. It isn't
    involved in Solr's operation in terms of routing queries, updates or
    anything like that.
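
    If you do want to see what ZooKeeper itself knows, the live_nodes children
    can be read with any ZK client. A rough sketch using the Python kazoo
    library; the connect string (and the /solr chroot) are placeholders for
    your own setup:

        # Sketch: list the entries ZooKeeper currently holds under live_nodes.
        from kazoo.client import KazooClient

        zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181/solr")
        zk.start()
        print(zk.get_children("/live_nodes"))  # e.g. ['host1:8983_solr', ...]
        zk.stop()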
    
    _Solr_ keeps track of all this by _watching_ various znodes. Say a Solr
    node hosts some replica in a collection. When it comes up, it sets a
    "watch" on the /collections/my_collection/state.json znode. It also
    publishes its own state. So say it hosts three replicas for the
    collection: as each one is loaded and ready for action, Solr posts an
    update to the relevant state.json file.
    
    ZooKeeper is then responsible for telling any other node that has set a
    watch that the znode has changed. ZK doesn't know or care whether those
    watchers are Solr nodes or not.
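
    The same watch mechanism is visible from any ZooKeeper client, not just
    Solr. A toy sketch with kazoo (the connect string and collection name are
    placeholders):

        # Sketch: register a watch on a collection's state.json; the callback
        # fires whenever the znode's data changes.
        from kazoo.client import KazooClient

        zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181/solr")
        zk.start()

        @zk.DataWatch("/collections/my_collection/state.json")
        def on_state_change(data, stat):
            if data is not None:
                print("state.json changed, version", stat.version)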
    
    So when a request comes in to a Solr node, that node knows which other
    Solr nodes host which replicas and does all the sub-requests itself;
    ZK isn't involved at all at that level.
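
    You can look at that same per-replica picture yourself through the
    Collections API CLUSTERSTATUS action; a small sketch (host and collection
    name are placeholders):

        # Sketch: print which node hosts which replica, and each replica's
        # state, from the cluster state that Solr nodes also cache locally.
        import requests

        resp = requests.get(
            "http://solr-node1:8983/solr/admin/collections",
            params={"action": "CLUSTERSTATUS",
                    "collection": "my_collection", "wt": "json"},
        )
        shards = resp.json()["cluster"]["collections"]["my_collection"]["shards"]
        for shard_name, shard in shards.items():
            for replica_name, replica in shard["replicas"].items():
                print(shard_name, replica_name,
                      replica["node_name"], replica["state"])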
    
    So imagine node1 hosts S1R1 and S2R1, and node2 hosts S1R2 and S2R2
    (for collection A). When node1 comes up it updates the state in ZK to
    say S1R1 and S2R1 are "active". Now say node2 is coming up but hasn't
    loaded its cores yet. If it receives a request it can forward it on to
    node1.
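
    To make that concrete, either node can be queried directly over HTTP; a
    toy sketch (node addresses and collection name are placeholders):

        # Sketch: the same query sent to either node returns the same result
        # count; the receiving node fans out (or forwards) the work itself.
        import requests

        for node in ("node1:8983", "node2:8983"):  # placeholder host:port
            r = requests.get(
                "http://{}/solr/collectionA/select".format(node),
                params={"q": "*:*", "rows": 0, "wt": "json"},
            )
            print(node, r.json()["response"]["numFound"])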
    
    Now node2 loads both its cores. It updates the ZK node for the
    collection, and since node1 is watching, it fetches the updated
    state.json. From this point forward, both nodes have complete
    information about all the replicas in the collection and don't need to
    reference ZK any more at all.
    
    In fact, ZK can completely go away and _queries_ can continue to work
    off their cached state.json. Updates will fail, since a ZK quorum is
    required for index updates to prevent "split brain" problems.
    
    Best,
    Erick
    
    On Mon, Apr 30, 2018 at 11:03 AM, Monica Skidmore
    <monica.skidm...@careerbuilder.com> wrote:
    > Thank you, Erick.  That confirms our understanding for a single cluster,
    > or once we select a node from one of the two clusters to query.
    >
    > As we try to set up an external load balancer to go between two clusters,
    > though, we still have some questions.  We need a way to determine that a
    > node is still 'alive' and should be in the load balancer, and we need a
    > way to know that a new node is now available and fully ready with its
    > replicas to add to the load balancer.
    >
    > How does ZooKeeper make this determination?  Does it do something
    > different if multiple collections are on a single cluster?  And, even
    > with just one cluster, what is best practice for keeping a current list
    > of active nodes in the cluster, especially for extremely high query
    > rates?
    >
    > Again, if there's some good documentation on this, I'd love a pointer...
    >
    > Monica Skidmore
    > Senior Software Engineer
    >
    >
    >
    > On 4/30/18, 1:09 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
    >
    >     Multiple clusters with the same dataset aren't load-balanced by Solr;
    >     you'll have to accomplish that from "outside", e.g. something that
    >     sends queries to each cluster.
    >
    >     _Within_ a cluster (collection), as long as a request gets to any
    >     Solr node, sub-requests are distributed with an internal software LB.
    >     As far as a single collection goes, you're fine just sending any
    >     query to any node. Even if you send a query to a node that hosts no
    >     replicas for a collection, Solr will "do the right thing" and forward
    >     it appropriately.
    >
    >     HTH,
    >     Erick
    >
    >     On Mon, Apr 30, 2018 at 9:46 AM, Monica Skidmore <
    >     monica.skidm...@careerbuilder.com> wrote:
    >
    >     > We are migrating from a master-slave configuration to SolrCloud
    >     > (7.3) and have questions about the preferred way to load balance
    >     > between the two clusters.  It looks like we want to use a load
    >     > balancer that directs queries to any of the server nodes in either
    >     > cluster, trusting that node to handle the query correctly – true?
    >     > If we auto-scale nodes into the cluster, are there considerations
    >     > about when a node becomes ‘ready’ to query from a Solr perspective
    >     > and when it is added to the load balancer?  Also, what is the
    >     > preferred method of doing a health-check for the load balancer –
    >     > would it be “bin/solr healthcheck -c myCollection”?
    >     >
    >     >
    >     >
    >     > Pointers in the right direction – especially to any documentation
    >     > on running multiple clusters with the same dataset – would be
    >     > appreciated.
    >     >
    >     >
    >     >
    >     > *Monica Skidmore*
    >     > *Senior Software Engineer*
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >
    >
    >
    
