Thanks for bringing up this topic, Ilan.

> We are also running a fork of Solr and in our fork we have made some
> optimizations to avoid processing DOWNNODE messages for nodes that only
> host PRS collections. Those optimizations have not made it upstream at
> this point. I can take a look at upstreaming those changes or some
> variation of those.
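For readers following along, a rough sketch of the check described above. All names here are invented for illustration; this is not the FullStory fork's actual code. The idea is that for PRS (per-replica state) collections, replica liveness is derived from per-replica state entries plus live_nodes, so the cluster-state rewrite triggered by a DOWNNODE message is only needed if the node hosts at least one non-PRS collection:

```java
import java.util.Map;

public class DownNodeFilter {
    // Hypothetical: collectionsOnNode maps collection name -> whether it
    // uses per-replica states. DOWNNODE processing can be skipped when
    // every collection on the node is PRS.
    static boolean needsDownNodeProcessing(Map<String, Boolean> collectionsOnNode) {
        return collectionsOnNode.values().stream().anyMatch(prs -> !prs);
    }

    public static void main(String[] args) {
        // Node hosting only PRS collections: no DOWNNODE processing needed.
        System.out.println(needsDownNodeProcessing(Map.of("c1", true, "c2", true)));  // false
        // Mixed node: at least one non-PRS collection, must process DOWNNODE.
        System.out.println(needsDownNodeProcessing(Map.of("c1", true, "c2", false))); // true
    }
}
```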
Justin, my bad: I may have missed upstreaming some of those fixes. I can
take a look at consolidating all the changes upstream. Fullstory's fork,
upstream 8x, and upstream 9x are each off by a few fixes, and I'm trying
to consolidate them. I'm tracking the 8x branch mainly from the 8.11
release standpoint, and I'll attempt to bring all these branches into
sync so that whatever is working at scale inside Fullstory can be used
by everyone.

On Mon, 2 Oct 2023 at 16:52, Ilan Ginzburg <ilans...@gmail.com> wrote:

> Not sure I totally follow what you mean, Mark.
> We thought of making the actual replica state = published replica state
> AND node state, which would set practical replica states to DOWN when
> the ephemeral Zookeeper node for a SolrCloud node disappears.
> This works nicely for the going-down part, but still requires work for
> coming back up: at startup the SolrCloud node likely wants to advertise
> itself as up, but doesn't want all its replicas to appear as ACTIVE
> right away. So that would still require a loop updating the states of
> all replicas of the node to DOWN at startup, then updating them back to
> ACTIVE as they become active.
>
> The main downside of EPHEMERAL replica state nodes in Zookeeper is the
> (added) bookkeeping required on ZK session expiration and reconnection
> to recreate all replica states.
> In our production clusters, though, ZK session expiration is not
> correctly recovered from. The nodes with expired and reconnected
> sessions appear ok, but their replicas are not really seen by the rest
> of the cluster (whatever that means; we haven't yet managed to
> understand the issue).
>
> I wonder if ZK session expiration and re-establishment works nicely for
> others? The code handling this is in ZkController.onReconnect().
>
> On Mon, Oct 2, 2023 at 9:08 AM Mark Miller <markrmil...@gmail.com> wrote:
>
> > Actually, I think what I did was move the DOWN state to startup. Since
> > you can't count on it on shutdown (crash, killed process, state
> > doesn't get published for a variety of reasons), it doesn't do
> > anything solid for the holes where you are indexing and a node cycles.
> > So it can come up in any state, like ACTIVE, before it publishes
> > recovery. So I kept it so a node can set everything as down before
> > publishing its live node.
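To make the two ideas in the quoted thread concrete, here is a small, self-contained model of "effective replica state = published state AND node state", together with Mark's startup ordering (publish DOWN for all local replicas before creating the live node). Every name here is invented for illustration; this is not the actual Solr API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReplicaStateModel {
    enum State { DOWN, RECOVERING, ACTIVE }

    final Map<String, State> published = new HashMap<>();    // replica -> published state
    final Map<String, String> replicaNode = new HashMap<>(); // replica -> hosting node
    final Set<String> liveNodes = new HashSet<>();           // models ephemeral live nodes in ZK

    // Published state AND node state: a replica is only as "up" as its node.
    State effectiveState(String replica) {
        if (!liveNodes.contains(replicaNode.get(replica))) return State.DOWN;
        return published.getOrDefault(replica, State.DOWN);
    }

    // Startup ordering: publish DOWN for every local replica first, then
    // create the live node; replicas are published ACTIVE as they recover.
    void startNode(String node) {
        replicaNode.forEach((replica, n) -> {
            if (n.equals(node)) published.put(replica, State.DOWN);
        });
        liveNodes.add(node); // only now does the node advertise itself as up
    }

    public static void main(String[] args) {
        ReplicaStateModel m = new ReplicaStateModel();
        m.replicaNode.put("shard1_replica1", "node1");
        m.published.put("shard1_replica1", State.ACTIVE); // stale state from before a crash

        // Node not live yet: the stale ACTIVE is masked to DOWN.
        System.out.println(m.effectiveState("shard1_replica1")); // DOWN

        m.startNode("node1");
        System.out.println(m.effectiveState("shard1_replica1")); // DOWN (published DOWN at startup)

        m.published.put("shard1_replica1", State.ACTIVE); // recovery finished
        System.out.println(m.effectiveState("shard1_replica1")); // ACTIVE
    }
}
```

Note that in this toy model the node-liveness check replaces the per-replica DOWN publishing that pure published-state clusters need on crash, which is the attraction of the approach; the bookkeeping cost Ilan mentions (recreating ephemeral state after ZK session expiration) is not modeled here.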