[ https://issues.apache.org/jira/browse/SOLR-14371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073243#comment-17073243 ]
Jan Høydahl edited comment on SOLR-14371 at 4/1/20, 11:55 PM:
--------------------------------------------------------------

[~houston], I wanted to test dynamic reconfiguration with 3 external ZKs like in the screenshot above. Sadly it does not work, so your assumption in Solr Operator seems to be wrong: Solr will only be able to connect to the zookeeper(s) listed in the ZK_HOST connection string. Each Solr node selects one of them and connects, and on connection loss the {{ConnectionManager}} tries one of the other addresses in ZK_HOST. I thought this failover was handled by ZooKeeper's client code, but it seems it is not.

So the reason the Solr Operator approach may still work is that on connection loss, Solr will retry, connect to one of the other ZK nodes through the same host:port, and succeed. But since ZooKeeper assumes a persistent two-way connection with its clients, I wonder whether that connection will frequently go up and down, or bounce between ZK servers like a pogo stick. Can you test that?

When I killed zoo1 in my setup above (which is the one on localhost:2181, no LB), Solr stopped working and was not able to "find" zoo2 or zoo3. More research is needed to figure out whether the official ZK Java client gives us something here, or whether we need to make our own {{ConnectionManager}} put a watch on {{/zookeeper/config}} and use ZK_HOST only as a bootstrap to reach the dynamic config.

was (Author: janhoy):

[~houston], I wanted to test dynamic reconfiguration with 3 external ZKs like in the screenshot above. Sadly it does not work, so your assumption in Solr Operator is wrong: Solr will only maintain a connection to one ZK at a time, and while the service LB in front of ZK makes sure that Solr finds a healthy ZK, it is not the best way to connect to ZK. Each node should have a live connection to multiple ZKs at the same time. I have not investigated why it does not work.
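One possible shape for that last idea: treat ZK_HOST purely as a bootstrap, put a watch on {{/zookeeper/config}} via the 3.5 client's {{getConfig(watcher, stat)}}, and push the resulting host list back into the client with {{updateServerList(connectString)}}. The sketch below only shows the parsing step, turning the znode's data into a client connect string; the class name and the format handling are my assumptions, not existing Solr code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not actual Solr code): converts the data stored in the
 * special /zookeeper/config znode into a client connect string. A custom
 * ConnectionManager could read that znode with ZooKeeper#getConfig (ZK >= 3.5),
 * keep the watch armed, and feed the result to ZooKeeper#updateServerList,
 * using ZK_HOST only for the initial bootstrap connection.
 */
public class DynamicConfigParser {

  /** Ensemble lines look like: server.1=zoo1:2888:3888:participant;0.0.0.0:2181 */
  public static String toConnectString(String configData) {
    List<String> hosts = new ArrayList<>();
    for (String line : configData.split("\n")) {
      line = line.trim();
      if (!line.startsWith("server.")) continue;   // skip version=... etc.
      String spec = line.substring(line.indexOf('=') + 1);
      String serverHost = spec.split(":")[0];      // quorum address host
      int semi = spec.indexOf(';');
      if (semi < 0) continue;                      // no client port advertised
      String client = spec.substring(semi + 1);    // "2181" or "0.0.0.0:2181"
      boolean hasHost = client.contains(":");
      String clientPort = hasHost
          ? client.substring(client.lastIndexOf(':') + 1) : client;
      String clientHost = hasHost
          ? client.substring(0, client.lastIndexOf(':')) : serverHost;
      // A wildcard client address means "listen on any interface";
      // fall back to the quorum host so clients have something routable.
      if (clientHost.equals("0.0.0.0") || clientHost.isEmpty()) {
        clientHost = serverHost;
      }
      hosts.add(clientHost + ":" + clientPort);
    }
    return String.join(",", hosts);
  }

  public static void main(String[] args) {
    String data = "server.1=zoo1:2888:3888:participant;0.0.0.0:2181\n"
        + "server.2=zoo2:2888:3888:participant;0.0.0.0:2181\n"
        + "server.3=zoo3:2888:3888:participant;0.0.0.0:2181\n"
        + "version=100000000";
    System.out.println(toConnectString(data));  // zoo1:2181,zoo2:2181,zoo3:2181
  }
}
```

With something like this in place, killing zoo1 should not matter: the client would already hold the full, current server list from the config znode rather than whatever ZK_HOST happened to contain at startup.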
Solr uses the new 3.5.x ZK client library, which should have this capability, but probably the way we configure and use it does not take advantage of it. When I killed zoo1 in my setup above (which is the one on localhost:2181), Solr stopped working and was not able to "find" zoo2 or zoo3.

> Zk StatusHandler should know about dynamic zk config
> ----------------------------------------------------
>
>                 Key: SOLR-14371
>                 URL: https://issues.apache.org/jira/browse/SOLR-14371
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>            Priority: Major
>         Attachments: dynamic-reconfig.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> ZK 3.5 supports dynamic reconfig, which is used by the solr-operator for
> Kubernetes. Solr is then given a zkHost of one URL pointing to an LB
> (Service) in front of all zookeepers, and the ZK client fetches the list of
> all zookeepers from the special znode /zookeeper/config and reconfigures
> itself with connections to all ZK nodes listed. You can then scale the
> number of ZK nodes up or down dynamically without restarting Solr.
> However, the Admin UI displays errors, since it believes it is connected to
> only one ZK, which contradicts what ZK itself reports. We need to make
> ZookeeperStatusHandler aware of dynamic reconfig so it asks the ZK client
> what the current zkHost is, instead of relying on the static ZK_HOST setting.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)