[ https://issues.apache.org/jira/browse/SOLR-13899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968388#comment-16968388 ]
Erick Erickson commented on SOLR-13899:
---------------------------------------

Thanks for raising this, and _especially_ for providing such detailed analysis! I won't be able to get to this either, but thought a "good job" was in order ;)

> zkstatus page incorrectly reports zookeeper in error when Zookeeper observers are present
> -----------------------------------------------------------------------------------------
>
> Key: SOLR-13899
> URL: https://issues.apache.org/jira/browse/SOLR-13899
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 8.3.0
> Reporter: Salvatore
> Priority: Trivial
> Labels: easyfix
> Attachments: zkstatus.png
>
>
> When a ZooKeeper ensemble has 'observers', the zkstatus page incorrectly reports the ZooKeeper status as being in error (see attachment).
> This is because the [ZookeeperStatusHandler|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java] does not account for the '[observer|https://zookeeper.apache.org/doc/current/zookeeperObservers.html]' role whatsoever.
> This should be an easy fix. I see two options:
> 1. Treat observers as followers by changing [L112|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L112] to
> {code:java}
> if ("follower".equals(state) || "observer".equals(state)) {
> {code}
>
> 2. Exclude observers from the required follower count by changing [L116|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L116] to
> {code:java}
> reportedFollowers = Integer.parseInt(String.valueOf(stat.get("zk_synced_followers")));
> {code}
> Option 1 will make the zkstatus page show an error when an observer is down.
> Option 2 will not make the zkstatus page show an error when an observer is down.
> *Ideally*, additional logic to account for observers should be added, showing STATUS_YELLOW when any observers are down (but all followers are up), as this means the ensemble is only degraded and still functional.
> Happy to create a PR; however, I don't have a lot of free time at home at the moment, so it may take a week or two.
>
>
> Additional info:
> Below is example mntr output for the leader, follower, and observer roles. Note the leader's zk_followers and zk_synced_followers values, and each server's zk_server_state value.
> Leader:
> {code:java}
> [root@master1 ~]# echo mntr | nc master3 12181
> zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
> zk_avg_latency 0
> zk_max_latency 2
> zk_min_latency 0
> zk_packets_received 97
> zk_packets_sent 96
> zk_num_alive_connections 2
> zk_outstanding_requests 0
> zk_server_state leader
> zk_znode_count 92
> zk_watch_count 7
> zk_ephemerals_count 9
> zk_approximate_data_size 236333
> zk_open_file_descriptor_count 64
> zk_max_file_descriptor_count 4096
> zk_followers 4
> zk_synced_followers 2
> zk_pending_syncs 0
> zk_last_proposal_size -1
> zk_max_proposal_size -1
> zk_min_proposal_size -1
> {code}
> Follower:
> {code:java}
> [root@master1 ~]# echo mntr | nc master2 12181
> zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
> zk_avg_latency 0
> zk_max_latency 6
> zk_min_latency 0
> zk_packets_received 97
> zk_packets_sent 96
> zk_num_alive_connections 2
> zk_outstanding_requests 0
> zk_server_state follower
> zk_znode_count 92
> zk_watch_count 7
> zk_ephemerals_count 9
> zk_approximate_data_size 236333
> zk_open_file_descriptor_count 60
> zk_max_file_descriptor_count 4096
> {code}
> Observer:
> {code:java}
> [root@master1 ~]# echo mntr | nc slave1 12181
> zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
> zk_avg_latency 0
> zk_max_latency 8
> zk_min_latency 0
> zk_packets_received 174
> zk_packets_sent 173
> zk_num_alive_connections 2
> zk_outstanding_requests 0
> zk_server_state observer
> zk_znode_count 92
> zk_watch_count 7
> zk_ephemerals_count 9
> zk_approximate_data_size 236333
> zk_open_file_descriptor_count 59
> zk_max_file_descriptor_count 4096
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
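A minimal sketch of the *ideal* logic described in the issue, written in Java since that is the handler's language. It is hypothetical and is not Solr's ZookeeperStatusHandler: the class and method names (`ZkStatusSketch`, `parseMntr`, `status`) are made up for illustration, and it assumes the expected numbers of followers and observers are already known (for example, from the ensemble configuration). It parses each server's mntr output into key/value pairs, counts roles by `zk_server_state`, checks followers against the leader's `zk_synced_followers` (as in Option 2), and returns yellow rather than red when only observers are missing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Solr's ZookeeperStatusHandler: derive an ensemble
 * status from each server's "mntr" output, counting observers separately
 * from voting followers.
 */
final class ZkStatusSketch {
  static final String GREEN = "green", YELLOW = "yellow", RED = "red";

  /** Parses "mntr" output: one "key value" pair per line, tab or space separated. */
  static Map<String, String> parseMntr(String mntr) {
    Map<String, String> stat = new HashMap<>();
    for (String line : mntr.split("\\R")) {
      String[] kv = line.trim().split("\\s+", 2);
      if (kv.length == 2) {
        stat.put(kv[0], kv[1]);
      }
    }
    return stat;
  }

  /**
   * @param stats one parsed mntr map per configured server; null means unreachable
   * @param expectedFollowers voting followers in the ensemble (assumed known)
   * @param expectedObservers observers in the ensemble (assumed known)
   */
  static String status(List<Map<String, String>> stats,
                       int expectedFollowers, int expectedObservers) {
    int leaders = 0, followers = 0, observers = 0, standalone = 0;
    int reportedFollowers = -1; // synced voting followers, as reported by the leader
    for (Map<String, String> stat : stats) {
      if (stat == null) {
        continue; // server is down or did not answer mntr
      }
      switch (stat.getOrDefault("zk_server_state", "")) {
        case "leader":
          leaders++;
          // As in Option 2: rely on zk_synced_followers, which the issue uses to exclude observers
          reportedFollowers = Integer.parseInt(stat.getOrDefault("zk_synced_followers", "-1"));
          break;
        case "follower":
          followers++;
          break;
        case "observer":
          observers++;
          break;
        case "standalone":
          standalone++;
          break;
        default:
          break;
      }
    }
    if (stats.size() == 1 && standalone == 1) {
      return GREEN; // a single standalone server
    }
    if (leaders != 1 || followers < expectedFollowers || followers != reportedFollowers) {
      return RED; // voting members are missing or disagree with the leader
    }
    // All voting members are up, so missing observers only degrade the ensemble.
    return observers < expectedObservers ? YELLOW : GREEN;
  }
}
```

For a hypothetical ensemble of one leader, two followers, and two observers, passing all five parsed maps with expected counts of 2 and 2 yields green; replacing one observer's map with `null` yields yellow, while replacing a follower's map yields red.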