ddanielr commented on code in PR #5321:
URL: https://github.com/apache/accumulo/pull/5321#discussion_r1962292859


##########
assemble/bin/accumulo-cluster:
##########
@@ -471,6 +477,13 @@ function control_services() {
           fi
         fi
       done
+      if [[ $operation == "stop" || $operation == "kill" ]]; then
+        # If the prior commands were executed via ssh, then we need to wait for them
+        # to complete before zapping the nodes in ZooKeeper
+        ssh_wait
+        echo "Cleaning tablet server entries from zookeeper for resource group $group"
+        debugOrRun "$accumulo_cmd" org.apache.accumulo.server.util.ZooZap -verbose -tservers -group "$group"

Review Comment:
   If `./accumulo-cluster stop --tservers=group1 --local` is run, then this ZooZap command will remove the locks for both local and remote tservers.
   
   This seems like a likely failure point: an admin who tries to stop only the "local" services could end up shutting down the entire cluster.
   
   That could be fixed by adding logic that checks for the `--local` arg and only removes entries if it's a cluster-wide action.
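   
   Here is a minimal sketch of that `--local` check. It assumes the script records whether `--local` was passed in a flag. The `should_zap` function and the `LOCAL_ONLY` variable are hypothetical names, not existing `accumulo-cluster` code:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide whether accumulo-cluster should run ZooZap.
# should_zap and LOCAL_ONLY are illustrative names, not existing code.

# Succeeds (exit status 0) only for a cluster-wide stop or kill.
#   $1 - the operation, e.g. "start", "stop", "kill"
#   $2 - "1" if --local was passed on the command line, otherwise "0"
should_zap() {
  local operation=$1
  local local_only=$2
  if [[ $operation != "stop" && $operation != "kill" ]]; then
    return 1
  fi
  # With --local only this host's processes were stopped, so the
  # ZooKeeper entries for remote tservers must be left alone.
  [[ $local_only != "1" ]]
}

# Hypothetical argument scan: set LOCAL_ONLY when --local is present.
LOCAL_ONLY=0
for arg in "$@"; do
  if [[ $arg == "--local" ]]; then
    LOCAL_ONLY=1
  fi
done
```

   The existing `debugOrRun ... ZooZap ...` call could then be wrapped in `if should_zap "$operation" "$LOCAL_ONLY"; then ... fi`. A `--local` stop or kill would then skip the zap entirely.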
   
   Alternatively, ZooZap could be modified to accept a `-host` filter, similar to the `-group` option, in place of its current use of `AddressSelector.all()`.
   
   I'm guessing it couldn't be an exact match, since cluster.yaml has the hostname but not the port.
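   
   For the `-host` option, the matching would have to ignore the port. ZooZap itself is Java, so the real change would live there. This shell sketch only illustrates the matching rule. It assumes entries of the form `host:port` and that cluster.yaml uses the same hostname form. `matches_host` and `filter_by_host` are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a host filter that ignores the port.
# Assumes each tserver entry looks like "host:port" and that cluster.yaml
# lists the same hostname form; matches_host/filter_by_host are illustrative.

# Succeeds when the host part of "host:port" equals the given host exactly.
matches_host() {
  local entry=$1
  local host=$2
  # ${entry%:*} drops the shortest ":..." suffix, i.e. the port.
  [[ ${entry%:*} == "$host" ]]
}

# Prints every entry (args 2..n) whose host matches $1, one per line.
filter_by_host() {
  local host=$1
  shift
  local entry
  for entry in "$@"; do
    if matches_host "$entry" "$host"; then
      printf '%s\n' "$entry"
    fi
  done
}
```

   For example, `filter_by_host tserver1 tserver1:9997 tserver10:9997 tserver1:9998` prints `tserver1:9997` and `tserver1:9998`. Comparing the whole host part, rather than doing a string prefix match, keeps `tserver1` from also selecting `tserver10`.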



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
