dlmarion commented on PR #5321:
URL: https://github.com/apache/accumulo/pull/5321#issuecomment-2666485492
> Looks good, did you manually test this? Wondering what the output would
> look like and if there would be an indication that zoozap ran. That lead to the
> comment about debugOrRun
I applied your suggestion and made some changes to ZooZap in 18844f8 to
clean up ZooKeeper in a consistent manner for the different server types.
ZooZap was doing one thing for compactors, and something else for tservers and
sservers. I changed ZooZap so that in all cases it does a recursive delete on
the resource group.
Below is the output of `accumulo-cluster stop` on my local node, with the
log4j lines removed.
```
Stopping Accumulo cluster...
Accumulo shut down cleanly
Utilities and unresponsive servers will shut down in 5 seconds (Ctrl-C to abort)
Executing stop on tablet servers for group default ....Cleaning tablet server entries from zookeeper for resource group default
Stopping service process: tserver_default_1
Stopping tserver_default_1 on 1.2.3.4
done
Executing stop on managers
Executing stop on garbage collectors
Executing stop on monitors
Executing stop on scan servers for group default
Stopping service process: manager_default_1
Stopping manager_default_1 on 1.2.3.4
Stopping service process: gc_default_1
Stopping gc_default_1 on 1.2.3.4
Cleaning scan server entries from zookeeper for resource group default
Stopping service process: monitor_default_1
Stopping monitor_default_1 on 1.2.3.4
Stopping service process: sserver_default_1
Stopping sserver_default_1 on 1.2.3.4
Executing stop on compactors for group default
Cleaning compactor entries from zookeeper for resource group default
Stopping service process: compactor_default_1
Stopping compactor_default_1 on 1.2.3.4
Deleting compactor /accumulo/063fa7a3-0eda-409e-89b8-6d559d6269cf/compactors/default from zookeeper
```
I think it shows several problems where the ZooKeeper cleanup is
inconsistent.
1. In the case of the compactor and sserver, you see a message "Deleting
..... from zookeeper". This is coming from ZooZap, but you don't see it for the
tserver. The tserver log shows that a stop was requested, that the tserver
performed a graceful shutdown, and that it removed its own lock from ZooKeeper.
I think the tserver address was empty when ZooZap was executed, so it did not
remove the tserver resource group default path. I think this may have been
cleaned up by the `admin stopAll` code path, but I'm not entirely sure at the
moment.
2. For the compactor case you see the following:
```
Executing stop on compactors for group default
Cleaning compactor entries from zookeeper for resource group default
Stopping service process: compactor_default_1
Stopping compactor_default_1 on 1.2.3.4
Deleting compactor /accumulo/063fa7a3-0eda-409e-89b8-6d559d6269cf/compactors/default from zookeeper
```
The two lines about removing entries from ZooKeeper come from the ZooZap
command, which runs after the code in `accumulo-cluster` that performs the stop
on the compactor process. In my setup I'm using a remote address for the
processes, so an ssh command is used to stop the compactor. I think it's
possible that the ZooKeeper paths could be removed before the processes are
stopped in this case.
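One way to rule out that ordering problem, sketched in shell. This is only an illustration under the assumption that the stop commands are launched in the background; `stop_service`, `clean_zookeeper`, and `stop_then_clean` are hypothetical stand-ins, not functions from `accumulo-cluster`, and the real ssh and ZooZap invocations are replaced by `sleep` and `echo`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for illustration; these are NOT accumulo-cluster functions.

# Simulates a remote stop (for example over ssh) that takes a moment to return.
stop_service() {
  sleep 1
  echo "stopped $1"
}

# Simulates the ZooKeeper cleanup step for one server type and resource group.
clean_zookeeper() {
  echo "cleaned $1 entries for group $2"
}

# Stops every listed service in parallel, then cleans up only after all stops return.
stop_then_clean() {
  local type=$1 group=$2
  shift 2
  local pids=()
  local svc
  for svc in "$@"; do
    stop_service "$svc" &
    pids+=("$!")
  done
  # Block until every backgrounded stop has finished before touching ZooKeeper.
  wait "${pids[@]}"
  clean_zookeeper "$type" "$group"
}
```

Calling `stop_then_clean compactor default compactor_default_1` prints the "stopped" line before the "cleaned" line, because `wait` does not return until the backgrounded stop has exited.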