ctubbsii commented on code in PR #5246:
URL: https://github.com/apache/accumulo/pull/5246#discussion_r1914216953
##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -824,9 +825,46 @@ public synchronized void stop() throws IOException,
InterruptedException {
control.stop(ServerType.GARBAGE_COLLECTOR, null);
control.stop(ServerType.MANAGER, null);
+ control.stop(ServerType.COMPACTION_COORDINATOR);
control.stop(ServerType.TABLET_SERVER, null);
+ control.stop(ServerType.COMPACTOR, null);
+ control.stop(ServerType.SCAN_SERVER, null);
+
+ // The method calls above kill the server
+ // Clean up the locks in ZooKeeper fo that if the cluster
+ // is restarted, then the processes will start right away
+ // and not wait for the old locks to be cleaned up.
+ try {
+ new ZooZap().zap(getServerContext().getSiteConfiguration(), new String[]
{"-manager",
+ "-compaction-coordinators", "-tservers", "-compactors",
"-sservers"});
+ } catch (Exception e) {
+ log.error("Error zapping zookeeper locks", e);
+ }
control.stop(ServerType.ZOOKEEPER, null);
+ // Clear the location of the servers in ZooCache.
+ // When ZooKeeper was stopped in the previous method call,
+ // the local ZooKeeper watcher did not fire. If MAC is
+ // restarted, then ZooKeeper will start on the same port with
+ // the same data, but no Watchers will fire.
+ boolean startCalled = true;
+ try {
+ getServerContext();
+ } catch (RuntimeException e) {
+ if (e.getMessage().startsWith("Accumulo not initialized")) {
+ startCalled = false;
+ }
+ }
+ if (startCalled) {
+ final ServerContext ctx = getServerContext();
+ final String zRoot = getServerContext().getZooKeeperRoot();
+ ctx.getZooCache().clear((path) -> path.startsWith(zRoot +
Constants.ZMANAGER_LOCK));
+ ctx.getZooCache().clear((path) -> path.startsWith(zRoot +
Constants.ZGC_LOCK));
+ ctx.getZooCache().clear((path) -> path.startsWith(zRoot +
Constants.ZCOMPACTORS));
+ ctx.getZooCache().clear((path) -> path.startsWith(zRoot +
Constants.ZSSERVERS));
+ ctx.getZooCache().clear((path) -> path.startsWith(zRoot +
Constants.ZTSERVERS));
Review Comment:
I don't know if this is helpful, but you can define the predicate once
outside this code, and then just call clear a single time with a single
ready-to-go predicate.
```suggestion
Predicate<String> lockPaths = path -> path.startsWith(zRoot +
Constants.ZMANAGER_LOCK);
for (String s : Set.of(Constants.ZGC_LOCK, Constants.ZCOMPACTORS,
Constants.ZSSERVERS, Constants.ZTSERVERS)) {
lockPaths = lockPaths.or(path -> path.startsWith(zRoot + s));
}
ctx.getZooCache().clear(lockPaths);
```
You could also define it a predicate that is true if any predicate on a
stream is true:
```suggestion
Predicate<String> lockPaths = path ->
Stream.of(Constants.ZMANAGER_LOCK, Constants.ZGC_LOCK, Constants.ZCOMPACTORS,
Constants.ZSSERVERS, Constants.ZTSERVERS)
.anyMatch(p -> path.startsWith(zRoot + p));
ctx.getZooCache().clear(lockPaths);
```
The setup of the Predicate is a little more costly, but if done once per jvm
as a static final, it could be worth it to reduce the number of calls to
zooCache.clear(), and it might be a little more readable.
##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -824,9 +825,47 @@ public synchronized void stop() throws IOException,
InterruptedException {
control.stop(ServerType.GARBAGE_COLLECTOR, null);
control.stop(ServerType.MANAGER, null);
+ control.stop(ServerType.COMPACTION_COORDINATOR);
control.stop(ServerType.TABLET_SERVER, null);
+ control.stop(ServerType.COMPACTOR, null);
+ control.stop(ServerType.SCAN_SERVER, null);
+
+ // The method calls above kill the server
+ // Clean up the locks in ZooKeeper fo that if the cluster
+ // is restarted, then the processes will start right away
+ // and not wait for the old locks to be cleaned up.
+ try {
+ System.setProperty("accumulo.properties", "file://" +
getAccumuloPropertiesPath());
+ new ZooZap().zap(getServerContext().getSiteConfiguration(), new String[]
{"-manager",
+ "-compaction-coordinators", "-tservers", "-compactors",
"-sservers"});
+ } catch (Exception e) {
+ log.error("Error zapping zookeeper locks", e);
+ }
control.stop(ServerType.ZOOKEEPER, null);
+ // Clear the location of the servers in ZooCache.
+ // When ZooKeeper was stopped in the previous method call,
+ // the local ZooKeeper watcher did not fire. If MAC is
+ // restarted, then ZooKeeper will start on the same port with
+ // the same data, but no Watchers will fire.
Review Comment:
I found some info at this [confluence
page](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=24193420#FAQ-WhathappenstoZKsessionswhiletheclusterisdown?)
that says that sessions are preserved while ZK is offline, and sessions
(including any session-related data, like ephemeral nodes) are restored and the
session timeout refreshed by the new leader after the cluster is restored. So,
the premise of this issue is sound, regarding the need to wait for the old
accumulo server locks to expire before the restarted services are ready. I
assumed all sessions would automatically expire if the ZK cluster were to go
down entirely. However, I suppose it makes sense that it's not the case, since
that is essentially not any different than if the ZK leader went offline and
the cluster had to elect a new leader, which is probably not as uncommon as an
entire cluster being offline.
However, the ZooCache watcher should still see the disconnected event, and
clears the cache on disconnected, well before the session would have expired.
So, even if it never sees the session expired event (which can only happen if
ZK never comes back online), I don't think that part of this is relevant. The
connection loss KeeperException should happen only for API calls while in a
still-disconnected state, but those are separate from the watcher having been
triggered and clearing as a result of the disconnected event. Those are
something we could potentially address using ZooReader, if it's something we
care about. But, it shouldn't affect the timing of ZooCache being updated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]