Re: [PR] Made ITs that restart MAC faster [accumulo]

via GitHub Mon, 13 Jan 2025 20:17:35 -0800


ctubbsii commented on code in PR #5246:
URL: https://github.com/apache/accumulo/pull/5246#discussion_r1914216953



##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -824,9 +825,46 @@ public synchronized void stop() throws IOException, 
InterruptedException {
 
     control.stop(ServerType.GARBAGE_COLLECTOR, null);
     control.stop(ServerType.MANAGER, null);
+    control.stop(ServerType.COMPACTION_COORDINATOR);
     control.stop(ServerType.TABLET_SERVER, null);
+    control.stop(ServerType.COMPACTOR, null);
+    control.stop(ServerType.SCAN_SERVER, null);
+
+    // The method calls above kill the server
+    // Clean up the locks in ZooKeeper fo that if the cluster
+    // is restarted, then the processes will start right away
+    // and not wait for the old locks to be cleaned up.
+    try {
+      new ZooZap().zap(getServerContext().getSiteConfiguration(), new String[] 
{"-manager",
+          "-compaction-coordinators", "-tservers", "-compactors", 
"-sservers"});
+    } catch (Exception e) {
+      log.error("Error zapping zookeeper locks", e);
+    }
     control.stop(ServerType.ZOOKEEPER, null);
 
+    // Clear the location of the servers in ZooCache.
+    // When ZooKeeper was stopped in the previous method call,
+    // the local ZooKeeper watcher did not fire. If MAC is
+    // restarted, then ZooKeeper will start on the same port with
+    // the same data, but no Watchers will fire.
+    boolean startCalled = true;
+    try {
+      getServerContext();
+    } catch (RuntimeException e) {
+      if (e.getMessage().startsWith("Accumulo not initialized")) {
+        startCalled = false;
+      }
+    }
+    if (startCalled) {
+      final ServerContext ctx = getServerContext();
+      final String zRoot = getServerContext().getZooKeeperRoot();
+      ctx.getZooCache().clear((path) -> path.startsWith(zRoot + 
Constants.ZMANAGER_LOCK));
+      ctx.getZooCache().clear((path) -> path.startsWith(zRoot + 
Constants.ZGC_LOCK));
+      ctx.getZooCache().clear((path) -> path.startsWith(zRoot + 
Constants.ZCOMPACTORS));
+      ctx.getZooCache().clear((path) -> path.startsWith(zRoot + 
Constants.ZSSERVERS));
+      ctx.getZooCache().clear((path) -> path.startsWith(zRoot + 
Constants.ZTSERVERS));

Review Comment:
   I don't know if this is helpful, but you can define the predicate once 
outside this code, and then just call clear a single time with a single 
ready-to-go predicate.
   
   ```suggestion
         Predicate<String> lockPaths = path -> path.startsWith(zRoot + 
Constants.ZMANAGER_LOCK);
         for (String s : Set.of(Constants.ZGC_LOCK, Constants.ZCOMPACTORS, 
Constants.ZSSERVERS, Constants.ZTSERVERS)) {
           lockPaths = lockPaths.or(path -> path.startsWith(zRoot + s));
         }
         ctx.getZooCache().clear(lockPaths);
   ```
   
   You could also define it a predicate that is true if any predicate on a 
stream is true:
   
   ```suggestion
         Predicate<String> lockPaths = path -> 
Stream.of(Constants.ZMANAGER_LOCK, Constants.ZGC_LOCK, Constants.ZCOMPACTORS, 
Constants.ZSSERVERS, Constants.ZTSERVERS)
             .anyMatch(p -> path.startsWith(zRoot + p));
         ctx.getZooCache().clear(lockPaths);
   ```
   
   The setup of the Predicate is a little more costly, but if done once per jvm 
as a static final, it could be worth it to reduce the number of calls to 
zooCache.clear(), and it might be a little more readable.
   



##########
minicluster/src/main/java/org/apache/accumulo/miniclusterImpl/MiniAccumuloClusterImpl.java:
##########
@@ -824,9 +825,47 @@ public synchronized void stop() throws IOException, 
InterruptedException {
 
     control.stop(ServerType.GARBAGE_COLLECTOR, null);
     control.stop(ServerType.MANAGER, null);
+    control.stop(ServerType.COMPACTION_COORDINATOR);
     control.stop(ServerType.TABLET_SERVER, null);
+    control.stop(ServerType.COMPACTOR, null);
+    control.stop(ServerType.SCAN_SERVER, null);
+
+    // The method calls above kill the server
+    // Clean up the locks in ZooKeeper fo that if the cluster
+    // is restarted, then the processes will start right away
+    // and not wait for the old locks to be cleaned up.
+    try {
+      System.setProperty("accumulo.properties", "file://" + 
getAccumuloPropertiesPath());
+      new ZooZap().zap(getServerContext().getSiteConfiguration(), new String[] 
{"-manager",
+          "-compaction-coordinators", "-tservers", "-compactors", 
"-sservers"});
+    } catch (Exception e) {
+      log.error("Error zapping zookeeper locks", e);
+    }
     control.stop(ServerType.ZOOKEEPER, null);
 
+    // Clear the location of the servers in ZooCache.
+    // When ZooKeeper was stopped in the previous method call,
+    // the local ZooKeeper watcher did not fire. If MAC is
+    // restarted, then ZooKeeper will start on the same port with
+    // the same data, but no Watchers will fire.

Review Comment:
   I found some info at this [confluence 
page](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=24193420#FAQ-WhathappenstoZKsessionswhiletheclusterisdown?)
 that says that sessions are preserved while ZK is offline, and sessions 
(including any session-related data, like ephemeral nodes) are restored and the 
session timeout refreshed by the new leader after the cluster is restored. So, 
the premise of this issue is sound, regarding the need to wait for the old 
accumulo server locks to expire before the restarted services are ready. I 
assumed all sessions would automatically expire if the ZK cluster were to go 
down entirely. However, I suppose it makes sense that it's not the case, since 
that is essentially not any different than if the ZK leader went offline and 
the cluster had to elect a new leader, which is probably not as uncommon as an 
entire cluster being offline.
   
   However, the ZooCache watcher should still see the disconnected event, and 
clears the cache on disconnected, well before the session would have expired. 
So, even if it never sees the session expired event (which can only happen if 
ZK never comes back online), I don't think that part of this is relevant. The 
connection loss KeeperException should happen only for API calls while in a 
still-disconnected state, but those are separate from the watcher having been 
triggered and clearing as a result of the disconnected event. Those are 
something we could potentially address using ZooReader, if it's something we 
care about. But, it shouldn't affect the timing of ZooCache being updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Made ITs that restart MAC faster [accumulo]

Reply via email to