gortiz commented on code in PR #10528:
URL: https://github.com/apache/pinot/pull/10528#discussion_r1204062751


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiNodesOfflineClusterIntegrationTest.java:
##########
@@ -138,28 +140,33 @@ public void testServerHardFailure()
 
     // Take a server and shut down its query server to mimic a hard failure
     BaseServerStarter serverStarter = _serverStarters.get(NUM_SERVERS - 1);
-    serverStarter.getServerInstance().shutDown();
-
-    // First query should hit all servers and get connection refused exception
-    testCountStarQuery(NUM_SERVERS, true);
-
-    // Second query should not hit the failed server, and should return the correct result
-    testCountStarQuery(NUM_SERVERS - 1, false);
-
-    // Restart the failed server, and it should be included in the routing again
-    serverStarter.stop();
-    serverStarter = startOneServer(NUM_SERVERS - 1);
-    _serverStarters.set(NUM_SERVERS - 1, serverStarter);
-    TestUtils.waitForCondition(aVoid -> {
-      try {
-        JsonNode queryResult = postQuery("SELECT COUNT(*) FROM mytable");
-        // Result should always be correct
-        assertEquals(queryResult.get("resultTable").get("rows").get(0).get(0).longValue(), getCountStarResult());
-        return queryResult.get("numServersQueried").intValue() == NUM_SERVERS;
-      } catch (Exception e) {
-        throw new RuntimeException(e);
-      }
-    }, 10_000L, "Failed to include the restarted server into the routing");
+    try {
+      serverStarter.getServerInstance().shutDown();
+
+      // First query should hit all servers and get connection refused exception
+      TestUtils.waitForCondition(() -> {
+        testCountStarQuery(NUM_SERVERS, true);
+        return null;
+      }, 500, 5000, "Expected " + NUM_SERVERS + " with error", true, Duration.of(1, ChronoUnit.SECONDS));
+
+      // Second query should not hit the failed server, and should return the correct result
+      testCountStarQuery(NUM_SERVERS - 1, false);
+    } finally {
+      // Restart the failed server, and it should be included in the routing again
+      serverStarter.stop();
+      serverStarter = startOneServer(NUM_SERVERS - 1);

Review Comment:
   This was a very problematic issue. If the test failed, it never restarted
   the server, which made other tests fail afterwards. That misled me a lot
   while I was trying to fix the first issue.
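
To illustrate the point in isolation, here is a minimal, self-contained sketch of the same try/finally pattern. It does not use any Pinot classes: `FakeServer`, `runWithServerDown`, and `FinallyCleanupSketch` are hypothetical names invented for this example, and the "server" is just a boolean flag standing in for the real restart logic.

```java
public class FinallyCleanupSketch {
  // Hypothetical stand-in for a server that a test shuts down and must bring back.
  static class FakeServer {
    boolean running = true;

    void shutDown() {
      running = false;
    }

    void restart() {
      running = true;
    }
  }

  // Shuts the server down, runs the test assertions, and always restarts the
  // server afterwards, even when the assertions throw.
  static void runWithServerDown(FakeServer server, Runnable assertions) {
    try {
      server.shutDown();
      assertions.run();
    } finally {
      // Cleanup lives in finally so a failing test cannot leave the shared
      // cluster in a broken state for the tests that run after it.
      server.restart();
    }
  }

  public static void main(String[] args) {
    FakeServer server = new FakeServer();
    try {
      runWithServerDown(server, () -> {
        throw new AssertionError("simulated test failure");
      });
    } catch (AssertionError e) {
      // The failure still reaches the caller; only the cleanup is guaranteed.
    }
    System.out.println("running after failed test: " + server.running);
  }
}
```

Without the `finally`, the simulated failure would skip the restart and every later test sharing the server would see it down, which is the cascading failure described above.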



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
