J-HowHuang commented on code in PR #16886:
URL: https://github.com/apache/pinot/pull/16886#discussion_r2395584084


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/tenant/ZkBasedTenantRebalanceObserver.java:
##########
@@ -20,120 +20,294 @@
 
 import com.fasterxml.jackson.core.JsonProcessingException;
 import com.google.common.annotations.VisibleForTesting;
+import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
-import java.util.stream.Collectors;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BiConsumer;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.commons.lang3.tuple.Pair;
 import org.apache.pinot.controller.helix.core.PinotHelixResourceManager;
 import org.apache.pinot.controller.helix.core.controllerjob.ControllerJobTypes;
 import org.apache.pinot.controller.helix.core.rebalance.RebalanceJobConstants;
+import org.apache.pinot.controller.helix.core.rebalance.RebalanceResult;
+import org.apache.pinot.controller.helix.core.rebalance.TableRebalanceManager;
 import org.apache.pinot.spi.utils.CommonConstants;
 import org.apache.pinot.spi.utils.JsonUtils;
+import org.apache.pinot.spi.utils.retry.AttemptFailureException;
+import org.apache.pinot.spi.utils.retry.RetryPolicies;
+import org.apache.pinot.spi.utils.retry.RetryPolicy;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 
 public class ZkBasedTenantRebalanceObserver implements TenantRebalanceObserver {
   private static final Logger LOGGER = LoggerFactory.getLogger(ZkBasedTenantRebalanceObserver.class);
+  public static final int DEFAULT_ZK_UPDATE_MAX_RETRIES = 30;

Review Comment:
   I picked 30 to guarantee that it retries for at most 30 seconds.
   
   I've now switched to a random-delay policy. The delay is 100-200 ms, based on the other ZK update retry policy implementation here:
   https://github.com/apache/pinot/blob/3a6a2f260a9470166646979f5db8fdb6e27fd304/pinot-common/src/main/java/org/apache/pinot/common/utils/helix/HelixHelper.java#L65-L66
   Ref: https://github.com/apache/pinot/pull/6165#issuecomment-714667987
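A minimal sketch of the random-delay retry described above, assuming a single ZK metadata write can be modeled as a `BooleanSupplier` that returns `false` on a version conflict. `RandomDelayRetry`, `attempt`, and `worstCaseSleepMs` are hypothetical names for illustration only; they are not Pinot's `RetryPolicies` API (the diff imports that class, but its signatures are not shown here). The delay between failed attempts is drawn uniformly from [minDelayMs, maxDelayMs], matching the 100-200 ms range mentioned.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical standalone random-delay retry (NOT Pinot's RetryPolicies API).
 * Retries an operation up to maxAttempts times, sleeping a uniformly random
 * delay in [minDelayMs, maxDelayMs] between failed attempts.
 */
final class RandomDelayRetry {
  private final int maxAttempts;
  private final long minDelayMs;
  private final long maxDelayMs;

  RandomDelayRetry(int maxAttempts, long minDelayMs, long maxDelayMs) {
    if (maxAttempts < 1 || minDelayMs < 0 || maxDelayMs < minDelayMs) {
      throw new IllegalArgumentException("invalid retry parameters");
    }
    this.maxAttempts = maxAttempts;
    this.minDelayMs = minDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /**
   * Runs the operation until it returns true (e.g. a ZK write whose version check
   * passed) and returns the 1-based attempt that succeeded.
   */
  int attempt(BooleanSupplier operation) {
    for (int i = 1; i <= maxAttempts; i++) {
      if (operation.getAsBoolean()) {
        return i;
      }
      if (i < maxAttempts) {
        sleepRandomDelay();
      }
    }
    throw new IllegalStateException("operation failed after " + maxAttempts + " attempts");
  }

  /** Upper bound on total sleep: (maxAttempts - 1) delays of at most maxDelayMs each. */
  long worstCaseSleepMs() {
    return (maxAttempts - 1) * maxDelayMs;
  }

  private void sleepRandomDelay() {
    // Jitter spreads out writers that just collided on the same node version.
    long delayMs = ThreadLocalRandom.current().nextLong(minDelayMs, maxDelayMs + 1);
    try {
      Thread.sleep(delayMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status for callers
      throw new IllegalStateException("interrupted while waiting to retry", e);
    }
  }
}
```

For example, `new RandomDelayRetry(30, 100, 200).worstCaseSleepMs()` returns 5800, i.e. at most 29 sleeps of 200 ms, not counting time spent in the ZK calls themselves.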
   
   We probably don't need as many retries as the ideal state (IS) update, since tenant rebalance job metadata updates should be less frequent, except for the job polling at the beginning. The tenant rebalancer now has an observer whose max retries equal its degree of parallelism; with a random-delay policy, this should be enough. wdyt?
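The case for setting max retries to the degree of parallelism can be sketched under a simplifying assumption: each of N workers performs exactly one versioned (compare-and-set) write to the same record, modeled here with an `AtomicInteger` standing in for the ZK node version. A worker's write fails only if another worker's write succeeded after its read, and each other worker succeeds exactly once, so no worker fails more than N - 1 times and N attempts suffice. `ContentionModel` is a hypothetical name; if real observer updates happen more than once per worker, this bound no longer strictly holds, and the random delay is what keeps repeated collisions unlikely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model (NOT Pinot code): N workers each make ONE versioned write to the
 * same record, like concurrent writers to a single ZK node using version checks.
 */
final class ContentionModel {
  private ContentionModel() {
  }

  /** Returns how many attempts each worker needed, with maxAttempts == parallelism. */
  static List<Integer> run(int parallelism) {
    AtomicInteger zkVersion = new AtomicInteger(0); // stands in for the ZK node version
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int w = 0; w < parallelism; w++) {
        Future<Integer> future = pool.submit(() -> {
          for (int attempt = 1; attempt <= parallelism; attempt++) {
            int expected = zkVersion.get(); // read the current version
            Thread.yield(); // widen the window in which another worker can win
            // Versioned write: fails only if another worker wrote since our read.
            if (zkVersion.compareAndSet(expected, expected + 1)) {
              return attempt;
            }
          }
          throw new IllegalStateException("exceeded " + parallelism + " attempts");
        });
        futures.add(future);
      }
      List<Integer> attempts = new ArrayList<>();
      for (Future<Integer> future : futures) {
        attempts.add(future.get());
      }
      return attempts;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Every entry returned by `ContentionModel.run(n)` lies between 1 and `n`, and the first writer in linearization order always succeeds on its first attempt.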


