J-HowHuang commented on code in PR #16886:
URL: https://github.com/apache/pinot/pull/16886#discussion_r2395584084
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/tenant/ZkBasedTenantRebalanceObserver.java:
##########
@@ -20,120 +20,294 @@
import com.fasterxml.jackson.core.JsonProcessingException;
import com.google.common.annotations.VisibleForTesting;
+import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
-import java.util.stream.Collectors;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BiConsumer;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.commons.lang3.tuple.Pair;
import org.apache.pinot.controller.helix.core.PinotHelixResourceManager;
import org.apache.pinot.controller.helix.core.controllerjob.ControllerJobTypes;
import org.apache.pinot.controller.helix.core.rebalance.RebalanceJobConstants;
+import org.apache.pinot.controller.helix.core.rebalance.RebalanceResult;
+import org.apache.pinot.controller.helix.core.rebalance.TableRebalanceManager;
import org.apache.pinot.spi.utils.CommonConstants;
import org.apache.pinot.spi.utils.JsonUtils;
+import org.apache.pinot.spi.utils.retry.AttemptFailureException;
+import org.apache.pinot.spi.utils.retry.RetryPolicies;
+import org.apache.pinot.spi.utils.retry.RetryPolicy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ZkBasedTenantRebalanceObserver implements TenantRebalanceObserver
{
private static final Logger LOGGER =
LoggerFactory.getLogger(ZkBasedTenantRebalanceObserver.class);
+ public static final int DEFAULT_ZK_UPDATE_MAX_RETRIES = 30;
Review Comment:
I originally picked 30 so that the observer would retry for at most 30
seconds.
I've since switched to a random-delay policy. The delay is now 100-200ms,
modeled on the existing ZK update retry policy implementation here:
https://github.com/apache/pinot/blob/3a6a2f260a9470166646979f5db8fdb6e27fd304/pinot-common/src/main/java/org/apache/pinot/common/utils/helix/HelixHelper.java#L65-L66
Ref: https://github.com/apache/pinot/pull/6165#issuecomment-714667987
We probably don't need as many retries as the IdealState (IS) update does,
since tenant rebalance job metadata updates should be less frequent, apart
from the job polls at the beginning. The tenant rebalancer's observer now sets
max retries to its degree of parallelism, which should be enough with a random
policy. WDYT?
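
To make the idea concrete, here is a minimal JDK-only sketch of a random-delay retry with a 100-200ms window. `RandomDelayRetrySketch` and `retry` are hypothetical names for illustration, not Pinot's `RetryPolicies` implementation, and the cap of 5 attempts is an arbitrary stand-in for the degree of parallelism.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a random-delay retry policy; not Pinot's actual code. */
final class RandomDelayRetrySketch {
  private RandomDelayRetrySketch() {
  }

  /**
   * Runs the operation until it returns true or maxAttempts is reached.
   * Returns the number of attempts used; throws if every attempt failed.
   */
  static int retry(Callable<Boolean> operation, int maxAttempts, long minDelayMs, long maxDelayMs)
      throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (Boolean.TRUE.equals(operation.call())) {
        return attempt;
      }
      if (attempt < maxAttempts) {
        // Uniform delay in [minDelayMs, maxDelayMs) so that concurrent writers
        // that collided once are unlikely to retry in lockstep.
        Thread.sleep(ThreadLocalRandom.current().nextLong(minDelayMs, maxDelayMs));
      }
    }
    throw new IllegalStateException("Operation failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) throws Exception {
    // Simulate a ZK write that loses the version check twice before succeeding.
    AtomicInteger calls = new AtomicInteger();
    int attempts = retry(() -> calls.incrementAndGet() >= 3, 5, 100L, 200L);
    System.out.println("Succeeded after " + attempts + " attempts");
  }
}
```

With N writers contending on the same znode, each round should let at least one writer through, which is the intuition behind capping retries at the degree of parallelism; the random delay only reduces how often the same writers collide again.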