somandal commented on PR #15618:
URL: https://github.com/apache/pinot/pull/15618#issuecomment-2854869492

> > > > Before this PR, if we set externalViewStabilizationTimeoutInMs=10000ms, regular run will fail after 10000ms wait, while bestEfforts continue the next step after 10000ms.
> > > > In this PR, regular run will succeed after, for example, 150000ms (extended 14 times), while bestEfforts continue the next step after 10000ms.
> > > > All of this is because current PR will not give any timeout extension to bestEfforts=true. What I suggested is to allow bestEfforts to have this dynamic timeout too.
> > >
> > > Conclusion from offline discussion: In the sense that `externalViewStabilizationTimeoutInMs` is likely to remain the same as before (not expect to tune this number), not giving timeout extension to `bestEfforts=true` is preferred so as to lower the impact to the current uses of bestEfforts, as they would just behave as usual.
> >
> > Discussed this more offline with @Jackie-Jiang and @klsince and we have consensus that we can change the behavior for bestEfforts=true to also extend the timeout as long as progress is made. If we don't make progress, bestEfforts=true should continue onto the next step rather than failing. Let's incorporate this behavior into this PR itself @J-HowHuang since it hasn't been merged yet (provided it doesn't delay the release) cc @yashmayya for FYI
>
> I think we probably need to better define the contract for everything that setting `bestEfforts` to `true` does in a rebalance? This is our current definition -
>
> ```
> Applicable for rebalance with downtime=false.
>
> If a no-downtime rebalance cannot be performed successfully, this flag controls whether to fail the rebalance or do a best-effort rebalance. Warning: setting this flag to true can cause downtime under two scenarios: 1) any segments get into ERROR state and 2) EV-IS convergence times out
> ```

Yes, the same definition holds for best-efforts.
a) ERROR segments are ignored and we still continue the rebalance.
b) EV-IS convergence will keep getting extended as long as we make progress. If we don't make progress, we will continue to the next step rather than timing out.

The main difference is that best-efforts will become much slower to get to the next step, but it can still cause downtime in the above scenarios.
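
To make the proposed contract concrete, here is a minimal Java sketch of the wait loop as described in this thread. It is not Pinot's actual rebalancer code: the class, enum, and method names are hypothetical, each list entry stands in for one `externalViewStabilizationTimeoutInMs` window, and "progress" is assumed to mean that strictly fewer segments remain to converge than at the end of the previous window.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the behavior discussed above; not Pinot's implementation.
class StabilizationWaitSketch {
    enum Outcome { CONVERGED, CONTINUED_TO_NEXT_STEP, FAILED }

    static final class Result {
        final Outcome outcome;
        final long waitedMs;

        Result(Outcome outcome, long waitedMs) {
            this.outcome = outcome;
            this.waitedMs = waitedMs;
        }
    }

    /**
     * Simulates waiting for EV-IS convergence.
     *
     * @param timeoutMs            length of one timeout window (assumed to map to
     *                             externalViewStabilizationTimeoutInMs)
     * @param initialRemaining     segments still to converge when the wait starts
     * @param remainingAtWindowEnd segments still to converge at the end of each window
     * @param bestEfforts          whether a stalled wait continues instead of failing
     */
    static Result simulate(long timeoutMs, int initialRemaining,
                           List<Integer> remainingAtWindowEnd, boolean bestEfforts) {
        if (initialRemaining == 0) {
            return new Result(Outcome.CONVERGED, 0L);
        }
        long waitedMs = 0L;
        int previous = initialRemaining;
        for (int remaining : remainingAtWindowEnd) {
            waitedMs += timeoutMs;
            if (remaining == 0) {
                return new Result(Outcome.CONVERGED, waitedMs);
            }
            if (remaining >= previous) {
                break; // no progress in this window: stop extending the timeout
            }
            previous = remaining; // progress was made: extend by one more window
        }
        // Stalled (or no more observations, which this sketch also treats as a stall):
        // bestEfforts moves on to the next step, a regular run fails.
        Outcome stalled = bestEfforts ? Outcome.CONTINUED_TO_NEXT_STEP : Outcome.FAILED;
        return new Result(stalled, waitedMs);
    }

    public static void main(String[] args) {
        List<Integer> stalledAfterOneWindow = Arrays.asList(2, 2);
        for (boolean bestEfforts : new boolean[] {false, true}) {
            Result r = simulate(10_000L, 3, stalledAfterOneWindow, bestEfforts);
            System.out.println("bestEfforts=" + bestEfforts + " -> " + r.outcome
                + " after " + r.waitedMs + "ms");
        }
    }
}
```

In this sketch both modes wait equally long. They differ only in what happens once progress stalls, which is why best-efforts gets slower to reach the next step but can still cause downtime.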