somandal commented on PR #15618:
URL: https://github.com/apache/pinot/pull/15618#issuecomment-2854869492

   > > > > Before this PR, if we set 
externalViewStabilizationTimeoutInMs=10000ms, regular run will fail after 
10000ms wait, while bestEfforts continue the next step after 10000ms.
   > > > > In this PR, regular run will succeed after, for example, 150000ms 
(extended 14 times), while bestEfforts continue the next step after 10000ms.
   > > > > All of this is because current PR will not give any timeout 
extension to bestEfforts=true. What I suggested is to allow bestEfforts to have 
this dynamic timeout too.
   > > > 
   > > > 
   > > > Conclusion from offline discussion: In the sense that 
`externalViewStabilizationTimeoutInMs` is likely to remain the same as before 
(not expect to tune this number), not giving timeout extension to 
`bestEfforts=true` is preferred so as to lower the impact to the current uses 
of bestEfforts, as they would just behave as usual.
   > > 
   > > 
   > > Discussed this more offline with @Jackie-Jiang and @klsince and we have 
consensus that we can change the behavior for bestEfforts=true to also extend 
the timeout as long as progress is made. If we don't make progress, 
bestEfforts=true should continue onto the next step rather than failing. Let's 
incorporate this behavior into this PR itself @J-HowHuang since it hasn't been 
merged yet (provided it doesn't delay the release) cc @yashmayya for FYI
   > 
   > I think we probably need to better define the contract for everything that 
setting `bestEfforts` to `true` does in a rebalance? This is our current 
definition -
   > 
   > ```
   > Applicable for rebalance with downtime=false.
   > 
   > If a no-downtime rebalance cannot be performed successfully, this flag 
controls whether to fail the rebalance or do a best-effort rebalance. Warning: 
setting this flag to true can cause downtime under two scenarios: 1) any 
segments get into ERROR state and 2) EV-IS convergence times out
   > ```
   
   Yes, the same definition still holds for best-efforts: a) ERROR segments are 
ignored and the rebalance continues, and b) the EV-IS convergence timeout keeps 
getting extended as long as we make progress. If we stop making progress, we 
continue to the next step rather than failing the rebalance.
   
   The main difference is that best-efforts can now take much longer to get to 
the next step, but it can still cause downtime in the two scenarios above.
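
To make the agreed behavior concrete, here is a rough sketch of the wait loop as I understand it. This is not the code in the PR: `ConvergenceWaitSketch`, `waitForConvergence`, `Outcome`, and `checksPerWindow` are hypothetical names, and a fixed number of polls per window stands in for the wall-clock `externalViewStabilizationTimeoutInMs` so the example stays deterministic.

```java
import java.util.function.IntSupplier;

// Hypothetical illustration only; none of these names come from Pinot.
final class ConvergenceWaitSketch {
  enum Outcome { CONVERGED, CONTINUED_BEST_EFFORT, FAILED }

  /**
   * @param remainingSegments reports how many segments have not yet converged (EV vs. IS)
   * @param checksPerWindow   polls allowed per timeout window (assumed stand-in for the timeout in ms)
   * @param bestEfforts       whether to move on instead of failing when no progress is made
   */
  static Outcome waitForConvergence(IntSupplier remainingSegments, int checksPerWindow,
      boolean bestEfforts) {
    int baseline = remainingSegments.getAsInt();
    while (true) {
      if (baseline == 0) {
        return Outcome.CONVERGED;
      }
      int current = baseline;
      for (int i = 0; i < checksPerWindow; i++) {
        current = remainingSegments.getAsInt();
        if (current == 0) {
          return Outcome.CONVERGED;
        }
      }
      // The window expired without full convergence.
      if (current < baseline) {
        // Progress was made during this window: extend by starting another window.
        baseline = current;
        continue;
      }
      // No progress: bestEfforts moves on to the next step, a regular run fails.
      return bestEfforts ? Outcome.CONTINUED_BEST_EFFORT : Outcome.FAILED;
    }
  }
}
```

With this shape, both modes get the extension while segments keep converging; they only differ in what happens once a full window passes with no progress.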


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

