somandal opened a new pull request, #16255:
URL: https://github.com/apache/pinot/pull/16255

   Today the `UtilizationChecker` and `ResourceUtilizationManager` return a 
boolean for the resource utilization check. After a controller restart, there 
is a window during which the controller may hold no cached resource utilization 
data (e.g. disk utilization information has to be fetched from servers and then 
cached in the controller by the `ResourceUtilizationChecker` periodic task). 
In that window, the resource utilization check currently returns `true`. As a 
result, a previously paused table may be un-paused for a short period of time 
and then paused again once the information has been fetched correctly.
   
   To address this, this PR:
   - Adds a new enum, `CheckResult`, with three values: STALE, TRUE, FALSE
   - Changes the resource utilization check to return a `CheckResult` instead 
     of a boolean
   - Makes the `DiskUtilizationChecker` return STALE when the data for all 
     server instances of the table is stale
   - If only a subset of the server instances is stale, makes the 
     `DiskUtilizationChecker` return FALSE if any of the known servers breach 
     utilization, otherwise TRUE
   - For the RealtimeSegmentValidation code path, does not un-pause the table 
     if a STALE status is returned
   - For the TaskGeneration code path, makes no change (STALE is ignored) [let 
     me know if we should skip task creation in this case as well]
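
   The rules listed above can be sketched roughly as follows. This is a 
minimal illustration, not the code in this PR: only the `CheckResult` enum and 
its three values come from the description. The class name `UtilizationSketch`, 
the methods `aggregate`, `shouldBePaused` and `allowTaskGeneration`, the 
`Optional<Double>` representation of "no cached data", and the treatment of an 
empty server list as STALE are all assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; names other than CheckResult are illustrative only.
public class UtilizationSketch {
  // The three-valued result named in the PR description.
  public enum CheckResult { STALE, TRUE, FALSE }

  // Assumed input: one entry per server hosting the table, holding the cached
  // disk usage fraction, or empty when the controller has no data for it yet.
  public static CheckResult aggregate(List<Optional<Double>> usagePerServer, double limit) {
    boolean anyKnown = false;
    for (Optional<Double> usage : usagePerServer) {
      if (usage.isPresent()) {
        anyKnown = true;
        if (usage.get() > limit) {
          // Any known server over the limit fails the check.
          return CheckResult.FALSE;
        }
      }
    }
    // No known data at all (including an empty list) is reported as STALE.
    return anyKnown ? CheckResult.TRUE : CheckResult.STALE;
  }

  // Realtime validation path: a STALE result leaves the pause state unchanged,
  // so a previously paused table is not un-paused on missing data.
  public static boolean shouldBePaused(boolean currentlyPaused, CheckResult result) {
    switch (result) {
      case FALSE:
        return true;
      case TRUE:
        return false;
      default:
        return currentlyPaused;
    }
  }

  // Task generation path: STALE is ignored, so only FALSE blocks task creation.
  public static boolean allowTaskGeneration(CheckResult result) {
    return result != CheckResult.FALSE;
  }
}
```

   With this shape, a table that was paused before the restart stays paused 
while `aggregate` returns STALE, whereas task generation proceeds as it does 
today.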


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


