itschrispeck opened a new pull request, #13541:
URL: https://github.com/apache/pinot/pull/13541

   We have seen servers sometimes fail to pass the service status checker until 
the timeout is reached, even after all segments are online/in the expected 
state. Logs show:
   
   ```
   Sleep for 10000ms as service status has not turned GOOD: MultipleCallbackServiceStatusCallback:IdealStateAndCurrentStateMatchServiceStatusCallback:Helix state does not exist, waitingFor=CurrentStateMatch, resource=table_REALTIME, numResourcesLeft=2, numTotalResources=802, minStartCount=802,;IdealStateAndExternalViewMatchServiceStatusCallback:Init;;
   ```
   
   This is due to [this 
check](https://github.com/apache/pinot/blob/cf1a0f6b9b44fdb567452b63a34e76ac5c635429/pinot-common/src/main/java/org/apache/pinot/common/utils/ServiceStatus.java#L437-L440),
 which considers the table resource to have `STARTING` status if the external 
view/current state is `null`. However, this isn't a valid assumption: the 
current state can also be null when the last segment on the server has been 
removed while the ideal state still exists. We primarily see this behavior on 
small tables with completed segment redistribution turned on. 
   
   This change lets the resource status return `GOOD` when the instance is no 
longer assigned any segment of a resource, even though it was assigned 
segments when the server first started and collected the resources to monitor. 
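
   As a rough illustration of the behavior described above, here is a minimal, 
self-contained sketch. It is not the actual `ServiceStatus` code: the class 
`ResourceStatusSketch`, its methods, and the simplified model (a map of segment 
name to state for this instance, with a `null` current-state map meaning no 
current state exists for the resource) are all assumptions made for 
illustration.

   ```java
   import java.util.Collections;
   import java.util.Map;

   /** Hypothetical sketch of the per-resource status decision; not Pinot's real code. */
   public class ResourceStatusSketch {
     public enum Status { STARTING, GOOD }

     /** Old behavior: a missing current state always means STARTING. */
     public static Status statusBefore(Map<String, String> idealSegments,
                                       Map<String, String> currentSegments) {
       if (currentSegments == null) {
         return Status.STARTING;
       }
       return matches(idealSegments, currentSegments) ? Status.GOOD : Status.STARTING;
     }

     /** Proposed behavior: a missing current state is fine if nothing is assigned. */
     public static Status statusAfter(Map<String, String> idealSegments,
                                      Map<String, String> currentSegments) {
       if (currentSegments == null) {
         // No segments assigned to this instance any more: nothing left to wait for.
         return idealSegments.isEmpty() ? Status.GOOD : Status.STARTING;
       }
       return matches(idealSegments, currentSegments) ? Status.GOOD : Status.STARTING;
     }

     /** True if every segment in the ideal state has the same state in the current state. */
     private static boolean matches(Map<String, String> ideal, Map<String, String> current) {
       for (Map.Entry<String, String> entry : ideal.entrySet()) {
         if (!entry.getValue().equals(current.get(entry.getKey()))) {
           return false;
         }
       }
       return true;
     }

     public static void main(String[] args) {
       // The instance's last segment was moved away, so its assignment is empty
       // and no current state exists for the resource.
       Map<String, String> noSegments = Collections.emptyMap();
       System.out.println(statusBefore(noSegments, null)); // STARTING
       System.out.println(statusAfter(noSegments, null));  // GOOD
     }
   }
   ```

   In this sketch a resource that still has segments assigned but no current 
state keeps reporting `STARTING`, so a server that is genuinely still loading 
segments is not marked healthy early.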


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.