jadami10 commented on issue #16565:
URL: https://github.com/apache/pinot/issues/16565#issuecomment-3189966117

   we discussed this live; there are 2 items we think are right to do here:
   
   1. server `/health` for readiness should be updated to check that 
[IS_SHUTDOWN_IN_PROGRESS](https://github.com/apache/pinot/blob/642bf00501ef0cc0ddb79ade00b2eff695590ea0/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java#L799C38-L800C40)
 is set to false. This doesn't completely cover the issue, but it ensures 
`/health` only returns OK _after_ the query server has started, which gives the 
broker more time to process the server status change
   2. we need some form of coordination between the external view the server 
knows about and the external view the broker routing map is based on. We need 
some form of watermark to be able to say "broker X has processed external view 
changes up to version V" in order to ensure either 1) servers are not marked 
healthy until brokers have rebuilt the routing map with the latest external 
view changes or 2) an external orchestrating system can correctly await 
broker routing map changes before deciding when the next batch of servers 
can be restarted
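
   A minimal sketch of the watermark idea in item 2, assuming each broker can 
report the highest external view version it has applied to its routing map. 
`RoutingWatermarks`, `report` and `allCaughtUp` are hypothetical names for 
illustration, not existing Pinot classes or APIs, and how brokers would 
actually publish their version is left open:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Pinot: tracks, per broker, the highest
// external view version whose changes are reflected in that broker's routing map.
class RoutingWatermarks {
  private final Map<String, Long> _processedVersion = new ConcurrentHashMap<>();

  // Called when a broker finishes rebuilding its routing map for `version`.
  // Stale (lower) reports never move the watermark backwards.
  void report(String brokerId, long version) {
    _processedVersion.merge(brokerId, version, Long::max);
  }

  // True only if every listed broker has processed external view changes
  // up to at least `targetVersion`. A broker that never reported counts as behind.
  boolean allCaughtUp(Set<String> brokerIds, long targetVersion) {
    for (String brokerId : brokerIds) {
      Long processed = _processedVersion.get(brokerId);
      if (processed == null || processed < targetVersion) {
        return false;
      }
    }
    return true;
  }
}
```

   Either option could sit on top of a check like `allCaughtUp`: a server could 
hold off reporting healthy until it returns true for the version the server 
sees, or an external orchestrator could poll it before restarting the next 
batch.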
   
   the first item is much more straightforward, so we can start there.
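
   A rough sketch of what the first item amounts to, assuming the readiness 
check can read the shutdown-in-progress flag as a boolean. `ReadinessCheck` and 
its two suppliers are hypothetical stand-ins, not the actual `/health` 
implementation; how the real flag is stored and read is not shown here:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Pinot code: readiness requires the existing checks
// to pass AND the shutdown-in-progress flag to have been set to false.
final class ReadinessCheck {
  private final BooleanSupplier _existingChecks;      // whatever /health verifies today
  private final BooleanSupplier _shutdownInProgress;  // stand-in for IS_SHUTDOWN_IN_PROGRESS

  ReadinessCheck(BooleanSupplier existingChecks, BooleanSupplier shutdownInProgress) {
    _existingChecks = existingChecks;
    _shutdownInProgress = shutdownInProgress;
  }

  // Returns true (i.e. /health would report OK) only once the flag is false.
  boolean isReady() {
    return _existingChecks.getAsBoolean() && !_shutdownInProgress.getAsBoolean();
  }
}
```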
   
   I likely won't have time to get to this for a few months. In the meantime 
we've papered over this by adding `sleep(30s)` between batches of server 
restarts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

