jadami10 commented on issue #16565: URL: https://github.com/apache/pinot/issues/16565#issuecomment-3189966117
We discussed this live. There are 2 items we think are right to do here:

1. The server `/health` readiness check should be updated to verify that [IS_SHUTDOWN_IN_PROGRESS](https://github.com/apache/pinot/blob/642bf00501ef0cc0ddb79ade00b2eff695590ea0/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java#L799C38-L800C40) is set to false. This doesn't completely cover the issue, but it ensures `/health` returns OK only _after_ the query server has started, and it gives the broker more time to process the server status change.
2. We need some form of coordination between the external view the server knows about and the external view the broker routing map is based on. We need some form of watermark to be able to say "broker X has processed external view changes up to version V", in order to ensure either (a) servers are not marked healthy until brokers have rebuilt the routing map with the latest external view changes, or (b) an external orchestrating system can correctly await broker routing map changes and so know when the next batch of servers can be restarted.

The first item is much more straightforward, so we can start there. I likely won't have time to get to this for a few months, and in the meantime we've papered over the problem by adding `sleep(30s)` between batches of servers.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
