jadami10 opened a new issue, #16565:
URL: https://github.com/apache/pinot/issues/16565

   We've been hit by a bug related to #14529 and external orchestration of 
Pinot restarts.
   
   Let's assume you have a table with 2 replica groups:
   
   - External system sends SIGTERM to pinot-server-1
   - pinot-server-1 sets `IS_SHUTDOWN_IN_PROGRESS` to true
   - Pinot broker stops routing to pinot-server-1
   - pinot-server-1 starts back up with `/health` not returning `OK`
   - pinot-server-1 [startupServiceStatusCheck](https://github.com/apache/pinot/blob/642bf00501ef0cc0ddb79ade00b2eff695590ea0/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java#L150) completes.
   - *start of problem*: External system sees `/health` return `OK`
   - *problem*: External system restarts pinot-server-2
   - *problem*: Queries fail because `pinot-server-1` and `pinot-server-2` are 
both not serving queries
   - `pinot-server-1` sets `IS_SHUTDOWN_IN_PROGRESS` to false
   - Broker adds `pinot-server-1` back to the routing table, and queries 
succeed again
   
   In our case, this caused ~17 seconds of downtime.
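
To make the race concrete, here is a minimal sketch that models each server's "not routed by the broker" period as an interval and computes where the two overlap. The function name `full_outage` and all timestamps are hypothetical and chosen only for illustration; they are not measurements from the incident.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds since the first SIGTERM


def full_outage(unroutable: Dict[str, Interval]) -> Optional[Interval]:
    """Return the window in which every server is unroutable, or None.

    Each server contributes one interval during which the broker does not
    route to it. The intervals share a common window only if the latest
    start precedes the earliest end.
    """
    start = max(s for s, _ in unroutable.values())
    end = min(e for _, e in unroutable.values())
    return (start, end) if start < end else None


# Illustrative timestamps only, not measurements from the incident.
racy = {
    "pinot-server-1": (0.0, 60.0),    # SIGTERM at 0, routable again at 60
    "pinot-server-2": (45.0, 110.0),  # restarted at 45, when /health turned OK
}
safe = {
    "pinot-server-1": (0.0, 60.0),
    "pinot-server-2": (60.0, 125.0),  # restarted only once server-1 is routable
}

print(full_outage(racy))  # (45.0, 60.0)
print(full_outage(safe))  # None
```

In the racy case the outage is exactly the gap between `/health` returning `OK` and the broker routing to the server again, which is the window the external system cannot currently observe.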
   
   It's not clear how to orchestrate this correctly in Pinot. It seems you have 
to check the broker routing table for every table to ensure your server is 
found in there. But there's no clear API for "Is X server available for all 
necessary segments" or "is Y server going to cause downtime if I take it down". 
So if you're performing a rolling restart, you're kind of crossing your fingers 
that you wait long enough between replica group restarts.
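
As a workaround sketch, an external orchestrator could gate the next restart on the restarted server being routable for every table, not on `/health` alone. The code below is an assumption-laden illustration: `fetch_routable` is a hypothetical callback that returns, per table, the set of server instances the broker currently routes to. How that snapshot is obtained from the broker is deliberately left out, since the issue is precisely that no single API answers this question.

```python
import time
from typing import Callable, Dict, List, Set

# Hypothetical snapshot shape: table name -> server instances the broker routes to.
Routable = Dict[str, Set[str]]


def tables_missing_server(routable: Routable, server: str) -> List[str]:
    """Return the tables whose routing does not include `server`."""
    return sorted(t for t, servers in routable.items() if server not in servers)


def wait_until_routable(
    fetch_routable: Callable[[], Routable],
    server: str,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `server` appears in routing for every table, or time out.

    Returns True once no table is missing the server, False if the deadline
    passes first. The snapshot is always checked at least once.
    """
    deadline = clock() + timeout_s
    while True:
        if not tables_missing_server(fetch_routable(), server):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A rolling restart would then call `wait_until_routable(...)` for the server it just restarted and move on to the next replica group only when it returns `True`. Note that an empty snapshot (no tables) counts as routable in this sketch, so a real orchestrator would want to treat that case explicitly.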


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

