AlexanderKM commented on PR #16867:
URL: https://github.com/apache/pinot/pull/16867#issuecomment-3330595648

   I took your advice and added a retry in the segment assignment flow! Just 
only when a ZkInterruptedException is encountered (it probably doesn't make 
sense to retry in other scenarios since the 
[RetryPolicy](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/helix/IdealStateGroupCommit.java#L209-L211)
 is already attempting 20 times in "normal" cases, so keeping this scoped to 
only ZK exceptions for now). This did require a bit of refactoring but I think 
it looks better in the end.
   
   To support this, I added some unit tests for the following:
   - Successful add new segment
   - Recovering from ZkInterruptedException leads to a successful add new 
segment
   - Complete failure of adding new segment in case of multiple ZK exceptions 
or other normal exceptions like AttemptsExceeded
   
   >Clean up is fine, but what if the clean up also fails?
   
   I think there will inevitably be some scenarios that we cannot completely 
recover from, but we will at least let the caller know that an error occurred 
and then they can handle cleanup manually as needed. This also hopefully 
removes the segment from both the PropertyStore AND IdealState, which leaves 
the zookeeper state in a better place 🤞 
   
   Let me know your thoughts when you have time @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to