AlexanderKM commented on PR #16867: URL: https://github.com/apache/pinot/pull/16867#issuecomment-3330595648
I took your advice and added a retry in the segment assignment flow! Just only when a ZkInterruptedException is encountered (it probably doesn't make sense to retry in other scenarios since the [RetryPolicy](https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/helix/IdealStateGroupCommit.java#L209-L211) is already attempting 20 times in "normal" cases, so keeping this scoped to only ZK exceptions for now). This did require a bit of refactoring but I think it looks better in the end. To support this, I added some unit tests for the following: - Successful add new segment - Recovering from ZkInterruptedException leads to a successful add new segment - Complete failure of adding new segment in case of multiple ZK exceptions or other normal exceptions like AttemptsExceeded >Clean up is fine, but what if the clean up also fails? I think there will inevitably be some scenarios that we cannot completely recover from, but we will at least let the caller know that an error occurred and then they can handle cleanup manually as needed. This also hopefully removes the segment from both the PropertyStore AND IdealState, which leaves the zookeeper state in a better place 🤞 Let me know your thoughts when you have time @Jackie-Jiang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
