sajjad-moradi opened a new issue, #15897: URL: https://github.com/apache/pinot/issues/15897
If a Pinot Server encounters multiple consumption errors, it instructs the Controller to mark its replica as OFFLINE in the Ideal State (IS). Currently, the RealtimeSegmentValidationManager (RSVM) periodic job attempts to create a new consuming segment [only if all replicas are in the OFFLINE state](https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java#L1586-L1587) in the IS.

While this is a useful automation, we have observed many cases where some, but not all, replicas are marked OFFLINE due to transient stream issues. In such cases, all queries are routed to the remaining healthy replicas, which is not ideal. It would be beneficial if the RSVM job could automatically mitigate these scenarios.

One proposed solution is to issue a force commit when this condition is detected. The force commit should apply only to the affected partition, and only if a sufficient number of events have been consumed, e.g., at least half of the desired numRows specified in the segment ZK metadata.
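
To make the proposed condition concrete, here is a minimal sketch of the decision logic in Java. It is not Pinot code: the class `PartialOfflineForceCommitPolicy`, the `Action` enum, and the `decide` method are hypothetical names used only for illustration. It assumes the caller already has the per-instance states for one consuming segment from the IS, the number of rows consumed so far, and the desired numRows from the segment ZK metadata; how those values are obtained is left open.

```java
import java.util.Map;

// Hypothetical helper illustrating the proposal; not part of the Pinot code base.
final class PartialOfflineForceCommitPolicy {

  // Possible outcomes for a single consuming segment (names are illustrative).
  enum Action { NONE, FORCE_COMMIT, CREATE_NEW_CONSUMING_SEGMENT }

  static final String OFFLINE = "OFFLINE";
  static final String CONSUMING = "CONSUMING";

  private PartialOfflineForceCommitPolicy() {
  }

  /**
   * @param instanceStates instance name -> segment state in the Ideal State
   * @param rowsConsumed   rows consumed so far (assumed to be supplied by the caller)
   * @param desiredNumRows desired numRows from the segment ZK metadata
   */
  static Action decide(Map<String, String> instanceStates, long rowsConsumed, long desiredNumRows) {
    if (instanceStates.isEmpty()) {
      return Action.NONE;
    }
    long offline = instanceStates.values().stream().filter(OFFLINE::equals).count();
    if (offline == 0) {
      // Every replica is healthy; nothing to repair.
      return Action.NONE;
    }
    if (offline == instanceStates.size()) {
      // All replicas OFFLINE: the case the issue says RSVM already handles.
      return Action.CREATE_NEW_CONSUMING_SEGMENT;
    }
    // Some, but not all, replicas are OFFLINE: the case this issue targets.
    boolean hasConsumingReplica = instanceStates.containsValue(CONSUMING);
    // "At least half" of desiredNumRows, rounded up so odd targets are not under-counted.
    boolean enoughRows = desiredNumRows > 0 && rowsConsumed >= (desiredNumRows + 1) / 2;
    return (hasConsumingReplica && enoughRows) ? Action.FORCE_COMMIT : Action.NONE;
  }
}
```

For example, with one replica OFFLINE and one CONSUMING, `decide` returns `FORCE_COMMIT` once 50 of a desired 100 rows have been consumed, and `NONE` below that. Because the check takes the state map of a single segment, any resulting force commit stays scoped to the affected partition.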