sajjad-moradi opened a new pull request, #15173: URL: https://github.com/apache/pinot/pull/15173
During segment commit end, the Controller updates ZK in multiple steps (it does not use a ZK transaction). In one step, it updates the segment ZK metadata of committing segments (sets segment.status to DONE), and in the next step, it changes the consuming segment in IdealState to ONLINE. If the Controller fails between these two steps for any reason (crash, restart, stream connection failure, ...), IdealState and SegmentZKMetadata will be out of sync.

The `RealtimeSegmentValidationManager` job detects this inconsistency and tries to fix it by marking the segment as ONLINE in IdealState and creating a new CONSUMING segment. However, if this segment is scheduled for purge (or any other 1:1 minion task) and the purge job successfully uploads the segment, the status in SegmentZKMetadata will be updated to UPLOADED. This prevents `RealtimeSegmentValidationManager` from fixing the issue.

This PR fixes the issue by detecting the problematic scenario (status in SegmentZKMetadata is UPLOADED, but IdealState is CONSUMING) and making sure that the affected segment is not scheduled for minion tasks.

Note: the MergeRollup task generator needs to be updated once this issue (https://github.com/apache/pinot/issues/15128) is fixed.
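The check described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR and does not use Pinot's real classes: `CommitEndInconsistencySketch`, `Status`, and `isOutOfSync` are hypothetical names, and IdealState is modeled as a plain map from segment name to per-instance state. Treating DONE as well as UPLOADED as "already committed" is an assumption of this sketch. The description names the UPLOADED/CONSUMING combination explicitly, while DONE is the value written in the first commit-end step.

```java
import java.util.Map;

// Hypothetical illustration only; none of these types are Pinot classes.
class CommitEndInconsistencySketch {

  // Simplified stand-in for the segment.status values mentioned in the description.
  enum Status { IN_PROGRESS, DONE, UPLOADED }

  /**
   * Returns true when the segment's ZK metadata says it is committed
   * (DONE or UPLOADED, an assumption of this sketch) while at least one
   * replica in the modeled IdealState is still CONSUMING.
   */
  static boolean isOutOfSync(String segmentName, Status metadataStatus,
      Map<String, Map<String, String>> idealState) {
    if (metadataStatus == Status.IN_PROGRESS) {
      // Metadata and IdealState agree that the segment is still consuming.
      return false;
    }
    Map<String, String> instanceStates = idealState.get(segmentName);
    return instanceStates != null && instanceStates.containsValue("CONSUMING");
  }

  public static void main(String[] args) {
    // Modeled IdealState: segment name -> (instance -> state).
    Map<String, Map<String, String>> idealState = Map.of(
        "tbl__0__0__t0", Map.of("server_1", "ONLINE"),
        "tbl__0__1__t1", Map.of("server_1", "CONSUMING"));

    // A task generator following this idea would skip any segment for which
    // isOutOfSync(...) is true and leave it to the validation job to repair.
    System.out.println(isOutOfSync("tbl__0__0__t0", Status.DONE, idealState));
    System.out.println(isOutOfSync("tbl__0__1__t1", Status.DONE, idealState));
  }
}
```

Skipping such segments at task-generation time leaves the metadata status untouched, so the validation job can still recognize and repair the half-finished commit.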