jadami10 commented on issue #10460: URL: https://github.com/apache/pinot/issues/10460#issuecomment-1482056208
> @jadami10 were you able to validate that forcecommit was the issue?

Yes. After looking through the logs of the winners, the server that was lagging was chosen as the winner for 5 partitions.

> I think one of the use-cases that forcecommit was meant to solve is the case where ingestion seems stuck. In such cases, force committing based on the first responsive replica rather than winning replica seems appropriate.

Assuming my previous comment/reading of the code above is correct, when your ingestion is stuck and you call this, you will wait at most 3.3 more seconds for a winner. In practice, it's likely minutes before you've noticed ingestion is stuck. But in the case where 1 server is significantly behind (in our case 2 hours behind), accidentally picking it has pretty big ramifications: now every server is 2 hours behind for those partitions.

There are also very few cases where the wait period is skipped (only row limit and end of partition reached). Even for time-based or size-based segment completion, the protocol still waits, and that doesn't seem to be a problem.

I'll wait to hear from @sajjad-moradi, but I would be strongly in favor of undoing this part of the force commit feature.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org