ankitsultana commented on issue #12390: URL: https://github.com/apache/pinot/issues/12390#issuecomment-1936819688
The exact issue is described below (all of these are confirmed via logs):

* Server has a full GC, which leads to ZK and Helix disconnection.
* When ZK reconnects, a bunch of OFFLINE to CONSUMING messages are sent.
* We see the exception above.

**Current Theory**: When the ZK disconnection happens, the PartitionConsumer thread is still alive and holding the semaphore, so when Helix sends the OFFLINE to CONSUMING transition again for that segment, the Segment Data Manager fails to init.

I am low on time right now so can't dig deeper. Wondering if anyone can hint at some potential solutions.

I had also seen this somewhat related issue from a few years ago: https://github.com/apache/pinot/issues/7874

cc: @Jackie-Jiang
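
To make the theory above concrete, here is a minimal, self-contained sketch of the suspected contention. It is not Pinot code: `StaleConsumerSketch`, `tryInit`, and the 100 ms timeout are hypothetical stand-ins, and the only assumption carried over is that a single-permit semaphore guards consumption for a partition. A "stale" consumer thread keeps the permit through the simulated disconnect, so a second init attempt (standing in for the repeated OFFLINE to CONSUMING transition) times out until the permit is released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class StaleConsumerSketch {
    // Hypothetical per-partition guard: one permit means one consumer at a time.
    private final Semaphore partitionSemaphore = new Semaphore(1);

    /** Tries to take the permit; returns false if it is not obtained within the timeout. */
    boolean tryInit(long timeoutMs) throws InterruptedException {
        return partitionSemaphore.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void release() {
        partitionSemaphore.release();
    }

    /** Returns {firstInitSucceeded, retryWhileHeld, retryAfterRelease}. */
    static boolean[] simulate() throws InterruptedException {
        StaleConsumerSketch partition = new StaleConsumerSketch();
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch stop = new CountDownLatch(1);
        boolean[] result = new boolean[3];

        // Stands in for the consumer thread that survives the disconnect.
        Thread staleConsumer = new Thread(() -> {
            try {
                result[0] = partition.tryInit(100);
                acquired.countDown();
                stop.await(); // keeps holding the permit until told to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (result[0]) {
                    partition.release();
                }
            }
        });
        staleConsumer.start();
        acquired.await();

        // Stands in for the repeated transition: the permit is still held, so this times out.
        result[1] = partition.tryInit(100);

        // Once the old consumer stops and releases, a new init can proceed.
        stop.countDown();
        staleConsumer.join();
        result[2] = partition.tryInit(100);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = simulate();
        System.out.println("first init: " + r[0]
            + ", retry while held: " + r[1]
            + ", retry after release: " + r[2]);
    }
}
```

If the theory holds, one direction to explore would be making sure the old consumer is stopped and its permit released before the new transition tries to init. Whether that is practical in the actual code path is something I have not verified.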