siddharthteotia commented on PR #11496:
URL: https://github.com/apache/pinot/pull/11496#issuecomment-1716388980
> Thanks @jasperjiaguo for your comments!
>
> > Could you elaborate on your concern here? I think the tests/heap dump/graphs show that we recover deterministically and the direct buffers are deallocated.
>
> My concern is that we are trying to prove that the fix works using tests, heap dumps, etc., whereas a restart will just work. We have customers using Pinot, and their workloads may hold some surprises. This fix certainly has a shorter recovery time, though.
>
> Beyond the recovery time, do you have other concerns about shutting down the Broker? How many restarts do you see in your environment, and how many direct memory OOMs occur? If direct memory OOMs are an insignificant fraction of the restarts that happen for other reasons, then the additional restarts won't be significant.

My perspective is that we should not rely on operational toil (restarts, etc.) to recover from issues that can largely be handled in code, and I think that is what this fix does. Let me just say that we have had a significant number of OOMs, and that's why we have built features like runtime query killing: to improve resiliency in code rather than resorting to restarts. I don't think it is wise to rely on restarts unless the problem is truly unsolvable in code.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org