siddharthteotia commented on PR #11496:
URL: https://github.com/apache/pinot/pull/11496#issuecomment-1716388980

   > Thanks @jasperjiaguo for your comments!
   > 
   > > Could you elaborate on your concern here? I think the tests/heap 
dump/graphs show that we recover deterministically and the direct buffers are 
deallocated.
   > 
   > My concern is that we are trying to prove that the fix works using 
tests, heap dumps, etc., whereas a restart will just work. We have customers 
using Pinot, and their workloads may have some surprises. This fix certainly 
has a shorter recovery time, though.
   > 
   > Beyond the recovery time, do you have other concerns about shutting down 
the Broker? How many restarts do you see in your environment, and how many 
occurrences of direct memory OOM are there? If direct memory OOMs account for 
only a small fraction of restarts compared with those caused by other reasons, 
then the additional restarts won't be significant.
   
   My perspective is that we should not rely on operational toil (restarts, 
etc.) to recover from issues that can largely be handled in code. I think that 
is what this fix does. 
   
   Let me just say that we have had a significant number of OOMs, and that's 
why we have built features like runtime query killing to try to improve 
resiliency via code as opposed to resorting to restarts. I don't think it is 
wise to rely on restarts unless the problem is absolutely unsolvable via code.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

