ege-st commented on PR #11496:
URL: https://github.com/apache/pinot/pull/11496#issuecomment-1716467737

   > > My concern is that we are trying to prove the fix works using tests, heap dumps, etc., whereas a restart will just work.
   > 
   > Let me elaborate a bit on the nature of problem we saw in our production.
   > 
   > We have a cluster with several thousand tables served by a handful of brokers.
   > 
   > A really bad query that fetched around 150MB of data from each of 160 servers (a fan-out of 160) caused a direct memory OOM on the broker. Note that this was a soft OOM: unlike a Java heap space OOM, it did not crash the broker.
   > 
   > The problem is not just the OOM. It is the cascading impact of this OOM on the overall stability and availability of the system.
   > 
   
   Yes, this is the core problem here. IMO, this bug causes the Broker to become a bad actor and start sabotaging queries, which is one of the worst situations the cluster could be in.
   
   > I agree that shutting down channels will cause other queries to fail, but that impact may not be worse than the real-life worst case I described above, which, without manual intervention or other tooling, will continue to cause problems on the cluster IMHO.
   
   Causing queries to fail is not a huge issue, as any solution will necessarily involve some queries failing while the broker recovers. It is certainly very minor compared to the Broker continuing to accept queries even though it cannot execute them.
   
   > @soumitra-st @ege-st - I hope this gives some insight into where we are coming from. We can also chat offline and align if need be.
   > 
   > cc @jasperjiaguo @vvivekiyer
   
   I think that this is very insightful, thanks.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
