ege-st commented on PR #11496: URL: https://github.com/apache/pinot/pull/11496#issuecomment-1716467737
> > My concern is that we are trying to prove that the fix is working using tests/heap dump, etc. vs the restart will just work.
>
> Let me elaborate a bit on the nature of the problem we saw in our production.
>
> We have a cluster with several thousand tables served by a handful of brokers.
>
> A really bad query that was fetching around 150MB of data from each of the 160 servers (fan out was 160) caused direct memory OOM on the broker. Note that this was a soft OOM (the broker didn't crash, unlike with a Java heap space OOM).
>
> The problem is not just with the OOM. It is the cascading impact of this OOM on the overall stability / availability of the system.

Yes, this is the core problem here. This bug causes the Broker to become a bad actor and start sabotaging queries, which is, imo, one of the worst situations the cluster could be in.

> I agree that shutting down channels will cause the other queries to fail, but that particular impact may not be worse than the potential real-life worst impact that I described above -- which, without manual intervention or other tooling, will continue to cause problems on the cluster, IMHO.

Causing queries to fail is not a huge issue, as any solution will necessarily involve some queries failing while the broker recovers. It is certainly very minor when compared to the Broker continuing to accept queries even though it cannot execute them.

> @soumitra-st @ege-st - I hope this gives some insight into where we are coming from. We can also chat offline and align if need be.
>
> cc @jasperjiaguo @vvivekiyer

I think that this is very insightful, thanks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org