lhotari commented on PR #25522: URL: https://github.com/apache/pulsar/pull/25522#issuecomment-4302468787
> Based on the scenarios I'm currently using, when the broker and consumer networks are throttled, message dispatch can slow down, causing back pressure on the broker side and leading to high direct memory usage. If, at that point, the Pulsar ledger handler attempts to read entries from BK, it may encounter a transient direct memory OOM due to direct memory exhaustion.

@geniusjoe The back pressure for BK reads is handled by configuring `managedLedgerMaxReadsInFlightSizeInMB`. Please see https://github.com/apache/pulsar/blob/master/pip/pip-442.md for more details about broker memory management and the related backpressure configuration.

The memory limits aren't accurate, since a Netty ByteBuf can hold on to a larger underlying buffer from which it was split off. When tuning, the limits must be set low enough that reaching them never leads to Netty direct memory running out. That might be impossible to achieve in all cases. One of the reasons is what you touched upon in #25274: memory allocations, whether pooled or "direct" (handled by malloc/the OS), cause fragmentation over time, so less memory is available in certain workloads.

> I personally think the current exception handling logic is quite good, as it helps avoid an OOM that could be triggered by back pressure when the broker and consumer networks are throttled. I'm not entirely sure whether Netty 4.2's `adaptiveByteBuf` is fully compatible with the logic of Netty 4.1's `pooledByteBuf` in this part. When upgrading to a new major version of Netty, if the new `adaptiveByteBuf` does not provide significant improvements in performance or other aspects, it may be worth evaluating whether to adopt the new `adaptiveByteBuf` or keep using the old `pooledByteBuf` mode.
As mentioned in my comment https://github.com/apache/pulsar/issues/25021#issuecomment-3584774110, AutoMQ's blog post ["Challenges of Custom Cache Implementation in Netty-Based Streaming Systems: Memory Fragmentation and OOM Issues"](https://www.automq.com/blog/netty-based-streaming-systems-memory-fragmentation-and-oom-issues) describes some of the problems with fragmentation in Netty's `PooledByteBufAllocator`. The AutoMQ article doesn't evaluate `AdaptiveByteBufAllocator`. I'd assume that it handles caching use cases better than `PooledByteBufAllocator`, but it would be great to get some feedback from real tests.
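To make the accounting gap described above concrete, here is a minimal, hypothetical Java sketch. `InFlightLimiter`, `simulate`, and the chunk and entry sizes are illustrative assumptions; this is not Pulsar's implementation of `managedLedgerMaxReadsInFlightSizeInMB` (PIP-442 describes the real mechanism). It uses plain `java.nio.ByteBuffer` slices instead of Netty `ByteBuf`s, but Netty's derived buffers such as `slice()` and `retainedSlice()` similarly share the parent buffer's memory.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Pulsar code): a byte-based in-flight read limit
 * that accounts each entry by its own size, while each entry is a slice that
 * keeps a larger direct buffer alive.
 */
public class InFlightAccountingSketch {

    /** Hypothetical limiter, loosely inspired by a max-reads-in-flight-size setting. */
    static final class InFlightLimiter {
        private final long maxBytes;
        private long acquiredBytes;

        InFlightLimiter(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        /** Returns false when the read would exceed the limit, i.e. the caller should back off. */
        boolean tryAcquire(long bytes) {
            if (acquiredBytes + bytes > maxBytes) {
                return false;
            }
            acquiredBytes += bytes;
            return true;
        }

        long acquiredBytes() {
            return acquiredBytes;
        }
    }

    /**
     * Reads entries until the limiter pushes back. Returns
     * {entry count, bytes accounted by the limiter, bytes of direct memory kept reachable}.
     * The last value is computed from the assumed chunk size, not measured.
     */
    static long[] simulate(long limitBytes, int chunkSize, int entrySize) {
        InFlightLimiter limiter = new InFlightLimiter(limitBytes);
        List<ByteBuffer> entries = new ArrayList<>();
        while (limiter.tryAcquire(entrySize)) {
            // Each read allocates a chunk but keeps only a small slice of it.
            ByteBuffer chunk = ByteBuffer.allocateDirect(chunkSize);
            chunk.limit(entrySize);
            // The slice shares the chunk's memory, so the whole chunk stays
            // reachable for as long as the slice is.
            entries.add(chunk.slice());
        }
        long accounted = limiter.acquiredBytes();
        // Each slice reports only entrySize bytes, but pins a full chunk.
        long pinned = (long) entries.size() * chunkSize;
        return new long[] {entries.size(), accounted, pinned};
    }

    public static void main(String[] args) {
        long[] r = simulate(16 * 1024, 64 * 1024, 1024);
        System.out.println("entries=" + r[0] + " accounted=" + r[1] + " pinned=" + r[2]);
    }
}
```

With a 16 KiB limit, 1 KiB entries, and 64 KiB chunks, the limiter counts 16 KiB while 1 MiB of direct memory stays reachable. That is why the configured limit has to leave generous headroom below the available direct memory.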
