lhotari commented on PR #25522:
URL: https://github.com/apache/pulsar/pull/25522#issuecomment-4302468787

   > In the scenarios I'm currently working with, when broker and consumer 
networks are throttled, message dispatch can slow down, causing back pressure 
on the broker side and leading to high direct memory usage. If the Pulsar 
ledger handler attempts to read entries from BK at that point, it may hit a 
transient direct memory OOM.
   
   @geniusjoe Back pressure for BK reads is handled by configuring 
`managedLedgerMaxReadsInFlightSizeInMB`. Please see 
https://github.com/apache/pulsar/blob/master/pip/pip-442.md for more details 
about broker memory management and the related back pressure configuration. The 
memory limits aren't exact, since a Netty ByteBuf can hold on to a larger 
underlying buffer from which it was split off.
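
To illustrate both points, here is a minimal Java sketch. `ReadBudget` is a hypothetical class, not Pulsar's actual limiter: a read reserves its estimated size against a fixed byte budget before it is issued and releases it when done, so reads beyond the budget have to wait instead of allocating more direct memory. The `main` method also uses a plain JDK `ByteBuffer` slice as an analogy (an assumption, not Netty code) for a Netty derived buffer: the accounted size is the slice's 100 bytes, while the whole 64 KiB allocation stays reachable as long as the slice is.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-flight read budget; illustrates the idea, not Pulsar's API. */
public class ReadBudget {
    private final long maxBytes;
    private final AtomicLong inFlight = new AtomicLong();

    public ReadBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Reserves bytes for a read; false means the caller must wait (back pressure). */
    public boolean tryAcquire(long estimatedBytes) {
        while (true) {
            long current = inFlight.get();
            if (current + estimatedBytes > maxBytes) {
                return false; // in this sketch a read larger than maxBytes is never admitted
            }
            if (inFlight.compareAndSet(current, current + estimatedBytes)) {
                return true;
            }
        }
    }

    /** Returns the reservation once the entries have been dispatched and released. */
    public void release(long estimatedBytes) {
        inFlight.addAndGet(-estimatedBytes);
    }

    public long inFlightBytes() {
        return inFlight.get();
    }

    public static void main(String[] args) {
        ReadBudget budget = new ReadBudget(1024);
        System.out.println(budget.tryAcquire(600)); // true
        System.out.println(budget.tryAcquire(600)); // false: 1200 would exceed 1024
        budget.release(600);
        System.out.println(budget.tryAcquire(600)); // true

        // Analogy for a derived buffer: a small view into a larger allocation.
        ByteBuffer chunk = ByteBuffer.allocateDirect(64 * 1024);
        chunk.limit(100);
        ByteBuffer entry = chunk.slice(); // 100-byte view sharing chunk's memory
        // Accounting by entry.capacity() charges 100 bytes, but the 64 KiB
        // allocation cannot be freed while entry is still referenced.
        System.out.println(entry.capacity() + " bytes accounted, "
                + chunk.capacity() + " bytes retained");
    }
}
```

The gap between the accounted and the retained size is why a byte limit like this is only an approximation of real direct memory usage.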
   
   In tuning, the limits need to be set low enough that Netty direct memory never 
runs out. That might not be achievable in all cases. One of the reasons is what 
you touched upon in #25274: memory allocations, whether pooled or "direct" 
(handled by malloc/the OS), fragment memory over time, leaving less usable 
memory available in certain workloads.
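
A toy Java model of why fragmentation reduces usable memory. This is a deliberately simplified, hypothetical allocator (fixed-size chunks split into equal slots), not how Netty's `PooledByteBufAllocator` actually manages chunks and pages: once each chunk still holds a single long-lived allocation, such as a cached entry, most of the memory is free, yet no chunk is free for a large allocation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy allocator (not Netty's): fixed-size chunks, each split into equal slots. */
public class FragmentationDemo {
    static final int CHUNKS = 4;
    static final int SLOTS_PER_CHUNK = 4;
    static final int SLOT_BYTES = 1024;

    // used[c][s] is true while slot s of chunk c is allocated
    final boolean[][] used = new boolean[CHUNKS][SLOTS_PER_CHUNK];

    /** Allocates one small slot; returns {chunk, slot}, or null when everything is used. */
    int[] allocateSmall() {
        for (int c = 0; c < CHUNKS; c++) {
            for (int s = 0; s < SLOTS_PER_CHUNK; s++) {
                if (!used[c][s]) {
                    used[c][s] = true;
                    return new int[] {c, s};
                }
            }
        }
        return null;
    }

    void free(int[] handle) {
        used[handle[0]][handle[1]] = false;
    }

    /** A large allocation in this model needs one completely empty chunk. */
    boolean canAllocateWholeChunk() {
        for (boolean[] chunk : used) {
            boolean empty = true;
            for (boolean slot : chunk) {
                empty &= !slot;
            }
            if (empty) {
                return true;
            }
        }
        return false;
    }

    long freeBytes() {
        long free = 0;
        for (boolean[] chunk : used) {
            for (boolean slot : chunk) {
                if (!slot) {
                    free += SLOT_BYTES;
                }
            }
        }
        return free;
    }

    public static void main(String[] args) {
        FragmentationDemo arena = new FragmentationDemo();
        List<int[]> handles = new ArrayList<>();
        int[] h;
        while ((h = arena.allocateSmall()) != null) {
            handles.add(h);
        }
        // Keep slot 0 of every chunk alive (e.g. a cached entry), free the rest.
        for (int[] handle : handles) {
            if (handle[1] != 0) {
                arena.free(handle);
            }
        }
        System.out.println("free bytes: " + arena.freeBytes()); // 12288 of 16384
        System.out.println("whole chunk available: " + arena.canAllocateWholeChunk()); // false
    }
}
```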
   
   > I personally think the current exception handling logic is quite good, as 
it helps avoid OOM that could be triggered by back pressure when brokers and 
consumer networks are throttled. I’m not entirely sure whether Netty 4.2’s 
`adaptiveByteBuf` is fully compatible with the logic of Netty 4.1’s 
`pooledByteBuf` in this part. When upgrading to a major version of Netty, if 
the new `adaptiveByteBuf` does not provide significant improvements in 
performance or other aspects, it may be worth evaluating whether to adopt the 
new `adaptiveByteBuf` or continue reusing the old `pooledByteBuf` mode.
   
   As mentioned in my comment 
https://github.com/apache/pulsar/issues/25021#issuecomment-3584774110, AutoMQ's 
blog post ["Challenges of Custom Cache Implementation in Netty-Based Streaming Systems: Memory Fragmentation and OOM Issues"](https://www.automq.com/blog/netty-based-streaming-systems-memory-fragmentation-and-oom-issues) 
describes some of the problems with fragmentation in Netty's 
PooledByteBufAllocator. The AutoMQ article doesn't evaluate 
AdaptiveByteBufAllocator. I'd expect it to handle caching use cases better than 
PooledByteBufAllocator, but it would be great to get some feedback from real 
tests.
   
   

