[ https://issues.apache.org/jira/browse/LOG4J2-2606?focusedWorklogId=246453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246453 ]
ASF GitHub Bot logged work on LOG4J2-2606:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/May/19 23:17
            Start Date: 21/May/19 23:17
    Worklog Time Spent: 10m
      Work Description: carterkozak commented on pull request #273: LOG4J2-2606: Substantially improve async logging performance under heavy load
URL: https://github.com/apache/logging-log4j2/pull/273

This change applies synchronization to the AsyncLoggerDisruptor blocking enqueue operation when the asynchronous logging queue is completely full. The new behavior may be disabled using the boolean property `AsyncLogger.SynchronizeEnqueueWhenQueueFull` (default true). A sketch of the approach follows the benchmark results below.

Alternatives tested:
* All available LMAX Disruptor `WaitStrategy` implementations. None of them made a substantial difference in throughput or CPU utilization; in every case my processor was stuck at or near 100% across all six cores.
* Based on feedback from https://github.com/LMAX-Exchange/disruptor/issues/266, I attempted avoiding the blocking enqueue entirely in favor of `LockSupport.parkNanos` with values ranging from 1,000 to 1,000,000 nanoseconds. 1,000,000 provided the highest throughput of the values tested, but is too long a pause for most scenarios.
* A Semaphore instead of synchronized. Testing with permits equal to the available processor count yielded higher CPU utilization and lower throughput than a semaphore with a single permit. Given that most of the work occurs on the (single) background thread, there is not much reason to allow multiple enqueues simultaneously.

Updated benchmarks, adding "ENQUEUE_UNSYNCHRONIZED" which follows the old unsynchronized path; "ENQUEUE" shows the improvement:
```
Benchmark                                                      (queueFullPolicy)        Mode  Cnt        Score         Error  Units
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads  ENQUEUE                 thrpt    3  1146649.538 ±  103899.750  ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads  ENQUEUE_UNSYNCHRONIZED  thrpt    3   422244.124 ±  785429.301  ops/s
ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads  SYNCHRONOUS             thrpt    3  1084417.832 ±  354547.695  ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread       ENQUEUE                 thrpt    3  1250748.284 ± 1202507.746  ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread       ENQUEUE_UNSYNCHRONIZED  thrpt    3  1280794.688 ±  832379.969  ops/s
ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread       SYNCHRONOUS             thrpt    3  1227202.214 ± 1398451.960  ops/s
```
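For illustration, a minimal, self-contained sketch of the gating idea described above, using an `ArrayBlockingQueue` as a stand-in for the Disruptor ring buffer. The class and field names (`GatedEnqueue`, `enqueueLock`) are invented for this sketch and are not the actual AsyncLoggerDisruptor change in the pull request; only the property name `AsyncLogger.SynchronizeEnqueueWhenQueueFull` comes from the change itself.
```
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration only: mirrors the idea of serializing blocking enqueues once
// the queue is full, so producers park on a monitor instead of all spinning.
final class GatedEnqueue<E> {
    private final BlockingQueue<E> queue = new ArrayBlockingQueue<>(4096);
    // Single monitor: when the queue is full, only one producer at a time
    // waits on the full queue; the rest wait cheaply on this lock.
    private final Object enqueueLock = new Object();
    private final boolean synchronizeWhenFull = Boolean.parseBoolean(
            System.getProperty("AsyncLogger.SynchronizeEnqueueWhenQueueFull", "true"));

    void enqueue(E event) throws InterruptedException {
        // Fast path: queue has room, no lock needed.
        if (queue.offer(event)) {
            return;
        }
        if (synchronizeWhenFull) {
            // Queue is full: funnel the blocking enqueue through one lock so
            // producers do not all contend on the full queue at once.
            synchronized (enqueueLock) {
                queue.put(event);
            }
        } else {
            // Old behavior: every producer blocks on the full queue directly.
            queue.put(event);
        }
    }
}
```
The fast path stays uncontended; the lock only comes into play once the queue is already full, which is exactly the regime the concurrentLoggingThreads benchmark above exercises.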
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

            Worklog Id:     (was: 246453)
            Time Spent: 10m
    Remaining Estimate: 0h

> Asynchronous logging when the queue is full results in heavy CPU load
> ----------------------------------------------------------------------
>
>                 Key: LOG4J2-2606
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-2606
>             Project: Log4j 2
>          Issue Type: Bug
>    Affects Versions: 2.11.2
>            Reporter: Carter Kozak
>            Assignee: Carter Kozak
>            Priority: Major
>            Time Spent: 10m
>    Remaining Estimate: 0h
>
> I've encountered unexpected performance characteristics when the asynchronous logging queue is full, resulting in the application becoming unavailable for an extended period of time.
> I think EventRoute.ENQUEUE is using a spin lock (or similar construct). When many threads attempt to log at a higher rate than the queue can be drained, logging throughput drops significantly while CPU utilization skyrockets, causing instability across the system.
> I can reproduce this in a benchmark (I will clean it up and push it when I have a moment) with the following results:
> Setup:
> Root logger has a RandomAccessFile appender with immediateFlush disabled
> AsyncLoggerContextSelector enabled for completely asynchronous logging
>
> 1 thread logging, any queue full policy: ~1,400k events/second, 150% CPU
> 32 threads logging, default policy (EventRoute.ENQUEUE): 300k-400k events/second, 930% CPU
> 32 threads logging, custom policy using EventRoute.SYNCHRONOUS: 1,200k events/second, 350% CPU
>
> Using the synchronous policy results in roughly 1/3 the CPU load and 3-4x the throughput when the system is loaded to the point that the logging queue is full.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
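For context on the "custom policy using EventRoute.SYNCHRONOUS" mentioned in the issue description above, a minimal sketch of such a policy against the log4j-core AsyncQueueFullPolicy SPI might look like the following. The class name is invented for this sketch; the getRoute signature and EventRoute values are from log4j-core 2.x as I recall them and should be verified against the version in use.
```
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.async.AsyncQueueFullPolicy;
import org.apache.logging.log4j.core.async.EventRoute;

// Hypothetical policy: when the async queue is full, log on the calling
// thread instead of blocking until ring buffer space frees up.
public final class SynchronousWhenFullPolicy implements AsyncQueueFullPolicy {

    @Override
    public EventRoute getRoute(final long backgroundThreadId, final Level level) {
        // Always route around the full queue; never spin waiting for capacity.
        return EventRoute.SYNCHRONOUS;
    }
}
```
If I recall AsyncQueueFullPolicyFactory correctly, such a policy is selected with a system property along the lines of `-Dlog4j2.AsyncQueueFullPolicy=com.example.SynchronousWhenFullPolicy`; treat that property name as an assumption to double-check rather than a confirmed detail.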