[ https://issues.apache.org/jira/browse/LOG4J2-2606?focusedWorklogId=246453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246453 ]

ASF GitHub Bot logged work on LOG4J2-2606:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/May/19 23:17
            Start Date: 21/May/19 23:17
    Worklog Time Spent: 10m 
      Work Description: carterkozak commented on pull request #273: 
LOG4J2-2606: Substantially improve async logging performance under heavy load
URL: https://github.com/apache/logging-log4j2/pull/273
 
 
   This change applies synchronization to AsyncLoggerDisruptor's blocking
   enqueue operations when the asynchronous logging queue is completely full.
   This new behavior may be disabled using the boolean property
   `AsyncLogger.SynchronizeEnqueueWhenQueueFull` (default true).
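   
   A rough, self-contained sketch of the idea, with illustrative names and a
   plain BlockingQueue standing in for the disruptor ring buffer (this is not
   the actual AsyncLoggerDisruptor code from the patch):
   
   ```java
   import java.util.concurrent.ArrayBlockingQueue;
   import java.util.concurrent.BlockingQueue;
   
   // Sketch only: a bounded queue stands in for the disruptor ring buffer.
   final class SynchronizedEnqueueSketch<E> {
       private final BlockingQueue<E> queue = new ArrayBlockingQueue<>(1024);
       private final Object enqueueLock = new Object();
       // Mirrors AsyncLogger.SynchronizeEnqueueWhenQueueFull (default true).
       private final boolean synchronizeWhenFull = Boolean.parseBoolean(
               System.getProperty("AsyncLogger.SynchronizeEnqueueWhenQueueFull", "true"));
   
       void publish(final E event) throws InterruptedException {
           if (queue.offer(event)) {
               return; // fast path: queue has capacity, no gating needed
           }
           if (synchronizeWhenFull) {
               // Queue is full: one producer at a time blocks on the enqueue;
               // the rest park on the monitor instead of spinning on the queue.
               synchronized (enqueueLock) {
                   queue.put(event);
               }
           } else {
               queue.put(event); // previous behavior: every producer blocks/spins
           }
       }
   }
   ```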
   
   Alternatives tested:
   * All available LMAX Disruptor `WaitStrategy` implementations.
     None of them provided a substantial difference in throughput
     or CPU utilization. In all cases my processor was stuck at or
     near 100% across all six cores.
   * Based on feedback from
     https://github.com/LMAX-Exchange/disruptor/issues/266
     I tried avoiding the blocking enqueue entirely in favor of
     `LockSupport.parkNanos` with values ranging from 1,000 to
     1,000,000. 1,000,000 nanoseconds (1 ms) provided the most
     throughput among the values tested, but is too long a pause
     for most scenarios.
   * Semaphore instead of synchronized (rough sketch after this list).
     Testing with permits equivalent to the available processor count
     yielded higher CPU utilization and lower throughput than a
     semaphore with a single permit. Given that most work occurs on
     the (single) background thread, there is little reason to allow
     multiple enqueues simultaneously.
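   
   The semaphore variant from the last bullet would look roughly like the
   following (again with illustrative names rather than the real
   AsyncLoggerDisruptor code):
   
   ```java
   import java.util.concurrent.ArrayBlockingQueue;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.Semaphore;
   
   // Sketch only: a single permit gates blocking enqueues much like the
   // synchronized block; permits equal to the core count would let several
   // producers spin on the full queue at once, which is what showed higher
   // CPU use and lower throughput in testing.
   final class SemaphoreEnqueueSketch<E> {
       private final BlockingQueue<E> queue = new ArrayBlockingQueue<>(1024);
       private final Semaphore enqueuePermit = new Semaphore(1);
   
       void publish(final E event) throws InterruptedException {
           if (queue.offer(event)) {
               return; // queue not full, no gating needed
           }
           enqueuePermit.acquire();
           try {
               queue.put(event); // blocking enqueue, one producer at a time
           } finally {
               enqueuePermit.release();
           }
       }
   }
   ```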
   
   Updated benchmarks add an "ENQUEUE_UNSYNCHRONIZED" policy that follows
   the old unsynchronized path; the now-synchronized "ENQUEUE" shows an improvement:
   
   ```
   Benchmark                                                        (queueFullPolicy)   Mode  Cnt        Score         Error  Units
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads                 ENQUEUE  thrpt    3  1146649.538 ±  103899.750  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads  ENQUEUE_UNSYNCHRONIZED  thrpt    3   422244.124 ±  785429.301  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads             SYNCHRONOUS  thrpt    3  1084417.832 ±  354547.695  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread                      ENQUEUE  thrpt    3  1250748.284 ± 1202507.746  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread       ENQUEUE_UNSYNCHRONIZED  thrpt    3  1280794.688 ±  832379.969  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread                  SYNCHRONOUS  thrpt    3  1227202.214 ± 1398451.960  ops/s
   ```
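   
   For anyone who needs the previous behavior, the property described above can
   be set before Log4j initializes; a minimal sketch (the class name is
   illustrative):
   
   ```java
   // Opt out of the synchronized enqueue path (the default is "true").
   // Equivalent to -DAsyncLogger.SynchronizeEnqueueWhenQueueFull=false on the command line.
   public final class DisableSynchronizedEnqueue {
       public static void main(final String[] args) {
           System.setProperty("AsyncLogger.SynchronizeEnqueueWhenQueueFull", "false");
           // ... configure Log4j and run the application ...
       }
   }
   ```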
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 246453)
            Time Spent: 10m
    Remaining Estimate: 0h

> Asynchronous logging when the queue is full results in heavy CPU load
> ---------------------------------------------------------------------
>
>                 Key: LOG4J2-2606
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-2606
>             Project: Log4j 2
>          Issue Type: Bug
>    Affects Versions: 2.11.2
>            Reporter: Carter Kozak
>            Assignee: Carter Kozak
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've encountered unexpected performance characteristics when the asynchronous 
> logging queue is full, resulting in the application becoming unavailable for 
> an extended period of time.
> I think EventRoute.ENQUEUE is using a spin lock (or similar construct). When 
> many threads attempt to log at a higher rate than we can drain the queue, 
> logging throughput drops significantly while CPU utilization skyrockets, 
> causing instability across the system.
> I can reproduce this in a benchmark (I will clean it up and push when I have 
> a moment) where I get the following results:
> Setup:
>  * Root logger has a random access file appender with immediateFlush disabled
>  * AsyncLoggerContextSelector enabled for completely asynchronous logging
> Results:
>  * 1 thread logging, any queue full policy: ~1,400k events/second, 150% CPU load
>  * 32 threads logging, default policy (EventRoute.ENQUEUE): 300k-400k
>    events/second, 930% CPU load
>  * 32 threads logging, custom policy using EventRoute.SYNCHRONOUS: 1,200k
>    events/second, 350% CPU load
> Using the synchronous policy results in ~1/3 the CPU load and 3-4x throughput 
> when the system is loaded to the point that the logging queue is filled.
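
For context, a custom queue-full policy like the one described above might look
roughly like the sketch below. The class name is illustrative, and the
registration property is quoted from memory (log4j2.AsyncQueueFullPolicy), so
treat it as an assumption rather than part of this issue.

```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.async.AsyncQueueFullPolicy;
import org.apache.logging.log4j.core.async.EventRoute;

// Illustrative policy: when the async queue is full, log on the calling
// thread instead of blocking on the ring buffer. Register it with something
// like -Dlog4j2.AsyncQueueFullPolicy=com.example.AlwaysSynchronousPolicy
// (property name recalled from memory, verify against the documentation).
public final class AlwaysSynchronousPolicy implements AsyncQueueFullPolicy {
    @Override
    public EventRoute getRoute(final long backgroundThreadId, final Level level) {
        return EventRoute.SYNCHRONOUS;
    }
}
```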



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
