[ 
https://issues.apache.org/jira/browse/LOG4J2-2606?focusedWorklogId=246563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246563
 ]

ASF GitHub Bot logged work on LOG4J2-2606:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/May/19 03:43
            Start Date: 22/May/19 03:43
    Worklog Time Spent: 10m 
      Work Description: carterkozak commented on issue #273: LOG4J2-2606: 
Substantially improve async logging performance under heavy load
URL: https://github.com/apache/logging-log4j2/pull/273#issuecomment-494640721
 
 
   > Can we experiment a bit before moving ahead?
   @carterkozak You reported this happened in your production environment. Can 
I ask how you've dealt with that? Is your organization waiting for a quick 
Log4j fix, or do you have some other interim solution?
   
   Absolutely, I'd much rather get this done correctly than quickly. For the 
time being we've implemented a custom `AsyncQueueFullPolicy` which uses 
`EventRoute.SYNCHRONOUS` instead of `ENQUEUE`. Event order isn't terribly 
problematic in our case because our events use a custom JSON format, 
and flow through an automated pipeline where they're indexed and presented in 
order based on timestamps.
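   
   For reference, a minimal sketch of that interim policy (assuming the `AsyncQueueFullPolicy`/`EventRoute` API in `org.apache.logging.log4j.core.async`; the class and package names are illustrative, and registration is via the `log4j2.AsyncQueueFullPolicy` system property):
   
   ```java
   package com.example.logging; // illustrative package
   
   import org.apache.logging.log4j.Level;
   import org.apache.logging.log4j.core.async.AsyncQueueFullPolicy;
   import org.apache.logging.log4j.core.async.EventRoute;
   
   /**
    * Interim workaround: when the async queue is full, log on the calling thread
    * instead of retrying the enqueue until space frees up in the ring buffer.
    * Registered with -Dlog4j2.AsyncQueueFullPolicy=com.example.logging.SynchronousWhenFullPolicy
    */
   public class SynchronousWhenFullPolicy implements AsyncQueueFullPolicy {
   
       @Override
       public EventRoute getRoute(final long backgroundThreadId, final Level level) {
           // The queue is already full, so fall back to synchronous logging rather
           // than waiting for capacity; event ordering is reconciled downstream.
           return EventRoute.SYNCHRONOUS;
       }
   }
   ```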
   
   > I'm also surprised that all the WaitStrategies were reported to show the 
same behaviour when the ringbuffer is full and under load. The proposed change 
(synchronizing on a mutex object) seems to do the same as the 
BlockingWaitStrategy provided by the Disruptor. So I would expect at least the 
BlockingWaitStrategy to show some improvement...
   
   The BlockingWaitStrategy in Disruptor master does appear much closer to what
I've written than the version in 3.4.2; however, if I backport that version of
the file to this branch and run benchmarks (the values reported under
queueFullPolicy=ENQUEUE_UNSYNCHRONIZED), the Java process CPU utilization
exceeds 1000% (across 6 cores + hyperthreading) and throughput is lower than
with the timeout strategy. (Disclaimer: benchmarks were run on a laptop, where
CPU throttling and other processes add noise.)
   
   ```
   Benchmark                                                            (queueFullPolicy)   Mode  Cnt        Score         Error  Units
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads                  ENQUEUE  thrpt    3  1269003.278 ±  858705.609  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads   ENQUEUE_UNSYNCHRONIZED  thrpt    3   296842.480 ± 1060494.701  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.concurrentLoggingThreads              SYNCHRONOUS  thrpt    3  1095597.447 ±  767282.544  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread                       ENQUEUE  thrpt    3  1231449.118 ± 1874760.323  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread        ENQUEUE_UNSYNCHRONIZED  thrpt    3  1329153.582 ± 2409041.282  ops/s
   ConcurrentAsyncLoggerToFileBenchmark.singleLoggingThread                    SYNCHRONOUS  thrpt    3  1336944.414 ±  636546.820  ops/s
   ```
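   
   To make the CPU difference concrete, here's a rough sketch (not the actual Log4j2 or Disruptor code; `EnqueueSketch` and its type parameters are placeholders) contrasting a spinning enqueue with the mutex-guarded approach this change proposes, where blocked producers park on a lock instead of all retrying at once:
   
   ```java
   import com.lmax.disruptor.EventTranslatorOneArg;
   import com.lmax.disruptor.RingBuffer;
   
   // Rough sketch only: illustrates why many producers spinning on a full ring
   // buffer burn CPU, and how serializing them on a mutex avoids that.
   final class EnqueueSketch<E, A> {
   
       private final Object queueFullMutex = new Object();
   
       /** Spin until a slot frees up: every blocked producer retries in a tight loop. */
       void spinningEnqueue(RingBuffer<E> buffer, EventTranslatorOneArg<E, A> translator, A arg) {
           while (!buffer.tryPublishEvent(translator, arg)) {
               Thread.yield(); // with 32 logging threads this keeps every core busy
           }
       }
   
       /** Serialize blocked producers: only one thread at a time waits for capacity. */
       void synchronizedEnqueue(RingBuffer<E> buffer, EventTranslatorOneArg<E, A> translator, A arg) {
           if (buffer.tryPublishEvent(translator, arg)) {
               return; // fast path: there was still room in the ring buffer
           }
           synchronized (queueFullMutex) {
               buffer.publishEvent(translator, arg); // waits until the consumer frees a slot
           }
       }
   }
   ```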
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 246563)
    Time Spent: 1h  (was: 50m)

> Asynchronous logging when the queue is full results in heavy CPU load
> ----------------------------------------------------------------------
>
>                 Key: LOG4J2-2606
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-2606
>             Project: Log4j 2
>          Issue Type: Bug
>    Affects Versions: 2.11.2
>            Reporter: Carter Kozak
>            Assignee: Carter Kozak
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> I've encountered unexpected performance characteristics when the asynchronous 
> logging queue is full, resulting in the application becoming unavailable for 
> an extended period of time.
> I think EventRoute.ENQUEUE is using a spin lock (or similar construct). When 
> many threads attempt to log at a higher rate than we can drain the queue, 
> logging throughput drops significantly while CPU utilization skyrockets, 
> causing instability across the system.
> I can reproduce this in a benchmark (I will clean it up and push when I have 
> a moment) where I get the following results:
> Setup:
>  - Root logger has a RandomAccessFile appender with immediateFlush disabled
>  - AsyncLoggerContextSelector enabled for completely asynchronous logging
> Results:
>  - 1 thread logging, any queue full policy: ~1,400k events/second, 150% CPU
>  - 32 threads logging, default policy (EventRoute.ENQUEUE): 300k-400k 
>    events/second, 930% CPU load
>  - 32 threads logging, custom policy using EventRoute.SYNCHRONOUS: 1,200k 
>    events/second, 350% CPU load
> Using the synchronous policy results in ~1/3 the CPU load and 3-4x the 
> throughput when the system is loaded to the point that the logging queue is 
> filled.
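
A rough sketch of how the setup above can be wired up (the property names are the standard Log4j2 ones; the policy class on the second property is the illustrative one from the comment above):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public final class AsyncReproSetup {
    public static void main(String[] args) {
        // Must be set before the first LogManager call (equivalent to -D flags).
        System.setProperty("Log4jContextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector"); // all-async loggers
        System.setProperty("log4j2.AsyncQueueFullPolicy",
                "com.example.logging.SynchronousWhenFullPolicy"); // custom EventRoute.SYNCHRONOUS policy

        Logger logger = LogManager.getLogger(AsyncReproSetup.class);
        logger.info("fully asynchronous logging enabled");
    }
}
```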



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
