[GitHub] [logging-log4j2] carterkozak commented on pull request #544: LOG4J2-3115: Remember previously computed strings in NamePatternConverter for performance ~3x

GitBox Tue, 03 Aug 2021 09:35:38 -0700


carterkozak commented on pull request #544:
URL: https://github.com/apache/logging-log4j2/pull/544#issuecomment-891993420



   I slightly restructured the benchmark to fix the issue where only the first 
class was ever used, and reused the StringBuilder and LogEvent objects to avoid 
allocation overhead on a per-invocation basis.
   
   ## All benchmarks use  1 fork, 3 x 3 seconds warmup, 3*3 seconds measurement
   
   I've run the benchmarks on an `Intel(R) Xeon(R) W-2175 CPU` with 14 cores 
and 28 threads.
   
   ### 1 thread
   
   My results using the benchmark against release-2.x (no caching):
   
   ```
   Benchmark                                                                   
(className)  Mode  Cnt   Score    Error  Units
   NamePatternConverterBenchmark.benchNamePatternConverter    
org.bogus.hokus.pokus.Clazz1  avgt    3  89.382 ±  3.264  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter    
com.bogus.hokus.pokus.Clazz2  avgt    3  91.734 ± 42.690  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter  
edu.bogus.hokus.pokus.a.Clazz3  avgt    3  98.898 ±  5.629  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter         
de.bogus.hokus.b.Clazz4  avgt    3  81.632 ±  1.608  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter               
jp.bogus.c.Clazz5  avgt    3  62.760 ±  2.325  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter                     
cn.d.Clazz6  avgt    3  43.892 ±  2.879  ns/op
   ```
   
   With the cache:
   
   ```
   Benchmark                                                                   
(className)  Mode  Cnt   Score   Error  Units
   NamePatternConverterBenchmark.benchNamePatternConverter    
org.bogus.hokus.pokus.Clazz1  avgt    3  23.171 ± 4.225  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter    
com.bogus.hokus.pokus.Clazz2  avgt    3  22.009 ± 2.083  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter  
edu.bogus.hokus.pokus.a.Clazz3  avgt    3  22.745 ± 5.220  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter         
de.bogus.hokus.b.Clazz4  avgt    3  25.033 ± 4.682  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter               
jp.bogus.c.Clazz5  avgt    3  23.140 ± 6.582  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter                     
cn.d.Clazz6  avgt    3  22.592 ± 4.511  ns/op
   ```
   
   ### 14 threads
   
   My results using the benchmark against release-2.x (no caching):
   
   ```
   Benchmark                                                                   
(className)  Mode  Cnt    Score    Error  Units
   NamePatternConverterBenchmark.benchNamePatternConverter    
org.bogus.hokus.pokus.Clazz1  avgt    3  108.224 ±  4.403  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter    
com.bogus.hokus.pokus.Clazz2  avgt    3  105.806 ±  7.577  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter  
edu.bogus.hokus.pokus.a.Clazz3  avgt    3  125.960 ±  3.303  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter         
de.bogus.hokus.b.Clazz4  avgt    3  101.220 ± 37.384  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter               
jp.bogus.c.Clazz5  avgt    3   78.575 ± 13.142  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter                     
cn.d.Clazz6  avgt    3   54.714 ±  3.688  ns/op
   ```
   
   With the cache:
   
   ```
   Benchmark                                                                   
(className)  Mode  Cnt     Score      Error  Units
   NamePatternConverterBenchmark.benchNamePatternConverter    
org.bogus.hokus.pokus.Clazz1  avgt    3  8495.062 ± 9732.081  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter    
com.bogus.hokus.pokus.Clazz2  avgt    3  8920.030 ± 7984.602  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter  
edu.bogus.hokus.pokus.a.Clazz3  avgt    3  8843.975 ± 3125.221  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter         
de.bogus.hokus.b.Clazz4  avgt    3  8879.263 ± 4066.595  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter               
jp.bogus.c.Clazz5  avgt    3  8623.134 ± 3309.457  ns/op
   NamePatternConverterBenchmark.benchNamePatternConverter                     
cn.d.Clazz6  avgt    3  8833.953 ±  616.468  ns/op
   ```
   
   ## My interpretation:
   
   Introducing a cache provides a synchronization point which didn't previously 
exist, so the worst case performance for a best-case cache (one with a single 
element that's always used due to the way benchmarks are broken up by fork) is 
substantially worse than the worst case performance of the uncached 
implementation.
   The single threaded case sees an uplift of between 20ns and 80ns average 
time, while the multithreaded case sees an 8,500ns to 8,800ns potential 
regression (based on workload, hardware, etc).
   
   I haven't benchmarked cases in which the cache is full and pointer jumps are 
required to find the matching element, nor cases in which we rotate through 
more loggers than we have cache space. I suspect an implementation using 
https://github.com/ben-manes/caffeine would work better as it supports several 
novel concurrent datastructures, and the cache size could be based on the 
length of input and output strings, but it would take a great deal of testing.
   
   Benchmarks run with:
   ```java
   package org.apache.logging.log4j.perf.jmh;
   
   import org.apache.logging.log4j.core.AbstractLogEvent;
   import org.apache.logging.log4j.core.LogEvent;
   import org.apache.logging.log4j.core.pattern.LoggerPatternConverter;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Measurement;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.OutputTimeUnit;
   import org.openjdk.jmh.annotations.Param;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.Setup;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.Threads;
   import org.openjdk.jmh.annotations.Warmup;
   
   import java.util.concurrent.TimeUnit;
   
   /**
    * Tests Log4j2 NamePatternConverter's performance.<br>
    *
    * How to run these benchmarks:<br>
    *
    * Single thread:<br>
    * <pre>java -jar log4j-perf/target/benchmarks.jar 
".*NamePatternConverterBenchmark.*" -f 1 -wi 2 -i 3 -r 3s -jvmArgs '-server 
-XX:+AggressiveOpts'</pre>
    *
    * Multiple threads (for example, 4 threads):<br>
    * <pre>java -jar log4j-perf/target/benchmarks.jar 
".*NamePatternConverterBenchmark.*" -f 1 -wi 4 -i 20 -t 4 -si true</pre>
    *
    * Usage help:<br>
    * <pre>java -jar log4j-perf/target/benchmarks.jar -help</pre>
    */
   @Fork(1)
   @Threads(14)
   @Warmup(iterations = 3, time = 3)
   @Measurement(iterations = 3, time = 3)
   public class NamePatternConverterBenchmark {
   
       @State(Scope.Benchmark)
       public static class ExecutionPlan {
           @Param({
                   "org.bogus.hokus.pokus.Clazz1",
                   "com.bogus.hokus.pokus.Clazz2",
                   "edu.bogus.hokus.pokus.a.Clazz3",
                   "de.bogus.hokus.b.Clazz4",
                   "jp.bogus.c.Clazz5",
                   "cn.d.Clazz6"
           })
           String className;
           LogEvent event;
           private final ThreadLocal<StringBuilder> destination = 
ThreadLocal.withInitial(StringBuilder::new);
   
           final LoggerPatternConverter converter = 
LoggerPatternConverter.newInstance(new String[] {"1."});
   
           @Setup
           public void setup() {
               event = new BenchmarkLogEvent(className);
           }
   
           StringBuilder destination() {
               StringBuilder result = destination.get();
               result.setLength(0);
               return result;
           }
       }
   
       @Benchmark
       @BenchmarkMode(Mode.AverageTime)
       @OutputTimeUnit(TimeUnit.NANOSECONDS)
       public void benchNamePatternConverter(ExecutionPlan plan) {
           plan.converter.format(plan.event, plan.destination());
       }
   
       private static class BenchmarkLogEvent extends AbstractLogEvent {
           private final String loggerName;
   
           BenchmarkLogEvent(String loggerName) {
               this.loggerName = loggerName;
           }
   
           @Override
           public String getLoggerName() {
               return loggerName;
           }
       }
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@logging.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [logging-log4j2] carterkozak commented on pull request #544: LOG4J2-3115: Remember previously computed strings in NamePatternConverter for performance ~3x

Reply via email to