carterkozak commented on pull request #544: URL: https://github.com/apache/logging-log4j2/pull/544#issuecomment-891993420
I slightly restructured the benchmark to fix the issue where only the first class was ever used, and reused the StringBuilder and LogEvent objects to avoid allocation overhead on a per-invocation basis. ## All benchmarks use 1 fork, 3 x 3 seconds warmup, 3*3 seconds measurement I've run the benchmarks on an `Intel(R) Xeon(R) W-2175 CPU` with 14 cores and 28 threads. ### 1 thread My results using the benchmark against release-2.x (no caching): ``` Benchmark (className) Mode Cnt Score Error Units NamePatternConverterBenchmark.benchNamePatternConverter org.bogus.hokus.pokus.Clazz1 avgt 3 89.382 ± 3.264 ns/op NamePatternConverterBenchmark.benchNamePatternConverter com.bogus.hokus.pokus.Clazz2 avgt 3 91.734 ± 42.690 ns/op NamePatternConverterBenchmark.benchNamePatternConverter edu.bogus.hokus.pokus.a.Clazz3 avgt 3 98.898 ± 5.629 ns/op NamePatternConverterBenchmark.benchNamePatternConverter de.bogus.hokus.b.Clazz4 avgt 3 81.632 ± 1.608 ns/op NamePatternConverterBenchmark.benchNamePatternConverter jp.bogus.c.Clazz5 avgt 3 62.760 ± 2.325 ns/op NamePatternConverterBenchmark.benchNamePatternConverter cn.d.Clazz6 avgt 3 43.892 ± 2.879 ns/op ``` With the cache: ``` Benchmark (className) Mode Cnt Score Error Units NamePatternConverterBenchmark.benchNamePatternConverter org.bogus.hokus.pokus.Clazz1 avgt 3 23.171 ± 4.225 ns/op NamePatternConverterBenchmark.benchNamePatternConverter com.bogus.hokus.pokus.Clazz2 avgt 3 22.009 ± 2.083 ns/op NamePatternConverterBenchmark.benchNamePatternConverter edu.bogus.hokus.pokus.a.Clazz3 avgt 3 22.745 ± 5.220 ns/op NamePatternConverterBenchmark.benchNamePatternConverter de.bogus.hokus.b.Clazz4 avgt 3 25.033 ± 4.682 ns/op NamePatternConverterBenchmark.benchNamePatternConverter jp.bogus.c.Clazz5 avgt 3 23.140 ± 6.582 ns/op NamePatternConverterBenchmark.benchNamePatternConverter cn.d.Clazz6 avgt 3 22.592 ± 4.511 ns/op ``` ### 14 threads My results using the benchmark against release-2.x (no caching): ``` Benchmark (className) Mode Cnt Score Error Units NamePatternConverterBenchmark.benchNamePatternConverter org.bogus.hokus.pokus.Clazz1 avgt 3 108.224 ± 4.403 ns/op NamePatternConverterBenchmark.benchNamePatternConverter com.bogus.hokus.pokus.Clazz2 avgt 3 105.806 ± 7.577 ns/op NamePatternConverterBenchmark.benchNamePatternConverter edu.bogus.hokus.pokus.a.Clazz3 avgt 3 125.960 ± 3.303 ns/op NamePatternConverterBenchmark.benchNamePatternConverter de.bogus.hokus.b.Clazz4 avgt 3 101.220 ± 37.384 ns/op NamePatternConverterBenchmark.benchNamePatternConverter jp.bogus.c.Clazz5 avgt 3 78.575 ± 13.142 ns/op NamePatternConverterBenchmark.benchNamePatternConverter cn.d.Clazz6 avgt 3 54.714 ± 3.688 ns/op ``` With the cache: ``` Benchmark (className) Mode Cnt Score Error Units NamePatternConverterBenchmark.benchNamePatternConverter org.bogus.hokus.pokus.Clazz1 avgt 3 8495.062 ± 9732.081 ns/op NamePatternConverterBenchmark.benchNamePatternConverter com.bogus.hokus.pokus.Clazz2 avgt 3 8920.030 ± 7984.602 ns/op NamePatternConverterBenchmark.benchNamePatternConverter edu.bogus.hokus.pokus.a.Clazz3 avgt 3 8843.975 ± 3125.221 ns/op NamePatternConverterBenchmark.benchNamePatternConverter de.bogus.hokus.b.Clazz4 avgt 3 8879.263 ± 4066.595 ns/op NamePatternConverterBenchmark.benchNamePatternConverter jp.bogus.c.Clazz5 avgt 3 8623.134 ± 3309.457 ns/op NamePatternConverterBenchmark.benchNamePatternConverter cn.d.Clazz6 avgt 3 8833.953 ± 616.468 ns/op ``` ## My interpretation: Introducing a cache provides a synchronization point which didn't previously exist, so the worst case performance for a best-case cache (one with a single element that's always used due to the way benchmarks are broken up by fork) is substantially worse than the worst case performance of the uncached implementation. The single threaded case sees an uplift of between 20ns and 80ns average time, while the multithreaded case sees an 8,500ns to 8,800ns potential regression (based on workload, hardware, etc). I haven't benchmarked cases in which the cache is full and pointer jumps are required to find the matching element, nor cases in which we rotate through more loggers than we have cache space. I suspect an implementation using https://github.com/ben-manes/caffeine would work better as it supports several novel concurrent datastructures, and the cache size could be based on the length of input and output strings, but it would take a great deal of testing. Benchmarks run with: ```java package org.apache.logging.log4j.perf.jmh; import org.apache.logging.log4j.core.AbstractLogEvent; import org.apache.logging.log4j.core.LogEvent; import org.apache.logging.log4j.core.pattern.LoggerPatternConverter; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.annotations.BenchmarkMode; import org.openjdk.jmh.annotations.Fork; import org.openjdk.jmh.annotations.Measurement; import org.openjdk.jmh.annotations.Mode; import org.openjdk.jmh.annotations.OutputTimeUnit; import org.openjdk.jmh.annotations.Param; import org.openjdk.jmh.annotations.Scope; import org.openjdk.jmh.annotations.Setup; import org.openjdk.jmh.annotations.State; import org.openjdk.jmh.annotations.Threads; import org.openjdk.jmh.annotations.Warmup; import java.util.concurrent.TimeUnit; /** * Tests Log4j2 NamePatternConverter's performance.<br> * * How to run these benchmarks:<br> * * Single thread:<br> * <pre>java -jar log4j-perf/target/benchmarks.jar ".*NamePatternConverterBenchmark.*" -f 1 -wi 2 -i 3 -r 3s -jvmArgs '-server -XX:+AggressiveOpts'</pre> * * Multiple threads (for example, 4 threads):<br> * <pre>java -jar log4j-perf/target/benchmarks.jar ".*NamePatternConverterBenchmark.*" -f 1 -wi 4 -i 20 -t 4 -si true</pre> * * Usage help:<br> * <pre>java -jar log4j-perf/target/benchmarks.jar -help</pre> */ @Fork(1) @Threads(14) @Warmup(iterations = 3, time = 3) @Measurement(iterations = 3, time = 3) public class NamePatternConverterBenchmark { @State(Scope.Benchmark) public static class ExecutionPlan { @Param({ "org.bogus.hokus.pokus.Clazz1", "com.bogus.hokus.pokus.Clazz2", "edu.bogus.hokus.pokus.a.Clazz3", "de.bogus.hokus.b.Clazz4", "jp.bogus.c.Clazz5", "cn.d.Clazz6" }) String className; LogEvent event; private final ThreadLocal<StringBuilder> destination = ThreadLocal.withInitial(StringBuilder::new); final LoggerPatternConverter converter = LoggerPatternConverter.newInstance(new String[] {"1."}); @Setup public void setup() { event = new BenchmarkLogEvent(className); } StringBuilder destination() { StringBuilder result = destination.get(); result.setLength(0); return result; } } @Benchmark @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) public void benchNamePatternConverter(ExecutionPlan plan) { plan.converter.format(plan.event, plan.destination()); } private static class BenchmarkLogEvent extends AbstractLogEvent { private final String loggerName; BenchmarkLogEvent(String loggerName) { this.loggerName = loggerName; } @Override public String getLoggerName() { return loggerName; } } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@logging.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org