Hello Mark,

I did some micro benchmarks on the max-only change, and the results look more or less like I expected on my 4-core/8-thread machine. The numbers are below.
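For reference, the two variants being compared boil down to roughly the following. This is only a sketch with illustrative field and method names, not the actual patch; the real benchmark code is in the gist linked at the end.

import java.util.concurrent.atomic.AtomicLong;

public class MaxUpdate {

    // "lock free" variant: retry with compareAndSet until our value is
    // published or another thread has already stored a larger one
    private final AtomicLong atomicMax = new AtomicLong(Long.MIN_VALUE);

    public void updateAtomicMax(long value) {
        long current;
        do {
            current = atomicMax.get();
            if (value <= current) {
                return; // existing max is already at least as large
            }
        } while (!atomicMax.compareAndSet(current, value));
    }

    // synchronized variant it would replace
    private long syncMax = Long.MIN_VALUE;

    public synchronized void updateSynchronizedMax(long value) {
        if (value > syncMax) {
            syncMax = value;
        }
    }
}

The JMH throughput numbers for the two variants: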
1 thread
Benchmark                            Mode  Samples          Score   Score error  Units
n.e.j.t.MyBenchmark.atomicMax       thrpt        5   71754699,465   4916943,615  ops/s
n.e.j.t.MyBenchmark.synchronizedMax thrpt        5   31398785,726   1071751,697  ops/s

2 threads
Benchmark                            Mode  Samples          Score   Score error  Units
n.e.j.t.MyBenchmark.atomicMax       thrpt        5  111773939,071   6478325,697  ops/s
n.e.j.t.MyBenchmark.synchronizedMax thrpt        5    8197066,038    457199,732  ops/s

4 threads
Benchmark                            Mode  Samples          Score   Score error  Units
n.e.j.t.MyBenchmark.atomicMax       thrpt        5  190326272,383  11287614,702  ops/s
n.e.j.t.MyBenchmark.synchronizedMax thrpt        5    5227629,001     35488,823  ops/s

6 threads
Benchmark                            Mode  Samples          Score   Score error  Units
n.e.j.t.MyBenchmark.atomicMax       thrpt        5  257831969,334  21506920,317  ops/s
n.e.j.t.MyBenchmark.synchronizedMax thrpt        5    5437590,131    651603,805  ops/s

8 threads
Benchmark                            Mode  Samples          Score   Score error  Units
n.e.j.t.MyBenchmark.atomicMax       thrpt        5  308313078,989   7365995,913  ops/s
n.e.j.t.MyBenchmark.synchronizedMax thrpt        5    4808905,027    657214,727  ops/s

16 threads
Benchmark                            Mode  Samples          Score   Score error  Units
n.e.j.t.MyBenchmark.atomicMax       thrpt        5  295064556,539  28027347,836  ops/s
n.e.j.t.MyBenchmark.synchronizedMax thrpt        5    4974286,841     79612,105  ops/s

I would say both variants get slower as the active concurrency goes up, but the atomic version is always faster, and when there are more threads than cores it even gets a bit better. So I think it is safe to replace only the synchronized max with the atomic one.

The further changes, like moving it into the stats store and cleaning up the mean calculation and the synchronized method, can be left for later. I would actually like to introduce a StatsStore interface where users can register their own implementations. They would only have to provide add(), getMean() and getMax(), but their smart implementation could answer whatever else they want to query (including percentiles and the like).
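To illustrate what I mean, the interface could look roughly like this. Only a sketch, nothing here is final and the wording of the comments is mine:

/**
 * Sketch of a pluggable statistics store. A user supplied implementation
 * only has to answer add(), getMean() and getMax(); anything else
 * (percentiles etc.) it can compute internally and expose through its
 * own API.
 */
public interface StatsStore {

    /** Record one sample, e.g. a borrow wait time in milliseconds. */
    void add(long value);

    /** Mean of all recorded samples. */
    double getMean();

    /** Largest recorded sample. */
    long getMax();
}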
Until then, I have attached a more minimal patch for the atomic max of the borrow (only).

Regards,
Bernd

PS: code is here: https://gist.github.com/ecki/8350a276caa444a62dce

On Tue, 23 Sep 2014 19:11:42 +0100, Mark Thomas <ma...@apache.org> wrote:

> On 23/09/2014 17:59, Phil Steitz wrote:
> > On 9/21/14 7:29 PM, Phil Steitz wrote:
> > <snip/>
> >> I was not able to demonstrate any consistent performance difference
> >> in my tests. The code in trunk was a tiny bit faster on average
> >> but some runs of the patched code were faster than some runs of the
> >> unpatched code. It might be worth just changing the max to be
> >> "lock free" and seeing if that change by itself makes any
> >> difference.
> >
> > I did that and still could not demonstrate any improvement. The
> > report in the ticket does show monitor contention, though, so I am
> > ambivalent on this one. What do others think?
>
> Typo: s/Mast/Must/
>
> I have often found that for large numbers of cheap operations that
> monitoring tools are far more overhead than the thing they are trying
> to measure. I recently spent some time tuning a new Cookie parser for
> Tomcat and the profiler results were useless at identifying the real
> bottlenecks. I wonder if that is happening here.
>
> I have no objection to the patch if the performance is about the same.
> Adding max to the StatsStore is likely to be useful elsewhere.
>
> Mark