SOLR-15428: Integrating the jmh micro-benchmark tool into the build
I want to bring attention to this issue because there is a lot to drop in on top of it, and, a bit like JUnit tests, it's best to get things close to where you want them early, for consistency, rather than having to change a ton of tests as you adjust.

It's obvious why a micro-benchmark framework would be useful; I think a group of people could pack that bin full of reasons with ease. So while I have my current reasons around ranking, disagreement is unlikely on the 'useful' front. But for what it's worth, a couple of my motivations beyond the obvious:

My strategy on the Solr Ref branch was pretty straightforward, but it had ridiculous input requirements. It's the only viable strategy I've encountered, cost of admission IMO, but it's a bit eat-it-all-or-nothing in terms of really getting near its actual intended output. You use tests, tools, others' work, and benchmarks, and you chase down outlier performance issues. Understanding performance means understanding the code much better and understanding the system much better; add, repeat. For a very long time. Because things happen in layers and connections in a way that you keep connecting and understanding more and more, more easily, as long-tail issues are cut off, over and over and over. At some point most of what is left is less interesting to look at from a stability and low-hanging-fruit performance standpoint. Indicators start pointing to basic and sensible stuff that requires real thought and effort to make gains on versus the return.

I believe in 'don't pre-optimize'. It often makes sense for all the reasons you can look up. But that is different from *understanding* the performance. Understanding the system resource impacts. Understanding GC, etc. When you take the time to understand, silly stuff gets corrected, low-hanging fruit is spotted, and a good developer generally knows how to differentiate what is then sensible to tackle, or not, or when.

So how do you go about a less thorough and wider-spread strategy (importantly, a Solr Ref Branch map is critical for me, +1) to gain some results? Make it much easier to gauge, look at, and experiment with performance. Never mind that outlier performance is heavily correlated with silliness IME.

Micro-benchmarks are also important for comparing main and the ref branch easily and usefully. That matters in order to pull things over. When you trace down issue after issue and accumulate enough advancement, you can start making trade-offs and decisions that will not apply the same way between main and the ref branch. You can't pull for free; you have to be able to analyze and adapt.

Micro-benchmarks are also important for wider attention on maintaining gains and preventing regressions. Something has to be built in and easy. It has to be good, easy for devs, and super flexible. Although, micro-benchmarks are not easy, so let's just say much easier for devs.

I have stripped the issue down to just the very basic integration and a couple of examples. I'm way down the road on some things, but I'm pulling back here to ensure a reasonable base that can more easily accept others' input. I'll be nipping and tucking around that, but if you have an interest in micro-benchmarking, this is when it's easiest to set up conventions and base code for ease and consistency, before updating, creating, and piling a ton of benchmarks on top.
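To make that concrete, here is a rough sketch of the general shape a JMH benchmark class takes. This is a hypothetical illustration, not one of the examples attached to the issue; the class name, workload, and annotation values are placeholders, only the JMH annotations and Blackhole are the real API:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

/**
 * Hypothetical example benchmark - not from the SOLR-15428 patch.
 * Measures the average cost of splitting a small query string.
 */
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 2)
@Measurement(iterations = 5, time = 2)
@Fork(1)
@State(Scope.Benchmark)
public class ExampleBenchmark {

  private String input;

  @Setup
  public void setup() {
    // State is prepared once, outside the measured code path.
    input = "id:12345 AND text:benchmark";
  }

  @Benchmark
  public void splitQuery(Blackhole bh) {
    // Consume the result so the JIT cannot eliminate the work as dead code
    // (see the nanotrusting-nanotime article linked below).
    bh.consume(input.split("\\s+"));
  }
}

The conventions worth agreeing on early are exactly the things shown here: default modes and time units, how state and setup are organized, and always consuming results (returning them or using Blackhole) so the measured work can't be optimized away.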
If you don't know what jmh is or need a refresher: https://openjdk.java.net/projects/code-tools/jmh/

These samples are essentially documentation: https://github.com/openjdk/jmh/tree/master/jmh-samples/src/main/java/org/openjdk/jmh/samples

The main guy behind it (I believe) has a fantastic series of articles here as well: https://shipilev.net

Definitely check out https://shipilev.net/blog/2014/nanotrusting-nanotime/

The Solr JIRA is https://issues.apache.org/jira/browse/SOLR-15428 - there is a fair bit more information there.

"Refined performance models are by far the noblest and greatest achievement one could get with the benchmarking - it contributes to understanding how computers, runtimes, libraries, and user code work together." - Shipilëv

MRM

--
- Mark
http://about.me/markrmiller