Re: [PR] Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths [lucene]

via GitHub Fri, 17 Apr 2026 13:38:08 -0700


mikemccand commented on code in PR #15950:
URL: https://github.com/apache/lucene/pull/15950#discussion_r3103124703



##########
lucene/benchmark-jmh/jmh-table.py:
##########
@@ -0,0 +1,771 @@
+#!/usr/bin/env python3
+"""Parse JMH JSON output from stdin, produce an interactive HTML table on stdout.
+
+Supports both JSON (-rf json) and plain text JMH output.
+With JSON input, clicking a cell shows a histogram of the raw iteration samples
+and the benchmark method source code.
+
+Usage:
+  # JSON (recommended – enables histograms + source):
+  java --module-path ... --module org.apache.lucene.benchmark.jmh ScoreDocSortBenchmark \
+    -rf json -rff results.json \
+    && python3 jmh-table.py [BenchmarkSource.java] < results.json > results.html
+
+  # Plain text (no histograms):
+  java --module-path ... --module org.apache.lucene.benchmark.jmh ScoreDocSortBenchmark \
+    | python3 jmh-table.py > results.html
+
+  The optional positional argument is the path to the Java source file containing
+  the @Benchmark methods. If provided, clicking a cell also shows the method source.
+"""
+
+import sys
+import re
+import json
+import html
+import math
+
+
+def parse_jmh_text(text):
+    """Parse plain-text JMH output."""
+    entries = []
+    for line in text.splitlines():
+        m = re.match(
+            r'\S+\.(\S+)\s+'
+            r'(\S+)\s+'
+            r'\S+\s+'
+            r'\d+\s+'
+            r'(\S+)\s+'
+            r'.\s+'
+            r'(\S+)\s+'
+            r'(\S+)',
+            line,
+        )
+        if m:
+            method, param, score, error, unit = m.groups()
+            entries.append({
+                'method': method,
+                'param': param,
+                'score': float(score),
+                'error': float(error),
+                'unit': unit,
+                'raw': [],
+            })
+    return entries, {}
+
+
+def parse_jmh_json(data):
+    """Parse JMH JSON output. Returns (entries, config_dict)."""
+    entries = []
+    config = {}
+    total_sec = 0
+    for i, result in enumerate(data):
+        bench = result['benchmark'].rsplit('.', 1)[-1]
+        params = result.get('params', {})
+        # Handle multiple params: ScoreDocSortBenchmark uses 'size' and 'distribution'
+        size = params.get('size', '')
+        dist = params.get('distribution', 'random')

Review Comment:
   Actually, noodling more on this, I rather like this idea!  (A simple, reusable HTML GUI for understanding JMH results.)  I've been wall-of-text'd by JMH results before, e.g. on [this recent cool JMH benchy](https://github.com/apache/lucene/pull/15938#issue-4220255639), where @gsmiller also sought genai help to build a more visual table summarizing the results.  If we could view wall-of-text JMH results using the same GUI as here, that would be higher-bandwidth JMH -> brain consumption.  And it can visually surface important aspects of the distribution of runs, like the bimodal compilation problem that HotSpot could be causing.
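
   To illustrate the distribution point, here is a minimal sketch of how raw iteration samples could be bucketed so that a bimodal run stands out. It assumes `samples` is a flat list of per-iteration scores (for example, the `raw` list an entry carries); `bucket_counts` is a hypothetical helper, not something in this PR.

```python
def bucket_counts(samples, bins=10):
    """Count samples per equal-width bucket between min and max.

    Hypothetical helper for illustration only.
    """
    if not samples:
        return []
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # All samples are identical: a single bucket holds everything.
        return [len(samples)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Two clusters of samples, as a bimodal run might produce.
    samples = [10.0, 10.1, 10.2, 19.8, 19.9, 20.0]
    for i, count in enumerate(bucket_counts(samples, bins=5)):
        print(f"bucket {i}: {'#' * count}")
```

   Empty buckets between two populated ones are the visual cue that a single mean and error bar would hide.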
   
   @neoremind maybe open a spinoff issue to try to generalize this UI for any JMH benchy?
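
   As a starting point for such an issue, a rough sketch of what a benchmark-agnostic parser might look like: rather than reading the `size` and `distribution` params by name, it keeps whatever `params` each result carries. It assumes the standard JMH `-rf json` layout (`benchmark`, `params`, and `primaryMetric` with `score`, `scoreError`, `scoreUnit`, `rawData`); the function names `parse_jmh_json_generic` and `param_names` are hypothetical and not part of this PR.

```python
def parse_jmh_json_generic(data):
    """Flatten JMH JSON results without assuming specific @Param names.

    Hypothetical sketch; `data` is the already-decoded JSON list.
    """
    entries = []
    for result in data:
        metric = result["primaryMetric"]
        # rawData is a list of per-fork lists of per-iteration samples.
        raw = [s for fork in metric.get("rawData", []) for s in fork]
        entries.append({
            "method": result["benchmark"].rsplit(".", 1)[-1],
            # Keep every param as-is; benchmarks without @Param get {}.
            "params": dict(result.get("params") or {}),
            "score": float(metric["score"]),
            "error": float(metric["scoreError"]),
            "unit": metric["scoreUnit"],
            "raw": raw,
        })
    return entries


def param_names(entries):
    """Return all param names seen, in first-seen order."""
    names = []
    for entry in entries:
        for name in entry["params"]:
            if name not in names:
                names.append(name)
    return names
```

   The table code could then build its row and column axes from `param_names(entries)` instead of from hardcoded keys.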



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

