Re: [PR] Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths [lucene]

via GitHub Fri, 17 Apr 2026 09:49:42 -0700


neoremind commented on code in PR #15950:
URL: https://github.com/apache/lucene/pull/15950#discussion_r3101892199



##########
lucene/benchmark-jmh/jmh-table.py:
##########
@@ -0,0 +1,771 @@
+#!/usr/bin/env python3
+"""Parse JMH JSON output from stdin, produce an interactive HTML table on stdout.
+
+Supports both JSON (-rf json) and plain text JMH output.
+With JSON input, clicking a cell shows a histogram of the raw iteration samples
+and the benchmark method source code.
+
+Usage:
+  # JSON (recommended – enables histograms + source):
+  java --module-path ... --module org.apache.lucene.benchmark.jmh ScoreDocSortBenchmark \
+    -rf json -rff results.json \
+    && python3 jmh-table.py [BenchmarkSource.java] < results.json > results.html
+
+  # Plain text (no histograms):
+  java --module-path ... --module org.apache.lucene.benchmark.jmh ScoreDocSortBenchmark \
+    | python3 jmh-table.py > results.html
+
+  The optional positional argument is the path to the Java source file containing
+  the @Benchmark methods. If provided, clicking a cell also shows the method source.
+"""
+
+import sys
+import re
+import json
+import html
+import math
+
+
+def parse_jmh_text(text):
+    """Parse plain-text JMH output."""
+    entries = []
+    for line in text.splitlines():
+        m = re.match(
+            r'\S+\.(\S+)\s+'
+            r'(\S+)\s+'
+            r'\S+\s+'
+            r'\d+\s+'
+            r'(\S+)\s+'
+            r'.\s+'
+            r'(\S+)\s+'
+            r'(\S+)',
+            line,
+        )
+        if m:
+            method, param, score, error, unit = m.groups()
+            entries.append({
+                'method': method,
+                'param': param,
+                'score': float(score),
+                'error': float(error),
+                'unit': unit,
+                'raw': [],
+            })
+    return entries, {}
+
+
+def parse_jmh_json(data):
+    """Parse JMH JSON output. Returns (entries, config_dict)."""
+    entries = []
+    config = {}
+    total_sec = 0
+    for i, result in enumerate(data):
+        bench = result['benchmark'].rsplit('.', 1)[-1]
+        params = result.get('params', {})
+        # Handle multiple params: ScoreDocSortBenchmark uses 'size' and 'distribution'
+        size = params.get('size', '')
+        dist = params.get('distribution', 'random')

Review Comment:
   Nice work on the benchmark + visualization!
   
   One thing I noticed: `jmh-table.py` currently hardcodes `size` and
   `distribution` as the param dimensions, which is specific to this sort
   benchmark. If the script is meant to be general-purpose, would it make sense
   to auto-discover the `@Param` names instead? With multiple params, the
   layout could use the highest-cardinality param for the columns and a
   dropdown for the rest. Anyway, it already looks great as-is; this is just a
   thought for making it more general.
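
   To make the auto-discovery idea concrete, here is a minimal sketch. It
   assumes each entry in the JMH JSON results carries a `params` map, as the
   `result.get('params', {})` call in the diff suggests. The helper names
   `discover_params` and `choose_layout` are hypothetical and not part of the
   PR.

```python
from collections import defaultdict


def discover_params(results):
    """Collect every param name seen in the results and its distinct values."""
    values = defaultdict(set)
    for result in results:
        # Benchmarks without @Param fields may have no 'params' key at all.
        for name, value in result.get("params", {}).items():
            values[name].add(value)
    return values


def choose_layout(results):
    """Return (column_param, dropdown_params), chosen by cardinality."""
    values = discover_params(results)
    if not values:
        return None, []
    # Highest cardinality first; ties are broken by name so the layout is stable.
    ordered = sorted(values, key=lambda name: (-len(values[name]), name))
    return ordered[0], ordered[1:]
```

   For this benchmark, that would most likely pick `size` for the columns and
   leave `distribution` for the dropdown, without either name appearing in the
   script.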
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


