Re: [PR] Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths [lucene]

via GitHub Sat, 18 Apr 2026 09:49:56 -0700


neoremind commented on code in PR #15950:
URL: https://github.com/apache/lucene/pull/15950#discussion_r3105429174



##########
lucene/benchmark-jmh/jmh-table.py:
##########
@@ -0,0 +1,771 @@
+#!/usr/bin/env python3
+"""Parse JMH JSON output from stdin, produce an interactive HTML table on 
stdout.
+
+Supports both JSON (-rf json) and plain text JMH output.
+With JSON input, clicking a cell shows a histogram of the raw iteration samples
+and the benchmark method source code.
+
+Usage:
+  # JSON (recommended – enables histograms + source):
+  java --module-path ... --module org.apache.lucene.benchmark.jmh 
ScoreDocSortBenchmark \
+    -rf json -rff results.json \
+    && python3 jmh-table.py [BenchmarkSource.java] < results.json > 
results.html
+
+  # Plain text (no histograms):
+  java --module-path ... --module org.apache.lucene.benchmark.jmh 
ScoreDocSortBenchmark \
+    | python3 jmh-table.py > results.html
+
+  The optional positional argument is the path to the Java source file 
containing
+  the @Benchmark methods. If provided, clicking a cell also shows the method 
source.
+"""
+
+import sys
+import re
+import json
+import html
+import math
+
+
+def parse_jmh_text(text):
+    """Parse plain-text JMH output."""
+    entries = []
+    for line in text.splitlines():
+        m = re.match(
+            r'\S+\.(\S+)\s+'
+            r'(\S+)\s+'
+            r'\S+\s+'
+            r'\d+\s+'
+            r'(\S+)\s+'
+            r'.\s+'
+            r'(\S+)\s+'
+            r'(\S+)',
+            line,
+        )
+        if m:
+            method, param, score, error, unit = m.groups()
+            entries.append({
+                'method': method,
+                'param': param,
+                'score': float(score),
+                'error': float(error),
+                'unit': unit,
+                'raw': [],
+            })
+    return entries, {}
+
+
+def parse_jmh_json(data):
+    """Parse JMH JSON output. Returns (entries, config_dict)."""
+    entries = []
+    config = {}
+    total_sec = 0
+    for i, result in enumerate(data):
+        bench = result['benchmark'].rsplit('.', 1)[-1]
+        params = result.get('params', {})
+        # Handle multiple params: ScoreDocSortBenchmark uses 'size' and 
'distribution'
+        size = params.get('size', '')
+        dist = params.get('distribution', 'random')

Review Comment:
   I've created https://github.com/apache/lucene/issues/15968 to track.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths [lucene]

Reply via email to