neoremind commented on code in PR #15950:
URL: https://github.com/apache/lucene/pull/15950#discussion_r3101892199
##########
lucene/benchmark-jmh/jmh-table.py:
##########
@@ -0,0 +1,771 @@
+#!/usr/bin/env python3
+"""Parse JMH JSON output from stdin, produce an interactive HTML table on stdout.
+
+Supports both JSON (-rf json) and plain text JMH output.
+With JSON input, clicking a cell shows a histogram of the raw iteration samples
+and the benchmark method source code.
+
+Usage:
+    # JSON (recommended – enables histograms + source):
+    java --module-path ... --module org.apache.lucene.benchmark.jmh ScoreDocSortBenchmark \
+        -rf json -rff results.json \
+        && python3 jmh-table.py [BenchmarkSource.java] < results.json > results.html
+
+    # Plain text (no histograms):
+    java --module-path ... --module org.apache.lucene.benchmark.jmh ScoreDocSortBenchmark \
+        | python3 jmh-table.py > results.html
+
+    The optional positional argument is the path to the Java source file containing
+    the @Benchmark methods. If provided, clicking a cell also shows the method source.
+"""
+
+import sys
+import re
+import json
+import html
+import math
+
+
+def parse_jmh_text(text):
+    """Parse plain-text JMH output."""
+    entries = []
+    for line in text.splitlines():
+        m = re.match(
+            r'\S+\.(\S+)\s+'
+            r'(\S+)\s+'
+            r'\S+\s+'
+            r'\d+\s+'
+            r'(\S+)\s+'
+            r'.\s+'
+            r'(\S+)\s+'
+            r'(\S+)',
+            line,
+        )
+        if m:
+            method, param, score, error, unit = m.groups()
+            entries.append({
+                'method': method,
+                'param': param,
+                'score': float(score),
+                'error': float(error),
+                'unit': unit,
+                'raw': [],
+            })
+    return entries, {}
+
+
+def parse_jmh_json(data):
+    """Parse JMH JSON output. Returns (entries, config_dict)."""
+    entries = []
+    config = {}
+    total_sec = 0
+    for i, result in enumerate(data):
+        bench = result['benchmark'].rsplit('.', 1)[-1]
+        params = result.get('params', {})
+        # Handle multiple params: ScoreDocSortBenchmark uses 'size' and 'distribution'
+        size = params.get('size', '')
+        dist = params.get('distribution', 'random')
Review Comment:
Nice work on the benchmark and the visualization!
One thing I noticed: `jmh-table.py` currently hardcodes `size` and
`distribution` as the param dimensions, which is specific to this sort
benchmark. If the script is meant to be general-purpose, would it make sense
to auto-discover the `@Param` attribute names? With multiple params, the
layout could use columns for the highest-cardinality param and a dropdown for
the rest. Anyway, it already looks great as-is; this is just a thought for
making it more general.
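To illustrate the auto-discovery idea, here is a rough sketch rather than a
proposed patch. It assumes each JSON result carries a `params` dict keyed by
`@Param` name, as the `result.get('params', {})` call in the diff suggests.
`discover_param_layout` and the `sample` data are hypothetical names made up
for this example and are not part of the PR.

```python
from collections import defaultdict


def discover_param_layout(results):
    """Return (column_param, other_params) discovered from JMH JSON results.

    Hypothetical helper: `results` is the parsed JSON list, where each entry
    may carry a 'params' dict mapping @Param names to string values.
    """
    values = defaultdict(set)
    for result in results:
        for name, value in result.get('params', {}).items():
            values[name].add(value)
    if not values:
        # No @Param dimensions at all: nothing to lay out.
        return None, []
    # Highest-cardinality param becomes the table columns; ties are broken
    # by name so the layout is deterministic.
    ordered = sorted(values, key=lambda n: (-len(values[n]), n))
    return ordered[0], ordered[1:]


# Made-up sample data in the shape of JMH JSON results.
sample = [
    {'benchmark': 'a.B.sort', 'params': {'size': '100', 'distribution': 'random'}},
    {'benchmark': 'a.B.sort', 'params': {'size': '1000', 'distribution': 'random'}},
    {'benchmark': 'a.B.sort', 'params': {'size': '10000', 'distribution': 'sorted'}},
]
column_param, dropdown_params = discover_param_layout(sample)
print(column_param, dropdown_params)
```

With this sample, `size` has three distinct values and `distribution` has two,
so `size` would drive the columns and `distribution` would go to the dropdown.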
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]