[PR] Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths [lucene]

via GitHub Fri, 10 Apr 2026 08:45:35 -0700


mikemccand opened a new pull request, #15950:
URL: https://github.com/apache/lucene/pull/15950


   [Spinoff from @zihanx's recent PR (#15803) adding 
`ReaderUtil.partitionByLeaf` helper to collate any `ScoreDoc[]` into per-leaf 
sorted groups.]
   
   TL;DR: see [this cool benchmark results 
UI](http://githubsearch.mikemccandless.com/jmh-sort-scoredoc-benchy.html) 
testing various algorithms to sort `ScoreDoc[]` (smaller is better, the numbers 
are microseconds per sort call).
   
   The hits from a Lucene query (`ScoreDoc[]`) come out sorted by something 
important to the user (e.g. relevance), but in order to retrieve stored fields 
or doc values, you instead need to partition by leaf and sort by `docid` within 
each leaf, retrieve your values via Lucene's usual iterators, and somehow then 
invert that step so you can stick/associate the retrieved values with original 
`ScoreDoc[]` (sorted for the user).
   
   We currently use `Arrays.sort(int[])` after extracting just the `docid` from 
each hit, but that loses association with the original `ScoreDoc[]`.   #15905 
is about NOT losing that association by also keeping track of the `int ordinal` 
position (in the original array).  And then @gsmiller made the cool JMH bench 
to find a faster partition step (after sorting) -- #15938.
   
   This all made me curious about the absolute fastest way to sort these hits.  
So, I worked with Claude Code (and maybe Gemini for some prompts too) to create 
this JMH benchmark (`ScoreDocSortBenchmark.java`) plus simple HTML UI to 
understand the results.
   
   It tests 12 different approaches across array sizes from 10 to 10,000:
   *   The usual JDK options (`Arrays.sort` with lambdas, static comparators, 
and `parallelSort`).
   *   Lucene's internal sorters (`IntroSorter`, `TimSorter`, 
`InPlaceMergeSorter`, and `LSBRadixSorter`).
   *   A few primitive extraction techniques (packing `docid` and the original 
array `index` into `int[]` or `long[]` for cache-friendly primitive sorting).
   *   A manual 2-pass 8-bit radix sort.
   
   The benchmark does upfront validation during the `@Setup` phase to ensure 
every sort is actually sorting correctly and not losing/duplicating any 
`ScoreDoc` instances, all without polluting the actual JMH measurement time.
   
   Claude and I also added a companion Python script (`jmh-table.py`) to 
convert the raw JMH JSON output into an interactive HTML heatmap.
   
   I tried to preserve all local commits, and tell the ai to make a local 
commit after each prompt+code iteration.
   
   Here is a screenshot of the output:
   
   <img width="1797" height="1161" alt="sort-benchy" 
src="https://github.com/user-attachments/assets/2b212dac-bbf8-4603-8a54-8b3a6f9cf5d3";
 />
   
   Here's a [hosted version that should work for 
you](http://githubsearch.mikemccandless.com/jmh-sort-scoredoc-benchy.html) 
(click column headers to sort all rows!).
   
   **NOTES/Learnings**
   * **AI Collaboration:** This code was written mostly by an AI (Gemini CLI) 
with me (a human) prompting, reviewing, and iterating. 
   * Measurements for very small array sizes (like size=10) include JMH's 
`Level.Invocation` overhead because we have to do a shallow array copy before 
every invocation. It inflates the absolute numbers a bit for tiny arrays, but 
the overhead is consistent across all algorithms.
   * Each cell is greener if it was more competitive vs the other algos, and 
includes a sparkle histogram of latencies.
   * It is disturbing how much the latency varies, and is sometimes strongly 
bimodal (likely hotspot structural noise e.g. mis-compilation)  even when it's 
just one thread in a fresh JVM on an idle box
   * Click on column header to sort all rows by it (Excel-like).  Click on a 
cell to see the source code for that algo, and under the table you'll see 
histogram of latencies of all runs for that cell.
   * It only tests random arrays.  Ideally I would test on real hits popping 
out of a useful search engine.  When you test on random data you draw random 
conclusions, sigh.
   * Claude invented those two primitive solutions with no specific prompting 
from me, which is super cool.  They are nearly fastest (so is the radix sort!) 
across the board except `parallelSort` on large (10K) arrays, but that's using 
multiple CPU cores.  This, despite that they must make first pass to encode to 
`long[]` and ending pass to get back to `ScoreDoc[]`.
   * It's surprising how much slower / faster the various algorithms are -- 
`TimSorter` (is this still used for Python's `listobject` sort?) was 
surprisingly slow
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths [lucene]

Reply via email to