gsmiller commented on issue #15934:
URL: https://github.com/apache/lucene/issues/15934#issuecomment-4184592670
Modified my benchmark and got some results:
```
Benchmark (numDocIds)
(numLeaves) Mode Cnt Score Error Units
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
5 thrpt 15 1266.733 ± 242.517 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
10 thrpt 15 1355.237 ± 76.162 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
20 thrpt 15 1086.332 ± 59.511 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
50 thrpt 15 657.963 ± 44.439 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
200 thrpt 15 397.090 ± 13.318 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
5 thrpt 15 98.762 ± 3.021 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
10 thrpt 15 101.797 ± 6.867 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
20 thrpt 15 103.472 ± 2.145 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
50 thrpt 15 95.354 ± 1.992 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
200 thrpt 15 83.167 ± 4.200 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
5 thrpt 15 6.078 ± 0.545 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
10 thrpt 15 6.081 ± 0.426 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
20 thrpt 15 6.112 ± 0.909 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
50 thrpt 15 5.681 ± 0.805 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
200 thrpt 15 6.027 ± 0.445 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
5 thrpt 15 0.269 ± 0.005 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
10 thrpt 15 0.269 ± 0.008 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
20 thrpt 15 0.268 ± 0.006 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
50 thrpt 15 0.240 ± 0.022 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
200 thrpt 15 0.250 ± 0.019 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
5 thrpt 15 1002.754 ± 98.899 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
10 thrpt 15 1254.892 ± 72.662 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
20 thrpt 15 1280.451 ± 52.612 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
50 thrpt 15 1150.066 ± 58.922 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
200 thrpt 15 833.865 ± 33.402 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
5 thrpt 15 84.483 ± 3.502 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
10 thrpt 15 83.488 ± 2.792 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
20 thrpt 15 78.479 ± 21.774 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
50 thrpt 15 74.364 ± 12.695 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
200 thrpt 15 74.425 ± 13.535 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
5 thrpt 15 5.637 ± 0.468 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
10 thrpt 15 5.560 ± 0.406 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
20 thrpt 15 5.637 ± 0.425 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
50 thrpt 15 6.077 ± 0.808 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
200 thrpt 15 5.950 ± 0.548 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
5 thrpt 15 0.257 ± 0.010 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
10 thrpt 15 0.256 ± 0.005 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
20 thrpt 15 0.255 ± 0.006 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
50 thrpt 15 0.235 ± 0.034 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
200 thrpt 15 0.229 ± 0.032 ops/ms
```
Used AI to Summarize:
| numDocIds | numLeaves | Linear (ops/ms) | BinarySearch (ops/ms) |
Difference |
|-----------|-----------|------------------|-----------------------|------------|
| 100 | 5 | 1,003 | 1,267 | BS +26%
✅ |
| 100 | 10 | 1,255 | 1,355 | ~tie
|
| 100 | 20 | 1,280 | 1,086 | Linear
+18% ❌ |
| 100 | 50 | 1,150 | 658 | Linear
+75% ❌ |
| 100 | 200 | 834 | 397 | Linear
+110% ❌ |
| 1,000 | 5 | 84 | 99 | BS +17%
✅ |
| 1,000 | 10 | 83 | 102 | BS +22%
✅ |
| 1,000 | 20 | 78 | 103 | BS +32%
✅ |
| 1,000 | 50 | 74 | 95 | BS +28%
✅ |
| 1,000 | 200 | 74 | 83 | BS +12%
✅ |
| 10,000 | 5 | 5.6 | 6.1 | ~tie
|
| 10,000 | 10 | 5.6 | 6.1 | ~tie
|
| 10,000 | 20 | 5.6 | 6.1 | ~tie
|
| 10,000 | 50 | 6.1 | 5.7 | ~tie
|
| 10,000 | 200 | 6.0 | 6.0 | ~tie
|
| 100,000 | 5 | 0.257 | 0.269 | ~tie
|
| 100,000 | 10 | 0.256 | 0.269 | ~tie
|
| 100,000 | 20 | 0.255 | 0.268 | ~tie
|
| 100,000 | 50 | 0.235 | 0.240 | ~tie
|
| 100,000 | 200 | 0.229 | 0.250 | ~tie
|
Key observations:
- At 10K+ doc IDs, the sort step dominates and the partition strategy makes
no measurable difference.
- At 1K doc IDs, binary search is modestly faster across all leaf counts
(12-32%).
- At 100 doc IDs, the result depends on the ratio of docs to leaves. When
leaves greatly outnumber docs (e.g., 200 leaves for 100 docs), the binary
search approach degrades because it iterates all leaf
boundaries regardless. The linear scan only touches the 100 doc IDs.
- In absolute terms, even the largest differences are sub-microsecond (e.g.,
100 docs: ~1μs vs ~2.5μs at 200 leaves).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]