gsmiller commented on issue #15934:
URL: https://github.com/apache/lucene/issues/15934#issuecomment-4184592670

   Modified my benchmark and got some results:
   ```
   Benchmark                                                 (numDocIds)  
(numLeaves)   Mode  Cnt     Score     Error   Units
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition          100        
    5  thrpt   15  1266.733 ± 242.517  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition          100        
   10  thrpt   15  1355.237 ±  76.162  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition          100        
   20  thrpt   15  1086.332 ±  59.511  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition          100        
   50  thrpt   15   657.963 ±  44.439  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition          100        
  200  thrpt   15   397.090 ±  13.318  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition         1000        
    5  thrpt   15    98.762 ±   3.021  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition         1000        
   10  thrpt   15   101.797 ±   6.867  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition         1000        
   20  thrpt   15   103.472 ±   2.145  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition         1000        
   50  thrpt   15    95.354 ±   1.992  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition         1000        
  200  thrpt   15    83.167 ±   4.200  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition        10000        
    5  thrpt   15     6.078 ±   0.545  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition        10000        
   10  thrpt   15     6.081 ±   0.426  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition        10000        
   20  thrpt   15     6.112 ±   0.909  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition        10000        
   50  thrpt   15     5.681 ±   0.805  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition        10000        
  200  thrpt   15     6.027 ±   0.445  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition       100000        
    5  thrpt   15     0.269 ±   0.005  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition       100000        
   10  thrpt   15     0.269 ±   0.008  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition       100000        
   20  thrpt   15     0.268 ±   0.006  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition       100000        
   50  thrpt   15     0.240 ±   0.022  ops/ms
   PartitionByLeafBenchmark.arraysSortBinarySearchPartition       100000        
  200  thrpt   15     0.250 ±   0.019  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                           100        
    5  thrpt   15  1002.754 ±  98.899  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                           100        
   10  thrpt   15  1254.892 ±  72.662  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                           100        
   20  thrpt   15  1280.451 ±  52.612  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                           100        
   50  thrpt   15  1150.066 ±  58.922  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                           100        
  200  thrpt   15   833.865 ±  33.402  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                          1000        
    5  thrpt   15    84.483 ±   3.502  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                          1000        
   10  thrpt   15    83.488 ±   2.792  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                          1000        
   20  thrpt   15    78.479 ±  21.774  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                          1000        
   50  thrpt   15    74.364 ±  12.695  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                          1000        
  200  thrpt   15    74.425 ±  13.535  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                         10000        
    5  thrpt   15     5.637 ±   0.468  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                         10000        
   10  thrpt   15     5.560 ±   0.406  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                         10000        
   20  thrpt   15     5.637 ±   0.425  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                         10000        
   50  thrpt   15     6.077 ±   0.808  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                         10000        
  200  thrpt   15     5.950 ±   0.548  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                        100000        
    5  thrpt   15     0.257 ±   0.010  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                        100000        
   10  thrpt   15     0.256 ±   0.005  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                        100000        
   20  thrpt   15     0.255 ±   0.006  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                        100000        
   50  thrpt   15     0.235 ±   0.034  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                        100000        
  200  thrpt   15     0.229 ±   0.032  ops/ms
   ```
   
   Used AI to Summarize:
   | numDocIds | numLeaves | Linear (ops/ms) | BinarySearch (ops/ms) | 
Difference |
   
|-----------|-----------|------------------|-----------------------|------------|
   | 100       | 5         | 1,003            | 1,267                 | BS +26% 
✅ |
   | 100       | 10        | 1,255            | 1,355                 | ~tie    
   |
   | 100       | 20        | 1,280            | 1,086                 | Linear 
+18% ❌ |
   | 100       | 50        | 1,150            | 658                   | Linear 
+75% ❌ |
   | 100       | 200       | 834              | 397                   | Linear 
+110% ❌ |
   | 1,000     | 5         | 84               | 99                    | BS +17% 
✅ |
   | 1,000     | 10        | 83               | 102                   | BS +22% 
✅ |
   | 1,000     | 20        | 78               | 103                   | BS +32% 
✅ |
   | 1,000     | 50        | 74               | 95                    | BS +28% 
✅ |
   | 1,000     | 200       | 74               | 83                    | BS +12% 
✅ |
   | 10,000    | 5         | 5.6              | 6.1                   | ~tie    
   |
   | 10,000    | 10        | 5.6              | 6.1                   | ~tie    
   |
   | 10,000    | 20        | 5.6              | 6.1                   | ~tie    
   |
   | 10,000    | 50        | 6.1              | 5.7                   | ~tie    
   |
   | 10,000    | 200       | 6.0              | 6.0                   | ~tie    
   |
   | 100,000   | 5         | 0.257            | 0.269                 | ~tie    
   |
   | 100,000   | 10        | 0.256            | 0.269                 | ~tie    
   |
   | 100,000   | 20        | 0.255            | 0.268                 | ~tie    
   |
   | 100,000   | 50        | 0.235            | 0.240                 | ~tie    
   |
   | 100,000   | 200       | 0.229            | 0.250                 | ~tie    
   |
   
   Key observations:
   - At 10K+ doc IDs, the sort step dominates and the partition strategy makes 
no measurable difference.
   - At 1K doc IDs, binary search is modestly faster across all leaf counts 
(12-32%).
   - At 100 doc IDs, the result depends on the ratio of docs to leaves. When 
leaves greatly outnumber docs (e.g., 200 leaves for 100 docs), the binary 
search approach degrades because it iterates all leaf 
   boundaries regardless. The linear scan only touches the 100 doc IDs.
   - In absolute terms, even the largest differences are sub-microsecond (e.g., 
100 docs: ~1μs vs ~2.5μs at 200 leaves).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to