gf2121 opened a new pull request, #12800:
URL: https://github.com/apache/lucene/pull/12800

   **Description**
   
   In https://github.com/apache/lucene/pull/12114, we had great numbers for LSB 
radix sorter when sorting random docs in `SortingDocsEnum` . But we can not 
take advantage of the LSB radix sorter in `SortingPostingsEnum` because ((I 
guess) we need to swap an parallel offset array. This PR proposes to do the 
following: 
   
   * Extract a generalized LSB radix sorter called `BaseLSBRadixSorter`, and 
change the origin `LSBRadixSorter` to `UnsignedIntLSBRadixSorter`.
   
   * Implement a custom LSB radix sorter fo `SortingPostingsEnum`.
   
   **Some Notes**
   
   * I plan to mark the `LSBRadixSorter` deprecated in 9.x and delete in 10.0. 
So I called the abstract class `BaseLSBRadixSorter` instead of `LSBRadixSorter` 
to help backport.
   
   * I run a benchmark to make sure we do not have regression for the unsigned 
ints sorter. I attach the benchmark code in the PR and will remove before merge.
   ```
   Benchmark                  (bit)  (order)  (size)   Mode  Cnt  Score   Error 
  Units
   LSBSorterBenchmark.newLSB     24  natural  100000  thrpt    5  2.802 ± 0.038 
 ops/ms
   LSBSorterBenchmark.newLSB     24  reverse  100000  thrpt    5  2.800 ± 0.040 
 ops/ms
   LSBSorterBenchmark.newLSB     24   random  100000  thrpt    5  2.768 ± 0.039 
 ops/ms
   LSBSorterBenchmark.newLSB     31  natural  100000  thrpt    5  2.125 ± 0.021 
 ops/ms
   LSBSorterBenchmark.newLSB     31  reverse  100000  thrpt    5  2.128 ± 0.033 
 ops/ms
   LSBSorterBenchmark.newLSB     31   random  100000  thrpt    5  2.101 ± 0.014 
 ops/ms
   LSBSorterBenchmark.oldLSB     24  natural  100000  thrpt    5  2.847 ± 0.030 
 ops/ms
   LSBSorterBenchmark.oldLSB     24  reverse  100000  thrpt    5  2.848 ± 0.013 
 ops/ms
   LSBSorterBenchmark.oldLSB     24   random  100000  thrpt    5  2.787 ± 0.129 
 ops/ms
   LSBSorterBenchmark.oldLSB     31  natural  100000  thrpt    5  2.154 ± 0.033 
 ops/ms
   LSBSorterBenchmark.oldLSB     31  reverse  100000  thrpt    5  2.150 ± 0.018 
 ops/ms
   LSBSorterBenchmark.oldLSB     31   random  100000  thrpt    5  2.121 ± 0.011 
 ops/ms
   ```
   
   * I run a benchmark to compare tim sorter and LSB radix sorter impl in 
`SortingPostingsEnum`. LSB radix sorter is much faster in most cases except for 
the completely sorted docs, which looks fine to me. I attached the benchmark 
code in the PR and will remove before merge as well.
   
   ```
   Benchmark                       (bit)  (order)  (size)   Mode  Cnt   Score   
 Error   Units
   DocSorterBenchmark.radixSorter     24  natural  100000  thrpt    5   2.063 ± 
 0.088  ops/ms
   DocSorterBenchmark.radixSorter     24  reverse  100000  thrpt    5   2.082 ± 
 0.016  ops/ms
   DocSorterBenchmark.radixSorter     24   random  100000  thrpt    5   2.110 ± 
 0.014  ops/ms
   DocSorterBenchmark.radixSorter     24  partial  100000  thrpt    5   2.091 ± 
 0.026  ops/ms
   DocSorterBenchmark.radixSorter     31  natural  100000  thrpt    5   1.633 ± 
 0.014  ops/ms
   DocSorterBenchmark.radixSorter     31  reverse  100000  thrpt    5   1.636 ± 
 0.023  ops/ms
   DocSorterBenchmark.radixSorter     31   random  100000  thrpt    5   1.641 ± 
 0.009  ops/ms
   DocSorterBenchmark.radixSorter     31  partial  100000  thrpt    5   1.640 ± 
 0.010  ops/ms
   DocSorterBenchmark.timSorter       24  natural  100000  thrpt    5  49.726 ± 
 1.452  ops/ms
   DocSorterBenchmark.timSorter       24  reverse  100000  thrpt    5   0.724 ± 
 0.006  ops/ms
   DocSorterBenchmark.timSorter       24   random  100000  thrpt    5   0.061 ± 
 0.001  ops/ms
   DocSorterBenchmark.timSorter       24  partial  100000  thrpt    5   0.203 ± 
 0.023  ops/ms
   DocSorterBenchmark.timSorter       31  natural  100000  thrpt    5  49.475 ± 
 2.480  ops/ms
   DocSorterBenchmark.timSorter       31  reverse  100000  thrpt    5   2.236 ± 
 0.192  ops/ms
   DocSorterBenchmark.timSorter       31   random  100000  thrpt    5   0.058 ± 
 0.001  ops/ms
   DocSorterBenchmark.timSorter       31  partial  100000  thrpt    5   0.209 ± 
 0.012  ops/ms
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to