gf2121 opened a new pull request, #12800: URL: https://github.com/apache/lucene/pull/12800
**Description** In https://github.com/apache/lucene/pull/12114, we had great numbers for LSB radix sorter when sorting random docs in `SortingDocsEnum` . But we can not take advantage of the LSB radix sorter in `SortingPostingsEnum` because ((I guess) we need to swap an parallel offset array. This PR proposes to do the following: * Extract a generalized LSB radix sorter called `BaseLSBRadixSorter`, and change the origin `LSBRadixSorter` to `UnsignedIntLSBRadixSorter`. * Implement a custom LSB radix sorter fo `SortingPostingsEnum`. **Some Notes** * I plan to mark the `LSBRadixSorter` deprecated in 9.x and delete in 10.0. So I called the abstract class `BaseLSBRadixSorter` instead of `LSBRadixSorter` to help backport. * I run a benchmark to make sure we do not have regression for the unsigned ints sorter. I attach the benchmark code in the PR and will remove before merge. ``` Benchmark (bit) (order) (size) Mode Cnt Score Error Units LSBSorterBenchmark.newLSB 24 natural 100000 thrpt 5 2.802 ± 0.038 ops/ms LSBSorterBenchmark.newLSB 24 reverse 100000 thrpt 5 2.800 ± 0.040 ops/ms LSBSorterBenchmark.newLSB 24 random 100000 thrpt 5 2.768 ± 0.039 ops/ms LSBSorterBenchmark.newLSB 31 natural 100000 thrpt 5 2.125 ± 0.021 ops/ms LSBSorterBenchmark.newLSB 31 reverse 100000 thrpt 5 2.128 ± 0.033 ops/ms LSBSorterBenchmark.newLSB 31 random 100000 thrpt 5 2.101 ± 0.014 ops/ms LSBSorterBenchmark.oldLSB 24 natural 100000 thrpt 5 2.847 ± 0.030 ops/ms LSBSorterBenchmark.oldLSB 24 reverse 100000 thrpt 5 2.848 ± 0.013 ops/ms LSBSorterBenchmark.oldLSB 24 random 100000 thrpt 5 2.787 ± 0.129 ops/ms LSBSorterBenchmark.oldLSB 31 natural 100000 thrpt 5 2.154 ± 0.033 ops/ms LSBSorterBenchmark.oldLSB 31 reverse 100000 thrpt 5 2.150 ± 0.018 ops/ms LSBSorterBenchmark.oldLSB 31 random 100000 thrpt 5 2.121 ± 0.011 ops/ms ``` * I run a benchmark to compare tim sorter and LSB radix sorter impl in `SortingPostingsEnum`. LSB radix sorter is much faster in most cases except for the completely sorted docs, which looks fine to me. I attached the benchmark code in the PR and will remove before merge as well. ``` Benchmark (bit) (order) (size) Mode Cnt Score Error Units DocSorterBenchmark.radixSorter 24 natural 100000 thrpt 5 2.063 ± 0.088 ops/ms DocSorterBenchmark.radixSorter 24 reverse 100000 thrpt 5 2.082 ± 0.016 ops/ms DocSorterBenchmark.radixSorter 24 random 100000 thrpt 5 2.110 ± 0.014 ops/ms DocSorterBenchmark.radixSorter 24 partial 100000 thrpt 5 2.091 ± 0.026 ops/ms DocSorterBenchmark.radixSorter 31 natural 100000 thrpt 5 1.633 ± 0.014 ops/ms DocSorterBenchmark.radixSorter 31 reverse 100000 thrpt 5 1.636 ± 0.023 ops/ms DocSorterBenchmark.radixSorter 31 random 100000 thrpt 5 1.641 ± 0.009 ops/ms DocSorterBenchmark.radixSorter 31 partial 100000 thrpt 5 1.640 ± 0.010 ops/ms DocSorterBenchmark.timSorter 24 natural 100000 thrpt 5 49.726 ± 1.452 ops/ms DocSorterBenchmark.timSorter 24 reverse 100000 thrpt 5 0.724 ± 0.006 ops/ms DocSorterBenchmark.timSorter 24 random 100000 thrpt 5 0.061 ± 0.001 ops/ms DocSorterBenchmark.timSorter 24 partial 100000 thrpt 5 0.203 ± 0.023 ops/ms DocSorterBenchmark.timSorter 31 natural 100000 thrpt 5 49.475 ± 2.480 ops/ms DocSorterBenchmark.timSorter 31 reverse 100000 thrpt 5 2.236 ± 0.192 ops/ms DocSorterBenchmark.timSorter 31 random 100000 thrpt 5 0.058 ± 0.001 ops/ms DocSorterBenchmark.timSorter 31 partial 100000 thrpt 5 0.209 ± 0.012 ops/ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org