[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432670#comment-17432670
 ] 

Bruno Roustant commented on LUCENE-10196:
-----------------------------------------

Benchmark run to compare sorters implementations for different shapes of data:

Each value is the time to complete a run. The values for the same column can be 
compared because the same data is provided as input to the various sorters. 
Each column has different random data. IntroSorter2 is the new modified version 
of IntroSorter.

The benchmark runs with 20K comparable entries. Comparing IntroSorter and 
IntroSorter2, we mainly observe a x5 speed for random low cardinality, and an 
improvement for each data shape.
{noformat}
RANDOM
  IntroSorter  ...  445  445  459  458  453  458  460  465  452  451
  IntroSorter2 ...  394  403  401  400  401  398  400  404  396  399
  TimSorter    ... 1196 1203 1197 1206 1193 1195 1193 1204 1230 1207
  MergeSorter  ... 1462 1470 1482 1466 1463 1475 1478 1475 1466 1471
RANDOM_LOW_CARDINALITY
  IntroSorter  ...  505  513  504  490  527  499  510  512  509  525
  IntroSorter2 ...   89   84   88   88   86   90   88   89   92   88
  TimSorter    ...  511  513  508  508  513  512  521  511  524  516
  MergeSorter  ...  725  725  725  762  737  723  727  724  736  733
RANDOM_MEDIUM_CARDINALITY
  IntroSorter  ...  463  451  452  455  448  452  451  459  458  455
  IntroSorter2 ...  370  381  378  373  375  376  376  372  370  370
  TimSorter    ... 1192 1212 1197 1196 1201 1202 1196 1199 1196 1204
  MergeSorter  ... 1493 1465 1470 1480 1460 1470 1483 1464 1506 1500
ASCENDING
  IntroSorter  ...  211  205  215  213  207  206  208  214  212  211
  IntroSorter2 ...  191  188  190  193  194  191  188  187  185  188
  TimSorter    ...   17   18   18   18   19   19   18   17   18   19
  MergeSorter  ...   73   71   72   75   72   73   73   77   72   71
DESCENDING
  IntroSorter  ...  225  253  229  220  225  231  222  217  220  223
  IntroSorter2 ...  220  213  214  220  205  211  208  210  208  212
  TimSorter    ...  545  576  562  553  543  551  552  552  548  546
  MergeSorter  ...  537  537  548  538  537  536  533  530  533  545
STRICTLY_DESCENDING
  IntroSorter  ...  215  214  221  224  218  227  213  212  212  211
  IntroSorter2 ...  202  203  202  205  202  204  206  204  202  204
  TimSorter    ...   22   21   21   22   22   21   21   22   22   23
  MergeSorter  ...  534  531  533  527  531  529  526  527  528  527
ASCENDING_SEQUENCES
  IntroSorter  ...  370  366  361  376  367  369  358  364  379  376
  IntroSorter2 ...  234  235  231  236  234  245  242  239  239  236
  TimSorter    ...  686  679  745  673  694  685  673  719  682  685
  MergeSorter  ...  894  911  932  907  923  907  918  917  920  916
MOSTLY_ASCENDING
  IntroSorter  ...  284  282  282  283  285  282  278  284  283  287
  IntroSorter2 ...  254  252  249  250  255  255  249  250  252  251
  TimSorter    ...  233  233  230  235  232  234  234  233  228  238
  MergeSorter  ...  399  385  390  398  398  392  380  377  377  387
{noformat}

> Improve IntroSorter with 3-ways partitioning
> --------------------------------------------
>
>                 Key: LUCENE-10196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10196
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Priority: Major
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and we additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to