[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432670#comment-17432670 ]
Bruno Roustant commented on LUCENE-10196: ----------------------------------------- Benchmark run to compare sorters implementations for different shapes of data: Each value is the time to complete a run. The values for the same column can be compared because the same data is provided as input to the various sorters. Each column has different random data. IntroSorter2 is the new modified version of IntroSorter. The benchmark runs with 20K comparable entries. Comparing IntroSorter and IntroSorter2, we mainly observe a x5 speed for random low cardinality, and an improvement for each data shape. {noformat} RANDOM IntroSorter ... 445 445 459 458 453 458 460 465 452 451 IntroSorter2 ... 394 403 401 400 401 398 400 404 396 399 TimSorter ... 1196 1203 1197 1206 1193 1195 1193 1204 1230 1207 MergeSorter ... 1462 1470 1482 1466 1463 1475 1478 1475 1466 1471 RANDOM_LOW_CARDINALITY IntroSorter ... 505 513 504 490 527 499 510 512 509 525 IntroSorter2 ... 89 84 88 88 86 90 88 89 92 88 TimSorter ... 511 513 508 508 513 512 521 511 524 516 MergeSorter ... 725 725 725 762 737 723 727 724 736 733 RANDOM_MEDIUM_CARDINALITY IntroSorter ... 463 451 452 455 448 452 451 459 458 455 IntroSorter2 ... 370 381 378 373 375 376 376 372 370 370 TimSorter ... 1192 1212 1197 1196 1201 1202 1196 1199 1196 1204 MergeSorter ... 1493 1465 1470 1480 1460 1470 1483 1464 1506 1500 ASCENDING IntroSorter ... 211 205 215 213 207 206 208 214 212 211 IntroSorter2 ... 191 188 190 193 194 191 188 187 185 188 TimSorter ... 17 18 18 18 19 19 18 17 18 19 MergeSorter ... 73 71 72 75 72 73 73 77 72 71 DESCENDING IntroSorter ... 225 253 229 220 225 231 222 217 220 223 IntroSorter2 ... 220 213 214 220 205 211 208 210 208 212 TimSorter ... 545 576 562 553 543 551 552 552 548 546 MergeSorter ... 537 537 548 538 537 536 533 530 533 545 STRICTLY_DESCENDING IntroSorter ... 215 214 221 224 218 227 213 212 212 211 IntroSorter2 ... 202 203 202 205 202 204 206 204 202 204 TimSorter ... 22 21 21 22 22 21 21 22 22 23 MergeSorter ... 534 531 533 527 531 529 526 527 528 527 ASCENDING_SEQUENCES IntroSorter ... 370 366 361 376 367 369 358 364 379 376 IntroSorter2 ... 234 235 231 236 234 245 242 239 239 236 TimSorter ... 686 679 745 673 694 685 673 719 682 685 MergeSorter ... 894 911 932 907 923 907 918 917 920 916 MOSTLY_ASCENDING IntroSorter ... 284 282 282 283 285 282 278 284 283 287 IntroSorter2 ... 254 252 249 250 255 255 249 250 252 251 TimSorter ... 233 233 230 235 232 234 234 233 228 238 MergeSorter ... 399 385 390 398 398 392 380 377 377 387 {noformat} > Improve IntroSorter with 3-ways partitioning > -------------------------------------------- > > Key: LUCENE-10196 > URL: https://issues.apache.org/jira/browse/LUCENE-10196 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Bruno Roustant > Priority: Major > > I added a SorterBenchmark to evaluate the performance of the various Sorter > implementations depending on the strategies defined in BaseSortTestCase > (random, random-low-cardinality, ascending, descending, etc). > By changing the implementation of the IntroSorter to use a 3-ways > partitioning, we can gain a significant performance improvement when sorting > low-cardinality lists, and we additional changes we can also improve the > performance for all the strategies. > Proposed changes: > - Sort small ranges with insertion sort (instead of binary sort). > - Select the quick sort pivot with medians. > - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm. > - Replace the tail recursion by a loop. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org