neoremind edited a comment on pull request #91:
URL: https://github.com/apache/lucene/pull/91#issuecomment-825776633
I make `StableMSBRadixSorter` as default sorter and use `InPlaceMergeSorter`
as fallback sorter. Please check the latest commit. (I did not squash commits,
and save every commit as separate branch in my own forked repo for future
reference).
Average sort time (unit in us) comparison.
(1) PR branch
(2) main branch
(3) PR branch, verify doc ID not in ascending order, original
`MSBRadixSorter` with stable reorder.
```
-------------------------------------------------------------------------------
| bytesPerDim | isDocIdIncremental | (1) | (2) | (3)
|
-------------------------------------------------------------------------------
| 1 | N | 31132.7 | 1144138.9 |
981732.7 |
| 1 | Y | 33049.8 | 390398.8 |
|
| 2 | N | 273975.1 | 1121301.9 |
941666.1 |
| 2 | Y | 281060.2 | 1124444.9 |
|
| 3 | N | 844291.7 | 1482451.3 |
1306374.9 |
| 3 | Y | 804839.8 | 1471338.1 |
|
| 4 | N | 1274670.8 | 1424961.6 |
1262810.3 |
| 4 | Y | 1289128.8 | 1423907.1 |
|
| 8 | N | 1357592.3 | 1437768.5 |
1282193.9 |
| 8 | Y | 1286732.0 | 1474001.1 |
|
| 16 | N | 1366177.6 | 1464370.4 |
1269967.4 |
| 16 | Y | 1353213.1 | 1478291.3 |
|
| 32 | N | 1403655.4 | 1500323.9 |
1293686.0 |
| 32 | Y | 1406872.4 | 1508646.1 |
|
-------------------------------------------------------------------------------
```
In conclusion, PR branch runs faster than main branch regardless of
bytePerDim and isDocIdIncremental, some cases 38x times faster. If we make
original `MSBRadixSorter` with stable reorder, PR branch is little bit slower
but acceptable. I think in real case, doc ID is usually in ascending order when
index building, and that could be a noticeable performance improvement.
> We could still iterate later, but for now this sounds to me like a good
performance-simplicity trade-off. What do you think?
I have no problem 😄
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]