neoremind edited a comment on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-825776633
I make `StableMSBRadixSorter` as default sorter and use `InPlaceMergeSorter` as fallback sorter. Please check the latest commit. (I did not squash commits, and save every commit as separate branch in my own forked repo for future reference). Average sort time (unit in us) comparison. (1) PR branch (2) main branch (3) PR branch, verify doc ID not in ascending order, original `MSBRadixSorter` with stable reorder. ``` ------------------------------------------------------------------------------- | bytesPerDim | isDocIdIncremental | (1) | (2) | (3) | ------------------------------------------------------------------------------- | 1 | N | 31132.7 | 1144138.9 | 981732.7 | | 1 | Y | 33049.8 | 390398.8 | | | 2 | N | 273975.1 | 1121301.9 | 941666.1 | | 2 | Y | 281060.2 | 1124444.9 | | | 3 | N | 844291.7 | 1482451.3 | 1306374.9 | | 3 | Y | 804839.8 | 1471338.1 | | | 4 | N | 1274670.8 | 1424961.6 | 1262810.3 | | 4 | Y | 1289128.8 | 1423907.1 | | | 8 | N | 1357592.3 | 1437768.5 | 1282193.9 | | 8 | Y | 1286732.0 | 1474001.1 | | | 16 | N | 1366177.6 | 1464370.4 | 1269967.4 | | 16 | Y | 1353213.1 | 1478291.3 | | | 32 | N | 1403655.4 | 1500323.9 | 1293686.0 | | 32 | Y | 1406872.4 | 1508646.1 | | ------------------------------------------------------------------------------- ``` In conclusion, PR branch runs faster than main branch regardless of bytePerDim and isDocIdIncremental, some cases 38x times faster. If we make original `MSBRadixSorter` with stable reorder, PR branch is little bit slower but acceptable. I think in real case, doc ID is usually in ascending order when index building, and that could be a noticeable performance improvement. > We could still iterate later, but for now this sounds to me like a good performance-simplicity trade-off. What do you think? I have no problem 😄 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org