neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-823949164
@jpountz Per your advice, I have updated the code.

In terms of performance, I refined `TestBKDDisableSortDocId` so that it can be re-run as a benchmark, and ran it with the following settings: doc num = 2,000,000, dim number = 1, bytesPerDim = [1,2,3,4,8,16,32], run times = 10 (after 5 warm-up runs).

The results on the PR branch are shown below.

```
---------------------------------------------------
| bytesPerDim | isDocIdIncremental | avg time(us) |
---------------------------------------------------
| 1           | N                  | 1127688.5    |
| 1           | Y                  | 56464.7      |
| 2           | N                  | 1124137.8    |
| 2           | Y                  | 339150.2     |
| 3           | N                  | 1485020.4    |
| 3           | Y                  | 878251.1     |
| 4           | N                  | 1436003.9    |
| 4           | Y                  | 1376974.1    |
| 8           | N                  | 1444971.3    |
| 8           | Y                  | 1365877.5    |
| 16          | N                  | 1500235.5    |
| 16          | Y                  | 1385235.8    |
| 32          | N                  | 1516514.0    |
| 32          | Y                  | 1415364.9    |
---------------------------------------------------
```

I also reset to the main branch and ran the same test case. The results are shown below.

```
---------------------------------------------------
| bytesPerDim | isDocIdIncremental | avg time(us) |
---------------------------------------------------
| 1           | N                  | 1144138.9    |
| 1           | Y                  | 390398.8     |
| 2           | N                  | 1121301.9    |
| 2           | Y                  | 1124444.9    |
| 3           | N                  | 1482451.3    |
| 3           | Y                  | 1471338.1    |
| 4           | N                  | 1424961.6    |
| 4           | Y                  | 1423907.1    |
| 8           | N                  | 1437768.5    |
| 8           | Y                  | 1474001.1    |
| 16          | N                  | 1464370.4    |
| 16          | Y                  | 1478291.3    |
| 32          | N                  | 1500323.9    |
| 32          | Y                  | 1508646.1    |
---------------------------------------------------
```

I made a graph to make the comparison easier to see.

If DocIds are increasing, the PR branch outperforms main in all scenarios, with the largest gains at 1 to 3 bytes per dimension.

If DocIds are not in order, we expect the performance to be the same as on the main branch. It is almost the same, but the PR introduces a small overhead: it scans the data beforehand to check whether it is in order, so the PR branch is slightly slower (around 1%).

I also made a flame graph. The right-most column is the time spent checking the order, which is very small, below 3% of total CPU consumption.
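
For readers who want a concrete picture of the pre-scan described above, here is a minimal sketch of a single-pass order check. It is not the code from the PR: the class name `DocIdOrderCheck` and the method `isNonDecreasing` are hypothetical, and treating repeated doc IDs as "in order" is an assumption made for this illustration.

```java
// Hypothetical illustration only; not taken from the Lucene code base or this PR.
class DocIdOrderCheck {

  /**
   * Returns true if the doc IDs never decrease from one position to the next.
   * Assumption: equal neighbouring doc IDs count as "in order".
   * Cost: one linear pass over the array, no allocation.
   */
  static boolean isNonDecreasing(int[] docIds) {
    for (int i = 1; i < docIds.length; i++) {
      if (docIds[i] < docIds[i - 1]) {
        return false; // first out-of-order pair found, stop scanning
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isNonDecreasing(new int[] {0, 1, 1, 5})); // prints "true"
    System.out.println(isNonDecreasing(new int[] {3, 1, 2}));    // prints "false"
  }
}
```

A check of this shape costs one comparison per document when the input is already ordered and exits early otherwise, which would be consistent with the small overhead reported for the unordered case.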