HUSTERGS commented on code in PR #14906: URL: https://github.com/apache/lucene/pull/14906#discussion_r2188854655
########## lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java: ########## @@ -157,11 +157,10 @@ static void filterCompetitiveHits( int newSize = 0; for (int i = 0; i < buffer.size; ++i) { - if (buffer.scores[i] >= minRequiredScore) { - buffer.docs[newSize] = buffer.docs[i]; - buffer.scores[newSize] = buffer.scores[i]; - newSize++; - } + int inc = buffer.scores[i] >= minRequiredScore ? 1 : 0; + buffer.docs[newSize] = buffer.docs[i]; + buffer.scores[newSize] = buffer.scores[i]; + newSize += inc; Review Comment: Sorry, I found this change make the branchless way lost it's magic, have to revert this one, here is the micro-benchmark result after this change, `branchlessCandidateReversed` represent the changed version: ``` Benchmark (minScoreInclusive) (size) Mode Cnt Score Error Units CompetitiveBenchmark.baseline 0 128 thrpt 5 41672.026 ± 8380.126 ops/ms CompetitiveBenchmark.baseline 0.2 128 thrpt 5 3883.436 ± 198.196 ops/ms CompetitiveBenchmark.baseline 0.4 128 thrpt 5 2180.255 ± 25.426 ops/ms CompetitiveBenchmark.baseline 0.5 128 thrpt 5 1927.560 ± 19.153 ops/ms CompetitiveBenchmark.baseline 0.8 128 thrpt 5 4220.166 ± 89.918 ops/ms CompetitiveBenchmark.branchlessCandidate 0 128 thrpt 5 39618.062 ± 5416.865 ops/ms CompetitiveBenchmark.branchlessCandidate 0.2 128 thrpt 5 10786.274 ± 653.386 ops/ms CompetitiveBenchmark.branchlessCandidate 0.4 128 thrpt 5 10623.743 ± 412.274 ops/ms CompetitiveBenchmark.branchlessCandidate 0.5 128 thrpt 5 10746.761 ± 573.725 ops/ms CompetitiveBenchmark.branchlessCandidate 0.8 128 thrpt 5 10679.752 ± 286.015 ops/ms CompetitiveBenchmark.branchlessCandidateReversed 0 128 thrpt 5 39772.970 ± 19875.397 ops/ms CompetitiveBenchmark.branchlessCandidateReversed 0.2 128 thrpt 5 5300.621 ± 60.061 ops/ms CompetitiveBenchmark.branchlessCandidateReversed 0.4 128 thrpt 5 6514.371 ± 1623.871 ops/ms CompetitiveBenchmark.branchlessCandidateReversed 0.5 128 thrpt 5 7341.370 ± 181.379 ops/ms CompetitiveBenchmark.branchlessCandidateReversed 0.8 128 thrpt 5 9105.116 ± 276.744 ops/ms CompetitiveBenchmark.vectorizedCandidate 0 128 thrpt 5 42486.653 ± 9245.167 ops/ms CompetitiveBenchmark.vectorizedCandidate 0.2 128 thrpt 5 9389.316 ± 9250.269 ops/ms CompetitiveBenchmark.vectorizedCandidate 0.4 128 thrpt 5 10561.076 ± 427.473 ops/ms CompetitiveBenchmark.vectorizedCandidate 0.5 128 thrpt 5 11163.025 ± 3068.748 ops/ms CompetitiveBenchmark.vectorizedCandidate 0.8 128 thrpt 5 10584.527 ± 757.254 ops/ms ``` I didn't dig into the assemble, but this regression make sense to me, one possible reason is that this change enhances the data dependency between ``` int inc = scores[i] >= minScoreInclusive ? 1 : 0; newSize += inc; ``` and cpu somehow failed to parallel/pre-execution this critical code path -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org