HUSTERGS commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3024748199
Sorry for the late reply. I've dug a little into this issue, and there are two problems.

The first is that **docs NOT collected in baseline ARE collected under this patch**. This is caused by the initial `double`-to-`float` cast, so I changed `minRequiredScore` from `float` to `double`.

The second is that **docs collected in baseline are NOT collected under this patch**. It turns out the original code let some docs whose score is lower than `minCompetitiveScore` be collected: their exact `double` upper bound is below `minCompetitiveScore`, but it rounds up to `minCompetitiveScore` after the cast to `float`. This patch safely prunes them away.

Here is an example that reproduces both problems:

```java
package org.apache.lucene;

import org.apache.lucene.util.MathUtil;

public class Run {
  public static void main(String[] args) {
    {
      // Problem 1: doc collected in patch but not in baseline
      System.out.println("problem that doc collected in patch but not in baseline");
      float minCompetitiveScore = 3.5382755f;
      double score = 2.201078414916992d;
      double maxRemainingScore = 1.337196946144104d;
      // false, meaning this doc cannot be collected in baseline
      System.out.println(
          ((float) MathUtil.sumUpperBound(score + maxRemainingScore, 2)) >= minCompetitiveScore);
      {
        // Threshold computed in float (initial version of the patch)
        float minRequiredScore = (float) (minCompetitiveScore - maxRemainingScore);
        while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, 2)
            > minCompetitiveScore) {
          minRequiredScore = Math.nextDown(minRequiredScore);
        }
        // score=2.201078414916992d, minRequiredScore=2.2010784f
        // true, meaning this doc will be collected in the patch
        System.out.println(score >= minRequiredScore);
      }
      {
        // Threshold computed in double (current version of the patch)
        double minRequiredScore = (minCompetitiveScore - maxRemainingScore);
        while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, 2)
            > minCompetitiveScore) {
          minRequiredScore = Math.nextDown(minRequiredScore);
        }
        // false, score=2.201078414916992d, minRequiredScore=2.2010785341262817d
        System.out.println(score >= minRequiredScore);
      }
    }
    System.out.println();
    {
      // Problem 2: doc collected in baseline but not in patch
      System.out.println("problem that doc collected in baseline but not in patch");
      float minCompetitiveScore = 7.638806f;
      double score = 7.638805627822876d;
      double maxRemainingScore = 0.0d;
      int numScorers = 33;
      double minRequiredScore = (minCompetitiveScore - maxRemainingScore);
      while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, numScorers)
          > minCompetitiveScore) {
        minRequiredScore = Math.nextDown(minRequiredScore);
      }
      // false, meaning this doc cannot be collected by the current patch
      System.out.println(score >= minRequiredScore);
      // true, meaning this doc can be collected in baseline:
      // MathUtil.sumUpperBound(score + maxRemainingScore, numScorers)=7.638805627822984d
      // (float) MathUtil.sumUpperBound(score + maxRemainingScore, numScorers)=7.638806f == minCompetitiveScore
      //
      // The original double (before the cast to float) is actually smaller than
      // minCompetitiveScore, which means we can safely prune this doc.
      System.out.println(
          ((float) MathUtil.sumUpperBound(score + maxRemainingScore, numScorers))
              >= minCompetitiveScore);
    }
  }
}
```

I ran another quick, **one**-iteration, full-task luceneutil run to verify the hit counts. It still complains about the hit counts, but all the diffs go in one direction (patch less than baseline), which I think is the expected behavior.

I also ran luceneutil on `wikimediumall` with `searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50`; here are the results after 20 iterations:

```
Task          QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff           p-value
OrHighRare          120.94  (2.7%)                   122.37  (1.9%)  1.2% ( -3% -  5%)    0.109
OrHighMed            90.74  (3.6%)                    92.40  (2.8%)  1.8% ( -4% -  8%)    0.073
AndHighMed           69.64  (2.8%)                    71.32  (2.1%)  2.4% ( -2% -  7%)    0.002
AndHighHigh          28.49  (2.4%)                    29.29  (2.1%)  2.8% ( -1% -  7%)    0.000
OrHighHigh           26.83  (2.5%)                    27.60  (1.9%)  2.9% ( -1% -  7%)    0.000
```

The `while` loop that calls `Math.nextDown` (which now operates on a `double` rather than a `float`) doesn't seem to add much overhead.
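For anyone who wants to repeat that overhead check, here is a minimal standalone sketch of the `double`-based loop with an iteration counter. Everything in it is an assumption for illustration: the class `MinRequiredScoreSketch`, the `Result` holder, and especially `upperBound`, a hypothetical stand-in for `MathUtil.sumUpperBound` that inflates the sum by an assumed bound of `2 * (numValues - 1) * 2^-52`. The real Lucene method may differ, so the thresholds printed here are not guaranteed to match the patch.

```java
/** Sketch (not Lucene code): compute minRequiredScore in double and count nextDown steps. */
final class MinRequiredScoreSketch {

  /** Hypothetical holder: the threshold found and how many nextDown steps it took. */
  static final class Result {
    final double minRequiredScore;
    final int iterations;

    Result(double minRequiredScore, int iterations) {
      this.minRequiredScore = minRequiredScore;
      this.iterations = iterations;
    }
  }

  /**
   * HYPOTHETICAL stand-in for MathUtil.sumUpperBound (an assumption, not Lucene's source):
   * inflates a sum of numValues doubles by an assumed relative error bound.
   */
  static double upperBound(double sum, int numValues) {
    if (numValues <= 2) {
      return sum; // a + b == b + a in IEEE arithmetic, so no error margin is assumed
    }
    double relativeError = (numValues - 1) * Math.scalb(1.0, -52);
    return (1.0 + 2 * relativeError) * sum;
  }

  /** Steps down until the float-rounded upper bound no longer exceeds minCompetitiveScore. */
  static Result minRequiredScore(
      float minCompetitiveScore, double maxRemainingScore, int numScorers) {
    // Stay in double: per the analysis above, the initial cast to float let extra docs through.
    double minRequired = minCompetitiveScore - maxRemainingScore;
    int iterations = 0;
    while ((float) upperBound(minRequired + maxRemainingScore, numScorers)
        > minCompetitiveScore) {
      minRequired = Math.nextDown(minRequired);
      iterations++;
    }
    return new Result(minRequired, iterations);
  }

  public static void main(String[] args) {
    // Inputs borrowed from the two reproduction cases above.
    Result first = minRequiredScore(3.5382755f, 1.337196946144104d, 2);
    Result second = minRequiredScore(7.638806f, 0.0d, 33);
    System.out.println(
        "minRequiredScore=" + first.minRequiredScore + ", iterations=" + first.iterations);
    System.out.println(
        "minRequiredScore=" + second.minRequiredScore + ", iterations=" + second.iterations);
  }
}
```

Counting iterations inside the loop, rather than timing it, isolates how often `Math.nextDown` actually runs from the rest of the scoring work.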
I locally added some simple prints of the loop's iteration count, and the counts are almost entirely zero.
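To see the root cause of the first problem in isolation, here is a tiny standalone Java example. It is not Lucene code; the class name, method names, and values are made up for illustration. Casting a `double` threshold to `float` picks the nearest `float`, which can lie *below* the `double`, so any score in between passes the `float` check even though it fails the exact one.

```java
/** Standalone illustration (not Lucene code): a double -> float cast can round a threshold down. */
final class FloatCastThresholdDemo {

  /** True if the score clears the threshold after the threshold is rounded to float. */
  static boolean passesFloatThreshold(double score, double threshold) {
    return score >= (float) threshold; // the float is widened back to double for the comparison
  }

  /** True if the score clears the exact double threshold. */
  static boolean passesDoubleThreshold(double score, double threshold) {
    return score >= threshold;
  }

  public static void main(String[] args) {
    double threshold = 0.7;    // nearest float is 0.699999988079071..., i.e. below the double
    double score = 0.69999999; // sits between (float) threshold and threshold
    System.out.println(passesFloatThreshold(score, threshold));  // true: wrongly admitted
    System.out.println(passesDoubleThreshold(score, threshold)); // false: correctly pruned
  }
}
```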