HUSTERGS commented on PR #14827:
URL: https://github.com/apache/lucene/pull/14827#issuecomment-3024748199

   Sorry for the late reply. 
   I've dug a little bit into this issue. There are two problems. 
   The first one is that **docs NOT collected in baseline ARE collected under 
this patch**. This is caused by the initial `double` to `float` cast when 
computing `minRequiredScore`, so I changed its type from `float` to `double`.
   The second one is that **docs collected in baseline are NOT collected under 
this patch**. It turns out the original code actually lets docs with a score 
lower than `minCompetitiveScore` be collected, and this patch safely prunes 
them away. 
   
   Here is an example that reproduces the two problems above:
   ```java
   package org.apache.lucene;
   
   import org.apache.lucene.util.MathUtil;
   
   public class Run {
   
     public static void main(String[] args) {
   
       {
         // Problem 1: doc collected in patch but not in baseline
         System.out.println("problem that doc collected in patch but not in baseline");
   
         float minCompetitiveScore = 3.5382755f;
         double score = 2.201078414916992d;
         double maxRemainingScore = 1.337196946144104d;
   
         // false, means this cannot be collected in baseline
         System.out.println(
             ((float) MathUtil.sumUpperBound(score + maxRemainingScore, 2)) >= minCompetitiveScore);
   
         {
           float minRequiredScore = (float) (minCompetitiveScore - maxRemainingScore);
           while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, 2)
               > minCompetitiveScore) {
             minRequiredScore = Math.nextDown(minRequiredScore);
           }
   
           // score=2.201078414916992d, minRequiredScore=2.2010784f
           // true, means this will be collected in patch
           System.out.println(score >= minRequiredScore);
         }
   
         {
           double minRequiredScore = (minCompetitiveScore - maxRemainingScore);
           while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, 2)
               > minCompetitiveScore) {
             minRequiredScore = Math.nextDown(minRequiredScore);
           }
   
           // false, score=2.201078414916992d, minRequiredScore=2.2010785341262817d
           System.out.println(score >= minRequiredScore);
         }
       }
   
       System.out.println();
       {
         // Problem 2: doc collected in baseline but not in patch
         System.out.println("problem that doc collected in baseline but not in patch");
   
         float minCompetitiveScore = 7.638806f;
         double score = 7.638805627822876d;
         double maxRemainingScore = 0.0d;
         int numScorers = 33;
   
         double minRequiredScore = (minCompetitiveScore - maxRemainingScore);
         while ((float) MathUtil.sumUpperBound(minRequiredScore + maxRemainingScore, numScorers)
             > minCompetitiveScore) {
           minRequiredScore = Math.nextDown(minRequiredScore);
         }
   
         // false, means this cannot be collected by the current patch.
         System.out.println(score >= minRequiredScore);
   
         // true, means this can be collected in baseline:
         //   MathUtil.sumUpperBound(score + maxRemainingScore, numScorers) = 7.638805627822984d
         //   (float) MathUtil.sumUpperBound(score + maxRemainingScore, numScorers) = 7.638806f
         //   == minCompetitiveScore
         //
         // The original double (before the cast to float) is actually smaller than
         // minCompetitiveScore, which means we can prune this doc safely.
         System.out.println(
             ((float) MathUtil.sumUpperBound(score + maxRemainingScore, numScorers))
                 >= minCompetitiveScore);
   
       }
     }
   }
   ```
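
Both effects come down to how a `double` is rounded when it is narrowed to `float`. The sketch below isolates them without a Lucene dependency, reusing the numbers from the example above. The class and method names are made up for illustration, and the `sumUpperBound` result is hard-coded from the comment in the example rather than recomputed.

```java
public class FloatCastRounding {

  // Problem 1: the pruning threshold is narrowed to float before the comparison.
  public static boolean admittedWithFloatThreshold(double score, double threshold) {
    float narrowed = (float) threshold; // round-to-nearest may land below the double threshold
    return score >= narrowed;
  }

  // Same comparison with the threshold kept as a double.
  public static boolean admittedWithDoubleThreshold(double score, double threshold) {
    return score >= threshold;
  }

  // Problem 2 (baseline behavior): the score upper bound is narrowed to float first.
  public static boolean competitiveAfterFloatCast(double upperBound, float minCompetitiveScore) {
    return (float) upperBound >= minCompetitiveScore;
  }

  // Same comparison in double precision; widening the float to double is exact.
  public static boolean competitiveAsDouble(double upperBound, float minCompetitiveScore) {
    return upperBound >= minCompetitiveScore;
  }

  public static void main(String[] args) {
    // Values from the first case: minCompetitiveScore - maxRemainingScore as a double.
    double score = 2.201078414916992d;
    double threshold = 2.2010785341262817d;
    System.out.println(admittedWithFloatThreshold(score, threshold));
    System.out.println(admittedWithDoubleThreshold(score, threshold));

    // Values from the second case: the upper bound quoted in the comment above.
    double upperBound = 7.638805627822984d;
    float minCompetitiveScore = 7.638806f;
    System.out.println(competitiveAfterFloatCast(upperBound, minCompetitiveScore));
    System.out.println(competitiveAsDouble(upperBound, minCompetitiveScore));
  }
}
```

Running `main` prints `true`, `false`, `true`, `false`: in each pair the float cast admits a doc that the double comparison rejects.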
   
   I did another quick **one**-iteration full-task luceneutil run to verify the 
hit count. It still complains about the hit count, but all the diffs go in 
one direction (patch less than baseline), which I think is expected behavior:
   
![image](https://github.com/user-attachments/assets/a986668a-8121-41f0-90b1-6888aa1ec489)
   
   I also ran luceneutil on `wikimediumall` with `searchConcurrency=0, 
taskCountPerCat=5, taskRepeatCount=50`; results after 20 iterations:
   
   ```
          Task  QPS baseline  StdDev  QPS my_modified_version  StdDev              Pct diff  p-value
    OrHighRare        120.94  (2.7%)                   122.37  (1.9%)  1.2% (  -3% -    5%)    0.109
     OrHighMed         90.74  (3.6%)                    92.40  (2.8%)  1.8% (  -4% -    8%)    0.073
    AndHighMed         69.64  (2.8%)                    71.32  (2.1%)  2.4% (  -2% -    7%)    0.002
   AndHighHigh         28.49  (2.4%)                    29.29  (2.1%)  2.8% (  -1% -    7%)    0.000
    OrHighHigh         26.83  (2.5%)                    27.60  (1.9%)  2.9% (  -1% -    7%)    0.000
   ```
   
   The while loop containing the call to `Math.nextDown` (which now operates on 
a `double` rather than a `float`) doesn't seem to add much overhead. I locally 
added a simple print of the loop's iteration count, and it is almost 
always zero.
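
One way to check the loop overhead is to count its iterations directly. The sketch below is self-contained under loud assumptions: `sumUpperBound` here is a simplified stand-in written for this sketch (it inflates the sum by a small relative error bound when there are more than two values) and is not Lucene's `MathUtil.sumUpperBound`; `NextDownLoopCount` and `countNextDownSteps` are hypothetical names.

```java
public class NextDownLoopCount {

  // Stand-in for MathUtil.sumUpperBound (an assumption, not the Lucene implementation):
  // inflate the sum by twice a relative error bound of (numValues - 1) * 2^-52.
  public static double sumUpperBound(double sum, int numValues) {
    if (numValues <= 2) {
      return sum;
    }
    double b = (numValues - 1) * Math.scalb(1.0, -52);
    return (1.0 + 2 * b) * sum;
  }

  // Runs the same threshold loop as the example and returns how many
  // Math.nextDown steps it needed.
  public static int countNextDownSteps(
      float minCompetitiveScore, double maxRemainingScore, int numScorers) {
    double minRequiredScore = minCompetitiveScore - maxRemainingScore;
    int steps = 0;
    while ((float) sumUpperBound(minRequiredScore + maxRemainingScore, numScorers)
        > minCompetitiveScore) {
      minRequiredScore = Math.nextDown(minRequiredScore);
      steps++;
    }
    return steps;
  }

  public static void main(String[] args) {
    // Inputs from the second case of the example above.
    System.out.println(countNextDownSteps(7.638806f, 0.0d, 33));
  }
}
```

With the inputs from the second case, the float cast of the inflated sum rounds back to `minCompetitiveScore`, so the loop body never runs and `main` prints `0`.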


