gf2121 commented on PR #14203:
URL: https://github.com/apache/lucene/pull/14203#issuecomment-2726739699

   On the AVX-512 machine:
   
   * Specialized read does not vectorize the remainder loop; it seems the compiler failed to inline it.
   * Specialized decode vectorizes the remainder loop.
   * Pushing the masks into the remainder loop seems to perform better.
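
For readers unfamiliar with the term, the "remainder loop" is the tail that handles the values left over once the main loop has consumed all full blocks. Below is a minimal sketch of that loop shape, not the benchmark code from the patch: the class name `RemainderLoopSketch`, the methods `encode24`/`decode24`, and the 24-bit layout (three ints carrying four values, leftovers stored one per int) are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and layout are not taken from the PR.
final class RemainderLoopSketch {

  // Packs `count` values (each < 2^24) so that every 3 ints carry 4 values.
  // The (count % 4) leftover values are stored one per int at the end.
  static int[] encode24(int[] values, int count) {
    int quarter = count >> 2;
    int numInts = quarter * 3;
    int rem = count & 3;
    int[] packed = new int[numInts + rem];
    // Upper 24 bits of each packed int hold one value.
    for (int i = 0; i < numInts; i++) {
      packed[i] = values[i] << 8;
    }
    // The fourth value of each group is split into three bytes that fill
    // the low 8 bits of three packed ints.
    for (int i = 0; i < quarter; i++) {
      int v = values[numInts + i];
      packed[i] |= (v >>> 16) & 0xFF;
      packed[quarter + i] |= (v >>> 8) & 0xFF;
      packed[2 * quarter + i] |= v & 0xFF;
    }
    for (int i = 0; i < rem; i++) {
      packed[numInts + i] = values[quarter * 4 + i];
    }
    return packed;
  }

  static void decode24(int[] packed, int count, int[] out) {
    int quarter = count >> 2;
    int numInts = quarter * 3;
    // Main loops: simple counted loops over arrays, the shape a JIT
    // auto-vectorizer typically handles well.
    for (int i = 0; i < numInts; i++) {
      out[i] = packed[i] >>> 8;
    }
    for (int i = 0; i < quarter; i++) {
      out[numInts + i] =
          ((packed[i] & 0xFF) << 16)
              | ((packed[quarter + i] & 0xFF) << 8)
              | (packed[2 * quarter + i] & 0xFF);
    }
    // Remainder loop: at most 3 leftover values.
    int rem = count & 3;
    for (int i = 0; i < rem; i++) {
      out[quarter * 4 + i] = packed[numInts + i];
    }
  }

  public static void main(String[] args) {
    int count = 11; // 2 full groups of 4, plus 3 leftovers
    int[] values = new int[count];
    for (int i = 0; i < count; i++) {
      values[i] = (i * 1000003 + 12345) & 0xFFFFFF;
    }
    int[] out = new int[count];
    decode24(encode24(values, count), count, out);
    System.out.println(Arrays.equals(values, out));
  }
}
```

Whether the JIT vectorizes each of these loops depends on inlining and loop shape, which is what the `:asm` profiles in the benchmark are meant to check; the sketch does not try to reproduce the mask placement compared above.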
   
   I pushed the benchmark code to the patch; here are the results on my machine:
   
   ```
   Benchmark                                                         Mode  Cnt   Score   Error   Units
   InnerLoopDecodingBenchmark.hybridInnerLoop                       thrpt    5  76.311 ± 0.177  ops/ms
   InnerLoopDecodingBenchmark.hybridInnerLoop:asm                   thrpt          NaN             ---
   InnerLoopDecodingBenchmark.specializedDecode                     thrpt    5  73.600 ± 0.123  ops/ms
   InnerLoopDecodingBenchmark.specializedDecode:asm                 thrpt          NaN             ---
   InnerLoopDecodingBenchmark.specializedDecodeMaskInRemainder      thrpt    5  80.902 ± 0.046  ops/ms
   InnerLoopDecodingBenchmark.specializedDecodeMaskInRemainder:asm  thrpt          NaN             ---
   InnerLoopDecodingBenchmark.specializedRead                       thrpt    5  37.195 ± 0.099  ops/ms
   InnerLoopDecodingBenchmark.specializedRead:asm                   thrpt          NaN             ---
   ```
   
   **LuceneUtil**
   
   hybridInnerLoop (baseline) vs specializedRead (candidate)
   ```
                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                             IntNRQ       83.50      (3.7%)       78.07      (4.1%)   -6.5% ( -13% -    1%) 0.000
                     FilteredIntNRQ       81.66      (3.6%)       76.50      (3.8%)   -6.3% ( -13% -    1%) 0.000
                CountFilteredIntNRQ       44.30      (1.9%)       43.07      (2.8%)   -2.8% (  -7% -    1%) 0.000
                             IntSet       94.41      (1.7%)       94.38      (1.0%)   -0.0% (  -2% -    2%) 0.940
   ```
   
   hybridInnerLoop (baseline) vs specializedDecodeMaskInRemainder (candidate)
   ```
                           Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                             IntSet       96.10      (1.1%)       94.85      (1.5%)   -1.3% (  -3% -    1%) 0.039
                CountFilteredIntNRQ       43.67      (2.3%)       44.78      (2.8%)    2.6% (  -2% -    7%) 0.036
                             IntNRQ       81.12      (2.9%)       84.52      (4.1%)    4.2% (  -2% -   11%) 0.013
                     FilteredIntNRQ       78.94      (3.0%)       83.07      (4.1%)    5.2% (  -1% -   12%) 0.002
   ```

