jpountz commented on PR #14896:
URL: https://github.com/apache/lucene/pull/14896#issuecomment-3038878223

   FYI, I played with an alternative implementation that is branchless and
partially auto-vectorizes. It seems to give almost the same performance as the
vectorized implementation on my machine, which doesn't have AVX-512, so the gap
may be bigger on your machine.
   
   ```
   Benchmark                                 (minScoreInclusive)  (size)   Mode  Cnt      Score      Error   Units
   CompetitiveBenchmark.baseline                               0     128  thrpt    5  16580.322 ±  544.820  ops/ms
   CompetitiveBenchmark.baseline                             0.2     128  thrpt    5   3135.744 ±   16.845  ops/ms
   CompetitiveBenchmark.baseline                             0.4     128  thrpt    5   1919.107 ±   16.555  ops/ms
   CompetitiveBenchmark.baseline                             0.5     128  thrpt    5   1787.012 ±   44.708  ops/ms
   CompetitiveBenchmark.baseline                             0.8     128  thrpt    5   3533.556 ±   43.427  ops/ms
   CompetitiveBenchmark.branchlessCandidate                    0     128  thrpt    5  16991.935 ± 1305.993  ops/ms
   CompetitiveBenchmark.branchlessCandidate                  0.2     128  thrpt    5   8585.819 ±  376.521  ops/ms
   CompetitiveBenchmark.branchlessCandidate                  0.4     128  thrpt    5   8564.359 ±  275.025  ops/ms
   CompetitiveBenchmark.branchlessCandidate                  0.5     128  thrpt    5   8783.434 ±   84.353  ops/ms
   CompetitiveBenchmark.branchlessCandidate                  0.8     128  thrpt    5   8808.948 ±   64.613  ops/ms
   CompetitiveBenchmark.vectorizedCandidate                    0     128  thrpt    5   9913.819 ±   34.008  ops/ms
   CompetitiveBenchmark.vectorizedCandidate                  0.2     128  thrpt    5   9963.308 ±  897.262  ops/ms
   CompetitiveBenchmark.vectorizedCandidate                  0.4     128  thrpt    5   9743.768 ±  461.215  ops/ms
   CompetitiveBenchmark.vectorizedCandidate                  0.5     128  thrpt    5   9851.264 ±  200.810  ops/ms
   CompetitiveBenchmark.vectorizedCandidate                  0.8     128  thrpt    5  10043.750 ±   91.672  ops/ms
   ```
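
For readers unfamiliar with the term, here is a minimal sketch of what a branchless filter of this kind could look like. It is not the code from the PR: the class name, method name, and the parallel `docs`/`scores` array layout are assumptions made for illustration. The idea is to write every entry unconditionally and advance the output index by 0 or 1, so the loop body has no data-dependent branch for the CPU to mispredict.

```java
// Illustrative sketch only; names and array layout are hypothetical, not from the PR.
class BranchlessFilterSketch {

    /**
     * Compacts docs/scores in place, keeping entries whose score is
     * greater than or equal to minScoreInclusive. Returns the new size.
     */
    static int filterCompetitive(int[] docs, float[] scores, int size, float minScoreInclusive) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            int doc = docs[i];
            float score = scores[i];
            // Store unconditionally. newSize <= i always holds, so this never
            // overwrites an entry that has not been read yet.
            docs[newSize] = doc;
            scores[newSize] = score;
            // Advance the output index by 0 or 1. The JIT can typically compile
            // this to a conditional move or set instruction instead of a jump,
            // though that is not guaranteed.
            newSize += score >= minScoreInclusive ? 1 : 0;
        }
        return newSize;
    }

    public static void main(String[] args) {
        int[] docs = {10, 11, 12, 13};
        float[] scores = {0.1f, 0.5f, 0.3f, 0.9f};
        int kept = filterCompetitive(docs, scores, docs.length, 0.3f);
        System.out.println("kept = " + kept);
    }
}
```

A loop of this shape does the same amount of work per entry regardless of how many entries pass the filter. That would be consistent with the roughly flat `branchlessCandidate` throughput across `minScoreInclusive` values from 0.2 to 0.8 above, whereas `baseline` is slowest around 0.4 to 0.5.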


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

