uschindler commented on PR #14896:
URL: https://github.com/apache/lucene/pull/14896#issuecomment-3040103885

   > I get the following results on an Apple M3 (ARM). The vectorized implementation is 10x slower than the scalar impls.
   > 
   > ```
   > Benchmark                                 (minScoreInclusive)  (size)   Mode  Cnt      Score      Error   Units
   > CompetitiveBenchmark.baseline                               0     128  thrpt    5  39680,991 ± 1331,474  ops/ms
   > CompetitiveBenchmark.baseline                             0.2     128  thrpt    5  12341,873 ± 2689,119  ops/ms
   > CompetitiveBenchmark.baseline                             0.4     128  thrpt    5  11486,998 ± 1605,569  ops/ms
   > CompetitiveBenchmark.baseline                             0.5     128  thrpt    5  12114,898 ± 1071,809  ops/ms
   > CompetitiveBenchmark.baseline                             0.8     128  thrpt    5  17948,458 ± 2517,146  ops/ms
   > CompetitiveBenchmark.branchlessCandidate                    0     128  thrpt    5  36995,216 ± 1933,371  ops/ms
   > CompetitiveBenchmark.branchlessCandidate                  0.2     128  thrpt    5  10730,098 ±  322,258  ops/ms
   > CompetitiveBenchmark.branchlessCandidate                  0.4     128  thrpt    5  11226,245 ±  124,199  ops/ms
   > CompetitiveBenchmark.branchlessCandidate                  0.5     128  thrpt    5  11210,890 ±  215,824  ops/ms
   > CompetitiveBenchmark.branchlessCandidate                  0.8     128  thrpt    5  11154,615 ±  245,362  ops/ms
   > CompetitiveBenchmark.vectorizedCandidate                    0     128  thrpt    5   1140,617 ±   14,719  ops/ms
   > CompetitiveBenchmark.vectorizedCandidate                  0.2     128  thrpt    5    994,966 ±   29,994  ops/ms
   > CompetitiveBenchmark.vectorizedCandidate                  0.4     128  thrpt    5    923,214 ±   90,922  ops/ms
   > CompetitiveBenchmark.vectorizedCandidate                  0.5     128  thrpt    5    890,464 ±    4,270  ops/ms
   > CompetitiveBenchmark.vectorizedCandidate                  0.8     128  thrpt    5    950,584 ±    7,772  ops/ms
   > ```
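As a rough check of the "10x" figure, the quoted throughput numbers (size = 128) can be divided out per threshold. In the Java sketch below the scores are copied from the JMH table; the class and method names are hypothetical. By this arithmetic the vectorized candidate is about 10.8x to 34.8x slower than either scalar variant, so "10x" is, if anything, an understatement.

```java
import java.util.Locale;

/** Hypothetical helper: slowdown of the vectorized candidate vs. the scalar variants (Apple M3 scores). */
final class SlowdownCheck {
  // Throughput in ops/ms, copied from the quoted JMH output (size = 128).
  static final double[] MIN_SCORE = {0.0, 0.2, 0.4, 0.5, 0.8};
  static final double[] BASELINE = {39680.991, 12341.873, 11486.998, 12114.898, 17948.458};
  static final double[] BRANCHLESS = {36995.216, 10730.098, 11226.245, 11210.890, 11154.615};
  static final double[] VECTORIZED = {1140.617, 994.966, 923.214, 890.464, 950.584};

  private SlowdownCheck() {}

  /** A ratio above 1 means the vectorized variant has lower throughput by that factor. */
  static double slowdown(double scalarOpsPerMs, double vectorOpsPerMs) {
    return scalarOpsPerMs / vectorOpsPerMs;
  }

  /** Smallest slowdown of the vectorized variant vs. either scalar variant over all thresholds. */
  static double minSlowdown() {
    double min = Double.POSITIVE_INFINITY;
    for (int i = 0; i < VECTORIZED.length; i++) {
      min = Math.min(min, slowdown(BASELINE[i], VECTORIZED[i]));
      min = Math.min(min, slowdown(BRANCHLESS[i], VECTORIZED[i]));
    }
    return min;
  }

  public static void main(String[] args) {
    for (int i = 0; i < VECTORIZED.length; i++) {
      System.out.printf(Locale.ROOT, "minScore=%.1f baseline/vectorized=%.1fx branchless/vectorized=%.1fx%n",
          MIN_SCORE[i], slowdown(BASELINE[i], VECTORIZED[i]), slowdown(BRANCHLESS[i], VECTORIZED[i]));
    }
    System.out.printf(Locale.ROOT, "smallest slowdown: %.1fx%n", minSlowdown());
  }
}
```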
   
   I think we have to check the CPU flags. This looks similar to other vectorizations we tried before: being 10 times slower basically means there's no support for it at all and it runs the old pure-Java fallback code. 🤪
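Checking the CPU flags from Java does not need native code: HotSpot exposes its VM options through the `HotSpotDiagnosticMXBean` JMX interface. A minimal sketch, assuming a HotSpot-based JDK; the `VmFlags` class is hypothetical and not taken from Lucene. It prints `os.arch` and the architecture-specific `UseAVX` and `UseSVE` flags, and returns empty for any flag the running platform does not define.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Hypothetical helper (not Lucene code): reads CPU-related HotSpot flags of the running JVM. */
final class VmFlags {
  private VmFlags() {}

  /** Value of the named HotSpot VM option, or empty if this JVM or platform does not define it. */
  static Optional<String> vmOption(String name) {
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    if (bean == null) {
      return Optional.empty(); // management interface not available on this JVM
    }
    try {
      return Optional.ofNullable(bean.getVMOption(name).getValue());
    } catch (IllegalArgumentException | SecurityException e) {
      // Unknown flag on this platform (e.g. UseAVX on ARM) or access denied.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println("os.arch = " + System.getProperty("os.arch"));
    // UseAVX is only defined by x86 HotSpot builds, UseSVE only by AArch64 builds.
    System.out.println("UseAVX  = " + vmOption("UseAVX").orElse("<not defined>"));
    System.out.println("UseSVE  = " + vmOption("UseSVE").orElse("<not defined>"));
  }
}
```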
   
   I think we have to check some of the HotSpot-flag-based constants we added previously, so that we can exclude the vectorized path on some configurations.
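Such a gate can be expressed as a pure decision over platform facts. A minimal sketch, assuming a hypothetical `CompetitiveImplChooser` and a made-up policy: the AVX-512 (`UseAVX >= 3`) and SVE (`UseSVE >= 1`) thresholds are placeholders for illustration, not Lucene's existing constants. On a Neon-only AArch64 CPU (which, as far as we know, includes the Apple M3) it picks the scalar code.

```java
import java.util.OptionalInt;

/** Hypothetical gate (not Lucene's actual logic): choose scalar or vectorized code from CPU facts. */
final class CompetitiveImplChooser {
  enum Impl { SCALAR, VECTORIZED }

  private CompetitiveImplChooser() {}

  /**
   * Placeholder policy, assumed for illustration only: on x86 use the vectorized code
   * only with UseAVX >= 3 (AVX-512), on AArch64 only with UseSVE >= 1, and fall back
   * to scalar code everywhere else. Real thresholds would have to come from benchmarks.
   */
  static Impl choose(String osArch, OptionalInt useAvx, OptionalInt useSve) {
    if (osArch == null) {
      return Impl.SCALAR;
    }
    switch (osArch) {
      case "amd64":
      case "x86_64":
        return useAvx.orElse(0) >= 3 ? Impl.VECTORIZED : Impl.SCALAR;
      case "aarch64":
        return useSve.orElse(0) >= 1 ? Impl.VECTORIZED : Impl.SCALAR;
      default:
        return Impl.SCALAR;
    }
  }

  public static void main(String[] args) {
    // AArch64 JVM without SVE (UseSVE=0): prints SCALAR.
    System.out.println(choose("aarch64", OptionalInt.empty(), OptionalInt.of(0)));
  }
}
```

In practice the result would be computed once and stored in a `static final` field, so the JIT can treat the choice as a constant and drop the untaken branch.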
   
   Uwe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

