rmuir commented on PR #12311:
URL: https://github.com/apache/lucene/pull/12311#issuecomment-1556403183

   I pushed a new benchmark to https://github.com/rmuir/vectorbench for the 
binary dot product. 
   
   Basically this has to act like:
   ```
   int sum = 0;
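   for (int i = 0; i < a.length; i++) {
     // a byte * byte product always fits in a short, so this cast is lossless
     short product = (short) (a[i] * b[i]);
     sum += (int) product; // widened back to int for the accumulation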
   }
   ```
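   Here is a runnable, plain Java version of that scalar loop, plus an exhaustive check of why the `(short)` cast never loses information: the extreme byte products are -128 * 127 = -16256 and -128 * -128 = 16384, both inside the short range. The class and method names are hypothetical, used only for illustration, and the code assumes both arrays have the same length.

```java
// Hypothetical illustration class; plain Java, no Vector API needed.
public final class ByteProductCheck {
  /** Scalar reference: multiply bytes as shorts, accumulate as int. Assumes a.length == b.length. */
  public static int dotProductScalar(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      short product = (short) (a[i] * b[i]); // never overflows, see check below
      sum += product;
    }
    return sum;
  }

  /** Tries every byte pair; returns true if no product changes when narrowed to short. */
  public static boolean allProductsFitInShort() {
    for (int x = Byte.MIN_VALUE; x <= Byte.MAX_VALUE; x++) {
      for (int y = Byte.MIN_VALUE; y <= Byte.MAX_VALUE; y++) {
        int p = x * y;
        if (p != (short) p) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(allProductsFitInShort()); // prints true
  }
}
```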
   
   So it is tricky to do with a fully generic implementation (one that just uses SPECIES_PREFERRED). For 256-bit AVX, it means you read a byte vector of length 32, then widen each half to shorts (2 short vectors of length 16), and then do the same thing again, widening each of those halves to ints (4 int vectors of length 8). This generic approach only gives me a 2x speedup, which is a little disappointing.
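   A minimal sketch of that generic SPECIES_PREFERRED approach, assuming the incubating Panama Vector API (`jdk.incubator.vector`, so it has to be compiled and run with `--add-modules jdk.incubator.vector`). This is not the code in the vectorbench repository, and the class name is hypothetical. Because all preferred species share one bit size, widening byte to short doubles the lane width, so each byte vector spans two short vectors (`part` 0 and 1), and each short product spans two int vectors in turn.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.ShortVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical sketch of the generic approach; assumes a.length == b.length.
public final class GenericDotProduct {
  // Same preferred shape for all three, e.g. with 256-bit AVX: 32 byte, 16 short, 8 int lanes.
  private static final VectorSpecies<Byte> BYTES = ByteVector.SPECIES_PREFERRED;
  private static final VectorSpecies<Short> SHORTS = ShortVector.SPECIES_PREFERRED;
  private static final VectorSpecies<Integer> INTS = IntVector.SPECIES_PREFERRED;

  public static int dotProduct(byte[] a, byte[] b) {
    IntVector acc = IntVector.zero(INTS);
    int i = 0;
    int bound = BYTES.loopBound(a.length);
    for (; i < bound; i += BYTES.length()) {
      ByteVector va = ByteVector.fromArray(BYTES, a, i);
      ByteVector vb = ByteVector.fromArray(BYTES, b, i);
      // byte -> short doubles the lane size: the byte lanes fill two short vectors
      for (int sp = 0; sp < 2; sp++) {
        ShortVector sa = (ShortVector) va.convertShape(VectorOperators.B2S, SHORTS, sp);
        ShortVector sb = (ShortVector) vb.convertShape(VectorOperators.B2S, SHORTS, sp);
        ShortVector prod = sa.mul(sb); // byte * byte always fits in a short
        // short -> int doubles the lane size again: two int vectors per short vector
        for (int ip = 0; ip < 2; ip++) {
          acc = acc.add((IntVector) prod.convertShape(VectorOperators.S2I, INTS, ip));
        }
      }
    }
    int sum = acc.reduceLanes(VectorOperators.ADD);
    // scalar tail for elements that do not fill a whole byte vector
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

   The nested `part` loops are exactly the extra splitting work described above; every loaded byte vector turns into two short multiplies and four int additions.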
   
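   But this is a poor approach if you have 256-bit vectors. You can just use ByteVector.SPECIES_64, ShortVector.SPECIES_128, and IntVector.SPECIES_256, and the whole thing is much faster: all three species have 8 lanes, so each 8-byte load widens to exactly one short vector and then to exactly one int vector, with no splitting into halves.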
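   A minimal sketch of that fixed-species variant, again assuming the incubating Vector API (`--add-modules jdk.incubator.vector`). The class name is hypothetical and this is not the benchmark's actual code. It hard-codes the 256-bit shape; presumably a real implementation would only select it when 256-bit vectors are available, which is an assumption here.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.ShortVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical sketch of the 256-bit specialization; assumes a.length == b.length.
public final class DotProduct256 {
  // 64-bit bytes, 128-bit shorts, 256-bit ints: 8 lanes each, so widening never splits.
  private static final VectorSpecies<Byte> B64 = ByteVector.SPECIES_64;
  private static final VectorSpecies<Short> S128 = ShortVector.SPECIES_128;
  private static final VectorSpecies<Integer> I256 = IntVector.SPECIES_256;

  public static int dotProduct(byte[] a, byte[] b) {
    IntVector acc = IntVector.zero(I256);
    int i = 0;
    int bound = B64.loopBound(a.length);
    for (; i < bound; i += B64.length()) {
      ByteVector va = ByteVector.fromArray(B64, a, i);
      ByteVector vb = ByteVector.fromArray(B64, b, i);
      // 8 bytes -> 8 shorts (part 0 is the only part since lane counts match)
      ShortVector sa = (ShortVector) va.convertShape(VectorOperators.B2S, S128, 0);
      ShortVector sb = (ShortVector) vb.convertShape(VectorOperators.B2S, S128, 0);
      ShortVector prod = sa.mul(sb); // byte * byte always fits in a short
      // 8 shorts -> 8 ints, accumulated in one 256-bit register
      acc = acc.add((IntVector) prod.convertShape(VectorOperators.S2I, I256, 0));
    }
    int sum = acc.reduceLanes(VectorOperators.ADD);
    // scalar tail for the last (length % 8) elements
    for (; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

   Compared with the generic version, each iteration handles fewer bytes, but there are no partial conversions to juggle: one load per array, one short multiply, one int add.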
   
   On my Skylake machine (which has 256-bit AVX and gets the optimized 256-bit implementation):
   ```
   Benchmark                                (size)   Mode  Cnt    Score   Error   Units
   BinaryDotProductBenchmark.dotProductNew       1  thrpt    5  159.476 ± 8.177  ops/us
   BinaryDotProductBenchmark.dotProductNew     128  thrpt    5   41.759 ± 0.267  ops/us
   BinaryDotProductBenchmark.dotProductNew     207  thrpt    5   25.094 ± 0.107  ops/us
   BinaryDotProductBenchmark.dotProductNew     256  thrpt    5   24.841 ± 0.124  ops/us
   BinaryDotProductBenchmark.dotProductNew     300  thrpt    5   19.624 ± 0.891  ops/us
   BinaryDotProductBenchmark.dotProductNew     512  thrpt    5   13.763 ± 0.171  ops/us
   BinaryDotProductBenchmark.dotProductNew     702  thrpt    5    9.792 ± 0.388  ops/us
   BinaryDotProductBenchmark.dotProductNew    1024  thrpt    5    6.878 ± 0.834  ops/us
   BinaryDotProductBenchmark.dotProductOld       1  thrpt    5  160.423 ± 6.845  ops/us
   BinaryDotProductBenchmark.dotProductOld     128  thrpt    5   13.300 ± 0.159  ops/us
   BinaryDotProductBenchmark.dotProductOld     207  thrpt    5    8.678 ± 0.293  ops/us
   BinaryDotProductBenchmark.dotProductOld     256  thrpt    5    6.892 ± 0.331  ops/us
   BinaryDotProductBenchmark.dotProductOld     300  thrpt    5    6.008 ± 0.438  ops/us
   BinaryDotProductBenchmark.dotProductOld     512  thrpt    5    3.613 ± 0.192  ops/us
   BinaryDotProductBenchmark.dotProductOld     702  thrpt    5    2.710 ± 0.167  ops/us
   BinaryDotProductBenchmark.dotProductOld    1024  thrpt    5    1.825 ± 0.125  ops/us
   ```
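   Since the scores are throughput (ops/us, higher is better), the speedup at each size is simply the new score divided by the old one. A small plain Java helper (hypothetical names) that computes those ratios from the numbers in the table:

```java
// Hypothetical helper: speedup of dotProductNew over dotProductOld from the JMH scores above.
public final class BenchmarkSpeedup {
  static final int[] SIZES = {1, 128, 207, 256, 300, 512, 702, 1024};
  // Throughput in ops/us, copied from the table (Score column only, errors ignored).
  static final double[] NEW_SCORES = {159.476, 41.759, 25.094, 24.841, 19.624, 13.763, 9.792, 6.878};
  static final double[] OLD_SCORES = {160.423, 13.300, 8.678, 6.892, 6.008, 3.613, 2.710, 1.825};

  /** For throughput numbers, speedup is new / old. */
  static double speedup(double newScore, double oldScore) {
    return newScore / oldScore;
  }

  public static void main(String[] args) {
    for (int k = 0; k < SIZES.length; k++) {
      System.out.printf("size %4d: %.2fx%n", SIZES[k], speedup(NEW_SCORES[k], OLD_SCORES[k]));
    }
  }
}
```

   The ratios come out at about 0.99x for size 1 (within the error bars, so no change), about 3.1x at size 128, and about 3.8x at size 1024.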


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org