[PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

via GitHub Wed, 18 Oct 2023 06:58:30 -0700


benwtrent opened a new pull request, #12694:
URL: https://github.com/apache/lucene/pull/12694


   {DRAFT}
   
   To follow once https://github.com/apache/lucene/pull/12582 is finalized and merged.
   
   Investigating whether unsigned byte vector operations should be added. Quantizing within `[0, 255]` can reduce quantization error. However, Panama vector operations over unsigned bytes are slightly more expensive: roughly 5-20% lower throughput than their signed counterparts in the JMH benchmarks below. Recall vs. latency still needs to be benchmarked on some datasets to verify whether this is worth it.
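   To make the `[0, 255]` idea concrete, here is a minimal sketch of min/max scalar quantization into unsigned 8-bit codes. It assumes a single global `min`/`max` range, which is a simplification, and the `Uint8Quantizer` class and its methods are hypothetical, not Lucene's API. Assuming the signed variant restricts codes to the 128 non-negative values of a signed byte, using all 256 levels of an unsigned byte halves the quantization step, which is presumably where the error reduction comes from. Java has no unsigned byte type, so codes above 127 are stored as negative bytes and must be read back with `& 0xFF`.

   ```java
   /**
    * Hypothetical sketch: min/max scalar quantization of floats into unsigned
    * 8-bit codes in [0, 255], stored in Java's signed byte type.
    */
   final class Uint8Quantizer {
     private final float min;
     private final float scale; // quantization steps per unit of input range

     Uint8Quantizer(float min, float max) {
       if (!(max > min)) {
         throw new IllegalArgumentException("max must be greater than min");
       }
       this.min = min;
       this.scale = 255f / (max - min);
     }

     /** Clamps each value to [min, max] and rounds it to the nearest code in [0, 255]. */
     byte[] quantize(float[] vector) {
       byte[] codes = new byte[vector.length];
       for (int i = 0; i < vector.length; i++) {
         float scaled = (vector[i] - min) * scale;
         float clamped = Math.max(0f, Math.min(255f, scaled));
         // Codes 128..255 are stored as negative Java bytes; read them back with & 0xFF.
         codes[i] = (byte) Math.round(clamped);
       }
       return codes;
     }

     /** Maps a code back to the approximate original value. */
     float dequantize(byte code) {
       return min + (code & 0xFF) / scale;
     }

     public static void main(String[] args) {
       Uint8Quantizer q = new Uint8Quantizer(-1f, 1f);
       for (byte code : q.quantize(new float[] {-1f, 0f, 0.5f, 1f})) {
         System.out.println((code & 0xFF) + " -> " + q.dequantize(code));
       }
     }
   }
   ```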
   
   <details>
   <summary> M1 (ARM 128-bit NEON) </summary>
   
   ```
   Benchmark                                           (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar                 128  thrpt    5   8.369 ± 0.208  ops/us
   VectorUtilBenchmark.binaryCosineScalar                 207  thrpt    5   5.124 ± 0.210  ops/us
   VectorUtilBenchmark.binaryCosineScalar                 256  thrpt    5   4.193 ± 0.014  ops/us
   VectorUtilBenchmark.binaryCosineScalar                1024  thrpt    5   1.043 ± 0.002  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar         128  thrpt    5   8.359 ± 0.100  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar         207  thrpt    5   5.193 ± 0.025  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar         256  thrpt    5   4.194 ± 0.015  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar        1024  thrpt    5   1.043 ± 0.002  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector         128  thrpt    5  21.068 ± 0.072  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector         207  thrpt    5  12.901 ± 0.041  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector         256  thrpt    5  11.595 ± 0.128  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector        1024  thrpt    5   3.197 ± 0.007  ops/us
   VectorUtilBenchmark.binaryCosineVector                 128  thrpt    5  23.552 ± 0.081  ops/us
   VectorUtilBenchmark.binaryCosineVector                 207  thrpt    5  14.358 ± 0.077  ops/us
   VectorUtilBenchmark.binaryCosineVector                 256  thrpt    5  13.165 ± 0.053  ops/us
   VectorUtilBenchmark.binaryCosineVector                1024  thrpt    5   3.681 ± 0.027  ops/us
   VectorUtilBenchmark.binaryDotProductScalar             128  thrpt    5  25.125 ± 0.043  ops/us
   VectorUtilBenchmark.binaryDotProductScalar             207  thrpt    5  15.512 ± 0.061  ops/us
   VectorUtilBenchmark.binaryDotProductScalar             256  thrpt    5  12.557 ± 0.044  ops/us
   VectorUtilBenchmark.binaryDotProductScalar            1024  thrpt    5   3.110 ± 0.029  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar     128  thrpt    5  25.115 ± 0.082  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar     207  thrpt    5  15.518 ± 0.039  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar     256  thrpt    5  12.554 ± 0.037  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar    1024  thrpt    5   3.112 ± 0.011  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector     128  thrpt    5  38.071 ± 0.060  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector     207  thrpt    5  25.039 ± 0.120  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector     256  thrpt    5  20.578 ± 0.062  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector    1024  thrpt    5   5.465 ± 0.008  ops/us
   VectorUtilBenchmark.binaryDotProductVector             128  thrpt    5  45.923 ± 0.150  ops/us
   VectorUtilBenchmark.binaryDotProductVector             207  thrpt    5  30.516 ± 0.053  ops/us
   VectorUtilBenchmark.binaryDotProductVector             256  thrpt    5  25.510 ± 0.053  ops/us
   VectorUtilBenchmark.binaryDotProductVector            1024  thrpt    5   6.744 ± 0.046  ops/us
   ```
   
   </details>
   
   <details> 
   
   <summary> GCP AVX512 </summary>
   
   ```
   Benchmark                                           (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.binaryCosineScalar                 128  thrpt    5   7.290 ± 0.003  ops/us
   VectorUtilBenchmark.binaryCosineScalar                 207  thrpt    5   4.236 ± 0.015  ops/us
   VectorUtilBenchmark.binaryCosineScalar                 256  thrpt    5   3.452 ± 0.015  ops/us
   VectorUtilBenchmark.binaryCosineScalar                1024  thrpt    5   0.885 ± 0.003  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar         128  thrpt    5   7.304 ± 0.007  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar         207  thrpt    5   4.225 ± 0.013  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar         256  thrpt    5   3.431 ± 0.026  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedScalar        1024  thrpt    5   0.879 ± 0.006  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector         128  thrpt    5  29.931 ± 0.049  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector         207  thrpt    5  17.284 ± 0.018  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector         256  thrpt    5  19.145 ± 0.067  ops/us
   VectorUtilBenchmark.binaryCosineUnsignedVector        1024  thrpt    5   6.109 ± 0.004  ops/us
   VectorUtilBenchmark.binaryCosineVector                 128  thrpt    5  32.736 ± 0.027  ops/us
   VectorUtilBenchmark.binaryCosineVector                 207  thrpt    5  18.272 ± 0.640  ops/us
   VectorUtilBenchmark.binaryCosineVector                 256  thrpt    5  21.435 ± 0.051  ops/us
   VectorUtilBenchmark.binaryCosineVector                1024  thrpt    5   7.029 ± 0.011  ops/us
   VectorUtilBenchmark.binaryDotProductScalar             128  thrpt    5  16.971 ± 0.053  ops/us
   VectorUtilBenchmark.binaryDotProductScalar             207  thrpt    5   9.508 ± 0.091  ops/us
   VectorUtilBenchmark.binaryDotProductScalar             256  thrpt    5   8.121 ± 0.059  ops/us
   VectorUtilBenchmark.binaryDotProductScalar            1024  thrpt    5   2.501 ± 0.011  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar     128  thrpt    5  16.977 ± 0.056  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar     207  thrpt    5  10.448 ± 0.045  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar     256  thrpt    5   8.352 ± 0.042  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedScalar    1024  thrpt    5   2.502 ± 0.042  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector     128  thrpt    5  69.663 ± 0.079  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector     207  thrpt    5  44.077 ± 0.059  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector     256  thrpt    5  41.963 ± 0.030  ops/us
   VectorUtilBenchmark.binaryDotProductUnsignedVector    1024  thrpt    5  11.856 ± 0.020  ops/us
   VectorUtilBenchmark.binaryDotProductVector             128  thrpt    5  85.247 ± 0.175  ops/us
   VectorUtilBenchmark.binaryDotProductVector             207  thrpt    5  48.486 ± 0.055  ops/us
   VectorUtilBenchmark.binaryDotProductVector             256  thrpt    5  50.560 ± 0.045  ops/us
   VectorUtilBenchmark.binaryDotProductVector            1024  thrpt    5  14.697 ± 0.010  ops/us
   ```
   
   </details> 
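   For reference, a minimal scalar sketch of what the signed and unsigned scalar variants presumably compute. The class and method names are hypothetical and not Lucene's `VectorUtil` API. The only difference between the signed and unsigned loops is whether each byte is sign-extended (`a[i]`, range [-128, 127]) or zero-extended (`Byte.toUnsignedInt(a[i])`, range [0, 255]). Both widenings cost about the same in scalar code, which is consistent with `binaryDotProductScalar` and `binaryDotProductUnsignedScalar` performing nearly identically above. For the Panama vector versions, one plausible but unverified explanation for the gap is that the standard byte-to-short/int lane conversions sign-extend, so the unsigned path needs extra masking work per lane.

   ```java
   /**
    * Hypothetical scalar sketch (not Lucene's VectorUtil API) contrasting signed and
    * unsigned interpretations of byte vectors.
    */
   final class UnsignedByteVectorSketch {

     /** Signed dot product: each byte is sign-extended to a value in [-128, 127]. */
     static int dotProduct(byte[] a, byte[] b) {
       checkLengths(a, b);
       int sum = 0;
       for (int i = 0; i < a.length; i++) {
         sum += a[i] * b[i];
       }
       return sum;
     }

     /**
      * Unsigned dot product: each byte is zero-extended to a value in [0, 255].
      * An int accumulator cannot overflow for lengths up to 33,025 (255 * 255 * 33,025 < 2^31).
      */
     static int dotProductUnsigned(byte[] a, byte[] b) {
       checkLengths(a, b);
       int sum = 0;
       for (int i = 0; i < a.length; i++) {
         sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
       }
       return sum;
     }

     /** Unsigned cosine similarity; returns NaN if either vector is all zeros. */
     static float cosineUnsigned(byte[] a, byte[] b) {
       checkLengths(a, b);
       int dot = 0;
       int normA = 0;
       int normB = 0;
       for (int i = 0; i < a.length; i++) {
         int x = Byte.toUnsignedInt(a[i]);
         int y = Byte.toUnsignedInt(b[i]);
         dot += x * y;
         normA += x * x;
         normB += y * y;
       }
       return (float) (dot / Math.sqrt((double) normA * normB));
     }

     private static void checkLengths(byte[] a, byte[] b) {
       if (a.length != b.length) {
         throw new IllegalArgumentException(
             "vector dimensions differ: " + a.length + " != " + b.length);
       }
     }

     public static void main(String[] args) {
       byte[] v = {(byte) 200, 10};
       System.out.println(dotProduct(v, v));         // (-56)^2 + 10^2 = 3236
       System.out.println(dotProductUnsigned(v, v)); // 200^2 + 10^2 = 40100
     }
   }
   ```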

