pmpailis commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929150939

Thank you so much @rmuir & @uschindler for taking such a close look and also running benchmarks. 🙇 The reason I went with the lookup table was that it seemed to give some improvement on Neon compared to `Integer.bitCount` (to be fair, I hadn't checked `VarHandle`). I wasn't fond of the explicit lookup table either, but in case we went ahead with something like it, I was hoping to discuss a better alternative (the vector-based results also seem quite different).
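For context, the two scalar approaches being compared can be sketched as follows. This is a minimal illustration, not the PR's code: the class and method names (`HammingSketch`, `hammingLookupTable`, `hammingIntBitCount`) and the way the table is built are assumptions. Both compute the Hamming distance as the number of set bits in `a[i] ^ b[i]`, summed over all bytes. The only difference is whether the per-byte popcount comes from a precomputed 256-entry table or from `Integer.bitCount`.

```java
public final class HammingSketch {

  // Hypothetical 256-entry table: BIT_COUNT[v] is the number of set bits in the unsigned byte value v.
  private static final byte[] BIT_COUNT = new byte[256];

  static {
    for (int v = 0; v < 256; v++) {
      BIT_COUNT[v] = (byte) Integer.bitCount(v);
    }
  }

  private static void checkLengths(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + "!=" + b.length);
    }
  }

  /** Lookup-table variant: one table access per byte of XOR-ed input. */
  public static int hammingLookupTable(byte[] a, byte[] b) {
    checkLengths(a, b);
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask with 0xFF so that negative bytes index the table as 128..255.
      distance += BIT_COUNT[(a[i] ^ b[i]) & 0xFF];
    }
    return distance;
  }

  /** Integer.bitCount variant: one popcount per byte of XOR-ed input. */
  public static int hammingIntBitCount(byte[] a, byte[] b) {
    checkLengths(a, b);
    int distance = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask with 0xFF so the sign-extended int keeps only the byte's 8 bits.
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }

  public static void main(String[] args) {
    byte[] a = {(byte) 0b1111_0000, 0x01, (byte) 0xFF};
    byte[] b = {(byte) 0b0000_1111, 0x01, 0x00};
    System.out.println(hammingLookupTable(a, b)); // 16
    System.out.println(hammingIntBitCount(a, b)); // 16
  }
}
```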
   
I added the changes to use `VarHandle` and re-ran the benchmarks. The following results are from my local dev machine (Neon):
```
Benchmark                                             (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.binaryHammingDistanceIntBitCount       1  thrpt   15  488.021 ±  4.800  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     128  thrpt   15    5.896 ±  0.038  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     207  thrpt   15    4.420 ±  0.065  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     256  thrpt   15    3.589 ±  0.032  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     300  thrpt   15    3.123 ±  0.040  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     512  thrpt   15    1.854 ±  0.017  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     702  thrpt   15    1.348 ±  0.045  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount    1024  thrpt   15    0.938 ±  0.015  ops/us

VectorUtilBenchmark.binaryHammingDistanceLookupTable       1  thrpt   15  502.334 ± 16.595  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     128  thrpt   15   18.142 ±  0.508  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     207  thrpt   15   11.611 ±  0.367  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     256  thrpt   15    9.426 ±  0.124  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     300  thrpt   15    7.932 ±  0.254  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     512  thrpt   15    4.762 ±  0.116  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     702  thrpt   15    3.532 ±  0.018  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable    1024  thrpt   15    2.425 ±  0.016  ops/us

VectorUtilBenchmark.binaryHammingDistanceVarHandle         1  thrpt   15  473.315 ±  5.442  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       128  thrpt   15   27.318 ±  0.152  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       207  thrpt   15   16.651 ±  0.540  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       256  thrpt   15   14.506 ±  0.046  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       300  thrpt   15   12.170 ±  0.023  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       512  thrpt   15    7.478 ±  0.020  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       702  thrpt   15    5.157 ±  0.314  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle      1024  thrpt   15    3.677 ±  0.085  ops/us

VectorUtilBenchmark.binaryHammingDistanceVector            1  thrpt   15  491.316 ± 14.116  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          128  thrpt   15   87.343 ±  2.689  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          207  thrpt   15   43.176 ±  1.220  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          256  thrpt   15   48.915 ±  0.477  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          300  thrpt   15   34.555 ±  0.326  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          512  thrpt   15   26.251 ±  0.284  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          702  thrpt   15   17.679 ±  0.204  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector         1024  thrpt   15   13.717 ±  0.056  ops/us
```
   
I also ran the same experiments on a Xeon cloud instance, with the following results:
```
Benchmark                                             (size)   Mode  Cnt    Score   Error   Units
VectorUtilBenchmark.binaryHammingDistanceIntBitCount       1  thrpt   15  407.490 ± 1.681  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     128  thrpt   15   13.283 ± 0.033  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     207  thrpt   15    8.201 ± 0.194  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     256  thrpt   15    6.775 ± 0.124  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     300  thrpt   15    5.658 ± 0.159  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     512  thrpt   15    3.488 ± 0.099  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     702  thrpt   15    2.588 ± 0.046  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount    1024  thrpt   15    1.866 ± 0.009  ops/us

VectorUtilBenchmark.binaryHammingDistanceLookupTable       1  thrpt   15  319.515 ± 0.776  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     128  thrpt   15   16.192 ± 0.222  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     207  thrpt   15    9.828 ± 0.057  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     256  thrpt   15    7.082 ± 0.044  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     300  thrpt   15    6.120 ± 0.090  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     512  thrpt   15    4.043 ± 0.058  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     702  thrpt   15    2.625 ± 0.047  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable    1024  thrpt   15    1.954 ± 0.008  ops/us

VectorUtilBenchmark.binaryHammingDistanceVarHandle         1  thrpt   15  344.508 ± 1.039  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       128  thrpt   15  101.425 ± 1.319  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       207  thrpt   15   56.693 ± 6.604  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       256  thrpt   15   76.473 ± 0.201  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       300  thrpt   15   58.439 ± 1.204  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       512  thrpt   15   50.839 ± 1.050  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       702  thrpt   15   42.945 ± 0.974  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle      1024  thrpt   15   38.331 ± 0.215  ops/us

VectorUtilBenchmark.binaryHammingDistanceVector512         1  thrpt   15  281.455 ± 1.110  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       128  thrpt   15   31.618 ± 0.277  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       207  thrpt   15   19.928 ± 0.091  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       256  thrpt   15   16.684 ± 0.066  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       300  thrpt   15   11.351 ± 0.065  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       512  thrpt   15    8.520 ± 0.179  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       702  thrpt   15    5.596 ± 0.012  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512      1024  thrpt   15    4.352 ± 0.021  ops/us

VectorUtilBenchmark.binaryHammingDistanceVector256         1  thrpt   15  280.541 ± 3.963  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       128  thrpt   15   22.965 ± 0.386  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       207  thrpt   15   14.085 ± 0.278  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       256  thrpt   15   12.248 ± 0.180  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       300  thrpt   15   10.086 ± 0.220  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       512  thrpt   15    6.216 ± 0.022  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       702  thrpt   15    4.288 ± 0.064  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256      1024  thrpt   15    3.164 ± 0.007  ops/us

VectorUtilBenchmark.binaryHammingDistanceVector128         1  thrpt   15  281.373 ± 1.142  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       128  thrpt   15   27.610 ± 0.741  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       207  thrpt   15   16.567 ± 0.165  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       256  thrpt   15   14.946 ± 0.381  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       300  thrpt   15   11.887 ± 0.032  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       512  thrpt   15    7.735 ± 0.108  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       702  thrpt   15    5.430 ± 0.120  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128      1024  thrpt   15    3.870 ± 0.083  ops/us
```
   
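On the Xeon instance, `VarHandle` clearly outperforms all other solutions (e.g. 38.331 ops/us at size 1024, versus 4.352 ops/us for the fastest vector variant).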
   
As suggested, I'll proceed with making this the main and only implementation of Hamming distance, and remove both the Panama one and the leftovers from the existing implementation (i.e. the lookup table).
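A minimal sketch of what a `VarHandle`-based Hamming distance can look like, under stated assumptions: it reads eight bytes at a time as a `long` through `MethodHandles.byteArrayViewVarHandle`, XORs the two words, counts bits with `Long.bitCount`, and handles the remaining tail byte by byte. The class and method names are hypothetical, and the PR's actual implementation may differ in details (for example, an extra 4-byte `int` step before the tail).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public final class VarHandleHammingSketch {

  // Views a byte[] as longs: LONG_VIEW.get(bytes, offset) reads 8 bytes starting at a byte offset.
  // Byte order does not affect the popcount of x ^ y, so little-endian is an arbitrary choice here.
  private static final VarHandle LONG_VIEW =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  public static int hammingDistance(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + "!=" + b.length);
    }
    int distance = 0;
    int i = 0;
    // Largest multiple of 8 that fits in the array.
    int upperBound = a.length & ~7;
    // Main loop: 8 bytes per iteration, one XOR and one popcount per 64-bit word.
    for (; i < upperBound; i += 8) {
      long x = (long) LONG_VIEW.get(a, i);
      long y = (long) LONG_VIEW.get(b, i);
      distance += Long.bitCount(x ^ y);
    }
    // Tail: the remaining 0..7 bytes, one at a time.
    for (; i < a.length; i++) {
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }

  public static void main(String[] args) {
    byte[] a = new byte[11];
    byte[] b = new byte[11];
    java.util.Arrays.fill(a, (byte) 0xFF); // every bit set in a, none in b
    System.out.println(hammingDistance(a, b)); // 88
  }
}
```

Processing 64 bits per iteration means one pair of loads, one XOR and one `Long.bitCount` per eight bytes rather than per byte, which is one plausible reason for the gap to the per-byte variants in the tables above.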


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

