pmpailis commented on PR #13076: URL: https://github.com/apache/lucene/pull/13076#issuecomment-1929150939
Thank you so much @rmuir & @uschindler for taking such a close look and also running benchmarks. 🙇

The reason I went with the lookup table was that there seemed to be some improvement on Neon compared to `Integer.bitCount` (I hadn't checked `VarHandle`, tbf). I wasn't fond of the explicit lookup table either, and if we had gone ahead with something like that, I was hoping to discuss a better alternative (also, the vector-based results seem much different).

I added the changes to use `VarHandle` and re-ran the benchmarks. The following are from my local dev machine (Neon):

```
Benchmark                                             (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.binaryHammingDistanceIntBitCount       1  thrpt   15  488.021 ±  4.800  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     128  thrpt   15    5.896 ±  0.038  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     207  thrpt   15    4.420 ±  0.065  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     256  thrpt   15    3.589 ±  0.032  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     300  thrpt   15    3.123 ±  0.040  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     512  thrpt   15    1.854 ±  0.017  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     702  thrpt   15    1.348 ±  0.045  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount    1024  thrpt   15    0.938 ±  0.015  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable       1  thrpt   15  502.334 ± 16.595  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     128  thrpt   15   18.142 ±  0.508  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     207  thrpt   15   11.611 ±  0.367  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     256  thrpt   15    9.426 ±  0.124  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     300  thrpt   15    7.932 ±  0.254  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     512  thrpt   15    4.762 ±  0.116  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     702  thrpt   15    3.532 ±  0.018  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable    1024  thrpt   15    2.425 ±  0.016  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle         1  thrpt   15  473.315 ±  5.442  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       128  thrpt   15   27.318 ±  0.152  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       207  thrpt   15   16.651 ±  0.540  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       256  thrpt   15   14.506 ±  0.046  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       300  thrpt   15   12.170 ±  0.023  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       512  thrpt   15    7.478 ±  0.020  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       702  thrpt   15    5.157 ±  0.314  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle      1024  thrpt   15    3.677 ±  0.085  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector            1  thrpt   15  491.316 ± 14.116  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          128  thrpt   15   87.343 ±  2.689  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          207  thrpt   15   43.176 ±  1.220  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          256  thrpt   15   48.915 ±  0.477  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          300  thrpt   15   34.555 ±  0.326  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          512  thrpt   15   26.251 ±  0.284  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector          702  thrpt   15   17.679 ±  0.204  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector         1024  thrpt   15   13.717 ±  0.056  ops/us
```

I also ran the same experiments on a Xeon cloud instance, with the following results:

```
Benchmark                                             (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.binaryHammingDistanceIntBitCount       1  thrpt   15  407.490 ±  1.681  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     128  thrpt   15   13.283 ±  0.033  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     207  thrpt   15    8.201 ±  0.194  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     256  thrpt   15    6.775 ±  0.124  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     300  thrpt   15    5.658 ±  0.159  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     512  thrpt   15    3.488 ±  0.099  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount     702  thrpt   15    2.588 ±  0.046  ops/us
VectorUtilBenchmark.binaryHammingDistanceIntBitCount    1024  thrpt   15    1.866 ±  0.009  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable       1  thrpt   15  319.515 ±  0.776  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     128  thrpt   15   16.192 ±  0.222  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     207  thrpt   15    9.828 ±  0.057  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     256  thrpt   15    7.082 ±  0.044  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     300  thrpt   15    6.120 ±  0.090  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     512  thrpt   15    4.043 ±  0.058  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable     702  thrpt   15    2.625 ±  0.047  ops/us
VectorUtilBenchmark.binaryHammingDistanceLookupTable    1024  thrpt   15    1.954 ±  0.008  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle         1  thrpt   15  344.508 ±  1.039  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       128  thrpt   15  101.425 ±  1.319  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       207  thrpt   15   56.693 ±  6.604  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       256  thrpt   15   76.473 ±  0.201  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       300  thrpt   15   58.439 ±  1.204  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       512  thrpt   15   50.839 ±  1.050  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle       702  thrpt   15   42.945 ±  0.974  ops/us
VectorUtilBenchmark.binaryHammingDistanceVarHandle      1024  thrpt   15   38.331 ±  0.215  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512         1  thrpt   15  281.455 ±  1.110  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       128  thrpt   15   31.618 ±  0.277  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       207  thrpt   15   19.928 ±  0.091  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       256  thrpt   15   16.684 ±  0.066  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       300  thrpt   15   11.351 ±  0.065  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       512  thrpt   15    8.520 ±  0.179  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512       702  thrpt   15    5.596 ±  0.012  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector512      1024  thrpt   15    4.352 ±  0.021  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256         1  thrpt   15  280.541 ±  3.963  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       128  thrpt   15   22.965 ±  0.386  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       207  thrpt   15   14.085 ±  0.278  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       256  thrpt   15   12.248 ±  0.180  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       300  thrpt   15   10.086 ±  0.220  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       512  thrpt   15    6.216 ±  0.022  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256       702  thrpt   15    4.288 ±  0.064  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector256      1024  thrpt   15    3.164 ±  0.007  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128         1  thrpt   15  281.373 ±  1.142  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       128  thrpt   15   27.610 ±  0.741  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       207  thrpt   15   16.567 ±  0.165  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       256  thrpt   15   14.946 ±  0.381  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       300  thrpt   15   11.887 ±  0.032  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       512  thrpt   15    7.735 ±  0.108  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128       702  thrpt   15    5.430 ±  0.120  ops/us
VectorUtilBenchmark.binaryHammingDistanceVector128      1024  thrpt   15    3.870 ±  0.083  ops/us
```

On this machine `VarHandle` clearly outperforms all other solutions at every size above 1. As suggested, I'll proceed with adding this as the main and only implementation of Hamming distance and remove both the Panama one and the leftovers from the existing implementation (i.e. the lookup table).
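For readers following along, here is a minimal sketch of what a `VarHandle`-based Hamming distance over `byte[]` vectors could look like. It is not the code from this PR: the class name `HammingSketch` and the method name `hammingDistance` are hypothetical, and the sketch assumes both inputs have the same length. The idea is to read eight bytes at a time as a `long`, XOR the two words, and count the set bits with `Long.bitCount`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class, not part of Lucene.
final class HammingSketch {
  // Views a byte[] as longs. Byte order does not change the number of set bits.
  private static final VarHandle LONG_VIEW =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private HammingSketch() {}

  /** Counts the bits that differ between two equal-length byte arrays. */
  static int hammingDistance(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "length mismatch: " + a.length + " != " + b.length);
    }
    int distance = 0;
    int i = 0;
    // Main loop: XOR eight bytes at a time and popcount the result.
    int upper = a.length & ~(Long.BYTES - 1);
    for (; i < upper; i += Long.BYTES) {
      long x = (long) LONG_VIEW.get(a, i);
      long y = (long) LONG_VIEW.get(b, i);
      distance += Long.bitCount(x ^ y);
    }
    // Tail: the remaining 0..7 bytes, masked to undo sign extension.
    for (; i < a.length; i++) {
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }
}
```

Calling `HammingSketch.hammingDistance(a, b)` on two 9-byte arrays exercises both the 8-byte main loop and the single-byte tail. Whether this shape reproduces the throughput numbers above depends on the JIT and the hardware, so it should be benchmarked rather than assumed.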