uschindler commented on PR #13076: URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927626207
Hi, I modified the scalar variant like that: ```java @Override public int binaryHammingDistance(byte[] a, byte[] b) { int distance = 0, i = 0; for (; i < a.length + 1 - Long.BYTES; i += Long.BYTES) { distance += Long.bitCount(((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^ (long) BitUtil.VH_NATIVE_LONG.get(b, i)) & 0xFFFFFFFFFFFFFFFFL); } // int tail for (; i < a.length + 1 - Integer.BYTES; i += Integer.BYTES) { distance += Integer.bitCount(((int) BitUtil.VH_NATIVE_INT.get(a, i) ^ (int) BitUtil.VH_NATIVE_INT.get(b, i)) & 0xFFFFFFFF); } // byte tail for (; i < a.length; i++) { distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF); } return distance; } ``` This one only uses popcnt CPU instruction. I then ran your benchmakrk to compare the panama-vectorized one with my new implementation as above: ``` Benchmark (size) Mode Cnt Score Error Units VectorUtilBenchmark.binaryHammingDistanceScalar 1 thrpt 15 258,511 ± 17,969 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 128 thrpt 15 62,364 ± 0,723 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 207 thrpt 15 40,302 ± 0,703 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 256 thrpt 15 42,025 ± 0,891 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 300 thrpt 15 35,065 ± 3,125 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 512 thrpt 15 24,391 ± 1,987 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 702 thrpt 15 17,505 ± 0,684 ops/us VectorUtilBenchmark.binaryHammingDistanceScalar 1024 thrpt 15 13,806 ± 0,102 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 1 thrpt 15 231,651 ± 9,975 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 128 thrpt 15 16,760 ± 0,251 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 207 thrpt 15 10,317 ± 0,251 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 256 thrpt 15 8,887 ± 0,559 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 300 thrpt 15 7,466 ± 0,345 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 512 thrpt 15 4,706 ± 0,080 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 702 thrpt 15 3,062 ± 0,566 ops/us VectorUtilBenchmark.binaryHammingDistanceVector 1024 thrpt 15 2,404 ± 0,024 ops/us ```` Please not the SCALAR one here is the above variant. As this one four (!) times faster than the Panama variant, there is no need to have a panama-vectorized one and it looks like it is not working at all. I think the lookup table is a bad idea. To make it short: - Remove the impl and the table from VectorUtilSupport (scalar and vectorized) - Implement my above methoid directly in VectorUtil class This will be a short PR. Make sure to add more tests (there's only one for the VectorUtil method). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org