uschindler commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927626207

   Hi,
   I modified the scalar variant like that:
   ```java
     @Override
     public int binaryHammingDistance(byte[] a, byte[] b) {
       int distance = 0, i = 0;
       for (; i < a.length + 1 - Long.BYTES; i += Long.BYTES) {
         distance += Long.bitCount(((long) BitUtil.VH_NATIVE_LONG.get(a, i) ^ 
(long) BitUtil.VH_NATIVE_LONG.get(b, i)) & 0xFFFFFFFFFFFFFFFFL);
       }
       // int tail
       for (; i < a.length + 1 - Integer.BYTES; i += Integer.BYTES) {
         distance += Integer.bitCount(((int) BitUtil.VH_NATIVE_INT.get(a, i) ^ 
(int) BitUtil.VH_NATIVE_INT.get(b, i)) & 0xFFFFFFFF);
       }
       // byte tail
       for (; i < a.length; i++) {
         distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
       }
       return distance;
     }
   ```
   
   This one only uses popcnt CPU instruction. I then ran your benchmakrk to 
compare the panama-vectorized one with my new implementation as above:
   
   ```
   Benchmark                                        (size)   Mode  Cnt    Score 
   Error   Units
   VectorUtilBenchmark.binaryHammingDistanceScalar       1  thrpt   15  258,511 
± 17,969  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar     128  thrpt   15   62,364 
±  0,723  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar     207  thrpt   15   40,302 
±  0,703  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar     256  thrpt   15   42,025 
±  0,891  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar     300  thrpt   15   35,065 
±  3,125  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar     512  thrpt   15   24,391 
±  1,987  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar     702  thrpt   15   17,505 
±  0,684  ops/us
   VectorUtilBenchmark.binaryHammingDistanceScalar    1024  thrpt   15   13,806 
±  0,102  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector       1  thrpt   15  231,651 
±  9,975  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector     128  thrpt   15   16,760 
±  0,251  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector     207  thrpt   15   10,317 
±  0,251  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector     256  thrpt   15    8,887 
±  0,559  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector     300  thrpt   15    7,466 
±  0,345  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector     512  thrpt   15    4,706 
±  0,080  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector     702  thrpt   15    3,062 
±  0,566  ops/us
   VectorUtilBenchmark.binaryHammingDistanceVector    1024  thrpt   15    2,404 
±  0,024  ops/us
   ```` 
   
   Please not the SCALAR one here is the above variant. As this one four (!) 
times faster than the Panama variant, there is no need to have a 
panama-vectorized one and it looks like it is not working at all. I think the 
lookup table is a bad idea.
   
   To make it short:
   - Remove the impl and the table from VectorUtilSupport (scalar and 
vectorized)
   - Implement my above methoid directly in VectorUtil class
   
   This will be a short PR. Make sure to add more tests (there's only one for 
the VectorUtil method).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to