rmuir commented on PR #13076:
URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927739974
This seems to autovectorize just fine. I took Uwe's branch and dumped the
assembly on my AVX2 machine, and I see e.g. 256-bit XOR and population count
logic. I checked the logic in OpenJDK, and it will use `vpopcntdq` on AVX-512
if available, etc. So this solution is much better than explicit vector code,
because it will do the right thing depending on the CPU.
```
...
0.35% 0x00007fffe0141fa3: vmovdqu 0x10(%rax,%r8,1),%ymm9
0x00007fffe0141faa: vpxor 0x10(%rdx,%r8,1),%ymm9,%ymm9
0.04% 0x00007fffe0141fb1: movabs $0xf0f0f0f,%r8
0.35% 0x00007fffe0141fbb: vmovq %r8,%xmm10
0x00007fffe0141fc0: vpbroadcastd %xmm10,%ymm10
0x00007fffe0141fc5: vpsrlw $0x4,%ymm9,%ymm11
0x00007fffe0141fcb: vpand %ymm10,%ymm11,%ymm11
0.40% 0x00007fffe0141fd0: vpand %ymm10,%ymm9,%ymm10
0x00007fffe0141fd5: vmovdqu -0x59829d(%rip),%ymm12
# Stub::popcount_lut
; {external_word}
0x00007fffe0141fdd: vpshufb %ymm10,%ymm12,%ymm10
0.02% 0x00007fffe0141fe2: vpshufb %ymm11,%ymm12,%ymm11
0.48% 0x00007fffe0141fe7: vpaddb %ymm10,%ymm11,%ymm11
0x00007fffe0141fec: vpxor %ymm12,%ymm12,%ymm12
0x00007fffe0141ff1: vpsadbw %ymm12,%ymm11,%ymm10
0.07% 0x00007fffe0141ff6: vpermilps $0x8,%ymm10,%ymm9
0.35% 0x00007fffe0141ffc: vpermpd $0x8,%ymm9,%ymm9
0x00007fffe0142002: vpaddd %xmm9,%xmm1,%xmm1
...
```
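For readers who have not seen the branch, here is a minimal sketch of the kind of scalar loop that can compile down to code like the above. It is not the code from the PR: the class and method names are hypothetical, and it assumes the inputs are already available as `long[]`. The second method spells out in plain Java what the AVX2 sequence appears to be doing (mask with `0x0f0f0f0f`, look up each nibble in a 16-entry popcount table, add the two halves), purely as a reading aid.

```java
// Hypothetical sketch, not the PR's code. Names and the long[] input
// layout are assumptions made for illustration.
final class XorBitCountSketch {

    // Number of set bits for each 4-bit value 0..15, the same idea as the
    // lookup table referenced in the dump (Stub::popcount_lut).
    private static final byte[] NIBBLE_POPCOUNT =
        {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

    // Plain scalar loop: XOR the inputs and count the set bits.
    // The JIT is free to vectorize this however suits the CPU.
    static int xorBitCount(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Long.bitCount(a[i] ^ b[i]);
        }
        return distance;
    }

    // Reference version of the nibble-lookup trick, one byte at a time:
    // split each byte into low and high nibbles, look both up, and sum.
    static int xorBitCountNibbleLut(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            long x = a[i] ^ b[i];
            for (int shift = 0; shift < 64; shift += 8) {
                int octet = (int) (x >>> shift) & 0xFF;
                int lo = octet & 0x0F;          // vpand with the 0x0f mask
                int hi = (octet >>> 4) & 0x0F;  // shift right by 4, then mask
                distance += NIBBLE_POPCOUNT[lo] + NIBBLE_POPCOUNT[hi];
            }
        }
        return distance;
    }
}
```

Both methods return the same result for any pair of equal-length arrays. Only the first is something you would actually write; the second is just there to make the `vpshufb`/`vpaddb`/`vpsadbw` sequence easier to follow.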
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]