rmuir commented on PR #13076: URL: https://github.com/apache/lucene/pull/13076#issuecomment-1927739974
Seems to autovectorize just fine. I took Uwe's branch and dumped the assembly on my AVX2 machine, and I see e.g. 256-bit xor and population count logic. I checked the logic in OpenJDK, and it will use `vpopcntdq` on AVX-512 if available, etc. So this solution is much better than some explicit vector stuff, because it will do the right thing depending on the CPU.

```
...
  0.35%    0x00007fffe0141fa3:   vmovdqu 0x10(%rax,%r8,1),%ymm9
           0x00007fffe0141faa:   vpxor 0x10(%rdx,%r8,1),%ymm9,%ymm9
  0.04%    0x00007fffe0141fb1:   movabs $0xf0f0f0f,%r8
  0.35%    0x00007fffe0141fbb:   vmovq %r8,%xmm10
           0x00007fffe0141fc0:   vpbroadcastd %xmm10,%ymm10
           0x00007fffe0141fc5:   vpsrlw $0x4,%ymm9,%ymm11
           0x00007fffe0141fcb:   vpand %ymm10,%ymm11,%ymm11
  0.40%    0x00007fffe0141fd0:   vpand %ymm10,%ymm9,%ymm10
           0x00007fffe0141fd5:   vmovdqu -0x59829d(%rip),%ymm12        # Stub::popcount_lut ; {external_word}
           0x00007fffe0141fdd:   vpshufb %ymm10,%ymm12,%ymm10
  0.02%    0x00007fffe0141fe2:   vpshufb %ymm11,%ymm12,%ymm11
  0.48%    0x00007fffe0141fe7:   vpaddb %ymm10,%ymm11,%ymm11
           0x00007fffe0141fec:   vpxor %ymm12,%ymm12,%ymm12
           0x00007fffe0141ff1:   vpsadbw %ymm12,%ymm11,%ymm10
  0.07%    0x00007fffe0141ff6:   vpermilps $0x8,%ymm10,%ymm9
  0.35%    0x00007fffe0141ffc:   vpermpd $0x8,%ymm9,%ymm9
           0x00007fffe0142002:   vpaddd %xmm9,%xmm1,%xmm1
...
```
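
For readers without the branch at hand, the kind of plain scalar loop being discussed can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `XorBitCountSketch`, the method name, and the `VarHandle` setup are assumptions made for the example. Whether the JIT actually vectorizes such a loop depends on the JDK version and the CPU.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch: XOR two byte arrays and count the set bits,
// written as an ordinary loop with no explicit vector API calls.
final class XorBitCountSketch {
  // Reads 8 bytes of a byte[] as one long. Byte order does not affect
  // the bit count, so any fixed order works here.
  private static final VarHandle LONG_VIEW =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private XorBitCountSketch() {}

  /** Returns the number of bit positions in which a and b differ. */
  public static int xorBitCount(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("array lengths differ: " + a.length + " != " + b.length);
    }
    int distance = 0;
    int i = 0;
    // Main loop: 8 bytes per iteration, XOR followed by Long.bitCount.
    for (int upper = a.length & ~(Long.BYTES - 1); i < upper; i += Long.BYTES) {
      long x = (long) LONG_VIEW.get(a, i);
      long y = (long) LONG_VIEW.get(b, i);
      distance += Long.bitCount(x ^ y);
    }
    // Tail: remaining bytes, masked so sign extension adds no bits.
    for (; i < a.length; i++) {
      distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return distance;
  }
}
```

The point of keeping the loop scalar is the one made above: the JIT gets to pick the instruction sequence for the machine it runs on, so the same source can use a lookup-table popcount on AVX2 and a native vector popcount where the hardware has one.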