https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, interesting.  We even vectorize this with just -mavx512f, but end up
using vector(16) int alongside vector(8) long, with equality compares of
vector(16) int:

        vpcmpd  $0, %zmm7, %zmm0, %k2

According to the docs that's fine with AVX512F.  But then for both long and double
you need byte masks, so I wonder why kmovb isn't in AVX512F ...
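
For reference, here is a minimal sketch (not the PR's testcase; the names are
made up) of the kind of loop shape that can end up mixing a 32-bit condition
with 64-bit masked accesses, i.e. vector(16) int compares next to vector(8)
long data:

  typedef unsigned long long BITBOARD;

  /* Hypothetical reduction: the condition is computed on 32-bit ints while
     the conditionally accessed data is 64-bit, so the vectorized compare
     can produce a wider mask than each vector(8) long access consumes.  */
  void
  sketch (BITBOARD *acc, const BITBOARD *tab, const int *cond, int n)
  {
    for (int i = 0; i < n; i++)
      if (cond[i])
        acc[i] |= tab[i];
  }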

I will adjust the testcase to use only AVX512F and push the fix now.  I can't
reproduce the runfail in a different worktree.

Note I don't see all-zero masks, but

  vect_patt_22.11_6 = .MASK_LOAD (&MEM <BITBOARD[64]> [(void *)&KingSafetyMask1 + 8B], 64B, { -1, 0, 0, 0, 0, 0, 0, 0 });

could be optimized to movq $mem, %zmmN (only a single element, or more
generally only a power-of-two number of initial elements, is read).  Not sure
if the corresponding

  vect_patt_20.17_34 = .MASK_LOAD (&MEM <BITBOARD[64]> [(void *)&KingSafetyMask1 + -8B], 64B, { 0, 0, 0, 0, 0, 0, 0, -1 });

is worth optimizing to xor %zmmN, %zmmN and pinsr $MEM, %zmmN?  Eliding
constant masks might help to avoid STLF issues due to false dependences on
masked-out elements (IIRC all uarchs currently suffer from that).
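
Spelled with intrinsics (just an illustration of the suggested lowerings, not
what the vectorizer emits; the helper names are mine), the two constant-mask
loads above would be roughly:

  #include <immintrin.h>

  /* Mask { -1, 0, 0, 0, 0, 0, 0, 0 }: a plain 64-bit load zero-extended
     into the vector, i.e. the movq form -- no mask register needed.  */
  static inline __m512i
  load_first_lane (const long long *p)
  {
    return _mm512_zextsi128_si512 (_mm_loadl_epi64 ((const __m128i *) p));
  }

  /* Mask { 0, 0, 0, 0, 0, 0, 0, -1 }: zero the vector and insert the
     scalar into the last lane, i.e. the xor + pinsr form.  */
  static inline __m512i
  load_last_lane (const long long *p)
  {
    __m128i hi = _mm_set_epi64x (*p, 0);  /* scalar goes to the upper qword */
    return _mm512_inserti32x4 (_mm512_setzero_si512 (), hi, 3);
  }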

Note that even all-zero masks cannot be optimized on GIMPLE currently, since the
value of the masked-out lanes isn't well-defined there (we're working on that).
