https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, interesting.  We even vectorize this with just -mavx512f but end up using
vector(16) int besides vector(8) long and equality compares of vector(16) int:

  vpcmpd $0, %zmm7, %zmm0, %k2

which according to the docs is fine with AVX512F.  But then for both long and
double you need byte masks, so I wonder why kmovb isn't in AVX512F ...  I will
adjust the testcase to use only AVX512F and push the fix now.  I can't
reproduce the runfail in a different worktree.

Note I don't see all-zero masks, but

  vect_patt_22.11_6 = .MASK_LOAD (&MEM <BITBOARD[64]> [(void *)&KingSafetyMask1 + 8B], 64B, { -1, 0, 0, 0, 0, 0, 0, 0 });

could be optimized to movq $mem, %zmmN (that applies whenever just a single
element, or a power-of-two number of initial elements, is read).  I'm not sure
whether the corresponding

  vect_patt_20.17_34 = .MASK_LOAD (&MEM <BITBOARD[64]> [(void *)&KingSafetyMask1 + -8B], 64B, { 0, 0, 0, 0, 0, 0, 0, -1 });

is worth optimizing to xor %zmmN, %zmmN plus pinsr $MEM, %zmmN.  Eliding
constant masks might help to avoid STLF issues due to false dependences on
masked-out elements (IIRC all uarchs currently suffer from that).  Note that
even all-zero masks cannot be optimized on GIMPLE currently, since the value
of the masked-out lanes isn't well-defined there (we're working on that).
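For reference, a minimal sketch of the movq transform suggested for the first
.MASK_LOAD above, written with AVX-512F intrinsics.  This is not the PR's
testcase: the array name 'table', the offset and the function names are made
up for illustration, and _mm512_zextsi128_si512 is assumed to be available
(recent GCC headers provide it).

/* Sketch of the suggestion: with constant mask 0x01 only lane 0 is read, so
   the masked load can be done as a plain 8-byte load zero-extended into the
   zmm register, i.e. a single vmovq.  Compile with -O2 -mavx512f.  */
#include <immintrin.h>

extern long long table[64];            /* stand-in for KingSafetyMask1 */

/* What the vectorizer emits: a masked load with constant mask 0x01; the
   maskz form zeroes lanes 1..7.  */
__m512i masked_form (void)
{
  return _mm512_maskz_loadu_epi64 (0x01, &table[1]);
}

/* The suggested replacement: a scalar 8-byte load zero-extended to 512 bits,
   touching only the one element the mask allows, so there is no false
   store-to-load-forwarding dependence on the masked-out bytes.  */
__m512i movq_form (void)
{
  return _mm512_zextsi128_si512 (_mm_loadl_epi64 ((const __m128i *) &table[1]));
}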