https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88464

--- Comment #28 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #27)
> (In reply to Jakub Jelinek from comment #25)
> > Isn't ktestw and kortestw the same thing when both operands are the same
> > mask register?
> True, but kortestw is available with AVX512F, where ktestw is not.
> 
> (In reply to Jakub Jelinek from comment #26)
> > And the TARGET_AVX512F &&  looks incorrect, then we wouldn't be able to test
> > or cmp without -mavx512f.
> No, we fall to *cmp<mode>_ccno_1, which is compatible with CCZmode.

You're right, sorry for the noise.  Your patch looks good to me.

There is another issue though (I think about efficiency rather than
correctness), visible e.g. in avx512vl-pr88464-{1,3}.c.
E.g. in avx512vl-pr88464-3.c we have:
  if (mask__40.16_82 == { 0, 0 })
    goto <bb 7>; [100.00%]
  else
    goto <bb 6>; [20.00%]
in *.optimized, an attempt to jump around masked stores if the mask is all
zeros.

We emit:
;; if (mask__40.16_82 == { 0, 0 })

(insn 44 43 45 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 131 [ mask__40.16 ])
            (const_int 0 [0]))) -1
     (nil))

(jump_insn 45 44 0 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 0)
            (pc))) -1
     (int_list:REG_BR_PROB 1073741831 (nil)))
for this, i.e.
        kmovw   %k2, %r10d
        testb   %r10b, %r10b
        je      .L4
(that is without -mavx512dq; with -mavx512dq I guess it would be ktestb or
kortestb), but perhaps we should emit
kmovw %k2, %r10d; testb $3, %r10b; je .L4 instead?
If the setter is a compare that clears the higher bits, then it makes no
difference, but if we are e.g. looking at the low 2 or 4 bits of a 4 or 8 bit
mask, then it will do the masked store even if the 2 or 4 bits we care about
are clear and only some upper bits are set.
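
To make the difference concrete, here is a minimal C sketch of the two tests.
It is only a model of the flag computation, not GCC code: the helper names
mask_is_zero_full and mask_is_zero_low and the nlanes parameter are
hypothetical and introduced purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "testb %r10b, %r10b": ZF is set only when
   all 8 bits of the mask byte are zero. */
static bool mask_is_zero_full(uint8_t k)
{
    return k == 0;
}

/* Hypothetical model of "testb $imm, %r10b", where imm has only the low
   nlanes bits set (imm == 3 for the two-lane case discussed above).
   Bits above the live lanes are ignored. */
static bool mask_is_zero_low(uint8_t k, unsigned nlanes)
{
    assert(nlanes >= 1 && nlanes <= 8);
    uint8_t live = (uint8_t)((1u << nlanes) - 1u);
    return (k & live) == 0;
}
```

With a two-lane mask whose value happens to be 0x04, mask_is_zero_full
reports "not zero", so the branch around the masked stores is not taken,
while mask_is_zero_low(0x04, 2) reports "zero" and the stores could be
skipped. When the upper bits are already known to be clear, both helpers
agree.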
