[Bug c++/98348] GCC 10.2 AVX512 Mask regression from GCC 9

crazylht at gmail dot com via Gcc-bugs Mon, 21 Dec 2020 03:23:31 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348


--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #8)
> Created attachment 49806
> gcc11-pr98348.patch
> 
> So, if we go for GCC11 the way of pre-reload define_insn_and_split, this is
> some incremental untested progress on your patch (just the sse.md part of
> it).
> Changes:
> 1) it is undesirable to put SUBREGs on the SET_DEST side, as it prevents
> other optimizations later on
> 2) I don't see the point on the TARGET_AVX512BW ||, the insn in the end is
> plain AVX or AVX2 or SSE4* etc. one
> 3) handles also the const0 vector_all_ones order
> 4) for commutative cases allows any operand order, for others ensures the
> right

The pattern below should be equivalent to the const0 vector_all_ones order,
but the generic simplification code didn't simplify it, so I adjusted those
patterns slightly. I also changed ix86_expand_sse_movcc to generate a plain
NOT operator instead of gen_knot<mode>, which has UNSPEC_MASKOP inside, so
that combine can do the right thing.

Successfully matched this instruction:
(set (reg:V16QI 82 [ <retval> ])
    (vec_merge:V16QI (const_vector:V16QI [
                (const_int -1 [0xffffffffffffffff]) repeated x16
            ])
        (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])
        (not:HI (unspec:HI [
                    (reg:V16QI 89)
                    (reg:V16QI 90)
                    (const_int 4 [0x4])
                ] UNSPEC_PCMP))))

> one of the operands is register
> 5) handles also the LE case by swapping the comparison operands
> 
> The patch doesn't handle the cases where based on the comparison one sets up
> floating vectors, as can be seen e.g. in:
> typedef float V128 __attribute__ ((vector_size(16)));
> typedef float V256 __attribute__ ((vector_size(32)));
> typedef float V512 __attribute__ ((vector_size(64)));
> 
> V128
> foo (V128 x)
> {
>   const union U { unsigned u; float f; } u = { -1U };
>   return x > 0.0f ? u.f : 0.0f;
> }
> 
> V256
> bar (V256 x)
> {
>   const union U { unsigned u; float f; } u = { -1U };
>   return x > 0.0f ? u.f : 0.0f;
> }

I'm adding a new predicate named float_vector_all_ones_operand and a
corresponding define_insn_and_split to handle it.

Successfully matched this instruction:
(set (reg:V4SF 82 [ <retval> ])
    (vec_merge:V4SF (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128])
        (const_vector:V4SF [
                (const_double:SF 0.0 [0x0.0p+0]) repeated x4
            ])
        (unspec:QI [
                (reg:V4SF 84)
                (reg:V4SF 88)
                (const_int 1 [0x1])
            ] UNSPEC_PCMP)))
