On Sun, May 24, 2020 at 9:26 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Sat, May 23, 2020 at 6:11 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Sat, May 23, 2020 at 9:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > Hi:
> > >   This patch fix non-conforming expander for
> > > floatv2div2sf2,floatunsv2div2sf2,fix_truncv2sfv2di,fixuns_truncv2sfv2di,
> > > refer to PR95211, PR95256.
> > >   bootstrap ok, regression test on i386/x86-64 backend is ok.
> > >
> > > gcc/ChangeLog:
> > >         PR target/95211 PR target/95256
> >
> Changed.
> > Please put every PR reference in a separate line.
> >
> > >         * config/i386/sse.md <floatunssuffix>v2div2sf2): New expander.
> > >         (fix<fixunssuffix>_truncv2sfv2di2): Ditto.
> > >         (float<floatunssuffix>v2div2sf2_internal): Renaming from
> > >         float<floatunssuffix>v2div2sf2.
> > >         (fix<fixunssuffix>_truncv2sfv2di2<mask_name>_internal):
> >
> > The convention throughout sse,md is to prefix a standard pattern that
> > is used through builtins with avx512<something>_ instead of suffixing
> > the pattern name with _internal.
> >
> Changed.
> > >         Renaming from fix<fixunssuffix>_truncv2sfv2di2<mask_name>.
> > >         (vec_pack<floatprefix>_float_<mode>): Adjust icode name.
> > >         (vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto.
> > >         * config/i386/i386-builtin.def: Ditto.
> >
> > Uros.
>
> Update patch.

The patch is wrong, and the correct way to fix these patterns is more complex:

a) the pattern should not access register in mode, narrower than 128
bits, as this implies MMX register in non-TARGET-MMX-WITH-SSE targets.
So, the correct way to define insn with narrow mode is to use
vec_select, something like:

(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
    (any_extend:V8HI
      (vec_select:V8QI
        (match_operand:V16QI 1 "register_operand" "Yr,*x,v")
        (parallel [(const_int 0) (const_int 1)
               (const_int 2) (const_int 3)
               (const_int 4) (const_int 5)
               (const_int 6) (const_int 7)]))))]

The instruction accesses the memory in the correct mode, so the memory
operand is:

(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
    (any_extend:V8HI
      (match_operand:V8QI 1 "memory_operand" "m,m,m")))]

and a pre-reload split has to be introduced to convert insn from
register form to memory form, when memory gets propagated to the insn:

(define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2"
  [(set (match_operand:V8HI 0 "register_operand")
    (any_extend:V8HI
      (vec_select:V8QI
        (subreg:V16QI
          (vec_concat:V2DI
            (match_operand:DI 1 "memory_operand")
        (const_int 0)) 0)
        (parallel [(const_int 0) (const_int 1)
               (const_int 2) (const_int 3)
               (const_int 4) (const_int 5)
               (const_int 6) (const_int 7)]))))]

For a middle end to use this insn, an expander is used:

(define_expand "<code>v8qiv8hi2"
  [(set (match_operand:V8HI 0 "register_operand")
    (any_extend:V8HI
      (match_operand:V8QI 1 "nonimmediate_operand")))]

b) Similar approach is used when an output is narrower than 128 bits:

(define_insn "*float<floatunssuffix>v2div2sf2"
  [(set (match_operand:V4SF 0 "register_operand" "=v")
    (vec_concat:V4SF
        (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
        (match_operand:V2SF 2 "const0_operand" "C")))]

In your concrete case,

(define_insn "fix<fixunssuffix>_truncv2sfv2di2<mask_name>"
  [(set (match_operand:V2DI 0 "register_operand" "=v")
    (any_fix:V2DI
      (vec_select:V2SF
        (match_operand:V4SF 1 "nonimmediate_operand" "vm")
        (parallel [(const_int 0) (const_int 1)]))))]

is already _NOT_ defined in a correct way as far as memory operand is
concerned, see a) above. But, <shrug>, we will apparently have to live
with that. The problem is, that it is named as a standard named
pattern, so middle-end discovers it and tries to use it. It should be
renamed with avx512dq_... prefix. Let's give middle-end something
correct, similar to:

(define_expand "<code>v8qiv8hi2"
  [(set (match_operand:V8HI 0 "register_operand")
    (any_extend:V8HI
      (match_operand:V8QI 1 "nonimmediate_operand")))]
  "TARGET_SSE4_1"
{
  if (!MEM_P (operands[1]))
    {
      operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
      emit_insn (gen_sse4_1_<code>v8qiv8hi2 (operands[0], operands[1]));
      DONE;
    }
})

The second case is with v2sf output, less than 128 bits wide:

(define_insn "*float<floatunssuffix>v2div2sf2"
  [(set (match_operand:V4SF 0 "register_operand" "=v")
    (vec_concat:V4SF
        (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
        (match_operand:V2SF 2 "const0_operand" "C")))]

The above insn pattern is OK, we access the output register with
128bit access, so we are sure no MMX reg will be generated. The
problem is with the existing expander

(define_expand "float<floatunssuffix>v2div2sf2"
  [(set (match_operand:V4SF 0 "register_operand" "=v")
    (vec_concat:V4SF
        (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
        (match_dup 2)))]
  "TARGET_AVX512DQ && TARGET_AVX512VL"
  "operands[2] = CONST0_RTX (V2SFmode);")

again, this expander has standard name, and is discovered by
middle-end. Unfortunatelly, it doesn't have conforming operand type.
So, rename it with avx512dq_ prefix, since it is needed by builtins.

We have to introduce a new expander, that will have conforming mode of
output operand (V2SF) and will produce RTX that will match
*float<floatunssuffix>v2div2sf2. A paradoxical output subreg from
V2SFmode V4SFmode is needed, generated by simplify_gen_subreg as is
the case with paradoxical input subreg.

Uros.

Reply via email to