On Mon, May 25, 2020 at 1:55 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sun, May 24, 2020 at 9:26 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Sat, May 23, 2020 at 6:11 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Sat, May 23, 2020 at 9:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > >
> > > > Hi:
> > > >   This patch fix non-conforming expander for
> > > > floatv2div2sf2,floatunsv2div2sf2,fix_truncv2sfv2di,fixuns_truncv2sfv2di,
> > > > refer to PR95211, PR95256.
> > > >   bootstrap ok, regression test on i386/x86-64 backend is ok.
> > > >
> > > > gcc/ChangeLog:
> > > >         PR target/95211 PR target/95256
> > >
> > Changed.
> > > Please put every PR reference in a separate line.
> > >
> > > >         * config/i386/sse.md <floatunssuffix>v2div2sf2): New expander.
> > > >         (fix<fixunssuffix>_truncv2sfv2di2): Ditto.
> > > >         (float<floatunssuffix>v2div2sf2_internal): Renaming from
> > > >         float<floatunssuffix>v2div2sf2.
> > > >         (fix<fixunssuffix>_truncv2sfv2di2<mask_name>_internal):
> > >
> > > The convention throughout sse,md is to prefix a standard pattern that
> > > is used through builtins with avx512<something>_ instead of suffixing
> > > the pattern name with _internal.
> > >
> > Changed.
> > > >         Renaming from fix<fixunssuffix>_truncv2sfv2di2<mask_name>.
> > > >         (vec_pack<floatprefix>_float_<mode>): Adjust icode name.
> > > >         (vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto.
> > > >         * config/i386/i386-builtin.def: Ditto.
> > >
> > > Uros.
> >
> > Update patch.
>
> The patch is wrong, and the correct way to fix these patterns is more complex:
>
> a) the pattern should not access register in mode, narrower than 128
> bits, as this implies MMX register in non-TARGET-MMX-WITH-SSE targets.

It seems there are some patterns in sse.md not obey this rule.
i.e:
(define_insn "sse_storehps"
  [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,v,v")
        (vec_select:V2SF
          (match_operand:V4SF 1 "nonimmediate_operand" "v,v,o")
          (parallel [(const_int 2) (const_int 3)])))]
  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "@
   %vmovhps\t{%1, %0|%q0, %1}
   %vmovhlps\t{%1, %d0|%d0, %1}
   %vmovlps\t{%H1, %d0|%d0, %H1}"
  [(set_attr "type" "ssemov")
   (set_attr "prefix" "maybe_vex")
   (set_attr "mode" "V2SF,V4SF,V2SF")])

(define_insn "sse_storelps"
  [(set (match_operand:V2SF 0 "nonimmediate_operand"   "=m,v,v")
        (vec_select:V2SF
          (match_operand:V4SF 1 "nonimmediate_operand" " v,v,m")
          (parallel [(const_int 0) (const_int 1)])))]
  "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "@
   %vmovlps\t{%1, %0|%q0, %1}
   %vmovaps\t{%1, %0|%0, %1}
   %vmovlps\t{%1, %d0|%d0, %q1}"
  [(set_attr "type" "ssemov")
   (set_attr "prefix" "maybe_vex")
   (set_attr "mode" "V2SF,V4SF,V2SF")])

Should they be restricted under TARGET_MMX_WITH_SSE or is there
anything i missed?

> So, the correct way to define insn with narrow mode is to use
> vec_select, something like:
>
> (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
>   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
>     (any_extend:V8HI
>       (vec_select:V8QI
>         (match_operand:V16QI 1 "register_operand" "Yr,*x,v")
>         (parallel [(const_int 0) (const_int 1)
>                (const_int 2) (const_int 3)
>                (const_int 4) (const_int 5)
>                (const_int 6) (const_int 7)]))))]
>
> The instruction accesses the memory in the correct mode, so the memory
> operand is:
>
> (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1"
>   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
>     (any_extend:V8HI
>       (match_operand:V8QI 1 "memory_operand" "m,m,m")))]
>
> and a pre-reload split has to be introduced to convert insn from
> register form to memory form, when memory gets propagated to the insn:
>
> (define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2"
>   [(set (match_operand:V8HI 0 "register_operand")
>     (any_extend:V8HI
>       (vec_select:V8QI
>         (subreg:V16QI
>           (vec_concat:V2DI
>             (match_operand:DI 1 "memory_operand")
>         (const_int 0)) 0)
>         (parallel [(const_int 0) (const_int 1)
>                (const_int 2) (const_int 3)
>                (const_int 4) (const_int 5)
>                (const_int 6) (const_int 7)]))))]
>
> For a middle end to use this insn, an expander is used:
>
> (define_expand "<code>v8qiv8hi2"
>   [(set (match_operand:V8HI 0 "register_operand")
>     (any_extend:V8HI
>       (match_operand:V8QI 1 "nonimmediate_operand")))]
>
> b) Similar approach is used when an output is narrower than 128 bits:
>
> (define_insn "*float<floatunssuffix>v2div2sf2"
>   [(set (match_operand:V4SF 0 "register_operand" "=v")
>     (vec_concat:V4SF
>         (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
>         (match_operand:V2SF 2 "const0_operand" "C")))]
>
> In your concrete case,
>
> (define_insn "fix<fixunssuffix>_truncv2sfv2di2<mask_name>"
>   [(set (match_operand:V2DI 0 "register_operand" "=v")
>     (any_fix:V2DI
>       (vec_select:V2SF
>         (match_operand:V4SF 1 "nonimmediate_operand" "vm")
>         (parallel [(const_int 0) (const_int 1)]))))]
>
> is already _NOT_ defined in a correct way as far as memory operand is
> concerned, see a) above. But, <shrug>, we will apparently have to live
> with that. The problem is, that it is named as a standard named
> pattern, so middle-end discovers it and tries to use it. It should be
> renamed with avx512dq_... prefix. Let's give middle-end something
> correct, similar to:
>
> (define_expand "<code>v8qiv8hi2"
>   [(set (match_operand:V8HI 0 "register_operand")
>     (any_extend:V8HI
>       (match_operand:V8QI 1 "nonimmediate_operand")))]
>   "TARGET_SSE4_1"
> {
>   if (!MEM_P (operands[1]))
>     {
>       operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0);
>       emit_insn (gen_sse4_1_<code>v8qiv8hi2 (operands[0], operands[1]));
>       DONE;
>     }
> })
>
> The second case is with v2sf output, less than 128 bits wide:
>
> (define_insn "*float<floatunssuffix>v2div2sf2"
>   [(set (match_operand:V4SF 0 "register_operand" "=v")
>     (vec_concat:V4SF
>         (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
>         (match_operand:V2SF 2 "const0_operand" "C")))]
>
> The above insn pattern is OK, we access the output register with
> 128bit access, so we are sure no MMX reg will be generated. The
> problem is with the existing expander
>
> (define_expand "float<floatunssuffix>v2div2sf2"
>   [(set (match_operand:V4SF 0 "register_operand" "=v")
>     (vec_concat:V4SF
>         (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
>         (match_dup 2)))]
>   "TARGET_AVX512DQ && TARGET_AVX512VL"
>   "operands[2] = CONST0_RTX (V2SFmode);")
>
> again, this expander has standard name, and is discovered by
> middle-end. Unfortunatelly, it doesn't have conforming operand type.
> So, rename it with avx512dq_ prefix, since it is needed by builtins.
>
> We have to introduce a new expander, that will have conforming mode of
> output operand (V2SF) and will produce RTX that will match
> *float<floatunssuffix>v2div2sf2. A paradoxical output subreg from
> V2SFmode V4SFmode is needed, generated by simplify_gen_subreg as is
> the case with paradoxical input subreg.
>
> Uros.



-- 
BR,
Hongtao

Reply via email to