On Mon, May 25, 2020 at 1:55 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Sun, May 24, 2020 at 9:26 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > On Sat, May 23, 2020 at 6:11 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > On Sat, May 23, 2020 at 9:25 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > > > Hi: > > > > This patch fix non-conforming expander for > > > > floatv2div2sf2,floatunsv2div2sf2,fix_truncv2sfv2di,fixuns_truncv2sfv2di, > > > > refer to PR95211, PR95256. > > > > bootstrap ok, regression test on i386/x86-64 backend is ok. > > > > > > > > gcc/ChangeLog: > > > > PR target/95211 PR target/95256 > > > > > Changed. > > > Please put every PR reference in a separate line. > > > > > > > * config/i386/sse.md <floatunssuffix>v2div2sf2): New expander. > > > > (fix<fixunssuffix>_truncv2sfv2di2): Ditto. > > > > (float<floatunssuffix>v2div2sf2_internal): Renaming from > > > > float<floatunssuffix>v2div2sf2. > > > > (fix<fixunssuffix>_truncv2sfv2di2<mask_name>_internal): > > > > > > The convention throughout sse,md is to prefix a standard pattern that > > > is used through builtins with avx512<something>_ instead of suffixing > > > the pattern name with _internal. > > > > > Changed. > > > > Renaming from fix<fixunssuffix>_truncv2sfv2di2<mask_name>. > > > > (vec_pack<floatprefix>_float_<mode>): Adjust icode name. > > > > (vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto. > > > > * config/i386/i386-builtin.def: Ditto. > > > > > > Uros. > > > > Update patch. > > The patch is wrong, and the correct way to fix these patterns is more complex: > > a) the pattern should not access register in mode, narrower than 128 > bits, as this implies MMX register in non-TARGET-MMX-WITH-SSE targets.
It seems there are some patterns in sse.md not obey this rule. i.e: (define_insn "sse_storehps" [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,v,v") (vec_select:V2SF (match_operand:V4SF 1 "nonimmediate_operand" "v,v,o") (parallel [(const_int 2) (const_int 3)])))] "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))" "@ %vmovhps\t{%1, %0|%q0, %1} %vmovhlps\t{%1, %d0|%d0, %1} %vmovlps\t{%H1, %d0|%d0, %H1}" [(set_attr "type" "ssemov") (set_attr "prefix" "maybe_vex") (set_attr "mode" "V2SF,V4SF,V2SF")]) (define_insn "sse_storelps" [(set (match_operand:V2SF 0 "nonimmediate_operand" "=m,v,v") (vec_select:V2SF (match_operand:V4SF 1 "nonimmediate_operand" " v,v,m") (parallel [(const_int 0) (const_int 1)])))] "TARGET_SSE && !(MEM_P (operands[0]) && MEM_P (operands[1]))" "@ %vmovlps\t{%1, %0|%q0, %1} %vmovaps\t{%1, %0|%0, %1} %vmovlps\t{%1, %d0|%d0, %q1}" [(set_attr "type" "ssemov") (set_attr "prefix" "maybe_vex") (set_attr "mode" "V2SF,V4SF,V2SF")]) Should they be restricted under TARGET_MMX_WITH_SSE or is there anything i missed? > So, the correct way to define insn with narrow mode is to use > vec_select, something like: > > (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > (any_extend:V8HI > (vec_select:V8QI > (match_operand:V16QI 1 "register_operand" "Yr,*x,v") > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) > (const_int 6) (const_int 7)]))))] > > The instruction accesses the memory in the correct mode, so the memory > operand is: > > (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > (any_extend:V8HI > (match_operand:V8QI 1 "memory_operand" "m,m,m")))] > > and a pre-reload split has to be introduced to convert insn from > register form to memory form, when memory gets propagated to the insn: > > (define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2" > [(set (match_operand:V8HI 0 "register_operand") > (any_extend:V8HI > (vec_select:V8QI > (subreg:V16QI > (vec_concat:V2DI > (match_operand:DI 1 "memory_operand") > (const_int 0)) 0) > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) > (const_int 6) (const_int 7)]))))] > > For a middle end to use this insn, an expander is used: > > (define_expand "<code>v8qiv8hi2" > [(set (match_operand:V8HI 0 "register_operand") > (any_extend:V8HI > (match_operand:V8QI 1 "nonimmediate_operand")))] > > b) Similar approach is used when an output is narrower than 128 bits: > > (define_insn "*float<floatunssuffix>v2div2sf2" > [(set (match_operand:V4SF 0 "register_operand" "=v") > (vec_concat:V4SF > (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm")) > (match_operand:V2SF 2 "const0_operand" "C")))] > > In your concrete case, > > (define_insn "fix<fixunssuffix>_truncv2sfv2di2<mask_name>" > [(set (match_operand:V2DI 0 "register_operand" "=v") > (any_fix:V2DI > (vec_select:V2SF > (match_operand:V4SF 1 "nonimmediate_operand" "vm") > (parallel [(const_int 0) (const_int 1)]))))] > > is already _NOT_ defined in a correct way as far as memory operand is > concerned, see a) above. But, <shrug>, we will apparently have to live > with that. The problem is, that it is named as a standard named > pattern, so middle-end discovers it and tries to use it. It should be > renamed with avx512dq_... prefix. Let's give middle-end something > correct, similar to: > > (define_expand "<code>v8qiv8hi2" > [(set (match_operand:V8HI 0 "register_operand") > (any_extend:V8HI > (match_operand:V8QI 1 "nonimmediate_operand")))] > "TARGET_SSE4_1" > { > if (!MEM_P (operands[1])) > { > operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0); > emit_insn (gen_sse4_1_<code>v8qiv8hi2 (operands[0], operands[1])); > DONE; > } > }) > > The second case is with v2sf output, less than 128 bits wide: > > (define_insn "*float<floatunssuffix>v2div2sf2" > [(set (match_operand:V4SF 0 "register_operand" "=v") > (vec_concat:V4SF > (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm")) > (match_operand:V2SF 2 "const0_operand" "C")))] > > The above insn pattern is OK, we access the output register with > 128bit access, so we are sure no MMX reg will be generated. The > problem is with the existing expander > > (define_expand "float<floatunssuffix>v2div2sf2" > [(set (match_operand:V4SF 0 "register_operand" "=v") > (vec_concat:V4SF > (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm")) > (match_dup 2)))] > "TARGET_AVX512DQ && TARGET_AVX512VL" > "operands[2] = CONST0_RTX (V2SFmode);") > > again, this expander has standard name, and is discovered by > middle-end. Unfortunatelly, it doesn't have conforming operand type. > So, rename it with avx512dq_ prefix, since it is needed by builtins. > > We have to introduce a new expander, that will have conforming mode of > output operand (V2SF) and will produce RTX that will match > *float<floatunssuffix>v2div2sf2. A paradoxical output subreg from > V2SFmode V4SFmode is needed, generated by simplify_gen_subreg as is > the case with paradoxical input subreg. > > Uros. -- BR, Hongtao