On Sun, May 24, 2020 at 9:26 AM Hongtao Liu <crazy...@gmail.com> wrote: > > On Sat, May 23, 2020 at 6:11 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Sat, May 23, 2020 at 9:25 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > Hi: > > > This patch fix non-conforming expander for > > > floatv2div2sf2,floatunsv2div2sf2,fix_truncv2sfv2di,fixuns_truncv2sfv2di, > > > refer to PR95211, PR95256. > > > bootstrap ok, regression test on i386/x86-64 backend is ok. > > > > > > gcc/ChangeLog: > > > PR target/95211 PR target/95256 > > > Changed. > > Please put every PR reference in a separate line. > > > > > * config/i386/sse.md <floatunssuffix>v2div2sf2): New expander. > > > (fix<fixunssuffix>_truncv2sfv2di2): Ditto. > > > (float<floatunssuffix>v2div2sf2_internal): Renaming from > > > float<floatunssuffix>v2div2sf2. > > > (fix<fixunssuffix>_truncv2sfv2di2<mask_name>_internal): > > > > The convention throughout sse,md is to prefix a standard pattern that > > is used through builtins with avx512<something>_ instead of suffixing > > the pattern name with _internal. > > > Changed. > > > Renaming from fix<fixunssuffix>_truncv2sfv2di2<mask_name>. > > > (vec_pack<floatprefix>_float_<mode>): Adjust icode name. > > > (vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto. > > > * config/i386/i386-builtin.def: Ditto. > > > > Uros. > > Update patch.
The patch is wrong, and the correct way to fix these patterns is more complex: a) the pattern should not access register in mode, narrower than 128 bits, as this implies MMX register in non-TARGET-MMX-WITH-SSE targets. So, the correct way to define insn with narrow mode is to use vec_select, something like: (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") (any_extend:V8HI (vec_select:V8QI (match_operand:V16QI 1 "register_operand" "Yr,*x,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] The instruction accesses the memory in the correct mode, so the memory operand is: (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1" [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") (any_extend:V8HI (match_operand:V8QI 1 "memory_operand" "m,m,m")))] and a pre-reload split has to be introduced to convert insn from register form to memory form, when memory gets propagated to the insn: (define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI (vec_select:V8QI (subreg:V16QI (vec_concat:V2DI (match_operand:DI 1 "memory_operand") (const_int 0)) 0) (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] For a middle end to use this insn, an expander is used: (define_expand "<code>v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI (match_operand:V8QI 1 "nonimmediate_operand")))] b) Similar approach is used when an output is narrower than 128 bits: (define_insn "*float<floatunssuffix>v2div2sf2" [(set (match_operand:V4SF 0 "register_operand" "=v") (vec_concat:V4SF (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm")) (match_operand:V2SF 2 "const0_operand" "C")))] In your concrete case, (define_insn "fix<fixunssuffix>_truncv2sfv2di2<mask_name>" [(set (match_operand:V2DI 0 "register_operand" "=v") (any_fix:V2DI (vec_select:V2SF (match_operand:V4SF 1 "nonimmediate_operand" "vm") (parallel [(const_int 0) (const_int 1)]))))] is already _NOT_ defined in a correct way as far as memory operand is concerned, see a) above. But, <shrug>, we will apparently have to live with that. The problem is, that it is named as a standard named pattern, so middle-end discovers it and tries to use it. It should be renamed with avx512dq_... prefix. Let's give middle-end something correct, similar to: (define_expand "<code>v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI (match_operand:V8QI 1 "nonimmediate_operand")))] "TARGET_SSE4_1" { if (!MEM_P (operands[1])) { operands[1] = simplify_gen_subreg (V16QImode, operands[1], V8QImode, 0); emit_insn (gen_sse4_1_<code>v8qiv8hi2 (operands[0], operands[1])); DONE; } }) The second case is with v2sf output, less than 128 bits wide: (define_insn "*float<floatunssuffix>v2div2sf2" [(set (match_operand:V4SF 0 "register_operand" "=v") (vec_concat:V4SF (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm")) (match_operand:V2SF 2 "const0_operand" "C")))] The above insn pattern is OK, we access the output register with 128bit access, so we are sure no MMX reg will be generated. The problem is with the existing expander (define_expand "float<floatunssuffix>v2div2sf2" [(set (match_operand:V4SF 0 "register_operand" "=v") (vec_concat:V4SF (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm")) (match_dup 2)))] "TARGET_AVX512DQ && TARGET_AVX512VL" "operands[2] = CONST0_RTX (V2SFmode);") again, this expander has standard name, and is discovered by middle-end. Unfortunatelly, it doesn't have conforming operand type. So, rename it with avx512dq_ prefix, since it is needed by builtins. We have to introduce a new expander, that will have conforming mode of output operand (V2SF) and will produce RTX that will match *float<floatunssuffix>v2div2sf2. A paradoxical output subreg from V2SFmode V4SFmode is needed, generated by simplify_gen_subreg as is the case with paradoxical input subreg. Uros.