On Fri, Oct 26, 2018 at 8:07 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > Many x86 pmovzx/pmovsx instructions with memory operands are modeled in > a wrong way. For example: > > (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > (any_extend:V8HI > (vec_select:V8QI > (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm") > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) > (const_int 6) (const_int 7)]))))] > > should be defind for memory operands as: > > (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > (any_extend:V8HI > (match_operand:V8QI "memory_operand" "m,m,m")))] > > This set of patches updates them to > > (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > (any_extend:V8HI > (vec_select:V8QI > (match_operand:V16QI 1 "nonimmediate_operand" "Yr,*x,v") > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) > (const_int 6) (const_int 7)]))))] > > (define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > (any_extend:V8HI > (match_operand:V8QI "subreg_memory_operand" "m,m,m")))] > > with a splitter: > > (define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2" > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
No constraints needed for pre-reload splitter. > (any_extend:V8HI > (vec_select:V8QI > (subreg:V16QI > (vec_concat:V2DI > (match_operand:DI 1 "memory_operand" "m,*m,m") > (const_int 0)) 0) > (parallel [(const_int 0) (const_int 1) > (const_int 2) (const_int 3) > (const_int 4) (const_int 5) > (const_int 6) (const_int 7)]))))] > "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>" > "#" > "&& can_create_pseudo_p ()" > [(set (match_dup 0) (match_dup 1))] [(set (match_dup 0) (any_extend:V8HI (match_dup 1)))] > { > operands[1] = gen_rtx_<CODE> (V8HImode, > gen_rtx_SUBREG (V8QImode, > operands[1], 0)); > }) Don't create subregs of memory. Use adjust_address_nv. Uros.