Many x86 pmovzx/pmovsx instructions with memory operands are modeled incorrectly. For example:

(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (vec_select:V8QI
            (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)
                       (const_int 4) (const_int 5)
                       (const_int 6) (const_int 7)]))))]

should be defined for memory operands as:

(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (match_operand:V8QI 1 "memory_operand" "m,m,m")))]

This set of patches updates them to:

(define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (vec_select:V8QI
            (match_operand:V16QI 1 "nonimmediate_operand" "Yr,*x,v")
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)
                       (const_int 4) (const_int 5)
                       (const_int 6) (const_int 7)]))))]

(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>_1"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (match_operand:V8QI 1 "subreg_memory_operand" "m,m,m")))]

with a splitter:

(define_insn_and_split "*sse4_1_<code>v8qiv8hi2<mask_name>_2"
  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
          (vec_select:V8QI
            (subreg:V16QI
              (vec_concat:V2DI
                (match_operand:DI 1 "memory_operand" "m,*m,m")
                (const_int 0)) 0)
            (parallel [(const_int 0) (const_int 1)
                       (const_int 2) (const_int 3)
                       (const_int 4) (const_int 5)
                       (const_int 6) (const_int 7)]))))]
  "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
  "#"
  "&& can_create_pseudo_p ()"
  [(set (match_dup 0) (match_dup 1))]
{
  operands[1] = gen_rtx_<CODE> (V8HImode,
                                gen_rtx_SUBREG (V8QImode, operands[1], 0));
})

It also contains a patch to update apply_subst_iterator to handle
define_insn_and_split.

H.J. Lu (2):
  apply_subst_iterator: Handle define_insn_and_split
  x86: Add pmovzx/pmovsx patterns with memory operands

 gcc/config/i386/predicates.md              |  30 ++
 gcc/config/i386/sse.md                     | 323 ++++++++++++++++++++-
 gcc/read-rtl.c                             |   6 +-
 gcc/testsuite/gcc.target/i386/pr87317-1.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-10.c |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-11.c |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-12.c |  22 ++
 gcc/testsuite/gcc.target/i386/pr87317-13.c |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-2.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-3.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-4.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-5.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-6.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-7.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-8.c  |  14 +
 gcc/testsuite/gcc.target/i386/pr87317-9.c  |  14 +
 16 files changed, 535 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-9.c

--
2.17.2
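
As an illustration of why the memory form wants a V8QI operand rather than a V16QI one, here is a rough scalar model in C of what the byte-to-word variants compute when the source is in memory. It is only a sketch and not taken from the patch or its tests: the names model_pmovzxbw and model_pmovsxbw are hypothetical. The point it tries to show is that the memory source is 8 bytes wide, so only 64 bits are loaded before extension.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of pmovzxbw with a memory source:
   load exactly 8 bytes (a V8QI value) and zero-extend each byte
   to 16 bits.  Not part of the patch.  */
static void
model_pmovzxbw (uint16_t dst[8], const void *src)
{
  uint8_t bytes[8];
  memcpy (bytes, src, sizeof bytes);   /* 64-bit load, not 128-bit.  */
  for (int i = 0; i < 8; i++)
    dst[i] = (uint16_t) bytes[i];
}

/* Hypothetical scalar model of pmovsxbw: the same 8-byte load,
   with each byte sign-extended to 16 bits.  */
static void
model_pmovsxbw (int16_t dst[8], const void *src)
{
  int8_t bytes[8];
  memcpy (bytes, src, sizeof bytes);   /* 64-bit load, not 128-bit.  */
  for (int i = 0; i < 8; i++)
    dst[i] = (int16_t) bytes[i];
}
```

In this model the two functions correspond to the zero_extend and sign_extend cases of any_extend applied directly to a V8QI memory operand, which is the shape the "_1" pattern above is meant to express.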