https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---

(define_expand "<plusminus_insn><mode>3"
  [(set (match_operand:VI_AVX2 0 "register_operand")
        (plusminus:VI_AVX2
          (match_operand:VI_AVX2 1 "vector_operand")
          (match_operand:VI_AVX2 2 "vector_operand")))]
  "TARGET_SSE2"
  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")

so maybe things can be fixed up in ix86_fixup_binary_operands, which doesn't
seem to consider subregs in any way.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 248482)
+++ gcc/config/i386/i386.c      (working copy)
@@ -21270,6 +21270,11 @@ ix86_fixup_binary_operands (enum rtx_cod
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);

+  if (SUBREG_P (src1) && SUBREG_BYTE (src1) != 0)
+    src1 = force_reg (mode, src1);
+  if (SUBREG_P (src2) && SUBREG_BYTE (src2) != 0)
+    src2 = force_reg (mode, src2);
+
   /* Improve address combine. */
   if (code == PLUS
       && GET_MODE_CLASS (mode) == MODE_INT

doesn't help though.  pre-LRA:

(insn 19 16 20 4 (set (reg:V4SI 103)
        (subreg:V4SI (reg:V8SI 90 [ vect_sum_11.6 ]) 16)) 1222 {movv4si_internal}
     (nil))
(insn 20 19 21 4 (set (reg:V4SI 98 [ _29 ])
        (plus:V4SI (reg:V4SI 103)
            (subreg:V4SI (reg:V8SI 90 [ vect_sum_11.6 ]) 0))) 2990 {*addv4si3}
     (expr_list:REG_DEAD (reg:V4SI 103)
        (expr_list:REG_DEAD (reg:V8SI 90 [ vect_sum_11.6 ])
            (nil))))

Of course, LRA not splitting live ranges when spilling (and thus forcing a
spill inside the loop) doesn't help either.  But we really don't want to
spill...

         Choosing alt 2 in insn 19:  (0) v  (1) vm {movv4si_internal}
            2 Non pseudo reload: reject++
          alt=1,overall=1,losers=0,rld_nregs=0
         Choosing alt 1 in insn 20:  (0) v  (1) v  (2) vm {*addv4si3}
          alt=1,overall=0,losers=0,rld_nregs=0
         Choosing alt 2 in insn 19:  (0) v  (1) vm {movv4si_internal}
            0 Non-pseudo reload: reject+=2
            0 Non input pseudo reload: reject++
          alt=0: Bad operand -- refuse
            0 Non-pseudo reload: reject+=2
            0 Non input pseudo reload: reject++
          alt=1: Bad operand -- refuse
            0 Non-pseudo reload: reject+=2
            0 Non input pseudo reload: reject++
            Cycle danger: overall += LRA_MAX_REJECT
         Choosing alt 1 in insn 20:  (0) v  (1) v  (2) vm {*addv4si3}
          alt=0: Bad operand -- refuse
          alt=1: Bad operand -- refuse
          alt=2,overall=0,losers=0,rld_nregs=0

so we don't seem to handle insn 19 well (why's that movv4si_internal rather
than some pextr?)
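
For readers less used to RTL, here is a rough plain-C model of what insns 19
and 20 compute.  It is only a sketch: the types v8si_model and v4si_model and
the helpers subreg_v4si and add_halves are hypothetical names made up for
illustration, not anything in GCC.  It assumes the subreg byte offset simply
selects 16 bytes of the 32-byte value (0 = low half, 16 = high half).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the V8SI and V4SI modes (not GCC types).  */
typedef struct { int32_t e[8]; } v8si_model;
typedef struct { int32_t e[4]; } v4si_model;

/* Model of (subreg:V4SI (reg:V8SI ...) BYTE_OFFSET): take the 16 bytes of
   the wide value that start at BYTE_OFFSET.  Offset 0 is the low half,
   offset 16 is the high half.  */
static v4si_model
subreg_v4si (const v8si_model *wide, size_t byte_offset)
{
  v4si_model out;
  memcpy (&out, (const unsigned char *) wide + byte_offset, sizeof out);
  return out;
}

/* Model of insn 19 followed by insn 20: copy the high half into a
   temporary, then add it element-wise to the low half.  */
static v4si_model
add_halves (const v8si_model *wide)
{
  v4si_model hi = subreg_v4si (wide, 16);  /* insn 19: reg 103 */
  v4si_model lo = subreg_v4si (wide, 0);   /* operand 2 of insn 20 */
  v4si_model sum;
  for (int i = 0; i < 4; i++)
    sum.e[i] = hi.e[i] + lo.e[i];
  return sum;                              /* insn 20: reg 98 */
}
```

In this model only the offset-16 access needs a real lane extract; the
offset-0 access is the low part of the same value, which matches the patch
above only forcing subregs with a nonzero SUBREG_BYTE into a register.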