------- Comment #5 from rguenth at gcc dot gnu dot org 2009-09-15 14:40 -------

Which is likely because it decides to allocate $cx for the load destination (operand for the scalar shift) and then needs to re-load it to $xmm? for the vector shift. The placement of the re-load inside the loop is unfortunate...
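For illustration only (this is a hypothetical sketch, not the testcase attached to the PR; the function name and signature are made up): a loop of roughly this shape is where a single shift-count pseudo can end up with both constraints. If the loop is vectorized, the SSE vector shift takes its count in an xmm register, while a scalar shift in the remainder loop takes its count in %cl, so one of the two uses may need a reload.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, assumed for illustration only.
   The shift count is loaded once from memory and is loop-invariant.
   A vectorized body would want it in an SSE register; a scalar
   remainder loop would want it in %cl. */
void shift_all(uint32_t *dst, const uint32_t *src, size_t n, const int *pcount)
{
    int count = *pcount & 31;   /* keep the count in range for a 32-bit shift */

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] << count;
}
```

Whether the reload actually lands inside the loop depends on the compiler version and options, so the -fdump-rtl dumps are the place to check rather than this sketch.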
Reloads for insn # 67
Reload 0: reload_in (SI) = (reg:SI 116 [ pretmp.11 ])
        SSE_REGS, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (reg:SI 116 [ pretmp.11 ])
        reload_reg_rtx: (reg:SI 22 xmm1)

Reloads for insn # 83
Reload 0: reload_in (QI) = (subreg:QI (reg:SI 116 [ pretmp.11 ]) 0)
        CREG, RELOAD_FOR_INPUT (opnum = 2)
        reload_in_reg: (subreg:QI (reg:SI 116 [ pretmp.11 ]) 0)
        reload_reg_rtx: (reg:QI 2 cx)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011