http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43653
--- Comment #9 from Jeffrey A. Law <law at redhat dot com> 2011-02-15 21:06:23 UTC ---

I think an x86 maintainer is going to need to take a look at this.

The problem as I see it is we're trying to use the address of a stack slot as a vector initializer.

(insn 20 17 21 2 (set (reg:V2DI 74 [ vect_cst_.2 ])
        (vec_duplicate:V2DI (reg/f:DI 20 frame))) j.c:4 1788 {*vec_dupv2di}
     (expr_list:REG_EQUIV (vec_duplicate:V2DI (reg/f:DI 20 frame))
        (nil)))

There's nothing inherently wrong with that. Eventually frame gets turned into sp+offset like this:

(insn 20 17 21 2 (set (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
        (vec_duplicate:V2DI (plus:DI (reg/f:DI 7 sp)
                (const_int 392 [0x188])))) j.c:4 1788 {*vec_dupv2di}
     (expr_list:REG_EQUIV (vec_duplicate:V2DI (plus:DI (reg/f:DI 7 sp)
                (const_int 392 [0x188])))
        (nil)))

This is a natural result of FP elimination and IMHO, we're still OK. That later gets turned into:

(insn 20 17 21 2 (set (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
        (vec_duplicate:V2DI (plus:DI (reg/f:DI 7 sp)
                (mem/u/c/i:DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S8 A64])))) j.c:4 1788 {*vec_dupv2di}
     (expr_list:REG_EQUIV (vec_duplicate:V2DI (plus:DI (reg/f:DI 7 sp)
                (const_int 392 [0x188])))
        (nil)))

Dubious, but I understand why the code did this. We'll have the following reload recorded:

Reload 0: reload_in (DI) = (plus:DI (reg/f:DI 7 sp)
                (mem/u/c/i:DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S8 A64]))
        reload_out (V2DI) = (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
        SSE_REGS, RELOAD_OTHER (opnum = 0), can't combine
        reload_in_reg: (plus:DI (reg/f:DI 7 sp)
                (const_int 392 [0x188]))
        reload_out_reg: (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
        reload_reg_rtx: (reg:V2DI 22 xmm1)

Which simply isn't going to work and it's all downhill from there, including a surprise secondary memory allocation which ultimately triggers the ICE.

I was able to hack things to work by enabling a secondary reload when copying (plus (sp) (whatever)) into SSE_REGS, but that may not be the best solution. Someone with more x86 internal knowledge should look at this.
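
For readers without j.c at hand, here is a minimal sketch of the kind of source that puts the address of a stack slot into a vector initializer. It is a hypothetical illustration, not the PR's actual testcase: the name count_matches, the array size N, and the loop shapes are all assumptions. Whether it reproduces the ICE on any given compiler version or set of flags has not been checked.

```c
#define N 100

/* Hypothetical illustration, not the testcase from this PR.
   `local` lives in a stack slot, and its address is loop-invariant.
   A vectorizer may therefore splat that address across a vector
   register, which corresponds to the
   (vec_duplicate:V2DI (reg/f:DI 20 frame)) form in the dumps.  */
int count_matches(void)
{
    char local = 0;
    void *ptrs[N];
    int matches = 0;

    /* Store the same stack address into every element.  */
    for (int i = 0; i < N; i++)
        ptrs[i] = &local;

    /* Read the array back so the stores are observable.  */
    for (int i = 0; i < N; i++)
        matches += (ptrs[i] == (void *)&local);

    return matches;
}
```

Built with vectorization enabled on x86-64, the first loop is where a vec_duplicate of a frame-relative address could be expected to show up in the RTL dumps.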