https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71768
            Bug ID: 71768
           Summary: Missed trivial rematerialization opportunity
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Compiling

#define vector __attribute__ ((vector_size (16)))
const vector int cst={10,10,10,10};
int t()
{
  vector int val = cst;
  asm("#%0"::"x"(val));
  e();
  asm("#%0"::"x"(val));
}

Results in:

t:
.LFB0:
        .cfi_startproc
        subq    $24, %rsp
        .cfi_def_cfa_offset 32
        vmovaps .LC0(%rip), %xmm0
#APP
# 6 "t.c" 1
        #%xmm0
# 0 "" 2
#NO_APP
        xorl    %eax, %eax
        vmovaps %xmm0, (%rsp)
        call    e
        vmovaps (%rsp), %xmm0
#APP
# 8 "t.c" 1
        #%xmm0
# 0 "" 2
#NO_APP
        addq    $24, %rsp
        .cfi_def_cfa_offset 8
        ret

This is clearly suboptimal: xmm0 is spilled to the stack before the call and
reloaded afterwards, although it could simply be rematerialized from .LC0.

This hurts the exchange2 benchmark quite badly with -O3 -march=bdver2, where
the function is self-recursive and we end up having many constants cached in
XMM registers.
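For illustration only, here is a source-level sketch of the difference between
keeping the constant live across the call and re-reading it afterwards. It is
not part of the original test case: the function names, the `calls` counter and
the local definition of e() are hypothetical stand-ins, and vector element
subscripting is used instead of the x86-specific "x" asm constraint so that the
sketch builds on other targets too. Because cst is a known constant, the
compiler is free to generate identical code for both functions; the second one
only spells out by hand what rematerialization would amount to.

```c
#define vector __attribute__ ((vector_size (16)))

const vector int cst = {10, 10, 10, 10};

/* Hypothetical stand-in for the external function e() of the test case.  */
int calls;
__attribute__ ((noinline)) void e (void) { calls++; }

/* Same shape as t(): val is used before and after the call, so the
   register allocator must either save it or recompute it.  */
__attribute__ ((noinline)) int
sum_live_across_call (void)
{
  vector int val = cst;
  int before = val[0] + val[1];
  e ();
  return before + val[2] + val[3];
}

/* Hand-written "rematerialized" form: val is read from the constant
   again after the call instead of being kept live across it.  */
__attribute__ ((noinline)) int
sum_reloaded_after_call (void)
{
  vector int val = cst;
  int before = val[0] + val[1];
  e ();
  val = cst;
  return before + val[2] + val[3];
}
```

Both functions return the same value, so the transformation is purely a
register allocation decision: nothing observable changes apart from the
spill and reload around the call.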