https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
cuilili <lili.cui at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lili.cui at intel dot com

--- Comment #2 from cuilili <lili.cui at intel dot com> ---
The commit changed the function that breaks dependency chains, in order to
generate more FMAs. S242 has a chain that needs to be broken. The chain is in
a small loop and is related to the loop reduction variable a[i-1].

Source code:

for (int i = 1; i < LEN_1D; ++i)
  {
    a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];
  }

------------------------------------------------------
Base version:

SSA tree:

ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1 + ssa2;
ssa4 = ssa3 + a[i-1];

a[i-1] uses xmm1. There are 2 instructions using xmm1 that have dependencies
across iterations.

Assembler:

Loop1:
vmovsd 0x60c400(%rax),%xmm0
vaddsd 0x60b000(%rax),%xmm3,%xmm2
add    $0x8,%rax
vaddsd 0x60b9f8(%rax),%xmm0,%xmm0
vaddsd %xmm2,%xmm0,%xmm0
vaddsd %xmm0,%xmm1,%xmm1          ---> 1
vmovsd %xmm1,0x60cdf8(%rax)       ---> 2
cmp    $0xa00,%rdx
jne    Loop1

--------------------------------------------------------------
Base + commit g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 version:

a[i-1] uses xmm0. There are 4 instructions using xmm0 that have dependencies
across iterations.

SSA tree:

ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1 + a[i-1];
ssa4 = ssa2 + ssa3;

Assembler:

Loop1:
vaddsdq 0x60b000(%rax), %xmm0, %xmm0   ---> 1
vmovsdq 0x60c400(%rax), %xmm1
add     $0x8, %rax
vaddsdq 0x60b9f8(%rax), %xmm1, %xmm1
vaddsd  %xmm2, %xmm0, %xmm0            ---> 2
vaddsd  %xmm1, %xmm0, %xmm0            ---> 3
vmovsdq %xmm0, 0x60cdf8(%rax)          ---> 4
cmp     $0xa00,%rdx
jne     Loop1
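
For anyone who wants to reproduce the two shapes by hand, the following is a minimal sketch that writes each SSA tree out as explicit C. The function names s242_base and s242_reassoc and their parameter lists are made up for this illustration and are not part of TSVC or GCC. Without -fassociative-math (implied by -ffast-math) the compiler keeps the written order of the floating-point additions, so each function should keep the association shown in its comment. In the first, a[i-1] enters only the last addition; in the second, it enters one addition earlier, so the loop-carried chain is longer.

```c
#include <stddef.h>

/* Hypothetical helper: base association.
   a[i-1] is added last, so one addition sits on the loop-carried chain. */
void s242_base(double *a, const double *b, const double *c, const double *d,
               double s1, double s2, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        double ssa1 = (s1 + s2) + b[i];  /* independent of a[i-1] */
        double ssa2 = c[i] + d[i];       /* independent of a[i-1] */
        double ssa3 = ssa1 + ssa2;       /* independent of a[i-1] */
        a[i] = ssa3 + a[i - 1];          /* only add on the carried chain */
    }
}

/* Hypothetical helper: association after the commit.
   a[i-1] is consumed earlier, so two additions sit on the carried chain. */
void s242_reassoc(double *a, const double *b, const double *c, const double *d,
                  double s1, double s2, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        double ssa1 = (s1 + s2) + b[i];  /* independent of a[i-1] */
        double ssa2 = c[i] + d[i];       /* independent of a[i-1] */
        double ssa3 = ssa1 + a[i - 1];   /* carried chain starts here */
        a[i] = ssa2 + ssa3;              /* carried chain continues */
    }
}
```

Both functions compute the same sum up to floating-point rounding, so any timing difference between them comes from the length of the dependency chain through a[i-1], not from the amount of work per iteration.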