https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

cuilili <lili.cui at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lili.cui at intel dot com

--- Comment #2 from cuilili <lili.cui at intel dot com> ---

The commit changed the function that breaks dependency chains, in order to
generate more FMAs. S242 has a chain that needs to be broken. The chain is in a
small loop and runs through the loop-carried reduction value a[i-1].


Src code:

for (int i = 1; i < LEN_1D; ++i) 
   {
     a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];
   }

------------------------------------------------------
Base version:

SSA tree
ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1+ssa2;
ssa4 = ssa3 + a[i-1];

a[i-1] is kept in xmm1; 2 instructions using xmm1 (marked 1 and 2 below) carry
a dependency across iterations.

Assembler
Loop1:
vmovsd 0x60c400(%rax),%xmm0              
vaddsd 0x60b000(%rax),%xmm3,%xmm2        
add    $0x8,%rax                                 
vaddsd 0x60b9f8(%rax),%xmm0,%xmm0        
vaddsd %xmm2,%xmm0,%xmm0                         
vaddsd %xmm0,%xmm1,%xmm1     ---> 1                   
vmovsd %xmm1,0x60cdf8(%rax)  ---> 2
cmp    $0xa00,%rdx
jne    Loop1

--------------------------------------------------------------
Base + commit g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 version:

a[i-1] is kept in xmm0; 4 instructions using xmm0 (marked 1 to 4 below) carry
a dependency across iterations.

SSA tree
ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1 + a[i-1];
ssa4 = ssa2 + ssa3;

Assembler
Loop1:
vaddsdq  0x60b000(%rax), %xmm0, %xmm0  ---> 1
vmovsdq  0x60c400(%rax), %xmm1
add $0x8, %rax                                                           
vaddsdq  0x60b9f8(%rax), %xmm1, %xmm1
vaddsd %xmm2, %xmm0, %xmm0             ---> 2
vaddsd %xmm1, %xmm0, %xmm0             ---> 3
vmovsdq  %xmm0, 0x60cdf8(%rax)         ---> 4
cmp    $0xa00,%rdx
jne    Loop1
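
To make the difference between the two SSA trees easier to see, here is a
minimal C sketch of both association orders. It is only an illustration, not
the code from the commit: the function names, the reduced LEN_1D value and the
test inputs are assumptions. Without -ffast-math (or -fassociative-math) the
compiler should keep the order as written, so each function mirrors one of the
SSA trees above.

```c
#define LEN_1D 16   /* assumption: small size, only for illustration */

/* Base order: everything that does not depend on a[i-1] is summed first,
   so a single add sits on the loop-carried chain. */
static void s242_base_order(double *a, const double *b, const double *c,
                            const double *d, double s1, double s2)
{
    for (int i = 1; i < LEN_1D; ++i) {
        double ssa1 = (s1 + s2) + b[i];   /* independent of a[i-1] */
        double ssa2 = c[i] + d[i];        /* independent of a[i-1] */
        double ssa3 = ssa1 + ssa2;        /* independent of a[i-1] */
        a[i] = ssa3 + a[i - 1];           /* only add on the chain */
    }
}

/* Order after the commit, following the second SSA tree: a[i-1] is
   added early, so two adds in a row depend on the previous iteration. */
static void s242_new_order(double *a, const double *b, const double *c,
                           const double *d, double s1, double s2)
{
    for (int i = 1; i < LEN_1D; ++i) {
        double ssa1 = (s1 + s2) + b[i];   /* independent of a[i-1] */
        double ssa2 = c[i] + d[i];        /* independent of a[i-1] */
        double ssa3 = ssa1 + a[i - 1];    /* first add on the chain */
        a[i] = ssa2 + ssa3;               /* second add on the chain */
    }
}

/* Hypothetical helper: runs both orders on small integer-valued inputs
   (so every sum is exact) and returns 1 if the results are identical. */
static int s242_orders_agree(void)
{
    double a0[LEN_1D], a1[LEN_1D], b[LEN_1D], c[LEN_1D], d[LEN_1D];

    for (int i = 0; i < LEN_1D; ++i) {
        a0[i] = a1[i] = 0.0;
        b[i] = (double)i;
        c[i] = (double)(2 * i);
        d[i] = 1.0;
    }
    a0[0] = a1[0] = 1.0;

    s242_base_order(a0, b, c, d, 0.5, 1.5);
    s242_new_order(a1, b, c, d, 0.5, 1.5);

    for (int i = 0; i < LEN_1D; ++i)
        if (a0[i] != a1[i])
            return 0;
    return 1;
}
```

Both functions compute the same values here; what changes is how many dependent
adds separate a[i-1] from a[i]. In the base order that is one add per
iteration. In the new order it is two at the SSA level, and the assembly quoted
above shows three vaddsd writing xmm0 in sequence, so the lower bound on the
time per iteration grows with the add latency accordingly.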
