[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors (PR #108596)

via cfe-commits Mon, 16 Sep 2024 10:23:36 -0700

alex-t wrote:

> I agree with Jay that this change isn't acceptable from a codegen quality 
> point of view.
> 
> > Then it appeared that instructions loading the values spilled to the memory 
> > and used in the current block must be placed at the block beginning before 
> > they are used but after the point where EXEC mask is restored, since they 
> > are loading VGPRs and, hence, read EXEC. Hence, we had to consider all 
> > spilling opcodes to belong to the block prologue.
> 
> That does not seem logical. The restores themselves only read EXEC, they do 
> not write it. So why should they be part of the prolog?
> 
> It seems the whole problem could be avoided by not considering VGPR restores 
> to be part of the block prolog?


Let me describe the complete case. We have register allocation split in 2 steps 
- the first run allocated SGPRs and the second one takes care of VGPRs. It 
might happen that during the SGPRs allocation we had to spill the register 
containing EXEC mask value to the memory. Hence, we have to load it back at the 
beginning of the flow block BEFORE the OR-ing with current EXEC, as it is a 
live-in value that is an operand for the OR. As soon as we inserted the reload 
the prologue sequence is broken and during the VGPRs allocation any 
spill/reload will be placed before the point at which the exec mask is restored.

The issue was addressed here: https://github.com/llvm/llvm-project/pull/69924 
(also here https://reviews.llvm.org/D145329).
Finally, the decision was made to consider all spill/reload opcodes as 
belonging to the block prologue.

Although, revisiting this now, I still don't understand why they decided to 
include ALL spill opcodes in the prologue, but not only the SGPR spills? 
Clearly, none of the VGPR reloads really belong to the prologue.

At a first glance, changing the isSpill(opcode) to isSGPRSpill(opcode) in the 
snippet below would solve the initial case.
`  return IsNullOrVectorRegister &&
         (isSpill(Opcode) || (!MI.isTerminator() && Opcode != AMDGPU::COPY &&
                              MI.modifiesRegister(AMDGPU::EXEC, &RI)));
}`

I need to look at this a bit more. I am sure they would have done this if such 
a simple change had solved the problem.


https://github.com/llvm/llvm-project/pull/108596
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors (PR #108596)

Reply via email to