================
@@ -196,8 +208,10 @@ define amdgpu_kernel void @add_i32_constant(ptr addrspace(1) %out, ptr addrspace
 ; GFX11W32-NEXT:    v_mbcnt_lo_u32_b32 v0, s1, 0
 ; GFX11W32-NEXT:    ; implicit-def: $vgpr1
 ; GFX11W32-NEXT:    s_delay_alu instid0(VALU_DEP_1)
-; GFX11W32-NEXT:    v_cmpx_eq_u32_e32 0, v0
-; GFX11W32-NEXT:    s_cbranch_execz .LBB0_2
+; GFX11W32-NEXT:    v_cmp_eq_u32_e32 vcc_lo, 0, v0
----------------
jayfoad wrote:

The codegen seems significantly worse for simple conditional branches like 
this, especially on GFX10.3+ where we try to use v_cmpx. Note that there is 
special hardware to handle v_cmpx followed by a VALU instruction without the 
kind of pipeline stall that you would normally get when you write to EXEC.

To improve matters a bit, could we aim for codegen like this?
```
v_cmp_eq_u32_e32 vcc_lo, 0, v0
s_cbranch_vccz .LBB0_2
s_mov_b32 exec_lo, vcc_lo
```
This saves one instruction overall and moves the exec modification into the 
body of the "if".
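
As a rough illustration of the difference between the two sequences, here is a minimal Python sketch of the mask behaviour in wave32. It is a simplified model under stated assumptions, not a description of hardware timing or of the actual backend: masks are plain 32-bit integers, and the helper names `lane_compare`, `cmpx_sequence` and `vcc_sequence` are hypothetical.

```python
# Simplified wave32 model: each mask is a 32-bit int, one bit per lane.
# All function names here are hypothetical and only illustrate mask semantics.
FULL_EXEC = 0xFFFFFFFF

def lane_compare(exec_mask, v0):
    """Bitmask of active lanes whose v0 value equals 0 (inactive lanes give 0)."""
    result = 0
    for lane, value in enumerate(v0):
        if (exec_mask >> lane) & 1 and value == 0:
            result |= 1 << lane
    return result

def cmpx_sequence(exec_mask, v0):
    """v_cmpx_eq_u32 0, v0 ; s_cbranch_execz .LBB0_2"""
    exec_mask = lane_compare(exec_mask, v0)  # v_cmpx writes EXEC directly
    body_runs = exec_mask != 0               # s_cbranch_execz skips when EXEC == 0
    return body_runs, exec_mask

def vcc_sequence(exec_mask, v0):
    """v_cmp_eq_u32 vcc_lo, 0, v0 ; s_cbranch_vccz .LBB0_2 ; s_mov_b32 exec_lo, vcc_lo"""
    vcc = lane_compare(exec_mask, v0)        # v_cmp writes VCC, EXEC is untouched
    if vcc == 0:                             # s_cbranch_vccz skips the body
        return False, exec_mask
    return True, vcc                         # s_mov_b32 exec_lo, vcc_lo runs inside the body

# Mimic v_mbcnt output: only lane 0 holds the value 0.
one_lane = list(range(32))
# No lane holds 0, so the "if" body is skipped.
no_lane = [1] * 32
```

In this model both sequences enter the body with the same EXEC mask. They differ only on the skipped path, where the `v_cmp`/`s_cbranch_vccz` form leaves EXEC unmodified because the write happens inside the body.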

https://github.com/llvm/llvm-project/pull/108596
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
