skc7 added a comment.

In D124158#3477281 <https://reviews.llvm.org/D124158#3477281>, @jdoerfert wrote:

>> For Ex: SimplifyCFG pass removes the branch leading to a BB which has an 
>> incoming value that will always trigger undefined behavior. This basically 
>> modifies the CFG and combines the basic blocks. This works for CPU 
>> execution. But on a GPU, there are intrinsics like "__shfl_sync(unsigned 
>> mask, T var, int srcLane, int width=warpSize)", Where the exchange of 
>> variable occurs simultaneously for all active threads within the warp. So, 
>> here in the cuda/hip kernel, variable var in shuffl_sync may not be 
>> initialised, and LLVM IR treats it as undef. Currently all the arguments are 
>> tagged with noundef attribute and the above mentioned optimization by 
>> SimplifyCFG gets applied and the kernel execution becomes ambiguous. So, the 
>> proposed change is to skip adding noundef attribute to arguments when a 
>> function has been tagged with convergent attribute.
>
> Can we please have an example for this. I don't know what would be broken w/ 
> noundef + convergent and I somewhat doubt noundef is the problem.

For the source kernel below, taken from hypre, the SimplifyCFG pass applied an 
optimization that broke kernel execution on the GPU.
We found that enabling noundef analysis by default is what triggers this 
optimization.

**source kernel:**
Note: variable t starts out uninitialized and is only assigned when lane is 0.
void kernel{

  double t, measure_row;
  int lane = hypre_cuda_get_lane_id<1>();

...

  if (lane == 0) {t = read_only_load(measure_diag + row);}
  measure_row = __shfl_sync(HYPRE_WARP_FULL_MASK, t, 0);

...
}
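
For anyone who wants to experiment with this pattern in isolation, here is a rough self-contained sketch. It is not the hypre kernel: the kernel name `broadcast_from_lane0`, the `FULL_MASK` constant, the plain array load standing in for `read_only_load`, and the `INIT_T` macro are all assumptions made for illustration. It is also not guaranteed to reproduce the failure, since that depends on the compiler version and on the surrounding code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for HYPRE_WARP_FULL_MASK (assumed: all 32 lanes).
#define FULL_MASK 0xffffffffu

__global__ void broadcast_from_lane0(const double *measure_diag, int row,
                                     double *out) {
#ifdef INIT_T
  double t = 0.0;  // defined on every lane, so no undef reaches the shuffle
#else
  double t;        // as in the kernel above: only lane 0 ever assigns t
#endif
  // Hypothetical stand-in for hypre_cuda_get_lane_id<1>().
  int lane = threadIdx.x % warpSize;

  // Plain load used in place of read_only_load(measure_diag + row).
  if (lane == 0) { t = measure_diag[row]; }

  // Every lane reads lane 0's copy of t.
  double measure_row = __shfl_sync(FULL_MASK, t, 0);
  out[threadIdx.x] = measure_row;
}

int main() {
  const int n = 32;  // one full warp
  double h_diag = 42.0, h_out[n];
  double *d_diag = nullptr, *d_out = nullptr;

  cudaMalloc(&d_diag, sizeof(double));
  cudaMalloc(&d_out, n * sizeof(double));
  cudaMemcpy(d_diag, &h_diag, sizeof(double), cudaMemcpyHostToDevice);

  broadcast_from_lane0<<<1, n>>>(d_diag, 0, d_out);
  cudaMemcpy(h_out, d_out, n * sizeof(double), cudaMemcpyDeviceToHost);

  // Print what each lane received from the shuffle.
  for (int i = 0; i < n; ++i) printf("lane %d: %f\n", i, h_out[i]);

  cudaFree(d_diag);
  cudaFree(d_out);
  return 0;
}
```

Building with `-DINIT_T` gives t a defined value on every lane, which should keep undef from reaching the noundef argument of the shuffle. Building without it keeps the shape of the kernel above.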

**Example LLVM IR for a similar scenario:**
define void @func(i32 noundef %arg17) {
bb1:

  %i1 = icmp eq i32 %arg17, 0
  br i1 %i1, label %bb2, label %bb3

bb2:                                              ; preds = %bb1

  %i2 = call noundef double @read_only_load()
  br label %bb3

bb3:                                              ; preds = %bb2, %bb1

  %i3 = phi double [ %i2, %bb2 ], [ undef, %bb1 ]
  %i4 = call noundef double @__shfl_sync(double noundef %i3)
  ret void

}

declare double @read_only_load()
declare double @__shfl_sync(double noundef) convergent

**IR Dump After SimplifyCFGPass on func:**
define void @func(i32 noundef %arg17) {
bb1:

  %i1 = icmp eq i32 %arg17, 0
  call void @llvm.assume(i1 %i1)
  %i2 = call noundef double @read_only_load()
  %i4 = call noundef double @__shfl_sync(double noundef %i2)
  ret void

}

In other words, the [ undef, %bb1 ] incoming value would be passed to a noundef 
argument, which is undefined behavior, so SimplifyCFG drops the bb1 -> bb3 edge, 
replaces the branch with llvm.assume(%i1), and every lane now calls 
read_only_load unconditionally.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124158
