[clang] [clang][CodeGen][SPIRV] Translate `amdgpu_flat_work_group_size` into `reqd_work_group_size`. (PR #116820)

Alex Voicu via cfe-commits Tue, 19 Nov 2024 08:17:56 -0800

AlexVlx wrote:

> > > reqd_work_group_size is for OpenCL reqd_work_group_size attribute and it 
> > > sets exact block size. amdgpu-flat-work-group-size sets a (min, max) 
> > > range for block size.
> > > HIP launch bounds sets a block size range (1, bound). It cannot be 
> > > represented by reqd_work_group_size.
> > 
> > 
> > This is not quite correct. CUDA defines `__launch_bounds__` [as only 
> > carrying the maximum, not a 
> > range](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds).
> >  I implemented it in HIP, and a range of [1, bound] is equivalent to just 
> > taking the maximum, the lower bound is spurious / I only put it in place 
> > because I probably misread the syntax for the attribute / misinterpreted 
> > it. TL;DR HIP `__launch_bounds__` should match CUDA `__launch_bounds__`, 
> > and those only take extrema, not ranges AFAICS, so this is fine.
> 
> For example, if you use reqd_work_group_size to represent 
> launch_bounds(1024), then launch the kernel with block size 256, it will fail 
> since reqd_work_group_size means the kernel can only be launched with block 
> size 1024. I don't think that matches what launch_bounds(1024) intends to be. 
> It intends to allow the kernel to be launched with block size between 1 and 
> 1024.


Oh, apologies, I probably should have clarified that we're only going to see 
this in SPIR-V, as part of run-time finalisation/JIT it gets translated back 
into the original amdgpu attribute; it's mostly for the convenience of carrying 
the maximum / composing with existing tools that an existing attribute is 
chosen, otherwise I'd have had to side-channel it. I agree that it is not a 
direct match, but sadly there is no direct match, as `work_group_size_hint` is 
too weak. If you're strongly opposed to this I can re-factor this to pass via 
side-channel.

https://github.com/llvm/llvm-project/pull/116820
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][CodeGen][SPIRV] Translate `amdgpu_flat_work_group_size` into `reqd_work_group_size`. (PR #116820)

Reply via email to