yxsamliu wrote:

> > reqd_work_group_size is for OpenCL reqd_work_group_size attribute and it 
> > sets exact block size. amdgpu-flat-work-group-size sets a (min, max) range 
> > for block size.
> > HIP launch bounds sets a block size range (1, bound). It cannot be 
> > represented by reqd_work_group_size.
> 
> This is not quite correct. CUDA defines `__launch_bounds__` [as only carrying 
> the maximum, not a 
> range](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds).
>  I implemented it in HIP, and a range of [1, bound] is equivalent to just 
> taking the maximum, the lower bound is spurious / I only put it in place 
> because I probably misread the syntax for the attribute / misinterpreted it. 
> TL;DR HIP `__launch_bounds__` should match CUDA `__launch_bounds__`, and 
> those only take extrema, not ranges AFAICS, so this is fine.

For example, if you use reqd_work_group_size to represent launch_bounds(1024) 
and then launch the kernel with block size 256, the launch will fail, because 
reqd_work_group_size means the kernel can only be launched with a block size 
of exactly 1024. I don't think that matches the intent of launch_bounds(1024), 
which is to allow the kernel to be launched with any block size between 1 and 
1024.
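
To make the difference concrete, here is a minimal sketch in plain C++ that models the two constraints as launch-time checks. The names `ExactSize`, `SizeRange`, and `launchAllowed` are hypothetical and exist only for illustration; they are not LLVM, HIP, or OpenCL APIs, and the real check happens in the runtime, not in code like this.

```cpp
// Hypothetical model of the two attributes; not an LLVM or HIP API.

// Models reqd_work_group_size: the block size must match exactly.
struct ExactSize {
  unsigned size;
};

// Models amdgpu-flat-work-group-size: the block size must lie in [min, max].
struct SizeRange {
  unsigned min;
  unsigned max;
};

constexpr bool launchAllowed(ExactSize c, unsigned blockSize) {
  return blockSize == c.size;
}

constexpr bool launchAllowed(SizeRange c, unsigned blockSize) {
  return blockSize >= c.min && blockSize <= c.max;
}

// launch_bounds(1024) encoded as an exact size rejects a 256-thread launch...
static_assert(!launchAllowed(ExactSize{1024}, 256), "exact size rejects 256");
// ...while encoding it as the range (1, 1024) accepts it.
static_assert(launchAllowed(SizeRange{1, 1024}, 256), "range accepts 256");
// Both encodings accept a launch at exactly the bound.
static_assert(launchAllowed(ExactSize{1024}, 1024), "exact size accepts 1024");
static_assert(launchAllowed(SizeRange{1, 1024}, 1024), "range accepts 1024");
// Both encodings reject a launch above the bound.
static_assert(!launchAllowed(SizeRange{1, 1024}, 2048), "range rejects 2048");
```

Under this model, only the range form permits the 256-thread launch described above, which is the behavior launch_bounds(1024) is meant to have.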

https://github.com/llvm/llvm-project/pull/116820
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
