rampitec added a comment.

In D89582#2335704 <https://reviews.llvm.org/D89582#2335704>, @arsenm wrote:

> In D89582#2335671 <https://reviews.llvm.org/D89582#2335671>, @rampitec wrote:
>
>> In D89582#2335619 <https://reviews.llvm.org/D89582#2335619>, @arsenm wrote:
>>
>>> In D89582#2335574 <https://reviews.llvm.org/D89582#2335574>, @yaxunl wrote:
>>>
>>>> What if a device function is called by kernels with different work group 
>>>> sizes? Will the caller's work group size override the callee's?
>>>
>>> It's a user error to call a function with a larger range than the caller.
>>
>> The problem is that a user can override the default on a kernel with the 
>> attribute, but cannot do so on a function. So a module can be compiled 
>> with a default smaller than the size requested on one of its kernels.
>>
>> If the default were the maximum of 1024 and could only be overridden with 
>> the --gpu-max-threads-per-block option, this would not be a problem, were 
>> it not for the description of the option:
>>
>>   LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for kernel launch bounds for HIP")
>>
>> That is, it describes a "default", so it should be perfectly legal to set a 
>> higher limit on a specific kernel. If the option instead said that it 
>> restricts the maximum, it would be legal to apply it to functions as well.
>
> The current backend default ends up greatly restricting the registers used 
> in functions and increasing spilling.

I know the problem, but it would be better to use AMDGPUPropagateAttributes 
for this. It will clone functions if needed.
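
To make the cloning idea concrete, here is a toy sketch in plain C++. It is 
not the AMDGPUPropagateAttributes implementation and uses no LLVM APIs: the 
Kernel struct, the propagateBounds helper and the ".clone.<N>" naming are all 
hypothetical, and it only handles direct calls from kernels. It assumes each 
kernel carries its own bound and each device function should inherit the bound 
of the kernel that calls it, with a clone created only when two callers 
disagree.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, not LLVM classes.
struct Kernel {
    std::string name;
    unsigned maxThreads;               // per-kernel launch bound
    std::vector<std::string> callees;  // device functions called directly
};

// Maps (callee, bound) to the name of the function version to call.
using CloneMap = std::map<std::pair<std::string, unsigned>, std::string>;

// The first bound seen for a callee is applied to the original function;
// every further distinct bound gets its own clone (hypothetical naming).
CloneMap propagateBounds(const std::vector<Kernel>& kernels) {
    CloneMap versions;
    std::map<std::string, unsigned> versionCount;  // versions per callee
    for (const Kernel& k : kernels) {
        assert(k.maxThreads > 0 && "a launch bound must be positive");
        for (const std::string& callee : k.callees) {
            auto key = std::make_pair(callee, k.maxThreads);
            if (versions.count(key) != 0)
                continue;  // a version with this bound already exists
            unsigned n = versionCount[callee]++;
            versions[key] =
                n == 0 ? callee
                       : callee + ".clone." + std::to_string(k.maxThreads);
        }
    }
    return versions;
}
```

With one kernel bounded at 256 and two kernels bounded at 1024 that all call 
the same helper, this model keeps the original helper for the 256 case and 
creates a single clone shared by both 1024 kernels, so no function is left 
with a bound smaller than its caller's.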


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89582/new/

https://reviews.llvm.org/D89582

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
