rampitec added a comment.

In D89582#2335704 <https://reviews.llvm.org/D89582#2335704>, @arsenm wrote:

> In D89582#2335671 <https://reviews.llvm.org/D89582#2335671>, @rampitec wrote:
>
>> In D89582#2335619 <https://reviews.llvm.org/D89582#2335619>, @arsenm wrote:
>>
>>> In D89582#2335574 <https://reviews.llvm.org/D89582#2335574>, @yaxunl wrote:
>>>
>>>> What if a device function is called by kernels with different work group
>>>> sizes, will caller's work group size override callee's work group size?
>>>
>>> It's user error to call a function with a larger range than the caller
>>
>> The problem is that user can override default on a kernel with the
>> attribute, but cannot do so on function. So a module can be compiled with a
>> default smaller than requested on one of the kernels.
>>
>> Then if default is maximum 1024 and can only be overridden with the
>> --gpu-max-threads-per-block option it would not be problem, if not the
>> description of the option:
>>
>> LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for
>> kernel launch bounds for HIP")
>>
>> I.e. it says about the "default", so it should be perfectly legal to set a
>> higher limits on a specific kernel. Should the option say it restricts the
>> maximum it would be legal to apply it to functions as well.
>
> The current backend default ends up greatly restricting the registers used in
> the functions, and increasing the spilling.

I know the problem, but it should be better to use AMDGPUPropagateAttributes for this. It will clone functions if needed.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89582/new/

https://reviews.llvm.org/D89582

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits