dhruvachak wrote: Regarding the performance degradation: this patch introduces an additional allocation, data submit, and deallocation for every kernel launch (see GenericKernelTy::getKernelLaunchEnvironment() in PluginInterface.cpp).
Analysis suggests that this overhead is the primary reason for the perf degradation. Is it possible to incur this additional overhead only when we need it? For example, can it be avoided for non-reduction kernels? @jdoerfert

https://github.com/llvm/llvm-project/pull/70401
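
To make the suggestion concrete, here is a rough, self-contained sketch of gating the per-launch allocation on whether the kernel needs a launch environment. This is not the actual PluginInterface.cpp code: `MockDevice`, `LaunchEnv`, `Kernel`, the `NeedsLaunchEnv` flag, and `launch()` are all hypothetical names used only to illustrate the idea, and the mock device merely counts operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device; it only counts the operations
// that the comment above identifies as per-launch overhead.
struct MockDevice {
  std::size_t Allocs = 0, Submits = 0, Frees = 0;
  std::vector<char> Storage;

  void *allocate(std::size_t Size) {
    ++Allocs;
    Storage.assign(Size, 0);
    return Storage.data();
  }
  void submit(void *Dst, const void *Src, std::size_t Size) {
    ++Submits;
    std::memcpy(Dst, Src, Size);
  }
  void deallocate(void *Ptr) {
    assert(Ptr == Storage.data() && "freeing an unknown pointer");
    ++Frees;
    Storage.clear();
  }
};

// Hypothetical launch environment; the real one may hold other fields.
struct LaunchEnv {
  void *ReductionBuffer = nullptr;
};

// Hypothetical kernel descriptor. The flag is an assumption: it would be
// set only for kernels that need the environment (e.g. reduction kernels).
struct Kernel {
  bool NeedsLaunchEnv;
};

// Returns true if a device copy of the launch environment was created.
bool launch(MockDevice &Dev, const Kernel &K) {
  if (!K.NeedsLaunchEnv)
    return false; // Fast path: no allocation, submit, or deallocation.

  LaunchEnv Host;
  void *DevEnv = Dev.allocate(sizeof(LaunchEnv));
  Dev.submit(DevEnv, &Host, sizeof(LaunchEnv));
  // ... the kernel would execute here, reading DevEnv ...
  Dev.deallocate(DevEnv);
  return true;
}
```

With this shape, a kernel that does not set the flag leaves all three counters at zero, while a kernel that sets it pays for exactly one allocation, one submit, and one deallocation. How the plugin would actually learn that a kernel needs the environment is left open here.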