https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #10 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #8)
> No, -msoft-stack-reserve-local is really meant to be in bytes: it may not
> exceed the amount of .local memory reserved by CUDA driver (which is just
> 1-2 KB, unless overridden via cuCtxSetLimit, which nvptx-run.c does, but
> plugin-nvptx.c does not).
>
> Keep in mind that .local memory reservation is multiplied by number of
> active contexts, which could be in range 20000-30000 when the code was
> written: 128KB local memory per active thread would imply a 2.5GB allocation
> on the GPU.

With the number of active contexts, do you mean the sm_count * thread_max as
used in nvptx-run.c (which, FWIW, is 10.240 on my card)?
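
For reference, the arithmetic behind the quoted estimate can be sketched as
below. This is only an illustration: the helper name local_reservation_bytes
is hypothetical (it is not taken from nvptx-run.c or plugin-nvptx.c), and it
assumes the total is simply the per-thread .local size multiplied by the
number of active contexts, as described in comment #8.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: total .local reservation in
   bytes, assuming PER_THREAD_BYTES is reserved for each of ACTIVE_CONTEXTS
   contexts (e.g. sm_count * thread_max).  */
static uint64_t
local_reservation_bytes (uint64_t per_thread_bytes, uint64_t active_contexts)
{
  return per_thread_bytes * active_contexts;
}
```

With 128 KiB per thread, local_reservation_bytes (128 * 1024, 20000) gives
2621440000 bytes (about 2.44 GiB), in line with the "2.5GB" figure above.
Reading "10.240" as 10240 contexts (an assumption about the thousands
separator), the same per-thread size gives 1342177280 bytes, i.e. 1.25 GiB.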