https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122280
--- Comment #19 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
I want to note that I find it a serious implementation problem if a program encounters errors like this:

========= Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to "invalid device context" on CUDA API call to cuCtxGetDevice.
========= Saved host backtrace up to driver entry point at error
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not found" on CUDA API call to cuModuleGetGlobal_v2.

and nevertheless continues to run, but prints out wrong numbers.

I have now seen that people ran into similar errors with PyTorch and TensorFlow when the GPU was too new for the implementation, or too old:

https://discuss.pytorch.org/t/pytorch-cuda-returns-error-500-named-symbol-not-found/218501

But they then see CUDA_ERROR_NOT_FOUND (error 500) in their program, and program execution stops. In the case above, I just see erroneous numbers from a matrix multiplication, while execution continues without any error.

Imagine this were not a matrix multiplication but an application used by a doctor to inspect medical images for breast cancer. The doctor installs a shiny new GPU in his computer and expects the software to run faster...

You cannot expect users to run compute-sanitizer for every GPU/software/CUDA combination before each run. If something like CUDA_ERROR_NOT_FOUND (error 500) occurs, the program should just stop. Or it should not even compile when the generated code is not 100% compatible with the hardware.

This check should be included in the OpenMP runtime: it needs to verify whether the compiler/binary supports the GPU, and otherwise refuse to compile/run the application.
