https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71646
            Bug ID: 71646
           Summary: incompatibility between ptx code and GPU hardware
           Product: gcc
           Version: 6.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: didu31 at hotmail dot fr
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38758
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38758&action=edit
Very simple OpenACC program

Hardware: Core 2 Quad + Nvidia GeForce GT 430
OS: Linux 4.4.0-24-generic x86_64
Library environment:
- gcc 6.1 (compiled from sources)
- nvidia-toolkit-7.5
- libcudart 7.5
- libcuda1-361
- nvptx-tools, master branch as of June 17th (compiled from sources)

The attached source program is compiled and linked with this command:

  gcc t.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o t -lgomp -Wl,-rpath=/usr/local/lib64

After typing:

  export ACC_DEVICE_TYPE=

and then executing ./t, these messages appear:

  libgomp: Link error log ptxas fatal : SM version specified by .target is higher than default SM version assumed
  libgomp: cuLinkAddData (ptx_code) error: no kernel image is available for execution on the device

Moreover, ./t hangs.

The link error is expected, since my video card supports at most sm_20 PTX code, while gcc generates sm_30 instructions. The directive .target sm_30 is even hardcoded at gcc/config/nvptx/nvptx.c:3904:

  fputs ("\t.target\tsm_30\n", asm_out_file);

In my view, since only sm_30 PTX code is generated, int nvptx_get_num_devices (void) (libgomp/plugin/plugin-nvptx.c:680) should be aware of that and should not count such a video card. The gomp runtime would then fall back to the host, as it does when cuInit(0) != CUDA_SUCCESS.
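
The device-counting idea suggested here could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual libgomp code: the names device_cc, device_can_run_ptx, count_usable_devices and the MIN_SM_* macros are hypothetical, and the compute capability values are passed in as plain integers. In a real plugin they would presumably be queried from the CUDA driver, for example with cuDeviceGetAttribute and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR / CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: minimum compute capability implied by the hardcoded
   ".target sm_30" directive in the generated PTX.  */
#define MIN_SM_MAJOR 3
#define MIN_SM_MINOR 0

/* Hypothetical record of one device's compute capability.  In a real
   plugin these values would come from the CUDA driver, not from the
   caller.  */
struct device_cc
{
  int major;
  int minor;
};

/* Return nonzero if a device with compute capability CC can run PTX
   that targets sm_30 (that is, CC >= 3.0).  */
int
device_can_run_ptx (struct device_cc cc)
{
  if (cc.major != MIN_SM_MAJOR)
    return cc.major > MIN_SM_MAJOR;
  return cc.minor >= MIN_SM_MINOR;
}

/* Count only the devices that can run the generated PTX.  A count of 0
   would let the runtime fall back to host execution.  */
int
count_usable_devices (const struct device_cc *devs, size_t n)
{
  assert (devs != NULL || n == 0);

  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (device_can_run_ptx (devs[i]))
      count++;
  return count;
}
```

With this sketch, a system whose only GPU reports compute capability 2.x would yield a count of 0, which is the condition under which the report proposes falling back to the host.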