https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83457
            Bug ID: 83457
           Summary: Add fhost-simt-vf
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42898
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42898&action=edit
Add fhost-simt-{vf,lane} and fassume-omp-{nthreads,threadid}

The simt code path for -fopenmp is only exercised in the offloading context,
and only for the nvptx target.

This demonstrator patch partially enables it on the host.

Using the patch, I managed to reproduce PR81778 on x86_64.

Consider test.c:
...
#include <stdio.h>

extern void abort ();

#define N 4

int
main ()
{
#pragma omp target parallel for simd schedule(static, 2) num_threads(1)
  for (unsigned int i = N; i > 0; i -= 1)
    {
      printf ("%d\n", i);
      if (!(0 < i && i <= N))
        abort ();
    }

  return 0;
}
...

The normal -fno-openmp execution is:
...
$ gcc test.c && ./a.out
4
3
2
1
...

When executing with simt vectorization factor 2 on lane 0, everything works:
...
$ gcc test.c -O2 -fopenmp -fhost-simt-vf=2 -fhost-simt-lane=0 && ./a.out
4
2
...

But when executing on lane 1, we run into the error:
...
$ gcc test.c -O2 -fopenmp -fhost-simt-vf=2 -fhost-simt-lane=1 && ./a.out
3
1
-1
Aborted (core dumped)
...

With two additional options to simplify the code:
...
$ gcc test.c -O2 -fopenmp \
    -fhost-simt-vf=2 \
    -fhost-simt-lane=1 \
    -fassume-omp-threadid=0 \
    -fassume-omp-nthreads=1 \
    -fdump-tree-all
...
the optimized dump shows an unconditional abort:
...
;; Function main._omp_fn.1 (main._omp_fn.1, funcdef_no=13, decl_uid=2510, cgraph_uid=13, symbol_order=13) (executed once)

__attribute__((omp declare target))
main._omp_fn.1 (void * .omp_data_i)
{
  <bb 2> [local count: 403773193]:
  printf ("%d\n", 3);
  printf ("%d\n", 1);
  printf ("%d\n", 4294967295);
  abort ();

}
...

Or, without the abort, an infinite loop:
...
__attribute__((omp declare target))
main._omp_fn.1 (void * .omp_data_i)
{
  unsigned int i;

  <bb 2> [local count: 67098799]:
  printf ("%d\n", 3);

  <bb 3> [local count: 469691593]:
  # i_29 = PHI <1(2), i_36(3)>
  printf ("%d\n", i_29);
  i_36 = i_29 + 4294967294;
  goto <bb 3>; [100.00%]

}
...

So, the error can be analyzed and fixed entirely in the host and pre-rtl domain.
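
For illustration only, the per-lane behaviour seen above can be mimicked in plain C. This is a rough sketch and not GCC's actual simt lowering: the helper names lane_naive and lane_counted are hypothetical, and the sketch assumes that lane L starts at N - L and steps by the vectorization factor. With an unsigned induction variable and an "i > 0" exit test, lane 1 steps from 1 to 1 - 2, which wraps around to 4294967295 (printed as -1 by "%d") and never terminates, matching the "i_29 + 4294967294" update in the dump. Counting iterations upward instead avoids the wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the failing behaviour (not GCC code):
   lane LANE starts at N - LANE and steps by -VF while i > 0.
   Visited values are stored in OUT; CAP bounds the otherwise
   endless walk.  Assumes LANE <= N.  Returns the number stored.  */
size_t
lane_naive (unsigned int n, unsigned int vf, unsigned int lane,
            unsigned int *out, size_t cap)
{
  assert (vf > 0 && lane < vf);
  size_t cnt = 0;
  /* i -= vf wraps modulo 2^32 once i < vf, so "i > 0" stays true.  */
  for (unsigned int i = n - lane; i > 0 && cnt < cap; i -= vf)
    out[cnt++] = i;
  return cnt;
}

/* Hypothetical wraparound-free variant: walk the logical iteration
   number K upward and derive the loop variable as N - K.  Assumes
   N + VF does not overflow unsigned int.  */
size_t
lane_counted (unsigned int n, unsigned int vf, unsigned int lane,
              unsigned int *out, size_t cap)
{
  assert (vf > 0 && lane < vf);
  size_t cnt = 0;
  for (unsigned int k = lane; k < n && cnt < cap; k += vf)
    out[cnt++] = n - k;
  return cnt;
}
```

With n = 4 and vf = 2, lane_naive yields 4, 2 for lane 0 but 3, 1, 4294967295, ... for lane 1 (until the cap is reached), while lane_counted yields 3, 1 for lane 1 and stops.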