[Bug debug/99319] New: DW_MACRO_define_strp uses uleb128 for second operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99319 Bug ID: 99319 Summary: DW_MACRO_define_strp uses uleb128 for second operand Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Consider compiling hello world, with macro info: ... $ gcc-11 -ggdb3 -g hello.c -dA -save-temps ... In a-hello.s, we have: ... .section.debug_macro,"",@progbits .Ldebug_macro0: .value 0x5 # DWARF macro version number .byte 0x2 # Flags: 32-bit, lineptr present .long .Ldebug_line0 .byte 0x3 # Start new file .uleb128 0 # Included from line number 0 .uleb128 0x1# file hello.c .byte 0x5 # Define macro strp .uleb128 0 # At line number 0 .long .LASF2 # The macro: "__STDC__ 1" ... So, the DW_MACRO_define_strp entry (starting with .byte 0x5) has two operands, a uleb128 and a .long. AFAIU, this is in accordance with the spec: ... A DW_MACRO_define_strp or DW_MACRO_undef_strp entry has two operands. The first operand encodes the source line number of the #define or #undef macro directive. The second operand consists of an offset into a string table contained in the .debug_str section of the object file. The size of the operand is given in the header offset_size_flag field. ... Now add -gsplit-dwarf: ... $ gcc-11 -ggdb3 -g hello.c -dA -save-temps -gsplit-dwarf ... Now we have instead: ... .section.debug_macro.dwo,"e",@progbits .Ldebug_macro0: .value 0x5 # DWARF macro version number .byte 0x2 # Flags: 32-bit, lineptr present .long .Lskeleton_debug_line0 .byte 0x3 # Start new file .uleb128 0 # Included from line number 0 .uleb128 0x1# file hello.c .byte 0x5 # Define macro strp .uleb128 0 # At line number 0 .uleb128 0x191 # The macro: "__STDC__ 1" ... The second operand is now also a .uleb128. AFAIU, this goes against the spec.
[Bug debug/99319] DW_MACRO_define_strp uses uleb128 for second operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99319 --- Comment #1 from Tom de Vries --- Related readelf PR: https://sourceware.org/bugzilla/show_bug.cgi?id=27387
[Bug debug/99319] DW_MACRO_define_strp uses uleb128 for second operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99319 --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > The second operand is now also a .uleb128. AFAIU, this goes against the > spec. Also, gdb doesn't get it: ... $ gdb -q -batch -readnow a.out DW_FORM_strp pointing outside of .debug_str section [in module /home/vries/hello/a.out] ... Debugging shows that the error is due to a large str_offset: ... (gdb) p /x str_offset $14 = 0x8c0502cd ... which matches this: ... .byte 0x5 # Define macro strp .uleb128 0x8b # At line number 139 .uleb128 0x14d # The macro: "stdin stdin" .byte 0x5 # Define macro strp .uleb128 0x8c # At line number 140 .uleb128 0x24 # The macro: "stdout stdout" ... Note that the uleb128 representation of 0x14d is "cd02".
[Bug debug/95432] inconsistent behaviors at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95432 --- Comment #3 from Tom de Vries --- (In reply to Andrew Pinski from comment #2) > Assembly: > .loc 1 12 3 is_stmt 1 view .LVU12 > .loc 1 10 8 is_stmt 0 view .LVU13 > movaps %xmm0, (%rsp) > .loc 1 11 8 view .LVU14 > movaps %xmm0, 32(%rsp) > .loc 1 12 13 view .LVU15 > callfoo > .LVL1: > .loc 1 13 13 view .LVU16 > leaq32(%rsp), %rdi > .loc 1 12 13 view .LVU17 > movl%eax, %edx > .LVL2: > .loc 1 13 3 is_stmt 1 view .LVU18 > .loc 1 13 13 is_stmt 0 view .LVU19 > callfoo > .LVL3: > .loc 1 14 3 is_stmt 1 view .LVU20 > .loc 1 14 6 is_stmt 0 view .LVU21 > > Looks correct to me, both call foo have the correct line on them. I think > this is another GDB issue, most likely how dwarf3 and is_stmt is handled > just like 95431 even. Ack, this is gdb PR breakpoints/26063 ( https://sourceware.org/bugzilla/show_bug.cgi?id=26063 ). Fixed by https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebde6f2ddc987e7e2d5a218ee8cf0126ec189424 .
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #1 from Tom de Vries --- I see this as well: ... PASS: libgomp.c/../libgomp.c-c++-common/task-detach-6.c (test for excess errors) WARNING: program timed out. ...
[Bug target/99564] New: [nvptx] FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99564 Bug ID: 99564 Summary: [nvptx] FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- ... FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O1 (test for excess errors) FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O3 -g (test for excess errors) FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -Os (test for excess errors) ... In more detail: ... /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: In function 'MAIN__._omp_fn.0':^M /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:90:40: warning: using vector_length (32), ignoring 1^M FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) ...
[Bug driver/99896] New: g++ drops -lc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896 Bug ID: 99896 Summary: g++ drops -lc Product: gcc Version: 10.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- [ Spinoff from gdb PR https://sourceware.org/bugzilla/show_bug.cgi?id=27681 . ] Consider the following test-case, consisting of: ... $ cat main.c #include #include #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include extern void foo (void); int main (void) { regex_t re; int res = regcomp (&re, "bla", 0); assert (res == 0); int res2 = regexec (&re, "bla", 0, NULL, 0); assert (res2 == 0); regoff_t res3 = re_search (&re, "bla", 3, 0, 3, NULL); assert (res3 == 0); foo (); return 0; } ... and: ... $ cat foo.c #include #include #include extern void foo (void); void foo (void) { regex_t re; int res = pcre2_regcomp (&re, "bla", 0); assert (res == 0); int res2 = pcre2_regexec (&re, "bla", 0, NULL, 0); assert (res2 == 0); } ... We can compile with gcc and run like this: ... $ gcc main.c -lc foo.c -lpcre2-posix $ ./a.out $ ... likewise, with clang: ... $ clang main.c -lc foo.c -lpcre2-posix $ ./a.out $ ... likewise, with clang++: ... $ clang++ -x c++ main.c -lc foo.c -lpcre2-posix $ ./a.out $ ... but with g++: ... $ g++ -x c++ main.c -lc foo.c -lpcre2-posix $ ./a.out Segmentation fault (core dumped) $ ... Using -v, we can see what goes wrong. With gcc, we have: ... collect2 ... main.o -lc foo.o -lpcre2-posix ... ... With g++, we have instead: ... collect2 ... main.o foo.o -lpcre2-posix ... ... Workaround: use -Wl: ... $ g++ -x c++ main.c -Wl,-lc foo.c -lpcre2-posix $ ./a.out $ ...
[Bug driver/99896] g++ drops -lc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896 --- Comment #2 from Tom de Vries --- (In reply to Jonathan Wakely from comment #1) > (In reply to Tom de Vries from comment #0) > > With g++, we have instead: > > ... > > collect2 ... main.o foo.o -lpcre2-posix ... > > ... > > It isn't dropped, it's moved to the end: > > main.o foo.o -lpcre2-posix -lstdc++ -lm -lc -lgcc_s -lgcc -lc -lgcc_s -lgcc > I don't understand. AFAICT, it's dropped. It's not moved to the end, because -lc is already at the end without specifying -lc. > If you need it before foo.o then -Wl,-lc seems like the right workaround for > me. > Um, for my understanding, does that mean you agree this is a bug in g++? > Why is it needed there anyway though? main.o is intended to use regcomp from glibc. Foo.o is intended to use pcre2_regcomp from pcre2-posix (which is also accessible using plain regcomp). When -lc is droppend, regcomp from pcre2-posix is used by main instead, which is incompatible with using re_search from glibc.
[Bug driver/99896] g++ drops -lc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896 Tom de Vries changed: What|Removed |Added CC||matz at suse dot de --- Comment #6 from Tom de Vries --- (In reply to Jonathan Wakely from comment #4) > (In reply to Tom de Vries from comment #2) > > I don't understand. AFAICT, it's dropped. It's not moved to the end, > > because -lc is already at the end without specifying -lc. > > OK, it's dropped because it's always present at the end. > > This is similar to adding -I/usr/include which gets ignored, because it's > already going to be searched anyway as a system header directory. Quoting > the manual: > > "If a standard system include directory, or a directory specified with > -isystem, is also specified with -I, the -I option is ignored." > > > > Um, for my understanding, does that mean you agree this is a bug in g++? > > No. > OK, so here ( https://gcc.gnu.org/onlinedocs/gcc/Invoking-GCC.html#Invoking-GCC ) I read: ... Also, the placement of the -l option is significant. ... So, if the documentation of gcc says that placement of the -l option is significant, then why does g++ decide to mess with that? ISTM g++ violates documented behaviour.
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #4 from Tom de Vries --- Investigated using cuda-gdb. After typing ^c, we investigate the state: ... (cuda-gdb) info cuda kernels Kernel Parent Dev Grid Status SMs Mask GridDim BlockDim Invocation * 0 - 01 Active 0x0010 (1,1,1) (32,8,1) main$_omp_fn() ... So, we have 256 threads in the CTA, or 8 warps. The threads have the following state: ... (cuda-gdb) info cuda threads BlockIdx ThreadIdx To BlockIdx ThreadIdx Count Virtual PC Filename Line Kernel 0 * (0,0,0) (0,0,0) (0,0,0) (0,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (0,1,0) (0,0,0) (0,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (1,0,0) (0,0,0) (1,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (1,1,0) (0,0,0) (1,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (2,0,0) (0,0,0) (2,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (2,1,0) (0,0,0) (2,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (3,0,0) (0,0,0) (3,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (3,1,0) (0,0,0) (3,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (4,0,0) (0,0,0) (4,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (4,1,0) (0,0,0) (4,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (5,0,0) (0,0,0) (5,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (5,1,0) (0,0,0) (5,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (6,0,0) (0,0,0) (6,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (6,1,0) (0,0,0) (6,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (7,0,0) (0,0,0) (7,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (7,1,0) (0,0,0) (7,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (8,0,0) (0,0,0) (8,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (8,1,0) (0,0,0) (8,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (9,0,0) (0,0,0) (9,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (9,1,0) (0,0,0) (9,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (10,0,0) (0,0,0) (10,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (10,1,0) (0,0,0) (10,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (11,0,0) (0,0,0) (11,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (11,1,0) (0,0,0) (11,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (12,0,0) (0,0,0) (12,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (12,1,0) (0,0,0) (12,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (13,0,0) (0,0,0) (13,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (13,1,0) (0,0,0) (13,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (14,0,0) (0,0,0) (14,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (14,1,0) (0,0,0) (14,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (15,0,0) (0,0,0) (15,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (15,1,0) (0,0,0) (15,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (16,0,0) (0,0,0) (16,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (16,1,0) (0,0,0) (16,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (17,0,0) (0,0,0) (17,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (17,1,0) (0,0,0) (17,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (18,0,0) (0,0,0) (18,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (18,1,0) (0,0,0) (18,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (19,0,0) (0,0,0) (19,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (19,1,0) (0,0,0) (19,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (20,0,0) (0,0,0) (20,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (20,1,0) (0,0,0) (20,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (21,0,0) (0,0,0) (21,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (21,1,0) (0,0,0) (21,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (22,0,0) (0,0,0) (22,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (22,1,0) (0,0,0) (22,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (23,0,0) (0,0,0) (23,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (23,1,0) (0,0,0) (23,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (24,0,0) (0,0,0) (24,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (24,1,0) (0,0,0) (24,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (25,0,0) (0,0,0) (25,0,0) 1 0x00b5f638 n/a 0 (0,0,0) (25,1,0) (0,0,0) (25,7,0) 7 0x00b2f350 n/a 0 (0,0,0) (26,0,0) (0,0,0) (26,0,0) 1 0x00b5f638
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 Tom de Vries changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Comment #6 from Tom de Vries --- Current theory ... All omp-threads are supposed to participate in a team barrier, and then all together move on. The master omp-thread participates from gomp_team_end, the other omp-threads from the worker loop in gomp_thread_start. Instead, it seems the master omp-thread gets stuck at the team barrier, while all other omp-threads move on, to the thread pool barrier, and that state corresponds to the observed hang. AFAICT, the problem starts when gomp_team_barrier_wake is called with count == 1: ... void gomp_team_barrier_wake (gomp_barrier_t *bar, int count) { if (bar->total > 1) asm ("bar.sync 1, %0;" : : "r" (32 * bar->total)); } ... The count argument is ignored, and instead all omp-threads are woken up, which causes omp-threads to escape the team barrier. This all is a result of the gomp_barrier_handle_tasks path being taken in gomp_team_barrier_wait_end, and I haven't figured out why that is triggered, so it still may be that the root cause lies elsewhere. Anyway, the nvptx bar.{c,h} is copied from linux/bar.{c,h}, which is implemented using futex, and with futex uses replaced with bar.sync uses. FWIW, replacing libgomp/config/nvptx/bar.{c,h} with libgomp/config/posix.{c,h} fixes the problem. Did a full libgomp test run, all problems fixed.
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #7 from Tom de Vries --- Created attachment 50627 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50627&action=edit debug patch A bit more analysis. I'm working with this example, with an actual task to be able to perform a check afterwards: ... #include int i = 1; int main (void) { #pragma omp target map(tofrom:i) #pragma omp parallel num_threads(2) #pragma omp task { __atomic_add_fetch (&i, 1, __ATOMIC_SEQ_CST); } assert (i == 3); return 0; } ... And I've forced the plugin to launch with two omp-threads to limit the dimensions to the minimium: ... (cuda-gdb) info cuda kernels Kernel Parent Dev Grid Status SMs Mask GridDim BlockDim Invocation * 0 - 01 Active 0x0010 (1,1,1) (32,2,1) main$_omp_fn() ... Furthermore I've made specific instances for the bar.sync team barrier, to get more meaningful backtraces. So the lifetimes of the two omp-threads look like this. THREAD 0: ... #0 0x00b73aa8 in bar_sync_thread_0 () #1 0x00b74a80 in bar_sync_n () #2 0x00b72598 in bar_sync_1 () #3 0x00b760b8 in gomp_team_barrier_wake () #4 0x00b5bc38 in GOMP_task () #5 0x00b36a58 in main$_omp_fn () # $1 #6 0x00a7e618 in GOMP_parallel () #7 0x00b377a0 in main$_omp_fn$0$impl () #8 0x00b3c700 in gomp_nvptx_main () #9 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x00b380e8 in main$_omp_fn () # $2 #1 0x00b95178 in gomp_barrier_handle_tasks () #2 0x00b76e38 in gomp_team_barrier_wait_end () #3 0x00b77dd8 in gomp_team_barrier_wait_final () #4 0x00b2a1b8 in gomp_team_end () #5 0x00b318d8 in GOMP_parallel_end () #6 0x00a7e620 in GOMP_parallel () #7 0x00b377a0 in main$_omp_fn$0$impl () #8 0x00b3c700 in gomp_nvptx_main () #9 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x00b380e8 in main$_omp_fn () # $2 #1 0x00b95178 in gomp_barrier_handle_tasks () #2 0x00b76e38 in gomp_team_barrier_wait_end () #3 0x00b77dd8 in gomp_team_barrier_wait_final () #4 0x00b2a1b8 in gomp_team_end () #5 0x00b318d8 in GOMP_parallel_end () #6 0x00a7e620 in GOMP_parallel () #7 0x00b377a0 in main$_omp_fn$0$impl () #8 0x00b3c700 in gomp_nvptx_main () #9 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x00b73aa8 in bar_sync_thread_0 () #1 0x00b74a80 in bar_sync_n () #2 0x00b72598 in bar_sync_1 () #3 0x00b760b8 in gomp_team_barrier_wake () #4 0x00b94c98 in gomp_barrier_handle_tasks () #5 0x00b76e38 in gomp_team_barrier_wait_end () #6 0x00b77dd8 in gomp_team_barrier_wait_final () #7 0x00b2a1b8 in gomp_team_end () #8 0x00b318d8 in GOMP_parallel_end () #9 0x00a7e620 in GOMP_parallel () #10 0x00b377a0 in main$_omp_fn$0$impl () #11 0x00b3c700 in gomp_nvptx_main () #12 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x00b73aa8 in bar_sync_thread_0 () #1 0x00b74a80 in bar_sync_n () #2 0x00b719b8 in bar_sync_3 () #3 0x00b76f50 in gomp_team_barrier_wait_end () #4 0x00b77dd8 in gomp_team_barrier_wait_final () #5 0x00b2a1b8 in gomp_team_end () #6 0x00b318d8 in GOMP_parallel_end () #7 0x00a7e620 in GOMP_parallel () #8 0x00b377a0 in main$_omp_fn$0$impl () #9 0x00b3c700 in gomp_nvptx_main () #10 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ^C #0 0x00b73da8 in bar_sync_thread_0 () #1 0x00b74a80 in bar_sync_n () #2 0x00b719b8 in bar_sync_3 () #3 0x00b76f50 in gomp_team_barrier_wait_end () #4 0x00b77dd8 in gomp_team_barrier_wait_final () #5 0x00b2a1b8 in gomp_team_end () #6 0x00b318d8 in GOMP_parallel_end () #7 0x00a7e620 in GOMP_parallel () #8 0x00b377a0 in main$_omp_fn$0$impl () #9 0x00b3c700 in gomp_nvptx_main () #10 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ... THREAD 1: ... #0 0x00b70ae8 in bar_sync_thread_1 () #1 0x00b74b80 in bar_sync_n () #2 0x00b72598 in bar_sync_1 () #3 0x00b760b8 in gomp_team_barrier_wake () #4 0x00b5bc38 in GOMP_task () #5 0x00b36a58 in main$_omp_fn () # $1 #6 0x00b3cbb8 in gomp_nvptx_main () #7 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () #0 0x00b70ae8 in bar_sync_thread_1 () #1 0x00b74b80 in bar_sync_n () #2 0x00b719b8 in bar_sync_3 () #3 0x00b76f50 in gomp_team_barrier_wait_end () #4 0x00b77dd8 in gomp_team_barrier_wait_final () #5 0x00b3cd50 in gomp_nvptx_main () #6 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> () ^C #0 0x00b3ca30 in gomp_nvptx_main () #1 0
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #8 from Tom de Vries --- This fixes the hang: ... @@ -91,14 +129,16 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state) { gomp_barrier_handle_tasks (state); state &= ~BAR_WAS_LAST; + gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE); + if (gen == state + BAR_INCR) + return; } else { ... I'm not yet sure about the implementation, but the idea is to detect that gomp_team_barrier_done was called during gomp_barrier_handle_tasks, and then bail out.
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #9 from Tom de Vries --- (In reply to Tom de Vries from comment #8) > This fixes the hang: This is a less intrusive solution, and is easier to transplant into gomp_team_barrier_wait_cancel_end: ... diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c index c5c2fa8829b..cb7b299c6a8 100644 --- a/libgomp/config/nvptx/bar.c +++ b/libgomp/config/nvptx/bar.c @@ -91,6 +91,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state) { gomp_barrier_handle_tasks (state); state &= ~BAR_WAS_LAST; + if (team->task_count != 0) + __builtin_abort (); + bar->total = 1; } else { @@ -157,6 +160,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar, { gomp_barrier_handle_tasks (state); state &= ~BAR_WAS_LAST; + if (team->task_count != 0) + __builtin_abort (); + bar->total = 1; } else { ...
[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555 --- Comment #10 from Tom de Vries --- Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568295.html
[Bug libgomp/100160] MinGW-w64 g++ with libgomp and nvptx looks for libgomp-plugin-nvptx.so.1 instead of libgomp-plugin-nvptx-1.dll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100160 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org --- Comment #1 from Tom de Vries --- Test-case updated: removed commented out lines, formatted, printing omp_get_thread_num to get some printout that is not all zeroes: ... #include #include using namespace std; int threads = omp_get_max_threads (); int devices = omp_get_num_devices (); int i, j[100]; int main (void) { printf ("Threads: %d Devices: %d\n", threads, devices); #pragma omp target teams distribute parallel for shared(j) for(i = 0; i < 100; i++) j[i] = omp_get_thread_num (); for (i = 0; i < 100; i++) printf ("%d ", j[i]); return 0; } ... On ubuntu, we have: ... $ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none -fno-lto $ LD_LIBRARY_PATH=./install/lib64 ./a.out Threads: 8 Devices: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 $ ... And: ... $ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none lto-wrapper: fatal error: could not find accel/nvptx-none/mkoffload in /home/vries/oacc/trunk/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/:/home/vries/oacc/trunk/install/bin/../libexec/gcc/ (consider using ‘-B’) compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status ... So, there might be something windows-specific that is going wrong. But things seems to already go wrong on linux.
[Bug target/99564] [nvptx] FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99564 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED CC||tschwinge at gcc dot gnu.org Target Milestone|--- |11.0 --- Comment #1 from Tom de Vries --- https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8bafce1be11a301c2421483736c634b8bf330e69
[Bug libgomp/100160] MinGW-w64 g++ with libgomp and nvptx looks for libgomp-plugin-nvptx.so.1 instead of libgomp-plugin-nvptx-1.dll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100160 --- Comment #2 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > ... > $ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none > lto-wrapper: fatal error: could not find accel/nvptx-none/mkoffload in > /home/vries/oacc/trunk/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/ > :/home/vries/oacc/trunk/install/bin/../libexec/gcc/ (consider using ‘-B’) > compilation terminated. > /usr/bin/ld: error: lto-wrapper failed > collect2: error: ld returned 1 exit status > ... I can make this work by adding: ... $ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none -flto \ -B $(pwd -P)/install/offload-nvptx-none/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/ \ -B $(pwd -P)/install/offload-nvptx-none/bin/ $ LD_LIBRARY_PATH=./install/lib64 ./a.out Threads: 8 Devices: 1 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 ... Hmm, this actually executes on the target, the -fno-lto case executes the host fallback.
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #1 from Tom de Vries --- (In reply to Thomas Schwinge from comment #0) > We're seeing OpenACC/nvptx offloading execution regressions (including a lot > of timeouts) starting with CUDA 11.2-era Nvidia Driver 460.27.04. Confirmed > with: CUDA 11.2-era 460.27.04, 460.32.03, 460.39, 460.56, 460.67, and CUDA > 11.3-era 465.19.01, across several variants of GPU hardware. > > Explicitly (re-)confirmed good are older versions such as CUDA 9.1-era > 390.12, and CUDA 11.1-era 455.38, 455.45.01. > > Most of these are in the 'vector_length > 32' testcases, but also a few > others. > Confirmed, I see on ubuntu 18.04.5 with dirver version 460.67: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test ...
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #2 from Tom de Vries --- Minimal example: ... $ cat libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c int main (void) { int vectors_max = -1; #pragma acc parallel \ num_gangs (1) \ num_workers (1) \ vector_length (32) \ copy (vectors_max) { #pragma acc loop gang reduction (max: vectors_max) for (int i = 0; i < 2; i++) #pragma acc loop worker reduction (max: vectors_max) for (int j = 0; j < 2; j++) #pragma acc loop vector reduction (max: vectors_max) for (int k = 0; k < 32; k++) vectors_max = k; } if (vectors_max != 31) __builtin_abort (); return 0; } ... Passes with GOMP_NVPTX_JIT=-O0, starts failing at GOMP_NVPTX_JIT=-O1.
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #3 from Tom de Vries --- Created attachment 50660 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50660&action=edit Cuda reproducer
[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232 --- Comment #1 from Tom de Vries --- Can you try the patch for PR81778 ? It's possible you're looking at a duplicate.
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #4 from Tom de Vries --- Created attachment 50662 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50662&action=edit Updated cuda reproducer Slimmed down further, eliminated gang/worker reduction parts.
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #5 from Tom de Vries --- FIled https://developer.nvidia.com/nvidia_bug/3299227
[Bug tree-optimization/97333] [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97333 --- Comment #2 from Tom de Vries --- (In reply to Richard Biener from comment #1) > (because well, on GIMPLE we can duplicate all blocks). I'm not sure I understand why, given that tracer.c has a can_duplicate_bb_p that sometimes returns false. Sent an RFC patch to ask for clarification: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555890.html .
[Bug target/97348] [nvptx] Make -misa=sm_35 the default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348 --- Comment #6 from Tom de Vries --- (In reply to CVS Commits from comment #4) > Both build again cuda 9.1. FWIW, tested post-commit against cuda 11.1, no issues found.
[Bug target/97318] [nvptx] Function splitting results in invalid function name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97318 --- Comment #1 from Tom de Vries --- Tentative patch: ... diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index afac1bda45d..7b6a42893f9 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -365,6 +365,30 @@ nvptx_name_replacement (const char *name) return "__nvptx_free"; if (strcmp (name, "realloc") == 0) return "__nvptx_realloc"; + + if (strchr (name, '.') != NULL) +{ + static char *p = NULL; + static size_t p_size = 0; + size_t len = strlen (name); + size_t len_0 = len + 1; + if (p == NULL) + { + p_size = len_0; + p = XNEWVEC (char, p_size); + } + else if (len_0 > p_size) + { + p_size = len_0; + p = XRESIZEVEC (char, p, p_size); + } + strncpy (p, name, len_0); + for (size_t i = 0; i < len_0; ++i) + if (p[i] == '.') + p[i] = '$'; + return p; +} + return name; } ...
[Bug target/97318] [nvptx] Function splitting results in invalid function name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97318 Tom de Vries changed: What|Removed |Added Target Milestone|--- |11.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Tom de Vries --- Marking resolved-fixed.
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #7 from Tom de Vries --- (In reply to Alexander Monakov from comment #6) > (In reply to Tom de Vries from comment #4) > > So, I think calling functions from simd code is atm not supported for nvptx. > > > > Stack variables in simd code are mapped on a per-thread stack rather than on > > the > > usual per-warp stack. > > > > The functions are compiled with the usual per-warp stack, so calling those > > functions from simd might mean the different lanes are gonna disagree about > > what the value in a stack variable should be. > > This is inaccurate. In -msoft-stack mode there's no baked-in assumption that > stacks are always per-warp. The "soft stack" pointer can point either to > global memory (outside of SIMD regions), or to local memory (inside SIMD > regions). The pointer is switched between per-warp global memory and > per-lane local memory by nvptx.c:nvptx_output_softstack_switch. > > The main requirement is that functions callable from OpenMP offloaded code > are compiled for -mgomp multilib variant. The design allows calling > functions even from inside SIMD regions, and it should be supported. I see, that's helpful, thanks. I guess I was thrown off by seeing a %simtstack_ar of 136 bytes: ... .local .align 8 .b8 %simtstack_ar[136]; ... which seems more of an amount claimed by a single function. Is it possible you meant the default of -msoft-stack-reserve-local=128 to mean 128kb (similar to what is claimed in nvptx_stacks_size in the plugin)? Because currently it means 128 bytes.
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #9 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > Minimal version (without inlining sinf code from newlib): > ... > /* { dg-additional-options "-lm -foffload=-lm" } */ > > #define N 1 > > int > main (void) { > float k[N]; > float res; > > for (int i = 0; i < N; i++) > k[i] = 300; > > #pragma omp target map(to:k) map(from:res) > { > float sum = 0.0; > #pragma omp simd reduction(+:sum) > for (int i = 0; i < N; i++) > sum += __builtin_sinf (k[i]); > > res = sum; > } > > return 0; > } > ... Starts passing at -foffload=-msoft-stack-reserve-local=346.
[Bug libgomp/97384] New: [libgomp, nvptx] Handle -msoft-stack-reserve-local= overflow in plugin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97384 Bug ID: 97384 Summary: [libgomp, nvptx] Handle -msoft-stack-reserve-local= overflow in plugin Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- Using the option -msoft-stack-reserve-local= results in a: ... .local .align 8 .b8 %simtstack_ar[n+8]; ... However, the CU_LIMIT_STACK_SIZE is set by default to 1kb for my card/driver combo, so if I specify say -msoft-stack-reserve-local=2048, I run into: ... libgomp: cuCtxSynchronize error: an illegal memory access was encountered ... or: ... libgomp: cuCtxSynchronize error: an illegal instruction was encountered ... [ The latter at GOMP_NVPTX_JIT=-O0. ] Which may look a lot like the behaviour we're trying to fix by adding -msoft-stack-reserve-local. There's currently no way to make this work. We could add an env var, say GOMP_NVPTX_LIMIT_STACK_SIZE which is used to set: ... r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, gomp_nvptx_limit_stack_size); ... and then do: ... $ GOMP_NVPTX_LIMIT_STACK_SIZE=3072 ./a.out ... [ Note that GOMP_NVPTX_LIMIT_STACK_SIZE id chosen to be larger than 2048 to accommodate for other .local usage. ] [ It would be nice if we could attempt to accommodate the requested stack size in the libgomp plugin automatically. In the current setup, that would mean scanning the ptx code for "simtstack_ar[]", which is a bit cumbersome and probably too slow. Perhaps emitting an additional additional line before the pre-amble like this: ... // SIMTSTACK_AR_SIZE: 2048 ... would be possible to handle quick enough. ]
[Bug target/97385] New: [nvptx, docs] -msoft-stack-reserve-local= missing documentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97385 Bug ID: 97385 Summary: [nvptx, docs] -msoft-stack-reserve-local= missing documentation Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: trivial Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Currently, https://gcc.gnu.org/onlinedocs/gcc/Nvidia-PTX-Options.html does not list -msoft-stack-reserve-local.
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #10 from Tom de Vries --- (In reply to Alexander Monakov from comment #8) > No, -msoft-stack-reserve-local is really meant to be in bytes: it may not > exceed the amount of .local memory reserved by CUDA driver (which is just > 1-2 KB, unless overridden via cuCtxSetLimit, which nvptx-run.c does, but > plugin-nvptx.c does not). > > Keep in mind that .local memory reservation is multiplied by number of > active contexts, which could be in range 2-3 when the code was > written: 128KB local memory per active thread would imply a 2.5GB allocation > on the GPU. With the number of active contexts, do you mean the sm_count * thread_max as used in nvptx-run.c (which, FWIW, is 10.240 on my card)?
[Bug target/97436] New: [nvptx] -m32 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97436 Bug ID: 97436 Summary: [nvptx] -m32 support Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- The nvptx port has an -m32 switch: ... m32 Target Report RejectNegative InverseMask(ABI64) Generate code for a 32-bit ABI. ... but the default is -m64: ... #define TARGET_DEFAULT_TARGET_FLAGS MASK_ABI64 ... [ which perhaps should be related to the host being -m64 or -m32? ] We're not building -m32/-m64 multilibs, so it seems we're not building the -m32 part by default. I don't know if the -m32 path was ever tested, either in standalone or offloading setting. But since the switch is there, we should either build and test or deprecate it. I'm not yet sure what would be a working setup in terms of card/drivers/OS. At least for linux, at cuda 7.5 it's mentioned that: "Support for developing and running 32-bit CUDA and OpenCL applications on 64-bit x86 Linux platforms is deprecated". And with cuda 8.0, I get: ... $ ~/cuda/8.0/bin/nvcc ~/hello.cu -m32 nvcc warning : Compiling in the 32-bit mode when the host compiler targets x86 or x86_64 is no longer supported on Linux ...
[Bug tree-optimization/84958] int loads not eliminated against larger stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84958 --- Comment #4 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > [ FWIW, adding an extra fre pass here also results in optimal gimple: > ... > diff --git a/gcc/passes.def b/gcc/passes.def > index 3ebcfc30349..6b64f600c4a 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3. If not see >NEXT_PASS (pass_tracer); >NEXT_PASS (pass_thread_jumps); >NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); > + NEXT_PASS (pass_fre); >NEXT_PASS (pass_strlen); >NEXT_PASS (pass_thread_jumps); >NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */); > ... > ] which was added in: ... commit 744fd446c321f78f9a1ce4ef5f83df8dcfa44a9e Author: Richard Biener Date: Mon Jul 1 07:54:38 2019 + tree-ssa-sccvn.c (class pass_fre): Add may_iterate pass parameter. 2019-07-01 Richard Biener * tree-ssa-sccvn.c (class pass_fre): Add may_iterate pass parameter. (pass_fre::execute): Honor it. * passes.def: Adjust pass_fre invocations to allow iterating, add non-iterating pass_fre before late threading/dom. * gcc.dg/tree-ssa/pr77445-2.c: Adjust. From-SVN: r272843 diff --git a/gcc/passes.def b/gcc/passes.def index ad2efabd385..9a5b0cd554a 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -312,6 +312,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_strength_reduction); NEXT_PASS (pass_split_paths); NEXT_PASS (pass_tracer); + NEXT_PASS (pass_fre, false /* may_iterate */); NEXT_PASS (pass_thread_jumps); NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); NEXT_PASS (pass_strlen); ...
[Bug tree-optimization/84958] int loads not eliminated against larger stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84958 Tom de Vries changed: What|Removed |Added CC||ams at gcc dot gnu.org, ||julian at codesourcery dot com --- Comment #5 from Tom de Vries --- I've removed the xfail for nvptx. The only remaining xfail is for gcn. Is that one still necessary?
[Bug tree-optimization/97333] [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97333 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #2) > (In reply to Richard Biener from comment #1) > > (because well, on GIMPLE we can duplicate all blocks). > > I'm not sure I understand why, given that tracer.c has a can_duplicate_bb_p > that sometimes returns false. Sent an RFC patch to ask for clarification: > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555890.html . Committed as "[gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p" ( https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=17d5739a6b103cdd3315f5d0e09fe8faa6620a03 ). Now gimple_can_duplicate_bb_p can return false.
[Bug target/97444] New: [nvptx] stack atomics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97444 Bug ID: 97444 Summary: [nvptx] stack atomics Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- There's currently a bunch of tests failing: ... FAIL: gcc.dg/atomic/c11-atomic-exec-6.c -O0 execution test FAIL: gcc.dg/atomic/c11-atomic-exec-6.c -O1 execution test FAIL: gcc.dg/atomic/c11-atomic-exec-6.c -O2 execution test FAIL: gcc.dg/atomic/c11-atomic-exec-6.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.dg/atomic/c11-atomic-exec-6.c -O3 -g execution test FAIL: gcc.dg/atomic/c11-atomic-exec-6.c -Os execution test FAIL: gcc.dg/atomic/c11-atomic-exec-7.c -O0 execution test FAIL: gcc.dg/atomic/c11-atomic-exec-7.c -O1 execution test FAIL: gcc.dg/atomic/c11-atomic-exec-7.c -O2 execution test FAIL: gcc.dg/atomic/c11-atomic-exec-7.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: gcc.dg/atomic/c11-atomic-exec-7.c -O3 -g execution test FAIL: gcc.dg/atomic/c11-atomic-exec-7.c -Os execution test FAIL: gcc.dg/atomic/stdatomic-op-5.c -O0 execution test FAIL: gcc.dg/atomic/stdatomic-op-5.c -O1 execution test FAIL: gcc.dg/atomic/stdatomic-op-5.c -O2 execution test FAIL: gcc.dg/atomic/stdatomic-op-5.c -O3 -g execution test FAIL: gcc.dg/atomic/stdatomic-op-5.c -Os execution test ... due to using atomics with stack memory. The nvptx atomic ops do not support using stack memory. I've been marking test-cases with atomic builtins using stack with dg-require-effective-target sync_int_long_stack, which is ok because they're just builtins. But it's another thing when atomics are part of the language like in the examples above. In principle, it should possible to generate a run-time test using isspacep, and switch between using an atomic op or a non-atomic fallback (that is either an atom.add.u32, or an add.u32, because it's effectively atomic in .local memory). OTOH, we want a mode of operation where we generate atomics as currently, without the runtime test to have smaller and faster code. So, perhaps, a switch -mstack-atomics.
[Bug target/97436] [nvptx] Remove -m32 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97436 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED Target Milestone|--- |11.0 --- Comment #3 from Tom de Vries --- Updated release notes, marking resolved-fixed.
[Bug libgomp/97509] New: [nvptx, offloading] dg-excess-errors directive no longer working in some test-cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97509 Bug ID: 97509 Summary: [nvptx, offloading] dg-excess-errors directive no longer working in some test-cases Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- With current trunk I'm seeing a number of new fails, all of them "test for excess errors": ... FAIL: libgomp.c/../libgomp.c-c++-common/function-not-offloaded.c (test for excess errors) FAIL: libgomp.c/../libgomp.c-c++-common/variable-not-offloaded.c (test for excess errors) FAIL: libgomp.c/pr86416-1.c (test for excess errors) FAIL: libgomp.c/pr86416-2.c (test for excess errors) FAIL: libgomp.c++/../libgomp.c-c++-common/function-not-offloaded.c (test for excess errors) FAIL: libgomp.c++/../libgomp.c-c++-common/variable-not-offloaded.c (test for excess errors) ... Incidentally, that's all but one of the total set of tests that use the dg-excess-errors: ... $ find libgomp/testsuite/ -type f | xargs grep -l dg-excess-errors libgomp/testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c libgomp/testsuite/libgomp.c/pr86416-1.c libgomp/testsuite/libgomp.c/pr86416-2.c ... The one still passing is an openacc test.
[Bug libgomp/97509] [nvptx, offloading] dg-excess-errors directive no longer working in some test-cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97509 Tom de Vries changed: What|Removed |Added Resolution|--- |WORKSFORME Status|UNCONFIRMED |RESOLVED --- Comment #1 from Tom de Vries --- OK, this seems to be the test results when the nvidia driver stopped working, which I haven't been able to repair yet, but I'm assuming WORKSFORME for now. Sorry for the noise.
[Bug libgomp/97532] New: Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97532 Bug ID: 97532 Summary: Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- At commit c26d7df1031 "OpenMP: Fortran - support omp flush's memorder clauses" I'm seeing a few new FAILs with nvptx offloading: ... FAIL: libgomp.fortran/examples-4/simd-2.f90 -O1 (internal compiler error) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O2 (internal compiler error) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (internal compiler error) FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess errors) FAIL: libgomp.fortran/examples-4/simd-2.f90 -Os (internal compiler error) FAIL: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) ... In more detail: ... /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:19:0: Error: insn does not satisfy its constraints:^M (insn 76 75 82 2 (set (reg:V8DF 20 xmm0 [orig:153 vect_c_20.334 ] [153])^M (plus:V8DF (reg:V8DF 20 xmm0 [orig:184 vect__17.333 ] [184])^M (vec_duplicate:V8DF (mem:DF (reg:DI 24 xmm4 [189]) [6 *fact_18(D)+0 S8 A64] "/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90":18:0 1569 {*addv8df3}^M (nil))^M during RTL pass: reload^M /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:19:0: internal compiler error: in extract_constrain_insn, at recog.c:2196^M 0x6092d8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)^M /home/vries/oacc/trunk/source-gcc/gcc/rtl-error.c:108^M 0x609301 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)^M /home/vries/oacc/trunk/source-gcc/gcc/rtl-error.c:118^M 0xcb52ad extract_constrain_insn(rtx_insn*)^M /home/vries/oacc/trunk/source-gcc/gcc/recog.c:2196^M 0xb99457 check_rtl^M /home/vries/oacc/trunk/source-gcc/gcc/lra.c:2173^M ...
[Bug target/97532] [11 Regression] Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97532 --- Comment #12 from Tom de Vries --- (In reply to Hongtao.liu from comment #10) > Created attachment 49444 [details] > Fix invalid address for special memory constraint > > I'm testing this patch. Submitted: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557142.html
[Bug debug/97669] New: Section .debug_info.dwo contains standard_opcode_lenghts array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97669 Bug ID: 97669 Summary: Section .debug_info.dwo contains standard_opcode_lenghts array Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- In the dwarf v5 standard we read: ... .debug_line.dwo - Contains specialized line number tables for the type units in the .debug_info.dwo section. These tables contain only the directory and filename lists needed to interpret DW_AT_decl_file attributes in the debugging information entries. Actual line number tables remain in the .debug_line section, and remain in the relocatable object (.o) files. ... Now consider: ... $ gcc-11 \ -g -gsplit-dwarf \ ~/hello.c \ -v -save-temps \ -dA ... In file hello.s, in section .debug_line.dwo, we have the standard_opcode_lenghts array: ... .byte 0xd # Special Opcode Base .byte 0 # opcode: 0x1 has 0 args .byte 0x1 # opcode: 0x2 has 1 args .byte 0x1 # opcode: 0x3 has 1 args .byte 0x1 # opcode: 0x4 has 1 args .byte 0x1 # opcode: 0x5 has 1 args .byte 0 # opcode: 0x6 has 0 args .byte 0 # opcode: 0x7 has 0 args .byte 0 # opcode: 0x8 has 0 args .byte 0x1 # opcode: 0x9 has 1 args .byte 0 # opcode: 0xa has 0 args .byte 0 # opcode: 0xb has 0 args .byte 0x1 # opcode: 0xc has 1 args ... But given that this is a "specialized line number table", there's no line number program, and because of that, the standard_opcode_lengths array is unnecessary. [ And then there's the fact that actually the whole .debug_line.dwo is unnecessary, given that there are no type units in .debug_info.dwo. ]
[Bug debug/97713] New: [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713 Bug ID: 97713 Summary: [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- I. Consider a simple hello.c, compiled with debug info: ... $ gcc-11 -g ~/hello.c -dA -save-temps ... In a-hello.s, in the CU header we have a label .Ldebug_abbrev0: ... .section.debug_info,"",@progbits .Ldebug_info0: .long 0x87# Length of Compilation Unit Info .value 0x4 # DWARF version number .long .Ldebug_abbrev0 # Offset Into Abbrev. Section referring here: ... .section.debug_abbrev,"",@progbits .Ldebug_abbrev0: .uleb128 0x1# (abbrev code) .uleb128 0x11 # (TAG: DW_TAG_compile_unit) ... In a-hello.o, we find a zero at the abbrev offset: ... Contents of the .debug_info section: Compilation Unit @ offset 0x0: Length:0x87 (32-bit) Version: 4 Abbrev Offset: 0x0 Pointer Size: 8 ... but there's a related relocation in place: ... Relocation section '.rela.debug_info' at offset 0x4c8 contains 16 entries: Offset Info Type Sym. ValueSym. Name + Addend 0006 0007000a R_X86_64_32 .debug_abbrev + 0 ... and in the a.out, we end up with: ... Compilation Unit @ offset 0xc7: Length:0x87 (32-bit) Version: 4 Abbrev Offset: 0x64 ... II. Now cp a-hello.s to hello.s and modify like this: ... $ diff -u a-hello.s hello.s --- a-hello.s 2020-11-04 13:04:03.325633204 +0100 +++ hello.s 2020-11-04 13:04:17.001509687 +0100 @@ -102,6 +102,7 @@ # DW_AT_GNU_all_tail_call_sites .byte 0 # end of children of DIE 0xb .section.debug_abbrev,"",@progbits + .uleb128 0x0# (abbrev code) .Ldebug_abbrev0: .uleb128 0x1# (abbrev code) .uleb128 0x11 # (TAG: DW_TAG_compile_unit) ... and recompile: ... $ gcc-11 -g hello.s -save-temps ... We can see that we've indeed moved the label by 1 byte: ... $ llvm-dwarfdump -debug-abbrev a-hello.o a-hello.o: file format ELF64-x86-64 .debug_abbrev contents: Abbrev table for offset: 0x Abbrev table for offset: 0x0001 [1] DW_TAG_compile_unit DW_CHILDREN_yes ... and likewise, the relocation: ... Relocation section '.rela.debug_info' at offset 0x4c8 contains 16 entries: Offset Info Type Sym. ValueSym. Name + Addend 0006 0007000a R_X86_64_32 .debug_abbrev + 1 ... and likewise, in the a.out: ... Compilation Unit @ offset 0xc7: Length:0x87 (32-bit) Version: 4 Abbrev Offset: 0x65 ... III. Now, let's try the same with in .debug_abbrev.dwo. We compile with -gsplit-dwarf: ... $ gcc-11 ~/hello.c -g -gsplit-dwarf -dA -save-temps ... and copy to hello.s and modify: ... $ diff -u a-hello.s hello.s --- a-hello.s 2020-11-04 13:12:57.188966796 +0100 +++ hello.s 2020-11-04 13:14:48.632059272 +0100 @@ -156,6 +156,7 @@ .byte 0 .byte 0 # end of skeleton .debug_abbrev .section.debug_abbrev.dwo,"e",@progbits + .uleb128 0x0# (abbrev code) .Ldebug_abbrev0: .uleb128 0x1# (abbrev code) .uleb128 0x11 # (TAG: DW_TAG_compile_unit) ... We can see that we indeed added the entry to .debug_abbrev.dwo: ... $ llvm-dwarfdump -debug-abbrev a-hello.dwo a-hello.dwo:file format ELF64-x86-64 .debug_abbrev.dwo contents: Abbrev table for offset: 0x Abbrev table for offset: 0x0001 [1] DW_TAG_compile_unit DW_CHILDREN_yes ... The abbrev offset is 0: ... Contents of the .debug_info.dwo section: Compilation Unit @ offset 0x0: Length:0x50 (32-bit) Version: 4 Abbrev Offset: 0x0 ... but there's no related relocation so the CU is interpreted using the empty abbrev table at offset 0, and we run into: ... <0>: Abbrev Number: 1 readelf: Warning: DIE at offset 0xb refers to abbreviation number 1 which does not exist ... The split into .o/.dwo is done like this: ... /usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/as --gdwarf2 -v --64 -o a-hello.o hello.s objcopy --extract-dwo a-hello.o a-hello.dwo objcopy --strip-dwo a-hello.o ... so we can recreate the original a-hello.o, and find the relocation: ... Relocation section '.rela.debug_info.dwo' at offset 0x710 contains 1 entry: Offset Info Type Sym. ValueSym. Name + Addend 0006 000a000a R_X86_64_32 .debug_abbrev.dwo + 1 ... So, objcopy --extract-
[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713 --- Comment #1 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > and copy to hello.s and modify: > ... > $ diff -u a-hello.s hello.s > --- a-hello.s 2020-11-04 13:12:57.188966796 +0100 > +++ hello.s 2020-11-04 13:14:48.632059272 +0100 > @@ -156,6 +156,7 @@ > .byte 0 > .byte 0 # end of skeleton .debug_abbrev > .section.debug_abbrev.dwo,"e",@progbits > + .uleb128 0x0# (abbrev code) > .Ldebug_abbrev0: > .uleb128 0x1# (abbrev code) > .uleb128 0x11 # (TAG: DW_TAG_compile_unit) > ... And forgot to mention, recompile: ... $ gcc-11 hello.s -g -gsplit-dwarf -save-temps ...
[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713 --- Comment #2 from Tom de Vries --- Filed corresponding binutils PR: "objcopy --extract-dwo silently drops relocation" at https://sourceware.org/bugzilla/show_bug.cgi?id=26841 .
[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713 --- Comment #3 from Tom de Vries --- Mentioning dwarf 5 standard bit @ "7.3.2.2 Second Partition (Unlinked or in a .dwo File)": ... Split DWARF object files do not get linked with any other files, therefore references between sections must not make use of normal object file relocation information. As a result, symbolic references within or between sections are not possible. ...
[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713 Tom de Vries changed: What|Removed |Added CC||ccoutant at gmail dot com, ||jason at gcc dot gnu.org --- Comment #4 from Tom de Vries --- (In reply to Tom de Vries from comment #0) > Now, should objcopy implement the relocation? Nick proposed a patch that errors out on current gcc output. > Note that llvm emits a '0' as abbrev offset instead of a label. And gcc would have to emit a 0, like llvm. It that ok from gcc perspective?
[Bug debug/97774] New: Incorrect line info for try/catch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97774 Bug ID: 97774 Summary: Incorrect line info for try/catch Product: gcc Version: 7.5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- [ This PR is FTR, it's already fixed. ] Consider this test-case, minimized from gdb.cp/gdb9593.cc ( https://src.fedoraproject.org/rpms/gdb/blob/master/f/gdb-archer-vla-tests.patch ): ... $ cat -n test.cc 1 void 2 function1 (void) 3 { 4throw 20; 5 } 6 7 int 8 main (void) 9 { 10try 11 { 12function1 (); 13 } 14catch (int x) 15 { 16 } 17 18return 0; 19 } ... We compile using gcc 7.5.0: ... $ g++ -g test.cc -save-temps -dA ... When trying to step over function1 using next, we end up on line 18, and not at the start of line 18 (given the $hex prefix): ... $ gdb a.out -ex start -ex next Reading symbols from a.out... Temporary breakpoint 1 at 0x4007b5: file test.cc, line 12. Starting program: a.out Temporary breakpoint 1, main () at test.cc:12 12function1 (); 0x004007c1 18return 0; (gdb) ... This is caused by the following. There's a .loc for line 18 after the call to function1, but then we jump away to label .L9: ... # test.cc:12 .loc 1 12 0 call_Z9function1v .LEHE0: # BLOCK 3 seq:1 # PRED: 2 (FALLTHRU) 6 [100.0%] .L7: # test.cc:18 .loc 1 18 0 movl$0, %eax # SUCC: 7 [100.0%] jmp .L9 # BLOCK 4 seq:2 # PRED: 2 (ABNORMAL,ABNORMAL_CALL,EH) .L8: cmpq$1, %rdx ... Since there's no other loc before the insn at .L8, it's considered to be part of line 18.
[Bug debug/97774] Incorrect line info for try/catch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97774 Tom de Vries changed: What|Removed |Added Target Milestone|--- |10.0 Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #1 from Tom de Vries --- Starting gcc 10.1, we have instead: ... jmp .L8 # BLOCK 4 seq:2 # PRED: 2 (ABNORMAL,ABNORMAL_CALL,EH) .L7: # test.cc:14:3 .loc 1 14 3 cmpq$1, %rdx ... and: ... $ gdb a.out -ex start -ex n Reading symbols from a.out... Temporary breakpoint 1 at 0x4007b4: file test.cc, line 12. Starting program: a.out Temporary breakpoint 1, main () at test.cc:12 12function1 (); 14catch (int x) ... This is due to: ... commit 4500f7510368cdb24b8afcc66e86e09cafe49199 Author: Eric Botcazou Date: Fri Jul 5 08:39:13 2019 + except.c (emit_to_new_bb_before): Make sure to put a location on SEQ. * except.c (emit_to_new_bb_before): Make sure to put a location on SEQ. * tree-eh.c (replace_goto_queue_1) : Propagate location. (emit_eh_dispatch): Delete. (lower_catch): Emit the eh_dispatch manually and set the location of the first catch statement onto it. (lower_eh_filter): Emit the eh_dispatch manually and set location. (lower_eh_dispatch): Propagate location. * tree-outof-ssa.c (set_location_for_edge): Handle EH edges specially. (eliminate_build): Likewise. From-SVN: r273132 ...
[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713 --- Comment #5 from Tom de Vries --- (In reply to Tom de Vries from comment #4) > (In reply to Tom de Vries from comment #0) > > Now, should objcopy implement the relocation? > > Nick proposed a patch that errors out on current gcc output. > > > Note that llvm emits a '0' as abbrev offset instead of a label. > > And gcc would have to emit a 0, like llvm. It that ok from gcc perspective? Ping.
[Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321 --- Comment #1 from Tom de Vries --- Ok, let's first make a runnable test-case: ... $ cat src/libgomp/testsuite/libgomp.oacc-c/test.c #include #define TYPE float TYPE a = 1; TYPE b = 2; int main (void) { printf ("A: %f\n", a); #pragma acc parallel num_gangs (1) num_workers (1) copy (a, b) #pragma acc atomic update a += b; printf ("A: %f\n", a); return !(a == 3); } ... Indeed we see the cas, but that has nothing to do with support in the nvptx port: ... atom.cas.b32%r29, [%r25], %r22, %r28; ... This appears already at ompexp on the host, where we expand: ... #pragma omp atomic_load relaxed D.2555 = *D.2568 : D.2557 = D.2555 + b.1; #pragma omp atomic_store relaxed (D.2557) ... into: ... D.2583 = __atomic_load_4 (D.2582, 0); D.2584 = D.2583; : D.2585 = VIEW_CONVERT_EXPR(D.2584); D.2586 = D.2585 + b.1; D.2587 = VIEW_CONVERT_EXPR(D.2586); D.2588 = __sync_val_compare_and_swap_4 (D.2582, D.2584, D.2587); ... This is part of a generic problem with offloading, where choices are made in the host compiler which are suboptimal or even unsupported in the offload compiler. Ideally this should be addressed in the host compiler. It may be possible to address this in the nvptx port by trying to detect the unoptimal pattern and converting it to the optimal atom.add.f32. But ultimately that's a workaround, and it's better to fix this at the source.
gcc-bugs@gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 Bug ID: 97207 Summary: [nvptx, build] nvptx.c:3539:38: error: no matching function for call to ‘swap(bracket_vec_t&, bracket_vec_t&)’ Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Building trunk for nvptx on ubuntu 18.04.5 LTS with g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, I run into: ... src/gcc/config/nvptx/nvptx.c: In member function ‘void bb_sese: :append(bb_sese*)’: src/gcc/config/nvptx/nvptx.c:3539:38: error: no matching functi on for call to ‘swap(bracket_vec_t&, bracket_vec_t&)’ std::swap (brackets, child->brackets); ^ In file included from /usr/include/c++/7/bits/nested_exception.h:40:0, from /usr/include/c++/7/exception:143, from /usr/include/c++/7/ios:39, from /usr/include/c++/7/istream:38, from /usr/include/c++/7/sstream:38, from src/gcc/config/nvptx/nvptx.c:24: /usr/include/c++/7/bits/move.h:187:5: note: candidate: template typename std::enabl e_if >, std::is_move_constructible<_Tp>, std ::is_move_assignable<_Tp> >::value>::type std::swap(_Tp&, _Tp&) swap(_Tp& __a, _Tp& __b) ^~~~ /usr/include/c++/7/bits/move.h:187:5: note: template argument deduction/substitution failed: /usr/include/c++/7/bits/move.h: In substitution of ‘template typename std::enable_i f >, std::is_move_constructible<_Tp>, std::i s_move_assignable<_Tp> >::value>::type std::swap(_Tp&, _Tp&) [with _Tp = auto_vec]’: ...
gcc-bugs@gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 Tom de Vries changed: What|Removed |Added Target||nvptx CC||rguenth at gcc dot gnu.org --- Comment #1 from Tom de Vries --- Regression, started at: ... commit 4b9d61f79c0c0185a33048ae6cc72269cf7efa31 Author: Richard Biener Date: Thu Aug 6 14:50:56 2020 +0200 add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges This adds a move CTOR to auto_vec and makes use of a auto_vec return value for get_loop_exit_edges denoting that lifetime management of the vector is handed to the caller. The move CTOR prompted the hash_table change because it appearantly makes the copy CTOR implicitely deleted (good) and hash-table expansion of the odr_enum_map which is hash_map where odr_enum has an auto_vec member triggers this. Not sure if there's a latent bug there before this (I think we're not invoking DTORs, but we're invoking copy-CTORs). ... The type bracket_vec_t is defined as: ... typedef auto_vec bracket_vec_t; ... so that does look at least related.
gcc-bugs@gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 --- Comment #2 from Tom de Vries --- Configure from build-gcc/config.log: ... $ /home/vries/nvptx/trunk/source-gcc/configure --target=nvptx-none --prefix= --enable-languages=c,c++,fortran --enable-werror --enable-checking=yes CC=gcc -m64 -Wl,-rpath,/i686-pc-linux-gnu/lib64 CXX=g++ -m64 -Wl,-rpath,/i686-pc-linux-gnu/lib64 --enable-linker-plugin-flags=CC=gcc\ -m32\ -Wl,-rpath,/i686-pc-linux-gnu/lib --enable-linker-plugin-configure-flags=--host=i686-pc-linux-gnu --with-sysroot=/nvptx-none --with-build-sysroot=/home/vries/nvptx/trunk/install/nvptx-none --with-build-time-tools=/home/vries/nvptx/trunk/install/nvptx-none/bin --disable-sjlj-exceptions --enable-newlib-io-long-long CFLAGS=-O0 -g CXXFLAGS=-O0 -g ...
gcc-bugs@gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 --- Comment #3 from Tom de Vries --- Created attachment 49271 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49271&action=edit gzipped preprocessed source Reproduce: $ g++ -m64 -fno-PIE -c -O0 -g -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -fpreprocessed nvptx.c -Wno-implicit-fallthrough nvptx.c: In member function ‘void bb_sese::append(bb_sese*)’: /home/vries/nvptx/trunk/source-gcc/gcc/config/nvptx/nvptx.c:3539:38: error: no matching function for call to ‘swap(bracket_vec_t&, bracket_vec_t&)’ std::swap (brackets, child->brackets); ^ ...
gcc-bugs@gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 --- Comment #15 from Tom de Vries --- (In reply to Richard Biener from comment #9) > diff --git a/gcc/vec.h b/gcc/vec.h > index d73d865cff2..c0e577893a3 100644 > --- a/gcc/vec.h > +++ b/gcc/vec.h > @@ -1546,7 +1546,12 @@ public: >this->m_vec = r.m_vec; >r.m_vec = NULL; > } > - void operator= (auto_vec&&) = delete; > + void operator= (auto_vec&& r) > +{ > + this->release (); > + this->m_vec = r.m_vec; > + r.m_vec = NULL; > +} > }; > > > > works for the vec.c test, Tom - can you check if it works for nvptx? It does.
[Bug libbacktrace/97227] New: dsymutil runs on ELF execs during libbacktrace testing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97227 Bug ID: 97227 Summary: dsymutil runs on ELF execs during libbacktrace testing Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: trivial Priority: P3 Component: libbacktrace Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: ian at gcc dot gnu.org Target Milestone: --- When running libbacktrace check, I run into: ... make[3]: Entering directory '/dev/shm/tdevries/data/master/2020-09-24T22-37-10-02-00-942ab9e9d4f/build/libbacktrace' dsymutil btest Stack dump: 0. Program arguments: dsymutil btest #0 0x7fc89542fd7d llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/bin/../lib64/libLLVM.so.10+0x9a9d7d) #1 0x7fc89542d6a0 llvm::sys::RunSignalHandlers() (/usr/bin/../lib64/libLLVM.so.10+0x9a76a0) #2 0x7fc895430412 (/usr/bin/../lib64/libLLVM.so.10+0x9aa412) #3 0x7fc8943295a0 __restore_rt (/lib64/libc.so.6+0x395a0) #4 0x00410085 _init (/usr/bin/dsymutil-10.0.0+0x410085) #5 0x7fc89431434a __libc_start_main (/lib64/libc.so.6+0x2434a) #6 0x0040d69a _init (/usr/bin/dsymutil-10.0.0+0x40d69a) make[3]: *** [Makefile:2396: btest.dSYM] Segmentation fault (core dumped) ... The installed dsymutil is from llvm10, which has a bug which is fixed on llvm trunk by this commit: ... commit ef87f69ec538ccfe7d68b6d03125e7636e859ace (HEAD) Author: Greg Clayton Date: Fri Mar 6 14:59:41 2020 -0800 Fix a copy and paste error that would cause a crash. ... Anyway, after applying that commit we'll be likely to get: ... dsymutil btest error: cannot parse the debug map for 'btest': The file was not recognized as a valid object file ... because this tool is intended for mach-o, not for elf. So, probably we should only do dsymutil btest on a mach-o platform.
[Bug target/97254] New: [nvptx] Define PCC_BITFIELD_TYPE_MATTERS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97254 Bug ID: 97254 Summary: [nvptx] Define PCC_BITFIELD_TYPE_MATTERS Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- While debugging gcc/testsuite/gcc.dg/pr94600-1.c, I found that nvptx doesn't define PCC_BITFIELD_TYPE_MATTERS. AFAIU, the theory for offloading is that settings that influence abi should be compatible with the host. And in i386.h, we find: ... /* If bit field type is int, don't let it cross an int, and give entire struct the alignment of an int. */ /* Required on the 386 since it doesn't have bit-field insns. */ #define PCC_BITFIELD_TYPE_MATTERS 1 ... OTOH, the value may be different for other hosts. On the llvm side, we find this comment in llvm.git/clang/lib/Basic/Targets/ARM.cpp: ... // Do not respect the alignment of bit-field types when laying out // structures. This corresponds to PCC_BITFIELD_TYPE_MATTERS in gcc. UseBitFieldTypeAlignment = false; ... and in llvm.git/clang/lib/Basic/Targets/NVPTX.cpp: ... UseBitFieldTypeAlignment = HostTarget->useBitFieldTypeAlignment(); ... which seems to confirm that approach. It should be possible to get the host target by looking at the --enable-as-accelerator, and figure out the value from there. OTOH, it's possible that for GCC the nvptx implementation of the hook is moot, given that part of the processing is done in the host compiler. That does not answer the question what to do for the standalone target though. But probably, given that the value is set for both known offloading hosts x86_64 and ppc, it might be a good idea to have the standalone target do the same.
[Bug target/90931] [nvptx] FAIL: gcc.c-torture/execute/pr78675.c -O1 execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90931 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |WORKSFORME --- Comment #1 from Tom de Vries --- No longer occurs for current trunk with driver 450.66.
[Bug libgomp/81688] libgomp.c/target-3{3,4}.c fails: GOMP_OFFLOAD_async_run unimplemented for nvptx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81688 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #6 from Tom de Vries --- This no longer fails at current trunk.
[Bug target/96428] [nvptx] nvptx_gen_shuffle does not handle V2DI mode – Fails with an ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96428 Tom de Vries changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Tom de Vries --- Thomas, these are fine follow-up comments, but given that there's currently no ICE and there's a test-case in place to check that, this is resolved-fixed.
[Bug libgomp/81778] libgomp.c/for-5.c failure on nvptx -- illegal memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81778 --- Comment #9 from Tom de Vries --- I ran into this again, and did another round of minimizing. This time, I added some buffering around where we write, and check the entire buffer afterwards: ... $ cat libgomp/testsuite/libgomp.c-c++-common/for-5.c /* { dg-additional-options "-std=gnu99" { target c } } */ #include #define N 4096 #define MID (N/2) #define M 4 #pragma omp declare target int a[N]; #pragma omp end declare target int main (void) { int i; for (i = 0; i < N; i++) a[i] = 0; #pragma omp target update to(a) int s = 1; #pragma omp target simd for (int i = M - 1; i > -1; i -= s) a[MID + i] = 1; #pragma omp target update from(a) int error_found = 0; for (i = 0; i < N; i++) { int expected = (MID <= i && i < MID + M) ? 1 : 0; if (a[i] == expected) continue; error_found = 1; printf ("Expected %d, got %u at %d\n", expected, a[i], i); } if (error_found) __builtin_abort (); return 0; } ... Indeed we're writing more than required (for M == 4, we just want the locations 2048..2051): ... $ LD_LIBRARY_PATH=$(pwd -P)/install/lib64 ./for-5.exe Expected 0, got 1 at 1955 Expected 0, got 1 at 1986 Expected 0, got 1 at 1987 Expected 0, got 1 at 2017 Expected 0, got 1 at 2018 Expected 0, got 1 at 2019 Aborted (core dumped) ... Fails for M >= 2.
[Bug target/80845] nvptx backend generates cvt.u32.u32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80845 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #6 from Tom de Vries --- The committed patch fixes the issue mentioned in comment 0. The issue from comment 3 no longer reproduces. Marking resolved-fixed.
[Bug libgomp/81778] libgomp.c/for-5.c failure on nvptx -- illegal memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81778 --- Comment #10 from Tom de Vries --- Tentative patch: ... diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c index 99cb4f9dda4..034de497390 100644 --- a/gcc/omp-expand.c +++ b/gcc/omp-expand.c @@ -6333,6 +6333,8 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd) /* Collapsed loops not handled for SIMT yet: limit to one lane only. */ if (fd->collapse > 1) simt_maxlane = build_one_cst (unsigned_type_node); + else if (TREE_CODE (fd->loops[0].step) != INTEGER_CST) + simt_maxlane = build_one_cst (unsigned_type_node); else if (safelen_int < omp_max_simt_vf ()) simt_maxlane = build_int_cst (unsigned_type_node, safelen_int); tree vf @@ -6636,6 +6638,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd) else t = fold_build2 (PLUS_EXPR, type, fd->loop.v, step); expand_omp_build_assign (&gsi, fd->loop.v, t); + /* The alternative IV needs to to be updated as well, but isn't +currently. Assert that we fall back to simt_maxlane == 1. */ + gcc_assert (altv == NULL_TREE + || tree_int_cst_equal (simt_maxlane, integer_one_node)); } /* Remove GIMPLE_OMP_RETURN. */ ...
[Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #17 from Tom de Vries --- Patch with test-case committed, marking resolved-fixed.
[Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654 Tom de Vries changed: What|Removed |Added Target Milestone|--- |11.0
[Bug tree-optimization/97159] [11 Regression] segfault in modref_may_conflict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97159 --- Comment #4 from Tom de Vries --- I'm currently not running into this ICE anymore, so presumably it was fixed. I'm not sure by which commit though.
[Bug tree-optimization/97008] [openacc] Remove invariant that IFN_UNIQUE is last stmt in bb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97008 --- Comment #4 from Tom de Vries --- I did a libgomp test run with commit f96b6328fa7 "[tree-optimization] Don't clear ctrl-altering flag for IFN_UNIQUE" reverted, and with this patch: ... diff --git a/gcc/tracer.c b/gcc/tracer.c index 0f69b335b8c..3a4403d92b1 100644 --- a/gcc/tracer.c +++ b/gcc/tracer.c @@ -93,11 +93,15 @@ can_duplicate_insn_p (gimple *g) The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group, so the same holds there, but it could be argued that the IFN_GOMP_SIMT_VOTE_ANY could be generated after that group, - in which case it could be duplicated. */ + in which case it could be duplicated. + An IFN_UNIQUE call must be duplicated as part of its group, + or not at all. */ if (is_gimple_call (g) && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC) || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT) - || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY))) + || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY) + || (gimple_call_internal_p (g) + && gimple_call_internal_unique_p (g return false; return true; @@ -117,8 +121,6 @@ can_duplicate_bb_no_insn_iter_p (const_basic_block bb) if (gimple_code (g) == GIMPLE_TRANSACTION) return false; - /* An IFN_UNIQUE call must be duplicated as part of its group, -or not at all. */ if (is_gimple_call (g) && gimple_call_internal_p (g) && gimple_call_internal_unique_p (g)) ... No issues found.
[Bug libgomp/97291] New: [SIMT] Move SIMT_XCHG_* out of non-uniform execution region
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97291 Bug ID: 97291 Summary: [SIMT] Move SIMT_XCHG_* out of non-uniform execution region Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- We have: ... /* Allocate per-lane storage and begin non-uniform execution region. */ static void expand_GOMP_SIMT_ENTER_ALLOC (internal_fn, gcall *stmt) ... and: ... /* Deallocate per-lane storage and leave non-uniform execution region. */ static void expand_GOMP_SIMT_EXIT (internal_fn, gcall *stmt) ... So, if the SIMT_ENTER_ALLOC and the SIMT_EXIT mark the start and end of a region of non-uniform execution, it's strange that such a region can contain SIMT_XCHG_*, which on nvptx requires uniform execution. Moving SIMT_VOTE_ANY/SIMT_LAST_LANE/SIMT_XCHG_* as a whole after SIMT_EXIT is not possible given that VOTE_ANY may have data dependencies to storage that is deallocated by SIMT_EXIT (as Alexander mentioned here: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555475.html). A possible solution would be to split the SIMT_EXIT into separate bits for exiting non-uniform execution and deallocation, and have: - SIMT_ENTER_ALLOC - SIMT_EXIT_UNI - SIMT_VOTE_ANY/SIMT_LAST_LANE/SIMT_XCHG_* - SIMT_EXIT_DEALLOC Also I've wondered if we could do: - SIMT_ENTER_ALLOC - SIMT_VOTE_ANY - SIMT_EXIT - SIMT_LAST_LANE/SIMT_XCHG_* but perhaps there are again data dependency problems.
[Bug testsuite/81690] libgomp.c/{target-32,thread-limit-2}.c fail for nvptx: missing usleep
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690 --- Comment #8 from Tom de Vries --- Pinged issue here ( https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555496.html ).
[Bug tree-optimization/97008] [openacc] Remove invariant that IFN_UNIQUE is last stmt in bb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97008 --- Comment #5 from Tom de Vries --- The openacc machinery is the only user of IFN_UNIQUE. Thomas, as openacc maintainer, is this change ok for you, or are reasons why you want to keep the IFN_UNIQUE as last stmt of the BB?
[Bug tree-optimization/97159] [11 Regression] segfault in modref_may_conflict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97159 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Tom de Vries --- Marking resolved-fixed.
[Bug middle-end/90861] OpenACC 'declare' not cleaning up for VLAs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90861 Tom de Vries changed: What|Removed |Added CC||vries at gcc dot gnu.org --- Comment #4 from Tom de Vries --- Created attachment 49317 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49317&action=edit Tentative patch
[Bug middle-end/90861] OpenACC 'declare' not cleaning up for VLAs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90861 --- Comment #5 from Tom de Vries --- Patch submitted: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555606.html
[Bug middle-end/90861] OpenACC 'declare' not cleaning up for VLAs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90861 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED Target Milestone|--- |11.0 --- Comment #7 from Tom de Vries --- Patch committed, marking resolved-fixed.
[Bug testsuite/81690] libgomp.c/{target-32,thread-limit-2}.c fail for nvptx: missing usleep
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690 --- Comment #9 from Tom de Vries --- (In reply to Tobias Burnus from comment #4) > The omp_is_initial_device() is only resolved at run time - hence, I think > the linker still wants to see "usleep". > Well, yes, but that could be fixed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82251
[Bug target/97318] New: [nvptx] Function splitting results in invalid function name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97318 Bug ID: 97318 Summary: [nvptx] Function splitting results in invalid function name Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Created attachment 49321 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49321&action=edit openmp test-case While minimizing PR97203 - I ran into: ... FAIL: libgomp.c/test.c (test for excess errors) Excess errors: ptxas /tmp/ccTfIxsQ.o, line 23; fatal : Parsing error near '.part': syntax error ptxas fatal : Ptx assembly aborted due to errors nvptx-as: ptxas returned 255 exit status ... The problem is that this: ... .func (.param .f32 %value_out) sinf.part.0 (.param .f32 %in_ar0); ... is not supported by nvptx (as indicated by NO_DOT_IN_LABEL).
[Bug libgomp/97331] New: [nvptx] Provide GCN_NUM_TEAMS/GCN_NUM_THREADS equivalent
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97331 Bug ID: 97331 Summary: [nvptx] Provide GCN_NUM_TEAMS/GCN_NUM_THREADS equivalent Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- When looking at the gcn plugin, there are a number of environment variables that set/limit launch dimensions: ... $ grep getenv libgomp/plugin/plugin-gcn.c | grep NUM const char *x = secure_getenv ("GCN_NUM_TEAMS"); x = secure_getenv ("GCN_NUM_GANGS"); const char *z = secure_getenv ("GCN_NUM_THREADS"); z = secure_getenv ("GCN_NUM_WORKERS"); ... It would be nice to have similar ones for nvptx.
[Bug libgomp/97332] New: [gcn] GCN_NUM_GANGS/GCN_NUM_WORKERS override compile-time constants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97332 Bug ID: 97332 Summary: [gcn] GCN_NUM_GANGS/GCN_NUM_WORKERS override compile-time constants Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: --- In openacc programs, dimensions are either dynamic or hardcoded. If the dimensions are hardcoded, and there are builtins returning the size of these dimensions, the builtins are folded in fold_internal_goacc_dim. Changing the dimensions in the plugin then invalidates the folding. I'm guessing this should be fixed, or at least documented in the plugin (with perhaps even a warning).
[Bug libgomp/81802] Report cuLaunchKernel launch dimensions in GOMP_OFFLOAD_run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81802 Tom de Vries changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |11.0 Status|UNCONFIRMED |RESOLVED --- Comment #2 from Tom de Vries --- Marking resolved-fixed.
[Bug tree-optimization/97333] New: [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97333 Bug ID: 97333 Summary: [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- With this patch: ... diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 406441751a9..b9168755155 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -6213,7 +6213,7 @@ gimple_split_block_before_cond_jump (basic_block bb) static bool gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED) { - return true; + return false; } /* Create a duplicate of the basic block BB. NOTE: This does not ... we run into: ... spawn -ignore SIGHUP /dev/shm/tdevries/data/master/2020-10-07T08-04-46-02-00-c475cfa435b/build/gcc/xgcc -B/dev/shm/tdevries/data/master/2020-10-07T08-04-46-02-00-c475cfa435b/build/gcc/ -fdiagnostics-plain-output -O1 -w -c -o 20041018-1.o /labs/tdevries/gcc/src/gcc/testsuite/gcc.c-torture/compile/20041018-1.c^M during GIMPLE pass: dom^M /labs/tdevries/gcc/src/gcc/testsuite/gcc.c-torture/compile/20041018-1.c: In function 'foo':^M /labs/tdevries/gcc/src/gcc/testsuite/gcc.c-torture/compile/20041018-1.c:2:1: internal compiler error: in duplicate_block, at cfghooks.c:1093^M 0xbfebaf duplicate_block(basic_block_def*, edge_def*, basic_block_def*, copy_bb_data*)^M /labs/tdevries/gcc/src/gcc/cfghooks.c:1093^M 0x16d794b create_block_for_threading^M /labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:342^M 0x16d90e7 ssa_create_duplicates(redirection_data**, ssa_local_info_t*)^M /labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1130^M 0x16dfe86 void hash_table::traverse_noresize(ssa_local_info_t*)^M /labs/tdevries/gcc/src/gcc/hash-table.h:1081^M 0x16df31a void hash_table::traverse(ssa_local_info_t*)^M /labs/tdevries/gcc/src/gcc/hash-table.h:1102^M 0x16d9dff thread_block_1^M /labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1495^M 0x16d9ee5 thread_block^M /labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1539^M 0x16da3d7 thread_through_loop_header^M /labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1781^M 0x16dc767 thread_through_all_blocks(bool)^M /labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:2667^M 0x15522d2 execute^M /labs/tdevries/gcc/src/gcc/tree-ssa-dom.c:776^M ...
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #2 from Tom de Vries --- Minimal version (without inlining sinf code from newlib): ... /* { dg-additional-options "-lm -foffload=-lm" } */ #define N 1 int main (void) { float k[N]; float res; for (int i = 0; i < N; i++) k[i] = 300; #pragma omp target map(to:k) map(from:res) { float sum = 0.0; #pragma omp simd reduction(+:sum) for (int i = 0; i < N; i++) sum += __builtin_sinf (k[i]); res = sum; } return 0; } ...
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #3 from Tom de Vries --- [ Note, this is with GOMP_NVPTX_JIT=-O0. ] In sinf, we have: ... 45:return -__kernel_cosf(y[0],y[1]); ... which translates to: ... .loc 1 45 12 ld.f32 %r67,[%frame+4]; ld.f32 %r65,[%frame]; { .param .f32 %value_in; .param .f32 %out_arg1; st.param.f32 [%out_arg1],%r65; .param .f32 %out_arg2; st.param.f32 [%out_arg2],%r67; call (%value_in),__kernel_cosf,(%out_arg1,%out_arg2); ld.param.f32 %r68,[%value_in]; } .loc 1 45 11 neg.f32 %r37,%r68; ... If I place (using GOMP_NVPTX_PTXRW) a trap before the first load: ... .loc 1 45 12 +trap ld.f32 %r67,[%frame+4]; ... I get: ... libgomp: cuCtxSynchronize error: an illegal instruction was encountered ... If I place it after the first load, I get: ... libgomp: cuCtxSynchronize error: an illegal memory access was encountered ...
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #4 from Tom de Vries --- So, I think calling functions from simd code is atm not supported for nvptx. Stack variables in simd code are mapped on a per-thread stack rather than on the usual per-warp stack. The functions are compiled with the usual per-warp stack, so calling those functions from simd might mean the different lanes are gonna disagree about what the value in a stack variable should be. Having said that, for the example in comment 2, there only should be one thread executing the call, so this doesn't explain the illegal memory access.
[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #5 from Tom de Vries --- FWIW, another aspect here is convergence (as usual). Looking at the SASS code for main$_omp_fn$0$impl, I don't find evidence for the usual divergence/convergence ops (SSY/SYNC), which might mean that the following shfl is executed in divergent mode, so, even if we would not get the memory access error, we would not get correct results.
[Bug target/97338] New: [nvptx] Convergence checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97338 Bug ID: 97338 Summary: [nvptx] Convergence checking Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- With ptx, we have insns that need to be executed in convergent mode, that is, all threads in the warp active. We can unfortunately not enforce this, but we could check it, which could help pinpoint problems. A ptx insn: ... vote.ballot.b32 %rbla, 1; ... gives us the regmask of active threads, so we could check: ... { .reg .u32 %rwarp_active_mask; vote.ballot.b32 %rwarp_active_mask, 1; .reg .pred %pconvergent; setp.eq.u32 %pconvergent,%rwarp_active_mask,-1; @ ! %pconvergent trap; } ...
[Bug target/97348] New: [nvptx] Make -misa=sm_35 the default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348 Bug ID: 97348 Summary: [nvptx] Make -misa=sm_35 the default Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- The gcc objects and executables for nvptx are somewhat funny given that they contain just ptx text. However, during assembling, we do verify that the ptx is valid, by running it through ptxas, if that happens to be available in the path. This step can be skipped by using -Wa,--no-verify, but it's done by default. The system cuda on my system (ubuntu Ubuntu 18.04.5) is V9.1.108, and I can do a build (where the system ptxas will be used for verification). However, if I want to compile something while my path is set to include the most recent cuda (11.1), I get: ... $ c=~/cuda/11.1/bin; ( export PATH=$c:$PATH; /home/vries/nvptx/trunk/build-gcc/gcc/xgcc -B/home/vries/nvptx/trunk/build-gcc/gcc/ -fdiagnostics-plain-output --sysroot=/home/vries/nvptx/trunk/install/nvptx-none -O0 -w -c -isystem /home/vries/nvptx/trunk/build-gcc/nvptx-none/./newlib/targ-include -isystem /home/vries/nvptx/trunk/source-gcc/newlib/libc/include -o pr42717.o /home/vries/nvptx/trunk/source-gcc/gcc/testsuite/gcc.c-torture/compile/pr42717.c ) ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name' nvptx-as: ptxas returned 255 exit status ... By adding -misa=sm_35, the compilation succeeds. For a test-case, it's stil feasible to add this, but not for a build. So, it looks like it's time to make -misa=sm_35 the new default. [ It's good to note that the ptxas code will not actually be executed, and that in the end the installed driver is the one making the translation for execution. So atm my exec using sm_30 is still supported by Long Lived Branch driver version 450.66. ]
[Bug target/97348] [nvptx] Make -misa=sm_35 the default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348 Tom de Vries changed: What|Removed |Added Target||nvptx --- Comment #1 from Tom de Vries --- Using this: ... diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 75c3d54864e..4c27a832d28 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -60,5 +60,5 @@ EnumValue Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35) misa= -Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM30) +Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35) Specify the version of the ptx ISA to use. ... and rebuilding cc1, we indeed get the sm_35 in the .s file. However, when doing a complete rebuild against system cuda, we run into this in configure doing a conftest: ... ptxas fatal : SM version specified by .target is higher than default SM version assumed nvptx-as: ptxas returned 255 exit status ... If we add -misa=sm_35 to the command line, the conftest passes. Looking in the nvptx-as sources, we find a hard_coded default: ... const char *smver = "sm_30"; ... and after changing that to sm_35, the conftest passes. I'm not sure if nvptx-as should have a hardcoded default, probably it should parse the .target line and pass that to ptxas.
[Bug target/97348] [nvptx] Make -misa=sm_35 the default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348 --- Comment #2 from Tom de Vries --- Anyway, we should be able to work around this by having gcc explicitly pass -m sm_35 to nvptx-as: ... -#define ASM_SPEC "%{misa=*:-m %*}" +#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}" ...
[Bug target/97348] [nvptx] Make -misa=sm_35 the default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348 --- Comment #3 from Tom de Vries --- (In reply to Tom de Vries from comment #1) > Looking in the nvptx-as sources, we find a hard_coded default: > ... > const char *smver = "sm_30"; > ... > and after changing that to sm_35, the conftest passes. > > I'm not sure if nvptx-as should have a hardcoded default, probably it should > parse the .target line and pass that to ptxas. Filed https://github.com/MentorEmbedded/nvptx-tools/issues/24.
[Bug target/97348] [nvptx] Make -misa=sm_35 the default
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348 Tom de Vries changed: What|Removed |Added Target Milestone|--- |11.0 Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #5 from Tom de Vries --- Marking resolved-fixed.
[Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321 --- Comment #3 from Tom de Vries --- (In reply to Thomas Schwinge from comment #2) > However, my report was specifically for the nvptx target compiler. Just > compile with 'nvptx-gcc -fopenacc -S' the code I posed, and compare > '-DTYPE=int'/'-DTYPE=long' vs. '-DTYPE=float'. Ah, I was not aware of usage of openacc beyond the offloading setup. For my understanding, is this just a way for you to easily reproduce some problem really occurring elsewhere, or is this actually used for something?
[Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321 --- Comment #5 from Tom de Vries --- (In reply to Thomas Schwinge from comment #4) > I had been looking into how/when PTX 'atom' is used for reductions, and > first had a look what the back end currently might emit at all, found SDIM > 'atomic_fetch_add', and SF 'atomic_fetch_addsf'. Ack. > I tried to get these > used via '(void) __atomic_fetch_add (&a, b, __ATOMIC_RELAXED);', which works > fine for integer types, but 'error: operand type ‘float *’ is incompatible > with argument 1 of ‘__atomic_fetch_add’' (didn't research the rationale > behind that), so resorted to 'acc atomic'. > Further analysis to be done. > (Can floating-point type atomic generally not be supported, given that > '__atomic_fetch_add' rejects it? Is OMP atomic handling doing something > wrong for these even for nvptx target (real, not via offloading)? Is > something wrong in the nvptx back end?) > I don't know the rationale either, but at least it looks like documented behaviour, both for the builtin and the pattern. I don't see the backend doing anything wrong. > This isn't important right now; I just filed the issue as I'd found it. Ack, understood.
[Bug debug/98656] New: switchlower_O0 drops line number of switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656 Bug ID: 98656 Summary: switchlower_O0 drops line number of switch Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- [ Originally filed as gdb PR at https://sourceware.org/bugzilla/show_bug.cgi?id=27179 ] Consider test-case small.c. ... $ cat -n small.c 1 #include 2 3 void foo (int x, int y) 4 { 5switch (x) { 6 case 0: break; 7 case 1: break; 8 case 2: break; 9 case 3: 10for (int z = 0; z < ({ if (y) break; 5; }); z++) 11break; 12 case 4: break; 13 default: break; 14} 15 } 16 17 int main () 18 { 19foo (1, 1); // L1 20foo (2, 1); // L2 21printf("hello world!"); // L3 22return 0; 23 } ... With gcc-8, we have a .loc with line number 5 representing the switch statement: ... $ gcc-8 -O0 -g small.c -save-temps $ cat small.s foo: .LFB0: .file 1 "small.c" .loc 1 4 1 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 movl%edi, -20(%rbp) movl%esi, -24(%rbp) .loc 1 5 3 cmpl$4, -20(%rbp) ja .L13 ... With gcc-9 that .loc disappeared: ... foo: .LFB0: .file 1 "small.c" .loc 1 4 1 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 movl%edi, -20(%rbp) movl%esi, -24(%rbp) cmpl$4, -20(%rbp) ja .L13 ... and that's still the case with gcc-11. Culprit is switchlower_O0. With this compilation: ... $ rm -f *.c.*; gcc-11 -O0 -g small.c -fdump-tree-all-lineno -save-temps ... we have at a-small.c.234t.cplxlower0: ... : [small.c:5:3] switch (x_1(D)) <[small.c:13:2] default: [INV], [small.c:6:5] case 0: [INV], [small.c:7:5] case 1: [INV], [small.c:8:5] case 2: [INV], [small.c:9:5] case 3: [INV], [small.c:12:5] case 4: [INV]> ... and at a-small.c.236t.switchlower_O0: ... : switch (x_1(D)) <[small.c:13:2] default: [0.00%], [small.c:6:5] case 0: [20.00%], [small.c:7:5] case 1: [20.00%], [small.c:8:5] case 2: [20.00%], [small.c:9:5] case 3: [20.00%], [small.c:12:5] case 4: [20.00%]> ... Note the dropped "[small.c:5:3]" in front of "switch".
[Bug debug/98656] [9/10/11 Regression] switchlower_O0 drops line number of switch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656 --- Comment #1 from Tom de Vries --- Created attachment 49959 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49959&action=edit Tentative patch Using this tentative patch, I get back the .loc for line number 5: ... foo: .LFB0: .file 1 "small.c" .loc 1 4 1 .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq%rsp, %rbp .cfi_def_cfa_register 6 movl%edi, -20(%rbp) movl%esi, -24(%rbp) .loc 1 5 3 cmpl$4, -20(%rbp) ja .L13 ...
[Bug debug/98780] New: Missing line table entry for inlined stmt at -g -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98780 Bug ID: 98780 Summary: Missing line table entry for inlined stmt at -g -O0 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: --- Created attachment 50018 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50018&action=edit source file [ Spinoff of gdb PR25884 - "Stepping over inlined function without line number statement fails" (https://sourceware.org/bugzilla/show_bug.cgi?id=25884). ] Consider test source gdb/testsuite/gdb.opt/inline-cmds.c (attached). When compiled with -O0 -g, this bit: ... 71result = 0; /* set breakpoint 3 here */ 72 73func1 (); /* first call */ 74func1 (); /* second call */ 75marker (); ... where func1 is: ... 33 inline __attribute__((always_inline)) int func1(void) 34 { 35bar (); 36return x * y; 37 } ... is represented in the line info as: ... Line numberStarting addressViewStmt 710x64 x 350x6e x 750x78 x ... where the insn are: ... 64: c7 05 00 00 00 00 00movl $0x0,0x0(%rip)# 6e 6b: 00 00 00 6e: e8 00 00 00 00 callq 73 73: e8 00 00 00 00 callq 78 78: e8 00 00 00 00 callq 7d ... The calls to bar at 6e and 73 represent different instantiations of the same statement, but they get a single entry in the line table. It would be more accurate to have two entries. Note that each call to bar has its own DW_TAG_inlined_subroutine descriptor: ... <2><11e>: Abbrev Number: 10 (DW_TAG_inlined_subroutine) <11f> DW_AT_abstract_origin: <0x208> <123> DW_AT_low_pc : 0x6e <12b> DW_AT_high_pc : 0x5 <133> DW_AT_call_file : 1 <134> DW_AT_call_line : 73 <135> DW_AT_call_column : 3 <2><136>: Abbrev Number: 10 (DW_TAG_inlined_subroutine) <137> DW_AT_abstract_origin: <0x208> <13b> DW_AT_low_pc : 0x73 <143> DW_AT_high_pc : 0x5 <14b> DW_AT_call_file : 1 <14c> DW_AT_call_line : 74 <14d> DW_AT_call_column : 3 ... so at some level gcc knows that these are different statements. [ The problem with gdb is that it ignores the DW_TAG_inlined_subroutine on the second call because there's no corresponding line table entry. ]
[Bug debug/98780] Missing line table entry for inlined stmt at -g -O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98780 --- Comment #1 from Tom de Vries --- At final, we have: ... (note 113 30 112 2 0x7f53d1836de0 NOTE_INSN_BLOCK_BEG) (note 112 113 31 2 0x7f53d1836e40 NOTE_INSN_BLOCK_BEG) (call_insn 31 112 114 2 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] ) [0 bar S1 A8]) (const_int 0 [0])) "src/gdb/testsuite/gdb.opt/inline-cmds.c":35:3 813 {*call} (nil) (nil)) (note 114 31 115 2 0x7f53d1836e40 NOTE_INSN_BLOCK_END) (note 115 114 117 2 0x7f53d1836de0 NOTE_INSN_BLOCK_END) (note 117 115 116 2 0x7f53d1836d20 NOTE_INSN_BLOCK_BEG) (note 116 117 38 2 0x7f53d1836d80 NOTE_INSN_BLOCK_BEG) (call_insn 38 116 118 2 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] ) [0 bar S1 A8]) (const_int 0 [0])) "src/gdb/testsuite/gdb.opt/inline-cmds.c":35:3 813 {*call} (nil) (nil)) (note 118 38 119 2 0x7f53d1836d80 NOTE_INSN_BLOCK_END) (note 119 118 45 2 0x7f53d1836d20 NOTE_INSN_BLOCK_END) ... So if we reset last_linenum when encountering a BLOCK_END: ... diff --git a/gcc/final.c b/gcc/final.c index b037e07fca0..a5da1ce7224 100644 --- a/gcc/final.c +++ b/gcc/final.c @@ -2385,6 +2385,8 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p AT TRIBUTE_UNUSED, debug_hooks->end_block (high_block_linenum, n); gcc_assert (BLOCK_IN_COLD_SECTION_P (NOTE_BLOCK (insn)) == in_cold_section_p); + + last_linenum = 0; } if (write_symbols == DBX_DEBUG) { ... I get: ... Line numberStarting addressViewStmt 710x64 x 350x6e x 350x73 x 750x78 x ...
[Bug libbacktrace/98818] New: [libbacktrace] Don't throw fatal error for unsupported dwarf version
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98818 Bug ID: 98818 Summary: [libbacktrace] Don't throw fatal error for unsupported dwarf version Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libbacktrace Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org CC: ian at gcc dot gnu.org Target Milestone: --- I have several gcc versions installed. I use this f.i. for running gdb testsuite with different gcc versions. Consequently, I have libgcc 11 installed in my system, to be able to support the latest gcc compiler. After installing the libgcc 11 debuginfo package, gcc-go9 started to given me problems: ... $ ./outputs/gdb.go/handcall/handcall fatal error: unrecognized DWARF version in .debug_info at 40 goroutine 1 [running, locked to thread]: fatal error: unrecognized DWARF version in .debug_info at 40 panic during panic goroutine 1 [running, locked to thread]: fatal error: unrecognized DWARF version in .debug_info at 40 stack trace unavailable ... The root cause for this is that libgcc's .debug_info contains dwarf5 units, and that causes: ... dwarf_buf_error (&unit_buf, "unrecognized DWARF version"); ... where we do: ... static void dwarf_buf_error (struct dwarf_buf *buf, const char *msg) { char b[200]; snprintf (b, sizeof b, "%s in %s at %d", msg, buf->name, (int) (buf->buf - buf->start)); buf->error_callback (buf->data, b, 0); } ... which gets us to libgo's error_callback: ... (gdb) l 161 /* Error callback. */ 162 163 static void 164 error_callback (void *data __attribute__ ((unused)), 165 const char *msg, int errnum) 166 { 167 if (errnum == -1) 168 { 169 /* No debug info available. Carry on as best we can. */ 170 return; (gdb) l 171 } 172 if (errnum != 0) 173 runtime_printf ("%s errno %d\n", msg, errnum); 174 runtime_throw (msg); 175 } ... ISTM that dwarf info that has a newer version than the libbacktrace reader supports is not that different from missing debug info, so I wonder if we should call the error_callback here with -1 instead.