[Bug debug/99319] New: DW_MACRO_define_strp uses uleb128 for second operand

2021-03-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99319

Bug ID: 99319
   Summary: DW_MACRO_define_strp uses uleb128 for second operand
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider compiling hello world, with macro info:
...
$ gcc-11 -ggdb3 -g hello.c -dA -save-temps
...

In a-hello.s, we have:
...
.section.debug_macro,"",@progbits
.Ldebug_macro0:
.value  0x5 # DWARF macro version number
.byte   0x2 # Flags: 32-bit, lineptr present
.long   .Ldebug_line0
.byte   0x3 # Start new file
.uleb128 0  # Included from line number 0
.uleb128 0x1# file hello.c
.byte   0x5 # Define macro strp
.uleb128 0  # At line number 0
.long   .LASF2  # The macro: "__STDC__ 1"
...

So, the DW_MACRO_define_strp entry (starting with .byte 0x5) has two operands,
a uleb128 and a .long.

AFAIU, this is in accordance with the spec:
...
A DW_MACRO_define_strp or DW_MACRO_undef_strp entry has two operands. The first
operand encodes the source line number of the #define or #undef macro
directive. The second operand consists of an offset into a string table
contained in the .debug_str section of the object file. The size of the operand
is given in the header offset_size_flag field.
...

Now add -gsplit-dwarf:
...
$ gcc-11 -ggdb3 -g hello.c -dA -save-temps -gsplit-dwarf
...

Now we have instead:
...
.section.debug_macro.dwo,"e",@progbits
.Ldebug_macro0:
.value  0x5 # DWARF macro version number
.byte   0x2 # Flags: 32-bit, lineptr present
.long   .Lskeleton_debug_line0
.byte   0x3 # Start new file
.uleb128 0  # Included from line number 0
.uleb128 0x1# file hello.c
.byte   0x5 # Define macro strp
.uleb128 0  # At line number 0
.uleb128 0x191  # The macro: "__STDC__ 1"
...

The second operand is now also a .uleb128.  AFAIU, this goes against the spec.

[Bug debug/99319] DW_MACRO_define_strp uses uleb128 for second operand

2021-03-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99319

--- Comment #1 from Tom de Vries  ---
Related readelf PR: https://sourceware.org/bugzilla/show_bug.cgi?id=27387

[Bug debug/99319] DW_MACRO_define_strp uses uleb128 for second operand

2021-03-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99319

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> The second operand is now also a .uleb128.  AFAIU, this goes against the
> spec.

Also, gdb doesn't get it:
...
$ gdb -q -batch -readnow a.out
DW_FORM_strp pointing outside of .debug_str section [in module
/home/vries/hello/a.out]
...

Debugging shows that the error is due to a large str_offset:
...
(gdb) p /x str_offset
$14 = 0x8c0502cd
...
which matches this:
...
.byte   0x5 # Define macro strp
.uleb128 0x8b   # At line number 139
.uleb128 0x14d  # The macro: "stdin stdin"
.byte   0x5 # Define macro strp
.uleb128 0x8c   # At line number 140
.uleb128 0x24   # The macro: "stdout stdout"
...

Note that the uleb128 representation of 0x14d is "cd02".

[Bug debug/95432] inconsistent behaviors at -O2

2021-03-07 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95432

--- Comment #3 from Tom de Vries  ---
(In reply to Andrew Pinski from comment #2)
> Assembly:
> .loc 1 12 3 is_stmt 1 view .LVU12
> .loc 1 10 8 is_stmt 0 view .LVU13
> movaps  %xmm0, (%rsp)
> .loc 1 11 8 view .LVU14
> movaps  %xmm0, 32(%rsp)
> .loc 1 12 13 view .LVU15
> callfoo
> .LVL1:
> .loc 1 13 13 view .LVU16
> leaq32(%rsp), %rdi
> .loc 1 12 13 view .LVU17
> movl%eax, %edx
> .LVL2:
> .loc 1 13 3 is_stmt 1 view .LVU18
> .loc 1 13 13 is_stmt 0 view .LVU19
> callfoo
> .LVL3:
> .loc 1 14 3 is_stmt 1 view .LVU20
> .loc 1 14 6 is_stmt 0 view .LVU21
> 
> Looks correct to me, both call foo have the correct line on them.  I think
> this is another GDB issue, most likely how dwarf3 and is_stmt is handled
> just like 95431 even.

Ack, this is gdb PR breakpoints/26063 (
https://sourceware.org/bugzilla/show_bug.cgi?id=26063 ).

Fixed by
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebde6f2ddc987e7e2d5a218ee8cf0126ec189424
.

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-03-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #1 from Tom de Vries  ---
I see this as well:
...
PASS: libgomp.c/../libgomp.c-c++-common/task-detach-6.c (test for excess
errors)
WARNING: program timed out.
...

[Bug target/99564] New: [nvptx] FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)

2021-03-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99564

Bug ID: 99564
   Summary: [nvptx] FAIL:
libgomp.oacc-fortran/derivedtypes-arrays-1.f90
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0
-foffload=nvptx-none  -O0  (test for excess errors)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

...
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  (test for excess errors)
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  (test for excess errors)
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess
errors)
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  (test for excess errors)
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  (test for excess errors)
...

In more detail:
...
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
In function 'MAIN__._omp_fn.0':^M
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:90:40:
warning: using vector_length (32), ignoring 1^M
FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  (test for excess errors)
...

[Bug driver/99896] New: g++ drops -lc

2021-04-03 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896

Bug ID: 99896
   Summary: g++ drops -lc
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ Spinoff from gdb PR https://sourceware.org/bugzilla/show_bug.cgi?id=27681 . ]

Consider the following test-case, consisting of:
...
$ cat main.c 
#include 
#include 

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include 

extern void foo (void);

int
main (void)
{
  regex_t re;

  int res = regcomp (&re, "bla", 0);
  assert (res == 0);

  int res2 = regexec (&re, "bla", 0, NULL, 0);
  assert (res2 == 0);

  regoff_t res3 = re_search (&re, "bla", 3, 0, 3, NULL);
  assert (res3 == 0);

  foo ();

  return 0;
} 
...
and:
...
$ cat foo.c 
#include 
#include 

#include 

extern void foo (void);

void
foo (void)
{
  regex_t re;

  int res = pcre2_regcomp (&re, "bla", 0);
  assert (res == 0);

  int res2 = pcre2_regexec (&re, "bla", 0, NULL, 0);
  assert (res2 == 0);
}
...

We can compile with gcc and run like this:
...
$ gcc main.c -lc foo.c -lpcre2-posix
$ ./a.out 
$
...

likewise, with clang:
...
$ clang main.c -lc foo.c -lpcre2-posix
$ ./a.out 
$ 
...

likewise, with clang++:
...
$ clang++ -x c++ main.c -lc foo.c -lpcre2-posix
$ ./a.out 
$
...

but with g++:
...
$ g++ -x c++ main.c -lc foo.c -lpcre2-posix
$ ./a.out 
Segmentation fault (core dumped)
$
...

Using -v, we can see what goes wrong.  With gcc, we have:
...
collect2 ... main.o -lc foo.o -lpcre2-posix ...
...

With g++, we have instead:
...
collect2 ... main.o foo.o -lpcre2-posix ...
...

Workaround: use -Wl:
...
$ g++ -x c++ main.c -Wl,-lc foo.c -lpcre2-posix
$ ./a.out 
$
...

[Bug driver/99896] g++ drops -lc

2021-04-03 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896

--- Comment #2 from Tom de Vries  ---
(In reply to Jonathan Wakely from comment #1)
> (In reply to Tom de Vries from comment #0)
> > With g++, we have instead:
> > ...
> > collect2 ... main.o foo.o -lpcre2-posix ...
> > ...
> 
> It isn't dropped, it's moved to the end:
> 
> main.o foo.o -lpcre2-posix -lstdc++ -lm -lc -lgcc_s -lgcc -lc -lgcc_s -lgcc
> 

I don't understand. AFAICT, it's dropped.  It's not moved to the end, because
-lc is already at the end without specifying -lc. 

> If you need it before foo.o then -Wl,-lc seems like the right workaround for
> me.
> 

Um, for my understanding, does that mean you agree this is a bug in g++?

> Why is it needed there anyway though?

main.o is intended to use regcomp from glibc.  Foo.o is intended to use
pcre2_regcomp from pcre2-posix (which is also accessible using plain regcomp). 
When -lc is droppend, regcomp from pcre2-posix is used by main instead, which
is incompatible with using re_search from glibc.

[Bug driver/99896] g++ drops -lc

2021-04-03 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99896

Tom de Vries  changed:

   What|Removed |Added

 CC||matz at suse dot de

--- Comment #6 from Tom de Vries  ---
(In reply to Jonathan Wakely from comment #4)
> (In reply to Tom de Vries from comment #2)
> > I don't understand. AFAICT, it's dropped.  It's not moved to the end,
> > because -lc is already at the end without specifying -lc. 
> 
> OK, it's dropped because it's always present at the end.
> 
> This is similar to adding -I/usr/include which gets ignored, because it's
> already going to be searched anyway as a system header directory. Quoting
> the manual:
> 
> "If a standard system include directory, or a directory specified with
> -isystem, is also specified with -I, the -I option is ignored."
> 
>  
> > Um, for my understanding, does that mean you agree this is a bug in g++?
> 
> No.
> 

OK, so here ( https://gcc.gnu.org/onlinedocs/gcc/Invoking-GCC.html#Invoking-GCC
) I read:
...
Also, the placement of the -l option is significant. 
...

So, if the documentation of gcc says that placement of the -l option is
significant, then why does g++ decide to mess with that?  ISTM g++ violates
documented behaviour.

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #4 from Tom de Vries  ---
Investigated using cuda-gdb.

After typing ^c, we investigate the state:
...
(cuda-gdb) info cuda kernels
  Kernel Parent Dev Grid Status   SMs Mask GridDim BlockDim Invocation 
*  0  -   01 Active 0x0010 (1,1,1) (32,8,1) main$_omp_fn() 
...

So, we have 256 threads in the CTA, or 8 warps.

The threads have the following state:
...
(cuda-gdb) info cuda threads
  BlockIdx ThreadIdx To BlockIdx ThreadIdx Count Virtual PC Filename 
Line 
Kernel 0
*  (0,0,0)   (0,0,0) (0,0,0)   (0,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (0,1,0) (0,0,0)   (0,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (1,0,0) (0,0,0)   (1,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (1,1,0) (0,0,0)   (1,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (2,0,0) (0,0,0)   (2,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (2,1,0) (0,0,0)   (2,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (3,0,0) (0,0,0)   (3,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (3,1,0) (0,0,0)   (3,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (4,0,0) (0,0,0)   (4,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (4,1,0) (0,0,0)   (4,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (5,0,0) (0,0,0)   (5,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (5,1,0) (0,0,0)   (5,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (6,0,0) (0,0,0)   (6,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (6,1,0) (0,0,0)   (6,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (7,0,0) (0,0,0)   (7,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (7,1,0) (0,0,0)   (7,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (8,0,0) (0,0,0)   (8,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (8,1,0) (0,0,0)   (8,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)   (9,0,0) (0,0,0)   (9,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)   (9,1,0) (0,0,0)   (9,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (10,0,0) (0,0,0)  (10,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (10,1,0) (0,0,0)  (10,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (11,0,0) (0,0,0)  (11,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (11,1,0) (0,0,0)  (11,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (12,0,0) (0,0,0)  (12,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (12,1,0) (0,0,0)  (12,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (13,0,0) (0,0,0)  (13,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (13,1,0) (0,0,0)  (13,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (14,0,0) (0,0,0)  (14,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (14,1,0) (0,0,0)  (14,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (15,0,0) (0,0,0)  (15,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (15,1,0) (0,0,0)  (15,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (16,0,0) (0,0,0)  (16,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (16,1,0) (0,0,0)  (16,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (17,0,0) (0,0,0)  (17,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (17,1,0) (0,0,0)  (17,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (18,0,0) (0,0,0)  (18,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (18,1,0) (0,0,0)  (18,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (19,0,0) (0,0,0)  (19,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (19,1,0) (0,0,0)  (19,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (20,0,0) (0,0,0)  (20,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (20,1,0) (0,0,0)  (20,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (21,0,0) (0,0,0)  (21,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (21,1,0) (0,0,0)  (21,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (22,0,0) (0,0,0)  (22,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (22,1,0) (0,0,0)  (22,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (23,0,0) (0,0,0)  (23,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (23,1,0) (0,0,0)  (23,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (24,0,0) (0,0,0)  (24,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (24,1,0) (0,0,0)  (24,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (25,0,0) (0,0,0)  (25,0,0) 1 0x00b5f638  n/a   
 0 
   (0,0,0)  (25,1,0) (0,0,0)  (25,7,0) 7 0x00b2f350  n/a   
 0 
   (0,0,0)  (26,0,0) (0,0,0)  (26,0,0) 1 0x00b5f638  

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

Tom de Vries  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #6 from Tom de Vries  ---
Current theory ...

All omp-threads are supposed to participate in a team barrier, and then all
together move on.  The master omp-thread participates from gomp_team_end, the
other omp-threads from the worker loop in gomp_thread_start.

Instead, it seems the master omp-thread gets stuck at the team barrier, while
all other omp-threads move on, to the thread pool barrier, and that state
corresponds to the observed hang.

AFAICT, the problem starts when gomp_team_barrier_wake is called with count ==
1:
...
void
gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
{
  if (bar->total > 1)
asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
}
...
The count argument is ignored, and instead all omp-threads are woken up, which
causes omp-threads to escape the team barrier.

This all is a result of the gomp_barrier_handle_tasks path being taken in
gomp_team_barrier_wait_end, and I haven't figured out why that is triggered, so
it still may be that the root cause lies elsewhere.

Anyway, the nvptx bar.{c,h} is copied from linux/bar.{c,h}, which is
implemented using futex, and with futex uses replaced with bar.sync uses.

FWIW, replacing libgomp/config/nvptx/bar.{c,h} with libgomp/config/posix.{c,h}
fixes the problem.  Did a full libgomp test run, all problems fixed.

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-19 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #7 from Tom de Vries  ---
Created attachment 50627
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50627&action=edit
debug patch

A bit more analysis.

I'm working with this example, with an actual task to be able to perform a
check afterwards:
...
#include 

int i = 1;

int
main (void)
{

#pragma omp target map(tofrom:i)
#pragma omp parallel num_threads(2)
#pragma omp task
  {
__atomic_add_fetch (&i, 1, __ATOMIC_SEQ_CST);
  }

  assert (i == 3);

  return 0;
}
...

And I've forced the plugin to launch with two omp-threads to limit the
dimensions to the minimium:
...
(cuda-gdb) info cuda kernels
  Kernel Parent Dev Grid Status   SMs Mask GridDim BlockDim Invocation 
*  0  -   01 Active 0x0010 (1,1,1) (32,2,1) main$_omp_fn() 
...

Furthermore I've made specific instances for the bar.sync team barrier, to get
more meaningful backtraces.  So the lifetimes of the two omp-threads look like
this.

THREAD 0:
...
#0  0x00b73aa8 in bar_sync_thread_0 ()
#1  0x00b74a80 in bar_sync_n ()
#2  0x00b72598 in bar_sync_1 ()
#3  0x00b760b8 in gomp_team_barrier_wake ()
#4  0x00b5bc38 in GOMP_task ()
#5  0x00b36a58 in main$_omp_fn () # $1
#6  0x00a7e618 in GOMP_parallel ()
#7  0x00b377a0 in main$_omp_fn$0$impl ()
#8  0x00b3c700 in gomp_nvptx_main ()
#9  0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x00b380e8 in main$_omp_fn () # $2
#1  0x00b95178 in gomp_barrier_handle_tasks ()
#2  0x00b76e38 in gomp_team_barrier_wait_end ()
#3  0x00b77dd8 in gomp_team_barrier_wait_final ()
#4  0x00b2a1b8 in gomp_team_end ()
#5  0x00b318d8 in GOMP_parallel_end ()
#6  0x00a7e620 in GOMP_parallel ()
#7  0x00b377a0 in main$_omp_fn$0$impl ()
#8  0x00b3c700 in gomp_nvptx_main ()
#9  0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x00b380e8 in main$_omp_fn () # $2
#1  0x00b95178 in gomp_barrier_handle_tasks ()
#2  0x00b76e38 in gomp_team_barrier_wait_end ()
#3  0x00b77dd8 in gomp_team_barrier_wait_final ()
#4  0x00b2a1b8 in gomp_team_end ()
#5  0x00b318d8 in GOMP_parallel_end ()
#6  0x00a7e620 in GOMP_parallel ()
#7  0x00b377a0 in main$_omp_fn$0$impl ()
#8  0x00b3c700 in gomp_nvptx_main ()
#9  0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x00b73aa8 in bar_sync_thread_0 ()
#1  0x00b74a80 in bar_sync_n ()
#2  0x00b72598 in bar_sync_1 ()
#3  0x00b760b8 in gomp_team_barrier_wake ()
#4  0x00b94c98 in gomp_barrier_handle_tasks ()
#5  0x00b76e38 in gomp_team_barrier_wait_end ()
#6  0x00b77dd8 in gomp_team_barrier_wait_final ()
#7  0x00b2a1b8 in gomp_team_end ()
#8  0x00b318d8 in GOMP_parallel_end ()
#9  0x00a7e620 in GOMP_parallel ()
#10 0x00b377a0 in main$_omp_fn$0$impl ()
#11 0x00b3c700 in gomp_nvptx_main ()
#12 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x00b73aa8 in bar_sync_thread_0 ()
#1  0x00b74a80 in bar_sync_n ()
#2  0x00b719b8 in bar_sync_3 ()
#3  0x00b76f50 in gomp_team_barrier_wait_end ()
#4  0x00b77dd8 in gomp_team_barrier_wait_final ()
#5  0x00b2a1b8 in gomp_team_end ()
#6  0x00b318d8 in GOMP_parallel_end ()
#7  0x00a7e620 in GOMP_parallel ()
#8  0x00b377a0 in main$_omp_fn$0$impl ()
#9  0x00b3c700 in gomp_nvptx_main ()
#10 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

^C

#0  0x00b73da8 in bar_sync_thread_0 ()
#1  0x00b74a80 in bar_sync_n ()
#2  0x00b719b8 in bar_sync_3 ()
#3  0x00b76f50 in gomp_team_barrier_wait_end ()
#4  0x00b77dd8 in gomp_team_barrier_wait_final ()
#5  0x00b2a1b8 in gomp_team_end ()
#6  0x00b318d8 in GOMP_parallel_end ()
#7  0x00a7e620 in GOMP_parallel ()
#8  0x00b377a0 in main$_omp_fn$0$impl ()
#9  0x00b3c700 in gomp_nvptx_main ()
#10 0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()
...

THREAD 1:
...
#0  0x00b70ae8 in bar_sync_thread_1 ()
#1  0x00b74b80 in bar_sync_n ()
#2  0x00b72598 in bar_sync_1 ()
#3  0x00b760b8 in gomp_team_barrier_wake ()
#4  0x00b5bc38 in GOMP_task ()
#5  0x00b36a58 in main$_omp_fn () # $1
#6  0x00b3cbb8 in gomp_nvptx_main ()
#7  0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

#0  0x00b70ae8 in bar_sync_thread_1 ()
#1  0x00b74b80 in bar_sync_n ()
#2  0x00b719b8 in bar_sync_3 ()
#3  0x00b76f50 in gomp_team_barrier_wait_end ()
#4  0x00b77dd8 in gomp_team_barrier_wait_final ()
#5  0x00b3cd50 in gomp_nvptx_main ()
#6  0x00b39420 in main$_omp_fn<<<(1,1,1),(32,2,1)>>> ()

^C

#0  0x00b3ca30 in gomp_nvptx_main ()
#1  0

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-19 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #8 from Tom de Vries  ---
This fixes the hang:
...
@@ -91,14 +129,16 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar,
gomp_barrier_state_t state)
{
  gomp_barrier_handle_tasks (state);
  state &= ~BAR_WAS_LAST;
+ gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
+ if (gen == state + BAR_INCR)
+   return;
}
   else
{
...

I'm not yet sure about the implementation, but the idea is to detect that
gomp_team_barrier_done was called during gomp_barrier_handle_tasks, and then
bail out.

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-19 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #9 from Tom de Vries  ---
(In reply to Tom de Vries from comment #8)
> This fixes the hang:

This is a less intrusive solution, and is easier to transplant into
gomp_team_barrier_wait_cancel_end:
...
diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index c5c2fa8829b..cb7b299c6a8 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -91,6 +91,9 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar,
gomp_barrier_state_t state)
{
  gomp_barrier_handle_tasks (state);
  state &= ~BAR_WAS_LAST;
+ if (team->task_count != 0)
+   __builtin_abort ();
+ bar->total = 1;
}
   else
{
@@ -157,6 +160,9 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
{
  gomp_barrier_handle_tasks (state);
  state &= ~BAR_WAS_LAST;
+ if (team->task_count != 0)
+   __builtin_abort ();
+ bar->total = 1;
}
   else
{
...

[Bug target/99555] [OpenMP/nvptx] Execution-time hang for simple nested OpenMP 'target'/'parallel'/'task' constructs

2021-04-20 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99555

--- Comment #10 from Tom de Vries  ---
Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568295.html

[Bug libgomp/100160] MinGW-w64 g++ with libgomp and nvptx looks for libgomp-plugin-nvptx.so.1 instead of libgomp-plugin-nvptx-1.dll

2021-04-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100160

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #1 from Tom de Vries  ---
Test-case updated: removed commented out lines, formatted, printing
omp_get_thread_num to get some printout that is not all zeroes:
...
#include 
#include 

using namespace std;

int threads = omp_get_max_threads ();
int devices = omp_get_num_devices ();
int i, j[100];

int
main (void)
{
  printf ("Threads: %d Devices: %d\n", threads, devices);

#pragma omp target teams distribute parallel for shared(j)
  for(i = 0; i < 100; i++)
j[i] = omp_get_thread_num ();

  for (i = 0; i < 100; i++)
printf ("%d ", j[i]);

  return 0;
}
...

On ubuntu, we have:
...
$ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none -fno-lto
$ LD_LIBRARY_PATH=./install/lib64 ./a.out
Threads: 8 Devices: 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 3
3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6
6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 
$
...

And:
...
$ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none 
lto-wrapper: fatal error: could not find accel/nvptx-none/mkoffload in
/home/vries/oacc/trunk/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/:/home/vries/oacc/trunk/install/bin/../libexec/gcc/
(consider using ‘-B’)
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
...

So, there might be something windows-specific that is going wrong.

But things seems to already go wrong on linux.

[Bug target/99564] [nvptx] FAIL: libgomp.oacc-fortran/derivedtypes-arrays-1.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)

2021-04-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99564

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
 CC||tschwinge at gcc dot gnu.org
   Target Milestone|--- |11.0

--- Comment #1 from Tom de Vries  ---
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=8bafce1be11a301c2421483736c634b8bf330e69

[Bug libgomp/100160] MinGW-w64 g++ with libgomp and nvptx looks for libgomp-plugin-nvptx.so.1 instead of libgomp-plugin-nvptx-1.dll

2021-04-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100160

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> ...
> $ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none 
> lto-wrapper: fatal error: could not find accel/nvptx-none/mkoffload in
> /home/vries/oacc/trunk/install/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/
> :/home/vries/oacc/trunk/install/bin/../libexec/gcc/ (consider using ‘-B’)
> compilation terminated.
> /usr/bin/ld: error: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> ...

I can make this work by adding:
...
$ ./install/bin/g++ test-1.cpp -fopenmp -foffload=nvptx-none -flto \
  -B $(pwd
-P)/install/offload-nvptx-none/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/ \
  -B $(pwd -P)/install/offload-nvptx-none/bin/
$ LD_LIBRARY_PATH=./install/lib64 ./a.out
Threads: 8 Devices: 1
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
0 1 2 3 4 0 1 2 3 4 0 1 2 3 
...
Hmm, this actually executes on the target, the -fno-lto case executes the host
fallback.

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

--- Comment #1 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #0)
> We're seeing OpenACC/nvptx offloading execution regressions (including a lot
> of timeouts) starting with CUDA 11.2-era Nvidia Driver 460.27.04.  Confirmed
> with: CUDA 11.2-era 460.27.04, 460.32.03, 460.39, 460.56, 460.67, and CUDA
> 11.3-era 465.19.01, across several variants of GPU hardware.
> 
> Explicitly (re-)confirmed good are older versions such as CUDA 9.1-era
> 390.12, and CUDA 11.1-era 455.38, 455.45.01.
> 
> Most of these are in the 'vector_length > 32' testcases, but also a few
> others.
> 

Confirmed, I see on ubuntu 18.04.5 with dirver version 460.67:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 
execution test
...

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

--- Comment #2 from Tom de Vries  ---
Minimal example:
...
$ cat libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
int
main (void)
{
  int vectors_max = -1;
#pragma acc parallel \
  num_gangs (1) \
  num_workers (1) \
  vector_length (32) \
  copy (vectors_max)
{
#pragma acc loop gang reduction (max: vectors_max)
  for (int i = 0; i < 2; i++)
#pragma acc loop worker reduction (max: vectors_max)
for (int j = 0; j < 2; j++)
#pragma acc loop vector reduction (max: vectors_max)
  for (int k = 0; k < 32; k++)
vectors_max = k;
}

  if (vectors_max != 31)
__builtin_abort ();

  return 0;
}
...

Passes with GOMP_NVPTX_JIT=-O0, starts failing at GOMP_NVPTX_JIT=-O1.

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

--- Comment #3 from Tom de Vries  ---
Created attachment 50660
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50660&action=edit
Cuda reproducer

[Bug target/100232] [OpenMP][nvptx] Reduction fails with optimization and 'loop'/'for simd' but not with 'for'

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100232

--- Comment #1 from Tom de Vries  ---
Can you try the patch for PR81778 ?

It's possible you're looking at a duplicate.

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

--- Comment #4 from Tom de Vries  ---
Created attachment 50662
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50662&action=edit
Updated cuda reproducer

Slimmed down further, eliminated gang/worker reduction parts.

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

--- Comment #5 from Tom de Vries  ---
FIled https://developer.nvidia.com/nvidia_bug/3299227

[Bug tree-optimization/97333] [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97333

--- Comment #2 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> (because well, on GIMPLE we can duplicate all blocks).

I'm not sure I understand why, given that tracer.c has a can_duplicate_bb_p
that sometimes returns false.  Sent an RFC patch to ask for clarification:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555890.html .

[Bug target/97348] [nvptx] Make -misa=sm_35 the default

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348

--- Comment #6 from Tom de Vries  ---
(In reply to CVS Commits from comment #4)
> Both build again cuda 9.1.

FWIW, tested post-commit against cuda 11.1, no issues found.

[Bug target/97318] [nvptx] Function splitting results in invalid function name

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97318

--- Comment #1 from Tom de Vries  ---
Tentative patch:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index afac1bda45d..7b6a42893f9 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -365,6 +365,30 @@ nvptx_name_replacement (const char *name)
 return "__nvptx_free";
   if (strcmp (name, "realloc") == 0)
 return "__nvptx_realloc";
+
+  if (strchr (name, '.') != NULL)
+{
+  static char *p = NULL;
+  static size_t p_size = 0;
+  size_t len = strlen (name);
+  size_t len_0 = len + 1;
+  if (p == NULL)
+   {
+ p_size = len_0;
+ p = XNEWVEC (char, p_size);
+   }
+  else if (len_0 > p_size)
+   {
+ p_size = len_0;
+ p = XRESIZEVEC (char, p, p_size);
+   }
+  strncpy (p, name, len_0);
+  for (size_t i = 0; i < len_0; ++i)
+   if (p[i] == '.')
+ p[i] = '$';
+  return p;
+}
+
   return name;
 }

...

[Bug target/97318] [nvptx] Function splitting results in invalid function name

2020-10-10 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97318

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Tom de Vries  ---
Marking resolved-fixed.

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #7 from Tom de Vries  ---
(In reply to Alexander Monakov from comment #6)
> (In reply to Tom de Vries from comment #4)
> > So, I think calling functions from simd code is atm not supported for nvptx.
> > 
> > Stack variables in simd code are mapped on a per-thread stack rather than on
> > the
> > usual per-warp stack.
> > 
> > The functions are compiled with the usual per-warp stack, so calling those
> > functions from simd might mean the different lanes are gonna disagree about
> > what the value in a stack variable should be.
> 
> This is inaccurate. In -msoft-stack mode there's no baked-in assumption that
> stacks are always per-warp. The "soft stack" pointer can point either to
> global memory (outside of SIMD regions), or to local memory (inside SIMD
> regions). The pointer is switched between per-warp global memory and
> per-lane local memory by nvptx.c:nvptx_output_softstack_switch.
> 
> The main requirement is that functions callable from OpenMP offloaded code
> are compiled for -mgomp multilib variant. The design allows calling
> functions even from inside SIMD regions, and it should be supported.

I see, that's helpful, thanks.

I guess I was thrown off by seeing a %simtstack_ar of 136 bytes:
...
.local .align 8 .b8 %simtstack_ar[136];
...
which seems more of an amount claimed by a single function.

Is it possible you meant the default of -msoft-stack-reserve-local=128 to mean
128kb (similar to what is claimed in nvptx_stacks_size in the plugin)? Because
currently it means 128 bytes.

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #9 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> Minimal version (without inlining sinf code from newlib):
> ...
> /* { dg-additional-options "-lm -foffload=-lm" } */
> 
> #define N 1
> 
> int
> main (void) {
>   float k[N];
>   float res;
> 
>   for (int i = 0; i < N; i++)
> k[i] = 300;
>   
> #pragma omp target map(to:k) map(from:res)
>   {
> float sum = 0.0;
> #pragma omp simd reduction(+:sum)
> for (int i = 0; i < N; i++)
>   sum += __builtin_sinf (k[i]);
> 
> res = sum;
>   }
> 
>   return 0;
> }
> ...

Starts passing at -foffload=-msoft-stack-reserve-local=346.

[Bug libgomp/97384] New: [libgomp, nvptx] Handle -msoft-stack-reserve-local= overflow in plugin

2020-10-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97384

Bug ID: 97384
   Summary: [libgomp, nvptx] Handle -msoft-stack-reserve-local=
overflow in plugin
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Using the option -msoft-stack-reserve-local= results in a:
...
.local .align 8 .b8 %simtstack_ar[n+8];
...

However, the CU_LIMIT_STACK_SIZE is set by default to 1kb for my card/driver
combo, so if I specify say -msoft-stack-reserve-local=2048, I run into:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
or:
...
libgomp: cuCtxSynchronize error: an illegal instruction was encountered
...
[ The latter at GOMP_NVPTX_JIT=-O0. ] Which may look a lot like the behaviour
we're trying to fix by adding -msoft-stack-reserve-local.

There's currently no way to make this work.

We could add an env var, say GOMP_NVPTX_LIMIT_STACK_SIZE which is used to set:
...
  r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, gomp_nvptx_limit_stack_size);
...
and then do:
...
$ GOMP_NVPTX_LIMIT_STACK_SIZE=3072 ./a.out
...
[ Note that GOMP_NVPTX_LIMIT_STACK_SIZE id chosen to be larger than 2048 to
accommodate for other .local usage. ]

[ It would be nice if we could attempt to accommodate the requested stack size
in the libgomp plugin automatically.  In the current setup, that would mean
scanning the ptx code for "simtstack_ar[]", which is a bit cumbersome and
probably too slow.  Perhaps emitting an additional additional line before the
pre-amble like this:
...
// SIMTSTACK_AR_SIZE: 2048
...
would be possible to handle quick enough. ]

[Bug target/97385] New: [nvptx, docs] -msoft-stack-reserve-local= missing documentation

2020-10-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97385

Bug ID: 97385
   Summary: [nvptx, docs] -msoft-stack-reserve-local= missing
documentation
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Currently, https://gcc.gnu.org/onlinedocs/gcc/Nvidia-PTX-Options.html does not
list -msoft-stack-reserve-local.

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #10 from Tom de Vries  ---
(In reply to Alexander Monakov from comment #8)
> No, -msoft-stack-reserve-local is really meant to be in bytes: it may not
> exceed the amount of .local memory reserved by CUDA driver (which is just
> 1-2 KB, unless overridden via cuCtxSetLimit, which nvptx-run.c does, but
> plugin-nvptx.c does not).
> 
> Keep in mind that .local memory reservation is multiplied by number of
> active contexts, which could be in range 2-3 when the code was
> written: 128KB local memory per active thread would imply a 2.5GB allocation
> on the GPU.

With the number of active contexts, do you mean the sm_count * thread_max as
used in nvptx-run.c (which, FWIW, is 10.240 on my card)?

[Bug target/97436] New: [nvptx] -m32 support

2020-10-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97436

Bug ID: 97436
   Summary: [nvptx] -m32 support
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The nvptx port has an -m32 switch:
...
m32
Target Report RejectNegative InverseMask(ABI64)
Generate code for a 32-bit ABI.
...
but the default is -m64:
...
#define TARGET_DEFAULT_TARGET_FLAGS MASK_ABI64
...
[ which perhaps should be related to the host being -m64 or -m32? ]

We're not building -m32/-m64 multilibs, so it seems we're not building the -m32
part by default.

I don't know if the -m32 path was ever tested, either in standalone or
offloading setting.

But since the switch is there, we should either build and test or deprecate it.

I'm not yet sure what would be a working setup in terms of card/drivers/OS.

At least for linux, at cuda 7.5 it's mentioned that: "Support for developing
and running 32-bit CUDA and OpenCL applications on 64-bit x86 Linux platforms
is deprecated".

And with cuda 8.0, I get:
...
$ ~/cuda/8.0/bin/nvcc ~/hello.cu -m32
nvcc warning : Compiling in the 32-bit mode when the host compiler targets x86
or x86_64 is no longer supported on Linux
...

[Bug tree-optimization/84958] int loads not eliminated against larger stores

2020-10-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84958

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> [ FWIW, adding an extra fre pass here also results in optimal gimple:
> ...
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 3ebcfc30349..6b64f600c4a 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_tracer);
>NEXT_PASS (pass_thread_jumps);
>NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> +  NEXT_PASS (pass_fre);
>NEXT_PASS (pass_strlen);
>NEXT_PASS (pass_thread_jumps);
>NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
> ...
> ]

which was added in:
...
commit 744fd446c321f78f9a1ce4ef5f83df8dcfa44a9e
Author: Richard Biener 
Date:   Mon Jul 1 07:54:38 2019 +

tree-ssa-sccvn.c (class pass_fre): Add may_iterate pass parameter.

2019-07-01  Richard Biener  

* tree-ssa-sccvn.c (class pass_fre): Add may_iterate
pass parameter.
(pass_fre::execute): Honor it.
* passes.def: Adjust pass_fre invocations to allow iterating,
add non-iterating pass_fre before late threading/dom.

* gcc.dg/tree-ssa/pr77445-2.c: Adjust.

From-SVN: r272843

diff --git a/gcc/passes.def b/gcc/passes.def
index ad2efabd385..9a5b0cd554a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -312,6 +312,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
+  NEXT_PASS (pass_fre, false /* may_iterate */);
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
   NEXT_PASS (pass_strlen);
...

[Bug tree-optimization/84958] int loads not eliminated against larger stores

2020-10-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84958

Tom de Vries  changed:

   What|Removed |Added

 CC||ams at gcc dot gnu.org,
   ||julian at codesourcery dot com

--- Comment #5 from Tom de Vries  ---
I've removed the xfail for nvptx.

The only remaining xfail is for gcn.  Is that one still necessary?

[Bug tree-optimization/97333] [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093

2020-10-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97333

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> (In reply to Richard Biener from comment #1)
> > (because well, on GIMPLE we can duplicate all blocks).
> 
> I'm not sure I understand why, given that tracer.c has a can_duplicate_bb_p
> that sometimes returns false.  Sent an RFC patch to ask for clarification:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555890.html .

Committed as "[gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p" (
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=17d5739a6b103cdd3315f5d0e09fe8faa6620a03
).

Now gimple_can_duplicate_bb_p can return false.

[Bug target/97444] New: [nvptx] stack atomics

2020-10-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97444

Bug ID: 97444
   Summary: [nvptx] stack atomics
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

There's currently a bunch of tests failing:
...
FAIL: gcc.dg/atomic/c11-atomic-exec-6.c   -O0  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-6.c   -O1  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-6.c   -O2  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-6.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-6.c   -O3 -g  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-6.c   -Os  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-7.c   -O0  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-7.c   -O1  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-7.c   -O2  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-7.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-7.c   -O3 -g  execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-7.c   -Os  execution test
FAIL: gcc.dg/atomic/stdatomic-op-5.c   -O0  execution test
FAIL: gcc.dg/atomic/stdatomic-op-5.c   -O1  execution test
FAIL: gcc.dg/atomic/stdatomic-op-5.c   -O2  execution test
FAIL: gcc.dg/atomic/stdatomic-op-5.c   -O3 -g  execution test
FAIL: gcc.dg/atomic/stdatomic-op-5.c   -Os  execution test
...
due to using atomics with stack memory.

The nvptx atomic ops do not support using stack memory.

I've been marking test-cases with atomic builtins using stack with
dg-require-effective-target sync_int_long_stack, which is ok because they're
just builtins.

But it's another thing when atomics are part of the language like in the
examples above.

In principle, it should possible to generate a run-time test using isspacep,
and switch between using an atomic op or a non-atomic fallback (that is either
an atom.add.u32, or an add.u32, because it's effectively atomic in .local
memory).

OTOH, we want a mode of operation where we generate atomics as currently,
without the runtime test to have smaller and faster code.

So, perhaps, a switch -mstack-atomics.

[Bug target/97436] [nvptx] Remove -m32 support

2020-10-15 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97436

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
   Target Milestone|--- |11.0

--- Comment #3 from Tom de Vries  ---
Updated release notes, marking resolved-fixed.

[Bug libgomp/97509] New: [nvptx, offloading] dg-excess-errors directive no longer working in some test-cases

2020-10-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97509

Bug ID: 97509
   Summary: [nvptx, offloading] dg-excess-errors directive no
longer working in some test-cases
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

With current trunk I'm seeing a number of new fails, all of them "test for
excess errors":
...
FAIL: libgomp.c/../libgomp.c-c++-common/function-not-offloaded.c (test for
excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/variable-not-offloaded.c (test for
excess errors)
FAIL: libgomp.c/pr86416-1.c (test for excess errors)
FAIL: libgomp.c/pr86416-2.c (test for excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/function-not-offloaded.c (test for
excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/variable-not-offloaded.c (test for
excess errors)
...

Incidentally, that's all but one of the total set of tests that use the
dg-excess-errors:
...
$ find libgomp/testsuite/ -type f | xargs grep -l dg-excess-errors
libgomp/testsuite/libgomp.oacc-c-c++-common/function-not-offloaded.c
libgomp/testsuite/libgomp.c-c++-common/function-not-offloaded.c
libgomp/testsuite/libgomp.c-c++-common/variable-not-offloaded.c
libgomp/testsuite/libgomp.c/pr86416-1.c
libgomp/testsuite/libgomp.c/pr86416-2.c
...

The one still passing is an openacc test.

[Bug libgomp/97509] [nvptx, offloading] dg-excess-errors directive no longer working in some test-cases

2020-10-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97509

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |WORKSFORME
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Tom de Vries  ---
OK, this seems to be the test results when the nvidia driver stopped working,
which I haven't been able to repair yet, but I'm assuming WORKSFORME for now. 
Sorry for the noise.

[Bug libgomp/97532] New: Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196

2020-10-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97532

Bug ID: 97532
   Summary: Error: insn does not satisfy its constraints, internal
compiler error: in extract_constrain_insn, at
recog.c:2196
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

At commit c26d7df1031 "OpenMP: Fortran - support omp flush's memorder clauses"
I'm seeing a few new FAILs with nvptx offloading:
...
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O1  (internal compiler error)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O1  (test for excess errors)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O2  (internal compiler error)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O2  (test for excess errors)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler
error)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess
errors)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O3 -g  (internal compiler error)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -O3 -g  (test for excess errors)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -Os  (internal compiler error)
FAIL: libgomp.fortran/examples-4/simd-2.f90   -Os  (test for excess errors)
...

In more detail:
...
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:19:0:
Error: insn does not satisfy its constraints:^M
(insn 76 75 82 2 (set (reg:V8DF 20 xmm0 [orig:153 vect_c_20.334 ] [153])^M
(plus:V8DF (reg:V8DF 20 xmm0 [orig:184 vect__17.333 ] [184])^M
(vec_duplicate:V8DF (mem:DF (reg:DI 24 xmm4 [189]) [6 *fact_18(D)+0
S8 A64]
"/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90":18:0
1569 {*addv8df3}^M
 (nil))^M
during RTL pass: reload^M
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:19:0:
internal compiler error: in extract_constrain_insn, at recog.c:2196^M
0x6092d8 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)^M
/home/vries/oacc/trunk/source-gcc/gcc/rtl-error.c:108^M
0x609301 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)^M
/home/vries/oacc/trunk/source-gcc/gcc/rtl-error.c:118^M
0xcb52ad extract_constrain_insn(rtx_insn*)^M
/home/vries/oacc/trunk/source-gcc/gcc/recog.c:2196^M
0xb99457 check_rtl^M
/home/vries/oacc/trunk/source-gcc/gcc/lra.c:2173^M
...

[Bug target/97532] [11 Regression] Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196

2020-10-29 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97532

--- Comment #12 from Tom de Vries  ---
(In reply to Hongtao.liu from comment #10)
> Created attachment 49444 [details]
> Fix invalid address for special memory constraint
> 
> I'm testing this patch.

Submitted: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557142.html

[Bug debug/97669] New: Section .debug_info.dwo contains standard_opcode_lenghts array

2020-11-02 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97669

Bug ID: 97669
   Summary: Section .debug_info.dwo contains
standard_opcode_lenghts array
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

In the dwarf v5 standard we read:
...
.debug_line.dwo - Contains specialized line number tables for the type
units in the .debug_info.dwo section. These tables contain only the
directory and filename lists needed to interpret DW_AT_decl_file attributes
in the debugging information entries. Actual line number tables remain in
the .debug_line section, and remain in the relocatable object (.o) files.
...

Now consider:
...
$ gcc-11 \
-g -gsplit-dwarf \
~/hello.c \
-v -save-temps \
-dA
...

In file hello.s, in section .debug_line.dwo, we have the 
standard_opcode_lenghts array:
...
.byte   0xd # Special Opcode Base
.byte   0   # opcode: 0x1 has 0 args
.byte   0x1 # opcode: 0x2 has 1 args
.byte   0x1 # opcode: 0x3 has 1 args
.byte   0x1 # opcode: 0x4 has 1 args
.byte   0x1 # opcode: 0x5 has 1 args
.byte   0   # opcode: 0x6 has 0 args
.byte   0   # opcode: 0x7 has 0 args
.byte   0   # opcode: 0x8 has 0 args
.byte   0x1 # opcode: 0x9 has 1 args
.byte   0   # opcode: 0xa has 0 args
.byte   0   # opcode: 0xb has 0 args
.byte   0x1 # opcode: 0xc has 1 args
...

But given that this is a "specialized line number table", there's no line
number program, and because of that, the standard_opcode_lengths array is
unnecessary.

[ And then there's the fact that actually the whole .debug_line.dwo is
unnecessary, given that there are no type units in .debug_info.dwo. ]

[Bug debug/97713] New: [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo

2020-11-04 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713

Bug ID: 97713
   Summary: [gsplit-dwarf] label generated for .debug_abbrev.dwo
offset, corresponding relocation ignored by objcopy
--extract-dwo
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

I.

Consider a simple hello.c, compiled with debug info:
...
$ gcc-11 -g ~/hello.c -dA -save-temps
...

In a-hello.s, in the CU header we have a label .Ldebug_abbrev0:
...
.section.debug_info,"",@progbits
.Ldebug_info0:
.long   0x87# Length of Compilation Unit Info
.value  0x4 # DWARF version number
.long   .Ldebug_abbrev0 # Offset Into Abbrev. Section

referring here:
...
.section.debug_abbrev,"",@progbits
.Ldebug_abbrev0:
.uleb128 0x1# (abbrev code)
.uleb128 0x11   # (TAG: DW_TAG_compile_unit)
...

In a-hello.o, we find a zero at the abbrev offset:
...
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:0x87 (32-bit)
   Version:   4
   Abbrev Offset: 0x0
   Pointer Size:  8
...
but there's a related relocation in place:
...
Relocation section '.rela.debug_info' at offset 0x4c8 contains 16 entries:
  Offset  Info   Type   Sym. ValueSym. Name +
Addend
0006  0007000a R_X86_64_32    .debug_abbrev + 0
...
and in the a.out, we end up with:
...
  Compilation Unit @ offset 0xc7:
   Length:0x87 (32-bit)
   Version:   4
   Abbrev Offset: 0x64
...


II.

Now cp a-hello.s to hello.s and modify like this:
...
$ diff -u a-hello.s hello.s
--- a-hello.s   2020-11-04 13:04:03.325633204 +0100
+++ hello.s 2020-11-04 13:04:17.001509687 +0100
@@ -102,6 +102,7 @@
# DW_AT_GNU_all_tail_call_sites
.byte   0   # end of children of DIE 0xb
.section.debug_abbrev,"",@progbits
+   .uleb128 0x0# (abbrev code)
 .Ldebug_abbrev0:
.uleb128 0x1# (abbrev code)
.uleb128 0x11   # (TAG: DW_TAG_compile_unit)
...
and recompile:
...
$ gcc-11 -g hello.s -save-temps
...

We can see that we've indeed moved the label by 1 byte:
...
$ llvm-dwarfdump -debug-abbrev a-hello.o
a-hello.o:  file format ELF64-x86-64

.debug_abbrev contents:
Abbrev table for offset: 0x
Abbrev table for offset: 0x0001
[1] DW_TAG_compile_unit DW_CHILDREN_yes
...
and likewise, the relocation:
...
Relocation section '.rela.debug_info' at offset 0x4c8 contains 16 entries:
  Offset  Info   Type   Sym. ValueSym. Name +
Addend
0006  0007000a R_X86_64_32    .debug_abbrev + 1
...
and likewise, in the a.out:
...
  Compilation Unit @ offset 0xc7:
   Length:0x87 (32-bit)
   Version:   4
   Abbrev Offset: 0x65
...


III.

Now, let's try the same with in .debug_abbrev.dwo.  We compile with
-gsplit-dwarf:
...
$ gcc-11 ~/hello.c -g -gsplit-dwarf -dA -save-temps
...
and copy to hello.s and modify:
...
$ diff -u a-hello.s hello.s
--- a-hello.s   2020-11-04 13:12:57.188966796 +0100
+++ hello.s 2020-11-04 13:14:48.632059272 +0100
@@ -156,6 +156,7 @@
.byte   0
.byte   0   # end of skeleton .debug_abbrev
.section.debug_abbrev.dwo,"e",@progbits
+   .uleb128 0x0# (abbrev code)
 .Ldebug_abbrev0:
.uleb128 0x1# (abbrev code)
.uleb128 0x11   # (TAG: DW_TAG_compile_unit)
...

We can see that we indeed added the entry to .debug_abbrev.dwo:
...
$ llvm-dwarfdump -debug-abbrev a-hello.dwo 
a-hello.dwo:file format ELF64-x86-64

.debug_abbrev.dwo contents:
Abbrev table for offset: 0x
Abbrev table for offset: 0x0001
[1] DW_TAG_compile_unit DW_CHILDREN_yes
...

The abbrev offset is 0:
...
Contents of the .debug_info.dwo section:

  Compilation Unit @ offset 0x0:
   Length:0x50 (32-bit)
   Version:   4
   Abbrev Offset: 0x0
...
but there's no related relocation so the CU is interpreted using the empty
abbrev table at offset 0, and we run into:
...
 <0>: Abbrev Number: 1
readelf: Warning: DIE at offset 0xb refers to abbreviation number 1 which does
not exist
...

The split into .o/.dwo is done like this:
...
 /usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/as
--gdwarf2 -v --64 -o a-hello.o hello.s
 objcopy --extract-dwo a-hello.o a-hello.dwo
 objcopy --strip-dwo a-hello.o
...
so we can recreate the original a-hello.o, and find the relocation:
...
Relocation section '.rela.debug_info.dwo' at offset 0x710 contains 1 entry:
  Offset  Info   Type   Sym. ValueSym. Name +
Addend
0006  000a000a R_X86_64_32    .debug_abbrev.dwo
+ 1
...

So, objcopy --extract-

[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo

2020-11-04 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> and copy to hello.s and modify:
> ...
> $ diff -u a-hello.s hello.s
> --- a-hello.s   2020-11-04 13:12:57.188966796 +0100
> +++ hello.s 2020-11-04 13:14:48.632059272 +0100
> @@ -156,6 +156,7 @@
> .byte   0
> .byte   0   # end of skeleton .debug_abbrev
> .section.debug_abbrev.dwo,"e",@progbits
> +   .uleb128 0x0# (abbrev code)
>  .Ldebug_abbrev0:
> .uleb128 0x1# (abbrev code)
> .uleb128 0x11   # (TAG: DW_TAG_compile_unit)
> ...

And forgot to mention, recompile:
...
$ gcc-11 hello.s -g -gsplit-dwarf -save-temps
...

[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo

2020-11-04 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713

--- Comment #2 from Tom de Vries  ---
Filed corresponding binutils PR: "objcopy --extract-dwo silently drops
relocation" at https://sourceware.org/bugzilla/show_bug.cgi?id=26841 .

[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo

2020-11-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713

--- Comment #3 from Tom de Vries  ---
Mentioning dwarf 5 standard bit @ "7.3.2.2 Second Partition (Unlinked or in a
.dwo File)":
...
Split DWARF object files do not get linked with any other files, therefore
references between sections must not make use of normal object file relocation
information. As a result, symbolic references within or between sections are
not
possible.
...

[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo

2020-11-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713

Tom de Vries  changed:

   What|Removed |Added

 CC||ccoutant at gmail dot com,
   ||jason at gcc dot gnu.org

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> Now, should objcopy implement the relocation?

Nick proposed a patch that errors out on current gcc output.

> Note that llvm emits a '0' as abbrev offset instead of a label.

And gcc would have to emit a 0, like llvm.  It that ok from gcc perspective?

[Bug debug/97774] New: Incorrect line info for try/catch

2020-11-10 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97774

Bug ID: 97774
   Summary: Incorrect line info for try/catch
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ This PR is FTR, it's already fixed. ]

Consider this test-case, minimized from gdb.cp/gdb9593.cc (
https://src.fedoraproject.org/rpms/gdb/blob/master/f/gdb-archer-vla-tests.patch
):
...
$ cat -n test.cc
 1  void
 2  function1 (void)
 3  {
 4throw 20;
 5  }
 6
 7  int
 8  main (void)
 9  { 
10try
11  {
12function1 ();
13  }
14catch (int x)
15  {
16  }
17
18return 0;
19  }
...

We compile using gcc 7.5.0:
...
$ g++ -g test.cc -save-temps -dA
...

When trying to step over function1 using next, we end up on line 18, and not at
the start of line 18 (given the $hex prefix):
...
$ gdb a.out -ex start -ex next
Reading symbols from a.out...
Temporary breakpoint 1 at 0x4007b5: file test.cc, line 12.
Starting program: a.out 

Temporary breakpoint 1, main () at test.cc:12
12function1 ();
0x004007c1  18return 0;
(gdb) 
...

This is caused by the following.

There's a .loc for line 18 after the call to function1, but then we jump away
to label .L9:
...
# test.cc:12
.loc 1 12 0
call_Z9function1v
.LEHE0:
# BLOCK 3 seq:1
# PRED: 2 (FALLTHRU) 6 [100.0%]
.L7:
# test.cc:18
.loc 1 18 0
movl$0, %eax
# SUCC: 7 [100.0%]
jmp .L9
# BLOCK 4 seq:2
# PRED: 2 (ABNORMAL,ABNORMAL_CALL,EH)
.L8:
cmpq$1, %rdx
...
Since there's no other loc before the insn at .L8, it's considered to be part
of line 18.

[Bug debug/97774] Incorrect line info for try/catch

2020-11-10 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97774

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Tom de Vries  ---
Starting gcc 10.1, we have instead:
...
jmp .L8
# BLOCK 4 seq:2
# PRED: 2 (ABNORMAL,ABNORMAL_CALL,EH)
.L7:
# test.cc:14:3
.loc 1 14 3
cmpq$1, %rdx
...
and:
...
$ gdb a.out -ex start -ex n
Reading symbols from a.out...
Temporary breakpoint 1 at 0x4007b4: file test.cc, line 12.
Starting program: a.out 

Temporary breakpoint 1, main () at test.cc:12
12function1 ();
14catch (int x)
...

This is due to:
...
commit 4500f7510368cdb24b8afcc66e86e09cafe49199
Author: Eric Botcazou 
Date:   Fri Jul 5 08:39:13 2019 +

except.c (emit_to_new_bb_before): Make sure to put a location on SEQ.

* except.c (emit_to_new_bb_before): Make sure to put a location on
SEQ.
* tree-eh.c (replace_goto_queue_1) : Propagate
location.
(emit_eh_dispatch): Delete.
(lower_catch): Emit the eh_dispatch manually and set the location
of
the first catch statement onto it.
(lower_eh_filter): Emit the eh_dispatch manually and set location.
(lower_eh_dispatch): Propagate location.
* tree-outof-ssa.c (set_location_for_edge): Handle EH edges
specially.
(eliminate_build): Likewise.

From-SVN: r273132
...

[Bug debug/97713] [gsplit-dwarf] label generated for .debug_abbrev.dwo offset, corresponding relocation ignored by objcopy --extract-dwo

2020-11-19 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97713

--- Comment #5 from Tom de Vries  ---
(In reply to Tom de Vries from comment #4)
> (In reply to Tom de Vries from comment #0)
> > Now, should objcopy implement the relocation?
> 
> Nick proposed a patch that errors out on current gcc output.
> 
> > Note that llvm emits a '0' as abbrev offset instead of a label.
> 
> And gcc would have to emit a 0, like llvm.  It that ok from gcc perspective?

Ping.

[Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'

2020-12-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #1 from Tom de Vries  ---
Ok, let's first make a runnable test-case:
...
$ cat src/libgomp/testsuite/libgomp.oacc-c/test.c
#include 

#define TYPE float

TYPE a = 1;
TYPE b = 2;

int
main (void)
{

  printf ("A: %f\n", a);

#pragma acc parallel num_gangs (1) num_workers (1) copy (a, b)
#pragma acc atomic update
  a += b;

  printf ("A: %f\n", a);

  return !(a == 3);
}
...

Indeed we see the cas, but that has nothing to do with support in the nvptx
port:
...
atom.cas.b32%r29, [%r25], %r22, %r28;   
...

This appears already at ompexp on the host, where we expand:
...
  #pragma omp atomic_load relaxed
D.2555 = *D.2568

   :
  D.2557 = D.2555 + b.1;
  #pragma omp atomic_store relaxed (D.2557)
...
into:
...
  D.2583 = __atomic_load_4 (D.2582, 0);
  D.2584 = D.2583;

   :
  D.2585 = VIEW_CONVERT_EXPR(D.2584);
  D.2586 = D.2585 + b.1;
  D.2587 = VIEW_CONVERT_EXPR(D.2586);
  D.2588 = __sync_val_compare_and_swap_4 (D.2582, D.2584, D.2587);
...

This is part of a generic problem with offloading, where choices are made in
the host compiler which are suboptimal or even unsupported in the offload
compiler.

Ideally this should be addressed in the host compiler.

It may be possible to address this in the nvptx port by trying to detect the
unoptimal pattern and converting it to the optimal atom.add.f32.  But
ultimately that's a workaround, and it's better to fix this at the source.

gcc-bugs@gcc.gnu.org

2020-09-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207

Bug ID: 97207
   Summary: [nvptx, build] nvptx.c:3539:38: error: no matching
function for call to ‘swap(bracket_vec_t&,
bracket_vec_t&)’
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Building trunk for nvptx on ubuntu 18.04.5 LTS with g++ (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, I run into:
...
src/gcc/config/nvptx/nvptx.c: In member function ‘void bb_sese:
:append(bb_sese*)’:
src/gcc/config/nvptx/nvptx.c:3539:38: error: no matching functi
on for call to ‘swap(bracket_vec_t&, bracket_vec_t&)’
  std::swap (brackets, child->brackets);
  ^
In file included from /usr/include/c++/7/bits/nested_exception.h:40:0,
 from /usr/include/c++/7/exception:143,
 from /usr/include/c++/7/ios:39,
 from /usr/include/c++/7/istream:38,
 from /usr/include/c++/7/sstream:38,
 from src/gcc/config/nvptx/nvptx.c:24:
/usr/include/c++/7/bits/move.h:187:5: note: candidate: template
typename std::enabl
e_if >,
std::is_move_constructible<_Tp>, std
::is_move_assignable<_Tp> >::value>::type std::swap(_Tp&, _Tp&)
 swap(_Tp& __a, _Tp& __b)
 ^~~~
/usr/include/c++/7/bits/move.h:187:5: note:   template argument
deduction/substitution failed:
/usr/include/c++/7/bits/move.h: In substitution of ‘template
typename std::enable_i
f >,
std::is_move_constructible<_Tp>, std::i
s_move_assignable<_Tp> >::value>::type std::swap(_Tp&, _Tp&) [with _Tp =
auto_vec]’:
...

gcc-bugs@gcc.gnu.org

2020-09-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207

Tom de Vries  changed:

   What|Removed |Added

 Target||nvptx
 CC||rguenth at gcc dot gnu.org

--- Comment #1 from Tom de Vries  ---
Regression, started at:
...
commit 4b9d61f79c0c0185a33048ae6cc72269cf7efa31
Author: Richard Biener 
Date:   Thu Aug 6 14:50:56 2020 +0200

add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

This adds a move CTOR to auto_vec and makes use of a
auto_vec return value for get_loop_exit_edges denoting
that lifetime management of the vector is handed to the caller.

The move CTOR prompted the hash_table change because it appearantly
makes the copy CTOR implicitely deleted (good) and hash-table
expansion of the odr_enum_map which is
hash_map  where odr_enum has an
auto_vec member triggers this.  Not sure if
there's a latent bug there before this (I think we're not
invoking DTORs, but we're invoking copy-CTORs).
...

The type bracket_vec_t is defined as:
...
typedef auto_vec bracket_vec_t;
...
so that does look at least related.

gcc-bugs@gcc.gnu.org

2020-09-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207

--- Comment #2 from Tom de Vries  ---
Configure from build-gcc/config.log:
...
  $ /home/vries/nvptx/trunk/source-gcc/configure --target=nvptx-none --prefix=
--enable-languages=c,c++,fortran --enable-werror --enable-checking=yes CC=gcc
-m64 -Wl,-rpath,/i686-pc-linux-gnu/lib64 CXX=g++ -m64
-Wl,-rpath,/i686-pc-linux-gnu/lib64 --enable-linker-plugin-flags=CC=gcc\ -m32\
-Wl,-rpath,/i686-pc-linux-gnu/lib
--enable-linker-plugin-configure-flags=--host=i686-pc-linux-gnu
--with-sysroot=/nvptx-none
--with-build-sysroot=/home/vries/nvptx/trunk/install/nvptx-none
--with-build-time-tools=/home/vries/nvptx/trunk/install/nvptx-none/bin
--disable-sjlj-exceptions --enable-newlib-io-long-long CFLAGS=-O0 -g
CXXFLAGS=-O0 -g
...

gcc-bugs@gcc.gnu.org

2020-09-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207

--- Comment #3 from Tom de Vries  ---
Created attachment 49271
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49271&action=edit
gzipped preprocessed source

Reproduce:

$ g++ -m64 -fno-PIE -c   -O0 -g -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common
-fpreprocessed nvptx.c -Wno-implicit-fallthrough
nvptx.c: In member function ‘void bb_sese::append(bb_sese*)’:
/home/vries/nvptx/trunk/source-gcc/gcc/config/nvptx/nvptx.c:3539:38: error: no
matching function for call to ‘swap(bracket_vec_t&, bracket_vec_t&)’
  std::swap (brackets, child->brackets);
  ^
...

gcc-bugs@gcc.gnu.org

2020-09-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207

--- Comment #15 from Tom de Vries  ---
(In reply to Richard Biener from comment #9)
> diff --git a/gcc/vec.h b/gcc/vec.h
> index d73d865cff2..c0e577893a3 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -1546,7 +1546,12 @@ public:
>this->m_vec = r.m_vec;
>r.m_vec = NULL;
>  }
> -  void operator= (auto_vec&&) = delete;
> +  void operator= (auto_vec&& r)
> +{
> +  this->release ();
> +  this->m_vec = r.m_vec;
> +  r.m_vec = NULL;
> +}
>  };
>  
>  
> 
> works for the vec.c test, Tom - can you check if it works for nvptx?

It does.

[Bug libbacktrace/97227] New: dsymutil runs on ELF execs during libbacktrace testing

2020-09-28 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97227

Bug ID: 97227
   Summary: dsymutil runs on ELF execs during libbacktrace testing
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: libbacktrace
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: ian at gcc dot gnu.org
  Target Milestone: ---

When running libbacktrace check, I run into:
...
make[3]: Entering directory
'/dev/shm/tdevries/data/master/2020-09-24T22-37-10-02-00-942ab9e9d4f/build/libbacktrace'
dsymutil btest
Stack dump:
0.  Program arguments: dsymutil btest
#0 0x7fc89542fd7d llvm::sys::PrintStackTrace(llvm::raw_ostream&)
(/usr/bin/../lib64/libLLVM.so.10+0x9a9d7d)
#1 0x7fc89542d6a0 llvm::sys::RunSignalHandlers()
(/usr/bin/../lib64/libLLVM.so.10+0x9a76a0)
#2 0x7fc895430412 (/usr/bin/../lib64/libLLVM.so.10+0x9aa412)
#3 0x7fc8943295a0 __restore_rt (/lib64/libc.so.6+0x395a0)
#4 0x00410085 _init (/usr/bin/dsymutil-10.0.0+0x410085)
#5 0x7fc89431434a __libc_start_main (/lib64/libc.so.6+0x2434a)
#6 0x0040d69a _init (/usr/bin/dsymutil-10.0.0+0x40d69a)
make[3]: *** [Makefile:2396: btest.dSYM] Segmentation fault (core dumped)
...

The installed dsymutil is from llvm10, which has a bug which is fixed on llvm
trunk by this commit:
...
commit ef87f69ec538ccfe7d68b6d03125e7636e859ace (HEAD)
Author: Greg Clayton 
Date:   Fri Mar 6 14:59:41 2020 -0800

Fix a copy and paste error that would cause a crash.
...

Anyway, after applying that commit we'll be likely to get:
...
dsymutil btest
error: cannot parse the debug map for 'btest': The file was not recognized as a
valid object file
...
because this tool is intended for mach-o, not for elf.

So, probably we should only do dsymutil btest on a mach-o platform.

[Bug target/97254] New: [nvptx] Define PCC_BITFIELD_TYPE_MATTERS

2020-09-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97254

Bug ID: 97254
   Summary: [nvptx] Define PCC_BITFIELD_TYPE_MATTERS
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

While debugging gcc/testsuite/gcc.dg/pr94600-1.c, I found that nvptx doesn't
define PCC_BITFIELD_TYPE_MATTERS.

AFAIU, the theory for offloading is that settings that influence abi should be
compatible with the host.

And in i386.h, we find:
...
 /* If bit field type is int, don't let it cross an int,
   and give entire struct the alignment of an int.  */
/* Required on the 386 since it doesn't have bit-field insns.  */
#define PCC_BITFIELD_TYPE_MATTERS 1
...

OTOH, the value may be different for other hosts.

On the llvm side, we find this comment in
llvm.git/clang/lib/Basic/Targets/ARM.cpp:
...
  // Do not respect the alignment of bit-field types when laying out
  // structures. This corresponds to PCC_BITFIELD_TYPE_MATTERS in gcc.  
  UseBitFieldTypeAlignment = false;
...
and in llvm.git/clang/lib/Basic/Targets/NVPTX.cpp:
...
  UseBitFieldTypeAlignment = HostTarget->useBitFieldTypeAlignment();
...
which seems to confirm that approach.

It should be possible to get the host target by looking at the
--enable-as-accelerator, and figure out the value from there.

OTOH, it's possible that for GCC the nvptx implementation of the hook is moot,
given that part of the processing is done in the host compiler.

That does not answer the question what to do for the standalone target though.
But probably, given that the value is set for both known offloading hosts
x86_64 and ppc, it might be a good idea to have the standalone target do the
same.

[Bug target/90931] [nvptx] FAIL: gcc.c-torture/execute/pr78675.c -O1 execution test

2020-09-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90931

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #1 from Tom de Vries  ---
No longer occurs for current trunk with driver 450.66.

[Bug libgomp/81688] libgomp.c/target-3{3,4}.c fails: GOMP_OFFLOAD_async_run unimplemented for nvptx

2020-09-30 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81688

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Tom de Vries  ---
This no longer fails at current trunk.

[Bug target/96428] [nvptx] nvptx_gen_shuffle does not handle V2DI mode – Fails with an ICE

2020-10-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96428

Tom de Vries  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Tom de Vries  ---
Thomas, these are fine follow-up comments, but given that there's currently no
ICE and there's a test-case in place to check that, this is resolved-fixed.

[Bug libgomp/81778] libgomp.c/for-5.c failure on nvptx -- illegal memory access

2020-10-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81778

--- Comment #9 from Tom de Vries  ---
I ran into this again, and did another round of minimizing.  This time, I added
some buffering around where we write, and check the entire buffer afterwards:
...
$ cat libgomp/testsuite/libgomp.c-c++-common/for-5.c  
/* { dg-additional-options "-std=gnu99" { target c } } */

#include 

#define N 4096
#define MID (N/2)
#define M 4
#pragma omp declare target
int a[N];
#pragma omp end declare target

int
main (void)
{
  int i;

  for (i = 0; i < N; i++)
a[i] = 0;

#pragma omp target update to(a)

  int s = 1;

#pragma omp target simd
  for (int i = M - 1; i > -1; i -= s)
a[MID + i] = 1;

#pragma omp target update from(a)

  int error_found = 0;
  for (i = 0; i < N; i++)
{
  int expected = (MID <= i && i < MID + M) ? 1 : 0;
  if (a[i] == expected)
continue;

  error_found = 1;
  printf ("Expected %d, got %u at %d\n", expected, a[i], i);
}

  if (error_found)
__builtin_abort ();

  return 0;
}
...

Indeed we're writing more than required (for M == 4, we just want the locations
2048..2051):
...
$ LD_LIBRARY_PATH=$(pwd -P)/install/lib64 ./for-5.exe
Expected 0, got 1 at 1955
Expected 0, got 1 at 1986
Expected 0, got 1 at 1987
Expected 0, got 1 at 2017
Expected 0, got 1 at 2018
Expected 0, got 1 at 2019
Aborted (core dumped)
...

Fails for M >= 2.

[Bug target/80845] nvptx backend generates cvt.u32.u32

2020-10-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80845

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Tom de Vries  ---
The committed patch fixes the issue mentioned in comment 0.

The issue from comment 3 no longer reproduces.

Marking resolved-fixed.

[Bug libgomp/81778] libgomp.c/for-5.c failure on nvptx -- illegal memory access

2020-10-01 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81778

--- Comment #10 from Tom de Vries  ---
Tentative patch:
...
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 99cb4f9dda4..034de497390 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -6333,6 +6333,8 @@ expand_omp_simd (struct omp_region *region, struct
omp_for_data *fd)
   /* Collapsed loops not handled for SIMT yet: limit to one lane only.  */
   if (fd->collapse > 1)
simt_maxlane = build_one_cst (unsigned_type_node);
+  else if (TREE_CODE (fd->loops[0].step) != INTEGER_CST)
+   simt_maxlane = build_one_cst (unsigned_type_node);
   else if (safelen_int < omp_max_simt_vf ())
simt_maxlane = build_int_cst (unsigned_type_node, safelen_int);
   tree vf
@@ -6636,6 +6638,10 @@ expand_omp_simd (struct omp_region *region, struct
omp_for_data *fd)
   else
t = fold_build2 (PLUS_EXPR, type, fd->loop.v, step);
   expand_omp_build_assign (&gsi, fd->loop.v, t);
+  /* The alternative IV needs to to be updated as well, but isn't
+currently.  Assert that we fall back to simt_maxlane == 1.  */
+  gcc_assert (altv == NULL_TREE
+ || tree_int_cst_equal (simt_maxlane, integer_one_node));
 }

   /* Remove GIMPLE_OMP_RETURN.  */
...

[Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test

2020-10-05 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #17 from Tom de Vries  ---
Patch with test-case committed, marking resolved-fixed.

[Bug fortran/95654] nvptx offloading: FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test

2020-10-05 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/97159] [11 Regression] segfault in modref_may_conflict

2020-10-05 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97159

--- Comment #4 from Tom de Vries  ---
I'm currently not running into this ICE anymore, so presumably it was fixed.

I'm not sure by which commit though.

[Bug tree-optimization/97008] [openacc] Remove invariant that IFN_UNIQUE is last stmt in bb

2020-10-05 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97008

--- Comment #4 from Tom de Vries  ---
I did a libgomp test run with commit f96b6328fa7 "[tree-optimization] Don't
clear ctrl-altering flag for IFN_UNIQUE" reverted, and with this patch:
...
diff --git a/gcc/tracer.c b/gcc/tracer.c
index 0f69b335b8c..3a4403d92b1 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -93,11 +93,15 @@ can_duplicate_insn_p (gimple *g)
  The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
  so the same holds there, but it could be argued that the
  IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
- in which case it could be duplicated.  */
+ in which case it could be duplicated.
+ An IFN_UNIQUE call must be duplicated as part of its group,
+ or not at all.  */
   if (is_gimple_call (g)
   && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
+ || (gimple_call_internal_p (g)
+ && gimple_call_internal_unique_p (g
 return false;

   return true;
@@ -117,8 +121,6 @@ can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
   if (gimple_code (g) == GIMPLE_TRANSACTION)
return false;

-  /* An IFN_UNIQUE call must be duplicated as part of its group,
-or not at all.  */
   if (is_gimple_call (g)
  && gimple_call_internal_p (g)
  && gimple_call_internal_unique_p (g))
...

No issues found.

[Bug libgomp/97291] New: [SIMT] Move SIMT_XCHG_* out of non-uniform execution region

2020-10-05 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97291

Bug ID: 97291
   Summary: [SIMT] Move SIMT_XCHG_* out of non-uniform execution
region
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

We have:
...
/* Allocate per-lane storage and begin non-uniform execution region.  */

static void
expand_GOMP_SIMT_ENTER_ALLOC (internal_fn, gcall *stmt)
...
and:
...
/* Deallocate per-lane storage and leave non-uniform execution region.  */

static void
expand_GOMP_SIMT_EXIT (internal_fn, gcall *stmt)
...

So, if the SIMT_ENTER_ALLOC and the SIMT_EXIT mark the start and end of a
region of non-uniform execution, it's strange that such a region can contain
SIMT_XCHG_*, which on nvptx requires uniform execution.

Moving SIMT_VOTE_ANY/SIMT_LAST_LANE/SIMT_XCHG_* as a whole after SIMT_EXIT is
not possible given that VOTE_ANY may have data dependencies to storage that is
deallocated by SIMT_EXIT (as Alexander mentioned here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555475.html).

A possible solution would be to split the SIMT_EXIT into separate bits for
exiting non-uniform execution and deallocation, and have:
- SIMT_ENTER_ALLOC
- SIMT_EXIT_UNI
- SIMT_VOTE_ANY/SIMT_LAST_LANE/SIMT_XCHG_*
- SIMT_EXIT_DEALLOC

Also I've wondered if we could do:
- SIMT_ENTER_ALLOC
- SIMT_VOTE_ANY
- SIMT_EXIT
- SIMT_LAST_LANE/SIMT_XCHG_*
but perhaps there are again data dependency problems.

[Bug testsuite/81690] libgomp.c/{target-32,thread-limit-2}.c fail for nvptx: missing usleep

2020-10-05 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690

--- Comment #8 from Tom de Vries  ---
Pinged issue here (
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555496.html ).

[Bug tree-optimization/97008] [openacc] Remove invariant that IFN_UNIQUE is last stmt in bb

2020-10-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97008

--- Comment #5 from Tom de Vries  ---
The openacc machinery is the only user of IFN_UNIQUE.

Thomas, as openacc maintainer, is this change ok for you, or are reasons why
you want to keep the IFN_UNIQUE as last stmt of the BB?

[Bug tree-optimization/97159] [11 Regression] segfault in modref_may_conflict

2020-10-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97159

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Tom de Vries  ---
Marking resolved-fixed.

[Bug middle-end/90861] OpenACC 'declare' not cleaning up for VLAs

2020-10-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90861

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #4 from Tom de Vries  ---
Created attachment 49317
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49317&action=edit
Tentative patch

[Bug middle-end/90861] OpenACC 'declare' not cleaning up for VLAs

2020-10-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90861

--- Comment #5 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555606.html

[Bug middle-end/90861] OpenACC 'declare' not cleaning up for VLAs

2020-10-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90861

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
   Target Milestone|--- |11.0

--- Comment #7 from Tom de Vries  ---
Patch committed, marking resolved-fixed.

[Bug testsuite/81690] libgomp.c/{target-32,thread-limit-2}.c fail for nvptx: missing usleep

2020-10-06 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690

--- Comment #9 from Tom de Vries  ---
(In reply to Tobias Burnus from comment #4)
> The omp_is_initial_device() is only resolved at run time - hence, I think
> the linker still wants to see "usleep".
> 

Well, yes, but that could be fixed: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82251

[Bug target/97318] New: [nvptx] Function splitting results in invalid function name

2020-10-07 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97318

Bug ID: 97318
   Summary: [nvptx] Function splitting results in invalid function
name
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49321
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49321&action=edit
openmp test-case

While minimizing PR97203 - I ran into:
...
FAIL: libgomp.c/test.c (test for excess errors)
Excess errors:
ptxas /tmp/ccTfIxsQ.o, line 23; fatal   : Parsing error near '.part': syntax
error
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
...

The problem is that this:
...
.func (.param .f32 %value_out) sinf.part.0 (.param .f32 %in_ar0);
...
is not supported by nvptx (as indicated by NO_DOT_IN_LABEL).

[Bug libgomp/97331] New: [nvptx] Provide GCN_NUM_TEAMS/GCN_NUM_THREADS equivalent

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97331

Bug ID: 97331
   Summary: [nvptx] Provide GCN_NUM_TEAMS/GCN_NUM_THREADS
equivalent
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

When looking at the gcn plugin, there are a number of environment variables
that set/limit launch dimensions:
...
$ grep getenv libgomp/plugin/plugin-gcn.c | grep NUM
  const char *x = secure_getenv ("GCN_NUM_TEAMS");
x = secure_getenv ("GCN_NUM_GANGS");
  const char *z = secure_getenv ("GCN_NUM_THREADS");
z = secure_getenv ("GCN_NUM_WORKERS");
...

It would be nice to have similar ones for nvptx.

[Bug libgomp/97332] New: [gcn] GCN_NUM_GANGS/GCN_NUM_WORKERS override compile-time constants

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97332

Bug ID: 97332
   Summary: [gcn] GCN_NUM_GANGS/GCN_NUM_WORKERS override
compile-time constants
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

In openacc programs, dimensions are either dynamic or hardcoded.

If the dimensions are hardcoded, and there are builtins returning the size of
these dimensions, the builtins are folded in fold_internal_goacc_dim.

Changing the dimensions in the plugin then invalidates the folding.

I'm guessing this should be fixed, or at least documented in the plugin (with
perhaps even a warning).

[Bug libgomp/81802] Report cuLaunchKernel launch dimensions in GOMP_OFFLOAD_run

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81802

Tom de Vries  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |11.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Tom de Vries  ---
Marking resolved-fixed.

[Bug tree-optimization/97333] New: [gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate] ICE in duplicate_block, at cfghooks.c:1093

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97333

Bug ID: 97333
   Summary: [gimple_can_duplicate_bb_p == false,
tree-ssa-threadupdate] ICE in duplicate_block, at
cfghooks.c:1093
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

With this patch:
...
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 406441751a9..b9168755155 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6213,7 +6213,7 @@ gimple_split_block_before_cond_jump (basic_block bb)
 static bool
 gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED)
 {
-  return true;
+  return false;
 }

 /* Create a duplicate of the basic block BB.  NOTE: This does not
...
we run into:
...
spawn -ignore SIGHUP
/dev/shm/tdevries/data/master/2020-10-07T08-04-46-02-00-c475cfa435b/build/gcc/xgcc
-B/dev/shm/tdevries/data/master/2020-10-07T08-04-46-02-00-c475cfa435b/build/gcc/
-fdiagnostics-plain-output -O1 -w -c -o 20041018-1.o
/labs/tdevries/gcc/src/gcc/testsuite/gcc.c-torture/compile/20041018-1.c^M
during GIMPLE pass: dom^M
/labs/tdevries/gcc/src/gcc/testsuite/gcc.c-torture/compile/20041018-1.c: In
function 'foo':^M
/labs/tdevries/gcc/src/gcc/testsuite/gcc.c-torture/compile/20041018-1.c:2:1:
internal compiler error: in duplicate_block, at cfghooks.c:1093^M
0xbfebaf duplicate_block(basic_block_def*, edge_def*, basic_block_def*,
copy_bb_data*)^M
/labs/tdevries/gcc/src/gcc/cfghooks.c:1093^M
0x16d794b create_block_for_threading^M
/labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:342^M
0x16d90e7 ssa_create_duplicates(redirection_data**, ssa_local_info_t*)^M
/labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1130^M
0x16dfe86 void hash_table::traverse_noresize(ssa_local_info_t*)^M
/labs/tdevries/gcc/src/gcc/hash-table.h:1081^M
0x16df31a void hash_table::traverse(ssa_local_info_t*)^M
/labs/tdevries/gcc/src/gcc/hash-table.h:1102^M
0x16d9dff thread_block_1^M
/labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1495^M
0x16d9ee5 thread_block^M
/labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1539^M
0x16da3d7 thread_through_loop_header^M
/labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:1781^M
0x16dc767 thread_through_all_blocks(bool)^M
/labs/tdevries/gcc/src/gcc/tree-ssa-threadupdate.c:2667^M
0x15522d2 execute^M
/labs/tdevries/gcc/src/gcc/tree-ssa-dom.c:776^M
...

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #2 from Tom de Vries  ---
Minimal version (without inlining sinf code from newlib):
...
/* { dg-additional-options "-lm -foffload=-lm" } */

#define N 1

int
main (void) {
  float k[N];
  float res;

  for (int i = 0; i < N; i++)
k[i] = 300;

#pragma omp target map(to:k) map(from:res)
  {
float sum = 0.0;
#pragma omp simd reduction(+:sum)
for (int i = 0; i < N; i++)
  sum += __builtin_sinf (k[i]);

res = sum;
  }

  return 0;
}
...

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #3 from Tom de Vries  ---
[ Note, this is with GOMP_NVPTX_JIT=-O0. ]

In sinf, we have:
...
 45:return -__kernel_cosf(y[0],y[1]);
...
which translates to:
...
.loc 1 45 12
ld.f32 %r67,[%frame+4];
ld.f32 %r65,[%frame];
{
.param .f32 %value_in;
.param .f32 %out_arg1;
st.param.f32 [%out_arg1],%r65;
.param .f32 %out_arg2;
st.param.f32 [%out_arg2],%r67;
call (%value_in),__kernel_cosf,(%out_arg1,%out_arg2);
ld.param.f32 %r68,[%value_in];
}
.loc 1 45 11
neg.f32 %r37,%r68;
...

If I place (using GOMP_NVPTX_PTXRW) a trap before the first load:
...
 .loc 1 45 12
+trap
 ld.f32 %r67,[%frame+4];
...
I get:
...
libgomp: cuCtxSynchronize error: an illegal instruction was encountered
...

If I place it after the first load, I get:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #4 from Tom de Vries  ---
So, I think calling functions from simd code is atm not supported for nvptx.

Stack variables in simd code are mapped on a per-thread stack rather than on
the
usual per-warp stack.

The functions are compiled with the usual per-warp stack, so calling those
functions from simd might mean the different lanes are gonna disagree about
what the value in a stack variable should be.

Having said that, for the example in comment 2, there only should be one thread
executing the call, so this doesn't explain the illegal memory access.

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203

--- Comment #5 from Tom de Vries  ---
FWIW, another aspect here is convergence (as usual).

Looking at the SASS code for main$_omp_fn$0$impl, I don't find evidence for the
usual divergence/convergence ops (SSY/SYNC), which might mean that the
following shfl is executed in divergent mode, so, even if we would not get the
memory access error, we would not get correct results.

[Bug target/97338] New: [nvptx] Convergence checking

2020-10-08 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97338

Bug ID: 97338
   Summary: [nvptx] Convergence checking
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

With ptx, we have insns that need to be executed in convergent mode, that is,
all threads in the warp active.

We can unfortunately not enforce this, but we could check it, which could help
pinpoint problems.

A ptx insn:
...
  vote.ballot.b32 %rbla, 1;
...
gives us the regmask of active threads, so we could check:
...
{
  .reg .u32 %rwarp_active_mask;
  vote.ballot.b32 %rwarp_active_mask, 1;
  .reg .pred %pconvergent;
  setp.eq.u32 %pconvergent,%rwarp_active_mask,-1;
  @ ! %pconvergent trap;
}
...

[Bug target/97348] New: [nvptx] Make -misa=sm_35 the default

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348

Bug ID: 97348
   Summary: [nvptx] Make  -misa=sm_35 the default
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The gcc objects and executables for nvptx are somewhat funny given that they
contain just ptx text.

However, during assembling, we do verify that the ptx is valid, by running it
through ptxas, if that happens to be available in the path.  This step can be
skipped by using -Wa,--no-verify, but it's done by default.

The system cuda on my system (ubuntu Ubuntu 18.04.5) is V9.1.108, and I can do
a build (where the system ptxas will be used for verification).

However, if I want to compile something while my path is set to include the
most recent cuda (11.1), I get:
...
$ c=~/cuda/11.1/bin; ( export PATH=$c:$PATH;
/home/vries/nvptx/trunk/build-gcc/gcc/xgcc
-B/home/vries/nvptx/trunk/build-gcc/gcc/ -fdiagnostics-plain-output
--sysroot=/home/vries/nvptx/trunk/install/nvptx-none -O0 -w -c -isystem
/home/vries/nvptx/trunk/build-gcc/nvptx-none/./newlib/targ-include -isystem
/home/vries/nvptx/trunk/source-gcc/newlib/libc/include -o pr42717.o
/home/vries/nvptx/trunk/source-gcc/gcc/testsuite/gcc.c-torture/compile/pr42717.c
)
ptxas fatal   : Value 'sm_30' is not defined for option 'gpu-name'
nvptx-as: ptxas returned 255 exit status
...

By adding -misa=sm_35, the compilation succeeds.

For a test-case, it's stil feasible to add this, but not for a build.

So, it looks like it's time to make -misa=sm_35 the new default.

[ It's good to note that the ptxas code will not actually be executed, and that
in the end the installed driver is the one making the translation for
execution.  So atm my exec using sm_30 is still supported by Long Lived Branch
driver version 450.66. ]

[Bug target/97348] [nvptx] Make -misa=sm_35 the default

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348

Tom de Vries  changed:

   What|Removed |Added

 Target||nvptx

--- Comment #1 from Tom de Vries  ---
Using this:
...
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 75c3d54864e..4c27a832d28 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -60,5 +60,5 @@ EnumValue
 Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)

 misa=
-Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option)
Init(PTX_ISA_SM30)
+Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option)
Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.
...
and rebuilding cc1, we indeed get the sm_35 in the .s file.

However, when doing a complete rebuild against system cuda, we run into this in
configure doing a conftest:
...
ptxas fatal   : SM version specified by .target is higher than default SM
version assumed
nvptx-as: ptxas returned 255 exit status
...

If we add -misa=sm_35 to the command line, the conftest passes.

Looking in the nvptx-as sources, we find a hard_coded default:
...
  const char *smver = "sm_30";
...
and after changing that to sm_35, the conftest passes.

I'm not sure if nvptx-as should have a hardcoded default, probably it should
parse the .target line and pass that to ptxas.

[Bug target/97348] [nvptx] Make -misa=sm_35 the default

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348

--- Comment #2 from Tom de Vries  ---
Anyway, we should be able to work around this by having gcc explicitly pass -m
sm_35 to nvptx-as:
...
-#define ASM_SPEC "%{misa=*:-m %*}"
+#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
...

[Bug target/97348] [nvptx] Make -misa=sm_35 the default

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348

--- Comment #3 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Looking in the nvptx-as sources, we find a hard_coded default:
> ...
>   const char *smver = "sm_30";
> ...
> and after changing that to sm_35, the conftest passes.
> 
> I'm not sure if nvptx-as should have a hardcoded default, probably it should
> parse the .target line and pass that to ptxas.

Filed https://github.com/MentorEmbedded/nvptx-tools/issues/24.

[Bug target/97348] [nvptx] Make -misa=sm_35 the default

2020-10-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97348

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Tom de Vries  ---
Marking resolved-fixed.

[Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'

2020-12-17 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #3 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #2)
> However, my report was specifically for the nvptx target compiler.  Just
> compile with 'nvptx-gcc -fopenacc -S' the code I posed, and compare
> '-DTYPE=int'/'-DTYPE=long' vs. '-DTYPE=float'.


Ah, I was not aware of usage of openacc beyond the offloading setup.

For my understanding, is this just a way for you to easily reproduce some
problem really occurring elsewhere, or is this actually used for something?

[Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'

2020-12-18 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #5 from Tom de Vries  ---
(In reply to Thomas Schwinge from comment #4)
> I had been looking into how/when PTX 'atom' is used for reductions, and
> first had a look what the back end currently might emit at all, found SDIM
> 'atomic_fetch_add', and SF 'atomic_fetch_addsf'.

Ack.

> I tried to get these
> used via '(void) __atomic_fetch_add (&a, b, __ATOMIC_RELAXED);', which works
> fine for integer types, but 'error: operand type ‘float *’ is incompatible
> with argument 1 of ‘__atomic_fetch_add’' (didn't research the rationale
> behind that), so resorted to 'acc atomic'.
> Further analysis to be done. 
> (Can floating-point type atomic generally not be supported, given that
> '__atomic_fetch_add' rejects it?  Is OMP atomic handling doing something
> wrong for these even for nvptx target (real, not via offloading)?  Is
> something wrong in the nvptx back end?)
> 

I don't know the rationale either, but at least it looks like documented
behaviour, both for the builtin and the pattern.

I don't see the backend doing anything wrong.

> This isn't important right now; I just filed the issue as I'd found it.

Ack, understood.

[Bug debug/98656] New: switchlower_O0 drops line number of switch

2021-01-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

Bug ID: 98656
   Summary: switchlower_O0 drops line number of switch
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ Originally filed as gdb PR at
https://sourceware.org/bugzilla/show_bug.cgi?id=27179 ]

Consider test-case small.c.
...
$ cat -n small.c
 1  #include 
 2
 3  void foo (int x, int y)
 4  {
 5switch (x) {
 6  case 0: break;
 7  case 1: break;
 8  case 2: break;
 9  case 3:
10for (int z = 0; z < ({ if (y) break; 5; }); z++)
11break;
12  case 4: break;
13  default: break;
14}
15  }
16
17  int main ()
18  {
19foo (1, 1);  // L1
20foo (2, 1);  // L2
21printf("hello world!");  // L3
22return 0;
23  }
...

With gcc-8, we have a .loc with line number 5 representing the switch
statement:
...
$ gcc-8 -O0 -g small.c -save-temps
$ cat small.s
foo:
.LFB0:
.file 1 "small.c"
.loc 1 4 1
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
movl%edi, -20(%rbp)
movl%esi, -24(%rbp)
.loc 1 5 3
cmpl$4, -20(%rbp)
ja  .L13
...

With gcc-9 that .loc disappeared:
...
foo:
.LFB0:
.file 1 "small.c"
.loc 1 4 1
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
movl%edi, -20(%rbp)
movl%esi, -24(%rbp)
cmpl$4, -20(%rbp)
ja  .L13
...
and that's still the case with gcc-11.

Culprit is switchlower_O0.

With this compilation:
...
$ rm -f *.c.*; gcc-11 -O0 -g small.c -fdump-tree-all-lineno -save-temps
...
we have at a-small.c.234t.cplxlower0:
...
   :
  [small.c:5:3] switch (x_1(D)) <[small.c:13:2] default:  [INV],
[small.c:6:5] case 0:  [INV], [small.c:7:5] case 1:  [INV],
[small.c:8:5] case 2:  [INV], [small.c:9:5] case 3:  [INV],
[small.c:12:5] case 4:  [INV]>
...
and at a-small.c.236t.switchlower_O0:
...
   :
  switch (x_1(D)) <[small.c:13:2] default:  [0.00%], [small.c:6:5] case 0:
 [20.00%], [small.c:7:5] case 1:  [20.00%], [small.c:8:5] case 2: 
[20.00%], [small.c:9:5] case 3:  [20.00%], [small.c:12:5] case 4: 
[20.00%]>
...

Note the dropped "[small.c:5:3]" in front of "switch".

[Bug debug/98656] [9/10/11 Regression] switchlower_O0 drops line number of switch

2021-01-13 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98656

--- Comment #1 from Tom de Vries  ---
Created attachment 49959
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49959&action=edit
Tentative patch

Using this tentative patch, I get back the .loc for line number 5:
...
foo:
.LFB0:
.file 1 "small.c"
.loc 1 4 1
.cfi_startproc
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq%rsp, %rbp
.cfi_def_cfa_register 6
movl%edi, -20(%rbp)
movl%esi, -24(%rbp)
.loc 1 5 3
cmpl$4, -20(%rbp)
ja  .L13
...

[Bug debug/98780] New: Missing line table entry for inlined stmt at -g -O0

2021-01-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98780

Bug ID: 98780
   Summary: Missing line table entry for inlined stmt at -g -O0
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50018
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50018&action=edit
source file

[ Spinoff of gdb PR25884 - "Stepping over inlined function without line number
statement fails" (https://sourceware.org/bugzilla/show_bug.cgi?id=25884). ]

Consider test source gdb/testsuite/gdb.opt/inline-cmds.c (attached).

When compiled with -O0 -g, this bit:
...
71result = 0; /* set breakpoint 3 here */
72
73func1 (); /* first call */
74func1 (); /* second call */
75marker ();
...
where func1 is:
...
33  inline __attribute__((always_inline)) int func1(void)
34  {
35bar ();
36return x * y;
37  }
...
is represented in the line info as:
...
Line numberStarting addressViewStmt
710x64   x
350x6e   x
750x78   x
...
where the insn are:
...
  64:   c7 05 00 00 00 00 00movl   $0x0,0x0(%rip)# 6e 
  6b:   00 00 00 
  6e:   e8 00 00 00 00  callq  73 
  73:   e8 00 00 00 00  callq  78 
  78:   e8 00 00 00 00  callq  7d 
...

The calls to bar at 6e and 73 represent different instantiations of the same
statement, but they get a single entry in the line table.

It would be more accurate to have two entries.

Note that each call to bar has its own DW_TAG_inlined_subroutine descriptor:
...
 <2><11e>: Abbrev Number: 10 (DW_TAG_inlined_subroutine)
<11f>   DW_AT_abstract_origin: <0x208>
<123>   DW_AT_low_pc  : 0x6e
<12b>   DW_AT_high_pc : 0x5
<133>   DW_AT_call_file   : 1
<134>   DW_AT_call_line   : 73
<135>   DW_AT_call_column : 3
 <2><136>: Abbrev Number: 10 (DW_TAG_inlined_subroutine)
<137>   DW_AT_abstract_origin: <0x208>
<13b>   DW_AT_low_pc  : 0x73
<143>   DW_AT_high_pc : 0x5
<14b>   DW_AT_call_file   : 1
<14c>   DW_AT_call_line   : 74
<14d>   DW_AT_call_column : 3
...
so at some level gcc knows that these are different statements.

[ The problem with gdb is that it ignores the DW_TAG_inlined_subroutine on the
second call because there's no corresponding line table entry. ]

[Bug debug/98780] Missing line table entry for inlined stmt at -g -O0

2021-01-21 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98780

--- Comment #1 from Tom de Vries  ---
At final, we have:
...
(note 113 30 112 2 0x7f53d1836de0 NOTE_INSN_BLOCK_BEG)
(note 112 113 31 2 0x7f53d1836e40 NOTE_INSN_BLOCK_BEG)
(call_insn 31 112 114 2 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] 
) [0 bar S1 A8])
(const_int 0 [0])) "src/gdb/testsuite/gdb.opt/inline-cmds.c":35:3 813
{*call}
 (nil)
(nil))
(note 114 31 115 2 0x7f53d1836e40 NOTE_INSN_BLOCK_END)
(note 115 114 117 2 0x7f53d1836de0 NOTE_INSN_BLOCK_END)
(note 117 115 116 2 0x7f53d1836d20 NOTE_INSN_BLOCK_BEG)
(note 116 117 38 2 0x7f53d1836d80 NOTE_INSN_BLOCK_BEG)
(call_insn 38 116 118 2 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41] 
) [0 bar S1 A8])
(const_int 0 [0])) "src/gdb/testsuite/gdb.opt/inline-cmds.c":35:3 813
{*call}
 (nil)
(nil))
(note 118 38 119 2 0x7f53d1836d80 NOTE_INSN_BLOCK_END)
(note 119 118 45 2 0x7f53d1836d20 NOTE_INSN_BLOCK_END)
...

So if we reset last_linenum when encountering a BLOCK_END:
...
diff --git a/gcc/final.c b/gcc/final.c
index b037e07fca0..a5da1ce7224 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -2385,6 +2385,8 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int
optimize_p AT
TRIBUTE_UNUSED,
debug_hooks->end_block (high_block_linenum, n);
  gcc_assert (BLOCK_IN_COLD_SECTION_P (NOTE_BLOCK (insn))
  == in_cold_section_p);
+
+ last_linenum = 0;
}
  if (write_symbols == DBX_DEBUG)
{
...
I get:
...
Line numberStarting addressViewStmt
710x64   x
350x6e   x
350x73   x
750x78   x
...

[Bug libbacktrace/98818] New: [libbacktrace] Don't throw fatal error for unsupported dwarf version

2021-01-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98818

Bug ID: 98818
   Summary: [libbacktrace] Don't throw fatal error for unsupported
dwarf version
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libbacktrace
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
CC: ian at gcc dot gnu.org
  Target Milestone: ---

I have several gcc versions installed. I use this f.i. for running gdb
testsuite with different gcc versions.

Consequently, I have libgcc 11 installed in my system, to be able to support
the latest gcc compiler.

After installing the libgcc 11 debuginfo package, gcc-go9 started to given me
problems:
...
$ ./outputs/gdb.go/handcall/handcall 
fatal error: unrecognized DWARF version in .debug_info at 40

goroutine 1 [running, locked to thread]:
fatal error: unrecognized DWARF version in .debug_info at 40
panic during panic

goroutine 1 [running, locked to thread]:
fatal error: unrecognized DWARF version in .debug_info at 40
stack trace unavailable
...

The root cause for this is that libgcc's .debug_info contains dwarf5 units, and
that causes:
...
   dwarf_buf_error (&unit_buf, "unrecognized DWARF version");
...
where we do:
...
static void
dwarf_buf_error (struct dwarf_buf *buf, const char *msg)
{
  char b[200];

  snprintf (b, sizeof b, "%s in %s at %d",
msg, buf->name, (int) (buf->buf - buf->start));
  buf->error_callback (buf->data, b, 0);
}
...
which gets us to libgo's error_callback:
...
(gdb) l
161 /* Error callback.  */
162
163 static void
164 error_callback (void *data __attribute__ ((unused)),
165 const char *msg, int errnum)
166 {
167   if (errnum == -1)
168 {
169   /* No debug info available.  Carry on as best we can.  */
170   return;
(gdb) l
171 }
172   if (errnum != 0)
173 runtime_printf ("%s errno %d\n", msg, errnum);
174   runtime_throw (msg);
175 }
...

ISTM that dwarf info that has a newer version than the libbacktrace reader
supports is not that different from missing debug info, so I wonder if we
should call the error_callback here with -1 instead.

  1   2   3   4   5   >