[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program

2020-03-17 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org

--- Comment #11 from Tom de Vries  ---
Cross-referencing PR gdb/25684 - "gdb testing with gcc -flto" (
https://sourceware.org/bugzilla/show_bug.cgi?id=25684 ).

Ideally there would be a way to enable the lto infrastructure without actually
optimizing, such that when running the gdb testsuite with and without -flto and
comparing results, any regression would indicate something that needs fixing.

In the current situation, each individual regression has to be investigated to
determine whether something needs fixing or whether the failure is just an
optimization artifact. And because optimizations are applied, there are
thousands of such regressions.

[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program

2020-03-17 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663

--- Comment #13 from Tom de Vries  ---
(In reply to Richard Biener from comment #12)
> (In reply to Tom de Vries from comment #11)
> > Cross-referencing PR gdb/25684 - "gdb testing with gcc -flto" (
> > https://sourceware.org/bugzilla/show_bug.cgi?id=25684 ).
> > 
> > Ideally there would be a way to enable the lto infrastructure without
> > actually optimizing, such that when running the gdb testsuite with and
> > without flto and comparing results, any regression would indicate something
> > that needs fixing.
> > 
> > In the current situation, each individual regression needs investigation
> > whether something needs fixing or whether the failure is just an
> > optimization artifact. And due to the fact there are optimizations, there
> > are thousands of such regressions.
> 
> I suppose we're talking about -O0 -flto here.

Right, and ideally plain -flto, with -O0 implicit.

>  What kind of transforms
> are undesirable?  I think at -O0 you'll get
> 
>  - more aggressive unused variable/function removal
>  - promotion of variables from global to local
> 

Right, is there a way to switch these off?

> some of the transforms are unavoidable due to partitioning(?) but we could
> default to 1:1 partitioning at -O0 ...

At this point I'm not interested in defaults yet. I can achieve 1:1
partitioning by testing with target board unix/-flto/-flto-partition=1to1.

For now I'm interested in a combination of flags that exercises the specific
type of debug info generation that is done for lto, without actually doing any
optimizations.

For instance, an open question for me is the following: I'm now using
-flto-partition=none for testing, but maybe 1to1 would yield better results?

[Bug debug/94235] New: worse debug info with O0 than with O2 with flto

2020-03-20 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94235

            Bug ID: 94235
           Summary: worse debug info with O0 than with O2 with flto
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: debug
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider the following test-case (minimized from
gdb/testsuite/gdb.threads/step-bg-decr-pc-switch-thread.c):
...
$ cat -n test.c
 1  int i;
 2
 3  int
 4  main (void)
 5  {
 6    i++;
 7
 8    while (1);
 9
10    return 0;
11  }
12
...

When compiled normally:
...
$ gcc-10 test.c -g
...

We can run to main, and step once:
...
$ gdb -batch a.out -ex start -ex s
Temporary breakpoint 1 at 0x400496: file test.c, line 6.

Temporary breakpoint 1, main () at test.c:6
6 i++;
8 while (1);
$
...

But if we use -flto -O0:
...
$ gcc-10 test.c -g -flto -O0
...
instead we have:
...
$ gdb -batch a.out -ex start -ex s
Temporary breakpoint 1 at 0x400496: file test.c, line 6.

Temporary breakpoint 1, main () at test.c:6
6 i++;

...

Looking at the differences with objdump -dS, we have normally:
...
00400492 :
int i;

int
main (void)
{
  400492:   55                      push   %rbp
  400493:   48 89 e5                mov    %rsp,%rbp
  i++;
  400496:   8b 05 90 0b 20 00       mov    0x200b90(%rip),%eax        # 60102c
  40049c:   83 c0 01                add    $0x1,%eax
  40049f:   89 05 87 0b 20 00       mov    %eax,0x200b87(%rip)        # 60102c

  while (1);
  4004a5:   eb fe                   jmp    4004a5
  4004a7:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  4004ae:   00 00
...

but with lto there's no line number info for the loop:
...
00400492 :
int i;

int
main (void)
{
  400492:   55                      push   %rbp
  400493:   48 89 e5                mov    %rsp,%rbp
  i++;
  400496:   8b 05 90 0b 20 00       mov    0x200b90(%rip),%eax        # 60102c
  40049c:   83 c0 01                add    $0x1,%eax
  40049f:   89 05 87 0b 20 00       mov    %eax,0x200b87(%rip)        # 60102c
  4004a5:   eb fe                   jmp    4004a5
  4004a7:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  4004ae:   00 00
...

Amazingly, with -flto -O2, we have:
...
004003c0 :
int i;

int
main (void)
{
  i++;
  4003c0:   83 05 65 0c 20 00 01    addl   $0x1,0x200c65(%rip)        # 60102c

  while (1);
  4003c7:   eb fe                   jmp    4003c7
  4003c9:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

...
and:
...
$ gdb -batch a.out -ex start -ex s
Temporary breakpoint 1 at 0x4003c0: file test.c, line 6.

Temporary breakpoint 1, main () at test.c:6
6 i++;
8 while (1);
$
...

Same for O1 as for O2.

For some reason, we have worse debug info with O0 than with O2.

[Bug debug/94450] New: lto abstract variable emitted as concrete decl

2020-04-01 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450

            Bug ID: 94450
           Summary: lto abstract variable emitted as concrete decl
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: debug
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case test.c:
...
int aaa;

int
main (void)
{
  return aaa;
}
...

Compiled with -flto:
...
$ gcc-10 -O0 test.c -g -flto -flto-partition=none -ffat-lto-objects
...

The debug info for the variable aaa looks like this:
...
 <0>: Abbrev Number: 1 (DW_TAG_compile_unit)
   DW_AT_name: 
 <1>: Abbrev Number: 2 (DW_TAG_imported_unit)
   DW_AT_import  : <0x12b>  [Abbrev Number: 1]
 <1>: Abbrev Number: 3 (DW_TAG_variable)
   DW_AT_abstract_origin: <0x13d>
   DW_AT_location: 9 byte block: 3 2c 10 60 0 0 0 0 0  (DW_OP_addr: 60102c)
 <0><12b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<131>   DW_AT_name: test.c
 <1><13d>: Abbrev Number: 2 (DW_TAG_variable)
<13e>   DW_AT_name: aaa
<142>   DW_AT_decl_file   : 1
<143>   DW_AT_decl_line   : 1
<144>   DW_AT_decl_column : 5
<145>   DW_AT_type: <0x149>
<149>   DW_AT_external: 1
...

When printing the symbol tables in gdb:
...
$ gdb -readnow -batch a.out -ex "maint print symbols"
...
it turns out we have two symbols aaa, one here:
...
Symtab for file test.c at 0x23aeeb0
Compilation directory is /home/vries
Read from object file /home/vries/a.out (0x238cc90)
Language: c

Blockvector:

block #000, object at 0x23af030, 1 syms/buckets in 0x0..0x0
 int aaa; unresolved
  block #001, object at 0x23aef80 under 0x23af030, 1 syms/buckets in 0x0..0x0
   typedef int int; 

Compunit user: 0x23aedf0
...
and one here:
...
Symtab for file  at 0x23aedf0
Compilation directory is /home/vries
Read from object file /home/vries/a.out (0x238cc90)
Language: c

Blockvector:

block #000, object at 0x23aecb0, 1 syms/buckets in 0x400492..0x40049e
 int aaa; static at 0x60102c section .bss
 int main(void); block object 0x23aebf0, 0x400492..0x40049e section .text
  block #001, object at 0x23aec50 under 0x23aecb0, 0 syms/buckets in 0x400492..0x40049e
    block #002, object at 0x23aebf0 under 0x23aec50, 0 syms/buckets in 0x400492..0x40049e, function main

Compunit include: 0x23aeeb0
...

If we do "print aaa" in gdb and gdb finds the 'static' aaa symbol first, it
uses the DW_AT_location to find the address of the variable.

If we do "print aaa" in gdb and gdb finds the 'unresolved' aaa symbol first, it
looks in the minimal symbols for a symbol aaa and uses that address.

In both cases, we'd find the same address and print the same value, so there's
no correctness problem for this example.
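
As a rough illustration of why both lookups agree here, the two paths can be
sketched in Python. This is a toy model, not gdb code: the Symbol class, the
address_of helper and the MINIMAL_SYMBOLS table are hypothetical names, and
only the addresses are taken from the dumps above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    location: Optional[int]  # address from DW_AT_location, None if absent


# Toy stand-in for the minimal symbols (name -> address).
MINIMAL_SYMBOLS = {"aaa": 0x60102C, "main": 0x400492}


def address_of(symbol, minimal_symbols):
    """Use DW_AT_location if present, else fall back to the minimal symbol."""
    if symbol.location is not None:
        return symbol.location
    return minimal_symbols[symbol.name]


static_aaa = Symbol("aaa", 0x60102C)   # the DIE with DW_AT_location
unresolved_aaa = Symbol("aaa", None)   # the DIE without DW_AT_location

for found_first in (static_aaa, unresolved_aaa):
    print(hex(address_of(found_first, MINIMAL_SYMBOLS)))
```

Both paths end up at 0x60102c, which is why the duplicate symbol is harmless
for this particular test-case.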

But with ada, we run into PR gdb/25760 - "[gcc -flto] FAIL:
gdb.ada/call_pn.exp: print last_node_id after calling pn (timeout)", which
means that there is a correctness problem.

One idea is to try to ignore these useless decls in gdb (filed as PR gdb/25759
- "Remove useless decls from symtab"), but I'm not sure yet how easy it is to
do this efficiently.

So, it would be good if gcc could make it explicit in the DWARF that there's
only one symbol to be considered, rather than having gdb spend time ignoring
the abstract one.

ISTM the only way to do this is to make the test.c CU a partial unit (using
DW_TAG_partial_unit) and drop the import.

[ Having said that, I'm not sure that gdb in its current state would correctly
interpret such dwarf and only create one symbol, so that might require an
additional gdb fix. ]

[Bug debug/94450] lto abstract variable emitted as concrete decl

2020-04-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450

--- Comment #5 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> I guess the more correct DWARF would be to have the 13d DIE include
> DW_AT_declaration?

Well, currently the debug info contains two concrete symbols, one with and one
without location information. The things that make the latter symbol concrete
are both the fact that it's contained in a CU (as opposed to a PU) and the
fact that it's imported into another CU (one could even argue that three
concrete symbols are present, but let's not go there). So, if we'd mark the one
without location information as a declaration, we'd still have two concrete
symbols.

It could be pedantically argued that tagging the symbol as declaration is
incorrect because there's in fact no declaration in the source that it
corresponds to. That could be fixed by marking the declaration with
DW_AT_artificial == 1 (and perhaps marking the def with DW_AT_artificial == 0
in order to make sure the artificial setting is not inherited, in case we go
the DW_AT_specification route). Btw, the dwarf5 standard lists DW_AT_artificial
as applicable to DW_TAG_variable, and the dwarf4 standard doesn't.  I'm not
sure yet whether that reflects improved documentation or an actual change.

But indeed, marking it as a declaration would make the situation more closely
resemble non-lto code (for the case where the source indeed has both a decl and
a def).

I even wonder whether the DW_AT_artificial marking itself (irrespective of a
possible DW_AT_declaration) is used or could be used in gdb to fix PR
gdb/25760.  I'll have to mock up a gdb testsuite dwarf assembly test-case
resembling the test-case and experiment a bit to see what works, and whether
gdb needs changes.

Anyway, the point I was trying to make is that the easiest way to make decls
abstract (rather than adding stuff to the decl itself) is by making the decl
not a top-level member of a CU. In other words: declare it in a PU, and don't
import it into another CU.

> Then we could also stop the "abuse" of
> DW_AT_abstract_origin
> and instead have to use DW_AT_specification.  But I'm not sure whether
> DW_AT_specification allows cross CU references (technically yes but
> practically) especially since there's explicit wording that
> DW_AT_specification
> cannot refer to type unit entities.
>

Using DW_AT_specification sounds cleaner, agreed.

> Note I originally saw all early debug as abstract (but we're not consistently
> emitting DW_AT_inline to all early function DIEs either) but that concept
> doesn't apply to globals.
> 
> As you said the DW_TAG_imported_unit serve no useful purpose (I originally
> thought that it would provide proper name-lookup scopes but that works
> correct in other ways).  And I'm fine to simply drop those (also given
> consumers seem to handle references to CUs not explicitly imported just
> fine).  That could be done for GCC 10 already, I fear the rest needs more
> testing?
> 

Yeah, I think the part of dropping the imports should be safe, and the rest
should be decided once we have more info from playing with the above-mentioned
mockup example.

> Btw, thanks for sanity checking the LTO DWARF.

Sure. I'm working on improving gdb speed for lto executables. To test gdb
patches I need to regression test in lto mode, where I run into regressions
that need to be analyzed, and that's how I'm running into these sorts of
issues.

[Bug debug/94450] lto abstract variable emitted as concrete decl

2020-04-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450

--- Comment #8 from Tom de Vries  ---
(In reply to Richard Biener from comment #7)
> The DW_TAG_imported_unit are now gone for GCC 10.  So can we consider this
> fixed?

I'd like a PR to refer to in the to-be-added xfail in the gdb test-case (and
the PR should stay open as long as that test fails with trunk gcc). It doesn't
matter to me whether that's this particular PR or a follow-up PR.

[Bug debug/94450] lto abstract variable emitted as concrete decl

2020-04-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450

--- Comment #10 from Tom de Vries  ---
(In reply to rguent...@suse.de from comment #9)
> On Fri, 3 Apr 2020, vries at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450
> > 
> > --- Comment #8 from Tom de Vries  ---
> > (In reply to Richard Biener from comment #7)
> > > The DW_TAG_imported_unit are now gone for GCC 10.  So can we consider this
> > > fixed?
> > 
> > I'd like a PR to refer to at the to-be-added xfail in the gdb test-case (and
> > the PR should be open as long as that test fails with trunk gcc). It doesn't
> > matter for me whether that's this particular PR or a follow-up PR.
> 
> OK, so if you have a (single?) specific testcase that's still affected
> please duplicate that into a new bugzilla.  It's always better to
> have something specific to track.
> 
> So did the patch not change anything?

Well, the changes I asked for related to the example in comment 0 are:
- drop the import
- change the tag from DW_TAG_compile_unit to DW_TAG_partial_unit.

AFAIU the patch only removes the import, so in that sense I do not consider the
test-case reported in comment 0 addressed.

I have not tried out the patch.  FWIW, I did try a quick dwarf-assembly
experiment (not the ada one, which I expect will take more time, but a modified
version of gdb testsuite test-case gdb.dwarf2/imported-unit.exp) and confirmed
that neither removing the import alone nor changing the tag alone is sufficient
to get only one entry in the symtab.

Anyway, I understand the example is somewhat abstract, and I'll file the actual
ada example.

[Bug debug/94469] New: lto abstract variable emitted as concrete decl (ada test-case)

2020-04-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

            Bug ID: 94469
           Summary: lto abstract variable emitted as concrete decl (ada test-case)
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: debug
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider gdb testsuite test-case gdb.ada/call_pn (
https://sourceware.org/git/?p=binutils-gdb.git;a=tree;f=gdb/testsuite/gdb.ada/call_pn;hb=HEAD
), consisting of files foo.adb, pck.adb and pck.ads.

Compiled like so:
...
$ gnatmake-10 \
  --GCC=/usr/bin/gcc-10 \
  --GNATBIND=/usr/bin/gnatbind-10 \
  --GNATLINK=/usr/bin/gnatlink-10 \
  -largs --GCC=/usr/bin/gcc-10 -margs \
  src/gdb/testsuite/gdb.ada/call_pn/foo.adb \
  -g -flto -O0 -flto-partition=none -ffat-lto-objects
...

When trying to print the value of last_node_id, gdb asks which symbol to
print:
...
$ gdb foo -ex "p last_node_id"
Reading symbols from foo...
Multiple matches for last_node_id
[0] cancel
[1] pck.last_node_id at src/gdb/testsuite/gdb.ada/call_pn/pck.adb:17
[2] pck.last_node_id at src/gdb/testsuite/gdb.ada/call_pn/foo.adb:17
> 
...
where 1 gives us:
...
> 1
$1 = 
...
and 2 gives us:
...
> 2
$1 = 0
...

If we compile without lto, so just with -g, we have instead:
...
$ gdb foo -ex "p last_node_id"
$1 = 0
...

The structure of the dwarf for the lto case is:
...
 <0><15ef>: Abbrev Number: 1 (DW_TAG_compile_unit)
<15f5>   DW_AT_name: 
 <1><1611>: Abbrev Number: 2 (DW_TAG_imported_unit)
<1612>   DW_AT_import  : <0x167a>   [Abbrev Number: 1]
 <1><163f>: Abbrev Number: 5 (DW_TAG_variable)
<1640>   DW_AT_abstract_origin: <0x16d4>
<1644>   DW_AT_location: 9 byte block: 3 d4 33 63 0 0 0 0 0 (DW_OP_addr: 6333d4)
 <0><167a>: Abbrev Number: 1 (DW_TAG_compile_unit)
<1680>   DW_AT_name: src/gdb/testsuite/gdb.ada/call_pn/pck.adb
 <1><16d4>: Abbrev Number: 8 (DW_TAG_variable)
<16d5>   DW_AT_name: pck__last_node_id
...

My understanding of DWARF is that this actually declares three symbols:
- the one for DW_TAG_compile_unit pck.adb (at 0x167a)
- the one for the DW_TAG_compile_unit at 0x15ef
- the one resulting from the import of pck.adb into the DW_TAG_compile_unit
  at 0x15ef

And, AFAIU, the way to make sure we declare just one symbol is by both:
- dropping the import, and
- changing the tag for pck.adb to DW_TAG_partial_unit.
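
FWIW, the counting argument above can be written down as a toy model in
Python. It only encodes the interpretation of the DWARF semantics stated in
this comment; it is not what gdb implements, and the Unit class and the
count_symbols and lto_layout helpers are made-up names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    tag: str                                        # compile or partial unit tag
    name: str
    variables: list = field(default_factory=list)   # top-level variable names
    imports: list = field(default_factory=list)     # DW_TAG_imported_unit targets


def count_symbols(units, var):
    """Count the symbols named `var` under the assumed rules:
    - a top-level variable DIE in a compile unit declares a symbol;
    - a variable DIE in a partial unit declares nothing by itself;
    - an import re-declares the imported unit's variables in the importer.
    """
    count = 0
    for unit in units:
        if unit.tag != "DW_TAG_compile_unit":
            continue
        count += unit.variables.count(var)
        for imported in unit.imports:
            count += imported.variables.count(var)
    return count


def lto_layout(early_tag, keep_import):
    """Two-unit layout as in the dump above, with the two knobs exposed."""
    early = Unit(early_tag, "pck.adb", ["pck__last_node_id"])
    # The concrete DIE gets its name through DW_AT_abstract_origin; the unit
    # name "ltrans" is a placeholder chosen for this sketch.
    ltrans = Unit("DW_TAG_compile_unit", "ltrans", ["pck__last_node_id"],
                  [early] if keep_import else [])
    return [ltrans, early]


for tag in ("DW_TAG_compile_unit", "DW_TAG_partial_unit"):
    for keep_import in (True, False):
        n = count_symbols(lto_layout(tag, keep_import), "pck__last_node_id")
        print(f"{tag:22} import={keep_import!s:5} -> {n} symbol(s)")
```

Under these assumed rules the current layout yields three symbols, either
change alone yields two, and only the combination of a partial unit without an
import leaves a single symbol.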

The import was already dropped on master by commit 54af95767e8 "debug/94450 -
remove DW_TAG_imported_unit generated in LTRANS units" ( see PR 94450 comment 6
).

Note: interestingly, the foo.adb here is incorrect:
...
[1] pck.last_node_id at src/gdb/testsuite/gdb.ada/call_pn/pck.adb:17
[2] pck.last_node_id at src/gdb/testsuite/gdb.ada/call_pn/foo.adb:17
...
For now I'm assuming that's a gdb PR, filed as 
PR gdb/25771 - "Inter-cu DW_AT_abstract_origin results in wrong file".

[Bug debug/94450] lto abstract variable emitted as concrete decl

2020-04-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450

--- Comment #12 from Tom de Vries  ---
(In reply to Tom de Vries from comment #10)
> I'll file the actual ada example.

PR94469 - "lto abstract variable emitted as concrete decl (ada test-case)"

[Bug debug/94450] lto abstract variable emitted as concrete decl

2020-04-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94450

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tromey at gcc dot gnu.org

--- Comment #13 from Tom de Vries  ---
(In reply to rguent...@suse.de from comment #11)

> Note that as I read the DWARF spec changing the
> early CUs to DW_TAG_partial_unit and then again importing those
> (the spec suggests you need to import DW_TAG_partial_units) would
> not reflect documented semantics either.

It's not my understanding from the spec that DW_TAG_partial_units are required
to be imported. AFAIU, it's just that in all the use cases described there that
use DW_TAG_partial_unit, an import happens to be required.

OK, so let's look at the use cases described in the spec at E.1.

I. classic dwz: factor out into partial unit, add imports:
...
    DW_TAG_compile_unit
      DW_AT_name cu1
L1:   DIEx
      DIEy
        DW_AT_type L1
    DW_TAG_compile_unit
      DW_AT_name cu2
L2:   DIEx
      DIEz
        DW_AT_type L2

->

L3: DW_TAG_partial_unit
L4:   DIEx
    DW_TAG_compile_unit
      DW_TAG_imported_unit
        DW_AT_import L3
      DW_AT_name cu1
      DIEy
        DW_AT_type L4
    DW_TAG_compile_unit
      DW_TAG_imported_unit
        DW_AT_import L3
      DW_AT_name cu2
      DIEz
        DW_AT_type L4
...

II. dwz --devel-uni-lang --devel-gen-cu: factor out into DW_TAG_compile_unit,
no imports, exploit global namespace:
...
    DW_TAG_compile_unit
      DW_AT_name cu1
L1:   DIEx
      DIEy
        DW_AT_type L1
    DW_TAG_compile_unit
      DW_AT_name cu2
L2:   DIEx
      DIEz
        DW_AT_type L2

->

    DW_TAG_compile_unit
L4:   DIEx
    DW_TAG_compile_unit
      DW_AT_name cu1
      DIEy
        DW_AT_type L4
    DW_TAG_compile_unit
      DW_AT_name cu2
      DIEz
        DW_AT_type L4
...

III. #include in namespace:
...
    DW_TAG_compile_unit
      DW_AT_name cu1
      DW_TAG_namespace bla1
L1:     DIEx
        DIEy
          DW_AT_type L1
    DW_TAG_compile_unit
      DW_AT_name cu2
      DW_TAG_namespace bla2
L2:     DIEx
        DIEz
          DW_AT_type L2

->

L3: DW_TAG_partial_unit
L4:   DIEx
    DW_TAG_compile_unit
      DW_AT_name cu1
      DW_TAG_namespace bla1
        DW_TAG_imported_unit
          DW_AT_import L3
        DIEy
          DW_AT_type L4
    DW_TAG_compile_unit
      DW_AT_name cu2
      DW_TAG_namespace bla2
        DW_TAG_imported_unit
          DW_AT_import L3
        DIEz
          DW_AT_type L4
...

So indeed, in all cases where DW_TAG_partial_unit is used, we use an import,
but that's because it's applicable to the transformation, and we're just doing
an entirely different transformation here:
...
    DW_TAG_compile_unit
      DW_AT_name cu1
      DW_TAG_variable
        DW_AT_name var1
        DW_AT_location

->

    DW_TAG_partial_unit
L1:   DW_TAG_variable
        DW_AT_name var1
    DW_TAG_compile_unit
      DW_AT_name cu1
      DW_TAG_variable
        DW_AT_abstract_origin L1
        DW_AT_location
...
We fabricate a new abstract DW_TAG_variable DIE out of thin air, then try to
hide that fact by placing it in a DW_TAG_partial_unit, much as is done in
III. In contrast to III, however, we don't want to reintroduce it in another
context; we want it to stay hidden, so there's no import.
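
A minimal sketch of that last transformation in Python, with DIEs modelled as
plain dicts. The lto_split function and the dict layout are hypothetical and
only meant to illustrate the shape of the result; this is not how gcc
generates the DWARF.

```python
import copy


def lto_split(cu):
    """Split each variable DIE of `cu` into an abstract DIE in a new partial
    unit and a concrete DIE (abstract origin + location) left in the CU."""
    pu = {"tag": "DW_TAG_partial_unit", "children": []}
    new_cu = {"tag": cu["tag"], "name": cu["name"], "children": []}
    for die in cu["children"]:
        if die["tag"] != "DW_TAG_variable":
            new_cu["children"].append(copy.deepcopy(die))
            continue
        # Abstract DIE: everything except the location moves to the PU.
        abstract = {k: v for k, v in die.items() if k != "DW_AT_location"}
        pu["children"].append(abstract)
        # Concrete DIE: refers back to the abstract one and keeps the
        # location.  Note that no DW_TAG_imported_unit is added.
        concrete = {"tag": "DW_TAG_variable", "DW_AT_abstract_origin": abstract}
        if "DW_AT_location" in die:
            concrete["DW_AT_location"] = die["DW_AT_location"]
        new_cu["children"].append(concrete)
    return pu, new_cu


cu1 = {
    "tag": "DW_TAG_compile_unit",
    "name": "cu1",
    "children": [
        # "addr(var1)" is a placeholder, not a real location expression.
        {"tag": "DW_TAG_variable", "DW_AT_name": "var1",
         "DW_AT_location": "addr(var1)"},
    ],
}

pu, new_cu = lto_split(cu1)
print(pu["children"][0])
print(new_cu["children"][0]["DW_AT_abstract_origin"] is pu["children"][0])
```

The name stays only on the abstract DIE in the partial unit, and the compile
unit reaches it purely through the DW_AT_abstract_origin reference.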

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-06 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #2 from Tom de Vries  ---
(In reply to Richard Biener from comment #1)
> Do you know under which circumstances gdb asks which symbol to print? 
> Because
> I've never seen this for C or C++ and it should be present for _all_ global
> variables when LTO is used and your analysis is correct.
> 

The prompt to choose is Ada-specific; for C/C++ gdb just picks one.

> Or does Ada somehow not have variable shadowing and for C and C++ gdb
> implements
> shadowing rules, picking the "last" instance?

Unfortunately I have no knowledge of Ada as a language as such, so I cannot
comment on that.

As for which instance is found if there are two matches for C/C++, I'm not sure
if there are any guarantees.

---

Having said that, while asking which match to print is Ada-specific, listing
all variables is not, and when doing that we get two variables instead of one:
...
(gdb) info variables
   ...
File gdb.ada/call_pn/foo.adb:
17: static pck.last_node_id: pck.node_id;

File gdb.ada/call_pn/pck.adb:
17: static pck.last_node_id: pck.node_id;
...
So, this might be a way to reproduce the problem outside of Ada.

I tried the following C test-case:
...
$ cat test.c
static int aaa;
int
main (void)
{  
  return 0;
}
$ cat test2.c
static int bbb;
$ cat test3.c
static int ccc;
...
and compiled using:
...
$ gcc-10 -g test.c test2.c test3.c -flto -flto-partition=none -ffat-lto-objects -O0
...
and we have again duplicate variables (for bbb and ccc):
...
$ gdb.sh a.out
Reading symbols from a.out...
(gdb) info variables
All defined variables:

File init.c:
24: const int _IO_stdin_used;

File test.c:
1:  static int aaa;
1:  static int bbb;
1:  static int ccc;

File test2.c:
1:  static int bbb;

File test3.c:
1:  static int ccc;
...

This seems to be the same issue to me.

Even more clearly, we cannot print the values of bbb and ccc:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) p aaa
$1 = 0
(gdb) p bbb
$2 = 
(gdb) p ccc
$3 = 
...
because the one without DW_AT_location is shadowing the one with
DW_AT_location.

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-06 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
(In reply to Richard Biener from comment #3)
> Ah, thanks for the hints - that's something I can work with more easily than
> an Ada testcase ;)

Sure :)

FWIW, the gdb behaviour is somewhat flaky. This reproduces what I had:
...
$ gdb -batch a.out -ex "p aaa" -ex "p bbb" -ex "p ccc"
$1 = 0
$2 = 
$3 = 
...
but if I drop printing aaa, I do get the value of bbb:
...
$ gdb -batch a.out -ex "p bbb" -ex "p ccc"
$1 = 0
$2 = 
...
So this also seems to interact with partial symbol tables.

To reproduce this reliably, just skip partial symbol tables using -readnow:
...
$ gdb -readnow -batch a.out -ex "p aaa" -ex "p bbb" -ex "p ccc"
$1 = 
$2 = 
$3 = 
...
and now also the problem surfaces for aaa.

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-06 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #5 from Tom de Vries  ---
(In reply to Tom de Vries from comment #4)
> (In reply to Tom de Vries from comment #2)
> (In reply to Richard Biener from comment #3)
> $ gdb -readnow -batch a.out -ex "p aaa" -ex "p bbb" -ex "p ccc"
> $1 = 
> $2 = 
> $3 = 
> ...
> and now also the problem surfaces for aaa.

And, it's good to realize that once you set the context to main, things do
work:
...
$ gdb.sh -readnow -batch a.out -ex start -ex "p aaa" -ex "p bbb" -ex "p ccc" 
Temporary breakpoint 1 at 0x400496: file test.c, line 6.

Temporary breakpoint 1, main () at test.c:6
6 return 0;
$1 = 0
$2 = 0
$3 = 0
...

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-06 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #8 from Tom de Vries  ---
(In reply to Richard Biener from comment #7)
> (In reply to Richard Biener from comment #6)
> > Btw, I still wonder how it works with abstract functions, inline and
> > concrete instances.  Works in gdb, that is - will dig into it a bit after
> > lunch.
> 
> So here I see us (without LTO) putting DW_AT_location attributes on the
> abstract instance copy.  That kind-of makes it not abstract anymore, no?
> But that way we avoid emitting multiple DIEs for local statics.  With
> -flto the same problem appears there because we cannot annotate the
> early CUs DIE with a location so the concrete instance copy
> gets [generated and] the location.
> 
> So while I intended to have the early CUs behave like fully abstract
> the actual DWARF is different.  I understand that if I emit the early CU as
> partial unit it becomes abstract?

Well, that's my theory.

I've created a minimal dwarf assembler variant corresponding to the C test-case
(with only var aaa), and I could reproduce the problem. However, after changing
the tag to DW_TAG_partial_unit, a symbol table was still created for that
partial unit. It seems that the inter-cu ref handling code is responsible for
that. I'll try to fix this.

>  Note we do emit DW_AT_const_value
> to early optimized out decls - would those still be found when the early CU
> is partial [and not imported anywhere]?

I think so, but I could check with the dwarf assembler test-case.

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #11 from Tom de Vries  ---
(In reply to Richard Biener from comment #9)
> (In reply to Tom de Vries from comment #8)
> > (In reply to Richard Biener from comment #7)
> > > (In reply to Richard Biener from comment #6)
> > > > Btw, I still wonder how it works with abstract functions, inline and
> > > > concrete instances.  Works in gdb, that is - will dig into it a bit 
> > > > after
> > > > lunch.
> > > 
> > > So here I see us (without LTO) putting DW_AT_location attributes on the
> > > abstract instance copy.  That kind-of makes it not abstract anymore, no?
> > > But that way we avoid emitting multiple DIEs for local statics.  With
> > > -flto the same problem appears there because we cannot annotate the
> > > early CUs DIE with a location so the concrete instance copy
> > > gets [generated and] the location.
> > > 
> > > So while I intended to have the early CUs behave like fully abstract
> > > the actual DWARF is different.  I understand that if I emit the early CU 
> > > as
> > > partial unit it becomes abstract?
> > 
> > Well, that's my theory.
> > 
> > I've created a minimal dwarf assembler variant corresponding to the C
> > test-case (with only var aaa), and I could reproduce the problem, however
> > after changing the tag to DW_TAG_partial_unit still a symbol table for that
> > partial unit was created. It seems that the inter-cu ref handling code is
> > responsible for that. I'll try to fix this.
> > 
> > >  Note we do emit DW_AT_const_value
> > > to early optimized out decls - would those still be found when the early 
> > > CU
> > > is partial [and not imported anywhere]?
> > 
> > I think so, but I could check with the dwarf assembler test-case.
> 
> OK, so that doesn't work anymore.
> 
> static const int i = 1;
> 
> int main()
> {
>   return i;
> }
> 
> with -O2 -flto -g and DW_TAG_compile_unit I see
> 
> (gdb) start
> Temporary breakpoint 1 at 0x4003a0: file t.c, line 5.
> Starting program: /home/abuild/rguenther/obj-gcc-g/gcc/a.out
> 
> Temporary breakpoint 1, main () at t.c:5
> 5 return i;
> (gdb) p i
> 1
> 
> while when using DW_TAG_partial_unit:
> 
> (gdb) p i
> No symbol "i" in current context.
> 
>   Compilation Unit @ offset 0xc7:
>Length:0x3d (32-bit)
>Version:   4
>Abbrev Offset: 0x64
>Pointer Size:  8
>  <0>: Abbrev Number: 1 (DW_TAG_compile_unit)
>DW_AT_producer: (indirect string, offset: 0x1d0): GNU GIMPLE 10.0.1 20200406 (experimental) -mtune=generic -march=x86-64 -g -g -O2 -O2 -fno-openmp -fno-openacc -fno-pie -fltrans
>DW_AT_language: 12   (ANSI C99)
>DW_AT_name: (indirect string, offset: 0x250): 
>DW_AT_comp_dir: (indirect string, offset: 0x25d): /abuild/rguenther/obj-gcc-g/gcc
>DW_AT_ranges  : 0x40
>DW_AT_low_pc  : 0x0
>DW_AT_stmt_list   : 0xe9
>  <1>: Abbrev Number: 2 (DW_TAG_subprogram)
>DW_AT_abstract_origin: <0x13c>
>DW_AT_low_pc  : 0x4003a0
>DW_AT_high_pc : 0x6
> <105>   DW_AT_frame_base  : 1 byte block: 9c    (DW_OP_call_frame_cfa)
> <107>   DW_AT_GNU_all_call_sites: 1
>  <1><107>: Abbrev Number: 0
>   Compilation Unit @ offset 0x108:
>Length:0x3d (32-bit)
>Version:   4
>Abbrev Offset: 0x88
>Pointer Size:  8
>  <0><113>: Abbrev Number: 1 (DW_TAG_partial_unit)
> <114>   DW_AT_producer: (indirect string, offset: 0x27d): GNU C17 10.0.1 20200406 (experimental) -mtune=generic -march=x86-64 -g -O2 -flto
> <118>   DW_AT_language: 12  (ANSI C99)
> <119>   DW_AT_name: t.c
> <11d>   DW_AT_comp_dir: (indirect string, offset: 0x25d): /abuild/rguenther/obj-gcc-g/gcc
> <121>   DW_AT_stmt_list   : 0x127
>  <1><125>: Abbrev Number: 2 (DW_TAG_variable)
> <126>   DW_AT_name: i
> <128>   DW_AT_decl_file   : 1
> <129>   DW_AT_decl_line   : 1
> <12a>   DW_AT_decl_column : 18
> <12b>   DW_AT_type: <0x137>
> <12f>   DW_AT_const_value : 1
>  <1><130>: Abbrev Number: 3 (DW_TAG_base_type)
> <131>   DW_AT_byte_size   : 4
> <132>   DW_AT_encoding: 5   (signed)
> <133>   DW_AT_name: int
>  <1><137>: Abbrev Number: 4 (DW_TAG_const_type)
> <138>   DW_AT_type: <0x130>
>  <1><13c>: Abbrev Number: 5 (DW_TAG_subprogram)
> <13d>   DW_AT_external: 1
> <13d>   DW_AT_name: (indirect string, offset: 0x2ce): main
> <141>   DW_AT_decl_file   : 1
> <142>   DW_AT_decl_line   : 3
> <143>   DW_AT_decl_column : 5
> <144>   DW_AT_type: <0x130>
>  <1><148>: Abbrev Number: 0
> 

Ack, I've managed to reproduce this using a dwarf assembly test-case (PR
gdb/25796 - "Symbol with inherited DW_AT_const_value not found" @
https://sourceware.org/bugzilla/show_bug.cgi?id=25796), and submitted a gdb
patch for this.

Note that the problem is specific to gdb's partial symbol tables feature, so
that problem doesn't occur when using -readnow.

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #13 from Tom de Vries  ---
(In reply to rguent...@suse.de from comment #12)
> > Ack, I've managed to reproduce this using a dwarf assembly test-case (PR
> > gdb/25796 - "Symbol with inherited DW_AT_const_value not found" @
> > https://sourceware.org/bugzilla/show_bug.cgi?id=25796), and submitted a gdb
> > patch for this.
> 
> Note compared to your gdb bug the case above does not have a
> reference to 'i' from the "real" DW_TAG_compile_unit.

Ah, sorry, I've misread that, thanks for pointing that out. For that case, I
think gdb behaves as expected.

That is, my mental model is:
- a PU represents a dwarf repository
- a CU represents a symtab (as well as a dwarf repository)
- an import imports dwarf from one dwarf repository into another
- an inter-cu ref does not represent an implicit import

So, if we have only i declared in a PU, and the PU is not imported, there's no
symtab with an 'i', and gdb can't find it.

If you want a symtab with an 'i', you have to add a DW_TAG_variable DIE to a
CU, with DW_AT_abstract_origin referencing the DIE in the PU (as my dwarf
assembly test-case does).
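
The mental model above can be written down as a small sketch. This is only a
model of the four rules listed, not gdb code; the names Die, Unit, symtab_names
and lookup are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Die:
    name: Optional[str] = None
    # Inter-CU reference (DW_AT_abstract_origin); it is NOT an import.
    abstract_origin: Optional["Die"] = None

    def resolved_name(self) -> Optional[str]:
        # A DIE without its own name inherits it from its abstract origin.
        if self.name is not None:
            return self.name
        if self.abstract_origin is not None:
            return self.abstract_origin.resolved_name()
        return None


@dataclass
class Unit:
    label: str
    partial: bool  # True for a PU, False for a CU
    dies: List[Die] = field(default_factory=list)
    imports: List["Unit"] = field(default_factory=list)


def symtab_names(cu: Unit) -> set:
    """Names in the symtab of a CU: its own DIEs plus those of imported units."""
    names, pending, seen = set(), [cu], set()
    while pending:
        unit = pending.pop()
        if id(unit) in seen:
            continue
        seen.add(id(unit))
        for die in unit.dies:
            name = die.resolved_name()
            if name is not None:
                names.add(name)
        pending.extend(unit.imports)
    return names


def lookup(name: str, units: List[Unit]) -> List[str]:
    """Labels of the CUs whose symtab contains name; PUs have no symtab."""
    return [u.label for u in units if not u.partial and name in symtab_names(u)]


i_die = Die(name="i")
pu = Unit("PU", partial=True, dies=[i_die])

# Case 1: PU not imported, no DIE in the CU refers to 'i'.
cu_plain = Unit("CU", partial=False, dies=[Die("main")])
# Case 2: the CU imports the PU.
cu_import = Unit("CU", partial=False, dies=[Die("main")], imports=[pu])
# Case 3: the CU has a variable DIE whose abstract origin is the DIE in the PU.
cu_origin = Unit("CU", partial=False, dies=[Die("main"), Die(abstract_origin=i_die)])

if __name__ == "__main__":
    for cu in (cu_plain, cu_import, cu_origin):
        print(lookup("i", [pu, cu]))
```

In this model only the second and third case yield a symtab containing 'i'.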

> A an extreme dwarf
> testcase would probably contain a partial unit with the optimized
> out variable and a main unit with just a 'main' and no reference to
> the partial unit at all.
> 

In my interpretation as described above, that boils down to the same: no symbol
'i' declared in any symtab.

> > Note that the problem is specific to gdb's partial symbol tables feature, so
> > that problem doesn't occur when using -readnow.
> 
> > > I understand that if I would again add imports this would likely be 
> > > resolved
> > > but at the expense of re-creating the original issue (but just with two
> > > instances rather than three)?
> > 
> > Agreed.
> 
> OK.  So I understand the DWARF standard doesn't really say how consumers
> should work but how do partial vs. full units differ as to "name lookup"?
> I've originally placed imports of original units where I instantiated
> something from that original unit so to make things "visible" at that
> point (as in, all global statics are visible by name lookup).  But of
> course consumers do not really follow name lookup rules since I
> can perfectly well lookup 'i' from 'foo' for
> 
> void foo() {}
> int i;
> 
> where it is obviously not visible (but consumers do need to do
> something resembling name lookup when interpreting user expressions
> written in the source language).

I hope I managed to explain above how I see the difference.

---

It's perhaps good to follow-up at this point to the discussion related to
DW_AT_declaration (see PR94450 comment 5).

As mentioned in comment 8, I've created a minimal dwarf assembler variant
corresponding to the C test-case (with only var aaa), and I could reproduce the
problem. Then by tagging the abstract DIE with DW_AT_declaration, I managed to
fix the problem of "info variables" listing the variable twice. But well, the
fact that we keep decl in symtabs for gdb comes with a number of known issues
(which are tracked here:
https://sourceware.org/bugzilla/show_bug.cgi?id=25755).

In particular, there's PR24985 - "Cannot print value of global variable because
decl in one CU shadows def in other CU", and I was thinking we might run into
that for VLAs when we start using DW_AT_declaration.

So, I was hoping to avoid those issues by using PUs, but it seems that also
comes with its own set of issues in gdb :)

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #15 from Tom de Vries  ---
(In reply to Tom de Vries from comment #8)
> > So while I intended to have the early CUs behave like fully abstract
> > the actual DWARF is different.  I understand that if I emit the early CU as
> > partial unit it becomes abstract?
> 
> Well, that's my theory.
> 
> I've created a minimal dwarf assembler variant corresponding to the C
> test-case (with only var aaa), and I could reproduce the problem, however
> after changing the tag to DW_TAG_partial_unit still a symbol table for that
> partial unit was created. It seems that the inter-cu ref handling code is
> responsible for that. I'll try to fix this.
> 

Filed PR gdb/25798 - "symbol in non-imported PU should not appear in symtabs".

[Bug debug/94469] lto abstract variable emitted as concrete decl (ada test-case)

2020-04-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94469

--- Comment #16 from Tom de Vries  ---
(In reply to Richard Biener from comment #14)
> Using DW_AT_declaration for variables and CUs instead of PUs is IMHO the
> most promising approach then.

I managed to reproduce the "Multiple matches" problem by switching the language
for the dwarf assembly test-case to ada (and using -readnow). And adding the
DW_AT_declaration at the concrete DIE fixed that problem. So yeah, that looks
promising.

[Bug debug/94847] New: -fdebug-types-section drops const in type

2020-04-29 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94847

Bug ID: 94847
   Summary: -fdebug-types-section drops const in type
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case gdb/testsuite/gdb.base/constvars.c (
https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.base/constvars.c;hb=HEAD
).

Compiled with debug info:
...
$ gcc -g constvars.c
...
we have the expected:
...
$ gdb -batch a.out -ex start -ex "ptype victor"
Temporary breakpoint 1 at 0x4004f4: file constvars.c, line 24.

Temporary breakpoint 1, main () at constvars.c:24
24        char lave = 'B';
type = const volatile char
...

However, compiled with -fdebug-types-section in addition:
...
$ gcc -g -fdebug-types-section constvars.c
...
we have instead:
...
$ gdb -batch a.out -ex start -ex "ptype victor"
Temporary breakpoint 1 at 0x4004f4: file constvars.c, line 24.

Temporary breakpoint 1, main () at constvars.c:24
24        char lave = 'B';
type = volatile char
...
So, the 'const' got dropped.

Looking at the debug info, in the first case we have:
...
 <2><64b>: Abbrev Number: 3 (DW_TAG_variable)
<64c>   DW_AT_name: victor
<652>   DW_AT_type: <0x826>
 <1><826>: Abbrev Number: 8 (DW_TAG_const_type)
<827>   DW_AT_type: <0x821>
 <1><821>: Abbrev Number: 9 (DW_TAG_volatile_type)
<822>   DW_AT_type: <0x815>
 <1><815>: Abbrev Number: 10 (DW_TAG_base_type)
<816>   DW_AT_byte_size   : 1
<817>   DW_AT_encoding: 6   (signed char)
<818>   DW_AT_name: char
...
but in the second case we have:
...
 <2><64b>: Abbrev Number: 3 (DW_TAG_variable)
<64c>   DW_AT_name: victor
<652>   DW_AT_type: <0x821>
 <1><821>: Abbrev Number: 9 (DW_TAG_volatile_type)
<822>   DW_AT_type: <0x815>
 <1><815>: Abbrev Number: 10 (DW_TAG_base_type)
<816>   DW_AT_byte_size   : 1
<817>   DW_AT_encoding: 6   (signed char)
<818>   DW_AT_name: char
...
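
To make the difference concrete, here is a small sketch of how a consumer could
derive the printed type by following the DW_AT_type chain. It is an
illustration only, not gdb's implementation; the DIE offsets are the ones from
the dumps above, and the names DIES, QUALIFIERS and type_name are made up.

```python
# DIE offset -> (tag, payload); the payload is the offset of the referenced
# type, or the name for a base type. Offsets are taken from the dumps above.
DIES = {
    0x826: ("DW_TAG_const_type", 0x821),
    0x821: ("DW_TAG_volatile_type", 0x815),
    0x815: ("DW_TAG_base_type", "char"),
}

QUALIFIERS = {
    "DW_TAG_const_type": "const",
    "DW_TAG_volatile_type": "volatile",
}


def type_name(offset, dies=DIES):
    """Follow the type chain from offset and collect the qualifiers in order."""
    words = []
    while True:
        tag, payload = dies[offset]
        if tag == "DW_TAG_base_type":
            words.append(payload)
            return " ".join(words)
        words.append(QUALIFIERS[tag])
        offset = payload


if __name__ == "__main__":
    print("victor -> <0x826>:", type_name(0x826))  # first case
    print("victor -> <0x821>:", type_name(0x821))  # with -fdebug-types-section
```

With 'victor' pointing at <0x826> the chain includes the const DIE; pointing at
<0x821> it skips it, which matches the two ptype results.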

[Bug debug/94847] -fdebug-types-section drops const in type

2020-04-29 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94847

--- Comment #1 from Tom de Vries  ---
Created attachment 48407
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48407&action=edit
Test case

[Bug debug/94875] New: -fdebug-types-section drops DW_AT_object_pointer

2020-04-30 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94875

Bug ID: 94875
   Summary: -fdebug-types-section drops DW_AT_object_pointer
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider the test-case consisting of gdb.cp/derivation.cc and
gdb.cp/derivation2.cc (
https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.cp/derivation.cc;hb=HEAD
and
https://sourceware.org/git/?p=binutils-gdb.git;a=blob_plain;f=gdb/testsuite/gdb.cp/derivation2.cc;hb=HEAD
).

Without -fdebug-types-section:
...
$ g++ derivation.cc derivation2.cc -g
...
we have:
...
 <0>: Abbrev Number: 1 (DW_TAG_compile_unit)
   DW_AT_name: derivation.cc
 <1><161>: Abbrev Number: 11 (DW_TAG_class_type)
<162>   DW_AT_name: A
<164>   DW_AT_byte_size   : 8
<165>   DW_AT_decl_file   : 1
<166>   DW_AT_decl_line   : 33
<167>   DW_AT_sibling : <0x1df>
 <2><16b>: Abbrev Number: 12 (DW_TAG_typedef)
<16c>   DW_AT_name: value_type
<170>   DW_AT_decl_file   : 1
<171>   DW_AT_decl_line   : 35
<172>   DW_AT_type: <0x14a>
<176>   DW_AT_accessibility: 1  (public)
 <2><1a6>: Abbrev Number: 15 (DW_TAG_subprogram)
<1a7>   DW_AT_external: 1
<1a7>   DW_AT_name: afoo
<1ab>   DW_AT_decl_file   : 1
<1ac>   DW_AT_decl_line   : 44
<1ad>   DW_AT_linkage_name: _ZN1A4afooEv
<1b1>   DW_AT_type: <0x16b>
<1b5>   DW_AT_accessibility: 1  (public)
<1b6>   DW_AT_declaration : 1
<1b6>   DW_AT_object_pointer: <0x1be>
<1ba>   DW_AT_sibling : <0x1c4>
 <3><1be>: Abbrev Number: 7 (DW_TAG_formal_parameter)
<1bf>   DW_AT_type: <0x1df>
<1c3>   DW_AT_artificial  : 1
 <1>: Abbrev Number: 48 (DW_TAG_subprogram)
   DW_AT_specification: <0x1a6>
   DW_AT_decl_line   : 198
   DW_AT_object_pointer: <0xd39>
   DW_AT_low_pc  : 0x400640
   DW_AT_high_pc : 0xf
   DW_AT_frame_base  : 1 byte block: 9c(DW_OP_call_frame_cfa)
   DW_AT_object_pointer: <0xd39>
   DW_AT_GNU_all_call_sites: 1
   DW_AT_sibling : <0xd46>
 <2>: Abbrev Number: 47 (DW_TAG_formal_parameter)
   DW_AT_name:  this
   DW_AT_type: <0x1e5>
   DW_AT_artificial  : 1
   DW_AT_location: 2 byte block: 91 68 (DW_OP_fbreg: -24)
...

OTOH with -fdebug-types-section:
...
$ g++ derivation.cc derivation2.cc -g -fdebug-types-section
...
we have:
...
.debug_info:
 <0>: Abbrev Number: 41 (DW_TAG_compile_unit)
   DW_AT_name: derivation.cc
 <1><13e>: Abbrev Number: 45 (DW_TAG_class_type)
<13f>   DW_AT_name: A
<141>   DW_AT_signature   : <0xccc>
<145>   DW_AT_declaration : 1
<145>   DW_AT_sibling : <0x174>
 <2><153>: Abbrev Number: 47 (DW_TAG_subprogram)
<154>   DW_AT_external: 1
<154>   DW_AT_name: afoo
<158>   DW_AT_decl_file   : 1
<159>   DW_AT_decl_line   : 44
<15a>   DW_AT_linkage_name: _ZN1A4afooEv
<15e>   DW_AT_type: <0xcdb>
<162>   DW_AT_accessibility: 1  (public)
<163>   DW_AT_declaration : 1
 <1><8fd>: Abbrev Number: 68 (DW_TAG_subprogram)
<8fe>   DW_AT_specification: <0x153>
<902>   DW_AT_decl_line   : 198
<903>   DW_AT_object_pointer: <0x921>
<907>   DW_AT_low_pc  : 0x400640
<90f>   DW_AT_high_pc : 0xf
<917>   DW_AT_frame_base  : 1 byte block: 9c(DW_OP_call_frame_cfa)
<919>   DW_AT_object_pointer: <0x921>
<91d>   DW_AT_GNU_all_call_sites: 1
<91d>   DW_AT_sibling : <0x92e>
 <2><921>: Abbrev Number: 67 (DW_TAG_formal_parameter)
<922>   DW_AT_name: (indirect string, offset: 0x63c): this
<926>   DW_AT_type: <0x17a>
<92a>   DW_AT_artificial  : 1
<92a>   DW_AT_location: 2 byte block: 91 68 (DW_OP_fbreg: -24)
 <1>: Abbrev Number: 27 (DW_TAG_class_type)
   DW_AT_name: A
   DW_AT_signature   : signature: 0xbb06cf12bfa5e351
   DW_AT_declaration : 1
   DW_AT_sibling : <0xce8>
 <2>: Abbrev Number: 28 (DW_TAG_typedef)
   DW_AT_name: (indirect string, offset: 0x515): value_type
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 35
   DW_AT_type: <0xce8>
 

[Bug debug/94875] -fdebug-types-section drops DW_AT_object_pointer

2020-04-30 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94875

--- Comment #1 from Tom de Vries  ---
Minimal example:
...
$ cat derivation.cc
class A {
public:
  A() {}
  int afoo() { return 1; }
};

A a_instance;

int
main (void)
{
return 0;
}
...

compiled as:
...
$ g++ derivation.cc -g -fdebug-types-section
...

DW_AT_object_pointer is missing here in .debug_info:
...
 <2><109>: Abbrev Number: 11 (DW_TAG_subprogram)
<10a>   DW_AT_external: 1
<10a>   DW_AT_name: (indirect string, offset: 0x20c): afoo
<10e>   DW_AT_decl_file   : 1
<10f>   DW_AT_decl_line   : 4
<110>   DW_AT_linkage_name: (indirect string, offset: 0x285): _ZN1A4afooEv
<114>   DW_AT_type: <0x125>
<118>   DW_AT_accessibility: 1  (public)
<119>   DW_AT_declaration : 1
...
but not here in .debug_types:
...
 <2><47>: Abbrev Number: 5 (DW_TAG_subprogram)
<48>   DW_AT_external: 1
<48>   DW_AT_name: (indirect string, offset: 0x20c): afoo
<4c>   DW_AT_decl_file   : 1
<4d>   DW_AT_decl_line   : 4
<4e>   DW_AT_linkage_name: (indirect string, offset: 0x285): _ZN1A4afooEv
<52>   DW_AT_type: <0x68>
<56>   DW_AT_accessibility: 1   (public)
<57>   DW_AT_declaration : 1
<57>   DW_AT_object_pointer: <0x5b>
...

[Bug debug/94847] -fdebug-types-section drops const in type

2020-04-30 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94847

--- Comment #2 from Tom de Vries  ---
Minimal test-case:
...
$ cat constvars.c
int
main (void)
{
  const char laconic = 'A';
  volatile char vox = 'X';
  const volatile char victor = 'Y';

  return 0;
}
...

Compiled like this:
...
$ gcc -g constvars.c -fdebug-types-section
...

gives the wrong type for 'victor':
...
$ gdb -batch a.out -ex start -ex "ptype victor"
Temporary breakpoint 1 at 0x40049b: file constvars.c, line 4.

Temporary breakpoint 1, main () at constvars.c:4
4 const char laconic = 'A';
type = volatile char
...

[Bug debug/94887] New: -fdebug-types-section drops DW_TAG_formal_parameter and DW_TAG_template_type_param

2020-04-30 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94887

Bug ID: 94887
   Summary: -fdebug-types-section drops DW_TAG_formal_parameter
and DW_TAG_template_type_param
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider this test-case, minimized from gdb testsuite test-case py-xmethods.cc:
...
$ cat py-xmethods.cc
template <typename T>
class G
{
public:
  template <typename T1>
  T mul(const T1 t1) { return t1 * 5; }
};

int
main(void)
{
  G<int> g;
  return g.mul (1.0);
}
...

Without -fdebug-types-section, we have:
...
$ g++ py-xmethods.cc -g
$ gdb -batch a.out -ex start  -ex "ptype g"  
Temporary breakpoint 1 at 0x4004cf: file py-xmethods.cc, line 13.

Temporary breakpoint 1, main () at py-xmethods.cc:13
13        return g.mul (1.0);
type = class G [with T = int] {
  public:
T mul(double);
}
...

But with -fdebug-types-section, we have:
...
$ g++ py-xmethods.cc -g -fdebug-types-section
$ gdb -batch a.out -ex start  -ex "ptype g"
Temporary breakpoint 1 at 0x4004cf: file py-xmethods.cc, line 13.

Temporary breakpoint 1, main () at py-xmethods.cc:13
13        return g.mul (1.0);
type = class G [with T = int] {

}
...

Or with the tentative gdb patch from here (
https://sourceware.org/bugzilla/show_bug.cgi?id=25898#c3 ), we have:
...
$ gdb -batch a.out -ex start  -ex "ptype g"
Temporary breakpoint 1 at 0x4004cf: file py-xmethods.cc, line 13.

Temporary breakpoint 1, main () at py-xmethods.cc:13
13        return g.mul (1.0);
type = class G [with T = int] {
  public:
T mul(void);
}
...
Note the 'void' instead of 'double'.

Without -fdebug-types-section, we have:
...
 <1>: Abbrev Number: 2 (DW_TAG_class_type)
   DW_AT_name: (indirect string, offset: 0x225): G
   DW_AT_byte_size   : 1
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 2
   DW_AT_sibling : <0x12f>
 <2>: Abbrev Number: 3 (DW_TAG_subprogram)
   DW_AT_external: 1
   DW_AT_name: mul
<101>   DW_AT_decl_file   : 1
<102>   DW_AT_decl_line   : 6
<103>   DW_AT_linkage_name: _ZN1GIiE3mulIdEEiT_
<107>   DW_AT_type: <0x12f>
<10b>   DW_AT_accessibility: 1  (public)
<10c>   DW_AT_declaration : 1
<10c>   DW_AT_object_pointer: <0x11c>
<110>   DW_AT_sibling : <0x127>
 <3><114>: Abbrev Number: 4 (DW_TAG_template_type_param)
<115>   DW_AT_name: T1
<118>   DW_AT_type: <0x136>
 <3><11c>: Abbrev Number: 5 (DW_TAG_formal_parameter)
<11d>   DW_AT_type: <0x142>
<121>   DW_AT_artificial  : 1
 <3><121>: Abbrev Number: 6 (DW_TAG_formal_parameter)
<122>   DW_AT_type: <0x136>
 <1><14d>: Abbrev Number: 11 (DW_TAG_subprogram)
<14e>   DW_AT_specification: <0xfc>
<152>   DW_AT_object_pointer: <0x178>
<156>   DW_AT_low_pc  : 0x4004ee
<15e>   DW_AT_high_pc : 0x24
<166>   DW_AT_frame_base  : 1 byte block: 9c(DW_OP_call_frame_cfa)
<168>   DW_AT_object_pointer: <0x178>
<16c>   DW_AT_GNU_all_call_sites: 1
<16c>   DW_AT_sibling : <0x192>
 <2><170>: Abbrev Number: 4 (DW_TAG_template_type_param)
<171>   DW_AT_name: T1
<174>   DW_AT_type: <0x136>
 <2><178>: Abbrev Number: 12 (DW_TAG_formal_parameter)
<179>   DW_AT_name: this
<17d>   DW_AT_type: <0x148>
<181>   DW_AT_artificial  : 1
<181>   DW_AT_location: 2 byte block: 91 68 (DW_OP_fbreg: -24)
 <2><184>: Abbrev Number: 13 (DW_TAG_formal_parameter)
<185>   DW_AT_name: t1
<188>   DW_AT_decl_file   : 1
<189>   DW_AT_decl_line   : 6
<18a>   DW_AT_type: <0x13d>
<18e>   DW_AT_location: 2 byte block: 91 60 (DW_OP_fbreg: -32)
...

But with -fdebug-types-section, we have:
...
 <1><10e>: Abbrev Number: 9 (DW_TAG_subprogram)
<10f>   DW_AT_specification: <0x18e>
<113>   DW_AT_object_pointer: <0x139>
<117>   DW_AT_low_pc  : 0x4004ee
<11f>   DW_AT_high_pc : 0x24
<127>   DW_AT_frame_base  : 1 byte block: 9c(DW_OP_call_frame_cfa)
<129>   DW_AT_object_pointer: <0x139>
<12d>   DW_AT_GNU_all_call_sites: 1
<12d>   DW_AT_sibling : <0x153>
 <2><131>: Abbrev Number: 3 (DW_TAG_template_type_param)
<132>   DW_AT_name: T1

[Bug debug/95360] inconsistent behaviors at -O0

2020-05-27 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95360

--- Comment #3 from Tom de Vries  ---
(In reply to Yibiao Yang from comment #0)
> Breakpoint 1, main () at small.c:5
> 5   for (; d<1; d++)
> (gdb) stepi
> 0x00401154  5   for (; d<1; d++)
> (gdb) stepi
> 0x0040115a  5   for (; d<1; d++)
> (gdb) stepi
> 0x0040115c  5   for (; d<1; d++)
> (gdb) stepi
> 0x0040113b  6     for (; b<1; b++)
> (gdb) stepi
> 0x00401141  6     for (; b<1; b++)
> (gdb) stepi
> 0x00401143  6     for (; b<1; b++)
> (gdb) stepi
> 7   c[b][d+1] = 0;
> (gdb)
> 
> 
> /*
> As showed, Line 6 is hit first and then hit Line 7 with stepi.
> However, when using step, gdb is first hit Line 7 and then hit Line 6.
> This is an inconsistent behaviors between stepi and step
> */

Gdb is behaving consistently in the following sense:
- when gdb is at a "recommended breakpoint location" it shows the source line
  only with line number prefix.
- otherwise, it shows the source line with both address and line number prefix.

So, what the stepi sequence shows is that the next "recommended breakpoint
location" after line 5 is line 7, which is consistent with a step from line 5
to line 7.

[Bug debug/95360] inconsistent behaviors at -O0

2020-05-27 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95360

--- Comment #4 from Tom de Vries  ---
I compiled the test-case:
...
$ gcc-10 -O0 -g small.c
...

And did the stepi scenario:
...
$ gdb a.out -batch -ex start $(for n in $(seq 1 7); do echo -ex si; done)
Temporary breakpoint 1 at 0x400496: file small.c, line 5.

Temporary breakpoint 1, main () at small.c:5
5 for (; d<1; d++)
0x004004e4  5 for (; d<1; d++)
0x004004ea  5 for (; d<1; d++)
0x004004ec  5 for (; d<1; d++)
0x004004cb  6   for (; b<1; b++)
0x004004d1  6   for (; b<1; b++)
0x004004d3  6   for (; b<1; b++)
7 c[b][d+1] = 0;
...

The line table gdb uses is:
...
$ gdb a.out -batch -ex start -ex "maint info line-table"
Temporary breakpoint 1 at 0x400496: file small.c, line 5.

Temporary breakpoint 1, main () at small.c:5
5 for (; d<1; d++)
objfile: a.out ((struct objfile *) 0x2f2f520)
compunit_symtab: ((struct compunit_symtab *) 0x2f618e0)
symtab: small.c ((struct symtab *) 0x2f61960)
linetable: ((struct linetable *) 0x2fa3ae0):
INDEX  LINE   ADDRESS    IS-STMT
0      4      0x00400492 Y
1      5      0x00400496 Y
2      7      0x00400498 Y
3      6      0x004004bc Y
4      5      0x004004d5 Y
5      9      0x004004ee Y
6      10     0x004004f3 Y
7      END    0x004004f5 Y
...

And indeed, the insn at 0x004004cb is not a "recommended breakpoint
location" in this table.

However, if we look in the line number program using readelf -wl we see an
entry with that address:
...
  [0x0145]  Special opcode 215: advance Address by 15 to 0x4004cb and Line
by 0 to 6
...

The whole line number program looks like this:
...
 Line Number Statements:
  [0x0111]  Set column to 12
  [0x0113]  Extended opcode 2: set Address to 0x400492
  [0x011e]  Special opcode 8: advance Address by 0 to 0x400492 and Line by
3 to 4
  [0x011f]  Set column to 3
  [0x0121]  Special opcode 62: advance Address by 4 to 0x400496 and Line by
1 to 5
  [0x0122]  Set column to 11
  [0x0124]  Extended opcode 4: set Discriminator to 2
  [0x0128]  Special opcode 35: advance Address by 2 to 0x400498 and Line by
2 to 7
  [0x0129]  Set column to 13
  [0x012b]  Extended opcode 4: set Discriminator to 2
  [0x012f]  Special opcode 89: advance Address by 6 to 0x40049e and Line by
0 to 7
  [0x0130]  Set column to 17
  [0x0132]  Extended opcode 4: set Discriminator to 2
  [0x0136]  Special opcode 131: advance Address by 9 to 0x4004a7 and Line
by 0 to 7
  [0x0137]  Set column to 18
  [0x0139]  Extended opcode 4: set Discriminator to 2
  [0x013d]  Advance PC by constant 17 to 0x4004b8
  [0x013e]  Special opcode 60: advance Address by 4 to 0x4004bc and Line by
-1 to 6
  [0x013f]  Set column to 13
  [0x0141]  Extended opcode 4: set Discriminator to 1
  [0x0145]  Special opcode 215: advance Address by 15 to 0x4004cb and Line
by 0 to 6
  [0x0146]  Set column to 5
  [0x0148]  Extended opcode 4: set Discriminator to 1
  [0x014c]  Special opcode 89: advance Address by 6 to 0x4004d1 and Line by
0 to 6
  [0x014d]  Set column to 16
  [0x014f]  Special opcode 60: advance Address by 4 to 0x4004d5 and Line by
-1 to 5
  [0x0150]  Set column to 11
  [0x0152]  Extended opcode 4: set Discriminator to 1
  [0x0156]  Special opcode 215: advance Address by 15 to 0x4004e4 and Line
by 0 to 5
  [0x0157]  Set column to 3
  [0x0159]  Extended opcode 4: set Discriminator to 1
  [0x015d]  Special opcode 89: advance Address by 6 to 0x4004ea and Line by
0 to 5
  [0x015e]  Set column to 10
  [0x0160]  Special opcode 65: advance Address by 4 to 0x4004ee and Line by
4 to 9
  [0x0161]  Set column to 1
  [0x0163]  Special opcode 76: advance Address by 5 to 0x4004f3 and Line by
1 to 10
  [0x0164]  Advance PC by 2 to 0x4004f5
  [0x0166]  Extended opcode 1: End of Sequence
...

So, it seems gdb ignores the "recommended breakpoint location" at 0x4004cb,
because there's an earlier one on the same line at 0x4004bc.
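
As a rough illustration, the following sketch models that behaviour. It is not
gdb's code; it only assumes that, within a run of consecutive rows for the same
line, the first row is kept and the rest are dropped. The rows are transcribed
from the readelf dump above, and collapse_same_line is a made-up name.

```python
# (address, line) rows of the line number program above, in program order.
ROWS = [
    (0x400492, 4), (0x400496, 5),
    (0x400498, 7), (0x40049e, 7), (0x4004a7, 7),
    (0x4004bc, 6), (0x4004cb, 6), (0x4004d1, 6),
    (0x4004d5, 5), (0x4004e4, 5), (0x4004ea, 5),
    (0x4004ee, 9), (0x4004f3, 10),
]


def collapse_same_line(rows):
    """Keep only the first row of each run of consecutive rows with the same line."""
    kept = []
    for address, line in rows:
        if kept and kept[-1][1] == line:
            continue  # same line as the previously kept row: dropped
        kept.append((address, line))
    return kept


if __name__ == "__main__":
    for index, (address, line) in enumerate(collapse_same_line(ROWS)):
        print(f"{index}  {line:<4} 0x{address:08x}")
```

On these rows the sketch keeps 0x4004bc for line 6 and drops 0x4004cb, which
gives the same line/address pairs as the "maint info line-table" output above
(minus the END entry).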

The gdb approach is reasonable, but it could do better.

It will be interesting to see how this example is handled by this (
https://sourceware.org/pipermail/gdb-patches/2020-May/168673.html ) gdb patch
series.

[Bug debug/95360] inconsistent behaviors at -O0

2020-05-27 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95360

--- Comment #7 from Tom de Vries  ---
(In reply to Tom de Vries from comment #4)
> So, it seems gdb ignores the "recommended breakpoint location" at 0x4004cb,
> because there's an earlier one on the same line at 0x4004bc.
> 
> The gdb approach is reasonable, but it could do better.
> 

I found how to disable this behaviour in gdb, and then we do step from line 5
to line 6.

Filed gdb PR https://sourceware.org/bugzilla/show_bug.cgi?id=26054 .

[Bug debug/95574] New: line table entry in sequence with address after sequence

2020-06-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

Bug ID: 95574
   Summary: line table entry in sequence with address after
sequence
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

When doing a build from current trunk, I get the following in the .debug_line
section of build/x86_64-pc-linux-gnu/libgcc/libgcc_s.so.1:
...
  [0x06f5]  Special opcode 75: advance Address by 5 to 0x2faa and Line by 0
to 246
  [0x06f6]  Advance PC by 0 to 0x2faa
  [0x06f8]  Extended opcode 1: End of Sequence
...

This is nonsensical dwarf: both the special opcode and the End-Of-Sequence
declare a row in the matrix, each with the same address.

The special opcode declares a target instruction at that address.

The End-of-Sequence declares that the sequence ends before that address.

It's a contradiction that the target instruction is both part of the sequence
(according to the special opcode) and not part of the sequence (according to
End-of-Sequence).

[ Relevant dwarf standard bits:

end_sequence:

A boolean indicating that the current address is that of the first byte
after the end of a sequence of target machine instructions. end_sequence
terminates a sequence of lines; therefore other information in the same
row is not meaningful.

Special Opcodes

Each ubyte special opcode has the following effect on the state machine:
1. Add a signed integer to the line register.
2. Modify the operation pointer by incrementing the address and op_index
registers as described below.
3. Append a row to the matrix using the current values of the state machine
registers.

DW_LNE_end_sequence:

The DW_LNE_end_sequence opcode takes no operands. It sets the
end_sequence register of the state machine to “true” and appends a row
to the matrix using the current values of the state-machine registers.
Then it resets the registers to the initial values specified above (see
Section 6.2.2). Every line number program sequence must end with a
DW_LNE_end_sequence instruction which creates a row whose address is
that of the byte after the last target machine instruction of the sequence.

]
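
A check for this kind of contradiction could look roughly like the sketch
below. It is an illustration under stated assumptions, not code from gdb or
readelf: the Row type and contradictory_rows are invented names, and the row at
0x2fa5 is assumed (only the "advance Address by 5 to 0x2faa" is known from the
dump).

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    address: int
    line: int
    end_sequence: bool = False


def contradictory_rows(rows: List[Row]) -> List[Row]:
    """Return rows that are not before the end_sequence address of their sequence."""
    flagged = []
    in_sequence = []
    for row in rows:
        if row.end_sequence:
            # end_sequence marks the first byte after the sequence, so every
            # other row of the sequence should have a lower address.
            flagged.extend(r for r in in_sequence if r.address >= row.address)
            in_sequence = []
        else:
            in_sequence.append(row)
    return flagged


ROWS = [
    Row(0x2fa5, 246),  # assumed earlier row, for illustration only
    Row(0x2faa, 246),  # from the special opcode
    Row(0x2faa, 246, end_sequence=True),  # from End of Sequence
]

if __name__ == "__main__":
    for row in contradictory_rows(ROWS):
        print(f"row at 0x{row.address:x} (line {row.line}) is not before end_sequence")
```

For the rows above this flags the entry at 0x2faa created by the special
opcode.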

[Bug debug/95574] line table entry in sequence with address after sequence

2020-06-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

Tom de Vries  changed:

   What|Removed |Added

   Keywords||wrong-debug

--- Comment #1 from Tom de Vries  ---
I can track this down to build/x86_64-pc-linux-gnu/libgcc/_absvsi2_s.o
...
  [0x0116]  Special opcode 75: advance Address by 5 to 0xa and Line by 0 to
246
  [0x0117]  Advance PC by 0 to 0xa
  [0x0119]  Extended opcode 1: End of Sequence
...
and using -save-temps I get the corresponding .s file:
...
.section        .text.unlikely
.cfi_startproc
.type   __absvsi2.cold, @function
__absvsi2.cold:
.LFSB26:
.L12:
.cfi_def_cfa_offset 16
.loc 1 246 5 is_stmt 1 view .LVU17
call    abort@PLT
.LVL7:
.loc 1 246 5 is_stmt 0 view .LVU33
.cfi_endproc
.LFE26:
...

I think this is due to the last .loc.  I'm not sure if it makes sense to
declare a .loc after a non-returning insn.

[Bug debug/95574] line table entry in sequence with address after sequence

2020-06-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

--- Comment #2 from Tom de Vries  ---
Created attachment 48702
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48702&action=edit
_absvsi2_s.c

To reproduce:
...
$ gcc -O2 -g -fpic -mlong-double-80 -fcf-protection -mshstk -fbuilding-libgcc
-fno-stack-protector -o _absvsi2_s.o -c _absvsi2_s.c -save-temps
...

[Bug debug/95574] line table entry in sequence with address after sequence

2020-06-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

--- Comment #3 from Tom de Vries  ---
Gdb currently silently ignores the offending entry.

I've filed a gdb PR, PR26092 - "Complain about contradictory
DW_LNE_end_sequence marker" to complain about this (
https://sourceware.org/bugzilla/show_bug.cgi?id=26092 ).

[Bug debug/95601] New: Remove workaround for GDB PR in pass_partition_blocks::gate

2020-06-09 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95601

Bug ID: 95601
   Summary: Remove workaround for GDB PR in
pass_partition_blocks::gate
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Current GCC contains this workaround in bb-reorder.c:
...
  /* Workaround a bug in GDB where read_partial_die doesn't cope
     with DIEs with DW_AT_ranges, see PR81115.  */
  && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
...

The PR81115 is a typo, it actually refers to gcc PR 81155, as we can see from
the log message:
...
commit 8f72ce2cec8b8961f381995eb6e2c5de1cd0f3d3
Author: Jakub Jelinek 
Date:   Fri Jan 12 19:20:49 2018 +0100

re PR debug/81155 (Debug make check regressions in GCC 8.0)

PR debug/81155
* bb-reorder.c (pass_partition_blocks::gate): In lto don't
partition
main to workaround a bug in GDB.

From-SVN: r256592
...

The PR81155 discusses a GDB bug, but no GDB counterpart was filed, so I've
filed https://sourceware.org/bugzilla/show_bug.cgi?id=26095 for this purpose.

The GDB PR may be a duplicate of
https://sourceware.org/bugzilla/show_bug.cgi?id=23331 , in which case this
should be fixed from gdb release 9.1 onwards.

So, on the GCC side we need to test whether the workaround is indeed no longer
required for a recent gdb, and if so, either remove the workaround, or add a
comment to the workaround stating from which gdb release onwards it is no
longer required.

[Bug debug/81155] [8 Regression] Debug make check regressions in GCC 8.0

2020-06-09 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81155

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #14 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #13)
> The GDB missing feature/bug worked around for GCC 8, hopefully GDB will be
> fixed soon and we can remove the workaround at some point.

Filed a GDB PR to track the worked-around issue in GDB:
- https://sourceware.org/bugzilla/show_bug.cgi?id=26095

Filed a GCC PR to test whether the workaround is still required:
- PR95601 - "Remove workaround for GDB PR in pass_partition_blocks::gate"

[Bug target/96005] New: Add possibility to use newer ptx isa

2020-06-30 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96005

Bug ID: 96005
   Summary: Add possibility to use newer ptx isa
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Currently, we're at ptx isa v3.1:
...
static void
nvptx_file_start (void)
{
  fputs ("// BEGIN PREAMBLE\n", asm_out_file);
  fputs ("\t.version\t3.1\n", asm_out_file);
...

Using a newer isa could give some benefits, for instance starting with PTX ISA
6.3 we have atom.cas.b16.

[Bug target/90932] [nvptx] internal compiler error: in tree_to_shwi, at tree.c

2020-07-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90932

Tom de Vries  changed:

   What|Removed |Added

 CC||roger at nextmovesoftware dot com

--- Comment #4 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549100.html

[Bug target/90932] [nvptx] internal compiler error: in tree_to_shwi, at tree.c

2020-07-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90932

Tom de Vries  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Tom de Vries  ---
Patch committed, marking resolved-fixed.

[Bug debug/95574] line table entry in sequence with address after sequence

2020-07-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

--- Comment #4 from Tom de Vries  ---
(In reply to Tom de Vries from comment #2)
> Created attachment 48702 [details]
> _absvsi2_s.c
> 
> To reproduce:
> ...
> $ gcc -O2 -g -fpic -mlong-double-80 -fcf-protection -mshstk
> -fbuilding-libgcc -fno-stack-protector -o _absvsi2_s.o -c _absvsi2_s.c
> -save-temps
> ...

Minimal example test.c:
...
#include <stdlib.h>

int
foo (int a)
{
  int w = a;

  if (a < 0)
    w = -(unsigned int) a;

  if (w < 0)
    abort ();

   return w;
}
...

Compiled like this:
...
$ gcc test.c -O2 -g -c -fno-reorder-blocks-and-partition
...

resulting in test.s:
...
.file   "test.c"
.text
.Ltext0:
.p2align 4
.globl  foo
.type   foo, @function
foo:
.LVL0:
.LFB13:
.file 1 "test.c"
.loc 1 5 1 view -0
.cfi_startproc
.loc 1 6 3 view .LVU1
.loc 1 8 3 view .LVU2
.loc 1 5 1 is_stmt 0 view .LVU3
movl    %edi, %eax
.loc 1 8 6 view .LVU4
testl   %edi, %edi
js  .L7
.LVL1:
.L2:
.loc 1 14 4 is_stmt 1 view .LVU5
.loc 1 15 1 is_stmt 0 view .LVU6
ret
.LVL2:
.p2align 4,,10
.p2align 3
.L7:
.loc 1 9 5 is_stmt 1 view .LVU7
.loc 1 9 9 is_stmt 0 view .LVU8
negl%eax
.LVL3:
.loc 1 11 3 is_stmt 1 view .LVU9
.loc 1 11 6 is_stmt 0 view .LVU10
testl   %eax, %eax
jns .L2
.loc 1 12 5 is_stmt 1 view .LVU11
.LVL4:
.loc 1 5 1 is_stmt 0 view .LVU12
pushq   %rax
.cfi_def_cfa_offset 16
.loc 1 12 5 view .LVU13
callabort
.LVL5:
.loc 1 12 5 view .LVU14
.cfi_endproc
.LFE13:
.size   foo, .-foo
.Letext0:
...

[Bug debug/95574] line table entry in sequence with address after sequence

2020-07-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

--- Comment #5 from Tom de Vries  ---
This seems to be var-track related.

Before var-track we have:
...
(debug_insn 23 41 24 5 (debug_marker) "test2.c":12:5 -1
 (nil))
(call_insn 24 23 25 5 (call (mem:QI (symbol_ref:DI ("abort") [flags 0x41] 
) [0 __builtin_abort S1 A8])
(const_int 0 [0])) "test2.c":12:5 795 {*call}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("abort") [flags 0x41] 
)
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(expr_list:REG_NORETURN (const_int 0 [0])
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)
(nil))
(barrier 25 24 39)
(note 39 25 0 NOTE_INSN_DELETED)
...
and after:
...
(call_insn:TI 24 41 80 5 (call (mem:QI (symbol_ref:DI ("abort") [flags 0x41] 
) [0 __builtin_abort S1 A8])
(const_int 0 [0])) "test2.c":12:5 666 {*call}
 (expr_list:REG_CALL_ARG_LOCATION (nil)
(expr_list:REG_CALL_DECL (symbol_ref:DI ("abort") [flags 0x41] 
)
(expr_list:REG_ARGS_SIZE (const_int 0 [0])
(expr_list:REG_NORETURN (const_int 0 [0])
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(nil))
(note/c 80 24 79 (var_location a (entry_value:SI (reg:SI 5 di [ a ])))
NOTE_INSN_VAR_LOCATION)
(note/c 79 80 25 (var_location w (neg:SI (entry_value:SI (reg:SI 5 di [ a ]
NOTE_INSN_VAR_LOCATION)
(barrier 25 79 39)
(note 39 25 0 NOTE_INSN_DELETED)
...

[Bug debug/95574] line table entry in sequence with address after sequence

2020-07-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

--- Comment #6 from Tom de Vries  ---
A simple way of fixing this is:
...
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 899a5c0290d..4b143f6702b 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -6635,7 +6635,7 @@ add_with_sets (rtx_insn *insn, struct cselib_set *sets, int n_sets)
std::swap (mos[n1], mos[n2]);
 }

-  if (CALL_P (insn))
+  if (CALL_P (insn) && ! find_reg_note (insn, REG_NORETURN, NULL))
 {
   micro_operation mo;

...

after which we have:
...
.loc 1 12 5 view .LVU13
call    abort
.cfi_endproc
...

[Bug debug/95574] line table entry in sequence with address after sequence

2020-07-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95574

--- Comment #7 from Tom de Vries  ---
A bit more subtle:
...
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 899a5c0290d..f94eb38f797 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -8880,6 +8880,10 @@ emit_note_insn_var_location (variable **varp, emit_note_data *data)

   if (where != EMIT_NOTE_BEFORE_INSN)
 {
+  if (CALL_P (insn) && where == EMIT_NOTE_AFTER_CALL_INSN
+ && find_reg_note (insn, REG_NORETURN, NULL))
+   goto done;
+
   note = emit_note_after (NOTE_INSN_VAR_LOCATION, insn);
   if (where == EMIT_NOTE_AFTER_CALL_INSN)
NOTE_DURING_CALL_P (note) = true;
@@ -8901,6 +8905,7 @@ emit_note_insn_var_location (variable **varp, emit_note_data *data)
 }
   NOTE_VAR_LOCATION (note) = note_vl;

+ done:
   set_dv_changed (var->dv, false);
   gcc_assert (var->in_changed_variables);
   var->in_changed_variables = false;
...
which gets us:
...
.loc 1 12 5 view .LVU13
call    abort
.LVL5:
.cfi_endproc
...

[Bug tree-optimization/96295] New: Wmaybe-uninitialized warning for range operator with empty range struct

2020-07-23 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96295

Bug ID: 96295
   Summary: Wmaybe-uninitialized warning for range operator with
empty range struct
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case tui-winsource.c (minimized from gdb sources, filed as gdb PR
https://sourceware.org/bugzilla/show_bug.cgi?id=26282):
...
struct tui_source_window_iterator
{
public:
  typedef tui_source_window_iterator self_type;
  typedef void *value_type;

  explicit tui_source_window_iterator (void *it, void *end) {}

  explicit tui_source_window_iterator (void *it) {}

  bool operator!= (const self_type &other) const { return false; }

  value_type operator* () const { return (value_type)0; }

  self_type &operator++ () { return *this; }
};

struct tui_source_windows
{
  tui_source_window_iterator begin () const
  {
    return tui_source_window_iterator ((void *)0, (void *)0);
  }

  tui_source_window_iterator end () const
  {
    return tui_source_window_iterator ((void*)0);
  }
};

void
foo (void)
{
  for (void *win : tui_source_windows ())
    {
      (void)win;
    }
}
...

With gcc-10, we have:
...
$ g++-10 -x c++  -Wall -O0 -g -c tui-winsource.c
...

And with gcc-11 (g++-11 (SUSE Linux) 11.0.0 20200720 (experimental) [revision
8764e9a3fc43f1117db77d1f056b6c3f15a29db3]):
...
$ g++-11 -x c++  -Wall -O0 -g -c tui-winsource.c
tui-winsource.c: In function ‘void foo()’:
tui-winsource.c:34:40: warning: ‘<anonymous>’ may be used uninitialized
[-Wmaybe-uninitialized]
   34 |   for (void *win : tui_source_windows ())
  |^
tui-winsource.c:20:30: note: by argument 1 of type ‘const tui_source_windows*’
to ‘tui_source_window_iterator tui_source_windows::begin() const’ declared here
   20 |   tui_source_window_iterator begin () const
  |  ^
tui-winsource.c:34:40: note: ‘<anonymous>’ declared here
   34 |   for (void *win : tui_source_windows ())
  |^
...

At gimple, we have:
...
struct tui_source_windows & __for_range;
struct tui_source_windows D.2465;

try
  {
__for_range = &D.2465;
tui_source_windows::begin (__for_range);
tui_source_windows::end (__for_range);
...

So, strictly speaking the warning is correct: D.2465 is never initialized, and
consequently the object that __for_range refers to is uninitialized when it is
passed as argument to tui_source_windows::begin.

But it shouldn't matter, because struct tui_source_windows is an empty struct.

Workaround is to add:
...

 struct tui_source_windows
 {
+  tui_source_windows () {}

   tui_source_window_iterator begin () const
...
which gives us:
...
struct tui_source_windows & __for_range;
struct tui_source_windows D.2469;

try
  {
tui_source_windows::tui_source_windows (&D.2469);
__for_range = &D.2469;
tui_source_windows::begin (__for_range);
tui_source_windows::end (__for_range);
...

[Bug other/96296] New: libiberty/dyn-string.c:280:3: warning: ‘strncpy’ output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation]

2020-07-23 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96296

Bug ID: 96296
   Summary: libiberty/dyn-string.c:280:3: warning: ‘strncpy’
output truncated before terminating nul copying as
many bytes from a string as its length
[-Wstringop-truncation]
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ Reported and discussed earlier here:
https://gcc.gnu.org/legacy-ml/gcc/2019-03/msg00184.html ]

I ran into this warning in dyn-string.c:
...
$ gcc-11 src/libiberty/dyn-string.c -I src/include/ -c -DHAVE_STRING_H
-DHAVE_STDLIB_H  -Wall -O2
src/libiberty/dyn-string.c: In function ‘dyn_string_insert_cstr’:
src/libiberty/dyn-string.c:280:3: warning: ‘strncpy’ output truncated before
terminating nul copying as many bytes from a string as its length
[-Wstringop-truncation]
  280 |   strncpy (dest->s + pos, src, length);
  |   ^~~~
src/libiberty/dyn-string.c:272:16: note: length computed here
  272 |   int length = strlen (src);
  |^~~~
...

As mentioned here ( https://gcc.gnu.org/legacy-ml/gcc/2019-03/msg00199.html ):
"Using memcpy instead of strncpy would avoid the warning".

Tentative untested patch fixes the warning:
...
diff --git a/libiberty/dyn-string.c b/libiberty/dyn-string.c
index e10f691181a..bf155effb5f 100644
--- a/libiberty/dyn-string.c
+++ b/libiberty/dyn-string.c
@@ -277,7 +277,7 @@ dyn_string_insert_cstr (dyn_string_t dest, int pos, const char *src)
   for (i = dest->length; i >= pos; --i)
 dest->s[i + length] = dest->s[i];
   /* Splice in the new stuff.  */
-  strncpy (dest->s + pos, src, length);
+  memcpy (dest->s + pos, src, length);
   /* Compute the new length.  */
   dest->length += length;
   return 1;
...
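As a standalone sketch of the same idea (not the libiberty code itself): when the length has already been computed with strlen and no terminating NUL is wanted at the splice point, memcpy states the intent directly. The function name insert_cstr and its parameters are hypothetical, and the caller is assumed to guarantee pos <= len and enough room in buf.

```c
#include <string.h>

/* Insert the NUL-terminated string SRC into BUF at offset POS.
   BUF holds a NUL-terminated string of length LEN and must have room
   for LEN + strlen (SRC) + 1 bytes; POS must not exceed LEN.
   Returns the new length.  insert_cstr is a hypothetical helper.  */
size_t
insert_cstr (char *buf, size_t len, size_t pos, const char *src)
{
  size_t n = strlen (src);

  /* Open up space by moving the tail, including its terminating NUL.  */
  memmove (buf + pos + n, buf + pos, len - pos + 1);

  /* Splice in exactly N bytes; no NUL is copied, on purpose.  */
  memcpy (buf + pos, src, n);

  return len + n;
}
```

With a 16-byte buffer holding "held", insert_cstr (buf, 4, 2, "XY") leaves "heXYld" in the buffer and returns 6.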

[Bug target/35488] A incorrect result in a simple division, only in 32-bit gcc.

2020-07-29 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35488

Tom de Vries  changed:

   What|Removed |Added

 CC||vries at gcc dot gnu.org

--- Comment #11 from Tom de Vries  ---
FTR, this PR is linked to from here (
https://lemire.me/blog/2020/06/26/gcc-not-nearest/ ).

[Bug target/96371] New: [nvptx] frounding-math support

2020-07-29 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96371

Bug ID: 96371
   Summary: [nvptx] frounding-math support
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Floating-point ops like f.i. div:
...
(define_insn "div<mode>3"
  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
(div:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
  ""
  "%.\\tdiv%#%t0\\t%0, %1, %2;")
...
contain a '%#' directive which, according to nvptx_print_operand, has the
semantics:
...
   # -- print a rounding mode for the instruction  
  ...
but which is hardcoded to .rn (round to nearest):
...
  else if (code == '#')
{
  fputs (".rn", file);
  return;
}
...

According to this ( https://gcc.gnu.org/wiki/FloatingPointMath ), round to
nearest is the rounding mode for div by default, but when -frounding-math is
specified, that can no longer be assumed.

The way this normally works is that a cpu has a status register describing the
current state of rounding mode.  By specifying -frounding-math, we make sure
the compiler makes no assumptions about rounding mode, such that the status
register will take effect at runtime. And at runtime, we use a libc function
from fenv.h to manipulate the status register.

Nvptx has no such status register.

Newlib has fenv.h support since version 3.2.0 (Jan 2020), but the nvptx port
has no implementation.  It could add one, implementing a fake status register
(perhaps there is another architecture that has something similar), which could
then be tested in the assembly for div<mode>3 to determine whether to execute
div.rn, div.rz, div.rm or div.rp.

The standalone implementation only supports scalar execution, so we only need a
scalar status register, but in the offloading and parallel context, each thread
may have set a different rounding mode, so we'll need thread-specific status
registers.  Perhaps that's too expensive, and we'll have to limit fesetround to
using constants (which I guess will be the case anyway for typical numerical
code).

Anyway, in the absence of all this, without fenv.h support there's no way to
set the rounding mode, meaning that we can assume the default rounding mode, as
the current implementation of "div<mode>3" does.  OTOH, we don't take that
assumption further; f.i., we don't ignore -frounding-math.

It would be nice if we'd warn about making the assumption when emitting a div
with .rn hardcoded and frounding-math, something like:
...
Assuming fenv.h not supported, so using default rounding mode for float op.
...

Or, we could just error out when specifying frounding-math, or when
encountering a float op with frounding-math or some such.
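
To make the "fake status register" idea a bit more concrete, here is a host-side sketch. Everything in it is an assumption for illustration: the enum values, fake_fesetround and ptx_round_suffix are hypothetical names, and a real newlib implementation would use the FE_* macros from fenv.h. It only models the selection between the .rn, .rz, .rm and .rp modifiers, with one mode per thread.

```c
/* Hypothetical rounding-mode encoding; a real fenv.h would provide
   FE_TONEAREST and friends instead.  */
enum fake_round_mode
{
  FAKE_TONEAREST,
  FAKE_TOWARDZERO,
  FAKE_DOWNWARD,
  FAKE_UPWARD
};

/* The "fake status register": one copy per thread.  */
static _Thread_local enum fake_round_mode fake_round = FAKE_TONEAREST;

/* Stand-in for fesetround: returns 0 on success, nonzero for an
   unknown mode (in which case the current mode is kept).  */
int
fake_fesetround (int mode)
{
  if (mode < FAKE_TONEAREST || mode > FAKE_UPWARD)
    return 1;
  fake_round = (enum fake_round_mode) mode;
  return 0;
}

/* The PTX rounding modifier corresponding to the current mode.  */
const char *
ptx_round_suffix (void)
{
  switch (fake_round)
    {
    case FAKE_TOWARDZERO: return ".rz";
    case FAKE_DOWNWARD:   return ".rm";
    case FAKE_UPWARD:     return ".rp";
    default:              return ".rn";
    }
}
```

A per-thread variable like this is what makes the general case potentially expensive; restricting fesetround to constant arguments would let the compiler pick the modifier statically instead.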

[Bug target/90928] [9/10/11 Regression] [nvptx] internal compiler error: in instantiate_virtual_regs_in_insn, at function.c:1737

2020-07-30 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90928

--- Comment #4 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550140.html

[Bug target/96401] New: [nvptx] Take advantage of subword ld/st/cvt

2020-07-31 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401

Bug ID: 96401
   Summary: [nvptx] Take advantage of subword ld/st/cvt
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case test.c:
...
void
foo (void)
{
  volatile unsigned int v;
  volatile unsigned short v2;
  v2 = v;
}
...

With the current compiler, we have:
...
$ gcc test.c -S -o- -O2
  ...
.reg.u32 %r22;
.reg.u16 %r24;
ld.u32  %r22, [%frame];
cvt.u16.u32 %r24, %r22;
st.u16  [%frame+4], %r24;
}
...

As it happens, the PTX ISA manual states in section 5.2.2, "Restricted Use of
Sub-Word Sizes":
...
For convenience, ld, st, and cvt instructions permit source and destination
data operands to be wider than the instruction-type size, so that narrow values
may be loaded, stored, and converted using regular-width registers. For
example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit
registers when being loaded, stored, or converted to other types and sizes.
...

In other words, we may emit instead:
...
.reg.u32 %r22;
ld.u32  %r22, [%frame];
st.u16  [%frame+4], %r22;
...
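
The claim that the cvt is redundant rests on st.u16 only looking at the low 16 bits of a wider source register. A host-side model of that equivalence, as a sketch only (store_via_cvt and store_direct are hypothetical names, and memory is modelled as a byte buffer):

```c
#include <stdint.h>
#include <string.h>

/* Explicit form: truncate to 16 bits first, then store
   (models cvt.u16.u32 followed by st.u16).  */
void
store_via_cvt (unsigned char *mem, uint32_t reg)
{
  uint16_t narrow = (uint16_t) reg;
  memcpy (mem, &narrow, sizeof narrow);
}

/* Direct form: a 16-bit store that takes its data straight from a
   32-bit register and ignores the upper half (models st.u16 with a
   .u32 register operand).  */
void
store_direct (unsigned char *mem, uint32_t reg)
{
  uint16_t low = reg & 0xffff;
  memcpy (mem, &low, sizeof low);
}
```

Both functions write the same two bytes for any input, which is why the intermediate 16-bit register can be dropped.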

[Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt

2020-07-31 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> In other words, we may emit instead:
> ...
> .reg.u32 %r22;
> ld.u32  %r22, [%frame];
> st.u16  [%frame+4], %r22;
> ...

So, why don't we?

Using -dP we see the respective insns:
...
//(insn 5 2 6 2
//(set (reg:SI 22 [ v$0_1 ])
// (mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
// "test.c":7:6 6 {*movsi_insn}
// (nil))
ld.u32  %r22, [%frame]; // 5[c=4]  *movsi_insn/1

//(insn 6 5 9 2
//(set (reg:HI 24 [ v$0_1 ])
// (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
// "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:SI 22 [ v$0_1 ])
// (nil)))
cvt.u16.u32 %r24, %r22; // 6[c=12]  *movhi_insn/0

//(insn 9 6 12 2
//(set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
// (const_int 4 [0x4])) [2 v2+0 S2 A32])
// (reg:HI 24 [ v$0_1 ]))
// "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
// (nil)))
st.u16  [%frame+4], %r24;   // 9[c=4]  *movhi_insn/2
...

I investigated why combine doesn't combine insns 6 and 9, that is, why it
doesn't generate:
...
//(insn 9 6 12 2
//(set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
// (const_int 4 [0x4])) [2 v2+0 S2 A32])
// (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
// "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:HI 22 [ v$0_1 ])
// (nil)))
...

One of the required changes is to make the movhi_insn store alternative accept
a subreg source operand:
...
@@ -229,8 +234,8 @@

 (define_insn "*mov<mode>_insn"
   [(set (match_operand:QHSDIM 0 "nonimmediate_operand" "=R,R,m")
-   (match_operand:QHSDIM 1 "general_operand" "Ri,m,R"))]
-  "!MEM_P (operands[0]) || REG_P (operands[1])"
+   (match_operand:QHSDIM 1 "general_operand" "Ri,m,Q"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1]) || SUBREG_P (operands[1])"
 {
   if (which_alternative == 1)
 return "%.\\tld%A1%u1\\t%0, %1;";
...
which required me to define:
...
+(define_constraint "Q"
+  "A pseudo register or subreg."
+  (ior (match_code "reg")
+  (match_code "subreg")))
+
...
[ Note that this constraint is an oddity, like the R constraint: it's not a
register constraint. ]

After debugging I found that I needed this as well:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d2f321fcbcc..2234edad53b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6444,7 +6444,7 @@ nvptx_data_alignment (const_tree type, unsigned int basic_align)
 static bool
 nvptx_modes_tieable_p (machine_mode, machine_mode)
 {
-  return false;
+  return true;
 }

 /* Implement TARGET_HARD_REGNO_NREGS.  */
...
due to this bit in combine.c:subst():
...
  /* In general, don't install a subreg involving two   
 modes not tieable.  It can worsen register 
 allocation, and can even make invalid reload   
 insns, since the reg inside may need to be copied  
 from in the outside mode, and that may be invalid  
 if it is an fp reg copied in integer mode. 
   
  ...

Using these changes, I get the desired:
...
.reg.u32 %r22;
ld.u32  %r22, [%frame];
st.u16  [%frame+4], %r22;
...

[Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt

2020-07-31 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #1)
> Using these changes, I get the desired:
> ...
> .reg.u32 %r22;
> ld.u32  %r22, [%frame];
> st.u16  [%frame+4], %r22;
> ...

To be precise, it is starting at fwprop1 that we have the two insns:
...
(insn 5 2 9 2
(set (reg:SI 22 [ v$0_1 ])
 (mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
"test.c":7:6 6 {*movsi_insn}
(nil))
(insn 9 5 0 2
(set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
  (const_int 4 [0x4])) [2 v2+0 S2 A32])
 (subreg:HI (reg:SI 22 [ v$0_1 ]) 0))
"test.c":7:6 5 {*movhi_insn}
(expr_list:REG_DEAD (reg:SI 23 [ _2 ])
(nil)))
...

Which is a bit earlier (at 247r) than combine (at 271r).

[Bug target/96401] [nvptx] Take advantage of subword ld/st/cvt

2020-07-31 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96401

--- Comment #3 from Tom de Vries  ---
Note that with the proposed TARGET_TRULY_NOOP_TRUNCATION -> false change (
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549896.html ), we start out
with the same ptx insns, but with the cvt.u16.u32 a truncate instead of a
subreg move:
...
//(insn 5 2 6 2
//(set (reg:SI 22 [ v$0_1 ])
// (mem/v/c:SI (reg/f:DI 2 %frame) [1 v+0 S4 A128]))
// "test.c":7:6 6 {*movsi_insn}
// (nil))
ld.u32  %r22, [%frame]; // 5[c=4]  *movsi_insn/1

//(insn 6 5 9 2
//(set (reg:HI 24 [ v$0_1 ])
// (truncate:HI (reg:SI 22 [ v$0_1 ])))
// "test.c":7:6 30 {truncsihi2}
// (expr_list:REG_DEAD (reg:SI 22 [ v$0_1 ])
// (nil)))
cvt.u16.u32 %r24, %r22; // 6[c=4]  truncsihi2/0

//(insn 9 6 12 2
//(set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
//  (const_int 4 [0x4])) [2 v2+0 S2 A32])
// (reg:HI 24 [ v$0_1 ])) "test.c":7:6 5 {*movhi_insn}
// (expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
//(nil)))
st.u16  [%frame+4], %r24;   // 9[c=4]  *movhi_insn/2
...

Still, with the changes in comment 1 enabled we end up with the desired two
insns, though a bit later, at cse2 (265r), and not using movhi_insn:
...
(insn 9 5 0 2 (set (mem/v/c:HI (plus:DI (reg/f:DI 2 %frame)
(const_int 4 [0x4])) [2 v2+0 S2 A32])
(truncate:HI (reg:SI 22 [ v$0_1 ]))) "test.c":7:6 30 {truncsihi2}
 (expr_list:REG_DEAD (reg:HI 24 [ v$0_1 ])
(nil)))
...
so we might get this just with the nvptx_modes_tieable_p change.

[Bug target/96403] New: [nvptx] Less optimal code in v2si-cvt.c after setting TARGET_TRULY_NOOP_TRUNCATION to false

2020-07-31 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96403

Bug ID: 96403
   Summary: [nvptx] Less optimal code in v2si-cvt.c after setting
TARGET_TRULY_NOOP_TRUNCATION to false
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ I've rewritten the v2si-cvt.c source to something more minimal:
...
__v2si __attribute__((unused))
vector_cvt (__v2si arg)
{
  unsigned short *p = (unsigned short*)&arg;

  volatile unsigned short s = p[0];

  return arg;
}

__v2si __attribute__((unused))
vector_cvt_2 (__v2si arg)
{
  unsigned char *p = (unsigned char*)&arg;

  volatile unsigned char s = p[0];

  return arg;
}
...
]

When changing TARGET_TRULY_NOOP_TRUNCATION to false, we have a regression in
v2si-cvt.c, this for vector_cvt:
...
-   cvt.u16.u32 %r27, %r25.x;
+   mov.b64 %r26, %r25;
+   cvt.u16.u64 %r27, %r26;
...
and this for vector_cvt_2:
...
-   cvt.u32.u32 %r27, %r25.x;
-   st.u8   [%frame], %r27;
+   mov.b64 %r26, %r25;
+   cvt.u32.u64 %r27, %r26;
+   cvt.u16.u8  %r32, %r27;
+   mov.u16 %r29, %r32;
+   cvt.u32.u16 %r30, %r29;
+   st.u8   [%frame], %r30;
...

[Bug target/96403] [nvptx] Less optimal code in v2si-cvt.c after setting TARGET_TRULY_NOOP_TRUNCATION to false

2020-07-31 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96403

Tom de Vries  changed:

   What|Removed |Added

 Target||nvptx

--- Comment #1 from Tom de Vries  ---
Looking at the first regression, we have without the patch:
...
//(insn 9 5 12 2
//(set (reg:HI 27 [ arg ])
// (subreg:HI (reg/v:V2SI 25 [ arg ]) 0))
// "v2si-cvt.c":11:32 5 {*movhi_insn}
// (nil))
cvt.u16.u32 %r27, %r25.x; // 9 [c=12] *movhi_insn/0
...
and with the patch:
...
//(insn 8 5 9 2 
//(set (reg:DI 26 [ arg ])
// (subreg:DI (reg/v:V2SI 25 [ arg ]) 0))
// "v2si-cvt.c":11:32 7 {*movdi_insn}
// (nil))
mov.b64 %r26, %r25; // 8[c=12]  *movdi_insn/0

//(insn 9 8 13 2
//(set (reg:HI 27 [ arg ])
// (truncate:HI (reg:DI 26 [ arg ])))
// "v2si-cvt.c":11:32 32 {truncdihi2}
// (expr_list:REG_DEAD (reg:DI 26 [ arg ])
//(nil)))
   cvt.u16.u64 %r27, %r26;
...

I guess we would like to generate this instead:
...
//(insn 9 8 13 2
//(set (reg:HI 27 [ arg ])
// (truncate:HI (subreg:SI (reg/v:V2SI 25 [ arg ]) 0))
// "v2si-cvt.c":11:32 32 {truncdihi2}
// (expr_list:REG_DEAD (reg:DI 26 [ arg ])
//(nil)))
   cvt.u16.u32 %r26, %r25.x;
...

Debugging combine, we hit TARGET_MODES_TIEABLE_P as a barrier, but after
enabling that we have a slightly different insn (the store has merged with the
truncate), where combine also fails:
...
Trying 8 -> 13:
8: r26:DI=r25:V2SI#0
   13: [%frame:DI]=trunc(r26:DI)
  REG_DEAD r26:DI
Failed to match this instruction:
(set (mem/v/c:HI (reg/f:DI 2 %frame) [2 s+0 S2 A128])
(truncate:HI (subreg:DI (reg/v:V2SI 25 [ arg ]) 0)))
...
I've tried enabling subregs in truncsi but that didn't help either.

I managed to get the desired code using this (to match the pattern tried by
combine):
...
@@ -372,11 +386,26 @@

 (define_insn "truncdi<mode>2"
   [(set (match_operand:QHSIM 0 "nvptx_nonimmediate_operand" "=R,m")
-   (truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))]
+   (truncate:QHSIM (match_operand:DI 1 "register_operand" "R,Q")))]
   ""
-  "@
-   %.\\tcvt%t0.u64\\t%0, %1;
-   %.\\tst%A0.u%T0\\t%0, %1;"
+{
+if (which_alternative == 0)
+  {
+if (SUBREG_P (operands[1])
+   && GET_MODE (SUBREG_REG (operands[1])) == V2SImode)
+  return "%.\\tcvt%t0.u32\\t%0, %1.x;";
+else
+  return "%.\\tcvt%t0.u64\\t%0, %1;";
+  }
+else
+  {
+if (SUBREG_P (operands[1])
+   && GET_MODE (SUBREG_REG (operands[1])) == V2SImode)
+  return "   %.\\tst%A0.u%T0\\t%0, %1.x;";
+else
+  return "   %.\\tst%A0.u%T0\\t%0, %1;";
+  }
+}
   [(set_attr "subregs_ok" "true")])

 ;; Integer arithmetic
...
But I would hope there's a cleaner way.

[Bug target/96428] [nvptx] nvptx_gen_shuffle does not handle V2DI mode – Fails with an ICE

2020-08-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96428

--- Comment #1 from Tom de Vries  ---
Tentative patch:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d8a8fb2d55b..cf53a921e5b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1796,6 +1796,44 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
end_sequence ();
   }
   break;
+case E_V2SImode:
+  {
+   rtx src0 = gen_rtx_SUBREG (SImode, src, 0);
+   rtx src1 = gen_rtx_SUBREG (SImode, src, 4);
+   rtx dst0 = gen_rtx_SUBREG (SImode, dst, 0);
+   rtx dst1 = gen_rtx_SUBREG (SImode, dst, 4);
+   rtx tmp0 = gen_reg_rtx (SImode);
+   rtx tmp1 = gen_reg_rtx (SImode);
+   start_sequence ();
+   emit_insn (gen_movsi (tmp0, src0));
+   emit_insn (gen_movsi (tmp1, src1));
+   emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
+   emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
+   emit_insn (gen_movsi (dst0, tmp0));
+   emit_insn (gen_movsi (dst1, tmp1));
+   res = get_insns ();
+   end_sequence ();
+  }
+  break;
+case E_V2DImode:
+  {
+   rtx src0 = gen_rtx_SUBREG (DImode, src, 0);
+   rtx src1 = gen_rtx_SUBREG (DImode, src, 8);
+   rtx dst0 = gen_rtx_SUBREG (DImode, dst, 0);
+   rtx dst1 = gen_rtx_SUBREG (DImode, dst, 8);
+   rtx tmp0 = gen_reg_rtx (DImode);
+   rtx tmp1 = gen_reg_rtx (DImode);
+   start_sequence ();
+   emit_insn (gen_movdi (tmp0, src0));
+   emit_insn (gen_movdi (tmp1, src1));
+   emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
+   emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
+   emit_insn (gen_movdi (dst0, tmp0));
+   emit_insn (gen_movdi (dst1, tmp1));
+   res = get_insns ();
+   end_sequence ();
+  }
+  break;
 case E_BImode:
   {
rtx tmp = gen_reg_rtx (SImode);
...

[Bug target/96428] [nvptx] nvptx_gen_shuffle does not handle V2DI mode – Fails with an ICE

2020-08-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96428

--- Comment #2 from Tom de Vries  ---
(In reply to Tobias Burnus from comment #0)
> Created attachment 48986 [details]
> Test case (Fortran, use 'gfortran -fopenacc" with nvptx offloading)
> 

With the test-case setup like this:
...
! { dg-do link }
! { dg-additional-options "-O2 -ftree-vectorize" }  
...
I run into the ICE, but with the fix for the ICE in place, I run into:
...
FAIL: libgomp.oacc-fortran/test.f90 -DACC_DEVICE_TYPE_nvidia=1
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  (test for excess errors)
Excess errors:
unresolved symbol two_
...

Can you post a test-case that doesn't fail during linking?

[Bug target/96428] [nvptx] nvptx_gen_shuffle does not handle V2DI mode – Fails with an ICE

2020-08-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96428

--- Comment #4 from Tom de Vries  ---
FTR, this is not the leanest solution.

This patch generates:
...
cvt.u64.u64     %r74, %r65.x;
cvt.u64.u64     %r75, %r65.y;
mov.b64         {%r76,%r77}, %r74;
shfl.idx.b32    %r76, %r76, 0, 31;
shfl.idx.b32    %r77, %r77, 0, 31;
mov.b64         %r74, {%r76,%r77};
mov.b64         {%r78,%r79}, %r75;
shfl.idx.b32    %r78, %r78, 0, 31;
shfl.idx.b32    %r79, %r79, 0, 31;
mov.b64         %r75, {%r78,%r79};
cvt.u64.u64     %r65.x, %r74;
cvt.u64.u64     %r65.y, %r75;
...

but using this followup patch:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index cf53a921e5b..84df8e1ca4a 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1821,15 +1821,9 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
rtx src1 = gen_rtx_SUBREG (DImode, src, 8);
rtx dst0 = gen_rtx_SUBREG (DImode, dst, 0);
rtx dst1 = gen_rtx_SUBREG (DImode, dst, 8);
-   rtx tmp0 = gen_reg_rtx (DImode);
-   rtx tmp1 = gen_reg_rtx (DImode);
start_sequence ();
-   emit_insn (gen_movdi (tmp0, src0));
-   emit_insn (gen_movdi (tmp1, src1));
-   emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
-   emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
-   emit_insn (gen_movdi (dst0, tmp0));
-   emit_insn (gen_movdi (dst1, tmp1));
+   emit_insn (nvptx_gen_shuffle (dst0, src0, idx, kind));
+   emit_insn (nvptx_gen_shuffle (dst1, src1, idx, kind));
res = get_insns ();
end_sequence ();
   }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index c23edcf34bf..6e81ad449b3 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -176,6 +176,11 @@
   "A pseudo register."
   (match_code "reg"))

+(define_constraint "Q"
+  "A pseudo register."
+  (ior (match_code "reg")
+   (match_code "subreg")))
+
 (define_constraint "Ia"
   "Any integer constant."
   (and (match_code "const_int") (match_test "true")))
@@ -1513,21 +1518,23 @@
 ;; extract parts of a 64 bit object into 2 32-bit ints
 (define_insn "unpack<mode>si2"
   [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
-(unspec:SI [(match_operand:BITD 2 "nvptx_register_operand" "R")
+(unspec:SI [(match_operand:BITD 2 "register_operand" "Q")
(const_int 0)] UNSPEC_BIT_CONV))
(set (match_operand:SI 1 "nvptx_register_operand" "=R")
 (unspec:SI [(match_dup 2) (const_int 1)] UNSPEC_BIT_CONV))]
   ""
-  "%.\\tmov.b64\\t{%0,%1}, %2;")
+  "%.\\tmov.b64\\t{%0,%1}, %2;"
+  [(set_attr "subregs_ok" "true")])

 ;; pack 2 32-bit ints into a 64 bit object
 (define_insn "packsi<mode>2"
-  [(set (match_operand:BITD 0 "nvptx_register_operand" "=R")
+  [(set (match_operand:BITD 0 "register_operand" "=Q")
 (unspec:BITD [(match_operand:SI 1 "nvptx_register_operand" "R")
  (match_operand:SI 2 "nvptx_register_operand" "R")]
UNSPEC_BIT_CONV))]
   ""
-  "%.\\tmov.b64\\t%0, {%1,%2};")
+  "%.\\tmov.b64\\t%0, {%1,%2};"
+  [(set_attr "subregs_ok" "true")])

 ;; Atomic insns.

...

we have instead:
...
mov.b64         {%r74,%r75}, %r65.x;
shfl.idx.b32    %r74, %r74, 0, 31;
shfl.idx.b32    %r75, %r75, 0, 31;
mov.b64         %r65.x, {%r74,%r75};
...

But for an ICE fix, I'd rather keep things simple.
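
For reference, the unpack / shuffle / pack structure of the generated code can be modelled on the host as below. This is only a sketch under assumptions: the warp is an array of 32 lanes, shfl_idx_b32 models a broadcast from one source lane, and all function and type names are hypothetical rather than taken from nvptx.c.

```c
#include <stdint.h>

#define WARP_SIZE 32

/* Model of shfl.idx.b32: every lane receives the 32-bit value held
   by lane SRC_LANE.  */
static void
shfl_idx_b32 (uint32_t vals[WARP_SIZE], unsigned src_lane)
{
  uint32_t v = vals[src_lane % WARP_SIZE];
  for (unsigned i = 0; i < WARP_SIZE; i++)
    vals[i] = v;
}

/* Shuffle one 64-bit value per lane: unpack into two 32-bit halves,
   shuffle each half, and pack the halves again.  */
static void
shfl_idx_b64 (uint64_t vals[WARP_SIZE], unsigned src_lane)
{
  uint32_t lo[WARP_SIZE], hi[WARP_SIZE];

  for (unsigned i = 0; i < WARP_SIZE; i++)
    {
      lo[i] = (uint32_t) vals[i];          /* mov.b64 {lo,hi}, val */
      hi[i] = (uint32_t) (vals[i] >> 32);
    }
  shfl_idx_b32 (lo, src_lane);
  shfl_idx_b32 (hi, src_lane);
  for (unsigned i = 0; i < WARP_SIZE; i++)
    vals[i] = ((uint64_t) hi[i] << 32) | lo[i];  /* mov.b64 val, {lo,hi} */
}

/* A two-element vector of 64-bit values, standing in for V2DI.  */
struct v2di
{
  uint64_t x, y;
};

/* Shuffle a V2DI value per lane, one 64-bit element at a time.  */
void
shfl_idx_v2di (struct v2di vals[WARP_SIZE], unsigned src_lane)
{
  uint64_t xs[WARP_SIZE], ys[WARP_SIZE];

  for (unsigned i = 0; i < WARP_SIZE; i++)
    {
      xs[i] = vals[i].x;
      ys[i] = vals[i].y;
    }
  shfl_idx_b64 (xs, src_lane);
  shfl_idx_b64 (ys, src_lane);
  for (unsigned i = 0; i < WARP_SIZE; i++)
    {
      vals[i].x = xs[i];
      vals[i].y = ys[i];
    }
}
```

The copies into xs and ys correspond to the temporaries in the simple patch; the followup patch amounts to shuffling the elements in place.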

[Bug target/96428] [nvptx] nvptx_gen_shuffle does not handle V2DI mode – Fails with an ICE

2020-08-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96428

Tom de Vries  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |11.0
 Resolution|--- |FIXED

--- Comment #6 from Tom de Vries  ---
Patch with test-case committed, marking resolved-fixed.

[Bug target/96494] New: [nvptx] Enable effective target sync_int_long

2020-08-06 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96494

Bug ID: 96494
   Summary: [nvptx] Enable effective target sync_int_long
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

The effective target sync_int_long currently doesn't include nvptx.

Consequently, when running gcc.dg/ia64-sync-*.c, we get "UNSUPPORTED".

If we add nvptx to sync_int_long, those tests pass.

AFAICT, from the point of view of the PTX isa, there's no reason why we
couldn't support this.

So, unless a testsuite run points to some problem, we should enable
sync_int_long for nvptx.
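
As a rough illustration of what tests guarded by this effective target exercise, here are __sync builtins applied to int and long objects. The wrapper names are hypothetical, and this is not taken from the ia64-sync tests themselves.

```c
/* Atomically add DELTA to *P and return the previous value.  */
int
fetch_add_int (int *p, int delta)
{
  return __sync_fetch_and_add (p, delta);
}

/* Same operation on a long object.  */
long
fetch_add_long (long *p, long delta)
{
  return __sync_fetch_and_add (p, delta);
}

/* Compare-and-swap on a long: returns nonzero if *P held OLDVAL and
   has been replaced by NEWVAL.  */
int
cas_long (long *p, long oldval, long newval)
{
  return __sync_bool_compare_and_swap (p, oldval, newval);
}
```

A target belongs in sync_int_long when operations like these are available for both int and long.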

[Bug target/96520] New: [nvptx] Fix set insn component order

2020-08-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96520

Bug ID: 96520
   Summary: [nvptx] Fix set insn component order
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: trivial
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

I noticed that we emit:
...
set.u32.eq.u64 %r31,%r26,2147483648;
...

But the ptx isa specifies:
...
set.CmpOp{.ftz}.dtype.stype d, a, b;
...
so we should emit instead:
...
set.eq.u32.u64 %r31,%r26,2147483648;
...
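
A small sketch of emitting the components in the order given by the ISA syntax quoted above (comparison operator first, then destination type, then source type). format_set_mnemonic is a hypothetical helper, not the nvptx.md output template, and the optional .ftz modifier is left out.

```c
#include <stdio.h>

/* Write "set.<cmp_op>.<dtype>.<stype>" into BUF, following the
   set.CmpOp.dtype.stype order.  Returns what snprintf returns.  */
int
format_set_mnemonic (char *buf, size_t size, const char *cmp_op,
                     const char *dtype, const char *stype)
{
  return snprintf (buf, size, "set.%s.%s.%s", cmp_op, dtype, stype);
}
```

Called with "eq", "u32" and "u64", this produces set.eq.u32.u64, matching the corrected instruction.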

[Bug c++/96537] New: Missing std::pair constructor

2020-08-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96537

Bug ID: 96537
   Summary: Missing std::pair constructor
   Product: gcc
   Version: 4.8.5
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case test.c:
...
#include 
#include <unordered_map>
class A {
 public:
  A (int a) { i = a; }
  int i;
};
int main (void) {
  std::unordered_map> m;
  m.emplace (1, new A(1));
  return 0;
}
...

With gcc 7.5.0, we have:
...
$ g++-7 test.c -O2 -std=c++11
$ ./a.out
$
...

With gcc 4.8.5, we have instead:
...
$ g++-4.8 test.c -O2 -std=c++11 
In file included from /usr/include/c++/4.8/bits/hashtable.h:35:0,
 from /usr/include/c++/4.8/unordered_map:47,
 from test.c:2:
/usr/include/c++/4.8/bits/hashtable_policy.h: In instantiation of
‘std::__detail::_Hash_node<_Value, false>::_Hash_node(_Args&& ...) [with _Args
= {int, A*}; _Value = std::pair >]’:
/usr/include/c++/4.8/ext/new_allocator.h:120:4:   required from ‘void
__gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up =
std::__detail::_Hash_node >, false>;
_Args = {int, A*}; _Tp = std::__detail::_Hash_node >, false>]’
/usr/include/c++/4.8/bits/hashtable.h:727:6:   required from
‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>::__node_type* std::_Hashtable<_Key, _Value, _Alloc,
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy,
_Traits>::_M_allocate_node(_Args&& ...) [with _Args = {int, A*}; _Key = int;
_Value = std::pair >; _Alloc =
std::allocator > >; _ExtractKey =
std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash;
_H2 = std::__detail::_Mod_range_hashing; _Hash =
std::__detail::_Default_ranged_hash; _RehashPolicy =
std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits; std::_Hashtable<_Key,
_Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy,
_Traits>::__node_type = std::__detail::_Hash_node >, false>]’
/usr/include/c++/4.8/bits/hashtable.h:1260:71:   required from
‘std::pair::iterator, bool> std::_Hashtable<_Key,
_Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy,
_Traits>::_M_emplace(std::true_type, _Args&& ...) [with _Args = {int, A*}; _Key
= int; _Value = std::pair >; _Alloc =
std::allocator > >; _ExtractKey =
std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash;
_H2 = std::__detail::_Mod_range_hashing; _Hash =
std::__detail::_Default_ranged_hash; _RehashPolicy =
std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits; typename
std::__detail::_Hashtable_base<_Key, _Value, _ExtractKey, _Equal, _H1, _H2,
_Hash, _Traits>::iterator = std::__detail::_Node_iterator >, false, false>; std::true_type =
std::integral_constant]’
/usr/include/c++/4.8/bits/hashtable.h:665:69:   required from
‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>::__ireturn_type std::_Hashtable<_Key, _Value, _Alloc,
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::emplace(_Args&&
...) [with _Args = {int, A*}; _Key = int; _Value = std::pair >; _Alloc = std::allocator > >; _ExtractKey = std::__detail::_Select1st; _Equal =
std::equal_to; _H1 = std::hash; _H2 =
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash;
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits; std::_Hashtable<_Key,
_Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy,
_Traits>::__ireturn_type =
std::pair
>, false, false>, bool>]’
/usr/include/c++/4.8/bits/unordered_map.h:311:54:   required from
‘std::pair, _Alloc,
std::__detail::_Select1st, _Pred, _Hash, std::__detail::_Mod_range_hashing,
std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy,
std::__detail::_Hashtable_traits,
std::is_default_constructible<_Hash>, std::is_copy_assignable<_Hash>,
std::__detail::__is_noexcept_hash<_Key, _Hash> > >::value, false, true>
>::iterator, bool> std::unordered_map<_Key, _Tp, _Hash, _Pred,
_Alloc>::emplace(_Args&& ...) [with _Args = {int, A*}; _Key = int; _Tp =
std::unique_ptr; _Hash = std::hash; _Pred = std::equal_to; _Alloc
= std::allocator > >; typename
std::_Hashtable<_Key, std::pair, _Alloc,
std::__detail::_Select1st, _Pred, _Hash, std::__detail::_Mod_range_hashing,
std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy,
std::__detail::_Hashtable_traits,
std::is_default_constructible<_Hash>, std::is_copy_assignable<_Hash>,
std::__detail::__is_noexcept_hash<_Key, _Hash> > >::value, false, true

[Bug c++/96537] Missing std::pair constructor

2020-08-08 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96537

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |7.5.0
   Target Milestone|---                         |7.5
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
      Known to fail|                            |4.8.5

--- Comment #1 from Tom de Vries  ---
Can be worked around by using:
...
  m.emplace (1, std::unique_ptr<A> (new A(1)));
...

This PR is probably a duplicate, but I'm not sure of which one.

Anyway, this is fixed as of 7.5.0, so I'm closing it as resolved-fixed; perhaps
someone can mark it as a duplicate.

[Bug c++/96537] Missing std::pair constructor

2020-08-10 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96537

--- Comment #4 from Tom de Vries  ---
(In reply to Jonathan Wakely from comment #2)
> Not a bug. C++11 and C++14 said for the relevant pair(U&&, V&&) constructor:
> 
> Remarks: If U is not implicitly convertible to first_type or V is not
> implicitly convertible to second_type this constructor shall not participate
> in overload resolution.
> 
> That was changed for C++17 by http://wg21.link/n4387 which was accepted into
> the C++17 draft in 2015, so you can't expect C++11 compilers released before
> then to support the new API.

Thanks for the comment.  Still, if this is a language version issue, I used
-std=c++11 with 7.5.0, shouldn't I then get the same behaviour as with gcc
4.8.5?
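
A small sketch of the rule quoted above, using the class A from the test-case. It only checks the two type traits involved and exercises the workaround from comment 1; it does not say anything about which libstdc++ versions apply which rule.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <unordered_map>

class A {
 public:
  A (int a) { i = a; }
  int i;
};

// unique_ptr's constructor from a raw pointer is explicit, so A* can be
// used to construct a unique_ptr<A> but is not implicitly convertible to it.
static_assert (std::is_constructible<std::unique_ptr<A>, A*>::value,
               "explicit construction from A* is allowed");
static_assert (!std::is_convertible<A*, std::unique_ptr<A>>::value,
               "implicit conversion from A* is not allowed");

// The workaround from comment 1: hand emplace an already-built unique_ptr,
// so no implicit A* -> unique_ptr<A> conversion is needed.
int workaround_value ()
{
  std::unordered_map<int, std::unique_ptr<A>> m;
  m.emplace (1, std::unique_ptr<A> (new A (1)));
  assert (m.size () == 1);
  return m.at (1)->i;
}
```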

[Bug target/96494] [nvptx] Enable effective target sync_int_long

2020-08-10 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96494

--- Comment #1 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> AFAICT, from the point of view of the PTX isa, there's no reason why we
> couldn't support this.
> 
> So, unless a testsuite run points to some problem, we should enable the
> sync_int_long for nvptx.

Well, I found a problem with test-case gcc/testsuite/gcc.dg/pr86314.c.

There we try to do an atomic insn on a stack object.  Since the stack is
implemented as .local, and the atom insn is not supported for .local, we run into:
...
 nvptx-run: error getting kernel result: operation not supported on
global/shared address space
...

Something like this would work:
...
$ git diff
diff --git a/gcc/testsuite/gcc.dg/pr86314.c b/gcc/testsuite/gcc.dg/pr86314.c
index 8962a3cf2ff..565fb02eee2 100644
--- a/gcc/testsuite/gcc.dg/pr86314.c
+++ b/gcc/testsuite/gcc.dg/pr86314.c
@@ -1,5 +1,5 @@
 // PR target/86314
-// { dg-do run { target sync_int_long } }
+// { dg-do run { target sync_int_long_stack } }
 // { dg-options "-O2" }

 __attribute__((noinline, noclone)) unsigned long
diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index e79015b4d54..a870b1de275 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7704,7 +7704,16 @@ proc check_effective_target_sync_int_long { } {
 || [istarget cris-*-*]
 || ([istarget sparc*-*-*] && [check_effective_target_sparc_v9])
 || ([istarget arc*-*-*] && [check_effective_target_arc_atomic])
-|| [check_effective_target_mips_llsc] }}]
+|| [check_effective_target_mips_llsc]
+|| [istarget nvptx*-*-*]
+}}]
+}
+
+proc check_effective_target_sync_int_long_stack { } {
+return [check_cached_effective_target sync_int_long_stack {
+  expr { ![istarget nvptx*-*-*]
+&& [check_effective_target_sync_int_long]   
+}}]
 }

 # Return 1 if the target supports atomic operations on "char" and "short".
...
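
To make the distinction concrete, a hypothetical reduction (this is not the contents of pr86314.c): one atomic update of a global object and one of a stack object, both using the GCC __sync_add_and_fetch builtin. On a host target both work; per the analysis above, on nvptx only the second one would be expected to hit the .local restriction.

```cpp
#include <cassert>

// Atomic update of a global object.
unsigned long counter_global = 0;

unsigned long bump_global ()
{
  return __sync_add_and_fetch (&counter_global, 1UL);
}

// Atomic update of a stack object.  This is the pattern that the proposed
// sync_int_long_stack effective target is meant to guard.
unsigned long bump_stack ()
{
  unsigned long counter_local = 41;
  return __sync_add_and_fetch (&counter_local, 1UL);
}

void check_bumps ()
{
  assert (bump_stack () == 42);
  unsigned long before = counter_global;
  assert (bump_global () == before + 1);
}
```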

[Bug target/96494] [nvptx] Enable effective target sync_int_long

2020-08-10 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96494

--- Comment #2 from Tom de Vries  ---
FTR, we could fix this by just mapping onto a nonatomic insn for .local (and
I'm not really sure why ptx doesn't).

But since we have generic pointers, we only know at runtime whether something
is local (using isspacep).  So while that would make the standalone target
more generic, it would possibly make the offloading target slower and larger.

[Bug target/83812] nvptx-run: error getting kernel result: operation not supported on global/shared address space

2020-08-10 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83812

--- Comment #1 from Tom de Vries  ---
See PR 96494.

[Bug target/96566] New: [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

Bug ID: 96566
   Summary: [nvptx] Timeout in gcc.dg/builtin-object-size-21.c
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

When running test-case gcc.dg/builtin-object-size-21.c, we have:
...
spawn -ignore SIGHUP /home/vries/nvptx/mainkernel-2/build-gcc/gcc/xgcc
-B/home/vries/nvptx/mainkernel-2/build-gcc/gcc/
/home/vries/nvptx/mainkernel-2/source-gcc/gcc/testsuite/gcc.dg/builtin-object-size-21.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never
--sysroot=/home/vries/nvptx/mainkernel-2/install/nvptx-none -Wall
-fdump-tree-optimized -S -isystem
/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib/targ-include
-isystem /home/vries/nvptx/mainkernel-2/source-gcc/newlib/libc/include -o
builtin-object-size-21.s^M
/home/vries/nvptx/mainkernel-2/source-gcc/gcc/testsuite/gcc.dg/builtin-object-size-21.c:43:14:
error: size of variable 'xmx_1' is too large^M
/home/vries/nvptx/mainkernel-2/source-gcc/gcc/testsuite/gcc.dg/builtin-object-size-21.c:29:14:
error: size of variable 'xm3_4' is too large^M
WARNING: program timed out
compiler exited with status 1
PASS: gcc.dg/builtin-object-size-21.c  (test for errors, line 29)
PASS: gcc.dg/builtin-object-size-21.c  (test for errors, line 43)
PASS: gcc.dg/builtin-object-size-21.c (test for excess errors)
...

If we run the command by hand, and tail the .s file, we get an endless
repetition of 0, 0, 0, ... , which starts off like this:
...
// BEGIN GLOBAL VAR DEF: xm3_3
.visible .global .align 1 .u32 xm3_3[-2305843009213693951]
  = { 0, 0, 0,
...

The negative length doesn't look good.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #1 from Tom de Vries  ---
Corresponding source bit:
...
struct Ax_m3 { char a[PTRDIFF_MAX - 3], ax[]; };

struct Ax_m3 xm3_3 = { { 0 }, { 1, 2, 3 } };
...

On x86_64, we generate for this:
...
xm3_3:
.byte   0
.zero   9223372036854775803
.byte   1
.byte   2
.byte   3
...
where 9223372036854775803 is 0x7FFFFFFFFFFFFFFB.
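
As a cross-check of that number, a short sketch assuming PTRDIFF_MAX is 2^63 - 1, as on x86_64:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: PTRDIFF_MAX on x86_64.
constexpr std::uint64_t ptrdiff_max = 0x7FFFFFFFFFFFFFFFull;

// Fixed part of struct Ax_m3: char a[PTRDIFF_MAX - 3].
constexpr std::uint64_t a_size = ptrdiff_max - 3;

// The assembly has one explicit ".byte 0" for the { 0 } initializer;
// ".zero" covers the remainder of a[].
constexpr std::uint64_t zero_count = a_size - 1;

static_assert (zero_count == 9223372036854775803ull, "operand of .zero");
static_assert (zero_count == 0x7FFFFFFFFFFFFFFBull, "same value in hex");

// .byte 0, the .zero run and the three ax[] bytes add up to PTRDIFF_MAX.
void check_total ()
{
  assert (1 + zero_count + 3 == ptrdiff_max);
}
```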

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #2 from Tom de Vries  ---
(In reply to Tom de Vries from comment #0)
> If we run the command by hand, and tail the .s file, we get an endless
> repetition of 0, 0, 0, ... , which starts off like this:
> ...
> // BEGIN GLOBAL VAR DEF: xm3_3
> .visible .global .align 1 .u32 xm3_3[-2305843009213693951]
>   = { 0, 0, 0,
> ...
> 
> The negative length doesn't look good.

Using:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index cf53a921e5b..752c12561dd 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2232,7 +2232,7 @@ nvptx_assemble_decl_begin (FILE *file, const char *name,
const char *section,
   if (size)
 /* We make everything an array, to simplify any initialization
emission.  */
-fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", init_frag.remaining);
+fprintf (file, "[" HOST_WIDE_INT_PRINT_UNSIGNED "]", init_frag.remaining);
   else if (atype)
 fprintf (file, "[]");
 }
...

we have instead:
...
.visible .global .align 1 .u32 xm3_3[16140901064495857665]
...
which in hex is 0xE000000000000001, so it still looks wrong.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #5 from Tom de Vries  ---
Then with this in addition:
...
@@ -2202,7 +2202,7 @@ nvptx_assemble_decl_begin (FILE *file, const char *name,
const char *section,
 /* Neither vector nor complex types can contain the other.  */
 type = TREE_TYPE (type);

-  unsigned elt_size = int_size_in_bytes (type);
+  unsigned HOST_WIDE_INT elt_size = int_size_in_bytes (type);

   /* Largest mode we're prepared to accept.  For BLKmode types we
  don't know if it'll contain pointer constants, so have to choose
...
we have:
...
// BEGIN GLOBAL VAR DEF: xm3_3
.visible .global .align 1 .u32 xm3_3[2305843009213693952] = { 0, 0, 0,
...
where 2305843009213693952 is 0x2000000000000000, so this claims one byte more
than required (due to using .u32).  This may cause an overflow in ptx, not sure
yet.
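
The three array dimensions seen so far can be reproduced with the sketch below. It assumes the element count is computed by rounding up, (size + elt_size - 1) / elt_size; that formula is not shown in this thread and is an assumption, chosen because it reproduces the printed values. The signed case is modelled by forming the sum in unsigned arithmetic and reinterpreting it, since signed overflow itself is undefined.

```cpp
#include <cassert>
#include <cstdint>

// Object size in bytes: a[PTRDIFF_MAX - 3] plus the 3-byte ax[] initializer.
constexpr std::uint64_t size_bytes = 0x7FFFFFFFFFFFFFFFull;
constexpr std::uint64_t elt_size = 4;   // .u32 elements

// Assumed rounding formula, done entirely in unsigned 64-bit arithmetic.
constexpr std::uint64_t count_unsigned
  = (size_bytes + elt_size - 1) / elt_size;

// The same sum viewed as a signed 64-bit value is negative, and the
// division truncates toward zero.  (The conversion to int64_t wraps
// modulo 2^64 with GCC.)
std::int64_t count_signed ()
{
  std::uint64_t sum = size_bytes + elt_size - 1;   // 0x8000000000000002
  return static_cast<std::int64_t> (sum) / 4;
}

// Printing that negative count as unsigned, as in comment 2.
std::uint64_t count_signed_as_unsigned ()
{
  return static_cast<std::uint64_t> (count_signed ());
}

void check_counts ()
{
  assert (count_signed () == -2305843009213693951ll);              // comment 0
  assert (count_signed_as_unsigned () == 16140901064495857665ull); // comment 2
  assert (count_signed_as_unsigned () == 0xE000000000000001ull);
  assert (count_unsigned == 2305843009213693952ull);               // comment 5
  // The .u32 array covers one byte more than the object needs.
  assert (count_unsigned * elt_size == size_bytes + 1);
}
```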

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #6 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #3)
> Either the test can be skipped on nvptx or any targets that don't emit
> something like a .zero similar directive, or we should after the size of
> variable is too large diagnostic throw the initializer away (set it to
> error_mark_node)?
> Of course, I guess the timeout will happen even if the object size is not
> too large for the warning, just slightly below it,
> struct Ax_m3 { char a[PTRDIFF_MAX / 32 - 3], ax[]; };
> struct Ax_m3 xm3_3 = { { 0 }, { 1, 2, 3 } };
> will IMHO still timeout if it needs to emit 288 quadrillion "0, " strings.

Agreed, I browsed the ptx spec a bit, and was hoping for a better way to
express this, but it seems there isn't, even in the latest ptx version (7.0).

As for the ptx back-end, we could add an -minit-limit, with a reasonable
default.

With a size of 0xfffffff we take 5s and generate a 193MB assembly file.

With a size of 0xffffffff we take 1m10s and generate a 3.1GB assembly file.

So perhaps the first could be a good default. 

Then when running into the limit we error out, instead of timing out or running
out of disk space.
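
A rough sketch of what erroring out at a limit could look like. Everything here is hypothetical: -minit-limit is only a proposal at this point, and emit_zero_init is not an nvptx.c function.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>

// Hypothetical emitter: writes COUNT zero elements as "{ 0, 0, ... }",
// unless COUNT exceeds LIMIT.  In that case nothing is written and false
// is returned, so the caller can report an error instead of timing out or
// filling the disk.
bool emit_zero_init (std::ostream &out, std::uint64_t count,
                     std::uint64_t limit)
{
  if (count > limit)
    return false;
  out << "{ ";
  for (std::uint64_t i = 0; i < count; i++)
    out << (i ? ", 0" : "0");
  out << " }";
  return true;
}

void check_emit ()
{
  std::ostringstream ok;
  assert (emit_zero_init (ok, 3, 16));
  assert (ok.str () == "{ 0, 0, 0 }");

  // The element count from comment 5 is rejected before any output.
  std::ostringstream too_big;
  assert (!emit_zero_init (too_big, 2305843009213693952ull, 16));
  assert (too_big.str ().empty ());
}
```

Checking the count before writing anything keeps the failure cheap, which is the point of the proposal.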

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #8 from Tom de Vries  ---
(In reply to Tom de Vries from comment #6)
> With a size of 0xfffffff we take 5s and generate a 193MB assembly file.
> 
> With a size of 0xffffffff we take 1m10s and generate a 3.1GB assembly file.


FTR, I tried the same code with latest (11.0 update 1) cuda, and got these
results:

With a size of 0xfffffff we take 19.4s and generate a 769MB assembly file (it's
bigger because it uses u8 instead of u32 as basetype).

With a size of 0xffffffff we run into "Floating point exception (core dumped)"
after 8 minutes.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #9 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #7)
> I'm not sure a target specific option is the way to go here, the only
> difference is that nvptx spends all the time on this (adjusted) testcase at
> compile time (and eats all disk space there too), while on x86_64 it is at
> assembly time.
> gcc -O2 -c -o /tmp/1.o /tmp/1.c
> /tmp/ccUN9rYB.s: Assembler messages:
> /tmp/ccUN9rYB.s: Fatal error: can't fill 256 bytes in section .data of
> /tmp/1.o: 'No space left on device'
> In real-world people will only compile code that is useful for something,
> and we should honor there the no hardcoded limits unless really necessary
> rule, some users may need 20GB initializers some day (sure, on most PTX
> decides it wouldn't likely fit, but that can be diagnosed later).
> For the error recovery, it is ok to throw away the initializers if it
> doesn't result in further diagnostics, but otherwise, let's let users do
> what they want
> if they have time and disk space for that.

I guess we can set the limit to the max by default, and then run the testsuite
with the limit set to something more reasonable.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #11 from Tom de Vries  ---
(In reply to Martin Sebor from comment #10)
> The issue described in bug 92815 comment 9 sounds like a similar problem. 
> Does sending the output to /dev/null instead of a .s file help?  If it does,
> adding a dg directive to do that might be a solution.

It would certainly help for the disk-space issue.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-11 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #12 from Tom de Vries  ---
(In reply to Tom de Vries from comment #6)
> (In reply to Jakub Jelinek from comment #3)
> > Either the test can be skipped on nvptx or any targets that don't emit
> > something like a .zero similar directive, 

How about this:
...
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
index 1c42374ba89..7e0f85ffdf3 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
@@ -1,7 +1,8 @@
 /* PR middle-end/92815 - spurious -Wstringop-overflow writing into
a flexible array of an extern struct
{ dg-do compile }
-   { dg-options "-Wall -fdump-tree-optimized" } */
+   { dg-options "-Wall -fdump-tree-optimized" }
+   { dg-require-effective-target large_initializer } */

 #define PTRDIFF_MAX __PTRDIFF_MAX__

diff --git a/gcc/testsuite/gcc.dg/strlenopt-55.c
b/gcc/testsuite/gcc.dg/strlenopt-55.c
index ea6fb22a2ed..ca89ecd3c53 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-55.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-55.c
@@ -3,7 +3,8 @@

Verify that strlen() of braced initialized array is folded
{ dg-do compile }
-   { dg-options "-O1 -Wall -fdump-tree-gimple -fdump-tree-optimized" } */
+   { dg-options "-O1 -Wall -fdump-tree-gimple -fdump-tree-optimized" }
+   { dg-require-effective-target large_initializer } */

 #include "strlenopt.h"

diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index e79015b4d54..4e0d45aaae5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10424,3 +10424,14 @@ proc check_effective_target_msp430_large {} {
#endif
 } ""]
 }
+
+# Return 1 if the target has an efficient means to encode large initializers
+# in the assembly.
+
+proc check_effective_target_large_initializer { } {
+if { [istarget nvptx*-*-*] } {
+   return 0
+}
+
+return 1
+}
...

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #13 from Tom de Vries  ---
Printing correct array dimension fixed in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b9c7fe59f9f66ecc091e215c826ecd1a04d032dc
.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #14 from Tom de Vries  ---
(In reply to Tom de Vries from comment #12)
> (In reply to Tom de Vries from comment #6)
> > (In reply to Jakub Jelinek from comment #3)
> > > Either the test can be skipped on nvptx or any targets that don't emit
> > > something like a .zero similar directive, 
> 
> How about this:
> ...
> diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
> b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
> index 1c42374ba89..7e0f85ffdf3 100644
> --- a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
> @@ -1,7 +1,8 @@
>  /* PR middle-end/92815 - spurious -Wstringop-overflow writing into
> a flexible array of an extern struct
> { dg-do compile }
> -   { dg-options "-Wall -fdump-tree-optimized" } */
> +   { dg-options "-Wall -fdump-tree-optimized" }
> +   { dg-require-effective-target large_initializer } */
>  
>  #define PTRDIFF_MAX __PTRDIFF_MAX__
>  
> diff --git a/gcc/testsuite/gcc.dg/strlenopt-55.c
> b/gcc/testsuite/gcc.dg/strlenopt-55.c
> index ea6fb22a2ed..ca89ecd3c53 100644
> --- a/gcc/testsuite/gcc.dg/strlenopt-55.c
> +++ b/gcc/testsuite/gcc.dg/strlenopt-55.c
> @@ -3,7 +3,8 @@
>  
> Verify that strlen() of braced initialized array is folded
> { dg-do compile }
> -   { dg-options "-O1 -Wall -fdump-tree-gimple -fdump-tree-optimized" } */
> +   { dg-options "-O1 -Wall -fdump-tree-gimple -fdump-tree-optimized" }
> +   { dg-require-effective-target large_initializer } */
>  
>  #include "strlenopt.h"
>  
> diff --git a/gcc/testsuite/lib/target-supports.exp
> b/gcc/testsuite/lib/target-supports.exp
> index e79015b4d54..4e0d45aaae5 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -10424,3 +10424,14 @@ proc check_effective_target_msp430_large {} {
> #endif
>  } ""]
>  }
> +
> +# Return 1 if the target has an efficient means to encode large initializers
> +# in the assembly.
> +
> +proc check_effective_target_large_initializer { } {
> +if { [istarget nvptx*-*-*] } {
> +   return 0
> +}
> +
> +return 1
> +}
> ...

submitted at https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551837.html
.

[Bug target/96588] New: [nvptx] Add -minit-limit

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96588

Bug ID: 96588
   Summary: [nvptx] Add -minit-limit
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ As proposed in PR96566. ]

When compiling test-case gcc.dg/builtin-object-size-21.c for nvptx, we time
out, possibly while consuming a lot of disk space.

This has now been fixed for that testcase by requiring effective target
large_initializer, but for new test-cases that doesn't help.

We could add an nvptx option -minit-limit, set to unlimited by default, and then
add say -minit-limit=0xfffffff in nvptx-none-run.exp to make sure that we run
into an error rather than a timeout/out-of-disk-space.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #16 from Tom de Vries  ---
(In reply to Tom de Vries from comment #9)
> (In reply to Jakub Jelinek from comment #7)
> > I'm not sure a target specific option is the way to go here, the only
> > difference is that nvptx spends all the time on this (adjusted) testcase at
> > compile time (and eats all disk space there too), while on x86_64 it is at
> > assembly time.
> > gcc -O2 -c -o /tmp/1.o /tmp/1.c
> > /tmp/ccUN9rYB.s: Assembler messages:
> > /tmp/ccUN9rYB.s: Fatal error: can't fill 256 bytes in section .data of
> > /tmp/1.o: 'No space left on device'
> > In real-world people will only compile code that is useful for something,
> > and we should honor there the no hardcoded limits unless really necessary
> > rule, some users may need 20GB initializers some day (sure, on most PTX
> > decides it wouldn't likely fit, but that can be diagnosed later).
> > For the error recovery, it is ok to throw away the initializers if it
> > doesn't result in further diagnostics, but otherwise, let's let users do
> > what they want
> > if they have time and disk space for that.
> 
> I guess we can set the limit to the max by default, and then run the
> testsuite with the limit set to something more reasonable.

Filed this idea as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96588 .

[Bug testsuite/96589] New: Directive to redirect compiler output to /dev/null

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96589

Bug ID: 96589
   Summary: Directive to redirect compiler output to /dev/null
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

[ As proposed here: PR 96566 comment 10. ]

This directive can be useful if the assembly file is potentially large, to
prevent running out of disk space.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

--- Comment #17 from Tom de Vries  ---
(In reply to Martin Sebor from comment #10)
> The issue described in bug 92815 comment 9 sounds like a similar problem. 
> Does sending the output to /dev/null instead of a .s file help?  If it does,
> adding a dg directive to do that might be a solution.
> 

Filed as PR96589.

[Bug target/96566] [nvptx] Timeout in gcc.dg/builtin-object-size-21.c

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96566

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #18 from Tom de Vries  ---
Patch skipping the test-case for nvptx committed, marking resolved-fixed.

[Bug target/96494] [nvptx] Enable effective target sync_int_long

2020-08-12 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96494

--- Comment #3 from Tom de Vries  ---
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551842.html

[Bug testsuite/96589] Directive to redirect compiler output to /dev/null

2020-08-13 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96589

--- Comment #2 from Tom de Vries  ---
With this patch:
...
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
index 7e0f85ffdf3..87058988780 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-21.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-21.c
@@ -1,8 +1,7 @@
 /* PR middle-end/92815 - spurious -Wstringop-overflow writing into
a flexible array of an extern struct
{ dg-do compile }
-   { dg-options "-Wall -fdump-tree-optimized" }
-   { dg-require-effective-target large_initializer } */
+   { dg-options "-Wall -fdump-tree-optimized -o /dev/null" } */

 #define PTRDIFF_MAX __PTRDIFF_MAX__

...
we have:
...
spawn -ignore SIGHUP /home/vries/nvptx/mainkernel-2/build-gcc/gcc/xgcc
-B/home/vries/nvptx/mainkernel-2/build-gcc/gcc/
/home/vries/nvptx/mainkernel-2/source-gcc/gcc/testsuite/gcc.dg/builtin-object-size-21.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never
--sysroot=/home/vries/nvptx/mainkernel-2/install/nvptx-none -Wall
-fdump-tree-optimized -o /dev/null -S -isystem
/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib/targ-include
-isystem /home/vries/nvptx/mainkernel-2/source-gcc/newlib/libc/include -o
builtin-object-size-21.s^M
cc1: error: output filename specified twice^M
compiler exited with status 1
FAIL: gcc.dg/builtin-object-size-21.c  (test for errors, line 29)
...

[Bug target/90928] [9/10 Regression] [nvptx] internal compiler error: in instantiate_virtual_regs_in_insn, at function.c:1737

2020-08-13 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90928

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.4                         |11.0

[Bug target/90928] [9/10 Regression] [nvptx] internal compiler error: in instantiate_virtual_regs_in_insn, at function.c:1737

2020-08-13 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90928

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Tom de Vries  ---
Patch committed, marking resolved-fixed.

[Bug target/90933] [nvptx] internal compiler error: RTL check: expected code 'const_int', have 'reg' in rtx_to_poly_int64, at rtl.h:2367

2020-08-13 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90933

--- Comment #1 from Tom de Vries  ---
New behaviour for the test-case.

Instead of ICE-ing, we have:
...
/home/vries/nvptx/mainkernel-2/source-gcc/gcc/testsuite/gcc.dg/memcmp-1.c: In
function 'test_strncmp_49_1':^M
/home/vries/nvptx/mainkernel-2/source-gcc/gcc/testsuite/gcc.dg/memcmp-1.c:190:13:
error: total size of local objects 12659529496391581745 exceeds maximum
9223372036854775296^M
...

[Bug target/96494] [nvptx] Enable effective target sync_int_long

2020-08-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96494

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |11.0

--- Comment #5 from Tom de Vries  ---
Testsuite patch committed, marking resolved-fixed.

[Bug target/96706] New: [nvptx] compilation failure of pr89663-1.c

2020-08-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96706

Bug ID: 96706
   Summary: [nvptx] compilation failure of pr89663-1.c
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider test-case pr89663-1.c, minimized from
gcc/testsuite/gcc.c-torture/compile/pr89663-1.c, and with added main:
...
long lrint ();

void
foo (long long *p)
{
  int n = 0;
  p[n++] = lrint (1);
}

long
lrint (a)
  int a;  
{
  return a + 1;
}

int
main (void)
{
  long long l;
  foo (&l);
  return l == 2;
}
...

With gcc on x86_64, we have:
...
$ gcc pr89663-1.c -fno-builtin
$ ./a.out; echo $?
1
...
and:
...
$ gcc pr89663-1.c
pr89663-1.c: In function ‘lrint’:
pr89663-1.c:12:7: warning: argument ‘a’ doesn’t match built-in prototype
   int a;
   ^
$ ./a.out; echo $?
1
...

With nvptx, we have:
...
$ /home/vries/nvptx/mainkernel-2/build-gcc/gcc/xgcc
-B/home/vries/nvptx/mainkernel-2/build-gcc/gcc/ -fdiagnostics-plain-output
--sysroot=/home/vries/nvptx/mainkernel-2/install/nvptx-none -w -O0 -isystem
/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib/targ-include
-isystem /home/vries/nvptx/mainkernel-2/source-gcc/newlib/libc/include
-B/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib/
-L/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib -mmainkernel -lm
pr89663-1.c -fno-builtin
$ ~/nvptx/mainkernel-2/install/bin/nvptx-none-run a.out; echo $?
1
...
and:
...
$ /home/vries/nvptx/mainkernel-2/build-gcc/gcc/xgcc
-B/home/vries/nvptx/mainkernel-2/build-gcc/gcc/ -fdiagnostics-plain-output
--sysroot=/home/vries/nvptx/mainkernel-2/install/nvptx-none -w -O0 -isystem
/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib/targ-include
-isystem /home/vries/nvptx/mainkernel-2/source-gcc/newlib/libc/include
-B/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib/
-L/home/vries/nvptx/mainkernel-2/build-gcc/nvptx-none/./newlib -mmainkernel -lm
pr89663-1.c 
ptxas /tmp/ccbuZdU3.o, line 26; error   : Arguments mismatch for instruction
'mov'
ptxas /tmp/ccbuZdU3.o, line 71; error   : Type of argument does not match
formal parameter '%in_ar0'
ptxas /tmp/ccbuZdU3.o, line 71; error   : Alignment of argument does not match
formal parameter '%in_ar0'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
...
because:
...
// BEGIN GLOBAL FUNCTION DECL: lrint
.visible .func (.param.u64 %value_out) lrint (.param.f64 %in_ar0);  

// BEGIN GLOBAL FUNCTION DEF: lrint 
.visible .func (.param.u64 %value_out) lrint (.param.f64 %in_ar0)
{
.reg.f64 %ar0;  
ld.param.f64 %ar0, [%in_ar0];   
.reg.u32 %r25;  
mov.u32 %r25, %ar0;
  ...

The problem is that we end up in nvptx_declare_function_name with a decl that
has as decl args:
...
arguments  unit-size 
...
but to emit the declaration, we use the fntype, which has arg:
...
(gdb) call debug_tree (args)
 
...

[Bug target/96706] [nvptx] compilation failure of pr89663-1.c

2020-08-19 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96706

Tom de Vries  changed:

   What|Removed |Added

 Target||nvptx

--- Comment #1 from Tom de Vries  ---
Tentative patch:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 39d0275493a..7ad9ab326d0 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -895,12 +895,12 @@ write_fn_proto (std::stringstream &s, bool is_defn,
  NULL in DECL_ARGUMENTS, for builtin functions without another
declaration.
  So we have to pick the best one we have.  */
-  tree args = TYPE_ARG_TYPES (fntype);
-  bool prototyped = true;
+  tree args = DECL_ARGUMENTS (decl);
+  bool prototyped = false;
   if (!args)
 {
-  args = DECL_ARGUMENTS (decl);
-  prototyped = false;
+  args = TYPE_ARG_TYPES (fntype);
+  prototyped = true;
 }

   for (; args; args = TREE_CHAIN (args), not_atomic_weak_arg--)
@@ -1304,12 +1304,12 @@ nvptx_declare_function_name (FILE *file, const char
*name, const_tree decl)
 argno = write_arg_type (s, 0, argno, ptr_type_node, true);

   /* Declare and initialize incoming arguments.  */
-  tree args = TYPE_ARG_TYPES (fntype);
-  bool prototyped = true;
+  tree args = DECL_ARGUMENTS (decl);
+  bool prototyped = false;
   if (!args)
 {
-  args = DECL_ARGUMENTS (decl);
-  prototyped = false;
+  args = TYPE_ARG_TYPES (fntype);
+  prototyped = true;
 }

   for (; args != NULL_TREE; args = TREE_CHAIN (args))
...

[Bug analyzer/96792] New: Analyzer assumes pointer is NULL, even though pointer was dereferenced earlier

2020-08-26 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96792

Bug ID: 96792
   Summary: Analyzer assumes pointer is NULL, even though pointer
was dereferenced earlier
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

I built gdb/gdbserver master with gcc-11 (gcc-11 (SUSE Linux) 11.0.0 20200824
(experimental) [revision 0d166f4a8773a43d925be006e713b7d81626ddb9]) and
CFLAGS/CXXFLAGS="-Wall -O0 -g -fanalyzer".

I ran into a warning I thought was interesting in gdb/block.c.  I've minimized
it into this test-case:
...
$ cat test.c 
#define NULL (void *)0

struct block
{
  void *function;
  const struct block *superblock;
};

struct global_block
{
  struct block block;
  void *compunit_symtab;
};

extern const struct block *block_global_block (const struct block *block);

void *
block_objfile (const struct block *block)
{
  const struct global_block *global_block;

  if (block->function != NULL)
    return block->function;

  global_block = (struct global_block *) block_global_block (block);
  return global_block->compunit_symtab;
}

const struct block *
block_global_block (const struct block *block)
{
  if (block == NULL)
    return NULL;

  while (block->superblock != NULL)
    block = block->superblock;

  return block;
}
...

The analyzer shows:
...
$ gcc-11 -fanalyzer -c test.c
In function ‘block_objfile’:
test.c:26:22: warning: dereference of NULL ‘global_block’ [CWE-690]
[-Wanalyzer-null-dereference]
   26 |   return global_block->compunit_symtab;
  |  ^
  ‘block_objfile’: events 1-4
|
|   18 | block_objfile (const struct block *block)
|  | ^
|  | |
|  | (1) entry to ‘block_objfile’
|..
|   22 |   if (block->function != NULL)
|  |  ~
|  |  |
|  |  (2) following ‘false’ branch...
|..
|   25 |   global_block = (struct global_block *) block_global_block
(block);
|  | 
~~
|  |  |
|  |  (3) ...to here
|  |  (4) calling
‘block_global_block’ from ‘block_objfile’
|
+--> ‘block_global_block’: events 5-6
   |
   |   30 | block_global_block (const struct block *block)
   |  | ^~
   |  | |
   |  | (5) entry to ‘block_global_block’
   |   31 | {
   |   32 |   if (block == NULL)
   |  |  ~
   |  |  |
   |  |  (6) following ‘true’ branch (when ‘block’ is NULL)...
   |
 ‘block_global_block’: event 7
   |
   |1 | #define NULL (void *)0
   |  |  ^
   |  |  |
   |  |  (7) ...to here
test.c:33:12: note: in expansion of macro ‘NULL’
   |   33 | return NULL;
   |  |^~~~
   |
 ‘block_global_block’: event 8
   |
   |1 | #define NULL (void *)0
   |  |  ^
   |  |  |
   |  |  (8) ‘0’ is NULL
test.c:33:12: note: in expansion of macro ‘NULL’
   |   33 | return NULL;
   |  |^~~~
   |
<--+
|
  ‘block_objfile’: events 9-10
|
|   25 |   global_block = (struct global_block *) block_global_block
(block);
|  | 
^~
|  |  |
|  |  (9) return of NULL to
‘block_objfile’ from ‘block_global_block’
|   26 |   return global_block->compunit_symtab;
|  |  ~
|  |  |
|  |  (10) dereference of NULL ‘global_block’
|
...

The interesting bit to me is that event 6 asserts that block == NULL.  However,
at event 2, we test block->function != NULL, which we cannot do without block
!= NULL.
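
The argument above can be checked with a small self-contained sketch. The names objfile_of and outermost_block are hypothetical stand-ins for the two functions in the test-case, and the assert only spells out the invariant described here; it is not part of gdb's code:

```c
#include <assert.h>
#include <stddef.h>

struct block
{
  void *function;
  const struct block *superblock;
};

struct global_block
{
  struct block block;
  void *compunit_symtab;
};

/* Hypothetical stand-in for block_global_block: walk to the outermost block.  */
static const struct block *
outermost_block (const struct block *block)
{
  if (block == NULL)
    return NULL;

  while (block->superblock != NULL)
    block = block->superblock;

  return block;
}

/* Hypothetical stand-in for block_objfile.  */
static void *
objfile_of (const struct block *block)
{
  const struct global_block *gb;

  /* This dereference is only valid for a non-null block, so past this point
     the "block == NULL" branch of outermost_block cannot be taken.  */
  if (block->function != NULL)
    return block->function;

  gb = (const struct global_block *) outermost_block (block);
  assert (gb != NULL); /* holds for every block that survived the test above */
  return gb->compunit_symtab;
}
```

Calling objfile_of on any valid block returns the compunit_symtab of its outermost block without ever taking the NULL path that the analyzer reports.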

[Bug analyzer/96894] New: Analyzer assumes pointer is NULL, even if pointer was tested to be non-null before

2020-09-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96894

Bug ID: 96894
   Summary: Analyzer assumes pointer is NULL, even if pointer was
tested to be non-null before
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49174&action=edit
fibheap.c, preprocessed version from gdb/binutils master

Using gcc-11 (SUSE Linux) 11.0.0 20200901 (experimental) [revision
b1850c617b14eedaf60b358f3b7d4707cff73b8a].

Invoked like this:
...
$ gcc-11 fibheap.c -fanalyzer -S
...

We have:
...
fibheap.c: In function ‘fibnode_remove’:
fibheap.c:3122:42: warning: dereference of NULL ‘*(node).parent’ [CWE-690]
[-Wanalyzer-null-dereference]
 3122 |   && node->parent->child == node)
  |  ^~~
...

Looking at the source code, we have:
...
  3118    if (node->parent !=
  3119
  3120       ((void *)0)
  3121
  3122        && node->parent->child == node)
  3123      node->parent->child = ret;
...

So, just before dereferencing node->parent, we check that it's non-null, so the
warning that node->parent is dereferenced while it's null makes no sense.
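
A minimal sketch of the same pattern, reduced to a hypothetical two-field node (struct sketch_node and sketch_replace_child are illustration names, not the fibheap.c definitions), shows why the dereference is guarded:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of the fibheap node to the two fields involved.  */
struct sketch_node
{
  struct sketch_node *parent;
  struct sketch_node *child;
};

static void
sketch_replace_child (struct sketch_node *node, struct sketch_node *ret)
{
  assert (node != NULL);

  /* && short-circuits: node->parent->child is evaluated only after
     node->parent has been tested to be non-null.  */
  if (node->parent != NULL
      && node->parent->child == node)
    node->parent->child = ret;
}
```

With a node that has no parent, the second operand is never evaluated, so there is no path on which a null node->parent is dereferenced.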

[Bug target/96898] New: [nvptx] libatomic support

2020-09-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

Bug ID: 96898
   Summary: [nvptx] libatomic support
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

When building gcc for nvptx, we get:
...
checking for libatomic support... no
...

As mentioned here (
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553142.html  ),
libatomic support could be useful.

[Bug target/96898] [nvptx] libatomic support

2020-09-03 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

Tom de Vries  changed:

   What|Removed |Added

 CC||jakub at redhat dot com

--- Comment #1 from Tom de Vries  ---
Hmm, so libatomic needs to fall back onto protect_start and protect_end.

It would make sense for the openmp/openacc programs to have that map onto
GOMP_atomic_start/GOMP_atomic_end.

But this introduces a dependency of libatomic on libgomp.  Not ideal, I
suppose.

Maybe we could (at least for the nvptx case) move the global lock out of
libgomp into libatomic, and have a dependency of libgomp on libatomic instead.

[Bug target/96898] [nvptx] libatomic support

2020-09-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #2 from Tom de Vries  ---
Hmm, I found this difference: 
- AFAIU, GOMP_atomic_start/end have barrier semantics
- libatomic's protect_start/end are always paired with explicit barriers, so
  presumably these don't have barrier semantics

So, using GOMP_atomic_start for protect_start in libatomic will have the
effect of issuing the barrier twice, which might be a performance problem.
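
The concern can be modelled with a small single-threaded sketch. Everything here is hypothetical (full_barrier, barrier_count and the model_* functions are illustration names, and the barrier placement is an assumption rather than libatomic's or libgomp's actual code); it only counts how many full barriers are issued when a lock with built-in barrier semantics is wrapped by a caller that adds its own:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical bookkeeping: number of full barriers issued so far.
   Plain int, since this model is only meant to be run single-threaded.  */
static int barrier_count;

static void
full_barrier (void)
{
  atomic_thread_fence (memory_order_seq_cst);
  barrier_count++;
}

static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* Models a lock entry point that has barrier semantics built in.  */
static void
model_lock_with_barrier (void)
{
  while (atomic_flag_test_and_set_explicit (&model_lock, memory_order_relaxed))
    /* spin */ ;
  full_barrier ();
}

static void
model_unlock_with_barrier (void)
{
  full_barrier ();
  atomic_flag_clear_explicit (&model_lock, memory_order_relaxed);
}

/* Models a caller that pairs each lock operation with an explicit barrier.  */
static int
model_protected_increment (int *p)
{
  int v;

  assert (p != NULL);
  full_barrier ();              /* caller's explicit barrier */
  model_lock_with_barrier ();   /* the lock issues a second one */
  v = ++*p;
  model_unlock_with_barrier (); /* and a third */
  full_barrier ();              /* caller's explicit barrier again */
  return v;
}
```

In this model one protected operation issues four barriers, where a lock without barrier semantics would leave only the caller's two.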

[Bug target/96932] New: [nvptx] atomic_exchange missing barrier

2020-09-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96932

Bug ID: 96932
   Summary: [nvptx] atomic_exchange missing barrier
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

After digging into GOMP_atomic_start/end I realized these also imply barrier
semantics.

And looking at the source code used for nvptx in libgomp/config/accel/mutex.h,
that should be fine:
...
static inline void
gomp_mutex_lock (gomp_mutex_t *mutex)
{
  while (__sync_lock_test_and_set (mutex, 1))
    /* spin */ ;
}

static inline void
gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  __sync_lock_release (mutex);
}
...

However, when looking at the resulting code in libgomp.a we see there's no
barrier for GOMP_atomic_start:
...
.visible .func GOMP_atomic_start
{
.reg .u32 %r22;
.reg .pred %r23;
$L2:
.loc 1 51 10
atom.global.exch.b32 %r22,[atomic_lock],1;
.loc 1 51 9
setp.ne.u32 %r23,%r22,0;
@ %r23 bra $L2;
.loc 2 43 1
ret;
}
...

While there is for GOMP_atomic_end:
...
.visible .func GOMP_atomic_end
{
.reg .u32 %r22;
.loc 1 58 3
membar.sys;
mov.u32 %r22,0;
st.global.u32 [atomic_lock],%r22;
.loc 2 49 1
ret;
}
...
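
For comparison, the GCC documentation describes __sync_lock_test_and_set as an acquire barrier and __sync_lock_release as a release barrier. A sketch of the same lock with the ordering spelled out through the __atomic builtins could look as follows; sketch_mutex_t and the sketch_* functions are hypothetical names, and this is not a proposed fix for the nvptx port:

```c
#include <assert.h>

/* Hypothetical stand-in for gomp_mutex_t.  */
typedef int sketch_mutex_t;

static inline void
sketch_mutex_lock (sketch_mutex_t *mutex)
{
  /* Acquire ordering: accesses after the lock may not move before the
     exchange that takes it.  */
  while (__atomic_exchange_n (mutex, 1, __ATOMIC_ACQUIRE))
    /* spin */ ;
}

static inline void
sketch_mutex_unlock (sketch_mutex_t *mutex)
{
  /* The lock must be held by the caller.  */
  assert (__atomic_load_n (mutex, __ATOMIC_RELAXED) == 1);

  /* Release ordering: accesses before the unlock may not move after the
     store that drops it.  */
  __atomic_store_n (mutex, 0, __ATOMIC_RELEASE);
}
```

Writing the orderings out this way makes the expectation explicit: the lock side needs some ordering on the exchange, just as the unlock side has the membar.sys before the store.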

[Bug target/96898] [nvptx] libatomic support

2020-09-04 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #4 from Tom de Vries  ---
(In reply to Jakub Jelinek from comment #3)
> For OpenMP reductions, we really don't care what kind of mutex protects the
> updates, as long as it is the same for all updates of the same reduction.
> I believe we don't rely on any other synchronization effects.
> So, I think we should change omp-low.c so that it emits __atomic_* calls
> with __ATOMIC_RELAXED rather than __sync_* calls.

That sounds like a good idea.

> And could just use
> libatomic with its own locking if we didn't go the GOMP_atomic_{start,end}
> route (that one is done if there are multiple reductions or the atomics
> aren't available or there are user defined reductions we don't understand
> (or all?), perhaps we should consider also using atomics perhaps even for
> two simple reductions or similar.
> And nvptx certainly could just use libatomic...

If we use libatomic as fallback for openmp, shouldn't we then use the same lock
in both?

[Bug target/96964] New: [nvptx] Implement __atomic_test_and_set

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96964

Bug ID: 96964
   Summary: [nvptx] Implement __atomic_test_and_set
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Currently __atomic_test_and_set for nvptx falls back onto the "Failing all
else, assume a single threaded environment and simply perform the operation"
case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic
operation.

So, for test-case test.c:
...
int a;

int
main (void)
{
  int res = __atomic_test_and_set (&a, __ATOMIC_SEQ_CST);
  return res;
}
...
we get:
...
$ gcc test.c -S -o-
// BEGIN PREAMBLE
.version 3.1
.target sm_30
.address_size 64
// END PREAMBLE


// BEGIN GLOBAL FUNCTION DECL: main
.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64
%in_ar1);

// BEGIN GLOBAL FUNCTION DEF: main
.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64
%in_ar1)
{
.reg.u32 %value;
.local .align 16 .b8 %frame_ar[16];
.reg.u64 %frame;
cvta.local.u64 %frame, %frame_ar;
.reg.u32 %r22;
.reg.u32 %r23;
.reg.u32 %r24;
.reg.u32 %r25;
.reg.u32 %r26;
ld.global.u8 %r25, [a];
mov.u32 %r26, 1;
st.global.u8 [a], %r26;
cvt.u32.u8  %r22, %r25;
st.u32  [%frame], %r22;
ld.u32  %r23, [%frame];
mov.u32 %r24, %r23;
mov.u32 %value, %r24;
st.param.u32 [%value_out], %value;
ret;
}


// BEGIN GLOBAL VAR DEF: a
.visible .global .align 4 .u32 a[1];
...
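
The separate ld.global.u8 and st.global.u8 above are what makes this non-atomic: another thread can run between the load and the store. As a rough illustration of the intended semantics on a target that does have a byte-sized atomic exchange, test-and-set can be written as a single read-modify-write. The helper name sketch_test_and_set is hypothetical, and storing the value 1 is an assumption (the "set" value of __atomic_test_and_set is implementation defined):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: test-and-set as one atomic byte exchange.
   Returns true if the flag was already set.  */
static bool
sketch_test_and_set (unsigned char *ptr)
{
  assert (ptr != NULL);
  return __atomic_exchange_n (ptr, 1, __ATOMIC_SEQ_CST) != 0;
}
```

The first call on a clear flag returns false and sets it; every later call returns true until the flag is cleared again.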

[Bug target/96964] [nvptx] Implement __atomic_test_and_set

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96964

--- Comment #1 from Tom de Vries  ---
This is an attempt to implement it by using a fallback in libatomic (see also
PR96898):
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4168190fa42..612240661f8 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -54,6 +54,7 @@
UNSPECV_LOCK
UNSPECV_CAS
UNSPECV_XCHG
+   UNSPECV_TAS
UNSPECV_BARSYNC
UNSPECV_MEMBAR
UNSPECV_MEMBAR_CTA
@@ -1667,6 +1668,35 @@
   "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"
   [(set_attr "atomic" "true")])

+(define_insn "atomic_test_and_set"
+  [(set (match_operand:QI 0 "nvptx_register_operand" "=R")
+(unspec_volatile:QI
+  [(match_operand:QI 1 "memory_operand" "+m")
+  (match_operand:SI 2 "const_int_operand") ;; model
+ ]
+  UNSPECV_TAS))
+   (set (match_dup 1)
+(unspec_volatile:QI [(match_dup 1)] UNSPECV_TAS))]
+  ""
+  { operands[1] = XEXP (operands[1], 0);
+return
+  "// BEGIN GLOBAL FUNCTION DECL: __atomic_test_and_set_1\n"
+  ".extern .func (.param .u32 %%value_out)"
+  " __atomic_test_and_set_1 (.param .u64 %%in_ar0, .param .u32
%%in_ar1);\n"
+  "{\n"
+  " .param .u32 %%value_in;\n"
+  " .param .u64 %%out_arg1;\n"
+  " .reg.u64 %%ptr;\n"
+  " cvta.global.u64 %%ptr, %1;\n"
+  " st.param.u64 [%%out_arg1],%%ptr;\n"
+  " .param .u32 %%out_arg2;\n"
+  " st.param.u32 [%%out_arg2],%2;\n"
+  " call (%%value_in),__atomic_test_and_set_1,(%%out_arg1,%%out_arg2);\n"
+  " ld.param.u32 %0,[%%value_in];\n"
+  "}";
+  }
+[(set_attr "atomic" "true")])
+
 (define_insn "nvptx_barsync"
   [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
 (match_operand:SI 1 "const_int_operand")]
...

Funnily enough, doing this has the side-effect that the fallback
__atomic_test_and_set_1 in libatomic is fixed.

[Bug target/96898] [nvptx] libatomic support

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #6 from Tom de Vries  ---
Created attachment 49195
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49195&action=edit
Tentative patch

Introduces an option -fatomic-libcalls (analogous to -fsync-libcalls) such that
__atomic_test_and_set maps onto libatomic function __atomic_test_and_set_1.

[Bug target/96898] [nvptx] libatomic support

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #7 from Tom de Vries  ---
(In reply to Tom de Vries from comment #6)
> Created attachment 49195 [details]
> Tentative patch
> 
> Introduces an option -fatomic-libcalls (analogous to -fsync-libcalls) such
> that __atomic_test_and_set maps onto libatomic function
> __atomic_test_and_set_1.

I've now achieved the same in the target, not relying on a new option:
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4168190fa42..6178e6a0f77 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1667,6 +1667,22 @@
   "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"
   [(set_attr "atomic" "true")])

+(define_expand "atomic_test_and_set"
+  [(match_operand:SI 0 "nvptx_register_operand")   ;; output
+   (match_operand:QI 1 "memory_operand")   ;; memory
+   (match_operand:SI 2 "const_int_operand")]   ;; model
+  ""
+{
+  rtx libfunc;
+  rtx addr;
+  libfunc = init_one_libfunc ("__atomic_test_and_set_1");
+  addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0));
+  emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode,
+ addr, ptr_mode,
+ operands[2], SImode);
+  DONE;
+})
+
 (define_insn "nvptx_barsync"
   [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
 (match_operand:SI 1 "const_int_operand")]
...

[Bug target/96964] [nvptx] Implement __atomic_test_and_set

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96964

--- Comment #2 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553393.html

[Bug target/96898] [nvptx] libatomic support

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #8 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553393.html
