Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-02 Thread Tom de Vries
On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
>>> Now it follows the formula contained in
>>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>
>> Any reason we're not using the cuda runtime functions to get the
>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>> to determine launch configuration in nvptx ) ?
> 
> There are two reasons:
> 
>   1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>  properties instead of the CUDA driver API. However, we can always
>  teach libgomp how to populate the cudaDeviceProp struct using the
>  driver API.
> 
>   2) CUDA is not always present on the build host, and that's why
>  libgomp maintains its own cuda.h. So at the very least, this
>  functionality would be good to have in libgomp as a fallback
>  implementation;

Libgomp maintains its own cuda.h to "allow building GCC with PTX
offloading even without CUDA being installed" (
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).

The libgomp nvptx plugin however uses the cuda driver API to launch
kernels etc, so we can assume that's always available at launch time.
And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
Configuration", the occupancy API is also available in the driver API.

What we cannot assume to be available is the occupancy API pre cuda-6.5.
So it's fine to have a fallback for that (properly isolated in utility
functions), but for cuda 6.5 and up we want to use the occupancy API.
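
A minimal sketch of what "properly isolated" could look like, in C.
Everything named here is hypothetical: occupancy_query_fn stands in for a
thin wrapper around the driver's occupancy call
(cuOccupancyMaxPotentialBlockSize, available from CUDA 6.5), sample_query is
a stand-in used only to exercise the sketch, and the fallback numbers are
placeholders rather than the formula from the patch under review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper type: returns 0 on success and fills in the
   suggested geometry; NULL means the occupancy API is unavailable.  */
typedef int (*occupancy_query_fn) (int *min_grid_size, int *block_size);

struct launch_geometry
{
  int grid_size;
  int block_size;
};

/* Stand-in for a real driver query, for illustration only.  */
int
sample_query (int *min_grid_size, int *block_size)
{
  *min_grid_size = 160;
  *block_size = 256;
  return 0;
}

/* Placeholder fallback heuristic (an assumption, not the patch's formula):
   one block per multiprocessor, as many threads as the device allows.  */
static struct launch_geometry
fallback_geometry (int num_sms, int max_threads_per_block)
{
  struct launch_geometry g = { num_sms, max_threads_per_block };
  return g;
}

/* Prefer the occupancy query when present and successful; otherwise use
   the fallback.  Callers never need to know which path was taken.  */
struct launch_geometry
choose_geometry (occupancy_query_fn query, int num_sms,
                 int max_threads_per_block)
{
  assert (num_sms > 0 && max_threads_per_block > 0);
  if (query != NULL)
    {
      struct launch_geometry g;
      if (query (&g.grid_size, &g.block_size) == 0)
        return g;
    }
  return fallback_geometry (num_sms, max_threads_per_block);
}
```

Passing NULL for the query models a pre-6.5 driver, so the fallback path can
be tested without a GPU.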

>  it's not good to have a program fail due to
>  insufficient hardware resources errors when it is avoidable.
>

Right, in fact there are two separate things you're trying to address
here: launch failure and occupancy heuristic, so split the patch.

Thanks,
- Tom


Re: [testsuite/guality, committed] Prevent optimization of local in vla-1.c

2018-07-03 Thread Tom de Vries
On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
>> Given the array has size i + 1 its upper bound should be 'i' and 'i'
>> should be available via DW_OP_[GNU_]entry_value.
>>
>> I see it is
>>
>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>>
>> and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
>> storage itself doesn't have a location but the
>> type specifies the size.
>>
>> (gdb) ptype a
>> type = char [variable length]
>> (gdb) p sizeof(a)
>> $3 = 0
>>
>> this looks like a gdb bug to me?
>>

With gdb patch:
...
diff --git a/gdb/findvar.c b/gdb/findvar.c
index 8ad5e25cb2..ebaff923a1 100644
--- a/gdb/findvar.c
+++ b/gdb/findvar.c
@@ -789,6 +789,8 @@ default_read_var_value
   break;

 case LOC_OPTIMIZED_OUT:
+  if (is_dynamic_type (type))
+   type = resolve_dynamic_type (type, NULL,
+/* Unused address.  */ 0);
   return allocate_optimized_out_value (type);

 default:
...

I get:
...
$ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.

Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
17        return a[0];
$1 = 6
...

So I'd say it's a gdb bug indeed.

Thanks,
- Tom

>> Btw, the location expression looks odd, if I decipher it correctly
>> we end up with ((%rdi << 32) >> 32) - 1 which computes to 4
>> but the upper bound should be 5.  The GIMPLE debug stmts compute
>> the upper bound as (sizetype)((long)(i_1(D) + 1) - 1)
> 
> The << 32 >> 32 is sign extension.  And yes, for f1 I don't see why
> DW_OP_GNU_entry_value shouldn't work, i in main is needed for the call to
> f2, so needs to live in some register or memory in that function until the
> second call.  For f2 i is needed after the bar call for the a[i + 4] read,
> worst case in form of precomputed i + 4, but that is reversible op.
> 
>   Jakub
> 
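
To make the arithmetic concrete, here is a small C sketch that walks the
DW_AT_upper_bound expression quoted above by hand. The function names are
made up for illustration and it assumes a 64-bit DWARF stack; it is not how
gdb evaluates expressions. Note that "DW_OP_breg5 (rdi): 1" pushes the
register value plus the offset 1.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of "DW_OP_const1u 32; DW_OP_shl; DW_OP_const1u 32; DW_OP_shra"
   on a 64-bit stack slot: keep the low 32 bits and sign-extend them.  */
static int64_t
sign_extend_low32 (uint64_t v)
{
  uint32_t low = (uint32_t) v;
  if (low & UINT32_C (0x80000000))
    return (int64_t) low - INT64_C (0x100000000);
  return (int64_t) low;
}

/* Upper bound of the VLA, given the value held in %rdi.  */
int64_t
vla_upper_bound (uint64_t rdi)
{
  uint64_t top = rdi + 1;                 /* DW_OP_breg5 (rdi): 1 */
  int64_t ext = sign_extend_low32 (top);  /* shl 32, then shra 32 */
  return ext - 1;                         /* DW_OP_lit1; DW_OP_minus */
}

/* Size in bytes of a char array with bounds 0..upper_bound.  */
uint64_t
char_vla_size (int64_t upper_bound)
{
  assert (upper_bound >= -1);
  return (uint64_t) (upper_bound + 1);
}
```

With %rdi holding 5 the upper bound comes out as 5, which gives a char array
of 6 bytes, the same value the patched gdb prints for sizeof (a).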


Re: [testsuite/guality, committed] Prevent optimization of local in vla-1.c

2018-07-04 Thread Tom de Vries
On 07/03/2018 11:05 AM, Tom de Vries wrote:
> On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
>> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
>>> Given the array has size i + 1 its upper bound should be 'i' and 'i'
>>> should be available via DW_OP_[GNU_]entry_value.
>>>
>>> I see it is
>>>
>>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
>>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
>>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>>>
>>> and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
>>> storage itself doesn't have a location but the
>>> type specifies the size.
>>>
>>> (gdb) ptype a
>>> type = char [variable length]
>>> (gdb) p sizeof(a)
>>> $3 = 0
>>>
>>> this looks like a gdb bug to me?
>>>
> 
> With gdb patch:
> ...
> diff --git a/gdb/findvar.c b/gdb/findvar.c
> index 8ad5e25cb2..ebaff923a1 100644
> --- a/gdb/findvar.c
> +++ b/gdb/findvar.c
> @@ -789,6 +789,8 @@ default_read_var_value
>break;
> 
>  case LOC_OPTIMIZED_OUT:
> +  if (is_dynamic_type (type))
> +   type = resolve_dynamic_type (type, NULL,
> +/* Unused address.  */ 0);
>return allocate_optimized_out_value (type);
> 
>  default:
> ...
> 
> I get:
> ...
> $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
> Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.
> 
> Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
> 17        return a[0];
> $1 = 6
> ...
> 

Well, for -O1 and -O2.

For -O3, I get instead:
...
$ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
Breakpoint 1 at 0x4004b0: f1. (2 locations)

Breakpoint 1, f1 (i=5) at vla-1.c:17
17        return a[0];
$1 = 0
...

At -O3, f1 is cloned to a version f1.constprop with no parameters,
eliminating parameter i, but i is still accessible via
DW_OP_GNU_parameter_ref:
...
$ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p i" -ex "info addr i"
Breakpoint 1 at 0x4004b0: f1. (2 locations)

Breakpoint 1, f1 (i=5) at vla-1.c:17
17        return a[0];
$1 = 5
Symbol "i" is a complex DWARF expression:
 0: DW_OP_GNU_parameter_ref offset 125
 5: DW_OP_stack_value
.
...

Variable a in f1 has full info available:
...
 <2><1f6>: Abbrev Number: 20 (DW_TAG_variable)
<1f7>   DW_AT_name: a
<1f9>   DW_AT_decl_file   : 1
<1fa>   DW_AT_decl_line   : 15
<1fb>   DW_AT_decl_column : 8
<1fc>   DW_AT_type: <0x201>
 <2><200>: Abbrev Number: 0
 <1><201>: Abbrev Number: 15 (DW_TAG_array_type)
<202>   DW_AT_type: <0x21b>
<206>   DW_AT_sibling : <0x21b>
 <2><20a>: Abbrev Number: 21 (DW_TAG_subrange_type)
<20b>   DW_AT_type: <0x1ce>
<20f>   DW_AT_upper_bound :
10 byte block: 75 1 8 20 24 8 20 26 31 1c
(DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
 DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1;
 DW_OP_minus)
 <2><21a>: Abbrev Number: 0
...

but a in f1.constprop has no DW_AT_upper_bound:
...
 <2><26e>: Abbrev Number: 26 (DW_TAG_variable)
<26f>   DW_AT_abstract_origin: <0x1f6>
<273>   DW_AT_type: <0x284>
 <2><283>: Abbrev Number: 0
 <1><284>: Abbrev Number: 15 (DW_TAG_array_type)
<285>   DW_AT_type: <0x21b>
<289>   DW_AT_sibling : <0x293>
 <2><28d>: Abbrev Number: 28 (DW_TAG_subrange_type)
<28e>   DW_AT_type: <0x1ce>
 <2><292>: Abbrev Number: 0
...

AFAIU, we could emit a DW_AT_upper_bound here as well, using
DW_OP_GNU_parameter_ref as we do for i.

Thanks,
- Tom


[testsuite] Simplify dg-final

2018-07-05 Thread Tom de Vries
[ was: [PATCH, testsuite/guality] Use line number vars in gdb-test ]

On Wed, Jul 04, 2018 at 08:32:49PM +0100, Richard Sandiford wrote:
> Tom de Vries  writes:
> > +proc dg-final { args } {
> > +upvar dg-final-code final-code
> > +
> > +if { [llength $args] > 2 } {
> > +   error "[lindex $args 0]: too many arguments"
> > +}
> > +set line [lindex $args 0]
> > +set code [lindex $args 1]
> > +set directive [lindex $code 0]
> > +set withline \
> > +   [switch $directive {
> > +   gdb-test {expr {1}}
> > +   default  {expr {0}}
> > +   }]
> > +if { $withline == 1 } {
> > +   set code [linsert $code 1 $line]
> > +}
> > +append final-code "$code\n"
> > +}
> 
> Like the idea, but I think:
> 
> set withline \
>   [switch $directive {
>   gdb-test {expr {1}}
>   default  {expr {0}}
>   }]
> if { $withline == 1 } {
>   set code [linsert $code 1 $line]
> }
> 
> would be clearer as:
> 
> switch $directive {
>   gdb-test {
>   set code [linsert $code 1 $line]
>   }
> }

Agreed, thanks for the comment.  Committed as below.

Thanks,
- Tom

[testsuite] Simplify dg-final

2018-07-05  Tom de Vries  

* lib/gcc-dg.exp (dg-final): Simplify tcl code.

---
 gcc/testsuite/lib/gcc-dg.exp | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 9e0b3f4ef95..f5e6bef5dd9 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -123,13 +123,10 @@ proc dg-final { args } {
 set line [lindex $args 0]
 set code [lindex $args 1]
 set directive [lindex $code 0]
-set withline \
-   [switch $directive {
-   gdb-test {expr {1}}
-   default  {expr {0}}
-   }]
-if { $withline == 1 } {
-   set code [linsert $code 1 $line]
+switch $directive {
+   gdb-test {
+   set code [linsert $code 1 $line]
+   }
 }
 append final-code "$code\n"
 }


[PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-05 Thread Tom de Vries
[ was: Re: [testsuite/guality, committed] Prevent optimization of local in
vla-1.c ]

On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
> On 07/03/2018 11:05 AM, Tom de Vries wrote:
> > On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
> >> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
> >>> Given the array has size i + 1 its upper bound should be 'i' and 'i'
> >>> should be available via DW_OP_[GNU_]entry_value.
> >>>
> >>> I see it is
> >>>
> >>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
> >>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
> >>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
> >>>
> >>> and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
> >>> storage itself doesn't have a location but the
> >>> type specifies the size.
> >>>
> >>> (gdb) ptype a
> >>> type = char [variable length]
> >>> (gdb) p sizeof(a)
> >>> $3 = 0
> >>>
> >>> this looks like a gdb bug to me?
> >>>
> > 
> > With gdb patch:
> > ...
> > diff --git a/gdb/findvar.c b/gdb/findvar.c
> > index 8ad5e25cb2..ebaff923a1 100644
> > --- a/gdb/findvar.c
> > +++ b/gdb/findvar.c
> > @@ -789,6 +789,8 @@ default_read_var_value
> >break;
> > 
> >  case LOC_OPTIMIZED_OUT:
> > +  if (is_dynamic_type (type))
> > +   type = resolve_dynamic_type (type, NULL,
> > +/* Unused address.  */ 0);
> >return allocate_optimized_out_value (type);
> > 
> >  default:
> > ...
> > 
> > I get:
> > ...
> > $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
> > Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.
> > 
> > Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
> > 17        return a[0];
> > $1 = 6
> > ...
> > 
> 
> Well, for -O1 and -O2.
> 
> For O3, I get instead:
> ...
> $ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
> Breakpoint 1 at 0x4004b0: f1. (2 locations)
> 
> Breakpoint 1, f1 (i=5) at vla-1.c:17
> 17        return a[0];
> $1 = 0
> ...
> 

Hi,

When compiling guality/vla-1.c with -O3 -g, vla 'a[i + 1]' in f1 is optimized
away, but f1 still contains a debug expression describing the upper bound of the
vla (D.1914):
...
 __attribute__((noinline))
 f1 (intD.6 iD.1900)
 {
   
   saved_stack.1_2 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
   # DEBUG D#3 => i_1(D) + 1
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1914 => (sizetype) D#1
...

Then f1 is cloned to a version f1.constprop with no parameters, eliminating
parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.
Consequently, 'print sizeof (a)' yields '0' in gdb.
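
The effect of that NULL can be modelled in a few lines of C. This is only an
illustration of how one unknown bind makes the whole chain unknown; dbg_val
and the helper names are invented here and do not correspond to GCC data
structures. The casts in the chain are no-ops in this model because every
value is kept as a 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A debug bind value: either a known integer or "optimized out".  */
struct dbg_val
{
  bool known;
  int64_t value;
};

/* Add a constant; an unknown operand stays unknown.  */
static struct dbg_val
dbg_add (struct dbg_val a, int64_t c)
{
  if (a.known)
    a.value += c;
  return a;
}

/* Mirrors the chain of binds that ends in D.1914.  */
struct dbg_val
vla_bound_bind (struct dbg_val i)
{
  struct dbg_val d3 = dbg_add (i, 1);    /* D#3 => i_1(D) + 1 */
  struct dbg_val d2 = d3;                /* D#2 => (long int) D#3 */
  struct dbg_val d1 = dbg_add (d2, -1);  /* D#1 => D#2 + -1 */
  return d1;                             /* D.1914 => (sizetype) D#1 */
}

/* Read a value that the caller has already checked to be known.  */
int64_t
dbg_get (struct dbg_val v)
{
  assert (v.known);
  return v.value;
}
```

Feeding in a known i of 5 yields a bound of 5; feeding in an unknown i, as
happens once the parameter is eliminated, leaves the bound unknown as well.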

This patch fixes that by recognizing eliminated parameters in remap_ssa_name,
defining a debug expression linking back to the eliminated parameter, and
using that debug expression to replace references to the eliminated parameter:
...
 __attribute__((noinline))
 f1.constprop ()
 {
   intD.6 iD.1949;

   
   # DEBUG D#8 s=> iD.1900
   # DEBUG iD.1949 => D#8

   
+  # DEBUG D#6 s=> iD.1900
   saved_stack.1_1 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
-  # DEBUG D#3 => NULL
+  # DEBUG D#3 => D#6 + 1
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1951 => (sizetype) D#1
...

The inserted debug expression (D#6) is a duplicate of the debug expression
that will be inserted after copy_body in tree_function_versioning (D#8), so the
patch contains a todo to fix the duplication.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[debug] Handle references to skipped params in remap_ssa_name

2018-07-05  Tom de Vries  

* tree-inline.c (remap_ssa_name): Handle references to skipped
params.

---
 gcc/tree-inline.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 427ef959740..0fa996cab49 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -204,11 +204,22 @@ remap_ssa_name (tree name, copy_body_data *id)
  gimple *def_temp;
  gimple_stmt_iterator gsi;
  tree val = SSA_NAME_VAR (name);
+ bool skipped_parm_decl = false;
 
  n = id->decl_map->get (val

Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-05 Thread Tom de Vries
On 07/05/2018 01:39 PM, Richard Biener wrote:
> On Thu, Jul 5, 2018 at 1:25 PM Tom de Vries  wrote:
>>
>> [ was: Re: [testsuite/guality, committed] Prevent optimization of local in
>> vla-1.c ]
>>
>> On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
>>> On 07/03/2018 11:05 AM, Tom de Vries wrote:
>>>> On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
>>>>> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
>>>>>> Given the array has size i + 1 its upper bound should be 'i' and 'i'
>>>>>> should be available via DW_OP_[GNU_]entry_value.
>>>>>>
>>>>>> I see it is
>>>>>>
>>>>>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
>>>>>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
>>>>>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>>>>>>
>>>>>> and %rdi is 1.  Not sure why gdb fails to print its length.  Yes, the
>>>>>> storage itself doesn't have a location but the
>>>>>> type specifies the size.
>>>>>>
>>>>>> (gdb) ptype a
>>>>>> type = char [variable length]
>>>>>> (gdb) p sizeof(a)
>>>>>> $3 = 0
>>>>>>
>>>>>> this looks like a gdb bug to me?
>>>>>>
>>>>
>>>> With gdb patch:
>>>> ...
>>>> diff --git a/gdb/findvar.c b/gdb/findvar.c
>>>> index 8ad5e25cb2..ebaff923a1 100644
>>>> --- a/gdb/findvar.c
>>>> +++ b/gdb/findvar.c
>>>> @@ -789,6 +789,8 @@ default_read_var_value
>>>>break;
>>>>
>>>>  case LOC_OPTIMIZED_OUT:
>>>> +  if (is_dynamic_type (type))
>>>> +   type = resolve_dynamic_type (type, NULL,
>>>> +/* Unused address.  */ 0);
>>>>return allocate_optimized_out_value (type);
>>>>
>>>>  default:
>>>> ...
>>>>
>>>> I get:
>>>> ...
>>>> $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
>>>> Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.
>>>>
>>>> Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
>>>> 17        return a[0];
>>>> $1 = 6
>>>> ...
>>>>
>>>
>>> Well, for -O1 and -O2.
>>>
>>> For O3, I get instead:
>>> ...
>>> $ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
>>> Breakpoint 1 at 0x4004b0: f1. (2 locations)
>>>
>>> Breakpoint 1, f1 (i=5) at vla-1.c:17
>>> 17        return a[0];
>>> $1 = 0
>>> ...
>>>
>>
>> Hi,
>>
>> When compiling guality/vla-1.c with -O3 -g, vla 'a[i + 1]' in f1 is optimized
>> away, but f1 still contains a debug expression describing the upper bound of 
>> the
>> vla (D.1914):
>> ...
>>  __attribute__((noinline))
>>  f1 (intD.6 iD.1900)
>>  {
>>
>>saved_stack.1_2 = __builtin_stack_save ();
>># DEBUG BEGIN_STMT
>># DEBUG D#3 => i_1(D) + 1
>># DEBUG D#2 => (long intD.8) D#3
>># DEBUG D#1 => D#2 + -1
>># DEBUG D.1914 => (sizetype) D#1
>> ...
>>
>> Then f1 is cloned to a version f1.constprop with no parameters, eliminating
>> parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.
>> Consequently, 'print sizeof (a)' yields '0' in gdb.
> 
> So does gdb correctly recognize there isn't any size available or do we 
> somehow
> generate invalid debug info, not recognizing that D#3 => NULL means
> "optimized out" and thus all dependent expressions are "optimized out" as 
> well?
> 
> That is, shouldn't gdb do
> 
> (gdb) print sizeof (a)
> <optimized out>
> 
> ?

The type for the vla gcc is emitting is a DW_TAG_array_type with
DW_TAG_subrange_type without DW_AT_upper_bound or DW_AT_count, which
makes the upper bound value 'unknown'. So I'd say the debug info is valid.

Using this gdb patch:
...
diff --git a/gdb/eval.c b/gdb/eval.c
index 9db6e7c69d..ea6f782c5b 100644
--- a/gdb/eval.c
+++ b/gdb/eval.c
@@ -3145,6 +3145,8 @@ evaluate_subexp_for_sizeof (...)
{
  val = evaluate_subexp (NULL_TYPE, exp, pos, EVAL_NORMAL);
  type = value_type (val);
+ if (TYPE_LENGTH (type) == 0)
+   return allocate_optimized_out_value (size_type);
}
   else
(*pos) += 4;
...

I get:
...
$ ./gdb vla-1.exe -batch -ex "b f1" -ex run -ex "p sizeof (a)"
Breakpoint 1 at 0x4004b0: f1. (2 locations)

Breakpoint 1, f1 (i=5) at vla-1.c:17
17        return a[0];
$1 = <optimized out>
...

Thanks,
- Tom


Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-06 Thread Tom de Vries
On 07/05/2018 01:39 PM, Richard Biener wrote:
> On Thu, Jul 5, 2018 at 1:25 PM Tom de Vries  wrote:
>>
>> [ was: Re: [testsuite/guality, committed] Prevent optimization of local in
>> vla-1.c ]
>>
>> On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
>>> On 07/03/2018 11:05 AM, Tom de Vries wrote:
>>>> On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
>>>>> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:



>> [debug] Handle references to skipped params in remap_ssa_name
>>
>> 2018-07-05  Tom de Vries  
>>
>> * tree-inline.c (remap_ssa_name): Handle references to skipped
>> params.
>>
>> ---
>>  gcc/tree-inline.c | 17 +++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>> index 427ef959740..0fa996cab49 100644
>> --- a/gcc/tree-inline.c
>> +++ b/gcc/tree-inline.c
>> @@ -204,11 +204,22 @@ remap_ssa_name (tree name, copy_body_data *id)
>>   gimple *def_temp;
>>   gimple_stmt_iterator gsi;
>>   tree val = SSA_NAME_VAR (name);
>> + bool skipped_parm_decl = false;
>>
>>   n = id->decl_map->get (val);
>>   if (n != NULL)
>> -   val = *n;
>> - if (TREE_CODE (val) != PARM_DECL)
>> +   {
>> + if (TREE_CODE (*n) == DEBUG_EXPR_DECL)
>> +   return *n;
>> +
>> + if (TREE_CODE (*n) == VAR_DECL
>> + && DECL_ABSTRACT_ORIGIN (*n)
>> + && TREE_CODE (DECL_ABSTRACT_ORIGIN (*n)) == PARM_DECL)
>> +   skipped_parm_decl = true;
>> + else
>> +   val = *n;
>> +   }
>> + if (TREE_CODE (val) != PARM_DECL && !skipped_parm_decl)
> 
> I wonder if this cannot be more easily set up in copy_arguments_for_versioning
> which already does
> 
> else if (!id->decl_map->get (arg))
>   {
> /* Make an equivalent VAR_DECL.  If the argument was used
>as temporary variable later in function, the uses will be
>replaced by local variable.  */
> tree var = copy_decl_to_var (arg, id);
> insert_decl_map (id, arg, var);
> /* Declare this new variable.  */
> DECL_CHAIN (var) = *vars;
> *vars = var;
>   }
> 
> which just misses to re-map the default def of the PARM_DECL (in case it 
> exists)
> to the same(?) var?

I've updated the patch to add a debug expr here in
copy_arguments_for_versioning for every parameter that has a default
def, and to use that debug expr in remap_ssa_name.

> All remaining uses should be in debug stmts (I hope).

I ran into a test-case where that was not the case, so I had to handle
that in remap_ssa_name, the comment in the patch describes that in more
detail.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
[debug] Handle references to skipped params in remap_ssa_name

When compiling guality/vla-1.c with -O3 -g, vla a in f1 is optimized away, but
f1 still contains a debug expression describing the upper bound of the vla
(D.1914):
...
 __attribute__((noinline))
 f1 (intD.6 iD.1900)
 {
   
   saved_stack.1_2 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
   # DEBUG D#3 => i_1(D) + 1
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1914 => (sizetype) D#1
...

Then f1 is cloned to a version f1.constprop with no parameters, eliminating
parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.
Consequently, print sizeof (a) yields '0' in gdb.

This patch fixes that by defining debug expressions for default defs of
eliminated parameters in copy_arguments_for_versioning:
...
 __attribute__((noinline))
 f1.constprop ()
 {
   intD.6 iD.1949;

   
   # DEBUG D#8 s=> iD.1900
   # DEBUG iD.1949 => D#8
+  # DEBUG D#6 s=> iD.1900

   
   saved_stack.1_1 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
-  # DEBUG D#3 => NULL
+  # DEBUG D#3 => D#6 + 1
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1951 => (sizetype) D#1
...

The inserted debug expression (D#6) is a duplicate of the debug expression
that will be inserted after copy_body in tree_function_versioning (D#8), so the
patch contains a todo to fix the duplication.

Bootstrapped and reg-tested on x86_64.

2018-07-05  Tom de Vries  

	* tree-inline.c (create_debug_expr): Factor out of ...
	(remap_ssa_name): ... here.  Use debug expr only f

[PATCH] Fix sigsegv on -fdump-tree-all-enumerate_locals

2018-07-06 Thread Tom de Vries
Hi,

this patch fixes a sigsegv when using -fdump-tree-all-enumerate_locals, by
handling cfun->cfg == NULL conservatively in dump_enumerated_decls.

OK for trunk?

Thanks,
- Tom

Fix sigsegv on -fdump-tree-all-enumerate_locals

2018-07-06  Tom de Vries  

* tree-dfa.c (dump_enumerated_decls): Handle cfun->cfg == NULL.

* gcc.misc-tests/options.exp (check_for_all_options): Clean up dump
files.
(get_dump_flags): New proc.
(toplevel): Test all dump flags.

---
 gcc/testsuite/gcc.misc-tests/options.exp | 38 
 gcc/tree-dfa.c   |  3 +++
 2 files changed, 41 insertions(+)

diff --git a/gcc/testsuite/gcc.misc-tests/options.exp b/gcc/testsuite/gcc.misc-tests/options.exp
index 693b40df1fd..faeae705c08 100644
--- a/gcc/testsuite/gcc.misc-tests/options.exp
+++ b/gcc/testsuite/gcc.misc-tests/options.exp
@@ -52,6 +52,10 @@ proc check_for_all_options {language gcc_options compiler_pattern as_pattern ld_
 }
 set gcc_output [gcc_target_compile $filename.c $filename.x executable $gcc_options]
 remote_file build delete $filename.c $filename.x $filename.gcno
+set dumpfiles [glob -nocomplain $filename.c.*]
+foreach dumpfile $dumpfiles {
+   remote_file build delete $dumpfile
+}   
 
 if {![regexp -- "/${compiler}(\\.exe)? -quiet.*$compiler_pattern" $gcc_output]} {
fail "$test (compiler options)"
@@ -70,4 +74,38 @@ proc check_for_all_options {language gcc_options compiler_pattern as_pattern ld_
 
 check_for_all_options c {--coverage} {-fprofile-arcs -ftest-coverage} {} {-lgcov}
 
+proc get_dump_flags {} {
+set res [list]
+
+global srcdir
+set file "$srcdir/../dumpfile.c"
+
+set a [open $file]
+set lines [split [read $a] "\n"]
+close $a
+
+set domatch 0
+foreach line $lines {
+   if { [regexp "dump_options.* =" $line] } {
+   set domatch 1
+   } elseif { [regexp "^\};" $line] } {
+   set domatch 0
+   }
+   if { $domatch } {
+   if { [regexp "\"(.*)\"" $line match submatch] } {
+   lappend res $submatch
+   }
+   }
+}
+
+return $res
+}
+
+foreach flag [get_dump_flags] {
+check_for_all_options c -fdump-tree-all-$flag {} {} {}
+check_for_all_options c -fdump-ipa-all-$flag {} {} {}
+check_for_all_options c -fdump-rtl-all-$flag {} {} {}
+check_for_all_options c -fdump-lang-all-$flag {} {} {}
+}
+
 gcc_parallel_test_enable 1
diff --git a/gcc/tree-dfa.c b/gcc/tree-dfa.c
index 00aa75f47ab..ee2ff2958db 100644
--- a/gcc/tree-dfa.c
+++ b/gcc/tree-dfa.c
@@ -992,6 +992,9 @@ dump_enumerated_decls_push (tree *tp, int *walk_subtrees, void *data)
 void
 dump_enumerated_decls (FILE *file, dump_flags_t flags)
 {
+  if (!cfun->cfg)
+return;
+
   basic_block bb;
   struct walk_stmt_info wi;
   auto_vec decl_list;


[PATCH][debug] Handle debug references to skipped params

2018-07-08 Thread Tom de Vries
On Fri, Jul 06, 2018 at 04:38:50PM +0200, Richard Biener wrote:
> On Fri, Jul 6, 2018 at 12:47 PM Tom de Vries  wrote:
> > On 07/05/2018 01:39 PM, Richard Biener wrote:



> I now also spotted the code in remap_ssa_name that is supposed to handle
> this it seems and for the testcase we only give up because the PARM_DECL is
> remapped to a VAR_DECL.  So I suppose it is to be handled via the
> debug-args stuff
> which probably lacks in the area of versioning.
> 
> Your patch feels like it adds stuff ontop of existing mechanisms that
> should "just work"
> with the correct setup at the correct places...
> 

Hmm, I realized that I may be complicating things, by trying to do an
optimal fix in a single patch, so I decided to write two patches, one
with a fix, and then one improving the fix to be more optimal.

Also, I suspect that the "just work" approach is this:
...
   # DEBUG D#8 s=> iD.1900
   # DEBUG iD.1949 => D#8
   # DEBUG D#6 s=> iD.1949
...
whereas previously I tried to map 'D#6' on iD.1900 directly.

First patch OK for trunk?

[debug] Handle debug references to skipped params

When compiling guality/vla-1.c with -O3 -g, vla a in f1 is optimized away, but
f1 still contains a debug expression describing the upper bound of the vla
(D.1914):
...
 __attribute__((noinline))
 f1 (intD.6 iD.1900)
 {
   
   saved_stack.1_2 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
   # DEBUG D#3 => i_1(D) + 1
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1914 => (sizetype) D#1
...

Then f1 is cloned to a version f1.constprop with no parameters, eliminating
parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.

This patch fixes that by defining debug expressions for default defs of
eliminated parameters in remap_ssa_name:
...
 __attribute__((noinline))
 f1.constprop ()
 {
   intD.6 iD.1949;

   
   # DEBUG D#8 s=> iD.1900
   # DEBUG iD.1949 => D#8

   
+  # DEBUG D#6 s=> iD.1949
   saved_stack.1_1 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
-  # DEBUG D#3 => NULL
+  # DEBUG D#3 => D#6 + 1
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1951 => (sizetype) D#1
...

Bootstrapped and reg-tested on x86_64.

2018-07-07  Tom de Vries  

* cfgexpand.c (expand_debug_source_expr): Handle VAR_DECL.
* tree-inline.c (remap_ssa_name): Handle default def ssa_name mapping
onto VAR_DECL with abstract origin.

* gcc.dg/vla-1.c: New test.

---
 gcc/cfgexpand.c  |  4 
 gcc/testsuite/gcc.dg/vla-1.c | 25 +
 gcc/tree-inline.c|  4 +++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 9b91279282e..d6e3c382085 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5141,6 +5141,10 @@ expand_debug_source_expr (tree exp)
 
   switch (TREE_CODE (exp))
 {
+case VAR_DECL:
+  if (DECL_ABSTRACT_ORIGIN (exp))
+   return expand_debug_source_expr (DECL_ABSTRACT_ORIGIN (exp));
+  break;
 case PARM_DECL:
   {
mode = DECL_MODE (exp);
diff --git a/gcc/testsuite/gcc.dg/vla-1.c b/gcc/testsuite/gcc.dg/vla-1.c
new file mode 100644
index 000..0c19feffd2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vla-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -fdump-tree-optimized" } */
+
+int __attribute__((noinline))
+f1 (int i)
+{
+  char a[i + 1];
+  char b[i + 2];
+  b[1] = 3;
+  a[0] = 5;
+  return a[0] + b[1];
+}
+
+int
+main ()
+{
+  volatile int j;
+  int x = 5;
+  j = f1 (x);
+  return 0;
+}
+
+/* One debug source bind is generated for the parameter, and two to describe the
+   sizes of a and b.  */
+/* { dg-final { scan-tree-dump-times " s=> i" 3 "optimized" } } */
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 427ef959740..6fbd8c3ca61 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -208,7 +208,9 @@ remap_ssa_name (tree name, copy_body_data *id)
  n = id->decl_map->get (val);
  if (n != NULL)
val = *n;
- if (TREE_CODE (val) != PARM_DECL)
+ if (TREE_CODE (val) != PARM_DECL
+ && !(TREE_CODE (val) == VAR_DECL
+  && DECL_ABSTRACT_ORIGIN (val)))
{
  processing_debug_stmt = -1;
  return name;


[PATCH][debug] Reuse debug exprs generated in remap_ssa_name

2018-07-08 Thread Tom de Vries
On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
> On Fri, Jul 06, 2018 at 04:38:50PM +0200, Richard Biener wrote:
> > On Fri, Jul 6, 2018 at 12:47 PM Tom de Vries  wrote:
> > > On 07/05/2018 01:39 PM, Richard Biener wrote:
> 
> 
> 
> > I now also spotted the code in remap_ssa_name that is supposed to handle
> > this it seems and for the testcase we only give up because the PARM_DECL is
> > remapped to a VAR_DECL.  So I suppose it is to be handled via the
> > debug-args stuff
> > which probably lacks in the area of versioning.
> > 
> > Your patch feels like it adds stuff ontop of existing mechanisms that
> > should "just work"
> > with the correct setup at the correct places...
> > 
> 
> Hmm, I realized that I may be complicating things, by trying to do an
> optimal fix in a single patch, so I decided to write two patches, one
> with a fix, and then one improving the fix to be more optimal.
> 
> Also, I suspect that the "just work" approach is this:
> ...
># DEBUG D#8 s=> iD.1900
># DEBUG iD.1949 => D#8
># DEBUG D#6 s=> iD.1949
> ...
> whereas previously I tried to map 'D#6' on iD.1900 directly.
> 

Second patch OK for trunk?

Thanks,
- Tom

[debug] Reuse debug exprs generated in remap_ssa_name

When compiling gcc.dg/vla-1.c with -O3 -g, vla a and b in f1 are optimized
away, and f1 is cloned to a version f1.constprop with no parameters, eliminating
parameter i.  Debug info is generated to describe the sizes of a and b, but
that process generates debug expressions that are not reused.

Fix the duplication by saving and reusing the generated debug expressions in
remap_ssa_name.  Concretely: reuse D#7 here instead of generating D#8:
...
 __attribute__((noinline))
 f1.constprop ()
 {
   intD.6 iD.1935;

   
   # DEBUG D#10 s=> iD.1897
   # DEBUG iD.1935 => D#10

   
-  # DEBUG D#8 s=> iD.1935
   # DEBUG D#7 s=> iD.1935
   saved_stack.2_1 = __builtin_stack_save ();
   # DEBUG BEGIN_STMT
   # DEBUG D#6 => D#7 + 1
   # DEBUG D#5 => (long intD.8) D#6
   # DEBUG D#4 => D#5 + -1
   # DEBUG D.1937 => (sizetype) D#4
   # DEBUG a.0D.1942 => NULL
   # DEBUG BEGIN_STMT
-  # DEBUG D#3 => D#8 + 2
+  # DEBUG D#3 => D#7 + 2
   # DEBUG D#2 => (long intD.8) D#3
   # DEBUG D#1 => D#2 + -1
   # DEBUG D.1944 => (sizetype) D#1
   # DEBUG b.1D.1949 => NULL
...
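
The save-and-reuse idea amounts to a lookup before creating a new source
bind. A rough, self-contained C model follows; the bind_cache type and its
functions are hypothetical stand-ins for id->decl_map and the real GCC calls,
with decls and debug exprs reduced to plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BINDS 16

/* Toy replacement for the decl map: remembers which debug expr was
   created for which decl.  */
struct bind_cache
{
  int decl[MAX_BINDS];
  int debug_expr[MAX_BINDS];
  size_t count;
  int next_expr;   /* Number to hand out for the next fresh debug expr.  */
};

void
bind_cache_init (struct bind_cache *c)
{
  c->count = 0;
  c->next_expr = 1;
}

/* Return the debug expr bound to DECL_UID, creating one only on the
   first request.  Later requests reuse the saved expr.  */
int
source_bind_for (struct bind_cache *c, int decl_uid)
{
  for (size_t i = 0; i < c->count; i++)
    if (c->decl[i] == decl_uid)
      return c->debug_expr[i];

  assert (c->count < MAX_BINDS);
  int expr = c->next_expr++;
  c->decl[c->count] = decl_uid;
  c->debug_expr[c->count] = expr;
  c->count++;
  return expr;
}
```

Asking twice for the same decl returns the same debug expr and records only
one bind, which is the behaviour the dump above shows for D#7.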

Bootstrapped and reg-tested on x86_64.

2018-07-07  Tom de Vries  

* tree-inline.c (remap_ssa_name): Save and reuse debug exprs generated
in remap_ssa_name.

* gcc.dg/vla-1.c: Update.

---
 gcc/testsuite/gcc.dg/vla-1.c | 5 +++--
 gcc/tree-inline.c| 4 
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vla-1.c b/gcc/testsuite/gcc.dg/vla-1.c
index 0c19feffd2b..94db23d1336 100644
--- a/gcc/testsuite/gcc.dg/vla-1.c
+++ b/gcc/testsuite/gcc.dg/vla-1.c
@@ -20,6 +20,7 @@ main ()
   return 0;
 }
 
-/* One debug source bind is generated for the parameter, and two to describe the
+/* One debug source bind is generated for the parameter, and one to describe the
sizes of a and b.  */
-/* { dg-final { scan-tree-dump-times " s=> i" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " s=> i" 2 "optimized" } } */
+
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 6fbd8c3ca61..164c7fff710 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -215,12 +215,16 @@ remap_ssa_name (tree name, copy_body_data *id)
  processing_debug_stmt = -1;
  return name;
}
+ n = id->decl_map->get (val);
+ if (n && TREE_CODE (*n) == DEBUG_EXPR_DECL)
+   return *n;
  def_temp = gimple_build_debug_source_bind (vexpr, val, NULL);
  DECL_ARTIFICIAL (vexpr) = 1;
  TREE_TYPE (vexpr) = TREE_TYPE (name);
  SET_DECL_MODE (vexpr, DECL_MODE (SSA_NAME_VAR (name)));
  gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
  gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
+ insert_decl_map (id, val, vexpr);
  return vexpr;
}
 

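The reuse idea in the patch above can be sketched outside of GCC as a plain
lookup-before-create cache.  This is only an illustration under stated
assumptions: struct decl_map, decl_map_init and debug_expr_for_decl are
hypothetical stand-ins, not GCC's id->decl_map or insert_decl_map, and decls
are modelled as small integers.

```c
#include <stddef.h>

#define MAX_DECLS 32

/* Hypothetical stand-in for id->decl_map: decl index -> debug expr number
   (0 means "no debug expression recorded yet").  */
struct decl_map
{
  int debug_expr[MAX_DECLS];
  int next_expr;     /* Next fresh D#n to hand out.  */
  int source_binds;  /* Number of "# DEBUG D#n s=> decl" binds emitted.  */
};

void
decl_map_init (struct decl_map *m)
{
  for (size_t i = 0; i < MAX_DECLS; i++)
    m->debug_expr[i] = 0;
  m->next_expr = 1;
  m->source_binds = 0;
}

/* Return the debug expression for DECL, creating it on first use only.  */
int
debug_expr_for_decl (struct decl_map *m, int decl)
{
  /* Analogous to the added lookup: reuse an already recorded expr.  */
  if (m->debug_expr[decl] != 0)
    return m->debug_expr[decl];

  /* Analogous to building a new debug source bind.  */
  int expr = m->next_expr++;
  m->source_binds++;

  /* Analogous to the added insert_decl_map call: remember it.  */
  m->debug_expr[decl] = expr;
  return expr;
}
```

Asking twice for the same decl returns the same number and leaves
source_binds at 1, which mirrors the " s=> i" count in vla-1.c dropping
from 3 to 2.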


[testsuite, committed] Use relative line numbers in gcc.dg/guality

2018-07-09 Thread Tom de Vries
Hi,

this patch uses relative line numbers in gcc.dg/guality where obvious:
either the relative line number is '.', '.-1' or '.+1', or adjacent to
another obvious case.

Committed as obvious.

Thanks,
- Tom
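
For readers unfamiliar with the notation: a relative line spec is resolved
against the line the directive itself sits on.  A minimal sketch of that
resolution, assuming only the forms '.', '.+N' and '.-N' plus plain absolute
numbers (resolve_line_spec is a hypothetical helper, not the testsuite's
actual Tcl code):

```c
#include <stdlib.h>

/* Resolve a gdb-test line spec against DIRECTIVE_LINE, the line the
   directive sits on.  "." is the same line, ".+N" and ".-N" are offsets,
   anything else is taken as an absolute line number.  */
int
resolve_line_spec (const char *spec, int directive_line)
{
  if (spec[0] != '.')
    return atoi (spec);                     /* Absolute, e.g. "14".  */
  if (spec[1] == '\0')
    return directive_line;                  /* ".".  */
  return directive_line + atoi (spec + 1);  /* ".+1", ".-1".  */
}
```

With a directive on line 13, ".+1" and "14" both resolve to 14, but only
the relative form stays correct when lines are added above it.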

[testsuite] Use relative line numbers in gcc.dg/guality

2018-07-09  Tom de Vries  

* gcc.dg/guality/asm-1.c: Use relative line numbers where obvious.
* gcc.dg/guality/bswaptest.c: Same.
* gcc.dg/guality/clztest.c: Same.
* gcc.dg/guality/csttest.c: Same.
* gcc.dg/guality/ctztest.c: Same.
* gcc.dg/guality/drap.c: Same.
* gcc.dg/guality/nrv-1.c: Same.
* gcc.dg/guality/pr41353-1.c: Same.
* gcc.dg/guality/pr41353-2.c: Same.
* gcc.dg/guality/pr41404-1.c: Same.
* gcc.dg/guality/pr43051-1.c: Same.
* gcc.dg/guality/pr43077-1.c: Same.
* gcc.dg/guality/pr43177.c: Same.
* gcc.dg/guality/pr43329-1.c: Same.
* gcc.dg/guality/pr43479.c: Same.
* gcc.dg/guality/pr43593.c: Same.
* gcc.dg/guality/pr45003-1.c: Same.
* gcc.dg/guality/pr45003-2.c: Same.
* gcc.dg/guality/pr45003-3.c: Same.
* gcc.dg/guality/pr48437.c: Same.
* gcc.dg/guality/pr48466.c: Same.
* gcc.dg/guality/pr49888.c: Same.
* gcc.dg/guality/pr54200.c: Same.
* gcc.dg/guality/pr54519-1.c: Same.
* gcc.dg/guality/pr54519-2.c: Same.
* gcc.dg/guality/pr54519-3.c: Same.
* gcc.dg/guality/pr54519-4.c: Same.
* gcc.dg/guality/pr54519-5.c: Same.
* gcc.dg/guality/pr54519-6.c: Same.
* gcc.dg/guality/pr54551.c: Same.
* gcc.dg/guality/pr54693-2.c: Same.
* gcc.dg/guality/pr54693.c: Same.
* gcc.dg/guality/pr54796.c: Same.
* gcc.dg/guality/pr54970.c: Same.
* gcc.dg/guality/pr67192.c: Same.
* gcc.dg/guality/pr69947.c: Same.
* gcc.dg/guality/pr78726.c: Same.
* gcc.dg/guality/rotatetest.c: Same.
* gcc.dg/guality/sra-1.c: Same.
* gcc.dg/guality/vla-2.c: Same.

---
 gcc/testsuite/gcc.dg/guality/asm-1.c  |  2 +-
 gcc/testsuite/gcc.dg/guality/bswaptest.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/clztest.c|  4 +-
 gcc/testsuite/gcc.dg/guality/csttest.c| 72 +++
 gcc/testsuite/gcc.dg/guality/ctztest.c|  4 +-
 gcc/testsuite/gcc.dg/guality/drap.c   |  4 +-
 gcc/testsuite/gcc.dg/guality/nrv-1.c  |  2 +-
 gcc/testsuite/gcc.dg/guality/pr41353-1.c  | 30 ++---
 gcc/testsuite/gcc.dg/guality/pr41353-2.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr41404-1.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr43051-1.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/pr43077-1.c  | 20 -
 gcc/testsuite/gcc.dg/guality/pr43177.c|  8 ++--
 gcc/testsuite/gcc.dg/guality/pr43329-1.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr43479.c| 10 ++---
 gcc/testsuite/gcc.dg/guality/pr43593.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr45003-1.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr45003-2.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr45003-3.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr48437.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr48466.c|  8 ++--
 gcc/testsuite/gcc.dg/guality/pr49888.c|  4 +-
 gcc/testsuite/gcc.dg/guality/pr54200.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr54519-1.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/pr54519-2.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr54519-3.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/pr54519-4.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr54519-5.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr54519-6.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr54551.c|  4 +-
 gcc/testsuite/gcc.dg/guality/pr54693-2.c  |  8 ++--
 gcc/testsuite/gcc.dg/guality/pr54693.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr54796.c|  6 +--
 gcc/testsuite/gcc.dg/guality/pr54970.c| 68 ++---
 gcc/testsuite/gcc.dg/guality/pr67192.c| 10 ++---
 gcc/testsuite/gcc.dg/guality/pr69947.c|  4 +-
 gcc/testsuite/gcc.dg/guality/pr78726.c|  6 +--
 gcc/testsuite/gcc.dg/guality/rotatetest.c | 12 +++---
 gcc/testsuite/gcc.dg/guality/sra-1.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/vla-2.c  |  4 +-
 40 files changed, 199 insertions(+), 199 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/asm-1.c b/gcc/testsuite/gcc.dg/guality/asm-1.c
index e9cf4a167aa..089badf812a 100644
--- a/gcc/testsuite/gcc.dg/guality/asm-1.c
+++ b/gcc/testsuite/gcc.dg/guality/asm-1.c
@@ -10,7 +10,7 @@ foo (struct A *p, char *q)
 {
   int f = &p->z[p->y] - q;
   asm volatile (NOP);
-  asm volatile (NOP : : "g" (f));  /* { dg-final { gdb-test 14 "f" "14" } } */
+  asm volatile (NOP : : "g" (f));  /* { dg-final { gdb-test .+1 "f" "14" } } */
   asm volatile ("" : : "g" (p), "g" (q));
 }
 
diff --git a/gcc/testsuite/gcc.dg/guality/bswaptest.c 
b/gcc/t

[PATCH][testsuite, guality] Add -fno-ipa-icf in gcc.dg/guality

2018-07-13 Thread Tom de Vries
Hi,

The -fipa-icf optimization breaks debug info (as noted in PR63572 - "ICF
breaks user debugging experience"), which makes guality tests clztest.c,
ctztest.c and sra-1.c unsupported for option combination "-O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects".  F.i., in clztest.c foo and bar are
merged, and gdb can set a breakpoint on a line in foo, but trying to set a
breakpoint on a line in bar results in a breakpoint in main instead.

This patch works around the problem by adding -fno-ipa-icf (as is already done
in csttest.c and pr43077-1.c) to those testcases:
...
-UNSUPPORTED: gcc.dg/guality/clztest.c ... line . g == f
+PASS:gcc.dg/guality/clztest.c ... line . g == f
-UNSUPPORTED: gcc.dg/guality/ctztest.c ... line . g == f
+PASS:gcc.dg/guality/ctztest.c ... line . g == f
-UNSUPPORTED: gcc.dg/guality/sra-1.c ... line .+1 a[0] == 4
+PASS:gcc.dg/guality/sra-1.c ... line .+1 a[0] == 4
-UNSUPPORTED: gcc.dg/guality/sra-1.c ... line . a[1] == 14
+PASS:gcc.dg/guality/sra-1.c ... line . a[1] == 14
...

Tested on x86_64.

OK for trunk?

Thanks,
- Tom
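
To illustrate the kind of code that triggers this: the sketch below has two
functions with identical bodies, which is the shape identical code folding
looks for.  The bodies are made up for illustration and are not the contents
of clztest.c; only the names foo and bar are taken from the description
above.

```c
/* Two functions with identical bodies.  With -fipa-icf enabled the
   compiler may keep a single body and turn the other function into an
   alias or thunk, so only one of them keeps usable line info.  */

__attribute__((noinline)) int
foo (int x)
{
  int r = x * 2 + 1;
  return r;
}

__attribute__((noinline)) int
bar (int x)
{
  int r = x * 2 + 1;
  return r;
}
```

The program behaves the same either way; what changes is where a
breakpoint on a line in bar can land, which is why the tests add
-fno-ipa-icf rather than changing the source.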

[testsuite, guality] Add -fno-ipa-icf in gcc.dg/guality

2018-07-13  Tom de Vries  

* gcc.dg/guality/clztest.c: Add -fno-ipa-icf in dg-options.
* gcc.dg/guality/ctztest.c: Same.
* gcc.dg/guality/sra-1.c: Same.

---
 gcc/testsuite/gcc.dg/guality/clztest.c | 2 +-
 gcc/testsuite/gcc.dg/guality/ctztest.c | 2 +-
 gcc/testsuite/gcc.dg/guality/sra-1.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/clztest.c b/gcc/testsuite/gcc.dg/guality/clztest.c
index f89c1c31a15..69527561c22 100644
--- a/gcc/testsuite/gcc.dg/guality/clztest.c
+++ b/gcc/testsuite/gcc.dg/guality/clztest.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-g" } */
+/* { dg-options "-g -fno-ipa-icf" } */
 
 volatile int vv;
 
diff --git a/gcc/testsuite/gcc.dg/guality/ctztest.c b/gcc/testsuite/gcc.dg/guality/ctztest.c
index 5ce6c674be3..276752ac986 100644
--- a/gcc/testsuite/gcc.dg/guality/ctztest.c
+++ b/gcc/testsuite/gcc.dg/guality/ctztest.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-g" } */
+/* { dg-options "-g -fno-ipa-icf" } */
 
 volatile int vv;
 
diff --git a/gcc/testsuite/gcc.dg/guality/sra-1.c b/gcc/testsuite/gcc.dg/guality/sra-1.c
index a747bc302aa..8ad57cf3f8e 100644
--- a/gcc/testsuite/gcc.dg/guality/sra-1.c
+++ b/gcc/testsuite/gcc.dg/guality/sra-1.c
@@ -1,6 +1,6 @@
 /* PR debug/43983 */
 /* { dg-do run } */
-/* { dg-options "-g" } */
+/* { dg-options "-g -fno-ipa-icf" } */
 
 struct A { int i; int j; };
 struct B { int : 4; int i : 12; int j : 12; int : 4; };


Re: [PATCH][debug] Reuse debug exprs generated in remap_ssa_name

2018-07-13 Thread Tom de Vries
On 07/09/2018 02:43 PM, Richard Biener wrote:
> On Sun, Jul 8, 2018 at 11:27 AM Tom de Vries  wrote:
>>
>> On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
>>> On Fri, Jul 06, 2018 at 04:38:50PM +0200, Richard Biener wrote:
>>>> On Fri, Jul 6, 2018 at 12:47 PM Tom de Vries  wrote:
>>>>> On 07/05/2018 01:39 PM, Richard Biener wrote:
>>>
>>> 
>>>
>>>> I now also spotted the code in remap_ssa_name that is supposed to handle
>>>> this it seems and for the testcase we only give up because the PARM_DECL is
>>>> remapped to a VAR_DECL.  So I suppose it is to be handled via the
>>>> debug-args stuff
>>>> which probably lacks in the area of versioning.
>>>>
>>>> Your patch feels like it adds stuff ontop of existing mechanisms that
>>>> should "just work"
>>>> with the correct setup at the correct places...
>>>>
>>>
>>> Hmm, I realized that I may be complicating things, by trying to do an
>>> optimal fix in a single patch, so I decided to write two patches, one
>>> with a fix, and then one improving the fix to be more optimal.
>>>
>>> Also, I suspect that the "just work" approach is this:
>>> ...
>>>   # DEBUG D#8 s=> iD.1900
>>>   # DEBUG iD.1949 => D#8
>>>   # DEBUG D#6 s=> iD.1949
>>> ...
>>> whereas previously I tried to map 'D#6' on iD.1900 directly.
>>>
>>
>> Second patch OK for trunk?
> 
> OK, though I wonder how it doesn't fail with that testcase with
> the mismatching type where the removed param-decl is mapped
> to a local var-decl.

Previously I mapped the default def ssa-name onto the debug expression,
which meant I had to add special handling at the start of
remap_ssa_name, where already mapped ssa-names are handled, for the
mismatched argument/parameter type case.

Now I map the local var-decl onto the debug expression, and the code
that is added to look it up is already guarded with processing_debug_stmt.

Thanks,
- Tom

>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>> index 6fbd8c3ca61..164c7fff710 100644
>> --- a/gcc/tree-inline.c
>> +++ b/gcc/tree-inline.c
>> @@ -215,12 +215,16 @@ remap_ssa_name (tree name, copy_body_data *id)
>>   processing_debug_stmt = -1;
>>   return name;
>> }
>> + n = id->decl_map->get (val);
>> + if (n && TREE_CODE (*n) == DEBUG_EXPR_DECL)
>> +   return *n;
>>   def_temp = gimple_build_debug_source_bind (vexpr, val, NULL);
>>   DECL_ARTIFICIAL (vexpr) = 1;
>>   TREE_TYPE (vexpr) = TREE_TYPE (name);
>>   SET_DECL_MODE (vexpr, DECL_MODE (SSA_NAME_VAR (name)));
>>   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>   gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
>> + insert_decl_map (id, val, vexpr);
>>   return vexpr;
>> }
>>
>>


[PATCH][testsuite/guality] Run guality tests with Og

2018-07-13 Thread Tom de Vries
Hi,

we advertise Og as the optimization level of choice for the standard
edit-compile-debug cycle, but do not run the guality tests for Og with the
default torture options.

This patch ensures that we test -Og in the guality tests.

F.i., for gcc.dg/guality there are 45 fails for Og (while there are none for
O1), in these test-cases:
...
gcc.dg/guality/pr54200.c
gcc.dg/guality/pr54970.c
gcc.dg/guality/pr56154-1.c
gcc.dg/guality/pr59776.c
gcc.dg/guality/sra-1.c
...

Tested gcc.dg/guality on c-only compiler, currently doing bootstrap and
reg-test on x86_64.

OK for trunk if no issues found during testing?

Thanks,
- Tom

[testsuite/guality] Run guality tests with Og

2018-07-13  Tom de Vries  

* lib/gcc-gdb-test.exp (guality_minimal_options): New proc.
* g++.dg/guality/guality.exp: Ensure Og is part of torture options.
* gcc.dg/guality/guality.exp: Same.
* gfortran.dg/guality/guality.exp: Same.

---
 gcc/testsuite/g++.dg/guality/guality.exp  |  9 +
 gcc/testsuite/gcc.dg/guality/guality.exp  |  3 ++-
 gcc/testsuite/gfortran.dg/guality/guality.exp |  9 +
 gcc/testsuite/lib/gcc-gdb-test.exp| 14 ++
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/guality/guality.exp b/gcc/testsuite/g++.dg/guality/guality.exp
index 4be22baa19c..757b20b61e2 100644
--- a/gcc/testsuite/g++.dg/guality/guality.exp
+++ b/gcc/testsuite/g++.dg/guality/guality.exp
@@ -48,6 +48,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 report_gdb $::env(GUALITY_GDB_NAME) [info script]
 
+global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+torture-init
+set-torture-options \
+$guality_dg_torture_options \
+[list {}] \
+$LTO_TORTURE_OPTIONS
+
 if {[check_guality "
   #include \"$srcdir/$subdir/guality.h\"
   volatile long int varl = 6;
@@ -65,4 +73,5 @@ if [info exists guality_gdb_name] {
 unsetenv GUALITY_GDB_NAME
 }
 
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp b/gcc/testsuite/gcc.dg/guality/guality.exp
index d9994341477..ca77a446f86 100644
--- a/gcc/testsuite/gcc.dg/guality/guality.exp
+++ b/gcc/testsuite/gcc.dg/guality/guality.exp
@@ -62,7 +62,8 @@ proc guality_transform_options { args } {
 }
 
 global DG_TORTURE_OPTIONS
-set guality_dg_torture_options [guality_transform_options $DG_TORTURE_OPTIONS]
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+set guality_dg_torture_options [guality_transform_options $guality_dg_torture_options]
 set guality_lto_torture_options [guality_transform_options $LTO_TORTURE_OPTIONS]
 torture-init
 set-torture-options \
diff --git a/gcc/testsuite/gfortran.dg/guality/guality.exp b/gcc/testsuite/gfortran.dg/guality/guality.exp
index f76347dd52f..f224cdfefa5 100644
--- a/gcc/testsuite/gfortran.dg/guality/guality.exp
+++ b/gcc/testsuite/gfortran.dg/guality/guality.exp
@@ -29,10 +29,19 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 report_gdb $::env(GUALITY_GDB_NAME) [info script]
 
+global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+torture-init
+set-torture-options \
+$guality_dg_torture_options \
+[list {}] \
+$LTO_TORTURE_OPTIONS
+
 gfortran-dg-runtest [lsort [glob $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ]] "" ""
 
 if [info exists guality_gdb_name] {
 unsetenv GUALITY_GDB_NAME
 }
 
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/lib/gcc-gdb-test.exp b/gcc/testsuite/lib/gcc-gdb-test.exp
index bb966d43023..b13d3ec7f85 100644
--- a/gcc/testsuite/lib/gcc-gdb-test.exp
+++ b/gcc/testsuite/lib/gcc-gdb-test.exp
@@ -166,3 +166,17 @@ proc report_gdb { gdb loc } {
 }
 send_log -- "---\n$gdb_version\n---\n"
 }
+
+# Argument 0 is the option list.
+# Return the option list, ensuring that at least -Og is present.
+
+proc guality_minimal_options { args } {
+set options [lindex $args 0]
+foreach opt $options {
+   if { [regexp -- "-Og" $opt] } {
+   return $options
+   }
+}
+
+return [lappend options "-Og"]
+}
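
The same "append -Og only if it is missing" logic, written as a C sketch for
readers who do not read Tcl.  has_og and ensure_og are hypothetical names;
like the regexp in the proc above, the check is a substring match, so an
entry such as "-Og -g" counts as containing -Og.

```c
#include <string.h>

/* Return 1 if any of the N option strings in OPTS contains "-Og".  */
int
has_og (const char *const *opts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strstr (opts[i], "-Og") != NULL)
      return 1;
  return 0;
}

/* Copy the N entries of OPTS into OUT, which must have room for N + 1
   entries, and append "-Og" if no entry already contains it.  Return
   the number of entries written to OUT.  */
size_t
ensure_og (const char *const *opts, size_t n, const char **out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = opts[i];
  if (has_og (opts, n))
    return n;
  out[n] = "-Og";
  return n + 1;
}
```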


Re: [PATCH][testsuite/guality] Run guality tests with Og

2018-07-15 Thread Tom de Vries
On Fri, Jul 13, 2018 at 01:27:25PM +0200, Richard Biener wrote:
> On Fri, 13 Jul 2018, Tom de Vries wrote:
> 
> > Hi,
> > 
> > we advertise Og as the optimization level of choice for the standard
> > edit-compile-debug cycle, but do not run the guality tests for Og with the
> > default torture options.
> > 
> > This patch ensures that we test -Og in the guality tests.
> > 
> > F.i., for gcc.dg/guality there are 45 fails for Og (while there are none for
> > O1), in these test-cases:
> > ...
> > gcc.dg/guality/pr54200.c
> > gcc.dg/guality/pr54970.c
> > gcc.dg/guality/pr56154-1.c
> > gcc.dg/guality/pr59776.c
> > gcc.dg/guality/sra-1.c
> > ...
> > 
> > Tested gcc.dg/guality on c-only compiler, currently doing bootstrap and
> > reg-test on x86_64.
> > 
> > OK for trunk if no issues found during testing?
> 
> OK.
> 

I ran into problems with gfortran-dg-runtest, which unconditionally calls
torture-init.  I've made that conditional, similar to what is done in
gcc-dg-runtest.

Committed to trunk as attached.

Thanks,
- Tom

[testsuite/guality] Run guality tests with Og

We advertise Og as the optimization level of choice for the standard
edit-compile-debug cycle, but do not run the guality tests for Og with the
default torture options.

This patch ensures that we test -Og in the guality tests.

F.i., for gcc.dg/guality there are 45 fails for Og (while there are none for
O1), in these test-cases:
...
gcc.dg/guality/pr54200.c
gcc.dg/guality/pr54970.c
gcc.dg/guality/pr56154-1.c
gcc.dg/guality/pr59776.c
gcc.dg/guality/sra-1.c
...

2018-07-13  Tom de Vries  

* lib/gcc-gdb-test.exp (guality_minimal_options): New proc.
* lib/gfortran-dg.exp (gfortran-dg-runtest): Don't call torture-init if
already called.
* g++.dg/guality/guality.exp: Ensure Og is part of torture options.
* gcc.dg/guality/guality.exp: Same.
* gfortran.dg/guality/guality.exp: Same.

---
 gcc/testsuite/g++.dg/guality/guality.exp  |  9 +
 gcc/testsuite/gcc.dg/guality/guality.exp  |  3 ++-
 gcc/testsuite/gfortran.dg/guality/guality.exp |  7 +++
 gcc/testsuite/lib/gcc-gdb-test.exp| 14 ++
 gcc/testsuite/lib/gfortran-dg.exp | 18 +-
 5 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/guality/guality.exp b/gcc/testsuite/g++.dg/guality/guality.exp
index 4be22baa19c..757b20b61e2 100644
--- a/gcc/testsuite/g++.dg/guality/guality.exp
+++ b/gcc/testsuite/g++.dg/guality/guality.exp
@@ -48,6 +48,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 report_gdb $::env(GUALITY_GDB_NAME) [info script]
 
+global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+torture-init
+set-torture-options \
+$guality_dg_torture_options \
+[list {}] \
+$LTO_TORTURE_OPTIONS
+
 if {[check_guality "
   #include \"$srcdir/$subdir/guality.h\"
   volatile long int varl = 6;
@@ -65,4 +73,5 @@ if [info exists guality_gdb_name] {
 unsetenv GUALITY_GDB_NAME
 }
 
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp b/gcc/testsuite/gcc.dg/guality/guality.exp
index d9994341477..ca77a446f86 100644
--- a/gcc/testsuite/gcc.dg/guality/guality.exp
+++ b/gcc/testsuite/gcc.dg/guality/guality.exp
@@ -62,7 +62,8 @@ proc guality_transform_options { args } {
 }
 
 global DG_TORTURE_OPTIONS
-set guality_dg_torture_options [guality_transform_options $DG_TORTURE_OPTIONS]
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+set guality_dg_torture_options [guality_transform_options $guality_dg_torture_options]
 set guality_lto_torture_options [guality_transform_options $LTO_TORTURE_OPTIONS]
 torture-init
 set-torture-options \
diff --git a/gcc/testsuite/gfortran.dg/guality/guality.exp b/gcc/testsuite/gfortran.dg/guality/guality.exp
index f76347dd52f..eaa7ae770d6 100644
--- a/gcc/testsuite/gfortran.dg/guality/guality.exp
+++ b/gcc/testsuite/gfortran.dg/guality/guality.exp
@@ -29,10 +29,17 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 report_gdb $::env(GUALITY_GDB_NAME) [info script]
 
+global DG_TORTURE_OPTIONS
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+torture-init
+set-torture-options \
+$guality_dg_torture_options \
+
 gfortran-dg-runtest [lsort [glob $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ]] "" ""
 
 if [info exists guality_gdb_name] {
 unsetenv GUALITY_GDB_NAME
 }
 
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/lib/gcc-gdb-test.exp b/gcc/testsuite/lib/gcc-gdb-test.exp
index bb966d43023..b13d3ec7f85 100644
--- a/gcc/testsuite/lib/gcc-gdb-test.exp
+++ b/gcc/testsuite/lib/gcc-gdb-test.exp
@@ -166,3 +166,17 @@ proc report_gdb { gdb loc } {
 }
 send_log -- "---\n$g

[PATCH][debug] Fix pre_dec handling in vartrack

2018-07-15 Thread Tom de Vries
Hi,

when compiling test-case gcc.target/i386/vartrack-1.c with -O1 -g, register bx
is pushed in the prologue and popped in the epilogue:
...
(insn/f 26 3 27 2
  (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
   (reg:DI 3 bx))
   "vartrack-1.c":10 61 {*pushdi2_rex64}
   (expr_list:REG_DEAD (reg:DI 3 bx) (nil)))
  ...
(insn/f 29 28 30 2
  (set (reg:DI 3 bx)
   (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0  S8 A8]))
   "vartrack-1.c":15 71 {*popdi1}
   (expr_list:REG_CFA_ADJUST_CFA
 (set (reg/f:DI 7 sp)
  (plus:DI (reg/f:DI 7 sp)
   (const_int 8 [0x8]))) (nil)))
...

However, when we adjust those insns in vartrack to eliminate the pre_dec and
post_inc, the frame location for the push is at argp - 24, while the one for the
pop is at argp - 16:
...
(insn/f 26 3 27 2
  (parallel [
(set (mem:DI (plus:DI (reg/f:DI 16 argp)
  (const_int -24 [0xffffffffffffffe8])) [0  S8 A8])
 (reg:DI 3 bx))
(set (reg/f:DI 7 sp)
 (plus:DI (reg/f:DI 16 argp)
  (const_int -24 [0xffffffffffffffe8])))
  ])
  "vartrack-1.c":10 61 {*pushdi2_rex64}
  (expr_list:REG_DEAD (reg:DI 3 bx) (nil)))
  ...
(insn/f 29 28 30 2
  (parallel [
(set (reg:DI 3 bx)
 (mem:DI (plus:DI (reg/f:DI 16 argp)
  (const_int -16 [0xfffffffffffffff0])) [0  S8 A8]))
(set (reg/f:DI 7 sp)
 (plus:DI (reg/f:DI 16 argp)
  (const_int -8 [0xfffffffffffffff8])))
  ])
  "vartrack-1.c":15 71 {*popdi1}
  (expr_list:REG_CFA_ADJUST_CFA
(set (reg/f:DI 7 sp)
 (plus:DI (reg/f:DI 7 sp)
  (const_int 8 [0x8]))) (nil)))
...

This patch fixes that by moving the stack_adjust modification after
adjust_insn in vt_initialize.  Also, the patch prints the adjusted insn in a
slim way if dump_flags contain TDF_SLIM, which makes the scan-rtl-dump in the
testcase easier.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
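
The off-by-8 can be reproduced with a little arithmetic.  The sketch below
is a deliberately simplified model, not the var-tracking.c code: it assumes
sp == argp - 8 at function entry (return address only), tracks the bytes
pushed so far in a hypothetical struct vt_state, and lets push_address play
the role of adjust_insn rewriting a pre_dec push into an argp-relative
address.

```c
/* Simplified model; all offsets are relative to argp.  */
struct vt_state
{
  long stack_adjust;  /* Bytes pushed since function entry.  */
};

/* Address that a pre_dec push of SIZE bytes stores to, computed from the
   state that the adjusting step sees.  Assumes sp == argp - 8 at entry.  */
long
push_address (const struct vt_state *s, long size)
{
  long sp = -8 - s->stack_adjust;
  return sp - size;
}

/* Order before the patch: stack_adjust is bumped first, so the pre_dec
   is effectively applied twice.  */
long
push_offset_before_patch (long size)
{
  struct vt_state s = { 0 };
  s.stack_adjust += size;
  return push_address (&s, size);
}

/* Order after the patch: adjust the insn first, then bump stack_adjust.  */
long
push_offset_after_patch (long size)
{
  struct vt_state s = { 0 };
  long addr = push_address (&s, size);
  s.stack_adjust += size;
  return addr;
}
```

For an 8-byte push the first order gives argp - 24 and the second gives
argp - 16, matching the slot the pop later reads from.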

[debug] Fix pre_dec handling in vartrack

2018-07-15  Tom de Vries  

* var-tracking.c (vt_initialize): Fix pre_dec handling.  Print adjusted
insn slim if dump_flags request TDF_SLIM.

* gcc.target/i386/vartrack-1.c: New test.

---
 gcc/testsuite/gcc.target/i386/vartrack-1.c | 28 
 gcc/var-tracking.c | 13 +++--
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/vartrack-1.c b/gcc/testsuite/gcc.target/i386/vartrack-1.c
new file mode 100644
index 00000000000..15514fc0da7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vartrack-1.c
@@ -0,0 +1,28 @@
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O1 -g -fdump-rtl-vartrack-details-slim" } */
+
+static volatile int vv = 1;
+
+extern long foo (long x);
+
+int
+main ()
+{
+  long x = vv;
+  foo (x);
+  foo (x + 1);
+  return 0;
+}
+
+/* Before adjust_insn:
+   26: [--sp:DI]=bx:DI
+   29: bx:DI=[sp:DI++]
+
+   after adjust_insn:
+   26: {[argp:DI-0x10]=bx:DI;sp:DI=argp:DI-0x10;}
+   29: {bx:DI=[argp:DI-0x10];sp:DI=argp:DI-0x8;} */
+
+/* { dg-final { scan-rtl-dump-times {[0-9][0-9]*: \{\[argp:DI-0x10\]=bx:DI;sp:DI=argp:DI-0x10;\}} 1 "vartrack" } } */
+
+/* { dg-final { scan-rtl-dump-times {[0-9][0-9]*: \{bx:DI=\[argp:DI-0x10\];sp:DI=argp:DI-0x8;\}} 1 "vartrack" } } */
+
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 8e800960b6d..5058f3cc7d2 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -115,6 +115,7 @@
 #include "tree-pretty-print.h"
 #include "rtl-iter.h"
 #include "fibonacci_heap.h"
+#include "print-rtl.h"
 
 typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
 typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;
@@ -10208,12 +10209,17 @@ vt_initialize (void)
log_op_type (PATTERN (insn), bb, insn,
 MO_ADJUST, dump_file);
  VTI (bb)->mos.safe_push (mo);
- VTI (bb)->out.stack_adjust += pre;
}
}
 
  cselib_hook_called = false;
  adjust_insn (bb, insn);
+
+ if (!frame_pointer_needed)
+   {
+ if (pre)
+   VTI (bb)->out.stack_adjust += pre;
+   }
  if (DEBUG_MARKER_INSN_P (insn))
{
  reemit_marker_as_note (insn);
@@ -10227,7 +10233,10 @@ vt_initialize (void)
  cselib_process_insn (insn);
  if (dump_file && (dump_flags & TDF_DETAILS))
{
- print_rtl_single (dump_file, insn);
+ if (dump_flags & TDF_SLIM)
+   dump_insn_slim (dump_file, insn);
+ else
+   print_rtl_single (dump_file, insn);
  dump_cselib_table (dump_file);
}
}


[RFC][debug] Add -fadd-debug-nops

2018-07-16 Thread Tom de Vries
Hi,

this is an idea that I'm currently playing around with: adding nops in
an optimized application with debug info can improve the debug info.


Consider f.i. this gdb session in foo of pr54200-2.c (using -Os):
...
(gdb) n
26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
'a' has unknown type; cast it to its declared type
(gdb) n
main () at pr54200-2.c:34
34        return 0;
...

The problem is that the scope in which a is declared ends at .LBE7, and the
statement .loc for line 26 ends up attached to the ret insn:
...
.loc 1 24 11
addl    %edx, %eax
.LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
.LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
for O1 and higher) by adding a nop before the ret insn:
...
.loc 1 24 11
addl    %edx, %eax
 .LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
+   nop
 .LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

and instead we have:
...
(gdb) n
26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
$1 = 6
(gdb) n
main () at pr54200-2.c:34
34        return 0;
...

Any comments?

Thanks,
- Tom
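
Why a single nop helps can be seen from how a debugger decides whether a
variable is in scope: the pc must fall inside the half-open range covered by
the enclosing lexical block.  The sketch below only models that range check;
pc_in_scope is a hypothetical helper and the addresses used in the usage
note are made up for illustration.

```c
/* Return 1 if PC lies in the half-open range [LOW_PC, HIGH_PC), which is
   how the extent of a lexical block is described in debug info.  */
int
pc_in_scope (unsigned long pc, unsigned long low_pc, unsigned long high_pc)
{
  return pc >= low_pc && pc < high_pc;
}
```

Suppose the block starts at 0x10 and the ret sits at 0x12.  Without the nop,
.LBE7 is also 0x12, so the line 26 entry at 0x12 fails the check and 'a' is
not visible.  With a one-byte nop at 0x12, .LBE7 moves to 0x13 and the same
address passes.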

[debug] Add -fadd-debug-nops

---
 gcc/common.opt   |  4 
 gcc/testsuite/gcc.dg/guality/pr54200-2.c | 37 
 gcc/var-tracking.c   | 16 ++
 3 files changed, 57 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index c29abdb5cb1..8e2876d2b8e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1504,6 +1504,10 @@ Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fadd-debug-nops
+Common Report Var(flag_add_debug_nops) Optimization
+Add nops if that improves debugging of optimized code
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/testsuite/gcc.dg/guality/pr54200-2.c b/gcc/testsuite/gcc.dg/guality/pr54200-2.c
new file mode 100644
index 00000000000..2694f7401e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr54200-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* }  { "*" } { "-Og" "-Os" "-O0" } } */
+/* { dg-options "-g -fadd-debug-nops -DMAIN" } */
+
+#include "prevent-optimization.h"
+
+int o ATTRIBUTE_USED;
+
+void
+bar (void)
+{
+  o = 2;
+}
+
+int __attribute__((noinline,noclone))
+foo (int z, int x, int b)
+{
+  if (x == 1)
+{
+  bar ();
+  return z;
+}
+  else
+{
+  int a = (x + z) + b;
+  /* Add cast to prevent unsupported.  */
+  return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
+}
+}
+
+#ifdef MAIN
+int main ()
+{
+  foo (3, 2, 1);
+  return 0;
+}
+#endif
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 5537fa64085..230845a38a0 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -9994,6 +9994,15 @@ reemit_marker_as_note (rtx_insn *insn)
 }
 }
 
+static void
+emit_nop_before (rtx_insn *insn)
+{
+  rtx_insn *insert_after = PREV_INSN (insn);
+  while (INSN_LOCATION (insert_after) == INSN_LOCATION (insn))
+insert_after = PREV_INSN (insert_after);
+  emit_insn_after (gen_nop (), insert_after);
+}
+
 /* Allocate and initialize the data structures for variable tracking
and parse the RTL to get the micro operations.  */
 
@@ -10224,6 +10233,13 @@ vt_initialize (void)
  continue;
}
 
+ /* Add debug nops before ret insn.  */
+ if (optimize
+ && flag_add_debug_nops
+ && cfun->debug_nonbind_markers
+ && returnjump_p (insn))
+   emit_nop_before (insn);
+
  if (MAY_HAVE_DEBUG_BIND_INSNS)
{
  if (CALL_P (insn))


Re: [RFC][debug] Add -fadd-debug-nops

2018-07-16 Thread Tom de Vries
On 07/16/2018 03:34 PM, Jakub Jelinek wrote:
> On Mon, Jul 16, 2018 at 03:29:10PM +0200, Tom de Vries wrote:
>> this is an idea that I'm currently playing around with: adding nops in
>> an optimized application with debug info can improve the debug info.
>>
>> Consider f.i. this gdb session in foo of pr54200-2.c (using -Os):
>> ...
>> (gdb) n
>> 26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
>> (gdb) p a
>> 'a' has unknown type; cast it to its declared type
>> (gdb) n
>> main () at pr54200-2.c:34
>> 34        return 0;
>> ...
>>
>> The problem is that the scope in which a is declared ends at .LBE7, and the
>> statement .loc for line 26 ends up attached to the ret insn:
>> ...
>> .loc 1 24 11
>> addl    %edx, %eax
>> .LVL1:
>> # DEBUG a => ax
>> .loc 1 26 7 is_stmt 1
>> .LBE7:
>> .loc 1 28 1 is_stmt 0
>> ret
>> .cfi_endproc
>> ...
>>
>> This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
>> for O1 and higher) by adding a nop before the ret insn:
>> ...
>> .loc 1 24 11
>> addl    %edx, %eax
>>  .LVL1:
>> # DEBUG a => ax
>> .loc 1 26 7 is_stmt 1
>> +   nop
>>  .LBE7:
>> .loc 1 28 1 is_stmt 0
>> ret
>> .cfi_endproc
>> ...
>>
>> and instead we have:
>> ...
>> (gdb) n
>> 26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
>> (gdb) p a
>> $1 = 6
>> (gdb) n
>> main () at pr54200-2.c:34
>> 34        return 0;
>> ...
>>
>> Any comments?
> 
> So is this essentially a workaround for GDB not supporting the statement
> frontiers?

AFAIU now, the concept of location views addresses this problem, so yes.

> Isn't the right fix just to add that support instead and then
> users can choose how exactly they want to step through the function in the
> debugger.

Right, but in the meantime I don't mind having an option that lets me
filter out noise in guality test results.

Thanks,
- Tom


Re: [RFC][debug] Add -fadd-debug-nops

2018-07-16 Thread Tom de Vries
On 07/16/2018 03:50 PM, Richard Biener wrote:
> On Mon, 16 Jul 2018, Tom de Vries wrote:
> 
>> Hi,
>>
>> this is an idea that I'm currently playing around with: adding nops in
>> an optimized application with debug info can improve the debug info.
>>
>>
>> Consider f.i. this gdb session in foo of pr54200-2.c (using -Os):
>> ...
>> (gdb) n
>> 26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
>> (gdb) p a
>> 'a' has unknown type; cast it to its declared type
>> (gdb) n
>> main () at pr54200-2.c:34
>> 34        return 0;
>> ...
>>
>> The problem is that the scope in which a is declared ends at .LBE7, and the
>> statement .loc for line 26 ends up attached to the ret insn:
>> ...
>> .loc 1 24 11
>> addl    %edx, %eax
>> .LVL1:
>> # DEBUG a => ax
>> .loc 1 26 7 is_stmt 1
>> .LBE7:
>> .loc 1 28 1 is_stmt 0
>> ret
>> .cfi_endproc
>> ...
>>
>> This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
>> for O1 and higher) by adding a nop before the ret insn:
>> ...
>> .loc 1 24 11
>> addl    %edx, %eax
>>  .LVL1:
>> # DEBUG a => ax
>> .loc 1 26 7 is_stmt 1
>> +   nop
>>  .LBE7:
>> .loc 1 28 1 is_stmt 0
>> ret
>> .cfi_endproc
>> ...
>>
>> and instead we have:
>> ...
>> (gdb) n
>> 26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
>> (gdb) p a
>> $1 = 6
>> (gdb) n
>> main () at pr54200-2.c:34
>> 34        return 0;
>> ...
>>
>> Any comments?
> 
> Interesting idea.  I wonder if that should be generalized
> to other places

I kept the option name general, to allow for that.

And indeed, this is a point-fix patch. I've been playing around with a
more generic patch that adds nops such that each is_stmt .loc is
associated with a unique insn, but that was embedded in an
fkeep-vars-live branch, so this patch is minimally addressing the first
problem I managed to reproduce on trunk.

> and how we can avoid compare-debug failures
> (var-tracking usually doesn't change code-generation).
> 

I could remove the cfun->debug_nonbind_markers test and move the
nop-insertion to a separate pass or some such.

Thanks,
- Tom


Re: [PATCH, PR86257, i386/debug] Fix insn prefix in tls_global_dynamic_64_

2018-07-17 Thread Tom de Vries
On 06/25/2018 03:02 PM, Tom de Vries wrote:
> On 06/25/2018 02:45 PM, Nathan Sidwell wrote:
>> On 06/25/2018 08:25 AM, Tom de Vries wrote:
>>
>>> If we'd implemented something like this in gas:
>>> ...
>>> .insn
>>> .byte 0x66
>>> .endinsn
>>> ...
>>> we could fix this more generically.
>>
>> Doesn't arm gas provide this functionality with some target-specific
>> pseudo?  It'd be good to copy that.
>>
>> ah, inst:
>>
>> @cindex @code{.inst} directive, ARM
>> @item .inst @var{opcode} [ , @dots{} ]
>> @itemx .inst.n @var{opcode} [ , @dots{} ]
>> @itemx .inst.w @var{opcode} [ , @dots{} ]
>> Generates the instruction corresponding to the numerical value
>> @var{opcode}.
>> @code{.inst.n} and @code{.inst.w} allow the Thumb instruction size to be
>> specified explicitly, overriding the normal encoding rules.
>>
>>
>> (ARM needs to distinguish data from insns to emit the mapping symbols
>> needed for be8 endianness xforms).
>>
> 
> Hmm, thanks for pointing that out. There's also something similar for
> s390 and riscv.
> 
> For mips there's another form of .insn (
> https://sourceware.org/binutils/docs/as/MIPS-insn.html#index-_002einsn
> ), which is similar to what I was proposing:
> ...
> 9.27.8 Directive to mark data as an instruction
> 
> The .insn directive tells as that the following data is actually
> instructions. This makes a difference in MIPS 16 and microMIPS modes:
> when loading the address of a label which precedes instructions, as
> automatically adds 1 to the value, so that jumping to the loaded address
> will do the right thing.
> ...
> 

Filed as PR23423 - "generic directive to mark data as insn"  (
https://sourceware.org/bugzilla/show_bug.cgi?id=23423 ).

Thanks,
- Tom


Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-18 Thread Tom de Vries
On 07/06/2018 12:28 PM, Richard Biener wrote:
> On Thu, Jul 5, 2018 at 4:12 PM Tom de Vries  wrote:
>>
>> On 07/05/2018 01:39 PM, Richard Biener wrote:
>>> On Thu, Jul 5, 2018 at 1:25 PM Tom de Vries  wrote:
>>>>
>>>> [ was: Re: [testsuite/guality, committed] Prevent optimization of local in
>>>> vla-1.c ]
>>>>
>>>> On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
>>>>> On 07/03/2018 11:05 AM, Tom de Vries wrote:
>>>>>> On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
>>>>>>> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
>>>>>>>> Given the array has size i + 1 it's upper bound should be 'i' and 'i'
>>>>>>>> should be available via DW_OP_[GNU_]entry_value.
>>>>>>>>
>>>>>>>> I see it is
>>>>>>>>
>>>>>>>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
>>>>>>>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
>>>>>>>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>>>>>>>>
>>>>>>>> and %rdi is 1.  Not sure why gdb fails to print it's length.  Yes, the
>>>>>>>> storage itself doesn't have a location but the
>>>>>>>> type specifies the size.
>>>>>>>>
>>>>>>>> (gdb) ptype a
>>>>>>>> type = char [variable length]
>>>>>>>> (gdb) p sizeof(a)
>>>>>>>> $3 = 0
>>>>>>>>
>>>>>>>> this looks like a gdb bug to me?
>>>>>>>>
>>>>>>
>>>>>> With gdb patch:
>>>>>> ...
>>>>>> diff --git a/gdb/findvar.c b/gdb/findvar.c
>>>>>> index 8ad5e25cb2..ebaff923a1 100644
>>>>>> --- a/gdb/findvar.c
>>>>>> +++ b/gdb/findvar.c
>>>>>> @@ -789,6 +789,8 @@ default_read_var_value
>>>>>>break;
>>>>>>
>>>>>>  case LOC_OPTIMIZED_OUT:
>>>>>> +  if (is_dynamic_type (type))
>>>>>> +   type = resolve_dynamic_type (type, NULL,
>>>>>> +/* Unused address.  */ 0);
>>>>>>return allocate_optimized_out_value (type);
>>>>>>
>>>>>>  default:
>>>>>> ...
>>>>>>
>>>>>> I get:
>>>>>> ...
>>>>>> $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
>>>>>> Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.
>>>>>>
>>>>>> Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
>>>>>> 17        return a[0];
>>>>>> $1 = 6
>>>>>> ...
>>>>>>
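For reference, the DW_AT_upper_bound expression quoted above can be worked
through by hand.  A minimal sketch in C, assuming the usual DWARF stack
semantics for these operators; the names vla_upper_bound and
check_vla_upper_bound are made up, and rdi stands for the value of %rdi at
the point of evaluation:

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate
     DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
     DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus
   for a given value of %rdi.  Hypothetical helper, for illustration only.  */
int64_t
vla_upper_bound (uint64_t rdi)
{
  uint64_t v = rdi + 1;   /* DW_OP_breg5 (rdi): 1  */
  v <<= 32;               /* DW_OP_const1u: 32; DW_OP_shl  */

  /* DW_OP_const1u: 32; DW_OP_shra is an arithmetic shift right, which here
     sign-extends the low 32 bits of rdi + 1.  Written with casts; this
     assumes the usual two's complement conversion to int32_t.  */
  int64_t s = (int64_t) (int32_t) (uint32_t) (v >> 32);

  return s - 1;           /* DW_OP_lit1; DW_OP_minus  */
}

/* Small self-check for the sketch.  */
void
check_vla_upper_bound (void)
{
  assert (vla_upper_bound (5) == 5);
}
```

With i = 5 this gives an upper bound of 5, so a char array of 6 bytes, which
agrees with the 'p sizeof (a)' result of 6 in the session above.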
>>>>>
>>>>> Well, for -O1 and -O2.
>>>>>
>>>>> For O3, I get instead:
>>>>> ...
>>>>> $ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
>>>>> Breakpoint 1 at 0x4004b0: f1. (2 locations)
>>>>>
>>>>> Breakpoint 1, f1 (i=5) at vla-1.c:17
>>>>> 17        return a[0];
>>>>> $1 = 0
>>>>> ...
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> When compiling guality/vla-1.c with -O3 -g, vla 'a[i + 1]' in f1 is optimized
>>>> away, but f1 still contains a debug expression describing the upper bound of
>>>> the vla (D.1914):
>>>> ...
>>>>  __attribute__((noinline))
>>>>  f1 (intD.6 iD.1900)
>>>>  {
>>>>
>>>>saved_stack.1_2 = __builtin_stack_save ();
>>>># DEBUG BEGIN_STMT
>>>># DEBUG D#3 => i_1(D) + 1
>>>># DEBUG D#2 => (long intD.8) D#3
>>>># DEBUG D#1 => D#2 + -1
>>>># DEBUG D.1914 => (sizetype) D#1
>>>> ...
>>>>
>>>> Then f1 is cloned to a version f1.constprop with no parameters, eliminating
>>>> parameter i, and 'DEBUG D#3 => i_1(D) + 1' turns into 'D#3 => NULL'.
>>>> Consequently, 'p

Re: [PATCH][debug] Handle references to skipped params in remap_ssa_name

2018-07-24 Thread Tom de Vries
On 07/19/2018 10:30 AM, Richard Biener wrote:
> On Wed, Jul 18, 2018 at 3:42 PM Tom de Vries  wrote:
>>
>> On 07/06/2018 12:28 PM, Richard Biener wrote:
>>> On Thu, Jul 5, 2018 at 4:12 PM Tom de Vries  wrote:
>>>>
>>>> On 07/05/2018 01:39 PM, Richard Biener wrote:
>>>>> On Thu, Jul 5, 2018 at 1:25 PM Tom de Vries  wrote:
>>>>>>
>>>>>> [ was: Re: [testsuite/guality, committed] Prevent optimization of local in
>>>>>> vla-1.c ]
>>>>>>
>>>>>> On Wed, Jul 04, 2018 at 02:32:27PM +0200, Tom de Vries wrote:
>>>>>>> On 07/03/2018 11:05 AM, Tom de Vries wrote:
>>>>>>>> On 07/02/2018 10:16 AM, Jakub Jelinek wrote:
>>>>>>>>> On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
>>>>>>>>>> Given the array has size i + 1 it's upper bound should be 'i' and 'i'
>>>>>>>>>> should be available via DW_OP_[GNU_]entry_value.
>>>>>>>>>>
>>>>>>>>>> I see it is
>>>>>>>>>>
>>>>>>>>>> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
>>>>>>>>>> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
>>>>>>>>>> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
>>>>>>>>>>
>>>>>>>>>> and %rdi is 1.  Not sure why gdb fails to print it's length.  Yes, the
>>>>>>>>>> storage itself doesn't have a location but the
>>>>>>>>>> type specifies the size.
>>>>>>>>>>
>>>>>>>>>> (gdb) ptype a
>>>>>>>>>> type = char [variable length]
>>>>>>>>>> (gdb) p sizeof(a)
>>>>>>>>>> $3 = 0
>>>>>>>>>>
>>>>>>>>>> this looks like a gdb bug to me?
>>>>>>>>>>
>>>>>>>>
>>>>>>>> With gdb patch:
>>>>>>>> ...
>>>>>>>> diff --git a/gdb/findvar.c b/gdb/findvar.c
>>>>>>>> index 8ad5e25cb2..ebaff923a1 100644
>>>>>>>> --- a/gdb/findvar.c
>>>>>>>> +++ b/gdb/findvar.c
>>>>>>>> @@ -789,6 +789,8 @@ default_read_var_value
>>>>>>>>break;
>>>>>>>>
>>>>>>>>  case LOC_OPTIMIZED_OUT:
>>>>>>>> +  if (is_dynamic_type (type))
>>>>>>>> +   type = resolve_dynamic_type (type, NULL,
>>>>>>>> +/* Unused address.  */ 0);
>>>>>>>>return allocate_optimized_out_value (type);
>>>>>>>>
>>>>>>>>  default:
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I get:
>>>>>>>> ...
>>>>>>>> $ ./gdb -batch -ex "b f1" -ex "r" -ex "p sizeof (a)" vla-1.exe
>>>>>>>> Breakpoint 1 at 0x4004a8: file vla-1.c, line 17.
>>>>>>>>
>>>>>>>> Breakpoint 1, f1 (i=i@entry=5) at vla-1.c:17
>>>>>>>> 17        return a[0];
>>>>>>>> $1 = 6
>>>>>>>> ...
>>>>>>>>
>>>>>>>
>>>>>>> Well, for -O1 and -O2.
>>>>>>>
>>>>>>> For O3, I get instead:
>>>>>>> ...
>>>>>>> $ ./gdb vla-1.exe -q -batch -ex "b f1" -ex "run" -ex "p sizeof (a)"
>>>>>>> Breakpoint 1 at 0x4004b0: f1. (2 locations)
>>>>>>>
>>>>>>> Breakpoint 1, f1 (i=5) at vla-1.c:17
>>>>>>> 17        return a[0];
>>>>>>> $1 = 0
>>>>>>> ...
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When compiling guality/vla-1.c with -O3 -g, vla 'a[i + 1]' in f1 is optimized
>>>>>> away, but f1 still contains a debug 

Re: [RFC][debug] Add -fadd-debug-nops

2018-07-24 Thread Tom de Vries
On 07/16/2018 05:10 PM, Tom de Vries wrote:
> On 07/16/2018 03:50 PM, Richard Biener wrote:
>> On Mon, 16 Jul 2018, Tom de Vries wrote:
>>> Any comments?
>>
>> Interesting idea.  I wonder if that should be generalized
>> to other places
> 
> I kept the option name general, to allow for that.
> 
> And indeed, this is a point-fix patch. I've been playing around with a
> more generic patch that adds nops such that each is_stmt .loc is
> associated with a unique insn, but that was embedded in an
> fkeep-vars-live branch, so this patch is minimally addressing the first
> problem I managed to reproduce on trunk.
> 
>> and how we can avoid compare-debug failures
>> (var-tracking usually doesn't change code-generation).
>>
> 

I'll post this patch series (the current state of my fkeep-vars-live
branch) in reply to this email:

 1  [debug] Add fdebug-nops
 2  [debug] Add fkeep-vars-live
 3  [debug] Add fdebug-nops and fkeep-vars-live to Og only

Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
header comments missing.

Comments welcome.

Thanks,
- Tom


[RFC 1/3, debug] Add fdebug-nops

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > On 07/16/2018 03:50 PM, Richard Biener wrote:
> >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> >>> Any comments?
> >>
> >> Interesting idea.  I wonder if that should be generalized
> >> to other places
> > 
> > I kept the option name general, to allow for that.
> > 
> > And indeed, this is a point-fix patch. I've been playing around with a
> > more generic patch that adds nops such that each is_stmt .loc is
> > associated with a unique insn, but that was embedded in an
> > fkeep-vars-live branch, so this patch is minimally addressing the first
> > problem I managed to reproduce on trunk.
> > 
> >> and how we can avoid compare-debug failures
> >> (var-tracking usually doesn't change code-generation).
> >>
> > 
> 
> I'll post this patch series (the current state of my fkeep-vars-live
> branch) in reply to this email:
> 
>  1  [debug] Add fdebug-nops
>  2  [debug] Add fkeep-vars-live
>  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> 
> Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> header comments missing.
> 
> Comments welcome.
> 

Consider this gdb session in foo of pr54200-2.c (using -Os):
...
(gdb) n
26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
'a' has unknown type; cast it to its declared type
(gdb) n
main () at pr54200-2.c:34
34        return 0;
...

The problem is that the scope in which a is declared ends at .LBE7, and the
statement .loc for line 26 ends up attached to the ret insn:
...
.loc 1 24 11
addl    %edx, %eax
.LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
.LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
for O1 and higher) by adding a nop after the ".loc is_stmt 1" annotation:
...
.loc 1 24 11
addl    %edx, %eax
 .LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
+   nop
 .LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

and instead we have:
...
(gdb) n
26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
$1 = 6
(gdb) n
main () at pr54200-2.c:34
34        return 0;
...

The option should have the same effect as using a debugger with location views
capability.
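
To make the intended effect concrete, here is a toy model in C.  It is not the
var-tracking.c code from this patch: the item kinds and the names
insert_nops_model and model_listing are made up for illustration, and the rule
it applies (add a nop after an is_stmt .loc that is not directly followed by a
real insn) is an assumption about the simplest form of the idea:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these item kinds are hypothetical and are not GCC's
   RTL representation.  */
enum item_kind
{
  ITEM_INSN,       /* A real instruction, e.g. addl or ret.  */
  ITEM_LOC_STMT,   /* A ".loc ... is_stmt 1" annotation.  */
  ITEM_LOC_OTHER,  /* A ".loc ... is_stmt 0" annotation.  */
  ITEM_SCOPE_END,  /* A scope-end label such as .LBE7.  */
  ITEM_NOP         /* A nop added by the model.  */
};

/* Copy IN[0..N) to OUT, adding an ITEM_NOP after every ITEM_LOC_STMT
   that is not directly followed by an ITEM_INSN.  OUT must have room
   for 2 * N items.  Return the number of items written.  */
size_t
insert_nops_model (const enum item_kind *in, size_t n, enum item_kind *out)
{
  size_t m = 0;
  int pending = 0;  /* Nonzero if the previous item was an is_stmt .loc.  */

  for (size_t i = 0; i < n; i++)
    {
      if (pending && in[i] != ITEM_INSN)
        out[m++] = ITEM_NOP;
      pending = (in[i] == ITEM_LOC_STMT);
      out[m++] = in[i];
    }
  if (pending)
    out[m++] = ITEM_NOP;

  assert (m <= 2 * n);
  return m;
}

/* The listing above, reduced to the model's item kinds.  OUT must have
   room for 12 items.  */
size_t
model_listing (enum item_kind *out)
{
  static const enum item_kind in[] = {
    ITEM_LOC_STMT,   /* .loc 1 24 11 (treated as a statement .loc here)  */
    ITEM_INSN,       /* addl  */
    ITEM_LOC_STMT,   /* .loc 1 26 7 is_stmt 1  */
    ITEM_SCOPE_END,  /* .LBE7  */
    ITEM_LOC_OTHER,  /* .loc 1 28 1 is_stmt 0  */
    ITEM_INSN        /* ret  */
  };
  return insert_nops_model (in, sizeof in / sizeof in[0], out);
}
```

In this model the .loc for line 24 already has its own insn and is left alone,
while the .loc for line 26 gets a nop in front of the scope end, so the line has
an address that is still inside the scope of a.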

There's a design principle in GCC that code generation and debug generation
are independent.  This guarantees that if you're encountering a problem in an
application without debug info, you can recompile it with -g and be certain
that you can reproduce the same problem, and use the debug info to debug the
problem.  This invariant is enforced by bootstrap-debug.  The fdebug-nops option
breaks this invariant, but we consider this acceptable since it is intended
for use in combination with -Og -g in the standard edit-compile-debug cycle,
where -g is assumed to be already present at the point of encountering a
problem.

PR debug/78685

[debug] Add fdebug-nops

---
 gcc/common.opt   |  4 +++
 gcc/testsuite/gcc.dg/guality/pr54200-2.c | 37 +
 gcc/var-tracking.c   | 46 
 3 files changed, 87 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4bf8de90331..984b351cd79 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1492,6 +1492,10 @@ Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fdebug-nops
+Common Report Var(flag_debug_nops) Optimization
+Insert nops if that might improve debugging of optimized code.
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/testsuite/gcc.dg/guality/pr54200-2.c b/gcc/testsuite/gcc.dg/guality/pr54200-2.c
new file mode 100644
index 000..e30e3c92b94
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr54200-2.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* }  { "*" } { "-Og" "-Os" "-O0" } } */
+/* { dg-options "-g -fdebug-nops -DMAIN" } */
+
+#include "prevent-optimization.h"
+
+int o ATTRIBUTE_USED;
+
+void
+bar (void)
+{
+  o = 2;
+}
+
+int __attribute__((noinline,noclone))
+foo (int z, int x, int b)
+{
+  if (x == 1)
+{
+  bar ();
+  return z;
+}
+  else
+{
+  int a = (x + z) + b;
+  /* Add cast to prevent unsupported.  */
+  return a; /* { dg-final { gdb-test . "(int)a" "

[RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > On 07/16/2018 03:50 PM, Richard Biener wrote:
> >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> >>> Any comments?
> >>
> >> Interesting idea.  I wonder if that should be generalized
> >> to other places
> > 
> > I kept the option name general, to allow for that.
> > 
> > And indeed, this is a point-fix patch. I've been playing around with a
> > more generic patch that adds nops such that each is_stmt .loc is
> > associated with a unique insn, but that was embedded in an
> > fkeep-vars-live branch, so this patch is minimally addressing the first
> > problem I managed to reproduce on trunk.
> > 
> >> and how we can avoid compare-debug failures
> >> (var-tracking usually doesn't change code-generation).
> >>
> > 
> 
> I'll post this patch series (the current state of my fkeep-vars-live
> branch) in reply to this email:
> 
>  1  [debug] Add fdebug-nops
>  2  [debug] Add fkeep-vars-live
>  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> 
> Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> header comments missing.
> 
> Comments welcome.
> 

This patch adds fake uses of user variables at the point where they go out of
scope, to keep user variables inspectable throughout the application.

This approach will generate sub-optimal code: in some cases, the executable
code will go through efforts to keep a var alive, while var-tracking can
easily compute the value of the var from something else.

Also, the compiler treats the fake use as any other use, so it may keep an
expensive resource like a register occupied (if we could mark the use as a
cold use or some such, we could tell optimizers that we need the value, but
it's ok if getting the value is expensive, so it could be spilled instead of
occupying a register).

The current implementation is expected to increase register pressure, and
therefore spilling, but we'd still expect fewer memory accesses than with -O0.

Another drawback is that the fake uses confuse the uninitialized warning
analysis, so that is switched off for -fkeep-vars-live.
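
As a rough illustration of what a fake use does, a similar effect can be
approximated by hand in GNU C with an empty asm statement that takes the
variable as an input.  This is only an analogy and not how the patch implements
it (the patch builds the uses in the gimplifier); the FAKE_USE macro and the
functions scaled_sum and check_scaled_sum are made-up names:

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: an empty asm that names
   VAR as an input operand, so the compiler has to keep VAR's value
   available (in a register, in memory or as a constant) at this point.
   Relies on GNU C extended asm.  */
#define FAKE_USE(var) __asm__ volatile ("" : : "g" (var))

int
scaled_sum (int x, int z, int b)
{
  int r;
  {
    int a = (x + z) + b;
    r = a * 2;          /* Last real use of a.  */
    FAKE_USE (a);       /* Artificial use at the end of a's scope.  */
  }
  return r;
}

/* Small self-check for the sketch.  */
void
check_scaled_sum (void)
{
  assert (scaled_sum (2, 3, 1) == 12);
}
```

Without the artificial use, the compiler is free to reuse the location holding a
right after r is computed; with it, a has to stay retrievable up to the end of
the inner block, which is also where the extra register pressure comes from.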

PR debug/78685

[debug] Add fkeep-vars-live

---
 gcc/cfgexpand.c  |  9 +++
 gcc/common.opt   |  4 +++
 gcc/function.c   |  5 ++--
 gcc/gimplify.c   | 44 +++-
 gcc/testsuite/gcc.dg/guality/pr54200-2.c |  3 +--
 gcc/tree-cfg.c   |  1 +
 gcc/tree-sra.c   |  7 +
 gcc/tree-ssa-alias.c |  4 +++
 gcc/tree-ssa-structalias.c   |  3 ++-
 gcc/tree-ssa-uninit.c|  2 +-
 10 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..eb9e36a8c5b 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3533,6 +3533,13 @@ expand_clobber (tree lhs)
 }
 }
 
+static void
+expand_use (tree rhs)
+{
+  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  emit_use (target);
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
STMT that doesn't require special handling for outgoing edges.  That
is no tailcalls and no GIMPLE_COND.  */
@@ -3632,6 +3639,8 @@ expand_gimple_stmt_1 (gimple *stmt)
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
  expand_clobber (lhs);
+   else if (TREE_CLOBBER_P (lhs))
+ expand_use (rhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
diff --git a/gcc/common.opt b/gcc/common.opt
index 984b351cd79..a29e320ba45 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1496,6 +1496,10 @@ fdebug-nops
 Common Report Var(flag_debug_nops) Optimization
 Insert nops if that might improve debugging of optimized code.
 
+fkeep-vars-live
+Common Report Var(flag_keep_vars_live) Optimization
+Add artificial uses of local vars at end of scope.
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/function.c b/gcc/function.c
index dee303cdbdd..6367d282db3 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -1964,11 +1964,12 @@ instantiate_virtual_regs (void)
   {
/* These patterns in the instruction stream can never be recognized.
   Fortunately, they shouldn't contain virtual registers either.  */
-if (GET_CODE (PATTERN (insn)) == USE
-   || GET_CODE (PATTERN (insn)) == CLOBBER
+if (GET_CODE (PATTERN (insn)) == CLOBBER

[RFC 3/3, debug] Add fdebug-nops and fkeep-vars-live to Og only

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > On 07/16/2018 03:50 PM, Richard Biener wrote:
> >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> >>> Any comments?
> >>
> >> Interesting idea.  I wonder if that should be generalized
> >> to other places
> > 
> > I kept the option name general, to allow for that.
> > 
> > And indeed, this is a point-fix patch. I've been playing around with a
> > more generic patch that adds nops such that each is_stmt .loc is
> > associated with a unique insn, but that was embedded in an
> > fkeep-vars-live branch, so this patch is minimally addressing the first
> > problem I managed to reproduce on trunk.
> > 
> >> and how we can avoid compare-debug failures
> >> (var-tracking usually doesn't change code-generation).
> >>
> > 
> 
> I'll post this patch series (the current state of my fkeep-vars-live
> branch) in reply to this email:
> 
>  1  [debug] Add fdebug-nops
>  2  [debug] Add fkeep-vars-live
>  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> 
> Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> header comments missing.
> 
> Comments welcome.
> 

This switches on fdebug-nops and fkeep-vars-live by default for Og only, in
order to exercise the new switches in combination with the optimization
option they're intended to be used with.

Resulting fixes:
...
FAIL: gcc.dg/guality/pr54200.c  -Og -DPREVENT_OPTIMIZATION  line . z == 3
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 3
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 3
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 2
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 a[0] == 1
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 12
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 13
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 12
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 6
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 6
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+3 a[1] == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+2 a[2] == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 *p == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 5
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+7 a[1] == 25
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+6 a[2] == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+5 *p == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+4 p[-1] == 25
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line .+1 q[1] == 26
FAIL: gcc.dg/guality/pr54970.c  -Og -DPREVENT_OPTIMIZATION  line . *q == 25
...

The intended nature of Og is: offering a reasonable level of optimization
while maintaining fast compilation and a good debugging experience.
Todo: assess the compile-time and runtime cost of adding fdebug-nops and
fkeep-vars-live to Og, to see whether the nature of Og has been changed.

PR debug/78685

[debug] Add fdebug-nops and fkeep-vars-live to Og only

---
 gcc/common/common-target.h | 1 +
 gcc/opts.c | 6 ++
 gcc/testsuite/gcc.dg/pr84614.c | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/common/common-target.h b/gcc/common/common-target.h
index 9b796c8f085..17e8950a027 100644
--- a/gcc/common/common-target.h
+++ b/gcc/common/common-target.h
@@ -34,6 +34,7 @@ enum opt_levels
   OPT_LEVELS_1_PLUS, /* -O1 and above, including -Os and -Og.  */
   OPT_LEVELS_1_PLUS_SPEED_ONLY, /* -O1 and above, but not -Os or -Og.  */
   OPT_LEVELS_1_PLUS_NOT_DEBUG, /* -O1 and above, but not -Og.  */
+  OPT_LEVELS_1_DEBUG, /* -Og.  */
   OPT_LEVELS_2_PLUS, /* -O2 and above, in

Re: [RFC 2/3, debug] Add fkeep-vars-live

2018-07-24 Thread Tom de Vries
On 07/24/2018 01:46 PM, Jakub Jelinek wrote:
> On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
>> Another drawback is that the fake uses confuse the unitialized warning
>> analysis, so that is switched off for -fkeep-vars-live.
> 
> Is that really needed?  I.e. can't you for the purpose of uninitialized
> warning analysis ignore the clobber = var uses?
> 

This seems to work on the test-case that failed during testing
(g++.dg/uninit-pred-4.C):
...
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 77f090bfa80..953db9ed02d 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -132,6 +132,9 @@ warn_uninit (enum opt_code wc, tree t, tree expr, tree var,
   if (is_gimple_assign (context)
   && gimple_assign_rhs_code (context) == COMPLEX_EXPR)
 return;
+  if (gimple_assign_single_p (context)
+  && TREE_CLOBBER_P (gimple_assign_lhs (context)))
+return;
   if (!has_undefined_value_p (t))
 return;
...
But I don't know the pass well enough to know whether this is a
sufficient fix.


> Is the -fkeep-vars-live option -fcompare-debug friendly?
> 

I think so, there's no reference to debug flags or instructions.

>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -3533,6 +3533,13 @@ expand_clobber (tree lhs)
>>  }
>>  }
>>  
>> +static void
>> +expand_use (tree rhs)
>> +{
>> +  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>> +  emit_use (target);
>> +}
> 
> Missing function comment.
> 
>> +fkeep-vars-live
>> +Common Report Var(flag_keep_vars_live) Optimization
>> +Add artificial uses of local vars at end of scope.
> 
> at the end of scope?

Is this better?
+Add artificial use for each local variable at the end of the declaration scope

Thanks,
- Tom


[PATCH, debug] Add fdebug-nops

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 01:35:14PM +0200, Tom de Vries wrote:
> On Tue, Jul 24, 2018 at 01:30:30PM +0200, Tom de Vries wrote:
> > On 07/16/2018 05:10 PM, Tom de Vries wrote:
> > > On 07/16/2018 03:50 PM, Richard Biener wrote:
> > >> On Mon, 16 Jul 2018, Tom de Vries wrote:
> > >>> Any comments?
> > >>
> > >> Interesting idea.  I wonder if that should be generalized
> > >> to other places
> > > 
> > > I kept the option name general, to allow for that.
> > > 
> > > And indeed, this is a point-fix patch. I've been playing around with a
> > > more generic patch that adds nops such that each is_stmt .loc is
> > > associated with a unique insn, but that was embedded in an
> > > fkeep-vars-live branch, so this patch is minimally addressing the first
> > > problem I managed to reproduce on trunk.
> > > 
> > >> and how we can avoid compare-debug failures
> > >> (var-tracking usually doesn't change code-generation).
> > >>
> > > 
> > 
> > I'll post this patch series (the current state of my fkeep-vars-live
> > branch) in reply to this email:
> > 
> >  1  [debug] Add fdebug-nops
> >  2  [debug] Add fkeep-vars-live
> >  3  [debug] Add fdebug-nops and fkeep-vars-live to Og only
> > 
> > Bootstrapped and reg-tested on x86_64. ChangeLog entries and function
> > header comments missing.
> > 
> > Comments welcome.
> > 
> 

And here's a version that is fcompare-debug friendly.

OK for trunk?

Thanks,
- Tom

[debug] Add fdebug-nops

Consider this gdb session in foo of pr54200-2.c (using -Os):
...
(gdb) n
26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
'a' has unknown type; cast it to its declared type
(gdb) n
main () at pr54200-2.c:34
34        return 0;
...

The problem is that the scope in which a is declared ends at .LBE7, and the
statement .loc for line 26 ends up attached to the ret insn:
...
.loc 1 24 11
addl    %edx, %eax
.LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
.LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

This patch fixes the problem (for Og and Os, the 'DEBUG a => ax' is missing
for O1 and higher) by adding a nop after the ".loc is_stmt 1" annotation:
...
.loc 1 24 11
addl    %edx, %eax
 .LVL1:
# DEBUG a => ax
.loc 1 26 7 is_stmt 1
+   nop
 .LBE7:
.loc 1 28 1 is_stmt 0
ret
.cfi_endproc
...

and instead we have:
...
(gdb) n
26        return a; /* { dg-final { gdb-test . "(int)a" "6" } } */
(gdb) p a
$1 = 6
(gdb) n
main () at pr54200-2.c:34
34        return 0;
...

The option should have the same effect as using a debugger with location views
capability.

2018-07-24  Tom de Vries  

PR debug/78685
* common.opt (fdebug-nops): New option.
* passes.def (pass_late_compilation): Add pass_debug_nops.
* rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): Add flag_debug_nops.
* timevar.def (TV_VAR_DEBUG_NOPS): New timevar.
* tree-pass.h (make_pass_debug_nops): Declare.
* tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Add flag_debug_nops.
* var-tracking.c (insert_debug_nops): New function.
(class pass_debug_nops): New pass.
(make_pass_debug_nops): New function.

* gcc.dg/guality/pr54200-2.c: New test.

---
 gcc/common.opt   |  4 ++
 gcc/passes.def   |  1 +
 gcc/rtl.h|  2 +-
 gcc/testsuite/gcc.dg/guality/pr54200-2.c | 37 +
 gcc/timevar.def  |  1 +
 gcc/tree-pass.h  |  1 +
 gcc/tree.h   |  2 +-
 gcc/var-tracking.c   | 91 
 8 files changed, 137 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 4bf8de90331..984b351cd79 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1492,6 +1492,10 @@ Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fdebug-nops
+Common Report Var(flag_debug_nops) Optimization
+Insert nops if that might improve debugging of optimized code.
+
 fkeep-gc-roots-live
 Common Undocumented Report Var(flag_keep_gc_roots_live) Optimization
 ; Always keep a pointer to a live memory block
diff --git a/gcc/passes.def b/gcc/passes.def
index 2a8fbc2efbe..e97c4f9533a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -487,6 +487,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_late_compilation);
   PUSH_INSERT_PASSES_WITH

[PATCH, debug] Add fkeep-vars-live

2018-07-24 Thread Tom de Vries
On Tue, Jul 24, 2018 at 02:34:26PM +0200, Tom de Vries wrote:
> On 07/24/2018 01:46 PM, Jakub Jelinek wrote:
> > On Tue, Jul 24, 2018 at 01:37:32PM +0200, Tom de Vries wrote:
> >> Another drawback is that the fake uses confuse the unitialized warning
> >> analysis, so that is switched off for -fkeep-vars-live.
> > 
> > Is that really needed?  I.e. can't you for the purpose of uninitialized
> > warning analysis ignore the clobber = var uses?
> > 
> 
> This seems to work on the test-case that failed during testing
> (g++.dg/uninit-pred-4.C):
> ...
> diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
> index 77f090bfa80..953db9ed02d 100644
> --- a/gcc/tree-ssa-uninit.c
> +++ b/gcc/tree-ssa-uninit.c
> @@ -132,6 +132,9 @@ warn_uninit (enum opt_code wc, tree t, tree expr, tree var,
>if (is_gimple_assign (context)
>&& gimple_assign_rhs_code (context) == COMPLEX_EXPR)
>  return;
> +  if (gimple_assign_single_p (context)
> +  && TREE_CLOBBER_P (gimple_assign_lhs (context)))
> +return;
>if (!has_undefined_value_p (t))
>  return;
> ...
> But I don't know the pass well enough to know whether this is a
> sufficient fix.
> 

Updated and re-tested patch.

> +Add artificial use for each local variable at the end of the declaration scope

Is this a better option description?


OK for trunk?

Thanks,
- Tom

[debug] Add fkeep-vars-live

This patch adds fake uses of user variables at the point where they go out of
scope, to keep user variables inspectable throughout the application.

This approach will generate sub-optimal code: in some cases, the executable
code will go through efforts to keep a var alive, while var-tracking can
easily compute the value of the var from something else.

Also, the compiler treats the fake use as any other use, so it may keep an
expensive resource like a register occupied (if we could mark the use as a
cold use or some such, we could tell optimizers that we need the value, but
it's ok if getting the value is expensive, so it could be spilled instead of
occupying a register).

The current implementation is expected to increase register pressure, and
therefore spilling, but we'd still expect fewer memory accesses than with -O0.

2018-07-24  Tom de Vries  

PR debug/78685
* cfgexpand.c (expand_use): New function.
(expand_gimple_stmt_1): Handle TREE_CLOBBER_P as lhs of assignment.
* common.opt (fkeep-vars-live): New option.
* function.c (instantiate_virtual_regs): Instantiate in USEs as well.
* gimplify.c (gimple_build_uses): New function.
(gimplify_bind_expr): Build clobber uses for variables that don't have
to be in memory.
(gimplify_body): Build clobber uses for arguments.
* tree-cfg.c (verify_gimple_assign_single): Handle TREE_CLOBBER_P as lhs
of assignment.
* tree-sra.c (sra_modify_assign): Same.
* tree-ssa-alias.c (refs_may_alias_p_1): Same.
* tree-ssa-structalias.c (find_func_aliases): Same.
* tree-ssa-uninit.c (warn_uninit): Same.

* gcc.dg/guality/pr54200-2.c: Update.

---
 gcc/cfgexpand.c  | 11 
 gcc/common.opt   |  4 +++
 gcc/function.c   |  5 ++--
 gcc/gimplify.c   | 46 +++-
 gcc/testsuite/gcc.dg/guality/pr54200-2.c |  3 +--
 gcc/tree-cfg.c   |  1 +
 gcc/tree-sra.c   |  7 +
 gcc/tree-ssa-alias.c |  4 +++
 gcc/tree-ssa-structalias.c   |  3 ++-
 gcc/tree-ssa-uninit.c|  3 +++
 10 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..e28e8ceec75 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3533,6 +3533,15 @@ expand_clobber (tree lhs)
 }
 }
 
+/* Expand a use of RHS.  */
+
+static void
+expand_use (tree rhs)
+{
+  rtx target = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  emit_use (target);
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
STMT that doesn't require special handling for outgoing edges.  That
is no tailcalls and no GIMPLE_COND.  */
@@ -3632,6 +3641,8 @@ expand_gimple_stmt_1 (gimple *stmt)
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
  expand_clobber (lhs);
+   else if (TREE_CLOBBER_P (lhs))
+ expand_use (rhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
diff --git a/gcc/common.opt b/gcc/common.opt
index 984b351cd79..a29e320ba45 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1496,6 +1496,10 @@ fdebug-nops
 

Re: [RFC 1/3, debug] Add fdebug-nops

2018-07-24 Thread Tom de Vries
On 07/24/2018 09:06 PM, Alexandre Oliva wrote:
> On Jul 24, 2018, Tom de Vries  wrote:
> 
>> There's a design principle in GCC that code generation and debug generation
>> are independent.  This guarantees that if you're encountering a problem in an
>> application without debug info, you can recompile it with -g and be certain
>> that you can reproduce the same problem, and use the debug info to debug the
>> problem.  This invariant is enforced by bootstrap-debug.  The fdebug-nops
>> breaks this invariant
> 
> I thought of a way to not break it: enable the debug info generation
> machinery, including VTA and SFN, but discard those only at the very end
> if -g is not enabled.  The downside is that it would likely slow -Og
> down significantly, but who uses it without -g anyway?

I thought of the same.  I've submitted a patch here that uses SFN:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01391.html . VTA is not
needed AFAIU.

Thanks,
- Tom


[committed, libgomp, openacc, testsuite] Fix async/wait logic in lib-13.f90

2018-07-26 Thread Tom de Vries
Hi,

the purpose of the lib-13.f90 test-case is to test acc_wait_all_async.  The
test indeed calls acc_wait_all_async, but then subsequently calls
acc_wait_all, so the acc_wait_all_async functionality is not tested.
Furthermore, all acc_async_test calls are placed in a location where they are
not guaranteed to succeed, which explains why there's an xfail for the lower
optimization levels.

This patch fixes the problems by replacing acc_wait_all with an acc_wait on
the async id used for the acc_wait_all_async call, and moving the
acc_async_test calls to the correct locations.

Reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, openacc, testsuite] Fix async/wait logic in lib-13.f90

2018-07-26  Tom de Vries  

* testsuite/libgomp.oacc-fortran/lib-13.f90: Replace acc_wait_all with
acc_wait.  Move acc_async_test calls to correct locations.  Remove
xfail.

---
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
index 6d713b1cd95..da944c35de9 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
 
 program main
   use openacc
@@ -22,13 +21,12 @@ program main
 end do
   !$acc end data
 
-  if (acc_async_test (1) .neqv. .TRUE.) call abort
-  if (acc_async_test (2) .neqv. .TRUE.) call abort
-
   call acc_wait_all_async (nprocs + 1)
 
-  if (acc_async_test (nprocs + 1) .neqv. .TRUE.) call abort
+  call acc_wait (nprocs + 1)
 
-  call acc_wait_all ()
+  if (acc_async_test (1) .neqv. .TRUE.) call abort
+  if (acc_async_test (2) .neqv. .TRUE.) call abort
+  if (acc_async_test (nprocs + 1) .neqv. .TRUE.) call abort
 
 end program


[committed, libgomp, openacc, testsuite] Fix async logic in lib-12.f90

2018-07-26 Thread Tom de Vries
Hi,

in testcase lib-12.f90, all acc_async_test calls are placed in a location
where they are not guaranteed to succeed, which explains why there's an xfail
for the lower optimization levels.

This patch fixes the problem by moving the acc_async_test calls to the correct
locations.

Reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, openacc, testsuite] Fix async logic in lib-12.f90

2018-07-26  Tom de Vries  

* testsuite/libgomp.oacc-fortran/lib-12.f90: Move acc_async_test calls
to correct locations.  Remove xfail.

---
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
index e307dfde374..6912f67d444 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "-O0" "-O1" } { "" } }
 
 program main
   use openacc
@@ -18,10 +17,9 @@ program main
 
   call acc_wait_async (0, 1)
 
-  if (acc_async_test (0) .neqv. .TRUE.) call abort
+  call acc_wait (1)
 
+  if (acc_async_test (0) .neqv. .TRUE.) call abort
   if (acc_async_test (1) .neqv. .TRUE.) call abort
 
-  call acc_wait (1)
-
 end program


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Tom de Vries
> Content-Type: text/x-patch; name="trunk-libgomp-default-par.diff"
> Content-Transfer-Encoding: 7bit
> Content-Disposition: attachment; filename="trunk-libgomp-default-par.diff"

From https://gcc.gnu.org/contribute.html#patches :
...
We prefer patches posted as plain text or as MIME parts of type
text/x-patch or text/plain, disposition inline, encoded as 7bit or 8bit.
It is strongly discouraged to post patches as MIME parts of type
application/whatever, disposition attachment or encoded as base64 or
quoted-printable.
...

Please post with content-disposition inline instead of attachment (or,
as plain text).

Thanks,
- Tom


[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open

2018-07-26 Thread Tom de Vries
> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
> index 89326e57741..5022e462a3d 100644
> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -1120,6 +1126,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>void *hp, *dp;
>struct nvptx_thread *nvthd = nvptx_thread ();
>const char *maybe_abort_msg = "(perhaps abort was called)";
> +  int dev_size = nvthd->ptx_dev->num_sms;
>  
>function = targ_fn->fn;
>  
> @@ -1150,23 +1156,20 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
> for (int i = 0; i < GOMP_DIM_MAX; ++i)
>   default_dims[i] = GOMP_PLUGIN_acc_default_dim (i);
>  
> -   int warp_size, block_size, dev_size, cpu_size;
> +   int warp_size, block_size, cpu_size;
> CUdevice dev = nvptx_thread()->ptx_dev->dev;
> /* 32 is the default for known hardware.  */
> int gang = 0, worker = 32, vector = 32;
> -   CUdevice_attribute cu_tpb, cu_ws, cu_mpc, cu_tpm;
> +   CUdevice_attribute cu_tpb, cu_ws, cu_tpm;
>  
> cu_tpb = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK;
> cu_ws = CU_DEVICE_ATTRIBUTE_WARP_SIZE;
> -   cu_mpc = CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
> cu_tpm  = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR;
>  
> if (CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &block_size, cu_tpb,
>dev) == CUDA_SUCCESS
> && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &warp_size, cu_ws,
>   dev) == CUDA_SUCCESS
> -   && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &dev_size, cu_mpc,
> - dev) == CUDA_SUCCESS
> && CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &cpu_size, cu_tpm,
>   dev) == CUDA_SUCCESS)
>   {

This is a good idea (and should have been an independent patch of course).

Furthermore, it's better to move the remaining cuDeviceGetAttribute
calls to nvptx_open, as was already suggested by Thomas here (
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01020.html ).

Committed to trunk.

- Tom
[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open

Move sampling of device properties from nvptx_exec to nvptx_open, and assume
the sampling always succeeds.  This simplifies the default dimension
initialization code in nvptx_open.

2018-07-26  Cesar Philippidis  
	Tom de Vries  

	* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
	max_threads_per_block and max_threads_per_multiprocessor fields.
	(nvptx_open_device): Initialize new fields.
	(nvptx_exec): Use num_sms, and new fields.

---
 libgomp/plugin/plugin-nvptx.c | 53 +--
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 89326e57741..5d9b5151e95 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -414,6 +414,9 @@ struct ptx_device
   int num_sms;
   int regs_per_block;
   int regs_per_sm;
+  int warp_size;
+  int max_threads_per_block;
+  int max_threads_per_multiprocessor;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -800,6 +803,15 @@ nvptx_open_device (int n)
   GOMP_PLUGIN_error ("Only warp size 32 is supported");
   return NULL;
 }
+  ptx_dev->warp_size = pi;
+
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute, &pi,
+		  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, dev);
+  ptx_dev->max_threads_per_block = pi;
+
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute, &pi,
+		  CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR, dev);
+  ptx_dev->max_threads_per_multiprocessor = pi;
 
   r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &async_engines,
 			 CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
@@ -1150,33 +1162,20 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	  for (int i = 0; i < GOMP_DIM_MAX; ++i)
 	default_dims[i] = GOMP_PLUGIN_acc_default_dim (i);
 
-	  int warp_size, block_size, dev_size, cpu_size;
-	  CUdevice dev = nvptx_thread()->ptx_dev->dev;
-	  /* 32 is the default for known hardware.  */
-	  int gang = 0, worker = 32, vector = 32;
-	  CUdevice_attribute cu_tpb, cu_ws, cu_mpc, cu_tpm;
-
-	  cu_tpb = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK;
-	  cu_ws = CU_DEVICE_ATTRIBUTE_WARP_SIZE;
-	  cu_mpc = CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
-	  cu_tpm  = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR;
-
-	  if (CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &block_size, cu_tpb,
- dev) == CUDA_SUCCESS
-	  && CUDA_CA
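
The idea of sampling device properties once at open time, rather than on
every launch, can be sketched in plain C as below.  This is only an
illustration under stated assumptions: attr_query_fn, fake_query, the
dev_attr enum and struct device_props are hypothetical stand-ins for
cuDeviceGetAttribute and struct ptx_device, not libgomp code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute identifiers, standing in for CU_DEVICE_ATTRIBUTE_*.  */
enum dev_attr
{
  ATTR_WARP_SIZE,
  ATTR_MAX_THREADS_PER_BLOCK,
  ATTR_MAX_THREADS_PER_MULTIPROCESSOR,
  ATTR_COUNT
};

/* Hypothetical query callback: returns 0 on success and stores the value.  */
typedef int (*attr_query_fn) (enum dev_attr attr, int *value);

/* Properties cached per device when the device is opened.  */
struct device_props
{
  int warp_size;
  int max_threads_per_block;
  int max_threads_per_multiprocessor;
};

/* Sample all properties once.  Return 0 on success, -1 if any query fails,
   so that later launch code can use the cached values unconditionally.  */
int
sample_device_props (attr_query_fn query, struct device_props *props)
{
  assert (query != NULL && props != NULL);

  if (query (ATTR_WARP_SIZE, &props->warp_size) != 0
      || query (ATTR_MAX_THREADS_PER_BLOCK,
                &props->max_threads_per_block) != 0
      || query (ATTR_MAX_THREADS_PER_MULTIPROCESSOR,
                &props->max_threads_per_multiprocessor) != 0)
    return -1;

  return 0;
}

/* Stand-in for a real driver query, with made-up but plausible values.  */
int
fake_query (enum dev_attr attr, int *value)
{
  static const int values[ATTR_COUNT] = { 32, 1024, 2048 };
  *value = values[attr];
  return 0;
}
```

With the values cached like this, the launch path only reads struct
fields, and a failed query is reported at open time instead of silently
changing the launch defaults.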

Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Tom de Vries
>> Right, in fact there are two separate things you're trying to address
>> here: launch failure and occupancy heuristic, so split the patch.

> That hunk was small, so I included it with this patch. Although if you
> insist, I can remove it.

Please, for future reference, always assume that I insist instead of
asking me, unless you have an argument to present why that is not a good
idea. And just to be clear here: "small" is not such an argument.

Please keep in mind ( https://gcc.gnu.org/contribute.html#patches ):
...
Don't mix together changes made for different reasons. Send them
individually.
...

> +  /* Check if the accelerator has sufficient hardware resources to
> + launch the offloaded kernel.  */
> +  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
> +  > targ_fn->max_threads_per_block)
> +GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
> +" launch '%s' with num_workers = %d and vector_length ="
> +" %d; recompile the program with 'num_workers = x and"
> +" vector_length = y' on that offloaded region or "
> +"'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
> +targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
> +dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
> +

This is copied from the state on an openacc branch where vector-length
is variable, and the error message text doesn't make sense on current
trunk for that reason. Also, it suggests a syntax for fopenacc-dim
that's not supported on trunk.

Committed as attached.

Thanks,
- Tom
[libgomp, nvptx] Add error with recompilation hint for launch failure

Currently, when a kernel is launched with too many workers, it results in a cuda
launch failure.  This is triggered f.i. for parallel-loop-1.c at -O0 on a Quadro
M1200.

This patch detects this situation, and errors out with a hint on how to fix it.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-07-26  Cesar Philippidis  
	Tom de Vries  

	* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
	sufficient resources to launch a kernel, and give a hint on how to fix
	it.

---
 libgomp/plugin/plugin-nvptx.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5d9b5151e95..3a4077a1315 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1204,6 +1204,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	  dims[i] = default_dims[i];
 }
 
+  /* Check if the accelerator has sufficient hardware resources to
+ launch the offloaded kernel.  */
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+  > targ_fn->max_threads_per_block)
+{
+  int suggest_workers
+	= targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR];
+  GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+			 " launch '%s' with num_workers = %d; recompile the"
+			 " program with 'num_workers = %d' on that offloaded"
+			 " region or '-fopenacc-dim=:%d'",
+			 targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+			 suggest_workers, suggest_workers);
+}
+
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
  the host and the device. HP is a host pointer to the new chunk, and DP is
  the corresponding device pointer.  */
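
The check in the hunk above boils down to a small piece of arithmetic,
which can be tried in isolation.  The sketch below is not the plugin
code; suggest_workers is a hypothetical helper that takes the same three
quantities as plain integers.

```c
#include <assert.h>

/* Hypothetical helper mirroring the arithmetic of the check above.
   Return 0 if WORKERS * VECTOR threads fit into one block, otherwise
   the largest worker count that fits for the given vector length.  */
int
suggest_workers (int workers, int vector, int max_threads_per_block)
{
  assert (vector > 0 && max_threads_per_block > 0);

  if (workers * vector <= max_threads_per_block)
    return 0;

  /* Integer division rounds down, so the suggestion never exceeds
     the per-function limit.  */
  return max_threads_per_block / vector;
}
```

For example, with 32 workers, a vector length of 32 and a per-function
limit of 512 threads, the launch does not fit and the hint is 16 workers.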


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Tom de Vries
On 07/26/2018 04:27 PM, Cesar Philippidis wrote:
> Hi Tom,
> 
> I see that you're reviewing the libgomp changes. Please disregard the
> following hunk:
> 
> On 07/11/2018 12:13 PM, Cesar Philippidis wrote:
>> @@ -1199,12 +1202,59 @@ nvptx_exec (void (*fn), size_t mapnum, void 
>> **hostaddrs, void **devaddrs,
>>   default_dims[GOMP_DIM_VECTOR]);
>>  }
>>pthread_mutex_unlock (&ptx_dev_lock);
>> +  int vectors = default_dims[GOMP_DIM_VECTOR];
>> +  int workers = default_dims[GOMP_DIM_WORKER];
>> +  int gangs = default_dims[GOMP_DIM_GANG];
>> +
>> +  if (nvptx_thread()->ptx_dev->driver_version > 6050)
>> +{
>> +  int grids, blocks;
>> +  CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
>> +&blocks, function, NULL, 0,
>> +dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
>> +  GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
>> + "grid = %d, block = %d\n", grids, blocks);
>> +
>> +  gangs = grids * dev_size;
>> +  workers = blocks / vectors;
>> +}
> 
> I revisited this change yesterday and I noticed it was setting gangs
> incorrectly. Basically, gangs should be set as follows
> 
>   gangs = grids * (blocks / warp_size);
> 
> or, to be closer to og8, as
> 
>   gangs = 2 * grids * (blocks / warp_size);
> 
> The use of that magic constant 2 is to prevent thread starvation. That's
> a similar concept behind make -j<2*#threads>.
> 
> Anyway, I'm still experimenting with that change. There are still some
> discrepancies between the way that I select num_workers and how the
> driver does. The driver appears to be a little bit more conservative,
> but according to the thread occupancy calculator, that should yield
> greater performance on GPUs.
> 
> I just wanted to give you a heads up because you seem to be working on this.
> 

Ack, thanks for letting me know.

> Thanks for all of your reviews!
> 
> By the way, are you now maintainer of the libgomp nvptx plugin?

I'm not sure if that's a separate thing.

AFAIU the responsibilities of the nvptx maintainer are:
- the nvptx backend (under supervision of the global maintainers)
- and anything nvptx-y in all other components (under supervision of the
  component and global maintainers)

So, I'd say I'm on the hook to review patches for the nvptx plugin in
libgomp.

Thanks,
- Tom
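
For reference, the gang and worker formulas quoted in this message can be
written out as a small function.  This is a sketch of the proposal under
discussion, not committed code: struct launch_dims, dims_from_occupancy
and the oversubscribe parameter (1 for the plain formula, 2 for the
og8-like variant) are names made up for illustration, and grids/blocks
stand for the two values returned by the occupancy query.

```c
#include <assert.h>

/* Hypothetical container for the derived launch geometry.  */
struct launch_dims
{
  int gangs;
  int workers;
  int vectors;
};

/* Derive the geometry from an occupancy result, following the formulas
   quoted above:
     workers = blocks / vectors
     gangs   = oversubscribe * grids * (blocks / warp_size)  */
struct launch_dims
dims_from_occupancy (int grids, int blocks, int warp_size, int vectors,
                     int oversubscribe)
{
  assert (warp_size > 0 && vectors > 0 && oversubscribe > 0);

  struct launch_dims d;
  d.vectors = vectors;
  d.workers = blocks / vectors;
  d.gangs = oversubscribe * grids * (blocks / warp_size);
  return d;
}
```

With grids = 10, blocks = 1024 and a warp size and vector length of 32,
this gives 32 workers and 320 gangs, or 640 gangs with the factor 2.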


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-30 Thread Tom de Vries
On 07/11/2018 09:13 PM, Cesar Philippidis wrote:
> 2018-07-XX  Cesar Philippidis  
>       Tom de Vries  
> 
>   gcc/
>   * config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to ...
>   (PTX_DEFAULT_RUNTIME_DIM): ... this.
>   (nvptx_goacc_validate_dims): Set default worker and gang dims to
>   PTX_DEFAULT_RUNTIME_DIM.
>   (nvptx_dim_limit): Ignore GOMP_DIM_WORKER;

That's an independent patch.

Committed as below.

Thanks,
- Tom
[nvptx, offloading] Determine default workers at runtime

Currently, if the user doesn't specify the number of workers for an openacc
region, the compiler hardcodes it to a default value.

This patch removes this functionality, such that the libgomp runtime can decide
on a default value.

2018-07-27  Cesar Philippidis  
	Tom de Vries  

	* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to ...
	(PTX_DEFAULT_RUNTIME_DIM): ... this.
	(nvptx_goacc_validate_dims): Set default worker and gang dims to
	PTX_DEFAULT_RUNTIME_DIM.
	(nvptx_dim_limit): Ignore GOMP_DIM_WORKER.

---
 gcc/config/nvptx/nvptx.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5608bee8a8d..c1946e75f42 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5165,7 +5165,7 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 /* Define dimension sizes for known hardware.  */
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
-#define PTX_GANG_DEFAULT  0 /* Defer to runtime.  */
+#define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
 /* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
 
@@ -5214,9 +5214,9 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
   dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
   if (dims[GOMP_DIM_WORKER] < 0)
-	dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+	dims[GOMP_DIM_WORKER] = PTX_DEFAULT_RUNTIME_DIM;
   if (dims[GOMP_DIM_GANG] < 0)
-	dims[GOMP_DIM_GANG] = PTX_GANG_DEFAULT;
+	dims[GOMP_DIM_GANG] = PTX_DEFAULT_RUNTIME_DIM;
   changed = true;
 }
 
@@ -5230,9 +5230,6 @@ nvptx_dim_limit (int axis)
 {
   switch (axis)
 {
-case GOMP_DIM_WORKER:
-  return PTX_WORKER_LENGTH;
-
 case GOMP_DIM_VECTOR:
   return PTX_VECTOR_LENGTH;
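
The "0 means defer to runtime" convention used by this patch can be
modelled in a few lines.  The sketch below is a simplified stand-in for
nvptx_goacc_validate_dims, assuming a negative value means "not specified
by the user"; the DIM_* enum, VECTOR_LENGTH and DEFAULT_RUNTIME_DIM are
local names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

#define VECTOR_LENGTH 32
#define DEFAULT_RUNTIME_DIM 0   /* Defer to runtime.  */

/* Simplified model: the vector length is fixed at compile time, while
   unspecified (negative) gang and worker dims become 0 so that the
   runtime picks a value.  Return 1 if DIMS was changed, else 0.  */
int
validate_dims (int dims[DIM_MAX])
{
  assert (dims != NULL);
  int changed = 0;

  if (dims[DIM_VECTOR] != VECTOR_LENGTH)
    {
      dims[DIM_VECTOR] = VECTOR_LENGTH;
      changed = 1;
    }
  if (dims[DIM_WORKER] < 0)
    {
      dims[DIM_WORKER] = DEFAULT_RUNTIME_DIM;
      changed = 1;
    }
  if (dims[DIM_GANG] < 0)
    {
      dims[DIM_GANG] = DEFAULT_RUNTIME_DIM;
      changed = 1;
    }
  return changed;
}
```

An explicit num_workers or num_gangs clause is left alone; only the
unspecified dimensions are handed over to the libgomp plugin.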
 


[libgomp, nvptx, committed] Calculate default dims per device

2018-07-30 Thread Tom de Vries
Hi,

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom
[libgomp, nvptx] Calculate default dims per device

The default dimensions are calculated using per-device properties, but
initialized once and used on all devices.

This patch fixes this problem by introducing per-device default dimensions.

2018-07-27  Tom de Vries  

	* plugin/plugin-nvptx.c (struct ptx_device): Add default_dims field.
	(nvptx_open_device): Init default_dims for device.
	(nvptx_exec): Use default_dims from device.

---
 libgomp/plugin/plugin-nvptx.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3a4077a1315..5c522aaf281 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -417,6 +417,7 @@ struct ptx_device
   int warp_size;
   int max_threads_per_block;
   int max_threads_per_multiprocessor;
+  int default_dims[GOMP_DIM_MAX];
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -818,6 +819,9 @@ nvptx_open_device (int n)
   if (r != CUDA_SUCCESS)
 async_engines = 1;
 
+  for (int i = 0; i != GOMP_DIM_MAX; i++)
+ptx_dev->default_dims[i] = 0;
+
   ptx_dev->images = NULL;
   pthread_mutex_init (&ptx_dev->image_lock, NULL);
 
@@ -1152,15 +1156,22 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
   if (seen_zero)
 {
-  /* See if the user provided GOMP_OPENACC_DIM environment
-	 variable to specify runtime defaults. */
-  static int default_dims[GOMP_DIM_MAX];
-
   pthread_mutex_lock (&ptx_dev_lock);
-  if (!default_dims[0])
+
+  static int gomp_openacc_dims[GOMP_DIM_MAX];
+  if (!gomp_openacc_dims[0])
+	{
+	  /* See if the user provided GOMP_OPENACC_DIM environment
+	 variable to specify runtime defaults.  */
+	  for (int i = 0; i < GOMP_DIM_MAX; ++i)
+	gomp_openacc_dims[i] = GOMP_PLUGIN_acc_default_dim (i);
+	}
+
+  if (!nvthd->ptx_dev->default_dims[0])
 	{
+	  int default_dims[GOMP_DIM_MAX];
 	  for (int i = 0; i < GOMP_DIM_MAX; ++i)
-	default_dims[i] = GOMP_PLUGIN_acc_default_dim (i);
+	default_dims[i] = gomp_openacc_dims[i];
 
 	  int gang, worker, vector;
 	  {
@@ -1196,12 +1207,15 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 			 default_dims[GOMP_DIM_GANG],
 			 default_dims[GOMP_DIM_WORKER],
 			 default_dims[GOMP_DIM_VECTOR]);
+
+	  for (i = 0; i != GOMP_DIM_MAX; i++)
+	nvthd->ptx_dev->default_dims[i] = default_dims[i];
 	}
   pthread_mutex_unlock (&ptx_dev_lock);
 
   for (i = 0; i != GOMP_DIM_MAX; i++)
 	if (!dims[i])
-	  dims[i] = default_dims[i];
+	  dims[i] = nvthd->ptx_dev->default_dims[i];
 }
 
   /* Check if the accelerator has sufficient hardware resources to
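
The problem fixed here is easiest to see with two devices of different
size.  The following sketch is a hypothetical model, not the plugin code:
struct device is a cut-down stand-in for struct ptx_device, and the fill-in
values (one gang per SM, 32 workers, 32 vectors) are placeholders rather
than the heuristic used in nvptx_exec.  Locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Cut-down stand-in for struct ptx_device.  */
struct device
{
  int num_sms;
  int default_dims[DIM_MAX];    /* All zero until first use.  */
};

/* Lazily compute the defaults for DEV and cache them in DEV itself, so
   that a second device does not inherit values derived from the first.
   ENV_DIMS holds user-provided defaults, with 0 meaning "not set".  */
const int *
device_default_dims (struct device *dev, const int env_dims[DIM_MAX])
{
  assert (dev != NULL && dev->num_sms > 0);

  if (dev->default_dims[DIM_GANG] == 0)
    {
      for (int i = 0; i < DIM_MAX; i++)
        dev->default_dims[i] = env_dims[i];

      /* Placeholder heuristics, assumptions for this sketch only.  */
      if (dev->default_dims[DIM_GANG] == 0)
        dev->default_dims[DIM_GANG] = dev->num_sms;
      if (dev->default_dims[DIM_WORKER] == 0)
        dev->default_dims[DIM_WORKER] = 32;
      if (dev->default_dims[DIM_VECTOR] == 0)
        dev->default_dims[DIM_VECTOR] = 32;
    }

  return dev->default_dims;
}
```

With a single function-level static array, whichever device launched
first would have determined the gang default for all devices; keeping the
array in the device structure avoids that.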


[libgomp, nvptx, committed] Handle per-function max-threads-per-block in default dims

2018-07-30 Thread Tom de Vries
Hi,

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom
[libgomp, nvptx] Handle per-function max-threads-per-block in default dims

Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the
kernel launch configurations exceeds the resources available in the device, due
to the default dimensions chosen by the runtime.

This patch fixes that by taking the per-function max_threads_per_block into
account when using the default dimensions.

2018-07-27  Tom de Vries  

	* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
	(nvptx_exec): Ensure worker and vector default dims don't exceed
	targ_fn->max_threads_per_block.

---
 libgomp/plugin/plugin-nvptx.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5c522aaf281..b6ec5f88d59 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -141,6 +141,11 @@ init_cuda_lib (void)
 
 #include "secure_getenv.h"
 
+#undef MIN
+#undef MAX
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+#define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
+
 /* Convenience macros for the frequently used CUDA library call and
error handling sequence as well as CUDA library calls that
do the error checking themselves or don't do it at all.  */
@@ -1135,6 +1140,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   void *kargs[1];
   void *hp, *dp;
   struct nvptx_thread *nvthd = nvptx_thread ();
+  int warp_size = nvthd->ptx_dev->warp_size;
   const char *maybe_abort_msg = "(perhaps abort was called)";
 
   function = targ_fn->fn;
@@ -1175,7 +1181,6 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
 	  int gang, worker, vector;
 	  {
-	int warp_size = nvthd->ptx_dev->warp_size;
 	int block_size = nvthd->ptx_dev->max_threads_per_block;
 	int cpu_size = nvthd->ptx_dev->max_threads_per_multiprocessor;
 	int dev_size = nvthd->ptx_dev->num_sms;
@@ -1213,9 +1218,25 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 	}
   pthread_mutex_unlock (&ptx_dev_lock);
 
-  for (i = 0; i != GOMP_DIM_MAX; i++)
-	if (!dims[i])
-	  dims[i] = nvthd->ptx_dev->default_dims[i];
+  {
+	bool default_dim_p[GOMP_DIM_MAX];
+	for (i = 0; i != GOMP_DIM_MAX; i++)
+	  {
+	default_dim_p[i] = !dims[i];
+	if (default_dim_p[i])
+	  dims[i] = nvthd->ptx_dev->default_dims[i];
+	  }
+
+	if (default_dim_p[GOMP_DIM_VECTOR])
+	  dims[GOMP_DIM_VECTOR]
+	= MIN (dims[GOMP_DIM_VECTOR],
+		   (targ_fn->max_threads_per_block / warp_size * warp_size));
+
+	if (default_dim_p[GOMP_DIM_WORKER])
+	  dims[GOMP_DIM_WORKER]
+	= MIN (dims[GOMP_DIM_WORKER],
+		   targ_fn->max_threads_per_block / dims[GOMP_DIM_VECTOR]);
+  }
 }
 
   /* Check if the accelerator has sufficient hardware resources to
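
The clamping added by this patch can be exercised on its own.  The sketch
below restates it with plain parameters instead of the dims array and
targ_fn; clamp_default_dims is a hypothetical name, and the precondition
that the per-function limit is at least one warp is an assumption of the
sketch (it keeps the division well defined).

```c
#include <assert.h>
#include <stddef.h>

#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))

/* Clamp only those dimensions that came from the runtime defaults;
   values the user specified explicitly are left for the later
   launch-failure check to diagnose.  */
void
clamp_default_dims (int *workers, int *vectors,
                    int workers_is_default, int vectors_is_default,
                    int max_threads_per_block, int warp_size)
{
  assert (workers != NULL && vectors != NULL);
  assert (warp_size > 0 && max_threads_per_block >= warp_size);

  if (vectors_is_default)
    /* Round the limit down to a multiple of the warp size.  */
    *vectors = MIN (*vectors,
                    max_threads_per_block / warp_size * warp_size);

  if (workers_is_default)
    *workers = MIN (*workers, max_threads_per_block / *vectors);
}
```

For a kernel limited to 512 threads per block, default dimensions of
32 workers by 32 vectors become 16 workers by 32 vectors, which fits.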


[PATCH][c++] Fix DECL_BY_REFERENCE of clone parms

2018-07-30 Thread Tom de Vries
Hi,

Consider test.C compiled at -O0 -g:
...
class string {
public:
  string (const char *p) { this->p = p ; }
  string (const string &s) { this->p = s.p; }

private:
  const char *p;
};

class foo {
public:
  foo (string dir_hint) {}
};

int
main (void)
{
  string s = "This is just a string";
  foo bar(s);
  return 0;
}
...

When parsing foo::foo, the dir_hint parameter gets a DECL_ARG_TYPE of
'struct string & restrict'.  Then during finish_struct, we call
clone_constructors_and_destructors and create clones for foo::foo, and
set the DECL_ARG_TYPE in the same way.

Later on, during finish_function, cp_genericize is called for the original
foo::foo, which sets the type of parm dir_hint to DECL_ARG_TYPE, and sets
DECL_BY_REFERENCE of dir_hint to 1.

After that, during maybe_clone_body update_cloned_parm is called with:
...
(gdb) call debug_generic_expr (parm.typed.type)
struct string & restrict
(gdb) call debug_generic_expr (cloned_parm.typed.type)
struct string
...
The type of the cloned_parm is then set to the type of parm, but
DECL_BY_REFERENCE is not set.

When doing cp_genericize for the clone later on,
TREE_ADDRESSABLE (TREE_TYPE ()) is no longer true for the updated type of
the parm, so DECL_BY_REFERENCE is not set there either.

This patch fixes the problem by copying DECL_BY_REFERENCE in update_cloned_parm.

Build and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[c++] Fix DECL_BY_REFERENCE of clone parms

2018-07-30  Tom de Vries  

PR debug/86687
* optimize.c (update_cloned_parm): Copy DECL_BY_REFERENCE.

* g++.dg/guality/pr86687.C: New test.

---
 gcc/cp/optimize.c  |  2 ++
 gcc/testsuite/g++.dg/guality/pr86687.C | 28 
 2 files changed, 30 insertions(+)

diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
index 0e9b84ed8a4..3923a5fc6c4 100644
--- a/gcc/cp/optimize.c
+++ b/gcc/cp/optimize.c
@@ -46,6 +46,8 @@ update_cloned_parm (tree parm, tree cloned_parm, bool first)
   /* We may have taken its address.  */
   TREE_ADDRESSABLE (cloned_parm) = TREE_ADDRESSABLE (parm);
 
+  DECL_BY_REFERENCE (cloned_parm) = DECL_BY_REFERENCE (parm);
+
   /* The definition might have different constness.  */
   TREE_READONLY (cloned_parm) = TREE_READONLY (parm);
 
diff --git a/gcc/testsuite/g++.dg/guality/pr86687.C b/gcc/testsuite/g++.dg/guality/pr86687.C
new file mode 100644
index 000..140a6fce596
--- /dev/null
+++ b/gcc/testsuite/g++.dg/guality/pr86687.C
@@ -0,0 +1,28 @@
+// PR debug/86687
+// { dg-do run }
+// { dg-options "-g" }
+
+class string {
+public:
+  string (int p) { this->p = p ; }
+  string (const string &s) { this->p = s.p; }
+
+  int p;
+};
+
+class foo {
+public:
+  foo (string dir_hint) {
+p = dir_hint.p; // { dg-final { gdb-test . "dir_hint.p" 3 } }
+  }
+
+  int p;
+};
+
+int
+main (void)
+{
+  string s = 3;
+  foo bar(s);
+  return !(bar.p == 3);
+}


Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-01 Thread Tom de Vries
On 07/31/2018 04:58 PM, Cesar Philippidis wrote:
> The attached patch teaches libgomp how to use the CUDA thread occupancy
> calculator built into the CUDA driver. Despite both being based off the
> CUDA thread occupancy spreadsheet distributed with CUDA, the built in
> occupancy calculator differs from the occupancy calculator in og8 in two
> key ways. First, og8 launches twice the number of gangs as the driver
> thread occupancy calculator. This was my attempt at preventing threads
> from idling, and it operates on a principle similar to running 'make
> -jN', where N is twice the number of CPU threads.

You're saying the two methods are different, and that the difference
between the two methods is a factor two, which is a heuristic you added
yourself on top of one of the methods, which implies that in fact the
two methods are identical. Is my understanding correct here?

> Second, whereas og8
> always attempts to maximize the CUDA block size, the driver may select a
> smaller block, which effectively decreases num_workers.
> 

So, do I understand it correctly that using the function
cuOccupancyMaxPotentialBlockSize gives us "minimum block size that can
achieve the maximum occupancy" or some such and og8 gives us "maximum
block size"?

> In terms of performance, there really isn't that much of a difference
> between the CUDA driver's occupancy calculator and og8's. However, on
> the tests that are impacted, they are generally within a factor of two
> from one another, with some tests running faster with the driver
> occupancy calculator and others with og8's.
> 

Ack. Well, until we understand that in more detail, going with the
driver's occupancy calculator seems the right thing to do.

> Unfortunately, support for the CUDA driver API isn't universal; it's
> only available in CUDA version 6.5 (or 6050) and newer. In this patch,
> I'm exploiting the fact that init_cuda_lib only checks for errors on the
> last library function initialized.

That sounds incorrect to me. In init_cuda_lib I see:
...
# define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
# define CUDA_ONE_CALL_1(call) \
  cuda_lib.call = dlsym (h, #call); \
  if (cuda_lib.call == NULL)\
return false;
  CUDA_CALLS
...
so in fact every library function is checked. Have you tested this with
pre-6.5 cuda?

I think we need to add and handle:
...
  CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
...

> Therefore it guards the usage of
> 
>   cuOccupancyMaxPotentialBlockSizeWithFlags
> 
> by checking driver_version.

If we allow the cuOccupancyMaxPotentialBlockSize field to be NULL, we
can test for NULL, which seems a simpler solution than testing the version.
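
To make the suggestion concrete, here is a rough sketch of a symbol table
with required and optional entries.  It does not use dlsym or the real
CUDA_ONE_CALL macros; resolver_fn, struct lib_table, init_table and the
symbol names "launch_kernel" and "occupancy_max_block_size" are all
hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dlsym: returns NULL for unknown symbols.  */
typedef void *(*resolver_fn) (const char *name);

struct lib_table
{
  void *launch;       /* Required.  */
  void *occupancy;    /* Optional: NULL on older drivers.  */
};

/* Fill in TABLE.  A missing required symbol fails initialization; a
   missing optional symbol just leaves its field NULL.  */
bool
init_table (struct lib_table *table, resolver_fn resolve)
{
  assert (table != NULL && resolve != NULL);

  table->launch = resolve ("launch_kernel");
  if (table->launch == NULL)
    return false;

  table->occupancy = resolve ("occupancy_max_block_size");
  return true;
}

static int dummy_symbol;

/* Models an old driver that lacks the occupancy entry point.  */
void *
old_driver_resolver (const char *name)
{
  return strcmp (name, "launch_kernel") == 0 ? &dummy_symbol : NULL;
}

/* Models a newer driver that provides both entry points.  */
void *
new_driver_resolver (const char *name)
{
  (void) name;
  return &dummy_symbol;
}
```

The caller then tests the optional field for NULL before using it, and
takes the fallback path otherwise, without consulting a version number.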

> If the driver occupancy calculator isn't
> available, it falls back to the existing defaults. Maybe the og8 thread
> occupancy would make a better default for older versions of CUDA, but
> that's a patch for another day.
> 

Agreed.

> Is this patch OK for trunk?

The patch doesn't build in a setup with
--enable-offload-targets=nvptx-none and without cuda, that enables usage
of plugin/cuda/cuda.h:
...
/data/offload-nvptx/src/libgomp/plugin/plugin-nvptx.c:98:16: error:
‘cuOccupancyMaxPotentialBlockSize’ undeclared here (not in a function);
did you mean ‘cuOccupancyMaxPotentialBlockSizeWithFlags’?
 CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
...

> @@ -1220,11 +1227,39 @@ nvptx_exec (void (*fn), size_t mapnum, void 
> **hostaddrs, void **devaddrs,
>  
>{
>   bool default_dim_p[GOMP_DIM_MAX];
> + int vectors = nvthd->ptx_dev->default_dims[GOMP_DIM_VECTOR];
> + int workers = nvthd->ptx_dev->default_dims[GOMP_DIM_WORKER];
> + int gangs = nvthd->ptx_dev->default_dims[GOMP_DIM_GANG];
> +
> + /* The CUDA driver occupancy calculator is only available on
> +CUDA version 6.5 (6050) and newer.  */
> + if (nvthd->ptx_dev->driver_version > 6050)
> +   {
> + int grids, blocks;
> + CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
> +   &blocks, function, NULL, 0,
> +   dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
> + GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
> +"grid = %d, block = %d\n", grids, blocks);
> +


> + if (GOMP_PLUGIN_acc_default_dim (GOMP_DIM_GANG) == 0)

You should use gomp_openacc_dims[0].

> +   gangs = grids * (blocks / warp_size);

So, we launch with gangs == grids * workers ? Is that intentional?

> +
> + if (GOMP_PLUGIN_acc_default_dim (GOMP_DIM_WORKER) == 0)
> +   workers = blocks / vectors;

Also, the new default calculation is not nicely separated from the
fallback default calculation.  I've updated the patch with a cleaner
separation, attached and built without cuda but untested.

Thanks,
- Tom
diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 4799825bda2..67a475b695e 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp

Re: [PATCH,nvptx] Remove use of 'struct map' from plugin (nvptx)

2018-08-01 Thread Tom de Vries
On 07/31/2018 05:12 PM, Cesar Philippidis wrote:
> This is an old patch which removes the struct map from the nvptx plugin.
> I believe at one point this was supposed to be used to manage async data
> mappings, but in practice that never worked out.

I don't quite understand what rationale you're trying to present here.

Is this dead code?

If not, what kind of test-case exercises this? What is the difference in
behaviour for such a test-case with and without the patch?

Thanks,
- Tom


Re: [PATCH] nvptx: properly use flag_patchable_function_entry

2021-02-10 Thread Tom de Vries
On 2/10/21 2:17 PM, Martin Liška wrote:
> It's a fix needed after my commit g:0d701e3eb89870237669ef7bf41394d90c35ae70.
> Tobias tested the patch for me.
> 
> Ready to be installed?

LGTM.

Thanks,
- Tom

> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> * config/nvptx/nvptx.c (nvptx_option_override): Use
> flag_patchable_function_entry instead of the removed
> function_entry_patch_area_size.
> ---
>  gcc/config/nvptx/nvptx.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
> index f08b6791069..794c5a69db0 100644
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c
> @@ -74,6 +74,7 @@
>  #include "cfgloop.h"
>  #include "fold-const.h"
>  #include "intl.h"
> +#include "opts.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -219,7 +220,10 @@ nvptx_option_override (void)
>  flag_no_common = 1;
>  
>    /* The patch area requires nops, which we don't have.  */
> -  if (function_entry_patch_area_size > 0)
> +  HOST_WIDE_INT patch_area_size, patch_area_entry;
> +  parse_and_check_patch_area (flag_patchable_function_entry, false,
> +  &patch_area_size, &patch_area_entry);
> +  if (patch_area_size > 0)
>  sorry ("not generating patch area, nops not supported");
>  
>    /* Assumes that it will see only hard registers.  */


Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-16 Thread Tom de Vries
[ cc-ing author omp support for nvptx. ]

On 9/16/20 12:39 PM, Tobias Burnus wrote:
> Hi Tom, hi Richard, hello all,
> 
> @Richard: does it look okay from the ME side?
> @Tom: Can you check which IFN_GOMP_SIMT should be
> excluded with -ftracer?
> 
> Pre-remark, I do not know much about SIMT – except that they
> only appear with nvptx and somehow relate to lanes on the
> GPU.
> 
> In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows,
> if a basic block with  GOMP_SIMT_VOTE_ANY in it is duplicated,
> which happens via -ftracer for this testcase, the result is wrong.
> 
> The testcase ignores all the loop work but via "lastprivate" takes
> the upper loop bound (as assigned to the loop indices); instead of
> the expected 32*32 = 1024, either some number (like 4 or very large
> or negative) is returned.
> 
> While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I
> have the feeling that at least GOMP_SIMT_LAST_LANE should be
> not copied - but I might be wrong.
> 
> Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from
> that list – or any of the following added as well?
> GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT,
> GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY,
> GOMP_SIMT_XCHG_IDX
> 
> OK for mainline?
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
> Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter


Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-16 Thread Tom de Vries
On 9/16/20 1:24 PM, Alexander Monakov wrote:
> Can you supply a tar with tree dumps for me to look at please?
> 

Hi Alexander,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654#c6 .

> Also, if you can check if the problem can be triggered without a
> collapsed loop (e.g. try removing collapse(2), remove mentions of
> d2) and if so supply dumps from that instead, I'd appreciate that too.
> 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95654#c8

Thanks,
- Tom

> Alexander


[committed][libgomp, nvptx] Print error log for link error

2020-09-22 Thread Tom de Vries
Hi,

By running libgomp test-case libgomp.c/target-28.c with GOMP_NVPTX_PTXRW=w
(using a maintenance patch that adds support for this env var), we dump the
ptx in target-28.exe to file.  By editing one ptx file to rename
gomp_nvptx_main to gomp_nvptx_main2 in both declaration and call, and
running with GOMP_NVPTX_PTXRW=r, we trigger a link error:
...
$ GOMP_NVPTX_PTXRW=r ./target-28.exe
libgomp: cuLinkComplete error: unknown error
...
The error is somewhat uninformative.

Fix this by dumping the error log returned by the failing cuda call, such
that we have instead:
...
$ GOMP_NVPTX_PTXRW=r ./target-28.exe
libgomp: Link error log error   : \
  Undefined reference to 'gomp_nvptx_main2' in ''
libgomp: cuLinkComplete error: unknown error
...
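
The ordering of the two messages can be sketched in isolation as follows.
This is only an illustration: format_link_failure is a hypothetical stand-in
that formats into a buffer, whereas the plugin itself reports through
GOMP_PLUGIN_error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the reporting path: write the linker's
   error log first, then the generic error, into OUT.  Returns the
   snprintf result.  */
int
format_link_failure (char *out, size_t size, const char *elog,
                     const char *errstr)
{
  return snprintf (out, size,
                   "libgomp: Link error log %s\n"
                   "libgomp: cuLinkComplete error: %s\n",
                   elog, errstr);
}
```

The point of printing the log first is that the specific cause (here the
undefined reference) appears before the generic "unknown error" line.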

Built on x86_64 with nvptx accelerator, tested libgomp.

Committed to trunk.

Thanks,
- Tom

[libgomp, nvptx] Print error log for link error

libgomp/ChangeLog:

* plugin/plugin-nvptx.c (link_ptx): Print elog if cuLinkComplete call
fails.

---
 libgomp/plugin/plugin-nvptx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 390804ad1fa..a63dd1a99fb 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -701,6 +701,7 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 
   if (r != CUDA_SUCCESS)
 {
+  GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
   GOMP_PLUGIN_error ("cuLinkComplete error: %s", cuda_error (r));
   return false;
 }


Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB with SIMT LANE [PR95654]

2020-09-22 Thread Tom de Vries
On 9/16/20 12:39 PM, Tobias Burnus wrote:
> Hi Tom, hi Richard, hello all,
> 
> @Richard: does it look okay from the ME side?
> @Tom: Can you check which IFN_GOMP_SIMT should be
> excluded with -ftracer?
> 
> Pre-remark, I do not know much about SIMT – except that they
> only appear with nvptx and somehow relate to lanes on the
> GPU.
> 
> In any case, as the testcase libgomp.fortran/pr66199-5.f90 shows,
> if a basic block with  GOMP_SIMT_VOTE_ANY in it is duplicated,
> which happens via -ftracer for this testcase, the result is wrong.
> 
> The testcase ignores all the loop work but via "lastprivate" takes
> the upper loop bound (as assigned to the loop indices); instead of
> the expected 32*32 = 1024, either some number (like 4 or very large
> or negative) is returned.
> 
> While GOMP_SIMT_VOTE_ANY fixes the issue for this testcase, I
> have the feeling that at least GOMP_SIMT_LAST_LANE should be
> not copied - but I might be wrong.
> 
> Tom: Do you think GOMP_SIMT_LAST_LANE should be removed from
> that list – or any of the following added as well?
> GOMP_USE_SIMT, GOMP_SIMT_ENTER, GOMP_SIMT_ENTER_ALLOC, GOMP_SIMT_EXIT,
> GOMP_SIMT_VF, GOMP_SIMT_ORDERED_PRED, GOMP_SIMT_XCHG_BFLY,
> GOMP_SIMT_XCHG_IDX
> 
> OK for mainline?
> 

You can commit the test-case bit as obvious.

Thanks,
- Tom

> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/pr66199-5.f90: Make stop codes unique.

> diff --git a/libgomp/testsuite/libgomp.fortran/pr66199-5.f90 b/libgomp/testsuite/libgomp.fortran/pr66199-5.f90
> index 9482f08..2627a81 100644
> --- a/libgomp/testsuite/libgomp.fortran/pr66199-5.f90
> +++ b/libgomp/testsuite/libgomp.fortran/pr66199-5.f90
> @@ -67,5 +67,5 @@ program main
>if (f1 (0, 1024) /= 1024) stop 1
>if (f2 (0, 1024, 17) /= 1024 + (17 + 5 * 1023)) stop 2
>if (f3 (0, 32, 0, 32) /= 64) stop 3
> -  if (f4 (0, 32, 0, 32) /= 64) stop 3
> +  if (f4 (0, 32, 0, 32) /= 64) stop 4
>  end



[PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-22 Thread Tom de Vries
[ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
with SIMT LANE [PR95654] ]

On 9/16/20 8:20 PM, Alexander Monakov wrote:
> 
> 
> On Wed, 16 Sep 2020, Tom de Vries wrote:
> 
>> [ cc-ing author omp support for nvptx. ]
> 
> The issue looks familiar. I recognized it back in 2017 (and LLVM people
> recognized it too for their GPU targets). In an attempt to get agreement
> to fix the issue "properly" for GCC I found a similar issue that affects
> all targets, not just offloading, and filed it as PR 80053.
> 
> (yes, there are no addressable labels involved in offloading, but nevertheless
> the nature of the middle-end issue is related)

Hi Alexander,

thanks for looking into this.

Seeing that the attempt to fix things properly is stalled, for now I'm
proposing a point-fix, similar to the original patch proposed by Tobias.

Richi, Jakub, OK for trunk?

Thanks,
- Tom

[omp, ftracer] Don't duplicate blocks in SIMT region

When running the libgomp testsuite on x86_64-linux with nvptx accelerator,
we run into:
...
FAIL: libgomp.fortran/pr66199-5.f90   -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*------------+
|             \
|              \
|               v
|               *
v               bb8
*<--------------*
bb5(VOTE_ANY)
*-------------+
|             |
|             |
|             |
|             |
|             v
|             *
v             bb7(XCHG_IDX)
*<------------*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an operator which has the requirement that the warp executing
the operator is convergent.  The warp diverges at bb4, and
reconverges at bb5, and does not diverge by going to bb7, so the shfl is
indeed executed by a convergent warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*------------+
|             \
|              \
|               \
|                \
v                 v
*                 *
bb5(VOTE_ANY)     bb8(VOTE_ANY)
*                 *
|\               /|
| \  +----------+ |
|  \/             |
|  /\             |
| /  +------------v
|/                *
v                 bb7(XCHG_IDX)
*<----------------*
bb6(EXIT)
...

The warp diverges again at bb5, but does not reconverge again before bb6, so
the shfl is executed by a divergent warp, which causes the FAIL.

Fix this by making ftracer ignore blocks containing ENTER_ALLOC and EXIT,
effectively treating the SIMT region conservatively.

One could argue that the EXIT and VOTE_ANY can be generated by omp-low in
reverse order, in which case the VOTE_ANY could be duplicated.  This is the
reason VOTE_ANY is not explicitly listed as ignored in this patch.

An argument can also be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.
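
As a rough illustration of the check, here is a toy model of the statement
walk.  The enum and the function are hypothetical and only mimic the shape of
the test added to ignore_bb_p; they do not use GCC's gimple API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy statement kinds; GCC's real representation is gimple.  */
enum toy_stmt
{
  TOY_OTHER,
  TOY_SIMT_ENTER_ALLOC,
  TOY_SIMT_VOTE_ANY,
  TOY_SIMT_XCHG_IDX,
  TOY_SIMT_EXIT
};

/* Return true if the block (an array of N statements) contains one of
   the two SIMT region markers and so should not be duplicated.  */
bool
toy_ignore_bb_p (const enum toy_stmt *stmts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (stmts[i] == TOY_SIMT_ENTER_ALLOC || stmts[i] == TOY_SIMT_EXIT)
      return true;
  return false;
}
```

Note that the walk looks only for the two region markers; a block that
contains just VOTE_ANY is not rejected by this test on its own.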

Built on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

	PR fortran/95654
	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC
	and GOMP_SIMT_EXIT.

---
 gcc/tracer.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 82ede722534..de80416f163 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -108,6 +108,16 @@ ignore_bb_p (const_basic_block bb)
 	return true;
 }
 
+  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
+   !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *g = gsi_stmt (gsi);
+  if (is_gimple_call (g)
+	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)))
+	return true;
+}
+
   return false;
 }
 


[committed][testsuite] Add missing require-effective-target alloca

2020-09-23 Thread Tom de Vries
Hi,

Add missing require-effective-target alloca directives.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Add missing require-effective-target alloca

gcc/testsuite/ChangeLog:

* gcc.dg/Warray-bounds-63.c: Add require-effective-target alloca.
* gcc.dg/Warray-bounds-66.c: Same.
* gcc.dg/atomic/stdatomic-vm.c: Same.

---
 gcc/testsuite/gcc.dg/Warray-bounds-63.c| 3 ++-
 gcc/testsuite/gcc.dg/Warray-bounds-66.c| 3 ++-
 gcc/testsuite/gcc.dg/atomic/stdatomic-vm.c | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-63.c b/gcc/testsuite/gcc.dg/Warray-bounds-63.c
index 0583d233c22..a3fc9188211 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-63.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-63.c
@@ -1,7 +1,8 @@
 /* PR middle-end/94195 - missing warning reading a smaller object via
an lvalue of a larger type
{ dg-do compile }
-   { dg-options "-O2 -Wall" } */
+   { dg-options "-O2 -Wall" }
+   { dg-require-effective-target alloca } */
 
 typedef __INT16_TYPE__ int16_t;
 typedef __SIZE_TYPE__  size_t;
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-66.c b/gcc/testsuite/gcc.dg/Warray-bounds-66.c
index d9bb2a29ca4..c61891f5c07 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-66.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-66.c
@@ -1,6 +1,7 @@
 /* PR middle-end/82608 - missing -Warray-bounds on an out-of-bounds VLA index
  { dg-do compile }
- { dg-options "-O2 -Wall -Wno-uninitialized -ftrack-macro-expansion=0" } */
+ { dg-options "-O2 -Wall -Wno-uninitialized -ftrack-macro-expansion=0" }
+ { dg-require-effective-target alloca } */
 
 #include "range.h"
 
diff --git a/gcc/testsuite/gcc.dg/atomic/stdatomic-vm.c b/gcc/testsuite/gcc.dg/atomic/stdatomic-vm.c
index f43fa49ef12..cdfb701207c 100644
--- a/gcc/testsuite/gcc.dg/atomic/stdatomic-vm.c
+++ b/gcc/testsuite/gcc.dg/atomic/stdatomic-vm.c
@@ -2,6 +2,7 @@
with side effects.  */
 /* { dg-do run } */
 /* { dg-options "-std=c11 -pedantic-errors" } */
+/* { dg-require-effective-target alloca } */
 
 #include 
 


[committed][nvptx] Handle move from DF subreg to DF reg in nvptx_output_mov_insn

2020-09-23 Thread Tom de Vries
Hi,

When compiling test-case gcc.dg/atomic/c11-atomic-exec-1.c, we run into
these ptxas errors:
...
line 100; error: Rounding modifier required for instruction 'cvt'
line 105; error: Rounding modifier required for instruction 'cvt'
...

The problem is that this move:
...
//(insn 13 11 14 2
//  (set (reg:DF 28 [ _9 ])
//   (subreg:DF (reg:TI 22 [ _1 ]) 0)) 9 {*movdf_insn}
//   (nil))
cvt.f64.u64 %r28, %r22$0;
...
is emitted as cvt.f64.u64, while it should be a mov.b64 instead.

Fix this by handling this case in nvptx_output_mov_insn.
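
The decision can be modelled roughly as below.  This is a simplified sketch
with hypothetical names: sizes are passed as plain bit counts, and the
128-bit special cases of the real function are left out.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the choice between a bitwise move and a conversion.
   The "inner" size is that of the register underneath a subreg, or of
   the operand itself when there is no subreg.  */
const char *
toy_mov_mnemonic (int dst_inner_bits, int src_bits, int src_inner_bits)
{
  /* Same-sized inner modes: a plain bitwise move.  */
  if (dst_inner_bits == src_inner_bits)
    return "mov";

  /* The case this patch adds: a 64-bit piece of a 128-bit register.  */
  if (src_inner_bits == 128 && src_bits == 64)
    return "mov";

  return "cvt";
}
```

For the insn above, the destination is a 64-bit DF register and the source
is a 64-bit subreg of a 128-bit TI register, so the second test fires and
the move is no longer emitted as a cvt.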

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Handle move from DF subreg to DF reg in nvptx_output_mov_insn

gcc/ChangeLog:

PR target/97158
* config/nvptx/nvptx.c (nvptx_output_mov_insn): Handle move from
DF subreg to DF reg.

---
 gcc/config/nvptx/nvptx.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 0c590d8d1f6..54b1fdf669b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2349,6 +2349,7 @@ const char *
 nvptx_output_mov_insn (rtx dst, rtx src)
 {
   machine_mode dst_mode = GET_MODE (dst);
+  machine_mode src_mode = GET_MODE (src);
   machine_mode dst_inner = (GET_CODE (dst) == SUBREG
? GET_MODE (XEXP (dst, 0)) : dst_mode);
   machine_mode src_inner = (GET_CODE (src) == SUBREG
@@ -2375,7 +2376,7 @@ nvptx_output_mov_insn (rtx dst, rtx src)
   if (GET_MODE_SIZE (dst_inner) == GET_MODE_SIZE (src_inner))
 {
   if (GET_MODE_BITSIZE (dst_mode) == 128
- && GET_MODE_BITSIZE (GET_MODE (src)) == 128)
+ && GET_MODE_BITSIZE (src_mode) == 128)
{
  /* mov.b128 is not supported.  */
  if (dst_inner == V2DImode && src_inner == TImode)
@@ -2388,6 +2389,10 @@ nvptx_output_mov_insn (rtx dst, rtx src)
   return "%.\tmov.b%T0\t%0, %1;";
 }
 
+  if (GET_MODE_BITSIZE (src_inner) == 128
+  && GET_MODE_BITSIZE (src_mode) == 64)
+return "%.\tmov.b%T0\t%0, %1;";
+
   return "%.\tcvt%t0%t1\t%0, %1;";
 }
 


Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-23 Thread Tom de Vries
On 9/23/20 9:28 AM, Richard Biener wrote:
> On Tue, 22 Sep 2020, Tom de Vries wrote:
> 
>> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
>> with SIMT LANE [PR95654] ]
>>
>> On 9/16/20 8:20 PM, Alexander Monakov wrote:
>>>
>>>
>>> On Wed, 16 Sep 2020, Tom de Vries wrote:
>>>
>>>> [ cc-ing author omp support for nvptx. ]
>>>
>>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
>>> recognized it too for their GPU targets). In an attempt to get agreement
>>> to fix the issue "properly" for GCC I found a similar issue that affects
>>> all targets, not just offloading, and filed it as PR 80053.
>>>
>>> (yes, there are no addressable labels involved in offloading, but nevertheless
>>> the nature of the middle-end issue is related)
>>
>> Hi Alexander,
>>
>> thanks for looking into this.
>>
>> Seeing that the attempt to fix things properly is stalled, for now I'm
>> proposing a point-fix, similar to the original patch proposed by Tobias.
>>
>> Richi, Jakub, OK for trunk?
> 
> I notice that we call ignore_bb_p many times in tracer.c but one call
> is conveniently early in tail_duplicate (void):
> 
>   int n = count_insns (bb);
>   if (!ignore_bb_p (bb))
> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), bb);
> 
> where count_insns already walks all stmts in the block.  It would be
> nice to avoid repeatedly walking all stmts, maybe adjusting the above
> call is enough and/or count_insns can compute this and/or the ignore_bb_p
> result can be cached (optimize_bb_for_size_p might change though,
> but maybe all other ignore_bb_p calls effectively just are that,
> checks for blocks that became optimize_bb_for_size_p).
> 

This untested follow-up patch tries something in that direction.

Is this what you meant?
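
The caching idea, reduced to a toy: one array holds the answer and a second
one records whether the answer has been computed yet.  All names here are
hypothetical, and fixed-size bool arrays stand in for the sbitmaps used in
the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 64

static bool toy_value[TOY_MAX_BLOCKS];     /* cached answer */
static bool toy_computed[TOY_MAX_BLOCKS];  /* is the cached answer valid? */
static int toy_walks;                      /* number of expensive walks */

/* Stand-in for the statement walk; pretends even-numbered blocks can
   be duplicated.  */
static bool
toy_compute (int index)
{
  toy_walks++;
  return index % 2 == 0;
}

/* Return the cached answer for block INDEX, computing it on first use.  */
bool
toy_cached_can_duplicate (int index)
{
  if (!toy_computed[index])
    {
      toy_value[index] = toy_compute (index);
      toy_computed[index] = true;
    }
  return toy_value[index];
}
```

Two arrays are needed because "false" is a valid cached answer and so cannot
double as "not yet computed".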

Thanks,
- Tom
try

---
 gcc/tracer.c | 112 +--
 1 file changed, 94 insertions(+), 18 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index de80416f163..63bfbf1e286 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -53,7 +53,6 @@
 #include "fibonacci_heap.h"
 #include "tracer.h"
 
-static int count_insns (basic_block);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -84,58 +83,127 @@ bb_seen_p (basic_block bb)
   return bitmap_bit_p (bb_seen, bb->index);
 }
 
-/* Return true if we should ignore the basic block for purposes of tracing.  */
-bool
-ignore_bb_p (const_basic_block bb)
+static bool
+can_duplicate_bb_p (const_basic_block bb)
 {
   if (bb->index < NUM_FIXED_BLOCKS)
-return true;
-  if (optimize_bb_for_size_p (bb))
-return true;
+return false;
 
   if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
 {
   /* A transaction is a single entry multiple exit region.  It
 	 must be duplicated in its entirety or not at all.  */
   if (gimple_code (g) == GIMPLE_TRANSACTION)
-	return true;
+	return false;
 
   /* An IFN_UNIQUE call must be duplicated as part of its group,
 	 or not at all.  */
   if (is_gimple_call (g)
 	  && gimple_call_internal_p (g)
 	  && gimple_call_internal_unique_p (g))
-	return true;
+	return false;
+}
+
+  return true;
+}
+
+static bool
+can_duplicate_insn_p (gimple *g)
+{
+  if (is_gimple_call (g)
+  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)))
+return false;
+
+  return true;
+}
+
+static sbitmap can_duplicate_bb;
+static sbitmap can_duplicate_bb_computed;
+
+static inline void
+cache_can_duplicate_bb_p (const_basic_block bb, bool val)
+{
+  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
+
+  if ((unsigned int)bb->index >= size)
+{
+  can_duplicate_bb = sbitmap_resize (can_duplicate_bb, size * 2, 0);
+  can_duplicate_bb_computed
+	= sbitmap_resize (can_duplicate_bb_computed, size * 2, 0);
 }
 
+  if (val)
+bitmap_set_bit (can_duplicate_bb, bb->index);
+  bitmap_set_bit (can_duplicate_bb_computed, bb->index);
+}
+
+static bool
+compute_can_duplicate_bb_p (const_basic_block bb)
+{
+  if (!can_duplicate_bb_p (bb))
+return false;
+
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
!gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *g = gsi_stmt (gsi);
-  if (is_gimple_call (g)
-	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)))
-	return true;
+  if (!can_duplicate_insn_p (g))
+	return false;
+}
+
+  return true;
+}
+
+static bool
+c

[committed][nvptx] Split up function ref plus const

2020-09-23 Thread Tom de Vries
Hi,

With test-case gcc.c-torture/compile/pr92231.c, we run into:
...
nvptx-as: ptxas terminated with signal 11 [Segmentation fault], core dumped
compiler exited with status 1
FAIL: gcc.c-torture/compile/pr92231.c   -O0  (test for excess errors)
...
due to using a function reference plus constant as operand:
...
  mov.u64 %r24,bar+4096;
...

Fix this by splitting such an insn into:
...
  mov.u64 %r24,bar;
  add.u64 %r24,%r24,4096;
...
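
As a small illustration of the split form, the sketch below prints the two
instructions for a given register, symbol and offset.  toy_emit_split is a
hypothetical helper for illustration only; the actual change is the
define_split in nvptx.md.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter: write the two-instruction form of
   REG = SYM + OFFSET into OUT.  Returns the snprintf result.  */
int
toy_emit_split (char *out, size_t size, const char *reg,
                const char *sym, long offset)
{
  return snprintf (out, size,
                   "mov.u64 %s,%s;\nadd.u64 %s,%s,%ld;\n",
                   reg, sym, reg, reg, offset);
}
```

The first instruction loads the bare function address, and the second adds
the constant, so no operand contains a function reference plus a constant.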

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Split up function ref plus const

gcc/ChangeLog:

* config/nvptx/nvptx.md: Don't allow operand containing sum of
function ref and const.

---
 gcc/config/nvptx/nvptx.md | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 6178e6a0f77..035f6e0151b 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -146,6 +146,13 @@
   return true;
 })
 
+;; Test for a function symbol ref operand
+(define_predicate "symbol_ref_function_operand"
+  (match_code "symbol_ref")
+{
+  return SYMBOL_REF_FUNCTION_P (op);
+})
+
 (define_attr "predicable" "false,true"
   (const_string "true"))
 
@@ -241,6 +248,17 @@
 }
   [(set_attr "subregs_ok" "true")])
 
+;; ptxas segfaults on 'mov.u64 %r24,bar+4096', so break it up.
+(define_split
+  [(set (match_operand:DI 0 "nvptx_register_operand")
+	(const:DI (plus:DI (match_operand:DI 1 "symbol_ref_function_operand")
+			   (match_operand 2 "const_int_operand"))))]
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 2)))
+  ]
+  "")
+
 (define_insn "*mov_insn"
   [(set (match_operand:SDFM 0 "nonimmediate_operand" "=R,R,m")
(match_operand:SDFM 1 "general_operand" "RF,m,R"))]


[committed][testsuite] Check target alias in builtin-has-attribute-3.c

2020-09-23 Thread Tom de Vries
Hi,

When running test-case c-c++-common/builtin-has-attribute-3.c on nvptx, I get:
...
FAIL: c-c++-common/builtin-has-attribute-3.c  -Wc++-compat \
  (test for excess errors)
Excess errors:
src/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c:33:33: error: \
  alias definitions not supported in this configuration
...

Fix this by adding -DSKIP_ALIAS to the compilation options for effective
target ! alias.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Check target alias in builtin-has-attribute-3.c

gcc/testsuite/ChangeLog:

* c-c++-common/builtin-has-attribute-3.c: Compile with -DSKIP_ALIAS
for effective target ! alias.

---
 gcc/testsuite/c-c++-common/builtin-has-attribute-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
index 5736bab9443..45806d5deb7 100644
--- a/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
+++ b/gcc/testsuite/c-c++-common/builtin-has-attribute-3.c
@@ -2,7 +2,7 @@
{ dg-do compile }
{ dg-options "-Wall -ftrack-macro-expansion=0" }
{ dg-options "-Wall -Wno-narrowing -Wno-unused-local-typedefs 
-ftrack-macro-expansion=0" { target c++ } } 
-   { dg-additional-options "-DSKIP_ALIAS" { target *-*-darwin* hppa*-*-hpux* } 
} 
+   { dg-additional-options "-DSKIP_ALIAS" { target { { *-*-darwin* 
hppa*-*-hpux } || { ! alias } } } }
 */
 
 #define ATTR(...) __attribute__ ((__VA_ARGS__))


Re: [RFC] Offloading and automatic linking of libraries

2020-09-23 Thread Tom de Vries
On 9/24/20 8:32 AM, Tobias Burnus wrote:
> Hi all,
> 
> we got the user comment that it is far from obvious to
> use  -foffload=-latomic if the following error shows up:
> 
> unresolved symbol __atomic_compare_exchange_16
> collect2: error: ld returned 1 exit status
> mkoffload: fatal error:
> /powerpc64le-none-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> 
> In principle, the same issue pops up with -lm and -lgfortran,
> which on the host are automatically linked for g++ (-lm) and
> for gfortran (both) but not gcc.
> 
> Atomic is a bit more special as '-lm' and its functions
> are better known and when compiling with gfortran and
> getting 'unresolved symbol _gfortran_stop_numeric' is
> also a bit more obvious.
> 
> Thoughts? Or just better educating our users in the
> documentation? — Currently, -foffload= is only documented
> in the wiki ...
> 
> For nvptx-none and libatomic, I was thinking of adding
>    -foffload=nvptx-none=-latomic
> to the host's
>    libgomp/libgomp.spec(.in)
> if configured with --enable-offload-targets=nvptx-none
> but that assumes that nvptx-none has been configured
> without --disable-atomic. – And I fail how the host
> configure can see how the device configure was invoked.

Maybe we could require building libatomic for nvptx.

Thanks,
- Tom


[committed][testsuite] Require non_strict_align in pr94600-{1,3}.c

2020-09-24 Thread Tom de Vries
Hi,

With the nvptx target, we run into:
...
FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
...
The scans attempt to check for volatile stores, but on nvptx we have memcpy
instead.

This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
that the TYPE_MODE for the store target is set to BLKmode in
compute_record_mode.

Fix the FAILs by requiring effective target non_strict_align.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Require non_strict_align in pr94600-{1,3}.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/pr94600-1.c: Require effective target non_strict_align for
scan-rtl-dump-times.
* gcc.dg/pr94600-3.c: Same.

---
 gcc/testsuite/gcc.dg/pr94600-1.c | 4 ++--
 gcc/testsuite/gcc.dg/pr94600-3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr94600-1.c b/gcc/testsuite/gcc.dg/pr94600-1.c
index b5913a0939c..38f939a98cb 100644
--- a/gcc/testsuite/gcc.dg/pr94600-1.c
+++ b/gcc/testsuite/gcc.dg/pr94600-1.c
@@ -32,5 +32,5 @@ foo(void)
 }
 
 /* The only volatile accesses should be the obvious writes.  */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align } } } } */
diff --git a/gcc/testsuite/gcc.dg/pr94600-3.c b/gcc/testsuite/gcc.dg/pr94600-3.c
index 7537f6cb797..e8776fbdb28 100644
--- a/gcc/testsuite/gcc.dg/pr94600-3.c
+++ b/gcc/testsuite/gcc.dg/pr94600-3.c
@@ -31,5 +31,5 @@ foo(void)
 }
 
 /* The loop isn't unrolled. */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align } } } } */


[PATCH][testsuite] Add effective target ident_directive

2020-09-24 Thread Tom de Vries
Hi,

On nvptx we run into:
...
FAIL: c-c++-common/ident-1b.c  -Wc++-compat   scan-assembler GCC:
FAIL: c-c++-common/ident-2b.c  -Wc++-compat   scan-assembler GCC:
...

Using a scan-assembler directive adds -fno-ident to the compile options.
The test c-c++-common/ident-1b.c adds dg-options "-fident", and intends to
check that -fident overrides the -fno-ident, by means of the
scan-assembler.  But for nvptx, there's no .ident directive, either with
-fident or with -fno-ident.

Fix this by adding an effective target ident_directive, and requiring
it in both test-cases.

Tested on nvptx and x86_64.

OK for trunk?

Thanks,
- Tom

[testsuite] Add effective target ident_directive

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* lib/target-supports.exp (check_effective_target_ident_directive):
New proc.
* c-c++-common/ident-1b.c: Require effective target ident_directive.
* c-c++-common/ident-2b.c: Same.

---
 gcc/testsuite/c-c++-common/ident-1b.c | 1 +
 gcc/testsuite/c-c++-common/ident-2b.c | 1 +
 gcc/testsuite/lib/target-supports.exp | 9 +
 3 files changed, 11 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/ident-1b.c b/gcc/testsuite/c-c++-common/ident-1b.c
index 69567442a03..b8b83e64ad2 100644
--- a/gcc/testsuite/c-c++-common/ident-1b.c
+++ b/gcc/testsuite/c-c++-common/ident-1b.c
@@ -2,6 +2,7 @@
  * Make sure scan-assembler turns off .ident unless -fident in testcase */
 /* { dg-do compile } */
 /* { dg-options "-fident" } */
+/* { dg-require-effective-target ident_directive }*/
 int i;
 
 /* { dg-final { scan-assembler "GCC: " { xfail { { hppa*-*-hpux* && { ! lp64 } 
} || { powerpc-ibm-aix* || powerpc*-*-darwin* } } } } } */
diff --git a/gcc/testsuite/c-c++-common/ident-2b.c b/gcc/testsuite/c-c++-common/ident-2b.c
index fae6a031571..52f0693e164 100644
--- a/gcc/testsuite/c-c++-common/ident-2b.c
+++ b/gcc/testsuite/c-c++-common/ident-2b.c
@@ -2,6 +2,7 @@
  * Make sure scan-assembler-times turns off .ident unless -fident in testcase 
*/
 /* { dg-do compile } */
 /* { dg-options "-fident" } */
+/* { dg-require-effective-target ident_directive }*/
 int ident;
 
 /* { dg-final { scan-assembler "GCC: " { xfail { { hppa*-*-hpux* && { ! lp64 } 
} || { powerpc-ibm-aix* || powerpc*-*-darwin* } } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5cbe32ffbd6..0a00972edb5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10510,3 +10510,12 @@ proc check_symver_available { } {
}
}]
 }
+
+# Return 1 if emitted assembly contains .ident directive.
+
+proc check_effective_target_ident_directive {} {
+return [check_no_messages_and_pattern ident_directive \
+   "(?n)^\[\t\]+\\.ident" assembly {
+   int i;
+}]
+}


[committed][testsuite, nvptx] Fix string matching in gcc.dg/pr87314-1.c

2020-09-24 Thread Tom de Vries
Hi,

with nvptx we run into:
...
FAIL: gcc.dg/pr87314-1.c scan-assembler hellooo
...

The required string is part of the assembly, just in a different format than
expected:
...
.const .align 1 .u8 $LC0[12] =
  { 104, 101, 108, 108, 111, 111, 111, 111, 98, 121, 101, 0 };
...

Fix this by adding an nvptx-specific scan-assembler directive.
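
As a minimal sketch of where the numbers in the new directive come from (this is not part of the patch, and bytes_as_decimal_list is a hypothetical helper), the following C function spells out a string as the comma-separated decimal byte list that the nvptx initializer uses:

```c
#include <stddef.h>
#include <stdio.h>

/* Render the bytes of STR as a comma-separated decimal list, the way the
   nvptx initializer spells out string contents.  Returns the number of
   characters written (excluding the terminating NUL), or -1 if OUT is too
   small.  Hypothetical helper, not part of GCC or DejaGnu.  */
int
bytes_as_decimal_list (const char *str, char *out, size_t out_size)
{
  size_t used = 0;

  if (out_size == 0)
    return -1;
  out[0] = '\0';

  for (size_t i = 0; str[i] != '\0'; i++)
    {
      /* Separate all but the first entry with ", ".  */
      int n = snprintf (out + used, out_size - used, "%s%u",
                        i == 0 ? "" : ", ",
                        (unsigned) (unsigned char) str[i]);
      if (n < 0 || (size_t) n >= out_size - used)
        return -1;
      used += (size_t) n;
    }
  return (int) used;
}
```

For "hellooo" this yields the 33-character string "104, 101, 108, 108, 111, 111, 111", which is the text the nvptx-specific scan looks for.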

Tested on nvptx and x86_64.

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Fix string matching in gcc.dg/pr87314-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/pr87314-1.c: Add nvptx-specific scan-assembler directive.

---
 gcc/testsuite/gcc.dg/pr87314-1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr87314-1.c b/gcc/testsuite/gcc.dg/pr87314-1.c
index 9bc905612b5..0cb9c07e32c 100644
--- a/gcc/testsuite/gcc.dg/pr87314-1.c
+++ b/gcc/testsuite/gcc.dg/pr87314-1.c
@@ -8,4 +8,6 @@ int h() { return "bye"=="hellbye"+8; }
 /* { dg-final { scan-tree-dump-times "hello" 1 "original" } } */
 /* The test in h() should be retained because the result depends on
string merging.  */
-/* { dg-final { scan-assembler "hellooo" } } */
+/* { dg-final { scan-assembler "hellooo" { target { ! nvptx*-*-* } } } } */
+/* { dg-final { scan-assembler "104, 101, 108, 108, 111, 111, 111" { target { nvptx*-*-* } } } } */
+


[committed][testsuite] Scan final instead of asm in independent-cloneids-1.c

2020-09-24 Thread Tom de Vries
Hi,

When running test-case gcc.dg/independent-cloneids-1.c for nvptx, we get:
...
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]0: 1
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]1: 1
FAIL: scan-assembler-times (?n)^_*bar[.$_]constprop[.$_]2: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]0: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]1: 1
FAIL: scan-assembler-times (?n)^_*foo[.$_]constprop[.$_]2: 1
...

The test expects to find something like:
...
bar.constprop.0:
...
but instead on nvptx we have:
...
.func (.param.u32 %value_out) bar$constprop$0
...

Fix this by rewriting the scans to use the final dump instead.
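
To illustrate why the anchored label pattern cannot match the nvptx output, here is a small sketch using POSIX regcomp/regexec. It is only an approximation of what DejaGnu does: the testsuite uses Tcl regular expressions, and REG_NEWLINE is assumed here as a stand-in for the "(?n)" prefix. The helper name line_matches is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT contains a line matching PATTERN (POSIX extended
   syntax).  REG_NEWLINE makes '^' and '$' match at line boundaries, which
   is used here as a stand-in for "(?n)" in the Tcl patterns.  */
bool
line_matches (const char *pattern, const char *text)
{
  regex_t re;
  bool found;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
    return false;
  found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With the pattern "^_*bar[.$_]constprop[.$_]0:", the line "bar.constprop.0:" matches, while the nvptx line ".func (.param.u32 %value_out) bar$constprop$0" does not: the name is neither at the start of the line nor followed by a colon. The character class [.$_] itself does accept the '$' separators.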

Tested on x86_64.

Committed to trunk.

Thanks,
- Tom

[testsuite] Scan final instead of asm in independent-cloneids-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/independent-cloneids-1.c: Use scan-rtl-dump instead of
scan-assembler.

---
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/independent-cloneids-1.c b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
index 516211a6e86..efbc1c51da0 100644
--- a/gcc/testsuite/gcc.dg/independent-cloneids-1.c
+++ b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fipa-cp -fipa-cp-clone"  } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-rtl-final"  } */
 /* { dg-skip-if "Odd label definition syntax" { mmix-*-* } } */
 
 extern int printf (const char *, ...);
@@ -29,11 +29,11 @@ baz (int arg)
   return foo (8);
 }
 
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]0:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]1:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]2:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]0:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]1:} 1 } } */
-/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]2:} 1 } } */
-/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]3:} } } */
-/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]4:} } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop \(bar[.$_]constprop[.$_]0,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop \(bar[.$_]constprop[.$_]1,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function bar.constprop \(bar[.$_]constprop[.$_]2,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop \(foo[.$_]constprop[.$_]0,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop \(foo[.$_]constprop[.$_]1,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop \(foo[.$_]constprop[.$_]2,} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop \(foo[.$_]constprop[.$_]3,} 0 "final" } } */
+/* { dg-final { scan-rtl-dump-times {(?n)^;; Function foo.constprop \(foo[.$_]constprop[.$_]4,} 0 "final" } } */


Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-09-24 Thread Tom de Vries
On 9/24/20 1:42 PM, Richard Biener wrote:
> On Wed, 23 Sep 2020, Tom de Vries wrote:
> 
>> On 9/23/20 9:28 AM, Richard Biener wrote:
>>> On Tue, 22 Sep 2020, Tom de Vries wrote:
>>>
>>>> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
>>>> with SIMT LANE [PR95654] ]
>>>>
>>>> On 9/16/20 8:20 PM, Alexander Monakov wrote:
>>>>>
>>>>>
>>>>> On Wed, 16 Sep 2020, Tom de Vries wrote:
>>>>>
>>>>>> [ cc-ing author omp support for nvptx. ]
>>>>>
>>>>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
>>>>> recognized it too for their GPU targets). In an attempt to get agreement
>>>>> to fix the issue "properly" for GCC I found a similar issue that affects
>>>>> all targets, not just offloading, and filed it as PR 80053.
>>>>>
>>>>> (yes, there are no addressable labels involved in offloading, but
>>>>> nevertheless the nature of the middle-end issue is related)
>>>>
>>>> Hi Alexander,
>>>>
>>>> thanks for looking into this.
>>>>
>>>> Seeing that the attempt to fix things properly is stalled, for now I'm
>>>> proposing a point-fix, similar to the original patch proposed by Tobias.
>>>>
>>>> Richi, Jakub, OK for trunk?
>>>
>>> I notice that we call ignore_bb_p many times in tracer.c but one call
>>> is conveniently early in tail_duplicate (void):
>>>
>>>   int n = count_insns (bb);
>>>   if (!ignore_bb_p (bb))
>>>     blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun),
>>>                                      bb);
>>>
>>> where count_insns already walks all stmts in the block.  It would be
>>> nice to avoid repeatedly walking all stmts, maybe adjusting the above
>>> call is enough and/or count_insns can compute this and/or the ignore_bb_p
>>> result can be cached (optimize_bb_for_size_p might change though,
>>> but maybe all other ignore_bb_p calls effectively just are that,
>>> checks for blocks that became optimize_bb_for_size_p).
>>>
>>
>> This untested follow-up patch tries something in that direction.
>>
>> Is this what you meant?
> 
> Yeah, sort of.
> 
> +static bool
> +cached_can_duplicate_bb_p (const_basic_block bb)
> +{
> +  if (can_duplicate_bb)
> 
> is there any path where can_duplicate_bb would be NULL?
> 

Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.

> +{
> +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
> +  /* Assume added bb's should be ignored.  */
> +  if ((unsigned int)bb->index < size
> + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
> +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
> 
> yes, newly added bbs should be ignored so,
> 
>  }
>  
> -  return false;
> +  bool val = compute_can_duplicate_bb_p (bb);
> +  if (can_duplicate_bb)
> +cache_can_duplicate_bb_p (bb, val);
> 
> no need to compute & cache for them, just return true (because
> we did duplicate them)?
> 

Is that also the case for gimple-ssa-split-paths.c?
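
For readers following the caching discussion, here is a minimal, self-contained sketch of the idea: one bit records whether a block's answer is known, another holds the answer. It is not the code from the patch. The struct, the 64-block limit and the stand-in predicate are assumptions made for illustration, and the uncached path used by other callers is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sbitmaps in the quoted patch.  */
struct dup_cache
{
  uint64_t computed;   /* Bit I set: block I has a cached answer.  */
  uint64_t value;      /* Bit I set: block I can be duplicated.  */
  unsigned size;       /* Number of blocks covered, at most 64.  */
  unsigned walks;      /* How often the expensive walk ran.  */
};

/* Stand-in for the statement walk; odd-numbered blocks are arbitrarily
   treated as duplicable.  */
bool
compute_can_duplicate (struct dup_cache *cache, unsigned index)
{
  cache->walks++;
  return (index & 1) != 0;
}

/* Answer from the cache when possible, walking the block at most once.  */
bool
cached_can_duplicate (struct dup_cache *cache, unsigned index)
{
  /* Blocks created after the cache was sized are not tracked.  */
  if (index >= cache->size)
    return compute_can_duplicate (cache, index);

  uint64_t bit = UINT64_C (1) << index;
  if ((cache->computed & bit) == 0)
    {
      if (compute_can_duplicate (cache, index))
        cache->value |= bit;
      cache->computed |= bit;
    }
  return (cache->value & bit) != 0;
}
```

Querying the same tracked block twice runs the walk once. Blocks beyond the cached range are recomputed on every query, which is the case the review comments above are about.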

Thanks,
- Tom


[committed][testsuite, nvptx] Fix gcc.dg/tls/thr-cse-1.c

2020-09-24 Thread Tom de Vries
Hi,

With nvptx, we run into:
...
FAIL: gcc.dg/tls/thr-cse-1.c scan-assembler-not \
  emutls_get_address.*emutls_get_address.*
...
because the nvptx assembly looks like:
...
  call (%value_in), __emutls_get_address, (%out_arg1);
  ...
// BEGIN GLOBAL FUNCTION DECL: __emutls_get_address
.extern .func (.param.u64 %value_out) __emutls_get_address (.param.u64 %in_ar0);
...

Fix this by checking the slim final dump instead, where we have just:
...
   12: r35:DI=call [`__emutls_get_address'] argc:0
...
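
The difference between the two outputs can be made concrete by counting mentions of the symbol. The sketch below is an illustration only; count_occurrences is a hypothetical helper and not how the testsuite scans its input.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of NEEDLE in TEXT.  */
size_t
count_occurrences (const char *text, const char *needle)
{
  size_t count = 0;
  size_t len = strlen (needle);

  if (len == 0)
    return 0;

  for (const char *p = strstr (text, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;
  return count;
}
```

On the nvptx assembly quoted above, "emutls_get_address" occurs three times (the call, the comment and the declaration), so a pattern asking for two mentions matches even though there is only one call. On the slim final dump line it occurs once.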

Committed to trunk.

Thanks,
- Tom

[testsuite, nvptx] Fix gcc.dg/tls/thr-cse-1.c

gcc/testsuite/ChangeLog:

2020-09-24  Tom de Vries  

* gcc.dg/tls/thr-cse-1.c: Scan final dump instead of assembly for
nvptx.

---
 gcc/testsuite/gcc.dg/tls/thr-cse-1.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tls/thr-cse-1.c b/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
index 84eedfdb226..7145671eb95 100644
--- a/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
+++ b/gcc/testsuite/gcc.dg/tls/thr-cse-1.c
@@ -4,6 +4,7 @@
registers and thus getting the counts wrong.  */
 /* { dg-additional-options "-mshort-calls" { target epiphany-*-* } } */
 /* { dg-require-effective-target tls_emulated } */
+/* { dg-additional-options "-fdump-rtl-final-slim" { target nvptx-*-* } }*/
 
 /* Test that we only get one call to emutls_get_address when CSE is
active.  Note that the var _must_ be initialized for the scan asm
@@ -18,10 +19,12 @@ int foo (int b, int c, int d)
   return a;
 }
 
-/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" { target { ! { "*-wrs-vxworks"  "*-*-darwin8"  "hppa*-*-hpux*" "i?86-*-mingw*" "x86_64-*-mingw*" visium-*-* } } } } } */
+/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" { target { ! { "*-wrs-vxworks"  "*-*-darwin8"  "hppa*-*-hpux*" "i?86-*-mingw*" "x86_64-*-mingw*" visium-*-* nvptx-*-* } } } } } */
 /* { dg-final { scan-assembler-not "call\tL___emutls_get_address.stub.*call\tL___emutls_get_address.stub.*" { target "*-*-darwin8" } } } */
 /* { dg-final { scan-assembler-not "(b,l|bl) __emutls_get_address.*(b,l|bl) __emutls_get_address.*" { target "hppa*-*-hpux*" } } } */
 /* { dg-final { scan-assembler-not "tls_lookup.*tls_lookup.*" { target *-wrs-vxworks } } } */
 /* { dg-final { scan-assembler-not "call\t___emutls_get_address.*call\t___emutls_get_address" { target "i?86-*-mingw*" } } } */
 /* { dg-final { scan-assembler-not "call\t__emutls_get_address.*call\t__emutls_get_address" { target "x86_64-*-mingw*" } } } */
 /* { dg-final { scan-assembler-not "%l __emutls_get_address.*%l __emutls_get_address" { target visium-*-* } } } */
+
+/* { dg-final { scan-rtl-dump-times "emutls_get_address" 1 "final" { target nvptx-*-* } } } */


[committed][testsuite] Add missing require-effective-target alloca

2020-09-25 Thread Tom de Vries
Hi,

Add missing require-effective-target alloca directives.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Add missing require-effective-target alloca

gcc/testsuite/ChangeLog:

2020-09-25  Tom de Vries  

* gcc.dg/analyzer/pr93355-localealias.c: Require effective target
alloca.

---
 gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c b/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c
index a5cb0d56e70..043e45f828e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c
@@ -5,6 +5,7 @@
 /* { dg-do "compile" } */
 /* { dg-additional-options "-Wno-analyzer-too-complex -fno-analyzer-feasibility" } */
 /* TODO: remove the need for these options.  */
+/* { dg-require-effective-target alloca } */
 
 /* Handle aliases for locale names.
Copyright (C) 1995-1999, 2000-2001, 2003 Free Software Foundation, Inc.


Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-25 Thread Tom de Vries
On 9/24/20 5:05 PM, Richard Biener wrote:
> On Thu, 24 Sep 2020, Jonathan Wakely wrote:
> 
>> On 24/09/20 11:11 +0200, Richard Biener wrote:
>>> On Wed, 26 Aug 2020, Richard Biener wrote:
>>>
>>>> On Thu, 6 Aug 2020, Richard Biener wrote:
>>>>
>>>>> On Thu, 6 Aug 2020, Richard Biener wrote:
>>>>>
>>>>>> This adds a move CTOR to auto_vec and makes use of a
>>>>>> auto_vec return value for get_loop_exit_edges denoting
>>>>>> that lifetime management of the vector is handed to the caller.
>>>>>>
>>>>>> The move CTOR prompted the hash_table change because it apparently
>>>>>> makes the copy CTOR implicitly deleted (good) and hash-table
>>>>>> expansion of the odr_enum_map which is
>>>>>> hash_map  where odr_enum has an
>>>>>> auto_vec member triggers this.  Not sure if
>>>>>> there's a latent bug there before this (I think we're not
>>>>>> invoking DTORs, but we're invoking copy-CTORs).
>>>>>>
>>>>>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>>>>>
>>>>>> Does this all look sensible and is it a good change
>>>>>> (the get_loop_exit_edges one)?
>>>>>
>>>>> Regtest went OK, here's an update with a complete ChangeLog
>>>>> (how useful..) plus the move assign operator deleted, copy
>>>>> assign wouldn't work as auto-generated and at the moment
>>>>> there's no use of assigning.  I guess if we'd have functions
>>>>> that take an auto_vec<> argument meaning they will destroy
>>>>> the vector that will become useful and we can implement it.
>>>>>
>>>>> OK for trunk?
>>>>
>>>> Ping.
>>>
>>> Ping^2.
>>
>> Looks good to me as far as the use of C++ features goes.
> 
> Thanks, now pushed after re-testing.

Ran into a build breaker after this commit, reported here (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 ).

Thanks,
- Tom


[committed][nvptx] Fix Wimplicit-fallthrough in nvptx.c with -save-temps

2020-09-25 Thread Tom de Vries
Hi,

When compiling nvptx.c using -save-temps, I ran into Wimplicit-fallthrough
warnings.

The fallthrough locations have been marked with a fallthrough comment, but
that doesn't work with -save-temps, something that has been filed as
PR78497.

Work around this by using gcc_fallthrough () in addition to the comment.
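
A minimal sketch of the workaround pattern, outside of GCC's own sources: EXPLICIT_FALLTHROUGH is a hypothetical macro in the spirit of gcc_fallthrough (), and classify is an invented example function. The statement attribute is part of the code itself, so unlike the comment it does not depend on comments still being present when the warning is computed.

```c
/* Hypothetical macro in the spirit of gcc_fallthrough (): expand to the
   statement attribute where the compiler supports it, else to nothing.  */
#if defined (__has_attribute)
# if __has_attribute (fallthrough)
#  define EXPLICIT_FALLTHROUGH() __attribute__ ((fallthrough))
# endif
#endif
#ifndef EXPLICIT_FALLTHROUGH
# define EXPLICIT_FALLTHROUGH() ((void) 0)
#endif

/* Invented example: case 'A' doubles X and then shares the code of
   case 'D', which adds one.  */
int
classify (int code, int x)
{
  switch (code)
    {
    case 'A':
      x = x * 2;
      EXPLICIT_FALLTHROUGH (); /* FALLTHROUGH */

    case 'D':
      return x + 1;

    default:
      return -1;
    }
}
```

Keeping the comment next to the macro, as the patch does, preserves the hint for human readers.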

Tested by building target nvptx, copying nvptx.c compile line and adding
-save-temps.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix Wimplicit-fallthrough in nvptx.c with -save-temps

gcc/ChangeLog:

2020-09-25  Tom de Vries  

* config/nvptx/nvptx.c (nvptx_assemble_integer, nvptx_print_operand):
Use gcc_fallthrough ().

---
 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 54b1fdf669b..de82f9ab875 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2101,7 +2101,7 @@ nvptx_assemble_integer (rtx x, unsigned int size, int ARG_UNUSED (aligned_p))
   val = INTVAL (XEXP (x, 1));
   x = XEXP (x, 0);
   gcc_assert (GET_CODE (x) == SYMBOL_REF);
-  /* FALLTHROUGH */
+  gcc_fallthrough (); /* FALLTHROUGH */
 
 case SYMBOL_REF:
   gcc_assert (size == init_frag.size);
@@ -2603,7 +2603,7 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 {
 case 'A':
   x = XEXP (x, 0);
-  /* FALLTHROUGH.  */
+  gcc_fallthrough (); /* FALLTHROUGH. */
 
 case 'D':
   if (GET_CODE (x) == CONST)


Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-28 Thread Tom de Vries
On 9/18/20 1:25 PM, Andrew Stubbs wrote:
> This patch fixes a problem in which nested OpenMP parallel regions cause
> errors if the number of inner teams is not balanced (i.e. the number of
> loop iterations is not divisible by the number of physical threads). A
> testcase is included.
> 
> On NVPTX the symptom was a fatal error:
> 
> libgomp: cuCtxSynchronize error: an illegal instruction was encountered
> 
> This was caused by mismatched "bar.sync" instructions (one waiting for
> 32 threads while another is waiting for 256). The source of the mismatch
> being that some threads were still busy while others had run out of work
> to do.
> 
> On GCN there was no such error (GCN barriers always wait for all
> threads), but it worked only by chance: the idle threads were "matching"
> different barriers to the busy threads, but it was harmless because the
> thread function pointer remained NULL.
> 
> This patch simply skips barriers when they would "wait" for only one
> thread (the current thread). This means that teams nested inside other
> teams now run independently, instead of strictly in lock-step, and is
> only valid as long as inner teams are limited to one thread each
> (currently the case).
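
The skip condition described above can be sketched as follows. This is a single-threaded model for illustration, not the libgomp code: the struct and function names are hypothetical, and no real synchronization is performed.

```c
#include <stdbool.h>

/* Hypothetical model of a team barrier; it only tracks arrivals so the
   skip condition can be exercised without real threads.  */
struct team_barrier
{
  unsigned total;    /* Threads expected at the barrier.  */
  unsigned arrived;  /* Threads that have arrived in this round.  */
};

/* Record one arrival.  Return true if the caller would have to block until
   the rest of the team arrives, false if it may continue immediately.  */
bool
barrier_arrive (struct team_barrier *bar)
{
  /* A team of one has nobody to wait for, so skip the barrier entirely.  */
  if (bar->total <= 1)
    return false;

  bar->arrived++;
  if (bar->arrived == bar->total)
    {
      bar->arrived = 0;   /* Last arrival releases the team.  */
      return false;
    }
  return true;
}
```

In this model a one-thread team never touches the shared counter, which mirrors why skipping is only safe while inner teams are limited to one thread each.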

Is this inner-team-one-thread-limit coded or documented somewhere?

If so, it might be good to add a comment there referring to the code
this patch adds.

Follow-up patch is OK, thanks.
- Tom

> When the inner regions exit then the barriers for
> the outer region will sync everything up again.
> 
> OK to commit?
> 
> Andrew
> 
> P.S. I can approve the amdgcn portion myself; I'm seeking approval for
> the nvptx portion.


Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-28 Thread Tom de Vries
On 9/28/20 4:17 PM, Andrew Stubbs wrote:
> On 28/09/2020 15:02, Tom de Vries wrote:
>>> This patch simply skips barriers when they would "wait" for only one
>>> thread (the current thread). This means that teams nested inside other
>>> teams now run independently, instead of strictly in lock-step, and is
>>> only valid as long as inner teams are limited to one thread each
>>> (currently the case).
>>
>> Is this inner-team-one-thread-limit coded or documented somewhere?
> 
> In libgomp/parallel.c, gomp_resolve_num_threads we have:
> 
>   else if (thr->ts.active_level >= 1 && !icv->nest_var)
>     return 1;
> 
>> If so, it might be good to add a comment there referring to the code
>> this patch adds.
> 
>   /* Accelerators with fixed thread counts require this to return 1 for
>  nested parallel regions.  */
> 
> WDYT?

Yep, looks good, thanks.
- Tom


[PATCH] Add type arg to TARGET_LIBC_HAS_FUNCTION

2020-09-28 Thread Tom de Vries
[ was: Re: [Patch][nvptx] return true in libc_has_function for
function_sincos ]

On 9/26/20 6:47 PM, Tobias Burnus wrote:
> Found when looking at PR97203 (but having no effect there).
> 
> The GCC ME optimizes with -O1 (or higher) the
>   a = sinf(x)
>   b = cosf(x)
> to __builtin_cexpi(x, &a, &b)
> (...i as in internal; like cexp(z) but with __real__ z == 0)
> 
> 
> In expand_builtin_cexpi, that is handled as:
>   if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
>     ...
>   else if (targetm.libc_has_function (function_sincos))
>     ...
>   else
>     fn = builtin_decl_explicit (BUILT_IN_CEXPF);
> 
> And the latter is done. As newlib's cexpf does not know that
> __real__ z == 0, it calculates 'r = expf (__real__ z)' before
> invoking sinf and cosf on __imag__ z.
> 
> Thus, it is much faster to call 'sincosf', which also exists
> in newlib.
> 
> Solution: Return true for targetm.libc_has_function (function_sincos).
> 
> 
> NOTE: With -funsafe-math-optimizations (-O0 or higher),
> sinf/cosf and sincosf invoke .sin.approx/.cos.approx instead of
> doing a library call.
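
The three-way choice in the quoted expand_builtin_cexpi fragment can be modelled as a small decision function. This is only a sketch of the control flow: the enum and function names are hypothetical, and the two booleans stand in for the optab_handler and targetm.libc_has_function queries.

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways cexpi can be expanded.  */
enum cexpi_lowering
{
  LOWER_TO_SINCOS_INSN,   /* The target has a sincos instruction pattern.  */
  LOWER_TO_SINCOS_CALL,   /* Call sincos/sincosf from the C library.  */
  LOWER_TO_CEXP_CALL      /* Fall back to cexp/cexpf.  */
};

/* Mirror the order of the checks: instruction first, then the library
   sincos function, then the generic complex exponential.  */
enum cexpi_lowering
choose_cexpi_lowering (bool has_sincos_insn, bool libc_has_sincos)
{
  if (has_sincos_insn)
    return LOWER_TO_SINCOS_INSN;
  else if (libc_has_sincos)
    return LOWER_TO_SINCOS_CALL;
  else
    return LOWER_TO_CEXP_CALL;
}
```

With both inputs false, which is the nvptx situation described above, the result is the cexp fallback. Making the hook return true moves it to the sincos call.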

This version takes care to enable sincos and sincosf, but not sincosl.

Target hook changes OK for trunk?

Thanks,
- Tom
[nvptx] Add type arg to TARGET_LIBC_HAS_FUNCTION

GCC has a target hook TARGET_LIBC_HAS_FUNCTION, which tells the compiler
which functions it can expect to be present in libc.

The default target hook does not include the sincos functions.

The nvptx port of newlib does include sincos and sincosf, but not sincosl.

The target hook TARGET_LIBC_HAS_FUNCTION does not distinguish between sincos,
sincosf and sincosl, so if we enable it for the sincos functions, then for
test.c:
...
long double x, a, b;
int main (void) {
  x = 0.5;
  a = sinl (x);
  b = cosl (x);
  printf ("a: %f\n", (double)a);
  printf ("b: %f\n", (double)b);
  return 0;
}
...
we introduce a regression:
...
$ gcc test.c -lm -O2
unresolved symbol sincosl
collect2: error: ld returned 1 exit status
...

Add a type argument to target hook TARGET_LIBC_HAS_FUNCTION, and use it
in nvptx_libc_has_function to enable sincos and sincosf, but not sincosl.

For now, a non-null type argument is only supported for
fn_class == function_sincos.

Tested on nvptx.

2020-09-28  Tobias Burnus  
	Tom de Vries  

	* builtins.c (expand_builtin_cexpi, fold_builtin_sincos): Update
	targetm.libc_has_function call.
	* builtins.def (DEF_C94_BUILTIN, DEF_C99_BUILTIN, DEF_C11_BUILTIN)
	(DEF_C2X_BUILTIN, DEF_C99_COMPL_BUILTIN, DEF_C99_C90RES_BUILTIN):
	Same.
	* config/darwin-protos.h (darwin_libc_has_function): Update prototype.
	* config/darwin.c (darwin_libc_has_function): Add arg.
	* config/linux-protos.h (linux_libc_has_function): Update prototype.
	* config/linux.c (linux_libc_has_function): Add arg.
	* config/i386/i386.c (ix86_libc_has_function): Update
	targetm.libc_has_function call.
	* config/nvptx/nvptx.c (nvptx_libc_has_function): New function.
	(TARGET_LIBC_HAS_FUNCTION): Redefine to nvptx_libc_has_function.
	* convert.c (convert_to_integer_1): Update targetm.libc_has_function
	call.
	* match.pd: Same.
	* target.def (libc_has_function): Add arg.
	* doc/tm.texi: Regenerate.
	* targhooks.c (default_libc_has_function, gnu_libc_has_function)
	(no_c99_libc_has_function): Add arg.
	* targhooks.h (default_libc_has_function,  no_c99_libc_has_function)
	(gnu_libc_has_function): Update prototype.
	* tree-ssa-math-opts.c (pass_cse_sincos::execute): Update
	targetm.libc_has_function call.

---
 gcc/builtins.c             |  4 ++--
 gcc/builtins.def           | 20 ++--
 gcc/config/darwin-protos.h |  2 +-
 gcc/config/darwin.c        | 16 ++--
 gcc/config/i386/i386.c     |  2 +-
 gcc/config/linux-protos.h  |  2 +-
 gcc/config/linux.c         | 16 ++--
 gcc/config/nvptx/nvptx.c   | 20
 gcc/convert.c              |  8
 gcc/doc/tm.texi            |  5 +++--
 gcc/fortran/f95-lang.c     |  4 ++--
 gcc/match.pd               |  6 +++---
 gcc/target.def             |  5 +++--
 gcc/targhooks.c            | 44 +---
 gcc/targhooks.h            |  6 +++---
 gcc/tree-ssa-math-opts.c   |  8 +---
 16 files changed, 131 insertions(+), 37 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index cac842fd4a3..85f5f630a0f 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2733,7 +2733,7 @@ expand_builtin_cexpi (tree exp, rtx target)
   /* Compute into op1 and op2.  */
   expand_twoval_unop (sincos_optab, op0, op2, op1, 0);
 }
-  else if (targetm.libc_has_function (function_sincos))
+  else if (targetm.libc_has_function (function_sincos, type))
 {
   tree call, fn = NULL_TREE;
   tree top1, top2;
@@ -9770,7 +9770,7 @@ fold_builtin_sincos (location_t loc,
 }
   if (!call)
 

Re: [PATCH] Add type arg to TARGET_LIBC_HAS_FUNCTION

2020-09-29 Thread Tom de Vries
On 9/29/20 8:59 AM, Richard Biener wrote:
> On Mon, Sep 28, 2020 at 7:28 PM Tom de Vries  wrote:
>>
>> [ was: Re: [Patch][nvptx] return true in libc_has_function for
>> function_sincos ]
>>
>> On 9/26/20 6:47 PM, Tobias Burnus wrote:
>>> Found when looking at PR97203 (but having no effect there).
>>>
>>> The GCC ME optimizes with -O1 (or higher) the
>>>   a = sinf(x)
>>>   b = cosf(x)
>>> to __builtin_cexpi(x, &a, &b)
>>> (...i as in internal; like cexp(z) but with __real__ z == 0)
>>>
>>>
>>> In expand_builtin_cexpi, that is handled as:
>>>   if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
>>> ...
>>>   else if (targetm.libc_has_function (function_sincos))
>>> ...
>>>   else
>>> fn = builtin_decl_explicit (BUILT_IN_CEXPF);
>>>
>>> And the latter is done. As newlib's cexpf does not know that
>>> __real__ z == 0, it calculates 'r = expf (__real__ z)' before
>>> invoking sinf and cosf on __imag__ z.
>>>
>>> Thus, it is much faster to call 'sincosf', which also exists
>>> in newlib.
>>>
>>> Solution: Return true for targetm.libc_has_function (function_sincos).
>>>
>>>
>>> NOTE: With -funsafe-math-optimizations (-O0 or higher),
>>> sinf/cosf and sincosf invoke .sin.approx/.cos.approx instead of
>>> doing a library call.
>>
>> This version takes care to enable sincos and sincosf, but not sincosl.
>>
>> Target hook changes OK for trunk?
> 
> @@ -9770,7 +9770,7 @@ fold_builtin_sincos (location_t loc,
>  }
>if (!call)
>  {
> -  if (!targetm.libc_has_function (function_c99_math_complex)
> +  if (!targetm.libc_has_function (function_c99_math_complex, NULL_TREE)
> 
> why pass NULL_TREE and not 'type' here?
> 
>   || !builtin_decl_implicit_p (fn))
> return NULL_TREE;
> 

I was trying to do the minimal, sincos-only implementation.

> similar for the builtins.def change for the cases where math functions
> are affected?  I guess it's a bit awkward to make it work there, so OK.
> 
>  bool
> -darwin_libc_has_function (enum function_class fn_class)
> +darwin_libc_has_function (enum function_class fn_class, tree type)
>  {
> -  if (fn_class == function_sincos)
> +  if (type != NULL_TREE)
> +{
> +  switch (fn_class)
> +   {
> +   case function_sincos:
> + break;
> +   default:
> + /* Not implemented.  */
> + gcc_unreachable ();
> +   }
> +}
> 
> huh.  I think special-casing this just for sincos is a bit awkward,
> esp. ICEing for other queries with a type.  Specifically
> 
> -@deftypefn {Target Hook} bool TARGET_LIBC_HAS_FUNCTION (enum
> function_class @var{fn_class})
> +@deftypefn {Target Hook} bool TARGET_LIBC_HAS_FUNCTION (enum
> function_class @var{fn_class}, tree @var{type})
>  This hook determines whether a function from a class of functions
> -@var{fn_class} is present in the target C library.
> +@var{fn_class} is present in the target C library.  The @var{type} argument
> +can be used to distinguish between float, double and long double versions.
>  @end deftypefn
> 
> This doesn't mention we'll ICE for anything but sincos.  A sensible
> semantics would be that if TYPE is NULL the caller asks for support
> for all standard (float, double, long double) types while with TYPE
> non-NULL it can ask for a specific type including for example the
> new _FloatN, etc. types.
> 

Ack, updated accordingly and retested.
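
The semantics suggested in the review can be sketched as below. This is an illustration under assumptions, not the updated patch: the enum, the table and the function name are hypothetical, and a pointer to an enum stands in for GCC's tree type argument. The table encodes the nvptx newlib situation stated in the commit log (sincos and sincosf present, sincosl absent).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the floating-point types a caller can ask
   about.  */
enum fp_type { FP_FLOAT, FP_DOUBLE, FP_LONG_DOUBLE, FP_TYPE_COUNT };

/* Availability of the sincos family per type, following the commit log:
   sincosf and sincos exist, sincosl does not.  */
static const bool sincos_available[FP_TYPE_COUNT] = { true, true, false };

/* TYPE == NULL asks whether all standard types are supported; a non-NULL
   TYPE asks about that one type only.  */
bool
libc_has_sincos (const enum fp_type *type)
{
  if (type != NULL)
    return sincos_available[*type];

  for (int i = 0; i < FP_TYPE_COUNT; i++)
    if (!sincos_available[i])
      return false;
  return true;
}
```

Under these semantics a type-less query stays conservative, so callers that do not pass a type would see the same answer as before, while float and double queries can opt in.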

OK for trunk?

Thanks,
- Tom
[nvptx] Add type arg to TARGET_LIBC_HAS_FUNCTION

GCC has a target hook TARGET_LIBC_HAS_FUNCTION, which tells the compiler
which functions it can expect to be present in libc.

The default target hook does not include the sincos functions.

The nvptx port of newlib does include sincos and sincosf, but not sincosl.

The target hook TARGET_LIBC_HAS_FUNCTION does not distinguish between sincos,
sincosf and sincosl, so if we enable it for the sincos functions, then for
test.c:
...
long double x, a, b;
int main (void) {
  x = 0.5;
  a = sinl (x);
  b = cosl (x);
  printf ("a: %f\n", (double)a);
  printf ("b: %f\n", (double)b);
  return 0;
}
...
we introduce a regression:
...
$ gcc test.c -lm -O2
unresolved symbol sincosl
collect2: error: ld returned 1 exit status
...

Add a type argument to target hook TARGET_LIBC_HAS_FUNCTION, and use it
in nvptx_libc_has_function to enable sincos and sincosf, but not sincosl.

Built and reg-tested on x86_64-linux.

Built and tested on nvptx.

2020-09-28  Tobias Burnus  

[committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm

2020-09-30 Thread Tom de Vries
[ was: Re: [committed][testsuite] Require non_strict_align in
pr94600-{1,3}.c ]

On 9/30/20 4:53 AM, Hans-Peter Nilsson wrote:
> On Thu, 24 Sep 2020, Tom de Vries wrote:
> 
>> Hi,
>>
>> With the nvptx target, we run into:
>> ...
>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
>> ...
>> The scans attempt to check for volatile stores, but on nvptx we have memcpy
>> instead.
>>
>> This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
>> that the TYPE_MODE for the store target is set to BLKmode in
>> compute_record_mode.
>>
>> Fix the FAILs by requiring effective target non_strict_align.
> 
> No, that's wrong.  There's more than that at play; it worked for
> the strict-alignment targets where it was tested at the time.
> 

Hi,

thanks for letting me know.

> The test is a valuable canary for this kind of bug.  You now
> disabled it for strict-alignment targets.
> 
> Please revert and add your target specifier instead, if you
> don't feel like investigating further.

I've analyzed the compilation on strict-alignment target arm-eabi, and
broadened the effective target to (non_strict_align ||
pcc_bitfield_type_matters).

Thanks,
- Tom
[testsuite] Re-enable pr94600-{1,3}.c tests for arm

Before commit 7e437162001 "[testsuite] Require non_strict_align in
pr94600-{1,3}.c", some tests were failing for nvptx, because volatile stores
were expected, but memcpy's were found instead.

This was traced back to this bit in compute_record_mode:
...
  /* If structure's known alignment is less than what the scalar
 mode would need, and it matters, then stick with BLKmode.  */
  if (mode != BLKmode
  && STRICT_ALIGNMENT
  && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
|| TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (mode)))
{
  /* If this is the only reason this type is BLKmode, then
 don't force containing types to be BLKmode.  */
  TYPE_NO_FORCE_BLK (type) = 1;
  mode = BLKmode;
}
...
which got triggered for nvptx, but not for x86_64.

The commit restricted the tests to the non_strict_align effective target, but
that also disabled them for the arm target, even though they were passing
there before.

Further investigation in compute_record_mode shows that the if-condition
evaluates to false for arm because TYPE_ALIGN (type) == 32, while it's 8 for
nvptx.  This in turn can be explained by the PCC_BITFIELD_TYPE_MATTERS
setting, which is 1 for arm, but 0 for nvptx.
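
The alignment part of the quoted condition can be evaluated in isolation. The sketch below is not GCC code: falls_back_to_blkmode is a hypothetical name, the "mode != BLKmode" precondition is assumed to hold, and the values 64 (biggest alignment) and 32 (mode alignment) used in the note after the code are illustrative assumptions. Only the TYPE_ALIGN values 32 and 8 come from the analysis above. All alignments are in bits.

```c
#include <stdbool.h>

/* Return true if the quoted check would force BLKmode, given that the
   candidate mode is not BLKmode already.  */
bool
falls_back_to_blkmode (bool strict_alignment, unsigned type_align,
                       unsigned biggest_alignment, unsigned mode_align)
{
  return strict_alignment
         && !(type_align >= biggest_alignment || type_align >= mode_align);
}
```

With strict alignment and a 32-bit mode alignment, a type alignment of 32 keeps the scalar mode (the arm case), while a type alignment of 8 falls back to BLKmode (the nvptx case). Without strict alignment the check never triggers.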

Re-enable the test for arm by using effective target
(non_strict_align || pcc_bitfield_type_matters).

Tested on arm-eabi and nvptx.

gcc/testsuite/ChangeLog:

2020-09-30  Tom de Vries  

	* gcc.dg/pr94600-1.c: Use effective target
	(non_strict_align || pcc_bitfield_type_matters).
	* gcc.dg/pr94600-3.c: Same.

---
 gcc/testsuite/gcc.dg/pr94600-1.c | 4 ++--
 gcc/testsuite/gcc.dg/pr94600-3.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr94600-1.c b/gcc/testsuite/gcc.dg/pr94600-1.c
index 38f939a98cb..c9a7bb9007e 100644
--- a/gcc/testsuite/gcc.dg/pr94600-1.c
+++ b/gcc/testsuite/gcc.dg/pr94600-1.c
@@ -32,5 +32,5 @@ foo(void)
 }
 
 /* The only volatile accesses should be the obvious writes.  */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align } } } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
diff --git a/gcc/testsuite/gcc.dg/pr94600-3.c b/gcc/testsuite/gcc.dg/pr94600-3.c
index e8776fbdb28..ff42c7db3c6 100644
--- a/gcc/testsuite/gcc.dg/pr94600-3.c
+++ b/gcc/testsuite/gcc.dg/pr94600-3.c
@@ -31,5 +31,5 @@ foo(void)
 }
 
 /* The loop isn't unrolled. */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align } } } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align } } } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */


[committed][testsuite] Enable pr94600-{1,3}.c tests for nvptx

2020-10-01 Thread Tom de Vries
[ was: Re: [committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm ]

On 10/1/20 7:38 AM, Hans-Peter Nilsson wrote:
> On Wed, 30 Sep 2020, Tom de Vries wrote:
> 
>> [ was: Re: [committed][testsuite] Require non_strict_align in
>> pr94600-{1,3}.c ]
>>
>> On 9/30/20 4:53 AM, Hans-Peter Nilsson wrote:
>>> On Thu, 24 Sep 2020, Tom de Vries wrote:
>>>
>>>> Hi,
>>>>
>>>> With the nvptx target, we run into:
>>>> ...
>>>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
>>>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
>>>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
>>>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
>>>> ...
>>>> The scans attempt to check for volatile stores, but on nvptx we have memcpy
>>>> instead.
>>>>
>>>> This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
>>>> that the TYPE_MODE for the store target is set to BLKmode in
>>>> compute_record_mode.
>>>>
>>>> Fix the FAILs by requiring effective target non_strict_align.
>>>
>>> No, that's wrong.  There's more than that at play; it worked for
>>> the strict-alignment targets where it was tested at the time.
>>>
>>
>> Hi,
>>
>> thanks for letting me know.
>>
>>> The test is a valuable canary for this kind of bug.  You now
>>> disabled it for strict-alignment targets.
>>>
>>> Please revert and add your target specifier instead, if you
>>> don't feel like investigating further.
>>
>> I've analyzed the compilation on strict-alignment target arm-eabi, and
> 
> An analysis should result in more than that statement.
> 

Well, it refers to the analysis in the commit log of the patch, sorry if
that was not obvious.

>> broadened the effective target to (non_strict_align ||
>> pcc_bitfield_type_matters).
> 
> That's *also* not right.  I'm guessing your nvptx fails because
> it has 64-bit alignment requirement, but no 32-bit writes.
> ...um, no that can't be it, nvptx seems to have them.  Costs?
> Yes, probably your #define MOVE_RATIO(SPEED) 4.
> 
> The writes are to 32-bit aligned addresses which gcc can deduce
> (also for strict-alignment targets) because it's a literal,
> where it isn't explicitly declared to be attribute-aligned
> 
> You should have noticed the weirdness in that you "only" needed
> to tweak pr94600-1.c and -3.c, not even pr94600-2.c, which
> should be the case if it was just the test-case getting the
> predicates wrong.  This points at your MOVE_RATIO, together with
> middle-end not applying it consistently for -2.c.
> 
> Again, please just skip for nvptx (don't mix-n-match general
> predicates) unless you really look into the reason you don't get
> 6 single 32-bit-writes only in *some* of the cases.

Thanks for the pointer to pr94600-2.c.  I've compared the behaviour
between pr94600-1.c and pr94600-2.c and figured out why in one case we
get the load/store pair, and in the other the memcpy.  See rationale in
commit below.

Committed to trunk.

Thanks,
- Tom

[testsuite] Enable pr94600-{1,3}.c tests for nvptx

When compiling test-case pr94600-1.c for nvptx, this gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a0[0];
...
is expanded into a memcpy, but when compiling pr94600-2.c instead, this similar
gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a00;
...
is expanded into a 32-bit load/store pair.

In both cases, emit_block_move is called.

In the latter case, can_move_by_pieces (4 /* byte-size */, 32 /* bit-align */)
is called, which returns true (because by_pieces_ninsns returns 1, which is
smaller than the MOVE_RATIO of 4).

In the former case, can_move_by_pieces (4 /* byte-size */, 8 /* bit-align */)
is called, which returns false (because by_pieces_ninsns returns 4, which is
not smaller than the MOVE_RATIO of 4).

So the difference in code generation is explained by the alignment.  The
difference in alignment comes from the move sources: a0[0] vs. a00.  Both
have the same type with 8-bit alignment, but a00 is on stack, which based on
the base stack align and stack variable placement happens to result in a
32-bit alignment.
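
As an illustration only, the decision described above can be modelled in a
few lines of C.  This is a simplified sketch, not GCC's by_pieces_ninsns: the
model_* names and the max_piece cap on a single move are assumptions made for
the example, and it assumes that a move may be no wider than the known
alignment, as on a STRICT_ALIGNMENT target.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model (hypothetical, not the GCC implementation): count the
   power-of-two sized moves needed to copy SIZE bytes when a single move
   may be at most ALIGN_BITS / 8 bytes wide and at most MAX_PIECE bytes.  */
unsigned model_by_pieces_ninsns(unsigned size, unsigned align_bits,
                                unsigned max_piece)
{
    assert(align_bits >= 8 && max_piece >= 1);

    unsigned width = align_bits / 8;
    if (width > max_piece)
        width = max_piece;

    /* Round the widest allowed move down to a power of two.  */
    unsigned piece = 1;
    while (piece * 2 <= width)
        piece *= 2;

    unsigned n = 0;
    while (size > 0) {
        /* Shrink the piece until it fits into what is left.  */
        while (piece > size)
            piece /= 2;
        n += size / piece;
        size %= piece;
    }
    return n;
}

/* Moving by pieces is chosen only if it takes fewer moves than MOVE_RATIO.
   The cap of 8 bytes per move is an assumption of this sketch.  */
bool model_can_move_by_pieces(unsigned size, unsigned align_bits,
                              unsigned move_ratio)
{
    return model_by_pieces_ninsns(size, align_bits, 8) < move_ratio;
}
```

With a MOVE_RATIO of 4, a 4-byte move at 32-bit alignment needs one move and
is accepted, while the same move at 8-bit alignment needs four byte moves and
is rejected, which is the memcpy case.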

Enable test-cases pr94600-{1,3}.c for nvptx by forcing the currently 8-bit
aligned variables to have a 32-bit alignment for STRICT_ALIGNMENT targets.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2020-10-01  Tom de Vries  

	* gcc.dg/pr94600-1.c: Force 32-bit alignment for a0 for !non_strict_align
	targets.  Remov

[committed][nvptx] Emit mov.u32 instead of cvt.u32.u32 for truncsiqi2

2020-10-01 Thread Tom de Vries
Hi,

When running:
...
$ gcc.sh src/gcc/testsuite/gcc.target/nvptx/abi-complex-arg.c -S -dP
...
we have in abi-complex-arg.s:
...
//(insn 3 5 4 2
//  (set
//(reg:QI 23)
//(truncate:QI (reg:SI 22))) "abi-complex-arg.c":38:1 29 {truncsiqi2}
//  (nil))
cvt.u32.u32 %r23, %r22; // 3[c=4]  truncsiqi2/0
...

The cvt.u32.u32 can be written shorter and clearer as mov.u32.

Fix this in define_insn "truncsi2".
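
The selection the insn has to make can be sketched in plain C as below.  This
is only a model of the decision, not the md code: the enum and function names
are made up for the example, the store opcode is shown without its
address-space and size suffixes, and the ".u16" suffix for the HImode case is
an assumption (only the QImode output is shown above).

```c
#include <assert.h>

/* Hypothetical stand-ins for QImode and HImode.  */
enum dest_mode { MODE_QI, MODE_HI };

/* Return the opcode a truncation from SImode would print.
   DEST_IS_MEMORY models the second ("m") alternative of the insn.  */
const char *trunc_si_opcode(enum dest_mode mode, int dest_is_memory)
{
    assert(mode == MODE_QI || mode == MODE_HI);

    /* Memory destination: a store, suffixes omitted in this sketch.  */
    if (dest_is_memory)
        return "st";

    /* QImode values live in 32-bit registers, so the conversion would be
       cvt.u32.u32, which is just a register move.  */
    if (mode == MODE_QI)
        return "mov.u32";

    /* Assumed suffix for HImode: a real narrowing conversion remains.  */
    return "cvt.u16.u32";
}
```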

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Emit mov.u32 instead of cvt.u32.u32 for truncsiqi2

gcc/ChangeLog:

2020-10-01  Tom de Vries  

PR target/80845
* config/nvptx/nvptx.md (define_insn "truncsi2"): Emit mov.u32
instead of cvt.u32.u32.

---
 gcc/config/nvptx/nvptx.md | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 035f6e0151b..ccbcd096fd1 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -383,9 +383,13 @@ (define_insn "truncsi2"
   [(set (match_operand:QHIM 0 "nvptx_nonimmediate_operand" "=R,m")
(truncate:QHIM (match_operand:SI 1 "nvptx_register_operand" "R,R")))]
   ""
-  "@
-   %.\\tcvt%t0.u32\\t%0, %1;
-   %.\\tst%A0.u%T0\\t%0, %1;"
+  {
+if (which_alternative == 1)
+  return "%.\\tst%A0.u%T0\\t%0, %1;";
+if (GET_MODE (operands[0]) == QImode)
+  return "%.\\tmov%t0\\t%0, %1;";
+return "%.\\tcvt%t0.u32\\t%0, %1;";
+  }
   [(set_attr "subregs_ok" "true")])
 
 (define_insn "truncdi2"


[PATCH][omp, simt] Handle alternative IV

2020-10-02 Thread Tom de Vries
Hi,

Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
  int s = 1;
  #pragma omp target simd
  for (int i = N - 1; i > -1; i -= s)
a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.

After omp-expand, this looks like:
...
   :
  simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
  .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
  D.3193 = -s;
  s.9 = s;
  D.3204 = .GOMP_SIMT_LANE ();
  D.3205 = -s.9;
  D.3206 = (int) D.3204;
  D.3207 = D.3205 * D.3206;
  i = D.3207 + 31;
  D.3209 = 0;
  D.3210 = -s.9;
  D.3211 = D.3210 - i;
  D.3210 = -s.9;
  D.3212 = D.3211 / D.3210;
  D.3213 = (unsigned int) D.3212;
  D.3213 = i >= 0 ? D.3213 : 0;

   :
  if (D.3209 < D.3213)
goto ; [87.50%]
  else
goto ; [12.50%]

   :
  a[i] = 1;
  D.3215 = -s.9;
  D.3219 = .GOMP_SIMT_VF ();
  D.3216 = (int) D.3219;
  D.3220 = D.3215 * D.3216;
  i = D.3220 + i;
  D.3209 = D.3209 + 1;
  goto ; [100.00%]
...

On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.

So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
  to true
- bb6 is once more executed, which should not happen because all the elements
  that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
  ...
  libgomp: cuCtxSynchronize error: an illegal memory access was encountered
  ...
  and with CANARY_SIZE unmodified, we run into:
  ...
  Expected 0, got 1 at base[-961]
  Aborted (core dumped)
  ...

The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
  IV (D.3209 in our example) is generated in expand_omp_simd, and the
  loop condition is tested in terms of the alternative IV rather than
  the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
  modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
  by the SIMT code, but afterwards there are uses of both step and
  fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
  which should use step instead.

Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
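
To make the effect concrete, here is a small host-side C model of the loop
above.  It is a sketch under assumptions, not the code generated by
expand_omp_simd: the function names are invented, each lane is simulated
sequentially, and the trip count formula is the one read off the dump
(D.3211 / D.3210, clamped to 0 when i is negative).  Instead of storing to
a[i] it only records the lowest index that would be written.

```c
#include <assert.h>
#include <limits.h>

/* Trip count of the alternative IV for one lane, as in the dump:
   (count_step - i) / count_step, or 0 if the lane has nothing to do.  */
static int alt_iv_trip_count(int i, int count_step)
{
    if (i < 0)
        return 0;
    return (count_step - i) / count_step;
}

/* Simulate "for (i = n - 1; i > -1; i -= s)" spread over VF lanes and
   return the lowest index that some lane would write.
   USE_SIMT_STEP == 0 models the bug: the trip count is computed from the
   original step while the IV advances by the SIMT-adjusted step.  */
int lowest_index_written(int s, int n, int vf, int use_simt_step)
{
    assert(s > 0 && n > 0 && vf > 0);

    int orig_step = -s;
    int simt_step = orig_step * vf;      /* step after the SIMT adjustment */
    int count_step = use_simt_step ? simt_step : orig_step;
    int lowest = INT_MAX;

    for (int lane = 0; lane < vf; lane++) {
        int i = (n - 1) + orig_step * lane;   /* per-lane initial value */
        int trips = alt_iv_trip_count(i, count_step);
        for (int k = 0; k < trips; k++) {
            if (i < lowest)
                lowest = i;                   /* stands in for a[i] = 1 */
            i += simt_step;
        }
    }
    return lowest;
}
```

In this model, lowest_index_written (1, 32, 32, 0) gives -961, the index
from the canary failure above, while using the SIMT step for the trip count
keeps every write within 0..31.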

Built on x86_64-linux with nvptx accelerator, tested libgomp.

This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.

OK for trunk?

Thanks,
- Tom

[omp, simt] Handle alternative IV

gcc/ChangeLog:

2020-10-02  Tom de Vries  

* omp-expand.c (expand_omp_simd): Add orig_step, and replace uses of
fd->loop.step by either step or orig_step.

libgomp/ChangeLog:

2020-10-02  Tom de Vries  

* testsuite/libgomp.c/pr81778.c: New test.

---
 gcc/omp-expand.c  | 11 
 libgomp/testsuite/libgomp.c/pr81778.c | 48 +++
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 99cb4f9dda4..80e35ac0294 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
   n2 = OMP_CLAUSE_DECL (innerc);
 }
   tree step = fd->loop.step;
+  tree orig_step = step; /* May be different from step if is_simt.  */
 
   bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
  OMP_CLAUSE__SIMT_);
@@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
   tree altv = NULL_TREE, altn2 = NULL_TREE;
   if (fd->collapse == 1
   && !broken_loop
-  && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
+  && TREE_CODE (orig_step) != INTEGER_CST)
 {
   /* The vectorizer currently punts on loops with non-constant steps
 for the main IV (can't compute number of iterations and gives up
@@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
itype = signed_type_for (itype);
   t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
   t = fold_build2 (PLUS_EXPR, itype,
-  fold_convert (itype, fd->loop.step), t);
+  fold_convert (itype, step), t);
   t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
   t = fold_build2 (MINUS_EXPR, itype, t,
   fold_convert (itype, fd->loop.v));
@@ -

Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-10-05 Thread Tom de Vries
On 9/22/20 6:38 PM, Tom de Vries wrote:
> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
> with SIMT LANE [PR95654] ]
> 
> On 9/16/20 8:20 PM, Alexander Monakov wrote:
>>
>>
>> On Wed, 16 Sep 2020, Tom de Vries wrote:
>>
>>> [ cc-ing author omp support for nvptx. ]
>>
>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
>> recognized it too for their GPU targets). In an attempt to get agreement
>> to fix the issue "properly" for GCC I found a similar issue that affects
>> all targets, not just offloading, and filed it as PR 80053.
>>
>> (yes, there are no addressable labels involved in offloading, but
>> nevertheless the nature of the middle-end issue is related)
> 
> Hi Alexander,
> 
> thanks for looking into this.
> 
> Seeing that the attempt to fix things properly is stalled, for now I'm
> proposing a point-fix, similar to the original patch proposed by Tobias.
> 
> Richi, Jakub, OK for trunk?
> 

I've had to modify this patch in two ways:
- the original test-case stopped failing, though not the
  minimized one, so I added that one as a test-case
- only testing for ENTER_ALLOC and EXIT, and not explicitly for VOTE_ANY
  in ignore_bb_p also stopped working, so I've added that now.

Re-tested and committed.

Thanks,
- Tom
[omp, ftracer] Don't duplicate blocks in SIMT region

When running the libgomp testsuite on x86_64-linux with nvptx accelerator on
the test-case included in this patch, we run into:
...
FAIL: libgomp.fortran/pr95654.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The test-case is a minimal version of this FAIL:
...
FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...
but that one has stopped failing at commit c2ebf4f10de "openmp: Add support
for non-rect simd and improve collapsed simd support".

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*--+
|   \
|\
| v
| *
v bb8
*<*
bb5(VOTE_ANY)
*-+
| |
| |
| |
| |
| v
| *
v bb7(XCHG_IDX)
*<*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an operator which has the requirement that the warp executing
the operator is convergent.  The warp diverges at bb4, and
reconverges at bb5, and does not diverge by going to bb7, so the shfl is
indeed executed by a convergent warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*--+
|   \
|\
| \
|  \
v   v
*   *
bb5(VOTE_ANY)   bb8(VOTE_ANY)
*   *
|\ /|
| \  ++ |
|  \/   |
|  /\   |
| /  +--v
|/  *
v   bb7(XCHG_IDX)
*<--*
bb6(EXIT)
...

The warp diverges again at bb5, but does not reconverge again before bb6, so
the shfl is executed by a divergent warp, which causes the FAIL.

Fix this by making ftracer ignore blocks containing ENTER_ALLOC, VOTE_ANY and
EXIT, effectively treating the SIMT region conservatively.

An argument can be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.

Bootstrapped and reg-tested on x86_64-linux.

Built on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

	PR fortran/95654
	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC,
	GOMP_SIMT_VOTE_ANY and GOMP_SIMT_EXIT.

libgomp/ChangeLog:

2020-10-05  Tom de Vries  

	PR fortran/95654
	* testsuite/libgomp.fortran/pr95654.f90: New test.

---
 gcc/tracer.c  | 18 ++
 libgomp/testsuite/libgomp.fortran/pr95654.f90 | 11 +++
 2 files changed, 29 insertions(+)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 82ede722534..5e51752d89f 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -108,6 +108,24 @@ ignore_bb_p (const_basic_block bb)
 	return true;
 }
 
+  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
+   !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *g = gsi_stmt (gsi);
+
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+	 duplicated as part of its group, or not at all.
+	 The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
+	 so the same holds there, but it could be argued that t

Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-10-05 Thread Tom de Vries
On 9/24/20 2:44 PM, Richard Biener wrote:
> On Thu, 24 Sep 2020, Tom de Vries wrote:
> 
>> On 9/24/20 1:42 PM, Richard Biener wrote:
>>> On Wed, 23 Sep 2020, Tom de Vries wrote:
>>>
>>>> On 9/23/20 9:28 AM, Richard Biener wrote:
>>>>> On Tue, 22 Sep 2020, Tom de Vries wrote:
>>>>>
>>>>>> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
>>>>>> with SIMT LANE [PR95654] ]
>>>>>>
>>>>>> On 9/16/20 8:20 PM, Alexander Monakov wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 16 Sep 2020, Tom de Vries wrote:
>>>>>>>
>>>>>>>> [ cc-ing author omp support for nvptx. ]
>>>>>>>
>>>>>>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
>>>>>>> recognized it too for their GPU targets). In an attempt to get agreement
>>>>>>> to fix the issue "properly" for GCC I found a similar issue that affects
>>>>>>> all targets, not just offloading, and filed it as PR 80053.
>>>>>>>
>>>>>>> (yes, there are no addressable labels involved in offloading, but
>>>>>>> nevertheless the nature of the middle-end issue is related)
>>>>>>
>>>>>> Hi Alexander,
>>>>>>
>>>>>> thanks for looking into this.
>>>>>>
>>>>>> Seeing that the attempt to fix things properly is stalled, for now I'm
>>>>>> proposing a point-fix, similar to the original patch proposed by Tobias.
>>>>>>
>>>>>> Richi, Jakub, OK for trunk?
>>>>>
>>>>> I notice that we call ignore_bb_p many times in tracer.c but one call
>>>>> is conveniently early in tail_duplicate (void):
>>>>>
>>>>>   int n = count_insns (bb);
>>>>>   if (!ignore_bb_p (bb))
>>>>> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), bb);
>>>>>
>>>>> where count_insns already walks all stmts in the block.  It would be
>>>>> nice to avoid repeatedly walking all stmts, maybe adjusting the above
>>>>> call is enough and/or count_insns can compute this and/or the ignore_bb_p
>>>>> result can be cached (optimize_bb_for_size_p might change though,
>>>>> but maybe all other ignore_bb_p calls effectively just are that,
>>>>> checks for blocks that became optimize_bb_for_size_p).
>>>>>
>>>>
>>>> This untested follow-up patch tries something in that direction.
>>>>
>>>> Is this what you meant?
>>>
>>> Yeah, sort of.
>>>
>>> +static bool
>>> +cached_can_duplicate_bb_p (const_basic_block bb)
>>> +{
>>> +  if (can_duplicate_bb)
>>>
>>> is there any path where can_duplicate_bb would be NULL?
>>>
>>
>> Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.
> 
> Oh, that was probably done because of the very same OMP issue ...
> 
>>> +{
>>> +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
>>> +  /* Assume added bb's should be ignored.  */
>>> +  if ((unsigned int)bb->index < size
>>> + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
>>> +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
>>>
>>> yes, newly added bbs should be ignored so,
>>>
>>>  }
>>>  
>>> -  return false;
>>> +  bool val = compute_can_duplicate_bb_p (bb);
>>> +  if (can_duplicate_bb)
>>> +cache_can_duplicate_bb_p (bb, val);
>>>
>>> no need to compute & cache for them, just return true (because
>>> we did duplicate them)?
>>>
>>
>> Also the case for gimple-ssa-split-paths.c.?
> 
> If it had the bitmap then yes ... since it doesn't the early
> out should be in the conditional above only.
> 

Ack, updated the patch accordingly, and split it up in two bits, one
that does refactoring, and one that adds the actual caching:
- [ftracer] Factor out can_duplicate_bb_p
- [ftracer] Add caching of can_duplicate_bb_p

I'll post these in reply to this email.

Thanks,
- Tom


[PATCH][ftracer] Factor out can_duplicate_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 9:05 AM, Tom de Vries wrote:
> Ack, updated the patch accordingly, and split it up in two bits, one
> that does refactoring, and one that adds the actual caching:
> - [ftracer] Factor out can_duplicate_bb_p
> - [ftracer] Add caching of can_duplicate_bb_p
> 
> I'll post these in reply to this email.
> 

OK?

Thanks,
- Tom
[ftracer] Factor out can_duplicate_bb_p

Factor can_duplicate_bb_p out of ignore_bb_p.

Also factor out can_duplicate_insn_p and can_duplicate_bb_no_insn_iter_p to
expose the parts of can_duplicate_bb_p that are per-bb and per-insn.

Bootstrapped and reg-tested on x86_64-linux.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p)
	(can_duplicate_bb_p): New functions, factored out of ...
	(ignore_bb_p): ... here.

---
 gcc/tracer.c | 74 
 1 file changed, 50 insertions(+), 24 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 5e51752d89f..c0e888f6b03 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -84,49 +84,75 @@ bb_seen_p (basic_block bb)
   return bitmap_bit_p (bb_seen, bb->index);
 }
 
-/* Return true if we should ignore the basic block for purposes of tracing.  */
-bool
-ignore_bb_p (const_basic_block bb)
+/* Return true if gimple stmt G can be duplicated.  */
+static bool
+can_duplicate_insn_p (gimple *g)
+{
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+ duplicated as part of its group, or not at all.
+ The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
+ so the same holds there, but it could be argued that the
+ IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
+ in which case it could be duplicated.  */
+  if (is_gimple_call (g)
+  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
+return false;
+
+  return true;
+}
+
+/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
+static bool
+can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
 {
   if (bb->index < NUM_FIXED_BLOCKS)
-return true;
-  if (optimize_bb_for_size_p (bb))
-return true;
+return false;
 
   if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
 {
   /* A transaction is a single entry multiple exit region.  It
 	 must be duplicated in its entirety or not at all.  */
   if (gimple_code (g) == GIMPLE_TRANSACTION)
-	return true;
+	return false;
 
   /* An IFN_UNIQUE call must be duplicated as part of its group,
 	 or not at all.  */
   if (is_gimple_call (g)
 	  && gimple_call_internal_p (g)
 	  && gimple_call_internal_unique_p (g))
-	return true;
+	return false;
 }
 
+  return true;
+}
+
+/* Return true if BB can be duplicated.  */
+static bool
+can_duplicate_bb_p (const_basic_block bb)
+{
+  if (!can_duplicate_bb_no_insn_iter_p (bb))
+return false;
+
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
!gsi_end_p (gsi); gsi_next (&gsi))
-{
-  gimple *g = gsi_stmt (gsi);
-
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
-	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
-	 so the same holds there, but it could be argued that the
-	 IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
-	 in which case it could be duplicated.  */
-  if (is_gimple_call (g)
-	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
-	return true;
-}
+if (!can_duplicate_insn_p (gsi_stmt (gsi)))
+  return false;
+
+  return true;
+}
+
+/* Return true if we should ignore the basic block for purposes of tracing.  */
+bool
+ignore_bb_p (const_basic_block bb)
+{
+  if (bb->index < NUM_FIXED_BLOCKS)
+return true;
+  if (optimize_bb_for_size_p (bb))
+return true;
 
-  return false;
+  return !can_duplicate_bb_p (bb);
 }
 
 /* Return number of instructions in the block.  */


[PATCH][ftracer] Add caching of can_duplicate_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 9:05 AM, Tom de Vries wrote:
> Ack, updated the patch accordingly, and split it up in two bits, one
> that does refactoring, and one that adds the actual caching:
> - [ftracer] Factor out can_duplicate_bb_p
> - [ftracer] Add caching of can_duplicate_bb_p
> 
> I'll post these in reply to this email.

OK?

Thanks,
- Tom
[ftracer] Add caching of can_duplicate_bb_p

The fix "[omp, ftracer] Don't duplicate blocks in SIMT region" adds iteration
over insns in ignore_bb_p, which makes it more expensive.

Counteract this by piggybacking the computation of can_duplicate_bb_p onto
count_insns, which is called at the start of ftracer.
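
The idea can be shown on a toy model in plain C.  This is a sketch under
assumptions, not the tracer.c code: the block and statement types, the
fixed-size cache and all names are hypothetical, and the instruction count is
simply the number of statements rather than a size estimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds; only STMT_PLAIN may be duplicated.  */
enum stmt_kind { STMT_PLAIN, STMT_SIMT_ENTER_ALLOC, STMT_SIMT_EXIT,
                 STMT_SIMT_VOTE_ANY };

struct block {
    int index;
    const enum stmt_kind *stmts;
    size_t nstmts;
};

#define CACHE_BLOCKS 64

struct dup_cache {
    bool active;   /* false models a caller that never set up the cache */
    int size;      /* number of blocks that existed at setup time */
    bool can_dup[CACHE_BLOCKS];
};

static bool stmt_can_duplicate(enum stmt_kind k)
{
    return k == STMT_PLAIN;
}

/* Uncached answer: walks every statement of the block.  */
bool block_can_duplicate(const struct block *b)
{
    for (size_t i = 0; i < b->nstmts; i++)
        if (!stmt_can_duplicate(b->stmts[i]))
            return false;
    return true;
}

/* One walk over the block that both counts the statements and records
   whether the block can be duplicated.  */
size_t analyze_block(struct dup_cache *c, const struct block *b)
{
    assert(c->active && c->size <= CACHE_BLOCKS);
    assert(b->index >= 0 && b->index < c->size);

    bool can_dup = true;
    size_t n = 0;
    for (size_t i = 0; i < b->nstmts; i++) {
        n++;
        can_dup = can_dup && stmt_can_duplicate(b->stmts[i]);
    }
    c->can_dup[b->index] = can_dup;
    return n;
}

/* Cached answer.  Blocks added after setup are treated as not
   duplicatable; without a cache, fall back to the full walk.  */
bool cached_can_duplicate(const struct dup_cache *c, const struct block *b)
{
    if (!c->active)
        return block_can_duplicate(b);
    if (b->index >= 0 && b->index < c->size)
        return c->can_dup[b->index];
    return false;
}
```

After the initial walk, repeated queries become a single array lookup per
block, and a caller that never set up the cache still gets the uncached
answer.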

Bootstrapped and reg-tested on x86_64-linux.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (count_insns): Rename to ...
	(analyze_bb): ... this.
	(cache_can_duplicate_bb_p, cached_can_duplicate_bb_p): New functions.
	(ignore_bb_p): Use cached_can_duplicate_bb_p.
	(tail_duplicate): Call cache_can_duplicate_bb_p.

---
 gcc/tracer.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index c0e888f6b03..0f69b335b8c 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -53,7 +53,7 @@
 #include "fibonacci_heap.h"
 #include "tracer.h"
 
-static int count_insns (basic_block);
+static void analyze_bb (basic_block, int *);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -143,6 +143,33 @@ can_duplicate_bb_p (const_basic_block bb)
   return true;
 }
 
+static sbitmap can_duplicate_bb;
+
+/* Cache VAL as value of can_duplicate_bb_p for BB.  */
+static inline void
+cache_can_duplicate_bb_p (const_basic_block bb, bool val)
+{
+  if (val)
+bitmap_set_bit (can_duplicate_bb, bb->index);
+}
+
+/* Return cached value of can_duplicate_bb_p for BB.  */
+static bool
+cached_can_duplicate_bb_p (const_basic_block bb)
+{
+  if (can_duplicate_bb)
+{
+  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
+  if ((unsigned int)bb->index < size)
+	return bitmap_bit_p (can_duplicate_bb, bb->index);
+
+  /* Assume added bb's should not be duplicated.  */
+  return false;
+}
+
+  return can_duplicate_bb_p (bb);
+}
+
 /* Return true if we should ignore the basic block for purposes of tracing.  */
 bool
 ignore_bb_p (const_basic_block bb)
@@ -152,24 +179,27 @@ ignore_bb_p (const_basic_block bb)
   if (optimize_bb_for_size_p (bb))
 return true;
 
-  return !can_duplicate_bb_p (bb);
+  return !cached_can_duplicate_bb_p (bb);
 }
 
 /* Return number of instructions in the block.  */
 
-static int
-count_insns (basic_block bb)
+static void
+analyze_bb (basic_block bb, int *count)
 {
   gimple_stmt_iterator gsi;
   gimple *stmt;
   int n = 0;
+  bool can_duplicate = can_duplicate_bb_no_insn_iter_p (bb);
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   stmt = gsi_stmt (gsi);
   n += estimate_num_insns (stmt, &eni_size_weights);
+  can_duplicate = can_duplicate && can_duplicate_insn_p (stmt);
 }
-  return n;
+  *count = n;
+  cache_can_duplicate_bb_p (bb, can_duplicate);
 }
 
 /* Return true if E1 is more frequent than E2.  */
@@ -317,6 +347,8 @@ tail_duplicate (void)
  resize it.  */
   bb_seen = sbitmap_alloc (last_basic_block_for_fn (cfun) * 2);
   bitmap_clear (bb_seen);
+  can_duplicate_bb = sbitmap_alloc (last_basic_block_for_fn (cfun));
+  bitmap_clear (can_duplicate_bb);
   initialize_original_copy_tables ();
 
   if (profile_info && profile_status_for_fn (cfun) == PROFILE_READ)
@@ -330,7 +362,8 @@ tail_duplicate (void)
 
   FOR_EACH_BB_FN (bb, cfun)
 {
-  int n = count_insns (bb);
+  int n;
+  analyze_bb (bb, &n);
   if (!ignore_bb_p (bb))
 	blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), bb);
 
@@ -420,6 +453,8 @@ tail_duplicate (void)
 
   free_original_copy_tables ();
   sbitmap_free (bb_seen);
+  sbitmap_free (can_duplicate_bb);
+  can_duplicate_bb = NULL;
   free (trace);
   free (counts);
 


[committed][omp, ftracer] Remove incorrect suggestion in ignore_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]
On 10/5/20 10:51 AM, Alexander Monakov wrote:
>> + The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
>> + so the same holds there, but it could be argued that the
>> + IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
>> + in which case it could be duplicated.  */
> No, something like that cannot be argued, as VOTE_ANY may have data
> dependencies to storage that is deallocated by SIMT_EXIT. You seem to be
> claiming something that is simply not possible with the current design.
> 

Fixed in patch below, committed.

Thanks,
- Tom
[omp, ftracer] Remove incorrect suggestion in ignore_bb_p

In commit ab3f4b27abe "[omp, ftracer] Don't duplicate blocks in SIMT region" I
added a comment in ignore_bb_p suggesting a reordering of SIMT_VOTE_ANY and
SIMT_EXIT, which is not possible since VOTE_ANY may have data dependencies to
storage that is deallocated by SIMT_EXIT.

I've now opened a PR (PR97291) to describe the problem the reordering was
intended to fix.

Remove the incorrect suggestion.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (ignore_bb_p): Remove incorrect suggestion.

---
 gcc/tracer.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 5e51752d89f..5ee66511f8d 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -115,10 +115,8 @@ ignore_bb_p (const_basic_block bb)
 
   /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
-	 so the same holds there, but it could be argued that the
-	 IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
-	 in which case it could be duplicated.  */
+	 The IFN_GOMP_SIMT_VOTE_ANY is part of such a group, so the same holds
+	 there.  */
   if (is_gimple_call (g)
 	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
 	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-05 Thread Tom de Vries
On 2/7/20 4:29 PM, Jakub Jelinek wrote:
> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>> * {target-32.c, thread-limit-2.c}:
>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
> 
> Please don't, I want to deal with that using declare variant, just didn't
> get yet around to finishing the last patch needed for that.  Will try next
> week.
> 

Hi Jakub,

Ping, any update on this?

Thanks,
- Tom


[committed][omp, ftracer] Ignore IFN_GOMP_SIMT_XCHG_* in ignore_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 10:51 AM, Alexander Monakov wrote:
>> +  if (is_gimple_call (g)
>> +  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
>> +  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
>> +  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
> 
> Hm? So you are leaving SIMT_XCHG_* be until the next testcase breaks?
> 

Fixed in patch below, committed.

Thanks,
- Tom
[omp, ftracer] Ignore IFN_GOMP_SIMT_XCHG_* in ignore_bb_p

As IFN_GOMP_SIMT_XCHG_* are part of the group marked by
IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT, handle them conservatively
in ignore_bb_p.

Built on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_XCHG_*.

---
 gcc/tracer.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 5ee66511f8d..7f32ccb7e21 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -115,12 +115,14 @@ ignore_bb_p (const_basic_block bb)
 
   /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY is part of such a group, so the same holds
-	 there.  */
+	 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
+	 group, so the same holds there.  */
   if (is_gimple_call (g)
 	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
 	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
 	return true;
 }
 


[PATCH][omp, ftracer] Improve comment in ignore_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 10:51 AM, Alexander Monakov wrote:
> On Mon, 5 Oct 2020, Tom de Vries wrote:
> 
>> I've had to modify this patch in two ways:
>> - the original test-case stopped failing, though not the
>>   minimized one, so I added that one as a test-case
>> - only testing for ENTER_ALLOC and EXIT, and not explicitly for VOTE_ANY
>>   in ignore_bb_p also stopped working, so I've added that now.
>>
>> Re-tested and committed.
> 
> I don't understand, was the patch already approved somewhere?

Not explicitly, no.  But it was sent two weeks ago, and all the review
comments were related to compile-time slow-down, which I deal with in
separate patches.  So, based on that and the fact that this fixes a
problem for nvptx offloading currently triggering in the libgomp
test-suite, I decided to commit.

> It has some
> issues.
> 

OK, thanks for the review.

>> --- a/gcc/tracer.c
>> +++ b/gcc/tracer.c
>> @@ -108,6 +108,24 @@ ignore_bb_p (const_basic_block bb)
>>  return true;
>>  }
>>  
>> +  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
>> +   !gsi_end_p (gsi); gsi_next (&gsi))
>> +{
>> +  gimple *g = gsi_stmt (gsi);
>> +
>> +  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
>> + duplicated as part of its group, or not at all.
> 
> What does "its group" stand for? It seems obviously copy-pasted from the
> description of IFN_UNIQUE treatment, where it is even less clear what the
> "group" is.
> 
> (I know what it means, but the comment is not explaining things well at all)
> 
>> + The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
>> + so the same holds there, but it could be argued that the
>> + IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
>> + in which case it could be duplicated.  */
> 

How about this?

Thanks,
- Tom
[omp, ftracer] Improve comment in ignore_bb_p

Improve comment in ignore_bb_p related to the group marked by
IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (ignore_bb_p): Improve comment.

---
 gcc/tracer.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 7f32ccb7e21..cdda535ce9d 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -112,11 +112,15 @@ ignore_bb_p (const_basic_block bb)
!gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *g = gsi_stmt (gsi);
-
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
-	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
-	 group, so the same holds there.  */
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT pair marks a SESE
+	 region.  When entering the region, per-lane storage is allocated,
+	 and non-uniform execution is started.  When exiting the region,
+	 non-uniform execution is stopped and per-lane storage is deallocated.
+	 These calls can be duplicated safely, if the entire region is
+	 duplicated, otherwise not.  So, since we're not asked about such a
+	 region here, conservatively return false.
+	 The IFN_GOMP_SIMT_VOTE_ANY and SIMT_XCHG_* are part of such
+	 a region, so the same holds there.  */
   if (is_gimple_call (g)
 	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
 	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)


[PATCH][openacc] Fix acc declare for VLAs

2020-10-06 Thread Tom de Vries
Hi,

Consider test-case test.c, with VLA A:
...
int main (void) {
  int N = 1000;
  int A[N];
  #pragma acc declare copy(A)
  return 0;
}
...
compiled using:
...
$ gcc test.c -fopenacc -S -fdump-tree-all
...

At original, we have:
...
  #pragma acc declare map(tofrom:A);
...
but at gimple, we have a map (to:A.1), but not a map (from:A.1):
...
  int[0:D.2074] * A.1;

  {
int A[0:D.2074] [value-expr: *A.1];

saved_stack.2 = __builtin_stack_save ();
try
  {
A.1 = __builtin_alloca_with_align (D.2078, 32);
#pragma omp target oacc_declare map(to:(*A.1) [len: D.2076])
  }
finally
  {
__builtin_stack_restore (saved_stack.2);
  }
  }
...

This is caused by the following incompatibility.  When storing the desired
from clause in oacc_declare_returns, we use 'A.1' as the key:
...
10898 oacc_declare_returns->put (decl, c);
(gdb) call debug_generic_expr (decl)
A.1
(gdb) call debug_generic_expr (c)
map(from:(*A.1))
...
but when looking it up, we use 'A' as the key:
...
(gdb)
1471  tree *c = oacc_declare_returns->get (t);
(gdb) call debug_generic_expr (t)
A
...

Fix this by extracting the 'A.1' lookup key from 'A' using the decl-expr.

In addition, unshare the looked up value, to avoid running into
an "incorrect sharing of tree nodes" error.

Using these two fixes, we get our desired:
...
 finally
   {
+#pragma omp target oacc_declare map(from:(*A.1))
 __builtin_stack_restore (saved_stack.2);
   }
...
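
For illustration only (not part of the patch): a minimal sketch of the key
mismatch described above.  The struct and function names are hypothetical,
heavily simplified stand-ins for GCC's decl trees and for the
DECL_HAS_VALUE_EXPR_P / DECL_VALUE_EXPR / INDIRECT_REF handling in the patch.
It only models the idea that the lookup for 'A' has to be redirected to the
pointer decl 'A.1' that was used as the key when storing the clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-in for a GCC decl.  */
struct decl
{
  const char *name;
  /* For a VLA such as 'A' whose value-expr is '*A.1', this points to the
     decl for 'A.1'.  NULL for decls without such a value-expr.  */
  const struct decl *value_expr_base;
};

/* Return the decl to use as lookup key for T: the pointer decl behind the
   value-expr if there is one, otherwise T itself.  */
const struct decl *
lookup_key (const struct decl *t)
{
  assert (t != NULL);
  return t->value_expr_base != NULL ? t->value_expr_base : t;
}
```

With decls modelling 'A' and 'A.1', lookup_key on 'A' yields 'A.1', which
matches the key used at the put site; a decl without a value-expr is used
unchanged.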

Built on x86_64-linux with nvptx accelerator, tested libgomp.

OK for trunk?

Thanks,
- Tom

[openacc] Fix acc declare for VLAs

gcc/ChangeLog:

2020-10-06  Tom de Vries  

PR middle-end/90861
* gimplify.c (gimplify_bind_expr): Handle lookup in
oacc_declare_returns using key with decl-expr.

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

PR middle-end/90861
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Remove xfail.

---
 gcc/gimplify.c| 13 ++---
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c |  5 -
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..fa89e797940 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1468,15 +1468,22 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
 
  if (flag_openacc && oacc_declare_returns != NULL)
{
- tree *c = oacc_declare_returns->get (t);
+ tree key = t;
+ if (DECL_HAS_VALUE_EXPR_P (key))
+   {
+ key = DECL_VALUE_EXPR (key);
+ if (TREE_CODE (key) == INDIRECT_REF)
+   key = TREE_OPERAND (key, 0);
+   }
+ tree *c = oacc_declare_returns->get (key);
  if (c != NULL)
{
  if (ret_clauses)
OMP_CLAUSE_CHAIN (*c) = ret_clauses;
 
- ret_clauses = *c;
+ ret_clauses = unshare_expr (*c);
 
- oacc_declare_returns->remove (t);
+ oacc_declare_returns->remove (key);
 
  if (oacc_declare_returns->is_empty ())
{
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 0f51badca42..714935772c1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -59,8 +59,3 @@ main ()
 
   return 0;
 }
-
-
-/* { dg-xfail-run-if "TODO PR90861" { *-*-* } { "-DACC_MEM_SHARED=0" } }
-   This might XPASS if the compiler happens to put the two 'A' VLAs at the same
-   address.  */


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Tom de Vries
On 10/5/20 3:15 PM, Tom de Vries wrote:
> On 2/7/20 4:29 PM, Jakub Jelinek wrote:
>> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>>> * {target-32.c, thread-limit-2.c}:
>>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
>>
>> Please don't, I want to deal with that using declare variant, just didn't
>> get yet around to finishing the last patch needed for that.  Will try next 
>> week.
>>
> 
> Hi Jakub,
> 
> Ping, any update on this?

FWIW, I've tried as in patch attached below, but I didn't get it
compiling, I still got:
...
FAIL: libgomp.c/target-32.c (test for excess errors)
Excess errors:
unresolved symbol usleep
...

Jakub, is this already supposed to work?

Thanks,
- Tom
diff --git a/libgomp/testsuite/libgomp.c/target-32.c b/libgomp/testsuite/libgomp.c/target-32.c
index 233877b702b..7ddf8721ed3 100644
--- a/libgomp/testsuite/libgomp.c/target-32.c
+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -1,6 +1,26 @@
 #include 
 #include 
 
+extern void base_delay(int);
+extern void nvptx_delay(int);
+
+#pragma omp declare variant( nvptx_delay ) match( construct={target}, implementation={vendor(nvidia)} )
+void base_delay(int d)
+{
+  usleep (d);
+}
+
+void nvptx_delay(int d)
+{
+  /* This function serves as a replacement for usleep in
+ this test case. It does not even attempt to be functionally
+ equivalent  - we just want some sort of delay. */
+  int i;
+  int N = d * 2000;
+  for (i = 0; i < N; i++)
+asm volatile ("" : : : "memory");
+}
+
 int main ()
 {
   int a = 0, b = 0, c = 0, d[7];
@@ -18,28 +38,28 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   b |= 4;
 }
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   b |= 1;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   c |= 8;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   c |= 2;
 }


Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-06 Thread Tom de Vries
On 10/6/20 5:02 PM, Jakub Jelinek wrote:
> On Tue, Oct 06, 2020 at 04:48:40PM +0200, Tom de Vries wrote:
>> On 10/5/20 3:15 PM, Tom de Vries wrote:
>>> On 2/7/20 4:29 PM, Jakub Jelinek wrote:
>>>> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>>>>> * {target-32.c, thread-limit-2.c}:
>>>>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
>>>>
>>>> Please don't, I want to deal with that using declare variant, just didn't
>>>> get yet around to finishing the last patch needed for that.  Will try next 
>>>> week.
>>>>
>>>
>>> Hi Jakub,
>>>
>>> Ping, any update on this?
> 
> Not finished the last step, I run into LTO issues.  Will need to return to
> that soon.
> Last progress in "[RFH] LTO cgraph support for late declare variant 
> resolution"
> mail from May on gcc-patches.
> 

Ack, thanks for the update.

>> --- a/libgomp/testsuite/libgomp.c/target-32.c
>> +++ b/libgomp/testsuite/libgomp.c/target-32.c
>> @@ -1,6 +1,26 @@
>>  #include 
>>  #include 
>>  
>> +extern void base_delay(int);
> 
> No need to declare this one early.
> 
>> +extern void nvptx_delay(int);
> 
> Space before (, and the definition could go here instead of
> the declaration.
> 
>> +#pragma omp declare variant( nvptx_delay ) match( construct={target}, 
>> implementation={vendor(nvidia)} )
> 
> This isn't the right declare variant for what we want though,
> we only provide gnu as accepted vendor, it is implementation's vendor,
> not vendor of one of the hw components.
> So, it ought to be instead
> #pragma omp declare variant (nvptx_delay) 
> match(construct={target},device={arch(nvptx)})
> 
>> +void base_delay(int d)
>> +{
>> +  usleep (d);
>> +}

I've updated the patch accordingly.

FWIW, I now run into an ICE which looks like PR96680:
...
lto1: internal compiler error: in lto_fixup_prevailing_decls, at
lto/lto-common.c:2595
0x93afcd lto_fixup_prevailing_decls
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2595
0x93b1d6 lto_fixup_decls
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2645
0x93bcc4 read_cgraph_and_symbols(unsigned int, char const**)
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto-common.c:2897
0x910358 lto_main()
/home/vries/oacc/trunk/source-gcc/gcc/lto/lto.c:625
...

Thanks,
- Tom
diff --git a/libgomp/testsuite/libgomp.c/target-32.c b/libgomp/testsuite/libgomp.c/target-32.c
index 233877b702b..b8deae72b08 100644
--- a/libgomp/testsuite/libgomp.c/target-32.c
+++ b/libgomp/testsuite/libgomp.c/target-32.c
@@ -1,6 +1,25 @@
 #include 
 #include 
 
+void
+nvptx_delay (int d)
+{
+  /* This function serves as a replacement for usleep in
+ this test case.  It does not even attempt to be functionally
+ equivalent  - we just want some sort of delay. */
+  int i;
+  int N = d * 2000;
+  for (i = 0; i < N; i++)
+asm volatile ("" : : : "memory");
+}
+
+#pragma omp declare variant (nvptx_delay) match(construct={target},device={arch(nvptx)})
+void
+base_delay(int d)
+{
+  usleep (d);
+}
+
 int main ()
 {
   int a = 0, b = 0, c = 0, d[7];
@@ -18,28 +37,28 @@ int main ()
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[3])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   b |= 4;
 }
 
 #pragma omp target nowait map(alloc: b) depend(in: d[2]) depend(out: d[4])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   b |= 1;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[5])
 {
-  usleep (5000);
+  base_delay (5000);
   #pragma omp atomic update
   c |= 8;
 }
 
 #pragma omp target nowait map(alloc: c) depend(in: d[3], d[4]) depend(out: d[6])
 {
-  usleep (1000);
+  base_delay (1000);
   #pragma omp atomic update
   c |= 2;
 }


[PATCH][openacc, libgomp, testsuite] Xfail declare-5.f90

2020-10-06 Thread Tom de Vries
Hi,

We're currently running into:
...
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O1  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -O3 -g  execution test
FAIL: libgomp.oacc-fortran/declare-5.f90 -DACC_DEVICE_TYPE_nvidia=1 
-DACC_MEM_SHARED=0 -foffload=nvptx-none  -Os  execution test
...

A PR was filed for this: PR92790 - "[OpenACC] declare device_resident -
Fortran common blocks not handled / libgomp.oacc-fortran/declare-5.f90 fails"

Xfail the fails.

Tested on x86_64-linux with nvptx accelerator.

OK for trunk?

Thanks,
- Tom

[openacc, libgomp, testsuite] Xfail declare-5.f90

libgomp/ChangeLog:

2020-10-06  Tom de Vries  

* testsuite/libgomp.oacc-fortran/declare-5.f90: Add xfail for PR92790.

---
 libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index 2fd25d611a9..ab434f7f127 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-xfail-run-if "PR92790 - acc declare device_resident - Fortran common 
blocks not handled" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_host=1" } }
 
 module vars
   implicit none


[PATCH][tree-ssa-loop-ch] Add missing NULL test for dump_file

2020-10-06 Thread Tom de Vries
Hi,

If we change gimple_can_duplicate_bb_p to return false instead of true, we run
into a segfault in ch_base::copy_headers due to using dump_file while it's
NULL:
...
  if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
true))
   {
 fprintf (dump_file, "Duplication failed.\n");
 continue;
   }
...

Fix this by adding the missing dump_file != NULL test.
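
As a sketch of the guard (standalone C, not the GCC code itself):
SKETCH_TDF_DETAILS and dump_details are hypothetical names, and the flag
value is an arbitrary placeholder rather than GCC's actual TDF_DETAILS
value.  The point is only that the stream is checked before it is used.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical placeholder for a "detailed dump" flag bit.  */
#define SKETCH_TDF_DETAILS (1u << 3)

/* Write MSG to DUMP only if a dump stream is open and detailed dumping
   was requested in FLAGS.  Return 1 if something was written, 0 if not.  */
int
dump_details (FILE *dump, unsigned flags, const char *msg)
{
  assert (msg != NULL);
  if (dump != NULL && (flags & SKETCH_TDF_DETAILS) != 0)
    {
      fputs (msg, dump);
      return 1;
    }
  return 0;
}
```

Calling dump_details with a NULL stream is then harmless, which is the
property the unguarded fprintf lacked.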

Tested by rebuilding lto1 and rerunning the failing test-case.

[ Not committing this as obvious because I'm not sure about the
(dump_flags & TDF_DETAILS) bit. ]

OK for trunk?

Thanks,
- Tom

[tree-ssa-loop-ch] Add missing NULL test for dump_file

gcc/ChangeLog:

2020-10-07  Tom de Vries  

* tree-ssa-loop-ch.c (ch_base::copy_headers): Add missing NULL test
for dump_file.

---
 gcc/tree-ssa-loop-ch.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index b86acf7c39d..1f3d9321a55 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -425,7 +425,8 @@ ch_base::copy_headers (function *fun)
   if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
 true))
{
- fprintf (dump_file, "Duplication failed.\n");
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Duplication failed.\n");
  continue;
}
   copied.safe_push (std::make_pair (entry, loop));


[committed][libgomp, nvptx] Report launch dimensions in GOMP_OFFLOAD_run

2020-10-08 Thread Tom de Vries
Hi,

Using this patch, when using GOMP_DEBUG=1 and launching a kernel in
GOMP_OFFLOAD_run (used by the omp implementation), we see the kernel launch
dimensions:
...
  GOMP_OFFLOAD_run: kernel main$_omp_fn$0: \
launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 1), 1]
...
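
A standalone sketch of how such a line is put together, assuming the format
string from the patch below.  format_launch is a hypothetical helper that
writes into a buffer with snprintf; the plugin itself passes the same
format to GOMP_PLUGIN_debug instead.

```c
#include <assert.h>
#include <stdio.h>

/* Format the launch-dimension message for kernel FN_NAME, as reported by
   CALLER, into BUF of SIZE bytes.  The lane count is fixed at 32, as in
   the message above.  Return the snprintf result.  */
int
format_launch (char *buf, size_t size, const char *caller,
	       const char *fn_name, unsigned teams, unsigned threads)
{
  assert (buf != NULL && caller != NULL && fn_name != NULL);
  return snprintf (buf, size,
		   "  %s: kernel %s: launch"
		   " [(teams: %u), 1, 1] [(lanes: 32), (threads: %u), 1]\n",
		   caller, fn_name, teams, threads);
}
```

The first bracket mirrors the grid dimensions and the second the block
dimensions passed to cuLaunchKernel.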

Built on x86_64-linux with nvptx accelerator, tested libgomp.

Committed to trunk.

Thanks,
- Tom

[libgomp, nvptx] Report launch dimensions in GOMP_OFFLOAD_run

libgomp/ChangeLog:

2020-10-08  Tom de Vries  

PR libgomp/81802
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_run): Report launch
dimensions.

---
 libgomp/plugin/plugin-nvptx.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index a63dd1a99fb..11d4ceeae62 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1891,7 +1891,11 @@ nvptx_stacks_free (void *p, int num)
 void
 GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
 {
-  CUfunction function = ((struct targ_fn_descriptor *) tgt_fn)->fn;
+  struct targ_fn_descriptor *tgt_fn_desc
+= (struct targ_fn_descriptor *) tgt_fn;
+  CUfunction function = tgt_fn_desc->fn;
+  const struct targ_fn_launch *launch = tgt_fn_desc->launch;
+  const char *fn_name = launch->fn;
   CUresult r;
   struct ptx_device *ptx_dev = ptx_devices[ord];
   const char *maybe_abort_msg = "(perhaps abort was called)";
@@ -1926,6 +1930,9 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, 
void **args)
 CU_LAUNCH_PARAM_BUFFER_SIZE, &fn_args_size,
 CU_LAUNCH_PARAM_END
   };
+  GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
+" [(teams: %u), 1, 1] [(lanes: 32), (threads: %u), 1]\n",
+__FUNCTION__, fn_name, teams, threads);
   r = CUDA_CALL_NOCHECK (cuLaunchKernel, function, teams, 1, 1,
 32, threads, 1, 0, NULL, NULL, config);
   if (r != CUDA_SUCCESS)


Re: [committed][nvptx] Split up function ref plus const

2020-10-09 Thread Tom de Vries
On 10/9/20 11:03 AM, Thomas Schwinge wrote:
> Hi Tom!
> 
> On 2020-09-23T22:46:34+0200, Tom de Vries  wrote:
>> With test-case gcc.c-torture/compile/pr92231.c, we run into:
> 
> "Interesting" testcase...  ;-)
> 
>> ...
>> nvptx-as: ptxas terminated with signal 11 [Segmentation fault], core dumped
> 
> Confirmed with:
> 
> $ ptxas --version
> ptxas: NVIDIA (R) Ptx optimizing assembler
> Copyright (c) 2005-2014 NVIDIA Corporation
> Built on Thu_Jul_17_21:41:15_CDT_2014
> Cuda compilation tools, release 6.5, V6.5.12
> 
> ..., and:
> 
> $ ptxas --version
> ptxas: NVIDIA (R) Ptx optimizing assembler
> Copyright (c) 2005-2017 NVIDIA Corporation
> Built on Fri_Dec__1_00:57:38_CST_2017
> Cuda compilation tools, release 9.1, V9.1.108
> 
> Have you reported this to Nvidia?
> 

No, it's on my list to report, but I'm locked out from my nvidia account
for a while now, and nvidia remains unresponsive.

>> compiler exited with status 1
>> FAIL: gcc.c-torture/compile/pr92231.c   -O0  (test for excess errors)
>> ...
>> due to using a function reference plus constant as operand:
>> ...
>>   mov.u64 %r24,bar+4096';
>> ...
>>
>> Fix this by splitting such an insn into:
>> ...
>>   mov.u64 %r24,bar';
>>   add.u64 %r24,%r24,4096';
>> ...
> 
> (Spurious single-quote characters in PTX code?)
> 

Yeah, I think that was a copy-paste thing, the compiler DDRT.

Thanks,
- Tom


[committed][nvptx] Set -misa=sm_35 by default

2020-10-09 Thread Tom de Vries
Hi,

The nvptx-as assembler verifies the ptx code using ptxas, if there's any
in the PATH.

The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
latest cuda release (11.1) no longer supports sm_30.

Consequently we cannot build gcc against that release (although we should
still be able to build without any cuda release).

Fix this by setting -misa=sm_35 by default.

Tested check-gcc on nvptx.

Tested libgomp on x86_64-linux with nvptx accelerator.

Both built against cuda 9.1.

Committed to trunk.

Thanks,
- Tom

[nvptx] Set -misa=sm_35 by default

gcc/ChangeLog:

2020-10-09  Tom de Vries  

PR target/97348
* config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
default is used.
* config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.

---
 gcc/config/nvptx/nvptx.h   | 5 -
 gcc/config/nvptx/nvptx.opt | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 6ebcc760771..17fe157058c 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,7 +29,10 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-#define ASM_SPEC "%{misa=*:-m %*}"
+/* Default needs to be in sync with default for misa in nvptx.opt.
+   We add a default here to work around a hard-coded sm_30 default in
+   nvptx-as.  */
+#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
 
 #define TARGET_CPU_CPP_BUILTINS()  \
   do   \
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 75c3d54864e..d6910a96cf0 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -59,6 +59,7 @@ Enum(ptx_isa) String(sm_30) Value(PTX_ISA_SM30)
 EnumValue
 Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 
+; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
 misa=
-Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
Init(PTX_ISA_SM30)
+Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.


Re: [committed][nvptx] Set -misa=sm_35 by default

2020-10-09 Thread Tom de Vries
On 10/9/20 2:19 PM, Tobias Burnus wrote:
> Hi,
> 
> On 10/9/20 1:56 PM, Tom de Vries wrote:
>> The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas
>> of the
>> latest cuda release (11.1) no longer supports sm_30.
> 
> Interestingly, at
> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes__ptx-release-history
> 
> they still claim to support everything down to sm_10. (They talk about
> supported targets, but still.)
>

Hi,

ha, funny.  Well, ptxas is pretty convinced it doesn't want to support it ;)

>> Fix this by setting -misa=sm_35 by default.
> 
> Can you update the release notes?
> 
> The other question is whether and, if so, how we want to add support for
> newer PTX ISA versions than 3.1 = CUDA 5.0. In terms of PTX ISA itself,
> moving to 6.3 would be a great step forward but requires at least CUDA
> 10. Hence, it could either be a bump of the minimal CUDA version or
> some way to specify the PTX version. Thoughts?

A PR is open for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96005.

FWIW, I'm not planning to work on this short term, my plate is pretty full.

Thanks,
- Tom



[RFC][gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p

2020-10-09 Thread Tom de Vries
Hi,

The function gimple_can_duplicate_bb_p currently always returns true.

The presence of can_duplicate_bb_p in tracer.c however suggests that
there are cases when bb's indeed cannot be duplicated.

Move the implementation of can_duplicate_bb_p to gimple_can_duplicate_bb_p.

Bootstrapped and reg-tested on x86_64-linux.

Built on x86_64-linux with nvptx accelerator and tested libgomp.

No issues found.

As corner-case check, bootstrapped and reg-tested a patch that makes
gimple_can_duplicate_bb_p always return false, resulting in
PR97333 - "[gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate]
ICE in duplicate_block, at cfghooks.c:1093".

Any comments?

Thanks,
- Tom

[gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p

gcc/ChangeLog:

2020-10-09  Tom de Vries  

* tracer.c (cached_can_duplicate_bb_p): Use can_duplicate_block_p
instead of can_duplicate_bb_p.
(can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p): Move ...
* tree-cfg.c: ... here.
* tracer.c (can_duplicate_bb_p): Move ...
* tree-cfg.c (gimple_can_duplicate_bb_p): here.
* tree-cfg.h (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p):
Declare.

---
 gcc/tracer.c   | 61 +-
 gcc/tree-cfg.c | 54 ++-
 gcc/tree-cfg.h |  2 ++
 3 files changed, 56 insertions(+), 61 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index e1c2b9527e5..16b46c65b14 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -84,65 +84,6 @@ bb_seen_p (basic_block bb)
   return bitmap_bit_p (bb_seen, bb->index);
 }
 
-/* Return true if gimple stmt G can be duplicated.  */
-static bool
-can_duplicate_insn_p (gimple *g)
-{
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
- duplicated as part of its group, or not at all.
- The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
- group, so the same holds there.  */
-  if (is_gimple_call (g)
-  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
-return false;
-
-  return true;
-}
-
-/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
-static bool
-can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
-{
-  if (bb->index < NUM_FIXED_BLOCKS)
-return false;
-
-  if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
-{
-  /* A transaction is a single entry multiple exit region.  It
-must be duplicated in its entirety or not at all.  */
-  if (gimple_code (g) == GIMPLE_TRANSACTION)
-   return false;
-
-  /* An IFN_UNIQUE call must be duplicated as part of its group,
-or not at all.  */
-  if (is_gimple_call (g)
- && gimple_call_internal_p (g)
- && gimple_call_internal_unique_p (g))
-   return false;
-}
-
-  return true;
-}
-
-/* Return true if BB can be duplicated.  */
-static bool
-can_duplicate_bb_p (const_basic_block bb)
-{
-  if (!can_duplicate_bb_no_insn_iter_p (bb))
-return false;
-
-  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
-   !gsi_end_p (gsi); gsi_next (&gsi))
-if (!can_duplicate_insn_p (gsi_stmt (gsi)))
-  return false;
-
-  return true;
-}
-
 static sbitmap can_duplicate_bb;
 
 /* Cache VAL as value of can_duplicate_bb_p for BB.  */
@@ -167,7 +108,7 @@ cached_can_duplicate_bb_p (const_basic_block bb)
   return false;
 }
 
-  return can_duplicate_bb_p (bb);
+  return can_duplicate_block_p (bb);
 }
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5caf3b62d69..a5677859ffc 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6208,11 +6208,63 @@ gimple_split_block_before_cond_jump (basic_block bb)
 }
 
 
+/* Return true if gimple stmt G can be duplicated.  */
+bool
+can_duplicate_insn_p (gimple *g)
+{
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+ duplicated as part of its group, or not at all.
+ The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
+ group, so the same holds there.  */
+  if (is_gimple_call (g)
+  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
+return false;
+
+  return true;
+}
+
+/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
+bool
+can_duplicate_bb_no_insn_iter_p (const_basic_block b

[committed][nvptx] Factor out write_fn_proto_1

2020-10-10 Thread Tom de Vries
Hi,

Factor out write_fn_proto_1 from write_fn_proto.

Tested check-gcc on nvptx.

Tested libgomp on x86_64-linux with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Factor out write_fn_proto_1

gcc/ChangeLog:

2020-10-10  Tom de Vries  

* config/nvptx/nvptx.c (write_fn_proto_1): New function, factored out
of ...
(write_fn_proto): ... here.  Return void.

---
 gcc/config/nvptx/nvptx.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index afac1bda45d..0c1d6d112ec 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -827,26 +827,12 @@ write_var_marker (FILE *file, bool is_defn, bool 
globalize, const char *name)
   fputs ("\n", file);
 }
 
-/* Write a .func or .kernel declaration or definition along with
-   a helper comment for use by ld.  S is the stream to write to, DECL
-   the decl for the function with name NAME.   For definitions, emit
-   a declaration too.  */
+/* Helper function for write_fn_proto.  */
 
-static const char *
-write_fn_proto (std::stringstream &s, bool is_defn,
-   const char *name, const_tree decl)
+static void
+write_fn_proto_1 (std::stringstream &s, bool is_defn,
+ const char *name, const_tree decl)
 {
-  if (is_defn)
-/* Emit a declaration. The PTX assembler gets upset without it.   */
-name = write_fn_proto (s, false, name, decl);
-  else
-{
-  /* Avoid repeating the name replacement.  */
-  name = nvptx_name_replacement (name);
-  if (name[0] == '*')
-   name++;
-}
-
   write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
 
   /* PTX declaration.  */
@@ -929,8 +915,26 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 s << ")";
 
   s << (is_defn ? "\n" : ";\n");
+}
 
-  return name;
+/* Write a .func or .kernel declaration or definition along with
+   a helper comment for use by ld.  S is the stream to write to, DECL
+   the decl for the function with name NAME.  For definitions, emit
+   a declaration too.  */
+
+static void
+write_fn_proto (std::stringstream &s, bool is_defn,
+   const char *name, const_tree decl)
+{
+  name = nvptx_name_replacement (name);
+  if (name[0] == '*')
+name++;
+
+  if (is_defn)
+/* Emit a declaration.  The PTX assembler gets upset without it.  */
+write_fn_proto_1 (s, false, name, decl);
+
+  write_fn_proto_1 (s, is_defn, name, decl);
 }
 
 /* Construct a function declaration from a call insn.  This can be


[committed][nvptx] Replace dots in function names

2020-10-10 Thread Tom de Vries
Hi,

When function splitting clones a function sinf in the host compiler, the clone
is called sinf.part.0.  However, ptx does not allow dots in identifiers, so
we run into:
...
ptxas test.o, line 23; fatal   : Parsing error near '.part': syntax error
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
...

Rename such functions by replacing the dots with dollar signs.
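
A standalone sketch of the renaming, for illustration.  replace_dot is a
hypothetical stand-in for the nvptx_replace_dot function added in the patch
below; it uses plain malloc/abort where the patch uses xstrdup, and the
caller frees the result with free rather than XDELETE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return NULL if NAME contains no dot.  Otherwise return a newly
   allocated copy of NAME with each '.' replaced by '$'.  The caller
   owns the returned string.  */
char *
replace_dot (const char *name)
{
  assert (name != NULL);
  if (strchr (name, '.') == NULL)
    return NULL;

  size_t len = strlen (name);
  char *p = malloc (len + 1);
  if (p == NULL)
    abort ();

  /* Copy including the terminating NUL, substituting dots on the way.  */
  for (size_t i = 0; i <= len; ++i)
    p[i] = name[i] == '.' ? '$' : name[i];
  return p;
}
```

Returning NULL for names without a dot lets the caller keep using the
original string and skip the free in the common case.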

Tested check-gcc on nvptx.

Tested libgomp on x86_64-linux with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Replace dots in function names

gcc/ChangeLog:

2020-10-10  Tom de Vries  

PR target/97318
* config/nvptx/nvptx.c (nvptx_replace_dot): New function.
(write_fn_proto, write_fn_proto_from_insn, nvptx_output_call_insn):
Use nvptx_replace_dot.

---
 gcc/config/nvptx/nvptx.c | 57 +---
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 0c1d6d112ec..17349475fff 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -368,6 +368,22 @@ nvptx_name_replacement (const char *name)
   return name;
 }
 
+/* Return NULL if NAME contains no dot.  Otherwise return a copy of NAME
+   with the dots replaced with dollar signs.  */
+
+static char *
+nvptx_replace_dot (const char *name)
+{
+  if (strchr (name, '.') == NULL)
+return NULL;
+
+  char *p = xstrdup (name);
+  for (size_t i = 0; i < strlen (p); ++i)
+if (p[i] == '.')
+  p[i] = '$';
+  return p;
+}
+
 /* If MODE should be treated as two registers of an inner mode, return
that inner mode.  Otherwise return VOIDmode.  */
 
@@ -926,7 +942,16 @@ static void
 write_fn_proto (std::stringstream &s, bool is_defn,
const char *name, const_tree decl)
 {
-  name = nvptx_name_replacement (name);
+  const char *replacement = nvptx_name_replacement (name);
+  char *replaced_dots = NULL;
+  if (replacement != name)
+name = replacement;
+  else
+{
+  replaced_dots = nvptx_replace_dot (name);
+  if (replaced_dots)
+   name = replaced_dots;
+}
   if (name[0] == '*')
 name++;
 
@@ -935,6 +960,9 @@ write_fn_proto (std::stringstream &s, bool is_defn,
 write_fn_proto_1 (s, false, name, decl);
 
   write_fn_proto_1 (s, is_defn, name, decl);
+
+  if (replaced_dots)
+XDELETE (replaced_dots);
 }
 
 /* Construct a function declaration from a call insn.  This can be
@@ -946,6 +974,8 @@ static void
 write_fn_proto_from_insn (std::stringstream &s, const char *name,
  rtx result, rtx pat)
 {
+  char *replaced_dots = NULL;
+
   if (!name)
 {
   s << "\t.callprototype ";
@@ -953,7 +983,15 @@ write_fn_proto_from_insn (std::stringstream &s, const char 
*name,
     }
   else
     {
-      name = nvptx_name_replacement (name);
+      const char *replacement = nvptx_name_replacement (name);
+      if (replacement != name)
+        name = replacement;
+      else
+        {
+          replaced_dots = nvptx_replace_dot (name);
+          if (replaced_dots)
+            name = replaced_dots;
+        }
       write_fn_marker (s, false, true, name);
       s << "\t.extern .func ";
     }
@@ -962,6 +1000,8 @@ write_fn_proto_from_insn (std::stringstream &s, const char *name,
 write_return_mode (s, true, GET_MODE (result));
 
   s << name;
+  if (replaced_dots)
+    XDELETE (replaced_dots);
 
   int arg_end = XVECLEN (pat, 0);
   for (int i = 1; i < arg_end; i++)
@@ -2467,9 +2507,20 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
   
   if (decl)
     {
+      char *replaced_dots = NULL;
       const char *name = get_fnname_from_decl (decl);
-      name = nvptx_name_replacement (name);
+      const char *replacement = nvptx_name_replacement (name);
+      if (replacement != name)
+        name = replacement;
+      else
+        {
+          replaced_dots = nvptx_replace_dot (name);
+          if (replaced_dots)
+            name = replaced_dots;
+        }
       assemble_name (asm_out_file, name);
+      if (replaced_dots)
+        XDELETE (replaced_dots);
     }
   else
     output_address (VOIDmode, callee);
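
A standalone sketch of the dot replacement done by nvptx_replace_dot above,
for experimenting outside of GCC.  It assumes plain malloc/free in place of
GCC's xstrdup/XDELETE, and the name replace_dots is made up for this
illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return NULL if NAME contains no dot.  Otherwise return a malloc'ed copy
   of NAME with every '.' replaced by '$'.  The caller frees the result.
   (Hypothetical helper; the patch itself uses xstrdup and XDELETE.)  */
static char *
replace_dots (const char *name)
{
  assert (name != NULL);
  if (strchr (name, '.') == NULL)
    return NULL;

  size_t len = strlen (name);
  char *copy = malloc (len + 1);
  if (copy == NULL)
    abort ();
  memcpy (copy, name, len + 1);

  /* Walk the copy once instead of calling strlen on every iteration.  */
  for (char *p = copy; *p != '\0'; p++)
    if (*p == '.')
      *p = '$';
  return copy;
}
```

As in the patch, a NULL return means "no dots, keep using the original
string", so the caller only has to free when a copy was actually made.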


[committed][nvptx] Fix -msoft-stack-reserve-local format

2020-10-12 Thread Tom de Vries
Hi,

Currently, in order to use the switch -msoft-stack-reserve-local with the
default arg 128, you have to specify '-msoft-stack-reserve-local128'.

Fix the switch format such that you specify '-msoft-stack-reserve-local=128'
instead.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix -msoft-stack-reserve-local format

gcc/ChangeLog:

2020-10-12  Tom de Vries  

* config/nvptx/nvptx.opt (-msoft-stack-reserve-local): Rename to ...
(-msoft-stack-reserve-local=): ... this.

---
 gcc/config/nvptx/nvptx.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index d6910a96cf0..38454222d42 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -37,7 +37,7 @@ msoft-stack
 Target Report Mask(SOFT_STACK)
 Use custom stacks instead of local memory for automatic storage.
 
-msoft-stack-reserve-local
+msoft-stack-reserve-local=
 Target Report Joined RejectNegative UInteger Var(nvptx_softstack_size) Init(128)
 Specify size of .local memory used for stack when the exact amount is not known.
 


[PING][PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

2021-05-19 Thread Tom de Vries
On 4/23/21 6:48 PM, Tom de Vries wrote:
> On 4/23/21 5:45 PM, Alexander Monakov wrote:
>> On Thu, 22 Apr 2021, Tom de Vries wrote:
>>
>>> Ah, I see, agreed, that makes sense.  I was afraid there was some
>>> fundamental problem that I overlooked.
>>>
>>> Here's an updated version.  I've tried to make it clear that the
>>> futex_wait/wake are locally used versions, not generic functionality.
>> Could you please regenerate the patch passing appropriate flags to
>> 'git format-patch' so it presents a rewrite properly (see documentation
>> for --patience and --break-rewrites options). The attached patch was mostly
>> unreadable, I'm afraid.
> Sure.  I did notice that the patch was not readable, but I didn't know
> there were options to improve that, so thanks for pointing that out.
> 

Ping.  Any comments?

Thanks,
- Tom

> 0001-libgomp-nvptx-Fix-hang-in-gomp_team_barrier_wait_end.patch
> 
> From d3053a7ec7444b371ee29097a673e637b0d369d9 Mon Sep 17 00:00:00 2001
> From: Tom de Vries 
> Date: Tue, 20 Apr 2021 08:47:03 +0200
> Subject: [PATCH 1/4] [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
> 
> Consider the following omp fragment.
> ...
>   #pragma omp target
>   #pragma omp parallel num_threads (2)
>   #pragma omp task
> ;
> ...
> 
> This hangs at -O0 for nvptx.
> 
> Investigating the behaviour gives us the following trace of events:
> - both threads execute GOMP_task, where they:
>   - deposit a task, and
>   - execute gomp_team_barrier_wake
> - thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
>   proceeds to wait at the team barrier
> - thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
>   calls gomp_barrier_handle_tasks, where it:
>   - executes both tasks and marks the team barrier done
>   - executes a gomp_team_barrier_wake which wakes up thread 1
> - thread 1 exits the team barrier
> - thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
>   the team barrier.
> - thread 0 hangs.
> 
> To understand why there is a hang here, it's good to understand how things
> are set up for nvptx.  The libgomp/config/nvptx/bar.c implementation is
> a copy of the libgomp/config/linux/bar.c implementation, with uses of both
> futex_wake and do_wait replaced with uses of ptx insn bar.sync:
> ...
>   if (bar->total > 1)
> asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
> ...
> 
> The point where thread 0 goes to wait at the team barrier, corresponds in
> the linux implementation with a do_wait.  In the linux case, the call to
> do_wait doesn't hang, because it's waiting for bar->generation to become
> a certain value, and if bar->generation already has that value, it just
> proceeds, without any need for coordination with other threads.
> 
> In the nvptx case, the bar.sync waits until thread 1 joins it in the same
> logical barrier, which never happens: thread 1 is lingering in the
> thread pool at the thread pool barrier (using a different logical barrier),
> waiting to join a new team.
> 
> The easiest way to fix this is to revert to the posix implementation for
> bar.{c,h}.  That however falls back on a busy-waiting approach, and
> does not take advantage of the ptx bar.sync insn.
> 
> Instead, we revert to the linux implementation for bar.c,
> and implement bar.c local functions futex_wait and futex_wake using the
> bar.sync insn.
> 
> This is a WIP version that does not yet take performance into consideration,
> but instead focuses on copying a working version as completely as possible,
> and isolating the machine-specific changes to as few functions as
> possible.
> 
> The bar.sync insn takes an argument specifying how many threads are
> participating, and that doesn't play well with the futex syntax where it's
> not clear in advance how many threads will be woken up.
> 
> This is solved by waking up all waiting threads each time a futex_wait or
> futex_wake happens, and possibly going back to sleep with an updated thread
> count.
> 
> Tested libgomp on x86_64 with nvptx accelerator, both as-is and with
> do_spin hardcoded to 1.
> 
> libgomp/ChangeLog:
> 
> 2021-04-20  Tom de Vries  
> 
>   PR target/99555
>   * config/nvptx/bar.c (generation_to_barrier): New function, copied
>   from config/rtems/bar.c.
>   (futex_wait, futex_wake): New function.
>   (do_spin, do_wait): New function, copied from config/linux/wait.h.
>   (gomp_barrier_wait_end, gomp_barrier_wait_last)
>   (gomp_team_barrier_wake, gomp_team_barrier_wait_end):
>   (gomp_team_barrier_wait_canc
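
A toy, single-threaded model of the difference described in the quoted
analysis above: a futex-style wait blocks only while the generation still has
the value the caller last saw, whereas a counted barrier blocks until enough
participants have arrived.  The function names, and the reduction of each
primitive to a simple predicate, are assumptions made for illustration; this
is not the libgomp code.

```c
#include <assert.h>
#include <stdbool.h>

/* Generation-based wait, in the spirit of a futex wait: the caller blocks
   only while the generation still equals the value it last observed.  */
static bool
generation_wait_blocks (unsigned current_generation, unsigned seen_generation)
{
  return current_generation == seen_generation;
}

/* Counted barrier, in the spirit of "bar.sync id, count": the caller blocks
   until COUNT participants have arrived at the same logical barrier.  */
static bool
counted_barrier_blocks (unsigned arrived, unsigned count)
{
  assert (count > 0);
  return arrived < count;
}
```

With the generation already advanced from 0 to 1, generation_wait_blocks (1, 0)
is false, so the linux-style wait falls through.  counted_barrier_blocks (1, 2)
is true, which corresponds to thread 0 waiting for a thread 1 that never
arrives.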

Re: [PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

2021-05-20 Thread Tom de Vries
On 5/20/21 11:52 AM, Thomas Schwinge wrote:
> Hi Tom!
> 
> First, thanks for looking into this PR99555!
> 
> 
> I can't comment on the OpenMP/nvptx changes, so just the following:
> 
> On 2021-04-23T18:48:01+0200, Tom de Vries  wrote:
>> --- a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
>> +++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
>> @@ -1,6 +1,5 @@
>>  ! { dg-do run }
>>
>> -! { dg-additional-sources on_device_arch.c }
>>  ! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
> 
> Please remove the 'dg-prune-output', too.  ;-)
> 

Ack, updated patch.

> Your changes leave
> 'libgomp/testsuite/lib/libgomp.exp:check_effective_target_offload_device_nvptx',
> 'libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h',
> 'libgomp/testsuite/libgomp.fortran/on_device_arch.c' unused.  Should we
> keep those for a potential future use (given that they've been tested to
> work) or remove (as now unused, danger of bit-rot)?

I vote to leave them in, they look useful, and I think the danger of
bit-rot is less than the danger of not knowing/remembering that they
once were there and having to start from scratch.

Thanks,
- Tom


[PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-28 Thread Tom de Vries
Hi,

Improve nonnull attribute documentation in a number of ways:

Reorganize discussion of effects into:
- effects for calls to functions with nonnull-marked parameters, and
- effects for function definitions with nonnull-marked parameters.
This makes it clear that -fno-delete-null-pointer-checks has no effect for
optimizations based on nonnull-marked parameters in function definitions
(see PR100404).

Mention -Wnonnull-compare.

Mention workaround from PR100404 comment 7.

The workaround can be used for this scenario.  Say we have a test.c:
...
 #include 

 extern int isnull (char *ptr) __attribute__ ((nonnull));
 int isnull (char *ptr)
 {
   if (ptr == 0)
 return 1;
   return 0;
 }

 int
 main (void)
 {
   char *ptr = NULL;
   if (isnull (ptr)) __builtin_abort ();
   return 0;
 }
...

The test-case contains a mistake: ptr == NULL, and we want to detect the
mistake using an abort:
...
$ gcc test.c
$ ./a.out
Aborted (core dumped)
...

At -O2 however, the mistake is not detected:
...
$ gcc test.c -O2
$ ./a.out
...
which is what -Wnonnull-compare (not shown here) warns about.

The easiest way to fix this is by dropping the nonnull attribute.  But that
also disables -Wnonnull, which would detect something like:
...
  if (isnull (NULL)) __builtin_abort ();
...
at compile time.

Using this workaround:
...
 int isnull (char *ptr)
 {
+  asm ("" : "+r"(ptr));
   if (ptr == 0)
 return 1;
   return 0;
 }
...
we still manage to detect the problem at runtime with -O2:
...
$ ~/gcc_versions/devel/install/bin/gcc test.c -O2
$ ./a.out
Aborted (core dumped)
...
while keeping the possibility to detect "isnull (NULL)" at compile time.

OK for trunk?

Thanks,
- Tom

[gcc/doc] Improve nonnull attribute documentation

gcc/ChangeLog:

2021-07-28  Tom de Vries  

* doc/extend.texi (nonnull attribute): Improve documentation.

---
 gcc/doc/extend.texi | 51 ---
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b83cd4919bb..3389effd70c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3488,17 +3488,46 @@ my_memcpy (void *dest, const void *src, size_t len)
 @end smallexample
 
 @noindent
-causes the compiler to check that, in calls to @code{my_memcpy},
-arguments @var{dest} and @var{src} are non-null.  If the compiler
-determines that a null pointer is passed in an argument slot marked
-as non-null, and the @option{-Wnonnull} option is enabled, a warning
-is issued.  @xref{Warning Options}.  Unless disabled by
-the @option{-fno-delete-null-pointer-checks} option the compiler may
-also perform optimizations based on the knowledge that certain function
-arguments cannot be null. In addition,
-the @option{-fisolate-erroneous-paths-attribute} option can be specified
-to have GCC transform calls with null arguments to non-null functions
-into traps. @xref{Optimize Options}.
+informs the compiler that, in calls to @code{my_memcpy}, arguments
+@var{dest} and @var{src} must be non-null.
+
+The attribute has effect both for functions calls and function definitions.
+
+For function calls:
+@itemize @bullet
+@item If the compiler determines that a null pointer is
+passed in an argument slot marked as non-null, and the
+@option{-Wnonnull} option is enabled, a warning is issued.
+@xref{Warning Options}.
+@item The @option{-fisolate-erroneous-paths-attribute} option can be
+specified to have GCC transform calls with null arguments to non-null
+functions into traps.  @xref{Optimize Options}.
+@item The compiler may also perform optimizations based on the
+knowledge that certain function arguments cannot be null.  These
+optimizations can be disabled by the
+@option{-fno-delete-null-pointer-checks} option. @xref{Optimize Options}.
+@end itemize
+
+For function definitions:
+@itemize @bullet
+@item If the compiler determines that a function parameter that is
+marked with non-null is compared with null, and
+@option{-Wnonnull-compare} option is enabled, a warning is issued.
+@xref{Warning Options}.
+@item The compiler may also perform optimizations based on the
+knowledge that certain function parameters cannot be null.  This can
+be disabled by hiding the nonnullness using an inline assembly statement:
+
+@smallexample
+extern int isnull (char *ptr) __attribute__((nonnull));
+int isnull (char *ptr) @{
+  asm ("" : "+r"(ptr));
+  if (ptr == 0)
+return 1;
+  return 0;
+@}
+@end smallexample
+@end itemize
 
 If no @var{arg-index} is given to the @code{nonnull} attribute,
 all pointer arguments are marked as non-null.  To illustrate, the


Re: [PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-30 Thread Tom de Vries
On 7/30/21 9:25 AM, Richard Biener wrote:
> On Wed, 28 Jul 2021, Tom de Vries wrote:
> 
>> Hi,
>>
>> Improve nonnull attribute documentation in a number of ways:
>>
>> Reorganize discussion of effects into:
>> - effects for calls to functions with nonnull-marked parameters, and
>> - effects for function definitions with nonnull-marked parameters.
>> This makes it clear that -fno-delete-null-pointer-checks has no effect for
>> optimizations based on nonnull-marked parameters in function definitions
>> (see PR100404).
>>
>> Mention -Wnonnull-compare.
>>
>> Mention workaround from PR100404 comment 7.
>>
>> The workaround can be used for this scenario.  Say we have a test.c:
>> ...
>>  #include 
>>
>>  extern int isnull (char *ptr) __attribute__ ((nonnull));
>>  int isnull (char *ptr)
>>  {
>>if (ptr == 0)
>>  return 1;
>>return 0;
>>  }
>>
>>  int
>>  main (void)
>>  {
>>char *ptr = NULL;
>>if (isnull (ptr)) __builtin_abort ();
>>return 0;
>>  }
>> ...
>>
>> The test-case contains a mistake: ptr == NULL, and we want to detect the
>> mistake using an abort:
>> ...
>> $ gcc test.c
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>>
>> At -O2 however, the mistake is not detected:
>> ...
>> $ gcc test.c -O2
>> $ ./a.out
>> ...
>> which is what -Wnonnull-compare (not shown here) warns about.
>>
>> The easiest way to fix this is by dropping the nonnull attribute.  But that
>> also disables -Wnonnull, which would detect something like:
>> ...
>>   if (isnull (NULL)) __builtin_abort ();
>> ...
>> at compile time.
>>
>> Using this workaround:
>> ...
>>  int isnull (char *ptr)
>>  {
>> +  asm ("" : "+r"(ptr));
>>if (ptr == 0)
>>  return 1;
>>return 0;
>>  }
>> ...
>> we still manage to detect the problem at runtime with -O2:
>> ...
>> $ ~/gcc_versions/devel/install/bin/gcc test.c -O2
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>> while keeping the possibility to detect "isnull (NULL)" at compile time.
>>
>> OK for trunk?
> 
> I think it's an improvement over the current situation but the
> inline-assembler suggestion to "fix" definition side optimizations
> is IMHO a hint that we need a better solution here.

Agreed.

[ If you want I can resubmit without that bit, if it's not acceptable. I
merely included it because it's the only solution I found advertised (in
bugzilla). ]

> Splitting
> the attribute into a caller and a callee side one for example,
> or making -fno-delete-null-pointer-checks do what it suggests.
> 

I think the problem is that in fact there are two attributes:
- assume_nonnull
- verify_nonnull (or, want_nonnull or some such)
which got conflated in the nonnull attribute.  [ Which still could be
fine if you got an -fnonnull=assume/verify to switch between the two
interpretations. ]

Anyway, so more concretely my idea is that this should generate a -Wnonnull:
...
extern void foo (void *ptr) __attribute__((verify_nonnull (1)));
void bar (void) {
  foo (nullptr);
}
...
and the same with assume_nonnull (after all, the assumption is violated).

And this assert could be optimized away (with -Wnonnull-compare):
...
extern void foo (void *ptr) __attribute__((assume_nonnull (1)));
void foo (void *ptr) {
  assert (ptr != nullptr);
}
...
while the assert shouldn't be optimized away with verify_nonnull.

And likewise, this if could be optimized away by
-fdelete-null-pointer-checks:
... 
extern void foo (void *ptr) __attribute__((assume_nonnull (1)));
void bar (void *ptr) {
  foo (ptr);
  if (ptr != nullptr)
{ ... }
}
...
while the if shouldn't be optimized away with verify_nonnull.

I think this way of splitting up the functionality conforms to the
principle of least surprise and does not require thinking about caller /
callee distinction, nor does it require extension of
-fno-delete-null-pointer-checks.

Thanks,
- Tom

> And as suggested elsewhere the effect of -fno-delete-null-pointer-checks
> making objects at NULL address valid should be a target hook based
> on the address-space with the default implementation considering
> only the default address-space having no objects at NULL.
> 



Re: [PATCH][gcc/doc] Improve nonnull attribute documentation

2021-07-30 Thread Tom de Vries
On 7/30/21 6:17 PM, Martin Sebor wrote:
> On 7/28/21 9:20 AM, Tom de Vries wrote:
>> Hi,
>>
>> Improve nonnull attribute documentation in a number of ways:
>>
>> Reorganize discussion of effects into:
>> - effects for calls to functions with nonnull-marked parameters, and
>> - effects for function definitions with nonnull-marked parameters.
>> This makes it clear that -fno-delete-null-pointer-checks has no effect
>> for
>> optimizations based on nonnull-marked parameters in function definitions
>> (see PR100404).
> 
> This resolves half of PR 101665 that I raised the other day (i.e.,
> updates the docs).  Thank you!

You're welcome :)

> Since PR 100404 was resolved as
> invalid,

Yeah, I can also live with reopening that one as documentation PR.

> can you please reference the other PR in the changelog?

Done.

> The other half (warning when attribute nonnull is specified along
> with attribute optimize "-fno-delete-null-pointer-checks") remains.
> I plan to look into it unless someone beats me to it or unless some
> other solution emerges.
> 

FWIW, in my reply to Richi here (
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576415.html ) I
proposed to split the existing nonnull attribute functionality into
assume_nonnull/verify_nonnull attributes.  I'm curious what you think of
that proposal.

> A few comments on the documentation changes below.
> 
>>
>> Mention -Wnonnull-compare.
>>
>> Mention workaround from PR100404 comment 7.
>>
>> The workaround can be used for this scenario.  Say we have a test.c:
>> ...
>>   #include 
>>
>>   extern int isnull (char *ptr) __attribute__ ((nonnull));
>>   int isnull (char *ptr)
>>   {
>>     if (ptr == 0)
>>   return 1;
>>     return 0;
>>   }
>>
>>   int
>>   main (void)
>>   {
>>     char *ptr = NULL;
>>     if (isnull (ptr)) __builtin_abort ();
>>     return 0;
>>   }
>> ...
>>
>> The test-case contains a mistake: ptr == NULL, and we want to detect the
>> mistake using an abort:
>> ...
>> $ gcc test.c
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>>
>> At -O2 however, the mistake is not detected:
>> ...
>> $ gcc test.c -O2
>> $ ./a.out
>> ...
>> which is what -Wnonnull-compare (not shown here) warns about.
>>
>> The easiest way to fix this is by dropping the nonnull attribute.  But
>> that
>> also disables -Wnonnull, which would detect something like:
>> ...
>>    if (isnull (NULL)) __builtin_abort ();
>> ...
>> at compile time.
>>
>> Using this workaround:
>> ...
>>   int isnull (char *ptr)
>>   {
>> +  asm ("" : "+r"(ptr));
>>     if (ptr == 0)
>>   return 1;
>>     return 0;
>>   }
>> ...
>> we still manage to detect the problem at runtime with -O2:
>> ...
>> $ ~/gcc_versions/devel/install/bin/gcc test.c -O2
>> $ ./a.out
>> Aborted (core dumped)
>> ...
>> while keeping the possibility to detect "isnull (NULL)" at compile time.
>>
>> OK for trunk?
>>
>> Thanks,
>> - Tom
>>
>> [gcc/doc] Improve nonnull attribute documentation
>>
>> gcc/ChangeLog:
>>
>> 2021-07-28  Tom de Vries  
>>
>> * doc/extend.texi (nonnull attribute): Improve documentation.
>>
>> ---
>>   gcc/doc/extend.texi | 51
>> ---
>>   1 file changed, 40 insertions(+), 11 deletions(-)
>>
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index b83cd4919bb..3389effd70c 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3488,17 +3488,46 @@ my_memcpy (void *dest, const void *src, size_t
>> len)
>>   @end smallexample
>>     @noindent
>> -causes the compiler to check that, in calls to @code{my_memcpy},
>> -arguments @var{dest} and @var{src} are non-null.  If the compiler
>> -determines that a null pointer is passed in an argument slot marked
>> -as non-null, and the @option{-Wnonnull} option is enabled, a warning
>> -is issued.  @xref{Warning Options}.  Unless disabled by
>> -the @option{-fno-delete-null-pointer-checks} option the compiler may
>> -also perform optimizations based on the knowledge that certain function
>> -arguments cannot be null. In addition,
>> -the @option{-fisolate-erroneous-paths-attribute} option can be specified
>> -to have GCC transform calls with null arguments to non-null functions
>> -into traps. @xref{Optimize Option

[PATCH][omp, simt] Fix expand_GOMP_SIMT_*

2021-04-28 Thread Tom de Vries
Hi,

When running the test-case included in this patch using an
nvptx accelerator, it fails in execution.

The problem is that the expansion of GOMP_SIMT_XCHG_BFLY is optimized away
during pass_jump as "trivially dead insns".

This is caused by this code in expand_GOMP_SIMT_XCHG_BFLY:
...
  class expand_operand ops[3];
  create_output_operand (&ops[0], target, mode);
  ...
  expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
...
which doesn't guarantee that target is assigned to by the expanded insn.

For instance, if target is:
...
(gdb) call debug_rtx ( target )
(subreg/s/u:QI (reg:SI 40 [ _61 ]) 0)
...
then after expand_insn, we have:
...
(gdb) call debug_rtx ( ops[0].value )
(reg:QI 57)
...

See commit 3af3bec2e4d "internal-fn: Avoid dropping the lhs of some
calls [PR94941]" for a similar problem.

Fix this in the same way, by adding:
...
  if (!rtx_equal_p (target, ops[0].value))
emit_move_insn (target, ops[0].value);
...
where applicable in the expand_GOMP_SIMT_* functions.

Tested libgomp on x86_64 with nvptx accelerator.

Any comments?

Thanks,
- Tom

[omp, simt] Fix expand_GOMP_SIMT_*

gcc/ChangeLog:

2021-04-28  Tom de Vries  

PR target/100232
* internal-fn.c (expand_GOMP_SIMT_ENTER_ALLOC)
(expand_GOMP_SIMT_LAST_LANE, expand_GOMP_SIMT_ORDERED_PRED)
(expand_GOMP_SIMT_VOTE_ANY, expand_GOMP_SIMT_XCHG_BFLY)
(expand_GOMP_SIMT_XCHG_IDX): Ensure target is assigned to.

---
 gcc/internal-fn.c   | 12 
 libgomp/testsuite/libgomp.c/target-43.c | 24 
 2 files changed, 36 insertions(+)

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index dd7173126fb..d209a52f823 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -243,6 +243,8 @@ expand_GOMP_SIMT_ENTER_ALLOC (internal_fn, gcall *stmt)
   create_input_operand (&ops[2], align, Pmode);
   gcc_assert (targetm.have_omp_simt_enter ());
   expand_insn (targetm.code_for_omp_simt_enter, 3, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+emit_move_insn (target, ops[0].value);
 }
 
 /* Deallocate per-lane storage and leave non-uniform execution region.  */
@@ -300,6 +302,8 @@ expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt)
   create_input_operand (&ops[1], cond, mode);
   gcc_assert (targetm.have_omp_simt_last_lane ());
   expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+emit_move_insn (target, ops[0].value);
 }
 
 /* Non-transparent predicate used in SIMT lowering of OpenMP "ordered".  */
@@ -319,6 +323,8 @@ expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt)
   create_input_operand (&ops[1], ctr, mode);
   gcc_assert (targetm.have_omp_simt_ordered ());
   expand_insn (targetm.code_for_omp_simt_ordered, 2, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+emit_move_insn (target, ops[0].value);
 }
 
 /* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if
@@ -339,6 +345,8 @@ expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt)
   create_input_operand (&ops[1], cond, mode);
   gcc_assert (targetm.have_omp_simt_vote_any ());
   expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+emit_move_insn (target, ops[0].value);
 }
 
 /* Exchange between SIMT lanes with a "butterfly" pattern: source lane index
@@ -361,6 +369,8 @@ expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
   create_input_operand (&ops[2], idx, SImode);
   gcc_assert (targetm.have_omp_simt_xchg_bfly ());
   expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+emit_move_insn (target, ops[0].value);
 }
 
 /* Exchange between SIMT lanes according to given source lane index.  */
@@ -382,6 +392,8 @@ expand_GOMP_SIMT_XCHG_IDX (internal_fn, gcall *stmt)
   create_input_operand (&ops[2], idx, SImode);
   gcc_assert (targetm.have_omp_simt_xchg_idx ());
   expand_insn (targetm.code_for_omp_simt_xchg_idx, 3, ops);
+  if (!rtx_equal_p (target, ops[0].value))
+emit_move_insn (target, ops[0].value);
 }
 
 /* This should get expanded in adjust_simduid_builtins.  */
diff --git a/libgomp/testsuite/libgomp.c/target-43.c b/libgomp/testsuite/libgomp.c/target-43.c
new file mode 100644
index 000..46b1cfc5b20
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-43.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+#include 
+
+#define N 32
+#define TYPE char
+
+int
+main (void)
+{
+  TYPE result = 1;
+  TYPE a[N];
+  for (int x = 0; x < N; ++x)
+a[x] = 1;
+
+#pragma omp target map(tofrom: result) map(to:a)
+#pragma omp for simd reduction(&&:result)
+  for (int x = 0; x < N; ++x)
+result = result && a[x];
+
+  if (result != 1)
+abort ();
+
+  return 0;
+}


[committed][omp, simt] Handle alternative IV

2021-04-29 Thread Tom de Vries
On 4/22/21 1:46 PM, Tom de Vries wrote:
> On 12/17/20 5:46 PM, Tom de Vries wrote:
>> On 10/15/20 5:05 PM, Tom de Vries wrote:
>>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>>> Hi,
>>>>
>>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>>> ...
>>>>   int s = 1;
>>>>   #pragma omp target simd
>>>>   for (int i = N - 1; i > -1; i -= s)
>>>> a[i] = 1;
>>>> ...
>>>> which, given that N is 32, sets a[0..31] to 1.
>>>>
>>>> After omp-expand, this looks like:
>>>> ...
>>>>:
>>>>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>>>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>>>   D.3193 = -s;
>>>>   s.9 = s;
>>>>   D.3204 = .GOMP_SIMT_LANE ();
>>>>   D.3205 = -s.9;
>>>>   D.3206 = (int) D.3204;
>>>>   D.3207 = D.3205 * D.3206;
>>>>   i = D.3207 + 31;
>>>>   D.3209 = 0;
>>>>   D.3210 = -s.9;
>>>>   D.3211 = D.3210 - i;
>>>>   D.3210 = -s.9;
>>>>   D.3212 = D.3211 / D.3210;
>>>>   D.3213 = (unsigned int) D.3212;
>>>>   D.3213 = i >= 0 ? D.3213 : 0;
>>>>
>>>>:
>>>>   if (D.3209 < D.3213)
>>>> goto ; [87.50%]
>>>>   else
>>>> goto ; [12.50%]
>>>>
>>>>:
>>>>   a[i] = 1;
>>>>   D.3215 = -s.9;
>>>>   D.3219 = .GOMP_SIMT_VF ();
>>>>   D.3216 = (int) D.3219;
>>>>   D.3220 = D.3215 * D.3216;
>>>>   i = D.3220 + i;
>>>>   D.3209 = D.3209 + 1;
>>>>   goto ; [100.00%]
>>>> ...
>>>>
>>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>>> on the lane that is executing) at bb entry.
>>>>
>>>> So we have the following sequence:
>>>> - a[0..31] is set to 1
>>>> - i is updated to -32..-1
>>>> - D.3209 is updated to 1 (being 0 initially)
>>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>>>   to true
>>>> - bb6 is once more executed, which should not happen because all the elements
>>>>   that needed to be handled were already handled.
>>>> - consequently, elements that should not be written are written
>>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>>>   ...
>>>>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>>>   ...
>>>>   and with CANARY_SIZE unmodified, we run into:
>>>>   ...
>>>>   Expected 0, got 1 at base[-961]
>>>>   Aborted (core dumped)
>>>>   ...
>>>>
>>>> The cause of this is as follows:
>>>> - because the step s is a variable rather than a constant, an alternative
>>>>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>>>   loop condition is tested in terms of the alternative IV rather than
>>>>   the original IV (i in our example).
>>>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>>>   modified by the SIMT code and then used thereafter.
>>>> - The step fd->loop.step is loaded into a variable step, which is modified
>>>>   by the SIMT code, but afterwards there are uses of both step and
>>>>   fd->loop.step.
>>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>>>   which should use step instead.
>>>>
>>>> Fix this by introducing an additional variable orig_step, which is not
>>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>>> by either step or orig_step.
>>>>
>>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>>
>>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>>> with driver 450.66.
>>>>
>>>> OK for trunk?
>>>>
>>>
> 
> Ping^3.
> 

Committed.

Thanks,
- Tom

>>>> [omp, simt] Handle alternative IV
>>>>
>>>> gcc/ChangeLog:
>
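
The trip-count problem described in the quoted analysis can be reproduced
with a small host-side model.  This is a sketch under stated assumptions, not
the GCC code: the trip count follows the formula visible in the dump
((step - i) / step, clamped to 0), the warp has 32 lanes, and the function
names are made up for illustration.

```c
#include <assert.h>

#define N 32   /* loop bound from the test-case */
#define VF 32  /* lanes per warp, matching the 0..31 lane range above */

/* Trip count of the alternative IV for one lane, following the dump:
   (step - i) / step, clamped to 0 when the lane has no work.  */
static int
lane_trip_count (int i, int step)
{
  assert (step < 0);
  return i >= 0 ? (step - i) / step : 0;
}

/* Run all lanes and return the lowest index of a[] that gets written.
   USE_SIMT_STEP selects the step used for the trip count: 0 models the
   bug (original step -s), 1 models the fix (SIMT-adjusted step).  */
static int
lowest_index_written (int s, int use_simt_step)
{
  int simt_step = -s * VF;
  int lowest = N;
  for (int lane = 0; lane < VF; lane++)
    {
      int i = N - 1 - s * lane;
      int count = lane_trip_count (i, use_simt_step ? simt_step : -s);
      for (int iv = 0; iv < count; iv++)
        {
          if (i < lowest)
            lowest = i;
          i += simt_step;   /* the IV itself always advances by the SIMT step */
        }
    }
  return lowest;
}
```

With s == 1, lowest_index_written (1, 0) yields -961, the same index as in the
"Expected 0, got 1 at base[-961]" failure, while lowest_index_written (1, 1)
yields 0: each lane then runs exactly one iteration.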

[PATCH][openmp, simt] Error out for user-defined reduction

2021-05-03 Thread Tom de Vries
Hi,

The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
counter_N0.i += 1;
...

When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but apparently not user-defined reductions.
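
To make the N0 / 32 symptom concrete, here is a host-side model of 32 lanes
that each keep a private partial count, followed by a butterfly (XOR)
exchange that combines them.  It is only a sketch: the butterfly is one
common way to combine per-lane values and is assumed here for illustration,
the function names are made up, and none of this is GCC's actual lowering.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Each lane handles every WARP_SIZE-th iteration and keeps a private
   partial count, mimicking "counter_N0.i += 1" executed per lane.  */
static void
count_per_lane (int n, int partial[WARP_SIZE])
{
  assert (n >= 0);
  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      partial[lane] = 0;
      for (int i = lane; i < n; i += WARP_SIZE)
        partial[lane] += 1;
    }
}

/* Combine the partial counts with a butterfly (XOR) exchange.  After
   log2 (WARP_SIZE) rounds every lane holds the full sum.  */
static int
butterfly_sum (int partial[WARP_SIZE])
{
  for (int mask = WARP_SIZE / 2; mask > 0; mask /= 2)
    {
      int next[WARP_SIZE];
      for (int lane = 0; lane < WARP_SIZE; lane++)
        next[lane] = partial[lane] + partial[lane ^ mask];
      for (int lane = 0; lane < WARP_SIZE; lane++)
        partial[lane] = next[lane];
    }
  return partial[0];
}
```

Reading partial[0] right after count_per_lane gives N0 / 32, which is the
value observed in the failing test; only after the cross-lane combination
does lane 0 hold N0.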

For now, make this explicit by erroring out for nvptx, like this:
...
target-44.c: In function 'main':
target-44.c:20:9: error: SIMT reduction not fully implemented
...

Tested libgomp on x86_64-linux with and without nvptx accelerator.

Any comments?

Thanks,
- Tom

[openmp, simt] Error out for user-defined reduction

gcc/ChangeLog:

2021-05-03  Tom de Vries  

PR target/100321
* omp-low.c (lower_rec_input_clauses): Error out for user-defined
reduction for SIMT.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  

PR target/100321
* testsuite/libgomp.c/target-44.c: New test.

---
 gcc/omp-low.c   |  2 ++
 libgomp/testsuite/libgomp.c/target-44.c | 28 
 2 files changed, 30 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7b122059c6e..0f122857a3a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6005,6 +6005,8 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
  gimple *tseq;
  tree ptype = TREE_TYPE (placeholder);
+ if (sctx.is_simt)
+   error ("SIMT reduction not fully implemented");
  if (cond)
{
  x = error_mark_node;
diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c
new file mode 100644
index 000..497931cd14c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-44.c
@@ -0,0 +1,28 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-error "SIMT reduction not fully implemented" "" { target { offload_target_nvptx } } 0 }  */
+#include 
+
+struct s
+{
+  int i;
+};
+
+#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i)
+
+int
+main (void)
+{
+  const int N0 = 32768;
+
+  struct s counter_N0 = { 0 };
+#pragma omp target
+#pragma omp for simd reduction(+: counter_N0)
+  for (int i0 = 0 ; i0 < N0 ; i0++ )
+counter_N0.i += 1;
+
+  if (counter_N0.i != N0)
+abort ();
+
+  return 0;
+}

