Re: [gomp4] OpenACC reduction tests

2015-09-23 Thread Thomas Schwinge
Hi!

On Fri, 18 Sep 2015 10:11:25 +0200, I wrote:
> On Fri, 17 Jul 2015 11:13:59 -0700, Cesar Philippidis 
>  wrote:
> > This patch updates the libgomp OpenACC reduction test cases [...]
> 
> > --- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
> > +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90

> With -O0, I frequently see this test FAIL (thus XFAILed), both for nvptx
> offloading and host-fallback execution.  Adding a few printfs, I observe
> redsub_gang compute "random" results.

This seems to have been fixed by Nathan's recent "Another oacc
reduction simplification" commit, so I'm removing the XFAIL.


The following issue however remains to be addressed:

> Given the following
> -Wuninitialized/-Wmaybe-uninitialized warnings (for -O1, for example),
> maybe there's some initialization of (internal) variables missing?
> (These user-visible warnings about compiler internals need to be
> addressed regardless.)  Would you please have a look at that?
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In function 'redsub_combined_._omp_fn.0':
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:73:0: warning: '' is used uninitialized in this function [-Wuninitialized]
> !$acc loop reduction(+:sum) gang worker vector
> ^
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In function 'redsub_vector_._omp_fn.1':
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:60:0: warning: '' is used uninitialized in this function [-Wuninitialized]
> !$acc loop reduction(+:sum) vector
> ^
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In function 'redsub_worker_._omp_fn.2':
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:47:0: warning: '' is used uninitialized in this function [-Wuninitialized]
> !$acc loop reduction(+:sum) worker
> ^
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In function 'redsub_gang_._omp_fn.3':
> source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:34:0: warning: 'sum.43' may be used uninitialized in this function [-Wmaybe-uninitialized]
> !$acc loop reduction(+:sum) gang
> ^


Committed to gomp-4_0-branch in r228035:

commit 705169947333655ded3427985b34b758a5bc6cf5
Author: tschwinge 
Date:   Wed Sep 23 07:54:15 2015 +

Remove XFAIL of OpenACC reduction execution test case for -O0

libgomp/
* testsuite/libgomp.oacc-fortran/reduction-5.f90: Remove XFAIL of
execution test for -O0.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228035 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 5 +
 libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 | 1 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 46c1a05..47db0d4 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-09-23  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-fortran/reduction-5.f90: Remove XFAIL of
+   execution test for -O0.
+
 2015-09-22  Cesar Philippidis  
 
* testsuite/libgomp.oacc-fortran/dummy-array.f90: New test.
diff --git libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
index f787e7d..180c9a2 100644
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-xfail-run-if "TODO" { *-*-* } { "-O0" } }
 
 ! subroutine reduction
 


Grüße,
 Thomas




Re: [gomp4] Another oacc reduction simplification

2015-09-23 Thread Thomas Schwinge
Hi!

On Tue, 22 Sep 2015 11:29:37 -0400, Nathan Sidwell  wrote:
> I've committed this patch, which simplifies the generation of openacc
> reduction code.

Aside from the progression mentioned earlier, this change is also
causing a regression:

[-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 35)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 58)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 62)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 81)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 85)
[-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 89)
[-PASS:-]{+FAIL: c-c++-common/goacc/routine-7.c (internal compiler error)+}
{+FAIL:+} c-c++-common/goacc/routine-7.c (test for excess errors)

Same for C++.

[...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ [...]/source-gcc/gcc/testsuite/c-c++-common/goacc/routine-7.c -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenacc -S -o routine-7.s
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/routine-7.c: In function 'gang':
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/routine-7.c:12:9: internal compiler error: Segmentation fault
0xacaaaf crash_signal
	[...]/source-gcc/gcc/toplev.c:352
0x127054b splay_tree_splay
	[...]/source-gcc/libiberty/splay-tree.c:141
0x1270aa0 splay_tree_lookup
	[...]/source-gcc/libiberty/splay-tree.c:456
0x9b4136 maybe_lookup_field
	[...]/source-gcc/gcc/omp-low.c:1044
0x9b4136 lower_oacc_reductions
	[...]/source-gcc/gcc/omp-low.c:4805
0x9d89e2 lower_oacc_head_tail
	[...]/source-gcc/gcc/omp-low.c:4903
0x9d89e2 lower_omp_for
	[...]/source-gcc/gcc/omp-low.c:11311
0x9d89e2 lower_omp_1
	[...]/source-gcc/gcc/omp-low.c:12489
0x9d89e2 lower_omp
	[...]/source-gcc/gcc/omp-low.c:12621
0x9d7ead lower_omp_1
	[...]/source-gcc/gcc/omp-low.c:12474
0x9d7ead lower_omp
	[...]/source-gcc/gcc/omp-low.c:12621
0x9d7ead lower_omp_1
	[...]/source-gcc/gcc/omp-low.c:12474
0x9d7ead lower_omp
	[...]/source-gcc/gcc/omp-low.c:12621
0x9d9b7c execute_lower_omp
	[...]/source-gcc/gcc/omp-low.c:12659
0x9d9b7c execute
	[...]/source-gcc/gcc/omp-low.c:12696

$ cat -n < source-gcc/gcc/testsuite/c-c++-common/goacc/routine-7.c
 1  /* Test invalid intra-routine parallelism.  */
 2  /* { dg-do compile } */
 3  
 4  #pragma acc routine gang
 5  int
 6  gang (int red)
 7  {
 8  #pragma acc loop reduction (+:red)
 9    for (int i = 0; i < 10; i++)
10      red ++;
11  
12  #pragma acc loop gang reduction (+:red)
13    for (int i = 0; i < 10; i++)
14      red ++;
15  
16  #pragma acc loop worker reduction (+:red)
17    for (int i = 0; i < 10; i++)
18      red ++;
19  
20  #pragma acc loop vector reduction (+:red)
21    for (int i = 0; i < 10; i++)
22      red ++;
23  
24    return 1;
25  }
[...]


Grüße,
 Thomas




Re: [ARM] Use vector wide add for mixed-mode adds

2015-09-23 Thread Kyrill Tkachov

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized due
to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison  

  * config/arm/neon.md (widen_sum): New patterns
  where mode is VQI to improve mixed mode add vectorization.



Please list all the new define_expands and define_insns
in the ChangeLog. Also, please add a ChangeLog entry for
the testsuite additions.

The approach looks ok to me with a few comments on some
parts of the patch itself.


+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+   (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+  (match_operand:VQI 2 "vect_par_constant_high" "")))
+   (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)


This is a single instruction, so it has a length of 4 and there is no need to
override the length attribute.
Same with the other define_insns in this patch.
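For readers less familiar with the widening add, here is a scalar C model of the arithmetic the new patterns perform for the `t6` loop. It assumes the usual NEON `vaddw.s16` semantics (each 16-bit lane of a doubleword source is sign-extended and added to the corresponding 32-bit lane of a quadword accumulator), and uses one add for the low half and one for the high half, mirroring the `vect_par_constant_high` operand above. The helper names are hypothetical, and this is a sketch of the arithmetic only, not of the vectorizer's actual lane assignment; since the loop is a sum reduction, only the total across lanes matters here.

```c
#include <stdint.h>

/* Model of vaddw.s16 Qd, Qn, Dm: acc[i] += sign-extended src[i], 4 lanes.  */
static void vaddw_s16_model (int32_t acc[4], const int16_t src[4])
{
  for (int i = 0; i < 4; i++)
    acc[i] += (int32_t) src[i];
}

/* Widening sum of one 8 x 16-bit vector into a 4 x 32-bit accumulator:
   low doubleword (lanes 0..3) first, then the high one (lanes 4..7).  */
static void widen_ssum_model (int32_t acc[4], const int16_t x[8])
{
  vaddw_s16_model (acc, &x[0]);
  vaddw_s16_model (acc, &x[4]);
}

/* Vectorized-style t6: len is rounded down to a multiple of 32 as in the
   original loop, so it is also a multiple of 8.  The four 32-bit lanes are
   reduced to a single int at the end.  */
static int t6_model (int len, const int16_t *x)
{
  int32_t acc[4] = { 0, 0, 0, 0 };
  len = len & ~31;
  for (int i = 0; i < len; i += 8)
    widen_ssum_model (acc, &x[i]);
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The model returns the same value as the scalar loop for any `len`, which is why the pattern can replace separate extend-then-add sequences.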


diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */

The arm_neon_hw check is usually used when you want to run the tests.
Since this is a compile-only test you just need arm_neon_ok.

+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+

Stray trailing newlines. Similar comments for the other testcases.
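Putting the two testsuite comments together, a revised test might look like the sketch below. It assumes the common pairing of `dg-require-effective-target arm_neon_ok` with `dg-add-options arm_neon`, with `dg-add-options` placed after `dg-options`; the function body is unchanged from the patch, and the file ends without trailing blank lines. The DejaGnu directives are ordinary comments to the compiler, so the function still builds as plain C on any host.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O3" } */
/* { dg-add-options arm_neon } */

int
t6 (int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}

/* { dg-final { scan-assembler "vaddw\.s16" } } */
```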

Thanks,
Kyrill



Re: [PATCH] Fix a -Wmisleading-indentation false-negative

2015-09-23 Thread Bernd Schmidt

On 09/23/2015 03:37 AM, Patrick Palka wrote:


gcc/c-family/ChangeLog:

* c-indentation.c (should_warn_for_misleading_indentation):
Compare next_stmt_vis_column with guard_line_first_nws instead
of with guard_line_vis_column.

gcc/testsuite/ChangeLog:

* c-c++-common/Wmisleading-indentation.c: Augment test.


Ok.


Bernd



Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Kyrill Tkachov


On 22/09/15 20:31, Jeff Law wrote:
> On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
>> Hi all,
>> Unfortunately, I see a testsuite regression with this patch:
>> FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
>>
>> The reduced part of that test is:
>> void
>> test1 (int x, unsigned u)
>> {
>>   if ((1U << x) != 64
>>       || (2 << x) != u
>>       || (x << x) != 384
>>       || (3 << x) == 9
>>       || (x << 14) != 98304U
>>       || (1 << x) == 14
>>       || (3 << 2) != 12)
>>     __builtin_abort ();
>> }
>>
>> The patched ifcombine pass works more or less as expected and produces
>> fewer basic blocks.
>> Before this patch a relevant part of the ifcombine dump for test1 is:
>> ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
>>   if (x_1(D) != 6)
>>     goto ;
>>   else
>>     goto ;
>>
>> ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
>>   _2 = 2 << x_1(D);
>>   _3 = (unsigned intD.10) _2;
>>   if (_3 != u_4(D))
>>     goto ;
>>   else
>>     goto ;
>>
>>
>> After this patch it is:
>> ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
>>   _2 = 2 << x_1(D);
>>   _3 = (unsigned intD.10) _2;
>>   _9 = _3 != u_4(D);
>>   _10 = x_1(D) != 6;
>>   _11 = _9 | _10;
>>   if (_11 != 0)
>>     goto ;
>>   else
>>     goto ;
>>
>> The second form ends up generating worse codegen, however, and the
>> badness starts with the dom1 pass.
>> In the unpatched case it manages to deduce that x must be 6 by the time
>> it reaches basic block 3 and uses that information to eliminate the
>> shift in "_2 = 2 << x_1(D)" from basic block 3.
>> In the patched case it is unable to make that call, I think because the
>> x != 6 condition is IORed with another test.
>>
>> I'm not familiar with the internals of the dom pass, so I'm not sure
>> where to go looking for a fix for this.
>> Is the ifcombine change a step in the right direction? If so, what would
>> need to be done to fix the issue with the dom pass?
> I don't see how you can reasonably fix this in DOM.  If _9 or _10 is
> true, then _11 is true.  But we can't reasonably record any kind of
> equivalence for _9 or _10 individually.
>
> If the statement
>
>   _11 = _9 | _10;
>
> were changed to
>
>   _11 = _9 & _10;
>
> then we could record something useful about _9 and _10.
>
>> I suppose what we want is to not combine basic blocks if the sequence
>> and conditions of the basic blocks are such that dom can potentially
>> exploit them, but how do we express that?
> I don't think there's going to be a way to directly express that.  You
> could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
> because of the impact on DOM, but that in and of itself may not resolve
> the situation either.
>
> I think the question we need to answer is whether or not your changes
> are generally better, even if there are specific instances where they make
> things worse.  If the benefits outweigh the negatives then we can xfail
> that test.

Ok, I'll investigate and benchmark some more.
Andrew, this transformation to ifcombine (together with the restriction that
the inner condition block has to be a single comparison) was added by you
with r204194. Is there a particular reason for that restriction, and why does
it apply specifically to the inner block rather than to either block?

Thanks,
Kyrill
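A source-level sketch of the two dump shapes above may make the DOM problem easier to see. It covers only the first two conditions of test1; the first one, `(1U << x) != 64`, already appears as `x_1(D) != 6` in the dumps. The function names are hypothetical, each returns 1 where test1 would head towards `__builtin_abort ()`, and inputs are restricted to 0 <= x <= 29 so that `2 << x` stays well defined for `int`.

```c
/* Shape before the change: the shift is only evaluated once x == 6 is
   known, so a later pass such as DOM can fold it to a constant.  */
static int fails_nested (int x, unsigned u)
{
  if (x != 6)
    return 1;
  if ((unsigned) (2 << x) != u)
    return 1;
  return 0;
}

/* Shape after the change: both comparisons are evaluated and combined with
   a bitwise OR, so the shift runs before anything is known about x.  */
static int fails_combined (int x, unsigned u)
{
  unsigned t = (unsigned) (2 << x);  /* _2 and _3 in the dump */
  int c9 = t != u;                   /* _9  */
  int c10 = x != 6;                  /* _10 */
  return (c9 | c10) != 0;            /* _11 */
}
```

Both functions agree on every input in that range; what changes is only what can be proven about `x` at the point where the shift is computed.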










Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Pinski, Andrew

> On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov  wrote:
> 
> 
>> On 22/09/15 20:31, Jeff Law wrote:
>>> On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>> Unfortunately, I see a testsuite regression with this patch:
>>> FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
>>> 
>>> The reduced part of that test is:
>>> void
>>> test1 (int x, unsigned u)
>>> {
>>>if ((1U << x) != 64
>>>|| (2 << x) != u
>>>|| (x << x) != 384
>>>|| (3 << x) == 9
>>>|| (x << 14) != 98304U
>>>|| (1 << x) == 14
>>>|| (3 << 2) != 12)
>>>  __builtin_abort ();
>>> }
>>> 
>>> The patched ifcombine pass works more or less as expected and produces
>>> fewer basic blocks.
>>> Before this patch a relevant part of the ifcombine dump for test1 is:
>>> ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
>>>if (x_1(D) != 6)
>>>  goto ;
>>>else
>>>  goto ;
>>> 
>>> ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
>>>_2 = 2 << x_1(D);
>>>_3 = (unsigned intD.10) _2;
>>>if (_3 != u_4(D))
>>>  goto ;
>>>else
>>>  goto ;
>>> 
>>> 
>>> After this patch it is:
>>> ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
>>>_2 = 2 << x_1(D);
>>>_3 = (unsigned intD.10) _2;
>>>_9 = _3 != u_4(D);
>>>_10 = x_1(D) != 6;
>>>_11 = _9 | _10;
>>>if (_11 != 0)
>>>  goto ;
>>>else
>>>  goto ;
>>> 
>>> The second form ends up generating worse codegen however, and the
>>> badness starts with the dom1 pass.
>>> In the unpatched case it manages to deduce that x must be 6 by the time
>>> it reaches basic block 3 and
>>> uses that information to eliminate the shift in "_2 = 2 << x_1(D)" from
>>> basic block 3
>>> In the patched case it is unable to make that call, I think because the
>>> x != 6 condition is IORed
>>> with another test.
>>> 
>>> I'm not familiar with the internals of the dom pass, so I'm not sure
>>> where to go looking for a fix for this.
>>> Is the ifcombine change a step in the right direction? If so, what would
>>> need to be done to fix the issue with
>>> the dom pass?
>> I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
>> true, then _11 is true.  But we can't reasonably record any kind of
>> equivalence for _9 or _10 individually.
>> 
>> If the statement
>> _11 = _9 | _10;
>> 
>> Were changed to
>> 
>> _11 = _9 & _10;
>> 
>> Then we could record something useful about _9 and _10.
>> 
>> 
>>> I suppose what we want is to not combine basic blocks if the sequence
>>> and conditions of the basic blocks are
>>> such that dom can potentially exploit them, but how do we express that?
>> I don't think there's going to be a way to directly express that.  You
>> could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
>> because of the impact on DOM, but that in and of itself may not resolve
>> the situation either.
>> 
>> I think the question we need to answer is whether or not your changes
>> are generally better, even if there's specific instances where they make
>> things worse.  If the benefits outweigh the negatives then we can xfail
>> that test.
> 
> Ok, I'll investigate and benchmark some more.
> Andrew, this transformation to ifcombine (together with the restriction that 
> the inner condition block
> has to be a single comparison) was added by you with r204194.
> Is there a particular reason for that restriction and why it is applied to 
> the inner block and not either?

My reasoning at the time was that there might be an "expensive" instruction or
one that might trap (I did not check to see if the other part of the code was
detecting that).
The outer block did not need any checks as we have something like
...
if (a)
  if (b)

or

if (a)
  goto F;
else if (b)
 
else
{
F:

}

And there was no need to check what was before the if (a) part, just what is
in between the two ifs.

By "expensive" I mean, for example, a division or a function call.

Thanks,
Andrew
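A hypothetical source-level illustration of why only the inner block needs checking (function names and constants are made up): once the two conditions are merged, the statements of the inner condition block execute whether or not the outer condition holds. A division is one example of an instruction that can trap when executed on a path where it previously was not.

```c
/* Nested form: the division in the inner block only executes when x != 0.  */
static int nested (int x, int d)
{
  if (x != 0)
    {
      int q = 100 / d;  /* guarded by x != 0 */
      if (q > 5)
        return 1;
    }
  return 0;
}

/* What merging the conditions amounts to: the division now runs
   unconditionally, so d == 0 would trap even when x == 0.
   Only safe to call with d != 0.  */
static int combined (int x, int d)
{
  int q = 100 / d;  /* speculated: runs even if x == 0 */
  return (x != 0) & (q > 5);
}
```

`nested (0, 0)` is well defined and returns 0, while `combined (0, 0)` would divide by zero; for nonzero `d` the two agree.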




[HSA] Add new gate predicate

2015-09-23 Thread Martin Liška
Hello.

The following patch does a small refactoring of the HSA tree generation pass.

Martin
From a887efce8fc6aa136a2a069ea5ddda10b4e28de6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 22 Sep 2015 18:58:12 +0200
Subject: [PATCH] HSA: add new gate predicate

gcc/ChangeLog:

2015-09-22  Martin Liska  

	* hsa.h (hsa_gpu_implementation_p): New predicate.
	* hsa-gen.c (pass_gen_hsail::gate): Use it.
	(pass_gen_hsail::execute): Do not simulate gate predicate.
---
 gcc/hsa-gen.c | 18 +++---
 gcc/hsa.h | 13 +
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 3a7ce5d..34cbe42 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4492,9 +4492,10 @@ public:
 /* Determine whether or not to run generation of HSAIL.  */
 
 bool
-pass_gen_hsail::gate (function *)
+pass_gen_hsail::gate (function *f)
 {
-  return hsa_gen_requested_p ();
+  return hsa_gen_requested_p ()
+         && hsa_gpu_implementation_p (f->decl);
 }
 
 unsigned int
@@ -4503,15 +4504,10 @@ pass_gen_hsail::execute (function *)
   hsa_function_summary *s = hsa_summaries->get
 (cgraph_node::get_create (current_function_decl));
 
-  if (s->gpu_implementation_p)
-    {
-      convert_switch_statements ();
-      generate_hsa (s->kind == HSA_KERNEL);
-      TREE_ASM_WRITTEN (current_function_decl) = 1;
-      return TODO_stop_pass_execution;
-    }
-
-  return 0;
+  convert_switch_statements ();
+  generate_hsa (s->kind == HSA_KERNEL);
+  TREE_ASM_WRITTEN (current_function_decl) = 1;
+  return TODO_stop_pass_execution;
 }
 
 } // anon namespace
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 6164b86..16fe310 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -1042,4 +1042,17 @@ union hsa_bytes
   uint64_t b64;
 };
 
+/* Return true if a function DECL is an HSA implementation.  */
+
+static inline bool
+hsa_gpu_implementation_p (tree decl)
+{
+  if (hsa_summaries == NULL)
+    return false;
+
+  hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (decl));
+
+  return s->gpu_implementation_p;
+}
+
 #endif /* HSA_H */
-- 
2.5.1



Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Kyrill Tkachov


On 23/09/15 10:09, Pinski, Andrew wrote:

> My reasoning at the time was there might be an "expensive" instruction or one
> that might trap (I did not check to see if the other part of the code was detecting that).
> The outer block did not need any checks as we have something like
> ...
> If (a)
>    If (b)
>
> Or
>
> If (a)
>    Goto f
> else if (b)
>
> Else
> {
> F:
>
> }
>
> And there was no need to check what was before the if (a) part just what is in
> between the two ifs.


Ah, because the code in outer_cond_bb would have to be executed anyway whether
we perform the conversion or not, right?

Thanks,
Kyrill







Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Thomas Schwinge
Hi Nathan!

On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:
> I've committed this patch to add a new pair of internal functions.  These will
> be used in implementing reductions.
> 
> They'll be emitted around reduction finalization, and implement the locking
> required for the general case of combining reduction values.  They may be
> transformed in the oacc_xform pass, and the default behaviour is to delete them,
> if there is no RTL expander.  For PTX we delete them if they are at the vector
> level.
> 
> This avoids needing machine-specific builtins to expand to, and thus should
> result in less backend code duplication.

With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
removed, should the gcc.target/nvptx/spinlock-1.c and
gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
these be re-written differently?

For reference:

$ grep ^ gcc/testsuite/gcc.target/nvptx/spinlock-*.c
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:/* { dg-do compile } */
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:void Foo ()
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:{
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:  __builtin_nvptx_lock (0);
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:  __builtin_nvptx_unlock (0);
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:}
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:/* { dg-final { scan-assembler-times ".atom.global.cas.b32" 2 } } */
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:/* { dg-final { scan-assembler ".global .u32 __global_lock;" } } */
gcc/testsuite/gcc.target/nvptx/spinlock-1.c:/* { dg-final { scan-assembler-not ".shared .u32 __shared_lock;" } } */
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:/* { dg-do compile } */
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:void Foo ()
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:{
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:  __builtin_nvptx_lock (1);
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:  __builtin_nvptx_unlock (1);
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:}
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:/* { dg-final { scan-assembler-times ".atom.shared.cas.b32" 2 } } */
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:/* { dg-final { scan-assembler ".shared .u32 __shared_lock;" } } */
gcc/testsuite/gcc.target/nvptx/spinlock-2.c:/* { dg-final { scan-assembler-not ".global .u32 __global_lock;" } } */
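For context on what these spinlock tests exercise, here is a host-side sketch in C11 `<stdatomic.h>` of the general locking pattern used when combining reduction values: acquire by spinning on a compare-and-swap of a 32-bit lock word, update the shared result, release. All names are hypothetical analogues of `__global_lock`; this is not the nvptx expansion, and using a plain atomic store for the release is a choice of this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analogue of the global lock word: 0 unlocked, 1 locked.  */
static _Atomic uint32_t global_lock = 0;
static int reduction_result = 0;

/* Acquire: spin on a compare-and-swap from 0 to 1.  */
static void lock_acquire (_Atomic uint32_t *lock)
{
  uint32_t expected = 0;
  while (!atomic_compare_exchange_weak_explicit (lock, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
    expected = 0;  /* a failed CAS stores the observed value in expected */
}

/* Release: make the update visible and reset the lock word.  */
static void lock_release (_Atomic uint32_t *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}

/* Combine one partial reduction value under the lock.  */
static void combine_partial (int partial)
{
  lock_acquire (&global_lock);
  reduction_result += partial;
  lock_release (&global_lock);
}
```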

> 2015-08-17  Nathan Sidwell  
> 
>   * target.def (lock_unlock): New GOACC hook.
>   * targhooks.h (default_goacc_lock_unlock): Declare.
>   * doc/tm.texi.in (TARGET_GOACC_LOCK_UNLOCK): Add.
>   * doc/tm.texi: Rebuilt.
>   * internal-fn.def (GOACC_LOCK, GOACC_UNLOCK): New.
>   * internal-fn.c (expand_GOACC_LOCK, expand_GOACC_UNLOCK): New.
>   * omp-low.c (execute_oacc_transform): Add lock/unlock handling.
>   (default_goacc_lock_unlock): New.
>   * config/nvptx/nvptx-protos.h (nvptx_expand_oacc_lock_unlock): Declare.
>   * config/nvptx/nvptx.md (UNSPECV_UNLOCK): Delete.
>   (oacc_lock, oacc_unlock): New expanders.
>   (nvptx_spinlock, nvptx_spinunlock): Use UNSPECV_LOCK.
>   * config/nvptx/nvptx.c (nvptx_expand_oacc_lock_unlock): New.
>   (nvptx_expand_lock_unlock): Delete.
>   (nvptx_expand_lock, nvptx_expand_unlock): Delete.
>   (nvptx_expand_work_red_addr): Fixup address generation.
>   (enum nvptx_types): Delete NT_VOID_UINT.
>   (builtins): Delete nvptx_lock and nvptx_unlock.
>   (nvptx_init_builtins): Adjust.
>   (nvptx_xform_lock_unlock): New.
>   (TARGET_GOACC_LOCK_UNLOCK): Override.
>   
> Index: gcc/config/nvptx/nvptx-protos.h
> ===
> --- gcc/config/nvptx/nvptx-protos.h   (revision 226951)
> +++ gcc/config/nvptx/nvptx-protos.h   (working copy)
> @@ -34,6 +34,7 @@ extern const char *nvptx_section_for_dec
>  #ifdef RTX_CODE
>  extern void nvptx_expand_oacc_fork (rtx);
>  extern void nvptx_expand_oacc_join (rtx);
> +extern void nvptx_expand_oacc_lock_unlock (rtx, bool);
>  extern void nvptx_expand_call (rtx, rtx);
>  extern rtx nvptx_expand_compare (rtx);
>  extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
> Index: gcc/config/nvptx/nvptx.md
> ===
> --- gcc/config/nvptx/nvptx.md (revision 226951)
> +++ gcc/config/nvptx/nvptx.md (working copy)
> @@ -61,7 +61,6 @@
>  
>  (define_c_enum "unspecv" [
> UNSPECV_LOCK
> -   UNSPECV_UNLOCK
> UNSPECV_CAS
> UNSPECV_XCHG
> UNSPECV_BARSYNC
> @@ -1366,6 +1365,26 @@
>return asms[INTVAL (operands[1])];
>  })
>  
> +(define_expand "oacc_lock"
> +  [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")
> + (match_operand:SI 1 "const_int_operand" "

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Richard Biener
On Wed, 23 Sep 2015, Kyrill Tkachov wrote:

> 
> On 23/09/15 10:09, Pinski, Andrew wrote:
> > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov 
> > > wrote:
> > > 
> > > 
> > > > On 22/09/15 20:31, Jeff Law wrote:
> > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
> > > > > Hi all,
> > > > > Unfortunately, I see a testsuite regression with this patch:
> > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
> > > > > 
> > > > > The reduced part of that test is:
> > > > > void
> > > > > test1 (int x, unsigned u)
> > > > > {
> > > > > if ((1U << x) != 64
> > > > > || (2 << x) != u
> > > > > || (x << x) != 384
> > > > > || (3 << x) == 9
> > > > > || (x << 14) != 98304U
> > > > > || (1 << x) == 14
> > > > > || (3 << 2) != 12)
> > > > >   __builtin_abort ();
> > > > > }
> > > > > 
> > > > > The patched ifcombine pass works more or less as expected and produces
> > > > > fewer basic blocks.
> > > > > Before this patch a relevant part of the ifcombine dump for test1 is:
> > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > if (x_1(D) != 6)
> > > > >   goto ;
> > > > > else
> > > > >   goto ;
> > > > > 
> > > > > ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
> > > > > _2 = 2 << x_1(D);
> > > > > _3 = (unsigned intD.10) _2;
> > > > > if (_3 != u_4(D))
> > > > >   goto ;
> > > > > else
> > > > >   goto ;
> > > > > 
> > > > > 
> > > > > After this patch it is:
> > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > _2 = 2 << x_1(D);
> > > > > _3 = (unsigned intD.10) _2;
> > > > > _9 = _3 != u_4(D);
> > > > > _10 = x_1(D) != 6;
> > > > > _11 = _9 | _10;
> > > > > if (_11 != 0)
> > > > >   goto ;
> > > > > else
> > > > >   goto ;
> > > > > 
> > > > > The second form ends up generating worse codegen however, and the
> > > > > badness starts with the dom1 pass.
> > > > > In the unpatched case it manages to deduce that x must be 6 by the
> > > > > time
> > > > > it reaches basic block 3 and
> > > > > uses that information to eliminate the shift in "_2 = 2 << x_1(D)"
> > > > > from
> > > > > basic block 3
> > > > > In the patched case it is unable to make that call, I think because
> > > > > the
> > > > > x != 6 condition is IORed
> > > > > with another test.
> > > > > 
> > > > > I'm not familiar with the internals of the dom pass, so I'm not sure
> > > > > where to go looking for a fix for this.
> > > > > Is the ifcombine change a step in the right direction? If so, what
> > > > > would
> > > > > need to be done to fix the issue with
> > > > > the dom pass?
> > > > I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
> > > > true, then _11 is true.  But we can't reasonably record any kind of
> > > > equivalence for _9 or _10 individually.
> > > > 
> > > > If the statement
> > > > _11 = _9 | _10;
> > > > 
> > > > Were changed to
> > > > 
> > > > _11 = _9 & _10;
> > > > 
> > > > Then we could record something useful about _9 and _10.
> > > > 
> > > > 
> > > > > I suppose what we want is to not combine basic blocks if the sequence
> > > > > and conditions of the basic blocks are
> > > > > such that dom can potentially exploit them, but how do we express
> > > > > that?
> > > > I don't think there's going to be a way to directly express that.  You
> > > > could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
> > > > because of the impact on DOM, but that in and of itself may not resolve
> > > > the situation either.
> > > > 
> > > > I think the question we need to answer is whether or not your changes
> > > > are generally better, even if there's specific instances where they make
> > > > things worse.  If the benefits outweigh the negatives then we can xfail
> > > > that test.
> > > Ok, I'll investigate and benchmark some more.
> > > Andrew, this transformation to ifcombine (together with the restriction
> > > that the inner condition block
> > > has to be a single comparison) was added by you with r204194.
> > > Is there a particular reason for that restriction and why it is applied to
> > > the inner block and not either?
> > My reasoning at the time was there might be an "expensive" instruction or
> > one that might trap (I did not check to see if the other part of the code
> > was detecting that).
> > The outer block did not need any checks as we have something like
> > ...
> > If (a)
> >If (b)
> > 
> > Or
> > 
> > If (a)
> >Goto f
> > else if (b)
> >   
> > Else
> > {
> > F:
> > 
> > }
> > 
> > And there was no need to check what was before the if (a) part just what is
> > in between the two ifs.
> 
> Ah, because the code in outer_cond_bb would have to be executed anyway whether
> we perform the conversion or not, right?

All ifcombine transforms make the outer condition unconditionally 
true/false thus the check should have been on whether the out

Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-23 Thread Ilya Enkovich
On 14 Sep 17:50, Uros Bizjak wrote:
> 
> +(define_insn_and_split "*zext_doubleword"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI (match_operand:SWI24 1 "nonimmediate_operand" "rm")))]
> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
> +  "#"
> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
> +   (set (match_dup 2) (const_int 0))]
> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
> +
> +(define_insn_and_split "*zextqi_doubleword"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
> +  "#"
> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
> +   (set (match_dup 2) (const_int 0))]
> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
> +
> 
> Please put the above patterns together with other zero_extend
> patterns. You can also merge these two patterns using SWI124 mode
> iterator with  mode attribute as a register constraint. Also, no
> need to check for GENERAL_REG_P after reload, when "r" constraint is
> in effect:
> 
> (define_insn_and_split "*zext_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r")
>  (zero_extend:DI (match_operand:SWI124 1 "nonimmediate_operand" "m")))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>(set (match_dup 2) (const_int 0))]
>   "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")

Register constraint doesn't affect split and I need GENERAL_REG_P to filter 
other registers case.

I merged QI and HI cases of zext but made a separate pattern for SI case 
because it doesn't need zero_extend in resulting code.  Bootstrapped and 
regtested for x86_64-unknown-linux-gnu.
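On ia32 a DImode value lives in a pair of 32-bit general registers, and `split_double_mode` hands the split the low and high halves as `operands[0]` and `operands[2]`. The following is a minimal C++ model of what the post-reload split computes, not the RTL itself; `RegPair`, `split_zext_doubleword`, `split_zextsi_doubleword` and `join_pair` are hypothetical names. The low register receives the zero-extended source and the high register is cleared; for an SImode source the low half is a plain move, which is why that case needs no `zero_extend`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical model of a DImode value held in two 32-bit GPRs.
struct RegPair {
  std::uint32_t lo;  // corresponds to operands[0] after split_double_mode
  std::uint32_t hi;  // corresponds to operands[2] after split_double_mode
};

// QImode/HImode source: (set lo (zero_extend:SI src)), then (set hi 0).
template <typename Narrow>
RegPair split_zext_doubleword(Narrow src) {
  static_assert(std::is_unsigned<Narrow>::value,
                "model zero extension with an unsigned narrow type");
  return RegPair{static_cast<std::uint32_t>(src), 0u};
}

// SImode source: the low half is a plain copy, no zero_extend needed.
RegPair split_zextsi_doubleword(std::uint32_t src) {
  return RegPair{src, 0u};
}

// Reassemble the pair so the result can be compared with a 64-bit value.
std::uint64_t join_pair(RegPair r) {
  return (static_cast<std::uint64_t>(r.hi) << 32) | r.lo;
}
```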

Thanks,
Ilya
--
gcc/

2015-09-23  Ilya Enkovich  

* config/i386/i386.c: Include dbgcnt.h.
(has_non_address_hard_reg): New.
(convertible_comparison_p): New.
(scalar_to_vector_candidate_p): New.
(remove_non_convertible_regs): New.
(scalar_chain): New.
(scalar_chain::scalar_chain): New.
(scalar_chain::~scalar_chain): New.
(scalar_chain::add_to_queue): New.
(scalar_chain::mark_dual_mode_def): New.
(scalar_chain::analyze_register_chain): New.
(scalar_chain::add_insn): New.
(scalar_chain::build): New.
(scalar_chain::compute_convert_gain): New.
(scalar_chain::replace_with_subreg): New.
(scalar_chain::replace_with_subreg_in_insn): New.
(scalar_chain::emit_conversion_insns): New.
(scalar_chain::make_vector_copies): New.
(scalar_chain::convert_reg): New.
(scalar_chain::convert_op): New.
(scalar_chain::convert_insn): New.
(scalar_chain::convert): New.
(convert_scalars_to_vector): New.
(pass_data_stv): New.
(pass_stv): New.
(make_pass_stv): New.
(ix86_option_override): Create and register stv pass.
(flag_opts): Add -mstv.
(ix86_option_override_internal): Likewise.
* config/i386/i386.md (SWIM1248x): New.
(*movdi_internal): Add xmm to mem alternative for TARGET_STV.
(and3): Use SWIM1248x iterator instead of SWIM.
(*anddi3_doubleword): New.
(*zext_doubleword): New.
(*zextsi_doubleword): New.
(3): Use SWIM1248x iterator instead of SWIM.
(*di3_doubleword): New.
* config/i386/i386.opt (mstv): New.
* dbgcnt.def (stv_conversion): New.

gcc/testsuite/

2015-09-23  Ilya Enkovich  

* gcc.target/i386/pr65105-1.c: New.
* gcc.target/i386/pr65105-2.c: New.
* gcc.target/i386/pr65105-3.c: New.
* gcc.target/i386/pr65105-4.C: New.
* gcc.dg/lower-subreg-1.c: Add -mno-stv options for ia32.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d547cfd..2663f85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "dbgcnt.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2600,6 +2601,908 @@ rest_of_handle_insert_vzeroupper (void)
   return 0;
 }
 
+/* Return 1 if INSN uses or defines a hard register.
+   Hard register uses in a memory address are ignored.
+   Clobbers and flags definitions are ignored.  */
+
+static bool
+has_non_address_hard_reg (rtx_insn *insn)
+{
+  df_ref ref;
+  FOR_EACH_INSN_DEF (ref, insn)
+if (HARD_REGISTER_P (DF_REF_REAL_REG (ref))
+   && !DF_REF_FLAGS_IS_SET (ref, DF_REF_MUST_CLOBBER)
+   && DF_REF_REGNO (ref) != FLAGS_REG)
+  return true;
+
+  FOR_EACH_INSN_USE (ref, insn)
+if (!D

Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-23 Thread Uros Bizjak
On Wed, Sep 23, 2015 at 12:19 PM, Ilya Enkovich  wrote:
> On 14 Sep 17:50, Uros Bizjak wrote:
>>
>> +(define_insn_and_split "*zext_doubleword"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> + (zero_extend:DI (match_operand:SWI24 1 "nonimmediate_operand" "rm")))]
>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>> +  "#"
>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>> +   (set (match_dup 2) (const_int 0))]
>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>> +
>> +(define_insn_and_split "*zextqi_doubleword"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> + (zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>> +  "#"
>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>> +   (set (match_dup 2) (const_int 0))]
>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>> +
>>
>> Please put the above patterns together with other zero_extend
>> patterns. You can also merge these two patterns using SWI124 mode
>> iterator with  mode attribute as a register constraint. Also, no
>> need to check for GENERAL_REG_P after reload, when "r" constraint is
>> in effect:
>>
>> (define_insn_and_split "*zext_doubleword"
>>   [(set (match_operand:DI 0 "register_operand" "=r")
>>  (zero_extend:DI (match_operand:SWI124 1 "nonimmediate_operand" "m")))]
>>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>   "#"
>>   "&& reload_completed"
>>   [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>(set (match_dup 2) (const_int 0))]
>>   "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>
> Register constraint doesn't affect split and I need GENERAL_REG_P to filter 
> other registers case.

OK.

> I merged QI and HI cases of zext but made a separate pattern for SI case 
> because it doesn't need zero_extend in resulting code.  Bootstrapped and 
> regtested for x86_64-unknown-linux-gnu.

This change is OK.

The patch LGTM, but please wait a couple of days if Jeff has some
comment on algorithmic aspect of the patch.

Thanks,
Uros.


Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-09-23 Thread Thomas Schwinge
Hi!

On Fri, 18 Sep 2015 06:51:18 -0700, Cesar Philippidis wrote:
> On 09/18/2015 01:39 AM, Thomas Schwinge wrote:
> 
> > On Tue, 1 Sep 2015 18:29:55 +0200, Tom de Vries wrote:
> >> On 27/08/15 03:37, Cesar Philippidis wrote:
> > >>> -  ctx->ganglocal_size_host = align_and_expand (&gl_host, host_size, align);
> >>
> >> I suspect this caused a bootstrap failure (align_and_expand unused). 
> >> Worked-around as attached.

> > If I remember correctly, this has only ever been used in the "ganglocal"
> > implementation -- which is now gone.  So, should align_and_expand also be
> > elided (Cesar)?
> 
> Most likely. I probably overlooked it when I was working on that
> ganglocal removal patch. Can you remove it please? I'm already juggling
> a couple of patches right now.

Together with removal of printing the declarator for sdata, committed to
gomp-4_0-branch in r228038:

commit f5890b47c1b6f09134c4bfadcc7ece0d5403a1d7
Author: tschwinge 
Date:   Wed Sep 23 10:35:31 2015 +

More "ganglocal" cleanup

gcc/
* config/nvptx/nvptx.c (nvptx_file_start): Don't print declaration
of sdata.
* omp-low.c (align_and_expand): Remove function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228038 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp   |  6 ++
 gcc/config/nvptx/nvptx.c |  1 -
 gcc/omp-low.c| 15 ---
 3 files changed, 6 insertions(+), 16 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 21c6fa0..c66f80a 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2015-09-23  Thomas Schwinge  
+
+   * config/nvptx/nvptx.c (nvptx_file_start): Don't print declaration
+   of sdata.
+   * omp-low.c (align_and_expand): Remove function.
+
 2015-09-22  Cesar Philippidis  
 
* gimplify.c (oacc_default_clause): Inspect pointer types when
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 5640e34..37b50a3 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -4063,7 +4063,6 @@ nvptx_file_start (void)
   else
 fputs ("\t.target\tsm_30\n", asm_out_file);
   fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode));
-  fprintf (asm_out_file, "\t.extern .shared .u8 sdata[];\n");
   fputs ("// END PREAMBLE\n", asm_out_file);
 }
 
diff --git gcc/omp-low.c gcc/omp-low.c
index ee527d0..ec76096 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1446,21 +1446,6 @@ omp_copy_decl (tree var, copy_body_data *cb)
   return error_mark_node;
 }
 
-/* Modify the old size *POLDSZ to align it up to ALIGN, and then return
-   a value with SIZE added to it.  */
-static tree ATTRIBUTE_UNUSED
-align_and_expand (tree *poldsz, tree size, unsigned int align)
-{
-  tree oldsz = *poldsz;
-  oldsz = fold_build2 (BIT_AND_EXPR, size_type_node,
-  fold_build2 (PLUS_EXPR, size_type_node,
-   oldsz, size_int (align - 1)),
-  fold_build1 (BIT_NOT_EXPR, size_type_node,
-   size_int (align - 1)));
-  *poldsz = oldsz;
-  return fold_build2 (PLUS_EXPR, size_type_node, oldsz, size);
-}
-
 /* Debugging dumps for parallel regions.  */
 void dump_omp_region (FILE *, struct omp_region *, int);
 void debug_omp_region (struct omp_region *);
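For reference, the removed `align_and_expand` rounds `*POLDSZ` up to a multiple of `ALIGN` with the usual `(old + align - 1) & ~(align - 1)` trick, stores it back, and returns the rounded value plus `SIZE`. A plain-integer sketch of the same arithmetic, assuming `ALIGN` is a power of two (the real helper built folded trees on `size_type_node` and took an `unsigned int` alignment):

```cpp
#include <cassert>
#include <cstddef>

// Integer model of the removed omp-low.c helper.  ALIGN must be a power
// of two for the mask trick to round up correctly.
std::size_t align_and_expand(std::size_t *poldsz, std::size_t size,
                             std::size_t align)
{
  // Round the old size up to the next multiple of ALIGN.
  std::size_t oldsz = (*poldsz + (align - 1)) & ~(align - 1);
  *poldsz = oldsz;
  // New end offset: aligned start plus the size of the new object.
  return oldsz + size;
}
```

For example, an old size of 5 with an 8-byte object at 4-byte alignment is first rounded to 8, and the function returns 16.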


Grüße,
 Thomas




Re: [Patch/ccmp] Cost instruction sequences to choose better expand order

2015-09-23 Thread Bernd Schmidt

No. Please see NOTE part of the description. AArch64 doesn't cost ccmp
currently. It will be fixed by a separate patch later. The testcase is
thus marked as XFAIL.


I'd prefer to do things in the right order. Your patch is approved, but 
please commit only after you can remove the xfail from the testcase.



Bernd



Re: New post-LTO OpenACC pass

2015-09-23 Thread Bernd Schmidt

On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+   if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+     /* acc_on_device must be evaluated at compile time for
+        constant arguments.  */
+     {
+       oacc_xform_on_device (call);
+       rescan = true;
+     }


Is there a reason this is not done as part of pass_fold_builtins? (It 
looks like maybe adding this to fold_call_stmt in builtins.c would be 
sufficient too).



Bernd


Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-23 Thread Bernd Schmidt

On 09/22/2015 05:11 PM, Marek Polacek wrote:


diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index e0cce84..d2bc264 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -104,6 +104,7 @@ ubsan_instrument_division (location_t loc, tree op0, tree op1)
}
  }
t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op0), t);
+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op1), t);
if (flag_sanitize_undefined_trap_on_error)
  tt = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
else


I really don't know this code, but just before the location you're 
patching, there's this:


  /* In case we have a SAVE_EXPR in a conditional context, we need to
 make sure it gets evaluated before the condition.  If the OP0 is
 an instrumented array reference, mark it as having side effects so
 it's not folded away.  */
  if (flag_sanitize & SANITIZE_BOUNDS)
{
  tree xop0 = op0;
  while (CONVERT_EXPR_P (xop0))
xop0 = TREE_OPERAND (xop0, 0);
  if (TREE_CODE (xop0) == ARRAY_REF)
{
  TREE_SIDE_EFFECTS (xop0) = 1;
  TREE_SIDE_EFFECTS (op0) = 1;
}
}

Does that need to be done for op1 as well? (I really wonder why this is 
needed or whether it's sufficient to find such an ARRAY_REF if you can 
have more complex operands).


The same pattern occurs in another function, so it may be best to break 
it out into a new function if additional occurrences are necessary.
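The comment quoted above is about evaluation order: a SAVE_EXPR is computed once and reused, so its first evaluation must not sit on only one path of a conditional. A conceptual C++ model of the ordering the patch enforces for `op1`, not GCC's tree code (`evaluate_op1`, `evaluations` and `checked_div` are hypothetical names). For instance, if the saved divisor were first computed inside the second arm of `op0 == INT_MIN && op1 == -1`, the later division would read an uninitialized temporary whenever the first arm is false. Evaluating it up front, as the new `COMPOUND_EXPR` does, avoids that.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Counts how often the "SAVE_EXPR" operand is evaluated.
int evaluations = 0;

// Hypothetical stand-in for the divisor expression wrapped in a SAVE_EXPR.
int evaluate_op1(int value) { ++evaluations; return value; }

int checked_div(int op0, int op1_value)
{
  // Like COMPOUND_EXPR <op1, check>: compute the saved value unconditionally.
  const int saved_op1 = evaluate_op1(op1_value);
  // Instrumentation: divide-by-zero and INT_MIN / -1 overflow checks.
  if (saved_op1 == 0 || (op0 == INT_MIN && saved_op1 == -1))
    std::abort();  // stands in for the ubsan runtime call or trap
  // The division reuses the value computed above; no second evaluation.
  return op0 / saved_op1;
}
```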



Bernd


[PATCH] Improve restrict handling further

2015-09-23 Thread Richard Biener

The following fixes

int
f5 (S *__restrict x, S *__restrict y)
{
  x->p[0] = 5;
  y->p[0] = 0;
// { dg-final { scan-tree-dump-times "return 5" 1 "optimized" { xfail *-*-* } } }
  return x->p[0];
}

which requires building representatives for restrict qualified pointers
(as opposed to references or decl-by-references).  The fear here was
that as we can access that representative with out-of-bound objects
(we eventually point to an array) we'd miscompute points-to sets.
I verified we do the obvious thing here, namely glob those accesses
to the first/last subfield of the representative (that code was added
to compensate for pointer arithmetic going out-of-bounds).
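A runnable version of the `f5` testcase for experimenting with the dump. The definition of `S` is not shown in the mail, so the one below (a single `int *p` member) is an assumption; whether the `return 5` fold fires for this particular `S` has not been checked here, and the dg-final line in the real `restrict2.C` is what decides it. At run time, with two distinct objects whose `p` members point to different ints, the function returns 5 regardless of optimization.

```cpp
#include <cassert>

// Hypothetical S: the real testcase's definition is not quoted above.
struct S { int *p; };

// Same body as the f5 testcase; both parameters are restrict-qualified.
// The testcase expects the optimized dump to contain "return 5".
int f5(S *__restrict x, S *__restrict y)
{
  x->p[0] = 5;
  y->p[0] = 0;
  return x->p[0];
}
```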

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-09-23  Richard Biener  

* tree-ssa-structalias.c (intra_create_variable_infos): Build
representatives for all restrict qualified pointer destinations.

* g++.dg/tree-ssa/restrict2.C: Un-XFAIL testcase.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 228014)
--- gcc/tree-ssa-structalias.c  (working copy)
*** intra_create_variable_infos (struct func
*** 5854,5865 
  {
varinfo_t p = get_vi_for_tree (t);
  
!   /* For restrict qualified pointers to objects passed by
!  reference build a real representative for the pointed-to object.
!Treat restrict qualified references the same.  */
!   if (TYPE_RESTRICT (TREE_TYPE (t))
! && ((DECL_BY_REFERENCE (t) && POINTER_TYPE_P (TREE_TYPE (t)))
! || TREE_CODE (TREE_TYPE (t)) == REFERENCE_TYPE)
  && !type_contains_placeholder_p (TREE_TYPE (TREE_TYPE (t
{
  struct constraint_expr lhsc, rhsc;
--- 5854,5865 
  {
varinfo_t p = get_vi_for_tree (t);
  
!   /* For restrict qualified pointers build a representative for
!the pointed-to object.  Note that this ends up handling
!out-of-bound references conservatively by aggregating them
!in the first/last subfield of the object.  */
!   if (POINTER_TYPE_P (TREE_TYPE (t))
! && TYPE_RESTRICT (TREE_TYPE (t))
  && !type_contains_placeholder_p (TREE_TYPE (TREE_TYPE (t
{
  struct constraint_expr lhsc, rhsc;
Index: gcc/testsuite/g++.dg/tree-ssa/restrict2.C
===
*** gcc/testsuite/g++.dg/tree-ssa/restrict2.C   (revision 228014)
--- gcc/testsuite/g++.dg/tree-ssa/restrict2.C   (working copy)
*** f5 (S *__restrict x, S *__restrict y)
*** 45,52 
  {
x->p[0] = 5;
y->p[0] = 0;
! // We might handle this some day
! // { dg-final { scan-tree-dump-times "return 5" 1 "optimized" { xfail *-*-* } } }
return x->p[0];
  }
  
--- 45,51 
  {
x->p[0] = 5;
y->p[0] = 0;
! // { dg-final { scan-tree-dump-times "return 5" 1 "optimized" } }
return x->p[0];
  }
  


Re: [PATCH c-family/49654/49655] reject invalid options in pragma diagnostic

2015-09-23 Thread Bernd Schmidt

On 09/22/2015 08:08 PM, Manuel López-Ibáñez wrote:

Use find_opt instead of linear search through options in
handle_pragma_diagnostic (PR 49654) and reject non-warning options and
options not valid for the current language (PR 49655).



+  /* option_string + 1 to skip the initial '-' */
+  unsigned int lang_mask = c_common_option_lang_mask () | CL_COMMON;
+  unsigned int option_index = find_opt (option_string + 1, lang_mask);


Swap the first two lines to have the comment in the right spot.


+  else if (!(cl_options[option_index].flags & lang_mask))
+{
+  char * ok_langs = write_langs (cl_options[option_index].flags);
+  char * bad_lang = write_langs (c_common_option_lang_mask ());
+  warning_at (loc, OPT_Wpragmas,
+ "option %qs is valid for %s but not for %s",
+ option_string, ok_langs, bad_lang);
+  free (ok_langs);
+  free (bad_lang);
+  return;
+}


Slightly surprising, but I checked and find_opt is documented to return 
an option for a different front end if it can't find a valid one 
matching lang_mask.
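Because `find_opt` can hand back an option belonging to another front end, the caller has to test the option's language bits itself. A small sketch of the two rejections under review, using hypothetical flag constants in place of GCC's `CL_*` bits; the order of the two checks here is an assumption, and only the `flags & lang_mask` test mirrors the quoted hunk directly.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's option flag bits.
constexpr unsigned LANG_C       = 1u << 0;
constexpr unsigned LANG_CXX     = 1u << 1;
constexpr unsigned LANG_FORTRAN = 1u << 2;
constexpr unsigned LANG_COMMON  = 1u << 3;
constexpr unsigned KIND_WARNING = 1u << 4;

enum class PragmaCheck { ok, not_a_warning, wrong_language };

// OPTION_FLAGS describes an option already found by name; FRONTEND_MASK is
// the current front end's language bits (like c_common_option_lang_mask).
PragmaCheck check_pragma_option(unsigned option_flags, unsigned frontend_mask)
{
  const unsigned lang_mask = frontend_mask | LANG_COMMON;
  if (!(option_flags & KIND_WARNING))
    return PragmaCheck::not_a_warning;   // e.g. an optimization option
  if (!(option_flags & lang_mask))
    return PragmaCheck::wrong_language;  // valid only for another front end
  return PragmaCheck::ok;
}
```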


Patch is ok.


Bernd


Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Kyrill Tkachov


On 23/09/15 11:10, Richard Biener wrote:

On Wed, 23 Sep 2015, Kyrill Tkachov wrote:


On 23/09/15 10:09, Pinski, Andrew wrote:

On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov 
wrote:



On 22/09/15 20:31, Jeff Law wrote:

On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
Hi all,
Unfortunately, I see a testsuite regression with this patch:
FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"

The reduced part of that test is:
void
test1 (int x, unsigned u)
{
 if ((1U << x) != 64
 || (2 << x) != u
 || (x << x) != 384
 || (3 << x) == 9
 || (x << 14) != 98304U
 || (1 << x) == 14
 || (3 << 2) != 12)
   __builtin_abort ();
}

The patched ifcombine pass works more or less as expected and produces
fewer basic blocks.
Before this patch a relevant part of the ifcombine dump for test1 is:
;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
 if (x_1(D) != 6)
   goto ;
 else
   goto ;

;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
 _2 = 2 << x_1(D);
 _3 = (unsigned intD.10) _2;
 if (_3 != u_4(D))
   goto ;
 else
   goto ;


After this patch it is:
;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
 _2 = 2 << x_1(D);
 _3 = (unsigned intD.10) _2;
 _9 = _3 != u_4(D);
 _10 = x_1(D) != 6;
 _11 = _9 | _10;
 if (_11 != 0)
   goto ;
 else
   goto ;

The second form ends up generating worse codegen however, and the
badness starts with the dom1 pass.
In the unpatched case it manages to deduce that x must be 6 by the time
it reaches basic block 3, and uses that information to eliminate the
shift in "_2 = 2 << x_1(D)" from basic block 3.
In the patched case it is unable to make that call, I think because the
x != 6 condition is IORed with another test.
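The two dump excerpts above can be restated as ordinary C++ for the first two conditions of `test1`. This is a hand-written model, not compiler output; the function names are hypothetical and returning `true` stands for reaching `__builtin_abort`. Both shapes compute the same answer, but only in the first one does the shift sit on a path where `x == 6` is already known.

```cpp
#include <cassert>

// Before the patch: two tests in separate blocks.  On the path that reaches
// the shift, x == 6 holds, so 2 << x can be folded to 128.
bool aborts_before_ifcombine(int x, unsigned u)
{
  if (x != 6)
    return true;                                  // bb 2
  unsigned t = static_cast<unsigned>(2 << x);     // bb 3: _2, _3
  return t != u;
}

// After the patch: both comparisons live in one block and are IORed
// (_11 = _9 | _10), so the shift runs before x has been tested.
// In C++ the shift is only well-defined here for 0 <= x <= 29.
bool aborts_after_ifcombine(int x, unsigned u)
{
  unsigned t = static_cast<unsigned>(2 << x);     // _2, _3
  bool c9 = t != u;                               // _9
  bool c10 = x != 6;                              // _10
  return c9 | c10;                                // _11
}
```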

I'm not familiar with the internals of the dom pass, so I'm not sure
where to go looking for a fix for this.
Is the ifcombine change a step in the right direction? If so, what
would need to be done to fix the issue with the dom pass?

I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
true, then _11 is true.  But we can't reasonably record any kind of
equivalence for _9 or _10 individually.

If the statement
_11 = _9 | _10;

Were changed to

_11 = _9 & _10;

Then we could record something useful about _9 and _10.
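The asymmetry Jeff describes can be checked with a truth table: on the edge where `a & b` is true, `a` is known to be true, but `a | b` being true says nothing about `a` alone. A small sketch (all names hypothetical):

```cpp
#include <cassert>

// Does knowing op(a, b) == true force a == true?  That is the kind of
// fact DOM could record on the true edge of the combined condition.
bool true_result_implies_a(bool (*op)(bool, bool))
{
  const bool vals[] = {false, true};
  for (bool a : vals)
    for (bool b : vals)
      if (op(a, b) && !a)
        return false;  // result is true while a is false: nothing to record
  return true;
}

bool ior_op(bool a, bool b) { return a | b; }
bool and_op(bool a, bool b) { return a & b; }
```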



I suppose what we want is to not combine basic blocks if the sequence
and conditions of the basic blocks are such that dom can potentially
exploit them, but how do we express that?

I don't think there's going to be a way to directly express that.  You
could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
because of the impact on DOM, but that in and of itself may not resolve
the situation either.

I think the question we need to answer is whether or not your changes
are generally better, even if there's specific instances where they make
things worse.  If the benefits outweigh the negatives then we can xfail
that test.

Ok, I'll investigate and benchmark some more.
Andrew, this transformation to ifcombine (together with the restriction
that the inner condition block has to be a single comparison) was added
by you with r204194. Is there a particular reason for that restriction
and why it is applied to the inner block and not either?

My reasoning at the time was there might be an "expensive" instruction or
one that might trap (I did not check to see if the other part of the code
was detecting that).
The outer block did not need any checks as we have something like
...
If (a)
If (b)

Or

If (a)
Goto f
else if (b)
   
Else
{
F:

}

And there was no need to check what was before the if (a) part just what is
in between the two ifs.

Ah, because the code in outer_cond_bb would have to be executed anyway whether
we perform the conversion or not, right?

All ifcombine transforms make the outer condition unconditionally
true/false thus the check should have been on whether the outer
cond BB is "empty".  Which would solve your problem, right?


I'm not sure I follow. Why does the cond bb have to be empty?



Note that other transforms (bit test recognition) don't care (sth
we might want to fix?).

In general this needs a better cost function, maybe simply use
estimate_num_insns with speed estimates and compare against a
new --param.


Thanks, that looks like a starting point.
If we were to add some kind of costing check here, would we even need
the checks mentioned above? I don't think it will affect correctness
(the inner cond bb is checked for no side-effects before entering this
function).

Thanks,
Kyrill



Thanks,
Richard.


Thanks,
Kyrill


What I mean by expensive for an example is division or some function call.
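Andrew's division example shows why the contents of the inner block matter: combining makes the inner computation unconditional. A C++ sketch with hypothetical function names; `naively_combined` is what a merge that ignored the inner block's contents would have to evaluate, and it must not be called with `d == 0`.

```cpp
#include <cassert>

// Nested form: the division only runs once d != 0 has been established.
bool nested(int n, int d)
{
  if (d != 0)
    {
      if (n / d > 1)
        return true;
    }
  return false;
}

// Combined form: both conditions are evaluated before the OR/AND, so the
// division executes even when d == 0 (and may trap).  Unsafe for d == 0.
bool naively_combined(int n, int d)
{
  bool outer = d != 0;
  bool inner = n / d > 1;  // hoisted out of the guarded block
  return outer & inner;
}
```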

Thanks,
Andrew



Thanks,
Kyrill




jeff






Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Richard Biener
On Wed, 23 Sep 2015, Kyrill Tkachov wrote:

> 
> On 23/09/15 11:10, Richard Biener wrote:
> > On Wed, 23 Sep 2015, Kyrill Tkachov wrote:
> > 
> > > On 23/09/15 10:09, Pinski, Andrew wrote:
> > > > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov 
> > > > > wrote:
> > > > > 
> > > > > 
> > > > > > On 22/09/15 20:31, Jeff Law wrote:
> > > > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
> > > > > > > Hi all,
> > > > > > > Unfortunately, I see a testsuite regression with this patch:
> > > > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
> > > > > > > 
> > > > > > > The reduced part of that test is:
> > > > > > > void
> > > > > > > test1 (int x, unsigned u)
> > > > > > > {
> > > > > > >  if ((1U << x) != 64
> > > > > > >  || (2 << x) != u
> > > > > > >  || (x << x) != 384
> > > > > > >  || (3 << x) == 9
> > > > > > >  || (x << 14) != 98304U
> > > > > > >  || (1 << x) == 14
> > > > > > >  || (3 << 2) != 12)
> > > > > > >__builtin_abort ();
> > > > > > > }
> > > > > > > 
> > > > > > > The patched ifcombine pass works more or less as expected and
> > > > > > > produces
> > > > > > > fewer basic blocks.
> > > > > > > Before this patch a relevant part of the ifcombine dump for test1
> > > > > > > is:
> > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > > >  if (x_1(D) != 6)
> > > > > > >goto ;
> > > > > > >  else
> > > > > > >goto ;
> > > > > > > 
> > > > > > > ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
> > > > > > >  _2 = 2 << x_1(D);
> > > > > > >  _3 = (unsigned intD.10) _2;
> > > > > > >  if (_3 != u_4(D))
> > > > > > >goto ;
> > > > > > >  else
> > > > > > >goto ;
> > > > > > > 
> > > > > > > 
> > > > > > > After this patch it is:
> > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > > >  _2 = 2 << x_1(D);
> > > > > > >  _3 = (unsigned intD.10) _2;
> > > > > > >  _9 = _3 != u_4(D);
> > > > > > >  _10 = x_1(D) != 6;
> > > > > > >  _11 = _9 | _10;
> > > > > > >  if (_11 != 0)
> > > > > > >goto ;
> > > > > > >  else
> > > > > > >goto ;
> > > > > > > 
> > > > > > > The second form ends up generating worse codegen however, and the
> > > > > > > badness starts with the dom1 pass.
> > > > > > > In the unpatched case it manages to deduce that x must be 6 by the
> > > > > > > time
> > > > > > > it reaches basic block 3 and
> > > > > > > uses that information to eliminate the shift in "_2 = 2 << x_1(D)"
> > > > > > > from
> > > > > > > basic block 3
> > > > > > > In the patched case it is unable to make that call, I think
> > > > > > > because
> > > > > > > the
> > > > > > > x != 6 condition is IORed
> > > > > > > with another test.
> > > > > > > 
> > > > > > > I'm not familiar with the internals of the dom pass, so I'm not
> > > > > > > sure
> > > > > > > where to go looking for a fix for this.
> > > > > > > Is the ifcombine change a step in the right direction? If so, what
> > > > > > > would
> > > > > > > need to be done to fix the issue with
> > > > > > > the dom pass?
> > > > > > I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
> > > > > > true, then _11 is true.  But we can't reasonably record any kind of
> > > > > > equivalence for _9 or _10 individually.
> > > > > > 
> > > > > > If the statement
> > > > > > _11 = _9 | _10;
> > > > > > 
> > > > > > Were changed to
> > > > > > 
> > > > > > _11 = _9 & _10;
> > > > > > 
> > > > > > Then we could record something useful about _9 and _10.
> > > > > > 
> > > > > > 
> > > > > > > I suppose what we want is to not combine basic blocks if the
> > > > > > > sequence
> > > > > > > and conditions of the basic blocks are
> > > > > > > such that dom can potentially exploit them, but how do we express
> > > > > > > that?
> > > > > > I don't think there's going to be a way to directly express that.
> > > > > > You
> > > > > > could essentially claim that TRUTH_OR is more expensive than
> > > > > > TRUTH_AND
> > > > > > because of the impact on DOM, but that in and of itself may not
> > > > > > resolve
> > > > > > the situation either.
> > > > > > 
> > > > > > I think the question we need to answer is whether or not your
> > > > > > changes
> > > > > > are generally better, even if there's specific instances where they
> > > > > > make
> > > > > > things worse.  If the benefits outweigh the negatives then we can
> > > > > > xfail
> > > > > > that test.
> > > > > Ok, I'll investigate and benchmark some more.
> > > > > Andrew, this transformation to ifcombine (together with the
> > > > > restriction
> > > > > that the inner condition block
> > > > > has to be a single comparison) was added by you with r204194.
> > > > > Is there a particular reason for that restriction and why it is
> > > > > applied to
> > > > > the inner block and not either?
> > > > My reasoning at the time was there might be 

[v3 patch] Fix filesystem::create_directories() function

2015-09-23 Thread Jonathan Wakely

This function wasn't working properly (testing is useful!)

Tested x86_64-linux, powerpc64le-linux and x86_64-dragonfly4.1,
committed to trunk.

commit 9f9ee62dc3e3d5a1cc825298b93afedc2eaf0aeb
Author: Jonathan Wakely 
Date:   Tue Sep 22 23:43:59 2015 +0100

Fix filesystem::create_directories() function

* src/filesystem/ops.cc (is_dot, is_dotdot): Define new helpers.
(create_directories): Fix error handling.
* testsuite/experimental/filesystem/operations/create_directories.cc:
New.

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index b5c8eb9..5ff8120 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -85,6 +85,24 @@ fs::absolute(const path& p, const path& base)
 
 namespace
 {
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  inline bool is_dot(wchar_t c) { return c == L'.'; }
+#else
+  inline bool is_dot(char c) { return c == '.'; }
+#endif
+
+  inline bool is_dot(const fs::path& path)
+  {
+const auto& filename = path.native();
+return filename.size() == 1 && is_dot(filename[0]);
+  }
+
+  inline bool is_dotdot(const fs::path& path)
+  {
+const auto& filename = path.native();
+return filename.size() == 2 && is_dot(filename[0]) && is_dot(filename[1]);
+  }
+
   struct free_as_in_malloc
   {
 void operator()(void* p) const { ::free(p); }
@@ -576,19 +594,36 @@ fs::create_directories(const path& p)
 bool
 fs::create_directories(const path& p, error_code& ec) noexcept
 {
+  if (p.empty())
+{
+  ec = std::make_error_code(errc::invalid_argument);
+  return false;
+}
   std::stack missing;
   path pp = p;
-  ec.clear();
-  while (!p.empty() && !exists(pp, ec) && !ec.value())
+
+  while (!pp.empty() && status(pp, ec).type() == file_type::not_found)
 {
-  missing.push(pp);
-  pp = pp.parent_path();
+  ec.clear();
+  const auto& filename = pp.filename();
+  if (!is_dot(filename) && !is_dotdot(filename))
+   missing.push(pp);
+  pp.remove_filename();
 }
-  while (!missing.empty() && !ec.value())
+
+  if (ec || missing.empty())
+return false;
+
+  do
 {
-  create_directory(missing.top(), ec);
+  const path& top = missing.top();
+  create_directory(top, ec);
+  if (ec && is_directory(top))
+   ec.clear();
   missing.pop();
 }
+  while (!missing.empty() && !ec);
+
   return missing.empty();
 }
 
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
new file mode 100644
index 000..b84d966
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
@@ -0,0 +1,75 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11 -lstdc++fs" }
+// { dg-require-filesystem-ts "" }
+
+#include 
+#include 
+#include 
+
+namespace fs = std::experimental::filesystem;
+
+void
+test01()
+{
+  bool test __attribute__((unused)) = false;
+  std::error_code ec;
+
+  // Test empty path.
+  bool b = fs::create_directories( "", ec );
+  VERIFY( ec );
+  VERIFY( !b );
+
+  // Test existing path.
+  b = fs::create_directories( fs::current_path(), ec );
+  VERIFY( !ec );
+  VERIFY( !b );
+
+  // Test non-existent path.
+  const auto p = __gnu_test::nonexistent_path();
+  b = fs::create_directories( p, ec );
+  VERIFY( !ec );
+  VERIFY( b );
+  VERIFY( is_directory(p) );
+
+  b = fs::create_directories( p/".", ec );
+  VERIFY( !ec );
+  VERIFY( !b );
+
+  b = fs::create_directories( p/"..", ec );
+  VERIFY( !ec );
+  VERIFY( !b );
+
+  b = fs::create_directories( p/"d1/d2/d3", ec );
+  VERIFY( !ec );
+  VERIFY( b );
+  VERIFY( is_directory(p/"d1/d2/d3") );
+
+  b = fs::create_directories( p/"./d4/../d5", ec );
+  VERIFY( !ec );
+  VERIFY( b );
+  VERIFY( is_directory(p/"./d4/../d5") );
+
+  remove_all(p, ec);
+}
+
+int
+main()
+{
+  test01();
+}
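The patched `create_directories` walks upward from `p`, pushing every missing ancestor except `.` and `..` components, and then creates the stacked directories from the outermost one inward. The following is a minimal sketch of just that first phase. It makes a few assumptions. It uses C++17 `std::filesystem` rather than the TS. It steps upward with `parent_path()`, because in C++17 `remove_filename()` leaves a trailing separator (`"a/b"` becomes `"a/"`). The `exists` predicate is injected so the walk can be checked without touching the disk. `missing_dirs` is a hypothetical name and not part of libstdc++.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <stack>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: the directories create_directories(p) would need to
// create, outermost first. "." and ".." components are never pushed,
// mirroring the is_dot/is_dotdot checks in the patched loop.
std::vector<fs::path>
missing_dirs(fs::path p, const std::function<bool(const fs::path&)>& exists)
{
  std::stack<fs::path> missing;
  while (!p.empty() && !exists(p))
    {
      const std::string name = p.filename().string();
      if (name != "." && name != "..")
        missing.push(p);
      fs::path parent = p.parent_path();
      if (parent == p)   // reached the root; nothing left to strip
        break;
      p = parent;
    }

  std::vector<fs::path> order;   // creation order: top of the stack first
  while (!missing.empty())
    {
      order.push_back(missing.top());
      missing.pop();
    }
  return order;
}
```

For `p/./d4/../d5`, the walk skips `..` and `.` and yields `p/./d4` followed by `p/./d4/../d5`. Creating them in that order is what lets the last case in the testcase succeed.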


Re: [PATCH c-family/49654/49655] reject invalid options in pragma diagnostic

2015-09-23 Thread Marek Polacek
On Tue, Sep 22, 2015 at 08:08:28PM +0200, Manuel López-Ibáñez wrote:
> +  else if (!(cl_options[option_index].flags & lang_mask))
> +{
> +  char * ok_langs = write_langs (cl_options[option_index].flags);
> +  char * bad_lang = write_langs (c_common_option_lang_mask ());

Please remove the spaces after * when you commit the patch.

Thanks,

Marek


[v3 patch] Fix Filesystem TS directory iterators

2015-09-23 Thread Jonathan Wakely

directory_iterator and recursive_directory_iterator fail to meet this
requirement in http://wg21.link/n4099#Class-directory_iterator

 The directory_iterator default constructor shall create an iterator
 equal to the end iterator value, and this shall be the only valid
 iterator for the end condition.

The current code creates the end iterator when an error occurs during
construction and an error_code parameter was used (so an exception
is not thrown, but construction finishes normally and sets the
error_code).

This fixes it by creating a distinct error state that is not the end
iterator state:

 // An error occurred, we need a non-empty shared_ptr so that *this will
 // not compare equal to the end iterator.
 _M_dir.reset(static_cast(nullptr));

This way the shared_ptr owns a null pointer, so (bool)_M_dir is false
(and we don't allow incrementing or dereferencing) but it can be
distinguished from an empty shared_ptr by comparing them using
shared_ptr::owner_before.

(The order of the owner_before checks is chosen so that the common
case of testing iter != directory_iterator() should short-circuit and
only check the first condition).
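The following small standalone sketch shows the distinction this relies on. A default-constructed `shared_ptr` has no control block. A `shared_ptr` reset to a typed null pointer owns a control block whose stored pointer is null. `operator==` compares only the stored pointers, so it cannot tell the two apart, but `owner_before` can. `Dir` and `same_owner` are hypothetical stand-ins, not the library's `_Dir` or its actual `operator==`.

```cpp
#include <cassert>
#include <memory>

struct Dir { };  // hypothetical stand-in for the iterator's private _Dir

// Owner-based equivalence: neither pointer orders before the other.
bool same_owner(const std::shared_ptr<Dir>& a, const std::shared_ptr<Dir>& b)
{
  return !a.owner_before(b) && !b.owner_before(a);
}

// Error state: a control block that owns a null Dir*.
std::shared_ptr<Dir> make_error_state()
{
  std::shared_ptr<Dir> p;
  p.reset(static_cast<Dir*>(nullptr));
  return p;
}
```

Both states convert to `false`, so neither can be dereferenced or incremented. Only the owner-based check separates the error state from the end iterator.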

There were a few other problems with directory iterators, including
the fact that the get_file_type function never worked because autoconf
was defining _GLIBCXX_GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE instead of
the macro I was checking, _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE.
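The following is a minimal sketch of why a misnamed feature macro fails silently. A `get_file_type`-style helper compiles its `d_type` branch only when the macro is defined, and otherwise falls back to "unknown" for every entry. `file_type_of` is a hypothetical name, and returning `file_type::none` to mean "not known, call stat()" is this sketch's convention. It assumes a POSIX `<dirent.h>` whose `DT_*` constants are visible, as they are with glibc under g++'s default `_GNU_SOURCE`.

```cpp
#include <cassert>
#include <cstring>
#include <dirent.h>
#include <filesystem>

namespace fs = std::filesystem;

// HAVE_STRUCT_DIRENT_D_TYPE is assumed to come from a configure check, as in
// the patch. If configure defines a differently named macro, the #else branch
// is silently used everywhere and no compile error occurs.
fs::file_type file_type_of(const ::dirent& d)
{
#ifdef HAVE_STRUCT_DIRENT_D_TYPE
  switch (d.d_type)
    {
    case DT_REG:  return fs::file_type::regular;
    case DT_DIR:  return fs::file_type::directory;
    case DT_LNK:  return fs::file_type::symlink;
    case DT_FIFO: return fs::file_type::fifo;
    case DT_SOCK: return fs::file_type::socket;
    case DT_BLK:  return fs::file_type::block;
    case DT_CHR:  return fs::file_type::character;
    default:      return fs::file_type::none;  // DT_UNKNOWN: caller must stat()
    }
#else
  (void) d;
  return fs::file_type::none;  // no d_type available: caller must stat()
#endif
}
```

If you build this without `-DHAVE_STRUCT_DIRENT_D_TYPE`, every entry reports `file_type::none`, even on a system whose `struct dirent` has the member.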

I've removed the ErrorCode utility that was meant to simplify
clearing/setting an error_code that may or may not be present, but
really just obfuscated things.

I'm also now consistently checking the skip_permission_denied flag
everywhere it matters.

Tested x86_64-linux, powerpc64le-linux, x86_64-dragonfly4.1, committed
to trunk.


commit 8d08e1c6724cb433e1ca4f975ce85bd277ba2389
Author: Jonathan Wakely 
Date:   Wed Sep 23 00:28:19 2015 +0100

Fix semantics of Filesystem TS directory iterators

[class.directory_iterator] p4 and [directory_iterator.members] p4
require that only the default constructor and ignored permission denied
errors can create the end iterator.

* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Remove _GLIBCXX_
prefix from HAVE_STRUCT_DIRENT_D_TYPE.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/experimental/fs_dir.h (operator==, operator==):
Use owner_before instead of pointer equality.
(directory_iterator(std::shared_ptr<_Dir>, error_code*)): Remove.
* src/filesystem/dir.cc (ErrorCode): Remove.
(_Dir::advance): Change ErrorCode parameter to error_code*, add
directory_options parameter and check it on error.
(opendir): Rename to open_dir to avoid clashing with macro. Change
ErrorCode parameter to error_code*.
(make_shared_dir): Remove.
(native_readdir) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Don't set errno.
(directory_iterator(std::shared_ptr<_Dir>, error_code*)): Remove.
(directory_iterator(const path&, directory_options, error_code*)):
Pass options to _Dir::advance and create non-end iterator on error.
(recursive_directory_iterator(const path&, directory_options,
error_code*)): Clear error_code on ignored error, create non-end
iterator otherwise.
(recursive_directory_iterator::increment): Pass _M_options to
_Dir::advance.
(recursive_directory_iterator::pop): Likewise.
* testsuite/experimental/filesystem/iterators/directory_iterator.cc:
New.
* testsuite/experimental/filesystem/iterators/
recursive_directory_iterator.cc: New.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index c133c25..4b031f7 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3940,7 +3940,7 @@ dnl
   [glibcxx_cv_dirent_d_type=no])
   ])
   if test $glibcxx_cv_dirent_d_type = yes; then
-AC_DEFINE(_GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE, 1, [Define to 1 if `d_type' is a member of `struct dirent'.])
+AC_DEFINE(HAVE_STRUCT_DIRENT_D_TYPE, 1, [Define to 1 if `d_type' is a member of `struct dirent'.])
   fi
   AC_MSG_RESULT($glibcxx_cv_dirent_d_type)
 dnl
diff --git a/libstdc++-v3/include/experimental/fs_dir.h b/libstdc++-v3/include/experimental/fs_dir.h
index d46d41b..0c5253f 100644
--- a/libstdc++-v3/include/experimental/fs_dir.h
+++ b/libstdc++-v3/include/experimental/fs_dir.h
@@ -201,14 +201,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   return __tmp;
 }
 
-friend bool
-operator==(const directory_iterator& __lhs,
-   const directory_iterator& __rhs)
-{ return __lhs._M_dir == __rhs._M_dir; }
-
   private:
 directory_iterator(const path&, directory_options, error_code*);
-directory_iterator(std::shared_ptr<_Dir>, error_code*);
+
+friend bool
+operator==(const directory_iterator& __lhs,
+   const directory_iterator& __rhs);
 
 friend class recursive_directory_iterator;
 

Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.

2015-09-23 Thread Jonathan Wakely

On 17/09/15 09:37 -0600, Martin Sebor wrote:

On 09/17/2015 05:16 AM, Jonathan Wakely wrote:

On 16/09/15 17:42 -0600, Martin Sebor wrote:

I see now the first exists test will detect symlink loops in
the original path. But I'm not convinced there isn't a corner
case that's subject to a TOCTOU race condition between the first
exists test and the while loop during which a symlink loop can
be introduced.

Suppose we call the function with /foo/bar as an argument and
the path exists and contains no symlinks. result is / and cmpts
is set to { foo, bar }. Just as the loop is entered, /foo/bar
is replaced with a symlink containing /foo/bar. The loop then
proceeds like so:

1. The first iteration removes foo from cmpts and sets result
to /foo. cmpts is { bar }.

2. The second iteration removes bar from cmpts, sets result to
/foo/bar, determines it's a symlink, reads its contents, sees
it's an absolute pathname and replaces result with /. It then
inserts the symlink's components { foo, bar } into cmpts. cmpts
becomes { foo, bar }. exists(result) succeeds.

3. The next iteration of the loop has the same initial state
as the first.

But I could have very easily missed something that takes care
of this corner case. If I did, sorry for the false alarm!


No, you're right. The TS says such filesystem races are undefined:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4099.html#fs.race.behavior

but it would be nice to fail gracefully rather than DOS the
application.

The simplest approach would be to increment a counter every time we
follow a symlink, and if it reaches some limit decide something is
wrong and fail with ELOOP.

I don't see how anything else can be 100% bulletproof, because a truly
evil attacker could just keep altering the contents of symlinks so we
keep ping-ponging between two or more paths. If we keep track of paths
we've seen before the attacker could just keep changing the contents
to a unique path each time, that initially exists as a file, but by
the time we get to is_symlink() it's become a symlink to a new path.

So if we use a counter, what's a sane maximum? Is MAXSYMLINKS in
 the value the kernel uses? 20 seems quite low, I was
thinking of a much higher number.


Yes, it is a corner case, and it's not really avoidable in the case
of hard links. For symlinks, POSIX defines the SYMLOOP_MAX constant
as the maximum, with the _SC_SYMLOOP_MAX and _PC_SYMLOOP_MAX
sysconf and pathconf variables. Otherwise 40 seems reasonable.

With this, I'll let you get back to work -- I think we've beat this
function to death ;)
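The counter approach discussed above can be sketched as follows. This is a simulation over an in-memory map of link names to targets, not real `read_symlink` calls. `resolve` and `symlink_limit` are hypothetical names. The limit comes from `sysconf(_SC_SYMLOOP_MAX)` when the system reports a positive value and falls back to 40 otherwise; the committed patch simply hard-codes 40. What matters for the race Martin describes is that the loop performs at most `limit` hops, however the links change, before it reports `ELOOP`.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <system_error>
#include <unistd.h>

// Symlink-hop budget: POSIX SYMLOOP_MAX via sysconf when the system reports a
// value, otherwise the fallback of 40 suggested in the thread.
long symlink_limit()
{
#ifdef _SC_SYMLOOP_MAX
  const long n = ::sysconf(_SC_SYMLOOP_MAX);
  if (n > 0)
    return n;
#endif
  return 40;
}

// Follow a chain of simulated symlinks (name -> target). Each hop spends one
// unit of the budget; when it runs out, fail with ELOOP instead of spinning.
std::string resolve(const std::map<std::string, std::string>& links,
                    std::string name, long limit, std::error_code& ec)
{
  ec.clear();
  for (auto it = links.find(name); it != links.end(); it = links.find(name))
    {
      if (limit-- <= 0)
        {
          ec.assign(ELOOP, std::generic_category());
          return {};
        }
      name = it->second;
    }
  return name;
}
```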


Here's what I committed. Similar to the last patch, but using the new
is_dot and is_dotdot helpers.


commit 8128173a00c234ccf34e258115747fa0e3b4457a
Author: Jonathan Wakely 
Date:   Wed Sep 23 02:00:57 2015 +0100

Limit number of symlinks that canonical() will resolve

* src/filesystem/ops.cc (canonical): Simplify error handling and
limit number of symlinks that can be resolved.

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 5ff8120..7b261fb 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -116,6 +116,7 @@ fs::canonical(const path& p, const path& base, error_code& ec)
 {
   const path pa = absolute(p, base);
   path result;
+
 #ifdef _GLIBCXX_USE_REALPATH
   char_ptr buf{ nullptr };
 # if _XOPEN_VERSION < 700
@@ -137,18 +138,9 @@ fs::canonical(const path& p, const path& base, error_code& ec)
 }
 #endif
 
-  auto fail = [&ec, &result](int e) mutable {
-  if (!ec.value())
-   ec.assign(e, std::generic_category());
-  result.clear();
-  };
-
   if (!exists(pa, ec))
-{
-  fail(ENOENT);
-  return result;
-}
-  // else we can assume no unresolvable symlink loops
+return result;
+  // else: we know there are (currently) no unresolvable symlink loops
 
   result = pa.root_path();
 
@@ -156,20 +148,19 @@ fs::canonical(const path& p, const path& base, error_code& ec)
   for (auto& f : pa.relative_path())
 cmpts.push_back(f);
 
-  while (!cmpts.empty())
+  int max_allowed_symlinks = 40;
+
+  while (!cmpts.empty() && !ec)
 {
   path f = std::move(cmpts.front());
   cmpts.pop_front();
 
-  if (f.compare(".") == 0)
+  if (is_dot(f))
{
- if (!is_directory(result, ec))
-   {
- fail(ENOTDIR);
- break;
-   }
+ if (!is_directory(result, ec) && !ec)
+   ec.assign(ENOTDIR, std::generic_category());
}
-  else if (f.compare("..") == 0)
+  else if (is_dotdot(f))
{
  auto parent = result.parent_path();
  if (parent.empty())
@@ -184,27 +175,30 @@ fs::canonical(const path& p, const path& base, error_code& ec)
  if (is_symlink(result, ec))
{
  path link = read_symlink(result, ec);
- if (!ec.value())
+ if (!ec)
{
- if (link.is_absolute(

[SH][committed] Fix PR 67391

2015-09-23 Thread Oleg Endo
Hi,

The attached patch fixes PR 67391.  Some additional register overlap checks
were added to the addsi3 patterns while making LRA work on SH, but not all
of them seem to be good.  Removing them seems to work just fine.
Tested on sh-elf (LRA enabled) with make -k check
RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
and by Kaz on sh4-linux.

Committed to trunk as r228046 and to the GCC 5 branch as r228047.

Cheers,
Oleg

gcc/ChangeLog:
PR target/67391
* config/sh/sh.md (addsi3, *addsi3_compact): Don't check for overlapping
regs when matching the pattern.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 228020)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2129,11 +2129,6 @@
 {
   if (TARGET_SHMEDIA)
 operands[1] = force_reg (SImode, operands[1]);
-  else if (! arith_operand (operands[2], SImode))
-{
-  if (reg_overlap_mentioned_p (operands[0], operands[1]))
-	FAIL;
-}
 })
 
 (define_insn "addsi3_media"
@@ -2172,10 +2167,7 @@
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,&u")
 	(plus:SI (match_operand:SI 1 "arith_operand" "%0,r")
 		 (match_operand:SI 2 "arith_or_int_operand" "rI08,rn")))]
-  "TARGET_SH1
-   && ((rtx_equal_p (operands[0], operands[1])
-&& arith_operand (operands[2], SImode))
-   || ! reg_overlap_mentioned_p (operands[0], operands[1]))"
+  "TARGET_SH1"
   "@
 	add	%2,%0
 	#"


Re: New post-LTO OpenACC pass

2015-09-23 Thread Nathan Sidwell

On 09/23/15 06:59, Bernd Schmidt wrote:

On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+  /* acc_on_device must be evaluated at compile time for
+ constant arguments.  */
+  {
+oacc_xform_on_device (call);
+rescan = true;
+  }


Is there a reason this is not done as part of pass_fold_builtins? (It looks like
maybe adding this to fold_call_stmt in builtins.c would be sufficient too).


Perhaps it could be.  I'll need to check where that pass happens.  Anyway, the 
main thrust of this patch is the new pass, which I thought might be easier to 
review with minimal additional clutter.


nathan


Re: [gomp4] Another oacc reduction simplification

2015-09-23 Thread Nathan Sidwell

On 09/23/15 04:02, Thomas Schwinge wrote:

Hi!

On Tue, 22 Sep 2015 11:29:37 -0400, Nathan Sidwell  wrote:

I've committed this patch, which simplifies the generation of openacc reduction
code.


Aside from the progression mentioned in
,
this change is also causing a regression:

 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 35)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 58)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 62)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 81)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 85)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 89)
 [-PASS:-]{+FAIL: c-c++-common/goacc/routine-7.c (internal compiler error)+}
 {+FAIL:+} c-c++-common/goacc/routine-7.c (test for excess errors)


Odd.   I didn't see any new fails.  Will look

nathan
--
Nathan Sidwell


Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Nathan Sidwell

On 09/23/15 05:27, Thomas Schwinge wrote:

Hi Nathan!

On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:

I've committed this patch to add a new pair of internal functions.  These will
be used in implementing reductions.

They'll be emitted around reduction finalization, and implement the locking
required for the general case of combining reduction values.  They may be
transformed in the oacc_xform pass, and the default behaviour is to delete them,
if there is no RTL expander.  For PTX we delete them if they are at the vector
level.

This avoids needing machine-specific builtins to expand to, and thus should
result in less backend code duplication.


With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
removed, should the gcc.target/nvptx/spinlock-1.c and
gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
these be re-written differently?


Confused.  I don't think I removed those locks.  Certainly didn't intend to, and 
I would have expected massive test failures if I had.


nathan

--
Nathan Sidwell


[gomp4] vector reductions

2015-09-23 Thread Nathan Sidwell
I've committed this reimplementation of the vector shuffling code.  In preparing 
a fix for the worker reductions (to use a lockless scheme), I wanted to check 
VIEW_CONVERT_EXPR DTRT.  Use of gimplify_assign also reduces the code size.


nathan
2015-09-23  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_generate_vector_shuffle):
	Reimplement using integer builtins and VIEW_CONVERT_EXPR.
	(nvptx_goacc_reduction_fini): Pass location to
	nvptx_generate_vector_shuffle.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 228021)
+++ config/nvptx/nvptx.c	(working copy)
@@ -4478,68 +4478,43 @@ nvptx_get_worker_red_addr_fn (tree var,
will cast the variable if necessary.  */
 
 static void
-nvptx_generate_vector_shuffle (tree dest_var, tree var, int shfl,
+nvptx_generate_vector_shuffle (location_t loc,
+			   tree dest_var, tree var, unsigned shift,
 			   gimple_seq *seq)
 {
-  tree vartype = TREE_TYPE (var);
-  enum nvptx_builtins fn = NVPTX_BUILTIN_SHUFFLE_DOWN;
-  machine_mode mode = TYPE_MODE (vartype);
-  tree casted_dest = dest_var;
-  tree casted_var = var;
-  tree call_arg_type;
+  unsigned fn = NVPTX_BUILTIN_SHUFFLE_DOWN;
+  tree_code code = NOP_EXPR;
+  tree type = unsigned_type_node;
 
-  switch (mode)
+  switch (TYPE_MODE (TREE_TYPE (var)))
 {
+case SFmode:
+  code = VIEW_CONVERT_EXPR;
+  /* FALLTHROUGH */
 case QImode:
 case HImode:
 case SImode:
-  fn = NVPTX_BUILTIN_SHUFFLE_DOWN;
-  call_arg_type = unsigned_type_node;
   break;
+
+case DFmode:
+  code = VIEW_CONVERT_EXPR;
+  /* FALLTHROUGH  */
 case DImode:
+  type = long_long_unsigned_type_node;
   fn = NVPTX_BUILTIN_SHUFFLE_DOWNLL;
-  call_arg_type = long_long_unsigned_type_node;
-  break;
-case DFmode:
-  fn = NVPTX_BUILTIN_SHUFFLE_DOWND;
-  call_arg_type = double_type_node;
-  break;
-case SFmode:
-  fn = NVPTX_BUILTIN_SHUFFLE_DOWNF;
-  call_arg_type = float_type_node;
   break;
+
 default:
   gcc_unreachable ();
 }
 
-  /* All of the integral types need to be unsigned.  Furthermore, small
- integral types may need to be extended to 32-bits.  */
-  bool need_conversion = !types_compatible_p (vartype, call_arg_type);
+  tree call = build_call_expr_loc (loc, nvptx_builtin_decl (fn, true),
+   2, build1 (code, type, var),
+   build_int_cst (unsigned_type_node, shift));
 
-  if (need_conversion)
-{
-  casted_var = make_ssa_name (call_arg_type);
-  tree t1 = fold_build1 (NOP_EXPR, call_arg_type, var);
-  gassign *conv1 = gimple_build_assign (casted_var, t1);
-  gimple_seq_add_stmt (seq, conv1);
-}
-
-  tree fndecl = nvptx_builtin_decl (fn, true);
-  tree shift =  build_int_cst (unsigned_type_node, shfl);
-  gimple call = gimple_build_call (fndecl, 2, casted_var, shift);
-
-  gimple_seq_add_stmt (seq, call);
-
-  if (need_conversion)
-{
-  casted_dest = make_ssa_name (call_arg_type);
-  tree t2 = fold_build1 (NOP_EXPR, vartype, casted_dest);
-  gassign *conv2 = gimple_build_assign (dest_var, t2);
-  gimple_seq_add_stmt (seq, conv2);
-}
+  call = fold_build1 (code, TREE_TYPE (dest_var), call);
 
-  update_stmt (call);
-  gimple_call_set_lhs (call, casted_dest);
+  gimplify_assign (dest_var, call, seq);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_SETUP.  Reserve shared
@@ -4770,11 +4745,12 @@ nvptx_goacc_reduction_fini (gimple call)
   for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
 	{
 	  tree other_var = make_ssa_name (TREE_TYPE (var));
-	  nvptx_generate_vector_shuffle (other_var, var, shfl, &seq);
+	  nvptx_generate_vector_shuffle (gimple_location (call),
+	 other_var, var, shfl, &seq);
 
 	  r = make_ssa_name (TREE_TYPE (var));
 	  gimplify_assign (r, fold_build2 (op, TREE_TYPE (var),
-	 var, other_var), &seq);
+	   var, other_var), &seq);
 	  var = r;
 	}
 }
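The reimplementation passes every value to an integer shuffle builtin. It uses `NOP_EXPR` to widen small integers and `VIEW_CONVERT_EXPR` to reinterpret the bits of `SFmode`/`DFmode` values. A value conversion would turn `1.5f` into `1u`, while a view conversion keeps the bit pattern. The following host-side sketch simulates that scheme for floats together with the log-step reduction loop from `nvptx_goacc_reduction_fini`. It makes several assumptions. The width of 32 stands in for `PTX_VECTOR_LENGTH`. `memcpy` plays the role of `VIEW_CONVERT_EXPR`. An out-of-range source lane keeps its own value in this model. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVectorLength = 32;  // assumed stand-in for PTX_VECTOR_LENGTH

// Bit-level reinterpretation, the host analogue of VIEW_CONVERT_EXPR.
std::uint32_t to_bits(float f)
{
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

float from_bits(std::uint32_t u)
{
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Simulated shuffle-down on 32-bit payloads: lane i receives lane i + shift.
std::array<std::uint32_t, kVectorLength>
shuffle_down(const std::array<std::uint32_t, kVectorLength>& v, std::size_t shift)
{
  std::array<std::uint32_t, kVectorLength> out{};
  for (std::size_t i = 0; i < kVectorLength; ++i)
    out[i] = (i + shift < kVectorLength) ? v[i + shift] : v[i];
  return out;
}

// Tree reduction: halve the shift each round; lane 0 ends with the total.
float vector_sum(std::array<float, kVectorLength> lanes)
{
  for (std::size_t shift = kVectorLength / 2; shift > 0; shift >>= 1)
    {
      std::array<std::uint32_t, kVectorLength> bits{};
      for (std::size_t i = 0; i < kVectorLength; ++i)
        bits[i] = to_bits(lanes[i]);             // float -> unsigned bits
      const auto other = shuffle_down(bits, shift);
      for (std::size_t i = 0; i < kVectorLength; ++i)
        lanes[i] += from_bits(other[i]);         // unsigned bits -> float
    }
  return lanes[0];
}
```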


Re: New post-LTO OpenACC pass

2015-09-23 Thread Bernd Schmidt

On 09/23/2015 02:14 PM, Nathan Sidwell wrote:

On 09/23/15 06:59, Bernd Schmidt wrote:

On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+  /* acc_on_device must be evaluated at compile time for
+ constant arguments.  */
+  {
+oacc_xform_on_device (call);
+rescan = true;
+  }


Is there a reason this is not done as part of pass_fold_builtins? (It
looks like
maybe adding this to fold_call_stmt in builtins.c would be sufficient
too).


Perhaps it could be.  I'll need to check where that pass happens.
Anyway, the main thrust of this patch is the new pass, which I thought
might be easier to review with minimal additional clutter.


There's no issue adding a new pass if there's a demonstrated need for 
it, but I think builtin folding doesn't quite meet that criterion given 
that we already have a pass that does that. Unless you really need it to 
happen very early in the pipeline - fold_builtins runs pretty late, but 
I checked and fold_call_stmt gets called from pass_forwprop and possibly 
from elsewhere too.



Bernd


[PATCH] Fix testcase from PR48885

2015-09-23 Thread Richard Biener

I am currently testing the following patch enabling us to optimize

void
test (int *a, int *b, int * restrict v)
{
*a = *v;
*b = *v;
}

This is a simple case we can handle without implementing the "???" note in
visit_loadstore.
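The following sketch makes the expected optimization concrete. Because `v` is restrict-qualified, the alias oracle can conclude that the store to `*a` does not clobber `*v`. FRE may then reuse the first load, which is what the testcase's `scan-tree-dump-times "= \\*v" 1` checks. `test_after_fre` is a hand-written illustration of the transformed code, not actual compiler output. It uses GCC's `__restrict__` spelling as in the testcase.

```cpp
#include <cassert>

// Source form: two loads of *v.
void test_orig(int *a, int *b, int *__restrict__ v)
{
  *a = *v;
  *b = *v;
}

// What FRE may produce once it knows the store to *a cannot modify *v:
// a single load, reused for both stores.
void test_after_fre(int *a, int *b, int *__restrict__ v)
{
  const int tmp = *v;
  *a = tmp;
  *b = tmp;
}
```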

Richard.

2015-09-23  Richard Biener  

PR tree-optimization/48885
* tree-ssa-structalias.c (visit_loadstore): Handle default defs
as not including any restrict tags from other pointers.

* gcc.dg/tree-ssa/restrict-6.c: New testcase.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 228037)
--- gcc/tree-ssa-structalias.c  (working copy)
*** visit_loadstore (gimple *, tree base, tr
*** 6952,6961 
|| TREE_CODE (base) == TARGET_MEM_REF)
  {
tree ptr = TREE_OPERAND (base, 0);
!   if (TREE_CODE (ptr) == SSA_NAME)
{
  /* ???  We need to make sure 'ptr' doesn't include any of
!the restrict tags in its points-to set.  */
  return false;
}
  
--- 7047,7057 
|| TREE_CODE (base) == TARGET_MEM_REF)
  {
tree ptr = TREE_OPERAND (base, 0);
!   if (TREE_CODE (ptr) == SSA_NAME
! && ! SSA_NAME_IS_DEFAULT_DEF (ptr))
{
  /* ???  We need to make sure 'ptr' doesn't include any of
!the restrict tags we added bases for in its points-to set.  */
  return false;
}
  
Index: gcc/testsuite/gcc.dg/tree-ssa/restrict-6.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/restrict-6.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/restrict-6.c  (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-fre1" } */
+ 
+ void
+ test (int *a, int *b, int * __restrict__ v)
+ {
+   *a = *v;
+   *b = *v;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "= \\*v" 1 "fre1" } } */


[PATCH] Preserve restrict dependence info in FRE/PRE

2015-09-23 Thread Richard Biener

I noticed we don't handle secondary effects of restrict in FRE when
looking at another testcase from PR48885:

int
f (int *__restrict__ &__restrict__ p, int *p2)
{
  *p = 1;
  *p2 = 2;
  return *p;
}

with the previously posted patch to improve the handling for p2
we should be able to optimize the return stmt to return 1
in FRE1.  Without the following patch we remove the redundant
load of 'p' but not the load from *p.  This is because the SCCVN
IL didn't record dependence info and did not reconstruct it for
the alias walks or final PRE insert.
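The following sketch shows what preserving the dependence info should allow for this testcase. Once the clique and base survive into FRE's alias walks, the store through `p2` is known not to modify `*p`. The final load can then fold to the constant that was just stored. `f_folded` is a hand-written illustration of that end state, not compiler output. It relies on GCC accepting `__restrict__` on references, as the testcase does.

```cpp
#include <cassert>

// Source form from the PR48885 testcase.
int f(int *__restrict__ &__restrict__ p, int *p2)
{
  *p = 1;
  *p2 = 2;
  return *p;
}

// Expected effect of FRE1 with restrict info preserved: the load of *p
// after the store through p2 is replaced by the stored value.
int f_folded(int *__restrict__ &__restrict__ p, int *p2)
{
  *p = 1;
  *p2 = 2;
  return 1;
}
```

The two functions agree only when `p2` does not point at `*p`, which is exactly the promise the `__restrict__` qualifier makes.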

The following fixes that - bootstrap and regtest running on
x86_64-unknown-linux-gnu.

Richard.

2015-09-23  Richard Biener  

* tree-ssa-sccvn.h (vn_reference_op_struct): Add clique and base
members.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Record clique
and base for MEM_REF and TARGET_MEM_REF.  Handle BIT_FIELD_REF
offset.
(ao_ref_init_from_vn_reference): Record clique and base in the
built base.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.

* g++.dg/tree-ssa/restrict3.C: New testcase.

Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 228037)
--- gcc/tree-ssa-sccvn.h(working copy)
*** typedef struct vn_reference_op_struct
*** 83,88 
--- 83,91 
ENUM_BITFIELD(tree_code) opcode : 16;
/* 1 for instrumented calls.  */
unsigned with_bounds : 1;
+   /* Dependence info, used for [TARGET_]MEM_REF only.  */
+   unsigned short clique;
+   unsigned short base;
/* Constant offset this op adds or -1 if it is variable.  */
HOST_WIDE_INT off;
tree type;
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 228037)
--- gcc/tree-ssa-sccvn.c(working copy)
*** copy_reference_ops_from_ref (tree ref, v
*** 773,778 
--- 783,790 
temp.op1 = TMR_STEP (ref);
temp.op2 = TMR_OFFSET (ref);
temp.off = -1;
+   temp.clique = MR_DEPENDENCE_CLIQUE (ref);
+   temp.base = MR_DEPENDENCE_BASE (ref);
result->quick_push (temp);
  
memset (&temp, 0, sizeof (temp));
*** copy_reference_ops_from_ref (tree ref, v
*** 816,826 
--- 828,846 
  temp.op0 = TREE_OPERAND (ref, 1);
  if (tree_fits_shwi_p (TREE_OPERAND (ref, 1)))
temp.off = tree_to_shwi (TREE_OPERAND (ref, 1));
+ temp.clique = MR_DEPENDENCE_CLIQUE (ref);
+ temp.base = MR_DEPENDENCE_BASE (ref);
  break;
case BIT_FIELD_REF:
  /* Record bits and position.  */
  temp.op0 = TREE_OPERAND (ref, 1);
  temp.op1 = TREE_OPERAND (ref, 2);
+ if (tree_fits_shwi_p (TREE_OPERAND (ref, 2)))
+   {
+ HOST_WIDE_INT off = tree_to_shwi (TREE_OPERAND (ref, 2));
+ if (off % BITS_PER_UNIT == 0)
+   temp.off = off / 8;
+   }
  break;
case COMPONENT_REF:
  /* The field decl is enough to unambiguously specify the field,
*** ao_ref_init_from_vn_reference (ao_ref *r
*** 1017,1022 
--- 1037,1044 
  base_alias_set = get_deref_alias_set (op->op0);
  *op0_p = build2 (MEM_REF, op->type,
   NULL_TREE, op->op0);
+ MR_DEPENDENCE_CLIQUE (*op0_p) = op->clique;
+ MR_DEPENDENCE_BASE (*op0_p) = op->base;
  op0_p = &TREE_OPERAND (*op0_p, 0);
  break;
  
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 228037)
--- gcc/tree-ssa-pre.c  (working copy)
*** create_component_ref_by_pieces_1 (basic_
*** 2531,2537 
 off));
baseop = build_fold_addr_expr (base);
  }
!   return fold_build2 (MEM_REF, currop->type, baseop, offset);
}
  
  case TARGET_MEM_REF:
--- 2531,2540 
 off));
baseop = build_fold_addr_expr (base);
  }
!   genop = build2 (MEM_REF, currop->type, baseop, offset);
!   MR_DEPENDENCE_CLIQUE (genop) = currop->clique;
!   MR_DEPENDENCE_BASE (genop) = currop->base;
!   return genop;
}
  
  case TARGET_MEM_REF:
*** create_component_ref_by_pieces_1 (basic_
*** 2554,2561 
if (!genop1)
  return NULL_TREE;
  }
!   return build5 (TARGET_MEM_REF, currop->type,
!  baseop, currop->op2, genop0, currop->op1, genop1);
}
  
  case ADDR_EXPR:
--- 2557,2568 
if (!genop1)
  return NULL_TREE;
  }
!   genop = build5 (TARGET_MEM_REF, currop->type,
!   baseop, currop->op2, genop0, currop->op1, genop1);
! 
!   MR_DEPENDE

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Michael Matz
Hi,

On Tue, 22 Sep 2015, David Malcolm wrote:

> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
> table ever get smaller, or does it only ever get inserted into?

It only ever grows.

> An idea I had is that we could stash short ranges directly into the 32 
> bits of location_t, by offsetting the per-column-bits somewhat.

It's certainly worth an experiment: let's say you restrict yourself to 
tokens less than 8 characters, you need an additional 3 bits (using one 
value, e.g. zero, as the escape value).  That leaves 20 bits for the line 
numbers (for the normal 8 bit columns), which might be enough for most 
single-file compilations.  For LTO compilation this often won't be enough.
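The following sketch works through the bit budget above. It assumes token lengths 1..7 in 3 bits, with 0 as the escape, and 8 column bits. That leaves 32 - 8 - 3 = 21 bits. The figure of 20 line bits matches if one more bit is reserved. Here that bit is assumed to be the top one, which libcpp uses to mark ad-hoc locations. 2^20 gives 1,048,576 lines. The layout and the names are hypothetical and do not describe libcpp's real encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit layout:
//   bit 31      reserved (assumed: ad-hoc location marker)
//   bits 11..30 line number (20 bits)
//   bits 3..10  column (8 bits)
//   bits 0..2   token length 1..7; 0 = escape ("no inline range")
constexpr unsigned kRangeBits = 3, kColumnBits = 8, kLineBits = 20;
static_assert(1 + kLineBits + kColumnBits + kRangeBits == 32,
              "layout must fill exactly 32 bits");

// Returns false when the location does not fit inline, i.e. the caller would
// fall back to an ad-hoc table entry instead.
bool pack(std::uint32_t line, std::uint32_t col, std::uint32_t len,
          std::uint32_t& out)
{
  if (line >= (1u << kLineBits) || col >= (1u << kColumnBits)
      || len == 0 || len >= (1u << kRangeBits))
    return false;
  out = (line << (kColumnBits + kRangeBits)) | (col << kRangeBits) | len;
  return true;
}

std::uint32_t line_of(std::uint32_t loc)
{ return (loc >> (kColumnBits + kRangeBits)) & ((1u << kLineBits) - 1); }

std::uint32_t column_of(std::uint32_t loc)
{ return (loc >> kRangeBits) & ((1u << kColumnBits) - 1); }

std::uint32_t length_of(std::uint32_t loc)
{ return loc & ((1u << kRangeBits) - 1); }
```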

> My plan is to investigate the impact these patches have on the time and 
> memory consumption of the compiler,

When you do so, make sure you're also measuring an LTO compilation with 
debug info of something big (firefox).  I know that we already had issues 
with the size of the linemap data in the past for these cases (probably 
when we added columns).


Ciao,
Michael.


Re: [PATCH, i386, AVX-512] Fix iterator for k, introduce kshift[lr][bwdq].

2015-09-23 Thread Kirill Yukhin
Hello,
On 22 Sep 18:14, Kirill Yukhin wrote:
> Hello,
> Patch in the bottom fixes iterator for k insns
> since QI mode is only available for AVX-512DQ.
> 
> It also adds support for kshift[rl][bwdq]. These patterns
> will be used for mask load/store autogeneration, which
> Ilya Enkovich is working on.
> 
> gcc/
>   * config/i386/i386.md (define_code_attr mshift): New.
>   (define_mode_iterator SWI1248_AVX512BW): Rename ...
>   (SWI1248_AVX512BW): ... to this. Make QI enabled for TARGET_AVX512DQ
>   only.
>   (define_insn "*k"): Use new iterator name.
>   (define_insn "*3"): New.
> 
> Bootstrapped and regtest in progress
> 
> Is it ok for trunk (if regtest pass)?
Emit pattern was wrong (caught by Spec2k6 autogeneration).

Committed to main trunk as obvious.

gcc/
* config/i386/i386.md (define_insn "*3"): Fix
insn emit.

--
Thanks, K

commit 254e3b944ac96441544d36c438e92a9a09b963b1
Author: Kirill Yukhin 
Date:   Wed Sep 23 16:24:50 2015 +0300

AVX-512. Fix emit in '*3' pattern.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c0911d4..ba5ab32 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9366,7 +9366,7 @@
(any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")
   (match_operand:QI 2 "immediate_operand" "i")))]
   "TARGET_AVX512F"
-  "k %2, %1, %0|%0, %1, %2"
+  "k\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "msklog")
(set_attr "prefix" "vex")])
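The fix wraps the operand orders in `{att|intel}` braces so that each assembler dialect gets its own operand order. The following simplified model shows how such a template is expanded. Text outside braces is always emitted. Inside braces, only the alternative whose index equals the dialect (0 = AT&T, 1 = Intel) is kept. `select_dialect` is hypothetical. GCC's real output routine also handles `%` escapes and more, which this sketch ignores.

```cpp
#include <cassert>
#include <string>

// Expand "{alt0|alt1}" groups in an output template for one dialect.
std::string select_dialect(const std::string& tmpl, int dialect)
{
  std::string out;
  bool in_braces = false;
  int alt = 0;
  for (char c : tmpl)
    {
      if (c == '{' && !in_braces)
        {
          in_braces = true;
          alt = 0;
        }
      else if (c == '|' && in_braces)
        ++alt;                                   // move to the next alternative
      else if (c == '}' && in_braces)
        in_braces = false;
      else if (!in_braces || alt == dialect)
        out += c;                                // shared text or selected alternative
    }
  return out;
}
```

In this model, the unbraced template from before the fix selects nothing and passes the `|` through unchanged.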
 


Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Richard Biener
On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
> Hi,
>
> On Tue, 22 Sep 2015, David Malcolm wrote:
>
>> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
>> table ever get smaller, or does it only ever get inserted into?
>
> It only ever grows.
>
>> An idea I had is that we could stash short ranges directly into the 32
>> bits of location_t, by offsetting the per-column-bits somewhat.
>
> It's certainly worth an experiment: let's say you restrict yourself to
> tokens less than 8 characters, you need an additional 3 bits (using one
> value, e.g. zero, as the escape value).  That leaves 20 bits for the line
> numbers (for the normal 8 bit columns), which might be enough for most
> single-file compilations.  For LTO compilation this often won't be enough.
>
>> My plan is to investigate the impact these patches have on the time and
>> memory consumption of the compiler,
>
> When you do so, make sure you're also measuring an LTO compilation with
> debug info of something big (firefox).  I know that we already had issues
> with the size of the linemap data in the past for these cases (probably
> when we added columns).

The issue we have with LTO is that the linemap gets populated in quite
random order and thus we repeatedly switch files (we've mitigated this
somewhat for GCC 5).  We also considered dropping column info
(and would drop range info) as diagnostics are from optimizers only
with LTO and we keep locations merely for debug info.

Richard.

>
> Ciao,
> Michael.


Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Ilya Enkovich
2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>>
>> I was thinking about targets not supporting generating vec
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.
>
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec.
>
> Thanks,
> Ilya
>
>>
>> Richard.

Hi Richard,

Do you think we have enough confidence that the approach is working and that we may
start integrating it into trunk? What would the integration plan be then?

Thanks,
Ilya


Re: [RFC] Masking vectorized loops with bound not aligned to VF.

2015-09-23 Thread Richard Biener
On Fri, Sep 18, 2015 at 6:07 PM, Kirill Yukhin  wrote:
> Hello,
> On 18 Sep 10:31, Richard Biener wrote:
>> On Thu, 17 Sep 2015, Ilya Enkovich wrote:
>>
>> > 2015-09-16 15:30 GMT+03:00 Richard Biener :
>> > > On Mon, 14 Sep 2015, Kirill Yukhin wrote:
>> > >
>> > >> Hello,
>> > >> I'd like to initiate discussion on vectorization of loops which
>> > >> boundaries are not aligned to VF. Main target for this optimization
>> > >> right now is x86's AVX-512, which features per-element embedded masking
>> > >> for all instructions. The main goal for this mail is to agree on overall
>> > >> design of the feature.
>> > >>
>> > >> This approach was presented @ GNU Cauldron 2015 by Ilya Enkovich [1].
>> > >>
>> > >> Here's a sketch of the algorithm:
>> > >>   1. Add check on basic stmts for masking: possibility to introduce index vector and
>> > >>      corresponding mask
>> > >>   2. At the check if statements are vectorizable we additionally check if stmts
>> > >>      need and can be masked and compute masking cost. Result is stored in `stmt_vinfo`.
>> > >>      We are going to mask only mem. accesses, reductions and modify mask for already
>> > >>      masked stmts (mask load, mask store and vect. condition)
>> > >
>> > > I think you also need to mask divisions (for integer divide by zero) and
>> > > want to mask FP ops which may result in NaNs or denormals (because that's
>> > > generally to slow down execution a lot in my experience).
>> > >
>> > > Why not simply mask all stmts?
>> >
>> > Hi,
>> >
>> > Statement masking may be not free. Especially if we need to transform
>> > mask somehow to do it. It also may be unsupported on a platform (e.g.
>> > for AVX-512 not all instructions support masking) but still not be a
>> > problem to mask a loop. BTW for AVX-512 masking doesn't boost
>> > performance even if we have some special cases like NaNs. We don't
>> > consider exceptions in vector code (and it seems to be a case now?)
>> > otherwise we would need to mask them also.
>>
>> Well, we do need to honor
>>
>>   if (x != 0.)
>>     y[i] = z[i] / x;
>>
>> in some way.  I think if-conversion currently simply gives up here.
>> So if we have the epilogue and using masked loads what are the
>> contents of the 'masked' elements (IIRC they are zero or all-ones,
>> right)?  If the end up as zero then even simple code like
>>
>>   for (i;;)
>>     a[i] = b[i] / c[i];
>>
>> cannot be transformed in the suggested way with -ftrapping-math
>> and the remainder iteration might get slow if processing NaN
>> operands is still as slow as it was 10 years ago.
>>
>> IMHO for if-converting possibly trapping stmts (like the above
>> example) we need some masking support anyway (and a way to express
>> the masking in GIMPLE).
> We'll use if-cvt technique. If op is trapping - we do not apply masking for loop remainder
> This is subject for further development. Currently we don't try truly mask existing GIMPLE
> stmts. All masking is achieved using `vec_cond` and we're not sure that trapping is really
> useful feature while vectorization is on.

Ok.  And yes, we'd need to have a way to predicate such stmts directly.
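The index-vector-and-mask idea from step 1 of the plan, together with Richard's division example, can be illustrated in plain C using the GCC/Clang generic vector extension. This is a hypothetical sketch of option (c), "mask main loop", and not vectorizer output. Masked loads and stores are emulated with scalar loops, and inactive lanes get a divisor of 1 so the vector division cannot trap. The function name `div_masked` and the vector width of 4 are assumptions.

```c
#include <assert.h>

/* Four 32-bit ints per vector (GCC/Clang vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));
#define VF 4

/* a[i] = b[i] / c[i] for 0 <= i < n, with every iteration masked:
   an index vector compared against n yields -1 in active lanes and
   0 in lanes past the end, so no scalar epilogue is needed.  */
static void
div_masked (int *a, const int *b, const int *c, int n)
{
  const v4si lane = { 0, 1, 2, 3 };
  const v4si ones = { 1, 1, 1, 1 };
  const v4si bound = { n, n, n, n };
  int i, k;

  assert (n >= 0);
  for (i = 0; i < n; i += VF)
    {
      v4si base = { i, i, i, i };
      v4si idx = base + lane;
      v4si mask = idx < bound;          /* -1 = active, 0 = inactive */
      v4si vb = { 0, 0, 0, 0 };
      v4si vc = { 0, 0, 0, 0 };

      /* Emulated masked loads: inactive lanes stay zero.  */
      for (k = 0; k < VF; k++)
        if (mask[k])
          {
            vb[k] = b[i + k];
            vc[k] = c[i + k];
          }

      /* Zero divisors in inactive lanes would trap; use 1 there.  */
      v4si divisor = (mask & vc) | (~mask & ones);
      v4si q = vb / divisor;

      /* Emulated masked store: never write past a[n - 1].  */
      for (k = 0; k < VF; k++)
        if (mask[k])
          a[i + k] = q[k];
    }
}
```

The blend on the divisor is the kind of extra masking Richard points out is needed for trapping operations. Masking only the loads and stores would leave zeros in the inactive lanes of `c`.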

>> > >>   3. Make a decision about masking: take computed costs and est. iterations count
>> > >>      into consideration
>> > >>   4. Modify prologue/epilogue generation according decision made at analysis. Three
>> > >>      options available:
>> > >>        a. Use scalar remainder
>> > >>        b. Use masked remainder. Won't be supported in first version
>> > >>        c. Mask main loop
>> > >>   5. Support vectorized loop masking:
>> > >>        - Create stmts for mask generation
>> > >>        - Support generation of masked vector code (create generic vector code then
>> > >>          patch it w/ masks)
>> > >>        - Mask loads/stores/vconds/reductions only
>> > >>
>> > >>  In first version (targeted v6) we're not going to support 4.b and loop
>> > >> mask pack/unpack. No `pack/unpack` means that masking will be supported
>> > >> only for types w/ the same size as index variable
>> > >
>> > > This means that if ncopies for any stmt is > 1 masking won't be supported,
>> > > right?  (you'd need two or more different masks)
>> >
>> > We don't think it is a very important feature to have in initial
>> > version. It can be added later and shouldn't affect overall
>> > implementation design much. BTW currently masked loads and stores
>> > don't support masks of other sizes and don't do masks pack/unpack.
>>
>> I think masked loads/stores support this just fine.  Remember the
>> masks are regular vectors generated by cond exprs in the current code.
> Not quite true, mask load/stores are not supported for different size.
> E.g. this example is not vectorized:
>   int a[LENGTH], b[LENGTH];
>   long long c[LENGTH];
>
>   int test ()
>   {
>     int i;
> #pragma omp simd safelen(16)
>     for (i = 0; i < LENGTH; i++)
>       if (a[i] > b[i])
>         c[i] = 1;

Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Biener
On Fri, Sep 18, 2015 at 3:40 PM, Ilya Enkovich  wrote:
> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
>>> 2015-09-03 15:11 GMT+03:00 Richard Biener :
>>>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich wrote:
>>>>> Adding CCs.
>>>>>
>>>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>>>>>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>>>>>
>>>>>> Totally disabling old style vector comparison and bool pattern is a
>>>>>> goal but doing that would mean a lot of regressions for many targets.
>>>>>> Do you want it to be tried to estimate amount of changes required
>>>>>> and reveal possible issues? What would be integration plan for these
>>>>>> changes? Do you want to just introduce new vector in GIMPLE
>>>>>> disabling bool patterns and then resolving vectorization regression on
>>>>>> all targets or allow them live together with following target switch
>>>>>> one by one from bool patterns with finally removing them? Not all
>>>>>> targets are likely to be adopted fast I suppose.
>>>>
>>>> Well, the frontends already create vec_cond exprs I believe.  So for
>>>> bool patterns the vectorizer would have to do the same, but the
>>>> comparison result in there would still use vec.  Thus the scalar
>>>>
>>>>  _Bool a = b < c;
>>>>  _Bool c = a || d;
>>>>  if (c)
>>>>
>>>> would become
>>>>
>>>>  vec a = VEC_COND ;
>>>>  vec c = a | d;
>>>
>>> This should be identical to
>>>
>>> vec<_Bool> a = a < b;
>>> vec<_Bool> c = a | d;
>>>
>>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>>> supports vector comparison into vec, right?
>>>
>>>>
>>>> when the target does not have vecs directly and otherwise
>>>> vec directly (dropping the VEC_COND).
>>>>
>>>> Just the vector comparison inside the VEC_COND would always
>>>> have vec type.
>>>
>>> I don't really understand what you mean by 'doesn't have vecs
>>> directly' here. Currently I have a hook to ask for a vec mode
>>> and assume target doesn't support it in case it returns VOIDmode. But
>>> in such case I have no mode to use for vec inside VEC_COND
>>> either.
>>
>> I was thinking about targets not supporting generating vec
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier?

That's what I say - the vectorizer wouldn't generate them.

> So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>>> In default implementation of the new target hook I always return
>>> integer vector mode (to have default behavior similar to the current
>>> one). It should allow me to use vec for conditions in all
>>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>>> in case comparison is supported by target? Or control it via
>>> additional hook.
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.

The vectorizer can vectorize comparisons by emitting a VEC_COND_EXPR
(the bool pattern would turn the comparison into a COND_EXPR).  I don't
see how the pattern intermediate step is necessary.  The important part
is to get the desired vector type of the comparison determined.
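Richard's point that a comparison can feed the select directly, with `||` on booleans becoming a bitwise OR of lane masks, can be mimicked at the source level with GCC's generic vector extension in C. This is only an analogy for the GIMPLE under discussion, since C has no vector-of-bool type. `select_or` is a hypothetical name. The final `(m & x) | (~m & y)` blend stands in for VEC_COND_EXPR, because GCC documents `?:` on vectors only for C++.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Vector form of:  r[i] = ((b[i] < c[i]) || d[i]) ? x[i] : y[i];  */
static v4si
select_or (v4si b, v4si c, v4si d, v4si x, v4si y)
{
  const v4si zero = { 0, 0, 0, 0 };
  v4si m1 = b < c;              /* lane is -1 if true, 0 if false */
  v4si m2 = d != zero;
  v4si m = m1 | m2;             /* scalar || becomes | on masks */
  return (m & x) | (~m & y);    /* VEC_COND_EXPR-style select */
}
```

No intermediate COND_EXPR is needed: the comparison result is already a mask of the right width, which is the "desired vector type" Richard mentions.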

>>
>>>>
>>>> And the "bool patterns" I am talking about are those in
>>>> tree-vect-patterns.c, not any targets instruction patterns.
>>>
>>> I refer to them also. BTW bool patterns also pull comparison into
>>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>>> think with vector comparisons in place we should allow SSA_NAME as
>>> conditions in VEC_COND for better CSE. That should require new vcond
>>> optabs though.
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec.
>
> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> Ilya
>>>
>>>>
>>>> Richard.
>>>>
>>>>>>
>>>>>> Ilya


Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Michael Matz
Hi,

On Wed, 23 Sep 2015, Richard Biener wrote:

> The issue we have with LTO is that the linemap gets populated in quite 
> random order and thus we repeatedly switch files (we've mitigated this 
> somewhat for GCC 5).

Yes.

> We also considered dropping column info (and would drop range info) as 
> diagnostics are from optimizers only with LTO and we keep locations 
> merely for debug info.

That would be the obvious mitigations, yes.  I do like the fact that we'd 
be able to do all this without enlarging location_t.


Ciao,
Micha.


Re: patch for PR61578

2015-09-23 Thread Dominik Vogt
On Tue, Sep 01, 2015 at 03:39:19PM -0400, Vladimir Makarov wrote:
>   The following patch is for
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578
> 
>   The patch was bootstrapped and tested on x86 and x86-64.
> 
>   Committed as rev. 227382.
> 
> 2015-09-01  Vladimir Makarov  
> 
> PR target/61578
> * lra-lives.c (process_bb_lives): Process move pseudos with the
> same value for copies and preferences
> * lra-constraints.c (match_reload): Create match reload pseudo
> with the same value from single dying input pseudo.

This check-in caused a regression on s390, please see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578 for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Biener
On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich  wrote:
> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
>> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>>>
>>> I was thinking about targets not supporting generating vec
>>> (of whatever mode) from a comparison directly but only via
>>> a COND_EXPR.
>>
>> Where may these direct comparisons come from? Vectorizer never
>> generates unsupported statements. It means we get them from
>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>> Actually vect lowering checks if we are able to make comparison and
>> expand also uses vec_cond to expand vector comparison, so probably we
>> may live with them.
>>
>>>
>>> Not sure if we are always talking about the same thing for
>>> "bool patterns".  I'd remove bool patterns completely, IMHO
>>> they are not necessary at all.
>>
>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>> how to remove them completely for targets not supporting comparison
>> vectorization.
>>
>>>
>>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>>> VEC_COND_EXPR.  Just didn't have the time to do this...
>>
>> That would be nice. As a first step I'd like to support optabs for
>> VEC_COND_EXPR directly using vec.
>>
>> Thanks,
>> Ilya
>>
>>>
>>> Richard.
>
> Hi Richard,
>
> Do you think we have enough confidence that the approach is working and that we may
> start integrating it into trunk? What would the integration plan be then?

I'm still worried about the vec vector size vs. element size
issue (well, somewhat).

Otherwise the integration plan would be

 1) put in the vector GIMPLE type support and change the vector
comparison type IL requirement to be vector,
fixing all fallout

 2) get support for directly expanding vector comparisons to
vector and make use of that from the x86 backend

 3) make the vectorizer generate the above if supported

I think independent improvements are

 1) remove (most) of the bool patterns from the vectorizer

 2) make VEC_COND_EXPR not have a GENERIC comparison embedded

(same for COND_EXPR?)

Richard.

> Thanks,
> Ilya


[PATCH] Fix PR67662

2015-09-23 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-23   Richard Biener  

PR middle-end/67662
* fold-const.c (fold_binary_loc): Do not reassociate two vars with
undefined overflow unless they will cancel out.

* gcc.dg/ubsan/pr67662.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 228037)
+++ gcc/fold-const.c(working copy)
@@ -9493,25 +9511,32 @@ fold_binary_loc (location_t loc,
{
  tree tmp0 = var0;
  tree tmp1 = var1;
+ bool one_neg = false;
 
  if (TREE_CODE (tmp0) == NEGATE_EXPR)
-   tmp0 = TREE_OPERAND (tmp0, 0);
+   {
+ tmp0 = TREE_OPERAND (tmp0, 0);
+ one_neg = !one_neg;
+   }
  if (CONVERT_EXPR_P (tmp0)
  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (tmp0, 0)))
  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (tmp0, 0)))
  <= TYPE_PRECISION (atype)))
tmp0 = TREE_OPERAND (tmp0, 0);
  if (TREE_CODE (tmp1) == NEGATE_EXPR)
-   tmp1 = TREE_OPERAND (tmp1, 0);
+   {
+ tmp1 = TREE_OPERAND (tmp1, 0);
+ one_neg = !one_neg;
+   }
  if (CONVERT_EXPR_P (tmp1)
  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (tmp1, 0)))
  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (tmp1, 0)))
  <= TYPE_PRECISION (atype)))
tmp1 = TREE_OPERAND (tmp1, 0);
  /* The only case we can still associate with two variables
-is if they are the same, modulo negation and bit-pattern
-preserving conversions.  */
- if (!operand_equal_p (tmp0, tmp1, 0))
+is if they cancel out.  */
+ if (!one_neg
+ || !operand_equal_p (tmp0, tmp1, 0))
ok = false;
}
}
Index: gcc/testsuite/gcc.dg/ubsan/pr67662.c
===
--- gcc/testsuite/gcc.dg/ubsan/pr67662.c(revision 0)
+++ gcc/testsuite/gcc.dg/ubsan/pr67662.c(working copy)
@@ -0,0 +1,14 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined" } */
+
+extern void abort (void);
+
+int
+main (void)
+{
+  int halfmaxval = __INT_MAX__ / 2 + 1;
+  int maxval = halfmaxval - 1 + halfmaxval;
+  if (maxval != __INT_MAX__)
+    abort ();
+  return 0;
+}
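The new testcase hinges on grouping. `halfmaxval - 1 + halfmaxval` is representable, but combining the two variable operands first, as in `(halfmaxval + halfmaxval) - 1`, overflows `int`. The sketch below checks both groupings with the GCC/Clang `__builtin_sub_overflow` and `__builtin_add_overflow` builtins (available since GCC 5). Treating `(h + h) - 1` as the regrouped form is an illustration of the kind of rewrite the patch now rejects, not a trace of what fold_binary_loc produced before the fix. With the patch, two variable operands are only combined when exactly one of them is negated and they are otherwise the same operand, as in `x + -x`, so they cancel.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if evaluating (h - 1) + h, in that order, overflows int.  */
static bool
overflows_as_written (int h)
{
  int t, r;
  if (__builtin_sub_overflow (h, 1, &t))
    return true;
  return __builtin_add_overflow (t, h, &r);
}

/* True if the regrouped form (h + h) - 1 overflows int.  */
static bool
overflows_regrouped (int h)
{
  int t, r;
  if (__builtin_add_overflow (h, h, &t))
    return true;
  return __builtin_sub_overflow (t, 1, &r);
}
```

For `h = INT_MAX / 2 + 1` the first function reports no overflow and the second does, which is exactly the false positive -fsanitize=undefined reported on the folded expression.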


Refactor omp_reduction_init: omp_reduction_init_op (was: [gomp4] ptx reduction simplification)

2015-09-23 Thread Thomas Schwinge
Hi!

On Tue, 22 Sep 2015 11:11:59 -0400, Nathan Sidwell wrote:
> On 09/22/15 11:10, Thomas Schwinge wrote:
> > On Fri, 18 Sep 2015 20:05:48 -0400, Nathan Sidwell  wrote:
> >> I've committed this patch to rework and simplify [...]
> >> the reduction lowering hooks.
> >>
> >> The current implementation [...]
> >> [was] overcomplicated in a number of ways.
> >
> >>* omp-low.h (omp_reduction_init_op): Declare.
> >>* omp-low.c (omp_reduction_init_op): New, broken out of ...
> >>(omp_reduction_init): ... here.  Call it.
> >>* tree-parloops.c (initialize_reductions): Use
> >>omp_reduction_init_op.
> >
> > Should this go into trunk already?  (I can test it, if you'd like me to.)
> 
> go  for it!

Tested on x86_64-pc-linux-gnu; no changes.  OK for trunk?

commit de2726ef46b8d875239ccb445c784c56e1a716dc
Author: Thomas Schwinge 
Date:   Tue Sep 22 17:30:40 2015 +0200

Refactor omp_reduction_init: omp_reduction_init_op

2015-09-23  Thomas Schwinge  
Nathan Sidwell  

gcc/
* omp-low.h (omp_reduction_init_op): Declare.
* omp-low.c (omp_reduction_init_op): New, broken out of ...
(omp_reduction_init): ... here.  Call it.
* tree-parloops.c (initialize_reductions): Use
omp_reduction_init_op.
---
 gcc/omp-low.c   |   16 
 gcc/omp-low.h   |1 +
 gcc/tree-parloops.c |   16 +---
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 88a5149..fae407d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -3372,13 +3372,12 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
 }
 
 
-/* Construct the initialization value for reduction CLAUSE.  */
+/* Construct the initialization value for reduction operation OP.  */
 
 tree
-omp_reduction_init (tree clause, tree type)
+omp_reduction_init_op (location_t loc, enum tree_code op, tree type)
 {
-  location_t loc = OMP_CLAUSE_LOCATION (clause);
-  switch (OMP_CLAUSE_REDUCTION_CODE (clause))
+  switch (op)
 {
 case PLUS_EXPR:
 case MINUS_EXPR:
@@ -3451,6 +3450,15 @@ omp_reduction_init (tree clause, tree type)
 }
 }
 
+/* Construct the initialization value for reduction CLAUSE.  */
+
+tree
+omp_reduction_init (tree clause, tree type)
+{
+  return omp_reduction_init_op (OMP_CLAUSE_LOCATION (clause),
+   OMP_CLAUSE_REDUCTION_CODE (clause), type);
+}
+
 /* Return alignment to be assumed for var in CLAUSE, which should be
OMP_CLAUSE_ALIGNED.  */
 
diff --git gcc/omp-low.h gcc/omp-low.h
index 8a4052e..44e35a3 100644
--- gcc/omp-low.h
+++ gcc/omp-low.h
@@ -25,6 +25,7 @@ struct omp_region;
 extern tree find_omp_clause (tree, enum omp_clause_code);
 extern void omp_expand_local (basic_block);
 extern void free_omp_regions (void);
+extern tree omp_reduction_init_op (location_t, enum tree_code, tree);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index c164121..94cacb6 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -565,8 +565,8 @@ reduc_stmt_res (gimple stmt)
 int
 initialize_reductions (reduction_info **slot, struct loop *loop)
 {
-  tree init, c;
-  tree bvar, type, arg;
+  tree init;
+  tree type, arg;
   edge e;
 
   struct reduction_info *const reduc = *slot;
@@ -577,16 +577,10 @@ initialize_reductions (reduction_info **slot, struct loop *loop)
   /* In the phi node at the header, replace the argument coming
  from the preheader with the reduction initialization value.  */
 
-  /* Create a new variable to initialize the reduction.  */
+  /* Initialize the reduction.  */
   type = TREE_TYPE (PHI_RESULT (reduc->reduc_phi));
-  bvar = create_tmp_var (type, "reduction");
-
-  c = build_omp_clause (gimple_location (reduc->reduc_stmt),
-   OMP_CLAUSE_REDUCTION);
-  OMP_CLAUSE_REDUCTION_CODE (c) = reduc->reduction_code;
-  OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (reduc_stmt_res (reduc->reduc_stmt));
-
-  init = omp_reduction_init (c, TREE_TYPE (bvar));
+  init = omp_reduction_init_op (gimple_location (reduc->reduc_stmt),
+   reduc->reduction_code, type);
   reduc->init = init;
 
   /* Replace the argument representing the initialization value


Grüße,
 Thomas
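The point of the refactoring is that callers which only know the reduction operation, like `initialize_reductions` in tree-parloops.c, no longer have to build a throwaway OMP_CLAUSE_REDUCTION just to ask for its initial value. That initial value is the identity element of the operation. The following is a self-contained C analogue for `int` only. It uses a hypothetical `enum reduc_op` in place of GCC's tree codes, and the real function also handles other types.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the tree codes omp_reduction_init_op
   switches on.  */
enum reduc_op
{
  R_PLUS, R_MINUS, R_MULT, R_BIT_AND, R_BIT_IOR, R_BIT_XOR,
  R_LOGICAL_AND, R_LOGICAL_OR, R_MIN, R_MAX
};

/* Identity value for an int reduction of operation OP: combining any
   value v with it leaves v unchanged.  */
static int
reduction_init_int (enum reduc_op op)
{
  switch (op)
    {
    case R_PLUS:
    case R_MINUS:
    case R_BIT_IOR:
    case R_BIT_XOR:
    case R_LOGICAL_OR:
      return 0;
    case R_MULT:
    case R_LOGICAL_AND:
      return 1;
    case R_BIT_AND:
      return ~0;          /* all bits set */
    case R_MIN:
      return INT_MAX;     /* every int is <= INT_MAX */
    case R_MAX:
      return INT_MIN;     /* every int is >= INT_MIN */
    }
  assert (0 && "unhandled reduction operation");
  return 0;
}
```

Taking only the operation keeps the clause-based `omp_reduction_init` as a thin wrapper, which is the shape of the patch above.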




Re: Refactor omp_reduction_init: omp_reduction_init_op

2015-09-23 Thread Bernd Schmidt

gcc/
* omp-low.h (omp_reduction_init_op): Declare.
* omp-low.c (omp_reduction_init_op): New, broken out of ...
(omp_reduction_init): ... here.  Call it.
* tree-parloops.c (initialize_reductions): Use
omp_reduction_init_op.


That looks ok.


Bernd



Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Thomas Schwinge
Hi Nathan!

On Wed, 23 Sep 2015 08:40:51 -0400, Nathan Sidwell wrote:
> On 09/23/15 05:27, Thomas Schwinge wrote:
> > On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:
> >> I've committed this patch to add a new pair of internal functions.  These will
> >> be used in implementing reductions.
> >>
> >> They'll be emitted around reduction finalization, and implement the locking
> >> required for the general case of combining reduction values.  They may be
> >> transformed in the oacc_xform pass, and the default behaviour is to delete them,
> >> if there is no RTL expander.  For PTX we delete them if they are at the vector
> >> level.
> >>
> >> This avoids needing machine-specific builtins to expand to, and thus should
> >> result in less backend code duplication.
> >
> > With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
> > removed, should the gcc.target/nvptx/spinlock-1.c and
> > gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
> > these be re-written differently?
> 
> confused.  I don't think I removed those locks.  Certainly didn't intend to, and
> I would have expected massive test fails if I had.

You didn't remove the functionality, but you did remove the
__builtin_nvptx_lock and __builtin_nvptx_unlock builtins (which the two
test cases were written for), replacing them with GOACC_LOCK/GOACC_UNLOCK
internal functions, nvptx_expand_oacc_lock_unlock.


Grüße,
 Thomas




Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Nathan Sidwell

On 09/23/15 10:16, Thomas Schwinge wrote:
> Hi Nathan!
>
> On Wed, 23 Sep 2015 08:40:51 -0400, Nathan Sidwell wrote:
>> On 09/23/15 05:27, Thomas Schwinge wrote:
>>> On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell wrote:
>>>> I've committed this patch to add a new pair of internal functions.  These will
>>>> be used in implementing reductions.
>>>>
>>>> They'll be emitted around reduction finalization, and implement the locking
>>>> required for the general case of combining reduction values.  They may be
>>>> transformed in the oacc_xform pass, and the default behaviour is to delete them,
>>>> if there is no RTL expander.  For PTX we delete them if they are at the vector
>>>> level.
>>>>
>>>> This avoids needing machine-specific builtins to expand to, and thus should
>>>> result in less backend code duplication.
>>>
>>> With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
>>> removed, should the gcc.target/nvptx/spinlock-1.c and
>>> gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
>>> these be re-written differently?
>>
>> confused.  I don't think I removed those locks.  Certainly didn't intend to, and
>> I would have expected massive test fails if I had.
>
> You didn't remove the functionality, but you did remove the
> __builtin_nvptx_lock and __builtin_nvptx_unlock builtins (which the two
> test cases were written for), replacing them with GOACC_LOCK/GOACC_UNLOCK
> internal functions, nvptx_expand_oacc_lock_unlock.

ah, thanks. I expect even these are going to go away soon. the spinlock testcases should be removed.


nathan

--
Nathan Sidwell
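GOACC_LOCK/GOACC_UNLOCK bracket reduction finalization so that partial results from several workers are combined one at a time. On a host, the same pattern is an ordinary spin lock. The following C11 sketch shows the acquire/combine/release sequence around a shared accumulator. It is not the PTX expansion, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* One lock guarding the shared reduction result.  */
static atomic_flag reduction_lock = ATOMIC_FLAG_INIT;

static void
reduction_lock_acquire (void)
{
  /* Spin until the previous holder clears the flag.  */
  while (atomic_flag_test_and_set_explicit (&reduction_lock,
                                            memory_order_acquire))
    ;
}

static void
reduction_lock_release (void)
{
  atomic_flag_clear_explicit (&reduction_lock, memory_order_release);
}

/* Finalization step for a '+' reduction: merge one partial sum into
   the shared total while holding the lock.  */
static void
combine_partial_sum (long *total, long partial)
{
  reduction_lock_acquire ();
  *total += partial;
  reduction_lock_release ();
}
```

Acquire/release ordering is what makes the update of `*total` visible to the next lock holder. Where the combining step is a single hardware atomic, or happens within one vector, the lock is unnecessary, which matches deleting the calls at the vector level for PTX.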


Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-23 Thread Dominik Vogt
On Tue, Sep 22, 2015 at 01:56:15PM -0600, Jeff Law wrote:
> Is there some good reason these aren't hooks?

No, that was just an oversight.  New version attached.  Would it be
preferable to initialize the hooks with a NULL pointer and test
the pointer before calling them?  (That way the changes to
hooks.[ch] could be dropped.)
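The two options differ in where the "nothing to do" case is handled. Below is a minimal C sketch with a hypothetical `struct asm_out_hooks` and an opaque `tree` stand-in rather than GCC's real `targetm` declarations. Option A installs a no-op default, as the patch does with `hook_void_FILEptr_tree`, so every call site can call unconditionally. Option B leaves the pointer NULL, and every call site must test it.

```c
#include <assert.h>
#include <stdio.h>

typedef struct opaque_tree *tree;   /* opaque stand-in for GCC's tree */

struct asm_out_hooks
{
  void (*function_start) (FILE *, tree);
  void (*function_end) (FILE *, tree);
};

/* Option A default: do nothing.  */
static void
hook_void_FILEptr_tree (FILE *f, tree decl)
{
  (void) f;
  (void) decl;
}

static int start_calls;

/* A target override that just records that it ran.  */
static void
counting_start (FILE *f, tree decl)
{
  (void) f;
  (void) decl;
  start_calls++;
}

/* Option A: defaults are installed, so call unconditionally.  */
static void
emit_start_a (const struct asm_out_hooks *h, FILE *f, tree decl)
{
  h->function_start (f, decl);
}

/* Option B: hooks may be NULL, so every caller must check.  */
static void
emit_start_b (const struct asm_out_hooks *h, FILE *f, tree decl)
{
  if (h->function_start)
    h->function_start (f, decl);
}
```

Option A keeps call sites such as the one added to `assemble_start_function` free of checks, at the cost of one more generic function in hooks.[ch]. Option B saves that function but repeats the NULL test at every caller.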

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* target.def: Add function_start and function_end hooks.
* hooks.c (hook_void_FILEptr_tree): New function.
* hooks.h: Ditto.
* varasm.c (assemble_start_function): Call hook at start of function.
(assemble_end_function): Call hook at end of function.
* doc/tm.texi.in: Document new hooks.
* doc/tm.texi: Regenerate.
>From 791b0dc5ba32ace51fb8214cdb0cf769b91a024c Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 29 Jul 2015 16:14:23 +0100
Subject: [PATCH] Add new hooks asm_out.function_start and
 asm_out.function_end.

They are used by the implementation of __attribute__ ((target(...))) on S390.
---
 gcc/doc/tm.texi| 10 ++
 gcc/doc/tm.texi.in |  4 
 gcc/hooks.c|  7 +++
 gcc/hooks.h|  1 +
 gcc/target.def | 16 
 gcc/varasm.c   |  2 ++
 6 files changed, 40 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d548d96..62d83db 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7348,6 +7348,16 @@ Output to @code{asm_out_file} any text which the assembler expects
 to find at the end of a file.  The default is to output nothing.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_ASM_FUNCTION_START (FILE *@var{}, @var{tree})
+Output to @code{asm_out_file} any text which is necessary at the start of
+a function.  The default is to output nothing.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_ASM_FUNCTION_END (FILE *@var{}, @var{tree})
+Output to @code{asm_out_file} any text which is necessary at the end of a
+function.  The default is to output nothing.
+@end deftypefn
+
 @deftypefun void file_end_indicate_exec_stack ()
 Some systems use a common convention, the @samp{.note.GNU-stack}
 special section, to indicate whether or not an object file relies on
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9bef4a5..b1c4b96 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -5122,6 +5122,10 @@ This describes the overall framework of an assembly file.
 
 @hook TARGET_ASM_FILE_END
 
+@hook TARGET_ASM_FUNCTION_START
+
+@hook TARGET_ASM_FUNCTION_END
+
 @deftypefun void file_end_indicate_exec_stack ()
 Some systems use a common convention, the @samp{.note.GNU-stack}
 special section, to indicate whether or not an object file relies on
diff --git a/gcc/hooks.c b/gcc/hooks.c
index 0fb9add..3440e06 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -146,6 +146,13 @@ hook_void_FILEptr_constcharptr_const_tree (FILE *, const char *, const_tree)
 {
 }
 
+/* Generic hook that takes (FILE *, tree) and does
+   nothing.  */
+void
+hook_void_FILEptr_tree (FILE *, tree)
+{
+}
+
 /* Generic hook that takes (FILE *, rtx) and returns false.  */
 bool
 hook_bool_FILEptr_rtx_false (FILE *a ATTRIBUTE_UNUSED,
diff --git a/gcc/hooks.h b/gcc/hooks.h
index c3d4bd3..bbd26cb 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -70,6 +70,7 @@ extern void hook_void_rtx_insn_int (rtx_insn *, int);
 extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
 extern void hook_void_FILEptr_constcharptr_const_tree (FILE *, const char *,
 		   const_tree);
+extern void hook_void_FILEptr_tree (FILE *, tree);
 extern bool hook_bool_FILEptr_rtx_false (FILE *, rtx);
 extern void hook_void_rtx_tree (rtx, tree);
 extern void hook_void_tree (tree);
diff --git a/gcc/target.def b/gcc/target.def
index aa5a1f1..4a18be5 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -672,6 +672,22 @@ to find at the end of a file.  The default is to output nothing.",
  void, (void),
  hook_void_void)
 
+/* Output additional text at the start of a function.  */
+DEFHOOK
+(function_start,
+ "Output to @code{asm_out_file} any text which is necessary at the start of\n\
+a function.  The default is to output nothing.",
+ void, (FILE *, tree),
+ hook_void_FILEptr_tree)
+
+/* Output additional text at the end of a function.  */
+DEFHOOK
+(function_end,
+ "Output to @code{asm_out_file} any text which is necessary at the end of a\n\
+function.  The default is to output nothing.",
+ void, (FILE *, tree),
+ hook_void_FILEptr_tree)
+
 /* Output any boilerplate text needed at the beginning of an
LTO output stream.  */
 DEFHOOK
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 706e652..1b6f7b7 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1701,6 +1701,7 @@ assemble_start_function (tree decl, const char *fnname)
   char tmp_label[100];
   bool hot_label_written = false;
 
+  targetm.asm_out.function_start (asm_out_file, current_function_decl);
   if (flag_reorder_blocks_and_partition)
 {
   ASM_GENERATE_INTERNAL_LABEL (tmp

Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-09-23 Thread Thomas Schwinge
Hi!

I clarified the -foffload usage:
.

On Wed, 23 Sep 2015 00:23:50 +0200, Bernd Schmidt  wrote:
> On 09/22/2015 02:02 PM, Thomas Schwinge wrote:
> >
> > gcc/
> > * gcc.c (handle_foffload_option): Don't lose the trailing NUL
> > character when appending to offload_targets.
> >
> > gcc/
> > * configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
> > offload targets by commas, not colons.
> > * config.in: Regenerate.
> > * configure: Likewise.
> > * gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
> > instead of setting up the default offload targets here...
> > (process_command): ..., do it here.
> > libgomp/
> > * plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
> > targets are separated by commas.
> > * config.h.in: Regenerate.
> 
> Looks ok to me

Thanks for the prompt review!

> except this double ChangeLog seems messed up.

Hmm, I thought that was the standard way to format ChangeLogs for
several/independent changes?  Anyway, to avoid that, I've split the patch
into two separate commits; r228053 and r228054:

commit daa8f58fd840e8d35f362306fb54e1963f4cbd0f
Author: tschwinge 
Date:   Wed Sep 23 14:52:50 2015 +

Fix --enable-offload-targets/-foffload handling, pt. 1

gcc/
* configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
offload targets by commas, not colons.
* config.in: Regenerate.
* configure: Likewise.
* gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
instead of setting up the default offload targets here...
(process_command): ..., do it here.
libgomp/
* plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
targets are separated by commas.
* config.h.in: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228053 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog|   14 ++
 gcc/config.in|2 +-
 gcc/configure|2 +-
 gcc/configure.ac |4 ++--
 gcc/gcc.c|   23 +--
 gcc/lto-wrapper.c|4 
 libgomp/config.h.in  |2 +-
 libgomp/plugin/configfrag.ac |2 +-
 8 files changed, 37 insertions(+), 16 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 0e9b728..df71558 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,4 +1,18 @@
 2015-09-23  Thomas Schwinge  
+
+   * configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
+   offload targets by commas, not colons.
+   * config.in: Regenerate.
+   * configure: Likewise.
+   * gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
+   instead of setting up the default offload targets here...
+   (process_command): ..., do it here.
+   libgomp/
+   * plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
+   targets are separated by commas.
+   * config.h.in: Regenerate.
+
+2015-09-23  Thomas Schwinge  
Nathan Sidwell  
 
* omp-low.h (omp_reduction_init_op): Declare.
diff --git gcc/config.in gcc/config.in
index 431d262..c5c1be4 100644
--- gcc/config.in
+++ gcc/config.in
@@ -1913,7 +1913,7 @@
 #endif
 
 
-/* Define to hold the list of target names suitable for offloading. */
+/* Define to offload targets, separated by commas. */
 #ifndef USED_FOR_TARGET
 #undef OFFLOAD_TARGETS
 #endif
diff --git gcc/configure gcc/configure
index 6fb11a7..7493c80 100755
--- gcc/configure
+++ gcc/configure
@@ -7696,7 +7696,7 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   if test x"$offload_targets" = x; then
 offload_targets=$tgt
   else
-offload_targets="$offload_targets:$tgt"
+offload_targets="$offload_targets,$tgt"
   fi
 done
 
diff --git gcc/configure.ac gcc/configure.ac
index a6e078a..9d1f6f1 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -941,11 +941,11 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   if test x"$offload_targets" = x; then
 offload_targets=$tgt
   else
-offload_targets="$offload_targets:$tgt"
+offload_targets="$offload_targets,$tgt"
   fi
 done
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to hold the list of target names suitable for offloading.])
+  [Define to offload targets, separated by commas.])
 if test x"$offload_targets" != x; then
   AC_DEFINE(ENABLE_OFFLOADING, 1,
 [Define this to enable support for offloading.])
diff --git gcc/gcc.c gcc/gcc.c
index 757bfc9..78b68e2 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -284,7 +284,8 @@ static const char *const spec_version = DEFAULT_TARGET_VERSION;
 static const char *spec_machine = DEFAULT_TARGET_MACHINE;
 static const char *spec_host_machine = DEFAULT_REAL_

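After this change, OFFLOAD_TARGETS holds target names separated by commas rather than colons. As a rough illustration of consuming such a list, here is a small C sketch; the helper names and the example target string are hypothetical and are not taken from gcc.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example value; in GCC the real OFFLOAD_TARGETS string is
   generated by configure from --enable-offload-targets.  */
#define EXAMPLE_OFFLOAD_TARGETS "nvptx-none,x86_64-intelmicemul-linux-gnu"

/* Count the non-empty, comma-separated names in LIST.  */
static int
count_offload_targets (const char *list)
{
  int n = 0;
  const char *p = list;
  while (*p != '\0')
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);
      if (len > 0)
	n++;
      if (comma == NULL)
	break;
      p = comma + 1;
    }
  return n;
}

/* Copy the INDEX-th non-empty name of LIST into BUF (capacity BUFSZ,
   NUL-terminated).  Return 1 on success, 0 if there is no such entry
   or it does not fit.  */
static int
offload_target_at (const char *list, int index, char *buf, size_t bufsz)
{
  const char *p = list;
  while (*p != '\0')
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);
      if (len > 0 && index-- == 0)
	{
	  if (len >= bufsz)
	    return 0;
	  memcpy (buf, p, len);
	  buf[len] = '\0';
	  return 1;
	}
      if (comma == NULL)
	break;
      p = comma + 1;
    }
  return 0;
}
```

Empty fields (as in "a,,b,") are skipped, so a stray comma does not produce an empty target name.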
Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Jeff Law

On 09/23/2015 07:47 AM, Michael Matz wrote:

Hi,

On Wed, 23 Sep 2015, Richard Biener wrote:


The issue we have with LTO is that the linemap gets populated in quite
random order and thus we repeatedly switch files (we've mitigated this
somewhat for GCC 5).


Yes.


We also considered dropping column info (and would drop range info) as
diagnostics are from optimizers only with LTO and we keep locations
merely for debug info.


That would be the obvious mitigations, yes.  I do like the fact that we'd
be able to do all this without enlarging location_t.

That's the hope.

However, I did ask David to ponder the effects if ultimately we did need 
to extend location_t to 64 bits.


Jeff


Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Thomas Schwinge
Hi!

On Wed, 23 Sep 2015 10:19:15 -0400, Nathan Sidwell wrote:
> On 09/23/15 10:16, Thomas Schwinge wrote:
> > On Wed, 23 Sep 2015 08:40:51 -0400, Nathan Sidwell wrote:
> >> On 09/23/15 05:27, Thomas Schwinge wrote:
> >>> On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:
>  I've committed this patch to add a new pair of internal functions.
>  These will be used in implementing reductions.

> >>> With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
> >>> removed, should the gcc.target/nvptx/spinlock-1.c and
> >>> gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
> >>> these be re-written differently?
> >>
> >> confused.  I don't think I removed those locks.  Certainly didn't intend to,
> >> and I would have expected massive test fails if I had.
> >
> > You didn't remove the functionality, but you did remove the
> > __builtin_nvptx_lock and __builtin_nvptx_unlock builtins (which the two
> > test cases were written for), replacing them with GOACC_LOCK/GOACC_UNLOCK
> > internal functions, nvptx_expand_oacc_lock_unlock.
> 
> ah, thanks. I expect even these are going to go away soon. the spinlock 
> testcases should be removed.

Committed to gomp-4_0-branch in r228055:

commit fa0a1ef0b746e6f2f7c54f5516ee2c8ebe05cf25
Author: tschwinge 
Date:   Wed Sep 23 15:16:05 2015 +

[nvptx] Remove obsolete spinlock test cases

gcc/testsuite/
* gcc.target/nvptx/spinlock-1.c: Remove file.
* gcc.target/nvptx/spinlock-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228055 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp|5 +
 gcc/testsuite/gcc.target/nvptx/spinlock-1.c |   11 ---
 gcc/testsuite/gcc.target/nvptx/spinlock-2.c |   10 --
 3 files changed, 5 insertions(+), 21 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index b14167e..1e7667d 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-09-23  Thomas Schwinge  
+
+   * gcc.target/nvptx/spinlock-1.c: Remove file.
+   * gcc.target/nvptx/spinlock-2.c: Likewise.
+
 2015-09-18  Thomas Schwinge  
 
* gcc.target/nvptx/spinlock-1.c: Fix DejaGnu directives.
diff --git gcc/testsuite/gcc.target/nvptx/spinlock-1.c gcc/testsuite/gcc.target/nvptx/spinlock-1.c
deleted file mode 100644
index b464ad9..000
--- gcc/testsuite/gcc.target/nvptx/spinlock-1.c
+++ /dev/null
@@ -1,11 +0,0 @@
-/* { dg-do compile } */
-void Foo ()
-{
-  __builtin_nvptx_lock (0);
-  __builtin_nvptx_unlock (0);
-}
-
-
-/* { dg-final { scan-assembler-times ".atom.global.cas.b32" 2 } } */
-/* { dg-final { scan-assembler ".global .u32 __global_lock;" } } */
-/* { dg-final { scan-assembler-not ".shared .u32 __shared_lock;" } } */
diff --git gcc/testsuite/gcc.target/nvptx/spinlock-2.c gcc/testsuite/gcc.target/nvptx/spinlock-2.c
deleted file mode 100644
index 9a51d3f..000
--- gcc/testsuite/gcc.target/nvptx/spinlock-2.c
+++ /dev/null
@@ -1,10 +0,0 @@
-/* { dg-do compile } */
-void Foo ()
-{
-  __builtin_nvptx_lock (1);
-  __builtin_nvptx_unlock (1);
-}
-
-/* { dg-final { scan-assembler-times ".atom.shared.cas.b32" 2 } } */
-/* { dg-final { scan-assembler ".shared .u32 __shared_lock;" } } */
-/* { dg-final { scan-assembler-not ".global .u32 __global_lock;" } } */


Grüße,
 Thomas



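The deleted tests only checked that the lock builtins expanded to compare-and-swap instructions (.atom.global.cas.b32 or .atom.shared.cas.b32) on a global or shared lock word. For readers unfamiliar with the pattern, here is a host-side C11 sketch of a CAS spinlock using <stdatomic.h>; it is an illustration, not the nvptx expansion, and its unlock is a plain release store rather than the second CAS the old test counted.

```c
#include <stdatomic.h>

/* Lock word: 0 = unlocked, 1 = locked.  Plays the role of the
   __global_lock/__shared_lock variables the old tests scanned for.  */
typedef struct { atomic_uint word; } cas_spinlock;

#define CAS_SPINLOCK_INIT { 0 }

static void
cas_spin_lock (cas_spinlock *l)
{
  unsigned expected = 0;
  /* Retry the compare-and-swap 0 -> 1 until it succeeds.  */
  while (!atomic_compare_exchange_weak_explicit (&l->word, &expected, 1,
						 memory_order_acquire,
						 memory_order_relaxed))
    expected = 0;   /* A failed CAS stored the observed value here.  */
}

/* Single attempt; returns nonzero if the lock was acquired.  */
static int
cas_spin_trylock (cas_spinlock *l)
{
  unsigned expected = 0;
  return atomic_compare_exchange_strong_explicit (&l->word, &expected, 1,
						  memory_order_acquire,
						  memory_order_relaxed);
}

static void
cas_spin_unlock (cas_spinlock *l)
{
  atomic_store_explicit (&l->word, 0, memory_order_release);
}
```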

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-23 Thread Alan Hayward


On 18/09/2015 14:53, "Alan Hayward"  wrote:

>
>
>On 18/09/2015 14:26, "Alan Lawrence"  wrote:
>
>>On 18/09/15 13:17, Richard Biener wrote:
>>>
>>> Ok, I see.
>>>
>>> That this case is already vectorized is because it implements MAX_EXPR,
>>> modifying it slightly to
>>>
>>> int foo (int *a)
>>> {
>>>int val = 0;
>>>for (int i = 0; i < 1024; ++i)
>>>  if (a[i] > val)
>>>val = a[i] + 1;
>>>return val;
>>> }
>>>
>>> makes it no longer handled by current code.
>>>
>>
>>Yes. I believe the idea for the patch is to handle arbitrary expressions
>>like
>>
>>int foo (int *a)
>>{
>>int val = 0;
>>for (int i = 0; i < 1024; ++i)
>>  if (some_expression (i))
>>val = another_expression (i);
>>return val;
>>}
>
>Yes, that’s correct. Hopefully my new test cases should cover everything.
>

Attached is a new version of the patch containing all the changes
requested by Richard.


Thanks,
Alan.




0001-Support-for-vectorizing-conditional-expressions.patch

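Conceptually, the loop in Alan Lawrence's example keeps the value from the last iteration whose condition held. One way to see why that is vectorizable is to rewrite it as a MAX reduction over the matching iteration indices, which no longer carries a dependence through VAL. The C sketch below uses hypothetical stand-ins (a[i] > 10 for some_expression and a[i] + i for another_expression) and only illustrates the idea; it is not the code the patch generates.

```c
/* Scalar form, as in the quoted example.  */
static int
foo_scalar (const int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > 10)        /* stand-in for some_expression (i) */
      val = a[i] + i;     /* stand-in for another_expression (i) */
  return val;
}

/* Reformulated: find the last matching index with an order-independent
   MAX reduction, then evaluate the assigned expression once.  */
static int
foo_last_index (const int *a)
{
  int last = -1;
  for (int i = 0; i < 1024; ++i)
    {
      int idx = a[i] > 10 ? i : -1;
      last = idx > last ? idx : last;
    }
  return last < 0 ? 0 : a[last] + last;
}
```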

[PATCH][tree-inline][obvious] Delete redundant count_insns_seq

2015-09-23 Thread Kyrill Tkachov

Hi all,

I notice that the functions count_insns_seq and estimate_num_insns_seq perform 
the exact same function for exactly the same arguments.
It's redundant to keep both around. I've decided to delete count_insns_seq and 
replace its one use by estimate_num_insns_seq.

Bootstrapped and tested on aarch64, x86_64.
I think this change is obvious, so I'll commit it in 24 hours unless someone 
objects.

Thanks,
Kyrill

2015-09-23  Kyrylo Tkachov  

* tree-inline.h (count_insns_seq): Delete prototype.
(estimate_num_insns_seq): Define prototype.
* tree-inline.c (count_insns_seq): Delete.
(estimate_num_insns_seq): Remove static qualifier.
* tree-eh.c (decide_copy_try_finally): Replace use of count_insns_seq
with estimate_num_insns_seq.
commit b4266c4bd350628fe5d333998b7a76a7d4ab2ad5
Author: Kyrylo Tkachov 
Date:   Wed Sep 23 12:14:46 2015 +0100

[tree-inline] Delete redundant count_insns_seq

diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index c19d2be..cb1f08a 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -1621,7 +1621,7 @@ decide_copy_try_finally (int ndests, bool may_throw, gimple_seq finally)
 }
 
   /* Finally estimate N times, plus N gotos.  */
-  f_estimate = count_insns_seq (finally, &eni_size_weights);
+  f_estimate = estimate_num_insns_seq (finally, &eni_size_weights);
   f_estimate = (f_estimate + 1) * ndests;
 
   /* Switch statement (cost 10), N variable assignments, N gotos.  */
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index abaea3f..36075b2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3972,8 +3972,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
the statements in the statement sequence STMTS.
WEIGHTS contains weights attributed to various constructs.  */
 
-static
-int estimate_num_insns_seq (gimple_seq stmts, eni_weights *weights)
+int
+estimate_num_insns_seq (gimple_seq stmts, eni_weights *weights)
 {
   int cost;
   gimple_stmt_iterator gsi;
@@ -4262,19 +4262,6 @@ init_inline_once (void)
   eni_time_weights.return_cost = 2;
 }
 
-/* Estimate the number of instructions in a gimple_seq. */
-
-int
-count_insns_seq (gimple_seq seq, eni_weights *weights)
-{
-  gimple_stmt_iterator gsi;
-  int n = 0;
-  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
-n += estimate_num_insns (gsi_stmt (gsi), weights);
-
-  return n;
-}
-
 
 /* Install new lexical TREE_BLOCK underneath 'current_block'.  */
 
diff --git a/gcc/tree-inline.h b/gcc/tree-inline.h
index f0e5436..b8fb2a2 100644
--- a/gcc/tree-inline.h
+++ b/gcc/tree-inline.h
@@ -207,7 +207,7 @@ tree copy_decl_no_change (tree decl, copy_body_data *id);
 int estimate_move_cost (tree type, bool);
 int estimate_num_insns (gimple *, eni_weights *);
 int estimate_num_insns_fn (tree, eni_weights *);
-int count_insns_seq (gimple_seq, eni_weights *);
+int estimate_num_insns_seq (gimple_seq, eni_weights *);
 bool tree_versionable_function_p (tree);
 extern tree remap_decl (tree decl, copy_body_data *id);
 extern tree remap_type (tree type, copy_body_data *id);

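For context, the replaced call feeds the size estimate in decide_copy_try_finally. A small C sketch of the two estimates visible in the hunk, taking the comment "Switch statement (cost 10), N variable assignments, N gotos" literally; the function names are hypothetical, and the thresholds GCC then compares these against are not shown in this patch.

```c
/* Cost of copying a finally block of FINALLY_INSNS insns into each of
   NDESTS destinations: N copies plus N gotos.  */
static int
copy_finally_estimate (int finally_insns, int ndests)
{
  return (finally_insns + 1) * ndests;
}

/* Cost of the switch-based lowering: switch (10), N variable
   assignments and N gotos.  */
static int
switch_finally_estimate (int ndests)
{
  return 10 + 2 * ndests;
}
```

For example, a 3-insn finally block reached from 2 destinations costs 8 to copy versus 14 for the switch, while a 20-insn block with 4 destinations costs 84 versus 18.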

Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-23 Thread Marek Polacek
On Wed, Sep 23, 2015 at 01:08:53PM +0200, Bernd Schmidt wrote:
> On 09/22/2015 05:11 PM, Marek Polacek wrote:
> 
> >diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
> >index e0cce84..d2bc264 100644
> >--- gcc/c-family/c-ubsan.c
> >+++ gcc/c-family/c-ubsan.c
> >@@ -104,6 +104,7 @@ ubsan_instrument_division (location_t loc, tree op0, tree op1)
> > }
> >  }
> >t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op0), t);
> >+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op1), t);
> >if (flag_sanitize_undefined_trap_on_error)
> >  tt = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
> >else
> 
> I really don't know this code, but just before the location you're patching,
> there's this:
> 
>   /* In case we have a SAVE_EXPR in a conditional context, we need to
>  make sure it gets evaluated before the condition.  If the OP0 is
>  an instrumented array reference, mark it as having side effects so
>  it's not folded away.  */
>   if (flag_sanitize & SANITIZE_BOUNDS)
> {
>   tree xop0 = op0;
>   while (CONVERT_EXPR_P (xop0))
> xop0 = TREE_OPERAND (xop0, 0);
>   if (TREE_CODE (xop0) == ARRAY_REF)
> {
>   TREE_SIDE_EFFECTS (xop0) = 1;
>   TREE_SIDE_EFFECTS (op0) = 1;
> }
> }
> 
> Does that need to be done for op1 as well? (I really wonder why this is
> needed or whether it's sufficient to find such an ARRAY_REF if you can have
> more complex operands).
 
Good point.  I've dug into this and that hunk doesn't seem to be needed
(anymore?).  I suppose there was a reason I added that, but removing it
doesn't seem to break anything.  It can be triggered with a code like:

struct S
{
  unsigned long a[1];
  int l;
};

static inline unsigned long
fn (const struct S *s, int i)
{
  return s->a[i] / i;
}

int
main ()
{
  struct S s;
  fn (&s, 1);
}

With the hunk, we sanitize the same array twice -- that's "suboptimal".  With
the hunk removed, we sanitize the array just once as expected.

> The same pattern occurs in another function, so it may be best to break it
> out into a new function if additional occurrences are necessary.

Given that the code above seems to be useless now, I think let's put this
patch in as-is, backport it to gcc-5, then remove those redundant hunks on
trunk and add the testcase above.  Do you agree?

Marek

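At the source level, the division checks amount to evaluating both operands before testing them: the patch wraps the check in COMPOUND_EXPRs so that op1, like op0, is evaluated before the condition rather than only inside it. A hand-written C sketch of that shape; checked_div is hypothetical and only mirrors what the instrumentation tests for (division by zero, and INT_MIN / -1 overflow), not the trees ubsan builds.

```c
#include <limits.h>
#include <stdbool.h>

/* Both operands are already evaluated (as function arguments) before
   any check runs.  Returns false, leaving *RESULT untouched, where the
   sanitizer would report an error.  */
static bool
checked_div (int op0, int op1, int *result)
{
  if (op1 == 0)                        /* division by zero */
    return false;
  if (op0 == INT_MIN && op1 == -1)     /* quotient not representable */
    return false;
  *result = op0 / op1;
  return true;
}
```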

[patch] Reduce space and time overhead of std::thread

2015-09-23 Thread Jonathan Wakely

For PR 65393 I avoided some unnecessary shared_ptr copies while
launching a std::thread. This goes further and avoids shared_ptr
entirely, using unique_ptr instead. This reduces the memory overhead
of a std::thread by 32 bytes (on 64-bit) and avoids any
reference-count updates.

The downside is it exports some new symbols, and we have to keep the
old code for backwards compatibility, but I think it's worth doing.

Does anybody disagree?
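The core idea, handing the new thread sole ownership of a heap-allocated state object instead of sharing it through a reference-counted pointer, can be illustrated in C with POSIX threads. This is a hedged analogue, not the libstdc++ code: thread_state, start_thread, execute_state and the run/destroy function pointers are hypothetical stand-ins for _State, _M_start_thread, execute_native_thread_routine and the virtual _M_run/destructor.

```c
#include <pthread.h>
#include <stdlib.h>

/* Type-erased state with a hand-rolled "vtable".  */
struct thread_state
{
  void (*run) (struct thread_state *);
  void (*destroy) (struct thread_state *);
};

/* Thread entry point: it owns the state, so it runs it and frees it.
   No reference count is needed because nobody else keeps a pointer.  */
static void *
execute_state (void *p)
{
  struct thread_state *st = p;
  st->run (st);
  st->destroy (st);
  return NULL;
}

/* Transfer ownership of ST to a new thread.  On failure ownership
   stays with the caller, who must destroy ST.  */
static int
start_thread (pthread_t *tid, struct thread_state *st)
{
  return pthread_create (tid, NULL, execute_state, st);
}

/* Example concrete state: add two ints and store the sum.  */
struct add_state
{
  struct thread_state base;   /* must be the first member */
  int a, b;
  int *out;
};

static void
add_run (struct thread_state *s)
{
  struct add_state *st = (struct add_state *) s;
  *st->out = st->a + st->b;
}

static void
add_destroy (struct thread_state *s)
{
  free (s);
}

static struct thread_state *
make_add_state (int a, int b, int *out)
{
  struct add_state *st = malloc (sizeof *st);
  if (st == NULL)
    return NULL;
  st->base.run = add_run;
  st->base.destroy = add_destroy;
  st->a = a;
  st->b = b;
  st->out = out;
  return &st->base;
}
```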



commit 2d7e89aae8ac12dd7a6b2083e5169679c1200cc5
Author: Jonathan Wakely 
Date:   Thu Mar 12 13:23:23 2015 +

Reduce space and time overhead of std::thread

	PR libstdc++/65393
	* config/abi/pre/gnu.ver: Export new symbols.
	* include/std/thread (thread::_State, thread::_State_impl): New types.
	(thread::_M_start_thread): Add overload taking unique_ptr<_State>.
	(thread::_M_make_routine): Remove.
	(thread::_S_make_state): Add.
	(thread::_Impl_base, thread::_Impl, thread::_M_start_thread)
	[_GLIBCXX_THREAD_ABI_COMPAT] Only declare conditionally.
	* src/c++11/thread.cc (execute_native_thread_routine): Rename to
	execute_native_thread_routine_compat and re-define to use _State.
	(thread::_State::~_State()): Define.
	(thread::_M_make_thread): Define new overload.
	(thread::_M_make_thread) [_GLIBCXX_THREAD_ABI_COMPAT]: Only define old
	overloads conditionally.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index d42cd37..08d9bc6 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1870,6 +1870,11 @@ GLIBCXX_3.4.22 {
 # std::uncaught_exceptions()
 _ZSt19uncaught_exceptionsv;
 
+# std::thread::_State::~_State()
+_ZT[ISV]NSt6thread6_StateE;
+_ZNSt6thread6_StateD[012]Ev;
+_ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;
+
 } GLIBCXX_3.4.21;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index ebbda62..c67ec46 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -60,9 +60,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class thread
   {
   public:
+// Abstract base class for types that wrap arbitrary functors to be
+// invoked in the new thread of execution.
+struct _State
+{
+  virtual ~_State();
+  virtual void _M_run() = 0;
+};
+using _State_ptr = unique_ptr<_State>;
+
 typedef __gthread_t			native_handle_type;
-struct _Impl_base;
-typedef shared_ptr<_Impl_base>	__shared_base_type;
 
 /// thread::id
 class id
@@ -92,29 +99,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	operator<<(basic_ostream<_CharT, _Traits>& __out, thread::id __id);
 };
 
-// Simple base type that the templatized, derived class containing
-// an arbitrary functor can be converted to and called.
-struct _Impl_base
-{
-  __shared_base_type	_M_this_ptr;
-
-  inline virtual ~_Impl_base();
-
-  virtual void _M_run() = 0;
-};
-
-template<typename _Callable>
-  struct _Impl : public _Impl_base
-  {
-	_Callable		_M_func;
-
-	_Impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
-	{ }
-
-	void
-	_M_run() { _M_func(); }
-  };
-
   private:
 id _M_id;
 
@@ -133,16 +117,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   thread(_Callable&& __f, _Args&&... __args)
   {
 #ifdef GTHR_ACTIVE_PROXY
-	// Create a reference to pthread_create, not just the gthr weak symbol
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)),
-	reinterpret_cast<void(*)()>(&pthread_create));
+	// Create a reference to pthread_create, not just the gthr weak symbol.
+	auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
 #else
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)));
+	auto __depend = nullptr;
 #endif
+_M_start_thread(_S_make_state(
+	  std::__bind_simple(std::forward<_Callable>(__f),
+ std::forward<_Args>(__args)...)),
+	__depend);
   }
 
 ~thread()
@@ -190,23 +173,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 hardware_concurrency() noexcept;
 
   private:
+template<typename _Callable>
+  struct _State_impl : public _State
+  {
+	_Callable		_M_func;
+
+	_State_impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
+	{ }
+
+	void
+	_M_run() { _M_func(); }
+  };
+
+void
+_M_start_thread(_State_ptr, void (*)());
+
+template<typename _Callable>
+  static _State_ptr
+  _S_make_state(_Callable&& __f)
+  {
+	using _Impl = _State_impl<_Callable>;
+	return _State_ptr{new _Impl{std::forward<_Callable>(__f)}};
+  }
+#if _GLIBCXX_THREAD_ABI_COMPAT
+  public:
+struct _Impl_base;
+typedef shared_ptr<_Impl_base>	__shared_base_type;
+struct _Impl_base
+{
+  __shared_base_t

Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take

2015-09-23 Thread Marek Polacek
On Tue, Sep 22, 2015 at 03:33:34PM -0600, Martin Sebor wrote:
> It's fine by me (for whatever it's worth).

Thanks.  Let's wait if Jason/Joseph or anyone else wants to chime in.
 
> Btw., if you're unhappy about having to wipe out the whole chain
> after every side-effect it occurred to me that it might be possible
> to do better: instead of deleting the whole chain, only remove from
> it the elements that may be affected by the side-effect. This should
> make it possible to keep on the chain all conditions involving local
> variables whose address hasn't been taken, which I would expect to
> be most in most cases.

I'm not unhappy about deleting the chain ;).  I'd rather not do that
because that might get somewhat hairy.  First, I don't think we have
the capability to easily detect variables whose address hasn't been
taken, second, consider e.g.

  if (j == 4) // ...
  else if ((j++, --k, ++l)) // ...
  else if (bar (j, &k)) // ...

we'd probably need some walk_tree, save the variables temporarily somewhere
etc.; that might slow down and complicate things for a corner case.  Or am I being
just too lazy? ;)

Thanks,

Marek

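Why a side effect has to invalidate the chain: after an intervening condition modifies a variable, a condition that is textually identical to an earlier one can evaluate differently. A minimal C illustration; pick is a made-up function, and nothing is claimed here about whether the warning fires on it.

```c
/* The first and third conditions are textually identical, but the
   second one increments J, so the third branch is still reachable.  */
static int
pick (int j)
{
  if (j == 4)
    return 1;
  else if ((j++, 0))
    return 2;
  else if (j == 4)
    return 3;
  return 0;
}
```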

libgo patch committed: rewrite lfstack to look more like gc code

2015-09-23 Thread Ian Lance Taylor
This patch by Michael Hudson-Doyle rewrites the lfstack code in libgo
to look more like that in the gc library.  It also fixes it for arm64.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227863)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e069d4417a692c1261df99fe3323277e1a0193d2
+2087b95180caea3477647c449772b7fecc01a71c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/lfstack.goc
===
--- libgo/runtime/lfstack.goc   (revision 227696)
+++ libgo/runtime/lfstack.goc   (working copy)
@@ -9,25 +9,41 @@ package runtime
 #include "arch.h"
 
 #if __SIZEOF_POINTER__ == 8
-// Amd64 uses 48-bit virtual addresses, 47-th bit is used as kernel/user flag.
-// So we use 17msb of pointers as ABA counter.
-# define PTR_BITS 47
-#else
-# define PTR_BITS 32
-#endif
-#define PTR_MASK ((1ull CNT_BITS) << 3);
+}
+#else
+static inline uint64 lfPack(LFNode *node, uintptr cnt) {
+   return ((uint64)(uintptr)(node)<<32) | cnt;
+}
+static inline LFNode* lfUnpack(uint64 val) {
+   return (LFNode*)(uintptr)(val >> 32);
+}
 #endif
 
 void
@@ -35,16 +51,16 @@ runtime_lfstackpush(uint64 *head, LFNode
 {
uint64 old, new;
 
-   if((uintptr)node != ((uintptr)node&PTR_MASK)) {
+   if(node != lfUnpack(lfPack(node, 0))) {
runtime_printf("p=%p\n", node);
runtime_throw("runtime_lfstackpush: invalid pointer");
}
 
node->pushcnt++;
-   new = 
(uint64)(uintptr)node|(((uint64)node->pushcnt&CNT_MASK)next = (LFNode*)(uintptr)(old&PTR_MASK);
+   node->next = lfUnpack(old);
if(runtime_cas64(head, old, new))
break;
}
@@ -60,11 +76,11 @@ runtime_lfstackpop(uint64 *head)
old = runtime_atomicload64(head);
if(old == 0)
return nil;
-   node = (LFNode*)(uintptr)(old&PTR_MASK);
+   node = lfUnpack(old);
node2 = runtime_atomicloadp(&node->next);
new = 0;
if(node2 != nil)
-   new = 
(uint64)(uintptr)node2|(((uint64)node2->pushcnt&CNT_MASK)

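The rewrite follows the gc runtime in packing a node pointer and an ABA counter into a single uint64 via lfPack/lfUnpack. A hedged C sketch of one such 64-bit packing, assuming user-space addresses fit in 48 bits with bit 47 clear and nodes are 8-byte aligned; ADDR_BITS, CNT_BITS, lf_pack and lf_unpack are illustrative assumptions, not copied from lfstack.goc.

```c
#include <stdint.h>

/* Assumptions: addresses fit in ADDR_BITS bits (sign-extended) and the
   low 3 bits of every node address are zero (8-byte alignment).  */
#define ADDR_BITS 48
#define CNT_BITS (64 - ADDR_BITS + 3)   /* 19 bits for the counter */

/* Move the pointer to the top of the word; the counter fills the low
   CNT_BITS bits, which the shift plus alignment left as zero.  */
static inline uint64_t
lf_pack (const void *node, uintptr_t cnt)
{
  return ((uint64_t) (uintptr_t) node << (64 - ADDR_BITS))
	 | ((uint64_t) cnt & ((UINT64_C (1) << CNT_BITS) - 1));
}

/* An arithmetic right shift (GCC's behaviour for signed values) drops
   the counter and restores sign extension; << 3 restores alignment.  */
static inline void *
lf_unpack (uint64_t val)
{
  return (void *) (uintptr_t) ((uint64_t) ((int64_t) val >> CNT_BITS) << 3);
}
```

The counter silently wraps at 2^19, which is what an ABA counter needs: it only has to differ between a pop and a later push of the same node.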
[gomp4 1/8] nvptx: remove assumption of OpenACC attrs presence

2015-09-23 Thread Alexander Monakov
This patch makes one OpenACC-specific path in nvptx_record_offload_symbol
optional.

* config/nvptx/nvptx.c (nvptx_record_offload_symbol): Allow missing
OpenACC attributes.
---
 gcc/config/nvptx/nvptx.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 53850a1..21c59ef 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4026,19 +4026,22 @@ nvptx_record_offload_symbol (tree decl)
 
 case FUNCTION_DECL:
   {
-   tree attr = get_oacc_fn_attrib (decl);
-   tree dims = TREE_VALUE (attr);
-   unsigned ix;
-   
fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
 
-   for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
+   tree attr = get_oacc_fn_attrib (decl);
+   if (attr)
  {
-   int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+   tree dims = TREE_VALUE (attr);
+   unsigned ix;
 
-   gcc_assert (!TREE_PURPOSE (dims));
-   fprintf (asm_out_file, ", %#x", size);
+   for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
+   {
+ int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+
+ gcc_assert (!TREE_PURPOSE (dims));
+ fprintf (asm_out_file, ", %#x", size);
+   }
  }
 
fprintf (asm_out_file, "\n");

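With the change, a function without OpenACC attributes still gets a //:FUNC_MAP line, just without the trailing launch dimensions. A standalone C sketch of the resulting line format; format_func_map is hypothetical, and it takes the dimensions as a plain array (or NULL) instead of a tree attribute list.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for GOMP_DIM_MAX (gang, worker, vector).  */
#define EXAMPLE_DIM_MAX 3

/* Format the FUNC_MAP record for NAME into BUF.  DIMS may be NULL, as
   for a function without OpenACC attributes after this patch.  */
static int
format_func_map (char *buf, size_t size, const char *name, const int *dims)
{
  int n = snprintf (buf, size, "//:FUNC_MAP \"%s\"", name);
  if (dims != NULL)
    for (int ix = 0;
	 ix != EXAMPLE_DIM_MAX && n >= 0 && (size_t) n < size; ix++)
      n += snprintf (buf + n, size - n, ", %#x", dims[ix]);
  return n;
}
```

Note that the # flag adds the 0x prefix only to nonzero values, so a zero dimension prints as plain 0.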

[gomp4 4/8] libgomp: minimal OpenMP support in plugin-nvptx.c

2015-09-23 Thread Alexander Monakov
This is a minimal patch for NVPTX OpenMP offloading, using Jakub's initial
implementation.  It makes it possible to run '#pragma omp target' successfully,
without any parallel execution: 1 team of 1 thread is spawned on the device, and
target regions with '#pragma omp parallel' will fail with a link error.

* plugin/plugin-nvptx.c (nvptx_host2dev): Allow NULL 'nvthd'.
(nvptx_dev2host): Ditto.
(GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
(GOMP_OFFLOAD_run): New.
---
 libgomp/plugin/plugin-nvptx.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 52c49c7..a3eaafa 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1052,7 +1052,7 @@ nvptx_host2dev (void *d, const void *h, size_t s)
 GOMP_PLUGIN_fatal ("invalid size");
 
 #ifndef DISABLE_ASYNC
-  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+  if (nvthd && nvthd->current_stream != nvthd->ptx_dev->null_stream)
 {
   CUevent *e;
 
@@ -1117,7 +1117,7 @@ nvptx_dev2host (void *h, const void *d, size_t s)
 GOMP_PLUGIN_fatal ("invalid size");
 
 #ifndef DISABLE_ASYNC
-  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+  if (nvthd && nvthd->current_stream != nvthd->ptx_dev->null_stream)
 {
   CUevent *e;
 
@@ -1451,7 +1451,7 @@ GOMP_OFFLOAD_get_name (void)
 unsigned int
 GOMP_OFFLOAD_get_caps (void)
 {
-  return GOMP_OFFLOAD_CAP_OPENACC_200;
+  return GOMP_OFFLOAD_CAP_OPENACC_200 | GOMP_OFFLOAD_CAP_OPENMP_400;
 }
 
 int
@@ -1788,3 +1788,27 @@ GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void *stream)
 {
   return nvptx_set_cuda_stream (async, stream);
 }
+
+void
+GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars)
+{
+  CUfunction function = ((struct targ_fn_descriptor *) tgt_fn)->fn;
+  CUresult r;
+  struct ptx_device *ptx_dev = ptx_devices[ord];
+  const char *maybe_abort_msg = "(perhaps abort was called)";
+  void *args = &tgt_vars;
+
+  r = cuLaunchKernel (function,
+ 1, 1, 1,
+ 1, 1, 1,
+ 0, ptx_dev->null_stream->stream, &args, 0);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
+
+  r = cuCtxSynchronize ();
+  if (r == CUDA_ERROR_LAUNCH_FAILED)
+GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r),
+  maybe_abort_msg);
+  else if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
+}


[gomp4 3/8] libgomp: provide target-to-host fallback diagnostic

2015-09-23 Thread Alexander Monakov
This patch makes it possible to see, with GOMP_DEBUG=1 in the environment,
when target regions are executed on the host.

* target.c (GOMP_target): Use gomp_debug on fallback path.
---
 libgomp/target.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/target.c b/libgomp/target.c
index 6ca80ad..1cc2098 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1008,6 +1008,7 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
 {
   /* Host fallback.  */
+  gomp_debug (0, "%s: target region executing on host\n", __FUNCTION__);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
   memset (thr, '\0', sizeof (*thr));


[gomp4 2/8] nvptx mkoffload: do not restrict to OpenACC

2015-09-23 Thread Alexander Monakov
This patch makes it possible to invoke mkoffload meaningfully with -fopenmp.
The check for the -fopenacc flag is specific to the gomp4 branch: trunk does
not have it.

* config/nvptx/mkoffload.c (main): Do not check for -fopenacc.
---
 gcc/config/nvptx/mkoffload.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 0114394..8c15686 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -468,15 +468,12 @@ main (int argc, char **argv)
   obstack_ptr_grow (&argv_obstack, str);
 }
 
-  bool fopenacc = false;
   for (int ix = 1; ix != argc; ix++)
 {
   if (!strcmp (argv[ix], "-v"))
verbose = true;
   else if (!strcmp (argv[ix], "-save-temps"))
save_temps = true;
-  else if (!strcmp (argv[ix], "-fopenacc"))
-   fopenacc = true;
 
   if (!strcmp (argv[ix], "-o") && ix + 1 != argc)
outname = argv[++ix];
@@ -491,8 +488,8 @@ main (int argc, char **argv)
 fatal_error (input_location, "cannot open '%s'", ptx_cfile_name);
 
   /* PR libgomp/65099: Currently, we only support offloading in 64-bit
- configurations, and only for OpenACC offloading.  */
-  if (!target_ilp32 && fopenacc)
+ configurations.  */
+  if (!target_ilp32)
 {
   ptx_name = make_temp_file (".mkoffload");
   obstack_ptr_grow (&argv_obstack, "-o");


[gomp4 0/8] NVPTX: initial OpenMP offloading

2015-09-23 Thread Alexander Monakov
Hello,

This patch series implements some minimally required changes to have OpenMP
offloading working for NVPTX target on the gomp4 branch.  '#pragma omp target'
and data updates should work, but all parallel execution functionality remains
stubbed out (uses of '#pragma omp parallel' in target regions yield a link
error).

I'd like to get feedback on the patches, and approval for the gomp-4_0-branch
where possible.

Patches 1-2 unbreak compilation with offloading, patch 4 makes it possible to
invoke a target region on the accelerator, and patches 5-8 unbreak libgomp.h
and allow env.c to be compiled for the accelerator.

  nvptx: remove assumption of OpenACC attrs presence
  nvptx mkoffload: do not restrict to OpenACC
  libgomp: provide target-to-host fallback diagnostic
  libgomp: minimal OpenMP support in plugin-nvptx.c
  libgomp: provide sem.h, mutex.h, ptrlock.h on nvptx
  libgomp: provide stub bar.h on nvptx
  libgomp: work around missing pthread_attr_t on nvptx
  libgomp: provide ICVs via env.c on nvptx

 gcc/config/nvptx/mkoffload.c   |   7 +-
 gcc/config/nvptx/nvptx.c   |  19 ++--
 libgomp/config/nvptx/bar.h |  38 +++
 libgomp/config/nvptx/env.c | 219 +
 libgomp/config/nvptx/mutex.h   |  67 +
 libgomp/config/nvptx/ptrlock.h |  73 ++
 libgomp/config/nvptx/sem.h |  65 
 libgomp/libgomp.h  |   5 +
 libgomp/plugin/plugin-nvptx.c  |  30 +-
 libgomp/target.c   |   1 +
 10 files changed, 508 insertions(+), 16 deletions(-)
 create mode 100644 libgomp/config/nvptx/bar.h
 create mode 100644 libgomp/config/nvptx/mutex.h
 create mode 100644 libgomp/config/nvptx/ptrlock.h
 create mode 100644 libgomp/config/nvptx/sem.h



[gomp4 7/8] libgomp: work around missing pthread_attr_t on nvptx

2015-09-23 Thread Alexander Monakov
Although newlib headers define most pthreads types, pthread_attr_t is not
available.  Macro-replace it by 'void' to keep the prototype of
gomp_init_thread_affinity unchanged, and do not declare gomp_thread_attr.

* libgomp.h: Define pthread_attr_t to void on NVPTX.
---
 libgomp/libgomp.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d51b08b..f4255b4 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -510,8 +510,13 @@ static inline struct gomp_task_icv *gomp_icv (bool write)
 return &gomp_global_icv;
 }
 
+#ifdef __nvptx__
+/* pthread_attr_t is not provided by newlib on NVPTX.  */
+#define pthread_attr_t void
+#else
 /* The attributes to be used during thread creation.  */
 extern pthread_attr_t gomp_thread_attr;
+#endif
 
 /* Function prototypes.  */
 

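Why the macro keeps the prototype unchanged: once the type name expands to void, a declaration taking a pointer to it becomes one taking void *, so the same prototype and its callers still compile without a separate NVPTX variant. A small C sketch; my_attr_t and init_affinity are hypothetical stand-ins for pthread_attr_t and gomp_init_thread_affinity, and the return value is invented purely for the demonstration.

```c
#include <stddef.h>

/* Hypothetical stand-in: on a target lacking the real type, the name is
   simply macro-replaced by void.  */
#define my_attr_t void

/* Declared once in terms of the (possibly missing) type...  */
static int init_affinity (my_attr_t *attr, unsigned int place);

/* ...and callable with any object pointer, since my_attr_t * is now
   void *.  Returns PLACE if ATTR is non-null, else -1 (demo only).  */
static int
init_affinity (my_attr_t *attr, unsigned int place)
{
  return attr != NULL ? (int) place : -1;
}
```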

[gomp4 6/8] libgomp: provide stub bar.h on nvptx

2015-09-23 Thread Alexander Monakov
This stub header only provides empty struct gomp_barrier_t.  For now I've
punted on providing a minimally-correct implementation.

* config/nvptx/bar.h: New file.
---
 libgomp/config/nvptx/bar.h | 38 ++
 1 file changed, 38 insertions(+)
 create mode 100644 libgomp/config/nvptx/bar.h

diff --git a/libgomp/config/nvptx/bar.h b/libgomp/config/nvptx/bar.h
new file mode 100644
index 000..009d85f
--- /dev/null
+++ b/libgomp/config/nvptx/bar.h
@@ -0,0 +1,38 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is an NVPTX specific implementation of a barrier synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation is a stub, for now.  */
+
+#ifndef GOMP_BARRIER_H
+#define GOMP_BARRIER_H 1
+
+typedef struct
+{
+} gomp_barrier_t;
+
+typedef unsigned int gomp_barrier_state_t;
+
+#endif /* GOMP_BARRIER_H */


[gomp4 8/8] libgomp: provide ICVs via env.c on nvptx

2015-09-23 Thread Alexander Monakov
This patch ports env.c to NVPTX.  It drops all environment parsing routines
since there's no "environment" on the device.  For now, the useful effect of
the patch is providing 'omp_is_initial_device' to distinguish host execution
from target execution in user code.

Several functions use gomp_icv, which is not adjusted for NVPTX and thus will
try to use EMUTLS.  The intended way forward is to provide a custom
implementation of gomp_icv on NVPTX, likely via pre-allocating storage prior
to spawning a team.
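A hedged sketch of the direction described above: per-thread ICV slots reserved before a team is spawned (so no TLS or EMUTLS is involved), with readers falling back to the global defaults until a thread first writes. All names here (struct icv, team_icv, icv_for_thread, set_num_threads, MAX_TEAM_THREADS) are hypothetical; the fallback mirrors the `return &gomp_global_icv;` path in libgomp.h, and the clamping mirrors omp_set_num_threads in the new env.c.

```c
#include <stdbool.h>

#define MAX_TEAM_THREADS 4   /* hypothetical team size fixed before launch */

struct icv { unsigned long nthreads_var; };

static struct icv global_icv = { .nthreads_var = 1 };

/* Storage reserved before the team starts; thread TID owns slot TID.  */
static struct icv team_icv[MAX_TEAM_THREADS];
static bool team_icv_used[MAX_TEAM_THREADS];

/* Readers see the global values until the thread first writes, at which
   point it gets a private copy of them.  */
static struct icv *
icv_for_thread (int tid, bool write)
{
  if (team_icv_used[tid])
    return &team_icv[tid];
  if (!write)
    return &global_icv;
  team_icv[tid] = global_icv;
  team_icv_used[tid] = true;
  return &team_icv[tid];
}

static void
set_num_threads (int tid, int n)
{
  icv_for_thread (tid, true)->nthreads_var = (n > 0 ? n : 1);
}
```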

* config/nvptx/env.c: New file.
---
 libgomp/config/nvptx/env.c | 219 +
 1 file changed, 219 insertions(+)

diff --git a/libgomp/config/nvptx/env.c b/libgomp/config/nvptx/env.c
index e69de29..f964b29 100644
--- a/libgomp/config/nvptx/env.c
+++ b/libgomp/config/nvptx/env.c
@@ -0,0 +1,219 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This file defines the OpenMP internal control variables.  There is
+   no environment on the accelerator, so the variables can be changed
+   only via OpenMP API in target regions.  */
+
+#include "libgomp.h"
+#include "libgomp_f.h"
+
+#include 
+
+struct gomp_task_icv gomp_global_icv = {
+  .nthreads_var = 1,
+  .thread_limit_var = UINT_MAX,
+  .run_sched_var = GFS_DYNAMIC,
+  .run_sched_modifier = 1,
+  .default_device_var = 0,
+  .dyn_var = false,
+  .nest_var = false,
+  .bind_var = omp_proc_bind_false,
+  .target_data = NULL
+};
+
+unsigned long gomp_max_active_levels_var = INT_MAX;
+unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1;
+unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
+unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
+char *gomp_bind_var_list;
+unsigned long gomp_bind_var_list_len;
+void **gomp_places_list;
+unsigned long gomp_places_list_len;
+int gomp_debug_var;
+
+void
+omp_set_num_threads (int n)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  icv->nthreads_var = (n > 0 ? n : 1);
+}
+
+void
+omp_set_dynamic (int val)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  icv->dyn_var = val;
+}
+
+int
+omp_get_dynamic (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->dyn_var;
+}
+
+void
+omp_set_nested (int val)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  icv->nest_var = val;
+}
+
+int
+omp_get_nested (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->nest_var;
+}
+
+void
+omp_set_schedule (omp_sched_t kind, int modifier)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  switch (kind)
+    {
+    case omp_sched_static:
+      if (modifier < 1)
+	modifier = 0;
+      icv->run_sched_modifier = modifier;
+      break;
+    case omp_sched_dynamic:
+    case omp_sched_guided:
+      if (modifier < 1)
+	modifier = 1;
+      icv->run_sched_modifier = modifier;
+      break;
+    case omp_sched_auto:
+      break;
+    default:
+      return;
+    }
+  icv->run_sched_var = kind;
+}
+
+void
+omp_get_schedule (omp_sched_t *kind, int *modifier)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  *kind = icv->run_sched_var;
+  *modifier = icv->run_sched_modifier;
+}
+
+int
+omp_get_max_threads (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->nthreads_var;
+}
+
+int
+omp_get_thread_limit (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->thread_limit_var > INT_MAX ? INT_MAX : icv->thread_limit_var;
+}
+
+void
+omp_set_max_active_levels (int max_levels)
+{
+  if (max_levels >= 0)
+    gomp_max_active_levels_var = max_levels;
+}
+
+int
+omp_get_max_active_levels (void)
+{
+  return gomp_max_active_levels_var;
+}
+
+int
+omp_get_cancellation (void)
+{
+  return 0;
+}
+
+omp_proc_bind_t
+omp_get_proc_bind (void)
+{
+  return omp_proc_bind_false;
+}
+
+void
+omp_set_default_device (int device_num __attribute__((unused)))
+{
+}
+
+int
+omp_get_default_device (void)
+{
+  return 0;
+}
+
+int
+omp_get_num_devices (vo

[gomp4 5/8] libgomp: provide sem.h, mutex.h, ptrlock.h on nvptx

2015-09-23 Thread Alexander Monakov
This patch provides minimal non-stub implementations for libgomp
mutex/ptrlock/semaphore, using atomic ops and busy waiting.  The goal here is
to at least provide stub struct declarations necessary to unbreak libgomp.h.

Atomics with busy waiting seem to be the only way to provide such primitives
for inter-team synchronization, but for intra-team ops a more efficient
implementation may be possible.

(all functionality is unused since consumers are stubbed out in config/nvptx)
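Below is a portable C11 sketch of the same busy-waiting idea, for anyone who wants to experiment on a host. It keeps the 0 = unlocked, 1 = locked convention of this patch's mutex.h, where gomp_mutex_init and gomp_mutex_unlock both store 0. It spins with relaxed loads while the lock is held and only retries the compare-and-swap once the lock looks free. The `spin_mutex_*` names are hypothetical. `<stdatomic.h>` stands in for the `__atomic` builtins that the patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unlocked, 1 = locked.  */
typedef atomic_int spin_mutex_t;

static inline void
spin_mutex_init (spin_mutex_t *m)
{
  atomic_init (m, 0);
}

static inline void
spin_mutex_lock (spin_mutex_t *m)
{
  int expected = 0;
  /* Try to swing 0 -> 1 with acquire ordering.  On failure, busy-wait
     with cheap relaxed loads until the lock looks free, then retry.  */
  while (!atomic_compare_exchange_weak_explicit (m, &expected, 1,
						 memory_order_acquire,
						 memory_order_relaxed))
    {
      while (atomic_load_explicit (m, memory_order_relaxed) != 0)
	;
      expected = 0;
    }
}

static inline bool
spin_mutex_trylock (spin_mutex_t *m)
{
  int expected = 0;
  return atomic_compare_exchange_strong_explicit (m, &expected, 1,
						   memory_order_acquire,
						   memory_order_relaxed);
}

static inline void
spin_mutex_unlock (spin_mutex_t *m)
{
  /* Debug check: only a held lock may be released.  */
  assert (atomic_load_explicit (m, memory_order_relaxed) == 1);
  atomic_store_explicit (m, 0, memory_order_release);
}
```

Spinning on a plain load before retrying the CAS keeps waiters from repeatedly issuing read-modify-write operations on a contended word. Whether that pays off on PTX hardware is untested here.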

* config/nvptx/mutex.h: New file.
* config/nvptx/ptrlock.h: New file.
* config/nvptx/sem.h: New file.
---
 libgomp/config/nvptx/mutex.h   | 67 ++
 libgomp/config/nvptx/ptrlock.h | 73 ++
 libgomp/config/nvptx/sem.h | 65 +
 3 files changed, 205 insertions(+)
 create mode 100644 libgomp/config/nvptx/mutex.h
 create mode 100644 libgomp/config/nvptx/ptrlock.h
 create mode 100644 libgomp/config/nvptx/sem.h

diff --git a/libgomp/config/nvptx/mutex.h b/libgomp/config/nvptx/mutex.h
new file mode 100644
index 000..a98d5a9
--- /dev/null
+++ b/libgomp/config/nvptx/mutex.h
@@ -0,0 +1,67 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Alexander Monakov 
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is an NVPTX specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and busy waiting.  */
+
+#ifndef GOMP_MUTEX_H
+#define GOMP_MUTEX_H 1
+
+typedef int gomp_mutex_t;
+
+#define GOMP_MUTEX_INIT_0 1
+
+static inline void
+gomp_mutex_init (gomp_mutex_t *mutex)
+{
+  *mutex = 0;
+}
+
+static inline void
+gomp_mutex_destroy (gomp_mutex_t *mutex)
+{
+}
+
+static inline void
+gomp_mutex_lock (gomp_mutex_t *mutex)
+{
+  int value = __atomic_load_n (mutex, MEMMODEL_ACQUIRE);
+  for (;;)
+    {
+      while (value != 0)
+	value = __atomic_load_n (mutex, MEMMODEL_ACQUIRE);
+      if (__atomic_compare_exchange_n (mutex, &value, 1, false,
+				       MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
+	return;
+    }
+}
+
+static inline void
+gomp_mutex_unlock (gomp_mutex_t *mutex)
+{
+  __atomic_store_n (mutex, 0, MEMMODEL_RELEASE);
+}
+#endif /* GOMP_MUTEX_H */
diff --git a/libgomp/config/nvptx/ptrlock.h b/libgomp/config/nvptx/ptrlock.h
new file mode 100644
index 000..c4ff033
--- /dev/null
+++ b/libgomp/config/nvptx/ptrlock.h
@@ -0,0 +1,73 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Alexander Monakov 
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is an NVPTX specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and busy waiting.
+
+   A ptrlock has four states:
+   0/NULL Initial
+   1  Owned by me, I get to write a pointer to ptrlock.
+   2  Some thre

Re: (patch,rfc) s/gimple/gimple */

2015-09-23 Thread Thomas Schwinge
Hi!

On Sat, 19 Sep 2015 20:55:35 -0400, Trevor Saunders  
wrote:
> On Fri, Sep 18, 2015 at 09:32:37AM -0600, Jeff Law wrote:
> > On 09/18/2015 07:32 AM, Trevor Saunders wrote:
> > >On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:
> > >>On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:
> > >>>I gave changing from gimple to gimple * a shot last week.

> ok, its committed now :)

I guess the following should also be adjusted?

gcc/doc/gimple.texi:@subsection @code{gimple_statement_base} (gsbase)
gcc/doc/gimple.texi:@cindex gimple_statement_base
gcc/doc/gimple.texi:Inherited from @code{struct gimple_statement_base}.
gcc/doc/gimple.texi:   gimple_statement_base
gcc/doc/gimple.texi:codes.  Then you must add a corresponding gimple_statement_base subclass
gcc/doc/gimple.texi:as a pointer to the appropriate gimple_statement_base subclass.
gcc/gdbhooks.py:pp.add_printer_for_types(['gimple', 'gimple_statement_base *',
gcc/gimple.h:   always stored in gimple_statement_base.subcode and they may only be
gcc/gimple.h: This is different than the BLOCK field in gimple_statement_base,
gcc/gimple.h:   Note: This is based on gimple_statement_base, not g_s_omp, because g_s_omp

gcc/doc/gimple.texi:@subsection @code{gimple_statement_base} (gsbase)
gcc/doc/gimple.texi:@item@code{gsbase}   @tab 256
gcc/doc/gimple.texi:@item @code{gsbase}
gcc/doc/gimple.texi:@item @code{gsbase}  @tab 256


Grüße,
 Thomas




[gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
Gang, worker, vector and collapse all contain optional arguments which
may be used during loop expansion. In OpenACC, those expressions could
contain variables, but those variables aren't always getting remapped
automatically. This patch remaps those variables inside lower_omp_for.

Note that I didn't need to use a tree walker for more complicated
expressions. By the time those clauses reach lower_omp_for, only the
result of the expression is available, so the other variables in those
expressions get remapped with everything else during omplow. Therefore,
the only problematic case is when the optional expression is just a
decl, e.g. gang(static:foo).

I've applied this patch to gomp-4_0-branch.

Cesar


Re: [PATCH] Preserve restrict dependence info in FRE/PRE

2015-09-23 Thread Bernhard Reutner-Fischer
On September 23, 2015 3:00:51 PM GMT+02:00, Richard Biener  
wrote:

>*** copy_reference_ops_from_ref (tree ref, v
>*** 816,826 
>--- 828,846 
> temp.op0 = TREE_OPERAND (ref, 1);
> if (tree_fits_shwi_p (TREE_OPERAND (ref, 1)))
>   temp.off = tree_to_shwi (TREE_OPERAND (ref, 1));
>+temp.clique = MR_DEPENDENCE_CLIQUE (ref);
>+temp.base = MR_DEPENDENCE_BASE (ref);
> break;
>   case BIT_FIELD_REF:
> /* Record bits and position.  */
> temp.op0 = TREE_OPERAND (ref, 1);
> temp.op1 = TREE_OPERAND (ref, 2);
>+if (tree_fits_shwi_p (TREE_OPERAND (ref, 2)))
>+  {
>+HOST_WIDE_INT off = tree_to_shwi (TREE_OPERAND (ref, 2));
>+if (off % BITS_PER_UNIT == 0)
>+  temp.off = off / 8;

s/8/BITS_PER_UNIT/
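The following standalone C sketch shows the point of this review comment: derive the byte offset from the bits-per-unit value rather than from a literal 8. Here `CHAR_BIT` stands in for GCC's `BITS_PER_UNIT` on the host. The helper name is hypothetical. Like the quoted hunk, it converts a bit position only when that position is unit-aligned.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* CHAR_BIT plays the role of BITS_PER_UNIT (bits per addressable unit).
   It is 8 on common hosts, which is exactly why a hard-coded 8 can go
   unnoticed.  */
#define BITS_PER_UNIT_SKETCH CHAR_BIT

/* Hypothetical helper: convert a BIT_FIELD_REF-style bit position into a
   byte offset, succeeding only when the position is unit-aligned.
   *BYTE_OFF is left untouched on failure.  */
static bool
bit_pos_to_byte_offset (long bitpos, long *byte_off)
{
  assert (byte_off != NULL);
  if (bitpos % BITS_PER_UNIT_SKETCH != 0)
    return false;
  /* Divide by the same constant used in the alignment test above.  */
  *byte_off = bitpos / BITS_PER_UNIT_SKETCH;
  return true;
}
```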

Thanks,



[gomp4] oacc xform updates

2015-09-23 Thread Nathan Sidwell
I've committed this patch to change all the OACC hooks to take a gcall * rather 
than 'gimple'.  mainline has changed the type of 'gimple', and we know we're 
passing a call anyway.  Also updated the rescanning to be more straightforward.


nathan
2015-09-23  Nathan Sidwell  

	* target.def: GOACC hooks take gcall arg.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_goacc_reduction, default_goacc_fork_join,
	default_goacc_lock): Gimple is a gcall pointer.
	* omp-low.c (oacc_xform_on_device): Arg is a gcall.  Adjust.
	(oacc_xform_dim): Likewise.
	(execute_oacc_transform): Adjust to pass gcall pointer to worker
	functions.  Handle rescan immediately.
	(default_goacc_reduction, default_goacc_fork_join,
	default_goacc_lock): Gimple is a gcall pointer.
	* config/nvptx/nvptx.c (nvptx_xform_fork_join,
	nvptx_xform_lock, nvptx_goacc_reduction_setup,
	nvptx_goacc_reduction_init, nvptx_goacc_reduction_fini, 
	nvptx_goacc_reduction_teardown, nvptx_goacc_reduction): Argument
	is a gcall, adjust.

Index: target.def
===
--- target.def	(revision 228058)
+++ target.def	(working copy)
@@ -1667,7 +1667,7 @@ DEFHOOK
 calls to target-specific gimple.  It is executed during the oacc_xform\n\
 pass.  It should return true, if the functions should be deleted.  The\n\
 default hook returns true, if there are no RTL expanders for them.",
-bool, (gimple stmt, const int dims[], bool is_fork),
+bool, (gcall *call, const int dims[], bool is_fork),
 default_goacc_fork_join)
 
 DEFHOOK
@@ -1677,7 +1677,7 @@ IFN_GOACC_LOCK_INIT function calls to ta
 executed during the oacc_xform pass.  It should return true, if the\n\
 functions should be deleted.  The default hook returns true, if there\n\
 are no RTL expanders for them.",
-bool, (gimple stmt, const int dims[], unsigned ifn_code),
+bool, (gcall *call, const int dims[], unsigned ifn_code),
 default_goacc_lock)
 
 DEFHOOK
@@ -1692,7 +1692,7 @@ hook removes statement @var{call} after
 inserted.  This hook is also responsible for allocating any storage for\n\
 reductions when necessary.  It returns @var{true} if the expanded\n\
 sequence introduces any calls to OpenACC-specific internal functions.",
-bool, (gimple call),
+bool, (gcall *call),
 default_goacc_reduction)
 
 HOOK_VECTOR_END (goacc)
Index: omp-low.c
===
--- omp-low.c	(revision 228058)
+++ omp-low.c	(working copy)
@@ -14689,9 +14660,9 @@ make_pass_late_lower_omp (gcc::context *
offloaded function we're never 'none'.  */
 
 static void
-oacc_xform_on_device (gimple stmt)
+oacc_xform_on_device (gcall *call)
 {
-  tree arg = gimple_call_arg (stmt, 0);
+  tree arg = gimple_call_arg (call, 0);
   unsigned val = GOMP_DEVICE_HOST;
 	  
 #ifdef ACCEL_COMPILER
@@ -14708,14 +14679,14 @@ oacc_xform_on_device (gimple stmt)
   }
 #endif
   result = fold_convert (integer_type_node, result);
-  tree lhs = gimple_call_lhs (stmt);
+  tree lhs = gimple_call_lhs (call);
   gimple_seq seq = NULL;
 
   push_gimplify_context (true);
   gimplify_assign (lhs, result, &seq);
   pop_gimplify_context (NULL);
 
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
   gsi_replace_with_seq (&gsi, seq, false);
 }
 
@@ -14723,9 +14694,9 @@ oacc_xform_on_device (gimple stmt)
constants, where possible.  */
 
 static void
-oacc_xform_dim (gimple stmt, const int dims[], bool is_pos)
+oacc_xform_dim (gcall *call, const int dims[], bool is_pos)
 {
-  tree arg = gimple_call_arg (stmt, 0);
+  tree arg = gimple_call_arg (call, 0);
   unsigned axis = (unsigned)TREE_INT_CST_LOW (arg);
   int size = dims[axis];
 
@@ -14742,11 +14713,11 @@ oacc_xform_dim (gimple stmt, const int d
 }
 
   /* Replace the internal call with a constant.  */
-  tree lhs = gimple_call_lhs (stmt);
+  tree lhs = gimple_call_lhs (call);
   gimple g = gimple_build_assign
 (lhs, build_int_cst (integer_type_node, size));
 
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
   gsi_replace (&gsi, g, false);
 }
 
@@ -14815,10 +14786,8 @@ oacc_validate_dims (tree fn, tree attrs,
 static unsigned int
 execute_oacc_transform ()
 {
-  basic_block bb;
   tree attrs = get_oacc_fn_attrib (current_function_decl);
   int dims[GOMP_DIM_MAX];
-  bool needs_rescan;
   
   if (!attrs)
 /* Not an offloaded function.  */
@@ -14830,80 +14799,97 @@ execute_oacc_transform ()
  dominance information to update SSA.  */
   calculate_dominance_info (CDI_DOMINATORS);
 
-  do
-{
-  needs_rescan = false;
-
-  FOR_ALL_BB_FN (bb, cfun)
-	for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+      {
+	gimple stmt = gsi_stmt (gsi);
+	int rescan = 0;
+	
+	if (!is_gimple_call (stmt))
 	  {
-	gimple stmt = gsi_stmt (gsi);
+	   

Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
On 09/23/2015 10:42 AM, Cesar Philippidis wrote:

> I've applied this patch to gomp-4_0-branch.

This patch, that is.

Cesar

2015-09-23  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_for): Remap any variables present in
	OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR and
	OMP_CLAUSE_COLLAPSE because they will be used later by expand_omp_for.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Test if
	static gang expressions containing variables work.
	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ec76096..3f36b7a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11325,6 +11325,35 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (oacc_tail)
 gimple_seq_add_seq (&body, oacc_tail);
 
+  /* Update the variables inside any clauses which may be involved in loop
+     expansion later on.  */
+  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      int args;
+
+      switch (OMP_CLAUSE_CODE (c))
+	{
+	default:
+	  args = 0;
+	  break;
+	case OMP_CLAUSE_GANG:
+	  args = 2;
+	  break;
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_COLLAPSE:
+	  args = 1;
+	  break;
+	}
+
+      for (int i = 0; i < args; i++)
+	{
+	  tree expr = OMP_CLAUSE_OPERAND (c, i);
+	  if (expr && DECL_P (expr))
+	    OMP_CLAUSE_OPERAND (c, i) = build_outer_var_ref (expr, ctx);
+	}
+    }
+
   pop_gimplify_context (new_stmt);
 
   gimple_bind_append_vars (new_stmt, ctx->block_vars);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
index 3a9a508..20a866d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
@@ -39,7 +39,7 @@ int
 main ()
 {
   int a[N];
-  int i;
+  int i, x;
 
 #pragma acc parallel loop gang (static:*) num_gangs (10)
   for (i = 0; i < 100; i++)
@@ -78,5 +78,21 @@ main ()
 
   test_nonstatic (a, 10);
 
+  /* Static arguments with a variable expression.  */
+
+  x = 20;
+#pragma acc parallel loop gang (static:0+x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  x = 20;
+#pragma acc parallel loop gang (static:x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
index e562535..7d56060 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
@@ -3,6 +3,7 @@
 program main
   integer, parameter :: n = 100
   integer i, a(n), b(n)
+  integer x
 
   do i = 1, n
  b(i) = i
@@ -48,6 +49,23 @@ program main
 
   call test (a, b, 20, n)
 
+  x = 5
+  !$acc parallel loop gang (static:0+x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  x = 10
+  !$acc parallel loop gang (static:x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 10
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 10, n)
 end program main
 
 subroutine test (a, b, sarg, n)


Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take

2015-09-23 Thread Jeff Law

On 09/23/2015 10:32 AM, Marek Polacek wrote:
> On Tue, Sep 22, 2015 at 03:33:34PM -0600, Martin Sebor wrote:
>> It's fine by me (for whatever it's worth).
>
> Thanks.  Let's wait if Jason/Joseph or anyone else wants to chime in.
>
>> Btw., if you're unhappy about having to wipe out the whole chain
>> after every side-effect it occurred to me that it might be possible
>> to do better: instead of deleting the whole chain, only remove from
>> it the elements that may be affected by the side-effect. This should
>> make it possible to keep on the chain all conditions involving local
>> variables whose address hasn't been taken, which I would expect to
>> be most in most cases.
>
> I'm not unhappy about deleting the chain ;).  I'd rather not do that
> because that might get somewhat hairy.  First, I don't think we have
> the capability to easily detect variables whose address hasn't been
> taken, second, consider e.g.
>
>    if (j == 4) // ...
>    else if ((j++, --k, ++l)) // ...
>    else if (bar (j, &k)) // ...
>
> we'd probably need some walk_tree, save the variables temporarily somewhere
> etc.; that might slow and complicate things for a corner case.  Or am I being
> just too lazy? ;)
This is all running on generic, not gimple/ssa, right?  In which case,
no you don't know what stuff might be aliased.


Jeff
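Here is a small standalone illustration of why the condition chain has to be reset after a side effect. It is not taken from the thread but is modeled on Marek's `j++` example. A condition that textually repeats an earlier one is not necessarily a duplicate. The function names are hypothetical. `dead_branch` shows the kind of pattern the proposed -Wduplicated-cond warning targets.

```c
#include <assert.h>

/* The third condition repeats the first one textually, but it is NOT a
   duplicate: the comma expression in the second condition increments j
   before the third test is evaluated.  */
static int
classify (int j)
{
  if (j == 4)
    return 1;
  else if ((j++, 0))
    return 2;   /* never taken: the comma expression yields 0 */
  else if (j == 4)
    return 3;   /* reachable when the caller passes 3 */
  return 0;
}

/* A true duplicate: nothing between the two tests can change x, so the
   second branch is dead code.  */
static int
dead_branch (int x)
{
  if (x == 1)
    return 1;
  else if (x == 1)
    return 2;   /* unreachable */
  return 0;
}
```

A warning that kept `j == 4` on its chain across the `j++` would wrongly flag the third test in `classify`. Dropping the whole chain at the first side effect, as the patch does, avoids that at the cost of missing some real duplicates.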


[C++ PATCH] Fix small typos in the coding rule enforcement warnings.

2015-09-23 Thread Ville Voutilainen
Tested on Linux-PPC64, committed as obvious.

/cp
2015-09-23  Ville Voutilainen  

Fix small typos in the coding rule enforcement warnings.
* parser.c (cp_parser_namespace_definition): Replace 'namepace'
with 'namespace'.

/testsuite
2015-09-23  Ville Voutilainen  

Fix small typos in the coding rule enforcement warnings.
* g++.dg/diagnostic/disable.C: Replace 'namepace'
with 'namespace'.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cc920926e..1148156 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -17043,7 +17043,7 @@ cp_parser_namespace_definition (cp_parser* parser)
 
   has_visibility = handle_namespace_attrs (current_namespace, attribs);
 
-  warning  (OPT_Wnamespaces, "namepace %qD entered", current_namespace);
+  warning  (OPT_Wnamespaces, "namespace %qD entered", current_namespace);
 
   /* Parse the body of the namespace.  */
   cp_parser_namespace_body (parser);
diff --git a/gcc/testsuite/g++.dg/diagnostic/disable.C b/gcc/testsuite/g++.dg/diagnostic/disable.C
index a69033d..7d86e07 100644
--- a/gcc/testsuite/g++.dg/diagnostic/disable.C
+++ b/gcc/testsuite/g++.dg/diagnostic/disable.C
@@ -3,7 +3,7 @@
 #include 
 #include 
 
-namespace foo { } // { dg-warning "namepace" }
+namespace foo { } // { dg-warning "namespace" }
 
 template  X Foo (); // { dg-warning "template" }
 


Re: [PATCH][tree-inline][obvious] Delete redundant count_insns_seq

2015-09-23 Thread Jeff Law

On 09/23/2015 10:05 AM, Kyrill Tkachov wrote:
> Hi all,
>
> I notice that the functions count_insns_seq and estimate_num_insns_seq
> perform the exact same function for exactly the same arguments.
> It's redundant to keep both around. I've decided to delete
> count_insns_seq and replace its one use by estimate_num_insns_seq.
>
> Bootstrapped and tested on aarch64, x86_64.
> I think this change is obvious, so I'll commit it in 24 hours unless
> someone objects.
>
> Thanks,
> Kyrill
>
> 2015-09-23  Kyrylo Tkachov
>
> 	* tree-inline.h (count_insns_seq): Delete prototype.
> 	(estimate_num_insns_seq): Define prototype.
> 	* tree-inline.c (count_insns_seq): Delete.
> 	(estimate_num_insns_seq): Remove static qualifier.
> 	* tree-eh.c (decide_copy_try_finally): Replace use of count_insns_seq
> 	with estimate_num_insns_seq.

This is fine and I think would fall under the "obvious" rule.

Note that we don't have a "no objections" rule.

Jeff



Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-23 Thread Jeff Law

On 09/18/2015 08:29 PM, Martin Sebor wrote:
>> I guess it is not the 'const' I think should be handled special but the
>> 'static'.  Having unused static variables (const or not) declared in a
>> header file but unused seems reasonable since the header file may be
>> included in multiple .c files each of which uses a subset of the static
>> variables.
>
> I tend to agree. I suppose diagnosing unused non-const static
> definitions might be helpful but I can't think of a good reason
> to diagnose unused initialized static consts in C. Especially
> since they're not diagnosed in C++.
>
> Would diagnosing them in source files while avoiding the warning
> for static const definitions in headers be an acceptable compromise?

It's probably worth a try.

jeff




Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take

2015-09-23 Thread Marek Polacek
On Wed, Sep 23, 2015 at 12:21:35PM -0600, Jeff Law wrote:
> On 09/23/2015 10:32 AM, Marek Polacek wrote:
> >On Tue, Sep 22, 2015 at 03:33:34PM -0600, Martin Sebor wrote:
> >>It's fine by me (for whatever it's worth).
> >
> >Thanks.  Let's wait if Jason/Joseph or anyone else wants to chime in.
> >
> >>Btw., if you're unhappy about having to wipe out the whole chain
> >>after every side-effect it occurred to me that it might be possible
> >>to do better: instead of deleting the whole chain, only remove from
> >>it the elements that may be affected by the side-effect. This should
> >>make it possible to keep on the chain all conditions involving local
> >>variables whose address hasn't been taken, which I would expect to
> >>be most in most cases.
> >
> >I'm not unhappy about deleting the chain ;).  I'd rather not do that
> >because that might get somewhat hairy.  First, I don't think we have
> >the capability to easily detect variables whose address hasn't been
> >taken, second, consider e.g.
> >
> >   if (j == 4) // ...
> >   else if ((j++, --k, ++l)) // ...
> >   else if (bar (j, &k)) // ...
> >
> >we'd probably need some walk_tree, save the variables temporarily somewhere
> >etc.; that might slow and complicate things for a corner case.  Or am I being
> >just too lazy? ;)
> This is all running on generic, not gimple/ssa, right?  In which case, no
> you don't know what stuff might be aliased.

Right.  Hence this doesn't seem doable, but I don't think that's a big deal
at all.

Marek


Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Thomas Schwinge
Hi!

On Wed, 23 Sep 2015 10:57:40 -0700, Cesar Philippidis  
wrote:
> On 09/23/2015 10:42 AM, Cesar Philippidis wrote:
> | Gang, worker, vector and collapse all contain optional arguments which
> | may be used during loop expansion. In OpenACC, those expressions could
> | contain variables

I'm fairly sure that at least the collapse clause needs to be a
compile-time constant?

> | but those variables aren't always getting remapped
> | automatically. This patch remaps those variables inside lower_omp_for.

Shouldn't that be done in lower_rec_input_clauses?  (Maybe I'm confused
-- it's been a long time since I looked at this code.)  (Jakub?)

> | Note that I didn't need to use a tree walker for more complicated
> | expressions. By the time those clauses reach lower_omp_for, only the
> | result of the expression is available, so the other variables in those
> | expressions get remapped with everything else during omplow. Therefore,
> | the only problematic case is when the optional expression is just a
> | decl, e.g. gang(static:foo).
> 
> > I've applied this patch to gomp-4_0-branch.

> 2015-09-23  Cesar Philippidis  
> 
>   gcc/
>   * omp-low.c (lower_omp_for): Remap any variables present in
>   OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR and
>   OMP_CLAUSE_COLLAPSE because they will be used later by expand_omp_for.
> 
>   libgomp/
>   * testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Test if
>   static gang expressions containing variables work.
>   * testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.
> 
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index ec76096..3f36b7a 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -11325,6 +11325,35 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>if (oacc_tail)
>  gimple_seq_add_seq (&body, oacc_tail);
>  
> +  /* Update the variables inside any clauses which may be involved in loop
> +     expansion later on.  */
> +  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
> +    {
> +      int args;
> +
> +      switch (OMP_CLAUSE_CODE (c))
> +	{
> +	default:
> +	  args = 0;
> +	  break;
> +	case OMP_CLAUSE_GANG:
> +	  args = 2;
> +	  break;
> +	case OMP_CLAUSE_VECTOR:
> +	case OMP_CLAUSE_WORKER:
> +	case OMP_CLAUSE_COLLAPSE:
> +	  args = 1;
> +	  break;
> +	}
> +
> +      for (int i = 0; i < args; i++)
> +	{
> +	  tree expr = OMP_CLAUSE_OPERAND (c, i);
> +	  if (expr && DECL_P (expr))
> +	    OMP_CLAUSE_OPERAND (c, i) = build_outer_var_ref (expr, ctx);
> +	}
> +    }
> +
>pop_gimplify_context (new_stmt);
>  
>gimple_bind_append_vars (new_stmt, ctx->block_vars);
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
> index 3a9a508..20a866d 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
> @@ -39,7 +39,7 @@ int
>  main ()
>  {
>int a[N];
> -  int i;
> +  int i, x;
>  
>  #pragma acc parallel loop gang (static:*) num_gangs (10)
>for (i = 0; i < 100; i++)
> @@ -78,5 +78,21 @@ main ()
>  
>test_nonstatic (a, 10);
>  
> +  /* Static arguments with a variable expression.  */
> +
> +  x = 20;
> +#pragma acc parallel loop gang (static:0+x) num_gangs (10)
> +  for (i = 0; i < 100; i++)
> +    a[i] = GANG_ID (i);
> +
> +  test_static (a, 10, 20);
> +
> +  x = 20;
> +#pragma acc parallel loop gang (static:x) num_gangs (10)
> +  for (i = 0; i < 100; i++)
> +    a[i] = GANG_ID (i);
> +
> +  test_static (a, 10, 20);
> +
>return 0;
>  }
> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
> index e562535..7d56060 100644
> --- a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
> @@ -3,6 +3,7 @@
>  program main
>integer, parameter :: n = 100
>integer i, a(n), b(n)
> +  integer x
>  
>do i = 1, n
>   b(i) = i
> @@ -48,6 +49,23 @@ program main
>  
>call test (a, b, 20, n)
>  
> +  x = 5
> +  !$acc parallel loop gang (static:0+x) num_gangs (10)
> +  do i = 1, n
> + a(i) = b(i) + 5
> +  end do
> +  !$acc end parallel loop
> +
> +  call test (a, b, 5, n)
> +
> +  x = 10
> +  !$acc parallel loop gang (static:x) num_gangs (10)
> +  do i = 1, n
> + a(i) = b(i) + 10
> +  end do
> +  !$acc end parallel loop
> +
> +  call test (a, b, 10, n)
>  end program main
>  
>  subroutine test (a, b, sarg, n)


Grüße,
 Thomas




Re: [gomp4 0/8] NVPTX: initial OpenMP offloading

2015-09-23 Thread Bernd Schmidt

On 09/23/2015 07:22 PM, Alexander Monakov wrote:
> This patch series implements some minimally required changes to have OpenMP
> offloading working for NVPTX target on the gomp4 branch.  '#pragma omp target'
> and data updates should work, but all parallel execution functionality remains
> stubbed out (uses of '#pragma omp parallel' in target regions yield a link
> error).
>
> I'd like to get feedback on the patches, and approval for the gomp-4_0-branch
> where possible.


I have two major concerns here. Can I ask you how much experience you 
have with GPU programming and ptx? These patches provide stub 
functionality, which is easy enough, but I can't tell whether there's a 
credible plan to provide a full implementation. GPUs really need a 
different programming model than normal CPUs, which is something I 
learned the hard way, and I'm not terribly optimistic about porting 
libgomp to ptx. (I may be wrong.)


In one patch you mention newlib pthread type definitions - are you aware 
that there is no real pthreads implementation in the ptx newlib? The ptx 
newlib is really only provided for a minimal subset of libc functionality.


My other concern would be not to approve changes to the gomp-4_0-branch 
that could derail or slow down the effort to implement OpenACC, which 
has a much better chance of being in gcc-6 than this effort. You might 
want to make a private branch for your work.



Bernd


Re: New post-LTO OpenACC pass

2015-09-23 Thread Nathan Sidwell

On 09/23/15 08:58, Bernd Schmidt wrote:

On 09/23/2015 02:14 PM, Nathan Sidwell wrote:

On 09/23/15 06:59, Bernd Schmidt wrote:

On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+  /* acc_on_device must be evaluated at compile time for
+ constant arguments.  */
+  {
+oacc_xform_on_device (call);
+rescan = true;
+  }


Is there a reason this is not done as part of pass_fold_builtins? (It
looks like
maybe adding this to fold_call_stmt in builtins.c would be sufficient
too).



As I feared, builtin folding occurs in several places.  In particular its first 
call is very early on in the host compiler, which is far too soon.


We have to defer folding until we know whether we're doing host or device 
compilation.


nathan


Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Henderson
On 09/23/2015 06:53 AM, Richard Biener wrote:
> I think independent improvements are
> 
>  1) remove (most) of the bool patterns from the vectorizer
> 
>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded
> 
> (same for COND_EXPR?)

Careful.

The reason COND_EXPRs have embedded comparisons is to handle flags
registers.  You can't separate the setting of the flags from the using of the
flags on most targets, because there's only one flags register.

The same is true for VEC_COND_EXPR with respect to MIPS.  The base architecture
has 8 floating-point comparison result flags, and the vector compare
instructions are fixed to set fcc[0:width-1].  So again there's only one
possible output location for the result of the compare.

MIPS is going to present a problem if we attempt to generalize logical
combinations of these vectors, since one has to use several instructions
(or one insn and pre-load constants into two registers) to get the fcc bits out
into a form we can manipulate.


r~


Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
On 09/23/2015 11:26 AM, Thomas Schwinge wrote:
> On Wed, 23 Sep 2015 10:57:40 -0700, Cesar Philippidis 
>  wrote:
>> On 09/23/2015 10:42 AM, Cesar Philippidis wrote:
>> | Gang, worker, vector and collapse all contain optional arguments which
>> | may be used during loop expansion. In OpenACC, those expressions could
>> | contain variables
> 
> I'm fairly sure that at least the collapse clause needs to be a
> compile-time constant?

Thanks, you're correct. I was looking at a user application and not the
spec when I made this change. I've applied this patch to fix that.

>> | but those variables aren't always getting remapped
>> | automatically. This patch remaps those variables inside lower_omp_loop.
> 
> Shouldn't that be done in lower_rec_input_clauses?  (Maybe I'm confused
> -- it's been a long time that I looked at this code.)  (Jakub?)

I thought that lower_rec_input_clauses was for omp reductions and
firstprivate initialization? Variables ultimately get remapped when
omplower eventually calls gimple_regimplify_operands. That function uses
the value-expr for remapping.

In this case, since lower_omp_for is responsible for GIMPLE_OMP_FOR
stmts, gimple_regimplify_operands doesn't get called on the clauses.

Cesar
2015-09-23  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_for): Don't remap OMP_CLAUSE_COLLAPSE
	because it is always a constant value.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fa6b8a5..753996b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11341,7 +11341,6 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  break;
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_WORKER:
-	case OMP_CLAUSE_COLLAPSE:
 	  args = 1;
 	  break;
 	}


Re: New post-LTO OpenACC pass

2015-09-23 Thread Bernd Schmidt

On 09/23/2015 08:42 PM, Nathan Sidwell wrote:


As I feared, builtin folding occurs in several places.  In particular
its first call is very early on in the host compiler, which is far too
soon.

We have to defer folding until we know whether we're doing host or
device compilation.


Doesn't something like "symtab->state >= EXPANSION" give you that?


Bernd


Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-23 Thread Bernd Schmidt

On 09/23/2015 06:07 PM, Marek Polacek wrote:

Given that the code above seems to be useless now, I think let's put this
patch in as-is, backport it to gcc-5, then remove those redundant hunks on
trunk and add the testcase above.  Do you agree?


Sounds reasonable. If you can find a point in the history where that 
code wasn't useless, it would be good to help us understand why it's there.



Bernd


Re: New post-LTO OpenACC pass

2015-09-23 Thread Nathan Sidwell

On 09/23/15 14:51, Bernd Schmidt wrote:

On 09/23/2015 08:42 PM, Nathan Sidwell wrote:


As I feared, builtin folding occurs in several places.  In particular
its first call is very early on in the host compiler, which is far too
soon.

We have to defer folding until we know whether we're doing host or
device compilation.


Doesn't something like "symtab->state >= EXPANSION" give you that?


I don't know.   It doesn't seem to me to be a good idea for the builtin 
expanders to be context-sensitive.


nathan


Re: Elimitate duplication of get_catalogs in different abi

2015-09-23 Thread François Dumont
On 05/09/2015 23:02, François Dumont wrote:
> On 22/08/2015 14:24, Daniel Krügler wrote:
>> 2015-08-21 23:11 GMT+02:00 François Dumont :
>>> I think I found a better way to handle this problem. It is c++locale.cc
>>> that needs to be built with -fimplicit-templates. I even think that the
>>> *_cow.cc files do not need this option, but as I don't know what the
>>> drawback of this option is, I kept it. I also explicitly used the file name
>>> c++locale.cc even if it is an alias to a configurable source file.  I
>>> guess there must be some variable to use, no?
>>>
>>> With this patch there are 6 additional symbols. I guess I need to
>>> declare those in the scripts even if it is for internal library usage,
>>> right ?
>> I would expect that the new Catalog_info definition either has deleted
>> or properly (user-)defined copy constructor and copy assignment
>> operator.
>>
>>
>> - Daniel
>>
> This type is used in C++98 so I need to make those private, not deleted.
>
> With this change, is the patch ok to commit ?
>
> François
>

What about this patch ?

I am still uncomfortable exposing those implementation details in the
versioned symbols, but I don't know how to do otherwise. Do you want me
to put this code in the std::__detail namespace?

François



Re: [PR64164] drop copyrename, integrate into expand

2015-09-23 Thread Alexandre Oliva
On Sep 18, 2015, Alan Lawrence  wrote:

> With the latest git commit 2b27ef197ece54c4573c5a748b0d40076e35412c on
> branch aoliva/pr64164, I am now able to build a cross toolchain for
> aarch64 and aarch64_be, and can confirm the ABI failure is fixed on
> the branch.

Thanks for the confirmation.  I've made one further tweak for cris and
lm32, dropping the assert that caused build failures for libstdc++
atomics parms that required more alignment than
MAX_SUPPORTED_STACK_ALIGNMENT, consolidated the patchset and retested it
with a more recent baseline (r228019), with native regstraps on
x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
powerpc64le-linux-gnu, and cross toolchain builds for the following 73
platforms: aarch64_be-elf aarch64-elf arm-eabi armeb-eabihf
arm-symbianelf avr-elf bfin-elf c6x-elf cr16-elf cris-elf crisv32-elf
epiphany-elf fido-elf fr30-elf frv-elf ft32-elf h8300-elf i686-elf
ia64-elf iq2000-elf lm32-elf m32c-elf m32r-elf m32rle-elf m68k-elf
mcore-elf mep-elf microblaze-elf mips64el-elf mips64-elf mips64orion-elf
mips64vr-elf mipsel-elf mipsisa32-elfoabi mipsisa64-elfoabi
mipsisa64r2el-elf mipsisa64r2-sde-elf mipsisa64sb1-elf
mipsisa64sr71k-elf mipstx39-elf mn10300-elf moxie-elf msp430-elf
nds32be-elf nds32le-elf nios2-elf pdp11-aout powerpc-eabialtivec
powerpc-eabi powerpc-eabisimaltivec powerpc-eabisim powerpc-eabispe
powerpcle-eabi powerpcle-eabisim powerpcle-elf powerpc-xilinx-eabi
ppc64-eabi ppc-eabi ppc-elf rl78-elf rx-elf sh64-elf sh-elf
sh-superh-elf sparc64-elf sparc-elf sparc-leon-elf spu-elf v850e-elf
v850-elf visium-elf xstormy16-elf xtensa-elf.  Not all of them succeeded
in building, but those that didn't failed at the very same spots before
and after this patch.


This patch doesn't really add much functionality.  It rather
reimplements a lot of the ugly and fragile stuff I put in in the
previous big patchset in a far more robust and pleasant way.  It fixes a
number of regressions in the process, mainly because, instead of
modifying assign_parms so as to let cfgexpand do part of its job, it
reverts all of the RTL assignment for parameters and results to
assign_parms.  cfgexpand now leaves the RTL assignment of partitions
containing default defs or parms and results to assign_parms, and
assign_parms uses a single callback, set_parm_rtl, to tell cfgexpand the
assignment for the partition containing the default def of each
parameter.

This required introducing default defs for all parms and results, even
if unused; we could refrain from creating them, and refrain from
initializing those parameters (at least when optimizing), but that would
require messing with the fragile bits in assign_parms again, and it
would bring little benefit, since RTL optimization will likely notice
the initialization is unused and drop it anyway.  Besides, adding the
default defs was actually needed to fix a regression in the previous
patch, and even with the current patch it helps make sure we don't
assign more than one default def to the same SSA partition (the previous
patch attempted to do that, but there was a bug, fixed in the current
patch).  Having unused default defs makes it easier for us to decide
whether to use an entry_value rtx for the initial debug insn of a parm.
We track partitions holding default defs for parms and results with a
bitmap; we used to have a bitmap that tracked partitions holding default
defs, but it was unused!  I just renamed it and repurposed it.

I've also added checking asserts to set_rtl, to verify that, when we
expect a REG, we get a REG, and that it has the expected mode.  set_rtl
was also adjusted to record anonymous SSA names or their base types in
attrs of REGs or MEMs, respectively, so that code that relied on the
attrs to detect properties of the decl types no longer regress just
because we no longer generate decls for anonymous SSA names.  Since
there were prior uses of types in MEM attrs, that was expected to go
smoothly, but I was surprised at how smoothly adding SSA names to REG
attrs went.  No adjustments required!

I also tightened the conditions for coalescing a bit: we used to require
the same canonical type; I've added tests for the same alignment
requirements and for the same signedness.  OTOH, I've added a few more
coalesce candidates for RESULT_DECLs and the newly-added default defs of
parms and results.

Other relevant changes were in mode promotion.  TYPE_MODE would often
return BLKmode for some vector types, which was fine for some return
decl RTL with PARALLEL, but that didn't quite work for SSA partitions.
There were other cases of mode promotion of result decls that failed the
asserts in set_rtl, which revealed that promote_decl_mode didn't call
promote_function_mode as expected for results.

The new asserts brought additional requirements: promoting the mode of
the RTL generated for the static chain, arranging for result decls to be
assigned to a pseudo where it would formerly have got a BLKmode PARALLEL
(as mentioned above), and arranging for p

Re: [gomp4 0/8] NVPTX: initial OpenMP offloading

2015-09-23 Thread Alexander Monakov
On Wed, 23 Sep 2015, Bernd Schmidt wrote:
> I have two major concerns here. Can I ask you how much experience you have
> with GPU programming and ptx?

I'd say I have a good understanding of the programming model and NVIDIA
hardware architecture, having used CUDA tools and paid attention to
R&D news for a few years.  I've discussed with OpenACC and HSA folks at the
GNU Cauldron my plans to take on this work, and I hope they can acknowledge
that I at least seemed to have a clue :)

> These patches provide stub functionality, which
> is easy enough, but I can't tell whether there's a credible plan to provide a
> full implementation. GPUs really need a different programming model than
> normal CPUs, which is something I learned the hard way, and I'm not terribly
> optimistic about porting libgomp to ptx. (I may be wrong.)

Right, libgomp running on ptx would have to do many things differently from
how it does now (and some drop entirely, like affinity).  Thankfully it can be
implemented piecemeal in config/nvptx, without #ifdef butchery in the primary
source files.  The plan towards providing a full implementation is thus to
work my way incrementally over GOMP_nn api, allowing '#pragma omp parallel' to
link successfully, then 'for', 'teams' and so on.  For 'parallel' the
intention is to either have prestarted idle threads in teams if possible, or
start another kernel via dynamic parallelism.  Exact details are to be worked
out -- I'd like to avoid introducing a hard dependency on dynamic parallelism.

> In one patch you mention newlib pthread type definitions - are you aware that
> there is no real pthreads implementation in the ptx newlib? The ptx newlib is
> really only provided for a minimal subset of libc functionality.

Sure, I'm aware.  The point was to make libgomp.h valid for inclusion in
the rest of the to-be-ported source files, keeping modifications to it to a
minimum.  If the idea is that relying on #include  available on
nvptx in the first place is too much of a hack, we can discuss alternatives :)

> My other concern would be not to approve changes to the gomp-4_0-branch that
> could derail or slow down the effort to implement OpenACC, which has a much
> better chance of being in gcc-6 than this effort. You might want to make a
> private branch for your work.

I'm unclear how this work might hurt the OpenACC efforts, and in any case I
intend to be careful.  I don't imagine there will be conflicting requirements
to source code changes along the way.  In defense of the idea of working on
gomp4 branch, I expect that interleaving OpenACC and OpenMP work on a common
branch will cause less pain in case of inadvertent breakage than a merge
afterward.  Jakub, since you suggested submitting for gomp-4_0-branch, what's
your recommendation here?

Thanks
Alexander


libgo patch committed: Use =, not ==, in mksysinfo.sh

2015-09-23 Thread Ian Lance Taylor
As suggested in http://gcc.gnu.org/PR67695 , we should not use test
with == in mksysinfo.sh.  This patch fixes it.  Bootstrapped and ran
Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 228057)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2087b95180caea3477647c449772b7fecc01a71c
+90ebe729992443dc00b19c76b28d1270e17245a4
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 227696)
+++ libgo/mksysinfo.sh  (working copy)
@@ -531,7 +531,7 @@ upcase_fields () {
 # GNU/Linux specific; it should do no harm if there is no
 # _user_regs_struct.
 regs=`grep '^type _user_regs_struct struct' gen-sysinfo.go || true`
-if test "$regs" == ""; then
+if test "$regs" = ""; then
   # s390
   regs=`grep '^type __user_regs_struct struct' gen-sysinfo.go || true`
   if test "$regs" != ""; then


Re: [PATCH, ARM]: Fix static interworking call

2015-09-23 Thread Christophe Lyon
On 21 September 2015 at 16:15, Christian Bruel  wrote:
>
>
> On 09/18/2015 05:03 PM, Richard Earnshaw wrote:
>
>>> Index: attr_thumb-static2.c
>>> ===
>>> --- attr_thumb-static2.c(revision 227904)
>>> +++ attr_thumb-static2.c(working copy)
>>> @@ -1,6 +1,6 @@
>>>   /* Check that interwork between static functions is correctly resolved.
>>> */
>>>
>>> -/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>>> +/* { dg-require-effective-target arm_thumb2_ok } */
>>>   /* { dg-options "-O0 -march=armv7-a -mfloat-abi=hard" } */
>>>   /* { dg-do compile } */
>>>
>>>
>>
>> Do you really need -mfloat-abi=hard for this test?  If so, I think you
>> also need "dg-require-effective-target arm_hard_vfp_ok".  See
>> gcc.target/arm/pr65729.c
>>
>
> The test was not crashing for -mfloat-abi=soft.
> But the number of blx is independent. So yes we can write the conditions in
> such a way the test runs without hard fp.
>
> is this one better ?

You need to move the dg-do directive before the other ones (otherwise,
it overrides the effect of require-target).


[PATCH 0/4] bb-reorder: Add the "simple" algorithm

2015-09-23 Thread Segher Boessenkool
The current basic block reordering always uses the "software trace cache"
algorithm.  That has a few problems:

1) It increases code size substantially; this makes it not suitable for
-O1 or -Os, and not at all for some architectures;
2) but it is enabled for -Os and all targets;
3) and -O1 gets nothing, resulting in pretty jumpy code.

This patch set adds a new simple greedy basic block reordering algorithm,
adds a flag -freorder-blocks-algorithm=, and sets things up so that -O1
and -Os use the simple algo.

Split into a few pieces for easier review.  Every intermediate step works.

Bootstrapped and tested on powerpc64-linux.  There are two new fails in
guality testresults for -Os.

Is this okay for mainline?


Segher


Segher Boessenkool (4):
  bb-reorder: Split out STC
  bb-reorder: Add the "simple" algorithm
  bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  bb-reorder: Documentation updates

 gcc/bb-reorder.c| 196 +---
 gcc/common.opt  |  13 
 gcc/doc/invoke.texi |  23 --
 gcc/flag-types.h|   7 ++
 gcc/opts.c  |   4 +-
 5 files changed, 227 insertions(+), 16 deletions(-)

-- 
1.8.1.4



[PATCH 1/4] bb-reorder: Split out STC

2015-09-23 Thread Segher Boessenkool
This first patch simply factors code a little bit.


2015-09-23   Segher Boessenkool  

* bb-reorder.c (reorder_basic_blocks_software_trace_cache): New
function, factored out from ...
(reorder_basic_blocks): ... here.

---
 gcc/bb-reorder.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 2110bd2..725cdc3 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2226,24 +2226,15 @@ update_crossing_jump_flags (void)
}
 }
 
-/* Reorder basic blocks.  The main entry point to this file.  FLAGS is
-   the set of flags to pass to cfg_layout_initialize().  */
+/* Reorder basic blocks using the software trace cache (STC) algorithm.  */
 
 static void
-reorder_basic_blocks (void)
+reorder_basic_blocks_software_trace_cache (void)
 {
   int n_traces;
   int i;
   struct trace *traces;
 
-  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
-
-  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
-return;
-
-  set_edge_can_fallthru_flag ();
-  mark_dfs_back_edges ();
-
   /* We are estimating the length of uncond jump insn only once since the code
  for getting the insn length always returns the minimal length now.  */
   if (uncond_jump_length == 0)
@@ -2268,6 +2259,22 @@ reorder_basic_blocks (void)
   connect_traces (n_traces, traces);
   FREE (traces);
   FREE (bbd);
+}
+
+/* Reorder basic blocks.  The main entry point to this file.  */
+
+static void
+reorder_basic_blocks (void)
+{
+  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
+
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
+return;
+
+  set_edge_can_fallthru_flag ();
+  mark_dfs_back_edges ();
+
+  reorder_basic_blocks_software_trace_cache ();
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
 
-- 
1.8.1.4



[PATCH 2/4] bb-reorder: Add the "simple" algorithm

2015-09-23 Thread Segher Boessenkool
This is the meat of this series: a new algorithm to do basic block
reordering.  It uses the simple greedy approach to maximum weighted
matching, where the weights are the predicted execution frequency of
the edges.  This always finds a solution that is within a factor of two
of optimal, if you disregard loops (which we cannot allow) and the
complications of block partitioning.


2015-09-23   Segher Boessenkool  

* bb-reorder.c (reorder_basic_blocks_software_trace_cache): Print
a header to the dump file.
(edge_order): New function.
(reorder_basic_blocks_simple): New function.
(reorder_basic_blocks): Choose between the STC and the simple
algorithms (always choose the former).

---
 gcc/bb-reorder.c | 160 ++-
 1 file changed, 159 insertions(+), 1 deletion(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 725cdc3..40e9e50 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2231,6 +2231,9 @@ update_crossing_jump_flags (void)
 static void
 reorder_basic_blocks_software_trace_cache (void)
 {
+  if (dump_file)
+fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
+
   int n_traces;
   int i;
   struct trace *traces;
@@ -2261,6 +2264,158 @@ reorder_basic_blocks_software_trace_cache (void)
   FREE (bbd);
 }
 
+/* Return true if edge E1 is more desirable as a fallthrough edge than
+   edge E2 is.  */
+
+static bool
+edge_order (edge e1, edge e2)
+{
+  return EDGE_FREQUENCY (e1) > EDGE_FREQUENCY (e2);
+}
+
+/* Reorder basic blocks using the "simple" algorithm.  This tries to
+   maximize the dynamic number of branches that are fallthrough, without
+   copying instructions.  The algorithm is greedy, looking at the most
+   frequently executed branch first.  */
+
+static void
+reorder_basic_blocks_simple (void)
+{
+  if (dump_file)
+fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
+
+  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];
+
+  /* First, collect all edges that can be optimized by reordering blocks:
+ simple jumps and conditional jumps, as well as the function entry edge.  */
+
+  int n = 0;
+  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  rtx_insn *end = BB_END (bb);
+
+  if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
+   continue;
+
+  if (any_condjump_p (end))
+   {
+ edges[n++] = EDGE_SUCC (bb, 0);
+ edges[n++] = EDGE_SUCC (bb, 1);
+   }
+  else if (single_succ_p (bb))
+   edges[n++] = EDGE_SUCC (bb, 0);
+}
+
+  /* Sort the edges, the most desirable first.  */
+
+  std::stable_sort (edges, edges + n, edge_order);
+
+  /* Now decide which of those edges to make fallthrough edges.  We set
+ BB_VISITED if a block already has a fallthrough successor assigned
+ to it.  We make ->AUX of an endpoint point to the opposite endpoint
+ of a sequence of blocks that fall through, and ->AUX will be NULL
+ for a block that is in such a sequence but not an endpoint anymore.
+
+ To start with, everything points to itself, nothing is assigned yet.  */
+
+  FOR_ALL_BB_FN (bb, cfun)
+bb->aux = bb;
+
+  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
+
+  /* Now for all edges, the most desirable first, see if that edge can
+ connect two sequences.  If it can, update AUX and BB_VISITED; if it
+ cannot, zero out the edge in the table.  */
+
+  int j;
+  for (j = 0; j < n; j++)
+{
+  edge e = edges[j];
+
+  basic_block tail_a = e->src;
+  basic_block head_b = e->dest;
+  basic_block head_a = (basic_block) tail_a->aux;
+  basic_block tail_b = (basic_block) head_b->aux;
+
+  /* An edge cannot connect two sequences if:
+- it crosses partitions;
+- its src is not a current endpoint;
+- its dest is not a current endpoint;
+- or, it would create a loop.  */
+
+  if (e->flags & EDGE_CROSSING
+ || tail_a->flags & BB_VISITED
+ || !tail_b
+ || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
+ || tail_a == tail_b)
+   {
+ edges[j] = 0;
+ continue;
+   }
+
+  tail_a->aux = 0;
+  head_b->aux = 0;
+  head_a->aux = tail_b;
+  tail_b->aux = head_a;
+  tail_a->flags |= BB_VISITED;
+}
+
+  /* Put the pieces together, in the same order that the start blocks of
+ the sequences already had.  The hot/cold partitioning gives a little
+ complication: as a first pass only do this for blocks in the same
+ partition as the start block, and (if there is anything left to do)
+ in a second pass handle the other partition.  */
+
+  basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+
+  int current_partition = BB_PARTITION (last_tail);
+  bool need_another_pass = true;
+
+  for (int pass = 0; pass < 2 && need_another_pass; pass++)
+{
+  need_ano

[PATCH 4/4] bb-reorder: Documentation updates

2015-09-23 Thread Segher Boessenkool
This updates the documentation for the new option and new defaults.


2015-09-23   Segher Boessenkool  

* doc/invoke.texi (Optimization Options): Add
-freorder-blocks-algorithm=.
(Optimize Options) <-O>: Add -freorder-blocks.
<-O2>: Remove -freorder-blocks.  Add -freorder-blocks-algorithm=stc.
<-Os>: Add -freorder-blocks-algorithm=stc as not enabled.
<-freorder-blocks>: Also enabled at levels -O and -Os.
<-freorder-blocks-algorithm=>: Document new option.

---
 gcc/doc/invoke.texi | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 09c58ee..ca18501 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -430,6 +430,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fprofile-reorder-functions @gol
 -freciprocal-math -free -frename-registers -freorder-blocks @gol
+-freorder-blocks-algorithm=@var{algorithm} @gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
 -frounding-math -fsched2-use-superblocks -fsched-pressure @gol
@@ -7683,6 +7684,7 @@ compilation time.
 -fipa-reference @gol
 -fmerge-constants @gol
 -fmove-loop-invariants @gol
+-freorder-blocks @gol
 -fshrink-wrap @gol
 -fsplit-wide-types @gol
 -ftree-bit-ccp @gol
@@ -7739,7 +7741,8 @@ also turns on the following optimization flags:
 -foptimize-strlen @gol
 -fpartial-inlining @gol
 -fpeephole2 @gol
--freorder-blocks -freorder-blocks-and-partition -freorder-functions @gol
+-freorder-blocks-algorithm=stc @gol
+-freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop  @gol
 -fsched-interblock  -fsched-spec @gol
 -fschedule-insns  -fschedule-insns2 @gol
@@ -7776,8 +7779,8 @@ optimizations designed to reduce code size.
 
 @option{-Os} disables the following optimization flags:
 @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
--falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
--fprefetch-loop-arrays}
+-falign-labels  -freorder-blocks  -freorder-blocks-algorithm=stc @gol
+-freorder-blocks-and-partition  -fprefetch-loop-arrays}
 
 @item -Ofast
 @opindex Ofast
@@ -9127,7 +9130,19 @@ The default is @option{-fguess-branch-probability} at levels
 Reorder basic blocks in the compiled function in order to reduce number of
 taken branches and improve code locality.
 
-Enabled at levels @option{-O2}, @option{-O3}.
+Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
+
+@item -freorder-blocks-algorithm=@var{algorithm}
+@opindex freorder-blocks-algorithm
+Use the specified algorithm for basic block reordering.  The
+@var{algorithm} argument can be @samp{simple}, which does not increase
+code size (except sometimes due to secondary effects like alignment),
+or @samp{stc}, the ``software trace cache'' algorithm, which tries to
+put all often executed code together, minimizing the number of branches
+executed by making extra copies of code.
+
+The default is @samp{simple} at levels @option{-O}, @option{-Os}, and
+@samp{stc} at levels @option{-O2}, @option{-O3}.
 
 @item -freorder-blocks-and-partition
 @opindex freorder-blocks-and-partition
-- 
1.8.1.4



[PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up

2015-09-23 Thread Segher Boessenkool
This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
Targets that never want STC can override this.  This changes -freorder-blocks
to be on at -O1 and up (was -O2 and up).

In effect, the changes are for -O1 (which now gets "simple" instead of
nothing), -Os (which now gets "simple" instead of "stc", since STC results
in much bigger code), and for targets that wish to never use STC (not in
this patch though).


2015-09-23   Segher Boessenkool  

* bb-reorder.c (reorder_basic_blocks): Use the algorithm selected
with flag_reorder_blocks_algorithm.
* common.opt (freorder-blocks-algorithm=): New flag.
(reorder_blocks_algorithm): New enum.
* flag-types.h (reorder_blocks_algorithm): New enum.
* opts.c (default_options_table): Use -freorder-blocks at -O1 and up,
and -freorder-blocks-algorithm=stc at -O2 and up (not at -Os).

---
 gcc/bb-reorder.c | 17 +
 gcc/common.opt   | 13 +
 gcc/flag-types.h |  7 +++
 gcc/opts.c   |  4 +++-
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 40e9e50..e09f344 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2429,10 +2429,19 @@ reorder_basic_blocks (void)
   set_edge_can_fallthru_flag ();
   mark_dfs_back_edges ();
 
-  if (1)
-reorder_basic_blocks_software_trace_cache ();
-  else
-reorder_basic_blocks_simple ();
+  switch (flag_reorder_blocks_algorithm)
+{
+case REORDER_BLOCKS_ALGORITHM_SIMPLE:
+  reorder_basic_blocks_simple ();
+  break;
+
+case REORDER_BLOCKS_ALGORITHM_STC:
+  reorder_basic_blocks_software_trace_cache ();
+  break;
+
+default:
+  gcc_unreachable ();
+}
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
 
diff --git a/gcc/common.opt b/gcc/common.opt
index 94d1d88..b0f70fb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1910,6 +1910,19 @@ freorder-blocks
 Common Report Var(flag_reorder_blocks) Optimization
 Reorder basic blocks to improve code placement
 
+freorder-blocks-algorithm=
+Common Joined RejectNegative Enum(reorder_blocks_algorithm) Var(flag_reorder_blocks_algorithm) Init(REORDER_BLOCKS_ALGORITHM_SIMPLE) Optimization
+-freorder-blocks-algorithm=[simple|stc] Set the used basic block reordering algorithm
+
+Enum
+Name(reorder_blocks_algorithm) Type(enum reorder_blocks_algorithm) UnknownError(unknown basic block reordering algorithm %qs)
+
+EnumValue
+Enum(reorder_blocks_algorithm) String(simple) Value(REORDER_BLOCKS_ALGORITHM_SIMPLE)
+
+EnumValue
+Enum(reorder_blocks_algorithm) String(stc) Value(REORDER_BLOCKS_ALGORITHM_STC)
+
 freorder-blocks-and-partition
 Common Report Var(flag_reorder_blocks_and_partition) Optimization
 Reorder basic blocks and partition into hot and cold sections
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index ac9ca0b..6301cea 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -109,6 +109,13 @@ enum stack_reuse_level
   SR_ALL
 };
 
+/* The algorithm used for basic block reordering.  */
+enum reorder_blocks_algorithm
+{
+  REORDER_BLOCKS_ALGORITHM_SIMPLE,
+  REORDER_BLOCKS_ALGORITHM_STC
+};
+
 /* The algorithm used for the integrated register allocator (IRA).  */
 enum ira_algorithm
 {
diff --git a/gcc/opts.c b/gcc/opts.c
index f1a9acd..786fd3a 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -441,6 +441,7 @@ static const struct default_options default_options_table[] =
 { OPT_LEVELS_1_PLUS, OPT_fipa_reference, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_fipa_profile, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_fmerge_constants, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_freorder_blocks, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_fshrink_wrap, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
@@ -483,7 +484,8 @@ static const struct default_options default_options_table[] =
 #endif
 { OPT_LEVELS_2_PLUS, OPT_fstrict_aliasing, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 },
-{ OPT_LEVELS_2_PLUS, OPT_freorder_blocks, NULL, 1 },
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_freorder_blocks_algorithm_, NULL,
+  REORDER_BLOCKS_ALGORITHM_STC },
 { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
 { OPT_LEVELS_2_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
-- 
1.8.1.4
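Here is a rough standalone model of what the new flag selects, under stated assumptions. `parse_reorder_blocks_algorithm` and `default_reorder_blocks_algorithm` are hypothetical helpers. They are not the code that the .opt machinery generates. They only mirror three pieces of the patch:

- the `Enum`/`EnumValue` records (`simple`, `stc`);
- the `Init(REORDER_BLOCKS_ALGORITHM_SIMPLE)` default;
- the `OPT_LEVELS_2_PLUS_SPEED_ONLY` entry, which picks STC at -O2 and above when not optimizing for size.

```c
#include <string.h>

/* Mirrors enum reorder_blocks_algorithm from flag-types.h.  */
enum reorder_blocks_algorithm
{
  REORDER_BLOCKS_ALGORITHM_SIMPLE,
  REORDER_BLOCKS_ALGORITHM_STC
};

/* Hypothetical stand-in for the option parsing implied by the
   Enum/EnumValue records.  Returns 0 on success and -1 for an unknown
   name (where GCC would report "unknown basic block reordering
   algorithm").  */
static int
parse_reorder_blocks_algorithm (const char *arg,
                                enum reorder_blocks_algorithm *out)
{
  if (strcmp (arg, "simple") == 0)
    *out = REORDER_BLOCKS_ALGORITHM_SIMPLE;
  else if (strcmp (arg, "stc") == 0)
    *out = REORDER_BLOCKS_ALGORITHM_STC;
  else
    return -1;
  return 0;
}

/* Level-based default: "simple" from Init(), switched to "stc" for
   -O2 and above unless optimizing for size (-Os has optimize == 2
   with optimize_size set).  */
static enum reorder_blocks_algorithm
default_reorder_blocks_algorithm (int optimize, int optimize_size)
{
  if (optimize >= 2 && !optimize_size)
    return REORDER_BLOCKS_ALGORITHM_STC;
  return REORDER_BLOCKS_ALGORITHM_SIMPLE;
}
```

An explicit -freorder-blocks-algorithm= on the command line would take precedence over the level-based default. The sketch does not model that precedence.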



Re: [RFC PATCH] parse #pragma GCC diagnostic in libcpp

2015-09-23 Thread Joseph Myers
On Sun, 20 Sep 2015, Manuel López-Ibáñez wrote:

> PING^2: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02414.html
> 
> On 21 August 2015 at 19:41, Manuel López-Ibáñez  wrote:
> > Any comments on this? 
> > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02414.html
> >
> > I don't see any other way to fix these PRs, but I don't know how to
> > keep the pragmas from being deleted by the preprocessor.

I'd suppose you want a new type of pragma, that acts like a combination of 
a deferred one and one for which a handler is called immediately by 
libcpp.  libcpp would call the handler but also create a CPP_PRAGMA token.  
The front-end code calling pragma handlers would need to know to do 
nothing with such pragmas; the token would only be for textual 
preprocessor output.
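A toy model of the pragma kind suggested above might look like this. Nothing here is the libcpp API. `enum pragma_kind`, `struct toy_pragma` and the `toy_*` functions are hypothetical names. They only show the split of responsibilities: the preprocessor runs the handler immediately and still keeps a token, and the front end ignores that token.

```c
/* Toy model, not libcpp: how each kind of pragma is treated.  */
enum pragma_kind
{
  PRAGMA_IMMEDIATE,  /* handler runs in the preprocessor, no token kept */
  PRAGMA_DEFERRED,   /* token kept; the front end runs the handler */
  PRAGMA_BOTH        /* handler runs now; token kept only for -E output */
};

struct toy_pragma
{
  const char *name;
  enum pragma_kind kind;
  void (*handler) (const char *name);
};

/* Example handler that just counts how often it was invoked.  */
static int handler_calls;

static void
count_handler (const char *name)
{
  (void) name;
  handler_calls++;
}

/* Preprocessor side: run the handler if the kind says so and return 1
   if a CPP_PRAGMA-like token should be emitted.  */
static int
toy_preprocess_pragma (const struct toy_pragma *p)
{
  if (p->kind == PRAGMA_IMMEDIATE || p->kind == PRAGMA_BOTH)
    p->handler (p->name);
  return p->kind != PRAGMA_IMMEDIATE;
}

/* Front-end side: a PRAGMA_BOTH token was already handled by the
   preprocessor, so only deferred pragmas are dispatched here.  */
static void
toy_frontend_pragma (const struct toy_pragma *p)
{
  if (p->kind == PRAGMA_DEFERRED)
    p->handler (p->name);
}
```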

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] rs6000: Fix -mdebug=stack code for spe_gp_offset

2015-09-23 Thread Segher Boessenkool
This seems like an obvious typo.  I cannot test SPE, but I noticed
this offset shows up in the debug output for normal configurations.
The condition is inverted, compared to all similar ones.

Is this okay for trunk?


Segher


2015-09-23  Segher Boessenkool  

* config/rs6000/rs6000.c (debug_stack_info): Invert the test
for info->spe_gp_size.

---
 gcc/config/rs6000/rs6000.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5897ea8..f8638f9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -22703,7 +22703,7 @@ debug_stack_info (rs6000_stack_t *info)
 fprintf (stderr, "\taltivec_save_offset = %5d\n",
 info->altivec_save_offset);
 
-  if (info->spe_gp_size == 0)
+  if (info->spe_gp_size)
 fprintf (stderr, "\tspe_gp_save_offset  = %5d\n",
 info->spe_gp_save_offset);
 
-- 
1.8.1.4
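Below is a reduced illustration of the pattern the fix restores. As with the other entries in `debug_stack_info`, the offset is printed only when the corresponding save area is non-empty. `struct toy_stack_info` and `toy_debug_spe_gp` are hypothetical stand-ins, not rs6000.c code. The sketch writes into a buffer instead of `stderr` so that the result can be checked.

```c
#include <stdio.h>

/* Hypothetical reduced rs6000_stack_t: only the two fields involved.  */
struct toy_stack_info
{
  int spe_gp_size;
  int spe_gp_save_offset;
};

/* Print the SPE GPR save offset only when there is an SPE GPR save
   area (non-zero size).  BUF must have room for at least one byte.
   Returns the number of characters written, 0 if nothing printed.  */
static int
toy_debug_spe_gp (const struct toy_stack_info *info, char *buf, size_t len)
{
  buf[0] = '\0';
  if (info->spe_gp_size)
    return snprintf (buf, len, "\tspe_gp_save_offset  = %5d\n",
                     info->spe_gp_save_offset);
  return 0;
}
```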



[committed, pa] Add long long support to config/pa/linux-atomic.c

2015-09-23 Thread John David Anglin
The attached change re-integrates long long support into linux-atomic.c.  This is possible due to a kernel fix to the LWS syscall and a middle-end fix to expand_atomic_compare_and_swap.

The patch corrects the inverted value returned by __sync_bool_compare_and_swap_##WIDTH.

It revises the return value check order in __kernel_cmpxchg2 to improve the fast path.

Finally, the return value types used to store results of __kernel_cmpxchg and __kernel_cmpxchg2 have been changed to long to match the type returned by these calls.

Tested on hppa-unknown-linux-gnu with no regressions.  Committed to trunk, gcc-5 and gcc-4.9 branches.

Dave
--
John David Anglin   dave.ang...@bell.net


2015-09-23  John David Anglin  

* config/pa/linux-atomic.c (__kernel_cmpxchg2): Reorder error checks.
(__sync_fetch_and_##OP##_##WIDTH): Change result to match type of
__kernel_cmpxchg2.
(__sync_##OP##_and_fetch_##WIDTH): Likewise.
(__sync_val_compare_and_swap_##WIDTH): Likewise.
(__sync_bool_compare_and_swap_##WIDTH): Likewise.
(__sync_lock_test_and_set_##WIDTH): Likewise.
(__sync_lock_release_##WIDTH): Likewise.
(__sync_fetch_and_##OP##_4): Change result to match type of
__kernel_cmpxchg.
(__sync_##OP##_and_fetch_4): Likewise.
(__sync_val_compare_and_swap_4): Likewise.
(__sync_bool_compare_and_swap_4): Likewise.
(__sync_lock_test_and_set_4): Likewise.
(__sync_lock_release_4): Likewise.
(FETCH_AND_OP_2): Add long long variants.
(OP_AND_FETCH_2): Likewise.
(COMPARE_AND_SWAP_2): Likewise.
(SYNC_LOCK_TEST_AND_SET_2): Likewise.
(SYNC_LOCK_RELEASE_2): Likewise.
(__sync_bool_compare_and_swap_##WIDTH): Correct return.
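The retry-loop structure behind the FETCH_AND_OP_2 macros, and the corrected boolean compare-and-swap, can be sketched in portable C. This is a model built on assumptions, not the libgcc code. `toy_kernel_cmpxchg` differs from the real helper in three ways:

- it replaces the LWS syscall with GCC's `__atomic_compare_exchange_n` builtin;
- it takes values rather than pointers;
- it only reports success (0) or `-EBUSY`.

It does keep two points the patch describes: the success check comes first, and the result is held in a `long` that is compared against 0.

```c
#include <errno.h>

/* Hypothetical stand-in for __kernel_cmpxchg2: returns 0 when *MEM
   held OLDVAL and now holds NEWVAL, else a negative errno value.  */
static long
toy_kernel_cmpxchg (long long *mem, long long oldval, long long newval)
{
  /* Check the common (successful) case first.  */
  if (__builtin_expect (__atomic_compare_exchange_n (mem, &oldval, newval,
                                                     0, __ATOMIC_SEQ_CST,
                                                     __ATOMIC_SEQ_CST), 1))
    return 0;
  return -EBUSY;
}

/* Fetch-and-add in the style of FETCH_AND_OP_2: reload and retry until
   the compare-and-swap succeeds, then return the old value.  */
static long long
toy_fetch_and_add_8 (long long *ptr, long long val)
{
  long long tmp;
  long failure;  /* same type as the helper's return value */

  do
    {
      tmp = __atomic_load_n (ptr, __ATOMIC_SEQ_CST);
      failure = toy_kernel_cmpxchg (ptr, tmp, tmp + val);
    }
  while (failure != 0);

  return tmp;
}

/* True exactly when the swap happened.  The helper returns 0 on
   success, so passing its raw value through would invert the sense.  */
static int
toy_bool_compare_and_swap_8 (long long *ptr, long long oldval,
                             long long newval)
{
  long failure = toy_kernel_cmpxchg (ptr, oldval, newval);
  return failure == 0;
}
```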

Index: config/pa/linux-atomic.c
===
--- config/pa/linux-atomic.c(revision 227986)
+++ config/pa/linux-atomic.c(working copy)
@@ -88,12 +88,17 @@
: "i" (2)
: "r1", "r20", "r22", "r29", "r31", "fr4", "memory"
   );
+
+  /* If the kernel LWS call is successful, lws_ret contains 0.  */
+  if (__builtin_expect (lws_ret == 0, 1))
+    return 0;
+
   if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
     __builtin_trap ();
 
-  /* If the kernel LWS call fails, return EBUSY */
-  if (!lws_errno && lws_ret)
-    lws_errno = -EBUSY;
+  /* If the kernel LWS call fails with no error, return -EBUSY */
+  if (__builtin_expect (!lws_errno, 0))
+    return -EBUSY;
 
   return lws_errno;
 }
@@ -111,7 +116,7 @@
   __sync_fetch_and_##OP##_##WIDTH (TYPE *ptr, TYPE val)            \
   {\
 TYPE tmp, newval;  \
-int failure;   \
+long failure;  \
\
 do {   \
   tmp = __atomic_load_n (ptr, __ATOMIC_SEQ_CST);   \
@@ -122,6 +127,13 @@
     return tmp;                                                     \
   }
 
+FETCH_AND_OP_2 (add,   , +, long long, 8, 3)
+FETCH_AND_OP_2 (sub,   , -, long long, 8, 3)
+FETCH_AND_OP_2 (or,, |, long long, 8, 3)
+FETCH_AND_OP_2 (and,   , &, long long, 8, 3)
+FETCH_AND_OP_2 (xor,   , ^, long long, 8, 3)
+FETCH_AND_OP_2 (nand, ~, &, long long, 8, 3)
+
 FETCH_AND_OP_2 (add,   , +, short, 2, 1)
 FETCH_AND_OP_2 (sub,   , -, short, 2, 1)
 FETCH_AND_OP_2 (or,, |, short, 2, 1)
@@ -141,7 +153,7 @@
   __sync_##OP##_and_fetch_##WIDTH (TYPE *ptr, TYPE val)            \
   {\
 TYPE tmp, newval;  \
-int failure;   \
+long failure;  \
\
 do {   \
   tmp = __atomic_load_n (ptr, __ATOMIC_SEQ_CST);   \
@@ -152,6 +164,13 @@
 return PFX_OP (tmp INF_OP val);\
   }
 
+OP_AND_FETCH_2 (add,   , +, long long, 8, 3)
+OP_AND_FETCH_2 (sub,   , -, long long, 8, 3)
+OP_AND_FETCH_2 (or,, |, long long, 8, 3)
+OP_AND_FETCH_2 (and,   , &, long long, 8, 3)
+OP_AND_FETCH_2 (xor,   , ^, long long, 8, 3)
+OP_AND_FETCH_2 (nand, ~, &, long long, 8, 3)
+
 OP_AND_FETCH_2 (add,   , +, short, 2, 1)
 OP_AND_FETCH_2 (sub,   , -, short, 2, 1)
 OP_AND_FETCH_2 (or,, |, short, 2, 1)
@@ -170,7 +189,8 @@
   int HIDDEN   \
   __sync_fetch_and_##OP##_4 (int *ptr, int val)  

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread David Malcolm
On Wed, 2015-09-23 at 15:36 +0200, Richard Biener wrote:
> On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
> > Hi,
> >
> > On Tue, 22 Sep 2015, David Malcolm wrote:
> >
> >> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
> >> table ever get smaller, or does it only ever get inserted into?
> >
> > It only ever grows.
> >
> >> An idea I had is that we could stash short ranges directly into the 32
> >> bits of location_t, by offsetting the per-column-bits somewhat.
> >
> > It's certainly worth an experiment: let's say you restrict yourself to
> > tokens less than 8 characters, you need an additional 3 bits (using one
> > value, e.g. zero, as the escape value).  That leaves 20 bits for the line
> > numbers (for the normal 8 bit columns), which might be enough for most
> > single-file compilations.  For LTO compilation this often won't be enough.
> >
> >> My plan is to investigate the impact these patches have on the time and
> >> memory consumption of the compiler,
> >
> > When you do so, make sure you're also measuring an LTO compilation with
> > debug info of something big (firefox).  I know that we already had issues
> > with the size of the linemap data in the past for these cases (probably
> > when we added columns).
> 
> The issue we have with LTO is that the linemap gets populated in quite
> random order and thus we repeatedly switch files (we've mitigated this
> somewhat for GCC 5).  We also considered dropping column info
> (and would drop range info) as diagnostics are from optimizers only
> with LTO and we keep locations merely for debug info.

Thanks.  Presumably the mitigation you're referring to is the
lto_location_cache class in lto-streamer-in.c?

Am I right in thinking that, right now, the LTO code doesn't support
ad-hoc locations? (presumably the block pointers only need to exist
during optimization, which happens after the serialization)

The obvious simplification would be, as you suggest, to not bother
storing range information with LTO, falling back to just the existing
representation.  Then there's no need to extend LTO to serialize ad-hoc
data; simply store the underlying locus into the bit stream.  I think
that this happens already: lto-streamer-out.c calls expand_location and
stores the result, so presumably any ad-hoc location_t values made by
the v2 patches would have dropped their range data there when I ran the
test suite.

If it's acceptable to not bother with ranges for LTO, one way to do the
"stashing short ranges into the location_t" idea might be for the
bits-per-range of location_t values to be a property of the line_table
(or possibly the line map), set up when the struct line_maps is created.
For non-LTO it could be some tuned value (maybe from a param?); for LTO
it could be zero, so that we have as many bits as before for line/column
data.
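The bit budget discussed in this thread can be checked with a small model. The layout below is purely hypothetical and is not libcpp's encoding. It makes three assumptions:

- The top bit of a 32-bit location is reserved. With 8 column bits and 3 range bits, that leaves the 20 line bits Michael mentions.
- `column_bits` and `range_bits` live in a per-table struct, as Dave suggests.
- A range value of 0 is the escape meaning "not stored inline".

Setting `range_bits` to 0, as proposed for LTO, gives those bits back to line numbers.

```c
#include <stdint.h>

/* Hypothetical per-line-table layout parameters.  */
struct toy_line_table
{
  unsigned column_bits;  /* e.g. 8 */
  unsigned range_bits;   /* e.g. 3 normally, 0 for LTO */
};

/* 32 bits minus one reserved top bit (assumed, e.g. for ad-hoc).  */
#define TOY_PAYLOAD_BITS 31

static unsigned
toy_line_bits (const struct toy_line_table *t)
{
  return TOY_PAYLOAD_BITS - t->column_bits - t->range_bits;
}

/* Pack LINE, COL and a short range length.  Returns 0 (this toy's
   "does not fit" value) if the line or column overflows its field.
   Ranges too long for the field are stored as the escape value 0.  */
static uint32_t
toy_pack (const struct toy_line_table *t, uint32_t line, uint32_t col,
          uint32_t range_len)
{
  uint32_t range_field;

  if (line >= (UINT32_C (1) << toy_line_bits (t))
      || col >= (UINT32_C (1) << t->column_bits))
    return 0;

  range_field = range_len < (UINT32_C (1) << t->range_bits) ? range_len : 0;

  return (line << (t->column_bits + t->range_bits))
         | (col << t->range_bits)
         | range_field;
}

static void
toy_unpack (const struct toy_line_table *t, uint32_t loc,
            uint32_t *line, uint32_t *col, uint32_t *range_len)
{
  *range_len = loc & ((UINT32_C (1) << t->range_bits) - 1);
  *col = (loc >> t->range_bits) & ((UINT32_C (1) << t->column_bits) - 1);
  *line = loc >> (t->column_bits + t->range_bits);
}
```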

Hope this sounds sane
Dave



[PATCH] update a few places for the change from gimple_statement_base to gimple

2015-09-23 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

This fixes up a few remaining references to gimple_statement_base that were just brought up.

Bootstrapped on x86_64-linux-gnu, but the only non-comment/doc change is gdbhooks.py.  OK?

Trev

gcc/ChangeLog:

2015-09-23  Trevor Saunders  

* doc/gimple.texi: Update references to gimple_statement_base.
* gdbhooks.py: Likewise.
* gimple.h: Likewise.
---
 gcc/doc/gimple.texi | 12 ++--
 gcc/gdbhooks.py |  2 +-
 gcc/gimple.h| 10 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 543de90..d089d4f 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -92,8 +92,8 @@ groups: a header describing the instruction and its locations,
 and a variable length body with all the operands. Tuples are
 organized into a hierarchy with 3 main classes of tuples.
 
-@subsection @code{gimple_statement_base} (gsbase)
-@cindex gimple_statement_base
+@subsection @code{gimple} (gsbase)
+@cindex gimple
 
 This is the root of the hierarchy, it holds basic information
 needed by most GIMPLE statements. There are some fields that
@@ -223,7 +223,7 @@ is then inherited from the other two tuples.
 
 @itemize @bullet
 @item @code{gsbase}
-Inherited from @code{struct gimple_statement_base}.
+Inherited from @code{struct gimple}.
 
 @item @code{def_ops}
 Array of pointers into the operand array indicating all the slots that
@@ -300,7 +300,7 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and
 @code{GIMPLE_} values (codes):
 
 @smallexample
-   gimple_statement_base
+   gimple
  |layout: GSS_BASE
  |used for 4 codes: GIMPLE_ERROR_MARK
  |  GIMPLE_NOP
@@ -2654,7 +2654,7 @@ any new basic blocks which are necessary.
 
 The first step in adding a new GIMPLE statement code, is
 modifying the file @code{gimple.def}, which contains all the GIMPLE
-codes.  Then you must add a corresponding gimple_statement_base subclass
+codes.  Then you must add a corresponding gimple subclass
 located in @code{gimple.h}.  This in turn, will require you to add a
 corresponding @code{GTY} tag in @code{gsstruct.def}, and code to handle
 this tag in @code{gss_for_code} which is located in @code{gimple.c}.
@@ -2667,7 +2667,7 @@ in @code{gimple.c}.
 You will probably want to create a function to build the new
 gimple statement in @code{gimple.c}.  The function should be called
 @code{gimple_build_@var{new-tuple-name}}, and should return the new tuple
-as a pointer to the appropriate gimple_statement_base subclass.
+as a pointer to the appropriate gimple subclass.
 
 If your new statement requires accessors for any members or
 operands it may have, put simple inline accessors in
diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 3a62a2d..2b9a94c 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -484,7 +484,7 @@ def build_pretty_printer():
  'cgraph_node', CGraphNodePrinter)
 pp.add_printer_for_types(['dw_die_ref'],
  'dw_die_ref', DWDieRefPrinter)
-pp.add_printer_for_types(['gimple', 'gimple_statement_base *',
+pp.add_printer_for_types(['gimple', 'gimple *',
 
   # Keep this in the same order as gimple.def:
   'gimple_cond', 'const_gimple_cond',
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 91c26b6..30b1041 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -123,7 +123,7 @@ enum gimple_rhs_class
 };
 
 /* Specific flags for individual GIMPLE statements.  These flags are
-   always stored in gimple_statement_base.subcode and they may only be
+   always stored in gimple.subcode and they may only be
defined for statement codes that do not use subcodes.
 
Values for the masks can overlap as long as the overlapping values
@@ -380,7 +380,7 @@ struct GTY((tag("GSS_BIND")))
   tree vars;
 
   /* [ WORD 8 ]
- This is different than the BLOCK field in gimple_statement_base,
+ This is different than the BLOCK field in gimple,
  which is analogous to TREE_BLOCK (i.e., the lexical block holding
  this statement).  This field is the equivalent of BIND_EXPR_BLOCK
  in tree land (i.e., the lexical scope defined by this bind).  See
@@ -744,7 +744,7 @@ struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
 
 
 /* GIMPLE_OMP_ATOMIC_LOAD.
-   Note: This is based on gimple_statement_base, not g_s_omp, because g_s_omp
+   Note: This is based on gimple, not g_s_omp, because g_s_omp
contains a sequence, which we don't need here.  */
 
 struct GTY((tag("GSS_OMP_ATOMIC_LOAD")))
@@ -1813,7 +1813,7 @@ gimple_set_no_warning (gimple *stmt, bool no_warning)
 
You can learn more about the visited property of the gimple
statement by reading the comments of the 'visited' data member of
-   struct gimple statement_base.
+   struct gimple.
  */
 
 static inline void
@@ -1832,7 +1832,7 @@ gimple_set_visited (gimple *stmt, bool visited_p)
 

[PATCH] DWARF support for AIX v4

2015-09-23 Thread David Edelsohn
Richard and Richard,

Appended is the updated version of the DWARF support patch for AIX.  I can
still split out the length computation into a separate helper function
but, as I mentioned, it won't apply to the instance that uses a delta
of two labels.

This version sets have_macinfo to false and disables add_AT_loc_list.
It also defines XCOFF_DEBUGGING_INFO to 0 by default in dwarf2out.c and
dwarf2asm.c.

Thanks, David

* dwarf2out.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
(have_macinfo): Force to False for XCOFF_DEBUGGING_INFO.
(add_AT_loc_list): Return early if XCOFF_DEBUGGING_INFO.
(output_compilation_unit_header): Don't output length on AIX.
(output_pubnames): Don't output length on AIX.
(output_aranges): Delete argument. Compute length locally. Don't
output length on AIX.
(output_line_info): Don't output length on AIX.
(dwarf2out_finish): Don't compute aranges_length.
* dwarf2asm.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
(dw2_asm_output_nstring): Emit .byte not .ascii on AIX.
* config/rs6000/rs6000.c (rs6000_output_dwarf_dtprel): Emit correct
symbol decoration for AIX.
(rs6000_xcoff_debug_unwind_info): New.
(rs6000_xcoff_asm_named_section): Emit .dwsect pseudo-op
for SECTION_DEBUG.
(rs6000_xcoff_declare_function_name): Emit different
.function pseudo-op when DWARF2_DEBUG. Don't call
xcoffout_declare_function for DWARF2_DEBUG.
* config/rs6000/xcoff.h (TARGET_DEBUG_UNWIND_INFO):
Redefine.
* config/rs6000/aix71.h (DWARF2_DEBUGGING_INFO): Define.
(PREFERRED_DEBUGGING_TYPE): Define.
(DEBUG_INFO_SECTION): Define.
(DEBUG_ABBREV_SECTION): Define.
(DEBUG_ARANGES_SECTION): Define.
(DEBUG_LINE_SECTION): Define.
(DEBUG_PUBNAMES_SECTION): Define.
(DEBUG_PUBTYPES_SECTION): Define.
(DEBUG_STR_SECTION): Define.
(DEBUG_RANGES_SECTION): Define.

Index: dwarf2out.c
===
--- dwarf2out.c (revision 228071)
+++ dwarf2out.c (working copy)
@@ -108,6 +108,10 @@ static rtx_insn *last_var_location_insn;
 static rtx_insn *cached_next_real_insn;
 static void dwarf2out_decl (tree);

+#ifndef XCOFF_DEBUGGING_INFO
+#define XCOFF_DEBUGGING_INFO 0
+#endif
+
 #ifdef VMS_DEBUGGING_INFO
 int vms_file_stats_name (const char *, long long *, long *, char *, int *);

@@ -2995,7 +2999,8 @@ static GTY (()) vec *macinfo
 /* True if .debug_macinfo or .debug_macros section is going to be
emitted.  */
 #define have_macinfo \
-  (debug_info_level >= DINFO_LEVEL_VERBOSE \
+  (!XCOFF_DEBUGGING_INFO \
+   && debug_info_level >= DINFO_LEVEL_VERBOSE \
&& !macinfo_table->is_empty ())
 /* Array of dies for which we should generate .debug_ranges info.  */
@@ -3202,7 +3207,7 @@ static void add_enumerator_pubname (const char *,
 static void add_pubname_string (const char *, dw_die_ref);
 static void add_pubtype (tree, dw_die_ref);
 static void output_pubnames (vec *);
-static void output_aranges (unsigned long);
+static void output_aranges (void);
 static unsigned int add_ranges_num (int);
 static unsigned int add_ranges (const_tree);
 static void add_ranges_by_labels (dw_die_ref, const char *, const char *,
@@ -4236,6 +4241,9 @@ add_AT_loc_list (dw_die_ref die, enum dwarf_attrib
 {
   dw_attr_node attr;

+  if (XCOFF_DEBUGGING_INFO)
+    return;
+
   attr.dw_attr = attr_kind;
   attr.dw_attr_val.val_class = dw_val_class_loc_list;
   attr.dw_attr_val.val_entry = NULL;
@@ -9197,12 +9205,16 @@ output_compilation_unit_header (void)
  DWARFv5 draft DIE tags in DWARFv4 format.  */
   int ver = dwarf_version < 5 ? dwarf_version : 4;

-  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
-    dw2_asm_output_data (4, 0xffffffff,
-      "Initial length escape value indicating 64-bit DWARF extension");
-  dw2_asm_output_data (DWARF_OFFSET_SIZE,
-                       next_die_offset - DWARF_INITIAL_LENGTH_SIZE,
-                       "Length of Compilation Unit Info");
+  if (!XCOFF_DEBUGGING_INFO)
+    {
+      if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
+        dw2_asm_output_data (4, 0xffffffff,
+          "Initial length escape value indicating 64-bit DWARF extension");
+      dw2_asm_output_data (DWARF_OFFSET_SIZE,
+                           next_die_offset - DWARF_INITIAL_LENGTH_SIZE,
+                           "Length of Compilation Unit Info");
+    }
+
   dw2_asm_output_data (2, ver, "DWARF version number");
   dw2_asm_output_offset (DWARF_OFFSET_SIZE, abbrev_section_label,
 debug_abbrev_section,
@@ -9632,10 +9644,14 @@ output_pubnames (vec *names)
   unsigned long pubnames_length = size_of_pubnames (names);
   pubname_entry *pub;

-  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4)
-    dw2_asm_output_data (4, 0xffffffff,
-  "Initial length escape value indicating 64-bit DWARF ext

Re: [PATCH] rs6000: Fix -mdebug=stack code for spe_gp_offset

2015-09-23 Thread David Edelsohn
On Wed, Sep 23, 2015 at 7:35 PM, Segher Boessenkool
 wrote:
> This seems like an obvious typo.  I cannot test SPE, but I noticed
> this offset shows up in the debug output for normal configurations.
> The condition is inverted, compared to all similar ones.
>
> Is this okay for trunk?
>
>
> Segher
>
>
> 2015-09-23  Segher Boessenkool  
>
> * config/rs6000/rs6000.c (debug_stack_info): Invert the test
> for info->spe_gp_size.

Okay.

Thanks, David

