Re: [PATCH 1/2, expr.c] Optimize switch with sign-extended index.

2018-05-04 Thread Richard Biener
On Thu, May 3, 2018 at 11:29 PM, Jim Wilson  wrote:
> On Thu, May 3, 2018 at 12:29 AM, Richard Biener
>  wrote:
>> Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile
>> - I remember
>> Eric fixing things up a bit but some verification would be nice to
>> have (instrumentation
>> at RTL level that for SUBREG_PROMOTED_* the bits are as expected).
>
> If you are using SUBREG_PROMOTED_* in a late optimization pass like
> combine, then this requires that all earlier optimization passes
> propagate the info correctly.  I suppose there could be issues there.
> But that isn't what this patch is doing.  This is code called during
> initial RTL generation.  The SUBREG_PROMOTED_* bits are set during
> this process because we know that arguments are passed sign-extended
> to full register size.  We are then consuming the info while still in
> the RTL generation phase.  I think that there is little that can go
> wrong here.

Indeed.  But IIRC the info stays around and people might be tempted to use it...
I do see various uses in later RTL optimizers.

> Verifying this info in RTL generation would effectively be verifying
> that the argument passing conventions are implemented correctly, and
> we already have other ways to do that.
>
> It might be useful to try to verify this info before combine where it
> is more likely to be wrong.  I don't think there is any easy way to
> verify this at compile time.  This would probably require emitting
> code to check at application run-time that a promoted subreg actually
> has a properly promoted value, and call abort if it doesn't.  This
> would likely be an expensive check that we don't want enabled by
> default, but might be useful for debugging purposes.  I don't think we
> have any --enable-checking code like this at present.  We have
> compiler compile-time checking and compiler run-time checking, but I
> don't think that we have application run-time checking.  This would be
> more like a sanitizer option, except to validate info in the RTL.  Is
> this what you are asking for?

Yes, that's what I was suggesting.  I guess similarly "sanitizing"
on-the-side info we have at GIMPLE level would be interesting, like
verifying range-info.

Just an idea - I'm not actually expecting you to implement this,
esp. since doing the actual instrumentation on RTL can be a bit
tricky (best not emit a call to abort but use the target's trap
instruction for simplicity).
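
For illustration only, the invariant such instrumentation would check can be
sketched in plain C, outside of GCC.  The names is_sign_extended and
check_promoted are hypothetical, and __builtin_trap merely stands in for the
target's trap instruction; this is not RTL instrumentation, just the
property it would verify for a sign-extended promoted value.

```c
#include <stdint.h>

/* Nonzero if VALUE, held in a 64-bit register, equals the sign
   extension of its low NARROW_BITS bits.  NARROW_BITS is expected to
   be in the range 1..64; zero is rejected.  */
int
is_sign_extended (int64_t value, unsigned narrow_bits)
{
  if (narrow_bits == 0)
    return 0;
  if (narrow_bits >= 64)
    return 1;

  /* A properly sign-extended value lies in the signed range of the
     narrow mode.  */
  int64_t min = -((int64_t) 1 << (narrow_bits - 1));
  int64_t max = ((int64_t) 1 << (narrow_bits - 1)) - 1;
  return value >= min && value <= max;
}

/* Hypothetical run-time check: trap rather than call abort.  */
void
check_promoted (int64_t value, unsigned narrow_bits)
{
  if (!is_sign_extended (value, narrow_bits))
    __builtin_trap ();
}
```

A check like check_promoted (reg, 32) would then pass for a correctly
promoted SImode value and trap for one with garbage in the upper bits.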

Richard.

>
> Jim


[PATCH] Fix PR85574

2018-05-04 Thread Richard Biener

The PR correctly notes that we cannot transform -((a-b)/c) to
(b-a)/c if a-b == b-a == INT_MIN, because we would then change
the sign of the result if c != 1.  Thus the following patch restricts
this transform/predicate further.
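
A small stand-alone sketch of the problem, assuming wrapping (-fwrapv)
semantics.  The helper names are hypothetical; wrapping is emulated through
unsigned arithmetic so the example itself has no signed overflow, and the
unsigned-to-int conversion relies on GCC's modulo behaviour.

```c
#include <limits.h>

/* Two's-complement wrapping subtraction, emulating -fwrapv.  The
   conversion back to int is implementation-defined; GCC reduces it
   modulo 2^N.  */
static int
wrap_sub (int a, int b)
{
  return (int) ((unsigned) a - (unsigned) b);
}

/* -((a - b) / c), the expression as written.  */
int
negate_quotient (int a, int b, int c)
{
  return wrap_sub (0, wrap_sub (a, b) / c);
}

/* (b - a) / c, the result of the transform.  */
int
swapped_quotient (int a, int b, int c)
{
  return wrap_sub (b, a) / c;
}
```

With a == INT_MIN, b == 0 and c == 2 the first function yields
-(INT_MIN / 2) while the second yields INT_MIN / 2, so the sign differs;
with c == 1 both yield INT_MIN.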

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-05-04  Richard Biener  

PR middle-end/85574
* fold-const.c (negate_expr_p): Restrict negation of operand
zero of a division to when we know that can happen without
overflow.
(fold_negate_expr_1): Likewise.

* gcc.dg/torture/pr85574.c: New testcase.
* gcc.dg/torture/pr57656.c: Use dg-additional-options.

Index: gcc/testsuite/gcc.dg/torture/pr85574.c
===
--- gcc/testsuite/gcc.dg/torture/pr85574.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr85574.c  (working copy)
@@ -0,0 +1,4 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fwrapv" } */
+
+#include "pr57656.c"
Index: gcc/testsuite/gcc.dg/torture/pr57656.c
===
--- gcc/testsuite/gcc.dg/torture/pr57656.c  (revision 259879)
+++ gcc/testsuite/gcc.dg/torture/pr57656.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fstrict-overflow" } */
+/* { dg-additional-options "-fstrict-overflow" } */
 
 int main (void)
 {
Index: gcc/fold-const.c
===
--- gcc/fold-const.c    (revision 259879)
+++ gcc/fold-const.c    (working copy)
@@ -474,12 +474,15 @@ negate_expr_p (tree t)
 case EXACT_DIV_EXPR:
   if (TYPE_UNSIGNED (type))
break;
-  if (negate_expr_p (TREE_OPERAND (t, 0)))
+  /* In general we can't negate A in A / B, because if A is INT_MIN and
+ B is not 1 we change the sign of the result.  */
+  if (TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST
+ && negate_expr_p (TREE_OPERAND (t, 0)))
return true;
   /* In general we can't negate B in A / B, because if A is INT_MIN and
 B is 1, we may turn this into INT_MIN / -1 which is undefined
 and actually traps on some architectures.  */
-  if (! INTEGRAL_TYPE_P (TREE_TYPE (t))
+  if (! ANY_INTEGRAL_TYPE_P (TREE_TYPE (t))
  || TYPE_OVERFLOW_WRAPS (TREE_TYPE (t))
  || (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
  && ! integer_onep (TREE_OPERAND (t, 1))))
@@ -652,14 +655,17 @@ fold_negate_expr_1 (location_t loc, tree
 case EXACT_DIV_EXPR:
   if (TYPE_UNSIGNED (type))
break;
-  if (negate_expr_p (TREE_OPERAND (t, 0)))
+  /* In general we can't negate A in A / B, because if A is INT_MIN and
+B is not 1 we change the sign of the result.  */
+  if (TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST
+ && negate_expr_p (TREE_OPERAND (t, 0)))
return fold_build2_loc (loc, TREE_CODE (t), type,
negate_expr (TREE_OPERAND (t, 0)),
TREE_OPERAND (t, 1));
   /* In general we can't negate B in A / B, because if A is INT_MIN and
 B is 1, we may turn this into INT_MIN / -1 which is undefined
 and actually traps on some architectures.  */
-  if ((! INTEGRAL_TYPE_P (TREE_TYPE (t))
+  if ((! ANY_INTEGRAL_TYPE_P (TREE_TYPE (t))
   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (t))
   || (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
   && ! integer_onep (TREE_OPERAND (t, 1))))


Re: [PATCH] Fix PR85627 (and more)

2018-05-04 Thread Richard Biener
On Thu, 3 May 2018, Richard Biener wrote:

> 
> The following fixes PR85627 and more generally complex lowering not
> preserving EH information with -fnon-call-exceptions when replacing
> complex multiplication or division with a libcall.
> 
> This requires changing BUILT_IN_COMPLEX_{MUL,DIV} to be no longer
> declared nothrow - complex lowering (which looks like the only consumer)
> properly will set the nothrow flag on the individual call based on
> the context.
> 
> Test coverage of -fnon-call-exceptions is notoriously bad and I'm not
> sure whether Ada uses GCCs complex types.  Eric?
> 
> Otherwise does this look sane?  I'm waiting for Kyrylos patch to come
> in and then will refresh and re-test.

This is what I have committed.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2018-05-04  Richard Biener  

PR middle-end/85627
* tree-complex.c (update_complex_assignment): We are always in SSA form.
(expand_complex_div_wide): Likewise.
(expand_complex_operations_1): Likewise.
(expand_complex_libcall): Preserve EH info of the original stmt.
(tree_lower_complex): Handle removed blocks.
* tree.c (build_common_builtin_nodes): Do not set ECF_NOTHROW
on complex multiplication and division libcall builtins.

* g++.dg/torture/pr85627.C: New testcase.

Index: gcc/tree-complex.c
===
--- gcc/tree-complex.c  (revision 259889)
+++ gcc/tree-complex.c  (working copy)
@@ -703,8 +703,7 @@ update_complex_assignment (gimple_stmt_i
   if (maybe_clean_eh_stmt (stmt))
 gimple_purge_dead_eh_edges (gimple_bb (stmt));
 
-  if (gimple_in_ssa_p (cfun))
-update_complex_components (gsi, gsi_stmt (*gsi), r, i);
+  update_complex_components (gsi, gsi_stmt (*gsi), r, i);
 }
 
 
@@ -1006,37 +1005,44 @@ expand_complex_libcall (gimple_stmt_iter
   else
 gcc_unreachable ();
   fn = builtin_decl_explicit (bcode);
-
   stmt = gimple_build_call (fn, 4, ar, ai, br, bi);
 
-
   if (inplace_p)
 {
   gimple *old_stmt = gsi_stmt (*gsi);
+  gimple_call_set_nothrow (stmt, !stmt_could_throw_p (old_stmt));
   lhs = gimple_assign_lhs (old_stmt);
   gimple_call_set_lhs (stmt, lhs);
-  update_stmt (stmt);
-  gsi_replace (gsi, stmt, false);
-
-  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-   gimple_purge_dead_eh_edges (gsi_bb (*gsi));
+  gsi_replace (gsi, stmt, true);
 
   type = TREE_TYPE (type);
-  update_complex_components (gsi, stmt,
- build1 (REALPART_EXPR, type, lhs),
- build1 (IMAGPART_EXPR, type, lhs));
+  if (stmt_can_throw_internal (stmt))
+   {
+ edge_iterator ei;
+ edge e;
+ FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+ if (!(e->flags & EDGE_EH))
+   break;
+ basic_block bb = split_edge (e);
+ gimple_stmt_iterator gsi2 = gsi_start_bb (bb);
+ update_complex_components (&gsi2, stmt,
+build1 (REALPART_EXPR, type, lhs),
+build1 (IMAGPART_EXPR, type, lhs));
+ return NULL_TREE;
+   }
+  else
+   update_complex_components (gsi, stmt,
+  build1 (REALPART_EXPR, type, lhs),
+  build1 (IMAGPART_EXPR, type, lhs));
   SSA_NAME_DEF_STMT (lhs) = stmt;
   return NULL_TREE;
 }
 
-  lhs = create_tmp_var (type);
+  gimple_call_set_nothrow (stmt, true);
+  lhs = make_ssa_name (type);
   gimple_call_set_lhs (stmt, lhs);
-
-  lhs = make_ssa_name (lhs, stmt);
-  gimple_call_set_lhs (stmt, lhs);
-
-  update_stmt (stmt);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+
   return lhs;
 }
 
@@ -1265,14 +1271,8 @@ expand_complex_div_wide (gimple_stmt_ite
   gimple *stmt;
   tree cond, tmp;
 
-  tmp = create_tmp_var (boolean_type_node);
+  tmp = make_ssa_name (boolean_type_node);
   stmt = gimple_build_assign (tmp, compare);
-  if (gimple_in_ssa_p (cfun))
-   {
- tmp = make_ssa_name (tmp, stmt);
- gimple_assign_set_lhs (stmt, tmp);
-   }
-
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 
   cond = fold_build2_loc (gimple_location (stmt),
@@ -1698,25 +1698,20 @@ expand_complex_operations_1 (gimple_stmt
   else
 br = bi = NULL_TREE;
 
-  if (gimple_in_ssa_p (cfun))
+  al = find_lattice_value (ac);
+  if (al == UNINITIALIZED)
+al = VARYING;
+
+  if (TREE_CODE_CLASS (code) == tcc_unary)
+bl = UNINITIALIZED;
+  else if (ac == bc)
+bl = al;
+  else
 {
-  al = find_lattice_value (ac);
-  if (al == UNINITIALIZED)
-   al = VARYING;
-
-  if (TREE_CODE_CLASS (code) == tcc_unary)
-   bl = UNINITIALIZED;
-  else if (ac == bc)
-   bl = al;
-  else
-   {
- bl = find_lattice_value (bc);
- if (bl == UNINITIALIZED)
-   bl = VA

[PATCH, expand, PR85639] Handle null target in expand_builtin_goacc_parlevel_id_size

2018-05-04 Thread Tom de Vries
[ was: Re: [PATCH, PR82428] Add __builtin_goacc_{gang,worker,vector}_{id,size} ]

On 01/18/2018 09:55 AM, Tom de Vries wrote:
On 01/17/2018 06:51 PM, Jakub Jelinek wrote:
On Wed, Jan 17, 2018 at 06:42:33PM +0100, Tom de Vries wrote:

@@ -6602,6 +6604,71 @@ expand_stack_save (void)
return ret;
  }
  
+/* Emit code to get the openacc gang, worker or vector id or size.  */
+
+static rtx
+expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
+{

+
+  if (ignore)
+return target;
+
+  if (!targetm.have_oacc_dim_size ())
+{
+  emit_move_insn (target, fallback_retval);
+  return target;
+}
+

Hi,

As reported in PR85639 (triggered on powerpc64), this function causes a 
segfault when called with ignore == 0 and target == NULL_RTX, by calling 
emit_move_insn with target == NULL_RTX.


Fixed by making sure target is non-null before calling emit_move_insn.
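
The shape of the fix, as a stand-alone sketch.  Everything here is a
hypothetical stand-in (mock_reg, mock_gen_reg, mock_expand); it does not use
GCC's rtx API and only mirrors the control flow: return early when the
result is ignored, otherwise make sure a destination exists before storing
into it.

```c
#include <stddef.h>

/* Hypothetical stand-in for an RTL register; not GCC's rtx.  */
typedef struct mock_reg
{
  int value;
} mock_reg;

static mock_reg pool[16];
static size_t next_reg;

/* Hand out a fresh register from a small fixed pool, or NULL once the
   pool is exhausted.  Stands in for allocating a new pseudo.  */
mock_reg *
mock_gen_reg (void)
{
  return next_reg < 16 ? &pool[next_reg++] : NULL;
}

/* Mirror of the expander's control flow: TARGET may be NULL when the
   caller has no preferred destination.  */
mock_reg *
mock_expand (mock_reg *target, int ignore, int fallback)
{
  if (ignore)
    return target;

  /* The fix: allocate a destination before the first store.  */
  if (target == NULL)
    target = mock_gen_reg ();
  if (target == NULL)
    return NULL;

  target->value = fallback;
  return target;
}
```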

Rebuilt cc1 on powerpc64, and verified that:
- the segfault no longer happens, and
- valid RTL code is generated for the __builtin_goacc_parlevel_id call
  that caused the ICE.

OK for trunk?

Thanks,
- Tom
[expand] Handle null target in expand_builtin_goacc_parlevel_id_size

2018-05-04  Tom de Vries  

	PR libgomp/85639
	* builtins.c (expand_builtin_goacc_parlevel_id_size):

---
 gcc/builtins.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 300e13c..0097d5b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6682,6 +6682,9 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
   if (ignore)
 return target;
 
+  if (target == NULL_RTX)
+target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (exp)));
+
   if (!targetm.have_oacc_dim_size ())
 {
   emit_move_insn (target, fallback_retval);


Re: [PATCH, expand, PR85639] Handle null target in expand_builtin_goacc_parlevel_id_size

2018-05-04 Thread Jakub Jelinek
On Fri, May 04, 2018 at 09:46:45AM +0200, Tom de Vries wrote:
> As reported in PR85639 (triggered on powerpc64), this function causes a
> segfault when called with ignore == 0 and target == NULL_RTX, by calling
> emit_move_insn with target == NULL_RTX.
> 
> Fixed by making sure target is non-null before calling emit_move_insn.
> 
> Rebuilt cc1 on powerpc64, and verified that:
> - the segfault no longer happens, and
> - valid RTL code is generated for the __builtin_goacc_parlevel_id call
>   that caused the ICE.
> 
> OK for trunk?
> 
> Thanks,
> - Tom

> [expand] Handle null target in expand_builtin_goacc_parlevel_id_size
> 
> 2018-05-04  Tom de Vries  
> 
>   PR libgomp/85639
>   * builtins.c (expand_builtin_goacc_parlevel_id_size):

Please say what has changed in the ChangeLog entry.
Otherwise LGTM.

>  gcc/builtins.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 300e13c..0097d5b 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -6682,6 +6682,9 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx 
> target, int ignore)
>if (ignore)
>  return target;
>  
> +  if (target == NULL_RTX)
> +target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (exp)));
> +
>if (!targetm.have_oacc_dim_size ())
>  {
>emit_move_insn (target, fallback_retval);


Jakub


Re: [ARM] Fix PR85434: spill of stack protector's guard address

2018-05-04 Thread Segher Boessenkool
Hi Thomas,

On Fri, May 04, 2018 at 05:52:57AM +0100, Thomas Preudhomme wrote:
> >> As mentioned in the ticket this was my first thought but this means
> >> making the pattern aware of all the possible ways the address could be
> >> accessed (PIC vs non-PIC, Arm vs Thumb-2 vs Thumb-1) to decide how many
> >> scratch registers are needed. I'd rather reuse the existing patterns as
> >> much as possible to make sure they are well tested. Ideally I wanted a
> >> way to mark a REG RTX so that it is never spilled and such that the
> >> mark is propagated when the register is moved to another register or
> >> propagated. But that is a bigger change so I decided it should be an
> >> improvement for later but needed another solution right now.
> >
> > How would that work, esp. for pseudos?  If too many regs have such a
> > mark then the compiler will have to sorry() or similar, not a good
> > thing at all.
> 
> I'm missing something, there should be the same amount of pseudo with that
> mark as there is scratch in the new pattern doing memory address load(s) +
> set / check. I'm guessing this is not as easy to achieve as it sounds.

But this pattern is expanded all the way at the beginning of the RTL
pipeline, so you'll need to prevent anything copying this.  And if any
other pattern wants to use this do-not-spill feature as well, you'll
have a problem no matter what.

> >> By the way about making sure the address is not left in a register, I
> >> have a question regarding the current stack_protect_set and
> >> stack_protect_check patterns and their requirement to have registers
> >> cleared afterwards: why is that necessary? Currently not all registers
> >> are cleared and the guard is available in the canary before it is
> >> overwritten anyway so I don't see how clearing the register adds any
> >> extra security. What sort of attack is it protecting against?
> >
> > From md.texi:
> >
> > @item @samp{stack_protect_set}
> > This pattern, if defined, moves a @code{ptr_mode} value from the memory
> > in operand 1 to the memory in operand 0 without leaving the value in
> > a register afterward.  This is to avoid leaking the value some place
> > that an attacker might use to rewrite the stack guard slot after
> > having clobbered it.
> >
> > (etc.)
> 
> I've read that doc but what I don't understand is why the guard value being
> leaked in a register would be a problem if modified. The patterns as they
> are guarantee the guard is always reloaded from its canonical location
> (e.g. TLS var). Because the patterns do not represent in RTL what they do
> the compiler could not reuse the value left in a register. Are we worrying
> about optimization the assembler could do?
> 
> > Having the canary in a global variable makes it a lot easier for exploit
> > code to access it than if it is e.g. in TLS data.  Actually leaking a
> > pointer to it would make it extra easy...
> 
> If an attacker can execute code to access and modify the guard, why would
> s/he bother doing a stack overflow instead of just executing the code he
> wants to directly?

The issue is leaking the value so the user can observe it, and then when
overwriting the stack write the expected value to the cookie again, so
that the protection isn't triggered.

You don't necessarily need to execute code of your choice to overwrite
a memory location of your choice, fwiw.  SSP does not prevent all attacks,
just very many.
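
To make the leak scenario concrete, here is a toy model in C.  It is a
sketch under simplifying assumptions, not how any real frame is laid out:
struct toy_frame and all toy_* functions are hypothetical, and the
"overflow" is clamped to the struct so the model stays well-defined.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a protected frame: buffer, then guard, then return slot.  */
struct toy_frame
{
  char buf[8];
  unsigned long canary;
  unsigned long ret;
};

/* Copy LEN attacker-controlled bytes starting at buf, clamped to the
   frame size.  */
void
toy_overflow (struct toy_frame *f, const unsigned char *payload, size_t len)
{
  if (len > sizeof *f)
    len = sizeof *f;
  memcpy (f, payload, len);
}

/* Epilogue check: nonzero if the guard slot still matches GUARD.  */
int
toy_guard_ok (const struct toy_frame *f, unsigned long guard)
{
  return f->canary == guard;
}

/* Build a payload that writes CANARY_GUESS over the guard slot and
   NEW_RET over the return slot.  PAYLOAD must have room for
   sizeof (struct toy_frame) bytes.  */
void
toy_build_payload (unsigned char *payload, unsigned long canary_guess,
                   unsigned long new_ret)
{
  memset (payload, 'A', sizeof (struct toy_frame));
  memcpy (payload + offsetof (struct toy_frame, canary),
          &canary_guess, sizeof canary_guess);
  memcpy (payload + offsetof (struct toy_frame, ret),
          &new_ret, sizeof new_ret);
}
```

An attacker who has to guess the guard trips toy_guard_ok; one who has
observed a leaked copy writes it back and changes ret undetected.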


Segher


Re: [C++ PATCH] Fix value initialized decltype(nullptr) in constexpr (PR c++/85553)

2018-05-04 Thread Paolo Carlini

Hi all, Jason,

On 29/04/2018 09:23, Paolo Carlini wrote:

Hi,

On 28/04/2018 18:41, Jason Merrill wrote:
On Fri, Apr 27, 2018 at 7:26 PM, Paolo Carlini 
 wrote:

Hi again,

I'm now pretty sure that we have a latent issue in ocp_convert. The bug
fixed by Jakub shows that we used to not have issues with integer_zero_node.
That's easy to explain: at the beginning of ocp_convert there is code which
handles first some special / simple cases when
same_type_ignoring_top_level_qualifiers_p is true. That code isn't of course
used for integer_zero_node as source expression, which therefore is handled
by:

   if (NULLPTR_TYPE_P (type) && e && null_ptr_cst_p (e))
 {
   if (complain & tf_warning)
 maybe_warn_zero_as_null_pointer_constant (e, loc);
   return nullptr_node;
 }

Maybe we should move this code up, then.
You are totally right. Yesterday I realized that and tested on 
x86_64-linux the below, both with and without Jakub's fix.

In trunk shall I go ahead with this, then?

    https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01290.html

Thanks!
Paolo.


[PATCH] PR libstdc++/85642 fix is_nothrow_default_constructible<optional<T>>

2018-05-04 Thread Jonathan Wakely

Add missing noexcept keyword to default constructor of each
_Optional_payload specialization.

PR libstdc++/85642 fix is_nothrow_default_constructible<optional<T>>
* include/std/optional (_Optional_payload): Add noexcept to default
constructor. Re-indent.
(_Optional_payload<_Tp, true, true, true>): Likewise. Add noexcept to
constructor for copying disengaged payloads.
(_Optional_payload<_Tp, true, false, true>): Likewise.
(_Optional_payload<_Tp, true, true, false>): Likewise.
(_Optional_payload<_Tp, true, false, false>): Likewise.
* testsuite/20_util/optional/cons/85642.cc: New.
* testsuite/20_util/optional/cons/value_neg.cc: Adjust dg-error lines.

I've also re-indented the _Optional_payload code to ensure that
everything after a template-head is indented and a ctor-initializer is
lined up with the start of the constructor decl (either the class name
or the 'constexpr' keyword).

Tested powerpc64le-linux, committed to trunk. Will backport to gcc-8
too.

commit b49a999a658a2073944c462be7b13a16bddb6daa
Author: Jonathan Wakely 
Date:   Fri May 4 09:22:17 2018 +0100

PR libstdc++/85642 fix is_nothrow_default_constructible<optional<T>>

Add missing noexcept keyword to default constructor of each
_Optional_payload specialization.

PR libstdc++/85642 fix is_nothrow_default_constructible<optional<T>>
* include/std/optional (_Optional_payload): Add noexcept to default
constructor. Re-indent.
(_Optional_payload<_Tp, true, true, true>): Likewise. Add noexcept to
constructor for copying disengaged payloads.
(_Optional_payload<_Tp, true, false, true>): Likewise.
(_Optional_payload<_Tp, true, true, false>): Likewise.
(_Optional_payload<_Tp, true, false, false>): Likewise.
* testsuite/20_util/optional/cons/85642.cc: New.
* testsuite/20_util/optional/cons/value_neg.cc: Adjust dg-error lines.

diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index 0aa20dd9437..746ee2fd87e 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -82,8 +82,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
   public:
 bad_optional_access() { }
+
 virtual const char* what() const noexcept override
-{return "bad optional access";}
+{ return "bad optional access"; }
 
 virtual ~bad_optional_access() noexcept = default;
   };
@@ -108,36 +109,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  is_trivially_move_assignable<_Tp>::value>
 struct _Optional_payload
 {
-  constexpr _Optional_payload()
-   : _M_empty() {}
+  constexpr _Optional_payload() noexcept : _M_empty() { }
 
   template <typename... _Args>
-  constexpr _Optional_payload(in_place_t, _Args&&... __args)
-   : _M_payload(std::forward<_Args>(__args)...),
- _M_engaged(true) {}
+   constexpr
+   _Optional_payload(in_place_t, _Args&&... __args)
+   : _M_payload(std::forward<_Args>(__args)...), _M_engaged(true) { }
 
   template<typename _Up, typename... _Args>
-  constexpr _Optional_payload(std::initializer_list<_Up> __il,
- _Args&&... __args)
+   constexpr
+   _Optional_payload(std::initializer_list<_Up> __il,
+ _Args&&... __args)
: _M_payload(__il, std::forward<_Args>(__args)...),
- _M_engaged(true) {}
+ _M_engaged(true)
+   { }
+
   constexpr
   _Optional_payload(bool __engaged, const _Optional_payload& __other)
-   : _Optional_payload(__other)
-  {}
+  : _Optional_payload(__other)
+  { }
 
   constexpr
   _Optional_payload(bool __engaged, _Optional_payload&& __other)
-   : _Optional_payload(std::move(__other))
-  {}
+  : _Optional_payload(std::move(__other))
+  { }
 
-  constexpr _Optional_payload(const _Optional_payload& __other)
+  constexpr
+  _Optional_payload(const _Optional_payload& __other)
   {
if (__other._M_engaged)
  this->_M_construct(__other._M_payload);
   }
 
-  constexpr _Optional_payload(_Optional_payload&& __other)
+  constexpr
+  _Optional_payload(_Optional_payload&& __other)
   {
if (__other._M_engaged)
  this->_M_construct(std::move(__other._M_payload));
@@ -176,7 +181,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   using _Stored_type = remove_const_t<_Tp>;
+
   struct _Empty_byte { };
+
   union {
   _Empty_byte _M_empty;
   _Stored_type _M_payload;
@@ -201,16 +208,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // The _M_get operations have _M_engaged as a precondition.
   constexpr _Tp&
-   _M_get() noexcept
-  {
-   return this->_M_payload;
-  }
+  _M_get() noexcept
+  { return this->_M_payload; }
 
   constexpr const _Tp&
-   _M_get() const noexcept
-  {
-   return this->_M_payload;
-  }
+  _M_get() 

[PATCH] rs6000: Remove Xilinx FP

2018-05-04 Thread Segher Boessenkool
This removes the special Xilinx FP support.  It was deprecated in
GCC 8.

After this patch all of TARGET_{DOUBLE,SINGLE}_FLOAT,
TARGET_{DF,SF}_INSN, and TARGET_{DF,SF}_FPR are replaced by
TARGET_HARD_FLOAT.  Also the fp_type attribute is deleted.

Tested on powerpc64-linux {-m32,-m64}.  Committing.


Segher


2018-05-04  Segher Boessenkool  

* common/config/rs6000/rs6000-common.c (rs6000_handle_option): Remove
Xilinx FP support.
* config.gcc (powerpc-xilinx-eabi*): Remove.
* config/rs6000/predicates.md (easy_fp_constant): Remove Xilinx FP
support.
(fusion_addis_mem_combo_load): Ditto.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Remove Xilinx
FP support.
(rs6000_cpu_cpp_builtins): Ditto.
* config/rs6000/rs6000-linux.c
(rs6000_linux_float_exceptions_rounding_supported_p): Ditto.
* config/rs6000/rs6000-opts.h (enum fpu_type_t): Delete.
* config/rs6000/rs6000.c (rs6000_debug_reg_global): Remove Xilinx FP
support.
(rs6000_setup_reg_addr_masks): Ditto.
(rs6000_init_hard_regno_mode_ok): Ditto.
(rs6000_option_override_internal): Ditto.
(legitimate_lo_sum_address_p): Ditto.
(rs6000_legitimize_address): Ditto.
(rs6000_legitimize_reload_address): Ditto.
(rs6000_legitimate_address_p): Ditto.
(abi_v4_pass_in_fpr): Ditto.
(setup_incoming_varargs): Ditto.
(rs6000_gimplify_va_arg): Ditto.
(rs6000_split_multireg_move): Ditto.
(rs6000_savres_strategy): Ditto.
(rs6000_emit_prologue_components): Ditto.
(rs6000_emit_epilogue_components): Ditto.
(rs6000_emit_prologue): Ditto.
(rs6000_emit_epilogue): Ditto.
(rs6000_elf_file_end): Ditto.
(rs6000_function_value): Ditto.
(rs6000_libcall_value): Ditto.
* config/rs6000/rs6000.h: Ditto.
(TARGET_MINMAX_SF, TARGET_MINMAX_DF): Delete, merge to ...
(TARGET_MINMAX): ... this.  New.
(TARGET_SF_FPR, TARGET_DF_FPR, TARGET_SF_INSN, TARGET_DF_INSN): Delete.
* config/rs6000/rs6000.md: Remove Xilinx FP support.
(*movsi_internal1_single): Delete.
* config/rs6000/rs6000.opt (msingle-float, mdouble-float, msimple-fpu,
mfpu=, mxilinx-fpu): Delete.
* config/rs6000/singlefp.h: Delete.
* config/rs6000/sysv4.h: Remove Xilinx FP support.
* config/rs6000/t-rs6000: Ditto.
* config/rs6000/t-xilinx: Delete.
* config/rs6000/titan.md: Adjust for fp_type removal.
* config/rs6000/vsx.md: Remove Xilinx FP support.
(VStype_simple): Delete.
(VSfptype_simple, VSfptype_mul, VSfptype_div, VSfptype_sqrt): Delete.
* config/rs6000/xfpu.h: Delete.
* config/rs6000/xfpu.md: Delete.
* config/rs6000/xilinx.h: Delete.
* config/rs6000/xilinx.opt: Delete.
* doc/invoke.texi (RS/6000 and PowerPC Options): Remove
-msingle-float, -mdouble-float, -msimple-fpu, -mfpu=, and -mxilinx-fpu.

---
 gcc/common/config/rs6000/rs6000-common.c |  58 --
 gcc/config.gcc   |   6 -
 gcc/config/rs6000/predicates.md  |   8 +-
 gcc/config/rs6000/rs6000-c.c |  25 +--
 gcc/config/rs6000/rs6000-linux.c |   2 +-
 gcc/config/rs6000/rs6000-opts.h  |  11 --
 gcc/config/rs6000/rs6000.c   | 108 
 gcc/config/rs6000/rs6000.h   |  44 +
 gcc/config/rs6000/rs6000.md  | 294 ---
 gcc/config/rs6000/rs6000.opt |  38 
 gcc/config/rs6000/singlefp.h |  40 -
 gcc/config/rs6000/sysv4.h|   3 -
 gcc/config/rs6000/t-rs6000   |   1 -
 gcc/config/rs6000/t-xilinx   |  28 ---
 gcc/config/rs6000/titan.md   |   5 -
 gcc/config/rs6000/vsx.md | 108 
 gcc/config/rs6000/xfpu.h |  26 ---
 gcc/config/rs6000/xfpu.md| 140 ---
 gcc/config/rs6000/xilinx.h   |  47 -
 gcc/config/rs6000/xilinx.opt |  32 
 gcc/doc/invoke.texi  |  29 +--
 21 files changed, 197 insertions(+), 856 deletions(-)
 delete mode 100644 gcc/config/rs6000/singlefp.h
 delete mode 100644 gcc/config/rs6000/t-xilinx
 delete mode 100644 gcc/config/rs6000/xfpu.h
 delete mode 100644 gcc/config/rs6000/xfpu.md
 delete mode 100644 gcc/config/rs6000/xilinx.h
 delete mode 100644 gcc/config/rs6000/xilinx.opt

diff --git a/gcc/common/config/rs6000/rs6000-common.c b/gcc/common/config/rs6000/rs6000-common.c
index ed348f5..c4e77a2 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -83,7 +83,6 @@ rs6000_handle_option (struct gcc_options *opts, struct gcc_options *opts_set,
  const struct cl_decoded_option *decoded,
  location_t loc)
 {
-  enum fpu

libffi PowerPC64 ELFv1 fp arg fixes

2018-05-04 Thread Alan Modra
The attached patch has been accepted into upstream libffi.  It fixes
powerpc64-linux problems shown up by Bruno Haible's new libffi
testsuite tests.  Bootstrapped and regression tested powerpc64-linux
and powerpc64le-linux.  OK mainline and active branches?

-- 
Alan Modra
Australia Development Lab, IBM
From a3b6c9db017d3f142031636a9dd6088c5478ca28 Mon Sep 17 00:00:00 2001
From: Alan Modra 
Date: Wed, 2 May 2018 19:10:53 +0930
Subject: [PATCH] libffi PowerPC64 ELFv1 fp arg fixes

The ELFv1 ABI says: "Single precision floating point values are mapped
to the second word in a single doubleword" and also "Floating point
registers f1 through f13 are used consecutively to pass up to 13
floating point values, one member aggregates passed by value
containing a floating point value, and to pass complex floating point
values".

libffi wasn't expecting float args in the second word, and wasn't
passing one member aggregates in fp registers.  This patch fixes those
problems, making use of the existing ELFv2 homogeneous aggregate
support since a one element fp struct is a special case of a
homogeneous aggregate.
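
As a rough illustration of the "second word" placement, here is a
stand-alone sketch.  The function store_float_in_slot is hypothetical and
is not libffi code; it assumes a 4-byte float and an 8-byte parameter slot,
and it zeroes the unused word only to make the result deterministic.

```c
#include <string.h>

/* Store VALUE in an 8-byte parameter slot.  When BIG_ENDIAN_SLOT is
   nonzero the float goes in the second word (bytes 4..7), matching the
   ELFv1 text quoted above; otherwise it goes in the first word
   (bytes 0..3).  */
void
store_float_in_slot (unsigned char slot[8], float value, int big_endian_slot)
{
  memset (slot, 0, 8);
  memcpy (slot + (big_endian_slot ? 4 : 0), &value, sizeof value);
}
```

This corresponds to the choice between next_arg.f[1] and next_arg.f[0] in
the patch below.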

I've also set a flag when returning pointers that might be used one
day.  This is just a tidy-up since the ppc64 assembly support code
currently doesn't test FLAG_RETURNS_64BITS for integer types.

	* src/powerpc/ffi_linux64.c (discover_homogeneous_aggregate):
	Compile for ELFv1 too, handling single element aggregates.
	(ffi_prep_cif_linux64_core): Call discover_homogeneous_aggregate
	for ELFv1.  Set FLAG_RETURNS_64BITS for FFI_TYPE_POINTER return.
	(ffi_prep_args64): Call discover_homogeneous_aggregate for ELFv1,
	and handle single element structs containing float or double
	as if the element wasn't wrapped in a struct.  Store floats in
	second word of doubleword slot when big-endian.
	(ffi_closure_helper_LINUX64): Similarly.

diff --git a/libffi/src/powerpc/ffi_linux64.c b/libffi/src/powerpc/ffi_linux64.c
index b84b91fb237..ef0361b24ee 100644
--- a/libffi/src/powerpc/ffi_linux64.c
+++ b/libffi/src/powerpc/ffi_linux64.c
@@ -62,7 +62,6 @@ ffi_prep_types_linux64 (ffi_abi abi)
 #endif
 
 
-#if _CALL_ELF == 2
 static unsigned int
 discover_homogeneous_aggregate (const ffi_type *t, unsigned int *elnum)
 {
@@ -86,8 +85,13 @@ discover_homogeneous_aggregate (const ffi_type *t, unsigned int *elnum)
 	  return 0;
 	base_elt = el_elt;
 	total_elnum += el_elnum;
+#if _CALL_ELF == 2
 	if (total_elnum > 8)
 	  return 0;
+#else
+	if (total_elnum > 1)
+	  return 0;
+#endif
 	el++;
 	  }
 	*elnum = total_elnum;
@@ -98,7 +102,6 @@ discover_homogeneous_aggregate (const ffi_type *t, unsigned int *elnum)
   return 0;
 }
 }
-#endif
 
 
 /* Perform machine dependent cif processing */
@@ -109,9 +112,7 @@ ffi_prep_cif_linux64_core (ffi_cif *cif)
   unsigned bytes;
   unsigned i, fparg_count = 0, intarg_count = 0;
   unsigned flags = cif->flags;
-#if _CALL_ELF == 2
   unsigned int elt, elnum;
-#endif
 
 #if FFI_TYPE_LONGDOUBLE == FFI_TYPE_DOUBLE
   /* If compiled without long double support..  */
@@ -157,6 +158,7 @@ ffi_prep_cif_linux64_core (ffi_cif *cif)
   /* Fall through.  */
 case FFI_TYPE_UINT64:
 case FFI_TYPE_SINT64:
+case FFI_TYPE_POINTER:
   flags |= FLAG_RETURNS_64BITS;
   break;
 
@@ -222,7 +224,6 @@ ffi_prep_cif_linux64_core (ffi_cif *cif)
 		intarg_count = ALIGN (intarg_count, align);
 	}
 	  intarg_count += ((*ptr)->size + 7) / 8;
-#if _CALL_ELF == 2
 	  elt = discover_homogeneous_aggregate (*ptr, &elnum);
 	  if (elt)
 	{
@@ -231,7 +232,6 @@ ffi_prep_cif_linux64_core (ffi_cif *cif)
 		flags |= FLAG_ARG_NEEDS_PSAVE;
 	}
 	  else
-#endif
 	{
 	  if (intarg_count > NUM_GPR_ARG_REGISTERS64)
 		flags |= FLAG_ARG_NEEDS_PSAVE;
@@ -449,9 +449,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long *const stack)
i < nargs;
i++, ptr++, p_argv.v++)
 {
-#if _CALL_ELF == 2
   unsigned int elt, elnum;
-#endif
 
   switch ((*ptr)->type)
 	{
@@ -494,6 +492,7 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long *const stack)
 	  /* Fall through.  */
 #endif
 	case FFI_TYPE_DOUBLE:
+	do_double:
 	  double_tmp = **p_argv.d;
 	  if (fparg_count < NUM_FPR_ARG_REGISTERS64 && i < nfixedargs)
 	{
@@ -512,17 +511,30 @@ ffi_prep_args64 (extended_cif *ecif, unsigned long *const stack)
 	  break;
 
 	case FFI_TYPE_FLOAT:
+	do_float:
 	  double_tmp = **p_argv.f;
 	  if (fparg_count < NUM_FPR_ARG_REGISTERS64 && i < nfixedargs)
 	{
 	  *fpr_base.d++ = double_tmp;
 #if _CALL_ELF != 2
 	  if ((flags & FLAG_COMPAT) != 0)
-		*next_arg.f = (float) double_tmp;
+		{
+# ifndef __LITTLE_ENDIAN__
+		  next_arg.f[1] = (float) double_tmp;
+# else
+		  next_arg.f[0] = (float) double_tmp;
+# endif
+		}
 #endif
 	}
 	  else
-	*next_arg.f = (float) double_tmp;
+	{
+# ifndef __LITTLE_ENDIAN__
+	  next_arg.f[1] = (float) double_tmp;
+# else
+	  next_arg.f[0] = (float) double_tmp;
+# endif
+	}
 	  if (++next_arg.ul == 

[gomp5] omp_lock_hint_t -> omp_sync_hint_t

2018-05-04 Thread Jakub Jelinek
Hi!

Just renaming with keeping the old enumerators/typedefs for backwards
compatibility.

Committed to gomp-5_0-branch.
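
A reduced, stand-alone sketch of the aliasing scheme, for readers who want
to see why old code keeps compiling.  The enum mirrors the patch below but
is declared locally rather than taken from omp.h, and hint_is_speculative
is a hypothetical helper used only for demonstration.

```c
/* New names carry the values; old names are aliases of the new ones.  */
typedef enum omp_sync_hint_t
{
  omp_sync_hint_none = 0,
  omp_lock_hint_none = omp_sync_hint_none,
  omp_sync_hint_uncontended = 1,
  omp_lock_hint_uncontended = omp_sync_hint_uncontended,
  omp_sync_hint_contended = 2,
  omp_lock_hint_contended = omp_sync_hint_contended,
  omp_sync_hint_nonspeculative = 4,
  omp_lock_hint_nonspeculative = omp_sync_hint_nonspeculative,
  omp_sync_hint_speculative = 8,
  omp_lock_hint_speculative = omp_sync_hint_speculative
} omp_sync_hint_t;

/* The old type name remains usable.  */
typedef omp_sync_hint_t omp_lock_hint_t;

/* Hypothetical consumer declared with the new type; callers may still
   pass values spelled with the old names.  */
int
hint_is_speculative (omp_sync_hint_t hint)
{
  return (hint & omp_sync_hint_speculative) != 0;
}
```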

2018-05-04  Jakub Jelinek  

* omp.h.in (enum omp_lock_hint_t): Renamed to ...
(enum omp_sync_hint_t): ... this.  Define omp_sync_hint_*
enumerators using numbers and omp_lock_hint_* as their aliases.
(omp_lock_hint_t): New typedef.  Rename to ...
(omp_sync_hint_t): ... this.
(omp_init_lock_with_hint, omp_init_nest_lock_with_hint): Use
omp_sync_hint_t instead of omp_lock_hint_t.

--- libgomp/omp.h.in.jj 2018-04-30 13:21:05.925866091 +0200
+++ libgomp/omp.h.in	2018-05-04 12:31:28.863633774 +0200
@@ -62,14 +62,21 @@ typedef enum omp_proc_bind_t
   omp_proc_bind_spread = 4
 } omp_proc_bind_t;
 
-typedef enum omp_lock_hint_t
+typedef enum omp_sync_hint_t
 {
-  omp_lock_hint_none = 0,
-  omp_lock_hint_uncontended = 1,
-  omp_lock_hint_contended = 2,
-  omp_lock_hint_nonspeculative = 4,
-  omp_lock_hint_speculative = 8,
-} omp_lock_hint_t;
+  omp_sync_hint_none = 0,
+  omp_lock_hint_none = omp_sync_hint_none,
+  omp_sync_hint_uncontended = 1,
+  omp_lock_hint_uncontended = omp_sync_hint_uncontended,
+  omp_sync_hint_contended = 2,
+  omp_lock_hint_contended = omp_sync_hint_contended,
+  omp_sync_hint_nonspeculative = 4,
+  omp_lock_hint_nonspeculative = omp_sync_hint_nonspeculative,
+  omp_sync_hint_speculative = 8,
+  omp_lock_hint_speculative = omp_sync_hint_speculative
+} omp_sync_hint_t;
+
+typedef omp_sync_hint_t omp_lock_hint_t;
 
 #ifdef __cplusplus
 extern "C" {
@@ -93,7 +100,7 @@ extern void omp_set_nested (int) __GOMP_
 extern int omp_get_nested (void) __GOMP_NOTHROW;
 
 extern void omp_init_lock (omp_lock_t *) __GOMP_NOTHROW;
-extern void omp_init_lock_with_hint (omp_lock_t *, omp_lock_hint_t)
+extern void omp_init_lock_with_hint (omp_lock_t *, omp_sync_hint_t)
   __GOMP_NOTHROW;
 extern void omp_destroy_lock (omp_lock_t *) __GOMP_NOTHROW;
 extern void omp_set_lock (omp_lock_t *) __GOMP_NOTHROW;
@@ -101,7 +108,7 @@ extern void omp_unset_lock (omp_lock_t *
 extern int omp_test_lock (omp_lock_t *) __GOMP_NOTHROW;
 
 extern void omp_init_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW;
-extern void omp_init_nest_lock_with_hint (omp_nest_lock_t *, omp_lock_hint_t)
+extern void omp_init_nest_lock_with_hint (omp_nest_lock_t *, omp_sync_hint_t)
   __GOMP_NOTHROW;
 extern void omp_destroy_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW;
 extern void omp_set_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW;

Jakub
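
As an illustration of the renaming scheme in the patch above, here is a minimal sketch of keeping old enumerator and typedef names alive as aliases of the new ones. The my_* names are hypothetical stand-ins so that the snippet compiles without libgomp; only the pattern is taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the omp.h names.  Each old enumerator is
   defined in terms of the new one, so both spellings share a value.  */
typedef enum my_sync_hint_t
{
  my_sync_hint_none = 0,
  my_lock_hint_none = my_sync_hint_none,
  my_sync_hint_contended = 2,
  my_lock_hint_contended = my_sync_hint_contended
} my_sync_hint_t;

/* The old type name survives as a typedef of the new type.  */
typedef my_sync_hint_t my_lock_hint_t;

/* New-style interface that takes the renamed type.  */
static int
hint_is_contended (my_sync_hint_t hint)
{
  return (hint & my_sync_hint_contended) != 0;
}

/* Old-style caller: it still compiles and behaves the same.  */
static int
legacy_caller (void)
{
  my_lock_hint_t hint = my_lock_hint_contended;
  assert (my_lock_hint_contended == my_sync_hint_contended);
  return hint_is_contended (hint);
}
```

Code written against the old names needs no source changes, because the typedef and the alias enumerators resolve to the same type and values.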


Re: libffi PowerPC64 ELFv1 fp arg fixes

2018-05-04 Thread Segher Boessenkool
On Fri, May 04, 2018 at 07:40:20PM +0930, Alan Modra wrote:
> The attached patch has been accepted into upstream libffi.  It fixes
> powerpc64-linux problems shown up by Bruno Haible's new libffi
> testsuite tests.  Bootstrapped and regression tested powerpc64-linux
> and powerpc64le-linux.  OK mainline and active branches?

That looks fine, and since it is in upstream it counts as obvious I
think?

Is there a test for this, btw?


Segher


> From a3b6c9db017d3f142031636a9dd6088c5478ca28 Mon Sep 17 00:00:00 2001
> From: Alan Modra 
> Date: Wed, 2 May 2018 19:10:53 +0930
> Subject: [PATCH] libffi PowerPC64 ELFv1 fp arg fixes
> 
> The ELFv1 ABI says: "Single precision floating point values are mapped
> to the second word in a single doubleword" and also "Floating point
> registers f1 through f13 are used consecutively to pass up to 13
> floating point values, one member aggregates passed by value
> containing a floating point value, and to pass complex floating point
> values".
> 
> libffi wasn't expecting float args in the second word, and wasn't
> passing one member aggregates in fp registers.  This patch fixes those
> problems, making use of the existing ELFv2 homogeneous aggregate
> support since a one element fp struct is a special case of an
> homogeneous aggregate.
> 
> I've also set a flag when returning pointers that might be used one
> day.  This is just a tidy since the ppc64 assembly support code
> currently doesn't test FLAG_RETURNS_64BITS for integer types.
> 
>   * src/powerpc/ffi_linux64.c (discover_homogeneous_aggregate):
>   Compile for ELFv1 too, handling single element aggregates.
>   (ffi_prep_cif_linux64_core): Call discover_homogeneous_aggregate
>   for ELFv1.  Set FLAG_RETURNS_64BITS for FFI_TYPE_POINTER return.
>   (ffi_prep_args64): Call discover_homogeneous_aggregate for ELFv1,
>   and handle single element structs containing float or double
>   as if the element wasn't wrapped in a struct.  Store floats in
>   second word of doubleword slot when big-endian.
>   (ffi_closure_helper_LINUX64): Similarly.
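
The "second word of a doubleword" rule quoted above can be sketched as follows. This is only an illustration, not libffi code: the function names and the byte-array slot are assumptions. The float goes into the second 4-byte word when big-endian and into the first word otherwise, which mirrors the f[1] / f[0] choice in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store VALUE into an 8-byte parameter slot: in the second 4-byte word
   when BIG_ENDIAN_P is nonzero, otherwise in the first word.  The unused
   word is zeroed so the result is easy to inspect.  */
static void
store_float_in_slot (unsigned char slot[8], float value, int big_endian_p)
{
  size_t word = big_endian_p ? 1 : 0;

  assert (sizeof (float) == 4);
  memset (slot, 0, 8);
  memcpy (slot + 4 * word, &value, sizeof value);
}

/* Read the float back from the word that store_float_in_slot used.  */
static float
load_float_from_slot (const unsigned char slot[8], int big_endian_p)
{
  float value;
  size_t word = big_endian_p ? 1 : 0;

  memcpy (&value, slot + 4 * word, sizeof value);
  return value;
}
```

A callee that reads the wrong word of the slot sees zero bits instead of the argument, which is the kind of mismatch the patch description refers to.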


[PATCH] Fix memleaks

2018-05-04 Thread Richard Biener

The following fixes three memleaks I discovered when double-checking a
local patch.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-05-04  Richard Biener  

* bb-reorder.c (sanitize_hot_paths): Release hot_bbs_to_check.
* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Remove redundant create,
release split_store vector contents on failure.
* tree-vect-slp.c (vect_schedule_slp_instance): Avoid leaking
scalar stmt vector on cache hit.

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index d2b41606a14..bc08e11a81d 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -1572,6 +1572,7 @@ sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
   hot_bbs_to_check.safe_push (reach_bb);
 }
 }
+  hot_bbs_to_check.release ();
 
   return cold_bb_count;
 }
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 6f6538bf37e..2e1a6ef0e55 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -3343,6 +3343,8 @@ invert_op (split_store *split_store, int idx, tree int_type, tree &mask)
 bool
 imm_store_chain_info::output_merged_store (merged_store_group *group)
 {
+  split_store *split_store;
+  unsigned int i;
   unsigned HOST_WIDE_INT start_byte_pos
 = group->bitregion_start / BITS_PER_UNIT;
 
@@ -3351,7 +3353,6 @@ imm_store_chain_info::output_merged_store (merged_store_group *group)
 return false;
 
   auto_vec split_stores;
-  split_stores.create (0);
   bool allow_unaligned_store
 = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
   bool allow_unaligned_load = allow_unaligned_store;
@@ -3378,6 +3379,8 @@ imm_store_chain_info::output_merged_store (merged_store_group *group)
fprintf (dump_file, "Exceeded original number of stmts (%u)."
"  Not profitable to emit new sequence.\n",
 orig_num_stmts);
+  FOR_EACH_VEC_ELT (split_stores, i, split_store)
+   delete split_store;
   return false;
 }
   if (total_orig <= total_new)
@@ -3389,6 +3392,8 @@ imm_store_chain_info::output_merged_store (merged_store_group *group)
" not larger than estimated number of new"
" stmts (%u).\n",
 total_orig, total_new);
+  FOR_EACH_VEC_ELT (split_stores, i, split_store)
+   delete split_store;
   return false;
 }
 
@@ -3453,8 +3458,6 @@ imm_store_chain_info::output_merged_store (merged_store_group *group)
 }
 
   gimple *stmt = NULL;
-  split_store *split_store;
-  unsigned int i;
   auto_vec orig_stmts;
   gimple_seq this_seq;
   tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &this_seq,
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 73aa2271b53..d1703cfca43 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4064,15 +4086,15 @@ vect_schedule_slp_instance (slp_tree node, slp_instance instance,
 
   /* See if we have already vectorized the same set of stmts and reuse their
  vectorized stmts.  */
-  slp_tree &leader
-= bst_map->get_or_insert (SLP_TREE_SCALAR_STMTS (node).copy ());
-  if (leader)
+  if (slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node)))
 {
-  SLP_TREE_VEC_STMTS (node).safe_splice (SLP_TREE_VEC_STMTS (leader));
+  SLP_TREE_VEC_STMTS (node).safe_splice (SLP_TREE_VEC_STMTS (*leader));
+  SLP_TREE_NUMBER_OF_VEC_STMTS (node)
+   = SLP_TREE_NUMBER_OF_VEC_STMTS (*leader);
   return false;
 }
 
-  leader = node;
+  bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node);
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 vect_schedule_slp_instance (child, instance, bst_map);
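
The store-merging part of the fix above follows a common pattern: heap objects held by a container must be released on every early-return path, not only on the success path. Below is a rough C sketch of that pattern. It is not GCC code; the item type, the live_items counter and the budget check are hypothetical and only stand in for the split_store objects and the profitability test.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts objects that are currently allocated, so a leak is observable.  */
static int live_items;

struct item { int cost; };

static struct item *
item_new (int cost)
{
  struct item *it = malloc (sizeof *it);
  if (it != NULL)
    {
      it->cost = cost;
      live_items++;
    }
  return it;
}

static void
item_delete (struct item *it)
{
  free (it);
  live_items--;
}

/* Build one item per cost and report whether the total fits BUDGET.
   Every exit goes through the cleanup at DONE, so the "not profitable"
   path cannot leak the items it has already created.  */
static int
try_emit (const int *costs, size_t n, int budget)
{
  struct item **items = calloc (n ? n : 1, sizeof *items);
  size_t i, made = 0;
  int total = 0, ok = 0;

  assert (costs != NULL || n == 0);
  if (items == NULL)
    return 0;
  for (i = 0; i < n; i++)
    {
      items[i] = item_new (costs[i]);
      if (items[i] == NULL)
        goto done;
      made++;
      total += costs[i];
    }
  ok = total <= budget;
  /* A real caller would consume the items here when OK is set.  */
done:
  for (i = 0; i < made; i++)
    item_delete (items[i]);
  free (items);
  return ok;
}
```

Checking live_items after both a rejected and an accepted call is a cheap way to confirm that neither path leaks.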
 


[PATCH, i386]: AVX false dependencies fix

2018-05-04 Thread Nesterovskiy, Alexander
This is the same patch I posted a few days ago, a bit modified according to 
Uros' recommendation.

Patch fixes false dependencies for vmovss, vmovsd, vrcpss, vrsqrtss, vsqrtss 
and vsqrtsd instructions.
Tested on x86-64/Linux, no new test fails, some SPEC 2006/2017 performance 
gains.

2018-05-04  Alexander Nesterovskiy  

* config/i386/i386.md (*movsf_internal): AVX falsedep fix.
(*movdf_internal): Ditto.
(*rcpsf2_sse): Ditto.
(*rsqrtsf2_sse): Ditto.
(*sqrt2_sse): Ditto.

--
Alexander Nesterovskiy


avx_falsedep.patch
Description: avx_falsedep.patch


[og7, libgomp, openacc, nvptx, committed] Don't select too many workers

2018-05-04 Thread Tom de Vries

Hi,

On the og7 branch for Titan V, we run into this error message in 
testsuite polybench for testcases covariance and lu:

...
libgomp: The Nvidia accelerator has insufficient resources to launch 
'x$_omp_fn$0' with num_workers = 27 and vector_length = 32; recompile 
the program with 'num_workers = x and vector_length = y' on that 
offloaded region or '-fopenacc-dim=-:x:y' where x * y <= 768.

...

The problem here is that num_workers is chosen by libgomp, and instead 
of giving the error, it should reduce the num_workers.


Fixed by this patch.

Build x86_64 with nvptx accelerator, tested libgomp.

Committed to og7 branch.

Thanks,
- Tom
[libgomp, openacc, nvptx] Don't select too many workers

2018-05-04  Tom de Vries  

	PR libgomp/85649
	* plugin/plugin-nvptx.c (MIN, MAX): Redefine.
	(nvptx_exec): Choose num_workers such that device has sufficient
	resources.

---
 libgomp/plugin/plugin-nvptx.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3c00555..e4d87f5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -189,6 +189,12 @@ cuda_error (CUresult r)
   return desc;
 }
 
+/* From gcc/system.h.  */
+#undef MIN
+#undef MAX
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+#define MAX(X,Y) ((X) > (Y) ? (X) : (Y))
+
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -802,7 +808,8 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 {
   int vectors = dims[GOMP_DIM_VECTOR] > 0
 	? dims[GOMP_DIM_VECTOR] : warp_size;
-  int workers = threads_per_block / vectors;
+  int workers
+	= MIN (threads_per_block, targ_fn->max_threads_per_block) / vectors;
 
   for (i = 0; i != GOMP_DIM_MAX; i++)
 	if (!dims[i])
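
To make the arithmetic of the fix concrete, here is a small sketch of the clamped worker computation. The function and parameter names are illustrative. The value 864 is simply 27 * 32 taken from the error message; that the default threads_per_block was exactly 864 is an assumption.

```c
#include <assert.h>

/* Smaller of two ints; plays the role of the MIN macro in the patch.  */
static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Pick a worker count so that workers * vectors does not exceed what the
   kernel can launch with (MAX_THREADS_PER_BLOCK).  */
static int
pick_workers (int threads_per_block, int max_threads_per_block, int vectors)
{
  assert (vectors > 0);
  return min_int (threads_per_block, max_threads_per_block) / vectors;
}
```

With a limit of 768 threads and vector_length 32, pick_workers (864, 768, 32) gives 24 workers, and 24 * 32 = 768 satisfies the x * y <= 768 constraint from the message.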


New Ukrainian PO file for 'gcc' (version 8.1.0)

2018-05-04 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

http://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-8.1.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: libffi PowerPC64 ELFv1 fp arg fixes

2018-05-04 Thread Alan Modra
On Fri, May 04, 2018 at 06:02:27AM -0500, Segher Boessenkool wrote:
> On Fri, May 04, 2018 at 07:40:20PM +0930, Alan Modra wrote:
> > The attached patch has been accepted into upstream libffi.  It fixes
> > powerpc64-linux problems shown up by Bruno Haible's new libffi
> > testsuite tests.  Bootstrapped and regression tested powerpc64-linux
> > and powerpc64le-linux.  OK mainline and active branches?
> 
> That looks fine, and since it is in upstream it counts as obvious I
> think?
> 
> Is there a test for this, btw?

https://github.com/libffi/libffi/tree/master/testsuite/libffi.bhaible
I'm guessing this could be imported but I'm not sure of the politics.

powerpc64 failed test-call.c DGTEST=12, DGTEST=13, DGTEST=55,
DGTEST=56, DGTEST=57 and DGTEST=68, and corresponding tests in
test-callback.c.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH 0/8] [BRIGFE] various fixes and optimizations

2018-05-04 Thread Pekka Jääskeläinen

Hi,

I'm posting a series of patches that I will shortly commit to trunk
as a maintainer of the BRIG frontend. All of them are build tested,
the last one also with a bootstrap build.

Best regards,
--
Pekka


[PATCH 1/8] [BRIGFE] fix an alloca stack underflow

2018-05-04 Thread Pekka Jääskeläinen

We didn't reserve additional space for the alloca frame pointers that
need to be saved in the alloca space.

Fixes libgomp.c++/target-6.C execution test.
---
 libhsail-rt/rt/workitems.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/libhsail-rt/rt/workitems.c b/libhsail-rt/rt/workitems.c
index 39daf27..36c9169 100644
--- a/libhsail-rt/rt/workitems.c
+++ b/libhsail-rt/rt/workitems.c
@@ -63,6 +63,12 @@ static clock_t start_time;
 #define FIBER_STACK_SIZE (64*1024)
 #define GROUP_SEGMENT_ALIGN 256
 
+/* Preserve this amount of additional space in the alloca stack as we need to
+   store the alloca frame pointer to the alloca frame, thus must preserve
+   space for it.  This thus supports at most 1024 functions with allocas in
+   a call chain.  */
+#define ALLOCA_OVERHEAD 1024*4
+
 uint32_t __hsail_workitemabsid (uint32_t dim, PHSAWorkItem *context);
 
 uint32_t __hsail_workitemid (uint32_t dim, PHSAWorkItem *context);
@@ -246,7 +252,7 @@ phsa_execute_wi_gang (PHSAKernelLaunchData *context, void *group_base_ptr,
   != 0)
 phsa_fatal_error (3);
 
-  wg.alloca_stack_p = wg.private_segment_total_size;
+  wg.alloca_stack_p = wg.private_segment_total_size + ALLOCA_OVERHEAD;
   wg.alloca_frame_p = wg.alloca_stack_p;
   wg.initial_group_offset = group_local_offset;
 
@@ -446,7 +452,7 @@ phsa_execute_work_groups (PHSAKernelLaunchData *context, void *group_base_ptr,
   != 0)
 phsa_fatal_error (3);
 
-  wg.alloca_stack_p = dp->private_segment_size * wg_size;
+  wg.alloca_stack_p = dp->private_segment_size * wg_size + ALLOCA_OVERHEAD;
   wg.alloca_frame_p = wg.alloca_stack_p;
 
   wg.private_base_ptr = private_base_ptr;
@@ -867,9 +873,12 @@ uint32_t
 __hsail_alloca (uint32_t size, uint32_t align, PHSAWorkItem *wi)
 {
   volatile PHSAWorkGroup *wg = wi->wg;
-  uint32_t new_pos = wg->alloca_stack_p - size;
+  int64_t new_pos = wg->alloca_stack_p - size;
   while (new_pos % align != 0)
 new_pos--;
+  if (new_pos < 0)
+phsa_fatal_error (2);
+
   wg->alloca_stack_p = new_pos;
 
 #ifdef DEBUG_ALLOCA
--
2.7.4
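
For readers who want to see the underflow check in isolation, here is a hedged sketch of a downward-growing bump allocation with the same "align down, then fail if negative" idea. The function name and the -1 error return are hypothetical. The sketch assumes 32-bit unsigned inputs and widens them explicitly before subtracting, since a difference computed purely in 32-bit unsigned arithmetic would wrap around instead of going negative; whether alloca_stack_p has such a type is not visible in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Move a downward-growing stack position by SIZE bytes and round the
   result down to a multiple of ALIGN.  Returns the new position, or -1
   if the request does not fit.  */
static int64_t
bump_down (uint32_t stack_p, uint32_t size, uint32_t align)
{
  int64_t new_pos;

  assert (align != 0);
  /* Widen before subtracting so that a too-large SIZE yields a negative
     value rather than a wrapped-around 32-bit one.  */
  new_pos = (int64_t) stack_p - (int64_t) size;
  if (new_pos < 0)
    return -1;
  /* Rounding a non-negative value down keeps it non-negative.  */
  new_pos -= new_pos % (int64_t) align;
  return new_pos;
}
```

For example, bump_down (100, 10, 16) yields 80, while bump_down (8, 16, 4) reports the underflow.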





[PATCH 2/8] [BRIGFE] Enable whole program optimizations

2018-05-04 Thread Pekka Jääskeläinen

HSA assumes all program scope HSAIL symbols can be queried from
the host runtime API, thus cannot be removed by the IPA.

Getting some inlining happening in the finalized binary required:
* explicitly marking the 'prog' scope functions and the launcher
function "externally_visible" to avoid the inliner removing it
* also the host_def ptr is set to externally visible, otherwise
IPA assumes it's never set
* adding the 'inline' keyword to functions to enable inlining,
otherwise GCC defaults to replaceable functions (one can link
over the previous one) which cannot be inlined
* replacing all calls to declarations with calls to definitions to
enable the inliner to find the definition
* to fix missing hidden argument types in the generated functions.
These were ignored silently until GCC started to be able to
inline calls to such functions.
* do not gimplify before fixing the call targets. Otherwise the
calls get detached and the definitions are not found. The reason
why this happens is not clear, but gimplifying only after call
target decl->def conversion fixes this.
---
 gcc/brig/ChangeLog| 11 +++
 gcc/brig/brig-lang.c  |  4 +-
 gcc/brig/brigfrontend/brig-branch-inst-handler.cc |  2 +
 gcc/brig/brigfrontend/brig-function-handler.cc| 30 ++---
 gcc/brig/brigfrontend/brig-function.cc|  4 +-
 gcc/brig/brigfrontend/brig-to-generic.cc  | 82 ++-
 gcc/brig/brigfrontend/brig-to-generic.h   |  8 +++
 gcc/brig/brigfrontend/brig-variable-handler.cc|  3 +
 8 files changed, 130 insertions(+), 14 deletions(-)

From 6a6a7c1913052f67f92743bc7591d5fc431f02e3 Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Sat, 27 Jan 2018 10:36:42 +0100
Subject: [PATCH 2/8] [BRIGFE] Enable whole program optimizations

HSA assumes all program scope HSAIL symbols can be queried from
the host runtime API, thus cannot be removed by the IPA.

Getting some inlining happening in the finalized binary required:
* explicitly marking the 'prog' scope functions and the launcher
function "externally_visible" to avoid the inliner removing it
* also the host_def ptr is set to externally visible, otherwise
IPA assumes it's never set
* adding the 'inline' keyword to functions to enable inlining,
otherwise GCC defaults to replaceable functions (one can link
over the previous one) which cannot be inlined
* replacing all calls to declarations with calls to definitions to
enable the inliner to find the definition
* to fix missing hidden argument types in the generated functions.
These were ignored silently until GCC started to be able to
inline calls to such functions.
* do not gimplify before fixing the call targets. Otherwise the
calls get detached and the definitions are not found. The reason
why this happens is not clear, but gimplifying only after call
target decl->def conversion fixes this.
---
 gcc/brig/ChangeLog| 11 +++
 gcc/brig/brig-lang.c  |  4 +-
 gcc/brig/brigfrontend/brig-branch-inst-handler.cc |  2 +
 gcc/brig/brigfrontend/brig-function-handler.cc| 30 ++---
 gcc/brig/brigfrontend/brig-function.cc|  4 +-
 gcc/brig/brigfrontend/brig-to-generic.cc  | 82 ++-
 gcc/brig/brigfrontend/brig-to-generic.h   |  8 +++
 gcc/brig/brigfrontend/brig-variable-handler.cc|  3 +
 8 files changed, 130 insertions(+), 14 deletions(-)

diff --git a/gcc/brig/ChangeLog b/gcc/brig/ChangeLog
index 7805b99..4abe773 100644
--- a/gcc/brig/ChangeLog
+++ b/gcc/brig/ChangeLog
@@ -1,3 +1,14 @@
+2018-xx-yy  Pekka Jääskeläinen  
+
+	* brig/brig-lang.c: Add support for whole program
+	optimizations by marking the kernels externally visible.
+	* brig/brigfrontend/brig-branch-inst-handler.cc: See above.
+	* brig/brigfrontend/brig-function-handler.cc: See above.
+	* brig/brigfrontend/brig-function.cc: See above.
+	* brig/brigfrontend/brig-to-generic.cc: See above.
+	* brig/brigfrontend/brig-to-generic.h: See above.
+	* brig/brigfrontend/brig-variable-handler.cc: See above.
+
 2018-01-03  Richard Sandiford  
 	Alan Hayward  
 	David Sherwood  
diff --git a/gcc/brig/brig-lang.c b/gcc/brig/brig-lang.c
index 997dad4..030d76a 100644
--- a/gcc/brig/brig-lang.c
+++ b/gcc/brig/brig-lang.c
@@ -57,7 +57,7 @@ static tree handle_pure_attribute (tree *, tree, tree, int, bool *);
 static tree handle_nothrow_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool *);
 
-/* This file is based on Go frontent'd go-lang.c and gogo-tree.cc.  */
+/* This file is based on Go frontend's go-lang.c and gogo-tree.cc.  */
 
 /* If -v set.  */
 
@@ -123,7 +123,7 @@ brig_langhook_init_options_struct (struct gcc_options *opts)
   /* If we set this to one, the whole program optimizations internalize
  all global variables, making them invisible to the dyn loader (and

[PATCH 3/8] [BRIGFE] The modulo in ID computation should not be needed.

2018-05-04 Thread Pekka Jääskeläinen

The case where a dim is greater than the grid size doesn't seem
to be mentioned in the specs nor tested by the PRM test suite.
---
 gcc/brig/brigfrontend/brig-code-entry-handler.cc | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

From 99303afff584518c1fd17e3c6ebe965043dd58f0 Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Tue, 27 Mar 2018 22:19:11 +0300
Subject: [PATCH 3/8] [BRIGFE] The modulo in ID computation should not be
 needed.

The case where a dim is greater than the grid size doesn't seem
to be mentioned in the specs nor tested by the PRM test suite.
---
 gcc/brig/brigfrontend/brig-code-entry-handler.cc | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/gcc/brig/brigfrontend/brig-code-entry-handler.cc b/gcc/brig/brigfrontend/brig-code-entry-handler.cc
index 54b53fd..36a8deb 100644
--- a/gcc/brig/brigfrontend/brig-code-entry-handler.cc
+++ b/gcc/brig/brigfrontend/brig-code-entry-handler.cc
@@ -1048,7 +1048,6 @@ brig_code_entry_handler::expand_builtin (BrigOpcode16_t brig_opcode,
   tree local_id_var = m_parent.m_cf->m_local_id_vars[dim];
   tree wg_id_var = m_parent.m_cf->m_wg_id_vars[dim];
   tree wg_size_var = m_parent.m_cf->m_wg_size_vars[dim];
-  tree grid_size_var = m_parent.m_cf->m_grid_size_vars[dim];
 
   tree wg_id_x_wg_size = build2 (MULT_EXPR, uint32_type_node,
  convert (uint32_type_node, wg_id_var),
@@ -1056,15 +1055,8 @@ brig_code_entry_handler::expand_builtin (BrigOpcode16_t brig_opcode,
   tree sum
 	= build2 (PLUS_EXPR, uint32_type_node, wg_id_x_wg_size, local_id_var);
 
-  /* We need a modulo here because of work-groups which have dimensions
-	 larger than the grid size :( TO CHECK: is this really allowed in the
-	 specs?  */
-  tree modulo
-	= build2 (TRUNC_MOD_EXPR, uint32_type_node, sum, grid_size_var);
-
   return add_temp_var (std::string ("workitemabsid_")
-			 + (char) ((int) 'x' + dim),
-			   modulo);
+			   + (char) ((int) 'x' + dim), sum);
 }
   else if (brig_opcode == BRIG_OPCODE_WORKITEMFLATID)
 {
-- 
2.7.4
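
As a plain-C illustration of the ID computation touched by this patch: the absolute work-item ID is the work-group ID times the work-group size plus the local ID, and the removed modulo only changes the result when that sum reaches the grid size. The function names below are hypothetical; the frontend builds the equivalent GENERIC trees rather than calling functions like these.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute ID without the modulo, as after the patch.  */
static uint32_t
workitem_abs_id (uint32_t wg_id, uint32_t wg_size, uint32_t local_id)
{
  assert (local_id < wg_size);
  return wg_id * wg_size + local_id;
}

/* Absolute ID with the modulo by the grid size, as before the patch.  */
static uint32_t
workitem_abs_id_mod (uint32_t wg_id, uint32_t wg_size, uint32_t local_id,
		     uint32_t grid_size)
{
  assert (grid_size != 0);
  return workitem_abs_id (wg_id, wg_size, local_id) % grid_size;
}
```

For work-group 3 of size 64 and local ID 5 in a grid of 256, both variants give 197; they differ only if the sum is at least the grid size.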



[PATCH 4/8] [BRIGFE] allow controlling strict aliasing from cmd line

2018-05-04 Thread Pekka Jääskeläinen

---
 gcc/brig/brig-lang.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)


From c8a86773e877949fb9308d2dd448ea013be22c3e Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Mon, 12 Feb 2018 11:34:58 +0200
Subject: [PATCH 4/8] [BRIGFE] allow controlling strict aliasing from cmd line

---
 gcc/brig/brig-lang.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/brig/brig-lang.c b/gcc/brig/brig-lang.c
index 030d76a..58b98fd 100644
--- a/gcc/brig/brig-lang.c
+++ b/gcc/brig/brig-lang.c
@@ -167,9 +167,15 @@ brig_langhook_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   if (flag_excess_precision_cmdline == EXCESS_PRECISION_DEFAULT)
 flag_excess_precision_cmdline = EXCESS_PRECISION_STANDARD;
 
-  /* gccbrig casts pointers around like crazy, TBAA produces
- broken code if not force disabling it.  */
-  flag_strict_aliasing = 0;
+  /* gccbrig casts pointers around like crazy, TBAA might produce broken
+ code if not disabling it by default.  Some PRM conformance tests such
+ as prm/core/memory/ordinary/ld/ld_u16 fail currently with strict
+ aliasing (to fix).  It can be enabled from the command line for cases
+ that are known not to break the C style aliasing requirements.  */
+  if (!global_options_set.x_flag_strict_aliasing)
+flag_strict_aliasing = 0;
+  else
+flag_strict_aliasing = global_options.x_flag_strict_aliasing;
 
   /* Returning false means that the backend should be used.  */
   return false;
-- 
2.7.4
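
The option handling in this patch boils down to "use a safe default unless the user said otherwise". A minimal sketch of that rule follows; the struct and function are hypothetical stand-ins for the global_options / global_options_set pair used in the diff.

```c
#include <assert.h>

/* Hypothetical model of one command-line flag: whether the user gave it
   explicitly, and the value it was given.  */
struct flag_state
{
  int set_by_user;
  int value;
};

/* Strict aliasing stays off by default and is only enabled when the user
   asked for it explicitly.  */
static int
effective_strict_aliasing (struct flag_state f)
{
  assert (f.value == 0 || f.value == 1);
  if (!f.set_by_user)
    return 0;
  return f.value;
}
```

With this rule, an optimization level that would normally turn the flag on has no effect by itself; only an explicit request on the command line does.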



[PATCH 5/8] [BRIGFE] do not allow optimizations based on known C builtins

2018-05-04 Thread Pekka Jääskeläinen

This can break inputs that have similarly named functions.
---
 gcc/brig/brig-lang.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

From 105277ee937482ee1a55265b1ec45637bc1e7a0b Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Sat, 17 Feb 2018 08:54:43 +0200
Subject: [PATCH 5/8] [BRIGFE] do not allow optimizations based on known C
 builtins

This can break inputs that have similarly named functions.
---
 gcc/brig/brig-lang.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/gcc/brig/brig-lang.c b/gcc/brig/brig-lang.c
index 58b98fd..3c4d4bd 100644
--- a/gcc/brig/brig-lang.c
+++ b/gcc/brig/brig-lang.c
@@ -136,6 +136,8 @@ brig_langhook_init_options_struct (struct gcc_options *opts)
   opts->x_flag_signed_zeros = 1;
 
   opts->x_optimize = 3;
+
+  flag_no_builtin = 1;
 }
 
 /* Handle Brig specific options.  Return 0 if we didn't do anything.  */
@@ -635,9 +637,11 @@ builtin_type_for_size (int size, bool unsignedp)
 
 static void
 def_builtin_1 (enum built_in_function fncode, const char *name,
-	   enum built_in_class fnclass, tree fntype, tree libtype,
-	   bool both_p, bool fallback_p, bool nonansi_p,
-	   tree fnattrs, bool implicit_p)
+	   enum built_in_class fnclass ATTRIBUTE_UNUSED,
+	   tree fntype, tree libtype ATTRIBUTE_UNUSED,
+	   bool both_p ATTRIBUTE_UNUSED, bool fallback_p,
+	   bool nonansi_p ATTRIBUTE_UNUSED, tree fnattrs,
+	   bool implicit_p)
 {
   tree decl;
   const char *libname;
@@ -650,12 +654,6 @@ def_builtin_1 (enum built_in_function fncode, const char *name,
 			   (fallback_p ? libname : NULL),
 			   fnattrs);
 
-  if (both_p
-  && !flag_no_builtin
-  && !(nonansi_p && flag_no_nonansi_builtin))
-add_builtin_function (libname, libtype, fncode, fnclass,
-			  NULL, fnattrs);
-
   set_builtin_decl (fncode, decl, implicit_p);
 }
 
-- 
2.7.4



[PATCH 6/8] [BRIGFE] skip multiple forward declarations of the same function

2018-05-04 Thread Pekka Jääskeläinen

---
 gcc/brig/brigfrontend/brig-function-handler.cc | 4 ++++
 1 file changed, 4 insertions(+)

From 1c708e887073960b6142d716d3e85f3453d7 Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Sat, 17 Feb 2018 08:59:25 +0200
Subject: [PATCH 6/8] [BRIGFE] skip multiple forward declarations of the same
 function

---
 gcc/brig/brigfrontend/brig-function-handler.cc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/brig/brigfrontend/brig-function-handler.cc b/gcc/brig/brigfrontend/brig-function-handler.cc
index c524dbe..d64135d 100644
--- a/gcc/brig/brigfrontend/brig-function-handler.cc
+++ b/gcc/brig/brigfrontend/brig-function-handler.cc
@@ -80,6 +80,10 @@ brig_directive_function_handler::operator () (const BrigBase *base)
   if (m_parent.m_analyzing)
 return bytes_consumed;
 
+  /* There can be multiple forward declarations of the same function.
+ Skip all but the first one.  */
+  if (!is_definition && m_parent.function_decl (func_name) != NULL_TREE)
+return bytes_consumed;
   tree fndecl;
   tree ret_value = NULL_TREE;
 
-- 
2.7.4



[PATCH 8/8] [BRIGFE] Fix handling of NOPs

2018-05-04 Thread Pekka Jääskeläinen

---
 gcc/brig/brigfrontend/brig-basic-inst-handler.cc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

From fee48e53063309a58a9a3050df26395ae1615111 Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Thu, 12 Oct 2017 15:55:11 +0200
Subject: [PATCH 8/8] [BRIGFE] Fix handling of NOPs

---
 gcc/brig/brigfrontend/brig-basic-inst-handler.cc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/brig/brigfrontend/brig-basic-inst-handler.cc b/gcc/brig/brigfrontend/brig-basic-inst-handler.cc
index c8224ae..75e1cfa 100644
--- a/gcc/brig/brigfrontend/brig-basic-inst-handler.cc
+++ b/gcc/brig/brigfrontend/brig-basic-inst-handler.cc
@@ -447,6 +447,8 @@ size_t
 brig_basic_inst_handler::operator () (const BrigBase *base)
 {
   const BrigInstBase *brig_inst = (const BrigInstBase *) base;
+  if (brig_inst->opcode == BRIG_OPCODE_NOP)
+return base->byteCount;
 
   tree_stl_vec operands = build_operands (*brig_inst);
 
@@ -466,11 +468,9 @@ brig_basic_inst_handler::operator () (const BrigBase *base)
 
   BrigType16_t brig_inst_type = brig_inst->type;
 
-  if (brig_inst->opcode == BRIG_OPCODE_NOP)
-return base->byteCount;
-  else if (brig_inst->opcode == BRIG_OPCODE_FIRSTBIT
-	   || brig_inst->opcode == BRIG_OPCODE_LASTBIT
-	   || brig_inst->opcode == BRIG_OPCODE_SAD)
+  if (brig_inst->opcode == BRIG_OPCODE_FIRSTBIT
+  || brig_inst->opcode == BRIG_OPCODE_LASTBIT
+  || brig_inst->opcode == BRIG_OPCODE_SAD)
 /* These instructions are reported to be always 32b in HSAIL, but we want
to treat them according to their input argument's type to select the
correct instruction/builtin.  */
-- 
2.7.4



[PATCH 7/8] [BRIGFE] phsa-specific optimizations

2018-05-04 Thread Pekka Jääskeläinen

Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
is given, these optimizations are disabled.

With this flag, gccbrig can generate GENERIC that assumes we are
targeting a phsa-runtime based implementation, which allows us
to expose the work-item context accesses to retrieve WI IDs etc.
which helps optimizers.

First optimization that takes advantage of this is to get rid of
the setworkitemid calls whenever we have non-inlined calls that
use IDs internally.

Other optimizations added in this commit:

- expand absoluteid to similar level of simplicity as workitemid.
At the moment absoluteid is the best indexing ID to end up with
WG vectorization.
- propagate ID variables closer to their uses. This is mainly
to avoid known useless casts, which confuse at least scalar
evolution analysis.
- use signed long long for storing IDs. Unsigned integers have
defined wraparound semantics, which confuse at least scalar
evolution analysis, leading to unvectorizable WI loops.
- also refactor some BRIG function generation helpers to brig_function.
- no point in having the wi-loop as a for-loop. It's really
a do...while and SCEV can analyze it just fine still.
- add consts to ptrs etc. in BRIG builtin defs.
Improves optimization opportunities.
- add qualifiers to generated function parameters.
Const and restrict on the hidden local/private pointers,
the arg buffer and the context pointer help some optimizations.
---
 gcc/brig-builtins.def  |  27 +-
 gcc/brig/brigfrontend/brig-basic-inst-handler.cc   | 172 +---
 gcc/brig/brigfrontend/brig-branch-inst-handler.cc  |  21 +-
 gcc/brig/brigfrontend/brig-cmp-inst-handler.cc |   6 +-
 gcc/brig/brigfrontend/brig-code-entry-handler.cc   | 503 +--
 gcc/brig/brigfrontend/brig-code-entry-handler.h|  21 -
 gcc/brig/brigfrontend/brig-control-handler.cc  |  20 +-
 gcc/brig/brigfrontend/brig-cvt-inst-handler.cc |   6 +
 gcc/brig/brigfrontend/brig-function-handler.cc |  89 +-
 gcc/brig/brigfrontend/brig-function.cc | 925 +++--
 gcc/brig/brigfrontend/brig-function.h  |  43 +
 gcc/brig/brigfrontend/brig-label-handler.cc|   3 +
 gcc/brig/brigfrontend/brig-lane-inst-handler.cc|   2 +-
 gcc/brig/brigfrontend/brig-mem-inst-handler.cc |   7 +-
 gcc/brig/brigfrontend/phsa.h   |   9 +
 gcc/brig/lang.opt  |   5 +
 gcc/builtin-types.def  |   4 +
 gcc/testsuite/brig.dg/test/gimple/smoke_test.hsail |  10 +-
 libhsail-rt/include/internal/phsa-rt.h |   1 -
 libhsail-rt/include/internal/workitems.h   |  50 +-
 libhsail-rt/rt/workitems.c |  84 +-
 21 files changed, 1195 insertions(+), 813 deletions(-)

From 56864a873079ab21087474abe19949f93be9b3d3 Mon Sep 17 00:00:00 2001
From: Pekka Jääskeläinen
Date: Sat, 17 Feb 2018 10:16:03 +0200
Subject: [PATCH 7/8] [BRIGFE] phsa-specific optimizations

Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
is given, these optimizations are disabled.

With this flag, gccbrig can generate GENERIC that assumes we are
targeting a phsa-runtime based implementation, which allows us
to expose the work-item context accesses to retrieve WI IDs etc.
which helps optimizers.

First optimization that takes advantage of this is to get rid of
the setworkitemid calls whenever we have non-inlined calls that
use IDs internally.

Other optimizations added in this commit:

- expand absoluteid to similar level of simplicity as workitemid.
At the moment absoluteid is the best indexing ID to end up with
WG vectorization.
- propagate ID variables closer to their uses. This is mainly
to avoid known useless casts, which confuse at least scalar
evolution analysis.
- use signed long long for storing IDs. Unsigned integers have
defined wraparound semantics, which confuse at least scalar
evolution analysis, leading to unvectorizable WI loops.
- also refactor some BRIG function generation helpers to brig_function.
- no point in having the wi-loop as a for-loop. It's really
a do...while and SCEV can analyze it just fine still.
- add consts to ptrs etc. in BRIG builtin defs.
Improves optimization opportunities.
- add qualifiers to generated function parameters.
Const and restrict on the hidden local/private pointers,
the arg buffer and the context pointer help some optimizations.
---
 gcc/brig-builtins.def  |  27 +-
 gcc/brig/brigfrontend/brig-basic-inst-handler.cc   | 172 +---
 gcc/brig/brigfrontend/brig-branch-inst-handler.cc  |  21 +-
 gcc/brig/brigfrontend/brig-cmp-inst-handler.cc |   6 +-
 gcc/brig/brigfrontend/brig-code-entry-handler.cc   | 503 +--
 gcc/brig/brigfrontend/brig-code-entry-handler.h|  21 -
 gcc/brig/brigfrontend/brig-control-handler.cc  |  20 +-
 gcc/brig/brigfrontend/brig-cvt-inst-handler.cc |   6 +
 gcc/brig/brigfrontend/brig-function-h

libgo patch committed: Fix unaligned read in unwind code

2018-05-04 Thread Ian Lance Taylor
This patch by Than McIntosh fixes some unaligned reads in the Go
unwinding code.  Bootstrapped and ran a few Go tests on
sparc-solaris11.  Committed to mainline.

Ian
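
For readers following along, here is a minimal sketch of the technique the
patch below relies on: copying bytes out with memcpy instead of dereferencing
a cast pointer, so the read is valid at any alignment.  The helper names
read_u32_unaligned and read_s16_unaligned are hypothetical and do not appear
in go-unwind.c; the patch itself open-codes __builtin_memcpy in each case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from *P, which may be
   unaligned, and advance *P past it.  memcpy has no alignment
   requirement, unlike *(const uint32_t *) ptr.  */
static uint32_t
read_u32_unaligned (const uint8_t **p)
{
  uint32_t result;

  assert (p != NULL && *p != NULL);
  memcpy (&result, *p, sizeof result);
  *p += sizeof result;
  return result;
}

/* Hypothetical helper: the signed 16-bit variant.  Copying into an
   int16_t first means the widening conversion sign-extends.  */
static intptr_t
read_s16_unaligned (const uint8_t **p)
{
  int16_t result;

  assert (p != NULL && *p != NULL);
  memcpy (&result, *p, sizeof result);
  *p += sizeof result;
  return result;
}
```

Compilers typically lower a fixed-size memcpy like this to a plain load on
targets that allow unaligned access, so the change should cost little there
while avoiding traps on strict-alignment targets such as SPARC.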
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 259920)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-30e2033a91fc08be9351d26737599a1fa6486017
+0c9b7a1ca4c6308345ea2a276cf820ff52513592
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-unwind.c
===
--- libgo/runtime/go-unwind.c   (revision 259861)
+++ libgo/runtime/go-unwind.c   (working copy)
@@ -197,10 +197,6 @@ read_sleb128 (const uint8_t *p, _sleb128
 
 #define ROUND_UP_TO_PVB(x) (x + sizeof(void *) - 1) &- sizeof(void *)
 
-#define COPY_AND_ADVANCE(dst, ptr, typ) \
-  (dst = *((const typ*)ptr),\
-   ptr += sizeof(typ))
-
 static inline const uint8_t *
 read_encoded_value (struct _Unwind_Context *context, uint8_t encoding,
 const uint8_t *p, _Unwind_Ptr *val)
@@ -221,17 +217,53 @@ read_encoded_value (struct _Unwind_Conte
   switch (encoding & 0x0f)
 {
   case DW_EH_PE_sdata2:
+{
+  int16_t result;
+  __builtin_memcpy (&result, p, sizeof(int16_t));
+  decoded = result;
+  p += sizeof(int16_t);
+  break;
+}
   case DW_EH_PE_udata2:
-COPY_AND_ADVANCE (decoded, p, uint16_t);
-break;
+{
+  uint16_t result;
+  __builtin_memcpy (&result, p, sizeof(uint16_t));
+  decoded = result;
+  p += sizeof(uint16_t);
+  break;
+}
   case DW_EH_PE_sdata4:
+{
+  int32_t result;
+  __builtin_memcpy (&result, p, sizeof(int32_t));
+  decoded = result;
+  p += sizeof(int32_t);
+  break;
+}
   case DW_EH_PE_udata4:
-COPY_AND_ADVANCE (decoded, p, uint32_t);
-break;
+{
+  uint32_t result;
+  __builtin_memcpy (&result, p, sizeof(uint32_t));
+  decoded = result;
+  p += sizeof(uint32_t);
+  break;
+}
   case DW_EH_PE_sdata8:
+{
+  int64_t result;
+  __builtin_memcpy (&result, p, sizeof(int64_t));
+  decoded = result;
+  p += sizeof(int64_t);
+  break;
+}
   case DW_EH_PE_udata8:
-COPY_AND_ADVANCE (decoded, p, uint64_t);
-break;
+{
+  uint64_t result;
+  __builtin_memcpy (&result, p, sizeof(uint64_t));
+  decoded = result;
+  p += sizeof(uint64_t);
+  break;
+}
   case DW_EH_PE_uleb128:
 {
   _uleb128_t value;
@@ -247,7 +279,7 @@ read_encoded_value (struct _Unwind_Conte
   break;
 }
   case DW_EH_PE_absptr:
-decoded = (_Unwind_Internal_Ptr)(*(const void *const *)p);
+__builtin_memcpy (&decoded, (const void *)p, sizeof(const void*));
 p += sizeof(void *);
 break;
   default:


Re: [PATCH, rs6000] Add missing vec_max tests

2018-05-04 Thread Carl Love
Segher:

> > -  *out++ = vec_sel (in0, in1, inl);
> > -  *out++ = vec_sel (in0, in1, inb);
> >    *out++ = vec_sub (in0, in1);
> >    *out++ = vec_sqrt (in0);
> >    *out++ = vec_trunc (in0);
> 
> Why does the patch remove these two vec_sel?  If that is wanted, the
> changelog should mention this.
> 

Segher:

No clue why I removed the vec_sel test.  Been sitting on the testsuite
patch set too long, I don't recall.  I put it back in and tested it. 
Then noticed there wasn't any tests for the xxsel instruction which the
vec_sel test generates.  I added an instruction count test to both the
BE and LE tests.  I found the BE test needed some updating as well. 
The revised patch is given below.  I retested on:

    powerpc64le-unknown-linux-gnu (Power 8 LE)
    powerpc64-unknown-linux-gnu (Power 8 BE)
    powerpc64le-unknown-linux-gnu (Power 9 LE).

Please let me know if everything looks OK now.  Thanks.

 Carl Love




gcc/testsuite/ChangeLog:

2018-05-03 Carl Love  
* gcc.target/powerpc/vsx-vector-6.h (foo): Add test for vec_max,
vec_trunc.
* gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update xvcmpeqdp,
xvcmpgtdp, xvcmpgedp counts. Add xxsel counts.
* gcc.target/powerpc/vsx-vector-6-be.c (dg-final): Update xvcmpgtdp,
xvcmpgedp counts. Add xxsel counts.
---
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c |  5 +++--
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c |  7 ---
 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h| 14 --
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
index a33f6d1..3305781 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-be.c
@@ -9,8 +9,8 @@
 /* { dg-final { scan-assembler-times "xvadddp" 1 } } */
 /* { dg-final { scan-assembler-times "xxlnor" 7 } } */
 /* { dg-final { scan-assembler-times "xvcmpeqdp" 6 } } */
-/* { dg-final { scan-assembler-times "xvcmpgtdp" 7 } } */
-/* { dg-final { scan-assembler-times "xvcmpgedp" 6 } } */
+/* { dg-final { scan-assembler-times "xvcmpgtdp" 8 } } */
+/* { dg-final { scan-assembler-times "xvcmpgedp" 7 } } */
 /* { dg-final { scan-assembler-times "xvrdpim" 1 } } */
 /* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsubadp" 1 } } */
@@ -26,6 +26,7 @@
 /* { dg-final { scan-assembler-times "xvnmaddasp" 1 } } */
 /* { dg-final { scan-assembler-times "vmsumshs" 1 } } */
 /* { dg-final { scan-assembler-times "xxland" 13 } } */
+/* { dg-final { scan-assembler-times "xxsel" 2 } } */
 
 /* Source code for the test in vsx-vector-6.h */
 #include "vsx-vector-6.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c
index fe7eeb1..dbf87b3 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-le.c
@@ -14,9 +14,9 @@
their usage counts being stable.  Therefore, we just ensure at least one
xxlor instruction was generated.  */
 /* { dg-final { scan-assembler "xxlor" } } */
-/* { dg-final { scan-assembler-times "xvcmpeqdp" 5 } } */
-/* { dg-final { scan-assembler-times "xvcmpgtdp" 8 } } */
-/* { dg-final { scan-assembler-times "xvcmpgedp" 6 } } */
+/* { dg-final { scan-assembler-times "xvcmpeqdp" 4 } } */
+/* { dg-final { scan-assembler-times "xvcmpgtdp" 7 } } */
+/* { dg-final { scan-assembler-times "xvcmpgedp" 7 } } */
 /* { dg-final { scan-assembler-times "xvrdpim" 1 } } */
 /* { dg-final { scan-assembler-times "xvmaddadp" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsubadp" 1 } } */
@@ -32,6 +32,7 @@
 /* { dg-final { scan-assembler-times "xvnmaddasp" 1 } } */
 /* { dg-final { scan-assembler-times "vmsumshs" 1 } } */
 /* { dg-final { scan-assembler-times "xxland" 13 } } */
+/* { dg-final { scan-assembler-times "xxsel" 2 } } */
 
 /* Source code for the test in vsx-vector-6.h */
 #include "vsx-vector-6.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
index 422f8a1..a891b64 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
@@ -7,7 +7,9 @@
 void foo (vector double *out, vector double *in, vector long *p_l, vector bool long *p_b,
      vector unsigned char *p_uc, int *i, vector float *p_f,
      vector bool char *outbc, vector bool int *outbi,
-     vector bool short *outbsi, vector int *outsi, vector unsigned int 
*outui)
+     vector bool short *outbsi, vector int *outsi,
+     vector unsigned int *outui, vector signed char *outsc,
+     vector unsigned char *outuc)
 {
   vector double in0 = in[0];
   vector double in1 = in[1];
@@ -20,6 +22,8 @@ void foo (vector double *out, vector do

Re: [PATCH, rs6000] Add missing vec_max tests

2018-05-04 Thread Segher Boessenkool
Hi!

On Fri, May 04, 2018 at 07:59:37AM -0700, Carl Love wrote:
> > Why does the patch remove these two vec_sel?  If that is wanted, the
> > changelog should mention this.

> No clue why I removed the vec_sel test.  Been sitting on the testsuite
> patch set too long, I don't recall.  I put it back in and tested it. 
> Then noticed there wasn't any tests for the xxsel instruction which the
> vec_sel test generates.  I added an instruction count test to both the
> BE and LE tests.  I found the BE test needed some updating as well. 
> The revised patch is given below.  I retested on:
> 
> powerpc64le-unknown-linux-gnu (Power 8 LE)
>     powerpc64-unknown-linux-gnu (Power 8 BE)
>     powerpc64le-unknown-linux-gnu (Power 9 LE).
> 
> Please let me know if everything looks OK now.  Thanks.

Looks good to me, okay for trunk.  Thanks!

(I'll find out soon enough if it works on power7 as well :-) )


Segher


> 2018-05-03 Carl Love  
>   * gcc.target/powerpc/vsx-vector-6.h (foo): Add test for vec_max,
>   vec_trunc.
>   * gcc.target/powerpc/vsx-vector-6-le.c (dg-final): Update xvcmpeqdp,
>   xvcmpgtdp, xvcmpgedp counts. Add xxsel counts.
>   * gcc.target/powerpc/vsx-vector-6-be.c (dg-final): Update xvcmpgtdp,
>   xvcmpgedp counts. Add xxsel counts.


Re: [PATCH, i386]: AVX false dependencies fix

2018-05-04 Thread Uros Bizjak
> This is the same patch I posted a few days ago, a bit modified according to
> Uros' recommendation.
>
> Patch fixes false dependencies for vmovss, vmovsd, vrcpss, vrsqrtss, vsqrtss
> and vsqrtsd instructions.
> Tested on x86-64/Linux, no new test fails, some SPEC 2006/2017 performance
> gains.
>
> 2018-05-04  Alexander Nesterovskiy  
>
> * config/i386/i386.md (*movsf_internal): AVX falsedep fix.
> (*movdf_internal): Ditto.
> (*rcpsf2_sse): Ditto.
> (*rsqrtsf2_sse): Ditto.
> (*sqrt2_sse): Ditto.

OK.

Thanks,
Uros.


[PATCH GCC][1/6]Compute type mode and register class mapping

2018-05-04 Thread Bin Cheng
Hi,
This is the updated version of the patch set that computes register pressure on
TREE SSA and uses that information to direct other loop optimizers (predcom only
for now).  This version follows Jeff's comment that we should reuse the existing
tree-ssa-live.c infrastructure for live range computation, rather than inventing
another one.
Jeff had another concern about exposing ira.h and low-level register stuff in
the GIMPLE world.  Unfortunately I haven't got a clear solution to it.  I found
it's a bit hard to relate type/type_mode to register classes and available regs
without exposing the information, especially since there are multiple possible
register classes for vector types and the choice is not fixed.  I am open to any
suggestions here.

This is the first patch, estimating the map from type mode to register class.
This one doesn't need an update and is the same as the original version of the
patch at https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01021.html

Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?

Thanks,
bin
2018-04-27  Bin Cheng  

* ira.c (setup_mode_classes): New function.
(find_reg_classes): Call above function.
* ira.h (struct target_ira): New field x_ira_mode_classes.
(ira_mode_classes): New macro.

From d65c160a37f785cff29172f1335e87d01fc260ba Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 24 Apr 2017 14:41:28 +0100
Subject: [PATCH 1/6] ira-mode-reg_class-map-20170316.txt

---
 gcc/ira.c | 77 +++
 gcc/ira.h |  7 ++
 2 files changed, 84 insertions(+)

diff --git a/gcc/ira.c b/gcc/ira.c
index b7bcc15..f132a7a 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -1154,6 +1154,82 @@ setup_class_translate (void)
 			   ira_pressure_classes_num, ira_pressure_classes);
 }
 
+/* Find desired register class for machine mode from information about
+   register pressure class.  On RTL level, we can compute preferred
+   register class information for each pseudo register or allocno.  On
+   GIMPLE level, we need to infer register class from variable's type,
+   i.e., we need a map from type mode to register class.
+
+   The map information is computed by simple guess, it's good enough
+   for use on GIMPLE.  */
+void
+setup_mode_classes (void)
+{
+  int i, j;
+  machine_mode mode;
+  enum reg_class vector_class = NO_REGS;
+
+  for (i = 0; i < NUM_MACHINE_MODES; i++)
+{
+  mode = (machine_mode) i;
+  ira_mode_classes[mode] = NO_REGS;
+
+  /* Only care about integer, float and vector modes on GIMPLE.  */
+  if (!INTEGRAL_MODE_P (mode)
+	  && !FLOAT_MODE_P (mode) && !VECTOR_MODE_P (mode))
+	continue;
+
+  /* Integers must be in GENERAL_REGS by default.  */
+  if (SCALAR_INT_MODE_P (mode))
+	{
+	  ira_mode_classes[mode] = GENERAL_REGS;
+	  continue;
+	}
+
+  /* Iterate over pressure classes and find the most appropriate
+	 one for this mode.  */
+  for (j = 0; j < ira_pressure_classes_num; j++)
+	{
+	  HARD_REG_SET valid_for_cl;
+	  enum reg_class cl = ira_pressure_classes[j];
+
+	  if (!contains_reg_of_mode[cl][mode])
+	continue;
+
+	  COPY_HARD_REG_SET (valid_for_cl, reg_class_contents[cl]);
+	  AND_COMPL_HARD_REG_SET (valid_for_cl,
+  ira_prohibited_class_mode_regs[cl][mode]);
+	  AND_COMPL_HARD_REG_SET (valid_for_cl, ira_no_alloc_regs);
+	  if (hard_reg_set_empty_p (valid_for_cl))
+	continue;
+
+	  if (ira_mode_classes[mode] == NO_REGS)
+	{
+	  ira_mode_classes[mode] = cl;
+
+	  /* Record reg_class for vector mode.  */
+	  if (VECTOR_MODE_P (mode) && cl != NO_REGS)
+		vector_class = cl;
+
+	  continue;
+	}
+	  /* Prefer non GENERAL_REGS for floating points.  */
+	  if ((FLOAT_MODE_P (mode) || VECTOR_MODE_P (mode))
+	  && cl != GENERAL_REGS && ira_mode_classes[mode] == GENERAL_REGS)
+	ira_mode_classes[mode] = cl;
+	}
+}
+
+  /* Setup vector modes that are missed previously.  */
+  if (vector_class != NO_REGS)
+for (i = 0; i < NUM_MACHINE_MODES; i++)
+  {
+	mode = (machine_mode) i;
+	if (ira_mode_classes[mode] == NO_REGS && VECTOR_MODE_P (mode))
+	  ira_mode_classes[mode] = vector_class;
+  }
+}
+
 /* Order numbers of allocno classes in original target allocno class
array, -1 for non-allocno classes.  */
 static int allocno_class_order[N_REG_CLASSES];
@@ -1430,6 +1506,7 @@ find_reg_classes (void)
   setup_class_translate ();
   reorder_important_classes ();
   setup_reg_class_relations ();
+  setup_mode_classes ();
 }
 
 
diff --git a/gcc/ira.h b/gcc/ira.h
index 9df983c..3471d4c 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -66,6 +66,11 @@ struct target_ira
  class.  */
   enum reg_class x_ira_pressure_class_translate[N_REG_CLASSES];
 
+  /* Map of machine mode to register pressure class.  With this map,
+ coarse-grained register pressure can be computed on GIMPLE, where
+ we don't have insn pattern to compute preferred reg class.  */
+  enum reg_class x_ira_mode_classes[MAX_MACHINE_MODE];
+
   /* Biggest p

[PATCH GCC][2/6]Compute available register for each register classes

2018-05-04 Thread Bin Cheng
Hi,
This is the second patch, computing available/clobbered registers for register
classes.  It's the same as the original patch posted at
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01022.html

Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?

Thanks,
bin
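
As a rough illustration of the per-class counting, here is a self-contained
sketch on a toy register file.  The names (struct demo_reg, demo_count_regs,
the class_mask field) are hypothetical and chosen only for this example; the
patch below uses fixed_regs, call_used_regs and reg_class_contents instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_NUM_CLASSES = 2 };

/* Toy description of one hard register (an assumption for this sketch).  */
struct demo_reg
{
  unsigned class_mask;	/* Bit c set: register belongs to class c.  */
  bool fixed;		/* Never available for allocation.  */
  bool call_used;	/* Clobbered across calls.  */
};

/* Count, per class, the non-fixed registers and how many of those are
   call-clobbered.  A register in several classes counts in each.  */
static void
demo_count_regs (const struct demo_reg *regs, size_t nregs,
		 unsigned avail[DEMO_NUM_CLASSES],
		 unsigned clobbered[DEMO_NUM_CLASSES])
{
  assert (regs != NULL || nregs == 0);

  for (int c = 0; c < DEMO_NUM_CLASSES; c++)
    avail[c] = clobbered[c] = 0;

  for (size_t i = 0; i < nregs; i++)
    {
      if (regs[i].fixed)
	continue;

      for (int c = 0; c < DEMO_NUM_CLASSES; c++)
	if (regs[i].class_mask & (1u << c))
	  {
	    avail[c]++;
	    if (regs[i].call_used)
	      clobbered[c]++;
	  }
    }
}
```

Indexing the result arrays by class rather than keeping a single scalar is
the point of the change: a loop optimizer can then compare float or vector
pressure against the float or vector budget instead of the GENERAL_REGS one.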
2017-04-27  Bin Cheng  

* cfgloop.h (struct target_cfgloop): Change x_target_avail_regs and
x_target_clobbered_regs into array fields.
(init_avail_clobber_regs): New declaration.
* cfgloopanal.c (memmodel.h, ira.h): Include header files.
(init_set_costs): Remove computation for old x_target_avail_regs and
x_target_clobbered_regs fields.
(init_avail_clobber_regs): New function.
(estimate_reg_pressure_cost): Update the uses.
* toplev.c (cfgloop.h): Update comment why the header file is needed.
(backend_init_target): Call init_avail_clobber_regs.
* tree-predcom.c (memmodel.h, ira.h): Include header files.
(MAX_DISTANCE): Update the use.
* tree-ssa-loop-ivopts.c (AVAILABLE_REGS, CLOBBERED_REGS): New macros.
(ivopts_estimate_reg_pressure, determine_set_costs): Update the uses.

From 47a2074d21f4b28b4c38233628e94bcaef9ed40d Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 19 Apr 2018 15:54:14 +0100
Subject: [PATCH 2/6] init-avail_clob-regs-20180428.txt

---
 gcc/cfgloop.h  | 10 ---
 gcc/cfgloopanal.c  | 68 +++---
 gcc/toplev.c   |  3 +-
 gcc/tree-predcom.c |  4 ++-
 gcc/tree-ssa-loop-ivopts.c | 14 ++
 5 files changed, 72 insertions(+), 27 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index af9bfab..3d06e1c 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -773,11 +773,12 @@ loop_iterator::~loop_iterator ()
 
 /* The properties of the target.  */
 struct target_cfgloop {
-  /* Number of available registers.  */
-  unsigned x_target_avail_regs;
+  /* Number of available registers per register pressure class.  */
+  unsigned x_target_avail_regs[N_REG_CLASSES];
 
-  /* Number of available registers that are call-clobbered.  */
-  unsigned x_target_clobbered_regs;
+  /* Number of available registers that are call-clobbered, per register
+ pressure class.  */
+  unsigned x_target_clobbered_regs[N_REG_CLASSES];
 
   /* Number of registers reserved for temporary expressions.  */
   unsigned x_target_res_regs;
@@ -812,6 +813,7 @@ extern struct target_cfgloop *this_target_cfgloop;
invariant motion.  */
 extern unsigned estimate_reg_pressure_cost (unsigned, unsigned, bool, bool);
 extern void init_set_costs (void);
+extern void init_avail_clobber_regs (void);
 
 /* Loop optimizer initialization.  */
 extern void loop_optimizer_init (unsigned);
diff --git a/gcc/cfgloopanal.c b/gcc/cfgloopanal.c
index 3af0b2d..20010bb 100644
--- a/gcc/cfgloopanal.c
+++ b/gcc/cfgloopanal.c
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "backend.h"
 #include "rtl.h"
+#include "memmodel.h"
+#include "ira.h"
 #include "tree.h"
 #include "predict.h"
 #include "memmodel.h"
@@ -344,20 +346,6 @@ init_set_costs (void)
   rtx reg2 = gen_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 2);
   rtx addr = gen_raw_REG (Pmode, LAST_VIRTUAL_REGISTER + 3);
   rtx mem = validize_mem (gen_rtx_MEM (SImode, addr));
-  unsigned i;
-
-  target_avail_regs = 0;
-  target_clobbered_regs = 0;
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
-	&& !fixed_regs[i])
-  {
-	target_avail_regs++;
-	if (call_used_regs[i])
-	  target_clobbered_regs++;
-  }
-
-  target_res_regs = 3;
 
   for (speed = 0; speed < 2; speed++)
  {
@@ -387,6 +375,54 @@ init_set_costs (void)
   default_rtl_profile ();
 }
 
+/* Initialize available, clobbered register for each register classes.  */
+
+void
+init_avail_clobber_regs (void)
+{
+  int j;
+  unsigned i;
+  bool general_regs_presented_p = false;
+
+  /* Check if GENERAL_REGS is one of pressure classes.  */
+  for (j = 0; j < ira_pressure_classes_num; j++)
+{
+  target_avail_regs[j] = 0;
+  target_clobbered_regs[j] = 0;
+  if (ira_pressure_classes[j] == GENERAL_REGS)
+	general_regs_presented_p = true;
+}
+  target_avail_regs[GENERAL_REGS] = 0;
+  target_clobbered_regs[GENERAL_REGS] = 0;
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+{
+  if (fixed_regs[i])
+	continue;
+
+  bool call_used = call_used_regs[i];
+
+  for (j = 0; j < ira_pressure_classes_num; j++)
+	if (TEST_HARD_REG_BIT (reg_class_contents[ira_pressure_classes[j]], i))
+	  {
+	target_avail_regs[ira_pressure_classes[j]]++;
+	if (call_used)
+	  target_clobbered_regs[ira_pressure_classes[j]]++;
+	  }
+
+  /* Compute pressure information for GENERAL_REGS separately.  */
+  if (!general_regs_presented_p)
+	if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i))
+	  {
+	target_avail_regs[GENERAL_REGS]++;
+	i

[PATCH GCC][3/6]Delete unnecessary function live_merge_and_clear

2018-05-04 Thread Bin Cheng
Hi,
This is an obvious patch removing the unnecessary function.

Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK?

Thanks,
bin
2018-04-27  Bin Cheng  

* tree-ssa-live.h (live_merge_and_clear): Delete.

From ba6e47da7faba9a31c776a6d06ef052b1ed392a8 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 2 May 2018 11:37:34 +0100
Subject: [PATCH 3/6] remove-live_merge_and_clear.txt

---
 gcc/tree-ssa-live.h | 12 
 1 file changed, 12 deletions(-)

diff --git a/gcc/tree-ssa-live.h b/gcc/tree-ssa-live.h
index e62293b..448aaf9 100644
--- a/gcc/tree-ssa-live.h
+++ b/gcc/tree-ssa-live.h
@@ -289,18 +289,6 @@ live_var_map (tree_live_info_p live)
 }
 
 
-/* Merge the live on entry information in LIVE for partitions P1 and P2. Place
-   the result into P1.  Clear P2.  */
-
-static inline void
-live_merge_and_clear (tree_live_info_p live, int p1, int p2)
-{
-  gcc_checking_assert (&live->livein[p1] && &live->livein[p2]);
-  bitmap_ior_into (&live->livein[p1], &live->livein[p2]);
-  bitmap_clear (&live->livein[p2]);
-}
-
-
 /* Mark partition P as live on entry to basic block BB in LIVE.  */
 
 static inline void
-- 
1.9.1



[PATCH GCC][4/6]Support regional coalesce and live range computation

2018-05-04 Thread Bin Cheng
Hi,
Following Jeff's suggestion, I am now using existing tree-ssa-live.c and
tree-ssa-coalesce.c to compute register pressure, rather than inventing
another live range solver.

The major change is to record the region's basic blocks in var_map and use that
information in the computation, rather than FOR_EACH_BB_FN.  For now only loop
and function type regions are supported.  The default is the function type
region, which is used in out-of-ssa.  The loop type region will be used in the
next patch to compute information for a loop.

Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?

Thanks,
bin
2018-04-27  Bin Cheng  

* tree-outof-ssa.c (remove_ssa_form): Update use.
* tree-ssa-coalesce.c (build_ssa_conflict_graph): Support regional
coalesce.
(coalesce_with_default): Update comment.
(create_outofssa_var_map): Support regional coalesce.  Rename to...
(create_var_map): ...this.
(coalesce_partitions): Support regional coalesce.
(gimple_can_coalesce_p, compute_optimized_partition_bases): Ditto.
(coalesce_ssa_name): Ditto.
* tree-ssa-coalesce.h (coalesce_ssa_name, gimple_can_coalesce_p):
Add parameter in declarations.
* tree-ssa-live.c (init_var_map, delete_var_map): Support regional
coalesce.
(new_tree_live_info, loe_visit_block, set_var_live_on_entry): Ditto.
(calculate_live_on_exit, verify_live_on_entry): Ditto.
* tree-ssa-live.h (enum region_type): New.
(struct _var_map): New fields.
(init_var_map): Add parameter in declaration.
(function_region_p, region_contains_p): New.
* tree-ssa-uncprop.c (uncprop_into_successor_phis): Update uses.

From 6b7b80eb40c0bd08c25c14b3f7c33937941bdfaa Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 4 May 2018 09:39:17 +0100
Subject: [PATCH 4/6] liverange-support-region-20180427

---
 gcc/tree-outof-ssa.c|  2 +-
 gcc/tree-ssa-coalesce.c | 77 ++-
 gcc/tree-ssa-coalesce.h |  4 +--
 gcc/tree-ssa-live.c | 80 +++--
 gcc/tree-ssa-live.h | 51 ++-
 gcc/tree-ssa-uncprop.c  |  5 ++--
 6 files changed, 163 insertions(+), 56 deletions(-)

diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c
index 59bdcd6..81edbc5 100644
--- a/gcc/tree-outof-ssa.c
+++ b/gcc/tree-outof-ssa.c
@@ -945,7 +945,7 @@ remove_ssa_form (bool perform_ter, struct ssaexpand *sa)
   bitmap values = NULL;
   var_map map;
 
-  map = coalesce_ssa_name ();
+  map = coalesce_ssa_name (NULL, flag_tree_coalesce_vars);
 
   /* Return to viewing the variable list as just all reference variables after
  coalescing has been performed.  */
diff --git a/gcc/tree-ssa-coalesce.c b/gcc/tree-ssa-coalesce.c
index 5cc0aca..7269eb1 100644
--- a/gcc/tree-ssa-coalesce.c
+++ b/gcc/tree-ssa-coalesce.c
@@ -869,7 +869,7 @@ build_ssa_conflict_graph (tree_live_info_p liveinfo)
  coalesce variables from different base variables, including
  different parameters, so we have to make sure default defs live
  at the entry block conflict with each other.  */
-  if (flag_tree_coalesce_vars)
+  if (liveinfo->map->coalesce_vars_p)
 entry = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   else
 entry = NULL;
@@ -879,7 +879,7 @@ build_ssa_conflict_graph (tree_live_info_p liveinfo)
 
   live = new_live_track (map);
 
-  FOR_EACH_BB_FN (bb, cfun)
+  for (unsigned i = 0; liveinfo->map->vec_bbs->iterate (i, &bb); ++i)
 {
   /* Start with live on exit temporaries.  */
   live_track_init (live, live_on_exit (liveinfo, bb));
@@ -944,6 +944,8 @@ build_ssa_conflict_graph (tree_live_info_p liveinfo)
 	{
 	  gphi *phi = gsi.phi ();
 	  tree result = PHI_RESULT (phi);
+	  if (virtual_operand_p (result))
+	continue;
 	  if (live_track_live_p (live, result))
 	live_track_process_def (live, result, graph);
 	}
@@ -1071,14 +1073,18 @@ coalesce_with_default (tree var, coalesce_list *cl, bitmap used_in_copy)
   add_cost_one_coalesce (cl, SSA_NAME_VERSION (ssa), SSA_NAME_VERSION (var));
   bitmap_set_bit (used_in_copy, SSA_NAME_VERSION (var));
   /* Default defs will have their used_in_copy bits set at the end of
- create_outofssa_var_map.  */
+ create_var_map.  */
 }
 
-/* This function creates a var_map for the current function as well as creating
-   a coalesce list for use later in the out of ssa process.  */
+/* This function creates a var_map for a region indicated by BBS in the current
+   function as well as creating a coalesce list for use later in the out of ssa
+   process.  Region is a loop if LOOP is not NULL, otherwise the function.
+   COALESCE_VARS_P is true if we coalesce version of different user-defined
+   variables.  */
 
 static var_map
-create_outofssa_var_map (coalesce_list *cl, bitmap used_in_copy)
+create_var_map (struct loop *loop, coalesce_list *cl, bitmap used_in_copy,
+		bool coalesce_vars_p)
 {
   gimple_stmt_

[PATCH GCC][5/6]implement live range, reg pressure computation class

2018-05-04 Thread Bin Cheng
Hi,
Based on the previous patch, this one implements a live range and register
pressure computation class in tree-ssa-live.c.  The user only needs to
instantiate the class and call the computation interface, as in the next patch.
While working on this, I came to think it would also be worthwhile to classify
all live range and coalesce data structures and algorithms in the future.

Bootstrap and test on x86_64 and AArch64 ongoing.  Any comments?

Thanks,
bin
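
The backward walk used for the pressure computation can be sketched on a toy
statement list as follows.  This is only an illustration under simplifying
assumptions: a single register class, at most 32 partitions tracked in a
bitmask, and made-up names (struct toy_stmt, toy_max_pressure).  It also
samples the pressure at block entry, which is my own choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement: one optional def and up to two uses, each a partition
   number in [0, 31], or -1 for "none" (an assumption for this sketch).  */
struct toy_stmt
{
  int def;
  int use[2];
};

/* Walk STMTS backwards starting from the LIVE_OUT set and return the
   maximum number of simultaneously live partitions.  */
static unsigned
toy_max_pressure (const struct toy_stmt *stmts, int n, unsigned live_out)
{
  unsigned live = live_out, pressure = 0, max_pressure;

  assert (stmts != NULL || n == 0);

  /* Initial pressure is the number of partitions live on exit.  */
  for (unsigned bits = live; bits != 0; bits &= bits - 1)
    pressure++;
  max_pressure = pressure;

  for (int i = n - 1; i >= 0; i--)
    {
      /* PRESSURE here describes the point just after statement i.  */
      if (pressure > max_pressure)
	max_pressure = pressure;

      /* A def ends the live range when walking backwards.  */
      int d = stmts[i].def;
      assert (d < 32);
      if (d >= 0 && (live & (1u << d)))
	{
	  live &= ~(1u << d);
	  pressure--;
	}

      /* A use starts (or continues) a live range.  */
      for (int k = 0; k < 2; k++)
	{
	  int u = stmts[i].use[k];
	  assert (u < 32);
	  if (u >= 0 && !(live & (1u << u)))
	    {
	      live |= 1u << u;
	      pressure++;
	    }
	}
    }

  /* Also account for the pressure at block entry.  */
  if (pressure > max_pressure)
    max_pressure = pressure;
  return max_pressure;
}
```

For the sequence v2 = v0 + v1; v3 = v2 + v0; v4 = v3 + v1 with only v4 live
on exit, the walk peaks just after the first statement, where v0, v1 and v2
are all live.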
2018-04-27  Bin Cheng  

* tree-ssa-live.c (memmodel.h, ira.h, tree-ssa-coalesce.h): Include.
(struct stmt_lr_info, free_stmt_lr_info): New.
(lr_region::lr_region, lr_region::~lr_region): New.
(lr_region::create_stmt_lr_info): New.
(lr_region::update_live_range_by_stmt): New.
(lr_region::calculate_coalesced_pressure): New.
(lr_region::calculate_pressure): New.
* tree-ssa-live.h (struct stmt_lr_info): New declaration.
(class lr_region): New class.

From 5c16db5672a4f0826d2a164823759a9ffb12c349 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 4 May 2018 09:42:04 +0100
Subject: [PATCH 5/6] region-reg-pressure-20180428

---
 gcc/tree-ssa-live.c | 157 
 gcc/tree-ssa-live.h |  49 
 2 files changed, 206 insertions(+)

diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
index ccb0d99..e51cd15 100644
--- a/gcc/tree-ssa-live.c
+++ b/gcc/tree-ssa-live.c
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "backend.h"
 #include "rtl.h"
+#include "memmodel.h"
+#include "ira.h"
 #include "tree.h"
 #include "gimple.h"
 #include "timevar.h"
@@ -34,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "dumpfile.h"
 #include "tree-ssa-live.h"
+#include "tree-ssa-coalesce.h"
 #include "debug.h"
 #include "tree-ssa.h"
 #include "ipa-utils.h"
@@ -1204,6 +1207,160 @@ calculate_live_ranges (var_map map, bool want_livein)
 }
 
 
+/* Live range information for a gimple stmt.  */
+struct stmt_lr_info
+{
+  /*  ID of the stmt.  */
+  unsigned id;
+  gimple *stmt;
+  /* Live ranges after the stmt.  */
+  bitmap lr_after_stmt;
+};
+
+/* Call back function to free live range INFO of gimple STMT.  */
+
+bool
+free_stmt_lr_info (gimple *const & stmt, stmt_lr_info *const &info, void *)
+{
+  gcc_assert (info->stmt == stmt);
+  if (info->lr_after_stmt != NULL)
+BITMAP_FREE (info->lr_after_stmt);
+
+  free (info);
+  return true;
+}
+
+lr_region::lr_region (struct loop *loop)
+  : m_loop (loop),
+m_varmap (NULL),
+m_liveinfo (NULL),
+m_stmtmap (new hash_map (13))
+{
+  memset (m_pressure, 0, sizeof (unsigned) * N_REG_CLASSES);
+}
+
+lr_region::~lr_region ()
+{
+  m_stmtmap->traverse (NULL);
+  delete m_stmtmap;
+}
+
+struct stmt_lr_info *
+lr_region::create_stmt_lr_info (gimple *stmt)
+{
+  bool exist_p;
+  struct stmt_lr_info **slot = &m_stmtmap->get_or_insert (stmt, &exist_p);
+
+  gcc_assert (!exist_p);
+  *slot = XCNEW (struct stmt_lr_info);
+  (*slot)->stmt = stmt;
+  (*slot)->lr_after_stmt = NULL;
+  return *slot;
+}
+
+void
+lr_region::update_live_range_by_stmt (gimple *stmt, bitmap live_ranges,
+  unsigned *pressure)
+{
+  int p;
+  tree var;
+  ssa_op_iter iter;
+
+  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
+{
+  p = var_to_partition (m_varmap, var);
+  gcc_assert (p != NO_PARTITION);
+  if (bitmap_clear_bit (live_ranges, p))
+	pressure[ira_mode_classes[TYPE_MODE (TREE_TYPE (var))]]--;
+}
+  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
+{
+  p = var_to_partition (m_varmap, var);
+  gcc_assert (p != NO_PARTITION);
+  if (bitmap_set_bit (live_ranges, p))
+	pressure[ira_mode_classes[TYPE_MODE (TREE_TYPE (var))]]++;
+}
+}
+
+void
+lr_region::calculate_coalesced_pressure ()
+{
+  unsigned i, j, reg_class, pressure[N_REG_CLASSES];
+  bitmap_iterator bi, bj;
+  gimple_stmt_iterator bsi;
+  auto_bitmap live_ranges;
+  bitmap bbs = get_bbs ();
+
+  EXECUTE_IF_SET_IN_BITMAP (bbs, 0, i, bi)
+{
+  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
+  bitmap_copy (live_ranges, &m_liveinfo->liveout[bb->index]);
+
+  memset (pressure, 0, sizeof (unsigned) * N_REG_CLASSES);
+  EXECUTE_IF_SET_IN_BITMAP (live_ranges, 0, j, bj)
+	{
+	  tree var = partition_to_var (m_varmap, j);
+	  reg_class = ira_mode_classes[TYPE_MODE (TREE_TYPE (var))];
+	  pressure[reg_class]++;
+	}
+
+  for (bsi = gsi_last_bb (bb); !gsi_end_p (bsi); gsi_prev (&bsi))
+	{
+	  gimple *stmt = gsi_stmt (bsi);
+	  struct stmt_lr_info *stmt_info = create_stmt_lr_info (stmt);
+	  /* No need to compute live range information for debug stmt.  */
+	  if (is_gimple_debug (stmt))
+	continue;
+
+	  for (j = 0; j < N_REG_CLASSES; j++)
+	if (pressure[j] > m_pressure[j])
+	  m_pressure[j] = pressure[j];
+
+	  stmt_info->lr_after_stmt = BITMAP_ALLOC (NULL);
+	  bitmap_copy (stmt_info->lr_after_stmt, live_ranges);
+	  update_

gotools patch committed: Set GOCACHE during tests

2018-05-04 Thread Ian Lance Taylor
This patch to the gotools Makefile sets the GOCACHE variable while
running the gotools tests.  This avoids creating a cache in the
default location, which is the user's home directory.  This should fix
PR 85630.  Bootstrapped and ran gotools tests on x86_64-pc-linux-gnu.
Committed to mainline.

I would like to commit this to the GCC 8 branch but I haven't seen a
statement that the branch is open for bug fixes.  Is the backport OK?
Thanks.

Ian

2018-05-04  Ian Lance Taylor  

PR go/85630
* Makefile.am (CHECK_ENV): Set GOCACHE.
(ECHO_ENV): Update for setting of GOCACHE.
* Makefile.in: Rebuild.
Index: Makefile.am
===
--- Makefile.am (revision 259935)
+++ Makefile.am (working copy)
@@ -218,11 +218,13 @@ CHECK_ENV = \
export LD_LIBRARY_PATH; \
GOROOT=$${abs_libgodir}; \
export GOROOT; \
+   GOCACHE=$(abs_builddir)/gocache-test; \
+   export GOCACHE; \
fl1="FA"; fl2="IL"; fl="$${fl1}$${fl2}";
 
 # ECHO_ENV is a variant of CHECK_ENV to put into a testlog file.
 # It assumes that abs_libgodir is set.
-ECHO_ENV = PATH=`echo $(abs_builddir):$${PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'` GCCGO='$(abs_builddir)/check-gccgo' 
CC='$(abs_builddir)/check-gcc' GCCGOTOOLDIR='$(abs_builddir)' 
GO_TESTING_GOTOOLS=yes LD_LIBRARY_PATH=`echo 
$${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'` 
GOROOT=`echo $${abs_libgodir}`
+ECHO_ENV = PATH=`echo $(abs_builddir):$${PATH} | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'` GCCGO='$(abs_builddir)/check-gccgo' 
CC='$(abs_builddir)/check-gcc' GCCGOTOOLDIR='$(abs_builddir)' 
GO_TESTING_GOTOOLS=yes LD_LIBRARY_PATH=`echo 
$${abs_libgodir}/.libs:$${LD_LIBRARY_PATH} | sed 's,::*,:,g;s,^:*,,;s,:*$$,,'` 
GOROOT=`echo $${abs_libgodir} GOCACHE='$(abs_builddir)/gocache-test'`
 
 # check-go-tool runs `go test cmd/go` in our environment.
 check-go-tool: go$(EXEEXT) $(noinst_PROGRAMS) check-head check-gccgo check-gcc


[PATCH GCC][6/6]Restrict predcom using register pressure information

2018-05-04 Thread Bin Cheng
Hi,
This patch restricts the predcom pass using register pressure information.
In case of high register pressure, we now prune additional chains as well
as disable unrolling in predcom.  In general, I think this patch set is
useful.

Bootstrap and test on x86_64 ongoing.  Any comments?

Thanks,
bin
2018-04-27  Bin Cheng  

* tree-predcom.c (stor-layout.h, tree-ssa-live.h): Include.
(REG_RELAX_RATIO, prune_chains): New.
(tree_predictive_commoning_loop): Compute reg pressure using class
region.  Prune chains based on reg pressure.  Force to not unroll
if reg pressure is high.

From 1b488665f8fea619c4ce35f71650c342df69de2f Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 25 Apr 2018 16:30:41 +0100
Subject: [PATCH 6/6] pcom-reg-pressure-20180423

---
 gcc/tree-predcom.c | 74 ++
 1 file changed, 74 insertions(+)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index aeadbf7..d0c18b3 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -217,6 +217,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "gimple-pretty-print.h"
+#include "stor-layout.h"
 #include "alias.h"
 #include "fold-const.h"
 #include "cfgloop.h"
@@ -227,6 +228,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
+#include "tree-ssa-live.h"
 #include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
 #include "tree-dfa.h"
@@ -242,6 +244,10 @@ along with GCC; see the file COPYING3.  If not see
 
 #define MAX_DISTANCE (target_avail_regs[GENERAL_REGS] < 16 ? 4 : 8)
 
+/* The ratio by which register pressure check is relaxed.  */
+
+#define REG_RELAX_RATIO (2)
+
 /* Data references (or phi nodes that carry data reference values across
loop iterations).  */
 
@@ -3156,6 +3162,59 @@ insert_init_seqs (struct loop *loop, vec chains)
   }
 }
 
+/* Prune chains causing high register pressure.  */
+
+static void
+prune_chains (vec *chains, unsigned *max_pressure)
+{
+  bool pruned_p = false;
+  machine_mode mode;
+  enum reg_class cl;
+  unsigned i, new_pressure;
+
+  for (i = 0; i < chains->length ();)
+{
+  chain_p chain = (*chains)[i];
+  /* Always allow combined chain and zero-length chain.  */
+  if (chain->combined || chain->type == CT_COMBINATION
+	  || chain->length == 0 || chain->type == CT_STORE_STORE)
+	{
+	  i++;
+	  continue;
+	}
+
+  gcc_assert (chain->refs.length () > 0);
+  mode = TYPE_MODE (TREE_TYPE (chain->refs[0]->ref->ref));
+  /* Bypass chain that doesn't contribute to any reg_class, although
+	 something could be wrong when mapping type mode to reg_class.  */
+  if (ira_mode_classes[mode] == NO_REGS)
+	{
+	  i++;
+	  continue;
+	}
+
+  cl = ira_pressure_class_translate[ira_mode_classes[mode]];
+  /* Prune chain if it causes higher register pressure than available
+	 registers; otherwise keep the chain and update register pressure
+	 information.  */
+  new_pressure = max_pressure[cl] + chain->length - 1;
+  if (new_pressure <= target_avail_regs[cl] * REG_RELAX_RATIO)
+	{
+	  i++;
+	  max_pressure[cl] = new_pressure;
+	}
+  else
+	{
+	  release_chain (chain);
+	  chains->unordered_remove (i);
+	  pruned_p = true;
+	}
+}
+
+  if (pruned_p && dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Prune chain because of high reg pressure\n");
+}
+
 /* Performs predictive commoning for LOOP.  Sets bit 1<<0 of return value
if LOOP was unrolled; Sets bit 1<<1 of return value if loop closed ssa
form was corrupted.  */
@@ -3171,6 +3230,9 @@ tree_predictive_commoning_loop (struct loop *loop)
   struct tree_niter_desc desc;
   bool unroll = false, loop_closed_ssa = false;
   edge exit;
+  lr_region *region;
+  unsigned max_pressure[N_REG_CLASSES];
+  bool high_pressure_p;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "Processing loop %d\n",  loop->num);
@@ -3239,6 +3301,11 @@ tree_predictive_commoning_loop (struct loop *loop)
   /* Try to combine the chains that are always worked with together.  */
   try_combine_chains (loop, &chains);
 
+  region = new lr_region (loop);
+  high_pressure_p = region->calculate_pressure (max_pressure);
+  delete region;
+  prune_chains (&chains, max_pressure);
+
   insert_init_seqs (loop, chains);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3250,6 +3317,13 @@ tree_predictive_commoning_loop (struct loop *loop)
   /* Determine the unroll factor, and if the loop should be unrolled, ensure
  that its number of iterations is divisible by the factor.  */
   unroll_factor = determine_unroll_factor (chains);
+  /* Force to not unroll if register pressure is high.  */
+  if (high_pressure_p && unroll_factor > 1)
+{
+  unroll_factor = 1;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Force to not unroll b
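
The pruning test in prune_chains above can be looked at in isolation.  The
following is only a sketch of that arithmetic, not code from the patch:
keep_chain and RELAX_RATIO are hypothetical names (RELAX_RATIO stands in for
REG_RELAX_RATIO), and the per-class arrays are reduced to a single counter.
It assumes chain_length is at least 1, since the patch skips zero-length
chains before reaching this test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the patch's REG_RELAX_RATIO.  */
#define RELAX_RATIO 2

/* Decide whether a chain of CHAIN_LENGTH may be kept given the current
   *PRESSURE and AVAIL_REGS available registers.  A chain of length N adds
   N - 1 to the pressure.  On success *PRESSURE is updated and true is
   returned; otherwise *PRESSURE is left alone and false is returned,
   meaning the chain should be pruned.  */
static bool
keep_chain (unsigned *pressure, unsigned chain_length, unsigned avail_regs)
{
  assert (pressure != 0 && chain_length >= 1);

  unsigned new_pressure = *pressure + chain_length - 1;
  if (new_pressure <= avail_regs * RELAX_RATIO)
    {
      *pressure = new_pressure;
      return true;
    }
  return false;
}
```

With 16 available registers the relaxed limit is 32, so starting from a
pressure of 28 a chain of length 4 is kept (28 + 3 = 31) and a following
chain of length 3 is pruned (31 + 2 = 33).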

Re: gotools patch committed: Set GOCACHE during tests

2018-05-04 Thread Jakub Jelinek
On Fri, May 04, 2018 at 09:23:54AM -0700, Ian Lance Taylor wrote:
> This patch to the gotools Makefile sets the GOCACHE variable while
> running the gotools tests.  This avoids creating a cache in the
> default location, which is the user's home directory.  This should fix
> PR 85630.  Bootstrapped and ran gotools tests on x86_64-pc-linux-gnu.
> Committed to mainline.
> 
> I would like to commit this to the GCC 8 branch but I haven't seen a
> statement that the branch is open for bug fixes.  Is the backport OK?

It is open under the normal release branch rules.  So yes, the backport is
ok.

Jakub


Re: [PATCH 1/8] [BRIGFE] fix an alloca stack underflow

2018-05-04 Thread Pekka Jääskeläinen
Committed as r259942.

On Fri, May 4, 2018 at 4:56 PM, Pekka Jääskeläinen  wrote:
> We didn't preserve additional space for the alloca frame pointers that
> are needed to be saved in the alloca space.


Re: [PATCH 2/8] [BRIGFE] Enable whole program optimizations

2018-05-04 Thread Pekka Jääskeläinen
Committed as r259943.

On Fri, May 4, 2018 at 4:57 PM, Pekka Jääskeläinen
 wrote:
> HSA assumes all program scope HSAIL symbols can be queried from
> the host runtime API, thus cannot be removed by the IPA.
>
> Getting some inlining happening in the finalized binary required:
> * explicitly marking the 'prog' scope functions and the launcher
> function "externally_visible" to avoid the inliner removing it
> * also the host_def ptr is set to externally visible, otherwise
> IPA assumes it's never set
> * adding the 'inline' keyword to functions to enable inlining,
> otherwise GCC defaults to replaceable functions (one can link
> over the previous one) which cannot be inlined
> * replacing all calls to declarations with calls to definitions to
> enable the inliner to find the definition
> * to fix missing hidden argument types in the generated functions.
> These were ignored silently until GCC started to be able to
> inline calls to such functions.
> * do not gimplify before fixing the call targets. Otherwise the
> calls get detached and the definitions are not found. The reason
> why this happens is not clear, but gimplifying only after call
> target decl->def conversion fixes this.
> ---
>  gcc/brig/ChangeLog| 11 +++
>  gcc/brig/brig-lang.c  |  4 +-
>  gcc/brig/brigfrontend/brig-branch-inst-handler.cc |  2 +
>  gcc/brig/brigfrontend/brig-function-handler.cc| 30 ++---
>  gcc/brig/brigfrontend/brig-function.cc|  4 +-
>  gcc/brig/brigfrontend/brig-to-generic.cc  | 82
> ++-
>  gcc/brig/brigfrontend/brig-to-generic.h   |  8 +++
>  gcc/brig/brigfrontend/brig-variable-handler.cc|  3 +
>  8 files changed, 130 insertions(+), 14 deletions(-)
>


Re: [PATCH 3/8] [BRIGFE] The modulo in ID computation should not be needed.

2018-05-04 Thread Pekka Jääskeläinen
Committed in r259944.

On Fri, May 4, 2018 at 4:58 PM, Pekka Jääskeläinen  wrote:
> The case where a dim is greater than the grid size doesn't seem
> to be mentioned in the specs nor tested by PRM test suite.
> ---
>  gcc/brig/brigfrontend/brig-code-entry-handler.cc | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
>


Re: gotools patch committed: Set GOCACHE during tests

2018-05-04 Thread Ian Lance Taylor
On Fri, May 4, 2018 at 9:25 AM, Jakub Jelinek  wrote:
> On Fri, May 04, 2018 at 09:23:54AM -0700, Ian Lance Taylor wrote:
>> This patch to the gotools Makefile sets the GOCACHE variable while
>> running the gotools tests.  This avoids creating a cache in the
>> default location, which is the user's home directory.  This should fix
>> PR 85630.  Bootstrapped and ran gotools tests on x86_64-pc-linux-gnu.
>> Committed to mainline.
>>
>> I would like to commit this to the GCC 8 branch but I haven't seen a
>> statement that the branch is open for bug fixes.  Is the backport OK?
>
> It is open under the normal release branch rules.  So yes, the backport is
> ok.

Thanks.  Tested on x86_64-pc-linux-gnu and committed to GCC 8 branch.

Ian


Re: [C++ PATCH] Fix value initialized decltype(nullptr) in constexpr (PR c++/85553)

2018-05-04 Thread Jason Merrill
On Sun, Apr 29, 2018 at 3:23 AM, Paolo Carlini  wrote:
> Hi,
>
> On 28/04/2018 18:41, Jason Merrill wrote:
>>
>> On Fri, Apr 27, 2018 at 7:26 PM, Paolo Carlini 
>> wrote:
>>>
>>> Hi again,
>>>
>>> I'm now pretty sure that we have a latent issue in ocp_convert. The bug
>>> fixed by Jakub shows that we used to not have issues with
>>> integer_zero_node.
>>> That's easy to explain: at the beginning of ocp_convert there is code
>>> which
>>> handles first some special / simple cases when
>>> same_type_ignoring_top_level_qualifiers_p is true. That code isn't of
>>> course
>>> used for integer_zero_node as source expression, which therefore is
>>> handled
>>> by:
>>>
>>>if (NULLPTR_TYPE_P (type) && e && null_ptr_cst_p (e))
>>>  {
>>>if (complain & tf_warning)
>>>  maybe_warn_zero_as_null_pointer_constant (e, loc);
>>>return nullptr_node;
>>>  }
>>
>> Maybe we should move this code up, then.
>
> You are totally right. Yesterday I realized that and tested on x86_64-linux
> the below, both with and without Jakub's fix.

+  if (!TREE_SIDE_EFFECTS (e))
+return nullptr_node;

So what happens if e has side-effects?

Jason


libgo patch committed: On AIX, pass -X64 first to ar

2018-05-04 Thread Ian Lance Taylor
This patch by Tony Reix passes -X64 first to the ar command on AIX,
not after the rc command.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline and GCC 8 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 259935)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-0c9b7a1ca4c6308345ea2a276cf820ff52513592
+6b0355769edd9543e6c5f2270b26b140bb96e9aa
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/internal/work/gccgo.go
===
--- libgo/go/cmd/go/internal/work/gccgo.go  (revision 259805)
+++ libgo/go/cmd/go/internal/work/gccgo.go  (working copy)
@@ -198,7 +198,7 @@ func (gccgoToolchain) pack(b *Builder, a
// AIX "ar" command does not know D option.
arArgs = append(arArgs, "-X64")
}
-   return b.run(a, p.Dir, p.ImportPath, nil, "ar", "rc", arArgs, 
absAfile, absOfiles)
+   return b.run(a, p.Dir, p.ImportPath, nil, "ar", arArgs, "rc", 
absAfile, absOfiles)
}
return nil
 }


Re: [PATCH 4/8] [BRIGFE] allow controlling strict aliasing from cmd line

2018-05-04 Thread Pekka Jääskeläinen
Committed as r259948.

On Fri, May 4, 2018 at 4:59 PM, Pekka Jääskeläinen  wrote:
> ---
>  gcc/brig/brig-lang.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
>


Re: [PATCH 5/8] [BRIGFE] do not allow optimizations based on known C builtins

2018-05-04 Thread Pekka Jääskeläinen
Committed as r259949.

On Fri, May 4, 2018 at 5:00 PM, Pekka Jääskeläinen  wrote:
> This can break inputs that have similarly named functions.
> ---
>  gcc/brig/brig-lang.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
>


Re: [C++ PATCH] Fix value initialized decltype(nullptr) in constexpr (PR c++/85553)

2018-05-04 Thread Paolo Carlini

Hi,

On 04/05/2018 19:45, Jason Merrill wrote:

> On Sun, Apr 29, 2018 at 3:23 AM, Paolo Carlini wrote:
>> Hi,
>>
>> On 28/04/2018 18:41, Jason Merrill wrote:
>>> On Fri, Apr 27, 2018 at 7:26 PM, Paolo Carlini wrote:
>>>> Hi again,
>>>>
>>>> I'm now pretty sure that we have a latent issue in ocp_convert. The bug
>>>> fixed by Jakub shows that we used to not have issues with
>>>> integer_zero_node.
>>>> That's easy to explain: at the beginning of ocp_convert there is code
>>>> which handles first some special / simple cases when
>>>> same_type_ignoring_top_level_qualifiers_p is true. That code isn't of
>>>> course used for integer_zero_node as source expression, which therefore
>>>> is handled by:
>>>>
>>>>    if (NULLPTR_TYPE_P (type) && e && null_ptr_cst_p (e))
>>>>      {
>>>>        if (complain & tf_warning)
>>>>          maybe_warn_zero_as_null_pointer_constant (e, loc);
>>>>        return nullptr_node;
>>>>      }
>>>
>>> Maybe we should move this code up, then.
>>
>> You are totally right. Yesterday I realized that and tested on x86_64-linux
>> the below, both with and without Jakub's fix.
>
> +  if (!TREE_SIDE_EFFECTS (e))
> +return nullptr_node;
>
> So what happens if e has side-effects?

In that case nothing should change wrt the status quo, that is, the 
"fast path" wrapping the thing in a NOP_EXPR. That only for 
NULLPTR_TYPE_P nodes, I don't think that can happen for 
integer_zero_nodes. I must say, if I take out the check there are no 
regressions, but using it seems consistent with decay_conversion, where 
not having the check caused a real wrong code bug. What do you think? 
Maybe an alternative would be returning immediately e as-is?!?


Paolo.


Re: [PATCH PR other/77609] Let the assembler choose ELF section types for miscellaneous named sections

2018-05-04 Thread Roland McGrath via gcc-patches
ping

On Sat, Apr 28, 2018 at 2:42 AM Roland McGrath  wrote:

> I'm back for stage 1!

> The same patch from https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01549.html
> rebases cleanly and I didn't change anything but the date on the log entry
> since what I posted there.  The fresh rebase is on the roland/pr77609 git
> branch for your convenience.

> It has no check-gcc failures on x86_64-linux-gnu.

> OK to commit to trunk now?

> When will be the right time to raise the question of backporting it,
> perhaps shortly after the 8 release?


> Thanks,
> Roland


[gomp5] simd if/nontemporal clauses parsing and cancel if modifier

2018-05-04 Thread Jakub Jelinek
Hi!

This patch adds parsing of if and nontemporal clauses for simd construct
and also adds parsing of (optional) cancel modifier for if clause on cancel
directive.

While nontemporal clause is just an optimization (we still want to use
non-temporal stores (or even loads?) for those vars, what is the best way to
do that?), simd if is not an optimization, if the expression evaluates to
false at runtime, then we should just not vectorize; so probably we want to
preserve it in some form until vectorization and include this condition next
to where we emit checks for runtime aliasing or alignment etc.  Thoughts on
how to do that?

2018-05-04  Jakub Jelinek  

* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NONTEMPORAL.
* tree.c (omp_clause_num_ops, omp_clause_code_name): Add nontemporal
clause entries.
(walk_tree_1): Handle OMP_CLAUSE_NONTEMPORAL.
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_NONTEMPORAL.
(gimplify_scan_omp_clauses): Handle cancel and simd
OMP_CLAUSE_IF_MODIFIERs.  Handle OMP_CLAUSE_NONTEMPORAL.
(gimplify_adjust_omp_clauses_1): Ignore GOVD_NONTEMPORAL.
(gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_NONTEMPORAL.
* omp-grid.c (grid_eliminate_combined_simd_part): Formatting fix.
Fix comment typos.
* tree-nested.c (convert_local_omp_clauses): Handle
OMP_CLAUSE_NONTEMPORAL.
(convert_nonlocal_omp_clauses): Likewise.  Remove useless test.
* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_NONTEMPORAL.
Handle cancel and simd OMP_CLAUSE_IF_MODIFIERs.
* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_NONTEMPORAL.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Handle OMP_CLAUSE_NONTEMPORAL.  Handle
splitting OMP_CLAUSE_IF also to OMP_SIMD.
* c-pragma.h (enum pragma_omp_clause): Add
PRAGMA_OMP_CLAUSE_NONTEMPORAL.
gcc/c/
* c-parser.c (c_parser_omp_clause_name): Handle nontemporal clause.
(c_parser_omp_clause_if): Handle cancel and simd modifiers.
(c_parser_omp_clause_nontemporal): New function.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NONTEMPORAL.
(OMP_SIMD_CLAUSE_MASK): Add if and nontemporal clauses.
* c-typeck.c (c_finish_omp_cancel): Diagnose if clause with modifier
other than cancel.
(c_finish_omp_clauses): Handle OMP_CLAUSE_NONTEMPORAL.
gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle nontemporal clause.
(cp_parser_omp_clause_if): Handle cancel and simd modifiers.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NONTEMPORAL.
(OMP_SIMD_CLAUSE_MASK): Add if and nontemporal clauses.
* semantics.c (finish_omp_clauses): Diagnose if clause with modifier
other than cancel.
(finish_omp_cancel): Handle OMP_CLAUSE_NONTEMPORAL.
* pt.c (tsubst_omp_clauses): Likewise.
gcc/testsuite/
* c-c++-common/gomp/if-1.c (foo): Add some further tests.
* c-c++-common/gomp/if-2.c (foo): Likewise.  Expect slightly different
diagnostics wording in one case.
* c-c++-common/gomp/if-3.c: New test.
* c-c++-common/gomp/nontemporal-1.c: New test.
libgomp/
* testsuite/libgomp.c/cancel-for-2.c (foo): Use cancel modifier
in some cases.

--- gcc/tree-core.h.jj  2018-05-02 17:29:55.902260817 +0200
+++ gcc/tree-core.h 2018-05-04 15:13:22.384614499 +0200
@@ -293,6 +293,9 @@ enum omp_clause_code {
   /* OpenMP clause: depend ({in,out,inout}:variable-list).  */
   OMP_CLAUSE_DEPEND,
 
+  /* OpenMP clause: nontemporal (variable-list).  */
+  OMP_CLAUSE_NONTEMPORAL,
+
   /* OpenMP clause: uniform (argument-list).  */
   OMP_CLAUSE_UNIFORM,
 
--- gcc/tree.c.jj   2018-04-30 13:49:44.692824652 +0200
+++ gcc/tree.c  2018-05-04 19:08:55.309273302 +0200
@@ -289,6 +289,7 @@ unsigned const char omp_clause_num_ops[]
   3, /* OMP_CLAUSE_LINEAR  */
   2, /* OMP_CLAUSE_ALIGNED  */
   1, /* OMP_CLAUSE_DEPEND  */
+  1, /* OMP_CLAUSE_NONTEMPORAL  */
   1, /* OMP_CLAUSE_UNIFORM  */
   1, /* OMP_CLAUSE_TO_DECLARE  */
   1, /* OMP_CLAUSE_LINK  */
@@ -362,6 +363,7 @@ const char * const omp_clause_code_name[
   "linear",
   "aligned",
   "depend",
+  "nontemporal",
   "uniform",
   "to",
   "link",
@@ -11528,6 +11530,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func
case OMP_CLAUSE_SCHEDULE:
case OMP_CLAUSE_UNIFORM:
case OMP_CLAUSE_DEPEND:
+   case OMP_CLAUSE_NONTEMPORAL:
case OMP_CLAUSE_NUM_TEAMS:
case OMP_CLAUSE_THREAD_LIMIT:
case OMP_CLAUSE_DEVICE:
--- gcc/gimplify.c.jj   2018-05-03 15:22:54.521416751 +0200
+++ gcc/gimplify.c  2018-05-04 19:08:55.309273302 +0200
@@ -111,6 +111,8 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: only copy back.  */
   GOVD_MAP_FROM_ONLY = 2097152,
 
+  GOVD_NONTEMPORAL = 4194304,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   

Re: [PATCH PR other/77609] Let the assembler choose ELF section types for miscellaneous named sections

2018-05-04 Thread Ian Lance Taylor via gcc-patches
On Sat, Apr 28, 2018 at 2:42 AM, Roland McGrath  wrote:
> I'm back for stage 1!
>
> The same patch from https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01549.html
> rebases cleanly and I didn't change anything but the date on the log entry
> since what I posted there.  The fresh rebase is on the roland/pr77609 git
> branch for your convenience.
>
> It has no check-gcc failures on x86_64-linux-gnu.
>
> OK to commit to trunk now?
>
> When will be the right time to raise the question of backporting it,
> perhaps shortly after the 8 release?

This is OK for trunk.  Thanks.

I think a backport would be OK after a month.

Ian


[RFA][fortran] Fix # handling in the Fortran front-end

2018-05-04 Thread Jeff Law

The Fortran front end has its own code to parse # <line> <file>
directives.  We've run into a case where it does not function correctly.
In particular when the directive changes the current file, subsequent
diagnostics still refer to the original filename.

Concretely take this code and compile it with -Wall:



# 12345 "foo-f"
SUBROUTINE s(dummy)
  INTEGER, INTENT(in) :: dummy
END SUBROUTINE


The "dummy" argument is unused and you'll get a diagnostic like:

-bash-4.3$ gfortran j.f90 -Wall
j.f90:12345:18:

Warning: Unused dummy argument ‘dummy’ at (1) [-Wunused-dummy-argument]

Note how we got the right line #, but the wrong file in the diagnostic.
It should look something like this:

[law@torsion gcc]$ ./gfortran -Wall -B./ j.f90
foo-f:12345:18:

Warning: Unused dummy argument ‘dummy’ at (1) [-Wunused-dummy-argument]

--


AFAICT the Fortran front-end has failed to notify the linemap interface
that the current filename has changed.

This patch (and testcase) fixes the problem by adding the missing call
to linemap_add.

Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

Jeff
diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c
index aab5379..55d6daf 100644
--- a/gcc/fortran/scanner.c
+++ b/gcc/fortran/scanner.c
@@ -2107,6 +2107,10 @@ preprocessor_line (gfc_char_t *c)
   in the linemap.  Alternative could be using GC or updating linemap to
   point to the new name, but there is no API for that currently.  */
   current_file->filename = xstrdup (filename);
+
+  /* We need to tell the linemap API that the filename changed.  Just
+changing current_file is insufficient.  */
+  linemap_add (line_table, LC_RENAME, false, current_file->filename, line);
 }
 
   /* Set new line number.  */
diff --git a/gcc/testsuite/gfortran.dg/linefile.f90 
b/gcc/testsuite/gfortran.dg/linefile.f90
new file mode 100644
index 000..7f1465a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/linefile.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-Wall" }
+
+# 4 "foo-f"
+SUBROUTINE s(dummy)
+  INTEGER, INTENT(in) :: dummy
+END SUBROUTINE
+! We want to check that the # directive changes the filename in the
+! diagnostic.  Nothing else really matters here.  dg-regexp allows us
+! to see the entire diagnostic.  We just have to make sure to consume
+! the entire message.
+! { dg-regexp "foo-f\[^\n]*" }


Re: [C++ PATCH] Fix value initialized decltype(nullptr) in constexpr (PR c++/85553)

2018-05-04 Thread Paolo Carlini
... thinking more about the issue, probably the best thing to do would 
be wrapping instead in a COMPOUND_EXPR, which would be also consistent 
with cp_convert_to_pointer. Thus I'm finishing testing the below (past 
g++.dg). How about it?


Thanks!
Paolo.

/
Index: cvt.c
===
--- cvt.c   (revision 259926)
+++ cvt.c   (working copy)
@@ -711,6 +711,15 @@ ocp_convert (tree type, tree expr, int convtype, i
   if (error_operand_p (e))
 return error_mark_node;
 
+  if (NULLPTR_TYPE_P (type) && null_ptr_cst_p (e))
+{
+  if (complain & tf_warning)
+   maybe_warn_zero_as_null_pointer_constant (e, loc);
+
+  return (TREE_SIDE_EFFECTS (e)
+ ? build2 (COMPOUND_EXPR, type, e, nullptr_node) : nullptr_node);
+}
+
   if (MAYBE_CLASS_TYPE_P (type) && (convtype & CONV_FORCE_TEMP))
 /* We need a new temporary; don't take this shortcut.  */;
   else if (same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (e)))
@@ -832,12 +841,6 @@ ocp_convert (tree type, tree expr, int convtype, i
   /* Ignore any integer overflow caused by the conversion.  */
   return ignore_overflows (converted, e);
 }
-  if (NULLPTR_TYPE_P (type) && e && null_ptr_cst_p (e))
-{
-  if (complain & tf_warning)
-   maybe_warn_zero_as_null_pointer_constant (e, loc);
-  return nullptr_node;
-}
   if (POINTER_TYPE_P (type) || TYPE_PTRMEM_P (type))
 return cp_convert_to_pointer (type, e, dofold, complain);
   if (code == VECTOR_TYPE)
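
The effect of wrapping in a COMPOUND_EXPR can be pictured with the C comma
operator, which evaluates its left operand for side effects and yields the
right one.  This is only an analogy written in plain C, not front-end code;
operand_with_side_effect, convert_keeping_side_effects and the calls counter
are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the operand has been evaluated.  */
static int calls;

/* Hypothetical operand: it has a side effect (bumping CALLS) and its value
   is a null pointer.  */
static void *
operand_with_side_effect (void)
{
  calls++;
  return NULL;
}

/* Analogue of build2 (COMPOUND_EXPR, type, e, nullptr_node): evaluate the
   operand so its side effect still happens, then yield a known null
   instead of the operand's own value.  */
static void *
convert_keeping_side_effects (void)
{
  void *result = (operand_with_side_effect (), (void *) NULL);
  assert (result == NULL);
  return result;
}
```

Returning the null constant alone would correspond to dropping the call, so
the counter would stay at zero; with the comma form it is incremented exactly
once per conversion.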


Re: [PATCH 7/8] [BRIGFE] phsa-specific optimizations

2018-05-04 Thread Pekka Jääskeläinen
Committed as r259957.

On Fri, May 4, 2018 at 5:02 PM, Pekka Jääskeläinen  wrote:
> Add flag -fassume-phsa that is on by default. If -fno-assume-phsa
> is given, these optimizations are disabled.
>
> With this flag, gccbrig can generate GENERIC that assumes we are
> targeting a phsa-runtime based implementation, which allows us
> to expose the work-item context accesses to retrieve WI IDs etc.
> which helps optimizers.
>
> First optimization that takes advantage of this is to get rid of
> the setworkitemid calls whenever we have non-inlined calls that
> use IDs internally.
>
> Other optimizations added in this commit:
>
> - expand absoluteid to similar level of simplicity as workitemid.
> At the moment absoluteid is the best indexing ID to end up with
> WG vectorization.
> - propagate ID variables closer to their uses. This is mainly
> to avoid known useless casts, which confuse at least scalar
> evolution analysis.
> - use signed long long for storing IDs. Unsigned integers have
> defined wraparound semantics, which confuse at least scalar
> evolution analysis, leading to unvectorizable WI loops.
> - also refactor some BRIG function generation helpers to brig_function.
> - no point in having the wi-loop as a for-loop. It's really
> a do...while and SCEV can analyze it just fine still.
> - add consts to ptrs etc. in BRIG builtin defs.
> Improves optimization opportunities.
> - add qualifiers to generated function parameters.
> Const and restrict on the hidden local/private pointers,
> the arg buffer and the context pointer help some optimizations.
> ---
>  gcc/brig-builtins.def  |  27 +-
>  gcc/brig/brigfrontend/brig-basic-inst-handler.cc   | 172 +---
>  gcc/brig/brigfrontend/brig-branch-inst-handler.cc  |  21 +-
>  gcc/brig/brigfrontend/brig-cmp-inst-handler.cc |   6 +-
>  gcc/brig/brigfrontend/brig-code-entry-handler.cc   | 503 +--
>  gcc/brig/brigfrontend/brig-code-entry-handler.h|  21 -
>  gcc/brig/brigfrontend/brig-control-handler.cc  |  20 +-
>  gcc/brig/brigfrontend/brig-cvt-inst-handler.cc |   6 +
>  gcc/brig/brigfrontend/brig-function-handler.cc |  89 +-
>  gcc/brig/brigfrontend/brig-function.cc | 925
> +++--
>  gcc/brig/brigfrontend/brig-function.h  |  43 +
>  gcc/brig/brigfrontend/brig-label-handler.cc|   3 +
>  gcc/brig/brigfrontend/brig-lane-inst-handler.cc|   2 +-
>  gcc/brig/brigfrontend/brig-mem-inst-handler.cc |   7 +-
>  gcc/brig/brigfrontend/phsa.h   |   9 +
>  gcc/brig/lang.opt  |   5 +
>  gcc/builtin-types.def  |   4 +
>  gcc/testsuite/brig.dg/test/gimple/smoke_test.hsail |  10 +-
>  libhsail-rt/include/internal/phsa-rt.h |   1 -
>  libhsail-rt/include/internal/workitems.h   |  50 +-
>  libhsail-rt/rt/workitems.c |  84 +-
>  21 files changed, 1195 insertions(+), 813 deletions(-)
>


Re: [PATCH 8/8] [BRIGFE] Fix handling of NOPs

2018-05-04 Thread Pekka Jääskeläinen
Committed as r259958.

On Fri, May 4, 2018 at 5:02 PM, Pekka Jääskeläinen  wrote:
> ---
>  gcc/brig/brigfrontend/brig-basic-inst-handler.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>


Re: [C++ PATCH] Fix value initialized decltype(nullptr) in constexpr (PR c++/85553)

2018-05-04 Thread Jason Merrill
On Fri, May 4, 2018 at 2:06 PM, Paolo Carlini  wrote:
> Hi,
>
>
> On 04/05/2018 19:45, Jason Merrill wrote:
>>
>> On Sun, Apr 29, 2018 at 3:23 AM, Paolo Carlini
>> wrote:
>>>
>>> Hi,
>>>
>>> On 28/04/2018 18:41, Jason Merrill wrote:

>>>> On Fri, Apr 27, 2018 at 7:26 PM, Paolo Carlini
>>>> wrote:
>>>>>
>>>>> Hi again,
>>>>>
>>>>> I'm now pretty sure that we have a latent issue in ocp_convert. The bug
>>>>> fixed by Jakub shows that we used to not have issues with
>>>>> integer_zero_node.
>>>>> That's easy to explain: at the beginning of ocp_convert there is code
>>>>> which
>>>>> handles first some special / simple cases when
>>>>> same_type_ignoring_top_level_qualifiers_p is true. That code isn't of
>>>>> course
>>>>> used for integer_zero_node as source expression, which therefore is
>>>>> handled
>>>>> by:
>>>>>
>>>>>    if (NULLPTR_TYPE_P (type) && e && null_ptr_cst_p (e))
>>>>>      {
>>>>>        if (complain & tf_warning)
>>>>>          maybe_warn_zero_as_null_pointer_constant (e, loc);
>>>>>        return nullptr_node;
>>>>>      }
>>>>
>>>> Maybe we should move this code up, then.
>>>
>>> You are totally right. Yesterday I realized that and tested on
>>> x86_64-linux
>>> the below, both with and without Jakub's fix.
>>
>> +  if (!TREE_SIDE_EFFECTS (e))
>> +return nullptr_node;
>>
>> So what happens if e has side-effects?
>
> In that case nothing should change wrt the status quo, that is, the "fast
> path" wrapping the thing in a NOP_EXPR. That only for NULLPTR_TYPE_P nodes,
> I don't think that can happen for integer_zero_nodes. I must say, if I take
> out the check there are no regressions, but using it seems consistent with
> decay_conversion, where not having the check caused a real wrong code bug.
> What do you think? Maybe an alternative would be returning immediately e
> as-is?!?

The patch is OK as is.

Jason


Re: [RFA][fortran] Fix # handling in the Fortran front-end

2018-05-04 Thread Steve Kargl
On Fri, May 04, 2018 at 01:32:00PM -0600, Jeff Law wrote:
> 
> The Fortran front end has its own code to parse # <line> <file>
> directives.  We've run into a case where it does not function correctly.
>  In particular when the directive changes the current file, subsequent
> diagnostics still refer to the original filename.
> 
> Concretely take this code and compile it with -Wall:
> 
> # 12345 "foo-f"
> SUBROUTINE s(dummy)
>   INTEGER, INTENT(in) :: dummy
> END SUBROUTINE

Can you tell us where the above comes from?

If I have the three lines of code in a file
named 'a.inc', and use either the Fortran
INCLUDE statement or a cpp #include statement
I get what I expect.

% cat a.f90
module bar
  contains
  include 'a.inc'
end module bar
% gfcx -c -Wall a.f90
a.inc:1:18:

 SUBROUTINE s(dummy)
  1
Warning: Unused dummy argument 'dummy' at (1) [-Wunused-dummy-argument]

% cat a.F90
module bar
  contains
#include "a.inc"
end module bar
% gfcx -c -Wall a.F90
a.inc:1:18:
 
 SUBROUTINE s(dummy)
  1
Warning: Unused dummy argument 'dummy' at (1) [-Wunused-dummy-argument]

> 
> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
> 

I don't have any objection to the patch, but I would like
to understand where '# 12345 "foo-f"' comes from.

-- 
Steve


Re: [RFA][fortran] Fix # handling in the Fortran front-end

2018-05-04 Thread Jeff Law
On 05/04/2018 01:55 PM, Steve Kargl wrote:
> On Fri, May 04, 2018 at 01:32:00PM -0600, Jeff Law wrote:
>>
>> The Fortran front end has its own code to parse # <line> <file>
>> directives.  We've run into a case where it does not function correctly.
>>  In particular when the directive changes the current file, subsequent
>> diagnostics still refer to the original filename.
>>
>> Concretely take this code and compile it with -Wall:
>>
>> # 12345 "foo-f"
>> SUBROUTINE s(dummy)
>>   INTEGER, INTENT(in) :: dummy
>> END SUBROUTINE
> 
> Can you tell us where the above comes from?
Use of # <line> <file> is a standard CPP directive one can use to change
the compiler's notion of file/line.   It's defined by ISO for C/C++ and
appears to be relatively common in other Fortran compilers.  GNU Fortran
tries to handle it and just gets it slightly wrong.

It's most typically used when source code is generated by another program.




> 
> If I have the three lines of code in a file
> named 'a.inc', and use either the Fortran
> INCLUDE statement or a cpp #include statement
> I get what I expect.
Right.  That uses a slightly different form internally.  It'll get
turned into

# <line> <file> <flags>

Where the flags will indicate entry/exit from included files.  That form
is handled correctly by the GNU Fortran front end.  But this form is not
for external use.
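
As a rough illustration of the behaviour described above (not the gfortran
implementation), the Python sketch below tracks the file and line that a
diagnostic should report after a directive of the form # <line> <file>.
The function name locate_lines and the initial file name "j.f90" are
hypothetical; any trailing flags are accepted but ignored here.

```python
import re

# Matches a cpp-style linemarker such as:  # 12345 "foo-f"  [flags...]
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"((?:\s+\d+)*)\s*$')


def locate_lines(text, initial_file):
    """Yield (file, line, source_text) for every non-directive line.

    A linemarker sets both the current file name and the number of the
    line that immediately follows it.
    """
    current_file = initial_file
    current_line = 1
    for raw in text.splitlines():
        match = LINEMARKER.match(raw)
        if match:
            current_line = int(match.group(1))
            current_file = match.group(2)
            continue
        yield current_file, current_line, raw
        current_line += 1


# The test case from this thread; "j.f90" is an assumed original file name.
SOURCE = '''# 12345 "foo-f"
SUBROUTINE s(dummy)
  INTEGER, INTENT(in) :: dummy
END SUBROUTINE
'''

locations = list(locate_lines(SOURCE, "j.f90"))
```

With this model, a warning on the SUBROUTINE line would be attributed to
foo-f, line 12345, rather than to the file named on the command line.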


Jeff


Re: [RFA][fortran] Fix # handling in the Fortran front-end

2018-05-04 Thread Steve Kargl
On Fri, May 04, 2018 at 02:05:11PM -0600, Jeff Law wrote:
> On 05/04/2018 01:55 PM, Steve Kargl wrote:
> > On Fri, May 04, 2018 at 01:32:00PM -0600, Jeff Law wrote:
> >>
> >> The Fortran front end has its own code to parse # <line> <file>
> >> directives.  We've run into a case where it does not function correctly.
> >>  In particular when the directive changes the current file, subsequent
> >> diagnostics still refer to the original filename.
> >>
> >> Concretely take this code and compile it with -Wall:
> >>
> >> # 12345 "foo-f"
> >> SUBROUTINE s(dummy)
> >>   INTEGER, INTENT(in) :: dummy
> >> END SUBROUTINE
> > 
> > Can you tell us where the above comes from?
> Use of # <line> <file> is a standard CPP directive one can use to change
> the compiler's notion of file/line.   It's defined by ISO for C/C++ and
> appears to be relatively common in other Fortran compilers.  GNU Fortran
> tries to handle it and just gets it slightly wrong.
> 
> It's most typically used when source code is generated by another program.
> 
> > If I have the three lines of code in a file
> > named 'a.inc', and use either the Fortran
> > INCLUDE statement or a cpp #include statement
> > I get what I expect.
> Right.  That uses a slightly different form internally.  It'll get
> turned into
> 
> # <line> <file> <flags>
> 
> Where the flags will indicate entry/exit from included files.  That form
> is handled correctly by the GNU Fortran front end.  But this form is not
> for external use.

Thanks for the explanation.  The patch looks good to me.

-- 
Steve


[PATCH, aarch64] Patch to update pipeline descriptions in thunderx2t99.md

2018-05-04 Thread Steve Ellcey
Now that we are in stage 1 again, here is an update to my earlier t99
scheduling file patch for thunderx2t99.md.  There were some instruction
types (mostly asimd) that did not have schedules and other types that had
duplicate schedules.  With this patch there should be one schedule for
every type and no duplicates.
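
One way to check the "one schedule for every type and no duplicates"
property is sketched below in Python.  This is an assumption about how
such a check could look, not a script used for this patch: the helper
names are hypothetical, the sample text is abbreviated from the patched
file, and the set of all instruction types would have to come from the
target's "type" attribute definition.

```python
import re

# Captures the reservation name and the first "type" attribute list after it.
RESERVATION = re.compile(
    r'\(define_insn_reservation\s+"([^"]+)".*?\(eq_attr\s+"type"\s+"([^"]*)"',
    re.DOTALL,
)


def types_by_reservation(md_text):
    """Map each reservation name to the list of insn types it schedules."""
    result = {}
    for name, type_list in RESERVATION.findall(md_text):
        # Backslash-newline continues a string in .md files.
        cleaned = type_list.replace("\\\n", "")
        result[name] = [t.strip() for t in cleaned.split(",") if t.strip()]
    return result


def find_duplicates(mapping):
    """Return {type: [reservations]} for types scheduled more than once."""
    owners = {}
    for name, types in mapping.items():
        for insn_type in types:
            owners.setdefault(insn_type, []).append(name)
    return {t: names for t, names in owners.items() if len(names) > 1}


def find_missing(mapping, all_types):
    """Return the sorted types that no reservation covers."""
    covered = {t for types in mapping.values() for t in types}
    return sorted(set(all_types) - covered)


# Abbreviated sample; reservation strings are shortened for illustration.
SAMPLE = r'''
(define_insn_reservation "thunderx2t99_multiple" 1
  (and (eq_attr "tune" "thunderx2t99")
       (eq_attr "type" "multiple,untyped"))
  "thunderx2t99_i0")

(define_insn_reservation "thunderx2t99_alu_shift" 2
  (and (eq_attr "tune" "thunderx2t99")
       (eq_attr "type" "alu_shift_imm,alu_ext,\
                        alus_shift_imm,alus_ext,\
                        logic_shift_imm,logics_shift_imm"))
  "thunderx2t99_i012,thunderx2t99_i012")
'''

mapping = types_by_reservation(SAMPLE)
duplicates = find_duplicates(mapping)
```

An empty result from find_duplicates, together with an empty result from
find_missing against the full type list, would correspond to the property
stated above.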

I did some SPEC2017 runs on a T99 to see if this had any significant
performance impact but it did not appear to.  The performance
differences were small and within the range of results I had gotten
before.  I would still like to check this in though in order to have
a complete and correct schedule file for T99.

Steve Ellcey
sell...@cavium.com

2018-05-04  Steve Ellcey  

* config/aarch64/thunderx2t99.md (thunderx2t99_ls_both): Delete.
(thunderx2t99_multiple): Delete pseudo-units from used cpus.
Add untyped.
(thunderx2t99_alu_shift): Remove alu_shift_reg, alus_shift_reg.
Change logics_shift_reg to logics_shift_imm.
(thunderx2t99_loadpair): Fix cpu unit ordering.
(thunderx2t99_fp_loadpair_basic): Delete.
(thunderx2t99_fp_storepair_basic): Delete.
(thunderx2t99_asimd_int): Add neon_sub and neon_sub_q types.
(thunderx2t99_asimd_polynomial): Delete.
(thunderx2t99_asimd_fp_simple): Add neon_fp_mul_s_scalar_q
and neon_fp_mul_d_scalar_q.
(thunderx2t99_asimd_fp_conv): Add *int_to_fp* types.
(thunderx2t99_asimd_misc): Delete neon_dup and neon_dup_q.
(thunderx2t99_asimd_recip_step): Add missing *sqrt* types.
(thunderx2t99_asimd_lut): Add missing tbl types.
(thunderx2t99_asimd_ext): Delete.
(thunderx2t99_asimd_load1_1_mult): Delete.
(thunderx2t99_asimd_load1_2_mult): Delete.
(thunderx2t99_asimd_load1_ldp): New.
(thunderx2t99_asimd_load1): New.
(thunderx2t99_asimd_load2): Add missing *load2* types.
(thunderx2t99_asimd_load3): New.
(thunderx2t99_asimd_load4): New.
(thunderx2t99_asimd_store1_1_mult): Delete.
(thunderx2t99_asimd_store1_2_mult): Delete.
(thunderx2t99_asimd_store2_mult): Delete.
(thunderx2t99_asimd_store2_onelane): Delete.
(thunderx2t99_asimd_store_stp): New.
(thunderx2t99_asimd_store1): New.
(thunderx2t99_asimd_store2): New.
(thunderx2t99_asimd_store3): New.
(thunderx2t99_asimd_store4): New.

diff --git a/gcc/config/aarch64/thunderx2t99.md b/gcc/config/aarch64/thunderx2t99.md
index 589e564..eee2896 100644
--- a/gcc/config/aarch64/thunderx2t99.md
+++ b/gcc/config/aarch64/thunderx2t99.md
@@ -54,8 +54,6 @@
 (define_reservation "thunderx2t99_ls01" "thunderx2t99_ls0|thunderx2t99_ls1")
 (define_reservation "thunderx2t99_f01" "thunderx2t99_f0|thunderx2t99_f1")
 
-(define_reservation "thunderx2t99_ls_both" "thunderx2t99_ls0+thunderx2t99_ls1")
-
 ; A load with delay in the ls0/ls1 pipes.
 (define_reservation "thunderx2t99_l0delay" "thunderx2t99_ls0,\
   thunderx2t99_ls0d1,thunderx2t99_ls0d2,\
@@ -86,12 +84,10 @@
 
 (define_insn_reservation "thunderx2t99_multiple" 1
   (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "multiple"))
+   (eq_attr "type" "multiple,untyped"))
   "thunderx2t99_i0+thunderx2t99_i1+thunderx2t99_i2+thunderx2t99_ls0+\
thunderx2t99_ls1+thunderx2t99_sd+thunderx2t99_i1m1+thunderx2t99_i1m2+\
-   thunderx2t99_i1m3+thunderx2t99_ls0d1+thunderx2t99_ls0d2+thunderx2t99_ls0d3+\
-   thunderx2t99_ls1d1+thunderx2t99_ls1d2+thunderx2t99_ls1d3+thunderx2t99_f0+\
-   thunderx2t99_f1")
+   thunderx2t99_i1m3+thunderx2t99_f0+thunderx2t99_f1")
 
 ;; Integer arithmetic/logic instructions.
 
@@ -113,9 +109,9 @@
 
 (define_insn_reservation "thunderx2t99_alu_shift" 2
   (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "alu_shift_imm,alu_ext,alu_shift_reg,\
-			alus_shift_imm,alus_ext,alus_shift_reg,\
-			logic_shift_imm,logics_shift_reg"))
+   (eq_attr "type" "alu_shift_imm,alu_ext,\
+			alus_shift_imm,alus_ext,\
+			logic_shift_imm,logics_shift_imm"))
   "thunderx2t99_i012,thunderx2t99_i012")
 
 (define_insn_reservation "thunderx2t99_div" 13
@@ -150,7 +146,7 @@
 (define_insn_reservation "thunderx2t99_loadpair" 5
   (and (eq_attr "tune" "thunderx2t99")
(eq_attr "type" "load_8,load_16"))
-  "thunderx2t99_i012,thunderx2t99_ls01")
+  "thunderx2t99_ls01,thunderx2t99_i012")
 
 (define_insn_reservation "thunderx2t99_store_basic" 1
   (and (eq_attr "tune" "thunderx2t99")
@@ -228,21 +224,11 @@
(eq_attr "type" "f_loads,f_loadd"))
   "thunderx2t99_ls01")
 
-(define_insn_reservation "thunderx2t99_fp_loadpair_basic" 4
-  (and (eq_attr "tune" "thunderx2t99")
-   (eq_attr "type" "neon_load1_2reg"))
-  "thunderx2t99_ls01*2")
-
 (define_insn_reservation "thunderx2t99_fp_store_basic" 1
   (and (eq_attr "tune" "thunderx2t99")
(eq_attr "type" "f_stores,f_stored"))
   "thunderx2t99_ls01,thunderx2t99_sd")
 
-(define_insn_reservation "thunderx2t99_fp_storepair_basic"

[PATCH] RISC-V: Use new linker emulations for glibc ABI.

2018-05-04 Thread Jim Wilson
I've submitted a binutils patch that adds some new linker emulations to fix
a linker problem with library paths.  The rv64/lp64d linker looks in /lib64
when glibc says it should look in /lib64/lp64d.  To make the binutils patch
work, I had to add 4 new emulations because we have 6 ABIs.  This patch
modifies the compiler to use the new linker emulations in the linux port.  This
was done in a backwards compatible way, so the linker still looks in the
original dir after the ABI specific dir, and I didn't change the emulation
names for the default lp64d and ilp32d ABIs.

This was tested with riscv{32,64}-{elf,linux} builds and tests.  There were
no regressions.

This will go in after the binutils patch goes in.

Jim

gcc/
* config/riscv/linux.h (MUSL_ABI_SUFFIX): Delete unnecessary backslash.
(LD_EMUL_SUFFIX): New.
---
 gcc/config/riscv/linux.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
index aa8a28d5d31..85561846dad 100644
--- a/gcc/config/riscv/linux.h
+++ b/gcc/config/riscv/linux.h
@@ -30,7 +30,7 @@ along with GCC; see the file COPYING3.  If not see
   "%{mabi=ilp32d:}" \
   "%{mabi=lp64:-sf}" \
   "%{mabi=lp64f:-sp}" \
-  "%{mabi=lp64d:}" \
+  "%{mabi=lp64d:}"
 
 #undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-riscv" XLEN_SPEC MUSL_ABI_SUFFIX ".so.1"
@@ -49,8 +49,16 @@ along with GCC; see the file COPYING3.  If not see
 
 #define CPP_SPEC "%{pthread:-D_REENTRANT}"
 
+#define LD_EMUL_SUFFIX \
+  "%{mabi=lp64d:}" \
+  "%{mabi=lp64f:_lp64f}" \
+  "%{mabi=lp64:_lp64}" \
+  "%{mabi=ilp32d:}" \
+  "%{mabi=ilp32f:_ilp32f}" \
+  "%{mabi=ilp32:_ilp32}"
+
 #define LINK_SPEC "\
--melf" XLEN_SPEC "lriscv \
+-melf" XLEN_SPEC "lriscv" LD_EMUL_SUFFIX " \
 %{mno-relax:--no-relax} \
 %{shared} \
   %{!shared: \
-- 
2.14.1
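
To make the effect of the LINK_SPEC change concrete, the Python sketch
below models the emulation name passed to the linker with -m.  It is
only an illustration: it assumes XLEN_SPEC expands to 32 or 64, the
suffix table is copied from LD_EMUL_SUFFIX in the diff above, and the
function name linker_emulation is hypothetical.

```python
# Suffixes taken from LD_EMUL_SUFFIX in the patch; the default double-float
# ABIs (lp64d, ilp32d) keep the original, unsuffixed emulation names.
LD_EMUL_SUFFIX = {
    "lp64d": "",
    "lp64f": "_lp64f",
    "lp64": "_lp64",
    "ilp32d": "",
    "ilp32f": "_ilp32f",
    "ilp32": "_ilp32",
}


def linker_emulation(xlen, mabi):
    """Return the name built by: "-melf" XLEN_SPEC "lriscv" LD_EMUL_SUFFIX.

    xlen is assumed to be what XLEN_SPEC expands to (32 or 64).
    """
    return f"elf{xlen}lriscv{LD_EMUL_SUFFIX[mabi]}"


default_rv64 = linker_emulation(64, "lp64d")
soft_float_rv32 = linker_emulation(32, "ilp32")
```

Under these assumptions the default ABIs still select the existing
emulations, which is what keeps the change backwards compatible.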



Re: [PATCH] Backport of RISC-V support for libffi

2018-05-04 Thread Jim Wilson

On 03/20/2018 03:50 AM, Andreas Schwab wrote:
> This is a backport of 
> (the only difference to upstream is s/FFI_ALIGN/ALIGN/, as we don't have
> commit bd72848c7a).  This is needed for libgo.

OK.

You might have to update the patch, as at least one other libffi patch 
has been backported recently, which might affect changes to common files.


Jim