[RFC] fwprop address cost changes

2018-07-11 Thread Robin Dapp
Hi,

we recently hit a problem where fwprop would not propagate a memory
address into an insn because our backend (s390) tells it that the
address_cost ()s for an address with index are higher than for one
without. Subsequently, should_replace_address () returns false and no
propagation is performed.

This check seems to be just an early bail-out since, when it is disabled,
try_fwprop_subst () still checks the src costs and would allow the
propagation.  The problem, though, is that the cost check relies on the
propagated-into insn already having been created, so it cannot be
performed at the point where should_replace_address () is being
called.
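
For a concrete illustration (my example, not from the original report), the
kind of source affected looks like:

    /* fwprop could fold the address computation into the MEM, giving a
       single indexed load, but bails out when the target's address_cost ()
       rates the base+index form as more expensive than the plain one.  */
    long
    f (long *p, long i)
    {
      long *q = p + i;   /* separate address computation */
      return *q;         /* candidate MEM for propagating "p + i" into */
    }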

In this patch I quickly worked around this by adding an update flag to
try_fwprop_subst () that, when set to false, does not actually commit
the propagation but still checks costs.  I'm sure there is a better and
much smaller way and I don't intend to apply this in its current state
(there's a lot of boilerplate code to keep the default behavior), but it
might serve as a basis for discussion/ideas.  Richard mentioned the
insn_cost hook, but this would require the insn to exist as well.

Regards
 Robin
diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 0fca0f1edbc..6eeb77b93ed 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -392,7 +392,7 @@ canonicalize_address (rtx x)
 
 static bool
 should_replace_address (rtx old_rtx, rtx new_rtx, machine_mode mode,
-			addr_space_t as, bool speed)
+			addr_space_t as, bool speed, bool cost_addr)
 {
   int gain;
 
@@ -405,8 +405,11 @@ should_replace_address (rtx old_rtx, rtx new_rtx, machine_mode mode,
 return true;
 
   /* Prefer the new address if it is less expensive.  */
-  gain = (address_cost (old_rtx, mode, as, speed)
-	  - address_cost (new_rtx, mode, as, speed));
+  if (cost_addr)
+    gain = (address_cost (old_rtx, mode, as, speed)
+	    - address_cost (new_rtx, mode, as, speed));
+  else
+    gain = 0;
 
   /* If the addresses have equivalent cost, prefer the new address
  if it has the highest `set_src_cost'.  That has the potential of
@@ -448,6 +451,10 @@ enum {
   PR_OPTIMIZE_FOR_SPEED = 4
 };
 
+static bool
+try_fwprop_subst (df_ref use, rtx *loc, rtx new_rtx, rtx_insn *def_insn,
+		  bool set_reg_equal, bool update);
+
 
 /* Replace all occurrences of OLD in *PX with NEW and try to simplify the
resulting expression.  Replace *PX with a new RTL expression if an
@@ -458,7 +465,11 @@ enum {
that is because there is no simplify_gen_* function for LO_SUM).  */
 
 static bool
-propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
+propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags);
+
+static bool
+propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags,
+		 rtx *loc, df_ref use, rtx_insn *def_insn, bool set_reg_equal)
 {
   rtx x = *px, tem = NULL_RTX, op0, op1, op2;
   enum rtx_code code = GET_CODE (x);
@@ -491,7 +502,8 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
 case RTX_UNARY:
   op0 = XEXP (x, 0);
   op_mode = GET_MODE (op0);
-  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags);
+  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags,
+				loc, use, def_insn, set_reg_equal);
   if (op0 == XEXP (x, 0))
 	return true;
   tem = simplify_gen_unary (code, mode, op0, op_mode);
@@ -501,8 +513,10 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
 case RTX_COMM_ARITH:
   op0 = XEXP (x, 0);
   op1 = XEXP (x, 1);
-  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags);
-  valid_ops &= propagate_rtx_1 (&op1, old_rtx, new_rtx, flags);
+  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags,
+				loc, use, def_insn, set_reg_equal);
+  valid_ops &= propagate_rtx_1 (&op1, old_rtx, new_rtx, flags,
+				loc, use, def_insn, set_reg_equal);
   if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
 	return true;
   tem = simplify_gen_binary (code, mode, op0, op1);
@@ -513,8 +527,10 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
   op0 = XEXP (x, 0);
   op1 = XEXP (x, 1);
   op_mode = GET_MODE (op0) != VOIDmode ? GET_MODE (op0) : GET_MODE (op1);
-  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags);
-  valid_ops &= propagate_rtx_1 (&op1, old_rtx, new_rtx, flags);
+  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags,
+				loc, use, def_insn, set_reg_equal);
+  valid_ops &= propagate_rtx_1 (&op1, old_rtx, new_rtx, flags,
+				loc, use, def_insn, set_reg_equal);
   if (op0 == XEXP (x, 0) && op1 == XEXP (x, 1))
 	return true;
   tem = simplify_gen_relational (code, mode, op_mode, op0, op1);
@@ -526,9 +542,12 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
   op1 = XEXP (x, 1);
   op2 = XEXP (x, 2);
   op_mode = GET_MODE (op0);
-  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags);
-  valid_ops &= propagate_rtx_1 (&op1, old_rtx, new_rtx, flags);
-

[PATCH][C family] Fix PR86453

2018-07-11 Thread Richard Biener


This fixes handle_packed_attribute creating a type variant which differs
in TYPE_PACKED.  This cannot be generally allowed since TYPE_PACKED
affects layout and layout is shared between variants.

For the testcase in question the attribute itself is later ignored
but TYPE_PACKED is still applied which eventually leads to an ICE
in type verification (that isn't applied very reliably).
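
As a minimal sketch of the underlying issue (my example, not from the patch;
the attached testcase exercises the same path through a field attribute):

    /* The struct is laid out once, with padding; variants share that
       layout, so a "packed" variant created after the fact cannot be
       honored, and the attribute is now ignored with a warning.  */
    struct A { char c; int i; };                  /* laid out with padding */
    typedef struct A PA __attribute__ ((packed)); /* attribute ignored */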

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2018-07-11  Richard Biener  

PR c/86453
* c-attribs.c (handle_packed_attribute): Do not build a variant
type with TYPE_PACKED, instead ignore the attribute if we may
not apply it to the original type.

* g++.dg/warn/pr86453.C: New testcase.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index f91add488bb..8cb87eb8154 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -502,8 +502,13 @@ handle_packed_attribute (tree *node, tree name, tree ARG_UNUSED (args),
   if (TYPE_P (*node))
 {
   if (!(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
-   *node = build_variant_type_copy (*node);
-  TYPE_PACKED (*node) = 1;
+   {
+ warning (OPT_Wattributes,
+  "%qE attribute ignored for type %qT", name, *node);
+ *no_add_attrs = true;
+   }
+  else
+   TYPE_PACKED (*node) = 1;
 }
   else if (TREE_CODE (*node) == FIELD_DECL)
 {
Index: gcc/testsuite/g++.dg/warn/pr86453.C
===
--- gcc/testsuite/g++.dg/warn/pr86453.C (nonexistent)
+++ gcc/testsuite/g++.dg/warn/pr86453.C (working copy)
@@ -0,0 +1,5 @@
+// { dg-do compile }
+// { dg-additional-options "-flto" { target lto } }
+struct X {
+  int *__attribute__((aligned(2), packed)) a; // { dg-warning "attribute ignored" }
+} b;


Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-11 Thread Andre Vieira (lists)
On 09/07/18 22:44, Martin Sebor wrote:
> On 07/09/2018 06:40 AM, Richard Biener wrote:
>> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:
>>>
>>> On 07/06/2018 09:52 AM, Richard Biener wrote:
 On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
>
> GCC folds accesses to members of constant aggregates except
> for character arrays/strings.  For example, the strlen() call
> below is not folded:
>
>const char a[][4] = { "1", "12" };
>
>int f (void) { return strlen (a[1]); }
>
> The attached change set enhances the string_constant() function
> to make it possible to extract string constants from aggregate
> initializers (CONSTRUCTORS).
>
> The initial solution was much simpler but as is often the case,
> MEM_REF made it fail to fold things like:
>
>int f (void) { return strlen (a[1] + 1); }
>
> Handling those made the project a bit more interesting and
> the final solution somewhat more involved.
>
> To handle offsets into aggregate string members the patch also
> extends the fold_ctor_reference() function to extract entire
> string array initializers even if the offset points past
> the beginning of the string and even though the size and
> exact type of the reference are not known (there isn't enough
> information in a MEM_REF to determine that).
>
> Tested along with the patch for PR 86415 on x86_64-linux.

 +  if (TREE_CODE (init) == CONSTRUCTOR)
 +   {
 + tree type;
 + if (TREE_CODE (arg) == ARRAY_REF
 + || TREE_CODE (arg) == MEM_REF)
 +   type = TREE_TYPE (arg);
 + else if (TREE_CODE (arg) == COMPONENT_REF)
 +   {
 + tree field = TREE_OPERAND (arg, 1);
 + type = TREE_TYPE (field);
 +   }
 + else
 +   return NULL_TREE;

 what's wrong with just

 type = TREE_TYPE (field);
>>>
>>> In response to your comment below about size I simplified things
>>> further so determining the type a priori is no longer necessary.
>>>
 ?

 + base_off *= BITS_PER_UNIT;

 poly_uint64 isn't enough for "bits", with wide-int you'd use
 offset_int,
 for poly you'd then use poly_offset?
>>>
>>> Okay, I tried to avoid the overflow.  (Converting between all
>>> these flavors of wide int types is a monumental PITA.)
>>>

 You extend fold_ctor_reference to treat size == 0 specially but then
 bother to compute a size here - that looks unneeded?
>>>
>>> Yes, well spotted, thanks!  I simplified the code so this isn't
>>> necessary, and neither is the type.
>>>

 While the offset of the reference determines the first field in the
 CONSTRUCTOR, how do you know the access doesn't touch
 adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
 so consider

   char x[2][4] = { "abcd", "abcd" };

 and MEM[&x] with a char[8] type?  memcpy "inlining" will create
 such MEMs for example.
>>>
>>> The code is only used to find string constants in initializer
>>> expressions where I don't think the size of the access comes
>>> into play.  If a memcpy() call results in a MEM_REF[char[8],
>>> &x, 8] that's fine.  It's a valid reference and we can still
>>> get the underlying character sequence (which is represented
>>> as two STRING_CSTs with the two string literals).  I might
>>> be missing the point of your question.
>>
>> Maybe irrelevant for strlen folding depending on what you do
>> for missing '\0' termination.
>>

 @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
 ctor,
tree byte_offset = DECL_FIELD_OFFSET (cfield);
tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
tree field_size = DECL_SIZE (cfield);
 -  offset_int bitoffset;
 -  offset_int bitoffset_end, access_end;
 +
 +  if (!field_size && TREE_CODE (cval) == STRING_CST)
 +   {
 + /* Determine the size of the flexible array member from
 +the size of the string initializer provided for it.  */
 + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
 + tree eltype = TREE_TYPE (TREE_TYPE (cval));
 + len *= tree_to_uhwi (TYPE_SIZE (eltype));
 + field_size = build_int_cst (size_type_node, len);
 +   }

 Why does this only apply to STRING_CST initializers and not
 CONSTRUCTORS,
 say, for

 struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>>>
>>> I can't think of a use for it.  Do you have something in mind?
>>
>> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
>> which is useful in other parts of the compiler.  So I don't see why
>> it shouldn't work for general flex-arrays.
>>
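
(For illustration, an example of the flexible-array cases being discussed —
mine, not the thread's:)

    /* The quoted hunk sizes a flexible array member from its STRING_CST
       initializer when DECL_SIZE is absent; the question is why a
       CONSTRUCTOR-initialized flexible array shouldn't get the same
       treatment.  Flexible-array-member initialization is a GNU extension.  */
    struct S1 { int n; char name[]; };
    static const struct S1 s1 = { 3, "abc" };        /* size from the string */

    struct S2 { int i; int a[]; };
    static struct S2 s2 = { 1, { 2, 3, 4, 5, 6 } };  /* size from the ctor */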

 ?  And why not use simply

   field_size = TYPE_SIZE (TRE

Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-11 Thread Jackson Woodruff

Hi Kyrill,


On 07/10/2018 10:55 AM, Kyrill Tkachov wrote:

Hi Jackson,

On 10/07/18 09:37, Jackson Woodruff wrote:

Hi all,

This patch removes some duplicated code.  Since this method deals with
four loads or stores, there is a lot of duplicated code that can easily
be replaced with smaller loops.

Regtest and bootstrap OK.

OK for trunk?



This looks like a good cleanup. There are no functional changes, right?

Yes, there are no functional changes in this patch.

Looks good to me, but you'll need approval from a maintainer.

Thanks,
Kyrill


Thanks,

Jackson

Changelog:

gcc/

2018-06-28  Jackson Woodruff 

	* config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp):
	Use arrays instead of numbered variables.







[PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-11 Thread Matthew Malcomson

Hi there,

The current RTL patterns for widening addition and subtraction instructions
in aarch64-simd.md use a code iterator (ADDSUB in the diff below) to make
their definition more compact.
This approach means that the `minus` and `plus` cases have their operands in
the same order, which causes problems in matching.
The `minus` case needs the more complex operand second to be semantically
correct, but the `plus` case needs the more complex operand first to be in
canonical form.

This patch splits the RTL patterns into two, one for `plus` and one for
`minus` with differing operand order to match their differing requirements.
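
As an illustration (my example, not from the patch), the kind of source these
patterns ultimately serve is a widening accumulation loop:

    /* Sketch: 16-bit elements are sign-extended and accumulated into 32-bit
       lanes; on AArch64 this can vectorize to the saddw/saddw2 forms these
       patterns provide.  The commutative plus canonically wants the extended
       (more complex) operand first; minus must keep it second.  */
    void
    widen_accumulate (int *restrict acc, short *restrict a, int n)
    {
      for (int i = 0; i < n; i++)
        acc[i] += a[i];
    }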


Bootstrapped and tested on aarch64-none-linux-gnu.

OK for trunk?

gcc/testsuite/ChangeLog:
2018-07-10  Matthew Malcomson  

    * gcc.target/aarch64/simd/vect_su_add_sub.c: New.

gcc/ChangeLog:
2018-07-10  Matthew Malcomson  

    * config/aarch64/aarch64-simd.md
(aarch64_w): Split into...
    (aarch64_subw): ... This...
    (aarch64_addw): ... And this.
(aarch64_w_internal): Split into...
    (aarch64_subw_internal): ... This...
    (aarch64_addw_internal): ... And this.
(aarch64_w2_internal): Split into...
    (aarch64_subw2_internal): ... This...
    (aarch64_addw2_internal): ... And this.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3302,38 +3302,78 @@
   DONE;
 })
 
-(define_insn "aarch64_w"
+(define_insn "aarch64_subw"
   [(set (match_operand: 0 "register_operand" "=w")
-(ADDSUB: (match_operand: 1 "register_operand" "w")
-			(ANY_EXTEND:
-			  (match_operand:VD_BHSI 2 "register_operand" "w"]
+			(minus:
+			 (match_operand: 1 "register_operand" "w")
+			 (ANY_EXTEND:
+			   (match_operand:VD_BHSI 2 "register_operand" "w"]
   "TARGET_SIMD"
-  "w\\t%0., %1., %2."
-  [(set_attr "type" "neon__widen")]
+  "subw\\t%0., %1., %2."
+  [(set_attr "type" "neon_sub_widen")]
 )
 
-(define_insn "aarch64_w_internal"
+(define_insn "aarch64_subw_internal"
   [(set (match_operand: 0 "register_operand" "=w")
-(ADDSUB: (match_operand: 1 "register_operand" "w")
-			(ANY_EXTEND:
-			  (vec_select:
-			   (match_operand:VQW 2 "register_operand" "w")
-			   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")]
+		(minus:
+		 (match_operand: 1 "register_operand" "w")
+		 (ANY_EXTEND:
+		   (vec_select:
+		(match_operand:VQW 2 "register_operand" "w")
+		(match_operand:VQW 3 "vect_par_cnst_lo_half" "")]
   "TARGET_SIMD"
-  "w\\t%0., %1., %2."
-  [(set_attr "type" "neon__widen")]
+  "subw\\t%0., %1., %2."
+  [(set_attr "type" "neon_sub_widen")]
 )
 
-(define_insn "aarch64_w2_internal"
+(define_insn "aarch64_subw2_internal"
   [(set (match_operand: 0 "register_operand" "=w")
-(ADDSUB: (match_operand: 1 "register_operand" "w")
-			(ANY_EXTEND:
-			  (vec_select:
-			   (match_operand:VQW 2 "register_operand" "w")
-			   (match_operand:VQW 3 "vect_par_cnst_hi_half" "")]
+		(minus:
+		  (match_operand: 1 "register_operand" "w")
+		  (ANY_EXTEND:
+		(vec_select:
+		 (match_operand:VQW 2 "register_operand" "w")
+		 (match_operand:VQW 3 "vect_par_cnst_hi_half" "")]
+  "TARGET_SIMD"
+  "subw2\\t%0., %1., %2."
+  [(set_attr "type" "neon_sub_widen")]
+)
+
+(define_insn "aarch64_addw"
+  [(set (match_operand: 0 "register_operand" "=w")
+		(plus:
+		 (ANY_EXTEND:
+		  (match_operand:VD_BHSI 2 "register_operand" "w"))
+		 (match_operand: 1 "register_operand" "w")))]
   "TARGET_SIMD"
-  "w2\\t%0., %1., %2."
-  [(set_attr "type" "neon__widen")]
+  "addw\\t%0., %1., %2."
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "aarch64_addw_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+		(plus:
+		 (ANY_EXTEND:
+		  (vec_select:
+		   (match_operand:VQW 2 "register_operand" "w")
+		   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+		  (match_operand: 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "addw\\t%0., %1., %2."
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "aarch64_addw2_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+		(plus:
+		 (ANY_EXTEND:
+		  (vec_select:
+		   (match_operand:VQW 2 "register_operand" "w")
+		   (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+		 (match_operand: 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+  "addw2\\t%0., %1., %2."
+  [(set_attr "type" "neon_add_widen")]
 )
 
 (define_expand "aarch64_saddw2"
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vect_su_add_sub.c b/gcc/testsuite/gcc.target/aarch64/simd/vect_su_add_sub.c
new file mode 100644
index ..15956ed83fdd5fc8dc895ab1ac4de3f98bc8a625
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vect_su_add_sub.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Ensure we use the signed/unsigned extend vectorized ad

Re: [PATCH, contrib] Add contrib/maintainers-verify.sh

2018-07-11 Thread Richard Earnshaw (lists)
On 12/06/18 11:03, Tom de Vries wrote:
> [ Fixed ENOPATCH ]
> 
> On Tue, Jun 12, 2018 at 11:57:13AM +0200, Tom de Vries wrote:
>> [ was: Re: [MAINTAINERS, committed] Remove redundant write-after-approval
>> entries ]
>>
>> On Tue, Jun 12, 2018 at 10:26:31AM +0200, Martin Liška wrote:
>>> Hi.
>>>
>>> Thanks for the script, it also found me in Write After Approval section.
>>> Thus I'll install following patch.
>>>
>>> Tom what about installing the script into contrib?
>>
>> I've renamed the script to contrib/maintainers-verify.sh.
>>
>> Also I've added a regression test that runs it:
>> ...
>> Running src/gcc/testsuite/gcc.src/maintainers.exp ...
>> PASS: maintainers-verify.sh
>> ...
>>
>> When failing (by reverting your patch on MAINTAINERS), it shows in gcc.log:
>> ...
>> Running src/gcc/testsuite/gcc.src/maintainers.exp ...
>> Redundant in write approval: Martin Liska
>> FAIL: maintainers-verify.sh
>> ...
>>
>> OK for trunk?

OK.

R.

>>
>> Thanks,
>> - Tom
> 
> [contrib] Add contrib/maintainers-verify.sh
> 
> ---
>  contrib/maintainers-verify.sh | 45 
> +++
>  gcc/testsuite/gcc.src/maintainers.exp | 35 +++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/contrib/maintainers-verify.sh b/contrib/maintainers-verify.sh
> new file mode 100755
> index 000..226c158fdaa
> --- /dev/null
> +++ b/contrib/maintainers-verify.sh
> @@ -0,0 +1,45 @@
> +#!/bin/sh
> +
> +# Copyright (C) 2018 Free Software Foundation, Inc.
> +#
> +# This file is part of GCC.
> +#
> +# GCC is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +#
> +# GCC is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING.  If not, write to
> +# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
> +# Boston, MA 02110-1301, USA.
> +
> +if [ "$1" != "" ]; then
> +f="$1"
> +else
> +f=./MAINTAINERS
> +fi
> +
> +grep @ $f \
> +| sed 's/[\t][\t]*/\t/g' \
> +| awk -F '\t' \
> +   "
> +{
> +  if (NF == 2) {
> +name=\$1
> +email=\$2
> +if (names[name] == 1) {
> +printf \"Redundant in write approval: %s\n\", name
> +}
> +  } else if (NF == 3 ) {
> +name=\$2
> +email=\$3
> +names[name] = 1
> +  }
> +}
> +"
> diff --git a/gcc/testsuite/gcc.src/maintainers.exp 
> b/gcc/testsuite/gcc.src/maintainers.exp
> new file mode 100644
> index 000..89a062fb7ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.src/maintainers.exp
> @@ -0,0 +1,35 @@
> +#   Copyright (C) 2018 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +# 
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +# 
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .
> +
> +proc gcc_src_run_maintainers_verify_sh {} {
> +set script maintainers-verify.sh
> +
> +global srcdir
> +set rootdir $srcdir/../..
> +set contrib $rootdir/contrib
> +
> +set maintainers $rootdir/MAINTAINERS
> +
> +set verify_output [exec $contrib/$script $maintainers]
> +if { "$verify_output"  == "" } {
> + pass "$script"
> +} else {
> + send_log "$verify_output\n"
> + fail "$script"
> +}
> +}
> +
> +gcc_src_run_maintainers_verify_sh
> 



[PATCH 2/2] Add "-fsave-optimization-record"

2018-07-11 Thread David Malcolm
This patch implements a -fsave-optimization-record option, which
leads to a JSON file being written out, recording the dump_* calls
made (via the optinfo infrastructure in the previous patch).

The patch includes a minimal version of the JSON patch I posted last
year, with just enough support needed for optimization records (I
removed all of the parser code, leaving just the code for building
in-memory JSON trees and writing them to a pretty_printer).
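
(As a rough usage sketch — my assumption of the new json.h interface based on
the ChangeLog, not code quoted from the patch:)

    /* Build a small in-memory JSON tree and render it through a
       pretty_printer; names and signatures here are assumed.  */
    #include "config.h"
    #include "system.h"
    #include "coretypes.h"
    #include "pretty-print.h"
    #include "json.h"

    static void
    sketch_emit_record (pretty_printer *pp)
    {
      json::object *record = new json::object ();
      record->set ("format", new json::string ("1"));
      record->print (pp);
      delete record;  /* the tree owns its children */
    }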

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add json.o and optinfo-emit-json.o.
(CFLAGS-optinfo-emit-json.o): Define TARGET_NAME.
* common.opt (fsave-optimization-record): New option.
* coretypes.h (struct kv_pair): Move here from dumpfile.c.
* doc/invoke.texi (-fsave-optimization-record): New option.
* dumpfile.c: Include "optinfo-emit-json.h".
(struct kv_pair): Move to coretypes.h.
(optgroup_options): Make non-static.
(dump_context::end_scope): Call
optimization_records_maybe_pop_dump_scope.
* dumpfile.h (optgroup_options): New decl.
* json.cc: New file.
* json.h: New file.
* optinfo-emit-json.cc: New file.
* optinfo-emit-json.h: New file.
* optinfo.cc: Include "optinfo-emit-json.h".
(optinfo::emit): Call optimization_records_maybe_record_optinfo.
(optinfo_enabled_p): Check optimization_records_enabled_p.
(optinfo_wants_inlining_info_p): Likewise.
* optinfo.h: Update comment.
* profile-count.c (profile_quality_as_string): New function.
* profile-count.h (profile_quality_as_string): New decl.
(profile_count::quality): New accessor.
* selftest-run-tests.c (selftest::run_tests): Call json_cc_tests
and optinfo_emit_json_cc_tests.
* selftest.h (selftest::json_cc_tests): New decl.
(selftest::optinfo_emit_json_cc_tests): New decl.
* toplev.c: Include "optinfo-emit-json.h".
(compile_file): Call optimization_records_finish.
(do_compile): Call optimization_records_start.
* tree-ssa-live.c: Include optinfo.h.
(remove_unused_scope_block_p): Retain inlining information if
optinfo_wants_inlining_info_p returns true.
---
 gcc/Makefile.in  |   3 +
 gcc/common.opt   |   4 +
 gcc/coretypes.h  |   8 +
 gcc/doc/invoke.texi  |   8 +-
 gcc/dumpfile.c   |  15 +-
 gcc/dumpfile.h   |   3 +
 gcc/json.cc  | 293 
 gcc/json.h   | 166 ++
 gcc/optinfo-emit-json.cc | 568 +++
 gcc/optinfo-emit-json.h  |  36 +++
 gcc/optinfo.cc   |  11 +-
 gcc/optinfo.h|   4 -
 gcc/profile-count.c  |  28 +++
 gcc/profile-count.h  |   5 +
 gcc/selftest-run-tests.c |   2 +
 gcc/selftest.h   |   2 +
 gcc/toplev.c |   5 +
 gcc/tree-ssa-live.c  |   4 +-
 18 files changed, 1143 insertions(+), 22 deletions(-)
 create mode 100644 gcc/json.cc
 create mode 100644 gcc/json.h
 create mode 100644 gcc/optinfo-emit-json.cc
 create mode 100644 gcc/optinfo-emit-json.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index dd1dfc1..b871640 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1387,6 +1387,7 @@ OBJS = \
ira-color.o \
ira-emit.o \
ira-lives.o \
+   json.o \
jump.o \
langhooks.o \
lcm.o \
@@ -1428,6 +1429,7 @@ OBJS = \
optabs-query.o \
optabs-tree.o \
optinfo.o \
+   optinfo-emit-json.o \
options-save.o \
opts-global.o \
passes.o \
@@ -2251,6 +2253,7 @@ s-bversion: BASE-VER
$(STAMP) s-bversion
 
 CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
+CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\"
 
 pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
$(srcdir)/gen-pass-instances.awk
diff --git a/gcc/common.opt b/gcc/common.opt
index 5a50bc27..a13c709 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1950,6 +1950,10 @@ fopt-info-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fopt-info[-=filename]   Dump compiler optimization details.
 
+fsave-optimization-record
+Common Report Var(flag_save_optimization_record) Optimization
+Write a SRCFILE.opt-record.json file detailing what optimizations were 
performed.
+
 foptimize-register-move
 Common Ignore
 Does nothing. Preserved for backward compatibility.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ed0e825..2fd20e4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -332,6 +332,14 @@ namespace gcc {
 
 typedef std::pair  tree_pair;
 
+/* Define a name->value mapping.  */
+template 
+struct kv_pair
+{
+  const char *const name;  /* the name of the value */
+  const ValueType value;   /* the value of the name */
+};
+
 #else
 
 struct 

[PATCH 1/2] v5: Add "optinfo" framework

2018-07-11 Thread David Malcolm
Changes relative to v4:
* eliminated optinfo subclasses as discussed
* eliminated optinfo-internal.h, moving what remained into optinfo.h
* added support for dump_gimple_expr_loc and dump_gimple_expr
* more selftests

This patch implements a way to consolidate dump_* calls into
optinfo objects, as enabling work towards being able to write out
optimization records to a file (I'm focussing on that destination
in this patch kit, rather than diagnostic remarks).

The patch adds the support for building optinfo instances from dump_*
calls, but leaves implementing any *users* of them to followup patches.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add optinfo.o.
* coretypes.h (class symtab_node): New forward decl.
(struct cgraph_node): New forward decl.
(class varpool_node): New forward decl.
* dump-context.h: New file.
* dumpfile.c: Include "optinfo.h", "dump-context.h", "cgraph.h",
"tree-pass.h".
(refresh_dumps_are_enabled): Use optinfo_enabled_p.
(set_dump_file): Call dumpfile_ensure_any_optinfo_are_flushed.
(set_alt_dump_file): Likewise.
(dump_context::~dump_context): New dtor.
(dump_gimple_stmt): Move implementation to...
(dump_context::dump_gimple_stmt): ...this new member function.
Add the stmt to any pending optinfo, creating one if need be.
(dump_gimple_stmt_loc): Move implementation to...
(dump_context::dump_gimple_stmt_loc): ...this new member function.
Start a new optinfo and add the stmt to it.
(dump_gimple_expr): Move implementation to...
(dump_context::dump_gimple_expr): ...this new member function.
Add the stmt to any pending optinfo, creating one if need be.
(dump_gimple_expr_loc): Move implementation to...
(dump_context::dump_gimple_expr_loc): ...this new member function.
Start a new optinfo and add the stmt to it.
(dump_generic_expr): Move implementation to...
(dump_context::dump_generic_expr): ...this new member function.
Add the tree to any pending optinfo, creating one if need be.
(dump_generic_expr_loc): Move implementation to...
(dump_context::dump_generic_expr_loc): ...this new member
function.  Add the tree to any pending optinfo, creating one if
need be.
(dump_printf): Move implementation to...
(dump_context::dump_printf_va): ...this new member function.  Add
the text to any pending optinfo, creating one if need be.
(dump_printf_loc): Move implementation to...
(dump_context::dump_printf_loc_va): ...this new member function.
Start a new optinfo and add the stmt to it.
(dump_dec): Move implementation to...
(dump_context::dump_dec): ...this new member function.  Add the
value to any pending optinfo, creating one if need be.
(dump_context::dump_symtab_node): New member function.
(dump_context::get_scope_depth): New member function.
(dump_context::begin_scope): New member function.
(dump_context::end_scope): New member function.
(dump_context::ensure_pending_optinfo): New member function.
(dump_context::begin_next_optinfo): New member function.
(dump_context::end_any_optinfo): New member function.
(dump_context::s_current): New global.
(dump_context::s_default): New global.
(dump_scope_depth): Delete global.
(dumpfile_ensure_any_optinfo_are_flushed): New function.
(dump_symtab_node): New function.
(get_dump_scope_depth): Reimplement in terms of dump_context.
(dump_begin_scope): Likewise.
(dump_end_scope): Likewise.
(selftest::temp_dump_context::temp_dump_context): New ctor.
(selftest::temp_dump_context::~temp_dump_context): New dtor.
(selftest::verify_item): New function.
(ASSERT_IS_TEXT): New macro.
(ASSERT_IS_TREE): New macro.
(ASSERT_IS_GIMPLE): New macro.
(selftest::test_capture_of_dump_calls): New test.
(selftest::dumpfile_c_tests): Call it.
* dumpfile.h (dump_printf, dump_printf_loc, dump_basic_block)
(dump_generic_expr_loc, dump_generic_expr, dump_gimple_stmt_loc)
(dump_gimple_stmt, dump_dec): Gather these related decls and add a
descriptive comment.
(dump_function, print_combine_total_stats, enable_rtl_dump_file)
(dump_node, dump_bb): Move these unrelated decls.
(class dump_manager): Add leading comment.
* optinfo.cc: New file.
* optinfo.h: New file.
---
 gcc/Makefile.in|   1 +
 gcc/coretypes.h|   7 +
 gcc/dump-context.h | 138 +
 gcc/dumpfile.c | 597 +
 gcc/dumpfile.h |  84 +---
 gcc/optinfo.cc | 236 +
 gcc/optinfo.h  | 203 ++
 

[PATCH][GCC][AArch64][mid-end] Updated stack-clash implementation for AArch64. [patch (0/6)]

2018-07-11 Thread Tamar Christina
Hi All,

The patch series will allow AArch64 to use 64k guard sizes correctly and 
improves the code quality.
It also enables a reduction of the overhead in code size over the current GCC 8 
implementation.

Using 64k guard sizes results in a reduction in overhead compared to the 4k 
guard size.
The code size overhead of enabling stack clash protection with this patch 
series is ~0.86% vs
the ~0.96% of the current GCC 8 implementation.

All measurements were done over spec2017.

This series will also contain some mid-end changes required for alloca in
order to allow targets to opt in to a different implementation via a target
hook.  As documented, using this hook will require the target to provide
certain guarantees, but the result is a smaller alloca implementation and one
that doesn't need an extra register for targets which have a limited offset
for the probe instructions.

Thanks,
Tamar

-- 


[PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-clash is on [Patch (6/6)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch cleans up the testsuite when a run is done with stack clash
protection turned on.

Concretely this switches off -fstack-clash-protection for a couple of tests:

* sve: We don't yet support stack-clash-protection together with SVE, so for
   now turn it off for these tests.
* assembler scan: some tests are quite fragile in that they check for exact
   assembly output, e.g. check for an exact amount of sub instructions.  These
   won't match now.
* vla: Some of the ubsan tests use negative array indices.  Because the arrays
   weren't used before, the incorrect $sp wouldn't have been used; the correct
   value is restored on ret.  Now, however, we probe the $sp, which causes a
   segfault.
* params: When testing the parameters we have to skip these on AArch64 because
   of our custom constraints on them.  We already test them separately, so
   this isn't a loss.

Note that the testsuite is not entirely clean due to a gdb failure caused by
alloca with stack clash.  On AArch64 we output an incorrect .loc directive,
but this is already the case with the current implementation in GCC and is a
bug unrelated to this patch series.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/testsuite/
2018-07-11  Tamar Christina  

PR target/86486
	* gcc.dg/pr82788.c: Skip for AArch64.
	* gcc.dg/guality/vla-1.c: Turn off stack-clash.
	* gcc.target/aarch64/subsp.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
	* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
	* gcc.dg/params/blocksort-part.c: Skip stack-clash checks
	on AArch64.

-- 
diff --git a/gcc/testsuite/c-c++-common/ubsan/vla-1.c b/gcc/testsuite/c-c++-common/ubsan/vla-1.c
index 52ade3aab7566dce3ca7ef931ac65895005d5e13..c97465edae195442a71ee66ab25015a2ac4fc8fc 100644
--- a/gcc/testsuite/c-c++-common/ubsan/vla-1.c
+++ b/gcc/testsuite/c-c++-common/ubsan/vla-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable" } */
+/* { dg-options "-fsanitize=vla-bound -Wall -Wno-unused-variable -fno-stack-clash-protection" } */
 
 typedef long int V;
 int x = -1;
diff --git a/gcc/testsuite/gcc.dg/params/blocksort-part.c b/gcc/testsuite/gcc.dg/params/blocksort-part.c
index a9154f2e61ccd21b60153f20be3891b988f9ef2c..1e677878e7bd9c68b026f8c72b0de9f01e15459c 100644
--- a/gcc/testsuite/gcc.dg/params/blocksort-part.c
+++ b/gcc/testsuite/gcc.dg/params/blocksort-part.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { "--param stack-clash-protection-*" } } */
 
 /*-*/
 /*--- Block sorting machinery   ---*/
diff --git a/gcc/testsuite/gcc.dg/pr82788.c b/gcc/testsuite/gcc.dg/pr82788.c
index a8f628fd7f66c3e56739f6ff491df38b23f4d4df..41c442f61a625c8b350e1e4c870a98d86b167031 100644
--- a/gcc/testsuite/gcc.dg/pr82788.c
+++ b/gcc/testsuite/gcc.dg/pr82788.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-probe-interval=10 --param stack-clash-protection-guard-size=12" } */
 /* { dg-require-effective-target supports_stack_clash_protection } */
+/* { dg-skip-if "AArch64 does not support this interval." { aarch64*-*-* } } */
 int main() { int a[1442]; return 0;}
diff --git a/gcc/testsuite/gcc.target/aarch64/subsp.c b/gcc/testsuite/gcc.target/aarch64/subsp.c
index 70d848c59d1f1e4df4314ca012c7a5d9d3b91ebc..6ef6b2c90ae694055749a94b68cbba5ee4aea882 100644
--- a/gcc/testsuite/gcc.target/aarch64/subsp.c
+++ b/gcc/testsuite/gcc.target/aarch64/subsp.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O" } */
+/* { dg-options "-O -fno-stack-clash-protection" } */
 
 int foo (void *);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_3.c b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_3.c
index 29702ab55f249c3ebd0baf44981870524098e1e4..baeec61bb59aff56f0dcc20fc6ec6b93d517490e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_load_3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-stack-clash-protection" } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c
index 001f5be8ff58bfcc75eccc4c050bef1e53faffeb..eae3be7a7b24dc124f7c1c26a97fb25400cc62d2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
+/* { dg-options "-O2 -ftree-vector

[PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch implements stack clash mitigation for AArch64.
On AArch64 we expect both the probing interval and the guard size to be 64 KB
and we enforce them to always be equal.

We also probe up by 1024 bytes in the general case when a probe is required.

AArch64 has the following probing conditions:

 1) Any allocation less than 63 KB requires no probing.  An ABI-defined safe
    buffer of 1 KB is used and a page size of 64 KB is assumed.

 2) Any allocation larger than one page is done in increments of the page
    size and probed up by 1 KB, leaving the residuals.

 3a) Any residual for local arguments that is less than 63 KB requires no
     probing.  Essentially this is a sliding window.  The probing range
     determines the ABI safe buffer, and the amount to be probed up.

  b) Any residual for outgoing arguments that is less than 1 KB requires no
     probing.  However, to maintain our invariant, anything above or equal
     to 1 KB requires a probe.

Incrementally allocating less than the probing thresholds (e.g. by recursive
functions) will not be an issue, as the storing of LR counts as a probe.
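
(Sketch — my examples, not from the patch — of frames on either side of the
thresholds above:)

    /* Assuming the 64 KB guard with the 1 KB ABI buffer described above.  */
    extern void use (char *);

    void
    small_frame (void)
    {
      char buf[60 * 1024];   /* < 63 KB: no probes needed (case 1) */
      use (buf);
    }

    void
    large_frame (void)
    {
      char buf[256 * 1024];  /* > 1 page: allocated in page-size increments,
                                each probed 1 KB up from SP (case 2) */
      use (buf);
    }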


                  +-------------------+
                  |  ABI SAFE REGION  |
                +-+                   |
                | |                   |
                | |                   |
                | |                   |
 maximum amount | |                   |
 not needing a  | |                   |
 probe          | |                   |
                | |                   |
                | |                   |   Probe offset when
                | |                   |   probe is required
                | |                   |
                + +-------------------+   Point of first probe
                  |  ABI SAFE REGION  |
                  +-------------------+
                  |                   |
                  |                   |
                  |                   |

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Target was tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* config/aarch64/aarch64.md (cmp,
probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-07-11  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* g

[PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch defines a configure option to allow the setting of the default
guard size via configure flags when building the target.

The new flag is:

 * --with-stack-clash-protection-guard-size=

The values of configure-based params are set very early on and allow the
target to validate or reject them as it sees fit.

To do this, the values for the parameters get set by configure through CPP
defines.  In case the back-end wants to know whether a value was set or not,
the original default value is also passed down as a define.

This allows a target to check if a param was changed by the user at
configure time.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

PR target/86486
* configure.ac: Add stack-clash-protection-guard-size.
* config.in (DEFAULT_STK_CLASH_GUARD_SIZE, STK_CLASH_GUARD_SIZE_DEFAULT,
STK_CLASH_GUARD_SIZE_MAX, STK_CLASH_GUARD_SIZE_MIN): New.
* params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): Use it.
* configure: Regenerate.
* Makefile.in (params.list, params.options): Add include dir for CPP.
* params-list.h: Include auto-host.h
* params-options.h: Likewise.

-- 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d8f3e8861189604035b248b69bc484443f334c1c..f2fcab8e1c91bac4fac1b7162659165f74c6795b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3469,13 +3469,13 @@ installdirs:
 
 params.list: s-params.list; @true
 s-params.list: $(srcdir)/params-list.h $(srcdir)/params.def
-	$(CPP) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
+	$(CPP) -I$(objdir) $(srcdir)/params-list.h | sed 's/^#.*//;/^$$/d' > tmp-params.list
 	$(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
 	$(STAMP) s-params.list
 
 params.options: s-params.options; @true
 s-params.options: $(srcdir)/params-options.h $(srcdir)/params.def
-	$(CPP) $(srcdir)/params-options.h | sed 's/^#.*//;/^$$/d' > tmp-params.options
+	$(CPP) -I$(objdir) $(srcdir)/params-options.h | sed 's/^#.*//;/^$$/d' > tmp-params.options
 	$(SHELL) $(srcdir)/../move-if-change tmp-params.options params.options
 	$(STAMP) s-params.options
 
diff --git a/gcc/config.in b/gcc/config.in
index 2856e72d627df537a301a6c7ab6b5bbb75f6b43f..bf593b5231adef0a6fc71b259597afa76e862607 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -55,6 +55,12 @@
 #endif
 
 
+/* Define to larger than zero set the default stack clash protector size. */
+#ifndef USED_FOR_TARGET
+#undef DEFAULT_STK_CLASH_GUARD_SIZE
+#endif
+
+
 /* Define if you want to use __cxa_atexit, rather than atexit, to register C++
destructors for local statics and global objects. This is essential for
fully standards-compliant handling of destructors, but requires
@@ -2148,6 +2154,24 @@
 #endif
 
 
+/* Set the stack clash guard size default value. */
+#ifndef USED_FOR_TARGET
+#undef STK_CLASH_GUARD_SIZE_DEFAULT
+#endif
+
+
+/* Set the stack clash guard size maximum value. */
+#ifndef USED_FOR_TARGET
+#undef STK_CLASH_GUARD_SIZE_MAX
+#endif
+
+
+/* Set the stack clash guard size minimum value. */
+#ifndef USED_FOR_TARGET
+#undef STK_CLASH_GUARD_SIZE_MIN
+#endif
+
+
 /* Define if you can safely include both  and . */
 #ifndef USED_FOR_TARGET
 #undef STRING_WITH_STRINGS
diff --git a/gcc/configure b/gcc/configure
index 60d373982fd38fe51c285e2b02941754d1b833d6..35cf0c724b4f14152205153337379e0ae3a0d501 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -905,6 +905,7 @@ enable_valgrind_annotations
 with_stabs
 enable_multilib
 enable_multiarch
+with_stack_clash_protection_guard_size
 enable___cxa_atexit
 enable_decimal_float
 enable_fixed_point
@@ -1724,6 +1725,8 @@ Optional Packages:
   --with-gnu-as   arrange to work with GNU as
   --with-as   arrange to use the specified as (full pathname)
   --with-stabsarrange to use stabs instead of host debug format
+  --with-stack-clash-protection-guard-size=size
+  Set the default stack clash protection guard size.
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
   --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
@@ -7436,6 +7439,51 @@ $as_echo "$enable_multiarch$ma_msg_suffix" >&6; }
 
 
 
+# default stack clash protection guard size
+# These are kept here and passed down to params.def.  This way we don't have to
+# worry about keeping them in sync.
+stk_clash_min=12
+stk_clash_max=30
+stk_clash_default=12
+
+# Keep the default value when the option is not used to 0, this allows us to
+# distinguish between the cases where the user specifially set a value via
+# configure and when the normal default value is used.
+
+# Check whether --with-stack-clash-protection-guard-size was given.
+if test "${with_stack_clash_prote

[PATCH][GCC][mid-end] Add a hook to support telling the mid-end when to probe the stack [patch (2/6)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch adds a hook to tell the mid-end about the probing requirements of the
target.  On AArch64 we allow a specific range for which no probing needs to
be done.  This same range is also the amount that will have to be probed up when
a probe is needed after dropping the stack.

Defining this probe comes with the extra requirement that the outgoing arguments
size of any function that uses alloca and stack clash be at the very least 8
bytes.  With this invariant we can skip doing the zero checks for alloca and
save some code.
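
(As an illustration — my example — of the code the hook affects:)

    /* Sketch: with the probe-range hook, the variable-size allocation below
       can be probed at SP without a zero-size check, relying on the >= 8 byte
       outgoing-arguments invariant described above.  */
    extern void consume (void *);

    void
    f (unsigned long n)
    {
      consume (__builtin_alloca (n));
    }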

A simplified version of the AArch64 stack frame is:

   +---------------------+
   |                     |
   |                     |
   |                     |
   +---------------------+
   |         LR          |
   +---------------------+
   |         FP          |
   +---------------------+
   | dynamic allocations | -\   probe range hook affects these
   +---------------------+  --\ and ensures that outgoing stack
   |       padding       |  --- args is always > 8 when alloca
   +---------------------+  --/ is used, which means it's always
   | outgoing stack args | -/   safe to probe at SP
   +---------------------+

This allows us to generate better code than without the hook without affecting
other targets.

With this patch I am also removing the
stack_clash_protection_final_dynamic_probe hook, which was added specifically
for AArch64 but is no longer needed.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

PR target/86486
* explow.c (anti_adjust_stack_and_probe_stack_clash): Support custom
probe ranges.
* target.def (stack_clash_protection_alloca_probe_range): New.
(stack_clash_protection_final_dynamic_probe): Remove.
* targhooks.h (default_stack_clash_protection_alloca_probe_range) New.
(default_stack_clash_protection_final_dynamic_probe): Remove.
* targhooks.c: Likewise.
* doc/tm.texi.in (TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): 
New.
(TARGET_STACK_CLASH_PROTECTION_FINAL_DYNAMIC_PROBE): Remove.
* doc/tm.texi: Regenerate.

-- 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 1c5a80920f17694f1119696ec40faef1452fe1c1..138d905023b449a7a239fb32c07f5c8551d5380f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3450,8 +3450,12 @@ GCC computed the default from the values of the above macros and you will
 normally not need to override that default.
 @end defmac
 
-@deftypefn {Target Hook} bool TARGET_STACK_CLASH_PROTECTION_FINAL_DYNAMIC_PROBE (rtx @var{residual})
-Some targets make optimistic assumptions about the state of stack probing when they emit their prologues.  On such targets a probe into the end of any dynamically allocated space is likely required for safety against stack clash style attacks.  Define this variable to return nonzero if such a probe is required or zero otherwise.  You need not define this macro if it would always have the value zero.
+@deftypefn {Target Hook} HOST_WIDE_INT TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE (void)
+Some targets have an ABI defined interval for which no probing needs to be done.
+When a probe does need to be done this same interval is used as the probe distance up when doing stack clash protection for alloca.
+On such targets this value can be set to override the default probing up interval.
+Define this variable to return nonzero if such a probe range is required or zero otherwise.  Defining this hook also requires your functions which make use of alloca to have at least 8 bytes of outgoing arguments.  If this is not the case the stack will be corrupted.
+You need not define this macro if it would always have the value zero.
 @end deftypefn
 
 @need 2000
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index bf2c64e15dba1b95179cfe682523f29bb8fa1151..8c62ae06b6be2fe338e3614917dc5f5b1f4fd4aa 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2841,7 +2841,7 @@ GCC computed the default from the values of the above macros and you will
 normally not need to override that default.
 @end defmac
 
-@hook TARGET_STACK_CLASH_PROTECTION_FINAL_DYNAMIC_PROBE
+@hook TARGET_STACK_CLASH_PROTECTIO

Re: [PATCH][RFC] Make iterating over hash-map elide copying/destructing

2018-07-11 Thread Trevor Saunders
On Tue, Jul 10, 2018 at 11:46:54AM +0200, Richard Biener wrote:
> On Tue, 10 Jul 2018, Trevor Saunders wrote:
> 
> > On Tue, Jul 10, 2018 at 10:43:20AM +0200, Richard Biener wrote:
> > > 
> > > The following makes the hash-map iterator dereference return a pair<Key,
> > > Value&> rather than a copy of Value.  This matches the hash-table iterator
> > > behavior and avoids issues with
> > > 
> > >   hash_map >
> > 
> > Eventually somebody's probably going to want
> > hash_map>, auto_vec> too, so we might as well go ahead
> > and make it pair?
> > 
> > > where iterating over the hash-table will call the auto_vec destructor
> > > when dereferencing the iterator.  I note that the copy ctor of
> > > auto_vec should probably be deleted and the hash-table/map iterators
> > > should possibly support an alternate "reference" type to the stored
> > > Values so we can use vec<> for "references" and auto_vec<> for
> > > stored members.
> > 
> > I think code somewhere uses the auto_vec copy ctor to return a auto_vec,
> > this is pretty similar to the situation with unique_ptr in c++98 mode.
> > 
> > > But that's out of scope - the patch below seems to survive minimal
> > > testing at least.
> > > 
> > > I suppose we still want to somehow hide the copy ctors of auto_vec?
> > 
> > I suspec the best we can do is delete it in c++11 mode and provide a
> > auto_vec(auto_vec &&) move ctor instead.  Though I think for the
> > case where auto_vec has inline storage we should be able to just delete
> > the copy ctor?
> > 
> > > How does hash-map growth work here?  (I suppose it doesn't...?)
> > 
> > Yeah was going to ask, I think hash_table memcpy's the elements? in
> > which case memcpying a pointer into yourself isn't going to work.
> 
> It doesn't work.  It uses assignment but auto_vec doesn't implement
> that so auto-storage breaks.  So you say it should use
> std::move<> where that's obviously not available for us :/

Well, since it doesn't define an assignment operator but its members
are copyable, it gets a compiler-generated copy assignment operator that
just copies the data, leading to it being broken.  I
suppose we could implement a copy assignment that handles the different
cases of using inline storage or not, but that seems complicated, and
kind of slow for a "assignment" so I'd be inclined to not support
copying it.  However if we went that route we should prevent use of the
assignment operator by declaring one explicitly and making it private but
then not implementing it, so it at least fails to link and with some
macros you can actually tell the compiler in c++11 its deleted and may
not be used.
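
(A sketch of the idiom described above — my illustration, in pre-C++11 style:)

    /* Declare but don't define the copy operations, so accidental copies
       fail to link; under C++11 a macro can mark them "= delete" instead.  */
    class fixed_vec
    {
    public:
      fixed_vec () {}
    private:
      fixed_vec (const fixed_vec &);             /* not implemented */
      fixed_vec &operator= (const fixed_vec &);  /* not implemented */
    };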

> 
> > However I think if you use the auto_vec specialization for 0 internal
> > elements that should be able to work if we null out the old auto_vec or
> > avoid running dtors on the old elements.
> 
> Well, then I don't really need auto_vec, I'm more interested in the
> embedded storage than the destructor ;)

A hash table with inline storage like that is really going to eat
memory, but I suppose you know that ;) anyway fair enough.

> > > Any further comments?
> > 
> > other than using a reference for the key type seems good.
> 
> OK, I suppose it should be 'const Key&' then (hopefully that
> works for Key == const X / X * as intended).

Unfortunately I can never remember, but hopefully.

> I guess given the expansion problem I'm going to re-think using
> auto_vec for now :/

Seems like the destructor would help with cleanup, but yeah kind of an
odd beast to put in a datastructure.

> Can we please move to C++11? ;)

;) talk to the IBM folks, I think they're basically the only ones who
really care.

thanks

Trev

> 
> Richard.


[PATCH][GCC][front-end][opt-framework] Update options framework for parameters to properly handle and validate configure time params. [Patch (2/3)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch builds on a previous patch to pass param options down from configure
by adding more expansive validation and correctness checks.

These are set very early on and allow the target to validate or reject the
values as it sees fit.

To do this, compiler_param has been extended to hold a value set at configure
time; this value is used to distinguish between:

1) default value
2) configure value
3) back-end default
4) user specific value.

The priority of the values should be 4 > 2 > 3 > 1.  The compiler will now also
validate the values in params.def after setting them.  This means invalid values
will no longer be accepted.
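
(A sketch of the stated priority with illustrative names — the patch's actual
fields and logic aren't reproduced here:)

    #include <stdbool.h>

    /* Hypothetical illustration of "4 > 2 > 3 > 1": a user-specified value
       beats the configure-time value, which beats the back-end default,
       which beats the params.def default.  */
    static int
    effective_value (bool user_set, int user_val,
		     bool configure_set, int configure_val,
		     bool backend_set, int backend_val,
		     int default_val)
    {
      if (user_set)
	return user_val;        /* 4: user-specified value */
      if (configure_set)
	return configure_val;   /* 2: configure value */
      if (backend_set)
	return backend_val;     /* 3: back-end default */
      return default_val;       /* 1: params.def default */
    }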

This also changes it so that default parameters are validated during
initialization.  This change is needed to ensure that parameters set via
configure or by the target-specific common initialization routines still
lie within the valid range.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

* params.h (struct param_info): Add configure_value.
* params.c (DEFPARAMCONF): New.
(DEFPARAM, DEFPARAMENUM5): Set configure_value.
(validate_param): New.
(add_params): Use it.
(set_param_value): Refactor param validation into validate_param.
(maybe_set_param_value): Don't override value from configure.
(diagnostic.h): Include.
* params-enum.h (DEFPARAMCONF): New.
* params-list.h: Likewise.
* params-options.h: Likewise.
* params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): Use it.
* diagnostic.h (diagnostic_ready_p): New.

-- 
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index cf3a610f3d945f2dbbfde7d9cf7a66f46ad6f0b1..584b5877b489d3cce5c18da2db5f73b7b41a72a4 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -250,6 +250,10 @@ diagnostic_inhibit_notes (diagnostic_context * context)
and similar functions.  */
 extern diagnostic_context *global_dc;
 
+/* Returns whether the diagnostic framework has been initialized already and is
+   ready for use.  */
+#define diagnostic_ready_p() (global_dc->printer != NULL)
+
 /* The total count of a KIND of diagnostics emitted so far.  */
 #define diagnostic_kind_count(DC, DK) (DC)->diagnostic_count[(int) (DK)]
 
diff --git a/gcc/params-enum.h b/gcc/params-enum.h
index 5f9ac3050257ea5ec3f8be5c0909a4d558b97861..a2d19d18bdceedd7fc9c4fdbc62c6902ddf967fe 100644
--- a/gcc/params-enum.h
+++ b/gcc/params-enum.h
@@ -18,6 +18,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #define DEFPARAM(ENUM, OPTION, HELP, DEFAULT, MIN, MAX)
+#define DEFPARAMCONF(ENUM, OPTION, HELP, MACRO, DEFAULT, MIN, MAX)
 #define DEFPARAMENUMNAME(ENUM) ENUM ## _KIND
 #define DEFPARAMENUMVAL(ENUM, V) ENUM ## _KIND_ ## V
 #define DEFPARAMENUMTERM(ENUM) ENUM ## _KIND_ ## LAST
@@ -36,4 +37,5 @@ along with GCC; see the file COPYING3.  If not see
 #undef DEFPARAMENUMTERM
 #undef DEFPARAMENUMVAL
 #undef DEFPARAMENUMNAME
+#undef DEFPARAMCONF
 #undef DEFPARAM
diff --git a/gcc/params-list.h b/gcc/params-list.h
index 4889c39a180abb3d0efaaf12e148deb1d011f65f..acb6ffd291d169642d44a05cd3b634029c53d50a 100644
--- a/gcc/params-list.h
+++ b/gcc/params-list.h
@@ -19,8 +19,11 @@ along with GCC; see the file COPYING3.  If not see
 
 #define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
   enumerator,
+#define DEFPARAMCONF(enumerator, option, nocmsgid, macro, default, min, max) \
+  enumerator,
 #define DEFPARAMENUM5(enumerator, option, nocmsgid, default, \
 		  v0, v1, v2, v3, v4) enumerator,
 #include "params.def"
 #undef DEFPARAM
+#undef DEFPARAMCONF
 #undef DEFPARAMENUM5
diff --git a/gcc/params-options.h b/gcc/params-options.h
index e9ac2e73522ddb6c199ed0af462ebc7e4777d676..fbb2f73c894da9505f233408979e5789b44c5fd6 100644
--- a/gcc/params-options.h
+++ b/gcc/params-options.h
@@ -19,9 +19,12 @@ along with GCC; see the file COPYING3.  If not see
 
 #define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
   option=default,min,max
+#define DEFPARAMCONF(enumerator, option, nocmsgid, macro, default, min, max) \
+  option=macro,default,min,max
 #define DEFPARAMENUM5(enumerator, option, nocmsgid, default, \
 		  v0, v1, v2, v3, v4) \
   option=v0,v1,v2,v3,v4
 #include "params.def"
 #undef DEFPARAM
+#undef DEFPARAMCONF
 #undef DEFPARAMENUM5
diff --git a/gcc/params.c b/gcc/params.c
index eb663be880a91dc0adce2a84c6bad7e06b4c72c3..e99c0bff48dafdd7fcca122b1c30df21ec75e6f2 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "params-enum.h"
 #include "diagnostic-core.h"
+#include "diagnostic.h"
 #include "spellcheck.h"
 
 /* An array containing the compiler parameters and their current
@@ -40,24 +41,33 @@ static size_t num_compiler_params;
 st

[PATCH][GCC][AArch64] Validate and set default parameters for stack-clash. [Patch (3/3)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch defines the default parameters and validation for the aarch64
stack clash probing interval and guard sizes.  It cleans up the previous
implementation and ensures that at no point are invalid arguments
present in the pipeline for AArch64.  Currently they are only corrected once
cc1 initializes the back-end.

The default for AArch64 is 64 KB for both of these and we only support 4 KB
and 64 KB probes.  We also enforce that any values you set for these
parameters must be in sync.

If an invalid value is specified, an error is emitted and compilation is
aborted.
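
As a rough illustration, the rule being enforced boils down to the following
predicate (a sketch of the constraint only, not the patch's actual code):

  /* Guard size and probe interval are log2 values of a size in bytes.
     Only 12 (4 KB) and 16 (64 KB) are accepted, and both must match.  */
  static bool
  stack_clash_params_ok (int guard_size, int probe_interval)
  {
    if (guard_size != 12 && guard_size != 16)
      return false;
    return guard_size == probe_interval;
  }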

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Target was tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

* common/config/aarch64/aarch64-common.c (TARGET_OPTION_DEFAULT_PARAMS,
aarch64_option_default_params): New.
(params.h): Include.
(TARGET_OPTION_VALIDATE_PARAM, aarch64_option_validate_param): New.
* config/aarch64/aarch64.c (aarch64_override_options_internal): Simplify
stack-clash protection validation code.

-- 
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 292fb818705d4650113da59a6d88cf2aa7c9e57d..73f2f95e8cb989f93e7a17bbf274f4364e660c0d 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -30,6 +30,7 @@
 #include "opts.h"
 #include "flags.h"
 #include "diagnostic.h"
+#include "params.h"
 
 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -41,6 +42,10 @@
 
 #undef	TARGET_OPTION_OPTIMIZATION_TABLE
 #define TARGET_OPTION_OPTIMIZATION_TABLE aarch_option_optimization_table
+#undef TARGET_OPTION_DEFAULT_PARAMS
+#define TARGET_OPTION_DEFAULT_PARAMS aarch64_option_default_params
+#undef TARGET_OPTION_VALIDATE_PARAM
+#define TARGET_OPTION_VALIDATE_PARAM aarch64_option_validate_param
 
 /* Set default optimization options.  */
 static const struct default_options aarch_option_optimization_table[] =
@@ -60,6 +65,52 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
+/* Implement TARGET_OPTION_VALIDATE_PARAM.  */
+
+static bool
+aarch64_option_validate_param (const int value, const int param)
+{
+  /* Validate the guard size and keep the probing interval in sync with it.  */
+  if (param == (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)
+{
+  if (value != 12 && value != 16)
+	{
+	error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "
+		"size.  Given value %d (%llu KB) is out of range.\n",
+		value, (1ULL << value) / 1024ULL);
+	return false;
+	}
+
+  /* Enforce that they are the same.  */
+  set_default_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
+			   value);
+}
+
+  return true;
+}
+
+/* Implement TARGET_OPTION_DEFAULT_PARAMS.  */
+
+static void
+aarch64_option_default_params (void)
+{
+  /* We assume the guard page is 64k.  */
+  int index = (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE;
+  if (!compiler_params[index].configure_value)
+ set_default_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE, 16);
+
+  int guard_size
+= default_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+
+  /* Set the interval parameter to be the same as the guard size.  This way the
+ mid-end code does the right thing for us.  */
+  set_default_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
+			   guard_size);
+
+  /* Validate the options.  */
+  aarch64_option_validate_param (guard_size, index);
+}
+
 /* Implement TARGET_HANDLE_OPTION.
This function handles the target specific options for CPU/target selection.
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e62d8a92ff53128e5e10ffd3b52eb8898869b756..dfc0da6a27d6007c669db32abe5c7b0248ea28a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10900,41 +10900,21 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			   opts->x_param_values,
 			   global_options_set.x_param_values);
 
-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-			 opts->x_param_values,
-			 global_options_set.x_param_values);
-
-  /* If the user hasn't changed it via configure then set the default to 64 KB
- for the backend.  */
-  if (DEFAULT_STK_CLASH_GUARD_SIZE == 0)
-  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE, 16,
-			opts->x_param_values,
-			global_options_set.x_param_values);
-
-  /* Validate the guard size.  */
-  int guard_size = PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
-  if (guard_size != 12 && guard_size != 16)
-  error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "
-	 "size.  Given value %d (%llu KB) is out of range.\n",
-	 guard_size, (1ULL

[PATCH][GCC][front-end][opt-framework] Allow back-ends to be able to do custom validations on params. [Patch (1/3)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch adds the ability for back-ends to add custom constraints to param
values by defining a new hook, option_validate_param.

This hook is invoked on every set_param_value, which allows the back-end to
ensure that the parameters always stay within its desired constraints.
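
For a target, wiring the hook up might look roughly like this (a hypothetical
example: the rejected values and the error text are made up, only the hook name
and its (value, param) signature come from the patch):

  /* Hypothetical override of TARGET_OPTION_VALIDATE_PARAM rejecting
     out-of-range guard sizes for this target.  */
  static bool
  example_option_validate_param (const int value, const int param)
  {
    if (param == (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE
        && value != 12 && value != 16)
      {
        error ("unsupported guard size %d", value);
        return false;
      }
    return true;
  }

  #undef TARGET_OPTION_VALIDATE_PARAM
  #define TARGET_OPTION_VALIDATE_PARAM example_option_validate_param

Returning false makes set_param_value skip the update, so an invalid value
never reaches the params array.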

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.
Both targets were tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

* params.c (set_param_value):
Add index of parameter being validated.
* common/common-target.def (option_validate_param): New.
* common/common-targhooks.h (default_option_validate_param): New.
* common/common-targhooks.c (default_option_validate_param): New.
* doc/tm.texi.in (TARGET_OPTION_VALIDATE_PARAM): New.
* doc/tm.texi: Regenerate.

-- 
diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
index e0afbc6af29a37a908c32bcdbecd68c8cda003af..021b7e916471de97174ed17b143a03154165bb2d 100644
--- a/gcc/common/common-target.def
+++ b/gcc/common/common-target.def
@@ -56,6 +56,13 @@ DEFHOOK
  void, (void),
  hook_void_void)
 
+DEFHOOK
+(option_validate_param,
+"Validate target-dependent value for @option{--param} settings, using\
+ calls to @code{set_param_value}.",
+ bool, (int, int),
+ default_option_validate_param)
+
 /* The initial value of target_flags.  */
 DEFHOOKPOD
 (default_target_flags,
diff --git a/gcc/common/common-targhooks.h b/gcc/common/common-targhooks.h
index d290d7f3e2110aa2c421c57e9285e97def4bf0be..ff1da6a78322a26a622c74a1ad7b841847487bf1 100644
--- a/gcc/common/common-targhooks.h
+++ b/gcc/common/common-targhooks.h
@@ -29,6 +29,8 @@ extern bool default_target_handle_option (struct gcc_options *,
 	  const struct cl_decoded_option *,
 	  location_t);
 
+extern bool default_option_validate_param (const int, const int);
+
 extern const struct default_options empty_optimization_table[];
 
 #endif
diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
index b109019066422429280bfec58e10a0c0421ffae7..5cb8d27321a4bc5c858f349c29e90691baa9c144 100644
--- a/gcc/common/common-targhooks.c
+++ b/gcc/common/common-targhooks.c
@@ -77,6 +77,16 @@ default_target_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
   return true;
 }
 
+/* Default version of TARGET_OPTION_VALIDATE_PARAM.  */
+
+bool
+default_option_validate_param (const int value ATTRIBUTE_UNUSED,
+			   const int param ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
+
 const struct default_options empty_optimization_table[] =
   {
 { OPT_LEVELS_NONE, 0, NULL, 0 }
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 138d905023b449a7a239fb32c07f5c8551d5380f..e61bee35b5379466563cb5ad03c12225c1bdfe04 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -753,6 +753,10 @@ Set target-dependent initial values of fields in @var{opts}.
 Set target-dependent default values for @option{--param} settings, using calls to @code{set_default_param_value}.
 @end deftypefn
 
+@deftypefn {Common Target Hook} bool TARGET_OPTION_VALIDATE_PARAM (const @var{int}, const @var{int})
+Validate target-dependent value for @option{--param} settings, using calls to @code{set_param_value}.
+@end deftypefn
+
 @defmac SWITCHABLE_TARGET
 Some targets need to switch between substantially different subtargets
 during compilation.  For example, the MIPS target has one subtarget for
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 8c62ae06b6be2fe338e3614917dc5f5b1f4fd4aa..37373a29ced2540e1070559e53b5c6cf952fdda4 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -729,6 +729,8 @@ options are changed via @code{#pragma GCC optimize} or by using the
 
 @hook TARGET_OPTION_DEFAULT_PARAMS
 
+@hook TARGET_OPTION_VALIDATE_PARAM
+
 @defmac SWITCHABLE_TARGET
 Some targets need to switch between substantially different subtargets
 during compilation.  For example, the MIPS target has one subtarget for
diff --git a/gcc/params.c b/gcc/params.c
index 623296ce49b6c8cf98da7be08a3fb01bc2c21a93..eb663be880a91dc0adce2a84c6bad7e06b4c72c3 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -209,7 +209,7 @@ set_param_value (const char *name, int value,
 error ("maximum value of parameter %qs is %u",
 	   compiler_params[i].option,
 	   compiler_params[i].max_value);
-  else
+  else if (targetm_common.option_validate_param (value, (int)i))
 set_param_value_internal ((compiler_param) i, value,
 			  params, params_set, true);
 }



Re: [RFC] Fix recent popcount change is breaking

2018-07-11 Thread Kugan Vivekanandarajah
Hi Andrew,

On 11 July 2018 at 15:43, Andrew Pinski  wrote:
> On Tue, Jul 10, 2018 at 6:35 PM Kugan Vivekanandarajah
>  wrote:
>>
>> Hi Andrew,
>>
>> On 11 July 2018 at 11:19, Andrew Pinski  wrote:
>> > On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah
>> >  wrote:
>> >>
>> >> On 10 July 2018 at 23:17, Richard Biener  
>> >> wrote:
>> >> > On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
>> >> >  wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> Jeff told me that the recent popcount built-in detection is causing
>> >> >> kernel build issues as
>> >> >> ERROR: "__popcountsi2"
>> >> >> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] 
>> >> >> undefined!
>> >> >>
>> >> >> I could also reproduce this. AFAIK, we should check if the libfunc is
>> >> >> defined while checking popcount?
>> >> >>
>> >> >> I am testing the attached RFC patch. Is this reasonable?
>> >> >
>> >> > It doesn't work that way, all targets have this libfunc in libgcc.  
>> >> > This means
>> >> > the kernel has to provide it.  The only thing you could do is restrict
>> >> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
>> >> > natively supports.
>> >>
>> >> How about restricting it in expression_expensive_p ? Is that what you
>> >> wanted. Attached patch does this.
>> >> Bootstrap and regression testing progressing.
>> >
>> > Seems like that should go into is_inexpensive_builtin  instead which
>> > is just tested right below.
>>
>> I thought about that. is_inexpensive_builtin is used in various other
>> places including some inlining decisions so wasn't sure if it is the
>> right thing. Happy to change it if that is the right thing to do.
>
> I audited all of the users (and their users if it is used in a
> wrapper) and found that is_inexpensive_builtin should return false for
> this builtin if it is a function call in the end; there are other
> builtins which should be checked in a similar way, but I think we are
> not going to force you to do the same for those builtins.

Attached patch does this. Testing is progressing. Is this OK if there are no regressions?

Thanks,
Kugan


>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Kugan
>> >
>> > Thanks,
>> > Andrew
>> >
>> >>
>> >> Thanks,
>> >> Kugan
>> >>
>> >> >
>> >> > Richard.
>> >> >
>> >> >> Thanks,
>> >> >> Kugan
>> >> >>
>> >> >> gcc/ChangeLog:
>> >> >>
>> >> >> 2018-07-10  Kugan Vivekanandarajah  
>> >> >>
>> >> >> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
>> >> >> if libfunc for popcount is available.
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 820d6c2..59cf567 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10619,6 +10619,18 @@ is_inexpensive_builtin (tree decl)
   else if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
 switch (DECL_FUNCTION_CODE (decl))
   {
+  case BUILT_IN_POPCOUNT:
+  case BUILT_IN_POPCOUNTL:
+  case BUILT_IN_POPCOUNTLL:
+ {
+   tree arg = TYPE_ARG_TYPES (TREE_TYPE (decl));
+   /* Check if opcode for popcount is available.  */
+   if (optab_handler (popcount_optab,
+  TYPE_MODE (TREE_VALUE (arg)))
+   == CODE_FOR_nothing)
+ return false;
+ }
+   return true;
   case BUILT_IN_ABS:
   CASE_BUILT_IN_ALLOCA:
   case BUILT_IN_BSWAP16:
@@ -10670,10 +10682,7 @@ is_inexpensive_builtin (tree decl)
   case BUILT_IN_VA_COPY:
   case BUILT_IN_TRAP:
   case BUILT_IN_SAVEREGS:
-  case BUILT_IN_POPCOUNTL:
-  case BUILT_IN_POPCOUNTLL:
   case BUILT_IN_POPCOUNTIMAX:
-  case BUILT_IN_POPCOUNT:
   case BUILT_IN_PARITYL:
   case BUILT_IN_PARITYLL:
   case BUILT_IN_PARITYIMAX:
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c 
b/gcc/testsuite/gcc.target/aarch64/popcount4.c
index e69de29..ee55b2e 100644
--- a/gcc/testsuite/gcc.target/aarch64/popcount4.c
+++ b/gcc/testsuite/gcc.target/aarch64/popcount4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -mgeneral-regs-only" } */
+
+int PopCount (long b) {
+int c = 0;
+
+while (b) {
+   b &= b - 1;
+   c++;
+}
+return c;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "optimized" } } */


[PATCH][GCC][AArch64] Ensure that outgoing argument size is at least 8 bytes when alloca and stack-clash. [Patch (3/6)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch adds a requirement that the outgoing arguments area of a
function be at least 8 bytes in size when using stack-clash protection.

By using this condition we can avoid a check in the alloca code and so have
smaller and simpler code there.

A simplified version of the AArch64 stack frames is:

   +---+  
   |   | 
   |   |  
   |   |  
   +---+  
   |LR |  
   +---+  
   |FP |  
   +---+  
   |dynamic allocations|   expanding area which will push the outgoing
   +---+   args down during each allocation.
   |padding|
   +---+
   |outgoing stack args|  safety buffer of 8 bytes (aligned)
   +---+

By always reserving outgoing-argument space, alloca(0) is effectively safe to
probe at $sp because the reserved buffer is there.  It will never corrupt the stack.

This is also safe for alloca(x) where x is 0 or x % page_size == 0.  In the
former it is the same case as alloca(0) while the latter is safe because any
allocation pushes the outgoing stack args down:

   |FP |  
   +---+  
   |   |
   |dynamic allocations|   alloca (x)
   |   |
   +---+
   |padding|
   +---+
   |outgoing stack args|  safety buffer of 8 bytes (aligned)
   +---+

Which means when you probe for the residual, if it's 0 you'll again just probe
in the outgoing stack args range, which we know is non-zero (at least 8 bytes).
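
For example, both calls below end up with a residual allocation of zero, and
the probe at $sp then lands in the reserved outgoing-args buffer rather than
on unallocated stack (illustrative only, assuming a 64 KB guard):

  void
  f (unsigned n)
  {
    char *p = __builtin_alloca (0);          /* x == 0 */
    char *q = __builtin_alloca (n * 65536);  /* x % page_size == 0 */
    (void) p;
    (void) q;
  }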

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Target was tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

PR target/86486
* config/aarch64/aarch64.h (STACK_CLASH_OUTGOING_ARGS,
STACK_DYNAMIC_OFFSET): New.
* config/aarch64/aarch64.c (aarch64_layout_frame):
Update outgoing args size.
(aarch64_stack_clash_protection_alloca_probe_range,
TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.

gcc/testsuite/
2018-07-11  Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-alloca-1.c: New.
* gcc.target/aarch64/stack-check-alloca-10.c: New.
* gcc.target/aarch64/stack-check-alloca-2.c: New.
* gcc.target/aarch64/stack-check-alloca-3.c: New.
* gcc.target/aarch64/stack-check-alloca-4.c: New.
* gcc.target/aarch64/stack-check-alloca-5.c: New.
* gcc.target/aarch64/stack-check-alloca-6.c: New.
* gcc.target/aarch64/stack-check-alloca-7.c: New.
* gcc.target/aarch64/stack-check-alloca-8.c: New.
* gcc.target/aarch64/stack-check-alloca-9.c: New.
* gcc.target/aarch64/stack-check-alloca.h: New.
* gcc.target/aarch64/stack-check-14.c: New.
* gcc.target/aarch64/stack-check-15.c: New.

-- 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1345f0eb171d05e2b833935c0a32f79c3db03f99..e9560b53bd8b5761855561dbf82d9c90cc1c282a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -88,6 +88,10 @@
before probing has to be done for stack clash protection.  */
 #define STACK_CLASH_CALLER_GUARD 1024
 
+/* This value represents the minimum size in bytes we expect the function's
+   outgoing arguments area to be when stack-clash protection is enabled.  */
+#define STACK_CLASH_OUTGOING_ARGS 8
+
 /* This value controls how many pages we manually unroll the loop for when
generating stack clash probes.  */
 #define STACK_CLASH_MAX_UNROLL_PAGES 4
@@ -1069,4 +1073,15 @@ extern poly_uint16 aarch64_sve_vg;
 
 #define REGMODE_NATURAL_SIZE(MODE) aarch64_regmode_natural_size (MODE)
 
+/* Allocate the minimum of STACK_CLASH_OUTGOING_ARGS if stack clash protection
+   is enabled for the outgoing arguments.  This is essential as the extra args
+   space allows us to skip a check in alloca.  */
+#undef STACK_DYNAMIC_OFFSET
+#define STACK_DYNAMIC_OFFSET(FUNDECL)	   \
+   ((flag_stack_clash_protection	   \
+ && cfun->calls_alloca		   \
+ && known_lt (crtl->outgoing_args_size, STACK_CLASH_OUTGOING_ARGS))\
+? ROUND_UP (STACK_CLASH_OUTGOING_ARGS, STACK_BOUNDARY / BITS_PER_UNIT) \
+: (crtl->outgoing_args_size + 

[PATCH][GCC][AArch64] Set default values for stack-clash and do basic validation in back-end. [Patch (5/6)]

2018-07-11 Thread Tamar Christina
Hi All,

This patch enforces that the default guard size for stack-clash protection for
AArch64 be 64 KB unless the user has overridden it via configure, in which case
the user value is used as long as that value is within the valid range.

It also does some basic validation to ensure that the guard size is either 4 KB
or 64 KB and also enforces that for AArch64 the stack-clash probing interval is
equal to the guard size.
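
In user terms, and assuming the --param spellings that correspond to the
PARAM_* identifiers used below, a mismatch would be rejected like so:

  $ gcc -fstack-clash-protection \
        --param stack-clash-protection-guard-size=16 \
        --param stack-clash-protection-probe-interval=12 test.c
  cc1: error: stack clash guard size '16' must be equal to probing interval '12'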

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Target was tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-11  Tamar Christina  

PR target/86486
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Add validation for stack-clash parameters.

-- 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae3c2fb85256b1e95e2242f3f16a027e918ba368..e62d8a92ff53128e5e10ffd3b52eb8898869b756 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10905,6 +10905,36 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			 opts->x_param_values,
 			 global_options_set.x_param_values);
 
+  /* If the user hasn't changed it via configure then set the default to 64 KB
+ for the backend.  */
+  if (DEFAULT_STK_CLASH_GUARD_SIZE == 0)
+  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE, 16,
+			opts->x_param_values,
+			global_options_set.x_param_values);
+
+  /* Validate the guard size.  */
+  int guard_size = PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+  if (guard_size != 12 && guard_size != 16)
+  error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "
+	 "size.  Given value %d (%llu KB) is out of range.\n",
+	 guard_size, (1ULL << guard_size) / 1024ULL);
+
+  /* Enforce that the probing interval is the same as the guard size so the
+ mid-end does the right thing.  */
+  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
+			 guard_size,
+			 opts->x_param_values,
+			 global_options_set.x_param_values);
+
+  /* The maybe_set calls won't update the value if the user has explicitly set
+ one, which means we need to validate that the probing interval and guard
+ size are equal.  */
+  int probe_interval
+= PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
+  if (guard_size != probe_interval)
+error ("stack clash guard size '%d' must be equal to probing interval "
+	   "'%d'\n", guard_size, probe_interval);
+
   /* Enable sw prefetching at specified optimization level for
  CPUS that have prefetch.  Lower optimization level threshold by 1
  when profiling is enabled.  */



Re: [RFC] Fix recent popcount change is breaking

2018-07-11 Thread Richard Biener
On Wed, Jul 11, 2018 at 1:26 PM Kugan Vivekanandarajah
 wrote:
>
> Hi Andrew,
>
> On 11 July 2018 at 15:43, Andrew Pinski  wrote:
> > On Tue, Jul 10, 2018 at 6:35 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Andrew,
> >>
> >> On 11 July 2018 at 11:19, Andrew Pinski  wrote:
> >> > On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah
> >> >  wrote:
> >> >>
> >> >> On 10 July 2018 at 23:17, Richard Biener  
> >> >> wrote:
> >> >> > On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
> >> >> >  wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> Jeff told me that the recent popcount built-in detection is causing
> >> >> >> kernel build issues as
> >> >> >> ERROR: "__popcountsi2"
> >> >> >> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] 
> >> >> >> undefined!
> >> >> >>
> >> >> >> I could also reproduce this. AFAIK, we should check if the libfunc is
> >> >> >> defined while checking popcount?
> >> >> >>
> >> >> >> I am testing the attached RFC patch. Is this reasonable?
> >> >> >
> >> >> > It doesn't work that way, all targets have this libfunc in libgcc.  
> >> >> > This means
> >> >> > the kernel has to provide it.  The only thing you could do is restrict
> >> >> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
> >> >> > natively supports.
> >> >>
> >> >> How about restricting it in expression_expensive_p ? Is that what you
> >> >> wanted. Attached patch does this.
> >> >> Bootstrap and regression testing progressing.
> >> >
> >> > Seems like that should go into is_inexpensive_builtin  instead which
> >> > is just tested right below.
> >>
> >> I thought about that. is_inexpensive_builtin is used in various other
> >> places including some inlining decisions so wasn't sure if it is the
> >> right thing. Happy to change it if that is the right thing to do.
> >
> > I audited all of the users (and their users if it is used in a
> > wrapper) and found that is_inexpensive_builtin should return false for
> > this builtin if it is a function call in the end; there are other
> > builtins which should be checked in a similar way, but I think we are
> > not going to force you to do the same for those builtins.
>
> Attached patch does this. Testing is progressing. Is this OK if there are no regressions?

As said, this isn't a complete fix, given that other passes may code-generate
expressions with niter, for example vectorization.

Also the table-based popcount implementation in libgcc is probably
faster and the popcount call at least smaller than an open-coded variant.

So I'm not sure if this is an appropriate fix.

Why not simply make popcountdi available in the kernel?  They do have
implementations for other libgcc functions IIRC.
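
For reference, the byte-table scheme libgcc uses looks roughly like this (a
sketch of the technique, not libgcc's actual source):

  /* 256-entry table of per-byte popcounts, built with the usual
     B2/B4/B6 macro trick.  */
  #define B2(n) n, n + 1, n + 1, n + 2
  #define B4(n) B2 (n), B2 (n + 1), B2 (n + 1), B2 (n + 2)
  #define B6(n) B4 (n), B4 (n + 1), B4 (n + 1), B4 (n + 2)
  static const unsigned char popcount_tab[256] =
    { B6 (0), B6 (1), B6 (1), B6 (2) };

  /* What a kernel-side equivalent of __popcountsi2 could look like.  */
  int
  my_popcountsi2 (unsigned int x)
  {
    return popcount_tab[x & 0xff] + popcount_tab[(x >> 8) & 0xff]
           + popcount_tab[(x >> 16) & 0xff] + popcount_tab[x >> 24];
  }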

Richard.

> Thanks,
> Kugan
>
>
> >
> > Thanks,
> > Andrew
> >
> >>
> >> Thanks,
> >> Kugan
> >> >
> >> > Thanks,
> >> > Andrew
> >> >
> >> >>
> >> >> Thanks,
> >> >> Kugan
> >> >>
> >> >> >
> >> >> > Richard.
> >> >> >
> >> >> >> Thanks,
> >> >> >> Kugan
> >> >> >>
> >> >> >> gcc/ChangeLog:
> >> >> >>
> >> >> >> 2018-07-10  Kugan Vivekanandarajah  
> >> >> >>
> >> >> >> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
> >> >> >> if libfunc for popcount is available.


Re: abstract gimple_call_nonnull*() from vr-values

2018-07-11 Thread Richard Biener
On Tue, Jul 10, 2018 at 12:14 PM Aldy Hernandez  wrote:
>
> Ho hum, more abstractions.
>
> No change in functionality.
>
> OK for trunk?

OK.  Bonus points for finding the copy I vaguely remember exists somewhere...

Richard.


Re: abstract wide int binop code from VRP

2018-07-11 Thread Richard Biener
On Wed, Jul 11, 2018 at 8:48 AM Aldy Hernandez  wrote:
>
> Hmmm, I think we can do better, and since this hasn't been reviewed yet,
> I don't think anyone will mind the adjustment to the patch ;-).
>
> I really hate int_const_binop_SOME_RANDOM_NUMBER.  We should abstract
> them into properly named poly_int_binop, wide_int_binop, and tree_binop,
> and then use a default argument for int_const_binop() to get things going.
>
> Sorry for more changes in flight, but I thought we could benefit from
> more cleanups :).
>
> OK for trunk pending tests?

Much of GCC pre-dates function overloading / default args ;)

Looks OK but can you please rename your tree_binop to int_cst_binop?
Or maybe inline it into int_const_binop, also sharing the force_fit_type ()
tail with poly_int_binop?

What about mixed INTEGER_CST / poly_int constants?  Shouldn't it
be

  if (neither-poly-nor-integer-cst (arg1 || arg2))
return NULL_TREE;
  if (poly_int_tree (arg1) || poly_int_tree (arg2))
poly-int-stuff
  else if (INTEGER_CST && INTEGER_CST)
wide-int-stuff

?  I see that is a pre-existing issue but if you are at refactoring...
wi::to_poly_wide should handle INTEGER_CST operands just fine
I hope.
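
Concretely, the dispatch might look like this (a sketch: poly_int_binop and
wide_int_binop stand for the refactored helpers under discussion, not an
existing API; poly_int_tree_p and wi::to_poly_wide are real):

  static tree
  int_const_binop_sketch (tree_code code, tree arg1, tree arg2)
  {
    if (!poly_int_tree_p (arg1) || !poly_int_tree_p (arg2))
      return NULL_TREE;
    if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
      return wide_int_binop (code, arg1, arg2);
    /* Mixed INTEGER_CST / POLY_INT_CST operands take the poly path,
       since wi::to_poly_wide accepts INTEGER_CST operands too.  */
    return poly_int_binop (code, arg1, arg2);
  }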

Thanks,
Richard.

> Aldy
>
> On 07/10/2018 04:31 AM, Aldy Hernandez wrote:
> > Howdy!
> >
> > Attached are more cleanups to VRP getting rid of some repetitive code,
> > as well as abstracting wide int handling code into their own functions.
> > There should be no change to existing functionality.
> >
> > You may notice that I have removed the PLUS/MINUS_EXPR handling in
> > vrp_int_const_binop, even from the new abstracted code:
> >
> > -  /* For addition, the operands must be of the same sign
> > - to yield an overflow.  Its sign is therefore that
> > - of one of the operands, for example the first.  */
> > -  || (code == PLUS_EXPR && sgn1 >= 0)
> > -  /* For subtraction, operands must be of
> > - different signs to yield an overflow.  Its sign is
> > - therefore that of the first operand or the opposite of
> > - that of the second operand.  A first operand of 0 counts
> > - as positive here, for the corner case 0 - (-INF), which
> > - overflows, but must yield +INF.  */
> > -  || (code == MINUS_EXPR && sgn1 >= 0)
> >
> > This code is actually unreachable, as the switch above this snippet was
> > already aborting if code was not one of the shift or mult/div operators.
> >
> > Oh yeah, don't blame me for the cryptic comment to
> > range_easy_mask_min_mask().  That machine language comment was already
> > there ;-).
> >
> > OK pending one more round of tests?
> >
> > Aldy


[PATCH] Fix PR86452

2018-07-11 Thread Richard Biener


The following fixes PR86452 by using scope_die_for which correctly
deals with -g1 in not creating (new) DIEs for namespaces.  With
a larger LTO testcase we run into this when a namespace DIE is not
readily available, but I'm not sure the issue isn't latent with
non-LTO.

The original code was added with the fix for PR44188, which
had a lengthy discussion and various patch variants, but this
particular piece of the change came without an explanation of why it
didn't use scope_die_for.  The single testcase added still passes
after the patch.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK for trunk?

Thanks,
Richard.

2018-07-11  Richard Biener  

PR debug/86452
* dwarf2out.c (gen_type_die_with_usage): Use scope_die_for
instead of get_context_die.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 262551)
+++ gcc/dwarf2out.c (working copy)
@@ -25378,11 +25378,8 @@ gen_type_die_with_usage (tree type, dw_d
  generate debug info for the typedef.  */
   if (is_naming_typedef_decl (TYPE_NAME (type)))
 {
-  /* Use the DIE of the containing namespace as the parent DIE of
- the type description DIE we want to generate.  */
-  if (DECL_CONTEXT (TYPE_NAME (type))
- && TREE_CODE (DECL_CONTEXT (TYPE_NAME (type))) == NAMESPACE_DECL)
-   context_die = get_context_die (DECL_CONTEXT (TYPE_NAME (type)));
+  /* Give typedefs the right scope.  */
+  context_die = scope_die_for (type, context_die);
 
   gen_decl_die (TYPE_NAME (type), NULL, NULL, context_die);
   return;


Re: [committed] Fix OpenMP class iterators in distribute parallel for (PR c++/86443)

2018-07-11 Thread Jakub Jelinek
On Tue, Jul 10, 2018 at 09:18:18AM +0200, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

I found two small issues and one big issue (results being declare target)
which break the test if using non-shared memory offloading.

This should fix it, tested on x86_64-linux, committed to trunk.

2018-07-11  Jakub Jelinek  

PR c++/86443
* testsuite/libgomp.c++/for-15.C (a): Remove unused variable.
(results): Make sure the variable is not inside declare target region.
(qux): Remove unused function.

--- libgomp/testsuite/libgomp.c++/for-15.C  (revision 262551)
+++ libgomp/testsuite/libgomp.c++/for-15.C  (working copy)
@@ -88,10 +88,11 @@ private:
 
 template  const I &J::begin () { return b; }
 template  const I &J::end () { return e; }
+#pragma omp end declare target
 
-int a[2000];
 int results[2000];
 
+#pragma omp declare target
 template 
 void
 baz (I &i)
@@ -110,13 +111,6 @@ baz (int i)
 }
 
 void
-qux (I &i)
-{
-  if (*i != 1931)
-abort ();
-}
-
-void
 f1 (J j)
 {
 #pragma omp distribute parallel for default(none)


Jakub


Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-11 Thread Andre Vieira (lists)
On 11/07/18 11:00, Andre Vieira (lists) wrote:
> On 09/07/18 22:44, Martin Sebor wrote:
>> On 07/09/2018 06:40 AM, Richard Biener wrote:
>>> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:

 On 07/06/2018 09:52 AM, Richard Biener wrote:
> On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
>>
>> GCC folds accesses to members of constant aggregates except
>> for character arrays/strings.  For example, the strlen() call
>> below is not folded:
>>
>>const char a[][4] = { "1", "12" };
>>
>>int f (void) { return strlen (a[1]); }
>>
>> The attached change set enhances the string_constant() function
>> to make it possible to extract string constants from aggregate
>> initializers (CONSTRUCTORS).
>>
>> The initial solution was much simpler but as is often the case,
>> MEM_REF made it fail to fold things like:
>>
>>int f (void) { return strlen (a[1] + 1); }
>>
>> Handling those made the project a bit more interesting and
>> the final solution somewhat more involved.
>>
>> To handle offsets into aggregate string members the patch also
>> extends the fold_ctor_reference() function to extract entire
>> string array initializers even if the offset points past
>> the beginning of the string and even though the size and
>> exact type of the reference are not known (there isn't enough
>> information in a MEM_REF to determine that).
>>
>> Tested along with the patch for PR 86415 on x86_64-linux.
>
> +  if (TREE_CODE (init) == CONSTRUCTOR)
> +   {
> + tree type;
> + if (TREE_CODE (arg) == ARRAY_REF
> + || TREE_CODE (arg) == MEM_REF)
> +   type = TREE_TYPE (arg);
> + else if (TREE_CODE (arg) == COMPONENT_REF)
> +   {
> + tree field = TREE_OPERAND (arg, 1);
> + type = TREE_TYPE (field);
> +   }
> + else
> +   return NULL_TREE;
>
> what's wrong with just
>
> type = TREE_TYPE (field);

 In response to your comment below about size I simplified things
 further so determining the type a priori is no longer necessary.

> ?
>
> + base_off *= BITS_PER_UNIT;
>
> poly_uint64 isn't enough for "bits", with wide-int you'd use
> offset_int,
> for poly you'd then use poly_offset?

 Okay, I tried to avoid the overflow.  (Converting between all
 these flavors of wide int types is a monumental PITA.)

>
> You extend fold_ctor_reference to treat size == 0 specially but then
> bother to compute a size here - that looks unneeded?

 Yes, well spotted, thanks!  I simplified the code so this isn't
 necessary, and neither is the type.

>
> While the offset of the reference determines the first field in the
> CONSTRUCTOR, how do you know the access doesn't touch
> adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
> so consider
>
>   char x[2][4] = { "abcd", "abcd" };
>
> and MEM[&x] with a char[8] type?  memcpy "inlining" will create
> such MEMs for example.

 The code is only used to find string constants in initializer
 expressions where I don't think the size of the access comes
 into play.  If a memcpy() call results in a MEM_REF[char[8],
 &x, 8] that's fine.  It's a valid reference and we can still
 get the underlying character sequence (which is represented
 as two STRING_CSTs with the two string literals).  I might
 be missing the point of your question.
>>>
>>> Maybe irrelevant for strlen folding depending on what you do
>>> for missing '\0' termination.
>>>
>
> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
> ctor,
>tree byte_offset = DECL_FIELD_OFFSET (cfield);
>tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
>tree field_size = DECL_SIZE (cfield);
> -  offset_int bitoffset;
> -  offset_int bitoffset_end, access_end;
> +
> +  if (!field_size && TREE_CODE (cval) == STRING_CST)
> +   {
> + /* Determine the size of the flexible array member from
> +the size of the string initializer provided for it.  */
> + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> + tree eltype = TREE_TYPE (TREE_TYPE (cval));
> + len *= tree_to_uhwi (TYPE_SIZE (eltype));
> + field_size = build_int_cst (size_type_node, len);
> +   }
>
> Why does this only apply to STRING_CST initializers and not
> CONSTRUCTORS,
> say, for
>
> struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };

 I can't think of a use for it.  Do you have something in mind?
>>>
>>> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
>>> which is useful in

Re: [PATCH][C family] Fix PR86453

2018-07-11 Thread Marek Polacek
On Wed, Jul 11, 2018 at 11:55:32AM +0200, Richard Biener wrote:
> 
> This fixes handle_packed_attribute creating a type variant which differs
> in TYPE_PACKED.  This cannot be generally allowed since TYPE_PACKED
> affects layout and layout is shared between variants.
> 
> For the testcase in question the attribute itself is later ignored
> but TYPE_PACKED is still applied which eventually leads to an ICE
> in type verification (that isn't applied very reliably).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Ok.

Marek


Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-11 Thread Martin Sebor

On 07/11/2018 07:50 AM, Andre Vieira (lists) wrote:

On 11/07/18 11:00, Andre Vieira (lists) wrote:

On 09/07/18 22:44, Martin Sebor wrote:

On 07/09/2018 06:40 AM, Richard Biener wrote:

On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:


On 07/06/2018 09:52 AM, Richard Biener wrote:

On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:


GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { return strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { return strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.


+  if (TREE_CODE (init) == CONSTRUCTOR)
+   {
+ tree type;
+ if (TREE_CODE (arg) == ARRAY_REF
+ || TREE_CODE (arg) == MEM_REF)
+   type = TREE_TYPE (arg);
+ else if (TREE_CODE (arg) == COMPONENT_REF)
+   {
+ tree field = TREE_OPERAND (arg, 1);
+ type = TREE_TYPE (field);
+   }
+ else
+   return NULL_TREE;

what's wrong with just

type = TREE_TYPE (field);


In response to your comment below about size I simplified things
further so determining the type a priori is no longer necessary.


?

+ base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use
offset_int,
for poly you'd then use poly_offset?


Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)



You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?


Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.



While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[&x] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.


The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
&x, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.


Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.



@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree
ctor,
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
   tree field_size = DECL_SIZE (cfield);
-  offset_int bitoffset;
-  offset_int bitoffset_end, access_end;
+
+  if (!field_size && TREE_CODE (cval) == STRING_CST)
+   {
+ /* Determine the size of the flexible array member from
+the size of the string initializer provided for it.  */
+ unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+ tree eltype = TREE_TYPE (TREE_TYPE (cval));
+ len *= tree_to_uhwi (TYPE_SIZE (eltype));
+ field_size = build_int_cst (size_type_node, len);
+   }

Why does this only apply to STRING_CST initializers and not
CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };


I can't think of a use for it.  Do you have something in mind?


Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.



?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?


Yes, that's simpler, thanks.



Otherwise looks reasonable.


Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)


Please don't make functional changes to a patch in review without
exactly pointing out the change.  It makes review inefficient for me.

Looks like it might be the NULL type argument handling?

Go patch committed: Fix evaluation order of LHS index expressions

2018-07-11 Thread Ian Lance Taylor
The Go spec says that when an index expression appears on the left
hand side of an assignment, the operands should be evaluated. The
gofrontend code was assuming that that only referred to the index
operand. But discussion of https://golang.org/issue/23188 has
clarified that this means both the slice/map/string operand and the
index operand. This patch adjusts the gofrontend code accordingly,
fixing the issue.  The test case for this is in
https://golang.org/cl/123115.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262540)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8ad67a72a4fa59efffc891e73ecf10020e3c565d
+ea7ac7784791dca517b6681a02c39c11bf136755
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 262540)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -10898,6 +10898,20 @@ Array_index_expression::do_check_types(G
 }
 }
 
+// The subexpressions of an array index must be evaluated in order.
+// If this is indexing into an array, rather than a slice, then only
+// the index should be evaluated.  Since this is called for values on
+// the left hand side of an assignment, evaluating the array, meaning
+// copying the array, will cause a different array to be modified.
+
+bool
+Array_index_expression::do_must_eval_subexpressions_in_order(
+int* skip) const
+{
+  *skip = this->array_->type()->is_slice_type() ? 0 : 1;
+  return true;
+}
+
 // Flatten array indexing by using temporary variables for slices and indexes.
 
 Expression*
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 262540)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -2771,12 +2771,10 @@ class Index_expression : public Parser_e
this->location());
   }
 
+  // This shouldn't be called--we don't know yet.
   bool
-  do_must_eval_subexpressions_in_order(int* skip) const
-  {
-*skip = 1;
-return true;
-  }
+  do_must_eval_subexpressions_in_order(int*) const
+  { go_unreachable(); }
 
   void
   do_dump_expression(Ast_dump_context*) const;
@@ -2882,11 +2880,7 @@ class Array_index_expression : public Ex
   }
 
   bool
-  do_must_eval_subexpressions_in_order(int* skip) const
-  {
-*skip = 1;
-return true;
-  }
+  do_must_eval_subexpressions_in_order(int* skip) const;
 
   bool
   do_is_addressable() const;
@@ -2965,11 +2959,8 @@ class String_index_expression : public E
   }
 
   bool
-  do_must_eval_subexpressions_in_order(int* skip) const
-  {
-*skip = 1;
-return true;
-  }
+  do_must_eval_subexpressions_in_order(int*) const
+  { return true; }
 
   Bexpression*
   do_get_backend(Translate_context*);
@@ -3052,11 +3043,8 @@ class Map_index_expression : public Expr
   }
 
   bool
-  do_must_eval_subexpressions_in_order(int* skip) const
-  {
-*skip = 1;
-return true;
-  }
+  do_must_eval_subexpressions_in_order(int*) const
+  { return true; }
 
   // A map index expression is an lvalue but it is not addressable.
 


[arm] Put CPU's FPU capabilities directly in the ISA specification

2018-07-11 Thread Richard Earnshaw (lists)
As part of the transition from the original support for named FPUs to
general FPU properties I defined an entry in the CPU definitions in
arm-cpus.in to use a named FPU.  However, that has now outlived its
usefulness and increasingly we are likely to find that newer cores do
not fit the legacy FPU names very well.  Furthermore it is now possible
to encode all the FPU capabilities directly in the ISA definitions,
often as simply as using +fp or +simd.

So this patch removes the fpu field from the "define cpu" entries and
instead encodes the same information in the isa field.  This also allows
us to remove a bit of now-dead code from parsecpu.awk.

* config/arm/arm-cpus.in: Move information from fpu field of each
cpu definition to the isa field.
* config/arm/parsecpu.awk (fpu): Delete match rule.
(gen_comm_data): Don't add bits from the CPU's FPU entry.

Committed to trunk.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index c2dacda..d6eed2f 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -620,7 +620,6 @@ end arch iwmmxt2
 #   [tune for ]
 #   [tune flags ]
 #   architecture 
-#   [fpu ]
 #   [isa ]
 #   [option  add|remove ]*
 #   [optalias  ]*
@@ -633,7 +632,7 @@ end arch iwmmxt2
 # isa flags are appended to those defined by the architecture.
 # Each add option must have a distinct feature set and each remove
 # option must similarly have a distinct feature set.  Option aliases can be
-# added with the optalias statement
+# added with the optalias statement.
 
 # V4 Architecture Processors
 begin cpu arm8
@@ -778,8 +777,7 @@ end cpu arm1020t
 # V5TE Architecture Processors
 begin cpu arm9e
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm9e
@@ -787,8 +785,7 @@ end cpu arm9e
 begin cpu arm946e-s
  cname arm946es
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm946e-s
@@ -796,8 +793,7 @@ end cpu arm946e-s
 begin cpu arm966e-s
  cname arm966es
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm966e-s
@@ -805,32 +801,28 @@ end cpu arm966e-s
 begin cpu arm968e-s
  cname arm968es
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm968e-s
 
 begin cpu arm10e
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs fastmul
 end cpu arm10e
 
 begin cpu arm1020e
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs fastmul
 end cpu arm1020e
 
 begin cpu arm1022e
  tune flags LDSCHED
- architecture armv5te
- fpu vfpv2
+ architecture armv5te+fp
  option nofp remove ALL_FP
  costs fastmul
 end cpu arm1022e
@@ -883,8 +875,7 @@ end cpu fa726te
 begin cpu arm926ej-s
  cname arm926ejs
  tune flags LDSCHED
- architecture armv5tej
- fpu vfpv2
+ architecture armv5tej+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm926ej-s
@@ -892,8 +883,7 @@ end cpu arm926ej-s
 begin cpu arm1026ej-s
  cname arm1026ejs
  tune flags LDSCHED
- architecture armv5tej
- fpu vfpv2
+ architecture armv5tej+fp
  option nofp remove ALL_FP
  costs 9e
 end cpu arm1026ej-s
@@ -910,8 +900,7 @@ end cpu arm1136j-s
 begin cpu arm1136jf-s
  cname arm1136jfs
  tune flags LDSCHED
- architecture armv6j
- fpu vfpv2
+ architecture armv6j+fp
  costs 9e
 end cpu arm1136jf-s
 
@@ -925,8 +914,7 @@ end cpu arm1176jz-s
 begin cpu arm1176jzf-s
  cname arm1176jzfs
  tune flags LDSCHED
- architecture armv6kz
- fpu vfpv2
+ architecture armv6kz+fp
  costs 9e
 end cpu arm1176jzf-s
 
@@ -938,8 +926,7 @@ end cpu mpcorenovfp
 
 begin cpu mpcore
  tune flags LDSCHED
- architecture armv6k
- fpu vfpv2
+ architecture armv6k+fp
  costs 9e
 end cpu mpcore
 
@@ -953,8 +940,7 @@ end cpu arm1156t2-s
 begin cpu arm1156t2f-s
  cname arm1156t2fs
  tune flags LDSCHED
- architecture armv6t2
- fpu vfpv2
+ architecture armv6t2+fp
  costs v6t2
 end cpu arm1156t2f-s
 
@@ -1012,8 +998,7 @@ end cpu cortex-m0plus.small-multiply
 begin cpu generic-armv7-a
  cname genericv7a
  tune flags LDSCHED
- architecture armv7-a
- fpu vfpv3-d16
+ architecture armv7-a+fp
  option vfpv3-d16 add VFPv3 FP_DBL
  option vfpv3 add VFPv3 FP_D32
  option vfpv3-d16-fp16 add VFPv3 FP_DBL fp16conv
@@ -1033,8 +1018,7 @@ end cpu generic-armv7-a
 begin cpu cortex-a5
  cname cortexa5
  tune flags LDSCHED
- architecture armv7-a
- fpu neon-fp16
+ architecture armv7-a+neon-fp16
  option nosimd remove ALL_SIMD
  option nofp remove ALL_FP
  costs cortex_a5
@@ -1043,8 +1027,7 @@ end cpu cortex-a5
 begin cpu cortex-a7
  cname cortexa7
  tune flags LDSCHED
- architecture armv7ve
- fpu neon-vfpv4
+ architecture armv7ve+simd
  option nosimd remove ALL_SIMD
  option nofp remove ALL_FP
  costs cortex_a7
@@ -

[PATCH, S390] Increase function alignment to 16 bytes

2018-07-11 Thread Robin Dapp
Hi,

the following patch increases the default function alignment to 16
bytes.  This helps get rid of some unwanted performance effects.

I'm unsure whether or when it's necessary to implement
OVERRIDE_OPTIONS_AFTER_CHANGE.
Apparently ia64 did it to set flags that are reset when using
__attribute__((optimize)). i386 calls i386_default_align () and sets
various alignments only when the alignment value is unset but when is
e.g. global_options.x_str_align_functions actually unset except for the
very first call?

Trying simple examples like


 void foo () {};

 __attribute__((optimize("Os")))
 void bar () {};


I did not observe that the default alignment, once set, was reset anywhere.

Regards
 Robin

--

gcc/ChangeLog:

2018-07-11  Robin Dapp  

* config/s390/s390.c (s390_default_align): Set default
function alignment.
(s390_override_options_after_change): New.
(s390_option_override_internal): Call s390_default_align.
(TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): New.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 8df195ddd78..eaeba89b321 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -15322,6 +15322,23 @@ s390_function_specific_restore (struct gcc_options *opts,
   opts->x_s390_cost_pointer = (long)processor_table[opts->x_s390_tune].cost;
 }
 
+static void
+s390_default_align (struct gcc_options *opts)
+{
+  /* Set the default function alignment to 16 bytes in order to get rid
+ of some unwanted performance effects.  */
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions
+  && opts->x_s390_tune >= PROCESSOR_2964_Z13
+  && !opts->x_optimize_size)
+opts->x_str_align_functions = "16";
+}
+
+static void
+s390_override_options_after_change (void)
+{
+  s390_default_align (&global_options);
+}
+
 static void
 s390_option_override_internal (bool main_args_p,
 			   struct gcc_options *opts,
@@ -15559,6 +15576,9 @@ s390_option_override_internal (bool main_args_p,
 			 opts->x_param_values,
 			 opts_set->x_param_values);
 
+  /* Set the default alignment.  */
+  s390_default_align (opts);
+
   /* Call target specific restore function to do post-init work.  At the moment,
  this just sets opts->x_s390_cost_pointer.  */
   s390_function_specific_restore (opts, NULL);
@@ -16751,6 +16771,9 @@ s390_case_values_threshold (void)
 #undef TARGET_PASS_BY_REFERENCE
 #define TARGET_PASS_BY_REFERENCE s390_pass_by_reference
 
+#undef  TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
+#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE s390_override_options_after_change
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL s390_function_ok_for_sibcall
 #undef TARGET_FUNCTION_ARG


Re: [PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems

2018-07-11 Thread Richard Sandiford
David Malcolm  writes:
> On Mon, 2018-06-25 at 11:10 +0200, Richard Biener wrote:
>> On Fri, 22 Jun 2018, David Malcolm wrote:
>> 
>> > NightStrike and I were chatting on IRC last week about
>> > issues with trying to vectorize the following code:
>> > 
>> > #include <vector>
>> > std::size_t f(std::vector<std::vector<int>> const & v) {
>> >std::size_t ret = 0;
>> >for (auto const & w: v)
>> >ret += w.size();
>> >return ret;
>> > }
>> > 
>> > icc could vectorize it, but gcc couldn't, and neither of us could
>> > immediately figure out what the problem was.
>> > 
>> > Using -fopt-info leads to a wall of text.
>> > 
>> > I tried using my patch here:
>> > 
>> >  "[PATCH] v3 of optinfo, remarks and optimization records"
>> >   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
>> > 
>> > It improved things somewhat, by showing:
>> > (a) the nesting structure via indentation, and
>> > (b) the GCC line at which each message is emitted (by using the
>> > "remark" output)
>> > 
>> > but it's still a wall of text:
>> > 
>> >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>> >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
>> > 
>> > It doesn't yet provide a simple high-level message to a
>> > tech-savvy user on what they need to do to get GCC to
>> > vectorize their loop.
>> 
>> Yeah, in particular the vectorizer is way too noisy in its low-level
>> functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
>> 
>> t.C:4:26: note: step unknown.
>> t.C:4:26: note: vector alignment may not be reachable
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: no array mode for V2DI[3]
>> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
>> t.C:4:26: note: can't use a fully-masked loop because the target
>> doesn't 
>> have the appropriate masked load or store.
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: not ssa-name.
>> t.C:4:26: note: use not simple.
>> t.C:4:26: note: no array mode for V2DI[3]
>> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
>> t.C:4:26: note: op not supported by target.
>> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 =
>> _14 
>> /[ex] 4;
>> t.C:4:26: note: bad operation or unsupported loop bound.
>> t.C:4:26: note: not vectorized: no grouped stores in basic block.
>> t.C:4:26: note: not vectorized: no grouped stores in basic block.
>> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
>> 
>> 
>> > The pertinent dump messages are:
>> > 
>> > test.cc:4:23: remark: === try_vectorize_loop_1 ===
>> > [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
>> > cc1plus: remark:
>> > Analyzing loop at test.cc:4
>> > [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
>> > test.cc:4:23: remark:  === analyze_loop_nest ===
>> > [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
>> > [...snip...]
>> > test.cc:4:23: remark:   === vect_analyze_loop_operations ===
>> > [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
>> > [...snip...]
>> > test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex]
>> > 4;’ [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
>> > test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’
>> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> > test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’
>> > [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
>> > test.cc:4:23: remark:type of def: internal [../../src/gcc/tree-
>> > vect-stmts.c:10112:vect_is_simple_use]
>> > test.cc:4:23: remark:vect_is_simple_use: operand ‘4’
>> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> > test.cc:4:23: remark:op not supported by target.
>> > [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
>> > test.cc:4:23: remark:not vectorized: relevant stmt not
>> > supported: ‘_15 = _14 /[ex] 4;’ [../../src/gcc/tree-vect-
>> > stmts.c:9565:vect_analyze_stmt]
>> > test.cc:4:23: remark:   bad operation or unsupported loop bound.
>> > [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
>> > cc1plus: remark: vectorized 0 loops in function.
>> > [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
>> > 
>> > In particular, that complaint from
>> >   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> > is coming from:
>> > 
>> >   if (!ok)
>> > {
>> >   if (dump_enabled_p ())
>> > {
>> >   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >"not vectorized: relevant stmt not ");
>> >   dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>> >   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
>> > stmt, 0);
>> > }
>> > 
>> >   return false;
>> > }
>> > 
>> > This got me thinking: the user p

Re: [PATCH, doc] Small clarification on define_subst

2018-07-11 Thread Jeff Law
On 07/08/2018 06:03 PM, Paul Koning wrote:
> In doing CCmode work I was confused about how define_subst handles cases where
> the same argument appears more than once.  The attached clarifies this.
> 
> Ok for trunk?
> 
>   paul
> 
> ChangeLog:
> 
> 2018-07-08  Paul Koning  
> 
>   * doc/md.texi (define_subst): Document how multiple occurrences of
>   the same argument in the replacement pattern are handled.
OK.
jeff


Re: allow thread_through_all_blocks() to start from the same initial BB

2018-07-11 Thread Jeff Law
On 07/10/2018 05:14 AM, Aldy Hernandez wrote:
> I believe I missed this companion patch when I submitted...
> 
>    Subject: jump threading multiple paths that start from the same BB
> 
> The attached patch changes thread_through_all_blocks to allow threads
> that start from the same basic block as another thread.
> 
> OK for trunk?
> 
> curr.patch
> 
> 
> gcc/
> 
> * tree-ssa-threadupdate.c (thread_through_all_blocks): Do not jump
>   thread twice from the same starting edge.
OK
jeff


Re: [PATCH] doc: add missing "mode" type attribute

2018-07-11 Thread Jeff Law
On 07/10/2018 10:50 AM, Paul Koning wrote:
> "mode" is documented as a variable attribute but not as a type attribute.  
> This fixes that omission.  I simply copied the other text, it seemed suitable 
> as it stands.
> 
> The attributes are normally listed in alphabetical order but "mode" was out 
> of order in the variable attributes.
> 
> Ok for trunk?
> 
>   paul
> 
> ChangeLog:
> 
> 2018-07-10  Paul Koning  
> 
>   * doc/extend.texi (Common Variable Attributes): Move "mode" into
>   alphabetical order.
>   (Common Type Attributes): Add "mode" attribute.
OK.
jeff
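
For reference, the attribute form being documented looks like this when
applied to a type (a minimal example, not taken from the patch itself):

/* "mode" as a type attribute: pick the type's machine mode directly.  */
typedef int byte_int __attribute__ ((mode (QI)));   /* one-byte integer */
typedef int word_int __attribute__ ((mode (word))); /* word-sized integer */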


Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-11 Thread Jackson Woodruff

Hi Sudi,

Thanks for the review.


On 07/10/2018 10:56 AM, Sudakshina wrote:

Hi Jackson


-  if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
+  if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode))

mem_1 == mem[1]?

Oops, yes... That should be mem[0].


 return false;

-  /* The mems cannot be volatile.  */
...

/* If we have SImode and slow unaligned ldp,
  check the alignment to be at least 8 byte. */
   if (mode == SImode
   && (aarch64_tune_params.extra_tuning_flags
-  & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
+      & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
   && !optimize_size
-  && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+  && MEM_ALIGN (mem[1]) < 8 * BITS_PER_UNIT)

Likewise

Done

...
   /* Check if the registers are of same class.  */
-  if (rclass_1 != rclass_2 || rclass_2 != rclass_3 || rclass_3 != 
rclass_4)

-    return false;
+  for (int i = 0; i < 3; i++)

num_instructions - 1 instead of 3 would be more consistent.

Done


+    if (rclass[i] != rclass[i + 1])
+  return false;

It looks good otherwise.

Thanks
Sudi


Re-regtested and bootstrapped.

OK for trunk?

Thanks,

Jackson
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8e8525adb455780269757452c8c3eb20be..da44b33b2bc12f9aa2122cf5194e244437fb31a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17026,23 +17026,21 @@ bool
 aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
    scalar_mode mode)
 {
-  enum reg_class rclass_1, rclass_2, rclass_3, rclass_4;
-  HOST_WIDE_INT offvals[4], msize;
-  rtx mem_1, mem_2, mem_3, mem_4, reg_1, reg_2, reg_3, reg_4;
-  rtx base_1, base_2, base_3, base_4, offset_1, offset_2, offset_3, offset_4;
+  const int num_instructions = 4;
+  enum reg_class rclass[num_instructions];
+  HOST_WIDE_INT offvals[num_instructions], msize;
+  rtx mem[num_instructions], reg[num_instructions],
+  base[num_instructions], offset[num_instructions];
 
   if (load)
 {
-  reg_1 = operands[0];
-  mem_1 = operands[1];
-  reg_2 = operands[2];
-  mem_2 = operands[3];
-  reg_3 = operands[4];
-  mem_3 = operands[5];
-  reg_4 = operands[6];
-  mem_4 = operands[7];
-  gcc_assert (REG_P (reg_1) && REG_P (reg_2)
-		  && REG_P (reg_3) && REG_P (reg_4));
+  for (int i = 0; i < num_instructions; i++)
+	{
+	  reg[i] = operands[2 * i];
+	  mem[i] = operands[2 * i + 1];
+
+	  gcc_assert (REG_P (reg[i]));
+	}
 
   /* Do not attempt to merge the loads if the loads clobber each other.  */
   for (int i = 0; i < 8; i += 2)
@@ -17051,53 +17049,47 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
 	return false;
 }
   else
-{
-  mem_1 = operands[0];
-  reg_1 = operands[1];
-  mem_2 = operands[2];
-  reg_2 = operands[3];
-  mem_3 = operands[4];
-  reg_3 = operands[5];
-  mem_4 = operands[6];
-  reg_4 = operands[7];
-}
+for (int i = 0; i < num_instructions; i++)
+  {
+	mem[i] = operands[2 * i];
+	reg[i] = operands[2 * i + 1];
+  }
+
   /* Skip if memory operand is by itself valid for ldp/stp.  */
-  if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
+  if (!MEM_P (mem[0]) || aarch64_mem_pair_operand (mem[0], mode))
 return false;
 
-  /* The mems cannot be volatile.  */
-  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
-  || MEM_VOLATILE_P (mem_3) ||MEM_VOLATILE_P (mem_4))
-return false;
+  for (int i = 0; i < num_instructions; i++)
+{
+  /* The mems cannot be volatile.  */
+  if (MEM_VOLATILE_P (mem[i]))
+	return false;
 
-  /* Check if the addresses are in the form of [base+offset].  */
-  extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
-  if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
-return false;
-  extract_base_offset_in_addr (mem_2, &base_2, &offset_2);
-  if (base_2 == NULL_RTX || offset_2 == NULL_RTX)
-return false;
-  extract_base_offset_in_addr (mem_3, &base_3, &offset_3);
-  if (base_3 == NULL_RTX || offset_3 == NULL_RTX)
-return false;
-  extract_base_offset_in_addr (mem_4, &base_4, &offset_4);
-  if (base_4 == NULL_RTX || offset_4 == NULL_RTX)
-return false;
+  /* Check if the addresses are in the form of [base+offset].  */
+  extract_base_offset_in_addr (mem[i], base + i, offset + i);
+  if (base[i] == NULL_RTX || offset[i] == NULL_RTX)
+	return false;
+}
+
+  /* Check if addresses are clobbered by load.  */
+  if (load)
+for (int i = 0; i < num_instructions; i++)
+  if (reg_mentioned_p (reg[i], mem[i]))
+	return false;
 
   /* Check if the bases are same.  */
-  if (!rtx_equal_p (base_1, base_2)
-  || !rtx_equal_p (base_2, base_3)
-  || !rtx_equal_p (base_3, base_4))
-return false;
+  for (int i = 0; i < num_instructions - 1; i++)
+if (!rtx_equal_p (base[i], base[i + 1]))
+  return false;
+
+  for (int i = 0; i < num_instructions; i++)
+offva

Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-11 Thread Jackson Woodruff

Hi Sudi,

On 07/10/2018 02:29 PM, Sudakshina Das wrote:

Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch resolves PR86014.  It does so by noticing that the last 
load may clobber the address register without issue (regardless of 
where it exists in the final ldp/stp sequence). That check has been 
changed so that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Remove address clobber check on last register.

This looks good to me but you will need a maintainer to approve it.  The only
thing I would add is that you could move the comment on top of the for loop
to this patch.  That is, keep the original
/* Check if the addresses are clobbered by load.  */
in your [1/2] and make the comment change in [2/2].

Thanks, change made.  OK for trunk?

Thanks,

Jackson
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index da44b33b2bc12f9aa2122cf5194e244437fb31a5..8a027974e9772cacf5f5cb8ec61e8ef62187e879 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17071,9 +17071,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
 	return false;
 }
 
-  /* Check if addresses are clobbered by load.  */
+  /* Only the last register in the order in which they occur
+ may be clobbered by the load.  */
   if (load)
-for (int i = 0; i < num_instructions; i++)
+for (int i = 0; i < num_instructions - 1; i++)
   if (reg_mentioned_p (reg[i], mem[i]))
 	return false;
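
To illustrate the relaxed invariant (a schematic sketch, not the actual
gcc.target/aarch64/ldp_stp_10.c testcase):

/* With the check relaxed, four loads through base x0 can still fuse into
   two load-pairs even though the last transfer register is the base
   itself; no later access in the sequence reads x0:

	ldp	x2, x3, [x0, 16]
	ldp	x0, x1, [x0]	// base clobbered by the final pair: OK  */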
 


Re: abstract wide int binop code from VRP

2018-07-11 Thread Aldy Hernandez



On 07/11/2018 08:52 AM, Richard Biener wrote:

On Wed, Jul 11, 2018 at 8:48 AM Aldy Hernandez  wrote:


Hmmm, I think we can do better, and since this hasn't been reviewed yet,
I don't think anyone will mind the adjustment to the patch ;-).

I really hate int_const_binop_SOME_RANDOM_NUMBER.  We should abstract
them into properly named poly_int_binop, wide_int_binop, and tree_binop,
and then use a default argument for int_const_binop() to get things going.

Sorry for more changes in flight, but I thought we could benefit from
more cleanups :).

OK for trunk pending tests?


Much of GCC pre-dates function overloading / default args ;)


Heh...and ANSI C.



Looks OK but can you please rename your tree_binop to int_cst_binop?
Or maybe inline it into int_const_binop, also sharing the force_fit_type ()
tail with poly_int_binop?


I tried both, but inlining looked cleaner :).  Done.



What about mixed INTEGER_CST / poly_int constants?  Shouldn't it
be

   if (neither-poly-nor-integer-cst (arg1 || arg2))
 return NULL_TREE;
   if (poly_int_tree (arg1) || poly_int_tree (arg2))
 poly-int-stuff
   else if (INTEGER_CST && INTEGER_CST)
 wide-int-stuff

?  I see that is a pre-existing issue but if you are at refactoring...
wi::to_poly_wide should handle INTEGER_CST operands just fine
I hope.


This aborted:
gcc_assert (NUM_POLY_INT_COEFFS != 1);

but even taking it out made the bootstrap die somewhere else.

If it's ok, I'd rather not tackle this now, as I have some more cleanups 
that are pending on this.  If you feel strongly, I could do it at a 
later time.


OK pending tests?
Aldy
gcc/

* fold-const.c (int_const_binop_1): Abstract...
(wide_int_binop): ...wide int code here.
	(poly_int_binop): ...poly int code here.
	Abstract the rest of int_const_binop_1 into int_const_binop.
* fold-const.h (wide_int_binop): New.
* tree-vrp.c (vrp_int_const_binop): Call wide_int_binop.
	Remove useless PLUS/MINUS_EXPR case.
(zero_nonzero_bits_from_vr): Move wide int code...
(zero_nonzero_bits_from_bounds): ...here.
(extract_range_from_binary_expr_1): Move mask optimization code...
(range_easy_mask_min_max): ...here.
* tree-vrp.h (zero_nonzero_bits_from_bounds): New.
(range_easy_mask_min_max): New.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 97c435fa5e0..ad8c0a69f63 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -966,21 +966,17 @@ int_binop_types_match_p (enum tree_code code, const_tree type1, const_tree type2
 	 && TYPE_MODE (type1) == TYPE_MODE (type2);
 }
 
-/* Subroutine of int_const_binop_1 that handles two INTEGER_CSTs.  */
+/* Combine two wide ints ARG1 and ARG2 under operation CODE to produce
+   a new constant in RES.  Return FALSE if we don't know how to
+   evaluate CODE at compile-time.  */
 
-static tree
-int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
-		   int overflowable)
+bool
+wide_int_binop (enum tree_code code,
+		wide_int &res, const wide_int &arg1, const wide_int &arg2,
+		signop sign, wi::overflow_type &overflow)
 {
-  wide_int res;
-  tree t;
-  tree type = TREE_TYPE (parg1);
-  signop sign = TYPE_SIGN (type);
-  wi::overflow_type overflow = wi::OVF_NONE;
-
-  wi::tree_to_wide_ref arg1 = wi::to_wide (parg1);
-  wide_int arg2 = wi::to_wide (parg2, TYPE_PRECISION (type));
-
+  wide_int tmp;
+  overflow = wi::OVF_NONE;
   switch (code)
 {
 case BIT_IOR_EXPR:
@@ -999,37 +995,41 @@ int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
 case LSHIFT_EXPR:
   if (wi::neg_p (arg2))
 	{
-	  arg2 = -arg2;
+	  tmp = -arg2;
 	  if (code == RSHIFT_EXPR)
 	code = LSHIFT_EXPR;
 	  else
 	code = RSHIFT_EXPR;
 	}
+  else
+tmp = arg2;
 
   if (code == RSHIFT_EXPR)
 	/* It's unclear from the C standard whether shifts can overflow.
 	   The following code ignores overflow; perhaps a C standard
 	   interpretation ruling is needed.  */
-	res = wi::rshift (arg1, arg2, sign);
+	res = wi::rshift (arg1, tmp, sign);
   else
-	res = wi::lshift (arg1, arg2);
+	res = wi::lshift (arg1, tmp);
   break;
 
 case RROTATE_EXPR:
 case LROTATE_EXPR:
   if (wi::neg_p (arg2))
 	{
-	  arg2 = -arg2;
+	  tmp = -arg2;
 	  if (code == RROTATE_EXPR)
 	code = LROTATE_EXPR;
 	  else
 	code = RROTATE_EXPR;
 	}
+  else
+tmp = arg2;
 
   if (code == RROTATE_EXPR)
-	res = wi::rrotate (arg1, arg2);
+	res = wi::rrotate (arg1, tmp);
   else
-	res = wi::lrotate (arg1, arg2);
+	res = wi::lrotate (arg1, tmp);
   break;
 
 case PLUS_EXPR:
@@ -1051,49 +1051,49 @@ int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
   if (arg2 == 0)
-	return NULL_TREE;
+	return false;
   res = wi::div_trunc (arg1, arg2, sign, &overflow);
   break;
 
 case FLOOR_DIV_EXPR:
   if (arg2 == 0)
-	return NULL_TREE;
+	return false;
   res

Re: abstract wide int binop code from VRP

2018-07-11 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jul 11, 2018 at 8:48 AM Aldy Hernandez  wrote:
>>
>> Hmmm, I think we can do better, and since this hasn't been reviewed yet,
>> I don't think anyone will mind the adjustment to the patch ;-).
>>
>> I really hate int_const_binop_SOME_RANDOM_NUMBER.  We should abstract
>> them into properly named poly_int_binop, wide_int_binop, and tree_binop,
>> and then use a default argument for int_const_binop() to get things going.
>>
>> Sorry for more changes in flight, but I thought we could benefit from
>> more cleanups :).
>>
>> OK for trunk pending tests?
>
> Much of GCC pre-dates function overloading / default args ;)
>
> Looks OK but can you please rename your tree_binop to int_cst_binop?
> Or maybe inline it into int_const_binop, also sharing the force_fit_type ()
> tail with poly_int_binop?
>
> What about mixed INTEGER_CST / poly_int constants?  Shouldn't it
> be
>
>   if (neither-poly-nor-integer-cst (arg1 || arg2))
> return NULL_TREE;
>   if (poly_int_tree (arg1) || poly_int_tree (arg2))
> poly-int-stuff
>   else if (INTEGER_CST && INTEGER_CST)
> wide-int-stuff
>
> ?  I see that is a pre-existing issue but if you are at refactoring...
> wi::to_poly_wide should handle INTEGER_CST operands just fine
> I hope.

Don't think it's a preexisting issue.  poly_int_tree_p returns true
for anything that can be represented as a poly_int, i.e. both
INTEGER_CST and POLY_INT_CST.  (It wouldn't really make sense to
ask whether something could *only* be represented as a POLY_INT_CST.)

So:

  if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
{
  poly_wide_int res;
  bool overflow;
  tree type = TREE_TYPE (arg1);
  signop sign = TYPE_SIGN (type);
  switch (code)
{
case PLUS_EXPR:
  res = wi::add (wi::to_poly_wide (arg1),
 wi::to_poly_wide (arg2), sign, &overflow);
  break;

handles POLY_INT_CST + POLY_INT_CST, POLY_INT_CST + INTEGER_CST and
INTEGER_CST + POLY_INT_CST.

Thanks,
Richard
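
Concretely (an illustrative fragment, not from the patch): poly_int_tree_p
accepts a plain INTEGER_CST as well, and wi::to_poly_wide widens it to a
degenerate poly_int, so the mixed cases fall out of the single poly path:

tree a = build_int_cst (sizetype, 16);     /* INTEGER_CST */
gcc_assert (poly_int_tree_p (a));          /* true for both constant kinds */
poly_wide_int pa = wi::to_poly_wide (a);   /* behaves as the poly_int {16} */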


Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-07-11 Thread Jeff Law
On 07/11/2018 05:20 AM, Tamar Christina wrote:
> Hi All,
> 
> This patch implements the use of stack clash mitigation for AArch64.
> On AArch64 we expect both the probing interval and the guard size to be 64KB
> and we enforce them to always be equal.
> 
> We also probe up by 1024 bytes in the general case when a probe is required.
> 
> AArch64 has the following probing conditions:
> 
>  1) Any allocation of less than 63KB requires no probing.  An ABI-defined
>     safe buffer of 1KB is used and a page size of 64KB is assumed.
> 
>  2) Any allocation larger than 1 page is done in increments of page size
>     and probed up by 1KB, leaving the residuals.
> 
>  3a) Any residual for local arguments that is less than 63KB requires no
>      probing.  Essentially this is a sliding window.  The probing range
>      determines the ABI safe buffer, and the amount to be probed up.
> 
>   b) Any residual for outgoing arguments that is less than 1KB requires no
>      probing.  However, to maintain our invariant, anything at or above 1KB
>      requires a probe.
> 
> Incrementally allocating less than the probing threshold (e.g. in recursive
> functions) will not be an issue, as the store of LR counts as a probe.
> 
> 
>                  +-------------------+
>                  |  ABI SAFE REGION  |
>                +-+-------------------+
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
> maximum amount | |                   |
> not needing a  | |                   |
> probe          | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |   Probe offset when
>                | |                   |   probe is required
>                | |                   |
>                + +-------------------+   Point of first probe
>                  |  ABI SAFE REGION  |
>                  +-------------------+
>                  |                   |
>                  |                   |
>                  |                   |
> 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Target was tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.md (cmp,
>   probe_stack_range): Add k (SP) constraint.
>   * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
>   STACK_CLASH_MAX_UNROLL_PAGES): New.
>   * config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
>   stack probes for stack clash.
>   (aarch64_allocate_and_probe_stack_space): New.
>   (aarch64_expand_prologue): Use it.
>   (aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
>   (aarch64_sub_sp): Add emit_move_imm optional param.
[ ... ]
I'm going to let the aarch64 maintainers ack/nack the aarch64 specific
bits.  I'll review them to see if there's anything obvious (since I am
familiar with the core issues and the original implementation).

I'm happy to own review work on the target independent chunks.

jeff
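
For readers following along, the probing conditions quoted above reduce to
roughly the following shape (a sketch under the stated 64KB-guard/1KB-buffer
assumptions; 'sp' and 'probe' are hypothetical stand-ins, and this is not the
committed aarch64_allocate_and_probe_stack_space):

/* Sketch only: 'sp' models the stack pointer and 'probe' stores a zero
   at the given address; neither is real GCC API.  The guard equals the
   probing interval (64KB) and the safe buffer is 1KB, so 63KB is the
   largest allocation needing no probe at all.  */
extern unsigned long sp;
extern void probe (unsigned long addr);

static void
allocate_and_probe (unsigned long size, unsigned long outgoing_args)
{
  const unsigned long guard  = 64 * 1024;
  const unsigned long buffer = 1024;

  /* Condition 2: large frames drop one page at a time, probing 1KB
     above the new SP each time.  */
  while (size >= guard)
    {
      sp -= guard;
      probe (sp + buffer);
      size -= guard;
    }

  /* Conditions 1/3a: the residual stays inside the sliding safe
     window, so it needs no probe of its own...  */
  sp -= size;

  /* Condition 3b: ...except that outgoing-args residuals of 1KB or
     more get one final probe to maintain the invariant.  */
  if (outgoing_args >= buffer)
    probe (sp + buffer);
}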


[PATCH, rs6000] Add missing logical-op interfaces to emmintrin.h

2018-07-11 Thread Bill Schmidt
Hi,

It was recently brought to our attention that the existing emmintrin.h
header, which was believed to be feature-complete for SSE2 support, is
actually missing four logical-op interfaces:

 _mm_and_si128
 _mm_andnot_si128
 _mm_or_si128
 _mm_xor_si128

This patch provides those with the obvious implementations, along with
test cases.  I've bootstrapped it on powerpc64le-linux-gnu (P8, P9)
and powerpc64-linux-gnu (P7, P8) and tested it with no regressions.
Is this okay for trunk?

Although this isn't a regression, it is an oversight that leaves the
SSE2 support incomplete.  Thus I'd like to ask permission to also
backport this to gcc-8-branch after a short waiting period.  It's
passed regstrap on P8 and P9 LE, and P7/P8 BE testing is underway.
Is that backport okay if testing succeeds?

[BTW, I'm shepherding this patch on behalf of Steve Munroe.]

Thanks!
Bill


[gcc]

2018-07-10  Bill Schmidt  
Steve Munroe  

* config/rs6000/emmintrin.h (_mm_and_si128): New function.
(_mm_andnot_si128): Likewise.
(_mm_or_si128): Likewise.
(_mm_xor_si128): Likewise.

[gcc/testsuite]

2018-07-10  Bill Schmidt  
Steve Munroe  

* gcc.target/powerpc/sse2-pand-1.c: New file.
* gcc.target/powerpc/sse2-pandn-1.c: Likewise.
* gcc.target/powerpc/sse2-por-1.c: Likewise.
* gcc.target/powerpc/sse2-pxor-1.c: Likewise.


Index: gcc/config/rs6000/emmintrin.h
===
--- gcc/config/rs6000/emmintrin.h   (revision 262235)
+++ gcc/config/rs6000/emmintrin.h   (working copy)
@@ -1884,6 +1884,30 @@
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_and_si128 (__m128i __A, __m128i __B)
+{
+  return (__m128i)vec_and ((__v2di) __A, (__v2di) __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_andnot_si128 (__m128i __A, __m128i __B)
+{
+  return (__m128i)vec_andc ((__v2di) __B, (__v2di) __A);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_or_si128 (__m128i __A, __m128i __B)
+{
+  return (__m128i)vec_or ((__v2di) __A, (__v2di) __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_xor_si128 (__m128i __A, __m128i __B)
+{
+  return (__m128i)vec_xor ((__v2di) __A, (__v2di) __B);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_cmpeq_epi8 (__m128i __A, __m128i __B)
 {
   return (__m128i) vec_cmpeq ((__v16qi) __A, (__v16qi)__B);
@@ -2333,3 +2357,4 @@
 }
 
 #endif /* EMMINTRIN_H_ */
+
Index: gcc/testsuite/gcc.target/powerpc/sse2-pand-1.c
===
--- gcc/testsuite/gcc.target/powerpc/sse2-pand-1.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse2-pand-1.c  (working copy)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse2-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse2_test_pand_1
+#endif
+
+#include <emmintrin.h>
+
+static __m128i
+__attribute__((noinline, unused))
+test (__m128i s1, __m128i s2)
+{
+  return _mm_and_si128 (s1, s2); 
+}
+
+static void
+TEST (void)
+{
+  union128i_b u, s1, s2;
+  char e[16];
+  int i;
+   
+  s1.x = _mm_set_epi8 (1,2,3,4,10,20,30,90,-80,-40,-100,-15,98, 25, 98,7);
+  s2.x = _mm_set_epi8 (88, 44, 33, 22, 11, 98, 76, -100, -34, -78, -39, 6, 3, 
4, 5, 119);
+  u.x = test (s1.x, s2.x); 
+   
+  for (i = 0; i < 16; i++)
+ e[i] = s1.a[i] & s2.a[i];
+
+  if (check_union128i_b (u, e))
+abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse2-pandn-1.c
===
--- gcc/testsuite/gcc.target/powerpc/sse2-pandn-1.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse2-pandn-1.c (working copy)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse2-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse2_test_pandn_1
+#endif
+
+#include <emmintrin.h>
+
+static __m128i
+__attribute__((noinline, unused))
+test (__m128i s1, __m128i s2)
+{
+  return _mm_andnot_si128 (s1, s2); 
+}
+
+static void
+TEST (void)
+{
+  union128i_b u, s1, s2;
+  char e[16];
+  int i;
+   
+  s1.x = _mm_set_epi8 (1,2,3,4,10,20,30,90,-80,-40,-100,-15,98, 25, 98,7);
+  s2.x = _mm_set_epi8 (88, 44, 33, 22, 11, 98, 76, -100, -34, -78, -39, 6, 3, 
4, 5, 119);
+  u.x = test (s1.x, s2.x); 
+   
+  for (i = 0; i < 16; i++)
+ e[i] = (~s1.a[i]) & s2.a[i];
+
+  if (check_union128i_b (u, e))
+abort ();
+}
Index: gcc/testsuite/gcc.target/p
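
For completeness, typical usage of the four new interfaces looks like this
(an illustrative snippet, not part of the patch; the intrinsics themselves
are the standard SSE2 ones):

#include <emmintrin.h>

__m128i
combine (__m128i a, __m128i b)
{
  __m128i conj = _mm_and_si128 (a, b);     /* a & b */
  __m128i disj = _mm_or_si128 (a, b);      /* a | b */
  __m128i diff = _mm_xor_si128 (a, b);     /* a ^ b */
  /* (~(a & b)) & (disj | diff) */
  return _mm_andnot_si128 (conj, _mm_or_si128 (disj, diff));
}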

Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-07-11 Thread Jeff Law
On 07/11/2018 05:20 AM, Tamar Christina wrote:
> Hi All,
> 
> This patch implements the use of stack clash mitigation for AArch64.
> On AArch64 we expect both the probing interval and the guard size to be 64KB
> and we enforce them to always be equal.
> 
> We also probe up by 1024 bytes in the general case when a probe is required.
> 
> AArch64 has the following probing conditions:
> 
>  1) Any allocation of less than 63KB requires no probing.  An ABI-defined
>     safe buffer of 1KB is used and a page size of 64KB is assumed.
> 
>  2) Any allocation larger than 1 page is done in increments of page size
>     and probed up by 1KB, leaving the residuals.
> 
>  3a) Any residual for local arguments that is less than 63KB requires no
>      probing.  Essentially this is a sliding window.  The probing range
>      determines the ABI safe buffer, and the amount to be probed up.
> 
>   b) Any residual for outgoing arguments that is less than 1KB requires no
>      probing.  However, to maintain our invariant, anything at or above 1KB
>      requires a probe.
> 
> Incrementally allocating less than the probing threshold (e.g. in recursive
> functions) will not be an issue, as the store of LR counts as a probe.
> 
> 
>                  +-------------------+
>                  |  ABI SAFE REGION  |
>                +-+-------------------+
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
> maximum amount | |                   |
> not needing a  | |                   |
> probe          | |                   |
>                | |                   |
>                | |                   |
>                | |                   |
>                | |                   |   Probe offset when
>                | |                   |   probe is required
>                | |                   |
>                + +-------------------+   Point of first probe
>                  |  ABI SAFE REGION  |
>                  +-------------------+
>                  |                   |
>                  |                   |
>                  |                   |
> 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Target was tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.md (cmp,
>   probe_stack_range): Add k (SP) constraint.
>   * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
>   STACK_CLASH_MAX_UNROLL_PAGES): New.
>   * config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
>   stack probes for stack clash.
>   (aarch64_allocate_and_probe_stack_space): New.
>   (aarch64_expand_prologue): Use it.
>   (aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
>   (aarch64_sub_sp): Add emit_move_imm optional param.
> 
> gcc/testsuite/
> 2018-07-11  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * gcc.target/aarch64/stack-check-12.c: New.
>   * gcc.target/aarch64/stack-check-13.c: New.
>   * gcc.target/aarch64/stack-check-cfa-1.c: New.
>   * gcc.target/aarch64/stack-check-cfa-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-1.c: New.
>   * gcc.target/aarch64/stack-check-prologue-10.c: New.
>   * gcc.target/aarch64/stack-check-prologue-11.c: New.
>   * gcc.target/aarch64/stack-check-prologue-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-3.c: New.
>   * gcc.target/aarch64/stack-check-prologue-4.c: New.
>   

Re: abstract wide int binop code from VRP

2018-07-11 Thread Richard Sandiford
Aldy Hernandez  writes:
> On 07/11/2018 08:52 AM, Richard Biener wrote:
>> On Wed, Jul 11, 2018 at 8:48 AM Aldy Hernandez  wrote:
>>>
>>> Hmmm, I think we can do better, and since this hasn't been reviewed yet,
>>> I don't think anyone will mind the adjustment to the patch ;-).
>>>
>>> I really hate int_const_binop_SOME_RANDOM_NUMBER.  We should abstract
>>> them into properly named poly_int_binop, wide_int_binop, and tree_binop,
>>> and then use a default argument for int_const_binop() to get things going.
>>>
>>> Sorry for more changes in flight, but I thought we could benefit from
>>> more cleanups :).
>>>
>>> OK for trunk pending tests?
>> 
>> Much of GCC pre-dates function overloading / default args ;)
>
> Heh...and ANSI C.
>
>> 
>> Looks OK but can you please rename your tree_binop to int_cst_binop?
>> Or maybe inline it into int_const_binop, also sharing the force_fit_type ()
>> tail with poly_int_binop?
>
> I tried both, but inlining looked cleaner :).  Done.
>
>> 
>> What about mixed INTEGER_CST / poly_int constants?  Shouldn't it
>> be
>> 
>>if (neither-poly-nor-integer-cst (arg1 || arg2))
>>  return NULL_TREE;
>>if (poly_int_tree (arg1) || poly_int_tree (arg2))
>>  poly-int-stuff
>>else if (INTEGER_CST && INTEGER_CST)
>>  wide-int-stuff
>> 
>> ?  I see that is a pre-existing issue but if you are at refactoring...
>> wi::to_poly_wide should handle INTEGER_CST operands just fine
>> I hope.
>
> This aborted:
> gcc_assert (NUM_POLY_INT_COEFFS != 1);
>
> but even taking it out made the bootstrap die somewhere else.
>
> If it's ok, I'd rather not tackle this now, as I have some more cleanups 
> that are pending on this.  If you feel strongly, I could do it at a 
> later time.
>
> OK pending tests?

LGTM FWIW, just some nits:

> -/* Subroutine of int_const_binop_1 that handles two INTEGER_CSTs.  */
> +/* Combine two wide ints ARG1 and ARG2 under operation CODE to produce
> +   a new constant in RES.  Return FALSE if we don't know how to
> +   evaluate CODE at compile-time.  */
> 
> -static tree
> -int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
> -int overflowable)
> +bool
> +wide_int_binop (enum tree_code code,
> + wide_int &res, const wide_int &arg1, const wide_int &arg2,
> + signop sign, wi::overflow_type &overflow)
>  {

IMO we should avoid pass-back by reference like the plague. :-)
It's especially confusing when the code does things like:

case FLOOR_DIV_EXPR:
  if (arg2 == 0)
return false;
  res = wi::div_floor (arg1, arg2, sign, &overflow);
  break;

It looked at first like it was taking the address of a local variable
and failing to propagate the information back up.

I think we should stick to using pointers for this kind of thing.
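
A minimal self-contained illustration of the point (generic C++, not GCC
internals):

#include <stdio.h>

static bool
div_by_ptr (int a, int b, int *quot)   /* pointer out-parameter */
{
  if (b == 0)
    return false;
  *quot = a / b;
  return true;
}

static bool
div_by_ref (int a, int b, int &quot)   /* reference out-parameter */
{
  if (b == 0)
    return false;
  quot = a / b;
  return true;
}

int
main ()
{
  int q;
  div_by_ptr (7, 2, &q);   /* the '&' flags the write-back at the call site */
  div_by_ref (7, 2, q);    /* nothing here hints that 'q' is written */
  printf ("%d\n", q);
  return 0;
}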

> -/* Combine two integer constants PARG1 and PARG2 under operation CODE
> -   to produce a new constant.  Return NULL_TREE if we don't know how
> +/* Combine two poly int's ARG1 and ARG2 under operation CODE to
> +   produce a new constant in RES.  Return FALSE if we don't know how
> to evaluate CODE at compile-time.  */
> 
> -static tree
> -int_const_binop_1 (enum tree_code code, const_tree arg1, const_tree arg2,
> -int overflowable)
> +static bool
> +poly_int_binop (poly_wide_int &res, enum tree_code code,
> + const_tree arg1, const_tree arg2,
> + signop sign, wi::overflow_type &overflow)
>  {

Would be good to be consistent about the order of the result and code
arguments.  Here it's "result, code" (which seems better IMO),
but in wide_int_binop it's "code, result".

> +/* Combine two integer constants PARG1 and PARG2 under operation CODE
> +   to produce a new constant.  Return NULL_TREE if we don't know how
> +   to evaluate CODE at compile-time.  */
> +
>  tree
> -int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2)
> +int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2,
> +  int overflowable)

s/PARG/ARG/g in comment.

>  {
> -  return int_const_binop_1 (code, arg1, arg2, 1);
> +  bool success = false;
> +  poly_wide_int poly_res;
> +  tree type = TREE_TYPE (arg1);
> +  signop sign = TYPE_SIGN (type);
> +  wi::overflow_type overflow = wi::OVF_NONE;
> +
> +  if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
> +{
> +  wide_int warg1 = wi::to_wide (arg1), res;
> +  wide_int warg2 = wi::to_wide (arg2, TYPE_PRECISION (type));
> +  success = wide_int_binop (code, res, warg1, warg2, sign, overflow);
> +  poly_res = res;
> +}
> +  else if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
> +success = poly_int_binop (poly_res, code, arg1, arg2, sign, overflow);
> +  if (success)
> +return force_fit_type (type, poly_res, overflowable,
> +(((sign == SIGNED || overflowable == -1)
> +  && overflow)
> + | TREE_OVERFLOW (arg1) | TREE_OVERFL

Re: [PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems

2018-07-11 Thread David Malcolm
On Wed, 2018-07-11 at 16:56 +0100, Richard Sandiford wrote:
> David Malcolm  writes:
> > On Mon, 2018-06-25 at 11:10 +0200, Richard Biener wrote:
> > > On Fri, 22 Jun 2018, David Malcolm wrote:
> > > 
> > > > NightStrike and I were chatting on IRC last week about
> > > > issues with trying to vectorize the following code:
> > > > 
> > > > #include <vector>
> > > > std::size_t f(std::vector<std::vector<int>> const & v) {
> > > > std::size_t ret = 0;
> > > > for (auto const & w: v)
> > > > ret += w.size();
> > > > return ret;
> > > > }
> > > > 
> > > > icc could vectorize it, but gcc couldn't, but neither of us
> > > > could
> > > > immediately figure out what the problem was.
> > > > 
> > > > Using -fopt-info leads to a wall of text.
> > > > 
> > > > I tried using my patch here:
> > > > 
> > > >  "[PATCH] v3 of optinfo, remarks and optimization records"
> > > >   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> > > > 
> > > > It improved things somewhat, by showing:
> > > > (a) the nesting structure via indentation, and
> > > > (b) the GCC line at which each message is emitted (by using the
> > > > "remark" output)
> > > > 
> > > > but it's still a wall of text:
> > > > 
> > > >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
> > > >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
> > > > 
> > > > It doesn't yet provide a simple high-level message to a
> > > > tech-savvy user on what they need to do to get GCC to
> > > > vectorize their loop.
> > > 
> > > Yeah, in particular the vectorizer is way too noisy in its low-
> > > level
> > > functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
> > > 
> > > t.C:4:26: note: step unknown.
> > > t.C:4:26: note: vector alignment may not be reachable
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: no array mode for V2DI[3]
> > > t.C:4:26: note: Data access with gaps requires scalar epilogue
> > > loop
> > > t.C:4:26: note: can't use a fully-masked loop because the target
> > > doesn't 
> > > have the appropriate masked load or store.
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: not ssa-name.
> > > t.C:4:26: note: use not simple.
> > > t.C:4:26: note: no array mode for V2DI[3]
> > > t.C:4:26: note: Data access with gaps requires scalar epilogue
> > > loop
> > > t.C:4:26: note: op not supported by target.
> > > t.C:4:26: note: not vectorized: relevant stmt not supported: _15
> > > =
> > > _14 
> > > /[ex] 4;
> > > t.C:4:26: note: bad operation or unsupported loop bound.
> > > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > > t.C:6:12: note: not vectorized: not enough data-refs in basic
> > > block.
> > > 
> > > 
> > > > The pertinent dump messages are:
> > > > 
> > > > test.cc:4:23: remark: === try_vectorize_loop_1 ===
> > > > [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> > > > cc1plus: remark:
> > > > Analyzing loop at test.cc:4
> > > > [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> > > > test.cc:4:23: remark:  === analyze_loop_nest ===
> > > > [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> > > > [...snip...]
> > > > test.cc:4:23: remark:   === vect_analyze_loop_operations ===
> > > > [../../src/gcc/tree-vect-
> > > > loop.c:1520:vect_analyze_loop_operations]
> > > > [...snip...]
> > > > test.cc:4:23: remark:==> examining statement: ‘_15 = _14
> > > > /[ex]
> > > > 4;’ [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> > > > test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’
> > > > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > > > test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’
> > > > [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> > > > test.cc:4:23: remark:type of def: internal
> > > > [../../src/gcc/tree-
> > > > vect-stmts.c:10112:vect_is_simple_use]
> > > > test.cc:4:23: remark:vect_is_simple_use: operand ‘4’
> > > > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > > > test.cc:4:23: remark:op not supported by target.
> > > > [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> > > > test.cc:4:23: remark:not vectorized: relevant stmt not
> > > > supported: ‘_15 = _14 /[ex] 4;’ [../../src/gcc/tree-vect-
> > > > stmts.c:9565:vect_analyze_stmt]
> > > > test.cc:4:23: remark:   bad operation or unsupported loop
> > > > bound.
> > > > [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> > > > cc1plus: remark: vectorized 0 loops in function.
> > > > [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> > > > 
> > > > In particular, that complaint from
> > > >   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> > > > is c

Re: [PATCH][GCC][mid-end] Add a hook to support telling the mid-end when to probe the stack [patch (2/6)]

2018-07-11 Thread Jeff Law
On 07/11/2018 05:21 AM, Tamar Christina wrote:
> Hi All,
> 
> This patch adds a hook to tell the mid-end about the probing requirements
> of the target.  On AArch64 we allow a specific range for which no probing
> needs to be done.  This same range is also the amount that will have to be
> probed up when a probe is needed after dropping the stack.
> 
> Defining this probe comes with the extra requirement that the outgoing
> arguments size of any function that uses alloca and stack clash be at the
> very least 8 bytes.  With this invariant we can skip doing the zero checks
> for alloca and save some code.
> 
> A simplified version of the AArch64 stack frame is:
> 
>    +-----------------------+
>    |                       |
>    |                       |
>    |                       |
>    +-----------------------+
>    |          LR           |
>    +-----------------------+
>    |          FP           |
>    +-----------------------+
>    |  dynamic allocations  | -\      probe range hook affects these
>    +-----------------------+   --\   and ensures that outgoing stack
>    |        padding        |      -- args is always > 8 when alloca.
>    +-----------------------+   ---/  Which means it's always safe to probe
>    |  outgoing stack args  | -/      at SP
>    +-----------------------+
>   
>  
> 
> This allows us to generate better code than we could without the hook,
> without affecting other targets.
> 
> With this patch I am also removing the
> stack_clash_protection_final_dynamic_probe hook, which was added
> specifically for AArch64 but is no longer needed.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
> issues.
> Both targets were tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   PR target/86486
>   * explow.c (anti_adjust_stack_and_probe_stack_clash): Support custom
>   probe ranges.
>   * target.def (stack_clash_protection_alloca_probe_range): New.
>   (stack_clash_protection_final_dynamic_probe): Remove.
>   * targhooks.h (default_stack_clash_protection_alloca_probe_range) New.
>   (default_stack_clash_protection_final_dynamic_probe): Remove.
>   * targhooks.c: Likewise.
>   * doc/tm.texi.in (TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): 
> New.
>   (TARGET_STACK_CLASH_PROTECTION_FINAL_DYNAMIC_PROBE): Remove.
>   * doc/tm.texi: Regenerate.
>
The control flow is a bit convoluted here, but after a few false starts
where I thought this was wrong, I think it's OK.

Jeff
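
For reference, a sketch of how generic code might consult the new hook (the
hook name is from the ChangeLog above; the helpers and surrounding logic are
hypothetical, not the committed explow.c change):

/* Hypothetical: 0 means "probe every dynamic allocation"; otherwise
   allocations within RANGE bytes of the last probe are known safe.  */
HOST_WIDE_INT range
  = targetm.stack_clash_protection_alloca_probe_range ();

if (range == 0)
  probe_every_allocation (size);         /* hypothetical helper */
else
  probe_every_range_bytes (size, range); /* hypothetical helper */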












Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-11 Thread Cesar Philippidis
On 07/02/2018 07:14 AM, Tom de Vries wrote:
> On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
>> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
 Now it follows the formula contained in
 the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>>
>>> Any reason we're not using the cuda runtime functions to get the
>>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>>> to determine launch configuration in nvptx ) ?
>>
>> There are two reasons:
>>
>>   1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>>  properties instead of the CUDA driver API. However, we can always
>>  teach libgomp how to populate the cudaDeviceProp struct using the
>>  driver API.
>>
>>   2) CUDA is not always present on the build host, and that's why
>>  libgomp maintains its own cuda.h. So at the very least, this
>>  functionality would be good to have in libgomp as a fallback
>>  implementation;
> 
> Libgomp maintains its own cuda.h to "allow building GCC with PTX
> offloading even without CUDA being installed" (
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).
> 
> The libgomp nvptx plugin however uses the cuda driver API to launch
> kernels etc, so we can assume that's always available at launch time.
> And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
> Configuration", the occupancy API is also available in the driver API.
> 
> What we cannot assume to be available is the occupancy API pre cuda-6.5.
> So it's fine to have a fallback for that (properly isolated in utility
> functions), but for cuda 6.5 and up we want to use the occupancy API.

Here's revision 2 to the patch. I replaced all of my thread occupancy
heuristics with calls to the CUDA driver as you suggested. The
performance is worse than my heuristics, but that's to be expected
because the CUDA driver only guarantees the minimal launch geometry needed
to fully utilize the hardware, not the optimal value. I'll
reintroduce my heuristics later as a follow up patch. The major
advantage of the CUDA thread occupancy calculator is that it allows the
runtime to select sensible default num_workers to avoid those annoying
runtime failures due to insufficient GPU hardware resources.
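
The driver call in question looks roughly like this (a sketch: 'function'
is the CUfunction being launched, and the derivation of gangs and workers
is illustrative rather than the committed plugin logic):

int min_grid_size, block_size;
CUresult r
  = cuOccupancyMaxPotentialBlockSize (&min_grid_size, &block_size,
				      function, NULL, 0, 0);
if (r == CUDA_SUCCESS)
  {
    /* One gang per resident block; vector length is the 32-thread warp,
       so workers = threads per block / vector length.  */
    num_gangs = min_grid_size;
    num_workers = block_size / 32;
  }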

One thing that may stick out in this patch is how it probes for the
driver version instead of the API version. It turns out that the API
version corresponds to the SM version declared in the PTX sources,
whereas the driver version corresponds to the latest version of CUDA
supported by the driver. At least that's the case with driver version
396.24.

>>  it's not good to have a program fail due to
>>  errors about insufficient hardware resources when it is avoidable.
>>
> 
> Right, in fact there are two separate things you're trying to address
> here: launch failure and occupancy heuristic, so split the patch.

That hunk was small, so I included it with this patch. Although if you
insist, I can remove it.

Is this patch OK for trunk? I tested it on x86_64 with nvptx offloading.

Cesar
2018-07-XX  Cesar Philippidis  
	Tom de Vries  

	gcc/
	* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to ...
	(PTX_DEFAULT_RUNTIME_DIM): ... this.
	(nvptx_goacc_validate_dims): Set default worker and gang dims to
	PTX_DEFAULT_RUNTIME_DIM.
	(nvptx_dim_limit): Ignore GOMP_DIM_WORKER.

	libgomp/
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): Declare.
	(cuOccupancyMaxPotentialBlockSizeWithFlags): Likewise.
	* plugin/plugin-nvptx.c (struct ptx_device): Add driver_version member.
	(nvptx_open_device): Set it.
	(nvptx_exec): Use the CUDA driver to both determine default num_gangs
	and num_workers, and error if the hardware doesn't have sufficient
	resources to launch a kernel.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5608bee8a8d..c1946e75f42 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5165,7 +5165,7 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 /* Define dimension sizes for known hardware.  */
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
-#define PTX_GANG_DEFAULT  0 /* Defer to runtime.  */
+#define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
 /* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
 
@@ -5214,9 +5214,9 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
   dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
   if (dims[GOMP_DIM_WORKER] < 0)
-	dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+	dims[GOMP_DIM_WORKER] = PTX_DEFAULT_RUNTIME_DIM;
   if (dims[GOMP_DIM_GANG] < 0)
-	dims[GOMP_DIM_GANG] = PTX_GANG_DEFAULT;
+	dims[GOMP_DIM_GANG] = PTX_DEFAULT_RUNTIME_DIM;
   changed = true;
 }
 
@@ -5230,9 +5230,6 @@ nvptx_dim_limit (int axis)
 {
   switch (axis)
 {
-case GOMP_DIM_WORKER:
-  return PTX_WORKER_LENGTH;
-
 case GOMP_DIM_VECTOR:
   return PTX_VECTOR_LENGTH;

Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-11 Thread Jeff Law
On 07/11/2018 05:22 AM, Tamar Christina wrote:
> Hi All,
> 
> This patch defines a configure option to allow the setting of the default
> guard size via configure flags when building the target.
> 
> The new flag is:
> 
>  * --with-stack-clash-protection-guard-size=
> 
> The value of configured based params are set very early on and allow the
> target to validate or reject the values as it sees fit.
> 
> To do this, the values for the parameter get set by configure through CPP
> defines.  In case the back-end wants to know whether a value was set or not,
> the original default value is also passed down as a define.
> 
> This allows a target to check if a param was changed by the user at configure 
> time.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
> issues.
> Both targets were tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   PR target/86486
>   * configure.ac: Add stack-clash-protection-guard-size.
>   * config.in (DEFAULT_STK_CLASH_GUARD_SIZE, STK_CLASH_GUARD_SIZE_DEFAULT,
>   STK_CLASH_GUARD_SIZE_MAX, STK_CLASH_GUARD_SIZE_MIN): New.
>   * params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): Use it.
>   * configure: Regenerate.
>   * Makefile.in (params.list, params.options): Add include dir for CPP.
>   * params-list.h: Include auto-host.h
>   * params-options.h: Likewise.
> 
Something seems wrong here.

What's the purpose of including auto-host in params-list and
params-options?  It seems like you're putting a property of the target
(guard size) into the wrong place (auto-host.h).

It's also a bit unclear to me why this is necessary at all.  Are we
planning to support both the 4k and 64k guards?  My goal (once the guard
was configurable) was never for supporting multiple sizes on a target
but instead to allow experimentation to find the right default.

Jeff


Re: [PATCH][GCC][front-end][opt-framework] Allow back-ends to be able to do custom validations on params. [Patch (1/3)]

2018-07-11 Thread Jeff Law
On 07/11/2018 05:24 AM, Tamar Christina wrote:
> Hi All,
> 
> This patch adds the ability for backends to add custom constrains to the param
> values by defining a new hook option_validate_param.
> 
> This hook is invoked on every set_param_value which allows the back-end to
> ensure that the parameters are always within it's desired state.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
> issues.
> Both targets were tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   * params.c (set_param_value):
>   Add index of parameter being validated.
>   * common/common-target.def (option_validate_param): New.
>   * common/common-targhooks.h (default_option_validate_param): New.
>   * common/common-targhooks.c (default_option_validate_param): New.
>   * doc/tm.texi.in (TARGET_OPTION_VALIDATE_PARAM): New.
>   * doc/tm.texi: Regenerate.
> 
OK
jeff


Re: [PATCH][GCC][front-end][opt-framework] Update options framework for parameters to properly handle and validate configure time params. [Patch (2/3)]

2018-07-11 Thread Jeff Law
On 07/11/2018 05:24 AM, Tamar Christina wrote:
> Hi All,
> 
> This patch builds on a previous patch to pass param options down from 
> configure
> by adding more expansive validation and correctness checks.
> 
> These are set very early on and allow the target to validate or reject the
> values as they see fit.
> 
> To do this, compiler_param has been extended to hold a value set at configure
> time; this value is used to distinguish between
> 
> 1) default value
> 2) configure value
> 3) back-end default
> 4) user specific value.
> 
> The priority of the values should be 4 > 2 > 3 > 1.  The compiler will now 
> also
> validate the values in params.def after setting them.  This means invalid 
> values
> will no longer be accepted.
> 
> This also changes it so that default parameters are validated during
> initialization. This change is needed to ensure parameters set via configure
> or by the target specific common initialization routines still keep the
> parameters within the valid range.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
> issues.
> Both targets were tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   * params.h (struct param_info): Add configure_value.
>   * params.c (DEFPARAMCONF): New.
>   (DEFPARAM, DEFPARAMENUM5): Set configure_value.
>   (validate_param): New.
>   (add_params): Use it.
>   (set_param_value): Refactor param validation into validate_param.
>   (maybe_set_param_value): Don't override value from configure.
>   (diagnostic.h): Include.
>   * params-enum.h (DEFPARAMCONF): New.
>   * params-list.h: Likewise.
>   * params-options.h: Likewise.
>   * params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): Use it.
>   * diagnostic.h (diagnostic_ready_p): New.
Generally OK, though it probably should depend on what we decide WRT
configurability.  I.e., I'm not convinced we need to be able to set the
default via a configure-time option.  And if we don't support that, this
patch gets somewhat simpler.

jeff
> 



Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics

2018-07-11 Thread Jeff Law
On 07/07/2018 02:15 AM, Jakub Jelinek wrote:
> Hi!
> 
> On Fri, Jul 06, 2018 at 12:47:07PM +0200, Jakub Jelinek wrote:
>> On Thu, Jul 05, 2018 at 11:57:26PM +0300, Grazvydas Ignotas wrote:
>>> I think it would be more efficient if you took care of it. I won't
>>> have time for at least a few days anyway.
> 
> Here is the complete patch, I found two further issues where
> the __mmask mismatch was in between the return type and what was used
> in the rest of the intrinsic, so not caught by my earlier greps.
> 
> I've added (except for the avx512bitalg which seems to have no runtime
> test coverage whatsoever) tests that cover the real bugs and further
> fixed the avx512*-vpcmp{,u}b-2.c test because (rel) << i triggered UB
> if i could go up to 63.
> 
> I don't have AVX512* hw, so I've just bootstrapped/regtested the patch
> normally on i686-linux and x86_64-linux AVX2 hw and tried the affected
> tests without the config/i386/ changes and with them under SDE.
> The patch should fix these FAILs:
> 
> FAIL: gcc.target/i386/avx512bw-vpcmpb-2.c execution test
> FAIL: gcc.target/i386/avx512bw-vpcmpub-2.c execution test
> FAIL: gcc.target/i386/avx512f-vinsertf32x4-3.c execution test
> FAIL: gcc.target/i386/avx512f-vinserti32x4-3.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpgeb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpgeub-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpgeuw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpgew-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpleb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpleub-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpleuw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmplew-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpltb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpltub-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpltuw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpltw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpneqb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpnequb-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpnequw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpneqw-2.c execution test
> FAIL: gcc.target/i386/avx512vl-vpcmpub-2.c execution test
> 
> Ok for trunk?
> 
> I guess we want to backport it soon, but would appreciate somebody testing
> it on real AVX512-{BW,VL} hw before doing the backports.
> 
> Another thing to consider is whether we shouldn't add those grep/sed checks
> I've been doing (at least the easy ones that don't cross-check the
> i386-builtins.def against the uses in the intrin headers) to config/i386/t-*
> some way.
> 
> 2018-07-07  Jakub Jelinek  
> 
>   * config/i386/avx512bitalgintrin.h (_mm512_mask_bitshuffle_epi64_mask):
>   Use __mmask64 type instead of __mmask8 for __M argument.
>   * config/i386/avx512fintrin.h (_mm512_mask_xor_epi64,
>   _mm512_maskz_xor_epi64): Use __mmask8 type instead of __mmask16 for
>   __U argument.
>   (_mm512_mask_cmpneq_epi64_mask): Use __mmask8 type instead of
>   __mmask16 for __M argument.
>   (_mm512_maskz_insertf32x4, _mm512_maskz_inserti32x4,
>   _mm512_mask_insertf32x4, _mm512_mask_inserti32x4): Cast last argument
>   to __mmask16 instead of __mmask8.
>   * config/i386/avx512vlintrin.h (_mm_mask_add_ps, _mm_maskz_add_ps,
>   _mm256_mask_add_ps, _mm256_maskz_add_ps, _mm_mask_sub_ps,
>   _mm_maskz_sub_ps, _mm256_mask_sub_ps, _mm256_maskz_sub_ps,
>   _mm256_maskz_cvtepi32_ps, _mm_maskz_cvtepi32_ps): Use __mmask8 type
>   instead of __mmask16 for __U argument.
>   * config/i386/avx512vlbwintrin.h (_mm_mask_cmp_epi8_mask): Use
>   __mmask16 instead of __mmask8 for __U argument.
>   (_mm256_mask_cmp_epi8_mask): Use __mmask32 instead of __mmask16 for
>   __U argument.
>   (_mm256_cmp_epi8_mask): Use __mmask32 return type instead of
>   __mmask16.
>   (_mm_mask_cmp_epu8_mask): Use __mmask16 instead of __mmask8 for __U
>   argument.
>   (_mm256_mask_cmp_epu8_mask): Use __mmask32 instead of __mmask16 for
>   __U argument.
>   (_mm256_cmp_epu8_mask): Use __mmask32 return type instead of
>   __mmask16.
>   (_mm_mask_cmp_epi16_mask): Cast last argument to __mmask8 instead
>   of __mmask16.
>   (_mm256_mask_cvtepi8_epi16): Use __mmask16 instead of __mmask32 for
>   __U argument.
>   (_mm_mask_cvtepi8_epi16): Use __mmask8 instead of __mmask32 for
>   __U argument.
>   (_mm256_mask_cvtepu8_epi16): Use __mmask16 instead of __mmask32 for
>   __U argument.
>   (_mm_mask_cvtepu8_epi16): Use __mmask8 instead of __mmask32 for
>   __U argument.
>   (_mm256_mask_cmpneq_epu8_mask, _mm256_mask_cmplt_epu8_mask,
>   _mm256_mask_cmpge_epu8_mask, _mm256_mask_cmple_epu8_mask): Change
>   return type as well as __M argument type and all casts
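
The rule behind all of these fixes is the same: a mask type must carry one
bit per vector lane.  For instance (illustrative, requires -mavx512bw and
-mavx512vl; not from the patch):

#include <immintrin.h>

/* 512-bit vectors of 8-bit lanes have 64 lanes, so the mask is __mmask64.  */
__mmask64
cmp512 (__m512i a, __m512i b)
{
  return _mm512_cmpeq_epi8_mask (a, b);
}

/* 256-bit vectors of 8-bit lanes have 32 lanes; declaring this __mmask16
   would silently drop the upper half of the comparison result.  */
__mmask32
cmp256 (__m256i a, __m256i b)
{
  return _mm256_cmpeq_epi8_mask (a, b);
}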

RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch

2018-07-11 Thread Steve Ellcey
I have a reload/register allocation question and possible patch.  While
working on the Aarch64 SIMD ABI[1] I ran into a problem where GCC was
saving and restoring registers that it did not need to.  I tracked it
down to lra-constraints.c and its use of
targetm.hard_regno_call_part_clobbered on instructions that are not
calls.  Specifically need_for_call_save_p would check this macro even
when the instruction in question (unknown to need_for_call_save_p)
was not a call instruction.

This seems wrong to me and I was wondering if anyone more familiar
with the register allocator and reload could look at this patch and
tell me if it seems reasonable or not.  It passed bootstrap and I
am running tests now.  I am just wondering if there is any reason why
this target function would need to be called on non-call instructions
or if doing so is just an oversight/bug.

Steve Ellcey
sell...@cavium.com


[1] https://gcc.gnu.org/ml/gcc/2018-07/msg00012.html


2018-07-11  Steve Ellcey  

* lra-constraints.c (need_for_call_save_p): Add insn argument
and only check targetm.hard_regno_call_part_clobbered on calls.
(need_for_split_p): Add insn argument, pass to need_for_call_save_p.
(split_reg): Pass insn to need_for_call_save_p.
(split_if_necessary): Pass curr_insn to need_for_split_p.
(inherit_in_ebb): Ditto.


diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 7eeec76..7fc8e7f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5344,7 +5344,7 @@ inherit_reload_reg (bool def_p, int original_regno,
 /* Return true if we need a caller save/restore for pseudo REGNO which
was assigned to a hard register.  */
 static inline bool
-need_for_call_save_p (int regno)
+need_for_call_save_p (int regno, rtx_insn *insn)
 {
   lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
   return (usage_insns[regno].calls_num < calls_num
@@ -5354,7 +5354,7 @@ need_for_call_save_p (int regno)
       ? lra_reg_info[regno].actual_call_used_reg_set
       : call_used_reg_set,
       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
-     || (targetm.hard_regno_call_part_clobbered
+     || (CALL_P (insn) && targetm.hard_regno_call_part_clobbered
      (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
 }
 
@@ -5374,7 +5374,8 @@ static bitmap_head ebb_global_regs;
assignment pass because of too many generated moves which will be
probably removed in the undo pass.  */
 static inline bool
-need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
+need_for_split_p (HARD_REG_SET potential_reload_hard_regs,
+     int regno, rtx_insn *insn)
 {
   int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
 
@@ -5416,7 +5417,8 @@ need_for_split_p (HARD_REG_SET 
potential_reload_hard_regs, int regno)
       || (regno >= FIRST_PSEUDO_REGISTER
       && lra_reg_info[regno].nrefs > 3
       && bitmap_bit_p (&ebb_global_regs, regno))
-     || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
+     || (regno >= FIRST_PSEUDO_REGISTER
+     && need_for_call_save_p (regno, insn)));
 }
 
 /* Return class for the split pseudo created from original pseudo with
@@ -5536,7 +5538,7 @@ split_reg (bool before_p, int original_regno, rtx_insn 
*insn,
   nregs = hard_regno_nregs (hard_regno, mode);
   rclass = lra_get_allocno_class (original_regno);
   original_reg = regno_reg_rtx[original_regno];
-  call_save_p = need_for_call_save_p (original_regno);
+  call_save_p = need_for_call_save_p (original_regno, insn);
 }
   lra_assert (hard_regno >= 0);
   if (lra_dump_file != NULL)
@@ -5759,7 +5761,7 @@ split_if_necessary (int regno, machine_mode mode,
     && INSN_UID (next_usage_insns) < max_uid)
    || (GET_CODE (next_usage_insns) == INSN_LIST
    && (INSN_UID (XEXP (next_usage_insns, 0)) < max_uid)))
-   && need_for_split_p (potential_reload_hard_regs, regno + i)
+   && need_for_split_p (potential_reload_hard_regs, regno + i, insn)
    && split_reg (before_p, regno + i, insn, next_usage_insns, NULL))
 res = true;
   return res;
@@ -6529,7 +6531,8 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
      && usage_insns[j].check == curr_usage_insns_check
      && (next_usage_insns = usage_insns[j].insns) != NULL_RTX)
    {
-     if (need_for_split_p (potential_reload_hard_regs, j))
+     if (need_for_split_p (potential_reload_hard_regs, j,
+   curr_insn))
    {
      if (lra_dump_file != NULL && head_p)
    {


Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-11 Thread Jeff Law
On 07/10/2018 10:43 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 16:42, Jeff Law wrote:
>> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
>>> On 10/07/18 00:13, Jeff Law wrote:
 On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>
> To address all of the above, these patches adopt a new approach, based
> in part on a posting by Chandler Carruth to the LLVM developers list
> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> but which we have extended to deal with inter-function speculation.
> The patches divide the problem into two halves.
 We're essentially turning the control dependency into a value that we
 can then use to munge the pointer or the resultant data.

>
> The first half is some target-specific code to track the speculation
> condition through the generated code to provide an internal variable
> which can tell us whether or not the CPU's control flow speculation
> matches the data flow calculations.  The idea is that the internal
> variable starts with the value TRUE and if the CPU's control flow
> speculation ever causes a jump to the wrong block of code the variable
> becomes false until such time as the incorrect control flow
> speculation gets unwound.
 Right.

 So one of the things that comes immediately to mind is you have to run
 this early enough that you can still get to all the control flow and
 build your predicates.  Otherwise you have to undo stuff like
 conditional move generation.
>>>
>>> No, the opposite, in fact.  We want to run this very late, at least on
>>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>>> fine - they're data-flow operations, not control flow (in fact, that's
>>> exactly what the control flow tracker instructions are).  By running it
>>> late we avoid disrupting any of the earlier optimization passes as well.
>> Ack.  I looked at the aarch64 implementation after sending my message
>> and it clearly runs very late.
>>
>> I haven't convinced myself that all the work the generic parts of the
>> compiler do to rewrite and eliminate conditionals is safe.  But even if it
>> isn't, you're probably getting enough coverage to drastically reduce the
>> attack surface.  I'm going to have to think about the early
>> transformations we make and how they interact here harder.  But I think
>> the general approach can dramatically reduce the attack surface.
> 
> My argument here would be that we are concerned about speculation that
> the CPU does with the generated program.  We're not particularly
> bothered about the abstract machine description it's based upon.  As
> long as the earlier transforms lead to a valid translation (it hasn't
> removed a necessary bounds check) then running late is fine.
I'm thinking about obfuscation of the bounds check or the pointer or
turning branchy into straightline code, possibly doing some speculation
in the process, if-conversion and the like.

For example hoist_adjacent_loads which results in speculative loads and
likely a conditional move to select between the two loaded values.

Or what if we've done something like

if (x < maxval)
   res = *p;

And we've turned that into


t = *p;
res = (x < maxval) ? t : res;


That may be implemented as a conditional move at the RTL level, so
protecting that may be nontrivial.

In those examples the compiler itself has introduced the speculation.

I can't find the conditional obfuscation I was looking for, so it's hard
to rule it in or out as potentially problematical.

WRT pointer obfuscation, we no longer propagate conditional equivalences
very aggressively, so it may be a non-issue in the end.

But again, even with these concerns I think what you're doing cuts down
the attack surface in meaningful ways.



> 
> I can't currently conceive a situation where the compiler would be able
> to remove a /necessary/ bounds check that could lead to unsafe
> speculation later on.  A redundant bounds check removal shouldn't be a
> problem as the non-redundant check should remain and that will still get
> tracking code added.
It's less about removal and more about either compiler-generated
speculation or obfuscation of the patterns you're looking for.


jeff






Re: [PATCH 6/7] AArch64 - new pass to add conditional-branch speculation tracking

2018-07-11 Thread Jeff Law
On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
> This patch is the main part of the speculation tracking code.  It adds
> a new target-specific pass that is run just before the final branch
> reorg pass (so that it can clean up any new edge insertions we make).
> The pass is only run when -mtrack-speculation is passed on the command
> line.
> 
> One thing that did come to light as part of this was that the stack pointer
> register was not being permitted in comparison instructions.  We rely on
> that for moving the tracking state between SP and the scratch register at
> function call boundaries.
Note that the sp in comparison instructions issue came up with the
improvements to stack-clash that Tamar, Richard S. and you worked on.


> 
>   * config/aarch64/aarch64-speculation.cc: New file.
>   * config/aarch64/aarch64-passes.def (pass_track_speculation): Add before
>   pass_reorder_blocks.
>   * config/aarch64/aarch64-protos.h (make_pass_track_speculation): Add
>   prototype.
>   * config/aarch64/aarch64.c (aarch64_conditional_register_usage): Fix
>   X14 and X15 when tracking speculation.
>   * config/aarch64/aarch64.md (register name constants): Add
>   SPECULATION_TRACKER_REGNUM and SPECULATION_SCRATCH_REGNUM.
>   (unspec): Add UNSPEC_SPECULATION_TRACKER.
>   (speculation_barrier): New insn attribute.
>   (cmp): Allow SP in comparisons.
>   (speculation_tracker): New insn.
>   (speculation_barrier): Add speculation_barrier attribute.
>   * config/aarch64/t-aarch64: Add make rule for aarch64-speculation.o.
>   * config.gcc (aarch64*-*-*): Add aarch64-speculation.o to extra_objs.
>   * doc/invoke.texi (AArch64 Options): Document -mtrack-speculation.
> ---
>  gcc/config.gcc|   2 +-
>  gcc/config/aarch64/aarch64-passes.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h   |   3 +-
>  gcc/config/aarch64/aarch64-speculation.cc | 494 
> ++
>  gcc/config/aarch64/aarch64.c  |  13 +
>  gcc/config/aarch64/aarch64.md |  30 +-
>  gcc/config/aarch64/t-aarch64  |  10 +
>  gcc/doc/invoke.texi   |  10 +-
>  8 files changed, 558 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/config/aarch64/aarch64-speculation.cc
Given the consensus forming that this kind of masking instruction is
the preferred way to mitigate (as opposed to lfence barriers and the
like), I have to ask your opinion about making the bulk of this a
general pass rather than one specific to the aarch64 backend.  I'd hate
to end up duplicating all this stuff across multiple architectures.

I think it all looks pretty reasonable though.

jeff



[Patch, Fortran] PR 85599: warn about short-circuiting of logical expressions for non-pure functions

2018-07-11 Thread Janus Weil
Hi all,

after the dust of the heated discussion around this PR has settled a
bit, here is another attempt to implement at least some basic warnings
about compiler-dependent behavior concerning the short-circuiting of
logical expressions.

As a reminder (and recap of the previous discussion), the Fortran
standard unfortunately is a bit sloppy in this area: It allows
compilers to short-circuit the second operand of .AND. / .OR.
operators, but does not require this. As a result, compilers can do
what they want without conflicting with the standard, and they do:
gfortran does short-circuiting (via TRUTH_ANDIF_EXPR/TRUTH_ORIF_EXPR),
ifort does not.

I'm continuing here the least-invasive approach of keeping gfortran's
current behavior, but warning about cases where compilers may produce
different results.

The attached patch is very close to the version I posted previously
(which was already approved by Janne), with the difference that the
warnings are now triggered by -Wextra and not -Wsurprising (which is
included in -Wall), as suggested by Nick Maclaren. I think this is
more reasonable, since not everyone may want to see these warnings.

Note that I don't want to warn about all possible optimizations that
might be allowed by the standard, but only about those that are
actually problematic in practice and result in compiler-dependent
behavior.

The patch regtests cleanly on x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus


2018-07-11  Thomas Koenig  
Janus Weil  

PR fortran/85599
* dump-parse-tree.c (show_attr): Add handling of implicit_pure.
* resolve.c (impure_function_callback): New function.
(resolve_operator): Call it via gfc_expr_walker.


2018-07-11  Janus Weil  

PR fortran/85599
* gfortran.dg/short_circuiting.f90: New test.
Index: gcc/fortran/dump-parse-tree.c
===
--- gcc/fortran/dump-parse-tree.c	(revision 262563)
+++ gcc/fortran/dump-parse-tree.c	(working copy)
@@ -716,6 +716,8 @@ show_attr (symbol_attribute *attr, const char * mo
 fputs (" ELEMENTAL", dumpfile);
   if (attr->pure)
 fputs (" PURE", dumpfile);
+  if (attr->implicit_pure)
+fputs (" IMPLICIT_PURE", dumpfile);
   if (attr->recursive)
 fputs (" RECURSIVE", dumpfile);
 
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(revision 262563)
+++ gcc/fortran/resolve.c	(working copy)
@@ -3822,6 +3822,46 @@ lookup_uop_fuzzy (const char *op, gfc_symtree *uop
 }
 
 
+/* Callback finding an impure function as an operand to an .and. or
+   .or.  expression.  Remember the last function warned about to
+   avoid double warnings when recursing.  */
+
+static int
+impure_function_callback (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
+			  void *data)
+{
+  gfc_expr *f = *e;
+  const char *name;
+  static gfc_expr *last = NULL;
+  bool *found = (bool *) data;
+
+  if (f->expr_type == EXPR_FUNCTION)
+{
+  *found = 1;
+  if (f != last && !pure_function (f, &name))
+	{
+	  /* This could still be a function without side effects, i.e.
+	 implicit pure.  Do not warn for that case.  */
+	  if (f->symtree == NULL || f->symtree->n.sym == NULL
+	  || !gfc_implicit_pure (f->symtree->n.sym))
+	{
+	  if (name)
+		gfc_warning (OPT_Wextra,
+			 "Function %qs at %L might not be evaluated",
+			 name, &f->where);
+	  else
+		gfc_warning (OPT_Wextra,
+			 "Function at %L might not be evaluated",
+			 &f->where);
+	}
+	}
+  last = f;
+}
+
+  return 0;
+}
+
+
 /* Resolve an operator expression node.  This can involve replacing the
operation with a user defined function call.  */
 
@@ -3930,6 +3970,14 @@ resolve_operator (gfc_expr *e)
 	gfc_convert_type (op1, &e->ts, 2);
 	  else if (op2->ts.kind < e->ts.kind)
 	gfc_convert_type (op2, &e->ts, 2);
+
+	  if (e->value.op.op == INTRINSIC_AND || e->value.op.op == INTRINSIC_OR)
+	{
+	  /* Warn about short-circuiting
+	 with impure function as second operand.  */
+	  bool op2_f = false;
+	  gfc_expr_walker (&op2, impure_function_callback, &op2_f);
+	}
 	  break;
 	}
 
! { dg-do compile }
! { dg-additional-options "-Wextra" }
!
! PR 85599: warn about short-circuiting of logical expressions for non-pure functions
!
! Contributed by Janus Weil 

program short_circuit

   logical :: flag
   flag = .false.
   flag = check() .and. flag
   flag = flag .and. check()  ! { dg-warning "might not be evaluated" }
   flag = flag .and. pure_check()

contains

   logical function check()
  integer, save :: i = 1
  print *, "check", i
  i = i + 1
  check = .true.
   end function

   logical pure function pure_check()
  pure_check = .true.
   end function

end


Re: [PATCH][RFC] Make iterating over hash-map elide copying/destructing

2018-07-11 Thread Pedro Alves
On 07/11/2018 12:24 PM, Trevor Saunders wrote:
> However, if we went that route we should prevent use of the
> assignment operator by declaring one explicitly, making it private, and
> then not implementing it, so misuse at least fails to link; with some
> macros you can actually tell the compiler in C++11 it's deleted and may
> not be used.

The macro already exists --- DISABLE_COPY_AND_ASSIGN in include/ansidecl.h.
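For reference, here is a minimal usage sketch (the class name is made
up).  In C++11 mode the macro deletes the copy constructor and
assignment operator, so misuse fails to compile rather than merely
failing to link:

#include "ansidecl.h"

class resource_guard
{
public:
  resource_guard () {}

private:
  /* Copying and assignment are not meaningful for this type.  */
  DISABLE_COPY_AND_ASSIGN (resource_guard);
};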

Thanks,
Pedro Alves


Re: RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch

2018-07-11 Thread Jeff Law
On 07/11/2018 02:07 PM, Steve Ellcey wrote:
> I have a reload/register allocation question and possible patch.  While
> working on the Aarch64 SIMD ABI[1] I ran into a problem where GCC was
> saving and restoring registers that it did not need to.  I tracked it
> down to lra-constraints.c and its use of
> targetm.hard_regno_call_part_clobbered on instructions that are not
> calls.  Specifically need_for_call_save_p would check this macro even
> when the instruction in question (unknown to need_for_call_save_p)
> was not a call instruction.
> 
> This seems wrong to me and I was wondering if anyone more familiar
> with the register allocator and reload could look at this patch and
> tell me if it seems reasonable or not.  It passed bootstrap and I
> am running tests now.  I am just wondering if there is any reason why
> this target function would need to be called on non-call instructions
> or if doing so is just an oversight/bug.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> [1] https://gcc.gnu.org/ml/gcc/2018-07/msg00012.html
> 
> 
> 2018-07-11  Steve Ellcey  
> 
>   * lra-constraints.c (need_for_call_save_p): Add insn argument
>   and only check targetm.hard_regno_call_part_clobbered on calls.
>   (need_for_split_p): Add insn argument, pass to need_for_call_save_p.
>   (split_reg): Pass insn to need_for_call_save_p.
>   (split_if_necessary): Pass curr_insn to need_for_split_p.
>   (inherit_in_ebb): Ditto.
Various targets have calls which are exposed as INSNs rather than as
CALL_INSNs.   So we need to check that hook on all insns.

You can probably see this in action with the TLS insns on aarch64.

jeff


[PATCH][Middle-end][version 3]3rd patch of PR78809

2018-07-11 Thread Qing Zhao
Hi,

This is the 3rd version of the patch for the last part of PR78809.

The major change in this version is to address the following concerns raised by
Martin:

> One of the basic design principles that I myself have
> accidentally violated in the past is that warning options
> should not impact the emitted object code.  I don't think
> your patch actually does introduce this dependency by having
> the codegen depend on the result of check_access() -- I'm
> pretty sure the function is designed to do the validation
> irrespective of warning options and return based on
> the result of the validation and not based on whether
> a warning was issued.  But the choice of the variable name,
> no_overflow_warn, suggests that it does, in fact, have this
> effect.  So I would suggest to rename the variable and add
> a test that verifies that this dependency does not exist.

I have addressed this concern as following per our discussion:

1. in routine expand_builtin_memcmp, 
* delete the condition if (warn_stringop_overflow) before check_access;
* change the name of the variable that holds the return value of check_access 
to no_overflow

2. in the testsuite, change the new testcase strcmpopt_6.c to inhibit
inlining when check_access detects an error (this does not depend on
whether the warning option is on or not).

the following is the new patch, tested on both X86 and aarch64, no regression.

Okay for trunk?

thanks.

Qing

gcc/ChangeLog:

+2018-07-11  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare 
+   call by using a sequence of char comparisons.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog:

+2018-07-11  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+
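For readers unfamiliar with the transformation, here is a hypothetical
C-level picture of what the new inline_string_cmp expansion amounts to
for a call like strcmp (s, "foo").  The real work happens at RTL
expansion time; this helper is made up for illustration:

/* Conceptually, strcmp (s, "foo") becomes a chain of byte
   differences that stops at the first mismatch or at the NUL.  */
static int
strcmp_foo_expanded (const char *s)
{
  int diff = (unsigned char) s[0] - (unsigned char) 'f';
  if (diff != 0)
    return diff;
  diff = (unsigned char) s[1] - (unsigned char) 'o';
  if (diff != 0)
    return diff;
  diff = (unsigned char) s[2] - (unsigned char) 'o';
  if (diff != 0)
    return diff;
  /* Compare the terminating NUL as well.  */
  return (unsigned char) s[3];
}

The new --param builtin-string-cmp-inline-length bounds how many such
byte comparisons the expansion may emit.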



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> On Jul 5, 2018, at 10:46 AM, Qing Zhao  wrote:
> 
> Hi,
> 
> I have sent two emails with the updated patches on 7/3:
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00065.html
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00070.html
> 
> however, these 2 emails  were not successfully forwarded to the 
> gcc-patches@gcc.gnu.org mailing list.
> 
> So, I am sending the same email again in this one, hopefully this time it can 
> go through.
> Qing
> 
> Hi, Jeff,
> 
> thanks a lot for your review and comments.
> 
> I have addressed your comments,updated the patch, retested on both
> aarch64 and x86.
> 
> The major changes in this version compared to the previous version are:
> 
>   1. in routine expand_builtin_memcmp:
> * move the inlining transformation AFTER the warning is issued for
> -Wstringop-overflow;
> * only apply inlining when no warning is issued.
>   2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
>   3. update comments to:
> * capitalize the first word.
> * capitalize all the arguments.
> 
> NOTE, the routine expand_builtin_strcmp and expand_builtin_strncmp are not 
> changed.
> the reason is:  there is NO overflow checking for these two routines 
> currently.
> if we need overflow checking for these two routines, I think that a separate 
> patch is needed.
> if this is needed, let me know, I can work on this separate patch for issuing 
> warning for strcmp/strncmp when
> -Wstringop-overflow is specified.



[PATCH] reject conflicting attributes before calling handlers (PR 86453)

2018-07-11 Thread Martin Sebor

The attached change set adjusts the attribute exclusion code
to detect and reject incompatible attributes before attribute
handlers are called, so the handlers no longer get a chance to
make changes despite the exclusions.  The handlers are not run
when a conflict is found.

Tested on x86_64-linux.  I expected the fallout to be bigger
but only a handful of tests needed adjusting and the changes
all look like clear improvements.  I.e., conflicting attributes
that are diagnosed as being ignored really are being ignored as one
would expect.

Martin
PR c/86453 - error: type variant differs by TYPE_PACKED in free_lang_data since r255469

gcc/ChangeLog:

	PR c/86453
	* attribs.c (decl_attributes): Reject conflicting attributes before
	calling attribute handlers.

gcc/testsuite/ChangeLog:

	PR c/86453
	* c-c++-common/Wattributes.c: Adjust.
	* gcc.dg/Wattributes-10.c: New test.
	* g++.dg/Wattributes-3.C: Adjust.
	* g++.dg/lto/pr86453_0.C: New test.
	* gcc.dg/Wattributes-6.c: Adjust.
	* gcc.dg/pr18079.c: Adjust.
	* gcc.dg/torture/pr42363.c: Adjust.

Index: gcc/attribs.c
===
--- gcc/attribs.c	(revision 262542)
+++ gcc/attribs.c	(working copy)
@@ -672,6 +672,35 @@ decl_attributes (tree *node, tree attributes, int
 
   bool no_add_attrs = false;
 
+  /* Check for exclusions with other attributes on the current
+	 declaration as well as the last declaration of the same
+	 symbol already processed (if one exists).  Detect and
+	 reject incompatible attributes.  */
+  bool built_in = flags & ATTR_FLAG_BUILT_IN;
+  if (spec->exclude
+	  && (flag_checking || !built_in))
+	{
+	  /* Always check attributes on user-defined functions.
+	 Check them on built-ins only when -fchecking is set.
+	 Ignore __builtin_unreachable -- it's both const and
+	 noreturn.  */
+
+	  if (!built_in
+	  || !DECL_P (*anode)
+	  || (DECL_FUNCTION_CODE (*anode) != BUILT_IN_UNREACHABLE
+		  && (DECL_FUNCTION_CODE (*anode)
+		  != BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE)))
+	{
+	  bool no_add = diag_attr_exclusions (last_decl, *anode, name, spec);
+	  if (!no_add && anode != node)
+		no_add = diag_attr_exclusions (last_decl, *node, name, spec);
+	  no_add_attrs |= no_add;
+	}
+	}
+
+  if (no_add_attrs)
+	continue;
+
   if (spec->handler != NULL)
 	{
 	  int cxx11_flag =
@@ -695,33 +724,6 @@ decl_attributes (tree *node, tree attributes, int
 	returned_attrs = chainon (ret, returned_attrs);
 	}
 
-  /* If the attribute was successfully handled on its own and is
-	 about to be added check for exclusions with other attributes
-	 on the current declation as well as the last declaration of
-	 the same symbol already processed (if one exists).  */
-  bool built_in = flags & ATTR_FLAG_BUILT_IN;
-  if (spec->exclude
-	  && !no_add_attrs
-	  && (flag_checking || !built_in))
-	{
-	  /* Always check attributes on user-defined functions.
-	 Check them on built-ins only when -fchecking is set.
-	 Ignore __builtin_unreachable -- it's both const and
-	 noreturn.  */
-
-	  if (!built_in
-	  || !DECL_P (*anode)
-	  || (DECL_FUNCTION_CODE (*anode) != BUILT_IN_UNREACHABLE
-		  && (DECL_FUNCTION_CODE (*anode)
-		  != BUILT_IN_UBSAN_HANDLE_BUILTIN_UNREACHABLE)))
-	{
-	  bool no_add = diag_attr_exclusions (last_decl, *anode, name, spec);
-	  if (!no_add && anode != node)
-		no_add = diag_attr_exclusions (last_decl, *node, name, spec);
-	  no_add_attrs |= no_add;
-	}
-	}
-
   /* Layout the decl in case anything changed.  */
   if (spec->type_required && DECL_P (*node)
 	  && (VAR_P (*node)
Index: gcc/testsuite/c-c++-common/Wattributes.c
===
--- gcc/testsuite/c-c++-common/Wattributes.c	(revision 262542)
+++ gcc/testsuite/c-c++-common/Wattributes.c	(working copy)
@@ -39,13 +39,13 @@ PackedPacked { int i; };
aligned and packed on a function declaration.  */
 
 void ATTR ((aligned (8), packed))
-faligned8_1 (void);   /* { dg-warning ".packed. attribute ignored" } */
+faligned8_1 (void);   /* { dg-warning "ignoring attribute .packed. because it conflicts with attribute .aligned." } */
 
 void ATTR ((aligned (8)))
-faligned8_2 (void);   /* { dg-message "previous declaration here" "" { xfail *-*-* } } */
+faligned8_2 (void);   /* { dg-message "previous declaration here" } */
 
 void ATTR ((packed))
-faligned8_2 (void);   /* { dg-warning ".packed. attribute ignored" } */
+faligned8_2 (void);   /* { dg-warning "ignoring attribute .packed. because it conflicts with attribute .aligned." } */
 
 /* Exercise the handling of the mutually exclusive attributes
always_inline and noinline (in that order).  */
Index: gcc/testsuite/g++.dg/Wattributes-3.C
===
--- gcc/testsuite/g++.dg/Wattributes-3.C	(revision 262542)
+++ gcc/testsuite/g++.dg/Wa

[PATCH] Fix store-merging wrong-code issue (PR tree-optimization/86492)

2018-07-11 Thread Jakub Jelinek
Hi!

The following testcase is an issue similar to PR84503, and the fix is
similar as well.  coalesce_immediate_stores temporarily sorts the stores
on ascending bitpos, and when stores are merged, the merged store is
emitted at the location of the latest of those stores.  We therefore
need to verify that no store which originally appears before that latest
store overlaps the set of stores we are considering to merge.  If one
does, we need to punt and not merge (unless we do smarts like proving
overlap between just some of the stores and forcing reordering).
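As a worked instance (this is just the new testcase below, annotated;
bit positions assume the usual little-endian bitfield layout):

  union U u;
  u.r   = 0;    /* constant store, bits [0, 32), program order 1 */
  u.f.c = x;    /* non-constant store, bits [16, 32), order 2 */
  u.f.b = 0xe;  /* constant store, bits [12, 16), order 3 */

Sorted on bitpos, the two constant stores coalesce into a single 32-bit
constant store, which would be emitted at the position of the latest of
them (u.f.b, order 3).  That point is after u.f.c = x, so the merged
constant would clobber the non-constant bits [16, 32); calling
check_no_overlap in the merge_overlapping path detects this and punts.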

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk and 8.2?

2018-07-11  Jakub Jelinek  

PR tree-optimization/86492
* gimple-ssa-store-merging.c
(imm_store_chain_info::coalesce_immediate_stores): Call
check_no_overlap even for the merge_overlapping case.  Formatting fix.

* gcc.c-torture/execute/pr86492.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2018-06-13 10:05:53.0 +0200
+++ gcc/gimple-ssa-store-merging.c  2018-07-11 19:24:12.084120206 +0200
@@ -2702,7 +2702,12 @@ imm_store_chain_info::coalesce_immediate
{
  /* Only allow overlapping stores of constants.  */
  if (info->rhs_code == INTEGER_CST
- && merged_store->stores[0]->rhs_code == INTEGER_CST)
+ && merged_store->stores[0]->rhs_code == INTEGER_CST
+ && check_no_overlap (m_store_info, i, INTEGER_CST,
+  MAX (merged_store->last_order, info->order),
+  MAX (merged_store->start
+   + merged_store->width,
+   info->bitpos + info->bitsize)))
{
  merged_store->merge_overlapping (info);
  goto done;
@@ -2732,10 +2737,8 @@ imm_store_chain_info::coalesce_immediate
  info->ops_swapped_p = true;
}
  if (check_no_overlap (m_store_info, i, info->rhs_code,
-   MAX (merged_store->last_order,
-info->order),
-   MAX (merged_store->start
-+ merged_store->width,
+   MAX (merged_store->last_order, info->order),
+   MAX (merged_store->start + merged_store->width,
 info->bitpos + info->bitsize)))
{
  /* Turn MEM_REF into BIT_INSERT_EXPR for bit-field stores.  */
--- gcc/testsuite/gcc.c-torture/execute/pr86492.c.jj	2018-07-11 19:40:27.760122514 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr86492.c	2018-07-11 19:40:13.460107841 +0200
@@ -0,0 +1,34 @@
+/* PR tree-optimization/86492 */
+
+union U
+{
+  unsigned int r;
+  struct S
+  {
+unsigned int a:12;
+unsigned int b:4;
+unsigned int c:16;
+  } f;
+};
+
+__attribute__((noipa)) unsigned int
+foo (unsigned int x)
+{
+  union U u;
+  u.r = 0;
+  u.f.c = x;
+  u.f.b = 0xe;
+  return u.r;
+}
+
+int
+main ()
+{
+  union U u;
+  if (__CHAR_BIT__ * __SIZEOF_INT__ != 32 || sizeof (u.r) != sizeof (u.f))
+return 0;
+  u.r = foo (0x72);
+  if (u.f.a != 0 || u.f.b != 0xe || u.f.c != 0x72)
+__builtin_abort ();
+  return 0;
+}

Jakub


Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-11 Thread Richard Earnshaw (lists)
On 11/07/18 21:46, Jeff Law wrote:
> On 07/10/2018 10:43 AM, Richard Earnshaw (lists) wrote:
>> On 10/07/18 16:42, Jeff Law wrote:
>>> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
 On 10/07/18 00:13, Jeff Law wrote:
> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>
>> To address all of the above, these patches adopt a new approach, based
>> in part on a posting by Chandler Carruth to the LLVM developers list
>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>> but which we have extended to deal with inter-function speculation.
>> The patches divide the problem into two halves.
> We're essentially turning the control dependency into a value that we
> can then use to munge the pointer or the resultant data.
>
>>
>> The first half is some target-specific code to track the speculation
>> condition through the generated code to provide an internal variable
>> which can tell us whether or not the CPU's control flow speculation
>> matches the data flow calculations.  The idea is that the internal
>> variable starts with the value TRUE and if the CPU's control flow
>> speculation ever causes a jump to the wrong block of code the variable
>> becomes false until such time as the incorrect control flow
>> speculation gets unwound.
> Right.
>
> So one of the things that comes immediately to mind is you have to run
> this early enough that you can still get to all the control flow and
> build your predicates.  Otherwise you have to undo stuff like
> conditional move generation.

 No, the opposite, in fact.  We want to run this very late, at least on
 Arm systems (AArch64 or AArch32).  Conditional move instructions are
 fine - they're data-flow operations, not control flow (in fact, that's
 exactly what the control flow tracker instructions are).  By running it
 late we avoid disrupting any of the earlier optimization passes as well.
>>> Ack.  I looked at the aarch64 implementation after sending my message
>>> and it clearly runs very late.
>>>
>>> I haven't convinced myself that all the work the generic parts of the
>>> compiler do to rewrite and eliminate conditionals is safe.  But even if it
>>> isn't, you're probably getting enough coverage to drastically reduce the
>>> attack surface.  I'm going to have to think about the early
>>> transformations we make and how they interact here harder.  But I think
>>> the general approach can dramatically reduce the attack surface.
>>
>> My argument here would be that we are concerned about speculation that
>> the CPU does with the generated program.  We're not particularly
>> bothered about the abstract machine description it's based upon.  As
>> long as the earlier transforms lead to a valid translation (it hasn't
>> removed a necessary bounds check) then running late is fine.
> I'm thinking about obfuscation of the bounds check or the pointer or
> turning branchy into straightline code, possibly doing some speculation
> in the process, if-conversion and the like.
> 
> For example hoist_adjacent_loads which results in speculative loads and
> likely a conditional move to select between the two loaded values.
> 
> Or what if we've done something like
> 
> if (x < maxval)
>res = *p;
> 
> And we've turned that into
> 
> 
> t = *p;
> res = (x < maxval) ? t : res;

Hmm, interesting.  But for that to be safe, the compiler would have to
be able to prove that dereferencing p was safe even if x >= maxval,
otherwise the run-time code could fault (so if there's any chance that
it could point to something vulnerable, then there must also be a chance
that it points to unmapped memory).  Given that requirement, I don't
think this case can be a specific concern, since the requirement implies
that p must already be within some known bounds for the type of object
it points to.

R.

> 
> 
> That may be implemented as a conditional move at the RTL level, so
> protecting that may be nontrivial.
> 
> In those examples the compiler itself has introduced the speculation.
> 
> I can't find the conditional obfuscation I was looking for, so it's hard
> to rule it in or out as potentially problematical.
> 
> WRT pointer obfuscation, we no longer propagate conditional equivalences
> very aggressively, so it may be a non-issue in the end.
> 
> But again, even with these concerns I think what you're doing cuts down
> the attack surface in meaningful ways.
> 
> 
> 
>>
>> I can't currently conceive a situation where the compiler would be able
>> to remove a /necessary/ bounds check that could lead to unsafe
>> speculation later on.  A redundant bounds check removal shouldn't be a
>> problem as the non-redundant check should remain and that will still get
>> tracking code added.
> It's less about removal and more about either compiler-generated
> speculation or obfuscation of the patterns you're looking for.
> 
> 
> jeff
> 
> 
> 
> 



Re: [PATCH, rs6000] gimple folding support for vec_pack and vec_unpack

2018-07-11 Thread Segher Boessenkool
Hi!

On Mon, Jul 09, 2018 at 02:08:37PM -0500, Will Schmidt wrote:
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin):
>   Add support for gimple-folding of vec_pack() and vec_unpack()
>   intrinsics.

> +case ALTIVEC_BUILTIN_VUPKHPX:
> +case ALTIVEC_BUILTIN_VUPKLPX:
> +  {
> +   return false;
> +  }

A block around a single statement looks a bit silly (and in the other
cases in your patch it isn't necessary either; it is nice if you use it
to give some scope to a local var, but you don't have that here).

But, patch is fine as far as I can see :-)


Segher


Re: [PATCH, rs6000] Testcase adds for vec_unpack

2018-07-11 Thread Segher Boessenkool
Hi Will,


On Mon, Jul 09, 2018 at 02:08:49PM -0500, Will Schmidt wrote:
>   * gcc.target/powerpc/fold-vec-unpack-char.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-float.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-int.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-pixel.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-short.c: New.

This looks fine.  Okay for trunk.  Thanks!


Segher


Re: [PATCH, rs6000] Add support for gimple folding vec_perm()

2018-07-11 Thread Segher Boessenkool
On Mon, Jul 09, 2018 at 02:08:55PM -0500, Will Schmidt wrote:
>Add support for early gimple folding of vec_perm.   Testcases are already 
> in-tree as
> gcc.target/powerpc/fold-vec-perm-*.c
> 
> OK for trunk?

Looks fine to me.  Okay if no one else complains :-)


Segher


>   * gcc/config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support
>   for folding vec_perm.


[PING][PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2018-07-11 Thread Michael Collison
Ping. Last patch here:

https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00735.html



[PATCH] doc: update looping constructs

2018-07-11 Thread Paul Koning
This patch removes the obsolete documentation for 
decrement_and_branch_until_zero.  It also adds detail to the description for 
doloop_end.  In particular, it describes the required form of the conditional 
branch part of the pattern.
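For context, the kind of source loop the doloop patterns exist for is a
counted loop whose trip count is known on entry, so the target can
implement the back edge as a single decrement-and-branch instruction
(example mine):

void
scale (float *a, float k, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; i++)
    a[i] *= k;
}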

Ok for trunk?

paul

ChangeLog:

2018-07-11  Paul Koning  

* doc/rtl.texi (REG_NONNEG): Remove decrement and branch until
zero reference, add doloop_end instead.
* doc/md.texi (decrement_and_branch_until_zero): Remove.
(Looping patterns): Remove decrement_and_branch_until_zero.  Add
detail for doloop_end.

Index: doc/md.texi
===
--- doc/md.texi (revision 262562)
+++ doc/md.texi (working copy)
@@ -6681,17 +6681,6 @@ second operand, but you should incorporate it in t
 that the jump optimizer will not delete the table as unreachable code.
 
 
-@cindex @code{decrement_and_branch_until_zero} instruction pattern
-@item @samp{decrement_and_branch_until_zero}
-Conditional branch instruction that decrements a register and
-jumps if the register is nonzero.  Operand 0 is the register to
-decrement and test; operand 1 is the label to jump to if the
-register is nonzero.  @xref{Looping Patterns}.
-
-This optional instruction pattern is only used by the combiner,
-typically for loops reversed by the loop optimizer when strength
-reduction is enabled.
-
 @cindex @code{doloop_end} instruction pattern
 @item @samp{doloop_end}
 Conditional branch instruction that decrements a register and
@@ -7515,67 +7504,12 @@ iterations.  This avoids the need for fetching and
 @samp{dbra}-like instruction and avoids pipeline stalls associated with
 the jump.
 
-GCC has three special named patterns to support low overhead looping.
-They are @samp{decrement_and_branch_until_zero}, @samp{doloop_begin},
-and @samp{doloop_end}.  The first pattern,
-@samp{decrement_and_branch_until_zero}, is not emitted during RTL
-generation but may be emitted during the instruction combination phase.
-This requires the assistance of the loop optimizer, using information
-collected during strength reduction, to reverse a loop to count down to
-zero.  Some targets also require the loop optimizer to add a
-@code{REG_NONNEG} note to indicate that the iteration count is always
-positive.  This is needed if the target performs a signed loop
-termination test.  For example, the 68000 uses a pattern similar to the
-following for its @code{dbra} instruction:
+GCC has two special named patterns to support low overhead looping.
+They are @samp{doloop_begin} and @samp{doloop_end}.  These are emitted
+by the loop optimizer for certain well-behaved loops with a finite
+number of loop iterations using information collected during strength
+reduction.
 
-@smallexample
-@group
-(define_insn "decrement_and_branch_until_zero"
-  [(set (pc)
-(if_then_else
-  (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
-   (const_int -1))
-  (const_int 0))
-  (label_ref (match_operand 1 "" ""))
-  (pc)))
-   (set (match_dup 0)
-(plus:SI (match_dup 0)
- (const_int -1)))]
-  "find_reg_note (insn, REG_NONNEG, 0)"
-  "@dots{}")
-@end group
-@end smallexample
-
-Note that since the insn is both a jump insn and has an output, it must
-deal with its own reloads, hence the `m' constraints.  Also note that
-since this insn is generated by the instruction combination phase
-combining two sequential insns together into an implicit parallel insn,
-the iteration counter needs to be biased by the same amount as the
-decrement operation, in this case @minus{}1.  Note that the following similar
-pattern will not be matched by the combiner.
-
-@smallexample
-@group
-(define_insn "decrement_and_branch_until_zero"
-  [(set (pc)
-(if_then_else
-  (ge (match_operand:SI 0 "general_operand" "+d*am")
-  (const_int 1))
-  (label_ref (match_operand 1 "" ""))
-  (pc)))
-   (set (match_dup 0)
-(plus:SI (match_dup 0)
- (const_int -1)))]
-  "find_reg_note (insn, REG_NONNEG, 0)"
-  "@dots{}")
-@end group
-@end smallexample
-
-The other two special looping patterns, @samp{doloop_begin} and
-@samp{doloop_end}, are emitted by the loop optimizer for certain
-well-behaved loops with a finite number of loop iterations using
-information collected during strength reduction.
-
 The @samp{doloop_end} pattern describes the actual looping instruction
 (or the implicit looping operation) and the @samp{doloop_begin} pattern
 is an optional companion pattern that can be used for initialization
@@ -7594,15 +7528,65 @@ desired special iteration counter register was not
 machine dependent reorg pass could emit a traditional compare and jump
 instruction pair.
 
-The essential difference between the
-@samp{decrement_and_branch_until_zero} and the @samp{doloop_end}
-patterns is that the loop optimizer allocates an additional pseudo

[PATCH] C++: suggestions for misspelled private members (PR c++/84993)

2018-07-11 Thread David Malcolm
PR c++/84993 identifies a problem with our suggestions for
misspelled member names in the C++ FE for the case where the
member is private.

For example, given:

class foo
{
public:
  double get_ratio() const { return m_ratio; }

private:
  double m_ratio;
};

void test(foo *ptr)
{
  if (ptr->ratio >= 0.5)
;// etc
}

...we currently emit this suggestion:

: In function 'void test(foo*)':
:12:12: error: 'class foo' has no member named 'ratio'; did you mean 
'm_ratio'?
   if (ptr->ratio >= 0.5)
^
m_ratio

...but if the user follows this suggestion, they get:

: In function 'void test(foo*)':
:12:12: error: 'double foo::m_ratio' is private within this context
   if (ptr->m_ratio >= 0.5)
^~~
:7:10: note: declared private here
   double m_ratio;
  ^~~
:12:12: note: field 'double foo::m_ratio' can be accessed via 'double 
foo::get_ratio() const'
   if (ptr->m_ratio >= 0.5)
^~~
get_ratio()

It feels wrong to be emitting a fix-it hint that doesn't compile, so this
patch adds the accessor fix-it hint logic to this case, so that we directly
offer a valid suggestion:

: In function 'void test(foo*)':
:12:12: error: 'class foo' has no member named 'ratio'; did you mean
'double foo::m_ratio'? (accessible via 'double foo::get_ratio() const')
   if (ptr->ratio >= 0.5)
^
get_ratio()

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
adds 105 PASS results to g++.sum.

OK for trunk?

gcc/cp/ChangeLog:
PR c++/84993
* call.c (enforce_access): Move diagnostics to...
(complain_about_access): ...this new function.
* cp-tree.h (class access_failure_info): Rename split out field
"m_field_decl" into "m_decl" and "m_diag_decl".
(access_failure_info::record_access_failure): Add tree param.
(access_failure_info::was_inaccessible_p): New accessor.
(access_failure_info::get_decl): New accessor.
(access_failure_info::get_diag_decl): New accessor.
(access_failure_info::get_any_accessor): New member function.
(access_failure_info::add_fixit_hint): New static member function.
(complain_about_access): New decl.
* typeck.c (access_failure_info::record_access_failure): Update
for change to fields.
(access_failure_info::maybe_suggest_accessor): Split out into...
(access_failure_info::get_any_accessor): ...this new function...
(access_failure_info::add_fixit_hint): ...and this new function.
(finish_class_member_access_expr): Split out "has no member named"
error-handling into...
(complain_about_unrecognized_member): ...this new function, and
check that the guessed name is accessible along the access path.
Only provide a spell-correction fix-it hint if it is; otherwise,
attempt to issue an accessor fix-it hint.

gcc/testsuite/ChangeLog:
PR c++/84993
* g++.dg/torture/accessor-fixits-9.C: New test.
---
 gcc/cp/call.c|  64 ++
 gcc/cp/cp-tree.h |  17 ++-
 gcc/cp/typeck.c  | 150 +--
 gcc/testsuite/g++.dg/torture/accessor-fixits-9.C | 119 ++
 4 files changed, 282 insertions(+), 68 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/accessor-fixits-9.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 209c1fd..121f0ca 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6456,6 +6456,38 @@ build_op_delete_call (enum tree_code code, tree addr, tree size,
   return error_mark_node;
 }
 
+/* Issue diagnostics about a disallowed access of DECL, using DIAG_DECL
+   in the diagnostics.
+
+   If ISSUE_ERROR is true, then issue an error about the
+   access, followed by a note showing the declaration.
+   Otherwise, just show the note.  */
+
+void
+complain_about_access (tree decl, tree diag_decl, bool issue_error)
+{
+  if (TREE_PRIVATE (decl))
+{
+  if (issue_error)
+   error ("%q#D is private within this context", diag_decl);
+  inform (DECL_SOURCE_LOCATION (diag_decl),
+ "declared private here");
+}
+  else if (TREE_PROTECTED (decl))
+{
+  if (issue_error)
+   error ("%q#D is protected within this context", diag_decl);
+  inform (DECL_SOURCE_LOCATION (diag_decl),
+ "declared protected here");
+}
+  else
+{
+  if (issue_error)
+   error ("%q#D is inaccessible within this context", diag_decl);
+  inform (DECL_SOURCE_LOCATION (diag_decl), "declared here");
+}
+}
+
 /* If the current scope isn't allowed to access DECL along
BASETYPE_PATH, give an error.  The most derived class in
BASETYPE_PATH is the one used to qualify DECL. DIAG_DECL is
@@ -6480,34 +6512,12 @@ enforce_access (tree basetype_path, tree decl, tree diag_decl,
 
   if (!accessible_p (basetype_path, decl, true))
 {
+  if (flag_n

Go patch committed: Build a single backend type for a type alias

2018-07-11 Thread Ian Lance Taylor
A type alias and its underlying type are identical.  This patch to the
Go frontend by Cherry Zhang builds a single backend type for them.
Previously we built two backend types, which sometimes confused the
backend's type system.

Also don't include type aliases into the list of named type
declarations, since they are not named types.
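As an analogy only (this is C++, not Go frontend code): Go's
"type T = U" alias declarations behave like C++ alias-declarations,
where the alias and the underlying type are one and the same type, so
only one backend type should ever be built for the pair:

#include <type_traits>

using MyInt = int;          // alias: MyInt *is* int
struct YourInt { int v; };  // named type: a new, distinct type

static_assert (std::is_same<MyInt, int>::value,
	       "an alias and its underlying type are identical");
static_assert (!std::is_same<YourInt, int>::value,
	       "a named type is distinct from its underlying type");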

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262554)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-ea7ac7784791dca517b6681a02c39c11bf136755
+267686fd1dffbc03e610e9f17dadb4e72c75f18d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 262540)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -7604,7 +7604,7 @@ Named_object::get_backend(Gogo* gogo, st
 case NAMED_OBJECT_TYPE:
   {
 Named_type* named_type = this->u_.type_value;
-   if (!Gogo::is_erroneous_name(this->name_))
+   if (!Gogo::is_erroneous_name(this->name_) && !named_type->is_alias())
  type_decls.push_back(named_type->get_backend(gogo));
 
 // We need to produce a type descriptor for every named
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 262540)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -991,6 +991,11 @@ Type::get_backend(Gogo* gogo)
   if (this->btype_ != NULL)
 return this->btype_;
 
+  if (this->named_type() != NULL && this->named_type()->is_alias()) {
+this->btype_ = this->unalias()->get_backend(gogo);
+return this->btype_;
+  }
+
   if (this->forward_declaration_type() != NULL
   || this->named_type() != NULL)
 return this->get_btype_without_hash(gogo);


--enable-maintainer-mode currently broken, needs --disable-werror to complete bootstrap

2018-07-11 Thread Thomas Koenig

Hi,

this is a heads-up that configuring with --enable-maintainer-mode
currently breaks bootstrap; see

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86450

for details.

Running configure with --enable-maintainer-mode --disable-werror
allows bootstrap to proceed until the underlying issue is fixed.

Regards

Thomas


Re: RFC: lra-constraints.c and TARGET_HARD_REGNO_CALL_PART_CLOBBERED question/patch

2018-07-11 Thread Richard Sandiford
Jeff Law  writes:
> On 07/11/2018 02:07 PM, Steve Ellcey wrote:
>> I have a reload/register allocation question and possible patch.  While
>> working on the Aarch64 SIMD ABI[1] I ran into a problem where GCC was
>> saving and restoring registers that it did not need to.  I tracked it
>> down to lra-constraints.c and its use of
>> targetm.hard_regno_call_part_clobbered on instructions that are not
>> calls.  Specifically need_for_call_save_p would check this macro even
>> when the instruction in question (unknown to need_for_call_save_p)
>> was not a call instruction.
>> 
>> This seems wrong to me and I was wondering if anyone more familiar
>> with the register allocator and reload could look at this patch and
>> tell me if it seems reasonable or not.  It passed bootstrap and I
>> am running tests now.  I am just wondering if there is any reason why
>> this target function would need to be called on non-call instructions
>> or if doing so is just an oversight/bug.
>> 
>> Steve Ellcey
>> sell...@cavium.com
>> 
>> 
>> [1] https://gcc.gnu.org/ml/gcc/2018-07/msg00012.html
>> 
>> 
>> 2018-07-11  Steve Ellcey  
>> 
>>  * lra-constraints.c (need_for_call_save_p): Add insn argument
>>  and only check targetm.hard_regno_call_part_clobbered on calls.
>>  (need_for_split_p): Add insn argument, pass to need_for_call_save_p.
>>  (split_reg): Pass insn to need_for_call_save_p.
>>  (split_if_necessary): Pass curr_insn to need_for_split_p.
>>  (inherit_in_ebb): Ditto.
> Various target have calls which are exposed as INSNs rather than as
> CALL_INSNs.   So we need to check that hook on all insns.
>
> You can probably see this in action with the TLS insns on aarch64.

Not sure whether it's that: I think other code does only consider
hard_regno_call_part_clobbered on calls.  But as it stands
need_for_call_save_p is checking whether there's a call somewhere
in between the current instruction and the last use in the EBB:

/* Return true if we need a caller save/restore for pseudo REGNO which
   was assigned to a hard register.  */
static inline bool
need_for_call_save_p (int regno)
{
  lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
  return (usage_insns[regno].calls_num < calls_num
...
}

So it only calls targetm.hard_regno_call_part_clobbered if such a
call is known to exist somewhere between the two references to
regno (although we don't have the calls themselves to hand).

Thanks,
Richard