Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Alexander Monakov
On Tue, 11 Sep 2018, Qing Zhao wrote:
> no immediate use case for -finline-visibility=extern right now. But I guess
> that GCC developers might use this option to control inlining scope for
> debugging or development purpose in the future.

In that case I'd recommend to simplify the patch by implementing only the
part to suppress inlining of non-static functions. No need to overengineer.

> thanks for the suggestion, how about -finline-linkage=[all|extern|static]?

I would suggest -finline-only-static. Note however that I do not have the
authority to review this: you still need an opinion from an appropriate
maintainer.

Alexander


Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-12 Thread Jakub Jelinek
On Tue, Sep 11, 2018 at 10:20:26PM -0400, Julian Brown wrote:
> 2018-09-09  Cesar Philippidis  
> Julian Brown  
> 
> PR middle-end/86336
> 
> gcc/cp/
> * semantics.c (finish_omp_clauses): Treat C++ references the same in
> OpenACC as OpenMP.
> 
> * gimplify.c (gimplify_scan_omp_clauses): Set
> target_firstprivatize_array_bases in OpenACC parallel and kernels
> region contexts.  Remove GOMP_MAP_FIRSTPRIVATE_REFERENCE clauses from
> OpenACC data regions.
> 
> libgomp/
> * testsuite/libgomp.oacc-c++/non-scalar-data.C: Remove XFAIL.

LGTM.

Jakub


[PATCH] Fix fold-const (X & C) ? C : 0 optimization (PR middle-end/87248)

2018-09-12 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because it optimizes
(x & -128) ? -128 : 0 to (x & -128) when it shouldn't.  The problem is if
the type of the COND_EXPR has smaller precision than the BIT_AND_EXPR in the
test, we verify that integer_pow2p (arg1), which is true in this case (-128
in signed char is a power of two), and then operand_equal_p between that
value and the constant in BIT_AND_EXPR's second argument (operand_equal_p
compares only the value, and -128 == -128).  The following patch verifies
also that the BIT_AND_EXPR's second operand is a power of two.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
release branches?

2018-09-12  Jakub Jelinek  

PR middle-end/87248
* fold-const.c (fold_ternary_loc) : Verify also that
BIT_AND_EXPR's second operand is a power of two.  Formatting fix.

* c-c++-common/torture/pr87248.c: New test.

--- gcc/fold-const.c.jj 2018-09-06 09:41:59.0 +0200
+++ gcc/fold-const.c2018-09-08 00:12:28.332418784 +0200
@@ -11607,10 +11607,16 @@ fold_ternary_loc (location_t loc, enum t
  && integer_pow2p (arg1)
  && TREE_CODE (TREE_OPERAND (arg0, 0)) == BIT_AND_EXPR
  && operand_equal_p (TREE_OPERAND (TREE_OPERAND (arg0, 0), 1),
- arg1, OEP_ONLY_CONST))
+ arg1, OEP_ONLY_CONST)
+ /* operand_equal_p compares just value, not precision, so e.g.
+arg1 could be 8-bit -128 and be power of two, but BIT_AND_EXPR
+second operand 32-bit -128, which is not a power of two (or vice
+versa.  */
+ && integer_pow2p (TREE_OPERAND (TREE_OPERAND (arg0, 0), 1)))
return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg0, 0)));
+   fold_convert_loc (loc, type,
+ TREE_OPERAND (arg0,
+   0)));
 
   /* Disable the transformations below for vectors, since
 fold_binary_op_with_conditional_arg may undo them immediately,
--- gcc/testsuite/c-c++-common/torture/pr87248.c.jj 2018-09-08 
01:13:43.431334239 +0200
+++ gcc/testsuite/c-c++-common/torture/pr87248.c2018-09-08 
01:14:55.446593520 +0200
@@ -0,0 +1,36 @@
+/* PR middle-end/87248 */
+/* { dg-do run } */
+
+void
+foo (signed char *p, int q)
+{
+  *p = q & (-__SCHAR_MAX__ - 1) ? (-__SCHAR_MAX__ - 1) : 0;
+}
+
+int
+bar (long long x)
+{
+  return x & (-__INT_MAX__ - 1) ? (-__INT_MAX__ - 1) : 0;
+}
+
+int
+main ()
+{
+#if __INT_MAX__ > 4 * __SCHAR_MAX__
+  signed char a[4];
+  foo (a, __SCHAR_MAX__ + 1U);
+  foo (a + 1, 2 * (__SCHAR_MAX__ + 1U));
+  foo (a + 2, -__INT_MAX__ - 1);
+  foo (a + 3, (__SCHAR_MAX__ + 1U) / 2);
+  if (a[0] != (-__SCHAR_MAX__ - 1) || a[1] != a[0] || a[2] != a[0] || a[3] != 
0)
+__builtin_abort ();
+#endif
+#if __LONG_LONG_MAX__ > 4 * __INT_MAX__
+  if (bar (__INT_MAX__ + 1LL) != (-__INT_MAX__ - 1)
+  || bar (2 * (__INT_MAX__ + 1LL)) != (-__INT_MAX__ - 1)
+  || bar (-__LONG_LONG_MAX__ - 1) != (-__INT_MAX__ - 1)
+  || bar ((__INT_MAX__ + 1LL) / 2) != 0)
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub


Re: [PATCH] Fix fold-const (X & C) ? C : 0 optimization (PR middle-end/87248)

2018-09-12 Thread Richard Biener
On Wed, 12 Sep 2018, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled, because it optimizes
> (x & -128) ? -128 : 0 to (x & -128) when it shouldn't.  The problem is if
> the type of the COND_EXPR has smaller precision than the BIT_AND_EXPR in the
> test, we verify that integer_pow2p (arg1), which is true in this case (-128
> in signed char is a power of two), and then operand_equal_p between that
> value and the constant in BIT_AND_EXPR's second argument (operand_equal_p
> compares only the value, and -128 == -128).  The following patch verifies
> also that the BIT_AND_EXPR's second operand is a power of two.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?

OK.

Richard.

> 
> 2018-09-12  Jakub Jelinek  
> 
>   PR middle-end/87248
>   * fold-const.c (fold_ternary_loc) : Verify also that
>   BIT_AND_EXPR's second operand is a power of two.  Formatting fix.
> 
>   * c-c++-common/torture/pr87248.c: New test.
> 
> --- gcc/fold-const.c.jj   2018-09-06 09:41:59.0 +0200
> +++ gcc/fold-const.c  2018-09-08 00:12:28.332418784 +0200
> @@ -11607,10 +11607,16 @@ fold_ternary_loc (location_t loc, enum t
> && integer_pow2p (arg1)
> && TREE_CODE (TREE_OPERAND (arg0, 0)) == BIT_AND_EXPR
> && operand_equal_p (TREE_OPERAND (TREE_OPERAND (arg0, 0), 1),
> -   arg1, OEP_ONLY_CONST))
> +   arg1, OEP_ONLY_CONST)
> +   /* operand_equal_p compares just value, not precision, so e.g.
> +  arg1 could be 8-bit -128 and be power of two, but BIT_AND_EXPR
> +  second operand 32-bit -128, which is not a power of two (or vice
> +  versa.  */
> +   && integer_pow2p (TREE_OPERAND (TREE_OPERAND (arg0, 0), 1)))
>   return pedantic_non_lvalue_loc (loc,
> - fold_convert_loc (loc, type,
> -   TREE_OPERAND (arg0, 0)));
> + fold_convert_loc (loc, type,
> +   TREE_OPERAND (arg0,
> + 0)));
>  
>/* Disable the transformations below for vectors, since
>fold_binary_op_with_conditional_arg may undo them immediately,
> --- gcc/testsuite/c-c++-common/torture/pr87248.c.jj   2018-09-08 
> 01:13:43.431334239 +0200
> +++ gcc/testsuite/c-c++-common/torture/pr87248.c  2018-09-08 
> 01:14:55.446593520 +0200
> @@ -0,0 +1,36 @@
> +/* PR middle-end/87248 */
> +/* { dg-do run } */
> +
> +void
> +foo (signed char *p, int q)
> +{
> +  *p = q & (-__SCHAR_MAX__ - 1) ? (-__SCHAR_MAX__ - 1) : 0;
> +}
> +
> +int
> +bar (long long x)
> +{
> +  return x & (-__INT_MAX__ - 1) ? (-__INT_MAX__ - 1) : 0;
> +}
> +
> +int
> +main ()
> +{
> +#if __INT_MAX__ > 4 * __SCHAR_MAX__
> +  signed char a[4];
> +  foo (a, __SCHAR_MAX__ + 1U);
> +  foo (a + 1, 2 * (__SCHAR_MAX__ + 1U));
> +  foo (a + 2, -__INT_MAX__ - 1);
> +  foo (a + 3, (__SCHAR_MAX__ + 1U) / 2);
> +  if (a[0] != (-__SCHAR_MAX__ - 1) || a[1] != a[0] || a[2] != a[0] || a[3] 
> != 0)
> +__builtin_abort ();
> +#endif
> +#if __LONG_LONG_MAX__ > 4 * __INT_MAX__
> +  if (bar (__INT_MAX__ + 1LL) != (-__INT_MAX__ - 1)
> +  || bar (2 * (__INT_MAX__ + 1LL)) != (-__INT_MAX__ - 1)
> +  || bar (-__LONG_LONG_MAX__ - 1) != (-__INT_MAX__ - 1)
> +  || bar ((__INT_MAX__ + 1LL) / 2) != 0)
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 09/25] Elide repeated RTL elements.

2018-09-12 Thread Andrew Stubbs

On 11/09/18 23:45, Jeff Law wrote:

Does this need a corresponding change to the RTL front-end so that it
can read the new form?


There's an RTL front-end? When did that happen... clearly I've not been 
paying attention.


If it's expected that dumps can be fed back in unmodified then yes, it 
needs to recognise the new output.


I'll look into it.

Andrew


[PATCH] Fix store merging (PR tree-optimization/86844)

2018-09-12 Thread Jakub Jelinek
On Tue, Sep 11, 2018 at 04:06:40PM +0200, Andreas Krebbel wrote:
> > It is a fix, but not optimal.
> > We have essentially:
> >  MEM[(int *)p_28] = 0;
> >  MEM[(char *)p_28 + 3B] = 1;
> >  MEM[(char *)p_28 + 1B] = 2;
> >  MEM[(char *)p_28 + 2B] = MEM[(char *)p_28 + 6B];
> > It is useful to merge the first 3 stores into:
> >  MEM[(int *)p_28] = 0x01000200; // or 0x00020001; depending on endianity
> >  MEM[(char *)p_28 + 2B] = MEM[(char *)p_28 + 6B];
> > rather than punt, and just ignore (i.e. make sure it isn't merged with
> > anything else) the non-INTEGER_CST store).  If you don't mind, I'll take 
> > this
> > PR over and handle it tomorrow.
> 
> Please do. Thanks!

Here it is, I hope the added comments are clear enough on what's going on
and when we can and when we can't handle it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
eventually 8.3?

2018-09-12  Jakub Jelinek  
Andreas Krebbel  

PR tree-optimization/86844
* gimple-ssa-store-merging.c
(imm_store_chain_info::coalesce_immediate): For overlapping stores, if
there are any overlapping stores in between them, make sure they are
also coalesced or we give up completely.

* gcc.c-torture/execute/pr86844.c: New test.
* gcc.dg/store_merging_22.c: New test.
* gcc.dg/store_merging_23.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2018-07-12 09:38:46.137335036 +0200
+++ gcc/gimple-ssa-store-merging.c  2018-09-11 22:47:41.406582798 +0200
@@ -2702,15 +2702,80 @@ imm_store_chain_info::coalesce_immediate
{
  /* Only allow overlapping stores of constants.  */
  if (info->rhs_code == INTEGER_CST
- && merged_store->stores[0]->rhs_code == INTEGER_CST
- && check_no_overlap (m_store_info, i, INTEGER_CST,
-  MAX (merged_store->last_order, info->order),
-  MAX (merged_store->start
-   + merged_store->width,
-   info->bitpos + info->bitsize)))
+ && merged_store->stores[0]->rhs_code == INTEGER_CST)
{
- merged_store->merge_overlapping (info);
- goto done;
+ unsigned int last_order
+   = MAX (merged_store->last_order, info->order);
+ unsigned HOST_WIDE_INT end
+   = MAX (merged_store->start + merged_store->width,
+  info->bitpos + info->bitsize);
+ if (check_no_overlap (m_store_info, i, INTEGER_CST,
+   last_order, end))
+   {
+ /* check_no_overlap call above made sure there are no
+overlapping stores with non-INTEGER_CST rhs_code
+in between the first and last of the stores we've
+just merged.  If there are any INTEGER_CST rhs_code
+stores in between, we need to merge_overlapping them
+even if in the sort_by_bitpos order there are other
+overlapping stores in between.  Keep those stores as is.
+Example:
+   MEM[(int *)p_28] = 0;
+   MEM[(char *)p_28 + 3B] = 1;
+   MEM[(char *)p_28 + 1B] = 2;
+   MEM[(char *)p_28 + 2B] = MEM[(char *)p_28 + 6B];
+We can't merge the zero store with the store of two and
+not merge anything else, because the store of one is
+in the original order in between those two, but in
+store_by_bitpos order it comes after the last store that
+we can't merge with them.  We can merge the first 3 stores
+and keep the last store as is though.  */
+ unsigned int len = m_store_info.length (), k = i;
+ for (unsigned int j = i + 1; j < len; ++j)
+   {
+ store_immediate_info *info2 = m_store_info[j];
+ if (info2->bitpos >= end)
+   break;
+ if (info2->order < last_order)
+   {
+ if (info2->rhs_code != INTEGER_CST)
+   {
+ /* Normally check_no_overlap makes sure this
+doesn't happen, but if end grows below, then
+we need to process more stores than
+check_no_overlap verified.  Example:
+   MEM[(int *)p_5] = 0;
+   MEM[(short *)p_5 + 3B] = 1;
+   MEM[(char *)p_5 + 4B] = _9;
+   MEM[(char *)p_5 + 2B] = 2;  */
+ k = 0;
+ break;
+  

Re: [PATCH] Fix store merging (PR tree-optimization/86844)

2018-09-12 Thread Richard Biener
On Wed, 12 Sep 2018, Jakub Jelinek wrote:

> On Tue, Sep 11, 2018 at 04:06:40PM +0200, Andreas Krebbel wrote:
> > > It is a fix, but not optimal.
> > > We have essentially:
> > >  MEM[(int *)p_28] = 0;
> > >  MEM[(char *)p_28 + 3B] = 1;
> > >  MEM[(char *)p_28 + 1B] = 2;
> > >  MEM[(char *)p_28 + 2B] = MEM[(char *)p_28 + 6B];
> > > It is useful to merge the first 3 stores into:
> > >  MEM[(int *)p_28] = 0x01000200; // or 0x00020001; depending on 
> > > endianity
> > >  MEM[(char *)p_28 + 2B] = MEM[(char *)p_28 + 6B];
> > > rather than punt, and just ignore (i.e. make sure it isn't merged with
> > > anything else) the non-INTEGER_CST store).  If you don't mind, I'll take 
> > > this
> > > PR over and handle it tomorrow.
> > 
> > Please do. Thanks!
> 
> Here it is, I hope the added comments are clear enough on what's going on
> and when we can and when we can't handle it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> eventually 8.3?

OK.  We need to fix the bug on the branch, so either we go with this 
variant or the other (which was pessimizing some cases?)

Richard.

> 2018-09-12  Jakub Jelinek  
>   Andreas Krebbel  
> 
>   PR tree-optimization/86844
>   * gimple-ssa-store-merging.c
>   (imm_store_chain_info::coalesce_immediate): For overlapping stores, if
>   there are any overlapping stores in between them, make sure they are
>   also coalesced or we give up completely.
> 
>   * gcc.c-torture/execute/pr86844.c: New test.
>   * gcc.dg/store_merging_22.c: New test.
>   * gcc.dg/store_merging_23.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj 2018-07-12 09:38:46.137335036 +0200
> +++ gcc/gimple-ssa-store-merging.c2018-09-11 22:47:41.406582798 +0200
> @@ -2702,15 +2702,80 @@ imm_store_chain_info::coalesce_immediate
>   {
> /* Only allow overlapping stores of constants.  */
> if (info->rhs_code == INTEGER_CST
> -   && merged_store->stores[0]->rhs_code == INTEGER_CST
> -   && check_no_overlap (m_store_info, i, INTEGER_CST,
> -MAX (merged_store->last_order, info->order),
> -MAX (merged_store->start
> - + merged_store->width,
> - info->bitpos + info->bitsize)))
> +   && merged_store->stores[0]->rhs_code == INTEGER_CST)
>   {
> -   merged_store->merge_overlapping (info);
> -   goto done;
> +   unsigned int last_order
> + = MAX (merged_store->last_order, info->order);
> +   unsigned HOST_WIDE_INT end
> + = MAX (merged_store->start + merged_store->width,
> +info->bitpos + info->bitsize);
> +   if (check_no_overlap (m_store_info, i, INTEGER_CST,
> + last_order, end))
> + {
> +   /* check_no_overlap call above made sure there are no
> +  overlapping stores with non-INTEGER_CST rhs_code
> +  in between the first and last of the stores we've
> +  just merged.  If there are any INTEGER_CST rhs_code
> +  stores in between, we need to merge_overlapping them
> +  even if in the sort_by_bitpos order there are other
> +  overlapping stores in between.  Keep those stores as is.
> +  Example:
> + MEM[(int *)p_28] = 0;
> + MEM[(char *)p_28 + 3B] = 1;
> + MEM[(char *)p_28 + 1B] = 2;
> + MEM[(char *)p_28 + 2B] = MEM[(char *)p_28 + 6B];
> +  We can't merge the zero store with the store of two and
> +  not merge anything else, because the store of one is
> +  in the original order in between those two, but in
> +  store_by_bitpos order it comes after the last store that
> +  we can't merge with them.  We can merge the first 3 stores
> +  and keep the last store as is though.  */
> +   unsigned int len = m_store_info.length (), k = i;
> +   for (unsigned int j = i + 1; j < len; ++j)
> + {
> +   store_immediate_info *info2 = m_store_info[j];
> +   if (info2->bitpos >= end)
> + break;
> +   if (info2->order < last_order)
> + {
> +   if (info2->rhs_code != INTEGER_CST)
> + {
> +   /* Normally check_no_overlap makes sure this
> +  doesn't happen, but if end grows below, then
> +  we need to process more stores than
> +  check_no_overlap verified.  Example:
> + MEM[(int *)p_5] = 0;
> + ME

Re: [PATCH] Fix store merging (PR tree-optimization/86844)

2018-09-12 Thread Jakub Jelinek
On Wed, Sep 12, 2018 at 11:04:52AM +0200, Richard Biener wrote:
> OK.  We need to fix the bug on the branch, so either we go with this 
> variant or the other (which was pessimizing some cases?)

Yeah.  Andreas' patch is simpler and for that reason might be better to
backport, on the other side trying to keep trunk and branch in sync has
advantages as well, especially for potential backports of further
store-merging fixes if they'll be needed.

Jakub


Re: [PATCH] Introduce libgcov.so run-time library (PR gcov-profile/84107).

2018-09-12 Thread Martin Liška
On 9/11/18 5:08 PM, Alexander Monakov wrote:
> On Tue, 11 Sep 2018, Martin Liška wrote:
>> I've discussed the topic with Alexander on the Cauldron and we hoped
>> that the issue with unique __gcov_root can be handled with DECL_COMMON in 
>> each DSO.
>> Apparently this approach does not work as all DSOs in an executable have 
>> eventually
>> just a single instance of the __gcov_root, which is bad.
> 
> No, that happens only if they have default visibility.  Emitting them with
> "hidden" visibility solves this.

Ah, ok, thanks. So theoretically that way should work.

> 
>> So that I returned back to catch the root of the failure. It shows that even 
>> with
>> -Wl,--dynamic-list-data __gcov_indirect_call_callee point to a different 
>> variable
>> among the DSOs. The reason is that the variable is TLS:
> 
> (no, that the variable is TLS is not causing the bug; the issue is libraries
> carry public definitions of just one or both variables depending on if they
> have indirect calls or not, and the library state becomes "diverged" when
> one variable gets interposed while the other does not)

I see, I'm attaching patch that does that. I can confirm your test-case works
fine w/o -Wl,--dynamic-list-data.

I'm wondering if it will work as well with dlopen/dlsym machinery? Or now
the linker flag will be needed?

> 
>> That said I would like to go this direction. I would be happy to receive
>> a feedback. Then I'll test the patch.
> 
> Hm, this TLS change is attacking the issue from a wrong direction.  What I
> suggested on the Cauldron as a simple and backportable way of solving this
> is to consolidate the two TLS variables into one TLS struct, which is then
> either interposed or not as a whole.

Got it, current I prefer the solution.

Martin

> 
> Alexander
> 

>From ce708b5d3f3742b3e8b930aa6eb74181273efd9f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 12 Sep 2018 11:24:59 +0200
Subject: [PATCH] Patch candidate.

---
 gcc/tree-profile.c| 88 +--
 libgcc/libgcov-profiler.c | 25 ---
 libgcc/libgcov.h  |  9 
 3 files changed, 64 insertions(+), 58 deletions(-)

diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index f96bd4b9704..e2df8f005be 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "tree-pretty-print.h"
+#include "langhooks.h"
+#include "stor-layout.h"
 
 static GTY(()) tree gcov_type_node;
 static GTY(()) tree tree_interval_profiler_fn;
@@ -64,8 +66,9 @@ static GTY(()) tree tree_ior_profiler_fn;
 static GTY(()) tree tree_time_profiler_counter;
 
 
-static GTY(()) tree ic_void_ptr_var;
-static GTY(()) tree ic_gcov_type_ptr_var;
+static GTY(()) tree ic_tuple_var;
+static GTY(()) tree ic_tuple_counters_field;
+static GTY(()) tree ic_tuple_callee_field;
 static GTY(()) tree ptr_void;
 
 /* Do initialization work for the edge profiler.  */
@@ -81,38 +84,34 @@ init_ic_make_global_vars (void)
   tree gcov_type_ptr;
 
   ptr_void = build_pointer_type (void_type_node);
+  gcov_type_ptr = build_pointer_type (get_gcov_type ());
 
-  ic_void_ptr_var
-= build_decl (UNKNOWN_LOCATION, VAR_DECL,
-		  get_identifier (
-			  (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-			   "__gcov_indirect_call_topn_callee" :
-			   "__gcov_indirect_call_callee")),
-		  ptr_void);
-  TREE_PUBLIC (ic_void_ptr_var) = 1;
-  DECL_EXTERNAL (ic_void_ptr_var) = 1;
-  TREE_STATIC (ic_void_ptr_var) = 1;
-  DECL_ARTIFICIAL (ic_void_ptr_var) = 1;
-  DECL_INITIAL (ic_void_ptr_var) = NULL;
-  if (targetm.have_tls)
-set_decl_tls_model (ic_void_ptr_var, decl_default_tls_model (ic_void_ptr_var));
+  tree tuple_type = lang_hooks.types.make_type (RECORD_TYPE);
 
-  gcov_type_ptr = build_pointer_type (get_gcov_type ());
+  /* callee */
+  ic_tuple_callee_field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE, ptr_void);
 
-  ic_gcov_type_ptr_var
+  /* counters */
+  ic_tuple_counters_field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE, gcov_type_ptr);
+  DECL_CHAIN (ic_tuple_counters_field) = ic_tuple_callee_field;
+
+  finish_builtin_struct (tuple_type, "indirect_call_tuple",
+			 ic_tuple_counters_field, NULL_TREE);
+
+  ic_tuple_var
 = build_decl (UNKNOWN_LOCATION, VAR_DECL,
 		  get_identifier (
 			  (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-			   "__gcov_indirect_call_topn_counters" :
-			   "__gcov_indirect_call_counters")),
-		  gcov_type_ptr);
-  TREE_PUBLIC (ic_gcov_type_ptr_var) = 1;
-  DECL_EXTERNAL (ic_gcov_type_ptr_var) = 1;
-  TREE_STATIC (ic_gcov_type_ptr_var) = 1;
-  DECL_ARTIFICIAL (ic_gcov_type_ptr_var) = 1;
-  DECL_INITIAL (ic_gcov_type_ptr_var) = NULL;
+			   "__gcov_indirect_call_topn" :
+			   "__gcov_indirect_call")),
+		  tuple_type);
+  TREE_PUBLIC (ic_tuple_var) = 1;
+  DECL_EXTERNAL (ic_tuple_var) = 1;
+  TREE_STATIC (ic_tuple_var) = 1;
+  DECL_ARTIFICIAL (ic_tuple_var) = 1;
+  DECL_INITIAL (i

Re: [wwwdocs] Document 87137 fix

2018-09-12 Thread Nathan Sidwell

On 9/6/18 12:24 PM, Gerald Pfeifer wrote:

Hi Nathan,
On Wed, 5 Sep 2018, Nathan Sidwell wrote:

this documents the fix for pr87137.  Discovered as a GCC-8 regression,
turned out to be an ABI bug.  Decided to fix the entire bug in one go.
Are these changes.html changes ok?


thanks for doing this!  I have minor suggestions if you don't mind;
the patch is fine if you consider these.


thanks, I have committed with your suggestions.

nathan

--
Nathan Sidwell


Re: [PATCH] Add quotes for -fconstexpr-depth= in an error message.

2018-09-12 Thread Martin Liška
PING^1

On 8/29/18 10:27 AM, Martin Liška wrote:
> Hi.
> 
> That's just simple patch that wraps an option name
> in an error message into apostrophes.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/cp/ChangeLog:
> 
> 2018-08-28  Martin Liska  
> 
>   * constexpr.c (cxx_eval_call_expression): Add quotes
>   to -fconstexpr-depth=.
> ---
>  gcc/cp/constexpr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 



Re: [C++, wwwdocs] bugs/index.html - complete C++ non-bug entry

2018-09-12 Thread Nathan Sidwell

On 9/2/18 10:26 AM, Gerald Pfeifer wrote:

Jason and Nathan,

while I completed one C+ non-bug entry with the patch below, I am
not sure the items as such is really still relevant?

In fact, if you could have a look at https://gcc.gnu.org/bugs/#nonbugs_cxx
and advise which entries perhaps should be removed (or updated or
added), I'll take care.


sorry for the tardiness.

1) export
exported templates were removed as a feature in C++11.  'export' remains 
a reserved keyword.  It is being recycled for C++ modules, currently an 
experimental TS.


2) G++ emits two copies of constructors and destructors.
IIUC the ABI has been corrected to allow a single cdtor in cases where 
they are the same.  We emit compatibility(?) aliases on targets that 
support them.  Jason?


3) Global destructors are not run in the correct order.
I think --use-cxa-atexit is enabled by default on most (nearly all) 
targets these days.


4) Classes in exception specifiers must be complete types.
While this is still correct, exception specifiers were deprecated in 
C++11.  They were removed in C++17.  We still support them for 
compatibility.  Modern code should use noexcept, or 
noexcept(boolean-constant-expression)


5) Exceptions don't work in multithreaded applications.
Correct, but as with cxa-atexit, usually enabled by default on threading 
systems


6) Templates, scoping, and digraphs.
again, correct, but we give a very clueful diagnostic:
digraph.cc:8:2: error: ‘<::’ cannot begin a template-argument list 
[-fpermissive]

 B<::X::A> *p;
  ^~
digraph.cc:8:2: note: ‘<:’ is an alternate spelling for ‘[’. Insert 
whitespace between ‘<’ and ‘::’
digraph.cc:8:2: note: (if you use ‘-fpermissive’ or ‘-std=c++11’, or 
‘-std=gnu++11’ G++ will accept your code)


7) Copy constructor access check while initializing a reference.
This example is no longer correct, it compiles without error (including 
c++98 mode).  I think compilers now have to know about about-to-die 
temporaries and move them?


8) ABI changes
We should probably document the std-forced ABI changes?

Hope that helps.

nathan



--
Nathan Sidwell


Re: [PATCH] Add quotes for -fconstexpr-depth= in an error message.

2018-09-12 Thread Nathan Sidwell

On 9/12/18 6:17 AM, Martin Liška wrote:

PING^1




gcc/cp/ChangeLog:

2018-08-28  Martin Liska  

* constexpr.c (cxx_eval_call_expression): Add quotes
to -fconstexpr-depth=.


ok, thanks


--
Nathan Sidwell


Re: [PATCH] Fix store merging (PR tree-optimization/86844)

2018-09-12 Thread Richard Biener
On Wed, 12 Sep 2018, Jakub Jelinek wrote:

> On Wed, Sep 12, 2018 at 11:04:52AM +0200, Richard Biener wrote:
> > OK.  We need to fix the bug on the branch, so either we go with this 
> > variant or the other (which was pessimizing some cases?)
> 
> Yeah.  Andreas' patch is simpler and for that reason might be better to
> backport, on the other side trying to keep trunk and branch in sync has
> advantages as well, especially for potential backports of further
> store-merging fixes if they'll be needed.

I'd say backport the change if no issues show up within a week.

Richard.


Re: [PATCH 04/25] SPECIAL_REGNO_P

2018-09-12 Thread Andrew Stubbs

On 11/09/18 23:42, Jeff Law wrote:

This feels like you're papering over a problem in regrename and/or the
GCN port..  regrename should be checking the predicate and constraints
when it makes changes.  And I think that you're still allowed to refer
to a fixed register in alternatives.


I think you're allowed to use a constraint to match an already-present 
hardreg, fixed or otherwise, but my understanding is that LRA will never 
convert a pseudoreg to a fixed hardreg, no matter what the constraint says.


Just to make sure, I just tried to fix EXEC (the only register matching 
the "e" constraint, and one of the "special" ones), and as expected the 
compiler blows up with "unable to generate reloads for ...".


Anyway, back to the issue of SPECIAL_REGNO_P ...

I've just retested the motivating example that we had, and that no 
longer fails in regrename.  That could be because the problem is fixed, 
or simply that the compiler no longer generates the exact instruction 
sequence that demonstrates the problem.


If I can't reproduce the issue then this macro becomes just a small 
compile-time optimization and we can remove it safely.


I'll report back when I've done more testing.

Andrew


[PATCH] Fix PR87266

2018-09-12 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-09-12  Richard Biener  

PR tree-optimization/87266
* tree-ssa-sccvn.c (do_rpo_vn): Always iterate to not yet
visited blocks.

* gcc.dg/torture/pr87266-1.c: New testcase.
* gcc.dg/torture/pr87266-2.c: Likewise.
* gcc.dg/torture/pr87266-3.c: Likewise.
* gcc.dg/torture/pr87266-4.c: Likewise.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 264234)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -6471,6 +6471,17 @@ do_rpo_vn (function *fn, edge entry, bit
== (EDGE_DFS_BACK|EDGE_EXECUTABLE)
&& rpo_state[bb_to_rpo[e->dest->index]].iterate)
  {
+   int destidx = bb_to_rpo[e->dest->index];
+   if (!rpo_state[destidx].visited)
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Unvisited destination %d\n",
+  e->dest->index);
+   if (iterate_to == -1
+   || destidx < iterate_to)
+ iterate_to = destidx;
+   continue;
+ }
if (dump_file && (dump_flags & TDF_DETAILS))
  fprintf (dump_file, "Looking for changed values of backedge "
   "%d->%d destination PHIs\n",
@@ -6497,7 +6508,6 @@ do_rpo_vn (function *fn, edge entry, bit
&& dump_file && (dump_flags & TDF_DETAILS))
  fprintf (dump_file, "PHI was CSEd and hashtable "
   "state (changed)\n");
-   int destidx = bb_to_rpo[e->dest->index];
if (iterate_to == -1
|| destidx < iterate_to)
  iterate_to = destidx;
Index: gcc/testsuite/gcc.dg/torture/pr87266-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr87266-1.c(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87266-1.c(working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-ccp -fno-tree-forwprop" } */
+
+void
+iw (int gu, int mq, int r2)
+{
+  int yn = 0;
+
+  while (gu < 1)
+{
+  for (;;)
+;
+
+ bb:;
+  int ay = 0;
+
+  while (ay < 1)
+++mq;
+}
+
+  if (yn != 0)
+goto up;
+
+  if (0)
+{
+ up:
+  if (r2 == 0)
+goto bb;
+}
+
+  goto up;
+}
Index: gcc/testsuite/gcc.dg/torture/pr87266-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr87266-2.c(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87266-2.c(working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-ccp -fno-tree-forwprop" } */
+
+void
+iw (int gu, int mq, int r2)
+{
+  int yn = 0;
+
+  while (gu < 1)
+{
+  for (;;)
+;
+
+ bb:;
+  int ay = 0;
+
+  while (yn < 1)
+++mq;
+}
+
+  if (yn != 0)
+goto up;
+
+  if (0)
+{
+ up:
+  if (r2 == 0)
+goto bb;
+}
+
+  goto up;
+}
Index: gcc/testsuite/gcc.dg/torture/pr87266-3.c
===
--- gcc/testsuite/gcc.dg/torture/pr87266-3.c(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87266-3.c(working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-ccp -fno-tree-forwprop" } */
+
+void
+iw (int gu, int mq, int r2)
+{
+  int yn = 0;
+
+  while (gu < 1)
+{
+  int ay = 0;
+
+  for (;;)
+;
+
+ bb:
+  while (ay < 1)
+++mq;
+}
+
+  if (yn != 0)
+goto up;
+
+  if (0)
+{
+ up:
+  if (r2 == 0)
+goto bb;
+}
+
+  goto up;
+}
Index: gcc/testsuite/gcc.dg/torture/pr87266-4.c
===
--- gcc/testsuite/gcc.dg/torture/pr87266-4.c(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87266-4.c(working copy)
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-dominator-opts -fno-tree-forwprop" } */
+
+unsigned long int
+re (long int j9)
+{
+  if (j9 == 0)
+return 1;
+
+  return j9;
+}
+
+void
+zq (int bt, int yy)
+{
+  int p3 = 0, go = 4, ez = go;
+
+  while (go != 0)
+{
+  if (ez + !!bt - re (bt) != 0 && go != 0)
+   {
+ if (yy != 0)
+   p3 = yy;
+   }
+  else
+   return;
+
+  go = 2;
+}
+}
+
+void
+my (unsigned long int n6, int bt, int yy)
+{
+  zq (bt, yy);
+  n6 = n6 == bt;
+  zq (bt, yy);
+}


[PATCH] Improve location information of -Wcoverage-mismatch.

2018-09-12 Thread Martin Liška
Hi.

The patch improves locations of the warning in following way:

sample.c: In function ‘main’:
sample.c:16:1: error: source locations for function ‘main’ have changed, the 
profile data may be out of date [-Werror=coverage-mismatch]
16 | }
   | ^
sample.c:16:1: error: source locations for function ‘main’ have changed, the 
profile data may be out of date [-Werror=coverage-mismatch]
cc1: some warnings being treated as errors

into:

sample.c: In function ‘main’:
sample.c:10:5: error: source locations for function ‘main’ have changed, the 
profile data may be out of date [-Werror=coverage-mismatch]
10 | int main()
   | ^~~~
sample.c:10:5: error: source locations for function ‘main’ have changed, the 
profile data may be out of date [-Werror=coverage-mismatch]

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

2018-09-12  Martin Liska  

* coverage.c (get_coverage_counts): Use warning_at
with current_function_decl location. Use %qD in warning
message.
---
 gcc/coverage.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)



diff --git a/gcc/coverage.c b/gcc/coverage.c
index bae6f5cafac..71f2c59e5a7 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -374,12 +374,13 @@ get_coverage_counts (unsigned counter, unsigned expected,
 {
   static int warned = 0;
   bool warning_printed = false;
-  tree id = DECL_ASSEMBLER_NAME (current_function_decl);
 
   warning_printed =
-	warning_at (input_location, OPT_Wcoverage_mismatch,
-		"the control flow of function %qE does not match "
-		"its profile data (counter %qs)", id, ctr_names[counter]);
+	warning_at (DECL_SOURCE_LOCATION (current_function_decl),
+		OPT_Wcoverage_mismatch,
+		"the control flow of function %qD does not match "
+		"its profile data (counter %qs)", current_function_decl,
+		ctr_names[counter]);
   if (warning_printed && dump_enabled_p ())
 	{
 	  dump_user_location_t loc
@@ -408,10 +409,11 @@ get_coverage_counts (unsigned counter, unsigned expected,
 }
   else if (entry->lineno_checksum != lineno_checksum)
 {
-  warning (OPT_Wcoverage_mismatch,
-   "source locations for function %qE have changed,"
-	   " the profile data may be out of date",
-	   DECL_ASSEMBLER_NAME (current_function_decl));
+  warning_at (DECL_SOURCE_LOCATION (current_function_decl),
+		  OPT_Wcoverage_mismatch,
+		  "source locations for function %qD have changed,"
+		  " the profile data may be out of date",
+		  current_function_decl);
 }
 
   if (summary)




[PATCH] gcov: emit hotness colors to easily find hot code.

2018-09-12 Thread Martin Liška
Hi.

This is follow-up of:
https://gcc.gnu.org/ml/gcc/2018-08/msg00162.html

I'm suggesting to introduce using colors in order to indicate hotness
of lines. Legend is printed at the very beginning of the output file.
Example: https://pasteboard.co/HDxK4Nm.png

Patch survives gcov.exp test-suite. Will install next week if no objections.

Martin

gcc/ChangeLog:

2018-09-12  Martin Liska  

* doc/gcov.texi: Document new option --use-hotness-colors.
* gcov.c (struct source_info): Declare new field.
(source_info::source_info): Set default for maximum_count.
(print_usage): Add new -q option.
(process_args): Process it.
(accumulate_line_info): Save src->maximum_count.
(output_line_beginning): Make color line number if
flag_use_hotness_colors is set.
(output_line_details): Pass default argument value.
(output_lines): Pass src->maximum_count.
---
 gcc/doc/gcov.texi |  8 ++-
 gcc/gcov.c| 56 +--
 2 files changed, 56 insertions(+), 8 deletions(-)


diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 98f4a876293..3b1b38aebfa 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -132,6 +132,7 @@ gcov [@option{-v}|@option{--version}] [@option{-h}|@option{--help}]
  [@option{-n}|@option{--no-output}]
  [@option{-o}|@option{--object-directory} @var{directory|file}]
  [@option{-p}|@option{--preserve-paths}]
+ [@option{-q}|@option{--use-hotness-colors}]
  [@option{-r}|@option{--relative-only}]
  [@option{-s}|@option{--source-prefix} @var{directory}]
  [@option{-t}|@option{--stdout}]
@@ -264,7 +265,6 @@ Use colors for lines of code that have zero coverage.  We use red color for
 non-exceptional lines and cyan for exceptional.  Same colors are used for
 basic blocks with @option{-a} option.
 
-
 @item -l
 @itemx --long-file-names
 Create long file names for included source files.  For example, if the
@@ -305,6 +305,12 @@ removed and unremoveable @file{..}
 components renamed to @samp{^}.  This is useful if sourcefiles are in several
 different directories.
 
+@item -q
+@itemx --use-hotness-colors
+
+Emit perf-like colored output for hot lines.  Legend of the color scale
+is printed at the very beginning of the output file.
+
 @item -r
 @itemx --relative-only
 Only output information about source files with a relative pathname
diff --git a/gcc/gcov.c b/gcc/gcov.c
index 6a24a320046..64ab85c981f 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -339,13 +339,16 @@ struct source_info
 
   coverage_info coverage;
 
+  /* Maximum line count in the source file.  */
+  unsigned int maximum_count;
+
   /* Functions in this source file.  These are in ascending line
  number order.  */
   vector  functions;
 };
 
 source_info::source_info (): index (0), name (NULL), file_time (),
-  lines (), coverage (), functions ()
+  lines (), coverage (), maximum_count (0), functions ()
 {
 }
 
@@ -502,6 +505,10 @@ static int flag_verbose = 0;
 
 static int flag_use_colors = 0;
 
+/* Use perf-like colors to indicate hot lines.  */
+
+static int flag_use_hotness_colors = 0;
+
 /* Output count information for every basic block, not merely those
that contain line number information.  */
 
@@ -839,6 +846,7 @@ print_usage (int error_p)
   fnotice (file, "  -n, --no-output Do not create an output file\n");
   fnotice (file, "  -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n");
   fnotice (file, "  -p, --preserve-pathsPreserve all pathname components\n");
+  fnotice (file, "  -q, --use-hotness-colorsEmit perf-like colored output for hot lines\n");
   fnotice (file, "  -r, --relative-only Only show data for relative sources\n");
   fnotice (file, "  -s, --source-prefix DIR Source prefix to elide\n");
   fnotice (file, "  -t, --stdoutOutput to stdout instead of a file\n");
@@ -890,6 +898,7 @@ static const struct option options[] =
   { "display-progress", no_argument,   NULL, 'd' },
   { "hash-filenames",	no_argument,   NULL, 'x' },
   { "use-colors",	no_argument,   NULL, 'k' },
+  { "use-hotness-colors",   no_argument,   NULL, 'q' },
   { 0, 0, 0, 0 }
 };
 
@@ -900,7 +909,7 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdfhijklmno:prs:tuvwx";
+  const char *opts = "abcdfhijklmno:pqrs:tuvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
 {
   switch (opt)
@@ -929,6 +938,9 @@ process_args (int argc, char **argv)
 	case 'k':
 	  flag_use_colors = 1;
 	  break;
+	case 'q':
+	  flag_use_hotness_colors = 1;
+	  break;
 	case 'm':
 	  flag_demangled_names = 1;
 	  break;
@@ -2580,6 +2592,9 @@ static void accumulate_line_info (line_info *line, source_info *src,
   /* Now, add the count of loops entirely on this line.  */
   count += get_cycles_count (*line);
   line->count = count

[PATCH] Properly mark lambdas in GCOV (PR gcov-profile/86109).

2018-09-12 Thread Martin Liška
Hi.

This is follow-up of:
https://gcc.gnu.org/ml/gcc/2018-08/msg7.html

I've chosen to implement that with new DECL_CXX_LAMBDA_FUNCTION that
uses an empty bit in tree_function_decl.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready for trunk?

gcc/ChangeLog:

2018-09-12  Martin Liska  

PR gcov-profile/86109
* coverage.c (coverage_begin_function): Do not
mark lambdas as artificial.
* tree-core.h (struct GTY): Remove tm_clone_flag
and introduce new lambda_function.
* tree.h (DECL_CXX_LAMBDA_FUNCTION): New macro.

gcc/cp/ChangeLog:

2018-09-12  Martin Liska  

PR gcov-profile/86109
* parser.c (cp_parser_lambda_declarator_opt):
Set DECL_CXX_LAMBDA_FUNCTION for lambdas.

gcc/testsuite/ChangeLog:

2018-09-12  Martin Liska  

PR gcov-profile/86109
* g++.dg/gcov/pr86109.C: New test.
---
 gcc/coverage.c  |  3 ++-
 gcc/cp/parser.c |  1 +
 gcc/testsuite/g++.dg/gcov/pr86109.C | 16 
 gcc/tree-core.h |  2 +-
 gcc/tree.h  |  4 
 5 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/pr86109.C


diff --git a/gcc/coverage.c b/gcc/coverage.c
index bae6f5cafac..550d84598ab 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -657,7 +657,8 @@ coverage_begin_function (unsigned lineno_checksum, unsigned cfg_checksum)
   gcov_write_string (IDENTIFIER_POINTER
 		 (DECL_ASSEMBLER_NAME (current_function_decl)));
   gcov_write_unsigned (DECL_ARTIFICIAL (current_function_decl)
-		   && !DECL_FUNCTION_VERSIONED (current_function_decl));
+		   && !DECL_FUNCTION_VERSIONED (current_function_decl)
+		   && !DECL_CXX_LAMBDA_FUNCTION (current_function_decl));
   gcov_write_filename (xloc.file);
   gcov_write_unsigned (xloc.line);
   gcov_write_unsigned (xloc.column);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index f5e4fa4ff07..a1d4e30a966 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -10649,6 +10649,7 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree lambda_expr)
 	DECL_ARTIFICIAL (fco) = 1;
 	/* Give the object parameter a different name.  */
 	DECL_NAME (DECL_ARGUMENTS (fco)) = closure_identifier;
+	DECL_CXX_LAMBDA_FUNCTION (fco) = 1;
   }
 if (template_param_list)
   {
diff --git a/gcc/testsuite/g++.dg/gcov/pr86109.C b/gcc/testsuite/g++.dg/gcov/pr86109.C
new file mode 100644
index 000..9052d2e5a04
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/pr86109.C
@@ -0,0 +1,16 @@
+
+/* { dg-options "-fprofile-arcs -ftest-coverage -std=c++11" } */
+/* { dg-do run { target native } } */
+
+int main()
+{
+auto partially_uncovered_lambda = [](int i) { /* count(1) */
+if (i > 10) /* count(1) */
+return 0; /* count(1) */
+return 1; /* count(#) */
+};
+
+return partially_uncovered_lambda(20); /* count(1) */
+}
+
+/* { dg-final { run-gcov pr86109.C } } */
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index dee27f89dec..cd3a2bad08c 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1789,8 +1789,8 @@ struct GTY(()) tree_function_decl {
   unsigned pure_flag : 1;
   unsigned looping_const_or_pure_flag : 1;
   unsigned has_debug_args_flag : 1;
-  unsigned tm_clone_flag : 1;
   unsigned versioned_function : 1;
+  unsigned lambda_function: 1;
   /* No bits left.  */
 };
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 4f415b7a220..692123eafe3 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3039,6 +3039,10 @@ extern vec **decl_debug_args_insert (tree);
 #define DECL_CXX_DESTRUCTOR_P(NODE)\
(FUNCTION_DECL_CHECK (NODE)->decl_with_vis.cxx_destructor)
 
+/* In FUNCTION_DECL, this is set if this function is a C++ lambda function.  */
+#define DECL_CXX_LAMBDA_FUNCTION(NODE) \
+  (FUNCTION_DECL_CHECK (NODE)->function_decl.lambda_function)
+
 /* In FUNCTION_DECL that represent an virtual method this is set when
the method is final.  */
 #define DECL_FINAL_P(NODE)\



[PATCH] i386: move alignment defaults to processor_costs.

2018-09-12 Thread Martin Liška
Hi.

Honza asked me to do the follow-up. It moves definition of alignments
related to a CPU into *_cost scructures. Advantage of it is that there's
no redundant definition for CPUs that have equal cost.

I verified that it produces same output for all valid -march options:
gcc --help=common -Q -O2 -march=*

I need to reg&bootstrap the patch. Will it be fine after I'll test it?
Thanks,
Martin

gcc/ChangeLog:

2018-09-12  Martin Liska  

* common/config/i386/i386-common.c (ix86_get_valid_option_values):
Use processor_names table.
* config/i386/i386.c (ix86_default_align): Use
processor_cost_table for alignment values.
(ix86_option_override_internal): Use processor_names.
(ix86_function_specific_print): Likewise.
* config/i386/i386.h (struct processor_costs):
Add alignment values.
(struct ptt): Remove and replace with const char *.
* config/i386/x86-tune-costs.h (struct processor_costs):
Declare default alignments for all costs.
---
 gcc/common/config/i386/i386-common.c |  82 ++---
 gcc/config/i386/i386.c   |  15 ++--
 gcc/config/i386/i386.h   |  22 +++---
 gcc/config/i386/x86-tune-costs.h | 104 +++
 4 files changed, 159 insertions(+), 64 deletions(-)


diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index c7eb859e1c1..3b5312d7250 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1461,49 +1461,45 @@ i386_except_unwind_info (struct gcc_options *opts)
 #define TARGET_SUPPORTS_SPLIT_STACK ix86_supports_split_stack
 
 /* This table must be in sync with enum processor_type in i386.h.  */
-const struct ptt processor_target_table[PROCESSOR_max] =
+const char *const processor_names[PROCESSOR_max] =
 {
-  /* The "0:0:8" label alignment specified for some processors generates
- secondary 8-byte alignment only for those label/jump/loop targets
- which have primary alignment.  */
-
-  {"generic","16:11:8", "16:11:8", "0:0:8", "16"},
-  {"i386",   "4",   "4",   NULL,"4" },
-  {"i486",   "16",  "16",  "0:0:8", "16"},
-  {"pentium","16:8:8",  "16:8:8",  "0:0:8", "16"},
-  {"lakemont",   "16:8:8",  "16:8:8",  "0:0:8", "16"},
-  {"pentiumpro", "16",  "16:11:8", "0:0:8", "16"},
-  {"pentium4",   NULL,  NULL,  NULL,NULL},
-  {"nocona", NULL,  NULL,  NULL,NULL},
-  {"core2",  "16:11:8", "16:11:8", "0:0:8", "16"},
-  {"nehalem","16:11:8", "16:11:8", "0:0:8", "16"},
-  {"sandybridge","16:11:8", "16:11:8", "0:0:8", "16"},
-  {"haswell","16:11:8", "16:11:8", "0:0:8", "16"},
-  {"bonnell","16",  "16:8:8",  "0:0:8", "16"},
-  {"silvermont", "16",  "16:8:8",  "0:0:8", "16"},
-  {"goldmont",   "16",  "16:8:8",  "0:0:8", "16"},
-  {"goldmont-plus",  "16",  "16:8:8",  "0:0:8", "16"},
-  {"tremont","16",  "16:8:8",  "0:0:8", "16"},
-  {"knl","16",  "16:8:8",  "0:0:8", "16"},
-  {"knm","16",  "16:8:8",  "0:0:8", "16"},
-  {"skylake","16:11:8", "16:11:8", "0:0:8", "16"},
-  {"skylake-avx512", "16:11:8", "16:11:8", "0:0:8", "16"},
-  {"cannonlake", "16:11:8", "16:11:8", "0:0:8", "16"},
-  {"icelake-client", "16:11:8", "16:11:8", "0:0:8", "16"},
-  {"icelake-server", "16:11:8", "16:11:8", "0:0:8", "16"},
-  {"intel",  "16",  "16:8:8",  "0:0:8", "16"},
-  {"geode",  NULL,  NULL,  NULL,NULL},
-  {"k6", "32:8:8",  "32:8:8",  "0:0:8", "32"},
-  {"athlon", "16:8:8",  "16:8:8",  "0:0:8", "16"},
-  {"k8", "16:8:8",  "16:8:8",  "0:0:8", "16"},
-  {"amdfam10",   "32:25:8", "32:8:8",  "0:0:8", "32"},
-  {"bdver1", "16:11:8", "16:8:8",  "0:0:8", "11"},
-  {"bdver2", "16:11:8", "16:8:8",  "0:0:8", "11"},
-  {"bdver3", "16:11:8", "16:8:8",  "0:0:8", "11"},
-  {"bdver4", "16:11:8", "16:8:8",  "0:0:8", "11"},
-  {"btver1", "16:11:8", "16:8:8",  "0:0:8", "11"},
-  {"btver2", "16:11:8", "16:8:8",  "0:0:8", "11"},
-  {"znver1", "16",  "16",  "0:0:8", "16"}
+  "generic",
+  "i386",
+  "i486",
+  "pentium",
+  "lakemont",
+  "pentiumpro",
+  "pentium4",
+  "nocona",
+  "core2",
+  "nehalem",
+  "sandybridge",
+  "haswell",
+  "bonnell",
+  "silvermont",
+  "goldmont",
+  "goldmont-plus",
+  "tremont",
+  "knl",
+  "knm",
+  "skylake",
+  "skylake-avx512",
+  "cannonlake",
+  "icelake-client",
+  "icelake-server",
+  "intel",
+  "geode",
+  "k6",
+  "athlon",
+  "k8",
+  "amdfam10",
+  "bdver1",
+  "bdver2",
+  "bdver3",
+  "bdver4",
+  "btver1",
+  "btver2",
+  "znver1"
 };
 
 const pta processor_alias_table[] =
@@ -1715,7 +1711,7 @@ ix86_get_valid_option_values (int option_code,
   break;
 case OPT_mtune_:
   for (unsigned i = 0; i < PROCESSOR_max; i++)
-	v.safe

[PATCH][tree-ssa-mathopts] PR tree-optimization/87259: Try execute_cse_reciprocals_1 when optimize_recip_sqrt fails

2018-09-12 Thread Kyrill Tkachov

Hi all,

This PR is an SSA validation error where the definition ends up not dominating 
a use.
This appeared after my recip_sqrt optimisation. Though the optimisation doesn't 
get triggered
as there is no sqrt involved it prevents execute_cse_reciprocals_1 from running 
as it's currently
an either/or situation. I think execute_cse_reciprocals_1 doesn't like it when 
some divisions get
execute_cse_reciprocals_1 called on them and some don't.

This patch changes optimize_recip_sqrt to return a boolean indicating success 
and call execute_cse_reciprocals_1
if no transform was made so that execute_cse_reciprocals_1 has a chance to 
optimise it as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2018-09-12  Kyrylo Tkachov  

PR tree-optimization/87259
PR lto/87283
* tree-ssa-math-opts.c (optimize_recip_sqrt): Make function return
boolean.
(pass_cse_reciprocals::execute): Try execute_cse_reciprocals_1 if
optimize_recip_sqrt was ineffective.

2018-09-12  Kyrylo Tkachov  

PR tree-optimization/87259
* gcc.dg/pr87259.c: New test.
diff --git a/gcc/testsuite/gcc.dg/pr87259.c b/gcc/testsuite/gcc.dg/pr87259.c
new file mode 100644
index ..527a60a37adb520591ff40a52a12706aa1582068
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87259.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+int a, b, c;
+int *e;
+float f;
+void h() {
+  for (int g;;) {
+float d = b, i = 0 / f, j = a / (f * f), k, l = 0 / d;
+c = i + j;
+g = l;
+e[g] = c / d * k / d;
+  }
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 055b4e08e97469bac3ba7e26d61193512c234575..700ce6cecf0bc00a6d562db4c1f268df5b25f03a 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -547,9 +547,10 @@ free_bb (struct occurrence *occ)
depending on the uses of x, r1, r2.  This removes one multiplication and
allows the sqrt and division operations to execute in parallel.
DEF_GSI is the gsi of the initial division by sqrt that defines
-   DEF (x in the example abovs).  */
+   DEF (x in the example abovs).
+   Return TRUE if a transformation was possible.  */
 
-static void
+static bool
 optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
 {
   gimple *use_stmt;
@@ -562,13 +563,13 @@ optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
   if (TREE_CODE (orig_sqrt_ssa_name) != SSA_NAME
   || TREE_CODE (div_rhs1) != REAL_CST
   || !real_equal (&TREE_REAL_CST (div_rhs1), &dconst1))
-return;
+return false;
 
   gcall *sqrt_stmt
 = dyn_cast  (SSA_NAME_DEF_STMT (orig_sqrt_ssa_name));
 
   if (!sqrt_stmt || !gimple_call_lhs (sqrt_stmt))
-return;
+return false;
 
   switch (gimple_call_combined_fn (sqrt_stmt))
 {
@@ -577,7 +578,7 @@ optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
   break;
 
 default:
-  return;
+  return false;
 }
   tree a = gimple_call_arg (sqrt_stmt, 0);
 
@@ -615,15 +616,15 @@ optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
  a multiplication so make sure we don't introduce a multiplication
  on a path where there was none.  */
   if (has_other_use && !mult_on_main_path)
-return;
+return false;
 
   if (sqr_stmts.is_empty () && mult_stmts.is_empty ())
-return;
+return false;
 
   /* If x = 1.0 / sqrt (a) has uses other than those optimized here we want
  to be able to compose it from the sqr and mult cases.  */
   if (has_other_use && (sqr_stmts.is_empty () || mult_stmts.is_empty ()))
-return;
+return false;
 
   if (dump_file)
 {
@@ -715,6 +716,7 @@ optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
   gsi_remove (&gsi2, true);
   release_defs (stmt);
 }
+  return true;
 }
 
 /* Look for floating-point divisions among DEF's uses, and try to
@@ -950,8 +952,9 @@ pass_cse_reciprocals::execute (function *fun)
 	  if (flag_unsafe_math_optimizations
 		  && is_gimple_assign (stmt)
 		  && !stmt_can_throw_internal (stmt)
-		  && gimple_assign_rhs_code (stmt) == RDIV_EXPR)
-		optimize_recip_sqrt (&gsi, def);
+		  && gimple_assign_rhs_code (stmt) == RDIV_EXPR
+		  && optimize_recip_sqrt (&gsi, def))
+		;
 	  else
 		execute_cse_reciprocals_1 (&gsi, def);
 	}


Re: [PATCH] Improve x % c1 == c2 expansion (PR middle-end/82853)

2018-09-12 Thread Richard Biener
On Wed, 5 Sep 2018, Jakub Jelinek wrote:

> On Wed, Sep 05, 2018 at 02:42:36PM +0200, Richard Biener wrote:
> > IIRC you said we're already doing x % power-of-two == 0 optimized but the 
> > new
> > code isn't in that place?
> 
> For unsigned %, there is no need for anything special, we just
> expand that as x % (power-of-two - 1) == 0, as any other % power-of-two.
> 
> For signed x % power-of-two == N, Kyrill posted a patch 3 years ago
> and it could be handled in the same spot as this too.  x % power-of-two == 0
> can be expanded as (unsigned) x % (unsigned) power-of-two == 0, for
> x % power-of-two == N where N is in [1 .. power-of-two - 1] we can
> expand it (if cheaper) as x & (msb | (power-of-two - 1)) == N and
> x % power-of-two == N where N is in [-power-of-two + 1, -1] we could
> do x & (msb | (power-of-two - 1)) == (N & (msb | (power-of-two - 1))).
> 
> > Specifically you're looking at immediate
> > uses during RTL expansion which makes me a bit nervous.  The code
> 
> What's wrong about it?  We don't kill the gimple statements until expansion
> is done.  Plus it is just a heuristic, not affecting emitted code
> functionality, it just picks whether we emit it one way or another
> (and only checks uses in the current bb); cfgexpand.c also uses
> FOR_EACH_IMM_USE_FAST in multiple spots.

OK, it just looked misplaced...  but I see we don't kill gimple
stmts while expanding so stuff should be still valid.

> > looks like it performs a GIMPLE transform so I think it should
> > be implemented as such (and thus also not need TER restrictions).
> 
> It isn't really a GIMPLE transform, just needs to provide the callers
> with a tree.  Specifically, if it passes several initial checks, it
> expands the X expression unconditionally and then just uses the result of
> that.  What I could and probably should do is to just emit the cheaper
> of the two sequences and make_tree the result thereof, so the caller
> would emit only the actual comparison, rather than the modulo or its
> replacement again too as the patch does.
> 
> > Our current GIMPLE kitchen-sink for pre-expand insn recog is
> > pass_optimize_widening_mul.
> 
> We don't do anything close to this there though, use the expander to expand
> quite complex tree expressions into RTL.  I'm afraid the expander isn't even
> sufficiently initialized there, we'd need to rewrite the expressions into
> trees using magic VAR_DECLs that would map to some virtual pseudos, etc.
> The only similar case to what we'd need is in ivopts, and that has quite
> some code to make that work.

:/
 
> > I'm not sure where regular jump RTL expansion happens but I
> > think I remeber some TER pattern stuff there as well.
> 
> expand_gimple_cond has:
> We're sometimes presented with such code:
>D.123_1 = x < y;
>if (D.123_1 != 0)
> ... TER stuff in it, and this patch is done next to it (with do_store_flag
> being another spot).  Sticking the optimization later would be much harder
> (I mean e.g. into jumpif*_*.

OK, I see.  I was really hoping it would turn out like synth_mult
or friends but given it's a more complex operation all we could
do on GIMPLE would be recognizing the pattern to end up with

 _1 = __IFN_MOD_CMP (x, c1, c2);
 if (_1 != 0)
   ...

where we then end up with the issue of efficiently combining the
IFN expansion and the CC user of the result...  Which would mean
we'd have to handle the above in jumpif*, TERing in the __IFN
(I think TER might not even handle that right now).

Not pretty either.

So I guess your patch is OK (with the small followup you posted).

Thanks,
Richard.


Re: [PATCH][AAarch64][v3] Add support for TARGET_COMPUTE_FRAME_LAYOUT

2018-09-12 Thread Vlad Lazar

On 11/09/18 17:53, James Greenhalgh wrote:

On Mon, Aug 06, 2018 at 11:14:17AM -0500, Vlad Lazar wrote:

Hi,

The patch adds support for the TARGET_COMPUTE_FRAME_LAYOUT hook on AArch64
and removes unneeded frame layout recalculation.

The removed aarch64_layout_frame calls are unnecessary because the functions in 
which
they appear will be called during or after the reload pass in which the 
TARGET_COMPUTE_FRAME_LAYOUT
hook is called. The if statement in aarch64_layout_frame had the purpose of 
avoiding
the extra work from the calls which have been removed and is now redundant.


I'm not sure I understand, I may be missing something as the frame layout
is complex, but I can't get where I need to be to accept your patch from this
comment.

The check you removed ensures that if we're after reload, and the frame is
laid out, we do no additional work. That part I understand, and that would
mean that any post-reload calls were no-ops. Is the argument that all
users of this code that you eliminate are after reload, and consequently
would have hit this no-op path? Can you talk me through why each case is
safe?


Thanks for taking a look at the patch.

Indeed, all the removed calls are happening during or after reload. I'll go 
trough all of them
and try to explain the rationale behind.

aarch64_expand_prologue and aarch64_expand_epilogue are called after the 
pro_and_epilogue pass,
which runs after reload where TARGET_COMPUTE_FRAME_LAYOUT is called.

aarch64_use_return_insn_p checks explicitly for reload_completed at the 
beginning of the function
and returns false if reload has not run. So it's safe to remove the call as the 
frame layout is
computed by the time it reaches that point.

aarch64_get_separate_components implements the 
TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS hook.
This hook only seems to be used int shrink_wrap.c:try_shrink_wrapping_separate. 
The actual origin
of this hook call can be traced back to the pro_and_epilogue pass:
shrink_wrap.c:try_shrink_wrapping_separate <-
function.c:thread_prologue_and_epilogue_insns <-
function.c:rest_of_handle_thread_prologue_and_epilogue (pro_and_epilogue pass 
entry point).
Therefore, aarch64_get_separate_components only gets called post reload.

aarch64_get_separate_components implements the INITIAL_ELIMINATION_OFFSET hook, 
which is used in:
1. rtlanal.c:get_initial_register_offset: Before using the hook it 
checks that reload has
been completed.
2. reload1.c:get_initial_register_offset and 
reload1.c:set_initial_elim_offsets: These functions
explicitly call TARGET_COMPUTE_FRAME_LAYOUT before using the hook.
3. lra-eliminitations.c:update_reg_eliminate: The 
TARGET_COMPUTE_FRAME_LAYOUT is, again, called
before the INITIAL_ELIMINATION_OFFSET hook is used.

I hope this helps make things a bit clearer.

Thanks,
Vlad


Thanks,
James


gcc/
2018-08-06  Vlad Lazar  

* config/aarch64/aarch64.h (TARGET_COMPUTE_FRAME_LAYOUT): Define.
* config/aarch64/aarch64.c (aarch64_expand_prologue): Remove 
aarch64_layout_frame call.
(aarch64_expand_epilogue): Likewise.
(aarch64_initial_elimination_offset): Likewise.
(aarch64_get_separate_components): Likewise.
(aarch64_use_return_insn_p): Likewise.
(aarch64_layout_frame): Remove unneeded check.




Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat

2018-09-12 Thread Segher Boessenkool
Hi!

Sorry this is all taking so long.

On Mon, Aug 20, 2018 at 04:44:30PM -0500, Will Schmidt wrote:
> 2018-08-20  Will Schmidt  
> 
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> early gimple folding of vec_splat().

"early" should indent together with the "*" above, not "config".

>   * tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>   * gimple-fold.h:  Add an extern define for tree_vec_extract().

One space after : .

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15768,10 +15768,52 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> *gsi)
>tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>g = gimple_build_assign (lhs, splat_tree);
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> + }

Something is wrong with the indent here (in the original code already).

> + arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> + arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> + /* Only fold the vec_splat_*() if arg1 is both a constant value, and a 
> valid
> +  index into the arg0 vector.  */

First comment line wraps, please break it a little earlier?

> + unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> + if (TREE_CODE (arg1) != INTEGER_CST
> + || TREE_INT_CST_LOW (arg1) > (n_elts -1))
> +   return false;

I think you should check lower bound 0 as well.  Maybe use IN_RANGE?

The rest looks fine, thanks!  Okay for trunk (with the trivialities fixed).


Segher


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-09-12 Thread Andrew Stubbs

On 05/09/18 15:22, Joseph Myers wrote:

In cases like this with alternative diagnostic messages using ?:, you need
to mark up each message with G_() so they both get extracted for
translation by exgettext.


[...]


This concatenation with a macro won't work with exgettext extracting
messages for translation.


[...]


Use %qs (presuming this code is using the generic diagnostic machinery
that supports it).

+gcn-run$(exeext): gcn-run.o
+   +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $< -ldl

I'd expect this to fail on non-Unix configurations that don't have -ldl,
and thus to need appropriate conditionals / configure tests to avoid that
build failure.


The attached diff from the previous patch should address these issues, I 
hope. If they're OK I'll incorporate the changes into the next version 
of the (much) larger patch when I next post them.



A new port should add an appropriate entry to contrib/config-list.mk.


I'm still testing this.

Andrew
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d28bee5..3d7aa43 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1387,7 +1387,13 @@ amdgcn-*-amdhsa)
 	extra_modes=gcn/gcn-modes.def
 	extra_objs="${extra_objs} gcn-tree.o"
 	extra_gcc_objs="driver-gcn.o"
-	extra_programs="${extra_programs} gcn-run\$(exeext)"
+	case "$host" in
+	x86_64*-*-linux-gnu )
+		if test "$ac_res" != no; then
+			extra_programs="${extra_programs} gcn-run\$(exeext)"
+		fi
+		;;
+	esac
 	if test x$enable_as_accelerator = xyes; then
 		extra_programs="${extra_programs} mkoffload\$(exeext)"
 		tm_file="${tm_file} gcn/offload.h"
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index ce03d5b..67cf907 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -48,6 +48,7 @@
 #include "print-rtl.h"
 #include "attribs.h"
 #include "varasm.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4614,8 +4615,9 @@ gcn_goacc_validate_dims (tree decl, int dims[], int fn_level)
 	warning_at (decl ? DECL_SOURCE_LOCATION (decl) : UNKNOWN_LOCATION,
 		OPT_Wopenacc_dims,
 		(dims[GOMP_DIM_VECTOR]
-		 ? "using vector_length (64), ignoring %d"
-		 : "using vector_length (64), ignoring runtime setting"),
+		 ? G_("using vector_length (64), ignoring %d")
+		 : G_("using vector_length (64), "
+			  "ignoring runtime setting")),
 		dims[GOMP_DIM_VECTOR]);
   dims[GOMP_DIM_VECTOR] = 1;
   changed = true;
diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 57e0f25..d3b5b96 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -489,7 +489,7 @@ main (int argc, char **argv)
 
   char *collect_gcc = getenv ("COLLECT_GCC");
   if (collect_gcc == NULL)
-fatal_error (input_location, "COLLECT_GCC must be set.");
+fatal_error (input_location, "COLLECT_GCC must be set");
   const char *gcc_path = dirname (ASTRDUP (collect_gcc));
   const char *gcc_exec = basename (ASTRDUP (collect_gcc));
 
@@ -555,7 +555,7 @@ main (int argc, char **argv)
 	offload_abi = OFFLOAD_ABI_ILP32;
 	  else
 	fatal_error (input_location,
-			 "unrecognizable argument of option " STR);
+			 "unrecognizable argument of option %s", argv[i]);
 	}
 #undef STR
   else if (strcmp (argv[i], "-fopenmp") == 0)
@@ -663,11 +663,11 @@ main (int argc, char **argv)
 
   out = fopen (gcn_s2_name, "w");
   if (!out)
-fatal_error (input_location, "cannot open '%s'", gcn_s2_name);
+fatal_error (input_location, "cannot open %qs", gcn_s2_name);
 
   cfile = fopen (gcn_cfile_name, "w");
   if (!cfile)
-fatal_error (input_location, "cannot open '%s'", gcn_cfile_name);
+fatal_error (input_location, "cannot open %qs", gcn_cfile_name);
 
   process_asm (in, out, cfile);
 
diff --git a/gcc/configure b/gcc/configure
index b7a8e36..4123c2a 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -746,6 +746,7 @@ manext
 LIBICONV_DEP
 LTLIBICONV
 LIBICONV
+DL_LIB
 LDEXP_LIB
 EXTRA_GCC_LIBS
 GNAT_LIBEXC
@@ -9643,6 +9644,69 @@ LDEXP_LIB="$LIBS"
 LIBS="$save_LIBS"
 
 
+# Some systems need dlopen
+save_LIBS="$LIBS"
+LIBS=
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing dlopen" >&5
+$as_echo_n "checking for library containing dlopen... " >&6; }
+if test "${ac_cv_search_dlopen+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char dlopen ();
+int
+main ()
+{
+return dlopen ();
+  ;
+  return 0;
+}
+_ACEOF
+for ac_lib in '' dl; do
+  if test -z "$ac_lib"; then
+ac_res="none required"
+  else
+ac_res=-l$ac_lib
+LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
+  fi
+  if ac_fn_cxx_try_link "$LINENO"; then :
+  ac_cv_search_dlopen=$ac_res
+fi
+rm -f

Re: [PATCH, RFC, rs6000, v3] enable early gimple-folding of vec_splat

2018-09-12 Thread Bill Schmidt
On 9/12/18 8:23 AM, Segher Boessenkool wrote:
> Hi!
>
> Sorry this is all taking so long.
>
> On Mon, Aug 20, 2018 at 04:44:30PM -0500, Will Schmidt wrote:
>> 2018-08-20  Will Schmidt  
>> 
>>  * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
>>early gimple folding of vec_splat().
> "early" should indent together with the "*" above, not "config".
>
>>  * tree-vect-generic.c: Remove static from tree_vec_extract() definition.
>>  * gimple-fold.h:  Add an extern define for tree_vec_extract().
> One space after : .
>
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -15768,10 +15768,52 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
>> *gsi)
>>   tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>>   g = gimple_build_assign (lhs, splat_tree);
>>   gimple_set_location (g, gimple_location (stmt));
>>   gsi_replace (gsi, g, true);
>>   return true;
>> +}
> Something is wrong with the indent here (in the original code already).
>
>> +arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
>> +arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
>> +/* Only fold the vec_splat_*() if arg1 is both a constant value, and a 
>> valid
>> + index into the arg0 vector.  */
> First comment line wraps, please break it a little earlier?
>
>> +unsigned int n_elts = VECTOR_CST_NELTS (arg0);
>> +if (TREE_CODE (arg1) != INTEGER_CST
>> +|| TREE_INT_CST_LOW (arg1) > (n_elts -1))
>> +  return false;
> I think you should check lower bound 0 as well.  Maybe use IN_RANGE?

And "-1" should be "- 1".  Thanks!

Bill

>
> The rest looks fine, thanks!  Okay for trunk (with the trivialities fixed).
>
>
> Segher
>



Re: [PATCH][AAarch64][v3] Add support for TARGET_COMPUTE_FRAME_LAYOUT

2018-09-12 Thread James Greenhalgh
On Wed, Sep 12, 2018 at 08:07:41AM -0500, Vlad Lazar wrote:
> On 11/09/18 17:53, James Greenhalgh wrote:
> > On Mon, Aug 06, 2018 at 11:14:17AM -0500, Vlad Lazar wrote:
> >> Hi,
> >>
> >> The patch adds support for the TARGET_COMPUTE_FRAME_LAYOUT hook on AArch64
> >> and removes unneeded frame layout recalculation.
> >>
> >> The removed aarch64_layout_frame calls are unnecessary because the 
> >> functions in which
> >> they appear will be called during or after the reload pass in which the 
> >> TARGET_COMPUTE_FRAME_LAYOUT
> >> hook is called. The if statement in aarch64_layout_frame had the purpose 
> >> of avoiding
> >> the extra work from the calls which have been removed and is now redundant.
> >
> > I'm not sure I understand, I may be missing something as the frame layout
> > is complex, but I can't get where I need to be to accept your patch from 
> > this
> > comment.
> >
> > The check you removed ensures that if we're after reload, and the frame is
> > laid out, we do no additional work. That part I understand, and that would
> > mean that any post-reload calls were no-ops. Is the argument that all
> > users of this code that you eliminate are after reload, and consequently
> > would have hit this no-op path? Can you talk me through why each case is
> > safe?
> >
> Thanks for taking a look at the patch.
> 
> Indeed, all the removed calls are happening during or after reload. I'll go 
> trough all of them
> and try to explain the rationale behind.
> 
> aarch64_expand_prologue and aarch64_expand_epilogue are called after the 
> pro_and_epilogue pass,
> which runs after reload where TARGET_COMPUTE_FRAME_LAYOUT is called.
> 
> aarch64_use_return_insn_p checks explicitly for reload_completed at the 
> beginning of the function
> and returns false if reload has not run. So it's safe to remove the call as 
> the frame layout is
> computed by the time it reaches that point.
> 
> aarch64_get_separate_components implements the 
> TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS hook.
> This hook only seems to be used int 
> shrink_wrap.c:try_shrink_wrapping_separate. The actual origin
> of this hook call can be traced back to the pro_and_epilogue pass:
> shrink_wrap.c:try_shrink_wrapping_separate <-
> function.c:thread_prologue_and_epilogue_insns <-
> function.c:rest_of_handle_thread_prologue_and_epilogue (pro_and_epilogue pass 
> entry point).
> Therefore, aarch64_get_separate_components only gets called post reload.
> 
> aarch64_get_separate_components implements the INITIAL_ELIMINATION_OFFSET 
> hook, which is used in:
>   1. rtlanal.c:get_initial_register_offset: Before using the hook it 
> checks that reload has
>   been completed.
>   2. reload1.c:get_initial_register_offset and 
> reload1.c:set_initial_elim_offsets: These functions
>   explicitly call TARGET_COMPUTE_FRAME_LAYOUT before using the hook.
>   3. lra-eliminitations.c:update_reg_eliminate: The 
> TARGET_COMPUTE_FRAME_LAYOUT is, again, called
>   before the INITIAL_ELIMINATION_OFFSET hook is used.
> 
> I hope this helps make things a bit clearer.

Thanks for this, it is very helpful. The patch is OK for trunk.

James

> >> gcc/
> >> 2018-08-06  Vlad Lazar  
> >>
> >>* config/aarch64/aarch64.h (TARGET_COMPUTE_FRAME_LAYOUT): Define.
> >>* config/aarch64/aarch64.c (aarch64_expand_prologue): Remove 
> >> aarch64_layout_frame call.
> >>(aarch64_expand_epilogue): Likewise.
> >>(aarch64_initial_elimination_offset): Likewise.
> >>(aarch64_get_separate_components): Likewise.
> >>(aarch64_use_return_insn_p): Likewise.
> >>(aarch64_layout_frame): Remove unneeded check.
> 


Re: [PATCH 18/25] Fix interleaving of Fortran stop messages

2018-09-12 Thread Andrew Stubbs

On 05/09/18 19:11, Janne Blomqvist wrote:

Ok, thanks.


Committed, thanks.

Apologies again for missing the Fortran list.

Andrew


Re: [PATCH 17/25] Fix Fortran STOP.

2018-09-12 Thread Andrew Stubbs

On 05/09/18 19:09, Janne Blomqvist wrote:

Ok, thanks.


Committed, thanks.

Andrew



[PATCH] Fix PR87280

2018-09-12 Thread Richard Biener


The following fixes PR87280 (and PR87169 in a different way).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-09-12  Richard Biener  

PR tree-optimization/87280
* tree-ssa-sccvn.c (process_bb): Handle the case of executable
edge but unreachable target.
(do_rpo_vn): For conservatively handling a PHI only mark
the backedge executable but not the block reachable.

* gcc.dg/torture/pr87280.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 264234)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -5976,15 +5976,24 @@ process_bb (rpo_elim &avail, basic_block
 {
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- if (e->flags & EDGE_EXECUTABLE)
-   continue;
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file,
-"marking outgoing edge %d -> %d executable\n",
-e->src->index, e->dest->index);
- gcc_checking_assert (iterate || !(e->flags & EDGE_DFS_BACK));
- e->flags |= EDGE_EXECUTABLE;
- e->dest->flags |= BB_EXECUTABLE;
+ if (!(e->flags & EDGE_EXECUTABLE))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"marking outgoing edge %d -> %d executable\n",
+e->src->index, e->dest->index);
+ gcc_checking_assert (iterate || !(e->flags & EDGE_DFS_BACK));
+ e->flags |= EDGE_EXECUTABLE;
+ e->dest->flags |= BB_EXECUTABLE;
+   }
+ else if (!(e->dest->flags & BB_EXECUTABLE))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"marking destination block %d reachable\n",
+e->dest->index);
+ e->dest->flags |= BB_EXECUTABLE;
+   }
}
 }
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
@@ -6124,20 +6133,37 @@ process_bb (rpo_elim &avail, basic_block
  e->flags |= EDGE_EXECUTABLE;
  e->dest->flags |= BB_EXECUTABLE;
}
+ else if (!(e->dest->flags & BB_EXECUTABLE))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"marking destination block %d reachable\n",
+e->dest->index);
+ e->dest->flags |= BB_EXECUTABLE;
+   }
}
   else if (gsi_one_before_end_p (gsi))
{
  FOR_EACH_EDGE (e, ei, bb->succs)
{
- if (e->flags & EDGE_EXECUTABLE)
-   continue;
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file,
-"marking outgoing edge %d -> %d executable\n",
-e->src->index, e->dest->index);
- gcc_checking_assert (iterate || !(e->flags & EDGE_DFS_BACK));
- e->flags |= EDGE_EXECUTABLE;
- e->dest->flags |= BB_EXECUTABLE;
+ if (!(e->flags & EDGE_EXECUTABLE))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"marking outgoing edge %d -> %d executable\n",
+e->src->index, e->dest->index);
+ gcc_checking_assert (iterate || !(e->flags & EDGE_DFS_BACK));
+ e->flags |= EDGE_EXECUTABLE;
+ e->dest->flags |= BB_EXECUTABLE;
+   }
+ else if (!(e->dest->flags & BB_EXECUTABLE))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"marking destination block %d reachable\n",
+e->dest->index);
+ e->dest->flags |= BB_EXECUTABLE;
+   }
}
}
 
@@ -6399,7 +6425,6 @@ do_rpo_vn (function *fn, edge entry, bit
if (e->flags & EDGE_DFS_BACK)
  {
e->flags |= EDGE_EXECUTABLE;
-   e->dest->flags |= BB_EXECUTABLE;
/* There can be a non-latch backedge into the header
   which is part of an outer irreducible region.  We
   cannot avoid iterating this block then.  */
Index: gcc/testsuite/gcc.dg/torture/pr87280.c
===
--- gcc/testsuite/gcc.dg/torture/pr87280.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87280.c  (working copy)
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-additional-options "--param rpo-vn-max-loop-depth=5" } */
+
+int uc;
+
+void
+j8 (int *xv, int f3)
+{
+  uc = 0;
+  while (uc < 1)
+{
+}
+
+  if (*xv == 0)
+{
+  int *

Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Qing Zhao


> On Sep 12, 2018, at 2:46 AM, Alexander Monakov  wrote:
> 
> On Tue, 11 Sep 2018, Qing Zhao wrote:
>> no immediate use case for -finline-visibility=extern right now. But I guess
>> that GCC developers might use this option to control inlining scope for
>> debugging or development purpose in the future.
> 
> In that case I'd recommend to simplify the patch by implementing only the
> part to suppress inlining of non-static functions. No need to overengineer.

the current design is trying to make the functionality more general although 
the part of disable inlining of
static functions does not have immediate use case. 

the additional implementation for the disabling inlining of static functions is 
very trivial and minimal. 

Personally I’d like to keep the current design.

However, I am flexible on this.

> 
>> thanks for the suggestion, how about -finline-linkage=[all|extern|static]?
> 
> I would suggest -finline-only-static. Note however that I do not have the
> authority to review this: you still need an opinion from an appropriate
> maintainer.

understand, thanks a lot for your comments and suggestions. 

thanks.

Qing
> 
> Alexander



Re: Do not stream TYPE_VALUES to ltrans units

2018-09-12 Thread Richard Biener
On Mon, 10 Sep 2018, Jan Hubicka wrote:

> Hi,
> TYPE_VALUES are currently only used to output warnings on ODR mismatched 
> enums.
> I think those warnings are useful and thus we want to stream them to WPA, but
> there is no need to stream them further to ltrans units.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Richard.

>   * tree-streamer-out.c (write_ts_type_non_common_tree_pointers):
>   Do not stream TYPE_VALUES to ltrans units.
>   * lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise.
> Index: tree-streamer-out.c
> ===
> --- tree-streamer-out.c   (revision 264180)
> +++ tree-streamer-out.c   (working copy)
> @@ -700,7 +700,9 @@ write_ts_type_non_common_tree_pointers (
>   bool ref_p)
>  {
>if (TREE_CODE (expr) == ENUMERAL_TYPE)
> -stream_write_tree (ob, TYPE_VALUES (expr), ref_p);
> +/* At WPA time we do not need to stream type values; those are only 
> needed
> +   to output ODR warnings.  */
> +stream_write_tree (ob, flag_wpa ? NULL : TYPE_VALUES (expr), ref_p);
>else if (TREE_CODE (expr) == ARRAY_TYPE)
>  stream_write_tree (ob, TYPE_DOMAIN (expr), ref_p);
>else if (RECORD_OR_UNION_TYPE_P (expr))
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 264180)
> +++ lto-streamer-out.c(working copy)
> @@ -864,7 +992,9 @@ DFS::DFS_write_tree_body (struct output_
>  
>if (CODE_CONTAINS_STRUCT (code, TS_TYPE_NON_COMMON))
>  {
> -  if (TREE_CODE (expr) == ENUMERAL_TYPE)
> +  /* At WPA time we do not need to stream type values; those are only 
> needed
> + to output ODR warnings.  */
> +  if (TREE_CODE (expr) == ENUMERAL_TYPE && !flag_wpa)
>   DFS_follow_tree_edge (TYPE_VALUES (expr));
>else if (TREE_CODE (expr) == ARRAY_TYPE)
>   DFS_follow_tree_edge (TYPE_DOMAIN (expr));
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Richard Biener
On Wed, 12 Sep 2018, Qing Zhao wrote:

> 
> > On Sep 12, 2018, at 2:46 AM, Alexander Monakov  wrote:
> > 
> > On Tue, 11 Sep 2018, Qing Zhao wrote:
> >> no immediate use case for -finline-visibility=extern right now. But I guess
> >> that GCC developers might use this option to control inlining scope for
> >> debugging or development purpose in the future.
> > 
> > In that case I'd recommend to simplify the patch by implementing only the
> > part to suppress inlining of non-static functions. No need to overengineer.
> 
> the current design is trying to make the functionality more general although 
> the part of disable inlining of
> static functions does not have immediate use case. 
> 
> the additional implementation for the disabling inlining of static functions 
> is very trivial and minimal. 
> 
> Personally I’d like to keep the current design.
> 
> However, I am flexible on this.
> 
> > 
> >> thanks for the suggestion, how about -finline-linkage=[all|extern|static]?
> > 
> > I would suggest -finline-only-static. Note however that I do not have the
> > authority to review this: you still need an opinion from an appropriate
> > maintainer.
> 
> understand, thanks a lot for your comments and suggestions. 

With LTO "static" is too blurry - would -finline-only-hidden be OK
with you?

> thanks.
> 
> Qing
> > 
> > Alexander
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[PATCH] Use steady_clock to implement condition_variable::wait_for with predicate

2018-09-12 Thread Mike Crowe
In r263225 (d2e378182a12d68fe5caeffae681252662a2fe7b), I fixed
condition_variable::wait_for to use std::chrono::steady_clock for the wait.
Unfortunately, I failed to spot that the same fix is required for the
wait_for variant that takes a predicate too.

Reported-by: Tom Wood 
---
 libstdc++-v3/include/std/condition_variable | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index 1f84ea324eb..24b96e7c295 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -158,11 +158,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const chrono::duration<_Rep, _Period>& __rtime,
   _Predicate __p)
   {
-   using __dur = typename __clock_t::duration;
+   using __dur = typename __steady_clock_t::duration;
auto __reltime = chrono::duration_cast<__dur>(__rtime);
if (__reltime < __rtime)
  ++__reltime;
-   return wait_until(__lock, __clock_t::now() + __reltime, std::move(__p));
+   return wait_until(__lock, __steady_clock_t::now() + __reltime, 
std::move(__p));
   }
 
 native_handle_type
-- 
2.11.0



Re: [PATCH 10/25] Convert BImode vectors.

2018-09-12 Thread Richard Biener
On Tue, Sep 11, 2018 at 4:36 PM Andrew Stubbs  wrote:
>
> On 05/09/18 13:43, Richard Biener wrote:
> > No.  You might want to look into the x86 backend if there's maybe more 
> > tweaks
> > needed when using non-vector mask modes.
>
> I tracked it down to the vector alignment configuration.
>
> Apparently the vectorizer likes to build a "truth" vector, but is
> perfectly happy to put it in a non-vector mode. Unfortunately that
> causes TARGET_VECTOR_ALIGNMENT to be called with the non-vector mode,
> which wasn't handled correctly.
>
> I'm testing to see what happens with the reg_equal and reg_equiv
> conversions, but we might be able to drop this patch.

That's good news!

> Andrew


Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-12 Thread Richard Biener
On Wed, Sep 12, 2018 at 12:56 AM Jeff Law  wrote:
>
> On 9/5/18 5:48 AM, a...@codesourcery.com wrote:
> >
> > The HSA GPU drivers can't cope with binaries that have the same symbol 
> > defined
> > multiple times, even though the names are not exported.  This happens 
> > whenever
> > there are file-scope static variables with matching names.  I believe it's 
> > also
> > an issue with switch tables.
> >
> > This is a bug, but outside our control, so we must work around it when 
> > multiple
> > translation units have the same symbol defined.
> >
> > Therefore, we've implemented name mangling via
> > TARGET_MANGLE_DECL_ASSEMBLER_NAME, but found some places where the 
> > middle-end
> > assumes that the decl name matches the name in the source.
> >
> > This patch fixes up those cases by falling back to comparing the unmangled
> > name, when a lookup fails.
> >
> > 2018-09-05  Julian Brown  
> >
> >   gcc/
> >   * cgraphunit.c (handle_alias_pairs): Scan for aliases by DECL_NAME if
> >   decl assembler name doesn't match.
> >
> >   gcc/c-family/
> >   * c-pragma.c (maye_apply_pending_pragma_weaks): Scan for aliases with
> >   DECL_NAME if decl assembler name doesn't match.
> This should be fine.  But please verify there's no regressions on the
> x86_64 linux target, particularly for the multi-versioning tests  (mv*.c
> mv*.C

Err - the patch clearly introduces quadraticness into a path which
isn't acceptable.
get_for_asmname works through a hashtable.

It also looks like !target can readily happen so I wonder what happens if an
assembler name does not match but a DECL_NAME one does by accident?

I fear you have to fix this one in a different way... (and I hope
Honza agrees with me).

Thanks,
Richard.


> Jeff


Re: [PATCH] Introduce libgcov.so run-time library (PR gcov-profile/84107).

2018-09-12 Thread Richard Biener
On Wed, Sep 12, 2018 at 11:27 AM Martin Liška  wrote:
>
> On 9/11/18 5:08 PM, Alexander Monakov wrote:
> > On Tue, 11 Sep 2018, Martin Liška wrote:
> >> I've discussed the topic with Alexander on the Cauldron and we hoped
> >> that the issue with unique __gcov_root can be handled with DECL_COMMON in 
> >> each DSO.
> >> Apparently this approach does not work as all DSOs in an executable have 
> >> eventually
> >> just a single instance of the __gcov_root, which is bad.
> >
> > No, that happens only if they have default visibility.  Emitting them with
> > "hidden" visibility solves this.
>
> Ah, ok, thanks. So theoretically that way should work.
>
> >
> >> So that I returned back to catch the root of the failure. It shows that 
> >> even with
> >> -Wl,--dynamic-list-data __gcov_indirect_call_callee point to a different 
> >> variable
> >> among the DSOs. The reason is that the variable is TLS:
> >
> > (no, that the variable is TLS is not causing the bug; the issue is libraries
> > carry public definitions of just one or both variables depending on if they
> > have indirect calls or not, and the library state becomes "diverged" when
> > one variable gets interposed while the other does not)
>
> I see, I'm attaching patch that does that. I can confirm your test-case works
> fine w/o -Wl,--dynamic-list-data.
>
> I'm wondering if it will work as well with dlopen/dlsym machinery? Or now
> the linker flag will be needed?
>
> >
> >> That said I would like to go this direction. I would be happy to receive
> >> a feedback. Then I'll test the patch.
> >
> > Hm, this TLS change is attacking the issue from a wrong direction.  What I
> > suggested on the Cauldron as a simple and backportable way of solving this
> > is to consolidate the two TLS variables into one TLS struct, which is then
> > either interposed or not as a whole.
>
> Got it, current I prefer the solution.

Sounds good.  I notice the code had this before but...

+  TREE_PUBLIC (ic_tuple_var) = 1;
+  DECL_EXTERNAL (ic_tuple_var) = 1;
+  TREE_STATIC (ic_tuple_var) = 1;

that's one flag too much?!  I suggest to drop DECL_EXTERNAL.

Richard.

> Martin
>
> >
> > Alexander
> >
>


Re: [PATCH][tree-ssa-mathopts] PR tree-optimization/87259: Try execute_cse_reciprocals_1 when optimize_recip_sqrt fails

2018-09-12 Thread Richard Biener
On Wed, Sep 12, 2018 at 2:56 PM Kyrill Tkachov
 wrote:
>
> Hi all,
>
> This PR is an SSA validation error where the definition ends up not 
> dominating a use.
> This appeared after my recip_sqrt optimisation. Though the optimisation 
> doesn't get triggered
> as there is no sqrt involved it prevents execute_cse_reciprocals_1 from 
> running as it's currently
> an either/or situation. I think execute_cse_reciprocals_1 doesn't like it 
> when some divisions get
> execute_cse_reciprocals_1 called on them and some don't.
>
> This patch changes optimize_recip_sqrt to return a boolean indicating success 
> and call execute_cse_reciprocals_1
> if no transform was made so that execute_cse_reciprocals_1 has a chance to 
> optimise it as well.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> Ok for trunk?

OK.  But your analysis suggests to do it the other way around - call
execute_cse_reciprocals_1 and if that
leaves a division at *gsi process it with optimize_recip_sqrt?

Richard.

> Thanks,
> Kyrill
>
> 2018-09-12  Kyrylo Tkachov  
>
>  PR tree-optimization/87259
>  PR lto/87283
>  * tree-ssa-math-opts.c (optimize_recip_sqrt): Make function return
>  boolean.
>  (pass_cse_reciprocals::execute): Try execute_cse_reciprocals_1 if
>  optimize_recip_sqrt was ineffective.
>
> 2018-09-12  Kyrylo Tkachov  
>
>  PR tree-optimization/87259
>  * gcc.dg/pr87259.c: New test.


Re: [PATCH] Introduce libgcov.so run-time library (PR gcov-profile/84107).

2018-09-12 Thread Alexander Monakov
On Wed, 12 Sep 2018, Martin Liška wrote:
> I see, I'm attaching patch that does that. I can confirm your test-case works
> fine w/o -Wl,--dynamic-list-data.
> 
> I'm wondering if it will work as well with dlopen/dlsym machinery? Or now
> the linker flag will be needed?

No, for this issue dlopen doesn't pose extra complications as far as I know.

> > Hm, this TLS change is attacking the issue from a wrong direction.  What I
> > suggested on the Cauldron as a simple and backportable way of solving this
> > is to consolidate the two TLS variables into one TLS struct, which is then
> > either interposed or not as a whole.
> 
> Got it, current I prefer the solution.

Ack, I think this is a good way to solve the specific issue in the PR.

I'd like to remind that the point of the PR was to draw attention to larger
design issues in libgcov.

There's no decision on the topic of gcov.so. At the Cauldron I had the chance
to talk to you and Richi separately, but we didn't meet together.  Would it
help if I started a new thread summarizing the current status from my point of
view?

Alexander

Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-12 Thread Jeff Law
On 9/12/18 8:42 AM, Richard Biener wrote:
> On Wed, Sep 12, 2018 at 12:56 AM Jeff Law  wrote:
>>
>> On 9/5/18 5:48 AM, a...@codesourcery.com wrote:
>>>
>>> The HSA GPU drivers can't cope with binaries that have the same symbol 
>>> defined
>>> multiple times, even though the names are not exported.  This happens 
>>> whenever
>>> there are file-scope static variables with matching names.  I believe it's 
>>> also
>>> an issue with switch tables.
>>>
>>> This is a bug, but outside our control, so we must work around it when 
>>> multiple
>>> translation units have the same symbol defined.
>>>
>>> Therefore, we've implemented name mangling via
>>> TARGET_MANGLE_DECL_ASSEMBLER_NAME, but found some places where the 
>>> middle-end
>>> assumes that the decl name matches the name in the source.
>>>
>>> This patch fixes up those cases by falling back to comparing the unmangled
>>> name, when a lookup fails.
>>>
>>> 2018-09-05  Julian Brown  
>>>
>>>   gcc/
>>>   * cgraphunit.c (handle_alias_pairs): Scan for aliases by DECL_NAME if
>>>   decl assembler name doesn't match.
>>>
>>>   gcc/c-family/
>>>   * c-pragma.c (maye_apply_pending_pragma_weaks): Scan for aliases with
>>>   DECL_NAME if decl assembler name doesn't match.
>> This should be fine.  But please verify there's no regressions on the
>> x86_64 linux target, particularly for the multi-versioning tests  (mv*.c
>> mv*.C
> 
> Err - the patch clearly introduces quadraticness into a path which
> isn't acceptable.
> get_for_asmname works through a hashtable.
But isn't this only being rused when we aren't able to find the symbol?
 My impression was that should be rare, except for the GCN target.

> 
> It also looks like !target can readily happen so I wonder what happens if an
> assembler name does not match but a DECL_NAME one does by accident?
> 
> I fear you have to fix this one in a different way... (and I hope
> Honza agrees with me).
Honza certainly knows the code better than I.  If  he thinks there's a
performance issue and this needs to be resolved a better way, then I'll
go along with that.

Jeff


[PATCH, i386]: Remove sqrt_extendxf3_i387 insn pattern

2018-09-12 Thread Uros Bizjak
Hello!

This pattern is not needed because all x87 float_extend RTXes
degenerate to a plain
move (or a no-op move).

2018-09-11  Uros Bizjak  

* config/i386/i386.md (sqrt_extendxf3_i387): Remove.
(sqrt2): Extend operand 1 to XFmode and generate sqrtxf3 insn.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 264242)
+++ config/i386/i386.md (working copy)
@@ -15126,19 +15126,6 @@
(set_attr "amdfam10_decode" "direct")
(set_attr "bdver1_decode" "direct")])
 
-(define_insn "sqrt_extendxf2_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-   (sqrt:XF
- (float_extend:XF
-   (match_operand:MODEF 1 "register_operand" "0"]
-  "TARGET_USE_FANCY_MATH_387"
-  "fsqrt"
-  [(set_attr "type" "fpspc")
-   (set_attr "mode" "XF")
-   (set_attr "athlon_decode" "direct")
-   (set_attr "amdfam10_decode" "direct")
-   (set_attr "bdver1_decode" "direct")])
-
 (define_insn "*rsqrtsf2_sse"
   [(set (match_operand:SF 0 "register_operand" "=x,x")
(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "x,m")]
@@ -15201,9 +15188,10 @@
   if (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH))
 {
   rtx op0 = gen_reg_rtx (XFmode);
-  rtx op1 = force_reg (mode, operands[1]);
+  rtx op1 = gen_reg_rtx (XFmode);
 
-  emit_insn (gen_sqrt_extendxf2_i387 (op0, op1));
+  emit_insn (gen_extendxf2 (op1, operands[1]));
+  emit_insn (gen_sqrtxf2 (op0, op1));
   emit_insn (gen_truncxf2_i387_noop_unspec (operands[0], op0));
   DONE;
}


Re: [PATCH] Introduce libgcov.so run-time library (PR gcov-profile/84107).

2018-09-12 Thread Richard Biener
On Wed, Sep 12, 2018 at 5:02 PM Alexander Monakov  wrote:
>
> On Wed, 12 Sep 2018, Martin Liška wrote:
> > I see, I'm attaching patch that does that. I can confirm your test-case 
> > works
> > fine w/o -Wl,--dynamic-list-data.
> >
> > I'm wondering if it will work as well with dlopen/dlsym machinery? Or now
> > the linker flag will be needed?
>
> No, for this issue dlopen doesn't pose extra complications as far as I know.
>
> > > Hm, this TLS change is attacking the issue from a wrong direction.  What I
> > > suggested on the Cauldron as a simple and backportable way of solving this
> > > is to consolidate the two TLS variables into one TLS struct, which is then
> > > either interposed or not as a whole.
> >
> > Got it, current I prefer the solution.
>
> Ack, I think this is a good way to solve the specific issue in the PR.
>
> I'd like to remind that the point of the PR was to draw attention to larger
> design issues in libgcov.
>
> There's no decision on the topic of gcov.so. At the Cauldron I had the chance
> to talk to you and Richi separately, but we didn't meet together.  Would it
> help if I started a new thread summarizing the current status from my point of
> view?

Definitely.

Richard.

> Alexander


Re: [PATCH 09/25] Elide repeated RTL elements.

2018-09-12 Thread Jeff Law
On 9/12/18 2:46 AM, Andrew Stubbs wrote:
> On 11/09/18 23:45, Jeff Law wrote:
>> Does this need a corresponding change to the RTL front-end so that it
>> can read the new form?
> 
> There's an RTL front-end? When did that happen... clearly I've not been
> paying attention.
Within the last couple years.  It's primarily for testing purposes so
that we can take RTL, feed it back into a pass and then look at the
results.  There's some tests in the testsuite you can poke at to see how
it works -- they're probably also a good starting point for a test that
we can properly reconstruct RTL with elided elements.


> 
> If it's expected that dumps can be fed back in unmodified then yes, it
> needs to recognise the new output.
I don't think it goes in strictly unmodified, but ISTM we need to be
able to handle this case.

Thanks,
Jeff


Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-12 Thread Richard Biener
On Wed, Sep 12, 2018 at 5:07 PM Jeff Law  wrote:
>
> On 9/12/18 8:42 AM, Richard Biener wrote:
> > On Wed, Sep 12, 2018 at 12:56 AM Jeff Law  wrote:
> >>
> >> On 9/5/18 5:48 AM, a...@codesourcery.com wrote:
> >>>
> >>> The HSA GPU drivers can't cope with binaries that have the same symbol 
> >>> defined
> >>> multiple times, even though the names are not exported.  This happens 
> >>> whenever
> >>> there are file-scope static variables with matching names.  I believe 
> >>> it's also
> >>> an issue with switch tables.
> >>>
> >>> This is a bug, but outside our control, so we must work around it when 
> >>> multiple
> >>> translation units have the same symbol defined.
> >>>
> >>> Therefore, we've implemented name mangling via
> >>> TARGET_MANGLE_DECL_ASSEMBLER_NAME, but found some places where the 
> >>> middle-end
> >>> assumes that the decl name matches the name in the source.
> >>>
> >>> This patch fixes up those cases by falling back to comparing the unmangled
> >>> name, when a lookup fails.
> >>>
> >>> 2018-09-05  Julian Brown  
> >>>
> >>>   gcc/
> >>>   * cgraphunit.c (handle_alias_pairs): Scan for aliases by DECL_NAME 
> >>> if
> >>>   decl assembler name doesn't match.
> >>>
> >>>   gcc/c-family/
> >>>   * c-pragma.c (maye_apply_pending_pragma_weaks): Scan for aliases 
> >>> with
> >>>   DECL_NAME if decl assembler name doesn't match.
> >> This should be fine.  But please verify there's no regressions on the
> >> x86_64 linux target, particularly for the multi-versioning tests  (mv*.c
> >> mv*.C
> >
> > Err - the patch clearly introduces quadraticness into a path which
> > isn't acceptable.
> > get_for_asmname works through a hashtable.
> But isn't this only being rused when we aren't able to find the symbol?

This case seems to happen though.

>  My impression was that should be rare, except for the GCN target.

Still even for the GCN target it looks like a hack given the linear search.

I think it is required to track down the "invalid" uses of DECL_NAME vs.
"mangled" name instead.

> >
> > It also looks like !target can readily happen so I wonder what happens if an
> > assembler name does not match but a DECL_NAME one does by accident?
> >
> > I fear you have to fix this one in a different way... (and I hope
> > Honza agrees with me).
> Honza certainly knows the code better than I.  If  he thinks there's a
> performance issue and this needs to be resolved a better way, then I'll
> go along with that.

I think the symptom GCN sees needs to be better understood - like wheter
it is generally OK to mangle things arbitrarily.

Note that TARGET_MANGLE_DECL_ASSEMBLER_NAME might not be
a general symbol mangling hook but may be restricted to symbols with
specific visibility.

Richard.

>
> Jeff


Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Jeff Law
On 9/11/18 9:12 AM, Qing Zhao wrote:
> Hi, 
> 
> This is a simple patch to add a new first-class option 
> 
> -finline-visibility={all|extern|static}
> 
> to finer control inlining based on function’s visibility. 
> 
> '-finline-visibility=[all|extern|static]'
>  By default, GCC inlines functions without considering their
>  visibility.  This flag allows finer control of inlining based on
>  their visibility.
> 
>  The value 'extern' tells the compiler to only inline functions with
>  external visibility.  The value 'static' tells the compiler to only
>  inline functions with static visibility.  The value 'all' tells the
>  compilers to inline functions without considering their visibility.
> 
>  The default value of '-finline-visibility' is 'all'.
> 
> The major purpose of this option is to provide a way for the user 
> to finer choose the inline candidates based on function’s visibility.
> For example, some online patching users might want to limit the inlining
> to only static functions to avoid patching the callers of global functions
> in order to control the memory consumption caused by online patching. 
> 
> let me know any comments and suggestions.
> 
> thanks.
> 
> Qing
> 
> gcc/ChangeLog:
> 
> +2018-09-11  Qing Zhao  
> +
> +   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
> +   (FUNCTION_STATIC): Likewise.
> +   * common.opt (-finline-visibility=): New option.
> +   * doc/invoke.texi: Document -finline-visibility=.
> +   * flag-types.h: Add a new enum inline_visibility.
> +   * ipa-inline.c (can_inline_edge_p): Control inlining based on
> +   function's visibility. 
> +
> 
> gcc/testsuite/ChangeLog:
> 
> +2018-09-11  Qing Zhao  
> +
> +   * gcc.dg/inline_visibility_1.c: New test.
> +   * gcc.dg/inline_visibility_2.c: New test.
Presumably one of the cases where this capability is really helpful is
things like ksplice.   If you have a function with global scope that has
been potentially inlined, then it's a lot harder track down those
inlining points at DTRT.

We ran into this internally when looking at hot patching some of the
spinlock code in the kernel.  It would have been real helpful if the
kernel had been compiled with this kind of option :-)

So conceptually I can see value in this kind of option.

Not a review, just wanted to chime in on potential use scenarios...

jeff


Re: [PATCH 04/25] SPECIAL_REGNO_P

2018-09-12 Thread Richard Henderson
On 09/05/2018 04:48 AM, a...@codesourcery.com wrote:
> @@ -1198,6 +1198,10 @@ see the files COPYING3 and COPYING.RUNTIME 
> respectively.  If not, see
>  #define NO_FUNCTION_CSE false
>  #endif
>  
> +#ifndef SPECIAL_REGNO_P
> +#define SPECIAL_REGNO_P(REGNO) false
> +#endif
> +
>  #ifndef HARD_REGNO_RENAME_OK
>  #define HARD_REGNO_RENAME_OK(FROM, TO) true
>  #endif
...

> @@ -320,6 +320,7 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
>  if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i)
>   || fixed_regs[new_reg + i]
>   || global_regs[new_reg + i]
> + || SPECIAL_REGNO_P (new_reg + i)
>   /* Can't use regs which aren't saved by the prologue.  */
>   || (! df_regs_ever_live_p (new_reg + i)
>   && ! call_used_regs[new_reg + i])

How is this different from HARD_REGNO_RENAME_OK via the TO argument?
Seems like the hook you're looking for already exists...


r~


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-09-12 Thread Joseph Myers
On Wed, 12 Sep 2018, Andrew Stubbs wrote:

> > I'd expect this to fail on non-Unix configurations that don't have -ldl,
> > and thus to need appropriate conditionals / configure tests to avoid that
> > build failure.
> 
> The attached diff from the previous patch should address these issues, I hope.
> If they're OK I'll incorporate the changes into the next version of the (much)
> larger patch when I next post them.

> + case "$host" in
> + x86_64*-*-linux-gnu )
> + if test "$ac_res" != no; then
> + extra_programs="${extra_programs} gcn-run\$(exeext)"
> + fi

ac_res is a generic autoconf variable used by a lot of tests.  I don't 
think it's at all safe to embed an assumption into the middle of 
config.gcc about which the last test run that sets such a variable is; you 
need a variable explicitly relating to whatever the relevant test is.

What if the host is x86_64 with the x32 ABI?  If the requirement is for 
various types to be the same between the host and GCN, I'd expect that x32 
ABI on the host means it is unsuitable for using gcn-run.  Or are the 
requirements for compatible types between some other two pieces, so that 
an x32 gcn-run is OK?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

2018-09-12 Thread Martin Sebor

PING: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html

There have been follow up comments in this thread suggesting
alternate designs for the nonstr attribute but (AFAICT) no
objections to the bug fix.  I don't expect to have the time
to redesign and reimplement the attribute for GCC 9 in terms
of the alias oracle as was suggested but I would like to avoid
the warning in the report.

Is the final patch okay to commit?

Martin

On 08/28/2018 06:12 PM, Martin Sebor wrote:

Sadly, dstbase is the PARM_DECL for d.  That's where things are going
"wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results of
get_addr_base_and_unit_offset in a case where they arguably should be
the same.


Probably get_attr_nonstring_decl has the same "mistake" and returns
the PARM_DECL instead of the SSA name pointer.  So we're comparing
apples and oranges here.


Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
intentional but the function need not (perhaps should not)
also set *REF to it.



Yeah:

/* If EXPR refers to a character array or pointer declared attribute
   nonstring return a decl for that array or pointer and set *REF to
   the referenced enclosing object or pointer.  Otherwise returns
   null.  */

tree
get_attr_nonstring_decl (tree expr, tree *ref)
{
  tree decl = expr;
  if (TREE_CODE (decl) == SSA_NAME)
{
  gimple *def = SSA_NAME_DEF_STMT (decl);

  if (is_gimple_assign (def))
{
  tree_code code = gimple_assign_rhs_code (def);
  if (code == ADDR_EXPR
  || code == COMPONENT_REF
  || code == VAR_DECL)
decl = gimple_assign_rhs1 (def);
}
  else if (tree var = SSA_NAME_VAR (decl))
decl = var;
}

  if (TREE_CODE (decl) == ADDR_EXPR)
decl = TREE_OPERAND (decl, 0);

  if (ref)
*ref = decl;

I see a lot of "magic" here again in the attempt to "propagate"
a nonstring attribute.


That's the function's purpose: to look for the attribute.  Is
there a better way to do this?


Note

foo (char *p __attribute__(("nonstring")))
{
  p = "bar";
  strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
}

is perfectly valid and p as passed to strlen is _not_ nonstring(?).


I don't know if you're saying that it should get a warning or
shouldn't.  Right now it doesn't because the strlen() call is
folded before we check for nonstring.

I could see an argument for diagnosing it but I suspect you
wouldn't like it because it would mean more warning from
the folder.  I could also see an argument against it because,
as you said, it's safe.

If you take the assignment to p away then a warning is issued,
and that's because p is declared with attribute nonstring.
That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.


I think in your code comparing bases you want to look at the _original_
argument to the string function rather than what get_attr_nonstring_decl
returned as ref.


I've adjusted get_attr_nonstring_decl() to avoid setting *REF
to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
the patch.  I've also updated the comment above SSA_NAME_VAR
to clarify its purpose per Jeff's comments.

Attached is an updated revision with these changes.

Martin




Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Alexander Monakov
On Wed, 12 Sep 2018, Richard Biener wrote:
> With LTO "static" is too blurry - would -finline-only-hidden be OK
> with you?

This doesn't sound right - in non-pic, everything is hidden, effectively.
And the intended use is with Linux kernel, which does not use -fpic.

I agree with LTO this option makes less sense, but I wouldn't expect LTO
to be used for livepatching-capable kernels.

Alexander


Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Qing Zhao


> On Sep 12, 2018, at 9:33 AM, Richard Biener  wrote:
> 
> On Wed, 12 Sep 2018, Qing Zhao wrote:
> 
>> 
>>> On Sep 12, 2018, at 2:46 AM, Alexander Monakov  wrote:
>>> 
>>> On Tue, 11 Sep 2018, Qing Zhao wrote:
 no immediate use case for -finline-visibility=extern right now. But I guess
 that GCC developers might use this option to control inlining scope for
 debugging or development purpose in the future.
>>> 
>>> In that case I'd recommend to simplify the patch by implementing only the
>>> part to suppress inlining of non-static functions. No need to overengineer.
>> 
>> the current design is trying to make the functionality more general although 
>> the part of disable inlining of
>> static functions does not have immediate use case. 
>> 
>> the additional implementation for the disabling inlining of static functions 
>> is very trivial and minimal. 
>> 
>> Personally I’d like to keep the current design.
>> 
>> However, I am flexible on this.
>> 
>>> 
 thanks for the suggestion, how about -finline-linkage=[all|extern|static]?
>>> 
>>> I would suggest -finline-only-static. Note however that I do not have the
>>> authority to review this: you still need an opinion from an appropriate
>>> maintainer.
>> 
>> understand, thanks a lot for your comments and suggestions. 
> 
> With LTO "static" is too blurry - would -finline-only-hidden be OK
> with you?

the new inlining control is mainly to control inlining based on: whether this 
function is declared with “static” keyword or not at the source code level,
i.e, whether this function can ONLY be accessed inside the file or can be 
accessed from other files. 

As my understanding, even non-“static” functions can be “hidden” during the 
linking time, so, I think that “hidden” is not good for this purpose.

Qing
> 
>> thanks.
>> 
>> Qing
>>> 
>>> Alexander
>> 
>> 
> 
> -- 
> Richard Biener mailto:rguent...@suse.de>>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)



Re: [PATCH 04/25] SPECIAL_REGNO_P

2018-09-12 Thread Andrew Stubbs

On 12/09/18 16:31, Richard Henderson wrote:

How is this different from HARD_REGNO_RENAME_OK via the TO argument?
Seems like the hook you're looking for already exists...


I don't know how we got here (I didn't do the original work), but the 
SPECIAL_REGNO_P was indeed used in HARD_REGNO_RENAME_OK, as well as in 
some extra places in regrename.


This definitely was necessary at some point, but I've not yet reproduced 
the issue in the current code-base, so I suspect you may be correct.


Andrew


Re: [PATCH][Middle-end]Add a new option to finer control inlining based on function's visibility

2018-09-12 Thread Qing Zhao


> On Sep 12, 2018, at 10:22 AM, Jeff Law  wrote:
> 
> On 9/11/18 9:12 AM, Qing Zhao wrote:
>> Hi, 
>> 
>> This is a simple patch to add a new first-class option 
>> 
>> -finline-visibility={all|extern|static}
>> 
>> to finer control inlining based on function’s visibility. 
>> 
>> '-finline-visibility=[all|extern|static]'
>> By default, GCC inlines functions without considering their
>> visibility.  This flag allows finer control of inlining based on
>> their visibility.
>> 
>> The value 'extern' tells the compiler to only inline functions with
>> external visibility.  The value 'static' tells the compiler to only
>> inline functions with static visibility.  The value 'all' tells the
>> compilers to inline functions without considering their visibility.
>> 
>> The default value of '-finline-visibility' is 'all'.
>> 
>> The major purpose of this option is to provide a way for the user 
>> to finer choose the inline candidates based on function’s visibility.
>> For example, some online patching users might want to limit the inlining
>> to only static functions to avoid patching the callers of global functions
>> in order to control the memory consumption caused by online patching. 
>> 
>> let me know any comments and suggestions.
>> 
>> thanks.
>> 
>> Qing
>> 
>> gcc/ChangeLog:
>> 
>> +2018-09-11  Qing Zhao  
>> +
>> +   * cif-code.def (FUNCTION_EXTERN): New CIFCODE.
>> +   (FUNCTION_STATIC): Likewise.
>> +   * common.opt (-finline-visibility=): New option.
>> +   * doc/invoke.texi: Document -finline-visibility=.
>> +   * flag-types.h: Add a new enum inline_visibility.
>> +   * ipa-inline.c (can_inline_edge_p): Control inlining based on
>> +   function's visibility. 
>> +
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-09-11  Qing Zhao  
>> +
>> +   * gcc.dg/inline_visibility_1.c: New test.
>> +   * gcc.dg/inline_visibility_2.c: New test.
> Presumably one of the cases where this capability is really helpful is
> things like ksplice.   If you have a function with global scope that has
> been potentially inlined, then it's a lot harder track down those
> inlining points at DTRT.

exactly one of the use cases and motivation for this new functionality.  Thanks!

> 
> We ran into this internally when looking at hot patching some of the
> spinlock code in the kernel.  It would have been real helpful if the
> kernel had been compiled with this kind of option :-)
yes. 

the major purpose of this new option is to disable inlining of global functions 
to help
online patching.

> 
> So conceptually I can see value in this kind of option.
> 
> Not a review, just wanted to chime in on potential use scenarios…

thank you.

Qing
> 
> jeff



Re: [PATCH] combine: Do another check before splitting a parallel (PR86771)

2018-09-12 Thread Segher Boessenkool
On Wed, Aug 22, 2018 at 01:19:44PM +, Segher Boessenkool wrote:
> When combine splits a resulting parallel into its two SETs, it has to
> place one at i2, and the other stays at i3.  This does not work if the
> destination of the SET that will be placed at i2 is modified between
> i2 and i3.  This patch fixes it.
> 
> Tested on powerpc64-linux {-m32,-m64}, and the reported failing fortran
> testcase on an arm-linux-gnueabi crosscompiler (by inspection).
> 
> Committing.
> 
> 
> 2018-08-22  Segher Boessenkool  
> 
>   * combine.c (try_combine): Do not allow splitting a resulting PARALLEL
>   of two SETs into those two SETs, one to be placed at i2, if that SETs
>   destination is modified between i2 and i3.

I backported this to 8 now.


Segher


Re: [PATCH][tree-ssa-mathopts] PR tree-optimization/87259: Try execute_cse_reciprocals_1 when optimize_recip_sqrt fails

2018-09-12 Thread Kyrill Tkachov


On 12/09/18 15:51, Richard Biener wrote:

On Wed, Sep 12, 2018 at 2:56 PM Kyrill Tkachov
 wrote:

Hi all,

This PR is an SSA validation error where the definition ends up not dominating 
a use.
This appeared after my recip_sqrt optimisation. Though the optimisation doesn't 
get triggered
as there is no sqrt involved it prevents execute_cse_reciprocals_1 from running 
as it's currently
an either/or situation. I think execute_cse_reciprocals_1 doesn't like it when 
some divisions get
execute_cse_reciprocals_1 called on them and some don't.

This patch changes optimize_recip_sqrt to return a boolean indicating success 
and call execute_cse_reciprocals_1
if no transform was made so that execute_cse_reciprocals_1 has a chance to 
optimise it as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
Ok for trunk?

OK.  But your analysis suggests to do it the other way around - call
execute_cse_reciprocals_1 and if that
leaves a division at *gsi process it with optimize_recip_sqrt?



Ah, that's a shorter approach, this version implements that.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
Is this preferable?

Thanks,
Kyrill

2018-09-12  Kyrylo Tkachov  

PR tree-optimization/87259
PR lto/87283
(pass_cse_reciprocals::execute): Run optimize_recip_sqrt after
execute_cse_reciprocals_1 has tried transforming.

2018-09-12  Kyrylo Tkachov  

PR tree-optimization/87259
* gcc.dg/pr87259.c: New test.

diff --git a/gcc/testsuite/gcc.dg/pr87259.c b/gcc/testsuite/gcc.dg/pr87259.c
new file mode 100644
index ..527a60a37adb520591ff40a52a12706aa1582068
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87259.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+int a, b, c;
+int *e;
+float f;
+void h() {
+  for (int g;;) {
+float d = b, i = 0 / f, j = a / (f * f), k, l = 0 / d;
+c = i + j;
+g = l;
+e[g] = c / d * k / d;
+  }
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 055b4e08e97469bac3ba7e26d61193512c234575..2b3ee887ad047b6e1430efe901965e2aeccf163a 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -547,7 +547,7 @@ free_bb (struct occurrence *occ)
depending on the uses of x, r1, r2.  This removes one multiplication and
allows the sqrt and division operations to execute in parallel.
DEF_GSI is the gsi of the initial division by sqrt that defines
-   DEF (x in the example abovs).  */
+   DEF (x in the example above).  */
 
 static void
 optimize_recip_sqrt (gimple_stmt_iterator *def_gsi, tree def)
@@ -947,13 +947,13 @@ pass_cse_reciprocals::execute (function *fun)
 	  && FLOAT_TYPE_P (TREE_TYPE (def))
 	  && TREE_CODE (def) == SSA_NAME)
 	{
+	  execute_cse_reciprocals_1 (&gsi, def);
+	  stmt = gsi_stmt (gsi);
 	  if (flag_unsafe_math_optimizations
 		  && is_gimple_assign (stmt)
 		  && !stmt_can_throw_internal (stmt)
 		  && gimple_assign_rhs_code (stmt) == RDIV_EXPR)
 		optimize_recip_sqrt (&gsi, def);
-	  else
-		execute_cse_reciprocals_1 (&gsi, def);
 	}
 	}
 


Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-12 Thread Andrew Stubbs

On 12/09/18 16:16, Richard Biener wrote:

I think the symptom GCN sees needs to be better understood - like wheter
it is generally OK to mangle things arbitrarily.


The name mangling is a horrible workaround for a bug in the HSA runtime 
code (which we do not own, cannot fix, and would want to support old 
versions of anyway).  Basically it refuses to load any binary that has 
the same symbol as another, already loaded, binary, regardless of the 
symbol's linkage.  Worse, it also rejects any binary that has duplicate 
symbols within it, despite the fact that it already linked just fine.


Adding the extra lookups is enough to build GCN binaries, with mangled 
names, whereas the existing name mangling support was either more 
specialized or bit rotten (I don't know which).


It may well be that there's a better way to solve the problem, or at 
least to do the lookups.


It may also be that there are some unintended consequences, such as 
false name matches, but I don't know of any at present.


Julian, can you comment, please?

Andrew


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-09-12 Thread Andrew Stubbs

On 12/09/18 16:32, Joseph Myers wrote:

+   case "$host" in
+   x86_64*-*-linux-gnu )
+   if test "$ac_res" != no; then
+   extra_programs="${extra_programs} gcn-run\$(exeext)"
+   fi


ac_res is a generic autoconf variable used by a lot of tests.  I don't
think it's at all safe to embed an assumption into the middle of
config.gcc about which the last test run that sets such a variable is; you
need a variable explicitly relating to whatever the relevant test is.


Oops, EWRONGPATCH. That's supposed to be "ac_cv_search_dlopen".


What if the host is x86_64 with the x32 ABI?  If the requirement is for
various types to be the same between the host and GCN, I'd expect that x32
ABI on the host means it is unsuitable for using gcn-run.  Or are the
requirements for compatible types between some other two pieces, so that
an x32 gcn-run is OK?


No, x32 would not be ok. The test as is rejects x86_64-*-linux-gnux32, 
so that ought not to be a problem unless somebody has an x32 system with 
the default triplet.


Is that something we really need to care about? If so then I guess a new 
configure test is needed.


Andrew


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-09-12 Thread Joseph Myers
On Wed, 12 Sep 2018, Andrew Stubbs wrote:

> > What if the host is x86_64 with the x32 ABI?  If the requirement is for
> > various types to be the same between the host and GCN, I'd expect that x32
> > ABI on the host means it is unsuitable for using gcn-run.  Or are the
> > requirements for compatible types between some other two pieces, so that
> > an x32 gcn-run is OK?
> 
> No, x32 would not be ok. The test as is rejects x86_64-*-linux-gnux32, so that
> ought not to be a problem unless somebody has an x32 system with the default
> triplet.

I don't see anything in config.guess that would create such a name 
(config.guess tries to avoid testing $CC as much as possible, and testing 
the ABI used by $CC would be necessary to distinguish x32), so while 
people can use such a triplet to change the default ABI for the target, I 
wouldn't expect it necessarily to be used for the host.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: VRP: undefined shifting calculation should not need sign bit

2018-09-12 Thread Richard Sandiford
Aldy Hernandez  writes:
> diff --git a/gcc/wide-int-range.h b/gcc/wide-int-range.h
> index 589fdea4df6..e9ee418e5b2 100644
> --- a/gcc/wide-int-range.h
> +++ b/gcc/wide-int-range.h
> @@ -131,7 +131,7 @@ extern bool wide_int_range_div (wide_int &wmin, wide_int 
> &wmax,
>  /* Return TRUE if shifting by range [MIN, MAX] is undefined behavior.  */
>  
>  inline bool
> -wide_int_range_shift_undefined_p (signop sign, unsigned prec,
> +wide_int_range_shift_undefined_p (unsigned prec,
> const wide_int &min, const wide_int &max)
>  {
>/* ?? Note: The original comment said this only applied to
> @@ -142,7 +142,7 @@ wide_int_range_shift_undefined_p (signop sign, unsigned 
> prec,
>   behavior from the shift operation.  We cannot even trust
>   SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
>   shifts, and the operation at the tree level may be widened.  */
> -  return wi::lt_p (min, 0, sign) || wi::ge_p (max, prec, sign);
> +  return wi::sign_mask (min) || wi::ge_p (max, prec, UNSIGNED);

I don't think this is a good idea.  Logically the comparison should
be done relative to the TYPE_SIGN of the type, so I think the original
code was correct.

Thanks,
Richard


Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-09-12 Thread Richard Sandiford
Tamar Christina  writes:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, September 11, 2018 15:49
>> To: Tamar Christina 
>> Cc: Jeff Law ; gcc-patches@gcc.gnu.org; nd
>> ; James Greenhalgh ;
>> Richard Earnshaw ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
>> supporting 64k probes. [patch (1/7)]
>> 
>> Tamar Christina  writes:
>> > Hi Richard,
>> >
>> > The 08/28/2018 21:58, Richard Sandiford wrote:
>> >> Tamar Christina  writes:
>> >> > +  HOST_WIDE_INT guard_used_by_caller =
>> STACK_CLASH_CALLER_GUARD;
>> >> > + /* When doing the final adjustment for the outgoing argument size
>> >> > we can't
>> >> > + assume that LR was saved at position 0.  So subtract it's offset
>> >> > from the
>> >> > + ABI safe buffer so that we don't accidentally allow an adjustment
>> that
>> >> > + would result in an allocation larger than the ABI buffer without
>> >> > + probing.  */
>> >> > +  HOST_WIDE_INT min_probe_threshold
>> >> > += final_adjustment_p
>> >> > +  ? guard_used_by_caller - cfun->machine-
>> >frame.reg_offset[LR_REGNUM]
>> >> > +  : guard_size - guard_used_by_caller;
>> >> [...]
>> >> > +  if (residual)
>> >> > +{
>> >> > +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
>> >> > +  if (residual >= min_probe_threshold)
>> >> > +   {
>> >> > + if (dump_file)
>> >> > +   fprintf (dump_file,
>> >> > +"Stack clash AArch64 prologue residuals: "
>> >> > +HOST_WIDE_INT_PRINT_DEC " bytes, probing will be
>> required."
>> >> > +"\n", residual);
>> >> > + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>> >> > +  STACK_CLASH_CALLER_GUARD));
>> >>
>> >> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
>> >> 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
>> >> It looks like the probe would then write above the region.
>> >>
>> >> Using >= rather than > means that the same thing could happen when
>> >> LR_REGNUM is at position 0, if the residual is exactly
>> >> STACK_CLASH_CALLER_GUARD.
>> >
>> > That's true. While addressing this we changed how the residuals are
>> probed.
>> >
>> > To address a comment you raised offline about the saving of LR when
>> > calling a no-return function using a tail call and
>> > -fomit-frame-pointer, I think this should be safe as the code in
>> > frame_layout (line 4131-4136) would ensure that R30 is saved.
>> 
>> That line number range doesn't seem to match up with current sources.
>> But my point was that "X is a non-leaf function" does not directly imply "X
>> saves R30_REGNUM".  It might happen to imply that indirectly for all cases at
>> the moment, but it's not something that the AArch64 code enforces itself.
>> AFAICT the only time we force R30 to be saved explicitly is when emitting a
>> chain:
>> 
>>   if (cfun->machine->frame.emit_frame_chain)
>> {
>>   /* FP and LR are placed in the linkage record.  */
>>   cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
>>   cfun->machine->frame.wb_candidate1 = R29_REGNUM;
>>   cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
>>   cfun->machine->frame.wb_candidate2 = R30_REGNUM;
>>   offset = 2 * UNITS_PER_WORD;
>> }
>> 
>> Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
>> And in principle, it should be correct to use a sibcall pattern that doesn't
>> clobber R30 for a noreturn call even in functions that require a frame.  We
>> don't do that yet, and it probably doesn't make sense from a QoI perspective
>> on AArch64, but I don't think it's invalid in terms of rtl semantics.  
>> There's no
>> real reason why a noreturn function has to save the address of its caller
>> unless the target forces it to, such as for frame chains above.
>> 
>> In this patch we're relying on the link between the two concepts for a
>> security feature, so I think we should either enforce it explicitly or add an
>> assert like:
>> 
>>   gcc_assert (crtl->is_leaf
>>|| (cfun->machine->frame.reg_offset[R30_REGNUM]
>>!= SLOT_NOT_REQUIRED));
>> 
>> to aarch64_expand_prologue.  (I don't think we should make the assert
>> dependent on stack-clash, since the point is that we're assuming this
>> happens regardless of stack-clash.)
>
> I agree that the assert would be a good idea, though I'm not sure
> enabling it always is
> a good idea. I'm not sure what other languages that don't use
> stack-clash-protection and do their
> own probing do. Like Ada or Go?

I think the argument was that R30 will always be saved in non-leaf
frames without any specific help from the target (which I have my
doubts about, but might be true in practice).  The mechanism by which
that happens doesn't include a test for stack-clash, so I think we
should have the courage of our convictio

Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-12 Thread Julian Brown
On Wed, 12 Sep 2018 17:31:58 +0100
Andrew Stubbs  wrote:

> On 12/09/18 16:16, Richard Biener wrote:
> > I think the symptom GCN sees needs to be better understood - like
> > wheter it is generally OK to mangle things arbitrarily.  
> 
> The name mangling is a horrible workaround for a bug in the HSA
> runtime code (which we do not own, cannot fix, and would want to
> support old versions of anyway).  Basically it refuses to load any
> binary that has the same symbol as another, already loaded, binary,
> regardless of the symbol's linkage.  Worse, it also rejects any
> binary that has duplicate symbols within it, despite the fact that it
> already linked just fine.
> 
> Adding the extra lookups is enough to build GCN binaries, with
> mangled names, whereas the existing name mangling support was either
> more specialized or bit rotten (I don't know which).
> 
> It may well be that there's a better way to solve the problem, or at 
> least to do the lookups.
> 
> It may also be that there are some unintended consequences, such as 
> false name matches, but I don't know of any at present.
> 
> Julian, can you comment, please?

I did the local-symbol name mangling in two places:

- The ASM_FORMAT_PRIVATE_NAME macro (good for local statics)
- The TARGET_MANGLE_DECL_ASSEMBLER_NAME hook (for file-scope
  local/statics)

Possibly, this was an abuse of these hooks, but it's arguably wrong that
that e.g. handle_alias_pairs has the "assembler name" leak through into
the user's source code -- if it's expected that the hook could make
arbitrary transformations to the string. (The latter hook is only used
by PE code for x86 at present, by the look of it, and the default
handles only special-purpose mangling indicated by placing a '*' at the
front of the symbol.)

I couldn't find an existing place where the DECL_NAMEs for symbols were
indexed in a hash table, equivalent to the table for assembler names.
Aliases are made via pragmas, so it's not 100% clear to me what the
scoping/lookup rules are supposed to be for those anyway, nor what the
possibility or consequences might be of false matches.

(The "!target" case in maybe_apply_pending_pragma_weaks, if it doesn't
somehow make a false match, just slows down reporting of an error a
little, I think. Similarly in handle_alias_pairs.)

If we had a symtab_node::get_for_name () using a suitable hash table, I
think it'd probably be right to use that. Can that be done (easily), or
is there some equivalent way? Introducing a new hash table everywhere
for a bug workaround for a relatively obscure feature on a single
target seems unfortunate.

Thanks,

Julian


Re: VRP: undefined shifting calculation should not need sign bit

2018-09-12 Thread Aldy Hernandez




On 09/12/2018 12:57 PM, Richard Sandiford wrote:

Aldy Hernandez  writes:

diff --git a/gcc/wide-int-range.h b/gcc/wide-int-range.h
index 589fdea4df6..e9ee418e5b2 100644
--- a/gcc/wide-int-range.h
+++ b/gcc/wide-int-range.h
@@ -131,7 +131,7 @@ extern bool wide_int_range_div (wide_int &wmin, wide_int 
&wmax,
  /* Return TRUE if shifting by range [MIN, MAX] is undefined behavior.  */
  
  inline bool

-wide_int_range_shift_undefined_p (signop sign, unsigned prec,
+wide_int_range_shift_undefined_p (unsigned prec,
  const wide_int &min, const wide_int &max)
  {
/* ?? Note: The original comment said this only applied to
@@ -142,7 +142,7 @@ wide_int_range_shift_undefined_p (signop sign, unsigned 
prec,
   behavior from the shift operation.  We cannot even trust
   SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
   shifts, and the operation at the tree level may be widened.  */
-  return wi::lt_p (min, 0, sign) || wi::ge_p (max, prec, sign);
+  return wi::sign_mask (min) || wi::ge_p (max, prec, UNSIGNED);


I don't think this is a good idea.  Logically the comparison should
be done relative to the TYPE_SIGN of the type, so I think the original
code was correct.


The operation to calculate undefinedness must be done with the type of 
the RHS, as opposed to the type of the entire operation.  This can be 
confusing, as most operations use the same type for all operands as well 
as for the type of the entire operation.  For example, AFAICT, the 
following is valid gimple:


UINT64 = UINT64 << INT32

The original code was doing this (correctly), but since it was confusing 
to remember which type to pass, I rewrote the above function to not need 
the sign of the RHS.  This came about because in my ranger work, I 
passed the wrong type which took forever to find ;-).  My patch avoids 
further confusion.


Am I missing a subtle incorrectness in my approach?

Aldy


Re: [PATCH] look harder for MEM_REF operand equality to avoid -Wstringop-truncation (PR 84561)

2018-09-12 Thread Martin Sebor

On 08/31/2018 04:07 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 7:39 PM Martin Sebor  wrote:


On 08/30/2018 11:22 AM, Richard Biener wrote:

On August 30, 2018 6:54:21 PM GMT+02:00, Martin Sebor  wrote:

On 08/30/2018 02:35 AM, Richard Biener wrote:

On Thu, Aug 30, 2018 at 2:12 AM Martin Sebor 

wrote:


The attached patch adds code to work harder to determine whether
the destination of an assignment involving MEM_REF is the same
as the destination of a prior strncpy call.  The included test
case demonstrates when this situation comes up.  During ccp,
dstbase and lhsbase returned by get_addr_base_and_unit_offset()
end up looking like this:


"During CCP" means exactly when?  The CCP lattice tracks copies
so CCP should already know that _1 == _8.  I suppose during
substitute_and_fold then?  But that replaces uses before folding
the stmt.


Yes, when ccp_finalize() performs the final substitution during
substitute_and_fold().


But then you shouldn't need the loop but at most look at the pointer SSA Def to 
get at the non-invariant ADDR_EXPR.


I don't follow.   Are you suggesting to compare
SSA_NAME_DEF_STMT (dstbase) to SSA_NAME_DEF_STMT (lhsbase) for
equality?  They're not equal.


No.


The first loop iterates once and retrieves

   1.  _8 = &pb_3(D)->a;

The second loop iterates three times and retrieves:

   1.  _1 = _9
   2.  _9 = _8
   3.  _8 = &pb_3(D)->a;

How do I get from _1 to &pb_3(D)->a without iterating?  Or are
you saying to still iterate but compare the SSA_NAME_DEF_STMT?


I say you should retrieve _8 = &pb_3(D)->a immediately since the
copies should be
propagated out at this stage.


The warning is issued as the strncpy call is being folded (during
the dom walk in substitute_and_fold_engine::substitute_and_fold)
but before the subsequent statements have been folded (during
the subsequent loop to eliminate statements).  So at the point
of the strncpy folding the three assignments above are still
there.

I can't think of a good way to solve this problem that's not
overly intrusive.  Unless you have some suggestions for how
to deal with it, is the patch okay as is?

Martin


Re: [PATCH] (PR86989)

2018-09-12 Thread Segher Boessenkool
On Fri, Aug 24, 2018 at 10:44:22AM +, Segher Boessenkool wrote:
> There currently is nothing that prevents replacing the TOC_REGISTER in
> a TOCREL unspec with something else, like a pseudo, or a memory ref.
> This of course does not work.  Fix that.

I backported this to 7 and 8 now.


Segher


>   PR target/86989
>   * config/rs6000/rs6000.c (toc_relative_expr_p): Check that the base is
>   the TOC register.


Re: [PATCH] Improve x % c1 == c2 expansion (PR middle-end/82853)

2018-09-12 Thread Jakub Jelinek
On Wed, Sep 12, 2018 at 03:02:41PM +0200, Richard Biener wrote:
> So I guess your patch is OK (with the small followup you posted).

Thanks, here is what I've committed after another bootstrap/regtest:

2018-09-12  Jakub Jelinek  

PR middle-end/82853
* expr.h (maybe_optimize_mod_cmp): Declare.
* expr.c (mod_inv): New function.
(maybe_optimize_mod_cmp): New function.
(do_store_flag): Use it.
* cfgexpand.c (expand_gimple_cond): Likewise.

* gcc.target/i386/pr82853-1.c: New test.
* gcc.target/i386/pr82853-2.c: New test.

--- gcc/expr.h.jj   2018-09-11 18:12:17.716325233 +0200
+++ gcc/expr.h  2018-09-12 15:20:28.329957642 +0200
@@ -290,6 +290,8 @@ expand_normal (tree exp)
a string constant.  */
 extern tree string_constant (tree, tree *, tree *, tree *);
 
+extern enum tree_code maybe_optimize_mod_cmp (enum tree_code, tree *, tree *);
+
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, 
profile_probability);
 extern int try_tablejump (tree, tree, tree, tree, rtx, rtx, 
profile_probability);
--- gcc/expr.c.jj   2018-09-11 18:12:17.665326073 +0200
+++ gcc/expr.c  2018-09-12 15:20:37.448807135 +0200
@@ -11491,6 +11491,243 @@ string_constant (tree arg, tree *ptr_off
   return init;
 }
 
+/* Compute the modular multiplicative inverse of A modulo M
+   using extended Euclid's algorithm.  Assumes A and M are coprime.  */
+static wide_int
+mod_inv (const wide_int &a, const wide_int &b)
+{
+  /* Verify the assumption.  */
+  gcc_checking_assert (wi::eq_p (wi::gcd (a, b), 1));
+
+  unsigned int p = a.get_precision () + 1;
+  gcc_checking_assert (b.get_precision () + 1 == p);
+  wide_int c = wide_int::from (a, p, UNSIGNED);
+  wide_int d = wide_int::from (b, p, UNSIGNED);
+  wide_int x0 = wide_int::from (0, p, UNSIGNED);
+  wide_int x1 = wide_int::from (1, p, UNSIGNED);
+
+  if (wi::eq_p (b, 1))
+return wide_int::from (1, p, UNSIGNED);
+
+  while (wi::gt_p (c, 1, UNSIGNED))
+{
+  wide_int t = d;
+  wide_int q = wi::divmod_trunc (c, d, UNSIGNED, &d);
+  c = t;
+  wide_int s = x0;
+  x0 = wi::sub (x1, wi::mul (q, x0));
+  x1 = s;
+}
+  if (wi::lt_p (x1, 0, SIGNED))
+x1 += d;
+  return x1;
+}
+
+/* Attempt to optimize unsigned (X % C1) == C2 (or (X % C1) != C2).
+   If C1 is odd to:
+   (X - C2) * C3 <= C4 (or >), where
+   C3 is modular multiplicative inverse of C1 and 1< ((1<> S) <= C4, where C3 is modular multiplicative
+   inverse of C1>>S and 1<>S)) >> S.
+
+   For signed (X % C1) == 0 if C1 is odd to (all operations in it
+   unsigned):
+   (X * C3) + C4 <= 2 * C4, where
+   C3 is modular multiplicative inverse of (unsigned) C1 and 1<> S <= (C4 >> (S - 1))
+   where C3 is modular multiplicative inverse of (unsigned)(C1>>S) and 1<>S)) & (-1<= c should be always false.  */
+  || tree_int_cst_le (treeop1, *arg1))
+return code;
+
+  tree type = TREE_TYPE (*arg0);
+  scalar_int_mode mode;
+  if (!is_a  (TYPE_MODE (type), &mode))
+return code;
+  if (GET_MODE_BITSIZE (mode) != TYPE_PRECISION (type)
+  || TYPE_PRECISION (type) <= 1)
+return code;
+
+  signop sgn = UNSIGNED;
+  /* If both operands are known to have the sign bit clear, handle
+ even the signed modulo case as unsigned.  treeop1 is always
+ positive >= 2, checked above.  */
+  if (!TYPE_UNSIGNED (type) && get_range_pos_neg (treeop0) != 1)
+sgn = SIGNED;
+
+  if (!TYPE_UNSIGNED (type))
+{
+  if (tree_int_cst_sgn (*arg1) == -1)
+   return code;
+  type = unsigned_type_for (type);
+  if (!type || TYPE_MODE (type) != TYPE_MODE (TREE_TYPE (*arg0)))
+   return code;
+}
+
+  int prec = TYPE_PRECISION (type);
+  wide_int w = wi::to_wide (treeop1);
+  int shift = wi::ctz (w);
+  /* Unsigned (X % C1) == C2 is equivalent to (X - C2) % C1 == 0 if
+ C2 <= -1U % C1, because for any Z >= 0U - C2 in that case (Z % C1) != 0.
+ If C1 is odd, we can handle all cases by subtracting
+ C4 below.  We could handle even the even C1 and C2 > -1U % C1 cases
+ e.g. by testing for overflow on the subtraction, punt on that for now
+ though.  */
+  if ((sgn == SIGNED || shift) && !integer_zerop (*arg1))
+{
+  if (sgn == SIGNED)
+   return code;
+  wide_int x = wi::umod_trunc (wi::mask (prec, false, prec), w);
+  if (wi::gtu_p (wi::to_wide (*arg1), x))
+   return code;
+}
+
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, treeop0)
+{
+  gimple *use_stmt = USE_STMT (use_p);
+  /* Punt if treeop0 is used in the same bb in a division
+or another modulo with the same divisor.  We should expect
+the division and modulo combined together.  */
+  if (use_stmt == stmt
+ || gimple_bb (use_stmt) != gimple_bb (stmt))
+   continue;
+  if (!is_gimple_assign (use_stmt)
+ || (gimple_assign_rhs_code (use_stmt) != 

[PATCH] Move signed to unsigned x % c == 0 optimization from fold-const to match.pd (PR tree-optimization/87287)

2018-09-12 Thread Jakub Jelinek
Hi!

While working on the next PR, I've noticed we only fold this on generic, not
gimple.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-09-12  Jakub Jelinek  

PR tree-optimization/87287
* fold-const.c (fold_binary_loc) : Move signed modulo
X % C == 0 to X % (unsigned) C == 0 optimization to ...
* match.pd (X % C == 0): ... here.  New optimization.

* gcc.dg/tree-ssa/pr87287.c: New test.

--- gcc/match.pd.jj 2018-08-29 13:32:54.427224535 +0200
+++ gcc/match.pd2018-09-12 16:25:01.385733828 +0200
@@ -470,7 +470,15 @@ (define_operator_list COND_TERNARY
&& TYPE_OVERFLOW_UNDEFINED (type)
&& wi::multiple_of_p (wi::to_wide (@1), wi::to_wide (@2),
 TYPE_SIGN (type)))
-   { build_zero_cst (type); })))
+   { build_zero_cst (type); }))
+ /* For (X % C) == 0, if X is signed and C is power of 2, use unsigned
+modulo and comparison, since it is simpler and equivalent.  */
+ (for cmp (eq ne)
+  (simplify
+   (cmp (mod @0 integer_pow2p@2) integer_zerop@1)
+   (if (!TYPE_UNSIGNED (TREE_TYPE (@0)))
+(with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+ (cmp (mod (convert:utype @0) (convert:utype @2)) (convert:utype @1)))
 
 /* X % -C is the same as X % C.  */
 (simplify
--- gcc/fold-const.c.jj 2018-09-12 11:18:26.0 +0200
+++ gcc/fold-const.c2018-09-12 16:19:39.289018098 +0200
@@ -10661,28 +10661,6 @@ fold_binary_loc (location_t loc, enum tr
}
}
 
-  /* If this is an NE or EQ comparison of zero against the result of a
-signed MOD operation whose second operand is a power of 2, make
-the MOD operation unsigned since it is simpler and equivalent.  */
-  if (integer_zerop (arg1)
- && !TYPE_UNSIGNED (TREE_TYPE (arg0))
- && (TREE_CODE (arg0) == TRUNC_MOD_EXPR
- || TREE_CODE (arg0) == CEIL_MOD_EXPR
- || TREE_CODE (arg0) == FLOOR_MOD_EXPR
- || TREE_CODE (arg0) == ROUND_MOD_EXPR)
- && integer_pow2p (TREE_OPERAND (arg0, 1)))
-   {
- tree newtype = unsigned_type_for (TREE_TYPE (arg0));
- tree newmod = fold_build2_loc (loc, TREE_CODE (arg0), newtype,
-fold_convert_loc (loc, newtype,
-  TREE_OPERAND (arg0, 0)),
-fold_convert_loc (loc, newtype,
-  TREE_OPERAND (arg0, 1)));
-
- return fold_build2_loc (loc, code, type, newmod,
- fold_convert_loc (loc, newtype, arg1));
-   }
-
   /* Fold ((X >> C1) & C2) == 0 and ((X >> C1) & C2) != 0 where
 C1 is a valid shift constant, and C2 is a power of two, i.e.
 a single bit.  */
--- gcc/testsuite/gcc.dg/tree-ssa/pr87287.c.jj  2018-09-12 16:37:38.817307646 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr87287.c 2018-09-12 16:36:50.293103713 
+0200
@@ -0,0 +1,34 @@
+/* PR tree-optimization/87287 */
+/* { dg-options "-O2 -fdump-tree-cddce1" } */
+/* { dg-final { scan-tree-dump-not " % 16" "cddce1" } } */
+/* { dg-final { scan-tree-dump-times " & 15" 4 "cddce1" } } */
+
+void f0 (void);
+
+int
+f1 (int x)
+{
+  return x % 16 == 0;
+}
+
+int
+f2 (int x)
+{
+  int y = x % 16;
+  return y != 0;
+}
+
+void
+f3 (int x)
+{
+  if (x % 16 != 0)
+f0 ();
+}
+
+void
+f4 (int x)
+{
+  int y = x % 16;
+  if (y == 0)
+f0 ();
+}

Jakub


[PATCH, middle-end]: Default the mode of pop and swap insns to the reg_raw_mode of FIRST_STACK_REG

2018-09-12 Thread Uros Bizjak
Hello!

Although reg-stack.c is mostly an x86 affair, attached patch decouples
it from x86 a bit. The patch changes the default mode of pop and swap
insns to the reg_raw_mode of FIRST_STACK_REG (= XFmode on x86). Also,
the patch explicitly constructs swap insn, without calling x86
specific named insn pattern.

2018-09-11  Uros Bizjak  

* reg-stack.c: Include regs.h.
(replace_reg): Assert that mode is MODE_FLOAT or MODE_COMPLEX_FLOAT.
(emit_pop_insn): Default pop insn mode to the reg_raw_mode of
FIRST_STACK_REG, not DFmode.
(emit_swap_insn): Default swap insn mode to the reg_raw_mode of
FIRST_STACK_REG, not XFmode.  Explicitly construct swap RTX.
(change stack): Default register mode to the reg_raw_mode of
FIRST_STACK_REG, not DFmode.
* config/i386/i386.md (*swap): Remove insn pattern.
(*swapxf): Rename from swapxf.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

No functional changes, but the patch needs middle-end approval.

OK for mainline?

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1e7241c87c2..ab1237e23da 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3864,7 +3864,7 @@
 operands[1] = CONST1_RTX (mode);
 })
 
-(define_insn "swapxf"
+(define_insn "*swapxf"
   [(set (match_operand:XF 0 "register_operand" "+f")
(match_operand:XF 1 "register_operand" "+f"))
(set (match_dup 1)
@@ -3878,22 +3878,8 @@
 }
   [(set_attr "type" "fxch")
(set_attr "mode" "XF")])
-
-(define_insn "*swap"
-  [(set (match_operand:MODEF 0 "fp_register_operand" "+f")
-   (match_operand:MODEF 1 "fp_register_operand" "+f"))
-   (set (match_dup 1)
-   (match_dup 0))]
-  "TARGET_80387 || reload_completed"
-{
-  if (STACK_TOP_P (operands[0]))
-return "fxch\t%1";
-  else
-return "fxch\t%0";
-}
-  [(set_attr "type" "fxch")
-   (set_attr "mode" "")])
 
+
 ;; Zero extension instructions
 
 (define_expand "zero_extendsidi2"
diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
index d97a358e3f8..97d758c8307 100644
--- a/gcc/reg-stack.c
+++ b/gcc/reg-stack.c
@@ -162,6 +162,7 @@
 #include "df.h"
 #include "insn-config.h"
 #include "memmodel.h"
+#include "regs.h"
 #include "emit-rtl.h"  /* FIXME: Can go away once crtl is moved to rtl.h.  */
 #include "recog.h"
 #include "varasm.h"
@@ -711,7 +712,7 @@ replace_reg (rtx *reg, int regno)
   gcc_assert (IN_RANGE (regno, FIRST_STACK_REG, LAST_STACK_REG));
   gcc_assert (STACK_REG_P (*reg));
 
-  gcc_assert (SCALAR_FLOAT_MODE_P (GET_MODE (*reg))
+  gcc_assert (GET_MODE_CLASS (GET_MODE (*reg)) == MODE_FLOAT
  || GET_MODE_CLASS (GET_MODE (*reg)) == MODE_COMPLEX_FLOAT);
 
   *reg = FP_MODE_REG (regno, GET_MODE (*reg));
@@ -765,8 +766,10 @@ get_hard_regnum (stack_ptr regstack, rtx reg)
cases the movdf pattern to pop.  */
 
 static rtx_insn *
-emit_pop_insn (rtx_insn *insn, stack_ptr regstack, rtx reg, enum emit_where 
where)
+emit_pop_insn (rtx_insn *insn, stack_ptr regstack, rtx reg,
+  enum emit_where where)
 {
+  machine_mode raw_mode = reg_raw_mode[FIRST_STACK_REG];
   rtx_insn *pop_insn;
   rtx pop_rtx;
   int hard_regno;
@@ -775,8 +778,8 @@ emit_pop_insn (rtx_insn *insn, stack_ptr regstack, rtx reg, 
enum emit_where wher
  CLOBBER and USE expressions.  */
   if (COMPLEX_MODE_P (GET_MODE (reg)))
 {
-  rtx reg1 = FP_MODE_REG (REGNO (reg), DFmode);
-  rtx reg2 = FP_MODE_REG (REGNO (reg) + 1, DFmode);
+  rtx reg1 = FP_MODE_REG (REGNO (reg), raw_mode);
+  rtx reg2 = FP_MODE_REG (REGNO (reg) + 1, raw_mode);
 
   pop_insn = NULL;
   if (get_hard_regnum (regstack, reg1) >= 0)
@@ -791,15 +794,15 @@ emit_pop_insn (rtx_insn *insn, stack_ptr regstack, rtx 
reg, enum emit_where wher
 
   gcc_assert (hard_regno >= FIRST_STACK_REG);
 
-  pop_rtx = gen_rtx_SET (FP_MODE_REG (hard_regno, DFmode),
-FP_MODE_REG (FIRST_STACK_REG, DFmode));
+  pop_rtx = gen_rtx_SET (FP_MODE_REG (hard_regno, raw_mode),
+FP_MODE_REG (FIRST_STACK_REG, raw_mode));
 
   if (where == EMIT_AFTER)
 pop_insn = emit_insn_after (pop_rtx, insn);
   else
 pop_insn = emit_insn_before (pop_rtx, insn);
 
-  add_reg_note (pop_insn, REG_DEAD, FP_MODE_REG (FIRST_STACK_REG, DFmode));
+  add_reg_note (pop_insn, REG_DEAD, FP_MODE_REG (FIRST_STACK_REG, raw_mode));
 
   regstack->reg[regstack->top - (hard_regno - FIRST_STACK_REG)]
 = regstack->reg[regstack->top];
@@ -820,7 +823,6 @@ static void
 emit_swap_insn (rtx_insn *insn, stack_ptr regstack, rtx reg)
 {
   int hard_regno;
-  rtx swap_rtx;
   int other_reg;   /* swap regno temps */
   rtx_insn *i1;/* the stack-reg insn prior to INSN */
   rtx i1set = NULL_RTX;/* the SET rtx within I1 */
@@ -978,9 +980,13 @@ emit_swap_insn (rtx_insn *insn, stack_ptr regstack, rtx 
reg)
   return;
 }
 
-  swap_rtx = gen_swapxf (FP_MODE_REG (hard_regno, XFmode),
-FP_MODE_REG (FIRST_STACK_R

[PATCH] Improve x % c == d signed expansion (PR middle-end/87290)

2018-09-12 Thread Jakub Jelinek
Hi!

This patch optimizes signed x % C1 == C2 expansion if it is beneficial.
x % C1 == 0 should be handled unconditionally in match.pd (previous patch),
this only handles the cases where C1 is positive power of two and abs (C2)
is smaller than it and non-zero.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-09-12  Jakub Jelinek  
Kyrylo Tkachov  

PR middle-end/87290
* expr.c (maybe_optimize_pow2p_mod_cmp): New function.
(maybe_optimize_mod_cmp): Use it if integer_pow2p treeop1.

* gcc.target/i386/pr87290.c: New test.
* gcc.c-torture/execute/pr87290.c: New test.

--- gcc/expr.c.jj   2018-09-12 15:20:37.448807135 +0200
+++ gcc/expr.c  2018-09-12 18:16:41.0 +0200
@@ -11523,6 +11523,98 @@ mod_inv (const wide_int &a, const wide_i
   return x1;
 }
 
+/* Optimize x % C1 == C2 for signed modulo if C1 is a power of two and C2
+   is non-zero and C3 ((1<<(prec-1)) | (C1 - 1)):
+   for C2 > 0 to x & C3 == C2
+   for C2 < 0 to x & C3 == (C2 & C3).  */
+enum tree_code
+maybe_optimize_pow2p_mod_cmp (enum tree_code code, tree *arg0, tree *arg1)
+{
+  gimple *stmt = get_def_for_expr (*arg0, TRUNC_MOD_EXPR);
+  tree treeop0 = gimple_assign_rhs1 (stmt);
+  tree treeop1 = gimple_assign_rhs2 (stmt);
+  tree type = TREE_TYPE (*arg0);
+  scalar_int_mode mode;
+  if (!is_a  (TYPE_MODE (type), &mode))
+return code;
+  if (GET_MODE_BITSIZE (mode) != TYPE_PRECISION (type)
+  || TYPE_PRECISION (type) <= 1
+  || TYPE_UNSIGNED (type)
+  /* Signed x % c == 0 should have been optimized into unsigned modulo
+earlier.  */
+  || integer_zerop (*arg1)
+  /* If c is known to be non-negative, modulo will be expanded as unsigned
+modulo.  */
+  || get_range_pos_neg (treeop0) == 1)
+return code;
+
+  /* x % c == d where d < 0 && d <= -c should be always false.  */
+  if (tree_int_cst_sgn (*arg1) == -1
+  && -wi::to_widest (treeop1) >= wi::to_widest (*arg1))
+return code;
+
+  int prec = TYPE_PRECISION (type);
+  wide_int w = wi::to_wide (treeop1) - 1;
+  w |= wi::shifted_mask (0, prec - 1, true, prec);
+  tree c3 = wide_int_to_tree (type, w);
+  tree c4 = *arg1;
+  if (tree_int_cst_sgn (*arg1) == -1)
+c4 = wide_int_to_tree (type, w & wi::to_wide (*arg1));
+
+  rtx op0 = expand_normal (treeop0);
+  treeop0 = make_tree (TREE_TYPE (treeop0), op0);
+
+  bool speed_p = optimize_insn_for_speed_p ();
+
+  do_pending_stack_adjust ();
+
+  location_t loc = gimple_location (stmt);
+  struct separate_ops ops;
+  ops.code = TRUNC_MOD_EXPR;
+  ops.location = loc;
+  ops.type = TREE_TYPE (treeop0);
+  ops.op0 = treeop0;
+  ops.op1 = treeop1;
+  ops.op2 = NULL_TREE;
+  start_sequence ();
+  rtx mor = expand_expr_real_2 (&ops, NULL_RTX, TYPE_MODE (ops.type),
+   EXPAND_NORMAL);
+  rtx_insn *moinsns = get_insns ();
+  end_sequence ();
+
+  unsigned mocost = seq_cost (moinsns, speed_p);
+  mocost += rtx_cost (mor, mode, EQ, 0, speed_p);
+  mocost += rtx_cost (expand_normal (*arg1), mode, EQ, 1, speed_p);
+
+  ops.code = BIT_AND_EXPR;
+  ops.location = loc;
+  ops.type = TREE_TYPE (treeop0);
+  ops.op0 = treeop0;
+  ops.op1 = c3;
+  ops.op2 = NULL_TREE;
+  start_sequence ();
+  rtx mur = expand_expr_real_2 (&ops, NULL_RTX, TYPE_MODE (ops.type),
+   EXPAND_NORMAL);
+  rtx_insn *muinsns = get_insns ();
+  end_sequence ();
+
+  unsigned mucost = seq_cost (muinsns, speed_p);
+  mucost += rtx_cost (mur, mode, EQ, 0, speed_p);
+  mucost += rtx_cost (expand_normal (c4), mode, EQ, 1, speed_p);
+
+  if (mocost <= mucost)
+{
+  emit_insn (moinsns);
+  *arg0 = make_tree (TREE_TYPE (*arg0), mor);
+  return code;
+}
+
+  emit_insn (muinsns);
+  *arg0 = make_tree (TREE_TYPE (*arg0), mur);
+  *arg1 = c4;
+  return code;
+}
+
 /* Attempt to optimize unsigned (X % C1) == C2 (or (X % C1) != C2).
If C1 is odd to:
(X - C2) * C3 <= C4 (or >), where
@@ -11561,8 +11653,6 @@ maybe_optimize_mod_cmp (enum tree_code c
   tree treeop1 = gimple_assign_rhs2 (stmt);
   if (TREE_CODE (treeop0) != SSA_NAME
   || TREE_CODE (treeop1) != INTEGER_CST
-  /* x % pow2 is handled right already.  */
-  || integer_pow2p (treeop1)
   /* Don't optimize the undefined behavior case x % 0;
 x % 1 should have been optimized into zero, punt if
 it makes it here for whatever reason;
@@ -11572,6 +11662,11 @@ maybe_optimize_mod_cmp (enum tree_code c
   || tree_int_cst_le (treeop1, *arg1))
 return code;
 
+  /* Unsigned x % pow2 is handled right already, for signed
+ modulo handle it in maybe_optimize_pow2p_mod_cmp.  */
+  if (integer_pow2p (treeop1))
+return maybe_optimize_pow2p_mod_cmp (code, arg0, arg1);
+
   tree type = TREE_TYPE (*arg0);
   scalar_int_mode mode;
   if (!is_a  (TYPE_MODE (type), &mode))
--- gcc/testsuite/gcc.target/i386/pr87290.c.jj  2018-09-12 18:32:57.575345992 
+0200
+++ gcc/testsuite/gcc.target/i386/pr87290.c   

[PATCH] Re: reconfigured configure arg

2018-09-12 Thread Jakub Jelinek
On Mon, Sep 03, 2018 at 02:14:55PM +, Joseph Myers wrote:
> On Sat, 1 Sep 2018, Jakub Jelinek wrote:
> 
> > Couldn't we do something smarter than this?
> > Noticed while debugging why our bisect seed machine spends over 2 minutes in
> > sed when handling config.status, and the reason apparently was that it has
> > " : (reconfigured) " string in it 37753 times, so almost 700KB just in
> > that.  If it is useful information how many times gcc has been reconfigured,
> > can't we use a human readable form instead, where we add
> > " : (reconfigured) " the first time and then just change it to
> > " : (2x reconfigured) ", ..., " : (37753x reconfigured) " etc.?
> > 
> > Also, don't we actually add the above way $TOPLEVEL_CONFIGURE_ARGUMENTS
> > multiple times (if not empty, that is)?
> 
> I think my original idea when adding this information in 
>  was that, if 
> reconfigured, the arguments in TOPLEVEL_CONFIGURE_ARGUMENTS might include 
> synthetic ones created by config.status.  So simply using 
> TOPLEVEL_CONFIGURE_ARGUMENTS (and ignoring the previous set of configure 
> arguments) in the case of reconfiguration would give misleading results, 
> but also not mentioning it at all would give misleading results if someone 
> did in fact specify different configure options manually when 
> reconfiguring.  I was not expecting 37753x reconfiguration.

That was in our bisect seed...

Would the following minimal patch be acceptable, which just doesn't append
the " : (reconfigured) " + arguments if the configure arguments already end
with exactly that?

Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested with
./config.status --recheck done the first time after configure (reconfigured
added) and second and following time (not added).  Ok for trunk?

2018-09-12  Jakub Jelinek  

* configure.ac: Only append
" : (reconfigured) $TOPLEVEL_CONFIGURE_ARGUMENTS" to
gcc_config_arguments if it was never reconfigured or last reconfigure
was with different arguments.
* configure: Regenerated.

--- gcc/configure.ac.jj 2018-08-26 22:42:22.518779717 +0200
+++ gcc/configure.ac2018-09-12 19:10:14.714835001 +0200
@@ -1744,7 +1744,10 @@ changequote(,)dnl
 if test -f configargs.h ; then
# Being re-configured.
gcc_config_arguments=`grep configuration_arguments configargs.h | sed 
-e 's/.*"\([^"]*\)".*/\1/'`
-   gcc_config_arguments="$gcc_config_arguments : (reconfigured) 
$TOPLEVEL_CONFIGURE_ARGUMENTS"
+   gcc_reconf_arguments=`echo "$gcc_config_arguments" | sed -e 's/^.*\( : 
(reconfigured) .*$\)/\1/'`
+   if [ "$gcc_reconf_arguments" != " : (reconfigured) 
$TOPLEVEL_CONFIGURE_ARGUMENTS" ]; then
+   gcc_config_arguments="$gcc_config_arguments : (reconfigured) 
$TOPLEVEL_CONFIGURE_ARGUMENTS"
+   fi
 else
gcc_config_arguments="$TOPLEVEL_CONFIGURE_ARGUMENTS"
 fi
--- gcc/configure.jj2018-08-26 22:42:05.912058938 +0200
+++ gcc/configure   2018-09-12 19:10:39.358433613 +0200
@@ -11836,7 +11836,10 @@ xm_file="auto-host.h ansidecl.h ${xm_fil
 if test -f configargs.h ; then
# Being re-configured.
gcc_config_arguments=`grep configuration_arguments configargs.h | sed 
-e 's/.*"\([^"]*\)".*/\1/'`
-   gcc_config_arguments="$gcc_config_arguments : (reconfigured) 
$TOPLEVEL_CONFIGURE_ARGUMENTS"
+   gcc_reconf_arguments=`echo "$gcc_config_arguments" | sed -e 's/^.*\( : 
(reconfigured) .*$\)/\1/'`
+   if [ "$gcc_reconf_arguments" != " : (reconfigured) 
$TOPLEVEL_CONFIGURE_ARGUMENTS" ]; then
+   gcc_config_arguments="$gcc_config_arguments : (reconfigured) 
$TOPLEVEL_CONFIGURE_ARGUMENTS"
+   fi
 else
gcc_config_arguments="$TOPLEVEL_CONFIGURE_ARGUMENTS"
 fi
@@ -18460,7 +18463,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18463 "configure"
+#line 18466 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18566,7 +18569,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18569 "configure"
+#line 18572 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19731,20 +19734,20 @@ if test -z "$aix_libpath"; then aix_libp
  prelink_cmds_CXX='tpldir=Template.dir~
rm -rf $tpldir~
$CC --prelink_objects --instantiation_dir $tpldir $objs 
$libobjs $compile_deplibs~
-   compile_command="$compile_command `find $tpldir -name \*.o | 
$NL2SP`"'
+   compile_command="$compile_command `find $tpldir -name \*.o | 
sort | $NL2SP`"'
  old_archive_cmds_CXX='tpldir=Template.dir~
rm -rf $tpldir~
$CC --prelink_objects --instantiation_dir $tpldir 
$oldobjs$old_deplibs~
-   $AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name 
\*.o | $NL2SP`~
+

[PATCH] Use vectored writes when reporting errors and warnings.

2018-09-12 Thread Janne Blomqvist
When producing error and warning messages, libgfortran writes a
message by using many system calls.  By using vectored writes (the
POSIX writev function) when available and feasible to use without
major surgery, we reduce the chance that output gets intermingled with
other output to stderr.

In practice, this is done by introducing a new function estr_writev in
addition to the existing estr_write.  In order to use this, the old
st_vprintf is removed, replaced by direct calls of vsnprintf, allowing
more message batching.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2018-09-12  Janne Blomqvist  

* config.h.in: Regenerated.
* configure: Regenerated.
* configure.ac: Check for writev and sys/uio.h.
* libgfortran.h: Include sys/uio.h.
(st_vprintf): Remove prototype.
(struct iovec): Define if not available.
(estr_writev): New prototype.
* runtime/backtrace.c (error_callback): Use estr_writev.
* runtime/error.c (ST_VPRINTF_SIZE): Remove.
(estr_writev): New function.
(st_vprintf): Remove.
(gf_vsnprintf): New function.
(ST_ERRBUF_SIZE): New macro.
(st_printf): Use vsnprintf.
(os_error): Use estr_writev.
(runtime_error): Use vsnprintf and estr_writev.
(runtime_error_at): Likewise.
(runtime_warning_at): Likewise.
(internal_error): Use estr_writev.
(generate_error_common): Likewise.
(generate_warning): Likewise.
(notify_std): Likewise.
* runtime/pause.c (pause_string): Likewise.
* runtime/stop.c (report_exception): Likewise.
(stop_string): Likewise.
(error_stop_string): Likewise.
---
 libgfortran/config.h.in |   6 +
 libgfortran/configure   |  10 +-
 libgfortran/configure.ac|   4 +-
 libgfortran/libgfortran.h   |  15 ++-
 libgfortran/runtime/backtrace.c |  27 +++--
 libgfortran/runtime/error.c | 188 +++-
 libgfortran/runtime/pause.c |  14 ++-
 libgfortran/runtime/stop.c  |  71 +---
 8 files changed, 250 insertions(+), 85 deletions(-)

diff --git a/libgfortran/config.h.in b/libgfortran/config.h.in
index 65fd27c6b11..c7f47146030 100644
--- a/libgfortran/config.h.in
+++ b/libgfortran/config.h.in
@@ -762,6 +762,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_SYS_TYPES_H
 
+/* Define to 1 if you have the  header file. */
+#undef HAVE_SYS_UIO_H
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_SYS_WAIT_H
 
@@ -828,6 +831,9 @@
 /* Define if target has a reliable stat. */
 #undef HAVE_WORKING_STAT
 
+/* Define to 1 if you have the `writev' function. */
+#undef HAVE_WRITEV
+
 /* Define to 1 if you have the  header file. */
 #undef HAVE_XLOCALE_H
 
diff --git a/libgfortran/configure b/libgfortran/configure
index a583b676a3e..1c93683acd2 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -2553,6 +2553,7 @@ as_fn_append ac_header_list " sys/times.h"
 as_fn_append ac_header_list " sys/resource.h"
 as_fn_append ac_header_list " sys/types.h"
 as_fn_append ac_header_list " sys/stat.h"
+as_fn_append ac_header_list " sys/uio.h"
 as_fn_append ac_header_list " sys/wait.h"
 as_fn_append ac_header_list " floatingpoint.h"
 as_fn_append ac_header_list " ieeefp.h"
@@ -2584,6 +2585,7 @@ as_fn_append ac_func_list " access"
 as_fn_append ac_func_list " fork"
 as_fn_append ac_func_list " setmode"
 as_fn_append ac_func_list " fcntl"
+as_fn_append ac_func_list " writev"
 as_fn_append ac_func_list " gettimeofday"
 as_fn_append ac_func_list " stat"
 as_fn_append ac_func_list " fstat"
@@ -12514,7 +12516,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12517 "configure"
+#line 12519 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12620,7 +12622,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12623 "configure"
+#line 12625 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -16168,6 +16170,8 @@ done
 
 
 
+
+
 
 
 
@@ -16763,6 +16767,8 @@ done
 
 
 
+
+
 
 
 
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 05952aa0d40..64f7b9a39d5 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -276,7 +276,7 @@ AC_CHECK_TYPES([ptrdiff_t])
 
 # check header files (we assume C89 is available, so don't check for that)
 AC_CHECK_HEADERS_ONCE(unistd.h sys/random.h sys/time.h sys/times.h \
-sys/resource.h sys/types.h sys/stat.h sys/wait.h \
+sys/resource.h sys/types.h sys/stat.h sys/uio.h sys/wait.h \
 floatingpoint.h ieeefp.h fenv.h fptrap.h \
 fpxcp.h pwd.h complex.h xlocale.h)
 
@@ -315,7 +315,7 @@ else
AC_CHECK_FUNCS_ONCE(getrusage times mkstemp strtof strtold snprintf \
ftruncate chsize chdir getentropy getlogin gethostname kill link symlink \
sleep ttyname \
-   alarm a

Re: [PATCH][4/4][v2] RPO-style value-numbering for FRE/PRE

2018-09-12 Thread John David Anglin
On Wed, Sep 05, 2018 at 12:12:39AM +0200, Gerald Pfeifer wrote:
> On Fri, 24 Aug 2018, Richard Biener wrote:
> > Comments are still welcome - I've re-bootstrapped and tested the series
> > on x86_64-unknown-linux-gnu for all languages and will talk about
> > this work@the Cauldron in more detail.
> 
> Is there any chance you can test this on i586 as well?
> 
> Since around that commit (August 27th) my i586 builds are failing
> with something like
> 
>   during GIMPLE pass: pre
>   cp-demangle.c: In function ?d_print_comp?:
>   cp-demangle.c:5711:1: internal compiler error: Segmentation fault
>   5711 | d_print_comp (struct d_print_info *dpi, int options,
>| ^~~~
> 
> when doing a regular bootstrap on i586-unknown-freebsd10.4 (with
> clang 3.4.1 as system compiler).
> 
> Alas, when I add -save-temps, the internal compiler error goes away,
> and the backtrace using gdb installed on that system I share isn't 
> very useful, either.  (When I replace cp-demangle.c by cp-demangle.i
> in the invocation the error also disppears.)
> 
> On the other hand, this ICE has been consistent across a week of
> daily builds now.

I see the same thing on hppa2.0w-hp-hpux11.11.  I did a regression search
and 263875 is the culprit.  I've seen segmentation faults in cp-demangle.c
and macro.c.  Again, this is tough to debug.

Dave


Re: [PATCH] Re: reconfigured configure arg

2018-09-12 Thread Joseph Myers
On Wed, 12 Sep 2018, Jakub Jelinek wrote:

> Would the following minimal patch be acceptable, which just doesn't append
> the " : (reconfigured) " + arguments if the configure arguments already end
> with exactly that?
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested with
> ./config.status --recheck done the first time after configure (reconfigured
> added) and second and following time (not added).  Ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++, wwwdocs] bugs/index.html - complete C++ non-bug entry

2018-09-12 Thread Jason Merrill
On Wed, Sep 12, 2018 at 6:25 AM, Nathan Sidwell  wrote:
> On 9/2/18 10:26 AM, Gerald Pfeifer wrote:
>>
>> Jason and Nathan,
>>
>> while I completed one C+ non-bug entry with the patch below, I am
>> not sure the items as such is really still relevant?
>>
>> In fact, if you could have a look at https://gcc.gnu.org/bugs/#nonbugs_cxx
>> and advise which entries perhaps should be removed (or updated or
>> added), I'll take care.
>
>
> sorry for the tardiness.
>
> 1) export
> exported templates were removed as a feature in C++11.  'export' remains a
> reserved keyword.  It is being recycled for C++ modules, currently an
> experimental TS.

Yes, the heading should say "C++98 export".

> 2) G++ emits two copies of constructors and destructors.
> IIUC the ABI has been corrected to allow a single cdtor in cases where they
> are the same.  We emit compatibility(?) aliases on targets that support
> them.  Jason?

The ABI hasn't changed, we just do better about avoiding code
duplication.  Nowadays we mostly only emit two separate complete
functions for variadic constructors in a class with virtual bases.
So, rarely.  But there are still two symbols for the different entry
points.

> 3) Global destructors are not run in the correct order.
> I think --use-cxa-atexit is enabled by default on most (nearly all) targets
> these days.

We certainly try to enable it everywhere __cxa_atexit is available,
which is indeed pretty common at this point.

> 4) Classes in exception specifiers must be complete types.
> While this is still correct, exception specifiers were deprecated in C++11.
> They were removed in C++17.  We still support them for compatibility.
> Modern code should use noexcept, or noexcept(boolean-constant-expression)

Yeah, this isn't interesting anymore.

> 5) Exceptions don't work in multithreaded applications.
> Correct, but as with cxa-atexit, usually enabled by default on threading
> systems
>
> 6) Templates, scoping, and digraphs.
> again, correct, but we give a very clueful diagnostic:
> digraph.cc:8:2: error: ‘<::’ cannot begin a template-argument list
> [-fpermissive]
>  B<::X::A> *p;
>   ^~
> digraph.cc:8:2: note: ‘<:’ is an alternate spelling for ‘[’. Insert
> whitespace between ‘<’ and ‘::’
> digraph.cc:8:2: note: (if you use ‘-fpermissive’ or ‘-std=c++11’, or
> ‘-std=gnu++11’ G++ will accept your code)
>
> 7) Copy constructor access check while initializing a reference.
> This example is no longer correct, it compiles without error (including
> c++98 mode).  I think compilers now have to know about about-to-die
> temporaries and move them?

As the document says, we allow this as of GCC 4.3.  At this point it
isn't useful to keep this entry around.

> 8) ABI changes
> We should probably document the std-forced ABI changes?

The text that says we change the ABI with each major version is no
longer accurate since we moved to bumping the major version every
year.  This should definitely be updated.

Jason


Re: [PATCH] fixincludes: vxworks: regs.h: Guard include of vxTypesOld.h by !_ASMLANGUAGE

2018-09-12 Thread Rasmus Villemoes
On 2018-09-05 11:38, Olivier Hainque wrote:
> Hi Rasmus,
> 
> I think we should either do a fixinclude that would "work" for
> C and ASM (like #include vxCpu for ASM, vxTypesOld otherwise), or
> simply remove this hack (just using the fixinclude parlance here).
> 
> My inclination would be for the latter.
> 
[snip]
> 
> What happens on your end if you just remove the hack ?
> 

Unfortunately, the libstdc++ build breaks:

In file included from
/usr/powerpc-wrs-vxworks/wind_base/target/h/regs.h:66:0,
 from
/bld/vestas/auto/work.20180912.214646/gcc-build/gcc/include-fixed/setjmp.h:57,
 from
/bld/vestas/auto/work.20180912.214646/gcc-build/powerpc-wrs-vxworks/libstdc++-v3/include/csetjmp:42,
 from
/bld/vestas/auto/work.20180912.214646/gcc-src/libstdc++-v3/include/precompiled/stdc++.h:42:
/usr/powerpc-wrs-vxworks/wind_base/target/h/arch/ppc/regsPpc.h:33:5:
error: 'UINT32' does not name a type
 UINT32 cr;   /* condition register */
 ^~

I'm happy to add an include of vxCpu in the _ASMLANGUAGE case, along
with a big comment. But, it's also a small enough patch that we can
carry it internally, if you prefer that we don't touch this hack upstream.

Thanks,
Rasmus




Go patch committed: Omit a couple of write barriers

2018-09-12 Thread Ian Lance Taylor
This patch to the Go frontend omits a couple of write barriers.

We omit a write barrier for
s = s[0:]
for a slice s.  In this case the pointer is not changing and no write
barrier is required.

We omit a write barrier for
s = append(s, v)
in the case where len(s) < cap(s) (and similarly when appending more
values).  When the slice has enough capacity the pointer is not
changing and no write barrier is required.

These changes are required to avoid write barriers in the method
randomOrder.reset in the runtime package.  That method is called from
procresize, at a point where we do not want to allocate memory.
Otherwise that method can use a write barrier, allocate memory, and
break TestReadMemStats.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 264245)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-06e688ff6d829c8de3735e9f59b61b373afc596f
+acf852f838e6b99f907d84648be853fa2c374393
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 264245)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -131,6 +131,40 @@ Expression::determine_type_no_context()
   this->do_determine_type(&context);
 }
 
+// Return true if two expressions refer to the same variable or struct
+// field.  This can only be true when there are no side effects.
+
+bool
+Expression::is_same_variable(Expression* a, Expression* b)
+{
+  if (a->classification() != b->classification())
+return false;
+
+  Var_expression* av = a->var_expression();
+  if (av != NULL)
+return av->named_object() == b->var_expression()->named_object();
+
+  Field_reference_expression* af = a->field_reference_expression();
+  if (af != NULL)
+{
+  Field_reference_expression* bf = b->field_reference_expression();
+  return (af->field_index() == bf->field_index()
+ && Expression::is_same_variable(af->expr(), bf->expr()));
+}
+
+  Unary_expression* au = a->unary_expression();
+  if (au != NULL)
+{
+  Unary_expression* bu = b->unary_expression();
+  return (au->op() == OPERATOR_MULT
+ && bu->op() == OPERATOR_MULT
+ && Expression::is_same_variable(au->operand(),
+ bu->operand()));
+}
+
+  return false;
+}
+
 // Return an expression handling any conversions which must be done during
 // assignment.
 
@@ -7421,7 +7455,7 @@ Builtin_call_expression::do_flatten(Gogo
   break;
 
 case BUILTIN_APPEND:
-  return this->flatten_append(gogo, function, inserter);
+  return this->flatten_append(gogo, function, inserter, NULL, NULL);
 
 case BUILTIN_COPY:
   {
@@ -7658,11 +7692,18 @@ Builtin_call_expression::lower_make(Stat
 
 // Flatten a call to the predeclared append function.  We do this in
 // the flatten phase, not the lowering phase, so that we run after
-// type checking and after order_evaluations.
+// type checking and after order_evaluations.  If ASSIGN_LHS is not
+// NULL, this append is the right-hand-side of an assignment and
+// ASSIGN_LHS is the left-hand-side; in that case, set LHS directly
+// rather than returning a slice.  This lets us omit a write barrier
+// in common cases like a = append(a, ...) when the slice does not
+// need to grow.  ENCLOSING is not NULL iff ASSIGN_LHS is not NULL.
 
 Expression*
 Builtin_call_expression::flatten_append(Gogo* gogo, Named_object* function,
-   Statement_inserter* inserter)
+   Statement_inserter* inserter,
+   Expression* assign_lhs,
+   Block* enclosing)
 {
   if (this->is_error_expression())
 return this;
@@ -7679,6 +7720,8 @@ Builtin_call_expression::flatten_append(
   if (args->size() == 1)
 {
   // append(s) evaluates to s.
+  if (assign_lhs != NULL)
+   return NULL;
   return args->front();
 }
 
@@ -7795,14 +7838,46 @@ Builtin_call_expression::flatten_append(
   // FIXME: Mark this index as not requiring bounds checks.
   ref = Expression::make_index(ref, zero, ref2, NULL, loc);
 
-  Expression* rhs = Expression::make_conditional(cond, call, ref, loc);
+  if (assign_lhs == NULL)
+{
+  Expression* rhs = Expression::make_conditional(cond, call, ref, loc);
+
+  gogo->lower_expression(function, inserter, &rhs);
+  gogo->flatten_expression(function, inserter, &rhs);
 
-  gogo->lower_expression(function, inserter, &rhs);
-  gogo->flatten_expression(function, inserter, &rhs);
+  ref = Expression::make_temporary_reference(s1tmp, loc);
+  Statement* assign = Statement::m

Re: [PATCH] Add a dwarf unit type to represent 24 bit values.

2018-09-12 Thread John Darrington
On Mon, Sep 10, 2018 at 04:40:58PM +0100, Jason Merrill wrote:
 On Mon, Sep 10, 2018 at 3:42 PM, John Darrington
  wrote:
 > On Mon, Sep 10, 2018 at 03:36:26PM +0100, Jason Merrill wrote:
 >  On Mon, Aug 27, 2018 at 8:20 PM, John Darrington
 >   wrote:
 >  > * include/dwarf2.h (enum dwarf_unit_type) 
[DE_EH_PE_udata3]: New member.
 >
 >
 >  What's the rationale?  Do you have a separate patch that uses this 
new macro?
 >
 > Yes.   I there is an upcoming patch for GDB.  See
 > https://sourceware.org/ml/gdb-patches/2018-08/msg00731.html
 
 This looks like support for reading fixed 3-byte values from the
 exception handling unwind information.  Do you expect this information
 to ever need to store 3-byte values? 

Yes,  I've already come across that (which is why I created the patch).
Without this patch, GDB will barf.  See the attached backtrace.

 The offsets in the unwind info
 don't need to correspond exactly to target word sizes,

So far as I can tell, the dwarf-unwinder assumes that they do.  Chaning
it would require a rewrite.  I doubt the GDB team would be happy with
that.

 and if you use
 an assembler that supports it (such as GNU as), the table will use
 variable-length encoding anyway.

I must admit I don't understand what you are saying here.  I was using
GNU as.  To what table are you refering?
 

J'
(top-gdb) bt
#0  internal_error (file=0x100afeac0 <_ZN3gdbL8in_placeE+86> 
"/home/john/binutils-gdb/gdb/dwarf2-frame.c", line=1535,
fmt=0x100aff05b  "Unsupported address size %d") at 
/home/john/binutils-gdb/gdb/common/errors.c:54
#1  0x00010055bce1 in encoding_for_size (size=3) at 
/home/john/binutils-gdb/gdb/dwarf2-frame.c:1535
#2  0x00010055be64 in read_encoded_value (unit=0x60086bf80, encoding=0 
'\000', ptr_len=3, buf=0x60087c300 "",
bytes_read_ptr=0xb82c, func_base=0) at 
/home/john/binutils-gdb/gdb/dwarf2-frame.c:1590
#3  0x00010055cd63 in decode_frame_entry_1 (unit=0x60086bf80, 
start=0x60087c2f8 "", eh_frame_p=0, cie_table=0xb970,
fde_table=0xb960, entry_type=EH_CIE_OR_FDE_TYPE_ID) at 
/home/john/binutils-gdb/gdb/dwarf2-frame.c:2046
#4  0x00010055cefd in decode_frame_entry (unit=0x60086bf80, 
start=0x60087c2f8 "", eh_frame_p=0, cie_table=0xb970,
fde_table=0xb960, entry_type=EH_CIE_OR_FDE_TYPE_ID) at 
/home/john/binutils-gdb/gdb/dwarf2-frame.c:2101
#5  0x00010055d3fc in dwarf2_build_frame_info (objfile=0x60068a300) at 
/home/john/binutils-gdb/gdb/dwarf2-frame.c:2297
#6  0x00010055c355 in dwarf2_frame_find_fde (pc=0xbac0, out_offset=0x0) 
at /home/john/binutils-gdb/gdb/dwarf2-frame.c:1719
#7  0x00010055b97c in dwarf2_frame_sniffer (self=0x100afef00 
, this_frame=0x600810530, this_cache=0x600810548)
at /home/john/binutils-gdb/gdb/dwarf2-frame.c:1349
#8  0x0001005cbd78 in frame_unwind_try_unwinder (this_frame=0x600810530, 
this_cache=0x600810548,
unwinder=0x100afef00 ) at 
/home/john/binutils-gdb/gdb/frame-unwind.c:106
#9  0x0001005cbf35 in frame_unwind_find_by_frame (this_frame=0x600810530, 
this_cache=0x600810548)
at /home/john/binutils-gdb/gdb/frame-unwind.c:165
#10 0x0001005ca77c in get_frame_type (frame=0x600810530) at 
/home/john/binutils-gdb/gdb/frame.c:2623
#11 0x0001006cb9c7 in print_frame_info (frame=0x600810530, print_level=0, 
print_what=SRC_AND_LOC, print_args=1, set_current_sal=1)
at /home/john/binutils-gdb/gdb/stack.c:795
#12 0x0001006ca373 in print_stack_frame (frame=0x600810530, print_level=0, 
print_what=SRC_AND_LOC, set_current_sal=1)
at /home/john/binutils-gdb/gdb/stack.c:176
#13 0x000100611c8f in print_stop_location (ws=0xbec0) at 
/home/john/binutils-gdb/gdb/infrun.c:8042
#14 0x000100611ceb in print_stop_event (uiout=0x6006784d0) at 
/home/john/binutils-gdb/gdb/infrun.c:8059
#15 0x00010041b911 in tui_on_normal_stop (bs=0x0, print_frame=1) at 
/home/john/binutils-gdb/gdb/tui/tui-interp.c:98
#16 0x0001009a8986 in std::_Function_handler::_M_invoke(std::_Any_data const&, bpstats*&&,   
   int&&) (__functor=..., __args#0=,
__args#1=)
at /usr/lib/gcc/x86_64-linux-gnu/7.3.0/include/c++/bits/std_function.h:316
#17 0x0001009273ed in std::function::operator()(bpstats*, int) const (this=0x600155a30, __args#0=0x0, 
__args#1=1)
at /usr/lib/gcc/x86_64-linux-gnu/7.3.0/include/c++/bits/std_function.h:706
#18 0x0001008c891f in gdb::observers::observable::notify 
(this=0x100cf2aa0 , args#0=0x0,
args#1=1) at /home/john/binutils-gdb/gdb/common/observable.h:106
#19 0x0001006123f9 in normal_stop () at 
/home/john/binutils-gdb/gdb/infrun.c:8339
#20 0x000100607bf1 in start_remote (from_tty=1) at 
/home/john/binutils-gdb/gdb/infrun.c:3236
#21 0x00010068fdcf in remote_target::start_remote (this=0x600850d00, 
from_tty=1, extended_p=0)
at /home/john/binutils-gdb/gdb/remote.c:4767
#22 0x00010069

[PATCH] PR libstdc++/87135 Rehash only when necessary (LWG2156)

2018-09-12 Thread François Dumont

All changes limited to hashtable_c++0x.cc file.

_Prime_rehash_policy::_M_next_bkt now really does what its comment was 
declaring that is to say:

  // Return a prime no smaller than n.

_Prime_rehash_policy::_M_need_rehash rehash only when _M_next_size is 
exceeded, not only when it is reach.


    PR libstdc++/87135
    * src/c++11/hashtable_c++0x.cc:
    (_Prime_rehash_policy::_M_next_bkt): Return a prime no smaller than
    requested size, but not necessarily greater.
    (_Prime_rehash_policy::_M_need_rehash): Rehash only if target size is
    strictly greater than next resize threshold.
    * testsuite/23_containers/unordered_map/modifiers/reserve.cc: Adapt 
test

    to validate that there is no rehash as long as number of insertion is
    lower or equal to the reserved number of elements.

unordered_map tests successful, ok to commit once all other tests 
completed ?


François
diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index a776a8506fe..ec6031b3f5b 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,10 +46,10 @@ namespace __detail
   {
 // Optimize lookups involving the first elements of __prime_list.
 // (useful to speed-up, eg, constructors)
-static const unsigned char __fast_bkt[13]
-  = { 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };
+static const unsigned char __fast_bkt[]
+  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };
 
-if (__n <= 12)
+if (__n < sizeof(__fast_bkt) / sizeof(unsigned char))
   {
 	_M_next_resize =
 	  __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor);
@@ -65,9 +65,8 @@ namespace __detail
 // iterator that can be dereferenced to get the last prime.
 constexpr auto __last_prime = __prime_list + __n_primes - 1;
 
-// Look for 'n + 1' to make sure returned value will be greater than n.
 const unsigned long* __next_bkt =
-  std::lower_bound(__prime_list + 6, __last_prime, __n + 1);
+  std::lower_bound(__prime_list + 6, __last_prime, __n);
 
 if (__next_bkt == __last_prime)
   // Set next resize to the max value so that we never try to rehash again
@@ -95,7 +94,7 @@ namespace __detail
   _M_need_rehash(std::size_t __n_bkt, std::size_t __n_elt,
 		 std::size_t __n_ins) const
   {
-if (__n_elt + __n_ins >= _M_next_resize)
+if (__n_elt + __n_ins > _M_next_resize)
   {
 	long double __min_bkts = (__n_elt + __n_ins)
    / (long double)_M_max_load_factor;
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc
index e9cf7fd6f67..7f34325df87 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc
@@ -18,23 +18,46 @@
 // .
 
 #include 
+
 #include 
 
 void test01()
 {
-  const int N = 1000;
-
   typedef std::unordered_map Map;
   Map m;
-  m.reserve(N);
 
-  std::size_t bkts = m.bucket_count();
-  for (int i = 0; i != N; ++i)
+  // Make sure max load factor is 1 so that reserved elements is directly
+  // the bucket count.
+  m.max_load_factor(1);
+
+  int i = -1;
+  for (;;)
 {
-  m.insert(std::make_pair(i, i));
-  // As long as we insert less than the reserved number of elements we
-  // shouldn't experiment any rehash.
+  m.reserve(m.bucket_count());
+
+  std::size_t bkts = m.bucket_count();
+
+  m.reserve(bkts);
   VERIFY( m.bucket_count() == bkts );
+
+  for (++i; i < bkts; ++i)
+	{
+	  m.insert(std::make_pair(i, i));
+
+	  // As long as we insert less than the reserved number of elements we
+	  // shouldn't experiment any rehash.
+	  VERIFY( m.bucket_count() == bkts );
+
+	  VERIFY( m.load_factor() <= m.max_load_factor() );
+	}
+
+  // One more element should rehash.
+  m.insert(std::make_pair(i, i));
+  VERIFY( m.bucket_count() != bkts );
+  VERIFY( m.load_factor() <= m.max_load_factor() );
+
+  if (i > 1024)
+	break;
 }
 }
 


Re: [PATCH] Move signed to unsigned x % c == 0 optimization from fold-const to match.pd (PR tree-optimization/87287)

2018-09-12 Thread Jeff Law
On 9/12/18 12:31 PM, Jakub Jelinek wrote:
> Hi!
> 
> While working on the next PR, I've noticed we only fold this on generic, not
> gimple.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2018-09-12  Jakub Jelinek  
> 
>   PR tree-optimization/87287
>   * fold-const.c (fold_binary_loc) : Move signed modulo
>   X % C == 0 to X % (unsigned) C == 0 optimization to ...
>   * match.pd (X % C == 0): ... here.  New optimization.
> 
>   * gcc.dg/tree-ssa/pr87287.c: New test.
OK.
jeff


Re: [PATCH, middle-end]: Default the mode of pop and swap insns to the reg_raw_mode of FIRST_STACK_REG

2018-09-12 Thread Jeff Law
On 9/12/18 12:34 PM, Uros Bizjak wrote:
> Hello!
> 
> Although reg-stack.c is mostly an x86 affair, attached patch decouples
> it from x86 a bit. The patch changes the default mode of pop and swap
> insns to the reg_raw_mode of FIRST_STACK_REG (= XFmode on x86). Also,
> the patch explicitly constructs swap insn, without calling x86
> specific named insn pattern.
> 
> 2018-09-11  Uros Bizjak  
> 
> * reg-stack.c: Include regs.h.
> (replace_reg): Assert that mode is MODE_FLOAT or MODE_COMPLEX_FLOAT.
> (emit_pop_insn): Default pop insn mode to the reg_raw_mode of
> FIRST_STACK_REG, not DFmode.
> (emit_swap_insn): Default swap insn mode to the reg_raw_mode of
> FIRST_STACK_REG, not XFmode.  Explicitly construct swap RTX.
> (change stack): Default register mode to the reg_raw_mode of
> FIRST_STACK_REG, not DFmode.
> * config/i386/i386.md (*swap): Remove insn pattern.
> (*swapxf): Rename from swapxf.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> No functional changes, but the patch needs middle-end approval.
> 
> OK for mainline?
OK.
jeff