Re: [PATCH 0/3] mklog improvements

2019-09-04 Thread Martin Liška

On 9/3/19 6:37 PM, Jeff Law wrote:

On 8/30/19 2:55 AM, Martin Liška wrote:

PING^1

On 8/13/19 9:49 AM, Martin Liska wrote:

Hi.

I'm sending format-independent changes to mklog that should
improve the script. They address a couple of improvements
mentioned here:
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00031.html

Martin

Martin Liska (3):
   Use argparse.ArgumentParser for mklog.
   mklog: parse PR references from new test files
   mklog: Do not print changed functions in testsuite

So there's a ton of follow-ups (primarily between Jakub and yourself).


Yes, the follow-up is about a possible change to the ChangeLog format. I first
provided a patch, but later realized that I would rather stick with
the current format.



What's the state of the changes Jakub has requested?  Can we go forward
with the patch as-is and iterate on further improvements?


Yes, I'm going to install these 3 patches which do not change the format.

Martin



Jeff





Re: [PATCH] Deprecate -frepo option.

2019-09-04 Thread Martin Liška

On 7/9/19 3:00 PM, Martin Liška wrote:

Great. Then I'm sending a patch that does the functionality removal.


Hi.

GCC 9.2.0 has been released and contains a deprecation
warning for -frepo.

Is now the right time to remove it from trunk?
Thanks,
Martin


Force IFN_LOAD/STORE_LANES operands to be memory (PR91577)

2019-09-04 Thread Richard Sandiford
This patch uses the workaround Richi suggested in the PR: make
discover_nonconstant_array_refs mark the source of an IFN_LOAD_LANES
call and the destination of an IFN_STORE_LANES call as addressable,
so that they don't end up being REG rtxes during expansion.

I had to move the discover_nonconstant_array_refs call outside the
currently_expanding_to_rtl block since otherwise mark_addressable
just queues the decision for later.

Tested on aarch64-linux-gnu.  OK to install?

Richard


2019-09-04  Richard Sandiford  

gcc/
PR middle-end/91577
* cfgexpand.c (discover_nonconstant_array_refs): Force the source
of an IFN_LOAD_LANES call and the destination of an IFN_STORE_LANES
call to be in memory.
(pass_expand::execute): Call discover_nonconstant_array_refs before
setting currently_expanding_to_rtl.

gcc/testsuite/
PR middle-end/91577
* gfortran.dg/pr91577.f90: New test, taken from temporary_1.f90.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2019-07-12 11:33:27.712037291 +0100
+++ gcc/cfgexpand.c 2019-09-04 10:16:17.798875071 +0100
@@ -6155,7 +6155,24 @@ discover_nonconstant_array_refs (void)
   {
gimple *stmt = gsi_stmt (gsi);
if (!is_gimple_debug (stmt))
- walk_gimple_op (stmt, discover_nonconstant_array_refs_r, NULL);
+ {
+   walk_gimple_op (stmt, discover_nonconstant_array_refs_r, NULL);
+   gcall *call = dyn_cast  (stmt);
+   if (call && gimple_call_internal_p (call))
+ switch (gimple_call_internal_fn (call))
+   {
+   case IFN_LOAD_LANES:
+ /* The source must be a MEM.  */
+ mark_addressable (gimple_call_arg (call, 0));
+ break;
+   case IFN_STORE_LANES:
+ /* The destination must be a MEM.  */
+ mark_addressable (gimple_call_lhs (call));
+ break;
+   default:
+ break;
+   }
+ }
   }
 }
 
@@ -6353,6 +6370,9 @@ pass_expand::execute (function *fun)
avoid_deep_ter_for_debug (gsi_stmt (gsi), 0);
 }
 
+  /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE.  */
+  discover_nonconstant_array_refs ();
+
   /* Make sure all values used by the optimization passes have sane
  defaults.  */
   reg_renumber = 0;
@@ -6387,9 +6407,6 @@ pass_expand::execute (function *fun)
  Also, final expects a note to appear there.  */
   emit_note (NOTE_INSN_DELETED);
 
-  /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE.  */
-  discover_nonconstant_array_refs ();
-
   targetm.expand_to_rtl_hook ();
   crtl->init_stack_alignment ();
   fun->cfg->max_jumptable_ents = 0;
Index: gcc/testsuite/gfortran.dg/pr91577.f90
===
--- /dev/null   2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gfortran.dg/pr91577.f90   2019-09-04 10:16:17.798875071 +0100
@@ -0,0 +1,28 @@
+! { dg-do run }
+! { dg-additional-options "--param max-completely-peel-loop-nest-depth=1" }
+! PR 27662. Don't zero the first stride to indicate a temporary. It
+! may be used later.
+program pr27662
+ implicit none
+ real(kind=kind(1.0d0)), dimension (2, 2):: x, y, z;
+ integer i, j
+ x(1,1) = 1.d0
+ x(2,1) = 0.d0
+ x(1,2) = 0.d0
+ x(2,2) = 1.d0
+ z = matmul (x, transpose (test ()))
+ do i = 1, size (x, 1)
+   do j = 1, size (x, 2)
+ if (x (i, j) .ne. z (i, j)) STOP 1
+   end do
+ end do
+
+contains
+ function test () result (res)
+   real(kind=kind(1.0d0)), dimension(2,2) :: res
+   res(1,1) = 1.d0
+   res(2,1) = 0.d0
+   res(1,2) = 0.d0
+   res(2,2) = 1.d0
+ end function
+end


Re: [PATCH] PR fortran/91551 -- ALLOCATED has one argument

2019-09-04 Thread Janne Blomqvist
On Wed, Aug 28, 2019 at 3:43 AM Steve Kargl
 wrote:
>
> The attach patch was built and tested on i586-*-freebsd.
> It includes a check for ALLOCATED with no arguments.
> OK to commit?

Ok.


-- 
Janne Blomqvist


Re: [PATCH] Deprecate -frepo option.

2019-09-04 Thread Jonathan Wakely
On Wed, 4 Sep 2019 at 09:13, Martin Liška wrote:
>
> On 7/9/19 3:00 PM, Martin Liška wrote:
> > Great. Then I'm sending a patch that does the functionality removal.
>
> Hi.
>
> GCC 9.2.0 has been released and contains a deprecation
> warning for -frepo.
>
> Is now the right time to remove it from trunk?

If users upgrade to 9.2 in 6 months, see the new warning, and tell us
they're using the feature, will it be possible to re-enable it?

The point of the warning was to see if users complain. Three weeks
isn't a very long time to see if users are going to complain :-)


Re: [PATCH v2] [MIPS] Fix handling of MSA SUBREG moves on big-endian targets

2019-09-04 Thread Richard Sandiford
Jeff Law  writes:
> On 9/3/19 2:16 AM, Mihailo Stojanovic wrote:
>> From: Mihailo Stojanovic 
>> 
>> Hi everybody,
>> 
>> This fixes the MSA implementation on big-endian targets which is
>> essentially broken for things like SUBREG handling and calling
>> convention for vector types. It borrows heavily from [1] as Aarch64 has
>> the same problem with SVE vectors.
>> 
>> Conceptually, register bitconverts should act as if the data had been
>> stored to memory in one mode, and loaded from memory in the other.
>> This isn't what happens on big-endian as vector load/store instructions
>> are essentially mixed-endian with respect to the vector as a whole.
>> The in-register representation of data must be changed so that the
>> load/store round-trip becomes valid. This is done by inserting one or
>> two shuffle instructions for every SUBREG move, as previously
>> implemented in [2] for LLVM. Even if the shuffle instructions weren't
>> generated, constraint in mips_can_change_mode_class would force the
>> conceptual memory reload of SUBREG move operand, which would generate
>> correct, albeit very inefficient code.
>> 
>> New msa_reg_predicate was created in order to forbid SUBREG operands in
>> MSA patterns on big-endian targets. It weeds SUBREGs out of the
>> instruction patterns into SUBREG->REG moves which are caught by the new
>> msa_mov_subreg_be pattern and transformed into shuffle(s).
>> 
>> As for the MSA calling convention, the ABI states that compiling for MSA
>> should not change the base ABI's vector calling convention, that is, MSA
>> vectors passed or returned by value do not use the MSA vector registers.
>> Instead, they are passed by general-purpose registers, as described by
>> the ABI. Passing the vector argument requires splitting it into 2 (or 4)
>> general-purpose registers and bitconverting it into V2DImode (or
>> V4SImode). The solution boils down to the one presented for SUBREG
>> moves: force every vector argument to the appropriate mode (V2DI or
>> V4SI) so that the shuffle instruction(s) might be generated in order to
>> conform to the calling convention. The same applies to vectors as return
>> values.
>> 
>> New testcases were added (thanks Dragan!) to check calling convention
>> compliance for all possible combinations of MSA and non-MSA
>> interlinking.
>> 
>> This fixes the following vectorization test failures:
>> 
>> vect/pr66251.c
>> vect/vect-alias-check-10.c
>> vect/vect-alias-check-11.c
>> vect/vect-alias-check-12.c
>> vect/vect-bswap32.c
>> vect/vect-bswap64.c
>> vect/vect-nop-move.c
>> vect/slp-41.c
>> vect/slp-43.c
>> vect/slp-45.c
>> vect/slp-perm-11.c
>> 
>> Tested on mips32-mti-linux-gnu and mips64-mti-linux-gnu.
>> 
>> Regards,
>> Mihailo
>> 
>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02176.html
>> [2] https://reviews.llvm.org/rL189330
>> 
>> gcc/ChangeLog:
>> 
>> 2019-08-08  Mihailo Stojanovic  
>> 
>> * config/mips/mips-msa.md: Replace register_operand predicate with
>> msa_reg_operand in every pattern.
>> (msa_change_mode): New pattern.
>> (*msa_change_mode_): New unspec.
>> (*msa_mov_subreg_be): New unspec.
>> * config/mips/mips-protos.h (mips_split_msa_subreg_move): Declare.
>> * config/mips/mips.c (mips_maybe_expand_msa_subreg_move): New
>> function.
>> (mips_replace_reg_mode): Ditto.
>> (mips_split_msa_subreg_move): Ditto.
>> (mips_legitimize_move): Modify machine modes of MSA vectors which
>> reside in general-purpose registers. Check whether SUBREG move can
>> be replaced with shuffle(s).
>> (mips_split_128bit_move): Replace new SUBREGs with REGs.
>> (mips_split_msa_copy_d): Ditto.
>> (mips_split_msa_insert_d): Ditto.
>> (mips_split_msa_fill_d): Ditto.
>> (mips_can_change_mode_class): Disallow mode change which would
>> result in lane width change.
>> (mips_expand_vec_unpack): Replace new SUBREG with REG on big-endian
>> targets.
>> * config/mips/predicates.md (msa_reg_operand): New predicate.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2019-08-08  Dragan Mladjenovic  
>> 
>> * gcc.target/mips/inter/msa-inter.exp: New file.
>> * gcc.target/mips/inter/msa_1.h: New test.
>> * gcc.target/mips/inter/msa_1_main.c: New test.
>> * gcc.target/mips/inter/msa_1_x.c: New test.
>> * gcc.target/mips/inter/msa_1_y.c: New test.
> I don't guess Richard S's proposed cleanups to the generic handling of
> SUBREGs to address issues with SVE really help here.

Sorry, haven't applied that patch yet -- need to find time to retest.

But yeah, I don't think my patch will help.  For something like this
I think we'd need target hooks to control the subreg layout in registers.
Sounds like it should be doable, but probably quite tricky.

Thanks,
Richard


[PATCH V2] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)

2019-09-04 Thread Feng Xue OS
>> +  if (unmodified_parm_or_parm_agg_item (fbi, stmt, expr, index_p,
>> + size_p, aggpos))
>> + {
>> +   if (param_ops_p)
>> + {
>> +   /* Find use of parameter, add a convert operation to describe
>> +  result type, which may not be same as parameter type.  */
>> +   eval_op.val_is_rhs = false;
>> +   eval_op.val = NULL_TREE;
>> +   eval_op.code = VIEW_CONVERT_EXPR;
>> +   eval_op.type = TREE_TYPE (expr);
>> +
>> +   vec_safe_insert (*param_ops_p, 0, eval_op);

> If we get here in the first iteration of the loop, could we not insert
> anything into the vector and handle such cases in
> evaluate_conditions_for_known_args like we do today (well, with
> fold_convert might be better)?  It could save quite some memory and it
> is important to try keep the memory footprint down in IPA summaries.

This tricky element was removed; instead, TREE_TYPE of EXPR is stored
in the CONDITION struct.

> Also, I think you want a parameter to limit the maximum length of
> param_ops_p, at some point someone will come with some crazy
> machine-generated code that will create huge vectors.
Added.
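
To make this concrete, a hypothetical source-level example of the kind of
condition the generalized predicate can now evaluate is sketched below (the
function names and constants are invented for illustration; they do not come
from the patch or the PR).

/* Hypothetical C++ illustration (not from the patch): the kind of callee the
   generalized predicate targets.  The branch condition is not the bare
   parameter but a chain of simple operations on it; decompose_param_expr
   records those operations so evaluate_conditions_for_known_args can fold
   the branch when a caller passes a known constant.  */

static int expensive_path (unsigned flags) { return (int) flags * 3; }

int process (unsigned flags)
{
  /* Condition derived from the parameter via shift/mask/compare.  */
  if (((flags >> 4) & 0x3u) == 2u)
    return expensive_path (flags);
  return 0;
}

int main ()
{
  /* With the constant argument visible, IPA can now evaluate the condition
     (here it is false), which makes process much cheaper to inline or
     specialize for this call.  */
  return process (0x15);
}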

Feng
---
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 278bf606661..65a6befbcd2 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -296,9 +296,9 @@ set_hint_predicate (predicate **p, predicate new_predicate)
 }
 
 
-/* Compute what conditions may or may not hold given invormation about
+/* Compute what conditions may or may not hold given information about
parameters.  RET_CLAUSE returns truths that may hold in a specialized copy,
-   whie RET_NONSPEC_CLAUSE returns truths that may hold in an nonspecialized
+   while RET_NONSPEC_CLAUSE returns truths that may hold in an nonspecialized
copy when called in a given context.  It is a bitmask of conditions. Bit
0 means that condition is known to be false, while bit 1 means that 
condition
may or may not be true.  These differs - for example NOT_INLINED condition
@@ -331,6 +331,8 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
 {
   tree val;
   tree res;
+  int j;
+  struct expr_eval_op *op;
 
   /* We allow call stmt to have fewer arguments than the callee function
  (especially for K&R style programs).  So bound check here (we assume
@@ -382,7 +384,7 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  continue;
}
 
-  if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (val))), c->size))
+  if (TYPE_SIZE (c->type) != TYPE_SIZE (TREE_TYPE (val)))
{
  clause |= 1 << (i + predicate::first_dynamic_condition);
  nonspec_clause |= 1 << (i + predicate::first_dynamic_condition);
@@ -394,7 +396,19 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  continue;
}
 
-  val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c->val), val);
+  val = fold_unary (VIEW_CONVERT_EXPR, c->type, val);
+  for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
+   {
+ if (!val)
+   break;
+ if (!op->val)
+   val = fold_unary (op->code, op->type, val);
+ else if (op->val_is_rhs)
+   val = fold_binary_to_constant (op->code, op->type, val, op->val);
+ else
+   val = fold_binary_to_constant (op->code, op->type, op->val, val);
+   }
+
   res = val
? fold_binary_to_constant (c->code, boolean_type_node, val, c->val)
: NULL;
@@ -1157,6 +1171,110 @@ eliminated_by_inlining_prob (ipa_func_body_info *fbi, 
gimple *stmt)
 }
 }
 
+/* Analyze EXPR if it represents a series of simple operations performed on
+   a function parameter and return true if so.  FBI, STMT, EXPR, INDEX_P and
+   AGGPOS have the same meaning like in unmodified_parm_or_parm_agg_item.
+   Type of the parameter or load from an aggregate via the parameter is
+   stored in *TYPE_P.  Operations on the parameter are recorded to
+   PARAM_OPS_P if it is not NULL.  */
+
+static bool
+decompose_param_expr (struct ipa_func_body_info *fbi,
+ gimple *stmt, tree expr,
+ int *index_p, tree *type_p,
+ struct agg_position_info *aggpos,
+ expr_eval_ops *param_ops_p = NULL)
+{
+  int op_limit = PARAM_VALUE (PARAM_IPA_MAX_PARAM_EXPR_OPS);
+  int op_count = 0;
+
+  if (param_ops_p)
+*param_ops_p = NULL;
+
+  while (true)
+{
+  expr_eval_op eval_op;
+  poly_int64 size;
+
+  if (unmodified_parm_or_parm_agg_item (fbi, stmt, expr, index_p, &size,
+   aggpos))
+   {
+  tree type = TREE_TYPE (expr);
+
+ /* Stop if found bit-field whose size does not equal any natural
+integral type.  */
+ if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (type)), size))
+   break;
+
+ *type_p = type;
+   

[PATCH V3] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)

2019-09-04 Thread Feng Xue OS
Some minor changes.

Feng

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 278bf606661..89a6b8bf061 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -296,9 +296,9 @@ set_hint_predicate (predicate **p, predicate new_predicate)
 }
 
 
-/* Compute what conditions may or may not hold given invormation about
+/* Compute what conditions may or may not hold given information about
parameters.  RET_CLAUSE returns truths that may hold in a specialized copy,
-   whie RET_NONSPEC_CLAUSE returns truths that may hold in an nonspecialized
+   while RET_NONSPEC_CLAUSE returns truths that may hold in an nonspecialized
copy when called in a given context.  It is a bitmask of conditions. Bit
0 means that condition is known to be false, while bit 1 means that 
condition
may or may not be true.  These differs - for example NOT_INLINED condition
@@ -331,6 +331,8 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
 {
   tree val;
   tree res;
+  int j;
+  struct expr_eval_op *op;
 
   /* We allow call stmt to have fewer arguments than the callee function
  (especially for K&R style programs).  So bound check here (we assume
@@ -382,7 +384,7 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  continue;
}
 
-  if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (val))), c->size))
+  if (TYPE_SIZE (c->type) != TYPE_SIZE (TREE_TYPE (val)))
{
  clause |= 1 << (i + predicate::first_dynamic_condition);
  nonspec_clause |= 1 << (i + predicate::first_dynamic_condition);
@@ -394,7 +396,19 @@ evaluate_conditions_for_known_args (struct cgraph_node 
*node,
  continue;
}
 
-  val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c->val), val);
+  val = fold_unary (VIEW_CONVERT_EXPR, c->type, val);
+  for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
+   {
+ if (!val)
+   break;
+ if (!op->val)
+   val = fold_unary (op->code, op->type, val);
+ else if (op->val_is_rhs)
+   val = fold_binary_to_constant (op->code, op->type, val, op->val);
+ else
+   val = fold_binary_to_constant (op->code, op->type, op->val, val);
+   }
+
   res = val
? fold_binary_to_constant (c->code, boolean_type_node, val, c->val)
: NULL;
@@ -1157,6 +1171,110 @@ eliminated_by_inlining_prob (ipa_func_body_info *fbi, 
gimple *stmt)
 }
 }
 
+/* Analyze EXPR if it represents a series of simple operations performed on
+   a function parameter and return true if so.  FBI, STMT, EXPR, INDEX_P and
+   AGGPOS have the same meaning like in unmodified_parm_or_parm_agg_item.
+   Type of the parameter or load from an aggregate via the parameter is
+   stored in *TYPE_P.  Operations on the parameter are recorded to
+   PARAM_OPS_P if it is not NULL.  */
+
+static bool
+decompose_param_expr (struct ipa_func_body_info *fbi,
+ gimple *stmt, tree expr,
+ int *index_p, tree *type_p,
+ struct agg_position_info *aggpos,
+ expr_eval_ops *param_ops_p = NULL)
+{
+  int op_limit = PARAM_VALUE (PARAM_IPA_MAX_PARAM_EXPR_OPS);
+  int op_count = 0;
+
+  if (param_ops_p)
+*param_ops_p = NULL;
+
+  while (true)
+{
+  expr_eval_op eval_op;
+  poly_int64 size;
+
+  if (unmodified_parm_or_parm_agg_item (fbi, stmt, expr, index_p, &size,
+   aggpos))
+   {
+  tree type = TREE_TYPE (expr);
+
+ /* Stop if found bit-field whose size does not equal any natural
+integral type.  */
+ if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (type)), size))
+   break;
+
+ *type_p = type;
+ return true;
+   }
+
+  if (TREE_CODE (expr) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (expr))
+   break;
+
+  if (!is_gimple_assign (stmt = SSA_NAME_DEF_STMT (expr)))
+   break;
+
+  switch (gimple_assign_rhs_class (stmt))
+   {
+ case GIMPLE_SINGLE_RHS:
+   expr = gimple_assign_rhs1 (stmt);
+   continue;
+
+ case GIMPLE_UNARY_RHS:
+   expr = gimple_assign_rhs1 (stmt);
+
+   eval_op.val_is_rhs = false;
+   eval_op.val = NULL_TREE;
+   break;
+
+ case GIMPLE_BINARY_RHS:
+   {
+ tree op1 = gimple_assign_rhs1 (stmt);
+ tree op2 = gimple_assign_rhs2 (stmt);
+
+ if (is_gimple_ip_invariant (op1))
+   {
+ eval_op.val_is_rhs = false;
+ eval_op.val = op1;
+ expr = op2;
+ break;
+   }
+
+ if (is_gimple_ip_invariant (op2))
+   {
+ eval_op.val_is_rhs = true;
+ eval_op.val = op2;
+ expr = op1;
+ break;
+   }
+   }
+
+ default:
+  

Re: [PATCH] Deprecate -frepo option.

2019-09-04 Thread Martin Liška
On 9/4/19 11:22 AM, Jonathan Wakely wrote:
> On Wed, 4 Sep 2019 at 09:13, Martin Liška wrote:
>>
>> On 7/9/19 3:00 PM, Martin Liška wrote:
>>> Great. Then I'm sending a patch that does the functionality removal.
>>
>> Hi.
>>
>> GCC 9.2.0 has been released and contains a deprecation
>> warning for -frepo.
>>
>> Is now the right time to remove it from trunk?
> 
> If users upgrade to 9.2 in 6 months, see the new warning, and tell us
> they're using the feature, will it be possible to re-enable it?

Sure, we'll just revert the patch.

> 
> The point of the warning was to see if users complain. Three weeks
> isn't a very long time to see if users are going to complain :-)
> 

I see :) That said, I'm going to install the patch close to the end
of stage1.

Martin


[PATCH,Fortran] Improve PRNG jumping when using threads

2019-09-04 Thread Janne Blomqvist
Currently, when a new thread needs to use the RANDOM_NUMBER intrinsic,
the per-thread PRNG state is initialized by copying the master state
and then jumping forwards N*2**128 entries in the stream so that the
PRNG streams for different threads don't alias each other, where N is
the number of threads that have so far initialized the PRNG.

With this patch the master state itself is jumped forwards once each
time a new thread initializes the PRNG, thus obviating the need to
jump through all the N-1 previous streams, effectively turning an O(N)
algorithm into an O(1) one.
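
For readers unfamiliar with jump-ahead PRNGs, the sketch below illustrates the
new scheme; it uses invented names and a placeholder jump function rather than
the actual libgfortran code, which follows in the patch.

/* Minimal sketch of the new scheme, with placeholder types and functions;
   the real implementation lives in libgfortran/intrinsics/random.c.  */
#include <cstdint>
#include <cstring>
#include <mutex>

struct prng_state { bool init; std::uint64_t s[4]; };

static std::mutex random_lock;
static prng_state master_state = { false, { 1, 2, 3, 4 } }; /* placeholder seed */

/* Stand-in for the real xoshiro jump polynomial: it merely perturbs the
   state so the sketch compiles; the real jump () advances the stream by a
   fixed 2**128 steps.  */
static void jump (prng_state *st)
{
  for (int i = 0; i < 4; i++)
    st->s[i] = st->s[i] * 6364136223846793005ULL + 1442695040888963407ULL;
}

/* Old scheme: copy the master state, then replay N jumps on the copy
   (O(N) work for the Nth thread).  New scheme, shown here: copy the master
   state, then jump the master once, so the next thread automatically starts
   in the following disjoint substream (O(1) work).  */
static void init_rand_state (prng_state *rs)
{
  std::lock_guard<std::mutex> lock (random_lock);
  std::memcpy (rs->s, master_state.s, sizeof master_state.s);
  jump (&master_state);
  rs->init = true;
}

int main ()
{
  prng_state thread_state = { false, { 0, 0, 0, 0 } };
  init_rand_state (&thread_state);
  return thread_state.init ? 0 : 1;
}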

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2019-09-04  Janne Blomqvist  

* intrinsics/random.c (master_init): Replace with
master_state.init.
(njumps): Remove variable.
(master_state): Make instance of struct prng_state.
(init_rand_state): When jumping, update the master_state once
instead of keeping track of how many jumps need to be done.
(SZU64): Modify to handle new master_state.
(SZ): Likewise.
(random_seed_i4): Likewise.
(random_seed_i8): Likewise.
---
 libgfortran/intrinsics/random.c | 46 ++---
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/libgfortran/intrinsics/random.c b/libgfortran/intrinsics/random.c
index dd2c46e7ef5..bdad208bd98 100644
--- a/libgfortran/intrinsics/random.c
+++ b/libgfortran/intrinsics/random.c
@@ -194,13 +194,10 @@ typedef struct
 prng_state;
 
 
-/* master_init, njumps, and master_state are the only variables
-   protected by random_lock.  */
-static bool master_init;
-static unsigned njumps; /* How many times we have jumped.  */
-static uint64_t master_state[] = {
-  0xad63fa1ed3b55f36ULL, 0xd94473e78978b497ULL, 0xbc60592a98172477ULL,
-  0xa3de7c6e81265301ULL
+/* master_state is the only variable protected by random_lock.  */
+static prng_state master_state = { .init = false, .s = {
+0xad63fa1ed3b55f36ULL, 0xd94473e78978b497ULL, 0xbc60592a98172477ULL,
+0xa3de7c6e81265301ULL }
 };
 
 
@@ -353,24 +350,21 @@ init_rand_state (prng_state* rs, const bool locked)
 {
   if (!locked)
 __gthread_mutex_lock (&random_lock);
-  if (!master_init)
+  if (!master_state.init)
 {
   uint64_t os_seed;
   getosrandom (&os_seed, sizeof (os_seed));
-  for (uint64_t i = 0; i < sizeof (master_state) / sizeof (uint64_t); i++)
+  for (uint64_t i = 0; i < sizeof (master_state.s) / sizeof (uint64_t); i++)
{
   os_seed = splitmix64 (os_seed);
-  master_state[i] = os_seed;
+  master_state.s[i] = os_seed;
 }
-  njumps = 0;
-  master_init = true;
+  master_state.init = true;
 }
-  memcpy (&rs->s, master_state, sizeof (master_state));
-  unsigned n = njumps++;
+  memcpy (&rs->s, master_state.s, sizeof (master_state.s));
+  jump (&master_state);
   if (!locked)
 __gthread_mutex_unlock (&random_lock);
-  for (unsigned i = 0; i < n; i++)
-jump (rs);
   rs->init = true;
 }
 
@@ -727,7 +721,7 @@ arandom_r16 (gfc_array_r16 *x)
 
 
 /* Number of elements in master_state array.  */
-#define SZU64 (sizeof (master_state) / sizeof (uint64_t))
+#define SZU64 (sizeof (master_state.s) / sizeof (uint64_t))
 
 
 /* Keys for scrambling the seed in order to avoid poor seeds.  */
@@ -757,7 +751,7 @@ void
 random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, gfc_array_i4 *get)
 {
   uint64_t seed[SZU64];
-#define SZ (sizeof (master_state) / sizeof (GFC_INTEGER_4))
+#define SZ (sizeof (master_state.s) / sizeof (GFC_INTEGER_4))
 
   /* Check that we only have one argument present.  */
   if ((size ? 1 : 0) + (put ? 1 : 0) + (get ? 1 : 0) > 1)
@@ -800,7 +794,7 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, 
gfc_array_i4 *get)
  a processor-dependent value to the seed."  */
   if (size == NULL && put == NULL && get == NULL)
 {
-  master_init = false;
+  master_state.init = false;
   init_rand_state (rs, true);
 }
 
@@ -822,9 +816,8 @@ random_seed_i4 (GFC_INTEGER_4 *size, gfc_array_i4 *put, 
gfc_array_i4 *get)
 
   /* We put it after scrambling the bytes, to paper around users who
 provide seeds with quality only in the lower or upper part.  */
-  scramble_seed (master_state, seed);
-  njumps = 0;
-  master_init = true;
+  scramble_seed (master_state.s, seed);
+  master_state.init = true;
   init_rand_state (rs, true);
 }
 
@@ -844,7 +837,7 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, 
gfc_array_i8 *get)
   if ((size ? 1 : 0) + (put ? 1 : 0) + (get ? 1 : 0) > 1)
 runtime_error ("RANDOM_SEED should have at most one argument present.");
 
-#define SZ (sizeof (master_state) / sizeof (GFC_INTEGER_8))
+#define SZ (sizeof (master_state.s) / sizeof (GFC_INTEGER_8))
   if (size != NULL)
 *size = SZ;
 
@@ -881,7 +874,7 @@ random_seed_i8 (GFC_INTEGER_8 *size, gfc_array_i8 *put, 
gfc_array_i8 *get)
  a processor-dependent v

Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-09-04 Thread Richard Earnshaw (lists)

On 15/08/2019 20:47, Bernd Edlinger wrote:

On 8/15/19 6:29 PM, Richard Biener wrote:


Please split it into the parts for the PR and parts making the
asserts not trigger.



Yes, will do.



Okay, here is the rest of the PR 89544 fix,
actually just an optimization, making the larger stack alignment
known to the middle-end, and the test cases.


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.



Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c (Revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c (Arbeitskopie)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */

I don't think this test is right.  While we can't use an LDRD to load 
the argument off the stack, there's nothing wrong with using an STRD to 
then store the value to f0 (as that is 8-byte aligned).  So the second 
and third scan-assembler tests are meaningless.


R.

(sorry, just noticed this).


Re: [PATCH] Deprecate -frepo option.

2019-09-04 Thread Nathan Sidwell

On 9/4/19 7:20 AM, Martin Liška wrote:

On 9/4/19 11:22 AM, Jonathan Wakely wrote:




The point of the warning was to see if users complain. Three weeks
isn't a very long time to see if users are going to complain :-)



I see :) That said, I'm going to install the patch close to the end
of stage1.


I think it'd be ok to install it during stage3 (perhaps early feb?). 
More chance of people speaking up.


nathan

--
Nathan Sidwell


Re: [PATCH v2] [MIPS] Fix handling of MSA SUBREG moves on big-endian targets

2019-09-04 Thread Mihailo Stojanović

Hello,

On 3.9.19. 21:16, Jeff Law wrote:

On 9/3/19 2:16 AM, Mihailo Stojanovic wrote:

From: Mihailo Stojanovic 

Hi everybody,

This fixes the MSA implementation on big-endian targets which is
essentially broken for things like SUBREG handling and calling
convention for vector types. It borrows heavily from [1] as Aarch64 has
the same problem with SVE vectors.

Conceptually, register bitconverts should act as if the data had been
stored to memory in one mode, and loaded from memory in the other.
This isn't what happens on big-endian as vector load/store instructions
are essentially mixed-endian with respect to the vector as a whole.
The in-register representation of data must be changed so that the
load/store round-trip becomes valid. This is done by inserting one or
two shuffle instructions for every SUBREG move, as previously
implemented in [2] for LLVM. Even if the shuffle instructions weren't
generated, constraint in mips_can_change_mode_class would force the
conceptual memory reload of SUBREG move operand, which would generate
correct, albeit very inefficient code.

New msa_reg_predicate was created in order to forbid SUBREG operands in
MSA patterns on big-endian targets. It weeds SUBREGs out of the
instruction patterns into SUBREG->REG moves which are caught by the new
msa_mov_subreg_be pattern and transformed into shuffle(s).

As for the MSA calling convention, the ABI states that compiling for MSA
should not change the base ABI's vector calling convention, that is, MSA
vectors passed or returned by value do not use the MSA vector registers.
Instead, they are passed by general-purpose registers, as described by
the ABI. Passing the vector argument requires splitting it into 2 (or 4)
general-purpose registers and bitconverting it into V2DImode (or
V4SImode). The solution boils down to the one presented for SUBREG
moves: force every vector argument to the appropriate mode (V2DI or
V4SI) so that the shuffle instruction(s) might be generated in order to
conform to the calling convention. The same applies to vectors as return
values.

New testcases were added (thanks Dragan!) to check calling convention
compliance for all possible combinations of MSA and non-MSA
interlinking.

This fixes the following vectorization test failures:

 vect/pr66251.c
 vect/vect-alias-check-10.c
 vect/vect-alias-check-11.c
 vect/vect-alias-check-12.c
 vect/vect-bswap32.c
 vect/vect-bswap64.c
 vect/vect-nop-move.c
 vect/slp-41.c
 vect/slp-43.c
 vect/slp-45.c
 vect/slp-perm-11.c

Tested on mips32-mti-linux-gnu and mips64-mti-linux-gnu.

Regards,
Mihailo

[1] https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02176.html
[2] https://reviews.llvm.org/rL189330

gcc/ChangeLog:

2019-08-08  Mihailo Stojanovic  

 * config/mips/mips-msa.md: Replace register_operand predicate with
 msa_reg_operand in every pattern.
 (msa_change_mode): New pattern.
 (*msa_change_mode_): New unspec.
 (*msa_mov_subreg_be): New unspec.
 * config/mips/mips-protos.h (mips_split_msa_subreg_move): Declare.
 * config/mips/mips.c (mips_maybe_expand_msa_subreg_move): New
 function.
 (mips_replace_reg_mode): Ditto.
 (mips_split_msa_subreg_move): Ditto.
 (mips_legitimize_move): Modify machine modes of MSA vectors which
 reside in general-purpose registers. Check whether SUBREG move can
 be replaced with shuffle(s).
 (mips_split_128bit_move): Replace new SUBREGs with REGs.
 (mips_split_msa_copy_d): Ditto.
 (mips_split_msa_insert_d): Ditto.
 (mips_split_msa_fill_d): Ditto.
 (mips_can_change_mode_class): Disallow mode change which would
 result in lane width change.
 (mips_expand_vec_unpack): Replace new SUBREG with REG on big-endian
 targets.
 * config/mips/predicates.md (msa_reg_operand): New predicate.

gcc/testsuite/ChangeLog:

2019-08-08  Dragan Mladjenovic  

 * gcc.target/mips/inter/msa-inter.exp: New file.
 * gcc.target/mips/inter/msa_1.h: New test.
 * gcc.target/mips/inter/msa_1_main.c: New test.
 * gcc.target/mips/inter/msa_1_x.c: New test.
 * gcc.target/mips/inter/msa_1_y.c: New test.

I don't guess Richard S's proposed cleanups to the generic handling of
SUBREGs to address issues with SVE really help here.

Could you send a link to the proposed patch?
I can't seem to find it on the mailing list.

I suspect you're going to need to define secondary reloads here much
like the aarch64 port does.  See aarch64_secondary_reload

I'm not sure that MSA memory accesses need secondary reloads, as MSA
doesn't have predicated moves. Instead, every memory load and store
is handled by LD.* and ST.* instructions.

I suspect you also need to twiddle mips_can_change_mode_class.

I've already made changes to mips_can_change_mode_class, to disallow
MSA mode change which results in a lane width change. Maybe you had
something else in mind?

I'm not sure I really understood yours and Richard's suggestions, so I
don't have a clear idea ho

[COMMITTED][GCC8][GCC9][AArch64] Backport PR81800

2019-09-04 Thread Wilco Dijkstra
I've backported the fix for PR81800 to GCC8 and GCC9:

PR81800 is about the lrint inline giving spurious FE_INEXACT exceptions.
The previous change for PR81800 didn't fix this: when lrint is disabled
in the backend, the middle-end will simply use llrint.  This actually makes
things worse since llrint now also ignores FE_INVALID exceptions!
The fix is to disable lrint/llrint on double if the size of a long is
smaller (i.e. ILP32).
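
For context, the small host-side example below illustrates the C99 Annex F
requirement at stake (it is an illustration of mine, not a test from the
patch): lrint must raise FE_INVALID when the rounded value does not fit in
long, which is precisely what an inlined 64-bit conversion would miss when
long is only 32 bits wide.

/* Host-side illustration (not an aarch64 test from the patch) of the C99
   Annex F behaviour: lrint must raise FE_INVALID when the rounded result
   does not fit in long.  */
#include <cfenv>
#include <climits>
#include <cmath>
#include <cstdio>

int main ()
{
  std::feclearexcept (FE_ALL_EXCEPT);
  long in_range = std::lrint (2.25);          /* fits in long: no FE_INVALID  */
  std::printf ("lrint(2.25) = %ld, invalid = %d\n",
               in_range, std::fetestexcept (FE_INVALID) != 0);

  std::feclearexcept (FE_ALL_EXCEPT);
  double too_big = 2.0 * (double) LONG_MAX;   /* cannot be represented as long  */
  (void) std::lrint (too_big);                /* must raise FE_INVALID  */
  std::printf ("out of range: invalid = %d\n",
               std::fetestexcept (FE_INVALID) != 0);
  return 0;
}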

ChangeLog
2018-11-13  Wilco Dijkstra

gcc/
PR target/81800
* gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern if GPF
operand is larger than a long int.

testsuite/
PR target/81800
* gcc.target/aarch64/no-inline-lrint_3.c: New test.
--

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
5a1894063a1ed2db1cc947c9c449d48808ed96ae..f08cd0930b3fc6527fbca218ad3c464f1ead0103
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6304,7 +6304,7 @@ (define_expand "lrint2"
   [(match_operand:GPI 0 "register_operand")
(match_operand:GPF 1 "register_operand")]
   "TARGET_FLOAT
-   && ((GET_MODE_SIZE (mode) <= GET_MODE_SIZE (mode))
+   && ((GET_MODE_BITSIZE (mode) <= LONG_TYPE_SIZE)
|| !flag_trapping_math || flag_fp_int_builtin_inexact)"
 {
   rtx cvt = gen_reg_rtx (mode);
diff --git a/gcc/testsuite/gcc.target/aarch64/no-inline-lrint_3.c 
b/gcc/testsuite/gcc.target/aarch64/no-inline-lrint_3.c
new file mode 100644
index 
..ca772cb999e7b6cfbd3f080111d3eb479d43f47b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/no-inline-lrint_3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-options "-O3 -fno-math-errno -fno-fp-int-builtin-inexact" } */
+
+#define TEST(name, float_type, int_type, fn) void f_##name (float_type x) \
+{\
+  volatile int_type   b = __builtin_##fn (x);\
+}
+
+TEST (dld, double, long, lrint)
+TEST (flf, float , long, lrintf)
+
+TEST (did, double, int, lrint)
+TEST (fif, float , int, lrintf)
+
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\]+, \[d,s\]\[0-9\]+" 2 } 
} */
+/* { dg-final { scan-assembler-times "bl\tlrint" 2 } } */

Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-09-04 Thread Bernd Edlinger
On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
> On 15/08/2019 20:47, Bernd Edlinger wrote:
>> On 8/15/19 6:29 PM, Richard Biener wrote:
>
> Please split it into the parts for the PR and parts making the
> asserts not trigger.
>

 Yes, will do.

>>
>> Okay, here is the rest of the PR 89544 fix,
>> actually just an optimization, making the larger stack alignment
>> known to the middle-end, and the test cases.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
> 
> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
> ===
> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_arm_ok } */
> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
> +
> +struct s {
> +  int a, b;
> +} __attribute__((aligned(8)));
> +
> +struct s f0;
> +
> +void f(int a, int b, int c, int d, int e, struct s f)
> +{
> +  f0 = f;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
> +/* { dg-final { scan-assembler-times "strd" 0 } } */
> +/* { dg-final { scan-assembler-times "stm" 1 } } */
> 
> I don't think this test is right.  While we can't use an LDRD to load the 
> argument off the stack, there's nothing wrong with using an STRD to then 
> store the value to f0 (as that is 8-byte aligned).  So the second and third 
> scan-assembler tests are meaningless.
> 

Ah, that is very similar to the unaligned-memcpy-2/3.c,
see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html

Initially that is a movdi;
then in subreg1 it is split into two movsi,
which are then re-assembled as an ldm.


Not sure if that is intended that way.


Bernd.


Re: [PATCH 2/2] PR libstdc++/41861 Add full steady_clock support to condition_variable

2019-09-04 Thread Jonathan Wakely

On 15/07/19 17:47 +0100, Mike Crowe wrote:

The pthread_cond_clockwait function was recently added[1] to glibc, and is
due to be released in glibc 2.30. If this function is available in the C
library it can be used to fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861 by supporting
std::chrono::steady_clock properly with std::condition_variable.

This means that code using std::condition_variable::wait_for or
std::condition_variable::wait_until with std::chrono::steady_clock is no
longer subject to timing out early or potentially waiting for much longer
if the system clock is warped at an inopportune moment.

If pthread_cond_clockwait is available then std::chrono::steady_clock is
deemed to be the "best" clock available which means that it is used for the
relative wait_for calls and absolute wait_until calls using user-defined
clocks. Calls explicitly using std::chrono::system_clock continue to use
CLOCK_REALTIME via __gthread_cond_timedwait.

If pthread_cond_clockwait is not available then std::chrono::system_clock
is deemed to be the "best" clock available which means that the previous
suboptimal behaviour remains.

[1] 
https://sourceware.org/git/?p=glibc.git;a=commit;h=afe4de7d283ebd88157126c5494ce1796194c16e
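
For context, the sketch below (mine, not part of the patch) shows the usage
pattern that benefits: a wait bounded by a std::chrono::steady_clock deadline,
which previously could wake too early or block too long if the system clock
was warped during the wait.

/* Usage sketch (not from the patch): a wait bounded by a steady_clock
   deadline.  With pthread_cond_clockwait available, libstdc++ can honour
   this deadline even if the system clock is changed while we are waiting.  */
#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool ready = false;

bool wait_for_ready ()
{
  auto deadline = std::chrono::steady_clock::now () + std::chrono::seconds (2);
  std::unique_lock<std::mutex> lock (m);
  /* Returns false if the deadline passes before `ready' becomes true.  */
  return cv.wait_until (lock, deadline, [] { return ready; });
}

int main ()
{
  /* Nothing signals the condition here, so this simply times out after two
     seconds and returns 1.  */
  return wait_for_ready () ? 0 : 1;
}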

libstdc++-v3/

* include/std/condition_variable: Add include of 
to make _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT available.
(condition_variable): Split __clock_t into __system_clock_t, which
is always system_clock and __best_clock_t, which is the clock that
we wish to convert arbitrary other clocks to.  If we know that
pthread_cond_clockwait is available then it is best to convert
clocks to steady_clock, otherwise it's best to use
system_clock. (wait_until): If pthread_cond_clockwait is available,
provide a steady_clock overload.  Convert previous __clock_t
overload to use __system_clock_t.  Convert generic overload to
convert passed clock to __best_clock_t.  (wait_until_impl): Add
steady_clock overload that calls pthread_cond_clockwait.  Convert
previous __clock_t overload to use
__system_clock_t. (condition_variable_any): Use steady_clock for
__clock_t if pthread_cond_clockwait is available.

* acinclude.m4: Detect the availability of POSIX-proposed
pthread_cond_clockwait function.
* configure: Likewise.
* configure.ac: Likewise.
* config.h.in: Add _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT define to
indicate presence of pthread_cond_clockwait function.


Thanks, Mike!

This patch is much simpler than the previous versions. Thanks for
persevering with POSIX and glibc to get the necessary functionality
into libc.

I've attached a slightly-modified patch, which changes the names of
the clock typedefs in std::condition_variable. The names
"steady_clock" and "system_clock" are already reserved names, so we
can just use those (instead of the uglier __steady_clock_t forms). And
we can just keep the original __clock_t name to mean the best clock.
Does that look OK to you too?

I've confirmed that with glibc 2.30 the new function is detected, and
the testsuite passes. I'm running the tests with an older glibc as
well.

I noticed that the new tests you added in [PATCH 1/2] pass on current
trunk, even without [PATCH 2/2], is that expected?

commit d3ddd730a7a2ca4a9d207eb3e1229f5cb8c652e4
Author: Jonathan Wakely 
Date:   Wed Sep 4 12:59:04 2019 +0100

PR libstdc++/41861 Add full steady_clock support to condition_variable

The pthread_cond_clockwait function is available in glibc since the 2.30
release. If this function is available in the C library it can be used
to fix PR libstdc++/41861 by supporting std::chrono::steady_clock
properly with std::condition_variable.

This means that code using std::condition_variable::wait_for or
std::condition_variable::wait_until with std::chrono::steady_clock is no
longer subject to timing out early or potentially waiting for much
longer if the system clock is warped at an inopportune moment.

If pthread_cond_clockwait is available then std::chrono::steady_clock is
deemed to be the "best" clock available which means that it is used for
the relative wait_for calls and absolute wait_until calls using
user-defined clocks. Calls explicitly using std::chrono::system_clock
continue to use CLOCK_REALTIME via __gthread_cond_timedwait.

If pthread_cond_clockwait is not available then
std::chrono::system_clock is deemed to be the "best" clock available
which means that the previous suboptimal behaviour remains.

2019-09-04  Mike Crowe  

PR libstdc++/41861
* acinclude.m4 (GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT): Check for new
pthread_cond_clockwait function.
* configure.ac: Use GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT.
* configure: Regenerate.
* config.h.in

Re: r272976 - in /trunk/gcc/ada: ChangeLog ali.adb ...

2019-09-04 Thread Maciej W. Rozycki
Hi,

On Wed, 3 Jul 2019, pmdero...@gcc.gnu.org wrote:

> Author: pmderodat
> Date: Wed Jul  3 08:14:57 2019
> New Revision: 272976
> 
> URL: https://gcc.gnu.org/viewcvs?rev=272976&root=gcc&view=rev
> Log:
> [Ada] ABE checks v3.0, foundations of Elaboration order v4.0
> 
> 
> -- Elaboration checks --
> 
> 
> The dynamic ABE checks model now emits the same diagnostics as those of the
> static ABE checks model.
> 
> The ABE checks mechanism has been redesigned and refactored in the face of
> increasing requirements. Most of the functionality can now be toggled, thus
> allowing for various combinations of behavior. The combinations are defined
> as "initial states" and may be further altered.
> 
> Scenarios and targets have been distinctly separated at the higher level,
> instead of directly working with nodes and entitites. Scenarios and targets
> now carry a representation which removes the need to constantly recompute
> relevant attributes, and offers a common interface for the various processors.
> 
> Most processing has now been refactored into "services" which perform a single
> ABE-related function.
> 
> ---
> -- Elaboration order --
> ---
> 
> A new elaboration order mechanism based on the use of an invocation graph to
> provide extra information about the flow of execution at elaboration time has
> been introduced.
> 
> The ABE checks mechanism has been altered to encode pieces of the invocation
> graph in the associated ALI files of units.
> 
> The new elaboration order mechanism reconstructs the full invocation graph at
> bind time, and coupled with the library item graph, determines the elaboration
> order of units.
> 
> The new elaboration order mechanism is currently inaccessible.

 This change (not posted to `gcc-patches' for some reason) has caused a 
regression in the form of a build failure with the `riscv-linux-gnu' 
target (and for the record the `x86_64-linux-gnu' build/host running GCC 
8.3.0):

ali.adb:34:28: warning: use clause for package "GNAT" has no effect
ali.adb:35:28: warning: use clause for package "Dynamic_HTables" has no effect
ali.adb:155:52: "Bucket_Range_Type" is undefined (more references follow)
ali.adb:158:27: "Dynamic_Hash_Tables" is undefined
ali.adb:173:30: "Sig_Map" is undefined (more references follow)
gnatmake: ".../gcc/ada/ali.adb" compilation error
make[3]: *** [../gcc-interface/Makefile:469: gnatmake-re] Error 4
make[3]: Leaving directory '.../gcc/ada/tools'
make[2]: *** [Makefile:216: gnattools-cross] Error 2
make[2]: Leaving directory '.../gnattools'
make[1]: *** [Makefile:11224: all-gnattools] Error 2

-- that has persisted ever since.

 I have noticed that the `Bucket_Range_Type' data type has only been added 
after GCC 8 has branched, so I tried GCC 9 instead, Debian 9.2.1 20190821 
specifically, but that still fails:

ali.adb:35:28: warning: use clause for package "Dynamic_HTables" has no effect
ali.adb:158:27: "Dynamic_Hash_Tables" is undefined
ali.adb:173:30: "Sig_Map" is undefined (more references follow)
gnatmake: ".../gcc/ada/ali.adb" compilation error
../gcc-interface/Makefile:468: recipe for target 'gnatmake-re' failed

-- likely due to r272860, which renamed the `Dynamic_HTable' package to 
`Dynamic_Hash_Tables' (r272860 is earlier than r272976, but it is only 
r272976 that has made GNAT proper use `Dynamic_Hash_Tables'; previously 
only `libgnat' used that package).

 This means that no released GCC version can be used to build trunk GNAT, 
which would provide a clean upgrade path, and there is only a narrow 
window between r272860 and r272975 inclusive that lets one move on to GNAT 
10.  I find this rather unfortunate.

 Can you please look into it?

  Maciej





Re: r272976 - in /trunk/gcc/ada: ChangeLog ali.adb ...

2019-09-04 Thread Arnaud Charlet
Maciej,

>  This change (not posted to `gcc-patches' for some reason) has caused a 
> regression in the form of a build failure with the `riscv-linux-gnu' 
> target (and for the record the `x86_64-linux-gnu' build/host running GCC 
> 8.3.0):
> 
> ali.adb:34:28: warning: use clause for package "GNAT" has no effect
> ali.adb:35:28: warning: use clause for package "Dynamic_HTables" has no effect
> ali.adb:155:52: "Bucket_Range_Type" is undefined (more references follow)
> ali.adb:158:27: "Dynamic_Hash_Tables" is undefined
> ali.adb:173:30: "Sig_Map" is undefined (more references follow)
> gnatmake: ".../gcc/ada/ali.adb" compilation error
> make[3]: *** [../gcc-interface/Makefile:469: gnatmake-re] Error 4
> make[3]: Leaving directory '.../gcc/ada/tools'
> make[2]: *** [Makefile:216: gnattools-cross] Error 2
> make[2]: Leaving directory '.../gnattools'
> make[1]: *** [Makefile:11224: all-gnattools] Error 2
> 
> -- that has persisted ever since.

As documented in https://gcc.gnu.org/install/build.html
you need a matching native GNAT compiler to build a GNAT cross compiler.

If I understood properly, you are using GCC 8.3.0 as the host compiler
to build a GNAT cross compiler, which indeed won't work: you need to first
build a matching native compiler and then use that compiler to build
the cross.

Arno


Re: r272976 - in /trunk/gcc/ada: ChangeLog ali.adb ...

2019-09-04 Thread Andreas Schwab
On Sep 04 2019, "Maciej W. Rozycki"  wrote:

>  This means that no released GCC version can be used to build trunk GNAT, 
> which would provide a clean upgrade path, and there is only a narrow 
> window between r272860 and r272975 inclusive that lets one move on to GNAT 
> 10.  I find this rather unfortunate.

I don't see that here.  My gcc testers can still build trunk GNAT with
gcc 9.

http://gcc.gnu.org/ml/gcc-testresults/2019-09/msg00072.html
http://gcc.gnu.org/ml/gcc-testresults/2019-09/msg00349.html

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-09-04 Thread Richard Earnshaw (lists)

On 04/09/2019 14:28, Bernd Edlinger wrote:

On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:

On 15/08/2019 20:47, Bernd Edlinger wrote:

On 8/15/19 6:29 PM, Richard Biener wrote:


Please split it into the parts for the PR and parts making the
asserts not trigger.



Yes, will do.



Okay, here is the rest of the PR 89544 fix,
actually just an optimization, making the larger stack alignment
known to the middle-end, and the test cases.


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.



Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */

I don't think this test is right.  While we can't use an LDRD to load the 
argument off the stack, there's nothing wrong with using an STRD to then store 
the value to f0 (as that is 8-byte aligned).  So the second and third 
scan-assembler tests are meaningless.



Ah, that is very similar to the unaligned-memcpy-2/3.c,
see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html

Initially that is a movdi;
then in subreg1 it is split into two movsi,
which are then re-assembled as an ldm.


Not sure if that is intended that way.




Yeah, these are causing me some problems too, but that's because with 
some changes I'm working on I now see the compiler using r4 and r5, 
which leads to prologue and epilogue stores that distort the results.


Tests like this are generally fragile - I hate 'em

R.

Bernd.





Re: [ARM/FDPIC v5 10/21] [ARM] FDPIC: Implement TLS support.

2019-09-04 Thread Kyrill Tkachov

Hi Christophe,

On 5/15/19 1:39 PM, Christophe Lyon wrote:

Support additional relocations: TLS_GD32_FDPIC, TLS_LDM32_FDPIC, and
TLS_IE32_FDPIC.

We do not support the GNU2 TLS dialect.

2019-XX-XX  Christophe Lyon  
    Mickaël Guêné 

    gcc/
    * config/arm/arm.c (tls_reloc): Add TLS_GD32_FDPIC,
    TLS_LDM32_FDPIC and TLS_IE32_FDPIC.
    (arm_call_tls_get_addr): Add FDPIC support.
    (legitimize_tls_address): Likewise.
    (arm_emit_tls_decoration): Likewise.

Change-Id: I4ea5034ff654540c4658d0a79fb92f70550cdf4a

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 99d13bf..5fc7a20 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2379,9 +2379,12 @@ char arm_arch_name[] = "__ARM_ARCH_PROFILE__";

 enum tls_reloc {
   TLS_GD32,
+  TLS_GD32_FDPIC,
   TLS_LDM32,
+  TLS_LDM32_FDPIC,
   TLS_LDO32,
   TLS_IE32,
+  TLS_IE32_FDPIC,
   TLS_LE32,
   TLS_DESCSEQ  /* GNU scheme */
 };
@@ -8760,20 +8763,34 @@ arm_call_tls_get_addr (rtx x, rtx reg, rtx 
*valuep, int reloc)

   gcc_assert (reloc != TLS_DESCSEQ);
   start_sequence ();

-  labelno = GEN_INT (pic_labelno++);
-  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), 
UNSPEC_PIC_LABEL);

-  label = gen_rtx_CONST (VOIDmode, label);
+  if (TARGET_FDPIC)
+    {
+  sum = gen_rtx_UNSPEC (Pmode,
+   gen_rtvec (2, x, GEN_INT (reloc)),
+   UNSPEC_TLS);
+    }
+  else
+    {
+  labelno = GEN_INT (pic_labelno++);
+  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), 
UNSPEC_PIC_LABEL);

+  label = gen_rtx_CONST (VOIDmode, label);

-  sum = gen_rtx_UNSPEC (Pmode,
-   gen_rtvec (4, x, GEN_INT (reloc), label,
-  GEN_INT (TARGET_ARM ? 8 : 4)),
-   UNSPEC_TLS);
+  sum = gen_rtx_UNSPEC (Pmode,
+   gen_rtvec (4, x, GEN_INT (reloc), label,
+  GEN_INT (TARGET_ARM ? 8 : 4)),
+   UNSPEC_TLS);
+    }
   reg = load_tls_operand (sum, reg);

-  if (TARGET_ARM)
-    emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));
+  if (TARGET_FDPIC)
+    {
+  emit_insn (gen_addsi3 (reg, reg, gen_rtx_REG (Pmode, 
FDPIC_REGNUM)));

+    }


No {} around single statement.


   else
-    emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
+    if (TARGET_ARM)
+  emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));



Merge that "if" with the "else" in an "else if"


+    else
+  emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));

   *valuep = emit_library_call_value (get_tls_get_addr (), NULL_RTX,
  LCT_PURE, /* LCT_CONST?  */
@@ -8808,6 +8825,7 @@ arm_tls_descseq_addr (rtx x, rtx reg)
   return reg;
 }

+
 rtx
 legitimize_tls_address (rtx x, rtx reg)
 {
@@ -8820,6 +8838,9 @@ legitimize_tls_address (rtx x, rtx reg)
 case TLS_MODEL_GLOBAL_DYNAMIC:
   if (TARGET_GNU2_TLS)
 {
+ if (TARGET_FDPIC)
+   gcc_unreachable();
+


Use gcc_assert (TARGET_FDPIC)



   reg = arm_tls_descseq_addr (x, reg);

   tp = arm_load_tp (NULL_RTX);
@@ -8829,7 +8850,10 @@ legitimize_tls_address (rtx x, rtx reg)
   else
 {
   /* Original scheme */
- insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32);
+ if (TARGET_FDPIC)
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32_FDPIC);
+ else
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32);
   dest = gen_reg_rtx (Pmode);
   emit_libcall_block (insns, dest, ret, x);
 }
@@ -8838,6 +8862,9 @@ legitimize_tls_address (rtx x, rtx reg)
 case TLS_MODEL_LOCAL_DYNAMIC:
   if (TARGET_GNU2_TLS)
 {
+ if (TARGET_FDPIC)
+   gcc_unreachable();
+


Likewise.

Ok with those changes.

Thanks,

Kyrill



   reg = arm_tls_descseq_addr (x, reg);

   tp = arm_load_tp (NULL_RTX);
@@ -8846,7 +8873,10 @@ legitimize_tls_address (rtx x, rtx reg)
 }
   else
 {
- insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32);
+ if (TARGET_FDPIC)
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32_FDPIC);
+ else
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32);

   /* Attach a unique REG_EQUIV, to allow the RTL optimizers to
  share the LDM result with other LD model accesses.  */
@@ -8865,23 +8895,35 @@ legitimize_tls_address (rtx x, rtx reg)
   return dest;

 case TLS_MODEL_INITIAL_EXEC:
-  labelno = GEN_INT (pic_labelno++);
-  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), 
UNSPEC_PIC_LABEL);

-  label = gen_rtx_CONST (VOIDmode, label);
-  sum = gen_rtx_UNSPEC (Pmode,
-   gen_rtvec (4, x, GEN_INT (TLS_IE32), label,
-  GEN_INT (TARGET_ARM ? 8 : 4)),
-  

Re: [ARM/FDPIC v5 11/21] [ARM] FDPIC: Add support to unwind FDPIC signal frame

2019-09-04 Thread Kyrill Tkachov

Hi Christophe,

On 5/15/19 1:39 PM, Christophe Lyon wrote:

2019-XX-XX  Christophe Lyon 
    Mickaël Guêné 

    libgcc/
    * unwind-arm-common.inc (ARM_SET_R7_RT_SIGRETURN)
    (THUMB2_SET_R7_RT_SIGRETURN, FDPIC_LDR_R12_WITH_FUNCDESC)
    (FDPIC_LDR_R9_WITH_GOT, FDPIC_LDR_PC_WITH_RESTORER)
    (FDPIC_FUNCDESC_OFFSET, ARM_NEW_RT_SIGFRAME_UCONTEXT)
    (ARM_UCONTEXT_SIGCONTEXT, ARM_SIGCONTEXT_R0, 
FDPIC_T2_LDR_R12_WITH_FUNCDESC)

    (FDPIC_T2_LDR_R9_WITH_GOT, FDPIC_T2_LDR_PC_WITH_RESTORER): New.
    (__gnu_personality_sigframe_fdpic): New.
    (get_eit_entry): Add FDPIC signal frame support.



At this point I'm just checking that the code is FDPIC-specific and thus
won't break other targets.


I'll have to trust your testing that this is correct for FDPIC.

Ok.

Thanks,

Kyrill



Change-Id: I7f9527cc50665dd1a731b7badf71c319fb38bf57

diff --git a/libgcc/unwind-arm-common.inc b/libgcc/unwind-arm-common.inc
index 0bacc11..c9a82b7 100644
--- a/libgcc/unwind-arm-common.inc
+++ b/libgcc/unwind-arm-common.inc
@@ -30,6 +30,26 @@
 #include 
 #endif

+#if __FDPIC__
+/* Load r7 with rt_sigreturn value.  */
+#define ARM_SET_R7_RT_SIGRETURN 0xe3a070ad  /* mov   r7, #0xad */
+#define THUMB2_SET_R7_RT_SIGRETURN 0x07adf04f  /* mov.w r7, 
#0xad */

+
+/* FDPIC jump to restorer sequence.  */
+#define FDPIC_LDR_R12_WITH_FUNCDESC    0xe59fc004  /* ldr   r12, 
[pc, #4] */
+#define FDPIC_LDR_R9_WITH_GOT  0xe59c9004  /* ldr   r9, 
[r12, #4] */
+#define FDPIC_LDR_PC_WITH_RESTORER 0xe59cf000  /* ldr   pc, 
[r12] */
+#define FDPIC_T2_LDR_R12_WITH_FUNCDESC  0xc008f8df /* ldr.w r12, 
[pc, #8] */
+#define FDPIC_T2_LDR_R9_WITH_GOT   0x9004f8dc  /* ldr.w r9, 
[r12, #4] */
+#define FDPIC_T2_LDR_PC_WITH_RESTORER   0xf000f8dc /* ldr.w pc, 
[r12] */

+#define FDPIC_FUNCDESC_OFFSET  12
+
+/* Signal frame offsets.  */
+#define ARM_NEW_RT_SIGFRAME_UCONTEXT   0x80
+#define ARM_UCONTEXT_SIGCONTEXT    0x14
+#define ARM_SIGCONTEXT_R0  0xc
+#endif
+
 /* We add a prototype for abort here to avoid creating a dependency on
    target headers.  */
 extern void abort (void);
@@ -199,6 +219,45 @@ search_EIT_table (const __EIT_entry * table, int 
nrec, _uw return_address)

 }
 }

+#if __FDPIC__
+/* VFP is not restored, but this is sufficient to allow unwinding.  */
+static _Unwind_Reason_Code
+__gnu_personality_sigframe_fdpic (_Unwind_State state,
+ _Unwind_Control_Block *ucbp,
+ _Unwind_Context *context)
+{
+    unsigned int sp;
+    unsigned int pc;
+    unsigned int funcdesc;
+    unsigned int handler;
+    unsigned int first_handler_instruction;
+    int i;
+
+    _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, &sp);
+    _Unwind_VRS_Get (context, _UVRSC_CORE, R_PC, _UVRSD_UINT32, &pc);
+
+    funcdesc = *(unsigned int *)((pc & ~1) + FDPIC_FUNCDESC_OFFSET);
+    handler = *(unsigned int *)(funcdesc);
+    first_handler_instruction = *(unsigned int *)(handler & ~1);
+
+    /* Adjust SP to point to the start of registers according to
+   signal type.  */
+    if (first_handler_instruction == ARM_SET_R7_RT_SIGRETURN
+   || first_handler_instruction == THUMB2_SET_R7_RT_SIGRETURN)
+   sp += ARM_NEW_RT_SIGFRAME_UCONTEXT
+ + ARM_UCONTEXT_SIGCONTEXT
+ + ARM_SIGCONTEXT_R0;
+    else
+   sp += ARM_UCONTEXT_SIGCONTEXT
+ + ARM_SIGCONTEXT_R0;
+    /* Restore regs saved on stack by the kernel.  */
+    for (i = 0; i < 16; i++)
+   _Unwind_VRS_Set (context, _UVRSC_CORE, i, _UVRSD_UINT32, sp + 4 * i);

+
+    return _URC_CONTINUE_UNWIND;
+}
+#endif
+
 /* Find the exception index table eintry for the given address.
    Fill in the relevant fields of the UCB.
    Returns _URC_FAILURE if an error occurred, _URC_OK on success.  */
@@ -222,6 +281,27 @@ get_eit_entry (_Unwind_Control_Block *ucbp, _uw return_address)

&nrec);
   if (!eitp)
 {
+#if __FDPIC__
+ /* If we are unwinding a signal handler then perhaps we have
+    reached a trampoline.  Try to detect jump to restorer
+    sequence.  */
+ _uw *pc = (_uw *)((return_address+2) & ~1);
+ if ((pc[0] == FDPIC_LDR_R12_WITH_FUNCDESC
+  && pc[1] == FDPIC_LDR_R9_WITH_GOT
+  && pc[2] == FDPIC_LDR_PC_WITH_RESTORER)
+ || (pc[0] == FDPIC_T2_LDR_R12_WITH_FUNCDESC
+ && pc[1] == FDPIC_T2_LDR_R9_WITH_GOT
+ && pc[2] == FDPIC_T2_LDR_PC_WITH_RESTORER))
+   {
+ struct funcdesc_t *funcdesc
+   = (struct funcdesc_t *) &__gnu_personality_sigframe_fdpic;
+
+ UCB_PR_ADDR (ucbp) = funcdesc->ptr;
+ UCB_PR_GOT (ucbp) = funcdesc->got;
+
+ return _URC_OK;
+   }
+#endif
   UCB_PR_ADDR (ucbp) = 0;
   return _URC_FAILURE;
 }
@@ -236,6 +316,27 @@ get_eit_entry (_Unwind_Control_Block *ucbp, 

Re: [PATCH 02/25] Propagate address spaces to builtins.

2019-09-04 Thread Kyrill Tkachov



On 9/3/19 4:00 PM, Jeff Law wrote:

On 9/3/19 8:01 AM, Kyrill Tkachov wrote:

Hi all,

On 9/5/18 12:48 PM, a...@codesourcery.com wrote:

At present, pointers passed to builtin functions, including atomic
operators,
are stripped of their address space properties.  This doesn't seem to be
deliberate, it just omits to copy them.

Not only that, but it forces pointer sizes to Pmode, which isn't
appropriate
for all address spaces.

This patch attempts to correct both issues.  It works for GCN atomics and
GCN OpenACC gang-private variables.

2018-09-05  Andrew Stubbs  
     Julian Brown  

     gcc/
     * builtins.c (get_builtin_sync_mem): Handle address spaces.


Sorry for responding to this so late. I'm testing a rebased version of
Richard's OOL atomic patches [1] and am hitting an ICE building the
-mabi=ilp32 libgfortran multilib for aarch64-none-elf:

0x7284db emit_library_call_value_1(int, rtx_def*, rtx_def*,
libcall_type, machine_mode, int, std::pair*)
     $SRC/gcc/calls.c:4915
0x1037817 emit_library_call_value(rtx_def*, rtx_def*, libcall_type,
machine_mode, rtx_def*, machine_mode, rtx_def*, machine_mode, rtx_def*,
machine_mode)
     $SRC/gcc/rtl.h:4240
0x1037817 aarch64_expand_compare_and_swap(rtx_def**)
     $SRC/gcc/config/aarch64/aarch64.c:16981
0x1353a43 gen_atomic_compare_and_swapsi(rtx_def*, rtx_def*, rtx_def*,
rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*)
     $SRC/gcc/config/aarch64/atomics.md:34
0xb1f9f1 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*,
rtx_def*, rtx_def*, rtx_def*, rtx_def*) const
     $SRC/gcc/recog.h:324
0xb1f9f1 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
     $SRC/gcc/optabs.c:7443
0xb1fa78 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
     $SRC/gcc/optabs.c:7459
0xb21024 expand_atomic_compare_and_swap(rtx_def**, rtx_def**, rtx_def*,
rtx_def*, rtx_def*, bool, memmodel, memmodel)
     $SRC/gcc/optabs.c:6448
0x709bd3 expand_builtin_atomic_compare_exchange
     $SRC/gcc/builtins.c:6379
0x71a4e9 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
     $SRC/gcc/builtins.c:8147
0x88b746 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
     $SRC/gcc/expr.c:11052
0x88cce6 expand_expr_real(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
     $SRC/gcc/expr.c:8289
0x74cb47 expand_expr
     $SRC/gcc/expr.h:281
0x74cb47 expand_call_stmt
     $SRC/gcc/cfgexpand.c:2731
0x74cb47 expand_gimple_stmt_1
     $SRC/gcc/cfgexpand.c:3710
0x74cb47 expand_gimple_stmt
     $SRC/gcc/cfgexpand.c:3875
0x75439b expand_gimple_basic_block
     $SRC/gcc/cfgexpand.c:5915
0x7563ab execute
     $SRC/gcc/cfgexpand.c:6538
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

A MEM rtx now uses a DImode address where for ILP32 we expect SImode.

This looks to be because

[1] https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html



---
  gcc/builtins.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)


0002-Propagate-address-spaces-to-builtins.patch

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 58ea747..361361c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5781,14 +5781,21 @@ static rtx
  get_builtin_sync_mem (tree loc, machine_mode mode)
  {
    rtx addr, mem;
+  int addr_space = TYPE_ADDR_SPACE (POINTER_TYPE_P (TREE_TYPE (loc))
+    ? TREE_TYPE (TREE_TYPE (loc))
+    : TREE_TYPE (loc));
+  scalar_int_mode addr_mode = targetm.addr_space.address_mode
(addr_space);
  
... This now returns Pmode (the default for the hook) for aarch64 ILP32,
which is always DImode.

-  addr = expand_expr (loc, NULL_RTX, ptr_mode, EXPAND_SUM);

Before this patch we used ptr_mode, which does the right thing for
AArch64 ILP32.
Do you think we should just be implementing
targetm.addr_space.address_mode for AArch64 to return SImode for ILP32?

Possibly.   Is there any fallout from making that change?


Unfortunately there are some ICEs when building libgcc, with POST_INC
arguments being output :(


I'll need to dig further.

Thanks,

Kyrill




Jeff


Re: [PATCH 2/2] PR libstdc++/41861 Add full steady_clock support to condition_variable

2019-09-04 Thread Mike Crowe
On Wednesday 04 September 2019 at 14:39:35 +0100, Jonathan Wakely wrote:
> On 15/07/19 17:47 +0100, Mike Crowe wrote:
> > The pthread_cond_clockwait function was recently added[1] to glibc, and is
> > due to be released in glibc 2.30. If this function is available in the C
> > library it can be used to fix
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861 by supporting
> > std::chrono::steady_clock properly with std::condition_variable.
> > 
> > This means that code using std::condition_variable::wait_for or
> > std::condition_variable::wait_until with std::chrono::steady_clock is no
> > longer subject to timing out early or potentially waiting for much longer
> > if the system clock is warped at an inopportune moment.
> > 
> > If pthread_cond_clockwait is available then std::chrono::steady_clock is
> > deemed to be the "best" clock available which means that it is used for the
> > relative wait_for calls and absolute wait_until calls using user-defined
> > clocks. Calls explicitly using std::chrono::system_clock continue to use
> > CLOCK_REALTIME via __gthread_cond_timedwait.
> > 
> > If pthread_cond_clockwait is not available then std::chrono::system_clock
> > is deemed to be the "best" clock available which means that the previous
> > suboptimal behaviour remains.
> > 
> > [1] 
> > https://sourceware.org/git/?p=glibc.git;a=commit;h=afe4de7d283ebd88157126c5494ce1796194c16e
> > 
> > libstdc++-v3/
> > 
> > * include/std/condition_variable: Add include of 
> > to make _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT available.
> > (condition_variable): Split __clock_t into __system_clock_t, which
> > is always system_clock and __best_clock_t, which is the clock that
> > we wish to convert arbitrary other clocks to.  If we know that
> > pthread_cond_clockwait is available then it is best to convert
> > clocks to steady_clock, otherwise it's best to use
> > system_clock. (wait_until): If pthread_cond_clockwait is available,
> > provide a steady_clock overload.  Convert previous __clock_t
> > overload to use __system_clock_t.  Convert generic overload to
> > convert passed clock to __best_clock_t.  (wait_until_impl): Add
> > steady_clock overload that calls pthread_cond_clockwait.  Convert
> > previous __clock_t overload to use
> > __system_clock_t. (condition_variable_any): Use steady_clock for
> > __clock_t if pthread_cond_clockwait is available.
> > 
> > * acinclude.m4: Detect the availability of POSIX-proposed
> > pthread_cond_clockwait function.
> > * configure: Likewise.
> > * configure.ac: Likewise.
> > * config.h.in: Add _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT define to
> > indicate presence of pthread_cond_clockwait function.
> 
> Thanks, Mike!
> 
> This patch is much simpler than the previous versions. Thanks for
> persevering with POSIX and glibc to get the necessary functionality
> into libc.
> 
> I've attached a slightly-modified patch, which changes the names of
> the clock typedefs in std::condition_variable. The names
> "steady_clock" and "system_clock" are already reserved names, so we
> can just use those (instead of the uglier __steady_clock_t forms). And
> we can just keep the original __clock_t name to mean the best clock.
> Does that look OK to you too?

You're far more expert on that stuff than I am, but it looks fine to me.

However, you've also removed the  include. Was that
intentional?

> I've confirmed that with glibc 2.30 the new function is detected, and
> the testsuite passes. I'm running the tests with an older glibc as
> well.

I found that it was necessary to run the test under strace to prove that
the correct variant of futex is called too, because...

> I noticed that the new tests you added in [PATCH 1/2] pass on current
> trunk, even without [PATCH 2/2], is that expected?

Unfortunately, yes. CLOCK_MONOTONIC and CLOCK_REALTIME will tick at the
same rate, so unless someone warps CLOCK_REALTIME during the test we can't
tell the difference between the old implementation of translating
steady_clock to system_clock and the new implementation that uses
steady_clock directly. :( I added the tests in the hope of finding other
mistakes in the new implementation rather than to reproduce the original
problem.
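
For reference, the kind of wait those tests exercise boils down to something
like this (just an illustrative sketch, not the committed testcase):

#include <condition_variable>
#include <mutex>
#include <chrono>

int main()
{
  std::mutex m;
  std::condition_variable cv;
  std::unique_lock<std::mutex> lock(m);

  // Deadline expressed against steady_clock: with pthread_cond_clockwait
  // this waits directly on CLOCK_MONOTONIC, otherwise the deadline is
  // translated to system_clock first.
  auto deadline = std::chrono::steady_clock::now()
                  + std::chrono::milliseconds(100);
  cv.wait_until(lock, deadline);   // nothing notifies, so this just times out
  return 0;
}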

Maybe I should add comments to the test to make that clear along the lines
of those found in testsuite/27_io/objects/char/3_xin.cc ?

Thanks for looking at this.

Mike.


Re: [PATCH v2] [MIPS] Fix handling of MSA SUBREG moves on big-endian targets

2019-09-04 Thread Jeff Law
On 9/4/19 3:35 AM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 9/3/19 2:16 AM, Mihailo Stojanovic wrote:
>>> From: Mihailo Stojanovic 
>>>
>>> Hi everybody,
>>>
>>> This fixes the MSA implementation on big-endian targets which is
>>> essentially broken for things like SUBREG handling and calling
>>> convention for vector types. It borrows heavily from [1] as Aarch64 has
>>> the same problem with SVE vectors.
>>>
>>> Conceptually, register bitconverts should act as the data has been
>>> stored to memory in one mode, and loaded from memory in the other.
>>> This isn't what happens on big-endian as vector load/store instructions
>>> are essentially mixed-endian with respect to the vector as a whole.
>>> The in-register representation of data must be changed so that the
>>> load/store round-trip becomes valid. This is done by inserting one or
>>> two shuffle instructions for every SUBREG move, as previously
>>> implemented in [2] for LLVM. Even if the shuffle instructions weren't
>>> generated, constraint in mips_can_change_mode_class would force the
>>> conceptual memory reload of SUBREG move operand, which would generate
>>> correct, albeit very inefficient code.
>>>
>>> New msa_reg_predicate was created in order to forbid SUBREG operands in
>>> MSA patterns on big-endian targets. It weeds SUBREGs out of the
>>> instruction patterns into SUBREG->REG moves which are caught by the new
>>> msa_mov_subreg_be pattern and transformed into shuffle(s).
>>>
>>> As for the MSA calling convention, ABI states that compiling for MSA
>>> should not change the base ABIs vector calling convention, that is, MSA
>>> vectors passed or returned by value do not use the MSA vector registers.
>>> Instead, they are passed by general-purpose registers, as described by
>>> the ABI. Passing the vector argument requires splitting it into 2 (or 4)
>>> general-purpose registers and bitconverting it into V2DImode (or
>>> V4SImode). The solution boils down to the one presented for SUBREG
>>> moves: force every vector argument to the appropriate mode (V2DI or
>>> V4SI) so that the shuffle instruction(s) might be generated in order to
>>> conform to the calling convention. The same applies to vectors as return
>>> values.
>>>
>>> New testcases were added (thanks Dragan!) to check calling convention
>>> compliance for all possible combinations of MSA and non-MSA
>>> interlinking.
>>>
>>> This fixes the following vectorization test failures:
>>>
>>> vect/pr66251.c
>>> vect/vect-alias-check-10.c
>>> vect/vect-alias-check-11.c
>>> vect/vect-alias-check-12.c
>>> vect/vect-bswap32.c
>>> vect/vect-bswap64.c
>>> vect/vect-nop-move.c
>>> vect/slp-41.c
>>> vect/slp-43.c
>>> vect/slp-45.c
>>> vect/slp-perm-11.c
>>>
>>> Tested on mips32-mti-linux-gnu and mips64-mti-linux-gnu.
>>>
>>> Regards,
>>> Mihailo
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02176.html
>>> [2] https://reviews.llvm.org/rL189330
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-08-08  Mihailo Stojanovic  
>>>
>>> * config/mips/mips-msa.md: Replace register_operand predicate with
>>> msa_reg_operand in every pattern.
>>> (msa_change_mode): New pattern.
>>> (*msa_change_mode_): New unspec.
>>> (*msa_mov_subreg_be): New unspec.
>>> * config/mips/mips-protos.h (mips_split_msa_subreg_move): Declare.
>>> * config/mips/mips.c (mips_maybe_expand_msa_subreg_move): New
>>> function.
>>> (mips_replace_reg_mode): Ditto.
>>> (mips_split_msa_subreg_move): Ditto.
>>> (mips_legitimize_move): Modify machine modes of MSA vectors which
>>> reside in general-purpose registers. Check whether SUBREG move can
>>> be replaced with shuffle(s).
>>> (mips_split_128bit_move): Replace new SUBREGs with REGs.
>>> (mips_split_msa_copy_d): Ditto.
>>> (mips_split_msa_insert_d): Ditto.
>>> (mips_split_msa_fill_d): Ditto.
>>> (mips_can_change_mode_class): Disallow mode change which would
>>> result in lane width change.
>>> (mips_expand_vec_unpack): Replace new SUBREG with REG on big-endian
>>> targets.
>>> * config/mips/predicates.md (msa_reg_operand): New predicate.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-08-08  Dragan Mladjenovic  
>>>
>>> * gcc.target/mips/inter/msa-inter.exp: New file.
>>> * gcc.target/mips/inter/msa_1.h: New test.
>>> * gcc.target/mips/inter/msa_1_main.c: New test.
>>> * gcc.target/mips/inter/msa_1_x.c: New test.
>>> * gcc.target/mips/inter/msa_1_y.c: New test.
>> I don't guess Richard S's proposed cleanups to the generic handling of
>> SUBREGs to address issues with SVE really help here.
> 
> Sorry, haven't applied that patch yet -- need to find time to retest.
ACK.  ISTM that getting it in this week or waiting until after Cauldron
would be best.   Obviously your call.  I'll pass along anything my
tester complains about post-installation.

> 
> But yeah, I don't think my patch will help.  For something like this
> I think we'd need 

Re: [PATCH 0/3] mklog improvements

2019-09-04 Thread Jeff Law
On 9/4/19 2:05 AM, Martin Liška wrote:
> On 9/3/19 6:37 PM, Jeff Law wrote:
>> On 8/30/19 2:55 AM, Martin Liška wrote:
>>> PING^1
>>>
>>> On 8/13/19 9:49 AM, Martin Liska wrote:
 Hi.

 I'm sending format independent changes to mklog that should
 improve the script. It addresses couple of improvement
 mentioned here:
 https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00031.html

 Martin

 Martin Liska (3):
    Use argparse.ArgumentParser for mklog.
    mklog: parse PR references from new test files
    mklog: Do not print changed functions in testsuite
>> So there's a ton of follow-ups (primarily between Jakub and yourself).
> 
> Yes, the follow up is about possible change to ChangeLog format. I first
> provided a patch, but later I realized that I would rather stick with
> the current format.
I'd rather drop ChangeLogs and generate them from the VCS, but that's
another discussion entirely :-)

> 
>>
>> What's the state of the changes Jakub has requested?  Can we go forward
>> with the patch as-is and iterate on further improvements?
> 
> Yes, I'm going to install these 3 patches which do not change the format.
That works for me.

jeff



Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-09-04 Thread Bernd Edlinger
On 9/4/19 4:14 PM, Richard Earnshaw (lists) wrote:
> On 04/09/2019 14:28, Bernd Edlinger wrote:
>> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>>> ===
>>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
>>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
>>> @@ -0,0 +1,19 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target arm_arm_ok } */
>>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>>> +
>>> +struct s {
>>> +  int a, b;
>>> +} __attribute__((aligned(8)));
>>> +
>>> +struct s f0;
>>> +
>>> +void f(int a, int b, int c, int d, int e, struct s f)
>>> +{
>>> +  f0 = f;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>>
>>> I don't think this test is right.  While we can't use an LDRD to load the 
>>> argument off the stack, there's nothing wrong with using an STRD to then 
>>> store the value to f0 (as that is 8-byte aligned).  So the second and third 
>>> scan-assembler tests are meaningless.
>>>
>>
>> Ah, that is very similar to the unaligned-memcpy-2/3.c,
>> see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html
>>
>> initially that is a movdi,
>> then in subreg1 it is split in two movsi
>> which is then re-assembled as ldm
>>
>>
>> Not sure if that is intended in that way.
>>
>>
> 
> Yeah, these are causing me some problems too, but that's because with some 
> changes I'm working on I now see the compiler using r4 and r5, which leads to 
> prologue and epilogue stores that distort the results.
> 
> Tests like this are generally fragile - I hate 'em
> 

Yeah, that changed since r275063 introduced the unaligned-load/storedi

r275063 | edlinger | 2019-08-30 12:38:37 +0200 (Fri, 30 Aug 2019) | 10 lines
Changed paths:
   M /trunk/gcc/ChangeLog
   M /trunk/gcc/config/arm/arm.c
   M /trunk/gcc/config/arm/arm.md
   M /trunk/gcc/config/arm/neon.md

2019-08-30  Bernd Edlinger  

* config/arm/arm.md (unaligned_loaddi,
unaligned_storedi): New unspec insn patterns.
* config/arm/neon.md (unaligned_storev8qi): Likewise.
* config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
and unaligned_storedi for 4-byte aligned memory.
(arm_block_set_aligned_vect): Use unaligned_storev8qi for
4-byte aligned memory.

Since, other than the movdi, they are not split up but stay as ldrd/strd.
But for some unknown reason ira assigns r4-r5 to those although
r1-r2 would also be available. :-(


Bernd.


Re: Force IFN_LOAD/STORE_LANES operands to be memory (PR91577)

2019-09-04 Thread Jeff Law
On 9/4/19 3:18 AM, Richard Sandiford wrote:
> This patch uses the workaround Richi suggested in the PR: make
> discover_nonconstant_array_refs mark the source of an IFN_LOAD_LANES
> call and the destination of an IFN_STORE_LANES call as addressable,
> so that they don't end up being REG rtxes during expansion.
> 
> I had to move the discover_nonconstant_array_refs call outside the
> currently_expanding_to_rtl block since otherwise mark_addressable
> just queues the decision for later.
> 
> Tested on aarch64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-09-04  Richard Sandiford  
> 
> gcc/
>   PR middle-end/91577
>   * cfgexpand.c (discover_nonconstant_array_refs): Force the source
>   of an IFN_LOAD_LANES call and the destination of an IFN_STORE_LANES
>   call to be in memory.
>   (pass_expand::execute): Call discover_nonconstant_array_refs before
>   setting currently_expanding_to_rtl.
> 
> gcc/testsuite/
>   PR middle-end/91577
>   * gfortran.dg/pr91577.f90: New test, taken from temporary_1.f90.
OK
jeff


Re: [PATCH 02/25] Propagate address spaces to builtins.

2019-09-04 Thread Kyrill Tkachov



On 9/4/19 3:21 PM, Kyrill Tkachov wrote:


On 9/3/19 4:00 PM, Jeff Law wrote:
> On 9/3/19 8:01 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> On 9/5/18 12:48 PM, a...@codesourcery.com wrote:
>>> At present, pointers passed to builtin functions, including atomic
>>> operators,
>>> are stripped of their address space properties.  This doesn't seem 
to be

>>> deliberate, it just omits to copy them.
>>>
>>> Not only that, but it forces pointer sizes to Pmode, which isn't
>>> appropriate
>>> for all address spaces.
>>>
>>> This patch attempts to correct both issues.  It works for GCN 
atomics and

>>> GCN OpenACC gang-private variables.
>>>
>>> 2018-09-05  Andrew Stubbs 
>>>      Julian Brown 
>>>
>>>      gcc/
>>>      * builtins.c (get_builtin_sync_mem): Handle address spaces.
>>
>> Sorry for responding to this so late. I'm testing a rebased version of
>> Richard's OOL atomic patches [1] and am hitting an ICE building the
>> -mabi=ilp32 libgfortran multilib for aarch64-none-elf:
>>
>> 0x7284db emit_library_call_value_1(int, rtx_def*, rtx_def*,
>> libcall_type, machine_mode, int, std::pair*)
>>      $SRC/gcc/calls.c:4915
>> 0x1037817 emit_library_call_value(rtx_def*, rtx_def*, libcall_type,
>> machine_mode, rtx_def*, machine_mode, rtx_def*, machine_mode, rtx_def*,
>> machine_mode)
>>      $SRC/gcc/rtl.h:4240
>> 0x1037817 aarch64_expand_compare_and_swap(rtx_def**)
>>      $SRC/gcc/config/aarch64/aarch64.c:16981
>> 0x1353a43 gen_atomic_compare_and_swapsi(rtx_def*, rtx_def*, rtx_def*,
>> rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>      $SRC/gcc/config/aarch64/atomics.md:34
>> 0xb1f9f1 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, 
rtx_def*,

>> rtx_def*, rtx_def*, rtx_def*, rtx_def*) const
>>      $SRC/gcc/recog.h:324
>> 0xb1f9f1 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>>      $SRC/gcc/optabs.c:7443
>> 0xb1fa78 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>      $SRC/gcc/optabs.c:7459
>> 0xb21024 expand_atomic_compare_and_swap(rtx_def**, rtx_def**, rtx_def*,
>> rtx_def*, rtx_def*, bool, memmodel, memmodel)
>>      $SRC/gcc/optabs.c:6448
>> 0x709bd3 expand_builtin_atomic_compare_exchange
>>      $SRC/gcc/builtins.c:6379
>> 0x71a4e9 expand_builtin(tree_node*, rtx_def*, rtx_def*, 
machine_mode, int)

>>      $SRC/gcc/builtins.c:8147
>> 0x88b746 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
>> expand_modifier, rtx_def**, bool)
>>      $SRC/gcc/expr.c:11052
>> 0x88cce6 expand_expr_real(tree_node*, rtx_def*, machine_mode,
>> expand_modifier, rtx_def**, bool)
>>      $SRC/gcc/expr.c:8289
>> 0x74cb47 expand_expr
>>      $SRC/gcc/expr.h:281
>> 0x74cb47 expand_call_stmt
>>      $SRC/gcc/cfgexpand.c:2731
>> 0x74cb47 expand_gimple_stmt_1
>>      $SRC/gcc/cfgexpand.c:3710
>> 0x74cb47 expand_gimple_stmt
>>      $SRC/gcc/cfgexpand.c:3875
>> 0x75439b expand_gimple_basic_block
>>      $SRC/gcc/cfgexpand.c:5915
>> 0x7563ab execute
>>      $SRC/gcc/cfgexpand.c:6538
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See  for instructions.
>>
>> A MEM rtx now uses a DImode address where for ILP32 we expect SImode.
>>
>> This looks to be because
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00062.html
>>
>>
>>> ---
>>>   gcc/builtins.c | 13 ++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>> 0002-Propagate-address-spaces-to-builtins.patch
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index 58ea747..361361c 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -5781,14 +5781,21 @@ static rtx
>>   get_builtin_sync_mem (tree loc, machine_mode mode)
>>   {
>>     rtx addr, mem;
>> +  int addr_space = TYPE_ADDR_SPACE (POINTER_TYPE_P (TREE_TYPE (loc))
>> +    ? TREE_TYPE (TREE_TYPE (loc))
>> +    : TREE_TYPE (loc));
>> +  scalar_int_mode addr_mode = targetm.addr_space.address_mode
>> (addr_space);
>>
>> ... This now returns Pmode (the default for the hook) for aarch64 
ILP32,

>> which is always DImode.
>>
>> -  addr = expand_expr (loc, NULL_RTX, ptr_mode, EXPAND_SUM);
>>
>> Before this patch we used ptr_mode, which does the right thing for
>> AArch64 ILP32.
>> Do you think we should just be implementing
>> targetm.addr_space.address_mode for AArch64 to return SImode for ILP32?
> Possibly.   Is there any fallout from making that change?

Unfortunately there are some ICEs when building libgcc, with POST_INC
arguments being output :(

I'll need to dig further.


Adding a convert_memory_address to ptr_mode before each call to 
emit_library_call_value in the OOL atomics code to convert the addresses 
does fix the ICEs.


Let's see what testing shows.

Kyrill





Thanks,

Kyrill


>
> Jeff


Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-09-04 Thread Jeff Law
On 8/23/19 10:50 AM, Andre Vieira (lists) wrote:
> Hi,
> 
> This patch is an improvement on my last RFC.  As you pointed out, we can
> do the vectorization analysis of the epilogues before doing the
> transformation, using the same approach as used by openmp simd.  I have
> not yet incorporated the cost tweaks for vectorizing the epilogue, I
> would like to do this in a subsequent patch, to make it easier to test
> the differences.
> 
> I currently disable the vectorization of epilogues when versioning for
> iterations.  This is simply because I do not completely understand how
> the assumptions are created and couldn't determine whether using
> skip_vectors with this would work.  If you don't think it is a problem
> or have a testcase to show it work I would gladly look at it.
> 
> Bootstrapped this and the next patch on x86_64 and
> aarch64-unknown-linux-gnu, with no regressions (after test changes in
> next patch).
> 
> gcc/ChangeLog:
> 2019-08-23  Andre Vieira  
> 
>  PR 88915
>  * gengtype.c (main): Add poly_uint64 type to generator.
>  * tree-vect-loop.c (vect_analyze_loop_2): Make it determine
>  whether we vectorize epilogue loops.
>  (vect_analyze_loop): Idem.
>  (vect_transform_loop): Pass decision to vectorize epilogues
>  to vect_do_peeling.
>  * tree-vect-loop-manip.c (vect_do_peeling): Enable skip-vectors
>  when doing loop versioning if we decided to vectorize epilogues.
>  (vect_loop_versioning): Moved decision to check_profitability
>  based on cost model.
>  * tree-vectorizer.h (vect_loop_versioning, vect_do_peeling,
>  vect_analyze_loop, vect_transform_loop): Update declarations.
>  * tree-vectorizer.c: Include params.h
>  (try_vectorize_loop_1): Initialize vect_epilogues_nomask
>  to PARAM_VECT_EPILOGUES_NOMASK and pass it to vect_analyze_loop
>  and vect_transform_loop.  Also make sure vectorizing epilogues
>  does not count towards number of vectorized loops.
Nit.  In several places you use "epilog", proper spelling is "epilogue".



> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index 
> 173e6b51652fd023893b38da786ff28f827553b5..25c3fc8ff55e017ae0b971fa93ce8ce2a07cb94c
>  100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -1013,8 +1015,13 @@ try_vectorize_loop_1 (hash_table 
> *&simduid_to_vf_htab,
>  
>/* Epilogue of vectorized loop must be vectorized too.  */
>if (new_loop)
> -ret |= try_vectorize_loop_1 (simduid_to_vf_htab, num_vectorized_loops,
> -  new_loop, loop_vinfo, NULL, NULL);
> +{
> +  /* Don't include vectorized epilogues in the "vectorized loops" count.
> +   */
> +  unsigned dont_count = *num_vectorized_loops;
> +  ret |= try_vectorize_loop_1 (simduid_to_vf_htab, &dont_count,
> +new_loop, loop_vinfo, NULL, NULL);
> +}
Nit.  Don't wrap a comment with just the closing */ on its own line.
Instead wrap before "count" so that the closing */ ends up on the same line
as the comment text.

This is fine for the trunk after fixing those nits.

jeff


Re: [PATCH 2/2][vect]Make vect-epilogues-nomask=1 default

2019-09-04 Thread Jeff Law
On 8/23/19 10:50 AM, Andre Vieira (lists) wrote:
> I am not sure whether this is the right thing to do, we might want to
> enable this only for certain targets and if so, I also don't know how to
> tackle the testing issue where all the tests that check the number of
> times a dump mentions something will now depend on whether that test
> enables epilogue vectorization or not. I was trying to avoid disabling
> the option for all tests, since if we do that then we aren't really
> testing it. Maybe we ought to enable the option for all tests,
> regardless of the targets default. For testing purposes.
In general we try to drive things with proper costing rather than
enabling/disabling transformations based on the target.  If we need to
adjust cost models, then we're better off getting the bits in now to
give developers a chance to make those adjustments.

WRT testing.  One approach would be to somehow indicate in the dump file
that we're dealing with an epilogue by prefixing the messages.  Then the
scanner could distinguish between main loop vectorization and epilogue
vectorization.  I'm not sure how invasive that would be.

I think the approach you've taken here is fine for now.


jeff


Re: Proposal to patch libiberty.a for controlling whether pathnames on Windows are converted to lower case

2019-09-04 Thread Jeff Law
On 8/15/19 6:42 PM, Carroll, Paul wrote:
> This is a proposed patch to libiberty, with an accompanying patch to GCC.
> The purpose of this patch is to make it possible for Windows-hosted
> toolchains to have the ability to control whether Canonicalized
> filenames are converted to all lower-case.
> Most Windows users are not affected by this behavior.
> However, we have at least one customer who does use Windows systems
> where their hard drives are case-sensitive.  Thus, they desire this
> ability.
> 
> The first implementation of Windows support in libiberty/lrealpath.c was
> added back in 2004.  The new code included a call to GetFullPathName(),
> to Canonicalize the filename.  Next,  if there was sufficient room in
> the buffer, the following code was run:
> 
>     /* The file system is case-preserving but case-insensitive,
>    Canonicalize to lowercase, using the codepage associated
>    with the process locale.  */
>     CharLowerBuff (buf, len);
>     return strdup (buf);
> 
> In effect, the assumption of the code is that all Windows file systems
> will be case-preserving but case-insensitive, so converting a filename
> to lowercase is not a problem.  And tools would always find the
> resulting file on the file system.  That turns out not to be true, but
> lrealpath() was mostly used just for system header files, so no one
> noticed.
> 
> However, in April 2014, libcpp was patched, to cause even non-system
> headers on Windows systems to be Canonicalized.  This patch has caused
> problems for users that have case-sensitive file systems on their
> Windows systems.  A patch to libcpp was proposed to additionally
> canonicalize non-system headers on Windows systems.
> The discussion on the patch starts at
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg9.html
> As is noted in the comments:
>   For DOS based file system, we always try to shorten non-system
> headers, as DOS has a tighter constraint on max path length.
> The okay to add the patch was given May 9, 2014 at
> https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00557.html , with the note:
>   I've spoke with Kai a bit about this and he thinks it's
> appropriate and desirable to shorten paths on these kinds of filesystems.
> 
> The libcpp change meant that lrealpath() began to get called for
> non-system header files.  There is other code in both bfd and libiberty
> that can reach the lrealpath() function, but apparently those functions
> are not as much of a problem, as in not really being used in most tools
> (obviously since our customer had no complaints before 2014).
> 
> What I am proposing is to modify lrealpath() to only call CharLowerBuff
> when the user has a filesystem where changing the case of the filename
> is not an issue.
> That is actually most users, so the default should be to call
> CharLowerBuff.
> But I want to give the user the ability to control that behavior.
> 
> My proposal is to change that section of code in libiberty/lrealpath.c
> as follows:
> 
>  else
>    {
> -   /* The file system is case-preserving but case-insensitive,
> -  Canonicalize to lowercase, using the codepage associated
> +   /* The file system is normally case-preserving but case-insensitive,
> +  Canonicalize to lowercase if desired, using the codepage associated
>    with the process locale.  */
> -    CharLowerBuff (buf, len);
> +   if (libiberty_lowerpath)
> +  CharLowerBuff (buf, len);
>  return strdup (buf);
>    }
> 
> In effect, this just adds a control to let the user decide whether or
> not a pathname is converted to lowercase on Windows systems.
> 
> I also added a global definition for that control at the start of the
> libiberty/lrealpath.c source, setting it so the default behavior on
> Windows is to convert pathnames to lowercase:
> 
> +/* External option to control whether a pathname is converted
> +   to all lower-case on Windows systems.  The default behavior
> +   is to convert.
> +*/
> +#if defined (_WIN32)
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +unsigned char libiberty_lowerpath = 1;
> +#ifdef __cplusplus
> +}
> +#endif
> +#endif
> 
> And, for use by tools that link to libiberty.a, I added an external
> reference to that control in include/libiberty.h:
> 
> +#if defined (_WIN32)
> +/* Determine if filenames should be converted to lower case */
> +extern unsigned char libiberty_lowerpath;
> +#endif
> 
> 
> Adding the above code to include/libiberty.h and libiberty/lrealpath.c
> results in a libiberty.a that behaves exactly the same as it does today
> for most users.
> It also makes available a method of affecting whether or not a given
> tool affects canonicalized pathnames on Windows by also converting those
> pathnames to lowercase.
> 
> The questions to discuss:
> 1.    Is this a reasonable solution to this problem, by adding such a
> controlling symbol?
> 2.    Is it reasonable to use ‘libiberty_lowerpath’

Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-09-04 Thread Richard Earnshaw (lists)

On 04/09/2019 16:00, Bernd Edlinger wrote:

On 9/4/19 4:14 PM, Richard Earnshaw (lists) wrote:

On 04/09/2019 14:28, Bernd Edlinger wrote:

On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:

Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */

I don't think this test is right.  While we can't use an LDRD to load the 
argument off the stack, there's nothing wrong with using an STRD to then store 
the value to f0 (as that is 8-byte aligned).  So the second and third 
scan-assembler tests are meaningless.



Ah, that is very similar to the unaligned-memcpy-2/3.c,
see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html

initially that is a movdi,
then in subreg1 it is split in two movsi
which is then re-assembled as ldm


Not sure if that is intended in that way.




Yeah, these are causing me some problems too, but that's because with some 
changes I'm working on I now see the compiler using r4 and r5, which leads to 
prologue and epilogue stores that distort the results.

Tests like this are generally fragile - I hate 'em



Yeah, that changed since r275063 introduced the unaligned-load/storedi

r275063 | edlinger | 2019-08-30 12:38:37 +0200 (Fri, 30 Aug 2019) | 10 lines
Changed paths:
M /trunk/gcc/ChangeLog
M /trunk/gcc/config/arm/arm.c
M /trunk/gcc/config/arm/arm.md
M /trunk/gcc/config/arm/neon.md

2019-08-30  Bernd Edlinger  

 * config/arm/arm.md (unaligned_loaddi,
 unaligned_storedi): New unspec insn patterns.
 * config/arm/neon.md (unaligned_storev8qi): Likewise.
 * config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
 and unaligned_storedi for 4-byte aligned memory.
 (arm_block_set_aligned_vect): Use unaligned_storev8qi for
 4-byte aligned memory.

Since, other than the movdi, they are not split up but stay as ldrd/strd.
But for some unknown reason ira assigns r4-r5 to those although
r1-r2 would also be available. :-(



r1-r2 can't be used in Arm state as the register has to start on an even 
boundary.  But ira has already used r3 for the address of the store (it 
could have picked r1) and now r4-r5 is the next even-numbered pair.  So 
we end up needing to save some call-clobbered regs.


R.


Bernd.





Re: [PATCH 2/2] PR libstdc++/41861 Add full steady_clock support to condition_variable

2019-09-04 Thread Jonathan Wakely

On 04/09/19 15:49 +0100, Mike Crowe wrote:

On Wednesday 04 September 2019 at 14:39:35 +0100, Jonathan Wakely wrote:

On 15/07/19 17:47 +0100, Mike Crowe wrote:
> The pthread_cond_clockwait function was recently added[1] to glibc, and is
> due to be released in glibc 2.30. If this function is available in the C
> library it can be used to fix
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861 by supporting
> std::chrono::steady_clock properly with std::condition_variable.
>
> This means that code using std::condition_variable::wait_for or
> std::condition_variable::wait_until with std::chrono::steady_clock is no
> longer subject to timing out early or potentially waiting for much longer
> if the system clock is warped at an inopportune moment.
>
> If pthread_cond_clockwait is available then std::chrono::steady_clock is
> deemed to be the "best" clock available which means that it is used for the
> relative wait_for calls and absolute wait_until calls using user-defined
> clocks. Calls explicitly using std::chrono::system_clock continue to use
> CLOCK_REALTIME via __gthread_cond_timedwait.
>
> If pthread_cond_clockwait is not available then std::chrono::system_clock
> is deemed to be the "best" clock available which means that the previous
> suboptimal behaviour remains.
>
> [1] 
https://sourceware.org/git/?p=glibc.git;a=commit;h=afe4de7d283ebd88157126c5494ce1796194c16e
>
> libstdc++-v3/
>
>* include/std/condition_variable: Add include of 
>to make _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT available.
>(condition_variable): Split __clock_t into __system_clock_t, which
>is always system_clock and __best_clock_t, which is the clock that
>we wish to convert arbitrary other clocks to.  If we know that
>pthread_cond_clockwait is available then it is best to convert
>clocks to steady_clock, otherwise it's best to use
>system_clock. (wait_until): If pthread_cond_clockwait is available,
>provide a steady_clock overload.  Convert previous __clock_t
>overload to use __system_clock_t.  Convert generic overload to
>convert passed clock to __best_clock_t.  (wait_until_impl): Add
>steady_clock overload that calls pthread_cond_clockwait.  Convert
>previous __clock_t overload to use
>__system_clock_t. (condition_variable_any): Use steady_clock for
>__clock_t if pthread_cond_clockwait is available.
>
>* acinclude.m4: Detect the availability of POSIX-proposed
>pthread_cond_clockwait function.
>* configure: Likewise.
>* configure.ac: Likewise.
>* config.h.in: Add _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT define to
>indicate presence of pthread_cond_clockwait function.

Thanks, Mike!

This patch is much simpler than the previous versions. Thanks for
persevering with POSIX and glibc to get the necessary functionality
into libc.

I've attached a slightly-modified patch, which changes the names of
the clock typedefs in std::condition_variable. The names
"steady_clock" and "system_clock" are already reserved names, so we
can just use those (instead of the uglier __steady_clock_t forms). And
we can just keep the original __clock_t name to mean the best clock.
Does that look OK to you too?


You're far more expert on that stuff than I am, but it looks fine to me.

However, you've also removed the  include. Was that
intentional?


Yes, it's already included via  (and probably several
other routes). Every header should already be including that. If they
didn't, the macros like _GLIBCXX_HAS_GTHREADS, _GLIBCXX_VISIBILITY,
and _GLIBCXX_BEGIN_NAMESPACE_VERSION would produce errors.



I've confirmed that with glibc 2.30 the new function is detected, and
the testsuite passes. I'm running the tests with an older glibc as
well.


I found that it was necessary to run the test under strace to prove that
the correct variant of futex is called too, because...


I noticed that the new tests you added in [PATCH 1/2] pass on current
trunk, even without [PATCH 2/2], is that expected?


Unfortunately, yes. CLOCK_MONOTONIC and CLOCK_REALTIME will tick at the
same rate, so unless someone warps CLOCK_REALTIME during the test we can't
tell the difference between the old implementation of translating
steady_clock to system_clock and the new implementation that uses
steady_clock directly. :( I added the tests in the hope of finding other
mistakes in the new implementation rather than to reproduce the original
problem.


OK, that was what I figured was the case. Thanks for confirming.


Maybe I should add comments to the test to make that clear along the lines
of those found in testsuite/27_io/objects/char/3_xin.cc ?


More comments are usually useful, otherwise I'll just ask the same
question again and again :-)

All my testing passed, so I'll get this committed today. Thanks again
for persevering to finally get this fixed!



Re: [2/2] PR 78736: libgomp fallout

2019-09-04 Thread Prathamesh Kulkarni
On Tue, 9 May 2017 at 18:59, Jakub Jelinek  wrote:
>
> On Tue, May 09, 2017 at 06:55:12PM +0530, Prathamesh Kulkarni wrote:
> > ping https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00135.html
>
> The libgomp patch is ok provided the warning is added.
> Though, should it be in -Wall and not say just -Wextra?
Hi Jakub,
I committed the patch for PR78736 to add the warning under -Wextra, so the
above warnings for icv.c don't trigger during the build, and the patch
isn't strictly necessary.
Is it OK to commit the patch or leave it as-is?

Thanks,
Prathamesh
>
> Jakub


Re: [2/2] PR 78736: libgomp fallout

2019-09-04 Thread Jakub Jelinek
On Wed, Sep 04, 2019 at 10:08:38PM +0530, Prathamesh Kulkarni wrote:
> On Tue, 9 May 2017 at 18:59, Jakub Jelinek  wrote:
> >
> > On Tue, May 09, 2017 at 06:55:12PM +0530, Prathamesh Kulkarni wrote:
> > > ping https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00135.html
> >
> > The libgomp patch is ok provided the warning is added.
> > Though, should it be in -Wall and not say just -Wextra?
> Hi Jakub,
> I committed the patch for PR78736 to add the warning under -Wextra, so the
> above warnings for icv.c don't trigger during the build, and the patch
> isn't strictly necessary.
> Is it OK to commit the patch or leave it as-is?

I'd leave as is.

Jakub


Re: PR78736

2019-09-04 Thread Prathamesh Kulkarni
On Wed, 4 Sep 2019 at 01:22, Jeff Law  wrote:
>
> On 8/28/19 8:55 PM, Prathamesh Kulkarni wrote:
> > Hi,
> > This is a rebased patch on trunk for PR78736. The last time, it got
> > stuck, because of warning issues with libgfortran, for which I filed
> > PR91593. The patch relegates the warning to Wextra instead, which only
> > triggers (non-fatal) warnings in libgfortran/io/transfer.c, and
> > survives bootstrap+test on x86_64-unknown-linux-gnu. (Also IIRC, Wall
> > triggered several instances of the warning with allmodconfig kernel
> > build last time).
> > Is the patch OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> >
> > pr78736-v2-1.txt
> >
> > 2019-08-29  Prathamesh Kulkarni  
> >
> >   * doc/invoke.texi: Document -Wenum-conversion.
> >   * c-family/c.opt (Wenum-conversion): New option.
> >   * c/c-typeck.c (convert_for_assignment): Handle Wenum-conversion.
> >
> > testsuite/
> >   * gcc.dg/Wenum-conversion.c: New test-case.
> OK.  It'd be nice to have a note on PR91593
Thanks, committed as r275376.

Thanks,
Prathamesh
>
> Jeff


Re: [PATCH, V3, #7 of 10], Implement PCREL_OPT relocation optimization

2019-09-04 Thread Michael Meissner
On Tue, Sep 03, 2019 at 06:33:26PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 07:20:13PM -0400, Michael Meissner wrote:
> > On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> > > Hi!
> > > 
> > > On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > > > /* This file implements a RTL pass that looks for pc-relative loads of 
> > > > the
> > > >address of an external variable using the PCREL_GOT relocation and a 
> > > > single
> > > >load/store that uses that GOT pointer.
> > > 
> > > Does this work better than having a peephole for it?  Is there some reason
> > > you cannot do this with a peephole?
> > 
> > Yes.  Peepholes only look at adjacent insns.
> 
> Huh.  Wow.  Would you believe I never knew that (or I forgot)?  Well, that
> explains why peepholes aren't very effective for us at all, alright!
> 
> > This optimization allows the load
> > of the GOT address to be separated from the eventual load or store.
> > 
> > Peephole2's are likely too early, because you really, really, really don't 
> > want
> > any other pass moving things around.
> 
> That is a bit worrying...  What can go wrong?

As I say in the comments, with PCREL_OPT, you must have exactly one load of the
address and one load or store that references the load of the address.  If
something duplicates one of the loads or stores, or adds another reference to
the address, or just moves it so we can't link the loading of the address to
the final load/store, it will not work.

For stores, the value being stored must be live at both the loading of the
address and the store.

For loads, the register being loaded must not be used between the loading of
the address and the final load.

I.e. in:

PLD r1,foo@got@pcrel
.Lpcrel1:

# other instructions

.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
LWZ r2,0(r1)

If you get lucky and foo is defined in the same compilation unit, this will get
turned into:

PLWZ r2,foo@pcrel

# other instructions

NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

PLD r1,.got.foo@pcrel

# other instructions

LWZ r2,0(r1)

.section .got
.got.foo: .quad foo

So for loads, r2 must not be used between the PLD and LWZ instructions.

Similarly for stores:

PLD r1,foo@got@pcrel
.Lpcrel1:

# other instructions

.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
stw r2,0(r1)

If you get lucky, this becomes:

PSTW r2,foo@pcrel

# other instructions

NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

PLD r1,.got.foo@pcrel

# other instructions

STW r2,0(r1)

.section .got
.got.foo: .quad foo

So as I said, r2 must be live between the PLD and STW, because you don't know
if the PLD will be replaced with a PSTW or not.

So to keep other passes from 'improving' things, I opted to do the pass as the
last pass before final.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] Simplify sinh (x) / cosh (x)

2019-09-04 Thread Rafael Tsuha
Hi, Jeff

On Mon, 29 Apr 2019 at 18:22, Jeff Law  wrote:
>
> On 1/22/19 12:31 PM, Rafael Tsuha wrote:
> > This patch simplifies the expression sinh (x) / cosh (x) to tanh (x).
> > This rule is mathematically valid.
> >
> > There's a slight difference in the result when applying this
> > optimization with x in the interval 0 < x <= 1e-4951. With the
> > optimization, the result using long double is -0 and without the
> > optimization, the result is +0.
> That's an indication something has gone wrong.
>
> If I'm reading this correctly it sounds like tanh in that range is
> returning -0?  If so, that just seems like the wrong output from tanh
> since IIUC for a positive input tanh will always have a positive output.
>

I reverted the patch sent to solve bug 88556 and found out that
tanhl(0) started returning -0 after this patch.

patch we reverted:
(https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.c?r1=267325&r2=267324&pathrev=267325)

At line 44480 of this patch, it checks the sign bit of the input,
and if it's false the expression is multiplied by -1. The way it's
being calculated, this should be done only if the input is a number
greater than zero.

If we follow the code starting at line 44468, replacing op1 with 0, we
can see that e2 equals 0 at line 44482, flags will be false and
finally the e2 = -e2 operation will be executed generating the -0
result.

I have implemented a testcase to reproduce the bug:
https://paste.debian.net/1098800/
This code was compiled with -Ofast when we tested it.
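
In essence it boils down to something like this (a minimal sketch, not the
exact testcase from the paste):

#include <cmath>
#include <cstdio>

int main()
{
  volatile long double x = 0.0L;   // volatile so the call isn't constant-folded
  long double r = std::tanh(x);    // long double overload, i.e. tanhl
  // tanhl(+0) should return +0; with the buggy expansion the sign bit is
  // set, so this prints 1 instead of 0.
  std::printf("signbit = %d\n", (int) std::signbit(r));
  return 0;
}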

Should I file a bug about this? And as for fixing it, is it a good solution
to emit an instruction to return zero immediately if the input equals
zero?

> Jeff
>

Rafael Tsuha


[PATCH, i386]: Fix PR 32413 in a correct way

2019-09-04 Thread Uros Bizjak
Attached patch fixes PR 32413 by forcing secondary memory for QI and
HImode moves between SSE and integer registers. This works together
with (existing) ix86_secondary_memory_needed_mode, where QI and HImode
secondary memory mode is extended to SImode.

The previous patch that fixed PR 32413 forced the SSE->integer cost
to a minimum of 8, which was enough to fix the testcase.
We want to get rid of this artificial limitation, but still have to
prevent unsupported move modes.

The patch should have no effect on benchmarks.

2019-09-04  Uroš Bizjak  

PR target/32413
* config/i386/i386.c (inline_secondary_memory_needed): Return true
for QI and HImode moves between SSE and general registers.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 275375)
+++ config/i386/i386.c  (working copy)
@@ -18328,6 +18328,7 @@ inline_secondary_memory_needed (machine_mode mode,
 
   /* Between SSE and general, we have moves no larger than word size.  */
   if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
+ || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
  || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
return true;
 


Re: C++ PATCH to remove -fdeduce-init-list

2019-09-04 Thread Jason Merrill

On 9/3/19 8:01 PM, Marek Polacek wrote:

On Tue, Sep 03, 2019 at 08:52:54PM -0400, Marek Polacek wrote:

As I recently threatened, it's time we let -fdeduce-init-list go.
This patch removes the implementation while keeping the option for
backward compatibility.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-09-03  Marek Polacek  

* c.opt (fdeduce-init-list): Ignored.

* call.c (build_over_call): Remove -fdeduce-init-list implementation.
* pt.c (unify): Likewise.

* doc/invoke.texi: Remove -fdeduce-init-list documentation.

* g++.dg/cpp0x/initlist-deduce.C: New test.


The test isn't new, I'll fix the CL entry.


OK.

Jason



Re: [ARM/FDPIC v5 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2019-09-04 Thread Christophe Lyon
On Tue, 3 Sep 2019 at 10:40, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > @@ -3485,6 +3485,14 @@ arm_option_override (void)
> >if (flag_pic && TARGET_VXWORKS_RTP)
> >  arm_pic_register = 9;
> >
> > +  /* If in FDPIC mode then force arm_pic_register to be r9.  */
> > +  if (TARGET_FDPIC)
> > +{
> > +  arm_pic_register = FDPIC_REGNUM;
> > +  if (TARGET_THUMB1)
> > + sorry ("FDPIC mode is not supported in Thumb-1 mode.");
>
> Should be no "." at the end.
Done

>
> > +}
> > +
> >if (arm_pic_register_string != NULL)
> >  {
> >int pic_register = decode_reg_name (arm_pic_register_string);
> > [...]
> > @@ -7295,6 +7303,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
> >if (cfun->machine->sibcall_blocked)
> >  return false;
> >
> > +  if (TARGET_FDPIC)
> > +{
> > +  /* In FDPIC, never tailcall something for which we have no decl:
> > +  the target function could be in a different module, requiring
> > +  a different FDPIC register value.  */
> > +  if (decl == NULL)
> > + return false;
> > +
> > +  /* Don't tailcall if we go through the PLT since the FDPIC
> > +  register is then corrupted and we don't restore it after
> > +  static function calls.  */
> > +  if (!targetm.binds_local_p (decl))
> > + return false;
> > +}
> > +
> >/* Never tailcall something if we are generating code for Thumb-1.  */
> >if (TARGET_THUMB1)
> >  return false;
>
> Is this still needed after you removed the optimisation to avoid
> restoring r9?  (Not really a review comment, just curious.)
You are right: it's not needed anymore, I forgot to remove it.

>
> > [...]
> > @@ -7780,28 +7812,132 @@ arm_load_pic_register (unsigned long saved_regs 
> > ATTRIBUTE_UNUSED, rtx pic_reg)
> >emit_use (pic_reg);
> >  }
> >
> > +/* Try to determine whether an object, referenced via ORIG, will be
> > +   placed in the text or data segment.  This is used in FDPIC mode, to
> > +   decide which relocations to use when accessing ORIG.  *IS_READONLY
> > +   is set to true if ORIG is a read-only location, false otherwise.
> > +   Return true if we could determine the location of ORIG, false
> > +   otherwise.  *IS_READONLY is valid only when we return true.  */
> > +static bool
> > +arm_is_segment_info_known (rtx orig, bool *is_readonly)
> > +{
> > +  *is_readonly = false;
> > +
> > +  if (GET_CODE (orig) == LABEL_REF)
> > +{
> > +  *is_readonly = true;
> > +  return true;
> > +}
> > +
> > +  if (SYMBOL_REF_P (orig))
> > +{
> > +  if (CONSTANT_POOL_ADDRESS_P (orig))
> > + {
> > +   *is_readonly = true;
> > +   return true;
> > + }
> > +  else if (SYMBOL_REF_LOCAL_P (orig)
> > +&& !SYMBOL_REF_EXTERNAL_P (orig)
> > +&& SYMBOL_REF_DECL (orig)
> > +&& (!DECL_P (SYMBOL_REF_DECL (orig))
> > +|| !DECL_COMMON (SYMBOL_REF_DECL (orig
>
> This can just be an "if".
>
Done

> > + {
> > +   tree decl = SYMBOL_REF_DECL (orig);
> > +   tree init = (TREE_CODE (decl) == VAR_DECL)
> > + ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
> > + ? decl : 0;
> > +   int reloc = 0;
> > +   bool named_section, readonly;
> > +
> > +   if (init && init != error_mark_node)
> > + reloc = compute_reloc_for_constant (init);
> > +
> > +   named_section = TREE_CODE (decl) == VAR_DECL
> > + && lookup_attribute ("section", DECL_ATTRIBUTES (decl));
> > +   readonly = decl_readonly_section (decl, reloc);
> > +
> > +   /* We don't know where the link script will put a named
> > +  section, so return false in such a case.  */
> > +   if (named_section)
> > + return false;
> > +
> > +   *is_readonly = readonly;
> > +   return true;
> > + }
> > +  else
> > + {
> > +   /* We don't know.  */
> > +   return false;
> > + }
> > +}
> > +  else
> > +gcc_unreachable ();
> > +
> > +  return false;
>
> Then this can end with:
>
>   /* We don't know.  */
>   return false;
> }
>
>   gcc_unreachable ();
> }
>
OK

> > +}
> > +
> >  /* Generate code to load the address of a static var when flag_pic is set. 
> >  */
> >  static rtx_insn *
> >  arm_pic_static_addr (rtx orig, rtx reg)
> >  {
> >rtx l1, labelno, offset_rtx;
> > +  rtx_insn *insn;
> >
> >gcc_assert (flag_pic);
> >
> > -  /* We use an UNSPEC rather than a LABEL_REF because this label
> > - never appears in the code stream.  */
> > -  labelno = GEN_INT (pic_labelno++);
> > -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
> > -  l1 = gen_rtx_CONST (VOIDmode, l1);
> > +  bool is_readonly = false;
> > +  bool info_known = false;
> > +
> > +  if (TARGET_FDPIC
> > +  && SYMBOL_REF_P (orig)
> > +  && !SYMBOL_REF_FUNCTION_P (orig))
> > +info_known = arm_is_segment_info_known (orig, &is_readonly);
> > +
> > +  if (TARGET_FDPIC
> > +

Re: C++ PATCH for c++/91644 - ICE with constinit in function template

2019-09-04 Thread Jason Merrill

On 9/3/19 4:08 PM, Marek Polacek wrote:

First constinit bug report (yay?).  The problem is that while I made sure
that constinit variable templates work as they should, I clearly neglected
ordinary static variables declared constinit in function templates.


Hmm, right, locals don't get DECL_TEMPLATE_INFO.


As an aside, I'm concerned that if we're required to handle
// Foo.h
struct Foo {
   constinit static int x;
};
// Foo.cpp
int Foo::x = 42;

we might need to resurrect DECL_DECLARED_CONSTINIT_P...


Or use an (internal) attribute?

Jason


Re: [ARM/FDPIC v5 10/21] [ARM] FDPIC: Implement TLS support.

2019-09-04 Thread Christophe Lyon
On Wed, 4 Sep 2019 at 16:16, Kyrill Tkachov  wrote:
>
> Hi Christophe,
>
> On 5/15/19 1:39 PM, Christophe Lyon wrote:
> > Support additional relocations: TLS_GD32_FDPIC, TLS_LDM32_FDPIC, and
> > TLS_IE32_FDPIC.
> >
> > We do not support the GNU2 TLS dialect.
> >
> > 2019-XX-XX  Christophe Lyon  
> > Mickaël Guêné 
> >
> > gcc/
> > * config/arm/arm.c (tls_reloc): Add TLS_GD32_FDPIC,
> > TLS_LDM32_FDPIC and TLS_IE32_FDPIC.
> > (arm_call_tls_get_addr): Add FDPIC support.
> > (legitimize_tls_address): Likewise.
> > (arm_emit_tls_decoration): Likewise.
> >
> > Change-Id: I4ea5034ff654540c4658d0a79fb92f70550cdf4a
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 99d13bf..5fc7a20 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -2379,9 +2379,12 @@ char arm_arch_name[] = "__ARM_ARCH_PROFILE__";
> >
> >  enum tls_reloc {
> >TLS_GD32,
> > +  TLS_GD32_FDPIC,
> >TLS_LDM32,
> > +  TLS_LDM32_FDPIC,
> >TLS_LDO32,
> >TLS_IE32,
> > +  TLS_IE32_FDPIC,
> >TLS_LE32,
> >TLS_DESCSEQ  /* GNU scheme */
> >  };
> > @@ -8760,20 +8763,34 @@ arm_call_tls_get_addr (rtx x, rtx reg, rtx
> > *valuep, int reloc)
> >gcc_assert (reloc != TLS_DESCSEQ);
> >start_sequence ();
> >
> > -  labelno = GEN_INT (pic_labelno++);
> > -  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno),
> > UNSPEC_PIC_LABEL);
> > -  label = gen_rtx_CONST (VOIDmode, label);
> > +  if (TARGET_FDPIC)
> > +{
> > +  sum = gen_rtx_UNSPEC (Pmode,
> > +   gen_rtvec (2, x, GEN_INT (reloc)),
> > +   UNSPEC_TLS);
> > +}
> > +  else
> > +{
> > +  labelno = GEN_INT (pic_labelno++);
> > +  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno),
> > UNSPEC_PIC_LABEL);
> > +  label = gen_rtx_CONST (VOIDmode, label);
> >
> > -  sum = gen_rtx_UNSPEC (Pmode,
> > -   gen_rtvec (4, x, GEN_INT (reloc), label,
> > -  GEN_INT (TARGET_ARM ? 8 : 4)),
> > -   UNSPEC_TLS);
> > +  sum = gen_rtx_UNSPEC (Pmode,
> > +   gen_rtvec (4, x, GEN_INT (reloc), label,
> > +  GEN_INT (TARGET_ARM ? 8 : 4)),
> > +   UNSPEC_TLS);
> > +}
> >reg = load_tls_operand (sum, reg);
> >
> > -  if (TARGET_ARM)
> > -emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));
> > +  if (TARGET_FDPIC)
> > +{
> > +  emit_insn (gen_addsi3 (reg, reg, gen_rtx_REG (Pmode,
> > FDPIC_REGNUM)));
> > +}
>
> No {} around single statement.
>
> >else
> > -emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
> > +if (TARGET_ARM)
> > +  emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));
>
>
> Merge that "if" with the "else" in an "else if"
>
> > +else
> > +  emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
> >
> >*valuep = emit_library_call_value (get_tls_get_addr (), NULL_RTX,
> >   LCT_PURE, /* LCT_CONST?  */
> > @@ -8808,6 +8825,7 @@ arm_tls_descseq_addr (rtx x, rtx reg)
> >return reg;
> >  }
> >
> > +
> >  rtx
> >  legitimize_tls_address (rtx x, rtx reg)
> >  {
> > @@ -8820,6 +8838,9 @@ legitimize_tls_address (rtx x, rtx reg)
> >  case TLS_MODEL_GLOBAL_DYNAMIC:
> >if (TARGET_GNU2_TLS)
> >  {
> > + if (TARGET_FDPIC)
> > +   gcc_unreachable();
> > +
>
> Use gcc_assert (TARGET_FDPIC)
>
>
> >reg = arm_tls_descseq_addr (x, reg);
> >
> >tp = arm_load_tp (NULL_RTX);
> > @@ -8829,7 +8850,10 @@ legitimize_tls_address (rtx x, rtx reg)
> >else
> >  {
> >/* Original scheme */
> > - insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32);
> > + if (TARGET_FDPIC)
> > +   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32_FDPIC);
> > + else
> > +   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32);
> >dest = gen_reg_rtx (Pmode);
> >emit_libcall_block (insns, dest, ret, x);
> >  }
> > @@ -8838,6 +8862,9 @@ legitimize_tls_address (rtx x, rtx reg)
> >  case TLS_MODEL_LOCAL_DYNAMIC:
> >if (TARGET_GNU2_TLS)
> >  {
> > + if (TARGET_FDPIC)
> > +   gcc_unreachable();
> > +
>
> Likewise.
>
> Ok with those changes.

Thanks, here is the updated version, with the initialization of labelno in
call_tls_get_addr moved from patch 4/21.


>
> Thanks,
>
> Kyrill
>
>
> >reg = arm_tls_descseq_addr (x, reg);
> >
> >tp = arm_load_tp (NULL_RTX);
> > @@ -8846,7 +8873,10 @@ legitimize_tls_address (rtx x, rtx reg)
> >  }
> >else
> >  {
> > - insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32);
> > + if (TARGET_FDPIC)
> > +   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32_FDPIC);
> > + else
> > +

Re: C++ PATCH for c++/91644 - ICE with constinit in function template

2019-09-04 Thread Marek Polacek
On Wed, Sep 04, 2019 at 04:00:24PM -0400, Jason Merrill wrote:
> On 9/3/19 4:08 PM, Marek Polacek wrote:
> > First constinit bug report (yay?).  The problem is that while I made sure
> > that constinit variable templates work as they should, I clearly neglected
> > ordinary static variables declared constinit in function templates.
> 
> Hmm, right, locals don't get DECL_TEMPLATE_INFO.

And if I create a new, clear one, it crashes.  But I was able to work around
that by remembering if we saw constinit.

> > As an aside, I'm concerned that if we're required to handle
> > // Foo.h
> > struct Foo {
> >constinit static int x;
> > };
> > // Foo.cpp
> > int Foo::x = 42;
> > 
> > we might need to resurrect DECL_DECLARED_CONSTINIT_P...
> 
> Or use an (internal) attribute?

Luckily it would appear that we won't be required to handle that case.

Any comments on the patch itself?

Marek


Re: [PATCH,Fortran] Improve PRNG jumping when using threads

2019-09-04 Thread Steve Kargl
On Wed, Sep 04, 2019 at 03:09:26PM +0300, Janne Blomqvist wrote:
> Currently, when a new thread needs to use the RANDOM_NUMBER intrinsic,
> the per-thread PRNG state is initialized by copying the master state
> and then jumping forwards N*2**128 entries in the stream so that the
> PRNG streams for different threads don't alias each other, where N is
> the number of threads that have so far initialized the PRNG.
> 
> With this patch the master state itself is jumped forwards once each
> time a new thread initializes the PRNG, thus obviating the need to
> jump through all the N-1 previous streams. Effectively turning an O(N)
> algorithm into an O(1) one.
> 
> Regtested on x86_64-pc-linux-gnu, Ok for trunk?
> 

Ok.

-- 
Steve


[PATCH] detect strcat/strcpy overflowing member arrays (PR 91631)

2019-09-04 Thread Martin Sebor

While testing some other buffer overflow improvements I was
reminded that GCC doesn't detect even some basic instances
of overflowing struct members involving strcpy calls with
non-constant strings, with or without _FORTIFY_SOURCE.  For
example:

  struct S { char a[4]; void (*pf)(void); };

  extern struct S a[];

  void f (void)
  {
char s[] = "123456789";
strcpy (a[0].a, s);
  }

This is because the strlen pass that computes the length of
non-constant strings like s above isn't yet integrated with
the -Wstringop-overflow/_FORTIFY_SOURCE checking.
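
For contrast, the same copy with a constant string is typically already
diagnosed by the existing checking (and trapped at run time with
_FORTIFY_SOURCE), since no strlen tracking is needed.  A minimal sketch,
assuming the same declarations as above:

  void g (void)
  {
    strcpy (a[0].a, "123456789");   /* length of the source is known */
  }

It is only when the source length has to be computed, as in f above,
that the overflow currently goes unnoticed.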

But the strlen pass does call into the wrestrict pass to check
for overlapping copies (mainly to avoid issuing meaningless
-Wrestrict warnings or mentioning invalid offsets in them), and
the wrestrict pass checks for invalid offsets and ordinarily
diagnoses those.

However, the invalid offset detection isn't quite robust enough
to detect these cases (it gives up too early), and so the first
line of defense fails.

The attached patch improves the wrestrict pass to also detect
these invalid offsets.  The improvement relies on the recently
introduced component_size() function.  Testing the change in
turn exposed a bug in the function (it relied on the size of
an array to determine whether it was the member[1] kind rather
than considering its domain).  The patch also corrects that bug.
With the patch applied, the out-of-bounds access above is
diagnosed:

  warning: ‘strcpy’ offset [4, 9] from the object at ‘a’ is out of the 
bounds of referenced subobject ‘a’ with type ‘char[4]’ at offset 0 
[-Warray-bounds]


-Warray-bounds isn't the ideal warning to issue and ultimately,
the strcpy member overflow detection will need to be done under
-Wstringop-overflow, and trigger aborts at runtime with
_FORTIFY_SOURCE.  But that's a project of its own (possibly
involving moving the -Wstringop-overflow checking code from
builtins.c into the strlen pass itself).

Tested on x86_64-linux.

Martin

PS I can think of little in terms of "infrastructure" in this
work that could be split out into a separate patch, except
perhaps the component_size change.  But given how small
the function is it doesn't seem necessary.
PR middle-end/91631 - buffer overflow into an array member of a declared object not detected

gcc/ChangeLog:

	PR middle-end/91631
	* builtins.c (component_size): Correct trailing array computation,
	rename to component_ref_size and move...
	(compute_objsize): Adjust.
	* gimple-ssa-warn-restrict.c (builtin_memref::refsize): New member.
	(builtin_access::strict): Do not consider memmove.
	(builtin_access::write_off): New function.
	(builtin_memref::builtin_memref): Initialize refsize.
	(builtin_memref::set_base_and_offset): Adjust refoff and compute
	refsize.
	(builtin_memref::offset_out_of_bounds): Use ooboff input values.
	Handle refsize.
	(builtin_access::builtin_access): Initialize dstoff to destination
	reference offset here instead of in maybe_diag_overlap.  Adjust
	references even to unrelated objects.  Adjust sizrange of bounded
	string functions to reflect bound.  For strcat, adjust destination
	sizrange by that of source.
	(builtin_access::strcat_overlap):  Adjust offsets and sizes
	to reflect the increase in destination sizrange above.
	(builtin_access::overlap): Do not set dstoff here but instead
	in builtin_access::builtin_access.
	(check_bounds_or_overlap): Use builtin_access::write_off.
	(maybe_diag_access_bounds): Add argument.  Add informational notes.
	(dump_builtin_memref, dump_builtin_access): New functions.
	* tree.c (component_ref_size): ...to here.
	* tree.h (component_ref_size): Declare.

gcc/testsuite/ChangeLog:

	PR middle-end/91631
	* c-c++-common/Warray-bounds-3.c: Correct expected offsets.
	* c-c++-common/Warray-bounds-4.c: Same.
	* gcc.dg/Warray-bounds-39.c: Remove xfails.
	* gcc.dg/Warray-bounds-44.c: New test.
	* gcc.dg/Warray-bounds-45.c: New test.

Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 275343)
+++ gcc/builtins.c	(working copy)
@@ -3562,54 +3562,6 @@ check_access (tree exp, tree, tree, tree dstwrite,
   return true;
 }
 
-/* Determines the size of the member referenced by the COMPONENT_REF
-   REF, using its initializer expression if necessary in order to
-   determine the size of an initialized flexible array member.
-   Returns the size (which might be zero for an object with
-   an uninitialized flexible array member) or null if the size
-   cannot be determined.  */
-
-static tree
-component_size (tree ref)
-{
-  gcc_assert (TREE_CODE (ref) == COMPONENT_REF);
-
-  tree member = TREE_OPERAND (ref, 1);
-
-  /* If the member is not last or has a size greater than one, return
- it.  Otherwise it's either a flexible array member or a zero-length
- array member, or an array of length one treated as such.  */
-  tree size = DECL_SIZE_UNIT (member);
-  if (size
-  && (!array_at_struct_end_p (ref)
-	  || (!integer_zerop (size)
-	  && !integer_on

Re: [PATCH] Disable vtable verification with LTO

2019-09-04 Thread Jeff Law
On 9/4/19 3:59 PM, Caroline Tice via gcc-patches wrote:
> Vtable verification currently does not work properly with LTO.  Fixing
> this will be non-trivial, and I currently do not have the time do
> this.  Until this can be fixed, we should not allow users to specify
> both vtable verification and link-time optimization.  The attached
> patch checks to see if both options were specified and if so, tells
> the user that this is not supported.
> 
> I have tested this by compiling hello world with each option
> separately and then trying to specify them both; it behaves as
> expected.
> 
> Is this patch OK to commit?
> 
> -- Caroline Tice
> cmt...@google.com
> 
> ChangeLog entry:
> 
> 2019-09-04  Caroline Tice  
> 
> *  opts.c (finish_options):  Disallow -fvtable-verify and -flto to be
> specified together;
OK.
jeff
> 


[PATCH] Disable vtable verification with LTO

2019-09-04 Thread Caroline Tice via gcc-patches
Vtable verification currently does not work properly with LTO.  Fixing
this will be non-trivial, and I currently do not have the time do
this.  Until this can be fixed, we should not allow users to specify
both vtable verification and link-time optimization.  The attached
patch checks to see if both options were specified and if so, tells
the user that this is not supported.

I have tested this by compiling hello world with each option
separately and then trying to specify them both; it behaves as
expected.

Is this patch OK to commit?

-- Caroline Tice
cmt...@google.com

ChangeLog entry:

2019-09-04  Caroline Tice  

*  opts.c (finish_options):  Disallow -fvtable-verify and -flto to be
specified together;

Index: gcc/opts.c
===
--- gcc/opts.c (revision 275387)
+++ gcc/opts.c (working copy)
@@ -1226,6 +1226,10 @@
   if (opts->x_flag_live_patching && opts->x_flag_lto)
 sorry ("live patching is not supported with LTO");

+  /* Currently vtable verification is not supported for LTO */
+  if (opts->x_flag_vtable_verify && opts->x_flag_lto)
+sorry ("vtable verification is not supported with LTO");
+
   /* Control IPA optimizations based on different -flive-patching level.  */
   if (opts->x_flag_live_patching)
 {


Re: [PATCH 2/2] PR libstdc++/41861 Add full steady_clock support to condition_variable

2019-09-04 Thread Jonathan Wakely

On 04/09/19 18:21 +0100, Mike Crowe wrote:

On Wednesday 04 September 2019 at 17:57:45 +0100, Mike Crowe wrote:

On Wednesday 04 September 2019 at 17:14:30 +0100, Jonathan Wakely wrote:
> On 04/09/19 15:49 +0100, Mike Crowe wrote:
> > On Wednesday 04 September 2019 at 14:39:35 +0100, Jonathan Wakely wrote:
> > > I noticed that the new tests you added in [PATCH 1/2] pass on current
> > > trunk, even without [PATCH 2/2], is that expected?
> >
> > Unfortunately, yes. CLOCK_MONOTONIC and CLOCK_REALTIME will tick at the
> > same rate, so unless someone warps CLOCK_REALTIME during the test we can't
> > tell the difference between the old implementation of translating
> > steady_clock to system_clock and the new implementation that uses
> > steady_clock directly. :( I added the tests in the hope of finding other
> > mistakes in the new implementation rather than to reproduce the original
> > problem.
>
> OK, that was what I figured was the case. Thanks for confirming.
>
> > Maybe I should add comments to the test to make that clear along the lines
> > of those found in testsuite/27_io/objects/char/3_xin.cc ?
>
> More comments are usually useful, otherwise I'll just ask the same
> question again and again :-)

How about something like:

--8<--
It's not possible for this test to automatically ensure that the
system_clock test cases result in a wait on CLOCK_REALTIME and steady_clock
test cases result in a wait on CLOCK_MONOTONIC. It's recommended to run the
test under strace(1) and check whether the expected futex calls are made by
glibc.
-->8--

Unfortunately I'm unable to determine how I actually managed to run the
test under strace. Perhaps I just compiled a similar test myself rather
than using dejagnu. :(


Ah, I did it. Here's a new comment:

--8<--
It's not possible for this test to automatically ensure that the
system_clock test cases result in a wait on CLOCK_REALTIME and steady_clock
test cases result in a wait on CLOCK_MONOTONIC. It's recommended to run the
test under strace(1) and check whether the expected futex calls are made by
glibc. The easiest way to do this is to copy and paste the line used to
build the test from the output of:

make -j8 check-target-libstdc++-v3 
RUNTESTFLAGS="conformance.exp=30_threads/condition_variable/members/* -v -v"

to compile the test with only the tests for one clock enabled and then run it 
as:

strace ./2.exe

You should see calls to:

futex(..., FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, ...)

with large values of tv_sec when using system_clock and calls to:

futex(..., FUTEX_WAIT_BITSET_PRIVATE, ...)

with values of tv_sec based on the system uptime when using steady_clock.
-->8--
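
For a quick manual check outside the testsuite, a minimal standalone
program (just a sketch, not part of the test) can be built and run under
strace directly:

  #include <condition_variable>
  #include <mutex>
  #include <chrono>

  int main()
  {
    std::condition_variable cv;
    std::mutex m;
    std::unique_lock<std::mutex> l(m);
    // With the steady_clock overload in use, the futex call should not
    // carry FUTEX_CLOCK_REALTIME and tv_sec should track system uptime.
    cv.wait_until(l, std::chrono::steady_clock::now()
                     + std::chrono::milliseconds(100));
  }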


Great. I've committed the patch I sent earlier (with the renamed
typedefs) and the attached one for the tests (using an abridged form
of the comment above, linking to the copy of your mail in the list
archives for the full instructions).

Thanks!

commit 8f84bffbd3c4173d684fc825f0847d857a71f54b
Author: Jonathan Wakely 
Date:   Wed Sep 4 12:41:34 2019 +0100

Add user-defined clock to libstdc++ condition_variable tests

2019-09-04  Mike Crowe  

* testsuite/30_threads/condition_variable/members/2.cc (test01):
Parameterise so that test can be run against an arbitrary clock.
(main): Test using std::chrono::steady_clock and a user-defined
clock in addition to the previous std::chrono::system_clock.
* testsuite/30_threads/condition_variable_any/members/2.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
index 0ecb09d80ed..cbac3fa1932 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+template 
 void test01()
 {
   try 
@@ -35,10 +36,10 @@ void test01()
   std::mutex m;
   std::unique_lock l(m);
 
-  auto then = std::chrono::steady_clock::now();
+  auto then = ClockType::now();
   std::cv_status result = c1.wait_until(l, then + ms);
   VERIFY( result == std::cv_status::timeout );
-  VERIFY( (std::chrono::steady_clock::now() - then) >= ms );
+  VERIFY( (ClockType::now() - then) >= ms );
   VERIFY( l.owns_lock() );
 }
   catch (const std::system_error& e)
@@ -102,9 +103,39 @@ void test01_alternate_clock()
 }
 }
 
+/* User defined clock that ticks in two-thousandths of a second
+   forty-two minutes ahead of steady_clock. */
+struct user_defined_clock
+{
+  typedef std::chrono::steady_clock::rep rep;
+  typedef std::ratio<1, 2000> period;
+  typedef std::chrono::duration duration;
+  typedef std::chrono::time_point time_point;
+
+  static constexpr bool is_steady = true;
+
+  static time_point now() noexcept
+  {
+using namespace std::chrono;
+const auto steady_since_epoch = steady_clock::now().time_since_epoch();

Re: [ PATCH ] C++20

2019-09-04 Thread JeanHeyd Meneide
 Thank you for the thorough review!

On Tue, Sep 3, 2019 at 9:31 AM Jonathan Wakely  wrote:
> It looks like__adl_begin and __adl_end should be guarded by #ifdef
> _GLIBCXX_P1394 because they're not needed otherwise.

 I'm absolutely going to need it for the bit data structures (and
in tuning up other parts of the library). It will also be important
for code inside libstdc++ that needs to behave correctly with types
relying on [range.access], while we wait for the cmcstl2
implementation. I did gate
__adl_to_address behind p1394, though, because that is definitely
specific to that paper (and its friend, p1474).
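
For readers unfamiliar with the idiom, here is a minimal sketch of the
two-step call the __adl_* helpers encapsulate (hypothetical names, not
part of the patch):

  #include <iterator>

  namespace user {
    struct range { int a[4]; };
    int *begin (range &r) { return r.a; }   // found only by ADL
  }

  template <typename C>
  auto my_begin (C &c)
  {
    using std::begin;   // fallback for raw arrays and std containers
    return begin (c);   // unqualified call, so ADL also applies
  }

  int arr[3] = {1, 2, 3};
  int *p = my_begin (arr);          // uses std::begin
  user::range r = {{4, 5, 6, 7}};
  int *q = my_begin (r);            // uses user::begin via ADL

Because __adl_begin and friends live in namespace std, the unqualified
call inside them gets the same effect without a using-declaration.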

 Let me know if I missed anything!

 Tested on x86_64 Linux with 23_containers/* tests.
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 3fe80f32cc4..b8b786d9260 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -68,6 +68,7 @@ std_headers = \
${std_srcdir}/scoped_allocator \
${std_srcdir}/set \
${std_srcdir}/shared_mutex \
+   ${std_srcdir}/span \
${std_srcdir}/sstream \
${std_srcdir}/stack \
${std_srcdir}/stdexcept \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index b675d356cd4..cd1e9df5482 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -412,6 +412,7 @@ std_headers = \
${std_srcdir}/scoped_allocator \
${std_srcdir}/set \
${std_srcdir}/shared_mutex \
+   ${std_srcdir}/span \
${std_srcdir}/sstream \
${std_srcdir}/stack \
${std_srcdir}/stdexcept \
diff --git a/libstdc++-v3/include/bits/range_access.h 
b/libstdc++-v3/include/bits/range_access.h
index d1e74711433..a80ec48248f 100644
--- a/libstdc++-v3/include/bits/range_access.h
+++ b/libstdc++-v3/include/bits/range_access.h
@@ -318,6 +318,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #endif // C++17
 
+#if __cplusplus > 201703L
+  // "why are these in namespace std:: and not __gnu_cxx:: ?"
+  // because if we don't put them here it's impossible to 
+  // have implicit ADL with "using std::begin/end/size/data;".
+  template 
+constexpr auto
+__adl_begin(_Container& __cont) noexcept(noexcept(begin(__cont)))
+{ return begin(__cont); }
+  
+  template 
+constexpr auto
+__adl_end(_Container& __cont) noexcept(noexcept(end(__cont)))
+{ return end(__cont); }
+  
+  template 
+constexpr auto
+__adl_cbegin(_Container& __cont) noexcept(noexcept(cbegin(__cont)))
+{ return cbegin(__cont); }
+  
+  template 
+constexpr auto
+__adl_cend(_Container& __cont) noexcept(noexcept(cend(__cont)))
+{ return cend(__cont); }
+
+  template 
+constexpr auto
+__adl_rbegin(_Container& __cont) noexcept(noexcept(rbegin(__cont)))
+{ return rbegin(__cont); }
+  
+  template 
+constexpr auto
+__adl_rend(_Container& __cont) noexcept(noexcept(rend(__cont)))
+{ return rend(__cont); }
+  
+  template 
+constexpr auto
+__adl_crbegin(_Container& __cont) noexcept(noexcept(crbegin(__cont)))
+{ return crbegin(__cont); }
+  
+  template 
+constexpr auto
+__adl_crend(_Container& __cont) noexcept(noexcept(crend(__cont)))
+{ return crend(__cont); }
+  
+  template 
+constexpr auto
+__adl_data(_Container& __cont) noexcept(noexcept(data(__cont)))
+{ return data(__cont); }
+  
+  template 
+constexpr auto
+__adl_cdata(_Container& __cont) noexcept(noexcept(cdata(__cont)))
+{ return cdata(__cont); }
+  
+  template 
+constexpr auto
+__adl_size(_Container& __cont) noexcept(noexcept(size(__cont)))
+{ return size(__cont); }
+  
+  template 
+constexpr auto
+__adl_empty(_Container& __cont) noexcept(noexcept(empty(__cont)))
+{ return empty(__cont); }
+  
+#if defined(_GLIBCXX_P1394) && _GLIBCXX_P1394
+  template 
+constexpr auto
+__adl_to_address(_Container& __cont) noexcept(noexcept(to_address(__cont)))
+{ return to_address(__cont); }
+#endif // P1394 and new contiguous_iterator requirements 
[iterator.concept.contiguous]
+#endif // C++20
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 8ab0d72b0c2..e29775a42c5 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -446,7 +446,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus >= 201103L
   template
-_GLIBCXX20_CONSTEXPR auto
+_GLIBCXX20_CONSTEXPR
+auto
 __niter_base(reverse_iterator<_Iterator> __it)
 -> decltype(__make_reverse_iterator(__niter_base(__it.base(
 { return __make_reverse_iterator(__niter_base(__it.base())); }
@@ -457,7 +458,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   template
-_GLIBCXX20_CONSTEXPR auto
+_GLIBCXX20_CONSTEXPR
+auto
 __miter_base(reverse_iterator<_Iterator> __it)
 -> decltype(__make_

[PATCH] [og9] Fix libgomp.oacc-fortran/lib-13.f90 async bug

2019-09-04 Thread Julian Brown
This patch fixes a bug with the lib-13.f90 test -- an asynchronous
compute region inside a synchronous data region leads to a data race
copying out/unmapping target data.

This test failed intermittently for AMD GCN. I will apply to the
openacc-gcc-9-branch shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
wait API calls.
---
 libgomp/ChangeLog.openacc | 5 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index f9d8e6ecd39..c7ef40e922c 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-09-05  Julian Brown  
+
+   * testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
+   wait API calls.
+
 2019-08-13  Julian Brown  
 
* plugin/plugin-gcn.c (queue_push_callback): Wait on queue-full
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
index da944c35de9..ea35d71b789 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
@@ -19,11 +19,10 @@ program main
 end do
   !$acc end parallel
 end do
-  !$acc end data
 
   call acc_wait_all_async (nprocs + 1)
-
   call acc_wait (nprocs + 1)
+  !$acc end data
 
   if (acc_async_test (1) .neqv. .TRUE.) call abort
   if (acc_async_test (2) .neqv. .TRUE.) call abort
-- 
2.22.0



Re: r272976 - in /trunk/gcc/ada: ChangeLog ali.adb ...

2019-09-04 Thread Maciej W. Rozycki
Arnaud,

> >  This change (not posted to `gcc-patches' for some reason) has caused a 
> > regression in the form of a build failure with the `riscv-linux-gnu' 
> > target (and for the record the `x86_64-linux-gnu' build/host running GCC 
> > 8.3.0):
> > 
> > ali.adb:34:28: warning: use clause for package "GNAT" has no effect
> > ali.adb:35:28: warning: use clause for package "Dynamic_HTables" has no 
> > effect
> > ali.adb:155:52: "Bucket_Range_Type" is undefined (more references follow)
> > ali.adb:158:27: "Dynamic_Hash_Tables" is undefined
> > ali.adb:173:30: "Sig_Map" is undefined (more references follow)
> > gnatmake: ".../gcc/ada/ali.adb" compilation error
> > make[3]: *** [../gcc-interface/Makefile:469: gnatmake-re] Error 4
> > make[3]: Leaving directory '.../gcc/ada/tools'
> > make[2]: *** [Makefile:216: gnattools-cross] Error 2
> > make[2]: Leaving directory '.../gnattools'
> > make[1]: *** [Makefile:11224: all-gnattools] Error 2
> > 
> > -- that has persisted ever since.
> 
> As documented in https://gcc.gnu.org/install/build.html
> you need a matching native GNAT compiler to build a GNAT cross compiler.

 Hmm, I used to use matching versions of the whole GCC suite for building 
cross-compilers, but that was long ago, up to GCC 4.1, and not with GNAT 
specifically in mind.  I stopped being involved with the Ada frontend in 
any way then until recently.  Having come across this build problem I did 
check  and then 
 and all I found was:

"  GNAT
  In order to build the Ada compiler (GNAT) you must already have
  GNAT installed because portions of the Ada frontend are written
  in Ada (with GNAT extensions.) Refer to the Ada installation
  instructions for more specific information."

Notice that an earlier "ISO C++98 compiler" section mentions GCC "version 
3.4 or later" as required for building a cross-compiler, so the absence of 
any version restriction here (including one requiring the host compiler's 
version to match) may arguably be interpreted as there being none at all 
(of course the 3.4 requirement for the other parts of the compiler built 
alongside still applies).  Which is why I stopped looking further.

 NB I wasn't able to locate "the Ada installation instructions" referred.

> If I understood properly, you are using a GCC 8.3.0 as the host compiler
> to build a GNAT cross compiler, which indeed won't work: you need to first
> build a native matching compiler and then use this matching compiler to build
> the cross.

 Thanks for the hint.  I have now successfully natively bootstrapped 
current GCC trunk using my usual `--enable-languages=all' setting, and 
then used that `x86_64-linux-gnu' compiler suite to build my original 
`riscv-linux-gnu' cross-compilation environment.  I guess I'll have to 
wire a preparatory native build stage into my setup somehow.  Sorry about 
the noise.

 Perhaps it would make sense to make the prerequisites section of the 
installation instructions a little bit more explicit about the strict 
version match requirement though?  I guess someone new to a GCC 
build would go through the whole document and then find this:

"  In order to build GNAT, the Ada compiler, you need a working GNAT
   compiler (GCC version 4.0 or later). This includes GNAT tools such as
   gnatmake and gnatlink, since the Ada front end is written in Ada and
   uses some GNAT-specific extensions."

and then that:

"  In order to build a cross compiler, it is suggested to install the new
   compiler as native first, and then use it to build the cross compiler."

buried in the build section.  Still I think the former paragraph really 
belongs to the prerequisites section and so does the latter, perhaps with 
a change to make it more explicit that building a cross-compiler may not 
actually work if the versions do not match.  WDYT?

 NB I have no specific interest in the Ada frontend.  Rather I want to 
have all the frontends built and included in testing so that when making 
changes any regressions trigger that would not with, say, the C frontend 
only.

  Maciej


[PATCH 0/6] [og9] OpenACC worker partitioning in middle end (AMD GCN)

2019-09-04 Thread Julian Brown
This patch series provides support for worker partitioning in the middle
end. The OpenACC device-lowering pass (oaccdevlow) is split into three
passes: the first assigns parallelism levels to loops, the second (new)
part rewrites basic blocks to implement a neutering/broadcasting scheme
for the OpenACC worker-partitioned execution mode, and the third part
performs the rest of the previous device-lowering pass.

Also included are patches to add support for placing gang-private
variables in special memory (e.g. LDS, "local-data share", on AMD GCN),
and to rewrite reductions targeting reference variables to use temporary
local scalar variables instead.

Further commentary is provided alongside individual patches.

Tested with offloading to AMD GCN. I will apply to the
openacc-gcc-9-branch shortly.

Thanks,

Julian

Julian Brown (6):
  [og9] Target-dependent gang-private variable decl rewriting
  [og9] OpenACC middle-end worker-partitioning support
  [og9] AMD GCN adjustments for middle-end worker partitioning
  [og9] Fix up tests for oaccdevlow pass splitting
  [og9] Reference reduction localization
  [og9] Enable worker partitioning for AMD GCN

 gcc/ChangeLog.openacc |   83 +
 gcc/Makefile.in   |1 +
 gcc/config/gcn/gcn-protos.h   |2 +-
 gcc/config/gcn/gcn-tree.c |6 +-
 gcc/config/gcn/gcn.c  |   15 +-
 gcc/config/gcn/gcn.opt|2 +-
 gcc/doc/tm.texi   |   14 +
 gcc/doc/tm.texi.in|6 +
 gcc/gimplify.c|  102 +
 gcc/omp-builtins.def  |8 +
 gcc/omp-low.c |   47 +-
 gcc/omp-offload.c |  290 ++-
 gcc/omp-offload.h |1 +
 gcc/omp-sese.c| 2036 +
 gcc/omp-sese.h|   26 +
 gcc/passes.def|2 +
 gcc/target.def|   19 +
 gcc/targhooks.h   |1 +
 gcc/testsuite/ChangeLog.openacc   |   12 +
 .../goacc/classify-kernels-unparallelized.c   |8 +-
 .../c-c++-common/goacc/classify-kernels.c |8 +-
 .../c-c++-common/goacc/classify-parallel.c|8 +-
 .../c-c++-common/goacc/classify-routine.c |8 +-
 .../goacc/classify-kernels-unparallelized.f95 |8 +-
 .../gfortran.dg/goacc/classify-kernels.f95|8 +-
 .../gfortran.dg/goacc/classify-parallel.f95   |8 +-
 .../gfortran.dg/goacc/classify-routine.f95|8 +-
 gcc/tree-core.h   |4 +-
 gcc/tree-pass.h   |2 +
 gcc/tree.c|   11 +-
 gcc/tree.h|2 +
 libgomp/ChangeLog.openacc |5 +
 libgomp/plugin/plugin-gcn.c   |4 +-
 33 files changed, 2660 insertions(+), 105 deletions(-)
 create mode 100644 gcc/omp-sese.c
 create mode 100644 gcc/omp-sese.h

-- 
2.22.0



[PATCH 3/6] [og9] AMD GCN adjustments for middle-end worker partitioning

2019-09-04 Thread Julian Brown
This patch renames the TARGET_GOACC_ADJUST_PROPAGATION_RECORD
hook introduced in the GCN backend by a previous merge to
TARGET_GOACC_CREATE_PROPAGATION_RECORD, and removes a FIXME relating to
missing worker-partitioning support.

Julian

ChangeLog

gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename
prototype to...
(gcn_goacc_create_propagation_record): This.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
function to...
(gcn_goacc_create_propagation_record): This.  Adjust comment.
* config/gcn/gcn.c (gcn_init_builtins): Override decls for
BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
(gcn_fork_join): Remove inaccurate comment.
(TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to...
(TARGET_GOACC_CREATE_PROPAGATION_RECORD): This.
---
 gcc/ChangeLog.openacc   | 15 +++
 gcc/config/gcn/gcn-protos.h |  2 +-
 gcc/config/gcn/gcn-tree.c   |  6 +++---
 gcc/config/gcn/gcn.c| 11 +++
 4 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index a2b2dcfcf26..0d068ac8ae2 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,18 @@
+2019-09-05  Julian Brown  
+
+   * config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename
+   prototype to...
+   (gcn_goacc_create_propagation_record): This.
+   * config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
+   function to...
+   (gcn_goacc_create_propagation_record): This.  Adjust comment.
+   * config/gcn/gcn.c (gcn_init_builtins): Override decls for
+BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START,
+BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER.
+   (gcn_fork_join): Remove inaccurate comment.
+   (TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to...
+   (TARGET_GOACC_CREATE_PROPAGATION_RECORD): This.
+
 2019-09-05  Julian Brown  
 
* Makefile.in (OBJS): Add omp-sese.o.
diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index da7faf29c70..1711862c6a2 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -37,7 +37,7 @@ extern rtx gcn_full_exec ();
 extern rtx gcn_full_exec_reg ();
 extern rtx gcn_gen_undef (machine_mode);
 extern bool gcn_global_address_p (rtx);
-extern tree gcn_goacc_adjust_propagation_record (tree record_type, bool sender,
+extern tree gcn_goacc_create_propagation_record (tree record_type, bool sender,
 const char *name);
 extern void gcn_goacc_adjust_gangprivate_decl (tree var);
 extern void gcn_goacc_reduction (gcall *call);
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index c6b6302e9ed..04902a39b29 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -667,12 +667,12 @@ gcn_goacc_reduction (gcall *call)
 }
 }
 
-/* Implement TARGET_GOACC_ADJUST_PROPAGATION_RECORD.
+/* Implement TARGET_GOACC_CREATE_PROPAGATION_RECORD.
  
-   Tweak (worker) propagation record, e.g. to put it in shared memory.  */
+   Create (worker) propagation record in shared memory.  */
 
 tree
-gcn_goacc_adjust_propagation_record (tree record_type, bool sender,
+gcn_goacc_create_propagation_record (tree record_type, bool sender,
 const char *name)
 {
   tree type = record_type;
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index f3f112d95a9..ca9321b5f25 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -3468,8 +3468,6 @@ gcn_init_builtins (void)
   TREE_NOTHROW (gcn_builtin_decls[i]) = 1;
 }
 
-/* FIXME: remove the ifdef once OpenACC support is merged upstream.  */
-#ifdef BUILT_IN_GOACC_SINGLE_START
   /* These builtins need to take/return an LDS pointer: override the generic
  versions here.  */
 
@@ -3486,7 +3484,6 @@ gcn_init_builtins (void)
 
   set_builtin_decl (BUILT_IN_GOACC_BARRIER,
gcn_builtin_decls[GCN_BUILTIN_ACC_BARRIER], false);
-#endif
 }
 
 /* Expand the CMP_SWAP GCN builtins.  We have our own versions that do
@@ -4765,8 +4762,6 @@ static bool
 gcn_fork_join (gcall *ARG_UNUSED (call), const int *ARG_UNUSED (dims),
   bool ARG_UNUSED (is_fork))
 {
-  /* GCN does not use the fork/join concept invented for NVPTX.
- Instead we use standard autovectorization.  */
   return false;
 }
 
@@ -6029,9 +6024,9 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_FUNCTION_VALUE_REGNO_P gcn_function_value_regno_p
 #undef  TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR gcn_gimplify_va_arg_expr
-#undef  TARGET_GOACC_ADJUST_PROPAGATION_RECORD
-#define TARGET_GOACC_ADJUST_PROPAGATION_RECORD \
-  gcn_goacc_adjust_propagation_record
+#undef  TARGET_GOACC_CREATE_PROPAGATION_RECOR

[PATCH 4/6] [og9] Fix up tests for oaccdevlow pass splitting

2019-09-04 Thread Julian Brown
This patch adjusts some tests after the splitting of the oaccdevlow pass
into three passes.

Julian

ChangeLog

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c,
c-c++-common/goacc/classify-kernels.c,
c-c++-common/goacc/classify-parallel.c,
c-c++-common/goacc/classify-routine.c,
gfortran.dg/goacc/classify-kernels-unparallelized.f95,
gfortran.dg/goacc/classify-kernels.f95,
gfortran.dg/goacc/classify-parallel.f95,
gfortran.dg/goacc/classify-routine.f95: Scan oaccloops dump instead of
oaccdevlow pass.
---
 gcc/testsuite/ChangeLog.openacc  | 12 
 .../goacc/classify-kernels-unparallelized.c  |  8 
 gcc/testsuite/c-c++-common/goacc/classify-kernels.c  |  8 
 gcc/testsuite/c-c++-common/goacc/classify-parallel.c |  8 
 gcc/testsuite/c-c++-common/goacc/classify-routine.c  |  8 
 .../goacc/classify-kernels-unparallelized.f95|  8 
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 |  8 
 .../gfortran.dg/goacc/classify-parallel.f95  |  8 
 gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 |  8 
 9 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 8295fe61ba7..899b9cf1783 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,15 @@
+2019-09-05  Julian Brown  
+
+   * c-c++-common/goacc/classify-kernels-unparallelized.c,
+   c-c++-common/goacc/classify-kernels.c,
+   c-c++-common/goacc/classify-parallel.c,
+   c-c++-common/goacc/classify-routine.c,
+   gfortran.dg/goacc/classify-kernels-unparallelized.f95,
+   gfortran.dg/goacc/classify-kernels.f95,
+   gfortran.dg/goacc/classify-parallel.f95,
+   gfortran.dg/goacc/classify-routine.f95: Scan oaccloops dump instead of
+   oaccdevlow pass.
+
 2019-07-10  Julian Brown  
 
* c-c++-common/goacc/mdc-1.c: Update clause matching patterns.
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c 
b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index 9dad2de504c..f05fba9d31b 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -5,7 +5,7 @@
{ dg-additional-options "-fopt-info-optimized-omp" }
{ dg-additional-options "-fdump-tree-ompexp" }
{ dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+   { dg-additional-options "-fdump-tree-oaccloops" } */
 
 #define N 1024
 
@@ -36,6 +36,6 @@ void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC 
kernels offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 
1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function 
\\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC 
kernels offload" 1 "oaccloops" } }
+   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 
1 "oaccloops" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function 
\\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c 
b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index f1d46130685..009db79b018 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -5,7 +5,7 @@
{ dg-additional-options "-fopt-info-optimized-omp" }
{ dg-additional-options "-fdump-tree-ompexp" }
{ dg-additional-options "-fdump-tree-parloops1-all" }
-   { dg-additional-options "-fdump-tree-oaccdevlow" } */
+   { dg-additional-options "-fdump-tree-oaccloops" } */
 
 #define N 1024
 
@@ -31,6 +31,6 @@ void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC 
kernels offload" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 
1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function 
\\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc 
kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC 
kernels offload" 1 "oaccloops" } }
+   { dg-final { scan-tree-dump-times "(?n)Compu

[PATCH 1/6] [og9] Target-dependent gang-private variable decl rewriting

2019-09-04 Thread Julian Brown
This patch adds support for rewriting variables marked up with the "oacc
gangprivate" attributes in a target-dependent way in the oaccdevlow pass
of the offload compiler.

This behaviour is controlled by a new target hook,
TARGET_GOACC_ADJUST_GANGPRIVATE_DECL. This is conceptually similar to
the existing TARGET_GOACC_EXPAND_ACCEL_VAR hook, but that one works too
late in the compilation process for AMD GCN.

The patch to set the "oacc gangprivate" attribute was posted upstream here:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00749.html

A version of that is already present on the og9 branch.

Julian

ChangeLog

gcc/
* omp-offload.c (convert.h): Include.
(struct addr_expr_rewrite_info): Add struct.
(rewrite_addr_expr): New function.
(is_sync_builtin_call): New function.
(execute_oacc_device_lower): Support rewriting gang-private variables
using target hook, and fix up addr_expr nodes afterwards.
* target.def (adjust_gangprivate_decl): New target hook.
* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Document new
target hook.
* doc/tm.texi: Regenerate.
---
 gcc/ChangeLog.openacc |  13 +
 gcc/doc/tm.texi   |   4 ++
 gcc/doc/tm.texi.in|   2 +
 gcc/omp-offload.c | 133 ++
 gcc/target.def|   6 ++
 5 files changed, 158 insertions(+)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index a22f07c817c..b1c627b394c 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,16 @@
+2019-09-05  Julian Brown  
+
+   * omp-offload.c (convert.h): Include.
+   (struct addr_expr_rewrite_info): Add struct.
+   (rewrite_addr_expr): New function.
+   (is_sync_builtin_call): New function.
+   (execute_oacc_device_lower): Support rewriting gang-private variables
+   using target hook, and fix up addr_expr nodes afterwards.
+   * target.def (adjust_gangprivate_decl): New target hook.
+   * doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Document new
+   target hook.
+   * doc/tm.texi: Regenerate.
+
 2019-08-13  Julian Brown  
 
* omp-oacc-kernels.c (add_wait): New function, split out of...
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9b88498eb95..f3707c6abe3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6162,6 +6162,10 @@ memories.  A return value of NULL indicates that the 
target does not
 handle this VAR_DECL, and normal RTL expanding is resumed.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_GOACC_ADJUST_GANGPRIVATE_DECL (tree 
@var{var})
+Tweak variable declaration for a gang-private variable.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_EXPLODE_ARGS (void)
 Define this hook to TRUE if arguments to offload regions should be
 exploded, i.e. passed as true arguments rather than in an argument array.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index c9c4341a35f..cebadf4a502 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4210,6 +4210,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_GOACC_EXPAND_ACCEL_VAR
 
+@hook TARGET_GOACC_ADJUST_GANGPRIVATE_DECL
+
 @hook TARGET_GOACC_EXPLODE_ARGS
 
 @node Anchored Addresses
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 1129b00511e..c94dc956d7e 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "cfgloop.h"
+#include "convert.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
function is held in a 'NULL' loop.  */
@@ -1570,6 +1571,78 @@ maybe_discard_oacc_function (tree decl)
   return false;
 }
 
+struct addr_expr_rewrite_info
+{
+  gimple *stmt;
+  hash_set *adjusted_vars;
+  bool avoid_pointer_conversion;
+  bool modified;
+};
+
+static tree
+rewrite_addr_expr (tree *tp, int *walk_subtrees, void *data)
+{
+  walk_stmt_info *wi = (walk_stmt_info *) data;
+  addr_expr_rewrite_info *info = (addr_expr_rewrite_info *) wi->info;
+
+  if (TREE_CODE (*tp) == ADDR_EXPR)
+{
+  tree arg = TREE_OPERAND (*tp, 0);
+
+  if (info->adjusted_vars->contains (arg))
+   {
+ if (info->avoid_pointer_conversion)
+   {
+ *tp = build_fold_addr_expr (arg);
+ info->modified = true;
+ *walk_subtrees = 0;
+   }
+ else
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt);
+ tree repl = build_fold_addr_expr (arg);
+ gimple *stmt1
+   = gimple_build_assign (make_ssa_name (TREE_TYPE (repl)), repl);
+ tree conv = convert_to_pointer (TREE_TYPE (*tp),
+ gimple_assign_lhs (stmt1));
+ gimple *stmt2
+   = gimple_build_assign (make_ssa_name (TREE_TYPE (*tp)), conv);
+ gsi_insert_b

[PATCH 6/6] [og9] Enable worker partitioning for AMD GCN

2019-09-04 Thread Julian Brown
This patch enables middle-end worker partitioning and multiple workers
on AMD GCN.

Julian

ChangeLog

gcc/
* config/gcn/gcn.c (gcn_goacc_validate_dims): Remove
no-flag_worker-partitioning assertion.
(TARGET_GOACC_WORKER_PARTITIONING): Define target hook to true.
* config/gcn/gcn.opt (flag_worker_partitioning): Change default to 1.

libgomp/
* plugin/plugin-gcn.c (gcn_exec): Change default number of workers to
16.
---
 gcc/ChangeLog.openacc   | 7 +++
 gcc/config/gcn/gcn.c| 4 ++--
 gcc/config/gcn/gcn.opt  | 2 +-
 libgomp/ChangeLog.openacc   | 5 +
 libgomp/plugin/plugin-gcn.c | 4 +---
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 2b7f616810d..dde474d144d 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-09-05  Julian Brown  
+
+   * config/gcn/gcn.c (gcn_goacc_validate_dims): Remove
+   no-flag_worker-partitioning assertion.
+   (TARGET_GOACC_WORKER_PARTITIONING): Define target hook to true.
+   * config/gcn/gcn.opt (flag_worker_partitioning): Change default to 1.
+
 2019-09-05  Cesar Philippidis  
Julian Brown  
 
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index ca9321b5f25..b7cf6f093fa 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -4659,8 +4659,6 @@ gcn_goacc_validate_dims (tree decl, int dims[], int 
fn_level,
   /* FIXME: remove -facc-experimental-workers when they're ready.  */
   int max_workers = flag_worker_partitioning ? 16 : 1;
 
-  gcc_assert (!flag_worker_partitioning);
-
   /* The vector size must appear to be 64, to the user, unless this is a
  SEQ routine.  The real, internal value is always 1, which means use
  autovectorization, but the user should not see that.  */
@@ -6035,6 +6033,8 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_GOACC_REDUCTION gcn_goacc_reduction
 #undef  TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS gcn_goacc_validate_dims
+#undef  TARGET_GOACC_WORKER_PARTITIONING
+#define TARGET_GOACC_WORKER_PARTITIONING true
 #undef  TARGET_HARD_REGNO_MODE_OK
 #define TARGET_HARD_REGNO_MODE_OK gcn_hard_regno_mode_ok
 #undef  TARGET_HARD_REGNO_NREGS
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 90d35f42e57..2fd3996edba 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -62,7 +62,7 @@ Target Report RejectNegative Var(flag_bypass_init_error)
 bool flag_worker_partitioning = false
 
 macc-experimental-workers
-Target Report Var(flag_worker_partitioning) Init(0)
+Target Report Var(flag_worker_partitioning) Init(1)
 
 int stack_size_opt = -1
 
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index c7ef40e922c..438bd59b47b 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-09-05  Julian Brown  
+
+   * plugin/plugin-gcn.c (gcn_exec): Change default number of workers to
+   16.
+
 2019-09-05  Julian Brown  
 
* testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 099f70b647c..f0b22ebc3d7 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3244,10 +3244,8 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, 
void **hostaddrs,
  problem size, so let's do a reasonable number of single-worker gangs.
  64 gangs matches a typical Fiji device.  */
 
-  /* NOTE: Until support for middle-end worker partitioning is merged, use 1
- for the default number of workers.  */
   if (dims[0] == 0) dims[0] = 64; /* Gangs.  */
-  if (dims[1] == 0) dims[1] = 1;  /* Workers.  */
+  if (dims[1] == 0) dims[1] = 16; /* Workers.  */
 
   /* The incoming dimensions are expressed in terms of gangs, workers, and
  vectors.  The HSA dimensions are expressed in terms of "work-items",
-- 
2.22.0



[PATCH 2/6] [og9] OpenACC middle-end worker-partitioning support

2019-09-04 Thread Julian Brown
This patch implements worker-partitioning support in the middle end,
by rewriting gimple. The OpenACC execution model requires that code can
run in either "worker single" mode where only a single worker per gang
is active, or "worker partitioned" mode, where multiple workers per gang
are active. This means we need to do something equivalent to
spawning additional workers when transitioning from worker-single to
worker-partitioned mode. However, GPUs typically fix the number of threads
of invoked kernels at launch time, so we need to do something with the
"extra" threads when they are not wanted.

The scheme used is -- very briefly! -- to conditionalise each basic block
that executes in "worker single" mode for worker 0 only. Conditional
branches are handled specially so "idle" (non-0) workers follow along with
worker 0. On transitioning to "worker partitioned" mode, any variables
modified by worker 0 are propagated to the other workers via GPU shared
memory. Special care is taken for routine calls, writes through pointers,
and so forth.
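
A rough CPU-side sketch of the idea (worker_id, worker_barrier and the
shared buffer below are stand-ins for the real GPU primitives; this shows
the shape of the rewritten code, not actual compiler output):

  /* Stubs standing in for the real GPU primitives.  */
  static int  worker_id (void)      { return 0; }
  static void worker_barrier (void) { }
  static int  shared_broadcast;     /* per-gang shared memory (LDS).  */

  int
  offload_region (int n)
  {
    int x;

    /* Worker-single block: only worker 0 executes the body; the other
       workers are "neutered" and fall through to the barrier.  */
    if (worker_id () == 0)
      {
        x = n * 2;               /* arbitrary worker-single computation */
        shared_broadcast = x;    /* propagate values written by worker 0 */
      }
    worker_barrier ();
    x = shared_broadcast;        /* every worker reads worker 0's value */
    worker_barrier ();

    /* Worker-partitioned block: all workers continue with a consistent
       copy of x.  */
    return x;
  }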

Much of omp-sese.c originates from code written for NVPTX by Nathan
Sidwell (adapted to work on gimple instead of RTL) -- though at present,
only the per-basic-block scheme is implemented, and the SESE-finding
algorithm isn't yet used.

Julian

ChangeLog

gcc/
* Makefile.in (OBJS): Add omp-sese.o.
* omp-builtins.def (BUILT_IN_GOACC_BARRIER, BUILT_IN_GOACC_SINGLE_START,
BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END): New
builtins.
* omp-offload.c (omp-sese.h): Include header.
(oacc_loop_xform_head_tail): Call update_stmt for modified builtin
calls.
(oacc_loop_process): Likewise.
(default_goacc_create_propagation_record): New default implementation
for TARGET_GOACC_CREATE_PROPAGATION_RECORD hook.
(execute_oacc_loop_designation): New.  Split out of oacc_device_lower.
(execute_oacc_gimple_workers): New.  Likewise.
(execute_oacc_device_lower): Recreate dims array.
(pass_data_oacc_loop_designation, pass_data_oacc_gimple_workers): New.
(pass_oacc_loop_designation, pass_oacc_gimple_workers): New.
(make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): New.
* omp-offload.h (oacc_fn_attrib_level): Add prototype.
* omp-sese.c: New file.
* omp-sese.h: New file.
* passes.def (pass_oacc_loop_designation, pass_oacc_gimple_workers):
Add passes.
* target.def (worker_partitioning, create_propagation_record): Add
target hooks.
* targhooks.h (default_goacc_create_propagation_record): Add prototype.
* tree-pass.h (make_pass_oacc_loop_designation,
make_pass_oacc_gimple_workers): Add prototypes.
* doc/tm.texi.in (TARGET_GOACC_WORKER_PARTITIONING,
TARGET_GOACC_CREATE_PROPAGATION_RECORD): Add documentation hooks.
* doc/tm.texi: Regenerate.
---
 gcc/ChangeLog.openacc |   32 +
 gcc/Makefile.in   |1 +
 gcc/doc/tm.texi   |   10 +
 gcc/doc/tm.texi.in|4 +
 gcc/omp-builtins.def  |8 +
 gcc/omp-offload.c |  159 +++-
 gcc/omp-offload.h |1 +
 gcc/omp-sese.c| 2036 +
 gcc/omp-sese.h|   26 +
 gcc/passes.def|2 +
 gcc/target.def|   13 +
 gcc/targhooks.h   |1 +
 gcc/tree-pass.h   |2 +
 13 files changed, 2276 insertions(+), 19 deletions(-)
 create mode 100644 gcc/omp-sese.c
 create mode 100644 gcc/omp-sese.h

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index b1c627b394c..a2b2dcfcf26 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,35 @@
+2019-09-05  Julian Brown  
+
+   * Makefile.in (OBJS): Add omp-sese.o.
+   * omp-builtins.def (BUILT_IN_GOACC_BARRIER, BUILT_IN_GOACC_SINGLE_START,
+BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END): New
+   builtins.
+   * omp-offload.c (omp-sese.h): Include header.
+   (oacc_loop_xform_head_tail): Call update_stmt for modified builtin
+   calls.
+   (oacc_loop_process): Likewise.
+   (default_goacc_create_propagation_record): New default implementation
+   for TARGET_GOACC_CREATE_PROPAGATION_RECORD hook.
+   (execute_oacc_loop_designation): New.  Split out of oacc_device_lower.
+   (execute_oacc_gimple_workers): New.  Likewise.
+   (execute_oacc_device_lower): Recreate dims array.
+   (pass_data_oacc_loop_designation, pass_data_oacc_gimple_workers): New.
+   (pass_oacc_loop_designation, pass_oacc_gimple_workers): New.
+   (make_pass_oacc_loop_designation, make_pass_oacc_gimple_workers): New.
+   * omp-offload.h (oacc_fn_attrib_level): Add prototype.
+   * omp-sese.c: New file.
+   * omp-sese.h: New file.
+   * passes.def (pass_oacc_loop_designation, pass_oacc_gimple_workers):
+   Add passes.
+   * target.def (worker_partitioning, create_propagat

[PATCH 5/6] [og9] Reference reduction localization

2019-09-04 Thread Julian Brown
This is a version of Cesar's patch to rewrite OpenACC reference reductions
(e.g. Fortran function arguments) to use temporary scalar variables. This
is necessary at present to avoid stack-slot clashes between multiple
workers on AMD GCN.

Julian

ChangeLog

gcc/
* gimplify.c (privatize_reduction): New struct.
(localize_reductions_r, localize_reductions): New functions.
(gimplify_omp_for): Call localize_reductions.
(gimplify_omp_workshare): Likewise.
* omp-low.c (lower_oacc_reductions): Handle localized reductions.
Create fewer temp vars.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
documentation.
* tree.c (omp_clause_num_ops): Bump number of ops for
OMP_CLAUSE_REDUCTION to 6.
(walk_tree_1): Adjust accordingly.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.
---
 gcc/ChangeLog.openacc |  16 +++
 gcc/gimplify.c| 102 ++
 gcc/omp-low.c |  47 ++-
 gcc/tree-core.h   |   4 +-
 gcc/tree.c|  11 +++--
 gcc/tree.h|   2 +
 6 files changed, 145 insertions(+), 37 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 0d068ac8ae2..2b7f616810d 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,19 @@
+2019-09-05  Cesar Philippidis  
+   Julian Brown  
+
+   * gimplify.c (privatize_reduction): New struct.
+   (localize_reductions_r, localize_reductions): New functions.
+   (gimplify_omp_for): Call localize_reductions.
+   (gimplify_omp_workshare): Likewise.
+   * omp-low.c (lower_oacc_reductions): Handle localized reductions.
+   Create fewer temp vars.
+   * tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
+   documentation.
+   * tree.c (omp_clause_num_ops): Bump number of ops for
+   OMP_CLAUSE_REDUCTION to 6.
+   (walk_tree_1): Adjust accordingly.
+   * tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.
+
 2019-09-05  Julian Brown  
 
* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 58142c9eb90..685db1763e0 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -234,6 +234,11 @@ struct gimplify_omp_ctx
   hash_map *decl_data_clause;
 };
 
+struct privatize_reduction
+{
+  tree ref_var, local_var;
+};
+
 static struct gimplify_ctx *gimplify_ctxp;
 static struct gimplify_omp_ctx *gimplify_omp_ctxp;
 
@@ -10804,6 +10809,80 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, 
void *data)
   return NULL_TREE;
 }
 
+/* Helper function for localize_reductions.  Replace all uses of REF_VAR with
+   LOCAL_VAR.  */
+
+static tree
+localize_reductions_r (tree *tp, int *walk_subtrees, void *data)
+{
+  enum tree_code tc = TREE_CODE (*tp);
+  struct privatize_reduction *pr = (struct privatize_reduction *) data;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+
+  switch (tc)
+{
+case INDIRECT_REF:
+case MEM_REF:
+  if (TREE_OPERAND (*tp, 0) == pr->ref_var)
+   *tp = pr->local_var;
+
+  *walk_subtrees = 0;
+  break;
+
+case VAR_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+  if (*tp == pr->ref_var)
+   *tp = pr->local_var;
+
+  *walk_subtrees = 0;
+  break;
+
+default:
+  break;
+}
+
+  return NULL_TREE;
+}
+
+/* OpenACC worker and vector loop state propagation requires reductions
+   to be inside local variables.  This function replaces all reference-type
+   reductions variables associated with the loop with a local copy.  It is
+   also used to create private copies of reduction variables for those
+   which are not associated with acc loops.  */
+
+static void
+localize_reductions (tree clauses, tree body)
+{
+  tree c, var, type, new_var;
+  struct privatize_reduction pr;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+  {
+   var = OMP_CLAUSE_DECL (c);
+
+   if (!lang_hooks.decls.omp_privatize_by_reference (var))
+ {
+   OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = NULL;
+   continue;
+ }
+
+   type = TREE_TYPE (TREE_TYPE (var));
+   new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+   pr.ref_var = var;
+   pr.local_var = new_var;
+
+   walk_tree (&body, localize_reductions_r, &pr, NULL);
+
+   OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
+  }
+}
+
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */
 
 static enum gimplify_status
@@ -10989,6 +11068,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   gcc_unreachable ();
 }
 
+  if (ort == ORT_ACC)
+{
+  gimplify_omp_ctx *outer = gimplify_omp_ctxp;
+
+  while (outer
+&& outer->region_type != ORT_ACC_PARALLEL
+&& outer->region_type != ORT_ACC_KERNELS)
+  

Go patch committed: Default to -fno-partial-inlining

2019-09-04 Thread Ian Lance Taylor
This patch to the Go frontend defaults to -fno-partial-inlining.  This
is to work around PR 91663: a case in which a function is split for
partial inlining, and then the two functions are inlined together into
a single function, leading to a confusing stack trace.  This doesn't
work for Go since it expects reliable stack traces.  We may be able to
undo this patch later if PR 91663 is fixed.  This patch bootstrapped
and ran the Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian


2019-09-04  Ian Lance Taylor  

PR tree-optimization/91663
* go-lang.c (go_langhook_post_options): Clear
flag_partial_inlining.
Index: go-lang.c
===
--- go-lang.c   (revision 275351)
+++ go-lang.c   (working copy)
@@ -300,6 +300,11 @@ go_langhook_post_options (const char **p
   if (!global_options_set.x_flag_optimize_sibling_calls)
 global_options.x_flag_optimize_sibling_calls = 0;
 
+  /* Partial inlining can confuse uses of runtime.Callers.
+ See https://gcc.gnu.org/PR91663.  */
+  if (!global_options_set.x_flag_partial_inlining)
+global_options.x_flag_partial_inlining = 0;
+
   /* If the debug info level is still 1, as set in init_options, make
  sure that some debugging type is selected.  */
   if (global_options.x_debug_info_level == DINFO_LEVEL_TERSE


[PATCH] PR fortran/91660 -- Improve and restore error messages

2019-09-04 Thread Steve Kargl
Built and regression tested on x86_64-*-freebsd.

The patch restores an improved error message for a malformed
type-spec.  See pr91660_1.f90 for code demonstrating the problem.
While here, improve the error messages for other malformed
type-specs.  See pr91660_2.f90.

OK to commit?

2019-09-04  Steven G. Kargl  

PR fortran/91660
* decl.c (gfc_match_decl_type_spec): Improve and restore error
message for malformed types-spec.

2019-09-04  Steven G. Kargl  

PR fortran/91660
* gfortran.dg/pdt_4.f03: Fix invalid code.
* gfortran.dg/pr91660_1.f90: New test.
* gfortran.dg/pr91660_2.f90: Ditto.
-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 275390)
+++ gcc/fortran/decl.c	(working copy)
@@ -4023,7 +4023,6 @@ gfc_match_decl_type_spec (gfc_typespec *ts, int implic
   return MATCH_YES;
 }
 
-
   m = gfc_match (" type (");
   matched_type = (m == MATCH_YES);
   if (matched_type)
@@ -4071,7 +4070,10 @@ gfc_match_decl_type_spec (gfc_typespec *ts, int implic
 	m = MATCH_YES;
 
   if (matched_type && m == MATCH_YES && gfc_match_char (')') != MATCH_YES)
-	m = MATCH_ERROR;
+	{
+	  gfc_error ("Malformed type-spec at %C");
+	  return MATCH_ERROR;
+	}
 
   return m;
 }
@@ -4094,8 +4096,12 @@ gfc_match_decl_type_spec (gfc_typespec *ts, int implic
 	  && !gfc_notify_std (GFC_STD_F2008, "TYPE with "
 			  "intrinsic-type-spec at %C"))
 	return MATCH_ERROR;
+
   if (matched_type && gfc_match_char (')') != MATCH_YES)
-	return MATCH_ERROR;
+	{
+	  gfc_error ("Malformed type-spec at %C");
+	  return MATCH_ERROR;
+	}
 
   ts->type = BT_REAL;
   ts->kind = gfc_default_double_kind;
@@ -4125,7 +4131,10 @@ gfc_match_decl_type_spec (gfc_typespec *ts, int implic
 	return MATCH_ERROR;
 
   if (matched_type && gfc_match_char (')') != MATCH_YES)
-	return MATCH_ERROR;
+	{
+	  gfc_error ("Malformed type-spec at %C");
+	  return MATCH_ERROR;
+	}
 
   ts->type = BT_COMPLEX;
   ts->kind = gfc_default_double_kind;
@@ -4146,7 +4155,13 @@ gfc_match_decl_type_spec (gfc_typespec *ts, int implic
   if (m == MATCH_ERROR)
 	return m;
 
-m = gfc_match_char (')');
+  gfc_gobble_whitespace ();
+  if (gfc_peek_ascii_char () != ')')
+	{
+	  gfc_error ("Malformed type-spec at %C");
+	  return MATCH_ERROR;
+	}
+  m = gfc_match_char (')'); /* Burn closing ')'.  */
 }
 
   if (m != MATCH_YES)
Index: gcc/testsuite/gfortran.dg/pdt_4.f03
===
--- gcc/testsuite/gfortran.dg/pdt_4.f03	(revision 275390)
+++ gcc/testsuite/gfortran.dg/pdt_4.f03	(working copy)
@@ -97,9 +97,9 @@ contains
 type (mytype(4, *)) :: arg  ! OK
   end subroutine
   subroutine bar(arg)   ! { dg-error "is neither allocatable nor a pointer" }
-type (thytype(8, :, 4) :: arg
+type (thytype(8, :, 4)) :: arg
   end subroutine
   subroutine foobar(arg)! OK
-type (thytype(8, *, 4) :: arg
+type (thytype(8, *, 4)) :: arg
   end subroutine
 end
Index: gcc/testsuite/gfortran.dg/pr91660_1.f90
===
--- gcc/testsuite/gfortran.dg/pr91660_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91660_1.f90	(working copy)
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/91660
+! Code contributed by Gerhard Steinmetz
+program p
+   type t
+   end type
+   type (t x! { dg-error "Malformed type-spec" }
+   x = t()  ! { dg-error "Cannot convert" }
+end
Index: gcc/testsuite/gfortran.dg/pr91660_2.f90
===
--- gcc/testsuite/gfortran.dg/pr91660_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91660_2.f90	(working copy)
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! PR fortran/91660
+program foo
+   type(doubleprecision :: x   ! { dg-error "Malformed type-spec" }
+   type(double precision :: y  ! { dg-error "Malformed type-spec" }
+   type(character(len=3) :: a  ! { dg-error "Malformed type-spec" }
+   type(doublecomplex :: b ! { dg-error "Malformed type-spec" }
+   type(double complex :: c! { dg-error "Malformed type-spec" }
+end program foo


Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2019 at 9:44 AM Hongtao Liu  wrote:
>
> On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak  wrote:
> >
> > On Tue, Sep 3, 2019 at 1:33 PM Richard Biener
> >  wrote:
> >
> > > > > Note:
> > > > > Removing limit of cost would introduce lots of regressions in 
> > > > > SPEC2017 as follow
> > > > > 
> > > > > 531.deepsjeng_r  -7.18%
> > > > > 548.exchange_r  -6.70%
> > > > > 557.xz_r -6.74%
> > > > > 508.namd_r -2.81%
> > > > > 527.cam4_r -6.48%
> > > > > 544.nab_r -3.99%
> > > > >
> > > > > Tested on skylake server.
> > > > > -
> > > > > How about  changing cost from 2 to 8 until we figure out a better 
> > > > > number.
> > > >
> > > > Certainly works for me.  Note the STV code uses the "other" 
> > > > sse_to_integer
> > > > number and the numbers in question here are those for the RA.  There's
> > > > a multitude of values used in the tables here, including some a lot 
> > > > larger.
> > > > So the overall bumping to 8 certainly was the wrong thing to do and 
> > > > instead
> > > > individual numbers should have been adjusted (didn't look at the history
> > > > of that bumping).
> > >
> > > For reference:
> > >
> > > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
> > >
> > > PR target/32413
> > > * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
> > > moves between MMX/SSE registers to at least 8 units to prevent
> > > ICE caused by non-tieable SI/HI/QImodes in SSE registers.
> > >
> > > should probably have been "twice the cost of X" or something like that
> > > instead where X be some reg-reg move cost.
> >
> > Thanks for the reference. It looks that the patch fixes the issue in
> > the wrong place, this should be solved in
> > inline_secondary_memory_needed:
> >
> >   /* Between SSE and general, we have moves no larger than word size.  
> > */
> >   if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> >|| GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> >|| GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> > return true;
> >
> > as an alternative to implement QI and HImode moves as a SImode move
> > between SSE and int<->SSE registers. We have
> > ix86_secondary_memory_needed_mode that extends QI and HImode secondary
> > memory to SImode, so this should solve PR32413.
> >
> > Other than that, what to do with the bizzare property of direct moves
> > that benchmark far worse than indirect moves? I was expecting that
> > keeping the cost of direct inter-regset moves just a bit below the
> > cost of int<->mem<->xmm, but (much ?) higher than itra-regset moves
> > would prevent unwanted wandering of values between register sets,
> > while still generating the direct move when needed. While this almost
>
> I've not tested it yet.
> So i'll start a test about this patch(change cost from 2-->6) with
> Richard's change.
> I'll keep you informed when finishing test.
>
> > fixes the runtime regression, it is not clear to me from Hongtao Liu's
> > message if  Richard's 2019-08-27 fixes the remaining regression or
> > not). Liu, can you please clarify?
> >
> 
> 531.deepsjeng_r  -7.18%
> 548.exchange_r  -6.70%
> 557.xz_r -6.74%
> 508.namd_r -2.81%
> 527.cam4_r -6.48%
> 544.nab_r -3.99%
>
> Tested on skylake server.
> -
> Those regressions are comparing gcc10_20190830 to gcc10_20190824 which
> are mainly caused by removing limit of 8.
>
> > > >  For example Pentium4 has quite high bases for move
> > > > costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20
> > > > while the opposite 12.
> > > >
> > > > So yes, we want to revert the patch by applying its effect to the
> > > > individual cost tables so we can revisit this for the still interesting
> > > > micro-architectures.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao

Changing the cost from 2 to 6 gives:
-
531.deepsjeng_r  9.64%
548.exchange_r  10.24%
557.xz_r  7.99%
508.namd_r 1.08%
527.cam4_r  6.91%
544.nab_r  3.06%


For 531, 548, 557 and 527, the results are now even better than the version before the regression.
For 508 and 544, there are still small regressions compared to the version before the regression.
-- 
BR,
Hongtao


Re: [PATCH] Disable postreload GCSE on large code

2019-09-04 Thread Dimitar Dimitrov
On Tuesday, 3 September 2019, 14:54:19 EEST, Richard Biener wrote:
> 2019-09-02  Richard Biener  
> 
> PR rtl-optimization/36262
> * postreload-gcse.c: Include intl.h and gcse.h.
> (insert_expr_in_table): Insert at the head of cur_expr->avail_occr
> to avoid linear list walk.
> (record_last_mem_set_info): Gate off if not computing
> transparentness. (get_bb_avail_insn): If transparentness isn't computed
> give up early.
> (gcse_after_reload_main): Skip compute_transp and extended PRE
> if gcse_or_cprop_is_too_expensive says so.
> 
> Index: gcc/postreload-gcse.c
> ===
> --- gcc/postreload-gcse.c   (revision 275294)
> +++ gcc/postreload-gcse.c   (working copy)
> @@ -38,7 +38,9 @@ along with GCC; see the file COPYING3.
>  #include "params.h"
>  #include "tree-pass.h"
>  #include "dbgcnt.h"
> +#include "intl.h"
>  #include "gcse-common.h"
> +#include "gcse.h"
> 
>  /* The following code implements gcse after reload, the purpose of this
> pass is to cleanup redundant loads generated by reload and other
> @@ -364,7 +366,7 @@ insert_expr_in_table (rtx x, rtx_insn *i
>int do_not_record_p;
>hashval_t hash;
>struct expr *cur_expr, **slot;
> -  struct occr *avail_occr, *last_occr = NULL;
> +  struct occr *avail_occr;
> 
>hash = hash_expr (x, &do_not_record_p);
> 
> @@ -405,38 +407,22 @@ insert_expr_in_table (rtx x, rtx_insn *i
>cur_expr = *slot;
>  }
> 
> -  /* Search for another occurrence in the same basic block.  */
> +  /* Search for another occurrence in the same basic block.  We insert
> + insns blockwise from start to end, so keep appending to the
> + start of the list so we have to check only a single element.  */
>avail_occr = cur_expr->avail_occr;
> -  while (avail_occr
> -&& BLOCK_FOR_INSN (avail_occr->insn) != BLOCK_FOR_INSN (insn))
> -{
> -  /* If an occurrence isn't found, save a pointer to the end of
> -the list.  */
> -  last_occr = avail_occr;
> -  avail_occr = avail_occr->next;
> -}
> -
> -  if (avail_occr)
> -/* Found another instance of the expression in the same basic block.
> -   Prefer this occurrence to the currently recorded one.  We want
> -   the last one in the block and the block is scanned from start
> -   to end.  */
> +  if (avail_occr
> +  && BLOCK_FOR_INSN (avail_occr->insn) == BLOCK_FOR_INSN (insn))
>  avail_occr->insn = insn;
>else
>  {
>/* First occurrence of this expression in this basic block.  */
>avail_occr = (struct occr *) obstack_alloc (&occr_obstack,
>   sizeof (struct occr));
> -
> -  /* First occurrence of this expression in any block?  */
> -  if (cur_expr->avail_occr == NULL)
> -cur_expr->avail_occr = avail_occr;
> -  else
> -last_occr->next = avail_occr;
> -
>avail_occr->insn = insn;
> -  avail_occr->next = NULL;
> +  avail_occr->next = cur_expr->avail_occr;
>avail_occr->deleted_p = 0;
> +  cur_expr->avail_occr = avail_occr;
>  }
>  }
>  
> @@ -710,6 +696,9 @@ record_last_reg_set_info_regno (rtx_insn
>  static void
>  record_last_mem_set_info (rtx_insn *insn)
>  {
> +  if (!transp)
> +return;
> +
>struct modifies_mem *list_entry;
> 
>list_entry = (struct modifies_mem *) obstack_alloc
> (&modifies_mem_obstack, @@ -995,7 +984,8 @@ get_bb_avail_insn (basic_block
> bb, struc
>/* If we could not find an occurrence in BB, see if BB
>   has a single predecessor with an occurrence that is
>   transparent through BB.  */
> -  if (single_pred_p (bb)
> +  if (transp
> +  && single_pred_p (bb)
>&& bitmap_bit_p (transp[bb->index], bitmap_index)
>&& (occr = get_bb_avail_insn (single_pred (bb), orig_occr,
> bitmap_index))) {
> @@ -1371,6 +1361,10 @@ delete_redundant_insns (void)
>  static void
>  gcse_after_reload_main (rtx f ATTRIBUTE_UNUSED)
>  {
> +  /* Disable computing transparentness if it is too expensive.  */
> +  bool do_transp
> += !gcse_or_cprop_is_too_expensive (_("using simple load CSE after
> register " +"allocation"));
> 
>memset (&stats, 0, sizeof (stats));
> 
> @@ -1392,15 +1386,21 @@ gcse_after_reload_main (rtx f ATTRIBUTE_
>  increase the number of redundant loads found.  So compute
> transparency information for each memory expression in the hash table.  */
> df_analyze ();
> -  /* This cannot be part of the normal allocation routine because
> -we have to know the number of elements in the hash table.  */
> -  transp = sbitmap_vector_alloc (last_basic_block_for_fn (cfun),
> -expr_table->elements ());
> -  bitmap_vector_ones (transp, last_basic_block_for_fn (cfun));
> -  expr_table->traverse  (dump_file);
> +  if (do_transp)
> +   {
> +

[PATCH] Fix unaligned TARGET_MEM_REF expansion (PR 91615)

2019-09-04 Thread Bernd Edlinger
Hi,

it turns out that expanding a TARGET_MEM_REF causes ICEs with
armeb-none-linux-gnueabihf, a big-endian cross compiler.  See PR 91615.

All of them are caused by an unaligned TARGET_MEM_REF for which,
it seems, no movmisalign optab is available.

Fixed by falling back to extract_bit_field when that happens.
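
For reference, this is the kind of source that can end up as an
under-aligned access on a strict-alignment target like armeb (a
hypothetical sketch, not the test case from the PR):

/* Accesses to 'val' are only byte-aligned because of the packed struct;
   when such an access is expanded from a TARGET_MEM_REF, the misaligned
   load needs either a movmisalign pattern or, with this patch, the
   extract_bit_field fallback.  */

struct __attribute__ ((packed)) rec
{
  unsigned char tag;
  unsigned int val;		/* at offset 1 */
};

unsigned int
sum_vals (const struct rec *r, int n)
{
  unsigned int s = 0;
  for (int i = 0; i < n; i++)
    s += r[i].val;
  return s;
}

Whether this exact source reaches the new path depends on the target and
the vectorization options; it is only meant to show the class of accesses
involved.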


Bootstrapped and reg-tested on x86_64-pc-linux-gnu and used
an armeb cross compiler to check that each of the mentioned test cases
is fixed at -O3 and -Ofast.

Is it OK for trunk?


Thanks
Bernd.
2019-09-04  Bernd Edlinger  

	PR middle-end/91615
	* expr.c (expand_expr_real_1): Handle misaligned TARGET_MEM_REF
	without movmisalign optab.

Index: gcc/expr.c
===
--- gcc/expr.c	(Revision 275342)
+++ gcc/expr.c	(Arbeitskopie)
@@ -10313,7 +10313,6 @@ expand_expr_real_1 (tree exp, rtx target, machine_
   {
 	addr_space_t as
 	  = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0;
-	enum insn_code icode;
 	unsigned int align;
 
 	op0 = addr_for_mem_ref (exp, as, true);
@@ -10325,21 +10324,27 @@ expand_expr_real_1 (tree exp, rtx target, machine_
 	if (modifier != EXPAND_WRITE
 	&& modifier != EXPAND_MEMORY
 	&& mode != BLKmode
-	&& align < GET_MODE_ALIGNMENT (mode)
-	/* If the target does not have special handling for unaligned
-	   loads of mode then it can use regular moves for them.  */
-	&& ((icode = optab_handler (movmisalign_optab, mode))
-		!= CODE_FOR_nothing))
+	&& align < GET_MODE_ALIGNMENT (mode))
 	  {
-	class expand_operand ops[2];
+	enum insn_code icode;
 
-	/* We've already validated the memory, and we're creating a
-	   new pseudo destination.  The predicates really can't fail,
-	   nor can the generator.  */
-	create_output_operand (&ops[0], NULL_RTX, mode);
-	create_fixed_operand (&ops[1], temp);
-	expand_insn (icode, 2, ops);
-	temp = ops[0].value;
+	if ((icode = optab_handler (movmisalign_optab, mode))
+		!= CODE_FOR_nothing)
+	  {
+		class expand_operand ops[2];
+
+		/* We've already validated the memory, and we're creating a
+		   new pseudo destination.  The predicates really can't fail,
+		   nor can the generator.  */
+		create_output_operand (&ops[0], NULL_RTX, mode);
+		create_fixed_operand (&ops[1], temp);
+		expand_insn (icode, 2, ops);
+		temp = ops[0].value;
+	  }
+	else if (targetm.slow_unaligned_access (mode, align))
+	  temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
+	0, unsignedp, NULL_RTX,
+	mode, mode, false, NULL);
 	  }
 	return temp;
   }