[PATCH] Fix PR88089

2018-11-20 Thread Richard Biener


Another oversight.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-20  Richard Biener  

PR middle-end/88089
* tree-data-ref.c (lambda_matrix_right_hermite): Use abs_hwi.

From d374f5497b4bb33a5ebf2535035d6183a6a68021 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 19 Nov 2018 13:19:47 +0100
Subject: [PATCH] fix-pr88089


diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 6ebcd93860d..c8193f694df 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -3587,8 +3587,8 @@ lambda_matrix_right_hermite (lambda_matrix A, int m, int n,
  a = S[i-1][j];
  b = S[i][j];
  sigma = (a * b < 0) ? -1: 1;
- a = abs (a);
- b = abs (b);
+ a = abs_hwi (a);
+ b = abs_hwi (b);
  factor = sigma * (a / b);
 
  lambda_matrix_row_add (S, n, i, i-1, -factor);
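
For context: plain abs() takes and returns an int, so it is not suitable for
HOST_WIDE_INT operands; abs_hwi is GCC's HOST_WIDE_INT variant.  A hand-written
illustration of the difference (an assumption about the motivation, not code
from the patch or from the PR):

/* Illustration only: an int-only absolute value truncates a 64-bit
   argument, while a HOST_WIDE_INT-aware one (like abs_hwi) keeps it.  */
#include <stdio.h>
#include <stdint.h>

static int abs_int (int x) { return x < 0 ? -x : x; }              /* abs()-like */
static int64_t abs_hwi_like (int64_t x) { return x < 0 ? -x : x; } /* abs_hwi-like */

int main (void)
{
  int64_t a = -0x100000001LL;                       /* needs more than 32 bits */
  printf ("%d\n", abs_int (a));                     /* converted to int first: prints 1 on typical targets */
  printf ("%lld\n", (long long) abs_hwi_like (a));  /* prints 4294967297 */
  return 0;
}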


[PATCH] Fix PR88087

2018-11-20 Thread Richard Biener


The following fixes PR88087 - a failure for PRE to re-materialize
call fntypes.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-20  Richard Biener  

PR tree-optimization/88087
* tree-ssa-pre.c (create_expression_by_pieces): Re-materialize
call fntype.
* tree-ssa-sccvn.c (copy_reference_ops_from_call): Remember
call fntype.

* gcc.dg/tree-ssa/pr88087.c: New testcase.

From 2c92b2c67a1464671c795e2c524b492ca0dce1a0 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 19 Nov 2018 13:19:21 +0100
Subject: [PATCH] fix-pr88087


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c
new file mode 100644
index 000..558f49f4bd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88087.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+
+int f();
+int d;
+void c()
+{
+  for (;;)
+{
+  f();
+  int (*fp)() __attribute__((const)) = (void *)f;
+  d = fp();
+}
+}
+
+/* We shouldn't ICE and hoist the const call of fp out of the loop.  */
+/* { dg-final { scan-tree-dump "Eliminated: 1" "pre" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 20d3c7807a1..4d5bce83a2c 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -2792,9 +2792,10 @@ create_expression_by_pieces (basic_block block, pre_expr expr,
  args.quick_push (arg);
}
  gcall *call = gimple_build_call_vec (fn, args);
+ gimple_call_set_fntype (call, currop->type);
  if (sc)
gimple_call_set_chain (call, sc);
- tree forcedname = make_ssa_name (currop->type);
+ tree forcedname = make_ssa_name (TREE_TYPE (currop->type));
  gimple_call_set_lhs (call, forcedname);
  /* There's no CCP pass after PRE which would re-compute alignment
 information so make sure we re-materialize this here.  */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 43641916d52..01bedf56662 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -1206,7 +1206,7 @@ copy_reference_ops_from_call (gcall *call,
 
   /* Copy the type, opcode, function, static chain and EH region, if any.  */
   memset (&temp, 0, sizeof (temp));
-  temp.type = gimple_call_return_type (call);
+  temp.type = gimple_call_fntype (call);
   temp.opcode = CALL_EXPR;
   temp.op0 = gimple_call_fn (call);
   temp.op1 = gimple_call_chain (call);


Re: Fix ICE in cp_var_mod_type_p

2018-11-20 Thread Richard Biener
On Mon, 19 Nov 2018, Jan Hubicka wrote:

> Hi,
> enable-checking compiler crashes in free_lang_data because verify_type
> is called too early (after we free data in types but before we clear the
> langhooks) and it ends up calling cp_var_mod_type_p which ICEs.
> 
> This is fixed by moving type checking after hooks updates.  It would be
> also possible to move hook update into free_lang_data_in_cgraph but it
> seems to me that it is better to keep such a global change in the
> toplevel function (free_lang_data).
> 
> lto-bootstrapped/retested x86_64-linux, OK?

OK.  I remember I had a similar patch that did unconditionally
verify_type (of course failing galore...)

Richard.

> Honza
>   * tree.c (free_lang_data_in_cgraph): Add argument fld; break out
>   type checking to...
>   (free_lang_data) ... here; update call of free_lang_data_in_cgraph.
> Index: tree.c
> ===
> --- tree.c(revision 266235)
> +++ tree.c(working copy)
> @@ -6014,44 +6014,38 @@ assign_assembler_name_if_needed (tree t)
> been set up.  */
>  
>  static void
> -free_lang_data_in_cgraph (void)
> +free_lang_data_in_cgraph (struct free_lang_data_d *fld)
>  {
>struct cgraph_node *n;
>varpool_node *v;
> -  struct free_lang_data_d fld;
>tree t;
>unsigned i;
>alias_pair *p;
>  
>/* Find decls and types in the body of every function in the callgraph.  */
>FOR_EACH_FUNCTION (n)
> -find_decls_types_in_node (n, &fld);
> +find_decls_types_in_node (n, fld);
>  
>FOR_EACH_VEC_SAFE_ELT (alias_pairs, i, p)
> -find_decls_types (p->decl, &fld);
> +find_decls_types (p->decl, fld);
>  
>/* Find decls and types in every varpool symbol.  */
>FOR_EACH_VARIABLE (v)
> -find_decls_types_in_var (v, &fld);
> +find_decls_types_in_var (v, fld);
>  
>/* Set the assembler name on every decl found.  We need to do this
>   now because free_lang_data_in_decl will invalidate data needed
>   for mangling.  This breaks mangling on interdependent decls.  */
> -  FOR_EACH_VEC_ELT (fld.decls, i, t)
> +  FOR_EACH_VEC_ELT (fld->decls, i, t)
>  assign_assembler_name_if_needed (t);
>  
>/* Traverse every decl found freeing its language data.  */
> -  FOR_EACH_VEC_ELT (fld.decls, i, t)
> -free_lang_data_in_decl (t, &fld);
> +  FOR_EACH_VEC_ELT (fld->decls, i, t)
> +free_lang_data_in_decl (t, fld);
>  
>/* Traverse every type found freeing its language data.  */
> -  FOR_EACH_VEC_ELT (fld.types, i, t)
> -free_lang_data_in_type (t, &fld);
> -  if (flag_checking)
> -{
> -  FOR_EACH_VEC_ELT (fld.types, i, t)
> - verify_type (t);
> -}
> +  FOR_EACH_VEC_ELT (fld->types, i, t)
> +free_lang_data_in_type (t, fld);
>  }
>  
>  
> @@ -6061,6 +6055,7 @@ static unsigned
>  free_lang_data (void)
>  {
>unsigned i;
> +  struct free_lang_data_d fld;
>  
>/* If we are the LTO frontend we have freed lang-specific data already.  */
>if (in_lto_p
> @@ -6081,7 +6076,7 @@ free_lang_data (void)
>  
>/* Traverse the IL resetting language specific information for
>   operands, expressions, etc.  */
> -  free_lang_data_in_cgraph ();
> +  free_lang_data_in_cgraph (&fld);
>  
>/* Create gimple variants for common types.  */
>for (unsigned i = 0;
> @@ -6102,6 +6097,15 @@ free_lang_data (void)
>  
>lang_hooks.tree_inlining.var_mod_type_p = hook_bool_tree_tree_false;
>  
> +  if (flag_checking)
> +{
> +  int i;
> +  tree t;
> +
> +  FOR_EACH_VEC_ELT (fld.types, i, t)
> + verify_type (t);
> +}
> +
>/* We do not want the default decl_assembler_name implementation,
>   rather if we have fixed everything we want a wrapper around it
>   asserting that all non-local symbols already got their assembler
> Index: testsuite/g++.dg/torture/pr87997.C
> ===
> --- testsuite/g++.dg/torture/pr87997.C(nonexistent)
> +++ testsuite/g++.dg/torture/pr87997.C(working copy)
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +template  struct a;
> +template  class b, typename c, typename f, typename... d>
> +struct a, c> {
> +  using e = b;
> +};
> +template  class h {
> +public:
> +  typedef f g;
> +};
> +template  using k = typename a::e;
> +template  struct l { template  using m = k; };
> +template  struct n {
> +  typedef typename j::g o;
> +  template  struct p {
> +typedef typename l::template m other;
> +  };
> +};
> +template  struct F {
> +  typedef typename n::template p::other q;
> +};
> +template > class r {
> +public:
> +  typename n::q>::o operator[](long);
> +  f *t() noexcept;
> +};
> +class s {
> +  void m_fn2();
> +  r u;
> +};
> +void s::m_fn2() try {
> +  for (int i;;)
> +(this->*u[i])();
> +} catch (...) {
> +}
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH] Fix PR83215, remove alias-set zero case from component_uses_parent_alias_set_from

2018-11-20 Thread Richard Biener
On Mon, 19 Nov 2018, Eric Botcazou wrote:

> > Eric, do you know of any cases in Ada where an alias-set zero base
> > has non-alias-set zero children?  The testsuite seems to be clean
> > but you never know...
> 
> No, at least not off the top of my head; that would be weird in any case.

OK, I'll bootstrap & test the change on the GCC 7 branch as well and if
there's no fallout there install it on trunk (only).  If anything pops
up we can easily revert.

Thanks,
Richard.



Fix PR rtl-optimization/85925

2018-11-20 Thread Eric Botcazou
This is a regression present on all active branches: the combiner wrongly 
optimizes away a zero-extension on the ARM because it rewrites a ZERO_EXTRACT 
from SImode to HImode after having recorded that the upper bits of the results 
are cleared for WORD_REGISTER_OPERATIONS architectures.

I tried 3 approaches to fix the bug (with the help of Segher to evaluate the 
pessimization on various architectures):
 1. Disabling the WORD_REGISTER_OPERATIONS mechanism in the combiner,
 2. Preventing the ZERO_EXTRACT from being rewritten from SImode to HImode,
 3. Selectively disabling the WORD_REGISTER_OPERATIONS mechanism.

The 3 approaches pessimize (as expected) in the following order: 2 > 1 > 3.
The attached patch implements the 3rd approach, which seems a good compromise.

Tested on arm-elf and sparc-sun-solaris2.11, applied on all active branches.


2018-11-20  Eric Botcazou  

PR rtl-optimization/85925
* rtl.h (word_register_operation_p): New predicate.
* combine.c (record_dead_and_set_regs_1): Only apply specific handling
for WORD_REGISTER_OPERATIONS targets to word_register_operation_p RTX.
* rtlanal.c (nonzero_bits1): Likewise.  Adjust couple of comments.
(num_sign_bit_copies1): Likewise.


2018-11-20  Eric Botcazou  

* gcc.c-torture/execute/20181120-1.c: New test.

-- 
Eric Botcazou

Index: rtl.h
===
--- rtl.h	(revision 266178)
+++ rtl.h	(working copy)
@@ -4374,6 +4375,25 @@ strip_offset_and_add (rtx x, poly_int64_
   return x;
 }
 
+/* Return true if X is an operation that always operates on the full
+   registers for WORD_REGISTER_OPERATIONS architectures.  */
+
+inline bool
+word_register_operation_p (const_rtx x)
+{
+  switch (GET_CODE (x))
+{
+case ROTATE:
+case ROTATERT:
+case SIGN_EXTRACT:
+case ZERO_EXTRACT:
+  return false;
+
+default:
+  return true;
+}
+}
+
 /* gtype-desc.c.  */
 extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
Index: combine.c
===
--- combine.c	(revision 266178)
+++ combine.c	(working copy)
@@ -13331,6 +13331,7 @@ record_dead_and_set_regs_1 (rtx dest, co
 	   && subreg_lowpart_p (SET_DEST (setter)))
 	record_value_for_reg (dest, record_dead_insn,
 			  WORD_REGISTER_OPERATIONS
+			  && word_register_operation_p (SET_SRC (setter))
 			  && paradoxical_subreg_p (SET_DEST (setter))
 			  ? SET_SRC (setter)
 			  : gen_lowpart (GET_MODE (dest),
Index: rtlanal.c
===
--- rtlanal.c	(revision 266178)
+++ rtlanal.c	(working copy)
@@ -4485,12 +4485,12 @@ nonzero_bits1 (const_rtx x, scalar_int_m
  might be nonzero in its own mode, taking into account the fact that, on
  CISC machines, accessing an object in a wider mode generally causes the
  high-order bits to become undefined, so they are not known to be zero.
- We extend this reasoning to RISC machines for rotate operations since the
- semantics of the operations in the larger mode is not well defined.  */
+ We extend this reasoning to RISC machines for operations that might not
+ operate on the full registers.  */
   if (mode_width > xmode_width
   && xmode_width <= BITS_PER_WORD
   && xmode_width <= HOST_BITS_PER_WIDE_INT
-  && (!WORD_REGISTER_OPERATIONS || code == ROTATE || code == ROTATERT))
+  && !(WORD_REGISTER_OPERATIONS && word_register_operation_p (x)))
 {
   nonzero &= cached_nonzero_bits (x, xmode,
   known_x, known_mode, known_ret);
@@ -4758,13 +4758,16 @@ nonzero_bits1 (const_rtx x, scalar_int_m
 	  nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
 	  known_x, known_mode, known_ret);
 
-  /* On many CISC machines, accessing an object in a wider mode
+  /* On a typical CISC machine, accessing an object in a wider mode
 	 causes the high-order bits to become undefined.  So they are
-	 not known to be zero.  */
+	 not known to be zero.
+
+	 On a typical RISC machine, we only have to worry about the way
+	 loads are extended.  Otherwise, if we get a reload for the inner
+	 part, it may be loaded from the stack, and then we may lose all
+	 the zero bits that existed before the store to the stack.  */
 	  rtx_code extend_op;
 	  if ((!WORD_REGISTER_OPERATIONS
-	   /* If this is a typical RISC machine, we only have to worry
-		  about the way loads are extended.  */
 	   || ((extend_op = load_extend_op (inner_mode)) == SIGN_EXTEND
 		   ? val_signbit_known_set_p (inner_mode, nonzero)
 		   : extend_op != ZERO_EXTEND)
@@ -5025,10 +5028,9 @@ num_sign_bit_copies1 (const_rtx x, scala
 {
   /* If this machine does not do all register operations on the entire
 	 register and MODE is wi

Re: [PATCH] make function_args_iterator a proper iterator

2018-11-20 Thread Richard Biener
On Mon, Nov 19, 2018 at 4:36 PM Martin Sebor  wrote:
>
> On 11/19/2018 03:32 AM, Richard Biener wrote:
> > On Sat, Nov 17, 2018 at 12:05 AM Martin Sebor  wrote:
> >>
> >> To encourage and simplify the adoption of iterator classes in
> >> GCC the attached patch turns the function_args_iterator struct
> >> into an (almost) proper C++ iterator class that can be used
> >> the same way as traditional forward iterators.
> >>
> >> The patch also replaces all of the 26 uses of the legacy
> >> FOREACH_FUNCTION_ARGS macro with ordinary for loops that use
> >> function_args_iterator directly, and also poisons both
> >> FOREACH_FUNCTION_ARGS and the unused FOREACH_FUNCTION_ARGS_PTR
> >> macros.
> >>
> >> The few dozen (hundred?) existing uses of for loops that iterate
> >> over function parameter types using the TREE_CHAIN() macro can
> >> be relatively easily modified to adopt the iterator approach over
> >> time.  (The patch stops of short of making this change.)
> >>
> >> Eventually, when GCC moves to more a recent C++ revision, it will
> >> become possible to simplify the for loops to make use of the range
> >> based for loop syntax along the lines of:
> >>
> >>for (auto argtype: function_args (functype))
> >>  {
> >>...
> >>  }
> >>
> >> Tested on x86_64-linux, and (lightly) on powerpc64le-linux using
> >> a cross-compiler.  I'll test the changes to the other back ends
> >> before committing.
> >
> > This isn't stage3 material.
>
> In the response referenced below Jeff requested I make use of
> iterators in my patch.  This simply does what he asked for,
> except throughout all of GCC.

I don't think he said you should invent new iterators - we have
existing ones.

Richard.

>
> Martin
>
> >
> > Richard.
> >
> >>
> >> Martin
> >>
> >> PS For some additional background on this change see:
> >>https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00493.html
>


Re: C++ PATCH to implement P1094R2, Nested inline namespaces

2018-11-20 Thread Richard Biener
On Mon, Nov 19, 2018 at 11:12 PM Marek Polacek  wrote:
>
> On Mon, Nov 19, 2018 at 10:33:17PM +0100, Jakub Jelinek wrote:
> > On Mon, Nov 19, 2018 at 04:21:19PM -0500, Marek Polacek wrote:
> > > 2018-11-19  Marek Polacek  
> > >
> > > Implement P1094R2, Nested inline namespaces.
> > > * g++.dg/cpp2a/nested-inline-ns1.C: New test.
> > > * g++.dg/cpp2a/nested-inline-ns2.C: New test.
> > > * g++.dg/cpp2a/nested-inline-ns3.C: New test.
> >
> > Just a small testsuite comment.
> >
> > > --- /dev/null
> > > +++ gcc/testsuite/g++.dg/cpp2a/nested-inline-ns1.C
> > > @@ -0,0 +1,26 @@
> > > +// P1094R2
> > > +// { dg-do compile { target c++2a } }
> >
> > Especially because 2a testing isn't included by default, but also
> > to make sure it works right even with -std=c++17, wouldn't it be better to
> > drop the nested-inline-ns3.C test, make this test c++17 or
> > even better always enabled, add dg-options "-Wpedantic" and
> > just add dg-warning with c++17_down and c++14_down what should be
> > warned on the 3 lines (with .-1 for c++14_down)?
> >
> > Or if you want add some further testcases that will test how
> > c++17 etc. will dg-error on those with -pedantic-errors etc.
>
> Sure, I've made it { target c++11 } and dropped the third test:
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Just another small comment - given the usual high number of
C++ regressions delaying the release, is Stage3 the right time
to add new language features?

> 2018-11-19  Marek Polacek  
>
> Implement P1094R2, Nested inline namespaces.
> * parser.c (cp_parser_namespace_definition): Parse the optional inline
> keyword in a nested-namespace-definition.  Adjust push_namespace call.
> Formatting fix.
>
> * g++.dg/cpp2a/nested-inline-ns1.C: New test.
> * g++.dg/cpp2a/nested-inline-ns2.C: New test.
>
> diff --git gcc/cp/parser.c gcc/cp/parser.c
> index 292cce15676..f39e9d753d2 100644
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -18872,6 +18872,7 @@ cp_parser_namespace_definition (cp_parser* parser)
>cp_ensure_no_oacc_routine (parser);
>
>bool is_inline = cp_lexer_next_token_is_keyword (parser->lexer, 
> RID_INLINE);
> +  const bool topmost_inline_p = is_inline;
>
>if (is_inline)
>  {
> @@ -18890,6 +18891,17 @@ cp_parser_namespace_definition (cp_parser* parser)
>  {
>identifier = NULL_TREE;
>
> +  bool nested_inline_p = cp_lexer_next_token_is_keyword (parser->lexer,
> +RID_INLINE);
> +  if (nested_inline_p && nested_definition_count != 0)
> +   {
> + if (cxx_dialect < cxx2a)
> +   pedwarn (cp_lexer_peek_token (parser->lexer)->location,
> +OPT_Wpedantic, "nested inline namespace definitions only "
> +"available with -std=c++2a or -std=gnu++2a");
> + cp_lexer_consume_token (parser->lexer);
> +   }
> +
>if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
> {
>   identifier = cp_parser_identifier (parser);
> @@ -18904,7 +18916,12 @@ cp_parser_namespace_definition (cp_parser* parser)
> }
>
>if (cp_lexer_next_token_is_not (parser->lexer, CPP_SCOPE))
> -   break;
> +   {
> + /* Don't forget that the innermost namespace might have been
> +marked as inline.  */
> + is_inline |= nested_inline_p;
> + break;
> +   }
>
>if (!nested_definition_count && cxx_dialect < cxx17)
>  pedwarn (input_location, OPT_Wpedantic,
> @@ -18913,7 +18930,9 @@ cp_parser_namespace_definition (cp_parser* parser)
>
>/* Nested namespace names can create new namespaces (unlike
>  other qualified-ids).  */
> -  if (int count = identifier ? push_namespace (identifier) : 0)
> +  if (int count = (identifier
> +  ? push_namespace (identifier, nested_inline_p)
> +  : 0))
> nested_definition_count += count;
>else
> cp_parser_error (parser, "nested namespace name required");
> @@ -18926,7 +18945,7 @@ cp_parser_namespace_definition (cp_parser* parser)
>if (nested_definition_count && attribs)
>  error_at (token->location,
>   "a nested namespace definition cannot have attributes");
> -  if (nested_definition_count && is_inline)
> +  if (nested_definition_count && topmost_inline_p)
>  error_at (token->location,
>   "a nested namespace definition cannot be inline");
>
> @@ -18935,7 +18954,7 @@ cp_parser_namespace_definition (cp_parser* parser)
>
>bool has_visibility = handle_namespace_attrs (current_namespace, attribs);
>
> -  warning  (OPT_Wnamespaces, "namespace %qD entered", current_namespace);
> +  warning (OPT_Wnamespaces, "namespace %qD entered", current_namespace);
>
>/* Look for the `{' to validate starting the namespace.  */
>matching_braces braces;
> diff --git gcc/testsuite/g++.dg/cpp2a/nested-inline-ns
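
For reference, a minimal hand-written illustration of the P1094R2 syntax being
implemented above (an example for illustration, not one of the committed
testcases):

// C++2a (P1094R2): 'inline' may appear inside a nested-namespace-definition.
namespace lib::inline v2::detail {
  int f () { return 2; }
}

// Equivalent long-hand spelling; since v2 is inline, its members are
// visible as members of lib, so lib::detail::f resolves to v2::detail::f.
namespace lib { inline namespace v2 { namespace detail {
  int g () { return lib::detail::f (); }
} } }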

Re: [C++ Patch] PR 84636 ("internal compiler error: Segmentation fault (identifier_p()/grokdeclarator())")

2018-11-20 Thread Paolo Carlini

Hi,

On 19/11/18 23:24, Marek Polacek wrote:

On Mon, Nov 19, 2018 at 08:03:24PM +0100, Paolo Carlini wrote:

@@ -12245,8 +12246,9 @@ grokdeclarator (const cp_declarator *declarator,
error ("invalid use of %<::%>");
return error_mark_node;
  }
-   else if (TREE_CODE (type) == FUNCTION_TYPE
-|| TREE_CODE (type) == METHOD_TYPE)
+   else if ((TREE_CODE (type) == FUNCTION_TYPE
+ || TREE_CODE (type) == METHOD_TYPE)

I know it's preexisting but we have FUNC_OR_METHOD_TYPE_P for this.


Ah, thanks! I guess I didn't notice that because the macro is defined in 
tree.h. I adjusted my first patch and I'll send a separate one for all 
the remaining instances (many!). Thanks again,


Paolo.



Index: cp/decl.c
===
--- cp/decl.c   (revision 266268)
+++ cp/decl.c   (working copy)
@@ -12165,7 +12165,8 @@ grokdeclarator (const cp_declarator *declarator,
 }
 
   if (ctype && TREE_CODE (type) == FUNCTION_TYPE && staticp < 2
-  && !(identifier_p (unqualified_id)
+  && !(unqualified_id
+  && identifier_p (unqualified_id)
   && IDENTIFIER_NEWDEL_OP_P (unqualified_id)))
 {
   cp_cv_quals real_quals = memfn_quals;
@@ -12245,8 +12246,7 @@ grokdeclarator (const cp_declarator *declarator,
error ("invalid use of %<::%>");
return error_mark_node;
  }
-   else if (TREE_CODE (type) == FUNCTION_TYPE
-|| TREE_CODE (type) == METHOD_TYPE)
+   else if (FUNC_OR_METHOD_TYPE_P (type) && !bitfield)
  {
int publicp = 0;
tree function_context;
Index: testsuite/g++.dg/parse/bitfield3.C
===
--- testsuite/g++.dg/parse/bitfield3.C  (revision 266263)
+++ testsuite/g++.dg/parse/bitfield3.C  (working copy)
@@ -5,5 +5,5 @@ typedef void (func_type)();
 
 struct A
 {
-  friend func_type f : 2; /* { dg-error "with non-integral type" } */
+  friend func_type f : 2; /* { dg-error "20:.f. is neither function nor member function" } */
 };
Index: testsuite/g++.dg/parse/bitfield6.C
===
--- testsuite/g++.dg/parse/bitfield6.C  (nonexistent)
+++ testsuite/g++.dg/parse/bitfield6.C  (working copy)
@@ -0,0 +1,6 @@
+// PR c++/84636
+
+typedef void a();
+struct A {
+a: 1;  // { dg-error "bit-field .\\. with non-integral type" }
+};


Re: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87626

2018-11-20 Thread Jakub Jelinek
On Mon, Nov 19, 2018 at 04:08:29PM +0530, Lokesh Janghel wrote:
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8ca2e73..b55dfa9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2018-11-19 Lokesh Janghel 

Two spaces between date and name and name and <, i.e.
2018-11-20  Lokesh Janghel  
in both ChangeLog files.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85667-2.c
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -masm=intel" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target masm_intel } */ 
+/* { dg-final { scan-assembler-times "movl\[^\n\r]*, %eax" 1} } */
+typedef struct
+{
+  float x;
+} Float;
+Float __attribute__((ms_abi)) fn1 ()
+{
+  Float v;
+  v.x = 3.145;
+  return v;
+}

This test wasn't properly tested:

/usr/src/gcc/obj/gcc/xgcc -B/usr/src/gcc/obj/gcc/ -m64 
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers 
-fdiagnostics-color=never -O2 -masm=intel -ffat-lto-objects -fno-ident -c -o 
pr85667-2.o /usr/src/gcc/gcc/testsuite/gcc.target/i386/pr85667-2.c
PASS: gcc.target/i386/pr85667-2.c (test for excess errors)
gcc.target/i386/pr85667-2.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr85667-2.c scan-assembler-times movl[^\n\r]*, %eax 
1
testcase /usr/src/gcc/gcc/testsuite/gcc.target/i386/i386.exp completed in 1 
seconds

1) you do not want to use dg-do assemble, but dg-do compile, because only
   in that case (or when using -save-temps) assembly is produced
2) you do not want to use -masm=intel and then expect AT&T syntax in the
   regexp

Thus, I'd replace all the dg- directive lines with:
/* { dg-do compile { target lp64 } } */
/* { dg-options "-O2" } */
/* { dg-final { scan-assembler-times "movl\[^\n\r]*, %eax|mov\[ \t]*eax," 1 } } */

That way, it will work both with -masm=att (explicit or implicit) or
-masm=intel.

One can use

make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64,-m64/-masm=intel\} i386.exp=pr85667*'

to verify and then look at the log file.

Furthermore, I'd copy pr85667-1.c test to pr85667-3.c and the modified
pr85667-2.c to pr85667-4.c, change Float to Double, float to double, remove
f suffixes and adjust all the eax in the regexp to rax, so that you also
test the struct with DFmode case.

Jakub


Re: C++ PATCH to implement P1094R2, Nested inline namespaces

2018-11-20 Thread Jakub Jelinek
On Tue, Nov 20, 2018 at 10:25:01AM +0100, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> Just another small comment - given the usual high number of
> C++ regressions delaying the release, is Stage3 the right time
> to add new language features?

I'd say this is small enough and worth an exception, it is just useful syntactic
sugar, and couldn't be submitted (much) earlier as it has been voted in
during the week when stage1 closed.

Jakub


Re: [PATCH] S/390: Skip LT(G) peephole when literal pool is involved

2018-11-20 Thread Andreas Krebbel
On 19.11.18 17:08, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.
> 
> By the time peephole optimizations run, we've already made up our mind
> whether to use base-register or relative addressing for literal pool
> entries.  LT(G) supports only base-register addressing, and so it is
> too late to convert L(G)RL + compare to LT(G).  This change should not
> make the code worse unless building with e.g. -fno-dce, since comparing
> literal pool entries to zero should be optimized away during earlier
> passes.
> 
> gcc/ChangeLog:
> 
> 2018-11-19  Ilya Leoshkevich  
> 
>   PR target/88083
>   * config/s390/s390.md: Skip LT(G) peephole when literal pool is
>   involved.
>   * rtl.h (contains_constant_pool_address_p): New function.
>   * rtlanal.c (contains_constant_pool_address_p): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-11-19  Ilya Leoshkevich  
> 
>   PR target/88083
>   * gcc.target/s390/pr88083.c: New test.

Ok. Thanks!

Andreas



Re: [PATCH] Fix PR88031

2018-11-20 Thread Richard Biener
On Mon, 19 Nov 2018, Christophe Lyon wrote:

> On Thu, 15 Nov 2018 at 14:41, Richard Biener  wrote:
> >
> >
> > With one of my last changes we regressed here so this goes all the
> > way cleaning up things so we only have a single flag to
> > vectorizable_condition telling whether we are called from reduction context.
> > In theory the !multiple-types restriction could be easily lifted now
> > (just remove the check).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2018-11-15  Richard Biener  
> >
> > PR tree-optimization/88031
> > * tree-vect-loop.c (vectorizable_reduction): Move check
> > for multiple types earlier so we get the expected dump.
> > Simplify calls to vectorizable_condition.
> > * tree-vect-stmts.h (vectorizable_condition): Update prototype.
> > * tree-vect-stmts.c (vectorizable_condition): Instead of
> > reduc_def and reduc_index take just a flag.  Simplify
> > code-generation now that we can rely on the defs being set up.
> > (vectorizable_comparison): Remove unused argument.
> >
> > * gcc.dg/pr88031.c: New testcase.
> >
> 
> Hi Richard,
> 
> Since you committed this patch (r266182),
> I've noticed regressions on aarch64:
> gcc.target/aarch64/sve/clastb_1.c -march=armv8.2-a+sve (internal
> compiler error)
> gcc.target/aarch64/sve/clastb_2.c -march=armv8.2-a+sve (internal
> compiler error)
> gcc.target/aarch64/sve/clastb_3.c -march=armv8.2-a+sve (internal
> compiler error)
> gcc.target/aarch64/sve/clastb_4.c -march=armv8.2-a+sve (internal
> compiler error)
> gcc.target/aarch64/sve/clastb_5.c -march=armv8.2-a+sve (internal
> compiler error)
> gcc.target/aarch64/sve/clastb_6.c -march=armv8.2-a+sve (internal
> compiler error)
> gcc.target/aarch64/sve/clastb_7.c -march=armv8.2-a+sve (internal
> compiler error)
> 
> during GIMPLE pass: vect
> /gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c: In function
> 'condition_reduction':
> /gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c:9:1: internal
> compiler error: in vect_get_vec_def_for_operand_1, at
> tree-vect-stmts.c:1485

I am testing the following.

Richard.

2018-11-20  Richard Biener  

* tree-vect-stmts.c (vectorizable_condition): Do not get
at else_clause vect def for EXTRACT_LAST_REDUCTION.  Remove
pointless vect_is_simple_use calls.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 266306)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -8911,26 +8911,21 @@ vectorizable_condition (stmt_vec_info st
  vec_cond_lhs
= vect_get_vec_def_for_operand (cond_expr, stmt_info,
comp_vectype);
- vect_is_simple_use (cond_expr, stmt_info->vinfo, &dts[0]);
}
  else
{
  vec_cond_lhs
= vect_get_vec_def_for_operand (cond_expr0,
stmt_info, comp_vectype);
- vect_is_simple_use (cond_expr0, loop_vinfo, &dts[0]);
-
  vec_cond_rhs
= vect_get_vec_def_for_operand (cond_expr1,
stmt_info, comp_vectype);
- vect_is_simple_use (cond_expr1, loop_vinfo, &dts[1]);
}
  vec_then_clause = vect_get_vec_def_for_operand (then_clause,
  stmt_info);
- vect_is_simple_use (then_clause, loop_vinfo, &dts[2]);
- vec_else_clause = vect_get_vec_def_for_operand (else_clause,
- stmt_info);
- vect_is_simple_use (else_clause, loop_vinfo, &dts[3]);
+ if (reduction_type != EXTRACT_LAST_REDUCTION)
+   vec_else_clause = vect_get_vec_def_for_operand (else_clause,
+   stmt_info);
}
}
   else


[maintainer-scripts] Add a bugzilla script

2018-11-20 Thread Martin Liška
Hi.

It's the script that I used to identify potentially resolvable bugs. That's done
by parsing comments and looking for trunk/branch commits. Sample output looks
as follows:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88084 branches: trunk
  fail:work:
 basic_string_view::copy doesn't use Traits::copy
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88083 branches: trunk
  fail:work:
 ICE in find_constant_pool_ref_1, at config/s390/s390.c:8231
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88077 branches: trunk
  fail: 8.2.0  work: 9.0
 [8 Regression] ICE: c:378 since r256989
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88073 branches: trunk
  fail:work:
 [7/8 Regression] Internal compiler error  compiling WHERE 
construct with -O or -O2
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88071 branches: trunk
  fail: 8.2.0, 9.0 work: 7.3.0  
 [8 Regression] ICE: verify_gimple failed (error: dead STMT in EH 
table)

Plus there is a generated bugzilla list so that one can easily go through it.
Would you be interested in putting that into maintainer scripts?

Martin
#!/usr/bin/env python3

import requests
import json
import argparse

base_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/'
statuses = ['UNCONFIRMED', 'ASSIGNED', 'SUSPENDED', 'NEW', 'WAITING', 'REOPENED']
regex = '(.*\[)([0-9\./]*)( [rR]egression])(.*)'
closure_question = 'Can the bug be marked as resolved?'
start_page = 20
url_page_size = 50

def get_branches_by_comments(comments):
    versions = set()
    for c in comments:
        text = c['text']
        if 'URL: https://gcc.gnu.org/viewcvs' in text:
            version = 'trunk'
            for l in text.split('\n'):
                if 'branches/gcc-' in l:
                    parts = l.strip().split('/')
                    parts = parts[1].split('-')
                    assert len(parts) == 3
                    versions.add(parts[1])
            versions.add(version)
    return versions

def get_bugs(api_key, query):
    u = base_url + 'bug'
    r = requests.get(u, params = query)
    return r.json()['bugs']

def search(api_key):
    chunk = 1000
    ids = []
    for i in range(start_page, 0, -1):
        print('offset: %d' % (i * chunk), flush = True)
        bugs = get_bugs(api_key, {'api_key': api_key, 'bug_status': statuses, 'limit': chunk, 'offset': i * chunk})
        for b in sorted(bugs, key = lambda x: x['id'], reverse = True):
            id = b['id']

            fail = b['cf_known_to_fail']
            work = b['cf_known_to_work']

            u = base_url + 'bug/' + str(id) + '/comment'
            r = requests.get(u, params = {'api_key': api_key} ).json()
            keys = list(r['bugs'].keys())
            assert len(keys) == 1
            comments = r['bugs'][keys[0]]['comments']
            for c in comments:
                if closure_question in c['text']:
                    continue

            branches = get_branches_by_comments(comments)
            if len(branches):
                branches_str = ','.join(sorted(list(branches)))
                print('  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=%d branches: %-30s fail: %-30s work: %-30s  %s' % (id, branches_str, fail, work, b['summary']))
                ids.append(id)
                if len(ids) == url_page_size:
                    print('https://gcc.gnu.org/bugzilla/buglist.cgi?bug_id=%s' % ','.join([str(x) for x in ids]))
                    ids = []

print('Bugzilla URL page size: %d' % url_page_size)
print('HINT: bugs with following comment are ignored: %s\n' % closure_question)

parser = argparse.ArgumentParser(description='')
parser.add_argument('api_key', help = 'API key')

args = parser.parse_args()
search(args.api_key)


Re: [PATCH] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Kyrill Tkachov

Hi Christoph,

Thank you for the patch.
Can you please confirm how this has been tested?

On 19/11/18 17:11, Christoph Muellner wrote:

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner 

* config/aarch64/aarch64-cores.def: Define emag
* config/aarch64/aarch64-tune.md: Regenerated with emag
* config/aarch64/aarch64.c: Defining tuning struct


Please include the name of the new struct like so:
* config/aarch64/aarch64.c (emag_tunings): New struct.

Also, full stops at the end of all entries.


* doc/invoke.texi: Document mtune value
---
  gcc/config/aarch64/aarch64-cores.def |  1 +
  gcc/config/aarch64/aarch64-tune.md   |  2 +-
  gcc/config/aarch64/aarch64.c | 25 +
  gcc/doc/invoke.texi  |  2 +-
  4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 1f3ac56..6e6800e 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -63,6 +63,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
  
  /* APM ('P') cores. */

  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)
+AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
emag, 0x50, 0x000, -1)
  


I'd suggest you start a new comment "section" here, something like /* Ampere 
cores.  */
From this definition this looks identical to xgene1. In particular the IMP and 
PART fields.
Are they really the same? You can find the values in /proc/cpuinfo on a 
GNU/Linux system.

If so, I don't think the -mcpu=native support will be able to pick up emag 
properly.
Do you have access to a Linux system running on this processor? What does 
-mcpu=native -### get rewritten to?

Thanks,
Kyrill


  /* Qualcomm ('Q') cores. */
  AARCH64_CORE("falkor",  falkor,falkor,8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx,   0x51, 0xC00, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index fade1d4..408976a 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
  ;; -*- buffer-read-only: t -*-
  ;; Generated automatically by gentune.sh from aarch64-cores.def
  (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,emag,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f7f88a9..995aafe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
&xgene1_prefetch_tune
  };
  
+static const struct tune_params emag_tunings =

+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  6, /* memmov_cost  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",  /* function_align.  */
+  "16",  /* jump_align.  */
+  "16",  /* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  17,  /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
+  &xgene1_prefetch_tune
+};
+
  static const struct tune_params qdf24xx_tunings =
  {
&qdf24xx_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e016dce..ac81fb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15288,7 +15288,7 @@ Specify the name of the target processor for which GCC 
should tune the
  performance of the code.  Permissible values for this option are:
  @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
  @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{falkor},
+@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
  @samp{qdf24xx}, @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan

[PATCH][OBVIOUS] Add -fomit-frame-pointer to a test-case (PR ipa/88093).

2018-11-20 Thread Martin Liška
Hi.

One obvious fix of a test-case which I'm going to install.

Martin

gcc/testsuite/ChangeLog:

2018-11-20  Martin Liska  

PR ipa/88093
* gcc.target/i386/ipa-stack-alignment.c: Add
-fomit-frame-pointer.
---
 gcc/testsuite/gcc.target/i386/ipa-stack-alignment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/testsuite/gcc.target/i386/ipa-stack-alignment.c b/gcc/testsuite/gcc.target/i386/ipa-stack-alignment.c
index 1176b59aa5f..33860acaaf5 100644
--- a/gcc/testsuite/gcc.target/i386/ipa-stack-alignment.c
+++ b/gcc/testsuite/gcc.target/i386/ipa-stack-alignment.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fno-ipa-stack-alignment -O" } */
+/* { dg-options "-fno-ipa-stack-alignment -O -fomit-frame-pointer" } */
 
 typedef struct {
   long a;



Re: [PATCH, ARM] Improve robustness of -mslow-flash-data

2018-11-20 Thread Christophe Lyon
On Mon, 19 Nov 2018 at 18:56, Thomas Preudhomme
 wrote:
>
> Hi,
>
> Current code to handle -mslow-flash-data in machine description files
> suffers from a number of issues which this patch fixes:
>
> 1) The insn_and_split in vfp.md to load a generic floating-point
> constant via GPR first and move it to VFP register are guarded by
> !reload_completed, which is forbidden explicitly in the GCC internals
> documentation section 17.2 point 3;
>
> 2) A number of testcase in the testsuite ICEs under -mslow-flash-data
> when targeting the hardfloat ABI [1];
>
> 3) Instructions performing load from literal pool are not disabled.
>
> These problems are addressed by 2 separate actions:
>
> 1) Making the splitters take a clobber and changing the expanders
> accordingly to generate a mov with clobber in cases where a literal
> pool would be used. The splitter can thus be enabled after reload since
> it does not call gen_reg_rtx anymore;
>
> 2) Adding new predicates and constraints to disable literal pool loads
> in existing instructions when -mslow-flash-data is in effect.
>
> The patch also reworks the splitter for DFmode slightly to generate an
> intermediate DI load instead of 2 intermediate SI loads, thus relying on
> the existing DI splitters instead of redoing their job. At last, the
> patch adds some missing arm_fp_ok effective target to some of the
> slow-flash-data testcases.
>
> [1]
> c-c++-common/Wunused-var-3.c
> gcc.c-torture/compile/pr72771.c
> gcc.c-torture/compile/vector-5.c
> gcc.c-torture/compile/vector-6.c
> gcc.c-torture/execute/20030914-1.c
> gcc.c-torture/execute/20050316-1.c
> gcc.c-torture/execute/pr59643.c
> gcc.dg/builtin-tgmath-1.c
> gcc.dg/debug/pr55730.c
> gcc.dg/graphite/interchange-7.c
> gcc.dg/pr56890-2.c
> gcc.dg/pr68474.c
> gcc.dg/pr80286.c
> gcc.dg/torture/pr35227.c
> gcc.dg/torture/pr65077.c
> gcc.dg/torture/pr86363.c
> g++.dg/torture/pr81112.C
> g++.dg/torture/pr82985.C
> g++.dg/warn/Wunused-var-7.C
> and a lot more in libstdc++ in special_functions/*_comp_ellint_* and
> special_functions/*_ellint_* directories.
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2018-11-14  Thomas Preud'homme  
>
> * config/arm/arm.md (arm_movdi): Split if -mslow-flash-data and
> source is a constant that would be loaded by literal pool.
> (movsf expander): Generate a no_literal_pool_sf_immediate insn if
> -mslow-flash-data is present, targeting hardfloat ABI and source is a
> float constant that cannot be loaded via vmov.
> (movdf expander): Likewise but generate a no_literal_pool_df_immediate
> insn.
> (arm_movsf_soft_insn): Split if -mslow-flash-data and source is a
> float constant that would be loaded by literal pool.
> (softfloat constant movsf splitter): Splitter for the above case.
> (movdf_soft_insn): Split if -mslow-flash-data and source is a float
> constant that would be loaded by literal pool.
> (softfloat constant movdf splitter): Splitter for the above case.
> * config/arm/constraints.md (Pz): Document existing constraint.
> (Ha): Define constraint.
> (Tu): Likewise.
> * config/arm/predicates.md (hard_sf_operand): New predicate.
> (hard_df_operand): Likewise.
> * config/arm/thumb2.md (thumb2_movsi_insn): Split if
> -mslow-flash-data and constant would be loaded by literal pool.
> * constant/arm/vfp.md (thumb2_movsi_vfp): Likewise and disable 
> constant
> load in VFP register.
> (movdi_vfp): Likewise.
> (thumb2_movsf_vfp): Use hard_sf_operand as predicate for source to
> prevent match for a constant load if -mslow-flash-data and constant
> cannot be loaded via vmov.  Adapt constraint accordingly by
> using Ha instead of E for generic floating-point constant load.
> (thumb2_movdf_vfp): Likewise using hard_df_operand predicate instead.
> (no_literal_pool_df_immediate): Add a clobber to use as the
> intermediate general purpose register and also enable it after reload
> but disable it constant is a valid FP constant.  Add constraints and
> generate a DI intermediate load rather than 2 SI loads.
> (no_literal_pool_sf_immediate): Add a clobber to use as the
> intermediate general purpose register and also enable it after
> reload.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-11-14  Thomas Preud'homme  
>
> * gcc.target/arm/thumb2-slow-flash-data-2.c: Require arm_fp_ok
> effective target.
> * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
>
> Testing: Built arm-none-eabi cross compilers for Armv7E-M defaulting to
> softfloat and hardfloat ABI which showed no regression and some
> FAIL->PASS for hardfloat ABI. Bootstraped on Arm and Thumb-2 wi

Re: [PATCH, middle-end]: Fix PR 88070, ICE in create_pre_exit, at mode-switching.c:438

2018-11-20 Thread Uros Bizjak
On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou  wrote:
>
> > The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
> > Later, the blockage was moved after return label as a fix for PR25176
> > [3] in r107871 [4].
> >
> > After that, r122626 [5] moves the blockage after the label for the
> > naked return from the function. Relevant posts from gcc-patches@ ML
> > are at [6], [7]. However, in the posts, there are no concrete
> > examples, how scheduler moves instructions from different BB around
> > blockage insn, the posts just show that there is a jump around
> > blockage when __builtin_return is used. I was under impression that
> > scheduler is unable to move instructions over BB boundaries.
>
> The scheduler works on extended basic blocks.  The [7] post gives a rather
> convincing explanation and there is a C++ testcase under PR rtl-opt/14381.
>
> > A mystery is the tree-ssa merge [8] that copies back the hunk, moved
> > in r122626 [5] to its original position. From this revision onwards,
> > we emit two blockages.
>
> It's the dataflow merge, not the tree-ssa merge.  The additional blockage
> might be needed for DF.
>
> Given that the current PR is totally artificial, I think that we need to be
> quite conservative and only do something on mainline.  And even there I'd be
> rather conservative and remove the kludge only for targets that emit unwind
> information in the epilogue (among which there is x86 I presume).

Hm, I think I'll rather go with somehow target-dependent patch:

--cut here--
diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
index 370a49e90a9c..de75efe2b6c9 100644
--- a/gcc/mode-switching.c
+++ b/gcc/mode-switching.c
@@ -252,7 +252,21 @@ create_pre_exit (int n_entities, int *entity_map,
const int *num_modes)
if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
&& NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
&& GET_CODE (PATTERN (last_insn)) == USE
-   && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
+   && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
+
+   /* x86 targets use mode-switching infrastructure to
+  conditionally insert vzeroupper instruction at the exit
+  from the function and there is no need to switch the
+  mode before the return value copy.  The vzeroupper insertion
+  pass runs after reload, so use !reload_completed as a stand-in
+  for x86 to skip the search for return value copy insn.
+
+  N.b.: the code below assumes that return copy insn
+  immediately precedes its corresponding use insn.  This
+  assumption does not hold after reload, since sched1 pass
+  can reschedule return copy insn away from its
+  corresponding use insn.  */
+   && !reload_completed)
  {
int ret_start = REGNO (ret_reg);
int nregs = REG_NREGS (ret_reg);
--cut here--

WDYT?

Uros.


[PATCH] Fix PR88069

2018-11-20 Thread Richard Biener


The following fixes another case of region VN escaping the region.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-11-20  Richard Biener  

PR tree-optimization/88069
* tree-ssa-sccvn.c (visit_phi): Do not value-number to unvisited
virtual PHI arguments.

* gcc.dg/pr88069.c: New testcase.


diff --git a/gcc/testsuite/gcc.dg/pr88069.c b/gcc/testsuite/gcc.dg/pr88069.c
new file mode 100644
index 000..21485135016
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr88069.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-pre -ftree-vectorize -fno-tree-pta" } */
+
+void
+qf (void);
+
+void
+mr (short int db)
+{
+  int vq;
+  short int *lp = &db;
+
+  for (vq = 0; vq < 1; ++vq)
+qf ();
+
+  while (*lp < 2)
+{
+  *lp = db;
+  lp = (short int *) &vq;
+  ++*lp;
+}
+}
+
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 01bedf56662..941752e7887 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4194,12 +4194,19 @@ visit_phi (gimple *phi, bool *inserted, bool backedges_varying_p)
  value from the backedge as that confuses the alias-walking code.
  See gcc.dg/torture/pr87176.c.  If the value is the same on a
  non-backedge everything is OK though.  */
-  if (backedge_val
-  && !seen_non_backedge
-  && TREE_CODE (backedge_val) == SSA_NAME
-  && sameval == backedge_val
-  && (SSA_NAME_IS_VIRTUAL_OPERAND (backedge_val)
- || SSA_VAL (backedge_val) != backedge_val))
+  bool visited_p;
+  if ((backedge_val
+   && !seen_non_backedge
+   && TREE_CODE (backedge_val) == SSA_NAME
+   && sameval == backedge_val
+   && (SSA_NAME_IS_VIRTUAL_OPERAND (backedge_val)
+  || SSA_VAL (backedge_val) != backedge_val))
+  /* Do not value-number a virtual operand to sth not visited though
+given that allows us to escape a region in alias walking.  */
+  || (sameval
+ && TREE_CODE (sameval) == SSA_NAME
+ && SSA_NAME_IS_VIRTUAL_OPERAND (sameval)
+ && (SSA_VAL (sameval, &visited_p), !visited_p)))
 /* Note this just drops to VARYING without inserting the PHI into
the hashes.  */
 result = PHI_RESULT (phi);


Re: [v3 PATCH] Housekeeping for the effective targets of optional's tests.

2018-11-20 Thread Jonathan Wakely

On 20/11/18 07:11 +0200, Ville Voutilainen wrote:

Tested on Linux-x64 for optional's tests only. Ok for trunk?


OK, thanks.




[PATCH] Fix libgomp bootstrap on mingw (PR bootstrap/88106)

2018-11-20 Thread Jakub Jelinek
Hi!

Jonathan reported that gcc fails to build on mingw32: apparently it has some
gethostname implementation, but in the winsock library or somewhere similar, and
it isn't prototyped.  libgfortran instead uses the GetComputerName API, so this patch
does that for libgomp too.

Kindly tested by Jonathan, committed to trunk.

2018-11-20  Jakub Jelinek  

PR bootstrap/88106
* config/mingw32/affinity-fmt.c: New file.

--- libgomp/config/mingw32/affinity-fmt.c.jj	2018-11-20 09:30:17.796003153 +0100
+++ libgomp/config/mingw32/affinity-fmt.c	2018-11-20 09:35:54.501378891 +0100
@@ -0,0 +1,68 @@
+/* Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Jakub Jelinek .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "libgomp.h"
+#include 
+#include 
+#include 
+#ifdef HAVE_UNISTD_H
+#include 
+#endif
+#ifdef HAVE_INTTYPES_H
+# include   /* For PRIx64.  */
+#endif
+#define WIN32_LEAN_AND_MEAN
+#include 
+#include 
+
+static int
+gomp_gethostname (char *name, size_t len)
+{
+  /* On Win9x GetComputerName fails if the input size is less
+ than MAX_COMPUTERNAME_LENGTH + 1.  */
+  char buffer[MAX_COMPUTERNAME_LENGTH + 1];
+  DWORD size = sizeof (buffer);
+  int ret = 0;
+
+  if (!GetComputerName (buffer, &size))
+return -1;
+
+  if ((size = strlen (buffer) + 1) > len)
+{
+  errno = EINVAL;
+  /* Truncate as per POSIX spec.  We do not NUL-terminate. */
+  size = len;
+  ret = -1;
+}
+  memcpy (name, buffer, (size_t) size);
+
+  return ret;
+}
+
+#undef gethostname
+#define gethostname gomp_gethostname
+#define HAVE_GETHOSTNAME 1
+
+#include "../../affinity-fmt.c"

Jakub


Re: [PATCH PR84648]Adjust loop exit conditions for loop-until-wrap cases.

2018-11-20 Thread Bin.Cheng
On Mon, Nov 19, 2018 at 9:17 PM Christophe Lyon
 wrote:
>
> On Wed, 14 Nov 2018 at 11:10, bin.cheng  wrote:
> >
> > --
> > Sender:Richard Biener 
> > Sent at:2018 Nov 13 (Tue) 23:03
> > To:bin.cheng 
> > Cc:GCC Patches 
> > Subject:Re: [PATCH PR84648]Adjust loop exit conditions for loop-until-wrap 
> > cases.
> >
> > >
> > > On Sun, Nov 11, 2018 at 9:02 AM bin.cheng  
> > > wrote:
> > >>
> > >> Hi,
> > >> This patch fixes PR84648 by adjusting exit conditions for 
> > >> loop-until-wrap cases.
> > >> It only handles simple cases in which IV.base are constants because we 
> > >> rely on
> > >> current niter analyzer which doesn't handle parameterized bound in 
> > >> wrapped
> > >> case.  It could be relaxed in the future.
> > >>
> > >> Bootstrap and test on x86_64 in progress.
> > >
> > > Please use TYPE_MIN/MAX_VALUE or wi::min/max_value consistently.
> > > Either tree_int_cst_equal (iv0->base, TYPE_MIN_VALUE (type)) or
> > > wide_int_to_tree (niter_type, wi::max_value (TYPE_PRECISION (type),
> > > TYPE_SIGN (type))).
> > >
> > > Also
> > >
> > > +  iv0->base = low;
> > > +  iv0->step = fold_convert (niter_type, integer_one_node);
> > >
> > > build_int_cst (niter_type, 1);
> > >
> > > +  iv1->base = high;
> > > +  iv1->step = integer_zero_node;
> > >
> > > build_int_cst (niter_type, 0);
> > Fixed, thanks for reviewing.
> >
> > >
> > > With the code, what happens to signed IVs?  I suppose we figure out things
> > > earlier by means of undefined overflow?
> > The code takes advantage of signed undefined overflow and handles it as wrap.
> > In the reported test case, we have following IL:
> >:
> >   goto ; [INV]
> >
> >:
> >   i_4 = i_2 + 1;
> >
> >:
> >   # i_2 = PHI <0(2), i_4(3)>
> >   i.0_1 = (signed int) i_2;
> >   if (i.0_1 >= 0)
> > goto ; [INV]
> >   else
> > goto ; [INV]
> >
> > So the IV is actually transformed into signed int, we rely on scev to 
> > understand
> > type conversion correctly and generate (int){0, 1} for i.0_1.  Is this 
> > reasonable?
> >
> > Updated patch attached, bootstrap and test on x86_64.
> >
> > Thanks,
> > bin
> >
> > 2018-11-11  Bin Cheng  
> >
> > PR tree-optimization/84648
> > * tree-ssa-loop-niter.c (adjust_cond_for_loop_until_wrap): New.
> > (number_of_iterations_cond): Adjust exit cond for loop-until-wrap 
> > case
> > by calling adjust_cond_for_loop_until_wrap.
> >
> > 2018-11-11  Bin Cheng  
> >
> > PR tree-optimization/84648
> > * gcc.dg/tree-ssa/pr84648.c: New test.
> > * gcc.dg/pr68317.c: Add warning check on overflow.
>
>
> Hi Bin,
>
> Since you committed this patch (r266171), I've noticed a regression
> in fortran:
Very sorry for the breakage.  It's reported as pr88044, I will investigate it.

Thanks,
bin
> FAIL: gfortran.dg/transfer_intrinsic_3.f90   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> test
> FAIL: gfortran.dg/transfer_intrinsic_3.f90   -O3 -g  execution test
> on arm-none-linux-gnueabihf
> --with-cpu cortex-a5
> --with-fpu vfpv3-d16-fp16
>
> cortex-a9+neon-fp16, cortex-a15+neon-vfpv4 and
> cortex-a57+crypto-neon-fp-armv8 are still OK.
>
> Christophe
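
As context for the IL quoted earlier in this thread, a hand-written sketch of
the loop-until-wrap shape involved (an illustration only, not the committed
pr84648 testcase):

/* The unsigned IV is tested through a signed conversion, so the loop only
   exits once the converted value wraps to a negative number, i.e. after
   0x80000000 iterations.  */
void
f (void (*g) (void))
{
  for (unsigned int i = 0; (int) i >= 0; i++)
    g ();
}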


Re: [PATCH v2 1/3] Allow memory operands for PTWRITE

2018-11-20 Thread Richard Biener
On Fri, Nov 16, 2018 at 8:07 AM Uros Bizjak  wrote:
>
> On Fri, Nov 16, 2018 at 4:57 AM Andi Kleen  wrote:
> >
> > From: Andi Kleen 
> >
> > The earlier PTWRITE builtin definition was unnecessarily restrictive,
> > only allowing register input to PTWRITE. The instruction actually
> > supports memory operands too, so allow that too.
> >
> > gcc/:
> >
> > 2018-11-15  Andi Kleen  
> >
> > * config/i386/i386.md: Allow memory operands to ptwrite.
>
> OK.

Btw, I wonder why the ptwrite builtin is in SPECIAL_ARGS2
commented as /* Add all special builtins with variable number of operands. */?

On the GIMPLE level this builtin also has quite some (bad) effects on
alias analysis and any related optimization (vectorization, etc.).  I'll have
to see where the instrumenting pass now resides.

Richard.

> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.md | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 44db8ac954c..9c359c0ca04 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -19501,7 +19501,7 @@
> > (set_attr "prefix_extra" "2")])
> >
> >  (define_insn "ptwrite"
> > -  [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")]
> > +  [(unspec_volatile [(match_operand:SWI48 0 "nonimmediate_operand" "rm")]
> > UNSPECV_PTWRITE)]
> >"TARGET_PTWRITE"
> >"ptwrite\t%0"
> > --
> > 2.19.1
> >


Re: [PATCH, middle-end]: Fix PR 88070, ICE in create_pre_exit, at mode-switching.c:438

2018-11-20 Thread Uros Bizjak
On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou  wrote:
>
> > The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
> > Later, the blockage was moved after return label as a fix for PR25176
> > [3] in r107871 [4].
> >
> > After that, r122626 [5] moves the blockage after the label for the
> > naked return from the function. Relevant posts from gcc-patches@ ML
> > are at [6], [7]. However, in the posts, there are no concrete
> > examples, how scheduler moves instructions from different BB around
> > blockage insn, the posts just show that there is a jump around
> > blockage when __builtin_return is used. I was under impression that
> > scheduler is unable to move instructions over BB boundaries.
>
> The scheduler works on extended basic blocks.  The [7] post gives a rather
> convincing explanation and there is a C++ testcase under PR rtl-opt/14381.

Taking into account that BB edges aren't scheduling barriers, I agree with [7].

> > A mystery is the tree-ssa merge [8] that copies back the hunk, moved
> > in r122626 [5] to its original position. From this revision onwards,
> > we emit two blockages.
>
> It's the dataflow merge, not the tree-ssa merge.  The additional blockage
> might be needed for DF.

Ah yes. Thanks for the correction. However, I think that - according
to the reintroduced duplicated comment - the blockage reintroduced by
the DF merge should not be needed. I'll investigate this a bit and try to
bootstrap/regtest a patch that removes the reintroduced blockage.

> Given that the current PR is totally artificial, I think that we need to be
> quite conservative and only do something on mainline.  And even there I'd be
> rather conservative and remove the kludge only for targets that emit unwind
> information in the epilogue (among which there is x86 I presume).

The testcase actually exposes a -fschedule-insns problem with x86, so the
test is not totally artificial. The idea of removing the kludge for
certain targets is tempting (the scheduler will be given some more
freedom), but I agree that it should not be removed in stage-3.

Thanks,
Uros.


Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count

2018-11-20 Thread bin.cheng
Sender: Jan Hubicka 
Sent at: 2018 Nov 5 (Mon) 22:21
To: Richard Biener 
Cc: bin.cheng ; GCC Patches 

Subject:Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count
 
> 
> > On Wed, Oct 31, 2018 at 7:30 AM bin.cheng  
> > wrote:
> > >
> > > Hi,
> > > In new profile probability/count infra, we have different precision 
> > > quality categories,
> > > and probabilities/counts of different categories are not supposed to be 
> > > compared or
> > > calculated.  Though in general is an improvement, it introduces 
> > > unexpected behavior.
> > > Specifically, class profile_probablity and profile_count themselves are 
> > > implemented
> > > by comparing probabilities/counts against profile_count::zero().  while 
> > > zero() is of
> > > profile_precision category, it's always compared different to zero of 
> > > other precision
> > > categories including afdo.
> > >
> > > I can see two ways fixing this: 1) Treat zero as a common 
> > > probability/count regardless
> > > of its category; 2) Provide an "is_zero" method rather than relying on 
> > > "==" comparison
> > > against probability_count::zero().  2) requires lots of code changes so I 
> > > went with 1)
> > > in this patch set.  This patch doesn't handle "always" but it might be.
> > >
> > > This patch also corrects a minor issue where we try to invert an 
> > > uninitialized value.
> > >
> > > Bootstrap and test on x86_64 in patch set.  Is it OK?
> > 
> > I'll defer on the emit_store_flag_force change, likewise for the zero
> > handling in
> > compares - I don't think zeros of different qualities should compare equal.
> > Would compares against ::always() not have the very same issue?
> > Likewise ::even(),
> > ::likely(), etc.?  Those always get guessed quality.
> > 
> > The invert change looks OK to me.  The related change to the always() API 
> > would
> > suggest to replace guessed_always() with always (guessed) and also do 
> > similar
> > changes throughout the whole API...
> > 
> > Honza?
> 
> The zeros are really different zeros.  profile_count::zero makes us
> drop the basic block into the cold section because we know that it won't be
> executed in a normal run of the program (either we have accurate profile
> feedback or by proving that the program is on the way to crash or user
> annotated cold section).  Having a guessed zero or auto-fdo zero won't
> make us do such aggressive size optimization. 
> This is important since those zeros relatively commonly happen by
> accident and thus if we dropped all the code to the cold section the cold
> section would be visited relatively often during execution of the program,
> which would defeat its purpose.
> 
> Most comparisons in profile-count.h which go against profile_count==zero
> are really intended to pass only for this "absolute zero".  They bypass
> the precision adjustments which normally happen when you merge values
> of different precision. 
> 
> What kind of unexpected behaviour are you seeing?
> We already have nonzero_p which is what we use when we want to know that
> count is non-zero in some sense of precision.
Hi Honza,
Sorry for letting this slip away.  In the case of AutoFDO, due to the nature
of sampling, lots of funcs/bbs are annotated with a zero profile_count of afdo
precision, while we have checks against the zero profile_count of precise
precision.  All these checks end up false and cause issues.  Take the code in
update_profiling_info as an example:

update_profiling_info (struct cgraph_node *orig_node,
   struct cgraph_node *new_node)
{
   struct cgraph_edge *cs;
   struct caller_statistics stats;
   profile_count new_sum, orig_sum;
   profile_count remainder, orig_node_count = orig_node->count;

   if (!(orig_node_count.ipa () > profile_count::zero ()))
 return;
   //...
   for (cs = new_node->callees; cs; cs = cs->next_callee)
 cs->count = cs->count.apply_scale (new_sum, orig_node_count);

Since we also have below code in profile_count::operator>,
  if (other == profile_count::zero ())
return !(*this == profile_count::zero ());

If orig_node_count is an afdo zero, the above zero check for orig_node_count
returns false, so we end up passing the zero count to apply_scale as the
denominator and hit its assert.
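
To make the failure mode concrete, a minimal sketch (member names follow
profile-count.h, but treat the exact spelling of the quality helpers as an
assumption):

  profile_count c = profile_count::zero ().afdo ();  /* an AFDO-quality zero */
  /* operator> special-cases only the precise zero (), so this guard is
     false and we fall through ...  */
  if (!(c.ipa () > profile_count::zero ()))
    return;
  /* ... into apply_scale with a zero denominator, which asserts.  */
  cs->count = cs->count.apply_scale (new_sum, c);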

In this updated patch, I restricted changes only to profile_count::operator
<, >, <= and >=.  Plus, I think there is a latent typo in operator>= because
the current code returns TRUE if '*this' is precise zero and 'other' is precise
non-zero.
@@ -879,7 +879,7 @@ public:
   if (other == profile_count::zero ())
return true;
   if (*this == profile_count::zero ())
-   return !(other == profile_count::zero ());
+   return !other.nonzero_p ();

Bootstrap and test on x86_64 along with other patches.

Thanks,
bin

2018-11-19  Bin Cheng  

* profile-count.h (profile_count::operator<, >, <=): Check ZERO count
using nonzero_p.
(profile_count::operator>=): Invert return condition when *this is
precise zero.  Check ZERO count in

Re: GCC 8 backports

2018-11-20 Thread Martin Liška
On 10/3/18 11:23 AM, Martin Liška wrote:
> On 9/25/18 8:48 AM, Martin Liška wrote:
>> Hi.
>>
>> One more tested patch.
>>
>> Martin
>>
> 
> One more tested patch.
> 
> Martin
> 

Hi.

One another tested patch that I'm going to install.

Martin
>From 94cd1e55e5baec63b7a80c59fdd8b5c52595c9e9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 19 Nov 2018 15:00:41 +
Subject: [PATCH] Backport r266277

gcc/lto/ChangeLog:

2018-11-19  Martin Liska  

	PR lto/88077
	* lto-symtab.c (lto_symtab_merge): Transform the
	condition before r256989.

gcc/testsuite/ChangeLog:

2018-11-19  Martin Liska  

	PR lto/88077
	* gcc.dg/lto/pr88077_0.c: New test.
	* gcc.dg/lto/pr88077_1.c: New test.
---
 gcc/lto/lto-symtab.c | 5 +++--
 gcc/testsuite/gcc.dg/lto/pr88077_0.c | 3 +++
 gcc/testsuite/gcc.dg/lto/pr88077_1.c | 6 ++
 3 files changed, 12 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr88077_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr88077_1.c

diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
index 3663ab7a9b2..cec74894c02 100644
--- a/gcc/lto/lto-symtab.c
+++ b/gcc/lto/lto-symtab.c
@@ -388,8 +388,9 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry)
 	 int a[]={1,2,3};
 	 here the first declaration is COMMON
 	 and sizeof(a) == sizeof (int).  */
-	else if (TREE_CODE (type) == ARRAY_TYPE)
-	  return (TYPE_SIZE (decl) == TYPE_SIZE (TREE_TYPE (type)));
+	else if (TREE_CODE (type) != ARRAY_TYPE
+		 || (TYPE_SIZE (type) != TYPE_SIZE (TREE_TYPE (type
+	  return false;
   }
 
   return true;
diff --git a/gcc/testsuite/gcc.dg/lto/pr88077_0.c b/gcc/testsuite/gcc.dg/lto/pr88077_0.c
new file mode 100644
index 000..9e464b6ad4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr88077_0.c
@@ -0,0 +1,3 @@
+/* { dg-lto-do link } */
+
+int HeaderStr;
diff --git a/gcc/testsuite/gcc.dg/lto/pr88077_1.c b/gcc/testsuite/gcc.dg/lto/pr88077_1.c
new file mode 100644
index 000..fd3de3e77a6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr88077_1.c
@@ -0,0 +1,6 @@
+char HeaderStr[1];
+
+int main()
+{
+  return 0;
+}
-- 
2.19.1



Re: [PATCH 06/10] GCN back-end config

2018-11-20 Thread Andrew Stubbs

On 16/11/2018 17:44, Joseph Myers wrote:

On Fri, 16 Nov 2018, Andrew Stubbs wrote:


* config.sub: Recognize amdgcn*-*-amdhsa.


config.sub should be copied from upstream config.git (along with
config.guess at the same time), once the support has been added there; it
shouldn't be patched locally in GCC.


GNU config has now been patched.

You can consider the config.sub portion of this patch obsolete.

Andrew


[PATCH][driver] Ensure --help=params lines end with period

2018-11-20 Thread Tom de Vries
Hi,

this patch ensures that gcc --help=params lines end with a period by:
- fixing the help message of param HOT_BB_COUNT_FRACTION, and
- adding a test-case.

Build and tested on x86_64.

OK for trunk?

Thanks,
- Tom

[driver] Ensure --help=params lines end with period

2018-11-20  Tom de Vries  

PR c/79855
* params.def (HOT_BB_COUNT_FRACTION): Terminate help message with
period.

* lib/options.exp (check_for_options_with_filter): New proc.
* gcc.misc-tests/help.exp: Check that --help=params lines end with
period.

---
 gcc/params.def|  2 +-
 gcc/testsuite/gcc.misc-tests/help.exp |  2 ++
 gcc/testsuite/lib/options.exp | 34 ++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/gcc/params.def b/gcc/params.def
index 2ae5a007530..11396a7f3af 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -397,7 +397,7 @@ DEFPARAM(PARAM_SMS_LOOP_AVERAGE_COUNT_THRESHOLD,
 DEFPARAM(HOT_BB_COUNT_FRACTION,
 "hot-bb-count-fraction",
 "Select fraction of the maximal count of repetitions of basic block in 
program given basic "
-"block needs to have to be considered hot (used in non-LTO mode)",
+"block needs to have to be considered hot (used in non-LTO mode).",
 1, 0, 0)
 DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
 "hot-bb-count-ws-permille",
diff --git a/gcc/testsuite/gcc.misc-tests/help.exp 
b/gcc/testsuite/gcc.misc-tests/help.exp
index f40cfabb41e..34ff9406e25 100644
--- a/gcc/testsuite/gcc.misc-tests/help.exp
+++ b/gcc/testsuite/gcc.misc-tests/help.exp
@@ -63,6 +63,8 @@ check_for_options c "-v --help" "" {are likely to\n  -std} ""
 # Try various --help= classes and qualifiers.
 check_for_options c "--help=optimizers" "-O" "  -g  " ""
 check_for_options c "--help=params" "maximum number of" 
"-Wunsafe-loop-optimizations" ""
+check_for_options_with_filter c "--help=params" \
+"^The --param option recognizes the following as parameters:$" "" {[^.]$} 
""
 check_for_options c "--help=C" "-ansi" "-gnatO" ""
 check_for_options c {--help=C++} {-std=c\+\+} "-gnatO" ""
 check_for_options c "--help=common" "-dumpbase" "-gnatO" ""
diff --git a/gcc/testsuite/lib/options.exp b/gcc/testsuite/lib/options.exp
index 824d91276e4..60d85eea9d4 100644
--- a/gcc/testsuite/lib/options.exp
+++ b/gcc/testsuite/lib/options.exp
@@ -26,11 +26,14 @@ if { [ishost "*-*-cygwin*"] } {
 }
 
 # Run the LANGUAGE compiler with GCC_OPTIONS and inspect the compiler
-# output to make sure that they match the newline-separated patterns
-# in COMPILER_PATTERNS but not the patterns in COMPILER_NON_PATTERNS.
-# In case of failure, xfail if XFAIL is nonempty.
+# output excluding EXCLUDE lines to make sure that they match the
+# newline-separated patterns in COMPILER_PATTERNS but not the patterns in
+# COMPILER_NON_PATTERNS.  In case of failure, xfail if XFAIL is nonempty.
 
-proc check_for_options {language gcc_options compiler_patterns 
compiler_non_patterns expected_failure} {
+proc check_for_options_with_filter { language gcc_options exclude \
+compiler_patterns \
+compiler_non_patterns \
+expected_failure } {
 set filename test-[pid]
 set fd [open $filename.c w]
 puts $fd "int main (void) { return 0; }"
@@ -47,6 +50,21 @@ proc check_for_options {language gcc_options 
compiler_patterns compiler_non_patt
 set gcc_output [gcc_target_compile $filename.c $filename.x executable 
$gcc_options]
 remote_file build delete $filename.c $filename.x $filename.gcno
 
+if { $exclude != "" } {
+   set lines [split $gcc_output "\n"]
+   set gcc_output ""
+   foreach line $lines {
+   if {[regexp -line -- "$exclude" $line]} {
+   continue
+   }
+   if { $gcc_output == "" } {
+   set gcc_output "$line"
+   } else {
+   set gcc_output "$gcc_output\n$line"
+   }
+   }   
+   }
+
 # Verify that COMPILER_PATTERRNS appear in gcc output.
 foreach pattern [split $compiler_patterns "\n"] {
if {$pattern != ""} {
@@ -79,3 +97,11 @@ proc check_for_options {language gcc_options 
compiler_patterns compiler_non_patt
}
 }
 }
+
+# As check_for_options_with_filter, but without the EXCLUDE parameter.
+
+proc check_for_options { language gcc_options compiler_patterns \
+compiler_non_patterns expected_failure } {
+check_for_options_with_filter $language $gcc_options "" $compiler_patterns 
\
+   $compiler_non_patterns $expected_failure
+}


Re: [PATCH v2 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-20 Thread Richard Biener
On Fri, Nov 16, 2018 at 4:57 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Add a new pass to automatically instrument changes to variables
> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
> field into a Processor Trace log, which allows low-overhead
> logging of information. Essentially it's a hardware accelerated
> printf.
>
> This allows values to be reconstructed later from the log,
> which can be useful for debugging or other analysis of the program
> behavior. With the compiler support this can be done without
> having to manually add instrumentation to the code.
>
> Using dwarf information this can be later mapped back to the variables.
> The decoder decodes the PTWRITE instructions using IP information
> in the log, and then looks up the argument in the debug information.
> Then this can be used to reconstruct the original variable
> name to display a value history for the variable.
>
> There are new options to enable instrumentation for different types,
> and also a new attribute to control analysis at a fine-grained
> per-function or variable level. The attributes can be set on both
> the variable and the type level, and also on structure fields.
> This allows tracing to be enabled only for specific code in large
> programs in a flexible manner.
>
> The pass is generic, but only the x86 backend enables the necessary
> hooks. When the backend enables the necessary hooks (with -mptwrite)
> there is an additional pass that looks through the code for
> attribute vartrace enabled functions or variables.
>
> The -fvartrace=locals option is experimental: it works, but it
> generates redundant ptwrites because the pass doesn't use
> the SSA information to minimize instrumentation.
> This could be optimized later.
>
> Currently the code can be tested with SDE, or on a Intel
> Gemini Lake system with a new enough Linux kernel (v4.10+)
> that supports PTWRITE for PT. Gemini Lake is used in low
> end laptops ("Intel Pentium Silver J5.. / Celeron N4... /
> Celeron J4...")
>
> Linux perf can be used to record the values
>
> perf record -e intel_pt/ptw=1,branch=0/ program
> perf script --itrace=crw -F +synth ...
>
> I have an experimental version of perf that can also use
> dwarf information to symbolize many[1] values back to their variable
> names. So far it is not in standard perf, but available at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4
>
> It is currently not able to decode all variable locations to names,
> but a large subset.
>
> Longer term hopefully gdb will support this information too.
>
> The CPU can potentially generate very high data bandwidths when
> code doing a lot of computation is heavily instrumented.
> This can cause some data loss both in the CPU and also in perf
> logging the data when the disk cannot keep up.
>
> Running some larger workloads, most workloads do not cause
> CPU level overflows, but I've seen them with -fvartrace
> on crafty, and with more workloads with -fvartrace-locals.
>
> The recommendation is not to fully instrument programs,
> but only areas of interest, either at the file level or using
> the attributes.
>
> The other thing is that perf and the disk often cannot keep up
> with the data bandwidth for longer computations. In this case
> it's possible to use perf snapshot mode (add --snapshot
> to the command line above). The data will then only be logged to
> a memory ring buffer, and the buffers are only dumped on events
> of interest by sending SIGUSR2 to the perf binary.
>
> In the future this will be hopefully better supported with
> core files and gdb.
>
> Passes bootstrap and test suite on x86_64-linux, also
> bootstrapped and tested gcc itself with full -fvartrace
> and -fvartrace=locals instrumentation.
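
A concrete sketch of the attribute usage described above (the attribute names
are taken from the ChangeLog below; the exact placement rules are an
assumption):

  /* Trace all writes to this variable.  */
  int counter __attribute__((vartrace));

  /* Opt this function out of instrumentation enabled at the file level.  */
  __attribute__((no_vartrace)) void hot_inner_loop (void);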

In the cover mail you mentioned you didn't get rid of SSA update.
That is because your instrumentation does not set the calls'
virtual operands.  Since your builtin clobbers memory and you
instrument non-memory ops that's only possible if you'd track
the active virtual operand during the walk over the function.  I
suppose using SSA update is OK for now.
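
For reference, a rough sketch of what tracking the virtual operand by hand
would look like when emitting the instrumentation call; "builtin_decl",
"value", "vuse_at_point" and "gsi" are placeholders for state the pass would
have to carry along, not names from the posted patch:

  gcall *call = gimple_build_call (builtin_decl, 1, value);
  /* Reuse the virtual operand live at the insertion point ...  */
  gimple_set_vuse (call, vuse_at_point);
  /* ... and create a new virtual definition, since the builtin is
     considered to clobber memory.  */
  gimple_set_vdef (call, make_ssa_name (gimple_vop (cfun), call));
  gsi_insert_after (&gsi, call, GSI_NEW_STMT);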

More comments inline

> gcc/:
>
> 2018-11-15  Andi Kleen  
>
> * Makefile.in: Add tree-vartrace.o.
> * common.opt: Add -fvartrace
> * opts.c (parse_vartrace_options): Add.
> (common_handle_option): Call parse_vartrace_options.
> * config/i386/i386.c (ix86_vartrace_func): Add.
> (TARGET_VARTRACE_FUNC): Add.
> * doc/extend.texi: Document vartrace/no_vartrace
> attributes.
> * doc/invoke.texi: Document -fvartrace.
> * doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
> * passes.def: Add vartrace pass.
> * target.def (vartrace_func): Add.
> * tree-pass.h (make_pass_vartrace): Add.
> * tree-vartrace.c: New file to implement vartrace pass.
>
> gcc/c-family/:
>
> 2018-11-15  Andi Kleen  
>
> * c-attribs.c (handle_vartrace_attribute,
>

Re: [PATCH] Come up with gcc/testsuite/g++.target/i386/i386.dg and move there some tests.

2018-11-20 Thread Martin Liška
On 11/16/18 6:01 PM, Renlin Li wrote:
> Hi Martin,
> 
> Seems the change is not checked in yet?

Sorry, I've just installed the patch.

Martin

> 
> Thanks,
> Renlin
> 
> On 10/22/2018 01:22 PM, Martin Liška wrote:
>> On 10/22/18 12:09 PM, Jakub Jelinek wrote:
>>> On Mon, Oct 22, 2018 at 12:04:23PM +0200, Martin Liška wrote:
> I noticed that before the tests were run with all of
> -std=(c|gnu)++(98|11|14), now with no explicit -std option.  I wonder if
> this is an issue.
>
> Rainer
>

 Hello.

 I guess that should not be a problem.
>>>
>>> If we want that, it is a matter of (untested):
>>> --- gcc/testsuite/g++.target/i386/i386.exp.jj    2018-10-10 
>>> 10:50:48.352235231 +0200
>>> +++ gcc/testsuite/g++.target/i386/i386.exp    2018-10-22 12:08:56.546807996 
>>> +0200
>>> @@ -35,8 +35,8 @@ dg-init
>>>   clearcap-init
>>>     # Main loop.
>>> -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] \
>>> -    "" $DEFAULT_CXXFLAGS
>>> +g++-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] \
>>> +   "" $DEFAULT_CXXFLAGS
>>>     # All done.
>>>   clearcap-finish
>>>
>>> Jakub
>>>
>>
>> Thank you Jakub, works for me for:
>> $ make check -k RUNTESTFLAGS="i386.exp"
>>
>> I can confirm that:
>> grep '^PASS' ./gcc/testsuite/g++/g++.log | wc -l
>>
>>
>> changed from 61 to 183.
>>
>> I'm going to install the patch.
>>
>> Martin
>>
>>



Re: [PATCH]Come up with -flive-patching master option.

2018-11-20 Thread Martin Liška
On 11/19/18 4:52 PM, Qing Zhao wrote:
> 
>> On Nov 19, 2018, at 2:11 AM, Martin Liška > > wrote:
>>
>> On 11/16/18 5:04 PM, Qing Zhao wrote:
>>>
 On Nov 16, 2018, at 9:26 AM, Martin Liška >>>  > wrote:

 On 11/16/18 2:36 AM, Qing Zhao wrote:
> Hi,
>
> this is the new version of the patch.
>
> I have bootstrapped it on both aarch64 and x86,  no regression.
>
> please take a look.

 Thanks for the updated version of the patch.
 I have last small nits I see:

 - gcc/common.opt: when running --help=common, the line is too long
>>>
>>> the following is the output for ./gcc —help=common:
>>>   -flive-patching             Same as -flive-patching=.  Use the latter 
>>> option
>>>                               instead.
>>>   -flive-patching=[inline-only-static|inline-clone] Control IPA 
>>> optimizations
>>>                               to provide a safe compilation for 
>>> live-patching.
>>>                               At the same time, provides multiple-level 
>>> control
>>>                               on the enabled IPA optimizations.
>>>  
>>> Not sure what you mean by “the line is too long”?  Could you please 
>>> specify which line above?
>>
>> You are probably using a console that has quite small column limit, so that 
>> you see it automatically
>> wrapped.
>>
>> I see:
>>
>> ...
>>  -flimit-function-alignment  This option lacks documentation.
>>  -flive-patching             Same as -flive-patching=.  Use the latter 
>> option instead.
>>  -flive-patching=[inline-only-static|inline-clone] Control IPA optimizations 
>> to provide a safe compilation for live-patching. At the same time, provides 
>> multiple-level control on the enabled IPA optimizations.
>>
>> ^--- the long line
> 
> Okay, I see.
> 
> From the documentation: 
> https://gcc.gnu.org/onlinedocs/gcc-4.3.4/gccint/Option-file-format.html#Option-file-format
> 
> "
> An option definition record. �These records have the following fields:
> • the name of the option, with the leading “-” removed
> • a space-separated list of option properties (see Option properties)
> • the help text to use for --help (omitted if the second field contains the 
> Undocumented property).
> ….
> The help text is automatically line-wrapped before being displayed. Normally 
> the name of the option is printed on the left-hand side of the output and the 
> help text is printed on the right. However, if the help text contains a tab 
> character, the text to the left of the tab is used instead of the option's 
> name and the text to the right of the tab forms the help text. This allows 
> you to elaborate on what type of argument the option takes.       
> “
> 
> It looks like, by design, the help text will be automatically line-wrapped 
> before being displayed to fit the current console. So I think that the 
> long line should be fine?  (The only way to
> make the help text line shorter is to cut the help text.)

Hi.

I see, then it's all fine.

> 
> I also see some other options have even longer help text:
> 
>   -fcf-protection=[full|branch|return|none] Instrument functions with checks 
> to verify jump/call/return control-flow transfer instructions have valid 
> targets.
>   -fisolate-erroneous-paths-attribute Detect paths that trigger erroneous or 
> undefined behavior due to a null value being used in a way forbidden by a 
> returns_nonnull or nonnull
>                               attribute.  Isolate those paths from the main 
> control flow and turn the statement with erroneous or undefined behavior into 
> a trap.
>   -fisolate-erroneous-paths-dereference Detect paths that trigger erroneous 
> or undefined behavior due to dereferencing a null pointer.  Isolate those 
> paths from the main control flow
>                               and turn the statement with erroneous or 
> undefined behavior into a trap.
> 

Good.

>> ...
>>
>>>
 - gcc/doc/invoke.texi - 2 spaces in between sentences + better gol
 - gcc/opts.c - do not mix spaces + tabs
>>>
>>> I have used contrib/check_GNU_style.sh to check the patch, I did see one 
>>> place that complains about 2 spaces in between sentences, fixed it.
>>
>> I see it:
>>
>> === ERROR type #3: dot, space, space, new sentence (3 error(s)) ===
>> gcc/common.opt:2190:62:optimizations to provide a safe compilation for 
>> live-patching.█At the same
>> gcc/doc/invoke.texi:9291:14:optimizations.█For example, inlining a function 
>> into its caller, cloning
>> gcc/doc/invoke.texi:9297:37:impacted functions for each function.█In order 
>> to control the number of
> 
> fixed.
> 
>>
>>> but I didn’t see spaces + tabs mix issue with the script. could you please 
>>> specify?
>>
>> This is a new check that I've just installed:
>>
>> === ERROR type #1: a space should not precede a tab (1 error(s)) ===
>> gcc/opts.c:2350:0:    control_optimizations_for_live_patching (opts, 
>> opts_set

[PATCH] Fix PR88105

2018-11-20 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

>From 27b32c8684c9703b92f5c035ebb6f06b9e2a20af Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Tue, 20 Nov 2018 10:33:16 +0100
Subject: [PATCH] fix-pr88105

2018-11-20  Richard Biener  

PR tree-optimization/88105
* tree-ssa-dom.c (pass_dominator::execute): Do not walk
backedges.

* gcc.dg/gomp/pr88105.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/gomp/pr88105.c 
b/gcc/testsuite/gcc.dg/gomp/pr88105.c
new file mode 100644
index 000..9680fdd19f6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr88105.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O -fexceptions -fnon-call-exceptions -fno-tree-fre" 
} */
+
+int
+s0 (void)
+{
+  int g6, oh = 0;
+  int *a6 = &g6;
+
+  (void) a6;
+
+#pragma omp parallel for
+  for (g6 = 0; g6 < 1; ++g6)
+{
+  int zk;
+
+  for (zk = 0; zk < 1; ++zk)
+{
+  oh += zk / (zk + 1);
+
+  for (;;)
+{
+}
+}
+
+  a6 = &zk;
+}
+
+  return oh;
+}
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index c50618dc809..7787da8b237 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -777,7 +777,8 @@ pass_dominator::execute (function *fun)
  if (bb == NULL)
continue;
  while (single_succ_p (bb)
-&& (single_succ_edge (bb)->flags & EDGE_EH) == 0)
+&& (single_succ_edge (bb)->flags
+& (EDGE_EH|EDGE_DFS_BACK)) == 0)
bb = single_succ (bb);
  if (bb == EXIT_BLOCK_PTR_FOR_FN (fun))
continue;


[PATCH] Do not mix -fsanitize=thread and -mabi=ms (PR sanitizer/88017).

2018-11-20 Thread Martin Liška
Hi.

It's very similar to what I did a few days ago for -fsanitize=address and 
-mabi=ms.

Patch survives tests on x86_64-linux-gnu and bootstraps.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2018-11-20  Martin Liska  

PR sanitizer/88017
* config/i386/i386.c (ix86_option_override_internal): Error out when
-mabi=ms is used together with -fsanitize=thread.

gcc/testsuite/ChangeLog:

2018-11-20  Martin Liska  

PR sanitizer/88017
* gcc.dg/tsan/pr88017.c: New test.
---
 gcc/config/i386/i386.c  | 2 ++
 gcc/testsuite/gcc.dg/tsan/pr88017.c | 6 ++
 2 files changed, 8 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tsan/pr88017.c


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c18c60a1d19..6bd1eeefe87 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3550,6 +3550,8 @@ ix86_option_override_internal (bool main_args_p,
 error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
   if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi == MS_ABI)
 error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
+  if ((opts->x_flag_sanitize & SANITIZE_THREAD) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=thread%>");
 
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
diff --git a/gcc/testsuite/gcc.dg/tsan/pr88017.c b/gcc/testsuite/gcc.dg/tsan/pr88017.c
new file mode 100644
index 000..82693a67e87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tsan/pr88017.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-fsanitize=thread -mabi=ms" } */
+
+int i;
+
+/* { dg-error ".-mabi=ms. not supported with .-fsanitize=thread." "" { target *-*-* } 0 } */



Re: [PATCH] Remove unreachable nodes before IPA profile pass (PR ipa/87706).

2018-11-20 Thread Jan Hubicka
> Hi.
> 
> In order to fix the warnings mentioned in the PR, we need
> to run remove_unreachable_nodes after early tree passes. That's
> however possible only within an IPA pass. Thus I'm calling that
> before the profile pass.
> 
> Patch survives regression tests on ppc64le-linux-gnu and majority
> of warnings are gone in profiledbootstrap.
> 
> Ready for trunk?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-11-08  Martin Liska  
> 
>   * tree-profile.c: Run TODO_remove_functions before "profile"
>   pass in order to remove dead functions that will trigger
>   -Wmissing-profile.

Hi,
it turns out there are a few bugs on the way.  First, ipa.c has incomplete
tests on when it needs to keep the functions.  Also, the function removal
is actually scheduled in the fnsummary1 pass, which just runs a bit late because
post-profile function splitting needs it.

This is the variant I ended up testing.  My original plan to do it at the
end of the early passes IPA pass did not fly because TODOs are run
before subpasses (as already commented in ipa-fnsummary.c).  So I ended up
splitting ipa_fnsummary into two mini-passes.

Bootstrapped/regtested x86_64-linux, plan to commit it shortly.

Honza

* ipa-fnsummary.c (pass_ipa_fnsummary): Do not remove functions.
* ipa.c (possible_inline_candidate_p): Break out from ...
(process_references): ... here; drop before_inlining_p;
cleanup handling of aliases.
(walk_polymorphic_call_targets): Likewise.
(symbol_table::remove_unreachable_nodes): Likewise.
* passes.c (pass_data_ipa_remove_symbols): New structure.
(pass_ipa_remove_symbols): New pass.
(make_pass_ipa_remove_symbols): New function.
* passes.def (pass_ipa_remove_symbols): Schedule after early passes.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 266288)
+++ ipa-fnsummary.c (working copy)
@@ -3563,10 +3563,7 @@ public:
   virtual unsigned int execute (function *)
 {
   ipa_free_fn_summary ();
-  /* Early optimizations may make function unreachable.  We can not
-remove unreachable functions as part of the early opts pass because
-TODOs are run before subpasses.  Do it here.  */
-  return small_p ? TODO_remove_functions | TODO_dump_symtab : 0;
+  return 0;
 }
 
 private:
Index: ipa.c
===
--- ipa.c   (revision 266288)
+++ ipa.c   (working copy)
@@ -101,12 +101,32 @@ enqueue_node (symtab_node *node, symtab_
   *first = node;
 }
 
+/* Return true if NODE may get inlined later.
+   This is used to keep DECL_EXTERNAL function bodies around long enough
+   so the inliner can process them.  */
+
+static bool
+possible_inline_candidate_p (symtab_node *node)
+{
+  if (symtab->state >= IPA_SSA_AFTER_INLINING)
+return false;
+  cgraph_node *cnode = dyn_cast  (node);
+  if (!cnode)
+return false;
+  if (DECL_UNINLINABLE (cnode->decl))
+return false;
+  if (opt_for_fn (cnode->decl, optimize))
+return true;
+  if (symtab->state >= IPA_SSA)
+return false;
+  return lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl));
+}
+
 /* Process references.  */
 
 static void
 process_references (symtab_node *snode,
symtab_node **first,
-   bool before_inlining_p,
hash_set *reachable)
 {
   int i;
@@ -118,14 +138,7 @@ process_references (symtab_node *snode,
 
   if (node->definition && !node->in_other_partition
  && ((!DECL_EXTERNAL (node->decl) || node->alias)
- || (((before_inlining_p
-   && (TREE_CODE (node->decl) != FUNCTION_DECL
-   || (TREE_CODE (node->decl) == FUNCTION_DECL
-   && opt_for_fn (body->decl, optimize))
-   || (symtab->state < IPA_SSA
-   && lookup_attribute
-("always_inline",
- DECL_ATTRIBUTES (body->decl))
+ || (possible_inline_candidate_p (node)
  /* We use variable constructors during late compilation for
 constant folding.  Keep references alive so partitioning
 knows about potential references.  */
@@ -140,7 +153,7 @@ process_references (symtab_node *snode,
 body.  */
  if (DECL_EXTERNAL (node->decl)
  && node->alias
- && before_inlining_p)
+ && symtab->state < IPA_SSA_AFTER_INLINING)
reachable->add (body);
  reachable->add (node);
}
@@ -160,8 +173,7 @@ static void
 walk_polymorphic_call_targets (hash_set *reachable_call_targets,
   struct cgraph_edge *edge,
   symtab_node **first,
-  hash_set *reachable,
-  bool before_inlining_p)
+ 

Fix PR37916 (compile time regression)

2018-11-20 Thread Michael Matz
Hi,

the testcase gcc.dg/20020425-1.c was once a compile time hog.  With 
current trunk it only needs 7 seconds (on my machine, with -O0 cc1) but 
there's still something to improve as virtually all of that time is 
wasted in repeatedly scanning the same (long) sequence of gimple 
statements to possibly give them locations.  Basically it's:

  gimplify_stmt (E)
gimplify_expr (E)
  gimplify_cond_expr (E)
gimplify_stmt (E.then)
  gimplify_expr (E.then)
update_locs (seq1)
gimplify_stmt (E.else)
  gimplify_expr (E.else)
update_locs (seq2);
return (seq = seq1 + seq2)
  update_locs (seq)

So the tails of the sequence (containing the expanded then/else subtrees) 
are repeatedly iterated over to give them locations from E, even if they 
already have locations from E.then and E.else.  That's quadratic.

So let's avoid this, as the patch does.  If we are sure that the 
subsequence has locations (namely when E.then or E.else have locations) 
return that sequence not in *pre_p but in something else that isn't 
iterated over but appended to *pre_p in the caller.

That shoves off most time and it now takes 0.25 seconds.

The bug report contains some discussion about how the recursion between 
gimplify_expr->gimplify_cond_expr is bad, and I'm not tackling that.  It 
would only be possible with an explicit stack (as these trees _are_ 
recursive) and the testcase is not deeply nested enough to need that with 
normal stack sizes.

But it's not a time hog anymore, so should we mark it fixed 
nevertheless?

Anyway, regstrapped on x86-64-linux, no regressions.  Okay for trunk?


Ciao,
Michael.

PR middle-end/38059
* gimplify.c (gimplify_cond_expr): Add MIDDLE argument, use
for passing back part of generated sequence.
(gimplify_expr): Change call to gimplify_cond_expr, don't
iterate through middle for updating locations.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 87082ad10d2a..719f4ba379ed 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -3940,10 +3940,14 @@ generic_expr_could_trap_p (tree expr)
 The second form is used when *EXPR_P is of type void.
 
 PRE_P points to the list where side effects that must happen before
-  *EXPR_P should be stored.  */
+  *EXPR_P should be stored.
+MIDDLE points to a sequence that possibly contains the lowered
+  statements for the two arms; must be appended to *PRE_P in the
+  caller.  */
 
 static enum gimplify_status
-gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, fallback_t fallback)
+gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *middle,
+   fallback_t fallback)
 {
   tree expr = *expr_p;
   tree type = TREE_TYPE (expr);
@@ -3952,10 +3956,17 @@ gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, 
fallback_t fallback)
   enum gimplify_status ret;
   tree label_true, label_false, label_cont;
   bool have_then_clause_p, have_else_clause_p;
+  location_t loc1, loc2;
   gcond *cond_stmt;
   enum tree_code pred_code;
   gimple_seq seq = NULL;
 
+  loc1 = TREE_OPERAND (expr, 1)
+  ? EXPR_LOCATION (TREE_OPERAND (expr, 1))
+  : UNKNOWN_LOCATION;
+  loc2 = TREE_OPERAND (expr, 2)
+  ? EXPR_LOCATION (TREE_OPERAND (expr, 2))
+  : UNKNOWN_LOCATION;
   /* If this COND_EXPR has a value, copy the values into a temporary within
  the arms.  */
   if (!VOID_TYPE_P (type))
@@ -4098,6 +4109,7 @@ gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, 
fallback_t fallback)
 &arm2);
   cond_stmt = gimple_build_cond (pred_code, arm1, arm2, label_true,
 label_false);
+  gimple_set_location (cond_stmt, input_location);
   gimple_set_no_warning (cond_stmt, TREE_NO_WARNING (COND_EXPR_COND (expr)));
   gimplify_seq_add_stmt (&seq, cond_stmt);
   gimple_stmt_iterator gsi = gsi_last (seq);
@@ -4150,7 +4162,15 @@ gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, 
fallback_t fallback)
 gimplify_seq_add_stmt (&seq, gimple_build_label (label_cont));
 
   gimple_pop_condition (pre_p);
-  gimple_seq_add_seq (pre_p, seq);
+
+  /* If SEQ contains only statements that certainly will have a location
+ there's no need for our caller to retry setting another location,
+ so put that sequence into *MIDDLE.  Otherwise just append to *PRE_P.  */
+  if ((!have_then_clause_p || loc1 != UNKNOWN_LOCATION)
+  && (!have_else_clause_p || loc2 != UNKNOWN_LOCATION))
+gimple_seq_add_seq (middle, seq);
+  else
+gimple_seq_add_seq (pre_p, seq);
 
   if (ret == GS_ERROR)
 ; /* Do nothing.  */
@@ -12182,6 +12202,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
   tree tmp;
   gimple_seq internal_pre = NULL;
   gimple_seq internal_post = NULL;
+  gimple_seq middle = NULL;
   tree save_expr;
   bool is_statement;
   location_t saved_location;
@@ -12319,7 +12340,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p,

Fix PR37916 (unnecessary spilling)

2018-11-20 Thread Michael Matz
Hi,

this bug report is about cris generating worse code since tree-ssa.  The 
effect is also visible on x86-64.  The symptom is that the work horse of 
adler32.c (from zlib) needs spills in the inner loop, while gcc 3 did not, 
and those spills go away with -fno-tree-reassoc.

The underlying reason for the spills is register pressure, which could 
either be rectified by the pressure aware scheduling (which cris doesn't 
have), or by simply not generating high pressure code to begin with.  In 
this case it's TER which ultimately causes the register pressure to 
increase, and there are many plans in people's minds on how to fix this (make 
TER regpressure aware, do some regpressure scheduling on gimple, or even 
more ambitious things), but this patch doesn't tackle this.  Instead it 
makes reassoc not generate the situation which causes TER to run wild.

TER increasing register pressure is a long standing problem and so it has 
some heuristics to avoid that.  One wobbly heuristic is that it doesn't 
TER expressions together that have the same base variable as their LHSs.
But reassoc generates only anonymous SSA names for its temporary 
subexpressions, so that TER heuristic doesn't apply.  In this testcase 
it's even the case that reassoc doesn't really change much code (one 
addition moves from the end to the beginning of the inner loop), so that 
whole rewriting is even pointless.

In any case, let's use copy_ssa_name instead of make_ssa_name, when we 
have an obvious LHS; that leads to TER making the same decisions with and 
without -fno-tree-reassoc, leading to the same register pressure and no
spills.
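
A minimal sketch of the difference (the names are made up for illustration):

  tree lhs = gimple_get_lhs (stmt);           /* say, an SSA name of "sum"   */
  tree t1 = make_ssa_name (TREE_TYPE (lhs));  /* anonymous: _123             */
  tree t2 = copy_ssa_name (lhs);              /* keeps the base var: sum_124 */

With copy_ssa_name the TER heuristic sees the same base variable on both LHSs 
and backs off, just as it does for the non-reassociated code.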

On x86-64 the effect is:
  before patch: 48 bytes stackframe, 24 stack 
accesses (most of them in the loops), 576 bytes codesize;
  after patch: no stack frame, no stack accesses, 438 bytes codesize

On cris:
  before patch: 64 bytes stack frame, 27 stack accesses in loops, size of .s 
145 lines
  after patch: 20 bytes stack frame (as it uses callee saved regs, which 
is also complained about in the bug report), but no stack accesses
in loops, size of .s: 125 lines

I'm wondering about the testcase: should I add an x86-64-specific one that tests 
for no stack accesses, or would that be too constraining in the future?

Regstrapped on x86-64-linux, no regressions.  Okay for trunk?


Ciao,
Michael.

PR tree-optimization/37916
* tree-ssa-reassoc.c (make_new_ssa_for_def): Use copy_ssa_name.
(rewrite_expr_tree, linearize_expr, negate_value,
repropagate_negates, attempt_builtin_copysign,
reassociate_bb): Likewise.

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 971d926e7895..339c3d4e447f 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1182,7 +1182,7 @@ make_new_ssa_for_def (gimple *stmt, enum tree_code 
opcode, tree op)
   tree new_lhs, new_debug_lhs = NULL_TREE;
   tree lhs = gimple_get_lhs (stmt);
 
-  new_lhs = make_ssa_name (TREE_TYPE (lhs));
+  new_lhs = copy_ssa_name (lhs);
   gimple_set_lhs (stmt, new_lhs);
 
   /* Also need to update GIMPLE_DEBUGs.  */
@@ -4512,7 +4512,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
{
  gimple *insert_point
= find_insert_point (stmt, oe1->op, oe2->op);
- lhs = make_ssa_name (TREE_TYPE (lhs));
+ lhs = copy_ssa_name (lhs);
  stmt
= gimple_build_assign (lhs, gimple_assign_rhs_code (stmt),
   oe1->op, oe2->op);
@@ -4583,7 +4583,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
  unsigned int uid = gimple_uid (stmt);
  gimple *insert_point = find_insert_point (stmt, new_rhs1, oe->op);
 
- lhs = make_ssa_name (TREE_TYPE (lhs));
+ lhs = copy_ssa_name (lhs);
  stmt = gimple_build_assign (lhs, gimple_assign_rhs_code (stmt),
  new_rhs1, oe->op);
  gimple_set_uid (stmt, uid);
@@ -4820,7 +4820,7 @@ linearize_expr (gimple *stmt)
   gsi = gsi_for_stmt (stmt);
 
   gimple_assign_set_rhs2 (stmt, gimple_assign_rhs1 (binrhs));
-  binrhs = gimple_build_assign (make_ssa_name (TREE_TYPE (lhs)),
+  binrhs = gimple_build_assign (copy_ssa_name (lhs),
gimple_assign_rhs_code (binrhs),
gimple_assign_lhs (binlhs),
gimple_assign_rhs2 (binrhs));
@@ -4909,7 +4909,7 @@ negate_value (tree tonegate, gimple_stmt_iterator *gsip)
   rhs2 = negate_value (rhs2, &gsi);
 
   gsi = gsi_for_stmt (negatedefstmt);
-  lhs = make_ssa_name (TREE_TYPE (lhs));
+  lhs = copy_ssa_name (lhs);
   gimple_set_visited (negatedefstmt, true);
   g = gimple_build_assign (lhs, PLUS_EXPR, rhs1, rhs2);
   gimple_set_uid (g, gimple_uid (negatedefstmt));
@@ -5245,7 +5245,7 @@ repropagate_negates (void)
  tree b = gimple_assign_rhs2 (user);
  gimple_stmt_iterator gsi = gsi_for_stm

Re: Fix PR37916 (compile time regression)

2018-11-20 Thread Richard Biener
On Tue, Nov 20, 2018 at 2:23 PM Michael Matz  wrote:
>
> Hi,
>
> the testcase gcc.dg/20020425-1.c was once a compile time hog.  With
> current trunk it only needs 7 seconds (on my machine, with -O0 cc1) but
> there's still something to improve as virtually all of that time is
> wasted in repeatedly scanning the same (long) sequence of gimple
> statements to possibly give them locations.  Basically it's:
>
>   gimplify_stmt (E)
> gimplify_expr (E)
>   gimplify_cond_expr (E)
> gimplify_stmt (E.then)
>   gimplify_expr (E.then)
> update_locs (seq1)
> gimplify_stmt (E.else)
>   gimplify_expr (E.else)
> update_locs (seq2);
> return (seq = seq1 + seq2)
>   update_locs (seq)
>
> So the tails of the sequence (containing the expanded then/else subtrees)
> are repeatedly iterated over to give them locations from E, even if they
> already have locations from E.then and E.else.  That's quadratic.
>
> So let's avoid this, as the patch does.  If we are sure that the
> subsequence has locations (namely when E.then or E.else have locations)
> return that sequence not in *pre_p but in something else that isn't
> iterated over but appended to *pre_p in the caller.
>
> That shoves off most time and it now takes 0.25 seconds.
>
> The bug report contains some discussion about how the recursion between
> gimplify_expr->gimplify_cond_expr is bad, and I'm not tackling that.  It
> would only be possible with an explicit stack (as these trees _are_
> recursive) and the testcase is not as deeply nested to need that on normal
> stack sizes.
>
> But it's not a time hog anymore, so should we marked it fixed
> nevertheless?
>
> Anyway, regstrapped on x86-64-linux, no regressions.  Okay for trunk?

Ick.  Given you do that only for one stmt kind and it looks kind of ugly
wouldn't it be better to splat out gimple_set_location (g, input_location)
to all 108 places that call gimple_build_* in gimplify.c and get rid of
that ugly location post-processing?  I also wonder why we do not simply
rely on the "surrounding" location handing of UNKNOWN_LOCATION and,
say, simply only annotate the "main" gimplified stmt with the expr location?

Richard.

>
> Ciao,
> Michael.
>
> PR middle-end/38059
> * gimplify.c (gimplify_cond_expr): Add MIDDLE argument, use
> for passing back part of generated sequence.
> (gimplify_expr): Change call to gimplify_cond_expr, don't
> iterate through middle for updating locations.
>
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 87082ad10d2a..719f4ba379ed 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -3940,10 +3940,14 @@ generic_expr_could_trap_p (tree expr)
>  The second form is used when *EXPR_P is of type void.
>
>  PRE_P points to the list where side effects that must happen before
> -  *EXPR_P should be stored.  */
> +  *EXPR_P should be stored.
> +MIDDLE points to a sequence that possibly contains the lowered
> +  statements for the two arms; must be appended to *PRE_P in the
> +  caller.  */
>
>  static enum gimplify_status
> -gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, fallback_t fallback)
> +gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *middle,
> +   fallback_t fallback)
>  {
>tree expr = *expr_p;
>tree type = TREE_TYPE (expr);
> @@ -3952,10 +3956,17 @@ gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, 
> fallback_t fallback)
>enum gimplify_status ret;
>tree label_true, label_false, label_cont;
>bool have_then_clause_p, have_else_clause_p;
> +  location_t loc1, loc2;
>gcond *cond_stmt;
>enum tree_code pred_code;
>gimple_seq seq = NULL;
>
> +  loc1 = TREE_OPERAND (expr, 1)
> +  ? EXPR_LOCATION (TREE_OPERAND (expr, 1))
> +  : UNKNOWN_LOCATION;
> +  loc2 = TREE_OPERAND (expr, 2)
> +  ? EXPR_LOCATION (TREE_OPERAND (expr, 2))
> +  : UNKNOWN_LOCATION;
>/* If this COND_EXPR has a value, copy the values into a temporary within
>   the arms.  */
>if (!VOID_TYPE_P (type))
> @@ -4098,6 +4109,7 @@ gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, 
> fallback_t fallback)
>  &arm2);
>cond_stmt = gimple_build_cond (pred_code, arm1, arm2, label_true,
>  label_false);
> +  gimple_set_location (cond_stmt, input_location);
>gimple_set_no_warning (cond_stmt, TREE_NO_WARNING (COND_EXPR_COND (expr)));
>gimplify_seq_add_stmt (&seq, cond_stmt);
>gimple_stmt_iterator gsi = gsi_last (seq);
> @@ -4150,7 +4162,15 @@ gimplify_cond_expr (tree *expr_p, gimple_seq *pre_p, 
> fallback_t fallback)
>  gimplify_seq_add_stmt (&seq, gimple_build_label (label_cont));
>
>gimple_pop_condition (pre_p);
> -  gimple_seq_add_seq (pre_p, seq);
> +
> +  /* If SEQ contains only statements that certainly will have a location
> + there's no need for our caller to retry setting another loca

Re: Fix PR37916 (unnecessary spilling)

2018-11-20 Thread Alexander Monakov
On Tue, 20 Nov 2018, Michael Matz wrote:
> 
> I'm wondering about testcase: should I add an x86-64 specific that tests 
> for no stack accesses, or would that be too constraining in the future?
> 
> Regstrapped on x86-64-linux, no regressions.  Okay for trunk?

By the way, this patch helps x86-64 on PR 84681.  Some unnecessary spills
remain in the loop, but not as many as without the patch.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84681

Alexander


[PATCH][RFC] Extend locations where to search for Fortran pre-include.

2018-11-20 Thread Martin Liška
Hi.

The following patch is a follow-up to the discussion we had on IRC about
locations where a Fortran pre-include should be searched.

With the patch applied I see now:
$ strace -f -s512 ./xgcc -B. ~/Programming/testcases/usage.F90  -c 2>&1 | grep 
math-vector
access("./x86_64-pc-linux-gnu/9.0.0/math-vector-fortran.h", R_OK) = -1 ENOENT 
(No such file or directory)
access("./math-vector-fortran.h", R_OK) = -1 ENOENT (No such file or directory)
access("/home/marxin/bin/gcc2/bin/../include/finclude/x86_64-pc-linux-gnu/9.0.0/math-vector-fortran.h",
 R_OK) = -1 ENOENT (No such file or directory)
access("/home/marxin/bin/gcc2/bin/../include/finclude/math-vector-fortran.h", 
R_OK) = -1 ENOENT (No such file or directory)
access("/usr/include/finclude/x86_64-pc-linux-gnu/9.0.0/math-vector-fortran.h", 
R_OK) = -1 ENOENT (No such file or directory)
access("/usr/include/finclude/math-vector-fortran.h", R_OK) = -1 ENOENT (No 
such file or directory)

Martin
>From c89d25005ae649d652d316cefe2aab8c676cd6ca Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 20 Nov 2018 15:09:16 +0100
Subject: [PATCH] Extend locations where to search for Fortran pre-include.

---
 gcc/gcc.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 4d01e1e2f3b..bd6b83a3e6d 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -9891,20 +9891,49 @@ debug_level_greater_than_spec_func (int argc, const char **argv)
   return NULL;
 }
 
+static void
+path_prefix_reset (path_prefix *prefix)
+{
+  struct prefix_list *iter, *next;
+  iter = prefix->plist;
+  while (iter)
+{
+  next = iter->next;
+  free (const_cast  (iter->prefix));
+  XDELETE (iter);
+  iter = next;
+}
+  prefix->plist = 0;
+  prefix->max_len = 0;
+}
+
 /* The function takes 2 arguments: OPTION name and file name.
When the FILE is found by find_file, return OPTION=path_to_file.  */
 
 static const char *
 find_fortran_preinclude_file (int argc, const char **argv)
 {
+  char *result = NULL;
   if (argc != 2)
 return NULL;
 
+  struct path_prefix prefixes = { 0, 0, "preinclude" };
+  add_prefix (&prefixes, STANDARD_BINDIR_PREFIX "../include/finclude/", NULL,
+	  0, 0, 0);
+  add_prefix (&prefixes, "/usr/include/finclude/", NULL, 0, 0, 0);
+
   const char *path = find_a_file (&include_prefixes, argv[1], R_OK, true);
   if (path != NULL)
-return concat (argv[0], path, NULL);
+result = concat (argv[0], path, NULL);
+  else
+{
+  path = find_a_file (&prefixes, argv[1], R_OK, true);
+  if (path != NULL)
+	result = concat (argv[0], path, NULL);
+}
 
-  return NULL;
+  path_prefix_reset (&prefixes);
+  return result;
 }
 
 
@@ -9956,22 +9985,6 @@ convert_white_space (char *orig)
 return orig;
 }
 
-static void
-path_prefix_reset (path_prefix *prefix)
-{
-  struct prefix_list *iter, *next;
-  iter = prefix->plist;
-  while (iter)
-{
-  next = iter->next;
-  free (const_cast  (iter->prefix));
-  XDELETE (iter);
-  iter = next;
-}
-  prefix->plist = 0;
-  prefix->max_len = 0;
-}
-
 /* Restore all state within gcc.c to the initial state, so that the driver
code can be safely re-run in-process.
 
-- 
2.19.1



Re: Fix PR37916 (compile time regression)

2018-11-20 Thread Michael Matz
Hi,

On Tue, 20 Nov 2018, Richard Biener wrote:

> > Anyway, regstrapped on x86-64-linux, no regressions.  Okay for trunk?
> 
> Ick.  Given you do that only for one stmt kind and it looks kind of ugly 
> wouldn't it be better to splat out gimple_set_location (g, 
> input_location) to all 108 places that call gimple_build_* in gimplify.c 
> and get rid of that ugly location post-processing?

I thought about this and rejected it for stage 3, but if you say that's 
feasible I'll work on that; it is indeed nicer.

> I also wonder why we do not simply rely on the "surrounding" location 
> handling of UNKNOWN_LOCATION and, say, simply only annotate the "main" 
> gimplified stmt with the expr location?

Yeah.  Though that will be harder to verify as correct (or at least as not 
regressing vis-à-vis the current state).


Ciao,
Michael.


Re: [PATCH][RFC] Extend locations where to search for Fortran pre-include.

2018-11-20 Thread Jakub Jelinek
On Tue, Nov 20, 2018 at 03:14:02PM +0100, Martin Liška wrote:
> Following patch is a follow up of the discussion we had on IRC about
> locations where a Fortran pre-include should be searched.
> 
> With the patch applied I see now:
> $ strace -f -s512 ./xgcc -B. ~/Programming/testcases/usage.F90  -c 2>&1 | 
> grep math-vector
> access("./x86_64-pc-linux-gnu/9.0.0/math-vector-fortran.h", R_OK) = -1 ENOENT 
> (No such file or directory)
> access("./math-vector-fortran.h", R_OK) = -1 ENOENT (No such file or 
> directory)
> access("/home/marxin/bin/gcc2/bin/../include/finclude/x86_64-pc-linux-gnu/9.0.0/math-vector-fortran.h",
>  R_OK) = -1 ENOENT (No such file or directory)
> access("/home/marxin/bin/gcc2/bin/../include/finclude/math-vector-fortran.h", 
> R_OK) = -1 ENOENT (No such file or directory)
> access("/usr/include/finclude/x86_64-pc-linux-gnu/9.0.0/math-vector-fortran.h",
>  R_OK) = -1 ENOENT (No such file or directory)
> access("/usr/include/finclude/math-vector-fortran.h", R_OK) = -1 ENOENT (No 
> such file or directory)
> 
>  static const char *
>  find_fortran_preinclude_file (int argc, const char **argv)
>  {
> +  char *result = NULL;
>if (argc != 2)
>  return NULL;
>  

It doesn't search the same directory as is normally searched for Fortran
modules; that is finclude%s, and thus it would need to be yet another
argument to this spec internal function, and you'd check if
a file exists with that prefix.

> +  struct path_prefix prefixes = { 0, 0, "preinclude" };
> +  add_prefix (&prefixes, STANDARD_BINDIR_PREFIX "../include/finclude/", NULL,
> +   0, 0, 0);
> +  add_prefix (&prefixes, "/usr/include/finclude/", NULL, 0, 0, 0);

Hardcoding /usr/include looks just very wrong here.  That should always be
dependent on the configured prefix, or better, be relative to the driver;
gcc should be relocatable.  Or at least come from configure.  It should e.g.
honor the sysroot stuff etc.
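
A sketch of that direction (whether NATIVE_SYSTEM_HEADER_DIR is the right
configure-provided knob, and whether it is visible in gcc.c, are assumptions
here):

  /* Use the configure-provided system include directory
     (--with-native-system-header-dir) instead of a literal "/usr/include".  */
  add_prefix (&prefixes,
	      concat (NATIVE_SYSTEM_HEADER_DIR, "/finclude/", NULL),
	      NULL, 0, 0, 0);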

That said, I think you need somebody familiar with the driver, perhaps
Joseph?

Jakub


Re: C++ PATCH to implement P1094R2, Nested inline namespaces

2018-11-20 Thread Marek Polacek
On Tue, Nov 20, 2018 at 10:36:32AM +0100, Jakub Jelinek wrote:
> On Tue, Nov 20, 2018 at 10:25:01AM +0100, Richard Biener wrote:
> > > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > Just another small comment - given the usual high number of
> > C++ regressions delaying the release is Stage3 the right time
> > to add new language features?
> 
> I'd say this is small enough and worth an exception, it is just useful 
> syntactic
> sugar, and couldn't be submitted (much) earlier as it has been voted in
> during the week when stage1 closed.

Yeah, this one is very low risk and I think it would be nice to have it in for
GCC 9 to make the C++20 support better.
FWIW, I don't plan on working on other C++20 features for this stage; I'll be
addressing C++ bugs now.
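
For reference, a small example of what the paper permits (my own illustration,
not taken from the patch):

  namespace A::B::inline C { void f (); }   // C++20: C is an inline namespace
  // roughly equivalent to the pre-C++20 spelling:
  namespace A { namespace B { inline namespace C { void f (); } } }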

Marek


Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-20 Thread Renlin Li

Hi Richard,

On 11/14/2018 02:59 PM, Richard Biener wrote:

On Fri, Nov 9, 2018 at 4:49 PM Renlin Li  wrote:


Hi Richard,

On 11/09/2018 11:48 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 5:55 PM Renlin Li  wrote:


Hi Richard,



I don't see the masked load here on x86_64 btw. (I don't see
if-conversion generating a load).
I guess that's again when store-data-races are allowed that it uses a
RMW cycle and vectorization
generating the masked variants for the loop-mask.  Which means for SVE
if-conversion should
prefer the masked-store variant even when store data races are allowed?


Yes, it looks like, for SVE, the masked-store variant is preferred even when store 
data races are allowed.
This decision is made in if-cvt.
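
For reference, a sketch of the two if-conversion strategies being compared
(pseudo code, not taken from the actual dumps):

   /* scalar source */
   if (cond)
     a[i] = x;

   /* store data races allowed: unconditional read-modify-write */
   tmp = a[i];
   a[i] = cond ? x : tmp;

   /* masked-store variant, preferred for SVE */
   .MASK_STORE (&a[i], align, cond_mask, x);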

mask_store needs a pointer, and if it is created from an array access, we need 
to make sure the data reference analysis
can properly analyze the relationship between the array reference and the pointer 
reference.
So that no versioned loop is generated during loop vectorization.
(This is a general improvement, and could be done in a different patch?)







I was wondering whether we can implement

l = [masked]load;
tem = cond ? x : l;
masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?


Yes, this would require that the mask for the load and the store is the same.
This matches the pattern before loop vectorization.
The mask here is the loop mask, to ensure we are bounded by the number of 
iterations.

The new mask is (original mask & condition mask) (example shown above).
In this case, fewer lanes will be stored.

It is possible we do that in forwprop.
I could try to integrate the change into it if it is the correct place to go.

As the pattern is initially generated by the loop vectorizer, I did the change 
right after it, before it gets
converted into other forms. For example, forwprop will transform the original 
code into:

vect__2.4_29 = vect_cst__27 + { 1, ... };
_16 = (void *) ivtmp.13_25;
_2 = &MEM[base: _16, offset: 0B];
vect__ifc__24.7_33 = .MASK_LOAD (_2, 4B, loop_mask_32);
_28 = vect_cst__34 != { 0, ... };
_35 = .COND_ADD (_28, vect_cst__27, { 1, ... }, vect__ifc__24.7_33);
vect__ifc__26.8_36 = _35;
.MASK_STORE (_2, 4B, loop_mask_32, vect__ifc__26.8_36);
ivtmp_41 = ivtmp_40 + POLY_INT_CST [4, 4];
next_mask_43 = .WHILE_ULT (ivtmp_41, 16, { 0, ... });
ivtmp.13_15 = ivtmp.13_25 + POLY_INT_CST [16, 16];

This make the pattern matching not straight forward.


Ah, that's because of the .COND_ADDs (I wonder about the copy that's
left - forwprop should eliminate copies).  Note the pattern-matching
could go in the

   /* Apply forward propagation to all stmts in the basic-block.
  Note we update GSI within the loop as necessary.  */

loop which comes before the match.pd pattern matching so you'd
still see the form without the .COND_ADD I think.

There _is_ precedent for some masked-store post-processing
in the vectorizer (optimize_mask_stores), not that I like that
very much either.  Eventually those can be at least combined...

Thanks for your suggestion; indeed, .COND_ADD is generated later in the
fold_stmt function.

I updated the patch in the style of "forward propagation". It starts from the
VEC_COND and forward propagates it into the MASK_STORE when the following
pattern is found:

 X = MASK_LOAD (PTR, -, MASK)
 VAL = ...
 Y = VEC_COND (cond, VAL, X)
 MASK_STORE (PTR, -, MASK, Y)
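
For illustration, a minimal C loop of the kind that ends up in this shape after
if-conversion (with store data races allowed) and fully-masked vectorization
could look like the sketch below.  This is my own example, not a testcase from
this thread, and the names are made up:

  /* The store to a[i] is guarded, so if-conversion rewrites it as
     load / select / store, and masked (SVE-style) vectorization then
     emits MASK_LOAD, VEC_COND_EXPR and MASK_STORE sharing the same
     loop mask.  */
  void
  cond_update (int *a, const int *b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] != 0)
        a[i] = b[i] + 1;
  }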


That said, I prefer the forwprop place for any pattern matching
and the current patch needs more comments to understand
what it is doing (the DCE it does is IMHO premature).  You
should also modify the masked store in-place rather than
building a new one.  I don't like how you need to use
find_data_references_in_stmt, can't you simply compare
the address and size arguments?  


find_data_references_in_stmt is used because the two data references are
created as two new SSA_NAMEs from the same scalar pointer by the loop
vectorizer.
I cannot directly compare the addresses as they are complicated by loop
information.

By moving the functionality into forwprop, the complications are removed by the 
optimizers in between.
This makes a simple comparison possible.



It should also work for
a non-masked load I guess and thus apply to non-SVE if
we manage to feed the masked store with another conditional.


You are right, a non-masked load is a load with a mask of all 1s.
As long as the store mask is a subset of the load mask, and they are loading
from the same address, we could do this combining. (I haven't added this yet
as I don't have a test case to test it.)
This probably indicates there are more cases we could rewrite a

Re: Fix PR37916 (compile time regression)

2018-11-20 Thread Richard Biener
On Tue, Nov 20, 2018 at 3:19 PM Michael Matz  wrote:
>
> Hi,
>
> On Tue, 20 Nov 2018, Richard Biener wrote:
>
> > > Anyway, regstrapped on x86-64-linux, no regressions.  Okay for trunk?
> >
> > Ick.  Given you do that only for one stmt kind and it looks kind of ugly
> > wouldn't it be better to splat out gimple_set_location (g,
> > input_location) to all 108 places that call gimple_build_* in gimplify.c
> > and get rid of that ugly location post-processing?
>
> I thought about this and rejected it for stage 3, but if you say that's
> feasible I'll work on that; it is indeed nicer.

If you can try that would be nice, and yes, given it fixes a bug it's
OK for stage3.

> > I also wonder why we do not simply rely on the "surrounding" location
> > handing of UNKNOWN_LOCATION and, say, simply only annotate the "main"
> > gimplified stmt with the expr location?
>
> Yeah.  Though that will be harder to verify to be correct (or at least not
> regressing vis the current state).

True.  Nevertheless eventually good enough ;)

Richard.

>
> Ciao,
> Michael.


Re: [PATCH] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Christoph Müllner
Hi Kyrill,

Thanks for your comments.

> On 20.11.2018, at 11:08, Kyrill Tkachov  wrote:
> 
> Hi Christoph,
> 
> Thank you for the patch.
> Can you please confirm how this has been tested?

Tested with "make check" and no regressions found.
Will put that info into the mail next time.

> 
> On 19/11/18 17:11, Christoph Muellner wrote:
>> *** gcc/ChangeLog ***
>> 
>> 2018-xx-xx  Christoph Muellner 
>> 
>>  * config/aarch64/aarch64-cores.def: Define emag
>>  * config/aarch64/aarch64-tune.md: Regenerated with emag
>>  * config/aarch64/aarch64.c: Defining tuning struct
> 
> Please include the name of the new struct like so:
>* config/aarch64/aarch64.c (emag_tunings): New struct.
> 
> Also, full stops at the end of all entries.

Will do.

> 
>>  * doc/invoke.texi: Document mtune value
>> ---
>>  gcc/config/aarch64/aarch64-cores.def |  1 +
>>  gcc/config/aarch64/aarch64-tune.md   |  2 +-
>>  gcc/config/aarch64/aarch64.c | 25 +
>>  gcc/doc/invoke.texi  |  2 +-
>>  4 files changed, 28 insertions(+), 2 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index 1f3ac56..6e6800e 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -63,6 +63,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  
>> 8A,  AARCH64_FL_FOR_ARCH
>>/* APM ('P') cores. */
>>  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  
>> AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
>> +AARCH64_CORE("emag",emag,  xgene1,8A,  
>> AARCH64_FL_FOR_ARCH8, emag, 0x50, 0x000, -1)
>>  
> 
> I'd suggest you start a new comment "section" here, something like /* Ampere 
> cores.  */
> From this definition this looks identical to xgene1. In particular the IMP 
> and PART fields.
> Are they really the same? You can find the values in /proc/cpuinfo on a 
> GNU/Linux system.
> 
> If so, I don't think the -mcpu=native support will be able to pick up emag 
> properly.
> Do you have access to a Linux system running on this processor? What does 
> -mcpu=native -### get rewritten to?

Yes, Xgene3 and eMAG share the same implementor (0x50) and part (0x3) field.
I will set the part field to 0x3 and reorder the entries, s.t. emag is 
preferred.

Thanks,
Christoph


> 
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("falkor",  falkor,falkor,8A,  AARCH64_FL_FOR_ARCH8 
>> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx,   0x51, 
>> 0xC00, -1)
>> diff --git a/gcc/config/aarch64/aarch64-tune.md 
>> b/gcc/config/aarch64/aarch64-tune.md
>> index fade1d4..408976a 100644
>> --- a/gcc/config/aarch64/aarch64-tune.md
>> +++ b/gcc/config/aarch64/aarch64-tune.md
>> @@ -1,5 +1,5 @@
>>  ;; -*- buffer-read-only: t -*-
>>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>>  (define_attr "tune"
>> -
>> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
>> +
>> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,emag,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
>>  (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index f7f88a9..995aafe 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
>>&xgene1_prefetch_tune
>>  };
>>  +static const struct tune_params emag_tunings =
>> +{
>> +  &xgene1_extra_costs,
>> +  &xgene1_addrcost_table,
>> +  &xgene1_regmove_cost,
>> +  &xgene1_vector_cost,
>> +  &generic_branch_cost,
>> +  &xgene1_approx_modes,
>> +  6, /* memmov_cost  */
>> +  4, /* issue_rate  */
>> +  AARCH64_FUSE_NOTHING, /* fusible_ops  */
>> +  "16", /* function_align.  */
>> +  "16", /* jump_align.  */
>> +  "16", /* loop_align.  */
>> +  2,/* int_reassoc_width.  */
>> +  4,/* fp_reassoc_width.  */
>> +  1,/* vec_reassoc_width.  */
>> +  2,/* min_div_recip_mul_sf.  */
>> +  2,/* min_div_recip_mul_df.  */
>> +  17,   /* max_case_values.  */
>> +  tune_params::AUTOPREFETCHER_OFF,  /* autoprefetcher_model.  */
>> +  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),/* tune_flags.  */
>> +  &xgene1_prefetch_tune
>> +};
>> +
>>  static const struct tune_params qdf24xx_tunings =
>>  {
>>&qdf24xx_extra_costs,
>> diff --git a/gcc/doc/invoke.texi b/gcc/do

[PATCH v2] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Christoph Muellner
Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner 

* config/aarch64/aarch64-cores.def: Define emag.
* config/aarch64/aarch64-tune.md: Regenerated with emag.
* config/aarch64/aarch64.c (emag_tunings): New struct.
* doc/invoke.texi: Document mtune value.

Signed-off-by: Christoph Muellner 
---
 gcc/config/aarch64/aarch64-cores.def |  3 +++
 gcc/config/aarch64/aarch64-tune.md   |  2 +-
 gcc/config/aarch64/aarch64.c | 25 +
 gcc/doc/invoke.texi  |  2 +-
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 1f3ac56..8eee97f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -61,6 +61,9 @@ AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
 AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a2, -1)
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a3, -1)
 
+/* Ampere Computing cores. */
+AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
emag, 0x50, 0x000, 3)
+
 /* APM ('P') cores. */
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index fade1d4..2fc7f03 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f7f88a9..995aafe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
   &xgene1_prefetch_tune
 };
 
+static const struct tune_params emag_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  6, /* memmov_cost  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",/* function_align.  */
+  "16",/* jump_align.  */
+  "16",/* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  17,  /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
+  &xgene1_prefetch_tune
+};
+
 static const struct tune_params qdf24xx_tunings =
 {
   &qdf24xx_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e016dce..ac81fb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15288,7 +15288,7 @@ Specify the name of the target processor for which GCC 
should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{falkor},
+@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
 @samp{qdf24xx}, @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
 @samp{thunderx}, @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
 @samp{tsv110}, @samp{thunderxt83}, @samp{thunderx2t99},
-- 
2.9.5



Re: [PATCH] make function_args_iterator a proper iterator

2018-11-20 Thread Martin Sebor

On 11/20/2018 02:21 AM, Richard Biener wrote:

On Mon, Nov 19, 2018 at 4:36 PM Martin Sebor  wrote:


On 11/19/2018 03:32 AM, Richard Biener wrote:

On Sat, Nov 17, 2018 at 12:05 AM Martin Sebor  wrote:


To encourage and simplify the adoption of iterator classes in
GCC the attached patch turns the function_args_iterator struct
into an (almost) proper C++ iterator class that can be used
the same way as traditional forward iterators.

The patch also replaces all of the 26 uses of the legacy
FOREACH_FUNCTION_ARGS macro with ordinary for loops that use
function_args_iterator directly, and also poisons both
FOREACH_FUNCTION_ARGS and the unused FOREACH_FUNCTION_ARGS_PTR
macros.

The few dozen (hundred?) existing uses of for loops that iterate
over function parameter types using the TREE_CHAIN() macro can
be relatively easily modified to adopt the iterator approach over
time.  (The patch stops of short of making this change.)

Eventually, when GCC moves to more a recent C++ revision, it will
become possible to simplify the for loops to make use of the range
based for loop syntax along the lines of:

   for (auto argtype: function_args (functype))
 {
   ...
 }

Tested on x86_64-linux, and (lightly) on powerpc64le-linux using
a cross-compiler.  I'll test the changes to the other back ends
before committing.


This isn't stage3 material.


In the response referenced below Jeff requested I make use of
iterators in my patch.  This simply does what he asked for,
except throughout all of GCC.


I don't think he said you should invent new iterators - we have
existing ones.


The patch doesn't add a new iterator: it makes the existing
function_args_iterator a proper iterator class with the expected
iterator members like increment and equality operator, to make
it usable in contexts where other iterators (and pointers) are
expected.

Martin



Richard.



Martin



Richard.



Martin

PS For some additional background on this change see:
   https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00493.html






[PR 87926] bitmap -Wno-error=array-bounds

2018-11-20 Thread Nathan Sidwell

Applying a more focussed fix, as suggested by Richard.

nathan
--
Nathan Sidwell
2018-11-20  Nathan Sidwell  

	PR 87926
	* Makefile.in (bitmap.o-warn): Use -Wno-error=array-bounds.

Index: Makefile.in
===
--- Makefile.in	(revision 266318)
+++ Makefile.in	(working copy)
@@ -221,7 +221,7 @@ libgcov-merge-tool.o-warn = -Wno-error
 gimple-match.o-warn = -Wno-unused
 generic-match.o-warn = -Wno-unused
 dfp.o-warn = -Wno-strict-aliasing
-bitmap.o-warn = -Wno-error # PR 87926
+bitmap.o-warn = -Wno-error=array-bounds # PR 87926
 
 # All warnings have to be shut off in stage1 if the compiler used then
 # isn't gcc; configure determines that.  WARN_CFLAGS will be either


Re: [PATCH] handle unusual targets in -Wbuiltin-declaration-mismatch (PR 88098)

2018-11-20 Thread Christophe Lyon
On Mon, 19 Nov 2018 at 22:38, Martin Sebor  wrote:
>
> The gcc.dg/Wbuiltin-declaration-mismatch-4.c test added with
> the recent -Wbuiltin-declaration-mismatch enhancement to detect
> calls with incompatible arguments to built-ins declared without
> a prototype fails on a few targets due to incorrect assumptions
> hardcoded into the test.  Besides removing those assumptions
> (or adding appropriate { target } attributes, the attached patch
> also adjusts the implementation of the warning to avoid triggering
> for enum promotion to int on short_enums targets.
>
> Since the fix is trivial I plan to commit it tomorrow if there
> are no concerns.
>
> Tested on x86_64-linux and with an arm-none-eabi cross-compiler.
> I also did a little bit of testing with sparc-solaris2.11 cross
> compiler but there the test harness fails due to the -m32 option
> so the Wbuiltin-declaration-mismatch-4.c still has unexpected
> FAILs.  I've raised bug 88104 for the outstanding problem on
> sparc-solaris2.11.
>

Hello,

I tested your patch on arm* and aarch64*. It does the job on arm, but
on aarch64*elf,
I'm seeing new failures:
gcc.dg/Wbuiltin-declaration-mismatch-4.c large long double (test for
warnings, line 121)
gcc.dg/Wbuiltin-declaration-mismatch-4.c large long double (test for
warnings, line 123)
gcc.dg/Wbuiltin-declaration-mismatch-4.c large long double (test for
warnings, line 98)



> Martin


Improve ODR warnings

2018-11-20 Thread Jan Hubicka
Hi,
this patch fixes another problem with ODR warnings (and I hope I am one patch
away from closing that PR).

As shown in the first testcase Martin attached, we output a very confusing
warning about wrong alignment of a subtype instead of noticing that the type
contains a field of a type that already violates ODR.  This used to (mostly)
work by accident since types were registered in streaming order, which goes
by SCCs. Since Martin stabilized the order to follow locations this is no
longer true. It was not robust anyway since two types may share an SCC
(they ought not anymore, but still it is better not to have the correctness
of warnings depend on details of the streaming).

This patch fixes it by simply making register_odr_type first recurse into
subtypes, to be sure that everything is registered in the right order.

Testcases are still missing - there is a location being output wrong, which I
will work on next.
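
For reference, a minimal sketch of the kind of two-unit setup involved (my own
illustration, not one of the pending testcases; file and symbol names are made
up):

  /* t1.c -- compiled and linked together with t2.c using -flto -Wodr.  */
  struct inner { int x; };
  struct outer { struct inner f; };
  struct outer global_outer;

  /* t2.c -- 'struct inner' is defined differently here, so it already
     violates the one-definition rule.  The warning for 'struct outer'
     should point at its field type 'struct inner' as the culprit,
     instead of complaining about some derived property (such as
     alignment) of 'struct outer' itself.  */
  struct inner { long x; };
  struct outer { struct inner f; };
  struct outer *outer_ptr;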

lto-bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR lto/87957
* ipa-devirt.c (odr_subtypes_equivalent_p): Report ODR violation
	when subtype already violates ODR.
(get_odr_type): Do not ICE when insert is false and type duplicate
is not registered yet.
(register_odr_type): Be sure to register subtypes first.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 266314)
+++ ipa-devirt.c(working copy)
@@ -692,14 +692,17 @@ odr_subtypes_equivalent_p (tree t1, tree
  and other ODR even though it is a violation.  */
   if (types_odr_comparable (t1, t2))
 {
+  if (t1 != t2
+ && odr_type_p (TYPE_MAIN_VARIANT (t1))
+ && get_odr_type (TYPE_MAIN_VARIANT (t1), true)->odr_violated)
+   return false;
   if (!types_same_for_odr (t1, t2))
 return false;
   if (!type_variants_equivalent_p (t1, t2, warn, warned))
return false;
   /* Limit recursion: If subtypes are ODR types and we know
 that they are same, be happy.  */
-  if (!odr_type_p (TYPE_MAIN_VARIANT (t1))
- || !get_odr_type (TYPE_MAIN_VARIANT (t1), true)->odr_violated)
+  if (odr_type_p (TYPE_MAIN_VARIANT (t1)))
 return true;
 }
 
@@ -2047,10 +2050,9 @@ get_odr_type (tree type, bool insert)
   else if (*vtable_slot)
val = *vtable_slot;
 
-  if (val->type != type
+  if (val->type != type && insert
  && (!val->types_set || !val->types_set->add (type)))
{
- gcc_assert (insert);
  /* We have type duplicate, but it may introduce vtable name or
 mangled name; be sure to keep hashes in sync.  */
  if (in_lto_p && can_be_vtable_hashed_p (type)
@@ -2144,7 +2146,36 @@ register_odr_type (tree type)
 odr_vtable_hash = new odr_vtable_hash_type (23);
 }
   if (type == TYPE_MAIN_VARIANT (type))
-get_odr_type (type, true);
+{
+  /* To get ODR warings right, first register all sub-types.  */
+  if (RECORD_OR_UNION_TYPE_P (type)
+ && COMPLETE_TYPE_P (type))
+   {
+ /* Limit recursion on types which are already registered.  */
+ odr_type ot = get_odr_type (type, false);
+ if (ot
+ && (ot->type == type
+ || (ot->types_set
+ && ot->types_set->contains (type
+   return;
+ for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
+   if (TREE_CODE (f) == FIELD_DECL)
+ {
+   tree subtype = TREE_TYPE (f);
+
+   while (TREE_CODE (subtype) == ARRAY_TYPE)
+ subtype = TREE_TYPE (subtype);
+   if (type_with_linkage_p (TYPE_MAIN_VARIANT (subtype)))
+ register_odr_type (TYPE_MAIN_VARIANT (subtype));
+ }
+  if (TYPE_BINFO (type))
+for (unsigned int i = 0;
+ i < BINFO_N_BASE_BINFOS (TYPE_BINFO (type)); i++)
+  register_odr_type (BINFO_TYPE (BINFO_BASE_BINFO
+(TYPE_BINFO (type), i)));
+   }
+  get_odr_type (type, true);
+}
 }
 
 /* Return true if type is known to have no derivations.  */


Re: [PATCH 1/6] [RS6000] rs6000_call_template for external call insn assembly output

2018-11-20 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 11:17:55PM +1030, Alan Modra wrote:
> Version 2.
> 
> This is a first step in tidying rs6000 call patterns, in preparation
> to support inline plt calls.

Okay for trunk.  Thanks for the patch, and for the rework!


Segher


>   * config/rs6000/rs6000-protos.h (rs6000_call_template): Declare.
>   (rs6000_sibcall_template): Declare.
>   (macho_call_template): Rename from output_call.
>   * config/rs6000/rs6000.c (rs6000_call_template_1): New function.
>   (rs6000_call_template, rs6000_sibcall_template): Likewise.
>   (macho_call_template): Rename from output_call.
>   * config/rs6000/rs6000.md (tls_gd_aix, tls_gd_sysv),
>   (tls_gd_call_aix, tls_gd_call_sysv, tls_ld_aix, tls_ld_sysv),
>   (tls_ld_call_aix, tls_ld_call_sysv, call_nonlocal_sysv),
>   (call_nonlocal_sysv_secure, call_value_nonlocal_sysv),
>   (call_value_nonlocal_sysv_secure, call_nonlocal_aix),
>   (call_value_nonlocal_aix): Use rs6000_call_template and update
>   occurrences of output_call to macho_call_template.
>   (sibcall_nonlocal_sysv, sibcall_value_nonlocal_sysv, sibcall_aix),
>   (sibcall_value_aix): Use rs6000_sibcall_template.


Re: [PATCH v2] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Kyrill Tkachov

Hi Christoph,

On 20/11/18 15:22, Christoph Muellner wrote:

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner 

* config/aarch64/aarch64-cores.def: Define emag.
* config/aarch64/aarch64-tune.md: Regenerated with emag.
* config/aarch64/aarch64.c (emag_tunings): New struct.
* doc/invoke.texi: Document mtune value.

Signed-off-by: Christoph Muellner 
---
  gcc/config/aarch64/aarch64-cores.def |  3 +++
  gcc/config/aarch64/aarch64-tune.md   |  2 +-
  gcc/config/aarch64/aarch64.c | 25 +
  gcc/doc/invoke.texi  |  2 +-
  4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 1f3ac56..8eee97f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -61,6 +61,9 @@ AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
  AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a2, -1)
  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a3, -1)
  
+/* Ampere Computing cores. */

+AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
emag, 0x50, 0x000, 3)
+


According to your previous reply, the 0x3 should be in the "PART" field, that is
..., 0x50, 0x3, -1)

Thanks,
Kyrill


  /* APM ('P') cores. */
  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)
  
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md

index fade1d4..2fc7f03 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
  ;; -*- buffer-read-only: t -*-
  ;; Generated automatically by gentune.sh from aarch64-cores.def
  (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f7f88a9..995aafe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
&xgene1_prefetch_tune
  };
  
+static const struct tune_params emag_tunings =

+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  6, /* memmov_cost  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",  /* function_align.  */
+  "16",  /* jump_align.  */
+  "16",  /* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  17,  /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
+  &xgene1_prefetch_tune
+};
+
  static const struct tune_params qdf24xx_tunings =
  {
&qdf24xx_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e016dce..ac81fb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15288,7 +15288,7 @@ Specify the name of the target processor for which GCC 
should tune the
  performance of the code.  Permissible values for this option are:
  @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
  @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{falkor},
+@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
  @samp{qdf24xx}, @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
  @samp{thunderx}, @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
  @samp{tsv110}, @samp{thunderxt83}, @samp{thunderx2t99},




Re: [PATCH v2] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Christoph Müllner


> On 20.11.2018, at 17:14, Kyrill Tkachov  wrote:
> 
> Hi Christoph,
> 
> On 20/11/18 15:22, Christoph Muellner wrote:
>> Tested with "make check" and no regressions found.
>> 
>> *** gcc/ChangeLog ***
>> 
>> 2018-xx-xx  Christoph Muellner 
>> 
>>  * config/aarch64/aarch64-cores.def: Define emag.
>>  * config/aarch64/aarch64-tune.md: Regenerated with emag.
>>  * config/aarch64/aarch64.c (emag_tunings): New struct.
>>  * doc/invoke.texi: Document mtune value.
>> 
>> Signed-off-by: Christoph Muellner 
>> ---
>>  gcc/config/aarch64/aarch64-cores.def |  3 +++
>>  gcc/config/aarch64/aarch64-tune.md   |  2 +-
>>  gcc/config/aarch64/aarch64.c | 25 +
>>  gcc/doc/invoke.texi  |  2 +-
>>  4 files changed, 30 insertions(+), 2 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index 1f3ac56..8eee97f 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -61,6 +61,9 @@ AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  
>> 8A,  AARCH64_FL_FOR_ARCH
>>  AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
>> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
>> 0x0a2, -1)
>>  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
>> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
>> 0x0a3, -1)
>>  +/* Ampere Computing cores. */
>> +AARCH64_CORE("emag",emag,  xgene1,8A,  
>> AARCH64_FL_FOR_ARCH8, emag, 0x50, 0x000, 3)
>> +
> 
> According to your previous reply, the 0x3 should be in the "PART" field, that 
> is
> ..., 0x50, 0x3, -1)

Should have been "variant field" in the email.
The v2 patch is correct and tested:

processor   : 0
BogoMIPS: 100.00
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x50
CPU architecture: 8
CPU variant : 0x3
CPU part: 0x000
CPU revision: 1

gcc -mcpu=native -Q --help=target
  -mcpu=emag+crypto+crc+aes+sha2+profile

Thanks,
Christoph



> 
> Thanks,
> Kyrill
> 
>>  /* APM ('P') cores. */
>>  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  
>> AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
>>  diff --git a/gcc/config/aarch64/aarch64-tune.md 
>> b/gcc/config/aarch64/aarch64-tune.md
>> index fade1d4..2fc7f03 100644
>> --- a/gcc/config/aarch64/aarch64-tune.md
>> +++ b/gcc/config/aarch64/aarch64-tune.md
>> @@ -1,5 +1,5 @@
>>  ;; -*- buffer-read-only: t -*-
>>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>>  (define_attr "tune"
>> -
>> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
>> +
>> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
>>  (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index f7f88a9..995aafe 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
>>&xgene1_prefetch_tune
>>  };
>>  +static const struct tune_params emag_tunings =
>> +{
>> +  &xgene1_extra_costs,
>> +  &xgene1_addrcost_table,
>> +  &xgene1_regmove_cost,
>> +  &xgene1_vector_cost,
>> +  &generic_branch_cost,
>> +  &xgene1_approx_modes,
>> +  6, /* memmov_cost  */
>> +  4, /* issue_rate  */
>> +  AARCH64_FUSE_NOTHING, /* fusible_ops  */
>> +  "16", /* function_align.  */
>> +  "16", /* jump_align.  */
>> +  "16", /* loop_align.  */
>> +  2,/* int_reassoc_width.  */
>> +  4,/* fp_reassoc_width.  */
>> +  1,/* vec_reassoc_width.  */
>> +  2,/* min_div_recip_mul_sf.  */
>> +  2,/* min_div_recip_mul_df.  */
>> +  17,   /* max_case_values.  */
>> +  tune_params::AUTOPREFETCHER_OFF,  /* autoprefetcher_model.  */
>> +  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),/* tune_flags.  */
>> +  &xgene1_prefetch_tune
>> +};
>> +
>>  static const struct tune_params qdf24xx_tunings =
>>  {
>>&qdf24xx_extra_costs,
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index e016dce..ac81fb2 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -15288,7 +15288,7 @@ Specify the name of the target processor for which 
>> GCC should tune the
>>  performance of the code.  Permissib

[Committed] S/390: Fix flogr RTX.

2018-11-20 Thread Andreas Krebbel
The flogr instruction uses a 64-bit register pair target operand.  In
the RTX we model this as a write to a TImode register.  Unfortunately
the RTXs being assigned to the two parts of the target operand were
swapped.  This is no problem if in the end the flogr instruction is
emitted, since the instruction still does what the clzdi expander
expects.  However, a problem arises when the RTX is used to optimize
CLZ for a constant input operand.  Even then it matters only if the
expression couldn't be folded on tree level already.
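
As an illustration of the intended semantics (my own sketch, not part of the
patch): for a non-zero 64-bit input, the register pair described by the
clztidi2 pattern holds clz (x) in the high DImode half and, in the low half,
x with its left-most 1 bit cleared:

  /* Low half of the result for x != 0, matching the
     (xor x (lshiftrt (1 << 63) (clz x))) part of the pattern.
     Assumes x != 0 so that __builtin_clzll is well defined.  */
  unsigned long long
  flogr_low_half (unsigned long long x)
  {
    return x ^ ((1ULL << 63) >> __builtin_clzll (x));
  }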

In the testcase this happened thanks to loop unrolling on RTL level.
The iteration variable is used as an argument to the clz
builtin. Due to the loop unrolling it becomes a constant and after
folding the broken RTX leads to a wrong assumption.

Bootstrapped and regtested on s390x.

I plan to backport this to older branches after giving it a week on
mainline.

gcc/ChangeLog:

2018-11-20  Andreas Krebbel  

* config/s390/s390.md ("clztidi2"): Swap the RTX's written to the
DImode parts of the target operand.

gcc/testsuite/ChangeLog:

2018-11-20  Andreas Krebbel  

* gcc.target/s390/flogr-1.c: New test.
---
 gcc/config/s390/s390.md | 16 +--
 gcc/testsuite/gcc.target/s390/flogr-1.c | 47 +
 2 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/flogr-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 721222d..30d113f 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -8759,17 +8759,17 @@
   DONE;
 })
 
+; CLZ result is in hard reg op0 - this is the high part of the target operand
+; The source with the left-most one bit cleared is in hard reg op0 + 1 - the 
low part
 (define_insn "clztidi2"
   [(set (match_operand:TI 0 "register_operand" "=d")
(ior:TI
- (ashift:TI
-(zero_extend:TI
- (xor:DI (match_operand:DI 1 "register_operand" "d")
-  (lshiftrt (match_operand:DI 2 "const_int_operand" "")
-   (subreg:SI (clz:DI (match_dup 1)) 4
-
-   (const_int 64))
-  (zero_extend:TI (clz:DI (match_dup 1)
+ (ashift:TI (zero_extend:TI (clz:DI (match_operand:DI 1 
"register_operand" "d")))
+(const_int 64))
+ (zero_extend:TI
+  (xor:DI (match_dup 1)
+  (lshiftrt (match_operand:DI 2 "const_int_operand" "")
+(subreg:SI (clz:DI (match_dup 1)) 4))
(clobber (reg:CC CC_REGNUM))]
   "UINTVAL (operands[2]) == HOST_WIDE_INT_1U << 63
&& TARGET_EXTIMM && TARGET_ZARCH"
diff --git a/gcc/testsuite/gcc.target/s390/flogr-1.c 
b/gcc/testsuite/gcc.target/s390/flogr-1.c
new file mode 100644
index 000..a386900
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/flogr-1.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -funroll-loops -march=z9-109" } */
+/* { dg-require-effective-target stdint_types } */
+
+/* Folding of the FLOGR caused a wrong value to be returned by
+   __builtin_clz becuase of a problem in the RTX we emit for FLOGR.
+   The problematic folding can only be triggered with constants inputs
+   introduced on RTL level.  In this case it happens with loop
+   unrolling.  */
+
+#include 
+#include 
+
+static inline uint32_t pow2_ceil_u32(uint32_t x) {
+  if (x <= 1) {
+return x;
+  }
+  int msb_on_index;
+  msb_on_index = (31 ^ __builtin_clz(x - 1));
+  assert(msb_on_index < 31);
+  return 1U << (msb_on_index + 1);
+}
+
+void __attribute__((noinline,noclone))
+die (int a)
+{
+  if (a)
+__builtin_abort ();
+}
+
+void test_pow2_ceil_u32(void) {
+  unsigned i;
+
+  for (i = 0; i < 18; i++) {
+  uint32_t a_ = (pow2_ceil_u32(((uint32_t)1) << i));
+  if (!(a_ == (((uint32_t)1) << i))) {
+   die(1);
+  }
+  }
+}
+
+int
+main(void) {
+  test_pow2_ceil_u32();
+
+  return 0;
+}
-- 
2.7.4



Re: [PATCH 2/6] [RS6000] rs6000_indirect_call_template

2018-11-20 Thread Segher Boessenkool
Hi Alan,

On Tue, Nov 13, 2018 at 11:19:03PM +1030, Alan Modra wrote:
> Like the last patch for external calls, now handle most assembly code
> for indirect calls in one place.  The patch also merges some insns,
> correcting some !rs6000_speculate_indirect_jumps cases branching to
> LR, which don't require a speculation barrier.

Okay for trunk.  Thanks!


Segher


>   * config/rs6000/rs6000-protos.h (rs6000_indirect_call_template),
>   (rs6000_indirect_sibcall_template): Declare.
>   * config/rs6000/rs6000.c (rs6000_indirect_call_template_1),
>   (rs6000_indirect_call_template, rs6000_indirect_sibcall_template):
>   New functions.
>   * config/rs6000/rs6000.md (call_indirect_nonlocal_sysv),
>   (call_value_indirect_nonlocal_sysv, sibcall_nonlocal_sysv),
>   (call_indirect_aix, call_value_indirect_aix): Use
>   rs6000_indirect_call_template and rs6000_indirect_sibcall_template.
>   call_indirect_elfv2, call_value_indirect_elfv2): Likewise, and
>   handle both speculation and non-speculation cases.
>   (call_indirect_aix_nospec, call_value_indirect_aix_nospec): Delete.
>   (call_indirect_elfv2_nospec, call_value_indirect_elfv2_nospec): Delete.


Re: [PATCH v2] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Kyrill Tkachov



On 20/11/18 16:20, Christoph Müllner wrote:

On 20.11.2018, at 17:14, Kyrill Tkachov  wrote:

Hi Christoph,

On 20/11/18 15:22, Christoph Muellner wrote:

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner 

* config/aarch64/aarch64-cores.def: Define emag.
* config/aarch64/aarch64-tune.md: Regenerated with emag.
* config/aarch64/aarch64.c (emag_tunings): New struct.
* doc/invoke.texi: Document mtune value.

Signed-off-by: Christoph Muellner 
---
  gcc/config/aarch64/aarch64-cores.def |  3 +++
  gcc/config/aarch64/aarch64-tune.md   |  2 +-
  gcc/config/aarch64/aarch64.c | 25 +
  gcc/doc/invoke.texi  |  2 +-
  4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 1f3ac56..8eee97f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -61,6 +61,9 @@ AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
  AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a2, -1)
  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a3, -1)
  +/* Ampere Computing cores. */
+AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
emag, 0x50, 0x000, 3)
+

According to your previous reply, the 0x3 should be in the "PART" field, that is
..., 0x50, 0x3, -1)

Should have been "variant field" in the email.
The v2 patch is correct and tested:


I see, that does look correct then.


processor   : 0
BogoMIPS: 100.00
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid


This line says that the CPU supports the "crc" and "crypto" option extensions 
by default so
the 5th field should include AARCH64_FL_CRC | AARCH64_FL_CRYPTO so that the 
user gets them
by default when they use -mcpu=emag.

Thanks, this is really helpful.
Kyrill


CPU implementer : 0x50
CPU architecture: 8
CPU variant : 0x3
CPU part: 0x000
CPU revision: 1

gcc -mcpu=native -Q --help=target
   -mcpu=emag+crypto+crc+aes+sha2+profile

Thanks,
Christoph




Thanks,
Kyrill


  /* APM ('P') cores. */
  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)
  diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index fade1d4..2fc7f03 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
  ;; -*- buffer-read-only: t -*-
  ;; Generated automatically by gentune.sh from aarch64-cores.def
  (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f7f88a9..995aafe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
&xgene1_prefetch_tune
  };
  +static const struct tune_params emag_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  6, /* memmov_cost  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",  /* function_align.  */
+  "16",  /* jump_align.  */
+  "16",  /* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  17,  /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
+  &xgene1_prefetch_tune
+};
+
  static const struct tune_params qdf24xx_tunings =
  {
&qdf24xx_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e016dce..ac81fb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15288,7 +15288,7 @@ Specify the name of the target processor for whi

Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL

2018-11-20 Thread Pat Haugen
On 11/19/18 2:30 PM, Pat Haugen wrote:
>> This is a follow-up from 
>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>> This version introduces an "artificial" property of the dependencies 
>> produced in
>> sched-deps.c that is recorded when they are created due to 
>> MAX_PENDING_LIST_LENGTH
>> and they are thus ignored in the model_analyze_insns ALAP calculation.
>>
>> This approach gives most of the benefits of the original patch [1] on 
>> aarch64.
>> I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on 
>> powerpc64le-unknown-linux-gnu
>> with -O3 and found that the initial version proposed did indeed increase the 
>> instruction count
>> and stack space. This version gives a small improvement on powerpc in terms 
>> of instruction count
>> (number of st* instructions stays the same), so I'm hoping this version 
>> addresses Pat's concerns.
>> Pat, could you please try this version out if you've got the chance?
>>
> I tried the new version on cactusADM, it's showing a 2% degradation. I've 
> kicked off a full CPU2006 run just to see if any others are affected.

The other benchmarks were neutral. So the only benchmark showing a change is 
the 2% degradation on cactusADM. Comparing the generated .s files for 
bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns and 
still the 1% increase in store insns.

-Pat



Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL

2018-11-20 Thread Kyrill Tkachov



On 20/11/18 16:48, Pat Haugen wrote:

On 11/19/18 2:30 PM, Pat Haugen wrote:

This is a follow-up from 
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
This version introduces an "artificial" property of the dependencies produced in
sched-deps.c that is recorded when they are created due to 
MAX_PENDING_LIST_LENGTH
and they are thus ignored in the model_analyze_insns ALAP calculation.

This approach gives most of the benefits of the original patch [1] on aarch64.
I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on 
powerpc64le-unknown-linux-gnu
with -O3 and found that the initial version proposed did indeed increase the 
instruction count
and stack space. This version gives a small improvement on powerpc in terms of 
instruction count
(number of st* instructions stays the same), so I'm hoping this version 
addresses Pat's concerns.
Pat, could you please try this version out if you've got the chance?


I tried the new version on cactusADM, it's showing a 2% degradation. I've 
kicked off a full CPU2006 run just to see if any others are affected.

The other benchmarks were neutral. So the only benchmark showing a change is 
the 2% degradation on cactusADM. Comparing the generated .s files for 
bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns and 
still the 1% increase in store insns.


Sigh :(
What options are you compiling with? I tried a powerpc64le compiler with plain
-O3 and got a slight improvement (by manual inspection).

Thanks,
Kyrill



-Pat





Re: [PATCH v2] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Christoph Müllner


> On 20.11.2018, at 17:43, Kyrill Tkachov  wrote:
> 
> On 20/11/18 16:20, Christoph Müllner wrote:
>>> On 20.11.2018, at 17:14, Kyrill Tkachov  wrote:
>>> 
>>> Hi Christoph,
>>> 
>>> On 20/11/18 15:22, Christoph Muellner wrote:
 Tested with "make check" and no regressions found.
 
 *** gcc/ChangeLog ***
 
 2018-xx-xx  Christoph Muellner 
 
* config/aarch64/aarch64-cores.def: Define emag.
* config/aarch64/aarch64-tune.md: Regenerated with emag.
* config/aarch64/aarch64.c (emag_tunings): New struct.
* doc/invoke.texi: Document mtune value.
 
 Signed-off-by: Christoph Muellner 
 
 ---
  gcc/config/aarch64/aarch64-cores.def |  3 +++
  gcc/config/aarch64/aarch64-tune.md   |  2 +-
  gcc/config/aarch64/aarch64.c | 25 +
  gcc/doc/invoke.texi  |  2 +-
  4 files changed, 30 insertions(+), 2 deletions(-)
 
 diff --git a/gcc/config/aarch64/aarch64-cores.def 
 b/gcc/config/aarch64/aarch64-cores.def
 index 1f3ac56..8eee97f 100644
 --- a/gcc/config/aarch64/aarch64-cores.def
 +++ b/gcc/config/aarch64/aarch64-cores.def
 @@ -61,6 +61,9 @@ AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  
 8A,  AARCH64_FL_FOR_ARCH
  AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
 AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  
 0x43, 0x0a2, -1)
  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
 AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  
 0x43, 0x0a3, -1)
  +/* Ampere Computing cores. */
 +AARCH64_CORE("emag",emag,  xgene1,8A,  
 AARCH64_FL_FOR_ARCH8, emag, 0x50, 0x000, 3)
 +
>>> According to your previous reply, the 0x3 should be in the "PART" field, 
>>> that is
>>> ..., 0x50, 0x3, -1)
>> Should have been "variant field" in the email.
>> The v2 patch is correct and tested:
> 
> I see, that does look correct then.
> 
>> processor   : 0
>> BogoMIPS: 100.00
>> Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> 
> This line says that the CPU supports the "crc" and "crypto" option extensions 
> by default so
> the 5th field should include AARCH64_FL_CRC | AARCH64_FL_CRYPTO so that the 
> user gets them
> by default when they use -mcpu=emag.

Good hint!

Thanks,
Christoph

> 
> Thanks, this is really helpful.
> Kyrill
> 
>> CPU implementer : 0x50
>> CPU architecture: 8
>> CPU variant : 0x3
>> CPU part: 0x000
>> CPU revision: 1
>> 
>> gcc -mcpu=native -Q --help=target
>>   -mcpu=emag+crypto+crc+aes+sha2+profile
>> 
>> Thanks,
>> Christoph
>> 
>> 
>> 
>>> Thanks,
>>> Kyrill
>>> 
  /* APM ('P') cores. */
  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  
 AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
  diff --git a/gcc/config/aarch64/aarch64-tune.md 
 b/gcc/config/aarch64/aarch64-tune.md
 index fade1d4..2fc7f03 100644
 --- a/gcc/config/aarch64/aarch64-tune.md
 +++ b/gcc/config/aarch64/aarch64-tune.md
 @@ -1,5 +1,5 @@
  ;; -*- buffer-read-only: t -*-
  ;; Generated automatically by gentune.sh from aarch64-cores.def
  (define_attr "tune"
 -  
 "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
 +  
 "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index f7f88a9..995aafe 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
&xgene1_prefetch_tune
  };
  +static const struct tune_params emag_tunings =
 +{
 +  &xgene1_extra_costs,
 +  &xgene1_addrcost_table,
 +  &xgene1_regmove_cost,
 +  &xgene1_vector_cost,
 +  &generic_branch_cost,
 +  &xgene1_approx_modes,
 +  6, /* memmov_cost  */
 +  4, /* issue_rate  */
 +  AARCH64_FUSE_NOTHING, /* fusible_ops  */
 +  "16",   /* function_align.  */
 +  "16",   /* jump_align.  */
 +  "16",   /* loop_align.  */
 +  2,  /* int_reassoc_width.  */
 +  4,  /* fp_reassoc_width.  */
 +  1,  /* vec_reassoc_width.  */
 +  2,  /* m

Patch ping (Re: [PATCH] Fortran include line fixes and -fdec-include support)

2018-11-20 Thread Jakub Jelinek
Hi!

I'd like to ping this patch, ok for trunk?

> 2018-11-12  Jakub Jelinek  
>   Mark Eggleston  
> 
>   * lang.opt (fdec-include): New option.
>   * options.c (set_dec_flags): Set also flag_dec_include.
>   * scanner.c (include_line): Change return type from bool to int.
>   In fixed form allow spaces in between include keyword letters.
>   For -fdec-include, allow in fixed form 0 in column 6.  With
>   -fdec-include return -1 if the parsed line is not full include
>   statement and it could be successfully completed on continuation
>   lines.
>   (include_stmt): New function.
>   (load_file): Adjust include_line caller.  If it returns -1, keep
>   trying include_stmt until it stops returning -1 whenever adding
>   further line of input.
> 
>   * gfortran.dg/include_10.f: New test.
>   * gfortran.dg/include_10.inc: New file.
>   * gfortran.dg/include_11.f: New test.
>   * gfortran.dg/include_12.f: New test.
>   * gfortran.dg/include_13.f90: New test.
>   * gfortran.dg/gomp/include_1.f: New test.
>   * gfortran.dg/gomp/include_1.inc: New file.
>   * gfortran.dg/gomp/include_2.f90: New test.

Jakub


Patch ping (was Re: [PATCH] Fix aarch64_compare_and_swap* constraints (PR target/87839))

2018-11-20 Thread Jakub Jelinek
Hi!

On Tue, Nov 13, 2018 at 10:28:16AM +0100, Jakub Jelinek wrote:
> 2018-11-13  Jakub Jelinek  
> 
>   PR target/87839
>   * config/aarch64/atomics.md (@aarch64_compare_and_swap): Use
>   rIJ constraint for aarch64_plus_operand rather than rn.
> 
>   * gcc.target/aarch64/pr87839.c: New test.

I'd like to ping this patch, Kyrill had kindly tested it, ok for trunk?

Jakub


Re: Simplify enumerate and array types

2018-11-20 Thread Jan Hubicka
> 
> Somehow do this first, otherwise 'incomplete' above is useless
> work?

Well, there are two types, but one is the main variant of the other.
I have restructured the code and avoided the duplicate lookup & code
duplication. The attached patch bootstraps & regtests and I plan to commit
it after a bit of further testing (re-doing stats on type duplicates).

* ipa-devirt.c (add_type_duplicate): Do not ICE on incomplete enums.
* tree.c (build_array_type_1): Forward declare.
(fld_type_variant_equal_p): Add INNER_TYPE parameter.
(fld_type_variant): Likewise.
(fld_simplified_types): New hash.
(fld_process_array_type): New function.
(fld_incomplete_type_of): Handle array and enumeration types.
(fld_simplified_type): Handle simplification of arrays.
(free_lang_data): Allocate and free simplified types hash.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 266322)
+++ ipa-devirt.c(working copy)
@@ -1720,7 +1720,8 @@ add_type_duplicate (odr_type val, tree t
   else if (!COMPLETE_TYPE_P (val->type) && COMPLETE_TYPE_P (type))
 {
   prevail = true;
-  build_bases = TYPE_BINFO (type);
+  if (TREE_CODE (type) == RECORD_TYPE)
+build_bases = TYPE_BINFO (type);
 }
   else if (COMPLETE_TYPE_P (val->type) && !COMPLETE_TYPE_P (type))
 ;
Index: tree.c
===
--- tree.c  (revision 266321)
+++ tree.c  (working copy)
@@ -265,6 +265,8 @@ static void print_type_hash_statistics (
 static void print_debug_expr_statistics (void);
 static void print_value_expr_statistics (void);
 
+static tree build_array_type_1 (tree, tree, bool, bool);
+
 tree global_trees[TI_MAX];
 tree integer_types[itk_none];
 
@@ -5109,10 +5111,11 @@ fld_simplified_type_name (tree type)
 
 /* Do same comparsion as check_qualified_type skipping lang part of type
and be more permissive about type names: we only care that names are
-   same (for diagnostics) and that ODR names are the same.  */
+   same (for diagnostics) and that ODR names are the same.
+   If INNER_TYPE is non-NULL, be sure that TREE_TYPE match it.  */
 
 static bool
-fld_type_variant_equal_p (tree t, tree v)
+fld_type_variant_equal_p (tree t, tree v, tree inner_type)
 {
   if (TYPE_QUALS (t) != TYPE_QUALS (v)
   /* We want to match incomplete variants with complete types.
@@ -5122,21 +5125,24 @@ fld_type_variant_equal_p (tree t, tree v
  || TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (v)))
   || fld_simplified_type_name (t) != fld_simplified_type_name (v)
   || !attribute_list_equal (TYPE_ATTRIBUTES (t),
-   TYPE_ATTRIBUTES (v)))
+   TYPE_ATTRIBUTES (v))
+  || (inner_type && TREE_TYPE (v) != inner_type))
 return false;
- 
+
   return true;
 }
 
-/* Find variant of FIRST that match T and create new one if necessary.  */
+/* Find variant of FIRST that match T and create new one if necessary.
+   Set TREE_TYPE to INNER_TYPE if non-NULL.  */
 
 static tree
-fld_type_variant (tree first, tree t, struct free_lang_data_d *fld)
+fld_type_variant (tree first, tree t, struct free_lang_data_d *fld,
+ tree inner_type = NULL)
 {
   if (first == TYPE_MAIN_VARIANT (t))
 return t;
   for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
-if (fld_type_variant_equal_p (t, v))
+if (fld_type_variant_equal_p (t, v, inner_type))
   return v;
   tree v = build_variant_type_copy (first);
   TYPE_READONLY (v) = TYPE_READONLY (t);
@@ -5154,7 +5160,9 @@ fld_type_variant (tree first, tree t, st
   SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
   TYPE_USER_ALIGN (v) = TYPE_USER_ALIGN (t);
 }
-  gcc_checking_assert (fld_type_variant_equal_p (t,v));
+  if (inner_type)
+TREE_TYPE (v) = inner_type;
+  gcc_checking_assert (fld_type_variant_equal_p (t,v, inner_type));
   add_tree_to_fld_list (v, fld);
   return v;
 }
@@ -5163,6 +5171,41 @@ fld_type_variant (tree first, tree t, st
 
 static hash_map *fld_incomplete_types;
 
+/* Map types to simplified types.  */
+
+static hash_map *fld_simplified_types;
+
+/* Produce variant of T whose TREE_TYPE is T2. If it is main variant,
+   use MAP to prevent duplicates.  */
+
+static tree
+fld_process_array_type (tree t, tree t2, hash_map *map,
+   struct free_lang_data_d *fld)
+{
+  if (TREE_TYPE (t) == t2)
+return t;
+
+  if (TYPE_MAIN_VARIANT (t) != t)
+{
+  return fld_type_variant
+  (fld_process_array_type (TYPE_MAIN_VARIANT (t),
+   TYPE_MAIN_VARIANT (t2), map, fld),
+   t, fld, t2);
+}
+
+  bool existed;
+  tree &array
+ = map->get_or_insert (t, &existed);
+  if (!existed)
+{
+  array = build_array_type_1 (t2, TYPE_DOMAIN (t),
+ TYPE_TYPELESS_STORAGE (t), false);
+  TYPE_CANONICAL (array

PING^3: [PATCH] apply_subst_iterator: Handle define_split/define_insn_and_split

2018-11-20 Thread H.J. Lu
On Tue, Nov 13, 2018 at 6:08 AM H.J. Lu  wrote:
>
> On Sun, Nov 4, 2018 at 7:24 AM H.J. Lu  wrote:
> >
> > On Fri, Oct 26, 2018 at 12:44 AM H.J. Lu  wrote:
> > >
> > > On 10/25/18, Uros Bizjak  wrote:
> > > > On Fri, Oct 26, 2018 at 8:48 AM H.J. Lu  wrote:
> > > >>
> > > >> On 10/25/18, Uros Bizjak  wrote:
> > > >> > On Fri, Oct 26, 2018 at 8:07 AM H.J. Lu  wrote:
> > > >> >>
> > > >> >> * read-rtl.c (apply_subst_iterator): Handle
> > > >> >> define_insn_and_split.
> > > >> >> ---
> > > >> >>  gcc/read-rtl.c | 6 --
> > > >> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >> >>
> > > >> >> diff --git a/gcc/read-rtl.c b/gcc/read-rtl.c
> > > >> >> index d698dd4af4d..5957c29671a 100644
> > > >> >> --- a/gcc/read-rtl.c
> > > >> >> +++ b/gcc/read-rtl.c
> > > >> >> @@ -275,9 +275,11 @@ apply_subst_iterator (rtx rt, unsigned int, int
> > > >> >> value)
> > > >> >>if (value == 1)
> > > >> >>  return;
> > > >> >>gcc_assert (GET_CODE (rt) == DEFINE_INSN
> > > >> >> + || GET_CODE (rt) == DEFINE_INSN_AND_SPLIT
> > > >> >>   || GET_CODE (rt) == DEFINE_EXPAND);
> > > >> >
> > > >> > Can we also handle DEFINE_SPLIT here?
> > > >> >
> > > >>
> > > >> Yes, we could if there were a usage for it.  I am reluctant to add
> > > >> something
> > > >> I have no use nor test for.
> > > >
> > > > Just split one define_insn_and_split to define_insn and corresponding
> > > > define_split.
> > > >
> > > > define_insn_and_split is a contraction for the define_insn and
> > > > corresponding define_split, so it looks weird to only handle
> > > > define_insn_and_split without handling define_split.
> > > >
> > >
> > > Here is the updated patch to handle define_split.  Tested with
> > >
> > > (define_insn "*sse4_1_v8qiv8hi2_2"
> > >   [(set (match_operand:V8HI 0 "register_operand")
> > > (any_extend:V8HI
> > >   (vec_select:V8QI
> > > (subreg:V16QI
> > >   (vec_concat:V2DI
> > > (match_operand:DI 1 "memory_operand")
> > > (const_int 0)) 0)
> > > (parallel [(const_int 0) (const_int 1)
> > >(const_int 2) (const_int 3)
> > >(const_int 4) (const_int 5)
> > >(const_int 6) (const_int 7)]]
> > >   "TARGET_SSE4_1 &&  && 
> > > "
> > >   "#")
> > >
> > > (define_split
> > >   [(set (match_operand:V8HI 0 "register_operand")
> > > (any_extend:V8HI
> > >   (vec_select:V8QI
> > > (subreg:V16QI
> > >   (vec_concat:V2DI
> > > (match_operand:DI 1 "memory_operand")
> > > (const_int 0)) 0)
> > > (parallel [(const_int 0) (const_int 1)
> > >(const_int 2) (const_int 3)
> > >(const_int 4) (const_int 5)
> > >(const_int 6) (const_int 7)]]
> > >   "TARGET_SSE4_1 &&  && 
> > >&& can_create_pseudo_p ()"
> > >   [(set (match_dup 0)
> > > (any_extend:V8HI (match_dup 1)))]
> > > {
> > >   operands[1] = adjust_address_nv (operands[1], V8QImode, 0);
> > > })
> > >
> >
> > PING:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01665.html
> >
> > This patch blocks an i386 backend patch.
> >
>
> PING.
>

PING.

-- 
H.J.


Re: [PATCH 3/6] [RS6000] Replace TLSmode with P, and correct tls call mems

2018-11-20 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 11:20:08PM +1030, Alan Modra wrote:
> There is really no need to define a TLSmode mode iterator that is
> identical (since !TARGET_64BIT == TARGET_32BIT) to the much used P
> mode iterator.  It's nonsense to think we might ever want to support
> 32-bit TLS on 64-bit or vice versa!  The patch also fixes a minor
> error in the call mems.  All other direct calls use (call (mem:SI ..)).
> 
>   * config/rs6000/rs6000.md (TLSmode): Delete mode iterator.  Replace
>   with P throughout except for call mems which should use SI.
>   (tls_abi_suffix, tls_sysv_suffix, tls_insn_suffix): Delete mode
>   iterators.  Replace with bits, mode and ptrload respectively.

(mode attributes, not iterators, in that last line)

> +(define_insn_and_split "tls_gd_aix"

Instead of  you can just say , because P is the only (mode)
iterator used here.  Similar throughout the patch.

Okay for trunk (with that change if you can easily make it).  Thanks!


Segher


PING: V2 [PATCH] i386: Remove duplicated AVX2/AVX512 vec_dup patterns

2018-11-20 Thread H.J. Lu
On Mon, Nov 5, 2018 at 2:02 PM H.J. Lu  wrote:
>
> Hi Richard, Jakub,
>
> Can you take a look at this patch?  The last review from Kirill was in
> June.
>
> Thanks.
>
>
> H.J.
> --
> There are many duplicated AVX2/AVX512 vec_dup patterns like:
>
> (define_insn "avx2_vec_dup"
>   [(set (match_operand:VF1_128_256 0 "register_operand" "=v")
> (vec_duplicate:VF1_128_256
>   (vec_select:SF
> (match_operand:V4SF 1 "register_operand" "v")
> (parallel [(const_int 0)]]
>   "TARGET_AVX2"
>   "vbroadcastss\t{%1, %0|%0, %1}"
>   [(set_attr "type" "sselog1")
> (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>
> and
>
> (define_insn "vec_dup"
>   [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,x,x,v,x")
> (vec_duplicate:AVX_VEC_DUP_MODE
>   (match_operand: 1 "nonimmediate_operand" 
> "m,m,x,v,?x")))]
>   "TARGET_AVX"
>   "@
>vbroadcast\t{%1, %0|%0, %1}
>vbroadcast\t{%1, %0|%0, %1}
>vbroadcast\t{%x1, %0|%0, %x1}
>vbroadcast\t{%x1, %g0|%g0, %x1}
>#"
>   [(set_attr "type" "ssemov")
>(set_attr "prefix_extra" "1")
>(set_attr "prefix" "maybe_evex")
>(set_attr "isa" "avx2,noavx2,avx2,avx512f,noavx2")
>(set_attr "mode" ",V8SF,,,V8SF")])
>
> We can remove the duplicated AVX2/AVX512 vec_dup patterns and use the
> normal AVX2/AVX512 vec_dup patterns instead by changing source operand
> to subreg of the same register class of the base by generating
>
> (set (reg:V8SF 84)
>  (vec_duplicate:V8SF (subreg:SF (reg:V4SF 85) 0)))
>
> instead of
>
> (set (reg:V8SF 84)
>   (vec_duplicate:V8SF
> (vec_select:SF (reg:V4SF 85)
>   (parallel [(const_int 0 [0])]
>
> For integer vector broadcast, we generate
>
> (set (reg:V32QI 86)
>  (vec_duplicate:V32QI
> (vec_select:QI (subreg:V16QI (reg:V32QI 87) 0))
>   (parallel [(const_int 0 [0])]
>
> instead of
>
> (set (reg:V32QI 86)
>  (vec_duplicate:V32QI
> (vec_select:QI (reg:V32QI 87)
>   (parallel [(const_int 0 [0])]
>
> so that we can remove
>
> (define_insn "avx2_pbroadcast_1"
>   [(set (match_operand:VI_256 0 "register_operand" "=x,x,v,v")
> (vec_duplicate:VI_256
>   (vec_select:
> (match_operand:VI_256 1 "nonimmediate_operand" "m,x,m,v")
> (parallel [(const_int 0)]]
>   "TARGET_AVX2"
>   "@
>vpbroadcast\t{%1, %0|%0, %1}
>vpbroadcast\t{%x1, %0|%0, %x1}
>vpbroadcast\t{%1, %0|%0, %1}
>vpbroadcast\t{%x1, %0|%0, %x1}"
>   [(set_attr "isa" "*,*,,")
>(set_attr "type" "ssemov")
>(set_attr "prefix_extra" "1")
>(set_attr "prefix" "vex")
>(set_attr "mode" "")])
>
> and keep only
>
> (define_insn "avx2_pbroadcast"
>   [(set (match_operand:VI 0 "register_operand" "=x,v")
> (vec_duplicate:VI
>   (vec_select:
> (match_operand: 1 "nonimmediate_operand" "xm,vm")
> (parallel [(const_int 0)]]
>   "TARGET_AVX2"
>   "vpbroadcast\t{%1, %0|%0, %1}"
>   [(set_attr "isa" "*,")
>(set_attr "type" "ssemov")
>(set_attr "prefix_extra" "1")
>(set_attr "prefix" "vex,evex")
>(set_attr "mode" "")])
>
> gcc.target/i386/avx2-vbroadcastss_ps256-1.c is changed by
>
>  avx2_test:
> .cfi_startproc
> -   vmovaps x(%rip), %xmm1
> -   vbroadcastss%xmm1, %ymm0
> +   vbroadcastssx(%rip), %ymm0
> vmovaps %ymm0, y(%rip)
> vzeroupper
> ret
> .cfi_endproc
>
> gcc.target/i386/avx512vl-vbroadcast-3.c is changed by
>
> @@ -113,7 +113,7 @@ f10:
> .cfi_startproc
> vmovaps %ymm0, %ymm16
> vpermilps   $85, %ymm16, %ymm16
> -   vbroadcastss%xmm16, %ymm16
> +   vshuff32x4  $0x0, %ymm16, %ymm16, %ymm16
> vzeroupper
> ret
> .cfi_endproc
> @@ -153,8 +153,7 @@ f12:
>  f13:
>  .LFB12:
> .cfi_startproc
> -   vmovaps (%rdi), %ymm16
> -   vbroadcastss%xmm16, %ymm16
> +   vbroadcastss(%rdi), %ymm16
> vzeroupper
> ret
> .cfi_endproc
>
> gcc/
>
> * config/i386/i386-builtin.def: Replace CODE_FOR_avx2_vec_dupv4sf,
> CODE_FOR_avx2_vec_dupv8sf and CODE_FOR_avx2_vec_dupv4df with
> CODE_FOR_vec_dupv4sf, CODE_FOR_vec_dupv8sf and
> CODE_FOR_vec_dupv4df, respectively.
> * config/i386/i386.c (expand_vec_perm_1): Use subreg with vec_dup.
> * config/i386/i386.md (SF to DF splitter): Replace
> gen_avx512f_vec_dupv16sf_1 with gen_avx512f_vec_dupv16sf.
> * config/i386/sse.md (VF48_AVX512VL): New.
> (avx2_vec_dup): Removed.
> (avx2_vec_dupv8sf_1): Likewise.
> (avx512f_vec_dup_1): Likewise.
> (avx2_pbroadcast_1): Likewise.
> (avx2_vec_dupv4df): Likewise.
> (_vec_dup_1): Likewise.
> (*avx_vperm_broadcast_): Replace gen_avx2_vec_dupv8sf with
> gen_vec_dupv8sf.
>
> gcc/testsuite/
>
> * gcc.target/i386/avx2-vbroadcastss_ps256-1.c: Updated.

Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL

2018-11-20 Thread Pat Haugen
On 11/20/18 10:53 AM, Kyrill Tkachov wrote:
> On 20/11/18 16:48, Pat Haugen wrote:
>> On 11/19/18 2:30 PM, Pat Haugen wrote:
 This is a follow-up from 
 https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
 This version introduces an "artificial" property of the dependencies 
 produced in
 sched-deps.c that is recorded when they are created due to 
 MAX_PENDING_LIST_LENGTH
 and they are thus ignored in the model_analyze_insns ALAP calculation.

 This approach gives most of the benefits of the original patch [1] on 
 aarch64.
 I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on 
 powerpc64le-unknown-linux-gnu
 with -O3 and found that the initial version proposed did indeed increase 
 the instruction count
 and stack space. This version gives a small improvement on powerpc in 
 terms of instruction count
 (number of st* instructions stays the same), so I'm hoping this version 
 addresses Pat's concerns.
 Pat, could you please try this version out if you've got the chance?

>>> I tried the new version on cactusADM, it's showing a 2% degradation. I've 
>>> kicked off a full CPU2006 run just to see if any others are affected.
>> The other benchmarks were neutral. So the only benchmark showing a change is 
>> the 2% degradation on cactusADM. Comparing the generated .s files for 
>> bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns 
>> and still the 1% increase in store insns.
> 
> Sigh :(
> What options are you compiling with? I tried a powerpc64le compiler with 
> plain -O3 and saw a slight improvement (by manual inspection)

I was using the following: -O3 -mcpu=power8 -fpeel-loops -funroll-loops 
-ffast-math -mpopcntd -mrecip=all. When I run with just -O3 -mcpu=power8 I see 
just under a 1% degradation.

-Pat



[PATCH v3] [aarch64] Add CPU support for Ampere Computing's eMAG.

2018-11-20 Thread Christoph Muellner
Tested with "make check" and no regressions found.

This patch depends on the struct xgene1_prefetch_tune,
which has been acknowledged already:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00985.html

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner 

* config/aarch64/aarch64-cores.def: Define emag.
* config/aarch64/aarch64-tune.md: Regenerated with emag.
* config/aarch64/aarch64.c (emag_tunings): New struct.
* doc/invoke.texi: Document mtune value.

Signed-off-by: Christoph Muellner 
---
 gcc/config/aarch64/aarch64-cores.def |  3 +++
 gcc/config/aarch64/aarch64-tune.md   |  2 +-
 gcc/config/aarch64/aarch64.c | 25 +
 gcc/doc/invoke.texi  |  2 +-
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 1f3ac56..68cca00 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -61,6 +61,9 @@ AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
 AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a2, -1)
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a3, -1)
 
+/* Ampere Computing cores. */
+AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, emag, 0x50, 0x000, 3)
+
 /* APM ('P') cores. */
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index fade1d4..2fc7f03 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,tsv110,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f7f88a9..995aafe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -957,6 +957,31 @@ static const struct tune_params xgene1_tunings =
   &xgene1_prefetch_tune
 };
 
+static const struct tune_params emag_tunings =
+{
+  &xgene1_extra_costs,
+  &xgene1_addrcost_table,
+  &xgene1_regmove_cost,
+  &xgene1_vector_cost,
+  &generic_branch_cost,
+  &xgene1_approx_modes,
+  6, /* memmov_cost  */
+  4, /* issue_rate  */
+  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  "16",/* function_align.  */
+  "16",/* jump_align.  */
+  "16",/* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  17,  /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
+  &xgene1_prefetch_tune
+};
+
 static const struct tune_params qdf24xx_tunings =
 {
   &qdf24xx_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e016dce..ac81fb2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15288,7 +15288,7 @@ Specify the name of the target processor for which GCC 
should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{falkor},
+@samp{cortex-a76}, @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
 @samp{qdf24xx}, @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
 @samp{thunderx}, @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
 @samp{tsv110}, @samp{thunderxt83}, @samp{thunderx2t99},
-- 
2.9.5



[PATCH] clarify comments for implicit_p flag for built-ins

2018-11-20 Thread Martin Sebor

Would the updated comments in the attached patch more accurately
describe the purpose of the IMPLICIT_P flag and
the builtin_decl_explicit() and builtin_decl_implicit() functions?

I ended up here while trying to understand the differences between
the functions on different targets and decide which one should be
used to diagnose bugs like the one below:

  long double fabsl ();   // missing prototype

  long double f (int x)
  {
return fabsl (x); // want a warning here
  }

I think we want the warning regardless of IMPLICIT_P so the warning
code should call builtin_decl_explicit() to obtain fabsl's expected
type, even if the target's runtime doesn't support the function on
the basis of the comment:

  When a program uses floorf we may assume that the floorf function
  has the expected meaning
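
To make that concrete, here is a minimal sketch of the kind of check I
have in mind, assuming the usual C front-end context (tree.h, c-tree.h,
diagnostic-core.h); the helper name and the exact diagnostic wording are
made up for illustration and are not the code in the patch:

  /* Sketch: FNDECL is already known to be a normal built-in declared
     without a prototype; ARG is one of the call's arguments.  Look up
     the type GCC records for the built-in via builtin_decl_explicit so
     the check also fires when the target runtime doesn't provide the
     function, and warn if the argument type doesn't match.  */

  static void
  check_unprototyped_builtin_arg (location_t loc, tree fndecl, tree arg)
  {
    tree builtin = builtin_decl_explicit (DECL_FUNCTION_CODE (fndecl));
    if (!builtin)
      return;

    tree parms = TYPE_ARG_TYPES (TREE_TYPE (builtin));
    if (!parms)
      return;

    tree expected = TREE_VALUE (parms);
    if (!comptypes (expected, TREE_TYPE (arg)))
      warning_at (loc, OPT_Wbuiltin_declaration_mismatch,
                  "argument of type %qT does not match built-in "
                  "parameter of type %qT", TREE_TYPE (arg), expected);
  }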

Thanks
Martin
gcc/ChangeLog:

	* builtins.def (DEF_BUILTIN): Update comment.
	* tree.h (builtin_decl_explicit): Same.
	(builtin_decl_implicit): Same.

Index: gcc/tree.h
===
--- gcc/tree.h	(revision 266320)
+++ gcc/tree.h	(working copy)
@@ -5220,7 +5220,9 @@ is_lang_specific (const_tree t)
 #define BUILTIN_VALID_P(FNCODE) \
   (IN_RANGE ((int)FNCODE, ((int)BUILT_IN_NONE) + 1, ((int) END_BUILTINS) - 1))
 
-/* Return the tree node for an explicit standard builtin function or NULL.  */
+/* Return the tree node for the built-in function declaration corresponding
+   to FNCODE or NULL.  */
+
 static inline tree
 builtin_decl_explicit (enum built_in_function fncode)
 {
@@ -5229,7 +5231,12 @@ builtin_decl_explicit (enum built_in_function fnco
   return builtin_info[(size_t)fncode].decl;
 }
 
-/* Return the tree node for an implicit builtin function or NULL.  */
+/* Return the tree node for the built-in function declaration corresponding
+   to FNCODE if its IMPLICIT_P flag has been set or NULL otherwise.
+   IMPLICIT_P is clear for library built-ins that GCC implements but that
+   may not be implemented in the runtime library on the target.  See also
+   the DEF_BUILTIN macro in builtins.def.  */
+
 static inline tree
 builtin_decl_implicit (enum built_in_function fncode)
 {
Index: gcc/builtins.def
===
--- gcc/builtins.def	(revision 266320)
+++ gcc/builtins.def	(working copy)
@@ -54,12 +54,13 @@ along with GCC; see the file COPYING3.  If not see
ATTRs is an attribute list as defined in builtin-attrs.def that
describes the attributes of this builtin function.
 
-   IMPLICIT specifies condition when the builtin can be produced by
-   compiler.  For instance C90 reserves floorf function, but does not
-   define it's meaning.  When user uses floorf we may assume that the
-   floorf has the meaning we expect, but we can't produce floorf by
-   simplifying floor((double)float) since the runtime need not implement
-   it.
+   IMPLICIT specifies the condition when calls to the builtin can be
+   introduced by GCC.  For instance C90 reserves the floorf function,
+   but does not define its meaning.  When a program uses floorf we
+   may assume that the floorf function has the expected meaning and
+   signature, but we may not transform the call floor((double)flt)
+   into a call to floorf(flt) since the runtime need not implement
+   the latter.
 
The builtins is registered only if COND is true.  */
 


Re: [PATCH][RFC] Extend locations where to seach for Fortran pre-include.

2018-11-20 Thread Joseph Myers
On Tue, 20 Nov 2018, Jakub Jelinek wrote:

> hardcoding /usr/include looks just very wrong here.  That should always be
> dependent on the configured prefix or better be relative from the driver,
> gcc should be relocatable.  Or at least come from configure.  It should e.g.
> honor the sysroot stuff etc.
> 
> That said, I think you need somebody familiar with the driver, perhaps
> Joseph?

I'd sort of expect structures like those in cppdefault.[ch] to describe 
the relevant Fortran directories and their properties (such as being 
sysrooted or not - and if sysrooted, I suppose you'll want to make sure 
SYSROOT_HEADERS_SUFFIX_SPEC is properly applied).
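
For illustration only, such a table entry might look something like the
following, modelled on struct default_include in cppdefault.h; the
Fortran-specific array, the FORTRAN_PREINCLUDE_DIR macro and the chosen
field values here are all hypothetical, not a proposal:

  /* Hypothetical stand-in for a configure-provided directory.  */
  #define FORTRAN_PREINCLUDE_DIR "finclude"

  /* Same shape as cppdefault.h's default_include entries.  */
  struct default_include
  {
    const char *const fname;     /* Directory name.  */
    const char *const component; /* Component containing the directory.  */
    const char cplusplus;        /* Only relevant when compiling C++.  */
    const char cxx_aware;        /* No extern "C" wrapping needed.  */
    const char add_sysroot;      /* Prefix fname with the sysroot.  */
    const char multilib;         /* 1: append the -imultilib path,
                                    2: append the -imultiarch path.  */
  };

  /* Sysrooted and multiarch-suffixed, per the points above.  */
  static const struct default_include fortran_preinclude_dirs[] =
  {
    { FORTRAN_PREINCLUDE_DIR, "FORTRAN", 0, 0, 1, 2 },
    { 0, 0, 0, 0, 0, 0 }
  };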

If this preinclude doesn't pass through the C preprocessor, directories in 
which it is searched for will need multilib or multiarch suffixes.  
(Multilib suffixes on include directories for C are more or less an 
implementation detail of how fixed headers are arranged in the case where 
sysroot headers suffixes are used; they aren't really expected to be a 
stable interface such that third-party software might install anything 
using them, but I'm not sure if this preinclude is meant to come from 
external software or be installed by GCC.  Multiarch suffixes, for systems 
using Debian/Ubuntu-style multiarch directory arrangements, *are* intended 
as a stable interface.  And multilib *OS* suffixes 
(-print-multi-os-directory) are a stable interface, but only really 
suitable for libraries, not headers, because they are paths relative to 
lib/ such as ../lib64.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-20 Thread Andi Kleen
On Tue, Nov 20, 2018 at 01:04:19PM +0100, Richard Biener wrote:
> Since your builtin clobbers memory

Hmm, maybe we could get rid of that, but then how to avoid
the optimizer moving it around over function calls etc.?
The instrumentation should still be useful when the program
crashes, so we don't want to delay logging too much.

> Maybe even instead pass it a number of bytes so it models how atomics work.

How could that reject float?

Mode seems better for now.

Eventually might support float/double through memory, but not in the
first version.


> >NEXT_PASS (pass_tsan_O0);
> >NEXT_PASS (pass_sanopt);
> > +  NEXT_PASS (pass_vartrace);
> 
> I'd move it one lower, after pass_cleanup_eh.  Further enhancement
> would make it a
> RTL pass ...

It's after pass_nrv now.

> So in reality the restriction is on the size of the object, correct?

The instruction accepts 32 or 64bit memory or register.

In principle everything could be logged through this, but I was trying
to limit the cases to integers and pointers for now to simplify
the problem.

Right now the backend fails when something other than 4 or 8 bytes
is passed.
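
To make the intended effect concrete, here is roughly what the
instrumentation amounts to at the source level for a traced 32-bit
store (illustration only: it assumes the _ptwrite32 intrinsic from
immintrin.h and -mptwrite, and the pass itself of course works on
GIMPLE rather than on source code):

#include <immintrin.h>

int traced;

void
store_traced (int val)
{
  traced = val;
  /* Log the stored value into the Processor Trace stream.  */
  _ptwrite32 ((unsigned int) val);
}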

> 
> > +{
> > +  if (!supported_type (t))
> 
> You handle some nested cases below via recursion,
> like a.b.c (but not a[i][j]).  But then this check will
> fire.  I think it would be better to restructure the
> function to look at the outermost level for whether
> the op is of supported type, thus we can log it
> at all and then get all the way down to the base via
> sth like
> 
>   if (!supported_type (t))
> return false;
>   enum attrstate s = ;
>   do
> {
>s = supported_op (t, s);
>if (s == force_off)
>  return false;
> }
>   while (handled_component_p (t) && (t = TREE_OPERAND (t, 0)))
> 
> Now t is either an SSA_NAME, a DECL (you fail to handle PARM_DECL

Incoming arguments and returns are handled separately.

> and RESULT_DECL below) or a [TARGET_]MEM_REF.  To get rid
> of non-pointer indirections do then
> 
>   t = get_base_address (t);
>   if (DECL_P (t) && is_local (t))
>   
> 
> because...
> 
> > +return false;
> > +
> > +  enum attrstate s = supported_op (t, neutral);
> > +  if (s == force_off)
> > +return false;
> > +  if (s == force_on)
> > +force = true;
> > +
> > +  switch (TREE_CODE (t))
> > +{
> > +case VAR_DECL:
> > +  if (DECL_ARTIFICIAL (t))
> > +   return false;
> > +  if (is_local (t))
> > +   return true;
> > +  return s == force_on || force;
> > +
> > +case ARRAY_REF:
> > +  t = TREE_OPERAND (t, 0);
> > +  s = supported_op (t, s);
> > +  if (s == force_off)
> > +   return false;
> > +  return supported_type (TREE_TYPE (t));
> 
> Your supported_type is said to take a DECL.  And you
> already asked for this type - it's the type of the original t
> (well, the type of this type given TREE_TYPE (t) is an array type).
> But you'll reject a[i][j] where the type of this type is an array type as 
> well.

Just to be clear, after your changes above I only need
to handle VAR_DECL and SSA_NAME here then, correct?

So one of the reasons I handled ARRAY_REF this way is to
trace the index as a local if needed.  If I can assume it was always in
a MEM with its own ASSIGN earlier whenever the local is user visible,
that wouldn't be needed (and neither would some other similar code
elsewhere).

But when I look at a simple test case like vartrace-6 

void
f (void)
{
  int i;
  for (i = 0; i < 10; i++)
   f2 ();
}

i appears to be an SSA name only that is referenced everywhere without
a MEM.  And if the user wants to track the value of i I would need
to explicitly handle all these cases.  Am I missing something here?

I'm starting to think I should perhaps drop locals support to simplify
everything? But that might limit usability for debugging somewhat.


> gsi_insert_* does update_stmt already.  Btw, if you allow any
> SImode or DImode size value you can use a VIEW_CONVERT_EXPR

Just add them unconditionally? 

> > +bool
> > +instrument_args (function *fun)
> > +{
> > +  gimple_stmt_iterator gi;
> > +  bool changed = false;
> > +
> > +  /* Local tracing usually takes care of the argument too, when
> > + they are read. This avoids redundant trace instructions.  */
> 
> But only when instrumenting reads?

Yes will add the check.

> 
> Hmm, but then this requires the target instruction to have a memory operand?

Yes that's right for now. Eventually it will be fixed and x86 would
benefit too.

> That's going to be unlikely for RISCy cases?  On x86 does it work if
> combine later does not synthesize a ptwrite with memory operand?
> I also wonder how this survives RTL CSE since you are basically doing
> 
>   mem = val;  // orig stmt
>   val' = mem;
>   ptwrite (val');
> 
> that probably means when CSE removes the load there ends up a debug-insn
> reflecting what you want?

I'll check.

> > +  /* Handle operators in case they read locals.  */
> 
> Does it make sense at all to instrument SSA "rea

Re: [PATCH v2 1/3] Allow memory operands for PTWRITE

2018-11-20 Thread Andi Kleen
On Tue, Nov 20, 2018 at 11:53:15AM +0100, Richard Biener wrote:
> On Fri, Nov 16, 2018 at 8:07 AM Uros Bizjak  wrote:
> >
> > On Fri, Nov 16, 2018 at 4:57 AM Andi Kleen  wrote:
> > >
> > > From: Andi Kleen 
> > >
> > > The earlier PTWRITE builtin definition was unnecessarily restrictive,
> > > only allowing register input to PTWRITE. The instruction actually
> > > supports memory operands too, so allow that too.
> > >
> > > gcc/:
> > >
> > > 2018-11-15  Andi Kleen  
> > >
> > > * config/i386/i386.md: Allow memory operands to ptwrite.
> >
> > OK.
> 
> Btw, I wonder why the ptwrite builtin is in SPECIAL_ARGS2
> commented as /* Add all special builtins with variable number of operands. */?

I think I put it in the same place as a similar builtin.  AFAIK
those others don't have variable arguments either, so the comment
may be wrong?
> 
> On the GIMPLE level this builtin also has quite some (bad) effects on
> alias analysis and any related optimization (vectorization, etc.).  I'll have
> to see where the instrumenting pass now resides.

It's fairly late now.

Any suggestions for improvements? At some point I removed the edges
like the old MPX builtins to minimize memory usage, but that was
removed during an earlier review cycle.

-Andi


Re: [PATCH] RISC-V: Pass -mno-relax through to assembler if supported

2018-11-20 Thread Jim Wilson

On 11/18/18 7:28 AM, James Clarke wrote:

GCC will emit ".option (no)relax" in its outputted assembly, but when
using it as an assembler driver, such as for preprocessed assembly, it's
merely preprocessing and therefore will not generate this directive.
Therefore we should pass -mno-relax on to the assembler if specified
(and supported) as we do for other flags.


This looks good.  I will commit when your assignment comes through.

Jim



[PATCH] Fix missing dump_impl_location_t values, using a new dump_metadata_t

2018-11-20 Thread David Malcolm
The dump_* API attempts to capture emission location metadata for the
various dump messages, but looking in -fsave-optimization-record shows
that many dump messages are lacking useful impl_location values, instead
having this location within dumpfile.c:

"impl_location": {
"file": "../../src/gcc/dumpfile.c",
"function": "ensure_pending_optinfo",
"line": 1169
},

The problem is that the auto-capturing of dump_impl_location_t is tied to
dump_location_t, and this is tied to the dump_*_loc calls.  If a message
comes from a dump_* call without a "_loc" suffix (e.g. dump_printf), the
current code synthesizes the dump_location_t within
dump_context::ensure_pending_optinfo, and thus saves the useless
impl_location seen above.

This patch fixes things by changing the dump_* API so that, rather than
taking a dump_flags_t, they take a new class dump_metadata_t, which is
constructed from a dump_flags_t, but captures the emission location.

Hence e.g.:

  dump_printf (MSG_NOTE, "some message\n");

implicitly builds a dump_metadata_t wrapping the MSG_NOTE and the
emission location.  If there are several dump_printf calls without
a dump_*_loc call, the emission location within the optinfo is that
of the first dump call within it.
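
The mechanism, in sketch form, is that dump_metadata_t is implicitly
constructible from a dump_flags_t, with a second parameter whose default
argument is evaluated at the call site.  This is a simplified
illustration, not the exact declarations from the patch; the stand-in
typedef below replaces the real dump_flags_t from dumpfile.h:

  typedef unsigned long dump_flags_t;  /* stand-in for dumpfile.h's type */

  /* Sketch: records where in GCC's own sources a dump call was made,
     via default arguments evaluated at the caller.  */
  struct dump_impl_location_t
  {
    dump_impl_location_t (const char *file = __builtin_FILE (),
                          int line = __builtin_LINE (),
                          const char *function = __builtin_FUNCTION ())
      : m_file (file), m_line (line), m_function (function) {}
    const char *m_file;
    int m_line;
    const char *m_function;
  };

  /* Sketch: wraps the dump_flags_t and captures the impl location, so
     a plain dump_printf (MSG_NOTE, ...) call records its own location
     without any change at the call site.  */
  class dump_metadata_t
  {
  public:
    dump_metadata_t (dump_flags_t flags,
                     const dump_impl_location_t &impl_location
                       = dump_impl_location_t ())
      : m_flags (flags), m_impl_location (impl_location) {}

    dump_flags_t get_flags () const { return m_flags; }
    const dump_impl_location_t &get_impl_location () const
    { return m_impl_location; }

  private:
    dump_flags_t m_flags;
    dump_impl_location_t m_impl_location;
  };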

The patch updates selftest::test_capture_of_dump_calls to verify
that the impl location of various dump_* calls is captured.  I also
manually verified that the references to dumpfile.c in the saved
optimization records were fixed.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* dump-context.h (dump_context::dump_loc): Convert 1st param from
dump_flags_t to const dump_metadata_t &.  Convert 2nd param from
const dump_location_t & to const dump_user_location_t &.
(dump_context::dump_loc_immediate): Convert 2nd param from
const dump_location_t & to const dump_user_location_t &.
(dump_context::dump_gimple_stmt): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::void dump_gimple_stmt_loc): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_gimple_expr): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_gimple_expr_loc): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_generic_expr): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_generic_expr_loc): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_printf_va): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_printf_loc_va): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_dec): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_symtab_node): Likewise.
(dump_context::begin_scope): Split out 2nd param into
user and impl locations.
(dump_context::ensure_pending_optinfo): Add metadata param.
(dump_context::begin_next_optinfo): Replace dump_location_t param
with metadata and user location.
* dumpfile.c (dump_context::dump_loc): Convert 1st param from
dump_flags_t to const dump_metadata_t &.  Convert 2nd param from
const dump_location_t & to const dump_user_location_t &.
(dump_context::dump_loc_immediate): Convert 2nd param from
const dump_location_t & to const dump_user_location_t &.
(dump_context::dump_gimple_stmt): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::void dump_gimple_stmt_loc): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_gimple_expr): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_gimple_expr_loc): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_generic_expr): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_generic_expr_loc): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_printf_va): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
(dump_context::dump_printf_loc_va): Likewise; convert
2nd param from const dump_location_t & to
const dump_user_location_t &.
(dump_context::dump_dec): Convert 1st param from
dump_flags_t to const dump_metadata_t &.
 

[PATCH v2, target]: Fix PR 88070, ICE in create_pre_exit, at mode-switching.c:438

2018-11-20 Thread Uros Bizjak
Hello!

The attached patch is a different approach to the problem of split return
copies in create_pre_exit.  It turns out that for the vzeroupper insertion
pass we actually don't need to insert a mode switch before the return
copy; it is enough to split the edge to the exit block, so we can emit
vzeroupper at the function exit edge.

Since x86 is the only target that uses optimize mode switching after
reload, I took the liberty of using !reload_completed for the condition
when we don't need to search for the return copy, with the big comment
explaining this, as evident from the patch.

2018-11-20  Uros Bizjak  

PR target/88070
* mode-switching.c (create_pre_exit): After reload, always split the
fallthrough edge to the exit block.

testsuite/ChangeLog:

2018-11-20  Uros Bizjak  

PR target/88070
* gcc.target/i386/pr88070.c: New test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: mode-switching.c
===
--- mode-switching.c(revision 266278)
+++ mode-switching.c(working copy)
@@ -248,8 +248,22 @@ create_pre_exit (int n_entities, int *entity_map,
gcc_assert (!pre_exit);
/* If this function returns a value at the end, we have to
   insert the final mode switch before the return value copy
-  to its hard register.  */
-   if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
+  to its hard register.
+
+  x86 targets use mode-switching infrastructure to
+  conditionally insert vzeroupper instruction at the exit
+  from the function where there is no need to switch the
+  mode before the return value copy.  The vzeroupper insertion
+  pass runs after reload, so use !reload_completed as a stand-in
+  for x86 to skip the search for the return value copy insn.
+
+  N.b.: the code below assumes that the return copy insn
+  immediately precedes its corresponding use insn.  This
+  assumption does not hold after reload, since sched1 pass
+  can schedule the return copy insn away from its
+  corresponding use insn.  */
+   if (!reload_completed
+   && EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
&& NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
&& GET_CODE (PATTERN (last_insn)) == USE
&& GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
Index: testsuite/gcc.target/i386/pr88070.c
===
--- testsuite/gcc.target/i386/pr88070.c (nonexistent)
+++ testsuite/gcc.target/i386/pr88070.c (working copy)
@@ -0,0 +1,12 @@
+/* PR target/88070 */
+/* { dg-do compile } */
+/* { dg-options "-O -fexpensive-optimizations -fnon-call-exceptions 
-fschedule-insns -fno-dce -fno-dse -mavx" } */
+
+typedef float vfloat2 __attribute__ ((__vector_size__ (2 * sizeof (float;
+
+vfloat2
+test1float2 (float c)
+{
+  vfloat2 v = { c, c };
+  return v;
+}


Re: Fix PR rtl-optimization/85925

2018-11-20 Thread Segher Boessenkool
Hi Eric,

On Tue, Nov 20, 2018 at 10:05:21AM +0100, Eric Botcazou wrote:
> +/* Return true if X is an operation that always operates on the full
> +   registers for WORD_REGISTER_OPERATIONS architectures.  */
> +
> +inline bool
> +word_register_operation_p (const_rtx x)
> +{
> +  switch (GET_CODE (x))
> +{
> +case ROTATE:
> +case ROTATERT:
> +case SIGN_EXTRACT:
> +case ZERO_EXTRACT:
> +  return false;
> +
> +default:
> +  return true;
> +}
> +}

This is saying that *every* op except those very few works on the full
register.  And that for every architecture that has W_R_O.

It also only looks at the top code in the RTL, so it will say for example
a rotate-and-mask is just fine, while that isn't true.
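
Just to illustrate what looking below the top code would mean (a sketch
only, with a made-up function name, not a concrete request for this
patch), a deep variant could walk all subrtxes with rtl-iter.h's
FOR_EACH_SUBRTX:

  /* Sketch: reject X if any subexpression is one of the operations
     that doesn't operate on the full register.  */
  static bool
  word_register_operation_deep_p (const_rtx x)
  {
    subrtx_iterator::array_type array;
    FOR_EACH_SUBRTX (iter, array, x, NONCONST)
      switch (GET_CODE (*iter))
        {
        case ROTATE:
        case ROTATERT:
        case SIGN_EXTRACT:
        case ZERO_EXTRACT:
          return false;
        default:
          break;
        }
    return true;
  }

so that e.g. an AND of a ROTATE (a rotate-and-mask) is not treated as a
full-register operation just because the outer AND is.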


Segher


Re: [C++ PATCH] Fix ICE in adjust_temp_type (PR c++/87506)

2018-11-20 Thread Jakub Jelinek
On Sun, Nov 18, 2018 at 08:33:39PM -0500, Jason Merrill wrote:
> On Fri, Nov 16, 2018 at 4:26 PM Jakub Jelinek  wrote:
> > I admit this is just a shot in the dark, but I don't see why
> > one couldn't adjust a type of EMPTY_CLASS_EXPR to EMPTY_CLASS_EXPR
> > with a different variant of the same type.
> 
> Makes sense.
> 
> > Or, should I drop that
> >   && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE (temp))
> > part?  We don't really verify something similar for CONSTRUCTORs.
> 
> I suppose it makes sense to assert
> same_type_ignoring_top_level_qualifiers_p for both CONSTRUCTOR and
> EMPTY_CLASS_EXPR.  OK with that change.

Unfortunately that regresses
+FAIL: g++.dg/cpp2a/lambda-uneval8.C  -std=c++2a (internal compiler error)
+FAIL: g++.dg/cpp2a/lambda-uneval8.C  -std=c++2a (test for excess errors)
where one adjust_temp_type call is with
struct A<()>
as type and a different (including different TYPE_MAIN_VARIANT)
TREE_TYPE (temp), also with printable name
struct A<()>.

Shall I remove those asserts for now then, or hand it over to you (no idea
what's going on there)?

2018-11-19  Jakub Jelinek  

PR c++/87506
* constexpr.c (adjust_temp_type): Handle EMPTY_CLASS_EXPR.
Add assertions for the CONSTRUCTOR case.

* g++.dg/cpp0x/constexpr-87506.C: New test.

--- gcc/cp/constexpr.c.jj   2018-11-16 21:35:34.551110868 +0100
+++ gcc/cp/constexpr.c  2018-11-19 09:35:06.880386449 +0100
@@ -1280,7 +1280,17 @@ adjust_temp_type (tree type, tree temp)
 return temp;
   /* Avoid wrapping an aggregate value in a NOP_EXPR.  */
   if (TREE_CODE (temp) == CONSTRUCTOR)
-return build_constructor (type, CONSTRUCTOR_ELTS (temp));
+{
+  gcc_assert (same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (temp),
+type));
+  return build_constructor (type, CONSTRUCTOR_ELTS (temp));
+}
+  if (TREE_CODE (temp) == EMPTY_CLASS_EXPR)
+{
+  gcc_assert (same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (temp),
+type));
+  return build0 (EMPTY_CLASS_EXPR, type);
+}
   gcc_assert (scalarish_type_p (type));
   return cp_fold_convert (type, temp);
 }
--- gcc/testsuite/g++.dg/cpp0x/constexpr-87506.C.jj 2018-11-19 
09:33:07.795341369 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-87506.C2018-11-19 
09:33:07.795341369 +0100
@@ -0,0 +1,12 @@
+// PR c++/87506
+// { dg-do compile { target c++11 } }
+
+struct A {};
+struct B { constexpr B (const A) {} };
+struct C : B { using B::B; };
+
+void
+foo ()
+{
+  C c (A{});
+}

> 
> Jason

Jakub


Re: [PATCH] clarify comments for implicit_p flag for built-ins

2018-11-20 Thread Martin Sebor

On 11/20/2018 11:02 AM, Martin Sebor wrote:

Would the updated comments in the attached patch more accurately
describe the purpose of the IMPLICIT_P flag and
the builtin_decl_explicit() and builtin_decl_implicit() functions?

I ended up here while trying to understand the differences between
the functions on different targets and decide which one should be
used to diagnose bugs like the one below:

  long double fabsl ();   // missing prototype

  long double f (int x)
  {
return fabsl (x); // want a warning here
  }

I think we want the warning regardless of IMPLICIT_P so the warning
code should call builtin_decl_explicit() to obtain fabsl's expected
type, even if the target's runtime doesn't support the function on
the basis of the comment:

  When a program uses floorf we may assume that the floorf function
  has the expected meaning


Actually, some more testing suggests the comment in builtins.def
(either the original or the patched one) isn't entirely accurate
or helpful to understanding the purpose of the flag:

  IMPLICIT specifies condition when the builtin can be produced by
  compiler.  For instance C90 reserves floorf function, but does not
  define it's meaning.  When user uses floorf we may assume that the
  floorf has the meaning we expect, but we can't produce floorf by
  simplifying floor((double)float) since the runtime need not
  implement it.

Given the following code:

  float floorf ();

  void f (void)
  {
if (floorf (0.0f))
  __builtin_abort ();
  }

in C90 mode, BUILT_IN_FLOORF's IMPLICIT flag is clear and GCC
doesn't seem to assume anything about the call to the function,
contrary to the comment ("we may assume the meaning we expect").
The comment also doesn't explain when IMPLICIT may be set.

I've updated the comment a bit more to more accurately describe
when I think the flag is set or clear, and how it's used.
Corrections or further clarification are appreciated.

Thanks
Martin
gcc/ChangeLog:

	* builtins.def (DEF_BUILTIN): Update comment.
	* tree.h (builtin_decl_explicit): Same.
	(builtin_decl_implicit): Same.

Index: gcc/tree.h
===
--- gcc/tree.h	(revision 266320)
+++ gcc/tree.h	(working copy)
@@ -5220,7 +5220,9 @@ is_lang_specific (const_tree t)
 #define BUILTIN_VALID_P(FNCODE) \
   (IN_RANGE ((int)FNCODE, ((int)BUILT_IN_NONE) + 1, ((int) END_BUILTINS) - 1))
 
-/* Return the tree node for an explicit standard builtin function or NULL.  */
+/* Return the tree node for the built-in function declaration corresponding
+   to FNCODE or NULL.  */
+
 static inline tree
 builtin_decl_explicit (enum built_in_function fncode)
 {
@@ -5229,7 +5231,12 @@ builtin_decl_explicit (enum built_in_function fnco
   return builtin_info[(size_t)fncode].decl;
 }
 
-/* Return the tree node for an implicit builtin function or NULL.  */
+/* Return the tree node for the built-in function declaration corresponding
+   to FNCODE if its IMPLICIT_P flag has been set or NULL otherwise.
+   IMPLICIT_P is clear for library built-ins that GCC implements but that
+   may not be implemented in the runtime library on the target.  See also
+   the DEF_BUILTIN macro in builtins.def.  */
+
 static inline tree
 builtin_decl_implicit (enum built_in_function fncode)
 {
Index: gcc/builtins.def
===
--- gcc/builtins.def	(revision 266320)
+++ gcc/builtins.def	(working copy)
@@ -54,12 +54,18 @@ along with GCC; see the file COPYING3.  If not see
ATTRs is an attribute list as defined in builtin-attrs.def that
describes the attributes of this builtin function.
 
-   IMPLICIT specifies condition when the builtin can be produced by
-   compiler.  For instance C90 reserves floorf function, but does not
-   define it's meaning.  When user uses floorf we may assume that the
-   floorf has the meaning we expect, but we can't produce floorf by
-   simplifying floor((double)float) since the runtime need not implement
-   it.
+   IMPLICIT specifies the condition when calls to a library builtin
+   may be introduced by GCC.  For instance, C90 defines the floor
+   function but only reserves floorf without defining its meaning.
+   Thus, IMPLICIT is set for floor but clear for floorf.  GCC can
+   safely substitute calls to floor for equivalent expressions but
+   the most it can do for floorf is assume that explicit calls to
+   it in a program are those to the reserved function.  It cannot
+   introduce calls to the function that do not exist in the source
+   code of the program.  This prevents transformations that might be
+   possible otherwise, such as turning the call floor((double)flt)
+   into one to floorf(flt) because the runtime library can be assumed
+   to implement the latter function.
 
The builtins is registered only if COND is true.  */
 


[PATCH] Fix up method-nonnull-1.mm testcase on Solaris (PR testsuite/88090)

2018-11-20 Thread Jakub Jelinek
Hi!

The following testcase fails on Solaris, because the diagnostic there
doesn't print 'size_t' but 'std::size_t', as the type is defined by
system headers and it is not under gcc's control how exactly size_t is
defined.

The following patch fixes that by using a different typedef which we have
total control over.

Tested on x86_64-linux and i686-linux, ok for trunk?

2018-11-20  Jakub Jelinek  

PR testsuite/88090
* obj-c++.dg/attributes/method-nonnull-1.mm (my_size_t): New typedef.
(MyArray::removeObjectAtIndex): Use my_size_t instead of size_t and
expect it in diagnostics.

--- gcc/testsuite/obj-c++.dg/attributes/method-nonnull-1.mm.jj  2018-11-16 
10:22:17.817272221 +0100
+++ gcc/testsuite/obj-c++.dg/attributes/method-nonnull-1.mm 2018-11-20 
09:10:28.404872788 +0100
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+typedef __SIZE_TYPE__ my_size_t;
+
 @interface MyArray
 {
   Class isa;
@@ -25,8 +27,8 @@
 + (void) removeObject: (id)object __attribute__ ((nonnull (2))); /* { 
dg-warning "exceeds the number of function parameters 3" } */
 - (void) removeObject: (id)object __attribute__ ((nonnull (2))); /* { 
dg-warning "exceeds the number of function parameters 3" } */
 
-+ (void) removeObjectAtIndex: (size_t)object __attribute__ ((nonnull (1))); /* 
{ dg-warning "refers to parameter type .size_t." } */
-- (void) removeObjectAtIndex: (size_t)object __attribute__ ((nonnull (1))); /* 
{ dg-warning "refers to parameter type .size_t." } */
++ (void) removeObjectAtIndex: (my_size_t)object __attribute__ ((nonnull (1))); 
/* { dg-warning "refers to parameter type .my_size_t." } */
+- (void) removeObjectAtIndex: (my_size_t)object __attribute__ ((nonnull (1))); 
/* { dg-warning "refers to parameter type .my_size_t." } */
 
 + (void) removeObject: (id)object __attribute__ ((nonnull (MyArray))); /* { 
dg-error "" } */
   /* { dg-warning "attribute argument is invalid" "" { target *-*-* } .-1 } */

Jakub


[committed] Fix omp simd clone creation for multiple return stmts (PR tree-optimization/87895)

2018-11-20 Thread Jakub Jelinek
Hi!

In certain cases, like the testcases below, there are multiple return stmts
in the function for which we create simd clones, and the simd clone adjusting
code wasn't handling that case properly: some bbs could end up with
non-fallthru edges to the increment bb even without a gimple_goto at the
end, and others could be missed when redirecting edges to incr_bb.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk.

2018-11-20  Jakub Jelinek  

PR tree-optimization/87895
* omp-simd-clone.c (ipa_simd_modify_function_body): When removing
or replacing GIMPLE_RETURN, set EDGE_FALLTHRU on the edge to EXIT.
(simd_clone_adjust): Don't set EDGE_FALLTHRU here. In a loop that
redirects edges to EXIT to edges to incr_bb, iterate while EXIT
has any preds and always use EDGE_PRED (, 0).

* gcc.dg/gomp/pr87895-1.c: New test.
* gcc.dg/gomp/pr87895-2.c: New test.
* gcc.dg/gomp/pr87895-3.c: New test.

--- gcc/omp-simd-clone.c.jj 2018-11-14 01:01:56.758459348 +0100
+++ gcc/omp-simd-clone.c2018-11-20 13:57:53.902488981 +0100
@@ -994,6 +994,8 @@ ipa_simd_modify_function_body (struct cg
  if (greturn *return_stmt = dyn_cast  (stmt))
{
  tree retval = gimple_return_retval (return_stmt);
+ edge e = find_edge (bb, EXIT_BLOCK_PTR_FOR_FN (cfun));
+ e->flags |= EDGE_FALLTHRU;
  if (!retval)
{
  gsi_remove (&gsi, true);
@@ -1150,14 +1152,9 @@ simd_clone_adjust (struct cgraph_node *n
   incr_bb = create_empty_bb (orig_exit);
   incr_bb->count = profile_count::zero ();
   add_bb_to_loop (incr_bb, body_bb->loop_father);
-  /* The succ of orig_exit was EXIT_BLOCK_PTR_FOR_FN (cfun), with an empty
-flag.  Set it now to be a FALLTHRU_EDGE.  */
-  gcc_assert (EDGE_COUNT (orig_exit->succs) == 1);
-  EDGE_SUCC (orig_exit, 0)->flags |= EDGE_FALLTHRU;
-  for (unsigned i = 0;
-  i < EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds); ++i)
+  while (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds))
{
- edge e = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), i);
+ edge e = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
  redirect_edge_succ (e, incr_bb);
  incr_bb->count += e->count ();
}
--- gcc/testsuite/gcc.dg/gomp/pr87895-1.c.jj2018-11-20 13:18:00.483355400 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr87895-1.c   2018-11-20 13:18:28.792884133 
+0100
@@ -0,0 +1,19 @@
+/* PR tree-optimization/87895 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O0" } */
+
+#pragma omp declare simd
+int
+foo (int x)
+{
+  if (x == 0)
+return 0;
+}
+
+#pragma omp declare simd
+int
+bar (int *x, int y)
+{
+  if ((y == 0) ? (*x = 0) : *x)
+return 0;
+}
--- gcc/testsuite/gcc.dg/gomp/pr87895-2.c.jj2018-11-20 13:18:07.780233931 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr87895-2.c   2018-11-20 13:18:57.265410143 
+0100
@@ -0,0 +1,5 @@
+/* PR tree-optimization/87895 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O1" } */
+
+#include "pr87895-1.c"
--- gcc/testsuite/gcc.dg/gomp/pr87895-3.c.jj2018-11-20 14:06:23.131004074 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr87895-3.c   2018-11-20 14:06:03.697327933 
+0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/87895 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+
+#pragma omp declare simd
+int foo (int x) __attribute__((noreturn));
+
+#pragma omp declare simd
+int
+bar (int x, int y)
+{
+  if (y == 1)
+foo (x + 2);
+  if (y == 10)
+foo (x + 6);
+  if (y != 25)
+return 4;
+}

Jakub


[C++ PATCH] Fix ICE in constexpr OBJ_TYPE_REF handling (PR c++/88110)

2018-11-20 Thread Jakub Jelinek
Hi!

The comment in the OBJ_TYPE_REF handling code correctly says that we are
looking for x.D.2103.D.2094, but it is important that x is not an
INDIRECT_REF or something similar, as in the following testcase; we can't
really devirtualize in that case because we don't know what it points
to.  The following patch ensures that the argument was evaluated to the
address of some field of (ultimately) a decl, which is all we should get
during valid constexpr evaluation.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-11-20  Jakub Jelinek  

PR c++/88110
* constexpr.c (cxx_eval_constant_expression) : Punt
if get_base_address of ADDR_EXPR operand is not a DECL_P.

* g++.dg/cpp2a/constexpr-virtual13.C: New test.

--- gcc/cp/constexpr.c.jj   2018-11-19 14:24:49.0 +0100
+++ gcc/cp/constexpr.c  2018-11-20 15:03:26.968152935 +0100
@@ -4815,7 +4815,8 @@ cxx_eval_constant_expression (const cons
obj = cxx_eval_constant_expression (ctx, obj, lval, non_constant_p,
overflow_p);
/* We expect something in the form of &x.D.2103.D.2094; get x. */
-   if (TREE_CODE (obj) != ADDR_EXPR)
+   if (TREE_CODE (obj) != ADDR_EXPR
+   || !DECL_P (get_base_address (TREE_OPERAND (obj, 0
  {
if (!ctx->quiet)
  error_at (cp_expr_loc_or_loc (t, input_location),
--- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual13.C.jj 2018-11-20 
15:07:17.558386765 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual13.C2018-11-20 
15:05:30.188140420 +0100
@@ -0,0 +1,20 @@
+// PR c++/88110
+// { dg-do compile }
+
+struct A {
+  virtual int foo () const = 0;
+};
+struct B {
+  virtual int bar () const = 0;
+  virtual int baz () const = 0;
+};
+struct C : public A { };
+struct D : public C { };
+struct E : public D, public B { };
+
+void
+qux (const E *x)
+{
+  if (x->baz ())
+;
+}

Jakub


[RFC C++ PATCH] Improve locations of id-expressions and operator "" (PR c++/87386)

2018-11-20 Thread Jakub Jelinek
Hi!

This PR is complaining about range covering the first token from
an id-expression:
pr87386.C:4:15: error: static assertion failed: foo
4 | static_assert(foo::test::value, "foo");
  |   ^~~
The following patch adjusts that to:
pr87386.C:4:31: error: static assertion failed: foo
4 | static_assert(foo::test::value, "foo");
  |   ^
instead, though as the changes to the testsuite show, I'm not really sure
if it is a good idea in all cases, because then we sometimes print:
... bar is not in foo namespace, did you mean 'baz'
  foo::bar
  ~^~~
  baz
where the baz is misaligned.  Would it be better to just print
pr87386.C:4:31: error: static assertion failed: foo
4 | static_assert(foo::test::value, "foo");
  |   ^
instead?  That would mean dropping the cp_parser_id_expression change
and readjusting or dropping some testsuite changes.

2018-11-20  Jakub Jelinek  

PR c++/87386
* parser.c (cp_parser_primary_expression): Use
id_expression.get_location () instead of id_expr_token->location.
(cp_parser_id_expression): For nested_name_specifier_p, make a
range covering whole id-expression.
(cp_parser_operator): For operator "" make a range from "" to
the end of the suffix with caret at the start of the id.
gcc/testsuite/
* g++.dg/spellcheck-pr79298.C: Adjust expected diagnostics.
* g++.dg/lookup/suggestions2.C: Likewise.
* g++.dg/spellcheck-single-vs-multiple.C: Likewise.
* g++.dg/parse/error17.C: Likewise.
* g++.dg/spellcheck-pr77829.C: Likewise.
* g++.dg/spellcheck-pr78656.C: Likewise.
* g++.dg/cpp0x/pr51420.C: Likewise.
* g++.dg/cpp0x/udlit-declare-neg.C: Likewise.
* g++.dg/cpp0x/udlit-member-neg.C: Likewise.
libstdc++-v3/
* testsuite/20_util/scoped_allocator/69293_neg.cc: Adjust expected
line.
* testsuite/20_util/uses_allocator/cons_neg.cc: Likewise.
* testsuite/experimental/propagate_const/requirements2.cc: Likewise.
* testsuite/experimental/propagate_const/requirements3.cc: Likewise.
* testsuite/experimental/propagate_const/requirements4.cc: Likewise.
* testsuite/experimental/propagate_const/requirements5.cc: Likewise.

--- gcc/cp/parser.c.jj  2018-11-20 08:41:28.686923718 +0100
+++ gcc/cp/parser.c 2018-11-20 19:05:22.848941522 +0100
@@ -5602,7 +5602,7 @@ cp_parser_primary_expression (cp_parser
  /*is_namespace=*/false,
  /*check_dependency=*/true,
  &ambiguous_decls,
- id_expr_token->location);
+ id_expression.get_location ());
/* If the lookup was ambiguous, an error will already have
   been issued.  */
if (ambiguous_decls)
@@ -5673,7 +5673,7 @@ cp_parser_primary_expression (cp_parser
if (parser->local_variables_forbidden_p
&& local_variable_p (decl))
  {
-   error_at (id_expr_token->location,
+   error_at (id_expression.get_location (),
  "local variable %qD may not appear in this context",
  decl.get_value ());
return error_mark_node;
@@ -5692,7 +5692,7 @@ cp_parser_primary_expression (cp_parser
 id_expression.get_location ()));
if (error_msg)
  cp_parser_error (parser, error_msg);
-   decl.set_location (id_expr_token->location);
+   decl.set_location (id_expression.get_location ());
return decl;
   }
 
@@ -5758,6 +5758,7 @@ cp_parser_id_expression (cp_parser *pars
 {
   bool global_scope_p;
   bool nested_name_specifier_p;
+  location_t start_loc = cp_lexer_peek_token (parser->lexer)->location;
 
   /* Assume the `template' keyword was not used.  */
   if (template_p)
@@ -5809,6 +5810,7 @@ cp_parser_id_expression (cp_parser *pars
   parser->object_scope = saved_object_scope;
   parser->qualifying_scope = saved_qualifying_scope;
 
+  unqualified_id.set_range (start_loc, unqualified_id.get_finish ());
   return unqualified_id;
 }
   /* Otherwise, if we are in global scope, then we are looking at one
@@ -14931,7 +14933,7 @@ cp_literal_operator_id (const char* name
 static cp_expr
 cp_parser_operator (cp_parser* parser)
 {
-  tree id = NULL_TREE;
+  cp_expr id = NULL_TREE;
   cp_token *token;
   bool utf8 = false;
 
@@ -15219,8 +15221,9 @@ cp_parser_operator (cp_parser* parser)
if (id != error_mark_node)
  {
const char *name = IDENTIFIER_POINTER (id);
-   id = cp_literal_operator_id (name);
+   *id = cp_literal_operator_id (name);
  }
+   id.set_range (start_loc, id.get_finish ());
return id;
   }
 
@@ -15244,7 +15247,8 @@ cp_parser_operato

Re: [PATCH] handle unusual targets in -Wbuiltin-declaration-mismatch (PR 88098)

2018-11-20 Thread Martin Sebor

On 11/20/2018 08:56 AM, Christophe Lyon wrote:

On Mon, 19 Nov 2018 at 22:38, Martin Sebor  wrote:


The gcc.dg/Wbuiltin-declaration-mismatch-4.c test added with
the recent -Wbuiltin-declaration-mismatch enhancement to detect
calls with incompatible arguments to built-ins declared without
a prototype fails on a few targets due to incorrect assumptions
hardcoded into the test.  Besides removing those assumptions
(or adding appropriate { target } attributes, the attached patch
also adjusts the implementation of the warning to avoid triggering
for enum promotion to int on short_enums targets.

Since the fix is trivial I plan to commit it tomorrow if there
are no concerns.

Tested on x86_64-linux and with an arm-none-eabi cross-compiler.
I also did a little bit of testing with sparc-solaris2.11 cross
compiler but there the test harness fails due to the -m32 option
so the Wbuiltin-declaration-mismatch-4.c still has unexpected
FAILs.  I've raised bug 88104 for the outstanding problem on
sparc-solaris2.11.



Hello,

I tested your patch on arm* and aarch64*. It does the job on arm, but
on aarch64*elf,
I'm seeing new failures:
gcc.dg/Wbuiltin-declaration-mismatch-4.c large long double (test for
warnings, line 121)
gcc.dg/Wbuiltin-declaration-mismatch-4.c large long double (test for
warnings, line 123)
gcc.dg/Wbuiltin-declaration-mismatch-4.c large long double (test for
warnings, line 98)


The failures are due to the target apparently not supporting
the fabsl built-in used by the test.

Martin


Re: [RFC C++ PATCH] Improve locations of id-expressions and operator "" (PR c++/87386)

2018-11-20 Thread David Malcolm
On Tue, 2018-11-20 at 21:57 +0100, Jakub Jelinek wrote:
> Hi!
> 
> This PR is complaining about range covering the first token from
> an id-expression:
> pr87386.C:4:15: error: static assertion failed: foo
> 4 | static_assert(foo::test::value, "foo");
>   |   ^~~
> The following patch adjust that to:
> pr87386.C:4:31: error: static assertion failed: foo
> 4 | static_assert(foo::test::value, "foo");
>   |   ^
> instead, though as the changes to the testsuite show, not really sure
> if it is a good idea in all cases, because then we sometimes print:
> ... bar is not in foo namespace, did you mean 'baz'
>   foo::bar
>   ~^~~
>   baz
> where the baz is misaligned.  

Is the compiler suggesting the use of
(a) "foo::baz" or 
(b) "::baz"?

Given the underlining, the fix-it hint would be suggesting the
replacement of "foo::bar" with "baz", which would be wrong if we mean
(a) above.

(c.f. "Fix-it hints should work" in https://gcc.gnu.org/onlinedocs/gcci
nt/Guidelines-for-Diagnostics.html )


FWIW in r265610 I ran into issues like this, which I resolved by
holding off on some fix-it hints for the case where we don't have a
location covering the whole of a qualified name.

> Would it be better to just print
> pr87386.C:4:31: error: static assertion failed: foo
> 4 | static_assert(foo::test::value, "foo");
>   |   ^
> instead?  That would mean dropping the cp_parser_id_expression change
> and readjusting or dropping some testsuite changes.

That might be better... let me look at the affected test cases.

[...snip...]

 
>  /* Parse a template-declaration.
> --- gcc/testsuite/g++.dg/spellcheck-pr79298.C.jj  2018-10-31
> 10:31:13.281572644 +0100
> +++ gcc/testsuite/g++.dg/spellcheck-pr79298.C 2018-11-20
> 19:14:19.208219955 +0100
> @@ -11,7 +11,7 @@ int foo ()
>return M::y; // { dg-error ".y. is not a member of .M." }
>/* { dg-begin-multiline-output "" }
> return M::y;
> - ^
> +  ~~~^
>   { dg-end-multiline-output "" } */
>  }

 
> @@ -20,7 +20,7 @@ int bar ()
>return O::colour; // { dg-error ".colour. is not a member of .O.;
> did you mean 'color'\\?" }
>/* { dg-begin-multiline-output "" }
> return O::colour;
> - ^~
> - color
> +  ~~~^~
> +  color
>   { dg-end-multiline-output "" } */
>  }

This makes the fix-it hint wrong: after the fix-it is applied, it will
become
  return color;
(which won't compile), rather than
  return O::color;
which will.

(I wish we had a good automated way of verifying that fix-it hints fix
things)

> --- gcc/testsuite/g++.dg/lookup/suggestions2.C.jj 2018-10-31
> 10:31:06.928677642 +0100
> +++ gcc/testsuite/g++.dg/lookup/suggestions2.C2018-11-20
> 19:12:05.281395810 +0100
> @@ -33,8 +33,8 @@ int test_1_long (void) {
>return outer_ns::var_in_inner_ns_a; // { dg-error "did you mean
> 'var_in_outer_ns'" }
>/* { dg-begin-multiline-output "" }
> return outer_ns::var_in_inner_ns_a;
> -^
> -var_in_outer_ns
> +  ~~^
> +  var_in_outer_ns
>   { dg-end-multiline-output "" } */
>  }

Again, this makes the fix-it hint wrong: after the fix-it is applied,
it will become
  return var_in_outer_ns;
(which won't compile) rather than:
  return outer_ns::var_in_outer_ns;

[...snip; I think there are more examples in this test file...] 

> --- gcc/testsuite/g++.dg/spellcheck-single-vs-multiple.C.jj   20
> 18-10-31 10:31:07.765663807 +0100
> +++ gcc/testsuite/g++.dg/spellcheck-single-vs-multiple.C  2018-
> 11-20 19:14:35.698952024 +0100
> @@ -73,7 +73,7 @@ void test_3 ()
>ns3::goo_3 (); // { dg-error "'goo_3' is not a member of 'ns3';
> did you mean 'foo_3'\\?" }
>/* { dg-begin-multiline-output "" }
> ns3::goo_3 ();
> -^
> -foo_3
> +   ~^
> +   foo_3
>   { dg-end-multiline-output "" } */
>  }

Again, the fix-it hint becomes wrong, it will become:
   foo_3 ();
rather than:
   ns3::foo_3 ();

[...snip...]

> --- gcc/testsuite/g++.dg/spellcheck-pr77829.C.jj  2018-10-31
> 10:31:10.213623350 +0100
> +++ gcc/testsuite/g++.dg/spellcheck-pr77829.C 2018-11-20
> 19:13:30.78045 +0100
> @@ -21,8 +21,8 @@ void fn_1_explicit ()
>detail::some_type i; // { dg-error ".some_type. is not a member of
> .detail.; did you mean 'some_typedef'\\?" }
>/* { dg-begin-multiline-output "" }
> detail::some_type i;
> -   ^
> -   some_typedef
> +   ^
> +   some_typedef
>   { dg-end-multiline-output "" } */
>  }

Similar problems here.
 
[...snip...]

So it looks like the less invasive fix might be better (not that I've
looked at it in detail, though).

Hope this is constructive
Dave


Re: [PATCH] handle unusual targets in -Wbuiltin-declaration-mismatch (PR 88098)

2018-11-20 Thread Martin Sebor

By calling builtin_decl_explicit rather than builtin_decl_implicit
the updated patch in the attachment avoids test failures due to
missing warnings on targets with support for long double but whose
libc doesn't support C99 functions like fabsl (such as apparently
aarch64-linux).

Martin
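
(For reference, a minimal sketch of the kind of call the warning targets, modeled
on the test expectations quoted further down rather than copied from them:)

double fabs ();   /* the built-in, (re)declared without a prototype */

enum E { e0 };

double
call_fabs (enum E e)
{
  /* Both calls get the warning: the enumeration constant e0 has type
     'int', and the enum variable is reported either as 'enum E' or,
     on -fshort-enums targets, as promoting to 'int'.  */
  return fabs (e0) + fabs (e);
}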

On 11/19/2018 02:37 PM, Martin Sebor wrote:

The gcc.dg/Wbuiltin-declaration-mismatch-4.c test added with
the recent -Wbuiltin-declaration-mismatch enhancement to detect
calls with incompatible arguments to built-ins declared without
a prototype fails on a few targets due to incorrect assumptions
hardcoded into the test.  Besides removing those assumptions
(or adding appropriate { target } attributes), the attached patch
also adjusts the implementation of the warning to avoid triggering
for enum promotion to int on short_enums targets.

Since the fix is trivial I plan to commit it tomorrow if there
are no concerns.

Tested on x86_64-linux and with an arm-none-eabi cross-compiler.
I also did a little bit of testing with sparc-solaris2.11 cross
compiler but there the test harness fails due to the -m32 option
so the Wbuiltin-declaration-mismatch-4.c still has unexpected
FAILs.  I've raised bug 88104 for the outstanding problem on
sparc-solaris2.11.

Martin


PR testsuite/88098 - FAIL: gcc.dg/Wbuiltin-declaration-mismatch-4.c

gcc/c/ChangeLog:

	PR testsuite/88098
	* c-typeck.c (convert_arguments): Call builtin_decl_explicit instead.
	(maybe_warn_builtin_no_proto_arg): Handle short enum to int promotion.

gcc/testsuite/ChangeLog:

	PR testsuite/88098
	* gcc.dg/Wbuiltin-declaration-mismatch-4.c: Adjust.
	* gcc.dg/Wbuiltin-declaration-mismatch-5.c: New test.

Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c	(revision 266320)
+++ gcc/c/c-typeck.c	(working copy)
@@ -3422,7 +3422,10 @@ convert_arguments (location_t loc, vec
   built_in_function code = DECL_FUNCTION_CODE (fundecl);
   if (C_DECL_BUILTIN_PROTOTYPE (fundecl))
 	{
-	  if (tree bdecl = builtin_decl_implicit (code))
+	  /* For a call to a built-in function declared without a prototype
+	 use the types of the parameters of the internal built-in to
+	 match those of the arguments to.  */
+	  if (tree bdecl = builtin_decl_explicit (code))
 	builtin_typelist = TYPE_ARG_TYPES (TREE_TYPE (bdecl));
 	}
 
@@ -6461,7 +6464,9 @@ maybe_warn_builtin_no_proto_arg (location_t loc, t
   && TYPE_MODE (parmtype) == TYPE_MODE (argtype))
 return;
 
-  if (parmcode == argcode
+  if ((parmcode == argcode
+   || (parmcode == INTEGER_TYPE
+	   && argcode == ENUMERAL_TYPE))
   && TYPE_MAIN_VARIANT (parmtype) == TYPE_MAIN_VARIANT (promoted))
 return;
 
Index: gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-4.c
===
--- gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-4.c	(revision 266320)
+++ gcc/testsuite/gcc.dg/Wbuiltin-declaration-mismatch-4.c	(working copy)
@@ -77,9 +77,9 @@ void test_integer_conversion_memset (void *d)
   /* Passing a ptrdiff_t where size_t is expected may not be unsafe
  but because GCC may emits suboptimal code for such calls warning
  for them helps improve efficiency.  */
-  memset (d, 0, diffi);   /* { dg-warning ".memset. argument 3 promotes to .ptrdiff_t. {aka .long int.} where .long unsigned int. is expected" } */
+  memset (d, 0, diffi);   /* { dg-warning ".memset. argument 3 promotes to .ptrdiff_t. {aka .\(long \)?int.} where .\(long \)?unsigned int. is expected" } */
 
-  memset (d, 0, 2.0); /* { dg-warning ".memset. argument 3 type is .double. where 'long unsigned int' is expected" } */
+  memset (d, 0, 2.0); /* { dg-warning ".memset. argument 3 type is .double. where '\(long \)?unsigned int' is expected" } */
 
   /* Verify that the same call as above but to the built-in doesn't
  trigger a warning.  */
@@ -108,7 +108,8 @@ void test_real_conversion_fabs (void)
   /* In C, the type of an enumeration constant is int.  */
   d = fabs (e0);/* { dg-warning ".fabs. argument 1 type is .int. where .double. is expected in a call to built-in function declared without prototype" } */
 
-  d = fabs (e); /* { dg-warning ".fabs. argument 1 type is .enum E. where .double. is expected in a call to built-in function declared without prototype" } */
+  d = fabs (e); /* { dg-warning ".fabs. argument 1 type is .enum E. where .double. is expected in a call to built-in function declared without prototype" "ordinary enum" { target { ! short_enums } } } */
+  /* { dg-warning ".fabs. argument 1 promotes to .int. where .double. is expected in a call to built-in function declared without prototype" "size 1 enum" { target short_enums } .-1 } */
 
   /* No warning here since float is promoted to double.  */
   d = fabs (f);


[PATCH 1/6] [og8] Host-to-device transfer coalescing & magic offset value self-documentation

2018-11-20 Thread Julian Brown

Previously posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00825.html

libgomp/
* libgomp.h (OFFSET_INLINED, OFFSET_POINTER, OFFSET_STRUCT): Define.
* target.c (FIELD_TGT_EMPTY): Define.
(gomp_coalesce_chunk): New.
(gomp_coalesce_buf): Use above instead of flat array of size_t pairs.
(gomp_coalesce_buf_add): Adjust for above change.
(gomp_copy_host2dev): Likewise.
(gomp_map_val): Use OFFSET_* macros instead of magic constants.  Write
as switch instead of list of ifs.
(gomp_map_vars_async): Adjust for gomp_coalesce_chunk change.  Use
OFFSET_* macros.
---
 libgomp/libgomp.h |5 +++
 libgomp/target.c  |  101 +++-
 2 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 607f4c2..acf7f8f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -842,6 +842,11 @@ struct target_mem_desc {
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
 
+/* Special offset values.  */
+#define OFFSET_INLINED (~(uintptr_t) 0)
+#define OFFSET_POINTER (~(uintptr_t) 1)
+#define OFFSET_STRUCT (~(uintptr_t) 2)
+
 struct splay_tree_key_s {
   /* Address of the host object.  */
   uintptr_t host_start;
diff --git a/libgomp/target.c b/libgomp/target.c
index ab17650..7220ac6 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,6 +45,8 @@
 #include "plugin-suffix.h"
 #endif
 
+#define FIELD_TGT_EMPTY (~(size_t) 0)
+
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -206,8 +208,14 @@ goacc_device_copy_async (struct gomp_device_descr *devicep,
 }
 }
 
-/* Infrastructure for coalescing adjacent or nearly adjacent (in device addresses)
-   host to device memory transfers.  */
+/* Infrastructure for coalescing adjacent or nearly adjacent (in device
+   addresses) host to device memory transfers.  */
+
+struct gomp_coalesce_chunk
+{
+  /* The starting and ending point of a coalesced chunk of memory.  */
+  size_t start, end;
+};
 
 struct gomp_coalesce_buf
 {
@@ -215,10 +223,10 @@ struct gomp_coalesce_buf
  it will be copied to the device.  */
   void *buf;
   struct target_mem_desc *tgt;
-  /* Array with offsets, chunks[2 * i] is the starting offset and
- chunks[2 * i + 1] ending offset relative to tgt->tgt_start device address
+  /* Array with offsets, chunks[i].start is the starting offset and
+ chunks[i].end ending offset relative to tgt->tgt_start device address
  of chunks which are to be copied to buf and later copied to device.  */
-  size_t *chunks;
+  struct gomp_coalesce_chunk *chunks;
   /* Number of chunks in chunks array, or -1 if coalesce buffering should not
  be performed.  */
   long chunk_cnt;
@@ -251,14 +259,14 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
 {
   if (cbuf->chunk_cnt < 0)
 	return;
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (start < cbuf->chunks[cbuf->chunk_cnt-1].end)
 	{
 	  cbuf->chunk_cnt = -1;
 	  return;
 	}
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1] + MAX_COALESCE_BUF_GAP)
+  if (start < cbuf->chunks[cbuf->chunk_cnt-1].end + MAX_COALESCE_BUF_GAP)
 	{
-	  cbuf->chunks[2 * cbuf->chunk_cnt - 1] = start + len;
+	  cbuf->chunks[cbuf->chunk_cnt-1].end = start + len;
 	  cbuf->use_cnt++;
 	  return;
 	}
@@ -268,8 +276,8 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, size_t start, size_t len)
   if (cbuf->use_cnt == 1)
 	cbuf->chunk_cnt--;
 }
-  cbuf->chunks[2 * cbuf->chunk_cnt] = start;
-  cbuf->chunks[2 * cbuf->chunk_cnt + 1] = start + len;
+  cbuf->chunks[cbuf->chunk_cnt].start = start;
+  cbuf->chunks[cbuf->chunk_cnt].end = start + len;
   cbuf->chunk_cnt++;
   cbuf->use_cnt = 1;
 }
@@ -301,20 +309,20 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
   if (cbuf)
 {
   uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start;
-  if (doff < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (doff < cbuf->chunks[cbuf->chunk_cnt-1].end)
 	{
 	  long first = 0;
 	  long last = cbuf->chunk_cnt - 1;
 	  while (first <= last)
 	{
 	  long middle = (first + last) >> 1;
-	  if (cbuf->chunks[2 * middle + 1] <= doff)
+	  if (cbuf->chunks[middle].end <= doff)
 		first = middle + 1;
-	  else if (cbuf->chunks[2 * middle] <= doff)
+	  else if (cbuf->chunks[middle].start <= doff)
 		{
-		  if (doff + sz > cbuf->chunks[2 * middle + 1])
+		  if (doff + sz > cbuf->chunks[middle].end)
 		gomp_fatal ("internal libgomp cbuf error");
-		  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0]),
+		  memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start),
 			  h, sz);
 		  return;
 		}
@@ -538,17 +546,25 @@ gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
 return tgt->list[i].key->tgt->tgt_start

[PATCH 2/6] [og8] Factor out duplicate code in gimplify_scan_omp_clauses

2018-11-20 Thread Julian Brown

Previously posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00824.html

gcc/
* gimplify.c (insert_struct_component_mapping)
(check_base_and_compare_lt): New.
(gimplify_scan_omp_clauses): Outline duplicated code into calls to
above two functions.
---
 gcc/gimplify.c |  307 
 1 files changed, 174 insertions(+), 133 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9be0b70..824e020 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7661,6 +7661,160 @@ demote_firstprivate_pointer (tree decl, gimplify_omp_ctx *ctx)
 }
 }
 
+/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
+   GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
+   the struct node to insert the new mapping after (when the struct node is
+   initially created).  PREV_NODE is the first of two or three mappings for a
+   pointer, and is either:
+ - the node before C, when a pair of mappings is used, e.g. for a C/C++
+   array section.
+ - not the node before C.  This is true when we have a reference-to-pointer
+   type (with a mapping for the reference and for the pointer), or for
+   Fortran derived-type mappings with a GOMP_MAP_TO_PSET.
+   If SCP is non-null, the new node is inserted before *SCP.
+   if SCP is null, the new node is inserted before PREV_NODE.
+   The return type is:
+ - PREV_NODE, if SCP is non-null.
+ - The newly-created ALLOC or RELEASE node, if SCP is null.
+ - The second newly-created ALLOC or RELEASE node, if we are mapping a
+   reference to a pointer.  */
+
+static tree
+insert_struct_component_mapping (enum tree_code code, tree c, tree struct_node,
+ tree prev_node, tree *scp)
+{
+  enum gomp_map_kind mkind = (code == OMP_TARGET_EXIT_DATA
+			  || code == OACC_EXIT_DATA)
+			 ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
+
+  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  tree cl = scp ? prev_node : c2;
+  OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
+  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
+  OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
+  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
+  if (struct_node)
+OMP_CLAUSE_CHAIN (struct_node) = c2;
+
+  /* We might need to create an additional mapping if we have a reference to a
+ pointer (in C++).  Don't do this if we have something other than a
+ GOMP_MAP_ALWAYS_POINTER though, i.e. a GOMP_MAP_TO_PSET.  */
+  if (OMP_CLAUSE_CHAIN (prev_node) != c
+  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
+  && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
+	  == GOMP_MAP_ALWAYS_POINTER))
+{
+  tree c4 = OMP_CLAUSE_CHAIN (prev_node);
+  tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  OMP_CLAUSE_SET_MAP_KIND (c3, mkind);
+  OMP_CLAUSE_DECL (c3) = unshare_expr (OMP_CLAUSE_DECL (c4));
+  OMP_CLAUSE_SIZE (c3) = TYPE_SIZE_UNIT (ptr_type_node);
+  OMP_CLAUSE_CHAIN (c3) = prev_node;
+  if (!scp)
+	OMP_CLAUSE_CHAIN (c2) = c3;
+  else
+	cl = c3;
+}
+
+  if (scp)
+*scp = c2;
+
+  return cl;
+}
+
+/* Called initially with ORIG_BASE non-null, sets PREV_BITPOS and PREV_POFFSET
+   to the offset of the field given in BASE.  Return type is 1 if BASE is equal
+   to *ORIG_BASE after stripping off ARRAY_REF and INDIRECT_REF nodes and
+   calling get_inner_reference, else 0.
+
+   Called subsequently with ORIG_BASE null, compares the offset of the field
+   given in BASE to PREV_BITPOS, PREV_POFFSET. Returns -1 if the base object
+   has changed, 0 if the new value has a higher bit position than that
+   described by the aforementioned arguments, or 1 if the new value is less
+   than them.  Used for (insertion) sorting components after a GOMP_MAP_STRUCT
+   mapping.  */
+
+static int
+check_base_and_compare_lt (tree base, tree *orig_base, tree decl,
+			   poly_int64 *prev_bitpos,
+			   poly_offset_int *prev_poffset)
+{
+  tree offset;
+  poly_int64 bitsize, bitpos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep = 0;
+  poly_offset_int poffset;
+
+  if (orig_base)
+{
+  while (TREE_CODE (base) == ARRAY_REF)
+	base = TREE_OPERAND (base, 0);
+
+  if (TREE_CODE (base) == INDIRECT_REF)
+	base = TREE_OPERAND (base, 0);
+}
+  else
+{
+  if (TREE_CODE (base) == ARRAY_REF)
+	{
+	  while (TREE_CODE (base) == ARRAY_REF)
+	base = TREE_OPERAND (base, 0);
+	  if (TREE_CODE (base) != COMPONENT_REF
+	  || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE)
+	return -1;
+	}
+  else if (TREE_CODE (base) == INDIRECT_REF
+	   && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF
+	   && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
+		   == REFERENCE_TYPE))
+	base = TREE_OPERAND (base, 0);
+}
+
+  base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
+			  &unsignedp, &rever

[PATCH 4/6] [og8] Interaction of dynamic/multidimensional arrays with attach/detach.

2018-11-20 Thread Julian Brown

OpenACC multidimensional (or "dynamic") arrays do not seem to fit very
neatly into the attach/detach mechanism described for OpenACC 2.6,
that is, when the user tries to use a multidimensional array as a field
in a struct.  This patch disallows that combination, for now at least.
Multidimensional array support in general has been submitted upstream
here but not yet accepted:

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00937.html

gcc/
* omp-low.c (scan_sharing_clauses): Disallow dynamic (multidimensional)
arrays within structs.

gcc/testsuite/
* c-c++-common/goacc/deep-copy-multidim.c: Add test.

libgomp/
* target.c (gomp_map_vars_async, gomp_load_image_to_device):
Zero-initialise do_detach, dynamic_refcount and attach_count in more
places.
---
 gcc/omp-low.c  |   10 +-
 .../c-c++-common/goacc/deep-copy-multidim.c|   32 
 libgomp/target.c   |6 
 3 files changed, 47 insertions(+), 1 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e559211..1726451 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1481,7 +1481,15 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		  t = TREE_TYPE (t);
 		}
 
-	  install_var_field (da_decl, by_ref, 3, ctx);
+	  if (DECL_P (decl))
+		install_var_field (da_decl, by_ref, 3, ctx);
+	  else
+	{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			"dynamic arrays cannot be used within structs");
+		  break;
+		}
+
 	  tree new_var = install_var_local (da_decl, ctx);
 
 	  bool existed = ctx->dynamic_arrays->put (new_var, da_dimensions);
diff --git a/gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c b/gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c
new file mode 100644
index 000..1696f0c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+
+#include <stdlib.h>
+#include <assert.h>
+
+struct dc
+{
+  int a;
+  int **b;
+};
+
+int
+main ()
+{
+  int n = 100, i, j;
+  struct dc v = { .a = 3 };
+
+  v.b = (int **) malloc (sizeof (int *) * n);
+  for (i = 0; i < n; i++)
+v.b[i] = (int *) malloc (sizeof (int) * n);
+
+#pragma acc parallel loop copy(v.a, v.b[:n][:n]) /* { dg-error "dynamic arrays cannot be used within structs" } */
+  for (i = 0; i < n; i++)
+for (j = 0; j < n; j++)
+  v.b[i][j] = v.a + i + j;
+
+  for (i = 0; i < n; i++)
+for (j = 0; j < n; j++)
+  assert (v.b[i][j] == v.a + i + j);
+
+  return 0;
+}
diff --git a/libgomp/target.c b/libgomp/target.c
index d9d42eb..da51291 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1484,6 +1484,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
 	 set to false here.  */
 	  tgt->list[i].copy_from = false;
 	  tgt->list[i].always_copy_from = false;
+	  tgt->list[i].do_detach = false;
 
 	  size_t align = (size_t) 1 << (kind >> rshift);
 	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
@@ -1521,6 +1522,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
 
 		  k->tgt = tgt;
 		  k->refcount = 1;
+		  k->dynamic_refcount = 0;
+		  k->attach_count = NULL;
 		  k->link_key = NULL;
 		  tgt_size = (tgt_size + align - 1) & ~(align - 1);
 		  target_row_addr = tgt->tgt_start + tgt_size;
@@ -1532,6 +1535,7 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
 		= GOMP_MAP_COPY_FROM_P (kind & typemask);
 		  row_desc->always_copy_from
 		= GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
+		  row_desc->do_detach = false;
 		  row_desc->offset = 0;
 		  row_desc->length = da->data_row_size;
 
@@ -1839,6 +1843,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt = tgt;
   k->tgt_offset = target_table[i].start;
   k->refcount = REFCOUNT_INFINITY;
+  k->attach_count = NULL;
   k->link_key = NULL;
   tgt->list[i].key = k;
   tgt->refcount++;
@@ -1873,6 +1878,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt = tgt;
   k->tgt_offset = target_var->start;
   k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
+  k->attach_count = NULL;
   k->link_key = NULL;
   tgt->list[i].key = k;
   tgt->refcount++;


[PATCH 0/6] [og8] OpenACC attach/detach

2018-11-20 Thread Julian Brown

This patch series is a backport of the OpenACC attach/detach support to
the openacc-gcc-8-branch branch. It was previously posted upstream here:

https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00823.html

This version of the series has been adjusted to account for features on
the branch that are not yet upstream. It also contains improvements to
the reference counting behaviour, partially verified using self-checking
code (not quite complete, and not yet submitted).

Tested (as a series) with offloading to nvptx. I will apply to the
openacc-gcc-8-branch shortly.
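
For orientation, the kind of OpenACC 2.6 manual deep copy the series enables
looks roughly like the sketch below (written in the style of the new deep-copy
testcases rather than copied from one; the single pointer member shown here is
the supported case, patch 4/6 restricts multidimensional arrays inside structs):

#include <stdlib.h>

struct dc
{
  int a;
  int *b;
};

int
main ()
{
  int n = 100;
  struct dc v;

  v.a = 3;
  v.b = (int *) malloc (sizeof (int) * n);

  /* Mapping the pointer member together with an array section of its
     target attaches the device copy of v.b to the device copy of the
     array for the duration of the region.  */
#pragma acc parallel loop copy(v.a, v.b[:n])
  for (int i = 0; i < n; i++)
    v.b[i] = v.a + i;

  free (v.b);
  return 0;
}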

Julian Brown (6):
  [og8] Host-to-device transfer coalescing & magic offset value
self-documentation
  [og8] Factor out duplicate code in gimplify_scan_omp_clauses
  [og8] OpenACC 2.6 manual deep copy support (attach/detach)
  [og8] Interaction of dynamic/multidimensional arrays with
attach/detach.
  [og8] Backport parts of upstream declare-allocate patch
  [og8] OpenACC refcounting refresh

 gcc/c/c-parser.c   |   15 +-
 gcc/c/c-typeck.c   |4 +
 gcc/cp/parser.c|   16 +-
 gcc/cp/semantics.c |6 +-
 gcc/fortran/gfortran.h |2 +
 gcc/fortran/openmp.c   |  126 --
 gcc/fortran/trans-openmp.c |  163 +++-
 gcc/gimplify.c |  414 ++
 gcc/omp-low.c  |   13 +-
 .../c-c++-common/goacc/deep-copy-multidim.c|   32 ++
 gcc/testsuite/c-c++-common/goacc/mdc-1.c   |   10 +-
 gcc/testsuite/gfortran.dg/goacc/data-clauses.f95   |   38 +-
 gcc/testsuite/gfortran.dg/goacc/derived-types.f90  |   23 +-
 .../gfortran.dg/goacc/enter-exit-data.f95  |   24 +-
 .../gfortran.dg/goacc/kernels-alias-3.f95  |4 +-
 libgomp/libgomp.h  |   30 ++-
 libgomp/libgomp.map|   10 +
 libgomp/oacc-mem.c |  459 
 libgomp/oacc-parallel.c|  212 --
 libgomp/openacc.h  |6 +
 libgomp/target.c   |  291 +++--
 .../libgomp.oacc-c-c++-common/context-2.c  |6 +-
 .../libgomp.oacc-c-c++-common/context-4.c  |6 +-
 .../libgomp.oacc-c-c++-common/deep-copy-1.c|   24 +
 .../libgomp.oacc-c-c++-common/deep-copy-2.c|   29 ++
 .../libgomp.oacc-c-c++-common/deep-copy-3.c|   34 ++
 .../libgomp.oacc-c-c++-common/deep-copy-4.c|   87 
 .../libgomp.oacc-c-c++-common/deep-copy-5.c|   81 
 .../libgomp.oacc-c-c++-common/deep-copy-6.c|   59 +++
 .../libgomp.oacc-c-c++-common/deep-copy-7.c|   42 ++
 .../libgomp.oacc-c-c++-common/deep-copy-8.c|   53 +++
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   20 +-
 .../testsuite/libgomp.oacc-fortran/deep-copy-1.f90 |   35 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-2.f90 |   33 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-3.f90 |   34 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-4.f90 |   49 ++
 .../testsuite/libgomp.oacc-fortran/deep-copy-5.f90 |   57 +++
 .../testsuite/libgomp.oacc-fortran/deep-copy-6.f90 |   61 +++
 .../testsuite/libgomp.oacc-fortran/deep-copy-7.f90 |   89 
 .../testsuite/libgomp.oacc-fortran/deep-copy-8.f90 |   41 ++
 .../libgomp.oacc-fortran/derived-type-1.f90|6 +-
 .../libgomp.oacc-fortran/non-scalar-data.f90   |6 +-
 .../testsuite/libgomp.oacc-fortran/update-2.f90|   44 +-
 43 files changed, 2079 insertions(+), 715 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/deep-copy-multidim.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-4.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-3.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-4.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-5.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-6.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/deep-copy-7.f90
 create mode

[PATCH 3/6] [og8] OpenACC 2.6 manual deep copy support (attach/detach)

2018-11-20 Thread Julian Brown

Previously posted upstream:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00826.html

gcc/c/
* c-parser.c (c_parser_omp_variable_list): Allow deref (->) in
variable lists.
(c_parser_oacc_all_clauses): Re-alphabetize cases.
* c-typeck.c (handle_omp_array_sections_1): Support deref.

gcc/cp/
* parser.c (cp_parser_omp_var_list_no_open): Support deref.
(cp_parser_oacc_all_clauses): Re-alphabetize cases.
* semantics.c (finish_omp_clauses): Allow "this" for OpenACC data
clauses.  Support deref.

gcc/fortran/
* gfortran.h (gfc_omp_map_op): Add OMP_MAP_ATTACH, OMP_MAP_DETACH.
* openmp.c (omp_mask2): Add OMP_CLAUSE_ATTACH, OMP_CLAUSE_DETACH.
(gfc_match_omp_clauses): Remove allow_derived parameter, infer from
clause mask.  Support attach and detach.  Slight reformatting.
(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_DATA_CLAUSES)
(OACC_ENTER_DATA_CLAUSES): Add OMP_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSES): Add OMP_CLAUSE_DETACH.
(match_acc): Remove derived_types parameter, and don't pass to
gfc_match_omp_clauses.
(gfc_match_oacc_update): Don't pass allow_derived argument.
(gfc_match_oacc_enter_data): Likewise.
(gfc_match_oacc_exit_data): Likewise.
(check_symbol_not_pointer): Don't disallow pointer objects of derived
type.
(resolve_oacc_data_clauses): Don't disallow allocatable derived types.
(resolve_omp_clauses): Perform duplicate checking only for non-derived
type component accesses (plain variables and arrays or array sections).
Support component refs.
* trans-openmp.c (gfc_omp_privatize_by_reference): Support component
refs.
(gfc_trans_omp_clauses_1): Support component refs, attach and detach
clauses.

gcc/
* gimplify.c (gimplify_omp_var_data): Add GOVD_MAP_HAS_ATTACHMENTS.
(insert_struct_component_mapping): Support derived-type member mappings
for arrays with descriptors which use GOMP_MAP_TO_PSET.
(gimplify_scan_omp_clauses): Rewrite GOMP_MAP_ALWAYS_POINTER to
GOMP_MAP_ATTACH for OpenACC struct/derived-type component pointers.
Handle pointer mappings that use GOMP_MAP_TO_PSET.  Handle attach/detach
clauses.
(gimplify_adjust_omp_clauses_1): Skip adjustments for explicit
attach/detach clauses.
(gimplify_omp_target_update): Handle finalize for detach.

gcc/testsuite/
* c-c++-common/goacc/mdc-1.c: Update scan tests.
* gfortran.dg/goacc/data-clauses.f95: Remove expected errors.
* gfortran.dg/goacc/derived-types.f90: Likewise.
* gfortran.dg/goacc/enter-exit-data.f95: Likewise.

libgomp/
* libgomp.h (struct target_var_desc): Add do_detach flag.
(struct splay_tree_key_s): Add attach_count field.
(struct gomp_coalesce_buf): Add forward declaration.
(gomp_map_val, gomp_attach_pointer, gomp_detach_pointer): Add
prototypes.
(gomp_unmap_vars): Add finalize parameter.
* libgomp.map (OACC_2.6): New section. Add acc_attach, acc_attach_async,
acc_detach, acc_detach_async, acc_detach_finalize,
acc_detach_finalize_async.
* oacc-async.c (goacc_async_copyout_unmap_vars): Add finalize parameter.
Pass to gomp_unmap_vars_async.
* oacc-init.c (acc_shutdown_1): Update call to gomp_unmap_vars.
* oacc-int.h (goacc_async_copyout_unmap_vars): Add finalize parameter.
* oacc-mem.c (acc_unmap_data): Update call to gomp_unmap_vars.
(present_create_copy): Initialise attach_count.
(delete_copyout): Likewise.
(gomp_acc_insert_pointer): Likewise.
(gomp_acc_remove_pointer): Update calls to gomp_unmap_vars,
goacc_async_copyout_unmap_vars.
(acc_attach_async, acc_attach, goacc_detach_internal, acc_detach)
(acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): New
functions.
* oacc-parallel.c (find_pointer): Support attach/detach.  Make a little
more strict.
(GOACC_parallel_keyed_internal): Use gomp_map_val to calculate device
addresses.  Update calls to gomp_unmap_vars,
goacc_async_copyout_unmap_vars.
(GOACC_data_end): Update call to gomp_unmap_vars.
(GOACC_enter_exit_data): Support attach/detach and GOMP_MAP_STRUCT.
* openacc.h (acc_attach, acc_attach_async, acc_detach)
(acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): Add
prototypes.
* target.c (limits.h): Include.
(gomp_map_vars_existing): Initialise do_detach field of tgt_var_desc.
(gomp_attach_pointer, gomp_detach_pointer): New functions.
(gomp_map_val): Make global.
(gomp_map_vars_async): Support attach and detach.
(gomp_remove_var): Free attach count array if present.
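
(As a rough usage sketch of the new acc_attach/acc_detach entry points listed
above; this is an illustration, not one of the added testcases, and the calls
around them are plain OpenACC runtime routines:)

#include <stdlib.h>
#include <openacc.h>

struct list
{
  int *data;
};

void
attach_sketch (struct list *l, size_t n)
{
  l->data = (int *) malloc (n * sizeof (int));

  acc_copyin (l, sizeof *l);               /* map the struct itself */
  acc_copyin (l->data, n * sizeof (int));  /* map the array it points to */
  acc_attach ((void **) &l->data);         /* make the device copy of l->data
                                              point at the device array */

  /* ... compute regions dereferencing l->data on the device ... */

  acc_detach ((void **) &l->data);
  acc_copyout (l->data, n * sizeof (int));
  acc_copyout (l, sizeof *l);
}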
 

[PATCH 6/6] [og8] OpenACC refcounting refresh

2018-11-20 Thread Julian Brown

This patch represents a mild overhaul of reference counting for OpenACC
in libgomp.  It's been partly automatically checked (using code not yet
quite finished nor submitted upstream), but it's already more precise
than the pre-patch implementation (as demonstrated by adjustments to
previously-erroneous tests, included).

I have a few more changes planned, but those are still tbd.

libgomp/
* libgomp.h (gomp_device_descr): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
(gomp_acc_remove_pointer): Update prototype.
(gomp_acc_data_env_remove_tgt): Add prototype.
(gomp_unmap_vars, gomp_map_vars_async): Update prototype.
* oacc-int.h (goacc_async_copyout_unmap_vars): Update prototype.
* oacc-async.c (goacc_async_copyout_unmap_vars): Remove finalize
parameter.
* oacc-init.c (acc_shutdown_1): Remove finalize argument to
gomp_unmap_vars call.
* oacc-mem.c (lookup_dev_1): New helper function.
(lookup_dev): Rewrite in terms of above.
(acc_free): Update calls to lookup_dev.
(acc_map_data): Likewise.  Don't add data mapped this way to OpenACC
data environment list.
(gomp_acc_data_env_remove, gomp_acc_data_env_remove_tgt): New functions.
(acc_unmap_data): Rewrite using splay tree functions directly.  Don't
call gomp_unmap_vars.  Fix refcount handling.
(present_create_copy): Use GOMP_MAP_VARS_OPENACC_ENTER_DATA in
gomp_map_vars_async call.  Adjust refcount handling.
(delete_copyout): Remove dubious handling of target_mem_desc refcount.
(gomp_acc_insert_pointer): Use GOMP_MAP_VARS_OPENACC_ENTER_DATA in
gomp_map_vars_async call.  Update refcount handling.
(gomp_acc_remove_pointer): Reimplement.  Fix detach and refcount
handling.
* oacc-parallel.c (find_pointer): Handle more mapping types.  Update
calls to gomp_unmap_vars and goacc_async_copyout_unmap_vars.
(GOACC_enter_exit_data): Update refcount handling.

libgomp/
* target.c (gomp_detach_pointer): Unlock device on error path.
(gomp_map_vars_async): Support GOMP_MAP_VARS_OPENACC_ENTER_DATA and
mapping size fix GOMP_MAP_ATTACH.
(gomp_unmap_tgt): Call gomp_acc_data_env_remove_tgt.
(gomp_unmap_vars): Remove finalize parameter.
(gomp_unmap_vars_async): Likewise.  Adjust detach handling.
(GOMP_target, GOMP_target_ext, GOMP_target_end_data)
(gomp_target_task_fn): Update calls to gomp_unmap_vars.
* testsuite/libgomp.oacc-c-c++-common/context-2.c: Use correct API to
unmap data.
* testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: New test.
* testsuite/libgomp.oacc-fortran/data-2.f90: Fix for unmap semantics.
---
 libgomp/libgomp.h  |   10 +-
 libgomp/oacc-async.c   |4 +-
 libgomp/oacc-init.c|2 +-
 libgomp/oacc-int.h |2 +-
 libgomp/oacc-mem.c |  387 ++--
 libgomp/oacc-parallel.c|   76 +++--
 libgomp/target.c   |   35 ++-
 .../libgomp.oacc-c-c++-common/context-2.c  |6 +-
 .../libgomp.oacc-c-c++-common/context-4.c  |6 +-
 .../libgomp.oacc-c-c++-common/deep-copy-6.c|   59 +++
 .../libgomp.oacc-c-c++-common/deep-copy-7.c|   42 +++
 .../libgomp.oacc-c-c++-common/deep-copy-8.c|   53 +++
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   20 +-
 13 files changed, 445 insertions(+), 257 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 17fe0d3..568e260 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1002,6 +1002,7 @@ struct gomp_device_descr
 enum gomp_map_vars_kind
 {
   GOMP_MAP_VARS_OPENACC,
+  GOMP_MAP_VARS_OPENACC_ENTER_DATA,
   GOMP_MAP_VARS_TARGET,
   GOMP_MAP_VARS_DATA,
   GOMP_MAP_VARS_ENTER_DATA
@@ -1010,7 +1011,8 @@ enum gomp_map_vars_kind
 struct gomp_coalesce_buf;
 
 extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
-extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
+extern void gomp_acc_remove_pointer (void **, size_t *, unsigned short *,
+ int, void *, bool, int);
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
    unsigned short *);
 struct gomp_coalesce_buf;
@@ -1039,10 +1041,12 @@ extern struct target_mem_d

[PATCH 5/6] [og8] Backport parts of upstream declare-allocate patch

2018-11-20 Thread Julian Brown

This patch adjusts mappings used for some special cases in Fortran
(e.g. allocatable scalars) on og8 to match code that is already upstream,
or that has been submitted but not yet reviewed. Parts taken from
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01205.html and parts
reverted from https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02188.html.

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't use
GOMP_MAP_FIRSTPRIVATE_POINTER.
(gfc_trans_omp_clauses_1): Adjust handling of allocatable scalars.

gcc/
* gimplify.c (demote_firstprivate_pointer): Remove.
(gimplify_scan_omp_clauses): Remove special handling for OpenACC. Don't
call demote_firstprivate_pointer.
(gimplify_adjust_omp_clauses): Adjust promotion of reduction clauses.
* omp-low.c (lower_omp_target): Remove special handling for Fortran.

gcc/testsuite/
* gfortran.dg/goacc/kernels-alias-3.f95: Revert comment changes and
XFAIL.

libgomp/
* testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Remove XFAIL for
-O2 and -O3 and explanatory comment.
---
 gcc/fortran/trans-openmp.c |   22 -
 gcc/gimplify.c |   49 ++-
 gcc/omp-low.c  |3 +-
 .../gfortran.dg/goacc/kernels-alias-3.f95  |4 +-
 .../libgomp.oacc-fortran/non-scalar-data.f90   |6 +--
 5 files changed, 20 insertions(+), 64 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 98f40d1..71a3ebb 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1084,7 +1084,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 	return;
   tree orig_decl = decl;
   c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_FIRSTPRIVATE_POINTER);
+  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
   OMP_CLAUSE_DECL (c4) = decl;
   OMP_CLAUSE_SIZE (c4) = size_int (0);
   decl = build_fold_indirect_ref (decl);
@@ -1100,10 +1100,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 	  OMP_CLAUSE_SIZE (c3) = size_int (0);
 	  decl = build_fold_indirect_ref (decl);
 	  OMP_CLAUSE_DECL (c) = decl;
-	  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
 	}
-  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
-	OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
 }
   if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
 {
@@ -2168,11 +2165,15 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 	(TREE_TYPE (TREE_TYPE (field)
 		{
 		  tree orig_decl = decl;
-		  enum gomp_map_kind gmk = GOMP_MAP_FIRSTPRIVATE_POINTER;
-		  if (GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
-			  && (n->sym->attr.oacc_declare_create)
-			  && clauses->update_allocatable)
-			gmk = ptr_map_kind;
+		  enum gomp_map_kind gmk = GOMP_MAP_POINTER;
+		  if (GFC_DECL_GET_SCALAR_ALLOCATABLE (field)
+			  && n->sym->attr.oacc_declare_create)
+			{
+			  if (clauses->update_allocatable)
+			gmk = GOMP_MAP_ALWAYS_POINTER;
+			  else
+			gmk = GOMP_MAP_FIRSTPRIVATE_POINTER;
+			}
 		  node4 = build_omp_clause (input_location,
 		OMP_CLAUSE_MAP);
 		  OMP_CLAUSE_SET_MAP_KIND (node4, gmk);
@@ -2189,10 +2190,7 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 			  OMP_CLAUSE_DECL (node3) = decl;
 			  OMP_CLAUSE_SIZE (node3) = size_int (0);
 			  decl = build_fold_indirect_ref (decl);
-			  OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER);
 			}
-		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
-			OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER);
 		}
 		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl))
 		  && n->u.map_op != OMP_MAP_ATTACH
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 40bf586..7f55cfd 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7634,37 +7634,6 @@ find_decl_expr (tree *tp, int *walk_subtrees, void *data)
   return NULL_TREE;
 }
 
-static void
-demote_firstprivate_pointer (tree decl, gimplify_omp_ctx *ctx)
-{
-  if (!lang_GNU_Fortran ())
-return;
-
-  while (ctx)
-{
-  if (ctx->region_type == ORT_ACC_PARALLEL
-	  || ctx->region_type == ORT_ACC_KERNELS)
-	break;
-  ctx = ctx->outer_context;
-}
-
-  if (ctx == NULL)
-return;
-
-  tree clauses = ctx->clauses;
-
-  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
-{
-  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-	  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
-	  && OMP_CLAUSE_DECL (c) == decl)
-	{
-	  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_POINTER);
-	  return;
-	}
-}
-}
-
 /* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
the struct node to insert the new mapping after (when the struct node is
@@ -7843,7 +7812,7 @@ gimplify_scan_omp

Re: C++ PATCH to implement P1094R2, Nested inline namespaces

2018-11-20 Thread Jason Merrill

On 11/19/18 5:12 PM, Marek Polacek wrote:

On Mon, Nov 19, 2018 at 10:33:17PM +0100, Jakub Jelinek wrote:

On Mon, Nov 19, 2018 at 04:21:19PM -0500, Marek Polacek wrote:

2018-11-19  Marek Polacek  

Implement P1094R2, Nested inline namespaces.
* g++.dg/cpp2a/nested-inline-ns1.C: New test.
* g++.dg/cpp2a/nested-inline-ns2.C: New test.
* g++.dg/cpp2a/nested-inline-ns3.C: New test.


Just a small testsuite comment.


--- /dev/null
+++ gcc/testsuite/g++.dg/cpp2a/nested-inline-ns1.C
@@ -0,0 +1,26 @@
+// P1094R2
+// { dg-do compile { target c++2a } }


Especially because 2a testing isn't included by default, but also
to make sure it works right even with -std=c++17, wouldn't it be better to
drop the nested-inline-ns3.C test, make this test c++17 or
even better always enabled, add dg-options "-Wpedantic" and
just add dg-warning directives with c++17_down and c++14_down for what should be
warned on the 3 lines (with .-1 for c++14_down)?
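
(For reference, a test of the shape Jakub describes would look roughly like
this sketch; it is not the actual test, the c++17 pedwarn wording is assumed
from the existing nested-namespace diagnostic, and the c++2a wording is the
one added by the patch:)

// { dg-do compile }
// { dg-options "-Wpedantic" }

namespace A::inline B::C { int i; }
// { dg-warning "nested inline namespace definitions only available with" "c++2a" { target c++17_down } .-1 }
// { dg-warning "nested namespace definitions only available with" "c++17" { target c++14_down } .-2 }

int j = A::B::C::i;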

Or, if you want, add some further testcases that test how
c++17 etc. will dg-error on those with -pedantic-errors etc.


Sure, I've made it { target c++11 } and dropped the third test:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-11-19  Marek Polacek  

Implement P1094R2, Nested inline namespaces.
* parser.c (cp_parser_namespace_definition): Parse the optional inline
keyword in a nested-namespace-definition.  Adjust push_namespace call.
Formatting fix.

* g++.dg/cpp2a/nested-inline-ns1.C: New test.
* g++.dg/cpp2a/nested-inline-ns2.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index 292cce15676..f39e9d753d2 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -18872,6 +18872,7 @@ cp_parser_namespace_definition (cp_parser* parser)
cp_ensure_no_oacc_routine (parser);
  
bool is_inline = cp_lexer_next_token_is_keyword (parser->lexer, RID_INLINE);

+  const bool topmost_inline_p = is_inline;
  
if (is_inline)

  {
@@ -18890,6 +18891,17 @@ cp_parser_namespace_definition (cp_parser* parser)
  {
identifier = NULL_TREE;

+  bool nested_inline_p = cp_lexer_next_token_is_keyword (parser->lexer,

+RID_INLINE);
+  if (nested_inline_p && nested_definition_count != 0)
+   {
+ if (cxx_dialect < cxx2a)
+   pedwarn (cp_lexer_peek_token (parser->lexer)->location,
+OPT_Wpedantic, "nested inline namespace definitions only "
+"available with -std=c++2a or -std=gnu++2a");
+ cp_lexer_consume_token (parser->lexer);
+   }


This looks like we won't get any diagnostic in lower conformance modes 
if there are multiple namespace scopes before the inline keyword.



if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
{
  identifier = cp_parser_identifier (parser);
@@ -18904,7 +18916,12 @@ cp_parser_namespace_definition (cp_parser* parser)
}
  
if (cp_lexer_next_token_is_not (parser->lexer, CPP_SCOPE))

-   break;
+   {
+ /* Don't forget that the innermost namespace might have been
+marked as inline.  */
+ is_inline |= nested_inline_p;


This looks wrong: an inline namespace does not make its nested 
namespaces inline as well.


Jason
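
(To illustrate Jason's point with an example that is not part of the patch:
in a nested-namespace-definition, inline applies only to the namespace it
directly precedes.)

namespace A::inline B::C   // B is inline within A; C is *not* inline within B
{
  int x;
}

int ok = A::C::x;      // OK: B is inline, so C is visible as A::C
// int bad = A::B::x;  // error: C is not inline, x is not visible as A::B::x
// int bad2 = A::x;    // error for the same reason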


Re: [PATCH] v3: C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-20 Thread David Malcolm
On Tue, 2018-11-20 at 02:46 +, Joseph Myers wrote:
> On Mon, 19 Nov 2018, David Malcolm wrote:
> 
> > +/* C implementation of same_type_p.
> > +   Returns true iff TYPE1 and TYPE2 are the same type, in the
> > usual
> > +   sense of `same'.  */
> > +
> > +bool
> > +same_type_p (tree type1, tree type2)
> > +{
> > +  return comptypes (type1, type2) == 1;
> > +}
> 
> I don't think "compatible" and "same" are the same concept.  Normally
> in C 
> you'd be concerned with compatibility; "same type" is only used for
> the 
> rule on duplicate typedefs, which uses
> comptypes_check_different_types.

The purpose here is to be able to offer fix-it hints for bogus code
that's missing an '&' or a '*' prefix, and have that code live in c-
common.c.

Jason wanted to avoid a pointer-equality test for types by using
same_type_p to look through enums - but same_type_p is C++-specific.

Should I do:

(a) something like this for C:

/* C implementation of same_type_p.
   Returns true iff TYPE1 and TYPE2 are the same type, or are
   compatible enough to be permitted in C11 typedef redeclarations.  */

bool
same_type_p (tree type1, tree type2)
{
  bool different_types_p = false;
  int result = comptypes_check_different_types (type1, type2,
&different_types_p);

  if (result == 1 && !different_types_p)
return true;

  return false;   
}

(b) provide a same_type_p for C that e.g. simply does pointer equality,

(c) add a newly named function (e.g. "compatible_types_p", as C++ has a
comptypes, but it has a 3rd param), or

(d) fall back to simply doing pointer equality.


Thanks
Dave
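
(For concreteness, the kind of bogus code the fix-it hints target; this is an
illustration, not a testcase from the patch:)

void set_flag (int *out);

void
caller (int flag)
{
  set_flag (flag);  /* rejected in C++, warned about in C; the fix-it
                       would suggest inserting '&' before 'flag' */
}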



Re: [C++ PATCH] Fix ICE in constexpr OBJ_TYPE_REF handling (PR c++/88110)

2018-11-20 Thread Jason Merrill
OK.
On Tue, Nov 20, 2018 at 3:51 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The comment in OBJ_TYPE_REF handling code correctly says that we are
> looking for x.D.2103.D.2094, but it is important that x is not an
> INDIRECT_REF or something similar as in the following testcase - we can't
> really devirtualize in that case because we really don't know what it points
> to.  The following patch ensures that the argument got evaluated to address
> of some field of (ultimately) a decl, which is all we should get during
> valid constexpr evaluation.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-11-20  Jakub Jelinek  
>
> PR c++/88110
> * constexpr.c (cxx_eval_constant_expression) : Punt
> if get_base_address of ADDR_EXPR operand is not a DECL_P.
>
> * g++.dg/cpp2a/constexpr-virtual13.C: New test.
>
> --- gcc/cp/constexpr.c.jj   2018-11-19 14:24:49.0 +0100
> +++ gcc/cp/constexpr.c  2018-11-20 15:03:26.968152935 +0100
> @@ -4815,7 +4815,8 @@ cxx_eval_constant_expression (const cons
> obj = cxx_eval_constant_expression (ctx, obj, lval, non_constant_p,
> overflow_p);
> /* We expect something in the form of &x.D.2103.D.2094; get x. */
> -   if (TREE_CODE (obj) != ADDR_EXPR)
> +   if (TREE_CODE (obj) != ADDR_EXPR
> +   || !DECL_P (get_base_address (TREE_OPERAND (obj, 0
>   {
> if (!ctx->quiet)
>   error_at (cp_expr_loc_or_loc (t, input_location),
> --- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual13.C.jj 2018-11-20 
> 15:07:17.558386765 +0100
> +++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual13.C2018-11-20 
> 15:05:30.188140420 +0100
> @@ -0,0 +1,20 @@
> +// PR c++/88110
> +// { dg-do compile }
> +
> +struct A {
> +  virtual int foo () const = 0;
> +};
> +struct B {
> +  virtual int bar () const = 0;
> +  virtual int baz () const = 0;
> +};
> +struct C : public A { };
> +struct D : public C { };
> +struct E : public D, public B { };
> +
> +void
> +qux (const E *x)
> +{
> +  if (x->baz ())
> +;
> +}
>
> Jakub


Re: [PATCH] v3: C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-20 Thread Joseph Myers
On Tue, 20 Nov 2018, David Malcolm wrote:

> Should I do:

You should do whatever is appropriate for the warning in question.  But if 
what's appropriate for the warning in question includes types that are 
compatible but not the same, the comments need to avoid saying it's about 
the types being the same.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, libstdc++] Implement P0415 More constexpr for std::complex.

2018-11-20 Thread Ed Smith-Rowland

On 11/19/18 6:13 AM, Jonathan Wakely wrote:

On 16/11/18 19:39 -0500, Ed Smith-Rowland wrote:

@@ -322,67 +323,43 @@
  //@{
  ///  Return new complex value @a x plus @a y.
  template<typename _Tp>
-    inline complex<_Tp>
+    inline _GLIBCXX20_CONSTEXPR complex<_Tp>
    operator+(const complex<_Tp>& __x, const complex<_Tp>& __y)
-    {
-  complex<_Tp> __r = __x;
-  __r += __y;
-  return __r;
-    }
+    { return complex<_Tp>(__x.real() + __y.real(), __x.imag() + __y.imag()); }


Is this change (and all the similar ones) really needed?

Doesn't the fact that all the constructors and member operators of
std::complex are constexpr mean that the original definition is also
valid in a constexpr function?

These changes are rolled back. Sorry.

@@ -1163,50 +1143,43 @@
#endif

  template<typename _Tp>
-    complex&
+    _GLIBCXX20_CONSTEXPR complex&
    operator=(const complex<_Tp>&  __z)
{
-  __real__ _M_value = __z.real();
-  __imag__ _M_value = __z.imag();
+  _M_value = __z.__rep();


These changes look OK, but I wonder if we shouldn't ask the compiler
to make it possible to use __real__ and __imag__ in constexpr
functions instead.

I assume it doesn't, and that's why you made this change. But if it
Just Worked, and the other changes I commented on above are also
unnecessary, then this patch would *mostly* just be adding
_GLIBCXX20_CONSTEXPR which is OK for stage 3 (as it doesn't affect any
dialects except C++2a).


Yes, this is the issue.  I agree that constexpr __real__, __imag__ would
be better.


Do you have any idea where this change would be?  I grepped around a 
little and couldn't figure it out.  If you don't, I'll look more.


Actually, looking at constexpr.c it looks like the old way ought to work...

OK, plain assignment works but not the others.  Interesting.




@@ -1872,7 +1831,7 @@
    { return _Tp(); }

  template<typename _Tp>
-    inline typename __gnu_cxx::__promote<_Tp>::__type
+    _GLIBCXX_CONSTEXPR inline typename __gnu_cxx::__promote<_Tp>::__type


This should be _GLIBCXX20_CONSTEXPR.

Done.
Index: testsuite/26_numerics/complex/comparison_operators/more_constexpr.cc
===
--- testsuite/26_numerics/complex/comparison_operators/more_constexpr.cc	(nonexistent)
+++ testsuite/26_numerics/complex/comparison_operators/more_constexpr.cc	(working copy)

@@ -0,0 +1,51 @@
+// { dg-do compile { target c++2a } }


All the tests with { target c++2a} should also have:

// { dg-options "-std=gnu++2a" }

Because otherwise they are skipped by default, and only get run when
RUNTESTFLAGS explicitly includes something like
--target_board=unix/-std=gnu++2a

The dg-options needs to come first, or it doesn't apply before the
check for { target c++2a }.


Thank you, done.

Updated patch attached.  I'd like to understand why

    __real__ _M_value += __z.real();

doesn't work though.

Ed
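
(For reference, a minimal usage sketch of the user-level P0415 functionality
the patch targets, assuming -std=gnu++2a and a library with the patch applied;
this is not one of the new tests:)

#include <complex>

constexpr std::complex<double>
twice (std::complex<double> z)
{
  z += z;   // the constexpr member operator+= is what P0415 adds
  return z;
}

static_assert (twice (std::complex<double> (1.0, 2.0))
	       == std::complex<double> (2.0, 4.0), "");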


Index: include/std/complex
===
--- include/std/complex (revision 266251)
+++ include/std/complex (working copy)
@@ -70,10 +70,11 @@
   ///  Return phase angle of @a z.
   template<typename _Tp> _Tp arg(const complex<_Tp>&);
   ///  Return @a z magnitude squared.
-  template<typename _Tp> _Tp norm(const complex<_Tp>&);
+  template<typename _Tp> _Tp _GLIBCXX20_CONSTEXPR norm(const complex<_Tp>&);
 
   ///  Return complex conjugate of @a z.
-  template<typename _Tp> complex<_Tp> conj(const complex<_Tp>&);
+  template<typename _Tp>
+    _GLIBCXX20_CONSTEXPR complex<_Tp> conj(const complex<_Tp>&);
   ///  Return complex with magnitude @a rho and angle @a theta.
   template<typename _Tp> complex<_Tp> polar(const _Tp&, const _Tp& = 0);
 
@@ -169,18 +170,18 @@
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // DR 387. std::complex over-encapsulated.
-  void
+  _GLIBCXX20_CONSTEXPR void
   real(_Tp __val) { _M_real = __val; }
 
-  void
+  _GLIBCXX20_CONSTEXPR void
   imag(_Tp __val) { _M_imag = __val; }
 
   /// Assign a scalar to this complex number.
-  complex<_Tp>& operator=(const _Tp&);
+  _GLIBCXX20_CONSTEXPR complex<_Tp>& operator=(const _Tp&);
 
   /// Add a scalar to this complex number.
   // 26.2.5/1
-  complex<_Tp>&
+  _GLIBCXX20_CONSTEXPR complex<_Tp>&
   operator+=(const _Tp& __t)
   {
_M_real += __t;
@@ -189,7 +190,7 @@
 
   /// Subtract a scalar from this complex number.
   // 26.2.5/3
-  complex<_Tp>&
+  _GLIBCXX20_CONSTEXPR complex<_Tp>&
   operator-=(const _Tp& __t)
   {
_M_real -= __t;
@@ -197,30 +198,30 @@
   }
 
   /// Multiply this complex number by a scalar.
-  complex<_Tp>& operator*=(const _Tp&);
+  _GLIBCXX20_CONSTEXPR complex<_Tp>& operator*=(const _Tp&);
   /// Divide this complex number by a scalar.
-  complex<_Tp>& operator/=(const _Tp&);
+  _GLIBCXX20_CONSTEXPR complex<_Tp>& operator/=(const _Tp&);
 
   // Let the compiler synthesize the copy assignment operator
 #if __cplusplus >= 201103L
-  complex& operator=(const complex&) = defau

Re: [PATCH] Do not mix -fsanitize=thread and -mabi=ms (PR sanitizer/88017).

2018-11-20 Thread Jeff Law
On 11/20/18 5:22 AM, Martin Liška wrote:
> Hi.
> 
> It's very similar to what I did few days ago for -fsanitize=address and 
> -mabi=ms.
> 
> Patch survives tests on x86_64-linux-gnu and bootstraps.
> 
> Ready for trunk?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-11-20  Martin Liska  
> 
>   PR sanitizer/88017
>   * config/i386/i386.c (ix86_option_override_internal):
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-11-20  Martin Liska  
> 
>   PR sanitizer/88017
>   * gcc.dg/tsan/pr88017.c: New test.
OK
jeff


Re: [PATCH][driver] Ensure --help=params lines end with period

2018-11-20 Thread Jeff Law
On 11/20/18 4:51 AM, Tom de Vries wrote:
> Hi,
> 
> this patch ensures that gcc --help=params lines end with a period by:
> - fixing the help message of param HOT_BB_COUNT_FRACTION, and
> - adding a test-case.
> 
> Build and tested on x86_64.
> 
> OK for trunk?
> 
> Thanks,
> - Tom
> 
> [driver] Ensure --help=params lines end with period
> 
> 2018-11-20  Tom de Vries  
> 
>   PR c/79855
>   * params.def (HOT_BB_COUNT_FRACTION): Terminate help message with
>   period.
> 
>   * lib/options.exp (check_for_options_with_filter): New proc.
>   * gcc.misc-tests/help.exp: Check that --help=params lines end with
>   period.
OK
jeff


Re: [PATCH, middle-end]: Fix PR 88070, ICE in create_pre_exit, at mode-switching.c:438

2018-11-20 Thread Jeff Law
On 11/19/18 12:58 PM, Uros Bizjak wrote:
> Hello!
> 
> The assert in create_pre_exit at mode-switching.c expects return copy
> pair with nothing in between. However, the compiler starts mode
> switching pass with the following sequence:
> 
> (insn 19 18 16 2 (set (reg:V2SF 21 xmm0)
> (mem/c:V2SF (plus:DI (reg/f:DI 7 sp)
> (const_int -72 [0xffb8])) [0  S8 A64]))
> "pr88070.c":8 1157 {*movv2sf_internal}
>  (nil))
> (insn 16 19 20 2 (set (reg:V2SF 0 ax [orig:91  ] [91])
> (reg:V2SF 0 ax [89])) "pr88070.c":8 1157 {*movv2sf_internal}
>  (nil))
> (insn 20 16 21 2 (unspec_volatile [
> (const_int 0 [0])
> ] UNSPECV_BLOCKAGE) "pr88070.c":8 710 {blockage}
>  (nil))
> (insn 21 20 23 2 (use (reg:V2SF 21 xmm0)) "pr88070.c":8 -1
>  (nil))
So I know there's an updated patch.  But I thought it might be worth
mentioning that insn 16 here appears to be a nop-move.   Removing it
might address this instance of the problem, but I doubt it's general
enough to address any larger issues.

You still might want to investigate why it's still in the IL.

Jeff


Re: [PATCH, middle-end]: Fix PR 88070, ICE in create_pre_exit, at mode-switching.c:438

2018-11-20 Thread Jeff Law
On 11/20/18 3:24 AM, Uros Bizjak wrote:
> On Tue, Nov 20, 2018 at 8:59 AM Eric Botcazou  wrote:
>>
>>> The blockage was introduced as a fix for PR14381 [1] in r79265 [2].
>>> Later, the blockage was moved after return label as a fix for PR25176
>>> [3] in r107871 [4].
>>>
>>> After that, r122626 [5] moves the blockage after the label for the
>>> naked return from the function. Relevant posts from gcc-patches@ ML
>>> are at [6], [7]. However, in the posts, there are no concrete
>>> examples, how scheduler moves instructions from different BB around
>>> blockage insn, the posts just show that there is a jump around
>>> blockage when __builtin_return is used. I was under impression that
>>> scheduler is unable to move instructions over BB boundaries.
>>
>> The scheduler works on extended basic blocks.  The [7] post gives a rather
>> convincing explanation and there is a C++ testcase under PR rtl-opt/14381.
>>
>>> A mystery is the tree-ssa merge [8] that copies back the hunk, moved
>>> in r122626 [5] to its original position. From this revision onwards,
>>> we emit two blockages.
>>
>> It's the dataflow merge, not the tree-ssa merge.  The additional blockage
>> might be needed for DF.
>>
>> Given that the current PR is totally artificial, I think that we need to be
>> quite conservative and only do something on mainline.  And even there I'd be
>> rather conservative and remove the kludge only for targets that emit unwind
>> information in the epilogue (among which there is x86 I presume).
> 
> Hm, I think I'll rather go with somehow target-dependent patch:
> 
> --cut here--
> diff --git a/gcc/mode-switching.c b/gcc/mode-switching.c
> index 370a49e90a9c..de75efe2b6c9 100644
> --- a/gcc/mode-switching.c
> +++ b/gcc/mode-switching.c
> @@ -252,7 +252,21 @@ create_pre_exit (int n_entities, int *entity_map,
> const int *num_modes)
> if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
> && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
> && GET_CODE (PATTERN (last_insn)) == USE
> -   && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
> +   && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
> +
> +   /* x86 targets use mode-switching infrastructure to
> +  conditionally insert vzeroupper instruction at the exit
> +  from the function and there is no need to switch the
> +  mode before the return value copy.  The vzeroupper insertion
> +  pass runs after reload, so use !reload_completed as a stand-in
> +  for x86 to skip the search for return value copy insn.
Note that the GCN target may well end up needing a late mode switching
pass -- it's got kind of an inverse problem to solve -- where to place
initializations of the exec register which is needed when we want to do
scalar ops in a simd unit.

I thought the SH used mode switching as well.  But I can't recall if it
was run before register allocation & reload.

jeff




Re: [PATCH][libbacktrace] Factor out read_string

2018-11-20 Thread Jeff Law
On 11/15/18 9:02 AM, Tom de Vries wrote:
> Hi,
> 
> This patch factors out new function read_string in dwarf.c.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk (or, for stage1)?
> 
> Thanks,
> - Tom
> 
> [libbacktrace] Factor out read_string
> 
> 2018-11-15  Tom de Vries  
> 
>   * dwarf.c (read_string): Factor out of ...
>   (read_attribute, read_line_header, read_line_program): ... here.
OK
jeff
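
(For readers without the patch at hand, the factored-out helper is essentially
of the following shape; this is a stand-alone sketch with simplified
parameters, while the real read_string operates on libbacktrace's dwarf_buf
and reports errors through its callback:)

#include <string.h>

/* Read a NUL-terminated string from *PP, which has *LEFT bytes left,
   and advance past it.  Returns NULL if the string is not terminated
   within the buffer; the real code reports that through the dwarf_buf
   error callback instead.  */

static const char *
read_string_sketch (const unsigned char **pp, size_t *left)
{
  const char *start = (const char *) *pp;
  const char *nul = (const char *) memchr (start, '\0', *left);

  if (nul == NULL)
    return NULL;

  size_t len = (size_t) (nul - start) + 1;	/* include the NUL */
  *pp += len;
  *left -= len;
  return start;
}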

