[openmp] Fix openmp 5.0 builtin function types

2018-12-12 Thread Tom de Vries
[ was: Re: [committed 1a/4] (Partial) OpenMP 5.0 support for GCC 9 (gcc
middle-end) ]

On 08-11-18 18:24, Jakub Jelinek wrote:
>   * omp-builtins.def (BUILT_IN_GOMP_TASKWAIT_DEPEND,
>   BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_START, BUILT_IN_GOMP_LOOP_ORDERED_START,
>   BUILT_IN_GOMP_LOOP_DOACROSS_START,
>   BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_NEXT,
>   BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_NEXT,
>   BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_ULL_START, BUILT_IN_GOMP_LOOP_ULL_ORDERED_START,
>   BUILT_IN_GOMP_LOOP_ULL_DOACROSS_START,
>   BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_NEXT,
>   BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_NEXT,
>   BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME,
>   BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME,
>   BUILT_IN_GOMP_PARALLEL_REDUCTIONS, BUILT_IN_GOMP_SECTIONS2_START,
>   BUILT_IN_GOMP_TEAMS_REG, BUILT_IN_GOMP_TASKGROUP_REDUCTION_REGISTER,
>   BUILT_IN_GOMP_TASKGROUP_REDUCTION_UNREGISTER,
>   BUILT_IN_GOMP_TASK_REDUCTION_REMAP,
>   BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER): New builtins.

Hi,

OK for trunk?

Thanks,
- Tom
[openmp] Fix openmp 5.0 builtin function types

Fix some openmp 5.0 builtin functions to match the type used in the
implementation of those functions.

This fixes some libgomp testsuite failures for x86_64 with nvptx accelerator.

Build on x86_64 with nvptx accelerator, tested libgomp.

2018-12-12  Tom de Vries  

	* omp-builtins.def
	(BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START)
	(BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START)
	(BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME): Fix
	function type.

---
 gcc/omp-builtins.def | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 70051635fa0..6e22065461a 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -130,7 +130,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_START,
 		  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START,
 		  "GOMP_loop_maybe_nonmonotonic_runtime_start",
-		  BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
+		  BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
 		  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ORDERED_STATIC_START,
 		  "GOMP_loop_ordered_static_start",
@@ -238,7 +238,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_START,
 		  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START,
 		  "GOMP_loop_ull_maybe_nonmonotonic_runtime_start",
-		  BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
+		  BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULLPTR_ULLPTR,
 		  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_ORDERED_STATIC_START,
 		  "GOMP_loop_ull_ordered_static_start",
@@ -353,7 +353,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME,
 		  "GOMP_parallel_loop_maybe_nonmonotonic_runtime",
-		  BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
+		  BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_END, "GOMP_loop_end",
 		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)


[libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

2018-12-12 Thread Tom de Vries
[ was: Re: [committed 2/4] (Partial) OpenMP 5.0 support for GCC 9
(runtime library changes) ]

On 08-11-18 18:18, Jakub Jelinek wrote:
> Hi!
> 
> This is the libgomp/ except libgomp/testsuite/ part of the gomp-5_0-branch
> merge to trunk I've just committed.
> 
> 2018-11-08  Jakub Jelinek  
> 
>   * affinity.c (gomp_display_affinity_place): New function.
>   * affinity-fmt.c: New file.

>   * fortran.c: Include stdio.h and string.h.
>   (omp_pause_resource, omp_pause_resource_all): Add ialias_redirect.
>   (omp_get_schedule_, omp_get_schedule_8_): Mask off GFS_MONOTONIC bit.
>   (omp_set_affinity_format_, omp_get_affinity_format_,
>   omp_display_affinity_, omp_capture_affinity_, omp_pause_resource_,
>   omp_pause_resource_all_): New functions.

Hi,

OK for trunk?

Thanks,
- Tom
[libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

Disable compilation of support for OMP_DISPLAY_AFFINITY and OMP_AFFINITY_FORMAT
for nvptx.

This fixes some libgomp testsuite failures for x86_64 with nvptx accelerator.

Build on x86_64 with nvptx accelerator, tested libgomp.

2018-12-12  Tom de Vries  

	* affinity.c (gomp_display_affinity_place): Guard with
	#ifndef LIBGOMP_OFFLOADED_ONLY.
	* fortran.c (omp_set_affinity_format_, omp_get_affinity_format_)
	(omp_display_affinity_, omp_capture_affinity_): Same.
	* libgomp/config/nvptx/affinity-fmt.c: New, empty file overriding
	libgomp/affinity-fmt.c.

---
 libgomp/affinity.c  | 2 ++
 libgomp/config/nvptx/affinity-fmt.c | 0
 libgomp/fortran.c   | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/libgomp/affinity.c b/libgomp/affinity.c
index efe5c4260e3..dbfb8e7a75d 100644
--- a/libgomp/affinity.c
+++ b/libgomp/affinity.c
@@ -140,6 +140,7 @@ gomp_get_place_proc_ids_8 (int place_num, int64_t *ids)
   (void) ids;
 }
 
+#ifndef LIBGOMP_OFFLOADED_ONLY
 void
 gomp_display_affinity_place (char *buffer, size_t size, size_t *ret,
 			 int place)
@@ -151,6 +152,7 @@ gomp_display_affinity_place (char *buffer, size_t size, size_t *ret,
 strcpy (buf, "0");
   gomp_display_string (buffer, size, ret, buf, strlen (buf));
 }
+#endif
 
 ialias(omp_get_place_num_procs)
 ialias(omp_get_place_proc_ids)
diff --git a/libgomp/config/nvptx/affinity-fmt.c b/libgomp/config/nvptx/affinity-fmt.c
new file mode 100644
index 000..e69de29bb2d
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index 0157baec648..a577a2e515b 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -584,6 +584,7 @@ omp_get_max_task_priority_ (void)
   return omp_get_max_task_priority ();
 }
 
+#ifndef LIBGOMP_OFFLOADED_ONLY
 void
 omp_set_affinity_format_ (const char *format, size_t format_len)
 {
@@ -664,6 +665,7 @@ omp_capture_affinity_ (char *buffer, const char *format,
 memset (buffer + ret, ' ', buffer_len - ret);
   return ret;
 }
+#endif
 
 int32_t
 omp_pause_resource_ (const int32_t *kind, const int32_t *device_num)


Re: [C++ PATCH] Fix up __builtin_is_constant_evaluated handling in array type sizes (PR c++/88446, take 2)

2018-12-12 Thread Jakub Jelinek
On Tue, Dec 11, 2018 at 06:02:12PM -0500, Jason Merrill wrote:
> On 12/11/18 4:38 PM, Jakub Jelinek wrote:
> > On Tue, Dec 11, 2018 at 03:35:39PM -0500, Marek Polacek wrote:
> > > >   tree
> > > > -maybe_constant_value (tree t, tree decl)
> > > > +maybe_constant_value (tree t, tree decl, bool pretend_const_required)
> > > >   {
> > > > tree r;
> > > 
> > > Could you please describe the new param in the comment?
> > > 
> > > Perhaps use /*pretend_const_required=*/true?
> > 
> > Thanks, changed both, plus I've changed the other function's descriptions
> > of that argument from the P0595R1 term (required to be const-evaluated)
> > to the P0595R2 term (manifestly const-evaluated).
> > 
> > Ok for trunk?  Or should I also change pretend_const_required argument
> > name to manifestly_const_eval or something similar?
> 
> I think so, yes.  OK with that change.

Here is what I've committed after committing the other two patches.
Thanks for the review.

2018-12-12  Jakub Jelinek  

PR c++/88446
* cp-tree.h (maybe_constant_value): Add manifestly_const_eval
argument.
* constexpr.c (struct constexpr_call): Rename pretend_const_required
member to manifestly_const_eval.
(struct constexpr_ctx): Likewise.
(constexpr_call_hasher::equal): Adjust users.
(cxx_eval_builtin_function_call): Likewise.  Formatting fix.
(cxx_eval_call_expression): Adjust users.
(cxx_eval_outermost_constant_expr, maybe_constant_init_1,
maybe_constant_init): Rename pretend_const_required argument to
manifestly_const_eval, adjust function comments.
(maybe_constant_value): Add manifestly_const_eval argument.  If true,
don't cache and call cxx_eval_outermost_constant_expr with true as
manifestly_const_eval.
* decl.c (compute_array_index_type_loc): Call maybe_constant_value
with true as manifestly_const_eval.

* g++.dg/cpp2a/is-constant-evaluated3.C: New test.

--- gcc/cp/cp-tree.h.jj 2018-12-12 09:22:36.123193623 +0100
+++ gcc/cp/cp-tree.h2018-12-12 09:32:27.408535853 +0100
@@ -7663,7 +7663,7 @@ extern bool require_rvalue_constant_expr
 extern bool require_potential_rvalue_constant_expression (tree);
 extern tree cxx_constant_value (tree, tree = NULL_TREE);
 extern tree cxx_constant_init  (tree, tree = NULL_TREE);
-extern tree maybe_constant_value   (tree, tree = NULL_TREE);
+extern tree maybe_constant_value   (tree, tree = NULL_TREE, bool = 
false);
 extern tree maybe_constant_init(tree, tree = 
NULL_TREE, bool = false);
 extern tree fold_non_dependent_expr(tree, tsubst_flags_t = 
tf_warning_or_error);
 extern tree fold_simple(tree);
--- gcc/cp/constexpr.c.jj   2018-12-12 09:27:43.352175869 +0100
+++ gcc/cp/constexpr.c  2018-12-12 09:34:17.531736075 +0100
@@ -977,7 +977,7 @@ struct GTY((for_user)) constexpr_call {
  recalculate it when expanding the hash table.  */
   hashval_t hash;
   /* Whether __builtin_is_constant_evaluated() should evaluate to true.  */
-  bool pretend_const_required;
+  bool manifestly_const_eval;
 };
 
 struct constexpr_call_hasher : ggc_ptr_hash
@@ -1025,7 +1025,7 @@ struct constexpr_ctx {
  trying harder to get a constant value.  */
   bool strict;
   /* Whether __builtin_is_constant_evaluated () should be true.  */
-  bool pretend_const_required;
+  bool manifestly_const_eval;
 };
 
 /* A table of all constexpr calls that have been evaluated by the
@@ -1057,7 +1057,7 @@ constexpr_call_hasher::equal (constexpr_
 return true;
   if (lhs->hash != rhs->hash)
 return false;
-  if (lhs->pretend_const_required != rhs->pretend_const_required)
+  if (lhs->manifestly_const_eval != rhs->manifestly_const_eval)
 return false;
   if (!constexpr_fundef_hasher::equal (lhs->fundef, rhs->fundef))
 return false;
@@ -1206,11 +1206,11 @@ cxx_eval_builtin_function_call (const co
 }
 
   /* For __builtin_is_constant_evaluated, defer it if not
- ctx->pretend_const_required, otherwise fold it to true.  */
+ ctx->manifestly_const_eval, otherwise fold it to true.  */
   if (fndecl_built_in_p (fun, CP_BUILT_IN_IS_CONSTANT_EVALUATED,
-  BUILT_IN_FRONTEND))
+BUILT_IN_FRONTEND))
 {
-  if (!ctx->pretend_const_required)
+  if (!ctx->manifestly_const_eval)
{
  *non_constant_p = true;
  return t;
@@ -1508,7 +1508,7 @@ cxx_eval_call_expression (const constexp
   location_t loc = cp_expr_loc_or_loc (t, input_location);
   tree fun = get_function_named_in_call (t);
   constexpr_call new_call
-= { NULL, NULL, NULL, 0, ctx->pretend_const_required };
+= { NULL, NULL, NULL, 0, ctx->manifestly_const_eval };
   bool depth_ok;
 
   if (fun == NULL_TREE)
@@ -1684,7 +1684,7 @@ cxx_eval_call_expression (const constexp
   new_call.hash
= iterative_hash_template_arg

Re: [PATCH] libgcc: rs6000: tramp.S: fix placement of .cfi_endproc for __trampoline_setup

2018-12-12 Thread Alan Modra
On Wed, Dec 12, 2018 at 08:43:41AM +0100, Rasmus Villemoes wrote:
> diff --git a/libgcc/config/rs6000/tramp.S b/libgcc/config/rs6000/tramp.S
> index 637f4510146..a9f0f3826dc 100644
> --- a/libgcc/config/rs6000/tramp.S
> +++ b/libgcc/config/rs6000/tramp.S
> @@ -114,11 +114,10 @@ FUNC_START(__trampoline_setup)
>   addir30,r30,_GLOBAL_OFFSET_TABLE_-1b@l
>  #endif
>   bl  JUMP_TARGET(abort)
> +#endif
>   .cfi_endproc
>  FUNC_END(__trampoline_setup)
>  
> -#endif
> -
>  #elif _CALL_ELF == 2
>   .type   trampoline_initial,@object
>   .align  3

Looks good to me.  I hadn't noticed the bad #endif placement with
respect to FUNC_END when I added .cfi_endproc, which naturally goes
with FUNC_END.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [openmp] Fix openmp 5.0 builtin function types

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 09:25:06AM +0100, Tom de Vries wrote:
> [openmp] Fix openmp 5.0 builtin function types
> 
> Fix some openmp 5.0 builtin functions to match the type used in the
> implementation of those functions.
> 
> This fixes some libgomp testsuite failures for x86_64 with nvptx accelerator.
> 
> Build on x86_64 with nvptx accelerator, tested libgomp.
> 
> 2018-12-12  Tom de Vries  
> 
>   * omp-builtins.def
>   (BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START)
>   (BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START)
>   (BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME): Fix
>   function type.

Good catch, but it looks insufficient, the builtins without _maybe have the
same bug (copy and paste from the preceding dynamic/guided calls that do
have that extra argument - chunk_size), so if you've caught only these,
it seems we have insufficient testsuite covergate in the target regions.

Does the following testcase show also the remaining three bugs on unpatched
nvptx offloading (sorry, don't have nvptx offloading set up right now,
should fix that soon)?

2018-12-12  Tom de Vries  
Jakub Jelinek  

* omp-builtins.def (BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_START,
BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START,
BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_START,
BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START,
BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME,
BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME): Fix up
function types - remove one argument.

* testsuite/libgomp.c-c++-common/for-16.c: New test.

--- gcc/omp-builtins.def.jj 2018-11-08 18:07:56.345070635 +0100
+++ gcc/omp-builtins.def2018-12-12 10:03:14.355354282 +0100
@@ -126,11 +126,11 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_NON
  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_START,
  "GOMP_loop_nonmonotonic_runtime_start",
- BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
+ BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START,
  "GOMP_loop_maybe_nonmonotonic_runtime_start",
- BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
+ BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ORDERED_STATIC_START,
  "GOMP_loop_ordered_static_start",
@@ -234,11 +234,11 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL
  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_START,
  "GOMP_loop_ull_nonmonotonic_runtime_start",
- BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
+ BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULLPTR_ULLPTR,
  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START,
  "GOMP_loop_ull_maybe_nonmonotonic_runtime_start",
- BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
+ BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULLPTR_ULLPTR,
  ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_ORDERED_STATIC_START,
  "GOMP_loop_ull_ordered_static_start",
@@ -349,11 +349,11 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL
  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME,
  "GOMP_parallel_loop_nonmonotonic_runtime",
- BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
+ BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME,
  "GOMP_parallel_loop_maybe_nonmonotonic_runtime",
- BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT,
+ BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_END, "GOMP_loop_end",
  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
--- libgomp/testsuite/libgomp.c-c++-common/for-16.c.jj  2018-12-12 
10:19:50.640074341 +0100
+++ libgomp/testsuite/libgomp.c-c++-common/for-16.c 2018-12-12 
10:19:22.100540708 +0100
@@ -0,0 +1,114 @@
+extern
+#ifdef __cplusplus
+"C"
+#endif
+void abort (void);
+
+unsigned long long int k = 16;
+#pragma omp declare target to (k)
+
+int
+main ()
+{
+  unsigned char a[144], b[144], c[144];
+  int l;
+  #pragma omp target map(from:a, b, c)
+  {
+int i;
+unsigned long long int j;
+#pragma omp parallel for schedule (runtime)
+for (i = 0; i < 16; i++)
+  a[i] = i;
+#pragma omp parallel for schedule (monotonic: runtime)
+for (i = 0; i < 16; i++)
+  a[i + 16] = i + 16;
+#

Re: [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 09:29:36AM +0100, Tom de Vries wrote:
> > This is the libgomp/ except libgomp/testsuite/ part of the gomp-5_0-branch
> > merge to trunk I've just committed.
> > 
> > 2018-11-08  Jakub Jelinek  
> > 
> > * affinity.c (gomp_display_affinity_place): New function.
> > * affinity-fmt.c: New file.
> 
> > * fortran.c: Include stdio.h and string.h.
> > (omp_pause_resource, omp_pause_resource_all): Add ialias_redirect.
> > (omp_get_schedule_, omp_get_schedule_8_): Mask off GFS_MONOTONIC bit.
> > (omp_set_affinity_format_, omp_get_affinity_format_,
> > omp_display_affinity_, omp_capture_affinity_, omp_pause_resource_,
> > omp_pause_resource_all_): New functions.
> 
> OK for trunk?
> 
> Thanks,
> - Tom

> [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support
> 
> Disable compilation of support for OMP_DISPLAY_AFFINITY and 
> OMP_AFFINITY_FORMAT
> for nvptx.
> 
> This fixes some libgomp testsuite failures for x86_64 with nvptx accelerator.
> 
> Build on x86_64 with nvptx accelerator, tested libgomp.
> 
> 2018-12-12  Tom de Vries  
> 
>   * affinity.c (gomp_display_affinity_place): Guard with
>   #ifndef LIBGOMP_OFFLOADED_ONLY.
>   * fortran.c (omp_set_affinity_format_, omp_get_affinity_format_)
>   (omp_display_affinity_, omp_capture_affinity_): Same.
>   * libgomp/config/nvptx/affinity-fmt.c: New, empty file overriding
>   libgomp/affinity-fmt.c.

The runtime APIs should still be available inside of OpenMP regions, your
patch makes them unavailable.
What exactly doesn't work?
hostname handling, getpid, thread_id, affinity places?
I'd prefer for now to just stub what doesn't work in the LIBGOMP_OFFLOADED_ONLY
case (print "node", 0 as pid, 0 as thread id, something for affinity
(0 or 0-N where N is the number of warps it can support or whatever).

Eventually we need to find a way how to transfer some ICVs and other data
from the host to the offloading library, either process shared that aren't
changing, which can include the hostname, getpid, or some others that would
need to be passed for every target region.

Jakub


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Christophe Lyon
On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
 wrote:
>
> Dimitar Dimitrov  writes:
> > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford wrote:
> >> Dimitar Dimitrov  writes:
> >> > I have tested this fix on x86_64 host, and found no regression in the C
> >> > and C++ testsuites.  I'm marking this patch as RFC simply because I don't
> >> > have experience with other architectures, and I don't have a setup to
> >> > test all architectures supported by GCC.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> > 2018-12-07  Dimitar Dimitrov  
> >> >
> >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> >> >error when stack pointer is clobbered.
> >> >(expand_asm_stmt): Refactor clobber check in separate function.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> > 2018-12-07  Dimitar Dimitrov  
> >> >
> >> >* gcc.target/i386/pr52813.c: New test.
> >> >
> >> > Signed-off-by: Dimitar Dimitrov 
> >>
> >> LGTM.  Do you have a copyright assignment on file?  'Fraid this is
> >> probably big enough to need one.
> > Yes, I have copyright assignment.
>
> OK, great.  I went ahead and applied the patch.
>

Hi,

This patch introduces a regression on arm:
FAIL: gcc.target/arm/pr77904.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
register clobbered by 'sp' in 'asm'

Indeed the testcase has an explicit:
  __asm volatile ("" : : : "sp");
which is now rejected.

Thomas, is that mandatory to test your code to fix pr77904?

Thanks,

Christophe

> Thanks,
> Richard


Re: [PATCH] x86: Add -march=cascadelake

2018-12-12 Thread Wei Xiao
Hi Uros and other reviewers,

I'd like to split the work into 2 parts:
1) Basic processor enabling.
2) Processor type dynamic check.

Let's use a separate patch to implement the part 2.
The part 1 is implemented by attached patch.
Is it ok for trunk?

Wei

gcc/
  * common/config/i386/i386-common.c (processor_names): Add cascadelake.
  (processor_alias_table): Add cascadelake.
  * config.gcc: Add -march=cascadelake.
  * config/i386/i386-c.c (ix86_target_macros_internal): Handle cascadelake.
  * config/i386/i386.c (Add m_CASCADELAKE): New.
  (processor_cost_table): Add cascadelake.
  (get_builtin_code_for_version): Handle cascadelake.
  * config/i386/i386.h (TARGET_CASCADELAKE, PROCESSOR_CASCADELAKE): New.
  (PTA_CASCADELAKE): Ditto.
  * doc/invoke.texi: Add -march=cascadelake.

gcc/testsuite/
  * gcc.target/i386/funcspec-56.inc: Handle new march.
Wei Xiao  于2018年11月29日周四 下午4:32写道:
>
> Hi
>
> Distinguish based on stepping number is not recommended for some reasons:
> 1) Intel doesn't officially disclose stepping information in SDM.
> 2) Stepping can be changing in the future.
>
> We still prefer the conventional distinguish approach based on feature bits.
> I have refined the patch as attached according to all your suggestions.
>
> Wei
>
> gcc/
> * common/config/i386/i386-common.c (processor_names): Add cascadelake.
> (processor_alias_table): Add cascadelake.
> * config.gcc: Add -march=cascadelake.
> * config/i386/driver-i386.c
> (host_detect_local_cpu): Detect cascadelake.
> * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> cascadelake.
> * config/i386/i386.c (ix86_cost): Add m_CASCADELAKE.
> (processor_cost_table): Add cascadelake.
> (get_builtin_code_for_version): Handle cascadelake.
> (fold_builtin_cpu): Ditto.
> * config/i386/i386.h (TARGET_CASCADELAKE, PROCESSOR_CASCADELAKE): New.
> (PTA_CASCADELAKE): Ditto.
> * doc/extend.texi: Add cascadelake.
> * doc/invoke.texi: Add -march=cascadelake.
> gcc/testsuite/
> * g++.target/i386/mv16.C: Handle new march.
> * gcc.target/i386/builtin_target.c: Ditto.
> * gcc.target/i386/funcspec-56.inc: Ditto.
> libgcc/
> * config/i386/cpuinfo.c (get_intel_cpu): Handle cascadelake.
> * config/i386/cpuinfo.h: Add INTEL_COREI7_CASCADELAKE.
> Wei Xiao  于2018年11月27日周二 下午6:40写道:
> >
> > Thanks for the helpful information!
> > But I'm still checking with hardware team about the
> > family/model/stepping numbers for Cascadelake which are not officially
> > disclosed by Intel (to my best knowledge).
> >
> > Wei
> > Martin Liška  于2018年11月26日周一 下午10:00写道:
> > >
> > > On 11/26/18 12:18 PM, Jakub Jelinek wrote:
> > > > On Mon, Nov 26, 2018 at 12:03:53PM +0100, Martin Liška wrote:
> > > >>> For Cascade Lake the model number is the same as Skylake Server,
> > > >>> it can only be distinguished based on the stepping (5 vs 4)
> > > >>
> > > >> Very interesting, probably the first time a distinguish is based on 
> > > >> stepping number?
> > > >
> > > > Wouldn't it be better to distinguish it based on availability of VNNI, 
> > > > like
> > > > we do for unknown family/model?
> > > >
> > > >>> Like gcc -mcpu=native needs to learn about this.
> > > >>
> > > >> I'm attaching patch that does that. Note that it's completely untested 
> > > >> as I don't have
> > > >> access to any of the new machines (Skylake server).
> > >
> > > Would be possible, the only ugly place would be in 
> > > libgcc/config/i386/cpuinfo.c where we
> > > call:
> > >
> > >   get_intel_cpu (family, model, stepping, brand_id);
> > >   /* Find available features. */
> > >   get_available_features (ecx, edx, max_level, &avx512_vnni);
> > >
> > > one would need a feature to distinguish CPU model. Do we really want that?
> > >
> > > Martin
> > >
> > > >
> > > >   Jakub
> > > >
> > >


cascadelake-v4.diff
Description: Binary data


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Christophe Lyon
On Wed, 12 Dec 2018 at 10:42, Christophe Lyon
 wrote:
>
> On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
>  wrote:
> >
> > Dimitar Dimitrov  writes:
> > > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford wrote:
> > >> Dimitar Dimitrov  writes:
> > >> > I have tested this fix on x86_64 host, and found no regression in the C
> > >> > and C++ testsuites.  I'm marking this patch as RFC simply because I 
> > >> > don't
> > >> > have experience with other architectures, and I don't have a setup to
> > >> > test all architectures supported by GCC.
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> > 2018-12-07  Dimitar Dimitrov  
> > >> >
> > >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> > >> >error when stack pointer is clobbered.
> > >> >(expand_asm_stmt): Refactor clobber check in separate function.
> > >> >
> > >> > gcc/testsuite/ChangeLog:
> > >> >
> > >> > 2018-12-07  Dimitar Dimitrov  
> > >> >
> > >> >* gcc.target/i386/pr52813.c: New test.
> > >> >
> > >> > Signed-off-by: Dimitar Dimitrov 
> > >>
> > >> LGTM.  Do you have a copyright assignment on file?  'Fraid this is
> > >> probably big enough to need one.
> > > Yes, I have copyright assignment.
> >
> > OK, great.  I went ahead and applied the patch.
> >
>
> Hi,
>
> This patch introduces a regression on arm:
> FAIL: gcc.target/arm/pr77904.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
> register clobbered by 'sp' in 'asm'
>
> Indeed the testcase has an explicit:
>   __asm volatile ("" : : : "sp");
> which is now rejected.
>
> Thomas, is that mandatory to test your code to fix pr77904?
>
> Thanks,
>
> Christophe
>

And just noticed it causes a failure to build GDB for x86_64:
gdb-8.1-release/gdb/nat/linux-ptrace.c: In function 'void
linux_ptrace_init_warnings()':
gdb-8.1-release/gdb/nat/linux-ptrace.c:149:23: error: Stack Pointer
register clobbered by '%rsp' in 'asm'
  149 |: "%rsp", "memory");
  |   ^
Makefile:1640: recipe for target 'linux-ptrace.o' failed

I didn't check if the GDB code is legitimate though

> > Thanks,
> > Richard


Re: [openmp] Fix openmp 5.0 builtin function types

2018-12-12 Thread Tom de Vries
On 12-12-18 10:27, Jakub Jelinek wrote:
> On Wed, Dec 12, 2018 at 09:25:06AM +0100, Tom de Vries wrote:
>> [openmp] Fix openmp 5.0 builtin function types
>>
>> Fix some openmp 5.0 builtin functions to match the type used in the
>> implementation of those functions.
>>
>> This fixes some libgomp testsuite failures for x86_64 with nvptx accelerator.
>>
>> Build on x86_64 with nvptx accelerator, tested libgomp.
>>
>> 2018-12-12  Tom de Vries  
>>
>>  * omp-builtins.def
>>  (BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START)
>>  (BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START)
>>  (BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME): Fix
>>  function type.
> 
> Good catch, but it looks insufficient, the builtins without _maybe have the
> same bug (copy and paste from the preceding dynamic/guided calls that do
> have that extra argument - chunk_size), so if you've caught only these,
> it seems we have insufficient testsuite covergate in the target regions.
> 
> Does the following testcase show also the remaining three bugs on unpatched
> nvptx offloading 

It does, I get "Call has wrong number of parameters" errors for these 3
mismatches:
...
.extern .func (.param .u32 %value_out)
GOMP_loop_nonmonotonic_runtime_start (.param .u64 %in_ar0, .param .u64
%in_ar1, .param .u64 %in_ar2, .param .u64 %in_ar3, .param .u64 %in_ar4,
.param .u64 %in_ar5);

call
(%value_in),GOMP_loop_nonmonotonic_runtime_start,(%out_arg1,%out_arg2,%out_arg3,%out_arg4,%out_arg5);

.extern .func (.param .u32 %value_out)
GOMP_loop_ull_nonmonotonic_runtime_start (.param .u32 %in_ar0, .param
.u64 %in_ar1, .param .u64 %in_ar2, .param .u64 %in_ar3, .param .u64
%in_ar4, .param .u64 %in_ar5, .param .u64 %in_ar6);

call
(%value_in),GOMP_loop_ull_nonmonotonic_runtime_start,(%out_arg1,%out_arg2,%out_arg3,%out_arg4,%out_arg5,%out_arg6);

.extern .func GOMP_parallel_loop_nonmonotonic_runtime (.param .u64
%in_ar0, .param .u64 %in_ar1, .param .u32 %in_ar2, .param .u64 %in_ar3,
.param .u64 %in_ar4, .param .u64 %in_ar5, .param .u64 %in_ar6, .param
.u32 %in_ar7);

call
GOMP_parallel_loop_nonmonotonic_runtime,(%out_arg1,%out_arg2,%out_arg3,%out_arg4,%out_arg5,%out_arg6,%out_arg7);
...

and after rebuilding the test passes.

> (sorry, don't have nvptx offloading set up right now,
> should fix that soon)?
> 

Np.

Thanks,
- Tom


> 2018-12-12  Tom de Vries  
>   Jakub Jelinek  
> 
>   * omp-builtins.def (BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START,
>   BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME,
>   BUILT_IN_GOMP_PARALLEL_LOOP_MAYBE_NONMONOTONIC_RUNTIME): Fix up
>   function types - remove one argument.
> 
>   * testsuite/libgomp.c-c++-common/for-16.c: New test.
> 
> --- gcc/omp-builtins.def.jj   2018-11-08 18:07:56.345070635 +0100
> +++ gcc/omp-builtins.def  2018-12-12 10:03:14.355354282 +0100
> @@ -126,11 +126,11 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_NON
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_NONMONOTONIC_RUNTIME_START,
> "GOMP_loop_nonmonotonic_runtime_start",
> -   BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
> +   BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_MAYBE_NONMONOTONIC_RUNTIME_START,
> "GOMP_loop_maybe_nonmonotonic_runtime_start",
> -   BT_FN_BOOL_LONG_LONG_LONG_LONG_LONGPTR_LONGPTR,
> +   BT_FN_BOOL_LONG_LONG_LONG_LONGPTR_LONGPTR,
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ORDERED_STATIC_START,
> "GOMP_loop_ordered_static_start",
> @@ -234,11 +234,11 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_NONMONOTONIC_RUNTIME_START,
> "GOMP_loop_ull_nonmonotonic_runtime_start",
> -   BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
> +   BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULLPTR_ULLPTR,
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_MAYBE_NONMONOTONIC_RUNTIME_START,
> "GOMP_loop_ull_maybe_nonmonotonic_runtime_start",
> -   BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULL_ULLPTR_ULLPTR,
> +   BT_FN_BOOL_BOOL_ULL_ULL_ULL_ULLPTR_ULLPTR,
> ATTR_NOTHROW_LEAF_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_LOOP_ULL_ORDERED_STATIC_START,
> "GOMP_loop_ull_ordered_static_start",
> @@ -349,11 +349,11 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL
> ATTR_NOTHROW_LIST)
>  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_PARALLEL_LOOP_NONMONOTONIC_RUNTIME,
> "GOMP_parallel_loop_nonmonotonic_runtime",
> -   BT_FN_VOID_OMPFN_PTR_UINT_LONG_

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Thomas Preudhomme
Hi Christophe,

That PR was about a bug occuring when sp was clobbered so if it cannot
be clobbered anymore the whole commit (r242693) can be removed. Let me
check the original code that lead to the PR why it's clobbering sp
though.

Best regards,

Thomas
On Wed, 12 Dec 2018 at 09:43, Christophe Lyon
 wrote:
>
> On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
>  wrote:
> >
> > Dimitar Dimitrov  writes:
> > > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford wrote:
> > >> Dimitar Dimitrov  writes:
> > >> > I have tested this fix on x86_64 host, and found no regression in the C
> > >> > and C++ testsuites.  I'm marking this patch as RFC simply because I 
> > >> > don't
> > >> > have experience with other architectures, and I don't have a setup to
> > >> > test all architectures supported by GCC.
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> > 2018-12-07  Dimitar Dimitrov  
> > >> >
> > >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> > >> >error when stack pointer is clobbered.
> > >> >(expand_asm_stmt): Refactor clobber check in separate function.
> > >> >
> > >> > gcc/testsuite/ChangeLog:
> > >> >
> > >> > 2018-12-07  Dimitar Dimitrov  
> > >> >
> > >> >* gcc.target/i386/pr52813.c: New test.
> > >> >
> > >> > Signed-off-by: Dimitar Dimitrov 
> > >>
> > >> LGTM.  Do you have a copyright assignment on file?  'Fraid this is
> > >> probably big enough to need one.
> > > Yes, I have copyright assignment.
> >
> > OK, great.  I went ahead and applied the patch.
> >
>
> Hi,
>
> This patch introduces a regression on arm:
> FAIL: gcc.target/arm/pr77904.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
> register clobbered by 'sp' in 'asm'
>
> Indeed the testcase has an explicit:
>   __asm volatile ("" : : : "sp");
> which is now rejected.
>
> Thomas, is that mandatory to test your code to fix pr77904?
>
> Thanks,
>
> Christophe
>
> > Thanks,
> > Richard


Re: [openmp] Fix openmp 5.0 builtin function types

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 11:22:22AM +0100, Tom de Vries wrote:
> .extern .func GOMP_parallel_loop_nonmonotonic_runtime (.param .u64
> %in_ar0, .param .u64 %in_ar1, .param .u32 %in_ar2, .param .u64 %in_ar3,
> .param .u64 %in_ar4, .param .u64 %in_ar5, .param .u64 %in_ar6, .param
> .u32 %in_ar7);
> 
> call
> GOMP_parallel_loop_nonmonotonic_runtime,(%out_arg1,%out_arg2,%out_arg3,%out_arg4,%out_arg5,%out_arg6,%out_arg7);
> ...
> 
> and after rebuilding the test passes.

Thanks for testing.  I'll commit it after bootstrap/regtest tonight,
perhaps I'll try to add further testsuite coverage later on to verify
all other OpenMP builtins are covered in target regions.

Jakub


Re: [PATCH] Delete powerpcspe

2018-12-12 Thread Richard Biener
On Tue, Dec 11, 2018 at 2:37 PM Jeff Law  wrote:
>
> On 12/11/18 1:44 AM, Richard Biener wrote:
> > On Mon, Dec 10, 2018 at 9:13 PM Segher Boessenkool
> >  wrote:
> >>
> >> On Mon, Dec 10, 2018 at 06:25:31PM +, Andrew Jenner wrote:
> >>> Sorry for the slow response on this, I was on vacation last week.
> >>>
> >>> On 03/12/2018 21:48, Jakub Jelinek wrote:
>  I'd give the maintainers the last week to act if they don't want this
>  to happen and if nothing happens, commit it.  PR81084 lists all the 
>  reasons
>  why it should be removed when it is totally unmaintained.
>  Just make sure to put stuff that belongs there to gcc/ChangeLog and 
>  without
>  gcc/ prefixes.
> >>>
> >>> Yes, please go ahead and commit
> >>
> >> Committed to trunk as r266961.
> >>
> >>> - it's not fair on other maintainers to
> >>> have to work around my lack of action on this port. I will continue to
> >>> work on it out-of-tree and hope to restore it once it is in proper shape.
> >>
> >> The more important thing is maintenance...  Regular and/or frequent tests
> >> (posted to gcc-testresults@), bug tracker maintenance, etc.  You need to
> >> be visible.
> >
> > Very much agreed on that.  Though if we pull out this card we're applying
> > double-standards here considering for example ia64 or some embedded ports.
> The biggest problem with the embedded ports is lack of reliable
> simulator testing combined with reduced address space availability.  So
> you end up with a test that works fine today, but due to a runtime clash
> of the stack and heap it may well fail tomorrow due to an unrelated code
> change (by changing what's on the stack and where).  Or worse yet, you
> end up with hundreds of tests that time out, causing the testrun to go
> on so long it's useless.
>
> One way to deal with these problems is to create a fake simulator that
> always returns success.  That's what my tester does for the embedded
> targets.  That allows us to do reliable compile-time tests as well as
> the various scan-whatever tests.
>
> It would be trivial to start sending those results to gcc-testresults.

I think it would be more useful if the execute testing would be
reported as UNSUPPORTED rather than simply PASS w/o being
sure it does.

But while posting to gcc-testresults is a sign of testing tracking
regressions (and progressions!) in bugzilla and caring for those
bugs is far more important...

Richard.

> jeff


Re: [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

2018-12-12 Thread Tom de Vries
On 12-12-18 10:36, Jakub Jelinek wrote:
> On Wed, Dec 12, 2018 at 09:29:36AM +0100, Tom de Vries wrote:
>>> This is the libgomp/ except libgomp/testsuite/ part of the gomp-5_0-branch
>>> merge to trunk I've just committed.
>>>
>>> 2018-11-08  Jakub Jelinek  
>>>
>>> * affinity.c (gomp_display_affinity_place): New function.
>>> * affinity-fmt.c: New file.
>>
>>> * fortran.c: Include stdio.h and string.h.
>>> (omp_pause_resource, omp_pause_resource_all): Add ialias_redirect.
>>> (omp_get_schedule_, omp_get_schedule_8_): Mask off GFS_MONOTONIC bit.
>>> (omp_set_affinity_format_, omp_get_affinity_format_,
>>> omp_display_affinity_, omp_capture_affinity_, omp_pause_resource_,
>>> omp_pause_resource_all_): New functions.
>>
>> OK for trunk?
>>
>> Thanks,
>> - Tom
> 
>> [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support
>>
>> Disable compilation of support for OMP_DISPLAY_AFFINITY and 
>> OMP_AFFINITY_FORMAT
>> for nvptx.
>>
>> This fixes some libgomp testsuite failures for x86_64 with nvptx accelerator.
>>
>> Build on x86_64 with nvptx accelerator, tested libgomp.
>>
>> 2018-12-12  Tom de Vries  
>>
>>  * affinity.c (gomp_display_affinity_place): Guard with
>>  #ifndef LIBGOMP_OFFLOADED_ONLY.
>>  * fortran.c (omp_set_affinity_format_, omp_get_affinity_format_)
>>  (omp_display_affinity_, omp_capture_affinity_): Same.
>>  * libgomp/config/nvptx/affinity-fmt.c: New, empty file overriding
>>  libgomp/affinity-fmt.c.
> 
> The runtime APIs should still be available inside of OpenMP regions, your
> patch makes them unavailable.

Ah, I see.

> What exactly doesn't work?
> hostname handling, getpid, thread_id, affinity places?

The concrete error I'm getting is unresolved symbol getpid.  This is
caused by the AC_COMPILE_IFELSE configure test for HAVE_GETPID passing
for nvptx, while the master newlib for nvptx does not contain getpid.

> I'd prefer for now to just stub what doesn't work in the 
> LIBGOMP_OFFLOADED_ONLY
> case (print "node", 0 as pid, 0 as thread id, something for affinity
> (0 or 0-N where N is the number of warps it can support or whatever).
> 

OK, I'll give that a try.

Thanks,
- Tom

> Eventually we need to find a way how to transfer some ICVs and other data
> from the host to the offloading library, either process shared that aren't
> changing, which can include the hostname, getpid, or some others that would
> need to be passed for every target region.


[PATCH] Avoid epilogue peeling for x264 vectorization in x264_pixel_sad_x4_8x8

2018-12-12 Thread Richard Biener


The following improves x264 vectorization by avoiding peeling for gaps
noticing that when the upper half of a vector is unused we can
load the lower part only (and fill the upper half with zeros - this
is what x86 does automatically, GIMPLE doesn't allow us to leave
the upper half undefined as RTL would with using subregs).

The implementation is a little bit awkward as for optimal GIMPLE
code-generation and costing we'd like to go the strided load path
instead.  That proves somewhat difficult though thus the following
is easier but doesn't fill out the re-align paths nor the masked
paths (at least the fully masked path would never need peeling for
gaps).

Bootstrapped and tested on x86_64-unknown-linux-gnu, tested with
SPEC CPU 2006 and 2017 with the expected (~4%) improvement for
625.x264_s.  Didn't see any positive or negative effects elsewhere.

Queued for GCC 10.

Richard.

2018-12-12  Richard Biener  

* tree-vect-stmts.c (get_group_load_store_type): Avoid
peeling for gaps by loading only lower halves of vectors
if possible.
(vectorizable_load): Likewise.

* gcc.dg/vect/slp-reduc-sad-2.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 266744)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -2194,6 +2194,29 @@ get_group_load_store_type (stmt_vec_info
  && gap < (vect_known_alignment_in_bytes (first_dr_info)
/ vect_get_scalar_dr_size (first_dr_info)))
overrun_p = false;
+
+ /* If the gap splits the vector in half and the target
+can do half-vector operations avoid the epilogue peeling
+by simply loading half of the vector only.  Usually
+the construction with an upper zero half will be elided.  */
+ dr_alignment_support alignment_support_scheme;
+ scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
+ machine_mode vmode;
+ if (overrun_p
+ && !masked_p
+ && (((alignment_support_scheme
+ = vect_supportable_dr_alignment (first_dr_info, false)))
+  == dr_aligned
+ || alignment_support_scheme == dr_unaligned_supported)
+ && known_eq (nunits, (group_size - gap) * 2)
+ && mode_for_vector (elmode, (group_size - gap)).exists (&vmode)
+ && VECTOR_MODE_P (vmode)
+ && targetm.vector_mode_supported_p (vmode)
+ && (convert_optab_handler (vec_init_optab,
+TYPE_MODE (vectype), vmode)
+ != CODE_FOR_nothing))
+   overrun_p = false;
+
  if (overrun_p && !can_overrun_p)
{
  if (dump_enabled_p ())
@@ -8362,8 +8385,24 @@ vectorizable_load (stmt_vec_info stmt_in
  }
else
  {
+   tree ltype = vectype;
+   /* If there's no peeling for gaps but we have a gap
+  with slp loads then load the lower half of the
+  vector only.  See get_group_load_store_type for
+  when we apply this optimization.  */
+   if (slp
+   && loop_vinfo
+   && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+   && DR_GROUP_GAP (first_stmt_info) != 0
+   && known_eq (nunits,
+(group_size
+ - DR_GROUP_GAP (first_stmt_info)) * 
2))
+ ltype = build_vector_type (TREE_TYPE (vectype),
+(group_size
+ - DR_GROUP_GAP
+ (first_stmt_info)));
data_ref
- = fold_build2 (MEM_REF, vectype, dataref_ptr,
+ = fold_build2 (MEM_REF, ltype, dataref_ptr,
 dataref_offset
 ? dataref_offset
 : build_int_cst (ref_type, 0));
@@ -8377,6 +8416,23 @@ vectorizable_load (stmt_vec_info stmt_in
  TREE_TYPE (data_ref)
= build_aligned_type (TREE_TYPE (data_ref),
  TYPE_ALIGN (elem_type));
+   if (ltype != vectype)
+ {
+   vect_copy_ref_info (data_ref, DR_REF 
(first_dr_info->dr));
+   tree tem = make_ssa_name (ltype);
+   new_stmt = gimple_build_assign (tem, data_ref);
+   vect_finish_stmt_generation (stmt_info, new_stmt, 
gsi);

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Thomas Preudhomme
So my understanding is that the original code (CMSIS library) used to
clobber sp because the asm statement was actually changing the sp.
That in turn led GCC to try to save and restore sp which is not what
CMSIS was expecting to happen. Changing sp without clobber as done now
is probably the right solution and r242693 can be reverted. That will
remove the failing test.

Best regards,

Thomas
On Wed, 12 Dec 2018 at 10:30, Thomas Preudhomme
 wrote:
>
> Hi Christophe,
>
> That PR was about a bug occuring when sp was clobbered so if it cannot
> be clobbered anymore the whole commit (r242693) can be removed. Let me
> check the original code that lead to the PR why it's clobbering sp
> though.
>
> Best regards,
>
> Thomas
> On Wed, 12 Dec 2018 at 09:43, Christophe Lyon
>  wrote:
> >
> > On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
> >  wrote:
> > >
> > > Dimitar Dimitrov  writes:
> > > > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford wrote:
> > > >> Dimitar Dimitrov  writes:
> > > >> > I have tested this fix on x86_64 host, and found no regression in 
> > > >> > the C
> > > >> > and C++ testsuites.  I'm marking this patch as RFC simply because I 
> > > >> > don't
> > > >> > have experience with other architectures, and I don't have a setup to
> > > >> > test all architectures supported by GCC.
> > > >> >
> > > >> > gcc/ChangeLog:
> > > >> >
> > > >> > 2018-12-07  Dimitar Dimitrov  
> > > >> >
> > > >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> > > >> >error when stack pointer is clobbered.
> > > >> >(expand_asm_stmt): Refactor clobber check in separate function.
> > > >> >
> > > >> > gcc/testsuite/ChangeLog:
> > > >> >
> > > >> > 2018-12-07  Dimitar Dimitrov  
> > > >> >
> > > >> >* gcc.target/i386/pr52813.c: New test.
> > > >> >
> > > >> > Signed-off-by: Dimitar Dimitrov 
> > > >>
> > > >> LGTM.  Do you have a copyright assignment on file?  'Fraid this is
> > > >> probably big enough to need one.
> > > > Yes, I have copyright assignment.
> > >
> > > OK, great.  I went ahead and applied the patch.
> > >
> >
> > Hi,
> >
> > This patch introduces a regression on arm:
> > FAIL: gcc.target/arm/pr77904.c (test for excess errors)
> > Excess errors:
> > /gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
> > register clobbered by 'sp' in 'asm'
> >
> > Indeed the testcase has an explicit:
> >   __asm volatile ("" : : : "sp");
> > which is now rejected.
> >
> > Thomas, is that mandatory to test your code to fix pr77904?
> >
> > Thanks,
> >
> > Christophe
> >
> > > Thanks,
> > > Richard


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Andreas Schwab
This breaks ia64:

In file included from ../../../libgomp/config/linux/wait.h:46,
 from ../../../libgomp/config/linux/mutex.c:30:
../../../libgomp/config/linux/ia64/futex.h: In function 'gomp_mutex_lock_slow':
../../../libgomp/config/linux/ia64/futex.h:43:3: error: Stack Pointer register 
clobbered by 'r12' in 'asm'
   43 |   __asm __volatile ("break 0x10"
  |   ^
../../../libgomp/config/linux/ia64/futex.h:43:3: error: Stack Pointer register 
clobbered by 'r12' in 'asm'
   43 |   __asm __volatile ("break 0x10"
  |   ^
../../../libgomp/config/linux/ia64/futex.h:43:3: error: Stack Pointer register 
clobbered by 'r12' in 'asm'
   43 |   __asm __volatile ("break 0x10"
  |   ^
../../../libgomp/config/linux/ia64/futex.h:43:3: error: Stack Pointer register 
clobbered by 'r12' in 'asm'
   43 |   __asm __volatile ("break 0x10"
  |   ^
../../../libgomp/config/linux/ia64/futex.h: In function 
'gomp_mutex_unlock_slow':
../../../libgomp/config/linux/ia64/futex.h:43:3: error: Stack Pointer register 
clobbered by 'r12' in 'asm'
   43 |   __asm __volatile ("break 0x10"
  |   ^
../../../libgomp/config/linux/ia64/futex.h:43:3: error: Stack Pointer register 
clobbered by 'r12' in 'asm'
   43 |   __asm __volatile ("break 0x10"
  |   ^

Installed as obvious.

Andreas.

* config/linux/ia64/futex.h (sys_futex0): Don't mark r12 as
clobbered.
---
 libgomp/config/linux/ia64/futex.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/config/linux/ia64/futex.h 
b/libgomp/config/linux/ia64/futex.h
index 6efec3c813..df450f8def 100644
--- a/libgomp/config/linux/ia64/futex.h
+++ b/libgomp/config/linux/ia64/futex.h
@@ -45,8 +45,8 @@ sys_futex0(int *addr, int op, int val)
  "=r"(r8), "=r"(r10)
: "r"(r15), "r"(out0), "r"(out1), "r"(out2), "r"(out3)
: "memory", "out4", "out5", "out6", "out7",
- /* Non-stacked integer registers, minus r8, r10, r15.  */
- "r2", "r3", "r9", "r11", "r12", "r13", "r14", "r16", "r17", "r18",
+ /* Non-stacked integer registers, minus r8, r10, r12, r15.  */
+ "r2", "r3", "r9", "r11", "r13", "r14", "r16", "r17", "r18",
  "r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26", "r27",
  "r28", "r29", "r30", "r31",
  /* Predicate registers.  */
-- 
2.20.0

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH AArch64]Fix test failure for pr84682-2.c

2018-12-12 Thread Richard Earnshaw (lists)
On 30/08/2018 13:24, Richard Sandiford wrote:
> Joey Ye  writes:
>> Hi Bin & Richard,
>>
>> It is not as simple as keeping the assertion, which still fails even
>> with the change in reorg.c. The testing result is as following:
>>
>> I. With Bin's patch version 2 (removing the assertion in aarch64.c and
>> adding the check in reorg.c): pr84682-2.c passes
>>
>> II. With Richard's suggestion to keep the assertion in aarch64, but
>> adding the check in reorg.c: pr84682-2.c fails
>>
>> Apparently there is a different path that reaches the assertion.
> 
> Yeah, looks like we also need to make constrain_operands check
> address_operand for 'p' (which I think it should do irrespective
> of "strict", since in general we can only reload an operand as
> a pointer if the original value has the right form for an address).
> 
> Thanks,
> Richard
> 

What's the status of this patch?

R.


RFA: libiberty: Add a limit on demangling qualifiers (PR 87241)

2018-12-12 Thread Nick Clifton
Hi Ian,

  Sorry to bother you, but I have another libiberty demangler resource
  exhaustion prevention patch to present.  This one is for:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87241

  Jonathan Wakely reported that __cxa_demanlge() was returning a -2
  result, but I did not see this.  Instead I found that
  consume_count_with_underscores() is returning a very large number
  (because a very large value is encoded in the mangled string) and this
  is resulting in many calls to remember_Ktype() which eventually
  exhaust the amount of memory available.

  The attached patch is a simplistic approach to solving this problem by
  adding a hard upper limit on the number of qualifiers that will be
  allowed by the demangler.  I am not sure if this is the best approach
  to solving the problem, but it is a simple one, and I would think one
  that would not prevent the demangling of any real mangled names.  The
  limit does not have to be DEMANGLE_RECURSE_LIMIT of course.  I just
  chose that value because it was convenient and of a size that I
  thought was appropriate.

  I also did run the libiberty testsuite this time, with no failures
  reported. :-)

  OK to apply ?

Cheers
  Nick

libiberty/ChangeLog
2018-12-12  Nick Clifton  

* cplus-dem.c (demangle_qualified): Add an upper limit on the
number of qualifiers supported, based upon the value of
DEMANGLE_RECURSE_LIMIT.

Index: libiberty/cplus-dem.c
===
--- libiberty/cplus-dem.c   (revision 267043)
+++ libiberty/cplus-dem.c   (working copy)
@@ -3443,6 +3443,17 @@
   success = 0;
 }
 
+  /* PR 87241: Catch malicious input that will try to trick this code into
+ allocating a ridiculous amount of memory via the remember_Ktype()
+ function.
+ The choice of DEMANGLE_RECURSION_LIMIT is somewhat arbitrary.  Possibly
+ a better solution would be to track how much memory remember_Ktype
+ allocates and abort when some upper limit is reached.  */
+  if (qualifiers > DEMANGLE_RECURSION_LIMIT)
+/* FIXME: We ought to have some way to tell the user that
+   this limit has been reached.  */
+success = 0;
+
   if (!success)
 return success;
 


Re: [PATCH, Fortran] pad char to int conversions with spaces instead of zeros (legacy)

2018-12-12 Thread Mark Eggleston
Before delving into the code to make changes to handle the case when 
passing a variable into transfer instead of a literal I revised the the 
test cases. The results indicate to me that this patch as originally 
intended is erroneous.


When a character literal is assigned to character variable e.g.

character(len=8) :: a
a = 'ab'

The value is padded with spaces.

So any variable passed into the transfer statement will already be 
padded with spaces. So the easiest solution is to pad with spaces when a 
character literal is passed in instead of zeros as is currently the case.


Mark


--
https://www.codethink.co.uk/privacy.html



Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Richard Sandiford
Steve Ellcey  writes:
> On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
>> > +  (match_operand:TX 2 "register_operand" "w"))
>> > + (set (mem:TX (plus:P (match_dup 0)
>> > +  (match_operand:P 5 "const_int_operand" "n")))
>> > +  (match_operand:TX 3 "register_operand" "w"))])]
>> 
>> Think this last part should be:
>> 
>>  (set (mem:TX (plus:P (plus:P (match_dup 0)
>>   (match_dup 4))
>>   (match_operand:P 5 "const_int_operand"
>> "n")))
>>   (match_operand:TX 3 "register_operand" "w"))])]
>
> I think you are right about this.  What I have for
> loadwb_pair_ matches what is there for
> loadwb_pair_.  If this one is wrong, then I assume
> the others are wrong too?  This won't make a practical difference since
> we call these with gen_loadwb_pair*_* calls and not via pattern
> recognition, but still they should be right.  Should I change them
> all?  I did not change this as part of this patch.

I think we should fix the new pattern, but I agree fixing the others
should be a separate patch.

Patch LGTM with that change.

Thanks,
Richard


RFA: libiberty: Add a limit on demangling qualifiers (PR 87241) (version 2)

2018-12-12 Thread Nick Clifton
Hi Ian,

  *sigh* 5 minutes after sending the patch for this PR, I realised that
   I had made a mistake.  I should have conditionalized the limit on the
   number of supported qualifiers, so that the check is only made if we
   have resource limits enabled.  Like this:

Cheers
  Nick

Index: libiberty/cplus-dem.c
===
--- libiberty/cplus-dem.c   (revision 267043)
+++ libiberty/cplus-dem.c   (working copy)
@@ -3443,6 +3443,20 @@
   success = 0;
 }
 
+  if ((work->options & DMGL_NO_RECURSE_LIMIT) == 0)
+{
+  /* PR 87241: Catch malicious input that will try to trick this code into
+allocating a ridiculous amount of memory via the remember_Ktype()
+function.
+The choice of DEMANGLE_RECURSION_LIMIT is somewhat arbitrary.  Possibly
+a better solution would be to track how much memory remember_Ktype
+allocates and abort when some upper limit is reached.  */
+  if (qualifiers > DEMANGLE_RECURSION_LIMIT)
+   /* FIXME: We ought to have some way to tell the user that
+  this limit has been reached.  */
+   success = 0;
+}
+
   if (!success)
 return success;
 


[SPARC] Fix PR target/86806

2018-12-12 Thread Eric Botcazou
This adds the speculation barrier on SPARC-V9 in the form of membar #Sync.


2018-12-12  Eric Botcazou  

PR target/86806
* config/sparc/sparc.md (unspecv): Add UNSPECV_SPECULATION_BARRIER.
(speculation_barrier): New instruction for V9.

-- 
Eric BotcazouIndex: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 267029)
+++ config/sparc/sparc.md	(working copy)
@@ -104,6 +104,9 @@ (define_c_enum "unspec" [
 
 (define_c_enum "unspecv" [
   UNSPECV_BLOCKAGE
+
+  UNSPECV_SPECULATION_BARRIER
+
   UNSPECV_PROBE_STACK_RANGE
 
   UNSPECV_FLUSHW
@@ -7309,6 +7312,14 @@ (define_insn "*frame_blockage"
   ""
   [(set_attr "length" "0")])
 
+;; We use membar #Sync for the speculation barrier on V9.
+
+(define_insn "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPECULATION_BARRIER)]
+  "TARGET_V9"
+  "membar\t64"
+  [(set_attr "type" "multi")])
+
 (define_expand "probe_stack"
   [(set (match_operand 0 "memory_operand" "") (const_int 0))]
   ""


[PATCH v3 00/10] AMD GCN Port v3

2018-12-12 Thread Andrew Stubbs
This is the third rework of the patchset previously posted on September
5th and November 16th. As before, the series contains the
non-OpenACC/OpenMP portions of a port to AMD GCN3 and GCN5 GPU
processors.  It's sufficient to build single-threaded programs, with
vectorization in the usual way.  C and Fortran are supported, C++ is not
supported, and the other front-ends have not been tested.  The
OpenACC/OpenMP/libgomp portion will follow, once this is committed,
eventually.

Compared to the v2 patchset, patch 1, "Fix IRA ICE", has been dropped,
and a new, unrelated, patch 1 has been added: "Fix LRA bug".

The IRA issue has now been solved by reworking the move instructions in
the back-end so that they no longer require explicit mention of the EXEC
register (this is now managed mostly by the md_reorg pass).  I also took
the opportunity to rework the EXEC use throughout the machine
description (something I've been wanting to get to for ages); the
primary instruction patterns no longer use vec_merge, and there are
"_exec" variants defined (mostly via define_subst) for the use of
specific expanders and so that combine can optimize conditional vector
moves.

Additionally, the patterns that choose which unit to use for
scalar operations now only clobber the relevant condition register (via
a match_scratch), not both of them.

The new LRA issue was exposed by the above changes, but would affect any
target where patterns referring to an eliminable register might also
include a "scratch" register.

I've also addressed the various feedback I received from patch
reviewers.

-- 
Andrew Stubbs
Mentor Graphics / CodeSourcery


[PATCH 01/10] Fix LRA bug

2018-12-12 Thread Andrew Stubbs

[This is new patch not included in the previously posted patch sets.]

This patch fixes an ICE building libgfortran/random.c.

The problem was an adddi3 instruction that had an eliminable frame pointer.
GCN adddi3 includes a match_scratch, which LRA substitutes with a REG, and
checks if it can be converted back to a scratch afterwards.  In the meantime,
the add was converted to a move, meaning that the instruction pattern
completely changed, thus causing a segfault when the instruction is revisited
in restore_scratches.

2018-12-12  Andrew Stubbs  

gcc/
* gcc/lra-int.h (lra_register_new_scratch_op): Add third parameter.
* gcc/lra-remat.c (update_scratch_ops): Pass icode to
lra_register_new_scratch_op.
* gcc/lra.c (struct sloc): Add icode field.
(lra_register_new_scratch_op): Add icode parameter.
Use icode to skip insns that have changed beyond recognition.
---
 gcc/lra-int.h   |  2 +-
 gcc/lra-remat.c |  2 +-
 gcc/lra.c   | 12 ++--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 5267b53..d5ff652 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -335,7 +335,7 @@ extern void lra_create_copy (int, int, int);
 extern lra_copy_t lra_get_copy (int);
 extern bool lra_former_scratch_p (int);
 extern bool lra_former_scratch_operand_p (rtx_insn *, int);
-extern void lra_register_new_scratch_op (rtx_insn *, int);
+extern void lra_register_new_scratch_op (rtx_insn *, int, int);
 
 extern int lra_new_regno_start;
 extern int lra_constraint_new_regno_start;
diff --git a/gcc/lra-remat.c b/gcc/lra-remat.c
index faf74ca..627a248 100644
--- a/gcc/lra-remat.c
+++ b/gcc/lra-remat.c
@@ -1042,7 +1042,7 @@ update_scratch_ops (rtx_insn *remat_insn)
 	fprintf (lra_dump_file, "	 Assigning the same %d to r%d\n",
 		 REGNO (*loc), hard_regno);
 	}
-  lra_register_new_scratch_op (remat_insn, i);
+  lra_register_new_scratch_op (remat_insn, i, id->icode);
 }
   
 }
diff --git a/gcc/lra.c b/gcc/lra.c
index 5d58d90..14974ed 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -2026,6 +2026,7 @@ struct sloc
 {
   rtx_insn *insn; /* Insn where the scratch was.  */
   int nop;  /* Number of the operand which was a scratch.  */
+  int icode;  /* Original icode from which scratch was removed.  */
 };
 
 typedef struct sloc *sloc_t;
@@ -2057,7 +2058,7 @@ lra_former_scratch_operand_p (rtx_insn *insn, int nop)
 /* Register operand NOP in INSN as a former scratch.  It will be
changed to scratch back, if it is necessary, at the LRA end.  */
 void
-lra_register_new_scratch_op (rtx_insn *insn, int nop)
+lra_register_new_scratch_op (rtx_insn *insn, int nop, int icode)
 {
   lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
   rtx op = *id->operand_loc[nop];
@@ -2065,6 +2066,7 @@ lra_register_new_scratch_op (rtx_insn *insn, int nop)
   lra_assert (REG_P (op));
   loc->insn = insn;
   loc->nop = nop;
+  loc->icode = icode;
   scratches.safe_push (loc);
   bitmap_set_bit (&scratch_bitmap, REGNO (op));
   bitmap_set_bit (&scratch_operand_bitmap,
@@ -2102,7 +2104,7 @@ remove_scratches (void)
 	  *id->operand_loc[i] = reg
 		= lra_create_new_reg (static_id->operand[i].mode,
   *id->operand_loc[i], ALL_REGS, NULL);
-	  lra_register_new_scratch_op (insn, i);
+	  lra_register_new_scratch_op (insn, i, id->icode);
 	  if (lra_dump_file != NULL)
 		fprintf (lra_dump_file,
 			 "Removing SCRATCH in insn #%u (nop %d)\n",
@@ -2136,6 +2138,12 @@ restore_scratches (void)
 	  last = loc->insn;
 	  id = lra_get_insn_recog_data (last);
 	}
+  if (loc->icode != id->icode)
+	{
+	  /* The icode doesn't match, which means the insn has been modified
+	 (e.g. register elimination).  The scratch cannot be restored.  */
+	  continue;
+	}
   if (REG_P (*id->operand_loc[loc->nop])
 	  && ((regno = REGNO (*id->operand_loc[loc->nop]))
 	  >= FIRST_PSEUDO_REGISTER)


[PATCH v3 02/10] GCN libgfortran.

2018-12-12 Thread Andrew Stubbs

[Already approved by Janne Blomqvist and Jeff Law.  Included here for
completeness.]

This patch contains the GCN port of libgfortran.  We use the minimal
configuration created for NVPTX.  That's all that's required, besides the
target-independent bug fixes posted already.

2018-12-12  Andrew Stubbs  
Kwok Cheung Yeung  
Julian Brown  
Tom de Vries  

libgfortran/
* configure.ac: Use minimal mode for amdgcn.
* configure: Regenerate.
---
 libgfortran/configure| 31 ++-
 libgfortran/configure.ac |  3 ++-
 2 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/libgfortran/configure b/libgfortran/configure
index 62e80a0..487d8c0 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -780,7 +780,6 @@ infodir
 docdir
 oldincludedir
 includedir
-runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -871,7 +870,6 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
-runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1124,15 +1122,6 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
-  -runstatedir | --runstatedir | --runstatedi | --runstated \
-  | --runstate | --runstat | --runsta | --runst | --runs \
-  | --run | --ru | --r)
-ac_prev=runstatedir ;;
-  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
-  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
-  | --run=* | --ru=* | --r=*)
-runstatedir=$ac_optarg ;;
-
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1270,7 +1259,7 @@ fi
 for ac_var in	exec_prefix prefix bindir sbindir libexecdir datarootdir \
 		datadir sysconfdir sharedstatedir localstatedir includedir \
 		oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-		libdir localedir mandir runstatedir
+		libdir localedir mandir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1423,7 +1412,6 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
-  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -6176,7 +6164,8 @@ fi
 # * C library support for other features such as signal, environment
 #   variables, time functions
 
- if test "x${target_cpu}" = xnvptx; then
+ if test "x${target_cpu}" = xnvptx \
+ || test "x${target_cpu}" = xamdgcn; then
   LIBGFOR_MINIMAL_TRUE=
   LIBGFOR_MINIMAL_FALSE='#'
 else
@@ -12696,7 +12685,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12699 "configure"
+#line 12688 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12802,7 +12791,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12805 "configure"
+#line 12794 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -16051,7 +16040,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -16097,7 +16086,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1];
@@ -16121,7 +16110,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
 		   && LARGE_OFF_T % 2147483647 == 1)
 		  ? 1 : -1]

[PATCH v3 03/10] GCN libgcc.

2018-12-12 Thread Andrew Stubbs

[Already approved by Jeff Law.  Included here for completeness.]

This patch contains the GCN port of libgcc.

2018-12-12  Andrew Stubbs  
Kwok Cheung Yeung  
Julian Brown  
Tom de Vries  

libgcc/
* config.host: Recognize amdgcn*-*-amdhsa.
* config/gcn/crt0.c: New file.
* config/gcn/lib2-divmod-hi.c: New file.
* config/gcn/lib2-divmod.c: New file.
* config/gcn/lib2-gcn.h: New file.
* config/gcn/sfp-machine.h: New file.
* config/gcn/t-amdgcn: New file.
---
 libgcc/config.host |   8 +++
 libgcc/config/gcn/crt0.c   |  23 
 libgcc/config/gcn/lib2-divmod-hi.c | 117 +
 libgcc/config/gcn/lib2-divmod.c| 117 +
 libgcc/config/gcn/lib2-gcn.h   |  49 
 libgcc/config/gcn/sfp-machine.h|  51 
 libgcc/config/gcn/t-amdgcn |  16 +
 7 files changed, 381 insertions(+)
 create mode 100644 libgcc/config/gcn/crt0.c
 create mode 100644 libgcc/config/gcn/lib2-divmod-hi.c
 create mode 100644 libgcc/config/gcn/lib2-divmod.c
 create mode 100644 libgcc/config/gcn/lib2-gcn.h
 create mode 100644 libgcc/config/gcn/sfp-machine.h
 create mode 100644 libgcc/config/gcn/t-amdgcn

diff --git a/libgcc/config.host b/libgcc/config.host
index 1cbc8ac..a854ede 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -91,6 +91,10 @@ alpha*-*-*)
 am33_2.0-*-linux*)
 	cpu_type=mn10300
 	;;
+amdgcn*-*-*)
+	cpu_type=gcn
+	tmake_file="${tmake_file} t-softfp-sfdf t-softfp"
+	;;
 arc*-*-*)
 	cpu_type=arc
 	;;
@@ -387,6 +391,10 @@ alpha*-dec-*vms*)
 	extra_parts="$extra_parts vms-dwarf2.o vms-dwarf2eh.o"
 	md_unwind_header=alpha/vms-unwind.h
 	;;
+amdgcn*-*-amdhsa)
+	tmake_file="$tmake_file gcn/t-amdgcn"
+	extra_parts="crt0.o"
+	;;
 arc*-*-elf*)
 	tmake_file="arc/t-arc"
 	extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o"
diff --git a/libgcc/config/gcn/crt0.c b/libgcc/config/gcn/crt0.c
new file mode 100644
index 000..00a3f42
--- /dev/null
+++ b/libgcc/config/gcn/crt0.c
@@ -0,0 +1,23 @@
+/* Copyright (C) 2017-2018 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Provide an entry point symbol to silence a linker warning.  */
+void _start() {}
diff --git a/libgcc/config/gcn/lib2-divmod-hi.c b/libgcc/config/gcn/lib2-divmod-hi.c
new file mode 100644
index 000..90626f6
--- /dev/null
+++ b/libgcc/config/gcn/lib2-divmod-hi.c
@@ -0,0 +1,117 @@
+/* Copyright (C) 2012-2018 Free Software Foundation, Inc.
+   Contributed by Altera and Mentor Graphics, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#include "lib2-gcn.h"
+
+/* 16-bit HI divide and modulo as used in gcn.  */
+
+static UHItype
+udivmodhi4 (UHItype num, UHItype den, word_type modwanted)
+{
+  UHItype bit = 1;
+  UHItype res = 0;
+
+  while (den < num && bit && !(den & (1L<<15)))
+{
+  den <<=1;
+  bit <<=1;
+}
+  while (bit)
+{
+  if (num >= den)
+	{
+	  num -= den;
+	  res |= bit;
+	}
+  bit >>=1;
+  den >>=1;
+}
+  if (modwanted)
+return num;
+  return res;
+}
+
+
+HItype
+__divhi3 (HItype a, HItype b)
+{
+  word_type n

Re: [PATCH, Fortran] pad char to int conversions with spaces instead of zeros (legacy)

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 11:37:23AM +, Mark Eggleston wrote:
> Before delving into the code to make changes to handle the case when passing
> a variable into transfer instead of a literal I revised the the test cases.
> The results indicate to me that this patch as originally intended is
> erroneous.
> 
> When a character literal is assigned to character variable e.g.
> 
> character(len=8) :: a
> a = 'ab'
> 
> The value is padded with spaces.

Sure, that is the standard behavior of character assignments.

What we are talking about is the case when transfer is used from a smaller
object to a larger result, whether it is e.g. 'ab' literal or character(len=2)
variable.  SOURCE doesn't have to be a character scalar (or array), it can
be anything.

The Fortran standard says here:
"If the physical representation of the result is longer than that
of SOURCE, the physical representation of the leading part is that of SOURCE 
and the remainder is processor
dependent."
Current gfortran choice of the processor dependent is 0 if SOURCE is a
constant and uninitialized bytes if it is not a constant.

So, does Oracle fortran / ifort etc. pad with spaces only if SOURCE has
character type, or in other cases too?

What about:
  integer(kind=2) :: a
  a = -1
  print *, transfer (1_2, 1_8), transfer (a, 1_8)
end
?

Jakub


[PATCH v3 08/10] Testsuite: GCN is always PIE.

2018-12-12 Thread Andrew Stubbs

[Already approved by Jeff Law. Included here for completeness.]

The GCN/HSA loader ignores the load address and uses a random location, so we
build all GCN binaries as PIE, by default.

This patch makes the necessary testsuite adjustments to make this work
correctly.

2018-12-12  Andrew Stubbs  

gcc/testsuite/
* gcc.dg/graphite/scop-19.c: Check pie_enabled.
* gcc.dg/pic-1.c: Disable on amdgcn.
* gcc.dg/pic-2.c: Disable on amdgcn.
* gcc.dg/pic-3.c: Disable on amdgcn.
* gcc.dg/pic-4.c: Disable on amdgcn.
* gcc.dg/pie-3.c: Disable on amdgcn.
* gcc.dg/pie-4.c: Disable on amdgcn.
* gcc.dg/uninit-19.c: Check pie_enabled.
* lib/target-supports.exp (check_effective_target_pie): Add amdgcn.
---
 gcc/testsuite/gcc.dg/graphite/scop-19.c | 4 ++--
 gcc/testsuite/gcc.dg/pic-1.c| 2 +-
 gcc/testsuite/gcc.dg/pic-2.c| 1 +
 gcc/testsuite/gcc.dg/pic-3.c| 2 +-
 gcc/testsuite/gcc.dg/pic-4.c| 2 +-
 gcc/testsuite/gcc.dg/pie-3.c| 2 +-
 gcc/testsuite/gcc.dg/pie-4.c| 2 +-
 gcc/testsuite/lib/target-supports.exp   | 3 ++-
 8 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/graphite/scop-19.c b/gcc/testsuite/gcc.dg/graphite/scop-19.c
index c89717b..6028132 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-19.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-19.c
@@ -31,6 +31,6 @@ d_growable_string_append_buffer (struct d_growable_string *dgs,
   if (need > dgs->alc)
 d_growable_string_resize (dgs, need);
 }
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 0" 2 "graphite" { target nonpic } } } */
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite" { target { ! nonpic } } } } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 0" 2 "graphite" { target { nonpic || pie_enabled } } } } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 0" 1 "graphite" { target { ! { nonpic || pie_enabled } } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/pic-1.c b/gcc/testsuite/gcc.dg/pic-1.c
index 82ba43d..4bb332e 100644
--- a/gcc/testsuite/gcc.dg/pic-1.c
+++ b/gcc/testsuite/gcc.dg/pic-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { *-*-darwin* hppa*-*-* } } } } */
+/* { dg-do compile { target { ! { *-*-darwin* hppa*-*-* amdgcn*-*-* } } } } */
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fpic" } */
 
diff --git a/gcc/testsuite/gcc.dg/pic-2.c b/gcc/testsuite/gcc.dg/pic-2.c
index bccec13..3846ec4 100644
--- a/gcc/testsuite/gcc.dg/pic-2.c
+++ b/gcc/testsuite/gcc.dg/pic-2.c
@@ -2,6 +2,7 @@
 /* { dg-require-effective-target fpic } */
 /* { dg-options "-fPIC" } */
 /* { dg-skip-if "__PIC__ is always 1 for MIPS" { mips*-*-* } } */
+/* { dg-skip-if "__PIE__ is always defined for GCN" { amdgcn*-*-* } } */
 
 #if __PIC__ != 2
 # error __PIC__ is not 2!
diff --git a/gcc/testsuite/gcc.dg/pic-3.c b/gcc/testsuite/gcc.dg/pic-3.c
index c56f06f..1397977 100644
--- a/gcc/testsuite/gcc.dg/pic-3.c
+++ b/gcc/testsuite/gcc.dg/pic-3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* } } } } */
+/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* amdgcn*-*-* } } } } */
 /* { dg-options "-fno-pic" } */
 
 #ifdef __PIC__
diff --git a/gcc/testsuite/gcc.dg/pic-4.c b/gcc/testsuite/gcc.dg/pic-4.c
index 2afdd99..d6d9dc9 100644
--- a/gcc/testsuite/gcc.dg/pic-4.c
+++ b/gcc/testsuite/gcc.dg/pic-4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* } } } } */
+/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* amdgcn*-*-* } } } } */
 /* { dg-options "-fno-PIC" } */
 
 #ifdef __PIC__
diff --git a/gcc/testsuite/gcc.dg/pie-3.c b/gcc/testsuite/gcc.dg/pie-3.c
index 5577437..fd4a48d 100644
--- a/gcc/testsuite/gcc.dg/pie-3.c
+++ b/gcc/testsuite/gcc.dg/pie-3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* } } } } */
+/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* amdgcn*-*-* } } } } */
 /* { dg-options "-fno-pie" } */
 
 #ifdef __PIC__
diff --git a/gcc/testsuite/gcc.dg/pie-4.c b/gcc/testsuite/gcc.dg/pie-4.c
index 4134676..5523602 100644
--- a/gcc/testsuite/gcc.dg/pie-4.c
+++ b/gcc/testsuite/gcc.dg/pie-4.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* } } } } */
+/* { dg-do compile { target { ! { *-*-darwin* hppa*64*-*-* mips*-*-linux-* amdgcn*-*-* } } } } */
 /* { dg-options "-fno-PIE" } */
 
 #ifdef __PIC__
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 15e33eb..eb3fbcd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1176,7 +1176,8 @@ proc check_effective_target_pie { } {
 	 || [istarget *-*-dragonfly*]
 	 || [istarget *-*-freebsd*]
 	 || [istarget *-*-linux*]
-	 || [istarget *-*-gnu*] } {
+	 || [istarget *

[PATCH v3 06/10] GCN back-end config

2018-12-12 Thread Andrew Stubbs

[Already approved by Jeff Law.  Included here for completeness, now with
config.sub removed.]

This patch contains the configuration adjustments needed to enable the GCN
back-end.

Since the previous v2 posting, the config.sub patch has be committed to
the upstream GNU config repository.  However, the GCC version will have
to be updated before this config will work.  I presume I can do that as
an "obvious" patch when the time comes? (If nobody else gets there
first.)

The new configure check for dlopen is required to allow building the new
gcn-run tool.  This tool uses libdl to load the HSA runtime libraries, which
are required to run programs on the GPU.  The tool is disabled if libdl is not
available.

2018-12-12  Andrew Stubbs  
Kwok Cheung Yeung  
Julian Brown  
Tom de Vries  
Jan Hubicka  
Martin Jambor  

* configure.ac: Likewise.
* configure: Regenerate.
* contrib/config-list.mk: Add amdgcn-amdhsa.

gcc/
* config.gcc: Add amdgcn*-*-amdhsa configuration.
* configure.ac: Check for dlopen.
* configure: Regenerate.
---
 config.sub |  9 +++
 configure  |  2 ++
 configure.ac   |  2 ++
 contrib/config-list.mk |  1 +
 gcc/config.gcc | 41 ++
 gcc/configure  | 68 --
 gcc/configure.ac   |  8 ++
 7 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 4b095de..adf4fda 100755
--- a/configure
+++ b/configure
@@ -3652,6 +3652,8 @@ case "${target}" in
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"
 ;;
+  amdgcn*-*-*)
+;;
   arm-*-darwin*)
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"
diff --git a/configure.ac b/configure.ac
index 0547510..87f2aee 100644
--- a/configure.ac
+++ b/configure.ac
@@ -934,6 +934,8 @@ case "${target}" in
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"
 ;;
+  amdgcn*-*-*)
+;;
   arm-*-darwin*)
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"
diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index cbb9e28b..de0226b 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -33,6 +33,7 @@ GCC_SRC_DIR=../../gcc
 LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   alpha-linux-gnu alpha-netbsd alpha-openbsd \
   alpha64-dec-vms alpha-dec-vms \
+  amdgcn-amdhsa \
   arc-elf32OPT-with-cpu=arc600 arc-elf32OPT-with-cpu=arc700 \
   arc-linux-uclibcOPT-with-cpu=arc700 arceb-linux-uclibcOPT-with-cpu=arc700 \
   arm-wrs-vxworks arm-netbsdelf \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 3b2a659..deaed71 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -326,6 +326,10 @@ alpha*-*-*)
 	cpu_type=alpha
 	extra_options="${extra_options} g.opt"
 	;;
+amdgcn*)
+	cpu_type=gcn
+	use_gcc_stdint=wrap
+	;;
 am33_2.0-*-linux*)
 	cpu_type=mn10300
 	;;
@@ -1410,6 +1414,25 @@ ft32-*-elf)
 	tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
 	tmake_file="${tmake_file} ft32/t-ft32"
 	;;
+amdgcn-*-amdhsa)
+	tm_file="elfos.h gcn/gcn-hsa.h gcn/gcn.h newlib-stdint.h"
+	tmake_file="gcn/t-gcn-hsa"
+	native_system_header_dir=/include
+	extra_modes=gcn/gcn-modes.def
+	extra_objs="${extra_objs} gcn-tree.o"
+	extra_gcc_objs="driver-gcn.o"
+	case "$host" in
+	x86_64*-*-linux-gnu )
+		if test "$ac_cv_search_dlopen" != no; then
+			extra_programs="${extra_programs} gcn-run\$(exeext)"
+		fi
+		;;
+	esac
+	if test x$enable_as_accelerator = xyes; then
+		extra_programs="${extra_programs} mkoffload\$(exeext)"
+		tm_file="${tm_file} gcn/offload.h"
+	fi
+	;;
 moxie-*-elf)
 	gas=yes
 	gnu_ld=yes
@@ -4131,6 +4154,24 @@ case "${target}" in
 		esac
 		;;
 
+	amdgcn-*-*)
+		supported_defaults="arch tune"
+
+		for which in arch tune; do
+			eval "val=\$with_$which"
+			case ${val} in
+			"" | carrizo | fiji | gfx900 )
+# OK
+;;
+			*)
+echo "Unknown cpu used in --with-$which=$val." 1>&2
+exit 1
+;;
+			esac
+		done
+		[ "x$with_arch" = x ] && with_arch=fiji
+		;;
+
 	hppa*-*-*)
 		supported_defaults="arch schedule"
 
diff --git a/gcc/configure b/gcc/configure
index efbf621..c27bc17 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -782,6 +782,7 @@ manext
 LIBICONV_DEP
 LTLIBICONV
 LIBICONV
+DL_LIB
 LDEXP_LIB
 EXTRA_GCC_LIBS
 GNAT_LIBEXC
@@ -9726,6 +9727,69 @@ LDEXP_LIB="$LIBS"
 LIBS="$save_LIBS"
 
 
+# Some systems need dlopen
+save_LIBS="$LIBS"
+LIBS=
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing dlopen" >&5
+$as_echo_n "checking for library containing dlopen... " >&6; }
+if ${ac_cv_search_dlopen+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* 

[PATCH v3 09/10] Ignore LLVM's blank lines.

2018-12-12 Thread Andrew Stubbs

[Already approved by Jeff Law. Included here for completeness.]

The GCN toolchain must use the LLVM assembler and linker because there's no
binutils port.  The LLVM tools do not have the same diagnostic style as
binutils, so the "blank line(s) in output" tests are inappropriate (and very
noisy).

The LLVM tools also have different command line options, so it's not possible
to autodetect object formats in the same way.

This patch addresses both issues.

2018-12-12  Andrew Stubbs  

gcc/testsuite/
* lib/file-format.exp (gcc_target_object_format): Handle AMD GCN.
* lib/gcc-dg.exp (gcc-dg-prune): Ignore blank lines from the LLVM
linker.
* lib/target-supports.exp (check_effective_target_llvm_binutils): New.
---
 gcc/testsuite/lib/file-format.exp |  3 +++
 gcc/testsuite/lib/gcc-dg.exp  |  2 +-
 gcc/testsuite/lib/target-supports.exp | 15 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/file-format.exp b/gcc/testsuite/lib/file-format.exp
index 5c47246..c595fe2 100644
--- a/gcc/testsuite/lib/file-format.exp
+++ b/gcc/testsuite/lib/file-format.exp
@@ -41,6 +41,9 @@ proc gcc_target_object_format { } {
 } elseif { [istarget *-*-aix*] } {
 	# AIX doesn't necessarily have objdump, so hand-code it.
 	set gcc_target_object_format_saved coff
+} elseif { [istarget *-*-amdhsa*] } {
+	# AMD GCN uses LLVM objdump which is not CLI-compatible
+	set gcc_target_object_format_saved elf
 } else {
 set objdump_name [find_binutils_prog objdump]
 set open_file [open objfmtst.c w]
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 054d884..ba7cb91 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -363,7 +363,7 @@ proc gcc-dg-prune { system text } {
 
 # Complain about blank lines in the output (PR other/69006)
 global allow_blank_lines
-if { !$allow_blank_lines } {
+if { !$allow_blank_lines && ![check_effective_target_llvm_binutils]} {
 	set num_blank_lines [llength [regexp -all -inline "\n\n" $text]]
 	if { $num_blank_lines } {
 	global testname_with_flags
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index eb3fbcd..18c4b0c7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8742,6 +8742,14 @@ proc check_effective_target_offload_hsa { } {
 } "-foffload=hsa" ]
 }
 
+# Return 1 if the compiler has been configured with hsa offloading.
+
+proc check_effective_target_offload_gcn { } {
+return [check_no_compiler_messages offload_gcn assembly {
+	int main () {return 0;}
+} "-foffload=amdgcn-unknown-amdhsa" ]
+}
+
 # Return 1 if the target support -fprofile-update=atomic
 proc check_effective_target_profile_update_atomic {} {
 return [check_no_compiler_messages profile_update_atomic assembly {
@@ -9032,9 +9040,16 @@ proc check_effective_target_cet { } {
 } "-O2" ]
 }
 
+
 # Return 1 if target supports floating point "infinite"
 proc check_effective_target_inf { } {
 return [check_no_compiler_messages supports_inf assembly {
 const double pinf = __builtin_inf ();
 }]
 }
+
+# Return 1 if this target uses an LLVM assembler and/or linker
+proc check_effective_target_llvm_binutils { } {
+return [expr { [istarget amdgcn*-*-*]
+		   || [check_effective_target_offload_gcn] } ]
+}


[PATCH v3 07/10] Add dg-require-effective-target exceptions

2018-12-12 Thread Andrew Stubbs

[v2 was approved by Mike Stump.  This version adds the documentation,
but is otherwise unchanged.]

There are a number of tests that fail because they assume that exceptions are
available, but GCN does not support them, yet.

This patch adds "dg-require-effective-target exceptions" in all the affected
tests.  There's probably an automatic way to test for exceptions, but the
current implementation simply says that AMD GCN does not support them.  This
should ensure that no other targets are affected by the change.

2018-12-12  Andrew Stubbs  
Kwok Cheung Yeung  
Julian Brown  
Tom de Vries  

gcc/doc/
* sourcebuild.texi: Document dg-required-effective-target exceptions.

gcc/testsuite/
* c-c++-common/ubsan/pr71512-1.c: Require exceptions.
* c-c++-common/ubsan/pr71512-2.c: Require exceptions.
* gcc.c-torture/compile/pr34648.c: Require exceptions.
* gcc.c-torture/compile/pr41469.c: Require exceptions.
* gcc.dg/20111216-1.c: Require exceptions.
* gcc.dg/cleanup-10.c: Require exceptions.
* gcc.dg/cleanup-11.c: Require exceptions.
* gcc.dg/cleanup-12.c: Require exceptions.
* gcc.dg/cleanup-13.c: Require exceptions.
* gcc.dg/cleanup-5.c: Require exceptions.
* gcc.dg/cleanup-8.c: Require exceptions.
* gcc.dg/cleanup-9.c: Require exceptions.
* gcc.dg/gomp/pr29955.c: Require exceptions.
* gcc.dg/lto/pr52097_0.c: Require exceptions.
* gcc.dg/nested-func-5.c: Require exceptions.
* gcc.dg/pch/except-1.c: Require exceptions.
* gcc.dg/pch/valid-2.c: Require exceptions.
* gcc.dg/pr41470.c: Require exceptions.
* gcc.dg/pr42427.c: Require exceptions.
* gcc.dg/pr44545.c: Require exceptions.
* gcc.dg/pr47086.c: Require exceptions.
* gcc.dg/pr51481.c: Require exceptions.
* gcc.dg/pr51644.c: Require exceptions.
* gcc.dg/pr52046.c: Require exceptions.
* gcc.dg/pr54669.c: Require exceptions.
* gcc.dg/pr56424.c: Require exceptions.
* gcc.dg/pr64465.c: Require exceptions.
* gcc.dg/pr65802.c: Require exceptions.
* gcc.dg/pr67563.c: Require exceptions.
* gcc.dg/tree-ssa/pr41469-1.c: Require exceptions.
* gcc.dg/tree-ssa/ssa-dse-28.c: Require exceptions.
* gcc.dg/vect/pr46663.c: Require exceptions.
* lib/target-supports.exp (check_effective_target_exceptions): New.
---
 gcc/doc/sourcebuild.texi  |  3 +++
 gcc/testsuite/c-c++-common/ubsan/pr71512-1.c  |  1 +
 gcc/testsuite/c-c++-common/ubsan/pr71512-2.c  |  1 +
 gcc/testsuite/gcc.c-torture/compile/pr34648.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/pr41469.c |  1 +
 gcc/testsuite/gcc.dg/20111216-1.c |  1 +
 gcc/testsuite/gcc.dg/cleanup-10.c |  1 +
 gcc/testsuite/gcc.dg/cleanup-11.c |  1 +
 gcc/testsuite/gcc.dg/cleanup-12.c |  1 +
 gcc/testsuite/gcc.dg/cleanup-13.c |  1 +
 gcc/testsuite/gcc.dg/cleanup-5.c  |  1 +
 gcc/testsuite/gcc.dg/cleanup-8.c  |  1 +
 gcc/testsuite/gcc.dg/cleanup-9.c  |  1 +
 gcc/testsuite/gcc.dg/gomp/pr29955.c   |  1 +
 gcc/testsuite/gcc.dg/lto/pr52097_0.c  |  1 +
 gcc/testsuite/gcc.dg/nested-func-5.c  |  1 +
 gcc/testsuite/gcc.dg/pch/except-1.c   |  1 +
 gcc/testsuite/gcc.dg/pch/valid-2.c|  2 +-
 gcc/testsuite/gcc.dg/pr41470.c|  1 +
 gcc/testsuite/gcc.dg/pr42427.c|  1 +
 gcc/testsuite/gcc.dg/pr44545.c|  1 +
 gcc/testsuite/gcc.dg/pr47086.c|  1 +
 gcc/testsuite/gcc.dg/pr51481.c|  1 +
 gcc/testsuite/gcc.dg/pr51644.c|  1 +
 gcc/testsuite/gcc.dg/pr52046.c|  1 +
 gcc/testsuite/gcc.dg/pr54669.c|  1 +
 gcc/testsuite/gcc.dg/pr56424.c|  1 +
 gcc/testsuite/gcc.dg/pr64465.c|  1 +
 gcc/testsuite/gcc.dg/pr65802.c|  1 +
 gcc/testsuite/gcc.dg/pr67563.c|  1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr41469-1.c |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-28.c|  1 +
 gcc/testsuite/gcc.dg/vect/pr46663.c   |  1 +
 gcc/testsuite/lib/target-supports.exp | 10 ++
 34 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1204a54..6536a7c 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2228,6 +2228,9 @@ Target uses @code{__cxa_atexit}.
 @item default_packed
 Target has packed layout of structure members by default.
 
+@item exceptions
+Target supports exceptions.
+
 @item fgraphite
 Target supports Graphite optimizations.
 
diff --git a/gcc/testsuite/c-c++-common/ubsan/pr71512-1.c b/gcc/testsuite/c-c++-common/ubsan/pr71512-1.c
index 2a90ab1..8af9365 100644
--- a/gcc/testsuite/c-c++-common/ubsan/pr71512-1.c
+++ b/

[PATCH v3 10/10] Port testsuite to GCN

2018-12-12 Thread Andrew Stubbs

This collection of miscellaneous patches configures the testsuite to run on AMD
GCN in a standalone (i.e. not offloading) configuration.  It assumes you have
your Dejagnu set up to run binaries via the gcn-run tool.

Since the previous v2 posting, the sqrt_insn additional option mechanism
has been reworked.  Otherwise the patch has only been rebased.

2018-12-12  Andrew Stubbs  
Kwok Cheung Yeung  
Julian Brown  
Tom de Vries  

gcc/testsuite/
* gcc.dg/20020312-2.c: Add amdgcn support.
* gcc.dg/Wno-frame-address.c: Disable on amdgcn.
* gcc.dg/builtin-apply2.c: Likewise.
* gcc.dg/torture/stackalign/builtin-apply-2.c: Likewise.
* gcc.dg/gimplefe-28.c: Add dg-add-options for sqrt_insn.
* gcc.dg/intermod-1.c: Add -mlocal-symbol-id on amdgcn.
* gcc.dg/memcmp-1.c: Increase timeout factor.
* gcc.dg/pr59605-2.c: Addd -DMAX_COPY=1025 on amdgcn.
* gcc.dg/sibcall-10.c: xfail on amdgcn.
* gcc.dg/sibcall-9.c: Likewise.
* gcc.dg/tree-ssa/gen-vect-11c.c: Likewise.
* gcc.dg/tree-ssa/pr84512.c: Likewise.
* gcc.dg/tree-ssa/loop-1.c: Adjust expectations for amdgcn.
* gfortran.dg/bind_c_array_params_2.f90: Likewise.
* lib/target-supports.exp (check_effective_target_trampolines):
Configure amdgcn.
(check_profiling_available): Likewise.
(check_effective_target_global_constructor): Likewise.
(check_effective_target_return_address): Likewise.
(check_effective_target_fopenacc): Likewise.
(check_effective_target_fopenmp): Likewise.
(check_effective_target_vect_int): Likewise.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_vect_simd_clones): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_whole_vector_shift): Likewise.
(check_effective_target_vect_bswap): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_vect_float): Likewise.
(check_effective_target_vect_double): Likewise.
(check_effective_target_vect_perm): Likewise.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si): Likewise.
(check_effective_target_vect_widen_mult_qi_to_hi_pattern): Likewise.
(check_effective_target_vect_widen_mult_hi_to_si_pattern): Likewise.
(check_effective_target_vect_natural_alignment): Likewise.
(check_effective_target_vect_fully_masked): Likewise.
(check_effective_target_vect_element_align): Likewise.
(check_effective_target_vect_masked_store): Likewise.
(check_effective_target_vect_scatter_store): Likewise.
(check_effective_target_vect_condition): Likewise.
(check_effective_target_vect_cond_mixed): Likewise.
(check_effective_target_vect_char_mult): Likewise.
(check_effective_target_vect_short_mult): Likewise.
(check_effective_target_vect_int_mult): Likewise.
(check_effective_target_sqrt_insn): Likewise.
(check_effective_target_vect_call_sqrtf): Likewise.
(check_effective_target_vect_call_btrunc): Likewise.
(check_effective_target_vect_call_btruncf): Likewise.
(check_effective_target_vect_call_ceil): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_lto): Likewise.
(check_vect_support_and_set_flags): Likewise.
(check_effective_target_vect_stridedN): Enable when fully masked is
available.
(add_options_for_sqrt_insn): New procedure.
---
 gcc/testsuite/gcc.dg/20020312-2.c  |   2 +
 gcc/testsuite/gcc.dg/Wno-frame-address.c   |   2 +-
 gcc/testsuite/gcc.dg/builtin-apply2.c  |   2 +-
 gcc/testsuite/gcc.dg/gimplefe-28.c |   1 +
 gcc/testsuite/gcc.dg/intermod-1.c  |   1 +
 gcc/testsuite/gcc.dg/memcmp-1.c|   1 +
 gcc/testsuite/gcc.dg/pr59605-2.c   |   2 +-
 gcc/testsuite/gcc.dg/sibcall-10.c  |   2 +-
 gcc/testsuite/gcc.dg/sibcall-9.c   |   2 +-
 .../gcc.dg/torture/stackalign/builtin-apply-2.c|   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c   |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-1.c |   6 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c|   2 +-
 .../gfortran.dg/bind_c_array_params_2.f90  |   3 +-
 gcc/testsuite/lib/target-supports.exp  | 13

Re: Add a loop versioning pass

2018-12-12 Thread Richard Biener
On Thu, Dec 6, 2018 at 2:19 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> >> > The pass contains an awful lot of heuristics :/  Like last year
> >> > with the interchange pass I would suggest to rip most of it out
> >> > and first lay infrastructure with the cases you can positively
> >> > identify without applying heuristics or "hacks" like stripping
> >> > semantically required casts.  That makes it also clear which
> >> > testcases test which code-path.  That said, all the analyze
> >> > multiplications/plusses/factors stuff was extremely hard to review
> >> > and I have no overall picture why this is all so complicated or
> >> > necessary.
> >>
> >> I think the tests do cover most of this -- was glad to see that
> >> when Kyrill tried taking something out, one of the tests started
> >> to fail :-).  And the comments in the tests as well as the code
> >> are supposed to explain why this is desirable.  But I can try to
> >> spruce up the comments in the code if they're not detailed enough.
> >>
> >> The problem is that the motivating (Fortran) case can't be identified
> >> without applying heuristics.  All we see is a collection of adds and
> >> multiplies, with no semantic information to say which multiplies are for
> >> the inner dimension and which aren't.  I agree it would be nicer not
> >> to have them, which is why I'd originally suggested the IFN to provide
> >> information about dimensions.
> >
> > Sure, still a new pass having _all_ the heuristic built in with
> > 10s of testcases do not make it easy to review those heuristics...
>
> Fair :-)  In the patch below I've tried to cut down on the special cases
> and tried to add more comments explaining which patterns the code is
> trying to detect/reject.  Probably the key part is the one beginning:
>
> +   The main difficulty here isn't finding strides that could be used
> +   in a version check (that's pretty easy).  The problem instead is to
> +   avoid versioning for some stride S that is unlikely ever to be 1 at
> +   runtime.  Versioning for S == 1 on its own would lead to unnecessary
> +   code bloat, while adding S == 1 to more realistic version conditions
> +   would lose the optimisation opportunity offered by those other conditions.
>
> >> >> +/* Return true if in principle it is worth versioning an index 
> >> >> fragment of
> >> >> +   the form:
> >> >> +
> >> >> + (i * b * SCALE) / FACTOR
> >> >> +
> >> >> +   for the case in which b == 1.  */
> >> >> +
> >> >> +bool
> >> >> +loop_versioning::acceptable_scale_p (tree scale, poly_uint64 factor)
> >> >> +{
> >> >> +  /* See whether SCALE is a constant multiple of FACTOR, and if the
> >> >> + multiple is small enough for us to treat it as a potential grouped
> >> >> + access.  For example:
> >> >> +
> >> >> +   for (auto i : ...)
> >> >> +  y[i] = f (x[4 * i * stride],
> >> >> +x[4 * i * stride + 1],
> >> >> +x[4 * i * stride + 2]);
> >> >> +
> >> >> + would benefit from versioning for the case in which stride == 1.
> >> >> + High multiples of i * stride are less likely to benefit, and could
> >> >> + indicate a simulated multi-dimensional array.
> >> >> +
> >> >> + This is just a heuristic, to avoid having to do expensive group
> >> >> + analysis of the data references in a loop.  */
> >> >> +  poly_uint64 const_scale;
> >> >> +  unsigned int multiple;
> >> >> +  if (poly_int_tree_p (scale, &const_scale)
> >> >> +  && constant_multiple_p (const_scale, factor, &multiple))
> >> >> +{
> >> >> +  unsigned int maxval = PARAM_VALUE 
> >> >> (PARAM_LOOP_VERSIONING_GROUP_SIZE);
> >> >> +  return IN_RANGE (multiple, 1, maxval);
> >> >
> >> > Hmm.  So you _do_ want to version sth like
> >> >
> >> > struct X { int i; int j; } a[2048];
> >> >
> >> > for (int i = start; i < end; ++i)
> >> >   a[i*s].i = 1;
> >> >
> >> > ?  That is, even with s == 1 the accesses will not become contiguous?
> >> > OK, I suppose that's because you are looking at a stmt in isolation
> >> > and another stmt may access a[i*s].j here.
> >> >
> >> > That is, would it be a future improvement to run sth like the
> >> > vectorizers group access analysis on the references and perform
> >> > this check on whole groups then, possibly better being able to
> >> > constrain what is now the magic parameter 
> >> > PARAM_LOOP_VERSIONING_GROUP_SIZE?
> >>
> >> Yeah, possibly.  The problem is that we might end up having to reproduce
> >> vectoriser heuristics.  E.g. stores with gaps should be vectorisable
> >> in future with SVE, using contiguous predicated stores in which some
> >> lanes are inactive.  So we don't necessarily need the .j access for
> >> this to be worthwhile.
> >>
> >> But perhaps the argument for versioning for vectorisation is stronger
> >> for grouped accesses than contiguous ones.
> >>
> >> Note that the above example doesn't rely on the grouping heuristic
> >> since we just strip the COMPONENT_REF and then analyze the ARRAY_REF

Re: [PATCH, Fortran] pad char to int conversions with spaces instead of zeros (legacy)

2018-12-12 Thread Mark Eggleston



On 12/12/2018 11:52, Jakub Jelinek wrote:

What about:
   integer(kind=2) :: a
   a = -1
   print *, transfer (1_2, 1_8), transfer (a, 1_8)
end
?


I assume you meant transfer (-1_2, 1_8), the result from gfortran is 
65535 for both transfers.


I'm about to build the compiler with

  memset (buffer, 0x20, buffer_size);

instead of

  memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);

and will check again, if necessary a padding variable can be used 
instead initially set to zero and changed to 0x20 when it is known that 
the source is character.


Marl

--
https://www.codethink.co.uk/privacy.html



Re: [PATCH v3 09/10] Ignore LLVM's blank lines.

2018-12-12 Thread Rainer Orth
Hi Andrew,

> [Already approved by Jeff Law. Included here for completeness.]
>
> The GCN toolchain must use the LLVM assembler and linker because there's no
> binutils port.  The LLVM tools do not have the same diagnostic style as
> binutils, so the "blank line(s) in output" tests are inappropriate (and very
> noisy).
>
> The LLVM tools also have different command line options, so it's not possible
> to autodetect object formats in the same way.
>
> This patch addresses both issues.
>
> 2018-12-12  Andrew Stubbs  
>
>   gcc/testsuite/
>   * lib/file-format.exp (gcc_target_object_format): Handle AMD GCN.
>   * lib/gcc-dg.exp (gcc-dg-prune): Ignore blank lines from the LLVM
>   linker.
>   * lib/target-supports.exp (check_effective_target_llvm_binutils): New.

like "exceptions", this new effective-target keyword needs documenting
in sourcebuild.texi.

> +# Return 1 if the compiler has been configured with hsa offloading.
> +
> +proc check_effective_target_offload_gcn { } {
> +return [check_no_compiler_messages offload_gcn assembly {
> + int main () {return 0;}
> +} "-foffload=amdgcn-unknown-amdhsa" ]
> +}

As does this one.

> +
>  # Return 1 if the target support -fprofile-update=atomic
>  proc check_effective_target_profile_update_atomic {} {
>  return [check_no_compiler_messages profile_update_atomic assembly {
> @@ -9032,9 +9040,16 @@ proc check_effective_target_cet { } {
>  } "-O2" ]
>  }
>  
> +
>  # Return 1 if target supports floating point "infinite"
>  proc check_effective_target_inf { } {
>  return [check_no_compiler_messages supports_inf assembly {
>  const double pinf = __builtin_inf ();
>  }]
>  }

Why those additional newlines?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] [MSP430] Fix gcc.dg/pr85180.c and gcc.dg/pr87985.c timeouts for msp430-elf -mlarge

2018-12-12 Thread Jozef Lawrynowicz
Compilation of gcc.dg/pr85180.c and gcc.dg/pr87985.c times out after 5 minutes
for msp430 with -mlarge.

nonzero_bits1 (from rtlanal.c), recurses many times for each reg
because reg_nonzero_bits_for_combine (combine.c) never considers using 
last_set_nonzero_bits for the given reg when the reg is PSImode (i.e. Pmode for
msp430-elf -mlarge).

nonzero bits for a mode of class MODE_PARTIAL_INT are valid for a mode of class
MODE_INT, and vice-versa. The existing comment in reg_nonzero_bits_for_combine
explaining why last_set_nonzero_bits is valid even when the precision of the
last set mode is less than the current mode, also explains why
MODE_PARTIAL_INT and MODE_INT can be used interchangeably here:

> record_value_for_reg invoked nonzero_bits on the register
> with nonzero_bits_mode (because last_set_mode is necessarily integral
> and HWI_COMPUTABLE_MODE_P in this case) so bits in nonzero_bits_mode
> are all valid, hence in mode too since nonzero_bits_mode is defined
> to the largest HWI_COMPUTABLE_MODE_P mode.

The attached patch fixes the timeout with mlarge (compilation takes only a
couple of seconds) by allowing the last set nonzero bits for a 
reg to be used if either the current mode or last mode is
MODE_PARTIAL_INT or MODE_INT. Currently only MODE_INT is considered.

Successfully bootstrapped and regtested x86_64-pc-linux-gnu and msp430-elf
-msmall and -mlarge.

Ok for trunk?
>From 753dbbfab665020cece59496765086b3debe23f9 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 27 Nov 2018 19:03:53 +
Subject: [PATCH] Use last_set_nonzero_bits for a REG when REG mode is
 MODE_PARTIAL_INT

2018-12-12  Jozef Lawrynowicz  

	gcc/ChangeLog:

	* combine.c (reg_nonzero_bits_for_combine): Use last_set_nonzero_bits
	for a reg if the current mode or last set mode was MODE_PARTIAL_INT.

---
 gcc/combine.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 7e61139..73b80b6 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10245,8 +10245,10 @@ reg_nonzero_bits_for_combine (const_rtx x, scalar_int_mode xmode,
   if (rsp->last_set_value != 0
   && (rsp->last_set_mode == mode
 	  || (REGNO (x) >= FIRST_PSEUDO_REGISTER
-	  && GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
-	  && GET_MODE_CLASS (mode) == MODE_INT))
+	  && (GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
+		  || GET_MODE_CLASS (rsp->last_set_mode) == MODE_PARTIAL_INT)
+	  && (GET_MODE_CLASS (mode) == MODE_INT
+		  || GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)))
   && ((rsp->last_set_label >= label_tick_ebb_start
 	   && rsp->last_set_label < label_tick)
 	  || (rsp->last_set_label == label_tick
-- 
2.7.4



Re: [PATCH, Fortran] pad char to int conversions with spaces instead of zeros (legacy)

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 12:06:12PM +, Mark Eggleston wrote:
> 
> On 12/12/2018 11:52, Jakub Jelinek wrote:
> > What about:
> >integer(kind=2) :: a
> >a = -1
> >print *, transfer (1_2, 1_8), transfer (a, 1_8)
> > end
> > ?
> 
> I assume you meant transfer (-1_2, 1_8), the result from gfortran is 65535
> for both transfers.

It doesn't really matter that much, the question is what is in the upper
bits and mainly a) what you get with the vendor compilers b) what you get
with your patch.  Because by my reading if you use there 0x20, it would
print 2314885530818510847 or 2314885530818445313 or something similar.
> 
> I'm about to build the compiler with
> 
>   memset (buffer, 0x20, buffer_size);
> 
> instead of
> 
>   memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);
> 
> and will check again, if necessary a padding variable can be used instead
> initially set to zero and changed to 0x20 when it is known that the source
> is character.

Jakub


[RFA] require profiling support for gcc.dg/lto/20100430-1_0.c test

2018-12-12 Thread Joel Brobecker
Hello,

This test currently fails unexpectedly if GCC is configured with
--disable-gcov, because it requires -fprofile-arcs. This patch
fixes the issue by requiring profiling support in order to run
this test.

Tested with two compilers, one built with --disable-gcov, resulting
in the test reporting an UNSUPPORTED result; and one built with gcov
support, resulting in 2 PASS tests.

gcc/testsuite/ChangeLog:

* gcc.dg/lto/20100430-1_0.c: Add dg-require-profiling requirement.

OK to push?

Thank you,
-- 
Joel

---
 gcc/testsuite/gcc.dg/lto/20100430-1_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/lto/20100430-1_0.c 
b/gcc/testsuite/gcc.dg/lto/20100430-1_0.c
index 1ccfc9ac1ca..c7dafac31af 100644
--- a/gcc/testsuite/gcc.dg/lto/20100430-1_0.c
+++ b/gcc/testsuite/gcc.dg/lto/20100430-1_0.c
@@ -1,6 +1,7 @@
 /* { dg-lto-do link } */
 /* { dg-lto-options {{-O2 -fprofile-arcs -flto -r -nostdlib}} } */
 /* { dg-extra-ld-options "-flinker-output=nolto-rel" } */
+/* { dg-require-profiling "-fprofile-arcs" } */
 
 
 void
-- 
2.17.1



Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Richard Sandiford
Steve Ellcey  writes:
> This is the modified version of the second of my Aarch64 SIMD ABI patches.
> While implementing this functionality I found I wanted
> targetm.simd_clone.adjust to be called when creating SIMD clone definitions
> and also when creating SIMD clone declarations.  The current
> implementation (used only by x86) only called this target function when
> creating clone definitions.  I added a second argument to the target
> function to say if it was creating a definition or a declaration and
> modified the i386 code to do nothing on declarations, thus maintaining
> its current behavour.
>
> This allowed my to add the aarch64_vector_pcs attribute to SIMD clone
> declarations and definitions on Aarch64. 
>
> I considered comparing node->decl and cfun->decl to differentiate
> between definitions and declarations instead of using a new argument
> but having an argument seemed cleaner and clearer.

Yeah, agreed.

> +  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
> +  if (TREE_CODE (ret_type) != VOID_TYPE)
> +switch (TYPE_MODE (ret_type))
> +  {
> +  case E_QImode:
> +  case E_HImode:
> +  case E_SImode:
> +  case E_DImode:
> +  case E_SFmode:
> +  case E_DFmode:
> +  /* case E_SCmode: */
> +  /* case E_DCmode: */
> + break;
> +  default:
> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> + "unsupported return type %qT for simd\n", ret_type);
> + return 0;
> +  }

I think this should be using the tree-level properties of the type
rather than the mode.  E.g.:

struct S { int i[1]; };

has SImode (I think) but the vector ABI's "PBV" property is false for that.

I think we then need to separate cases in which the code is warning
about things that are invalid according to the vector ABI vs. those
that GCC simply doesn't support yet.  At the moment I think there's
an implicit requirement in the vector ABI that PBV has to be true for
non-void return types, so warning about the struct above would be a
correctness test.

But as the commented-out lines say, complex types are also valid,
so if we don't support those yet, I think the warning should say
something like "GCC does not yet support ...".  ("sorry (...)" is
too strong, since it's error-class.)

We should handle 16-bit floats as well.

> +  tree t;
> +  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
> +/* FIXME: Shouldn't we allow such arguments if they are uniform?  */

Yeah.

> +switch (TYPE_MODE (TREE_TYPE (t)))
> +  {
> +  case E_QImode:
> +  case E_HImode:
> +  case E_SImode:
> +  case E_DImode:
> +  case E_SFmode:
> +  case E_DFmode:
> +  /* case E_SCmode: */
> +  /* case E_DCmode: */
> + break;
> +  default:
> + warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> + "unsupported argument type %qT for simd\n", TREE_TYPE (t));
> + return 0;
> +  }

Same comments here.  The vector ABI doesn't place this restriction on
parameters: if PBV is false, the elements are passed by pointer instead.

> +
> +  if (TARGET_SIMD)
> +{
> +clonei->vecsize_mangle = 'n';
> +clonei->mask_mode = VOIDmode;
> +clonei->vecsize_int = 128;
> +clonei->vecsize_float = 128;
> +
> +if (clonei->simdlen == 0)
> +  {
> +  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
> + clonei->simdlen = clonei->vecsize_int;
> +  else
> + clonei->simdlen = clonei->vecsize_float;
> +  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));

Are we sure base_type is correct for the AArch64 vector ABI?
I think at the moment it's specific to the Intel ABI.

> +  }
> +else if (clonei->simdlen > 16)
> +  {
> +  /* If it is possible for given SIMDLEN to pass CTYPE value in
> +  registers (v0-v7) accept that SIMDLEN, otherwise warn and don't
> +  emit corresponding clone.  */

This too would be a GCC restriction rather than an ABI one.  Could you
add a comment explaning why we need it?

> +/* If SIMD clone NODE can't be used in a vectorized loop
> +   in current function, return -1, otherwise return a badness of using it
> +   (0 if it is most desirable from vecsize_mangle point of view, 1
> +   slightly less desirable, etc.).  */
> +
> +static int
> +aarch64_simd_clone_usable (struct cgraph_node *node)

Think this should just say:

/* Implement TARGET_SIMD_CLONE_USABLE.  */

instead of duplicating the hook definition.  Maybe there's something
AArch64-specific we can add, but doesn't matter if not.

> diff --git a/gcc/target.def b/gcc/target.def
> index 96f37e0..ffc3787 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -1632,8 +1632,10 @@ int, (struct cgraph_node *, struct cgraph_simd_clone 
> *, tree, int), NULL)
>  DEFHOOK
>  (adjust,
>  "This hook should add implicit @code{attribute(target(\"...\"))} attribute\n\
> -to SIMD clone @var{node} if needed.",
> -void, (struct cgraph_node *), NULL)
> +to SIMD clone @var{node} if n

Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 12:34:46PM +, Richard Sandiford wrote:
> > I considered comparing node->decl and cfun->decl to differentiate
> > between definitions and declarations instead of using a new argument
> > but having an argument seemed cleaner and clearer.
> 
> Yeah, agreed.

I actually disagree, there is no point in passing another argument.
You should be able to just check node->definition whether it is a definition
or declaration.

Jakub


Re: [RFA] require profiling support for gcc.dg/lto/20100430-1_0.c test

2018-12-12 Thread Richard Biener
On Wed, Dec 12, 2018 at 1:23 PM Joel Brobecker  wrote:
>
> Hello,
>
> This test currently fails unexpectedly if GCC is configured with
> --disable-gcov, because it requires -fprofile-arcs. This patch
> fixes the issue by requiring profiling support in order to run
> this test.
>
> Tested with two compilers, one built with --disable-gcov, resulting
> in the test reporting an UNSUPPORTED result; and one built with gcov
> support, resulting in 2 PASS tests.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/lto/20100430-1_0.c: Add dg-require-profiling requirement.
>
> OK to push?

OK.

> Thank you,
> --
> Joel
>
> ---
>  gcc/testsuite/gcc.dg/lto/20100430-1_0.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.dg/lto/20100430-1_0.c 
> b/gcc/testsuite/gcc.dg/lto/20100430-1_0.c
> index 1ccfc9ac1ca..c7dafac31af 100644
> --- a/gcc/testsuite/gcc.dg/lto/20100430-1_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/20100430-1_0.c
> @@ -1,6 +1,7 @@
>  /* { dg-lto-do link } */
>  /* { dg-lto-options {{-O2 -fprofile-arcs -flto -r -nostdlib}} } */
>  /* { dg-extra-ld-options "-flinker-output=nolto-rel" } */
> +/* { dg-require-profiling "-fprofile-arcs" } */
>
>
>  void
> --
> 2.17.1
>


Re: [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

2018-12-12 Thread Thomas Schwinge
Hi!

On Wed, 12 Dec 2018 10:36:17 +0100, Jakub Jelinek  wrote:
> On Wed, Dec 12, 2018 at 09:29:36AM +0100, Tom de Vries wrote:
> > > This is the libgomp/ except libgomp/testsuite/ part of the gomp-5_0-branch
> > > merge to trunk I've just committed.

> > [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

Thanks, Tom, for looking into these issues.  I'd also noticed these, but
not yet been able to allocate time to resolve them.  I'll test your
patches, too.


> Eventually we need to find a way how to transfer some ICVs and other data
> from the host to the offloading library, either process shared that aren't
> changing, which can include the hostname, getpid, or some others that would
> need to be passed for every target region.

That would probably also include any state that the respective language
runtime libraries maintain?  For example, C's global rounding mode as set
my "fesetround".  ..., and I now wonder how to propagate that from the
host libc to the target libcs, for example host: glibc vs. nvptx
offloading: newlib -- have to do a host-side "fegetround" before each
offloaded code region, or at least when there's been an intermediate
host-side "fesetround" call (so have to track these?), and likewise for
any other such state-changing functions?  Also, the "options" array and
call of "_gfortran_set_options" that the Fortran front end synthesizes
into each "main" function?

I have not yet looked into these things in detail, however.


Grüße
 Thomas


Re: [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

2018-12-12 Thread Tom de Vries
On 12-12-18 11:48, Tom de Vries wrote:
> On 12-12-18 10:36, Jakub Jelinek wrote:
>> On Wed, Dec 12, 2018 at 09:29:36AM +0100, Tom de Vries wrote:
 This is the libgomp/ except libgomp/testsuite/ part of the gomp-5_0-branch
 merge to trunk I've just committed.

 2018-11-08  Jakub Jelinek  

* affinity.c (gomp_display_affinity_place): New function.
* affinity-fmt.c: New file.
>>>
* fortran.c: Include stdio.h and string.h.
(omp_pause_resource, omp_pause_resource_all): Add ialias_redirect.
(omp_get_schedule_, omp_get_schedule_8_): Mask off GFS_MONOTONIC bit.
(omp_set_affinity_format_, omp_get_affinity_format_,
omp_display_affinity_, omp_capture_affinity_, omp_pause_resource_,
omp_pause_resource_all_): New functions.
>>>
>>> OK for trunk?
>>>
>>> Thanks,
>>> - Tom
>>
>>> [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support
>>>
>>> Disable compilation of support for OMP_DISPLAY_AFFINITY and 
>>> OMP_AFFINITY_FORMAT
>>> for nvptx.
>>>
>>> This fixes some libgomp testsuite failures for x86_64 with nvptx 
>>> accelerator.
>>>
>>> Build on x86_64 with nvptx accelerator, tested libgomp.
>>>
>>> 2018-12-12  Tom de Vries  
>>>
>>> * affinity.c (gomp_display_affinity_place): Guard with
>>> #ifndef LIBGOMP_OFFLOADED_ONLY.
>>> * fortran.c (omp_set_affinity_format_, omp_get_affinity_format_)
>>> (omp_display_affinity_, omp_capture_affinity_): Same.
>>> * libgomp/config/nvptx/affinity-fmt.c: New, empty file overriding
>>> libgomp/affinity-fmt.c.
>>
>> The runtime APIs should still be available inside of OpenMP regions, your
>> patch makes them unavailable.
> 
> Ah, I see.
> 
>> What exactly doesn't work?
>> hostname handling, getpid, thread_id, affinity places?
> 
> The concrete error I'm getting is unresolved symbol getpid.  This is
> caused by the AC_COMPILE_IFELSE configure test for HAVE_GETPID passing
> for nvptx, while the master newlib for nvptx does not contain getpid.
> 
>> I'd prefer for now to just stub what doesn't work in the 
>> LIBGOMP_OFFLOADED_ONLY
>> case (print "node", 0 as pid, 0 as thread id, something for affinity
>> (0 or 0-N where N is the number of warps it can support or whatever).
>>
> 
> OK, I'll give that a try.
> 

This RFC patch implements that approach for getpid and gethostname (I
wonder though whether it's not possible to turn the corresponding
configure tests into link tests, which could also fix this for nvptx).

Another problem we're running into is missing istty, pulled in by
fwrite. The nvptx newlib port does support write, but I'm unsure in what
form I should handle this.  Add configure tests for fwrite and write? Or
special case the files in the config/nvptx dir? Any advice here?

Thanks,
- Tom
[RFC][libgomp, nvptx] Fix target-5.c compilation

Libgomp test-case libgomp.c/target-5.c is failing to compile with nvptx
accelerator due to missing:
- getpid
- gethostname
- isatty (pulled in by fwrite)
in the nvptx newlib.

This demonstrator patch fixes that by minimally guarding all related locations
with ifndef LIBGOMP_OFFLOADED_ONLY, which allows the test-case (and others) to
pass.

---
 libgomp/affinity-fmt.c | 12 +++-
 libgomp/fortran.c  |  4 
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/libgomp/affinity-fmt.c b/libgomp/affinity-fmt.c
index 9e5c5f754eb..72b511eec73 100644
--- a/libgomp/affinity-fmt.c
+++ b/libgomp/affinity-fmt.c
@@ -184,6 +184,7 @@ static void
 gomp_display_hostname (char *buffer, size_t size, size_t *ret,
 		   bool right, size_t sz)
 {
+#ifndef LIBGOMP_OFFLOADED_ONLY
 #ifdef HAVE_GETHOSTNAME
   {
 char buf[256];
@@ -227,6 +228,7 @@ gomp_display_hostname (char *buffer, size_t size, size_t *ret,
 	return;
   }
   }
+#endif
 #endif
   gomp_display_string_len (buffer, size, ret, right, sz, "node", 4);
 }
@@ -351,7 +353,7 @@ gomp_display_affinity (char *buffer, size_t size,
 	  gomp_display_hostname (buffer, size, &ret, right, sz);
 	  break;
 	case 'P':
-#ifdef HAVE_GETPID
+#if defined(HAVE_GETPID) && !defined(LIBGOMP_OFFLOADED_ONLY)
 	  val = getpid ();
 #else
 	  val = 0;
@@ -456,13 +458,17 @@ omp_display_affinity (const char *format)
   if (ret < sizeof buf)
 {
   buf[ret] = '\n';
+#ifndef LIBGOMP_OFFLOADED_ONLY
   fwrite (buf, 1, ret + 1, stderr);
+#endif
   return;
 }
   b = gomp_malloc (ret + 1);
   ialias_call (omp_capture_affinity) (b, ret + 1, format);
   b[ret] = '\n';
+#ifndef LIBGOMP_OFFLOADED_ONLY
   fwrite (b, 1, ret + 1, stderr);
+#endif
   free (b);
 }
 
@@ -477,13 +483,17 @@ gomp_display_affinity_thread (gomp_thread_handle handle,
   if (ret < sizeof buf)
 {
   buf[ret] = '\n';
+#ifndef LIBGOMP_OFFLOADED_ONLY
   fwrite (buf, 1, ret + 1, stderr);
+#endif
   return;
 }
   b = gomp_malloc (ret + 1);
   gomp_display_affinity (b, ret + 1, gomp_affinity_format_var,
   			 handle, ts, place);
   b[ret] = '\n';
+#ifndef LIBGOMP_OFFLOADED_ONLY
   fwrite (b

Re: [libgomp, nvptx] Disable OMP_{DISPLAY_AFFINITY,AFFINITY_FORMAT} support

2018-12-12 Thread Jakub Jelinek
On Wed, Dec 12, 2018 at 02:02:20PM +0100, Tom de Vries wrote:
> This RFC patch implements that approach for getpid and gethostname (I
> wonder though whether it's not possible to turn the corresponding
> configure tests into link tests, which could also fix this for nvptx).
> 
> Another problem we're running into is missing istty, pulled in by
> fwrite. The nvptx newlib port does support write, but I'm unsure in what
> form I should handle this.  Add configure tests for fwrite and write? Or
> special case the files in the config/nvptx dir? Any advice here?
> 
> Thanks,
> - Tom

> [RFC][libgomp, nvptx] Fix target-5.c compilation
> 
> Libgomp test-case libgomp.c/target-5.c is failing to compile with nvptx
> accelerator due to missing:
> - getpid
> - gethostname
> - isatty (pulled in by fwrite)
> in the nvptx newlib.
> 
> This demonstrator patch fixes that by minimally guarding all related locations
> with ifndef LIBGOMP_OFFLOADED_ONLY, which allows the test-case (and others) to
> pass.

I guess my preference would be something similar to what
libgomp/config/mingw32/affinity-fmt.c does now, so override the file,
define/undefine something in there and include the common affinity-fmt.c.
Perhaps for the fwrite stuff define introduce a function that does that
(say
void
gomp_print_string (const char *str, size_t len)
{
  fwrite (str, 1, len, stderr);
},
because it is in more than one file, make it hidden in libgomp.h.
config/nvptx/error.c then has what you could also use in
config/nvptx/affinity-fmt.c.
And for hostname/getpid maybe just #undef those with a comment?

Jakub


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Christophe Lyon
On Wed, 12 Dec 2018 at 12:21, Thomas Preudhomme
 wrote:
>
> So my understanding is that the original code (CMSIS library) used to
> clobber sp because the asm statement was actually changing the sp.
> That in turn led GCC to try to save and restore sp which is not what
> CMSIS was expecting to happen. Changing sp without clobber as done now
> is probably the right solution and r242693 can be reverted. That will
> remove the failing test.
>

OK, I read PR52813 too, but I'm not sure to fully understand the new status.
My understanding is that since this patch was committed, if an asm statement
clobbers sp, it is now allowed to actually declare it as clobber (this patch
generates an error in such a case).
So the user is now expected to lie to the compiler when writing to
this kind of register (sp, pic register), by not declaring it as "clobber"?


> Best regards,
>
> Thomas
> On Wed, 12 Dec 2018 at 10:30, Thomas Preudhomme
>  wrote:
> >
> > Hi Christophe,
> >
> > That PR was about a bug occuring when sp was clobbered so if it cannot
> > be clobbered anymore the whole commit (r242693) can be removed. Let me
> > check the original code that lead to the PR why it's clobbering sp
> > though.
> >
> > Best regards,
> >
> > Thomas
> > On Wed, 12 Dec 2018 at 09:43, Christophe Lyon
> >  wrote:
> > >
> > > On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
> > >  wrote:
> > > >
> > > > Dimitar Dimitrov  writes:
> > > > > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford 
> > > > > wrote:
> > > > >> Dimitar Dimitrov  writes:
> > > > >> > I have tested this fix on x86_64 host, and found no regression in 
> > > > >> > the C
> > > > >> > and C++ testsuites.  I'm marking this patch as RFC simply because 
> > > > >> > I don't
> > > > >> > have experience with other architectures, and I don't have a setup 
> > > > >> > to
> > > > >> > test all architectures supported by GCC.
> > > > >> >
> > > > >> > gcc/ChangeLog:
> > > > >> >
> > > > >> > 2018-12-07  Dimitar Dimitrov  
> > > > >> >
> > > > >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> > > > >> >error when stack pointer is clobbered.
> > > > >> >(expand_asm_stmt): Refactor clobber check in separate function.
> > > > >> >
> > > > >> > gcc/testsuite/ChangeLog:
> > > > >> >
> > > > >> > 2018-12-07  Dimitar Dimitrov  
> > > > >> >
> > > > >> >* gcc.target/i386/pr52813.c: New test.
> > > > >> >
> > > > >> > Signed-off-by: Dimitar Dimitrov 
> > > > >>
> > > > >> LGTM.  Do you have a copyright assignment on file?  'Fraid this is
> > > > >> probably big enough to need one.
> > > > > Yes, I have copyright assignment.
> > > >
> > > > OK, great.  I went ahead and applied the patch.
> > > >
> > >
> > > Hi,
> > >
> > > This patch introduces a regression on arm:
> > > FAIL: gcc.target/arm/pr77904.c (test for excess errors)
> > > Excess errors:
> > > /gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
> > > register clobbered by 'sp' in 'asm'
> > >
> > > Indeed the testcase has an explicit:
> > >   __asm volatile ("" : : : "sp");
> > > which is now rejected.
> > >
> > > Thomas, is that mandatory to test your code to fix pr77904?
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > > Thanks,
> > > > Richard


Re: [C++ PATCH] PR c++/87051

2018-12-12 Thread Marek Polacek
On Wed, Dec 12, 2018 at 09:17:01AM +0200, Ville Voutilainen wrote:
> On Tue, 11 Dec 2018 at 20:58, Marek Polacek  wrote:
> >
> > On Thu, Sep 13, 2018 at 08:58:34PM +0300, Ville Voutilainen wrote:
> > > On 13 September 2018 at 20:41, Jason Merrill  wrote:
> > > >> Okay. Do you think we should have an sfk_kind for non-canonical
> > > >> copy/move operations? That would presumably make it a tad more 
> > > >> straightforward to go from
> > > >> fndecl to whatever class bits, instead of what's currently there, 
> > > >> where we say "yeah I had a fndecl,
> > > >> now I turned it into an sfk_kind that says it's a copy constructor, 
> > > >> but guess which one when you're
> > > >> deeming its triviality". ;)
> > > >
> > > > I suppose it would be possible to have a more detailed sfk_kind for
> > > > distinguishing between different signatures, but I'm inclined instead
> > > > to stop using sfk_kind in trivial_fn_p.  Even if having an enumeration
> > > > is convenient for dispatch (or bitmapping), it doesn't need to be the
> > > > same enum.
> > >
> > > Yeah, the idea of using a different enum dawned on me straight after
> > > sending that email. ;)
> > > I'll give this approach a spin, more bits into the lang_type and a
> > > different mapping, that way we should indeed
> > > get correct answers for all cases.
> >
> > Hi Ville, any updates?
> 
> No, and not any time soon. Do you by any chance want to pick this up? :)

Ok, this sounds interesting -- I'll give it a try.

Marek


Re: [PATCH][libbacktrace] Add allocfail.sh test-case

2018-12-12 Thread Ian Lance Taylor via gcc-patches
On Tue, Dec 11, 2018 at 11:04 PM Tom de Vries  wrote:
>
> [ Fixed ENOPATCH ]
>
> On 12-12-18 08:03, Tom de Vries wrote:
> > On 11-12-18 18:59, Ian Lance Taylor wrote:
> >> On Wed, Nov 28, 2018 at 4:50 AM Tom de Vries  wrote:
> >>>
> >>> Add test-case that forces alloc.c functions to fail, and check whether 
> >>> fail
> >>> handling is robust.
> >>>
> >>> This is the test-case for "[libbacktrace] Fix segfault upon allocation
> >>> failure".  Without that patch, this test-case fails like this:
> >>> ...
> >>> allocfail.sh: line 71: 26041 Segmentation fault  (core dumped) \
> >>>   ./allocfail $i > /dev/null 2>&1
> >>> Unallowed fail found: 13
> >>> FAIL allocfail.sh (exit status: 1)
> >>> ...
> >>>
> >>> This is a seperate patch because the test-case is nontrivial.
> >>>
> >>> Bootstrapped and reg-tested on x86_64.
> >>>
> >>> OK for trunk?
> >>>
> >>> Thanks,
> >>> - Tom
> >>>
> >>> [libbacktrace] Add allocfail.sh test-case
> >>>
> >>> 2018-11-27  Tom de Vries  
> >>>
> >>> * Makefile.am (TESTS): Add allocfail.sh.
> >>> (check_PROGRAMS): Add allocfail.
> >>> * Makefile.in: Regenerate.
> >>> * instrumented_alloc.c: New file.  Redefine malloc and realloc.
> >>> Include alloc.c.
> >>> * allocfail.c: New file.
> >>> * allocfail.sh: New file.
> >>
> >> Can you redo this without using GNU make features like $(filter-out) ?
> >
> > Hi,
> >
> > Done, and re-bootstrapped-and-regtested.
> >
> > OK for trunk?

This is OK.

Thanks.

Ian


Re: RFA: libiberty: Add a limit on demangling qualifiers (PR 87241) (version 2)

2018-12-12 Thread Ian Lance Taylor via gcc-patches
On Wed, Dec 12, 2018 at 3:40 AM Nick Clifton  wrote:
>
>   *sigh* 5 minutes after sending the patch for this PR, I realised that
>I had made a mistake.  I should have conditionalized the limit on the
>number of supported qualifiers, so that the check is only made if we
>have resource limits enabled.  Like this:
>
> Cheers
>   Nick
>
> Index: libiberty/cplus-dem.c
> ===
> --- libiberty/cplus-dem.c   (revision 267043)
> +++ libiberty/cplus-dem.c   (working copy)
> @@ -3443,6 +3443,20 @@
>success = 0;
>  }
>
> +  if ((work->options & DMGL_NO_RECURSE_LIMIT) == 0)
> +{
> +  /* PR 87241: Catch malicious input that will try to trick this code 
> into
> +allocating a ridiculous amount of memory via the remember_Ktype()
> +function.
> +The choice of DEMANGLE_RECURSION_LIMIT is somewhat arbitrary.  
> Possibly
> +a better solution would be to track how much memory remember_Ktype
> +allocates and abort when some upper limit is reached.  */
> +  if (qualifiers > DEMANGLE_RECURSION_LIMIT)
> +   /* FIXME: We ought to have some way to tell the user that
> +  this limit has been reached.  */
> +   success = 0;
> +}
> +
>if (!success)
>  return success;


This is OK.

Thanks.,

I thought we were removing the old demangling schemes?

Ian


Re: [RFA] require profiling support for gcc.dg/lto/20100430-1_0.c test

2018-12-12 Thread Joel Brobecker
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/lto/20100430-1_0.c: Add dg-require-profiling requirement.
> >
> > OK to push?
> 
> OK.

Thank you Richard. Patch pushed to trunk (r267055).

-- 
Joel


Re: C++ PATCH for c++/88136, -Wdeprecated-copy too noisy

2018-12-12 Thread Jason Merrill
On Sat, Dec 8, 2018 at 1:33 PM Ville Voutilainen
 wrote:
> On Sat, 8 Dec 2018 at 20:05, Ville Voutilainen
>  wrote:
>
> > > New compiler releases will usually include new warnings that require
> > > some code modification to accommodate.  Why is this one particularly
> > > problematic?
> >
> > I don't think it's any more problematic than any other warning that
> > introduces new errors for fools that build with -Wall and -Werror.
> > But considering that those errors are false positives, and that
> > turning them off will in some cases require writing boiler-plate
> > (defaulted assignments), I would merely prefer having another release
> > round to get fixes for my codebase out in the wild.
>
> For what it's worth, I find it unfortunate that this deprecation and its 
> resulting warnings end up
> making the decision on whether a "rule of 5" must be followed; correct code 
> needs to be adjusted
> to cope with a fairly stylistic matter, with false positives and all.

I don't see it as a stylistic matter.  If you need a user-provided
copy constructor to get proper copy semantics for a class, you almost
certainly need the same thing for copy assignment.  This was too noisy
for destructors, for which it's fairly common to define a virtual
destructor just to make a class polymorphic, not because there are
significant destruction semantics.  But I don't see a similar argument
for copy constructors: in your example, there was no need for
QVariant::Private to define a copy constructor, and that seems like a
situation where a warning is reasonable, even if the code is in fact
correct.

Jason


[C++ Patch] Fix location of grokdeclarator error message about static data member definition

2018-12-12 Thread Paolo Carlini

Hi,

it seems we can easily improve the location of this - not so uncommon in 
novice code - error to point to the 'static' keyword.


Tested x86_64-linux.

Thanks, Paolo.

/

/cp
2018-12-12  Paolo Carlini  

* decl.c (grokdeclarator): Fix location of error message about
static data member definition.

/testsuite
2018-12-12  Paolo Carlini  

* g++.dg/other/static5.C: New.
Index: cp/decl.c
===
--- cp/decl.c   (revision 267051)
+++ cp/decl.c   (working copy)
@@ -12724,7 +12724,8 @@ grokdeclarator (const cp_declarator *declarator,
DECL_CONTEXT (decl) = ctype;
if (staticp == 1)
  {
-   permerror (input_location, "% may not be used when 
defining "
+   permerror (declspecs->locations[ds_storage_class],
+  "% may not be used when defining "
   "(as opposed to declaring) a static data member");
staticp = 0;
storage_class = sc_none;
Index: testsuite/g++.dg/other/static5.C
===
--- testsuite/g++.dg/other/static5.C(nonexistent)
+++ testsuite/g++.dg/other/static5.C(working copy)
@@ -0,0 +1,8 @@
+struct S
+{
+  static int i;
+  const static double d;
+};
+
+static int S::i;  // { dg-error "1:.static. may not be used" }
+const static double S::d = 1.0;  // { dg-error "7:.static. may not be used" }


Re: [PATCH, Fortran] pad char to int conversions with spaces instead of zeros (legacy)

2018-12-12 Thread Mark Eggleston

On 12/12/2018 12:06, Mark Eggleston wrote:

I'm about to build the compiler with

  memset (buffer, 0x20, buffer_size);

instead of

  memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);

and will check again, if necessary a padding variable can be used 
instead initially set to zero and changed to 0x20 when it is known 
that the source is character.


It was indeed necessary to only use 0x20 for padding when the source is 
known to be character. One more check was to transfer a character(4) 
variable to an integer(8) variable, there is no space padding. I don't 
yet know whether this matches the behaviour of other compilers.


I don't currently have access to other compilers. I can have some test 
cases performed on xlf and SunStudio but won't get any answers until 
after Christmas. The answer will determine whether I have any more work 
to do.


Mark


--
https://www.codethink.co.uk/privacy.html



Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Christophe Lyon
On Wed, 12 Dec 2018 at 14:19, Christophe Lyon
 wrote:
>
> On Wed, 12 Dec 2018 at 12:21, Thomas Preudhomme
>  wrote:
> >
> > So my understanding is that the original code (CMSIS library) used to
> > clobber sp because the asm statement was actually changing the sp.
> > That in turn led GCC to try to save and restore sp which is not what
> > CMSIS was expecting to happen. Changing sp without clobber as done now
> > is probably the right solution and r242693 can be reverted. That will
> > remove the failing test.
> >
>
> OK, I read PR52813 too, but I'm not sure to fully understand the new status.
> My understanding is that since this patch was committed, if an asm statement
> clobbers sp, it is now allowed to actually declare it as clobber (this patch
> generates an error in such a case).
> So the user is now expected to lie to the compiler when writing to
> this kind of register (sp, pic register), by not declaring it as "clobber"?
>

I'm attaching a small patch which adds a more verbose error message
along the lines of what I understand of the current status.
I'm pretty sure I got (at least) the formatting wrong :)

Christophe

>
> > Best regards,
> >
> > Thomas
> > On Wed, 12 Dec 2018 at 10:30, Thomas Preudhomme
> >  wrote:
> > >
> > > Hi Christophe,
> > >
> > > That PR was about a bug occuring when sp was clobbered so if it cannot
> > > be clobbered anymore the whole commit (r242693) can be removed. Let me
> > > check the original code that lead to the PR why it's clobbering sp
> > > though.
> > >
> > > Best regards,
> > >
> > > Thomas
> > > On Wed, 12 Dec 2018 at 09:43, Christophe Lyon
> > >  wrote:
> > > >
> > > > On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Dimitar Dimitrov  writes:
> > > > > > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford 
> > > > > > wrote:
> > > > > >> Dimitar Dimitrov  writes:
> > > > > >> > I have tested this fix on x86_64 host, and found no regression 
> > > > > >> > in the C
> > > > > >> > and C++ testsuites.  I'm marking this patch as RFC simply 
> > > > > >> > because I don't
> > > > > >> > have experience with other architectures, and I don't have a 
> > > > > >> > setup to
> > > > > >> > test all architectures supported by GCC.
> > > > > >> >
> > > > > >> > gcc/ChangeLog:
> > > > > >> >
> > > > > >> > 2018-12-07  Dimitar Dimitrov  
> > > > > >> >
> > > > > >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> > > > > >> >error when stack pointer is clobbered.
> > > > > >> >(expand_asm_stmt): Refactor clobber check in separate 
> > > > > >> > function.
> > > > > >> >
> > > > > >> > gcc/testsuite/ChangeLog:
> > > > > >> >
> > > > > >> > 2018-12-07  Dimitar Dimitrov  
> > > > > >> >
> > > > > >> >* gcc.target/i386/pr52813.c: New test.
> > > > > >> >
> > > > > >> > Signed-off-by: Dimitar Dimitrov 
> > > > > >>
> > > > > >> LGTM.  Do you have a copyright assignment on file?  'Fraid this is
> > > > > >> probably big enough to need one.
> > > > > > Yes, I have copyright assignment.
> > > > >
> > > > > OK, great.  I went ahead and applied the patch.
> > > > >
> > > >
> > > > Hi,
> > > >
> > > > This patch introduces a regression on arm:
> > > > FAIL: gcc.target/arm/pr77904.c (test for excess errors)
> > > > Excess errors:
> > > > /gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
> > > > register clobbered by 'sp' in 'asm'
> > > >
> > > > Indeed the testcase has an explicit:
> > > >   __asm volatile ("" : : : "sp");
> > > > which is now rejected.
> > > >
> > > > Thomas, is that mandatory to test your code to fix pr77904?
> > > >
> > > > Thanks,
> > > >
> > > > Christophe
> > > >
> > > > > Thanks,
> > > > > Richard
2018-12-12  Christophe Lyon  

gcc/
* cfgexpand.c (asm_clobber_reg_is_valid): Add a more descriptive
error message.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 0d04bbc..e1f2bff 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2870,12 +2870,20 @@ asm_clobber_reg_is_valid (int regno, int nregs, const 
char *regname)
 {
   /* ??? Diagnose during gimplification?  */
   error ("PIC register clobbered by %qs in %", regname);
+  error ("Such clobbers are not supported by GCC. "
+"If you really want to overwrite the PIC register, "
+"remove it from the clobber list in the % at your own risk: "
+"GCC will not save it.");
   is_valid = false;
 }
   /* Clobbering the STACK POINTER register is an error.  */
   if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
 {
   error ("Stack Pointer register clobbered by %qs in %", regname);
+  error ("Such clobbers are not supported by GCC. "
+"If you really want to overwrite the Stack Pointer, "
+"remove it from the clobber list in the % at your own risk: "
+"GCC will not save it.");
   is_valid = false;
 }
 


Re: C++ PATCH for c++/88136, -Wdeprecated-copy too noisy

2018-12-12 Thread Ville Voutilainen
On Wed, 12 Dec 2018 at 16:52, Jason Merrill  wrote:

> > For what it's worth, I find it unfortunate that this deprecation and its 
> > resulting warnings end up
> > making the decision on whether a "rule of 5" must be followed; correct code 
> > needs to be adjusted
> > to cope with a fairly stylistic matter, with false positives and all.
>
> I don't see it as a stylistic matter.  If you need a user-provided
> copy constructor to get proper copy semantics for a class, you almost
> certainly need the same thing for copy assignment.  This was too noisy
> for destructors, for which it's fairly common to define a virtual
> destructor just to make a class polymorphic, not because there are
> significant destruction semantics.  But I don't see a similar argument
> for copy constructors: in your example, there was no need for
> QVariant::Private to define a copy constructor, and that seems like a
> situation where a warning is reasonable, even if the code is in fact
> correct.

I have other cases where the compiler warns besides QVariant::Private.
I don't have a good grasp of what
those cases are like, yet, because I'll look at those warnings some
time later, possibly next year. :)
Thus I don't have any idea whether there are practical cases where the
copy constructor allocates but the assignment
can just blast bits, although I find that rather plausible. (Side
note: disabling warnings in Qt is painful; it's
not really more or less painful than doing code changes and regtesting
those on all platforms; that's not your
problem, though.)


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Thomas Preudhomme
[resending from the right address]

Hi Christophe,

Why not simply: "Clobber of  unsupported" with an
accompanying change of the documentation to state the extra bit you
wanted to put in that error message? Perhaps even add a reference to
the section of the documentation in the error message.


Best regards,


Thomas
On Wed, 12 Dec 2018 at 15:13, Christophe Lyon
 wrote:
>
> On Wed, 12 Dec 2018 at 14:19, Christophe Lyon
>  wrote:
> >
> > On Wed, 12 Dec 2018 at 12:21, Thomas Preudhomme
> >  wrote:
> > >
> > > So my understanding is that the original code (CMSIS library) used to
> > > clobber sp because the asm statement was actually changing the sp.
> > > That in turn led GCC to try to save and restore sp which is not what
> > > CMSIS was expecting to happen. Changing sp without clobber as done now
> > > is probably the right solution and r242693 can be reverted. That will
> > > remove the failing test.
> > >
> >
> > OK, I read PR52813 too, but I'm not sure to fully understand the new status.
> > My understanding is that since this patch was committed, if an asm statement
> > clobbers sp, it is now allowed to actually declare it as clobber (this patch
> > generates an error in such a case).
> > So the user is now expected to lie to the compiler when writing to
> > this kind of register (sp, pic register), by not declaring it as "clobber"?
> >
>
> I'm attaching a small patch which adds a more verbose error message
> along the lines of what I understand of the current status.
> I'm pretty sure I got (at least) the formatting wrong :)
>
> Christophe
>
> >
> > > Best regards,
> > >
> > > Thomas
> > > On Wed, 12 Dec 2018 at 10:30, Thomas Preudhomme
> > >  wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > That PR was about a bug occuring when sp was clobbered so if it cannot
> > > > be clobbered anymore the whole commit (r242693) can be removed. Let me
> > > > check the original code that lead to the PR why it's clobbering sp
> > > > though.
> > > >
> > > > Best regards,
> > > >
> > > > Thomas
> > > > On Wed, 12 Dec 2018 at 09:43, Christophe Lyon
> > > >  wrote:
> > > > >
> > > > > On Tue, 11 Dec 2018 at 16:52, Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Dimitar Dimitrov  writes:
> > > > > > > On понеделник, 10 декември 2018 г. 11:21:53 EET Richard Sandiford 
> > > > > > > wrote:
> > > > > > >> Dimitar Dimitrov  writes:
> > > > > > >> > I have tested this fix on x86_64 host, and found no regression 
> > > > > > >> > in the C
> > > > > > >> > and C++ testsuites.  I'm marking this patch as RFC simply 
> > > > > > >> > because I don't
> > > > > > >> > have experience with other architectures, and I don't have a 
> > > > > > >> > setup to
> > > > > > >> > test all architectures supported by GCC.
> > > > > > >> >
> > > > > > >> > gcc/ChangeLog:
> > > > > > >> >
> > > > > > >> > 2018-12-07  Dimitar Dimitrov  
> > > > > > >> >
> > > > > > >> >* cfgexpand.c (asm_clobber_reg_is_valid): Also produce
> > > > > > >> >error when stack pointer is clobbered.
> > > > > > >> >(expand_asm_stmt): Refactor clobber check in separate 
> > > > > > >> > function.
> > > > > > >> >
> > > > > > >> > gcc/testsuite/ChangeLog:
> > > > > > >> >
> > > > > > >> > 2018-12-07  Dimitar Dimitrov  
> > > > > > >> >
> > > > > > >> >* gcc.target/i386/pr52813.c: New test.
> > > > > > >> >
> > > > > > >> > Signed-off-by: Dimitar Dimitrov 
> > > > > > >>
> > > > > > >> LGTM.  Do you have a copyright assignment on file?  'Fraid this 
> > > > > > >> is
> > > > > > >> probably big enough to need one.
> > > > > > > Yes, I have copyright assignment.
> > > > > >
> > > > > > OK, great.  I went ahead and applied the patch.
> > > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > This patch introduces a regression on arm:
> > > > > FAIL: gcc.target/arm/pr77904.c (test for excess errors)
> > > > > Excess errors:
> > > > > /gcc/testsuite/gcc.target/arm/pr77904.c:7:3: error: Stack Pointer
> > > > > register clobbered by 'sp' in 'asm'
> > > > >
> > > > > Indeed the testcase has an explicit:
> > > > >   __asm volatile ("" : : : "sp");
> > > > > which is now rejected.
> > > > >
> > > > > Thomas, is that mandatory to test your code to fix pr77904?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Christophe
> > > > >
> > > > > > Thanks,
> > > > > > Richard


[ping] use REGNUM macros instead of hardcoded values in aarch64 PROBE_STACK reg definitions

2018-12-12 Thread Olivier Hainque
Hello,

Ping for one of the changes last proposed here:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html

Submitted separately as an attempt to facilitate the review
process.

Got access to a linux box, so in addition to the Ada testing
we do on cross aarch64-vxworks7, bootstrapped and regression
tested on aarch64-linux.


This one is just a simple use of REGNUM macros instead
of harcoded values in the definition of PROBE_STACK_FIRST_REG
and PROBE_STACK_SECOND_REG in aarch64.c.

OK to commit ?

Thanks in advance!

Olivier

2018-12-12  Olivier Hainque  

* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
R9_REGNUM instead of 9.
(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.



0001-Use-REGNUM-macros-in-the-definitions-of-aarch64-PROB.patch
Description: Binary data


[ping] allow target (OS) SUBTARGET_OVERRIDE_OPTIONS on aarch64

2018-12-12 Thread Olivier Hainque
Ping for one of the changes last proposed here:

 https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html

Submitted separately as an attempt to facilitate the review
process.

This one proposes the possibility for target (OS) configurations
to provide a SUBTARGET_OVERRIDE_OPTIONS macro as other CPU ports
do, needed by our aarch64-vxworks7 port to come.

Got access to a linux box, so in addition to the Ada nighty
testing we do on the cross port, bootstrapped and regression
tested on aarch64-linux.

OK to commit ?

Thanks in advance,

With Kind Regards,

Olivier


2018-12-12  Olivier Hainque  

* config/aarch64/aarch64.c (aarch64_override_options): Once arch,
cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS if
defined.



0002-Add-support-for-SUBTARGET_OVERRIDE_OPTIONS-on-aarch6.patch
Description: Binary data


Re: [ping] use REGNUM macros instead of hardcoded values in aarch64 PROBE_STACK reg definitions

2018-12-12 Thread Kyrill Tkachov

Hi Olivier,

On 12/12/18 15:35, Olivier Hainque wrote:

Hello,

Ping for one of the changes last proposed here:

https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html

Submitted separately as an attempt to facilitate the review
process.

Got access to a linux box, so in addition to the Ada testing
we do on cross aarch64-vxworks7, bootstrapped and regression
tested on aarch64-linux.


This one is just a simple use of REGNUM macros instead
of harcoded values in the definition of PROBE_STACK_FIRST_REG
and PROBE_STACK_SECOND_REG in aarch64.c.

OK to commit ?



This looks obvious to me tbh, I'd go ahead and commit it.
Thanks,
Kyrill


Thanks in advance!

Olivier

2018-12-12  Olivier Hainque  

* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
R9_REGNUM instead of 9.
(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.





[ping] Change static chain to r11 on aarch64

2018-12-12 Thread Olivier Hainque
Hello,

Ping for part of the changes last proposed here:

 https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html

Submitted separately as an attempt to facilitate the review
process.

This piece is the change of static chain from r18 to r11.

Regression testing with languages=all on an aarch64-linux box
uncovered (various go related test failures) that libffi needs
a synchronized adjustment.

This is the updated version of the patch, so bootstrapped and
regression tested clean on aarch64-linux.

OK to commit ?

Thanks in advance!

With Kind Regards,

Olivier

2018-12-12  Olivier Hainque  

gcc/
* config/aarch64/aarch64.h (STATIC_CHAIN_REGNUM): Use R11 instead
of R18.

gcc/testsuite/
* gcc.dg/cwsc1.c: Expect r11 as the static chain on aarch64.

libffi/
* src/aarch64/sysv.S (ffi_call_SYSV): Expect r11 as the static chain.
(ffi_go_closure_SYSV): Likewise.
* testsuite/libffi.go/static-chain.h: Likewise.




0003-Change-static-chain-from-r18-to-r11-on-aarch64.patch
Description: Binary data




[ping] allow target configurations to state R18 as reserved on arrch64

2018-12-12 Thread Olivier Hainque
Hello,

Ping for part of the changes last proposed here:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html

Submitted separately as an attempt to facilitate the review
process.

This piece is the change allowing target OS configurations
to state R18 as fixed, possible after the change of static
chain just submitted separately

 (https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00829.html)

I suppose that after the static chain update, a possible
alternative way to achieve this would be to configure with
something like -ffixed-r18 in a driver self spec.

Anyway, bootstrapped and regression tested this with
enable-languages=all on aarch64-linux, in addition to the
gcc-8 based nightly testing we run for the Ada toolchain.

OK to commit ? Or feedback regarding the possible use of
a self spec instead ?

Thanks in advance!

With Kind Regards,

Olivier

2018-12-12  Olivier Hainque  

* config/aarch64/aarch64.h (FIXED_R18): New internal configuration
macro, defaulted to 0.
(FIXED_REGISTERS): Use it.




0004-Allow-target-OS-configuration-to-state-R18-as-reserv.patch
Description: Binary data


Re: [ping] use REGNUM macros instead of hardcoded values in aarch64 PROBE_STACK reg definitions

2018-12-12 Thread Olivier Hainque
Hi Kyrill,

> On 12 Dec 2018, at 16:43, Kyrill Tkachov  wrote:
> 
> This looks obvious to me tbh, I'd go ahead and commit it.

Great :-)

I have just sent separate updates on the other, less obvious
parts (SUBTARGET_OVERRIDE_OPTIONS, static chain change to r11,
and FIXED_R18).

Thanks!

>>* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
>>R9_REGNUM instead of 9.
>>(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.




Re: [gofrontend-dev] Re: libgo patch committed: Add precise stack scan support

2018-12-12 Thread Cherry Zhang via gcc-patches
Thank you, Matthias!

>From the log, essentially all the tests aborted. The only place the new
code can cause abort on all programs that I can think of is in the runtime
startup code, probestackmaps, which calls value_size, which aborts due to
an unhandled case. I haven't been able to try out on an ARM machine, but I
managed to cross-compile a Go program and visually inspect the exception
table. The type table's encoding is DW_EH_PE_absptr, which is indeed not
handled, which will cause abort.

I send https://go-review.googlesource.com/c/gofrontend/+/153857 (also as
below). Hopefully this will fix the problem.

Thanks,
Cherry

diff --git a/libgo/runtime/go-unwind.c b/libgo/runtime/go-unwind.c
index c44755f9..f4bbfb60 100644
--- a/libgo/runtime/go-unwind.c
+++ b/libgo/runtime/go-unwind.c
@@ -318,6 +318,8 @@ value_size (uint8_t encoding)
   case DW_EH_PE_sdata8:
   case DW_EH_PE_udata8:
 return 8;
+  case DW_EH_PE_absptr:
+return sizeof(uintptr);
   default:
 break;
 }



On Tue, Dec 11, 2018 at 7:03 PM Matthias Klose  wrote:

> On 11.12.18 22:01, Cherry Zhang wrote:
> > On Tue, Dec 11, 2018 at 3:51 PM Ian Lance Taylor 
> wrote:
> >
> >> On Tue, Dec 11, 2018 at 6:52 AM Matthias Klose  wrote:
> >>>
> >>> On 10.12.18 16:54, Cherry Zhang wrote:
>  On Mon, Dec 10, 2018 at 1:41 AM Matthias Klose 
> >> wrote:
> 
> > On 06.12.18 00:09, Ian Lance Taylor wrote:
> >> This libgo patch by Cherry Zhang adds support for precise stack
> >> scanning to the Go runtime.  This uses per-function stack maps
> stored
> >> in the exception tables in the language-specific data area.  The
> >> compiler needs to generate these stack maps; currently this is only
> >> done by a version of LLVM, not by GCC.  Each safepoint in a function
> >> is associated with a (real or dummy) landing pad, and its "type
> info"
> >> in the exception table is a pointer to the stack map. When a stack
> is
> >> scanned, the stack map is found by the stack unwinding code.
> >>
> >> For precise stack scan we need to unwind the stack. There are three
> > cases:
> >>
> >> - If a goroutine is scanning its own stack, it can unwind the stack
> >> and scan the frames.
> >>
> >> - If a goroutine is scanning another, stopped, goroutine, it cannot
> >> directly unwind the target stack. We handle this by switching
> >> (runtime.gogo) to the target g, letting it unwind and scan the
> stack,
> >> and switch back.
> >>
> >> - If we are scanning a goroutine that is blocked in a syscall, we
> >> send
> >> a signal to the target goroutine's thread, and let the signal
> handler
> >> unwind and scan the stack. Extra care is needed as this races with
> >> enter/exit syscall.
> >>
> >> Currently this is only implemented on GNU/Linux.
> >>
> >> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> >> to mainline.
> >
> > this broke the libgo build on ARM32:
> >
> > ../../../src/libgo/runtime/go-unwind.c: In function
> > 'scanstackwithmap_callback':
> > ../../../src/libgo/runtime/go-unwind.c:754:18: error:
> >> '_URC_NORMAL_STOP'
> > undeclared (first use in this function)
> >   754 |   return _URC_NORMAL_STOP;
> >   |  ^~~~
> > ../../../src/libgo/runtime/go-unwind.c:754:18: note: each undeclared
> > identifier
> > is reported only once for each function i
> > t appears in
> > ../../../src/libgo/runtime/go-unwind.c: In function
> > 'probestackmaps_callback':
> > ../../../src/libgo/runtime/go-unwind.c:802:10: error:
> >> '_URC_NORMAL_STOP'
> > undeclared (first use in this function)
> >   802 |   return _URC_NORMAL_STOP;
> >   |  ^~~~
> > ../../../src/libgo/runtime/go-unwind.c:803:1: warning: control
> >> reaches end
> > of
> > non-void function [-Wreturn-type]
> >   803 | }
> >   | ^
> > make[6]: *** [Makefile:1474: runtime/go-unwind.lo] Error 1
> > make[6]: Leaving directory
> > '/<>/build/arm-linux-gnueabihf/libgo'
> >
> >
>  Hell Matthias,
> 
>  Thank you for the report. And sorry about the breakage. Does
>  https://go-review.googlesource.com/c/gofrontend/+/153417 (or the
> patch
>  below) fix ARM32 build? I don't have an ARM32 machine at hand to test.
> >>>
> >>> this fixes the build.
> >>>
> >>> currently running the testsuite, almost every test case core dumps on
> >>> arm-linux-gnueabihf
> >>
> >> I committed Cherry's patch to trunk, since it looks reasonable to me.
> >> Cherry, Matthias, let me know if you figure out why programs are
> >> failing.
> >>
> >>
> > Thank you.
> >
> > I don't know for the moment. I'm trying to find an ARM32 machine so I can
> > test.
> >
> > Matthias, is it convenient for you to share a stack trace for the failing
> > programs? That would be very helpful. Th

Re: [ping] allow target configurations to state R18 as reserved on arrch64

2018-12-12 Thread Wilco Dijkstra
Hi Oliver,

+#define FIXED_R18 0

   {                            \
 0, 0, 0, 0,   0, 0, 0, 0,    /* R0 - R7 */        \
 0, 0, 0, 0,   0, 0, 0, 0,    /* R8 - R15 */        \
-    0, 0, 0, 0,   0, 0, 0, 0,    /* R16 - R23 */        \
+    0, 0, FIXED_R18, 0, 0, 0, 0, 0,    /* R16 - R23 */        \

This is equivalent to having a zero in the table given the #define is 
unconditional. 
It's best to use the existing mechanisms to change registers, this should be 
done
in aarch64_conditional_register_usage given very similar changes for floating
point and SVE are already there.

Cheers,
Wilco



[PATCH] PR libstdc++/80762 avoid ambiguous __constructible_from

2018-12-12 Thread Jonathan Wakely

Ensure we don't try to instantiate __is_constructible_from,
because there are two partial specializations that are equally good
matches.

PR libstdc++/80762
* include/bits/fs_path.h (path::_Path): Use remove_cv_t and is_void.
* include/experimental/bits/fs_path.h (path::_Path): Likewise.
* testsuite/27_io/filesystem/path/construct/80762.cc: New test.
* testsuite/experimental/filesystem/path/construct/80762.cc: New test.

Tested x86_64-linux (and manually verified it fixes the clang errors).

Committed to trunk. This is worth backporting too, I think.

commit 39d7ca64fdd9ad94c80655ac79c84259a33b3306
Author: Jonathan Wakely 
Date:   Wed Dec 12 14:51:08 2018 +

PR libstdc++/80762 avoid ambiguous __constructible_from

Ensure we don't try to instantiate __is_constructible_from,
because there are two partial specializations that are equally good
matches.

PR libstdc++/80762
* include/bits/fs_path.h (path::_Path): Use remove_cv_t and is_void.
* include/experimental/bits/fs_path.h (path::_Path): Likewise.
* testsuite/27_io/filesystem/path/construct/80762.cc: New test.
* testsuite/experimental/filesystem/path/construct/80762.cc: New 
test.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 0eee684a2f6..cbaea7343a3 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -110,7 +110,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 template
   using _Path = typename
-   std::enable_if<__and_<__not_>,
+   std::enable_if<__and_<__not_, path>>,
+ __not_>,
  __constructible_from<_Tp1, _Tp2>>::value,
   path>::type;
 
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h 
b/libstdc++-v3/include/experimental/bits/fs_path.h
index 653b4a3fe85..340cc59d541 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -125,7 +125,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 template
   using _Path = typename
-   std::enable_if<__and_<__not_>,
+   std::enable_if<__and_<__not_::type,
+path>>,
+ __not_>,
  __constructible_from<_Tp1, _Tp2>>::value,
   path>::type;
 
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/construct/80762.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/construct/80762.cc
new file mode 100644
index 000..15a79fd4e12
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/construct/80762.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include 
+
+using std::filesystem::path;
+
+static_assert( !std::is_constructible_v );
+static_assert( !std::is_constructible_v );
+static_assert( !std::is_constructible_v );
+static_assert( !std::is_constructible_v );
+static_assert( !std::is_constructible_v );
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/path/construct/80762.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/80762.cc
new file mode 100644
index 000..fdd9f768f78
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/80762.cc
@@ -0,0 +1,28 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile { targe

[PATCH] Overload std::distance and std::advance for path::iterator

2018-12-12 Thread Jonathan Wakely

Although filesystem::path::iterator is only a bidirectional iterator,
the underlying sequence has random access iterators (specifically, raw
pointers). This means std::distance and std::advance can be implemented
more efficiently than the generic versions which apply ++ and --
repeatedly.

PR libstdc++/71044 (partial)
* include/bits/fs_path.h (__path_iter_distance, __path_iter_advance):
New friend functions to implement std::distance and std::advance more
efficiently.
(distance, advance): Add overloads for path::iterator.
* testsuite/27_io/filesystem/path/itr/components.cc: Test new
overload.

Tested x86_64-linux, committed to trunk.

commit 994dfa1a5ca5ddd4e8d280873ef376ee6ab8352b
Author: Jonathan Wakely 
Date:   Fri Nov 30 16:33:24 2018 +

Overload std::distance and std::advance for path::iterator

Although filesystem::path::iterator is only a bidirectional iterator,
the underlying sequence has random access iterators (specifically, raw
pointers). This means std::distance and std::advance can be implemented
more efficiently than the generic versions which apply ++ and --
repeatedly.

PR libstdc++/71044 (partial)
* include/bits/fs_path.h (__path_iter_distance, 
__path_iter_advance):
New friend functions to implement std::distance and std::advance 
more
efficiently.
(distance, advance): Add overloads for path::iterator.
* testsuite/27_io/filesystem/path/itr/components.cc: Test new
overload.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index cbaea7343a3..075b3ab5ef8 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -733,6 +733,37 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   private:
 friend class path;
 
+bool _M_is_multi() const { return _M_path->_M_type == _Type::_Multi; }
+
+friend difference_type
+__path_iter_distance(const iterator& __first, const iterator& __last)
+{
+  __glibcxx_assert(__first._M_path != nullptr);
+  __glibcxx_assert(__first._M_path == __last._M_path);
+  if (__first._M_is_multi())
+   return std::distance(__first._M_cur, __last._M_cur);
+  else if (__first._M_at_end == __last._M_at_end)
+   return 0;
+  else
+   return __first._M_at_end ? -1 : 1;
+}
+
+friend void
+__path_iter_advance(iterator& __i, difference_type __n)
+{
+  if (__n == 1)
+   ++__i;
+  else if (__n == -1)
+   --__i;
+  else if (__n != 0)
+   {
+ __glibcxx_assert(__i._M_path != nullptr);
+ __glibcxx_assert(__i._M_is_multi());
+ // __glibcxx_assert(__i._M_path->_M_cmpts.end() - __i._M_cur >= __n);
+ __i._M_cur += __n;
+   }
+}
+
 iterator(const path* __path, path::_List::const_iterator __iter)
 : _M_path(__path), _M_cur(__iter), _M_at_end()
 { }
@@ -1160,6 +1191,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 _GLIBCXX_END_NAMESPACE_CXX11
 } // namespace filesystem
 
+inline ptrdiff_t
+distance(filesystem::path::iterator __first, filesystem::path::iterator __last)
+{ return __path_iter_distance(__first, __last); }
+
+template
+  void
+  advance(filesystem::path::iterator& __i, _Distance __n)
+  { __path_iter_advance(__i, static_cast(__n)); }
+
 extern template class __shared_ptr;
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
index 4852c03c78e..55760e82a9a 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
@@ -136,10 +136,28 @@ test03()
 }
 }
 
+void
+test04()
+{
+  std::filesystem::path p = "/a/b/c/d/e/f/g";
+  VERIFY( std::distance(p.begin(), p.end()) == 8);
+  auto it = p.begin();
+  std::advance(it, 1);
+  VERIFY( std::distance(p.begin(), it) == 1 );
+  VERIFY( it->native() == "a" );
+  std::advance(it, 3);
+  VERIFY( std::distance(p.begin(), it) == 4 );
+  VERIFY( it->native() == "d" );
+  std::advance(it, -2);
+  VERIFY( std::distance(p.begin(), it) == 2 );
+  VERIFY( it->native() == "b" );
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }


Re: [PATCH 1/2] [ARC] Fix REG_CLASS_NAMES

2018-12-12 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-12-11 12:23:34 +0200]:

> Forgot a class name, fix it.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.h (REG_CLASS_NAMES): Add SIBCALL_REGS.

Looks good,

Thanks,
Andrew

> 
> 
> ---
>  gcc/config/arc/arc.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index 4d01c99a540..80e785e6562 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -491,6 +491,7 @@ enum reg_class
>"R0R1_CD_REGS",  \
>"R0R3_CD_REGS",  \
>"ARCOMPACT16_REGS",  \
> +  "SIBCALL_REGS",  \
>"AC16_H_REGS",   \
>"DOUBLE_REGS",   \
>"GENERAL_REGS",  \
> -- 
> 2.19.1
> 


Re: [PATCH 2/2] [ARC] Fix millicode wrong blink restore.

2018-12-12 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-12-11 12:23:35 +0200]:

> The blink is restored wrongly when using millicode and regular load
> instructions.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_restore_callee_milli) Don't clobber off
>   variable.
> 
> testsuite/
> -xx-xx  Claudiu Zissulescu  
> 
>   * gcc.target/arc/milli-1.c: New test.

Looks good, and thanks for the new test.

Andrew


> ---
>  gcc/config/arc/arc.c   |  4 +---
>  gcc/testsuite/gcc.target/arc/milli-1.c | 23 +++
>  2 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/milli-1.c
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 55175215bfe..5af3ee6c9e0 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -3597,9 +3597,7 @@ arc_restore_callee_milli (unsigned int gmask,
>  insn = frame_insn (insn);
>  
>/* Add DWARF info.  */
> -  for (regno = start_reg, off = 0;
> -   regno <= end_reg;
> -   regno++, off += UNITS_PER_WORD)
> +  for (regno = start_reg; regno <= end_reg; regno++)
>  {
>reg = gen_rtx_REG (SImode, regno);
>add_reg_note (insn, REG_CFA_RESTORE, reg);
> diff --git a/gcc/testsuite/gcc.target/arc/milli-1.c 
> b/gcc/testsuite/gcc.target/arc/milli-1.c
> new file mode 100644
> index 000..b501b39eb81
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/milli-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os" } */
> +
> +/* Test if we restore correctly blink when using millicode.  */
> +extern void bar (void);
> +
> +void foo (void)
> +{
> +  __asm__ volatile ( "" : : : 
> "r13","r14","r15","r16","r17","r18","r20","r21");
> +  bar();
> +}
> +
> +void foo2 (void)
> +{
> +  bar();
> +  __asm__ volatile ( "" : : : 
> "r13","r14","r15","r16","r17","r18","r20","r21");
> +}
> +
> +/* { dg-final { scan-assembler-not "st.*r13,\\\[sp" } } */
> +/* { dg-final { scan-assembler-not "st.*r14,\\\[sp" } } */
> +/* { dg-final { scan-assembler-not "st.*r15,\\\[sp" } } */
> +/* { dg-final { scan-assembler "ld.*blink,\\\[sp,32" } } */
> +/* { dg-final { scan-assembler "mov_s.*r12,32" } } */
> -- 
> 2.19.1
> 


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Dimitar Dimitrov
On Wed, 12 Dec 2018 at 14:19:27 EET Christophe Lyon wrote:
> On Wed, 12 Dec 2018 at 12:21, Thomas Preudhomme
> 
>  wrote:
> > So my understanding is that the original code (CMSIS library) used to
> > clobber sp because the asm statement was actually changing the sp.
> > That in turn led GCC to try to save and restore sp which is not what
> > CMSIS was expecting to happen. Changing sp without clobber as done now
> > is probably the right solution and r242693 can be reverted. That will
> > remove the failing test.
> 
> OK, I read PR52813 too, but I'm not sure to fully understand the new status.
> My understanding is that since this patch was committed, if an asm
> statement clobbers sp, it is now allowed to actually declare it as clobber
> (this patch generates an error in such a case).
> So the user is now expected to lie to the compiler when writing to
> this kind of register (sp, pic register), by not declaring it as "clobber"?

Disclosure: I'm a GCC newbie.

I expect that if I mark a HW register as "clobber", compiler would save its 
contents before executing the asm statement, and after that it would restore 
its contents. This is the GCC behaviour for all but the SP and PIC registers. 
That is why I believe that PR52813 is a valid bug. 


I'm not sure how GCC could recover if SP is clobbered. If SP is clobbered in 
such a way that GCC will not notice (e.g. thread switching), then why should 
GCC know about it in the first place?

Regards,
Dimitar


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-12 Thread Dimitar Dimitrov
On Wed, 12 Dec 2018 at 11:03:24 EET Christophe Lyon wrote:
> And just noticed it causes a failure to build GDB for x86_64:
> gdb-8.1-release/gdb/nat/linux-ptrace.c: In function 'void
> linux_ptrace_init_warnings()':
> gdb-8.1-release/gdb/nat/linux-ptrace.c:149:23: error: Stack Pointer
> register clobbered by '%rsp' in 'asm'
>   149 |: "%rsp", "memory");
> 
> | ^
> 
> Makefile:1640: recipe for target 'linux-ptrace.o' failed
> 
> I didn't check if the GDB code is legitimate though

Sorry about this. I had checked the Linux x86 kernel for SP clobbers, but 
forgot that GDB could also use such magic.

I'll try to fix it and send a patch to GDB. It will likely take me a few days, 
so I hope that this breakage is not considered a P0 bug.

Regards,
Dimitar


Re: [ping] allow target configurations to state R18 as reserved on arrch64

2018-12-12 Thread Olivier Hainque
Hi Wilco,

Thanks for your feedback on this !

> On 12 Dec 2018, at 17:11, Wilco Dijkstra  wrote:
> 
> Hi Oliver,
> 
> +#define FIXED_R18 0
> 
>{\
>  0, 0, 0, 0,   0, 0, 0, 0,/* R0 - R7 */\
>  0, 0, 0, 0,   0, 0, 0, 0,/* R8 - R15 */\
> -0, 0, 0, 0,   0, 0, 0, 0,/* R16 - R23 */\
> +0, 0, FIXED_R18, 0, 0, 0, 0, 0,/* R16 - R23 */\
> 
> This is equivalent to having a zero in the table given the #define is 
> unconditional. 

Not quite, because the use is part of the FIXED_REGISTERS
macro definition, which is expanded in reginfo.c. We get the
value of FIXED_R18 at the expansion point, so after possible
redefinitions by os configuration files.

We're pretty sure this works as intended for the VxWorks port.

r18 is used as a task control block pointer by the OS runtime,
so any non trivial program rapidly misbehaves as soon as r18 is
used for anything else in the user code.

I'm happy to do it differently regardless :)

> It's best to use the existing mechanisms to change registers, this should be 
> done
> in aarch64_conditional_register_usage given very similar changes for floating
> point and SVE are already there.

Ok. That would be achieved with -ffixed-r18 or so. I'll experiment
with a self spec, on top of the static chain update to r11 (change just
posted separately).

Thanks again for your feedback,

With Kind Regards,

Olivier



Re: [PATCH] x86: Add -march=cascadelake

2018-12-12 Thread Uros Bizjak
On Wed, Dec 12, 2018 at 10:48 AM Wei Xiao  wrote:
>
> Hi Uros and other reviewers,
>
> I'd like to split the work into 2 parts:
> 1) Basic processor enabling.
> 2) Processor type dynamic check.
>
> Let's use a separate patch to implement the part 2.
> The part 1 is implemented by attached patch.
> Is it ok for trunk?
>
> Wei
>
> gcc/
>   * common/config/i386/i386-common.c (processor_names): Add cascadelake.
>   (processor_alias_table): Add cascadelake.
>   * config.gcc: Add -march=cascadelake.
>   * config/i386/i386-c.c (ix86_target_macros_internal): Handle 
> cascadelake.
>   * config/i386/i386.c (Add m_CASCADELAKE): New.
>   (processor_cost_table): Add cascadelake.
>   (get_builtin_code_for_version): Handle cascadelake.
>   * config/i386/i386.h (TARGET_CASCADELAKE, PROCESSOR_CASCADELAKE): New.
>   (PTA_CASCADELAKE): Ditto.
>   * doc/invoke.texi: Add -march=cascadelake.
>
> gcc/testsuite/
>   * gcc.target/i386/funcspec-56.inc: Handle new march.

OK for mainline.

Thanks,
Uros.

> Wei Xiao  于2018年11月29日周四 下午4:32写道:
> >
> > Hi
> >
> > Distinguish based on stepping number is not recommended for some reasons:
> > 1) Intel doesn't officially disclose stepping information in SDM.
> > 2) Stepping can be changing in the future.
> >
> > We still prefer the conventional distinguish approach based on feature bits.
> > I have refined the patch as attached according to all your suggestions.
> >
> > Wei
> >
> > gcc/
> > * common/config/i386/i386-common.c (processor_names): Add 
> > cascadelake.
> > (processor_alias_table): Add cascadelake.
> > * config.gcc: Add -march=cascadelake.
> > * config/i386/driver-i386.c
> > (host_detect_local_cpu): Detect cascadelake.
> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> > cascadelake.
> > * config/i386/i386.c (ix86_cost): Add m_CASCADELAKE.
> > (processor_cost_table): Add cascadelake.
> > (get_builtin_code_for_version): Handle cascadelake.
> > (fold_builtin_cpu): Ditto.
> > * config/i386/i386.h (TARGET_CASCADELAKE, PROCESSOR_CASCADELAKE): 
> > New.
> > (PTA_CASCADELAKE): Ditto.
> > * doc/extend.texi: Add cascadelake.
> > * doc/invoke.texi: Add -march=cascadelake.
> > gcc/testsuite/
> > * g++.target/i386/mv16.C: Handle new march.
> > * gcc.target/i386/builtin_target.c: Ditto.
> > * gcc.target/i386/funcspec-56.inc: Ditto.
> > libgcc/
> > * config/i386/cpuinfo.c (get_intel_cpu): Handle cascadelake.
> > * config/i386/cpuinfo.h: Add INTEL_COREI7_CASCADELAKE.
> > Wei Xiao  于2018年11月27日周二 下午6:40写道:
> > >
> > > Thanks for the helpful information!
> > > But I'm still checking with hardware team about the
> > > family/model/stepping numbers for Cascadelake which are not officially
> > > disclosed by Intel (to my best knowledge).
> > >
> > > Wei
> > > Martin Liška  于2018年11月26日周一 下午10:00写道:
> > > >
> > > > On 11/26/18 12:18 PM, Jakub Jelinek wrote:
> > > > > On Mon, Nov 26, 2018 at 12:03:53PM +0100, Martin Liška wrote:
> > > > >>> For Cascade Lake the model number is the same as Skylake Server,
> > > > >>> it can only be distinguished based on the stepping (5 vs 4)
> > > > >>
> > > > >> Very interesting, probably the first time a distinguish is based on 
> > > > >> stepping number?
> > > > >
> > > > > Wouldn't it be better to distinguish it based on availability of 
> > > > > VNNI, like
> > > > > we do for unknown family/model?
> > > > >
> > > > >>> Like gcc -mcpu=native needs to learn about this.
> > > > >>
> > > > >> I'm attaching patch that does that. Note that it's completely 
> > > > >> untested as I don't have
> > > > >> access to any of the new machines (Skylake server).
> > > >
> > > > Would be possible, the only ugly place would be in 
> > > > libgcc/config/i386/cpuinfo.c where we
> > > > call:
> > > >
> > > >   get_intel_cpu (family, model, stepping, brand_id);
> > > >   /* Find available features. */
> > > >   get_available_features (ecx, edx, max_level, &avx512_vnni);
> > > >
> > > > one would need a feature to distinguish CPU model. Do we really want 
> > > > that?
> > > >
> > > > Martin
> > > >
> > > > >
> > > > >   Jakub
> > > > >
> > > >


Re: [ping] allow target (OS) SUBTARGET_OVERRIDE_OPTIONS on aarch64

2018-12-12 Thread James Greenhalgh
On Wed, Dec 12, 2018 at 09:42:05AM -0600, Olivier Hainque wrote:
> Ping for one of the changes last proposed here:
> 
>  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html
> 
> Submitted separately as an attempt to facilitate the review
> process.
> 
> This one proposes the possibility for target (OS) configurations
> to provide a SUBTARGET_OVERRIDE_OPTIONS macro as other CPU ports
> do, needed by our aarch64-vxworks7 port to come.
> 
> Got access to a linux box, so in addition to the Ada nighty
> testing we do on the cross port, bootstrapped and regression
> tested on aarch64-linux.
> 
> OK to commit ?

OK.

Thanks,
James

> 2018-12-12  Olivier Hainque  
> 
>   * config/aarch64/aarch64.c (aarch64_override_options): Once arch,
>   cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS if
>   defined.



Re: [ping] allow target (OS) SUBTARGET_OVERRIDE_OPTIONS on aarch64

2018-12-12 Thread Olivier Hainque



> On 12 Dec 2018, at 17:57, James Greenhalgh  wrote:

>> This one proposes the possibility for target (OS) configurations
>> to provide a SUBTARGET_OVERRIDE_OPTIONS macro as other CPU ports
>> do, needed by our aarch64-vxworks7 port to come.
> 
> OK.

Great :-) Thanks for your review James !




Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Richard Earnshaw (lists)
On 12/12/2018 15:47, Olivier Hainque wrote:
> Hello,
> 
> Ping for part of the changes last proposed here:
> 
>  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00761.html
> 
> Submitted separately as an attempt to facilitate the review
> process.
> 
> This piece is the change of static chain from r18 to r11.
> 
> Regression testing with languages=all on an aarch64-linux box
> uncovered (various go related test failures) that libffi needs
> a synchronized adjustment.
> 
> This is the updated version of the patch, so bootstrapped and
> regression tested clean on aarch64-linux.
> 
> OK to commit ?
> 
> Thanks in advance!
> 
> With Kind Regards,
> 
> Olivier
> 
> 2018-12-12  Olivier Hainque  
> 
>   gcc/
>   * config/aarch64/aarch64.h (STATIC_CHAIN_REGNUM): Use R11 instead
>   of R18.
> 
>   gcc/testsuite/
>   * gcc.dg/cwsc1.c: Expect r11 as the static chain on aarch64.
> 
>   libffi/
>   * src/aarch64/sysv.S (ffi_call_SYSV): Expect r11 as the static chain.
>   (ffi_go_closure_SYSV): Likewise.
>   * testsuite/libffi.go/static-chain.h: Likewise.
> 

libffi is a separate project; so a patch for that needs to be sent to
the libffi maintainers.  However, that introduces an issue that that
code is potentially used across multiple versions of gcc, with
potentially different choices of the static chain register.  Hmm, this
might need some more careful thought

I'm also not keen on the fact that we are now seriously eating into the
space of call clobbered registers; what's the argument behind your
selection of r11 as opposed to any other register?  On Arm we manage
with IP; is there some reason why we can't do something similar on AArch64?

R.

> 
> 
> 0003-Change-static-chain-from-r18-to-r11-on-aarch64.patch
> 
> From 2fe2661e581e8e3989e62b81db6ca37d2392f83d Mon Sep 17 00:00:00 2001
> From: Olivier Hainque 
> Date: Wed, 12 Dec 2018 07:05:23 -0800
> Subject: [PATCH 3/4] Change static chain from r18 to r11 on aarch64
> 
>   gcc/
>   * config/aarch64/aarch64.h (STATIC_CHAIN_REGNUM): Use R11 instead
>   of R18.
> 
>   gcc/testsuite/
>   * gcc.dg/cwsc1.c: Expect r11 as the static chain on aarch64.
> 
>   libffi/
>   * src/aarch64/sysv.S (ffi_call_SYSV): Expect r11 as the static chain.
>   (ffi_go_closure_SYSV): Likewise.
>   * testsuite/libffi.go/static-chain.h: Likewise.
> ---
>  gcc/config/aarch64/aarch64.h  | 2 +-
>  gcc/testsuite/gcc.dg/cwsc1.c  | 2 +-
>  libffi/src/aarch64/sysv.S | 6 +++---
>  libffi/testsuite/libffi.go/static-chain.h | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 0c833a8..a8063c2 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -422,7 +422,7 @@ extern unsigned aarch64_architecture_version;
> uses alloca.  */
>  #define EXIT_IGNORE_STACK(cfun->calls_alloca)
>  
> -#define STATIC_CHAIN_REGNUM  R18_REGNUM
> +#define STATIC_CHAIN_REGNUM  R11_REGNUM
>  #define HARD_FRAME_POINTER_REGNUMR29_REGNUM
>  #define FRAME_POINTER_REGNUM SFP_REGNUM
>  #define STACK_POINTER_REGNUM SP_REGNUM
> diff --git a/gcc/testsuite/gcc.dg/cwsc1.c b/gcc/testsuite/gcc.dg/cwsc1.c
> index e793e26..be22317 100644
> --- a/gcc/testsuite/gcc.dg/cwsc1.c
> +++ b/gcc/testsuite/gcc.dg/cwsc1.c
> @@ -6,7 +6,7 @@
>  #elif defined(__i386__)
>  # define CHAIN  "%ecx"
>  #elif defined(__aarch64__)
> -# define CHAIN  "x18"
> +# define CHAIN  "x11"
>  #elif defined(__alpha__)
>  # define CHAIN  "$1"
>  #elif defined(__arm__)
> diff --git a/libffi/src/aarch64/sysv.S b/libffi/src/aarch64/sysv.S
> index c1bf9b9..12e7ec3 100644
> --- a/libffi/src/aarch64/sysv.S
> +++ b/libffi/src/aarch64/sysv.S
> @@ -89,7 +89,7 @@ CNAME(ffi_call_SYSV):
>   mov x9, x2  /* save fn */
>   mov x8, x3  /* install structure return */
>  #ifdef FFI_GO_CLOSURES
> - mov x18, x5 /* install static chain */
> + mov x11, x5 /* install static chain */
>  #endif
>   stp x3, x4, [x29, #16]  /* save rvalue and flags */
>  
> @@ -415,8 +415,8 @@ CNAME(ffi_go_closure_SYSV):
>   stp x6, x7, [sp, #16 + 16*N_V_ARG_REG + 48]
>  
>   /* Load ffi_closure_inner arguments.  */
> - ldp PTR_REG(0), PTR_REG(1), [x18, #PTR_SIZE]/* load cif, fn */
> - mov x2, x18 /* load user_data */
> + ldp PTR_REG(0), PTR_REG(1), [x11, #PTR_SIZE]/* load cif, fn */
> + mov x2, x11 /* load user_data */
>   b   .Ldo_closure
>   cfi_endproc
>  
> diff --git a/libffi/testsuite/libffi.go/static-chain.h 
> b/libffi/testsuite/libffi.go/static-chain.h
> index e120eea..d22b037 100644
> --- a/libffi/testsuite/libffi.go/static-chain.h
> +++ b/libffi/testsuite/libffi.go/static-chain.h
> @@ -1,5 +1,5 @@
>  #ifdef __a

[PATCH] PR libstdc++/71044 optimize std::filesystem::path construction

2018-12-12 Thread Jonathan Wakely

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it. The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

PR libstdc++/71044
* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
appropriate. Move _M_cmpts instead of reparsing the native pathname.
(path::operator=(const path&)): Do not define as defaulted.
(path::operator/=, path::append): Call _M_append.
(path::concat): Call _M_concat.
(path::path(string_type, _Type): Change type of first parameter to
basic_string_view.
(path::_M_append(string_type), path::_M_concat(string_type)): New
member functions.
(path::_S_is_dir_sep): Replace with non-member is_dir_sep.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.
(path::_M_type()): New member function to replace _M_type data member.
(path::_List): Define new struct type instead of using std::vector.
(path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of
first parameter to basic_string_view.
(path::operator+=(const path&)): Do not define inline.
(path::operator+=(const string_type&)): Call _M_concat.
(path::operator+=(const value_type*)): Likewise.
(path::operator+=(value_type)): Likewise.
(path::operator+=(basic_string_view)): Likewise.
(path::operator/=(const path&)): Do not define inline.
(path::_M_append(path)): Remove.
* python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that
understands the new path::_List type.
* src/filesystem/std-path.cc (is_dir_sep): New function to replace
path::_S_is_dir_sep.
(path::_Parser): New helper class to parse strings as paths.
(path::_List::_Impl): Define container type for path components.
(path::_List): Define members.
(path::operator=(const path&)): Define explicitly, to provide the
strong exception safety guarantee.
(path::operator/=(const path&)): Implement manually by processing
each component of the argument, rather than using _M_split_cmpts
to parse the entire string again.
(path::_M_append(string_type)): Likewise.
(path::operator+=(const path&)): Likewise.
(path::_M_concat(string_type)): Likewise.
(path::remove_filename()): Perform trim directly instead of calling
_M_trim().
(path::_M_split_cmpts()): Rewrite in terms of _Parser class.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.

Tested x86_64-linux. I intend to commit this to trunk tomorrow.

There are some other performance improvements that could be made to
the std::filesystem implementation, but this is the main thing I
wanted to do for PR 71044.


commit 921bb5ac834156bc01e72a0049e27e39ed47c7ca
Author: Jonathan Wakely 
Date:   Wed Dec 12 16:27:11 2018 +

PR libstdc++/71044 optimize std::filesystem::path construction

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it. The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

PR libstdc++/71044
* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
appropriate. Move _M_cmpts instead of reparsing the native pathname.
(path::operator=(const path&)): Do not define as defaulted.
(path::operator/=, path::append): Call _

Re: [PATCH] Delete powerpcspe

2018-12-12 Thread Segher Boessenkool
On Wed, Dec 12, 2018 at 11:36:29AM +0100, Richard Biener wrote:
> On Tue, Dec 11, 2018 at 2:37 PM Jeff Law  wrote:
> > One way to deal with these problems is to create a fake simulator that
> > always returns success.  That's what my tester does for the embedded
> > targets.  That allows us to do reliable compile-time tests as well as
> > the various scan-whatever tests.
> >
> > It would be trivial to start sending those results to gcc-testresults.
> 
> I think it would be more useful if the execute testing would be
> reported as UNSUPPORTED rather than simply PASS w/o being
> sure it does.

Yes.

> But while posting to gcc-testresults is a sign of testing tracking
> regressions (and progressions!) in bugzilla and caring for those
> bugs is far more important...

If results are posted to gcc-testresults then other people can get a
feel whether the port is detoriating, and at what rate.  If no results
are posted we just have to assume the worst.  Most people do not have
the time (or setup) to test it for themselves.


Segher


Re: [PATCH 2/4] c/c++, asm: Use nicer error for duplicate asm qualifiers

2018-12-12 Thread Segher Boessenkool
On Tue, Dec 11, 2018 at 10:35:00AM -0500, David Malcolm wrote:
> On Mon, 2018-12-10 at 22:47 +, Segher Boessenkool wrote:
> 
> [...]
> 
> > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> > index 121a91c..652e53c 100644
> > --- a/gcc/c/c-parser.c
> > +++ b/gcc/c/c-parser.c
> > @@ -6360,41 +6360,54 @@ c_parser_for_statement (c_parser *parser,
> > bool ivdep, unsigned short unroll,
> >  static tree
> >  c_parser_asm_statement (c_parser *parser)
> >  {
> > -  tree quals, str, outputs, inputs, clobbers, labels, ret;
> > -  bool simple, is_volatile, is_inline, is_goto;
> > +  tree str, outputs, inputs, clobbers, labels, ret;
> > +  bool simple;
> >location_t asm_loc = c_parser_peek_token (parser)->location;
> >int section, nsections;
> >  
> >gcc_assert (c_parser_next_token_is_keyword (parser, RID_ASM));
> >c_parser_consume_token (parser);
> >  
> > -  quals = NULL_TREE;
> > -  is_volatile = false;
> > -  is_inline = false;
> > -  is_goto = false;
> > +  /* Handle the asm-qualifier-list.  */
> > +  location_t volatile_loc = UNKNOWN_LOCATION;
> > +  location_t inline_loc = UNKNOWN_LOCATION;
> > +  location_t goto_loc = UNKNOWN_LOCATION;
> >for (;;)
> >  {
> > -  switch (c_parser_peek_token (parser)->keyword)
> > +  c_token *token = c_parser_peek_token (parser);
> > +  location_t loc = token->location;
> > +  switch (token->keyword)
> > {
> > case RID_VOLATILE:
> > - if (is_volatile)
> > -   break;
> > - is_volatile = true;
> > - quals = c_parser_peek_token (parser)->value;
> > + if (volatile_loc)
> > +   {
> > + error_at (loc, "duplicate asm qualifier %qE", token-
> > >value);
> > + inform (volatile_loc, "first seen here");
> > +   }
> 
> Thanks for the improvements.
> 
> Is there test coverage for these errors and notes?

Yes, in this same patch.  The error, that is; I have no idea how to test
the note, and that belongs in a different test anyhow.  Dejagnu ignores
notes normally.

> A diagnostic nit (new with gcc 9): please add an:
> auto_diagnostic_group d;
> to the start of the guarded block, so that the "error" and "note" are
> known to be related.

How would that work for

asm volatile goto volatile goto ("uh-oh"  lab); lab :;

There are two groups, overlapping, but not nested.  I suppose something
could be done with new() but that is too ugly for words.  Is there a
procedural interface, too?  Something that does not depend on C++ magic.


Segher


Re: [PATCH 2/4] c/c++, asm: Use nicer error for duplicate asm qualifiers

2018-12-12 Thread Segher Boessenkool
On Tue, Dec 11, 2018 at 10:48:15AM -0500, David Malcolm wrote:
> For bonus points, these could offer fix-it hints, so that an IDE can
> offer to delete the duplicate qualifier token.

Yes it could.  But have you ever seen this error, in a real program?
(It was an error before, just without a nice message).


Segher


Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Olivier Hainque
Hi Richard,

Thanks for your feedback on this ! 

> On 12 Dec 2018, at 18:21, Richard Earnshaw (lists)  
> wrote:
> 
> libffi is a separate project; so a patch for that needs to be sent to
> the libffi maintainers.

Oh, I hadn't realized that.

>  However, that introduces an issue that that
> code is potentially used across multiple versions of gcc, with
> potentially different choices of the static chain register.  Hmm, this
> might need some more careful thought

Indeed.

> I'm also not keen on the fact that we are now seriously eating into the
> space of call clobbered registers; what's the argument behind your
> selection of r11 as opposed to any other register?

No super strong motivation for r11 in particular.

The primary goal is to get to something else than r18 to
accommodate configurations where r18 is used by the runtime
environment, as allowed by the ABI and which at least VxWorks
leverages.

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01125.html

suggested r9, then I discovered that r9 and r10 were used
by the stack probing mechanism, so I just picked the following
one that didn't seem to be used for other purposes already.

>  On Arm we manage
> with IP; is there some reason why we can't do something similar on AArch64?

I am unfortunately not familiar enough with the port to tell
out of the top of my head.

Maybe Kyrill, James or Wilco (cc'ed) would know ?



Re: [PATCH 2/4] c/c++, asm: Use nicer error for duplicate asm qualifiers

2018-12-12 Thread Segher Boessenkool
On Tue, Dec 11, 2018 at 10:31:02AM -0700, Martin Sebor wrote:
> >+{
> >+  error_at (loc, "duplicate asm qualifier %qE", token->value);
> 
> We have been making an effort to quote keywords, identifiers,
> option names, and other such things in diagnostics.  In
> the message above and all others like it in this patch kit
> that mention "asm" the keyword should be quoted the same
> way as the name of the qualifier.

This message is about "asm qualifiers".  It is not about the "asm"
statement.  You can write this without "asm" keyword, too, anyway (with
an "__asm__" or such), making the message even more awkward to quote in
that case.  The location of the error has nothing to do with the "asm",
either.

You should only quote keywords that are in the source text.  Not random
words that *could* be a keyword :-)


Segher


Re: [PATCH 2/4] c/c++, asm: Use nicer error for duplicate asm qualifiers

2018-12-12 Thread Martin Sebor

On 12/12/18 10:50 AM, Segher Boessenkool wrote:

On Tue, Dec 11, 2018 at 10:31:02AM -0700, Martin Sebor wrote:

+   {
+ error_at (loc, "duplicate asm qualifier %qE", token->value);


We have been making an effort to quote keywords, identifiers,
option names, and other such things in diagnostics.  In
the message above and all others like it in this patch kit
that mention "asm" the keyword should be quoted the same
way as the name of the qualifier.


This message is about "asm qualifiers".  It is not about the "asm"
statement.  You can write this without "asm" keyword, too, anyway (with
an "__asm__" or such), making the message even more awkward to quote in
that case.  The location of the error has nothing to do with the "asm",
either.

You should only quote keywords that are in the source text.  Not random
words that *could* be a keyword :-)


asm is not a random word, or even an English word (please look
in a dictionary if you're in doubt).  It is a C/C++ keyword :-)

Martin


[PING 1] Testcase for PR 88297 and minor fixes

2018-12-12 Thread Michael Ploujnikov
On 2018-12-06 3:13 p.m., Michael Ploujnikov wrote:
> Thanks to Martin we now have a test that exercises (cp) cloning
> machinery during the WPA stage of LTO.
> 
> Also, during debugging I found that print_all_lattices would trigger
> an assert if I tried to call it inside decide_whether_version_node.
> 
> Finally I've attached some comment spelling fixes as a bonus.
> 
> 
> Bootstrapping (--with-build-config=bootstrap-lto) and regression testing on 
> x86_64.
> 
> Ok for trunk after tests pass?
> 
> 
> - Michael
> 

Ping?



signature.asc
Description: OpenPGP digital signature


[PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-12 Thread Uecker, Martin

Hi Jeff,

thank you. I fixed all the minor issues, but see below.


Am Montag, den 03.12.2018, 14:56 -0700 schrieb Jeff Law:
> On 11/4/18 1:48 PM, Uecker, Martin wrote:
> > Hi Joseph,
> > 
> > here is a new version of this patch which adds a warning
> > for targets which do not support -fno-trampolines  and
> > only runs the test case on architectures where this is
> > supported. It seems that documentation for this general
> > feature has improved in the meantime so I only mention
> > C as supported.
> > 
> > 
> > Best,
> > Martin
> > 
> > diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
> > index ce2d43f989e..b79f2373c63 100644
> > --- a/gcc/ada/gcc-interface/trans.c
> > +++ b/gcc/ada/gcc-interface/trans.c
> > @@ -1753,7 +1753,8 @@ Attribute_to_gnu (Node_Id gnat_node, tree 
> > *gnu_result_type_p, int
> > attribute)
> >       if ((attribute == Attr_Access
> >        || attribute == Attr_Unrestricted_Access)
> >       && targetm.calls.custom_function_descriptors > 0
> > -     && Can_Use_Internal_Rep (Etype (gnat_node)))
> > +     && Can_Use_Internal_Rep (Etype (gnat_node))
> > +  && (flag_trampolines != 1))
> >     FUNC_ADDR_BY_DESCRIPTOR (gnu_expr) = 1;
> 
> You've got an extraneous set of parenthesis around your flag_trampolines
> check.  Please remove them.
> 
> 
> >  
> >       /* Otherwise, we need to check that we are not violating the
> > @@ -4330,7 +4331,8 @@ Call_to_gnu (Node_Id gnat_node, tree 
> > *gnu_result_type_p, tree gnu_target,
> >    /* If the access type doesn't require foreign-compatible 
> > representation,
> >      be prepared for descriptors.  */
> >    if (targetm.calls.custom_function_descriptors > 0
> > -     && Can_Use_Internal_Rep (Etype (Prefix (Name (gnat_node)
> > +     && Can_Use_Internal_Rep (Etype (Prefix (Name (gnat_node
> > +  && (flag_trampolines != 1))
> 
> Similarly here.
> 
> 
> > diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
> > index 78e768c2366..ef039560eb9 100644
> > --- a/gcc/c/c-objc-common.h
> > +++ b/gcc/c/c-objc-common.h
> > @@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If not see
> >  
> >  #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P
> >  #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
> > +
> > +#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
> > +#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
> >  #endif /* GCC_C_OBJC_COMMON */
> 
> I wonder if we even need the lang hook anymore.  ISTM that a front-end
> that wants to use the function descriptors can just set
> FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor, else we'll
> use the trampoline.  Thoughts?

The lang hook also affects the minimum alignment for function
pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This does
not appear to change the default alignment on any architecture, but
it causes a failure in i386/gcc.target/i386/attr-aligned.c when
requesting a smaller alignment which is then silently ignored.

I am not sure what the best approach is, but my preference
would be to remove the lang hook and the FUNCTION_ALIGNMENT
logic which will also fix the test case (the requested
alignment will be applied).

I would then instead add a warning (or error?) which triggers
only with -fno-trampolines if the user requests an alignment
which is too small for this mechanism to work.

Does this sound reasonable?

Best,
Martin


> > diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> > index 9d09b8d65fd..afae9de41e7 100644
> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -1912,7 +1912,13 @@ function_to_pointer_conversion (location_t loc, tree 
> > exp)
> >    if (TREE_NO_WARNING (orig_exp))
> >  TREE_NO_WARNING (exp) = 1;
> >  
> > -  return build_unary_op (loc, ADDR_EXPR, exp, false);
> > +  tree r = build_unary_op (loc, ADDR_EXPR, exp, false);
> > +
> > +  if ((TREE_CODE(r) == ADDR_EXPR)
> > +  && (flag_trampolines == 0))
> > + FUNC_ADDR_BY_DESCRIPTOR (r) = 1;
> > +
> > +  return r;
> 
> Extraneous parens here too.
> 
> >  }
> >  
> >  /* Mark EXP as read, not just set, for set but not used -Wunused
> > @@ -3134,6 +3140,11 @@ build_function_call_vec (location_t loc, 
> > vec arg_loc,
> >    else
> >  result = build_call_array_loc (loc, TREE_TYPE (fntype),
> >        function, nargs, argarray);
> > +
> > +  if ((TREE_CODE (result) == CALL_EXPR)
> > +  && (flag_trampolines == 0))
> > +CALL_EXPR_BY_DESCRIPTOR (result) = 1;
> > +
> 
> And here too.
> 
> 
> > diff --git a/gcc/testsuite/lib/target-supports.exp 
> > b/gcc/testsuite/lib/target-supports.exp
> > index fd74c04d092..a34e966b7c4 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -916,6 +916,14 @@ proc check_effective_target_scheduling {} {
> >  } "-fschedule-insns"]
> >  }
> >  
> > +# Return 1 if it is possible to use function descriptors instead of 

Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Steve Ellcey
On Wed, 2018-12-12 at 13:41 +0100, Jakub Jelinek wrote:
> External Email
> 
> ---
> ---
> On Wed, Dec 12, 2018 at 12:34:46PM +, Richard Sandiford wrote:
> > > I considered comparing node->decl and cfun->decl to differentiate
> > > between definitions and declarations instead of using a new
> > > argument
> > > but having an argument seemed cleaner and clearer.
> > 
> > Yeah, agreed.
> 
> I actually disagree, there is no point in passing another argument.
> You should be able to just check node->definition whether it is a
> definition
> or declaration.
> 
>   Jakub

You are right, I can just use node->definition and not add the new
argument.  I will make that change.

Steve Ellcey
sell...@cavium.com



Re: [EXT] Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Steve Ellcey
On Wed, 2018-12-12 at 11:39 +, Richard Sandiford wrote:
> 
> Steve Ellcey  writes:
> > On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
> > > > +  (match_operand:TX 2 "register_operand" "w"))
> > > > + (set (mem:TX (plus:P (match_dup 0)
> > > > +  (match_operand:P 5 "const_int_operand"
> > > > "n")))
> > > > +  (match_operand:TX 3 "register_operand" "w"))])]
> > > 
> > > Think this last part should be:
> > > 
> > >  (set (mem:TX (plus:P (plus:P (match_dup 0)
> > >   (match_dup 4))
> > >   (match_operand:P 5 "const_int_operand"
> > > "n")))
> > >   (match_operand:TX 3 "register_operand" "w"))])]
> > 
> > I think you are right about this.  What I have for
> > loadwb_pair_ matches what is there for
> > loadwb_pair_.  If this one is wrong, then I assume
> > the others are wrong too?  This won't make a practical difference since
> > we call these with gen_loadwb_pair*_* calls and not via pattern
> > recognition, but still they should be right.  Should I change them
> > all?  I did not change this as part of this patch.
> 
> I think we should fix the new pattern, but I agree fixing the others
> should be a separate patch.
> 
> Patch LGTM with that change.

I am not sure this is right.  I created a patch (separate from any of
the SIMD changes) to fix the storewb_pair_ and
storewb_pair_ and when I try to build GCC with
that change, gcc aborts while building libgcc.  I didn't think
this change could affect the build but it appears to do so.


/home/sellcey/gcc-md-fix/src/gcc/libgcc/static-object.mk:17: recipe for
target 'addtf3.o' failed
make[1]: *** [addtf3.o] Error 1
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
/home/sellcey/gcc-md-fix/src/gcc/libgcc/static-object.mk:17: recipe for
target 'unwind-dw2.o' failed
make[1]: *** [unwind-dw2.o] Error 1
0x86bc7b dwarf2out_frame_debug_expr
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:1910
0x86acaf dwarf2out_frame_debug_expr
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:1616
0x86c13b dwarf2out_frame_debug
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:2169
0x86c13b scan_insn_after
/home/sellcey/gcc-md-fix/src/gcc/gcc/dwarf2cfi.c:2511


The patch I was trying was:


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6657316..3530dd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1464,7 +1464,8 @@
  (set (mem:GPI (plus:P (match_dup 0)
   (match_dup 4)))
  (match_operand:GPI 2 "register_operand" "r"))
- (set (mem:GPI (plus:P (match_dup 0)
+ (set (mem:GPI (plus:P (plus:P (match_dup 0)
+  (match_dup 4))
   (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:GPI 3 "register_operand" "r"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE 
(mode)"
@@ -1480,7 +1481,8 @@
  (set (mem:GPF (plus:P (match_dup 0)
   (match_dup 4)))
  (match_operand:GPF 2 "register_operand" "w"))
- (set (mem:GPF (plus:P (match_dup 0)
+ (set (mem:GPF (plus:P (plus:P (match_dup 0)
+  (match_dup 4))
   (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:GPF 3 "register_operand" "w"))])]
   "INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE 
(mode)"





Re: [PATCH 2/4] c/c++, asm: Use nicer error for duplicate asm qualifiers

2018-12-12 Thread Segher Boessenkool
On Wed, Dec 12, 2018 at 11:02:29AM -0700, Martin Sebor wrote:
> On 12/12/18 10:50 AM, Segher Boessenkool wrote:
> >On Tue, Dec 11, 2018 at 10:31:02AM -0700, Martin Sebor wrote:
> >>>+  {
> >>>+error_at (loc, "duplicate asm qualifier %qE", token->value);
> >>
> >>We have been making an effort to quote keywords, identifiers,
> >>option names, and other such things in diagnostics.  In
> >>the message above and all others like it in this patch kit
> >>that mention "asm" the keyword should be quoted the same
> >>way as the name of the qualifier.
> >
> >This message is about "asm qualifiers".  It is not about the "asm"
> >statement.  You can write this without "asm" keyword, too, anyway (with
> >an "__asm__" or such), making the message even more awkward to quote in
> >that case.  The location of the error has nothing to do with the "asm",
> >either.
> >
> >You should only quote keywords that are in the source text.  Not random
> >words that *could* be a keyword :-)
> 
> asm is not a random word, or even an English word (please look
> in a dictionary if you're in doubt).  It is a C/C++ keyword :-)

An "asm qualifier" is a GCC extension.


Segher


Re: Add a loop versioning pass

2018-12-12 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Dec 6, 2018 at 2:19 PM Richard Sandiford
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and aarch64_be-elf.
>> Also repeated the performance testing (but haven't yet tried an
>> LTO variant; will do that over the weekend).
>
> Any results?

Sorry, I should've remembered that finding time to run tests is easy,
finding time to analyse them is hard.

Speed-wise, the impact of the patch for LTO is similar to without,
with 554.roms_r being the main beneficiary for both AArch64 and x86_64.
I get a 6.8% improvement on Cortex-A57 with -Ofast -mcpu=native
-flto=jobserver.

Size-wise, there are three tests that grow by >=2% on x86_64:

549.fotonik3d_r: 5.5%
548.exchange2_r: 29.5%
554.roms_r: 39.6%

The roms increase seems fair enough given the benefit, since the
whole program uses a similar style for handling arrays.

fotonik3d is a mixed bag.  Some versioning conditions are from
things like:

  subroutine foo(a)
real :: a(:,:,:)
a(1,:,:) = ...
  end subroutine

where we version for the middle dimension having a stride of 1.
This could be eliminated if we had more semantic information.

Other conditions are for things like:

  subroutine foo(a)
real :: a(:,:,:)
a(:,1,:) = ...
  end subroutine

where the pass really does help, but unfortunately those loops
aren't hot and might not even be used at all.

Some opportunities are cases in which we avoid gather loads, such as
in the "mp" loads in the hot __material_mod_MOD_mat_updatee function.
But mp feeds a gather load itself and these assignments aren't a
critical part of the loop.

(I'm testing on an AVX-only system, so these are synthesised gathers
rather than real gathers.)

The growth in 548.exchange2_r comes from reasonable changes to cold code.
The test spends almost all its time in __brute_force_MOD_digits_2, which
contains the same code before and after the patch, but which accounts
for less than 1% of .text before the patch.

> I've skimmed over the updated patch and it looks
> a lot better now.
>
> +bool
> +loop_versioning
> +::find_per_loop_multiplication (address_info &address, address_term_info 
> &term)
> +{
>
> is that what coding convention allows?

Not sure we've settled on something, so I improvised.

> For grepping I'd then say we should do
>
> bool loop_versioning::
> find_per_loop_multiplication (...)
>
> ;)

Sounds good to me.

> Anywhere else we you use
>
> loop_versioning::foo
>
> so please stick to that.

Yeah, I used that when I could avoid an argument spilling over 80 chars,
but I think I missed that the above function now fits into that category,
Will double-check the others.

> Otherwise OK.

Thanks :-)

> I think I don't see a testcase where we could version both loops in a nest
> so I'm not sure whether the transform works fine when you are only
> updating SSA form at the very end of the pass.

You mean something like:

  real :: foo(:,:), bar(:)

  do i=1,n
do j=1,n
  foo(i,j) = ...
end do
bar(i) = ..
  end do

?  I can add a test if so.

At the moment the pass only looks for direct versioning opportunities
in inner loops, so the assignment to bar wouldn't be treated as a
versioning opportunity.  We should still hoist the version check
for foo outside both loops, which is tested by loop_versioning_4.f90
for cases in which the outer loop doesn't have its own array
arithmetic, but isn't tested for cases like the above.

> There may also be some subtle issues with substitute_and_fold being
> applied to non-up-to-date SSA form given it folds stmts looking at
> (single-use!) SSA edges.  The single-use guard might be what saves you
> here (SSA uses in the copies are not yet updated to point to the
> copied DEFs).

OK.  I was hoping that because we only apply substitute_and_fold
to new code, there would be no problem with uses elsewhere.

Would it be safer to:

  - version all loops we want to version
  - update SSA explicitly
  - apply substitute and fold to all "new" loops

?  Could we then get away with returning a 0 TODO at the end?

Thanks,
Richard


Re: [EXT] Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-12 Thread Richard Sandiford
Steve Ellcey  writes:
> On Wed, 2018-12-12 at 11:39 +, Richard Sandiford wrote:
>> 
>> Steve Ellcey  writes:
>> > On Fri, 2018-12-07 at 17:34 +, Richard Sandiford wrote:
>> > > > +  (match_operand:TX 2 "register_operand" "w"))
>> > > > + (set (mem:TX (plus:P (match_dup 0)
>> > > > +  (match_operand:P 5 "const_int_operand"
>> > > > "n")))
>> > > > +  (match_operand:TX 3 "register_operand" "w"))])]
>> > > 
>> > > Think this last part should be:
>> > > 
>> > >  (set (mem:TX (plus:P (plus:P (match_dup 0)
>> > >   (match_dup 4))
>> > >   (match_operand:P 5 "const_int_operand"
>> > > "n")))
>> > >   (match_operand:TX 3 "register_operand" "w"))])]
>> > 
>> > I think you are right about this.  What I have for
>> > loadwb_pair_ matches what is there for
>> > loadwb_pair_.  If this one is wrong, then I assume
>> > the others are wrong too?  This won't make a practical difference since
>> > we call these with gen_loadwb_pair*_* calls and not via pattern
>> > recognition, but still they should be right.  Should I change them
>> > all?  I did not change this as part of this patch.
>> 
>> I think we should fix the new pattern, but I agree fixing the others
>> should be a separate patch.
>> 
>> Patch LGTM with that change.
>
> I am not sure this is right.  I created a patch (separate from any of
> the SIMD changes) to fix the storewb_pair_ and
> storewb_pair_ and when I try to build GCC with
> that change, gcc aborts while building libgcc.  I didn't think
> this change could affect the build but it appears to do so.

You're right, sorry, I'd misread the code.  Patch LGTM as posted.

Thanks,
Richard


Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Wilco Dijkstra
Hi,

>> On 12 Dec 2018, at 18:21, Richard Earnshaw (lists) 
>>  wrote:
>
>>  However, that introduces an issue that that
>> code is potentially used across multiple versions of gcc, with
>> potentially different choices of the static chain register.  Hmm, this
>> might need some more careful thought

The static chain is only used inside nested functions, so it's not an ABI but a
function-local agreement. Although it looks like you can take the address of
a nested function, I think you cannot ever export it in a way that exposes a
different static chain given each address-taken nested function would emit
its own trampoline on the stack.

In fact the trampoline implementation is broken by design since the stack
should not be executable by default.

>> I'm also not keen on the fact that we are now seriously eating into the
>> space of call clobbered registers; what's the argument behind your
>> selection of r11 as opposed to any other register?

The static chain register is only used on entry to a nested function. That's why
I suggested using x9 given x8 is the last argument register.

> suggested r9, then I discovered that r9 and r10 were used
> by the stack probing mechanism, so I just picked the following
> one that didn't seem to be used for other purposes already.

We could rename those temporaries if we think x9 is better than x11.

Wilco



[PATCH] Fix independent-cloneids-1.c testcase (PR88318)

2018-12-12 Thread Segher Boessenkool
The testcase uses REs like {(?n)\m_*bar[.$_]constprop[.$_]0:} to find
what functions are defined.  But, this also matches lines like
.L.bar.constprop.0:
(which are used on powerpc64-linux).
The "(?n)" doesn't do anything here either.  We should use "^" here
instead of just "\m".

Committing to trunk.


Segher


2018-12-12  Segher Boessenkool  

PR testsuite/88318
* gcc.dg/independent-cloneids-1.c: Use ^ not \m.

---
 gcc/testsuite/gcc.dg/independent-cloneids-1.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/independent-cloneids-1.c 
b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
index 3203895..61c1203 100644
--- a/gcc/testsuite/gcc.dg/independent-cloneids-1.c
+++ b/gcc/testsuite/gcc.dg/independent-cloneids-1.c
@@ -28,11 +28,11 @@ baz (int arg)
   return foo (8);
 }
 
-/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]0:} 1 } } 
*/
-/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]1:} 1 } } 
*/
-/* { dg-final { scan-assembler-times {(?n)\m_*bar[.$_]constprop[.$_]2:} 1 } } 
*/
-/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]0:} 1 } } 
*/
-/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]1:} 1 } } 
*/
-/* { dg-final { scan-assembler-times {(?n)\m_*foo[.$_]constprop[.$_]2:} 1 } } 
*/
-/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]3:} } } */
-/* { dg-final { scan-assembler-not {(?n)\m_*foo[.$_]constprop[.$_]4:} } } */
+/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^_*bar[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]0:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]1:} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^_*foo[.$_]constprop[.$_]2:} 1 } } */
+/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]3:} } } */
+/* { dg-final { scan-assembler-not {(?n)^_*foo[.$_]constprop[.$_]4:} } } */
-- 
1.8.3.1



[PATCH, rs6000] Allow libitm to use HTM on newer hw and kernels

2018-12-12 Thread Peter Bergner
Libitm on POWER hardware looks for the "htm" bit in AT_HWCAP2 to determine
whether it can use HTM when executing code within __transaction_atomic
code blocks.  However, on newer hardware and kernels, the "htm" bit is no
longer set and instead the "htm-no-suspend" bit is set, so we currently
don't use HTM on new hw and kernels.  The following patch adds support
for htm-no-suspend to libitm.  I have also added code to use the
__builtin_cpu_supports() builtin if it is available, since that is
much faster than using the getauxval libc call.

This passed bootstrap and regtesting with no errors and someone within
IBM how had a POWER9 box with a newish kernel how ran into the problem
confirmed it works for his test case.

Ok for mainline?  Should be backport this?

Peter

* config/powerpc/target.h (PPC_FEATURE2_HTM_NO_SUSPEND): Conditionally
define.
(htm_available):  Add support for PPC_FEATURE2_HTM_NO_SUSPEND.
Use __builtin_cpu_supports if available.

Index: libitm/config/powerpc/target.h
===
--- libitm/config/powerpc/target.h  (revision 267062)
+++ libitm/config/powerpc/target.h  (working copy)
@@ -26,6 +26,11 @@
 #include 
 #endif
 
+/* This is a fairly new feature bit, so handle it not being defined.  */
+#ifndef PPC_FEATURE2_HTM_NO_SUSPEND
+# define PPC_FEATURE2_HTM_NO_SUSPEND 0
+#endif
+
 namespace GTM HIDDEN {
 
 typedef int v128 __attribute__((vector_size(16), may_alias, aligned(16)));
@@ -81,7 +86,16 @@ cpu_relax (void)
 static inline bool
 htm_available (void)
 {
-  return (getauxval (AT_HWCAP2) & PPC_FEATURE2_HAS_HTM) ? true : false;
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  if (__builtin_cpu_supports ("htm-no-suspend")
+  || __builtin_cpu_supports ("htm"))
+return true;
+#else
+  if (getauxval (AT_HWCAP2)
+  & (PPC_FEATURE2_HAS_HTM | PPC_FEATURE2_HTM_NO_SUSPEND))
+return true;
+#endif
+  return false;
 }
 
 static inline uint32_t



Re: [PATCH, rs6000] Port cleanup patch, use rtl.h convenience macros, etc.

2018-12-12 Thread Peter Bergner
Ping.

Peter


On 12/4/18 10:12 AM, Peter Bergner wrote:
> Hi Segher,
> 
> We talked about replacing rs6000'c regno_or_subregno() with the generic
> reg_or_subregno() function from jump.c.  I agree the geberic version is
> better because it has an assert that ensures we have a REG.  There were
> also a couple of places that could use reg_or_subregno() where we weren't.
> I made those changes and tested it and ran into no regressions.
> 
> Nearby the changes above, there were cases where could could have been using
> the rtl convenience macros like REG_P, etc. to simplify the code so I did
> that too...and kept going and just ended up converting all the cases I could
> find that should have been using them, to use them.  So the patch is large,
> but pretty straight forward.
> 
> The passed bootstrap and regtesting on powerpc64le-linux with no regressions.
> Ok for mainline?
> 
> Peter
> 
>   * config/rs6000/altivec.md (build_vector_mask_for_load): Use MEM_P.
>   * config/rs6000/constraints.md (Q constraint): Use REG_P.
>   * config/rs6000/darwin.h (PREFERRED_RELOAD_CLASS): Use SYMBOL_REF_P.
>   * config/rs6000/freebsd64.h (ASM_OUTPUT_SPECIAL_POOL_ENTRY_P): Use
>   SYMBOL_REF_P, CONST_INT_P and CONST_DOUBLE_P.
>   * config/rs6000/linux64.h (ASM_OUTPUT_SPECIAL_POOL_ENTRY_P): Likewise.
>   * config/rs6000/predicates.md (altivec_register_operand, vint_operand,
>   vsx_register_operand, vsx_reg_sfsubreg_ok, vfloat_operand,
>   vlogical_operand, gpc_reg_operand, int_reg_operand,
>   int_reg_operand_not_pseudo): Use SUBREG_P and HARD_REGISTER_P.
>   (ca_operand, base_reg_operand, htm_spr_reg_operand, cc_reg_operand,
>   cc_reg_not_cr0_operand, input_operand): Use SUBREG_P.
>   (save_world_operation, restore_world_operation, lmw_operation,
>   stmw_operation): Use MEM_P and REG_P.
>   (tie_operand): Use MEM_P.
>   (vrsave_operation, crsave_operation): Use REG_P.
>   (mfcr_operation, mtcrf_operation): Use REG_P and CONST_INT_P.
>   (fpr_reg_operand): Use SUBREG_P and HARD_REGISTER_NUM_P.
>   (quad_int_reg_operand): Use HARD_REGISTER_NUM_P.
>   (call_operand): Use HARD_REGISTER_P.
>   (indexed_or_indirect_operand, altivec_indexed_or_indirect_operand):
>   Use CONST_INT_P.
>   (lwa_operand): Use SUBREG_P, REG_P and CONST_INT_P.
>   * config/rs6000/rs6000-p8swap.c (insn_is_load_p, insn_is_store_p,
>   quad_aligned_load_p, replace_swapped_aligned_store,
>   recombine_lvx_pattern, replace_swapped_aligned_load,
>   recombine_stvx_pattern): Use MEM_P.
>   (const_load_sequence_p, adjust_vperm, replace_swapped_load_constant):
>   Use MEM_P and SYMBOL_REF_P.
>   (rtx_is_swappable_p): Use REG_P and CONST_INT_P.
>   (insn_is_swappable_p): Use REG_P and MEM_P.
>   (insn_is_swap_p, (alignment_mask): Use CONST_INT_P.
>   * config/rs6000/rs6000-string.c (expand_block_clear, expand_block_move):
>   Use CONST_INT_P.
>   * config/rs6000/rs6000.c (rs6000_secondary_reload, rs6000_emit_cmove,
>   rs6000_legitimate_constant_p): Use CONST_DOUBLE_P.
>   (rs6000_output_move_128bit): Use CONST_DOUBLE_P, CONST_INT_P and
>   CONST_WIDE_INT_P.
>   (rs6000_legitimize_address): Use CONST_DOUBLE_P, CONST_INT_P,
>   CONST_WIDE_INT_P, REG_P and SYMBOL_REF_P.
>   (rs6000_emit_move): Use CONST_DOUBLE_P, CONST_INT_P, HARD_REGISTER_P,
>   HARD_REGISTER_NUM_P, MEM_P, REG_P, SUBREG_P, SYMBOL_REF_P and
>   reg_or_subregno:
>   (output_toc): Use CONST_DOUBLE_P, CONST_INT_P and SYMBOL_REF_P.
>   (easy_altivec_constant, rs6000_legitimate_offset_address_p,
>   rs6000_mode_dependent_address, rs6000_expand_mtfsf_builtin,
>   rs6000_expand_set_fpscr_rn_builtin, rs6000_expand_set_fpscr_drn_builtin,
>   rs6000_expand_unop_builtin, INT_P, rs6000_generate_compare,
>   rs6000_machopic_legitimize_pic_address, rs6000_split_logical_inner,
>   rs6000_split_logical_di): Use CONST_INT_P.
>   (rs6000_legitimize_reload_address): Use CONST_INT_P, HARD_REGISTER_P,
>   REG_P and SYMBOL_REF_P.
>   (setup_incoming_varargs, rs6000_rtx_costs): Use CONST_INT_P and MEM_P.
>   (print_operand): Use CONST_INT_P, MEM_P and REG_P.
>   (virtual_stack_registers_memory_p, rs6000_legitimate_address_p,
>   mems_ok_for_quad_peep): Use CONST_INT_P and REG_P.
>   (rs6000_secondary_reload_memory): Use CONST_INT_P and SUBREG_P.
>   (small_data_operand, print_operand_address): Use CONST_INT_P and
>   SYMBOL_REF_P.
>   (split_stack_arg_pointer_used_p): Use HARD_REGISTER_P.
>   (rs6000_init_hard_regno_mode_ok, direct_move_p): Use 
> HARD_REGISTER_NUM_P.
>   (rs6000_secondary_reload_gpr): Use HARD_REGISTER_NUM_P and MEM_P.
>   (rs6000_secondary_reload_class): Use HARD_REGISTER_NUM_P, REG_P,
>   SUBREG_P and SYMBOL_REF_P.
>   (register_to_reg_type, rs6000_secondary_reload_inner): Use SUBREG_P
>   and HARD_REGISTER_NUM_P.
>   (rs6000_adjust_ve

Re: [PATCH, rs6000] Fix PR87496: ICE in aggregate_value_p at gcc/function.c:2046

2018-12-12 Thread Peter Bergner
On 12/7/18 11:38 AM, Peter Bergner wrote:
> On 12/4/18 4:53 PM, Segher Boessenkool wrote:
>>> Since I backported the earlier fix to GCC8, I'd like to backport this
>>> there too.
>>
>> Okay for there too.
> 
> Great, I'll backport the changes and commit after regression testing.
> Thanks!

Backports committed to GCC 8 and GCC 7 now too.

Peter





Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Uecker, Martin
Am Mittwoch, den 12.12.2018, 18:53 + schrieb Wilco Dijkstra:
> Hi,
> 
> > > On 12 Dec 2018,@18:21, Richard Earnshaw (lists) 
> 
> wrote:
> 
> > > However, that introduces an issue that that
> > > code is potentially used across multiple versions of gcc, with
> > > potentially different choices of the static chain register.  Hmm, this
> > > might need some more careful thought
> 
> The static chain is only used inside nested functions, so it's not an ABI but 
> a
> function-local agreement. Although it looks like you can take the address of
> a nested function, I think you cannot ever export it in a way that exposes a
> different static chain given each address-taken nested function would emit
> its own trampoline on the stack.
> 
> In fact the trampoline implementation is broken by design since the stack
> should not be executable by default.

Does a non-executable stack actually improve security?


For the alternative implementation using (custom) function
descriptors (-fno-trampolines) the static chain becomes
part of the ABI or not?

Best,
Martin



> > > I'm also not keen on the fact that we are now seriously eating into the
> > > space of call clobbered registers; what's the argument behind your
> > > selection of r11 as opposed to any other register?
> 
> The static chain register is only used on entry to a nested function.
> That's why I suggested using x9 given x8 is the last argument register.
> 
> > suggested r9, then I discovered that r9 and r10 were used
> > by the stack probing mechanism, so I just picked the following
> > one that didn't seem to be used for other purposes already.
> 
> We could rename those temporaries if we think x9 is better than x11.
> 
> Wilco
> 

[PATCH] match_asm_constraints: Use copy_rtx where needed (PR88001)

2018-12-12 Thread Segher Boessenkool
The new insn here (temporarily) illegally shares RTL.  This fixes it.

Tested with an ARC cross, and regstrapped on powerpc64-linux {-m32,-m64}.
Is this okay for trunk?


Segher


2018-12-12  Segher Boessenkool  

PR rtl-optimization/88001
* function.c (match_asm_constraints_1): Don't invalidly share RTL.

---
 gcc/function.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/function.c b/gcc/function.c
index 69523c1..60e96f3 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -6529,7 +6529,7 @@ match_asm_constraints_1 (rtx_insn *insn, rtx *p_sets, int 
noutputs)
   output_matched[match] = true;
 
   start_sequence ();
-  emit_move_insn (output, input);
+  emit_move_insn (output, copy_rtx (input));
   insns = get_insns ();
   end_sequence ();
   emit_insn_before (insns, insn);
-- 
1.8.3.1



Re: RFA: libiberty: Add a limit on demangling qualifiers (PR 87241)

2018-12-12 Thread Jason Merrill
On Wed, Dec 12, 2018 at 6:29 AM Nick Clifton  wrote:
>
>   Sorry to bother you, but I have another libiberty demangler resource
>   exhaustion prevention patch to present.  This one is for:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87241
>
>   Jonathan Wakely reported that __cxa_demanlge() was returning a -2
>   result, but I did not see this.  Instead I found that
>   consume_count_with_underscores() is returning a very large number
>   (because a very large value is encoded in the mangled string) and this
>   is resulting in many calls to remember_Ktype() which eventually
>   exhaust the amount of memory available.
>
>   The attached patch is a simplistic approach to solving this problem by
>   adding a hard upper limit on the number of qualifiers that will be
>   allowed by the demangler.  I am not sure if this is the best approach
>   to solving the problem, but it is a simple one, and I would think one
>   that would not prevent the demangling of any real mangled names.  The
>   limit does not have to be DEMANGLE_RECURSE_LIMIT of course.  I just
>   chose that value because it was convenient and of a size that I
>   thought was appropriate.
>
>   I also did run the libiberty testsuite this time, with no failures
>   reported. :-)
>
>   OK to apply ?
>
> Cheers
>   Nick
>
> libiberty/ChangeLog
> 2018-12-12  Nick Clifton  
>
> * cplus-dem.c (demangle_qualified): Add an upper limit on the
> number of qualifiers supported, based upon the value of
> DEMANGLE_RECURSE_LIMIT.

This issue also will be resolved by disabling or removing the old
demangling code, which I haven't seen anyone argue against.

Jason


[patch] Fix bootstrap for non linux powerpc targets

2018-12-12 Thread Andreas Tobler

Hi all,

this patch fixes bootstrap for my powerpc*-unknown-freebsd* targets.
The definition of GNU_USER_DYNAMIC_LINKER was recently moved to linux.h.

But the GNU_USER_DYNAMIC_LINKER is still used in rs6000/sysv4.h.
So I add an empty definition with guard to cure the bootstrap issue.

Ok for trunk?

TIA,
Andreas

2018-12-12  Andreas Tobler  

* config/rs6000/sysv4.h: Add an empty definition for
GNU_USER_DYNAMIC_LINKER for all targets which do not include
linux.h where GNU_USER_DYNAMIC_LINKER is defined.

Index: sysv4.h
===
--- sysv4.h (revision 267063)
+++ sysv4.h (working copy)
@@ -761,6 +761,9 @@
 #define MUSL_DYNAMIC_LINKER \
   "/lib/ld-musl-powerpc" MUSL_DYNAMIC_LINKER_E "%{msoft-float:-sf}.so.1"

+#ifndef GNU_USER_DYNAMIC_LINKER
+#define GNU_USER_DYNAMIC_LINKER ""
+#endif
 #define LINK_OS_LINUX_SPEC "-m elf32ppclinux %{!shared: %{!static: \
   %{rdynamic:-export-dynamic} \
   -dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"


Re: [PATCH 1/2] v3: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-12 Thread Jason Merrill

On 12/7/18 3:13 PM, David Malcolm wrote:

On Tue, 2018-12-04 at 18:31 -0500, Jason Merrill wrote:

On 12/3/18 5:10 PM, Jeff Law wrote:

On 11/19/18 9:51 AM, David Malcolm wrote:

[...]

@@ -1058,6 +1058,9 @@ grokbitfield (const cp_declarator
*declarator,
return NULL_TREE;
  }
  
+  if (width)

+STRIP_ANY_LOCATION_WRAPPER (width);


Why is this needed?  We should already be reducing width to an
unwrapped
constant value somewhere along the line.


"width" is coming from cp_parser_member_declaration, from:

  /* Get the width of the bitfield.  */
  width = cp_parser_constant_expression (parser, false, NULL,
 cxx_dialect >= cxx11);

and currently nothing is unwrapping the value.  We presumably need to
unwrap (or fold?) it before it is stashed in
   DECL_BIT_FIELD_REPRESENTATIVE (value).

Without stripping (or folding) here, we e.g. lose a warning and get this:
   FAIL: g++.dg/abi/empty22.C  -std=gnu++98  (test for warnings, line 15)


Why does that happen?  check_bitfield_decl ought to handle the location 
wrapper fine.  That's where it gets folded.



@@ -656,6 +656,9 @@ add_capture (tree lambda, tree id, tree
orig_init, bool by_reference_p,
listmem = make_pack_expansion (member);
initializer = orig_init;
  }
+
+  STRIP_ANY_LOCATION_WRAPPER (initializer);


Why is this needed?  What cares about the tree codes of the capture
initializer?


This is used to populate LAMBDA_EXPR_CAPTURE_LIST.  Without stripping,
we end up with wrapped VAR_DECLs, rather than the VAR_DECLs themselves,


Sure, that sounds fine.


and this confuses things later on, for example leading to:

  PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++14 (test for excess 
errors)
  PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++17 (test for excess 
errors)


Confuses how?

Jason


Re: [PATCH 2/2] v2: C++: improvements to binary operator diagnostics (PR c++/87504)

2018-12-12 Thread Jason Merrill

On 12/4/18 5:35 PM, David Malcolm wrote:

The v1 patch:
   https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00303.html
has bitrotten somewhat, so here's v2 of the patch, updated relative
to r266740.

Blurb from v1 patch follows:

The C frontend is able (where expression locations are available) to print
problems with binary operators in 3-location form, labelling the types of
the expressions:

   arg_0 op arg_1
   ~ ^~ ~
 ||
 |arg1 type
 arg0 type

The C++ frontend currently just shows the combined location:

   arg_0 op arg_1
   ~~^~~~

and fails to highlight where the subexpressions are, or their types.

This patch introduces a op_location_t struct for handling the above
operator-location vs combined-location split, and a new
class binary_op_rich_location for displaying the above, so that the
C++ frontend is able to use the more detailed 3-location form for
type mismatches in binary operators, and for -Wtautological-compare
(where types are not displayed).  Both forms can be seen in this
example:

bad-binary-ops.C:69:20: error: no match for 'operator&&' (operand types are
   's' and 't')
69 |   return ns_4::foo && ns_4::inner::bar;
   |  ~ ^~ 
   ||   |
   |s   t
bad-binary-ops.C:69:20: note: candidate: 'operator&&(bool, bool)' 
69 |   return ns_4::foo && ns_4::inner::bar;
   |  ~~^~~

The patch also allows from some uses of macros in
-Wtautological-compare, where both sides of the comparison have
been spelled the same way, e.g.:

Wtautological-compare-ranges.c:23:11: warning: self-comparison always
evaluates to true [-Wtautological-compare]
23 |   if (FOO == FOO);
   |   ^~

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, in
conjunction with the previous patch.

OK for trunk?
Dave

gcc/c-family/ChangeLog:
PR c++/87504
* c-common.h (warn_tautological_cmp): Convert 1st param from
location_t to const op_location_t &.
* c-warn.c (find_array_ref_with_const_idx_r): Strip location
wrapper when testing for INTEGER_CST.
(warn_tautological_bitwise_comparison): Convert 1st param from
location_t to const op_location_t &; use it to build a
binary_op_rich_location, and use this.
(spelled_the_same_p): New function.
(warn_tautological_cmp): Convert 1st param from location_t to
const op_location_t &.  Warn for macro expansions if
spelled_the_same_p.  Use binary_op_rich_location.

gcc/c/ChangeLog:
PR c++/87504
* c-typeck.c (class maybe_range_label_for_tree_type_mismatch):
Move from here to gcc-rich-location.h and gcc-rich-location.c.
(build_binary_op): Use struct op_location_t and
class binary_op_rich_location.

gcc/cp/ChangeLog:
PR c++/87504
* call.c (op_error): Convert 1st param from location_t to
const op_location_t &.  Use binary_op_rich_location for binary
ops.
(build_conditional_expr_1): Convert 1st param from location_t to
const op_location_t &.
(build_conditional_expr): Likewise.
(build_new_op_1): Likewise.
(build_new_op): Likewise.
* cp-tree.h (build_conditional_expr): Likewise.
(build_new_op): Likewise.
(build_x_binary_op): Likewise.
(cp_build_binary_op): Likewise.
* parser.c (cp_parser_primary_expression): Build a location
for id-expression nodes.
(cp_parser_binary_expression): Use an op_location_t when
calling build_x_binary_op.
(cp_parser_operator): Build a location for user-defined literals.
* typeck.c (build_x_binary_op): Convert 1st param from location_t
to const op_location_t &.
(cp_build_binary_op): Likewise.  Use binary_op_rich_location.

gcc/ChangeLog:
PR c++/87504
* gcc-rich-location.c
(maybe_range_label_for_tree_type_mismatch::get_text): Move here from
c/c-typeck.c.
(binary_op_rich_location::binary_op_rich_location): New ctor.
(binary_op_rich_location::use_operator_loc_p): New function.
* gcc-rich-location.h
(class maybe_range_label_for_tree_type_mismatch)): Move here from
c/c-typeck.c.
(struct op_location_t): New forward decl.
(class binary_op_rich_location): New class.
* tree.h (struct op_location_t): New struct.

gcc/testsuite/ChangeLog:
* c-c++-common/Wtautological-compare-ranges.c: New test.
* g++.dg/cpp0x/pr51420.C: Add -fdiagnostics-show-caret and update
expected output.
* g++.dg/diagnostic/bad-binary-ops.C: Update expected output from
1-location form to 3-location form, with labelling of ranges with
types.  Add examples of id-expression nodes with namespaces.
* g++.dg/diagnostic/param-type-mismatch-2.C: Likewise.

This 

  1   2   >