[PATCH] Adjust gcc.dg/vect/bb-slp-pr65935.c

2020-10-30 Thread Richard Biener
This adjusts the testcase to allow splitting up the group for
larger vector sizes and thus printing the splat message multiple times.

Pushed.

2020-10-30  Richard Biener  

* gcc.dg/vect/bb-slp-pr65935.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index c262d731150..5d80f560f56 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -62,5 +62,5 @@ int main()
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" } } */
 /* We should see the s->phase[dir] operand splatted and no other operand built
    from scalars.  See PR97334.  */
-/* { dg-final { scan-tree-dump-times "Using a splat" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump "Using a splat" "slp1" } } */
 /* { dg-final { scan-tree-dump-times "Building vector operands from scalars" 0 "slp1" } } */
-- 
2.26.2


Re: New modref/ipa_modref optimization passes

2020-10-30 Thread Richard Biener
On Thu, 29 Oct 2020, Jan Hubicka wrote:

> > Hi,
> > this is the patch I am using to fix the assumed_alias_type.f90 failure by
> > simply arranging alias set 0 for the problematic array descriptor.
> > 
> > I am not sure this is the best option, but I suppose it is better than
> > setting all array descriptors to have the same canonical type (as done by
> > LTO)?
> > 
> Hi,
> here is an updated patch which uses TYPELESS_STORAGE instead of alias set
> 0, so it is LTO safe.  Unfortunately I also had to enable it for all
> array descriptors, otherwise I still get misoptimizations with modref
> extended to handle builtins, for example:
> 
> FAIL: gfortran.dg/class_array_20.f03   -Os  execution test
> FAIL: gfortran.dg/coindexed_1.f90   -O2  execution test
> FAIL: gfortran.dg/coindexed_1.f90   -O3 -fomit-frame-pointer
> FAIL: gfortran.dg/coindexed_1.f90   -O3 -g  execution test
> 
> This is not a perfect solution (we really want to track array
> descriptors), but it fixes wrong code and would let me move forward.
> Is it OK for mainline?
> 
> With extended modref I still get an infinite loop on the pdt_14 testcase.
> ipa-modref only performs disambiguation on
> __vtab_link_module_Pdtlink_8._deallocate; this global variable is
> readonly (and is detected as such with LTO), so it must be just
> uncovering some latent problem there.  I am however not familiar enough
> with Fortran to tell what is wrong there.
> 
> The testcase fails in a different way with -flto for me.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Thanks,
Richard.

> Honza
> 
>   * trans-types.c: Include alias.h
>   (gfc_get_array_type_bounds): Set typeless storage.
> diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
> index b15ea667411..b7129dcbe6d 100644
> --- a/gcc/fortran/trans-types.c
> +++ b/gcc/fortran/trans-types.c
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "trans-array.h"
>  #include "dwarf2out.h"   /* For struct array_descr_info.  */
>  #include "attribs.h"
> +#include "alias.h"
>  
>  
>  #if (GFC_MAX_DIMENSIONS < 10)
> @@ -1903,6 +1904,10 @@ gfc_get_array_type_bounds (tree etype, int dimen, int codimen, tree * lbound,
>    base_type = gfc_get_array_descriptor_base (dimen, codimen, false);
>    TYPE_CANONICAL (fat_type) = base_type;
>    TYPE_STUB_DECL (fat_type) = TYPE_STUB_DECL (base_type);
> +  /* Arrays of unknown type must alias with all array descriptors.  */
> +  TYPE_TYPELESS_STORAGE (base_type) = 1;
> +  TYPE_TYPELESS_STORAGE (fat_type) = 1;
> +  gcc_checking_assert (!get_alias_set (base_type) && !get_alias_set (fat_type));
>  
>    tmp = TYPE_NAME (etype);
>    if (tmp && TREE_CODE (tmp) == TYPE_DECL)
> 


Re: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223

2020-10-30 Thread Richard Biener via Gcc-patches
On Thu, Oct 29, 2020 at 8:45 PM Eugene Rozenfeld wrote:
>
> Thank you for the review Richard!
>
> I re-worked the patch based on your suggestions. I combined the two patterns. 
> Neither one requires a signedness check as long as the type of the 'add' has 
> overflow wrap semantics.
>
> I had to modify the regular expression in no-strict-overflow-4.c test. In 
> that test the following function is compiled with -fno-strict-overflow :
>
> int
> foo (int i)
> {
>   return i + 1 > i;
> }
>
> We now optimize this function so that the tree-optimized dump has
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1931, cgraph_uid=1, 
> symbol_order=0)
>
> foo (int i)
> {
>   _Bool _1;
>   int _3;
>
>   <bb 2> [local count: 1073741824]:
>   _1 = i_2(D) != 2147483647;
>   _3 = (int) _1;
>   return _3;
> }
>
> This is a correct optimization since -fno-strict-overflow implies -fwrapv.

OK.

Thanks,
Richard.

> Eugene
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Tuesday, October 27, 2020 2:23 AM
> To: Eugene Rozenfeld 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Fix for PR97223
>
> On Sat, Oct 24, 2020 at 2:20 AM Eugene Rozenfeld via Gcc-patches wrote:
> >
> > This patch adds a pattern for folding
> >  x < (short) ((unsigned short)x + const) to
> >  x <= SHORT_MAX - const
> > (and similarly for other integral types) if const is not 0,
> > as described in PR97223.
> >
> > For example, without this patch the x86_64-pc-linux code generated for
> > this function
> >
> > bool f(char x)
> > {
> > return x < (char)(x + 12);
> > }
> >
> > is
> >
> > lea    eax,[rdi+0xc]
> > cmp    al,dil
> > setg   al
> > ret
> >
> > With the patch the code is
> >
> > cmp    dil,0x73
> > setle  al
> > ret
> >
> > Tested on x86_64-pc-linux.
>
> +/* Similar to the previous pattern but with additional casts. */
> +(for cmp (lt le ge gt)
> +     out (gt gt le le)
> + (simplify
> +  (cmp:c (convert@3 (plus@2 (convert@4 @0) INTEGER_CST@1)) @0)
> +  (if (!TYPE_UNSIGNED (TREE_TYPE (@0))
> +   && types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> +   && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0)))
> +   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@4))
> +   && wi::to_wide (@1) != 0
> +   && single_use (@2))
> +   (with { unsigned int prec = TYPE_PRECISION (TREE_TYPE (@0)); }
> +(out @0 { wide_int_to_tree (TREE_TYPE (@0),
> +   wi::max_value (prec, SIGNED)
> +   - wi::to_wide (@1)); })
>
> I think it's reasonable but the comment can be made more precise.
> In particular I wonder why we require a signed comparison here while the 
> previous pattern requires an unsigned comparison.  It might be an artifact 
> and the restriction instead only applies to the plus?
>
> Note that
>
> +   && types_match (TREE_TYPE (@4), unsigned_type_for (TREE_TYPE (@0)))
>
> unsigned_type_for should be avoided since it's quite expensive.  May I suggest
>
>   && TYPE_UNSIGNED (TREE_TYPE (@4))
>   && tree_nop_conversion_p (TREE_TYPE (@4), TREE_TYPE (@0))
>
> instead?
>
> I originally wondered if "but with additional casts" could be done in a 
> single pattern via (convert? ...) uses but then I noticed the strange 
> difference in the comparison signedness requirement ...
>
> Richard.
>
> > Eugene
> >


[committed] openmp: Handle non-static data members in allocate clause and other C++ allocate fixes

2020-10-30 Thread Jakub Jelinek via Gcc-patches
Hi!

This allows non-static data members to be specified in the allocate clause,
as they can be in other privatization clauses, and adds a new testcase that
also covers handling of that clause in templates.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-10-30  Jakub Jelinek  

* semantics.c (finish_omp_clauses) : Handle
non-static members in methods.
* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_ALLOCATE.

* c-c++-common/gomp/allocate-1.c (qux): Add another test.
* g++.dg/gomp/allocate-1.C: New test.

--- gcc/cp/semantics.c.jj   2020-10-28 10:37:50.577607075 +0100
+++ gcc/cp/semantics.c  2020-10-29 17:45:19.845141599 +0100
@@ -7200,7 +7200,11 @@ finish_omp_clauses (tree clauses, enum c
  break;
 
case OMP_CLAUSE_ALLOCATE:
- t = OMP_CLAUSE_DECL (c);
+ t = omp_clause_decl_field (OMP_CLAUSE_DECL (c));
+ if (t)
+   omp_note_field_privatization (t, OMP_CLAUSE_DECL (c));
+ else
+   t = OMP_CLAUSE_DECL (c);
  if (t == current_class_ptr)
{
  error_at (OMP_CLAUSE_LOCATION (c),
@@ -7208,7 +7212,9 @@ finish_omp_clauses (tree clauses, enum c
  remove = true;
  break;
}
- if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
+ if (!VAR_P (t)
+ && TREE_CODE (t) != PARM_DECL
+ && TREE_CODE (t) != FIELD_DECL)
{
  if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
@@ -7232,17 +7238,18 @@ finish_omp_clauses (tree clauses, enum c
  bitmap_set_bit (&aligned_head, DECL_UID (t));
  allocate_seen = true;
}
- t = OMP_CLAUSE_ALLOCATE_ALLOCATOR (c);
- if (error_operand_p (t))
+ tree allocator;
+ allocator = OMP_CLAUSE_ALLOCATE_ALLOCATOR (c);
+ if (error_operand_p (allocator))
{
  remove = true;
  break;
}
- if (t == NULL_TREE)
-   break;
+ if (allocator == NULL_TREE)
+   goto handle_field_decl;
  tree allocatort;
- allocatort = TYPE_MAIN_VARIANT (TREE_TYPE (t));
- if (!type_dependent_expression_p (t)
+ allocatort = TYPE_MAIN_VARIANT (TREE_TYPE (allocator));
+ if (!type_dependent_expression_p (allocator)
  && (TREE_CODE (allocatort) != ENUMERAL_TYPE
  || TYPE_NAME (allocatort) == NULL_TREE
  || TREE_CODE (TYPE_NAME (allocatort)) != TYPE_DECL
@@ -7254,17 +7261,17 @@ finish_omp_clauses (tree clauses, enum c
  error_at (OMP_CLAUSE_LOCATION (c),
            "%<allocate%> clause allocator expression has "
            "type %qT rather than %<omp_allocator_handle_t%>",
-   TREE_TYPE (t));
+   TREE_TYPE (allocator));
  remove = true;
}
  else
{
- t = mark_rvalue_use (t);
+ allocator = mark_rvalue_use (allocator);
  if (!processing_template_decl)
-   t = maybe_constant_value (t);
- OMP_CLAUSE_ALLOCATE_ALLOCATOR (c) = t;
+   allocator = maybe_constant_value (allocator);
+ OMP_CLAUSE_ALLOCATE_ALLOCATOR (c) = allocator;
}
- break;
+ goto handle_field_decl;
 
case OMP_CLAUSE_DEPEND:
  t = OMP_CLAUSE_DECL (c);
--- gcc/cp/pt.c.jj  2020-10-29 15:20:38.0 +0100
+++ gcc/cp/pt.c 2020-10-29 17:28:31.933498329 +0100
@@ -17391,6 +17391,7 @@ tsubst_omp_clauses (tree clauses, enum c
  case OMP_CLAUSE_IS_DEVICE_PTR:
  case OMP_CLAUSE_INCLUSIVE:
  case OMP_CLAUSE_EXCLUSIVE:
+ case OMP_CLAUSE_ALLOCATE:
/* tsubst_expr on SCOPE_REF results in returning
   finish_non_static_data_member result.  Undo that here.  */
if (TREE_CODE (OMP_CLAUSE_DECL (oc)) == SCOPE_REF
--- gcc/testsuite/c-c++-common/gomp/allocate-1.c.jj 2020-10-28 10:37:50.582607002 +0100
+++ gcc/testsuite/c-c++-common/gomp/allocate-1.c    2020-10-29 17:52:46.106118779 +0100
@@ -74,3 +74,11 @@ foo (int x, int z)
 r += bar (x, &r, 0);
   #pragma omp taskwait
 }
+
+void
+qux (const omp_allocator_handle_t h)
+{
+  int x = 0;
+  #pragma omp parallel firstprivate (x) allocate (h: x)
+  x = 1;
+}
--- gcc/testsuite/g++.dg/gomp/allocate-1.C.jj   2020-10-29 17:19:48.314404417 +0100
+++ gcc/testsuite/g++.dg/gomp/allocate-1.C  2020-10-29 17:51:26.126018984 +0100
@@ -0,0 +1,88 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++11" }
+
+typedef enum omp_allocator_handle_t
+#if __cplusplus >= 201103L
+: __UINTPTR_TYPE__
+#endif
+{
+  omp_null_allocator = 0,
+  omp_default_mem_alloc = 1,
+  omp_large_cap_mem_alloc = 2,
+  omp_const_mem_alloc = 3,
+  omp_high_bw_mem_alloc = 4,
+  omp_low_lat_mem_alloc = 5,
+  omp_cgroup_mem_alloc = 6,
+  omp_pteam_mem_alloc = 7,
+  o

Re: [PATCH] [tree-optimization] Fix for PR96701

2020-10-30 Thread Richard Biener via Gcc-patches
On Fri, Oct 30, 2020 at 3:38 AM Eugene Rozenfeld via Gcc-patches wrote:
>
> This patch adds a pattern for folding
>
>   x >> x
>
> to
>
>   0
>
> as described in PR96701.
>
> Without this patch the x86_64-pc-linux-gnu code generated for this function
>
> int
> foo (int i)
> {
>   return i >> i;
> }
>
> is
>
> mov    ecx,edi
> sar    edi,cl
> test   edi,edi
> setne  al
> ret
>
> With the patch the code is
>
> xor    eax,eax
> ret
>
> Tested on x86_64-pc-linux-gnu.

OK.

Thanks,
Richard.

> Eugene
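
As a quick plausibility check of the x >> x fold, here is a minimal sketch in plain C (shift_by_self_is_zero is an illustrative name, not something from the patch). It assumes int is at least 32 bits wide and only tries the counts 0..31. In C, a shift by a negative count or by the operand width or more is undefined, so those inputs are deliberately left out. For the counts tried, i is always smaller than 2^i, so the shifted result is 0.

```c
/* Return 1 if i >> i == 0 for every i in 0..31, else 0.
   Assumes int has at least 32 value-plus-sign bits, so that every
   shift count used here is well defined.  */
int shift_by_self_is_zero (void)
{
  for (int i = 0; i <= 31; i++)
    if ((i >> i) != 0)
      return 0;
  return 1;
}
```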


[committed] openmp: Fix handling of allocate clause on taskloop

2020-10-30 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch fixes gimplification of the allocate clause on taskloop: it puts
allocate on the inner taskloop only if there is a lastprivate clause for the
same variable, because otherwise the data sharing clauses are only on the task
construct in the construct sandwich.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-10-30  Jakub Jelinek  

* gimplify.c (gimplify_scan_omp_clauses): Force
OMP_CLAUSE_ALLOCATE_ALLOCATOR into a temporary if it is non-NULL and
non-constant.
(gimplify_omp_for): Only put allocate on inner taskloop if lastprivate
for the same variable is going to be put there, and in that case
if the OMP_CLAUSE_ALLOCATE_ALLOCATOR is non-NULL non-constant, make
the allocator firstprivate on task.

* c-c++-common/gomp/allocate-3.c: New test.

--- gcc/gimplify.c.jj   2020-10-28 10:37:50.488608373 +0100
+++ gcc/gimplify.c  2020-10-29 20:05:23.869378065 +0100
@@ -9721,6 +9721,13 @@ gimplify_scan_omp_clauses (tree *list_p,
  remove = true;
  break;
}
+ else if (code == OMP_TASKLOOP
+  && OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)
+  && (TREE_CODE (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c))
+  != INTEGER_CST))
+   OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)
+ = get_initialized_tmp_var (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c),
+pre_p, NULL, false);
  break;
 
case OMP_CLAUSE_DEFAULT:
@@ -12120,6 +12127,20 @@ gimplify_omp_for (tree *expr_p, gimple_s
   tree *gtask_clauses_ptr = &task_clauses;
   tree outer_for_clauses = NULL_TREE;
   tree *gforo_clauses_ptr = &outer_for_clauses;
+  bitmap lastprivate_uids = NULL;
+  if (omp_find_clause (c, OMP_CLAUSE_ALLOCATE))
+   {
+ c = omp_find_clause (c, OMP_CLAUSE_LASTPRIVATE);
+ if (c)
+   {
+ lastprivate_uids = BITMAP_ALLOC (NULL);
+ for (; c; c = omp_find_clause (OMP_CLAUSE_CHAIN (c),
+OMP_CLAUSE_LASTPRIVATE))
+   bitmap_set_bit (lastprivate_uids,
+   DECL_UID (OMP_CLAUSE_DECL (c)));
+   }
+ c = *gfor_clauses_ptr;
+   }
   for (; c; c = OMP_CLAUSE_CHAIN (c))
switch (OMP_CLAUSE_CODE (c))
  {
@@ -12207,12 +12228,35 @@ gimplify_omp_for (tree *expr_p, gimple_s
gtask_clauses_ptr
  = &OMP_CLAUSE_CHAIN (*gtask_clauses_ptr);
break;
- /* Allocate clause we duplicate on task and inner taskloop.  */
+ /* Allocate clause we duplicate on task and inner taskloop
+if the decl is lastprivate, otherwise just put on task.  */
  case OMP_CLAUSE_ALLOCATE:
-   *gfor_clauses_ptr = c;
-   gfor_clauses_ptr = &OMP_CLAUSE_CHAIN (c);
-   *gtask_clauses_ptr = copy_node (c);
-   gtask_clauses_ptr = &OMP_CLAUSE_CHAIN (*gtask_clauses_ptr);
+   if (lastprivate_uids
+   && bitmap_bit_p (lastprivate_uids,
+                           DECL_UID (OMP_CLAUSE_DECL (c))))
+ {
+   if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)
+   && DECL_P (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)))
+ {
+   /* Additionally, put firstprivate clause on task
+  for the allocator if it is not constant.  */
+   *gtask_clauses_ptr
+ = build_omp_clause (OMP_CLAUSE_LOCATION (c),
+ OMP_CLAUSE_FIRSTPRIVATE);
+   OMP_CLAUSE_DECL (*gtask_clauses_ptr)
+ = OMP_CLAUSE_ALLOCATE_ALLOCATOR (c);
+   gtask_clauses_ptr = &OMP_CLAUSE_CHAIN (*gtask_clauses_ptr);
+ }
+   *gfor_clauses_ptr = c;
+   gfor_clauses_ptr = &OMP_CLAUSE_CHAIN (c);
+   *gtask_clauses_ptr = copy_node (c);
+   gtask_clauses_ptr = &OMP_CLAUSE_CHAIN (*gtask_clauses_ptr);
+ }
+   else
+ {
+   *gtask_clauses_ptr = c;
+   gtask_clauses_ptr = &OMP_CLAUSE_CHAIN (c);
+ }
break;
  default:
gcc_unreachable ();
@@ -12220,6 +12264,7 @@ gimplify_omp_for (tree *expr_p, gimple_s
   *gfor_clauses_ptr = NULL_TREE;
   *gtask_clauses_ptr = NULL_TREE;
   *gforo_clauses_ptr = NULL_TREE;
+  BITMAP_FREE (lastprivate_uids);
   g = gimple_build_bind (NULL_TREE, gfor, NULL_TREE);
   g = gimple_build_omp_task (g, task_clauses, NULL_TREE, NULL_TREE,
 NULL_TREE, NULL_TREE, NULL_TREE);
--- gcc/testsuite/c-c++-common/gomp/allocate-3.c.jj 2020-10-29 20:12:58.675241433 +0100
+++ gcc/testsuite/c-c++-common/gomp/allocate-3.c    2020-10-29 18:47:45.400924987 +0100
@@ -0,0 +1,38 @@
+typedef enum omp_allocator_handle_t
+#if __cplusplus >= 201103L
+: __UINTPTR_TY

[RS6000] Adjust testcases for power10 instructions V3

2020-10-30 Thread Alan Modra via Gcc-patches
And now waking up to what you meant by the lvsl-lvsr.c \s comment,
plus a revised ppc-ne0-1.c scan-assembler.

I think this covers all previous review corrections.  Regression tested
powerpc64-linux power7 and powerpc64le-linux power10.  OK?

* lib/target-supports.exp (check_effective_target_has_arch_pwr10): New.
* gcc.dg/pr56727-2.c,
gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c,
gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c,
gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c,
gcc.target/powerpc/fold-vec-load-vec_xl-char.c,
gcc.target/powerpc/fold-vec-load-vec_xl-double.c,
gcc.target/powerpc/fold-vec-load-vec_xl-float.c,
gcc.target/powerpc/fold-vec-load-vec_xl-int.c,
gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c,
gcc.target/powerpc/fold-vec-load-vec_xl-short.c,
gcc.target/powerpc/fold-vec-splat-floatdouble.c,
gcc.target/powerpc/fold-vec-splat-longlong.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c,
gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c,
gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c,
gcc.target/powerpc/fold-vec-store-vec_xst-char.c,
gcc.target/powerpc/fold-vec-store-vec_xst-double.c,
gcc.target/powerpc/fold-vec-store-vec_xst-float.c,
gcc.target/powerpc/fold-vec-store-vec_xst-int.c,
gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c,
gcc.target/powerpc/fold-vec-store-vec_xst-short.c,
gcc.target/powerpc/lvsl-lvsr.c,
gcc.target/powerpc/ppc-eq0-1.c,
gcc.target/powerpc/ppc-ne0-1.c,
gcc.target/powerpc/pr86731-fwrapv-longlong.c: Match power10 insns.
* gcc.target/powerpc/lvsl-lvsr.c: Avoid file name match.

diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c
index c54369ed25e..f055116772a 100644
--- a/gcc/testsuite/gcc.dg/pr56727-2.c
+++ b/gcc/testsuite/gcc.dg/pr56727-2.c
@@ -18,4 +18,4 @@ void h ()
 
 /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } } } */
 /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* && ilp32 } } } } */
-/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { powerpc*-*-linux* && lp64 } } } } */
+/* { dg-final { scan-assembler {bl f(\n\s*nop|@notoc\n)} { target { powerpc*-*-linux* && lp64 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
index 246f38fa6d1..1cff4550f28 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c
@@ -25,6 +25,6 @@ main1 (void)
with no word loads (lw, lwu, lwz, lwzu, or their indexed forms)
or word stores (stw, stwu, stwx, stwux, or their indexed forms).  */
 
-/* { dg-final { scan-assembler "\t(lvx|lxv|lvsr|stxv)" } } */
+/* { dg-final { scan-assembler "\t(lvx|p?lxv|lvsr|p?stxv)" } } */
 /* { dg-final { scan-assembler-not "\tlwz?u?x? " { xfail { powerpc-ibm-aix* } } } } */
 /* { dg-final { scan-assembler-not "\tstwu?x? " } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
index 9b199c219bf..104710700c8 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
@@ -36,4 +36,4 @@ BUILD_VAR_TEST( test10, vector unsigned char, signed long long, vector unsigned
 BUILD_VAR_TEST( test11, vector unsigned char, signed int, vector unsigned char);
 BUILD_CST_TEST( test12, vector unsigned char, 8, vector unsigned char);
 
-/* { dg-final { scan-assembler-times {\m

Re: [PATCH] calls.c:can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check

2020-10-30 Thread Richard Sandiford via Gcc-patches
Alan Modra via Gcc-patches  writes:
> This moves an #ifdef block of code from calls.c to
> targetm.function_ok_for_sibcall.  Only two targets, x86 and rs6000,
> define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
> that might vary depending on the called function.  Macros like
> UNITS_PER_WORD don't change over a function boundary, nor does the
> MIPS ABI, nor does TARGET_64BIT on PA-RISC.  Other targets are even
> more trivially seen to not need the calls.c code.
>
> Besides cleaning up a small piece of #ifdef code, the motivation for
> this patch is to allow tail calls on PowerPC for functions that
> require less reg_parm_stack_space than their caller.  The original
> code in calls.c only permitted tail calls when exactly equal.

Is there something PowerPC-specific that makes the relaxation safe
for that target while not being safe on x86?

I take your point about x86 and PowerPC being the only two affected
targets.  But the interface does still take an fndecl on all targets,
so I think the target-independent assumption should be that the value
might vary depending on function.  So I guess an alternative would be
to relax the target-independent condition and make the x86 hook enforce
the stricter condition (if it really is needed).

Thanks,
Richard


[committed] aarch64: Fix PR96998 and restore code quality in combine

2020-10-30 Thread Alex Coplan via Gcc-patches
This patch fixes a bug in the AArch64 backend. Currently, we accept an
odd sign_extract representation of addresses, but don't accept that same
odd form of address as an LEA.

This is the cause of PR96998. In the testcase given in the PR, combine
produces:

(insn 9 8 10 3 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI (subreg:DI 
(reg/v:SI 92 [ g ]) 0)
(const_int 4 [0x4]))
(const_int 34 [0x22])
(const_int 0 [0]))
(reg/f:DI 96)) [3 *i_5+0 S4 A32])
(asm_operands:SI ("") ("=Q") 0 []
 []
 [] test.c:11)) "test.c":11:5 -1
 (expr_list:REG_DEAD (reg/v:SI 92 [ g ])
(nil)))

Then LRA reloads the address and we ICE because we fail to recognize the
sign_extract outside the mem:

(insn 33 8 34 3 (set (reg:DI 100)
(sign_extract:DI (ashift:DI (subreg:DI (reg/v:SI 92 [ g ]) 0)
(const_int 2 [0x2]))
(const_int 34 [0x22])
(const_int 0 [0]))) "test.c":11:5 -1
 (nil))

The aarch64 changes here remove the support for this sign_extract
representation of addresses, fixing PR96998. Now this by itself would
regress code quality, so this change is paired with an improvement to
combine which prevents an extract rtx from being emitted in this case:
we now write the rtx above as a shift of an extend, which allows the
combination to go ahead.

Prior to this, combine.c:make_extraction() identified where we can emit
an ashift of an extend in place of an extraction, but failed to make the
corresponding canonicalization/simplification when presented with a mult
by a power of two. Such a representation is canonical when representing
a left-shifted address inside a mem.

This change remedies this situation. For rtxes such as:

(mult:DI (subreg:DI (reg:SI r) 0) (const_int 2^n))

where the bottom 32 + n bits are valid (the higher-order bits are
undefined) and make_extraction() is being asked to sign_extract the
lower (valid) bits, after the patch, we rewrite this as:

(mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))

instead of using a sign_extract.
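
The equivalence of the two forms can be illustrated with ordinary 64-bit integer arithmetic. This is a minimal sketch in plain C, not RTL and not combine.c code: sign_extract_low, extract_form and extend_form are illustrative names, and a uint64_t whose upper 32 bits hold arbitrary garbage stands in for the (subreg:DI (reg:SI r) 0). The sketch assumes 0 <= n <= 31 so that the extracted field fits in 64 bits.

```c
#include <stdint.h>

/* Sign-extend the low LEN bits of V, for 0 < LEN < 64.  */
int64_t sign_extract_low (uint64_t v, unsigned len)
{
  uint64_t mask = (UINT64_C (1) << len) - 1;
  uint64_t field = v & mask;
  if (field & (UINT64_C (1) << (len - 1)))
    return (int64_t) field - (int64_t) mask - 1;  /* field - 2^LEN */
  return (int64_t) field;
}

/* Extract form: multiply the wide register by 2^n, then sign-extract
   the low 32 + n bits of the product.  */
int64_t extract_form (uint64_t wide, unsigned n)
{
  return sign_extract_low (wide << n, 32 + n);
}

/* Extend form: sign-extend the low 32 bits first, then multiply the
   result by 2^n.  */
int64_t extend_form (uint64_t wide, unsigned n)
{
  int64_t r = sign_extract_low (wide, 32);
  return r * ((int64_t) 1 << n);
}
```

With n = 2 the extract width is 34, which corresponds to the (const_int 34) and (const_int 4) operands in the RTL shown earlier in this message.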

gcc/ChangeLog:

PR target/96998
* combine.c (make_extraction): Also handle shifts written as
(mult x 2^n), avoid creating an extract rtx for these.
* config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Delete.
(aarch64_classify_index): Remove extract-based address handling.
(aarch64_strip_extend): Likewise.
(aarch64_rtx_arith_op_extract_p): Likewise, remove now-unused parameter.
Update callers...
(aarch64_rtx_costs): ... here.

gcc/testsuite/ChangeLog:

PR target/96998
* gcc.c-torture/compile/pr96998.c: New test.

---

aarch64 change approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556267.html

combine change approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557387.html

Committing as one change to avoid regressing the trunk.

Testing:
 * Bootstrapped and regtested on aarch64-linux-gnu, arm-linux-gnueabihf,
   and x86_64-linux-gnu: no regressions.

Pushed to master.

Thanks,
Alex
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..ed1ad45de83 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
   if (new_rtx != 0)
return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
 }
+  else if (GET_CODE (inner) == MULT
+  && CONST_INT_P (XEXP (inner, 1))
+  && pos_rtx == 0 && pos == 0)
+{
+  /* We're extracting the least significant bits of an rtx
+(mult X (const_int 2^C)), where LEN > C.  Extract the
+least significant (LEN - C) bits of X, giving an rtx
+whose mode is MODE, then multiply it by 2^C.  */
+  const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+  if (IN_RANGE (shift_amt, 1, len - 1))
+   {
+ new_rtx = make_extraction (mode, XEXP (inner, 0),
+0, 0, len - shift_amt,
+unsignedp, in_dest, in_compare);
+ if (new_rtx)
+   return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+   }
+}
   else if (GET_CODE (inner) == TRUNCATE
   /* If trying or potentionally trying to extract
  bits outside of is_mode, don't look through
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 35d6f2e2f01..db991e59cbe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2886,33 +2886,6 @@ aarch64_is_noplt_call_p (rtx sym)
   return false;
 }
 
-/* Return true if the offsets to a zero/sign-extract operation
-   represent an expression that matches an extend operation.  The
-   operands represent the parameters from
-
-   (extract:MODE (mult (reg) (MULT_IMM)) (EXTRACT_IMM) (const_int 0)).  */
-bool
-aarch64_is_extend_from_extract (scalar_int_mode mode, 

Re: [PATCH] SLP vectorize across PHI nodes

2020-10-30 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 28 Oct 2020, Christophe Lyon wrote:
>
>> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon wrote:
>> >
>> > On Tue, 27 Oct 2020 at 13:18, Richard Biener  wrote:
>> > >
>> > > This makes SLP discovery detect backedges by seeding the bst_map with
>> > > the node to be analyzed so it can be picked up from recursive calls.
>> > > This removes the need to discover backedges in a separate walk.
>> > >
>> > > This enables SLP build to handle PHI nodes in full, continuing
>> > > the SLP build to non-backedges.  For loop vectorization this
>> > > enables outer loop vectorization of nested SLP cycles and for
>> > > BB vectorization this enables vectorization of PHIs at CFG merges.
>> > >
>> > > It also turns code generation into a SCC discovery walk to handle
>> > > irreducible regions and nodes only reachable via backedges where
>> > > we now also fill in vectorized backedge defs.
>> > >
>> > > This requires sanitizing the SLP tree for SLP reduction chains even
>> > > more, manually filling the backedge SLP def.
>> > >
>> > > This also exposes the fact that CFG copying (and edge splitting
>> > > until I fixed that) ends up with different edge order in the
>> > > copy which doesn't play well with the desired 1:1 mapping of
>> > > SLP PHI node children and edges for epilogue vectorization.
>> > > I've tried to fixup CFG copying here but this really looks
>> > > like a dead (or expensive) end there so I've done fixup in
>> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
>> > > we can run into.
>> > >
>> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
>> > > not sure it's possible to eliminate them all this stage1 so the
>> > > patch has quite some checks for this case all over the place.
>> > >
>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
>> > > and SPEC CPU 2006 successfully built and tested.
>> > >
>> > > Will push soon.
>> > >
>> > > Richard.
>> > >
>> > > 2020-10-27  Richard Biener  
>> > >
>> > > * gimple.h (gimple_expr_type): For PHIs return the type
>> > > of the result.
>> > > * tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg):
>> > > Make sure edge order into copied loop headers line up with the
>> > > originals.
>> > > * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
>> > > loops with SLP.
>> > > (vectorizable_phi): New function.
>> > > (vectorizable_live_operation): For BB vectorization compute
>> > > insert location here.
>> > > * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
>> > > SLP_TREE_CHILDREN entries.
>> > > (vect_create_new_slp_node): Add overloads with pre-existing node
>> > > argument.
>> > > (vect_print_slp_graph): Likewise.
>> > > (vect_mark_slp_stmts): Likewise.
>> > > (vect_mark_slp_stmts_relevant): Likewise.
>> > > (vect_gather_slp_loads): Likewise.
>> > > (vect_optimize_slp): Likewise.
>> > > (vect_slp_analyze_node_operations): Likewise.
>> > > (vect_bb_slp_scalar_cost): Likewise.
>> > > (vect_remove_slp_scalar_calls): Likewise.
>> > > (vect_get_and_check_slp_defs): Handle PHIs.
>> > > (vect_build_slp_tree_1): Handle PHIs.
>> > > (vect_build_slp_tree_2): Continue SLP build, following PHI
>> > > arguments.  Fix memory leak.
>> > > (vect_build_slp_tree): Put stub node into the hash-map so
>> > > we can discover cycles directly.
>> > > (vect_build_slp_instance): Set the backedge SLP def for
>> > > reduction chains.
>> > > (vect_analyze_slp_backedges): Remove.
>> > > (vect_analyze_slp): Do not call it.
>> > > (vect_slp_convert_to_external): Release 
>> > > SLP_TREE_LOAD_PERMUTATION.
>> > > (vect_slp_analyze_node_operations): Handle stray failed
>> > > backedge defs by failing.
>> > > (vect_slp_build_vertices): Adjust leaf condition.
>> > > (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
>> > > hash-set to handle cycles.
>> > > (vect_slp_analyze_operations): Adjust.
>> > > (vect_bb_partition_graph_r): Likewise.
>> > > (vect_slp_function): Adjust split condition to allow CFG
>> > > merges.
>> > > (vect_schedule_slp_instance): Rename to ...
>> > > (vect_schedule_slp_node): ... this.  Move DFS walk to ...
>> > > (vect_schedule_scc): ... this new function.
>> > > (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
>> > > backedge fill code.
>> > > * tree-vect-stmts.c (vect_analyze_stmt): Call
>> > > vectorizable_phi.
>> > > (vect_transform_stmt): Likewise.
>> > > (vect_is_simple_use): Handle vect_backedge_def.
>> > > * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
>> > > set loop header PHIs to vect_unknown_def_type f

Re: [PATCH v7] genemit.c (main): split insn-emit.c for compiling parallelly

2020-10-30 Thread Jojo R


Jojo
On Oct 27, 2020 at 10:14 PM +0800, Richard Sandiford wrote:
> Jojo R  writes:
> > gcc/ChangeLog:
> >
> > * genemit.c (main): Print 'split line'.
> > * Makefile.in (insn-emit.c): Define split count and file
> >
> > ---
> > gcc/Makefile.in | 19 +
> > gcc/genemit.c | 104 +---
> > 2 files changed, 83 insertions(+), 40 deletions(-)
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 79e854aa938..a7fcc7d5949 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1258,6 +1258,21 @@ ANALYZER_OBJS = \
> > # We put the *-match.o and insn-*.o files first so that a parallel make
> > # will build them sooner, because they are large and otherwise tend to be
> > # the last objects to finish building.
> > +
> > +# target overrides
> > +-include $(tmake_file)
> > +
> > +INSN-GENERATED-SPLIT-NUM ?= 0
> > +
> > +insn-generated-split-num = $(shell i=1; j=`expr 
> > $(INSN-GENERATED-SPLIT-NUM) + 1`; \
> > + while test $$i -le $$j; do \
> > + echo $$i; i=`expr $$i + 1`; \
> > + done)
> > +
> > +insn-emit-split-c := $(foreach o, $(shell for i in 
> > $(insn-generated-split-num); do echo $$i; done), insn-emit$(o).c)
> > +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
> > +$(insn-emit-split-c): insn-emit.c
>
> Sorry for the slow reply. I stand by what I said in
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552863.html:
>
> I think we should use the same wordlist technique as check_p_numbers[0-6].
> So I guess the first step would be to rename check_p_numbers[0-6] to
> something more general and use it both here and in check_p_numbers.
>
> I think that would be better than having two different ways of
> generating lists of numbers, one directly in make and one calling
> out to the shell. But I didn't want to reassert that comment in
> case anyone was prepared to approve the patch in its current form.
>

Ok & Thanks.

It’s fixed in patch v8.
> BTW, do you have a copyright assignment on file?

I emailed the patch without a copyright assignment, and I think that is the
same as for other GCC community patches.
>
> Thanks,
> Richard
>
> > +
> > OBJS = \
> > gimple-match.o \
> > generic-match.o \
> > @@ -1265,6 +1280,7 @@ OBJS = \
> > insn-automata.o \
> > insn-dfatab.o \
> > insn-emit.o \
> > + $(insn-emit-split-obj) \
> > insn-extract.o \
> > insn-latencytab.o \
> > insn-modes.o \
> > @@ -2365,6 +2381,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
> > build/gen%$(build_exeext)
> > $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
> > $(filter insn-conditions.md,$^) > tmp-$*.c
> > $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
> > + $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s 
> > {$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\
> > + [ ! "$$$*v" ] || grep "match not found" <<< $$$*v
> > + [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch 
> > insn-$*$$i.c; done && echo "" > insn-$*.c)
> > $(STAMP) s-$*
> >
> > # gencheck doesn't read the machine description, and the file produced
> > diff --git a/gcc/genemit.c b/gcc/genemit.c
> > index 84d07d388ee..54a0d909d9d 100644
> > --- a/gcc/genemit.c
> > +++ b/gcc/genemit.c
> > @@ -847,24 +847,13 @@ handle_overloaded_gen (overloaded_name *oname)
> > }
> > }
> >
> > -int
> > -main (int argc, const char **argv)
> > -{
> > - progname = "genemit";
> > -
> > - if (!init_rtx_reader_args (argc, argv))
> > - return (FATAL_EXIT_CODE);
> > -
> > -#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> > - nofail_optabs[OPTAB##_optab] = true;
> > -#include "internal-fn.def"
> > -
> > - /* Assign sequential codes to all entries in the machine description
> > - in parallel with the tables in insn-output.c. */
> > -
> > - printf ("/* Generated automatically by the program `genemit'\n\
> > -from the machine description file `md'. */\n\n");
> > +/* Print include header. */
> >
> > +static void
> > +printf_include (void)
> > +{
> > + printf ("/* Generated automatically by the program `genemit'\n"
> > + "from the machine description file `md'. */\n\n");
> > printf ("#define IN_TARGET_CODE 1\n");
> > printf ("#include \"config.h\"\n");
> > printf ("#include \"system.h\"\n");
> > @@ -900,35 +889,70 @@ from the machine description file `md'. */\n\n");
> > printf ("#include \"tm-constrs.h\"\n");
> > printf ("#include \"ggc.h\"\n");
> > printf ("#include \"target.h\"\n\n");
> > +}
> >
> > - /* Read the machine description. */
> > +/* Generate the `gen_...' function from GET_CODE(). */
> >
> > - md_rtx_info info;
> > - while (read_md_rtx (&info))
> > - switch (GET_CODE (info.def))
> > - {
> > - case DEFINE_INSN:
> > - gen_insn (&info);
> > - break;
> > +static void
> > +gen_md_rtx (md_rtx_info *info)
> > +{
> > + switch (GET_CODE (info->def))
> > + {
> > + case DEFINE_INSN:
> > + gen_insn (info);
> > + break;
> >
> > - case DEFINE_EXPAND:
> > - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> > - gen_expand (&info);
> > - break;
> > + case DEFINE_EXPAND:
> > + printf
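
The csplit step in the quoted Makefile rule cuts the generated insn-emit.c
before each line containing the "parallel compilation" marker. A rough Python
sketch of that idea follows. The function name split_at_marker and the sample
text are hypothetical, and this only approximates csplit's behaviour; it is
not what the patch itself runs.

```python
def split_at_marker(text, marker, max_splits):
    """Split text into pieces, starting a new piece before each line that
    contains marker, for at most max_splits marker lines."""
    pieces = [[]]
    splits_done = 0
    for line in text.splitlines(keepends=True):
        if marker in line and splits_done < max_splits:
            # Start a new piece; the marker line itself opens that piece.
            pieces.append([])
            splits_done += 1
        pieces[-1].append(line)
    return ["".join(piece) for piece in pieces]


# Hypothetical stand-in for generated output with two marker lines.
sample = (
    "header\n"
    "/* parallel compilation */\n"
    "gen_a\n"
    "/* parallel compilation */\n"
    "gen_b\n"
)
parts = split_at_marker(sample, "parallel compilation", max_splits=2)
for index, part in enumerate(parts):
    print(f"insn-emit{index}.c: {len(part.splitlines())} line(s)")
```

Joining the pieces back together reproduces the input, so nothing is lost by
the split; only the file boundaries change.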

[PATCH v8] genemit.c (main): split insn-emit.c for compiling parallelly

2020-10-30 Thread Jojo R
gcc/ChangeLog:

* genemit.c (main): Print 'split line'.
* Makefile.in (insn-emit.c): Define split count and file

---
 gcc/Makefile.in |  33 +++
 gcc/genemit.c   | 104 +---
 2 files changed, 89 insertions(+), 48 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7fc03c8d946..974b65c560d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1154,6 +1154,15 @@ export STRIP_FOR_TARGET
 export RANLIB_FOR_TARGET
 export libsubdir
 
+check_p_numbers0:=1 2 3 4 5 6 7 8 9
+check_p_numbers1:=0 $(check_p_numbers0)
+check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix 
$(i),$(check_p_numbers1)))
+check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
+check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix 
$(i),$(check_p_numbers3)))
+check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
+check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix 
$(i),$(check_p_numbers5)))
+check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) 
$(check_p_numbers6)
+
 FLAGS_TO_PASS = \
"ADA_CFLAGS=$(ADA_CFLAGS)" \
"BISON=$(BISON)" \
@@ -1259,6 +1268,18 @@ ANALYZER_OBJS = \
 # We put the *-match.o and insn-*.o files first so that a parallel make
 # will build them sooner, because they are large and otherwise tend to be
 # the last objects to finish building.
+
+# target overrides
+-include $(tmake_file)
+
+INSN-GENERATED-SPLIT-NUM ?= 0
+
+insn-generated-split-num = $(wordlist 1,$(shell expr 
$(INSN-GENERATED-SPLIT-NUM) + 1),$(check_p_numbers))
+
+insn-emit-split-c := $(foreach o, $(insn-generated-split-num), insn-emit$(o).c)
+insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
+$(insn-emit-split-c): insn-emit.c
+
 OBJS = \
gimple-match.o \
generic-match.o \
@@ -1266,6 +1287,7 @@ OBJS = \
insn-automata.o \
insn-dfatab.o \
insn-emit.o \
+   $(insn-emit-split-obj) \
insn-extract.o \
insn-latencytab.o \
insn-modes.o \
@@ -2375,6 +2397,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
build/gen%$(build_exeext)
$(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
  $(filter insn-conditions.md,$^) > tmp-$*.c
$(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
+   $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s 
{$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\
+   [ ! "$$$*v" ] || grep "match not found" <<< $$$*v
+   [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch 
insn-$*$$i.c; done && echo "" > insn-$*.c)
$(STAMP) s-$*
 
 # gencheck doesn't read the machine description, and the file produced
@@ -4094,14 +4119,6 @@ $(patsubst %,%-subtargets,$(lang_checks)): 
check-%-subtargets:
 check_p_tool=$(firstword $(subst _, ,$*))
 check_p_count=$(check_$(check_p_tool)_parallelize)
 check_p_subno=$(word 2,$(subst _, ,$*))
-check_p_numbers0:=1 2 3 4 5 6 7 8 9
-check_p_numbers1:=0 $(check_p_numbers0)
-check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix 
$(i),$(check_p_numbers1)))
-check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
-check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix 
$(i),$(check_p_numbers3)))
-check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
-check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix 
$(i),$(check_p_numbers5)))
-check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) 
$(check_p_numbers6)
 check_p_subdir=$(subst _,,$*)
 check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 1, \
$(if 
$(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128), \
diff --git a/gcc/genemit.c b/gcc/genemit.c
index 84d07d388ee..54a0d909d9d 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -847,24 +847,13 @@ handle_overloaded_gen (overloaded_name *oname)
 }
 }
 
-int
-main (int argc, const char **argv)
-{
-  progname = "genemit";
-
-  if (!init_rtx_reader_args (argc, argv))
-return (FATAL_EXIT_CODE);
-
-#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
-  nofail_optabs[OPTAB##_optab] = true;
-#include "internal-fn.def"
-
-  /* Assign sequential codes to all entries in the machine description
- in parallel with the tables in insn-output.c.  */
-
-  printf ("/* Generated automatically by the program `genemit'\n\
-from the machine description file `md'.  */\n\n");
+/* Print include header.  */
 
+static void
+printf_include (void)
+{
+  printf ("/* Generated automatically by the program `genemit'\n"
+ "from the machine description file `md'.  */\n\n");
   printf ("#define IN_TARGET_CODE 1\n");
   printf ("#include \"config.h\"\n");
   printf ("#include \"system.h\"\n");
@@ -900,35 +889,70 @@ from the machine description file `md'.  */\n\n");
   printf ("#include \"tm-constrs.h\"\n");
   printf ("#include \"ggc.h\"\n");
   printf ("#include \"target.h\"\n\n");
+}
 
-  /
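
The check_p_numbers variables moved by this patch build the list 1..9999
purely with make text functions, and $(wordlist ...) then takes the first
INSN-GENERATED-SPLIT-NUM + 1 entries. The Python sketch below mirrors that
construction step by step so the result is easier to see. The function names
are hypothetical, and wordlist() is only a rough analogue of the GNU make
function.

```python
def check_p_numbers():
    """Mirror the check_p_numbers0..6 make variables as lists of strings."""
    numbers0 = [str(d) for d in range(1, 10)]               # 1..9
    numbers1 = ["0"] + numbers0                             # 0..9
    numbers2 = [i + j for i in numbers0 for j in numbers1]  # 10..99
    numbers3 = ["0" + j for j in numbers1] + numbers2       # 00..99
    numbers4 = [i + j for i in numbers0 for j in numbers3]  # 100..999
    numbers5 = ["0" + j for j in numbers3] + numbers4       # 000..999
    numbers6 = [i + j for i in numbers0 for j in numbers5]  # 1000..9999
    return numbers0 + numbers2 + numbers4 + numbers6


def wordlist(start, end, words):
    """Rough analogue of $(wordlist s,e,text): 1-based and inclusive."""
    return words[start - 1:end]


def split_file_names(split_num):
    """File names the Makefile would derive for a given split count."""
    numbers = wordlist(1, split_num + 1, check_p_numbers())
    return [f"insn-emit{n}.c" for n in numbers]


print(split_file_names(2))
```

With the default INSN-GENERATED-SPLIT-NUM of 0 the list has a single entry,
which matches the "+ 1" in the Makefile expression.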

Re: [PATCH] SLP vectorize across PHI nodes

2020-10-30 Thread Richard Biener
On Fri, 30 Oct 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 28 Oct 2020, Christophe Lyon wrote:
> >
> >> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon
> >>  wrote:
> >> >
> >> > On Tue, 27 Oct 2020 at 13:18, Richard Biener  wrote:
> >> > >
> >> > > This makes SLP discovery detect backedges by seeding the bst_map with
> >> > > the node to be analyzed so it can be picked up from recursive calls.
> >> > > This removes the need to discover backedges in a separate walk.
> >> > >
> >> > > This enables SLP build to handle PHI nodes in full, continuing
> >> > > the SLP build to non-backedges.  For loop vectorization this
> >> > > enables outer loop vectorization of nested SLP cycles and for
> >> > > BB vectorization this enables vectorization of PHIs at CFG merges.
> >> > >
> >> > > It also turns code generation into a SCC discovery walk to handle
> >> > > irreducible regions and nodes only reachable via backedges where
> >> > > we now also fill in vectorized backedge defs.
> >> > >
> >> > > This requires sanitizing the SLP tree for SLP reduction chains even
> >> > > more, manually filling the backedge SLP def.
> >> > >
> >> > > This also exposes the fact that CFG copying (and edge splitting
> >> > > until I fixed that) ends up with different edge order in the
> >> > > copy which doesn't play well with the desired 1:1 mapping of
> >> > > SLP PHI node children and edges for epilogue vectorization.
> >> > > I've tried to fixup CFG copying here but this really looks
> >> > > like a dead (or expensive) end there so I've done fixup in
> >> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> >> > > we can run into.
> >> > >
> >> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> >> > > not sure it's possible to eliminate them all this stage1 so the
> >> > > patch has quite some checks for this case all over the place.
> >> > >
> >> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> >> > > and SPEC CPU 2006 successfully built and tested.
> >> > >
> >> > > Will push soon.
> >> > >
> >> > > Richard.
> >> > >
> >> > > 2020-10-27  Richard Biener  
> >> > >
> >> > > * gimple.h (gimple_expr_type): For PHIs return the type
> >> > > of the result.
> >> > > * tree-vect-loop-manip.c 
> >> > > (slpeel_tree_duplicate_loop_to_edge_cfg):
> >> > > Make sure edge order into copied loop headers line up with the
> >> > > originals.
> >> > > * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> >> > > loops with SLP.
> >> > > (vectorizable_phi): New function.
> >> > > (vectorizable_live_operation): For BB vectorization compute 
> >> > > insert
> >> > > location here.
> >> > > * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> >> > > SLP_TREE_CHILDREN entries.
> >> > > (vect_create_new_slp_node): Add overloads with pre-existing 
> >> > > node
> >> > > argument.
> >> > > (vect_print_slp_graph): Likewise.
> >> > > (vect_mark_slp_stmts): Likewise.
> >> > > (vect_mark_slp_stmts_relevant): Likewise.
> >> > > (vect_gather_slp_loads): Likewise.
> >> > > (vect_optimize_slp): Likewise.
> >> > > (vect_slp_analyze_node_operations): Likewise.
> >> > > (vect_bb_slp_scalar_cost): Likewise.
> >> > > (vect_remove_slp_scalar_calls): Likewise.
> >> > > (vect_get_and_check_slp_defs): Handle PHIs.
> >> > > (vect_build_slp_tree_1): Handle PHIs.
> >> > > (vect_build_slp_tree_2): Continue SLP build, following PHI
> >> > > arguments.  Fix memory leak.
> >> > > (vect_build_slp_tree): Put stub node into the hash-map so
> >> > > we can discover cycles directly.
> >> > > (vect_build_slp_instance): Set the backedge SLP def for
> >> > > reduction chains.
> >> > > (vect_analyze_slp_backedges): Remove.
> >> > > (vect_analyze_slp): Do not call it.
> >> > > (vect_slp_convert_to_external): Release 
> >> > > SLP_TREE_LOAD_PERMUTATION.
> >> > > (vect_slp_analyze_node_operations): Handle stray failed
> >> > > backedge defs by failing.
> >> > > (vect_slp_build_vertices): Adjust leaf condition.
> >> > > (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> >> > > hash-set to handle cycles.
> >> > > (vect_slp_analyze_operations): Adjust.
> >> > > (vect_bb_partition_graph_r): Likewise.
> >> > > (vect_slp_function): Adjust split condition to allow CFG
> >> > > merges.
> >> > > (vect_schedule_slp_instance): Rename to ...
> >> > > (vect_schedule_slp_node): ... this.  Move DFS walk to ...
> >> > > (vect_schedule_scc): ... this new function.
> >> > > (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
> >> > > backedge fill code.
> >> > > * tree-vect-stmts.c (vect_analyze_stmt): Call
> >> > > vectoriza

[PATCH] tree-optimization/97633 - fix SLP scheduling of single-node cycles

2020-10-30 Thread Richard Biener


This makes sure to update backedges in single-node cycles.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-10-30  Richard Biener  

PR tree-optimization/97633
* tree-vect-slp.c (): Update backedges in single-node cycles.
Optimize processing of externals.

* g++.dg/vect/slp-pr97636.cc: New testcase.
* gcc.dg/vect/bb-slp-pr97633.c: Likewise.
---
 gcc/testsuite/g++.dg/vect/slp-pr97636.cc   |  83 +++
 gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c |  27 
 gcc/tree-vect-slp.c| 162 +++--
 3 files changed, 198 insertions(+), 74 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/slp-pr97636.cc
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr97636.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr97636.cc
new file mode 100644
index 000..012342004f1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/slp-pr97636.cc
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target c++17 } */
+
+struct u {
+  int b;
+  int c;
+  template  u(d, e);
+};
+template  struct f { u g; };
+template  class v {
+  typedef f k;
+  k *l[4];
+  k m;
+public:
+  v(h, h);
+  void aa(h, i);
+};
+template  void v::aa(h, i) { n(&l[1], &m); }
+template  void n(f **o, f *ab) {
+  bool p, r;
+  f q = **o;
+  f *t;
+  h a = q.g;
+  h b = t->g;
+  if (r)
+;
+  else
+goto ac;
+s:
+  p = a.b || a.c < b.c;
+  if (p)
+goto s;
+ac:
+  ab->g = b;
+  b = t->g;
+  goto s;
+}
+template  class w {};
+template  class x;
+template  class z;
+class ad {
+public:
+  template 
+  static void ah(const z &, const z &, x *&);
+};
+template 
+void ad::ah(const z &ai, const z &aj, x *&) {
+  u c(0, 0), d(0, 0), g(aj, ai);
+  v e(c, d);
+  e.aa(g, 0);
+}
+template  class ak;
+template 
+void ao(ak ap, ak aq) {
+  x *f;
+  ad::ah(*ap.ar, *aq.ar, f);
+}
+template 
+void au(w ap, w aq) {
+  ao(static_cast(ap), static_cast(aq));
+}
+template  class z {};
+template  class ak : public w> {
+public:
+  z *ar;
+};
+template  class av;
+template 
+void ay(av, av) {
+  aw h, i;
+  au(h, i);
+}
+template  class av {};
+class az {
+public:
+  typedef av> ba;
+};
+int main() {
+  az::ba j, k;
+  ay(j, k);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c
new file mode 100644
index 000..ab0ae1de9c9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+
+extern short i (void);
+
+struct {
+  short a;
+  short b;
+} c;
+
+int d, e;
+static int f = 1;
+
+void g () {
+  if (e) {
+if (f)
+  goto L;
+while (d) {
+  i ();
+  short j = d, k = i (), l = k;
+L:
+  if (!(d && e) || l)
+goto L;
+  c.a = j;
+  c.b = k;
+}
+  }
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 5d69a98c2a9..714e50697bd 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5554,97 +5554,114 @@ vect_schedule_scc (vec_info *vinfo, slp_tree node, 
slp_instance instance,
   gcc_assert (!existed_p);
   info->dfs = maxdfs;
   info->lowlink = maxdfs;
-  info->on_stack = true;
   maxdfs++;
+
+  /* Leaf.  */
+  if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+{
+  info->on_stack = false;
+  vect_schedule_slp_node (vinfo, node, instance);
+  return;
+}
+
+  info->on_stack = true;
   stack.safe_push (node);
+
   unsigned i;
   slp_tree child;
-
-  /* ???  We're keeping SLP_TREE_CHILDREN of externalized nodes.  */
-  if (SLP_TREE_DEF_TYPE (node) == vect_internal_def)
-FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-  {
-   if (!child)
- continue;
-   slp_scc_info *child_info = scc_info.get (child);
-   if (!child_info)
- {
-   vect_schedule_scc (vinfo, child, instance, scc_info, maxdfs, stack);
-   /* Recursion might have re-allocated the node.  */
-   info = scc_info.get (node);
-   child_info = scc_info.get (child);
-   info->lowlink = MIN (info->lowlink, child_info->lowlink);
- }
-   else if (child_info->on_stack)
- info->lowlink = MIN (info->lowlink, child_info->dfs);
-  }
+  /* DFS recurse.  */
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
+{
+  if (!child)
+   continue;
+  slp_scc_info *child_info = scc_info.get (child);
+  if (!child_info)
+   {
+ vect_schedule_scc (vinfo, child, instance, scc_info, maxdfs, stack);
+ /* Recursion might have re-allocated the node.  */
+ info = scc_info.get (node);
+ child_info = scc_info.get (child);
+ info->lowlink = MIN (info->lowlink, child_info->lowlink);
+   }
+  else if (child_info->on_stack)
+   info->lowlink = MIN (info->lowlink, child_info->dfs);
+}
   if (info->lowlink != info->dfs)
 return;
 
+  auto_vec phis_to_fixup;
+
   /* Singleton.  */
   if (stack.last () == node)
 {
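
The dfs, lowlink and on_stack fields in the hunk above suggest a walk in the
style of Tarjan's SCC algorithm, with leaves handled immediately instead of
being pushed on the stack. The Python sketch below illustrates that shape on
a toy graph. All names (schedule_sccs, is_leaf, the node labels) are
hypothetical, and it omits everything GCC-specific such as the PHI backedge
fixups.

```python
def schedule_sccs(graph, root, is_leaf):
    """Tarjan-style DFS; returns SCCs in the order they are completed."""
    info = {}    # node -> {"dfs", "lowlink", "on_stack"}
    stack = []
    order = []
    counter = 0

    def visit(node):
        nonlocal counter
        info[node] = {"dfs": counter, "lowlink": counter, "on_stack": False}
        counter += 1

        # Leaves cannot be part of a cycle, so emit them right away.
        if is_leaf(node):
            order.append([node])
            return

        info[node]["on_stack"] = True
        stack.append(node)

        for child in graph.get(node, []):
            if child is None:
                continue
            if child not in info:
                visit(child)
                info[node]["lowlink"] = min(info[node]["lowlink"],
                                            info[child]["lowlink"])
            elif info[child]["on_stack"]:
                info[node]["lowlink"] = min(info[node]["lowlink"],
                                            info[child]["dfs"])

        # Only the root of an SCC pops its members off the stack.
        if info[node]["lowlink"] != info[node]["dfs"]:
            return
        scc = []
        while True:
            member = stack.pop()
            info[member]["on_stack"] = False
            scc.append(member)
            if member == node:
                break
        order.append(scc)

    visit(root)
    return order


# Toy graph: "phi" has a self edge (a single-node cycle) and a leaf operand.
toy = {"store": ["phi"], "phi": ["phi", "ext", None]}
print(schedule_sccs(toy, "store", lambda n: n == "ext"))
```

The toy graph includes a node whose only cycle is an edge to itself, which is
the single-node-cycle situation the patch description mentions.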

Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-30 Thread Richard Sandiford via Gcc-patches
Qing Zhao  writes:
> @@ -3996,22 +3996,19 @@ with a named @code{target} must be @code{static}.
>  @cindex @code{zero_call_used_regs} function attribute
>  
>  The @code{zero_call_used_regs} attribute causes the compiler to zero
> -a subset of all call-used registers at function return according to
> -@var{choice}.
> -This is used to increase the program security by either mitigating
> -Return-Oriented Programming (ROP) or preventing information leak
> +a subset of all call-used registers@footnote{A ``call-used'' register
> +is a register whose contents can be changed by a function call;
> +therefore, a caller cannot assume that the register has the same contents
> +on return from the function as it had before calling the function.  Such
> +registers are also called ``call-clobbered'', ``caller-saved'', or
> +``volatile''.} at function return.
> +This is used to increase program security by either mitigating
> +Return-Oriented Programming (ROP) attacks or preventing information leakage
>  through registers.
>  
> -A ``call-used'' register is a register whose contents can be changed by
> -a function call; therefore, a caller cannot assume that the register has
> -the same contents on return from the function as it had before calling
> -the function.  Such registers are also called ``call-clobbered'',
> -``caller-saved'', or ``volatile''.
> -
>  In order to satisfy users with different security needs and control the
> -run-time overhead at the same time, GCC provides a flexible way to choose
> -the subset of the call-used registers to be zeroed.
> -
> +run-time overhead at the same time, @var{choice} parameter provides a

I suggested “the @var{choice} parameter provides” in the review yesterday.
The “the” is needed.

> +flexible way to choose the subset of the call-used registers to be zeroed.
>  The three basic values of @var{choice} are:
>  
>  @itemize @bullet
> @@ -4046,42 +4043,41 @@ together, they must appear in the order above.
>  
>  The full list of @var{choice}s is therefore:
>  
> -@itemize @bullet
> -@item
> -@samp{skip} doesn't zero any call-used register.
> +@table @code
> +@item skip
> +doesn't zero any call-used register.
>  
> -@item
> -@samp{used} only zeros call-used registers that are used in the function.
> +@item used
> +only zeros call-used registers that are used in the function.
>  
> -@item
> -@samp{all} zeros all call-used registers.
> +@item used-gpr
> +only zeros call-used general purpose registers that are used in the function.
>  
> -@item
> -@samp{used-arg} only zeros used call-used registers that pass arguments.
> +@item used-arg
> +only zeros call-used registers that are used in the function and pass 
> arguments.
>  
> -@item
> -@samp{used-gpr} only zeros used call-used general purpose registers.
> +@item used-gpr-arg
> +only zeros call-used general purpose registers that are used in the function
> +and pass arguments.
>  
> -@item
> -@samp{used-gpr-arg} only zeros used call-used general purpose registers that
> -pass arguments.
> +@item all
> +zeros all call-used registers.
>  
> -@item
> -@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
> -arguments.
> +@item all-gpr
> +zeros all call-used general purpose registers.
>  
> -@item
> -@samp{all-arg} zeros all call-used registers that pass arguments.
> +@item all-arg
> +zeros all call-used registers that pass arguments.
>  
> -@item
> -@samp{all-gpr} zeros all call-used general purpose registers.
> -@end itemize
> +@item all-gpr-arg
> +zeros all call-used general purpose registers that pass
> +arguments.
> +@end table

TBH I also think the order I suggested yesterday is more natural
than this one, but either's OK.  The above certainly addresses
the original concern I had about the order being inconsistent.

> @@ -288,7 +288,7 @@ enum sanitize_code {
>  };
>  
>  /* Different settings for zeroing subset of registers.  */
> -namespace  zero_regs_code {
> +namespace zero_regs_flag {

I suggested “zero_regs_flags” rather than “zero_reg_flag” yesterday;
I think “zero_regs_flags” is better because the namespace contains
more than one flag.

> @@ -1776,7 +1776,7 @@ const struct sanitizer_opts_s coverage_sanitizer_opts[] 
> =
>{ NULL, 0U, 0UL, false }
>  };
>  
> -using namespace zero_regs_code;
> +using namespace zero_regs_flag;
>  /* -fzero-call-used-regs= suboptions.  */
>  const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
>  {

Sorry, I didn't notice this yesterday.  The table should use
fully-qualified names: zero_regs_flags::SKIP etc.  We shouldn't
do a using namespace for the whole file here.

OK with those changes, and thanks for doing this.

The new tests are likely to fail on some targets with the sorry()
message, but I think target maintainers are best placed to decide
whether (a) that's a fundamental restriction of the target and the
tests should just be skipped or (b) the target needs to implement
the new hook.

Richard
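
The list of choices in the quoted documentation is one base value (skip,
used, all) optionally followed by the gpr and arg modifiers in that order. A
small Python sketch of that naming scheme is below. The function parse_choice
and the returned dictionary keys are hypothetical; GCC itself uses a table of
flag values rather than parsing the string like this.

```python
VALID_CHOICES = [
    "skip",
    "used", "used-gpr", "used-arg", "used-gpr-arg",
    "all", "all-gpr", "all-arg", "all-gpr-arg",
]


def parse_choice(choice):
    """Decompose a choice name into its base value and modifiers."""
    parts = choice.split("-")
    base, modifiers = parts[0], parts[1:]
    if base not in ("skip", "used", "all"):
        raise ValueError(f"unknown base value in {choice!r}")
    if base == "skip" and modifiers:
        raise ValueError("skip takes no modifiers")
    # When both modifiers appear, gpr must come before arg.
    if modifiers not in ([], ["gpr"], ["arg"], ["gpr", "arg"]):
        raise ValueError(f"bad modifiers in {choice!r}")
    return {
        "base": base,
        "gpr_only": "gpr" in modifiers,
        "arg_only": "arg" in modifiers,
    }


for name in VALID_CHOICES:
    print(name, parse_choice(name))
```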


Re: [PING] [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-30 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Wed, Oct 28, 2020 at 11:34:53AM -0600, Jeff Law wrote:
> 
> On 10/28/20 11:29 AM, Stefan Schulze Frielinghaus wrote:
> > On Wed, Oct 28, 2020 at 08:39:41AM -0600, Jeff Law wrote:
> >> On 10/28/20 3:38 AM, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> >>> On Mon, Oct 05, 2020 at 02:02:57PM +0200, Stefan Schulze Frielinghaus via 
> >>> Gcc-patches wrote:
>  On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
> > On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
> >> Over the last couple of months quite a few warnings about uninitialized
> >> variables were raised while building GCC.  A reason why these warnings
> >> show up on S/390 only is due to the aggressive inlining settings here.
> >> Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
> >> 1657178f59b) could be fixed or in case of a false positive silenced by
> >> initializing the corresponding variable.  Since the latter reoccurs and
> >> while bootstrapping such warnings are turned into errors bootstrapping
> >> fails on S/390 consistently.  Therefore, for the moment do not turn
> >> those warnings into errors.
> >>
> >> config/ChangeLog:
> >>
> >>* warnings.m4: Do not turn maybe-uninitialized warnings into 
> >> errors
> >>on S/390.
> >>
> >> fixincludes/ChangeLog:
> >>
> >>* configure: Regenerate.
> >>
> >> gcc/ChangeLog:
> >>
> >>* configure: Regenerate.
> >>
> >> libcc1/ChangeLog:
> >>
> >>* configure: Regenerate.
> >>
> >> libcpp/ChangeLog:
> >>
> >>* configure: Regenerate.
> >>
> >> libdecnumber/ChangeLog:
> >>
> >>* configure: Regenerate.
> > That change looks good to me. Could a global reviewer please comment!
>  Ping
> >>> Ping
> >> I think this would be a huge mistake to install.
> > The root cause why those false positives show up on S/390 only seems to
> > be more aggressive inlining compared to other architectures.  Because of
> > bigger caches and a rather huge function call overhead we greatly
> > benefit from those inlining parameters. Thus:
> >
> > 1) Reverting those parameters would have a negative performance impact.
> >
> > 2) Fixing the maybe-uninitialized warnings analysis itself seems not to
> >happen in the near future (assuming that it is fixable at all).
> >
> > 3) Silencing the warning by initialising the variable itself also seems
> >to be undesired and feels like a fight against windmills ;-)
> >
> > 4) Not lifting maybe-uninitialized warnings to errors on S/390 only.
> >
> > Option (4) has the least intrusive effect to me.  At least then it is
> > not necessary to bootstrap with --disable-werror and we would still
> > treat all other warnings as errors.  All maybe-uninitialized warnings
> > which are triggered in common code with non-aggressive inlining are
> > still caught by other architectures.  Therefore, I'm wondering why this
> > should be a huge mistake?  What would you propose instead?
> 
> I'm aware of all that.  What I think it all argues is that y'all need to
> address the issues because of how you've changed the tuning on the s390
> port.  Simply disabling things like you've suggested is, IMHO, horribly
> wrong.
> 
> 
> Improving the analysis, dummy initializers, and pragmas all seem viable.  But
> again, it feels like it's something the s390 maintainers will have to
> take the lead on because of how you've retuned the port.

Fixing the analysis is of course the best option.  However, this sounds
like a non-trivial task to me and I'm missing a lot of context here,
i.e., I'm not sure what the initial goals were and if it is possible to
meet those with the requirements which are necessary to solve those
false positives (currently having PR96564 in mind where it was mentioned
that alias info is not enough but also flow-based info is required; does
this imply that we would have to reschedule the analysis at later time
which was not desired in the first place etc.).

In the past I tried to come up with some dummy initializers which were
tough to get accepted (which I can understand up to some degree).  For
example, this one is still open (I would be happy if you could have a
look at it and accept/reject):
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547063.html

Then there is at least one unreported case (similar to PR96564) where we
are not talking about a variable of scalar type but of an aggregate
where only one struct member must be initialized in order to silence the
warning.  Not sure whether a patch would be accepted where I initialize
the whole structure or just a single member.

Thus I'm still willing to come up with dummy initializer patches,
though, I'm not sure whether they are really accepted by the community
or not.

> And note that this isn't just an issue with uninitialized warnings, the
> changes in inlining heuristics can impact all the middle

Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-10-30 Thread Richard Sandiford via Gcc-patches
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Monday, October 26, 2020 9:03 PM
>> To: xiezhiheng 
>> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>>
>> Thanks, pushed to trunk.
>>
>
> Thanks, and I made the patch for float conversion intrinsics.

LGTM, thanks.  Pushed.

Richard


[PATCH][pushed] gcc-changelog: Handle situations like '* tree-vect-slp.c (): '

2020-10-30 Thread Martin Liška

I've just pushed that.

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Handle empty groups in
file description.
* gcc-changelog/test_email.py: New test.
* gcc-changelog/test_patches.txt: Likewise.
---
 contrib/gcc-changelog/git_commit.py|  7 +
 contrib/gcc-changelog/test_email.py|  5 
 contrib/gcc-changelog/test_patches.txt | 41 ++
 3 files changed, 53 insertions(+)

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 5a9cc4c7563..1d0860cddd8 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -459,6 +459,13 @@ class GitCommit:
 msg = 'one space should follow asterisk'
 self.errors.append(Error(msg, line))
 else:
+content = m.group('content')
+parts = content.split(':')
+if len(parts) > 1:
+for needle in ('()', '[]', '<>'):
+if ' ' + needle in parts[0]:
+msg = f'empty group "{needle}" found'
+self.errors.append(Error(msg, line))
 last_entry.lines.append(line)
 else:
 if last_entry.is_empty:
diff --git a/contrib/gcc-changelog/test_email.py 
b/contrib/gcc-changelog/test_email.py
index b6fbe6a5303..98f2ecd258d 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -365,3 +365,8 @@ class TestGccChangelog(unittest.TestCase):
 def test_square_and_lt_gt(self):
 email = self.from_patch_glob('0001-Check-for-more-missing')
 assert not email.errors
+
+def test_empty_parenthesis(self):
+email = self.from_patch_glob('0001-tree-optimization-97633-fix')
+assert len(email.errors) == 1
+assert email.errors[0].message == 'empty group "()" found'
diff --git a/contrib/gcc-changelog/test_patches.txt 
b/contrib/gcc-changelog/test_patches.txt
index 2bf5d1aefaa..148d020f23b 100644
--- a/contrib/gcc-changelog/test_patches.txt
+++ b/contrib/gcc-changelog/test_patches.txt
@@ -3193,5 +3193,46 @@ index fe18288..313f84d 100644
  
 +

 --
+=== 0001-tree-optimization-97633-fix-SLP-scheduling-of-single.patch ===
+From c0bfd9672e19caf08e45afeb4277f848488ced2b Mon Sep 17 00:00:00 2001
+From: Richard Biener 
+Date: Fri, 30 Oct 2020 09:57:02 +0100
+Subject: [PATCH] tree-optimization/97633 - fix SLP scheduling of single-node
+ cycles
+
+This makes sure to update backedges in single-node cycles.
+
+2020-10-30  Richard Biener  
+
+   PR tree-optimization/97633
+   * tree-vect-slp.c (): Update backedges in single-node cycles.
+   Optimize processing of externals.
+
+   * g++.dg/vect/slp-pr97636.cc: New testcase.
+   * gcc.dg/vect/bb-slp-pr97633.c: Likewise.
+---
+ gcc/testsuite/g++.dg/vect/slp-pr97636.cc   |  83 +++
+ gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c |  27 
+ gcc/tree-vect-slp.c| 162 +++--
+ 3 files changed, 198 insertions(+), 74 deletions(-)
+ create mode 100644 gcc/testsuite/g++.dg/vect/slp-pr97636.cc
+ create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr97633.c
+
+diff --git a/gcc/testsuite/g++.dg/vect/slp-pr97636.cc b/gcc/testsuite/g++.dg/vect/slp-pr97636.cc
+new file mode 100644
+index 000..012342004f1
+--- /dev/null
++++ b/gcc/testsuite/g++.dg/vect/slp-pr97636.cc
+@@ -0,0 +1 @@
++
+diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
+index 5d69a98c2a9..714e50697bd 100644
+--- a/gcc/tree-vect-slp.c
++++ b/gcc/tree-vect-slp.c
+@@ -1 +1,2 @@
+
++
+--
+
 2.7.4
 
--

2.29.1
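
As a rough illustration of the check added to git_commit.py above, here is a standalone sketch. The function name `find_empty_groups` is hypothetical, and `content` is assumed to be the ChangeLog text following the leading `* `; the real code appends `Error` objects to `self.errors` rather than returning strings.

```python
# Hypothetical standalone version of the empty-group check from the patch.
EMPTY_GROUPS = ('()', '[]', '<>')


def find_empty_groups(content):
    """Return one message per empty group found before the first ':'."""
    parts = content.split(':')
    messages = []
    # Only entries of the form "file (...): description" are inspected.
    if len(parts) > 1:
        for needle in EMPTY_GROUPS:
            # The group must be preceded by a space, as in "file.c ()".
            if ' ' + needle in parts[0]:
                messages.append(f'empty group "{needle}" found')
    return messages


if __name__ == '__main__':
    print(find_empty_groups('tree-vect-slp.c (): Update backedges.'))
```

Only the text before the first colon is searched, so an empty pair of parentheses in the description itself is not flagged.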



Re: [PATCH] libstdc++: Fix the default constructor of ranges::__detail::__box

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 29/10/20 19:48 -0400, Patrick Palka via Libstdc++ wrote:
> On Thu, 29 Oct 2020, Patrick Palka wrote:
>
>> The class template semiregular-box of [range.semi.wrap] is specified
>> to value-initialize the underlying object whenever its type is default-
>> initializable.  Our primary template for __detail::__box respects this
>> requirement, but the recently added partial specialization (for types
>> which are already semiregular) does not.
>>
>> This patch fixes this issue, and additionally makes the in place
>> constructor explicit (as in the primary template).
>>
>> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/ranges (__detail::__box): For the partial
>> specialization for types that are already semiregular,
>> make the default constructor value-initialize the underlying
>> object instead of default-initializing it.  Make the
>> corresponding in place constructor explicit.
>> * testsuite/std/ranges/detail/semiregular_box.cc: New test.
>> ---
>>  libstdc++-v3/include/std/ranges   |  4 +--
>>  .../std/ranges/detail/semiregular_box.cc  | 33 +++
>>  2 files changed, 35 insertions(+), 2 deletions(-)
>>  create mode 100644 libstdc++-v3/testsuite/std/ranges/detail/semiregular_box.cc
>>
>> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
>> index df02b03cada..59aac326309 100644
>> --- a/libstdc++-v3/include/std/ranges
>> +++ b/libstdc++-v3/include/std/ranges
>> @@ -141,7 +141,7 @@ namespace ranges
>>    struct __box<_Tp>
>>    {
>>    private:
>> -   [[no_unique_address]] _Tp _M_value;
>> +   [[no_unique_address]] _Tp _M_value = {};
>
> It would be more consistent with the rest of the header to do = _Tp()
> instead:

OK, thanks.



Re: [Patch, Fortran] PR 92793 - fix column used for error diagnostic

2020-10-30 Thread Thomas Schwinge
Hi!

On 2019-12-04T14:37:55+0100, Tobias Burnus  wrote:
> As reported internally by Frederik, gfortran currently passes
> LOCATION_COLUMN == 0 to the middle end. The reason for that is how
> parsing works – gfortran reads the input line by line.
>
> For internal error diagnostic (fortran/error.c), the column location was
> corrected –  but not for locations passed to the middle end. Hence, the
> diagnostic there wasn't optimal.

Thanks for fixing that aspect.


Frederik has then later added a testcase to exercise this (a little bit,
at least), and I've now just pushed to master branch commit
fa410314ec94c9df2ad270c1917adc51f9147c2c "[OpenACC] Elaborate testcases
that verify column location information [PR92793]", backported to
releases/gcc-10 branch in commit
fc423b4e5b16dc02cc9f91fdfc800d00a5103dea, see attached.


Grüße
 Thomas


> Fixed by introducing a new function; now one only needs to make sure
> that no new code will re-introduce "lb->location" :-)
>
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?
>
> Tobias
>
> 2019-12-04  Tobias Burnus  
>
>   PR fortran/92793
>   * trans.c (gfc_get_location): Declare.
>   * trans.c (gfc_get_location): Define; returns column-corrected location.
>   (trans_runtime_error_vararg, gfc_trans_runtime_check,
>   gfc_generate_module_code): Use new function.
>   * trans-array.c (gfc_trans_auto_array_allocation): Likewise.
>   * trans-common.c (build_field, get_init_field, create_common): Likewise.
>   * trans-decl.c (gfc_build_label_decl, gfc_get_symbol_decl): Likewise.
>   * trans-openmp.c (gfc_trans_omp_reduction_list, gfc_trans_omp_clauses):
>   Likewise.
>   * trans-stmt.c (gfc_trans_if_1): Likewise.
>
>  gcc/fortran/trans-array.c  |   4 +-
>  gcc/fortran/trans-common.c |   6 +--
>  gcc/fortran/trans-decl.c   |   4 +-
>  gcc/fortran/trans-openmp.c | 103 
> +++--
>  gcc/fortran/trans-stmt.c   |  19 +
>  gcc/fortran/trans.c|  22 +++---
>  gcc/fortran/trans.h|   4 ++
>  7 files changed, 91 insertions(+), 71 deletions(-)
>
> diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
> index 685f8c5a874..3bae49d85db 100644
> --- a/gcc/fortran/trans-array.c
> +++ b/gcc/fortran/trans-array.c
> @@ -6364,7 +6364,7 @@ gfc_trans_auto_array_allocation (tree decl, gfc_symbol 
> * sym,
>if (flag_stack_arrays)
>  {
>gcc_assert (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE);
> -  space = build_decl (sym->declared_at.lb->location,
> +  space = build_decl (gfc_get_location (&sym->declared_at),
> VAR_DECL, create_tmp_var_name ("A"),
> TREE_TYPE (TREE_TYPE (decl)));
>gfc_trans_vla_type_sizes (sym, &init);
> @@ -6406,7 +6406,7 @@ gfc_trans_auto_array_allocation (tree decl, gfc_symbol 
> * sym,
>tmp = fold_build1_loc (input_location, DECL_EXPR,
>TREE_TYPE (space), space);
>gfc_add_expr_to_block (&init, tmp);
> -  addr = fold_build1_loc (sym->declared_at.lb->location,
> +  addr = fold_build1_loc (gfc_get_location (&sym->declared_at),
> ADDR_EXPR, TREE_TYPE (decl), space);
>gfc_add_modify (&init, decl, addr);
>gfc_add_init_cleanup (block, gfc_finish_block (&init), NULL_TREE);
> diff --git a/gcc/fortran/trans-common.c b/gcc/fortran/trans-common.c
> index 18ad60fd657..95d6395470c 100644
> --- a/gcc/fortran/trans-common.c
> +++ b/gcc/fortran/trans-common.c
> @@ -282,7 +282,7 @@ build_field (segment_info *h, tree union_type, 
> record_layout_info rli)
>unsigned HOST_WIDE_INT desired_align, known_align;
>
>name = get_identifier (h->sym->name);
> -  field = build_decl (h->sym->declared_at.lb->location,
> +  field = build_decl (gfc_get_location (&h->sym->declared_at),
> FIELD_DECL, name, h->field);
>known_align = (offset & -offset) * BITS_PER_UNIT;
>if (known_align == 0 || known_align > BIGGEST_ALIGNMENT)
> @@ -559,7 +559,7 @@ get_init_field (segment_info *head, tree union_type, tree 
> *field_init,
>tmp = build_range_type (gfc_array_index_type,
> gfc_index_zero_node, tmp);
>tmp = build_array_type (type, tmp);
> -  field = build_decl (gfc_current_locus.lb->location,
> +  field = build_decl (gfc_get_location (&gfc_current_locus),
> FIELD_DECL, NULL_TREE, tmp);
>
>known_align = BIGGEST_ALIGNMENT;
> @@ -711,7 +711,7 @@ create_common (gfc_common_head *com, segment_info *head, 
> bool saw_equiv)
>  {
>tree var_decl;
>
> -  var_decl = build_decl (s->sym->declared_at.lb->location,
> +  var_decl = build_decl (gfc_get_location (&s->sym->declared_at),
>VAR_DECL, DECL_NAME (s->field),
>TREE_TYPE (s->field));
>TREE_STATIC (var_decl) = TREE_STATIC (decl);
> diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
> index e742

Re: [Patch, Fortran] PR 92793 - fix column used for error diagnostic

2020-10-30 Thread Thomas Schwinge
Hi!

On 2020-10-30T11:35:15+0100, I wrote:
> On 2019-12-04T14:37:55+0100, Tobias Burnus  wrote:
>> As reported internally by Frederik, gfortran currently passes
>> LOCATION_COLUMN == 0 to the middle end. The reason for that is how
>> parsing works – gfortran reads the input line by line.
>>
>> For internal error diagnostic (fortran/error.c), the column location was
>> corrected –  but not for locations passed to the middle end. Hence, the
>> diagnostic there wasn't optimal.
>
> Thanks for fixing that aspect.

While working on something completely different -- of course...  ;-) -- I
ran into:

>> Fixed by introducing a new function; now one only needs to make sure
>> that no new code will re-introduce "lb->location" :-)

... another *existing instance* of this problem.


>> -  space = build_decl (sym->declared_at.lb->location,
>> +  space = build_decl (gfc_get_location (&sym->declared_at),

The same change is required in
'gcc/fortran/trans.c:gfc_set_backend_locus'.

That took me a while to figure out...  :-| In OMP offloading compilation
I saw diagnostics *with* column location information for C, C++, but the
very same diagnostics *without* column location information for Fortran.
Once I had somewhat understood the Fortran front end location processing --
uh...  ;-\ -- I came up with the attached patch to "Further improve
Fortran column location information [PR92793]".  OK to push?  (No
testsuite regressions.)


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From 3f284dd076b313c2ffbd02a8d7327118ce910c49 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 29 Oct 2020 14:42:28 +0100
Subject: [PATCH] Further improve Fortran column location information [PR92793]

Building on top of commit 9c81750c5bedd7883182ee2684a012c6210ebe1d "Fortran] PR
92793 - fix column used for error diagnostic", there is another place where we
have to use 'gfc_get_location' returning column-corrected locations.

For example, this improves column location information for OMP constructs.

	gcc/fortran/
	PR fortran/92793
	* trans.c (gfc_set_backend_locus): Use 'gfc_get_location'.
	gcc/testsuite/
	PR fortran/92793
	* gfortran.dg/goacc/pr92793-1.f90: Adjust.
---
 gcc/fortran/trans.c   |  2 +-
 gcc/testsuite/gfortran.dg/goacc/pr92793-1.f90 | 24 +--
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 8caa625ab0e8..72ea24125232 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -1829,7 +1829,7 @@ void
 gfc_set_backend_locus (locus * loc)
 {
   gfc_current_backend_file = loc->lb->file;
-  input_location = loc->lb->location;
+  input_location = gfc_get_location (loc);
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr92793-1.f90 b/gcc/testsuite/gfortran.dg/goacc/pr92793-1.f90
index a572c6b3437b..72dd6b7b8f81 100644
--- a/gcc/testsuite/gfortran.dg/goacc/pr92793-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/pr92793-1.f90
@@ -13,22 +13,22 @@ subroutine check ()
   integer :: i, j, sum, diff
 
  !$accparallel &
- !$acc & & ! Fortran location information points to the last line of the directive, and there is no column location information.
-!$acc  && ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:18:0\\\] #pragma acc parallel" 1 "original" } }
-  !$acc & ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:18:0\\\] #pragma omp target oacc_parallel" 1 "gimple" } }
+ !$acc & & ! Fortran location information points to the last line, and last character of the directive.
+!$acc  && ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:18:123\\\] #pragma acc parallel" 1 "original" } }
+  !$acc & ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:18:123\\\] #pragma omp target oacc_parallel" 1 "gimple" } }
   !$acc loop &
-!$acc & & ! Fortran location information points to the last line of the directive, and there is no column location information.
-  !$acc  &   & ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:26:0\\\] #pragma acc loop" 1 "original" } }
- !$acc & & ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:26:0\\\] #pragma acc loop" 1 "gimple" } }
+!$acc & & ! Fortran location information points to the last line, and last character of the directive.
+  !$acc  &   & ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:26:22\\\] #pragma acc loop" 1 "original" } }
+ !$acc & & ! { dg-final { scan-tree-dump-times "pr92793-1\\\.f90:26:22\\\] #pragma acc loop" 1 "gimple" } }
 !$acc&   reduction  ( +: sum ) & ! { dg-line sum1 }
  !$acc && ! Fortran location information points to the ':' in 'reduction(+:sum)'.
!$acc   &&  ! { dg-message "36: location of the previous reduction for 'sum'" "" { target *-*-* } sum1 }
 !$acc& independent
   do i = 1, 10
   !$acc loop &
-!$acc & & ! Fortran loc

[committed] openmp: Use FIELD_TGT_EMPTY once more

2020-10-30 Thread Jakub Jelinek via Gcc-patches
Hi!

I have noticed one extra spot which should be using the FIELD_TGT_EMPTY
macro (defined to (~(size_t) 0)), but didn't.

Fixed thusly, committed to trunk.

2020-10-30  Jakub Jelinek  

* target.c (gomp_map_vars_internal): Use FIELD_TGT_EMPTY macro
even in field_tgt_clear initializer.

--- libgomp/target.c.jj
+++ libgomp/target.c
@@ -1020,7 +1020,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
   if (not_found_cnt)
tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
   splay_tree_node array = tgt->array;
-  size_t j, field_tgt_offset = 0, field_tgt_clear = ~(size_t) 0;
+  size_t j, field_tgt_offset = 0, field_tgt_clear = FIELD_TGT_EMPTY;
   uintptr_t field_tgt_base = 0;
 
   for (i = 0; i < mapnum; i++)


Jakub



Re: [Patch, Fortran] PR 92793 - fix column used for error diagnostic

2020-10-30 Thread Tobias Burnus

Hi Thomas,

On 30.10.20 11:47, Thomas Schwinge wrote:

>>> Fixed by introducing a new function; now one only needs to make sure
>>> that no new code will re-introduce "lb->location" :-)
>
> ... another *existing instance* of this problem.

...

>   gfc_set_backend_locus (locus * loc)
>   {
>     gfc_current_backend_file = loc->lb->file;
> -  input_location = loc->lb->location;
> +  input_location = gfc_get_location (loc);
>   }


For the direct callers, of which there are 23, this seems to be fine.

However, there is additionally:

gfc_save_backend_locus (locus * loc)
{
  loc->lb = XCNEW (gfc_linebuf);
  loc->lb->location = input_location;
  loc->lb->file = gfc_current_backend_file;
}

which is used together with:

gfc_restore_backend_locus (locus * loc)
{
  gfc_set_backend_locus (loc);
  free (loc->lb);
}

I think the call in the latter needs to be replaced by the previous
version of "gfc_set_backend_locus" for two related reasons:

* gfc_save_backend_locus operates with incomplete data,
  i.e. loc->nextc (used by gfc_get_location) might not
  be set.
* input_location might/should already contain the column
  offset – and you do not want to add some random offset
  to it.

Hence: LGTM – if you update 'gfc_restore_backend_locus'
by inlining the previous version of 'gfc_set_backend_locus'.

Thanks,

Tobias



libgomp testsuite: tell warning from error diagnostics, etc. [PR80219, PR85303]

2020-10-30 Thread Thomas Schwinge
Hi!

Turns out that GCC PR85303 "[testsuite, libgomp] dg-message not
supported" is the very same problem as (the libgomp aspect of) GCC
PR80219 "relative line numbers only working if gcc_{error,warning}_prefix
defined" (see rationale in there).  OK to push the attached patch for
"libgomp testsuite: tell warning from error diagnostics, etc. [PR80219,
PR85303]"?  This change makes 'dg-warning', 'dg-error', 'dg-bogus',
'dg-message' behave as expected, and also enables use of relative line
numbers as well as 'dg-line'.  (No testsuite regressions.)


I (later) have a proper use for that, but with that fixed, you can then
do standard things as follows:

--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -56,17 +56,17 @@ int main ()

   /* GR, WS, VS.  */
   {
-#define GANGS 0 /* { dg-warning "'num_gangs' value must be positive" "" { 
target c } } */
-int gangs_actual = GANGS;
+#define GANGS 0
+int gangs_actual = GANGS; /* { dg-warning "'num_gangs' value must be 
positive" "" { target c } .-1 } */
 int gangs_min, gangs_max, workers_min, workers_max, vectors_min, 
vectors_max;
 gangs_min = workers_min = vectors_min = INT_MAX;
 gangs_max = workers_max = vectors_max = INT_MIN;
 #pragma acc parallel copy (gangs_actual) \
   reduction (min: gangs_min, workers_min, vectors_min) reduction (max: 
gangs_max, workers_max, vectors_max) \
-  num_gangs (GANGS) /* { dg-warning "'num_gangs' value must be positive" 
"" { target c++ } } */
+  num_gangs (GANGS) // { dg-line LiNE }
 {
   /* We're actually executing with num_gangs (1).  */
-  gangs_actual = 1;
+  gangs_actual = 1; /* { dg-warning "'num_gangs' value must be 
positive" "" { target c++ } LiNE } */
   for (int i = 100 * gangs_actual; i > -100 * gangs_actual; --i)
{
  /* .  */
@@ -98,13 +98,13 @@ int main ()

   /* GP, WS, VS.  */
   {
-#define GANGS 0 /* { dg-warning "'num_gangs' value must be positive" "" { 
target c } } */
+#define GANGS 0 /* { dg-message "warning: 'num_gangs' value must be 
positive" "" { target c } } */
 int gangs_actual = GANGS;
 int gangs_min, gangs_max, workers_min, workers_max, vectors_min, 
vectors_max;
 gangs_min = workers_min = vectors_min = INT_MAX;
 gangs_max = workers_max = vectors_max = INT_MIN;
 #pragma acc parallel copy (gangs_actual) \
-  num_gangs (GANGS) /* { dg-warning "'num_gangs' value must be positive" 
"" { target c++ } } */
+  num_gangs (GANGS) // { dg-line LaNE }
 {
   /* We're actually executing with num_gangs (1).  */
   gangs_actual = 1;
@@ -115,7 +115,7 @@ int main ()
  workers_min = workers_max = acc_worker ();
  vectors_min = vectors_max = acc_vector ();
}
-}
+} /* { dg-message "'num_gangs' value must be positive" "" { target c++ 
} LaNE } */
 if (gangs_actual != 1)
   __builtin_abort ();
 if (gangs_min != 0 || gangs_max != gangs_actual - 1


Grüße
 Thomas


>From 47e35a661859ade764fe4ed2a4e3e2205c19dd90 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 29 Oct 2020 10:29:19 +0100
Subject: [PATCH] libgomp testsuite: tell warning from error diagnostics, etc.
 [PR80219, PR85303]

This change makes 'dg-warning', 'dg-error', 'dg-bogus', 'dg-message' behave as
expected, and also enables use of relative line numbers as well as 'dg-line'.

	libgomp/
	PR testsuite/80219
	PR testsuite/85303
	* testsuite/lib/libgomp.exp (libgomp_init): Set
	'gcc_warning_prefix', 'gcc_error_prefix'.
---
 libgomp/testsuite/lib/libgomp.exp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 5d86e2ac095f..72d001186a57 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -241,6 +241,12 @@ proc libgomp_init { args } {
 if { $offload_additional_options != "" } {
 	lappend ALWAYS_CFLAGS "additional_flags=${offload_additional_options}"
 }
+
+# Tell warning from error diagnostics.  This fits for C, C++, and Fortran.
+global gcc_warning_prefix
+set gcc_warning_prefix "\[Ww\]arning:"
+global gcc_error_prefix
+set gcc_error_prefix "(\[Ff\]atal )?\[Ee\]rror:"
 }
 
 #
-- 
2.17.1
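
To see what the two prefixes match, here is a small Python sketch; it is not how DejaGnu applies them. It assumes the Tcl strings above reduce to the regular expressions `[Ww]arning:` and `([Ff]atal )?[Ee]rror:` once the backslash-escaped brackets are processed, and the `classify` helper is hypothetical.

```python
import re

# Assumed Python equivalents of the Tcl patterns set in libgomp_init.
WARNING_PREFIX = re.compile(r'[Ww]arning:')
ERROR_PREFIX = re.compile(r'([Ff]atal )?[Ee]rror:')


def classify(diagnostic):
    """Return 'error', 'warning', or None for a single diagnostic line."""
    if ERROR_PREFIX.search(diagnostic):
        return 'error'
    if WARNING_PREFIX.search(diagnostic):
        return 'warning'
    return None


if __name__ == '__main__':
    for line in ("t.c:3:5: warning: 'num_gangs' value must be positive",
                 'Fatal Error: cannot open module file',
                 't.c:1:1: note: declared here'):
        print(classify(line), '<-', line)
```

Both the lowercase C/C++ spellings and the capitalized Fortran spellings are covered, which is presumably why the comment says the patterns fit all three languages.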



Re: libgomp testsuite: tell warning from error diagnostics, etc. [PR80219, PR85303]

2020-10-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 30, 2020 at 12:22:31PM +0100, Thomas Schwinge wrote:
> Turns out that GCC PR85303 "[testsuite, libgomp] dg-message not
> supported" is the very same problem as (the libgomp aspect of) GCC
> PR80219 "relative line numbers only working if gcc_{error,warning}_prefix
> defined" (see rationale in there).  OK to push the attached patch for
> "libgomp testsuite: tell warning from error diagnostics, etc. [PR80219,
> PR85303]"?  This change makes 'dg-warning', 'dg-error', 'dg-bogus',
> 'dg-message' behave as expected, and also enables use of relative line
> numbers as well as 'dg-line'.  (No testsuite regressions.)

Ok, thanks.

>   libgomp/
>   PR testsuite/80219
>   PR testsuite/85303
>   * testsuite/lib/libgomp.exp (libgomp_init): Set
>   'gcc_warning_prefix', 'gcc_error_prefix'.
> ---
>  libgomp/testsuite/lib/libgomp.exp | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/libgomp/testsuite/lib/libgomp.exp 
> b/libgomp/testsuite/lib/libgomp.exp
> index 5d86e2ac095f..72d001186a57 100644
> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp
> @@ -241,6 +241,12 @@ proc libgomp_init { args } {
>  if { $offload_additional_options != "" } {
>   lappend ALWAYS_CFLAGS "additional_flags=${offload_additional_options}"
>  }
> +
> +# Tell warning from error diagnostics.  This fits for C, C++, and 
> Fortran.
> +global gcc_warning_prefix
> +set gcc_warning_prefix "\[Ww\]arning:"
> +global gcc_error_prefix
> +set gcc_error_prefix "(\[Ff\]atal )?\[Ee\]rror:"
>  }
>  
>  #
> -- 
> 2.17.1
> 


Jakub



[PATCH] tree-optimization/97626 - handle SCCs properly in SLP stmt analysis

2020-10-30 Thread Richard Biener
This makes sure to roll-back the whole SCC when we fail stmt
analysis, otherwise the optimistic visited treatment breaks down
with different entries.  Rollback is easy when tracking additions
to visited in a vector which also makes the whole thing cheaper
than the two hash-sets used before.
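
The rollback scheme can be modelled with a toy sketch along the following lines. This is not the vectorizer code: the graph representation, the `node_ok` callback, and the function name are assumptions made for illustration, and the cost vector truncation done in the patch is left out.

```python
def analyze(node, children, node_ok, visited_set, visited_vec):
    """Toy model of optimistic visited tracking with rollback on failure."""
    # An already-visited node is optimistically treated as fine; this is
    # what lets cycles terminate.
    if node in visited_set:
        return True
    visited_set.add(node)
    visited_vec.append(node)

    # Everything pushed from here on was added on behalf of this node.
    rec_start = len(visited_vec)
    res = True
    for child in children.get(node, ()):
        res = analyze(child, children, node_ok, visited_set, visited_vec)
        if not res:
            break
    if res:
        res = node_ok(node)

    if not res:
        # Pop all recursively visited nodes plus this node itself.
        while len(visited_vec) >= rec_start:
            visited_set.discard(visited_vec.pop())
    return res


if __name__ == '__main__':
    graph = {'a': ['b', 'c'], 'b': ['a']}
    seen, order = set(), []
    print(analyze('a', graph, lambda n: n != 'c', seen, order), seen, order)
```

In the failing case, `b` was accepted only on the optimistic assumption that `a` succeeds, so it has to be dropped from the visited set together with `a`.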

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-10-30  Richard Biener  

PR tree-optimization/97626
* tree-vect-slp.c (vect_slp_analyze_node_operations):
Exchange the lvisited hash-set for a vector, roll back
recursive adds to visited when analysis failed.
(vect_slp_analyze_operations): Likewise.

* gcc.dg/vect/bb-slp-pr97626.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr97626.c | 34 ++
 gcc/tree-vect-slp.c| 34 +-
 2 files changed, 55 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr97626.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr97626.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97626.c
new file mode 100644
index 000..943d8a62de7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr97626.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+
+struct {
+  int x;
+  int y;
+} do_plasma_rect;
+
+int do_plasma_context_0, do_plasma_x2, do_plasma_y2, do_plasma_plasma_depth,
+do_plasma_xm, do_plasma_ym;
+void gegl_buffer_set();
+
+void do_plasma(int x1, int y1) {
+  if (__builtin_expect(({
+ int _g_boolean_var_;
+ if (do_plasma_context_0)
+   _g_boolean_var_ = 1;
+ else
+   _g_boolean_var_ = 0;
+ _g_boolean_var_;
+   }),
+   0)) {
+do_plasma_rect.x = x1;
+do_plasma_rect.y = y1;
+gegl_buffer_set();
+  }
+  do_plasma_xm = (x1 + do_plasma_x2) / 2;
+  do_plasma_ym = (y1 + do_plasma_y2) / 2;
+  if (do_plasma_plasma_depth) {
+do_plasma_rect.x = do_plasma_xm;
+do_plasma_rect.y = do_plasma_ym;
+return;
+  }
+  do_plasma(do_plasma_xm, do_plasma_ym);
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 714e50697bd..56dc59e11a6 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3487,8 +3487,8 @@ vect_prologue_cost_for_slp (slp_tree node,
 static bool
 vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
  slp_instance node_instance,
- hash_set &visited,
- hash_set &lvisited,
+ hash_set &visited_set,
+ vec &visited_vec,
  stmt_vector_for_cost *cost_vec)
 {
   int i, j;
@@ -3511,15 +3511,18 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
 
   /* If we already analyzed the exact same set of scalar stmts we're done.
  We share the generated vector stmts for those.  */
-  if (visited.contains (node)
-  || lvisited.add (node))
+  if (visited_set.add (node))
 return true;
+  visited_vec.safe_push (node);
 
   bool res = true;
+  unsigned visited_rec_start = visited_vec.length ();
+  unsigned cost_vec_rec_start = cost_vec->length ();
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 {
   res = vect_slp_analyze_node_operations (vinfo, child, node_instance,
- visited, lvisited, cost_vec);
+ visited_set, visited_vec,
+ cost_vec);
   if (!res)
break;
 }
@@ -3527,8 +3530,14 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
   if (res)
 res = vect_slp_analyze_node_operations_1 (vinfo, node, node_instance,
  cost_vec);
+  /* If analysis failed we have to pop all recursive visited nodes
+ plus ourselves.  */
   if (!res)
-lvisited.remove (node);
+{
+  while (visited_vec.length () >= visited_rec_start)
+   visited_set.remove (visited_vec.pop ());
+  cost_vec->truncate (cost_vec_rec_start);
+}
 
   /* When the node can be vectorized cost invariant nodes it references.
  This is not done in DFS order to allow the refering node
@@ -3543,9 +3552,9 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
  /* Perform usual caching, note code-generation still
 code-gens these nodes multiple times but we expect
 to CSE them later.  */
- && !visited.contains (child)
- && !lvisited.add (child))
+ && !visited_set.add (child))
{
+ visited_vec.safe_push (child);
  /* ???  After auditing more code paths make a "default"
 and push the vector type from NODE to all children
 if it is not already set.  */
@@ -3705,14 +3714,14 @@ vect_slp_analyze_oper

Re: [RFC, testsuite] Add dg-save-linenr

2020-10-30 Thread Thomas Schwinge
Hi!

On 2017-05-22T18:55:29+0200, Tom de Vries  wrote:
> On 05/16/2017 03:12 PM, Rainer Orth wrote:
>> [...], but the new proc ['dg-line'] needs documenting in sourcebuild.texi.
>
> Attached patch adds the missing documentation.

OK to expand that with the attached patch to "Document that 'linenumvar'
in 'dg-line' may contain Tcl syntax"?  (Hooray for embedded Tcl!  --
Don't hurt me; I (later) have a use case where this does make things
easier.)

'{ dg-line LINENUMVAR }'
 This DejaGnu directive sets the variable LINENUMVAR to the line
 number of the source line.  The variable LINENUMVAR, which must be
 unique per testcase, may then be used in subsequent 'dg-error',
 'dg-warning', 'dg-message' and 'dg-bogus' directives.  For example:

      int a;   /* { dg-line first_def_a } */
      float a; /* { dg-error "conflicting types of" } */
      /* { dg-message "previous declaration of" "" { target *-*-* } first_def_a } */

 Note that LINENUMVAR may contain Tcl syntax, for example:

      #pragma acc parallel loop [...] /* { dg-line line[incr line_count] } */
        /* { dg-message "note: [...]" "" { target *-*-* } line$line_count } */
        /* { dg-message "optimized: [...]" "" { target *-*-* } line$line_count } */
        for (int j = 0; j < nj; ++j)
          {
            #pragma acc loop [...] /* { dg-line line[incr line_count] } */
            /* { dg-message "missed: [...]" "" { target *-*-* } line$line_count } */
            /* { dg-message "optimized: [...]" "" { target *-*-* } line$line_count } */
            /* { dg-message "note: [...]" "" { target *-*-* } line$line_count } */
            for (int i = 0; i < ni; ++i)

 For each 'dg-line', this increments a counter variable 'line_count'
 to construct unique 'line$line_count' names for LINENUMVAR:
 'line1', 'line2', ....  The preceding 'dg-line' may then be
 referred to via 'line$line_count'.
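
The naming scheme can be mimicked outside Tcl with a small model like the one below. The class and method names are hypothetical; it assumes the counter starts out unset, so the first `incr` yields 1, and it only mirrors the bookkeeping, not DejaGnu itself.

```python
class LineVars:
    """Toy model of 'dg-line line[incr line_count]' bookkeeping."""

    def __init__(self):
        self.line_count = 0   # plays the role of the Tcl variable line_count
        self.lines = {}       # maps 'line1', 'line2', ... to source lines

    def dg_line(self, source_line):
        # Corresponds to { dg-line line[incr line_count] }.
        self.line_count += 1
        name = f'line{self.line_count}'
        self.lines[name] = source_line
        return name

    def current(self):
        # Corresponds to referring to line$line_count in a later directive.
        return self.lines[f'line{self.line_count}']


if __name__ == '__main__':
    v = LineVars()
    print(v.dg_line(10), v.current())
    print(v.dg_line(14), v.current())
```

A reference through the counter always resolves to the most recent 'dg-line', so the messages for one construct have to appear before the next 'dg-line' is reached.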


Grüße
 Thomas


>From 2211acd9a902a5cab874762166dbca116a98bea5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 29 Oct 2020 07:04:54 +0100
Subject: [PATCH] Document that 'linenumvar' in 'dg-line' may contain Tcl
 syntax

	gcc/
	* doc/sourcebuild.texi (dg-line): Document that 'linenumvar' may
	contain Tcl syntax.
---
 gcc/doc/sourcebuild.texi | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 49316a5d0ff9..92e9f4353d3f 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1226,7 +1226,9 @@ targets.
 
 @item @{ dg-line @var{linenumvar} @}
 This DejaGnu directive sets the variable @var{linenumvar} to the line number of
-the source line.  The variable @var{linenumvar} can then be used in subsequent
+the source line.
+The variable @var{linenumvar}, which must be unique per testcase, may
+then be used in subsequent
 @code{dg-error}, @code{dg-warning}, @code{dg-message} and @code{dg-bogus}
 directives.  For example:
 
@@ -1236,6 +1238,27 @@ float a; /* @{ dg-error "conflicting types of" @} */
 /* @{ dg-message "previous declaration of" "" @{ target *-*-* @} first_def_a @} */
 @end smallexample
 
+Note that @var{linenumvar} may contain Tcl syntax, for example:
+
+@smallexample
+#pragma acc parallel loop [...] /* @{ dg-line line[incr line_count] @} */
+  /* @{ dg-message "note: [...]" "" @{ target *-*-* @} line$line_count @} */
+  /* @{ dg-message "optimized: [...]" "" @{ target *-*-* @} line$line_count @} */
+  for (int j = 0; j < nj; ++j)
+@{
+  #pragma acc loop [...] /* @{ dg-line line[incr line_count] @} */
+  /* @{ dg-message "missed: [...]" "" @{ target *-*-* @} line$line_count @} */
+  /* @{ dg-message "optimized: [...]" "" @{ target *-*-* @} line$line_count @} */
+  /* @{ dg-message "note: [...]" "" @{ target *-*-* @} line$line_count @} */
+  for (int i = 0; i < ni; ++i)
+@end smallexample
+
+For each @code{dg-line}, this increments a counter variable
+@code{line_count} to construct unique @code{line$line_count} names for
+@var{linenumvar}: @code{line1}, @code{line2}, @dots{}.
+The preceding @code{dg-line} may then be referred to via
+@code{line$line_count}.
+
 @item @{ dg-excess-errors @var{comment} [@{ target/xfail @var{selector} @}] @}
 This DejaGnu directive indicates that the test is expected to fail due
 to compiler messages that are not handled by @samp{dg-error},
-- 
2.17.1



Re: [Patch] Fortran: Update omp atomic for OpenMP 5

2020-10-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 29, 2020 at 06:05:41PM +0100, Tobias Burnus wrote:
> gcc/fortran/ChangeLog:
> 
>   * dump-parse-tree.c (show_omp_clauses): Handle atomic clauses.
>   (show_omp_node): Call it for atomic.
>   * gfortran.h (enum gfc_omp_atomic_op): Add GFC_OMP_ATOMIC_UNSET,
>   remove GFC_OMP_ATOMIC_SEQ_CST and GFC_OMP_ATOMIC_ACQ_REL.
>   (enum gfc_omp_memorder): Replace OMP_MEMORDER_LAST by
>   OMP_MEMORDER_UNSET, add OMP_MEMORDER_SEQ_CST/OMP_MEMORDER_RELAXED.
>   (gfc_omp_clauses): Add capture and atomic_op.
>   (gfc_code): remove omp_atomic.
>   * openmp.c (enum omp_mask1): Add atomic, capture, memorder clauses.
>   (gfc_match_omp_clauses): Match them.
>   (OMP_ATOMIC_CLAUSES): Add.
>   (gfc_match_omp_flush): Update for 'last' to 'unset' change.
>   (gfc_match_omp_oacc_atomic): Removed and placed content ..
>   (gfc_match_omp_atomic): ... here. Update for OpenMP 5 clauses.
>   (gfc_match_oacc_atomic): Match directly here.
>   (resolve_omp_atomic, gfc_resolve_omp_directive): Update.
>   * parse.c (parse_omp_oacc_atomic): Update for struct gfc_code changes.
>   * resolve.c (gfc_resolve_blocks): Update assert.
>   * st.c (gfc_free_statement): Also call for EXEC_O{ACC,MP}_ATOMIC.
>   * trans-openmp.c (gfc_trans_omp_atomic): Update.
>   (gfc_trans_omp_flush): Update for 'last' to 'unset' change.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/atomic-2.f90: New test.
>   * gfortran.dg/gomp/atomic.f90: New test.

> + gfc_error ("OMP ATOMIC READ at %L incompatible with "
> +"ACQ_REL or RELEASE clauses", &loc);

> +  gfc_error ("Unexpected junk after $ACC ATOMIC statement at %C");
> +  gfc_free_omp_clauses (c);

Would be nice to be consistent.  I think most commonly in diagnostics
we use !$OMP ... and !$ACC , $ACC is not used anywhere, and while
some uses of just OMP ... crept in, they aren't used that much yet.

> -= (gfc_omp_atomic_op) (atomic_code->ext.omp_atomic & 
> GFC_OMP_ATOMIC_MASK);
> += (gfc_omp_atomic_op) (atomic_code->ext.omp_clauses->atomic_op & 
> GFC_OMP_ATOMIC_MASK);

Too long line.

Otherwise LGTM.

Jakub



Re: [RFC, testsuite] Add dg-save-linenr

2020-10-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 30, 2020 at 12:34:57PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On 2017-05-22T18:55:29+0200, Tom de Vries  wrote:
> > On 05/16/2017 03:12 PM, Rainer Orth wrote:
> >> [...], but the new proc ['dg-line'] needs documenting in sourcebuild.texi.
> >
> > Attached patch adds the missing documentation.
> 
> OK to expand that with the attached patch to "Document that 'linenumvar'
> in 'dg-line' may contain Tcl syntax"?  (Hooray for embedded Tcl!  --
> Don't hurt me; I (later) have a use case where this does make things
> easier.)

Is it desirable though?
I mean if we ever decide to switch from dejagnu to something else,
adding parsing of our dg-* grammar is not that hard, and while we rely
on some tcl details already (e.g. the {}s vs. ""s for regular expressions
etc.), allowing arbitrary embedded tcl will make that effort even harder.

Jakub



[PATCH] tree-optimization/97623 - avoid excessive insert iteration for hoisting

2020-10-30 Thread Richard Biener
This avoids requiring insert iteration for back-to-back hoisting
opportunities as seen in the added testcase.  For the PR at hand
this halves the number of insert iterations retaining only
the hard to avoid PRE / hoist insert back-to-backs.
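
As a rough illustration of why the walk direction matters, here is a toy model in Python. It is only a sketch and not the tree-ssa-pre.c code: the function name, the `has_expr` list and the single-chain CFG are assumptions made for the example. Block i may hoist the expression only once the next block in the chain already computes it, which loosely mirrors the nested ifs in the added testcase.

```python
def count_insert_iterations(depth, backward):
    """Toy model: blocks 0..depth-1 form a chain in RPO order.

    has_expr[i] is True once block i computes the expression.  The extra
    entry at index `depth` stands for the innermost arms, which compute
    it from the start.  Block i can hoist only when block i + 1 already
    has the expression.
    """
    has_expr = [False] * depth + [True]
    order = range(depth - 1, -1, -1) if backward else range(depth)
    iterations = 0
    inserted = 0
    changed = True
    while changed:
        changed = False
        iterations += 1
        for i in order:
            if not has_expr[i] and has_expr[i + 1]:
                has_expr[i] = True
                inserted += 1
                changed = True
    return iterations, inserted
```

In this model, count_insert_iterations(5, backward=True) returns (2, 5): all five insertions happen in the first iteration and the second one only confirms that nothing changed.  With backward=False the same five insertions need six iterations, because each forward walk only exposes one new opportunity.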

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2020-10-30  Richard Biener  

PR tree-optimization/97623
* tree-ssa-pre.c (insert): First do hoist insertion in
a backward walk.

* gcc.dg/tree-ssa/ssa-hoist-7.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c | 54 +
 gcc/tree-ssa-pre.c  | 13 +++--
 2 files changed, 63 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c
new file mode 100644
index 0000000..ce9cec61668
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c
@@ -0,0 +1,54 @@
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+
+void baz();
+int tem;
+void foo (int a, int b, int c, int d, int e, int x, int y, int z)
+{
+  if (a)
+{
+  if (b)
+{
+  if (c)
+{
+ if (d)
+   {
+ if (e)
+   {
+ tem = x + y;
+   }
+ else
+   {
+ if (z) baz ();
+ tem = x + y;
+   }
+   }
+ else
+   {
+ if (z) baz ();
+ tem = x + y;
+   }
+   }
+  else
+{
+  if (z) baz ();
+  tem = x + y;
+}
+}
+  else
+{
+  if (z) baz ();
+  tem = x + y;
+}
+}
+  else
+{
+  if (z) baz ();
+  tem = x + y;
+}
+}
+
+/* Now inserting x + y five times is unnecessary but the cascading
+   cannot be avoided with the simple-minded dataflow.  But make sure
+   we do the insertions all in the first iteration.  */
+/* { dg-final { scan-tree-dump "insert iterations == 2" "pre" } } */
+/* { dg-final { scan-tree-dump "HOIST inserted: 5" "pre" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index bcef9720095..091ecb39bb6 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3646,6 +3646,15 @@ insert (void)
fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
 
   changed = false;
+  /* Insert expressions for hoisting.  Do a backward walk here since
+inserting into BLOCK exposes new opportunities in its predecessors.  */
+  if (flag_code_hoisting)
+   for (int idx = rpo_num - 1; idx >= 0; --idx)
+ {
+   basic_block block = BASIC_BLOCK_FOR_FN (cfun, rpo[idx]);
+   if (EDGE_COUNT (block->succs) >= 2)
+ changed |= do_hoist_insertion (block);
+ }
   for (int idx = 0; idx < rpo_num; ++idx)
{
  basic_block block = BASIC_BLOCK_FOR_FN (cfun, rpo[idx]);
@@ -3680,10 +3689,6 @@ insert (void)
  if (do_partial_partial)
changed |= do_pre_partial_partial_insertion (block, dom);
}
-
- /* Insert expressions for hoisting.  */
- if (flag_code_hoisting && EDGE_COUNT (block->succs) >= 2)
-   changed |= do_hoist_insertion (block);
}
}
 
-- 
2.26.2


Re: [PATCH] libstdc++: Implement C++20 features for

2020-10-30 Thread Rainer Orth
Hi Jonathan,

> On 29/10/20 21:06 +0100, Rainer Orth wrote:
>>Tightening the patterns as in the attached patch at least allows
>>libstdc++.so.6 to link on i386-pc-solaris2.11; full bootstrap still
>>running.  However, I can't tell if this is really correct.
>
> I think we want this attached patch instead. It tightens them up to
> exactly the symbols we actually need to export and no more. This will
> avoid needing to tighten them again in the near future when the new
> overloads of str() are added.

that's even better of course.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code

2020-10-30 Thread Richard Earnshaw via Gcc-patches
On 29/10/2020 19:18, Richard Earnshaw via Gcc-patches wrote:
> On 28/10/2020 18:10, Christophe Lyon via Gcc-patches wrote:
>> On Wed, 28 Oct 2020 at 18:44, Richard Earnshaw
>>  wrote:
>>>
>>> On 27/10/2020 15:42, Richard Earnshaw via Gcc-patches wrote:
 On 26/10/2020 10:52, Christophe Lyon via Gcc-patches wrote:
> On Thu, 22 Oct 2020 at 17:22, Richard Earnshaw
>  wrote:
>>
>> On 22/10/2020 09:45, Christophe Lyon via Gcc-patches wrote:
>>> On Wed, 21 Oct 2020 at 19:36, Richard Earnshaw
>>>  wrote:

 On 21/10/2020 17:11, Christophe Lyon via Gcc-patches wrote:
> On Wed, 21 Oct 2020 at 18:07, Richard Earnshaw
>  wrote:
>>
>> On 21/10/2020 16:49, Christophe Lyon via Gcc-patches wrote:
>>> On Tue, 20 Oct 2020 at 13:25, Richard Earnshaw
>>>  wrote:

 On 20/10/2020 12:22, Richard Earnshaw wrote:
> On 19/10/2020 17:32, Christophe Lyon via Gcc-patches wrote:
>> On Mon, 19 Oct 2020 at 16:39, Richard Earnshaw
>>  wrote:
>>>
>>> On 12/10/2020 08:59, Christophe Lyon via Gcc-patches wrote:
 On Thu, 8 Oct 2020 at 11:58, Richard Earnshaw
  wrote:
>
> On 08/10/2020 10:07, Christophe Lyon via Gcc-patches wrote:
>> On Tue, 6 Oct 2020 at 18:02, Richard Earnshaw
>>  wrote:
>>>
>>> On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote:
 When mi_delta is > 255 and -mpure-code is used, we cannot 
 load delta
 from code memory (like we do without -mpure-code).

 This patch builds the value of mi_delta into r3 with a 
 series of
 movs/adds/lsls.

 We also do some cleanup by not emitting the function 
 address and delta
 via .word directives at the end of the thunk since we 
 don't use them
 with -mpure-code.

 No need for new testcases, this bug was already identified 
 by
 eg. pr46287-3.C

 2020-09-29  Christophe Lyon  

   gcc/
   * config/arm/arm.c (arm_thumb1_mi_thunk): Build 
 mi_delta in r3 and
   do not emit function address and delta when 
 -mpure-code is used.
>>>
>> Hi Richard,
>>
>> Thanks for your comments.
>>
>>> There are some optimizations you can make to this code.
>>>
>>> Firstly, for values between 256 and 510 (inclusive), it 
>>> would be better
>>> to just expand a mov of 255 followed by an add.
>> I now see the splitter for the "Pe" constraint which I hadn't noticed
>> before, so I can write something similar indeed.
>>
>> However, I'm not quite sure I understand the benefit of the split
>> when -mpure-code is NOT used.
>> Consider:
>> int f3_1 (void) { return 510; }
>> int f3_2 (void) { return 511; }
>> Compile with -O2 -mcpu=cortex-m0:
>> f3_1:
>> movs r0, #255
>> lsls r0, r0, #1
>> bx  lr
>> f3_2:
>> ldr r0, .L4
>> bx  lr
>>
>> The splitter makes the code bigger, does it "compensate" for 
>> this by
>> not having to load the constant?
>> Actually the constant uses 4 more bytes, which should be 
>> taken into
>> account when comparing code size,
>
> Yes, the size of the literal pool entry needs to be taken 
> into account.
>  It might happen that the entry could be shared with another 
> use of that
> literal, but in general that's rare.
>
>> so f3_1 uses 6 bytes, and f3_2 uses 8, so as you say below 
>> three
>> thumb1 instructions would be equivalent in size compared to 
>> loading
>> from the literal pool. Should the 256-510 range be extended?
>
> It's a bit borderline at three instructions when literal 
> pools are not
> expensive to 

[PATCH] Fix gnu-versioned-namespace build

2020-10-30 Thread François Dumont via Gcc-patches

The gnu-versioned-namespace build is broken.

The fix in charconv/floating_from_chars.cc is quite trivial. I am not so 
sure about the fix in sstream-inst.cc.


    libstdc++: Fix gnu-versioned-namespace build

    libstdc++-v3/ChangeLog

    * include/std/charconv (from_chars): Define only if
    _GLIBCXX_USE_CXX11_ABI.
    * src/c++17/floating_from_chars.cc (from_chars): Likewise.
    * src/c++20/sstream-inst.cc: Limit instantiations if
    _GLIBCXX_USE_CXX11_ABI.

I built the lib with this patch. I am now running tests.

OK to commit if the tests are successful?

François

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index dd1ebdf8322..90142659a0c 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -688,7 +688,7 @@ namespace __detail
   operator^=(chars_format& __lhs, chars_format __rhs) noexcept
   { return __lhs = __lhs ^ __rhs; }
 
-#if _GLIBCXX_HAVE_USELOCALE
+#if _GLIBCXX_HAVE_USELOCALE && _GLIBCXX_USE_CXX11_ABI
   from_chars_result
   from_chars(const char* __first, const char* __last, float& __value,
 	 chars_format __fmt = chars_format::general) noexcept;
diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc b/libstdc++-v3/src/c++17/floating_from_chars.cc
index d52c0a937b9..36685c2d6f4 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -41,7 +41,7 @@
 # include 
 #endif
 
-#if _GLIBCXX_HAVE_USELOCALE
+#if _GLIBCXX_HAVE_USELOCALE && _GLIBCXX_USE_CXX11_ABI
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
diff --git a/libstdc++-v3/src/c++20/sstream-inst.cc b/libstdc++-v3/src/c++20/sstream-inst.cc
index e04560d28c8..8c6840115c5 100644
--- a/libstdc++-v3/src/c++20/sstream-inst.cc
+++ b/libstdc++-v3/src/c++20/sstream-inst.cc
@@ -29,6 +29,7 @@
 // Instantiations in this file are only for the new SSO std::string ABI
 #include 
 
+#if _GLIBCXX_USE_CXX11_ABI
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -106,3 +107,5 @@ basic_stringstream::view() const noexcept;
 
 _GLIBCXX_END_NAMESPACE_VERSION
 }
+
+#endif //_GLIBCXX_USE_CXX11_ABI


Re: [PING] [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-30 Thread Richard Biener via Gcc-patches
On Fri, Oct 30, 2020 at 11:09 AM Stefan Schulze Frielinghaus via
Gcc-patches  wrote:
>
> On Wed, Oct 28, 2020 at 11:34:53AM -0600, Jeff Law wrote:
> >
> > On 10/28/20 11:29 AM, Stefan Schulze Frielinghaus wrote:
> > > On Wed, Oct 28, 2020 at 08:39:41AM -0600, Jeff Law wrote:
> > >> On 10/28/20 3:38 AM, Stefan Schulze Frielinghaus via Gcc-patches wrote:
> > >>> On Mon, Oct 05, 2020 at 02:02:57PM +0200, Stefan Schulze Frielinghaus 
> > >>> via Gcc-patches wrote:
> >  On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
> > > On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
> > >> Over the last couple of months quite a few warnings about 
> > >> uninitialized
> > >> variables were raised while building GCC.  A reason why these 
> > >> warnings
> > >> show up on S/390 only is due to the aggressive inlining settings 
> > >> here.
> > >> Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
> > >> 1657178f59b) could be fixed or in case of a false positive silenced 
> > >> by
> > >> initializing the corresponding variable.  Since the latter reoccurs 
> > >> and
> > >> while bootstrapping such warnings are turned into errors 
> > >> bootstrapping
> > >> fails on S/390 consistently.  Therefore, for the moment do not turn
> > >> those warnings into errors.
> > >>
> > >> config/ChangeLog:
> > >>
> > >>* warnings.m4: Do not turn maybe-uninitialized warnings into 
> > >> errors
> > >>on S/390.
> > >>
> > >> fixincludes/ChangeLog:
> > >>
> > >>* configure: Regenerate.
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>* configure: Regenerate.
> > >>
> > >> libcc1/ChangeLog:
> > >>
> > >>* configure: Regenerate.
> > >>
> > >> libcpp/ChangeLog:
> > >>
> > >>* configure: Regenerate.
> > >>
> > >> libdecnumber/ChangeLog:
> > >>
> > >>* configure: Regenerate.
> > > That change looks good to me. Could a global reviewer please comment!
> >  Ping
> > >>> Ping
> > >> I think this would be a huge mistake to install.
> > > The root cause why those false positives show up on S/390 only seems to
> > > be of more aggressive inlining w.r.t. other architectures.  Because of
> > > bigger caches and a rather huge function call overhead we greatly
> > > benefit from those inlining parameters. Thus:
> > >
> > > 1) Reverting those parameters would have a negative performance impact.
> > >
> > > 2) Fixing the maybe-uninitialized warnings analysis itself seems not to
> > >happen in the near future (assuming that it is fixable at all).
> > >
> > > 3) Silencing the warning by initialising the variable itself also seems
> > >to be undesired and feels like a fight against windmills ;-)
> > >
> > > 4) Not lifting maybe-uninitialized warnings to errors on S/390 only.
> > >
> > > Option (4) has the least intrusive effect to me.  At least then it is
> > > not necessary to bootstrap with --disable-werror and we would still
> > > treat all other warnings as errors.  All maybe-uninitialized warnings
> > > which are triggered in common code with non-aggressive inlining are
> > > still caught by other architectures.  Therefore, I'm wondering why this
> > > should be a huge mistake?  What would you propose instead?
> >
> > I'm aware of all that.  What I think it all argues is that y'all need to
> > address the issues because of how you've changed the tuning on the s390
> > port.  Simply disabling things like you've suggested is, IMHO, horribly
> > wrong.
> >
> >
> > Improve the analysis, dummy initializers, pragmas all seem viable.  But
> > again, it feels like it's something the s390 maintainers will have to
> > take the lead on because of how you've retuned the port.
>
> Fixing the analysis is of course the best option.  However, this sounds
> like a non-trivial task to me and I'm missing a lot of context here,
> i.e., I'm not sure what the initial goals were and if it is possible to
> meet those with the requirements which are necessary to solve those
> false positives (currently having PR96564 in mind where it was mentioned
> that alias info is not enough but also flow-based info is required; does
> this imply that we would have to reschedule the analysis at later time
> which was not desired in the first place etc.).
>
> In the past I tried to come up with some dummy initializers which were
> tough to get accepted (which I can understand up to some degree).  For
> example, this one is still open (I would be happy if you could have a
> look at it and accept/reject):
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547063.html
>
> Then there is at least one unreported case (similar to PR96564) where we
> are not talking about a variable of scalar type but of an aggregate
> where only one struct member must be initialized in order to silence the
> warning.  Not sure whether a patch would be ac

[Patch + RFC][contrib] gcc-changelog/git_commit.py: Check for missing description

2020-10-30 Thread Tobias Burnus

In terms of issues, it seems as if Ubuntu 20.04.1 LTS has a too
old unidiff – I copied the check from test_email.py and applied
it to git_email.py – otherwise, nearly all tests fail.

Still, I do see some fails – I have attached the fails I got.
(fails.log, second attachment)

Independent of that, I have now written a check for an empty
description.
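
A minimal standalone sketch of that check, for illustration only: the regex is the one from the patch below, while the function name `find_missing_descriptions` is hypothetical, and the end-of-list test `idx + 1 >= len(lines)` is an assumption about what the condition is meant to express.

```python
import re

# Regex copied from the patch: a "(section):" item, optionally preceded
# by "* file ", with nothing after the colon.
item_parenthesis_regex = re.compile(r'\t(\* \S+ )?\(\S+\):\s*$')


def find_missing_descriptions(lines):
    """Return the ChangeLog lines that look like items without a description.

    An item is reported when it is the last line of the entry or when the
    next line is itself an empty item.
    """
    missing = []
    for idx, line in enumerate(lines):
        if not item_parenthesis_regex.match(line):
            continue
        is_last = idx + 1 >= len(lines)
        if is_last or item_parenthesis_regex.match(lines[idx + 1]):
            missing.append(line)
    return missing
```

An item whose description starts on the following line is accepted under this rule; only an item with nothing after it, or one followed directly by another empty item, is reported.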

OK for the patch and thoughts about the fails?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
gcc-changelog/git_commit.py: Check for missing description

Especially when using mklog.py, it is easy to forget to fill in
the entries after the '\t* file.c (section):' or '\t(section):'.

contrib/ChangeLog:

	* gcc-changelog/git_commit.py (item_parenthesis_regex): Add.
	(parse_changelog): Detect missing descriptions.
	* gcc-changelog/git_email.py: Add unidiff_supports_renaming check.
	* gcc-changelog/test_email.py (test_emptry_entry_desc): Add.
	* gcc-changelog/test_patches.txt: Add testcase for it.

 contrib/gcc-changelog/git_commit.py|  9 +
 contrib/gcc-changelog/git_email.py |  5 +++--
 contrib/gcc-changelog/test_email.py|  5 +
 contrib/gcc-changelog/test_patches.txt | 28 
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 1d0860cddd8..6adea9124fd 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -155,6 +155,7 @@ pr_regex = re.compile(r'\tPR (?P[a-z+-]+\/)?([0-9]+)$')
 dr_regex = re.compile(r'\tDR ([0-9]+)$')
 star_prefix_regex = re.compile(r'\t\*(?P\ *)(?P.*)')
 end_of_location_regex = re.compile(r'[\[<(:]')
+item_parenthesis_regex = re.compile(r'\t(\* \S+ )?\(\S+\):\s*$')
 
 LINE_LIMIT = 100
 TAB_WIDTH = 8
@@ -474,6 +475,14 @@ class GitCommit:
 self.errors.append(Error(msg, line))
 else:
 last_entry.lines.append(line)
+for entry in self.changelog_entries:
+for idx, line in enumerate(entry.lines):
+print (line)
+if item_parenthesis_regex.match(line) and \
+   (idx+1 >= len(entry.lines) or \
+item_parenthesis_regex.match(entry.lines[idx+1])):
+msg = 'Missing description for item'
+self.errors.append(Error(msg, line))
 
 def parse_file_names(self):
 for entry in self.changelog_entries:
diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index 014fdd1004b..ef054d58d52 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -24,7 +24,7 @@ from dateutil.parser import parse
 
 from git_commit import GitCommit, GitInfo
 
-from unidiff import PatchSet
+from unidiff import PatchSet, PatchedFile
 
 DATE_PREFIX = 'Date: '
 FROM_PREFIX = 'From: '
@@ -49,6 +49,7 @@ class GitEmail(GitCommit):
 body = lines[len(header) + 1:]
 
 modified_files = []
+unidiff_supports_renaming = hasattr(PatchedFile(), 'is_rename')
 for f in diff:
 # Strip "a/" and "b/" prefixes
 source = f.source_file[2:]
@@ -58,7 +59,7 @@ class GitEmail(GitCommit):
 t = 'A'
 elif f.is_removed_file:
 t = 'D'
-elif f.is_rename:
+elif unidiff_supports_renaming and f.is_rename:
 # Consider that renamed files are two operations: the deletion
 # of the original name and the addition of the new one.
 modified_files.append((source, 'D'))
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 98f2ecd258d..3020f152192 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -370,3 +370,8 @@ class TestGccChangelog(unittest.TestCase):
 email = self.from_patch_glob('0001-tree-optimization-97633-fix')
 assert len(email.errors) == 1
 assert email.errors[0].message == 'empty group "()" found'
+
+def test_emptry_entry_desc(self):
+email = self.from_patch_glob('0001-c-Set-CALL_FROM_NEW_OR')
+assert len(email.errors) == 1
+assert email.errors[0].message == 'Missing description for item'
diff --git a/contrib/gcc-changelog/test_patches.txt b/contrib/gcc-changelog/test_patches.txt
index 148d020f23b..b1b85a4abc4 100644
--- a/contrib/gcc-changelog/test_patches.txt
+++ b/contrib/gcc-changelog/test_patches.txt
@@ -3235,4 +3235,32 @@ index 5d69a98c2a9..714e50697bd 100644
 -- 
 
 2.7.4
+=== 0001-c-Set-CALL_FROM_NEW_OR_DELETE_P-on-more-calls.patch ===
+From 4f4ced28826ece7b7b76649522ee2a9601a63b90 Mon Sep 17 00:00:00 2001
+From: Jason Merrill 
+Date: Fri, 2 Oct 2020 09:00:49 +0200
+Subject: [PATCH] c++: Set CALL_FROM_NEW_OR_DELETE_P 

Re: [PATCH] Fix gnu-versioned-namespace build

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 30/10/20 13:59 +0100, François Dumont via Libstdc++ wrote:

The gnu-versioned-namespace build is broken.

The fix in charconv/floating_from_chars.cc is quite trivial. I am not 
so sure about the fix in sstream-inst.cc.


The change for src/c++20/sstream-inst.cc is OK to commit. It would
probably be better to not build that file at all if the cxx11 ABI is
not supported at all, but then the src/c++20 directory would be empty
and I'm not sure if that would work. So just making the file empty is
fine.

The change for from_chars is not OK. With your change the 
header doesn't declare those functions if included by a file using the
old ABI. That's wrong, they should be declared unconditionally.

I see two ways to fix it. Either make the declarations in the header
depend on ! _GLIBCXX_INLINE_VERSION (so they're disabled for
gnu-versioned namespace) or fix the code in floating_from_chars to not
use a pmr::memory_resource for allocation in the versioned namespace
build.

Please commit the sstream-inst.cc part only, thanks.



Re: [PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64

2020-10-30 Thread Richard Sandiford via Gcc-patches
"yangyang (ET)"  writes:
> Although Richard mentioned in the PR that poly_uint64 will naturally 
> decay to a uint64_t in i386 target files, it seems that operation /= is not 
> supported yet, so I change "clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE 
> (base_type));" into "clonei->simdlen = clonei->simdlen / GET_MODE_BITSIZE 
> (TYPE_MODE (base_type));".

Ah, don't remember encountering that one.  But yeah, expanding the
/= seems like the best approach for now.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a8cc545c370..c630c0c7f81 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23044,18 +23044,23 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
> (struct cgraph_node *node,
>   tree base_type, int num)
>  {
>tree t, ret_type, arg_type;
> -  unsigned int elt_bits, vec_bits, count;
> +  unsigned int elt_bits, count;
> +  unsigned HOST_WIDE_INT const_simdlen;
> +  poly_uint64 vec_bits;
>  
>if (!TARGET_SIMD)
>  return 0;
>  
> -  if (clonei->simdlen
> -  && (clonei->simdlen < 2
> -   || clonei->simdlen > 1024
> -   || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +  /* For now, SVE simdclones won't produce illegal simdlen, So only check
> + const simdlens here.  */
> +  if (maybe_ne (clonei->simdlen, 0U)
> +  && (clonei->simdlen.is_constant (&const_simdlen))

Very minor, but GCC style is (mostly!) not to wrap a condition like this
in parentheses if it fits on a single line, so just:

  && clonei->simdlen.is_constant (&const_simdlen)

>else
>  {
>count = 1;
>vec_bits = clonei->simdlen * elt_bits;
> -  if (vec_bits != 64 && vec_bits != 128)
> +  /* For now, SVE simdclones won't produce illegal simdlen, So only check
> +  const simdlens here.  */
> +  if (clonei->simdlen.is_constant (&const_simdlen)
> +   && known_ne (vec_bits, 64U) && known_ne (vec_bits, 128U))

Although it won't make a difference in context due to the is_constant
check, in principle this should be “maybe_ne” rather than “known_ne”.
E.g. when testing SVE conditions, known_ne (2 + 2 * (VQ - 1), 2)
is false but maybe_ne (2 + 2 * (VQ - 1), 2) is true.

Alternatively:

   !(known_eq (vec_bits, 64U) || known_eq (vec_bits, 128U))

if that seems more natural (either's fine).
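
To make the known/maybe distinction concrete, here is a small Python model.  It is a sketch under stated assumptions, not GCC's poly_int API: a value is a pair (c0, c1) standing for c0 + c1 * x with a single runtime indeterminate x >= 0 (x playing the role of VQ - 1), and the function names merely mirror the GCC ones.

```python
from fractions import Fraction


def maybe_eq(a, b):
    """True if a0 + a1*x == b0 + b1*x for at least one integer x >= 0."""
    (a0, a1), (b0, b1) = a, b
    if a1 == b1:
        return a0 == b0
    # Solve (a1 - b1) * x == b0 - a0 and check the solution is usable.
    x = Fraction(b0 - a0, a1 - b1)
    return x >= 0 and x.denominator == 1


def known_eq(a, b):
    """True if the two values are equal for every x >= 0."""
    return a == b


def known_ne(a, b):
    """True if the two values differ for every x >= 0."""
    return not maybe_eq(a, b)


def maybe_ne(a, b):
    """True if the two values differ for at least one x >= 0."""
    return not known_eq(a, b)


# 2 + 2 * (VQ - 1) compared with the constant 2, as in the text above.
sve_lanes = (2, 2)
two = (2, 0)
```

In this model known_ne(sve_lanes, two) is False, because the values coincide at x = 0, while maybe_ne(sve_lanes, two) is True.  For an illustrative non-constant size such as (128, 128), known_ne against 128 is likewise False, so a check written with known_ne would not reject it, whereas the maybe_ne form (or the negated known_eq form) would.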

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54c2cdaf060..0ef037e5e55 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22140,7 +22140,7 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct 
> cgraph_node *node,
> || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
>  {
>warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -   "unsupported simdlen %d", clonei->simdlen);
> +   "unsupported simdlen %ld", clonei->simdlen.to_constant ());

I think this should be %wd instead.

> @@ -22267,7 +22268,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct 
> cgraph_node *node,
>if (cnt > (TARGET_64BIT ? 16 : 8))
>   {
> warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> -   "unsupported simdlen %d", clonei->simdlen);
> +   "unsupported simdlen %ld",
> +   clonei->simdlen.to_constant ());

Same here.

> @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>  veclen = node->simdclone->vecsize_int;
>else
>  veclen = node->simdclone->vecsize_float;
> -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
> -  if (veclen > node->simdclone->simdlen)
> +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
> +  if (known_gt (veclen, node->simdclone->simdlen))
>  veclen = node->simdclone->simdlen;

Although again it probably doesn't make a difference in practice,
the known/maybe situation is similar here.  When comparing:

- an SVE vector of 2 + 2 * (VQ - 1) doubles and
- an Advanced SIMD vector of 2 doubles

the Advanced SIMD version is conceptually ordered <= the SVE one,
in the sense that the SVE vector always contains a whole number of
Advanced SIMD vectors whereas the Advanced SIMD vector might not
contain a whole number of SVE vectors.

In other words, the number of lanes in the Advanced SIMD vector
is known_le the number of lanes in the SVE vector, and the number
of lanes in the SVE vector is known_ge and maybe_gt (but not known_gt)
the number of lanes in the Advanced SIMD vector.  So for these kinds of
comparison, known_gt can look a bit unexpected, even if (as here) it's
probably fine in practice.
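
The ordering relations described above can be sketched with the same kind of two-coefficient model.  Again this is only an illustration in Python, assuming a value (c0, c1) means c0 + c1 * x for a runtime x >= 0; the function names follow the GCC ones but the code is not GCC's implementation.

```python
def maybe_lt(a, b):
    """True if a < b for at least one x >= 0."""
    return a[1] < b[1] or a[0] < b[0]


def maybe_le(a, b):
    """True if a <= b for at least one x >= 0."""
    return a[1] < b[1] or a[0] <= b[0]


def known_le(a, b):
    """True if a <= b for every x >= 0."""
    return not maybe_lt(b, a)


def known_ge(a, b):
    """True if a >= b for every x >= 0."""
    return not maybe_lt(a, b)


def maybe_gt(a, b):
    """True if a > b for at least one x >= 0."""
    return maybe_lt(b, a)


def known_gt(a, b):
    """True if a > b for every x >= 0."""
    return not maybe_le(a, b)


advsimd_lanes = (2, 0)   # Advanced SIMD vector of 2 doubles
sve_lanes = (2, 2)       # SVE vector of 2 + 2 * (VQ - 1) doubles
```

Here known_le(advsimd_lanes, sve_lanes), known_ge(sve_lanes, advsimd_lanes) and maybe_gt(sve_lanes, advsimd_lanes) all hold, but known_gt(sve_lanes, advsimd_lanes) does not, since the two lane counts are equal when x is 0.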

There's currently a hard-coded assumption in this code and in the
vectoriser that both constant-length software vectors and constant-length
hardware vectors are a power of 2 in size.  This means that the > above
is effectively testing whether veclen contains a whole number of
node->simdclone->simdlens, not just whether the veclen is bigger.

So when adding the initi

Re: [PATCH] Fix gnu-versioned-namespace build

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 30/10/20 13:23 +0000, Jonathan Wakely wrote:

On 30/10/20 13:59 +0100, François Dumont via Libstdc++ wrote:

The gnu-versioned-namespace build is broken.

The fix in charconv/floating_from_chars.cc is quite trivial. I am 
not so sure about the fix in sstream-inst.cc.


The change for src/c++20/sstream-inst.cc is OK to commit. It would
probably be better to not build that file at all if the cxx11 ABI is
not supported at all, but then the src/c++20 directory would be empty
and I'm not sure if that would work. So just making the file empty is
fine.

The change for from_chars is not OK. With your change the 
header doesn't declare those functions if included by a file using the
old ABI. That's wrong, they should be declared unconditionally.

I see two ways to fix it. Either make the declarations in the header
depend on ! _GLIBCXX_INLINE_VERSION (so they're disabled for
gnu-versioned namespace) or fix the code in floating_from_chars to not
use a pmr::memory_resource for allocation in the versioned namespace
build.


Here's a patch for the second way.

A third way to fix it would be to make basic_string work with C++
allocators, so that pmr::string is usable for the gnu-versioned
namespace.

And the fourth would be to switch the versioned namespace to use the
new ABI unconditionally, instead of using the old ABI unconditionally.


diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc b/libstdc++-v3/src/c++17/floating_from_chars.cc
index d52c0a937b9f..c279809cf35d 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -27,6 +27,9 @@
 // 23.2.9  Primitive numeric input conversion [utility.from.chars]
 //
 
+// Prefer to use std::pmr::string if possible, which requires the cxx11 ABI.
+#define _GLIBCXX_USE_CXX11_ABI 1
+
 #include 
 #include 
 #include 
@@ -87,6 +90,12 @@ namespace
 void* m_ptr = nullptr;
   };
 
+#if _GLIBCXX_USE_CXX11_ABI
+  using buffered_string = std::pmr::string;
+#else
+  using buffered_string = std::string;
+#endif
+
   inline bool valid_fmt(chars_format fmt)
   {
 return fmt != chars_format{}
@@ -130,7 +139,7 @@ namespace
   // Returns a nullptr if a valid pattern is not present.
   const char*
   pattern(const char* const first, const char* last,
-	  chars_format& fmt, pmr::string& buf)
+	  chars_format& fmt, buffered_string& buf)
   {
 // fmt has the value of one of the enumerators of chars_format.
 __glibcxx_assert(valid_fmt(fmt));
@@ -359,6 +368,22 @@ namespace
 return result;
   }
 
+#if ! _GLIBCXX_USE_CXX11_ABI
+  inline bool
+  reserve_string(std::string& s) noexcept
+  {
+__try
+  {
+	s.reserve(buffer_resource::guaranteed_capacity());
+  }
+__catch (const std::bad_alloc&)
+  {
+	return false;
+  }
+return true;
+  }
+#endif
+
 } // namespace
 
 // FIXME: This should be reimplemented so it doesn't use strtod and newlocale.
@@ -369,10 +394,16 @@ from_chars_result
 from_chars(const char* first, const char* last, float& value,
 	   chars_format fmt) noexcept
 {
+  errc ec = errc::invalid_argument;
+#if _GLIBCXX_USE_CXX11_ABI
   buffer_resource mr;
   pmr::string buf(&mr);
+#else
+  string buf;
+  if (!reserve_string(buf))
+return make_result(first, 0, {}, ec);
+#endif
   size_t len = 0;
-  errc ec = errc::invalid_argument;
   __try
 {
   if (const char* pat = pattern(first, last, fmt, buf)) [[likely]]
@@ -389,10 +420,16 @@ from_chars_result
 from_chars(const char* first, const char* last, double& value,
 	   chars_format fmt) noexcept
 {
+  errc ec = errc::invalid_argument;
+#if _GLIBCXX_USE_CXX11_ABI
   buffer_resource mr;
   pmr::string buf(&mr);
+#else
+  string buf;
+  if (!reserve_string(buf))
+return make_result(first, 0, {}, ec);
+#endif
   size_t len = 0;
-  errc ec = errc::invalid_argument;
   __try
 {
   if (const char* pat = pattern(first, last, fmt, buf)) [[likely]]
@@ -409,10 +446,16 @@ from_chars_result
 from_chars(const char* first, const char* last, long double& value,
 	   chars_format fmt) noexcept
 {
+  errc ec = errc::invalid_argument;
+#if _GLIBCXX_USE_CXX11_ABI
   buffer_resource mr;
   pmr::string buf(&mr);
+#else
+  string buf;
+  if (!reserve_string(buf))
+return make_result(first, 0, {}, ec);
+#endif
   size_t len = 0;
-  errc ec = errc::invalid_argument;
   __try
 {
   if (const char* pat = pattern(first, last, fmt, buf)) [[likely]]


Re: [RFC, testsuite] Add dg-save-linenr

2020-10-30 Thread Thomas Schwinge
Hi Jakub!

On 2020-10-30T12:40:02+0100, Jakub Jelinek  wrote:
> On Fri, Oct 30, 2020 at 12:34:57PM +0100, Thomas Schwinge wrote:
>> On 2017-05-22T18:55:29+0200, Tom de Vries  wrote:
>> > On 05/16/2017 03:12 PM, Rainer Orth wrote:
>> >> [...], but the new proc ['dg-line'] needs documenting in sourcebuild.texi.
>> >
>> > Attached patch adds the missing documentation.
>>
>> OK to expand that with the attached patch to "Document that 'linenumvar'
>> in 'dg-line' may contain Tcl syntax"?  (Hooray for embedded Tcl!  --
>> Don't hurt me; I (later) have a use case where this does make things
>> easier.)
>
> Is it desirable though?

I hear you.

> I mean if we ever decide to switch from dejagnu to something else,
> adding parsing of our dg-* grammar is not that hard, and while we rely
> on some tcl details already (e.g. the {}s vs. ""s for regular expressions
> etc.), allowing arbitrary embedded tcl will make that effort even harder.

(It's not much, but note that there already are some more "arbitrary"
Tcl-y idioms in the testsuite.)

I had considered the point you're making, but it's already many years
(decades?) that we (meaning: some?) would like to switch away from
DejaGnu (to what else -- QMTest apparently isn't it?) -- so, this isn't
going to happen next week.  If we then ever port to something else, I'm
sure the new system will be likewise expressive/extensible.  Thus I
decided to use the convenience now, and defer the potential (minor,
compared to the overall effort) complication until then.


Grüße
 Thomas


Re: [PATCH] calls.c:can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check

2020-10-30 Thread Alan Modra via Gcc-patches
On Fri, Oct 30, 2020 at 09:21:09AM +0000, Richard Sandiford wrote:
> Alan Modra via Gcc-patches  writes:
> > This moves an #ifdef block of code from calls.c to
> > targetm.function_ok_for_sibcall.  Only two targets, x86 and rs6000,
> > define REG_PARM_STACK_SPACE or OUTGOING_REG_PARM_STACK_SPACE macros
> > that might vary depending on the called function.  Macros like
> > UNITS_PER_WORD don't change over a function boundary, nor does the
> > MIPS ABI, nor does TARGET_64BIT on PA-RISC.  Other targets are even
> > more trivially seen to not need the calls.c code.
> >
> > Besides cleaning up a small piece of #ifdef code, the motivation for
> > this patch is to allow tail calls on PowerPC for functions that
> > require less reg_parm_stack_space than their caller.  The original
> > code in calls.c only permitted tail calls when exactly equal.
> 
> Is there something PowerPC-specific that makes the relaxation safe
> for that target while not being safe on x86?

It is quite possible that x86 can relax this condition too, I'm just
not familiar enough with all the x86 ABIs to know with any certainty.  By
moving the test to the target hook we allow target maintainers to have
full say in the matter.

> I take your point about x86 and PowerPC being the only two affected
> targets.  But the interface does still take an fndecl on all targets,
> so I think the target-independent assumption should be that the value
> might vary depending on function.  So I guess an alternative would be
> to relax the target-independent condition and make the x86 hook enforce
> the stricter condition (if it really is needed).

Yes, except that actually the REG_PARM_STACK_SPACE condition for
PowerPC can be removed entirely.  I agree that doing as you suggest
would be OK for PowerPC, it would just mean we continue to do some
unnecessary work in the non-trivial rs6000_function_parms_need_stack.
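
To make the two variants of the check concrete, here is a minimal sketch.
It is not the code in calls.c or in either target hook: the function names
are made up for illustration, and the two sizes are passed as plain
integers, whereas in GCC they would come from REG_PARM_STACK_SPACE applied
to the caller and the callee.

```c
#include <assert.h>
#include <stdbool.h>

/* Strict form: a sibling call is allowed only when caller and callee
   reserve exactly the same amount of register-parameter stack space.  */
static bool
sibcall_ok_strict (unsigned caller_space, unsigned callee_space)
{
  return caller_space == callee_space;
}

/* Relaxed form: the callee may need less space than the caller has
   already reserved, but never more.  */
static bool
sibcall_ok_relaxed (unsigned caller_space, unsigned callee_space)
{
  return callee_space <= caller_space;
}

/* Illustrative numbers only: a callee that needs no such space, called
   from a caller that reserved 64 bytes.  */
void
sibcall_demo (void)
{
  assert (!sibcall_ok_strict (64, 0));
  assert (sibcall_ok_relaxed (64, 0));
}
```

Whether the relaxed form is safe is an ABI question, which is the reason
for leaving the decision to each target's hook.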

Would it be better if I post the patches again, restructuring them as
1) completely no functional change just moving the existing condition
   to the powerpc and i386 target hooks, and
2) twiddling the powerpc target hook?

Thanks for your time spent reviewing, and comments!

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RFC] Add support for the "retain" attribute utilizing SHF_GNU_RETAIN

2020-10-30 Thread Jozef Lawrynowicz
On Mon, Oct 26, 2020 at 07:08:06PM +, Pedro Alves via Gcc-patches wrote:
> On 10/6/20 12:10 PM, Jozef Lawrynowicz wrote:
> 
> > Should "used" apply SHF_GNU_RETAIN?
> > ===
> > Another talking point is whether the existing "used" attribute should
> > apply the SHF_GNU_RETAIN flag to the containing section.
> > 
> > It seems unlikely that a user applies the "used" attribute to a
> > declaration, and means for it to be saved from only compiler
> > optimization, but *not* linker optimization. So perhaps it would be
> > beneficial for "used" to apply SHF_GNU_RETAIN in some way.
> > 
> > If "used" did apply SHF_GNU_RETAIN, we would also have to
> > consider the above options for how to apply SHF_GNU_RETAIN to the
> > section. Since the "used" attribute has been around for a while 
> > it might not be appropriate for its behavior to be changed to place the
> > associated declaration in its own, unique section, as in option (2).
> > 
> 
> To me, if I use attribute((used)), and the linker still garbage
> collects the symbol, then the toolchain has a bug.  Is there any
> use case that would suggest otherwise?

I revised the implementation so the "used" attribute will save the
symbol from garbage collection.

By implementing the TARGET_MARK_DECL_PRESERVED macro, a
".retain " directive will be emitted for decls that had the
"used" attribute applied.

GAS will set the SHF_GNU_RETAIN flag on sections containing symbols
referenced in ".retain" directives.
GAS still supports setting the "R" flag in .section directives, but GCC
won't emit these.

LD will save SHF_GNU_RETAIN sections from garbage collection.
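
As a small illustration of the kind of declaration this is about, here is
a sketch with made-up names (boot_magic and boot_hook are not from the
patches).  Neither object is referenced from anywhere else in the
translation unit, so without "used" the compiler could drop both.  Under
the revised implementation described above, the assumption is that the
assembler and linker steps would then also keep the containing sections
when linking with section garbage collection.

```c
/* Hypothetical data that only something outside the compiler's view
   (start-up code, a debugger, ...) is assumed to look at.  */
static const int boot_magic __attribute__ ((used)) = 0x1234;

/* Hypothetical hook with no direct callers in this file.  The "used"
   attribute stops the compiler from discarding it.  */
static int __attribute__ ((used))
boot_hook (void)
{
  return boot_magic;
}
```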

For reference, I've attached the Binutils and GCC patches that implement
this. The results from a bootstrap and regtest for x86_64-pc-linux-gnu
and light testing for arm-eabi are looking good, but I need to add more
tests and clean the patches up before final submission.

Thanks!
Jozef

> 
> Thanks,
> Pedro Alves
> 
diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
index 140a98594d..ffb75f7919 100644
--- a/bfd/elf-bfd.h
+++ b/bfd/elf-bfd.h
@@ -1897,14 +1897,15 @@ struct output_elf_obj_tdata
   bfd_boolean flags_init;
 };
 
-/* Indicate if the bfd contains SHF_GNU_MBIND sections or symbols that
-   have the STT_GNU_IFUNC symbol type or STB_GNU_UNIQUE binding.  Used
-   to set the osabi field in the ELF header structure.  */
+/* Indicate if the bfd contains SHF_GNU_MBIND/SHF_GNU_RETAIN sections or
+   symbols that have the STT_GNU_IFUNC symbol type or STB_GNU_UNIQUE
+   binding.  Used to set the osabi field in the ELF header structure.  */
 enum elf_gnu_osabi
 {
   elf_gnu_osabi_mbind = 1 << 0,
   elf_gnu_osabi_ifunc = 1 << 1,
   elf_gnu_osabi_unique = 1 << 2,
+  elf_gnu_osabi_retain = 1 << 3,
 };
 
 typedef struct elf_section_list
@@ -2034,7 +2035,7 @@ struct elf_obj_tdata
   ENUM_BITFIELD (dynamic_lib_link_class) dyn_lib_class : 4;
 
   /* Whether the bfd uses OS specific bits that require ELFOSABI_GNU.  */
-  ENUM_BITFIELD (elf_gnu_osabi) has_gnu_osabi : 3;
+  ENUM_BITFIELD (elf_gnu_osabi) has_gnu_osabi : 4;
 
   /* Whether if the bfd contains the GNU_PROPERTY_NO_COPY_ON_PROTECTED
  property.  */
diff --git a/bfd/elf.c b/bfd/elf.c
index 9d7cbd52e0..8ec21d7705 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -1066,9 +1066,12 @@ _bfd_elf_make_section_from_shdr (bfd *abfd,
   /* FIXME: We should not recognize SHF_GNU_MBIND for ELFOSABI_NONE,
 but binutils as of 2019-07-23 did not set the EI_OSABI header
 byte.  */
-case ELFOSABI_NONE:
 case ELFOSABI_GNU:
 case ELFOSABI_FREEBSD:
+  if ((hdr->sh_flags & SHF_GNU_RETAIN) != 0)
+   elf_tdata (abfd)->has_gnu_osabi |= elf_gnu_osabi_retain;
+  /* Fall through */
+case ELFOSABI_NONE:
   if ((hdr->sh_flags & SHF_GNU_MBIND) != 0)
elf_tdata (abfd)->has_gnu_osabi |= elf_gnu_osabi_mbind;
   break;
@@ -12454,8 +12457,8 @@ _bfd_elf_final_write_processing (bfd *abfd)
 i_ehdrp->e_ident[EI_OSABI] = get_elf_backend_data (abfd)->elf_osabi;
 
   /* Set the osabi field to ELFOSABI_GNU if the binary contains
- SHF_GNU_MBIND sections or symbols of STT_GNU_IFUNC type or
- STB_GNU_UNIQUE binding.  */
+ SHF_GNU_MBIND or SHF_GNU_RETAIN sections or symbols of STT_GNU_IFUNC type
+ or STB_GNU_UNIQUE binding.  */
   if (elf_tdata (abfd)->has_gnu_osabi != 0)
 {
   if (i_ehdrp->e_ident[EI_OSABI] == ELFOSABI_NONE)
@@ -12464,11 +12467,17 @@ _bfd_elf_final_write_processing (bfd *abfd)
   && i_ehdrp->e_ident[EI_OSABI] != ELFOSABI_FREEBSD)
{
  if (elf_tdata (abfd)->has_gnu_osabi & elf_gnu_osabi_mbind)
-   _bfd_error_handler (_("GNU_MBIND section is unsupported"));
+   _bfd_error_handler (_("GNU_MBIND section is supported only by GNU "
+ "and FreeBSD targets"));
  if (elf_tdata (abfd)->has_gnu_osabi & elf_gnu_osabi_ifunc)
-   _bfd_error_handler (_("symbol type STT_GNU_IFUNC is unsupported"));
+   

Re: [PATCH, OpenMP 5.0] Implement structure element mapping changes in 5.0

2020-10-30 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 26, 2020 at 09:10:08AM +0100, Jakub Jelinek via Gcc-patches wrote:
> Yes, it is a QoI and it is important not to regress about that.
> Furthermore, the more we diverge from what the spec says, it will be harder
> for us to implement, not just now, but in the future too.
> What I wrote about the actual implementation is actually not accurate, we
> need the master and slaves to be the struct splay_tree_key_s objects.
> And that one already has the aux field that could be used for the slaves,
> so we could e.g. use another magic value of refcount, e.g. REFCOUNT_SLAVE
> ~(uintptr_t) 2, and in that case aux would point to the master
> splay_tree_key_s.
> 
> And the 
> "If the corresponding list item’s reference count was not already
> incremented because of the effect of a map clause on the construct then:
> a) The corresponding list item’s reference count is incremented by one;"
> and
> "If the map-type is not delete and the corresponding list item’s reference
> count is finite and was not already decremented because of the effect of a
> map clause on the construct then:
> a) The corresponding list item’s reference count is decremented by one;"
> rules we need to implement in any case, I don't see a way around that.
> The same list item can now be mapped (or unmapped) multiple times on the same
> construct.

To show what exactly I meant, here is a proof-of-concept (but unfinished)
patch.
For OpenMP only (I believe OpenACC ATM has neither the concept of
structure sibling lists nor the OpenMP 5.0 requirement that on one construct
one refcount isn't incremented or decremented multiple times) it uses the
dynamic_refcount field, otherwise only used in OpenACC, for the structure
sibling lists; in particular, all but the first mapping in a structure
sibling list will have refcount == REFCOUNT_SIBLING and dynamic_refcount
pointing to their master's refcount field.  And the master has
dynamic_refcount set to the number of REFCOUNT_SIBLING entries
following it.
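
The layout described in the previous paragraph can be sketched roughly as
follows.  This is a simplified illustration, not the libgomp code: struct
map_key stands in for splay_tree_key_s and keeps only the two fields that
matter here, and the helper names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Same magic value as in the libgomp.h hunk of the patch.  */
#define REFCOUNT_SIBLING (~(uintptr_t) 2)

/* Simplified stand-in for splay_tree_key_s.  */
struct map_key
{
  uintptr_t refcount;
  uintptr_t dynamic_refcount;
};

/* Return the counter that really tracks KEY's mapping: for a sibling,
   the master's refcount; otherwise KEY's own.  */
static uintptr_t *
actual_refcount (struct map_key *key)
{
  if (key->refcount == REFCOUNT_SIBLING)
    return (uintptr_t *) key->dynamic_refcount;
  return &key->refcount;
}

/* Set up N keys as one structure sibling list; keys[0] is the master.  */
static void
link_siblings (struct map_key *keys, uintptr_t n)
{
  assert (n > 0);
  keys[0].refcount = 1;
  keys[0].dynamic_refcount = n - 1;  /* number of REFCOUNT_SIBLING keys */
  for (uintptr_t i = 1; i < n; i++)
    {
      keys[i].refcount = REFCOUNT_SIBLING;
      keys[i].dynamic_refcount = (uintptr_t) &keys[0].refcount;
    }
}
```

With that, incrementing through any member of the list changes the one
shared counter, which is the "single refcount for the whole list"
behaviour.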

In the patch I've only changed the construction of such splay_tree_keys
and changed gomp_exit_data to deal with those (that is the very easy
part) plus implemented the OpenMP 5.0 rule that one refcount isn't
decremented more than once.
What would need to be done is handle the rest, in particular (for OpenMP
only) adjust the refcount (splay_tree_key only, not target_mem_desc), such
that for the just created splay_tree_keys (refcount pointers in between
tgt->array and end of the array; perhaps we should add a field how many
elts the array has) it doesn't bump anything - just rely on the refcount = 1
we do elsewhere, and for other refcounts, if REFCOUNT_SIBLING, use the
dynamic_refcount pointer and if not REFCOUNT_INFINITY, instead of bumping
the refcount queue it for later increments (again, with alloca'ed list).
And when unmapping at the end of target or target data, do something similar
to what gomp_exit_data does in the patch (perhaps with some helper
functions).

At least from omp-lang discussions, the intent is that e.g. on
struct S { int a, b, c, d, e; } s = { 1, 2, 3, 4, 5};
#pragma omp target enter data map (s)
// same thing as
// #pragma omp target enter data map (s.a, s.b, s.c, s.d, s.e)
// The above at least theoretically creates 5 mappings, with
// refcount set to 1 for each (but with all those refcount behaving
// in sync), but I'd strongly prefer to create just one with one refcount.
int *p = &s.b;
int *q = &s.d;
#pragma omp target enter data map (p[:1]) map (q[:1])
// Above needs to bump either the refcounts of all of s.a, s.b, s.c, s.d and
// s.e by 1, or when it all has just a single refcount, bump it also just by
// 1.

int a;
#pragma omp target enter data map (a)
// This creates just one mapping and sets refcount to 1
// as int is not an aggregate
char *r, *s;
r = (char *) &a;
s = r + 2;
#pragma omp target enter data map (r[:1], s[:1])
// The above should bump the refcount of a just once, not twice in OpenMP
// 5.0.

For both testcases, I guess one can try to construct from that user
observable tests where the refcount will result in copying the data back at
certain points (or not).
And for the non-contiguous structure element mappings, the idea would
be that we still use a single refcount for the whole structure sibling list
defined in the spec.
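
The "bump the refcount just once per construct" rule from the second
testcase could be sketched like this.  It is only an illustration under
assumptions: the bumped_set type, the fixed MAX_CLAUSES bound and the
function name are invented, and the real code would more likely use an
alloca'ed list sized by the number of map clauses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLAUSES 16  /* arbitrary bound for the sketch */

/* Refcounts that were already incremented while processing one
   construct.  */
struct bumped_set
{
  uintptr_t *seen[MAX_CLAUSES];
  size_t n;
};

/* Increment *RC unless this construct has already done so.
   Returns 1 if the increment happened, 0 if it was skipped.  */
static int
bump_once (struct bumped_set *set, uintptr_t *rc)
{
  for (size_t i = 0; i < set->n; i++)
    if (set->seen[i] == rc)
      return 0;
  assert (set->n < MAX_CLAUSES);
  set->seen[set->n++] = rc;
  ++*rc;
  return 1;
}
```

For map (r[:1], s[:1]) above, both clauses resolve to the refcount of a,
so the second call is a no-op and a ends up with refcount 2 rather than 3.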

--- libgomp/libgomp.h.jj        2020-10-30 12:57:16.176284101 +0100
+++ libgomp/libgomp.h   2020-10-30 12:57:40.264014514 +0100
@@ -1002,6 +1002,10 @@ struct target_mem_desc {
 /* Special value for refcount - tgt_offset contains target address of the
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
+/* Special value for refcount - structure sibling list item other than
+   the first one.  *(uintptr_t *)dynamic_refcount is the actual refcount
+   for it.  */
+#define REFCOUNT_SIBLING (~(uintptr_t) 2)
 
 /* Special offset values.  */
 #define OFFSET_INLINED (~(uintptr_t) 0)
--- libgomp/target.c.jj 2020-10-30 12:57:1

Re: [PATCH][AArch64] ACLE intrinsics: get low/high half from BFloat16 vector

2020-10-30 Thread Richard Sandiford via Gcc-patches
Dennis Zhang  writes:
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 332a0b6b1ea..39ebb776d1d 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -719,6 +719,9 @@
>VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, ALL, v4sf)
>VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, ALL, v4sf)
>  
> +  /* Implemented by aarch64_vget_halfv8bf.  */
> +  VAR1 (GETREG, vget_half, 0, ALL, v8bf)

This should be AUTO_FP, since it doesn't have any side-effects.
(As before, we should probably rename the flag, but that's separate work.)

> +
>/* Implemented by aarch64_simd_mmlav16qi.  */
>VAR1 (TERNOP, simd_smmla, 0, NONE, v16qi)
>VAR1 (TERNOPU, simd_ummla, 0, NONE, v16qi)
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 9f0e2bd1e6f..f62c52ca327 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7159,6 +7159,19 @@
>[(set_attr "type" "neon_dot")]
>  )
>  
> +;; vget_low/high_bf16
> +(define_expand "aarch64_vget_halfv8bf"
> +  [(match_operand:V4BF 0 "register_operand")
> +   (match_operand:V8BF 1 "register_operand")
> +   (match_operand:SI 2 "aarch64_zero_or_1")]
> +  "TARGET_BF16_SIMD"
> +{
> +  int hbase = INTVAL (operands[2]);
> +  rtx sel = aarch64_gen_stepped_int_parallel (4, hbase * 4, 1);

I think this needs to be:

  aarch64_simd_vect_par_cnst_half

instead.  The issue is that on big-endian targets, GCC assumes vector
lane 0 is in the high part of the register, whereas for AArch64 it's
always in the low part of the register.  So we convert from AArch64
numbering to GCC numbering when generating the rtx and then take
endianness into account when matching the rtx later.

It would be good to have -mbig-endian tests that make sure we generate
the right instruction for each function (i.e. we get them the right way
round).  I guess it would be good to test that for little-endian too.
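
To spell out the numbering issue, here is a small sketch of which GCC lane
a half starts at.  It only illustrates the convention described above and
is not the body of aarch64_simd_vect_par_cnst_half; the function name and
parameters are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* First GCC lane number of the half that the architecture calls "low"
   (HIGH == false) or "high" (HIGH == true), for a vector of NUNITS
   lanes.  */
static int
gcc_half_base (int nunits, bool high, bool big_endian)
{
  assert (nunits > 0 && nunits % 2 == 0);
  int low_base = 0;
  int high_base = nunits / 2;
  if (big_endian)
    /* GCC numbers lanes from the other end of the register here, so
       the architectural high half starts at GCC lane 0.  */
    return high ? low_base : high_base;
  return high ? high_base : low_base;
}
```

So for V8BF, "hbase * 4" gives the right starting lane on little-endian
only; on big-endian the two halves swap.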

> +  emit_insn (gen_aarch64_get_halfv8bf (operands[0], operands[1], sel));
> +  DONE;
> +})
> +
>  ;; bfmmla
>  (define_insn "aarch64_bfmmlaqv4sf"
>[(set (match_operand:V4SF 0 "register_operand" "=w")
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index 215fcec5955..0c8bc2b0c73 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -84,6 +84,10 @@
>(ior (match_test "op == constm1_rtx")
> (match_test "op == const1_rtx"))
>  
> +(define_predicate "aarch64_zero_or_1"
> +  (and (match_code "const_int")
> +   (match_test "op == const0_rtx || op == const1_rtx")))

zero_or_1 looked odd to me, feels like it should be 0_or_1 or zero_or_one.
But I see that it's for consistency with aarch64_reg_zero_or_m1_or_1,
so let's keep it as-is.

Thanks,
Richard


Re: Revert "[nvptx, libgomp] Update pr85381-{2, 4}.c test-cases" [PR89713, PR94392] (was: [PATCH][RFC] c/94392 - only enable -ffinite-loops for C++)

2020-10-30 Thread Thomas Schwinge
Hi!

Frederik stumbled over a related thing.

On 2020-04-03T12:36:29+0200, Richard Biener  wrote:
> On Fri, 3 Apr 2020, Thomas Schwinge wrote:
>> On 2020-04-02T11:12:48+0200, Richard Biener  wrote:
>> > On Wed, 1 Apr 2020, Jason Merrill wrote:
>> >
>> >> On 4/1/20 9:36 AM, Richard Biener wrote:
>> >> > This does away with enabling -ffinite-loops at -O2+ for all languages
>> >> > and instead enables it selectively for C++ only.
>>
>> > I'm retesting the following [...]
>>
>> ..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
>> "c/94392 - only enable -ffinite-loops for C++".
>>
>> I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b
>> 'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713,
>> PR94392]'.  As can be observed in two nvptx offloading test cases
>> regressing, 'apparently now again "empty oacc loops are" no longer
>> "removed before expand"' (quoting myself from the commit log, adapting
>> Tom's commit log snippet from the reverted commit).
>>
>> It's not obvious to me how the "finite loop" property discussed/changed
>> in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 -
>> only enable -ffinite-loops for C++" relates to the previously observed
>> optimization of removing "empty oacc loops [...] before expand" (after
>> PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c "PR
>> tree-optimization/89713 - Assume loop with an exit is finite"), but
>> examining that in detail is for another day.
>
> For C we no longer have -ffinite-loops in effect but for C++ we still
> do.  But since the testcase is c/c++ common I'd have expected it
> now fails "split" ... so an explicit -fno-finite-loops or
> -ffinite-loops with an explanation would be easier.

(Thanks, and ACK; still have to look into that.)

> Note there's now also the opportunity to set the loop flag for
> OpenACC/OpenMP annotated loops if any of that guarantees finiteness.
> (for GCC11 only please)

On 2020-04-03T13:34:18+0200, Jakub Jelinek  wrote:
> Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of
> iterations must be computable before the loop and must fit into the type in
> which that count is computed without overflows.

Specifically, is that computable at run-time or compile-time?

Similar for OpenACC.  For example, OpenACC 3.0, 2.9. "Loop Construct",
"Restrictions": "A loop associated with a 'loop' construct that does not
have a 'seq' clause must be written such that the loop iteration count is
computable when entering the loop construct".

(This can only be viewed by members of the OpenACC GitHub organization, but
I wanted to share the pointer anyway, and can relay discussion as
necessary.)  For the next version of OpenACC (soon!), this is being
further clarified as per the current discussion in
 "Proposed changes for
range-based for loops and != operator", which should be relevant here:

|   - A loop associated with a 'loop' construct that does not have a 'seq'
| clause must be written to meet all of the following conditions:
|
|   - The loop variable must be of integer, C/C++ pointer, or C++
| random-access iterator type.
|
|   - The loop variable must monotonically increase or decrease in the
| direction of its termination condition.
|
|   - The loop iteration count must be computable in constant time when
| entering the 'loop' construct.
|
| For a C++ range-based 'for' loop, the loop variable identified by the
| above conditions is the internal iterator, such as a pointer, that
| the compiler generates to iterate the range.  It is not the variable
| declared by the 'for' loop.

(Notice: "computable in constant time" (which means: not computable only
by actually iterating the whole loop structure), and "computable [...]
when entering" (which, if I got that right, means: at run-time, not
necessarily already at compile-time?).)


Grüße
 Thomas


Re: [PATCH] Fix gnu-versioned-namespace build

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 30/10/20 13:38 +, Jonathan Wakely wrote:
> On 30/10/20 13:23 +, Jonathan Wakely wrote:
>> On 30/10/20 13:59 +0100, François Dumont via Libstdc++ wrote:
>>> The gnu-versioned-namespace build is broken.
>>>
>>> The fix in charconv/floating_from_chars.cc is quite trivial. I am
>>> not so sure about the fix in sstream-inst.cc.
>>
>> The change for src/c++20/sstream-inst.cc is OK to commit. It would
>> probably be better to not build that file at all if the cxx11 ABI is
>> not supported at all, but then the src/c++20 directory would be empty
>> and I'm not sure if that would work. So just making the file empty is
>> fine.
>>
>> The change for from_chars is not OK. With your change the
>> header doesn't declare those functions if included by a file using the
>> old ABI. That's wrong, they should be declared unconditionally.
>>
>> I see two ways to fix it. Either make the declarations in the header
>> depend on ! _GLIBCXX_INLINE_VERSION (so they're disabled for
>> gnu-versioned namespace) or fix the code in floating_from_chars to not
>> use a pmr::memory_resource for allocation in the versioned namespace
>> build.
>
> Here's a patch for the second way.
>
> A third way to fix it would be to make basic_string work with C++

Oops, I meant "make the old basic_string work with C++11 allocators"
of course.

> allocators, so that pmr::string is usable for the gnu-versioned
> namespace.
>
> And the fourth would be to switch the versioned namespace to use the
> new ABI unconditionally, instead of using the old ABI unconditionally.





Re: Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713, PR94392] (was: [PATCH][RFC] c/94392 - only enable -ffinite-loops for C++)

2020-10-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 30, 2020 at 03:09:31PM +0100, Thomas Schwinge wrote:
> On 2020-04-03T13:34:18+0200, Jakub Jelinek  wrote:
> > Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of
> > iterations must be computable before the loop and must fit into the type in
> > which that count is computed without overflows.
> 
> Specifically, is that computable at run-time or compile-time?

At run-time.  OpenMP certainly doesn't disallow loops with non-constant
steps or lower/upper bound expressions (ok, there is one exception, for
!= loop condition the step must be a constant expression, the step has to
be just 1 or -1).
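
As an illustration of "computable before the loop, at run-time", here is a
sketch for the simplest case only, a loop of the form
for (i = lb; i < ub; i += step) with a positive step.  The function name is
made up, and the other canonical loop forms (<=, >, >=, != and negative
steps) are left out.

```c
#include <assert.h>

/* Number of iterations of: for (i = lb; i < ub; i += step), step > 0.
   LB, UB and STEP may all be values only known at run-time; the count
   is still a closed-form expression evaluated before the first
   iteration.  */
static unsigned long
trip_count (long lb, long ub, long step)
{
  assert (step > 0);
  if (lb >= ub)
    return 0;
  /* Subtract in unsigned arithmetic so that far-apart bounds do not
     cause signed overflow.  */
  unsigned long span = (unsigned long) ub - (unsigned long) lb;
  return (span - 1) / (unsigned long) step + 1;
}
```

Because the count is finite and known on entry, the loop cannot run
forever, which is what the finiteness flag needs.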

Jakub



Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-30 Thread Qing Zhao via Gcc-patches



> On Oct 30, 2020, at 4:54 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> @@ -3996,22 +3996,19 @@ with a named @code{target} must be @code{static}.
>> @cindex @code{zero_call_used_regs} function attribute
>> 
>> The @code{zero_call_used_regs} attribute causes the compiler to zero
>> -a subset of all call-used registers at function return according to
>> -@var{choice}.
>> -This is used to increase the program security by either mitigating
>> -Return-Oriented Programming (ROP) or preventing information leak
>> +a subset of all call-used registers@footnote{A ``call-used'' register
>> +is a register whose contents can be changed by a function call;
>> +therefore, a caller cannot assume that the register has the same contents
>> +on return from the function as it had before calling the function.  Such
>> +registers are also called ``call-clobbered'', ``caller-saved'', or
>> +``volatile''.} at function return.
>> +This is used to increase program security by either mitigating
>> +Return-Oriented Programming (ROP) attacks or preventing information leakage
>> through registers.
>> 
>> -A ``call-used'' register is a register whose contents can be changed by
>> -a function call; therefore, a caller cannot assume that the register has
>> -the same contents on return from the function as it had before calling
>> -the function.  Such registers are also called ``call-clobbered'',
>> -``caller-saved'', or ``volatile''.
>> -
>> In order to satisfy users with different security needs and control the
>> -run-time overhead at the same time, GCC provides a flexible way to choose
>> -the subset of the call-used registers to be zeroed.
>> -
>> +run-time overhead at the same time, @var{choice} parameter provides a
> 
> I suggested “the @var{choice} parameter provides” in the review yesterday.
> The “the” is needed.
My bad, added it.
> 
>> +flexible way to choose the subset of the call-used registers to be zeroed.
>> The three basic values of @var{choice} are:
>> 
>> @itemize @bullet
>> @@ -4046,42 +4043,41 @@ together, they must appear in the order above.
>> 
>> The full list of @var{choice}s is therefore:
>> 
>> -@itemize @bullet
>> -@item
>> -@samp{skip} doesn't zero any call-used register.
>> +@table @code
>> +@item skip
>> +doesn't zero any call-used register.
>> 
>> -@item
>> -@samp{used} only zeros call-used registers that are used in the function.
>> +@item used
>> +only zeros call-used registers that are used in the function.
>> 
>> -@item
>> -@samp{all} zeros all call-used registers.
>> +@item used-gpr
>> +only zeros call-used general purpose registers that are used in the function.
>> 
>> -@item
>> -@samp{used-arg} only zeros used call-used registers that pass arguments.
>> +@item used-arg
>> +only zeros call-used registers that are used in the function and pass arguments.
>> 
>> -@item
>> -@samp{used-gpr} only zeros used call-used general purpose registers.
>> +@item used-gpr-arg
>> +only zeros call-used general purpose registers that are used in the function
>> +and pass arguments.
>> 
>> -@item
>> -@samp{used-gpr-arg} only zeros used call-used general purpose registers that
>> -pass arguments.
>> +@item all
>> +zeros all call-used registers.
>> 
>> -@item
>> -@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
>> -arguments.
>> +@item all-gpr
>> +zeros all call-used general purpose registers.
>> 
>> -@item
>> -@samp{all-arg} zeros all call-used registers that pass arguments.
>> +@item all-arg
>> +zeros all call-used registers that pass arguments.
>> 
>> -@item
>> -@samp{all-gpr} zeros all call-used general purpose registers.
>> -@end itemize
>> +@item all-gpr-arg
>> +zeros all call-used general purpose registers that pass
>> +arguments.
>> +@end table
> 
> TBH I also think the order I suggested yesterday is more natural
> than this one, but either's OK.  The above certainly addresses
> the original concern I had about the order being inconsistent.

You suggested:

- skip
- used
- used-arg
- used-gpr
- used-gpr-arg
- all
- all-arg
- all-gpr
- all-gpr-arg

I changed it to the following (swapping the order of used-arg and used-gpr,
and of all-arg and all-gpr):

- skip
- used
- used-gpr
- used-arg
- used-gpr-arg
- all
- all-gpr
- all-arg
- all-gpr-arg

I made this change intentionally, so that “gpr” comes before “arg”.
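
For readers who have not seen the attribute in use, here is a minimal
sketch.  It assumes the attribute spelling and the "used-gpr" choice from
the documentation hunk quoted above; the function check_token and the
ZERO_USED_GPR macro are made up, and the guard makes the attribute a no-op
on compilers that do not know it.

```c
#include <assert.h>
#include <string.h>

#if defined (__has_attribute)
# if __has_attribute (zero_call_used_regs)
#  define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPR
# define ZERO_USED_GPR  /* attribute not available: expands to nothing */
#endif

/* Hypothetical function that touches sensitive data.  With the
   attribute, the call-used general purpose registers that the function
   used are zeroed before it returns.  */
ZERO_USED_GPR int
check_token (const char *token)
{
  assert (token != NULL);
  return strcmp (token, "s3cret") == 0;
}
```

The attribute does not change the function's result, only what is left
behind in registers at return.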

> 
>> @@ -288,7 +288,7 @@ enum sanitize_code {
>> };
>> 
>> /* Different settings for zeroing subset of registers.  */
>> -namespace  zero_regs_code {
>> +namespace zero_regs_flag {
> 
> I suggested “zero_regs_flags” rather than “zero_reg_flag” yesterday;
> I think “zero_regs_flags” is better because the namespace contains
> more than one flag.

Okay.
> 
>> @@ -1776,7 +1776,7 @@ const struct sanitizer_opts_s coverage_sanitizer_opts[] =
>>   { NULL, 0U, 0UL, false }
>> };
>> 
>> -using namespace zero_regs_code;
>> +using namespace zero_regs_flag;
>> /* -fzero-call-used-regs= suboptions.  */
>> const struct zero_call_used_regs_opts_s zero_call_used_regs_opts[] =
>> {
> 
> Sorry, I didn't no

[PATCH] i386: Set the stack usage to 0 for naked functions

2020-10-30 Thread Pat Bernardi
-fstack-usage raises a "stack usage computation not supported for this target"
warning when it encounters a naked function because the prologue returns early
for naked functions on i386. This patch sets the stack usage to zero for naked
functions, following the fix done for Arm by Eric Botcazou:

https://gcc.gnu.org/pipermail/gcc-patches/2016-May/448258.html

Bootstrapped and tested on x86_64-linux. If approved, I'll need a maintainer to
commit on my behalf.

Thanks,

Pat Bernardi
Senior Software Engineer, AdaCore


2020-10-29  Pat Bernardi  

gcc/ChangeLog

* config/i386/i386.c (ix86_expand_prologue): Set the stack usage to 0
for naked functions.

---
 gcc/config/i386/i386.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0268d87f198..129fe6fa1eb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13197,7 +13197,11 @@ ix86_expand_prologue (void)
   rtx static_chain = NULL_RTX;

   if (ix86_function_naked (current_function_decl))
-return;
+{
+  if (flag_stack_usage_info)
+   current_function_static_stack_size = 0;
+  return;
+}

   ix86_finalize_stack_frame_flags ();



Re: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-10-30 Thread Richard Sandiford via Gcc-patches
Sudakshina Das  writes:
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 00b5f8438863bb52c348cfafd5d4db478fe248a7..bcb654809c9662db0f51fc1368e37e42969efd29 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -1024,16 +1024,18 @@ typedef struct
>  #define MOVE_RATIO(speed) \
>(!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2))
>  
> -/* For CLEAR_RATIO, when optimizing for size, give a better estimate
> -   of the length of a memset call, but use the default otherwise.  */
> +/* Like MOVE_RATIO, without -mstrict-align, make decisions in "setmem" when
> +   we would use more than 3 scalar instructions.
> +   Otherwise follow a sensible default: when optimizing for size, give a better
> +   estimate of the length of a memset call, but use the default otherwise.  */
>  #define CLEAR_RATIO(speed) \
> -  ((speed) ? 15 : AARCH64_CALL_RATIO)
> +  (!STRICT_ALIGNMENT ? 4 : (speed) ? 15 : AARCH64_CALL_RATIO)
>  
>  /* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant, so when
> optimizing for size adjust the ratio to account for the overhead of loading
> the constant.  */
>  #define SET_RATIO(speed) \
> -  ((speed) ? 15 : AARCH64_CALL_RATIO - 2)
> +  (!STRICT_ALIGNMENT ? 0 : (speed) ? 15 : AARCH64_CALL_RATIO - 2)

I think it would help to adjust the SET_RATIO comment too, otherwise it's
not obvious why its !STRICT_ALIGNMENT value is 0.

>  
>  /* Disable auto-increment in move_by_pieces et al.  Use of auto-increment is
> rarely a good idea in straight-line code since it adds an extra address
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a8cc545c37044345c3f1d3bf09151c8a9578a032..16ac0c076adcc82627af43473a938e78d3a7ecdc 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7058,6 +7058,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
>  case E_V4SImode:
>return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
>  
> +case E_V16QImode:
> +  return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2);
> +
>  default:
>gcc_unreachable ();
>  }
> @@ -21373,6 +21376,134 @@ aarch64_expand_cpymem (rtx *operands)
>return true;
>  }
>  
> +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> +   *src is a register we have created with the duplicated value to be set.  */

AIUI, *SRC doesn't accumulate across calls in the way that it does for
aarch64_copy_one_block_and_progress_pointers, so it might be better to
pass an rtx rather than an “rtx *”.

> +static void
> +aarch64_set_one_block_and_progress_pointer (rtx *src, rtx *dst,
> + machine_mode mode)
> +{
> +  /* If we are copying 128bits or 256bits, we can do that straight from
> +  the SIMD register we prepared.  */

Nit: excess space before “the”.

> +  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> +{
> +  mode =  GET_MODE (*src);

Excess space before “GET_MODE”.

> +  /* "Cast" the *dst to the correct mode.  */
> +  *dst = adjust_address (*dst, mode, 0);
> +  /* Emit the memset.  */
> +  emit_insn (aarch64_gen_store_pair (mode, *dst, *src,
> +  aarch64_progress_pointer (*dst), *src));
> +
> +  /* Move the pointers forward.  */
> +  *dst = aarch64_move_pointer (*dst, 32);
> +  return;
> +}
> +  else if (known_eq (GET_MODE_BITSIZE (mode), 128))

Nit: more usual in GCC not to have an “else” after an early return.

> +{
> +  /* "Cast" the *dst to the correct mode.  */
> +  *dst = adjust_address (*dst, GET_MODE (*src), 0);
> +  /* Emit the memset.  */
> +  emit_move_insn (*dst, *src);
> +  /* Move the pointers forward.  */
> +  *dst = aarch64_move_pointer (*dst, 16);
> +  return;
> +}
> +  /* For copying less, we have to extract the right amount from *src.  */
> +  machine_mode vq_mode = aarch64_vq_mode (GET_MODE_INNER(mode)).require ();

Nit: should be a space before “(mode)”.

> +  *src = convert_to_mode (vq_mode, *src, 0);
> +  rtx reg = simplify_gen_subreg (mode, *src, vq_mode, 0);

I was surprised that this needed a two-step conversion.  Does a direct
subreg of the original V16QI src blow up for some modes?  If so, it might
be worth a comment.

Even if we need two steps, it would be good if the first one was also
a subreg.  convert_to_mode is normally for arithmetic conversions rather
than bitcasts.

I think we need to use lowpart_subreg instead of simplify_gen_subreg
in order to get the right SUBREG_BYTE for big-endian.  It would be
good to have some big-endian tests just to make sure. :-)
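
The big-endian concern can be illustrated outside GCC. What follows is a minimal sketch, not GCC code: `lowpart_byte_offset_sketch` is a hypothetical helper, and it assumes that byte and word endianness agree and that the outer mode is no wider than the inner one. It only shows why a hard-coded offset of 0 selects the low-order bytes on little-endian targets but not on big-endian ones.

```cpp
#include <cstddef>

// Hypothetical helper (not a GCC function): byte offset at which the
// low-order OUTER_BYTES of an INNER_BYTES-wide value start, in memory order.
// Assumes outer_bytes <= inner_bytes and matching byte/word endianness.
constexpr std::size_t
lowpart_byte_offset_sketch (std::size_t inner_bytes, std::size_t outer_bytes,
                            bool big_endian)
{
  // Little-endian: the least significant bytes come first, so the offset is 0.
  // Big-endian: they come last, so the high-order bytes must be skipped.
  return big_endian ? inner_bytes - outer_bytes : 0;
}

// An 8-byte low part of a 16-byte vector value.
static_assert (lowpart_byte_offset_sketch (16, 8, false) == 0, "little-endian");
static_assert (lowpart_byte_offset_sketch (16, 8, true) == 8, "big-endian");
```

A helper that derives the offset from the target's endianness, rather than passing 0 directly, avoids having to get this arithmetic right at every call site.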

> +
> +  /* "Cast" the *dst to the correct mode.  */
> +  *dst = adjust_address (*dst, mode, 0);
> +  /* Emit the memset.  */
> +  emit_move_insn (*dst, reg);
> +  /* Move the pointer forward.  */
> +  *dst = aarch64_progress

[PATCH] libstdc++: Don't initialize from *this inside some views [PR97600]

2020-10-30 Thread Patrick Palka via Gcc-patches
This works around a subtle issue where instantiating the begin()/end()
member of some views (as part of return type deduction) inadvertently
requires computing the satisfaction value of range.

This is problematic because the constraint range requires the
begin()/end() member to be callable.  But it's not callable until we've
deduced its return type, so evaluation of range yields false
at this point.  And if at any point after both members are instantiated
(and their return types deduced) we evaluate range again, this
time it will yield true since the begin()/end() members are now both
callable.  This makes the program ill-formed according to
[temp.constr.atomic]/3:

  If, at different points in the program, the satisfaction result is
  different for identical atomic constraints and template arguments, the
  program is ill-formed, no diagnostic required.

The views affected by this issue are those whose begin()/end() member
has a placeholder return type and that member initializes an _Iterator
or _Sentinel object from a reference to *this.  The second condition is
relevant because it means explicit conversion functions are considered
during overload resolution (as per [over.match.copy], I think), and
therefore it causes g++ to check the constraints of the conversion
function view_interface::operator bool().  And this conversion
function's constraints indirectly require range.

This issue is observable on trunk only with basic_istream_view (as in
the testcase in the PR).  But a pending patch that makes g++ memoize
constraint satisfaction values indefinitely (it currently invalidates
the satisfaction cache on various events) causes many existing tests for
the other affected views to fail, because range then remains
false for the whole compilation.

This patch works around this issue by adjusting the constructors of the
_Iterator and _Sentinel types of the affected views to take their
foo_view argument by pointer instead of by reference, so that g++ no
longer considers explicit conversion functions when resolving the
direct-initialization inside these views' begin()/end() members.
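
The shape of the workaround can be sketched with a toy type. This is a hedged illustration only: `counter_view` and `sum_all` are hypothetical names, the class is not a real range adaptor, and the sketch does not reproduce the constraint-satisfaction problem itself. It just shows an iterator whose constructor takes the parent by pointer, so that begin()/end() initialize it from `this` rather than from `*this`.

```cpp
// Hypothetical toy view; not the libstdc++ implementation.
class counter_view
{
  int limit_;

public:
  explicit counter_view (int limit) : limit_ (limit) { }

  // Stands in for the explicit conversion function that
  // view_interface provides in the real library.
  explicit operator bool () const { return limit_ > 0; }

  class iterator
  {
    counter_view *parent_ = nullptr;
    int value_ = 0;

  public:
    iterator () = default;

    // Pointer parameter: the argument in begin()/end() is 'this',
    // a plain pointer, not an object of the view's class type.
    explicit iterator (counter_view *parent, int value = 0) noexcept
      : parent_ (parent), value_ (value) { }

    const counter_view *parent () const { return parent_; }
    int operator* () const { return value_; }
    iterator &operator++ () { ++value_; return *this; }
    bool operator!= (const iterator &other) const
    { return value_ != other.value_; }
  };

  iterator begin () { return iterator{this}; }
  iterator end () { return iterator{this, limit_}; }
};

// Sums 0 .. limit-1 by iterating the toy view.
inline int
sum_all (counter_view view)
{
  int sum = 0;
  for (int x : view)
    sum += x;
  return sum;
}
```

Because the initializer is a pointer rather than an object of class type, the conversion functions of the view's class are not candidates for the initialization, which is the effect the patch relies on.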

Tested on x86_64-pc-linux-gnu, and also verified that this fixes the
testsuite failures when combined with the mentioned frontend patch
(https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557237.html).
Does this look OK for trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/97600
* include/std/ranges (basic_istream_view::begin): Initialize
_Iterator from 'this' instead of '*this'.
(basic_istream_view::_Iterator::_Iterator): Adjust constructor
accordingly.
(filter_view::_Iterator::_Iterator): Take a filter_view*
argument instead of a filter_view& argument.
(filter_view::_Sentinel::_Sentinel): Likewise.
(filter_view::begin): Initialize _Iterator from 'this' instead
of '*this'.
(filter_view::end): Likewise.
(transform_view::_Iterator::_Iterator): Take a _Parent* instead
of a _Parent&.
(transform_view::_Iterator::operator+): Adjust accordingly.
(transform_view::_Iterator::operator-): Likewise.
(transform_view::begin): Initialize _Iterator from 'this' instead
of '*this'.
(transform_view::end): Likewise.
(join_view::_Iterator): Take a _Parent* instead of a _Parent&.
(join_view::_Sentinel): Likewise.
(join_view::begin): Initialize _Iterator from 'this' instead of
'*this'.
(join_view::end): Initialize _Sentinel from 'this' instead of
'*this'.
(split_view::_OuterIter): Take a _Parent* instead of a _Parent&.
(split_view::begin): Initialize _OuterIter from 'this' instead
of '*this'.
(split_view::end): Likewise.
* testsuite/std/ranges/97600.cc: New test.
---
 libstdc++-v3/include/std/ranges| 78 +++---
 libstdc++-v3/testsuite/std/ranges/97600.cc | 32 +
 2 files changed, 71 insertions(+), 39 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/97600.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index a3e5354848a..28787cc5cc3 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -636,7 +636,7 @@ namespace views
   {
if (_M_stream != nullptr)
  *_M_stream >> _M_object;
-   return _Iterator{*this};
+   return _Iterator{this};
   }
 
   constexpr default_sentinel_t
@@ -657,8 +657,8 @@ namespace views
_Iterator() = default;
 
constexpr explicit
-   _Iterator(basic_istream_view& __parent) noexcept
- : _M_parent(std::__addressof(__parent))
+   _Iterator(basic_istream_view* __parent) noexcept
+ : _M_parent(__parent)
{ }
 
_Iterator(const _Iterator&) = delete;
@@ -1147,9 +1147,9 @@ namespace views
_Iterator() = default;
 
constexpr
-   _Iterator(filter_view& __parent, _Vp_iter __current)
+   _Iterator(filter_v

Re: [PATCH v7] genemit.c (main): split insn-emit.c for compiling parallelly

2020-10-30 Thread Richard Sandiford via Gcc-patches
Jojo R  writes:
> Jojo
> On 27 Oct 2020 at 10:14 PM +0800, Richard Sandiford wrote:
>> Jojo R  writes:
>> > gcc/ChangeLog:
>> >
>> > * genemit.c (main): Print 'split line'.
>> > * Makefile.in (insn-emit.c): Define split count and file
>> >
>> > ---
>> > gcc/Makefile.in | 19 +
>> > gcc/genemit.c | 104 +---
>> > 2 files changed, 83 insertions(+), 40 deletions(-)
>> >
>> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> > index 79e854aa938..a7fcc7d5949 100644
>> > --- a/gcc/Makefile.in
>> > +++ b/gcc/Makefile.in
>> > @@ -1258,6 +1258,21 @@ ANALYZER_OBJS = \
>> > # We put the *-match.o and insn-*.o files first so that a parallel make
>> > # will build them sooner, because they are large and otherwise tend to be
>> > # the last objects to finish building.
>> > +
>> > +# target overrides
>> > +-include $(tmake_file)
>> > +
>> > +INSN-GENERATED-SPLIT-NUM ?= 0
>> > +
>> > +insn-generated-split-num = $(shell i=1; j=`expr $(INSN-GENERATED-SPLIT-NUM) + 1`; \
>> > + while test $$i -le $$j; do \
>> > + echo $$i; i=`expr $$i + 1`; \
>> > + done)
>> > +
>> > +insn-emit-split-c := $(foreach o, $(shell for i in $(insn-generated-split-num); do echo $$i; done), insn-emit$(o).c)
>> > +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
>> > +$(insn-emit-split-c): insn-emit.c
>>
>> Sorry for the slow reply. I stand by what I said in
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552863.html:
>>
>> I think we should use the same wordlist technique as check_p_numbers[0-6].
>> So I guess the first step would be to rename check_p_numbers[0-6] to
>> something more general and use it both here and in check_p_numbers.
>>
>> I think that would be better than having two different ways of
>> generating lists of numbers, one directly in make and one calling
>> out to the shell. But I didn't want to reassert that comment in
>> case anyone was prepared to approve the patch in its current form.
>>
>
> Ok & Thanks.
>
> It’s fixed in patch v8.

Thanks.  Like I say, I think we should rename check_p_numbers* at the
same time, since it's now used for more than just parallel check.  Maybe
s/check_p_numbers/number_series/.

But otherwise it looks good.

>> BTW, do you have a copyright assignment on file?
>
> I email the patch without copyright, and I think it is same with other gcc 
> community patch.

Some changes can be so small and mechanical that they're not in practice
copyrightable, but all other changes need a copyright assignment.
Unfortunately this patch is too complex to fall into the first category.
See:

  https://gcc.gnu.org/contribute.html#legal

for more details about the requirement and process.

Thanks,
Richard


[Patch, committed] – was: [Patch] testsuite: Avoid TCL errors when rootme or ASAN/TSAN/UBSAN is not available

2020-10-30 Thread Tobias Burnus

I have now committed it as obvious.

Tobias

On 19.10.20 18:03, Tobias Burnus wrote: [...]

On 10/19/20 11:46 AM, Tobias Burnus wrote:

In a --disable-libsanitizer build, I see errors such as:
  g++.sum:ERROR: can't read "asan_saved_library_path": no such
variable [...]

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 24d762d1435257a8efd242c4a1a476c9b2037c03
Author: Tobias Burnus 
Date:   Fri Oct 30 17:11:20 2020 +0100

testsuite: Avoid TCL errors when rootme or ASAN/TSAN/UBSAN is not avail

gcc/testsuite/
* g++.dg/guality/guality.exp: Skip $rootme-based check if unset.
* gcc.dg/guality/guality.exp: Likewise.
* gfortran.dg/guality/guality.exp: Likewise.
* lib/asan-dg.exp: Don't use $asan_saved_library_path if not set.
* lib/tsan-dg.exp: Don't use $tsan_saved_library_path if not set.
* lib/ubsan-dg.exp: Don't use $ubsan_saved_library_path if not set.
---
 gcc/testsuite/g++.dg/guality/guality.exp  | 2 +-
 gcc/testsuite/gcc.dg/guality/guality.exp  | 2 +-
 gcc/testsuite/gfortran.dg/guality/guality.exp | 2 +-
 gcc/testsuite/lib/asan-dg.exp | 6 --
 gcc/testsuite/lib/tsan-dg.exp | 6 --
 gcc/testsuite/lib/ubsan-dg.exp| 6 --
 6 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/g++.dg/guality/guality.exp b/gcc/testsuite/g++.dg/guality/guality.exp
index 33571f1f28f..1d5b65fef57 100644
--- a/gcc/testsuite/g++.dg/guality/guality.exp
+++ b/gcc/testsuite/g++.dg/guality/guality.exp
@@ -41 +41 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
-} elseif [file exists $rootme/../gdb/gdb] {
+} elseif { [info exists rootme] && [file exists $rootme/../gdb/gdb] } {
diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp b/gcc/testsuite/gcc.dg/guality/guality.exp
index 89cd896d05c..ba87132aef2 100644
--- a/gcc/testsuite/gcc.dg/guality/guality.exp
+++ b/gcc/testsuite/gcc.dg/guality/guality.exp
@@ -41 +41 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
-} elseif [file exists $rootme/../gdb/gdb] {
+} elseif { [info exists rootme] && [file exists $rootme/../gdb/gdb] } {
diff --git a/gcc/testsuite/gfortran.dg/guality/guality.exp b/gcc/testsuite/gfortran.dg/guality/guality.exp
index eaa7ae770d6..0375edfffe4 100644
--- a/gcc/testsuite/gfortran.dg/guality/guality.exp
+++ b/gcc/testsuite/gfortran.dg/guality/guality.exp
@@ -22 +22 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
-} elseif [file exists $rootme/../gdb/gdb] {
+} elseif { [info exists rootme] && [file exists $rootme/../gdb/gdb] } {
diff --git a/gcc/testsuite/lib/asan-dg.exp b/gcc/testsuite/lib/asan-dg.exp
index 2124607245e..ce745dfdf8d 100644
--- a/gcc/testsuite/lib/asan-dg.exp
+++ b/gcc/testsuite/lib/asan-dg.exp
@@ -154,2 +154,4 @@ proc asan_finish { args } {
-set ld_library_path $asan_saved_library_path
-set_ld_library_path_env_vars
+if [info exists asan_saved_library_path ] {
+	set ld_library_path $asan_saved_library_path
+	set_ld_library_path_env_vars
+}
diff --git a/gcc/testsuite/lib/tsan-dg.exp b/gcc/testsuite/lib/tsan-dg.exp
index b5631a79bcf..6dcfd0a2f83 100644
--- a/gcc/testsuite/lib/tsan-dg.exp
+++ b/gcc/testsuite/lib/tsan-dg.exp
@@ -153,2 +153,4 @@ proc tsan_finish { args } {
-set ld_library_path $tsan_saved_library_path
-set_ld_library_path_env_vars
+if [info exists tsan_saved_library_path ] {
+	set ld_library_path $tsan_saved_library_path
+	set_ld_library_path_env_vars
+}
diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
index f4ab29e2add..31740e02ab4 100644
--- a/gcc/testsuite/lib/ubsan-dg.exp
+++ b/gcc/testsuite/lib/ubsan-dg.exp
@@ -144,2 +144,4 @@ proc ubsan_finish { args } {
-set ld_library_path $ubsan_saved_library_path
-set_ld_library_path_env_vars
+if [info exists ubsan_saved_library_path ] {
+	set ld_library_path $ubsan_saved_library_path
+	set_ld_library_path_env_vars
+}


Re: [PATCH 6/9] [nvptx] Force vl32 if calling vector-partitionable routines -- test-cases

2020-10-30 Thread Thomas Schwinge
Hi Tom!

While working on something completely different, I had to dig deeper, and
noticed a thing there, and deeper, and noticed another thing, and deeper,
and noticed this other thing here...  (So, business as usual...)  ;-)

On 2019-01-12T23:21:28+0100, Tom de Vries  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c

> +#pragma acc routine vector
> +void __attribute__((noinline, noclone))
> +Vector (int *ptr, int n, const int inc)
> +{

> +#pragma acc parallel copy (ary) vector_length (128) /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */
> +  {
> +Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));

This works as diagnosed/expected.

On 2019-01-12T23:21:31+0100, Tom de Vries  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
> @@ -0,0 +1,52 @@
> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
> +/* { dg-additional-options "-fopenacc-dim=::128" } */

Via '-fopenacc-dim', we here request a default 'vector_length(128)'.

> +#pragma acc parallel copy (ary)
> +  {
> +Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));

As above, 'vector_length(128)' must be demoted to 'vector_length(32)'
(and in fact, it is) -- but we're not getting a diagnostic for that.  Is
this expected?

On 2019-01-12T23:21:28+0100, Tom de Vries  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
> @@ -0,0 +1,54 @@
> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
> +/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */

This testcase needs 'dg-additional-options "-fopenacc-dim=::-"' (or
similar), but support for that is still missing in master branch (I'm
working on porting over the corresponding patch), so this currently
defaults to 'vector_length(32)', and...

> +#pragma acc parallel copy (ary)
> +  {
> +Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));

... thus no diagnostic here, and...

> +/* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */

... we're in fact not seeing this diagnostic.


In addition to the (presumably unexpected) missing diagnostic for
'-fopenacc-dim=::128' mentioned above -- OK to simplify and enhance the
testcases as attached, "Simplify and enhance
'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]"?


Grüße
 Thomas


From b0f9199a17911966ee24ec27b23bfb7ed7846700 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Oct 2020 10:56:20 +0100
Subject: [PATCH] Simplify and enhance 'libgomp.oacc-c-c++-common/pr85486*.c'
 [PR85486]

Avoid code duplication, and better test what we expect to happen.

	libgomp/
	PR target/85486
	* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Simplify and enhance.
	* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
---
 .../libgomp.oacc-c-c++-common/pr85486-2.c | 53 ++
 .../libgomp.oacc-c-c++-common/pr85486-3.c | 55 ++-
 .../libgomp.oacc-c-c++-common/pr85486.c   |  9 ++-
 3 files changed, 20 insertions(+), 97 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
index f6ca263166d7..d45326488cd8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
@@ -1,52 +1,11 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-additional-options "-DVECTOR_LENGTH=" } */
 /* { dg-additional-options "-fopenacc-dim=::128" } */
 
-/* Minimized from ref-1.C.  */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow" } */
+/* { dg-set-target-env-var "GOMP_DEBUG" "1" } */
 
-#include <stdio.h>
+#include "pr85486.c"
 
-#pragma acc routine vector
-void __attribute__((noinline, noclone))
-Vector (int *ptr, int n, const int inc)
-{
-  #pragma acc loop vector
-  for (unsigned ix = 0; ix < n; ix++)
-ptr[ix] += inc;
-}
-
-int
-main (void)
-{
-  const int n = 32, m=32;
-
-  int ary[m][n];
-  unsigned ix,  iy;
-
-  for (ix = m; ix--;)
-for (iy = n; iy--;)
-  ary[ix][iy] = (1 << 16) + (ix << 8) + iy;
-
-  int err = 0;
-
-#pragma acc parallel copy (ary)
-  {
-Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
-  }
-
-  for (ix = m; ix--;)
-for (iy = n; iy--;)
-  if (ary[ix][iy] != ((1 << 24) + (ix << 8) + iy))
-	{
-	  printf ("ary[%u][%u] = %x expected %x\n",
-		  ix, iy, ary[ix][iy], ((1 << 24) + (ix << 8) + iy));
-	  err++;
-	}
-
-  if (err)
-{
-  printf ("%d failed\n", err);
-  return 1;
-}
-
-  return 0;
-}
+/* { dg-final { scan-offload-tree-dump "__attribute__\\(\\(oacc function \\(1, 1, 32\\)" "oaccdevlow" } } */
+/* { dg-output "nvptx_exec: k

Re: [PATCH 6/9] [nvptx] Force vl32 if calling vector-partitionable routines -- test-cases

2020-10-30 Thread Tom de Vries
On 10/30/20 5:16 PM, Thomas Schwinge wrote:
> Hi Tom!
> 
> While working on something completely different, I had to dig deeper, and
> noticed a thing there, and deeper, and noticed another thing, and deeper,
> and noticed this other thing here...  (So, business as usual...)  ;-)
> 
> On 2019-01-12T23:21:28+0100, Tom de Vries  wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
> 
>> +#pragma acc routine vector
>> +void __attribute__((noinline, noclone))
>> +Vector (int *ptr, int n, const int inc)
>> +{
> 
>> +#pragma acc parallel copy (ary) vector_length (128) /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */
>> +  {
>> +Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
> 
> This works as diagnosed/expected.
> 
> On 2019-01-12T23:21:31+0100, Tom de Vries  wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
>> @@ -0,0 +1,52 @@
>> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
>> +/* { dg-additional-options "-fopenacc-dim=::128" } */
> 
> Via '-fopenacc-dim', we here request a default 'vector_length(128)'.
> 
>> +#pragma acc parallel copy (ary)
>> +  {
>> +Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
> 
> As above, 'vector_length(128)' must be demoted to 'vector_length(32)'
> (and in fact, it is) -- but we're not getting a diagnostic for that.  Is
> this expected?
> 

I think it would be good to have.  I don't know whether it's implemented.

> On 2019-01-12T23:21:28+0100, Tom de Vries  wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
>> @@ -0,0 +1,54 @@
>> +/* { dg-do run { target openacc_nvidia_accel_selected } } */
>> +/* { dg-set-target-env-var "GOMP_OPENACC_DIM" "::128" } */
> 
> This testcase needs 'dg-additional-options "-fopenacc-dim=::-"' (or
> similar), but support for that is still missing in master branch (I'm
> working on porting over the corresponding patch), so this currently
> defaults to 'vector_length(32)', and...
> 
>> +#pragma acc parallel copy (ary)
>> +  {
>> +Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
> 
> ... thus no diagnostic here, and...
> 
>> +/* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
> 
> ... we're in fact not seeing this diagnostic.
> 
> 
> In addition to the (presumably unexpected) missing diagnostic for
> '-fopenacc-dim=::128' mentioned above -- OK to simplify and enhance the
> testcases as attached, "Simplify and enhance
> 'libgomp.oacc-c-c++-common/pr85486*.c' [PR85486]"?
> 

Yep, looks good.

Thanks,
- Tom


RE: [PATCH] rs6000, Add bcd builtings listed in appendix B of the ABI

2020-10-30 Thread Carl Love via Gcc-patches
David:

On Wed, 2020-10-28 at 20:43 -0400, David Edelsohn wrote:
> Better, but please use
> 
> /* { dg-require-effective-target int128 } */
> 
> not "target int128" in the selector.  Segher and I both agree that
> it's cleaner and more readable.  The selector (the target part on the
> dg-do line) should not be used for this type of requirement.

OK, redid the test case.  It now reads:

+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+/* { dg-final { scan-assembler-times "\mbcdadd\M" 7 } } */
+/* { dg-final { scan-assembler-times "\mbcdsub\M" 18 } } */
+/* { dg-final { scan-assembler-times "\mbcds\M" 2 } } */
+/* { dg-final { scan-assembler-times "\mdenbcdq\M" 1 } } */
+

Reran the regression tests; no new failures were reported.

Please let me know if that looks OK.  Thanks.

   Carl 


Gcc maintainers:

The following patch adds support for the built-ins listed in Table B.1,
"Binary-Coded Decimal Built-In Functions" of the "64-Bit ELF V2 ABI
Specification", July 30, 2019.

The built-ins add support for the V16QI type for addition, subtraction and
comparison as specified in Table B.1.  Note, the V1TI type was
previously supported for add, subtract and comparison.  The builtins
for test for valid value, multiply by 10, divide by 10 and conversion
to DFP value are also added for the V16QI type as specified in Table
B.1.

The patch also adds #include <altivec.h> to the existing bcd-2.c and
bcd-3.c tests so that they pass the regression tests, as the builtin
names are now listed in altivec.h rather than just being internal
names.

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. 

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

-


2020-10-29  Carl Love  

gcc/
PR target/93449
* config/rs6000/altivec.h (__builtin_bcdadd, __builtin_bcdadd_lt,
__builtin_bcdadd_eq, __builtin_bcdadd_gt, __builtin_bcdadd_ofl,
__builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt,
__builtin_bcdsub_eq, __builtin_bcdsub_gt, __builtin_bcdsub_ofl,
__builtin_bcdsub_ov, __builtin_bcdinvalid, __builtin_bcdmul10,
__builtin_bcddiv10, __builtin_bcd2dfp, __builtin_bcdcmpeq,
__builtin_bcdcmpgt, __builtin_bcdcmplt, __builtin_bcdcmpge,
__builtin_bcdcmple): Add defines.
* config/rs6000/altivec.md: Add UNSPEC_BCDSHIFT.
(BCD_TEST): Add le, ge to code iterator.
Add VBCD mode iterator.
(bcd_test, *bcd_test2,
bcd_, bcd_): Add mode to name.
Change iterator from V1TI to VBCD.
(*bcdinvalid_, bcdshift_v16qi): New define_insn.
(bcdinvalid_, bcdmul10_v16qi, bcddiv10_v16qi): New define.
* config/rs6000/dfp.md (dfp_denbcd_v16qi_inst): New define_insn.
(dfp_denbcd_v16qi): New define_expand.
* config/rs6000/rs6000-builtin.def (BU_P8V_MISC_1): New define.
(BCDADD): Replaced with BCDADD_V1TI and BCDADD_V16QI.
(BCDADD_LT): Replaced with BCDADD_LT_V1TI and BCDADD_LT_V16QI.
(BCDADD_EQ): Replaced with BCDADD_EQ_V1TI and BCDADD_EQ_V16QI.
(BCDADD_GT): Replaced with BCDADD_GT_V1TI and BCDADD_GT_V16QI.
(BCDADD_OV): Replaced with BCDADD_OV_V1TI and BCDADD_OV_V16QI.
(BCDSUB_V1TI, BCDSUB_V16QI, BCDSUB_LT_V1TI, BCDSUB_LT_V16QI,
BCDSUB_LE_V1TI, BCDSUB_LE_V16QI, BCDSUB_EQ_V1TI, BCDSUB_EQ_V16QI,
BCDSUB_GT_V1TI, BCDSUB_GT_V16QI, BCDSUB_GE_V1TI, BCDSUB_GE_V16QI,
BCDSUB_OV_V1TI, BCDSUB_OV_V16QI, BCDINVALID_V1TI, BCDINVALID_V16QI,
BCDMUL10_V16QI, BCDDIV10_V16QI, DENBCD_V16QI): New builtin definitions.
(BCDADD, BCDADD_LT, BCDADD_EQ, BCDADD_GT, BCDADD_OV, BCDSUB, BCDSUB_LT,
BCDSUB_LE, BCDSUB_EQ, BCDSUB_GT, BCDSUB_GE, BCDSUB_OV, BCDINVALID,
BCDMUL10, BCDDIV10, DENBCD): New overload definitions.
* config/rs6000/rs6000-call.c (P8V_BUILTIN_VEC_BCDADD,
P8V_BUILTIN_VEC_BCDADD_LT, P8V_BUILTIN_VEC_BCDADD_EQ,
P8V_BUILTIN_VEC_BCDADD_GT, P8V_BUILTIN_VEC_BCDADD_OV,
P8V_BUILTIN_VEC_BCDINVALID, P9V_BUILTIN_VEC_BCDMUL10,
P8V_BUILTIN_VEC_DENBCD, P8V_BUILTIN_VEC_BCDSUB,
P8V_BUILTIN_VEC_BCDSUB_LT, P8V_BUILTIN_VEC_BCDSUB_LE,
P8V_BUILTIN_VEC_BCDSUB_EQ, P8V_BUILTIN_VEC_BCDSUB_GT,
P8V_BUILTIN_VEC_BCDSUB_GE, P8V_BUILTIN_VEC_BCDSUB_OV): New overloaded
specifications.
(CODE_FOR_bcdadd): Replaced with CODE_FOR_bcdadd_v16qi and
CODE_FOR_bcdadd_v1ti.
(CODE_FOR_bcdadd_lt): Replaced with CODE_FOR_bcdadd_lt_v16qi and
CODE_FOR_bcdadd_lt_v1ti.
(CODE_FOR_bcdadd_eq): Replaced with CODE_FOR_bcdadd_eq_v16qi and
CODE_FOR_bcdadd_eq_v1ti.
(CODE_FOR_bcdadd_gt): Replaced with CODE_FOR_bcdadd_gt_v16qi and
CODE_FOR_bcdadd_gt_v1ti.
(CO

Re: [nvptx, committed] Force vl32 if calling vector-partitionable routines

2020-10-30 Thread Thomas Schwinge
Hi Tom!

On 2019-01-07T20:11:59+0100, Tom de Vries  wrote:
> [nvptx] Force vl32 if calling vector-partitionable routines
>
> With PTX_MAX_VECTOR_LENGTH set to larger than PTX_WARP_SIZE, routines can be
> called from offloading regions with vector-size set to larger than warp size.
> OTOH, vector-partitionable routines assume warp-sized vector length.
>
> Detect if we're calling a vector-partitionable routine from an offloading
> region, and if so, fall back to warp-sized vector length in that region.
>
> 2018-12-17  Tom de Vries  
>
>   PR target/85486
>   * config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p): New
>   function.
>   (nvptx_goacc_validate_dims): Force vl32 if calling vector-partitionable
>   routines.

> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c

> +/* Return true if FNDECL contains calls to vector-partitionable routines.  */
> +
> +static bool
> +has_vector_partitionable_routine_calls_p (tree fndecl)
> +{
> +  if (!fndecl)
> +return false;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (fndecl))
> +for (gimple_stmt_iterator i = gsi_start_bb (bb); !gsi_end_p (i);
> +  gsi_next_nondebug (&i))
> +  {
> + gimple *stmt = gsi_stmt (i);
> + if (gimple_code (stmt) != GIMPLE_CALL)
> +   continue;

(This might use '!is_gimple_call (stmt)'.)

> +
> + tree callee = gimple_call_fndecl (stmt);
> + if (!callee)
> +   continue;

Would there be any other case where this '!callee' conditional doesn't
really mean 'gimple_call_internal_p (stmt)'?  I thought about suggesting
to use that instead, and then maybe 'gcc_assert (callee)' (... which
doesn't trigger for any current testcases), but reviewing 'GIMPLE_CALL',
I now see further 'is_gimple_call_addr' legitimate cases.  What do these
mean, here?

And, should we add a comment why 'continue' is fine then, instead of
fail-safe 'return true'?

Couldn't an 'internal_fn' potentially also make use of OpenACC
parallelism?

> +
> + tree attrs  = oacc_get_fn_attrib (callee);
> + if (attrs == NULL_TREE)
> +   return false;

That's not correct, as far as I can tell: if the current callee doesn't
have an 'oacc function' attribute, we *stop* here any further processing,
and 'return false' indicating that there are no "calls to
vector-partitionable routines".  See bug fix and adjusted test case in
attached patch "Force vl32 if calling vector-partitionable routines: fix
case where callee doesn't have 'oacc function' attribute [PR85486]".  OK
to push?
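
The difference between the two behaviours can be sketched in isolation. This is a simplified model, not the nvptx code: `call_sketch`, `SEQ_LEVEL_SKETCH` and both function names are hypothetical, and the real function walks GIMPLE statements rather than a vector. The only point it makes is that returning early on an attribute-less callee hides any vector-partitionable call that comes later.

```cpp
#include <vector>

// Hypothetical stand-in for GOMP_DIM_MAX: the level that marks a 'seq' routine.
const int SEQ_LEVEL_SKETCH = 3;

// Hypothetical model of one call statement.
struct call_sketch
{
  bool has_oacc_attr;  // callee carries an 'oacc function' attribute
  int level;           // partitioning level; meaningful only with the attribute
};

// Logic as originally committed: the first callee without the attribute
// ends the scan with "no partitionable calls".
inline bool
has_partitionable_call_buggy (const std::vector<call_sketch> &calls)
{
  for (const call_sketch &c : calls)
    {
      if (!c.has_oacc_attr)
        return false;
      if (c.level != SEQ_LEVEL_SKETCH)
        return true;
    }
  return false;
}

// Logic as proposed: a callee without the attribute is implicitly 'seq',
// so it is skipped and the scan continues.
inline bool
has_partitionable_call_fixed (const std::vector<call_sketch> &calls)
{
  for (const call_sketch &c : calls)
    {
      if (!c.has_oacc_attr)
        continue;
      if (c.level != SEQ_LEVEL_SKETCH)
        return true;
    }
  return false;
}
```

With a plain function call followed by a call to a vector routine, the first variant reports false and the second reports true.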

> +
> + int partition_level = oacc_fn_attrib_level (attrs);
> + bool seq_routine_p = partition_level == GOMP_DIM_MAX;
> + if (!seq_routine_p)
> +   return true;
> +  }
> +
> +  return false;
> +}

> @@ -5611,6 +5646,16 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], int fn_level)
>  old_dims[i] = dims[i];
>
>const char *vector_reason = NULL;
> +  if (offload_region_p && has_vector_partitionable_routine_calls_p (decl))
> +{
> +  if (dims[GOMP_DIM_VECTOR] > PTX_WARP_SIZE)
> + {
> +   vector_reason = G_("using vector_length (%d) due to call to"
> +  " vector-partitionable routine, ignoring %d");
> +   dims[GOMP_DIM_VECTOR] = PTX_WARP_SIZE;
> + }
> +}
> +
>if (dims[GOMP_DIM_VECTOR] == 0)
>  {
>vector_reason = G_("using vector_length (%d), ignoring runtime setting");


Grüße
 Thomas


From 0399c9023b717ea686db912ca5c133a2d30752e4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Oct 2020 12:04:46 +0100
Subject: [PATCH] Force vl32 if calling vector-partitionable routines: fix case
 where callee doesn't have 'oacc function' attribute [PR85486]

	gcc/
	PR target/85486
	* config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p):
	Fix case where callee doesn't have 'oacc function' attribute.
	libgomp/
	PR target/85486
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Extend.
---
 gcc/config/nvptx/nvptx.c  |  3 ++-
 libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c | 10 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 17349475fff0..61a756fc6448 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5674,7 +5674,8 @@ has_vector_partitionable_routine_calls_p (tree fndecl)
 
 	tree attrs  = oacc_get_fn_attrib (callee);
 	if (attrs == NULL_TREE)
-	  return false;
+	  /* Implicitly 'seq'.  */
+	  continue;
 
 	int partition_level = oacc_fn_attrib_level (attrs);
 	bool seq_routine_p = partition_level == GOMP_DIM_MAX;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
index 0d98b82f9932..38a61624d9f8 100644
--

[PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Michael Meissner via Gcc-patches
PowerPC: Don't assume all targets have GLIBC.

David reminded me that not all targets support GLIBC.  This patch should fix my
previous committed patch not to use TARGET_GLIBC_MAJOR or TARGET_GLIBC_MINOR
unless they are defined.
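
The guarding pattern can be sketched independently of GCC. The macro names `SKETCH_GLIBC_MAJOR` and `SKETCH_GLIBC_MINOR` and both function names below are hypothetical stand-ins; the patch itself tests OPTION_GLIBC, TARGET_GLIBC_MAJOR and TARGET_GLIBC_MINOR and also checks endianness and the ABI, which this sketch leaves out.

```cpp
// Encode major.minor as major * 1000 + minor, the same scheme the patch
// uses when it compares against 2032 (glibc 2.32).
constexpr int
encode_glibc_version_sketch (int major, int minor)
{
  return major * 1000 + minor;
}

// Returns true only when the (hypothetical) version macros are defined and
// describe glibc 2.32 or newer.  The macros are referenced solely inside
// the #if, so a target that never defines them still compiles.
inline bool
libc_is_glibc_2_32_or_newer_sketch ()
{
#if defined (SKETCH_GLIBC_MAJOR) && defined (SKETCH_GLIBC_MINOR)
  return encode_glibc_version_sketch (SKETCH_GLIBC_MAJOR,
                                      SKETCH_GLIBC_MINOR) >= 2032;
#else
  return false;
#endif
}
```

Putting the check behind one helper keeps the preprocessor conditionals out of the caller, so the option-override code can test a single boolean.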

I have done a bootstrap on a little endian power9 system and it was fine.  Can
I check this patch into the master branch?

gcc/
2020-10-30  Michael Meissner  

* config/rs6000/rs6000.c (glibc_supports_ieee_128bit): New helper
function.
(rs6000_option_override_internal): Call it.
---
 gcc/config/rs6000/rs6000.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1d7e8878c45..a59dc919baa 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3539,6 +3539,25 @@ rs6000_linux64_override_options ()
 }
 #endif
 
+/* Return true if we are using GLIBC, and it supports IEEE 128-bit long double.
+   This support is only in little endian GLIBC 2.32 or newer.  */
+static bool
+glibc_supports_ieee_128bit (void)
+{
+#if defined (OPTION_GLIBC) \
+  && defined (TARGET_GLIBC_MAJOR) \
+  && defined (TARGET_GLIBC_MINOR)
+
+  if (OPTION_GLIBC
+  && !BYTES_BIG_ENDIAN
+  && DEFAULT_ABI == ABI_ELFv2
+  && ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) >= 2032)
+return true;
+#endif /* GLIBC provided.  */
+
+  return false;
+}
+
 /* Override command line options.
 
Combine build-specific configuration information with options
@@ -4173,9 +4192,8 @@ rs6000_option_override_internal (bool global_init_p)
  static bool warned_change_long_double;
 
  if (!warned_change_long_double
- && (!OPTION_GLIBC
- || (!lang_GNU_C () && !lang_GNU_CXX ())
- || ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) < 2032))
+ && (!glibc_supports_ieee_128bit ()
+ || (!lang_GNU_C () && !lang_GNU_CXX ())))
{
  if (TARGET_IEEEQUAD)
warning (OPT_Wpsabi, "Using IEEE extended precision "
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] rs6000, Add bcd builtings listed in appendix B of the ABI

2020-10-30 Thread David Edelsohn via Gcc-patches
On Fri, Oct 30, 2020 at 12:36 PM Carl Love  wrote:
>
> David:
>
> On Wed, 2020-10-28 at 20:43 -0400, David Edelsohn wrote:
> > Better, but please use
> >
> > /* { dg-require-effective-target int128 } */
> >
> > not "target int128" in the selector.  Segher and I both agree that
> > it's cleaner and more readable.  The selector (the target part on the
> > dg-do line) should not be used for this type of requirement.
>
> OK, redid the test case.  It now reads:
>
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +/* { dg-final { scan-assembler-times "\mbcdadd\M" 7 } } */
> +/* { dg-final { scan-assembler-times "\mbcdsub\M" 18 } } */
> +/* { dg-final { scan-assembler-times "\mbcds\M" 2 } } */
> +/* { dg-final { scan-assembler-times "\mdenbcdq\M" 1 } } */
> +
>
> Reran the regression tests; no new failures were reported.
>
> Please let me know if that looks OK.  Thanks.

Hi, Carl

The revised dg-require for the testcases looks fine to me.

Thanks for implementing this next set of builtins.  The patch looks
good to me, modulo any comments from Segher.

20+ more builtins for Bill's rewrite.  You owe him a beer.

Thanks, David


Re: [PATCH] Fix gnu-versioned-namespace build

2020-10-30 Thread François Dumont via Gcc-patches

On 30/10/20 2:37 pm, Jonathan Wakely wrote:

On 30/10/20 13:23 +, Jonathan Wakely wrote:

On 30/10/20 13:59 +0100, François Dumont via Libstdc++ wrote:

The gnu-versioned-namespace build is broken.

The fix in charconv/floating_from_chars.cc is quite trivial. I am 
not so sure about the fix in sstream-inst.cc.


The change for src/c++20/sstream-inst.cc is OK to commit. It would
probably be better to not build that file at all if the cxx11 ABI is
not supported at all, but then the src/c++20 directory would be empty
and I'm not sure if that would work. So just making the file empty is
fine.

The change for from_chars is not OK. With your change the <charconv>
header doesn't declare those functions if included by a file using the
old ABI. That's wrong, they should be declared unconditionally.

I see two ways to fix it. Either make the declarations in the header
depend on ! _GLIBCXX_INLINE_VERSION (so they're disabled for
gnu-versioned namespace) or fix the code in floating_from_chars to not
use a pmr::memory_resource for allocation in the versioned namespace
build.


Here's a patch for the second way.

A third way to fix it would be to make basic_string work with C++
allocators, so that pmr::string is usable for the gnu-versioned
namespace.

And the fourth would be to switch the versioned namespace to use the
new ABI unconditionally, instead of using the old ABI unconditionally.



Can I commit this one once tested then ?

I'll try to put the fourth way in place however.

Thanks,

François



RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-10-30 Thread Sudakshina Das via Gcc-patches
Hi Richard

Thank you for the review. Please find my comments inlined.

> -Original Message-
> From: Richard Sandiford 
> Sent: 30 October 2020 15:03
> To: Sudakshina Das 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das  writes:
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index 00b5f8438863bb52c348cfafd5d4db478fe248a7..bcb654809c9662db0f51fc1368e37e42969efd29 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -1024,16 +1024,18 @@ typedef struct
> >  #define MOVE_RATIO(speed) \
> >(!STRICT_ALIGNMENT ? 2 : (((speed) ? 15 : AARCH64_CALL_RATIO) / 2))
> >
> > -/* For CLEAR_RATIO, when optimizing for size, give a better estimate
> > -   of the length of a memset call, but use the default otherwise.  */
> > +/* Like MOVE_RATIO, without -mstrict-align, make decisions in "setmem" when
> > +   we would use more than 3 scalar instructions.
> > +   Otherwise follow a sensible default: when optimizing for size, give a better
> > +   estimate of the length of a memset call, but use the default otherwise.  */
> >  #define CLEAR_RATIO(speed) \
> > -  ((speed) ? 15 : AARCH64_CALL_RATIO)
> > +  (!STRICT_ALIGNMENT ? 4 : (speed) ? 15 : AARCH64_CALL_RATIO)
> >
> >  /* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant, so when
> >     optimizing for size adjust the ratio to account for the overhead of loading
> >     the constant.  */
> >  #define SET_RATIO(speed) \
> > -  ((speed) ? 15 : AARCH64_CALL_RATIO - 2)
> > +  (!STRICT_ALIGNMENT ? 0 : (speed) ? 15 : AARCH64_CALL_RATIO - 2)
> 
> Think it would help to adjust the SET_RATIO comment too, otherwise it's not
> obvious why its !STRICT_ALIGNMENT value is 0.
> 

Will do.
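
As an aside for readers, the nested conditionals in the two macros above can be spelled out as plain functions.  This is only a sketch: clear_ratio, set_ratio and CALL_RATIO are hypothetical names, and the value 8 for the call ratio is the one quoted elsewhere in this thread, taken here as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in for AARCH64_CALL_RATIO (8 is the value mentioned in
   the review discussion).  */
#define CALL_RATIO 8

/* Model of the patched CLEAR_RATIO: the conditional operator groups
   right to left, so the !strict_alignment test is decided first.  */
static int
clear_ratio (bool strict_alignment, bool speed)
{
  if (!strict_alignment)
    return 4;
  return speed ? 15 : CALL_RATIO;
}

/* Model of the patched SET_RATIO; without strict alignment it is 0.  */
static int
set_ratio (bool strict_alignment, bool speed)
{
  if (!strict_alignment)
    return 0;
  return speed ? 15 : CALL_RATIO - 2;
}
```

With these definitions the speed argument only matters when strict alignment is in force, which is the behaviour the SET_RATIO comment would need to explain.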

> >
> >  /* Disable auto-increment in move_by_pieces et al.  Use of auto-increment is
> >     rarely a good idea in straight-line code since it adds an extra address
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index a8cc545c37044345c3f1d3bf09151c8a9578a032..16ac0c076adcc82627af43473a938e78d3a7ecdc 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -7058,6 +7058,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
> >  case E_V4SImode:
> >return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
> >
> > +case E_V16QImode:
> > +  return gen_vec_store_pairv16qiv16qi (mem1, reg1, mem2, reg2);
> > +
> >  default:
> >gcc_unreachable ();
> >  }
> > @@ -21373,6 +21376,134 @@ aarch64_expand_cpymem (rtx *operands)
> >return true;
> >  }
> >
> > +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> > +   *src is a register we have created with the duplicated value to be set.  */
> 
> AIUI, *SRC doesn't accumulate across calls in the way that it does for
> aarch64_copy_one_block_and_progress_pointers, so it might be better to
> pass an rtx rather than an “rtx *”.
> 

Will do.

> > +static void
> > +aarch64_set_one_block_and_progress_pointer (rtx *src, rtx *dst,
> > +   machine_mode mode)
> > +{
> > +  /* If we are copying 128bits or 256bits, we can do that straight from
> > +  the SIMD register we prepared.  */
> 
> Nit: excess space before “the”.
>
 
Will do.

> > +  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> > +{
> > +  mode =  GET_MODE (*src);
> 
> Excess space before “GET_MODE”.
>
  
Will do.

> > +  /* "Cast" the *dst to the correct mode.  */
> > +  *dst = adjust_address (*dst, mode, 0);
> > +  /* Emit the memset.  */
> > +  emit_insn (aarch64_gen_store_pair (mode, *dst, *src,
> > +aarch64_progress_pointer (*dst), *src));
> > +
> > +  /* Move the pointers forward.  */
> > +  *dst = aarch64_move_pointer (*dst, 32);
> > +  return;
> > +}
> > +  else if (known_eq (GET_MODE_BITSIZE (mode), 128))
> 
> Nit: more usual in GCC not to have an “else” after an early return.
>

Will do.
 
> > +{
> > +  /* "Cast" the *dst to the correct mode.  */
> > +  *dst = adjust_address (*dst, GET_MODE (*src), 0);
> > +  /* Emit the memset.  */
> > +  emit_move_insn (*dst, *src);
> > +  /* Move the pointers forward.  */
> > +  *dst = aarch64_move_pointer (*dst, 16);
> > +  return;
> > +}
> > +  /* For copying less, we have to extract the right amount from *src.  */
> > +  machine_mode vq_mode = aarch64_vq_mode (GET_MODE_INNER(mode)).require ();
> 
> Nit: should be a space before “(mode)”.
> 

Will do.

> > +  *src = convert_to_mode (vq_mode, *src, 0);
> > +  rtx reg = simplify_gen_subreg (mode, *src, vq_mode, 0);
> 
> I was surprised that this needed a two-step conversion.  Does a direct subreg
> of the original V16QI src blow up for s

Re: [PATCH] rs6000, Add bcd builtings listed in appendix B of the ABI

2020-10-30 Thread Segher Boessenkool
Hi!

On Fri, Oct 30, 2020 at 09:36:13AM -0700, Carl Love wrote:
> On Wed, 2020-10-28 at 20:43 -0400, David Edelsohn wrote:
> > Better, but please use
> > 
> > /* { dg-require-effective-target int128 } */
> > 
> > not "target int128" in the selector.  Segher and I both agree that
> > it's cleaner and more readable.  The selector (the target part on the
> > dg-do line) should not be used for this type of requirement.

Yes.

> +(define_insn "*bcd_test_"
>[(set (reg:CCFP CR6_REGNO)
>   (compare:CCFP
> -  (unspec:V2DF [(match_operand:V1TI 1 "register_operand" "v")
> -(match_operand:V1TI 2 "register_operand" "v")
> +  (unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")
> +(match_operand:VBCD 2 "register_operand" "v")
>  (match_operand:QI 3 "const_0_to_1_operand" "i")]

This should be "n" instead of "i".  This is existing code of course, but
please do that in the new code at least?  (And changing "i" to "n" in
existing code wherever an assembly-time literal constant is needed is
pre-approved -- "i" allows relocations, "n" does not, essentially.)

> +(define_insn "dfp_denbcd_v16qi_inst"
> +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> + (unspec:TD [(match_operand:QI 1 "const_0_to_1_operand" "i")

(like here)

Because the predicate here only allows actual numbers (const_ints), it
is quite hard to ever make the "i" go wrong, but it isn't impossible in
principle.

Other than that nit, yes this looks good.  So okay for trunk, thanks!


Segher


Re: [PATCH] Fix gnu-versioned-namespace build

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 30/10/20 18:51 +0100, François Dumont wrote:

On 30/10/20 2:37 pm, Jonathan Wakely wrote:

On 30/10/20 13:23 +, Jonathan Wakely wrote:

On 30/10/20 13:59 +0100, François Dumont via Libstdc++ wrote:

The gnu-versioned-namespace build is broken.

The fix in charconv/floating_from_chars.cc is quite trivial. I 
am not so sure about the fix in sstream-inst.cc.


The change for src/c++20/sstream-inst.cc is OK to commit. It would
probably be better to not build that file at all if the cxx11 ABI is
not supported at all, but then the src/c++20 directory would be empty
and I'm not sure if that would work. So just making the file empty is
fine.

The change for from_chars is not OK. With your change the 
header doesn't declare those functions if included by a file using the
old ABI. That's wrong, they should be declared unconditionally.

I see two ways to fix it. Either make the declarations in the header
depend on ! _GLIBCXX_INLINE_VERSION (so they're disabled for
gnu-versioned namespace) or fix the code in floating_from_chars to not
use a pmr::memory_resource for allocation in the versioned namespace
build.


Here's a patch for the second way.

A third way to fix it would be to make basic_string work with C++
allocators, so that pmr::string is usable for the gnu-versioned
namespace.

And the fourth would be to switch the versioned namespace to use the
new ABI unconditionally, instead of using the old ABI unconditionally.



Can I commit this one once tested then ?


Yes please.


I'll try to put the fourth way in place however.


N.B. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83077 tracks that.



Re: [PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Segher Boessenkool
On Fri, Oct 30, 2020 at 01:21:34PM -0400, Michael Meissner wrote:
> David reminded me that not all targets support GLIBC.  This patch should fix
> my previous committed patch not to use TARGET_GLIBC_MAJOR or TARGET_GLIBC_MINOR
> unless they are defined.

> +   This support is only in little endian GLIBC 2.32 or newer.  */
> +static bool
> +glibc_supports_ieee_128bit (void)
> +{
> +#if defined (OPTION_GLIBC) \
> +  && defined (TARGET_GLIBC_MAJOR) \
> +  && defined (TARGET_GLIBC_MINOR)
> +
> +  if (OPTION_GLIBC
> +  && !BYTES_BIG_ENDIAN
> +  && DEFAULT_ABI == ABI_ELFv2
> +  && ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) >= 2032)
> +return true;
> +#endif /* GLIBC provided.  */
> +
> +  return false;
> +}
> +

So this makes the compiler behave differently based on what the libc
used at build time was (including its version).  Can't we do better?  :-(


Segher


ipa-cp: New debug counters for IPA-CP

2020-10-30 Thread Martin Jambor
Hi,

Martin Liška has been asking me to add debug counters to the IPA-CP pass so
that testcase reductions are easier.  The pass already has one for the bit
value propagation, so this patch adds one for value_range propagation
and one for the actual constant propagation.

Passed bootstrap and testing on x86_64-linux.  OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2020-10-30  Martin Jambor  

* dbgcnt.def (ipa_cp_values): New counter.
(ipa_cp_vr): Likewise.
* ipa-cp.c (decide_about_value): Check and bump ipa_cp_values debug
counter.
(decide_whether_version_node): Likewise.
(ipcp_store_vr_results): Check and bump ipa_cp_vr debug counter.
---
 gcc/dbgcnt.def |  2 ++
 gcc/ipa-cp.c   | 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 07946a85ecc..a5b6bb66a6c 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -171,6 +171,8 @@ DEBUG_COUNTER (if_after_reload)
 DEBUG_COUNTER (if_conversion)
 DEBUG_COUNTER (if_conversion_tree)
 DEBUG_COUNTER (ipa_cp_bits)
+DEBUG_COUNTER (ipa_cp_values)
+DEBUG_COUNTER (ipa_cp_vr)
 DEBUG_COUNTER (ipa_mod_ref)
 DEBUG_COUNTER (ipa_sra_params)
 DEBUG_COUNTER (ipa_sra_retvalues)
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index f981366a345..9895548fc35 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -5462,6 +5462,9 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
&caller_count))
 return false;
 
+  if (!dbg_cnt (ipa_cp_values))
+return false;
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, " - considering value ");
@@ -5577,6 +5580,12 @@ decide_whether_version_node (struct cgraph_node *node)
 
   if (info->do_clone_for_all_contexts)
 {
+  if (!dbg_cnt (ipa_cp_values))
+   {
+ info->do_clone_for_all_contexts = false;
+ return ret;
+   }
+
   struct cgraph_node *clone;
   vec callers = node->collect_callers ();
 
@@ -5864,7 +5873,8 @@ ipcp_store_vr_results (void)
  ipa_vr vr;
 
  if (!plats->m_value_range.bottom_p ()
- && !plats->m_value_range.top_p ())
+ && !plats->m_value_range.top_p ()
+ && dbg_cnt (ipa_cp_vr))
{
  vr.known = true;
  vr.type = plats->m_value_range.m_vr.kind ();
-- 
2.28.0
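
To illustrate why a guard such as "if (!dbg_cnt (ipa_cp_values)) return false;" helps with testcase reduction, here is a toy model of a debug counter.  It is deliberately not GCC's implementation: struct debug_counter, toy_dbg_cnt and transform_guarded are hypothetical names, and the counter simply permits the first `limit` queries and refuses the rest.  In GCC itself the limits are set from the command line with the -fdbg-cnt option.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy debug counter; an assumption for illustration only.  */
struct debug_counter
{
  unsigned count;   /* Number of queries made so far.  */
  unsigned limit;   /* Queries after this many are refused.  */
};

/* Bump the counter and report whether this query is still within the
   configured limit.  */
static bool
toy_dbg_cnt (struct debug_counter *c)
{
  c->count++;
  return c->count <= c->limit;
}

/* Walk N candidate transformations, applying only those the counter
   permits, and return how many were applied.  */
static unsigned
transform_guarded (struct debug_counter *c, unsigned n)
{
  unsigned applied = 0;
  for (unsigned i = 0; i < n; i++)
    if (toy_dbg_cnt (c))
      applied++;
  return applied;
}
```

Bisecting on the limit then narrows a miscompilation down to the single transformation that introduces it, which is the use case described at the top of this message.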



[committed] patch to deal with insn scratches in global RA

2020-10-30 Thread Vladimir Makarov via Gcc-patches

  The following patch implements taking insn scratch requirements into
account in global RA (IRA).  Before the patch IRA simply ignored insn
scratches.  Only LRA took the scratches into account and assigned hard
registers to scratches if necessary.  In some cases this resulted in
spilling pseudos which got hard registers in IRA and, as a consequence,
in violating a good IRA assignment.

  The patch changes insn scratches which require registers for all
insn alternatives (in other words, without an X constraint in the
scratch constraint string) into pseudos.  This is done before IRA
starts its work.  LRA still continues to change the remaining scratches
(with an X constraint, and in insns created during IRA) into pseudos.
As before the patch, at the end of LRA's work, spilled scratch pseudos
(for which the X constraint was chosen) are changed back into scratches.

  The patch was successfully bootstrapped and tested on x86-64, ppc64,
aarch64, s390x.  There are a few new GCC test failures on ppc64 and
s390x which can be fixed by adding hints to scratch constraints in ppc
md file and by changing expected test output (as hard register
assignment was changed a bit).  I'll submit the patches for approval a
bit later.


2020-10-30  Vladimir Makarov  

    * lra.c (get_scratch_reg): New function.
    (remove_scratches_1): Rename remove_insn_scratches.  Use
    ira_remove_insn_scratches and get_scratch_reg.
    (remove_scratches): Do not
    initialize scratches, scratch_bitmap, and scratch_operand_bitmap.
    (lra): Call ira_restore_scratches instead of restore_scratches.
    (struct sloc, sloc_t, scratches, scratch_bitmap)
    (scratch_operand_bitmap, lra_former_scratch_p)
    (lra_former_scratch_operand_p, lra_register_new_scratch_op, 
restore_scratches): Move them to ...

    * ira.c: ... here.
    (former_scratch_p, former_scratch_operand_p): Rename to
    ira_former_scratch_p and ira_former_scratch_operand_p.
    (contains_X_constraint_p): New function.
    (register_new_scratch_op): Rename to ira_register_new_scratch_op.
    Change it to work for IRA and LRA.
    (restore_scratches): Rename to ira_restore_scratches.
    (get_scratch_reg, ira_remove_insn_scratches): New functions.
    (ira): Call ira_remove_scratches if we use LRA.
    * ira.h (ira_former_scratch_p, ira_former_scratch_operand_p): New
    prototypes.
    (ira_register_new_scratch_op, ira_restore_scratches): New 
prototypes.

    (ira_remove_insn_scratches): New prototype.
    * lra-int.h (lra_former_scratch_p, lra_former_scratch_operand_p):
    Remove prototypes.
    (lra_register_new_scratch_op): Ditto.
    * lra-constraints.c: Rename lra_former_scratch_p and
    lra_former_scratch_p to ira_former_scratch_p and to
    ira_former_scratch_p.
    * lra-remat.c: Ditto.
    * lra-spills.c: Rename lra_former_scratch_p to 
ira_former_scratch_p.


commit 44fbc9c6e02ca5b8f98f25b514ed7588e7ba733d
Author: Vladimir N. Makarov 
Date:   Fri Oct 30 15:05:22 2020 -0400

Take insn scratch RA requirements into account in IRA.

  The patch changes insn scratches which require registers for all
insn alternatives (in other words, without an X constraint in the
scratch constraint string) into pseudos.  This is done before IRA
starts its work.  LRA still continues to change the remaining scratches
(with an X constraint, and in insns created during IRA) into pseudos.
As before the patch, at the end of LRA's work, spilled scratch pseudos
(for which the X constraint was chosen) are changed back into scratches.

gcc/ChangeLog:

* lra.c (get_scratch_reg): New function.
(remove_scratches_1): Rename remove_insn_scratches.  Use
ira_remove_insn_scratches and get_scratch_reg.
(remove_scratches): Do not
initialize scratches, scratch_bitmap, and scratch_operand_bitmap.
(lra): Call ira_restore_scratches instead of restore_scratches.
(struct sloc, sloc_t, scratches, scratch_bitmap)
(scratch_operand_bitmap, lra_former_scratch_p)
(lra_former_scratch_operand_p, lra_register_new_scratch_op)
(restore_scratches): Move them to ...
* ira.c: ... here.
(former_scratch_p, former_scratch_operand_p): Rename to
ira_former_scratch_p and ira_former_scratch_operand_p.
(contains_X_constraint_p): New function.
(register_new_scratch_op): Rename to ira_register_new_scratch_op.
Change it to work for IRA and LRA.
(restore_scratches): Rename to ira_restore_scratches.
(get_scratch_reg, ira_remove_insn_scratches): New functions.
(ira): Call ira_remove_scratches if we use LRA.
* ira.h (ira_former_scratch_p, ira_former_scratch_operand_p): New
prototypes.
(ira_register_new_scratch_op, ira_restore_scratches): New prototypes.
(ira_remove_insn_scratches): New prototype.

[committed] avoid creating inverted ranges in access_ref::add_offset (PR 97556)

2020-10-30 Thread Martin Sebor via Gcc-patches

access_ref::add_offset() works hard to restore the property that
the lower bound of a range is less than or equal to its upper
bound.  But by capping the upper bound to at most PTRDIFF_MAX
without also considering the lower bound, it allows the latter
to exceed the value of the former, thus violating the very
postcondition it aims to guarantee.

To correct this oversight I have committed the attached patch
as an obvious fix.  Tested on x86_64-linux.

Martin
PR middle-end/97556 - ICE on excessively large index into a multidimensional array

gcc/ChangeLog:

	PR middle-end/97556
	* builtins.c (access_ref::add_offset): Cap offset lower bound
	to at most the upper bound.

gcc/testsuite/ChangeLog:

	PR middle-end/97556
	* gcc.dg/Warray-bounds-70.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3a3eb5562df..da25343beb1 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -321,7 +321,13 @@ void access_ref::add_offset (const offset_int &min, const offset_int &max)
   offrng[1] = maxoff;
   offset_int absmax = wi::abs (max);
   if (offrng[0] < absmax)
-	offrng[0] += min;
+	{
+	  offrng[0] += min;
+	  /* Cap the lower bound at the upper (set to MAXOFF above)
+	 to avoid inadvertently recreating an inverted range.  */
+	  if (offrng[1] < offrng[0])
+	offrng[0] = offrng[1];
+	}
   else
 	offrng[0] = 0;
 }
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-70.c b/gcc/testsuite/gcc.dg/Warray-bounds-70.c
new file mode 100644
index 000..087e255599c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-70.c
@@ -0,0 +1,18 @@
+/* PR middle-end/97556 - ICE on excessively large index into a multidimensional
+   array
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+#define SIZE_MAX __SIZE_MAX__
+
+typedef __SIZE_TYPE__ size_t;
+
+char a[1][3];
+
+void f (int c)
+{
+  size_t i = c ? SIZE_MAX / 2 : SIZE_MAX;
+  a[i][0] = 0;  // { dg-warning "\\\[-Warray-bounds" }
+}
+
+// { dg-prune-output "\\\[-Wstringop-overflow=" }
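
The invariant the fix restores can be shown on a much simpler model.  The sketch below is not access_ref::add_offset: it uses plain int64_t instead of offset_int, assumes the additions do not overflow, and the names struct offset_range and add_offset_capped are hypothetical.  It only demonstrates that capping the upper bound alone can invert a range, and that capping the lower bound at the upper one repairs it.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified offset range; the invariant is lo <= hi.  */
struct offset_range
{
  int64_t lo;
  int64_t hi;
};

/* Add [MIN, MAX] to *R and cap the upper bound at MAXOFF.  Assumes the
   sums fit in int64_t.  Without the final adjustment, a large MIN
   could leave lo above the capped hi.  */
static void
add_offset_capped (struct offset_range *r, int64_t min, int64_t max,
                   int64_t maxoff)
{
  assert (r->lo <= r->hi && min <= max);
  r->lo += min;
  r->hi += max;
  if (r->hi > maxoff)
    r->hi = maxoff;
  /* Cap the lower bound at the upper one to avoid an inverted range.  */
  if (r->hi < r->lo)
    r->lo = r->hi;
}
```

For example, starting from {0, 0} and adding [150, 200] with a cap of 100 yields {100, 100} rather than the inverted {150, 100}.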


[patch] Fixing ppc64 test failure after patch dealing with scratches in IRA

2020-10-30 Thread Vladimir Makarov via Gcc-patches

  The following patch fixes failures for test p9-extract-2.c on
ppc64.  The failures are a result of committing the patch dealing with
insn scratches in IRA.  The pseudo corresponding to the 1st scratch in
the following insn gets an unexpected register class (general regs) and
an unexpected insn alternative (the 2nd one).

;; Optimize stores to use the ISA 3.0 scalar store instructions
(define_insn_and_split "*vsx_extract__store_p9"
  [(set (match_operand: 0 "memory_operand" "=Z,m")
    (vec_select:
 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" ",v")
 (parallel [(match_operand:QI 2 "const_int_operand" "n,n")])))
   (clobber (match_scratch: 3 "=,&r"))
   (clobber (match_scratch:SI 4 "=X,&r"))]
  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"

Actually, getting the right hard register in LRA before the patch was just luck.

The following patch fixes the failures by adding hints * to the constraints.

Is it ok to commit?

2020-10-30  Vladimir Makarov  

    * config/rs6000/vsx.md (*vsx_extract__store_p9): Add 
hints * to 1st scratch.


diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 67e4f2fd037..78de85ccbbb 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3717,7 +3717,7 @@
 	(vec_select:
 	 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" ",v")
 	 (parallel [(match_operand:QI 2 "const_int_operand" "n,n")])))
-   (clobber (match_scratch: 3 "=,&r"))
+   (clobber (match_scratch: 3 "=*,&*r"))
(clobber (match_scratch:SI 4 "=X,&r"))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
   "#"


Re: [PATCH] i386: Set the stack usage to 0 for naked functions

2020-10-30 Thread Uros Bizjak via Gcc-patches
> -fstack-usage raises a "stack usage computation not supported for this target"
> warning when it encounters a naked function because the prologue returns early
> for naked functions on i386. This patch sets the stack usage to zero for naked
> functions, following the fix done for Arm by Eric Botcazou:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2016-May/448258.html
>
> Bootstrapped and tested on x86_64-linux. If approved, I'll need a maintainer
> to commit on my behalf.
>
> Thanks,
>
> Pat Bernardi
> Senior Software Engineer, AdaCore
>
>
> 2020-10-29  Pat Bernardi  
>
> gcc/ChangeLog
>
> * config/i386/i386.c (ix86_expand_prologue): Set the stack usage to 0
> for naked functions.

OK.

Thanks,
Uros.


Re: [PATCH][middle-end][i386][version 6]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-gpr-arg|all-arg|used-gpr|all-gpr|used|all]

2020-10-30 Thread Qing Zhao via Gcc-patches
FYI.

I just committed the patch to gcc11 as:

https://gcc.gnu.org/pipermail/gcc-cvs/2020-October/336263.html 


Qing

Re: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-10-30 Thread Richard Sandiford via Gcc-patches
Sudakshina Das  writes:
>> > +
>> > +  /* "Cast" the *dst to the correct mode.  */
>> > +  *dst = adjust_address (*dst, mode, 0);
>> > +  /* Emit the memset.  */
>> > +  emit_move_insn (*dst, reg);
>> > +  /* Move the pointer forward.  */
>> > +  *dst = aarch64_progress_pointer (*dst);
>> > +}
>> > +
>> > +/* Expand setmem, as if from a __builtin_memset.  Return true if
>> > +   we succeed, otherwise return false.  */
>> > +
>> > +bool
>> > +aarch64_expand_setmem (rtx *operands)
>> > +{
>> > +  int n, mode_bits;
>> > +  unsigned HOST_WIDE_INT len;
>> > +  rtx dst = operands[0];
>> > +  rtx val = operands[2], src;
>> > +  rtx base;
>> > +  machine_mode cur_mode = BLKmode, next_mode;
>> > +  bool speed_p = !optimize_function_for_size_p (cfun);
>> > +  unsigned max_set_size = speed_p ? 256 : 128;
>> 
>> What's the basis for the size value?  AIUI (and I've probably got this wrong),
>> that effectively means a worst case of 3+2 stores
>> (3 STP Qs and 2 mop-up stores).  Then we need one instruction to set up the
>> constant.  So if that's right, it looks like the worst-case size is 6
>> instructions.
>>
>> AARCH64_CALL_RATIO has a value of 8, but I'm not sure how that relates to
>> the number of instructions in a call.  I guess the best case is 4 (3
>> instructions for the parameters and one for the call itself).
>> 
>
> This one I will ask Wilco to chime in. We discussed offline what would be the
> largest case that this builtin should allow and he suggested 256-bytes. It
> would actually generate 9 instructions (it's in memset-corner-case.c).
> Personally I am not sure what the best decisions are in this case so
> I will rely on Wilco's suggestions.

Ah, sorry, by “the size value”, I meant the !speed_p value of 128.
I now realise that that was far from clear given that the variable is
called max_set_size :-)

So yeah, I'm certainly not questioning the speed_p value of 256.
I'm sure you and Wilco have picked the best value for that.  But -Os
stuff can usually be justified on first principles and I wasn't sure
where the value of 128 came from.
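
For anyone wanting to check the "3+2 stores" estimate quoted above, here is a rough model of the chunking.  It is a sketch under stated assumptions, not the code from the patch: sizes are in bytes, stores are powers of two up to a limit (32 bytes standing for one STP of two Q registers), and a short tail is rounded up to the next power of two and written with a store that overlaps bytes already set.  The names floor_pow2, ceil_pow2 and count_stores are hypothetical.

```c
#include <assert.h>

/* Largest power of two that is <= X, for X >= 1.  */
static unsigned
floor_pow2 (unsigned x)
{
  unsigned p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

/* Smallest power of two that is >= X, for X >= 1.  */
static unsigned
ceil_pow2 (unsigned x)
{
  unsigned p = 1;
  while (p < x)
    p *= 2;
  return p;
}

/* Count the stores this model needs to set LEN bytes when one store
   covers at most LIMIT bytes (LIMIT must be a power of two).  */
static unsigned
count_stores (unsigned len, unsigned limit)
{
  assert (limit >= 1 && (limit & (limit - 1)) == 0);
  unsigned stores = 0;
  unsigned n = len;
  while (n > 0)
    {
      unsigned chunk = floor_pow2 (n < limit ? n : limit);
      stores++;
      n -= chunk;
      /* A short tail is widened and written with an overlapping store.  */
      if (n > 0 && n < limit / 2)
        n = ceil_pow2 (n);
    }
  return stores;
}
```

Under this model 127 bytes with a 32-byte limit takes three 32-byte stores plus two 16-byte stores, which agrees with the 3+2 figure; the instruction that materialises the constant is not counted.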

>> > +
>> > +  /* Convert len to bits to make the rest of the code simpler.  */
>> > +  n = len * BITS_PER_UNIT;
>> > +
>> > +  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
>> > + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  setmem expand
>> > + pattern is only turned on for TARGET_SIMD.  */
>> > +  const int copy_limit = ((aarch64_tune_params.extra_tuning_flags
>> > + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
>> > +   ? GET_MODE_BITSIZE (TImode) : 256;
>> > +
>> 
>> Perhaps we should override this for !speed, since I guess the limits are
>> based on using STP Q in that case.  There again, we don't do that for the
>> memcpy code, so it's just a suggestion.
>> 
>
> I think at this point we are deciding what would be the maximum size that we
> can set in one go and so if the core lets me do STP Q, I would do 256 bits.
> Would this choice be different if we were optimizing for size?

With the max_set_size value above we'd be aiming for 128 bytes when
optimising for size.  But GCC is usually quite aggressive about -Os
(even to the extent of using division instructions instead of a fairly
short inline expansion) so I wasn't sure whether we should let the core
have a veto over using STP Q even for -Os.  If we do, I imagine the
!speed_p value of max_set_size should vary based on whether we use
STP Q or not.

>> > +  if (n > 0 && n < copy_limit / 2)
>> > +  {
>> > +next_mode = smallest_mode_for_size (n, MODE_INT);
>> > +/* Last 1-byte causes the compiler to optimize to STRB when it should
>> > +   use STR Bx, [mem] since we already used SIMD registers.
>> > +   So force it to HImode.  */
>> > +if (next_mode == QImode)
>> > +  next_mode = HImode;
>> 
>> Is this always better?  E.g. for variable inputs and zero it seems quite
>> natural to store the original scalar GPR.
>> 
>> If we do do this, I think we should assert before the loop that n > 1.
>> 
>> Also, it would be good to cover this case in the tests.
>
> To give a background on this:
> So the case in point here is when we are copying the _last_ 1 byte. So for
> the following
> void foo (void *p) { __builtin_memset (p, 1, 3); }
> The compiler was generating
>     movi    v0.16b, 0x1
>     mov     w1, 1
>     strb    w1, [x0, 2]
>     str     h0, [x0]
>     ret
> This is because after my expansion in subsequent passes it would see
> (insn 13 12 14 2 (set (reg:QI 99)
> (subreg:QI (reg:V16QI 98) 0)) "x.c":3:3 -1
>  (nil))
> (insn 14 13 0 2 (set (mem:QI (plus:DI (reg:DI 93)
> (const_int 2 [0x2])) [0 MEM  [(void *)p_2(D)]+2 S1 
> A8])
> (reg:QI 99)) "x.c":3:3 -1
>  (nil))
> And "optimize" it away to strb with an extra mov. Ideally this is a separate
> patch to fix this somewhere between cse1 and fwprop1 and emit
>     movi    v0.16b, 0

Re: [PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Michael Meissner via Gcc-patches
On Fri, Oct 30, 2020 at 01:52:13PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 30, 2020 at 01:21:34PM -0400, Michael Meissner wrote:
> > David reminded me that not all targets support GLIBC.  This patch should
> > fix my previous committed patch not to use TARGET_GLIBC_MAJOR or
> > TARGET_GLIBC_MINOR unless they are defined.
> 
> > +   This support is only in little endian GLIBC 2.32 or newer.  */
> > +static bool
> > +glibc_supports_ieee_128bit (void)
> > +{
> > +#if defined (OPTION_GLIBC) \
> > +  && defined (TARGET_GLIBC_MAJOR) \
> > +  && defined (TARGET_GLIBC_MINOR)
> > +
> > +  if (OPTION_GLIBC
> > +  && !BYTES_BIG_ENDIAN
> > +  && DEFAULT_ABI == ABI_ELFv2
> > +  && ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) >= 2032)
> > +return true;
> > +#endif /* GLIBC provided.  */
> > +
> > +  return false;
> > +}
> > +
> 
> So this makes the compiler behave differently based on what the libc
> used at build time was (including its version).  Can't we do better?  :-(

Not really.  We have exactly the same issue with __builtin_cpu_supports.  At
the end of the day, you have to configure GCC with an appropriate GLIBC.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] i386: Cleanup i386/i386elf.h and align it's return convention with the SVR4 ABI

2020-10-30 Thread Uros Bizjak via Gcc-patches
> As observed a number of years ago in the following thread, i386/i386elf.h has
> not been kept up to date:
>
> https://gcc.gnu.org/pipermail/gcc/2013-August/209981.html
>
> This patch does the following cleanup:
>
> 1. The return convention now follows the i386 and x86_64 SVR4 ABIs again. As
> discussed in the above thread, the current return convention does not match
> any other target or existing ABI, which is problematic since the current
> approach is inefficient (particularly on x86_64-elf) and confuses other tools
> like GDB (unfortunately that thread did not lead to any fix at the time).
>
> 2. The default version of ASM_OUTPUT_ASCII from elfos.h is used. As mentioned
> in the cleanup of i386/sysv4.h [1] the ASM_OUTPUT_ASCII implementation then
> used by sysv4.h, and currently used by i386elf.h, has a significantly higher
> computational complexity than the default version provided by elfos.h.
>
> The patch has been tested on i386-elf and x86_64-elf hosted on x86_64-linux,
> fixing a number of failing tests that were expecting the SVR4 ABI return
> convention. It has also been bootstrapped and tested on x86_64-pc-linux-gnu
> without regression.
>
> If approved, I'll need a maintainer to kindly commit on my behalf.
>
> Thanks,
>
> Pat Bernardi
> Senior Software Engineer, AdaCore
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2011-February/305559.html

Looking at [1], it looks like i386elf.h has suffered some bitrot.
Probably nobody cares much for {i386,x86_64}-elf nowadays.

So, I think, for the reasons explained in [1], and based on your testing,
that the patch should be committed to the mainline to fix the ABI
issues. However, I wonder if the ABI change is severe enough to
warrant a compile-time warning?

> 2020-08-18  Pat Bernardi  
>
> gcc/ChangeLog
>
> * config/i386/i386elf.h (SUBTARGET_RETURN_IN_MEMORY): Remove.
> (ASM_OUTPUT_ASCII): Likewise.
> (DEFAULT_PCC_STRUCT_RETURN): Define.
> * config/i386/i386.c (ix86_return_in_memory): Remove
> SUBTARGET_RETURN_IN_MEMORY.

-/* The ELF ABI for the i386 says that records and unions are returned
-   in memory.  */
-
-#define SUBTARGET_RETURN_IN_MEMORY(TYPE, FNTYPE) \
- (TYPE_MODE (TYPE) == BLKmode \
- || (VECTOR_MODE_P (TYPE_MODE (TYPE)) && int_size_in_bytes (TYPE) == 8))
+/* Define DEFAULT_PCC_STRUCT_RETURN to 1 because the i386 SVR4 ABI returns
+   records and unions in memory. ix86_option_override_internal will overide
+   this flag when compiling 64-bit code as we never do pcc_struct_return
+   scheme on x86-64.  */
+#undef DEFAULT_PCC_STRUCT_RETURN
+#define DEFAULT_PCC_STRUCT_RETURN 1

The documentation says:

--cut here--
DEFAULT_PCC_STRUCT_RETURN Define this macro to be 1 if all structure
and union return values must be in memory. Since this results in
slower code, this should be defined only if needed for compatibility
with other compilers or with an ABI. If you define this macro to be 0,
then the conventions used for structure and union return values are
decided by the RETURN_IN_MEMORY macro.

If not defined, this defaults to the value 1.
--cut here--

So, is it necessary to define DEFAULT_PCC_STRUCT_RETURN ?

Uros.


[PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-10-30 Thread Carl Love via Gcc-patches
GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are:
vec_mulh(), vec_div(), vec_dive(), vec_mod() for signed and unsigned
integers and long long integers.  Support for signed and unsigned long
long integers is added to the existing vec_mul().  Note that the existing
support for the vec_div() and vec_mul() builtins emulates the vector
operations with multiple scalar instructions.  This patch adds support
for these builtins to use the new vector instructions.
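
For readers unfamiliar with the operations, here is a rough scalar model of
what a single 32-bit lane computes for the multiply-high and modulo cases.
It is only a sketch: the function names are hypothetical, it is not code
from the patch, and it assumes that "multiply high" means the high-order
half of the double-width product.

```cpp
#include <cstdint>

// Hypothetical per-lane models; not the builtins themselves.

// Signed multiply-high: high-order 32 bits of the 64-bit product.
// Relies on arithmetic right shift of negative values, which GCC provides.
constexpr std::int32_t mulh_s32(std::int32_t a, std::int32_t b)
{
  return static_cast<std::int32_t>((static_cast<std::int64_t>(a) * b) >> 32);
}

// Unsigned multiply-high: high-order 32 bits of the 64-bit product.
constexpr std::uint32_t mulh_u32(std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}

// Unsigned modulo: remainder of a divided by b (b must be nonzero).
constexpr std::uint32_t mod_u32(std::uint32_t a, std::uint32_t b)
{
  return a % b;
}

static_assert(mulh_s32(0x40000000, 4) == 1, "2^30 * 4 == 2^32");
static_assert(mulh_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu, "high half");
static_assert(mod_u32(10u, 4u) == 2u, "remainder");
```

A vector builtin would apply the same computation to every lane at once.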

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

-

2020-10-30  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vector.md.
* config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI, VMULHS_V2DI,
VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI): Add builtin define.
(VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV, 
P10_BUILTIN_VEC_VDIVE,
P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH): New overloaded 
definitions.
(builtin_function_type)
[P10V_BUILTIN_VDIVEU_V4SI, P10V_BUILTIN_VDIVEU_V2DI,
P10V_BUILTIN_VDIVU_V4SI, P10V_BUILTIN_VDIVU_V2DI,
P10V_BUILTIN_VMODU_V2DI, P10V_BUILTIN_VMODU_V4SI, 
P10V_BUILTIN_VMULHU_V2DI,
P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case statement
for builtins.
* config/rs6000/vector.md (UNSPEC_VDIVES, UNSPEC_VDIVEU, UNSPEC_VMULHS,
UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
(VIlong_char): Add define_mod_attribute.
(vdives_, vdiveu_, vdiv3, uuvdiv3, vdivs_,
vdivu_, vmods_, vmodu_, vmulhs_, vmulhu_,
mulv2di3): Add define_insn, mode is VIlong.
* config/rs6000/vsx.md (vsx_mul_v2di, vsx_udiv_v2di): Add if 
(TARGET_POWER10)
statement.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   5 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  23 ++
 gcc/config/rs6000/rs6000-call.c   |  49 +++
 gcc/config/rs6000/vector.md   | 104 +
 gcc/config/rs6000/vsx.md  | 118 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 378 ++
 8 files changed, 747 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index df10a8c498d..b2803e52d93 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -725,6 +725,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
+#define vec_div(a, b) __builtin_vec_div (a, b)
+#define vec_dive(a, b) __builtin_vec_dive (a, b)
+#define vec_mod(a, b) __builtin_vec_mod (a, b)
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..8e80c681b11 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -192,8 +192,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 5b05da87f4b..706527dcd3a 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2830,6 +2830,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, 
vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
+BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
+BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
+BU_P10V_AV_2 (VDIVEU_V2DI

Re: [PATCH] c++: Tweaks for value_dependent_expression_p.

2020-10-30 Thread Jason Merrill via Gcc-patches

On 10/29/20 10:36 PM, Marek Polacek wrote:

We may not call value_dependent_expression_p on expressions that are
not potential constant expressions, otherwise value_d could crash,
as I saw recently (in C++98).  So beef up the checking in i_d_e_p.

This revealed a curious issue: when we have __PRETTY_FUNCTION__ in
a template function, we set its DECL_VALUE_EXPR to error_mark_node
(cp_make_fname_decl), so potential_c_e returns false when it gets it,
but value_dependent_expression_p handles it specially and says true.
This broke lambda-generic-pretty1.C.  So take care of that.

And then also tweak uses_template_parms.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.


gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): Treat
__PRETTY_FUNCTION__ inside a template function as
potentially-constant.
* pt.c (uses_template_parms): Call
instantiation_dependent_expression_p instead of
value_dependent_expression_p.
(instantiation_dependent_expression_p): Check
potential_constant_expression before calling
value_dependent_expression_p.
---
  gcc/cp/constexpr.c | 5 +
  gcc/cp/pt.c| 5 +++--
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b46824f128d..c257dfcb2e6 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -7716,6 +7716,11 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
}
  return false;
}
+ /* Treat __PRETTY_FUNCTION__ inside a template function as
+potentially-constant.  */
+ else if (DECL_PRETTY_FUNCTION_P (t)
+  && DECL_VALUE_EXPR (t) == error_mark_node)
+   return true;
  return RECUR (DECL_VALUE_EXPR (t), rval);
}
if (want_rval
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b569644514c..c419fb470ee 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10755,7 +10755,7 @@ uses_template_parms (tree t)
else if (t == error_mark_node)
  dependent_p = false;
else
-dependent_p = value_dependent_expression_p (t);
+dependent_p = instantiation_dependent_expression_p (t);
  
processing_template_decl = saved_processing_template_decl;
  
@@ -27293,7 +27293,8 @@ bool

  instantiation_dependent_expression_p (tree expression)
  {
return (instantiation_dependent_uneval_expression_p (expression)
- || value_dependent_expression_p (expression));
+ || (potential_constant_expression (expression)
+ && value_dependent_expression_p (expression)));
  }
  
  /* Like type_dependent_expression_p, but it also works while not processing


base-commit: 4f0606fe4bbf1346f83dd4d0c9060c6b46672a7d





Re: [PATCH v2] c++: Implement -Wvexing-parse [PR25814]

2020-10-30 Thread Jason Merrill via Gcc-patches

On 10/29/20 11:00 PM, Marek Polacek wrote:

On Thu, Oct 29, 2020 at 02:25:33PM -0400, Jason Merrill via Gcc-patches wrote:

On 10/29/20 2:11 PM, Marek Polacek wrote:

On Thu, Oct 29, 2020 at 11:17:37AM -0400, Jason Merrill via Gcc-patches wrote:

On 10/28/20 7:40 PM, Marek Polacek wrote:

On Wed, Oct 28, 2020 at 03:09:08PM -0400, Jason Merrill wrote:

On 10/28/20 1:58 PM, Marek Polacek wrote:

On Wed, Oct 28, 2020 at 01:26:53AM -0400, Jason Merrill via Gcc-patches wrote:

On 10/24/20 7:40 PM, Marek Polacek wrote:

On Fri, Oct 23, 2020 at 09:33:38PM -0400, Jason Merrill via Gcc-patches wrote:

On 10/23/20 3:01 PM, Marek Polacek wrote:

This patch implements the -Wvexing-parse warning to warn about the
sneaky most vexing parse rule in C++: the cases when a declaration
looks like a variable definition, but the C++ language requires it
to be interpreted as a function declaration.  This warning is on by
default (like clang++).  From the docs:

void f(double a) {
  int i();// extern int i (void);
  int n(int(a));  // extern int n (int);
}

Another example:

struct S { S(int); };
void f(double a) {
  S x(int(a));   // extern struct S x (int);
  S y(int());// extern struct S y (int (*) (void));
  S z(); // extern struct S z (void);
}

You can find more on this in [dcl.ambig.res].

I spent a fair amount of time on fix-it hints so that GCC can recommend
various ways to resolve such an ambiguity.  Sometimes that's tricky.
E.g., suggesting default-initialization when the class doesn't have
a default constructor would not be optimal.  Suggesting {}-init is also
not trivial because it can use an initializer-list constructor if no
default constructor is available (which ()-init wouldn't do).  And of
course, pre-C++11, we shouldn't be recommending {}-init at all.
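
To make the trade-offs above concrete, here is a small sketch of common ways
to turn the vexing declarations into real variable definitions.  It is not
the patch's testcase, the names S and disambiguated are hypothetical, and it
does not claim to show the exact fix-it text GCC prints.

```cpp
// Hypothetical class with both a default and a converting constructor.
struct S {
  int v;
  constexpr S() : v(-1) {}
  constexpr S(int x) : v(x) {}
};

constexpr int disambiguated(double a)
{
  int i{};        // braces: a variable, value-initialized to 0 (C++11 and later)
  S x((int(a)));  // extra parentheses force an expression, so x is a variable
  S y{int(a)};    // braces cannot introduce a function declarator
  S z;            // default-initialization instead of "S z();"
  return i + x.v + y.v + z.v;
}

// 0 + 2 + 2 + (-1): every declaration above created an object.
static_assert(disambiguated(2.9) == 3, "all four are variables");
```

Note that the S z; form is only appropriate because S has a default
constructor, which is the situation the fix-it logic has to check for.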


What do you think of, instead of passing the type down into the declarator
parse, adding the paren locations to cp_declarator::function and giving the
diagnostic from cp_parser_init_declarator instead?


Oops, now I see there's already cp_declarator::parenthesized; might as well
reuse that.  And maybe change it to a range, while we're at it.


I'm afraid I can't reuse it because grokdeclarator uses it to warn about
"unnecessary parentheses in declaration".  So when we have:

  int (x());

declarator->parenthesized points to the outer parens (if any), whereas
declarator->u.function.parens_loc should point to the inner ones.  We also
have declarator->id_loc but I think we should only use it for declarator-ids.


Makes sense.


(We should still adjust ->parenthesized to be a range to generate a better
diagnostic; I shall send a patch soon.)


Hmm, I wonder why we have the parenthesized_p parameter to some of these
functions, since we can look at the declarator to find that information...


That would be a nice cleanup.


Interesting idea.  I suppose it's better, and makes the implementation
more localized.  The approach here is that if the .function.parens_loc
is UNKNOWN_LOCATION, we've not seen a vexing parse.


I'd rather always set the parens location, and then analyze the
cp_declarator in warn_about_ambiguous_parse to see if it's a vexing parse;
we should have all the information we need.


I could always set .parens_loc, but then I'd still need another flag telling
me whether we had an ambiguity.  Otherwise I don't know how I would tell
apart e.g. "int f()" (warn) v. "int f(void)" (don't warn), etc.


Ah, I was thinking that we still had the parameter declarators, but now I
see that cp_parser_parameter_declaration_list groks them and returns a
TREE_LIST.  We could set a TREE_LANG_FLAG on each TREE_LIST if its parameter
declarator was parenthesized?


I think so, looks like we have a bunch of free TREE_LANG_FLAG slots on
a TREE_LIST.  But cp_parser_parameter_declaration_clause can return
a void_list_node, so I assume I'd have to copy_node it before setting
some new flag in it.  Do you think that'd be fine?


There's no declarator in a void_list_node, so we shouldn't need to set a
"declarator is parenthesized" flag on it.


I guess I'm still not clear on how I would distinguish between
int f() and int f(void).  When I look at the cdk_function declarator,
all I can see is the .parameters TREE_LIST, which for both cases will
be the same void_list_node, but we should only warn for the former.

What am I missing?


I'm just being dense.  You're right that we would need to distinguish those
two.  Perhaps an explicit_void_parms_node or something like that for during
parsing; it looks like grokparms will turn it into void_list_node as other
code expects.


Gotcha.  Now we do most of the work in warn_about_ambiguous_parse.


Thanks, just a few tweaks left.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements the -Wvexing-parse warning to warn about the
sneaky most vexing parse rule in C++: the cases when a declaratio

Re: [PATCH] c++: Disable -Winit-list-lifetime in unevaluated operand [PR97632]

2020-10-30 Thread Jason Merrill via Gcc-patches

On 10/29/20 10:35 PM, Marek Polacek wrote:

Jon suggested turning this warning off when we're not actually
evaluating the operand.  This patch does that.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/97632
* init.c (build_new_1): Disable -Winit-list-lifetime for an unevaluated
operand.

gcc/testsuite/ChangeLog:

PR c++/97632
* g++.dg/warn/Winit-list4.C: New test.
---
  gcc/cp/init.c   |  2 +-
  gcc/testsuite/g++.dg/warn/Winit-list4.C | 15 +++
  2 files changed, 16 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Winit-list4.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 1bddb6555dc..ffb84ea5b09 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2957,7 +2957,7 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
return error_mark_node;
  }
  
-  if (is_std_init_list (elt_type))

+  if (is_std_init_list (elt_type) && !cp_unevaluated_operand)
  warning (OPT_Winit_list_lifetime,
 "%<new%> of %<initializer_list%> does not "
 "extend the lifetime of the underlying array");
diff --git a/gcc/testsuite/g++.dg/warn/Winit-list4.C 
b/gcc/testsuite/g++.dg/warn/Winit-list4.C
new file mode 100644
index 000..d136187e2c6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winit-list4.C
@@ -0,0 +1,15 @@
+// PR c++/97632
+// { dg-do compile { target c++20 } }
+// Test we don't warn in an unevaluated operand.
+
+#include <initializer_list>
+
+template<typename _Tp>
+concept default_initializable
+  = requires
+{
+  _Tp{};
+  (void) ::new _Tp; // { dg-bogus "does not extend the lifetime" }
+};
+
+static_assert(default_initializable>);

base-commit: dec1eb4c276f1b3c003154c159b539eb7110a13f





Re: [PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Segher Boessenkool
On Fri, Oct 30, 2020 at 04:00:30PM -0400, Michael Meissner wrote:
> On Fri, Oct 30, 2020 at 01:52:13PM -0500, Segher Boessenkool wrote:
> > On Fri, Oct 30, 2020 at 01:21:34PM -0400, Michael Meissner wrote:
> > > David reminded me that not all targets support GLIBC.  This patch should 
> > > fix my
> > > previous committed patch not to use TARGET_GLIBC_MAJOR or 
> > > TARGET_GLIBC_MINOR
> > > unless they are defined.
> > 
> > > +   This support is only in little endian GLIBC 2.32 or newer.  */
> > > +static bool
> > > +glibc_supports_ieee_128bit (void)
> > > +{
> > > +#if defined (OPTION_GLIBC) \
> > > +  && defined (TARGET_GLIBC_MAJOR) \
> > > +  && defined (TARGET_GLIBC_MINOR)
> > > +
> > > +  if (OPTION_GLIBC
> > > +  && !BYTES_BIG_ENDIAN
> > > +  && DEFAULT_ABI == ABI_ELFv2
> > > +  && ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) >= 2032)
> > > +return true;
> > > +#endif /* GLIBC provided.  */
> > > +
> > > +  return false;
> > > +}
> > > +
> > 
> > So this makes the compiler behave differently based on what the libc
> > used at build time was (including its version).  Can't we do better?  :-(
> 
> Not really.  We have exactly the same issue with __builtin_cpu_supports.  At
> the end of the day, you have to configure GCC with an appropriate GLIBC.

That isn't quite the same issue: hopefully all distros have backported
all hwcap names (it is always safe to do so), but they will not backport
all other features.

But, time will fix all problems here.  So, okay for trunk.  Thanks!


Segher


Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-10-30 Thread David Edelsohn via Gcc-patches
On Fri, Oct 30, 2020 at 4:07 PM Carl Love  wrote:

> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> new file mode 100644
> index 000..549bc742d12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> @@ -0,0 +1,378 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target power10_hw } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +/* { dg-final { scan-assembler-times "\mvdivsw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivuw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivsd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivesw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdiveuw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdivesd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvdiveud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmodsw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmoduw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmodsd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmodud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhsw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhuw\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhsd\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulhud\M" 1 } } */
> +/* { dg-final { scan-assembler-times "\mvmulld\M" 2 } } */

As Alan mentioned with the other testcases, without an explicit
"-save-temps", dg-do run will not test for the assembler output.  Are
you certain that the assembler output is actually tested and matching?

Thanks, David


RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-10-30 Thread Carl Love via Gcc-patches
On Fri, 2020-10-30 at 17:05 -0400, David Edelsohn wrote:
> On Fri, Oct 30, 2020 at 4:07 PM Carl Love  wrote:
> 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> > runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> > runnable.c
> > new file mode 100644
> > index 000..549bc742d12
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
> > @@ -0,0 +1,378 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target power10_hw } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> > +/* { dg-final { scan-assembler-times "\mvdivsw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivuw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivsd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivesw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdiveuw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdivesd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvdiveud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmodsw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmoduw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmodsd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmodud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhsw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhuw\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhsd\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulhud\M" 1 } } */
> > +/* { dg-final { scan-assembler-times "\mvmulld\M" 2 } } */
> 
> As Alan mentioned with the other testcases, without an explicit
> "-save-temps", dg-do run will not test for the assembler output.  Are
> you certain that the assembler output is actually tested and
> matching?
> 
> Thanks, David

David:

I am just running the binary on Mambo by hand.  I am not running the
GCC regression test on Mambo.  I don't have GCC setup on Mambo.   But
yes, I did miss the -save-temps.  I will fix that.  Thanks.

   Carl 



Re: [PATCH] libstdc++: Don't initialize from *this inside some views [PR97600]

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 30/10/20 11:11 -0400, Patrick Palka via Libstdc++ wrote:

This works around a subtle issue where instantiating the begin()/end()
member of some views (as part of return type deduction) inadvertently
requires computing the satisfaction value of range.

This is problematic because the constraint range requires the
begin()/end() member to be callable.  But it's not callable until we've
deduced its return type, so evaluation of range yields false
at this point.  And if at any point after both members are instantiated
(and their return types deduced) we evaluate range again, this
time it will yield true since the begin()/end() members are now both
callable.  This makes the program ill-formed according to
[temp.constr.atomic]/3:

 If, at different points in the program, the satisfaction result is
 different for identical atomic constraints and template arguments, the
 program is ill-formed, no diagnostic required.

The views affected by this issue are those whose begin()/end() member
has a placeholder return type and that member initializes an _Iterator
or _Sentinel object from a reference to *this.  The second condition is
relevant because it means explicit conversion functions are considered
during overload resolution (as per [over.match.copy], I think), and
therefore it causes g++ to check the constraints of the conversion
function view_interface::operator bool().  And this conversion
function's constraints indirectly require range.

This issue is observable on trunk only with basic_istream_view (as in
the testcase in the PR).  But a pending patch that makes g++ memoize
constraint satisfaction values indefinitely (it currently invalidates
the satisfaction cache on various events) causes many existing tests for
the other affected views to fail, because range then remains
false for the whole compilation.

This patch works around this issue by adjusting the constructors of the
_Iterator and _Sentinel types of the affected views to take their
foo_view argument by pointer instead of by reference, so that g++ no
longer considers explicit conversion functions when resolving the
direct-initialization inside these views' begin()/end() members.


Nice solution.


Tested on x86_64-pc-linux-gnu, and also verified that this fixes the
testsuite failures when combined with the mentioned frontend patch
(https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557237.html).
Does this look OK for trunk?


Yes, thanks.




Re: [PATCH] Treat { 0 } specially for structs with the designated_init attribute.

2020-10-30 Thread Asher Gordon via Gcc-patches
Joseph Myers  writes:

> I've tested and committed the first patch.

Great, thanks!

> The second one introduces some test failures:
>
> [...]
>
> Could you investigate those and send versions of the second and third
> patches that don't introduce any test regressions?

I've also found a more serious bug: when Wdesignated-init-2.c is
compiled with -Wuniversal-initializer, it causes an internal compiler
error. I'll try to fix this and the test regressions.

-- 
"It's my cookie file and if I come up with something that's lame and I like it,
it goes in."
-- karl (Karl Lehenbauer)
   
I prefer to send and receive mail encrypted. Please send me your
public key, and if you do not have my public key, please let me
know. Thanks.

GPG fingerprint: 38F3 975C D173 4037 B397  8095 D4C9 C4FC 5460 8E68




Re: [PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Michael Meissner via Gcc-patches
On Fri, Oct 30, 2020 at 03:54:06PM -0500, Segher Boessenkool wrote:
> But, time will fix all problems here.  So, okay for trunk.  Thanks!

Note, I discovered the ABI is not set to ELFv2 at the time the test is done, so
I removed that part of the test.

This is the patch I committed:

>From f03851e1a6dac72127e97629e259ad01a2b1e7b6 Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Fri, 30 Oct 2020 18:36:25 -0400
Subject: [PATCH] PowerPC: Don't assume all targets have GLIBC.

gcc/
2020-10-30  Michael Meissner  

* config/rs6000/rs6000.c (glibc_supports_ieee_128bit): New helper
function.
(rs6000_option_override_internal): Call it.
---
 gcc/config/rs6000/rs6000.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bcd4c4a82b3..1e506b83762 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3539,6 +3539,20 @@ rs6000_linux64_override_options ()
 }
 #endif
 
+/* Return true if we are using GLIBC, and it supports IEEE 128-bit long double.
+   This support is only in little endian GLIBC 2.32 or newer.  */
+static bool
+glibc_supports_ieee_128bit (void)
+{
+#ifdef OPTION_GLIBC
+  if (OPTION_GLIBC && !BYTES_BIG_ENDIAN
+  && ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) >= 2032)
+return true;
+#endif /* OPTION_GLIBC.  */
+
+  return false;
+}
+
 /* Override command line options.
 
Combine build-specific configuration information with options
@@ -4164,9 +4178,8 @@ rs6000_option_override_internal (bool global_init_p)
  static bool warned_change_long_double;
 
  if (!warned_change_long_double
- && (!OPTION_GLIBC
- || (!lang_GNU_C () && !lang_GNU_CXX ())
- || ((TARGET_GLIBC_MAJOR * 1000) + TARGET_GLIBC_MINOR) < 2032))
+ && (!glibc_supports_ieee_128bit ()
+ || (!lang_GNU_C () && !lang_GNU_CXX ())))
{
  warned_change_long_double = true;
  if (TARGET_IEEEQUAD)
-- 
2.22.0
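
As a side note on the version test in the helper above: it packs the major
and minor numbers into one integer so a single comparison suffices.  A
minimal sketch, using a hypothetical function in place of the
TARGET_GLIBC_MAJOR and TARGET_GLIBC_MINOR macros:

```cpp
// Hypothetical stand-in for the macro-based check; major and minor are
// passed as arguments instead of coming from the GCC configuration.
constexpr bool glibc_at_least_2_32(int major, int minor)
{
  // 2.32 encodes as 2 * 1000 + 32 == 2032.
  return major * 1000 + minor >= 2032;
}

static_assert(glibc_at_least_2_32(2, 32), "2.32 qualifies");
static_assert(!glibc_at_least_2_32(2, 31), "2.31 does not");
static_assert(glibc_at_least_2_32(3, 0), "a later major version qualifies");
```

The encoding stays ordered as long as the minor number is below 1000.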


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [patch] Fixing ppc64 test failure after patch dealing with scratches in IRA

2020-10-30 Thread Segher Boessenkool
Hi!

On Fri, Oct 30, 2020 at 03:19:12PM -0400, Vladimir Makarov wrote:
>   The following patch fixes failures for test p9-extract-2.c on
> ppc64.  The failures are a result of committing the patch dealing with insn
> scratches in IRA.  The pseudo corresponding to the 1st scratch in the
> following insn gets an unexpected register class (general regs) and an
> unexpected insn alternative (the 2nd one).
> 
> ;; Optimize stores to use the ISA 3.0 scalar store instructions
> (define_insn_and_split "*vsx_extract__store_p9"
>   [(set (match_operand: 0 "memory_operand" "=Z,m")
>     (vec_select:
>  (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" 
> ",v")
>  (parallel [(match_operand:QI 2 "const_int_operand" 
> "n,n")])))
>    (clobber (match_scratch: 3 "=,&r"))
>    (clobber (match_scratch:SI 4 "=X,&r"))]
>   "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
> 
> Actually, getting the right hard register in LRA before the patch was just luck.
> 
> The following patch fixes the failures by adding hints * to the constraints.
> 
> Is it ok to commit?
> 
> 2020-10-30  Vladimir Makarov  
> 
>     * config/rs6000/vsx.md (*vsx_extract__store_p9): Add 
> hints * to 1st scratch.
> 

Thanks for the patch!  But it has a problem:

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 67e4f2fd037..78de85ccbbb 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3717,7 +3717,7 @@
>   (vec_select:
>(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" ",v")
>(parallel [(match_operand:QI 2 "const_int_operand" "n,n")])))
> -   (clobber (match_scratch: 3 "=,&r"))
> +   (clobber (match_scratch: 3 "=*,&*r"))
> (clobber (match_scratch:SI 4 "=X,&r"))]
>"VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
>"#"

You add * to both alternatives here?  I would expect adding it to only
the second alternative, does it work better with both?

That also avoids a different problem: * won't work as expected.
'*' in IRA skips one constraint character, but the mode attribute there can be "wa", a
two-letter constraint (and we do have an "a" constraint as well,
something wholly different: "wa" means a VSX register, while "a" is an
indexed address).

case '*':
  /* Ignore the next letter for this pass.  */
  c = *++p;
  break;
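
To illustrate the concern, here is a toy model of a scan that skips exactly
one character after '*'.  It is a sketch only: seen_char is a hypothetical
helper, not IRA code, and it ignores everything else the real constraint
walker does.

```cpp
#include <cstddef>

// Return the index-th character the toy scan still "sees" in a constraint
// string, or '\0' if there is none.  A '*' hides only the next character.
constexpr char seen_char(const char *p, std::size_t index)
{
  std::size_t n = 0;
  for (; *p; ++p)
    {
      if (*p == '*')
        {
          ++p;            // ignore the next letter only
          if (!*p)
            break;        // '*' was the last character
          continue;
        }
      if (n == index)
        return *p;
      ++n;
    }
  return '\0';
}

// A one-letter constraint is fully hidden by '*'.
static_assert(seen_char("*r", 0) == '\0', "r is skipped");
// With a two-letter constraint only the 'w' is hidden, leaving a bare 'a'.
static_assert(seen_char("*wa", 0) == 'a', "a is still seen");
```

In this model "*wa" is read as if it were "a", which is a different
constraint altogether.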


Segher


Re: [PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Segher Boessenkool
On Fri, Oct 30, 2020 at 06:50:30PM -0400, Michael Meissner wrote:
> On Fri, Oct 30, 2020 at 03:54:06PM -0500, Segher Boessenkool wrote:
> > But, time will fix all problems here.  So, okay for trunk.  Thanks!
> 
> Note, I discovered the ABI is not set to ELFv2 at the time the test is done, 
> so
> I removed that part of the test.

Good, thanks.  But why did you leave the !BYTES_BIG_ENDIAN?  That seems
just as wrong.


Segher


[committed] libstdc++: Implement P2017R1 "Conditionally borrowed ranges"

2020-10-30 Thread Jonathan Wakely via Gcc-patches
This makes some range adaptors model the borrowed_range concept if they
are adapting a borrowed range. This hasn't been added to the C++23
working paper yet, but it has been approved by LWG, and the
recommendation is to treat it as a defect report for C++20 as well.
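
The mechanism can be sketched outside of <ranges> with a plain variable
template.  All names below (enable_borrowed, owning, non_owning,
take_adaptor) are hypothetical stand-ins chosen for illustration; the real
trait is std::ranges::enable_borrowed_range and the real adaptors are the
views named in the ChangeLog.

```cpp
// Primary template: nothing is treated as borrowed by default.
template<typename T>
constexpr bool enable_borrowed = false;

struct owning {};      // stand-in for a range that owns its elements
struct non_owning {};  // stand-in for a range that only refers to elements

// Opt the non-owning type in explicitly.
template<>
constexpr bool enable_borrowed<non_owning> = true;

// Stand-in for an adaptor such as take_view.
template<typename T>
struct take_adaptor {};

// The adaptor is borrowed exactly when the adapted type is.
template<typename T>
constexpr bool enable_borrowed<take_adaptor<T>> = enable_borrowed<T>;

static_assert(enable_borrowed<take_adaptor<non_owning>>, "forwarded as true");
static_assert(!enable_borrowed<take_adaptor<owning>>, "forwarded as false");
```

The partial specialization simply forwards the answer from the underlying
type, which is why the property is called conditional.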

libstdc++-v3/ChangeLog:

* include/std/ranges (enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>): Add partial
specializations as per P2017R1.
* testsuite/std/ranges/adaptors/conditionally_borrowed.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

commit 39bf4f14fc75e14aafc4ba8a53a34775f29b743a
Author: Jonathan Wakely 
Date:   Fri Oct 30 18:39:43 2020

libstdc++: Implement P2017R1 "Conditionally borrowed ranges"

This makes some range adaptors model the borrowed_range concept if they
are adapting a borrowed range. This hasn't been added to the C++23
working paper yet, but it has been approved by LWG, and the
recommendation is to treat it as a defect report for C++20 as well.

libstdc++-v3/ChangeLog:

* include/std/ranges (enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>)
(enable_borrowed_view>): Add partial
specializations as per P2017R1.
* testsuite/std/ranges/adaptors/conditionally_borrowed.cc:
New test.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 610083167d89..bc7bb05b0050 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1810,6 +1810,10 @@ namespace views
 take_view(_Range&&, range_difference_t<_Range>)
   -> take_view>;
 
+  template
+inline constexpr bool enable_borrowed_range>
+  = enable_borrowed_range<_Tp>;
+
   namespace views
   {
 inline constexpr __adaptor::_RangeAdaptor take
@@ -2010,6 +2014,10 @@ namespace views
 drop_view(_Range&&, range_difference_t<_Range>)
   -> drop_view>;
 
+  template
+inline constexpr bool enable_borrowed_range>
+  = enable_borrowed_range<_Tp>;
+
   namespace views
   {
 inline constexpr __adaptor::_RangeAdaptor drop
@@ -2071,6 +2079,10 @@ namespace views
     drop_while_view(_Range&&, _Pred)
       -> drop_while_view<views::all_t<_Range>, _Pred>;
 
+  template<typename _Tp, typename _Pred>
+    inline constexpr bool enable_borrowed_range<drop_while_view<_Tp, _Pred>>
+      = enable_borrowed_range<_Tp>;
+
   namespace views
   {
     inline constexpr __adaptor::_RangeAdaptor drop_while
@@ -2891,6 +2903,10 @@ namespace views
   template<typename _Range>
     common_view(_Range&&) -> common_view<views::all_t<_Range>>;
 
+  template<typename _Tp>
+    inline constexpr bool enable_borrowed_range<common_view<_Tp>>
+      = enable_borrowed_range<_Tp>;
+
   namespace views
   {
     inline constexpr __adaptor::_RangeAdaptorClosure common
@@ -2976,6 +2992,10 @@ namespace views
   template<typename _Range>
     reverse_view(_Range&&) -> reverse_view<views::all_t<_Range>>;
 
+  template<typename _Tp>
+    inline constexpr bool enable_borrowed_range<reverse_view<_Tp>>
+      = enable_borrowed_range<_Tp>;
+
   namespace views
   {
     namespace __detail
@@ -3301,6 +3321,10 @@ namespace views
       _Vp _M_base = _Vp();
     };
 
+  template<typename _Tp, size_t _Nm>
+    inline constexpr bool enable_borrowed_range<elements_view<_Tp, _Nm>>
+      = enable_borrowed_range<_Tp>;
+
   template<typename _Range>
     using keys_view = elements_view<views::all_t<_Range>, 0>;
 
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/conditionally_borrowed.cc b/libstdc++-v3/testsuite/std/ranges/adaptors/conditionally_borrowed.cc
new file mode 100644
index 000000000000..98398ff3b0a7
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/conditionally_borrowed.cc
@@ -0,0 +1,75 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+namespace ranges = std::ranges;
+namespace views = std::views;
+using namespace std::literals;
+
+// P2017R1 "Conditionally borrowed ranges"
+auto trim(std::string const& s) {
+auto isalpha = [](unsigned char c){ return std::isalpha(c); };
+auto b = ranges::find_if(s, isalpha);
+auto e = ranges::find_if(s | views::reverse, isalpha).base();
+   

[committed] libstdc++: Fix some more warnings in test

2020-10-30 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
Avoid -Wcatch-value warnings.

Tested powerpc64le-linux. Committed to trunk.
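
For readers unfamiliar with -Wcatch-value: it flags handlers that catch a polymorphic exception type by value, which copies (and can slice) the thrown object. The following is a minimal sketch of the by-reference form the patch switches to; the function names are hypothetical and do not appear in the test.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Catching through a base-class reference keeps the derived object intact,
// so what() reports the message stored by std::length_error.
std::string message_via_reference()
{
  try
    {
      throw std::length_error("vector<bool> too long");
    }
  catch (const std::exception& e)
    {
      return e.what();
    }
  return {};  // not reached
}

// The dynamic type is still visible through the reference.  A handler
// written as `catch (std::exception e)` would instead receive a copy of
// only the base-class part, which is what -Wcatch-value warns about.
bool dynamic_type_preserved()
{
  try
    {
      throw std::length_error("x");
    }
  catch (const std::exception& e)
    {
      return dynamic_cast<const std::length_error*>(&e) != nullptr;
    }
  return false;  // not reached
}
```

The test in the patch uses non-const references (std::bad_alloc&, std::length_error&, std::exception&); const references as above behave the same way for this purpose.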

commit d1e5d82af819025df9d9a81e8c591690e299924a
Author: Jonathan Wakely 
Date:   Fri Oct 30 10:47:25 2020

libstdc++: Fix some more warnings in test

libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
Avoid -Wcatch-value warnings.

diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/modifiers/insert/31370.cc b/libstdc++-v3/testsuite/23_containers/vector/bool/modifiers/insert/31370.cc
index 36d681528650..5a714873f0d7 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/bool/modifiers/insert/31370.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/modifiers/insert/31370.cc
@@ -59,14 +59,14 @@ void test01()
 { }
   catch(std::exception&)
 { ++myexit; }
-  
+
   // When doubling is too big, but smaller is sufficient, the resize
   // should do smaller and be happy.  It certainly shouldn't throw
   // other exceptions or crash.
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() / 2 + 1, false); 
+  x.resize(x.max_size() / 2 + 1, false);
   for(int i = 0; i < _S_word_bit; ++i)
x.push_back(false);
   check_cap_ge_size(x);
@@ -75,11 +75,11 @@ void test01()
 { }
   catch(std::exception&)
 { ++myexit; }
-  
+
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() / 2 + 1, false); 
+  x.resize(x.max_size() / 2 + 1, false);
   x.insert(x.end(), _S_word_bit, false);
   check_cap_ge_size(x);
 }
@@ -87,11 +87,11 @@ void test01()
 { }
   catch(std::exception&)
 { ++myexit; }
-  
+
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() / 2 + 1, false); 
+  x.resize(x.max_size() / 2 + 1, false);
   std::vector<bool> y(_S_word_bit, false);
   x.insert(x.end(), y.begin(), y.end());
   check_cap_ge_size(x);
@@ -106,7 +106,7 @@ void test01()
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() - _S_word_bit, false); 
+  x.resize(x.max_size() - _S_word_bit, false);
   for(int i = 0; i < _S_word_bit; ++i)
x.push_back(false);
   check_cap_ge_size(x);
@@ -115,11 +115,11 @@ void test01()
 { }
   catch(std::exception&)
 { ++myexit; }
-  
+
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() - _S_word_bit, false); 
+  x.resize(x.max_size() - _S_word_bit, false);
   x.insert(x.end(), _S_word_bit, false);
   check_cap_ge_size(x);
 }
@@ -131,7 +131,7 @@ void test01()
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() - _S_word_bit, false); 
+  x.resize(x.max_size() - _S_word_bit, false);
   std::vector<bool> y(_S_word_bit, false);
   x.insert(x.end(), y.begin(), y.end());
   check_cap_ge_size(x);
@@ -140,51 +140,51 @@ void test01()
 { }
   catch(std::exception&)
 { ++myexit; }
-  
+
   // Attempts to put in more than max_size() items should result in a
   // length error.
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() - _S_word_bit, false); 
+  x.resize(x.max_size() - _S_word_bit, false);
   for(int i = 0; i < _S_word_bit + 1; ++i)
x.push_back(false);
   ++myexit;
 }
-  catch(std::bad_alloc)
+  catch(std::bad_alloc&)
 { }
-  catch(std::length_error)
+  catch(std::length_error&)
 { }
-  catch(std::exception)
-{ ++myexit; }
-  
-  try
-{
-  std::vector<bool> x;
-  x.resize(x.max_size() - _S_word_bit, false); 
-  x.insert(x.end(), _S_word_bit + 1, false);
-  ++myexit;
-}
-  catch(std::bad_alloc)
-{ }
-  catch(std::length_error)
-{ }
-  catch(std::exception)
+  catch(std::exception&)
 { ++myexit; }
 
   try
 {
   std::vector<bool> x;
-  x.resize(x.max_size() - _S_word_bit, false); 
+  x.resize(x.max_size() - _S_word_bit, false);
+  x.insert(x.end(), _S_word_bit + 1, false);
+  ++myexit;
+}
+  catch(std::bad_alloc&)
+{ }
+  catch(std::length_error&)
+{ }
+  catch(std::exception&)
+{ ++myexit; }
+
+  try
+{
+  std::vector<bool> x;
+  x.resize(x.max_size() - _S_word_bit, false);
   std::vector<bool> y(_S_word_bit + 1, false);
   x.insert(x.end(), y.begin(), y.end());
   ++myexit;
 }
-  catch(std::bad_alloc)
+  catch(std::bad_alloc&)
 { }
-  catch(std::length_error)
+  catch(std::length_error&)
 { }
-  catch(std::exception)
+  catch(std::exception&)
 { ++myexit; }
 
   VERIFY( !myexit );


[committed] libstdc++: Use double for unordered container load factors [PR 96958]

2020-10-30 Thread Jonathan Wakely via Gcc-patches
These calculations were changed to use long double nearly ten years ago
in order to get more precision than float:
https://gcc.gnu.org/pipermail/libstdc++/2011-September/036420.html

However, double should be sufficient, while being potentially faster
than long double, and not requiring soft FP calculations for targets
without native long double support.

libstdc++-v3/ChangeLog:

PR libstdc++/96958
* include/bits/hashtable_policy.h (_Prime_rehash_policy)
(_Power2_rehash_policy): Use double instead of long double.

Tested powerpc64le-linux. Committed to trunk.

This doesn't fix the PR, because there are also long double
calculations in src/c++11/hashtable_c++0x.cc, so another patch is
needed.
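
To make the calculation concrete, here is a minimal standalone sketch of the "bucket count for n elements" computation carried out in double. The function name buckets_for is hypothetical; it only mirrors the shape of _M_bkt_for_elements and uses std::ceil from <cmath> instead of the GCC builtins.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Smallest bucket count that keeps n elements at or below the given maximum
// load factor.  The load factor is stored as float; the division is done in
// double, as in the patch, rather than in long double.
std::size_t buckets_for(std::size_t n, float max_load_factor)
{
  return static_cast<std::size_t>(
      std::ceil(n / static_cast<double>(max_load_factor)));
}
```

For example, 7 elements at a maximum load factor of 2.0 need ceil(3.5) = 4 buckets, and 10 elements at 0.5 need 20.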

commit a1343e5c74093124d7fbce6052d838f47a8eeb20
Author: Jonathan Wakely 
Date:   Fri Oct 30 15:14:33 2020

libstdc++: Use double for unordered container load factors [PR 96958]

These calculations were changed to use long double nearly ten years ago
in order to get more precision than float:
https://gcc.gnu.org/pipermail/libstdc++/2011-September/036420.html

However, double should be sufficient, while being potentially faster
than long double, and not requiring soft FP calculations for targets
without native long double support.

libstdc++-v3/ChangeLog:

PR libstdc++/96958
* include/bits/hashtable_policy.h (_Prime_rehash_policy)
(_Power2_rehash_policy): Use double instead of long double.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index cea5e549d253..7fed87f1c76b 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -458,7 +458,7 @@ namespace __detail
 // Return a bucket count appropriate for n elements
 std::size_t
 _M_bkt_for_elements(std::size_t __n) const
-{ return __builtin_ceill(__n / (long double)_M_max_load_factor); }
+{ return __builtin_ceill(__n / (double)_M_max_load_factor); }
 
 // __n_bkt is current bucket count, __n_elt is current element count,
 // and __n_ins is number of elements to be inserted.  Do we need to
@@ -559,7 +559,7 @@ namespace __detail
_M_next_resize = size_t(-1);
   else
_M_next_resize
- = __builtin_floorl(__res * (long double)_M_max_load_factor);
+ = __builtin_floorl(__res * (double)_M_max_load_factor);
 
   return __res;
 }
@@ -567,7 +567,7 @@ namespace __detail
 // Return a bucket count appropriate for n elements
 std::size_t
 _M_bkt_for_elements(std::size_t __n) const noexcept
-{ return __builtin_ceill(__n / (long double)_M_max_load_factor); }
+{ return __builtin_ceill(__n / (double)_M_max_load_factor); }
 
 // __n_bkt is current bucket count, __n_elt is current element count,
 // and __n_ins is number of elements to be inserted.  Do we need to
@@ -582,16 +582,16 @@ namespace __detail
  // If _M_next_resize is 0 it means that we have nothing allocated so
  // far and that we start inserting elements. In this case we start
  // with an initial bucket size of 11.
- long double __min_bkts
+ double __min_bkts
= std::max<std::size_t>(__n_elt + __n_ins, _M_next_resize ? 0 : 11)
- / (long double)_M_max_load_factor;
+ / (double)_M_max_load_factor;
  if (__min_bkts >= __n_bkt)
return { true,
  _M_next_bkt(std::max<std::size_t>(__builtin_floorl(__min_bkts) + 1,
__n_bkt * _S_growth_factor)) };
 
  _M_next_resize
-   = __builtin_floorl(__n_bkt * (long double)_M_max_load_factor);
+   = __builtin_floorl(__n_bkt * (double)_M_max_load_factor);
  return { false, 0 };
}
   else


[r11-4578 Regression] FAIL: gcc.target/i386/zero-scratch-regs-31.c (test for excess errors) on Linux/x86_64

2020-10-30 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

d10f3e900b0377b4760a090b0f90371bcef01686 is the first bad commit
commit d10f3e900b0377b4760a090b0f90371bcef01686
Author: qing zhao 
Date:   Fri Oct 30 20:41:38 2020 +0100

Add -fzero-call-used-regs option and zero_call_used_regs function attributes.

caused

FAIL: gcc.target/i386/zero-scratch-regs-24.c scan-assembler xorl[ \t]*%edi, %edi
FAIL: gcc.target/i386/zero-scratch-regs-25.c scan-assembler xorl[ \t]*%edi, %edi
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movaps[ \t]*%xmm0, %xmm3
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movaps[ \t]*%xmm0, %xmm4
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movaps[ \t]*%xmm0, %xmm5
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movaps[ \t]*%xmm0, %xmm6
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movaps[ \t]*%xmm0, %xmm7
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movl[ \t]*%edx, %edi
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movl[ \t]*%edx, %esi
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movl[ \t]*%edx, %r8d
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler movl[ \t]*%edx, %r9d
FAIL: gcc.target/i386/zero-scratch-regs-26.c scan-assembler pxor[ \t]*%xmm0, %xmm0
FAIL: gcc.target/i386/zero-scratch-regs-27.c scan-assembler movl[ \t]*%edx, %edi
FAIL: gcc.target/i386/zero-scratch-regs-27.c scan-assembler movl[ \t]*%edx, %esi
FAIL: gcc.target/i386/zero-scratch-regs-27.c scan-assembler movl[ \t]*%edx, %r8d
FAIL: gcc.target/i386/zero-scratch-regs-27.c scan-assembler movl[ \t]*%edx, %r9d
FAIL: gcc.target/i386/zero-scratch-regs-28.c (test for excess errors)
FAIL: gcc.target/i386/zero-scratch-regs-30.c scan-assembler-times fstp 8
FAIL: gcc.target/i386/zero-scratch-regs-31.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-4578/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-24.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-24.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-25.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-25.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-26.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-26.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-27.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-27.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-28.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-28.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-30.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-30.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-31.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/zero-scratch-regs-31.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at skpgkp2 at gmail dot com.)


Re: [committed] libstdc++: Use double for unordered container load factors [PR 96958]

2020-10-30 Thread Jonathan Wakely via Gcc-patches

On 31/10/20 00:23 +0000, Jonathan Wakely wrote:

These calculations were changed to use long double nearly ten years ago
in order to get more precision than float:
https://gcc.gnu.org/pipermail/libstdc++/2011-September/036420.html

However, double should be sufficient, while being potentially faster
than long double, and not requiring soft FP calculations for targets
without native long double support.

libstdc++-v3/ChangeLog:

PR libstdc++/96958
* include/bits/hashtable_policy.h (_Prime_rehash_policy)
(_Power2_rehash_policy): Use double instead of long double.

Tested powerpc64le-linux. Committed to trunk.

This doesn't fix the PR, because there are also long double
calculations in src/c++11/hashtable_c++0x.cc, so another patch is
needed.


Here's that other patch. This also fixes some failures I was seeing
when mixing -mabi=ieeelongdouble with -mabi=ibmlongdouble in a local
branch for the ieee128 transition work.

Tested powerpc64le-linux. Committed to trunk.

commit 943cc2a1b70f2d755b4fed97b1c4b49234d92899
Author: Jonathan Wakely 
Date:   Sat Oct 31 00:52:57 2020

libstdc++: Use double for unordered container load factors [PR 96958]

My previous commit for this PR changed the types from long double to
double, but didn't change the uses of __builtin_ceill and
__builtin_floorl. It also failed to change the non-inline functions in
src/c++11/hashtable_c++0x.cc. This should fix it properly now.
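
The reason the function names matter as well as the cast can be shown with a small sketch. It uses the portable std::floor overloads from <cmath> as stand-ins for the GCC builtins (std::floor on a long double corresponds to floorl), and the helper name next_resize is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>

// The double overload keeps the whole computation in double ...
static_assert(std::is_same<decltype(std::floor(1.0)), double>::value,
              "floor(double) returns double");
// ... while the long double overload (the counterpart of floorl) widens
// its argument, so long double arithmetic is still involved.
static_assert(std::is_same<decltype(std::floor(1.0L)), long double>::value,
              "floor(long double) returns long double");

// Number of elements a table with n_bkt buckets can hold before the next
// resize, computed entirely in double.
std::size_t next_resize(std::size_t n_bkt, float max_load_factor)
{
  return static_cast<std::size_t>(
      std::floor(n_bkt * static_cast<double>(max_load_factor)));
}
```

Passing a double to a long double function still compiles, which is presumably why the leftover ceill and floorl calls went unnoticed in the first commit.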

libstdc++-v3/ChangeLog:

PR libstdc++/96958
* include/bits/hashtable_policy.h (_Prime_rehash_policy)
(_Power2_rehash_policy): Use ceil and floor instead of ceill and
floorl.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy): Likewise.
Use double instead of long double.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 7fed87f1c76b..28372979c873 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -458,7 +458,7 @@ namespace __detail
 // Return a bucket count appropriate for n elements
 std::size_t
 _M_bkt_for_elements(std::size_t __n) const
-{ return __builtin_ceill(__n / (double)_M_max_load_factor); }
+{ return __builtin_ceil(__n / (double)_M_max_load_factor); }
 
 // __n_bkt is current bucket count, __n_elt is current element count,
 // and __n_ins is number of elements to be inserted.  Do we need to
@@ -559,7 +559,7 @@ namespace __detail
 	_M_next_resize = size_t(-1);
   else
 	_M_next_resize
-	  = __builtin_floorl(__res * (double)_M_max_load_factor);
+	  = __builtin_floor(__res * (double)_M_max_load_factor);
 
   return __res;
 }
@@ -567,7 +567,7 @@ namespace __detail
 // Return a bucket count appropriate for n elements
 std::size_t
 _M_bkt_for_elements(std::size_t __n) const noexcept
-{ return __builtin_ceill(__n / (double)_M_max_load_factor); }
+{ return __builtin_ceil(__n / (double)_M_max_load_factor); }
 
 // __n_bkt is current bucket count, __n_elt is current element count,
 // and __n_ins is number of elements to be inserted.  Do we need to
@@ -587,11 +587,11 @@ namespace __detail
 	  / (double)_M_max_load_factor;
 	  if (__min_bkts >= __n_bkt)
 	return { true,
-	  _M_next_bkt(std::max<std::size_t>(__builtin_floorl(__min_bkts) + 1,
+	  _M_next_bkt(std::max<std::size_t>(__builtin_floor(__min_bkts) + 1,
 		__n_bkt * _S_growth_factor)) };
 
 	  _M_next_resize
-	= __builtin_floorl(__n_bkt * (double)_M_max_load_factor);
+	= __builtin_floor(__n_bkt * (double)_M_max_load_factor);
 	  return { false, 0 };
 	}
   else
diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index 62762f34cafc..4dec2a84641e 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -58,7 +58,7 @@ namespace __detail
 	  return 1;
 
 	_M_next_resize =
-	  __builtin_floorl(__fast_bkt[__n] * (long double)_M_max_load_factor);
+	  __builtin_floor(__fast_bkt[__n] * (double)_M_max_load_factor);
 	return __fast_bkt[__n];
   }
 
@@ -81,7 +81,7 @@ namespace __detail
   _M_next_resize = size_t(-1);
 else
   _M_next_resize =
-	__builtin_floorl(*__next_bkt * (long double)_M_max_load_factor);
+	__builtin_floor(*__next_bkt * (double)_M_max_load_factor);
 
 return *__next_bkt;
   }
@@ -105,16 +105,16 @@ namespace __detail
 	// If _M_next_resize is 0 it means that we have nothing allocated so
 	// far and that we start inserting elements. In this case we start
 	// with an initial bucket size of 11.
-	long double __min_bkts
+	double __min_bkts
 	  = std::max<std::size_t>(__n_elt + __n_ins, _M_next_resize ? 0 : 11)
-	  / (long double)_M_max_load_factor;
+	  / (double)_M_max_load_factor;
 	if (__min_bkts >= __n_bkt)
 	  return { true,
-	_M_next_bkt(std::max<std::size_t>(__builtin_floorl(__min_bkts) + 1,
+	_M_next_bkt(std::max<std::size_t>(__builtin_flo

Re: [PATCH] PowerPC: Don't assume all targets have GLIBC

2020-10-30 Thread Michael Meissner via Gcc-patches
On Fri, Oct 30, 2020 at 06:39:16PM -0500, Segher Boessenkool wrote:
> On Fri, Oct 30, 2020 at 06:50:30PM -0400, Michael Meissner wrote:
> > On Fri, Oct 30, 2020 at 03:54:06PM -0500, Segher Boessenkool wrote:
> > > But, time will fix all problems here.  So, okay for trunk.  Thanks!
> > 
> > Note, I discovered the ABI is not set to ELFv2 at the time the test is
> > done, so I removed that part of the test.
> 
> Good, thanks.  But why did you leave the !BYTES_BIG_ENDIAN?  That seems
> just as wrong.

As I recall, Tulio has said that the float128 stuff is only enabled in the
little-endian port of GLIBC.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797