Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-11 Thread Hongtao Liu via Gcc-patches
ping^3

Rebase patch on latest trunk.

On Tue, Oct 27, 2020 at 3:51 PM Hongtao Liu  wrote:
>
> ping^1
>
> On Tue, Oct 20, 2020 at 3:36 PM Richard Biener
>  wrote:
> >
> > On Tue, Oct 20, 2020 at 4:35 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Oct 19, 2020 at 5:55 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Mon, Oct 19, 2020 at 11:37 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Oct 19, 2020 at 5:07 PM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, Oct 19, 2020 at 10:21 AM Hongtao Liu  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi:
> > > > > > >   It's implemented as below:
> > > > > > > V setg (V v, int idx, T val)
> > > > > > >
> > > > > > > {
> > > > > > >   V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
> > > > > > >   V valv = (V){val, val, val, val, val, val, val, val};
> > > > > > >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
> > > > > > >   v = (v & ~mask) | (valv & mask);
> > > > > > >   return v;
> > > > > > > }
> > > > > > >
> > > > > > > Bootstrap is fine, regression test for i386/x86-64 backend is ok.
> > > > > > > Ok for trunk?
> > > > > >
> > > > > > Hmm, I guess you're trying to keep the code for !AVX512BW simple
> > > > > > but isn't just splitting the compare into
> > > > > >
> > > > > >  clow = {0, 1, 2, 3 ... } == idxv
> > > > > >  chigh = {16, 17, 18, ... } == idxv;
> > > > > >  cmp = {clow, chigh}
> > > > > >
> > > > >
> > > > > We also don't have 512-bit byte/word blend instructions without
> > > > > TARGET_AVX512BW, so how would we use a 512-bit compare?
> > > >
> > > > Oh, I see.  Guess two back-to-back vpternlog could emulate
> > >
> > > Yes, we can have something like vpternlogd %zmm0, %zmm1, %zmm2, 0xD8,
> > > but since we don't have a 512-bit byte/word broadcast instruction,
> > > it would need 2 broadcasts and 1 vec_concat to get one 512-bit vector.
> > > It wouldn't save many instructions compared to my version (below).
> > >
> > > ---
> > > leal-16(%rsi), %eax
> > > vmovd   %edi, %xmm2
> > > vmovdqa .LC0(%rip), %ymm4
> > > vextracti64x4   $0x1, %zmm0, %ymm3
> > > vmovd   %eax, %xmm1
> > > vpbroadcastw%xmm2, %ymm2
> > > vpbroadcastw%xmm1, %ymm1
> > > vpcmpeqw%ymm4, %ymm1, %ymm1
> > > vpblendvb   %ymm1, %ymm2, %ymm3, %ymm3
> > > vmovd   %esi, %xmm1
> > > vpbroadcastw%xmm1, %ymm1
> > > vpcmpeqw%ymm4, %ymm1, %ymm1
> > > vpblendvb   %ymm1, %ymm2, %ymm0, %ymm0
> > > vinserti64x4$0x1, %ymm3, %zmm0, %zmm0
> > > ---
> > >
> > > > the blend?  Not sure if important - I recall only knl didn't have bw?
> > > >
> > >
> > > Yes, from SKX on (inclusive), all AVX512 targets support AVX512BW.
> > > And I don't think performance for V32HI/V64QI without AVX512BW is
> > > important.
> >
> > True.
> >
> > I have no further comments on the patch then - it still needs i386 
> > maintainer
> > approval though.
> >
> > Thanks,
> > Richard.
> >
> > >
> > > > > cut from i386-expand.c:
> > > > > in ix86_expand_sse_movcc
> > > > >  3682case E_V64QImode:
> > > > >  3683  gen = gen_avx512bw_blendmv64qi; ---> TARGET_AVX512BW needed
> > > > >  3684  break;
> > > > >  3685case E_V32HImode:
> > > > >  3686  gen = gen_avx512bw_blendmv32hi; --> TARGET_AVX512BW needed
> > > > >  3687  break;
> > > > >  3688case E_V16SImode:
> > > > >  3689  gen = gen_avx512f_blendmv16si;
> > > > >  3690  break;
> > > > >  3691case E_V8DImode:
> > > > >  3692  gen = gen_avx512f_blendmv8di;
> > > > >  3693  break;
> > > > >  3694case E_V8DFmode:
> > > > >
> > > > > > faster, smaller and eventually even easier during expansion?
> > > > > >
> > > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, mode, 
> > > > > > valv, val));
> > > > > > +  gcc_assert (ix86_expand_vector_init_duplicate (false, cmp_mode,
> > > > > > idxv, idx_tmp));
> > > > > >
> > > > > > side-effects in gcc_assert is considered bad style, use
> > > > > >
> > > > > >   ok = ix86_expand_vector_init_duplicate (false, mode, valv, val);
> > > > > >   gcc_assert (ok);
> > > > > >
> > > > > > +  vec[5] = constv;
> > > > > > +  ix86_expand_int_vcond (vec);
> > > > > >
> > > > > > this also returns a bool you probably should assert true.
> > > > > >
> > > > >
> > > > > Yes, will change.
> > > > >
> > > > > > Otherwise thanks for tackling this.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/97194
> > > > > > > * config/i386/i386-expand.c (ix86_expand_vector_set_var): 
> > > > > > > New function.
> > > > > > > * config/i386/i386-protos.h (ix86_expand_vector_set_var): 
> > > > > > > New Decl.
> > > > > > > * config/i386/predicates.md (vec_setm_operand): New 
> > > > > > > predicate,
> > > > > > > true for const_int_operand or register_operand under 
> > > > > > > TARGET_AVX2.
> > > > > > > * config/i386/sse.md (vec_set): Support both constant
> > > > > > > and variable index vec_set.

Re: [PATCH V2] Clean up loop-closed PHIs after loop finalize

2020-11-11 Thread Jiufu Guo via Gcc-patches


Thanks a lot for the suggestions in previous mails.
The patch was updated accordingly.

This updated patch propagates loop-closed PHIs out after
loop_optimizer_finalize under a newly introduced flag.  In some cases,
cleaning up loop-closed PHIs saves effort for the optimization passes
that run after loopdone.
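For reference, a loop-closed PHI in LC-SSA form is a single-argument PHI in a loop's exit block; cleaning it up means propagating that single argument into all uses.  A GIMPLE-like sketch (SSA names invented for illustration):

```
  ;; exit block of the loop, single predecessor bb 4 inside the loop
  # jl_3 = PHI <jl_2(4)>        ;; degenerate loop-closed PHI
  _5 = jl_3 * wh_7;

  ;; after clean_up_loop_closed_phi: the PHI is removed and its
  ;; result is replaced by its only argument
  _5 = jl_2 * wh_7;
```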

This patch passes bootstrap and regtest on ppc64le.  Is this ok for trunk?

gcc/ChangeLog
2020-10-11  Jiufu Guo   

* common.opt (flag_clean_up_loop_closed_phi): New flag.
* loop-init.c (loop_optimizer_finalize): Check
flag_clean_up_loop_closed_phi and call clean_up_loop_closed_phi.
* tree-cfgcleanup.h (clean_up_loop_closed_phi): New declare.
* tree-ssa-propagate.c (clean_up_loop_closed_phi): New function.

gcc/testsuite/ChangeLog
2020-10-11  Jiufu Guo   

* gcc.dg/tree-ssa/loopclosedphi.c: New test.

---
 gcc/common.opt                                |  4 ++
 gcc/loop-init.c                               |  8 +++
 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c | 21 +++
 gcc/tree-cfgcleanup.h                         |  1 +
 gcc/tree-ssa-propagate.c                      | 61 +++
 5 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 7e789d1c47f..f0d7b74d7ad 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1141,6 +1141,10 @@ fchecking=
 Common Joined RejectNegative UInteger Var(flag_checking)
 Perform internal consistency checkings.
 
+fclean-up-loop-closed-phi
+Common Report Var(flag_clean_up_loop_closed_phi) Optimization Init(0)
+Clean up loop-closed PHIs after loop optimization done.
+
 fcode-hoisting
 Common Report Var(flag_code_hoisting) Optimization
 Enable code hoisting.
diff --git a/gcc/loop-init.c b/gcc/loop-init.c
index 401e5282907..05804759ac9 100644
--- a/gcc/loop-init.c
+++ b/gcc/loop-init.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop-niter.h"
 #include "loop-unroll.h"
 #include "tree-scalar-evolution.h"
+#include "tree-cfgcleanup.h"
 
 
 /* Apply FLAGS to the loop state.  */
@@ -145,6 +146,13 @@ loop_optimizer_finalize (struct function *fn)
 
   free_numbers_of_iterations_estimates (fn);
 
+  if (flag_clean_up_loop_closed_phi
+  && loops_state_satisfies_p (fn, LOOP_CLOSED_SSA))
+{
+  clean_up_loop_closed_phi (fn);
+  loops_state_clear (fn, LOOP_CLOSED_SSA);
+}
+
   /* If we should preserve loop structure, do not free it but clear
  flags that advanced properties are there as we are not preserving
  that in full.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
new file mode 100644
index 000..ab22a991935
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-ch -w -fdump-tree-loopdone-details -fclean-up-loop-closed-phi" } */
+
+void
+t6 (int qz, int wh)
+{
+  int jl = wh;
+
+  while (1.0 * qz / wh < 1)
+{
+  qz = wh * (wh + 2);
+
+  while (wh < 1)
+jl = 0;
+}
+
+  while (qz < 1)
+qz = jl * wh;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "loopdone"} } */
diff --git a/gcc/tree-cfgcleanup.h b/gcc/tree-cfgcleanup.h
index 6ff6726bfe4..9e368d63709 100644
--- a/gcc/tree-cfgcleanup.h
+++ b/gcc/tree-cfgcleanup.h
@@ -26,5 +26,6 @@ extern bool cleanup_tree_cfg (unsigned = 0);
 extern bool fixup_noreturn_call (gimple *stmt);
 extern bool delete_unreachable_blocks_update_callgraph (cgraph_node *dst_node,
bool update_clones);
+extern unsigned clean_up_loop_closed_phi (function *);
 
 #endif /* GCC_TREE_CFGCLEANUP_H */
diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
index 87dbf55fab9..a3bfe36c733 100644
--- a/gcc/tree-ssa-propagate.c
+++ b/gcc/tree-ssa-propagate.c
@@ -1549,3 +1549,64 @@ propagate_tree_value_into_stmt (gimple_stmt_iterator *gsi, tree val)
   else
 gcc_unreachable ();
 }
+
+/* Check exits of each loop in FUN, walk over loop closed PHIs in
+   each exit basic block and propagate degenerate PHIs.  */
+
+unsigned
+clean_up_loop_closed_phi (function *fun)
+{
+  unsigned i;
+  edge e;
+  gphi *phi;
+  tree rhs;
+  tree lhs;
+  gphi_iterator gsi;
+  struct loop *loop;
+  bool cfg_altered = false;
+
+  /* Check dominator info before getting loop-closed PHIs from loop exits.  */
+  if (dom_info_state (CDI_DOMINATORS) != DOM_OK)
+return 0;
+
+  /* Walk over loop in function.  */
+  FOR_EACH_LOOP_FN (fun, loop, 0)
+{
+      /* Check each exit edge of the loop.  */
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+  FOR_EACH_VEC_ELT (exits, i, e)
+   if (single_pred_p (e->dest))
+ /* Walk over loop-closed PHIs.  */
+ for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi);)
+   {
+ phi = gsi.phi ();
+ rhs = degenerate_phi_result (p

Re: [PATCH] Cleanup irange::set.

2020-11-11 Thread Aldy Hernandez via Gcc-patches




On 11/10/20 3:35 PM, Richard Sandiford wrote:

Aldy Hernandez  writes:

(actually I can see 3245 ICEs on aarch64)

Can you fix it?


Sure can.

Richard, I seem to have incorrectly removed the early exit for varying,
and that affected the changes you made for poly ints.  Is there any
reason we can't just exit and set varying without checking for kind !=
VR_VARYING?


No reason, it seemed more natural to drop to a lower kind with the old
code :-)  (But not with the new code.)

But it isn't obvious to me why the code is now structured the way it is.

   if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
 {
   set_varying (TREE_TYPE (min));
   return;
 }

   // Nothing to canonicalize for symbolic ranges.
   if (TREE_CODE (min) != INTEGER_CST
   || TREE_CODE (max) != INTEGER_CST)
 {
   m_kind = kind;
   m_base[0] = min;
   m_base[1] = max;
   m_num_ranges = 1;
   return;
 }

   swap_out_of_order_endpoints (min, max, kind);
   if (kind == VR_VARYING)
 {
   set_varying (TREE_TYPE (min));
   return;
 }

Why do we want to check “min” and “max” being INTEGER_CST before “kind”
being VR_VARYING, and then potentially record VR_VARYING with specific
bounds?  And why do we want to swap the “min” and “max” before checking
whether “kind” is VR_VARYING (when we'll then drop the min and max anyway)?
I think this would benefit from a bit more commentary at least.


The main idea was to shorten the code and avoid having to exit due to 
varying at various points (early and after the operands had been 
swapped).  But yes, it took more cycles.


BTW, VR_VARYING does get specific bounds, by design.  What could've 
happened in the code was someone feeding VR_VARYING with non-integer 
bounds.  This would've built an invalid VR_VARYING.


How about this (on top of the previous patch which I already pushed to 
un-break aarch64)?


Aldy

p.s. If POLY_INT_CSTs are not supported in ranges, but are 
INTEGRAL_TYPE_P, perhaps we should also tweak irange::supports_type_p so 
it doesn't leak in anywhere else.


gcc/ChangeLog:

* value-range.cc (irange::set): Early exit on VR_VARYING.
---
 gcc/value-range.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index b7ccba010e4..3703519b03a 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -249,7 +249,8 @@ irange::set (tree min, tree max, value_range_kind kind)
   return;
 }

-  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
+  if (kind == VR_VARYING
+  || POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
 {
   set_varying (TREE_TYPE (min));
   return;
--
2.26.2



[committed] gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32 (was: Re: [Patch] Fortran: OpenMP 5.0 (in_, task_)reduction clause extensions)

2020-11-11 Thread Tobias Burnus

As Sunil's regression tester pointed out, the testcases fail on x86-64 with 
-m32.

The reason is that then the _ull_ variants of the GOMP functions are called;
in the C equivalent, those are always called – I assume that's because the C
testcase uses 'unsigned' which does not exist with Fortran.

Committed as r11-4903-g1644ab9917ca6b96e9e683c422f1793258b9a3db

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
commit 1644ab9917ca6b96e9e683c422f1793258b9a3db
Author: Tobias Burnus 
Date:   Wed Nov 11 09:23:07 2020 +0100

gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/workshare-reduction-26.f90: Add (?:_ull) to
scan-tree-dump-times regex for -m32.
* gfortran.dg/gomp/workshare-reduction-27.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-28.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-3.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-36.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-37.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-38.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-39.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-40.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-41.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-42.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-43.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-44.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-45.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-46.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-47.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-56.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-57.f90: Likewise.

diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90 b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90
index 28267902914..d8633b66045 100644
--- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90
@@ -3 +3 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 0, 0, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, 0, 0, " 1 "optimized" } }
@@ -5 +5 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_maybe_nonmonotonic_runtime_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_maybe_nonmonotonic_runtime_next " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90 b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90
index 2ee047d4e8c..aada4d7a23b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90
@@ -3 +3 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, (?:2147483648|-2147483648), 0, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, (?:2147483648|-2147483648), 0, " 1 "optimized" } }
@@ -5 +5 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_runtime_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_runtime_next " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90 b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90
index 6c9d49be13c..e67e24b1aa2 100644
--- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90
@@ -3 +3 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 4, 0, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, 4, 0, " 1 "optimized" } }
@@ -5 +5 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_nonmonotonic_runtime_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_nonmonotonic_runtime_next " 1 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90 b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90
index 6c9d49be13c..e67e24b1aa2 100644
--- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90
@@ -3 +3 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 4, 0, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, 4, 0, " 1 "optimized" } }
@@ -5 +5 @@
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_nonmonotonic_runtime_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_nonmonotonic_runtime_next " 1 "optimized" } }

Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-11 Thread Stefan Kanthak
Jakub Jelinek  wrote:

> On Tue, Nov 10, 2020 at 04:48:10PM -0700, Jeff Law via Gcc-patches wrote:
>> > @@ -486,10 +425,10 @@
>> >  SItype
>> >  __bswapsi2 (SItype u)
>> >  {
>> > -  return ((((u) & 0xff000000) >> 24)
>> > -   | (((u) & 0x00ff0000) >>  8)
>> > -   | (((u) & 0x0000ff00) <<  8)
>> > -   | (((u) & 0x000000ff) << 24));
>> > +  return ((((u) & 0xff000000u) >> 24)
>> > +   | (((u) & 0x00ff0000u) >>  8)
>> > +   | (((u) & 0x0000ff00u) <<  8)
>> > +   | (((u) & 0x000000ffu) << 24));
>>
>> What's the point of this change? I'm not sure how the signedness of the
>> constant really matters here.
>
> Note 0xff000000 is implicitly 0xff000000U because it doesn't fit into signed
> int, and that is the only one where the logical vs. arithmetic right shift
> really matters for correct behavior.

Ouch: but that's not the point here; what matters is the undefined behaviour of
  ((u) & 0x000000ff) << 24

0x000000ff is a signed int, so (u) & 0x000000ff is signed too -- and producing
a negative value (or overflow) from the left-shift of a signed int, i.e.
shifting into (or beyond) the sign bit, is undefined behaviour!

JFTR: both -fsanitize=signed-integer-overflow and -fsanitize=undefined fail
  to catch this BUGBUGBUG, which surfaces on i386 and AMD64 with -O1 or
  -O0!
Stefan Kanthak

PS: even worse, -fsanitize=signed-integer-overflow fails to catch 1 << 31
or 128 << 24!



Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-11 Thread Uros Bizjak via Gcc-patches
> gcc/ChangeLog:
>
> PR target/97194
> * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> * config/i386/predicates.md (vec_setm_operand): New predicate,
> true for const_int_operand or register_operand under TARGET_AVX2.
> * config/i386/sse.md (vec_set): Support both constant
> and variable index vec_set.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx2-vec-set-1.c: New test.
> * gcc.target/i386/avx2-vec-set-2.c: New test.
> * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> * gcc.target/i386/avx512f-vec-set-2.c: New test.
> * gcc.target/i386/avx512vl-vec-set-2.c: New test.

+;; True for registers, or const_int_operand, used by the vec_setm expander.
+(define_predicate "vec_setm_operand"
+  (ior (and (match_operand 0 "register_operand")
+            (match_test "TARGET_AVX2"))
+       (match_code "const_int")))
+
 ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
 (define_predicate "reg_or_pm1_operand"
   (ior (match_operand 0 "register_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b153a87fb98..1798e5dea75 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
 (define_expand "vec_set"
   [(match_operand:V 0 "register_operand")
(match_operand: 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "vec_setm_operand")]

You need to specify a mode, otherwise a register of any mode can pass here.

Uros.


Re: [committed] gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32 (was: Re: [Patch] Fortran: OpenMP 5.0 (in_, task_)reduction clause extensions)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 09:25:25AM +0100, Tobias Burnus wrote:
> As Sunil's regression tester pointed out, the testcases fail on x86-64 with 
> -m32.
> 
> The reason is that then the _ull_ variants of the GOMP functions are called;
> in the C equivalent, those are always called – I assume that's because the C
> testcase uses 'unsigned' which does not exist with Fortran.

Yes, I didn't want to have 4 variants of everything, so we have just two.
One handles what fits into signed long, another handles what fits into
unsigned long long.

Jakub



Re: [PATCH] Cleanup irange::set.

2020-11-11 Thread Richard Sandiford via Gcc-patches
Aldy Hernandez  writes:
> On 11/10/20 3:35 PM, Richard Sandiford wrote:
>> Aldy Hernandez  writes:
 (actually I can see 3245 ICEs on aarch64)

 Can you fix it?
>>>
>>> Sure can.
>>>
>>> Richard, I seem to have incorrectly removed the early exit for varying,
>>> and that affected the changes you made for poly ints.  Is there any
>>> reason we can't just exit and set varying without checking for kind !=
>>> VR_VARYING?
>> 
>> No reason, it seemed more natural to drop to a lower kind with the old
>> code :-)  (But not with the new code.)
>> 
>> But it isn't obvious to me why the code is now structured the way it is.
>> 
>>if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
>>  {
>>set_varying (TREE_TYPE (min));
>>return;
>>  }
>> 
>>// Nothing to canonicalize for symbolic ranges.
>>if (TREE_CODE (min) != INTEGER_CST
>>|| TREE_CODE (max) != INTEGER_CST)
>>  {
>>m_kind = kind;
>>m_base[0] = min;
>>m_base[1] = max;
>>m_num_ranges = 1;
>>return;
>>  }
>> 
>>swap_out_of_order_endpoints (min, max, kind);
>>if (kind == VR_VARYING)
>>  {
>>set_varying (TREE_TYPE (min));
>>return;
>>  }
>> 
>> Why do we want to check “min” and “max” being INTEGER_CST before “kind”
>> being VR_VARYING, and the potentially record VR_VARYING with specific
>> bounds?  And why do we want to swap the “min” and “max” before checking
>> whether “kind” is VR_VARYING (when we'll then drop the min and max anyway)?
>> I think this would benefit from a bit more commentary at least.
>
> The main idea was to shorten the code and avoid having to exit due to 
> varying at various points (early and after the operands had been 
> swapped).  But yes, it took more cycles.
>
> BTW, VR_VARYING does get specific bounds, by design.  What could've 
> happened in the code was someone feeding VR_VARYING with non-integer 
> bounds.  This would've built an invalid VR_VARYING.
>
> How about this (on top of the previous patch which I already pushed to 
> un-break aarch64)?

Thanks, this certainly makes the flow clearer for a range noob like me :-)

> p.s. If POLY_INT_CSTs are not supported in ranges, but are 
> INTEGRAL_TYPE_P, perhaps we should also tweak irange::supports_type_p so 
> it doesn't leak in anywhere else.

POLY_INT_CSTs aren't associated with separate types.  They're just
values of normal integer type.  Logically they come somewhere between an
INTEGER_CST and an SSA_NAME: they're not “as constant as” an INTEGER_CST
but not “as variable as” an SSA_NAME.

This means that they really could be treated as symbolic.  The difficulty
is that POLY_INT_CSTs satisfy is_gimple_min_invariant, and the VRP code
fundamentally assumes that is_gimple_min_invariant on an integer type
means that the value must be an INTEGER_CST or an ADDR_EXPR.  At one
point I'd posted a patch to add a new predicate specifically for that,
but Richard's response was that noone would remember to use it (which is
a fair comment :-)).  So the current approach is instead to stop
POLY_INT_CSTs getting into the VRP system in the first place.

If the VRP code was rigorous about checking for INTEGER_CST before assuming
that something was an INTEGER_CST then no special handling of POLY_INT_CST
would be needed.  But that's not the style that GCC generally follows and
trying to retrofit it now seems like fighting against the tide.

> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index b7ccba010e4..3703519b03a 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -249,7 +249,8 @@ irange::set (tree min, tree max, value_range_kind kind)
> return;
>   }
>
> -  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
> +  if (kind == VR_VARYING
> +  || POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
>   {
> set_varying (TREE_TYPE (min));
> return;

Very minor nit, sorry, but: formatting rules say that all checks should
be on one line or that there should be exactly one check per line.

OK with that change.  Thanks for the quick response in fixing this.

Richard


Re: [Patch, fortran] PR83118 - [8/9/10/11 Regression] Bad intrinsic assignment of class(*) array component of derived type

2020-11-11 Thread Paul Richard Thomas via Gcc-patches
Hi Thomas,

Yes, it did grow into a bit of a monster patch. I kept noticing rather
flakey bits of existing code, especially where matching of dtype element
lengths to the actual payload was concerned.

Waiting for the others to comment gives me a chance to write a more
comprehensive testcase for the handling of temporaries. Note also that
PR96012 is fixed by this patch and will require an additional test.

I am happy to leave dependency_57.f90 as it is and add an additional test.
I will post the tests as soon as they are available.

Thanks for taking a look at it.

Paul




On Tue, 10 Nov 2020 at 22:16, Thomas Koenig  wrote:

> Hi Paul,
>
> > This all bootstraps and regtests on FC31/x86_64 - OK for master?
>
> This is a sizable patch, and from what I can see, it all looks
> plausible.  So, I's say OK for master (with one nit, below),
> but maybe you could wait a day or so to give others the chance
> to look it over, too.
>
> The nit:
>
> > PR fortran/83118
> > * gfortran.dg/dependency_57.f90: Change to dg-run and test for correct
> > result.
>
> I'd rather not change a test case unless it is needed; if something
> breaks it, it is better to leave it as is for bisection.
>
> Could you just make a new test from the run-time version?
>
> Thanks a lot for tackling this thorny issue!
>
> Best regards
>
> Thomas
>
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: testsuite: Adjust pr96789.c to exclude vect_load_lanes

2020-11-11 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi Richard,
>
> Thanks for the review!
>
> on 2020/11/10 7:31, Richard Sandiford wrote:
>> "Kewen.Lin"  writes:
>>> Hi,
>>>
>>> As Lyon pointed out, the newly introduced test case
>>> gcc.dg/tree-ssa/pr96789.c fails on arm-none-linux-gnueabihf.
>>> Loop vectorizer is able to vectorize the two loops which
>>> operate on array tmp with load_lanes feature support.  It
>>> makes dse3 get unexpected inputs and do nothing.
>>>
>>> This patch is to teach the case to respect vect_load_lanes,
>>> meanwhile to guard the check only under vect_int.
>> 
>> I'm not sure this is the right check.  The test passes on aarch64,
>> which also has load lanes, but apparently doesn't use them for this
>> test.  I think the way the loop vectoriser handles the loops will
>> depend a lot on target costs, which can vary in unpredictable ways.
>> 
>
> You are right, although aarch64 doesn't have this failure, it can fail
> with explicit -march=armv8-a+sve.  It can vary as target features/costs
> change.  The check is still fragile.
>
> Your suggestion with -ftree-slp-vectorize below is better!
>
>> Does it work if you instead change -ftree-vectorize to -ftree-slp-vectorize?
>> Or does that defeat the purpose of the test?
>
> It works, nice, thanks for the suggestion!
>
> I appended one explicit -fno-tree-loop-vectorize to avoid it to fail
> in case someone kicks off the testing with explicit -ftree-loop-vectorize.
>
> The updated version is pasted below, is it ok for trunk?

OK, thanks.

Richard

>
> BR,
> Kewen
> -
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/tree-ssa/pr96789.c: Adjusted by disabling loop vectorization.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c b/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
> index d6139a014d8..5704952309b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96789.c
> @@ -1,5 +1,8 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -funroll-loops -ftree-vectorize -fdump-tree-dse-details" } */
> +/* Disable loop vectorization to avoid that loop vectorizer
> +   optimizes those two loops that operate tmp array so that
> +   subsequent dse3 won't eliminate expected tmp stores.  */
> +/* { dg-options "-O2 -funroll-loops -ftree-slp-vectorize -fno-tree-loop-vectorize -fdump-tree-dse-details" } */
>
>  /* Test if scalar cleanup pass takes effects, mainly check
> its secondary pass DSE can remove dead stores on array


Re: [PATCH 2/2] loops: Invoke lim after successful loop interchange

2020-11-11 Thread Richard Biener
On Mon, 9 Nov 2020, Martin Jambor wrote:

> Hi,
> 
> this patch modifies the loop invariant pass so that it can operate
> only on a single requested loop and its sub-loops and ignore the rest
> of the function, much like it currently ignores basic blocks that are
> not in any real loop.  It then invokes it from within the loop
> interchange pass when it successfully swaps two loops.  This avoids
> the non-LTO -Ofast run-time regressions of 410.bwaves and 503.bwaves_r
> (which are 19% and 15% faster than current master on an AMD zen2
> machine) while not introducing a full LIM pass into the pass pipeline.
> 
> I have not modified the LIM data structures, this means that it still
> contains vectors indexed by loop->num even though only a single loop
> nest is actually processed.  I also did not replace the uses of
> pre_and_rev_post_order_compute_fn with a function that would count a
> postorder only for a given loop.  I can of course do so if the
> approach is otherwise deemed viable.
> 
> The patch adds one additional global variable requested_loop to the
> pass and then at various places behaves differently when it is set.  I
> was considering storing the fake root loop into it for normal
> operation, but since this loop often requires special handling anyway,
> I came to the conclusion that the code would actually end up less
> straightforward.
> 
> I have bootstrapped and tested the patch on x86_64-linux and a very
> similar one on aarch64-linux.  I have also tested it by modifying the
> tree_ssa_lim function to run loop_invariant_motion_from_loop on each
> real outermost loop in a function and this variant also passed
> bootstrap and all tests, including dump scans, of all languages.
> 
> I have built the entire SPEC 2006 FPrate monitoring the activity of
> the LIM pass without and with the patch (on top of commit b642fca1c31
> with which 526.blender_r and 538.imagick_r seemed to be failing) and
> it only examined 0.2% more loops, 0.02% more BBs and even fewer
> percent of statements because it is invoked only in a rather special
> circumstance.  But the patch allows for more such need-based uses at
> hopefully reasonable cost.
> 
> Since I do not have much experience with loop optimizers, I expect
> that there will be requests to adjust the patch during the review.
> Still, it fixes a performance regression against GCC 9 and so I hope
> to address the concerns in time to get it into GCC 11.
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2020-11-08  Martin Jambor  
> 
>   * gimple-loop-interchange.cc (pass_linterchange::execute): Call
>   loop_invariant_motion_from_loop on affected loop nests.
>   * tree-ssa-loop-im.c (requested_loop): New variable.
>   (get_topmost_lim_loop): New function.
>   (outermost_invariant_loop): Use it, cap discovered topmost loop at
>   requested_loop.
>   (determine_max_movement): Use get_topmost_lim_loop.
>   (set_level): Assert that the selected loop is not outside of
>   requested_loop.
>   (compute_invariantness): Do not process loops outside of
>   requested_loop, if non-NULL.
>   (move_computations_worker): Likewise.
>   (mark_ref_stored): Stop iteration at requested_loop, if non-NULL.
>   (mark_ref_loaded): Likewise.
>   (analyze_memory_references): If non-NULL, only process basic
>   blocks and loops in requested_loop.  Compute contains_call bitmap.
>   (do_store_motion): Only process requested_loop if non-NULL.
>   (fill_always_executed_in): Likewise.  Also accept contains_call as
>   a parameter rather than computing it.
>   (tree_ssa_lim_initialize): New parameter which is stored into
>   requested_loop.  Additional dumping.  Only initialize
>   bb_loop_postorder for loops within requested_loop, if non-NULL.
>   (tree_ssa_lim_finalize): Clear requested_loop, additional dumping.
>   (loop_invariant_motion_from_loop): New function.
>   (tree_ssa_lim): Move all functionality to
>   loop_invariant_motion_from_loop, call it.
>   * tree-ssa-loop-manip.h (loop_invariant_motion_from_loop): Declare.
> 
> ---
>  gcc/gimple-loop-interchange.cc |  30 +-
>  gcc/tree-ssa-loop-im.c | 176 -
>  gcc/tree-ssa-loop-manip.h  |   2 +
>  3 files changed, 156 insertions(+), 52 deletions(-)
> 
> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index 1656004ecf0..8c376228779 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -2068,6 +2068,7 @@ pass_linterchange::execute (function *fun)
>  return 0;
>  
>bool changed_p = false;
> +  auto_vec loops_to_lim;
>class loop *loop;
>FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
>  {
> @@ -2077,7 +2078,11 @@ pass_linterchange::execute (function *fun)
>if (prepare_perfect_loop_nest (loop, &loop_nest, &datarefs, &ddrs))
>   {
> tree_loop_interchange loop_interchange (loop_nest);
> -   cha

Re: [PATCH] Cleanup irange::set.

2020-11-11 Thread Aldy Hernandez via Gcc-patches

On 11/11/20 10:45 AM, Richard Sandiford wrote:

Aldy Hernandez  writes:

On 11/10/20 3:35 PM, Richard Sandiford wrote:

Aldy Hernandez  writes:



p.s. If POLY_INT_CST_P are not supported in ranges, but are
INTEGRAL_TYPE_P, perhaps we should also tweak irange::supports_type_p so
it doesn't leak in anywhere else.


POLY_INT_CSTs aren't associated with separate types.  They're just
values of normal integer type.  Logically they come somewhere between an
INTEGER_CST and an SSA_NAME: they're not “as constant as” an INTEGER_CST
but not “as variable as” an SSA_NAME.

This means that they really could be treated as symbolic.  The difficulty
is that POLY_INT_CSTs satisfy is_gimple_min_invariant, and the VRP code
fundamentally assumes that is_gimple_min_invariant on an integer type
means that the value must be an INTEGER_CST or an ADDR_EXPR.  At one
point I'd posted a patch to add a new predicate specifically for that,
but Richard's response was that no one would remember to use it (which is
a fair comment :-)).  So the current approach is instead to stop
POLY_INT_CSTs getting into the VRP system in the first place.

If the VRP code was rigorous about checking for INTEGER_CST before assuming
that something was an INTEGER_CST then no special handling of POLY_INT_CST
would be needed.  But that's not the style that GCC generally follows and
trying to retrofit it now seems like fighting against the tide.


I agree, we should probably treat POLY_INTs as symbolics (or varying) 
right at the setter and avoid dealing with them.  Andrew may have a plan 
for these later, but this will do for now.





diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index b7ccba010e4..3703519b03a 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -249,7 +249,8 @@ irange::set (tree min, tree max, value_range_kind kind)
 return;
   }

-  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
+  if (kind == VR_VARYING
+  || POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
   {
 set_varying (TREE_TYPE (min));
 return;


Very minor nit, sorry, but: formatting rules say that all checks should
be on one line or that there should be exactly one check per line.


Huh... I didn't know we had a rule for that, but it makes perfect sense.
I'll keep that in mind for future changes.




OK with that change.  Thanks for the quick response in fixing this.


Thanks.
Aldy



Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 09:33:00AM +0100, Stefan Kanthak wrote:
> Ouch: but that's not the point here; what matters is the undefined
> behaviour of
>   ((u) & 0x00ff) << 24
> 
> 0x00ff is a signed int, so (u) & 0x00ff is signed too -- and producing
> a negative value (or overflow) from the left-shift of a signed int, i.e.
> shifting into (or beyond) the sign bit, is undefined behaviour!

Only in some language dialects.
It is caught by -fsanitize=shift.
In C++20, if the shift count is within bounds, all signed as well as
unsigned left shifts are well defined.
In C99/C11 there is one extra rule:
For signed x << y, in C99/C11, the following:
 (unsigned) x >> (uprecm1 - y)
 if non-zero, is undefined.
and for C++11 to C++17 another one:
  /* For signed x << y, in C++11 and later, the following:
 x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1
 is undefined.  */
So indeed, 0x80 << 24 is UB in C99/C11 and C++98, unclear in C89 and
well defined in C++11 and later.  I don't know if C2X is considering
mandating two's complement and making it well defined like C++20 did.

Guess we should fix that, though because different languages have different
rules, GCC itself except for sanitization doesn't consider it UB and only
treats shifts by a negative value or shifts by the bit size or more as UB.

Jakub



Re: Add support for copy specifier to fnspec

2020-11-11 Thread Richard Biener
On Mon, 9 Nov 2020, Jan Hubicka wrote:

> Hi,
> this patch adds 'c' and 'C' fnspec for parameter that is copied to different
> parameter.  Main motivation is to get rid of wrong EAF_NOESCAPE flag on
> the memcpy argument #2. I however also added arg_copies_to_arg_p
> predicate that can be eventually used by tree-ssa-structalias instead of
> special casing all builtins.
> 
> I noticed that we can no longer describe STRNCAT precisely.  I am not
> sure how important it is.  We can either special case it on the three
> places (in tree-ssa-alias and in ipa-modref) or use 1-9 in place of 'c'
> and 'C' so the second character would still be available for a size
> specifier, so strncat would become
> 
> "1cW 13"
> instead of
> "1cW C1"
> 
> Not sure how important this is.

I guess it's an interesting idea and it also gets around the issue
I have with the patch, that you allow 'c' where the semantics are not
at all clear.  So if we go with the patch as-is please disallow 'c'.
Otherwise using 1-9 in place of 'C' works for me as well and also
avoids the overload of the second char.  The description should
also mention that 'C' implies read-only and no escaping (besides
to the specified parameter).  So I guess it should be documented
as "same as 'R'" besides the copying, noting that this means
it "escapes" to the other parameter.

Either OK with disallowing 'c' and adjusted docs or if you want
to rework as '1'-'9' ...

Richard.

> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
> 2020-11-09  Jan Hubicka  
> 
>   * attr-fnspec.h: Add 'c' and 'C' specifiers to the toplevel comment.
>   (attr_fnspec::arg_direct_p): Add 'C'.
>   (attr_fnspec::arg_not_written_p): Handle 'c' and 'C'.
>   (attr_fnspec::arg_max_access_size_given_by_arg_p): Handle 'c' and 'C'.
>   (attr_fnspec::arg_access_size_given_by_type_p): Add comment about 'c'
>   and 'C'.
>   (attr_fnspec::arg_copied_to_arg_p): New.
>   * builtins.c (attr_fnspec::builtin_fnspec): Update fnspec of string
>   functions that copies argument.
>   * tree-ssa-alias.c (attr_fnspec::verify): Add 'c' and 'C'; be more
>   strict on arg specifiers.
> 
> diff --git a/gcc/attr-fnspec.h b/gcc/attr-fnspec.h
> index 28135328437..97405dbdd78 100644
> --- a/gcc/attr-fnspec.h
> +++ b/gcc/attr-fnspec.h
> @@ -41,6 +41,13 @@
>   written and does not escape
>   'w' or 'W' specifies that the memory pointed to by the parameter does 
> not
>   escape
> + 'c' or 'C' specifies that the memory pointed to by the parameter is
> + copied to memory pointed to by different parameter
> + (as in memcpy).  The index of the destination parameter is
> + specified by the following character, i.e. "C1" means that memory
> + is copied to the memory pointed to by parameter 1.
> + The size of the block copied is determined by the size
> + specifier of the destination parameter.
>   '.' specifies that nothing is known.
> The uppercase letter in addition specifies that the memory pointed to
> by the parameter is not dereferenced.  For 'r' only read applies
> @@ -51,8 +58,11 @@
>   ' 'nothing is known
>   't' the size of value written/read corresponds to the size of
>   of the pointed-to type of the argument type
> - '1'...'9'  the size of value written/read is given by the specified
> - argument
> + '1'...'9'  preceded by 'o', 'O', 'w' or 'W'
> + specifies the size of value written/read is given by the
> + specified argument
> + '1'...'9'  preceded by 'c' or 'C'
> + specifies the argument the data is copied to
>   */
>  
>  #ifndef ATTR_FNSPEC_H
> @@ -122,7 +132,8 @@ public:
>{
>  unsigned int idx = arg_idx (i);
>  gcc_checking_assert (arg_specified_p (i));
> -return str[idx] == 'R' || str[idx] == 'O' || str[idx] == 'W';
> +return str[idx] == 'R' || str[idx] == 'O'
> +|| str[idx] == 'W' || str[idx] == 'C';
>}
>  
>/* True if argument is used.  */
> @@ -161,6 +172,7 @@ public:
>  unsigned int idx = arg_idx (i);
>  gcc_checking_assert (arg_specified_p (i));
>  return str[idx] != 'r' && str[idx] != 'R'
> +&& str[idx] != 'c' && str[idx] != 'C'
>  && str[idx] != 'x' && str[idx] != 'X';
>}
>  
> @@ -171,6 +183,8 @@ public:
>{
>  unsigned int idx = arg_idx (i);
>  gcc_checking_assert (arg_specified_p (i));
> +if (str[idx] == 'c' || str[idx] == 'C')
> +  return arg_max_access_size_given_by_arg_p (str[idx + 1] - '1', arg);
>  if (str[idx + 1] >= '1' && str[idx + 1] <= '9')
>{
>   *arg = str[idx + 1] - '1';
> @@ -187,9 +201,26 @@ public:
>{
>  unsigned int idx = arg_idx (i);
>  gcc_checking_assert (arg_specified_p (i));
> +/* We could handle 'c' and 'C' but then we would need to have a way
> +   to check that both pointed-to sizes are the same.  */
>  return str[idx + 1] ==

Re: Detect EAF flags in ipa-modref

2020-11-11 Thread Richard Biener
On Tue, 10 Nov 2020, Jan Hubicka wrote:

> > > +  tree callee = gimple_call_fndecl (stmt);
> > > +  if (callee)
> > > +{
> > > +  cgraph_node *node = cgraph_node::get (callee);
> > > +  modref_summary *summary = node ? get_modref_function_summary (node)
> > > + : NULL;
> > > +
> > > +  if (summary && summary->arg_flags.length () > arg)
> > 
> > So could we make modref "transform" push this as fnspec attribute or
> > would that not really be an optimization?
> 
> It was my original plan to synthetize fnspecs, but I think it is not
> very good idea: we have the summary readily available and we can
> represent information that fnspecs can't
> (do not have artificial limits on number of parameters or counts)
> 
> I would prefer fnspecs to be used only for in-compiler declarations.

Fine, I was just curious...

> > > +
> > > +/* Analyze EAF flags for SSA name NAME.
> > > +   KNOWN_FLAGS is a cache for flags we already determined.
> > > +   DEPTH is a recursion depth used to make debug output prettier.  */
> > > +
> > > +static int
> > > +analyze_ssa_name_flags (tree name, vec *known_flags, int depth)
> > 
> > C++ has references which makes the access to known_flags nicer ;)
> 
> Yay, will change that :)
> > 
> > > +{
> > > +  imm_use_iterator ui;
> > > +  gimple *use_stmt;
> > > +  int flags = EAF_DIRECT | EAF_NOCLOBBER | EAF_NOESCAPE | EAF_UNUSED;
> > > +
> > > +  /* See if value is already computed.  */
> > > +  if ((*known_flags)[SSA_NAME_VERSION (name)])
> > > +{
> > > +  /* Punt on cycles for now, so we do not need dataflow.  */
> > > +  if ((*known_flags)[SSA_NAME_VERSION (name)] == 1)
> > > + {
> > > +   if (dump_file)
> > > + fprintf (dump_file,
> > > +  "%*sGiving up on a cycle in SSA graph\n", depth * 4, "");
> > > +   return 0;
> > > + }
> > > +  return (*known_flags)[SSA_NAME_VERSION (name)] - 2;
> > > +}
> > > +  /* Recursion guard.  */
> > > +  (*known_flags)[SSA_NAME_VERSION (name)] = 1;
> > 
> > This also guards against multiple evaluations of the same stmts
> > but only in some cases?  Consider
> > 
> >   _1 = ..;
> >   _2 = _1 + _3;
> >   _4 = _1 + _5;
> >   _6 = _2 + _4;
> > 
> > where we visit _2 = and _4 = from _1 but from both are going
> > to visit _6.
> 
> Here we first push _6, then we go for _2, then for _1; evaluate _1,
> evaluate _2, go for _4 and evaluate _4, and evaluate _6.
> It is a DFS and you need a backward edge in the DFS (coming from a PHI).

Hmm, but then we eventually evaluate _6 twice?

> 
> Cycles seems to somewhat matter for GCC: we do have a lot of functions
> that walk linked lists that we could track otherwise.
> > 
> > Maybe I'm blind but you're not limiting depth?  Guess that asks
> > for problems, esp. as you are recursing rather than using a
> > worklist or so?
> > 
> > I see you try to "optimize" the walk by only visiting def->use
> > links from parameters but then a RPO walk over all stmts would
> > be simpler iteration-wise ...
> We usually evaluate just a small part of bigger functions (since we lose
> track quite easily after hitting the first memory store).  My plan was to
> change this to actual dataflow once we have it well defined
> (this means after discussing EAF flags with you and adding the logic to
> track callsites for the true IPA pass, which mildly complicated things - for
> every ssa name I track the callsite/arg pair where it is passed to,
> either directly or indirectly.  Then this is translated into a call summary
> and used by the IPA pass to compute final flags)
> 
> I guess I can add --param ipa-modref-walk-depth for now and handle
> dataflow incrementally?

Works for me.

> In particular I am not sure if I should just write iterated RPO myself
> or use tree-ssa-propagate.h (the second may be overkill).

tree-ssa-propagate.h is not to be used, it should DIE ;)

I guess you do want to iterate SSA cycles rather than BB cycles
so I suggest to re-surrect the SSA SCC discovery from the SCC
value-numbering (see tree-ssa-sccvn.c:DFS () on the gcc-8-branch)
which is non-recursive and micro-optimized.  Could put it
somewhere useful (tree-ssa.c?).

> > 
> > > +  if (dump_file)
> > > +{
> > > +  fprintf (dump_file,
> > > +"%*sAnalyzing flags of ssa name: ", depth * 4, "");
> > > +  print_generic_expr (dump_file, name);
> > > +  fprintf (dump_file, "\n");
> > > +}
> > > +
> > > +  FOR_EACH_IMM_USE_STMT (use_stmt, ui, name)
> > > +{
> > > +  if (flags == 0)
> > > + {
> > > +   BREAK_FROM_IMM_USE_STMT (ui);
> > > + }
> > > +  if (is_gimple_debug (use_stmt))
> > > + continue;
> > > +  if (dump_file)
> > > + {
> > > +   fprintf (dump_file, "%*s  Analyzing stmt:", depth * 4, "");
> > > +   print_gimple_stmt (dump_file, use_stmt, 0);
> > > + }
> > > +
> > > +  /* Gimple return may load the return value.  */
> > > +  if (gimple_code (use_stmt) == GIMPLE_RETURN)
> > 
> >  if (greturn *ret = dyn_cast  (use_stmt))
> > 
> > makes the as_a below not needed, similar for 

[PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich
From: Philipp Tomsich 

The function
long f(long a)
{
return(a & 0xull) << 3;
}
is folded into
_1 = a_2(D) << 3;
_3 = _1 & 34359738360;
whereas the construction
return (a & 0xull) * 8;
results in
_1 = a_2(D) & 4294967295;
_3 = _1 * 8;

This leads to suboptimal code-generation for RISC-V (march=rv64g), as
the shifted constant needs to be expanded into 3 RTX and 2 RTX (one
each for the LSHIFT_EXPR and the BIT_AND_EXPR) which will overwhelm
the combine pass (a sequence of 5 RTX is not considered):
li  a5,1# tmp78,# 23[c=4 l=4]  
*movdi_64bit/1
sllia5,a5,35#, tmp79, tmp78 # 24[c=4 l=4]  ashldi3
addia5,a5,-8#, tmp77, tmp79 # 9 [c=4 l=4]  adddi3/1
sllia0,a0,3 #, tmp76, tmp80 # 6 [c=4 l=4]  ashldi3
and a0,a0,a5# tmp77,, tmp76 # 15[c=4 l=4]  anddi3/0
ret # 28[c=0 l=4]  simple_return
instead of:
sllia0,a0,32#, tmp76, tmp79 # 26[c=4 l=4]  ashldi3
srlia0,a0,29#,, tmp76   # 27[c=4 l=4]  lshrdi3
ret # 24[c=0 l=4]  simple_return

We address this by adding a simplification for
   (a << s) & M, where ((M >> s) << s) == M
to
   (a & M_unshifted) << s, where M_unshifted := (M >> s)
which undistributes the LSHIFT.

Signed-off-by: Philipp Tomsich 
---
 gcc/match.pd| 11 +--
 gcc/testsuite/gcc.target/riscv/zextws.c | 18 ++
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zextws.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 349eab6..6bb9535 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3079,6 +3079,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 }
 }
  }
+(if (GIMPLE && (((mask >> shiftc) << shiftc) == mask)
+   && (exact_log2((mask >> shiftc) + 1) >= 0)
+   && (shift == LSHIFT_EXPR))
+(with
+ { tree newmaskt = build_int_cst_type (TREE_TYPE (@2), mask >> shiftc); }
+ (shift (convert (bit_and:shift_type (convert @0) { newmaskt; })) @1))
  /* ((X << 16) & 0xff00) is (X, 0).  */
  (if ((mask & zerobits) == mask)
   { build_int_cst (type, 0); }
@@ -3100,7 +3106,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (!tree_int_cst_equal (newmaskt, @2))
(if (shift_type != TREE_TYPE (@3))
(bit_and (convert (shift:shift_type (convert @3) @1)) { newmaskt; })
-(bit_and @4 { newmaskt; })
+(bit_and @4 { newmaskt; }))
 
 /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
(X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
@@ -3108,7 +3114,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (for bit_op (bit_and bit_xor bit_ior)
   (simplify
(shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
-   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+&& !wi::exact_log2(wi::to_wide(@2) + 1))
 (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
  (bit_op (shift (convert @0) @1) { mask; }))
 
diff --git a/gcc/testsuite/gcc.target/riscv/zextws.c b/gcc/testsuite/gcc.target/riscv/zextws.c
new file mode 100644
index 000..8ac93f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zextws.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g -mabi=lp64 -O2" } */
+
+/* Test for
+ (a << s) & M', where ((M >> s) << s) == M
+   being undistributed into
+ (a & M_unshifted) << s, where M_unshifted := (M >> s)
+   to produce the sequence (or similar)
+ slli  a0,a0,32
+ srli  a0,a0,29
+*/
+long
+zextws_mask (long i)
+{
+  return (i & 0xULL) << 3;
+}
+/* { dg-final { scan-assembler "slli" } } */
+/* { dg-final { scan-assembler "srli" } } */
-- 
1.8.3.1



[PATCH] match.pd: rewrite x << C with C > precision to (const_int 0)

2020-11-11 Thread Philipp Tomsich
From: Philipp Tomsich 

csmith managed to sneak a shift wider than the bit-width of a register
past the frontend (found when addressing a bug in our bitmanip machine
description): no warning is given and an unneeded shift is generated.
This behaviour was validated for the resulting assembly both for RISC-V
and AArch64.

This matches (x << C), where C is constant and C > precision(x), and
rewrites it to (const_int 0).  This has been confirmed to remove the
redundant shift instruction both for AArch64 and RISC-V.
---
 gcc/match.pd | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 349eab6..2309175 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -764,6 +764,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cabss (ops @0))
(cabss @0
 
+/* Fold (x << C), where C > precision(type) into 0. */
+(simplify
+ (lshift @0 INTEGER_CST@1)
+  (if (wi::ltu_p (TYPE_PRECISION (TREE_TYPE (@0)), wi::to_wide(@1)))
+   { build_zero_cst (TREE_TYPE (@0)); } ))
+
 /* Fold (a * (1 << b)) into (a << b)  */
 (simplify
  (mult:c @0 (convert? (lshift integer_onep@1 @2)))
-- 
1.8.3.1



Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 11:17:32AM +0100, Philipp Tomsich wrote:
> From: Philipp Tomsich 
> 
> The function
> long f(long a)
> {
>   return(a & 0xull) << 3;
> }
> is folded into
> _1 = a_2(D) << 3;
> _3 = _1 & 34359738360;
> whereas the construction
> return (a & 0xull) * 8;
> results in
> _1 = a_2(D) & 4294967295;
> _3 = _1 * 8;
> 
> This leads to suboptimal code-generation for RISC-V (march=rv64g), as
> the shifted constant needs to be expanded into 3 RTX and 2 RTX (one
> each for the LSHIFT_EXPR and the BIT_AND_EXPR) which will overwhelm
> the combine pass (a sequence of 5 RTX are not considered):
>   li  a5,1# tmp78,# 23[c=4 l=4]  
> *movdi_64bit/1
>   sllia5,a5,35#, tmp79, tmp78 # 24[c=4 l=4]  ashldi3
>   addia5,a5,-8#, tmp77, tmp79 # 9 [c=4 l=4]  adddi3/1
>   sllia0,a0,3 #, tmp76, tmp80 # 6 [c=4 l=4]  ashldi3
>   and a0,a0,a5# tmp77,, tmp76 # 15[c=4 l=4]  anddi3/0
>   ret # 28[c=0 l=4]  simple_return
> instead of:
>   sllia0,a0,32#, tmp76, tmp79 # 26[c=4 l=4]  ashldi3
>   srlia0,a0,29#,, tmp76   # 27[c=4 l=4]  lshrdi3
>   ret # 24[c=0 l=4]  simple_return
> 
> We address this by adding a simplification for
>(a << s) & M, where ((M >> s) << s) == M
> to
>(a & M_unshifted) << s, where M_unshifted := (M >> s)
> which undistributes the LSHIFT.

This is problematic, we have another rule that goes against this:
/* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
   (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
(for shift (lshift rshift)
 (for bit_op (bit_and bit_xor bit_ior)
  (simplify
   (shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
 (bit_op (shift (convert @0) @1) { mask; }))
and we don't want the two rules to keep fighting against each other.
It is better to have one form as canonical and only right before expansion
(isel pass) or during expansion decide e.g. based on target costs
whether that (X << C1) & (C2 << C1) is better expanded like that,
or as (X & C2) << C1, or as (X << C3) >> C4.

Jakub



Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich via Gcc-patches
Jakub,

On Wed, 11 Nov 2020 at 11:31, Jakub Jelinek  wrote:
>
> On Wed, Nov 11, 2020 at 11:17:32AM +0100, Philipp Tomsich wrote:
> > From: Philipp Tomsich 
> >
> > The function
> > long f(long a)
> > {
> >   return(a & 0xull) << 3;
> > }
> > is folded into
> > _1 = a_2(D) << 3;
> > _3 = _1 & 34359738360;
> > whereas the construction
> > return (a & 0xull) * 8;
> > results in
> > _1 = a_2(D) & 4294967295;
> > _3 = _1 * 8;
> >
> > This leads to suboptimal code-generation for RISC-V (march=rv64g), as
> > the shifted constant needs to be expanded into 3 RTX and 2 RTX (one
> > each for the LSHIFT_EXPR and the BIT_AND_EXPR) which will overwhelm
> > the combine pass (a sequence of 5 RTX are not considered):
> >   li  a5,1# tmp78,# 23[c=4 l=4]  
> > *movdi_64bit/1
> >   sllia5,a5,35#, tmp79, tmp78 # 24[c=4 l=4]  ashldi3
> >   addia5,a5,-8#, tmp77, tmp79 # 9 [c=4 l=4]  adddi3/1
> >   sllia0,a0,3 #, tmp76, tmp80 # 6 [c=4 l=4]  ashldi3
> >   and a0,a0,a5# tmp77,, tmp76 # 15[c=4 l=4]  anddi3/0
> >   ret # 28[c=0 l=4]  simple_return
> > instead of:
> >   sllia0,a0,32#, tmp76, tmp79 # 26[c=4 l=4]  ashldi3
> >   srlia0,a0,29#,, tmp76   # 27[c=4 l=4]  lshrdi3
> >   ret # 24[c=0 l=4]  
> > simple_return
> >
> > We address this by adding a simplification for
> >(a << s) & M, where ((M >> s) << s) == M
> > to
> >(a & M_unshifted) << s, where M_unshifted := (M >> s)
> > which undistributes the LSHIFT.
>
> This is problematic, we have another rule that goes against this:
> /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
>(X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
> (for shift (lshift rshift)
>  (for bit_op (bit_and bit_xor bit_ior)
>   (simplify
>(shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
>(if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
>  (bit_op (shift (convert @0) @1) { mask; }))
> and we don't want the two rules to keep fighting against each other.
> It is better to have one form as canonical and only right before expansion
> (isel pass) or during expansion decide e.g. based on target costs
> whether that (X << C1) & (C2 << C1) is better expanded like that,
> or as (X & C2) << C1, or as (X << C3) >> C4.


The patch addresses this by disallowing that rule if an exact power-of-2 is
seen as C1.  The reason why I would prefer to have this canonicalised the
same way the (X & C1) * C2 is canonicalised, is that cleaning this up during
combine is more difficult on some architectures that require multiple insns
to represent the shifted constant (i.e. C1 << C2).

Given that this maps back to another (already used) canonical form, it seems
straightforward enough — assuming that the additional complexity (i.e. an
additional rule and an additional condition for one of the other
rules) is acceptable.

Philipp.


[PATCH] Drop topological sort for PRE phi-translation

2020-11-11 Thread Richard Biener
The topological sort that sorted_array_from_bitmap_set is supposed to
provide hasn't been one for quite some time, since value_ids are
assigned first to SSA names in the order of SSA_NAME_VERSION
and then to hashtable entries in the order they appear in the
table.  One can even argue that expression-ids provide a closer
approximation of a topological sort since those are assigned
during AVAIL_OUT computation which is done in a dominator walk.

Now - phi-translation is not even depending on topological sorting
but it essentially does a DFS walk, phi-translating expressions
it depends on and relying on phi-translation caching to avoid
doing redundant work.

So this patch drops the use of sorted_array_from_bitmap_set from
phi_translate_set because this function is quite expensive.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-11  Richard Biener  

* tree-ssa-pre.c (phi_translate_set): Do not sort the
expression set topologically.
---
 gcc/tree-ssa-pre.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 90877e3c68e..da2b68909d9 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -1762,9 +1762,8 @@ phi_translate (bitmap_set_t dest, pre_expr expr,
 static void
 phi_translate_set (bitmap_set_t dest, bitmap_set_t set, edge e)
 {
-  vec exprs;
-  pre_expr expr;
-  int i;
+  bitmap_iterator bi;
+  unsigned int i;
 
   if (gimple_seq_empty_p (phi_nodes (e->dest)))
 {
@@ -1772,24 +1771,22 @@ phi_translate_set (bitmap_set_t dest, bitmap_set_t set, edge e)
   return;
 }
 
-  exprs = sorted_array_from_bitmap_set (set);
   /* Allocate the phi-translation cache where we have an idea about
  its size.  hash-table implementation internals tell us that
  allocating the table to fit twice the number of elements will
  make sure we do not usually re-allocate.  */
   if (!PHI_TRANS_TABLE (e->src))
-PHI_TRANS_TABLE (e->src)
-  = new hash_table (2 * exprs.length ());
-  FOR_EACH_VEC_ELT (exprs, i, expr)
+PHI_TRANS_TABLE (e->src) = new hash_table
+  (2 * bitmap_count_bits (&set->expressions));
+  FOR_EACH_EXPR_ID_IN_SET (set, i, bi)
 {
-  pre_expr translated;
-  translated = phi_translate (dest, expr, set, NULL, e);
+  pre_expr expr = expression_for_id (i);
+  pre_expr translated = phi_translate (dest, expr, set, NULL, e);
   if (!translated)
continue;
 
   bitmap_insert_into_set (dest, translated);
 }
-  exprs.release ();
 }
 
 /* Find the leader for a value (i.e., the name representing that
-- 
2.26.2


Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> The patch addresses this by disallowing that rule, if an exact power-of-2 is
> seen as C1.  The reason why I would prefer to have this canonicalised the
> same way the (X & C1) * C2 is canonicalised, is that cleaning this up during
> combine is more difficult on some architectures that require multiple insns
> to represent the shifted constant (i.e. C1 << C2).

It is bad to have many exceptions for the canonicalization
and it is unclear why exactly these were chosen, and it doesn't really deal
with say:
(x & 0xabcdef12ULL) << 13
being less expensive on some targets than
(x << 13) & (0xabcdef12ULL << 13).
(x & 0x7) << 3 vs. (x << 3) & 0x38 on the other side is a wash on
many targets.
As I said, it is better to decide which one is better before or during
expansion based on target costs, sure, combine can't catch everything.

Also, the patch formatting was incorrect in several ways (indentation,
missing space before ( when calling functions, etc.).

Jakub



Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-11-11 Thread Richard Sandiford via Gcc-patches
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Tuesday, November 10, 2020 7:54 PM
>> To: xiezhiheng 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> >> Sent: Tuesday, November 3, 2020 9:57 PM
>> >> To: xiezhiheng 
>> >> Cc: gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> >> emitted at -O3
>> >>
>> >> Thanks, I pushed both patches to trunk.
>> >>
>> >
>> > Thanks.  And I made two separate patches for these two groups, tbl/tbx
>> intrinsics and
>> > the rest of the arithmetic operation intrinsics.
>> >
>> > Note: It does not matter which patch is applied first.
>> 
>> I pushed the TBL/TBX one, but on the other patch:
>> 
>> > @@ -297,7 +297,7 @@
>> >BUILTIN_VSDQ_I (USHIFTIMM, uqshl_n, 0, ALL)
>> >
>> >/* Implemented by aarch64_reduc_plus_.  */
>> > -  BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, ALL)
>> > +  BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, FP)
>> 
>> This is defined for integer and FP modes, so I think it should be
>> NONE instead of FP.  We'll automatically add FLAGS_FP based on the
>> mode where necessary.
>> 
>
> Sorry, and I have revised a new patch.
> Bootstrapped and tested on aarch64 Linux platform.

LGTM, thanks.  Pushed to trunk.

Richard

> Thanks,
> Xie Zhiheng
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 75092451216..d6a49d65214 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2020-11-11  Zhiheng Xie  
> +   Nannan Zheng  
> +
> +   * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
> +   for arithmetic operation intrinsics.
> +


[committed] libstdc++: Use helper type for checking thread ID

2020-11-11 Thread Jonathan Wakely via Gcc-patches
This encapsulates the storing and checking of the thread ID into a class
type, so that the macro _GLIBCXX_HAS_GTHREADS is only checked in one
place. The code doing the checks just calls member functions of the new
type, without caring whether that really does any work or not.

libstdc++-v3/ChangeLog:

* include/std/stop_token (_Stop_state_t::_M_requester): Define
new struct with members to store and check the thread ID.
(_Stop_state_t::_M_request_stop()): Use _M_requester._M_set().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Use
_M_requester._M_is_current_thread().

Tested powerpc64le-linux. Committed to trunk.

commit 43f9e5aff06f1ca2296fdbd3141fe90ec0be1912
Author: Jonathan Wakely 
Date:   Wed Nov 11 09:28:50 2020

libstdc++: Use helper type for checking thread ID

This encapsulates the storing and checking of the thread ID into a class
type, so that the macro _GLIBCXX_HAS_GTHREADS is only checked in one
place. The code doing the checks just calls member functions of the new
type, without caring whether that really does any work or not.

libstdc++-v3/ChangeLog:

* include/std/stop_token (_Stop_state_t::_M_requester): Define
new struct with members to store and check the thread ID.
(_Stop_state_t::_M_request_stop()): Use _M_requester._M_set().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Use
_M_requester._M_is_current_thread().

diff --git a/libstdc++-v3/include/std/stop_token 
b/libstdc++-v3/include/std/stop_token
index ccec6fab15cf..7cd01c9713ee 100644
--- a/libstdc++-v3/include/std/stop_token
+++ b/libstdc++-v3/include/std/stop_token
@@ -162,9 +162,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   std::atomic _M_owners{1};
   std::atomic _M_value{_S_ssrc_counter_inc};
   _Stop_cb* _M_head = nullptr;
+  struct
+  {
 #ifdef _GLIBCXX_HAS_GTHREADS
-  __gthread_t _M_requester;
+   __gthread_t _M_id;
+   void _M_set() { _M_id = __gthread_self(); }
+   bool _M_is_current_thread() const
+   { return __gthread_equal(_M_id, __gthread_self()); }
+#else
+   void _M_set() { }
+   constexpr bool _M_is_current_thread() const { return true; }
 #endif
+  } _M_requester;
 
   _Stop_state_t() = default;
 
@@ -237,9 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
while (!_M_try_lock_and_stop(__old));
 
-#ifdef _GLIBCXX_HAS_GTHREADS
-   _M_requester = __gthread_self();
-#endif
+   _M_requester._M_set();
 
while (_M_head)
  {
@@ -343,18 +350,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// Callback is not in the list, so must have been removed by a call to
// _M_request_stop.
 
-#ifdef _GLIBCXX_HAS_GTHREADS
// Despite appearances there is no data race on _M_requester. The only
// write to it happens before the callback is removed from the list,
// and removing it from the list happens before this read.
-   if (!__gthread_equal(_M_requester, __gthread_self()))
+   if (!_M_requester._M_is_current_thread())
  {
// Synchronize with completion of callback.
__cb->_M_done.acquire();
// Safe for ~stop_callback to destroy *__cb now.
return;
  }
-#endif
+
if (__cb->_M_destroyed)
  *__cb->_M_destroyed = true;
   }


[PATCH 4/6 v3] Add documentation for dead field elimination

2020-11-11 Thread Erick Ochoa



2020-11-04  Erick Ochoa  

* gcc/Makefile.in: Add file to documentation sources
* gcc/doc/dfe.texi: New section
* gcc/doc/gccint.texi: Include new section
---
 gcc/Makefile.in |   3 +-
 gcc/doc/dfe.texi| 187 
 gcc/doc/gccint.texi |   2 +
 3 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 gcc/doc/dfe.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2184bd0fc3d..7e4c442416d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3275,7 +3275,8 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi 
gcc-vers.texi		\

 gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi  \
 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi   \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
-match-and-simplify.texi analyzer.texi ux.texi poly-int.texi
+match-and-simplify.texi analyzer.texi ux.texi poly-int.texi\
+dfe.texi
  TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi
\
 gcc-common.texi gcc-vers.texi
diff --git a/gcc/doc/dfe.texi b/gcc/doc/dfe.texi
new file mode 100644
index 000..e8d01d817d3
--- /dev/null
+++ b/gcc/doc/dfe.texi
@@ -0,0 +1,187 @@
+@c Copyright (C) 2001 Free Software Foundation, Inc.
+@c This is part of the GCC manual.
+@c For copying conditions, see the file gcc.texi.
+
+@node Dead Field Elimination
+@chapter Dead Field Elimination
+
+@node Dead Field Elimination Internals
+@section Dead Field Elimination Internals
+
+@subsection Introduction
+
+Dead field elimination is a compiler transformation that removes fields
+from structs.  There are several challenges to removing fields from
+structs at link time but, depending on the workload of the compiled
+program and the architecture where the program runs, dead field
+elimination might be a worthwhile transformation to apply.  Generally
+speaking, when the bottleneck of an application is the memory
+bandwidth of the host system and the memory requested is of a struct
+which can be reduced in size, then that combination of workload, program
+and architecture can benefit from applying dead field elimination.  The
+benefits come from removing unnecessary fields from structures and thus
+reducing the memory/cache requirements to represent a structure.
+
+While challenges exist to fully automate a dead field elimination
+transformation, similar and more powerful optimizations have been
+implemented in the past.  Chakrabarti et al [0] implement struct peeling,
+splitting structures into hot and cold parts, and field reordering.
+Golovanevsky et al [1] also show efforts to implement data layout
+optimizations at link time.  Unlike the work of Chakrabarti and
+Golovanevsky, this text only talks about dead field elimination.  This
+doesn't mean that the implementation can't be expanded to perform other
+link-time layout optimizations, it just means that dead field
+elimination is the only transformation that is implemented at the time
+of this writing.
+
+[0] Chakrabarti, Gautam, Fred Chow, and L. PathScale. "Structure layout
+optimizations in the open64 compiler: Design, implementation and
+measurements." Open64 Workshop at the International Symposium on Code
+Generation and Optimization. 2008.
+
+[1] Golovanevsky, Olga, and Ayal Zaks. "Struct-reorg: current status
+and future perspectives." Proceedings of the GCC Developers' Summit. 2007.

+@subsection Overview
+
+The dead field implementation is structured in the following way:
+
+@itemize @bullet
+@item
+Collect all types which can refer to a @code{RECORD_TYPE}.  This means
+that if we have a pointer to a record, we also collect this pointer.
+Or an array, or a union.
+@item
+Mark types as escaping.  More on this in the following section.
+@item
+Find fields which can be deleted.  (Iterate over all gimple code and
+find which fields are read.)
+@item
+Create new types with removed fields (and reference these types in
+pointers, arrays, etc.)
+@item
+Modify gimple to include these types.
+@end itemize
+
+Most of this code relies on the visitor pattern.  Types, expressions, and
+gimple statements are visited using this pattern.  You can find the base
+classes in @file{type-walker.c}, @file{expr-walker.c} and
+@file{gimple-walker.c}.  There are assertions in place where a type,
+expression, or gimple code is encountered which has not been encountered
+before during the testing of this transformation.  This facilitates
+fuzzing of the transformation.

+
+@subsubsection Implementation Details: Is a global variable escaping?
+
+How does the analysis determine whether a global variable is visible to
+code outside the current linking unit?  In the file
+@file{gimple-escaper.c} we have a simple function called
+@code{is_variable_escaping} which checks whether a variable is visible
+to code outside the current linking unit by looking at the
+@code{varpool_node}'s @code{externally_visible} fi

[PATCH 5/6 v3] Abort if Gimple from C++ or Fortran sources is found.

2020-11-11 Thread Erick Ochoa



2020-11-04  Erick Ochoa  

* gcc/ipa-field-reorder: Add flag to exit transformation
* gcc/ipa-type-escape-analysis: Same
---
 gcc/ipa-field-reorder.c|  3 +-
 gcc/ipa-type-escape-analysis.c | 54 --
 gcc/ipa-type-escape-analysis.h |  2 ++
 3 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index 4c1ddc6d0e3..9a28097b473 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -587,6 +587,7 @@ lto_fr_execute ()
 {
   log ("here in field reordering \n");
   // Analysis.
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
@@ -594,7 +595,7 @@ lto_fr_execute ()
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, 0);
 -  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return 0;
// Prepare for transformation.
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index 31eb8b41ff0..df21500b61e 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -171,6 +171,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-type-escape-analysis.h"
 #include "ipa-dfe.h"
 +#define ABORT_IF_NOT_C true
+
+bool detected_incompatible_syntax = false;
+
 // Main function that drives dfe.
 static unsigned int
 lto_dfe_execute ();
@@ -262,13 +266,15 @@ lto_dead_field_elimination ()
 if (cnode->inlined_to) continue;
 cnode->get_body();
   }
+
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, OPT_Wdfa);
-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return;
  // Prepare for transformation.
@@ -588,6 +594,7 @@ TypeWalker::_walk (tree type)
   // Improve, verify that having a type is an invariant.
   // I think there was a specific example which didn't
   // allow for it
+  if (detected_incompatible_syntax) return;
   if (!type)
 return;
 @@ -641,9 +648,9 @@ TypeWalker::_walk (tree type)
 case POINTER_TYPE:
   this->walk_POINTER_TYPE (type);
   break;
-case REFERENCE_TYPE:
-  this->walk_REFERENCE_TYPE (type);
-  break;
+//case REFERENCE_TYPE:
+//  this->walk_REFERENCE_TYPE (type);
+//  break;
 case ARRAY_TYPE:
   this->walk_ARRAY_TYPE (type);
   break;
@@ -653,18 +660,24 @@ TypeWalker::_walk (tree type)
 case FUNCTION_TYPE:
   this->walk_FUNCTION_TYPE (type);
   break;
-case METHOD_TYPE:
-  this->walk_METHOD_TYPE (type);
-  break;
+//case METHOD_TYPE:
+  //this->walk_METHOD_TYPE (type);
+  //break;
 // Since we are dealing only with C at the moment,
 // we don't care about QUAL_UNION_TYPE nor LANG_TYPEs
 // So fail early.
+case REFERENCE_TYPE:
+case METHOD_TYPE:
 case QUAL_UNION_TYPE:
 case LANG_TYPE:
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -847,6 +860,7 @@ TypeWalker::_walk_arg (tree t)
 void
 ExprWalker::walk (tree e)
 {
+  if (detected_incompatible_syntax) return;
   _walk_pre (e);
   _walk (e);
   _walk_post (e);
@@ -931,7 +945,11 @@ ExprWalker::_walk (tree e)
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -1164,6 +1182,7 @@ GimpleWalker::walk ()
   cgraph_node *node = NULL;
   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
 {
+  if (detected_incompatible_syntax) return;
   node->get_untransformed_body ();
   tree decl = node->decl;
   gcc_assert (decl);
@@ -1410,7 +1429,11 @@ GimpleWalker::_walk_gimple (gimple *stmt)
   // Break if something is unexpected.
   const char *name = gimple_code_name[code];
   log ("gimple code name %s\n", name);
+#ifdef ABORT_IF_NOT_C
+  detected_incompatible_syntax = true;
+#else
   gcc_unreachable ();
+#endif
 }
  void
@@ -2960,6 +2983,8 @@ TypeStringifier::stringify (tree t)
 return std::string ("");
   _stringification.clear ();
   gcc_assert (t);
+  if (detected_incompatible_syntax)
+return std::string ("");
   walk (t);
   return _stringification;
 }
@@ -3150,14 +3175,19 @@ TypeStringifier::_walk_arg_po

[PATCH 6/6 v3] Add heuristic to take into account void* pattern.

2020-11-11 Thread Erick Ochoa



We add a heuristic in order to be able to transform functions which
receive void* arguments as a way to generalize over arguments. An
example of this is qsort. The heuristic works by first inspecting
leaves in the call graph. If the leaves only contain a reference
to a single RECORD_TYPE then we color the nodes in the call graph
as "casts are safe in this function and it does not call externally
visible functions". We propagate this property up the call graph
until a fixed point is reached. This will later be changed to
use ipa-modref.

2020-11-04  Erick Ochoa  

* ipa-type-escape-analysis.c : Add new heuristic
* ipa-field-reorder.c : Use heuristic
* ipa-type-escape-analysis.h : Change signatures
---
 gcc/ipa-field-reorder.c|   3 +-
 gcc/ipa-type-escape-analysis.c | 193 +++--
 gcc/ipa-type-escape-analysis.h |  78 +++--
 3 files changed, 259 insertions(+), 15 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index 9a28097b473..2f694cff7ea 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -588,8 +588,9 @@ lto_fr_execute ()
   log ("here in field reordering \n");
   // Analysis.
   detected_incompatible_syntax = false;
+  std::map whitelisted = get_whitelisted_nodes();
   tpartitions_t escaping_nonescaping_sets
-= partition_types_into_escaping_nonescaping ();
+= partition_types_into_escaping_nonescaping (whitelisted);
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index df21500b61e..ea993b8a6cc 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#include 
  #include "config.h"
 #include "system.h"
@@ -249,6 +250,99 @@ lto_dfe_execute ()
   return 0;
 }
 +/* Heuristic to determine if casting is allowed in a function.
+ * This heuristic attempts to allow casting in functions which follow the
+ * pattern where a struct pointer or array pointer is casted to void* or
+ * char*.  The heuristic works as follows:
+ *
+ * There is a simple per-function analysis that determines whether there
+ * is more than 1 type of struct referenced in the body of the method.
+ * If there is more than 1 type of struct referenced in the body,
+ * then the layout of the structures referenced within the body
+ * cannot be casted.  However, if there's only one type of struct referenced
+ * in the body of the function, casting is allowed in the function itself.
+ * The logic behind this is that if the code follows good programming
+ * practices, the only way the memory should be accessed is via a singular
+ * type. There is also another requisite to this per-function analysis, and
+ * that is that the function can only call colored functions or functions
+ * which are available in the linking unit.
+ *
+ * Using this per-function analysis, we then start coloring leaf nodes in the
+ * call graph as ``safe'' or ``unsafe''.  The color is propagated to the
+ * callers of the functions until a fixed point is reached.
+ */
+std::map
+get_whitelisted_nodes ()
+{
+  cgraph_node *node = NULL;
+  std::set nodes;
+  std::set leaf_nodes;
+  std::set leaf_nodes_decl;
+  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  {
+node->get_untransformed_body ();
+nodes.insert(node);
+if (node->callees) continue;
+
+leaf_nodes.insert (node);
+leaf_nodes_decl.insert (node->decl);
+  }
+
+  std::queue worklist;
+  for (std::set::iterator i = leaf_nodes.begin (),
+e = leaf_nodes.end (); i != e; ++i)
+  {
+if (dump_file) fprintf (dump_file, "is a leaf node %s\n", (*i)->name ());
+worklist.push (*i);
+  }
+
+  for (std::set::iterator i = nodes.begin (),
+e = nodes.end (); i != e; ++i)
+  {
+worklist.push (*i);
+  }
+
+  std::map map;
+  while (!worklist.empty ())
+  {
+
+if (detected_incompatible_syntax) return map;
+cgraph_node *i = worklist.front ();
+worklist.pop ();
+if (dump_file) fprintf (dump_file, "analyzing %s %p\n", i->name (), (void*)i);
+GimpleWhiteLister whitelister;
+whitelister._walk_cnode (i);
+bool no_external = whitelister.does_not_call_external_functions (i, map);
+bool before_in_map = map.find (i->decl) != map.end ();
+bool place_callers_in_worklist = !before_in_map;
+if (!before_in_map)
+{
+  map.insert(std::pair(i->decl, no_external));
+} else
+{
+  map[i->decl] = no_external;
+}
+bool previous_value = map[i->decl];
+place_callers_in_worklist |= previous_value != no_external;
+if (previous_value != no_external)
+{
+   // This ensures we are having a total order
+   // from no_external -> !no_external
+   gcc_assert (!previous_value);
+   gcc_a

[PATCH 3/6 v3] Add Field Reordering

2020-11-11 Thread Erick Ochoa



Field reordering of structs at link-time

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: add new file to list of sources
* gcc/common.opt: add new flag for field reordering
* gcc/passes.def: add new pass
* gcc/tree-pass.h: same
* gcc/ipa-field-reorder.c: New file
* gcc/ipa-type-escape-analysis.c: Export common functions
* gcc/ipa-type-escape-analysis.h: Same
---
 gcc/Makefile.in|   1 +
 gcc/common.opt |   4 +
 gcc/ipa-dfe.c  |  86 -
 gcc/ipa-dfe.h  |  26 +-
 gcc/ipa-field-reorder.c| 622 +
 gcc/ipa-type-escape-analysis.c |  44 ++-
 gcc/ipa-type-escape-analysis.h |  12 +-
 gcc/passes.def |   1 +
 gcc/tree-pass.h|   2 +
 9 files changed, 749 insertions(+), 49 deletions(-)
 create mode 100644 gcc/ipa-field-reorder.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ef6047870b..2184bd0fc3d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1417,6 +1417,7 @@ OBJS = \
internal-fn.o \
ipa-type-escape-analysis.o \
ipa-dfe.o \
+   ipa-field-reorder.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 85351738a29..7885d0f5c0c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3468,4 +3468,8 @@ Wdfa
 Common Var(warn_dfa) Init(1) Warning
 Warn about dead fields at link time.
 +fipa-field-reorder
+Common Report Var(flag_ipa_field_reorder) Optimization
+Reorder fields.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index 5ba68332ad2..b4a3698f0dc 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -185,7 +185,7 @@ get_types_replacement (record_field_offset_map_t 
record_field_offset_map,

 {
   TypeStringifier stringifier;
 -  TypeReconstructor reconstructor (record_field_offset_map);
+  TypeReconstructor reconstructor (record_field_offset_map, "reorg");
   for (std::set::const_iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
@@ -245,9 +245,9 @@ get_types_replacement (record_field_offset_map_t 
record_field_offset_map,

  */
 void
 substitute_types_in_program (reorg_record_map_t map,
-reorg_field_map_t field_map)
+reorg_field_map_t field_map, bool _delete)
 {
-  GimpleTypeRewriter rewriter (map, field_map);
+  GimpleTypeRewriter rewriter (map, field_map, _delete);
   rewriter.walk ();
   rewriter._rewrite_function_decl ();
 }
@@ -361,8 +361,11 @@ TypeReconstructor::set_is_not_modified_yet (tree t)
 return;
tree type = _reorg_map[tt];
-  const bool is_modified
+  bool is_modified
 = strstr (TypeStringifier::get_type_identifier (type).c_str (), 
".reorg");

+  is_modified
+|= (bool) strstr (TypeStringifier::get_type_identifier (type).c_str (),
+ ".reorder");
   if (!is_modified)
 return;
 @@ -408,14 +411,20 @@ TypeReconstructor::is_memoized (tree t)
   return already_changed;
 }
 -static tree
-get_new_identifier (tree type)
+const char *
+TypeReconstructor::get_new_suffix ()
+{
+  return _suffix;
+}
+
+tree
+get_new_identifier (tree type, const char *suffix)
 {
   const char *identifier = TypeStringifier::get_type_identifier 
(type).c_str ();

-  const bool is_new_type = strstr (identifier, "reorg");
+  const bool is_new_type = strstr (identifier, suffix);
   gcc_assert (!is_new_type);
   char *new_name;
-  asprintf (&new_name, "%s.reorg", identifier);
+  asprintf (&new_name, "%s.%s", identifier, suffix);
   return get_identifier (new_name);
 }
 @@ -471,7 +480,9 @@ TypeReconstructor::_walk_ARRAY_TYPE_post (tree t)
   TREE_TYPE (copy) = build_variant_type_copy (TREE_TYPE (copy));
   copy = is_modified ? build_distinct_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : 
TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : 
TYPE_NAME (copy);

+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   // This is useful so that we go again through type layout
   TYPE_SIZE (copy) = is_modified ? NULL : TYPE_SIZE (copy);
   tree domain = TYPE_DOMAIN (t);
@@ -524,7 +535,9 @@ TypeReconstructor::_walk_POINTER_TYPE_post (tree t)
copy = is_modified ? build_variant_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : 
TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : 
TYPE_NAME (copy);

+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   TYPE_CACHED_VALUES_P (copy) = false;
tree _t = tree_to_tree (t);
@@ -619,7 +632,8 @@ TypeReconstructor::_walk_RECORD_TYPE_post (tree t)
   tree main = TYPE_MAIN_VARIANT (t);

[PATCH 2/6 v3] Add Dead Field Elimination

2020-11-11 Thread Erick Ochoa



Using the Dead Field Analysis, Dead Field Elimination
automatically transforms gimple to eliminate fields that
are never read.

2020-11-04  Erick Ochoa  

* gcc/Makefile.in: add file to list of sources
* gcc/ipa-dfe.c: New
* gcc/ipa-dfe.h: Same
* gcc/ipa-type-escape-analysis.h: Export code used in dfe.
* gcc/ipa-type-escape-analysis.c: Call transformation
---
 gcc/Makefile.in|1 +
 gcc/ipa-dfe.c  | 1284 
 gcc/ipa-dfe.h  |  247 ++
 gcc/ipa-type-escape-analysis.c |   22 +-
 gcc/ipa-type-escape-analysis.h |   10 +
 5 files changed, 1554 insertions(+), 10 deletions(-)
 create mode 100644 gcc/ipa-dfe.c
 create mode 100644 gcc/ipa-dfe.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8b18c9217a2..8ef6047870b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
init-regs.o \
internal-fn.o \
ipa-type-escape-analysis.o \
+   ipa-dfe.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
new file mode 100644
index 000..5ba68332ad2
--- /dev/null
+++ b/gcc/ipa-dfe.c
@@ -0,0 +1,1284 @@
+/* IPA Type Escape Analysis and Dead Field Elimination
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+  Contributed by Erick Ochoa 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural dead field elimination (IPA-DFE)
+
+   The goal of this transformation is to
+
+   1) Create new types to replace RECORD_TYPEs which hold dead fields.
+   2) Substitute instances of old RECORD_TYPEs for new RECORD_TYPEs.
+   3) Substitute instances of old FIELD_DECLs for new FIELD_DECLs.
+   4) Fix some instances of pointer arithmetic.
+   5) Relayout where needed.
+
+   First stage - DFA
+   =
+
+   Use DFA to compute the set of FIELD_DECLs which can be deleted.
+
+   Second stage - Reconstruct Types
+   
+
+   This stage is done by two family of classes, the SpecificTypeCollector
+   and the TypeReconstructor.
+
+   The SpecificTypeCollector collects all TYPE_P trees which point to
+   RECORD_TYPE trees returned by DFA.  The TypeReconstructor will create
+   new RECORD_TYPE trees and new TYPE_P trees replacing the old RECORD_TYPE
+   trees with the new RECORD_TYPE trees.
+
+   Third stage - Substitute Types and Relayout
+   ===
+
+   This stage is handled by ExprRewriter and GimpleRewriter.
+   Some pointer arithmetic is fixed here to take into account those eliminated
+   FIELD_DECLS.
+ */
+
+#include "config.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "basic-block.h" //needed for gimple.h
+#include "function.h"//needed for gimple.h
+#include "gimple.h"
+#include "stor-layout.h"
+#include "cfg.h" // needed for gimple-iterator.h
+#include "gimple-iterator.h"
+#include "gimplify.h"  //unshare_expr
+#include "value-range.h"   // make_ssa_name dependency
+#include "tree-ssanames.h" // make_ssa_name
+#include "ssa.h"
+#include "tree-into-ssa.h"
+#include "gimple-ssa.h" // update_stmt
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "tree-ssa-alias.h"
+#include "tree-ssanames.h"
+#include "gimple.h"
+#include "cfg.h"
+#include "gimple-iterator.h"
+#include "gimple-ssa.h"
+#include "gimple-pretty-print.h"
+
+#include "ipa-type-escape-analys

vect: Allow vconds between different vector sizes

2020-11-11 Thread Richard Sandiford via Gcc-patches
[Andrew: cc:ing you in case this affects/helps GCN.]

The vcond code requires the compared vectors and the selected
vectors to have both the same size and the same number of elements
as each other.  But the operation makes logical sense even for
different vector sizes.  E.g. you could compare two V4SIs and
use the result to select between two V4DIs.

The underlying optab already allows the compared mode and the selected
mode to be specified separately.  Since the vectoriser now also
supports mixed vector sizes, I think we can simply remove the
equal-size check and just keep the equal-lanes check.  It's then
up to the target to decide which (if any) mixtures of sizes it
supports.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* optabs-tree.c (expand_vec_cond_expr_p): Allow the compared values
and the selected values to have different mode sizes.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Likewise.
---
 gcc/gimple-isel.cc | 5 ++---
 gcc/optabs-tree.c  | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 9186ff55cdd..b5362cc4b01 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -199,9 +199,8 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
   unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
 
 
-  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
- && known_eq (GET_MODE_NUNITS (mode),
-  GET_MODE_NUNITS (cmp_op_mode)));
+  gcc_assert (known_eq (GET_MODE_NUNITS (mode),
+   GET_MODE_NUNITS (cmp_op_mode)));
 
   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
   if (icode == CODE_FOR_nothing)
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index badd30bfda8..4dfda756932 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -377,8 +377,7 @@ expand_vec_cond_expr_p (tree value_type, tree cmp_op_type, 
enum tree_code code)
   TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
 return true;
 
-  if (maybe_ne (GET_MODE_SIZE (value_mode), GET_MODE_SIZE (cmp_op_mode))
-  || maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS 
(cmp_op_mode)))
+  if (maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS (cmp_op_mode)))
 return false;
 
   if (TREE_CODE_CLASS (code) != tcc_comparison)
-- 
2.17.1



Re: std::jthread::operator=(std::jthread&&) calls std::terminate if *this has an associated running thread.

2020-11-11 Thread Jonathan Wakely via Gcc-patches

On 08/11/20 14:51 +0100, Paul Scharnofske via Libstdc++ wrote:

I think this would work:

  jthread& operator=(jthread&& __x) noexcept
  {
std::jthread(std::move(__x)).swap(*this);
return *this;
  }


That looks a lot better than what I did, it's also consistent with other
places like std::stop_token::operator=(std::stop_token&&).

I updated my patch and also created a test that checks the post
conditions described in the standard.


Thanks, I've committed the patch and will also backport it to the
gcc-10 branch.

This patch is small enough to not require a copyright assignment, but
if you would like to make further contributions please contact me
off-list to discuss the legal prerequisites as described at
https://gcc.gnu.org/contribute.html

Thanks again for contributing to GCC!



---
libstdc++-v3/include/std/thread   |  6 +-
.../testsuite/30_threads/jthread/jthread.cc   | 20 +++
2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 887ee579962..080036e2609 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -456,7 +456,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
operator=(const jthread&) = delete;

jthread&
-operator=(jthread&&) noexcept = default;
+operator=(jthread&& __other) noexcept
+{
+  std::jthread(std::move(__other)).swap(*this);
+  return *this;
+}

void
swap(jthread& __other) noexcept
diff --git a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc 
b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
index 746ff437c1d..b8ba62f6df2 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
@@ -187,6 +187,25 @@ void test_detach()
  VERIFY(t1FinallyInterrupted.load());
}

+//--
+
+void test_move_assignment()
+{
+std::jthread thread1([]{});
+std::jthread thread2([]{});
+
+const auto id2 = thread2.get_id();
+const auto ssource2 = thread2.get_stop_source();
+
+thread1 = std::move(thread2);
+
+VERIFY(thread1.get_id() == id2);
+VERIFY(thread2.get_id() == std::jthread::id());
+
+VERIFY(thread1.get_stop_source() == ssource2);
+VERIFY(!thread2.get_stop_source().stop_possible());
+}
+
int main()
{
  std::set_terminate([](){
@@ -197,4 +216,5 @@ int main()
  test_stop_token();
  test_join();
  test_detach();
+  test_move_assignment();
}
--
2.29.2





[PATCH,wwwdocs] gcc-11/changes: Mention Intel AVX-VNNI

2020-11-11 Thread Hongtao Liu via Gcc-patches
[GCC-11] Mention Intel AVX-VNNI and add it to ALDERLAKE and SAPPHIRERAPIDS,
also add HRESET to ALDERLAKE.

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index d7a3a1f9..fc4c74f4 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -267,15 +267,20 @@ a work-in-progress.
   added to GCC. AMX-TILE, AMX-INT8, AMX-BF16 intrinsics are available
   via the -mamx-tile, -mamx-int8, -mamx-bf16 compiler switch.
   
+  New ISA extension support for Intel AVX-VNNI was added to GCC.
+  AVX-VNNI intrinsics are available via the -mavxvnni
+  compiler switch.
+  
   GCC now supports the Intel CPU named Sapphire Rapids through
 -march=sapphirerapids.
 The switch enables the MOVDIRI MOVDIR64B AVX512VP2INTERSECT ENQCMD CLDEMOTE
-SERIALIZE PTWRITE WAITPKG TSXLDTRK AMT-TILE AMX-INT8 AMX-BF16 ISA
extensions.
+SERIALIZE PTWRITE WAITPKG TSXLDTRK AMT-TILE AMX-INT8 AMX-BF16 AVX-VNNI
+ISA extensions.
   
   GCC now supports the Intel CPU named Alderlake through
 -march=alderlake.
-The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE KEYLOCKER
-ISA extensions.
+The switch enables the CLDEMOTE PTWRITE WAITPKG SERIALIZE
KEYLOCKER AVX-VNNI
+HRESET ISA extensions.
   
 


-- 
BR,
Hongtao


Re: [PATCH 2/6 v3] Add Dead Field Elimination

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 03:14:59AM -0800, Erick Ochoa wrote:
> 
> Using the Dead Field Analysis, Dead Field Elimination
> automatically transforms gimple to eliminate fields that
> are never read.
> 
> 2020-11-04  Erick Ochoa  
> 
> * gcc/Makefile.in: add file to list of sources
> * gcc/ipa-dfe.c: New
> * gcc/ipa-dfe.h: Same
> * gcc/ipa-type-escape-analysis.h: Export code used in dfe.
> * gcc/ipa-type-escape-analysis.c: Call transformation

Just random general nits, not a review.
The gcc/ prefix shouldn't be in the filenames, paths are relative
to the ChangeLog file into which it goes and gcc/ directory has a ChangeLog.
All entries should start with a capital letter, so Add above, and all should
end with a period (missing in all but one place).

> ---
>  gcc/Makefile.in|1 +
>  gcc/ipa-dfe.c  | 1284 
>  gcc/ipa-dfe.h  |  247 ++
>  gcc/ipa-type-escape-analysis.c |   22 +-
>  gcc/ipa-type-escape-analysis.h |   10 +
>  5 files changed, 1554 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/ipa-dfe.c
>  create mode 100644 gcc/ipa-dfe.h
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 8b18c9217a2..8ef6047870b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1416,6 +1416,7 @@ OBJS = \
>   init-regs.o \
>   internal-fn.o \
>   ipa-type-escape-analysis.o \
> + ipa-dfe.o \
>   ipa-cp.o \
>   ipa-sra.o \
>   ipa-devirt.o \
> diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
> new file mode 100644
> index 000..5ba68332ad2
> --- /dev/null
> +++ b/gcc/ipa-dfe.c
> @@ -0,0 +1,1284 @@
> +/* IPA Type Escape Analysis and Dead Field Elimination
> +   Copyright (C) 2019-2020 Free Software Foundation, Inc.
> +
> +  Contributed by Erick Ochoa 
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +/* Interprocedural dead field elimination (IPA-DFE)
> +
> +   The goal of this transformation is to
> +
> +   1) Create new types to replace RECORD_TYPEs which hold dead fields.
> +   2) Substitute instances of old RECORD_TYPEs for new RECORD_TYPEs.
> +   3) Substitute instances of old FIELD_DECLs for new FIELD_DECLs.
> +   4) Fix some instances of pointer arithmetic.
> +   5) Relayout where needed.
> +
> +   First stage - DFA
> +   =
> +
> +   Use DFA to compute the set of FIELD_DECLs which can be deleted.
> +
> +   Second stage - Reconstruct Types
> +   
> +
> +   This stage is done by two family of classes, the SpecificTypeCollector
> +   and the TypeReconstructor.
> +
> +   The SpecificTypeCollector collects all TYPE_P trees which point to
> +   RECORD_TYPE trees returned by DFA.  The TypeReconstructor will create
> +   new RECORD_TYPE trees and new TYPE_P trees replacing the old RECORD_TYPE
> +   trees with the new RECORD_TYPE trees.
> +
> +   Third stage - Substitute Types and Relayout
> +   ===
> +
> +   This stage is handled by ExprRewriter and GimpleRewriter.
> +   Some pointer arithmetic is fixed here to take into account those
> eliminated
> +   FIELD_DECLS.
> + */
> +
> +#include "config.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

We really do not want to use STL in GCC sources that much; we have our own
vectors and hash_sets/maps.  If something is still needed after that,
the standard way to include an STL header is to define INCLUDE_ALGORITHM etc.
macros before including system.h.
> +
> +#include "system.h"

> +  TypeStringifier stringifier;

GCC is not a CamelCase shop, so types etc. should use lower-case
only and underscores instead.  This is everywhere in the patch.
> +
> +  // Here we are just placing the types of interest in a set.
> +  for (std::map::const_iterator i
> +   = record_field_offset_map.begin (),
> +   e = record_field_offset_map.end ();
> +   i != e; ++i)

This should just use hash_map.

> +  for (std::set::const_iterator i = non_escaping.begin (),
> + e = non_escaping.end ();
> +   i != e; ++i)
> +{
> +  tree type = *i;
> +  specifier.walk (type);
> +}
> +
> +  // These are all the types which need modifications.
> +  std::set to_modify = specifier.get_set ();

And hash_set.

Also note that code-generation from hash table iterations should be
gene

Re: [PATCH] match.pd: rewrite x << C with C > precision to (const_int 0)

2020-11-11 Thread Richard Biener via Gcc-patches
On Wed, Nov 11, 2020 at 11:28 AM Philipp Tomsich
 wrote:
>
> From: Philipp Tomsich 
>
> csmith managed to sneak a shift wider than the bit-width of a register
> past the frontend (found when addressing a bug in our bitmanip machine
> description): no warning is given and an unneeded shift is generated.
> This behaviour was validated for the resulting assembly both for RISC-V
> and AArch64.
>
> This matches (x << C), where C is constant and C > precision(x), and
> rewrites it to (const_int 0).  This has been confirmed to remove the
> redundant shift instruction both for AArch64 and RISC-V.
> ---
>  gcc/match.pd | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 349eab6..2309175 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -764,6 +764,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (cabss (ops @0))
> (cabss @0
>
> +/* Fold (x << C), where C > precision(type) into 0. */
> +(simplify
> + (lshift @0 INTEGER_CST@1)
> +  (if (wi::ltu_p (TYPE_PRECISION (TREE_TYPE (@0)), wi::to_wide(@1)))

You want element_precision (@0), otherwise this breaks for vector
shifts by scalars.

Please move it in the section starting with

/* Simplifications of shift and rotates.  */

you should be able to write a testcase.  When looking at

int foo(int a)
{
  return a << 33;
}

I see the shift eliminated to zero by early constant propagation, but with
-fno-tree-ccp I see it prevails to the assembler.

Thanks,
Richard.

> +   { build_zero_cst (TREE_TYPE (@0)); } ))
> +
>  /* Fold (a * (1 << b)) into (a << b)  */
>  (simplify
>   (mult:c @0 (convert? (lshift integer_onep@1 @2)))
> --
> 1.8.3.1
>


Re: vect: Allow vconds between different vector sizes

2020-11-11 Thread Richard Biener
On Wed, 11 Nov 2020, Richard Sandiford wrote:

> [Andrew: cc:ing you in case this affects/helps GCN.]
> 
> The vcond code requires the compared vectors and the selected
> vectors to have both the same size and the same number of elements
> as each other.  But the operation makes logical sense even for
> different vector sizes.  E.g. you could compare two V4SIs and
> use the result to select between two V4DIs.
> 
> The underlying optab already allows the compared mode and the selected
> mode to be specified separately.  Since the vectoriser now also
> supports mixed vector sizes, I think we can simply remove the
> equal-size check and just keep the equal-lanes check.  It's then
> up to the target to decide which (if any) mixtures of sizes it
> supports.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
> 
> 
> gcc/
>   * optabs-tree.c (expand_vec_cond_expr_p): Allow the compared values
>   and the selected values to have different mode sizes.
>   * gimple-isel.cc (gimple_expand_vec_cond_expr): Likewise.
> ---
>  gcc/gimple-isel.cc | 5 ++---
>  gcc/optabs-tree.c  | 3 +--
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index 9186ff55cdd..b5362cc4b01 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -199,9 +199,8 @@ gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
>unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>  
>  
> -  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
> -   && known_eq (GET_MODE_NUNITS (mode),
> -GET_MODE_NUNITS (cmp_op_mode)));
> +  gcc_assert (known_eq (GET_MODE_NUNITS (mode),
> + GET_MODE_NUNITS (cmp_op_mode)));
>  
>icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
>if (icode == CODE_FOR_nothing)
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index badd30bfda8..4dfda756932 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -377,8 +377,7 @@ expand_vec_cond_expr_p (tree value_type, tree 
> cmp_op_type, enum tree_code code)
>  TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
>  return true;
>  
> -  if (maybe_ne (GET_MODE_SIZE (value_mode), GET_MODE_SIZE (cmp_op_mode))
> -  || maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS 
> (cmp_op_mode)))
> +  if (maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS (cmp_op_mode)))
>  return false;
>  
>if (TREE_CODE_CLASS (code) != tcc_comparison)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH] match.pd: rewrite x << C with C > precision to (const_int 0)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 12:28:28PM +0100, Richard Biener via Gcc-patches wrote:
> > +/* Fold (x << C), where C > precision(type) into 0. */
> > +(simplify
> > + (lshift @0 INTEGER_CST@1)
> > +  (if (wi::ltu_p (TYPE_PRECISION (TREE_TYPE (@0)), wi::to_wide(@1)))
> 
> You want element_precision (@0), otherwise this breaks for vector
> shifts by scalars.
> 
> Please move it in the section starting with
> 
> /* Simplifications of shift and rotates.  */
> 
> you should be able to write a testcase.  When looking at
> 
> int foo(int a)
> {
>   return a << 33;
> }
> 
> I see the shift eliminated to zero by early constant propagation, but with
> -fno-tree-ccp I see it prevails to the assembler.

If we do want to do this, wouldn't it be better done in vrp instead?
I mean, if we want to optimize x << 33 to 0, don't we want to also optimize
x << [32, 137] to 0 too, or
x >= 32 ? 1 << x : 0 to 0, etc.?

For the csmith case, the question is where the out-of-bounds shift comes from
and whether it isn't the compiler's fault that it made it into the IL.

Jakub



Re: vect: Allow vconds between different vector sizes

2020-11-11 Thread Andrew Stubbs

On 11/11/2020 11:16, Richard Sandiford wrote:

[Andrew: cc:ing you in case this affects/helps GCN.]

The vcond code requires the compared vectors and the selected
vectors to have both the same size and the same number of elements
as each other.  But the operation makes logical sense even for
different vector sizes.  E.g. you could compare two V4SIs and
use the result to select between two V4DIs.

The underlying optab already allows the compared mode and the selected
mode to be specified separately.  Since the vectoriser now also
supports mixed vector sizes, I think we can simply remove the
equal-size check and just keep the equal-lanes check.  It's then
up to the target to decide which (if any) mixtures of sizes it
supports.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


If this doesn't work for GCN then I'll make it work! :-)

Andrew


[committed] aarch64: Support SVE comparisons for unpacked integers

2020-11-11 Thread Richard Sandiford via Gcc-patches
This patch adds support for comparing unpacked SVE integer vectors,
such as byte elements stored in the bottom bytes of halfword
containers.  It also adds support for selects between unpacked
SVE vectors (both integer and floating-point), since selects and
compares are closely tied via the vcond optab interface.

Tested on aarch64-linux-gnu, pushed to trunk.

Richard


gcc/
* config/aarch64/aarch64-sve.md (@vcond_mask_): Extend
from SVE_FULL to SVE_ALL.
(*vcond_mask_): Likewise.
(@aarch64_sel_dup): Likewise.
(vcond): Extend to...
(vcond): ...this, but requiring the
sizes of the container modes to match.
(vcondu): Extend to...
(vcondu): ...this.
(vec_cmp): Extend to...
(vec_cmp): ...this.
(vec_cmpu): Extend to...
(vec_cmpu): ...this.
(@aarch64_pred_cmp): Extend to...
(@aarch64_pred_cmp): ...this.
(*cmp_cc): Extend to...
(*cmp_cc): ...this.
(*cmp_ptest): Extend to...
(*cmp_ptest): ...this.
(*cmp_and): Extend to...
(*cmp_and): ...this.

gcc/testsuite/
* gcc.target/aarch64/sve/cmp_1.c: New test.
* gcc.target/aarch64/sve/cmp_2.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_1.c: Add --param
aarch64-sve-compare-costs=0
* gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_load_slp_1.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 121 --
 gcc/testsuite/gcc.target/aarch64/sve/cmp_1.c  |  57 +
 gcc/testsuite/gcc.target/aarch64/sve/cmp_2.c  |  72 +++
 .../gcc.target/aarch64/sve/cond_arith_1.c |   2 +-
 .../gcc.target/aarch64/sve/cond_arith_1_run.c |   2 +-
 .../gcc.target/aarch64/sve/cond_arith_3.c |   2 +-
 .../gcc.target/aarch64/sve/cond_arith_3_run.c |   2 +-
 .../aarch64/sve/mask_gather_load_7.c  |   2 +-
 .../gcc.target/aarch64/sve/mask_load_slp_1.c  |   2 +-
 .../gcc.target/aarch64/sve/vcond_11.c |   2 +-
 .../gcc.target/aarch64/sve/vcond_11_run.c |   2 +-
 11 files changed, 216 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cmp_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cmp_2.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 4b0a1ebe9e1..455b025521f 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7379,11 +7379,11 @@ (define_insn "@aarch64_sve_"
 ;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR)
 ;; SEL operand order:mask, true, false
 (define_expand "@vcond_mask_"
-  [(set (match_operand:SVE_FULL 0 "register_operand")
-   (unspec:SVE_FULL
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+   (unspec:SVE_ALL
  [(match_operand: 3 "register_operand")
-  (match_operand:SVE_FULL 1 "aarch64_sve_reg_or_dup_imm")
-  (match_operand:SVE_FULL 2 "aarch64_simd_reg_or_zero")]
+  (match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm")
+  (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero")]
  UNSPEC_SEL))]
   "TARGET_SVE"
   {
@@ -7396,12 +7396,25 @@ (define_expand "@vcond_mask_"
 ;; - two registers
 ;; - a duplicated immediate and a register
 ;; - a duplicated immediate and zero
+;;
+;; For unpacked vectors, it doesn't really matter whether SEL uses the
+;; container size or the element size.  If SEL used the container size,
+;; it would ignore undefined bits of the predicate but would copy the
+;; upper (undefined) bits of each container along with the defined bits.
+;; If SEL used the element size, it would use undefined bits of the predicate
+;; to select between undefined elements in each input vector.  Thus the only
+;; difference is whether the undefined bits in a container always come from
+;; the same input as the defined bits, or whether the choice can vary
+;; independently of the defined bits.
+;;
+;; For the other instructions, using the element size is more natural,
+;; so we do that for SEL as well.
 (define_insn "*vcond_mask_"
-  [(set (match_operand:SVE_FULL 0 "register_operand" "=w, w, w, w, ?w, ?&w, 
?&w")
-   (unspec:SVE_FULL
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w, w, w, ?w, ?&w, 
?&w")
+   (unspec:SVE_ALL
  [(match_operand: 3 "register_operand" "Upa, Upa, Upa, Upa, 
Upl, Upl, Upl")
-  (match_operand:SVE_FULL 1 "aarch64_sve_reg_or_dup_imm" "w, vss, vss, 
Ufc, Ufc, vss, Ufc")
-  (match_operand:SVE_FULL 2 "aarch64_simd_reg_or_zero" "w, 0, Dz, 0, 
Dz, w, w")]
+  (match_operand:SVE_ALL 1 "aarch64_sve_r

[PATCH] tree-optimization/97623 - Avoid PRE hoist insertion iteration

2020-11-11 Thread Richard Biener
The recent previous change in this area limited hoist insertion
iteration via a param but the following is IMHO better since
we are not really interested in PRE opportunities exposed by
hoisting but only the other way around.  So this moves hoist
insertion after PRE iteration finished and removes hoist
insertion iteration alltogether.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-11  Richard Biener  

PR tree-optimization/97623
* params.opt (-param=max-pre-hoist-insert-iterations): Remove
again.
* doc/invoke.texi (max-pre-hoist-insert-iterations): Likewise.
* tree-ssa-pre.c (insert): Move hoist insertion after PRE
insertion iteration and do not iterate it.

* gcc.dg/tree-ssa/ssa-hoist-3.c: Adjust.
* gcc.dg/tree-ssa/ssa-hoist-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-30.c: Likewise.
---
 gcc/doc/invoke.texi |  5 ---
 gcc/params.opt  |  4 ---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c  |  2 +-
 gcc/tree-ssa-pre.c  | 34 +
 6 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d01beb248e1..5692a986928 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13446,11 +13446,6 @@ is aborted and the load or store is not considered 
redundant.  The
 number of queries is algorithmically limited to the number of
 stores on all paths from the load to the function entry.
 
-@item max-pre-hoist-insert-iterations
-The maximum number of iterations doing insertion during code
-hoisting which is done as part of the partial redundancy elimination
-insertion phase.
-
 @item ira-max-loops-num
 IRA uses regional register allocation by default.  If a function
 contains more loops than the number given by this parameter, only at most
diff --git a/gcc/params.opt b/gcc/params.opt
index a33a371a395..7bac39a9d58 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -597,10 +597,6 @@ Maximum depth of sqrt chains to use when synthesizing 
exponentiation by a real c
 Common Joined UInteger Var(param_max_predicted_iterations) Init(100) 
IntegerRange(0, 65536) Param Optimization
 The maximum number of loop iterations we predict statically.
 
--param=max-pre-hoist-insert-iterations=
-Common Joined UInteger Var(param_max_pre_hoist_insert_iterations) Init(3) 
Param Optimization
-The maximum number of insert iterations done for PRE code hoisting.
-
 -param=max-reload-search-insns=
 Common Joined UInteger Var(param_max_reload_search_insns) Init(100) Param 
Optimization
 The maximum number of instructions to search backward when looking for 
equivalent reload.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
index 51ba59c9ab6..de3051bfb50 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c
@@ -15,4 +15,4 @@ int test (int a, int b, int c, int g)
 /* We should hoist and CSE only the multiplication.  */
 
 /* { dg-final { scan-tree-dump-times " \\* " 1 "pre" } } */
-/* { dg-final { scan-tree-dump "Insertions: 1" "pre" } } */
+/* { dg-final { scan-tree-dump "HOIST inserted: 1" "pre" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c
index ce9cec61668..fdb6a3ed349 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-7.c
@@ -49,6 +49,6 @@ void foo (int a, int b, int c, int d, int e, int x, int y, 
int z)
 
 /* Now inserting x + y five times is unnecessary but the cascading
cannot be avoided with the simple-minded dataflow.  But make sure
-   we do the insertions all in the first iteration.  */
-/* { dg-final { scan-tree-dump "insert iterations == 2" "pre" } } */
+   we do not iterate PRE insertion.  */
+/* { dg-final { scan-tree-dump "insert iterations == 1" "pre" } } */
 /* { dg-final { scan-tree-dump "HOIST inserted: 5" "pre" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
index 59af63ad5ac..cf9317372d6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-30.c
@@ -24,4 +24,4 @@ bar (int b, int x)
 /* We should see the partial redundant loads of f even though they
are using different types (of the same size).  */
 
-/* { dg-final { scan-tree-dump-times "Replaced MEM" 2 "pre" } } */
+/* { dg-final { scan-tree-dump-times "Replaced MEM" 3 "pre" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index da2b68909d9..d90249c0182 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3695,18 +3695,6 @@ insert (void)
fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
 
   changed = false;
-  /* Insert expressions for hois

[PATCH] testsuite/97797 - adjust GIMPLE tests for sizetype

2020-11-11 Thread Richard Biener
Tested on x86_64-unknown-linux-gnu, pushed.

2020-11-11  Richard Biener  

PR testsuite/97797
* gcc.dg/torture/ssa-fre-5.c: Use __SIZETYPE__ where
appropriate.
* gcc.dg/torture/ssa-fre-6.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/ssa-fre-5.c | 8 
 gcc/testsuite/gcc.dg/torture/ssa-fre-6.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/ssa-fre-5.c 
b/gcc/testsuite/gcc.dg/torture/ssa-fre-5.c
index 180fd720749..1915b9ae771 100644
--- a/gcc/testsuite/gcc.dg/torture/ssa-fre-5.c
+++ b/gcc/testsuite/gcc.dg/torture/ssa-fre-5.c
@@ -11,14 +11,14 @@ foo ()
   int * p;
   int i;
   int x[4];
-  long unsigned int _1;
-  long unsigned int _2;
+  __SIZETYPE__ _1;
+  __SIZETYPE__ _2;
   int _7;
 
   __BB(2):
   i_3 = 0;
-  _1 = (long unsigned int) i_3;
-  _2 = _1 * 4ul;
+  _1 = (__SIZETYPE__) i_3;
+  _2 = _1 * _Literal (__SIZETYPE__) 4;
   p_4 = _Literal (int *) &x + _2;
   __MEM  ((v4si *)p_4) = _Literal (v4si) { 1, 2, 3, 4 };
   _7 = x[0];
diff --git a/gcc/testsuite/gcc.dg/torture/ssa-fre-6.c 
b/gcc/testsuite/gcc.dg/torture/ssa-fre-6.c
index 2c4235fa4ea..041d921916c 100644
--- a/gcc/testsuite/gcc.dg/torture/ssa-fre-6.c
+++ b/gcc/testsuite/gcc.dg/torture/ssa-fre-6.c
@@ -11,14 +11,14 @@ foo ()
   int * p;
   int i;
   int x[4];
-  long unsigned int _1;
-  long unsigned int _2;
+  __SIZETYPE__ _1;
+  __SIZETYPE__ _2;
   int _7;
 
   __BB(2):
   i_3 = 0;
-  _1 = (long unsigned int) i_3;
-  _2 = _1 * 4ul;
+  _1 = (__SIZETYPE__) i_3;
+  _2 = _1 * _Literal (__SIZETYPE__) 4;
   p_4 = _Literal (int *) &x + _2;
   __MEM  ((v4si *)p_4) = _Literal (v4si) {};
   _7 = x[0];
-- 
2.26.2


Re: [PATCH 2/6 v3] Add Dead Field Elimination

2020-11-11 Thread Erick Ochoa




On 11.11.20 03:25, Jakub Jelinek wrote:

On Wed, Nov 11, 2020 at 03:14:59AM -0800, Erick Ochoa wrote:


Using the Dead Field Analysis, Dead Field Elimination
automatically transforms gimple to eliminate fields that
are never read.

2020-11-04  Erick Ochoa  

 * gcc/Makefile.in: add file to list of sources
 * gcc/ipa-dfe.c: New
 * gcc/ipa-dfe.h: Same
 * gcc/ipa-type-escape-analysis.h: Export code used in dfe.
 * gcc/ipa-type-escape-analysis.c: Call transformation


Just random general nits, not a review.
The gcc/ prefix shouldn't be in the filenames, paths are relative
to the ChangeLog file into which it goes and gcc/ directory has a ChangeLog.
All entries should start with a capital letter, so Add above, and all should
end with a period (missing in all but one place).


---
  gcc/Makefile.in|1 +
  gcc/ipa-dfe.c  | 1284 
  gcc/ipa-dfe.h  |  247 ++
  gcc/ipa-type-escape-analysis.c |   22 +-
  gcc/ipa-type-escape-analysis.h |   10 +
  5 files changed, 1554 insertions(+), 10 deletions(-)
  create mode 100644 gcc/ipa-dfe.c
  create mode 100644 gcc/ipa-dfe.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8b18c9217a2..8ef6047870b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
init-regs.o \
internal-fn.o \
ipa-type-escape-analysis.o \
+   ipa-dfe.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
new file mode 100644
index 000..5ba68332ad2
--- /dev/null
+++ b/gcc/ipa-dfe.c
@@ -0,0 +1,1284 @@
+/* IPA Type Escape Analysis and Dead Field Elimination
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+  Contributed by Erick Ochoa 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural dead field elimination (IPA-DFE)
+
+   The goal of this transformation is to
+
+   1) Create new types to replace RECORD_TYPEs which hold dead fields.
+   2) Substitute instances of old RECORD_TYPEs for new RECORD_TYPEs.
+   3) Substitute instances of old FIELD_DECLs for new FIELD_DECLs.
+   4) Fix some instances of pointer arithmetic.
+   5) Relayout where needed.
+
+   First stage - DFA
+   =
+
+   Use DFA to compute the set of FIELD_DECLs which can be deleted.
+
+   Second stage - Reconstruct Types
+   
+
+   This stage is done by two family of classes, the SpecificTypeCollector
+   and the TypeReconstructor.
+
+   The SpecificTypeCollector collects all TYPE_P trees which point to
+   RECORD_TYPE trees returned by DFA.  The TypeReconstructor will create
+   new RECORD_TYPE trees and new TYPE_P trees replacing the old RECORD_TYPE
+   trees with the new RECORD_TYPE trees.
+
+   Third stage - Substitute Types and Relayout
+   ===
+
+   This stage is handled by ExprRewriter and GimpleRewriter.
+   Some pointer arithmetic is fixed here to take into account those
eliminated
+   FIELD_DECLS.
+ */
+
+#include "config.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 


We really do not want to use STL in GCC sources that much; we have our own
vectors and hash_sets/maps.  If something is still needed after that,
the standard way to include an STL header is to define INCLUDE_ALGORITHM etc.
macros before including system.h.


Ah, I thought that just including them before was enough. Sure, I can 
change this; however, my next version of this patch will remove the STL 
from this pass.



+
+#include "system.h"



+  TypeStringifier stringifier;


GCC is not a CamelCase shop, so types etc. should use lower-case
only and underscores instead.  This is everywhere in the patch.


Understood. Thanks!


+
+  // Here we are just placing the types of interest in a set.
+  for (std::map::const_iterator i
+   = record_field_offset_map.begin (),
+   e = record_field_offset_map.end ();
+   i != e; ++i)


This should just use hash_map.


Ditto.




+  for (std::set::const_iterator i = non_escaping.begin (),
+   e = non_escaping.end ();
+   i != e; ++i)
+{
+  tree type = *i;
+  specifier.walk (type);
+}
+
+  // These are all the types which need modifications.
+  std::set to_modify = specifier.get_set ();


And hash_set.


Ditto.



Also note that code-g

Re: [Patch, fortran] PR83118 - [8/9/10/11 Regression] Bad intrinsic assignment of class(*) array component of derived type

2020-11-11 Thread Tobias Burnus

Hi Paul,

thanks for the patch.

On 10.11.20 14:25, Paul Richard Thomas via Fortran wrote:

...


unlimited_polymorphic_32.f03:

 if (any (z .ne. [42_4, 43_4])) stop 1 + idx

If you already use an offset for the stop codes, can you enumerate those?
Currently all are 'stop 1'.

In resolve.c: Typo 'ie.' → 'i.e.' (or, if really needed: 'ie')

+ temporary; ie. the rhs of the assignment.  */



+get_class_info_from_ss (stmtblock_t * pre, gfc_ss *ss, tree *eltype)
...
+  /* lhs is class and rhs is intrinsic or derived type.  */
...
+  if (unlimited_lhs)
+ {
+   tmp = gfc_class_len_get (lhs_class_expr);
+   if (rhs_ss->info
+   && rhs_ss->info->expr
+   && rhs_ss->info->expr->ts.type == BT_CHARACTER)
+ tmp2 = build_int_cst (TREE_TYPE (tmp),
+   rhs_ss->info->expr->ts.kind);


The last part looks incomplete. Unless I am mistaken:
The length for BT_CHARACTER is the character kind times the string length,
not just the character kind.

Otherwise: LGTM, but I do not want to rule out that I missed something!

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH] libstdc++: exclude cygwin/mingw from relro linker test

2020-11-11 Thread Jonathan Yong via Gcc-patches
A cygwin/mingw-hosted linker may support multiple targets, including ELF with
relro support. This breaks configure testing.


The attached patch excludes the cygwin/mingw PE format from the relro linker
flag check. Patch OK?
From a72f02aec065c312528e41e4243c702d7371b5ce Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Wed, 11 Nov 2020 12:23:06 +
Subject: [PATCH] libstdc++: exclude cygwin and mingw from linker relro support

PE format does not have ELF style relro linker support, exclude
from checking. If the host linker supports ELF format, configure
may get confused.

	2020-11-11  Jonathan Yong  <10wa...@gmail.com>
	libstdc++:
	* acinclude (GLIBCXX_CHECK_LINKER_FEATURES): exclude
	cygwin and mingw from relro linker test.
---
 libstdc++-v3/acinclude.m4 | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b9452dd74cd..650d63ab3d7 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -273,13 +273,22 @@ AC_DEFUN([GLIBCXX_CHECK_LINKER_FEATURES], [
   # Note this is only for shared objects.
   ac_ld_relro=no
   if test x"$with_gnu_ld" = x"yes"; then
-AC_MSG_CHECKING([for ld that supports -Wl,-z,relro])
-cxx_z_relo=`$LD -v --help 2>/dev/null | grep "z relro"`
-if test -n "$cxx_z_relo"; then
-  OPT_LDFLAGS="-Wl,-z,relro"
-  ac_ld_relro=yes
-fi
-AC_MSG_RESULT($ac_ld_relro)
+# cygwin and mingw uses PE, which has no ELF relro support,
+# multi target ld may confuse configure machinery
+case "$host" in
+*-*-cygwin*)
+ ;;
+*-*-mingw*)
+ ;;
+*)
+  AC_MSG_CHECKING([for ld that supports -Wl,-z,relro])
+  cxx_z_relo=`$LD -v --help 2>/dev/null | grep "z relro"`
+  if test -n "$cxx_z_relo"; then
+OPT_LDFLAGS="-Wl,-z,relro"
+ac_ld_relro=yes
+  fi
+  AC_MSG_RESULT($ac_ld_relro)
+esac
   fi
 
   # Set linker optimization flags.
-- 
2.29.2





Re: [PATCH] libstdc++: exclude cygwin/mingw from relro linker test

2020-11-11 Thread Jonathan Wakely via Gcc-patches

On 11/11/20 12:34 +, Jonathan Yong via Libstdc++ wrote:
cygwin/mingw hosted linker may support multiple targets with ELF relro 
support. This breaks configure testing.


Attached patch excludes cygwin/mingw PE format from relro linker flag. 
Patch OK?


OK, thanks.


From a72f02aec065c312528e41e4243c702d7371b5ce Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Wed, 11 Nov 2020 12:23:06 +
Subject: [PATCH] libstdc++: exclude cygwin and mingw from linker relro support

PE format does not have ELF style relro linker support, exclude
from checking. If the host linker supports ELF format, configure
may get confused.

2020-11-11  Jonathan Yong  <10wa...@gmail.com>
libstdc++:
* acinclude.m4 (GLIBCXX_CHECK_LINKER_FEATURES): Exclude
cygwin and mingw from relro linker test.
---
libstdc++-v3/acinclude.m4 | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b9452dd74cd..650d63ab3d7 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -273,13 +273,22 @@ AC_DEFUN([GLIBCXX_CHECK_LINKER_FEATURES], [
  # Note this is only for shared objects.
  ac_ld_relro=no
  if test x"$with_gnu_ld" = x"yes"; then
-AC_MSG_CHECKING([for ld that supports -Wl,-z,relro])
-cxx_z_relo=`$LD -v --help 2>/dev/null | grep "z relro"`
-if test -n "$cxx_z_relo"; then
-  OPT_LDFLAGS="-Wl,-z,relro"
-  ac_ld_relro=yes
-fi
-AC_MSG_RESULT($ac_ld_relro)
+# cygwin and mingw use PE, which has no ELF relro support;
+# a multi-target ld may confuse the configure machinery
+case "$host" in
+*-*-cygwin*)
+ ;;
+*-*-mingw*)
+ ;;
+*)
+  AC_MSG_CHECKING([for ld that supports -Wl,-z,relro])
+  cxx_z_relo=`$LD -v --help 2>/dev/null | grep "z relro"`
+  if test -n "$cxx_z_relo"; then
+OPT_LDFLAGS="-Wl,-z,relro"
+ac_ld_relro=yes
+  fi
+  AC_MSG_RESULT($ac_ld_relro)
+esac
  fi

  # Set linker optimization flags.
--
2.29.2


[Ada] Fix internal error with Shift_Right operator on signed type

2020-11-11 Thread Eric Botcazou
This is a regression present on the mainline and 10 branch, in the form of an
ICE on a shift operator applied to a variable of a signed type, caused by a
type mismatch.

Tested on x86-64/Linux, applied on the mainline and 10 branch.


2020-11-11  Eric Botcazou  

* gcc-interface/trans.c (gnat_to_gnu) : Also convert
GNU_MAX_SHIFT if the type of the operation has been changed.


2020-11-11  Eric Botcazou  

* gnat.dg/shift1.adb: New test.

-- 
Eric Botcazou
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 059e1a4f677..d0663a2d69b 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -7085,6 +7085,8 @@ gnat_to_gnu (Node_Id gnat_node)
 	if (TREE_CODE (gnu_lhs) == INTEGER_CST && ignore_lhs_overflow)
 	  TREE_OVERFLOW (gnu_lhs) = TREE_OVERFLOW (gnu_old_lhs);
 	gnu_rhs = convert (gnu_type, gnu_rhs);
+	if (gnu_max_shift)
+	  gnu_max_shift = convert (gnu_type, gnu_max_shift);
 	  }
 
 	/* For signed integer addition, subtraction and multiplication, do an
-- { dg-do compile }
-- { dg-options "-gnatws" }

procedure Shift1 is

  type T_Integer_8 is range -2 ** 7 .. 2 ** 7 - 1
with Size => 8;

  pragma Provide_Shift_Operators (T_Integer_8);

  X : T_Integer_8;

begin
  X := Shift_Right (X, 1);
end;


RE: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-11-11 Thread Sudakshina Das via Gcc-patches
Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 03 November 2020 11:34
> To: Sudakshina Das 
> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> __builtin_memset
> 
> Sudakshina Das  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: 30 October 2020 19:56
> >> To: Sudakshina Das 
> >> Cc: Wilco Dijkstra ; gcc-patches@gcc.gnu.org;
> >> Kyrylo Tkachov ; Richard Earnshaw
> >> 
> >> Subject: Re: [PATCH] aarch64: Add backend support for expanding
> >> __builtin_memset
> >>
> >> > +  base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
> >> > +  dst = adjust_automodify_address (dst, VOIDmode, base, 0);
> >> > +
> >> > +  /* Prepare the val using a DUP v0.16B, val.  */
> >> > +  if (CONST_INT_P (val))
> >> > +    {
> >> > +  val = force_reg (QImode, val);
> >> > +    }
> >> > +  src = gen_reg_rtx (V16QImode);
> >> > +  emit_insn (gen_aarch64_simd_dupv16qi(src, val));
> >>
> >> I think we should use:
> >>
> >>   src = expand_vector_broadcast (V16QImode, val);
> >>
> >> here (without the CONST_INT_P check), so that for constants we just
> >> move a constant directly into a register.
> >>
> >
> > Sorry to bring this up again. When I tried expand_vector_broadcast, I
> > see the following behaviour:
> > for __builtin_memset(p, 1, 24) where the duplicated constant fits
> > moviv0.16b, 0x1
> > mov x1, 72340172838076673
> > str x1, [x0, 16]
> > str q0, [x0]
> > and an ICE for __builtin_memset(p, 1, 32) where I am guessing the
> > duplicated constant does not fit
> > x.c:7:30: error: unrecognizable insn:
> > 7 | { __builtin_memset(p, 1, 32);}
> >   |  ^
> > (insn 8 7 0 2 (parallel [
> > (set (mem:V16QI (reg:DI 94) [0 MEM  [(void 
> > *)p_2(D)]+0
> S16 A8])
> > (const_vector:V16QI [
> > (const_int 1 [0x1]) repeated x16
> > ]))
> > (set (mem:V16QI (plus:DI (reg:DI 94)
> > (const_int 16 [0x10])) [0 MEM  [(void 
> > *)p_2(D)]+16
> S16 A8])
> > (const_vector:V16QI [
> > (const_int 1 [0x1]) repeated x16
> > ]))
> > ]) "x.c":7:3 -1
> >  (nil))
> > during RTL pass: vregs
> 
> Ah, yeah, I guess we need to call force_reg on the result.
> 
> >> So yeah, I'm certainly not questioning the speed_p value of 256.
> >> I'm sure you and Wilco have picked the best value for that.  But -Os
> >> stuff can usually be justified on first principles and I wasn't sure
> >> where the value of 128 came from.
> >>
> >
> > I had another chat with Wilco about the 128byte value for !speed_p. We
> > estimate the average number of instructions upto 128byte would be ~3
> > which is similar to do a memset call. But I did go back and think
> > about the tuning argument of
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS a
> > bit more because you are right that based on that the average instructions
> can become double.
> > I would propose using 256/128 based on speed_p but halving the value
> > based on the tune parameter. Obviously the assumption here is that we
> > are respecting the core's choice of avoiding stp of q registers (given
> > that I do not see other uses of
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS being changed by -Os).
> 
> Yeah, but I think the lack of an -Os check in the existing code might be a
> mistake.  The point is that STP Q is smaller than two separate STR Qs, so 
> using
> it is a size optimisation even if it's not a speed optimisation.
> And like I say, -Os isn't supposed to be striking a balance between size and
> speed: it's supposed to be going for size quite aggressively.
> 
> So TBH I have slight preference for keeping the current value and only
> checking the tuning flag for speed_p.  But I agree that halving the value
> would be self-consistent, so if you or Wilco believe strongly that halving is
> better, that'd be OK with me too.
> 
> > There might be a debate on how useful
> > AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS
> > is in the context of memset/memcpy but that needs more analysis and I
> > would say should be a separate patch.
> 
> Agreed.
> 
> >> >> > +  if (n > 0 && n < copy_limit / 2)
> >> >> > + {
> >> >> > +   next_mode = smallest_mode_for_size (n, MODE_INT);
> >> >> > +   /* Last 1-byte causes the compiler to optimize to STRB when
> >> >> > +it
> >> >> should
> >> >> > +  use STR Bx, [mem] since we already used SIMD registers.
> >> >> > +  So force it to HImode.  */
> >> >> > +   if (next_mode == QImode)
> >> >> > + next_mode = HImode;
> >> >>
> >> >> Is this always better?  E.g. for variable inputs and zero it seems
> >> >> quite natural to store the original scalar GPR.
> >> >>
> >> >> If we do do this, I think we should assert before the loop that n > 1.
> >> >>
> >> 

Re: [committed] gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32 (was: Re: [Patch] Fortran: OpenMP 5.0 (in_, task_)reduction clause extensions)

2020-11-11 Thread Thomas Schwinge
Hi Tobias!

On 2020-11-11T09:25:25+0100, Tobias Burnus  wrote:
> As Sunil's regression tester pointed out, the testcases fail on x86-64 with 
> -m32.
>
> The reason is that then the _ull_ variants of the GOMP functions are called;
> in the C equivalent, those are always called – I assume that's because the C
> testcase uses 'unsigned' which does not exist with Fortran.
>
> Committed as r11-4903-g1644ab9917ca6b96e9e683c422f1793258b9a3db

I'm confirming this fixes things for '-m32' -- but it also broke '-m64'.
;-)


Greetings
 Thomas


> commit 1644ab9917ca6b96e9e683c422f1793258b9a3db
> Author: Tobias Burnus 
> Date:   Wed Nov 11 09:23:07 2020 +0100
>
> gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32
>
> gcc/testsuite/ChangeLog:
>
> * gfortran.dg/gomp/workshare-reduction-26.f90: Add (?:_ull) to
> scan-tree-dump-times regex for -m32.
> * gfortran.dg/gomp/workshare-reduction-27.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-28.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-3.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-36.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-37.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-38.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-39.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-40.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-41.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-42.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-43.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-44.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-45.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-46.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-47.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-56.f90: Likewise.
> * gfortran.dg/gomp/workshare-reduction-57.f90: Likewise.
>
> diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90 
> b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90
> index 28267902914..d8633b66045 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90
> @@ -3 +3 @@
> -! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 0, 
> 0, " 1 "optimized" } }
> +! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start 
> \[^\n\r]*, 0, 0, " 1 "optimized" } }
> @@ -5 +5 @@
> -! { dg-final { scan-tree-dump-times 
> "__builtin_GOMP_loop_maybe_nonmonotonic_runtime_next " 1 "optimized" } }
> +! { dg-final { scan-tree-dump-times 
> "__builtin_GOMP_loop(?:_ull)_maybe_nonmonotonic_runtime_next " 1 "optimized" 
> } }
> diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90 
> b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90
> index 2ee047d4e8c..aada4d7a23b 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-27.f90
> @@ -3 +3 @@
> -! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 
> (?:2147483648|-2147483648), 0, " 1 "optimized" } }
> +! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start 
> \[^\n\r]*, (?:2147483648|-2147483648), 0, " 1 "optimized" } }
> @@ -5 +5 @@
> -! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_runtime_next " 1 
> "optimized" } }
> +! { dg-final { scan-tree-dump-times 
> "__builtin_GOMP_loop(?:_ull)_runtime_next " 1 "optimized" } }
> diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90 
> b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90
> index 6c9d49be13c..e67e24b1aa2 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90
> @@ -3 +3 @@
> -! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 4, 
> 0, " 1 "optimized" } }
> +! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start 
> \[^\n\r]*, 4, 0, " 1 "optimized" } }
> @@ -5 +5 @@
> -! { dg-final { scan-tree-dump-times 
> "__builtin_GOMP_loop_nonmonotonic_runtime_next " 1 "optimized" } }
> +! { dg-final { scan-tree-dump-times 
> "__builtin_GOMP_loop(?:_ull)_nonmonotonic_runtime_next " 1 "optimized" } }
> diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90 
> b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90
> index 6c9d49be13c..e67e24b1aa2 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/workshare-reduction-3.f90
> @@ -3 +3 @@
> -! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_start \[^\n\r]*, 4, 
> 0, " 1 "optimized" } }
> +! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start 
> \[^\n\r]*, 4, 0, " 

[Ada] Fix internal error on chain of constants with -gnatc

2020-11-11 Thread Eric Botcazou
This is a rather obscure case not really worth commenting. :-)

Tested on x86-64/Linux, applied on the mainline.


2020-11-11  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : In case
the constant is not being defined, get the expression in type
annotation mode only if its type is elementary.

-- 
Eric Botcazou
diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 4e6dc84beea..baae58a025f 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -667,21 +667,24 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
   /* If we have a constant that we are not defining, get the expression it
 	 was defined to represent.  This is necessary to avoid generating dumb
-	 elaboration code in simple cases, but we may throw it away later if it
+	 elaboration code in simple cases, and we may throw it away later if it
 	 is not a constant.  But do not do it for dispatch tables because they
 	 are only referenced indirectly and we need to have a consistent view
 	 of the exported and of the imported declarations of the tables from
 	 external units for them to be properly merged in LTO mode.  Moreover
-	 simply do not retrieve the expression it if it is an allocator since
+	 simply do not retrieve the expression if it is an allocator because
 	 the designated type might still be dummy at this point.  Note that we
 	 invoke gnat_to_gnu_external and not gnat_to_gnu because the expression
 	 may contain N_Expression_With_Actions nodes and thus declarations of
-	 objects from other units that we need to discard.  */
+	 objects from other units that we need to discard.  Note also that we
+	 need to do it even if we are only annotating types, so as to be able
+	 to validate representation clauses using constants.  */
   if (!definition
 	  && !No_Initialization (gnat_decl)
 	  && !Is_Dispatch_Table_Entity (gnat_entity)
 	  && Present (gnat_temp = Expression (gnat_decl))
-	  && Nkind (gnat_temp) != N_Allocator)
+	  && Nkind (gnat_temp) != N_Allocator
+	  && (Is_Elementary_Type (Etype (gnat_entity)) || !type_annotate_only))
 	gnu_expr = gnat_to_gnu_external (gnat_temp);
 
   /* ... fall through ... */


ping x2 [PATCH 0/2] "noinit" and "persistent" attributes

2020-11-11 Thread Jozef Lawrynowicz
ping x2 for below

On Wed, Nov 04, 2020 at 01:03:33PM +, Jozef Lawrynowicz wrote:
> Ping for below
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557184.html
> 
> On Tue, Oct 27, 2020 at 11:40:33AM +, Jozef Lawrynowicz wrote:
> > This patch series fixes behavior related to the "noinit" attribute, and
> > makes the MSP430 "persistent" attribute generic, so it can be used for
> > ARM.
> > These attributes are related because they are both used to mark
> > variables that should not be initialized by the target's runtime
> > startup code.
> > 
> > The "noinit" attribute is used for variables that are not initialized
> > to any value by the program loader, or the runtime startup code.
> > This attribute was made generic for GCC 10, whilst previously it was
> > only supported for MSP430.
> > There are a couple of issues when using it for arm-eabi:
> > - It does not work at -O0.
> >   The test for it is in the torture directory but only runs at -O2,
> >   which is why this bug was undetected.
> > - It does not work with -fdata-sections.
> > Patch 1 fixes these issues.
> > 
> > The "persistent" attribute is used for variables that *are* initialized
> > by the program loader, but are not initialized by the runtime startup
> > code. "persistent" variables are placed in a non-volatile area of
> > memory, which allows their value to "persist" between processor resets.
> > 
> > The "persistent" attribute is already implemented for msp430-elf, but
> > patch 2 makes it generic so it can be leveraged by ARM targets. The
> > ".persistent" section is pervasive in linker scripts distributed ARM
> > devices by manufacturers such as ST and TI.
> > 
> > I've attached a Binutils patch that adds the ".persistent" section to
> > the default ARM linker script. I'll apply it alongside this GCC patch.
> > 
> > Side note: There is handling of a ".persistent.bss" section, however
> > this is Ada-specific and unrelated to the "noinit" and "persistent"
> > attributes. The handling of the "noinit" and "persistent" attributes
> > does not interfere with it.
> > 
> > Successfully bootstrapped/regtested x86_64-pc-linux-gnu and regtested
> > for arm-none-eabi.
> > 
> > Ok for trunk?
> > 
> > Jozef Lawrynowicz (2):
> >   Fix "noinit" attribute being ignored for -O0 and -fdata-sections
> >   Implement the "persistent" attribute
> > 
> >  gcc/c-family/c-attribs.c  | 146 --
> >  gcc/cgraph.h  |   6 +-
> >  gcc/cgraphunit.c  |   2 +
> >  gcc/doc/extend.texi   |  20 ++-
> >  gcc/lto-cgraph.c  |   2 +
> >  .../c-c++-common/torture/attr-noinit-1.c  |   7 +
> >  .../c-c++-common/torture/attr-noinit-2.c  |   8 +
> >  .../c-c++-common/torture/attr-noinit-3.c  |  11 ++
> >  .../torture/attr-noinit-invalid.c |  12 ++
> >  .../torture/attr-noinit-main.inc} |  37 ++---
> >  .../c-c++-common/torture/attr-persistent-1.c  |   8 +
> >  .../c-c++-common/torture/attr-persistent-2.c  |   8 +
> >  .../c-c++-common/torture/attr-persistent-3.c  |  10 ++
> >  .../torture/attr-persistent-invalid.c |  11 ++
> >  .../torture/attr-persistent-main.inc  |  58 +++
> >  gcc/testsuite/lib/target-supports.exp |  15 +-
> >  gcc/tree-core.h   |   1 +
> >  gcc/tree.h|   7 +
> >  gcc/varasm.c  |  30 +++-
> >  19 files changed, 325 insertions(+), 74 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-1.c
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-2.c
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-3.c
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-noinit-invalid.c
> >  rename gcc/testsuite/{gcc.c-torture/execute/noinit-attribute.c => 
> > c-c++-common/torture/attr-noinit-main.inc} (56%)
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-persistent-1.c
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-persistent-2.c
> >  create mode 100644 gcc/testsuite/c-c++-common/torture/attr-persistent-3.c
> >  create mode 100644 
> > gcc/testsuite/c-c++-common/torture/attr-persistent-invalid.c
> >  create mode 100644 
> > gcc/testsuite/c-c++-common/torture/attr-persistent-main.inc
> > 
> > -- 
> > 2.28.0
> > 
From 965de1985a21ef449d1b1477be566efcf3405f7e Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 26 Oct 2020 14:11:08 +
Subject: [PATCH 1/2] Fix "noinit" attribute being ignored for -O0 and
 -fdata-sections

Variables with the "noinit" attribute are ignored at -O0 because they
are treated like a regular .bss variable and placed in the .bss section.

With -fdata-sections they are ignored because they are not handled in
resolve_unique_section.

gcc/c-family/ChangeLog:

* c-attribs.c (handle_noinit_attribute): Set DECL_NOINIT_

[Ada] Fix segfault on elaboration of empty 1-element array at -O

2020-11-11 Thread Eric Botcazou
This is another rather obscure case where the elaboration of an empty array 
whose base type is an array type of length at most 1 goes awry when the code 
is compiled with optimization.

Tested on x86-64/Linux, applied on the mainline, 10 and 9 branches.


2020-11-11  Eric Botcazou  

* gcc-interface/trans.c (can_be_lower_p): Remove.
(Regular_Loop_to_gnu): Add ENTRY_COND unconditionally if
BOTTOM_COND is non-zero.


2020-11-11  Eric Botcazou  

* gnat.dg/opt89.adb: New test.

-- 
Eric Botcazou
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index d0663a2d69b..065fcd2f956 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -2814,38 +2814,6 @@ can_equal_max_val_p (tree val, tree type, bool reverse)
   return can_equal_min_or_max_val_p (val, type, !reverse);
 }
 
-/* Return true if VAL1 can be lower than VAL2.  */
-
-static bool
-can_be_lower_p (tree val1, tree val2)
-{
-  if (TREE_CODE (val1) == NOP_EXPR)
-{
-  tree type = TREE_TYPE (TREE_OPERAND (val1, 0));
-  if (can_be_lower_p (TYPE_MAX_VALUE (type), TYPE_MIN_VALUE (type)))
-	return true;
-
-  val1 = TYPE_MIN_VALUE (type);
-}
-
-  if (TREE_CODE (val1) != INTEGER_CST)
-return true;
-
-  if (TREE_CODE (val2) == NOP_EXPR)
-{
-  tree type = TREE_TYPE (TREE_OPERAND (val2, 0));
-  if (can_be_lower_p (TYPE_MAX_VALUE (type), TYPE_MIN_VALUE (type)))
-	return true;
-
-  val2 = TYPE_MAX_VALUE (type);
-}
-
-  if (TREE_CODE (val2) != INTEGER_CST)
-return true;
-
-  return tree_int_cst_lt (val1, val2);
-}
-
 /* Replace EXPR1 and EXPR2 by invariant expressions if possible.  Return
true if both expressions have been replaced and false otherwise.  */
 
@@ -3126,19 +3094,16 @@ Regular_Loop_to_gnu (Node_Id gnat_node, tree *gnu_cond_expr_p)
 	}
 
   /* If we use the BOTTOM_COND, we can turn the test into an inequality
-	 test but we may have to add ENTRY_COND to protect the empty loop.  */
+	 test but we have to add ENTRY_COND to protect the empty loop.  */
   if (LOOP_STMT_BOTTOM_COND_P (gnu_loop_stmt))
 	{
 	  test_code = NE_EXPR;
-	  if (can_be_lower_p (gnu_high, gnu_low))
-	{
-	  gnu_cond_expr
-		= build3 (COND_EXPR, void_type_node,
-			  build_binary_op (LE_EXPR, boolean_type_node,
-	   gnu_low, gnu_high),
-			  NULL_TREE, alloc_stmt_list ());
-	  set_expr_location_from_node (gnu_cond_expr, gnat_iter_scheme);
-	}
+	  gnu_cond_expr
+	= build3 (COND_EXPR, void_type_node,
+		  build_binary_op (LE_EXPR, boolean_type_node,
+   gnu_low, gnu_high),
+		  NULL_TREE, alloc_stmt_list ());
+	  set_expr_location_from_node (gnu_cond_expr, gnat_iter_scheme);
 	}
 
   /* Open a new nesting level that will surround the loop to declare the
-- { dg-do run }
-- { dg-options "-O" }

procedure Opt89 is

  type Rec is record
I : Integer := 3;
  end record;

  subtype Index is Natural range 0..0;

  type Arr is array (Index range <>) of Rec;

  X : Arr (0 .. -1);

begin
  null;
end;


[Ada] Fix biased integer arithmetic

2020-11-11 Thread Eric Botcazou
The Ada compiler uses a biased representation when a size clause reserves 
fewer bits than normal either for the lower or for the upper bound.

Tested on x86-64/Linux, applied on the mainline, 10 and 9 branches.


2020-11-11  Eric Botcazou  

* gcc-interface/trans.c (build_binary_op_trapv): Convert operands
to the result type before doing generic overflow checking.


2020-11-11  Eric Botcazou  

* gnat.dg/bias2.adb: New test.

-- 
Eric Botcazou
-- { dg-do run }

procedure Bias2 is

  type Biased_T is range 1 .. 2 ** 6;
  for Biased_T'Size use 6;  --  { dg-warning "biased representation" }
  X, Y : Biased_T;

begin
  X := 1;
  Y := 1;
  if X + Y /= 2 then
raise Program_Error;
  end if;

  X := 2;
  Y := 1;
  if X - Y /= 1 then
raise Program_Error;
  end if;

  X := 2;
  Y := 3;
  if X * Y /= 6 then
raise Program_Error;
  end if;

  X := 24;
  Y := 3;
  if X / Y /= 8 then
raise Program_Error;
  end if;
end;
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 065fcd2f956..7be8463d32b 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -9361,6 +9361,11 @@ build_binary_op_trapv (enum tree_code code, tree gnu_type, tree left,
   /* If no operand is a constant, we use the generic implementation.  */
   if (TREE_CODE (lhs) != INTEGER_CST && TREE_CODE (rhs) != INTEGER_CST)
 {
+  /* First convert the operands to the result type like build_binary_op.
+	 This is where the bias is made explicit for biased types.  */
+  lhs = convert (gnu_type, lhs);
+  rhs = convert (gnu_type, rhs);
+
   /* Never inline a 64-bit mult for a 32-bit target, it's way too long.  */
   if (code == MULT_EXPR && precision == 64 && BITS_PER_WORD < 64)
 	{


[PATCH] aarch64: Fix SVE2 BCAX pattern [PR97730]

2020-11-11 Thread Alex Coplan via Gcc-patches
Hello,

This patch adds a missing not to the SVE2 BCAX (Bitwise clear and
exclusive or) pattern, fixing the PR. Since SVE doesn't have an
unpredicated not instruction, we need to use a (vacuously) predicated
not here.

To ensure that the predicate is instantiated correctly (to all 1s) for
the intrinsics, we pull out a separate expander from the define_insn.

From the ISA reference [1]:
> Bitwise AND elements of the second source vector with the
> corresponding inverted elements of the third source vector, then
> exclusive OR the results with corresponding elements of the first
> source vector.

Testing:
 * Regression tested an aarch64-linux-gnu cross configured with
   --with-arch=armv8.2-a+sve2, no new failures.
 * Bootstrap and regression test on aarch64-linux-gnu in progress.

The following execution tests went from FAIL to PASS on the SVE2
regression run as a result of this change:

FAIL->PASS: gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL->PASS: gcc.c-torture/execute/pr37573.c   -O3 -g  execution test
FAIL->PASS: gcc.dg/torture/pr69714.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
FAIL->PASS: gcc.dg/torture/pr69714.c   -O3 -g  execution test
FAIL->PASS: gcc.dg/vect/pr70021.c execution test
FAIL->PASS: gcc.dg/vect/pr70021.c -flto -ffat-lto-objects execution test

OK for trunk (provided patch passes bootstrap/regtest)?

Thanks,
Alex

[1]: https://developer.arm.com/docs/ddi0602/g/a64-sve-instructions-alphabetic-order/bcax-bitwise-clear-and-exclusive-or

---

gcc/ChangeLog:

PR target/97730
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_bcax):
Change to define_expand, add missing (trivially-predicated) not
rtx to fix wrong code bug.
(*aarch64_sve2_bcax): New.

gcc/testsuite/ChangeLog:

PR target/97730
* gcc.target/aarch64/sve2/bcax_1.c (OP): Add missing bitwise not
to match correct bcax semantics.
* gcc.dg/vect/pr97730.c: New test.
diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 0cafd0b690d..12dc9aaac55 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -786,17 +786,42 @@ (define_insn "@aarch64_sve2_xar"
 ;; -------------------------------------------------------------------------
 
 ;; Unpredicated exclusive OR of AND.
-(define_insn "@aarch64_sve2_bcax"
+(define_expand "@aarch64_sve2_bcax<mode>"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+   (xor:SVE_FULL_I
+ (and:SVE_FULL_I
+   (unspec:SVE_FULL_I
+ [(match_dup 4)
+  (not:SVE_FULL_I
+(match_operand:SVE_FULL_I 3 "register_operand"))]
+ UNSPEC_PRED_X)
+   (match_operand:SVE_FULL_I 2 "register_operand"))
+ (match_operand:SVE_FULL_I 1 "register_operand")))]
+  "TARGET_SVE2"
+  {
+    operands[4] = CONSTM1_RTX (<MODE>mode);
+  }
+)
+
+(define_insn_and_rewrite "*aarch64_sve2_bcax<mode>"
   [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?&w")
(xor:SVE_FULL_I
  (and:SVE_FULL_I
-   (match_operand:SVE_FULL_I 2 "register_operand" "w, w")
-   (match_operand:SVE_FULL_I 3 "register_operand" "w, w"))
+   (unspec:SVE_FULL_I
+ [(match_operand 4)
+  (not:SVE_FULL_I
+(match_operand:SVE_FULL_I 3 "register_operand" "w, w"))]
+ UNSPEC_PRED_X)
+   (match_operand:SVE_FULL_I 2 "register_operand" "w, w"))
  (match_operand:SVE_FULL_I 1 "register_operand" "0, w")))]
   "TARGET_SVE2"
   "@
   bcax\t%0.d, %0.d, %2.d, %3.d
   movprfx\t%0, %1\;bcax\t%0.d, %0.d, %2.d, %3.d"
+  "&& !CONSTANT_P (operands[4])"
+  {
+    operands[4] = CONSTM1_RTX (<MODE>mode);
+  }
   [(set_attr "movprfx" "*,yes")]
 )
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr97730.c 
b/gcc/testsuite/gcc.dg/vect/pr97730.c
new file mode 100644
index 000..af4bca44879
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97730.c
@@ -0,0 +1,12 @@
+/* { dg-additional-options "-O1" } */
+unsigned b = 0xce8e5a48, c = 0xb849691a;
+unsigned a[8080];
+int main() {
+  a[0] = b;
+  c = c;
+  unsigned f = 0xb1e8;
+  for (int h = 0; h < 5; h++)
+a[h] = (b & c) ^ f;
+  if (a[0] != 0x8808f9e0)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/bcax_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve2/bcax_1.c
index 4b0d5a9e67c..7c31afc4f19 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/bcax_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/bcax_1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } 
*/
 
-#define OP(x,y,z) ((x) ^ ((y) & (z)))
+#define OP(x,y,z) ((x) ^ (~(y) & (z)))
 
 #include "bitsel_1.c"
 


Re: [gcc-7-arm] Backport -moutline-atomics flag

2020-11-11 Thread Richard Biener via Gcc-patches
On Thu, Sep 24, 2020 at 4:59 PM Pop, Sebastian  wrote:
>
> Thanks Richard for your recommendations.
> I am still discussing with Kyrill about a good name for the branch.
> Once we agree on a name we will commit the patches to that branch.

Any update here?  Are those patches in production at Amazon?
I now see refs/vendors/AWS/heads/Arm64/gcc-7-branch

Thanks,
Richard.

> Sebastian
>
> On 9/24/20, 4:10 AM, "Richard Biener"  wrote:
>
> On Fri, Sep 11, 2020 at 12:38 AM Pop, Sebastian via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > the attached patches are back-porting the flag -moutline-atomics to the 
> gcc-7-arm vendor branch.
> > The flag enables a very important performance optimization for 
> N1-neoverse processors.
> > The patches pass bootstrap and make check on Graviton2 aarch64-linux.
> >
> > Ok to commit to the gcc-7-arm vendor branch?
>
> Given the branch doesn't exist yet can you eventually push this series to
> a user branch (or a amazon vendor branch)?
>
> You failed to CC arm folks so your mail might have been lost in the noise.
>
> Thanks,
> Richard.
>
> > Thanks,
> > Sebastian
> >
>


[PATCH] Fix PRE topological expression set sorting

2020-11-11 Thread Richard Biener
This fixes sorted_array_from_bitmap_set to do a topological sort
as required by re-using what PHI-translation does, namely a DFS
walk with the help of bitmap_find_leader.  The proper result
is verified by extra checking in clean () (which would have tripped
before) and for the testcase I'm working at during the last
patches (PR97623) it is neutral in compile-time cost.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-11-11  Richard Biener  

* tree-ssa-pre.c (pre_expr_DFS): New function.
(sorted_array_from_bitmap_set): Use it to properly
topologically sort the expression set.
(clean): Verify we've cleaned everything we should.
---
 gcc/tree-ssa-pre.c | 104 -
 1 file changed, 84 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index d90249c0182..6dea8c28a1a 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -806,36 +806,92 @@ bitmap_set_free (bitmap_set_t set)
 }
 
 
+/* DFS walk EXPR to its operands with leaders in SET, collecting
+   expressions in SET in postorder into POST.  */
+
+static void
+pre_expr_DFS (pre_expr expr, bitmap_set_t set, bitmap visited,
+ hash_set > &leader_set,
+ vec<pre_expr> &post)
+{
+  if (!bitmap_set_bit (visited, get_expression_id (expr)))
+return;
+
+  switch (expr->kind)
+{
+case NARY:
+  {
+   vn_nary_op_t nary = PRE_EXPR_NARY (expr);
+   for (unsigned i = 0; i < nary->length; i++)
+ {
+   if (TREE_CODE (nary->op[i]) != SSA_NAME)
+ continue;
+   unsigned int op_val_id = VN_INFO (nary->op[i])->value_id;
+   /* If we already found a leader for the value we've
+  recursed already.  Avoid the costly bitmap_find_leader.  */
+   if (!leader_set.add (op_val_id))
+ {
+   pre_expr leader = bitmap_find_leader (set, op_val_id);
+   if (leader)
+ pre_expr_DFS (leader, set, visited, leader_set, post);
+ }
+ }
+   break;
+  }
+case REFERENCE:
+  {
+   vn_reference_t ref = PRE_EXPR_REFERENCE (expr);
+   vec operands = ref->operands;
+   vn_reference_op_t operand;
+   for (unsigned i = 0; operands.iterate (i, &operand); i++)
+ {
+   tree op[3];
+   op[0] = operand->op0;
+   op[1] = operand->op1;
+   op[2] = operand->op2;
+   for (unsigned n = 0; n < 3; ++n)
+ {
+   if (!op[n] || TREE_CODE (op[n]) != SSA_NAME)
+ continue;
+   unsigned op_val_id = VN_INFO (op[n])->value_id;
+   if (!leader_set.add (op_val_id))
+ {
+   pre_expr leader = bitmap_find_leader (set, op_val_id);
+   if (leader)
+ pre_expr_DFS (leader, set, visited, leader_set, post);
+ }
+ }
+ }
+   break;
+  }
+default:;
+}
+  post.quick_push (expr);
+}
+
 /* Generate an topological-ordered array of bitmap set SET.  */
 
 static vec 
 sorted_array_from_bitmap_set (bitmap_set_t set)
 {
-  unsigned int i, j;
-  bitmap_iterator bi, bj;
+  unsigned int i;
+  bitmap_iterator bi;
   vec result;
 
   /* Pre-allocate enough space for the array.  */
-  result.create (bitmap_count_bits (&set->expressions));
+  size_t len = bitmap_count_bits (&set->expressions);
+  result.create (len);
+  hash_set > leader_set (2*len);
 
-  FOR_EACH_VALUE_ID_IN_SET (set, i, bi)
+  auto_bitmap visited (&grand_bitmap_obstack);
+  bitmap_tree_view (visited);
+  FOR_EACH_EXPR_ID_IN_SET (set, i, bi)
 {
-  /* The number of expressions having a given value is usually
-relatively small.  Thus, rather than making a vector of all
-the expressions and sorting it by value-id, we walk the values
-and check in the reverse mapping that tells us what expressions
-have a given value, to filter those in our set.  As a result,
-the expressions are inserted in value-id order, which means
-topological order.
-
-If this is somehow a significant lose for some cases, we can
-choose which set to walk based on the set size.  */
-  bitmap exprset = value_expressions[i];
-  EXECUTE_IF_SET_IN_BITMAP (exprset, 0, j, bj)
-   {
- if (bitmap_bit_p (&set->expressions, j))
-   result.quick_push (expression_for_id (j));
-}
+  pre_expr expr = expression_for_id (i);
+  /* Hoist insertion calls us with a value-set we have to and with,
+do so.  */
+  if (bitmap_set_contains_value (set, get_expr_value_id (expr)))
+   pre_expr_DFS (expr, set, visited, leader_set, result);
 }
 
   return result;
@@ -1988,6 +2044,14 @@ clean (bitmap_set_t set1, bitmap_set_t set2 = NULL)
}
 }
   exprs.release ();
+
+  if (flag_checking)
+{
+  unsigned j;
+  bitmap_iterator bi;
+  FOR_EACH_EXP

[committed] testsuite: Fix up scan-tree-dump-times regexps for 64-bit targets

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 02:40:20PM +0100, Thomas Schwinge wrote:
> On 2020-11-11T09:25:25+0100, Tobias Burnus  wrote:
> > As Sunil's regression tester pointed out, the testcases fail on x86-64 with 
> > -m32.
> >
> > The reason is that then the _ull_ variants of the GOMP functions are called;
> > in the C equivalent, those are always called – I assume that's because the C
> > testcase uses 'unsigned' which does not exist with Fortran.
> >
> > Committed as r11-4903-g1644ab9917ca6b96e9e683c422f1793258b9a3db
> 
> I'm confirming this fixes things for '-m32' -- but it also broke '-m64'.
> ;-)

The added (?:_ull) groups match on 32-bit targets, but are equivalent to
just adding _ull into the strings, i.e. they require the _ull substrings,
while the intent is that they be optional, so we should use (?:_ull)?
instead.

Tested on x86_64-linux with -m32/-m64, committed to trunk.

2020-11-11  Jakub Jelinek  

* gfortran.dg/gomp/workshare-reduction-3.f90: Use (?:_ull)? instead
of (?:_ull) in the scan-tree-dump-times directives.
* gfortran.dg/gomp/workshare-reduction-26.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-27.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-28.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-36.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-37.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-38.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-39.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-40.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-41.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-42.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-43.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-44.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-45.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-46.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-47.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-56.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-57.f90: Likewise.

--- gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90.jj	2020-11-11 14:11:07.924006064 +0100
+++ gcc/testsuite/gfortran.dg/gomp/workshare-reduction-26.f90	2020-11-11 16:08:18.865674174 +0100
@@ -1,8 +1,8 @@
 ! { dg-do compile }
 ! { dg-options "-O2 -fopenmp -fdump-tree-optimized" }
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, 0, 0, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)?_start \[^\n\r]*, 0, 0, " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_end " 1 "optimized" } }
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_maybe_nonmonotonic_runtime_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)?_maybe_nonmonotonic_runtime_next " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_workshare_task_reduction_unregister \\(0\\)" 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_parallel " 1 "optimized" } }
 
--- gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90.jj	2020-11-11 14:11:07.924006064 +0100
+++ gcc/testsuite/gfortran.dg/gomp/workshare-reduction-28.f90	2020-11-11 16:08:18.871674108 +0100
@@ -1,8 +1,8 @@
 ! { dg-do compile }
 ! { dg-options "-O2 -fopenmp -fdump-tree-optimized" }
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, 4, 0, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)?_start \[^\n\r]*, 4, 0, " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_end " 1 "optimized" } }
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_nonmonotonic_runtime_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)?_nonmonotonic_runtime_next " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_workshare_task_reduction_unregister \\(0\\)" 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_parallel " 1 "optimized" } }
 
--- gcc/testsuite/gfortran.dg/gomp/workshare-reduction-44.f90.jj	2020-11-11 14:11:07.924006064 +0100
+++ gcc/testsuite/gfortran.dg/gomp/workshare-reduction-44.f90	2020-11-11 16:08:18.895673844 +0100
@@ -1,8 +1,8 @@
 ! { dg-do compile }
 ! { dg-options "-O2 -fopenmp -fdump-tree-optimized" }
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_start \[^\n\r]*, 3, 1, " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)?_start \[^\n\r]*, 3, 1, " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop_end " 1 "optimized" } }
-! { dg-final { scan-tree-dump-times "__builtin_GOMP_loop(?:_ull)_nonmonotonic_guided_next " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOMP_l

Re: [gcc-7-arm] Backport -moutline-atomics flag

2020-11-11 Thread Pop, Sebastian via Gcc-patches
Hi Richard,

On 11/11/20, 8:45 AM, "Richard Biener"  wrote:
> Any update here?  Are those patches in production at Amazon?
> I now see refs/vendors/AWS/heads/Arm64/gcc-7-branch

The patches in the branch 
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/vendors/AWS/heads/Arm64/gcc-7-branch
up to 
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7c216ba945cb92bd79fbe01b35e16bd1e3cd854d
are part of gcc-7 in Amazon Linux 2.
The patches after that commit are staged to be integrated in AL2 by end of the 
year.

Sebastian



[PATCH] c++: Fix up constexpr CLEANUP_POINT_EXPR and TRY_FINALLY_EXPR handling [PR97790]

2020-11-11 Thread Jakub Jelinek via Gcc-patches
Hi!

As the testcase shows, CLEANUP_POINT_EXPR (and I think TRY_FINALLY_EXPR too)
suffer from the same problem that I was trying to fix in
r10-3597-g1006c9d4395a939820df76f37c7b085a4a1a003f
for CLEANUP_STMT, namely that if in the middle of the body expression of
those stmts is e.g. return stmt, goto, break or continue (something that
changes *jump_target and makes it start skipping stmts), we then skip the
cleanups too, which is not appropriate - the cleanups were either queued up
during the non-skipping execution of the body (for CLEANUP_POINT_EXPR), or
for TRY_FINALLY_EXPR are relevant already after entering the body block.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-11-11  Jakub Jelinek  

PR c++/97790
* constexpr.c (cxx_eval_constant_expression) <case CLEANUP_POINT_EXPR,
case TRY_FINALLY_EXPR>: For evaluation of cleanups use initially
recorded jump_target pointee rather than whatever ends up in it
after evaluation of the body operand.

* g++.dg/cpp2a/constexpr-dtor9.C: New test.

--- gcc/cp/constexpr.c.jj   2020-11-04 09:35:10.025029335 +0100
+++ gcc/cp/constexpr.c  2020-11-11 13:52:37.538466295 +0100
@@ -6008,6 +6008,7 @@ cxx_eval_constant_expression (const cons
auto_vec cleanups;
vec *prev_cleanups = ctx->global->cleanups;
ctx->global->cleanups = &cleanups;
+   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
  lval,
  non_constant_p, overflow_p,
@@ -6019,19 +6020,24 @@ cxx_eval_constant_expression (const cons
FOR_EACH_VEC_ELT_REVERSE (cleanups, i, cleanup)
  cxx_eval_constant_expression (ctx, cleanup, false,
non_constant_p, overflow_p,
-   jump_target);
+   jump_target ? &initial_jump_target
+   : NULL);
   }
   break;
 
 case TRY_FINALLY_EXPR:
-  r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0), lval,
+  {
+   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
+   r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0), lval,
+ non_constant_p, overflow_p,
+ jump_target);
+   if (!*non_constant_p)
+ /* Also evaluate the cleanup.  */
+ cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), true,
non_constant_p, overflow_p,
-   jump_target);
-  if (!*non_constant_p)
-   /* Also evaluate the cleanup.  */
-   cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), true,
- non_constant_p, overflow_p,
- jump_target);
+   jump_target ? &initial_jump_target
+   : NULL);
+  }
   break;
 
 case CLEANUP_STMT:
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C.jj	2020-11-11 13:57:16.572334917 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C	2020-11-11 13:57:45.510010165 +0100
@@ -0,0 +1,31 @@
+// PR c++/97790
+// { dg-do compile { target c++20 } }
+
+struct S
+{
+  int *d;
+  int n;
+  constexpr S () : d(new int[1]{}), n(1) {}
+  constexpr ~S () { delete [] d; }
+};
+
+constexpr S
+foo ()
+{
+  return S ();
+}
+
+constexpr int
+bar ()
+{
+  return foo ().n;
+}
+
+constexpr int
+baz ()
+{
+  return S ().n;
+}
+
+constexpr int a = baz ();
+constexpr int b = bar ();

Jakub



Re: [PATCH] libstdc++: exclude cygwin/mingw from relro linker test

2020-11-11 Thread Jonathan Wakely via Gcc-patches

On 11/11/20 12:41 +, Jonathan Wakely wrote:

On 11/11/20 12:34 +, Jonathan Yong via Libstdc++ wrote:
cygwin/mingw hosted linkers may support multiple targets with ELF
relro support. This breaks configure testing.

The attached patch excludes cygwin/mingw PE format from the relro
linker flag check. Patch OK?


OK, thanks.


Pushed to master now.




[PATCH] libstdc++: Enable <thread> without gthreads

2020-11-11 Thread Jonathan Wakely via Gcc-patches
This makes it possible to use std::thread in single-threaded builds.
All member functions are available, but attempting to create a new
thread will throw an exception.

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required.

This also means we can use std::thread::id in <stop_token> instead of
using the gthread wrappers directly.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new <bits/std_thread.h> header.
* include/Makefile.in: Regenerate.
* include/std/future: Include new header instead of <thread>.
* include/std/stop_token: Include new header instead of
<thread>.
(stop_token::_S_yield()): Use this_thread::yield().
(_Stop_state_t::_M_requester): Change type to std::thread::id.
(_Stop_state_t::_M_request_stop()): Use this_thread::get_id().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Likewise.
Use __is_single_threaded() to decide whether to synchronize.
* include/std/thread (thread, operator==, this_thread::get_id)
(this_thread::yield): Move to new header.
(operator<=>, operator!=, operator<, operator<=, operator>)
(operator>=, hash, operator<<): Define even when
gthreads not available.
* src/c++11/thread.cc (_GLIBCXX_NPROCS): Define to 0 when
gthreads not available.
(thread::_State::~_State(), thread::join(), thread::detach())
(thread::_M_start_thread(_State_ptr, void(*)()))
(thread::hardware_concurrency()): Define even when gthreads
not available.
* include/bits/std_thread.h: New file.
(thread, operator==, this_thread::get_id, this_thread::yield):
Define even when gthreads not available.

I'm sending this for consideration, I haven't pushed it.

This removes a number of ugly preprocessor checks for
_GLIBCXX_HAS_GTHREADS because things like std::this_thread::get_id()
and std::this_thread::yield() are always available.

The patch is missing changes to the testsuite to remove some (but
certainly not all) of the { dg-require-gthreads "" } directives. That
hasn't been done yet because it isn't sufficient. The testsuite also
filters out any test with the string "thread" in its path (see
testsuite/libstdc++-dg/conformance.exp) so that needs to be changed if
we want any tests under 30_threads to run for non-gthreads targets.

A follow-up patch could make futures and promises available on targets
with no gthreads support. For example, to use with coroutines. That
needs a little more work, as we'd need a non-gthreads version of
__atomic_futex_unsigned.

Thoughts?

commit 8bd28626e5f9b28192d0e868e922f26d4f40187e
Author: Jonathan Wakely 
Date:   Wed Nov 11 15:43:03 2020

libstdc++: Enable <thread> without gthreads

This makes it possible to use std::thread in single-threaded builds.
All member functions are available, but attempting to create a new
thread will throw an exception.

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required.

This also means we can use std::thread::id in <stop_token> instead of
using the gthread wrappers directly.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new <bits/std_thread.h> header.
* include/Makefile.in: Regenerate.
* include/std/future: Include new header instead of <thread>.
* include/std/stop_token: Include new header instead of
<thread>.
(stop_token::_S_yield()): Use this_thread::yield().
(_Stop_state_t::_M_requester): Change type to std::thread::id.
(_Stop_state_t::_M_request_stop()): Use this_thread::get_id().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Likewise.
Use __is_single_threaded() to decide whether to synchronize.
* include/std/thread (thread, operator==, this_thread::get_id)
(this_thread::yield): Move to new header.
(operator<=>, operator!=, operator<, operator<=, operator>)
(operator>=, hash, operator<<): Define even when
gthreads not available.
* src/c++11/thread.cc (_GLIBCXX_NPROCS): Define to 0 when
gthreads not available.
(thread::_State::~_State(), thread::join(), thread::detach())
(thread::_M_start_thread(_State_ptr, void(*)()))
(thread::hardware_concurrency()): Define even when gthreads
not available.
* include/bits/std_thread.h: New file.
(thread, operator==, this_thread::get_id, this_thread::yield):
Define even when gthreads not available.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 292d89da8ba7..7979f1c589d6 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstd

[PATCH] libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

2020-11-11 Thread Jonathan Wakely via Gcc-patches
Since glibc 2.27 the pthread_self symbol has been defined in libc rather
than libpthread. Because we only call pthread_self through a weak alias
it's possible for statically linked executables to end up without a
definition of pthread_self. This crashes when trying to call an
undefined weak symbol.

We can use the __GLIBC_PREREQ version check to detect the version of
glibc where pthread_self is no longer in libpthread, and call it
directly rather than through the weak reference.

It would be better to check for pthread_self in libc during configure
instead of hardcoding the __GLIBC_PREREQ check. That would be somewhat
complicated by the fact that prior to glibc 2.27 only libc.so.6
contained the pthread_self symbol. The configure checks would need to
try to link both statically and dynamically, and the result would depend
on whether the static libc.a happens to be installed during configure
(which could vary between different systems using the same version of
glibc). Doing it properly is left for a future date, as it will be
needed anyway after glibc moves all pthread symbols from libpthread to
libc. When that happens we should revisit the whole approach of using
weak symbols for pthread symbols.

An undesirable consequence of this change is that code compiled prior to
the change might inline the old definition of this_thread::get_id()
which always returns (__gthread_t)1 in a program that isn't linked to
libpthread. Code compiled after the change will use pthread_self() and
so get a real TID. That could result in the main thread having different
thread::id values in different translation units. This seems acceptable,
as there are not expected to be many uses of thread::id in programs
that aren't linked to libpthread.

libgcc/ChangeLog:

PR libstdc++/95989
* gthr-posix.h (__gthread_self) [__GLIBC_PREREQ(2, 27)]: Call
pthread_self directly rather than using weak alias.

libstdc++-v3/ChangeLog:

PR libstdc++/95989
* include/std/thread (this_thread::get_id): Add explicit cast
from int to __gthread_t. Use __gthread_self for glibc 2.27 and
newer.

Tested powerpc64le-linux (glibc 2.17) and x86_64-linux (glibc 2.31).

I can't approve the libgcc/gthr-posix.h part.

OK for trunk?

If the libgcc/gthr-posix.h change is not acceptable I will just change
the two places libstdc++ uses __gthread_self() so that they call
pthread_self() directly instead. But it seems worth fixing
gthr-posix.h to avoid the problem.

commit f01eaa49afab0cbd88a7e2d177d6b416ce1b78c6
Author: Jonathan Wakely 
Date:   Thu Jul 9 10:11:57 2020

libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 
95989]

Since glibc 2.27 the pthread_self symbol has been defined in libc rather
than libpthread. Because we only call pthread_self through a weak alias
it's possible for statically linked executables to end up without a
definition of pthread_self. This crashes when trying to call an
undefined weak symbol.

We can use the __GLIBC_PREREQ version check to detect the version of
glibc where pthread_self is no longer in libpthread, and call it
directly rather than through the weak reference.

It would be better to check for pthread_self in libc during configure
instead of hardcoding the __GLIBC_PREREQ check. That would be somewhat
complicated by the fact that prior to glibc 2.27 only libc.so.6
contained the pthread_self symbol. The configure checks would need to
try to link both statically and dynamically, and the result would depend
on whether the static libc.a happens to be installed during configure
(which could vary between different systems using the same version of
glibc). Doing it properly is left for a future date, as it will be
needed anyway after glibc moves all pthread symbols from libpthread to
libc. When that happens we should revisit the whole approach of using
weak symbols for pthread symbols.

An undesirable consequence of this change is that code compiled prior to
the change might inline the old definition of this_thread::get_id()
which always returns (__gthread_t)1 in a program that isn't linked to
libpthread. Code compiled after the change will use pthread_self() and
so get a real TID. That could result in the main thread having different
thread::id values in different translation units. This seems acceptable,
as there are not expected to be many uses of thread::id in programs
that aren't linked to libpthread.

libgcc/ChangeLog:

PR libstdc++/95989
* gthr-posix.h (__gthread_self) [__GLIBC_PREREQ(2, 27)]: Call
pthread_self directly rather than using weak alias.

libstdc++-v3/ChangeLog:

PR libstdc++/95989
* include/std/thread (this_thread::get_id): Add explicit cast
from int to __gthread_t. Use __gthread_self for glibc 2.27 and
newer.

dif

Re: [PATCH] c++: Improve static_assert diagnostic [PR97518]

2020-11-11 Thread Jason Merrill via Gcc-patches

On 11/10/20 8:13 PM, Marek Polacek wrote:

On Tue, Nov 10, 2020 at 02:30:30PM -0500, Jason Merrill via Gcc-patches wrote:

On 11/10/20 2:28 PM, Marek Polacek wrote:

On Tue, Nov 10, 2020 at 02:15:56PM -0500, Jason Merrill wrote:

On 11/10/20 1:59 PM, Marek Polacek wrote:

On Tue, Nov 10, 2020 at 11:32:04AM -0500, Jason Merrill wrote:

On 11/9/20 7:21 PM, Marek Polacek wrote:

Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

  static_assert(is_same::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

  static assertion failed due to requirement 'is_same::value'


I'd rather avoid the word "requirement" here, to avoid confusion with
Concepts.

Maybe have the usual failed error, and if we're showing the expression, have
a second inform to say e.g. "%qE evaluates to false"?


Works for me.


Also, if we've narrowed the problem down to a particular subexpression,
let's only print that one.


Done.


which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "static assertion failed due
to requirement 'false'" which doesn't help.  So I wound up only printing
the condition when it was instantiation-dependent, that is, we called
finish_static_assert from tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

  static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

  error: static assertion failed
6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
  |  ~~~^~

so you know right away what's going on.  Unfortunately, when you combine
both things, that is, have an instantiation-dependent expr and && in
a static_assert, we can't yet quite point to the clause that failed.  It
is because when we tsubstitute something like is_same::value, we
generate a VAR_DECL that doesn't have any location.  It would be awesome
if we could wrap it with a location wrapper, but I didn't see anything
obvious.


Hmm, I vaguely remember that at first we were using location wrappers less
in templates, but I thought that was fixed.  I don't see anything setting
suppress_location_wrappers, and it looks like tsubst_copy_and_build should
preserve a location wrapper.


The problem here is that tsubst_qualified_id produces a VAR_DECL and for those
CAN_HAVE_LOCATION_P is false.


Ah, then perhaps tsubst_qualified_id should call maybe_wrap_with_location to
preserve the location of the SCOPE_REF.


The SCOPE_REF's location is fine, we just throw it away and return a VAR_DECL.


Yes, I'm saying that tsubst_qualified_id should extract the location from
the SCOPE_REF and pass it to maybe_wrap_with_location to give the VAR_DECL a
location wrapper.


Nevermind, I wasn't checking the return value of maybe_wrap_with_location...
Maybe we should childproof it by marking it with WARN_UNUSED_RESULT.


Sounds good.


Anyway, this patch does the tsubst_qualified_id tweak.  With that, the
static_assert diagnostic with && is pretty spot on!

Relatedly, don't create useless location wrappers for temporary variables.
Since they are compiler-generated and ignored, nobody should be interested
in their location in the source file.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
Retain the location when tsubstituting a qualified-id so that our
static_assert diagnostic can benefit.  Don't create useless location
wrappers for temporary variables.

gcc/ChangeLog:

PR c++/97518
* tree.c (maybe_wrap_with_location): Don't add a location
wrapper around an artificial and ignored decl.

gcc/cp/ChangeLog:

PR c++/97518
* pt.c (tsubst_qualified_id): Use EXPR_LOCATION of the qualified-id.
Use it to maybe_wrap_with_location the final expression.

gcc/testsuite/ChangeLog:

PR c++/97518
* g++.dg/diagnostic/static_assert3.C: New test.
---
  gcc/cp/pt.c   |  5 +--
  .../g++.dg/diagnostic/static_assert3.C| 36 +++
  gcc/tree.c|  4 +++
  3 files changed, 43 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/static_assert3.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6ba114c9da3..c592461c474 100644
--- a/gcc/cp/pt.c
+++ b/gcc/c

Re: [PATCH] libstdc++: Enable <thread> without gthreads

2020-11-11 Thread Jonathan Wakely via Gcc-patches

On 11/11/20 16:13 +, Jonathan Wakely wrote:

This makes it possible to use std::thread in single-threaded builds.
All member functions are available, but attempting to create a new
thread will throw an exception.

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required.


I forgot to mention that this patch also reduces the size of the
<stop_token> header, by only including <bits/std_thread.h> instead of
the whole of <thread>. That could be done separately from the rest of
the changes here.

It would be possible to split std::thread and this_thread::get_id()
into a new header without also making them work without gthreads.

That would still reduce the size of the  header, because it
wouldn't need the whole of . But it wouldn't get rid of
preprocessor checks for _GLIBCXX_HAS_GTHREADS in .

Allowing std::this_thread::get_id() and std::this_thread::yield() to
work without threads seems worth doing (we already make
std::this_thread::sleep_until and std::this_thread::sleep_for work
without threads).



RE: [Patch, testsuite]: Update MicroBlaze strings test

2020-11-11 Thread Nagaraju Mekala
Hi Jeff,

> -Original Message-
> From: Jeff Law 
> Sent: Wednesday, November 11, 2020 5:55 AM
> To: Nagaraju Mekala ; Michael Eager
> ; gcc-patches@gcc.gnu.org
> Cc: Sadanand Mutyala 
> Subject: Re: [Patch, testsuite]: Update MicroBlaze strings test
> 
> 
> On 11/8/20 10:58 PM, Nagaraju Mekala wrote:
> > Hello All,
> >
> > for new scan-assembly output resulting in use of $LC label
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.target/microblaze/others/strings1.c: Update
> > to include $LC label.
> 
> Thanks.  Installed on the trunk.
Thanks for pushing the patch.

> 
> jeff
> 



RE: [Patch, microblaze]: Correct the const high double immediate value

2020-11-11 Thread Nagaraju Mekala
Hi Eager,

> -Original Message-
> From: Michael Eager 
> Sent: Wednesday, November 11, 2020 9:06 AM
> To: Nagaraju Mekala ; Michael Eager
> ; gcc-patches@gcc.gnu.org
> Cc: Sadanand Mutyala 
> Subject: Re: [Patch, microblaze]: Correct the const high double immediate
> value
> 
> On 11/8/20 9:43 PM, Nagaraju Mekala wrote:
> > diff --git a/gcc/config/microblaze/microblaze.c
> > b/gcc/config/microblaze/microblaze.c
> >
> > index a0f81b7..d9341ec 100644
> > --- a/gcc/config/microblaze/microblaze.c
> > +++ b/gcc/config/microblaze/microblaze.c
> > @@ -2440,15 +2440,18 @@ print_operand (FILE * file, rtx op, int
> > letter)
> >     else if (letter == 'h' || letter == 'j')
> >   {
> > -  long val[2];
> > +  long val[2], l[2];
> >     if (code == CONST_DOUBLE)
> >      {
> >    if (GET_MODE (op) == DFmode)
> >      REAL_VALUE_TO_TARGET_DOUBLE
> (*CONST_DOUBLE_REAL_VALUE
> > (op), val);
> >    else
> >      {
> > - val[0] = CONST_DOUBLE_HIGH (op);
> > - val[1] = CONST_DOUBLE_LOW (op);
> > + REAL_VALUE_TYPE rv;
> > +     REAL_VALUE_FROM_CONST_DOUBLE (rv, op);
> 
> REAL_VALUE_FROM_CONST_DOUBLE was removed from real.h in 2015.
> Use CONST_DOUBLE_REAL_VALUE.
> 
> > + REAL_VALUE_TO_TARGET_DOUBLE (rv, l);
> > + val[1] = l[WORDS_BIG_ENDIAN == 0];
> > + val[0] = l[WORDS_BIG_ENDIAN != 0];
> >      }
> >      }
> >     else if (code == CONST_INT)
> 
> 
> > diff --git a/gcc/testsuite/gcc.target/microblaze/long.c
> > b/gcc/testsuite/gcc.target/microblaze/long.c
> > new file mode 100644
> > index 000..4d45186
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/microblaze/long.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-options "-O0" } */
> > +#define BASEADDR 0xF000ULL
> > +int main ()
> > +{
> > +  unsigned long long start;
> > +  start = (unsigned long long) BASEADDR;
> > +  return 0;
> > +}
> > +/* { dg-final { scan-assembler
> > "addik\tr(\[0-9]\|\[1-2]\[0-9]\|3\[0-1]),r0,0x" } } */
> > +/* { dg-final { scan-assembler
> > "addik\tr(\[0-9]\|\[1-2]\[0-9]\|3\[0-1]),r0,0xf000" } } */
> 
> It looks like this test case will pass without the patch.  The code
> generated before applying the patch is
>  addik   r4,r0,0x
>  addik   r5,r0,0xf000 #li => la
> 
> Can you provide a test case which fails without the patch but passes
> with the patch?
Thanks for reviewing the patch, I will update both patch and testcase and 
re-submit them.

Thanks,
Nagaraju
> 
> --
> Michael Eager


Re: [PATCH] aarch64: Add backend support for expanding __builtin_memset

2020-11-11 Thread Richard Sandiford via Gcc-patches
Sudakshina Das  writes:
> Apologies for the delay. I have attached another version of the patch.
> I have disabled the test cases for ILP32. This is only because function body 
> check
> fails because there is an addition unsigned extension instruction for src 
> pointer in
> every test (uxtwx0, w0). The actual inlining is not different.

Yeah, agree that's the best way of handling the ILP32 difference.

> […]
> +/* SET_RATIO is similar to CLEAR_RATIO, but for a non-zero constant.  Without
> +   -mstrict-align, make decisions in "setmem".  Otherwise follow a sensible
> +   default:  when optimizing for size adjust the ratio to account for the

nit: should just be one space after “:”

> […]
> @@ -21289,6 +21292,134 @@ aarch64_expand_cpymem (rtx *operands)
>return true;
>  }
>  
> +/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
> +   *src is a register we have created with the duplicated value to be set.  
> */

“*src” -> SRC
since there's no dereference now

> […]
> +  /* In case we are optimizing for size or if the core does not
> + want to use STP Q regs, lower the max_set_size.  */
> +  max_set_size = (!speed_p
> +   || (aarch64_tune_params.extra_tuning_flags
> +   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
> +   ? max_set_size/2 : max_set_size;

Formatting nit: should be a space either side of “/”.

> +  while (n > 0)
> +{
> +  /* Find the largest mode in which to do the copy in without
> +  over writing.  */

s/in without/without/

> +  opt_scalar_int_mode mode_iter;
> +  FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> + if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> +   cur_mode = mode_iter.require ();
> +
> +  gcc_assert (cur_mode != BLKmode);
> +
> +  mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> +  aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
> +
> +  n -= mode_bits;
> +
> +  /* Do certain trailing copies as overlapping if it's going to be
> +  cheaper.  i.e. less instructions to do so.  For instance doing a 15
> +  byte copy it's more efficient to do two overlapping 8 byte copies than
> +  8 + 4 + 2 + 1.  */
> +  if (n > 0 && n < copy_limit / 2)
> + {
> +   next_mode = smallest_mode_for_size (n, MODE_INT);
> +   int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();

Sorry for the runaround, but looking at this again, I'm a bit worried
that we only indirectly test that n_bits is within the length of the
original set.  I guess it is because if n < copy_limit / 2 then
n < mode_bits, and so n_bits will never exceed mode_bits.  I think
it might be worth adding an assert to make that “clearer” (maybe
only to me, probably obvious to everyone else):

  gcc_assert (n_bits <= mode_bits);

OK with those changes, thanks.

Richard

> +   dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
> +   n = n_bits;
> + }
> +}
> +
> +  return true;
> +}
> +
> +
>  /* Split a DImode store of a CONST_INT SRC to MEM DST as two
> SImode stores.  Handle the case when the constant has identical
> bottom and top halves.  This is beneficial when the two stores can be


Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function

2020-11-11 Thread Joseph Myers
On Wed, 11 Nov 2020, Jakub Jelinek via Gcc-patches wrote:

> So indeed, 0x80 << 24 is UB in C99/C11 and C++98, unclear in C89 and
> well defined in C++11 and later.  I don't know if C2X is considering
> mandating two's complement and making it well defined like C++20 did.

C2x requires two's complement but that's only about representation; there 
are no changes so far to what shifts are undefined.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] AArch64: Improve inline memcpy expansion

2020-11-11 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Improve the inline memcpy expansion.  Use integer load/store for copies <= 24 
> bytes
> instead of SIMD.  Set the maximum copy to expand to 256 by default, except 
> that -Os or
> no Neon expands up to 128 bytes.  When using LDP/STP of Q-registers, also use 
> Q-register
> accesses for the unaligned tail, saving 2 instructions (eg. all sizes up to 
> 48 bytes emit
> exactly 4 instructions).  Cleanup code and comments.
>
> The codesize gain vs the GCC10 expansion is 0.05% on SPECINT2017.

Nice.  A couple of small comments inline…

> Passes bootstrap and regress. OK for commit?
>
> ChangeLog:
> 2020-11-03  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (aarch64_expand_cpymem): Cleanup code and
> comments, tweak expansion decisions and improve tail expansion.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 41e2a699108146e0fa7464743607bd34e91ea9eb..9487c1cb07b0d851c0f085262179470d0d596116
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -21255,35 +21255,36 @@ aarch64_copy_one_block_and_progress_pointers (rtx 
> *src, rtx *dst,
>  bool
>  aarch64_expand_cpymem (rtx *operands)
>  {
> -  /* These need to be signed as we need to perform arithmetic on n as
> - signed operations.  */
> -  int n, mode_bits;
> +  int mode_bits;
>rtx dst = operands[0];
>rtx src = operands[1];
>rtx base;
> -  machine_mode cur_mode = BLKmode, next_mode;
> -  bool speed_p = !optimize_function_for_size_p (cfun);
> +  machine_mode cur_mode = BLKmode;
>
> -  /* When optimizing for size, give a better estimate of the length of a
> - memcpy call, but use the default otherwise.  Moves larger than 8 bytes
> - will always require an even number of instructions to do now.  And each
> - operation requires both a load+store, so divide the max number by 2.  */
> -  unsigned int max_num_moves = (speed_p ? 16 : AARCH64_CALL_RATIO) / 2;
> -
> -  /* We can't do anything smart if the amount to copy is not constant.  */
> +  /* Only expand fixed-size copies.  */
>if (!CONST_INT_P (operands[2]))
>  return false;
>
> -  unsigned HOST_WIDE_INT tmp = INTVAL (operands[2]);
> +  unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
>
> -  /* Try to keep the number of instructions low.  For all cases we will do at
> - most two moves for the residual amount, since we'll always overlap the
> - remainder.  */
> -  if (((tmp / 16) + (tmp % 16 ? 2 : 0)) > max_num_moves)
> +  /* Inline up to 256 bytes when optimizing for speed.  */
> +  unsigned HOST_WIDE_INT max_copy_size = 256;
> +
> +  if (optimize_function_for_size_p (cfun) || !TARGET_SIMD)
> +max_copy_size = 128;
> +
> +  if (size > max_copy_size)
>  return false;
>
> -  /* At this point tmp is known to have to fit inside an int.  */
> -  n = tmp;
> +  int copy_bits = 256;
> +
> +  /* Default to 256-bit LDP/STP on large copies, however small copies, no 
> SIMD
> + support or slow 256-bit LDP/STP fall back to 128-bit chunks.  */
> +  if (size <= 24 || !TARGET_SIMD

Nit: one condition per line when the condition spans multiple lines.

> +  || (size <= (max_copy_size / 2)
> +  && (aarch64_tune_params.extra_tuning_flags
> +  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)))
> +copy_bits = GET_MODE_BITSIZE (TImode);

(Looks like the mailer has eaten some tabs here.)

As discussed in Sudi's setmem patch, I think we should make the
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS conditional on optimising
for speed.  For size, using LDP Q and STP Q is a win regardless
of what the CPU wants.

>base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
>dst = adjust_automodify_address (dst, VOIDmode, base, 0);
> @@ -21291,15 +21292,8 @@ aarch64_expand_cpymem (rtx *operands)
>base = copy_to_mode_reg (Pmode, XEXP (src, 0));
>src = adjust_automodify_address (src, VOIDmode, base, 0);
>
> -  /* Convert n to bits to make the rest of the code simpler.  */
> -  n = n * BITS_PER_UNIT;
> -
> -  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
> - AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and TARGET_SIMD.  
> */
> -  const int copy_limit = ((aarch64_tune_params.extra_tuning_flags
> -   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> -  || !TARGET_SIMD)
> - ? GET_MODE_BITSIZE (TImode) : 256;
> +  /* Convert size to bits to make the rest of the code simpler.  */
> +  int n = size * BITS_PER_UNIT;
>
>while (n > 0)
>  {
> @@ -21307,23 +21301,26 @@ aarch64_expand_cpymem (rtx *operands)
>   or writing.  */
>opt_scalar_int_mode mode_iter;
>FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> -if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> +if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_bits))
>cur_mode = mode_iter.require ();
>
>gcc_assert (cur_mode != BLKmode);
>
>mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
> +
> +  /* Prefer Q-register acc

Re: [PATCH] libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 05:24:42PM +, Jonathan Wakely wrote:
> --- a/libgcc/gthr-posix.h
> +++ b/libgcc/gthr-posix.h
> @@ -684,7 +684,14 @@ __gthread_equal (__gthread_t __t1, __gthread_t __t2)
>  static inline __gthread_t
>  __gthread_self (void)
>  {
> +#if __GLIBC_PREREQ(2, 27)

What if it is a non-glibc system where __GLIBC_PREREQ macro isn't defined?
I think you'd get then
error: missing binary operator before token "("
So I think you want
#if defined __GLIBC__ && defined __GLIBC_PREREQ
#if __GLIBC_PREREQ(2, 27)
  return pthread_self ();
#else
  return __gthrw_(pthread_self) ();
#endif
#else
  return __gthrw_(pthread_self) ();
#endif
or similar.

Jakub



Re: [PATCH] aarch64: Fix SVE2 BCAX pattern [PR97730]

2020-11-11 Thread Richard Sandiford via Gcc-patches
Alex Coplan  writes:
> Hello,
>
> This patch adds a missing not to the SVE2 BCAX (Bitwise clear and
> exclusive or) pattern,

Oops.  Even worse is that I'd made the test match the bug in the code. :-(

> fixing the PR. Since SVE doesn't have an unpredicated not instruction,
> we need to use a (vacuously) predicated not here.
>
> To ensure that the predicate is instantiated correctly (to all 1s) for
> the intrinsics, we pull out a separate expander from the define_insn.
>
> From the ISA reference [1]:
>> Bitwise AND elements of the second source vector with the
>> corresponding inverted elements of the third source vector, then
>> exclusive OR the results with corresponding elements of the first
>> source vector.
>
> Testing:
>  * Regression tested an aarch64-linux-gnu cross configured with
>--with-arch=armv8.2-a+sve2, no new failures.
>  * Bootstrap and regression test on aarch64-linux-gnu in progress.
>
> The following execution tests went from FAIL to PASS on the SVE2
> regression run as a result of this change:
>
> FAIL->PASS: gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL->PASS: gcc.c-torture/execute/pr37573.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/torture/pr69714.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL->PASS: gcc.dg/torture/pr69714.c   -O3 -g  execution test
> FAIL->PASS: gcc.dg/vect/pr70021.c execution test
> FAIL->PASS: gcc.dg/vect/pr70021.c -flto -ffat-lto-objects execution test
>
> OK for trunk (provided patch passes bootstrap/regtest)?

OK, thanks.  Please also backport to GCC 10 after a few days on trunk
(no separate approval needed).

Richard


Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jeff Law via Gcc-patches


On 11/11/20 3:55 AM, Jakub Jelinek via Gcc-patches wrote:
> On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
>> The patch addresses this by disallowing that rule, if an exact power-of-2 is
>> seen as C1.  The reason why I would prefer to have this canonicalised the
>> same way the (X & C1) * C2 is canonicalised, is that cleaning this up during
>> combine is more difficult on some architectures that require multiple insns
>> to represent the shifted constant (i.e. C1 << C2).
> It is bad to have many exceptions for the canonicalization
> and it is unclear why exactly these were chosen, and it doesn't really deal
> with say:
> (x & 0xabcdef12ULL) << 13
> being less expensive on some targets than
> (x << 13) & (0xabcdef12ULL << 13).
> (x & 0x7) << 3 vs. (x << 3) & 0x38 on the other side is a wash on
> many targets.
> As I said, it is better to decide which one is better before or during
> expansion based on target costs, sure, combine can't catch everything.

I think Jakub is hitting a key point here.  Gimple should canonicalize
on what is simpler from a gimple standpoint, not what is better for some
set of targets.   Target dependencies like this shouldn't be introduced
until expansion time.


Jeff




Re: [PATCH 1/3] Refactor copying decl section names

2020-11-11 Thread Jeff Law via Gcc-patches


On 11/10/20 10:11 PM, Alan Modra wrote:
> On Tue, Nov 10, 2020 at 09:19:37PM -0700, Jeff Law wrote:
>> I think the const char * is fine.  That should force resolution to the
>> same routine we were using earlier.  Do you want to commit that fix or
>> shall I?
> Commited 693a79a355e1.

Thanks.  I ended up not coming back to the computer last night, so it's
definitely good you took it :-)


jeff



[PATCH 1/2] c++: Correct the handling of alignof(expr) [PR88115]

2020-11-11 Thread Patrick Palka via Gcc-patches
We're currently neglecting to set the ALIGNOF_EXPR_STD_P flag on an
ALIGNOF_EXPR when its operand is an expression.  This leads to us
handling alignof(expr) as if it were written __alignof__(expr), and
returning the preferred alignment instead of the ABI alignment.  In the
testcase below, this causes the first and third static_assert to fail on
x86.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Would it be appropriate for backporting to the release branches?

gcc/cp/ChangeLog:

PR c++/88115
* cp-tree.h (cxx_sizeof_or_alignof_expr): Add bool parameter.
* decl.c (fold_sizeof_expr): Pass false to
cxx_sizeof_or_alignof_expr.
* parser.c (cp_parser_unary_expression): Pass std_alignof to
cxx_sizeof_or_alignof_expr.
* pt.c (tsubst_copy): Pass false to cxx_sizeof_or_alignof_expr.
(tsubst_copy_and_build): Pass std_alignof to
cxx_sizeof_or_alignof_expr.
* typeck.c (cxx_alignof_expr): Add std_alignof bool parameter
and pass it to cxx_sizeof_or_alignof_type.  Set ALIGNOF_EXPR_STD_P
appropriately.
(cxx_sizeof_or_alignof_expr): Add std_alignof bool parameter
and pass it to cxx_alignof_expr.  Assert op is either
SIZEOF_EXPR or ALIGNOF_EXPR.

libcc1/ChangeLog:

PR c++/88115
* libcp1plugin.cc (plugin_build_unary_expr): Pass true to
cxx_sizeof_or_alignof_expr.

gcc/testsuite/ChangeLog:

PR c++/88115
* g++.dg/cpp0x/alignof6.C: New test.
---
 gcc/cp/cp-tree.h  |  2 +-
 gcc/cp/decl.c |  2 +-
 gcc/cp/parser.c   |  4 ++--
 gcc/cp/pt.c   |  3 ++-
 gcc/cp/typeck.c   | 17 +++--
 gcc/testsuite/g++.dg/cpp0x/alignof6.C | 19 +++
 libcc1/libcp1plugin.cc|  2 +-
 7 files changed, 37 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignof6.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 230a1525c63..63724c0e84f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7461,7 +7461,7 @@ extern int comp_cv_qualification  (const_tree, 
const_tree);
 extern int comp_cv_qualification   (int, int);
 extern int comp_cv_qual_signature  (tree, tree);
 extern tree cxx_sizeof_or_alignof_expr (location_t, tree,
-enum tree_code, bool);
+enum tree_code, bool, bool);
 extern tree cxx_sizeof_or_alignof_type (location_t, tree,
 enum tree_code, bool, bool);
 extern tree cxx_alignas_expr(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 42e704e7af2..c52111e329c 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10335,7 +10335,7 @@ fold_sizeof_expr (tree t)
   else
 r = cxx_sizeof_or_alignof_expr (EXPR_LOCATION (t),
TREE_OPERAND (t, 0), SIZEOF_EXPR,
-   false);
+   false, false);
   if (r == error_mark_node)
 r = size_one_node;
   return r;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 36322812310..4f59fc48d0f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -8335,8 +8335,8 @@ cp_parser_unary_expression (cp_parser *parser, cp_id_kind 
* pidk,
   "ISO C++ does not allow % "
   "with a non-type");
 
-   ret = cxx_sizeof_or_alignof_expr (compound_loc,
- operand, op, true);
+   ret = cxx_sizeof_or_alignof_expr (compound_loc, operand, op,
+ std_alignof, true);
  }
/* For SIZEOF_EXPR, just issue diagnostics, but keep
   SIZEOF_EXPR with the original operand.  */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6ba114c9da3..1a01fb96c69 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16789,6 +16789,7 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  else
return cxx_sizeof_or_alignof_expr (input_location,
   expanded, SIZEOF_EXPR,
+  false,
complain & tf_error);
}
  else
@@ -19731,7 +19732,7 @@ tsubst_copy_and_build (tree t,
  complain & tf_error);
else
  r = cxx_sizeof_or_alignof_expr (input_location,
- op1, TREE_CODE (t),
+ op1, TREE_CODE (t), std_alignof,
  complain & tf_error);
if (TREE_CODE (t) == SIZEOF_EXPR && r != error_mark_node)
  {
diff --git a/

Re: [PATCH,wwwdocs] gcc-11/changes: Mention Intel AVX-VNNI

2020-11-11 Thread Jeff Law via Gcc-patches


On 11/11/20 4:19 AM, Hongtao Liu via Gcc-patches wrote:
> [GCC-11] Mention Intel AVX-VNNI and add it to ALDERLAKE and SAPPHIRERAPIDS,
> also add HRESET to ALDERLAKE.

OK.  Please install if you haven't done so already.

jeff




[PATCH 2/2] c++: Change the mangling of __alignof__ [PR88115]

2020-11-11 Thread Patrick Palka via Gcc-patches
This patch changes the mangling of __alignof__ to v111__alignof__,
making the mangling distinct from that of alignof(type) and
alignof(expr).

How we mangle ALIGNOF_EXPR now depends on its ALIGNOF_EXPR_STD_P flag,
which after the previous patch gets consistently set for alignof(type)
as well as alignof(expr).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/c-family/ChangeLog:

PR c++/88115
* c-opts.c (c_common_post_options): Update latest_abi_version.

gcc/ChangeLog:

PR c++/88115
* common.opt (-fabi-version): Document =15.
* doc/invoke.texi (C++ Dialect Options): Likewise.

gcc/cp/ChangeLog:

PR c++/88115
* mangle.c (write_expression): Mangle __alignof__ differently
from alignof when the ABI version is at least 15.

libiberty/ChangeLog:

PR c++/88115
* cp-demangle.c (d_print_comp_inner)
: Don't print the
"operator " prefix for __alignof__.
: Always print parens around the
operand of __alignof__.
* testsuite/demangle-expected: Test demangling of __alignof__.

gcc/testsuite/ChangeLog:

PR c++/88115
* g++.dg/abi/macro0.C: Adjust.
* g++.dg/cpp0x/alignof7.C: New test.
* g++.dg/cpp0x/alignof8.C: New test.
---
 gcc/c-family/c-opts.c |  2 +-
 gcc/common.opt|  4 
 gcc/cp/mangle.c   | 27 +++
 gcc/doc/invoke.texi   |  3 +++
 gcc/testsuite/g++.dg/abi/macro0.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/alignof7.C | 22 ++
 gcc/testsuite/g++.dg/cpp0x/alignof8.C | 13 +
 libiberty/cp-demangle.c   | 25 -
 libiberty/testsuite/demangle-expected |  7 +++
 9 files changed, 94 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignof7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignof8.C

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 0698e58a335..40e92229d8a 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -945,7 +945,7 @@ c_common_post_options (const char **pfilename)
 
   /* Change flag_abi_version to be the actual current ABI level, for the
  benefit of c_cpp_builtins, and to make comparison simpler.  */
-  const int latest_abi_version = 14;
+  const int latest_abi_version = 15;
   /* Generate compatibility aliases for ABI v11 (7.1) by default.  */
   const int abi_compat_default = 11;
 
diff --git a/gcc/common.opt b/gcc/common.opt
index 7d0e0d9c88a..9552cebe0d6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -960,9 +960,13 @@ Driver Undocumented
 ; 13: Fixes the accidental change in 12 to the calling convention for classes
 ; with deleted copy constructor and trivial move constructor.
 ; Default in G++ 8.2.
+;
 ; 14: Corrects the mangling of nullptr expression.
 ; Default in G++ 10.
 ;
+; 15: Changes the mangling of __alignof__ to be distinct from that of alignof.
+; Default in G++ 11.
+;
 ; Additional positive integers will be assigned as new versions of
 ; the ABI become the default version of the ABI.
 fabi-version=
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 9fd30011288..5548e51d39d 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3049,11 +3049,30 @@ write_expression (tree expr)
   else
goto normal_expr;
 }
-  else if (TREE_CODE (expr) == ALIGNOF_EXPR
-  && TYPE_P (TREE_OPERAND (expr, 0)))
+  else if (TREE_CODE (expr) == ALIGNOF_EXPR)
 {
-  write_string ("at");
-  write_type (TREE_OPERAND (expr, 0));
+  if (!ALIGNOF_EXPR_STD_P (expr))
+   {
+ if (abi_warn_or_compat_version_crosses (15))
+   G.need_abi_warning = true;
+ if (abi_version_at_least (15))
+   {
+ /* We used to mangle __alignof__ like alignof.  */
+ write_string ("v111__alignof__");
+ if (TYPE_P (TREE_OPERAND (expr, 0)))
+   write_type (TREE_OPERAND (expr, 0));
+ else
+   write_expression (TREE_OPERAND (expr, 0));
+ return;
+   }
+   }
+  if (TYPE_P (TREE_OPERAND (expr, 0)))
+   {
+ write_string ("at");
+ write_type (TREE_OPERAND (expr, 0));
+   }
+  else
+   goto normal_expr;
 }
   else if (code == SCOPE_REF
   || code == BASELINK)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8d0d2136831..553cc07e330 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2807,6 +2807,9 @@ change in version 12.
 Version 14, which first appeared in G++ 10, corrects the mangling of
 the nullptr expression.
 
+Version 15, which first appeared in G++ 11, changes the mangling of
+@code{__alignof__} to be distinct from that of @code{alignof}.
+
 See also @option{-Wabi}.
 
 @item -fabi-compat-version=@var{n}
diff --git a/gcc/testsuite/g++.dg/abi/macro0.C 
b/gcc/testsuite/g++.dg/abi/m

[PATCH] [WIP] openmp: Add OpenMP 5.0 task detach clause support

2020-11-11 Thread Kwok Cheung Yeung

Hello

This is a WIP implementation of the OpenMP 5.0 task detach clause. The task 
construct can now take a detach clause, passing in a variable of type 
omp_event_handle_t. When the construct is encountered, space for an event is 
allocated and the event variable is set to point to the new event. When the task 
is run, it is not complete until a new function omp_fulfill_event has been 
called on the event variable, either in the task itself or in another thread of 
execution.


lower_detach_clause generates code to call GOMP_new_event, which allocates, 
initializes and returns a pointer to a gomp_allow_completion_event struct. The 
return value is then type-cast to a omp_event_handle_t and assigned to the event 
variable, before the data environment for the task construct is set up.


The event variable is passed into the call to GOMP_task, where it is assigned to
a field in the gomp_task struct. If the task is not deferred, then it will wait
for the detach event to be fulfilled inside GOMP_task; otherwise it needs to be
handled in omp_barrier_handle_tasks.


When a task finishes in omp_barrier_handle_tasks and the detach event has not 
been fulfilled, it is placed onto a separate queue of unfulfilled tasks before 
the current thread continues with another task. When the current thread has no 
more tasks, then it will remove a task from the queue of unfulfilled tasks and 
wait for it to complete. When it does, it is removed and any dependent tasks are 
requeued for execution.


We cannot simply block after a task with an unfulfilled event has finished 
because in the case where there are more tasks than threads, there is the 
possibility that all the threads will be tied up waiting, while a task that 
results in an event getting fulfilled never gets run, causing execution to stall.


The memory allocated for the event is released when the associated task is 
destroyed.


Issues that I can see with the current implementation at the moment are:

- No error checking at the front-end.
- The memory for the event is not mapped on the target. This means that if 
omp_fulfill_event is called from an 'omp target' section with a target that does 
not share memory with the host, the event will not be fulfilled (and a segfault 
will probably occur).
- The tasks awaiting event fulfillment currently wait until there are no other 
runnable tasks left. A better approach would be to poll (without blocking) the 
waiting tasks whenever any task completes, immediately removing any now-complete 
tasks and requeuing any dependent tasks.


This patchset has only been very lightly tested on an x86-64 host. Any 
comments/thoughts/suggestions on this implementation?


Thanks

Kwok
commit 4c3926d9abb1a7e6089a9098e2099e2d574ebfec
Author: Kwok Cheung Yeung 
Date:   Tue Nov 3 03:06:26 2020 -0800

openmp: Add support for the OpenMP 5.0 task detach clause

2020-11-11  Kwok Cheung Yeung  

gcc/
* builtin-types.def (BT_PTR_SIZED_INT): New primitive type.
(BT_FN_PSINT_VOID): New function type.
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
to...
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT):
...this.  Add extra argument.
* gimplify.c (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH.
(gimplify_adjust_omp_clauses): Likewise.
* omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to
BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PSINT.
(BUILT_IN_GOMP_NEW_EVENT): New.
* omp-expand.c (expand_task_call): Add detach argument when generating
call to GOMP_task.
* omp-low.c (scan_sharing_clauses): Setup data environment for detach
clause.
(lower_detach_clause): New.
(lower_omp_taskreg): Call lower_detach_clause for detach clause.  Add
Gimple statements generated for detach clause.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH.
* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH.
* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH.
(omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH.
(walk_tree_1): Handle OMP_CLAUSE_DETACH.
* tree.h (OMP_CLAUSE_DETACH_EXPR): New.

gcc/c-family/
* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH.
Redefine PRAGMA_OACC_CLAUSE_DETACH.

gcc/c/
* c-parser.c (c_parser_omp_clause_detach): New.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
clause.

gcc/cp/
* parser.c (cp_parser_omp_all_clauses): Handle
PRAGMA_OMP_CLAUSE_DETACH.
(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
* semantics.c (finish_omp_claus

Re: [PATCH] c++: Fix up constexpr CLEANUP_POINT_EXPR and TRY_FINALLY_EXPR handling [PR97790]

2020-11-11 Thread Jason Merrill via Gcc-patches

On 11/11/20 10:26 AM, Jakub Jelinek wrote:

Hi!

As the testcase shows, CLEANUP_POINT_EXPR (and I think TRY_FINALLY_EXPR too)
suffer from the same problem that I was trying to fix in
r10-3597-g1006c9d4395a939820df76f37c7b085a4a1a003f
for CLEANUP_STMT, namely that if in the middle of the body expression of
those stmts is e.g. return stmt, goto, break or continue (something that
changes *jump_target and makes it start skipping stmts), we then skip the
cleanups too, which is not appropriate - the cleanups were either queued up
during the non-skipping execution of the body (for CLEANUP_POINT_EXPR), or
for TRY_FINALLY_EXPR are relevant already after entering the body block.


Would it make sense to always use a NULL jump_target when evaluating 
cleanups?



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-11-11  Jakub Jelinek  

PR c++/97790
* constexpr.c (cxx_eval_constant_expression) : For evaluation of cleanups use initially
recorded jump_target pointee rather than whatever ends up in it
after evaluation of the body operand.

* g++.dg/cpp2a/constexpr-dtor9.C: New test.

--- gcc/cp/constexpr.c.jj   2020-11-04 09:35:10.025029335 +0100
+++ gcc/cp/constexpr.c  2020-11-11 13:52:37.538466295 +0100
@@ -6008,6 +6008,7 @@ cxx_eval_constant_expression (const cons
auto_vec cleanups;
vec *prev_cleanups = ctx->global->cleanups;
ctx->global->cleanups = &cleanups;
+   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
  lval,
  non_constant_p, overflow_p,
@@ -6019,19 +6020,24 @@ cxx_eval_constant_expression (const cons
FOR_EACH_VEC_ELT_REVERSE (cleanups, i, cleanup)
  cxx_eval_constant_expression (ctx, cleanup, false,
non_constant_p, overflow_p,
-   jump_target);
+   jump_target ? &initial_jump_target
+   : NULL);
}
break;
  
  case TRY_FINALLY_EXPR:

-  r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0), lval,
+  {
+   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
+   r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0), lval,
+ non_constant_p, overflow_p,
+ jump_target);
+   if (!*non_constant_p)
+ /* Also evaluate the cleanup.  */
+ cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), true,
non_constant_p, overflow_p,
-   jump_target);
-  if (!*non_constant_p)
-   /* Also evaluate the cleanup.  */
-   cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), true,
- non_constant_p, overflow_p,
- jump_target);
+   jump_target ? &initial_jump_target
+   : NULL);
+  }
break;
  
  case CLEANUP_STMT:

--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C.jj 2020-11-11 
13:57:16.572334917 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C2020-11-11 
13:57:45.510010165 +0100
@@ -0,0 +1,31 @@
+// PR c++/97790
+// { dg-do compile { target c++20 } }
+
+struct S
+{
+  int *d;
+  int n;
+  constexpr S () : d(new int[1]{}), n(1) {}
+  constexpr ~S () { delete [] d; }
+};
+
+constexpr S
+foo ()
+{
+  return S ();
+}
+
+constexpr int
+bar ()
+{
+  return foo ().n;
+}
+
+constexpr int
+baz ()
+{
+  return S ().n;
+}
+
+constexpr int a = baz ();
+constexpr int b = bar ();

Jakub





Re: [PATCH 2/2] c++: Change the mangling of __alignof__ [PR88115]

2020-11-11 Thread Jason Merrill via Gcc-patches

On 11/11/20 1:21 PM, Patrick Palka wrote:

This patch changes the mangling of __alignof__ to v111__alignof__,
making the mangling distinct from that of alignof(type) and
alignof(expr).

How we mangle ALIGNOF_EXPR now depends on its ALIGNOF_EXPR_STD_P flag,
which after the previous patch gets consistently set for alignof(type)
as well as alignof(expr).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


Both patches are OK for trunk, not for release branches.


gcc/c-family/ChangeLog:

PR c++/88115
* c-opts.c (c_common_post_options): Update latest_abi_version.

gcc/ChangeLog:

PR c++/88115
* common.opt (-fabi-version): Document =15.
* doc/invoke.texi (C++ Dialect Options): Likewise.

gcc/cp/ChangeLog:

PR c++/88115
* mangle.c (write_expression): Mangle __alignof__ differently
from alignof when the ABI version is at least 15.

libiberty/ChangeLog:

PR c++/88115
* cp-demangle.c (d_print_comp_inner)
: Don't print the
"operator " prefix for __alignof__.
: Always print parens around the
operand of __alignof__.
* testsuite/demangle-expected: Test demangling of __alignof__.

gcc/testsuite/ChangeLog:

PR c++/88115
* g++.dg/abi/macro0.C: Adjust.
* g++.dg/cpp0x/alignof7.C: New test.
* g++.dg/cpp0x/alignof8.C: New test.
---
  gcc/c-family/c-opts.c |  2 +-
  gcc/common.opt|  4 
  gcc/cp/mangle.c   | 27 +++
  gcc/doc/invoke.texi   |  3 +++
  gcc/testsuite/g++.dg/abi/macro0.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/alignof7.C | 22 ++
  gcc/testsuite/g++.dg/cpp0x/alignof8.C | 13 +
  libiberty/cp-demangle.c   | 25 -
  libiberty/testsuite/demangle-expected |  7 +++
  9 files changed, 94 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignof7.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignof8.C

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 0698e58a335..40e92229d8a 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -945,7 +945,7 @@ c_common_post_options (const char **pfilename)
  
/* Change flag_abi_version to be the actual current ABI level, for the

   benefit of c_cpp_builtins, and to make comparison simpler.  */
-  const int latest_abi_version = 14;
+  const int latest_abi_version = 15;
/* Generate compatibility aliases for ABI v11 (7.1) by default.  */
const int abi_compat_default = 11;
  
diff --git a/gcc/common.opt b/gcc/common.opt

index 7d0e0d9c88a..9552cebe0d6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -960,9 +960,13 @@ Driver Undocumented
  ; 13: Fixes the accidental change in 12 to the calling convention for classes
  ; with deleted copy constructor and trivial move constructor.
  ; Default in G++ 8.2.
+;
  ; 14: Corrects the mangling of nullptr expression.
  ; Default in G++ 10.
  ;
+; 15: Changes the mangling of __alignof__ to be distinct from that of alignof.
+; Default in G++ 11.
+;
  ; Additional positive integers will be assigned as new versions of
  ; the ABI become the default version of the ABI.
  fabi-version=
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 9fd30011288..5548e51d39d 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3049,11 +3049,30 @@ write_expression (tree expr)
else
goto normal_expr;
  }
-  else if (TREE_CODE (expr) == ALIGNOF_EXPR
-  && TYPE_P (TREE_OPERAND (expr, 0)))
+  else if (TREE_CODE (expr) == ALIGNOF_EXPR)
  {
-  write_string ("at");
-  write_type (TREE_OPERAND (expr, 0));
+  if (!ALIGNOF_EXPR_STD_P (expr))
+   {
+ if (abi_warn_or_compat_version_crosses (15))
+   G.need_abi_warning = true;
+ if (abi_version_at_least (15))
+   {
+ /* We used to mangle __alignof__ like alignof.  */
+ write_string ("v111__alignof__");
+ if (TYPE_P (TREE_OPERAND (expr, 0)))
+   write_type (TREE_OPERAND (expr, 0));
+ else
+   write_expression (TREE_OPERAND (expr, 0));
+ return;
+   }
+   }
+  if (TYPE_P (TREE_OPERAND (expr, 0)))
+   {
+ write_string ("at");
+ write_type (TREE_OPERAND (expr, 0));
+   }
+  else
+   goto normal_expr;
  }
else if (code == SCOPE_REF
   || code == BASELINK)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8d0d2136831..553cc07e330 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2807,6 +2807,9 @@ change in version 12.
  Version 14, which first appeared in G++ 10, corrects the mangling of
  the nullptr expression.
  
+Version 15, which first appeared in G++ 11, changes the mangling of
+@code{__alignof__} to be distinct from that of @code{alignof}.

Re: [PATCH] libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

2020-11-11 Thread Jonathan Wakely via Gcc-patches

On 11/11/20 19:08 +0100, Jakub Jelinek via Libstdc++ wrote:

On Wed, Nov 11, 2020 at 05:24:42PM +, Jonathan Wakely wrote:

--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -684,7 +684,14 @@ __gthread_equal (__gthread_t __t1, __gthread_t __t2)
 static inline __gthread_t
 __gthread_self (void)
 {
+#if __GLIBC_PREREQ(2, 27)

What if it is a non-glibc system where the __GLIBC_PREREQ macro isn't defined?


Ah yes, I forgot non-glibc systems exist :-)

Thanks, I'll fix it tomorrow, and test on some more targets.


I think you'd get then
error: missing binary operator before token "("
So I think you want
#if defined __GLIBC__ && defined __GLIBC_PREREQ
#if __GLIBC_PREREQ(2, 27)
 return pthread_self ();
#else
 return __gthrw_(pthread_self) ();
#endif
#else
 return __gthrw_(pthread_self) ();
#endif
or similar.

Jakub





Re: [PATCH] [PING^2] Asan changes for RISC-V.

2020-11-11 Thread Jim Wilson
Original message here
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557406.html

This has non-RISC-V changes, so I need a global reviewer to look at it.

Jim

On Wed, Nov 4, 2020 at 12:10 PM Jim Wilson  wrote:

>
>
> On Wed, Oct 28, 2020 at 4:59 PM Jim Wilson  wrote:
>
>> We have only riscv64 asan support, there is no riscv32 support as yet.
>> So I
>> need to be able to conditionally enable asan support for the riscv
>> target.  I
>> implemented this by returning zero from the asan_shadow_offset function.
>> This
>> requires a change to toplev.c and docs in target.def.
>>
>> The asan support works on a 5.5 kernel, but does not work on a 4.15
>> kernel.
>> The problem is that the asan high memory region is a small wedge below
>> 0x40.  The new kernel puts shared libraries at 0x3f and
>> going
>> down which works.  But the old kernel puts shared libraries at
>> 0x20
>> and going up which does not work, as it isn't in any recognized memory
>> region.  This might be fixable with more asan work, but we don't really
>> need
>> support for old kernel versions.
>>
>> The asan port is curious in that it uses 1<<29 for the shadow offset, but
>> all
>> other 64-bit targets use a number larger than 1<<32.  But what we have is
>> working OK for now.
>>
>> I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image
>> running on
>> qemu and the results look reasonable.
>>
>> === gcc Summary ===
>>
>> # of expected passes            1905
>> # of unexpected failures        11
>> # of unsupported tests          224
>>
>> === g++ Summary ===
>>
>> # of expected passes            2002
>> # of unexpected failures        6
>> # of unresolved testcases       1
>> # of unsupported tests          175
>>
>> OK?
>>
>> Jim
>>
>> 2020-10-28  Jim Wilson  
>>
>> gcc/
>> * config/riscv/riscv.c (riscv_asan_shadow_offset): New.
>> (TARGET_ASAN_SHADOW_OFFSET): New.
>> * doc/tm.texi: Regenerated.
>> * target.def (asan_shadow_offset): Mention that it can return
>> zero.
>> * toplev.c (process_options): Check for and handle zero return
>> from targetm.asan_shadow_offset call.
>>
>> Co-Authored-By: cooper.joshua 
>> ---
>>  gcc/config/riscv/riscv.c | 16 
>>  gcc/doc/tm.texi  |  3 ++-
>>  gcc/target.def   |  3 ++-
>>  gcc/toplev.c |  3 ++-
>>  4 files changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
>> index 989a9f15250..6909e200de1 100644
>> --- a/gcc/config/riscv/riscv.c
>> +++ b/gcc/config/riscv/riscv.c
>> @@ -5299,6 +5299,19 @@ riscv_gpr_save_operation_p (rtx op)
>>return true;
>>  }
>>
>> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
>> +
>> +static unsigned HOST_WIDE_INT
>> +riscv_asan_shadow_offset (void)
>> +{
>> +  /* We only have libsanitizer support for RV64 at present.
>> +
>> + This number must match kRiscv*_ShadowOffset* in the file
>> + libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
>> + even though 1<<36 makes more sense.  */
>> +  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
>> +}
>> +
>>  /* Initialize the GCC target structure.  */
>>  #undef TARGET_ASM_ALIGNED_HI_OP
>>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
>> @@ -5482,6 +5495,9 @@ riscv_gpr_save_operation_p (rtx op)
>>  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>>  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>>
>> +#undef TARGET_ASAN_SHADOW_OFFSET
>> +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
>> +
>>  struct gcc_target targetm = TARGET_INITIALIZER;
>>
>>  #include "gt-riscv.h"
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 24c37f655c8..39c596b647a 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -12078,7 +12078,8 @@ is zero, which disables this optimization.
>>  @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_ASAN_SHADOW_OFFSET (void)
>>  Return the offset bitwise ored into shifted address to get corresponding
>>  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
>> -supported by the target.
>> +supported by the target.  May return 0 if Address Sanitizer is not supported
>> +by a subtarget.
>>  @end deftypefn
>>
>>  @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK
>> (unsigned HOST_WIDE_INT @var{val})
>> diff --git a/gcc/target.def b/gcc/target.def
>> index ed2da154e30..268b56b6ebd 100644
>> --- a/gcc/target.def
>> +++ b/gcc/target.def
>> @@ -4452,7 +4452,8 @@ DEFHOOK
>>  (asan_shadow_offset,
>>   "Return the offset bitwise ored into shifted address to get corresponding\n\
>>  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not\n\
>> -supported by the target.",
>> +supported by the target.  May return 0 if Address Sanitizer is not supported\n\
>> +by a subtarget.",
>>   unsigned HOST_WIDE_INT, (void),
>>   NULL)
>>
>> diff --git 

Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich
On Wed, 11 Nov 2020 at 19:17, Jeff Law  wrote:
>
>
> On 11/11/20 3:55 AM, Jakub Jelinek via Gcc-patches wrote:
> > On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> >> The patch addresses this by disallowing that rule, if an exact power-of-2 
> >> is
> >> seen as C1.  The reason why I would prefer to have this canonicalised the
> >> same way the (X & C1) * C2 is canonicalised, is that cleaning this up 
> >> during
> >> combine is more difficult on some architectures that require multiple insns
> >> to represent the shifted constant (i.e. C1 << C2).
> > It is bad to have many exceptions for the canonicalization
> > and it is unclear why exactly these were chosen, and it doesn't really deal
> > with say:
> > (x & 0xabcdef12ULL) << 13
> > being less expensive on some targets than
> > (x << 13) & (0xabcdef12ULL << 13).
> > (x & 0x7) << 3 vs. (x << 3) & 0x38 on the other side is a wash on
> > many targets.
> > As I said, it is better to decide which one is better before or during
> > expansion based on target costs, sure, combine can't catch everything.
>
> I think Jakub is hitting a key point here.  Gimple should canonicalize
> on what is simpler from a gimple standpoint, not what is better for some
> set of targets.   Target dependencies like this shouldn't be introduced
> until expansion time.

The simplification that distributes the shift (i.e. the one that Jakub referred
to as fighting the new rule) is also run after GIMPLE has been expanded to
RTX.  In my understanding, this still implies that even if we have a cost-aware
expansion, this existing rule will nonetheless distribute the shift.

Philipp.


Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 08:53:56PM +0100, Philipp Tomsich wrote:
> > On 11/11/20 3:55 AM, Jakub Jelinek via Gcc-patches wrote:
> > > On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> > >> The patch addresses this by disallowing that rule, if an exact 
> > >> power-of-2 is
> > >> seen as C1.  The reason why I would prefer to have this canonicalised the
> > >> same way the (X & C1) * C2 is canonicalised, is that cleaning this up 
> > >> during
> > >> combine is more difficult on some architectures that require multiple 
> > >> insns
> > >> to represent the shifted constant (i.e. C1 << C2).
> > > It is bad to have many exceptions for the canonicalization
> > > and it is unclear why exactly these were chosen, and it doesn't really 
> > > deal
> > > with say:
> > > (x & 0xabcdef12ULL) << 13
> > > being less expensive on some targets than
> > > (x << 13) & (0xabcdef12ULL << 13).
> > > (x & 0x7) << 3 vs. (x << 3) & 0x38 on the other side is a wash on
> > > many targets.
> > > As I said, it is better to decide which one is better before or during
> > > expansion based on target costs, sure, combine can't catch everything.
> >
> > I think Jakub is hitting a key point here.  Gimple should canonicalize
> > on what is simpler from a gimple standpoint, not what is better for some
> > set of targets.   Target dependencies like this shouldn't be introduced
> > until expansion time.
> 
> The simplification that distributes the shift (i.e. the one that Jakub 
> referred
> to as fighting the new rule) is also run after GIMPLE has been expanded to
> RTX.  In my understanding, this still implies that even if we have a 
> cost-aware
> expansion, this existing rule will nonetheless distribute the shift.

At the RTL level, such simplifications should not happen if it is
against costs (e.g. combine but various other passes too check costs and
punt if the new code would be more costly than the old one).

Jakub



[PATCH] c++, v2: Fix up constexpr CLEANUP_POINT_EXPR and TRY_FINALLY_EXPR handling [PR97790]

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 02:11:14PM -0500, Jason Merrill via Gcc-patches wrote:
> Would it make sense to always use a NULL jump_target when evaluating
> cleanups?

I was afraid of that, especially for TRY_FINALLY_EXPR, but it seems that
during constexpr evaluation the cleanups will most often be just very simple
destructor calls (or calls to cleanup attribute functions).
Furthermore, for none of these 3 tree codes will we reach that code if
jump_target && *jump_target initially (there is a return NULL_TREE much
earlier for those, except for trees that could embed labels etc. in them, and
clearly these 3 don't count as such).

This adjusted patch doesn't have any check-c++-all regressions, would that
be ok for trunk if it passes whole bootstrap/regtest?

2020-11-11  Jakub Jelinek  

PR c++/97790
* constexpr.c (cxx_eval_constant_expression) : Don't pass jump_target to
cxx_eval_constant_expression when evaluating the cleanups.

* g++.dg/cpp2a/constexpr-dtor9.C: New test.

--- gcc/cp/constexpr.c.jj   2020-11-04 09:35:10.025029335 +0100
+++ gcc/cp/constexpr.c  2020-11-11 13:52:37.538466295 +0100
@@ -6018,8 +6018,7 @@ cxx_eval_constant_expression (const cons
/* Evaluate the cleanups.  */
FOR_EACH_VEC_ELT_REVERSE (cleanups, i, cleanup)
  cxx_eval_constant_expression (ctx, cleanup, false,
-   non_constant_p, overflow_p,
-   jump_target);
+   non_constant_p, overflow_p);
   }
   break;
 
@@ -6030,29 +6029,20 @@ cxx_eval_constant_expression (const cons
   if (!*non_constant_p)
/* Also evaluate the cleanup.  */
cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), true,
- non_constant_p, overflow_p,
- jump_target);
+ non_constant_p, overflow_p);
   break;
 
 case CLEANUP_STMT:
-  {
-   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
-   r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
- non_constant_p, overflow_p,
- jump_target);
-   if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
- {
-   iloc_sentinel ils (loc);
-   /* Also evaluate the cleanup.  If we weren't skipping at the
-  start of the CLEANUP_BODY, change jump_target temporarily
-  to &initial_jump_target, so that even a return or break or
-  continue in the body doesn't skip the cleanup.  */
-   cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), true,
- non_constant_p, overflow_p,
- jump_target ? &initial_jump_target
- : NULL);
- }
-  }
+  r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
+   non_constant_p, overflow_p,
+   jump_target);
+  if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
+   {
+ iloc_sentinel ils (loc);
+ /* Also evaluate the cleanup.  */
+ cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), true,
+   non_constant_p, overflow_p);
+   }
   break;
 
   /* These differ from cxx_eval_unary_expression in that this doesn't
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C.jj	2020-11-11 13:57:16.572334917 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C	2020-11-11 13:57:45.510010165 +0100
@@ -0,0 +1,31 @@
+// PR c++/97790
+// { dg-do compile { target c++20 } }
+
+struct S
+{
+  int *d;
+  int n;
+  constexpr S () : d(new int[1]{}), n(1) {}
+  constexpr ~S () { delete [] d; }
+};
+
+constexpr S
+foo ()
+{
+  return S ();
+}
+
+constexpr int
+bar ()
+{
+  return foo ().n;
+}
+
+constexpr int
+baz ()
+{
+  return S ().n;
+}
+
+constexpr int a = baz ();
+constexpr int b = bar ();


Jakub



Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich
On Wed, 11 Nov 2020 at 20:59, Jakub Jelinek  wrote:
> >
> > The simplification that distributes the shift (i.e. the one that Jakub 
> > referred
> > to as fighting the new rule) is also run after GIMPLE has been expanded to
> > RTX.  In my understanding, this still implies that even if we have a 
> > cost-aware
> > expansion, this existing rule will nonetheless distribute the shift.
>
> At the RTL level, such simplifications should not happen if it is
> against costs (e.g. combine but various other passes too check costs and
> punt if the new code would be more costly than the old one).

I agree.
Let me go back and investigate if the cost-model is misreading things, before we
continue the discussion.

Philipp.


Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 09:04:28PM +0100, Philipp Tomsich wrote:
> On Wed, 11 Nov 2020 at 20:59, Jakub Jelinek  wrote:
> > >
> > > The simplification that distributes the shift (i.e. the one that Jakub 
> > > referred
> > > to as fighting the new rule) is also run after GIMPLE has been expanded to
> > > RTX.  In my understanding, this still implies that even if we have a 
> > > cost-aware
> > > expansion, this existing rule will nonetheless distribute the shift.
> >
> > At the RTL level, such simplifications should not happen if it is
> > against costs (e.g. combine but various other passes too check costs and
> > punt if the new code would be more costly than the old one).
> 
> I agree.
> Let me go back and investigate if the cost-model is misreading things, before 
> we
> continue the discussion.

If it is on some targets, surely it should be fixed.
E.g. x86_64 certainly has different costs for constants that fit into
instruction's immediates and for constants that need to be loaded into
registers first.
I know various other targets have much more complex sequences to compute
certain constants in registers though.
sparc costs surprise me:
case CONST_INT:
  if (SMALL_INT (x))
*total = 0;
  else
*total = 2;
because for the else I'd have expected better analysis of the constant to see
how many instructions are needed for it (I think maximum is 6).

Jakub



[PATCH] C++ : Add the -stdlib= option.

2020-11-11 Thread Iain Sandoe

resending - the first & second attempt didn’t seem to make it to gcc-patches.

Hi

This option allows the user to specify alternate C++ runtime libraries,
for example when a platform uses libc++ as the installed C++ runtime.

It is the same spelling as a clang option that allows that to use libstdc++.

I have had this patch for some time now (more than a year) on Darwin
branches.

For Darwin [>=11] (and I expect modern FreeBSD) the fact that the installed
C++ runtime is libc++ means conflicts can and do occur when using G++.

I expect that the facility will also be useful for folks who regularly try to
ensure that GCC and clang stay compatible, it is a credit to that effort that
the replacement is pretty much “drop in”.

Testing:

The patch applies without regression on *darwin* and x86_64-linux-gnu.

That doesn’t say much about whether it does what’s intended, of course,
and testing in-tree is not a viable option (it would need a lot of work, not
to mention the fact that it depends on an external source base).  So I’ve
tested this quite extensively on x86 Darwin and Linux.

It’s a lot easier to use an LLVM branch >= 9 for this since there is a
missing __cxa symbol before that (I originally used LLVM-7 for ‘reasons’).
Since coroutines was committed to GCC we have a <coroutine> header
where the libc++ implementation is still using the <experimental/coroutine>
version, so one needs to account for this.

Here’s an LLVM-9 tree with an added <coroutine> header (as an example)
https://github.com/iains/llvm-project/tree/9.0.1-gcc-stdlib
(in case someone wants to try this out in the near future; I don’t think that
LLVM-10 will be much different, at least the coroutine header is unchanged
there)

I’ve used this ‘in anger’ on Darwin to build a toolset which includes a number
of C++ heavy applications (e.g. LLVM, cmake, etc) and it allowed some of
these to work effectively where it had not been possible before.

One can also do an “installed test” of g++;
for that there are (a relatively modest number of) test fails.
AFAICT, there is nothing significant there - some tests fail because the output
isn’t expecting to see libc++ __1 inline namespace, some fail because libc++
(as per current branches) doesn’t allow use with GCC + std=c++98, some
are warning diagnostics etc.

[how compatible libc++ is, is somewhat independent of the patch itself; but
it seems “very compatible” is a starting assessment].

phew… description longer than patch, it seems.

OK for master?
thanks
Iain

—— commit message

This option allows the user to specify alternate C++ runtime libraries,
for example when a platform uses libc++ as the installed C++ runtime.

We introduce the command line option: -stdlib= which is the user-facing
mechanism to select the C++ runtime to be used when compiling and linking
code.  This is the same option spelling as that used by clang to allow the
use of libstdc++.

The availability (and thus function) of the option are a configure-time
choice using the configuration control:
--with-gxx-libcxx-include-dir=

Specification of the path for the libc++ headers, enables the -stdlib=
option (using the path as given), default values are set when the path
is unconfigured.

If --with-gxx-libcxx-include-dir is given together with --with-sysroot=,
then we test to see if the include path starts with the sysroot and, if so,
record the sysroot-relative component as the local path.  At runtime, we
prepend the sysroot that is actually active.
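As an illustration of the configure-time choice described above (the paths here are hypothetical, not taken from the patch), the setup and use might look like:

```shell
# Enable -stdlib= at configure time; this libc++ include path starts with
# the sysroot, so only the sysroot-relative part would be recorded:
../gcc/configure \
  --with-sysroot=/opt/cross/sysroot \
  --with-gxx-libcxx-include-dir=/opt/cross/sysroot/usr/include/c++/v1 \
  ... other options ...

# Then, per the cover letter, users select the runtime per invocation:
g++ -stdlib=libc++ -std=c++17 hello.cc -o hello
```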

At link time, we use the C++ runtime in force and (if that is libc++) also
append the libc++abi ABI library. As for other cases, if a target sets the
name pointer for the ABI library to NULL the G++ driver will omit it from
the link line.

gcc/ChangeLog:

* configure.ac: Add gxx-libcxx-include-dir handled
in the same way as the regular cxx header directory.
* Makefile.in: Regenerated.
* config.in: Likewise.
* configure: Likewise.
* cppdefault.c: Pick up libc++ headers if the option
is enabled.
* incpath.c (add_standard_paths): Allow for multiple
c++ header include path variants.
* doc/invoke.texi: Document the -stdlib= option.

gcc/c-family/ChangeLog:

* c.opt: Add -stdlib= option and enumerations for
libstdc++ and libc++.

gcc/cp/ChangeLog:

* g++spec.c (LIBCXX, LIBCXX_PROFILE, LIBCXX_STATIC): New.
(LIBCXXABI, LIBCXXABI_PROFILE, LIBCXXABI_STATIC): New.
(lang_specific_driver): Allow selection amongst multiple
c++ libraries to be added to the link command.
---
gcc/Makefile.in |  6 +
gcc/c-family/c.opt  | 14 +++
gcc/config.in   |  6 +
gcc/configure   | 57 +++--
gcc/configure.ac| 44 ++
gcc/cp/g++spec.c| 53 ++---
gcc/cppdefault.c|  5 
gcc/doc/invoke.texi | 11 +
gcc/incpath.c   |  6 +++--
9 files changed, 195 insertions(+), 7 deletions(-)

diff 

[r11-4903 Regression] FAIL: gfortran.dg/gomp/workshare-reduction-57.f90 -O scan-tree-dump-times optimized "__builtin_GOMP_loop(?:_ull)_dynamic_next " 1 on Linux/x86_64

2020-11-11 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

1644ab9917ca6b96e9e683c422f1793258b9a3db is the first bad commit
commit 1644ab9917ca6b96e9e683c422f1793258b9a3db
Author: Tobias Burnus 
Date:   Wed Nov 11 09:23:07 2020 +0100

gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32

caused

FAIL: gfortran.dg/gomp/workshare-reduction-26.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_maybe_nonmonotonic_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-26.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 0, 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-27.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-27.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 
(?:2147483648|-2147483648), 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-28.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-28.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 4, 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-36.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-36.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 2, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-37.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-37.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 
(?:2147483650|-2147483646), 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-38.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-38.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 2, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-39.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-39.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 2, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-3.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_runtime_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-3.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 4, 0, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-40.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-40.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 
(?:2147483650|-2147483646), 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-41.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_dynamic_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-41.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 2, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-42.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-42.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 3, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-43.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-43.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 
(?:2147483651|-2147483645), 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-44.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-44.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 3, 1, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-45.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-45.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 3, 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-46.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-46.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 
(?:2147483651|-2147483645), 3, " 1
FAIL: gfortran.dg/gomp/workshare-reduction-47.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_nonmonotonic_guided_next " 1
FAIL: gfortran.dg/gomp/workshare-reduction-47.f90   -O   scan-tree-dump-times 
optimized "__builtin_GOMP_loop(?:_ull)_start [^\n\r]*, 3, 3

[r11-4913 Regression] FAIL: 25_algorithms/merge/constrained.cc (test for excess errors) on Linux/x86_64

2020-11-11 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

bd87cc14ebdb6789e067fb1828d5808407c308b3 is the first bad commit
commit bd87cc14ebdb6789e067fb1828d5808407c308b3
Author: Richard Biener 
Date:   Wed Nov 11 11:51:59 2020 +0100

tree-optimization/97623 - Avoid PRE hoist insertion iteration

caused

FAIL: 23_containers/vector/modifiers/insert/const_iterator.cc (test for excess 
errors)
FAIL: 23_containers/vector/types/1.cc (test for excess errors)
FAIL: 25_algorithms/merge/constrained.cc (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-4913/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/modifiers/insert/const_iterator.cc
 --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/modifiers/insert/const_iterator.cc
 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/modifiers/insert/const_iterator.cc
 --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/modifiers/insert/const_iterator.cc
 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/types/1.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/types/1.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/types/1.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/types/1.cc 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=25_algorithms/merge/constrained.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=25_algorithms/merge/constrained.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jim Wilson
On Wed, Nov 11, 2020 at 2:55 AM Jakub Jelinek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> > The patch addresses this by disallowing that rule, if an exact
> power-of-2 is
> > seen as C1.  The reason why I would prefer to have this canonicalised the
> > same way the (X & C1) * C2 is canonicalised, is that cleaning this up
> during
> > combine is more difficult on some architectures that require multiple
> insns
> > to represent the shifted constant (i.e. C1 << C2).
>
> As I said, it is better to decide which one is better before or during
> expansion based on target costs, sure, combine can't catch everything.
>

It could be fixed in combine if we allowed 4 instructions to split into 3.
We allow combinations of 4 insns, but we only allow splits into 2 insns as
far as I know.

Trying 7, 8, 9 -> 10:
7: r80:DI=0x1
8: r81:DI=r80:DI<<0x23
  REG_DEAD r80:DI
  REG_EQUAL 0x8
9: r79:DI=r81:DI-0x8
  REG_DEAD r81:DI
  REG_EQUAL 0x7fff8
   10: r77:DI=r78:DI&r79:DI
  REG_DEAD r79:DI
  REG_DEAD r78:DI
Failed to match this instruction:
(set (reg:DI 77)
(and:DI (reg:DI 78)
(const_int 34359738360 [0x7fff8])))

The AND operation can be implemented with 3 shifts, a left shift to clear
the upper bits, a right shift to clear the lower bits, and then another
left shift to shift it back to position.  We are then left with 4 shifts,
and we can have a combiner pattern to match those 4 shifts and reduce to
2.  But this would require combine.c changes to work.  Unless maybe we
don't split the pattern into 3 insns and accept it as is, but there is risk
that this could result in worse code.

Or alternatively, maybe we could have an ANDDI3 expander which accepts mask
constants like this and emits the 3 shifts directly instead of forcing
the constant to a register.  Then we just need to be able to recognize that
these 3 shifts plus the fourth one can be combined into 2 shifts which
might work already, and if not it should be a simple combiner pattern.
This doesn't help if the same RTL can be created after initial RTL
expansion.

With the draft B extension we do this operation with a single instruction,
but we should get this right for systems without the B extension also.

Jim


Re: [PATCH][RFC] diagnostics: Add support for Unicode drawing characters

2020-11-11 Thread Lewis Hyatt via Gcc-patches
On Thu, Jul 23, 2020 at 05:47:28PM -0400, David Malcolm wrote:
> On Thu, 2020-07-23 at 12:28 -0400, Lewis Hyatt via Gcc-patches wrote:
> > Hello-
> > 
> > The attached patch is complete including docs, but I tagged as RFC
> > because I am not sure if anyone will like it, or if the general
> > reaction may
> > be closer to recoiling in horror :). Would appreciate your thoughts,
> > please...
> 
> Thanks for working on this.  I'm interested in other people's thoughts
> on this.  Various comments inline throughout below.
> 
> > Currently, if a UTF-8 locale is detected, GCC changes the quote
> > characters
> > it outputs in diagnostics to Unicode directional quotes. I feel like
> > this is
> > a nice touch, so I was wondering whether GCC shouldn't do more along
> > these
> > lines. This patch adds support for using Unicode line drawing
> > characters and
> > similar things when outputting diagnostics. There is a new option
> > -fdiagnostics-unicode-drawing=[auto|never|always] to control it,
> > which
> > defaults to auto. "auto" will enable the feature under the same
> > circumstances that Unicode quotes get output, namely when the locale
> > is
> > determined by gcc_init_libintl() to support UTF-8. (The new option
> > does not
> > affect Unicode quote characters, which currently are not configurable
> > and
> > are determined solely by the locale.)
> 
> FWIW when I first started experimenting with location ranges back in
> 2015 my first patches had box-drawing characters for underlines; you
> can see this in some of the early examples here (and similar URLs from
> around then):
> 
> https://dmalcolm.fedorapeople.org/gcc/2015-08-18/plugin.html
>   (this also has a different approach for labeling ranges, which I
> called "captions", putting them in a right margin)
> 
> https://dmalcolm.fedorapeople.org/gcc/2015-08-19/diagnostic-test-string-literals-1.html
> 
> https://dmalcolm.fedorapeople.org/gcc/2015-08-26/tree-expression-ranges.html
> 
> etc; the patch kits were:
> 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2015-03/msg00837.html
> https://gcc.gnu.org/pipermail/gcc-patches/2015-September/428036.html
> https://gcc.gnu.org/legacy-ml/gcc-patches/2015-09/msg01696.html
> 
> In:
>   https://gcc.gnu.org/legacy-ml/gcc-patches/2015-09/msg01700.html
> I wrote:
> > * Eliminated UTF-8/box-drawing and captions.  Captions were cute but
> >   weren't "fully baked".  Without them, box-drawing isn't really
> >   needed, and I think I prefer the ASCII look, with the actual
> >   "caret" character, and '~' makes it easier to count characters
> >   compared to a box-drawing line, in my terminal's font, at least.
> >   Doing so greatly simplifies the new locus-printing code.
> 
> So I dropped the UTF-8 box drawing from that original kit for:
> (a) simplicity (the original patch kit was huge in scope, covering a
> bunch of ideas for diagnostics - ranges, labeling, fix-it hints,
> spelling suggestions, so I wanted to reduce the scope to something
> manageable)
> (b) I found it easier to count characters with "~"
> 
> 
> The thing I'm most nervous about with this patch is the potential for
> introducing mojibake when people copy and paste GCC output.
> 
> For example, looking at:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2015-03/msg00837.html
> I see mojibake where the unicode line-drawing characters in my email
> are being displayed in the HTML mailing list archive via "â" -
> something has gone wrong with encoding somewhere between the copy&paste
> from my terminal, the email, and the list archive.
> 
> That said, looking at your email in the archive here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550551.html
> I don't see any mojibake.
> 
> What happens if GCC's stderr is piped into "less"?
> What happens if GCC's stderr is saved in a build.log file, uploaded
> somewhere, and then viewed?
> etc.
> 
> 
> > The elements implemented are:
> > 
> > * Vertical lines, e.g. those indicating labels and those
> > separating the
> >   source lines from the line numbers, are changed to line drawing
> >   characters.
> > 
> > * The diagnostic paths output by the static analyzer make use of
> > line
> >   drawing characters to output smooth corners etc.
> > 
> > * The squiggly underline ~ used to highlight source locations
> > is
> >   changed to a double underline ═. The main reason for this
> > is that
> >   it enables a seamless "tee" character to connect the underline
> > to a
> >   label line if one exists.
> > 
> > * Carets (^) are changed to a slightly different character (∧). I
> > think
> >   the new one is a little nicer looking, although probably not
> > worth the
> >   trouble on its own. I wanted to implement the support in this
> > patch
> >   because carets are harder to change than the rest of the
> > elements
> >   (front ends have an interface to override them, which currently
> >   Fortran makes use of), so I thought it worthwhile to get this
> 

[PATCH] libstdc++: Add C++ runtime support for new 128-bit long double format

2020-11-11 Thread Jonathan Wakely via Gcc-patches
This adds support for the new __ieee128 long double format on
powerpc64le targets.

Most of the complexity comes from wanting a single libstdc++.so library
that contains the symbols needed by code compiled with both
-mabi=ibmlongdouble and -mabi=ieeelongdouble (and not forgetting
-mlong-double-64 as well!)

In a few places this just requires an extra overload, for example
std::from_chars has to be overloaded for both forms of long double.
That can be done in a single translation unit that defines overloads
for 'long double' and also '__ieee128', so that user code including
<charconv> will be able to link to a definition for either type of long
double. Those are the easy cases.

The difficult parts are (as for the std::string ABI transition) the I/O
and locale facets. In order to be able to write either form of long
double to an ostream such as std::cout we need the locale to contain a
std::num_put facet that can handle both forms. The same approach is
taken as was already done for supporting 64-bit long double and 128-bit
long double: adding extra overloads of do_put to the facet class. On
targets where the new long double code is enabled, the facets that are
registered in the locale at program startup have additional overloads so
that they can work with any long double type. Where this fails to work
is if user code installs its own facet, which will probably not have the
additional overloads and so will only be able to output one or the other
type. In practice the number of users expecting to be able to use their
own locale facets in code using a mix of -mabi=ibmlongdouble and
-mabi=ieeelongdouble is probably close to zero.



Not yet pushed to master.

Tested x86_64-linux, powerpc64le-linux (glibc 2.31 and 2.32, i.e.
with and without libc support for ieee128 long double).

There are still some test failures when using -mabi=ieeelongdouble
(whether adding that to the dejagnu test flags or by configuring GCC
with --with-long-double-format=ieee). Some of them are testing the
problem case mentioned above (custom facets that don't handle both
types of long double) but others are less clear, and I'm still hoping
to fix them.



commit c2ac6fe2105cac1e83dffc303082e337fd97fcc5
Author: Jonathan Wakely 
Date:   Mon Nov 12 10:47:41 2018

libstdc++: Add C++ runtime support for new 128-bit long double format

This adds support for the new __ieee128 long double format on
powerpc64le targets.

Most of the complexity comes from wanting a single libstdc++.so library
that contains the symbols needed by code compiled with both
-mabi=ibmlongdouble and -mabi=ieeelongdouble (and not forgetting
-mlong-double-64 as well!)

In a few places this just requires an extra overload, for example
std::from_chars has to be overloaded for both forms of long double.
That can be done in a single translation unit that defines overloads
for 'long double' and also '__ieee128', so that user code including
<charconv> will be able to link to a definition for either type of long
double. Those are the easy cases.

The difficult parts are (as for the std::string ABI transition) the I/O
and locale facets. In order to be able to write either form of long
double to an ostream such as std::cout we need the locale to contain a
std::num_put facet that can handle both forms. The same approach is
taken as was already done for supporting 64-bit long double and 128-bit
long double: adding extra overloads of do_put to the facet class. On
targets where the new long double code is enabled, the facets that are
registered in the locale at program startup have additional overloads so
that they can work with any long double type. Where this fails to work
is if user code installs its own facet, which will probably not have the
additional overloads and so will only be able to output one or the other
type. In practice the number of users expecting to be able to use their
own locale facets in code using a mix of -mabi=ibmlongdouble and
-mabi=ieeelongdouble is probably close to zero.

libstdc++-v3/ChangeLog:

* Makefile.in: Regenerate.
* config.h.in: Regenerate.
* config/abi/pre/gnu.ver: Make patterns less greedy.
* config/os/gnu-linux/ldbl-ieee128-extra.ver: New file with patterns
for IEEE128 long double symbols.
* configure: Regenerate.
* configure.ac: Enable alternative 128-bit long double format on
powerpc64*-*-linux*.
* doc/Makefile.in: Regenerate.
* fragment.am: Regenerate.
* include/Makefile.am: Set _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT.
* include/Makefile.in: Regenerate.
* include/bits/c++config: Define inline namespace for new long
double symbols. Don't define _GLIBCXX_USE_FLOAT128 when it's the
same type as long double.
* include/bits/locale_classes.h [_GLIBCXX_

Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Nov 11, 2020 at 01:43:00PM -0800, Jim Wilson wrote:
> > On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> > > The patch addresses this by disallowing that rule, if an exact
> > power-of-2 is
> > > seen as C1.  The reason why I would prefer to have this canonicalised the
> > > same way the (X & C1) * C2 is canonicalised, is that cleaning this up
> > during
> > > combine is more difficult on some architectures that require multiple
> > insns
> > > to represent the shifted constant (i.e. C1 << C2).
> >
> > As I said, it is better to decide which one is better before or during
> > expansion based on target costs, sure, combine can't catch everything.
> >
> 
> it could be fixed in combine if we allowed 4 instructions to split into 3.
> We allow combinations of 4 insns, but we only allow splits into 2 insns as
> far as I know.

If the combiner can do it, good, but I think it would still be helpful for many
targets to decide it during expansion; we have TER, and so on
expansion of BIT_AND_EXPR with a constant second operand we can check
whether the other operand is a left shift and then, based on the exact mask,
decide whether it is useful to try different possibilities and ask for their
costs (try (x & c1) << c2, (x << c3) & c4 and perhaps (x << c5) >> c6).
We already have other cases where during expansion we try multiple things
and compare their costs.  E.g. for division and modulo, if we know that the
most significant bit is clear on both of the operands, we can expand as both
signed or unsigned division/modulo and so we try both and pick the one
which is less costly on the target.  Similarly, I think we could do it
for right shifts, if we know the most significant bit is clear, we can
expand it as both arithmetic and logical right shift, so we can ask the
target what it prefers cost-wise.

E.g. on x86_64
unsigned long long
foo (unsigned long long x)
{
  return (x << 47) >> 17;
}

unsigned long long
bar (unsigned long long x)
{
  return (x & 0x1ffff) << 30;
}

unsigned long long
baz (unsigned long long x)
{
  unsigned long long y;
  __asm ("" : "=r" (y) : "0" (x & 0x1ffff));
  return y << 30;
}

unsigned long long
qux (unsigned long long x)
{
  return (x << 30) & (0x1ffffULL << 30);
}
foo is 4 instructions 12 bytes, baz is 4 instructions 13 bytes,
bar/qux are 5 instructions 21 bytes, which of foo or baz is faster would
need to be benchmarked and no idea what the rtx costs would say, but bar/qux
certainly looks more costly and I'm quite sure the rtx costs would say that
too.

The reason for the match.pd canonicalization is, I bet, that it wants to put
possible multiple shifts adjacent and similarly with the masks, so that when
one uses
(((x & c1) << c2) & c3) << c4 etc. one can simplify that into just one shift
and one masking; and having the canonicalization do it one way for some
constants/shift pairs and another way for others wouldn't achieve that.

Jakub



Re: [PATCH] c++, v2: Fix up constexpr CLEANUP_POINT_EXPR and TRY_FINALLY_EXPR handling [PR97790]

2020-11-11 Thread Jason Merrill via Gcc-patches

On 11/11/20 3:04 PM, Jakub Jelinek wrote:

On Wed, Nov 11, 2020 at 02:11:14PM -0500, Jason Merrill via Gcc-patches wrote:

Would it make sense to always use a NULL jump_target when evaluating
cleanups?


I was afraid of that, especially for TRY_FINALLY_EXPR, but it seems that
during constexpr evaluation the cleanups will most often be just very simple
destructor calls (or calls to cleanup attribute functions).
Furthermore, for none of these 3 tree codes will we reach that code if
jump_target && *jump_target initially (there is a return NULL_TREE much
earlier for those except for trees that could embed labels etc. in it and
clearly these 3 don't count in that).

This adjusted patch doesn't have any check-c++-all regressions, would that
be ok for trunk if it passes whole bootstrap/regtest?


OK.


2020-11-11  Jakub Jelinek  

PR c++/97790
* constexpr.c (cxx_eval_constant_expression) : Don't pass jump_target to
cxx_eval_constant_expression when evaluating the cleanups.

* g++.dg/cpp2a/constexpr-dtor9.C: New test.

--- gcc/cp/constexpr.c.jj   2020-11-04 09:35:10.025029335 +0100
+++ gcc/cp/constexpr.c  2020-11-11 13:52:37.538466295 +0100
@@ -6018,8 +6018,7 @@ cxx_eval_constant_expression (const cons
/* Evaluate the cleanups.  */
FOR_EACH_VEC_ELT_REVERSE (cleanups, i, cleanup)
  cxx_eval_constant_expression (ctx, cleanup, false,
-   non_constant_p, overflow_p,
-   jump_target);
+   non_constant_p, overflow_p);
}
break;
  
@@ -6030,29 +6029,20 @@ cxx_eval_constant_expression (const cons

if (!*non_constant_p)
/* Also evaluate the cleanup.  */
cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1), true,
- non_constant_p, overflow_p,
- jump_target);
+ non_constant_p, overflow_p);
break;
  
  case CLEANUP_STMT:

-  {
-   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
-   r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
- non_constant_p, overflow_p,
- jump_target);
-   if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
- {
-   iloc_sentinel ils (loc);
-   /* Also evaluate the cleanup.  If we weren't skipping at the
-  start of the CLEANUP_BODY, change jump_target temporarily
-  to &initial_jump_target, so that even a return or break or
-  continue in the body doesn't skip the cleanup.  */
-   cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), true,
- non_constant_p, overflow_p,
- jump_target ? &initial_jump_target
- : NULL);
- }
-  }
+  r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
+   non_constant_p, overflow_p,
+   jump_target);
+  if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
+   {
+ iloc_sentinel ils (loc);
+ /* Also evaluate the cleanup.  */
+ cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), true,
+   non_constant_p, overflow_p);
+   }
break;
  
/* These differ from cxx_eval_unary_expression in that this doesn't

--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C.jj	2020-11-11 13:57:16.572334917 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor9.C	2020-11-11 13:57:45.510010165 +0100
@@ -0,0 +1,31 @@
+// PR c++/97790
+// { dg-do compile { target c++20 } }
+
+struct S
+{
+  int *d;
+  int n;
+  constexpr S () : d(new int[1]{}), n(1) {}
+  constexpr ~S () { delete [] d; }
+};
+
+constexpr S
+foo ()
+{
+  return S ();
+}
+
+constexpr int
+bar ()
+{
+  return foo ().n;
+}
+
+constexpr int
+baz ()
+{
+  return S ().n;
+}
+
+constexpr int a = baz ();
+constexpr int b = bar ();


Jakub





[patch] Enhance debug info for fixed-point types

2020-11-11 Thread Eric Botcazou
Hi,

the Ada language supports fixed-point types as first-class citizens so they 
need to be described as-is in the debug info.  Pierre-Marie devised the 
langhook get_fixed_point_type_info for this purpose a few years ago, but it 
comes with a limitation for the representation of the scale factor that we 
would need to lift in order to be able to represent more fixed-point types.

Since it's only used by the Ada compiler, this is probably non-controversial 
but someone still needs to approve the change.  Thanks in advance.


2020-11-11  Eric Botcazou  

* dwarf2out.h (struct fixed_point_type_info) : Turn
numerator and denominator into a tree.
* dwarf2out.c (base_type_die): In the case of a fixed-point type
with arbitrary scale factor, call add_scalar_info on numerator and
denominator to emit the appropriate attributes.


2020-11-11  Eric Botcazou  

* exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
(Get_Encoded_Name): Do not call it.
* gcc-interface/decl.c (gnat_to_gnu_entity) :
Tidy up and always use a meaningful description for arbitrary
scale factors.
* gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
obsolete block and adjust the description of the scale factor.

-- 
Eric Botcazou

diff --git a/gcc/ada/exp_dbug.adb b/gcc/ada/exp_dbug.adb
index c2e774140ff..dc6cd265af4 100644
--- a/gcc/ada/exp_dbug.adb
+++ b/gcc/ada/exp_dbug.adb
@@ -133,11 +133,6 @@ package body Exp_Dbug is
--  Determine whether the bounds of E match the size of the type. This is
--  used to determine whether encoding is required for a discrete type.
 
-   function Is_Handled_Scale_Factor (U : Ureal) return Boolean;
-   --  The argument U is the Small_Value of a fixed-point type. This function
-   --  determines whether the back-end can handle this scale factor. When it
-   --  cannot, we have to output a GNAT encoding for the corresponding type.
-
procedure Output_Homonym_Numbers_Suffix;
--  If homonym numbers are stored, then output them into Name_Buffer
 
@@ -594,27 +589,6 @@ package body Exp_Dbug is
  return Make_Null_Statement (Loc);
end Debug_Renaming_Declaration;
 
-   -
-   -- Is_Handled_Scale_Factor --
-   -
-
-   function Is_Handled_Scale_Factor (U : Ureal) return Boolean is
-   begin
-  --  Keep in sync with gigi (see E_*_Fixed_Point_Type handling in
-  --  decl.c:gnat_to_gnu_entity).
-
-  if UI_Eq (Numerator (U), Uint_1) then
- if Rbase (U) = 2 or else Rbase (U) = 10 then
-return True;
- end if;
-  end if;
-
-  return
-(UI_Is_In_Int_Range (Norm_Num (U))
-   and then
- UI_Is_In_Int_Range (Norm_Den (U)));
-   end Is_Handled_Scale_Factor;
-
--
-- Get_Encoded_Name --
--
@@ -671,12 +645,10 @@ package body Exp_Dbug is
 
   Has_Suffix := True;
 
-  --  Fixed-point case: generate GNAT encodings when asked to or when we
-  --  know the back-end will not be able to handle the scale factor.
+  --  Fixed-point case: generate GNAT encodings when asked to
 
   if Is_Fixed_Point_Type (E)
-and then (GNAT_Encodings /= DWARF_GNAT_Encodings_Minimal
-   or else not Is_Handled_Scale_Factor (Small_Value (E)))
+and then GNAT_Encodings /= DWARF_GNAT_Encodings_Minimal
   then
  Get_External_Name (E, True, "XF_");
  Add_Real_To_Buffer (Delta_Value (E));
diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index fa17ad9453f..a0f17b1aafc 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -1743,24 +1743,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
 	gnu_type = make_signed_type (esize);
 
-	/* Try to decode the scale factor and to save it for the fixed-point
-	   types debug hook.  */
-
-	/* There are various ways to describe the scale factor, however there
-	   are cases where back-end internals cannot hold it.  In such cases,
-	   we output invalid scale factor for such cases (i.e. the 0/0
-	   rational constant) but we expect GNAT to output GNAT encodings,
-	   then.  Thus, keep this in sync with
-	   Exp_Dbug.Is_Handled_Scale_Factor.  */
-
 	/* When encoded as 1/2**N or 1/10**N, describe the scale factor as a
 	   binary or decimal scale: it is easier to read for humans.  */
 	if (UI_Eq (Numerator (gnat_small_value), Uint_1)
 	&& (Rbase (gnat_small_value) == 2
 		|| Rbase (gnat_small_value) == 10))
 	  {
-	/* Given RM restrictions on 'Small values, we assume here that
-	   the denominator fits in an int.  */
 	tree base
 	  = build_int_cst (integer_type_node, Rbase (gnat_small_value));
 	tree exponent
@@ -1773,29 +1761,18 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 base, exponent));
 	  }
 
-	/* Default to arbitrary scale 

Re: [PATCH] std::experimental::simd

2020-11-11 Thread Jonathan Wakely via Gcc-patches

On 08/05/20 21:03 +0200, Matthias Kretz wrote:

Here's my last update to the std::experimental::simd patch. It's currently
based on the gcc-10 branch.




+
+// __next_power_of_2{{{
+/**
+ * \internal


We use @foo for Doxygen comments rather than \foo


+ * Returns the next power of 2 larger than or equal to \p __x.
+ */
+constexpr std::size_t
+__next_power_of_2(std::size_t __x)
+{
+  return (__x & (__x - 1)) == 0 ? __x
+   : __next_power_of_2((__x | (__x >> 1)) + 1);
+}


Can this be replaced with std::__bit_ceil ?

std::bit_ceil is C++20, but we provide __private versions of
everything in <bit> for C++14 and up.


+// vvv  type traits  vvv
+// integer type aliases{{{
+using _UChar = unsigned char;
+using _SChar = signed char;
+using _UShort = unsigned short;
+using _UInt = unsigned int;
+using _ULong = unsigned long;
+using _ULLong = unsigned long long;
+using _LLong = long long;


I have a suspicion some of these might clash with libc macros on some
OS somewhere, but we can cross that bridge when we come to it.



+//}}}
+// __identity/__id{{{
+template  struct __identity
+{
+  using type = _Tp;
+};
+template  using __id = typename __identity<_Tp>::type;


<type_traits> provides __type_identity and __type_identity_t.


+
+// }}}
+// __first_of_pack{{{
+template  struct __first_of_pack
+{
+  using type = _T0;
+};
+template 
+using __first_of_pack_t = typename __first_of_pack<_Ts...>::type;


 has __reflection_type_list::first::type for
this purpose, but nobody uses that header. This is fine.



+//}}}
+// __value_type_or_identity_t {{{
+template 
+typename _Tp::value_type
+__value_type_or_identity_impl(int);
+template 
+_Tp
+__value_type_or_identity_impl(float);
+template 
+using __value_type_or_identity_t
+  = decltype(__value_type_or_identity_impl<_Tp>(int()));


This could be __detected_or_t<_Tp, __value_t, _Tp> but your version
probably compiles faster.



+// }}}
+// __is_vectorizable {{{
+template 
+struct __is_vectorizable : public std::is_arithmetic<_Tp>
+{
+};
+template <> struct __is_vectorizable : public false_type
+{
+};
+template 
+inline constexpr bool __is_vectorizable_v = __is_vectorizable<_Tp>::value;


Specializing __is_vectorizable_v = false would save needing to
instantiate __is_vectorizable, but not a big deal.


+// __make_dependent_t {{{
+template  struct __make_dependent
+{
+  using type = _Up;
+};
+template 
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;


Do you need a distinct class template for this, or can
__make_dependent_t be an alias to __type_identity::type or
something else that already exists?


+// __call_with_n_evaluations{{{
+template 
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__call_with_n_evaluations(std::index_sequence<_I...>, _F0&& __f0,
+ _FArgs&& __fargs)


I'm not sure if it matters here, but old versions of G++ passed empty
types (like index_sequence) using the wrong ABI. Passing them as the
last argument makes it a non-issue. If they're not the last argument,
you get incompatible code when compiling with -fabi-version=7 or
lower.


+// __is_narrowing_conversion<_From, _To>{{{
+template ::value,
+ bool = std::is_arithmetic<_To>::value>


These could use is_arithmetic_v.


+struct __is_narrowing_conversion;
+
+// ignore "warning C4018: '<': signed/unsigned mismatch" in the following 
trait.
+// The implicit conversions will do the right thing here.
+template 
+struct __is_narrowing_conversion<_From, _To, true, true>
+  : public __bool_constant<(
+  std::numeric_limits<_From>::digits > std::numeric_limits<_To>::digits
+  || std::numeric_limits<_From>::max() > std::numeric_limits<_To>::max()
+  || std::numeric_limits<_From>::lowest()
+  < std::numeric_limits<_To>::lowest()
+  || (std::is_signed<_From>::value && std::is_unsigned<_To>::value))>


And is_signed_v and is_unsigned_v.


+{
+};
+
+template 
+struct __is_narrowing_conversion : public true_type


This looks odd, bool to arithmetic type T is narrowing?
I assume there's a reason for it, so maybe a comment explaining it
would help.


+// _BitOps {{{
+struct _BitOps
+{
+  // __popcount {{{
+  static constexpr _UInt __popcount(_UInt __x)
+  {
+return __builtin_popcount(__x);
+  }
+  static constexpr _ULong __popcount(_ULong __x)
+  {
+return __builtin_popcountl(__x);
+  }
+  static constexpr _ULLong __popcount(_ULLong __x)
+  {
+return __builtin_popcountll(__x);
+  }


std::__popcount in <bit>


+  // }}}
+  // __ctz/__clz {{{
+  static constexpr _UInt __ctz(_UInt __x) { return __builtin_ctz(__x); }
+  static constexpr _ULong __ctz(_ULong __x) { return __builtin_ctzl(__x); }
+  static constexpr _ULLong __ctz(_ULLong __x) { return __builtin_ctzll(__x); }
+  static constexpr _UInt __clz(_UInt __x) { return __builtin_clz(__x); }
+  static constexpr _ULong __clz(_ULong __x) { return __builtin_clzl(__x); }
+  static constexpr _ULLong __clz(_ULLong __x) { return __builtin_clzll(__x); }


std::__countl_zero in <bit>


Re: [patch] Enhance debug info for fixed-point types

2020-11-11 Thread Jeff Law via Gcc-patches


On 11/11/20 4:25 PM, Eric Botcazou wrote:
> Hi,
>
> the Ada language supports fixed-point types as first-class citizens so they 
> need to be described as-is in the debug info.  Pierre-Marie devised the 
> langhook get_fixed_point_type_info for this purpose a few years ago, but it 
> comes with a limitation for the representation of the scale factor that we 
> would need to lift in order to be able to represent more fixed-point types.
>
> Since it's only used by the Ada compiler, this is probably non-controversial 
> but someone still needs to approve the change.  Thanks in advance.
>
>
> 2020-11-11  Eric Botcazou  
>
>   * dwarf2out.h (struct fixed_point_type_info) : Turn
>   numerator and denominator into a tree.
>   * dwarf2out.c (base_type_die): In the case of a fixed-point type
>   with arbitrary scale factor, call add_scalar_info on numerator and
>   denominator to emit the appropriate attributes.
>
>
> 2020-11-11  Eric Botcazou  
>
>   * exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
>   (Get_Encoded_Name): Do not call it.
>   * gcc-interface/decl.c (gnat_to_gnu_entity) :
>   Tidy up and always use a meaningful description for arbitrary
>   scale factors.
>   * gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
>   obsolete block and adjust the description of the scale factor.

OK.

jeff



Improve handling of memory operands in ipa-icf 2/4

2020-11-11 Thread Jan Hubicka
Hi,
this patch implements a new class ao_compare that is derived from operand_compare
and adds a method to compare and hash ao_refs.  This is used by ICF to enable
more merging.

Comparison is done as follows:

1) Verify that the memory access will happen at the same address
   and will have same size.

   For constant addresses this is done by comparing ao_ref_base
   and offset/size

   For variable accesses it uses operand_equal_p but with OEP_ADDRESS
   (that does not match TBAA metadata) and then operand_equal_p on
   type size.

2) Compare alignments.  I use get_object_alignment_1 like ipa-icf
   did before the revamp to operand_equal_p in GCC 9.
   I noticed that the return value is a bit odd, so I added a comment.

3) Match MR_DEPENDENCE_CLIQUE

At this point the memory references are the same except for TBAA information.
We continue by checking:

4) ref and base alias sets.  Now if lto streaming is going to happen
   instead of comparing alias sets themselves we compare alias_ptr_types

   (the patch depends on the ao_ref_alias_ptr_type and
ao_ref_base_alias_ptr_type accessors I sent yesterday)

5) See if accesses are view converted.
   If they are, we are done since the access path is not present.

6) Compare the part of access path relevant for TBAA.
   I recall FRE relies on the fact that if the base and ref types are the same
   the access path is too, but I do not think this is 100% reliable, especially
   with LTO alias sets.

   The access path comparison logic is also useful for modref (for next stage1).
   Tracking the access paths quite noticeably improves disambiguation in C++
   code by being able to distinguish different fields of the same type within a
   struct.  I had the comparison logic in my tree for some time and it seems to
   work quite well.

   During the cc1plus build we have some cases where we find a mismatch after
   matching the base/ref alias sets.  These are due to failed type merging: the
   access path oracle in LTO uses TYPE_MAIN_VARIANTs.

I implemented relatively basic hashing using base and offset.

The patch bootstraps with LTO with checking enabled, disabled, and with or
without strict aliasing.  With -fno-strict-aliasing we now perform a lot more
merging.  After fixing two independent issues I will send separately, we have a
quite good chance that functions with the same hashes are actually merged.
cc1plus stats are as follows:

 61   false returned: 'compare_ao_refs failed (semantic difference)' in 
compare_operand at ../../gcc/ipa-icf-gimple.c:336
 73   false returned: 'THIS pointer ODR type mismatch' in equals_wpa at 
../../gcc/ipa-icf.c:673
 74   false returned: 'types are not same for ODR' in 
compatible_polymorphic_types_p at ../../gcc/ipa-icf-gimple.c:197
 94   false returned: 'parameter type is not compatible' in 
compatible_parm_types_p at ../../gcc/ipa-icf.c:508
111   false returned: 'GIMPLE LHS type mismatch' in compare_gimple_assign 
at ../../gcc/ipa-icf-gimple.c:706
116   false returned: '' in compare_decl at ../../gcc/ipa-icf-gimple.c:162
236   false returned: 'GIMPLE assignment operands are different' in 
compare_gimple_assign at ../../gcc/ipa-icf-gimple.c:710
274   false returned: 'inline attributes are different' in 
compare_referenced_symbol_properties at ../../gcc/ipa-icf.c:346
446   false returned: 'parameter types are not compatible' in equals_wpa at 
../../gcc/ipa-icf.c:635
546   false returned: '' in equals_private at ../../gcc/ipa-icf.c:882
631   false returned: '' in compare_phi_node at ../../gcc/ipa-icf.c:1555
631   false returned: 'PHI node comparison returns false' in equals_private 
at ../../gcc/ipa-icf.c:917
969   false returned: 'decl_or_type flags are different' in equals_wpa at 
../../gcc/ipa-icf.c:568
973   false returned: 'different tree types' in compatible_types_p at 
../../gcc/ipa-icf-gimple.c:206
   3109   false returned: 'types are not compatible' in compatible_types_p at 
../../gcc/ipa-icf-gimple.c:212
   3849   false returned: 'result types are different' in equals_wpa at 
../../gcc/ipa-icf.c:617

So we now have only 61 memory reference mismatches.  There are 11313 functions
merged and 546 failures in equals_private, which I think is the number of
function body compares that did not go well.  The code size reduction is about
3%, compared to 0.2% without the patch.

With strict aliasing the situation is worse:
 46   false returned: 'references to virtual tables cannot be merged' in 
compare_referenced_symbol_properties at ../../gcc/ipa-icf.c:369
 48   false returned: 'different references' in compare_symbol_references 
at ../../gcc/ipa-icf.c:461
 48   false returned: '' in operand_equal_p at 
../../gcc/ipa-icf-gimple.c:282
 53   false returned: 'GIMPLE LHS type mismatch' in compare_gimple_assign 
at ../../gcc/ipa-icf-gimple.c:701
 73   false returned: 'THIS pointer ODR type mismatch' in equals_wpa at 
../../gcc/ipa-icf.c:673
 74   false returned: 'types are not same for ODR' in 
compatible_polymorphic_types_p at ../../gcc/

Re: [PATCH] Add a new pattern in 4-insn combine

2020-11-11 Thread Segher Boessenkool
Hi Hao Chen,

[ You first need to add yourself to MAINTAINERS?  And get an account to
  do that, if you do not have one yet :-) ]

On Mon, Nov 09, 2020 at 10:48:19AM +0800, HAO CHEN GUI wrote:
> This patch adds a new pattern in 4-insn combine. It supports the 
> following sign_extend(op: zero_extend, zero_extend) optimization. In the 
> patch, newpat is split twice. The first split becomes newi1pat and the 
> second becomes newi2pat. They replace i1, i2 and i3 if all of them can 
> be recognized.
> 
> 7: r126:SI=zero_extend([r123:DI+0x1])
> 6: r125:SI=zero_extend([r123:DI])
> 8: r127:SI=r125:SI+r126:SI
> 9: r124:DI=sign_extend(r127:SI)
> 
> are replaced by:
> 
> 7: r125:DI=zero_extend([r123:DI])
> 8: r127:DI=zero_extend([r123:DI+0x1])
> 9: r124:DI=r127:DI+r125:DI

So in the original insn 8, registers 125 and 126 died; this isn't
changing insn 7 at all, just moving it back from 6!  That is not what
combine should do, and it is also problematic: is it checked anywhere
that we *can* move the insn like this?

> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -851,10 +851,11 @@ do_SUBST_LINK (struct insn_link **into, struct 
> insn_link *newval)
>  
>  static bool
>  combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn 
> *i3,
> -rtx newpat, rtx newi2pat, rtx newotherpat)
> +rtx newpat, rtx newi2pat, rtx newotherpat,
> +rtx newi1pat)

(Keep the args ordered: 3, 2, 1, other.  "newpat" means "newi3pat").

> @@ -2672,7 +2693,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>int *new_direct_jump_p, rtx_insn *last_combined_insn)
>  {
>/* New patterns for I3 and I2, respectively.  */
> -  rtx newpat, newi2pat = 0;
> +  rtx newpat, newi2pat = 0, newi1pat = 0;
>rtvec newpat_vec_with_clobbers = 0;
>int substed_i2 = 0, substed_i1 = 0, substed_i0 = 0;
>/* Indicates need to preserve SET in I0, I1 or I2 in I3 if it is not
> @@ -2682,8 +2703,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>int total_sets;
>/* Nonzero if I2's or I1's body now appears in I3.  */
>int i2_is_used = 0, i1_is_used = 0;
> -  /* INSN_CODEs for new I3, new I2, and user of condition code.  */
> +  /* INSN_CODEs for new I3, new I2, new I1 and user of condition code.  */

(Comma after I1: "new I1, and ...".)

>int insn_code_number, i2_code_number = 0, other_code_number = 0;
> +  int i1_code_number = 0;

Well, put it together with i3, i2 then, put "other" on its own line?

> @@ -2756,7 +2778,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
> else if (BINARY_P (src) && CONSTANT_P (XEXP (src, 1)))
>   ngood++;
> else if (GET_CODE (src) == ASHIFT || GET_CODE (src) == ASHIFTRT
> -|| GET_CODE (src) == LSHIFTRT)
> +|| GET_CODE (src) == LSHIFTRT
> +|| GET_CODE (src) == SIGN_EXTEND
> +|| GET_CODE (src) == ZERO_EXTEND)
>   nshift++;

Extends are not shifts.  Please count "nextend" separately?

> @@ -3399,6 +3423,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
> i1src = subst (i1src, pc_rtx, pc_rtx, 0, 0, 0);
>   }
>  
> +   if (i0)
> + {
> +   subst_low_luid = DF_INSN_LUID (i0);
> +   i0src = subst (i0src, pc_rtx, pc_rtx, 0, 0, 0);
> + }

This won't work?  "subst_low_luid" is overwritten right in the next
statement:

> subst_low_luid = DF_INSN_LUID (i2);
> i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);

> @@ -3920,6 +3950,50 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
> rtx src_op0 = XEXP (setsrc, 0);
> rtx src_op1 = XEXP (setsrc, 1);
>  
> +   /* Double split when src of i0 and i1 are both ZERO_EXTEND.  */
> +   if (i0 && i1
> +   && GET_CODE (PATTERN (i0)) == SET
> +   && GET_CODE (PATTERN (i1)) == SET
> +   && GET_CODE (SET_SRC (PATTERN (i0))) == ZERO_EXTEND
> +   && GET_CODE (SET_SRC (PATTERN (i1))) == ZERO_EXTEND
> +   && (rtx_equal_p (XEXP (*split, 0),
> +XEXP (SET_SRC (PATTERN (i1)), 0))
> +   || rtx_equal_p (XEXP (*split, 0),
> +   XEXP (SET_SRC (PATTERN (i0)), 0
> + {
> +   newi1pat = NULL_RTX;
> +   rtx newdest, *i0_i1dest;
> +   machine_mode new_mode;
> +
> +   new_mode = GET_MODE (*split);
> +   if (rtx_equal_p (XEXP (*split, 0),
> +XEXP (SET_SRC (PATTERN (i1)), 0)))
> + i0_i1dest = &i1dest;
> +   else
> + i0_i1dest = &i0dest;
> +
> +   if (REGNO (i1dest) < FIRST_PSEUDO_REGISTER)
> + newdest = gen_rtx_REG (new_mode, REGNO (*i0_i1dest));

You can only have one mod

[PATCH] [tree-optimization] Optimize two patterns with three xors.

2020-11-11 Thread Eugene Rozenfeld via Gcc-patches
Simplify (a ^ b) & ((b ^ c) ^ a) --> (a ^ b) & ~c.

int f(int a, int b, int c)
{
return (a ^ b) & ((b ^ c) ^ a);
}

Code without the patch:
	mov	eax, edx
	xor	eax, esi
	xor	eax, edi
	xor	edi, esi
	and	eax, edi
	ret

Code with the patch:
	xor	edi, esi
	andn	eax, edx, edi
	ret

Simplify (a ^ b) | ((b ^ c) ^ a) --> (a ^ b) | c.
int g(int a, int b, int c)
{
return (a ^ b) | ((b ^ c) ^ a);
}

Code without the patch:
	mov	eax, edx
	xor	eax, esi
	xor	eax, edi
	xor	edi, esi
	or	eax, edi
	ret

Code with the patch:
	xor	edi, esi
	mov	eax, edi
	or	eax, edx
	ret

This fixes PR96671.

Tested on x86_64-pc-linux-gnu.



0001-Optimize-two-patterns-with-three-xors.patch
Description: 0001-Optimize-two-patterns-with-three-xors.patch


[PATCH][PR target/97770] x86: Add missing popcount2 expander

2020-11-11 Thread Hongyu Wang via Gcc-patches
Hi,

According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770, the x86
backend needs a popcount<mode>2 expander so that __builtin_popcount can be
auto-vectorized on AVX512BITALG/AVX512VPOPCNTDQ targets.

For DImode the middle-end vectorizer could not generate the expected code,
and for QI/HImode there is no corresponding IFN, so xfails are added in
these tests.

Bootstrap/regression test for x86 backend is OK.

OK for master?

gcc/ChangeLog

PR target/97770
* gcc/config/i386/sse.md (popcount<mode>2): New expander
for SI/DI vector modes.
(popcount<mode>2): Likewise for QI/HI vector modes.

gcc/testsuite/ChangeLog

PR target/97770
* gcc.target/i386/avx512bitalg-pr97770-1.c: New test.
* gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Likewise.
* gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Likewise.

-- 
Regards,

Hongyu, Wang
From b809052b0bab5d80dd0a1b1ffbd55faa8179a416 Mon Sep 17 00:00:00 2001
From: Hongyu Wang 
Date: Wed, 11 Nov 2020 09:41:13 +0800
Subject: [PATCH] Add popcount expander to enable popcount auto
 vectorization under AVX512BITALG/AVX512POPCNTDQ target.

gcc/ChangeLog

	PR target/97770
	* gcc/config/i386/sse.md (popcount<mode>2): New expander
	for SI/DI vector modes.
	(popcount<mode>2): Likewise for QI/HI vector modes.

gcc/testsuite/ChangeLog

	PR target/97770
	* gcc.target/i386/avx512bitalg-pr97770-1.c: New test.
	* gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Likewise.
	* gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Likewise.
	* gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Likewise.
---
 gcc/config/i386/sse.md| 12 
 .../gcc.target/i386/avx512bitalg-pr97770-1.c  | 60 ++
 .../i386/avx512vpopcntdq-pr97770-1.c  | 63 +++
 .../i386/avx512vpopcntdq-pr97770-2.c  | 39 
 .../i386/avx512vpopcntdqvl-pr97770-1.c| 14 +
 5 files changed, 188 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 8437ad27087..8566b2ccda2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22678,6 +22678,12 @@ (define_insn "avx5124vnniw_vp4dpwssds_maskz"
 (set_attr ("prefix") ("evex"))
 (set_attr ("mode") ("TI"))])
 
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand")
+	(popcount:VI48_AVX512VL
+	  (match_operand:VI48_AVX512VL 1 "nonimmediate_operand")))]
+  "TARGET_AVX512VPOPCNTDQ")
+
(define_insn "vpopcount<mode><mask_name>"
   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
 	(popcount:VI48_AVX512VL
@@ -22722,6 +22728,12 @@ (define_insn "*restore_multiple_leave_return"
   "TARGET_SSE && TARGET_64BIT"
   "jmp\t%P1")
 
+(define_expand "popcount<mode>2"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
+	(popcount:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "vm")))]
+  "TARGET_AVX512BITALG")
+
(define_insn "vpopcount<mode><mask_name>"
   [(set (match_operand:VI12_AVX512VL 0 "register_operand" "=v")
 	(popcount:VI12_AVX512VL
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
new file mode 100644
index 000..c83a477045c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
@@ -0,0 +1,60 @@
+/* PR target/97770 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512bitalg -mavx512vl -mprefer-vector-width=512" } */
+/* Add xfail since no IFN for QI/HImode popcount */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
+
+#include <immintrin.h>
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountb_128 (char * __restrict dest, char* src)
+{
+  for (int i = 0; i != 16; i++)
+    dest[i] = __builtin_popcount (src[i]);
+}
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountw_128 (short* __restrict dest, short* src)
+{
+  for (int i = 0; i != 8; i++)
+    dest[i] = __builtin_popcount (src[i]);
+}
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountb_256 (char * __restrict dest, char* src)
+{
+  for (int i = 0; i != 32; i++)
+    dest[i] = __builtin_popcount (src[i]);
+}
+
+void
+__attribute__ ((noipa, optimize("-O3")))
+popcountw_256 (short* __re

Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-11 Thread Hongtao Liu via Gcc-patches
On Wed, Nov 11, 2020 at 4:45 PM Uros Bizjak  wrote:
>
> > gcc/ChangeLog:
> >
> > PR target/97194
> > * config/i386/i386-expand.c (ix86_expand_vector_set_var): New function.
> > * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
> > * config/i386/predicates.md (vec_setm_operand): New predicate,
> > true for const_int_operand or register_operand under TARGET_AVX2.
> > * config/i386/sse.md (vec_set<mode>): Support both constant
> > and variable index vec_set.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx2-vec-set-1.c: New test.
> > * gcc.target/i386/avx2-vec-set-2.c: New test.
> > * gcc.target/i386/avx512bw-vec-set-1.c: New test.
> > * gcc.target/i386/avx512bw-vec-set-2.c: New test.
> > * gcc.target/i386/avx512f-vec-set-2.c: New test.
> > * gcc.target/i386/avx512vl-vec-set-2.c: New test.
>
+;; True for registers, or const_int_operand.  Used by the vec_setm expander.
> +(define_predicate "vec_setm_operand"
> +  (ior (and (match_operand 0 "register_operand")
> +(match_test "TARGET_AVX2"))
> +   (match_code "const_int")))
> +
>  ;; True for registers, or 1 or -1.  Used to optimize double-word shifts.
>  (define_predicate "reg_or_pm1_operand"
>(ior (match_operand 0 "register_operand")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b153a87fb98..1798e5dea75 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -8098,11 +8098,14 @@ (define_insn "vec_setv2df_0"
>  (define_expand "vec_set<mode>"
>[(match_operand:V 0 "register_operand")
> (match_operand:<ssescalarmode> 1 "register_operand")
> -   (match_operand 2 "const_int_operand")]
> +   (match_operand 2 "vec_setm_operand")]
>
> You need to specify a mode, otherwise a register of any mode can pass here.
>
Yes, theoretically we only accept integer types.  But consider
can_vec_set_var_idx_p:
---
bool
can_vec_set_var_idx_p (machine_mode vec_mode)
{
  if (!VECTOR_MODE_P (vec_mode))
    return false;

  machine_mode inner_mode = GET_MODE_INNER (vec_mode);
  rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
  rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
  rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);

  enum insn_code icode = optab_handler (vec_set_optab, vec_mode);

  return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
	 && insn_operand_matches (icode, 1, reg2)
	 && insn_operand_matches (icode, 2, reg3);
}
---

reg3 is created with VOIDmode, so specifying any particular mode on
match_operand 2 will make insn_operand_matches (icode, 2, reg3) fail:
---
(gdb) p insn_operand_matches(icode,2,reg3)
$5 = false
(gdb)
---

Maybe we need to change

rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);

to

rtx reg3 = alloca_raw_REG (SImode, LAST_VIRTUAL_REGISTER + 3);

cc Richard Biener, any thoughts?

> Uros.



-- 
BR,
Hongtao


Re: [PATCH v3] Include checking of 0 cost dependency due to bypass in rank_for_schedule

2020-11-11 Thread Jojo R
Ping … ...

Jojo
On 2020-11-06 at 5:38 PM +0800, Jojo R wrote:
> Insn seqs before sched:
>
> .L1:
> a5 = insn-1 (a0)
> a6 = insn-2 (a1)
> a7 = insn-3 (a7, a5)
> a8 = insn-4 (a8, a6)
> Jmp .L1
>
> Insn-3 & insn-4 are REG_DEP_TRUE dependent on insn-1 & insn-2,
> so insn-3 & insn-4 will end up last in the ready list.
> This patch also puts a 0-cost dependency due to a bypass into the
> highest numbered class, for targets that have a forwarding path
> between DEP_PRO and DEP_CON.
>
> If the insns are in the same cost class for -fsched-last-insn-heuristic,
> we then fall through to "prefer the insn which has more later insns that
> depend on it".  The value returned by dep_list_size () is not suitable
> there: it counts all dependences of the insn.  We need to ignore the
> ones that have a 0 cost due to a bypass.
>
> With this patch and pipeline description as below:
>
> (define_bypass 0 "insn-1, insn-2" "insn-3, insn-4")
>
> We can get better insn seqs after sched:
>
> .L1:
> a5 = insn-1 (a0)
> a7 = insn-3 (a7, a5)
> a6 = insn-2 (a1)
> a8 = insn-4 (a8, a6)
> Jmp .L1
>
> I have tested on ck860 of C-SKY arch and C960 of T-Head based on RISCV arch
>
> gcc/
> * haifa-sched.c (dep_list_costs): New.
> (rank_for_schedule): Replace dep_list_size with dep_list_costs.
> Add 0 cost dependency due to bypass on -fsched-last-insn-heuristic.
>
> ---
> gcc/haifa-sched.c | 49 +++
> 1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index 350178c82b8..51c6d23d3a5 100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -1584,6 +1584,44 @@ dep_list_size (rtx_insn *insn, sd_list_types_def list)
> return nodbgcount;
> }
>
> +/* Get the bypass cost of dependence DEP.  */
> +
> +HAIFA_INLINE static int
> +dep_cost_bypass (dep_t dep)
> +{
> +  if (dep == NULL)
> +    return -1;
> +
> +  if (INSN_CODE (DEP_PRO (dep)) >= 0
> +      && bypass_p (DEP_PRO (dep))
> +      && recog_memoized (DEP_CON (dep)) >= 0)
> +    return dep_cost (dep);
> +
> +  return -1;
> +}
> +
> +/* Compute the costs of nondebug deps in list LIST for INSN.  */
> +
> +static int
> +dep_list_costs (rtx_insn *insn, sd_list_types_def list)
> +{
> +  sd_iterator_def sd_it;
> +  dep_t dep;
> +  int costs = 0;
> +
> +  FOR_EACH_DEP (insn, list, sd_it, dep)
> +    {
> +      if (!DEBUG_INSN_P (DEP_CON (dep))
> +	  && !DEBUG_INSN_P (DEP_PRO (dep)))
> +	{
> +	  if (dep_cost_bypass (dep) != 0)
> +	    costs++;
> +	}
> +    }
> +
> +  return costs;
> +}
> +
> bool sched_fusion;
>
> /* Compute the priority number for INSN. */
> @@ -2758,10 +2796,12 @@ rank_for_schedule (const void *x, const void *y)
> 1) Data dependent on last schedule insn.
> 2) Anti/Output dependent on last scheduled insn.
> 3) Independent of last scheduled insn, or has latency of one.
> + 4) bypass of last scheduled insn, and has latency of zero.
> Choose the insn from the highest numbered class if different. */
> dep1 = sd_find_dep_between (last, tmp, true);
>
> - if (dep1 == NULL || dep_cost (dep1) == 1)
> + if (dep1 == NULL || dep_cost (dep1) == 1
> + || (dep_cost_bypass (dep1) == 0))
> tmp_class = 3;
> else if (/* Data dependence. */
> DEP_TYPE (dep1) == REG_DEP_TRUE)
> @@ -2771,7 +2811,8 @@ rank_for_schedule (const void *x, const void *y)
>
> dep2 = sd_find_dep_between (last, tmp2, true);
>
> - if (dep2 == NULL || dep_cost (dep2) == 1)
> + if (dep2 == NULL || dep_cost (dep2) == 1
> + || (dep_cost_bypass (dep2) == 0))
> tmp2_class = 3;
> else if (/* Data dependence. */
> DEP_TYPE (dep2) == REG_DEP_TRUE)
> @@ -2795,8 +2836,8 @@ rank_for_schedule (const void *x, const void *y)
> This gives the scheduler more freedom when scheduling later
> instructions at the expense of added register pressure. */
>
> - val = (dep_list_size (tmp2, SD_LIST_FORW)
> - - dep_list_size (tmp, SD_LIST_FORW));
> + val = (dep_list_costs (tmp2, SD_LIST_FORW)
> + - dep_list_costs (tmp, SD_LIST_FORW));
>
> if (flag_sched_dep_count_heuristic && val != 0)
> return rfs_result (RFS_DEP_COUNT, val, tmp, tmp2);
> --
> 2.24.3 (Apple Git-128)

