date:20211019

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-19 Thread Prathamesh Kulkarni via Gcc-patches

On Mon, 18 Oct 2021 at 17:23, Richard Biener  wrote:
>
> On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Mon, 18 Oct 2021 at 17:10, Richard Biener  wrote:
> > >
> > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener  wrote:
> > > > >
> > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > Hi Richard,
> > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > to match.pd:
> > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > >
> > > > > > This works to remove call to erfc for the following test:
> > > > > > double f(double x)
> > > > > > {
> > > > > >   double g(double, double);
> > > > > >
> > > > > >   double t1 = __builtin_erf (x);
> > > > > >   double t2 = __builtin_erfc (x);
> > > > > >   return g(t1, t2);
> > > > > > }
> > > > > >
> > > > > > with .optimized dump shows:
> > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > >
> > > > > > However, for the following test:
> > > > > > double f(double x)
> > > > > > {
> > > > > >   double g(double, double);
> > > > > >
> > > > > >   double t1 = __builtin_erfc (x);
> > > > > >   return t1;
> > > > > > }
> > > > > >
> > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > erf(x) to erfc(x) again
> > > > > > post canonicalization.
> > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > but then it tries to
> > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > with erfc(x) transformed to
> > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > Could you suggest how to proceed ?
> > > > >
> > > > > I applied your patch manually and it does the intended
> > > > > simplifications so I wonder what I am missing?
> > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > no erf(x) in the source ?
> > >
> > > I do think it's reasonable to expect erfc to be available when erf
> > > is and vice versa but note both are C99 specified functions (either
> > > requires -lm).
> > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
>
> Yes, but I'm confused because you say the patch doesn't work for you?
The patch works for me to CSE erf/erfc pair.
However when there's only erfc in the source, it canonicalizes erfc(x)
to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
erfc(x)
with -O3 -funsafe-math-optimizations.

For,
t1 = __builtin_erfc(x),

.optimized dump shows:
  _2 = __builtin_erf (x_1(D));
  t1_3 = 1.0e+0 - _2;

and for,
double t1 = x + __builtin_erfc(x);

.optimized dump shows:
  _3 = __builtin_erf (x_2(D));
  _7 = x_2(D) + 1.0e+0;
  t1_4 = _7 - _3;

I assume in both cases, we want erfc in the code-gen instead ?
I think the reason uncaonicalization fails is because the pattern 1 -
erf(x) to erfc(x)
gets applied, but then it fails in resimplifying erfc(x), and we end
up with 1 - erf(x) in code-gen.

>From gimple-match.c, it hits the simplification:

gimple_seq *lseq = seq;
if (__builtin_expect (!dbg_cnt
(match), 0)) goto next_after_fail1172;
if (__builtin_expect (dump_file &&
(dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
%s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
{
  res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
  res_op->ops[0] = captures[0];
  res_op->resimplify (lseq, valueize);
  return true;
}

But res_op->resimplify returns false, and doesn't end up adding to lseq.

As you suggest, should we instead handle this in fre to transform
erfc(x) to 1 - erf(x), only when
there's a matching erf(x) in the source ?

Thanks,
Prathamesh
>
> Btw, please add the testcase from the PR and also a testcase that shows
> the canonicalization is undone.  Maybe you can also double-check that
> we handle x + erfc (x) because I see we associate that as
> (x + 1) - erf (x) which is then not recognized back to erfc.
>
> The less surprising (as to preserve the function called in the source)
> variant for the PR would be to teach CSE to lookup erf(x) when
> visiting erfc(x) and when found synthesize 1 - erf(x).
>
> That said, a mathematician should chime in on how important it is
> to preserve erfc vs. erf (precision or even speed).
>
> Thanks,
> Richard.
>
> > Thanks,
> > Prathamesh
> >
> > >
> > > Richard.
> > >
> > > > So for the following test:
> > > > double f(double x)
> > > > {
> > > >   t1 = __builtin_erfc(x)
> > > >   return t1;
> > > > }
> > > >
> > > > .optimized dump shows:
> > > > double f (double x)
> > > > {
> > > >   double t

[PATCH] Remove check_aligned parameter from vect_supportable_dr_alignment

2021-10-19 Thread Richard Biener via Gcc-patches

There are two calls with true as parameter, one is only relevant
for the case of the misalignment being unknown which means the
access is never aligned there, the other is in the peeling hash
insert code used conditional on the unlimited cost model which
adds an artificial count.  But the way it works right now is
that it boosts the count if the specific misalignment when not peeling
is unsupported - in particular when the access is currently aligned
we'll query the backend with a misalign value of zero.  I've
changed it to boost the peeling when unknown alignment is not
supported instead and noted how we could in principle improve this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-19  Richard Biener  

* tree-vectorizer.h (vect_supportable_dr_alignment): Remove
check_aligned argument.
* tree-vect-data-refs.c (vect_supportable_dr_alignment):
Likewise.
(vect_peeling_hash_insert): Add supportable_if_not_aligned
argument and do not call vect_supportable_dr_alignment here.
(vect_peeling_supportable): Adjust.
(vect_enhance_data_refs_alignment): Compute whether the
access is supported with different alignment here and
pass that down to vect_peeling_hash_insert.
(vect_vfa_access_size): Adjust.
* tree-vect-stmts.c (vect_get_store_cost): Likewise.
(vect_get_load_cost): Likewise.
(get_negative_load_store_type): Likewise.
(get_group_load_store_type): Likewise.
(get_load_store_type): Likewise.
---
 gcc/tree-vect-data-refs.c | 41 ++-
 gcc/tree-vect-stmts.c | 12 ++--
 gcc/tree-vectorizer.h |  2 +-
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 4c9215874c9..7c95f9ad69e 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1465,14 +1465,10 @@ peel_info_hasher::equal (const _vect_peel_info *a, 
const _vect_peel_info *b)
 static void
 vect_peeling_hash_insert (hash_table *peeling_htab,
  loop_vec_info loop_vinfo, dr_vec_info *dr_info,
-  int npeel)
+ int npeel, bool supportable_if_not_aligned)
 {
   struct _vect_peel_info elem, *slot;
   _vect_peel_info **new_slot;
-  tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
-  bool supportable_dr_alignment
-= (vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, true)
-   != dr_unaligned_unsupported);
 
   elem.npeel = npeel;
   slot = peeling_htab->find (&elem);
@@ -1488,7 +1484,9 @@ vect_peeling_hash_insert (hash_table 
*peeling_htab,
   *new_slot = slot;
 }
 
-  if (!supportable_dr_alignment
+  /* If this DR is not supported with unknown misalignment then bias
+ this slot when the cost model is disabled.  */
+  if (!supportable_if_not_aligned
   && unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 slot->count += VECT_MAX_COST;
 }
@@ -1661,7 +1659,7 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, 
dr_vec_info *dr0_info,
   vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
   tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
   supportable_dr_alignment
-   = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, false);
+   = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype);
   SET_DR_MISALIGNMENT (dr_info, save_misalignment);
 
   if (supportable_dr_alignment == dr_unaligned_unsupported)
@@ -1803,7 +1801,6 @@ opt_result
 vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum dr_alignment_support supportable_dr_alignment;
   dr_vec_info *first_store = NULL;
   dr_vec_info *dr0_info = NULL;
   struct data_reference *dr;
@@ -1912,8 +1909,6 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
 
   stmt_vec_info stmt_info = dr_info->stmt;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  supportable_dr_alignment
-   = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, true);
   do_peeling = vector_alignment_reachable_p (dr_info);
   if (do_peeling)
 {
@@ -1957,11 +1952,20 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
}
 
  /* Save info about DR in the hash table.  Also include peeling
-amounts according to the explanation above.  */
+amounts according to the explanation above.  Indicate
+the alignment status when the ref is not aligned.
+???  Rather than using unknown alignment here we should
+prune all entries from the peeling hashtable which cause
+DRs to be not supported.  */
+ bool supportable_if_not_aligned
+   = targetm.vectorize.support_vector_misalignment
+   (TYPE_MODE (vectype), TREE_TYPE (DR_REF (

Re: [PATCH] tree-object-size: Make unknown a computation

2021-10-19 Thread Siddhesh Poyarekar


On 10/19/21 12:28, Jakub Jelinek wrote:

On Tue, Oct 19, 2021 at 09:47:45AM +0530, Siddhesh Poyarekar wrote:

Compute the unknown size value as a function of the min/max bit of
object_size_type.  This transforms into a neat little branchless
sequence on x86_64:

movl%edi, %eax
sarl%eax
xorl$1, %eax
negl%eax
cltq

which should be faster than loading the value from memory.  A quick
unscientific test using

`time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`


But if you use some other higher bit of object_size_type for the mode
(normal vs. dynamic) you'd need to mask it away, so it will become longer.
Anyway, I guess that part is ok.


Yes, my WIP patchset has:

((unsigned HOST_WIDE_INT) -(((object_size_type & MIN_BIT) >> 1) ^ 1));

which results in:

xorl%eax, %eax
andl$2, %edi
sete%al
negq%rax


-/* Compute object_sizes for PTR, defined to an unknown value.  */
-
-static void
-unknown_object_size (struct object_size_info *osi, tree ptr)
-{
-  int object_size_type = osi->object_size_type;
-  unsigned int varno = SSA_NAME_VERSION (ptr);
-  unsigned HOST_WIDE_INT bytes;
-
-  gcc_assert (object_sizes[object_size_type][varno]
- != unknown[object_size_type]);
-  gcc_assert (osi->pass == 0);
-
-  bytes = unknown[object_size_type];
-
-  if ((object_size_type & 2) == 0)
-{
-  if (object_sizes[object_size_type][varno] < bytes)
-   object_sizes[object_size_type][varno] = bytes;
-}
-  else
-{
-  if (object_sizes[object_size_type][varno] > bytes)
-   object_sizes[object_size_type][varno] = bytes;
-}
-}


But I don't think removing this function is desirable.
Can it be greatly simplified?  Yes, certainly.
The assert verifies it is not unknown before, and then for mode 0 or 1
uses maximum which will always be unknown and for mode 2 or 3
minimum which will always be unknown as well.
But I'd keep the asserts in there.  So it can become
   int object_size_type = osi->object_size_type;
   unsigned int varno = SSA_NAME_VERSION (ptr);
   unsigned HOST_WIDE_INT bytes = unknown (object_size_type);

   gcc_checking_assert (object_sizes[object_size_type][varno] != bytes);
   gcc_checking_assert (osi->pass == 0);
   object_sizes[object_size_type][varno] = bytes;


OK, I'll send an updated patch.

Thanks,
Siddhesh

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-19 Thread Richard Biener via Gcc-patches

On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> On Mon, 18 Oct 2021 at 17:23, Richard Biener  wrote:
> >
> > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Mon, 18 Oct 2021 at 17:10, Richard Biener  wrote:
> > > >
> > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > Hi Richard,
> > > > > > > As suggested in PR, I have attached WIP patch that adds two 
> > > > > > > patterns
> > > > > > > to match.pd:
> > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > >
> > > > > > > This works to remove call to erfc for the following test:
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erf (x);
> > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > >   return g(t1, t2);
> > > > > > > }
> > > > > > >
> > > > > > > with .optimized dump shows:
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > >
> > > > > > > However, for the following test:
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > >   return t1;
> > > > > > > }
> > > > > > >
> > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > erf(x) to erfc(x) again
> > > > > > > post canonicalization.
> > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets 
> > > > > > > applied,
> > > > > > > but then it tries to
> > > > > > > resimplify erfc(x), which fails post canonicalization. So we end 
> > > > > > > up
> > > > > > > with erfc(x) transformed to
> > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > Could you suggest how to proceed ?
> > > > > >
> > > > > > I applied your patch manually and it does the intended
> > > > > > simplifications so I wonder what I am missing?
> > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > no erf(x) in the source ?
> > > >
> > > > I do think it's reasonable to expect erfc to be available when erf
> > > > is and vice versa but note both are C99 specified functions (either
> > > > requires -lm).
> > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> >
> > Yes, but I'm confused because you say the patch doesn't work for you?
> The patch works for me to CSE erf/erfc pair.
> However when there's only erfc in the source, it canonicalizes erfc(x)
> to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> erfc(x)
> with -O3 -funsafe-math-optimizations.
>
> For,
> t1 = __builtin_erfc(x),
>
> .optimized dump shows:
>   _2 = __builtin_erf (x_1(D));
>   t1_3 = 1.0e+0 - _2;
>
> and for,
> double t1 = x + __builtin_erfc(x);
>
> .optimized dump shows:
>   _3 = __builtin_erf (x_2(D));
>   _7 = x_2(D) + 1.0e+0;
>   t1_4 = _7 - _3;
>
> I assume in both cases, we want erfc in the code-gen instead ?
> I think the reason uncaonicalization fails is because the pattern 1 -
> erf(x) to erfc(x)
> gets applied, but then it fails in resimplifying erfc(x), and we end
> up with 1 - erf(x) in code-gen.
>
> From gimple-match.c, it hits the simplification:
>
> gimple_seq *lseq = seq;
> if (__builtin_expect (!dbg_cnt
> (match), 0)) goto next_after_fail1172;
> if (__builtin_expect (dump_file &&
> (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> {
>   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
>   res_op->ops[0] = captures[0];
>   res_op->resimplify (lseq, valueize);
>   return true;
> }
>
> But res_op->resimplify returns false, and doesn't end up adding to lseq.

There's nothing to add to lseq since there's also nothing to resimplify.
The only thing that could happen is that the replacement is not done
because replace_stmt_with_simplification via maybe_push_res_to_seq
doesn't pass the builtin_decl_implicit test:

  /* Find the function we want to call.  */
  tree decl = builtin_decl_implicit (as_builtin_fn (fn));
  if (!decl)
return NULL;

btw, it did work for me since the call was present before and gimplification
should then mark the function eligible for implicit generation.

> As you suggest, should we instead handle this in fre to transform
> erfc(x) to 1 - erf(x), only when
> there's a matching erf(x) in the source ?

Note that's strictly l

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Aldy Hernandez via Gcc-patches

On Tue, Oct 19, 2021 at 8:52 AM Richard Biener
 wrote:
>
> On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez  wrote:
> >
> >
> >
> > On 10/18/21 3:41 PM, Aldy Hernandez wrote:
> >
> > > I've been experimenting with reducing the total number of threading
> > > passes, and I'd like to see if there's consensus/stomach for altering
> > > the pipeline.  Note, that the goal is to remove forward threader clients,
> > > not the other way around.  So, we should prefer to remove a VRP threader
> > > instance over a *.thread one immediately before VRP.
> > >
> > > After some playing, it looks like if we enable fully-resolving mode in
> > > the *.thread passes immediately preceeding VRP, we can remove the VRP
> > > threading passes altogether, thus removing 2 threading passes (and
> > > forward threading passes at that!).
> >
> > It occurs to me that we could also remove the threading before VRP
> > passes, and enable a fully-resolving backward threader after VRP.  I
> > haven't played with this scenario, but it should be just as good.  That
> > being said, I don't know the intricacies of why we had both pre and post
> > VRP threading passes, and if one is ideally better than the other.
>
> It was done because they were different threaders.  Since the new threader
> uses built-in VRP it shouldn't really matter whether it's before or after
> VRP _for the threading_, but it might be that if threading runs before VRP
> then VRP itself can do a better job on cleaning up the IL.

Good point.

FWIW, earlier this season I played with replacing the VRPs with evrp
instances (which fold far more conditionals) and I found that the
threaders can actually find LESS opportunities after *vrp fold away
things.  I don't know if this is a good or a bad thing.  Perhaps we
should benchmark three alternatives:

1. Mainline
2. Fully resolving threader -> VRP -> No threading.
3. No threading -> VRP -> Full resolving threader.

...and see what the actual effect is, regardless of number of threaded paths.

Speak of which, what's the blessed way of benchmarking performance
nowadays?  I've seen some PRs fly that measure some more lightweight
benchmarks (open source?) than a full blown SPEC.

>
> +  /* ?? Is this still needed.  ?? */
>/* Threading can leave many const/copy propagations in the IL.
>  Clean them up.  Instead of just copy_prop, we use ccp to
>  compute alignment and nonzero bits.  */
>
> Yes, it's still needed but not for the stated reason - the VRP
> substitution and folding stage should deal with copy/constant propagation
> but we replaced the former copy propagation with CCP to re-compute
> nonzero bits & alignment so I'd change the comment to
>
>/* Run CCP to compute alignment and nonzero bits.  */

Ahh..

There's another similar comment after DOM.  Is this comment still relevant?

  NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
  /* Threading can leave many const/copy propagations in the IL.
 Clean them up.  Failure to do so well can lead to false
 positives from warnings for erroneous code.  */
  NEXT_PASS (pass_copy_prop);
  /* Identify paths that should never be executed in a conforming
 program and isolate those paths.  */

Thanks.
Aldy

[PATCH v2] tree-object-size: Make unknown a computation

2021-10-19 Thread Siddhesh Poyarekar

Compute the unknown size value as a function of the min/max bit of
object_size_type.  This transforms into a neat little branchless
sequence on x86_64:

movl%edi, %eax
sarl%eax
xorl$1, %eax
negl%eax
cltq

which should be faster than loading the value from memory.  A quick
unscientific test using

`time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`

shaves about half a second off execution time with this.  Also simplify
implementation of unknown_object_size.

gcc/ChangeLog:

* tree-object-size.c (unknown): Make into a function.  Adjust
all uses.
(unknown_object_size): Simplify implementation.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/tree-object-size.c | 100 ++---
 1 file changed, 43 insertions(+), 57 deletions(-)

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 46a976dfe10..4334e05ef70 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -45,13 +45,6 @@ struct object_size_info
   unsigned int *stack, *tos;
 };
 
-static const unsigned HOST_WIDE_INT unknown[4] = {
-  HOST_WIDE_INT_M1U,
-  HOST_WIDE_INT_M1U,
-  0,
-  0
-};
-
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
  const_tree, int, unsigned HOST_WIDE_INT *);
@@ -82,6 +75,11 @@ static bitmap computed[4];
 /* Maximum value of offset we consider to be addition.  */
 static unsigned HOST_WIDE_INT offset_limit;
 
+static inline unsigned HOST_WIDE_INT
+unknown (int object_size_type)
+{
+  return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1));
+}
 
 /* Initialize OFFSET_LIMIT variable.  */
 static void
@@ -204,7 +202,7 @@ decl_init_size (tree decl, bool min)
 
 /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR.
OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
-   If unknown, return unknown[object_size_type].  */
+   If unknown, return unknown (object_size_type).  */
 
 static bool
 addr_object_size (struct object_size_info *osi, const_tree ptr,
@@ -216,7 +214,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 
   /* Set to unknown and overwrite just before returning if the size
  could be determined.  */
-  *psize = unknown[object_size_type];
+  *psize = unknown (object_size_type);
 
   pt_var = TREE_OPERAND (ptr, 0);
   while (handled_component_p (pt_var))
@@ -244,9 +242,9 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
SSA_NAME_VERSION (var)))
sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)];
  else
-   sz = unknown[object_size_type];
+   sz = unknown (object_size_type);
}
-  if (sz != unknown[object_size_type])
+  if (sz != unknown (object_size_type))
{
  offset_int mem_offset;
  if (mem_ref_offset (pt_var).is_constant (&mem_offset))
@@ -257,13 +255,13 @@ addr_object_size (struct object_size_info *osi, 
const_tree ptr,
  else if (wi::fits_uhwi_p (dsz))
sz = dsz.to_uhwi ();
  else
-   sz = unknown[object_size_type];
+   sz = unknown (object_size_type);
}
  else
-   sz = unknown[object_size_type];
+   sz = unknown (object_size_type);
}
 
-  if (sz != unknown[object_size_type] && sz < offset_limit)
+  if (sz != unknown (object_size_type) && sz < offset_limit)
pt_var_size = size_int (sz);
 }
   else if (DECL_P (pt_var))
@@ -445,7 +443,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 /* Compute __builtin_object_size for CALL, which is a GIMPLE_CALL.
Handles calls to functions declared with attribute alloc_size.
OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
-   If unknown, return unknown[object_size_type].  */
+   If unknown, return unknown (object_size_type).  */
 
 static unsigned HOST_WIDE_INT
 alloc_object_size (const gcall *call, int object_size_type)
@@ -459,7 +457,7 @@ alloc_object_size (const gcall *call, int object_size_type)
 calltype = gimple_call_fntype (call);
 
   if (!calltype)
-return unknown[object_size_type];
+return unknown (object_size_type);
 
   /* Set to positions of alloc_size arguments.  */
   int arg1 = -1, arg2 = -1;
@@ -479,7 +477,7 @@ alloc_object_size (const gcall *call, int object_size_type)
   || (arg2 >= 0
  && (arg2 >= (int)gimple_call_num_args (call)
  || TREE_CODE (gimple_call_arg (call, arg2)) != INTEGER_CST)))
-return unknown[object_size_type];
+return unknown (object_size_type);
 
   tree bytes = NULL_TREE;
   if (arg2 >= 0)
@@ -492,7 +490,7 @@ alloc_object_size (const gcall *call, int object_size_type)
   if (bytes && tree_fits_uhwi_p (bytes))
 return tree_to_uhwi (bytes);
 
-  return unknown[object_size_type];
+  return unknown (object_size_type

Re: [PATCH v2] tree-object-size: Make unknown a computation

2021-10-19 Thread Jakub Jelinek via Gcc-patches

On Tue, Oct 19, 2021 at 01:08:30PM +0530, Siddhesh Poyarekar wrote:
> Compute the unknown size value as a function of the min/max bit of
> object_size_type.  This transforms into a neat little branchless
> sequence on x86_64:
> 
>   movl%edi, %eax
>   sarl%eax
>   xorl$1, %eax
>   negl%eax
>   cltq
> 
> which should be faster than loading the value from memory.  A quick
> unscientific test using
> 
> `time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`
> 
> shaves about half a second off execution time with this.  Also simplify
> implementation of unknown_object_size.
> 
> gcc/ChangeLog:
> 
>   * tree-object-size.c (unknown): Make into a function.  Adjust
>   all uses.
>   (unknown_object_size): Simplify implementation.
> 
> Signed-off-by: Siddhesh Poyarekar 

Ok for trunk if it passes bootstrap/regtest.

Jakub

Re: [SVE] Adjust PR93183 test-case to compile with -march=armv8.3-a+sve

2021-10-19 Thread Richard Sandiford via Gcc-patches

Prathamesh Kulkarni  writes:
> Hi,
> The attached patch removes "-mcpu=generic+sve" from dg-options,
> because it conflicts
> with -march=armv8.3-a+sve, and resulted in:
>
> cc1: warning: switch '-mcpu=generic+sve' conflicts with
> '-march=armv8.3-a+sve' switch^M
> FAIL: gcc.target/aarch64/sve/pr93183.c (test for excess errors)
> Excess errors:
> cc1: warning: switch '-mcpu=generic+sve' conflicts with
> '-march=armv8.3-a+sve' switch
>
> Is the patch OK to commit ?

Yes, thanks.

Richard

> Thanks,
> Prathamesh
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> index 2f92224cecb..8d1ee418baf 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -mcpu=generic+sve" } */
> +/* { dg-options "-O3" } */
>  
>  typedef unsigned char uint8_t;
>

[PATCH] Add misalignment output parameter to get_load_store_type

2021-10-19 Thread Richard Biener via Gcc-patches

This makes us compute the misalignment alongside the alignment support
scheme in get_load_store_type, removing some out-of-place calls to
the DR alignment API.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-18  Richard Biener  

* tree-vect-stmts.c (get_group_load_store_type): Add
misalignment output parameter and initialize it.
(get_group_load_store_type): Likewise.
(vectorizable_store): Remove now redundant queries.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-stmts.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 5d8fe42e200..c3690769d8f 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2109,6 +2109,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
   bool masked_p, vec_load_store_type vls_type,
   vect_memory_access_type *memory_access_type,
   dr_alignment_support *alignment_support_scheme,
+  int *misalignment,
   gather_scatter_info *gs_info)
 {
   loop_vec_info loop_vinfo = dyn_cast  (vinfo);
@@ -2294,10 +2295,16 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   if (*memory_access_type == VMAT_GATHER_SCATTER
   || *memory_access_type == VMAT_ELEMENTWISE)
-*alignment_support_scheme = dr_unaligned_supported;
+{
+  *alignment_support_scheme = dr_unaligned_supported;
+  *misalignment = DR_MISALIGNMENT_UNKNOWN;
+}
   else
-*alignment_support_scheme
-  = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype);
+{
+  *alignment_support_scheme
+   = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype);
+  *misalignment = dr_misalignment (first_dr_info, vectype);
+}
 
   if (vls_type != VLS_LOAD && first_stmt_info == stmt_info)
 {
@@ -2337,7 +2344,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
storing it in *MEMORY_ACCESS_TYPE if so.  If we decide to use gathers
or scatters, fill in GS_INFO accordingly.  In addition
*ALIGNMENT_SUPPORT_SCHEME is filled out and false is returned if
-   the target does not support the alignment scheme.
+   the target does not support the alignment scheme.  *MISALIGNMENT
+   is set according to the alignment of the access (including
+   DR_MISALIGNMENT_UNKNOWN when it is unknown).
 
SLP says whether we're performing SLP rather than loop vectorization.
MASKED_P is true if the statement is conditional on a vectorized mask.
@@ -2351,10 +2360,12 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
 unsigned int ncopies,
 vect_memory_access_type *memory_access_type,
 dr_alignment_support *alignment_support_scheme,
+int *misalignment,
 gather_scatter_info *gs_info)
 {
   loop_vec_info loop_vinfo = dyn_cast  (vinfo);
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  *misalignment = DR_MISALIGNMENT_UNKNOWN;
   if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 {
   *memory_access_type = VMAT_GATHER_SCATTER;
@@ -2402,7 +2413,8 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
   if (!get_group_load_store_type (vinfo, stmt_info, vectype, slp_node,
  masked_p,
  vls_type, memory_access_type,
- alignment_support_scheme, gs_info))
+ alignment_support_scheme,
+ misalignment, gs_info))
return false;
 }
   else if (STMT_VINFO_STRIDED_P (stmt_info))
@@ -2439,6 +2451,8 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
= vect_supportable_dr_alignment (vinfo,
 STMT_VINFO_DR_INFO (stmt_info),
 vectype);
+ *misalignment = dr_misalignment (STMT_VINFO_DR_INFO (stmt_info),
+  vectype);
}
 }
 
@@ -7337,9 +7351,10 @@ vectorizable_store (vec_info *vinfo,
 
   vect_memory_access_type memory_access_type;
   enum dr_alignment_support alignment_support_scheme;
+  int misalignment;
   if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, 
vls_type,
ncopies, &memory_access_type,
-   &alignment_support_scheme, &gs_info))
+   &alignment_support_scheme, &misalignment, &gs_info))
 return false;
 
   if (mask)
@@ -8218,14 +8233,13 @@ vectorizable_store (vec_info *vinfo,
  gcc_assert (aligned_access_p (first_dr_info, vectype));
  misalign = 0;
}
- else if (dr_misalignment

Re: [SVE] Adjust PR93183 test-case to compile with -march=armv8.3-a+sve

2021-10-19 Thread Prathamesh Kulkarni via Gcc-patches

On Tue, 19 Oct 2021 at 13:32, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > The attached patch removes "-mcpu=generic+sve" from dg-options,
> > because it conflicts
> > with -march=armv8.3-a+sve, and resulted in:
> >
> > cc1: warning: switch '-mcpu=generic+sve' conflicts with
> > '-march=armv8.3-a+sve' switch^M
> > FAIL: gcc.target/aarch64/sve/pr93183.c (test for excess errors)
> > Excess errors:
> > cc1: warning: switch '-mcpu=generic+sve' conflicts with
> > '-march=armv8.3-a+sve' switch
> >
> > Is the patch OK to commit ?
>
> Yes, thanks.
Thanks, committed as 6b4c18b98127087d7f14062b81bc678f0589cd36.

Thanks,
Prathamesh
>
> Richard
>
> > Thanks,
> > Prathamesh
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> > index 2f92224cecb..8d1ee418baf 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3 -mcpu=generic+sve" } */
> > +/* { dg-options "-O3" } */
> >
> >  typedef unsigned char uint8_t;
> >

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Richard Biener via Gcc-patches

On Tue, Oct 19, 2021 at 9:33 AM Aldy Hernandez  wrote:
>
> On Tue, Oct 19, 2021 at 8:52 AM Richard Biener
>  wrote:
> >
> > On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez  wrote:
> > >
> > >
> > >
> > > On 10/18/21 3:41 PM, Aldy Hernandez wrote:
> > >
> > > > I've been experimenting with reducing the total number of threading
> > > > passes, and I'd like to see if there's consensus/stomach for altering
> > > > the pipeline.  Note, that the goal is to remove forward threader 
> > > > clients,
> > > > not the other way around.  So, we should prefer to remove a VRP threader
> > > > instance over a *.thread one immediately before VRP.
> > > >
> > > > After some playing, it looks like if we enable fully-resolving mode in
> > > > the *.thread passes immediately preceeding VRP, we can remove the VRP
> > > > threading passes altogether, thus removing 2 threading passes (and
> > > > forward threading passes at that!).
> > >
> > > It occurs to me that we could also remove the threading before VRP
> > > passes, and enable a fully-resolving backward threader after VRP.  I
> > > haven't played with this scenario, but it should be just as good.  That
> > > being said, I don't know the intricacies of why we had both pre and post
> > > VRP threading passes, and if one is ideally better than the other.
> >
> > It was done because they were different threaders.  Since the new threader
> > uses built-in VRP it shouldn't really matter whether it's before or after
> > VRP _for the threading_, but it might be that if threading runs before VRP
> > then VRP itself can do a better job on cleaning up the IL.
>
> Good point.
>
> FWIW, earlier this season I played with replacing the VRPs with evrp
> instances (which fold far more conditionals) and I found that the
> threaders can actually find LESS opportunities after *vrp fold away
> things.  I don't know if this is a good or a bad thing.

Probably a sign that either threading theads stuff that's pointless
(does not consider conditions on the path that always evaluate false?)
or that after VRP and removing redundant conditions blocks are now
bigger and we run into some --param limit (that would suggest the
limits behave odd if it would thread when we'd artificially split blocks).

Examples might be interesting to look at to understand what's going "wrong".

> Perhaps we
> should benchmark three alternatives:
>
> 1. Mainline
> 2. Fully resolving threader -> VRP -> No threading.
> 3. No threading -> VRP -> Full resolving threader.
>
> ...and see what the actual effect is, regardless of number of threaded paths.

As said, only 2. makes "sense" to me unless examples show why we really
have the usual pass ordering issue.  As said, I think threading exposes new
VRP (esp. constant/copy prop) opportunities but VRP shouldn't expose new
threading opportunities.

> Speak of which, what's the blessed way of benchmarking performance
> nowadays?  I've seen some PRs fly that measure some more lightweight
> benchmarks (open source?) than a full blown SPEC.

The answer is SPEC.  The other answer would be GCC bootstrap time
and resulting code size.

> >
> > +  /* ?? Is this still needed.  ?? */
> >/* Threading can leave many const/copy propagations in the IL.
> >  Clean them up.  Instead of just copy_prop, we use ccp to
> >  compute alignment and nonzero bits.  */
> >
> > Yes, it's still needed but not for the stated reason - the VRP
> > substitution and folding stage should deal with copy/constant propagation
> > but we replaced the former copy propagation with CCP to re-compute
> > nonzero bits & alignment so I'd change the comment to
> >
> >/* Run CCP to compute alignment and nonzero bits.  */
>
> Ahh..
>
> There's another similar comment after DOM.  Is this comment still relevant?

Yes, since DOM still does threading the threaded paths lack const/copy
propagation.

>   NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
>   /* Threading can leave many const/copy propagations in the IL.
>  Clean them up.  Failure to do so well can lead to false
>  positives from warnings for erroneous code.  */
>   NEXT_PASS (pass_copy_prop);
>   /* Identify paths that should never be executed in a conforming
>  program and isolate those paths.  */
>
> Thanks.
> Aldy
>

[COMMITTED] Change threading comment before pass_ccp pass.

2021-10-19 Thread Aldy Hernandez via Gcc-patches

As suggested.  Thanks.

gcc/ChangeLog:

* passes.def: Change threading comment before pass_ccp pass.
---
 gcc/passes.def | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index c11c237f6d2..4c54176328b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -339,9 +339,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
   NEXT_PASS (pass_vrp_threader);
-  /* Threading can leave many const/copy propagations in the IL.
-Clean them up.  Instead of just copy_prop, we use ccp to
-compute alignment and nonzero bits.  */
+  /* Run CCP to compute alignment and nonzero bits.  */
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_warn_restrict);
   NEXT_PASS (pass_dse);
-- 
2.31.1

Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-19 Thread Martin Liška


On 10/15/21 17:24, Martin Liška wrote:

P.S. Next time, please CC me.


All right, I tested the updated patch and it works fine
on ppc64le-linux-gnu.

May I install it?
Thanks,
Martin

Re: PATCH, rs6000] Optimization for vec_xl_sext

2021-10-19 Thread HAO CHEN GUI via Gcc-patches


Committed as r12-4494. Thanks to all of you.

Gui Haochen

On 15/10/2021 上午 2:53, David Edelsohn wrote:

On Thu, Oct 14, 2021 at 2:17 AM HAO CHEN GUI  wrote:

Hi,

The patch optimizes the code generation for vec_xl_sext builtin. Now all 
the sign extensions are done on VSX registers directly.

Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
okay for trunk? Any recommendations? Thanks a lot.

I refined the patch according to Bill and David's advice. I put the 
patch.diff and ChangeLog in attachment also in case the indentation doesn't 
show correctly in email body.


ChangeLog

2021-10-11 Haochen Gui 


gcc/

* config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):

Modify the expansion for sign extension. All extensions are done

within VSX registers.


gcc/testsuite/

* gcc.target/powerpc/p10_vec_xl_sext.c: New test.

This is okay.

Thanks, David

[PATCH] Adjust testcase for O2 vectorization.

2021-10-19 Thread liuhongt via Gcc-patches

updated patch:
  1. Add documents in doc/sourcebuild.texi (Effective-Target Keywords).
  2. Reduce -novec.c testcases to contain only new failed parted which
is caused by O2 vectorization.
  3. Add PR in dg-warning comment.

As discussed in [1], this patch add xfail/target selector to those
testcases, also make a copy of them so that they can be tested w/o
vectorization.

Newly added xfail/target selectors are used to check the vectorization
capability of continuous byte/double bytes storage, these scenarios
are exactly the part of the testcases that regressed after O2
vectorization.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581456.html.

gcc/ChangeLog

* doc/sourcebuild.texi (Effective-Target Keywords): Document
vect_slp_v2qi_store, vect_slp_v4qi_store, vect_slp_v8qi_store,
vect_slp_v16qi_store, vect_slp_v2hi_store,
vect_slp_v4hi_store, vect_slp_v2si_store, vect_slp_v4si_store.

gcc/testsuite/ChangeLog

PR middle-end/102722
PR middle-end/102697
PR middle-end/102462
PR middle-end/102706
PR middle-end/102744
* c-c++-common/Wstringop-overflow-2.c: Adjust testcase with new
xfail/target selector.
* gcc.dg/Warray-bounds-51.c: Ditto.
* gcc.dg/Warray-parameter-3.c: Ditto.
* gcc.dg/Wstringop-overflow-14.c: Ditto.
* gcc.dg/Wstringop-overflow-21.c: Ditto.
* gcc.dg/Wstringop-overflow-68.c: Ditto.
* gcc.dg/Wstringop-overflow-76.c: Ditto.
* gcc.dg/Warray-bounds-48.c: Ditto.
* gcc.dg/Wzero-length-array-bounds-2.c: Ditto.
* lib/target-supports.exp (check_vect_slp_aligned_store_usage):
New function.
(check_effective_target_vect_slp_v2qi_store): Ditto.
(check_effective_target_vect_slp_v4qi_store): Ditto.
(check_effective_target_vect_slp_v8qi_store): Ditto.
(check_effective_target_vect_slp_v16qi_store): Ditto.
(check_effective_target_vect_slp_v2hi_store): Ditto.
(check_effective_target_vect_slp_v4hi_store): Ditto.
(check_effective_target_vect_slp_v2si_store): Ditto.
(check_effective_target_vect_slp_v4si_store): Ditto.
* c-c++-common/Wstringop-overflow-2-novec.c: New test.
* gcc.dg/Warray-bounds-51-novec.c: New test.
* gcc.dg/Warray-bounds-48-novec.c: New test.
* gcc.dg/Warray-parameter-3-novec.c: New test.
* gcc.dg/Wstringop-overflow-14-novec.c: New test.
* gcc.dg/Wstringop-overflow-21-novec.c: New test.
* gcc.dg/Wstringop-overflow-76-novec.c: New test.
* gcc.dg/Wzero-length-array-bounds-2-novec.c: New test.
---
 gcc/doc/sourcebuild.texi  |  32 ++
 .../c-c++-common/Wstringop-overflow-2-novec.c | 126 ++
 .../c-c++-common/Wstringop-overflow-2.c   |  20 +-
 gcc/testsuite/gcc.dg/Warray-bounds-48-novec.c | 364 ++
 gcc/testsuite/gcc.dg/Warray-bounds-48.c   |   4 +-
 gcc/testsuite/gcc.dg/Warray-bounds-51-novec.c |  21 +
 gcc/testsuite/gcc.dg/Warray-bounds-51.c   |   2 +-
 .../gcc.dg/Warray-parameter-3-novec.c |  16 +
 gcc/testsuite/gcc.dg/Warray-parameter-3.c |   2 +-
 .../gcc.dg/Wstringop-overflow-14-novec.c  |  16 +
 gcc/testsuite/gcc.dg/Wstringop-overflow-14.c  |   4 +-
 .../gcc.dg/Wstringop-overflow-21-novec.c  |  34 ++
 gcc/testsuite/gcc.dg/Wstringop-overflow-21.c  |   8 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-68.c  |  17 +-
 .../gcc.dg/Wstringop-overflow-76-novec.c  |  88 +
 gcc/testsuite/gcc.dg/Wstringop-overflow-76.c  |  18 +-
 .../Wzero-length-array-bounds-2-novec.c   |  45 +++
 .../gcc.dg/Wzero-length-array-bounds-2.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp | 182 +
 19 files changed, 967 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wstringop-overflow-2-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-48-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-51-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Warray-parameter-3-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstringop-overflow-14-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstringop-overflow-21-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Wstringop-overflow-76-novec.c
 create mode 100644 gcc/testsuite/gcc.dg/Wzero-length-array-bounds-2-novec.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b1fffd5e90f..6a165767630 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1845,6 +1845,38 @@ Target supports loop vectorization with partial vectors 
and
 @item vect_partial_vectors
 Target supports loop vectorization with partial vectors and
 @code{vect-partial-vector-usage} is nonzero.
+
+@item vect_slp_v2qi_store
+Target supports vectorization of 2-byte char stores with 2-byte aligned
+address at plain @option{-O2}.
+
+@item vect_slp_v4qi_store
+Target supports vectorization of 4-byte char stores with 4-byte aligned
+address at plai

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Aldy Hernandez via Gcc-patches





On 10/19/21 10:40 AM, Richard Biener wrote:

On Tue, Oct 19, 2021 at 9:33 AM Aldy Hernandez  wrote:


On Tue, Oct 19, 2021 at 8:52 AM Richard Biener
 wrote:


On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez  wrote:




On 10/18/21 3:41 PM, Aldy Hernandez wrote:


I've been experimenting with reducing the total number of threading
passes, and I'd like to see if there's consensus/stomach for altering
the pipeline.  Note, that the goal is to remove forward threader clients,
not the other way around.  So, we should prefer to remove a VRP threader
instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceeding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).


It occurs to me that we could also remove the threading before VRP
passes, and enable a fully-resolving backward threader after VRP.  I
haven't played with this scenario, but it should be just as good.  That
being said, I don't know the intricacies of why we had both pre and post
VRP threading passes, and if one is ideally better than the other.


It was done because they were different threaders.  Since the new threader
uses built-in VRP it shouldn't really matter whether it's before or after
VRP _for the threading_, but it might be that if threading runs before VRP
then VRP itself can do a better job on cleaning up the IL.


Good point.

FWIW, earlier this season I played with replacing the VRPs with evrp
instances (which fold far more conditionals) and I found that the
threaders can actually find LESS opportunities after *vrp fold away
things.  I don't know if this is a good or a bad thing.


Probably a sign that either threading theads stuff that's pointless
(does not consider conditions on the path that always evaluate false?)


At least in the backward threader, we don't keep looking back if we can 
resolve the conditional at the end of an in-progress path, so it's 
certainly possible we thread paths that are unreachable.  I'm pretty 
sure that's also possible in the forward threader.


For example, we if we have a candidate path that ends in x > 1234 and we 
know on entry to the path that x is [2000,3000], there's no need to 
chase further back to see if the path itself is reachable.



or that after VRP and removing redundant conditions blocks are now
bigger and we run into some --param limit (that would suggest the
limits behave odd if it would thread when we'd artificially split blocks).

Examples might be interesting to look at to understand what's going "wrong".


Perhaps we
should benchmark three alternatives:

1. Mainline
2. Fully resolving threader -> VRP -> No threading.
3. No threading -> VRP -> Full resolving threader.

...and see what the actual effect is, regardless of number of threaded paths.


As said, only 2. makes "sense" to me unless examples show why we really
have the usual pass ordering issue.  As said, I think threading exposes new
VRP (esp. constant/copy prop) opportunities but VRP shouldn't expose new
threading opportunities.


Excellent!  This saves me time, as I've mostly played with option #2 ;-).

Thanks.
Aldy

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-19 Thread Richard Biener via Gcc-patches

On Fri, Oct 15, 2021 at 5:22 PM Martin Liška  wrote:
>
> All right, and there's second part that moves the code
> from toplev.c to opts.c (finish_options) as I've done in the original version.
>
> The patch also handles PR102766 where nvptx.c target sets:
> debug_nonbind_markers_p = 0;
>
> So the easiest approach is marking the flag as set in global_options_set,
> I haven't found a better approach :/ Reason is that nvptx_option_override
> is called before finish_options.

So currently nvptx_option_override is called before we do this code
blob (it's called at the beginning of process_options).

Why's the solution not to move this setting to finish_options as well?
(and disabling it along var-tracking when we end with no -g in process_options)

IMHO the target should have the last say, so the hook should be invoked
after we are finished overriding stuff and finish_options should be
_only_ doing diagnostics and disabling stuff we cannot handle.

Richard.

> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin

[PATCH][aarch64] target: Support whitespaces in target attr/pragma.

2021-10-19 Thread Martin Liška


Hello.

The patch does the same as g:df592811f950301ed3b10a08e476dad0f2eff26a for 
aarch64.

Tested locally with cross compiler.

Ready for master?
Thanks,
Martin

PR target/102375

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_process_one_target_attr):
Strip whitespaces.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr102375.c: New test.
---
 gcc/config/aarch64/aarch64.c| 1 +
 gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 
 2 files changed, 5 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 76d99d247ae..4c3e491ab14 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17624,6 +17624,7 @@ aarch64_process_one_target_attr (char *arg_str)
   bool invert = false;
 
   size_t len = strlen (arg_str);

+  arg_str = strip_whitespaces (arg_str, &len);
 
   if (len == 0)

 {
diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c 
b/gcc/testsuite/gcc.target/aarch64/pr102375.c
new file mode 100644
index 000..fa75d319b2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c
@@ -0,0 +1,4 @@
+/* PR target/102375 */
+/* { dg-do compile } */
+
+void calculate(void) __attribute__ ((target ("+sve, +sve2")));
--
2.33.1

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Aldy Hernandez via Gcc-patches

On Tue, Oct 19, 2021 at 11:06 AM Aldy Hernandez  wrote:
>
>
>
> On 10/19/21 10:40 AM, Richard Biener wrote:
> > On Tue, Oct 19, 2021 at 9:33 AM Aldy Hernandez  wrote:
> >>
> >> On Tue, Oct 19, 2021 at 8:52 AM Richard Biener
> >>  wrote:
> >>>
> >>> On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez  wrote:
> 
> 
> 
>  On 10/18/21 3:41 PM, Aldy Hernandez wrote:
> 
> > I've been experimenting with reducing the total number of threading
> > passes, and I'd like to see if there's consensus/stomach for altering
> > the pipeline.  Note, that the goal is to remove forward threader 
> > clients,
> > not the other way around.  So, we should prefer to remove a VRP threader
> > instance over a *.thread one immediately before VRP.
> >
> > After some playing, it looks like if we enable fully-resolving mode in
> > the *.thread passes immediately preceeding VRP, we can remove the VRP
> > threading passes altogether, thus removing 2 threading passes (and
> > forward threading passes at that!).
> 
>  It occurs to me that we could also remove the threading before VRP
>  passes, and enable a fully-resolving backward threader after VRP.  I
>  haven't played with this scenario, but it should be just as good.  That
>  being said, I don't know the intricacies of why we had both pre and post
>  VRP threading passes, and if one is ideally better than the other.
> >>>
> >>> It was done because they were different threaders.  Since the new threader
> >>> uses built-in VRP it shouldn't really matter whether it's before or after
> >>> VRP _for the threading_, but it might be that if threading runs before VRP
> >>> then VRP itself can do a better job on cleaning up the IL.
> >>
> >> Good point.
> >>
> >> FWIW, earlier this season I played with replacing the VRPs with evrp
> >> instances (which fold far more conditionals) and I found that the
> >> threaders can actually find LESS opportunities after *vrp fold away
> >> things.  I don't know if this is a good or a bad thing.
> >
> > Probably a sign that either threading theads stuff that's pointless
> > (does not consider conditions on the path that always evaluate false?)
>
> At least in the backward threader, we don't keep looking back if we can
> resolve the conditional at the end of an in-progress path, so it's
> certainly possible we thread paths that are unreachable.  I'm pretty
> sure that's also possible in the forward threader.
>
> For example, we if we have a candidate path that ends in x > 1234 and we
> know on entry to the path that x is [2000,3000], there's no need to
> chase further back to see if the path itself is reachable.

For that matter, when I was working on replacing the DOM threader, I
found out that the forward threader + evrp routinely tried to thread
paths that were unreachable, and I had to trim them from my comparison
tally.  The new backward threader engine suffers less from this,
because if there is an UNDEFINED range as part of the in-path
calculation, we can trim the path as unreachable (and avoid further
searches in that direction).  However, as I said, if the range is
known on entry, we do no further lookups and happily thread away.

Aldy

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-19 Thread Martin Liška


On 10/19/21 11:12, Richard Biener wrote:

On Fri, Oct 15, 2021 at 5:22 PM Martin Liška  wrote:


All right, and there's second part that moves the code
from toplev.c to opts.c (finish_options) as I've done in the original version.

The patch also handles PR102766 where nvptx.c target sets:
debug_nonbind_markers_p = 0;

So the easiest approach is marking the flag as set in global_options_set,
I haven't found a better approach :/ Reason is that nvptx_option_override
is called before finish_options.


So currently nvptx_option_override is called before we do this code
blob (it's called at the beginning of process_options).


Yes, happens early in process_options.



Why's the solution not to move this setting to finish_options as well?


I would like to, but the option detection depends on target hooks and some 
debuginfo
functionality, leading to:

g++ -no-pie   -g   -DIN_GCC -fPIC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -static-libstdc++ 
-static-libgcc   -o Tlto-wrapper \
   lto-wrapper.o collect-utils.o ggc-none.o libcommon-target.a libcommon.a 
../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
../libiberty/pic/libiberty.a ../libdecnumber/libdecnumber.a
/home/marxin/Programming/gcc/gcc/opts.c:1382: error: undefined reference to 
'dwarf2out_default_as_loc_support()'
/home/marxin/Programming/gcc/gcc/opts.c:1384: error: undefined reference to 
'dwarf2out_default_as_locview_support()'
/home/marxin/Programming/gcc/gcc/opts.c:1409: error: undefined reference to 
'targetm'

Or can we do better?


(and disabling it along var-tracking when we end with no -g in process_options)

IMHO the target should have the last say, so the hook should be invoked
after we are finished overriding stuff and finish_options should be
_only_ doing diagnostics and disabling stuff we cannot handle.


Well, that sounds good, but we are quite far from that :/

Martin



Richard.


Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

[PATCH] tree-optimization/102827 - avoid stmts in preheader

2021-10-19 Thread Richard Biener via Gcc-patches

The PR shows that when carefully crafting the runtime alias
condition in the vectorizer we might end up using defs from
the loop preheader but will end up inserting the condition
before the .LOOP_VECTORIZED call.  So the following makes
sure to insert invariants before that when we versioned the
loop, preserving the invariant the vectorizer relies on.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-19  Richard Biener  

PR tree-optimization/102827
* tree-if-conv.c (predicate_statements): Add pe parameter
and use that edge to insert invariant stmts on.
(combine_blocks): Pass through pe.
(tree_if_conversion): Compute the edge to insert invariant
stmts on and pass it along.

* gcc.dg/pr102827.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr102827.c | 13 +
 gcc/tree-if-conv.c  | 21 +++--
 2 files changed, 28 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102827.c

diff --git a/gcc/testsuite/gcc.dg/pr102827.c b/gcc/testsuite/gcc.dg/pr102827.c
new file mode 100644
index 000..eed3eba32d1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102827.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize --param ssa-name-def-chain-limit=0" } */
+/* { dg-additional-options "-mavx" { target { x86_64-*-* i?86-*-* } } } */
+
+void
+test_double_double_nugt_var (double *dest, double *src, int b, int i)
+{
+  while (i < 1)
+{
+  dest[i] = b ? src[i] : 0.0;
+  ++i;
+}
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 15dcc1e2b94..b165dc0c17f 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2490,7 +2490,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
 */
 
 static void
-predicate_statements (loop_p loop)
+predicate_statements (loop_p loop, edge pe)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
   auto_vec vect_sizes;
@@ -2596,8 +2596,7 @@ predicate_statements (loop_p loop)
  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
  && expr_invariant_in_loop_p (loop,
   gimple_assign_rhs1 (stmt2)))
-   gsi_insert_on_edge_immediate (loop_preheader_edge (loop),
- stmt2);
+   gsi_insert_on_edge_immediate (pe, stmt2);
  else if (first)
{
  gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
@@ -2679,7 +2678,7 @@ remove_conditions_and_labels (loop_p loop)
blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (class loop *loop)
+combine_blocks (class loop *loop, edge pe)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -2692,7 +2691,7 @@ combine_blocks (class loop *loop)
   predicate_all_scalar_phis (loop);
 
   if (need_to_predicate || need_to_rewrite_undefined)
-predicate_statements (loop);
+predicate_statements (loop, pe);
 
   /* Merge basic blocks.  */
   exit_bb = NULL;
@@ -3187,6 +3186,7 @@ tree_if_conversion (class loop *loop, vec 
*preds)
   bool aggressive_if_conv;
   class loop *rloop;
   bitmap exit_bbs;
+  edge pe;
 
  again:
   rloop = NULL;
@@ -3218,6 +3218,9 @@ tree_if_conversion (class loop *loop, vec 
*preds)
  || loop->dont_vectorize))
 goto cleanup;
 
+  /* The edge to insert invariant stmts on.  */
+  pe = loop_preheader_edge (loop);
+
   /* Since we have no cost model, always version loops unless the user
  specified -ftree-loop-if-convert or unless versioning is required.
  Either version this loop, or if the pattern is right for outer-loop
@@ -3255,12 +3258,18 @@ tree_if_conversion (class loop *loop, vec 
*preds)
  gcc_assert (nloop->inner && nloop->inner->next == NULL);
  rloop = nloop->inner;
}
+  else
+   /* If we versioned loop then make sure to insert invariant
+  stmts before the .LOOP_VECTORIZED check since the vectorizer
+  will re-use that for things like runtime alias versioning
+  whose condition can end up using those invariants.  */
+   pe = single_pred_edge (gimple_bb (preds->last ()));
 }
 
   /* Now all statements are if-convertible.  Combine all the basic
  blocks into one huge basic block doing the if-conversion
  on-the-fly.  */
-  combine_blocks (loop);
+  combine_blocks (loop, pe);
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
  and stores are involved.  CSE only the loop body, not the entry
-- 
2.31.1

[PATCH] Compute negative offset in get_load_store_type

2021-10-19 Thread Richard Biener via Gcc-patches

This moves the computation of a negative offset that needs to be
applied when we vectorize a negative stride access to
get_load_store_type alongside where we compute the actual access
method.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-19  Richard Biener  

* tree-vect-stmts.c (get_negative_load_store_type): Add
offset output parameter and initialize it.
(get_group_load_store_type): Likewise.
(get_load_store_type): Likewise.
(vectorizable_store): Use offset as computed by
get_load_store_type.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-stmts.c | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index c3690769d8f..09a97b44c91 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1967,13 +1967,14 @@ perm_mask_for_reverse (tree vectype)
 
 /* A subroutine of get_load_store_type, with a subset of the same
arguments.  Handle the case where STMT_INFO is a load or store that
-   accesses consecutive elements with a negative step.  */
+   accesses consecutive elements with a negative step.  Sets *POFFSET
+   to the offset to be applied to the DR for the first access.  */
 
 static vect_memory_access_type
 get_negative_load_store_type (vec_info *vinfo,
  stmt_vec_info stmt_info, tree vectype,
  vec_load_store_type vls_type,
- unsigned int ncopies)
+ unsigned int ncopies, poly_int64 *poffset)
 {
   dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
   dr_alignment_support alignment_support_scheme;
@@ -2003,6 +2004,7 @@ get_negative_load_store_type (vec_info *vinfo,
dump_printf_loc (MSG_NOTE, vect_location,
 "negative step with invariant source;"
 " no permute needed.\n");
+  *poffset = -TYPE_VECTOR_SUBPARTS (vectype) + 1;
   return VMAT_CONTIGUOUS_DOWN;
 }
 
@@ -2014,6 +2016,7 @@ get_negative_load_store_type (vec_info *vinfo,
   return VMAT_ELEMENTWISE;
 }
 
+  *poffset = -TYPE_VECTOR_SUBPARTS (vectype) + 1;
   return VMAT_CONTIGUOUS_REVERSE;
 }
 
@@ -2108,6 +2111,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
   tree vectype, slp_tree slp_node,
   bool masked_p, vec_load_store_type vls_type,
   vect_memory_access_type *memory_access_type,
+  poly_int64 *poffset,
   dr_alignment_support *alignment_support_scheme,
   int *misalignment,
   gather_scatter_info *gs_info)
@@ -2210,7 +2214,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
/* ???  The VMAT_CONTIGUOUS_REVERSE code generation is
   only correct for single element "interleaving" SLP.  */
*memory_access_type = get_negative_load_store_type
-  (vinfo, stmt_info, vectype, vls_type, 1);
+(vinfo, stmt_info, vectype, vls_type, 1, poffset);
  else
{
  /* Try to use consecutive accesses of DR_GROUP_SIZE elements,
@@ -2359,6 +2363,7 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
 bool masked_p, vec_load_store_type vls_type,
 unsigned int ncopies,
 vect_memory_access_type *memory_access_type,
+poly_int64 *poffset,
 dr_alignment_support *alignment_support_scheme,
 int *misalignment,
 gather_scatter_info *gs_info)
@@ -2366,6 +2371,7 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
   loop_vec_info loop_vinfo = dyn_cast  (vinfo);
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   *misalignment = DR_MISALIGNMENT_UNKNOWN;
+  *poffset = 0;
   if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 {
   *memory_access_type = VMAT_GATHER_SCATTER;
@@ -2412,7 +2418,7 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
 {
   if (!get_group_load_store_type (vinfo, stmt_info, vectype, slp_node,
  masked_p,
- vls_type, memory_access_type,
+ vls_type, memory_access_type, poffset,
  alignment_support_scheme,
  misalignment, gs_info))
return false;
@@ -2444,7 +2450,7 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
{
  if (cmp < 0)
*memory_access_type = get_negative_load_store_type
-  (vinfo, stmt_info, vectype, vls_type, ncopies);
+  (vinfo, stmt_info, vectype, vls_ty

[committed] libstdc++: Fix mem-initializer in std::move_only_function [PR102825]

2021-10-19 Thread Jonathan Wakely via Gcc-patches

libstdc++-v3/ChangeLog:

PR libstdc++/102825
* include/bits/mofunc_impl.h (move_only_function): Remove
invalid base initializer.
* testsuite/20_util/move_only_function/cons.cc: Instantiate
constructors to check bodies.

Tested powerpc64le-linux. Committed to trunk.

commit 9890b12c72c02828c691f22198c3e0afd8678991
Author: Jonathan Wakely 
Date:   Tue Oct 19 09:16:56 2021

libstdc++: Fix mem-initializer in std::move_only_function [PR102825]

libstdc++-v3/ChangeLog:

PR libstdc++/102825
* include/bits/mofunc_impl.h (move_only_function): Remove
invalid base initializer.
* testsuite/20_util/move_only_function/cons.cc: Instantiate
constructors to check bodies.

diff --git a/libstdc++-v3/include/bits/mofunc_impl.h 
b/libstdc++-v3/include/bits/mofunc_impl.h
index 543c6f547b7..968d235f867 100644
--- a/libstdc++-v3/include/bits/mofunc_impl.h
+++ b/libstdc++-v3/include/bits/mofunc_impl.h
@@ -108,7 +108,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
move_only_function(in_place_type_t<_Tp>, initializer_list<_Up> __il,
   _Args&&... __args)
noexcept(_S_nothrow_init<_Tp, initializer_list<_Up>&, _Args...>())
-   : _Mofunc_base(nullptr), _M_invoke(&_S_invoke<_Tp>)
+   : _M_invoke(&_S_invoke<_Tp>)
{
  static_assert(is_same_v, _Tp>);
  _M_init<_Tp>(__il, std::forward<_Args>(__args)...);
diff --git a/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc 
b/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc
index 0992f107003..d8a0a4ab2b0 100644
--- a/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc
+++ b/libstdc++-v3/testsuite/20_util/move_only_function/cons.cc
@@ -96,3 +96,30 @@ static_assert( ! 
is_nothrow_constructible_v,
in_place_type_t, int> );
 static_assert( is_nothrow_constructible_v,
  in_place_type_t, int, int> );
+
+struct I {
+  I(int, const char*);
+  I(std::initializer_list);
+  int operator()() const noexcept;
+};
+
+static_assert( is_constructible_v,
+ std::in_place_type_t,
+ int, const char*> );
+static_assert( is_constructible_v,
+ std::in_place_type_t,
+ std::initializer_list> );
+
+void
+test_instantiation()
+{
+  // Instantiate the constructor bodies
+  move_only_function f0;
+  move_only_function f1(nullptr);
+  move_only_function f2( I(1, "two") );
+  move_only_function f3(std::in_place_type, 3, "four");
+  move_only_function f4(std::in_place_type, // PR libstdc++/102825
+   { 'P', 'R', '1', '0', '2', '8', '2', '5'});
+  auto f5 = std::move(f4);
+  f4 = std::move(f5);
+}

Re: [PATCH][aarch64] target: Support whitespaces in target attr/pragma.

2021-10-19 Thread Richard Sandiford via Gcc-patches

Martin Liška  writes:
> Hello.
>
> The patch does the same as g:df592811f950301ed3b10a08e476dad0f2eff26a for 
> aarch64.
>
> Tested locally with cross compiler.
>
> Ready for master?
>
> Thanks,
> Martin
>
>   PR target/102375
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.c (aarch64_process_one_target_attr):
>   Strip whitespaces.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/pr102375.c: New test.
> ---
>   gcc/config/aarch64/aarch64.c| 1 +
>   gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 
>   2 files changed, 5 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 76d99d247ae..4c3e491ab14 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -17624,6 +17624,7 @@ aarch64_process_one_target_attr (char *arg_str)
> bool invert = false;
>   
> size_t len = strlen (arg_str);
> +  arg_str = strip_whitespaces (arg_str, &len);

It looks like this ought to happen after the alloca and copy, since it
modifies the string.

Thanks,
Richard

>   
> if (len == 0)
>   {
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c 
> b/gcc/testsuite/gcc.target/aarch64/pr102375.c
> new file mode 100644
> index 000..fa75d319b2d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c
> @@ -0,0 +1,4 @@
> +/* PR target/102375 */
> +/* { dg-do compile } */
> +
> +void calculate(void) __attribute__ ((target ("+sve, +sve2")));

Re: [PATCH] options: Fix variable tracking option processing.

2021-10-19 Thread Richard Biener via Gcc-patches

On Tue, Oct 19, 2021 at 11:34 AM Martin Liška  wrote:
>
> On 10/19/21 11:12, Richard Biener wrote:
> > On Fri, Oct 15, 2021 at 5:22 PM Martin Liška  wrote:
> >>
> >> All right, and there's second part that moves the code
> >> from toplev.c to opts.c (finish_options) as I've done in the original 
> >> version.
> >>
> >> The patch also handles PR102766 where nvptx.c target sets:
> >> debug_nonbind_markers_p = 0;
> >>
> >> So the easiest approach is marking the flag as set in global_options_set,
> >> I haven't found a better approach :/ Reason is that nvptx_option_override
> >> is called before finish_options.
> >
> > So currently nvptx_option_override is called before we do this code
> > blob (it's called at the beginning of process_options).
>
> Yes, happens early in process_options.
>
> >
> > Why's the solution not to move this setting to finish_options as well?
>
> I would like to, but the option detection depends on target hooks and some 
> debuginfo
> functionality, leading to:
>
> g++ -no-pie   -g   -DIN_GCC -fPIC -DCROSS_DIRECTORY_STRUCTURE   
> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag 
> -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
> -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H 
> -static-libstdc++ -static-libgcc   -o Tlto-wrapper \
> lto-wrapper.o collect-utils.o ggc-none.o libcommon-target.a libcommon.a 
> ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a 
> ../libiberty/pic/libiberty.a ../libdecnumber/libdecnumber.a
> /home/marxin/Programming/gcc/gcc/opts.c:1382: error: undefined reference to 
> 'dwarf2out_default_as_loc_support()'
> /home/marxin/Programming/gcc/gcc/opts.c:1384: error: undefined reference to 
> 'dwarf2out_default_as_locview_support()'
> /home/marxin/Programming/gcc/gcc/opts.c:1409: error: undefined reference to 
> 'targetm'
>
> Or can we do better?

Meh ... :/

Well, move the target override hook call down (try to shuffle things
so diagnostics happen after but
"inits" happen before).

> > (and disabling it along var-tracking when we end with no -g in 
> > process_options)
> >
> > IMHO the target should have the last say, so the hook should be invoked
> > after we are finished overriding stuff and finish_options should be
> > _only_ doing diagnostics and disabling stuff we cannot handle.
>
> Well, that sounds good, but we are quite far from that :/

Yes, I know...

> Martin
>
> >
> > Richard.
> >
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >> Thanks,
> >> Martin
>

Re: [COMMITTED] Change threading comment before pass_ccp pass.

2021-10-19 Thread Richard Biener via Gcc-patches

On Tue, Oct 19, 2021 at 10:48 AM Aldy Hernandez  wrote:
>
> As suggested.  Thanks.
>
> gcc/ChangeLog:
>
> * passes.def: Change threading comment before pass_ccp pass.
> ---
>  gcc/passes.def | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/gcc/passes.def b/gcc/passes.def
> index c11c237f6d2..4c54176328b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -339,9 +339,7 @@ along with GCC; see the file COPYING3.  If not see
>NEXT_PASS (pass_thread_jumps);
>NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
>NEXT_PASS (pass_vrp_threader);
> -  /* Threading can leave many const/copy propagations in the IL.
> -Clean them up.  Instead of just copy_prop, we use ccp to
> -compute alignment and nonzero bits.  */

Well, the comment is still true as long as pass_vrp_threader is there ;)

> +  /* Run CCP to compute alignment and nonzero bits.  */
>NEXT_PASS (pass_ccp, true /* nonzero_p */);
>NEXT_PASS (pass_warn_restrict);
>NEXT_PASS (pass_dse);
> --
> 2.31.1
>

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-19 Thread Prathamesh Kulkarni via Gcc-patches

On Tue, 19 Oct 2021 at 13:02, Richard Biener  wrote:
>
> On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > On Mon, 18 Oct 2021 at 17:23, Richard Biener  wrote:
> > >
> > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener  wrote:
> > > > >
> > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > > > Hi Richard,
> > > > > > > > As suggested in PR, I have attached WIP patch that adds two 
> > > > > > > > patterns
> > > > > > > > to match.pd:
> > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > >
> > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > double f(double x)
> > > > > > > > {
> > > > > > > >   double g(double, double);
> > > > > > > >
> > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > >   return g(t1, t2);
> > > > > > > > }
> > > > > > > >
> > > > > > > > with .optimized dump shows:
> > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > >
> > > > > > > > However, for the following test:
> > > > > > > > double f(double x)
> > > > > > > > {
> > > > > > > >   double g(double, double);
> > > > > > > >
> > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > >   return t1;
> > > > > > > > }
> > > > > > > >
> > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 
> > > > > > > > 1 -
> > > > > > > > erf(x) to erfc(x) again
> > > > > > > > post canonicalization.
> > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets 
> > > > > > > > applied,
> > > > > > > > but then it tries to
> > > > > > > > resimplify erfc(x), which fails post canonicalization. So we 
> > > > > > > > end up
> > > > > > > > with erfc(x) transformed to
> > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > Could you suggest how to proceed ?
> > > > > > >
> > > > > > > I applied your patch manually and it does the intended
> > > > > > > simplifications so I wonder what I am missing?
> > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when 
> > > > > > there's
> > > > > > no erf(x) in the source ?
> > > > >
> > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > is and vice versa but note both are C99 specified functions (either
> > > > > requires -lm).
> > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > >
> > > Yes, but I'm confused because you say the patch doesn't work for you?
> > The patch works for me to CSE erf/erfc pair.
> > However when there's only erfc in the source, it canonicalizes erfc(x)
> > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > erfc(x)
> > with -O3 -funsafe-math-optimizations.
> >
> > For,
> > t1 = __builtin_erfc(x),
> >
> > .optimized dump shows:
> >   _2 = __builtin_erf (x_1(D));
> >   t1_3 = 1.0e+0 - _2;
> >
> > and for,
> > double t1 = x + __builtin_erfc(x);
> >
> > .optimized dump shows:
> >   _3 = __builtin_erf (x_2(D));
> >   _7 = x_2(D) + 1.0e+0;
> >   t1_4 = _7 - _3;
> >
> > I assume in both cases, we want erfc in the code-gen instead ?
> > I think the reason uncaonicalization fails is because the pattern 1 -
> > erf(x) to erfc(x)
> > gets applied, but then it fails in resimplifying erfc(x), and we end
> > up with 1 - erf(x) in code-gen.
> >
> > From gimple-match.c, it hits the simplification:
> >
> > gimple_seq *lseq = seq;
> > if (__builtin_expect (!dbg_cnt
> > (match), 0)) goto next_after_fail1172;
> > if (__builtin_expect (dump_file &&
> > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > {
> >   res_op->set_op (CFN_BUILT_IN_ERFC, type, 
> > 1);
> >   res_op->ops[0] = captures[0];
> >   res_op->resimplify (lseq, valueize);
> >   return true;
> > }
> >
> > But res_op->resimplify returns false, and doesn't end up adding to lseq.
>
> There's nothing to add to lseq since there's also nothing to resimplify.
> The only thing that could happen is that the replacement is not done
> because replace_stmt_with_simplification via maybe_push_res_to_seq
> doesn't pass the builtin_decl_implicit test:
>
>   /* Find the function we want to call.  */
>   tree decl = builtin_decl_implicit (as_builtin_fn (fn));
>   if (!decl)
>

[PATCH] MAINTAINERS: Add myself for write after approval

2021-10-19 Thread CHIGOT, CLEMENT via Gcc-patches

ChangeLog:
2021-10-19  Clément Chigot  

* MAINTAINERS: Add myself for write after approval.
From 13ddc381ea7bde6df9e48fb968d9324564f7a540 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= 
Date: Tue, 19 Oct 2021 13:20:14 +0200
Subject: [PATCH] MAINTAINERS: Add myself for write after approval
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ChangeLog:
2021-10-19  ClÃ©ment Chigot  

	* MAINTAINERS: Add myself for write after approval.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bf4006c779f..b22f930583a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -352,6 +352,7 @@ Gabriel Charette
 Chandra Chavva	
 Dehao Chen	
 Fabien ChÃªne	
+ClÃ©ment Chigot	
 Harshit Chopra	
 Tamar Christina	
 Eric Christopher
-- 
2.25.1

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-19 Thread Richard Biener via Gcc-patches

On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:

> On Tue, 19 Oct 2021 at 13:02, Richard Biener  
> wrote:
> >
> > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> >  wrote:
> > >
> > > On Mon, 18 Oct 2021 at 17:23, Richard Biener  wrote:
> > > >
> > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > Hi Richard,
> > > > > > > > > As suggested in PR, I have attached WIP patch that adds two 
> > > > > > > > > patterns
> > > > > > > > > to match.pd:
> > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > >
> > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > double f(double x)
> > > > > > > > > {
> > > > > > > > >   double g(double, double);
> > > > > > > > >
> > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > >   return g(t1, t2);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > with .optimized dump shows:
> > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > >
> > > > > > > > > However, for the following test:
> > > > > > > > > double f(double x)
> > > > > > > > > {
> > > > > > > > >   double g(double, double);
> > > > > > > > >
> > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > >   return t1;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not 
> > > > > > > > > transform 1 -
> > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > post canonicalization.
> > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets 
> > > > > > > > > applied,
> > > > > > > > > but then it tries to
> > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we 
> > > > > > > > > end up
> > > > > > > > > with erfc(x) transformed to
> > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > Could you suggest how to proceed ?
> > > > > > > >
> > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when 
> > > > > > > there's
> > > > > > > no erf(x) in the source ?
> > > > > >
> > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > requires -lm).
> > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > >
> > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > The patch works for me to CSE erf/erfc pair.
> > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > erfc(x)
> > > with -O3 -funsafe-math-optimizations.
> > >
> > > For,
> > > t1 = __builtin_erfc(x),
> > >
> > > .optimized dump shows:
> > >   _2 = __builtin_erf (x_1(D));
> > >   t1_3 = 1.0e+0 - _2;
> > >
> > > and for,
> > > double t1 = x + __builtin_erfc(x);
> > >
> > > .optimized dump shows:
> > >   _3 = __builtin_erf (x_2(D));
> > >   _7 = x_2(D) + 1.0e+0;
> > >   t1_4 = _7 - _3;
> > >
> > > I assume in both cases, we want erfc in the code-gen instead ?
> > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > erf(x) to erfc(x)
> > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > up with 1 - erf(x) in code-gen.
> > >
> > > From gimple-match.c, it hits the simplification:
> > >
> > > gimple_seq *lseq = seq;
> > > if (__builtin_expect (!dbg_cnt
> > > (match), 0)) goto next_after_fail1172;
> > > if (__builtin_expect (dump_file &&
> > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > {
> > >   res_op->set_op (CFN_BUILT_IN_ERFC, 
> > > type, 1);
> > >   res_op->ops[0] = captures[0];
> > >   res_op->resimplify (lseq, valueize);
> > >   return true;
> > > }
> > >
> > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> >
> > There's nothing to add to lseq since there's also nothing to resimplify.
> > The only thing that could happen is t

Re: [COMMITTED] Change threading comment before pass_ccp pass.

2021-10-19 Thread Aldy Hernandez via Gcc-patches





On 10/19/21 12:55 PM, Richard Biener wrote:

On Tue, Oct 19, 2021 at 10:48 AM Aldy Hernandez  wrote:


As suggested.  Thanks.

gcc/ChangeLog:

 * passes.def: Change threading comment before pass_ccp pass.
---
  gcc/passes.def | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index c11c237f6d2..4c54176328b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -339,9 +339,7 @@ along with GCC; see the file COPYING3.  If not see
NEXT_PASS (pass_thread_jumps);
NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
NEXT_PASS (pass_vrp_threader);
-  /* Threading can leave many const/copy propagations in the IL.
-Clean them up.  Instead of just copy_prop, we use ccp to
-compute alignment and nonzero bits.  */


Well, the comment is still true as long as pass_vrp_threader is there ;)


Ha ha!  Oops.  You're right.

Tell you what, if we decide not to rearrange the pipeline, I'll put the 
comment back :).


Aldy

[PATCH] Refactor load/store costing

2021-10-19 Thread Richard Biener via Gcc-patches

This passes down the already available alignment scheme and
misalignment to the load/store costing routines, removing
redundant queries.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-19  Richard Biener  

* tree-vect-data-refs.c (vect_get_data_access_cost): Get
alignment support scheme and misalignment as arguments
and pass them down.
(vect_get_peeling_costs_all_drs): Compute that info here
and note that we shouldn't need to.
* tree-vect-stmts.c (vect_model_store_cost): Get
alignment support scheme and misalignment as arguments.
(vect_get_store_cost): Likewise.
(vect_model_load_cost): Likewise.
(vect_get_load_cost): Likewise.
(vectorizable_store): Pass down alignment support scheme
and misalignment to costing.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-data-refs.c | 20 +++
 gcc/tree-vect-stmts.c | 41 ---
 gcc/tree-vectorizer.h |  4 +++-
 3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7c95f9ad69e..0db6aec7312 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1396,7 +1396,9 @@ vector_alignment_reachable_p (dr_vec_info *dr_info)
 
 static void
 vect_get_data_access_cost (vec_info *vinfo, dr_vec_info *dr_info,
-   unsigned int *inside_cost,
+  dr_alignment_support alignment_support_scheme,
+  int misalignment,
+  unsigned int *inside_cost,
unsigned int *outside_cost,
   stmt_vector_for_cost *body_cost_vec,
   stmt_vector_for_cost *prologue_cost_vec)
@@ -1411,10 +1413,12 @@ vect_get_data_access_cost (vec_info *vinfo, dr_vec_info 
*dr_info,
 ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info));
 
   if (DR_IS_READ (dr_info->dr))
-vect_get_load_cost (vinfo, stmt_info, ncopies, true, inside_cost,
+vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+   misalignment, true, inside_cost,
outside_cost, prologue_cost_vec, body_cost_vec, false);
   else
-vect_get_store_cost (vinfo,stmt_info, ncopies, inside_cost, body_cost_vec);
+vect_get_store_cost (vinfo,stmt_info, ncopies, alignment_support_scheme,
+misalignment, inside_cost, body_cost_vec);
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
@@ -1545,7 +1549,15 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo,
 vect_dr_misalign_for_aligned_access (dr0_info));
   else
vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
-  vect_get_data_access_cost (loop_vinfo, dr_info, inside_cost, 
outside_cost,
+  /* ???  We should be able to avoid both the adjustment before and the
+call to vect_supportable_dr_alignment below.  */
+  tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
+  int misalignment = dr_misalignment (dr_info, vectype);
+  dr_alignment_support alignment_support_scheme
+   = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype);
+  vect_get_data_access_cost (loop_vinfo, dr_info,
+alignment_support_scheme, misalignment,
+inside_cost, outside_cost,
 body_cost_vec, prologue_cost_vec);
   SET_DR_MISALIGNMENT (dr_info, save_misalignment);
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 09a97b44c91..afc3ef17834 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -909,6 +909,8 @@ cfun_returns (tree decl)
 static void
 vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   vect_memory_access_type memory_access_type,
+  dr_alignment_support alignment_support_scheme,
+  int misalignment,
   vec_load_store_type vls_type, slp_tree slp_node,
   stmt_vector_for_cost *cost_vec)
 {
@@ -969,7 +971,8 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
   scalar_store, stmt_info, 0, vect_body);
 }
   else
-vect_get_store_cost (vinfo, stmt_info, ncopies, &inside_cost, cost_vec);
+vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+misalignment, &inside_cost, cost_vec);
 
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
@@ -1021,15 +1024,12 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
 
 /* Calculate cost of DR's memory access.  */
 void
-vect_get_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, in

[PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]

2021-10-19 Thread Jakub Jelinek via Gcc-patches

On Mon, Oct 18, 2021 at 12:42:00PM -0400, Jason Merrill wrote:
> > --- gcc/cp/typeck.c.jj  2021-10-05 09:53:55.382734051 +0200
> > +++ gcc/cp/typeck.c 2021-10-15 19:28:38.034213437 +0200
> > @@ -6773,9 +6773,21 @@ cp_build_addr_expr_1 (tree arg, bool str
> > return error_mark_node;
> >   }
> > +   if (TREE_CODE (t) == FUNCTION_DECL
> > +   && DECL_IMMEDIATE_FUNCTION_P (t)
> > +   && cp_unevaluated_operand == 0
> > +   && (current_function_decl == NULL_TREE
> > +   || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))
> 
> This doesn't cover some of the other cases of immediate context; we should
> probably factor most of immediate_invocation_p out into a function called
> something like in_immediate_context and use it here, and in several other
> places as well.

You're right, I've done that for the two spots in cp_build_addr_expr_1
and added testsuite coverage for where it changed behavior.
While doing that I've discovered further issues.

One is that we weren't diagnosing PMFs referring to immediate methods
returned from immediate functions (either directly or embedded in
aggregates).  I'm not sure if it can only appear as PTRMEM_CST which
I've handled (cp_walk_subtree only walks the type and not the
PTRMEM_CST_MEMBER) or something else.

Another issue is that while default arg in immediate function
containing &immediate_fn works properly, if it is immediate_fn
instead, we were incorrectly rejecting it.
I've handled this in build_over_call, though with this usage
in_consteval_if_p is slightly misnamed, it stands for in consteval
if or some other reason why we are currently in immediate function context.
Though, that flag alone can't be all the reasons for being in immediate
function contexts, as I've tried the other reasons can't be handled in such
a bool and need to be tested too.

And another thing isn't in a patch, but I'm wondering whether we don't
handle it incorrectly.  constexpr.c has:
  /* Check that immediate invocation does not return an expression referencing
 any immediate function decls.  They need to be allowed while parsing
 immediate functions, but can't leak outside of them.  */
  if (is_consteval
  && t != r
  && (current_function_decl == NULL_TREE
  || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))
as condition for the discovery of embedded immediate FUNCTION_DECLs
(or now PTRMEM_CSTs).  If I remove the && (current... ..._decl))
then g++.dg/cpp2a/consteval7.C's
struct S { int b; int (*c) (); };
consteval S baz () { return { 5, foo }; }
consteval int qux () { S s = baz (); return s.b + s.c (); }
consteval int quux () { constexpr S s = baz (); return s.b + s.c (); }
quux line fails, but based on
http://eel.is/c++draft/expr.const#11
I wonder if it shouldn't fail (clang++ -std=c++20 rejects it),
and be only accepted without the constexpr keyword before S s.
Also wonder about e.g.
consteval int foo () { return 42; }

consteval int
bar ()
{
  auto fn1 = foo;  // This must be ok
  constexpr auto fn2 = foo; // Isn't this an error?
  return fn1 () + fn2 ();
}

constexpr int
baz ()
{
  if consteval {
auto fn1 = foo; // This must be ok
constexpr auto fn2 = foo; // Isn't this an error?
return fn1 () + fn2 ();
  }
  return 0;
}

auto a = bar ();

static_assert (bar () == 84);
static_assert (baz () == 84);
(again, clang++ -std=c++20 rejects the fn2 = foo; case,
but doesn't implement consteval if, so can't test the other one).
For taking address of an immediate function or method if it is taken
outside of immediate function context we already have diagnostics
about it, but shouldn't the immediate FUNCTION_DECL discovery in
cxx_eval_outermost_constant_expression be instead guarded with something
like
  if (is_consteval || in_immediate_context ())
and be done regardless of whether t != r?

2021-10-19  Jakub Jelinek  

PR c++/102753
* cp-tree.h (in_immediate_context): Declare.
* call.c (in_immediate_context): New function.
(immediate_invocation_p): Use it.
(build_over_call): Temporarily set in_consteval_if_p for
convert_default_arg calls of immediate invocations.
* typeck.c (cp_build_addr_expr_1): Diagnose taking address of
an immediate method.  Use t instead of TREE_OPERAND (arg, 1).
Use in_immediate_context function.
* constexpr.c (find_immediate_fndecl): Handle PTRMEM_CST
which refers to immediate function decl.

* g++.dg/cpp2a/consteval20.C: New test.
* g++.dg/cpp2a/consteval21.C: New test.
* g++.dg/cpp2a/consteval22.C: New test.
* g++.dg/cpp2a/consteval23.C: New test.
* g++.dg/cpp23/consteval-if11.C: New test.

--- gcc/cp/cp-tree.h.jj 2021-10-15 11:58:44.968133548 +0200
+++ gcc/cp/cp-tree.h2021-10-19 10:40:58.375799274 +0200
@@ -6547,6 +6547,7 @@ extern tree perform_direct_initializatio
tsubst_flags_t);
 extern vec *resolve_args (vec*,

[PATCH 1/2] libstdc++: Implement LWG 3523 changes to ranges::iota_view

2021-10-19 Thread Patrick Palka via Gcc-patches

libstdc++-v3/ChangeLog:

* include/std/ranges (iota_view::_Iterator): Befriend iota_view.
(iota_view::_Sentinel): Likewise.
(iota_view::iota_view): Add three overloads each taking an
iterator/sentinel pair as per LWG 3523.
* testsuite/std/ranges/iota/iota_view.cc (test06): New test.
---
 libstdc++-v3/include/std/ranges   | 21 +++
 .../testsuite/std/ranges/iota/iota_view.cc| 21 +++
 2 files changed, 42 insertions(+)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index b8de400dfbb..85f232d8fb9 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -532,6 +532,7 @@ namespace ranges
   private:
_Winc _M_value = _Winc();
 
+   friend iota_view;
 friend _Sentinel;
   };
 
@@ -568,6 +569,8 @@ namespace ranges
operator-(const _Sentinel& __x, const _Iterator& __y)
  requires sized_sentinel_for<_Bound, _Winc>
{ return __x._M_distance_from(__y); }
+
+   friend iota_view;
   };
 
   _Winc _M_value = _Winc();
@@ -590,6 +593,24 @@ namespace ranges
  __glibcxx_assert( bool(__value <= __bound) );
   }
 
+  constexpr
+  iota_view(_Iterator __first, _Iterator __last)
+   requires same_as<_Winc, _Bound>
+   : iota_view(__first._M_value, __last._M_value)
+  { }
+
+  constexpr
+  iota_view(_Iterator __first, unreachable_sentinel_t __last)
+   requires same_as<_Bound, unreachable_sentinel_t>
+   : iota_view(__first._M_value, __last)
+  { }
+
+  constexpr
+  iota_view(_Iterator __first, _Sentinel __last)
+   requires (!same_as<_Winc, _Bound>) && (!same_as<_Bound, 
unreachable_sentinel_t>)
+   : iota_view(__first._M_value, __last._M_bound)
+  { }
+
   constexpr _Iterator
   begin() const { return _Iterator{_M_value}; }
 
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc 
b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
index 362ef1f7f78..5bebe4be693 100644
--- a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
@@ -18,6 +18,7 @@
 // { dg-options "-std=gnu++2a" }
 // { dg-do run { target c++2a } }
 
+#include 
 #include 
 #include 
 
@@ -90,6 +91,25 @@ test05()
   VERIFY( r.begin() - r.end() == -3 );
 }
 
+void
+test06()
+{
+  // Verify LWG 3523 changes.
+  auto v1 = std::views::iota(0, 5);
+  auto w1 = decltype(v1)(v1.begin(), v1.end());
+  VERIFY( std::ranges::equal(v1, w1) );
+
+  auto v2 = std::views::iota(0);
+  auto w2 = decltype(v2)(v2.begin(), v2.end());
+  static_assert(std::same_as);
+  VERIFY( *w2.begin() == 0 );
+
+  auto v3 = std::views::iota(0, 5l);
+  auto w3 = decltype(v3)(v3.begin(), v3.end());
+  static_assert(!std::ranges::common_range);
+  VERIFY( std::ranges::equal(v3, w3) );
+}
+
 int
 main()
 {
@@ -98,4 +118,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
-- 
2.33.1.637.gf443b226ca

[PATCH 2/2] libstdc++: Implement P1739R4 changes to views::take/drop/counted

2021-10-19 Thread Patrick Palka via Gcc-patches

This implements P1739R4 along with the resolution for LWG 3407 which
corrects the paper's wording.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/bits/ranges_util.h (views::_Drop): Forward declare.
(subrange): Befriend views::_Drop.
(subrange::_S_store_size): Declare constexpr instead of just
const, remove obsolete comment.
* include/std/ranges (views::__detail::__is_empty_view): Define.
(views::__detail::__is_basic_string_view): Likewise.
(views::__detail::__is_subrange): Likewise.
(views::__detail::__is_iota_view): Likewise.
(views::__detail::__can_take_view): Rename template parm _Tp to _Dp.
(views::_Take): Rename template parm _Tp to _Dp, make it non-deducible
and fix its type to range_difference_t<_Range>.  Implement P1739R4 and
LWG 3407 changes.
(views::__detail::__can_drop_view): Rename template parm _Tp to _Dp.
(views::_Drop): As with views::_Take.
(views::_Counted): Implement P1739R4 changes.
* include/std/span (__detail::__is_std_span): Rename to ...
(__detail::__is_span): ... this and turn it into a variable
template.
(__detail::__is_std_array): Turn it into a variable template.
(span::span): Adjust uses of __is_std_span and __is_std_array
accordingly.
* testsuite/std/ranges/adaptors/p1739.cc: New test.
---
 libstdc++-v3/include/bits/ranges_util.h   |   7 +-
 libstdc++-v3/include/std/ranges   | 106 +++---
 libstdc++-v3/include/std/span |  12 +-
 .../testsuite/std/ranges/adaptors/p1739.cc|  88 +++
 4 files changed, 192 insertions(+), 21 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/p1739.cc

diff --git a/libstdc++-v3/include/bits/ranges_util.h 
b/libstdc++-v3/include/bits/ranges_util.h
index 7e7b958d274..1afa66d298c 100644
--- a/libstdc++-v3/include/bits/ranges_util.h
+++ b/libstdc++-v3/include/bits/ranges_util.h
@@ -211,6 +211,8 @@ namespace ranges
 
   } // namespace __detail
 
+  namespace views { struct _Drop; }
+
   enum class subrange_kind : bool { unsized, sized };
 
   /// The ranges::subrange class template
@@ -221,8 +223,9 @@ namespace ranges
 class subrange : public view_interface>
 {
 private:
-  // XXX: gcc complains when using constexpr here
-  static const bool _S_store_size
+  friend struct views::_Drop; // Needs to inspect _S_store_size.
+
+  static constexpr bool _S_store_size
= _Kind == subrange_kind::sized && !sized_sentinel_for<_Sent, _It>;
 
   _It _M_begin = _It();
diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 85f232d8fb9..64396027c1b 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2147,19 +2148,66 @@ namespace views::__adaptor
   {
 namespace __detail
 {
-  template
+  template
+   constexpr bool __is_empty_view = false;
+
+  template
+   constexpr bool __is_empty_view> = true;
+
+  template
+   constexpr bool __is_basic_string_view = false;
+
+  template
+   constexpr bool __is_basic_string_view> = true;
+
+  template
+   constexpr bool __is_subrange = false;
+
+  template
+   constexpr bool __is_subrange> = true;
+
+  template
+   constexpr bool __is_iota_view = false;
+
+  template
+   constexpr bool __is_iota_view> = true;
+
+  template
concept __can_take_view
- = requires { take_view(std::declval<_Range>(), std::declval<_Tp>()); 
};
+ = requires { take_view(std::declval<_Range>(), std::declval<_Dp>()); 
};
 } // namespace __detail
 
 struct _Take : __adaptor::_RangeAdaptor<_Take>
 {
-  template
-   requires __detail::__can_take_view<_Range, _Tp>
+  template>
+   requires __detail::__can_take_view<_Range, _Dp>
constexpr auto
-   operator() [[nodiscard]] (_Range&& __r, _Tp&& __n) const
+   operator() [[nodiscard]] (_Range&& __r, type_identity_t<_Dp> __n) const
{
- return take_view(std::forward<_Range>(__r), std::forward<_Tp>(__n));
+ using _Tp = remove_cvref_t<_Range>;
+ if constexpr (__detail::__is_empty_view<_Tp>)
+   return _Tp();
+ else if constexpr (random_access_range<_Tp>
+&& sized_range<_Tp>
+&& (std::__detail::__is_span<_Tp>
+|| __detail::__is_basic_string_view<_Tp>
+|| __detail::__is_subrange<_Tp>
+|| __detail::__is_iota_view<_Tp>))
+   {
+ __n = std::min<_Dp>(ranges::distance(__r), __n);
+ auto __begin = ranges::begin(__r);
+ auto __end = __begin + __n;
+

Re: [PATCH][aarch64] target: Support whitespaces in target attr/pragma.

2021-10-19 Thread Martin Liška


On 10/19/21 12:52, Richard Sandiford wrote:

It looks like this ought to happen after the alloca and copy, since it
modifies the string.


Oh yeah, good point.

Ready to be installed with the change?
Thanks,
MartinFrom 68df4cba3bccb714a14e3c795e6d9e4a44c54318 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 19 Oct 2021 11:11:16 +0200
Subject: [PATCH] target: Support whitespaces in target attr/pragma.

	PR target/102375

gcc/ChangeLog:

	* config/aarch64/aarch64.c (aarch64_process_one_target_attr):
	Strip whitespaces.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr102375.c: New test.
---
 gcc/config/aarch64/aarch64.c| 1 +
 gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 
 2 files changed, 5 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 76d99d247ae..fdf341812f4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17633,6 +17633,7 @@ aarch64_process_one_target_attr (char *arg_str)
 
   char *str_to_check = (char *) alloca (len + 1);
   strcpy (str_to_check, arg_str);
+  str_to_check = strip_whitespaces (str_to_check, &len);
 
   /* We have something like __attribute__ ((target ("+fp+nosimd"))).
  It is easier to detect and handle it explicitly here rather than going
diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c b/gcc/testsuite/gcc.target/aarch64/pr102375.c
new file mode 100644
index 000..fa75d319b2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c
@@ -0,0 +1,4 @@
+/* PR target/102375 */
+/* { dg-do compile } */
+
+void calculate(void) __attribute__ ((target ("+sve, +sve2")));
-- 
2.33.1

Re: [PATCH][aarch64] target: Support whitespaces in target attr/pragma.

2021-10-19 Thread Richard Sandiford via Gcc-patches

Martin Liška  writes:
> On 10/19/21 12:52, Richard Sandiford wrote:
>> It looks like this ought to happen after the alloca and copy, since it
>> modifies the string.
>
> Oh yeah, good point.
>
> Ready to be installed with the change?
> Thanks,
> Martin
>
> From 68df4cba3bccb714a14e3c795e6d9e4a44c54318 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Tue, 19 Oct 2021 11:11:16 +0200
> Subject: [PATCH] target: Support whitespaces in target attr/pragma.
>
>   PR target/102375
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.c (aarch64_process_one_target_attr):
>   Strip whitespaces.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/pr102375.c: New test.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.c| 1 +
>  gcc/testsuite/gcc.target/aarch64/pr102375.c | 4 
>  2 files changed, 5 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr102375.c
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 76d99d247ae..fdf341812f4 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -17633,6 +17633,7 @@ aarch64_process_one_target_attr (char *arg_str)
>  
>char *str_to_check = (char *) alloca (len + 1);
>strcpy (str_to_check, arg_str);
> +  str_to_check = strip_whitespaces (str_to_check, &len);
>  
>/* We have something like __attribute__ ((target ("+fp+nosimd"))).
>   It is easier to detect and handle it explicitly here rather than going
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr102375.c 
> b/gcc/testsuite/gcc.target/aarch64/pr102375.c
> new file mode 100644
> index 000..fa75d319b2d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr102375.c
> @@ -0,0 +1,4 @@
> +/* PR target/102375 */
> +/* { dg-do compile } */
> +
> +void calculate(void) __attribute__ ((target ("+sve, +sve2")));

Cleanup compute_points_to_sets

2021-10-19 Thread Jan Hubicka via Gcc-patches

Hi,
this patch fixes two issues I noticed while proofreading the code.
First is that I have added conditional around setting of nonlocal and
escaped flags (since they may be set from solver) while keeping the
variable in assignment that is confusing.

Second is that we still do not set pt in the case function has no memory
side effects.  In this case the call use is not going to be used since
uses_global_memory is false only if either function is const or modref
determined that all loads are from memory pointed to by parameters.  In
both cases we will disambiguate earlier before asking PTA oracle, but it
is better to avoid stale PTA sets (which shows in -alias dumps etc.)

Most of builtins are not modifying global memory, one option would be to
stick another flag into the fnspecs strings for this property.

Bootstrapped/regtested x86_64-linux, OK?

Honza

gcc/ChangeLog:

* tree-ssa-structalias.c (compute_points_to_sets): Cleanup.

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 2e6513bb72a..35971a54e02 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -7550,8 +7550,8 @@ compute_points_to_sets (void)
 always escaped.  */
  if (uses_global_memory)
{
- pt->nonlocal = uses_global_memory;
- pt->escaped = uses_global_memory;
+ pt->nonlocal = 1;
+ pt->escaped = 1;
}
}
  else if (uses_global_memory)
@@ -7561,6 +7561,8 @@ compute_points_to_sets (void)
  *pt = cfun->gimple_df->escaped;
  pt->nonlocal = 1;
}
+ else
+   memset (pt, 0, sizeof (struct pt_solution));
}
 
  pt = gimple_call_clobber_set (stmt);
@@ -7582,8 +7584,8 @@ compute_points_to_sets (void)
 always escaped.  */
  if (writes_global_memory)
{
- pt->nonlocal = writes_global_memory;
- pt->escaped = writes_global_memory;
+ pt->nonlocal = 1;
+ pt->escaped = 1;
}
}
  else if (writes_global_memory)
@@ -7593,6 +7595,8 @@ compute_points_to_sets (void)
  *pt = cfun->gimple_df->escaped;
  pt->nonlocal = 1;
}
+ else
+   memset (pt, 0, sizeof (struct pt_solution));
}
}
 }

Re: [PATCH 1/2] libstdc++: Implement LWG 3523 changes to ranges::iota_view

2021-10-19 Thread Jonathan Wakely via Gcc-patches

On Tue, 19 Oct 2021 at 13:18, Patrick Palka via Libstdc++
 wrote:
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (iota_view::_Iterator): Befriend iota_view.
> (iota_view::_Sentinel): Likewise.
> (iota_view::iota_view): Add three overloads each taking an
> iterator/sentinel pair as per LWG 3523.
> * testsuite/std/ranges/iota/iota_view.cc (test06): New test.

OK for trunk and gcc-11 (after some soak time), thanks.


> ---
>  libstdc++-v3/include/std/ranges   | 21 +++
>  .../testsuite/std/ranges/iota/iota_view.cc| 21 +++
>  2 files changed, 42 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index b8de400dfbb..85f232d8fb9 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -532,6 +532,7 @@ namespace ranges
>private:
> _Winc _M_value = _Winc();
>
> +   friend iota_view;
>  friend _Sentinel;
>};
>
> @@ -568,6 +569,8 @@ namespace ranges
> operator-(const _Sentinel& __x, const _Iterator& __y)
>   requires sized_sentinel_for<_Bound, _Winc>
> { return __x._M_distance_from(__y); }
> +
> +   friend iota_view;
>};
>
>_Winc _M_value = _Winc();
> @@ -590,6 +593,24 @@ namespace ranges
>   __glibcxx_assert( bool(__value <= __bound) );
>}
>
> +  constexpr
> +  iota_view(_Iterator __first, _Iterator __last)
> +   requires same_as<_Winc, _Bound>
> +   : iota_view(__first._M_value, __last._M_value)
> +  { }
> +
> +  constexpr
> +  iota_view(_Iterator __first, unreachable_sentinel_t __last)
> +   requires same_as<_Bound, unreachable_sentinel_t>
> +   : iota_view(__first._M_value, __last)
> +  { }
> +
> +  constexpr
> +  iota_view(_Iterator __first, _Sentinel __last)
> +   requires (!same_as<_Winc, _Bound>) && (!same_as<_Bound, 
> unreachable_sentinel_t>)
> +   : iota_view(__first._M_value, __last._M_bound)
> +  { }
> +
>constexpr _Iterator
>begin() const { return _Iterator{_M_value}; }
>
> diff --git a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc 
> b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
> index 362ef1f7f78..5bebe4be693 100644
> --- a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
> @@ -18,6 +18,7 @@
>  // { dg-options "-std=gnu++2a" }
>  // { dg-do run { target c++2a } }
>
> +#include 
>  #include 
>  #include 
>
> @@ -90,6 +91,25 @@ test05()
>VERIFY( r.begin() - r.end() == -3 );
>  }
>
> +void
> +test06()
> +{
> +  // Verify LWG 3523 changes.
> +  auto v1 = std::views::iota(0, 5);
> +  auto w1 = decltype(v1)(v1.begin(), v1.end());
> +  VERIFY( std::ranges::equal(v1, w1) );
> +
> +  auto v2 = std::views::iota(0);
> +  auto w2 = decltype(v2)(v2.begin(), v2.end());
> +  static_assert(std::same_as std::unreachable_sentinel_t>);
> +  VERIFY( *w2.begin() == 0 );
> +
> +  auto v3 = std::views::iota(0, 5l);
> +  auto w3 = decltype(v3)(v3.begin(), v3.end());
> +  static_assert(!std::ranges::common_range);
> +  VERIFY( std::ranges::equal(v3, w3) );
> +}
> +
>  int
>  main()
>  {
> @@ -98,4 +118,5 @@ main()
>test03();
>test04();
>test05();
> +  test06();
>  }
> --
> 2.33.1.637.gf443b226ca
>

[PATCH] aix: ensure reference to __tls_get_addr is in text section.

2021-10-19 Thread CHIGOT, CLEMENT via Gcc-patches

The garbage collector of AIX linker might remove the reference to
__tls_get_addr if it's added inside an unused csect, which can be
the case of .data with very simple programs.

gcc/ChangeLog:
2021-10-19  Clément Chigot  

        * config/rs6000/rs6000.c (rs6000_xcoff_file_end): Move
        __tls_get_addr reference to .text csect.

Approved offline by David Edelson.

From 52e9e4554d8dba9f9c9c56267789fc1d08b1de98 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= 
Date: Thu, 14 Oct 2021 09:03:13 +0200
Subject: [PATCH] aix: ensure reference to __tls_get_addr is in text section.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The garbage collector of AIX linker might remove the reference to
__tls_get_addr if it's added inside an unused csect, which can be
the case of .data with very simple programs.

gcc/ChangeLog:
2021-10-19  ClÃ©ment Chigot  

	* config/rs6000/rs6000.c (rs6000_xcoff_file_end): Move
	__tls_get_addr reference to .text csect.
---
 gcc/config/rs6000/rs6000.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 68111c3fe6a..bac959f4ef4 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21626,17 +21626,17 @@ static void
 rs6000_xcoff_file_end (void)
 {
   switch_to_section (text_section);
+  if (xcoff_tls_exec_model_detected)
+{
+  /* Add a .ref to __tls_get_addr to force libpthread dependency.  */
+  fputs ("\t.extern __tls_get_addr\n\t.ref __tls_get_addr\n", asm_out_file);
+}
   fputs ("_section_.text:\n", asm_out_file);
   switch_to_section (data_section);
   fputs (TARGET_32BIT
 	 ? "\t.long _section_.text\n" : "\t.llong _section_.text\n",
 	 asm_out_file);
 
-  if (xcoff_tls_exec_model_detected)
-{
-  /* Add a .ref to __tls_get_addr to force libpthread dependency.  */
-  fputs ("\t.extern __tls_get_addr\n\t.ref __tls_get_addr\n", asm_out_file);
-}
 }
 
 struct declare_alias_data
-- 
2.25.1

Re: [PATCH] AArch64: Tune case-values-threshold

2021-10-19 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> I'm just concerned that here we're using the same explanation but with
> different numbers.  Why are the new numbers more right than the old ones
> (especially when it comes to code size, where the trade-off hasn't
> really changed)?

Like all tuning/costing parameters, these values are never fixed but change
over time due to optimizations, micro architectures and workloads.
The previous values were out of date so that's why I retuned them by
benchmarking different values and choosing the best combinations.

> It would be good to have more discussion of why certain numbers are
> too small or too high, and why 8 is the right pivot point for -Os.

You mean add more discussion in the comment? That comment is already overly
large and too specific - it would be better to reduce it. The -Os value was 
never
tuned, and 8 turns out to be faster and smaller than GCC's default.

Cheers,
Wilco

[PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran

2021-10-19 Thread Chung-Lin Tang


Hi Jakub,

On 2021/9/18 12:11 AM, Jakub Jelinek wrote:

@@ -3496,7 +3509,8 @@ static match
  match_omp (gfc_exec_op op, const omp_mask mask)
  {
gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, true, true, false,
+(op == EXEC_OMP_TARGET)) != MATCH_YES)


The ()s around op == EXEC_OMP_TARGET are unnecessary.


Fixed.


--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code)
  static tree
  gfc_trans_omp_taskgroup (gfc_code *code)
  {
+  stmtblock_t block;
+  gfc_start_block (&block);
tree body = gfc_trans_code (code->block->next);
tree stmt = make_node (OMP_TASKGROUP);
TREE_TYPE (stmt) = void_type_node;
OMP_TASKGROUP_BODY (stmt) = body;
-  OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE;
-  return stmt;
+  OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&block,
+   code->ext.omp_clauses,
+   code->loc);
+  gfc_add_expr_to_block (&block, stmt);


If this was missing, then I'm afraid we lack a lot of testsuite coverage for
Fortran task reductions.  It doesn't need to be covered in this patch, but 
would be
good to cover it incrementally.  Because the above means nothing with
taskgroup with task_reduction clause(s) could work properly at runtime.


Actually, the testcases do somewhat exercise taskgroup task_reductions, but 
like you
said, only lightly.


--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1317,9 +1317,13 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  if (is_omp_target (ctx->stmt))
{
  tree at = decl;
+ omp_context *scan_ctx = ctx;
  if (ctx->outer)
-   scan_omp_op (&at, ctx->outer);
- tree nt = omp_copy_decl_1 (at, ctx);
+   {
+ scan_omp_op (&at, ctx->outer);
+ scan_ctx = ctx->outer;
+   }
+ tree nt = omp_copy_decl_1 (at, scan_ctx);
  splay_tree_insert (ctx->field_map,
 (splay_tree_key) &DECL_CONTEXT (decl),
 (splay_tree_value) nt);


You're right that the var remembered with &DECL_CONTEXT (whatever) key is
used outside of the target construct rather than inside of it.
So, if ctx->outer is non-NULL, it seems right to create the var in that
outer context.  But, if ctx->outer is NULL, which can happen if the
target construct is orphaned, consider e.g.
extern int &x;
extern int &y;

void
foo ()
{
   #pragma omp target in_reduction (+: x, y)
   {
 x = x + 8;
 y = y + 16;
   }
}

void
bar ()
{
   #pragma omp taskgroup task_reduction (+: x, y)
   foo ();
}
then those artificial decls (copies of x and y) should appear
to be at the function scope and not inside of the target region.

Therefore, I wonder if omp_copy_decl_2 shouldn't do the
   DECL_CONTEXT (copy) = current_function_decl;
   DECL_CHAIN (copy) = ctx->block_vars;
   ctx->block_vars = copy;
(the last one can be moved next to the others) only if ctx != NULL
and otherwise call gimple_add_tmp_var (copy); instead
and then just call omp_copy_decl_1 at that spot with unconditional
ctx->outer.


I see what you mean. I tried gimple_add_tmp_var but didn't work due to a
!DECL_SEEN_IN_BIND_EXPR_P() assert fail, but record_vars() appears to work.


Also, this isn't the only place that should have such a change,
there is also
   if (ctx->outer)
 scan_omp_op (&at, ctx->outer);
   tree nt = omp_copy_decl_1 (at, ctx);
   splay_tree_insert (ctx->field_map,
  (splay_tree_key) &DECL_CONTEXT (t),
  (splay_tree_value) nt);
a few lines above this and I'd expect that it should be (at, ctx->outer)
as well.


Fixed.


@@ -1339,7 +1343,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  if (!is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx)))
{
  by_ref = use_pointer_for_field (decl, ctx);
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IN_REDUCTION)
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IN_REDUCTION
+ && !splay_tree_lookup (ctx->field_map,
+(splay_tree_key) decl))
install_var_field (decl, by_ref, 3, ctx);
}
  install_var_local (decl, ctx);


When exactly do you need this?  It doesn't trigger on the new libgomp
testcase...


I remember there was a testcase with triggered an ICE without this, but for some
reason can't find it anymore. I don't have any more evidence this is needed, so
removed now.


--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-in-reduction-1.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+
+subrouti

Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-19 Thread Bill Schmidt via Gcc-patches

Hi Martin,

On 10/19/21 3:53 AM, Martin Liška wrote:
> On 10/15/21 17:24, Martin Liška wrote:
>> P.S. Next time, please CC me.
>
> All right, I tested the updated patch and it works fine
> on ppc64le-linux-gnu.
>
> May I install it?

I'm not a maintainer, so can't approve -- CCing those who can.

Thanks!
Bill

> Thanks,
> Martin

Re: [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers

2021-10-19 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 10/18/21 8:15 PM, Paul A. Clarke wrote:
> Fix an ommission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.
>
> 2021-10-18  Paul A. Clarke  
>
> gcc
>   * config/config.gcc (extra_headers): Add nmmintrin.h.
> ---
>  gcc/config.gcc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index aa5bd5d14590..1cb9303b3a85 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -490,6 +490,7 @@ powerpc*-*-*)
>   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
>   extra_headers="${extra_headers} mmintrin.h x86intrin.h"
>   extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
> + extra_headers="${extra_headers} nmmintrin.h"
>   extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
> si2vmx.h"
>   extra_headers="${extra_headers} amo.h"
>   case x$with_cpu in

In my opinion, you can commit this one as obvious.

Thanks,
Bill

[Patch, committed] Fortran: Fix "str" to scalar descriptor conversion [PR92482]

2021-10-19 Thread Tobias Burnus


Here, the problem is that the param.expr was:
  &"abc"  -> type: "char*"
as that's an ADDR_EXPR, the previous code dereferrenced it:
 *&"abc" -> type  *(char*)
but that's the type 'char'. Thus, at the end, the result
was
  scalar = 'a' -> type char
instead of
  scalar "abc" -> type char array of size 3

Solution: Do what the comment does – remove the ADDR_EXPR
insead of dereferrencing the result.

Build + regtested on x86_64-gnu-linux
+ installed as r12-4505-g6920d5a1a2834e9c62d441b8f4c6186b01107d13

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 6920d5a1a2834e9c62d441b8f4c6186b01107d13
Author: Tobias Burnus 
Date:   Tue Oct 19 15:16:01 2021 +0200

Fortran: Fix "str" to scalar descriptor conversion [PR92482]

PR fortran/92482
gcc/fortran/ChangeLog:

* trans-expr.c (gfc_conv_procedure_call): Use TREE_OPERAND not
build_fold_indirect_ref_loc to undo an ADDR_EXPR.

gcc/testsuite/ChangeLog:

* gfortran.dg/bind-c-char-descr.f90: Remove xfail; extend a bit.
---
 gcc/fortran/trans-expr.c|  2 +-
 gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 | 57 -
 2 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 01389373065..29697e69e75 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6640,7 +6640,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		{
 		  tmp = parmse.expr;
 		  if (TREE_CODE (tmp) == ADDR_EXPR)
-			tmp = build_fold_indirect_ref_loc (input_location, tmp);
+			tmp = TREE_OPERAND (tmp, 0);
 		  parmse.expr = gfc_conv_scalar_to_descriptor (&parmse, tmp,
    fsym->attr);
 		  parmse.expr = gfc_build_addr_expr (NULL_TREE,
diff --git a/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90 b/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90
index 3b01ad3b63d..8829fd1f71b 100644
--- a/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90
+++ b/gcc/testsuite/gfortran.dg/bind-c-char-descr.f90
@@ -2,7 +2,6 @@
 !
 ! Contributed by José Rui Faustino de Sousa 
 !
-! Note the xfail issue below for 'strg_print_2("abc")
 
 program strp_p
 
@@ -24,13 +23,18 @@ program strp_p
   if (len(str) /= 3 .or. str /= "abc") stop 1
   if (len(strp_1) /= 3 .or. strp_1 /= "abc") stop 2
   if (len(strp_2) /= 3 .or. strp_2 /= "abc") stop 3
-  call strg_print_0("abc") ! Error (10.0.0) or segmentation fault (9.1.0)
-  call strg_print_0(str) ! Error (10.0.0) or segmentation fault (9.1.0)
-  call strg_print_0(strp_1) ! Error (10.0.0) or segmentation fault (9.1.0)
-  call strg_print_0(strp_2) ! Error (10.0.0) or segmentation fault (9.1.0)
-  call strg_print_1(strp_1) ! Not yet supported
+  call strg_print_0("abc")
+  call strg_print_0(str)
+  call strg_print_0(strp_1)
+  call strg_print_0(strp_2)
+  call strg_print_0_c("abc")
+  call strg_print_0_c(str)
+  call strg_print_0_c(strp_1)
+  call strg_print_0_c(strp_2)
+  call strg_print_1(strp_1)
+  call strg_print_1_c(strp_1)
 
-  call strg_print_2("abc", xfail=.true.)
+  call strg_print_2("abc")
   call strg_print_2(str)
   call strg_print_2(strp_1)
   call strg_print_2(strp_2)
@@ -42,14 +46,21 @@ program strp_p
 
 contains
 
-  subroutine strg_print_0(this) bind(c) ! Error (10.0.0 20191106) or warning (9.1.0) issued with bind(c)
+  subroutine strg_print_0 (this)
 character(len=*, kind=c_char), target, intent(in) :: this
 
 if (len (this) /= 3) stop 10
 if (this /= "abc") stop 11
   end subroutine strg_print_0
+
+  subroutine strg_print_0_c (this) bind(c)
+character(len=*, kind=c_char), target, intent(in) :: this
+
+if (len (this) /= 3) stop 10
+if (this /= "abc") stop 11
+  end subroutine strg_print_0_c
   
-  subroutine strg_print_1(this) bind(c) ! Not yet supported with bind(c)
+  subroutine strg_print_1 (this) bind(c)
 character(len=:, kind=c_char), pointer, intent(in) :: this
 character(len=:), pointer :: strn
 
@@ -63,26 +74,34 @@ contains
if (this /= "abc") stop 25
  end if
end subroutine strg_print_1
+
+  subroutine strg_print_1_c (this) bind(c)
+character(len=:, kind=c_char), pointer, intent(in) :: this
+character(len=:), pointer :: strn
+
+if (.not. associated (this)) stop 20
+if (len (this) /= 3) stop 21
+if (this /= "abc") stop 22
+ strn => this
+ if (.not. associated (strn)) stop 23
+ if(associated(strn))then
+   if (len (this) /= 3) stop 24
+   if (this /= "abc") stop 25
+ end if
+   end subroutine strg_print_1_c
   
-  subroutine strg_print_2(this, xfail)
+  subroutine strg_print_2(this)
 use, intrinsic :: iso_c_binding, only: &
   c_loc, c_f_pointer
 
 type(*), target, intent(in) :: this(..)
-logical, opt

Re: [PATCH] AArch64: Tune case-values-threshold

2021-10-19 Thread Richard Sandiford via Gcc-patches

Wilco Dijkstra  writes:
> Hi Richard,
>
>> I'm just concerned that here we're using the same explanation but with
>> different numbers.  Why are the new numbers more right than the old ones
>> (especially when it comes to code size, where the trade-off hasn't
>> really changed)?
>
> Like all tuning/costing parameters, these values are never fixed but change
> over time due to optimizations, micro architectures and workloads.
> The previous values were out of date so that's why I retuned them by
> benchmarking different values and choosing the best combinations.
>
>> It would be good to have more discussion of why certain numbers are
>> too small or too high, and why 8 is the right pivot point for -Os.
>
> You mean add more discussion in the comment? That comment is already overly
> large and too specific - it would be better to reduce it. The -Os value was 
> never
> tuned, and 8 turns out to be faster and smaller than GCC's default.

The problem is that you're effectively asking for these values to be
taken on faith without providing any analysis and without describing
how you arrived at the new numbers.  Did you try other values too?
If so, how did they compare with the numbers that you finally chose?
At least that would give an indication of where the boundaries are.

For example, it's easier to believe that 8 is the right value for -Os if
you say that you tried 9 and 7 as well, and they were worse than 8 by X%
and Y%.  This would also help anyone who wants to tweak the numbers
again in future.

BTW, which CPU did you use to do the experiments?  Are the tuning
parameters for that CPU already consistent with the new generic values?

Thanks,
Richard

[PATCH] c++: Reject addresses of immediate functions in constexpr vars inside of immediate functions or consteval if [PR102753]

2021-10-19 Thread Jakub Jelinek via Gcc-patches

On Tue, Oct 19, 2021 at 02:00:21PM +0200, Jakub Jelinek via Gcc-patches wrote:
> And another thing isn't in a patch, but I'm wondering whether we don't
> handle it incorrectly.  constexpr.c has:
>   /* Check that immediate invocation does not return an expression referencing
>  any immediate function decls.  They need to be allowed while parsing
>  immediate functions, but can't leak outside of them.  */
>   if (is_consteval
>   && t != r
>   && (current_function_decl == NULL_TREE
> || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))
> as condition for the discovery of embedded immediate FUNCTION_DECLs
> (or now PTRMEM_CSTs).  If I remove the && (current... ..._decl))
> then g++.dg/cpp2a/consteval7.C's
> struct S { int b; int (*c) (); };
> consteval S baz () { return { 5, foo }; }
> consteval int qux () { S s = baz (); return s.b + s.c (); }
> consteval int quux () { constexpr S s = baz (); return s.b + s.c (); }
> quux line fails, but based on
> http://eel.is/c++draft/expr.const#11
> I wonder if it shouldn't fail (clang++ -std=c++20 rejects it),
> and be only accepted without the constexpr keyword before S s.

Here is an incremental patch that implements that.

2021-10-19  Jakub Jelinek  

PR c++/102753
* constexpr.c (cxx_eval_outermost_constant_expr): Perform
find_immediate_fndecl discovery if is_consteval or
in_immediate_context () rather than if is_consteval, t != r
and not in immediate function's body.

* g++.dg/cpp2a/consteval7.C: Expect diagnostics on quux.
* g++.dg/cpp2a/consteval24.C: New test.
* g++.dg/cpp23/consteval-if12.C: New test.

--- gcc/cp/constexpr.c.jj   2021-10-19 12:22:35.583964001 +0200
+++ gcc/cp/constexpr.c  2021-10-19 13:58:22.545182032 +0200
@@ -7472,12 +7472,8 @@ cxx_eval_outermost_constant_expr (tree t
 }
 
   /* Check that immediate invocation does not return an expression referencing
- any immediate function decls.  They need to be allowed while parsing
- immediate functions, but can't leak outside of them.  */
-  if (is_consteval
-  && t != r
-  && (current_function_decl == NULL_TREE
- || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))
+ any immediate function decls.  */
+  if (is_consteval || in_immediate_context ())
 if (tree immediate_fndecl
= cp_walk_tree_without_duplicates (&r, find_immediate_fndecl,
   NULL))
--- gcc/testsuite/g++.dg/cpp2a/consteval7.C.jj  2020-01-12 11:54:37.140402440 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/consteval7.C 2021-10-19 13:59:54.033897061 
+0200
@@ -7,7 +7,7 @@ constexpr auto a = bar ();  // { dg-error
 struct S { int b; int (*c) (); };
 consteval S baz () { return { 5, foo }; }
 consteval int qux () { S s = baz (); return s.b + s.c (); }
-consteval int quux () { constexpr S s = baz (); return s.b + s.c (); }
+consteval int quux () { constexpr S s = baz (); return s.b + s.c (); } // { 
dg-error "immediate evaluation returns address of immediate function 'consteval 
int foo\\(\\)'" }
 constexpr auto d = baz (); // { dg-error "immediate evaluation returns 
address of immediate function 'consteval int foo\\(\\)'" }
 constexpr auto e = qux ();
 constexpr auto f = quux ();
--- gcc/testsuite/g++.dg/cpp2a/consteval24.C.jj 2021-10-19 14:32:51.858019368 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/consteval24.C2021-10-19 14:49:11.618177303 
+0200
@@ -0,0 +1,30 @@
+// PR c++/102753
+// { dg-do compile { target c++20 } }
+
+struct S {
+  constexpr S () : s (0) {}
+  consteval int foo () { return 1; }
+  virtual consteval int bar () { return 2; }
+  int s;
+};
+
+consteval int foo () { return 42; }
+consteval auto baz () { return foo; }
+consteval auto qux () { return &S::foo; }
+consteval auto corge () { return &S::bar; }
+
+consteval int
+bar ()
+{
+  S s;
+  constexpr auto fn1 = foo;// { dg-error "immediate evaluation 
returns address of immediate function" }
+  constexpr auto fn2 = &foo;   // { dg-error "immediate evaluation 
returns address of immediate function" }
+  constexpr auto fn3 = &S::foo;// { dg-error "immediate 
evaluation returns address of immediate function" }
+  constexpr auto fn4 = &S::bar;// { dg-error "immediate 
evaluation returns address of immediate function" }
+  constexpr auto fn5 = baz (); // { dg-error "immediate evaluation 
returns address of immediate function" }
+  constexpr auto fn6 = qux (); // { dg-error "immediate evaluation 
returns address of immediate function" }
+  constexpr auto fn7 = corge ();   // { dg-error "immediate evaluation 
returns address of immediate function" }
+  return fn1 () + fn2 () + (s.*fn3) () + (s.*fn4) () + fn5 () + (s.*fn6) () + 
(s.*fn7) ();
+}
+
+auto a = bar ();
--- gcc/testsuite/g++.dg/cpp23/consteval-if12.C.jj  2021-10-19 
14:54:05.123023731 +0200
+++ gcc/testsuite/g++.dg/cpp23/consteval-if12.C 2021-10-19 14:55:29.02684403

Re: Cleanup compute_points_to_sets

2021-10-19 Thread Richard Biener via Gcc-patches

On October 19, 2021 2:35:47 PM GMT+02:00, Jan Hubicka via Gcc-patches 
 wrote:
>Hi,
>this patch fixes two issues I noticed while proofreading the code.
>First is that I have added conditional around setting of nonlocal and
>escaped flags (since they may be set from solver) while keeping the
>variable in assignment that is confusing.
>
>Second is that we still do not set pt in the case function has no memory
>side effects.  In this case the call use is not going to be used since
>uses_global_memory is false only if either function is const or modref
>determined that all loads are from memory pointed to by parameters.  In
>both cases we will disambiguate earlier before asking PTA oracle, but it
>is better to avoid stale PTA sets (which shows in -alias dumps etc.)
>
>Most of builtins are not modifying global memory, one option would be to
>stick another flag into the fnspecs strings for this property.
>
>Bootstrapped/regtested x86_64-linux, OK?

Ok. 

Richard. 

>Honza
>
>gcc/ChangeLog:
>
>   * tree-ssa-structalias.c (compute_points_to_sets): Cleanup.
>
>diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
>index 2e6513bb72a..35971a54e02 100644
>--- a/gcc/tree-ssa-structalias.c
>+++ b/gcc/tree-ssa-structalias.c
>@@ -7550,8 +7550,8 @@ compute_points_to_sets (void)
>always escaped.  */
> if (uses_global_memory)
>   {
>-pt->nonlocal = uses_global_memory;
>-pt->escaped = uses_global_memory;
>+pt->nonlocal = 1;
>+pt->escaped = 1;
>   }
>   }
> else if (uses_global_memory)
>@@ -7561,6 +7561,8 @@ compute_points_to_sets (void)
> *pt = cfun->gimple_df->escaped;
> pt->nonlocal = 1;
>   }
>+else
>+  memset (pt, 0, sizeof (struct pt_solution));
>   }
> 
> pt = gimple_call_clobber_set (stmt);
>@@ -7582,8 +7584,8 @@ compute_points_to_sets (void)
>always escaped.  */
> if (writes_global_memory)
>   {
>-pt->nonlocal = writes_global_memory;
>-pt->escaped = writes_global_memory;
>+pt->nonlocal = 1;
>+pt->escaped = 1;
>   }
>   }
> else if (writes_global_memory)
>@@ -7593,6 +7595,8 @@ compute_points_to_sets (void)
> *pt = cfun->gimple_df->escaped;
> pt->nonlocal = 1;
>   }
>+else
>+  memset (pt, 0, sizeof (struct pt_solution));
>   }
>   }
> }

[committed] libstdc++: Fix std::stack deduction guide

2021-10-19 Thread Jonathan Wakely via Gcc-patches

libstdc++-v3/ChangeLog:

* include/bits/stl_stack.h (stack(Iterator, Iterator)): Remove
non-deducible template parameter from deduction guide.
* testsuite/23_containers/stack/deduction.cc: Check new C++23
deduction guides.

Tested powerpc64le-linux. Committed to trunk.

commit c4ecb11e4f7ea15f636e463248c8b14083bef05d
Author: Jonathan Wakely 
Date:   Tue Oct 19 11:38:26 2021

libstdc++: Fix std::stack deduction guide

libstdc++-v3/ChangeLog:

* include/bits/stl_stack.h (stack(Iterator, Iterator)): Remove
non-deducible template parameter from deduction guide.
* testsuite/23_containers/stack/deduction.cc: Check new C++23
deduction guides.

diff --git a/libstdc++-v3/include/bits/stl_stack.h 
b/libstdc++-v3/include/bits/stl_stack.h
index 429743f5514..76b2e242c37 100644
--- a/libstdc++-v3/include/bits/stl_stack.h
+++ b/libstdc++-v3/include/bits/stl_stack.h
@@ -322,7 +322,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 -> stack;
 
 #ifdef __cpp_lib_adaptor_iterator_pair_constructor
-  template::value_type,
   typename = _RequireInputIter<_InputIterator>>
diff --git a/libstdc++-v3/testsuite/23_containers/stack/deduction.cc 
b/libstdc++-v3/testsuite/23_containers/stack/deduction.cc
index 169a063687b..dea7ba060d9 100644
--- a/libstdc++-v3/testsuite/23_containers/stack/deduction.cc
+++ b/libstdc++-v3/testsuite/23_containers/stack/deduction.cc
@@ -87,3 +87,17 @@ test02()
   std::stack s8(std::move(l), l.get_allocator());
   check_type>>(s8);
 }
+
+#if __cpp_lib_adaptor_iterator_pair_constructor
+void
+test03()
+{
+  std::list l;
+
+  std::stack s1(l.begin(), l.end());
+  check_type>(s1);
+
+  std::stack s2(l.begin(), l.end(), std::allocator());
+  check_type>(s1);
+}
+#endif

[committed] libstdc++: Implement monadic operations for std::optional (P0798R8)

2021-10-19 Thread Jonathan Wakely via Gcc-patches

Another new addition to the C++23 working draft.

The new member functions of std::optional are only defined for C++23,
but the new members of _Optional_payload_base are defined for C++20 so
that they can be used in non-propagating-cache in . The
_Optional_payload_base::_M_construct member can also be used in
non-propagating-cache now, because it's constexpr since r12-4389.

There will be an LWG issue about the feature test macro, suggesting that
we should just bump the value of __cpp_lib_optional instead. I haven't
done that here, but it can be changed once consensus is reached on the
change.

libstdc++-v3/ChangeLog:

* include/std/optional (_Optional_payload_base::_Storage): Add
constructor taking a callable function to invoke.
(_Optional_payload_base::_M_apply): New function.
(__cpp_lib_monadic_optional): Define for C++23.
(optional::and_then, optional::transform, optional::or_else):
Define for C++23.
* include/std/ranges (__detail::__cached): Remove.
(__detail::__non_propagating_cache): Remove use of __cached for
contained value. Use _Optional_payload_base::_M_construct and
_Optional_payload_base::_M_apply to set the contained value.
* include/std/version (__cpp_lib_monadic_optional): Define.
* testsuite/20_util/optional/monadic/and_then.cc: New test.
* testsuite/20_util/optional/monadic/or_else.cc: New test.
* testsuite/20_util/optional/monadic/or_else_neg.cc: New test.
* testsuite/20_util/optional/monadic/transform.cc: New test.
* testsuite/20_util/optional/monadic/version.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 82b2e4f8cf5a01c6724fe3f465a77ee03cfcaae2
Author: Jonathan Wakely 
Date:   Tue Oct 19 11:06:56 2021

libstdc++: Implement monadic operations for std::optional (P0798R8)

Another new addition to the C++23 working draft.

The new member functions of std::optional are only defined for C++23,
but the new members of _Optional_payload_base are defined for C++20 so
that they can be used in non-propagating-cache in . The
_Optional_payload_base::_M_construct member can also be used in
non-propagating-cache now, because it's constexpr since r12-4389.

There will be an LWG issue about the feature test macro, suggesting that
we should just bump the value of __cpp_lib_optional instead. I haven't
done that here, but it can be changed once consensus is reached on the
change.

libstdc++-v3/ChangeLog:

* include/std/optional (_Optional_payload_base::_Storage): Add
constructor taking a callable function to invoke.
(_Optional_payload_base::_M_apply): New function.
(__cpp_lib_monadic_optional): Define for C++23.
(optional::and_then, optional::transform, optional::or_else):
Define for C++23.
* include/std/ranges (__detail::__cached): Remove.
(__detail::__non_propagating_cache): Remove use of __cached for
contained value. Use _Optional_payload_base::_M_construct and
_Optional_payload_base::_M_apply to set the contained value.
* include/std/version (__cpp_lib_monadic_optional): Define.
* testsuite/20_util/optional/monadic/and_then.cc: New test.
* testsuite/20_util/optional/monadic/or_else.cc: New test.
* testsuite/20_util/optional/monadic/or_else_neg.cc: New test.
* testsuite/20_util/optional/monadic/transform.cc: New test.
* testsuite/20_util/optional/monadic/version.cc: New test.

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index b69268b3642..eac91d3c160 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1,6 +1,7 @@
 //  -*- C++ -*-
 
 // Copyright (C) 2013-2021 Free Software Foundation, Inc.
+// Copyright The GNU Toolchain Authors.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -44,6 +45,10 @@
 #include  // in_place_t
 #if __cplusplus > 201703L
 # include 
+# include  // std::__invoke
+#endif
+#if __cplusplus > 202002L
+# include 
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -81,6 +86,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Tag to disengage optional objects.
   inline constexpr nullopt_t nullopt { nullopt_t::_Construct::_Token };
 
+  template struct _Optional_func { _Fn& _M_f; };
+
   /**
*  @brief Exception class thrown when a disengaged optional object is
*  dereferenced.
@@ -211,6 +218,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _M_value(__il, std::forward<_Args>(__args)...)
{ }
 
+#if __cplusplus >= 202002L
+ template
+   constexpr
+   _Storage(_Optional_func<_Fn> __f, _Arg&& __arg)
+   : _M_value(std::__invoke(std::forward<_Fn>(__f._M_f),
+

[committed] libstdc++: Change std::variant union member to empty struct

2021-10-19 Thread Jonathan Wakely via Gcc-patches

This more clearly expresses the intent (a completely unused, trivial
type) than using char. It's also consistent with the unions in
std::optional.

libstdc++-v3/ChangeLog:

* include/std/variant (_Uninitialized): Use an empty struct
for the unused union member, instead of char.

Tested powerpc64le-linux. Committed to trunk.

commit 5a8832b1659e311437d25b7ec8b078be27ae54b8
Author: Jonathan Wakely 
Date:   Tue Oct 19 11:53:27 2021

libstdc++: Change std::variant union member to empty struct

This more clearly expresses the intent (a completely unused, trivial
type) than using char. It's also consistent with the unions in
std::optional.

libstdc++-v3/ChangeLog:

* include/std/variant (_Uninitialized): Use an empty struct
for the unused union member, instead of char.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index d18365fde22..3da7dad1e82 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -293,8 +293,10 @@ namespace __variant
   constexpr _Type&& _M_get() && noexcept
   { return std::move(_M_storage); }
 
+  struct _Empty_byte { };
+
   union {
-   char _M_nope;
+   _Empty_byte _M_empty;
_Type _M_storage;
   };
 #else

Re: [PATCH, v2, OpenMP, Fortran] Support in_reduction for Fortran

2021-10-19 Thread Jakub Jelinek via Gcc-patches

On Tue, Oct 19, 2021 at 09:03:06PM +0800, Chung-Lin Tang wrote:
> 2021-10-19  Chung-Lin Tang  
> 
> gcc/fortran/ChangeLog:
> 
>   * openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
>   false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
>   (gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
>   adjust call to gfc_match_omp_clause_reduction.
>   (match_omp): Adjust call to gfc_match_omp_clauses
>   * trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
>   gfc_match_omp_clause, create and return block.
> 
> gcc/ChangeLog:
> 
>   * omp-low.c (omp_copy_decl_2): For !ctx, use record_vars to add new copy
>   as local variable.
>   (scan_sharing_clauses): Place copy of OMP_CLAUSE_IN_REDUCTION decl in
>   ctx->outer instead of ctx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/reduction4.f90: Adjust omp target in_reduction' scan
>   pattern.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
>   * testsuite/libgomp.fortran/target-in-reduction-2.f90: New test.

LGTM, thanks.

Jakub

[PATCH] Refactor vect_supportable_dr_alignment

2021-10-19 Thread Richard Biener via Gcc-patches

This refactors vect_supportable_dr_alignment to get the misalignment
as input parameter which allows us to elide modifying/restoring
of DR_MISALIGNMENT during alignment peeling analysis which eventually
makes it more straight-forward to split out the negative step
handling.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

2021-10-19  Richard Biener  

* tree-vectorizer.h (vect_supportable_dr_alignment): Add
misalignment parameter.
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs):
Do not change DR_MISALIGNMENT in place, instead pass the
adjusted misalignment to vect_supportable_dr_alignment.
(vect_peeling_supportable): Likewise.
(vect_peeling_hash_get_lowest_cost): Adjust.
(vect_enhance_data_refs_alignment): Likewise.
(vect_vfa_access_size): Likewise.
(vect_supportable_dr_alignment): Add misalignment
parameter and simplify.
* tree-vect-stmts.c (get_negative_load_store_type): Adjust.
(get_group_load_store_type): Likewise.
(get_load_store_type): Likewise.
---
 gcc/tree-vect-data-refs.c | 113 +++---
 gcc/tree-vect-stmts.c |  26 +
 gcc/tree-vectorizer.h |   2 +-
 3 files changed, 85 insertions(+), 56 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 0db6aec7312..556ae9725f1 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1529,37 +1529,49 @@ vect_get_peeling_costs_all_drs (loop_vec_info 
loop_vinfo,
unsigned int *outside_cost,
stmt_vector_for_cost *body_cost_vec,
stmt_vector_for_cost *prologue_cost_vec,
-   unsigned int npeel,
-   bool unknown_misalignment)
+   unsigned int npeel)
 {
   vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
 
+  bool dr0_alignment_known_p
+= (dr0_info
+   && known_alignment_for_access_p (dr0_info,
+   STMT_VINFO_VECTYPE (dr0_info->stmt)));
+
   for (data_reference *dr : datarefs)
 {
   dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
   if (!vect_relevant_for_alignment_p (dr_info))
continue;
 
-  int save_misalignment;
-  save_misalignment = dr_info->misalignment;
+  tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
+  dr_alignment_support alignment_support_scheme;
+  int misalignment;
+  unsigned HOST_WIDE_INT alignment;
+
   if (npeel == 0)
-   ;
-  else if (unknown_misalignment && dr_info == dr0_info)
-   SET_DR_MISALIGNMENT (dr_info,
-vect_dr_misalign_for_aligned_access (dr0_info));
+   misalignment = dr_misalignment (dr_info, vectype);
+  else if (dr_info == dr0_info
+  || vect_dr_aligned_if_peeled_dr_is (dr_info, dr0_info))
+   misalignment = 0;
+  else if (!dr0_alignment_known_p
+  || !known_alignment_for_access_p (dr_info, vectype)
+  || !DR_TARGET_ALIGNMENT (dr_info).is_constant (&alignment))
+   misalignment = DR_MISALIGNMENT_UNKNOWN;
   else
-   vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
-  /* ???  We should be able to avoid both the adjustment before and the
-call to vect_supportable_dr_alignment below.  */
-  tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
-  int misalignment = dr_misalignment (dr_info, vectype);
-  dr_alignment_support alignment_support_scheme
-   = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype);
+   {
+ misalignment = dr_misalignment (dr_info, vectype);
+ misalignment += npeel * TREE_INT_CST_LOW (DR_STEP (dr_info->dr));
+ misalignment &= alignment - 1;
+   }
+  alignment_support_scheme
+   = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype,
+misalignment);
+
   vect_get_data_access_cost (loop_vinfo, dr_info,
 alignment_support_scheme, misalignment,
 inside_cost, outside_cost,
 body_cost_vec, prologue_cost_vec);
-  SET_DR_MISALIGNMENT (dr_info, save_misalignment);
 }
 }
 
@@ -1583,7 +1595,7 @@ vect_peeling_hash_get_lowest_cost (_vect_peel_info **slot,
 
   vect_get_peeling_costs_all_drs (loop_vinfo, elem->dr_info, &inside_cost,
  &outside_cost, &body_cost_vec,
- &prologue_cost_vec, elem->npeel, false);
+ &prologue_cost_vec, elem->npeel);
 
   body_cost_vec.release ();
 
@@ -1655,25 +1667,37 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, 
dr_vec_info *dr0_info,
   vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
   enum dr_alignment_support supportable_dr_alignment;
 
+  bo

Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-19 Thread Segher Boessenkool

Hi!

On Thu, Oct 14, 2021 at 09:49:30AM +0200, Martin Liška wrote:
> gcc/ChangeLog:
>   * config/rs6000/rs6000.c (rs6000_override_options_after_change):
>   Do not set flag_rename_registers, it's already default behavior.

It defaults to *off*?

frename-registers
Common Var(flag_rename_registers) Optimization EnabledBy(funroll-loops)
Perform a register renaming optimization pass.

>   Use EnabledBy for unroll_only_small_loops.
>   * config/rs6000/rs6000.opt: Use EnabledBy for
>   unroll_only_small_loops.
> ---
>  gcc/config/rs6000/rs6000.c   | 7 +--
>  gcc/config/rs6000/rs6000.opt | 2 +-
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index acba4d9f26c..40146179e06 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
>/* Explicit -funroll-loops turns -munroll-only-small-loops off, and
>   turns -frename-registers on.  */
>if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
> -   || (OPTION_SET_P (flag_unroll_all_loops)
> -&& flag_unroll_all_loops))
> +   || (OPTION_SET_P (flag_unroll_all_loops && flag_unroll_all_loops)))

That doesn't do what the changelog said, and it is not obvious at all
that this is correct?  Maybe this works with the current implementation
of that macro (I assume you tested it works :-) ), but that is not
something you can depend on.  This expands to
  global_options_set.x_flag_unroll_all_loops && flag_unroll_all_loops
just like the previous code did, but that one was obvious, and this is
not.

>  {
> -  if (!OPTION_SET_P (unroll_only_small_loops))
> - unroll_only_small_loops = 0;
> -  if (!OPTION_SET_P (flag_rename_registers))
> - flag_rename_registers = 1;
>if (!OPTION_SET_P (flag_cunroll_grow_size))
>   flag_cunroll_grow_size = 1;
>  }
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 9d7878f144a..faeb7423ca7 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) 
> Save
>  Analyze and remove doubleword swaps from VSX computations.
>  
>  munroll-only-small-loops
> -Target Undocumented Var(unroll_only_small_loops) Init(0) Save
> +Target Undocumented Var(unroll_only_small_loops) Init(0) Save 
> EnabledBy(funroll-loops)

Your patches cannot apply.  Please send them non-wordwrapped.

This isn't the endpoint of the changes here I hope?  The macro games
make everything less readable (so, harder to change) and more fragile.


Segher

Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-19 Thread Segher Boessenkool

On Fri, Oct 15, 2021 at 05:24:32PM +0200, Martin Liška wrote:
> On 10/14/21 17:10, Bill Schmidt via Gcc-patches wrote:
> >Looks like you got your parentheses wrong here.
> 
> Whoops, thanks for the heads up.
> 
> I'm testing this fixed version.

Please start a new thread for every new patch (series).  I missed this
one like this, instead I reviewed the older one.

[-- Attachment #2: 0001-rs6000-Remove-unnecessary-option-manipulation.patch --]
[-- Type: text/x-patch, Encoding: base64, Size: 2.6K --]

Don't encode as non-text.  Don't use x-anything if you can help it, it
is meaningless in email.

Use git send-email, it makes everything work :-)

Segher

Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr

2021-10-19 Thread Richard Sandiford via Gcc-patches

Prathamesh Kulkarni  writes:
> Hi,
> The attached patch emits a more verbose diagnostic for target attribute that
> is an architecture extension needing a leading '+'.
>
> For the following test,
> void calculate(void) __attribute__ ((__target__ ("sve")));
>
> With patch, the compiler now emits:
> 102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
> 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
>   | ^~~~
>
> instead of:
> 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
> 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
>   | ^~~~

Nice :-)

> (This isn't specific to sve though).
> OK to commit after bootstrap+test ?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a9a1800af53..975f7faf968 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
>num_attrs++;
>if (!aarch64_process_one_target_attr (token))
>   {
> -   error ("pragma or attribute % is not valid", token);
> +   /* Check if token is possibly an arch extension without
> +  leading '+'.  */
> +   char *str = (char *) xmalloc (strlen (token) + 2);
> +   str[0] = '+';
> +   strcpy(str + 1, token);

I think std::string would be better here, e.g.:

  auto with_plus = std::string ("+") + token;

> +   if (aarch64_handle_attr_isa_flags (str))
> + error("arch extension %<%s%> should be prepended with %<+%>", 
> token);

Nit: should be a space before the “(”.

In principle, a fixit hint would have been nice here, but I don't think
we have enough information to provide one.  (Just saying for the record.)

Thanks,
Richard

> +   else
> + error ("pragma or attribute % is not valid", 
> token);
> +   free (str);
> return false;
>   }
>

Re: [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers

2021-10-19 Thread Segher Boessenkool

On Tue, Oct 19, 2021 at 08:10:06AM -0500, Bill Schmidt via Gcc-patches wrote:
> Hi Paul,
> 
> On 10/18/21 8:15 PM, Paul A. Clarke wrote:
> > Fix an ommission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

(Typo, s/mm/m/)

> > 2021-10-18  Paul A. Clarke  
> >
> > gcc
> > * config/config.gcc (extra_headers): Add nmmintrin.h.
> > ---
> >  gcc/config.gcc | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index aa5bd5d14590..1cb9303b3a85 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -490,6 +490,7 @@ powerpc*-*-*)
> > extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
> > extra_headers="${extra_headers} mmintrin.h x86intrin.h"
> > extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
> > +   extra_headers="${extra_headers} nmmintrin.h"
> > extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
> > si2vmx.h"
> > extra_headers="${extra_headers} amo.h"
> > case x$with_cpu in
> 
> In my opinion, you can commit this one as obvious.

Or as trivial.  Or as obvious and trivial :-)


Segher

Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations

2021-10-19 Thread Segher Boessenkool

On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Some compatibility implementations of x86 intrinsics include
> Power intrinsics which require POWER8.  Guard them.

I assume this improves on all previous commented things (you don't say
if it does).

> gcc
>   PR target/101893
>   PR target/102719
>   * config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
>   * config/rs6000/pmmintrin.h: Same.
>   * config/rs6000/smmintrin.h: Same.
>   * config/rs6000/tmmintrin.h: Same.

Okay for trunk.  Thanks!


Segher

[PATCH] libstdc++: Implement LWG 3549 changes to ranges::enable_view

2021-10-19 Thread Patrick Palka via Gcc-patches

This patch also reverts the r11-3504 workaround since it's made obsolete
by this resolution.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk only?

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (view_interface): Forward declare.
(__detail::__is_derived_from_view_interface_fn): Declare
(__detail::__is_derived_from_view_interface): Define as per LWG 3549.
(enable_view): Adjust as per LWG 3549.
* include/bits/ranges_util.h (view_interface): Don't derive from
view_base.
* include/std/ranges (filter_view): Revert r11-3504 change.
(transform_view): Likewise.
(take_view): Likewise.
(take_while_view): Likewise.
(drop_view): Likewise.
(drop_while_view): Likewise.
(join_view): Likewise.
(split_view): Likewise.
(reverse_view): Likewise.
* testsuite/std/ranges/adaptors/sizeof.cc: Update expected
sizes.
* testsuite/std/ranges/view.cc (test_view::test_view): Remove
now that views no longer need to be default-initializable.
(test01): New test.
---
 libstdc++-v3/include/bits/ranges_base.h   | 21 -
 libstdc++-v3/include/bits/ranges_util.h   |  2 +-
 libstdc++-v3/include/std/ranges   | 45 +--
 .../testsuite/std/ranges/adaptors/sizeof.cc   |  6 +--
 libstdc++-v3/testsuite/std/ranges/view.cc | 28 ++--
 5 files changed, 70 insertions(+), 32 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index 01d0c35f4b4..7801b2fd023 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -614,12 +614,31 @@ namespace ranges
   template
 using range_size_t = decltype(ranges::size(std::declval<_Range&>()));
 
+  template
+requires is_class_v<_Derived> && same_as<_Derived, remove_cv_t<_Derived>>
+class view_interface; // defined in 
+
+  namespace __detail
+  {
+template
+  requires (!same_as<_Tp, view_interface<_Up>>)
+  void __is_derived_from_view_interface_fn(const _Tp&,
+  const view_interface<_Up>&); // 
not defined
+
+// Returns true iff _Tp has exactly one public base class that's a
+// specialization of view_interface.
+template
+  concept __is_derived_from_view_interface
+   = requires (_Tp __t) { __is_derived_from_view_interface_fn(__t, __t); };
+  }
+
   /// [range.view] The ranges::view_base type.
   struct view_base { };
 
   /// [range.view] The ranges::enable_view boolean.
   template
-inline constexpr bool enable_view = derived_from<_Tp, view_base>;
+inline constexpr bool enable_view = derived_from<_Tp, view_base>
+  || __detail::__is_derived_from_view_interface<_Tp>;
 
   /// [range.view] The ranges::view concept.
   template
diff --git a/libstdc++-v3/include/bits/ranges_util.h 
b/libstdc++-v3/include/bits/ranges_util.h
index 1afa66d298c..5c0bef26220 100644
--- a/libstdc++-v3/include/bits/ranges_util.h
+++ b/libstdc++-v3/include/bits/ranges_util.h
@@ -61,7 +61,7 @@ namespace ranges
   /// The ranges::view_interface class template
   template
 requires is_class_v<_Derived> && same_as<_Derived, remove_cv_t<_Derived>>
-class view_interface : public view_base
+class view_interface
 {
 private:
   constexpr _Derived& _M_derived() noexcept
diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 64396027c1b..e47fc075bbe 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1533,9 +1533,9 @@ namespace views::__adaptor
{ return __y.__equal(__x); }
   };
 
+  _Vp _M_base = _Vp();
   [[no_unique_address]] __detail::__box<_Pred> _M_pred;
   [[no_unique_address]] __detail::_CachedPosition<_Vp> _M_cached_begin;
-  _Vp _M_base = _Vp();
 
 public:
   filter_view() requires (default_initializable<_Vp>
@@ -1544,7 +1544,7 @@ namespace views::__adaptor
 
   constexpr
   filter_view(_Vp __base, _Pred __pred)
-   : _M_pred(std::move(__pred)), _M_base(std::move(__base))
+   : _M_base(std::move(__base)), _M_pred(std::move(__pred))
   { }
 
   constexpr _Vp
@@ -1900,8 +1900,8 @@ namespace views::__adaptor
  friend _Sentinel;
};
 
-  [[no_unique_address]] __detail::__box<_Fp> _M_fun;
   _Vp _M_base = _Vp();
+  [[no_unique_address]] __detail::__box<_Fp> _M_fun;
 
 public:
   transform_view() requires (default_initializable<_Vp>
@@ -1910,7 +1910,7 @@ namespace views::__adaptor
 
   constexpr
   transform_view(_Vp __base, _Fp __fun)
-   : _M_fun(std::move(__fun)), _M_base(std::move(__base))
+   : _M_base(std::move(__base)), _M_fun(std::move(__fun))
   { }
 
   constexpr _Vp
@@ -2037,15 +2037,15 @@ namespace views::__adaptor
  friend _Sentinel;
};
 
-  range_difference_t<_Vp> _M_count = 0;
   _Vp

Re: [PATCH] AArch64: Tune case-values-threshold

2021-10-19 Thread Bernhard Reutner-Fischer via Gcc-patches

On 19 October 2021 15:23:58 CEST, Richard Sandiford via Gcc-patches 
 wrote:
>Wilco Dijkstra  writes:
>> Hi Richard,
>>
>>> I'm just concerned that here we're using the same explanation but with
>>> different numbers.  Why are the new numbers more right than the old ones
>>> (especially when it comes to code size, where the trade-off hasn't
>>> really changed)?
>>
>> Like all tuning/costing parameters, these values are never fixed but change
>> over time due to optimizations, micro architectures and workloads.
>> The previous values were out of date so that's why I retuned them by
>> benchmarking different values and choosing the best combinations.
>>
>>> It would be good to have more discussion of why certain numbers are
>>> too small or too high, and why 8 is the right pivot point for -Os.
>>
>> You mean add more discussion in the comment? That comment is already overly
>> large and too specific - it would be better to reduce it. The -Os value was 
>> never
>> tuned, and 8 turns out to be faster and smaller than GCC's default.
>
>The problem is that you're effectively asking for these values to be
>taken on faith without providing any analysis and without describing
>how you arrived at the new numbers.  Did you try other values too?
>If so, how did they compare with the numbers that you finally chose?
>At least that would give an indication of where the boundaries are.

Maybe you can show csibe benchmark numbers to show the effects:
http://szeged.github.io/csibe/

thanks,
>
>For example, it's easier to believe that 8 is the right value for -Os if
>you say that you tried 9 and 7 as well, and they were worse than 8 by X%
>and Y%.  This would also help anyone who wants to tweak the numbers
>again in future.
>
>BTW, which CPU did you use to do the experiments?  Are the tuning
>parameters for that CPU already consistent with the new generic values?
>
>Thanks,
>Richard

Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-19 Thread Martin Liška


On 10/19/21 16:23, Segher Boessenkool wrote:

On Fri, Oct 15, 2021 at 05:24:32PM +0200, Martin Liška wrote:

On 10/14/21 17:10, Bill Schmidt via Gcc-patches wrote:

Looks like you got your parentheses wrong here.


Whoops, thanks for the heads up.

I'm testing this fixed version.


Please start a new thread for every new patch (series).  I missed this
one like this, instead I reviewed the older one.


Is it really best practice. My impression is that patch review (iterating over
a patch) happens in the same thread (in most cases). It's caused by discussion
in between sender reviewers.



[-- Attachment #2: 0001-rs6000-Remove-unnecessary-option-manipulation.patch --]
[-- Type: text/x-patch, Encoding: base64, Size: 2.6K --]


Meh :) If I need a reply to somebody's questions, I always attach patch as an 
attachment.
And I can't likely influence how Thunderbird is going to mark it.



Don't encode as non-text.  Don't use x-anything if you can help it, it
is meaningless in email.

Use git send-email, it makes everything work :-)


I know.

Anyway, sending updated version of the patch.

Cheers,
Martin




Segher

From d32d9445d8ec868abc965f167fffa10ab19417e5 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 13 Oct 2021 14:20:33 +0200
Subject: [PATCH] rs6000: Remove unnecessary option manipulation.

gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_override_options_after_change):
	Do not set flag_rename_registers, it's already enabled with
	EnabledBy(funroll-loops).
	Use EnabledBy for unroll_only_small_loops.
	* config/rs6000/rs6000.opt: Use EnabledBy for
	unroll_only_small_loops.
---
 gcc/config/rs6000/rs6000.c   | 7 +--
 gcc/config/rs6000/rs6000.opt | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 01a95591a5d..b9dddcd0aa1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
   /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
  turns -frename-registers on.  */
   if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
-   || (OPTION_SET_P (flag_unroll_all_loops)
-	   && flag_unroll_all_loops))
+   || (OPTION_SET_P (flag_unroll_all_loops) && flag_unroll_all_loops))
 {
-  if (!OPTION_SET_P (unroll_only_small_loops))
-	unroll_only_small_loops = 0;
-  if (!OPTION_SET_P (flag_rename_registers))
-	flag_rename_registers = 1;
   if (!OPTION_SET_P (flag_cunroll_grow_size))
 	flag_cunroll_grow_size = 1;
 }
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..faeb7423ca7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
 munroll-only-small-loops
-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save EnabledBy(funroll-loops)
 ; Use conservative small loop unrolling.
 
 mpower9-misc
-- 
2.33.1

Re: [PATCH] c++: Fix up push_local_extern_decl_alias error recovery [PR102642]

2021-10-19 Thread Jason Merrill via Gcc-patches


On 10/11/21 07:18, Jakub Jelinek wrote:

Hi!

My recent push_local_extern_decl_alias change broke error-recovery,
do_pushdecl can return error_mark_node and set_decl_tls_model can't be
called on that.  There are other code paths that store error_mark_node
into DECL_LOCAL_DECL_ALIAS, with the intent to differentiate the cases
where we haven't yet tried to push it into the namespace scope (NULL)
and one where we have tried it but it failed (error_mark_node), but looking
around, there are other spots where we call functions or do processing
which doesn't tolerate error_mark_node.

So, the first hunk with the testcase fixes the testcase, the others
fix what I've spotted and the fix was easy to figure out (there are I think
3 other spots mainly for function multiversioning).


What if we use NULL_TREE for the error case instead of error_mark_node, i.e.

-  DECL_LOCAL_DECL_ALIAS (decl) = alias;
+  if (alias != error_mark_node)
+DECL_LOCAL_DECL_ALIAS (decl) = alias;

?  That ought to avoid the need to change other functions, since they 
already check for null.



Ok for trunk and 11.3 (where I've backported the tls fix before) if it
passes bootstrap/regtest?

2021-10-11  Jakub Jelinek  

PR c++/102642
* name-lookup.c (push_local_extern_decl_alias): Don't call
set_decl_tls_model on error_mark_node.
* decl.c (make_rtl_for_nonlocal_decl): Don't call
set_user_assembler_name on error_mark_node.
* parser.c (cp_parser_oacc_declare): Ignore DECL_LOCAL_DECL_ALIAS
if it is error_mark_node.
(cp_parser_omp_declare_target): Likewise.

* g++.dg/tls/pr102642.C: New test.

--- gcc/cp/name-lookup.c.jj 2021-10-01 10:30:07.674588541 +0200
+++ gcc/cp/name-lookup.c2021-10-11 12:43:39.261051228 +0200
@@ -3474,7 +3474,9 @@ push_local_extern_decl_alias (tree decl)
  push_nested_namespace (ns);
  alias = do_pushdecl (alias, /* hiding= */true);
  pop_nested_namespace (ns);
- if (VAR_P (decl) && CP_DECL_THREAD_LOCAL_P (decl))
+ if (VAR_P (decl)
+ && CP_DECL_THREAD_LOCAL_P (decl)
+ && alias != error_mark_node)
set_decl_tls_model (alias, DECL_TLS_MODEL (decl));
}
  }
--- gcc/cp/decl.c.jj2021-10-09 10:07:51.883704975 +0200
+++ gcc/cp/decl.c   2021-10-11 12:49:33.810977118 +0200
@@ -7373,7 +7373,8 @@ make_rtl_for_nonlocal_decl (tree decl, t
 This is horrible, as we're affecting a
 possibly-shared decl.  Again, a one-true-decl
 model breaks down.  */
- set_user_assembler_name (ns_decl, asmspec);
+ if (ns_decl != error_mark_node)
+   set_user_assembler_name (ns_decl, asmspec);
}
  }
  
--- gcc/cp/parser.c.jj	2021-10-09 10:14:24.043098112 +0200

+++ gcc/cp/parser.c 2021-10-11 12:47:21.220874667 +0200
@@ -44437,7 +44437,8 @@ cp_parser_oacc_declare (cp_parser *parse
   dependent local extern variable decls are as rare as
   hen's teeth.  */
if (auto alias = DECL_LOCAL_DECL_ALIAS (decl))
- decl = alias;
+ if (alias != error_mark_node)
+   decl = alias;
  
  	  if (OMP_CLAUSE_MAP_KIND (t) == GOMP_MAP_LINK)

id = get_identifier ("omp declare target link");
@@ -45665,7 +45666,8 @@ cp_parser_omp_declare_target (cp_parser
if (VAR_OR_FUNCTION_DECL_P (t)
  && DECL_LOCAL_DECL_P (t)
  && DECL_LANG_SPECIFIC (t)
- && DECL_LOCAL_DECL_ALIAS (t))
+ && DECL_LOCAL_DECL_ALIAS (t)
+ && DECL_LOCAL_DECL_ALIAS (t) != error_mark_node)
handle_omp_declare_target_clause (c, DECL_LOCAL_DECL_ALIAS (t),
  device_type);
  }
--- gcc/testsuite/g++.dg/tls/pr102642.C.jj  2021-10-11 13:00:35.889503002 
+0200
+++ gcc/testsuite/g++.dg/tls/pr102642.C 2021-10-11 13:00:20.388724721 +0200
@@ -0,0 +1,10 @@
+// PR c++/102642
+// { dg-do compile { target c++11 } }
+
+thread_local int *z;   // { dg-message "previous declaration" }
+
+void
+foo ()
+{
+  extern thread_local int z;   // { dg-error "conflicting declaration" }
+}

Jakub

[Patch, committed] Fortran: Fix 'fn spec' for deferred character length (was: Re: [r12-4457 Regression] FAIL: gfortran.dg/deferred_type_param_6.f90 -Os execution test on Linux/x86_64)

2021-10-19 Thread Tobias Burnus


On 16.10.21 20:54, Jan Hubicka wrote:

I wrote:

Fortran has for a long time 'character(len=5), allocatable" or
"character(len=*)". In the first case, the "5" can be ignored as both
caller and callee know the length. In the second case, the length is
determined by the argument, but it cannot be changed.

Since a not-that-short while, 'len=:' together with allocatable/pointer
is supported.

In the latter case, the value can be change when the array
association/allocation is changed.
...
+  if (!sym->ts.u.cl->length
+  && ((sym->attr.allocatable && sym->attr.target)
+  || sym->attr.pointer))
+spec[spec_len++] = '.';
+  if (!sym->ts.u.cl->length && sym->attr.allocatable)
+spec[spec_len++] = 'w';
+  else
+spec[spec_len++] = 'R';

Also escaping is quite important bit of information so it would be
good to figure out if it really can escape rather than playing safe.


The pointer to the string length variable itself does not escape,
only its integer string value:

subroutine foo(x)
  character(len=:), pointer :: x
  character(len=:), pointer :: y
  y => x
has in the dump:
  .y = *_x;
  y = (character(kind=1)[1:.y] *) *x;

Thus, 'w' can always be used.

Committed as obvious as r12-4511-gff0eec94e87dfb7dc387f120ca5ade2707aecf50

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit ff0eec94e87dfb7dc387f120ca5ade2707aecf50
Author: Tobias Burnus 
Date:   Tue Oct 19 16:38:56 2021 +0200

Fortran: Fix 'fn spec' for deferred character length

Shows now up with gfortran.dg/deferred_type_param_6.f90 due to more ME
optimizations, causing fails without this commit.

gcc/fortran/ChangeLog:

* trans-types.c (create_fn_spec): For allocatable/pointer
character(len=:), use 'w' not 'R' as fn spec for the length dummy
argument.
---
 gcc/fortran/trans-types.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 50fceebc941..42778067dbe 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -3014,7 +3014,11 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
 	}
   if (sym->ts.type == BT_CHARACTER)
 	{
-	  spec[spec_len++] = 'R';
+	  if (!sym->ts.u.cl->length
+	  && (sym->attr.allocatable || sym->attr.pointer))
+	spec[spec_len++] = 'w';
+	  else
+	spec[spec_len++] = 'R';
 	  spec[spec_len++] = ' ';
 	}
 }

Re: [PATCH] AArch64: Tune case-values-threshold

2021-10-19 Thread Wilco Dijkstra via Gcc-patches

Hi Richard,

> The problem is that you're effectively asking for these values to be
> taken on faith without providing any analysis and without describing
> how you arrived at the new numbers.  Did you try other values too?
> If so, how did they compare with the numbers that you finally chose?
> At least that would give an indication of where the boundaries are.

Yes, I obviously tried other values, pretty much all in range 1-20. There is
generally a range of 4-5 values that are very similar in size, and then you
choose one in the middle which also looks good for performance.

> For example, it's easier to believe that 8 is the right value for -Os if
> you say that you tried 9 and 7 as well, and they were worse than 8 by X%
> and Y%.  This would also help anyone who wants to tweak the numbers
> again in future.

For -Os, the size range for values 6-10 is within 0.01% so they are virtually
identical and I picked the median. Whether this will remain best in the future
is unclear since it depends on so many things, so at some point it needs
to be looked at again, just like most other tunings.

> BTW, which CPU did you use to do the experiments?  Are the tuning
> parameters for that CPU already consistent with the new generic values?

This was done on Neoverse N1. Almost no CPUs use per-CPU tuning for this.

Cheers,
Wilco

Re: [PATCH] Add a simulate_record_decl lang hook

2021-10-19 Thread Jason Merrill via Gcc-patches


On 10/18/21 16:35, Richard Sandiford wrote:

Jason Merrill  writes:

On 9/24/21 13:53, Richard Sandiford wrote:

+  if (type == error_mark_node)
+return lhd_simulate_record_decl (loc, name, fields);


Why fall back to the language-independent function on error?  Is there a
case where that gives better error recovery than just returning
error_mark_node?


I don't think falling back necessarily improves future error messages
(or makes them worse).  The reason was more that the code to handle
target builtins generally expects to be able to create whatever types
and functions it wants.  If we return something unexpected, even it's
error_mark_node, then there's a higher risk of ICEs later on.

I guess that's a bit defeatist.  But in practice, the first code
that uses the hook will be code that previously ran at start-up
and so didn't have to worry about these errors.

In practice I think errors will be extremely rare.


+  xref_basetypes (type, NULL_TREE);
+  type = begin_class_definition (type);
+  if (type == error_mark_node)
+return lhd_simulate_record_decl (loc, name, fields);
+
+  for (tree field : fields)
+finish_member_declaration (field);
+
+  type = finish_struct (type, NULL_TREE);
+
+  tree decl = build_decl (loc, TYPE_DECL, ident, type);
+  TYPE_NAME (type) = decl;
+  TYPE_STUB_DECL (type) = decl;


Setting TYPE_NAME and TYPE_STUB_DECL to the typedef is wrong; it should
work to just remove these two lines.  I expect they're also wrong for C.

For C++ only, I wonder if you need this typedef at all.

If you do want it, you need to use set_underlying_type to create a real
typedef.  I expect that's also true for C.


Ah, yeah, thanks for the pointer.  Fixed in the patch below.

I wanted the hook to simulate the typedef even for C++ because its
first user will be arm_neon.h.  The spec for arm_neon.h says that the
types must be declared as:

   typedef struct int32x2x4_t { … } int32x2x4_t;

etc.  So, although it's a silly edge case, code that tries to take
advantage of the struct stat hack, such as:

   #include 
   struct int32x2x4_t int32x2x4_t = {};

should continue to be rejected for C++ as well as C.

Maybe in future we could add a flag to suppress the typedef if some
callers prefer that behaviour.

Tested as before.


Can the C++ hook go in cp-lang.c (which already includes 
langhooks-def.h) instead of decl.c?  With that change, the patch is OK 
in a week if nobody else has feedback.



gcc/
* langhooks.h (lang_hooks_for_types::simulate_record_decl): New hook.
* langhooks-def.h (lhd_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it.
* langhooks.c (lhd_simulate_record_decl): New function.

gcc/c/
* c-tree.h (c_simulate_record_decl): Declare.
* c-objc-common.h (LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
* c-decl.c (c_simulate_record_decl): New function.

gcc/cp/
* decl.c: Include langhooks-def.h.
(cxx_simulate_record_decl): New function.
* cp-objcp-common.h (cxx_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
---
  gcc/c/c-decl.c   | 30 ++
  gcc/c/c-objc-common.h|  2 ++
  gcc/c/c-tree.h   |  2 ++
  gcc/cp/cp-objcp-common.h |  4 
  gcc/cp/decl.c| 37 +
  gcc/langhooks-def.h  |  4 
  gcc/langhooks.c  | 19 +++
  gcc/langhooks.h  | 10 ++
  8 files changed, 108 insertions(+)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 771efa3eadf..186fa1692c1 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9436,6 +9436,36 @@ c_simulate_enum_decl (location_t loc, const char *name,
input_location = saved_loc;
return enumtype;
  }
+
+/* Implement LANG_HOOKS_SIMULATE_RECORD_DECL.  */
+
+tree
+c_simulate_record_decl (location_t loc, const char *name,
+   array_slice fields)
+{
+  location_t saved_loc = input_location;
+  input_location = loc;
+
+  class c_struct_parse_info *struct_info;
+  tree ident = get_identifier (name);
+  tree type = start_struct (loc, RECORD_TYPE, ident, &struct_info);
+
+  for (unsigned int i = 0; i < fields.size (); ++i)
+{
+  DECL_FIELD_CONTEXT (fields[i]) = type;
+  if (i > 0)
+   DECL_CHAIN (fields[i - 1]) = fields[i];
+}
+
+  finish_struct (loc, type, fields[0], NULL_TREE, struct_info);
+
+  tree decl = build_decl (loc, TYPE_DECL, ident, type);
+  set_underlying_type (decl);
+  lang_hooks.decls.pushdecl (decl);
+
+  input_location = saved_loc;
+  return type;
+}
  
  /* Create the FUNCTION_DECL for a function definition.
 DECLSPECS, DECLARATOR and ATTRIBUTES are the parts of
diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 7d35a0621e4..f4e8271f06c 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -81,6 +81,8 @@ along with GCC; see the file COPYING3.  I

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-19 Thread David Edelsohn via Gcc-patches

Hi, H.J.

My colleague built GCC, including GCC Go, with your patch:

"I was able to build libgo and test it partially.  The results are
similar to the current master without libffi updates. But 64bit tests
aren't working in both cases. It's related to LIBPATH issues..."

- David

On Mon, Oct 18, 2021 at 11:09 AM H.J. Lu  wrote:
>
> On Mon, Oct 18, 2021 at 8:04 AM David Edelsohn  wrote:
> >
> > Hi, H.J.
> >
> > My colleague responded that GCC Go builds and works on AIX, but it
> > currently requires a special, custom version of GNU objcopy that adds
> > support for the types of features that Go requires to operate on AIX
> > XCOFF files.  Those changes have not yet been updated and contributed
> > to GNU Binutils.
> >
> > I will see if I can install that version of objcopy standalone.  We
> > also can ask Clement and ATOS to test GCC Go build with your proposed
> > libffi patch, or is it vanilla libffi trunk?
>
> My libffi branch:
>
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/libffi/master
>
> synced with libffi v3.4.2, not master.
>
> BTW, the current master branch won't build libgo:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102796
>
> Thanks.
>
> > Thanks, David
> >
> >
> > On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu  wrote:
> > >
> > > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn  wrote:
> > > >
> > > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
> > > > >
> > > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Change in the v2 patch:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 
> > > > > > > > > > > > > > and cherry-picked fixes
> > > > > > > > > > > > > > from upstream over the last 10+ years.  In the 
> > > > > > > > > > > > > > meantime, libffi upstream
> > > > > > > > > > > > > > has been changed significantly with new features, 
> > > > > > > > > > > > > > bug fixes and new target
> > > > > > > > > > > > > > support.  Here is a set of patches to sync with 
> > > > > > > > > > > > > > libffi 3.4.2 release and
> > > > > > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. 
> > > > > > > > > > > > > >  It is availale at
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > > > > > > > >
> > > > > > > > > > > > > How did you test this?  It looks like libgo is the 
> > > > > > > > > > > > > only consumer of
> > > > > > > > > > > > > libffi these days.
> > > > > > > > > > > > > In particular go/libgo seems to be supported on 
> > > > > > > > > > > > > almost all targets besides
> > > > > > > > > > > > > darwin/windows - did you test cross and canadian 
> > > > > > > > > > > > > configurations?
> > > > > > > > > > > >
> > > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My 
> > > > > > > > > > > > understanding is that
> > > > > > > > > > > > the upstream libffi works on Darwin and Windows.
> > > > > > > > > > > >
> > > > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear 
> > > > > > > > > > > > > you won't get any "review"
> > > > > > > > > > > > > of this massive diff.
> > > > > > > > > > > >
> > > > > > > > > > > > I believe that it should just work.  Our libffi is very 
> > > > > > > > > > > > much out of date.
> > > > > > > > > > >
> > > > > > > > > > > Yes, you can hope.  And yes, our libffi is out of date.
> > > > > > > > > > >
> > > > > > > > > > > Can you please do the extra step to test one weird 
> > > > > > > > > > > architecture, namely
> > > > > > > > > > > powerpc64-aix which is available on the compile-farm?
> > >

Re: [PATCH] c++: Fix up push_local_extern_decl_alias error recovery [PR102642]

2021-10-19 Thread Jakub Jelinek via Gcc-patches

On Tue, Oct 19, 2021 at 10:48:04AM -0400, Jason Merrill wrote:
> What if we use NULL_TREE for the error case instead of error_mark_node, i.e.
> 
> -  DECL_LOCAL_DECL_ALIAS (decl) = alias;
> +  if (alias != error_mark_node)
> +DECL_LOCAL_DECL_ALIAS (decl) = alias;
> 
> ?  That ought to avoid the need to change other functions, since they
> already check for null.

True, but I'm worried about e.g. maybe_version_functions which does
  if (DECL_LOCAL_DECL_P (olddecl))
{
  olddecl = DECL_LOCAL_DECL_ALIAS (olddecl);
  maybe_mark_function_versioned (olddecl);
}
(I think the above will crash of DECL_LOCAL_DECL_ALIAS is NULL) and
  if (DECL_LOCAL_DECL_P (newdecl))
{
  /* Unfortunately, we can get here before pushdecl naturally calls
 push_local_extern_decl_alias, so we need to call it directly.  */
  if (!DECL_LOCAL_DECL_ALIAS (newdecl))
push_local_extern_decl_alias (newdecl);
  newdecl = DECL_LOCAL_DECL_ALIAS (newdecl);
  maybe_mark_function_versioned (newdecl);
}
which means that if there is an error from push_local_extern_decl_alias,
we'd report it twice rather than once (once from here, another time
from do_pushdecl).
I'm afraid maybe_version_functions needs fixing no matter what,
but the NULL vs. error_mark_node decision is probably dependent on whether
it is ok to error twice or not.

Jakub

Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations

2021-10-19 Thread Paul A. Clarke via Gcc-patches

On Tue, Oct 19, 2021 at 09:32:20AM -0500, Segher Boessenkool wrote:
> On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Some compatibility implementations of x86 intrinsics include
> > Power intrinsics which require POWER8.  Guard them.
> 
> I assume this improves on all previous commented things (you don't say
> if it does).

Sorry, I summarized the changes in the v4 cover letter. This patch
required no changes other than adding a new PR addressed by it.
The reasons for no changes was in my reply to your review of v3.

> > gcc
> > PR target/101893
> > PR target/102719
> > * config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
> > * config/rs6000/pmmintrin.h: Same.
> > * config/rs6000/smmintrin.h: Same.
> > * config/rs6000/tmmintrin.h: Same.
> 
> Okay for trunk.  Thanks!

Thanks!

PC

[committed] libstdc++: Fix doxygen generation to work with relative paths

2021-10-19 Thread Jonathan Wakely via Gcc-patches

In r12-826 I tried to remove some redundant steps from the doxygen
build, but they are needed when configure is run as a relative path. The
use of pwd is to resolve the relative path to an absolute one.

libstdc++-v3/ChangeLog:

* doc/Makefile.am (stamp-html-doxygen, stamp-html-doxygen)
(stamp-latex-doxygen, stamp-man-doxygen): Fix recipes for
relative ${top_srcdir}.
* doc/Makefile.in: Regenerate.

Tested x86_64-linux. Committed to trunk.

I'll backport to gcc-11 after testing finishes.


commit 04d392e8430ca66a3f12b7db4f3cb84788269a48
Author: Jonathan Wakely 
Date:   Tue Oct 19 16:00:13 2021

libstdc++: Fix doxygen generation to work with relative paths

In r12-826 I tried to remove some redundant steps from the doxygen
build, but they are needed when configure is run as a relative path. The
use of pwd is to resolve the relative path to an absolute one.

libstdc++-v3/ChangeLog:

* doc/Makefile.am (stamp-html-doxygen, stamp-html-doxygen)
(stamp-latex-doxygen, stamp-man-doxygen): Fix recipes for
relative ${top_srcdir}.
* doc/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/doc/Makefile.am b/libstdc++-v3/doc/Makefile.am
index 487e8621b23..0aacf3f27de 100644
--- a/libstdc++-v3/doc/Makefile.am
+++ b/libstdc++-v3/doc/Makefile.am
@@ -226,10 +226,11 @@ ${doxygen_outdir}/man:
mkdir -p ${doxygen_outdir}/man
 
 stamp-xml-doxygen: ${doxygen_outdir}/xml
-   @builddir=`cd ..; ${PWD_COMMAND}`; \
+   -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
+ builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
  --host_alias=${host_alias} --mode=xml \
- "${top_srcdir}" "$${builddir}" NO || true
+ "$${srcdir}" "$${builddir}" NO
$(STAMP) stamp-xml-doxygen
 
 stamp-xml-single-doxygen: stamp-xml-doxygen
@@ -239,17 +240,19 @@ stamp-xml-single-doxygen: stamp-xml-doxygen
$(STAMP) stamp-xml-single-doxygen
 
 stamp-html-doxygen: ${doxygen_outdir}/html
-   @builddir=`cd ..; ${PWD_COMMAND}`; \
+   -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
+ builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
  --host_alias=${host_alias} --mode=html \
- "${top_srcdir}" "$${builddir}" YES || true
+ "$${srcdir}" "$${builddir}" YES
$(STAMP) stamp-html-doxygen
 
 stamp-latex-doxygen: ${doxygen_outdir}/latex
-   @builddir=`cd ..; ${PWD_COMMAND}`; \
+   -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
+ builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
  --host_alias=${host_alias} --mode=latex --latex_cmd=$(LATEX_CMD) \
- "${top_srcdir}" "$${builddir}" NO || true
+ "$${srcdir}" "$${builddir}" NO
$(STAMP) stamp-latex-doxygen
 
 # Chance of loonnggg creation time on this rule.  Iff this fails,
@@ -274,10 +277,11 @@ stamp-pdf-doxygen: stamp-latex-doxygen 
${doxygen_outdir}/pdf
$(STAMP) stamp-pdf-doxygen
 
 stamp-man-doxygen: ${doxygen_outdir}/man
-   @builddir=`cd ..; ${PWD_COMMAND}`; \
+   -srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
+ builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
  --host_alias=${host_alias} --mode=man \
- "${top_srcdir}" "$${builddir}" YES || true
+ "$${srcdir}" "$${builddir}" YES
$(STAMP) stamp-man-doxygen
 
 doc-xml-doxygen: stamp-xml-doxygen

[committed] libstdc++: Implement std::random_device::entropy() for other sources

2021-10-19 Thread Jonathan Wakely via Gcc-patches

Currently this function only returns a non-zero value for /dev/random
and /dev/urandom. When a hardware instruction such as RDRAND is in use
it should (in theory) be perfectly random and produce 32 bits of entropy
in each 32-bit result. Add a helper function to identify the source of
randomness from the _M_func and _M_file data members, and return a
suitable value when RDRAND or RDSEED is being used.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (which_source): New helper function.
(random_device::_M_getentropy()): Use which_source and return
suitable values for sources other than device files.
* testsuite/26_numerics/random/random_device/entropy.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 58f339fc5eaae7db9526f81ab91f282ad4a9b8cc
Author: Jonathan Wakely 
Date:   Tue Oct 19 12:31:06 2021

libstdc++: Implement std::random_device::entropy() for other sources

Currently this function only returns a non-zero value for /dev/random
and /dev/urandom. When a hardware instruction such as RDRAND is in use
it should (in theory) be perfectly random and produce 32 bits of entropy
in each 32-bit result. Add a helper function to identify the source of
randomness from the _M_func and _M_file data members, and return a
suitable value when RDRAND or RDSEED is being used.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (which_source): New helper function.
(random_device::_M_getentropy()): Use which_source and return
suitable values for sources other than device files.
* testsuite/26_numerics/random/random_device/entropy.cc: New test.

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index 44b9f30e4a9..4b64bde00ea 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -192,6 +192,51 @@ namespace std _GLIBCXX_VISIBILITY(default)
   return lcg();
 }
 #endif
+
+enum Which {
+  rand_s = 1, rdseed = 2, rdrand = 4, device_file = 8, prng = 16,
+  any = 0x
+};
+
+inline Which
+which_source(random_device::result_type (*func [[maybe_unused]])(void*),
+void* file [[maybe_unused]])
+{
+#ifdef _GLIBCXX_USE_CRT_RAND_S
+  if (func == &__winxp_rand_s)
+   return rand_s;
+#endif
+
+#ifdef USE_RDSEED
+#ifdef USE_RDRAND
+  if (func == &__x86_rdseed_rdrand)
+   return rdseed;
+#endif
+  if (func == &__x86_rdseed)
+   return rdseed;
+#endif
+
+#ifdef USE_RDRAND
+  if (func == &__x86_rdrand)
+   return rdrand;
+#endif
+
+#ifdef _GLIBCXX_USE_DEV_RANDOM
+  if (file != nullptr)
+   return device_file;
+#endif
+
+#ifdef USE_LCG
+  if (func == &__lcg)
+   return prng;
+#endif
+
+#ifdef USE_MT19937
+  return prng;
+#endif
+
+  return any; // should be unreachable
+}
   }
 
   void
@@ -209,10 +254,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 
 const char* fname [[gnu::unused]] = nullptr;
 
-enum {
-   rand_s = 1, rdseed = 2, rdrand = 4, device_file = 8, prng = 16,
-   any = 0x
-} which;
+Which which;
 
 if (token == "default")
   {
@@ -449,10 +491,25 @@ namespace std _GLIBCXX_VISIBILITY(default)
   double
   random_device::_M_getentropy() const noexcept
   {
+const int max = sizeof(result_type) * __CHAR_BIT__;
+
+switch(which_source(_M_func, _M_file))
+{
+case rdrand:
+case rdseed:
+  return (double) max;
+case rand_s:
+case prng:
+  return 0.0;
+case device_file:
+  // handled below
+  break;
+default:
+  return 0.0;
+}
+
 #if defined _GLIBCXX_USE_DEV_RANDOM \
 && defined _GLIBCXX_HAVE_SYS_IOCTL_H && defined RNDGETENTCNT
-if (!_M_file)
-  return 0.0;
 
 #ifdef USE_POSIX_FILE_IO
 const int fd = _M_fd;
@@ -469,7 +526,6 @@ namespace std _GLIBCXX_VISIBILITY(default)
 if (ent < 0)
   return 0.0;
 
-const int max = sizeof(result_type) * __CHAR_BIT__;
 if (ent > max)
   ent = max;
 
diff --git a/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc 
b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
new file mode 100644
index 000..9ef1538d2bb
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
@@ -0,0 +1,37 @@
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  for (auto token : { "mt19937", "prng", "rand_s" })
+if (__gnu_test::random_device_available(token))
+  VERIFY( std::random_device(token).entropy() == 0.0 );
+
+  using result_type = std::random_device::result_type;
+  const double max = std::log2(std::numeric_limits::max() + 1.0);
+
+  for (auto token : { "/dev/random", "/dev/urandom" })
+if (__gnu_test::random_device_available(token))
+{
+  const double entropy = std::random_device(token).entropy();
+  VERIFY( entropy >= 0.0 );
+  VERIFY( entropy <= max

[PATCH] libstdc++: Add support for POWER9 DARN instruction to std::random_device

2021-10-19 Thread Jonathan Wakely via Gcc-patches

The ISA-3.0 instruction set includes DARN ("deliver a random number")
which can be used similar to the existing support for RDRAND and RDSEED.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (USE_DARN): Define.
(__ppc_darn): New function to use POWER9 DARN instruction.
(Which): Add 'darn' enumerator.
(which_source): Check for __ppc_darn.
(random_device::_M_init): Support "darn" and "hw" tokens.
(random_device::_M_getentropy): Add darn to switch.
* testsuite/26_numerics/random/random_device/cons/token.cc:
Check "darn" token.
* testsuite/26_numerics/random/random_device/entropy.cc:
Likewise.

Tested powerpc64le-linux (power8 and power9) and x86_64-linux.

The new "darn" (power-specific) and "hw" (x86 and power)
strings should be documented, but I'll do that if this gets committed.

Most of this patch is just "more of the same", similar to the existing
code for RDRAND and RDSEED on x86, but the parts of the patch I'd like
more eyes on are:


+#elif defined __powerpc__ && defined __BUILTIN_CPU_SUPPORTS__
+# define USE_DARN 1
 #endif

 #include 


and:


@@ -135,6 +137,15 @@ namespace std _GLIBCXX_VISIBILITY(default)
 #endif
 #endif

+#ifdef USE_DARN
+unsigned int
+__attribute__((target("power9")))
+__ppc_darn(void*)
+{
+  return __builtin_darn_32();
+}
+#endif
+


and:

@@ -346,6 +375,17 @@ namespace std _GLIBCXX_VISIBILITY(default)
 }
 #endif // USE_RDRAND

+#ifdef USE_DARN
+if (which & darn)
+  {
+   if (__builtin_cpu_supports("darn"))
+ {
+   _M_func = &__ppc_darn;
+   return;
+ }
+  }
+#endif // USE_DARN
+
 #ifdef _GLIBCXX_USE_DEV_RANDOM
 if (which & device_file)
 {

commit 5cfef2d435b5cb3b3e959e14e4b1edde8edea473
Author: Jonathan Wakely 
Date:   Tue Oct 19 12:53:00 2021

libstdc++: Add support for POWER9 DARN instruction to std::random_device

The ISA-3.0 instruction set includes DARN ("deliver a random number")
which can be used similar to the existing support for RDRAND and RDSEED.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (USE_DARN): Define.
(__ppc_darn): New function to use POWER9 DARN instruction.
(Which): Add 'darn' enumerator.
(which_source): Check for __ppc_darn.
(random_device::_M_init): Support "darn" and "hw" tokens.
(random_device::_M_getentropy): Add darn to switch.
* testsuite/26_numerics/random/random_device/cons/token.cc:
Check "darn" token.
* testsuite/26_numerics/random/random_device/entropy.cc:
Likewise.

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index 4b64bde00ea..96d59799c2c 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -37,6 +37,8 @@
 # ifdef _GLIBCXX_X86_RDSEED
 #  define USE_RDSEED 1
 # endif
+#elif defined __powerpc__ && defined __BUILTIN_CPU_SUPPORTS__
+# define USE_DARN 1
 #endif
 
 #include 
@@ -69,7 +71,7 @@
 #if defined _GLIBCXX_USE_CRT_RAND_S || defined _GLIBCXX_USE_DEV_RANDOM
 // The OS provides a source of randomness we can use.
 # pragma GCC poison _M_mt
-#elif defined USE_RDRAND || defined USE_RDSEED
+#elif defined USE_RDRAND || defined USE_RDSEED || defined USE_DARN
 // Hardware instructions might be available, but use cpuid checks at runtime.
 # pragma GCC poison _M_mt
 // If the runtime cpuid checks fail we'll use a linear congruential engine.
@@ -135,6 +137,15 @@ namespace std _GLIBCXX_VISIBILITY(default)
 #endif
 #endif
 
+#ifdef USE_DARN
+unsigned int
+__attribute__((target("power9")))
+__ppc_darn(void*)
+{
+  return __builtin_darn_32();
+}
+#endif
+
 #ifdef _GLIBCXX_USE_CRT_RAND_S
 unsigned int
 __winxp_rand_s(void*)
@@ -193,11 +204,16 @@ namespace std _GLIBCXX_VISIBILITY(default)
 }
 #endif
 
-enum Which {
-  rand_s = 1, rdseed = 2, rdrand = 4, device_file = 8, prng = 16,
+enum Which : unsigned {
+  device_file = 1, prng = 2, rand_s = 4,
+  rdseed = 64, rdrand = 128, darn = 256,
   any = 0x
 };
 
+constexpr Which
+operator|(Which l, Which r) noexcept
+{ return Which(unsigned(l) | unsigned(r)); }
+
 inline Which
 which_source(random_device::result_type (*func [[maybe_unused]])(void*),
 void* file [[maybe_unused]])
@@ -221,6 +237,11 @@ namespace std _GLIBCXX_VISIBILITY(default)
return rdrand;
 #endif
 
+#ifdef USE_DARN
+  if (func == &__ppc_darn)
+   return darn;
+#endif
+
 #ifdef _GLIBCXX_USE_DEV_RANDOM
   if (file != nullptr)
return device_file;
@@ -269,6 +290,14 @@ namespace std _GLIBCXX_VISIBILITY(default)
 else if (token == "rdrand" || token == "rdrnd")
   which = rdrand;
 #endif // USE_RDRAND
+#ifdef USE_DARN
+else if (token == "darn")
+  which = darn;
+#endif
+#if defined USE_RDRAND || defined USE_RDSEED || defined USE_DA

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-19 Thread H.J. Lu via Gcc-patches

On Tue, Oct 19, 2021 at 8:03 AM David Edelsohn  wrote:
>
> Hi, H.J.
>
> My colleague built GCC, including GCC Go, with your patch:
>
> "I was able to build libgo and test it partially.  The results are
> similar to the current master without libffi updates. But 64bit tests
> aren't working in both cases. It's related to LIBPATH issues..."
>

Thanks for checking.  I will rebase and retest.  If there is no regression,
I will check them in.

Thanks.

-- 
H.J.

[PATCH] x86: Adjust gcc.target/i386/pr22076.c

2021-10-19 Thread H.J. Lu via Gcc-patches

commit 247c407c83f0015f4b92d5f71e45b63192f6757e
Author: Roger Sayle 
Date:   Mon Oct 18 12:15:40 2021 +0100

Try placing RTL folded constants in the constant pool.

My recent attempts to come up with a testcase for my patch to evaluate
ss_plus in simplify-rtx.c, identified a missed optimization opportunity
(that's potentially a long-time regression): The RTL optimizers no longer
place constants in the constant pool.

changed -m32 codegen from

movq.LC1, %mm0
paddb   .LC0, %mm0
movq%mm0, x
ret

to

movl$807671820, %eax
movl$1616136252, %edx
movl%eax, x
movl%edx, x+4
ret

and -m64 codegen from

movq.LC1(%rip), %mm0
paddb   .LC0(%rip), %mm0
movq%xmm0, x(%rip)
ret

to

movq.LC2(%rip), %rax
movq%rax, x(%rip)
ret

Adjust pr22076.c to check that MMX register isn't used since avoiding
MMX register isn't a bad thing.

PR testsuite/102840
* gcc.target/i386/pr22076.c: Updated to check that MMX register
isn't used.
---
 gcc/testsuite/gcc.target/i386/pr22076.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr22076.c 
b/gcc/testsuite/gcc.target/i386/pr22076.c
index 427ffcd4920..aa06f057690 100644
--- a/gcc/testsuite/gcc.target/i386/pr22076.c
+++ b/gcc/testsuite/gcc.target/i386/pr22076.c
@@ -15,5 +15,6 @@ void test ()
   x = _mm_add_pi8 (mm0, mm1);
 }
 
-/* { dg-final { scan-assembler-times "movq" 2 } } */
-/* { dg-final { scan-assembler-not "movl" { target nonpic } } } */
+/* { dg-final { scan-assembler-times "movq" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl" 4 { target { nonpic && ia32 } } } 
} */
+/* { dg-final { scan-assembler-not "%mm" } }  */
-- 
2.32.0

[PATCH] libstdc++: Implement LWG 3470 change to ranges::subrange

2021-10-19 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches?

libstdc++-v3/ChangeLog:

* include/bits/ranges_util.h
(__detail::__uses_nonqualification_pointer_conversion): Define
and use it ...
(__convertible_to_nonslicing): ... here, as per LWG 3470.
* testsuite/std/ranges/subrange/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_util.h   | 13 +
 .../testsuite/std/ranges/subrange/1.cc| 19 +++
 2 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/subrange/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_util.h 
b/libstdc++-v3/include/bits/ranges_util.h
index 7e7b958d274..765848e327d 100644
--- a/libstdc++-v3/include/bits/ranges_util.h
+++ b/libstdc++-v3/include/bits/ranges_util.h
@@ -184,11 +184,16 @@ namespace ranges
 
   namespace __detail
   {
-template
+template
+  concept __uses_nonqualification_pointer_conversion
+   = is_pointer_v<_From> && is_pointer_v<_To>
+ && !convertible_to(*)[],
+remove_pointer_t<_To>(*)[]>;
+
+template
   concept __convertible_to_non_slicing = convertible_to<_From, _To>
-   && !(is_pointer_v> && is_pointer_v>
-   && __different_from>,
-   remove_pointer_t>>);
+   && !__uses_nonqualification_pointer_conversion,
+  decay_t<_To>>;
 
 template
   concept __pair_like
diff --git a/libstdc++-v3/testsuite/std/ranges/subrange/1.cc 
b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc
new file mode 100644
index 000..8a53261c78c
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc
@@ -0,0 +1,19 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do run { target c++20 } }
+
+#include 
+
+void
+test01()
+{
+  // LWG 3470
+  int a[3] = {1,2,3};
+  int* b[3] = {&a[2], &a[0], &a[1]};
+  auto c = std::ranges::subrange(b);
+}
+
+int
+main()
+{
+  test01();
+}
-- 
2.33.1.711.g9d530dc002

[PATCH] libstdc++: Implement LWG 3568 change to ranges::basic_istream_view

2021-10-19 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
(The branches don't have P2325R3.)

libstdc++-v3/ChangeLog:

* include/std/ranges (basic_istream_view::_M_object): Value
initialize as per LWG 3568.
---
 libstdc++-v3/include/std/ranges | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 5e18c98eb2f..4a90f115d2f 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -711,7 +711,7 @@ namespace views
 
 private:
   basic_istream<_CharT, _Traits>* _M_stream;
-  _Val _M_object;
+  _Val _M_object = _Val();
 
   struct _Iterator
   {
-- 
2.33.1.711.g9d530dc002

[PATCH] libstdc++: Implement LWG 3580 changes to ranges::iota_view

2021-10-19 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches?

libstdc++-v3/ChangeLog:

* include/std/ranges (iota_view::operator+): Adjust definition
as per LWG 3580.
(iota_view::operator-): Likewise.
---
 libstdc++-v3/include/std/ranges | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 18bd087985c..5e18c98eb2f 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -497,7 +497,10 @@ namespace ranges
friend constexpr _Iterator
operator+(_Iterator __i, difference_type __n)
  requires __detail::__advanceable<_Winc>
-   { return __i += __n; }
+   {
+ __i += __n;
+ return __i;
+   }
 
friend constexpr _Iterator
operator+(difference_type __n, _Iterator __i)
@@ -507,7 +510,10 @@ namespace ranges
friend constexpr _Iterator
operator-(_Iterator __i, difference_type __n)
  requires __detail::__advanceable<_Winc>
-   { return __i -= __n; }
+   {
+ __i -= __n;
+ return __i;
+   }
 
friend constexpr difference_type
operator-(const _Iterator& __x, const _Iterator& __y)
-- 
2.33.1.711.g9d530dc002

Re: [PATCH] libstdc++: Implement LWG 3580 changes to ranges::iota_view

2021-10-19 Thread Jonathan Wakely via Gcc-patches

On Tue, 19 Oct 2021, 19:33 Patrick Palka via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches?
>


Yes, thanks.



> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (iota_view::operator+): Adjust definition
> as per LWG 3580.
> (iota_view::operator-): Likewise.
> ---
>  libstdc++-v3/include/std/ranges | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges
> b/libstdc++-v3/include/std/ranges
> index 18bd087985c..5e18c98eb2f 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -497,7 +497,10 @@ namespace ranges
> friend constexpr _Iterator
> operator+(_Iterator __i, difference_type __n)
>   requires __detail::__advanceable<_Winc>
> -   { return __i += __n; }
> +   {
> + __i += __n;
> + return __i;
> +   }
>
> friend constexpr _Iterator
> operator+(difference_type __n, _Iterator __i)
> @@ -507,7 +510,10 @@ namespace ranges
> friend constexpr _Iterator
> operator-(_Iterator __i, difference_type __n)
>   requires __detail::__advanceable<_Winc>
> -   { return __i -= __n; }
> +   {
> + __i -= __n;
> + return __i;
> +   }
>
> friend constexpr difference_type
> operator-(const _Iterator& __x, const _Iterator& __y)
> --
> 2.33.1.711.g9d530dc002
>
>

Re: [PATCH] libstdc++: Implement LWG 3568 change to ranges::basic_istream_view

2021-10-19 Thread Jonathan Wakely via Gcc-patches

On Tue, 19 Oct 2021, 19:31 Patrick Palka via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>

Yes, thanks.


(The branches don't have P2325R3.)
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (basic_istream_view::_M_object): Value
> initialize as per LWG 3568.
> ---
>  libstdc++-v3/include/std/ranges | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/ranges
> b/libstdc++-v3/include/std/ranges
> index 5e18c98eb2f..4a90f115d2f 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -711,7 +711,7 @@ namespace views
>
>  private:
>basic_istream<_CharT, _Traits>* _M_stream;
> -  _Val _M_object;
> +  _Val _M_object = _Val();
>
>struct _Iterator
>{
> --
> 2.33.1.711.g9d530dc002
>
>

Re: [PATCH] libstdc++: Implement LWG 3470 change to ranges::subrange

2021-10-19 Thread Jonathan Wakely via Gcc-patches

On Tue, 19 Oct 2021, 19:30 Patrick Palka via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk and branches?
>

Yes, thanks.


> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_util.h
> (__detail::__uses_nonqualification_pointer_conversion): Define
> and use it ...
> (__convertible_to_nonslicing): ... here, as per LWG 3470.
> * testsuite/std/ranges/subrange/1.cc: New test.
> ---
>  libstdc++-v3/include/bits/ranges_util.h   | 13 +
>  .../testsuite/std/ranges/subrange/1.cc| 19 +++
>  2 files changed, 28 insertions(+), 4 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/ranges/subrange/1.cc
>
> diff --git a/libstdc++-v3/include/bits/ranges_util.h
> b/libstdc++-v3/include/bits/ranges_util.h
> index 7e7b958d274..765848e327d 100644
> --- a/libstdc++-v3/include/bits/ranges_util.h
> +++ b/libstdc++-v3/include/bits/ranges_util.h
> @@ -184,11 +184,16 @@ namespace ranges
>
>namespace __detail
>{
> -template
> +template
> +  concept __uses_nonqualification_pointer_conversion
> +   = is_pointer_v<_From> && is_pointer_v<_To>
> + && !convertible_to(*)[],
> +remove_pointer_t<_To>(*)[]>;
> +
> +template
>concept __convertible_to_non_slicing = convertible_to<_From, _To>
> -   && !(is_pointer_v> && is_pointer_v>
> -   && __different_from>,
> -   remove_pointer_t>>);
> +   && !__uses_nonqualification_pointer_conversion,
> +  decay_t<_To>>;
>
>  template
>concept __pair_like
> diff --git a/libstdc++-v3/testsuite/std/ranges/subrange/1.cc
> b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc
> new file mode 100644
> index 000..8a53261c78c
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/std/ranges/subrange/1.cc
> @@ -0,0 +1,19 @@
> +// { dg-options "-std=gnu++20" }
> +// { dg-do run { target c++20 } }
> +
> +#include 
> +
> +void
> +test01()
> +{
> +  // LWG 3470
> +  int a[3] = {1,2,3};
> +  int* b[3] = {&a[2], &a[0], &a[1]};
> +  auto c = std::ranges::subrange(b);
> +}
> +
> +int
> +main()
> +{
> +  test01();
> +}
> --
> 2.33.1.711.g9d530dc002
>
>

Re: [PATCH] c++: Fix up push_local_extern_decl_alias error recovery [PR102642]

2021-10-19 Thread Jason Merrill via Gcc-patches


On 10/19/21 11:21, Jakub Jelinek wrote:

On Tue, Oct 19, 2021 at 10:48:04AM -0400, Jason Merrill wrote:

What if we use NULL_TREE for the error case instead of error_mark_node, i.e.

-  DECL_LOCAL_DECL_ALIAS (decl) = alias;
+  if (alias != error_mark_node)
+DECL_LOCAL_DECL_ALIAS (decl) = alias;

?  That ought to avoid the need to change other functions, since they
already check for null.


True, but I'm worried about e.g. maybe_version_functions which does
   if (DECL_LOCAL_DECL_P (olddecl))
 {
   olddecl = DECL_LOCAL_DECL_ALIAS (olddecl);
   maybe_mark_function_versioned (olddecl);
 }
(I think the above will crash of DECL_LOCAL_DECL_ALIAS is NULL) and
   if (DECL_LOCAL_DECL_P (newdecl))
 {
   /* Unfortunately, we can get here before pushdecl naturally calls
  push_local_extern_decl_alias, so we need to call it directly.  */
   if (!DECL_LOCAL_DECL_ALIAS (newdecl))
 push_local_extern_decl_alias (newdecl);
   newdecl = DECL_LOCAL_DECL_ALIAS (newdecl);
   maybe_mark_function_versioned (newdecl);
 }
which means that if there is an error from push_local_extern_decl_alias,
we'd report it twice rather than once (once from here, another time
from do_pushdecl).
I'm afraid maybe_version_functions needs fixing no matter what,
but the NULL vs. error_mark_node decision is probably dependent on whether
it is ok to error twice or not.


Ah, true.  Your earlier patch is OK.

Jason

Re: [PATCH] c++: error message for dependent template members [PR70417]

2021-10-19 Thread Jason Merrill via Gcc-patches


On 10/10/21 07:28, Anthony Sharp wrote:

Hi Jason,

Hope you are well. Thanks for the feedback.

I like adding the configurability, but I think let's keep committing as
the default behavior.  And adding the parameter to the rollback function
seems unnecessary.  For the behavior argument, let's use an enum to be
clearer.

Sure, all done. The declaration of the enum could have gone in many 
places it seems but I put it in cp-tree.h.


I'd put it just above the definition of saved_token_sentinel in parser.c.


I've removed the awkward call to cp_parser_require and put it (along with the 
call to cp_parser_skip_to_end_of_template_parameter_list) in its own new 
function called cp_parser_ensure_reached_end_of_template_parameter_list.


Maybe cp_parser_require_end_of_template_parameter_list?  Either way is fine.


I think we don't want to return early here, we want to go through the
same ( check that we do for regular names.

I have changed it to do that, but could something like "operator- <" 
ever be intended as something other than the start of a template-id?


Hmm, good point; operators that are member functions must be non-static, 
so we couldn't be doing a comparison of the address of the function.



+// { dg-warning "expected \"template\" keyword before dependent template name" 
{ target c++20 } .-1 }
+// ^ bogus warning caused by the above being treated as an expression, not a 
declaration >

Hmm, that's a problem.  Can you avoid it by checking declarator_p?


We actually already check declarator_p in cp_parser_id_expression in 
that case. The reason it throws a warning is because typename14.C is 
intentionally malformed; in the eyes of the compiler it's written like 
an expression because it's missing the return type (although, even 
adding a return type would not make it valid). I'm not sure there's any 
worthwhile way around this really.


But it isn't written like an expression: the error comes when trying to 
diagnose it as an invalid type in a declaration context.


So declarator_p should be true there.  I'll fix that.

Also, some more missing template keywords seemed to crop up in the 
regression tests, so I had to sprinkle some more template keywords in a 
few. I guess there was a hidden bug that was missing a few scenarios.


Just out of curiosity, how pedantic would you say I should be when 
submitting patches to GCC etc? I regression test and bootstrap all my 
changes, but I'm always worrying about what might happen if I somehow 
forgot to stage a file in git, or attached the wrong patch file, or 
interpreted the .sum file wrong etc. Do patches go through another round 
of testing after submitting?


Not automatically (yet), but often I test other people's patches before 
applying them.


Sometimes I wonder whether I should be 
applying the patch locally and then bootstrapping and regression testing 
it again, although that's hardly fun since that usually takes around 2-3 
hours even with -j 12.



Maybe I ought to think about getting a dedicated Linux PC.


You could also apply for a GCC Compile Farm account:
https://gcc.gnu.org/wiki/CompileFarm


+  if (next_token->keyword == RID_TEMPLATE)
+{
+  /* But at least make sure it's properly formed (e.g. see PR19397).  */
+  if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_NAME)
+   return 1;
+
+  return -1;
+}
+
+  /* Could be a ~ referencing the destructor of a class template.  */
+  if (next_token->type == CPP_COMPL)
+{
+  /* It could only be a template.  */
+  if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_NAME)
+   return 1;
+
+  return -1;
+}


Why don't these check for the < ?


+  /* E.g. "::operator- <>". */
+  if (next_token->keyword == RID_OPERATOR)
+{
+  /* Consume it so it doesn't get in our way.  */
+  cp_lexer_consume_token (parser->lexer);
+  next_token = cp_lexer_peek_token (parser->lexer);
+  found_operator_keyword = true;
+}

...

+  if (!found_operator_keyword && next_token->type != CPP_NAME)
+return -1;


These could be if/else if so you don't need the found_operator_keyword 
variable.



+  if (next_token->type == CPP_TEMPLATE_ID)
+return 1;


This could move above the saved_token_sentinel; you won't have a 
CPP_TEMPLATE_ID after RID_OPERATOR.



+  /* If this is a function template then we would see a "(" after the
+ final ">".  It could also be a class template constructor.  */
+  if (next_token->type == CPP_OPEN_PAREN
+  /* If this is a class template then we could see a scope token after
+  the final ">".  */
+  || next_token->type == CPP_SCOPE
+  /* We could see a ";" after a variable template.  */
+  || next_token->type == CPP_SEMICOLON
+  /* We could see something like
+friend vect (::operator- <>)( const vect&, const vect& );
+*/
+  || next_token->type == CPP_CLOSE_PAREN)
+return 1;


This seems too limited.  As I was saying before,

Re: [PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]

2021-10-19 Thread Fāng-ruì Sòng via Gcc-patches

On Thu, Oct 14, 2021 at 5:13 PM Fangrui Song  wrote:
>
> On 2021-10-06, Fangrui Song wrote:
> >On 2021-09-27, Fangrui Song wrote:
> >>On 2021-09-27, Florian Weimer wrote:
> >>>* Fangrui Song:
> >>>
> Sanitizer runtimes need static TLS boundaries for a variety of use cases.
> 
> * asan/hwasan/msan/tsan need to unpoison static TLS blocks to prevent 
> false
>  positives due to reusing the TLS blocks with a previous thread.
> * lsan needs TCB for pointers into pthread_setspecific regions.
> 
> See https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
> for details.
> 
> compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp GetTls has
> to infer the static TLS bounds from TP, _dl_get_tls_static_info, and
> hard-coded TCB sizes. Currently this is somewhat robust for
> aarch64/powerpc64/x86-64 but is brittle for many other architectures.
> 
> This patch implements __libc_get_static_tls_bounds@@GLIBC_PRIVATE which
> is available in Android bionic since API level 31. This API allows the
> sanitizer code to be more robust. _dl_get_tls_static_info@@GLIBC_PRIVATE
> can probably be removed when Clang/GCC sanitizers drop reliance on it.
> I am unclear whether the version should be GLIBC_2.*.
> >>>
> >>>Does this really cover the right memory region?  I assume LSAN needs
> >>>something that identifies pointers to malloc'ed memory that are stored
> >>>in non-malloc'ed (mmap'ed) memory.  The static TLS region is certainly a
> >>>place where such pointers can be stored.  But struct pthread also
> >>>contains other such pointers: the DTV, the TPP data, and POSIX TLS
> >>>(pthread_setspecific) data, and struct pthread is not obviously part of
> >>>the static TLS region.
> >>
> >>I know the pthread_setspecific leak detection is brittle but it is
> >>currently implemented this way ;-)
> >>
> >>https://maskray.me/blog/2021-02-14-all-about-thread-local-storage says
> >>
> >>"On glibc, GetTls returned range includes
> >>pthread::{specific_1stblock,specific} for thread-specific data keys.
> >>There is currently a hack to ignore allocations from ld.so allocated
> >>dynamic TLS blocks. Note: if the pthread::{specific_1stblock,specific}
> >>pointers are encrypted, lsan cannot track the allocation."
> >>
> >>If pthread::{specific_1stblock,specific} use an XOR technique (like
> >>__cxa_atexit/setjmp) the pthread_setspecific leak detection will stop
> >>working :(
> >>
> >>---
> >>
> >>In any case, the pthread_setspecific leak detection is a relatively
> >>minor issue. The big issue is asan/msan/tsan false positives due to
> >>reusing an (exited) thread stack or its TLS blocks.
> >>
> >>Around
> >>https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp.html#435
> >>there is very long messy code hard coding the thread descriptor size in
> >>glibc.
> >>
> >>Android `__libc_get_static_tls_bounds(&start_addr, &end_addr);` is the
> >>most robust one.
> >>
> >>---
> >>
> >>I ported sanitizers to musl (https://reviews.llvm.org/D93848)
> >>in LLVM 12.0.0 and fixed some TLS block detection aarch64/ppc64 issues
> >>(https://reviews.llvm.org/D98926 and its follow-up, due to the
> >>complexity I couldn't get it right in the first place), so I have some
> >>understanding about sanitizers' TLS usage.
> >
> >Adhemerval showed me that the __libc_get_static_tls_bounds behavior is
> >expected on aarch64 as well (
> >__libc_get_static_tls_bounds should match sanitizer GetTls)
> >
> >From https://gist.github.com/MaskRay/e035b85dce008f0c6d4997b98354d355
> >```
> >$ ./testrun.sh ./test-tls-boundary
> >+++GetTls: 0x7f9c5fd6c000 4416
> >get_tls=0x7f9c600b4050
> >_dl_get_tls_static_info: 4416 64
> >get_static=0x7f9c600b4070
> >__libc_get_static_tls_bounds: 0x7f9c5fd6c000 4416
> >```
> >
> >
> >
> >Is there any concern adding the interface?
>
> Gentle ping...


CC gcc-patches which ports compiler-rt and may be interested in more
reliable sanitizers.

[committed] doc: Fix typo in name of PowerPC __builtin_cpu_supports built-in

2021-10-19 Thread Jonathan Wakely via Gcc-patches

gcc/ChangeLog:

* doc/extend.texi (Basic PowerPC Built-in Functions): Fix typo.

Committed as obvious.



commit c6a1fdd6dde3a95997731c8339d70970aca67594
Author: Jonathan Wakely 
Date:   Tue Oct 19 20:37:53 2021

doc: Fix typo in name of PowerPC __builtin_cpu_supports built-in

gcc/ChangeLog:

* doc/extend.texi (Basic PowerPC Built-in Functions): Fix typo.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 10d466fae9a..3c942d81c32 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17698,7 +17698,7 @@ macro @code{__BUILTIN_CPU_SUPPORTS__} if the
 @code{__builtin_cpu_supports} built-in function is fully supported.
 
 If GCC was configured to use a GLIBC before 2.23, the built-in
-function @code{__builtin_cpu_suports} always returns a 0 and the
+function @code{__builtin_cpu_supports} always returns a 0 and the
 compiler issues a warning.
 
 The following features can be

Re: [PATCH] libstdc++: Implement LWG 3549 changes to ranges::enable_view

2021-10-19 Thread Jonathan Wakely via Gcc-patches

On Tue, 19 Oct 2021 at 15:35, Patrick Palka via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> This patch also reverts the r11-3504 workaround since it's made obsolete
> by this resolution.
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk only?
>

OK, thanks.


> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_base.h (view_interface): Forward declare.
> (__detail::__is_derived_from_view_interface_fn): Declare
> (__detail::__is_derived_from_view_interface): Define as per LWG
> 3549.
> (enable_view): Adjust as per LWG 3549.
> * include/bits/ranges_util.h (view_interface): Don't derive from
> view_base.
> * include/std/ranges (filter_view): Revert r11-3504 change.
> (transform_view): Likewise.
> (take_view): Likewise.
> (take_while_view): Likewise.
> (drop_view): Likewise.
> (drop_while_view): Likewise.
> (join_view): Likewise.
> (split_view): Likewise.
> (reverse_view): Likewise.
> * testsuite/std/ranges/adaptors/sizeof.cc: Update expected
> sizes.
> * testsuite/std/ranges/view.cc (test_view::test_view): Remove
> now that views no longer need to be default-initializable.
> (test01): New test.
> ---
>  libstdc++-v3/include/bits/ranges_base.h   | 21 -
>  libstdc++-v3/include/bits/ranges_util.h   |  2 +-
>  libstdc++-v3/include/std/ranges   | 45 +--
>  .../testsuite/std/ranges/adaptors/sizeof.cc   |  6 +--
>  libstdc++-v3/testsuite/std/ranges/view.cc | 28 ++--
>  5 files changed, 70 insertions(+), 32 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/ranges_base.h
> b/libstdc++-v3/include/bits/ranges_base.h
> index 01d0c35f4b4..7801b2fd023 100644
> --- a/libstdc++-v3/include/bits/ranges_base.h
> +++ b/libstdc++-v3/include/bits/ranges_base.h
> @@ -614,12 +614,31 @@ namespace ranges
>template
>  using range_size_t = decltype(ranges::size(std::declval<_Range&>()));
>
> +  template
> +requires is_class_v<_Derived> && same_as<_Derived,
> remove_cv_t<_Derived>>
> +class view_interface; // defined in 
> +
> +  namespace __detail
> +  {
> +template
> +  requires (!same_as<_Tp, view_interface<_Up>>)
> +  void __is_derived_from_view_interface_fn(const _Tp&,
> +  const
> view_interface<_Up>&); // not defined
> +
> +// Returns true iff _Tp has exactly one public base class that's a
> +// specialization of view_interface.
> +template
> +  concept __is_derived_from_view_interface
> +   = requires (_Tp __t) { __is_derived_from_view_interface_fn(__t,
> __t); };
> +  }
> +
>/// [range.view] The ranges::view_base type.
>struct view_base { };
>
>/// [range.view] The ranges::enable_view boolean.
>template
> -inline constexpr bool enable_view = derived_from<_Tp, view_base>;
> +inline constexpr bool enable_view = derived_from<_Tp, view_base>
> +  || __detail::__is_derived_from_view_interface<_Tp>;
>
>/// [range.view] The ranges::view concept.
>template
> diff --git a/libstdc++-v3/include/bits/ranges_util.h
> b/libstdc++-v3/include/bits/ranges_util.h
> index 1afa66d298c..5c0bef26220 100644
> --- a/libstdc++-v3/include/bits/ranges_util.h
> +++ b/libstdc++-v3/include/bits/ranges_util.h
> @@ -61,7 +61,7 @@ namespace ranges
>/// The ranges::view_interface class template
>template
>  requires is_class_v<_Derived> && same_as<_Derived,
> remove_cv_t<_Derived>>
> -class view_interface : public view_base
> +class view_interface
>  {
>  private:
>constexpr _Derived& _M_derived() noexcept
> diff --git a/libstdc++-v3/include/std/ranges
> b/libstdc++-v3/include/std/ranges
> index 64396027c1b..e47fc075bbe 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -1533,9 +1533,9 @@ namespace views::__adaptor
> { return __y.__equal(__x); }
>};
>
> +  _Vp _M_base = _Vp();
>[[no_unique_address]] __detail::__box<_Pred> _M_pred;
>[[no_unique_address]] __detail::_CachedPosition<_Vp>
> _M_cached_begin;
> -  _Vp _M_base = _Vp();
>
>  public:
>filter_view() requires (default_initializable<_Vp>
> @@ -1544,7 +1544,7 @@ namespace views::__adaptor
>
>constexpr
>filter_view(_Vp __base, _Pred __pred)
> -   : _M_pred(std::move(__pred)), _M_base(std::move(__base))
> +   : _M_base(std::move(__base)), _M_pred(std::move(__pred))
>{ }
>
>constexpr _Vp
> @@ -1900,8 +1900,8 @@ namespace views::__adaptor
>   friend _Sentinel;
> };
>
> -  [[no_unique_address]] __detail::__box<_Fp> _M_fun;
>_Vp _M_base = _Vp();
> +  [[no_unique_address]] __detail::__box<_Fp> _M_fun;
>
>  public:
>transform_view() requires (default_initializable<_Vp>
> @@ -1910,7 +1910,7 @@ namespace views::__adaptor
>
>constexpr
>tra

Re: [PATCH] gcc: implement AIX-style constructors

2021-10-19 Thread David Edelsohn via Gcc-patches

Clement,

+  /* Use __C_runtime_pstartup to run ctors and register dtors.
+ This whole part should normally be in libgcc but as
+ AIX cdtors format is currently not the default, managed
+ that in collect2.  */

Why are you emitting the special startup function call in collect2.c
instead of placing it in libgcc.  The comment mentions that the
special startup function should be defined in libgcc.

Yes, the AIX ld bcdtors mechanism is not the default, but what is the
harm? The symbol will be defined and exported by libgcc. If the AIX
linker -bcdtors functionality is not invoked, the symbol is not used.
And if a user does invoke the AIX linker with -bcdtors, the behavior
will be the same (either the program was compiled to use AIX cdtors or
not, which is the same if the startup function is emitted by
collect2.c.

Also, the patch should include documentation of the option.  The
documentation should mention that this is for interoperability with
IBM XL Compiler, and the option will not operate correctly unless the
application and the GCC runtime are built with the option.

Thanks, David

On Mon, Oct 18, 2021 at 3:55 AM CHIGOT, CLEMENT  wrote:
>
> AIX linker now supports constructors and destructors detection. For such
> functions to be detected, their name must starts with __sinit or __sterm.
> and -bcdtors must be passed to linker calls. It will create "_cdtors"
> symbol which can be used to launch the initialization.
>
> This patch creates a new RS6000 flag "-mcdtors=".
> With "-mcdtors=aix", gcc will generate these new constructors/destructors.
> With "-mcdtors=gcc", which is currently the default, gcc will continue
> to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D
> symbols).
> Ideally, it would have been better to enable the AIX format by default
> instead of using collect2. However, the compatibility between the
> previously-built binaries and the new ones is too complex to be done.
>
> gcc/ChangeLog:
> 2021-10-04  Clément Chigot  
>
> * collect2.c (aixbcdtors_flags): New variable.
> (main): Use it to detect -bcdtors and remove -binitfini flag.
> (write_c_file_stat): Adapt to new AIX format.
> * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define.
> (FILE_STERM_FORMAT): New define.
> (TARGET_FILE_FUNCTION_FORMAT): New define.
> * config/rs6000/aix64.opt: Add -mcdtors flag.
> * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when
>   -mcdtors=aix is passed.
> * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise.
> * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise.
> * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum.
> * tree.c (get_file_function_name): Add
>   TARGET_FILE_FUNCTION_FORMAT support.
>
> gcc/testsuite/ChangeLog:
> 2021-10-04  Clément Chigot  
>
> * gcc.target/powerpc/constructor-aix.c: New test.
>
>

[r12-4475 Regression] FAIL: gcc.target/i386/pr22076.c scan-assembler-times movq 2 on Linux/x86_64

2021-10-19 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

247c407c83f0015f4b92d5f71e45b63192f6757e is the first bad commit
commit 247c407c83f0015f4b92d5f71e45b63192f6757e
Author: Roger Sayle 
Date:   Mon Oct 18 12:15:40 2021 +0100

Try placing RTL folded constants in the constant pool.

caused

FAIL: gcc.target/i386/pr22076.c scan-assembler-not movl
FAIL: gcc.target/i386/pr22076.c scan-assembler-times movq 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4475/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr22076.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr22076.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

how does vrp2 rearrange this?

2021-10-19 Thread Andrew MacLeod via Gcc-patches


using testcase ifcvt-4.c:


typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
  word i = x;
  word j = y;
  /* Try to make taking the branch likely.  */
  __builtin_expect (x > y, 1);
  if (x > y)
    {
  i = a;
  j = i;
    }
  return i * j;

The current VRP2 pass takes:

  if (x_3(D) > y_4(D))
    goto ; [50.00%]        <<---  note the THEN target
  else
    goto ; [50.00%]
;;    succ:   3 [50.0% (guessed)]  count:536870912 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)
;;    4 [50.0% (guessed)]  count:536870912 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)


;;   basic block 3, loop depth 0, count 536870912 (estimated locally), 
maybe hot

;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;    pred:   2 [50.0% (guessed)]  count:536870912 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)
;;    succ:   4 [always]  count:536870912 (estimated locally) 
(FALLTHRU,EXECUTABLE)


;;   basic block 4, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:   2 [50.0% (guessed)]  count:536870912 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)
;;    3 [always]  count:536870912 (estimated locally) 
(FALLTHRU,EXECUTABLE)

  # i_1 = PHI 
  # j_2 = PHI 
  _6 = i_1 * j_2;
  # VUSE <.MEM_7(D)>
  return _6;

and turns it into :

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;    pred:   ENTRY [always]  count:1073741824 (estimated locally) 
(FALLTHRU,EXECUTABLE)

  if (x_3(D) > y_4(D))
    goto ; [50.00%]        <<-- has been reversed.
  else
    goto ; [50.00%]
;;    succ:   4 [50.0% (guessed)]  count:536870912 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)
;;    3 [50.0% (guessed)]  count:536870912 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)


;;   basic block 3, loop depth 0, count 536870912 (estimated locally), 
maybe hot

;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:   2 [50.0% (guessed)]  count:536870912 (estimated 
locally) (FALSE_VALUE,EXECUTABLE)
;;    succ:   4 [always]  count:536870912 (estimated locally) 
(FALLTHRU,EXECUTABLE)


;;   basic block 4, loop depth 0, count 1073741824 (estimated locally), 
maybe hot

;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;    pred:   3 [always]  count:536870912 (estimated locally) 
(FALLTHRU,EXECUTABLE)
;;    2 [50.0% (guessed)]  count:536870912 (estimated 
locally) (TRUE_VALUE,EXECUTABLE)

  # i_1 = PHI 
  # j_2 = PHI 
  _6 = i_1 * j_2;
  # VUSE <.MEM_7(D)>
  return _6;

So the IF has reversed the targets (but not the condition), and the PHIs 
in the target block have been adjusted.


Where does this happen? I cannot find it.  There doesnt seem to be 
anything in the IL which is reflecting the "expect" from the original 
code..  and if I run ranger vrp instead of classic_vrp, we don't do 
this...  so I'm missing something


Thanks

Andrew

Re: [PATCH] Cleanup --params for backward threader.

2021-10-19 Thread Jan-Benedict Glaw

Hi Aldy!

On Thu, 2021-10-14 16:25:48 +0200, Aldy Hernandez via Gcc-patches 
 wrote:
> The new backward threader makes some of the --param knobs used to
> control it questionable at best or no longer applicable at worst.
> 
> The fsm-maximum-phi-arguments param is unused and can be removed.
> 
> The max-fsm-thread-length param is block based which is a bit redundant,
> since we already restrict paths based on instruction estimates.
> 
> The max-fsm-thread-paths restricts the total number of threadable paths
> in a function.  We probably don't need this.  Besides, the forward
> threader has no such restriction.
> 
> OK pending tests?

This causes a regression for me. I'm auto-building lots of GCC
cross-compilers and use these to cross-build the Linux kernel.

  Using binutils/gas/gcc configured for --target=sh-linux (actual
configuration for GCC is this:

.../gcc/configure --target=sh-linux --enable-werror-always \
  --enable-languages=all --disable-gcov\
  --disable-shared --disable-threads   \
  --without-headers \
  
--prefix=/var/lib/laminar/run/gcc-sh-linux/13/toolchain-install
)

Then, building Linux for a good number of default configurations for
ARCH=sh and ARCH=arm, GCC will just loop:

$ make ARCH=sh distclean
$ cp arch/sh/configs/r7780mp_defconfig .config
$ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- oldconfig < /dev/null
$ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- prepare
$ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- all
[...]
  sh-linux-gcc -Wp,-MMD,drivers/ata/.libata-core.o.d -nostdinc -isystem 
/tmp/testbed/install/lib/gcc/sh-linux/12.0.0/include -I./arch/sh/include 
-I./arch/sh/include/generated  -I./include -I./arch/sh/include/uapi 
-I./arch/sh/include/generated/uapi -I./include/uapi -I./include/generated/uapi 
-include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
-include ./include/linux/compiler_types.h -D__KERNEL__ -m4 -m4-nofpu -m4a 
-m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I 
./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I 
./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I 
./arch/sh/include/mach-common -fmacro-prefix-map=./= -Wall -Wundef 
-Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common 
-fshort-wchar -fno-PIE -Werror=implicit-function-declaration 
-Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89 -pipe 
-m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I 
./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I 
./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I 
./arch/sh/include/mach-common -fno-delete-null-pointer-checks 
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow 
-Wno-address-of-packed-member -O2 -fno-allow-store-data-races 
-Wframe-larger-than=1024 -fstack-protector-strong -Wimplicit-fallthrough=5 
-Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable 
-fomit-frame-pointer -ftrivial-auto-var-init=zero 
-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang 
-fno-stack-clash-protection -g -Wdeclaration-after-statement -Wvla 
-Wno-pointer-sign -Wno-stringop-truncation -Wno-zero-length-bounds 
-Wno-array-bounds -Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized 
-fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=date-time 
-Werror=incompatible-pointer-types -Werror=designated-init 
-Wno-packed-not-aligned-DKBUILD_MODFILE='"drivers/ata/libata"' 
-DKBUILD_BASENAME='"libata_core"' -DKBUILD_MODNAME='"libata"' 
-D__KBUILD_MODNAME=kmod_libata -c -o drivers/ata/libata-core.o 
drivers/ata/libata-core.c


(gdb) bt
#0  0x0100318e in vec::operator[] 
(ix=0, this=0x50ee6e0) at ../../gcc/gcc/vec.h:1495
#1  back_jt_path_registry::adjust_paths_after_duplication (this=0x7ffdf8b6e868, 
curr_path_num=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2315
#2  0x01003c0d in back_jt_path_registry::duplicate_thread_path 
(this=0x7ffdf8b6e868, entry=0x7f92651000c0, exit=, 
region=, n_region=8, 
current_path_no=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2546
#3  0x010051e4 in back_jt_path_registry::update_cfg 
(this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadupdate.c:2656
#4  0x01003ecc in jt_path_registry::thread_through_all_blocks 
(this=0x7ffdf8b6e868, peel_loop_headers=) at 
../../gcc/gcc/tree-ssa-threadupdate.c:2604
#5  0x00ffb5a7 in back_threader_registry::thread_through_all_blocks 
(may_peel_loop_headers=true, this=0x7ffdf8b6e868) at 
../../gcc/gcc/tree-ssa-threadbackward.c:556
#6  back_threader::thread_through_all_blocks (may_peel_loop_headers=true, 
this=0x7ffdf8b6e860) at ../../gcc/gcc/tree-ssa-threadbackward.c:501
#7  (anonymous namespace)::try_thread_blocks (fun=fun@entry=0x7f926eb389c0) at 
../../gcc/gcc/tree-ssa-threadbackward.c:946
#8  0x00ffb5eb in (anonymous namespa

Re: [PATCH] Cleanup --params for backward threader.

2021-10-19 Thread Aldy Hernandez via Gcc-patches

That's odd. Yeah, please open a PR.

Thanks.
Aldy

On Tue, Oct 19, 2021, 22:47 Jan-Benedict Glaw  wrote:

> Hi Aldy!
>
> On Thu, 2021-10-14 16:25:48 +0200, Aldy Hernandez via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> > The new backward threader makes some of the --param knobs used to
> > control it questionable at best or no longer applicable at worst.
> >
> > The fsm-maximum-phi-arguments param is unused and can be removed.
> >
> > The max-fsm-thread-length param is block based which is a bit redundant,
> > since we already restrict paths based on instruction estimates.
> >
> > The max-fsm-thread-paths restricts the total number of threadable paths
> > in a function.  We probably don't need this.  Besides, the forward
> > threader has no such restriction.
> >
> > OK pending tests?
>
> This causes a regression for me. I'm auto-building lots of GCC
> cross-compilers and use these to cross-build the Linux kernel.
>
>   Using binutils/gas/gcc configured for --target=sh-linux (actual
> configuration for GCC is this:
>
> .../gcc/configure --target=sh-linux --enable-werror-always \
>   --enable-languages=all --disable-gcov\
>   --disable-shared --disable-threads   \
>   --without-headers \
>
> --prefix=/var/lib/laminar/run/gcc-sh-linux/13/toolchain-install
> )
>
> Then, building Linux for a good number of default configurations for
> ARCH=sh and ARCH=arm, GCC will just loop:
>
> $ make ARCH=sh distclean
> $ cp arch/sh/configs/r7780mp_defconfig .config
> $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- oldconfig < /dev/null
> $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- prepare
> $ make V=1 ARCH=sh CROSS_COMPILE=sh-linux- all
> [...]
>   sh-linux-gcc -Wp,-MMD,drivers/ata/.libata-core.o.d -nostdinc -isystem
> /tmp/testbed/install/lib/gcc/sh-linux/12.0.0/include -I./arch/sh/include
> -I./arch/sh/include/generated  -I./include -I./arch/sh/include/uapi
> -I./arch/sh/include/generated/uapi -I./include/uapi
> -I./include/generated/uapi -include ./include/linux/compiler-version.h
> -include ./include/linux/kconfig.h -include
> ./include/linux/compiler_types.h -D__KERNEL__ -m4 -m4-nofpu -m4a -m4a-nofpu
> -ml -mno-fdpic -Wa,-isa=sh4a-up -ffreestanding -I
> ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4 -I
> ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I
> ./arch/sh/include/mach-common -fmacro-prefix-map=./= -Wall -Wundef
> -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
> -fshort-wchar -fno-PIE -Werror=implicit-function-declaration
> -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89
> -pipe -m4 -m4-nofpu -m4a -m4a-nofpu -ml -mno-fdpic -Wa,-isa=sh4a-up
> -ffreestanding -I ./arch/sh/include/cpu-sh4a -I ./arch/sh/include/cpu-sh4
> -I ./arch/sh/include/cpu-common -I ./arch/sh/include/mach-highlander -I
> ./arch/sh/include/mach-common -fno-delete-null-pointer-checks
> -Wno-frame-address -Wno-format-truncation -Wno-format-overflow
> -Wno-address-of-packed-member -O2 -fno-allow-store-data-races
> -Wframe-larger-than=1024 -fstack-protector-strong -Wimplicit-fallthrough=5
> -Wno-main -Wno-unused-but-set-variable -Wno-unused-const-variable
> -fomit-frame-pointer -ftrivial-auto-var-init=zero
> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
> -fno-stack-clash-protection -g -Wdeclaration-after-statement -Wvla
> -Wno-pointer-sign -Wno-stringop-truncation -Wno-zero-length-bounds
> -Wno-array-bounds -Wno-stringop-overflow -Wno-restrict
> -Wno-maybe-uninitialized -fno-strict-overflow -fno-stack-check
> -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types
> -Werror=designated-init -Wno-packed-not-aligned
> -DKBUILD_MODFILE='"drivers/ata/libata"' -DKBUILD_BASENAME='"libata_core"'
> -DKBUILD_MODNAME='"libata"' -D__KBUILD_MODNAME=kmod_libata -c -o
> drivers/ata/libata-core.o drivers/ata/libata-core.c
>
>
> (gdb) bt
> #0  0x0100318e in vec vl_ptr>::operator[] (ix=0, this=0x50ee6e0) at ../../gcc/gcc/vec.h:1495
> #1  back_jt_path_registry::adjust_paths_after_duplication
> (this=0x7ffdf8b6e868, curr_path_num=0) at
> ../../gcc/gcc/tree-ssa-threadupdate.c:2315
> #2  0x01003c0d in back_jt_path_registry::duplicate_thread_path
> (this=0x7ffdf8b6e868, entry=0x7f92651000c0, exit=,
> region=, n_region=8,
> current_path_no=0) at ../../gcc/gcc/tree-ssa-threadupdate.c:2546
> #3  0x010051e4 in back_jt_path_registry::update_cfg
> (this=0x7ffdf8b6e868) at ../../gcc/gcc/tree-ssa-threadupdate.c:2656
> #4  0x01003ecc in jt_path_registry::thread_through_all_blocks
> (this=0x7ffdf8b6e868, peel_loop_headers=) at
> ../../gcc/gcc/tree-ssa-threadupdate.c:2604
> #5  0x00ffb5a7 in
> back_threader_registry::thread_through_all_blocks
> (may_peel_loop_headers=true, this=0x7ffdf8b6e868) at
> ../../gcc/gcc/tree-ssa-threadbackward.c:556
> #6  back_threader::thread_through_all_blocks (may_peel_loop_heade

Re: how does vrp2 rearrange this?

2021-10-19 Thread Andrew Pinski via Gcc-patches

On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> using testcase ifcvt-4.c:
>
>
> typedef int word __attribute__((mode(word)));
>
> word
> foo (word x, word y, word a)
> {
>word i = x;
>word j = y;
>/* Try to make taking the branch likely.  */
>__builtin_expect (x > y, 1);
>if (x > y)
>  {
>i = a;
>j = i;
>  }
>return i * j;
>
> The current VRP2 pass takes:
>
>if (x_3(D) > y_4(D))
>  goto ; [50.00%]<<---  note the THEN target
>else
>  goto ; [50.00%]
> ;;succ:   3 [50.0% (guessed)]  count:536870912 (estimated
> locally) (TRUE_VALUE,EXECUTABLE)
> ;;4 [50.0% (guessed)]  count:536870912 (estimated
> locally) (FALSE_VALUE,EXECUTABLE)
>
> ;;   basic block 3, loop depth 0, count 536870912 (estimated locally),
> maybe hot
> ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> locally) (TRUE_VALUE,EXECUTABLE)
> ;;succ:   4 [always]  count:536870912 (estimated locally)
> (FALLTHRU,EXECUTABLE)
>
> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
> maybe hot
> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> locally) (FALSE_VALUE,EXECUTABLE)
> ;;3 [always]  count:536870912 (estimated locally)
> (FALLTHRU,EXECUTABLE)
># i_1 = PHI 
># j_2 = PHI 
>_6 = i_1 * j_2;
># VUSE <.MEM_7(D)>
>return _6;
>
> and turns it into :
>
> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally),
> maybe hot
> ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
> (FALLTHRU,EXECUTABLE)
>if (x_3(D) > y_4(D))
>  goto ; [50.00%]<<-- has been reversed.
>else
>  goto ; [50.00%]
> ;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
> locally) (TRUE_VALUE,EXECUTABLE)
> ;;3 [50.0% (guessed)]  count:536870912 (estimated
> locally) (FALSE_VALUE,EXECUTABLE)
>
> ;;   basic block 3, loop depth 0, count 536870912 (estimated locally),
> maybe hot
> ;;prev block 2, next block 4, flags: (NEW, VISITED)
> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> locally) (FALSE_VALUE,EXECUTABLE)
> ;;succ:   4 [always]  count:536870912 (estimated locally)
> (FALLTHRU,EXECUTABLE)
>
> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
> maybe hot
> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   3 [always]  count:536870912 (estimated locally)
> (FALLTHRU,EXECUTABLE)
> ;;2 [50.0% (guessed)]  count:536870912 (estimated
> locally) (TRUE_VALUE,EXECUTABLE)
># i_1 = PHI 
># j_2 = PHI 
>_6 = i_1 * j_2;
># VUSE <.MEM_7(D)>
>return _6;
>
> So the IF has reversed the targets (but not the condition), and the PHIs
> in the target block have been adjusted.
>
> Where does this happen? I cannot find it.  There doesnt seem to be
> anything in the IL which is reflecting the "expect" from the original
> code..  and if I run ranger vrp instead of classic_vrp, we don't do
> this...  so I'm missing something

I suspect this is an artifact of inserting and removing the assert
expressions in VRP rather than anything else.

Thanks,
Andrew

>
> Thanks
>
> Andrew
>

Re: how does vrp2 rearrange this?

2021-10-19 Thread Andrew MacLeod via Gcc-patches


On 10/19/21 5:13 PM, Andrew Pinski wrote:

On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches
 wrote:

using testcase ifcvt-4.c:


typedef int word __attribute__((mode(word)));

word
foo (word x, word y, word a)
{
word i = x;
word j = y;
/* Try to make taking the branch likely.  */
__builtin_expect (x > y, 1);
if (x > y)
  {
i = a;
j = i;
  }
return i * j;

The current VRP2 pass takes:

if (x_3(D) > y_4(D))
  goto ; [50.00%]<<---  note the THEN target
else
  goto ; [50.00%]
;;succ:   3 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
;;4 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 536870912 (estimated locally),
maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
;;succ:   4 [always]  count:536870912 (estimated locally)
(FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
maybe hot
;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)
;;3 [always]  count:536870912 (estimated locally)
(FALLTHRU,EXECUTABLE)
# i_1 = PHI 
# j_2 = PHI 
_6 = i_1 * j_2;
# VUSE <.MEM_7(D)>
return _6;

and turns it into :

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally),
maybe hot
;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
(FALLTHRU,EXECUTABLE)
if (x_3(D) > y_4(D))
  goto ; [50.00%]<<-- has been reversed.
else
  goto ; [50.00%]
;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
;;3 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 536870912 (estimated locally),
maybe hot
;;prev block 2, next block 4, flags: (NEW, VISITED)
;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)
;;succ:   4 [always]  count:536870912 (estimated locally)
(FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
maybe hot
;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [always]  count:536870912 (estimated locally)
(FALLTHRU,EXECUTABLE)
;;2 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
# i_1 = PHI 
# j_2 = PHI 
_6 = i_1 * j_2;
# VUSE <.MEM_7(D)>
return _6;

So the IF has reversed the targets (but not the condition), and the PHIs
in the target block have been adjusted.

Where does this happen? I cannot find it.  There doesnt seem to be
anything in the IL which is reflecting the "expect" from the original
code..  and if I run ranger vrp instead of classic_vrp, we don't do
this...  so I'm missing something

I suspect this is an artifact of inserting and removing the assert
expressions in VRP rather than anything else.

Well, this fails the test because  we eventually check in RTL land to 
make sure the branch was reversed due to the expect, I guess.  and it 
seems that only VRP2 does the reversing.  so its expected.. just not 
obvious to me why, when or how.


of course, the pass is th rtl.ce1 pass... and the check is for "2 true 
changes made."  . and if the IF isn't reversed , it doesnt "report" 
that.   I have no idea what thats about.


regardless... if we DONT do this, then the assembly at the end has an 
extra branch in it, so it seems like a necessary thing.


Andrew

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/18/2021 7:41 AM, Aldy Hernandez wrote:


After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceeding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).

Whoo whoo!




The numbers look really good.  We get 6874 more jump threading passes
over my boostrap .ii files for a total 3.74% increase.  And we get that
while running marginally faster (0.19% faster, so noise).

The details are:

*** Mainline (with the loop rotation patch):
   ethread:64722
  dom:31246
  thread:73709
  vrp-thread:14357
  total:  184034

*** Removing all the VRP threaders.
  ethread:64722
  thread-full:76493
  dom:33648
  thread:16045
  total:  190908

Notice that not only do we get a lot more threads in thread-full
(resolving mode), but even DOM can get more jump threads.

This doesn't come without risks though.  The main issue is that we would
be removing one engine (forward threader), with another one (backward
threader).  But the good news is that (a) we've been using the new
backward threader for a while now (b) even the VRP threader in
mainline is using the backward threader solver.  So, all that would
really be changing would be the path discovery bits and custom copier
in the forward threader, with the backward threader bit and the
generic copier.

I personally don't think this is a big risk, because we've done all
the hard work already and it's all being stressed in one way or another.
I don't see the risk as significantly different than any other big chunk 
of development work.  Furthermore, this is a major step on the path 
we've been discussing the last couple years.   There'll be some testing 
fallout, but I think that's manageable and ultimately worth the pain to 
work through.   I can express how happy I am that we're at the point of 
zapping the two VRP threading passes.




The untested patch below is all that would need to happen, albeit with
copius changes to tests.

I'd like to see where we all stand on this before I start chugging away
at testing and other time consuming tasks.

Note, that all the relevant bits will still be tested in this release,
so I'm not gonna cry one way or another.  But it'd be nice to start
reducing passes, especially if we get a 3.74% increase in jump threads
for no time penalty.

Finally, even if we all agree, I think we should give us a week after the
loop rotation restrictions go in, because threading changes always cause
a party of unexpected things to happen.
Sure.  And FWIW, the loop rotation changes are fine IMHO.   So commit 
those when you're ready, then drop this in a week later.  All in time 
for stage1 close :-)


Jeff

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/18/2021 8:03 AM, Aldy Hernandez wrote:



On 10/18/21 3:41 PM, Aldy Hernandez wrote:


I've been experimenting with reducing the total number of threading
passes, and I'd like to see if there's consensus/stomach for altering
the pipeline.  Note, that the goal is to remove forward threader 
clients,

not the other way around.  So, we should prefer to remove a VRP threader
instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceeding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).


It occurs to me that we could also remove the threading before VRP 
passes, and enable a fully-resolving backward threader after VRP. I 
haven't played with this scenario, but it should be just as good.  
That being said, I don't know the intricacies of why we had both pre 
and post VRP threading passes, and if one is ideally better than the 
other.
The only post-VRP threading pass that (in my mind) makes sense is the 
one sitting between VRP and DOM and it should replace the DOM based 
threader.


Jeff

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/19/2021 1:33 AM, Aldy Hernandez wrote:

On Tue, Oct 19, 2021 at 8:52 AM Richard Biener
 wrote:

On Mon, Oct 18, 2021 at 4:03 PM Aldy Hernandez  wrote:



On 10/18/21 3:41 PM, Aldy Hernandez wrote:


I've been experimenting with reducing the total number of threading
passes, and I'd like to see if there's consensus/stomach for altering
the pipeline.  Note, that the goal is to remove forward threader clients,
not the other way around.  So, we should prefer to remove a VRP threader
instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceeding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).

It occurs to me that we could also remove the threading before VRP
passes, and enable a fully-resolving backward threader after VRP.  I
haven't played with this scenario, but it should be just as good.  That
being said, I don't know the intricacies of why we had both pre and post
VRP threading passes, and if one is ideally better than the other.

It was done because they were different threaders.  Since the new threader
uses built-in VRP it shouldn't really matter whether it's before or after
VRP _for the threading_, but it might be that if threading runs before VRP
then VRP itself can do a better job on cleaning up the IL.

Good point.

FWIW, earlier this season I played with replacing the VRPs with evrp
instances (which fold far more conditionals) and I found that the
threaders can actually find LESS opportunities after *vrp fold away
things.  I don't know if this is a good or a bad thing.  Perhaps we
should benchmark three alternatives:
This is expected.  VRP and DOM will sometimes find conditionals that 
they can fully optimize away.  If those conditionals are left in the IL, 
the threaders would sometimes pick them up.


So as we fold more in VRP/DOM, I'm not surprised there's fewer things 
for the threaders to find.   In general, if a conditional can be removed 
by VRP/DOM, that's the preference.




1. Mainline
2. Fully resolving threader -> VRP -> No threading.
3. No threading -> VRP -> Full resolving threader.

...and see what the actual effect is, regardless of number of threaded paths.

Speak of which, what's the blessed way of benchmarking performance
nowadays?  I've seen some PRs fly that measure some more lightweight
benchmarks (open source?) than a full blown SPEC.
Can't speak for anyone else, but I benchmark these days with spec in a 
cycle-approximate simulator :-)  I'm isolated from the whims of turbo 
modes and the like.  Of course it takes considerably longer than running 
on real hardware :-)





+  /* ?? Is this still needed.  ?? */
/* Threading can leave many const/copy propagations in the IL.
  Clean them up.  Instead of just copy_prop, we use ccp to
  compute alignment and nonzero bits.  */

Yes, it's still needed but not for the stated reason - the VRP
substitution and folding stage should deal with copy/constant propagation
but we replaced the former copy propagation with CCP to re-compute
nonzero bits & alignment so I'd change the comment to

/* Run CCP to compute alignment and nonzero bits.  */
The threaders (really the copiers) don't make much, if any, attempt to 
clean up the IL.  So, for example,  they'll often leave degenerate PHIs 
in the IL.  We need to clean that crap up or we'll get false positives 
in the middle-end diagnostics.


Jeff

Re: [PATCH 1/4] Add dump prints when execute_fixup_cfg removes a write only var store.

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

While debugging PR 102703, I found it was hard to figure out where
the store was being removed as there was no pass which was outputting
why the store was removed.
This adds to execute_fixup_cfg the output.
Also note most of removals happen when execute_fixup_cfg is called
from the inliner.

gcc/ChangeLog:

* tree-cfg.c (execute_fixup_cfg): Output when the statement
is removed when it is a write only var.

OK
jeff

Re: [PATCH 2/4] Remove outdated comment about execute_fixup_cfg

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

The comment about execute_fixup_cfg not being able to
run as a standalone pass is not true for a long time
now.  It has been a standalone pass for a while now.

gcc/ChangeLog:

* tree-cfg.c (execute_fixup_cfg): Remove comment
about standalone pass.

OK
jeff

Re: [PATCH 3/4] Factor out removal of write only stores from execute_fixup_cfg

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To make it easier to fix PR 102703, factoring this code out
to its own function makes it easier to read and less indentions
too.

gcc/ChangeLog:

* tree-cfg.c (maybe_remove_writeonly_store): New function
factored out from ...
(execute_fixup_cfg): Here. Call maybe_remove_writeonly_store.

OK
jeff

Re: [PATCH 4/4] Improve maybe_remove_writeonly_store to do a simple DCE for defining statement

2021-10-19 Thread Jeff Law via Gcc-patches





On 10/18/2021 10:54 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Instead of putting a full blow DCE after execute_fixup_cfg, it makes sense
to try to remove the defining statement for the store that is being removed.
Right now we only handle PHI node statements as there needs no extra checks
except for it is only used once in the store statement.

gcc/ChangeLog:

* tree-cfg.c (maybe_remove_writeonly_store): Remove defining
(PHI) statement of the store if possible.

This is the only part that I consider at all controversial.

Is the case you're trying to handle such that you have to eliminate the 
PHI immediately and can't wait until the next DCE pass?


If so and we want to go this direction, should we pull this out into a 
little routine?   I'm a bit surprised we don't already have one or more 
that do basically the same thing.


Jeff

Re: how does vrp2 rearrange this?

2021-10-19 Thread Andrew Pinski via Gcc-patches

On Tue, Oct 19, 2021 at 3:32 PM Andrew MacLeod  wrote:
>
> On 10/19/21 5:13 PM, Andrew Pinski wrote:
> > On Tue, Oct 19, 2021 at 1:29 PM Andrew MacLeod via Gcc-patches
> >  wrote:
> >> using testcase ifcvt-4.c:
> >>
> >>
> >> typedef int word __attribute__((mode(word)));
> >>
> >> word
> >> foo (word x, word y, word a)
> >> {
> >> word i = x;
> >> word j = y;
> >> /* Try to make taking the branch likely.  */
> >> __builtin_expect (x > y, 1);
> >> if (x > y)
> >>   {
> >> i = a;
> >> j = i;
> >>   }
> >> return i * j;
> >>


The testcase is broken anyways.
The builtin_expect should be inside the if to have any effect.  Look
at the estimated values:
   if (x_3(D) > y_4(D))
 goto ; [50.00%]<<-- has been reversed.
   else
 goto ; [50.00%]
;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
locally) (TRUE_VALUE,EXECUTABLE)
;;3 [50.0% (guessed)]  count:536870912 (estimated
locally) (FALSE_VALUE,EXECUTABLE)

See how it is 50/50?
The testcase is not even testing what it says it is testing.  Just
happened to work previously does not mean anything.  Move the
builtin_expect inside the if and try again. I am shocked it took this
long to find the testcase issue really.

Thanks,
Andrew Pinski


> >> The current VRP2 pass takes:
> >>
> >> if (x_3(D) > y_4(D))
> >>   goto ; [50.00%]<<---  note the THEN target
> >> else
> >>   goto ; [50.00%]
> >> ;;succ:   3 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (TRUE_VALUE,EXECUTABLE)
> >> ;;4 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (FALSE_VALUE,EXECUTABLE)
> >>
> >> ;;   basic block 3, loop depth 0, count 536870912 (estimated locally),
> >> maybe hot
> >> ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> >> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (TRUE_VALUE,EXECUTABLE)
> >> ;;succ:   4 [always]  count:536870912 (estimated locally)
> >> (FALLTHRU,EXECUTABLE)
> >>
> >> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
> >> maybe hot
> >> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> >> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (FALSE_VALUE,EXECUTABLE)
> >> ;;3 [always]  count:536870912 (estimated locally)
> >> (FALLTHRU,EXECUTABLE)
> >> # i_1 = PHI 
> >> # j_2 = PHI 
> >> _6 = i_1 * j_2;
> >> # VUSE <.MEM_7(D)>
> >> return _6;
> >>
> >> and turns it into :
> >>
> >> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally),
> >> maybe hot
> >> ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> >> ;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
> >> (FALLTHRU,EXECUTABLE)
> >> if (x_3(D) > y_4(D))
> >>   goto ; [50.00%]<<-- has been reversed.
> >> else
> >>   goto ; [50.00%]
> >> ;;succ:   4 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (TRUE_VALUE,EXECUTABLE)
> >> ;;3 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (FALSE_VALUE,EXECUTABLE)
> >>
> >> ;;   basic block 3, loop depth 0, count 536870912 (estimated locally),
> >> maybe hot
> >> ;;prev block 2, next block 4, flags: (NEW, VISITED)
> >> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (FALSE_VALUE,EXECUTABLE)
> >> ;;succ:   4 [always]  count:536870912 (estimated locally)
> >> (FALLTHRU,EXECUTABLE)
> >>
> >> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
> >> maybe hot
> >> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> >> ;;pred:   3 [always]  count:536870912 (estimated locally)
> >> (FALLTHRU,EXECUTABLE)
> >> ;;2 [50.0% (guessed)]  count:536870912 (estimated
> >> locally) (TRUE_VALUE,EXECUTABLE)
> >> # i_1 = PHI 
> >> # j_2 = PHI 
> >> _6 = i_1 * j_2;
> >> # VUSE <.MEM_7(D)>
> >> return _6;
> >>
> >> So the IF has reversed the targets (but not the condition), and the PHIs
> >> in the target block have been adjusted.
> >>
> >> Where does this happen? I cannot find it.  There doesnt seem to be
> >> anything in the IL which is reflecting the "expect" from the original
> >> code..  and if I run ranger vrp instead of classic_vrp, we don't do
> >> this...  so I'm missing something
> > I suspect this is an artifact of inserting and removing the assert
> > expressions in VRP rather than anything else.
> >
> Well, this fails the test because  we eventually check in RTL land to
> make sure the branch was reversed due to the expect, I guess.  and it
> seems that only VRP2 does the reversing.  so its expected.. just not
> obvious to me why, when or how.
>
> of course, the pass is th rtl.ce1 pass... and the check is for "2 true
> changes made."  . and if the IF isn't reversed , it doesnt "report"
> that.   I have no id

[PATCH] gcc-changelog: Add libffi/ to ignored_prefixes

2021-10-19 Thread H.J. Lu via Gcc-patches

Add libffi/ to ignored_prefixes for syncing with libffi upstream:

commit c095f8f2e6f26bfc2ff8e3276c6af23ab153f5ff
Author: H.J. Lu 
Date:   Tue Aug 31 07:14:47 2021 -0700

libffi: Sync with libffi 3.4.2

Merged commit: f9ea41683444ebe11cfa45b05223899764df28fb

to avoid

remote: *** The following commit was rejected by your 
hooks.commit-extra-checker script (status: 1)
remote: *** commit: c095f8f2e6f26bfc2ff8e3276c6af23ab153f5ff
remote: *** ChangeLog format failed:
remote: *** ERR: cannot find a ChangeLog location in message
remote: ***
remote: *** Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
remote: ***
remote: error: hook declined to update refs/heads/master

* gcc-changelog/git_commit.py (ignored_prefixes): Add libffi/.
---
 contrib/gcc-changelog/git_commit.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index cf29f761964..60377b68ba1 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -134,6 +134,7 @@ ignored_prefixes = {
 'gcc/go/gofrontend/',
 'gcc/testsuite/gdc.test/',
 'gcc/testsuite/go.test/test/',
+'libffi/',
 'libgo/',
 'libphobos/libdruntime/',
 'libphobos/src/',
-- 
2.32.0

[PATCH] Use fold_build2 instead fold_binary for TRUTH_AND

2021-10-19 Thread Jiufu Guo via Gcc-patches

In tree_simplify_using_condition_1, there is code which should be logic:
"op0 || op1"/"op0 && op1".  When creating expression for TRUTH_OR_EXPR
and TRUTH_AND_EXPR, fold_build2 would be used instead fold_binary which
always return NULL_TREE for this kind of expr.

Bootstrap and regtest pass on ppc and ppc64le.  Is this ok for trunk?

BR,
Jiufu

gcc/ChangeLog:

* tree-ssa-loop-niter.c (tree_simplify_using_condition_1): Replace
fold_binary with fold_build2 fir logical OR/AND.

---
 gcc/tree-ssa-loop-niter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 75109407124..27e11a29707 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2290,12 +2290,12 @@ tree_simplify_using_condition_1 (tree cond, tree expr)
 
   /* Check whether COND ==> EXPR.  */
   notcond = invert_truthvalue (cond);
-  e = fold_binary (TRUTH_OR_EXPR, boolean_type_node, notcond, expr);
+  e = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, notcond, expr);
   if (e && integer_nonzerop (e))
 return e;
 
   /* Check whether COND ==> not EXPR.  */
-  e = fold_binary (TRUTH_AND_EXPR, boolean_type_node, cond, expr);
+  e = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, cond, expr);
   if (e && integer_zerop (e))
 return e;
 
-- 
2.17.1

1 2 >

1 - 100 of 112 matches

Mail list logo