[PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
Hi!

The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
changes.
The simplification triggers on
(x & 4294967040U) >= 0U
and turns it into:
x <= 255U
which is incorrect; it should fold to 1 because unsigned >= 0U is always
true, and normally the
/* Non-equality compare simplifications from fold_binary  */
    (if (wi::to_wide (cst) == min)
     (if (cmp == GE_EXPR)
      { constant_boolean_node (true, type); })
simplification folds that, but this simplification was done earlier.

The simplification correctly doesn't include lt, which for the same
reason shouldn't be handled; we'll fold it to 0 elsewhere.

But, IMNSHO, while it isn't incorrect to handle le and gt there, it is
unnecessary, because (x & cst) <= 0U and (x & cst) > 0U should never
appear.  Again in
/* Non-equality compare simplifications from fold_binary  */
we have a simplification for it:
   (if (cmp == LE_EXPR)
    (eq @2 @1))
   (if (cmp == GT_EXPR)
    (ne @2 @1
This is done for
(cmp (convert?@2 @0) uniform_integer_cst_p@1)
and so should be done for both integers and vectors.
As the bitmask_inv_cst_vector_p simplification only handles eq and ne
for signed types, I think it can be simplified to just the following patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size
of *-match.c; wouldn't it be better to accept just CONSTANT_CLASS_P@1
and have bitmask_inv_cst_vector_p return NULL_TREE if it isn't
INTEGER_CST or VECTOR_CST?

Also, without/with this patch I see on i686-linux (can be reproduced with
RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask* signbit-2*'"
too):
FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
Those tests use the vect_int effective target, but AFAIK that can be used
only in *.dg/vect/ because it relies on vect.exp enabling options to
support vectorization on the particular target (e.g. for i686-linux that
is -msse2).
I think there isn't another way to get the DEFAULT_VECTCFLAGS into
dg-options other than having the test driven by vect.exp.

And, finally, I've noticed incorrect formatting in the new
bitmask_inv_cst_vector_p routine:
  do {
    if (idx > 0)
      cst = vector_cst_elt (t, idx);
    ...
    builder.quick_push (newcst);
  } while (++idx < nelts);
It should be
  do
    {
      if (idx > 0)
        cst = vector_cst_elt (t, idx);
      ...
      builder.quick_push (newcst);
    }
  while (++idx < nelts);

2021-11-25  Jakub Jelinek

	PR tree-optimization/103417
	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
	common tests.

	* gcc.c-torture/execute/pr103417.c: New test.
--- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
+++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
@@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
 /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
    where ~Y + 1 == pow2 and Z = ~Y.  */
 (for cst (VECTOR_CST INTEGER_CST)
- (for cmp (le eq ne ge gt)
-      icmp (le le gt le gt)
-  (simplify
-   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
-   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
-    (switch
-     (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
-	  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
-      (icmp @0 { csts; }))
-     (if (csts && !TYPE_UNSIGNED (TREE_TYPE (@1))
-	  && (cmp == EQ_EXPR || cmp == NE_EXPR)
-	  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
+ (for cmp (eq ne)
+      icmp (le gt)
+  (simplify
+   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
+   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
+    (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
+     (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
+      (icmp @0 { csts; }
       (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
-	(icmp (convert:utype @0) { csts; }
+	(icmp (convert:utype @0) { csts; }

 /* -A CMP -B -> B CMP A.  */
 (for cmp (tcc_comparison)
--- gcc/testsuite
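For reference, a minimal runtime reproducer matching the description
above; this is a sketch, not necessarily the actual
gcc.c-torture/execute/pr103417.c added by the patch:

  /* Sketch of a reproducer for PR103417; the committed testcase may differ.  */
  __attribute__((noipa)) unsigned
  foo (unsigned x)
  {
    /* Always 1: unsigned >= 0U is always true.  The broken simplification
       instead turned this into x <= 255U.  */
    return (x & 4294967040U) >= 0U;
  }

  int
  main ()
  {
    if (foo (0) != 1 || foo (256) != 1 || foo (~0U) != 1)
      __builtin_abort ();
    return 0;
  }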
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Wed, 24 Nov 2021, Michael Matz wrote:

> Hello,
>
> On Wed, 24 Nov 2021, Richard Biener wrote:
>
> > >> +/* Unreachable code in if (0) block.  */
> > >> +void baz(int *p)
> > >> +{
> > >> +  if (0)
> > >> +    {
> > >> +      return; /* { dg-bogus "not reachable" } */
> > >
> > > Hmm?  Why are you explicitly saying that warning here would be bogus?
> >
> > Because I don't think we want to warn here.  Such code is common from
> > template instantiation or macro expansion.
>
> Like all code with a (const-propagated) explicit 'if (0)', which is of
> course the reason why -Wunreachable-code is a challenge.

OK, so I probably shouldn't have taken -Wunreachable-code but named
it somehow differently.  We want to diagnose obvious programming
mistakes, not (source code) optimization opportunities.  So

int foo (int i)
{
  return i;
  i += 1;
  return i;
}

should be diagnosed, for example, but not so

int foo (int i)
{
  if (USE_NOOP_FOO)
    return i;
  i += 1;
  return i;
}

when compiling with -DUSE_NOOP_FOO=1.

> IOW: I could accept your argument but then wonder why you want to warn
> about the second statement of the guarded block.  The situation was:
>
>   if (0) {
>     return;      // (1) don't warn here?
>     whatever++;  // (2) but warn here?

because, as said above, the whatever++ will never be reachable even if
you change the condition in the if().  See my response to Martin where
I said I think if (0) of a block is a good way to comment it out
but keep it syntactically correct.

>   }
>
> That seems even more confusing.  So you don't want to warn about
> unreachable code (the 'return') but you do want to warn about unreachable
> code within unreachable code (point (2) is unreachable because of the
> if(0) and because of the return).  If your worry is macro/template
> expansion resulting in if(0)'s then I don't see why you would only disable
> warnings for some of the statements therein.

The point is not to disable the warning for some statements therein
but to avoid diagnosing following stmts.

> It seems we are actually interested in code unreachable via fallthrough or
> labels, not in all unreachable code, so maybe the warning is mis-named.

Yes, that's definitely the case - I was too lazy to re-use the old
option name here.  But I don't have a good name at hand; maybe clang
has an option covering the cases I'm thinking about.

Btw, the diagnostic spotted qsort_chk doing

  if (CMP (i1, i2))
    break;
  else if (CMP (i2, i1))
    return ERR2 (i1, i2);

where ERR2 expands to a call to a noreturn void "returning"
qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
a bug but somewhat difficult to avoid the diagnostic for.  I suppose
the pointless 'return' is to make it more visible that the loop
terminates here (albeit we don't return normally).

Likewise we diagnose (c_tree_equal):

    default:
      gcc_unreachable ();
    }
  /* We can get here with --disable-checking.  */
  return false;

where the 'return false' is never reachable.  The return was likely
inserted to avoid very strange error paths when the unreachable
falls through to some other random function.

> Btw. what does the code now do about this situation:
>
>   if (0) {
>     something++;     // 1
>     return;          // 2
>     somethingelse++; // 3
>   }
>
> does it warn at (1) or not?  (I assume it unconditionally warns at (3))

It warns at (3).  It basically assumes that if (0) might become if (1)
in some other configuration and thus the diagnostic is difficult to
silence in source.

Any suggestion for a better option name?

Richard.
Re: [PATCH] bswap: Improve perform_symbolic_merge [PR103376]
On Thu, 25 Nov 2021, Jakub Jelinek wrote:

> On Wed, Nov 24, 2021 at 09:45:16AM +0100, Richard Biener wrote:
> > > Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
> > > We could allow masked1 == masked2 case for it, but would need to
> > > do something different than the
> > >   n->n = n1->n | n2->n;
> > > we do on all the bytes together.
> > > In particular, for masked1 == masked2 if masked1 != 0 (well, for 0
> > > both variants are the same) and masked1 != 0xff we would need to
> > > clear corresponding n->n byte instead of setting it to the input
> > > as x ^ x = 0 (but if we don't know what x and y are, the result is
> > > also don't know).  Now, for plus it is much harder, because not only
> > > for non-zero operands we don't know what the result is, but it can
> > > modify upper bytes as well.  So perhaps only if current's byte
> > > masked1 && masked2 set the resulting byte to 0xff (unknown) iff
> > > the byte above it is 0 and 0, and set that resulting byte to 0xff too.
> > > Also, even for | we could instead of return NULL just set the resulting
> > > byte to 0xff if it is different, perhaps it will be masked off later on.
> > > Ok to handle that incrementally?
> >
> > Not sure if it is worth the trouble - the XOR handling sounds
> > straight forward at least.  But sure, the merging routine could
> > simply be conservatively correct here.
>
> This patch implements that (except that for + it just punts whenever
> both operand bytes aren't 0 like before).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK if you can add a testcase that exercises this "feature".

Thanks,
Richard.

> 2021-11-25  Jakub Jelinek
>
> 	PR tree-optimization/103376
> 	* gimple-ssa-store-merging.c (perform_symbolic_merge): For
> 	BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
> 	punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
> 	For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
> 	byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
> 	0.
>
> --- gcc/gimple-ssa-store-merging.c.jj	2021-11-24 09:54:37.684365460 +0100
> +++ gcc/gimple-ssa-store-merging.c	2021-11-24 11:18:54.46266 +0100
> @@ -556,6 +556,7 @@ perform_symbolic_merge (gimple *source_s
>    n->bytepos = n_start->bytepos;
>    n->type = n_start->type;
>    size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
> +  uint64_t res_n = n1->n | n2->n;
>
>    for (i = 0, mask = MARKER_MASK; i < size; i++, mask <<= BITS_PER_MARKER)
>      {
> @@ -563,12 +564,33 @@ perform_symbolic_merge (gimple *source_s
>
>        masked1 = n1->n & mask;
>        masked2 = n2->n & mask;
> -      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
> -	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
> -      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
> -	return NULL;
> +      /* If at least one byte is 0, all of 0 | x == 0 ^ x == 0 + x == x.  */
> +      if (masked1 && masked2)
> +	{
> +	  /* + can carry into upper bits, just punt.  */
> +	  if (code == PLUS_EXPR)
> +	    return NULL;
> +	  /* x | x is still x.  */
> +	  if (code == BIT_IOR_EXPR && masked1 == masked2)
> +	    continue;
> +	  if (code == BIT_XOR_EXPR)
> +	    {
> +	      /* x ^ x is 0, but MARKER_BYTE_UNKNOWN stands for
> +		 unknown values and unknown ^ unknown is unknown.  */
> +	      if (masked1 == masked2
> +		  && masked1 != ((uint64_t) MARKER_BYTE_UNKNOWN
> +				 << i * BITS_PER_MARKER))
> +		{
> +		  res_n &= ~mask;
> +		  continue;
> +		}
> +	    }
> +	  /* Otherwise set the byte to unknown, it might still be
> +	     later masked off.  */
> +	  res_n |= mask;
> +	}
>      }
> -  n->n = n1->n | n2->n;
> +  n->n = res_n;
>    n->n_ops = n1->n_ops + n2->n_ops;
>
>    return source_stmt;
>
> 	Jakub
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek
> Sent: Thursday, November 25, 2021 8:19 AM
> To: Richard Biener
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org
> Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
>
> Hi!
>
> The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> changes.
> The simplification triggers on
> (x & 4294967040U) >= 0U
> and turns it into:
> x <= 255U
> which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> and normally the
> /* Non-equality compare simplifications from fold_binary  */
> (if (wi::to_wide (cst) == min)
>    (if (cmp == GE_EXPR)
>     { constant_boolean_node (true, type); }) simplification folds that,
> but this simplification was done earlier.
>
> The simplification correctly doesn't include lt which has the same reason why
> it shouldn't be handled, we'll fold it to 0 elsewhere.

Yes, this was a bug; sorry, I'm not sure why I didn't catch it...

> But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> again in
> /* Non-equality compare simplifications from fold_binary */ we have a
> simplification for it:
>    (if (cmp == LE_EXPR)
>     (eq @2 @1))
>    (if (cmp == GT_EXPR)
>     (ne @2 @1
> This is done for
> (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> for both integers and vectors.
> As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> signed types, I think it can be simplified to just following patch.

As I mentioned on the PR, I don't think LE and GT should be removed; the
patch is attempting to simplify the bitmask used because most vector ISAs
can create the simpler mask much more easily than the complex mask.

I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
doesn't matter as much, it does for vector code.

Regards,
Tamar

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of
> *-match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> VECTOR_CST?
>
> Also, without/with this patch I see on i686-linux (can be reproduced with
> RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> signbit-2*'"
> too):
> FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1,
> 65535 }"
> FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
> FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
> Those tests use vect_int effective target, but AFAIK that can be used only in
> *.dg/vect/ because it relies on vect.exp enabling options to support
> vectorization on the particular target (e.g. for i686-linux that is -msse2).
> I think there isn't other way to get the DEFAULT_VECTCFLAGS into
> dg-options other than having the test driven by vect.exp.
>
> And, finally, I've noticed incorrect formatting in the new
> bitmask_inv_cst_vector_p routine:
>   do {
>     if (idx > 0)
>       cst = vector_cst_elt (t, idx);
>     ...
>     builder.quick_push (newcst);
>   } while (++idx < nelts);
> It should be
>   do
>     {
>       i
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, 25 Nov 2021, Tamar Christina wrote:

> Hi Jakub,
>
> > -----Original Message-----
> > From: Jakub Jelinek
> > Sent: Thursday, November 25, 2021 8:19 AM
> > To: Richard Biener
> > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> > simplification [PR103417]
> >
> > Hi!
> >
> > The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> > changes.
> > The simplification triggers on
> > (x & 4294967040U) >= 0U
> > and turns it into:
> > x <= 255U
> > which is incorrect, it should fold to 1 because unsigned >= 0U is always
> > true and normally the
> > /* Non-equality compare simplifications from fold_binary  */
> > (if (wi::to_wide (cst) == min)
> >    (if (cmp == GE_EXPR)
> >     { constant_boolean_node (true, type); }) simplification folds that,
> > but this simplification was done earlier.
> >
> > The simplification correctly doesn't include lt which has the same reason
> > why it shouldn't be handled, we'll fold it to 0 elsewhere.
>
> Yes this was a bug, sorry I'm not sure why I didn't catch it...
>
> > But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> > unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never
> > appear, again in
> > /* Non-equality compare simplifications from fold_binary */ we have a
> > simplification for it:
> >    (if (cmp == LE_EXPR)
> >     (eq @2 @1))
> >    (if (cmp == GT_EXPR)
> >     (ne @2 @1
> > This is done for
> > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > for both integers and vectors.
> > As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> > signed types, I think it can be simplified to just following patch.

Note that would mean the transform should be ordered _after_ the above,
even if we retain it for vector le/gt.

> As I mentioned on the PR I don't think LE and GT should be removed, the
> patch is attempting to simplify the bitmask used because most vector ISAs
> can create the simpler mask much easier than the complex mask.
>
> I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
> doesn't matter as much, it does for vector code.
>
> Regards,
> Tamar
>
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size
> > of *-match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just
> > say in bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST
> > or VECTOR_CST?

In the end that should be recoverable by genmatch.  I do have some ideas
to improve it for size in this area, maybe during stage4.  Originally
genmatch was trying to optimize for matching speed, but now, with
honoring the ordering of patterns, that very much became secondary
(note re-ordering patterns in match.pd can also improve *-match.c size
greatly!  Maybe some script can try to brute-force the "optimal" order -
but note some pattern order matters ;))

> > Also, without/with this patch I see on i686-linux (can be reproduced with
> > RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> > signbit-2*'"
> > too):
> > FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> > FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> > FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1,
> > 65535 }"
> > FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
> > FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
> > FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
> > Those tests use vect_int effective target, but AFAIK that can be used only
> > in *.dg/vect/ because it relies on vect.exp enabling options to support
> > vectorization on the particular target (e.g. for i686-linux that is -msse2).
> > I think there isn't other way to get the DEFAULT_VECTCFLAGS into
> > dg-options other than having the test driven by vect.exp.
> >
> > And, finally, I've noticed incorrect formatting in the new
> > bitmask_inv_cst_vector_p routine:
> >   do {
> >     if (idx > 0)
> >       cst = vector_cst_elt (t, idx);
> >     ...
> >     builder.quick_push (newcst);
> >   } while (++idx < nelts);
> > It should be
> >   do
> >     {
> >       i
Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> > unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never
> > appear, again in
> > /* Non-equality compare simplifications from fold_binary */ we have a
> > simplification for it:
> >    (if (cmp == LE_EXPR)
> >     (eq @2 @1))
> >    (if (cmp == GT_EXPR)
> >     (ne @2 @1
> > This is done for
> > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > for both integers and vectors.
> > As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> > signed types, I think it can be simplified to just following patch.
>
> As I mentioned on the PR I don't think LE and GT should be removed, the patch
> is attempting to simplify the bitmask used because most vector ISAs can create
> the simpler mask much easier than the complex mask.
>
> I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it doesn't
> matter as much, it does for vector code.

What I'm trying to explain is that you should never see those le or gt
cases with TYPE_UNSIGNED (especially when the simplification is moved
after those
/* Non-equality compare simplifications from fold_binary */
I've mentioned), because if you try:

typedef unsigned V __attribute__((vector_size (4)));

unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
V f3 (V x) { V z = (V) {}; return x > z; }
V f4 (V x) { V z = (V) {}; return x <= z; }

you'll see that at ccp1, when the constants propagate, this is simplified
using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and
x == (V) {}.

The important rule of match.pd is composability; the simplifications
should rely on other simplifications and not repeat all their decisions,
because that makes the *match.c larger and more expensive (and a source
of extra possible bugs).

	Jakub
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Thu, 25 Nov 2021, Richard Biener wrote:

> On Wed, 24 Nov 2021, Michael Matz wrote:
>
> > Hello,
> >
> > On Wed, 24 Nov 2021, Richard Biener wrote:
> >
> > > >> +/* Unreachable code in if (0) block.  */
> > > >> +void baz(int *p)
> > > >> +{
> > > >> +  if (0)
> > > >> +    {
> > > >> +      return; /* { dg-bogus "not reachable" } */
> > > >
> > > > Hmm?  Why are you explicitly saying that warning here would be bogus?
> > >
> > > Because I don't think we want to warn here.  Such code is common from
> > > template instantiation or macro expansion.
> >
> > Like all code with a (const-propagated) explicit 'if (0)', which is of
> > course the reason why -Wunreachable-code is a challenge.
>
> OK, so I probably shouldn't have taken -Wunreachable-code but named
> it somehow differently.  We want to diagnose obvious programming
> mistakes, not (source code) optimization opportunities.  So
>
> int foo (int i)
> {
>   return i;
>   i += 1;
>   return i;
> }
>
> should be diagnosed, for example, but not so
>
> int foo (int i)
> {
>   if (USE_NOOP_FOO)
>     return i;
>   i += 1;
>   return i;
> }
>
> when compiling with -DUSE_NOOP_FOO=1.
>
> > IOW: I could accept your argument but then wonder why you want to warn
> > about the second statement of the guarded block.  The situation was:
> >
> >   if (0) {
> >     return;      // (1) don't warn here?
> >     whatever++;  // (2) but warn here?
>
> because, as said above, the whatever++ will never be reachable even if
> you change the condition in the if().  See my response to Martin where
> I said I think if (0) of a block is a good way to comment it out
> but keep it syntactically correct.
>
> >   }
> >
> > That seems even more confusing.  So you don't want to warn about
> > unreachable code (the 'return') but you do want to warn about unreachable
> > code within unreachable code (point (2) is unreachable because of the
> > if(0) and because of the return).  If your worry is macro/template
> > expansion resulting in if(0)'s then I don't see why you would only disable
> > warnings for some of the statements therein.
>
> The point is not to disable the warning for some statements therein
> but to avoid diagnosing following stmts.
>
> > It seems we are actually interested in code unreachable via fallthrough or
> > labels, not in all unreachable code, so maybe the warning is mis-named.
>
> Yes, that's definitely the case - I was too lazy to re-use the old
> option name here.  But I don't have a good name at hand; maybe clang
> has an option covering the cases I'm thinking about.
>
> Btw, the diagnostic spotted qsort_chk doing
>
>   if (CMP (i1, i2))
>     break;
>   else if (CMP (i2, i1))
>     return ERR2 (i1, i2);
>
> where ERR2 expands to a call to a noreturn void "returning"
> qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
> a bug but somewhat difficult to avoid the diagnostic for.  I suppose
> the pointless 'return' is to make it more visible that the loop
> terminates here (albeit we don't return normally).
>
> Likewise we diagnose (c_tree_equal):
>
>     default:
>       gcc_unreachable ();
>     }
>   /* We can get here with --disable-checking.  */
>   return false;
>
> where the 'return false' is never reachable.  The return was likely
> inserted to avoid very strange error paths when the unreachable
> falls through to some other random function.

It also finds this strange code in label_rtx_for_bb:

  /* Find the tree label if it is present.  */
  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    {
      glabel *lab_stmt;

      lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi));
      if (!lab_stmt)
	break;

      lab = gimple_label_label (lab_stmt);
      if (DECL_NONLOCAL (lab))
	break;

      return jump_target_rtx (lab);
    }

diagnosing

/home/rguenther/src/trunk/gcc/cfgexpand.c:2476:60: error: statement is not reachable [-Werror=unreachable-code]
 2476 |   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      |                                                   ~^~

indeed the loop looks pointless.  Unless the DECL_NONLOCAL case was
meant to continue;

Richard.
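For illustration, here is the loop with the speculated 'continue' (a
sketch only; whether that is the intended semantics is exactly the open
question above):

  /* Sketch only: skip non-local labels instead of stopping at them,
     so later labels in the block are still considered.  */
  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
    {
      glabel *lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi));
      if (!lab_stmt)
	break;

      lab = gimple_label_label (lab_stmt);
      if (DECL_NONLOCAL (lab))
	continue;

      return jump_target_rtx (lab);
    }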
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
> -----Original Message-----
> From: Jakub Jelinek
> Sent: Thursday, November 25, 2021 8:39 AM
> To: Tamar Christina
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
>
> On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > > But, IMNSHO while it isn't incorrect to handle le and gt there, it
> > > is unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
> > > never appear, again in
> > > /* Non-equality compare simplifications from fold_binary */ we have
> > > a simplification for it:
> > >    (if (cmp == LE_EXPR)
> > >     (eq @2 @1))
> > >    (if (cmp == GT_EXPR)
> > >     (ne @2 @1
> > > This is done for
> > > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> > > for both integers and vectors.
> > > As the bitmask_inv_cst_vector_p simplification only handles eq and ne
> > > for signed types, I think it can be simplified to just following patch.
> >
> > As I mentioned on the PR I don't think LE and GT should be removed, the
> > patch is attempting to simplify the bitmask used because most vector
> > ISAs can create the simpler mask much easier than the complex mask.
> >
> > I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
> > doesn't matter as much, it does for vector code.
>
> What I'm trying to explain is that you should never see those le or gt
> cases with TYPE_UNSIGNED (especially when the simplification is moved
> after those
> /* Non-equality compare simplifications from fold_binary */ I've
> mentioned), because if you try:
> typedef unsigned V __attribute__((vector_size (4)));
>
> unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
> unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
> V f3 (V x) { V z = (V) {}; return x > z; }
> V f4 (V x) { V z = (V) {}; return x <= z; }
> you'll see that this is at ccp1, when the constants propagate, simplified
> using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and
> x == (V) {}.

Ah I see, sorry I didn't see that rule before, you're right that if this
is ordered after it then they can be dropped.

Thanks,
Tamar

> The important rule of match.pd is composability, the simplifications
> should rely on other simplifications and not repeating all their
> decisions because that makes the *match.c larger and more expensive
> (and a source of extra possible bugs).
>
> 	Jakub
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
> -----Original Message-----
> From: Jakub Jelinek
> Sent: Thursday, November 25, 2021 8:19 AM
> To: Richard Biener
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org
> Subject: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> simplification [PR103417]
>
> Hi!
>
> The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
> changes.
> The simplification triggers on
> (x & 4294967040U) >= 0U
> and turns it into:
> x <= 255U
> which is incorrect, it should fold to 1 because unsigned >= 0U is always true
> and normally the
> /* Non-equality compare simplifications from fold_binary  */
> (if (wi::to_wide (cst) == min)
>    (if (cmp == GE_EXPR)
>     { constant_boolean_node (true, type); }) simplification folds that, but
> this simplification was done earlier.
>
> The simplification correctly doesn't include lt which has the same reason why
> it shouldn't be handled, we'll fold it to 0 elsewhere.
>
> But, IMNSHO while it isn't incorrect to handle le and gt there, it is
> unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should never appear,
> again in
> /* Non-equality compare simplifications from fold_binary */ we have a
> simplification for it:
>    (if (cmp == LE_EXPR)
>     (eq @2 @1))
>    (if (cmp == GT_EXPR)
>     (ne @2 @1
> This is done for
> (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done
> for both integers and vectors.
> As the bitmask_inv_cst_vector_p simplification only handles eq and ne for
> signed types, I think it can be simplified to just following patch.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> I wonder if (for cst (VECTOR_CST INTEGER_CST) is good for the best size of
> *-match.c, wouldn't accepting just CONSTANT_CLASS_P@1 and then just say in
> bitmask_inv_cst_vector_p return NULL_TREE if it isn't INTEGER_CST or
> VECTOR_CST?
>
> Also, without/with this patch I see on i686-linux (can be reproduced with
> RUNTESTFLAGS="--target_board=unix/-m32/-mno-sse dg.exp='bic-bitmask*
> signbit-2*'"
> too):
> FAIL: gcc.dg/bic-bitmask-10.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-11.c scan-tree-dump dce7 ">\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-12.c scan-tree-dump dce7 "<=\\s*.+{ 255,.+}"
> FAIL: gcc.dg/bic-bitmask-2.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-23.c (test for excess errors)
> FAIL: gcc.dg/bic-bitmask-23.c scan-tree-dump dce7 "<=\\s*.+{ 255, 15, 1,
> 65535 }"
> FAIL: gcc.dg/bic-bitmask-3.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-4.c scan-tree-dump-times dce7 "=\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-5.c scan-tree-dump-times dce7 ">\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-6.c scan-tree-dump-times dce7 "<=\\s*.+{ 255,.+}" 1
> FAIL: gcc.dg/bic-bitmask-8.c scan-tree-dump-times dce7 ">\\s*.+{ 1,.+}" 1
> FAIL: gcc.dg/bic-bitmask-9.c scan-tree-dump dce7 "&\\s*.+{ 4294967290,.+}"
> FAIL: gcc.dg/signbit-2.c scan-tree-dump optimized "\\s+>\\s+{ 0(, 0)+ }"
> Those tests use vect_int effective target, but AFAIK that can be used only in
> *.dg/vect/ because it relies on vect.exp enabling options to support
> vectorization on the particular target (e.g. for i686-linux that is -msse2).
> I think there isn't other way to get the DEFAULT_VECTCFLAGS into
> dg-options other than having the test driven by vect.exp.

Yeah, I now see that vect_int is different from some of the other
effective target checks like the SVE one.  I'll move the ones testing
the vector code to vect and leave the scalars where they are.

Thanks,
Tamar

> And, finally, I've noticed incorrect formatting in the new
> bitmask_inv_cst_vector_p routine:
>   do {
>     if (idx > 0)
>       cst = vector_cst_elt (t, idx);
>     ...
>     builder.quick_push (newcst);
>   } while (++idx < nelts);
> It should be
>   do
>     {
>       if (idx > 0)
>         cst = vector_cst_elt (t, idx);
>       ...
>       builder.quick_push (newcst);
>     }
>   while (++idx < nelts);
>
> 2021-11-25  Jakub Jelinek
>
> 	PR tree-optimization/103417
> 	* match.pd ((X & Y) CMP 0): Only handle eq and ne.  Commonalize
> 	common tests.
>
> 	* gcc.c-torture/execute/pr103417.c: New test.
>
> --- gcc/match.pd.jj	2021-11-24 11:46:03.191918052 +0100
> +++ gcc/match.pd	2021-11-24 22:33:43.852575772 +0100
> @@ -5214,20 +5214,16 @@ (define_operator_list SYNC_FETCH_AND_AND
>  /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
>     where ~Y + 1 == pow2 and Z = ~Y.  */
>  (for cst (VECTOR_CST INTEGER_CST)
> - (for cmp (le eq ne ge gt)
> -      icmp (le le gt le gt)
> -  (simplify
> -   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> -   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> -    (switch
> -     (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
> -	  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> -      (icmp @0 { csts; }))
> -     (if (csts && !
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, 25 Nov 2021, Tamar Christina wrote:

> > -----Original Message-----
> > From: Jakub Jelinek
> > Sent: Thursday, November 25, 2021 8:39 AM
> > To: Tamar Christina
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p
> > simplification [PR103417]
> >
> > On Thu, Nov 25, 2021 at 08:23:50AM +0000, Tamar Christina wrote:
> > > > But, IMNSHO while it isn't incorrect to handle le and gt there, it
> > > > is unnecessary.  Because (x & cst) <= 0U and (x & cst) > 0U should
> > > > never appear, again in
> > > > /* Non-equality compare simplifications from fold_binary */ we have
> > > > a simplification for it:
> > > >    (if (cmp == LE_EXPR)
> > > >     (eq @2 @1))
> > > >    (if (cmp == GT_EXPR)
> > > >     (ne @2 @1
> > > > This is done for
> > > > (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be
> > > > done for both integers and vectors.
> > > > As the bitmask_inv_cst_vector_p simplification only handles eq and
> > > > ne for signed types, I think it can be simplified to just following
> > > > patch.
> > >
> > > As I mentioned on the PR I don't think LE and GT should be removed,
> > > the patch is attempting to simplify the bitmask used because most
> > > vector ISAs can create the simpler mask much easier than the complex
> > > mask.
> > >
> > > I.e. 0xFF00 is harder to create than 0xFF.  So while for scalar it
> > > doesn't matter as much, it does for vector code.
> >
> > What I'm trying to explain is that you should never see those le or gt
> > cases with TYPE_UNSIGNED (especially when the simplification is moved
> > after those
> > /* Non-equality compare simplifications from fold_binary */ I've
> > mentioned), because if you try:
> > typedef unsigned V __attribute__((vector_size (4)));
> >
> > unsigned f1 (unsigned x) { unsigned z = 0; return x > z; }
> > unsigned f2 (unsigned x) { unsigned z = 0; return x <= z; }
> > V f3 (V x) { V z = (V) {}; return x > z; }
> > V f4 (V x) { V z = (V) {}; return x <= z; }
> > you'll see that this is at ccp1, when the constants propagate, simplified
> > using the rules I mentioned into x != 0U, x == 0U, x != (V) {} and
> > x == (V) {}.
>
> Ah I see, sorry I didn't see that rule before, you're right that if this
> is ordered after it then they can be dropped.

So the patch is OK, possibly with re-ordering the matches.

Thanks,
Richard.
Re: [PATCH 2/2][GCC] arm: Declare MVE types internally via pragma
Changes from original patch:
1. Merged test_redef_* test files into one.
2. Encapsulated contents of arm-mve-builtins.h in namespace arm_mve
   (missed in initial patch).
3. Added extern declarations for scalar_types and acle_vector types to
   arm-mve-builtins.h (missed in initial patch).
4. Added arm-mve-builtins.(cc|h) to gt_targets for arm-*-*-* (missed in
   initial patch).
5. Added include for gt-arm-mve-builtins.h to arm-mve-builtins.cc
   (missed in initial patch).
6. Removed explicit initialisation of handle_arm_mve_types_p as it is
   unneeded.

---

This patch moves the implementation of MVE ACLE types from
arm_mve_types.h to inside GCC via a new pragma, which replaces the
prior type definitions.  This allows the types to be used internally
for intrinsic function definitions.

Bootstrapped and regression tested on arm-none-linux-gnuabihf, and
regression tested on arm-eabi -- no issues.

Thanks,
Murray

gcc/ChangeLog:

	* config.gcc: Add arm-mve-builtins.o to extra_objs for arm-*-*-*
	targets.
	* config/arm/arm-c.c (arm_pragma_arm): Handle new pragma.
	(arm_register_target_pragmas): Register new pragma.
	* config/arm/arm-protos.h: Add arm_mve namespace and declare
	arm_handle_mve_types_h.
	* config/arm/arm_mve_types.h: Replace MVE type definitions with
	new pragma.
	* config/arm/t-arm: Add arm-mve-builtins.o target.
	* config/arm/arm-mve-builtins.cc: New file.
	* config/arm/arm-mve-builtins.def: New file.
	* config/arm/arm-mve-builtins.h: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/mve/mve.exp: Add new subdirectories.
	* gcc.target/arm/mve/general-c/type_redef_1.c: New test.
	* gcc.target/arm/mve/general/double_pragmas_1.c: New test.
	* gcc.target/arm/mve/general/nomve_1.c: New test.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index edd12655c4a1e6feb09aabbee77eacd9f66b4171..0aa386403112eff80cb5071fa6ff2fdbe610c9fc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -352,14 +352,14 @@ arc*-*-*)
 	;;
 arm*-*-*)
 	cpu_type=arm
-	extra_objs="arm-builtins.o aarch-common.o"
+	extra_objs="arm-builtins.o arm-mve-builtins.o aarch-common.o"
 	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve_types.h arm_mve.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
 	d_target_objs="arm-d.o"
 	extra_options="${extra_options} arm/arm-tables.opt"
-	target_gtfiles="\$(srcdir)/config/arm/arm-builtins.c"
+	target_gtfiles="\$(srcdir)/config/arm/arm-builtins.c \$(srcdir)/config/arm/arm-mve-builtins.h \$(srcdir)/config/arm/arm-mve-builtins.cc"
 	;;
 avr-*-*)
 	cpu_type=avr
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..d1414f6e0e1c2bd0a7364b837c16adf493221376 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -28,6 +28,7 @@
 #include "c-family/c-pragma.h"
 #include "stringpool.h"
 #include "arm-builtins.h"
+#include "arm-protos.h"

 tree
 arm_resolve_cde_builtin (location_t loc, tree fndecl, void *arglist)
@@ -129,6 +130,24 @@ arm_resolve_cde_builtin (location_t loc, tree fndecl, void *arglist)
   return call_expr;
 }

+/* Implement "#pragma GCC arm".  */
+static void
+arm_pragma_arm (cpp_reader *)
+{
+  tree x;
+  if (pragma_lex (&x) != CPP_STRING)
+    {
+      error ("%<#pragma GCC arm%> requires a string parameter");
+      return;
+    }
+
+  const char *name = TREE_STRING_POINTER (x);
+  if (strcmp (name, "arm_mve_types.h") == 0)
+    arm_mve::handle_arm_mve_types_h ();
+  else
+    error ("unknown %<#pragma GCC arm%> option %qs", name);
+}
+
 /* Implement TARGET_RESOLVE_OVERLOADED_BUILTIN.  This is currently only
    used for the MVE related builtins for the CDE extension.
    Here we ensure the type of arguments is such that the size is correct, and
@@ -476,6 +495,8 @@ arm_register_target_pragmas (void)
   targetm.target_option.pragma_parse = arm_pragma_target_parse;
   targetm.resolve_overloaded_builtin = arm_resolve_overloaded_builtin;

+  c_register_pragma ("GCC", "arm", arm_pragma_arm);
+
 #ifdef REGISTER_SUBTARGET_PRAGMAS
   REGISTER_SUBTARGET_PRAGMAS ();
 #endif
diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
new file mode 100644
index 0000000000000000000000000000000000000000..99ddc8d49aad39e057c1c0d349c6c02c278553d6
--- /dev/null
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -0,0 +1,196 @@
+/* ACLE support for Arm MVE
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
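For context, a sketch of what the header side reduces to after this
change, based on the description above (the exact arm_mve_types.h hunk
is in the full patch, which is truncated here):

  /* arm_mve_types.h, in essence: the hand-written MVE type definitions
     are replaced by a pragma that the compiler handles internally via
     arm_pragma_arm above.  Sketch, not the exact hunk.  */
  #pragma GCC arm "arm_mve_types.h"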
Re: [PATCH] bswap: Improve perform_symbolic_merge [PR103376]
On Thu, Nov 25, 2021 at 09:21:37AM +0100, Richard Biener wrote:
> OK if you can add a testcase that exercises this "feature".

Sure, that is easy.  Here is what I've committed.
f2 tests the x | x = x handling in it, f3 tests x | y = unknown instead
of punting, f4 tests x ^ x = 0 and f5 tests x ^ y = unknown.
Without the patch only f2 is optimized to __builtin_bswap32, with the
patch all of them.

2021-11-25  Jakub Jelinek

	PR tree-optimization/103376
	* gimple-ssa-store-merging.c (perform_symbolic_merge): For
	BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
	punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
	For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
	byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
	0.

	* gcc.dg/optimize-bswapsi-7.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj	2021-11-24 09:54:37.684365460 +0100
+++ gcc/gimple-ssa-store-merging.c	2021-11-24 11:18:54.46266 +0100
@@ -556,6 +556,7 @@ perform_symbolic_merge (gimple *source_s
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
+  uint64_t res_n = n1->n | n2->n;

   for (i = 0, mask = MARKER_MASK; i < size; i++, mask <<= BITS_PER_MARKER)
     {
@@ -563,12 +564,33 @@ perform_symbolic_merge (gimple *source_s

       masked1 = n1->n & mask;
       masked2 = n2->n & mask;
-      /* For BIT_XOR_EXPR or PLUS_EXPR, at least one of masked1 and masked2
-	 has to be 0, for BIT_IOR_EXPR x | x is still x.  */
-      if (masked1 && masked2 && (code != BIT_IOR_EXPR || masked1 != masked2))
-	return NULL;
+      /* If at least one byte is 0, all of 0 | x == 0 ^ x == 0 + x == x.  */
+      if (masked1 && masked2)
+	{
+	  /* + can carry into upper bits, just punt.  */
+	  if (code == PLUS_EXPR)
+	    return NULL;
+	  /* x | x is still x.  */
+	  if (code == BIT_IOR_EXPR && masked1 == masked2)
+	    continue;
+	  if (code == BIT_XOR_EXPR)
+	    {
+	      /* x ^ x is 0, but MARKER_BYTE_UNKNOWN stands for
+		 unknown values and unknown ^ unknown is unknown.  */
+	      if (masked1 == masked2
+		  && masked1 != ((uint64_t) MARKER_BYTE_UNKNOWN
+				 << i * BITS_PER_MARKER))
+		{
+		  res_n &= ~mask;
+		  continue;
+		}
+	    }
+	  /* Otherwise set the byte to unknown, it might still be
+	     later masked off.  */
+	  res_n |= mask;
+	}
     }
-  n->n = n1->n | n2->n;
+  n->n = res_n;
   n->n_ops = n1->n_ops + n2->n_ops;

   return source_stmt;
--- gcc/testsuite/gcc.dg/optimize-bswapsi-7.c.jj	2021-11-25 10:36:03.847529686 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapsi-7.c	2021-11-25 10:35:46.522778192 +0100
@@ -0,0 +1,37 @@
+/* PR tree-optimization/103376 */
+/* { dg-do compile } */
+/* { dg-require-effective-target bswap } */
+/* { dg-options "-O2 -fno-tree-vectorize -fdump-tree-optimized" } */
+/* { dg-additional-options "-march=z900" { target s390-*-* } } */
+
+static unsigned int
+f1 (unsigned int x)
+{
+  return (x << 24) | (x >> 8);
+}
+
+unsigned int
+f2 (unsigned *p)
+{
+  return ((f1 (p[0]) | (p[0] >> 8)) & 0xff00U) | (p[0] >> 24) | ((p[0] & 0xff00U) << 8) | ((p[0] & 0xffU) >> 8);
+}
+
+unsigned int
+f3 (unsigned *p)
+{
+  return ((f1 (p[0]) | (p[0] & 0x00ff00ffU)) & 0xff00ff00U) | (f1 (f1 (f1 (p[0]))) & 0x00ff00ffU);
+}
+
+unsigned int
+f4 (unsigned *p)
+{
+  return (f1 (p[0]) ^ (p[0] >> 8)) ^ (p[0] >> 24) ^ ((p[0] & 0xff00U) << 8) ^ ((p[0] & 0xffU) >> 8);
+}
+
+unsigned int
+f5 (unsigned *p)
+{
+  return (((f1 (p[0]) | (p[0] >> 16)) ^ (p[0] >> 8)) & 0xU) ^ (p[0] >> 24) ^ ((p[0] & 0xff00U) << 8) ^ ((p[0] & 0xffU) >> 8);
+}
+
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 4 "optimized" } } */

	Jakub
Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > Ah I see, sorry I didn't see that rule before, you're right that if
> > this is ordered after it then they can be dropped.
>
> So the patch is OK, possibly with re-ordering the matches.

I've committed the patch as is, because it has been tested that way and
I'd like to avoid dups of that PR flowing in.
Even when not reordered, the new earlier match.pd simplification will
not trigger for the lt le gt ge cases anymore, and the later old
simplifications will trigger; I'd expect that after the latter
simplification the earlier should trigger again because the IL changed,
no?

Tamar, can you handle the reordering together with the testsuite changes
(and perhaps formatting fixes in the tree.c routine)?

	Jakub
[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns
Just a quick ping to check this hasn't been forgotten.

> -----Original Message-----
> From: Joel Hutton
> Sent: 12 November 2021 11:42
> To: Richard Biener
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> Subject: RE: [vect-patterns] Refactor widen_plus/widen_minus as
> internal_fns
>
> > please use #define INCLUDE_MAP before the system.h include instead.
> > Is it really necessary to build a new std::map for each optab lookup?!
> > That looks quite ugly and inefficient.  We'd usually - if necessary at
> > all - build an auto_vec<...> and .sort () and .bsearch () it.
>
> Ok, I'll rework this part.  In the meantime, to address your other comment.
>
> > I'm not sure I understand DEF_INTERNAL_OPTAB_MULTI_FN, neither this
> > cover letter nor the patch ChangeLog explains anything.
>
> I'll attempt to clarify; if this makes things clearer I can include this
> in the commit message of the respun patch:
>
> DEF_INTERNAL_OPTAB_MULTI_FN is like DEF_INTERNAL_OPTAB_FN except it
> provides convenience wrappers for defining conversions that require a
> hi/lo split, like widening and narrowing operations.  Each definition
> for <NAME> will require an optab named <OPTAB> and two other optabs
> that you specify for signed and unsigned.  The hi/lo pair is necessary
> because the widening operations take n narrow elements as inputs and
> return n/2 wide elements as outputs.  The 'lo' operation operates on
> the first n/2 elements of input.  The 'hi' operation operates on the
> second n/2 elements of input.  Defining an internal_fn along with hi/lo
> variations allows a single internal function to be returned from a
> vect_recog function that will later be expanded to hi/lo.
>
> DEF_INTERNAL_OPTAB_MULTI_FN is used in internal-fn.def to register a
> widening internal_fn.  It is defined differently in different places,
> and internal-fn.def is sourced from those places so the parameters
> given can be reused:
> internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the 'expand_' functions for the hi/lo versions of
> the fn.
> internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the
> original and hi/lo variants of the internal_fn.
>
> For example:
> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_addl_hi_<mode>
>                -> (u/s)addl2
>              IFN_VEC_WIDEN_PLUS_LO -> vec_widen_addl_lo_<mode>
>                -> (u/s)addl
>
> This gives the same functionality as the previous
> WIDEN_PLUS/WIDEN_MINUS tree codes, which are expanded into
> VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>
> Let me know if I'm not expressing this clearly.
>
> Thanks,
> Joel
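As a purely illustrative scalar model of the hi/lo semantics described
above (not the actual expansion code; element count chosen for the
example):

  /* Scalar model of VEC_WIDEN_PLUS_LO/_HI for n = 8 narrow elements:
     LO widens and adds elements 0..3, HI elements 4..7.  The real
     optabs operate on vector modes; this is illustration only.  */
  void
  widen_plus_lo (const signed char *a, const signed char *b, short *out)
  {
    for (int i = 0; i < 4; i++)
      out[i] = (short) a[i] + (short) b[i];
  }

  void
  widen_plus_hi (const signed char *a, const signed char *b, short *out)
  {
    for (int i = 0; i < 4; i++)
      out[i] = (short) a[i + 4] + (short) b[i + 4];
  }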
Re: [PATCH] introduce predicate analysis class
On Mon, Aug 30, 2021 at 10:06 PM Martin Sebor via Gcc-patches wrote:
>
> The predicate analysis subset of the tree-ssa-uninit pass isn't
> necessarily specific to the detection of uninitialized reads.
> Suitably parameterized, the same core logic could be used in
> other warning passes to improve their S/N ratio, or issue more
> nuanced diagnostics (e.g., when an invalid access cannot be
> ruled out but also need not in reality be unavoidable, issue
> a "may be invalid" type of warning rather than "is invalid").
>
> Separating the predicate analysis logic from the uninitialized
> pass and exposing a narrow API should also make it easier to
> understand and evolve each part independently of the other,
> or replace one with a better implementation without modifying
> the other.(*)
>
> As the first step in this direction, the attached patch extracts
> the predicate analysis logic out of the pass, turns the interface
> into public class members, and hides the internals in either
> private members or static functions defined in a new source file.
> (**)
>
> The changes should have no externally observable effect (i.e.,
> should cause no changes in warnings), except on the contents of
> the uninitialized dump.  While making the changes I enhanced
> the dumps to help me follow the logic.  Turning some previously
> free-standing functions into members involved changing their
> signatures and adjusting their callers.  While making these
> changes I also renamed some of them as well some variables for
> improved clarity.  Finally, I moved declarations of locals
> closer to their point of initialization.
>
> Tested on x86_64-linux.  Besides the usual bootstrap/regtest
> I also tentatively verified the generality of the new class
> interfaces by making use of it in -Warray-bounds.  Besides there,
> I'd like to make use of it in the new gimple-ssa-warn-access pass
> and, longer term, any other flow-sensitive warnings that might
> benefit from it.

This changed can_chain_union_be_invalidated_p from

  for (size_t i = 0; i < uninit_pred.length (); ++i)
    {
      pred_chain c = uninit_pred[i];
      size_t j;
      for (j = 0; j < c.length (); ++j)
	if (can_one_predicate_be_invalidated_p (c[j], use_guard))
	  break;

      /* If we were unable to invalidate any predicate in C, then there
	 is a viable path from entry to the PHI where the PHI takes
	 an uninitialized value and continues to a use of the PHI.  */
      if (j == c.length ())
	return false;
    }
  return true;

to

  for (unsigned i = 0; i < preds.length (); ++i)
    {
      const pred_chain &chain = preds[i];
      for (unsigned j = 0; j < chain.length (); ++j)
	if (can_be_invalidated_p (chain[j], guard))
	  return true;

      /* If we were unable to invalidate any predicate in C, then there
	 is a viable path from entry to the PHI where the PHI takes
	 an interesting value and continues to a use of the PHI.  */
      return false;
    }
  return true;

which isn't semantically equivalent (it also uses overloading to confuse
me).  In particular the old code checked whether an invalidation can
happen for _each_ predicate chain in 'preds', while the new one just
checks preds[0], so the loop is pointless.  Caught by -Wunreachable-code
complaining about the unreachable ++i.

Martin, was that change intended?

Richard.

> Martin
>
> [*] A review of open -Wuninitialized bugs I did while working
> on this project made me aware of a number of opportunities to
> improve the analyzer to reduce the number of false positives
> -Wmaybe-uninitiailzed suffers from.
>
> [**] The class isn't fully general and, like the uninit pass,
> only works with PHI nodes.  I plan to generalize it to compute
> the set of predicates between any two basic blocks.
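For reference, a sketch of the new-style code restructured to keep the
old semantics, assuming that was the intent (names taken from the
snippets above):

  /* Sketch only: require every chain in PREDS to contain at least one
     invalidatable predicate, as the old code did.  */
  for (unsigned i = 0; i < preds.length (); ++i)
    {
      const pred_chain &chain = preds[i];
      unsigned j;
      for (j = 0; j < chain.length (); ++j)
	if (can_be_invalidated_p (chain[j], guard))
	  break;

      /* No predicate in CHAIN could be invalidated, so there is a
	 viable path through it.  */
      if (j == chain.length ())
	return false;
    }
  return true;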
Re: [PATCH] Fix typo in r12-5486.
On Thu, Nov 25, 2021 at 9:00 AM liuhongt via Gcc-patches wrote:
>
> TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2)) is supposed to
> check an integer type, not a pointer type, so use the second parameter
> instead.
>
> i.e. the first parameter is VPTR, the second parameter is I4:
>
> 582	DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_FETCH_OR_4,
> 583			  "__atomic_fetch_or_4",
> 584			  BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROWCALL_LEAF_LIST)
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Failed testcases in PR are verified.
> Ok for trunk?

OK.

> gcc/ChangeLog:
>
> 	PR middle-end/103419
> 	* match.pd: Fix typo, use the type of second parameter, not
> 	first one.
> ---
>  gcc/match.pd | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5adcd6bd02c..09c7ce749dc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4053,7 +4053,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @0 @1 @3)
>   (bit_and (convert?@3 (SYNC_FETCH_OR_XOR_N @2 INTEGER_CST@0))
> @@ -4064,21 +4064,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @0 @0 @4)
>   (bit_and:c
>    (convert1?@4
>     (ATOMIC_FETCH_OR_XOR_N @2 (nop_convert? (lshift@0 integer_onep@5 @6)) @3))
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)
>
>  (match (nop_atomic_bit_test_and_p @0 @0 @4)
>   (bit_and:c
>    (convert1?@4
>     (SYNC_FETCH_OR_XOR_N @2 (nop_convert? (lshift@0 integer_onep@3 @5
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)
>
>  (match (nop_atomic_bit_test_and_p @0 @1 @3)
>   (bit_and@4 (convert?@3 (ATOMIC_FETCH_AND_N @2 INTEGER_CST@0 @5))
> @@ -4090,7 +4090,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @0 @1 @3)
>   (bit_and@4
> @@ -4103,21 +4103,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      }
>    (if (ibit == ibit2
>         && ibit >= 0
> -       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2))
> +       && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
>
>  (match (nop_atomic_bit_test_and_p @4 @0 @3)
>   (bit_and:c
>    (convert1?@3
>     (ATOMIC_FETCH_AND_N @2 (nop_convert?@4 (bit_not (lshift@0 integer_onep@6 @7))) @5))
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@4)
>
>  (match (nop_atomic_bit_test_and_p @4 @0 @3)
>   (bit_and:c
>    (convert1?@3
>     (SYNC_FETCH_AND_AND_N @2 (nop_convert?@4 (bit_not (lshift@0 integer_onep@6 @7)
>    (convert2? @0))
> - (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@2)
> + (if (TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@4)
>
>  #endif
>
> --
> 2.18.1
>
Re: [PATCH] Loop unswitching: support gswitch statements.
On Wed, Nov 24, 2021 at 9:00 AM Richard Biener wrote: > > On Tue, Nov 23, 2021 at 5:36 PM Martin Liška wrote: > > > > On 11/23/21 16:20, Martin Liška wrote: > > > Sure, so for e.g. case 1 ... 5 we would need to create a new > > > unswitch_predicate > > > with 1 <= index && index <= 5 tree predicate (and the corresponding > > > irange range). > > > Later once we unswitch on it, we should use a special unreachable_flag > > > that will > > > be used for marking of dead edges (similarly how we fold gconds to > > > boolean_{false/true}_node. > > > Does it make sense? > > > > I have thought about it more and it's not enough. What we really want is > > having a irange > > for *each edge* (2 for gconds and multiple for gswitchs). Once we select a > > unswitch_predicate, > > then we need to fold_range in true/false loop all these iranges. Doing that > > we can handle situations like: > > > > if (index < 1) > > do_something1 > > > > if (index > 2) > > do_something2 > > > > switch (index) > > case 1 ... 2: > > do_something; > > ... > > > > as seen the once we unswitch on 'index < 1' and 'index > 2', then the first > > case will be taken in the false_edge > > of 'index > 2' loop unswitching. > > Hmm. I'm not sure it needs to be this complicated. We're basically > evaluating ranges/predicates based > on a fixed set of versioning predicates. Your implementation created > "predicates" for the to be simplified > conditions but in the end we like to evaluate the actual stmt to > figure the taken/not taken edges. IIRC > elsewhere Andrew showed a snipped on how to evaluate a stmt with a > given range - not sure if that > was useful enough. So what I think would be nice if we could somehow > use rangers path query > without an actual CFG. So we virtuall have > > if (versioning-predicate1) > if (versioning-predicate2) >; >else > for (;;) // out current loop > { > ... > if (condition) > ; > ... > switch (var) > { > ... > } > } > > and versioning-predicate1 and versioning-predicate2 are not in the IL. > What we'd like > to do is seed the path query with a "virtual" path through the two > predicates to the > entry of the loop and compute_ranges based on those. Then we like to > use range_of_stmt on 'if (condition)' and 'switch (var)' to determine > not taken edges. Huh, that's an interesting idea. We could definitely adapt path_range_query to work with an artificial sequence of blocks, but it would need some surgery. Off the top of my head: a) The phi handling code looks for specific edges in the path (both for intra path ranges and for relations inherent in PHIs). b) The exported ranges between blocks in the path, probably needs some massaging. c) compute_outgoing_relations would need some work as you mention below... > Looking somewhat at the sources it seems like we "simply" need to do what > compute_outgoing_relations does - unfortunately the code lacks comments > so I have no idea what jt_fur_source src (...).register_outgoing_edges does > ... fur_source is an abstraction for operands to the folding mechanism: // Source of all operands for fold_using_range and gori_compute. // It abstracts out the source of an operand so it can come from a stmt or // and edge or anywhere a derived class of fur_source wants. // The default simply picks up ranges from the current range_query. class fur_source { } When passed to register_outgoing_edges, it registers outgoing relations out of a conditional. I pass it the known outgoing edge out of the conditional, so only the relational on that edge is recorded. 
I have overloaded fur_source into a path-specific jt_fur_source that uses
a path_oracle to register relations as they would occur along a path.
Once register_outgoing_edges is called on each outgoing edge between
blocks in a path, the relations will have been set and can be seen by
range_of_stmt:

path_range_query::range_of_stmt (irange &r, gimple *stmt, tree)
{
  ...
  // If resolving unknowns, fold the statement making use of any
  // relations along the path.
  if (m_resolve)
    {
      fold_using_range f;
      jt_fur_source src (stmt, this, &m_ranger->gori (), m_path);
      if (!f.fold_stmt (r, stmt, src))
	r.set_varying (type);
    }
  ...
}

register_outgoing_edges would probably have to be adjusted for your
CFG-less paths, and maybe the path_oracle (Andrew??).

My apologies.  The jt_fur_source is not only confusingly named after
"jump_thread", but it is also the least obvious part of the solver.
There are some comments for jt_fur_source, but its use could benefit
from better comments throughout.  Let's see if I have some time before
my leave to document things better.

Aldy

> Anyway, for now manually simplifying things is fine but I probably would still
> stick to a basic interface that marks not taken outgoing edges of a stmt based
> on the set of version
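For orientation, a minimal sketch of how a client could drive the path
solver described above.  The entry points (compute_ranges, range_of_stmt)
appear in the snippets in this thread, but the helper itself and the
exact argument types are my illustrative assumptions, not the actual GCC
API:

// Hedged sketch: resolve a conditional at the end of a known path.
// QUERY is assumed set up with a ranger; PATH and IMPORTS are the
// blocks and the interesting SSA names, as in compute_imports.
static bool
cond_folds_on_path_p (path_range_query &query,
		      const vec<basic_block> &path,
		      const bitmap imports, gcond *cond)
{
  // Seed the solver with the path; relations along the path get
  // registered internally via jt_fur_source/register_outgoing_edges.
  query.compute_ranges (path, imports);

  // Fold the conditional with the path-sensitive ranges; a singleton
  // result means one outgoing edge is known not taken.
  int_range_max r;
  return query.range_of_stmt (r, cond) && r.singleton_p ();
}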
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
On 24/11/2021 11:00, Richard Biener wrote:
> On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
>> On 22/11/2021 12:39, Richard Biener wrote:
>>> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
>>> +    {
>>> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>>> +	{
>>> +	  if (dump_enabled_p ())
>>> +	    dump_printf_loc (MSG_NOTE, vect_location,
>>> +			     "* Re-trying analysis with first vector mode"
>>> +			     " %s for epilogue with partial vectors of"
>>> +			     " unrolled first loop.\n",
>>> +			     GET_MODE_NAME (vector_modes[0]));
>>> +	  mode_i = 0;
>>>
>>> and the later done check for bigger VF than main loop - why would
>>> we re-start at 0 rather than at the old mode?  Maybe we want to
>>> remember the iterator value we started at when arriving at the
>>> main loop mode?  So if we analyzed successfully with mode_i == 2,
>>> then successfully at mode_i == 4 which suggested an unroll of 2,
>>> re-start at the mode_i we continued after the mode_i == 2
>>> successful analysis?  To just consider the "simple" case of
>>> AVX vs SSE it IMHO doesn't make much sense to succeed with
>>> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
>>> but get a suggestion of 2 times unroll and then re-try AVX V4DF
>>> just to re-compute that yes, it's worse than SSE V2DF?  You
>>> are probably thinking of SVE vs ADVSIMD here but do we need to
>>> start at 0?  Adding a comment to the code would be nice.
>>>
>>> Thanks,
>> I was indeed thinking SVE vs Advanced SIMD where we end up having to
>> compare different vectorization strategies, which will have different
>> costs depending.  The hypothetical case, as in I don't think I've come
>> across one, is where if we decide to vectorize the main loop for V8QI
>> and unroll 2x, yielding a VF of 16, we may then want to use a
>> predicated VNx16QI epilogue.
> But this isn't the epilogue handling ...
Am I misunderstanding the code here?  To me it looks like this is picking
which mode_i the 'while (1)' loop that does the loop analysis for the
epilogues starts at?
[COMMITTED] path solver: Compute ranges in path in gimple order.
Andrew's patch for PR103254 papered over some underlying performance
issues in the path solver that I'd like to address.

We are currently solving the SSA's defined in the current block in
bitmap order, which amounts to random order for all purposes.  This is
causing unnecessary recursion in gori.  This patch changes the order
to gimple order, thus solving dependencies before uses.

There is no change in threadable paths with this change.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

	PR tree-optimization/103254
	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
	(path_range_query::compute_ranges_in_block): Move to
	compute_ranges_defined.
	* gimple-range-path.h (compute_ranges_defined): New.
---
 gcc/gimple-range-path.cc | 33 ++---
 gcc/gimple-range-path.h  |  1 +
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 4aa666d2c8b..e24086691c4 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
     }
 }
 
+// Compute ranges defined in block.
+
+void
+path_range_query::compute_ranges_defined (basic_block bb)
+{
+  int_range_max r;
+
+  compute_ranges_in_phis (bb);
+
+  // Iterate in gimple order to minimize recursion.
+  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    if (gimple_has_lhs (gsi_stmt (gsi)))
+      {
+	tree name = gimple_get_lhs (gsi_stmt (gsi));
+	if (TREE_CODE (name) == SSA_NAME
+	    && bitmap_bit_p (m_imports, SSA_NAME_VERSION (name))
+	    && range_defined_in_block (r, name, bb))
+	  set_cache (r, name);
+      }
+}
+
 // Compute ranges defined in the current block, or exported to the
 // next block.
 
@@ -423,17 +444,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 	clear_cache (name);
     }
 
-  // Solve imports defined in this block, starting with the PHIs...
-  compute_ranges_in_phis (bb);
-  // ...and then the rest of the imports.
-  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
-    {
-      tree name = ssa_name (i);
-
-      if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
-	  && range_defined_in_block (r, name, bb))
-	set_cache (r, name);
-    }
+  compute_ranges_defined (bb);
 
   if (at_exit ())
     return;
diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
index 57a9ae9bdcd..81c87d475dd 100644
--- a/gcc/gimple-range-path.h
+++ b/gcc/gimple-range-path.h
@@ -58,6 +58,7 @@ private:
   // Methods to compute ranges for the given path.
   bool range_defined_in_block (irange &, tree name, basic_block bb);
   void compute_ranges_in_block (basic_block bb);
+  void compute_ranges_defined (basic_block bb);
   void compute_ranges_in_phis (basic_block bb);
   void adjust_for_non_null_uses (basic_block bb);
   void ssa_range_in_phi (irange &r, gphi *phi);
-- 
2.31.1
[COMMITTED] path solver: Move boolean import code to compute_imports.
In a follow-up patch I will be pruning the set of exported ranges within
blocks to avoid unnecessary work.  In order to do this, all the
interesting SSA names must be in the internal import bitmap ahead of
time.  I had already abstracted them out into compute_imports, but I
missed the boolean code.  This fixes the oversight.

There's a net gain of 25 threadable paths, which is unexpected but
welcome.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

	PR tree-optimization/103254
	* gimple-range-path.cc (path_range_query::compute_ranges): Move
	exported boolean code...
	(path_range_query::compute_imports): ...here.
---
 gcc/gimple-range-path.cc | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index e24086691c4..806bce9ff11 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -537,7 +537,8 @@ void
 path_range_query::compute_imports (bitmap imports, basic_block exit)
 {
   // Start with the imports from the exit block...
-  bitmap r_imports = m_ranger->gori ().imports (exit);
+  gori_compute &gori = m_ranger->gori ();
+  bitmap r_imports = gori.imports (exit);
   bitmap_copy (imports, r_imports);
 
   auto_vec<tree> worklist (bitmap_count_bits (imports));
@@ -579,6 +580,16 @@ path_range_query::compute_imports (bitmap imports, basic_block exit)
 	    }
 	}
     }
+  // Exported booleans along the path, may help conditionals.
+  if (m_resolve)
+    for (i = 0; i < m_path.length (); ++i)
+      {
+	basic_block bb = m_path[i];
+	tree name;
+	FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
+	  if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
+	    bitmap_set_bit (imports, SSA_NAME_VERSION (name));
+      }
 }
 
 // Compute the ranges for IMPORTS along PATH.
@@ -622,18 +633,6 @@ path_range_query::compute_ranges (const vec<basic_block> &path,
     {
       basic_block bb = curr_bb ();
 
-      if (m_resolve)
-	{
-	  gori_compute &gori = m_ranger->gori ();
-	  tree name;
-
-	  // Exported booleans along the path, may help conditionals.
-	  // Add them as interesting imports.
-	  FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
-	    if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
-	      bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
-	}
-
       compute_ranges_in_block (bb);
       adjust_for_non_null_uses (bb);
-- 
2.31.1
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Thu, 25 Nov 2021, Richard Biener wrote:

> On Wed, 24 Nov 2021, Jason Merrill wrote:
>
> > On 11/24/21 11:15, Marek Polacek wrote:
> > > On Wed, Nov 24, 2021 at 04:21:31PM +0100, Richard Biener via Gcc-patches wrote:
> > >> This resurrects -Wunreachable-code and implements a warning for
> > >> trivially unreachable code as of CFG construction.  Most problematic
> > >> with this is the 'return 0;' stmt the C/C++ frontend adds in main,
> > >> which the patch handles for C++ like the C frontend already does,
> > >> by using BUILTINS_LOCATION.
> > >>
> > >> Another problem for future enhancement is that after CFG construction
> > >> we can no longer point to the stmt making a stmt unreachable, so
> > >> this implementation tries to warn on the first unreachable
> > >> statement of a region.  It might be possible to retain a pointer
> > >> to the stmt that triggered creation of a basic-block but I'm not
> > >> sure how reliable that would be.
> > >>
> > >> So this is really a simple attempt for now, triggered by myself
> > >> running into such a coding error.  As always, the perfect is the
> > >> enemy of the good.
> > >>
> > >> It does not pass bootstrap (which enables -Wextra), because of the
> > >> situation in g++.dg/Wunreachable-code-5.C where the C++ frontend
> > >> prematurely elides conditions like if (! GATHER_STATISTICS) that
> > >> evaluate to true - oddly enough it does _not_ do this for
> > >> conditions evaluating to false ... (one of the
> > >> c-c++-common/Wunreachable-code-2.c cases).
> > >
> > > I've taken a look into the C++ thing.  This is genericize_if_stmt:
> > > if we have
> > >
> > >    if (0)
> > >      return;
> > >
> > > then cond is integer_zerop, then_ is a return_expr, but since it has
> > > TREE_SIDE_EFFECTS, we create a COND_EXPR.  For
> > >
> > >    if (!0)
> > >      return;
> > >
> > > we do
> > > 170       else if (integer_nonzerop (cond) && !TREE_SIDE_EFFECTS (else_))
> > > 171         stmt = then_;
> > > which elides the if completely.
> > >
> > > So it seems it would help if we avoided eliding the if stmt if
> > > -Wunreachable-code is in effect.  I'd be happy to make that change,
> > > if it sounds sane.
>
> Yes, that seems to work.
>
> > Sure.
> >
> > Currently the front end does various constant folding as part of
> > genericization, as I recall because there were missed optimizations
> > without it.  Is this particular one undesirable because it's at the
> > statement level rather than within an expression?
>
> It's undesirable because it short-circuits control flow and thus
>
>   if (0)
>     return;
>   foo ();
>
> becomes
>
>   return;
>   foo ();
>
> which looks exactly like a case we want to diagnose (very likely a
> programming error).
>
> So yes, it applies to the statement level and there only to control
> statements.

So another case in GCC is

  if (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)
    ...
  else
    {
      /* Assert that we're only dealing with the PDP11 case.  */
      gcc_assert (!BYTES_BIG_ENDIAN);
      gcc_assert (WORDS_BIG_ENDIAN);
      cpp_define (pfile, "__BYTE_ORDER__=__ORDER_PDP_ENDIAN__");

where that macro expands to

  ((void)(!(!0) ? fancy_abort ("/home/rguenther/src/trunk/gcc/cppbuiltin.c",
			       180, __FUNCTION__), 0 : 0));
  ((void)(!(0) ? fancy_abort ("/home/rguenther/src/trunk/gcc/cppbuiltin.c",
			      181, __FUNCTION__), 0 : 0));
  cpp_define (pfile, "__BYTE_ORDER__=__ORDER_PDP_ENDIAN__");

and the frontend elides the COND_EXPRs, making the cpp_define unreachable.
That's only exposed because we no longer elide the if (1) guarding this
else path ...
Also this is a case where we definitely do not want to diagnose that either the else or the true path is statically unreachable. Richard.
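To make the diagnosed situation concrete, here is a tiny example of the
kind of mistake the resurrected warning is meant to catch (my own
illustration, not from the patch's testsuite):

extern void cleanup (void);

int
f (int *p)
{
  if (!p)
    return -1;
  return *p;
  cleanup ();  /* trivially unreachable at CFG construction; with the
		  proposed -Wunreachable-code this statement would be
		  diagnosed - most likely it was meant to run before
		  the return.  */
}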
Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
On Thu, 25 Nov 2021, Jakub Jelinek wrote:

> On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote:
> > > Ah I see, sorry I didn't see that rule before, you're right that if this
> > > is ordered after it then they can be dropped.
> >
> > So the patch is OK, possibly with re-ordering the matches.
>
> I've committed the patch as is because it has been tested that way and I'd
> like to avoid dups of that PR flowing in.  Even when not reordered, the new
> earlier match.pd simplification will not trigger for the lt le gt ge cases
> anymore and the later old simplifications will trigger, and I'd expect after
> that latter simplification the earlier should trigger again because the IL
> changed, no?

Yes, the result always is re-folded.

> Tamar, can you handle the reordering together with the testsuite changes
> (and perhaps formatting fixes in the tree.c routine)?
Re: [PATCH] ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
>
> gcc/ChangeLog:
>
> 2021-11-23  Martin Jambor
>
>	PR ipa/103227
>	* ipa-prop.h (ipa_get_param): New overload.  Move bits of the existing
>	one to the new one.
>	* ipa-param-manipulation.h (ipa_param_adjustments): New member
>	function get_updated_index_or_split.
>	* ipa-param-manipulation.c
>	(ipa_param_adjustments::get_updated_index_or_split): New function.
>	* ipa-prop.c (adjust_agg_replacement_values): Reimplement, add
>	capability to identify scalarized parameters and perform substitution
>	on them.
>	(ipcp_transform_function): Create descriptors earlier, handle new
>	return values of adjust_agg_replacement_values.
>
> gcc/testsuite/ChangeLog:
>
> 2021-11-23  Martin Jambor
>
>	PR ipa/103227
>	* gcc.dg/ipa/pr103227-1.c: New test.
>	* gcc.dg/ipa/pr103227-3.c: Likewise.
>	* gcc.dg/ipa/pr103227-2.c: Likewise.
>	* gfortran.dg/pr53787.f90: Disable IPA-SRA.
> ---
>  gcc/ipa-param-manipulation.c          | 33 ++++++++++
>  gcc/ipa-param-manipulation.h          |  7 +++
>  gcc/ipa-prop.c                        | 73 +++++++++++-------
>  gcc/ipa-prop.h                        | 15 ++----
>  gcc/testsuite/gcc.dg/ipa/pr103227-1.c | 29 +++++++
>  gcc/testsuite/gcc.dg/ipa/pr103227-2.c | 29 +++++++
>  gcc/testsuite/gcc.dg/ipa/pr103227-3.c | 52 +++++++++
>  gcc/testsuite/gfortran.dg/pr53787.f90 |  2 +-
>  8 files changed, 216 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-3.c
>
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index cec1dba701f..479c20b3871 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -449,6 +449,39 @@ ipa_param_adjustments::get_updated_indices (vec<int> *new_indices)
>      }
>  }
>
> +/* If a parameter with original INDEX has survived intact, return its new
> +   index.  Otherwise return -1.  In that case, if it has been split and there
> +   is a new parameter representing a portion at unit OFFSET for which a value
> +   of a TYPE can be substituted, store its new index into SPLIT_INDEX,
> +   otherwise store -1 there.  */
> +int
> +ipa_param_adjustments::get_updated_index_or_split (int index,
> +						   unsigned unit_offset,
> +						   tree type, int *split_index)
> +{
> +  unsigned adj_len = vec_safe_length (m_adj_params);
> +  for (unsigned i = 0; i < adj_len ; i++)

In ipa-modref I precompute this into a map so we do not need to walk all
params, but the loop is probably not bad since functions do not have tens
of thousands of parameters :)

Can I use it in ipa-modref to discover which parameters were turned from
by-reference to scalar, too?

> +    {
> +      ipa_adjusted_param *apm = &(*m_adj_params)[i];
> +      if (apm->base_index != index)
> +	continue;
> +      if (apm->op == IPA_PARAM_OP_COPY)
> +	return i;
> +      if (apm->op == IPA_PARAM_OP_SPLIT
> +	  && apm->unit_offset == unit_offset)
> +	{
> +	  if (useless_type_conversion_p (apm->type, type))
> +	    *split_index = i;
> +	  else
> +	    *split_index = -1;
> +	  return -1;
> +	}
> +    }
> +
> +  *split_index = -1;
> +  return -1;
> +}
> +
>  /* Return the original index for the given new parameter index.  Return a
>     negative number if not available.  */
>
> diff --git a/gcc/ipa-param-manipulation.h b/gcc/ipa-param-manipulation.h
> index 5adf8a22356..d1dad9fac73 100644
> --- a/gcc/ipa-param-manipulation.h
> +++ b/gcc/ipa-param-manipulation.h
> @@ -236,6 +236,13 @@ public:
>    void get_surviving_params (vec<bool> *surviving_params);
>    /* Fill a vector with new indices of surviving original parameters.  */
>    void get_updated_indices (vec<int> *new_indices);
> +  /* If a parameter with original INDEX has survived intact, return its new
> +     index.  Otherwise return -1.  In that case, if it has been split and
> +     there is a new parameter representing a portion at UNIT_OFFSET for which
> +     a value of a TYPE can be substituted, store its new index into
> +     SPLIT_INDEX, otherwise store -1 there.  */
> +  int get_updated_index_or_split (int index, unsigned unit_offset, tree type,
> +				  int *split_index);
>    /* Return the original index for the given new parameter index.  Return a
>       negative number if not available.  */
>    int get_original_index (int newidx);
>
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index e85df0971fc..a297f50e945 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -5578,32 +5578,55 @@ ipcp_read_transformation_summaries (void)
>  }
>
>  /* Adjust the aggregate replacements in AGGVAL to reflect parameters skipped in
> -   NODE.  */
> +   NODE but also if any parameter was I
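The quoted hunk is cut off above; to make the intended use of the new
member function a bit more concrete, here is a hedged sketch of a caller,
loosely modeled on what adjust_agg_replacement_values is described as
doing.  The helper name and the surrounding logic are my illustration,
not code from the patch:

/* Hedged sketch: map an aggregate replacement at UNIT_OFFSET of the
   pre-IPA-SRA parameter ORIG_INDEX to the new parameter layout
   described by ADJ.  */
static void
remap_agg_replacement (ipa_param_adjustments *adj, int orig_index,
		       unsigned unit_offset, tree type)
{
  int split_idx;
  int new_idx = adj->get_updated_index_or_split (orig_index, unit_offset,
						 type, &split_idx);
  if (new_idx >= 0)
    ;  /* Parameter survived intact: keep the aggregate replacement,
	  rewritten to refer to NEW_IDX.  */
  else if (split_idx >= 0)
    ;  /* The portion was scalarized by IPA-SRA: substitute the known
	  value directly for the new scalar parameter SPLIT_IDX.  */
  else
    ;  /* No usable replacement: drop this aggregate value.  */
}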
[PATCH][DOCS] docs: Add missing @option keyword.
Pushed as obvious.

Martin

gcc/ChangeLog:

	* doc/invoke.texi: Use @option for -Wuninitialized.
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d0ac59768b9..3bddfbaae6a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12117,8 +12117,8 @@
 Initialize automatic variables with either a pattern or with zeroes
 to increase the security and predictability of a program by preventing
 uninitialized memory disclosure and use.
 GCC still considers an automatic variable that doesn't have an explicit
-initializer as uninitialized, -Wuninitialized will still report warning messages
-on such automatic variables.
+initializer as uninitialized, @option{-Wuninitialized} will still report
+warning messages on such automatic variables.
 With this option, GCC will also initialize any padding of automatic variables
 that have structure or union types to zeroes.
-- 
2.33.1
[PATCH] PR middle-end/103406: Check for Inf before simplifying x-x.
This is a simple one-line fix to the regression PR middle-end/103406,
where x - x is being folded to 0.0 even when x is +Inf or -Inf.
In GCC 11 and previously, we'd check whether the type honored NaNs
(which implicitly covered the case where the type honors infinities),
but my patch to test whether the operand could potentially be NaN
failed to also check whether the operand could potentially be Inf.

This patch doesn't address the issue of NaN signedness from binary
arithmetic operations, just the regression.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?

2021-11-25  Roger Sayle

gcc/ChangeLog
	PR middle-end/103406
	* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.

gcc/testsuite/ChangeLog
	PR middle-end/103406
	* gcc.dg/pr103406.c: New test case.

Thanks in advance (and sorry for the inconvenience),
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index f059b47..d28dfe2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -232,7 +232,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    is volatile.  */
 (simplify
  (minus @0 @0)
- (if (!FLOAT_TYPE_P (type) || !tree_expr_maybe_nan_p (@0))
+ (if (!FLOAT_TYPE_P (type)
+      || (!tree_expr_maybe_nan_p (@0)
+	  && !tree_expr_maybe_infinite_p (@0)))
  { build_zero_cst (type); }))
 (simplify
  (pointer_diff @@0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr103406.c b/gcc/testsuite/gcc.dg/pr103406.c
new file mode 100644
index 000..9c7b83b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103406.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define HUGE __DBL_MAX__
+#define INF (HUGE + HUGE)
+#define NAN (INF - INF)
+
+double foo() {
+  double x = -NAN;
+  double y = NAN;
+  return x + y;
+}
+
+/* { dg-final { scan-tree-dump-not "return 0\.0" "optimized" } } */
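For context, the reason the fold is invalid when x may be infinite is
that IEEE arithmetic defines Inf - Inf as NaN, not 0.  A small
illustration of my own (not part of the patch or its testcase):

#include <math.h>

double
sub (double x)
{
  return x - x;  /* must not fold to 0.0 unless x is provably
		    neither NaN nor +-Inf */
}

/* sub (INFINITY) evaluates Inf - Inf, which is NaN, so folding the
   expression to 0.0 would silently change the result.  */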
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
>
> Andrew's patch for PR103254 papered over some underlying
> performance issues in the path solver that I'd like to address.
>
> We are currently solving the SSA's defined in the current block in
> bitmap order, which amounts to random order for all purposes.  This is
> causing unnecessary recursion in gori.  This patch changes the order
> to gimple order, thus solving dependencies before uses.
>
> There is no change in threadable paths with this change.
>
> Tested on x86-64 & ppc64le Linux.
>
> gcc/ChangeLog:
>
>	PR tree-optimization/103254
>	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
>	(path_range_query::compute_ranges_in_block): Move to
>	compute_ranges_defined.
>	* gimple-range-path.h (compute_ranges_defined): New.
> ---
>  gcc/gimple-range-path.cc | 33 ++---
>  gcc/gimple-range-path.h  |  1 +
>  2 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index 4aa666d2c8b..e24086691c4 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>      }
>  }
>
> +// Compute ranges defined in block.
> +
> +void
> +path_range_query::compute_ranges_defined (basic_block bb)
> +{
> +  int_range_max r;
> +
> +  compute_ranges_in_phis (bb);
> +
> +  // Iterate in gimple order to minimize recursion.
> +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))

gsi_next_nondebug (&gsi)?

Of course this all has the extra cost of iterating over a possibly very
large BB for just a few bits in m_imports?  How often does m_imports
have exactly one bit set?

> +    if (gimple_has_lhs (gsi_stmt (gsi)))
> +      {
> +	tree name = gimple_get_lhs (gsi_stmt (gsi));
> +	if (TREE_CODE (name) == SSA_NAME
> +	    && bitmap_bit_p (m_imports, SSA_NAME_VERSION (name))
> +	    && range_defined_in_block (r, name, bb))
> +	  set_cache (r, name);
> +      }

So if you ever handle SSA DEFs in asms then this will not pick them up.
I think it would be more generic to do

  FOR_EACH_SSA_DEF_OPERAND (..., SSA_OP_DEF)

> +}
> +
>  // Compute ranges defined in the current block, or exported to the
>  // next block.
>
> @@ -423,17 +444,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
>  	clear_cache (name);
>      }
>
> -  // Solve imports defined in this block, starting with the PHIs...
> -  compute_ranges_in_phis (bb);
> -  // ...and then the rest of the imports.
> -  EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
> -    {
> -      tree name = ssa_name (i);
> -
> -      if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
> -	  && range_defined_in_block (r, name, bb))
> -	set_cache (r, name);
> -    }
> +  compute_ranges_defined (bb);
>
>    if (at_exit ())
>      return;
> diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
> index 57a9ae9bdcd..81c87d475dd 100644
> --- a/gcc/gimple-range-path.h
> +++ b/gcc/gimple-range-path.h
> @@ -58,6 +58,7 @@ private:
>    // Methods to compute ranges for the given path.
>    bool range_defined_in_block (irange &, tree name, basic_block bb);
>    void compute_ranges_in_block (basic_block bb);
> +  void compute_ranges_defined (basic_block bb);
>    void compute_ranges_in_phis (basic_block bb);
>    void adjust_for_non_null_uses (basic_block bb);
>    void ssa_range_in_phi (irange &r, gphi *phi);
> --
> 2.31.1
>
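For reference, a hedged sketch of the more generic iteration suggested
above; the iterator macro is the one from ssa-iterators.h, but treat the
exact shape here as an approximation rather than a drop-in replacement:

// Sketch: iterate over all statement DEFs (including asm outputs)
// instead of relying on gimple_get_lhs.
ssa_op_iter iter;
def_operand_p def_p;
for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi);
     gsi_next_nondebug (&gsi))
  FOR_EACH_SSA_DEF_OPERAND (def_p, gsi_stmt (gsi), iter, SSA_OP_DEF)
    {
      tree name = DEF_FROM_PTR (def_p);
      if (bitmap_bit_p (m_imports, SSA_NAME_VERSION (name))
	  && range_defined_in_block (r, name, bb))
	set_cache (r, name);
    }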
Re: [PATCH] PR middle-end/103406: Check for Inf before simplifying x-x.
On Thu, Nov 25, 2021 at 12:30 PM Roger Sayle wrote:
>
> This is a simple one-line fix to the regression PR middle-end/103406,
> where x - x is being folded to 0.0 even when x is +Inf or -Inf.
> In GCC 11 and previously, we'd check whether the type honored NaNs
> (which implicitly covered the case where the type honors infinities),
> but my patch to test whether the operand could potentially be NaN
> failed to also check whether the operand could potentially be Inf.
>
> This patch doesn't address the issue of NaN signedness from binary
> arithmetic operations, just the regression.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.

> 2021-11-25  Roger Sayle
>
> gcc/ChangeLog
>	PR middle-end/103406
>	* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.
>
> gcc/testsuite/ChangeLog
>	PR middle-end/103406
>	* gcc.dg/pr103406.c: New test case.
>
> Thanks in advance (and sorry for the inconvenience),
> Roger
> --
>
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 12:57 PM Richard Biener wrote:
>
> On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
> >
> > Andrew's patch for PR103254 papered over some underlying
> > performance issues in the path solver that I'd like to address.
> >
> > We are currently solving the SSA's defined in the current block in
> > bitmap order, which amounts to random order for all purposes.  This is
> > causing unnecessary recursion in gori.  This patch changes the order
> > to gimple order, thus solving dependencies before uses.
> >
> > There is no change in threadable paths with this change.
> >
> > Tested on x86-64 & ppc64le Linux.
> >
> > gcc/ChangeLog:
> >
> >	PR tree-optimization/103254
> >	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
> >	(path_range_query::compute_ranges_in_block): Move to
> >	compute_ranges_defined.
> >	* gimple-range-path.h (compute_ranges_defined): New.
> > ---
> >  gcc/gimple-range-path.cc | 33 ++---
> >  gcc/gimple-range-path.h  |  1 +
> >  2 files changed, 23 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index 4aa666d2c8b..e24086691c4 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> >      }
> >  }
> >
> > +// Compute ranges defined in block.
> > +
> > +void
> > +path_range_query::compute_ranges_defined (basic_block bb)
> > +{
> > +  int_range_max r;
> > +
> > +  compute_ranges_in_phis (bb);
> > +
> > +  // Iterate in gimple order to minimize recursion.
> > +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>
> gsi_next_nondebug (&gsi)?
>
> Of course this all has the extra cost of iterating over a possibly very
> large BB for just a few bits in m_imports?  How often does m_imports
> have exactly one bit set?

Hmmm, good point.

Perhaps this isn't worth it then.  I mean, the underlying bug I'm
tackling is an excess of outgoing edge ranges, not the excess
recursion this patch attacks.

If you think the cost would be high for large ILs, I can revert the patch.

Aldy
[PATCH] Remove dead code and function
The only use of get_alias_symbol is gated by a gcc_unreachable (),
so the following patch gets rid of it.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-11-24  Richard Biener

	* cgraphunit.c (symbol_table::output_weakrefs): Remove unreachable
	init.
	(get_alias_symbol): Remove now unused function.
---
 gcc/cgraphunit.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1e58ffd65e8..3a803a34cbc 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -,17 +,6 @@ ipa_passes (void)
 }
 
 
-/* Return string alias is alias of.  */
-
-static tree
-get_alias_symbol (tree decl)
-{
-  tree alias = lookup_attribute ("alias", DECL_ATTRIBUTES (decl));
-  return get_identifier (TREE_STRING_POINTER
-			 (TREE_VALUE (TREE_VALUE (alias))));
-}
-
-
 /* Weakrefs may be associated to external decls and thus not output
    at expansion time.  Emit all necessary aliases.  */
 
@@ -2259,10 +2248,7 @@ symbol_table::output_weakrefs (void)
 	else if (node->analyzed)
 	  target = DECL_ASSEMBLER_NAME (node->get_alias_target ()->decl);
 	else
-	  {
-	    gcc_unreachable ();
-	    target = get_alias_symbol (node->decl);
-	  }
+	  gcc_unreachable ();
 	do_assemble_alias (node->decl, target);
       }
 }
-- 
2.31.1
[PATCH] Continue RTL verifying in rtl_verify_fallthru
One case used fatal_insn, which does not return; that isn't what was
intended, as can be seen from the err = 1 that follows.  The following
change refactors this to inline the relevant parts of fatal_insn
instead and continue validating the RTL IL.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

2021-11-25  Richard Biener

	* cfgrtl.c (rtl_verify_fallthru): Do not stop verifying
	with fatal_insn.
	(skip_insns_after_block): Remove unreachable break and continue.
---
 gcc/cfgrtl.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index e3a724bddb4..c7ba9006b4e 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -3001,7 +3001,8 @@ rtl_verify_fallthru (void)
 	    {
 	      error ("verify_flow_info: Incorrect fallthru %i->%i",
 		     e->src->index, e->dest->index);
-	      fatal_insn ("wrong insn in the fallthru edge", insn);
+	      error ("wrong insn in the fallthru edge");
+	      debug_rtx (insn);
 	      err = 1;
 	    }
 	}
@@ -3540,10 +3541,8 @@ skip_insns_after_block (basic_block bb)
 	  {
 	  case NOTE_INSN_BLOCK_END:
 	    gcc_unreachable ();
-	    continue;
 	  default:
 	    continue;
-	    break;
 	  }
 	  break;
-- 
2.31.1
[PATCH] Remove never looping loop in label_rtx_for_bb
This refactors the IL "walk" in a way to avoid the loop which will
never iterate.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push later
unless there are comments explaining the function is wrong in other
ways.

Richard.

2021-11-25  Richard Biener

	* cfgexpand.c (label_rtx_for_bb): Remove dead loop construct.
---
 gcc/cfgexpand.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index eb6466f4be6..fb84d469f1e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2461,9 +2461,6 @@ static hash_map<basic_block, rtx_code_label *> *lab_rtx_for_bb;
 static rtx_code_label *
 label_rtx_for_bb (basic_block bb ATTRIBUTE_UNUSED)
 {
-  gimple_stmt_iterator gsi;
-  tree lab;
-
   if (bb->flags & BB_RTL)
     return block_label (bb);
 
@@ -2472,21 +2469,12 @@ label_rtx_for_bb (basic_block bb ATTRIBUTE_UNUSED)
     return *elt;
 
   /* Find the tree label if it is present.  */
-
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    {
-      glabel *lab_stmt;
-
-      lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi));
-      if (!lab_stmt)
-	break;
-
-      lab = gimple_label_label (lab_stmt);
-      if (DECL_NONLOCAL (lab))
-	break;
-
-      return jump_target_rtx (lab);
-    }
+  gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  glabel *lab_stmt;
+  if (!gsi_end_p (gsi)
+      && (lab_stmt = dyn_cast <glabel *> (gsi_stmt (gsi)))
+      && !DECL_NONLOCAL (gimple_label_label (lab_stmt)))
+    return jump_target_rtx (gimple_label_label (lab_stmt));
 
   rtx_code_label *l = gen_label_rtx ();
   lab_rtx_for_bb->put (bb, l);
-- 
2.31.1
[PATCH] Introduce REG_SET_EMPTY_P
This avoids a -Wunreachable-code diagnostic with EXECUTE_IF_* in case
the first iteration will exit the loop.  For the case in thread_jump
using bitmap_empty_p looks preferable, so this adds REG_SET_EMPTY_P to
make that available for register sets.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

2021-11-25  Richard Biener

	* regset.h (REG_SET_EMPTY_P): New macro.
	* cfgcleanup.c (thread_jump): Use REG_SET_EMPTY_P.
---
 gcc/cfgcleanup.c | 3 +--
 gcc/regset.h     | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 82fc505ff50..67ae0597cee 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -256,7 +256,6 @@ thread_jump (edge e, basic_block b)
   unsigned i;
   regset nonequal;
   bool failed = false;
-  reg_set_iterator rsi;
 
   /* Jump threading may cause fixup_partitions to introduce new crossing edges,
      which is not allowed after reload.  */
@@ -379,7 +378,7 @@ thread_jump (edge e, basic_block b)
       goto failed_exit;
     }
 
-  EXECUTE_IF_SET_IN_REG_SET (nonequal, 0, i, rsi)
+  if (!REG_SET_EMPTY_P (nonequal))
     goto failed_exit;
 
   BITMAP_FREE (nonequal);
diff --git a/gcc/regset.h b/gcc/regset.h
index aee6d6f974f..997b4d2d827 100644
--- a/gcc/regset.h
+++ b/gcc/regset.h
@@ -49,6 +49,9 @@ typedef bitmap regset;
 /* Clear a register set by freeing up the linked list.  */
 #define CLEAR_REG_SET(HEAD) bitmap_clear (HEAD)
 
+/* True if the register set is empty.  */
+#define REG_SET_EMPTY_P(HEAD) bitmap_empty_p (HEAD)
+
 /* Copy a register set to another register set.  */
 #define COPY_REG_SET(TO, FROM) bitmap_copy (TO, FROM)
-- 
2.31.1
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 1:10 PM Aldy Hernandez wrote:
>
> On Thu, Nov 25, 2021 at 12:57 PM Richard Biener wrote:
> >
> > On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
> > >
> > > Andrew's patch for PR103254 papered over some underlying
> > > performance issues in the path solver that I'd like to address.
> > >
> > > We are currently solving the SSA's defined in the current block in
> > > bitmap order, which amounts to random order for all purposes.  This is
> > > causing unnecessary recursion in gori.  This patch changes the order
> > > to gimple order, thus solving dependencies before uses.
> > >
> > > There is no change in threadable paths with this change.
> > >
> > > Tested on x86-64 & ppc64le Linux.
> > >
> > > gcc/ChangeLog:
> > >
> > >	PR tree-optimization/103254
> > >	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
> > >	(path_range_query::compute_ranges_in_block): Move to
> > >	compute_ranges_defined.
> > >	* gimple-range-path.h (compute_ranges_defined): New.
> > > ---
> > >  gcc/gimple-range-path.cc | 33 ++---
> > >  gcc/gimple-range-path.h  |  1 +
> > >  2 files changed, 23 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > > index 4aa666d2c8b..e24086691c4 100644
> > > --- a/gcc/gimple-range-path.cc
> > > +++ b/gcc/gimple-range-path.cc
> > > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> > >      }
> > >  }
> > >
> > > +// Compute ranges defined in block.
> > > +
> > > +void
> > > +path_range_query::compute_ranges_defined (basic_block bb)
> > > +{
> > > +  int_range_max r;
> > > +
> > > +  compute_ranges_in_phis (bb);
> > > +
> > > +  // Iterate in gimple order to minimize recursion.
> > > +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >
> > gsi_next_nondebug (&gsi)?
> >
> > Of course this all has the extra cost of iterating over a possibly very
> > large BB for just a few bits in m_imports?  How often does m_imports
> > have exactly one bit set?
>
> Hmmm, good point.
>
> Perhaps this isn't worth it then.  I mean, the underlying bug I'm
> tackling is an excess of outgoing edge ranges, not the excess
> recursion this patch attacks.
>
> If you think the cost would be high for large ILs, I can revert the patch.

I think so.  If ordering is important then that should be achieved in
some other way (always a bit difficult for on-demand infrastructure).

Richard.

> Aldy
>
Re: [PATCH] PR tree-optimization/103359 - Check for equivalences between PHI argument and def.
On Wed, Nov 24, 2021 at 9:49 PM Andrew MacLeod via Gcc-patches wrote:
>
> PHI nodes frequently feed each other, and this is particularly true of
> the one/two incoming edge PHIs inserted by some of the loop analysis
> code which is introduced at the start of the VRP passes.
>
> Ranger has a hybrid of optimistic vs pessimistic evaluation, and when it
> switches to pessimistic, it has to assume VARYING for a range.  PHIs are
> calculated as the union of all incoming edges, so once we throw a
> VARYING into the mix, there's not much chance of going back.  (mostly
> true... we can sometimes update the range when inputs change, but we
> prefer to avoid iterating when possible)
>
> We already have code to recognize that if an argument to a PHI is the
> same as the def, it cannot provide any additional information and is
> skipped.  ie,
>
>    # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
>
> We can skip the h_10 arguments, and produce [1,1][4,4] as the range
> without any additional information/processing.
>
> This patch extends that slightly to recognize that if the argument is a
> known equivalence of the def, it also does not provide any additional
> information.  This allows us to "ignore" some of the pessimistic VARYING
> values that come in on back edges when the relation oracle indicates
> that there is a known equivalence.
>
> Take for instance the sequence from the PR testcase:
>
>    :
>    # h_7 = PHI <4(2), 1(4)>
>
>    :
>    # h_18 = PHI
>
>    :
>    # h_22 = PHI
>
>    :
>    # h_20 = PHI
>
> We only fully calculate one range at a time, so when calculating h_18,
> we need to first resolve the range of h_22 on the back edge 3->6.  That
> feeds back to h_18, which isn't fully calculated yet and is
> pessimistically assumed to be VARYING until we do get a value.  With
> h_22 being varying when resolving h_18 now, we end up making h_18
> varying, and lose the info from h_7.
>
> This patch extends the equivalence observation slightly to recognize
> that if the argument is a known equivalence of the def in the
> predecessor block, it also does not provide any additional information.
> This allows us to ignore some of the pessimistic VARYING values that
> are set when the relation oracle indicates that there is a known
> equivalence.
>
> In the above case, h_22 is known to be equivalent to h_18 in BB3, and
> so we can ignore the range h_22 provides on any edge coming from bb3.
> There is a caveat that if *all* the arguments to a PHI are in the
> equivalence set, then you have to use the range of the equivalence...
> otherwise you get UNDEFINED.
>
> This will help us to see through some of the artifacts of cycling PHIs
> in these simple cases, and in the above case, we end up with h_7, h_18,
> h_22 and h_20 all in the equivalence set with a range of [1, 1][4, 4],
> and we can remove the code we need to, like we did in GCC 11.
>
> This won't help with more complex PHI cycles, but that seems like
> something we could be looking at elsewhere, phi-opt maybe, utilizing
> ranger to set the global range when it's complex.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Thanks,
Richard.

> Andrew
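As a concrete source-level illustration (hypothetical - this is not the
PR's testcase), a loop of the following shape is the kind of code where
loop analysis tends to insert the chain of PHIs discussed above, all of
which end up as equivalences of one another:

/* Illustrative only: h's only real values are 4 and 1; the PHIs the
   loop creates for h are equivalences of each other, so with the patch
   ranger can see the final range [1,1][4,4] through the PHI cycle.  */
int
f (int n)
{
  int h = 4;
  for (int i = 0; i < n; i++)
    if (i & 1)
      h = 1;
  return h;
}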
Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
On Thu, 25 Nov 2021, Andre Vieira (lists) wrote:

> On 24/11/2021 11:00, Richard Biener wrote:
> > On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
> >
> >> On 22/11/2021 12:39, Richard Biener wrote:
> >>> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
> >>> +    {
> >>> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >>> +	{
> >>> +	  if (dump_enabled_p ())
> >>> +	    dump_printf_loc (MSG_NOTE, vect_location,
> >>> +			     "* Re-trying analysis with first vector mode"
> >>> +			     " %s for epilogue with partial vectors of"
> >>> +			     " unrolled first loop.\n",
> >>> +			     GET_MODE_NAME (vector_modes[0]));
> >>> +	  mode_i = 0;
> >>>
> >>> and the later done check for bigger VF than main loop - why would
> >>> we re-start at 0 rather than at the old mode?  Maybe we want to
> >>> remember the iterator value we started at when arriving at the
> >>> main loop mode?  So if we analyzed successfully with mode_i == 2,
> >>> then successfully at mode_i == 4 which suggested an unroll of 2,
> >>> re-start at the mode_i we continued after the mode_i == 2
> >>> successful analysis?  To just consider the "simple" case of
> >>> AVX vs SSE it IMHO doesn't make much sense to succeed with
> >>> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
> >>> but get a suggestion of 2 times unroll and then re-try AVX V4DF
> >>> just to re-compute that yes, it's worse than SSE V2DF?  You
> >>> are probably thinking of SVE vs ADVSIMD here but do we need to
> >>> start at 0?  Adding a comment to the code would be nice.
> >>>
> >>> Thanks,
> >> I was indeed thinking SVE vs Advanced SIMD where we end up having to
> >> compare different vectorization strategies, which will have different
> >> costs depending.  The hypothetical case, as in I don't think I've come
> >> across one, is where if we decide to vectorize the main loop for V8QI
> >> and unroll 2x, yielding a VF of 16, we may then want to use a
> >> predicated VNx16QI epilogue.
> > But this isn't the epilogue handling ...
> Am I misunderstanding the code here?  To me it looks like this is picking
> which mode_i the 'while (1)' loop that does the loop analysis for the
> epilogues starts at?

Oops, my fault, yes, it does.

I would suggest to refactor things so that the mode_i = first_loop_i
case is there only once.  I also wonder if the argument about starting
at 0 doesn't apply to the not-unrolled
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P case as well?  So what's the
reason to differ here?  So in the end I'd just change the existing

  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
    {

to

  if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
      || first_loop_vinfo->suggested_unroll_factor > 1)
    {

and maybe revisit this when we have an actual testcase showing that
doing sth else has a positive effect?

Thanks,
Richard.
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
On Thu, Nov 25, 2021 at 1:38 PM Richard Biener wrote:
>
> On Thu, Nov 25, 2021 at 1:10 PM Aldy Hernandez wrote:
> >
> > On Thu, Nov 25, 2021 at 12:57 PM Richard Biener wrote:
> > >
> > > On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches wrote:
> > > >
> > > > Andrew's patch for PR103254 papered over some underlying
> > > > performance issues in the path solver that I'd like to address.
> > > >
> > > > We are currently solving the SSA's defined in the current block in
> > > > bitmap order, which amounts to random order for all purposes.  This is
> > > > causing unnecessary recursion in gori.  This patch changes the order
> > > > to gimple order, thus solving dependencies before uses.
> > > >
> > > > There is no change in threadable paths with this change.
> > > >
> > > > Tested on x86-64 & ppc64le Linux.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >	PR tree-optimization/103254
> > > >	* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
> > > >	(path_range_query::compute_ranges_in_block): Move to
> > > >	compute_ranges_defined.
> > > >	* gimple-range-path.h (compute_ranges_defined): New.
> > > > ---
> > > >  gcc/gimple-range-path.cc | 33 ++---
> > > >  gcc/gimple-range-path.h  |  1 +
> > > >  2 files changed, 23 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > > > index 4aa666d2c8b..e24086691c4 100644
> > > > --- a/gcc/gimple-range-path.cc
> > > > +++ b/gcc/gimple-range-path.cc
> > > > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
> > > >      }
> > > >  }
> > > >
> > > > +// Compute ranges defined in block.
> > > > +
> > > > +void
> > > > +path_range_query::compute_ranges_defined (basic_block bb)
> > > > +{
> > > > +  int_range_max r;
> > > > +
> > > > +  compute_ranges_in_phis (bb);
> > > > +
> > > > +  // Iterate in gimple order to minimize recursion.
> > > > +  for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > >
> > > gsi_next_nondebug (&gsi)?
> > >
> > > Of course this all has the extra cost of iterating over a possibly very
> > > large BB for just a few bits in m_imports?  How often does m_imports
> > > have exactly one bit set?
> >
> > Hmmm, good point.
> >
> > Perhaps this isn't worth it then.  I mean, the underlying bug I'm
> > tackling is an excess of outgoing edge ranges, not the excess
> > recursion this patch attacks.
> >
> > If you think the cost would be high for large ILs, I can revert the patch.
>
> I think so.  If ordering is important then that should be achieved in
> some other way (always a bit difficult for on-demand infrastructure).

Nah, this isn't a correctness issue.  It's not worth it.  I will revert
the patch.

Thanks.
Aldy
Re: [PATCH] Remove dead code and function
> The only use of get_alias_symbol is gated by a gcc_unreachable (),
> so the following patch gets rid of it.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

OK, thanks!

Honza
Do not check gimple_call_chain in tree-ssa-alias
Hi,
this patch removes the gimple_call_chain check in ref_maybe_used_by_call_p
that disables the CONST function handling.  I suppose it was meant to allow
consts to read variables from the static chain, but this is not what other
places do.  The testcase:

int
main()
{
  int a = 0;
  __attribute__ ((noinline,const))
  int reta ()
  {
    return a;
  }
  int val = reta();
  a = 1;
  return val + reta ();
}

gets optimized to a single call of reta since at least gcc 4.1.

LTO bootstrapped and regtested x86_64-linux all languages.  OK?

	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check
	gimple_call_chain when handling const functions.

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index cd6a0b2f67b..3c253e2843f 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2743,9 +2743,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
   unsigned i;
   int flags = gimple_call_flags (call);
 
-  /* Const functions without a static chain do not implicitly use memory.  */
-  if (!gimple_call_chain (call)
-      && (flags & (ECF_CONST|ECF_NOVOPS)))
+  if (flags & (ECF_CONST|ECF_NOVOPS))
     goto process_args;
 
   /* A call that is not without side-effects might involve volatile
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
Hello,

On Thu, 25 Nov 2021, Richard Biener wrote:

> > Yes, that's definitely the case - I was too lazy to re-use the old
> > option name here.  But I don't have a good name at hand, maybe clang
> > has an option covering the cases I'm thinking about.

As you asked: I already have difficulties describing the exact semantics
of the warning in sentences, so I don't find a good name either :-)

> > Btw, the diagnostic spotted qsort_chk doing
> >
> >   if (CMP (i1, i2))
> >     break;
> >   else if (CMP (i2, i1))
> >     return ERR2 (i1, i2);
> >
> > where ERR2 expands to a call to a noreturn void "returning"
> > qsort_chk_error, so the 'return' stmt is not reachable.  Not exactly
> > a bug but somewhat difficult to avoid the diagnostic for.  I suppose
> > the pointless 'return' is to make it more visible that the loop
> > terminates here (albeit we don't return normally).

Tough one.  You could also disable the warning when the fallthrough
doesn't exist because of a non-returning call.  If it's supposed to find
obvious programming mistakes it might make sense to regard all function
calls the same, like they look, i.e. as function calls that can return.
Or it might make sense to not do that for programmers who happen to know
about non-returning functions. :-/

> It also finds this strange code in label_rtx_for_bb:

So the warning is definitely useful!

> indeed the loop looks pointless.  Unless the DECL_NONLOCAL case was
> meant to continue;

It's been like that since it was introduced in 2007.  It's an invariant
that DECL_NONLOCAL labels are first in a BB and are not followed by
normal labels, so a 'continue' wouldn't change anything; the loop is
useless.


Ciao,
Michael.
[PATCH] [RFC] unreachable returns
We have quite a number of "default" returns that cannot be reached.
One is particularly interesting since it says (see patch below):

    default:
      gcc_unreachable ();
    }
  /* We can get here with --disable-checking.  */
  return false;

which suggests that _maybe_ the intention was to have the
gcc_unreachable () which expands to __builtin_unreachable ()
with --disable-checking and thus a fallthru to "somewhere"
be caught with a "sane" default return value rather than
falling through to the next function or so.  BUT - that
isn't what actually happens since the 'return false' is
unreachable after CFG construction and will be elided.
In fact the IL after CFG construction is exactly the same
with and without the spurious return.

Now, I wonder if we should, instead of expanding gcc_unreachable
to __builtin_unreachable () with --disable-checking, expand it
to __builtin_trap () (or remove the --disable-checking variant
completely, always retaining assert-level checking but maybe
make it cheaper in size by using __builtin_trap () or abort ())

Thoughts?

That said, I do have a set of changes removing such spurious
returns.

2021-11-25  Richard Biener

gcc/c/
	* c-typeck.c (c_tree_equal): Remove unreachable return.
---
 gcc/c/c-typeck.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index b71358e1821..7524304f2bd 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -15984,8 +15984,6 @@ c_tree_equal (tree t1, tree t2)
     default:
       gcc_unreachable ();
     }
-  /* We can get here with --disable-checking.  */
-  return false;
 }
 
 /* Returns true when the function declaration FNDECL is implicit,
-- 
2.31.1
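To make the alternative concrete, a sketch of the shape the macro change
could take.  The real definition lives in system.h and is keyed off the
checking macros; treat this as an assumption about one possible variant,
not the actual proposal's code:

/* Hedged sketch.  With checking enabled, gcc_unreachable () aborts
   with location information as it does today; the idea floated above
   is to trap instead of expanding to __builtin_unreachable () when
   checking is disabled, so a "can't happen" fallthru stops the
   compiler rather than running into undefined behavior.  */
#if CHECKING_P
#define gcc_unreachable() \
  (fancy_abort (__FILE__, __LINE__, __FUNCTION__))
#else
#define gcc_unreachable() (__builtin_trap ())
#endif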
Re: [PATCH] [RFC] unreachable returns
> We have quite a number of "default" returns that cannot be reached.
> One is particularly interesting since it says (see patch below):
>
>     default:
>       gcc_unreachable ();
>     }
>   /* We can get here with --disable-checking.  */
>   return false;
>
> which suggests that _maybe_ the intention was to have the
> gcc_unreachable () which expands to __builtin_unreachable ()
> with --disable-checking and thus a fallthru to "somewhere"
> be caught with a "sane" default return value rather than
> falling through to the next function or so.  BUT - that
> isn't what actually happens since the 'return false' is
> unreachable after CFG construction and will be elided.

I think this is just a remnant of the times when we did not have
__builtin_unreachable.  I like the idea of removing the redundant code.

Honza
[PATCH] Remove unreachable gcc_unreachable () at the end of functions
It seems to be a style to place gcc_unreachable () after a switch that
handles all cases, with every case returning.  Those are unreachable
(well, yes!), so they will be elided at CFG construction time and the
middle-end will place another __builtin_unreachable "after" them to
note the path doesn't lead to a return when the function is not
declared void.

So IMHO those explicit gcc_unreachable () serve no purpose; they could
be replaced by a comment.  But with switches that cover all cases, not
handling a case or not returning will likely cause some diagnostic to
be emitted, which is better than running into an ICE only at runtime.

Bootstrapped and tested on x86_64-unknown-linux-gnu - any comments?

Thanks,
Richard.

2021-11-24  Richard Biener

	* tree.h (reverse_storage_order_for_component_p): Remove
	spurious gcc_unreachable.
	* cfganal.c (dfs_find_deadend): Likewise.
	* fold-const-call.c (fold_const_logb): Likewise.
	(fold_const_significand): Likewise.
	* gimple-ssa-store-merging.c (lhs_valid_for_store_merging_p):
	Likewise.

gcc/c-family/
	* c-format.c (check_format_string): Remove spurious
	gcc_unreachable.
---
 gcc/c-family/c-format.c        | 2 --
 gcc/cfganal.c                  | 2 --
 gcc/fold-const-call.c          | 2 --
 gcc/gimple-ssa-store-merging.c | 2 --
 gcc/tree.h                     | 2 --
 5 files changed, 10 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index e735e092043..617fb5ea626 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -296,8 +296,6 @@ check_format_string (const_tree fntype, unsigned HOST_WIDE_INT format_num,
       *no_add_attrs = true;
       return false;
     }
-
-  gcc_unreachable ();
 }
 
 /* Under the control of FLAGS, verify EXPR is a valid constant that
diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index 0cba612738d..48598e55c01 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -752,8 +752,6 @@ dfs_find_deadend (basic_block bb)
 	  next = e ? e->dest : EDGE_SUCC (bb, 0)->dest;
 	}
     }
-
-  gcc_unreachable ();
 }
 
 diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index d6cb9b11a31..c542e780a18 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -429,7 +429,6 @@ fold_const_logb (real_value *result, const real_value *arg,
 	}
       return false;
     }
-  gcc_unreachable ();
 }
 
 /* Try to evaluate:
@@ -463,7 +462,6 @@ fold_const_significand (real_value *result, const real_value *arg,
 	}
       return false;
     }
-  gcc_unreachable ();
 }
 
 /* Try to evaluate:
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index e7c90ba8b59..13413ca4cd6 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -4861,8 +4861,6 @@ lhs_valid_for_store_merging_p (tree lhs)
     default:
       return false;
     }
-
-  gcc_unreachable ();
 }
 
 /* Return true if the tree RHS is a constant we want to consider
diff --git a/gcc/tree.h b/gcc/tree.h
index f0e72b55abe..094501bd9b1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5110,8 +5110,6 @@ reverse_storage_order_for_component_p (tree t)
     default:
       return false;
     }
-
-  gcc_unreachable ();
 }
 
 /* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
-- 
2.31.1
Re: Do not check gimple_call_chain in tree-ssa-alias
On Thu, 25 Nov 2021, Jan Hubicka wrote:

> Hi,
> this patch removes the gimple_call_chain check in ref_maybe_used_by_call_p
> that disables the CONST function handling.  I suppose it was meant to allow
> consts to read variables from the static chain, but this is not what other
> places do.  The testcase:
>
> int
> main()
> {
>   int a = 0;
>   __attribute__ ((noinline,const))
>   int reta ()
>   {
>     return a;
>   }
>   int val = reta();
>   a = 1;
>   return val + reta ();
> }
>
> gets optimized to a single call of reta since at least gcc 4.1.
>
> LTO bootstrapped and regtested x86_64-linux all languages.  OK?

I suppose at some point it broke.  But yes, I agree, thus OK.

Thanks,
Richard.

> 	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check
> 	gimple_call_chain when handling const functions.
>
> diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
> index cd6a0b2f67b..3c253e2843f 100644
> --- a/gcc/tree-ssa-alias.c
> +++ b/gcc/tree-ssa-alias.c
> @@ -2743,9 +2743,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, bool tbaa_p)
>    unsigned i;
>    int flags = gimple_call_flags (call);
>
> -  /* Const functions without a static chain do not implicitly use memory.  */
> -  if (!gimple_call_chain (call)
> -      && (flags & (ECF_CONST|ECF_NOVOPS)))
> +  if (flags & (ECF_CONST|ECF_NOVOPS))
>     goto process_args;
>
>    /* A call that is not without side-effects might involve volatile

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
Re: [PATCH] PR tree-optimization/103359 - Check for equivalences between PHI argument and def.
On 11/25/21 07:40, Richard Biener wrote:
> On Wed, Nov 24, 2021 at 9:49 PM Andrew MacLeod via Gcc-patches wrote:
> >
> > PHI nodes frequently feed each other, and this is particularly true of
> > the one/two incoming edge PHIs inserted by some of the loop analysis
> > code which is introduced at the start of the VRP passes.
> >
> > Ranger has a hybrid of optimistic vs pessimistic evaluation, and when
> > it switches to pessimistic, it has to assume VARYING for a range.
> > PHIs are calculated as the union of all incoming edges, so once we
> > throw a VARYING into the mix, there's not much chance of going back.
> > (mostly true... we can sometimes update the range when inputs change,
> > but we prefer to avoid iterating when possible)
> >
> > We already have code to recognize that if an argument to a PHI is the
> > same as the def, it cannot provide any additional information and is
> > skipped.  ie,
> >
> >    # h_10 = PHI <4(2), h_10(3), h_10(4), 1(7)>
> >
> > We can skip the h_10 arguments, and produce [1,1][4,4] as the range
> > without any additional information/processing.
> >
> > This patch extends that slightly to recognize that if the argument is
> > a known equivalence of the def, it also does not provide any
> > additional information.  This allows us to "ignore" some of the
> > pessimistic VARYING values that come in on back edges when the
> > relation oracle indicates that there is a known equivalence.
> >
> > Take for instance the sequence from the PR testcase:
> >
> >    :
> >    # h_7 = PHI <4(2), 1(4)>
> >
> >    :
> >    # h_18 = PHI
> >
> >    :
> >    # h_22 = PHI
> >
> >    :
> >    # h_20 = PHI
> >
> > We only fully calculate one range at a time, so when calculating h_18,
> > we need to first resolve the range of h_22 on the back edge 3->6.
> > That feeds back to h_18, which isn't fully calculated yet and is
> > pessimistically assumed to be VARYING until we do get a value.  With
> > h_22 being varying when resolving h_18 now, we end up making h_18
> > varying, and lose the info from h_7.
> >
> > This patch extends the equivalence observation slightly to recognize
> > that if the argument is a known equivalence of the def in the
> > predecessor block, it also does not provide any additional
> > information.  This allows us to ignore some of the pessimistic VARYING
> > values that are set when the relation oracle indicates that there is a
> > known equivalence.
> >
> > In the above case, h_22 is known to be equivalent to h_18 in BB3, and
> > so we can ignore the range h_22 provides on any edge coming from bb3.
> > There is a caveat that if *all* the arguments to a PHI are in the
> > equivalence set, then you have to use the range of the equivalence...
> > otherwise you get UNDEFINED.
> >
> > This will help us to see through some of the artifacts of cycling PHIs
> > in these simple cases, and in the above case, we end up with h_7,
> > h_18, h_22 and h_20 all in the equivalence set with a range of
> > [1, 1][4, 4], and we can remove the code we need to, like we did in
> > GCC 11.
> >
> > This won't help with more complex PHI cycles, but that seems like
> > something we could be looking at elsewhere, phi-opt maybe, utilizing
> > ranger to set the global range when it's complex.
> >
> > Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
>
> OK.
>
> Thanks,
> Richard.

Committed.
Re: [PATCH][RFC] middle-end/46476 - resurrect -Wunreachable-code
On Thu, 25 Nov 2021, Michael Matz wrote: > Hello, > > On Thu, 25 Nov 2021, Richard Biener wrote: > > > > Yes, that's definitely the case - I was too lazy to re-use the old > > > option name here. But I don't have a good name at hand, maybe clang > > > has an option covering the cases I'm thinking about. > > As you asked: I already have difficulties describing the exact semantics > of the warning in sentences, so I don't find a good name either :-) It diagnoses some cases of unreachable code so -Wunreachable-code sounded like the obvious fit :P But names can create (wrong) expectation ... clang has -Wunreachable-code{,-aggressive,-break,-fallthrough,-loop-increment,-return} but documentation is very sparse, -break and -return are what -aggressive enables. > > > Btw, the diagnostic spotted qsort_chk doing > > > > > > if (CMP (i1, i2)) > > > break; > > > else if (CMP (i2, i1)) > > > return ERR2 (i1, i2); > > > > > > where ERR2 expands to a call to a noreturn void "returning" > > > qsort_chk_error, so the 'return' stmt is not reachable. Not exactly > > > a bug but somewhat difficult to avoid the diagnostic for. I suppose > > > the pointless 'return' is to make it more visible that the loop > > > terminates here (albeit we don't return normally). > > Tough one. You could also disable the warning when the fallthrough > doesn't exist because of a non-returning call. If it's supposed to find > obvious programming mistakes it might make sense to regard all function > calls the same, like they look, i.e. as function calls that can return. > Or it might make sense to not do that for programmers who happen to know > about non-returning functions. :-/ > > > It also finds this strange code in label_rtx_for_bb: > > So the warning is definitely useful! Yep, also found some more real issues. But I'm not managing to get it clean in a bootstrap due to some remaining issues with early folding exposing unreachable code following gcc_assert()s. Richard.
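To make the qsort_chk shape concrete, here is a hedged, self-contained sketch of the pattern that trips the diagnostic (die () is an invented stand-in for the noreturn qsort_chk_error):

extern void die (void) __attribute__ ((noreturn));

int
chk (const int *a, int n)
{
  for (int i = 0; i + 1 < n; i++)
    {
      if (a[i] < a[i + 1])
        break;
      else if (a[i + 1] < a[i])
        {
          die ();
          return -1;  /* Unreachable: die () never returns.  The return
                         only makes the loop exit visible to readers.  */
        }
    }
  return 0;
}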
Re: [AArch64] Enable generation of FRINTNZ instructions
On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It seems vectorizable_call expects all arguments to have the same type, which doesn't work with the workaround of passing the integer type as an operand. We already special case some IFNs there (masked load/store and gather) to ignore some args, so that would just add to this set. Richard. Hi, Reworked it to add support for the new IFN to the vectorizer. I was initially trying to make vectorizable_call and vectorizable_internal_function handle IFNs with different inputs more generically, using the information we have in the _direct structs regarding what operands to get the modes from. Unfortunately, that wasn't straightforward because of how vectorizable_call assumes operands have the same type and uses the type of the DEF_STMT_INFO of the non-constant operands (either output operand or non-constant inputs) to determine the type of constants. I assume there is some reason why we use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on the argument types. That is why I ended up with this sort of half-way mix of both, which still allows room to add more IFNs that don't take inputs of the same type, but require adding a bit of special casing similar to the IFN_FTRUNC_INT and masking ones. Bootstrapped on aarch64-none-linux. OK for trunk? gcc/ChangeLog: * config/aarch64/aarch64.md (ftrunc2): New pattern. * config/aarch64/iterators.md (FRINTNZ): New iterator. (frintnz_mode): New int attribute. (VSFDF): Make iterator conditional. * internal-fn.def (FTRUNC_INT): New IFN. * internal-fn.c (ftrunc_int_direct): New define. (expand_ftrunc_int_optab_fn): New custom expander. (direct_ftrunc_int_optab_supported_p): New supported_p. * match.pd: Add to the existing TRUNC pattern match. * optabs.def (ftrunc_int): New entry. * stor-layout.h (element_precision): Moved from here... * tree.h (element_precision): ... to here. (element_type): New declaration. * tree.c (element_type): New function. (element_precision): Changed to use element_type. * tree-vect-stmts.c (vectorizable_internal_function): Add support for IFNs with different input types. (vectorizable_call): Teach to handle IFN_FTRUNC_INT. * doc/md.texi: New entry for ftrunc pattern name. * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. gcc/testsuite/ChangeLog: * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz instruction available. * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target. * gcc.target/aarch64/frintnz.c: New test. 
* gcc.target/aarch64/frintnz_vec.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4035e061706793849c68ae09bcb2e4b9580ab7b6..c5c60e7a810e22b0ea9ed6bf056ddd6431d60269 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7345,12 +7345,18 @@ (define_insn "despeculate_simpleti" (set_attr "speculation_barrier" "true")] ) +(define_expand "ftrunc2" + [(set (match_operand:VSFDF 0 "register_operand" "=w") +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] + FRINTNZ))] + "TARGET_FRINT" +) + (define_insn "aarch64_" [(set (match_operand:VSFDF 0 "register_operand" "=w") (unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")] FRINTNZX))] - "TARGET_FRINT && TARGET_FLOAT - && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)" + "TARGET_FRINT" "\\t%0, %1" [(set_attr "type" "f_rint")] ) diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST") SF DF]) ;; Scalar and vetor modes for SF, DF. -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF]) +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD") + (V4SF "TARGET_SIMD") + (V2DF "TARGET_SIMD") + (DF "TARGET_FLOAT") + (SF "TARGET_FLOAT")]) ;; Advanced SIMD single Float modes. (define_mode_iterator VDQSF [V2SF V4SF]) @@ -3067,6 +3071,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X UNSPEC_FRINT64Z UNSPEC_FRINT64X]) +(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z]) + (define_int_iter
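For readers unfamiliar with the instructions involved, a hedged sketch of the source pattern the new ftrunc optab targets (invented function name; the flags and codegen claims are illustrative only, e.g. FEAT_FRINTTS as enabled by -march=armv8.5-a):

/* Truncate toward zero via a 32-bit integer round trip.  With the new
   pattern this can become a single frint32z instead of an fcvtzs +
   scvtf pair; FRINT64Z covers a 64-bit intermediate range.  */
double
trunc_via_int32 (double x)
{
  return (double) (int) x;
}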
[PATCH] d: fix ASAN in option processing
Fixes: ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0666ca5c at pc 0x00ef094b bp 0x7fff8180 sp 0x7fff8178 READ of size 4 at 0x0666ca5c thread T0 #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855 #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916 #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887 #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829 #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427 #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346 #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967 #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808 #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146 for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d. Ready for master? Thanks, Martin gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options. --- gcc/d/d-attribs.cc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc index d81b7d122f7..1ec800526f7 100644 --- a/gcc/d/d-attribs.cc +++ b/gcc/d/d-attribs.cc @@ -852,7 +852,9 @@ parse_optimize_options (tree args) unsigned j = 1; for (unsigned i = 1; i < decoded_options_count; ++i) { - if (! (cl_options[decoded_options[i].opt_index].flags & CL_OPTIMIZATION)) + unsigned opt_index = decoded_options[i].opt_index; + if (opt_index >= cl_options_count + || ! (cl_options[opt_index].flags & CL_OPTIMIZATION)) { ret = false; warning (OPT_Wattributes, -- 2.34.0
Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions
Hello, On Thu, 25 Nov 2021, Richard Biener via Gcc-patches wrote: > It seems to be a style to place gcc_unreachable () after a > switch that handles all cases with every case returning. > Those are unreachable (well, yes!), so they will be elided > at CFG construction time and the middle-end will place > another __builtin_unreachable "after" them to note the > path doesn't lead to a return when the function is not declared > void. > > So IMHO those explicit gcc_unreachable () serve no purpose, > they could be replaced by a comment. Never document in comments what you can document in code (IMO). I think the code as-is clearly documents the invariants and expectations and removing the gcc_unreachable() leads to worse sources. Can't you simply exempt the warning from unreachable __builtin_unreachable()? It seems an obvious thing that the warning should _not_ warn about; after all, quite clearly, the author is aware of that being unreachable, it says so, right there. Ciao, Michael.
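For reference, a minimal sketch of the style under discussion (invented names; the #define is a stand-in for GCC's internal macro, which actually expands to fancy_abort rather than __builtin_unreachable, as discussed below):

#define gcc_unreachable() __builtin_unreachable ()  /* stand-in only */

enum dir { DIR_UP, DIR_DOWN };

static int
sign_of (enum dir d)
{
  switch (d)
    {
    case DIR_UP:
      return 1;
    case DIR_DOWN:
      return -1;
    }
  gcc_unreachable ();  /* Every case returns; this documents the invariant.  */
}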
[COMMITTED] PR tree-optimization/102648 - Add the testcase for this PR to the testsuite.
Various ranger-enabled passes, such as threading or VRP2, resolve this now. I'm adding the test case before closing. Committed as obvious. Andrew commit 1598bd47b2a4a5f12b5a987d16d82634644db4b6 Author: Andrew MacLeod Date: Thu Nov 25 08:58:19 2021 -0500 Add the testcase for this PR to the testsuite. Various ranger-enabled passes like threading and VRP2 can do this now, so add the testcase for posterity. gcc/testsuite/ PR tree-optimization/102648 * gcc.dg/pr102648.c: New. diff --git a/gcc/testsuite/gcc.dg/pr102648.c b/gcc/testsuite/gcc.dg/pr102648.c new file mode 100644 index 000..a0f6386dde3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr102648.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +void foo(); +static char a, c; +static int d, e; +static short b(short f, short g) { return f * g; } +int main() { + short h = 4; + for (; d;) +if (h) + if(e) { +if (!b(a & 1 | h, 3)) + c = 0; +h = 1; + } + if (c) +foo(); +} + +/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
[PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
Hi Jakub, This is a rebased/slightly bug-fixed version of several previously posted patch series, all in one place for ease of reference. The series should be applied on top of Chung-Lin's two patches: "Improve OpenMP target support for C++ [PR92120 v5]" https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584602.html "Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)" https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584994.html And supersedes the following three patch series: "Topological sort for OpenMP 5.0 base pointers" https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577211.html "OpenMP: Deep struct dereferences" https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580721.html "Parsing of lvalues for "map" clauses for C and C++" https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584445.html Tested with offloading to NVPTX and bootstrapped. Further commentary on individual patches. OK? Thanks, Julian Julian Brown (16): Rewrite GOMP_MAP_ATTACH_DETACH mappings unconditionally OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset OpenACC/OpenMP: Refactor struct lowering in gimplify.c OpenACC: Rework indirect struct handling in gimplify.c Remove base_ind/base_ref handling from extract_base_bit_offset OpenMP 5.0: Clause ordering for OpenMP 5.0 (topological sorting by base pointer) Remove omp_target_reorder_clauses OpenMP/OpenACC: Hoist struct sibling list handling in gimplification OpenMP: Allow array ref components for C & C++ OpenMP: Fix non-zero attach/detach bias for struct dereferences OpenMP: Handle reference-typed struct members OpenACC: Make deep-copy-arrayofstruct.c a libgomp/runtime test Add debug_omp_expr OpenMP: Add inspector class to unify mapped address analysis OpenMP: lvalue parsing for map clauses (C++) OpenMP: lvalue parsing for map clauses (C) gcc/c-family/c-common.h | 45 + gcc/c-family/c-omp.c | 210 ++ gcc/c/c-parser.c | 150 +- gcc/c/c-tree.h|1 + gcc/c/c-typeck.c | 250 +- gcc/cp/error.c|9 + gcc/cp/parser.c | 141 +- gcc/cp/parser.h |3 + gcc/cp/semantics.c| 290 +- gcc/fortran/trans-openmp.c| 20 +- gcc/gimplify.c| 2458 +++-- gcc/omp-low.c | 23 +- gcc/testsuite/c-c++-common/gomp/map-1.c |3 +- gcc/testsuite/c-c++-common/gomp/map-6.c |6 +- gcc/testsuite/g++.dg/goacc/member-array-acc.C | 13 + gcc/testsuite/g++.dg/gomp/ind-base-3.C| 38 + gcc/testsuite/g++.dg/gomp/map-assignment-1.C | 12 + gcc/testsuite/g++.dg/gomp/map-inc-1.C | 10 + gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C | 19 + gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C | 36 + gcc/testsuite/g++.dg/gomp/map-ptrmem-2.C | 39 + .../g++.dg/gomp/map-static-cast-lvalue-1.C| 17 + gcc/testsuite/g++.dg/gomp/map-ternary-1.C | 20 + gcc/testsuite/g++.dg/gomp/member-array-2.C| 86 + gcc/testsuite/g++.dg/gomp/member-array-omp.C | 13 + gcc/testsuite/g++.dg/gomp/pr67522.C |2 +- gcc/testsuite/g++.dg/gomp/target-3.C |4 +- gcc/testsuite/g++.dg/gomp/target-lambda-1.C |6 +- gcc/testsuite/g++.dg/gomp/target-this-2.C |2 +- gcc/testsuite/g++.dg/gomp/target-this-3.C |4 +- gcc/testsuite/g++.dg/gomp/target-this-4.C |4 +- .../g++.dg/gomp/unmappable-component-1.C | 21 + gcc/tree-pretty-print.c | 45 + gcc/tree-pretty-print.h |1 + gcc/tree.def |3 + libgomp/testsuite/libgomp.c++/baseptrs-3.C| 275 ++ libgomp/testsuite/libgomp.c++/ind-base-1.C| 162 ++ libgomp/testsuite/libgomp.c++/ind-base-2.C| 49 + libgomp/testsuite/libgomp.c++/map-comma-1.C | 15 + .../testsuite/libgomp.c++/map-rvalue-ref-1.C | 22 + .../testsuite/libgomp.c++/member-array-1.C| 89 + 
libgomp/testsuite/libgomp.c++/struct-ref-1.C | 97 + .../libgomp.c-c++-common/baseptrs-1.c | 50 + .../libgomp.c-c++-common/baseptrs-2.c | 70 + .../libgomp.c-c++-common/ind-base-4.c | 50 + .../libgomp.c-c++-common/unary-ptr-1.c| 16 + .../testsuite/libgomp.oacc-c++/deep-copy-17.C | 101 + .../libgomp.oacc-c-c++-common/deep-copy-15.c | 68 + .../libgomp.oacc-c-c++-common/deep-copy-16.c | 231 ++ .../deep-copy-arrayofstruct.c |2 +- 50 files changed, 4114 insertions(+), 1187 deletions(-) create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C create mode 100644 gcc/testsuite/g++.dg/gomp/ind-base-3.C cr
[PATCH 01/16] Rewrite GOMP_MAP_ATTACH_DETACH mappings unconditionally
It never makes sense for a GOMP_MAP_ATTACH_DETACH mapping to survive beyond gimplify.c, so this patch rewrites such mappings to GOMP_MAP_ATTACH or GOMP_MAP_DETACH unconditionally (rather than checking for a list of types of OpenACC or OpenMP constructs), in cases where it hasn't otherwise been done already in the preceding code. Previously posted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570399.html https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571711.html (og11) OK? Thanks, Julian 2021-06-02 Julian Brown gcc/ * gimplify.c (gimplify_scan_omp_clauses): Simplify condition for changing GOMP_MAP_ATTACH_DETACH to GOMP_MAP_ATTACH or GOMP_MAP_DETACH. --- gcc/gimplify.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 4cd62270a10..8d8735ae4c1 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9965,15 +9965,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, skip_map_struct: ; } - else if ((code == OACC_ENTER_DATA - || code == OACC_EXIT_DATA - || code == OACC_DATA - || code == OACC_PARALLEL - || code == OACC_KERNELS - || code == OACC_SERIAL - || code == OMP_TARGET_ENTER_DATA - || code == OMP_TARGET_EXIT_DATA) - && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH) + else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH) { gomp_map_kind k = ((code == OACC_EXIT_DATA || code == OMP_TARGET_EXIT_DATA) -- 2.29.2
[PATCH 02/16] OpenMP/OpenACC: Move array_ref/indirect_ref handling code out of extract_base_bit_offset
This patch slightly cleans up the semantics of extract_base_bit_offset, in that the stripping of ARRAY_REFS/INDIRECT_REFS out of extract_base_bit_offset is moved back into the (two) call sites of the function. This is done in preparation for follow-on patches that extend the function. Previously posted for the og11 branch here (patch & reversion/rework): https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571712.html https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571884.html OK? Thanks, Julian 2021-06-03 Julian Brown gcc/ * gimplify.c (extract_base_bit_offset): Don't look through ARRAY_REFs or INDIRECT_REFs here. (build_struct_group): Reinstate previous behaviour for handling ARRAY_REFs/INDIRECT_REFs. --- gcc/gimplify.c | 59 +- 1 file changed, 29 insertions(+), 30 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 8d8735ae4c1..1baea68920b 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8675,31 +8675,7 @@ extract_base_bit_offset (tree base, tree *base_ref, poly_int64 *bitposp, poly_offset_int poffset; if (base_ref) -{ - *base_ref = NULL_TREE; - - while (TREE_CODE (base) == ARRAY_REF) - base = TREE_OPERAND (base, 0); - - if (TREE_CODE (base) == INDIRECT_REF) - base = TREE_OPERAND (base, 0); -} - else -{ - if (TREE_CODE (base) == ARRAY_REF) - { - while (TREE_CODE (base) == ARRAY_REF) - base = TREE_OPERAND (base, 0); - if (TREE_CODE (base) != COMPONENT_REF - || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE) - return NULL_TREE; - } - else if (TREE_CODE (base) == INDIRECT_REF - && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) - == REFERENCE_TYPE)) - base = TREE_OPERAND (base, 0); -} +*base_ref = NULL_TREE; base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep); @@ -9673,12 +9649,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, poly_offset_int offset1; poly_int64 bitpos1; tree tree_offset1; - tree base_ref; + tree base_ref, ocd = OMP_CLAUSE_DECL (c); - tree base - = extract_base_bit_offset (OMP_CLAUSE_DECL (c), &base_ref, - &bitpos1, &offset1, - &tree_offset1); + while (TREE_CODE (ocd) == ARRAY_REF) + ocd = TREE_OPERAND (ocd, 0); + + if (TREE_CODE (ocd) == INDIRECT_REF) + ocd = TREE_OPERAND (ocd, 0); + + tree base = extract_base_bit_offset (ocd, &base_ref, + &bitpos1, &offset1, + &tree_offset1); bool do_map_struct = (base == decl && !tree_offset1); @@ -9871,6 +9852,24 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, poly_offset_int offsetn; poly_int64 bitposn; tree tree_offsetn; + + if (TREE_CODE (sc_decl) == ARRAY_REF) + { + while (TREE_CODE (sc_decl) == ARRAY_REF) + sc_decl = TREE_OPERAND (sc_decl, 0); + if (TREE_CODE (sc_decl) != COMPONENT_REF + || (TREE_CODE (TREE_TYPE (sc_decl)) + != ARRAY_TYPE)) + break; + } + else if (TREE_CODE (sc_decl) == INDIRECT_REF +&& (TREE_CODE (TREE_OPERAND (sc_decl, 0)) +== COMPONENT_REF) +&& (TREE_CODE (TREE_TYPE + (TREE_OPERAND (sc_decl, 0))) +== REFERENCE_TYPE)) + sc_decl = TREE_OPERAND (sc_decl, 0); + tree base = extract_base_bit_offset (sc_decl, NULL, &bitposn, &offsetn, -- 2.29.2
[PATCH 03/16] OpenACC/OpenMP: Refactor struct lowering in gimplify.c
(Previously submitted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570398.html) This patch is a second attempt at refactoring struct component mapping handling for OpenACC/OpenMP during gimplification, after the patch I posted here: https://gcc.gnu.org/pipermail/gcc-patches/2018-November/510503.html And improved here, post-review: https://gcc.gnu.org/pipermail/gcc-patches/2019-November/533394.html This patch goes further, in that the struct-handling code is outlined into its own function (to create the "GOMP_MAP_STRUCT" node and the sorted list of nodes immediately following it, from a set of mappings of components of a given struct or derived type). I've also gone through the list-handling code and attempted to add comments documenting how it works to the best of my understanding, and broken out a couple of helper functions in order to (hopefully) have the code self-document better also. OK? Thanks, Julian 2021-06-02 Julian Brown gcc/ * gimplify.c (insert_struct_comp_map): Refactor function into... (build_struct_comp_nodes): This new function. Remove list handling and improve self-documentation. (insert_node_after, move_node_after, move_nodes_after, move_concat_nodes_after): New helper functions. (build_struct_group): New function to build up GOMP_MAP_STRUCT node groups to map struct components. Outlined from... (gimplify_scan_omp_clauses): Here. Call above function. --- gcc/gimplify.c | 976 +++-- 1 file changed, 611 insertions(+), 365 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 1baea68920b..c5e058d6d1f 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8588,73 +8588,66 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p) return 1; } -/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a - GOMP_MAP_STRUCT mapping. C is an always_pointer mapping. STRUCT_NODE is - the struct node to insert the new mapping after (when the struct node is - initially created). PREV_NODE is the first of two or three mappings for a - pointer, and is either: - - the node before C, when a pair of mappings is used, e.g. for a C/C++ - array section. - - not the node before C. This is true when we have a reference-to-pointer - type (with a mapping for the reference and for the pointer), or for - Fortran derived-type mappings with a GOMP_MAP_TO_PSET. - If SCP is non-null, the new node is inserted before *SCP. - if SCP is null, the new node is inserted before PREV_NODE. - The return type is: - - PREV_NODE, if SCP is non-null. - - The newly-created ALLOC or RELEASE node, if SCP is null. - - The second newly-created ALLOC or RELEASE node, if we are mapping a - reference to a pointer. */ +/* For a set of mappings describing an array section pointed to by a struct + (or derived type, etc.) component, create an "alloc" or "release" node to + insert into a list following a GOMP_MAP_STRUCT node. For some types of + mapping (e.g. Fortran arrays with descriptors), an additional mapping may + be created that is inserted into the list of mapping nodes attached to the + directive being processed -- not part of the sorted list of nodes after + GOMP_MAP_STRUCT. + + CODE is the code of the directive being processed. GRP_START and GRP_END + are the first and last of two or three nodes representing this array section + mapping (e.g. a data movement node like GOMP_MAP_{TO,FROM}, optionally a + GOMP_MAP_TO_PSET, and finally a GOMP_MAP_ALWAYS_POINTER). EXTRA_NODE is + filled with the additional node described above, if needed. + + This function does not add the new nodes to any lists itself. 
It is the + responsibility of the caller to do that. */ static tree -insert_struct_comp_map (enum tree_code code, tree c, tree struct_node, - tree prev_node, tree *scp) +build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, +tree *extra_node) { enum gomp_map_kind mkind = (code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA) ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC; - tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - tree cl = scp ? prev_node : c2; + gcc_assert (grp_start != grp_end); + + tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP); OMP_CLAUSE_SET_MAP_KIND (c2, mkind); - OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c)); - OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node; - if (OMP_CLAUSE_CHAIN (prev_node) != c - && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP - && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node)) - == GOMP_MAP_TO_PSET)) -OMP_CLAUSE_SIZE (c2) = OMP_CLAUSE_SIZE (OMP_CLAUSE_CHAIN (prev_node)); + OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end)); + OMP_CLAUSE_CHAIN (c2)
[PATCH 04/16] OpenACC: Rework indirect struct handling in gimplify.c
(Previously posted here: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570400.html) This patch reworks indirect struct handling in gimplify.c (i.e. for struct components mapped with "mystruct->a[0:n]", "mystruct->b", etc.), for OpenACC. The key observation leading to these changes was that component mappings of references-to-structures is already implemented and working, and indirect struct component handling via a pointer can work quite similarly. That lets us remove some earlier, special-case handling for mapping indirect struct component accesses for OpenACC, which required the pointed-to struct to be manually mapped before the indirect component mapping. With this patch, you can map struct components directly (e.g. an array slice "mystruct->a[0:n]") just like you can map a non-indirect struct component slice ("mystruct.a[0:n]"). Both references-to-pointers (with the former syntax) and references to structs (with the latter syntax) work now. For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the class metadata (the structure that points to the class data and vptr) -- it is instead treated as any other struct. For C++, the struct handling also works for class members ("this->foo"), without having to explicitly map "this[:1]" first. For OpenACC, we permit chained indirect component references ("mystruct->a->b[0:n]"), though only the last part of such mappings will trigger an attach/detach operation. To properly use such a construct on the target, you must still manually map "mystruct->a[:1]" first -- but there's no need to map "mystruct[:1]" explicitly before that. This version of the patch avoids altering code paths for OpenMP, where possible. (Those are dealt with by later patches in this series.) OK? Thanks, Julian 2021-06-02 Julian Brown gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET mappings for class metadata, nor GOMP_MAP_POINTER mappings for POINTER_TYPE_P decls. gcc/ * gimplify.c (extract_base_bit_offset): Add BASE_IND and OPENMP parameters. Handle pointer-typed indirect references for OpenACC alongside reference-typed ones. (strip_components_and_deref, aggregate_base_p): New functions. (build_struct_group): Add pointer type indirect ref handling, including chained references, for OpenACC. Also handle references to structs for OpenACC. Conditionalise bits for OpenMP only where appropriate. (gimplify_scan_omp_clauses): Rework pointer-type indirect structure access handling to work more like the reference-typed handling for OpenACC only. * omp-low.c (scan_sharing_clauses): Handle pointer-type indirect struct references, and references to pointers to structs also. gcc/testsuite/ * g++.dg/goacc/member-array-acc.C: New test. * g++.dg/gomp/member-array-omp.C: New test. libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test. * testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test. * testsuite/libgomp.oacc-c++/deep-copy-17.C: New test. 
--- gcc/fortran/trans-openmp.c| 20 +- gcc/gimplify.c| 214 +--- gcc/omp-low.c | 16 +- gcc/testsuite/g++.dg/goacc/member-array-acc.C | 13 + gcc/testsuite/g++.dg/gomp/member-array-omp.C | 13 + .../testsuite/libgomp.oacc-c++/deep-copy-17.C | 101 .../libgomp.oacc-c-c++-common/deep-copy-15.c | 68 ++ .../libgomp.oacc-c-c++-common/deep-copy-16.c | 231 ++ 8 files changed, 618 insertions(+), 58 deletions(-) create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C create mode 100644 gcc/testsuite/g++.dg/gomp/member-array-omp.C create mode 100644 libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 7d761e90dd7..508e02306e9 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -3034,30 +3034,16 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, tree present = gfc_omp_check_optional_argument (decl, true); if (openacc && n->sym->ts.type == BT_CLASS) { - tree type = TREE_TYPE (decl); if (n->sym->attr.optional) sorry ("optional class parameter"); - if (POINTER_TYPE_P (type)) - { - node4 = build_omp_clause (input_location, - OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER); - OMP_CLAUSE_DECL (node4)
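As a hedged illustration (invented code, not from the testsuite) of what the rework permits for OpenACC: an array slice can be mapped through a pointer-to-struct directly, with the enclosing struct no longer needing an explicit map first:

struct s { int *a; };

void
f (struct s *mystruct, int n)
{
  /* Before this rework, OpenACC required mapping mystruct[:1] by hand
     before attaching mystruct->a; now the slice maps directly.  */
#pragma acc parallel loop copy(mystruct->a[0:n])
  for (int i = 0; i < n; i++)
    mystruct->a[i] += 1;
}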
Re: [PATCH] Remove unreachable gcc_unreachable () at the end of functions
On Thu, 25 Nov 2021, Michael Matz wrote: > Hello, > > On Thu, 25 Nov 2021, Richard Biener via Gcc-patches wrote: > > > It seems to be a style to place gcc_unreachable () after a > > switch that handles all cases with every case returning. > > Those are unreachable (well, yes!), so they will be elided > > at CFG construction time and the middle-end will place > > another __builtin_unreachable "after" them to note the > > path doesn't lead to a return when the function is not declared > > void. > > > > So IMHO those explicit gcc_unreachable () serve no purpose, > > if they could be replaced by a comment. > > Never document in comments what you can document in code (IMO). I think > the code as-is clearly documents the invariants and expectations and > removing the gcc_unreachable() leads to worse sources. > > Can't you simply exempt warning on unreachable __builtin_unreachable()? > It seems an obvious thing that the warning should _not_ warn about, after > all, quite clearly, the author is aware of that being unreachable, it says > so, right there. gcc_unreachable () is not actually __builtin_unreachable () but instead fancy_abort (__FILE__, __LINE__, __FUNCTION__). Yes, I agree that the warning shouldn't warn about "this is unrechable", but if it's not plain __builtin_unreachable () then we'd need a new function attribute on it which in this particular case means an alternate "fancy_abort" since in general fancy_aborts are of course reachable. We could also handle all noreturn calls this way and not diagnose those if they are unreachable in exchange for some false negatives. Btw, I don't agree with "Never document in comments what you can document in code" in this case, but I take it as a hint that removing gcc_unreachable in those cases should at least leave a comment in there? Richard.
[PATCH 05/16] Remove base_ind/base_ref handling from extract_base_bit_offset
In preparation for follow-up patches extending struct dereference handling for OpenMP, this patch removes base_ind/base_ref handling from gimplify.c:extract_base_bit_offset. This arguably simplifies some of the code around the callers of the function also, though subsequent patches modify those parts further. (This one has already been approved, pending approval of the rest of the series: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581426.html) 2021-09-29 Julian Brown gcc/ * gimplify.c (extract_base_bit_offset): Remove BASE_IND, BASE_REF and OPENMP parameters. (strip_indirections): New function. (build_struct_group): Update calls to extract_base_bit_offset. Rearrange indirect/reference handling accordingly. Use extracted base instead of passed-in decl when grouping component accesses together. --- gcc/gimplify.c | 109 ++--- 1 file changed, 57 insertions(+), 52 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index fcc278d07cf..73b839daa09 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8658,9 +8658,8 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, has array type, else return NULL. */ static tree -extract_base_bit_offset (tree base, tree *base_ind, tree *base_ref, -poly_int64 *bitposp, poly_offset_int *poffsetp, -tree *offsetp, bool openmp) +extract_base_bit_offset (tree base, poly_int64 *bitposp, +poly_offset_int *poffsetp, tree *offsetp) { tree offset; poly_int64 bitsize, bitpos; @@ -8668,38 +8667,12 @@ extract_base_bit_offset (tree base, tree *base_ind, tree *base_ref, int unsignedp, reversep, volatilep = 0; poly_offset_int poffset; - if (base_ind) -*base_ind = NULL_TREE; - - if (base_ref) -*base_ref = NULL_TREE; + STRIP_NOPS (base); base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode, &unsignedp, &reversep, &volatilep); - if (!openmp - && (TREE_CODE (base) == INDIRECT_REF - || (TREE_CODE (base) == MEM_REF - && integer_zerop (TREE_OPERAND (base, 1 - && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == POINTER_TYPE) -{ - if (base_ind) - *base_ind = base; - base = TREE_OPERAND (base, 0); -} - if ((TREE_CODE (base) == INDIRECT_REF - || (TREE_CODE (base) == MEM_REF - && integer_zerop (TREE_OPERAND (base, 1 - && DECL_P (TREE_OPERAND (base, 0)) - && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == REFERENCE_TYPE) -{ - if (base_ref) - *base_ref = base; - base = TREE_OPERAND (base, 0); -} - - if (!openmp) -STRIP_NOPS (base); + STRIP_NOPS (base); if (offset && poly_int_tree_p (offset)) { @@ -8756,6 +8729,17 @@ strip_components_and_deref (tree expr) return expr; } +static tree +strip_indirections (tree expr) +{ + while (TREE_CODE (expr) == INDIRECT_REF +|| (TREE_CODE (expr) == MEM_REF +&& integer_zerop (TREE_OPERAND (expr, 1 +expr = TREE_OPERAND (expr, 0); + + return expr; +} + /* Return TRUE if EXPR is something we will use as the base of an aggregate access, either: @@ -9249,7 +9233,7 @@ build_struct_group (struct gimplify_omp_ctx *ctx, { poly_offset_int coffset; poly_int64 cbitpos; - tree base_ind, base_ref, tree_coffset; + tree tree_coffset; tree ocd = OMP_CLAUSE_DECL (c); bool openmp = !(region_type & ORT_ACC); @@ -9259,10 +9243,25 @@ build_struct_group (struct gimplify_omp_ctx *ctx, if (TREE_CODE (ocd) == INDIRECT_REF) ocd = TREE_OPERAND (ocd, 0); - tree base = extract_base_bit_offset (ocd, &base_ind, &base_ref, &cbitpos, - &coffset, &tree_coffset, openmp); + tree base = extract_base_bit_offset (ocd, &cbitpos, &coffset, &tree_coffset); + tree sbase; - bool do_map_struct = (base == decl && !tree_coffset); + if (openmp) 
+{ + if (TREE_CODE (base) == INDIRECT_REF + && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == REFERENCE_TYPE) + sbase = strip_indirections (base); + else + sbase = base; +} + else +{ + sbase = strip_indirections (base); + + STRIP_NOPS (sbase); +} + + bool do_map_struct = (sbase == decl && !tree_coffset); /* Here, DECL is usually a DECL_P, unless we have chained indirect member accesses, e.g. mystruct->a->b. In that case it'll be the "mystruct->a" @@ -9322,19 +9321,12 @@ build_struct_group (struct gimplify_omp_ctx *ctx, OMP_CLAUSE_SET_MAP_KIND (l, k); - if (!openmp && base_ind) - OMP_CLAUSE_DECL (l) = unshare_expr (base_ind); - else if (base_ref) - OMP_CLAUSE_DECL (l) = unshare_expr (base_ref); - else - { - OMP_CLAUSE_DECL (l) = unshare_expr (decl); - if (openmp - && !DECL_P (OMP_CLAUSE_DECL (l)) -
[PATCH 06/16] OpenMP 5.0: Clause ordering for OpenMP 5.0 (topological sorting by base pointer)
This patch reimplements the omp_target_reorder_clauses function in anticipation of supporting "deeper" struct mappings (that is, with several structure dereference operators, or similar). The idea is that in place of the (possibly quadratic) algorithm in omp_target_reorder_clauses that greedily moves clauses containing addresses that are subexpressions of other addresses before those other addresses, we employ a topological sort algorithm to calculate a proper order for map clauses. This should run in linear time, and hopefully handles degenerate cases where multiple "levels" of indirect accesses are present on a given directive. The new method also takes care to keep clause groups together, addressing the concerns raised in: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570501.html To figure out if some given clause depends on a base pointer in another clause, we strip off the outer layers of the address expression, and check (via a tree_operand_hash hash table we have built) if the result is a "base pointer" as defined in OpenMP 5.0 (1.2.6 Data Terminology). There are some subtleties involved, however: - We must treat MEM_REF with zero offset the same as INDIRECT_REF. This should probably be fixed in the front ends instead so we always use a canonical form (probably INDIRECT_REF). The following patch shows one instance of the problem, but there may be others: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571382.html - Mapping a whole struct implies mapping each of that struct's elements, which may be base pointers. Because those base pointers aren't necessarily explicitly referenced in the directive in question, we treat the whole-struct mapping as a dependency instead. This version of the patch fixes a bug in omp_reorder_mapping_groups, relative to the last version posted. OK? Thanks, Julian 2021-11-23 Julian Brown gcc/ * gimplify.c (is_or_contains_p, omp_target_reorder_clauses): Delete functions. (omp_tsort_mark): Add enum. (omp_mapping_group): Add struct. (debug_mapping_group, omp_get_base_pointer, omp_get_attachment, omp_group_last, omp_gather_mapping_groups, omp_group_base, omp_index_mapping_groups, omp_containing_struct, omp_tsort_mapping_groups_1, omp_tsort_mapping_groups, omp_segregate_mapping_groups, omp_reorder_mapping_groups): New functions. (gimplify_scan_omp_clauses): Call above functions instead of omp_target_reorder_clauses, unless we've seen an error. * omp-low.c (scan_sharing_clauses): Avoid strict test if we haven't sorted mapping groups. gcc/testsuite/ * g++.dg/gomp/target-lambda-1.C: Adjust expected output. * g++.dg/gomp/target-this-3.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. --- gcc/gimplify.c | 807 +++- gcc/omp-low.c | 7 +- gcc/testsuite/g++.dg/gomp/target-lambda-1.C | 6 +- gcc/testsuite/g++.dg/gomp/target-this-3.C | 4 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 4 +- 5 files changed, 791 insertions(+), 37 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 73b839daa09..6778fb25e45 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8692,29 +8692,6 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp, return base; } -/* Returns true if EXPR is or contains (as a sub-component) BASE_PTR. 
*/ - -static bool -is_or_contains_p (tree expr, tree base_ptr) -{ - if ((TREE_CODE (expr) == INDIRECT_REF && TREE_CODE (base_ptr) == MEM_REF) - || (TREE_CODE (expr) == MEM_REF && TREE_CODE (base_ptr) == INDIRECT_REF)) -return operand_equal_p (TREE_OPERAND (expr, 0), - TREE_OPERAND (base_ptr, 0)); - while (!operand_equal_p (expr, base_ptr)) -{ - if (TREE_CODE (base_ptr) == COMPOUND_EXPR) - base_ptr = TREE_OPERAND (base_ptr, 1); - if (TREE_CODE (base_ptr) == COMPONENT_REF - || TREE_CODE (base_ptr) == POINTER_PLUS_EXPR - || TREE_CODE (base_ptr) == SAVE_EXPR) - base_ptr = TREE_OPERAND (base_ptr, 0); - else - break; -} - return operand_equal_p (expr, base_ptr); -} - /* Remove COMPONENT_REFS and indirections from EXPR. */ static tree @@ -8768,6 +8745,7 @@ aggregate_base_p (tree expr) return false; } +#if 0 /* Implement OpenMP 5.x map ordering rules for target directives. There are several rules, and with some level of ambiguity, hopefully we can at least collect the complexity here in one place. */ @@ -8947,6 +8925,761 @@ omp_target_reorder_clauses (tree *list_p) } } } +#endif + + +enum omp_tsort_mark { + UNVISITED, + TEMPORARY, + PERMANENT +}; + +struct omp_mapping_group { + tree *grp_start; + tree grp_end; + omp_tsort_mark mark; + struct omp_mapping_group *sibling; + struct omp_mapping_group *next; +}; + +__attribute__((used)) static void +debug_ma
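For reference, the UNVISITED/TEMPORARY/PERMANENT marks above implement the classic three-state depth-first topological sort; a simplified, self-contained sketch follows (one dependency per node, unlike the real gimplify.c groups):

#include <stdbool.h>

enum mark { UNVISITED, TEMPORARY, PERMANENT };

struct node
{
  struct node *dep;   /* the group this node's base pointer depends on */
  enum mark mark;
};

/* Post-order DFS: dependencies are emitted before their dependents.
   Meeting a TEMPORARY node means the walk re-entered the current DFS
   stack, i.e. a dependency cycle, which must be rejected.  */
static bool
visit (struct node *n, struct node **out, int *pos)
{
  if (n->mark == PERMANENT)
    return true;
  if (n->mark == TEMPORARY)
    return false;               /* cycle detected */
  n->mark = TEMPORARY;
  if (n->dep && !visit (n->dep, out, pos))
    return false;
  n->mark = PERMANENT;
  out[(*pos)++] = n;
  return true;
}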
[PATCH 07/16] Remove omp_target_reorder_clauses
This patch has been split out from the previous one to avoid a confusingly-interleaved diff. The two patches should probably be committed squashed together. 2021-10-01 Julian Brown gcc/ * gimplify.c (omp_target_reorder_clauses): Delete. --- gcc/gimplify.c | 183 - 1 file changed, 183 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 6778fb25e45..fb923f05314 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8745,189 +8745,6 @@ aggregate_base_p (tree expr) return false; } -#if 0 -/* Implement OpenMP 5.x map ordering rules for target directives. There are - several rules, and with some level of ambiguity, hopefully we can at least - collect the complexity here in one place. */ - -static void -omp_target_reorder_clauses (tree *list_p) -{ - /* Collect refs to alloc/release/delete maps. */ - auto_vec ard; - tree *cp = list_p; - while (*cp != NULL_TREE) -if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP - && (OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ALLOC - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_RELEASE - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_DELETE)) - { - /* Unlink cp and push to ard. */ - tree c = *cp; - tree nc = OMP_CLAUSE_CHAIN (c); - *cp = nc; - ard.safe_push (c); - - /* Any associated pointer type maps should also move along. */ - while (*cp != NULL_TREE - && OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP - && (OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_FIRSTPRIVATE_REFERENCE - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_FIRSTPRIVATE_POINTER - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ATTACH_DETACH - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_POINTER - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_ALWAYS_POINTER - || OMP_CLAUSE_MAP_KIND (*cp) == GOMP_MAP_TO_PSET)) - { - c = *cp; - nc = OMP_CLAUSE_CHAIN (c); - *cp = nc; - ard.safe_push (c); - } - } -else - cp = &OMP_CLAUSE_CHAIN (*cp); - - /* Link alloc/release/delete maps to the end of list. */ - for (unsigned int i = 0; i < ard.length (); i++) -{ - *cp = ard[i]; - cp = &OMP_CLAUSE_CHAIN (ard[i]); -} - *cp = NULL_TREE; - - /* OpenMP 5.0 requires that pointer variables are mapped before - its use as a base-pointer. */ - auto_vec atf; - for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp)) -if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP) - { - /* Collect alloc, to, from, to/from clause tree pointers. */ - gomp_map_kind k = OMP_CLAUSE_MAP_KIND (*cp); - if (k == GOMP_MAP_ALLOC - || k == GOMP_MAP_TO - || k == GOMP_MAP_FROM - || k == GOMP_MAP_TOFROM - || k == GOMP_MAP_ALWAYS_TO - || k == GOMP_MAP_ALWAYS_FROM - || k == GOMP_MAP_ALWAYS_TOFROM) - atf.safe_push (cp); - } - - for (unsigned int i = 0; i < atf.length (); i++) -if (atf[i]) - { - tree *cp = atf[i]; - tree decl = OMP_CLAUSE_DECL (*cp); - if (TREE_CODE (decl) == INDIRECT_REF || TREE_CODE (decl) == MEM_REF) - { - tree base_ptr = TREE_OPERAND (decl, 0); - STRIP_TYPE_NOPS (base_ptr); - for (unsigned int j = i + 1; j < atf.length (); j++) - if (atf[j]) - { - tree *cp2 = atf[j]; - tree decl2 = OMP_CLAUSE_DECL (*cp2); - - decl2 = OMP_CLAUSE_DECL (*cp2); - if (is_or_contains_p (decl2, base_ptr)) - { - /* Move *cp2 to before *cp. 
*/ - tree c = *cp2; - *cp2 = OMP_CLAUSE_CHAIN (c); - OMP_CLAUSE_CHAIN (c) = *cp; - *cp = c; - - if (*cp2 != NULL_TREE - && OMP_CLAUSE_CODE (*cp2) == OMP_CLAUSE_MAP - && OMP_CLAUSE_MAP_KIND (*cp2) == GOMP_MAP_ALWAYS_POINTER) - { - tree c2 = *cp2; - *cp2 = OMP_CLAUSE_CHAIN (c2); - OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c); - OMP_CLAUSE_CHAIN (c) = c2; - } - - atf[j] = NULL; - } - } - } - } - - /* For attach_detach map clauses, if there is another map that maps the - attached/detached pointer, make sure that map is ordered before the - attach_detach. */ - atf.truncate (0); - for (tree *cp = list_p; *cp; cp = &OMP_CLAUSE_CHAIN (*cp)) -if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE_MAP) - { - /* Collect alloc, to, from, to/from clauses, and - always_pointer/attach_detach clauses. */ - gomp_map_kind k = OMP_CLAUSE_MAP_KIND (*cp); - if (k == GOMP_MAP_ALLOC -
[PATCH 08/16] OpenMP/OpenACC: Hoist struct sibling list handling in gimplification
This patch lifts struct sibling-list handling out of the main loop in gimplify_scan_omp_clauses. The reasons for this are several: first, it means that we can subject created sibling list groups to topological sorting (see previous patch) so base-pointer data dependencies are handled correctly. Secondly, it means that in the first pass gathering up sibling lists from parsed OpenMP/OpenACC clauses, we don't need to worry about gimplifying: that means we can see struct bases & components we need to sort sibling lists properly, even when we're using a non-DECL_P struct base. Gimplification proper still happens in the main loop in gimplify_scan_omp_clauses. Thirdly, because we use more than one pass through the clause list and gather appropriate data, we can tell if we're mapping a whole struct in a different node, and avoid building struct sibling lists for that struct appropriately. Fourthly, we can re-use the node grouping functions from the previous patch, and thus mostly avoid the "prev_list_p" handling in gimplify_scan_omp_clauses that tracks the first node in such groups at present. Some redundant code has been removed and code paths for OpenACC/OpenMP are now shared where appropriate, though OpenACC doesn't do the topological sorting of nodes (yet?). OK? Thanks, Julian 2021-09-29 Julian Brown gcc/ * gimplify.c (gimplify_omp_var_data): Remove GOVD_MAP_HAS_ATTACHMENTS. (extract_base_bit_offset): Remove OFFSETP parameter. (strip_components_and_deref): Extend with POINTER_PLUS_EXPR and COMPOUND_EXPR handling. (aggregate_base_p): Remove. (omp_group_last, omp_group_base): Add GOMP_MAP_STRUCT handling. (build_struct_group): Remove CTX, DECL, PD, COMPONENT_REF_P, FLAGS, STRUCT_SEEN_CLAUSE, PRE_P, CONT parameters. Replace PREV_LIST_P and C parameters with GRP_START_P and GRP_END. Add INNER. Update calls to extract_base_bit_offset. Remove gimplification of clauses for OpenMP. Rework inner struct handling for OpenACC. Don't use context's variables splay tree. (omp_build_struct_sibling_lists): New function, extracted from gimplify_scan_omp_clauses and refactored. (gimplify_scan_omp_clauses): Call above function to handle struct sibling lists. Remove STRUCT_MAP_TO_CLAUSE, STRUCT_SEEN_CLAUSE, STRUCT_DEREF_SET. Rework flag handling, adding decl for struct variables. (gimplify_adjust_omp_clauses_1): Remove GOVD_MAP_HAS_ATTACHMENTS handling, unused now. gcc/testsuite/ * g++.dg/goacc/member-array-acc.C: Update expected output. * g++.dg/gomp/target-3.C: Likewise. * g++.dg/gomp/target-lambda-1.C: Likewise. * g++.dg/gomp/target-this-2.C: Likewise. * g++.dg/gomp/target-this-4.C: Likewise. --- gcc/gimplify.c| 943 -- gcc/testsuite/g++.dg/goacc/member-array-acc.C | 2 +- gcc/testsuite/g++.dg/gomp/target-3.C | 4 +- gcc/testsuite/g++.dg/gomp/target-lambda-1.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-2.C | 2 +- gcc/testsuite/g++.dg/gomp/target-this-4.C | 4 +- 6 files changed, 410 insertions(+), 547 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index fb923f05314..56f0aaaf979 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -125,10 +125,6 @@ enum gimplify_omp_var_data /* Flag for GOVD_REDUCTION: inscan seen in {in,ex}clusive clause. */ GOVD_REDUCTION_INSCAN = 0x200, - /* Flag for GOVD_MAP: (struct) vars that have pointer attachments for - fields. */ - GOVD_MAP_HAS_ATTACHMENTS = 0x400, - /* Flag for GOVD_FIRSTPRIVATE: OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT. 
*/ GOVD_FIRSTPRIVATE_IMPLICIT = 0x800, @@ -8659,7 +8655,7 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, static tree extract_base_bit_offset (tree base, poly_int64 *bitposp, -poly_offset_int *poffsetp, tree *offsetp) +poly_offset_int *poffsetp) { tree offset; poly_int64 bitsize, bitpos; @@ -8687,7 +8683,6 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp, *bitposp = bitpos; *poffsetp = poffset; - *offsetp = offset; return base; } @@ -8700,8 +8695,15 @@ strip_components_and_deref (tree expr) while (TREE_CODE (expr) == COMPONENT_REF || TREE_CODE (expr) == INDIRECT_REF || (TREE_CODE (expr) == MEM_REF -&& integer_zerop (TREE_OPERAND (expr, 1 -expr = TREE_OPERAND (expr, 0); +&& integer_zerop (TREE_OPERAND (expr, 1))) +|| TREE_CODE (expr) == POINTER_PLUS_EXPR +|| TREE_CODE (expr) == COMPOUND_EXPR) + if (TREE_CODE (expr) == COMPOUND_EXPR) + expr = TREE_OPERAND (expr, 1); + else + expr = TREE_OPERAND (expr, 0); + + STRIP_NOPS (expr); return expr; } @@ -8717,34 +8719,6 @@ strip_indirections (tree expr) return expr; } -/* Re
[PATCH 09/16] OpenMP: Allow array ref components for C & C++
This patch fixes parsing for struct components that are array references in OMP clauses in both the C and C++ front ends. OK? Thanks, Julian 2021-09-29 Julian Brown gcc/c/ * c-typeck.c (c_finish_omp_clauses): Allow ARRAY_REF components. gcc/cp/ * semantics.c (finish_omp_clauses): Allow ARRAY_REF components. --- gcc/c/c-typeck.c | 3 ++- gcc/cp/semantics.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index ee6362d4274..4d156f6d3ec 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -14918,7 +14918,8 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { t = TREE_OPERAND (t, 0); if (TREE_CODE (t) == MEM_REF - || TREE_CODE (t) == INDIRECT_REF) + || TREE_CODE (t) == INDIRECT_REF + || TREE_CODE (t) == ARRAY_REF) { t = TREE_OPERAND (t, 0); STRIP_NOPS (t); diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 50f95751d1c..e882c302f31 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7910,7 +7910,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) if (REFERENCE_REF_P (t)) t = TREE_OPERAND (t, 0); if (TREE_CODE (t) == MEM_REF - || TREE_CODE (t) == INDIRECT_REF) + || TREE_CODE (t) == INDIRECT_REF + || TREE_CODE (t) == ARRAY_REF) { t = TREE_OPERAND (t, 0); STRIP_NOPS (t); -- 2.29.2
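A hedged sketch (invented types) of the clause shape this permits, where an ARRAY_REF appears among the components of the mapped expression:

struct t { int *p; };
struct u { struct t arr[8]; };

void
g (struct u *v, int n)
{
  /* The component path v->arr[2].p contains an array reference, which
     the front ends previously rejected when checking map clauses.  */
#pragma omp target map(v->arr[2].p[:n])
  for (int i = 0; i < n; i++)
    v->arr[2].p[i] = i;
}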
Re: [committed 03/12] d: Insert null terminator in obstack buffers
On 7/30/21 13:01, Iain Buclaw via Gcc-patches wrote: > Covers cases where functions that handle the extracted strings ignore > the explicit length. This isn't something that's known to happen in the > current front-end, but the self-hosted front-end has been observed to do > this in its conversions between D and C-style strings. Can you please cherry-pick this to the gcc-11 branch, as I see nasty output when using --verbose: $ gcc /home/marxin/Programming/gcc/gcc/testsuite/gdc.dg/attr_optimize4.d -c --verbose ... predefs GNU D_Version2 LittleEndian GNU_DWARF2_Exceptions GNU_StackGrowsDown GNU_InlineAsm D_LP64 assert D_ModuleInfo D_Exceptions D_TypeInfo all X86_64 D_HardFloat Posix linux CRuntime_Glibc CppRuntime_Gcc��... Thanks, Martin
[PATCH 10/16] OpenMP: Fix non-zero attach/detach bias for struct dereferences
This patch fixes attach/detach operations for OpenMP that have a non-zero bias: these can occur if we have a mapping such as: #pragma omp target map(mystruct->a.b[idx].c[:arrsz]) i.e. where there is an offset between the attachment point ("mystruct" here) and the pointed-to data. (The "b" and "c" members would be array types here, not pointers themselves). In this example the difference (thus bias encoded in the attach/detach node) will be something like: (uintptr_t) &mystruct->a.b[idx].c[0] - (uintptr_t) &mystruct->a OK? Thanks, Julian 2021-09-29 Julian Brown gcc/c-family/ * c-common.h (c_omp_decompose_attachable_address): Add prototype. * c-omp.c (c_omp_decompose_attachable_address): New function. gcc/c/ * c-typeck.c (handle_omp_array_sections): Handle attach/detach for struct dereferences with non-zero bias. gcc/cp/ * semantics.c (handle_omp_array_section): Handle attach/detach for struct dereferences with non-zero bias. libgomp/ * testsuite/libgomp.c++/baseptrs-3.C: Add test (XFAILed for now). * testsuite/libgomp.c-c++-common/baseptrs-1.c: Add test. * testsuite/libgomp.c-c++-common/baseptrs-2.c: Add test. --- gcc/c-family/c-common.h | 1 + gcc/c-family/c-omp.c | 42 gcc/c/c-typeck.c | 12 +- gcc/cp/semantics.c| 14 +- libgomp/testsuite/libgomp.c++/baseptrs-3.C| 182 ++ .../libgomp.c-c++-common/baseptrs-1.c | 50 + .../libgomp.c-c++-common/baseptrs-2.c | 70 +++ 7 files changed, 364 insertions(+), 7 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c++/baseptrs-3.C create mode 100644 libgomp/testsuite/libgomp.c-c++-common/baseptrs-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/baseptrs-2.c diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index d5dad99ff97..dd103d8eecd 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1251,6 +1251,7 @@ extern tree c_omp_check_context_selector (location_t, tree); extern void c_omp_mark_declare_variant (location_t, tree, tree); extern const char *c_omp_map_clause_name (tree, bool); extern void c_omp_adjust_map_clauses (tree, bool); +extern tree c_omp_decompose_attachable_address (tree t, tree *virtbase); enum c_omp_directive_kind { C_OMP_DIR_STANDALONE, diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index 3f84fd1b5cb..a90696fe706 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3113,6 +3113,48 @@ c_omp_adjust_map_clauses (tree clauses, bool is_target) } } +tree +c_omp_decompose_attachable_address (tree t, tree *virtbase) +{ + *virtbase = t; + + /* It's already a pointer. Just use that. */ + if (POINTER_TYPE_P (TREE_TYPE (t))) +return NULL_TREE; + + /* Otherwise, look for a base pointer deeper within the expression. */ + + while (TREE_CODE (t) == COMPONENT_REF +&& (TREE_CODE (TREE_OPERAND (t, 0)) == COMPONENT_REF +|| TREE_CODE (TREE_OPERAND (t, 0)) == ARRAY_REF)) +{ + t = TREE_OPERAND (t, 0); + while (TREE_CODE (t) == ARRAY_REF) + t = TREE_OPERAND (t, 0); +} + + + *virtbase = t; + + if (TREE_CODE (t) != COMPONENT_REF) +return NULL_TREE; + + t = TREE_OPERAND (t, 0); + + tree attach_pt = NULL_TREE; + + if ((TREE_CODE (t) == INDIRECT_REF + || TREE_CODE (t) == MEM_REF) + && TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 0))) == POINTER_TYPE) +{ + attach_pt = TREE_OPERAND (t, 0); + if (TREE_CODE (attach_pt) == POINTER_PLUS_EXPR) + attach_pt = TREE_OPERAND (attach_pt, 0); +} + + return attach_pt; +} + static const struct c_omp_directive omp_directives[] = { /* Keep this alphabetically sorted by the first word. Non-null second/third if any should precede null ones. 
*/ diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 4d156f6d3ec..cfac7d0a2b5 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -13799,9 +13799,15 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (size) size = c_fully_fold (size, false, NULL); OMP_CLAUSE_SIZE (c) = size; + tree virtbase = t; + tree attach_pt + = ((ort != C_ORT_ACC) + ? c_omp_decompose_attachable_address (t, &virtbase) + : NULL_TREE); if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP || (TREE_CODE (t) == COMPONENT_REF - && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)) + && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE + && !attach_pt)) return false; gcc_assert (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_DEVICEPTR); switch (OMP_CLAUSE_MAP_KIND (c)) @@ -13834,10 +13840,10 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER && !c_mark_addressable (t)) return fa
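To make the bias computation concrete, a hedged sketch with invented struct types, mirroring the map(mystruct->a.b[idx].c[:arrsz]) example above:

#include <stdint.h>

struct s2 { int c[8]; };
struct s1 { struct s2 b[4]; };
struct s0 { struct s1 a; };

/* The attach node records the distance between the start of the mapped
   slice and the data at the attachment point, so the runtime can later
   translate the host pointer to the right device address.  */
static uintptr_t
attach_bias (struct s0 *mystruct, int idx)
{
  return (uintptr_t) &mystruct->a.b[idx].c[0]
         - (uintptr_t) &mystruct->a;
}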
[PATCH 12/16] OpenACC: Make deep-copy-arrayofstruct.c a libgomp/runtime test
I noticed that the test in question now compiles properly, and in fact runs properly too. Thus it's more useful as a runtime test than a passing compilation test that otherwise doesn't do much. This patch moves it to libgomp. OK? Thanks, Julian 2021-10-11 Julian Brown gcc/testsuite/ * c-c++-common/goacc/deep-copy-arrayofstruct.c: Move test from here. libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c: Move test to here. --- .../libgomp.oacc-c-c++-common}/deep-copy-arrayofstruct.c| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename {gcc/testsuite/c-c++-common/goacc => libgomp/testsuite/libgomp.oacc-c-c++-common}/deep-copy-arrayofstruct.c (98%) diff --git a/gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c similarity index 98% rename from gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c rename to libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c index 4247607b61c..a11c64749cc 100644 --- a/gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do run } */ #include #include -- 2.29.2
[PATCH 11/16] OpenMP: Handle reference-typed struct members
This patch fixes the baseptrs-3.C test case introduced in the patch: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580729.html The problematic case concerns OpenMP mapping clauses containing struct members of reference type, e.g. "mystruct.myref.myptr[:N]". To be able to access the array slice through the reference in the middle, we need to perform an attach action for that reference, since it is represented internally as a pointer. I don't think the spec allows for this case explicitly. The closest clause is (OpenMP 5.0, "2.19.7.1 map Clause"): "If the type of a list item is a reference to a type T then the reference in the device data environment is initialized to refer to the object in the device data environment that corresponds to the object referenced by the list item. If mapping occurs, it occurs as though the object were mapped through a pointer with an array section of type T and length one." The patch as is allows the mapping to work with just "mystruct.myref.myptr[:N]", without an explicit "mystruct.myref" mapping also (because, would that refer to the hidden pointer used by the reference, or the automatically-dereferenced data itself?). An attach/detach operation is thus synthesised for the reference. OK? Thanks, Julian 2021-10-11 Julian Brown gcc/cp/ * semantics.c (finish_omp_clauses): Handle reference-typed members. gcc/ * gimplify.c (build_struct_group): Arrange for attach/detach nodes to be created for reference-typed struct members for OpenMP. Only create firstprivate_pointer/firstprivate_reference nodes for innermost struct accesses, those with an optionally-indirected DECL_P base. (omp_build_struct_sibling_lists): Handle two-element chain for inner struct component returned from build_struct_group. libgomp/ * testsuite/libgomp.c++/baseptrs-3.C: Remove XFAILs and extend test. --- gcc/cp/semantics.c | 4 + gcc/gimplify.c | 56 +-- libgomp/testsuite/libgomp.c++/baseptrs-3.C | 109 +++-- 3 files changed, 154 insertions(+), 15 deletions(-) diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 068c0c69e58..6d30a9ed97d 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7923,6 +7923,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) STRIP_NOPS (t); if (TREE_CODE (t) == POINTER_PLUS_EXPR) t = TREE_OPERAND (t, 0); + if (REFERENCE_REF_P (t)) + t = TREE_OPERAND (t, 0); } } while (TREE_CODE (t) == COMPONENT_REF); @@ -8021,6 +8023,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort) { t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); indir_component_ref_p = true; + if (REFERENCE_REF_P (t)) + t = TREE_OPERAND (t, 0); STRIP_NOPS (t); if (TREE_CODE (t) == POINTER_PLUS_EXPR) t = TREE_OPERAND (t, 0); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 56f0aaaf979..8f07da8a991 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9802,7 +9802,10 @@ build_struct_group (enum omp_region_type region_type, enum tree_code code, /* FIXME: If we're not mapping the base pointer in some other clause on this directive, I think we want to create ALLOC/RELEASE here -- i.e. not early-exit. 
*/ - if (openmp && attach_detach) + if (openmp + && attach_detach + && !(TREE_CODE (TREE_TYPE (ocd)) == REFERENCE_TYPE + && TREE_CODE (TREE_TYPE (TREE_TYPE (ocd))) != POINTER_TYPE)) return NULL; if (!struct_map_to_clause || struct_map_to_clause->get (base) == NULL) @@ -9851,9 +9854,32 @@ build_struct_group (enum omp_region_type region_type, enum tree_code code, tree noind = strip_indirections (base); - if (!openmp + if (openmp + && TREE_CODE (TREE_TYPE (noind)) == REFERENCE_TYPE && (region_type & ORT_TARGET) && TREE_CODE (noind) == COMPONENT_REF) + { + tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), + OMP_CLAUSE_MAP); + OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_TO); + OMP_CLAUSE_DECL (c2) = unshare_expr (base); + OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (TREE_TYPE (noind)); + + tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), + OMP_CLAUSE_MAP); + OMP_CLAUSE_SET_MAP_KIND (c3, GOMP_MAP_ATTACH_DETACH); + OMP_CLAUSE_DECL (c3) = unshare_expr (noind); + OMP_CLAUSE_SIZE (c3) = size_zero_node; + + OMP_CLAUSE_CHAIN (c2) = c3; + OMP_CLAUSE_CHAIN (c3) = NULL_TREE; + + *inner = c2; + return NULL; + } + e
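For reference, here is a minimal sketch of the construct this patch enables (names are illustrative, not taken from baseptrs-3.C):

struct T { int *myptr; };
struct S { T &myref; };

void f (S &mystruct, int N)
{
  /* An attach operation for the reference "myref" is synthesised so the
     slice can be reached through the reference's hidden pointer.  */
  #pragma omp target map(mystruct.myref.myptr[:N])
  for (int i = 0; i < N; i++)
    mystruct.myref.myptr[i]++;
}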
[PATCH 13/16] Add debug_omp_expr
The C and C++ front-ends use a TREE_LIST as a 3-tuple representing an OpenMP array section, which tends to crash debug_generic_expr if one wants to print such an expression in the debugger. This little helper function works around that. We might want to adjust the representation of array sections to use the soon-to-be-introduced OMP_ARRAY_SECTION tree code throughout instead, at which point this patch will no longer be necessary. OK? Thanks, Julian 2021-11-15 Julian Brown gcc/ * tree-pretty-print.c (print_omp_expr, debug_omp_expr): New functions. * tree-pretty-print.h (debug_omp_expr): Add prototype. --- gcc/tree-pretty-print.c | 31 +++ gcc/tree-pretty-print.h | 1 + 2 files changed, 32 insertions(+) diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index a81ba401ef9..13b64fd52e1 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -103,6 +103,37 @@ debug_generic_stmt (tree t) fprintf (stderr, "\n"); } +static void +print_omp_expr (tree t) +{ + if (TREE_CODE (t) == TREE_LIST) +{ + tree low = TREE_PURPOSE (t); + tree len = TREE_VALUE (t); + tree base = TREE_CHAIN (t); + if (TREE_CODE (base) == TREE_LIST) + print_omp_expr (base); + else + print_generic_expr (stderr, base, TDF_VOPS|TDF_MEMSYMS); + fprintf (stderr, "["); + if (low) + print_generic_expr (stderr, low, TDF_VOPS|TDF_MEMSYMS); + fprintf (stderr, ":"); + if (len) + print_generic_expr (stderr, len, TDF_VOPS|TDF_MEMSYMS); + fprintf (stderr, "]"); +} + else +print_generic_expr (stderr, t, TDF_VOPS|TDF_MEMSYMS); +} + +DEBUG_FUNCTION void +debug_omp_expr (tree t) +{ + print_omp_expr (t); + fprintf (stderr, "\n"); +} + /* Debugging function to print out a chain of trees . */ DEBUG_FUNCTION void diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h index dacd256302b..bc910f9a1b1 100644 --- a/gcc/tree-pretty-print.h +++ b/gcc/tree-pretty-print.h @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see extern void debug_generic_expr (tree); extern void debug_generic_stmt (tree); +extern void debug_omp_expr (tree); extern void debug_tree_chain (tree); extern void print_generic_decl (FILE *, tree, dump_flags_t); extern void print_generic_stmt (FILE *, tree, dump_flags_t = TDF_NONE); -- 2.29.2
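As a usage sketch (assumed, not part of the patch): an array section such as arr[1:n] arrives as a TREE_LIST with the low bound in TREE_PURPOSE, the length in TREE_VALUE and the base in TREE_CHAIN, which is exactly what print_omp_expr walks:

/* Sketch: LOW_BOUND, LENGTH and BASE_EXPR are existing trees; the front
   ends build the 3-tuple for arr[1:n] like this.  */
tree section = tree_cons (low_bound, length, base_expr);
/* Then, from the debugger:
     (gdb) call debug_omp_expr (section)
   prints "arr[1:n]" instead of crashing debug_generic_expr.  */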
[PATCH 14/16] OpenMP: Add inspector class to unify mapped address analysis
Several places in the C and C++ front-ends dig through OpenMP addresses from "map" clauses (etc.) in order to determine whether they are component accesses that need "attach" operations, check duplicate mapping clauses, and so on. When we're extending support for more kinds of lvalues in map clauses, it seems helpful to bring these all into one place in order to keep all the analyses in sync, and to make it easier to reason about which kinds of expressions are supported. This patch introduces an "address inspector" class for that purpose, and adjusts the C and C++ front-ends to use it. (The adjacent "c_omp_decompose_attachable_address" function could also be moved into the address inspector class, perhaps. That's not been done yet.) OK? Thanks, Julian 2021-11-15 Julian Brown gcc/c-family/ * c-common.h (c_omp_address_inspector): New class. * c-omp.c (c_omp_address_inspector::init, c_omp_address_inspector::analyze_components, c_omp_address_inspector::map_supported_p, c_omp_address_inspector::mappable_type): New methods. gcc/c/ * c-typeck.c (handle_omp_array_sections_1, c_finish_omp_clauses): Use c_omp_address_inspector class. gcc/cp/ * semantics.c (cp_omp_address_inspector): New class, derived from c_omp_address_inspector. (handle_omp_array_sections_1): Use cp_omp_address_inspector class to analyze OpenMP map clause expressions. Support POINTER_PLUS_EXPR. (finish_omp_clauses): Likewise. Support some additional kinds of lvalues in map clauses. gcc/testsuite/ * g++.dg/gomp/unmappable-component-1.C: New test. --- gcc/c-family/c-common.h | 44 +++ gcc/c-family/c-omp.c | 147 ++ gcc/c/c-typeck.c | 198 +++--- gcc/cp/semantics.c| 252 ++ .../g++.dg/gomp/unmappable-component-1.C | 21 ++ 5 files changed, 338 insertions(+), 324 deletions(-) create mode 100644 gcc/testsuite/g++.dg/gomp/unmappable-component-1.C diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index dd103d8eecd..05d479e4d2f 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1253,6 +1253,50 @@ extern const char *c_omp_map_clause_name (tree, bool); extern void c_omp_adjust_map_clauses (tree, bool); extern tree c_omp_decompose_attachable_address (tree t, tree *virtbase); +class c_omp_address_inspector +{ + tree clause; + tree orig; + tree deref_toplevel; + tree outer_virtual_base; + tree root_term; + bool component_access; + bool indirections; + int map_supported; + +protected: + virtual bool reference_ref_p (tree) { return false; } + virtual bool processing_template_decl_p () { return false; } + virtual bool mappable_type (tree t); + virtual void emit_unmappable_type_notes (tree) { } + +public: + c_omp_address_inspector (tree c, tree t) +: clause (c), orig (t), deref_toplevel (NULL_TREE), + outer_virtual_base (NULL_TREE), root_term (NULL_TREE), + component_access (false), indirections (false), map_supported (-1) + { } + + ~c_omp_address_inspector () {} + + virtual void init (); + + tree analyze_components (bool); + + tree get_deref_toplevel () { return deref_toplevel; } + tree get_outer_virtual_base () { return outer_virtual_base; } + tree get_root_term () { gcc_assert (root_term); return root_term; } + bool component_access_p () { return component_access; } + + bool indir_component_ref_p () +{ + gcc_assert (!component_access || root_term != NULL_TREE); + return component_access && indirections; +} + + bool map_supported_p (); +}; + enum c_omp_directive_kind { C_OMP_DIR_STANDALONE, C_OMP_DIR_CONSTRUCT, diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index a90696fe706..5b2fbf6809b 100644 --- 
a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -3113,6 +3113,153 @@ c_omp_adjust_map_clauses (tree clauses, bool is_target) } } +/* This could just be done in the constructor, but we need to call the + subclass's version of reference_ref_p, etc. */ + +void +c_omp_address_inspector::init () +{ + tree t = orig; + + gcc_assert (TREE_CODE (t) != ARRAY_REF); + + /* We may have a reference-typed component access at the outermost level + that has had convert_from_reference called on it. Look through that + access. */ + if (reference_ref_p (t) + && TREE_CODE (TREE_OPERAND (t, 0)) == COMPONENT_REF) +{ + t = TREE_OPERAND (t, 0); + deref_toplevel = t; +} + else +deref_toplevel = t; + + /* Strip off expression nodes that may enclose a COMPONENT_REF. Look through + references, but not indirections through pointers. */ + while (1) +{ + if (TREE_CODE (t) == COMPOUND_EXPR) + { + t = TREE_OPERAND (t, 1); + STRIP_NOPS (t); + } + else if (TREE_CODE (t) == POINTER_PLUS_EXPR + || TREE_CODE (t
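A hypothetical caller-side sketch (not from the patch) of how a front end might drive the inspector on a map-clause operand:

c_omp_address_inspector ai (c, OMP_CLAUSE_DECL (c));
ai.init ();
if (ai.component_access_p () && ai.map_supported_p ())
  {
    /* The root term is the (possibly indirected) base of the access;
       indir_component_ref_p () says whether a pointer was traversed.  */
    tree root = ai.get_root_term ();
  }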
[PATCH 16/16] OpenMP: lvalue parsing for map clauses (C)
This patch adds support for parsing general lvalues for OpenMP "map" clauses to the C front-end, similar to the previous patch for C++. This version of the patch fixes several omissions regarding non-DECL_P root terms in map clauses (i.e. "*foo" in "(*foo)->ptr->arr[:N]") -- similar to the cp/semantics.c changes in the previous patch -- and adds a couple of new tests. OK? Thanks, Julian 2021-11-24 Julian Brown gcc/c/ * c-parser.c (c_parser_postfix_expression_after_primary): Add support for OpenMP array section parsing. (c_parser_omp_variable_list): Change ALLOW_DEREF parameter to MAP_LVALUE. Support parsing of general lvalues in "map" clauses. (c_parser_omp_var_list_parens): Change ALLOW_DEREF parameter to MAP_LVALUE. Update call to c_parser_omp_variable_list. (c_parser_oacc_data_clause, c_parser_omp_clause_to, c_parser_omp_clause_from): Update calls to c_parser_omp_var_list_parens. * c-tree.h (c_omp_array_section_p): Add extern declaration. * c-typeck.c (c_omp_array_section_p): Add flag. (mark_exp_read): Support OMP_ARRAY_SECTION. (handle_omp_array_sections_1): Handle more kinds of expressions. (handle_omp_array_sections): Handle non-DECL_P attachment points. (c_finish_omp_clauses): Check for supported expression types. Support non-DECL_P root term for map clauses. gcc/testsuite/ * c-c++-common/gomp/map-1.c: Adjust expected output. * c-c++-common/gomp/map-6.c: Likewise. libgomp/ * testsuite/libgomp.c-c++-common/ind-base-4.c: New test. * testsuite/libgomp.c-c++-common/unary-ptr-1.c: New test. --- gcc/c/c-parser.c | 150 +++--- gcc/c/c-tree.h| 1 + gcc/c/c-typeck.c | 45 +- gcc/testsuite/c-c++-common/gomp/map-1.c | 3 +- gcc/testsuite/c-c++-common/gomp/map-6.c | 2 + .../libgomp.c-c++-common/ind-base-4.c | 50 ++ .../libgomp.c-c++-common/unary-ptr-1.c| 16 ++ 7 files changed, 243 insertions(+), 24 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c-c++-common/ind-base-4.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/unary-ptr-1.c diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 322f30c90b4..702a0b7d8a9 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -10460,7 +10460,7 @@ c_parser_postfix_expression_after_primary (c_parser *parser, struct c_expr expr) { struct c_expr orig_expr; - tree ident, idx; + tree ident, idx, len; location_t sizeof_arg_loc[3], comp_loc; tree sizeof_arg[3]; unsigned int literal_zero_mask; @@ -10479,15 +10479,44 @@ c_parser_postfix_expression_after_primary (c_parser *parser, case CPP_OPEN_SQUARE: /* Array reference. */ c_parser_consume_token (parser); - idx = c_parser_expression (parser).value; - c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, -"expected %<]%>"); - start = expr.get_start (); - finish = parser->tokens_buf[0].location; - expr.value = build_array_ref (op_loc, expr.value, idx); - set_c_expr_source_range (&expr, start, finish); - expr.original_code = ERROR_MARK; - expr.original_type = NULL; + idx = len = NULL_TREE; + if (!c_omp_array_section_p + || c_parser_next_token_is_not (parser, CPP_COLON)) + idx = c_parser_expression (parser).value; + + if (c_omp_array_section_p + && c_parser_next_token_is (parser, CPP_COLON)) + { + c_parser_consume_token (parser); + if (c_parser_next_token_is_not (parser, CPP_CLOSE_SQUARE)) + len = c_parser_expression (parser).value; + + c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, +"expected %<]%>"); + +/* NOTE: We are reusing using the type of the whole array as the + type of the array section here, which isn't necessarily + entirely correct. Might need revisiting. 
*/ + start = expr.get_start (); + finish = parser->tokens_buf[0].location; + expr.value = build3_loc (op_loc, OMP_ARRAY_SECTION, + TREE_TYPE (expr.value), expr.value, + idx, len); + set_c_expr_source_range (&expr, start, finish); + expr.original_code = ERROR_MARK; + expr.original_type = NULL; + } + else + { + c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, +"expected %<]%>"); + start = expr.get_start (); + finish = parser->tokens_buf[0].location; + expr.value = build_array_ref (op_
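For illustration, a sketch (not one of the new tests) of a clause the extended parser now accepts, including the non-DECL_P base case mentioned above:

struct inner { int arr[100]; };
struct outer { struct inner *ptr; };

void f (struct outer **foo, int N)
{
  /* The base "*foo" is not a DECL_P: the whole access is now parsed as a
     general lvalue followed by an OMP_ARRAY_SECTION.  */
  #pragma omp target map((*foo)->ptr->arr[:N])
  for (int i = 0; i < N; i++)
    (*foo)->ptr->arr[i]++;
}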
[PATCH 15/16] OpenMP: lvalue parsing for map clauses (C++)
This patch changes parsing for OpenMP map clauses in C++ to use the generic expression parser, hence adds support for parsing general lvalues (as required by OpenMP 5.0+). So far only a few new types of expression are actually supported throughout compilation (including everything in the testsuite of course, and newly-added tests), and we attempt to reject unsupported expressions in order to avoid surprises for the user. This version of the patch adds a number of additional tests for various expressions accepted as lvalues in C++, many of which are currently rejected as not yet supported -- and really only a handful of the rejected cases would be plausible in the context of an OpenMP "map" clause anyway, IMO. OK? Thanks, Julian 2021-11-24 Julian Brown gcc/c-family/ * c-omp.c (c_omp_decompose_attachable_address): Handle more types of expressions. gcc/cp/ * error.c (dump_expr): Handle OMP_ARRAY_SECTION. * parser.c (cp_parser_new): Initialize parser->omp_array_section_p. (cp_parser_postfix_open_square_expression): Support OMP_ARRAY_SECTION parsing. (cp_parser_omp_var_list_no_open): Remove ALLOW_DEREF parameter, add MAP_LVALUE in its place. Supported generalised lvalue parsing for map clauses. (cp_parser_omp_var_list): Remove ALLOW_DEREF parameter, add MAP_LVALUE. Pass to cp_parser_omp_var_list_no_open. (cp_parser_oacc_data_clause, cp_parser_omp_all_clauses): Update calls to cp_parser_omp_var_list. * parser.h (cp_parser): Add omp_array_section_p field. * semantics.c (handle_omp_array_sections_1): Handle more types of map expression. (handle_omp_array_section): Handle non-DECL_P attachment points. (finish_omp_clauses): Check for supported types of expression. gcc/ * gimplify.c (build_struct_group): Handle reference-typed component accesses. Fix support for non-DECL_P struct bases. (omp_build_struct_sibling_lists): Support length-two group for synthesized inner struct mapping. * tree-pretty-print.c (dump_generic_node): Support OMP_ARRAY_SECTION. * tree.def (OMP_ARRAY_SECTION): New tree code. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update expected output. * g++.dg/gomp/pr67522.C: Likewise. * g++.dg/gomp/ind-base-3.C: New test. * g++.dg/gomp/map-assignment-1.C: New test. * g++.dg/gomp/map-inc-1.C: New test. * g++.dg/gomp/map-lvalue-ref-1.C: New test. * g++.dg/gomp/map-ptrmem-1.C: New test. * g++.dg/gomp/map-ptrmem-2.C: New test. * g++.dg/gomp/map-static-cast-lvalue-1.C: New test. * g++.dg/gomp/map-ternary-1.C: New test. * g++.dg/gomp/member-array-2.C: New test. libgomp/ * testsuite/libgomp.c++/ind-base-1.C: New test. * testsuite/libgomp.c++/ind-base-2.C: New test. * testsuite/libgomp.c++/map-comma-1.C: New test. * testsuite/libgomp.c++/map-rvalue-ref-1.C: New test. * testsuite/libgomp.c++/member-array-1.C: New test. * testsuite/libgomp.c++/struct-ref-1.C: New test. 
--- gcc/c-family/c-omp.c | 25 ++- gcc/cp/error.c| 9 + gcc/cp/parser.c | 141 +-- gcc/cp/parser.h | 3 + gcc/cp/semantics.c| 35 +++- gcc/gimplify.c| 37 +++- gcc/testsuite/c-c++-common/gomp/map-6.c | 4 +- gcc/testsuite/g++.dg/gomp/ind-base-3.C| 38 gcc/testsuite/g++.dg/gomp/map-assignment-1.C | 12 ++ gcc/testsuite/g++.dg/gomp/map-inc-1.C | 10 ++ gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C | 19 ++ gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C | 36 gcc/testsuite/g++.dg/gomp/map-ptrmem-2.C | 39 + .../g++.dg/gomp/map-static-cast-lvalue-1.C| 17 ++ gcc/testsuite/g++.dg/gomp/map-ternary-1.C | 20 +++ gcc/testsuite/g++.dg/gomp/member-array-2.C| 86 ++ gcc/testsuite/g++.dg/gomp/pr67522.C | 2 +- gcc/tree-pretty-print.c | 14 ++ gcc/tree.def | 3 + libgomp/testsuite/libgomp.c++/ind-base-1.C| 162 ++ libgomp/testsuite/libgomp.c++/ind-base-2.C| 49 ++ libgomp/testsuite/libgomp.c++/map-comma-1.C | 15 ++ .../testsuite/libgomp.c++/map-rvalue-ref-1.C | 22 +++ .../testsuite/libgomp.c++/member-array-1.C| 89 ++ libgomp/testsuite/libgomp.c++/struct-ref-1.C | 97 +++ 25 files changed, 956 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/g++.dg/gomp/ind-base-3.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-assignment-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-inc-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C create
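To give a flavour of what general lvalue parsing means here, a sketch (illustrative; the added tests cover shapes like these, and semantic analysis still decides whether each form is accepted or rejected as unsupported):

struct T { int x; };
void f (T &ref, T *p, T *q, bool cond)
{
  #pragma omp target map(ref.x)              // lvalue reference base
  ref.x++;
  #pragma omp target map((cond ? *p : *q).x) // ternary lvalue
  (cond ? *p : *q).x++;
}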
[PATCH] Remove unreachable returns
This removes unreachable return statements as diagnosed by the -Wunreachable-code patch. Some cases are more obviously an improvement than others - in fact some may get you the idea to replace them with gcc_unreachable () instead, leading to cases of the 'Remove unreachable gcc_unreachable () at the end of functions' patch. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK? Comments? Feel free to approve select cases only. Thanks, Richard. 2021-11-25 Richard Biener * vec.c (qsort_chk): Do not return the void return value from the noreturn qsort_chk_error. * ccmp.c (expand_ccmp_expr_1): Remove unreachable return. * df-scan.c (df_ref_equal_p): Likewise. * dwarf2out.c (is_base_type): Likewise. (add_const_value_attribute): Likewise. * fixed-value.c (fixed_arithmetic): Likewise. * gimple-fold.c (gimple_fold_builtin_fputs): Likewise. * gimple-ssa-strength-reduction.c (stmt_cost): Likewise. * graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_expr_op): Likewise. (gcc_expression_from_isl_expression): Likewise. * ipa-fnsummary.c (will_be_nonconstant_expr_predicate): Likewise. * lto-streamer-in.c (lto_input_mode_table): Likewise. gcc/c-family/ * c-opts.c (c_common_post_options): Remove unreachable return. * c-pragma.c (handle_pragma_target): Likewise. (handle_pragma_optimize): Likewise. gcc/c/ * c-typeck.c (c_tree_equal): Remove unreachable return. * c-parser.c (get_matching_symbol): Likewise. libgomp/ * oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): Remove unreachable return. --- gcc/c-family/c-opts.c | 5 + gcc/c-family/c-pragma.c | 10 ++ gcc/c/c-parser.c| 1 - gcc/c/c-typeck.c| 2 -- gcc/ccmp.c | 2 -- gcc/df-scan.c | 1 - gcc/dwarf2out.c | 3 --- gcc/fixed-value.c | 1 - gcc/gimple-fold.c | 1 - gcc/gimple-ssa-strength-reduction.c | 1 - gcc/graphite-isl-ast-to-gimple.c| 4 gcc/ipa-fnsummary.c | 1 - gcc/lto-streamer-in.c | 7 ++- gcc/vec.c | 10 +- libgomp/oacc-plugin.c | 1 - 15 files changed, 10 insertions(+), 40 deletions(-) diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 2030eb1a4cd..93845d57dee 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -1109,10 +1109,7 @@ c_common_post_options (const char **pfilename) out_stream = fopen (out_fname, "w"); if (out_stream == NULL) - { - fatal_error (input_location, "opening output file %s: %m", out_fname); - return false; - } + fatal_error (input_location, "opening output file %s: %m", out_fname); init_pp_output (out_stream); } diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c index 3663eb1cfbb..c4ed4205820 100644 --- a/gcc/c-family/c-pragma.c +++ b/gcc/c-family/c-pragma.c @@ -916,10 +916,7 @@ handle_pragma_target(cpp_reader *ARG_UNUSED(dummy)) } if (token != CPP_STRING) -{ - GCC_BAD_AT (loc, "%<#pragma GCC option%> is not a string"); - return; -} +GCC_BAD_AT (loc, "%<#pragma GCC option%> is not a string"); /* Strings are user options. */ else @@ -991,10 +988,7 @@ handle_pragma_optimize (cpp_reader *ARG_UNUSED(dummy)) } if (token != CPP_STRING && token != CPP_NUMBER) -{ - GCC_BAD ("%<#pragma GCC optimize%> is not a string or number"); - return; -} +GCC_BAD ("%<#pragma GCC optimize%> is not a string or number"); /* Strings/numbers are user options. 
*/ else diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index f312630448f..af2bb5bc8cc 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1132,7 +1132,6 @@ get_matching_symbol (enum cpp_ttype type) { default: gcc_unreachable (); - return ""; case CPP_CLOSE_PAREN: return "("; case CPP_CLOSE_BRACE: diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index b71358e1821..7524304f2bd 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -15984,8 +15984,6 @@ c_tree_equal (tree t1, tree t2) default: gcc_unreachable (); } - /* We can get here with --disable-checking. */ - return false; } /* Returns true when the function declaration FNDECL is implicit, diff --git a/gcc/ccmp.c b/gcc/ccmp.c index d581cfadf06..616fe035e79 100644 --- a/gcc/ccmp.c +++ b/gcc/ccmp.c @@ -273,8 +273,6 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq) return NULL_RTX; return expand_ccmp_next (op1, code, tmp, prep_seq, gen_seq); } - - return NULL_RTX; } /* Main entry to expand conditional compare statement G. diff --git a/gcc/df-scan.c b/gcc/df-scan.c index 3dbda7aa52c..1baa6e73
[PATCH] c++, v3: Fix up diagnostics about taking address of an immediate member function [PR102753]
On Wed, Nov 24, 2021 at 09:07:48PM -0500, Jason Merrill wrote: > > --- gcc/cp/tree.c.jj2021-11-24 15:05:23.371927735 +0100 > > +++ gcc/cp/tree.c 2021-11-24 17:09:05.348164621 +0100 > > @@ -5167,6 +5167,7 @@ make_ptrmem_cst (tree type, tree member) > > tree ptrmem_cst = make_node (PTRMEM_CST); > > TREE_TYPE (ptrmem_cst) = type; > > PTRMEM_CST_MEMBER (ptrmem_cst) = member; > > + PTRMEM_CST_LOCATION (ptrmem_cst) = input_location; > > return ptrmem_cst; > > } > > Please also change build_x_unary_op to improve PTRMEM_CST_LOCATION instead > of adding a wrapper, and teach cp_expr_location about PTRMEM_CST_LOCATION. Done. Though, had to also change convert_for_assignment from EXPR_LOC_OR_LOC to cp_expr_loc_or_input_loc and expand_ptrmemfunc_cst to copy over location to ADDR_EXPR from PTRMEM_CST. > > --- gcc/cp/pt.c.jj 2021-11-24 15:05:23.336928234 +0100 > > +++ gcc/cp/pt.c 2021-11-24 15:34:29.018014159 +0100 > > @@ -17012,6 +17012,12 @@ tsubst_copy (tree t, tree args, tsubst_f > > r = build1 (code, type, op0); > > This should become build1_loc (EXPR_LOCATION (t), ... Done. > > > if (code == ALIGNOF_EXPR) > > ALIGNOF_EXPR_STD_P (r) = ALIGNOF_EXPR_STD_P (t); > > + /* For addresses of immediate functions ensure we have EXPR_LOCATION > > + set for possible later diagnostics. */ > > + if (code == ADDR_EXPR > > + && TREE_CODE (op0) == FUNCTION_DECL > > + && DECL_IMMEDIATE_FUNCTION_P (op0)) > > + SET_EXPR_LOCATION (r, input_location); > > ...and then do this only if t didn't have a location. And this too. 2021-11-25 Jakub Jelinek PR c++/102753 * cp-tree.h (struct ptrmem_cst): Add locus member. (PTRMEM_CST_LOCATION): Define. * tree.c (make_ptrmem_cst): Set PTRMEM_CST_LOCATION to input_location. (cp_expr_location): Return PTRMEM_CST_LOCATION for PTRMEM_CST. * typeck.c (build_x_unary_op): Overwrite PTRMEM_CST_LOCATION for PTRMEM_CST instead of calling maybe_wrap_with_location. (cp_build_addr_expr_1): Don't diagnose taking address of immediate functions here. Instead when taking their address make sure the returned ADDR_EXPR has EXPR_LOCATION set. (expand_ptrmemfunc_cst): Copy over PTRMEM_CST_LOCATION to ADDR_EXPR if taking address of immediate member function. (convert_for_assignment): Use cp_expr_loc_or_input_loc instead of EXPR_LOC_OR_LOC. * pt.c (tsubst_copy): Use build1_loc instead of build1. Ensure ADDR_EXPR of immediate function has EXPR_LOCATION set. * cp-gimplify.c (cp_fold_r): Diagnose taking address of immediate functions here. For consteval if don't walk THEN_CLAUSE. (cp_genericize_r): Move evaluation of calls to std::source_location::current from here to... (cp_fold): ... here. Don't assert calls to immediate functions must be source_location_current_p, instead only constant evaluate calls to source_location_current_p. * g++.dg/cpp2a/consteval20.C: Add some extra tests. * g++.dg/cpp2a/consteval23.C: Likewise. * g++.dg/cpp2a/consteval25.C: New test. * g++.dg/cpp2a/srcloc20.C: New test. --- gcc/cp/cp-tree.h.jj 2021-11-25 08:35:39.856073838 +0100 +++ gcc/cp/cp-tree.h2021-11-25 14:25:33.411081733 +0100 @@ -703,6 +703,7 @@ struct GTY(()) template_parm_index { struct GTY(()) ptrmem_cst { struct tree_common common; tree member; + location_t locus; }; typedef struct ptrmem_cst * ptrmem_cst_t; @@ -4726,6 +4727,11 @@ more_aggr_init_expr_args_p (const aggr_i #define PTRMEM_CST_MEMBER(NODE) \ (((ptrmem_cst_t)PTRMEM_CST_CHECK (NODE))->member) +/* For a pointer-to-member constant `X::Y' this is a location where + the address of the member has been taken. 
*/ +#define PTRMEM_CST_LOCATION(NODE) \ + (((ptrmem_cst_t)PTRMEM_CST_CHECK (NODE))->locus) + /* The expression in question for a TYPEOF_TYPE. */ #define TYPEOF_TYPE_EXPR(NODE) (TYPE_VALUES_RAW (TYPEOF_TYPE_CHECK (NODE))) --- gcc/cp/tree.c.jj2021-11-25 08:35:39.942072610 +0100 +++ gcc/cp/tree.c 2021-11-25 14:31:32.784899701 +0100 @@ -5196,6 +5196,7 @@ make_ptrmem_cst (tree type, tree member) tree ptrmem_cst = make_node (PTRMEM_CST); TREE_TYPE (ptrmem_cst) = type; PTRMEM_CST_MEMBER (ptrmem_cst) = member; + PTRMEM_CST_LOCATION (ptrmem_cst) = input_location; return ptrmem_cst; } @@ -6040,6 +6041,8 @@ cp_expr_location (const_tree t_) return STATIC_ASSERT_SOURCE_LOCATION (t); case TRAIT_EXPR: return TRAIT_EXPR_LOCATION (t); +case PTRMEM_CST: + return PTRMEM_CST_LOCATION (t); default: return EXPR_LOCATION (t); } --- gcc/cp/typeck.c.jj 2021-11-25 08:32:50.585489416 +0100 +++ gcc/cp/typeck.c 2021-11-25 15:22:25.554996949 +0100 @@ -6497,7 +6497,7 @@ build_x_unary_op (location_t loc, enum t exp = cp_build_addr_expr_strict (xarg, complain); if (TREE_CODE (exp) ==
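For context, a sketch (assumed, not one of the new tests) of the situation whose diagnostic location this improves:

struct S { consteval int f () const { return 0; } };
auto pmf = &S::f;  // error: taking address of an immediate function;
                   // with this patch the error points here, via the
                   // location recorded in PTRMEM_CST_LOCATION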
RE: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]
> -----Original Message----- > From: Jakub Jelinek > Sent: Thursday, November 25, 2021 9:53 AM > To: Richard Biener > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] match.pd: Fix up the recent bitmask_inv_cst_vector_p > simplification [PR103417] > > On Thu, Nov 25, 2021 at 10:17:52AM +0100, Richard Biener wrote: > > > Ah I see, sorry I didn't see that rule before, you're right that if > > > this is ordered after it then they can be dropped. > > > > So the patch is OK, possibly with re-ordering the matches. > > I've committed the patch as is because it has been tested that way and I'd > like to avoid dups of that PR flowing in. Even when not reordered, the new > earlier match.pd simplification will not trigger for the lt le gt ge cases > anymore > and the later old simplifications will trigger and I'd expect after that > latter > simplification the earlier should trigger again because the IL changed, no? > Tamar, can you handle the reordering together with the testsuite changes > (and perhaps formatting fixes in the tree.c routine)? Yes I will, I'll send a patch tomorrow morning. Thanks! Regards, Tamar > > Jakub
Re: [PATCH] c++, v3: Fix up diagnostics about taking address of an immediate member function [PR102753]
On 11/25/21 09:38, Jakub Jelinek wrote:
Re: [PATCH] c++: __builtin_bit_cast To C array target type [PR103140]
On 11/8/21 15:03, Will Wray via Gcc-patches wrote: This patch allows __builtin_bit_cast to materialize a C array as its To type. It was developed as part of an implementation of P1997, array copy-semantics, but is independent, so makes sense to submit, review and merge ahead of it. gcc/cp/ChangeLog: * constexpr.c (check_bit_cast_type): handle ARRAY_TYPE check, (cxx_eval_bit_cast): handle ARRAY_TYPE copy. * semantics.c (cp_build_bit_cast): warn only on unbounded/VLA. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/bit-cast2.C: update XFAIL tests. * g++.dg/cpp2a/bit-cast-to-array1.C: New test. --- gcc/cp/constexpr.c | 8 - gcc/cp/semantics.c | 7 ++--- gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C | 40 + gcc/testsuite/g++.dg/cpp2a/bit-cast2.C | 8 ++--- 4 files changed, 53 insertions(+), 10 deletions(-) diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 453007c686b..be1cdada6f8 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -4124,6 +4124,11 @@ static bool check_bit_cast_type (const constexpr_ctx *ctx, location_t loc, tree type, tree orig_type) { + if (TREE_CODE (type) == ARRAY_TYPE) + return check_bit_cast_type (ctx, loc, + TYPE_MAIN_VARIANT (TREE_TYPE (type)), + orig_type); + if (TREE_CODE (type) == UNION_TYPE) { if (!ctx->quiet) @@ -4280,7 +4285,8 @@ cxx_eval_bit_cast (const constexpr_ctx *ctx, tree t, bool *non_constant_p, tree r = NULL_TREE; if (can_native_interpret_type_p (TREE_TYPE (t))) r = native_interpret_expr (TREE_TYPE (t), ptr, len); - else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE) + else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE + || TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE) { r = native_interpret_aggregate (TREE_TYPE (t), ptr, 0, len); if (r != NULL_TREE) diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 2443d032749..b3126b12abc 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -11562,13 +11562,10 @@ cp_build_bit_cast (location_t loc, tree type, tree arg, { if (!complete_type_or_maybe_complain (type, NULL_TREE, complain)) return error_mark_node; - if (TREE_CODE (type) == ARRAY_TYPE) + if (TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type)) { - /* std::bit_cast for destination ARRAY_TYPE is not possible, -as functions may not return an array, so don't bother trying -to support this (and then deal with VLAs etc.). */ error_at (loc, "%<__builtin_bit_cast%> destination type %qT " -"is an array type", type); +"is a VLA variable-length array type", type); Null TYPE_DOMAIN doesn't mean VLA, it means unknown length. Probably better to check for null or non-constant TYPE_SIZE rather than specifically for VLA. 
return error_mark_node; } if (!trivially_copyable_p (type)) diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C b/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C new file mode 100644 index 000..e6e50c06389 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C @@ -0,0 +1,40 @@ +// { dg-do compile } + +class S { int s; }; +S s(); +class U { int a, b; }; +U u(); + +void +foo (int *q) +{ + __builtin_bit_cast (int [1], 0); + __builtin_bit_cast (S [1], 0); + __builtin_bit_cast (U [1], u); +} + +template +void +bar (int *q) +{ + int intN[N] = {}; + int int2N[2*N] = {}; + __builtin_bit_cast (int [N], intN); + __builtin_bit_cast (S [N], intN); + __builtin_bit_cast (U [N], int2N); +} + +template +void +baz (T1 ia, T2 sa, T3 ua) +{ + __builtin_bit_cast (T1, *ia); + __builtin_bit_cast (T2, *sa); + __builtin_bit_cast (T3, *ua); +} + +void +qux (S* sp, int *ip, U* up) +{ + baz (ip, sp, up); +} diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C b/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C index 6bb1760e621..7f1836ee4e9 100644 --- a/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C +++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C @@ -14,7 +14,7 @@ foo (int *q) __builtin_bit_cast (int, s);// { dg-error "'__builtin_bit_cast' source type 'S' is not trivially copyable" } __builtin_bit_cast (S, 0); // { dg-error "'__builtin_bit_cast' destination type 'S' is not trivially copyable" } __builtin_bit_cast (int &, q); // { dg-error "'__builtin_bit_cast' destination type 'int&' is not trivially copyable" } - __builtin_bit_cast (int [1], 0); // { dg-error "'__builtin_bit_cast' destination type \[^\n\r]* is an array type" } + __builtin_bit_cast (S [1], 0); // { dg-error "'__builtin_bit_cast' destination type \[^\n\r]* is not trivially copyable" } __builtin_bit_cast (V, 0); // { dg-error "invalid use of incomplete type
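A sketch of the check Jason suggests (hypothetical, not a tested patch): reject on a missing or non-constant TYPE_SIZE, which covers both arrays of unknown bound and VLAs:

if (TREE_CODE (type) == ARRAY_TYPE
    && (TYPE_SIZE (type) == NULL_TREE
        || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST))
  {
    error_at (loc, "%<__builtin_bit_cast%> destination type %qT "
              "has unknown or non-constant size", type);
    return error_mark_node;
  }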
Re: [PATCH 1/3] c++: designated init of char array by string constant [PR55227]
On 11/21/21 21:51, Will Wray via Gcc-patches wrote: Also address "FIXME: this code is duplicated from reshape_init" in cp_complete_array_type by always calling reshape_init on init-list. PR c++/55227 gcc/cp/ChangeLog: * decl.c (reshape_init_r): Only call has_designator_check when first_initializer_p or for the inner constructor element. (cp_complete_array_type): Call reshape_init on braced-init-list. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/desig20.C: New test. --- gcc/cp/decl.c| 42 +-- gcc/testsuite/g++.dg/cpp2a/desig20.C | 48 2 files changed, 65 insertions(+), 25 deletions(-) diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 2ddf0e4a524..83a2d3bf8f1 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6824,28 +6824,31 @@ reshape_init_r (tree type, reshape_iter *d, tree first_initializer_p, if (TREE_CODE (type) == ARRAY_TYPE && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type { - tree str_init = init; - tree stripped_str_init = stripped_init; + tree arr_init = init; + tree stripped_arr_init = stripped_init; This renaming seems unnecessary; OK without the name change. + reshape_iter stripd = {}; /* Strip one level of braces if and only if they enclose a single element (as allowed by [dcl.init.string]). */ if (!first_initializer_p - && TREE_CODE (stripped_str_init) == CONSTRUCTOR - && CONSTRUCTOR_NELTS (stripped_str_init) == 1) + && TREE_CODE (stripped_arr_init) == CONSTRUCTOR + && CONSTRUCTOR_NELTS (stripped_arr_init) == 1) { - str_init = (*CONSTRUCTOR_ELTS (stripped_str_init))[0].value; - stripped_str_init = tree_strip_any_location_wrapper (str_init); + stripd.cur = CONSTRUCTOR_ELT (stripped_arr_init, 0); + arr_init = stripd.cur->value; + stripped_arr_init = tree_strip_any_location_wrapper (arr_init); } /* If it's a string literal, then it's the initializer for the array as a whole. Otherwise, continue with normal initialization for array types (one value per array element). */ - if (TREE_CODE (stripped_str_init) == STRING_CST) + if (TREE_CODE (stripped_arr_init) == STRING_CST) { - if (has_designator_problem (d, complain)) + if ((first_initializer_p && has_designator_problem (d, complain)) + || (stripd.cur && has_designator_problem (&stripd, complain))) return error_mark_node; d->cur++; - return str_init; + return arr_init; } } @@ -9545,22 +9548,11 @@ cp_complete_array_type (tree *ptype, tree initial_value, bool do_default) if (initial_value) { /* An array of character type can be initialized from a -brace-enclosed string constant. - -FIXME: this code is duplicated from reshape_init. Probably -we should just call reshape_init here? */ - if (char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (*ptype))) - && TREE_CODE (initial_value) == CONSTRUCTOR - && !vec_safe_is_empty (CONSTRUCTOR_ELTS (initial_value))) - { - vec *v = CONSTRUCTOR_ELTS (initial_value); - tree value = (*v)[0].value; - STRIP_ANY_LOCATION_WRAPPER (value); - - if (TREE_CODE (value) == STRING_CST - && v->length () == 1) - initial_value = value; - } +brace-enclosed string constant so call reshape_init to +remove the optional braces from a braced string literal. */ + if (BRACE_ENCLOSED_INITIALIZER_P (initial_value)) + initial_value = reshape_init (*ptype, initial_value, + tf_warning_or_error); /* If any of the elements are parameter packs, we can't actually complete this type now because the array size is dependent. 
*/ diff --git a/gcc/testsuite/g++.dg/cpp2a/desig20.C b/gcc/testsuite/g++.dg/cpp2a/desig20.C new file mode 100644 index 000..daadfa58855 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/desig20.C @@ -0,0 +1,48 @@ +// PR c++/55227 +// Test designated initializer for char array by string constant + +// { dg-options "" } + +struct C {char a[2];}; + +/* Case a, designated, unbraced, string-literal of the exact same size + as the initialized char array; valid and accepted before and after. */ +C a = {.a="a"}; + +/* Cases b,c,d, designated, braced or mimatched-size, string literal, + previously rejected; "C99 designator 'a' outside aggregate initializer". */ +C b = {.a=""}; +C c = {.a={""}}; +C d = {.a={"a"}}; + +/* Case e, designated char array field and braced, designated array element(s) + (with GNU [N]= extension) valid and accepted before and after. */ +C e = {.a={[0]='a'}}; + +/* Cases f,g,h, braced string literal, 'designated' within inner braces; + invalid, previously accepted a
Re: [COMMITTED] path solver: Compute ranges in path in gimple order.
Pushed. Sorry for the noise. On Thu, Nov 25, 2021 at 1:51 PM Aldy Hernandez wrote: > > On Thu, Nov 25, 2021 at 1:38 PM Richard Biener > wrote: > > > > On Thu, Nov 25, 2021 at 1:10 PM Aldy Hernandez wrote: > > > > > > On Thu, Nov 25, 2021 at 12:57 PM Richard Biener > > > wrote: > > > > > > > > On Thu, Nov 25, 2021 at 11:55 AM Aldy Hernandez via Gcc-patches > > > > wrote: > > > > > > > > > > Andrew's patch for this PR103254 papered over some underlying > > > > > performance issues in the path solver that I'd like to address. > > > > > > > > > > We are currently solving the SSA's defined in the current block in > > > > > bitmap order, which amounts to random order for all purposes. This is > > > > > causing unnecessary recursion in gori. This patch changes the order > > > > > to gimple order, thus solving dependencies before uses. > > > > > > > > > > There is no change in threadable paths with this change. > > > > > > > > > > Tested on x86-64 & ppc64le Linux. > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > PR tree-optimization/103254 > > > > > * gimple-range-path.cc > > > > > (path_range_query::compute_ranges_defined): New > > > > > (path_range_query::compute_ranges_in_block): Move to > > > > > compute_ranges_defined. > > > > > * gimple-range-path.h (compute_ranges_defined): New. > > > > > --- > > > > > gcc/gimple-range-path.cc | 33 ++--- > > > > > gcc/gimple-range-path.h | 1 + > > > > > 2 files changed, 23 insertions(+), 11 deletions(-) > > > > > > > > > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc > > > > > index 4aa666d2c8b..e24086691c4 100644 > > > > > --- a/gcc/gimple-range-path.cc > > > > > +++ b/gcc/gimple-range-path.cc > > > > > @@ -401,6 +401,27 @@ path_range_query::compute_ranges_in_phis > > > > > (basic_block bb) > > > > > } > > > > > } > > > > > > > > > > +// Compute ranges defined in block. > > > > > + > > > > > +void > > > > > +path_range_query::compute_ranges_defined (basic_block bb) > > > > > +{ > > > > > + int_range_max r; > > > > > + > > > > > + compute_ranges_in_phis (bb); > > > > > + > > > > > + // Iterate in gimple order to minimize recursion. > > > > > + for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); > > > > > gsi_next (&gsi)) > > > > > > > > gsi_next_nondebug (&gsi)? > > > > > > > > Of course this all has the extra cost of iterating over a possibly > > > > very large > > > > BB for just a few bits in m_imports? How often does m_imports have > > > > exactly one bit set? > > > > > > Hmmm, good point. > > > > > > Perhaps this isn't worth it then. I mean, the underlying bug I'm > > > tackling is an excess of outgoing edge ranges, not the excess > > > recursion this patch attacks. > > > > > > If you think the cost would be high for large ILs, I can revert the patch. > > > > I think so. If ordering is important then that should be achieved in some > > other ways (always a bit difficult for on-demand infrastructure). > > Nah, this isn't a correctness issue. It's not worth it. > > I will revert the patch. > > Thanks. > Aldy From f21dc29d923f559c069fbd0b32e473f5a76de12c Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Thu, 25 Nov 2021 17:30:07 +0100 Subject: [PATCH] path solver: Revert computation of ranges in gimple order. Revert the patch below, as it may slow down compilation with large CFGs. commit 8acbd7bef6edbf537e3037174907029b530212f6 Author: Aldy Hernandez Date: Wed Nov 24 09:43:36 2021 +0100 path solver: Compute ranges in path in gimple order. 
--- gcc/gimple-range-path.cc | 33 +++-- gcc/gimple-range-path.h | 1 - 2 files changed, 11 insertions(+), 23 deletions(-) diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index 806bce9ff11..b9c71226c1c 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -401,27 +401,6 @@ path_range_query::compute_ranges_in_phis (basic_block bb) } } -// Compute ranges defined in block. - -void -path_range_query::compute_ranges_defined (basic_block bb) -{ - int_range_max r; - - compute_ranges_in_phis (bb); - - // Iterate in gimple order to minimize recursion. - for (auto gsi = gsi_start_nondebug_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) -if (gimple_has_lhs (gsi_stmt (gsi))) - { - tree name = gimple_get_lhs (gsi_stmt (gsi)); - if (TREE_CODE (name) == SSA_NAME - && bitmap_bit_p (m_imports, SSA_NAME_VERSION (name)) - && range_defined_in_block (r, name, bb)) - set_cache (r, name); - } -} - // Compute ranges defined in the current block, or exported to the // next block. @@ -444,7 +423,17 @@ path_range_query::compute_ranges_in_block (basic_block bb) clear_cache (name); } - compute_ranges_defined (bb); + // Solve imports defined in this block, starting with the PHIs... + compute_ranges_in_phis (bb); + // ...and then the rest of the imports. + EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
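The shape of the restored loop (a sketch reconstructed from the truncated hunk above): solve only the names in m_imports, instead of walking every statement of the block:

int_range_max r;
bitmap_iterator bi;
unsigned i;
EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
  {
    tree name = ssa_name (i);
    if (range_defined_in_block (r, name, bb))
      set_cache (r, name);
  }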
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hello, On 24/11/2021 at 22:32, Harald Anlauf via Fortran wrote: diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 5a5aca10ebe..837eb0912c0 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape, { gfc_constructor *c; bool test; + gfc_constructor_base b; + if (shape->expr_type == EXPR_ARRAY) + b = shape->value.constructor; + else if (shape->expr_type == EXPR_VARIABLE) + b = shape->symtree->n.sym->value->value.constructor; This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor. Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to another parameter, etc. The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is. The rest looks good. In the test, can you add a comment telling what it is testing? Something like: "This tests that constant shape expressions passed to the reshape intrinsic are properly simplified before being used to diagnose invalid values" We also used to put a comment mentioning the person who submitted the test, but not everybody seems to do it these days. Mikael
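A sketch of the suggested fix (hypothetical; it assumes gfc_copy_expr and gfc_reduce_init_expr behave as in the rest of the front end):

/* Reduce the shape expression, whatever its form (intrinsic call, sum of
   parameters, reference to another parameter, ...), to a constant array
   constructor before walking its elements.  */
gfc_expr *shape_exp = gfc_copy_expr (shape);
if (gfc_reduce_init_expr (shape_exp))
  b = shape_exp->value.constructor;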
[commit][master+OG11] amdgcn: Fix ICE generating CFI [PR103396]
I've committed this patch to fix the amdgcn ICE reported in PR103396. The problem was that it was mis-counting the number of registers to save when the link register was only clobbered implicitly by calls. The issue is easily fixed by adjusting the condition to match elsewhere in the same function. Committed to master and backported to devel/omp/gcc-11. It should affect GCC 11. Andrew

amdgcn: Fix ICE generating CFI [PR103396] gcc/ChangeLog: PR target/103396 * config/gcn/gcn.c (move_callee_saved_registers): Ensure that the number of spilled registers is counted correctly. diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 75a9c576694..2bde88afc32 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -2785,7 +2785,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets, int start = (regno == VGPR_REGNO (7) ? 64 : 0); int count = MIN (saved_scalars - start, 64); int add_lr = (regno == VGPR_REGNO (6) - && df_regs_ever_live_p (LINK_REGNUM)); + && offsets->lr_needs_saving); int lrdest = -1; rtvec seq = rtvec_alloc (count + add_lr);
Re: [PATCH] ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
Hi, On Thu, Nov 25 2021, Jan Hubicka wrote: >> >> gcc/ChangeLog: >> >> 2021-11-23 Martin Jambor >> >> PR ipa/103227 >> * ipa-prop.h (ipa_get_param): New overload. Move bits of the existing >> one to the new one. >> * ipa-param-manipulation.h (ipa_param_adjustments): New member >> function get_updated_index_or_split. >> * ipa-param-manipulation.c >> (ipa_param_adjustments::get_updated_index_or_split): New function. >> * ipa-prop.c (adjust_agg_replacement_values): Reimplement, add >> capability to identify scalarized parameters and perform substitution >> on them. >> (ipcp_transform_function): Create descriptors earlier, handle new >> return values of adjust_agg_replacement_values. >> >> gcc/testsuite/ChangeLog: >> >> 2021-11-23 Martin Jambor >> >> PR ipa/103227 >> * gcc.dg/ipa/pr103227-1.c: New test. >> * gcc.dg/ipa/pr103227-3.c: Likewise. >> * gcc.dg/ipa/pr103227-2.c: Likewise. >> * gfortran.dg/pr53787.f90: Disable IPA-SRA. >> --- >> gcc/ipa-param-manipulation.c | 33 >> gcc/ipa-param-manipulation.h | 7 +++ >> gcc/ipa-prop.c| 73 +++ >> gcc/ipa-prop.h| 15 -- >> gcc/testsuite/gcc.dg/ipa/pr103227-1.c | 29 +++ >> gcc/testsuite/gcc.dg/ipa/pr103227-2.c | 29 +++ >> gcc/testsuite/gcc.dg/ipa/pr103227-3.c | 52 +++ >> gcc/testsuite/gfortran.dg/pr53787.f90 | 2 +- >> 8 files changed, 216 insertions(+), 24 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-1.c >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-2.c >> create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103227-3.c >> >> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c >> index cec1dba701f..479c20b3871 100644 >> --- a/gcc/ipa-param-manipulation.c >> +++ b/gcc/ipa-param-manipulation.c >> @@ -449,6 +449,39 @@ ipa_param_adjustments::get_updated_indices (vec >> *new_indices) >> } >> } >> >> +/* If a parameter with original INDEX has survived intact, return its new >> + index. Otherwise return -1. In that case, if it has been split and >> there >> + is a new parameter representing a portion at unit OFFSET for which a >> value >> + of a TYPE can be substituted, store its new index into SPLIT_INDEX, >> + otherwise store -1 there. */ >> +int >> +ipa_param_adjustments::get_updated_index_or_split (int index, >> + unsigned unit_offset, >> + tree type, int *split_index) >> +{ >> + unsigned adj_len = vec_safe_length (m_adj_params); >> + for (unsigned i = 0; i < adj_len ; i++) > > In ipa-modref I precompute this to map so we do not need to walk all > params, but the loop is probably not bad since functions do not have > tens of thousdands parameters :) The most I have seen is about 70 and those were big outliers. I was thinking of precomputing it somehow but for one parameter there can be up to param ipa-sra-max-replacements replacements (default 8 - and there is another, by default stricter, limit for pointers). So it would have to be a hash table or something like it. > > Can I use it in ipa-modref to discover what parameters was turned from > by-reference to scalar, too? IIUC, I don't think you directly can, also because for one parameter you can have more scalar replacements and the interface needs an offset for which to look. OTOH, if you only care about simple scalars passed by reference, then passing zero as offset - and probably adding a flag to check there are no replacements at other offsets - would work. (But that information could also be easily pre-computed.) 
>> +{ >> + ipa_adjusted_param *apm = &(*m_adj_params)[i]; >> + if (apm->base_index != index) >> +continue; >> + if (apm->op == IPA_PARAM_OP_COPY) >> +return i; >> + if (apm->op == IPA_PARAM_OP_SPLIT >> + && apm->unit_offset == unit_offset) >> +{ >> + if (useless_type_conversion_p (apm->type, type)) >> +*split_index = i; >> + else >> +*split_index = -1; >> + return -1; >> +} >> +} >> + >> + *split_index = -1; >> + return -1; >> +} >> + >> /* Return the original index for the given new parameter index. Return a >> negative number if not available. */ >> >> diff --git a/gcc/ipa-param-manipulation.h b/gcc/ipa-param-manipulation.h >> index 5adf8a22356..d1dad9fac73 100644 >> --- a/gcc/ipa-param-manipulation.h >> +++ b/gcc/ipa-param-manipulation.h >> @@ -236,6 +236,13 @@ public: >>void get_surviving_params (vec *surviving_params); >>/* Fill a vector with new indices of surviving original parameters. */ >>void get_updated_indices (vec *new_indices); >> + /* If a parameter with original INDEX has survived intact, return its new >> + index. Otherwise return -1. In that case, if it has been spli
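A hypothetical caller-side sketch of the new API:

int split_idx;
int new_idx
  = adjustments->get_updated_index_or_split (orig_idx, unit_offset, type,
                                             &split_idx);
if (new_idx >= 0)
  /* The parameter survived intact at NEW_IDX.  */;
else if (split_idx >= 0)
  /* The piece at UNIT_OFFSET was split into the new parameter at
     SPLIT_IDX and a value can be substituted directly.  */;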
Re: [PATCH] ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)
> > > > In ipa-modref I precompute this into a map so we do not need to walk all > > params, but the loop is probably not bad since functions do not have > > tens of thousands of parameters :) > > The most I have seen is about 70 and those were big outliers. > > I was thinking of precomputing it somehow but for one parameter there > can be up to param ipa-sra-max-replacements replacements (default 8 - > and there is another, by default stricter, limit for pointers). So it > would have to be a hash table or something like it. Yep, I think given that we have the API, we can play with this later. > > > > > Can I use it in ipa-modref to discover what parameters were turned from > > by-reference to scalar, too? > > IIUC, I don't think you directly can, also because for one parameter you > can have more scalar replacements and the interface needs an offset for > which to look. OTOH, if you only care about simple scalars passed by > reference, then passing zero as offset - and probably adding a flag to > check there are no replacements at other offsets - would work. (But > that information could also be easily pre-computed.) If a parameter is broken up into multiple pieces, I can just duplicate its ECF flags (if I know that pointers from the whole structure do not escape, neither do pointers from its parts). However, presently modref computes nothing useful for aggregate parameters (I have a patch for that, but it was too late in this stage1 to push out everything, so it will come next stage1). If a parameter is turned from by-reference to scalar and is possibly offsetted, I can use the original ECF flag after applying the deref_flags translation. Again, it is not a problem to multiply it if the parameter is split into multiple subparameters. Honza
Re: [PATCH 3/4] libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup
On Tue, Nov 23, 2021 at 06:56:14PM +0100, Florian Weimer wrote: > 8<----------8< > This allows switching to a different implementation for > PT_GNU_EH_FRAME lookup in a subsequent commit. > > This moves some of the PT_GNU_EH_FRAME parsing out of the glibc loader > lock that is implied by dl_iterate_phdr. However, the FDE is already > parsed outside the lock before this change, so this does not introduce > additional crashes in case of a concurrent dlclose. > > libgcc/ChangeLog > > * unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Add hdr. > Remove func, ret. > (find_fde_tail): New function. Split from > _Unwind_IteratePhdrCallback. Move the result initialization > from _Unwind_Find_FDE. > (_Unwind_Find_FDE): Updated to call find_fde_tail. LGTM, thanks. Jakub
PING^3 [PATCH v4 0/2] Implement indirect external access
On Mon, Nov 1, 2021 at 7:02 AM H.J. Lu wrote: > > On Thu, Oct 21, 2021 at 12:56 PM H.J. Lu wrote: > > > > On Wed, Sep 22, 2021 at 7:02 PM H.J. Lu wrote: > > > > > > Changes in the v4 patch. > > > > > > 1. Add nodirect_extern_access attribute. > > > > > > Changes in the v3 patch. > > > > > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to > > > GNU binutils 2.38. But the -z indirect-extern-access linker option is > > > only available for Linux/x86. However, the --max-cache-size=SIZE linker > > > option was also addded within a day. --max-cache-size=SIZE is used to > > > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support. > > > > > > Changes in the v2 patch. > > > > > > 1. Rename the option to -fdirect-extern-access. > > > > > > --- > > > On systems with copy relocation: > > > * A copy in executable is created for the definition in a shared library > > > at run-time by ld.so. > > > * The copy is referenced by executable and shared libraries. > > > * Executable can access the copy directly. > > > > > > Issues are: > > > * Overhead of a copy, time and space, may be visible at run-time. > > > * Read-only data in the shared library becomes read-write copy in > > > executable at run-time. > > > * Local access to data with the STV_PROTECTED visibility in the shared > > > library must use GOT. > > > > > > On systems without function descriptor, function pointers vary depending > > > on where and how the functions are defined. > > > * If the function is defined in executable, it can be the address of > > > function body. > > > * If the function, including the function with STV_PROTECTED visibility, > > > is defined in the shared library, it can be the address of the PLT entry > > > in executable or shared library. > > > > > > Issues are: > > > * The address of function body may not be used as its function pointer. > > > * ld.so needs to search loaded shared libraries for the function pointer > > > of the function with STV_PROTECTED visibility. > > > > > > Here is a proposal to remove copy relocation and use canonical function > > > pointer: > > > > > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must > > > use GOT. > > > a. Linker may optimize out GOT access if the data is defined in PIE or > > > non-PIE. > > > 2. Read-only data in the shared library remain read-only at run-time > > > 3. Address of global data with the STV_PROTECTED visibility in the shared > > > library is the address of data body. > > > a. Can use IP-relative access. > > > b. May need GOT without IP-relative access. > > > 4. For systems without function descriptor, > > > a. All global function pointers of undefined functions in PIE and > > > non-PIE must use GOT. Linker may optimize out GOT access if the > > > function is defined in PIE or non-PIE. > > > b. Function pointer of functions with the STV_PROTECTED visibility in > > > executable and shared library is the address of function body. > > >i. Can use IP-relative access. > > >ii. May need GOT without IP-relative access. > > >iii. Branches to undefined functions may use PLT. > > > 5. Single global definition marker: > > > > > > Add GNU_PROPERTY_1_NEEDED: > > > > > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO > > > > > > to indicate the needed properties by the object file. 
> > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
> > >
> > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
> > >
> > > to indicate that the object file requires canonical function pointers and
> > > cannot be used with copy relocation.  This bit should be cleared in
> > > executable when there are non-GOT or non-PLT relocations in relocatable
> > > input files without this bit set.
> > >
> > >   a. Protected symbol access within the shared library can be treated
> > >   as local.
> > >   b. Copy relocation should be disallowed at link-time and run-time.
> > >   c. GOT function pointer reference is required at link-time and
> > >   run-time.
> > >
> > > The indirect external access marker can be used in the following ways:
> > >
> > > 1. Linker can decide the best way to resolve a relocation against a
> > > protected symbol before seeing all relocations against the symbol.
> > > 2. Dynamic linker can decide if it is an error to have a copy relocation
> > > in executable against the protected symbol in a shared library by checking
> > > if the shared library is built with -fno-direct-extern-access.
> > >
> > > Add a compiler option, -fdirect-extern-access.  -fdirect-extern-access is
> > > the default.  With -fno-direct-extern-access:
> > >
> > > 1. Always use GOT to access undefined symbols, including in PIE and
> > > non-PIE.  This is safe to do and does not break the ABI.
> > > 2. In executable and shared library, for symbols with the STV_PROTECTED
> > > visibility:
> > >   a. The address of data symbol is the address of data body.
> > >   b. For systems without function descriptor, the function
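A hypothetical illustration of the protected-visibility cases described above (our own example, not from the patch).  Built into a shared library with -fno-direct-extern-access, both addresses below are the addresses of the definitions themselves (IP-relative where possible), not of a copy or a PLT entry in the executable:

__attribute__ ((visibility ("protected"))) int counter;

__attribute__ ((visibility ("protected"))) int
get_counter (void)
{
  return counter;
}

/* Inside the library, these resolve locally.  */
int *
counter_address (void)
{
  return &counter;              /* Address of the data body.  */
}

int (*
get_pointer (void)) (void)
{
  return get_counter;           /* Canonical function pointer.  */
}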
[PATCH take 3] ivopts: Improve code generated for very simple loops.
On Tue, Nov 23, 2021 at 12:46 PM Richard Biener <richard.guent...@gmail.com> wrote:
> On Thu, Nov 18, 2021 at 4:18 PM Roger Sayle wrote:
> > > The patch doesn't add any testcase.
> >
> > The three new attached tests check that the critical invariants have a
> > simpler form, and hopefully shouldn't be affected by whether the
> > optimizer and/or backend costs actually decide to perform this iv
> > substitution or not.
>
> The testcases might depend on lp64 though, did you test them with -m32?
> IMHO it's fine to require lp64 here.

Great catch.  You're right that when the loop index has the same precision
as the target's pointer, fold is (already) able to simplify the
((EXPR)-1)+1, so that with -m32 my new tests ivopts-[567].c fail.  I've
added "require lp64" to those tests, but I've also added two more tests,
using char and unsigned char for the loop expression, which are optimized
on both ilp32 and lp64.  For example, with -O2 -m32, we see the following
improvements in ivopts-8.c:

diff ivopts-8.old.s ivopts-8.new.s
14,16c14,15
< 	subl	$1, %ecx
< 	movzbl	%cl, %ecx
< 	leal	4(%eax,%ecx,4), %ecx
---
> 	movsbl	%cl, %ecx
> 	leal	(%eax,%ecx,4), %ecx

This might also explain why GCC currently generates sub-optimal code.
Back when ivopts was written, most folks were on i686, so the generated
code was optimal.  But with the transition to x86_64, the code is correct,
just slightly less efficient.

> I'm a bit unsure about adding this special-casing in cand_value_at in
> general - it does seem that we're doing sth wrong elsewhere - either by
> not simplifying even though enough knowledge is there or by throwing
> away knowledge earlier (during niter analysis?).

I agree this approach is a bit ugly.  Conceptually, an alternative might
be to avoid throwing away knowledge earlier, during niter analysis, by
adding an extra tree field to the tree_niter_desc structure, so that it
returns both niter0 (the iteration count at the top of the loop) and
niter1 (the iteration count at the bottom of the loop), so that later
passes (cand_value_at) can use the tree that's relevant.  Alas, this too
is ugly, and inefficient as we're creating/folding trees that may never
be used/useful.  A compromise might be to add an enum field describing
how the niter was calculated to tree_niter_desc, and this can be
inspected/used by cand_value_at.  The current patch figures this out by
examining the other fields already in tree_niter_desc.

> Anyway, the patch does look quite safe - can you show some statistics in
> how many times there's extra simplification this way during say bootstrap?

Certainly.  During stage2 and stage3 of a bootstrap on x86_64-pc-linux-gnu,
cand_value_at is called 500657 times.  The majority of calls, 447607
(89.4%), request the value at the end of the loop (after_adjust), while
53050 (10.6%) request the value at the start of the loop.

102437 calls (20.5%) are optimized by clause 1 [0..N loops]
27939 calls (5.6%) are optimized by clause 2 [beg..end loops]

Looking for opportunities to improve things further, I see that

319608 calls (63.8%) have a LT_EXPR exit test.
160965 calls (32.2%) have a NE_EXPR exit test.
20084 calls (4.0%) have a GT_EXPR exit test.

so handling descending loops wouldn’t be a big win.  I'll investigate
whether (constant) step sizes other than 1 are (i) sufficiently common
and (ii) benefit from improved folding.

This revised patch has been tested on x86_64-pc-linux-gnu with a make
bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.  Ok for mainline?
2021-11-25  Roger Sayle

gcc/ChangeLog
	* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
	tree_niter_desc* argument instead of just a tree for NITER.
	If we require the iv candidate value at the end of the final
	loop iteration, try using the original loop bound as the
	NITER for sufficiently simple loops.
	(may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite
	* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
	* gcc.dg/tree-ssa/ivopts-5.c: New test case.
	* gcc.dg/tree-ssa/ivopts-6.c: New test case.
	* gcc.dg/tree-ssa/ivopts-7.c: New test case.
	* gcc.dg/tree-ssa/ivopts-8.c: New test case.
	* gcc.dg/tree-ssa/ivopts-9.c: New test case.

Roger
--

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 4769b65..067f823 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5030,28 +5030,57 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   return !sum_cost.infinite_cost_p ();
 }

-/* Computes value of candidate CAND at position AT in iteration NITER, and
-   stores it to VAL.  */
+/* Computes value of candidate CAND at position AT in iteration DESC->NITER,
+   and stores it to VAL.  */

 static void
-cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at,
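To make the discussion concrete, here is a hypothetical loop of the shape the new char/unsigned char tests exercise (our own example, not one of the attached testcases): a narrow induction variable whose value at the loop exit cand_value_at must compute when eliminating the counter.

int a[256];

void
f (unsigned char n)
{
  for (unsigned char i = 0; i < n; i++)
    a[i] = i;
}

Because the induction variable is narrower than the pointer, the ((EXPR)-1)+1 pattern is not folded away by the generic machinery, which is where the patch helps on both ilp32 and lp64.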
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;
+
+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;

This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor.

there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative.  Only in those cases where the full "if ()'s" pass do we set shape_is_const = true; and proceed.  The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again.  Only then should the above cited code segment get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.  For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had

else if (shape->expr_type == EXPR_VARIABLE
	 && shape->ref
	 && shape->ref->u.ar.type == AR_FULL
	 && shape->ref->u.ar.dimen == 1
	 && shape->ref->u.ar.as
	 && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
	 && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
	 && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
	 && shape->symtree->n.sym->value)

In which situations do I miss anything new?

Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to an other parameter, etc.

E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.

The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is.

Can you give an example where it fails?  I think the current code would almost certainly fail, too.

The rest looks good.  In the test, can you add a comment telling what it is testing?  Something like: "This tests that constant shape expressions passed to the reshape intrinsic are properly simplified before being used to diagnose invalid values"

Can do.

We also used to put a comment mentioning the person who submitted the test, but not everybody seems to do it these days.

Can do.

Mikael

Harald
[committed] libstdc++: Do not use memset in constexpr calls to ranges::fill_n [PR101608]
Tested x86_64-linux, pushed to trunk.

libstdc++-v3/ChangeLog:

	PR libstdc++/101608
	* include/bits/ranges_algobase.h (__fill_n_fn): Check for
	constant evaluation before using memset.
	* testsuite/25_algorithms/fill_n/constrained.cc: Check
	byte-sized values as well.
---
 libstdc++-v3/include/bits/ranges_algobase.h | 28 ---
 .../25_algorithms/fill_n/constrained.cc     |  6 ++--
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
index c8c4d032983..9929e5e828b 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -527,17 +527,25 @@ namespace ranges
       if (__n <= 0)
	 return __first;

-      // TODO: Generalize this optimization to contiguous iterators.
-      if constexpr (is_pointer_v<_Out>
-		    // Note that __is_byte already implies !is_volatile.
-		    && __is_byte<remove_pointer_t<_Out>>::__value
-		    && integral<_Tp>)
-	{
-	  __builtin_memset(__first, static_cast<unsigned char>(__value), __n);
-	  return __first + __n;
-	}
-      else if constexpr (is_scalar_v<_Tp>)
+      if constexpr (is_scalar_v<_Tp>)
	 {
+	  // TODO: Generalize this optimization to contiguous iterators.
+	  if constexpr (is_pointer_v<_Out>
+			// Note that __is_byte already implies !is_volatile.
+			&& __is_byte<remove_pointer_t<_Out>>::__value
+			&& integral<_Tp>)
+	    {
+#ifdef __cpp_lib_is_constant_evaluated
+	      if (!std::is_constant_evaluated())
+#endif
+		{
+		  __builtin_memset(__first,
+				   static_cast<unsigned char>(__value),
+				   __n);
+		  return __first + __n;
+		}
+	    }
+
	  const auto __tmp = __value;
	  for (; __n > 0; --__n, (void)++__first)
	    *__first = __tmp;

diff --git a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
index 6a015d34a89..1d1e1c104d4 100644
--- a/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/fill_n/constrained.cc
@@ -73,11 +73,12 @@ test01()
     }
 }

+template<typename T>
 constexpr bool
 test02()
 {
   bool ok = true;
-  int x[6] = { 1, 2, 3, 4, 5, 6 };
+  T x[6] = { 1, 2, 3, 4, 5, 6 };
   const int y[6] = { 1, 2, 3, 4, 5, 6 };
   const int z[6] = { 17, 17, 17, 4, 5, 6 };

@@ -94,5 +95,6 @@ int
 main()
 {
   test01();
-  static_assert(test02());
+  static_assert(test02<int>());
+  static_assert(test02<char>()); // PR libstdc++/101608
 }
--
2.31.1
[PATCH v2] elf: Add _dl_find_object function
I have reworded the previous patch to make the interface more generally useful.  Since there are now four words in the core arrays, I did away with the separate base address array.  (We can bring it back in the future if necessary.)  I fixed a bug in the handling of proxy maps (by not copying proxy maps during the dlopen update).  The placement of the function is also different, as explained in the commit message.  The performance seems unchanged.

I haven't included the obvious future performance enhancements in this patch, and also did not update Arm's __gnu_Unwind_Find_exidx to use the new interface.  I think this work can be done in follow-up patches.

Thanks,
Florian

Subject: elf: Add _dl_find_object function

It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.  If libgcc switches to _dl_find_object, this namespace issue will be fixed.  It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so).

It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested by Torvald Riegel.  Two copies of the supporting data structures are used, also achieving full async-signal-safety.

---
 NEWS | 4 +
 bits/dl_find_object.h | 32 +
 dlfcn/Makefile | 2 +-
 dlfcn/dlfcn.h | 22 +
 elf/Makefile | 47 +-
 elf/Versions | 3 +
 elf/dl-close.c | 4 +
 elf/dl-find_object.c | 841 +
 elf/dl-find_object.h | 115 +++
 elf/dl-libc_freeres.c | 2 +
 elf/dl-open.c | 5 +
 elf/dl-support.c | 3 +
 elf/libc-dl_find_object.c | 26 +
 elf/rtld.c | 11 +
 elf/rtld_static_init.c | 1 +
 elf/tst-dl_find_object-mod1.c | 10 +
 elf/tst-dl_find_object-mod2.c | 15 +
 elf/tst-dl_find_object-mod3.c | 10 +
 elf/tst-dl_find_object-mod4.c | 10 +
 elf/tst-dl_find_object-mod5.c | 11 +
 elf/tst-dl_find_object-mod6.c | 11 +
 elf/tst-dl_find_object-mod7.c | 10 +
 elf/tst-dl_find_object-mod8.c | 10 +
 elf/tst-dl_find_object-mod9.c | 10 +
 elf/tst-dl_find_object-static.c | 22 +
 elf/tst-dl_find_object-threads.c | 275 +++
 elf/tst-dl_find_object.c | 240 ++
 include/atomic_wide_counter.h | 14 +
 include/bits/dl_find_object.h | 1 +
 include/dlfcn.h | 2 +
 include/link.h | 3 +
 manual/Makefile | 2 +-
 manual/dynlink.texi | 137
 manual/libdl.texi | 10 -
 manual/probes.texi | 2 +-
 manual/threads.texi | 2 +-
 sysdeps/arm/bits/dl_find_object.h | 25 +
 sysdeps/generic/ldsodefs.h | 5 +
 sysdeps/mach/hurd/i386/libc.abilist | 1 +
 sysdeps/nios2/bits/dl_find_object.h | 25 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/microblaze/be/libc.abilist | 1 +
 sysdeps/unix/sysv/linux/microblaze/le/libc.abilist | 1 +
 .../unix/sysv/linux/mips/m
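A minimal sketch of calling the new interface (the dlfo_* field names follow the glibc API this patch introduces; error handling and async-signal-safety concerns are omitted here).  _dl_find_object returns 0 on success:

#include <dlfcn.h>
#include <stdio.h>

static void
describe_pc (void *pc)
{
  struct dl_find_object dlfo;
  if (_dl_find_object (pc, &dlfo) == 0)
    printf ("object [%p, %p), eh_frame=%p\n",
	    dlfo.dlfo_map_start, dlfo.dlfo_map_end, dlfo.dlfo_eh_frame);
  else
    puts ("no object contains this address");
}

Unlike dl_iterate_phdr, this lookup does not take the loader lock, which is the basis for the unwinder speedup discussed in the follow-up libgcc patch.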
Re: [PATCH 4/4] libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE
* Jakub Jelinek:

>> +/* Fallback declaration for old glibc headers.  DL_FIND_EH_FRAME_DBASE is used
>> +   as a proxy to determine if <link.h> declares _dl_find_eh_frame.  */
>> +#if defined __GLIBC__ && !defined DL_FIND_EH_FRAME_DBASE
>> +#if NEED_DBASE_MEMBER
>> +void *_dl_find_eh_frame (void *__pc, void **__dbase) __attribute__ ((weak));
>> +#else
>> +void *_dl_find_eh_frame (void *__pc) __attribute__ ((weak));
>> +#endif
>> +#define USE_DL_FIND_EH_FRAME 1
>> +#define DL_FIND_EH_FRAME_CONDITION (_dl_find_eh_frame != NULL)
>> +#endif
>
> I'd prefer not to do this.  If we find glibc with the support in the
> headers, let's use it, otherwise let's keep using what we were doing before.

I've included a simplified version below, based on the _dl_find_object patch for glibc.

This is a bit difficult to test, but I ran a full toolchain bootstrap with GCC + glibc on all glibc-supported architectures (except Hurd and one m68k variant; they do not presently build, see Joseph's testers).  I also tested this by copying the respective GCC-built libgcc_s into a glibc build tree for run-time testing on i686-linux-gnu and x86_64-linux-gnu.  There weren't any issues.  There are a bunch of unwinder tests in glibc, giving at least some coverage.

Thanks,
Florian

Subject: libgcc: Use _dl_find_object in _Unwind_Find_FDE

libgcc/ChangeLog:

	* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call _dl_find_object
	if available.

---
 libgcc/unwind-dw2-fde-dip.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index fbb0fbdebb9..b837d8e4904 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -504,6 +504,24 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   if (ret != NULL)
     return ret;

+  /* Use DLFO_STRUCT_HAS_EH_DBASE as a proxy for the existence of a glibc-style
+     _dl_find_object function.  */
+#ifdef DLFO_STRUCT_HAS_EH_DBASE
+  {
+    struct dl_find_object dlfo;
+    if (_dl_find_object (pc, &dlfo) == 0)
+      return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
+# if DLFO_STRUCT_HAS_EH_DBASE
+			    (_Unwind_Ptr) dlfo.dlfo_eh_dbase,
+# else
+			    NULL,
+# endif
+			    bases);
+    else
+      return NULL;
+  }
+#endif /* DLFO_STRUCT_HAS_EH_DBASE */
+
   data.pc = (_Unwind_Ptr) pc;
 #if NEED_DBASE_MEMBER
   data.dbase = NULL;
Re: [PATCH] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Le 25/11/2021 à 21:03, Harald Anlauf a écrit :

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;
+
+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;

This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor.

there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative.  Only in those cases where the full "if ()'s" pass we set shape_is_const = true; and proceed.  The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again.  Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.  For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had

else if (shape->expr_type == EXPR_VARIABLE
	 && shape->ref
	 && shape->ref->u.ar.type == AR_FULL
	 && shape->ref->u.ar.dimen == 1
	 && shape->ref->u.ar.as
	 && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
	 && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
	 && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
	 && shape->symtree->n.sym->value)

In which situations do I miss anything new?

Yes, I agree with all of this.  My comment wasn’t about a check on shape->expr_type, but on shape->value->expr_type if shape->expr_type is a (parameter) variable.

Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to an other parameter, etc.

E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.

The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is.

Can you give an example where it fails?  I think the current code would almost certainly fail, too.

Probably, I was just trying to avoid followup bugs. ;-)

I have checked the following:

integer, parameter :: a(2) = [1,1]
integer, parameter :: b(2) = a + 1
print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would.  So yes, I was wrong; b has been expanded to an array before.

Can you add an assert or a comment saying that the parameter value has been expanded to a constant array?

Ok with that change.
Re: [PATCH v7] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]
Hi!

On Wed, Nov 24, 2021 at 08:48:47PM -0300, Raoni Fassina Firmino wrote:
> gcc/ChangeLog:
> 	* builtins.c (expand_builtin_fegetround): New function.
> 	(expand_builtin_feclear_feraise_except): New function.
> 	(expand_builtin): Add cases for BUILT_IN_FEGETROUND,
> 	BUILT_IN_FECLEAREXCEPT and BUILT_IN_FERAISEEXCEPT

Something is missing here (maybe just a full stop?)

> 	* config/rs6000/rs6000.md (fegetroundsi): New pattern.
> 	(feclearexceptsi): New Pattern.
> 	(feraiseexceptsi): New Pattern.
> 	* doc/extend.texi: Add a new introductory paragraph about the
> 	new builtins.

Pet peeve: please don't break lines early, we have only 72 columns per
line and we have many long symbol names.  Trying to make many lines very
short only results in everything looking very irregular, which is harder
to read.

> 	* doc/md.texi: (fegetround@var{m}): Document new optab.
> 	(feclearexcept@var{m}): Document new optab.
> 	(feraiseexcept@var{m}): Document new optab.
> 	* optabs.def (fegetround_optab): New optab.
> 	(feclearexcept_optab): New optab.
> 	(feraiseexcept_optab): New optab.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-1.c: New
> 	test.
> 	* gcc.target/powerpc/builtin-feclearexcept-feraiseexcept-2.c: New
> 	test.
> 	* gcc.target/powerpc/builtin-fegetround.c: New test.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6860,6 +6860,117 @@
>    [(set_attr "type" "fpload")
>     (set_attr "length" "8")
>     (set_attr "isa" "*,p8v,p8v")])
> +
> +;; int fegetround(void)
> +;;
> +;; This expansion for the C99 function only expands for compatible
> +;; target libcs.  Because it needs to return one of FE_DOWNWARD,
> +;; FE_TONEAREST, FE_TOWARDZERO or FE_UPWARD with the values as defined
> +;; by the target libc, and since they are free to
> +;; choose the values and the expand needs to know then beforehand,
> +;; this expand only expands for target libcs that it can handle the
> +;; values is knows.
> +;; Because of these restriction, this only expands on the desired
> +;; case and fallback to a call to libc on any otherwise.
> +(define_expand "fegetroundsi"

(This needs some wordsmithing.)

> +;; int feclearexcept(int excepts)
> +;;
> +;; This expansion for the C99 function only works when EXCEPTS is a
> +;; constant known at compile time and specifies any one of
> +;; FE_INEXACT, FE_DIVBYZERO, FE_UNDERFLOW and FE_OVERFLOW flags.
> +;; It doesn't handle values out of range, and always returns 0.

It FAILs the expansion if a parameter is bad?  Is this comment out of
date?

> +;; Note that FE_INVALID is unsupported because it maps to more than
> +;; one bit of the FPSCR register.

It could be implemented, now that you check for the libc used.  It is a
fixed part of the ABI :-)

> +;; The FE_* are defined in the targed libc, and since they are free to
> +;; choose the values and the expand needs to know then beforehand,

s/then/them/

> +;; this expand only expands for target libcs that it can handle the

(this expander)

> +;; values is knows.

s/is/it/

> +/* This testcase ensures that the builtins expand with the matching arguments
> + * or otherwise fallback gracefully to a function call, and don't ICE during
> + * compilation.
> + * "-fno-builtin" option is used to enable calls to libc implementation of the
> + * gcc builtins tested when not using __builtin_ prefix. */

Don't use leading * in comments, btw.  This is a testcase so anything
goes, but FYI :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/builtin-fegetround.c
> +  int i, rounding, expected;
> +  const int rm[] = {FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD};
> +  for (i = 0; i < sizeof(rm); i++)

That should be sizeof rm / sizeof rm[0] ?  It accesses out of bounds as
it is.

Maybe test more values?  At least 0, but also combinations of these FE_
bits, and maybe even FE_INVALID?

With such changes the rs6000 parts are okay for trunk.  Thanks!

I looked at the generic changes as well, and they all look fine to me.


Segher
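As a footnote to the sizeof issue flagged above, the fixed iteration would look roughly like this (a sketch; the loop body and the __builtin_fegetround spelling are our assumptions based on the test's name, not the committed testcase):

#include <fenv.h>

void
check_rounding_modes (void)
{
  const int rm[] = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
  /* sizeof rm is the byte size of the array, so divide by the element
     size to get the element count.  */
  for (unsigned i = 0; i < sizeof rm / sizeof rm[0]; i++)
    {
      fesetround (rm[i]);
      if (__builtin_fegetround () != rm[i])
	__builtin_abort ();
    }
}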
Re: libstdc++: Make atomic::wait() const [PR102994]
On Wed, 24 Nov 2021 at 01:27, Thomas Rodgers wrote:
>
> const qualification was also missing in the free functions for
> wait/wait_explicit/notify_one/notify_all. Revised patch attached.

Please tweak the whitespace in the new test:

> +test1(const std::atomic &a, char*p)

The '&' should be on the type not the variable, and there should be a space before 'p':

> +test1(const std::atomic& a, char* p)

OK for trunk and gcc-11 with that tweak, thanks!
[PATCH, v2] PR fortran/103411 - ICE in gfc_conv_array_initializer, at fortran/trans-array.c:6377
Hi Mikael,

Am 25.11.21 um 22:02 schrieb Mikael Morin:

Le 25/11/2021 à 21:03, Harald Anlauf a écrit :

Hi Mikael,

Am 25.11.21 um 17:46 schrieb Mikael Morin:

Hello,

Le 24/11/2021 à 22:32, Harald Anlauf via Fortran a écrit :

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 5a5aca10ebe..837eb0912c0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -4866,10 +4868,17 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
 {
   gfc_constructor *c;
   bool test;
+  gfc_constructor_base b;
+
+  if (shape->expr_type == EXPR_ARRAY)
+    b = shape->value.constructor;
+  else if (shape->expr_type == EXPR_VARIABLE)
+    b = shape->symtree->n.sym->value->value.constructor;

This misses a check that shape->symtree->n.sym->value is an array, so that it makes sense to access its constructor.

there are checks further above for the cases shape->expr_type == EXPR_ARRAY and for shape->expr_type == EXPR_VARIABLE which look at the elements of array shape to see if they are non-negative.  Only in those cases where the full "if ()'s" pass we set shape_is_const = true; and proceed.  The purpose of the auxiliary bool shape_is_const is to avoid repeating the lengthy if's again.  Only then the above cited code segment should get executed.

For shape->expr_type == EXPR_ARRAY there is really no change in logic.  For shape->expr_type == EXPR_VARIABLE the above snippet is now executed, but then we already had

else if (shape->expr_type == EXPR_VARIABLE
	 && shape->ref
	 && shape->ref->u.ar.type == AR_FULL
	 && shape->ref->u.ar.dimen == 1
	 && shape->ref->u.ar.as
	 && shape->ref->u.ar.as->lower[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->lower[0]->ts.type == BT_INTEGER
	 && shape->ref->u.ar.as->upper[0]->expr_type == EXPR_CONSTANT
	 && shape->ref->u.ar.as->upper[0]->ts.type == BT_INTEGER
	 && shape->symtree->n.sym->attr.flavor == FL_PARAMETER
	 && shape->symtree->n.sym->value)

In which situations do I miss anything new?

Yes, I agree with all of this.  My comment wasn’t about a check on shape->expr_type, but on shape->value->expr_type if shape->expr_type is a (parameter) variable.

Actually, this only supports the case where the parameter value is defined by an array; but it could be an intrinsic call, a sum of parameters, a reference to an other parameter, etc.

E.g. the following (still) does get rejected:

print *, reshape([1,2,3,4,5], a+1)
print *, reshape([1,2,3,4,5], a+a)
print *, reshape([1,2,3,4,5], 2*a)
print *, reshape([1,2,3,4,5], [3,3])
print *, reshape([1,2,3,4,5], spread(3,dim=1,ncopies=2))

and has been rejected before.

The usual way to handle this is to call gfc_reduce_init_expr which (pray for it) will make an array out of whatever the shape expression is.

Can you give an example where it fails?  I think the current code would almost certainly fail, too.

Probably, I was just trying to avoid followup bugs. ;-)

I have checked the following:

integer, parameter :: a(2) = [1,1]
integer, parameter :: b(2) = a + 1
print *, reshape([1,2,3,4], b)
end

and it doesn’t fail as I thought it would.

well, that one is actually valid, since b=[2,2].

So yes, I was wrong; b has been expanded to an array before.

Motivated by your reasoning I tried gfc_reduce_init_expr.  That attempt failed miserably (many regressions), and I think it is not right.  Then I found that array sections posed a problem that wasn't detected before.  gfc_simplify_expr seemed to be a better choice that makes more sense for the present situations and seems to work here.  And it even detects many more invalid cases now than e.g. Intel ;-)

I've updated the patch and testcase accordingly.

Can you add an assert or a comment saying that the parameter value has been expanded to a constant array?

Ok with that change.

Given the above discussion, I'll give you another day or two to have a further look.  Otherwise Gerhard will... ;-)

Cheers,
Harald

From 56fd0d23ac0a5bda802e5cce3024b947e497555a Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Thu, 25 Nov 2021 22:39:44 +0100
Subject: [PATCH] Fortran: improve check of arguments to the RESHAPE intrinsic

gcc/fortran/ChangeLog:

	PR fortran/103411
	* check.c (gfc_check_reshape): Improve check of size of source
	array for the RESHAPE intrinsic against the given shape when pad
	is not given, and shape is a parameter.  Try other
	simplifications of shape.

gcc/testsuite/ChangeLog:

	PR fortran/103411
	* gfortran.dg/pr68153.f90: Adjust test to improved check.
	* gfortran.dg/reshape_7.f90: Likewise.
	* gfortran.dg/reshape_9.f90: New test.
---
 gcc/fortran/check.c                     | 22 +-
 gcc/testsuite/gfortran.dg/pr68153.f90   |  2 +-
 gcc/testsuite/gfortran.dg/reshape_7.f90 |  2 +-
 gcc/testsuite/gfortran.dg/reshape_9.f90 | 24 ++++
 4 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 gcc/t
[PATCH] x86: Add -mmove-max=bits and -mstore-max=bits
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move and store, independent of -mprefer-vector-width=bits:

1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES, which are enabled for the Intel Sapphire Rapids processor.
2. Add -mmove-max=bits to set the maximum number of bits that can be moved from memory to memory efficiently.  The default value is derived from X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES and the preferred vector width.
3. Add -mstore-max=bits to set the maximum number of bits that can be stored to memory efficiently.  The default value is derived from X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the preferred vector width.

gcc/

	PR target/103269
	* config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
	and PVW_NONE to ix86_target_string.
	* config/i386/i386-options.c (ix86_target_string): Add arguments
	for move_max and store_max.
	(ix86_target_string::add_vector_width): New lambda.
	(ix86_debug_options): Pass ix86_move_max and ix86_store_max to
	ix86_target_string.
	(ix86_function_specific_print): Pass ptr->x_ix86_move_max and
	ptr->x_ix86_store_max to ix86_target_string.
	(ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
	x_ix86_store_max.
	(ix86_option_override_internal): Set the default x_ix86_move_max
	and x_ix86_store_max.
	* config/i386/i386-options.h (ix86_target_string): Add
	prefer_vector_width and prefer_vector_width.
	* config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
	(TARGET_AVX256_STORE_BY_PIECES): Likewise.
	(MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
	PVW_AVX512.  Use 32 if ix86_move_max or ix86_store_max >=
	PVW_AVX256.
	(STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
	Use 32 if ix86_store_max >= PVW_AVX256.
	* config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
	* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
	(X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
	* doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.

gcc/testsuite/

	PR target/103269
	* gcc.target/i386/pieces-memcpy-17.c: New test.
	* gcc.target/i386/pieces-memcpy-18.c: Likewise.
	* gcc.target/i386/pieces-memcpy-19.c: Likewise.
	* gcc.target/i386/pieces-memcpy-20.c: Likewise.
	* gcc.target/i386/pieces-memcpy-21.c: Likewise.
	* gcc.target/i386/pieces-memset-45.c: Likewise.
	* gcc.target/i386/pieces-memset-46.c: Likewise.
	* gcc.target/i386/pieces-memset-47.c: Likewise.
	* gcc.target/i386/pieces-memset-48.c: Likewise.
	* gcc.target/i386/pieces-memset-49.c: Likewise.
--- gcc/config/i386/i386-expand.c | 1 + gcc/config/i386/i386-options.c| 75 +-- gcc/config/i386/i386-options.h| 6 +- gcc/config/i386/i386.h| 18 ++--- gcc/config/i386/i386.opt | 8 ++ gcc/config/i386/x86-tune.def | 10 +++ gcc/doc/invoke.texi | 13 .../gcc.target/i386/pieces-memcpy-17.c| 16 .../gcc.target/i386/pieces-memcpy-18.c| 16 .../gcc.target/i386/pieces-memcpy-19.c| 16 .../gcc.target/i386/pieces-memcpy-20.c| 16 .../gcc.target/i386/pieces-memcpy-21.c| 16 .../gcc.target/i386/pieces-memset-45.c| 16 .../gcc.target/i386/pieces-memset-46.c| 17 + .../gcc.target/i386/pieces-memset-47.c| 17 + .../gcc.target/i386/pieces-memset-48.c| 17 + .../gcc.target/i386/pieces-memset-49.c| 16 17 files changed, 276 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-45.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-46.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-47.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-48.c create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-49.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 0d5d1a0e205..7e77ff56ddc 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -12295,6 +12295,7 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget, char *opts = ix86_target_string (bisa, bisa2, 0, 0, NULL, NULL, (enum fpmath_unit) 0,
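A hypothetical illustration of the new options (our own example, not one of the new testcases).  Compiled with, e.g., -O2 -mmove-max=512 -mstore-max=512 and AVX-512 enabled (or simply -march=sapphirerapids, which enables the new tunes), this 64-byte copy can be expanded inline as a single 512-bit load/store pair instead of two 256-bit ones:

void
copy64 (char *dst, const char *src)
{
  __builtin_memcpy (dst, src, 64);
}

The point of separating these knobs from -mprefer-vector-width= is that a target can keep 256-bit vectorization defaults while still using 512-bit moves for memcpy/memset expansion.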
[committed] libstdc++: Make std::pointer_traits SFINAE-friendly [PR96416]
Tested x86_64-linux, pushed to trunk. This implements the resolution I'm proposing for LWG 3545, to avoid hard errors when using std::to_address for types that make pointer_traits ill-formed. Consistent with std::iterator_traits, instantiating std::pointer_traits for a non-pointer type will be well-formed, but give an empty type with no member types. This avoids the problematic cases for std::to_address. Additionally, the pointer_to member is now only declared when the element type is not cv void (and for C++20, when the function body would be well-formed). The rebind member was already SFINAE-friendly in our implementation. libstdc++-v3/ChangeLog: PR libstdc++/96416 * include/bits/ptr_traits.h (pointer_traits): Reimplement to be SFINAE-friendly (LWG 3545). * testsuite/20_util/pointer_traits/lwg3545.cc: New test. * testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line. * testsuite/20_util/to_address/lwg3545.cc: New test. --- libstdc++-v3/include/bits/ptr_traits.h| 167 +- .../20_util/pointer_traits/lwg3545.cc | 120 + .../testsuite/20_util/to_address/1_neg.cc | 2 +- .../testsuite/20_util/to_address/lwg3545.cc | 12 ++ 4 files changed, 251 insertions(+), 50 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/pointer_traits/lwg3545.cc create mode 100644 libstdc++-v3/testsuite/20_util/to_address/lwg3545.cc diff --git a/libstdc++-v3/include/bits/ptr_traits.h b/libstdc++-v3/include/bits/ptr_traits.h index 115b86d43e4..4987fa9942f 100644 --- a/libstdc++-v3/include/bits/ptr_traits.h +++ b/libstdc++-v3/include/bits/ptr_traits.h @@ -35,6 +35,7 @@ #include #if __cplusplus > 201703L +#include #define __cpp_lib_constexpr_memory 201811L namespace __gnu_debug { struct _Safe_iterator_base; } #endif @@ -45,55 +46,119 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __undefined; - // Given Template return T, otherwise invalid. + // For a specialization `SomeTemplate` the member `type` is T, + // otherwise `type` is `__undefined`. template struct __get_first_arg { using type = __undefined; }; - template class _Template, typename _Tp, + template class _SomeTemplate, typename _Tp, typename... _Types> -struct __get_first_arg<_Template<_Tp, _Types...>> +struct __get_first_arg<_SomeTemplate<_Tp, _Types...>> { using type = _Tp; }; - template -using __get_first_arg_t = typename __get_first_arg<_Tp>::type; - - // Given Template and U return Template, otherwise invalid. + // For a specialization `SomeTemplate` and a type `U` the member + // `type` is `SomeTemplate`, otherwise there is no member `type`. template struct __replace_first_arg { }; - template class _Template, typename _Up, + template class _SomeTemplate, typename _Up, typename _Tp, typename... _Types> -struct __replace_first_arg<_Template<_Tp, _Types...>, _Up> -{ using type = _Template<_Up, _Types...>; }; +struct __replace_first_arg<_SomeTemplate<_Tp, _Types...>, _Up> +{ using type = _SomeTemplate<_Up, _Types...>; }; - template -using __replace_first_arg_t = typename __replace_first_arg<_Tp, _Up>::type; - - template -using __make_not_void - = __conditional_t::value, __undefined, _Tp>; - - /** - * @brief Uniform interface to all pointer-like types - * @ingroup pointer_abstractions - */ +#if __cpp_concepts + // When concepts are supported detection of _Ptr::element_type is done + // by a requires-clause, so __ptr_traits_elem_t only needs to do this: template -struct pointer_traits +using __ptr_traits_elem_t = typename __get_first_arg<_Ptr>::type; +#else + // Detect the element type of a pointer-like type. 
+ template +struct __ptr_traits_elem : __get_first_arg<_Ptr> +{ }; + + // Use _Ptr::element_type if is a valid type. + template +struct __ptr_traits_elem<_Ptr, __void_t> +{ using type = typename _Ptr::element_type; }; + + template +using __ptr_traits_elem_t = typename __ptr_traits_elem<_Ptr>::type; +#endif + + // Define pointer_traits::pointer_to. + template::value> +struct __ptr_traits_ptr_to +{ + using pointer = _Ptr; + using element_type = _Elt; + + /** + * @brief Obtain a pointer to an object + * @param __r A reference to an object of type `element_type` + * @return `pointer::pointer_to(__e)` + * @pre `pointer::pointer_to(__e)` is a valid expression. + */ + static pointer + pointer_to(element_type& __e) +#if __cpp_lib_concepts + requires requires { + { pointer::pointer_to(__e) } -> convertible_to; + } +#endif + { return pointer::pointer_to(__e); } +}; + + // Do not define pointer_traits::pointer_to if element type is void. + template +struct __ptr_traits_ptr_to<_Ptr, _Elt, true> +{ }; + + // Partial specialization defining pointer_traits::pointer_to(T&). + template +
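A sketch of the user-visible effect (C++20; the concept name here is our own, not part of the library): with the LWG 3545 change, std::pointer_traits of a non-pointer-like type is a valid but empty specialization, so a constraint can probe it without triggering a hard error.

#include <memory>

struct not_pointer_like { };

template<typename P>
  concept has_element_type
    = requires { typename std::pointer_traits<P>::element_type; };

// Previously, instantiating pointer_traits<not_pointer_like> was a
// hard error; now the specialization is simply empty.
static_assert( !has_element_type<not_pointer_like> );
static_assert( has_element_type<int*> );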
[committed] libstdc++: Remove dg-error that no longer happens
Tested x86_64-linux, pushed to trunk.

There was a c++11_only dg-error in this testcase, for a "body of constexpr function is not a return statement" diagnostic that was bogus, but happened because the return statement was ill-formed.  A change to G++ earlier this month means that diagnostic is no longer emitted, so remove the dg-error.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/tuple/comparison_operators/overloaded2.cc:
	Remove dg-error for C++11_only error.
---
 .../testsuite/20_util/tuple/comparison_operators/overloaded2.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
index bac16ffd521..6a7a584c71e 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/comparison_operators/overloaded2.cc
@@ -52,4 +52,3 @@ auto b = a < a;
 // { dg-error "no match for 'operator<'" "" { target c++20 } 0 }
 // { dg-error "no match for .*_Synth3way|in requirements" "" { target c++20 } 0 }
 // { dg-error "ordered comparison" "" { target c++17_down } 0 }
-// { dg-error "not a return-statement" "" { target c++11_only } 0 }
--
2.31.1
[r12-5531 Regression] FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls" on Linux/x86_64
On Linux/x86_64, 1b0acc4b800b589a39d637d7312da5cf969a5765 is the first bad commit

commit 1b0acc4b800b589a39d637d7312da5cf969a5765
Author: Jan Hubicka
Date:   Thu Nov 25 23:58:48 2021 +0100

    Remove forgotten early return in ipa_value_range_from_jfunc

caused

FAIL: gcc.dg/ipa/inline-9.c scan-ipa-dump inline "Inlined 1 calls"

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5531/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/inline-9.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[PATCH v3 0/8] __builtin_dynamic_object_size
This patchset implements the __builtin_dynamic_object_size builtin for gcc.  The primary motivation to have this builtin in gcc is to enable _FORTIFY_SOURCE=3 support with gcc, thus allowing greater fortification in use cases where the potential performance tradeoff is acceptable.

Semantics:
----------

__builtin_dynamic_object_size has the same signature as __builtin_object_size; it accepts a pointer and type ranging from 0 to 3 and it returns an object size estimate for the pointer based on an analysis of which objects the pointer could point to.  The actual properties of the object size estimate are different:

- In the best case __builtin_dynamic_object_size evaluates to an expression that represents a precise size of the object being pointed to.

- In case a precise object size expression cannot be evaluated, __builtin_dynamic_object_size attempts to evaluate an estimate size expression based on the object size type.

- In what situations the builtin returns an estimate vs a precise expression is an implementation detail and may change in future.  Users must always assume, as in the case of __builtin_object_size, that the returned value is the maximum or minimum based on the object size type they have provided.

- In the worst case of failure, __builtin_dynamic_object_size returns a constant (size_t)-1 or (size_t)0.

Implementation:
---------------

- The __builtin_dynamic_object_size support is implemented in tree-object-size.  In most cases, in the first pass (early_objsz) the builtin is treated like __builtin_object_size to preserve subobject bounds.

- Each element of the object_sizes vector is now a TREE_VEC of size 2 holding bytes to the end of the object and the full size of the object.  This allows proper handling of negative offsets, allowing them to the extent of the whole object bounds.  This improves __builtin_object_size usage too with negative offsets, consistently returning valid results for pointer decrementing loops too.

- The patchset begins with structural modification of the tree-object-size pass, followed by enhancement to return size expressions.  I have split the implementation into one feature per patch (calls, function parameters, PHI, etc.) to hopefully ease review.

Performance:
------------

Expressions generated by this pass in theory could be arbitrarily complex.  I have not made an attempt to limit nesting of objects since it seemed too early to do that.  In practice based on the few applications I built, most of the complexity of the expressions got folded away.  Even so, the performance overhead is likely to be non-zero.  If we find performance degradation to be significant, we could later add nesting limits to bail out if a size expression gets too complex.

I have implemented simplification of __*_chk to their normal variants if we can determine at compile time that it is safe.  This should limit the performance overhead of the expressions in valid cases.

Build time performance doesn't seem to be affected much based on an unscientific check to time `make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`.  It only increases by about a couple of seconds when the dynamic tests are added and remains more or less in the same ballpark otherwise.

Testing:
--------

I have added tests for dynamic object sizes as well as wrappers for all __builtin_object_size tests to provide wide coverage.  I have also done a full bootstrap build and test run on x86_64.
I have also built bash, cmake, wpa_supplicant and systemtap with _FORTIFY_SOURCE=2 and _FORTIFY_SOURCE=3 (with a hacked up glibc to make sure it works) and saw no issues in any of those builds.  I did some rudimentary analysis of the generated binaries using fortify-metrics[1] to confirm that there was a difference in coverage between the two fortification levels.  Here is a summary of coverage in the above packages:

F = number of fortified calls
T = Total number of calls to fortifiable functions (fortified as well as unfortified)
C = F * 100 / T

Package         F(2)   T(2)   F(3)   T(3)   C(2)     C(3)
bash             428   1220   1005   1196   35.08%   84.03%
wpa_supplicant  1635   3232   2350   3408   50.59%   68.96%
systemtap        324   1990    343   1994   16.28%   17.20%
cmake            830  14181    958  14196    5.85%    6.75%

The numbers are slightly lower than the previous patch series because in the interim I pushed an improvement to folding of the _chk builtins so that they can use ranges to simplify the calls to their regular variants.  Also note that even _FORTIFY_SOURCE=2 coverage should be improved due to negative offset handling.

Additional testing plans (i.e. I've already started to do some of this):

- Build packages to compare values returned by __builtin_object_size with the older pass and this new one.  Also compare with __builtin_dynamic_object_size.
- Expand the list of packages to get more coverage metrics.
- Explore performance impact on ap
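A minimal illustration (our own example, not from the testsuite) of the semantics described above: for a dynamically sized allocation the builtin evaluates to an expression in terms of n, where __builtin_object_size could only return (size_t)-1.

#include <stdlib.h>

size_t
allocated_size (size_t n)
{
  char *p = (char *) malloc (n);
  size_t sz = __builtin_dynamic_object_size (p, 0);  /* Folds to n.  */
  free (p);
  return sz;
}

This is what lets _FORTIFY_SOURCE=3 check buffers whose size is only known at run time.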
[PATCH v3 1/8] tree-object-size: Replace magic numbers with enums
A simple cleanup to allow inserting dynamic size code more easily. gcc/ChangeLog: * tree-object-size.c: New enum. (object_sizes, computed, addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, merge_object_sizes, plus_stmt_object_size, collect_object_sizes_for, init_object_sizes, fini_object_sizes, object_sizes_execute): Replace magic numbers with enums. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/tree-object-size.c | 59 -- 1 file changed, 34 insertions(+), 25 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 4334e05ef70..5e93bb74f92 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -45,6 +45,13 @@ struct object_size_info unsigned int *stack, *tos; }; +enum +{ + OST_SUBOBJECT = 1, + OST_MINIMUM = 2, + OST_END = 4, +}; + static tree compute_object_offset (const_tree, const_tree); static bool addr_object_size (struct object_size_info *, const_tree, int, unsigned HOST_WIDE_INT *); @@ -67,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct object_size_info *, tree, the subobject (innermost array or field with address taken). object_sizes[2] is lower bound for number of bytes till the end of the object and object_sizes[3] lower bound for subobject. */ -static vec object_sizes[4]; +static vec object_sizes[OST_END]; /* Bitmaps what object sizes have been computed already. */ -static bitmap computed[4]; +static bitmap computed[OST_END]; /* Maximum value of offset we consider to be addition. */ static unsigned HOST_WIDE_INT offset_limit; @@ -227,11 +234,11 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, { unsigned HOST_WIDE_INT sz; - if (!osi || (object_size_type & 1) != 0 + if (!osi || (object_size_type & OST_SUBOBJECT) != 0 || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME) { compute_builtin_object_size (TREE_OPERAND (pt_var, 0), - object_size_type & ~1, &sz); + object_size_type & ~OST_SUBOBJECT, &sz); } else { @@ -266,7 +273,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, } else if (DECL_P (pt_var)) { - pt_var_size = decl_init_size (pt_var, object_size_type & 2); + pt_var_size = decl_init_size (pt_var, object_size_type & OST_MINIMUM); if (!pt_var_size) return false; } @@ -287,7 +294,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, { tree var; - if (object_size_type & 1) + if (object_size_type & OST_SUBOBJECT) { var = TREE_OPERAND (ptr, 0); @@ -528,7 +535,7 @@ bool compute_builtin_object_size (tree ptr, int object_size_type, unsigned HOST_WIDE_INT *psize) { - gcc_assert (object_size_type >= 0 && object_size_type <= 3); + gcc_assert (object_size_type >= 0 && object_size_type < OST_END); /* Set to unknown and overwrite just before returning if the size could be determined. */ @@ -546,7 +553,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (computed[object_size_type] == NULL) { - if (optimize || object_size_type & 1) + if (optimize || object_size_type & OST_SUBOBJECT) return false; /* When not optimizing, rather than failing, make a small effort @@ -586,8 +593,8 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (dump_file) { fprintf (dump_file, "Computing %s %sobject size for ", - (object_size_type & 2) ? "minimum" : "maximum", - (object_size_type & 1) ? "sub" : ""); + (object_size_type & OST_MINIMUM) ? "minimum" : "maximum", + (object_size_type & OST_SUBOBJECT) ? 
"sub" : ""); print_generic_expr (dump_file, ptr, dump_flags); fprintf (dump_file, ":\n"); } @@ -620,7 +627,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, terminate, it could take a long time. If a pointer is increasing this way, we need to assume 0 object size. E.g. p = &buf[0]; while (cond) p = p + 4; */ - if (object_size_type & 2) + if (object_size_type & OST_MINIMUM) { osi.depths = XCNEWVEC (unsigned int, num_ssa_names); osi.stack = XNEWVEC (unsigned int, num_ssa_names); @@ -679,8 +686,9 @@ compute_builtin_object_size (tree ptr, int object_size_type, fprintf (dump_file, ": %s %sobject size " HOST_WIDE_INT_PRINT_UNSIGNED "\n", -(object_size_type & 2) ? "minimum" : "maximum", -(object_size_type & 1) ? "sub" :
[PATCH v3 2/8] tree-object-size: Abstract object_sizes array
Put all accesses to object_sizes behind functions so that we can add dynamic capability more easily. gcc/ChangeLog: * tree-object-size.c (object_sizes_grow, object_sizes_release, object_sizes_unknown_p, object_sizes_get, object_size_set_force, object_sizes_set): New functions. (addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, unknown_object_size, merge_object_sizes, plus_stmt_object_size, cond_expr_object_size, collect_object_sizes_for, check_for_plus_in_loops_1, init_object_sizes, fini_object_sizes): Adjust. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. gcc/tree-object-size.c | 177 +++-- 1 file changed, 98 insertions(+), 79 deletions(-) diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 5e93bb74f92..3780437ff91 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -88,6 +88,71 @@ unknown (int object_size_type) return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1)); } +/* Grow object_sizes[OBJECT_SIZE_TYPE] to num_ssa_names. */ + +static inline void +object_sizes_grow (int object_size_type) +{ + if (num_ssa_names > object_sizes[object_size_type].length ()) +object_sizes[object_size_type].safe_grow (num_ssa_names, true); +} + +/* Release object_sizes[OBJECT_SIZE_TYPE]. */ + +static inline void +object_sizes_release (int object_size_type) +{ + object_sizes[object_size_type].release (); +} + +/* Return true if object_sizes[OBJECT_SIZE_TYPE][VARNO] is unknown. */ + +static inline bool +object_sizes_unknown_p (int object_size_type, unsigned varno) +{ + return (object_sizes[object_size_type][varno] + == unknown (object_size_type)); +} + +/* Return size for VARNO corresponding to OSI. */ + +static inline unsigned HOST_WIDE_INT +object_sizes_get (struct object_size_info *osi, unsigned varno) +{ + return object_sizes[osi->object_size_type][varno]; +} + +/* Set size for VARNO corresponding to OSI to VAL. */ + +static inline bool +object_sizes_set_force (struct object_size_info *osi, unsigned varno, + unsigned HOST_WIDE_INT val) +{ + object_sizes[osi->object_size_type][varno] = val; + return true; +} + +/* Set size for VARNO corresponding to OSI to VAL if it is the new minimum or + maximum. */ + +static inline bool +object_sizes_set (struct object_size_info *osi, unsigned varno, + unsigned HOST_WIDE_INT val) +{ + int object_size_type = osi->object_size_type; + if ((object_size_type & OST_MINIMUM) == 0) +{ + if (object_sizes[object_size_type][varno] < val) + return object_sizes_set_force (osi, varno, val); +} + else +{ + if (object_sizes[object_size_type][varno] > val) + return object_sizes_set_force (osi, varno, val); +} + return false; +} + /* Initialize OFFSET_LIMIT variable. 
*/ static void init_offset_limit (void) @@ -247,7 +312,7 @@ addr_object_size (struct object_size_info *osi, const_tree ptr, collect_object_sizes_for (osi, var); if (bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (var))) - sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; + sz = object_sizes_get (osi, SSA_NAME_VERSION (var)); else sz = unknown (object_size_type); } @@ -582,14 +647,14 @@ compute_builtin_object_size (tree ptr, int object_size_type, return false; } + struct object_size_info osi; + osi.object_size_type = object_size_type; if (!bitmap_bit_p (computed[object_size_type], SSA_NAME_VERSION (ptr))) { - struct object_size_info osi; bitmap_iterator bi; unsigned int i; - if (num_ssa_names > object_sizes[object_size_type].length ()) - object_sizes[object_size_type].safe_grow (num_ssa_names, true); + object_sizes_grow (object_size_type); if (dump_file) { fprintf (dump_file, "Computing %s %sobject size for ", @@ -601,7 +666,6 @@ compute_builtin_object_size (tree ptr, int object_size_type, osi.visited = BITMAP_ALLOC (NULL); osi.reexamine = BITMAP_ALLOC (NULL); - osi.object_size_type = object_size_type; osi.depths = NULL; osi.stack = NULL; osi.tos = NULL; @@ -678,8 +742,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, if (dump_file) { EXECUTE_IF_SET_IN_BITMAP (osi.visited, 0, i, bi) - if (object_sizes[object_size_type][i] - != unknown (object_size_type)) + if (!object_sizes_unknown_p (object_size_type, i)) { print_generic_expr (dump_file, ssa_name (i), dump_flags); @@ -689,7 +752,7 @@ compute_builtin_object_size (tree ptr, int object_size_type, ((object_size_type & OST_MINIMUM) ? "minimum"
[PATCH v3 3/8] tree-object-size: Save sizes as trees and support negative offsets
Transform tree-object-size to operate on tree objects instead of host wide integers. This makes it easier to extend to dynamic expressions for object sizes. The compute_builtin_object_size interface also now returns a tree expression instead of HOST_WIDE_INT, so callers have been adjusted to account for that. The trees in object_sizes are each a TREE_VEC with the first element being the bytes from the pointer to the end of the object and the second, the size of the whole object. This allows analysis of negative offsets, which can now be allowed to the extent of the object bounds. Tests have been added to verify that it actually works. gcc/ChangeLog: * tree-object-size.h (compute_builtin_object_size): Return tree instead of HOST_WIDE_INT. * builtins.c (fold_builtin_object_size): Adjust. * gimple-fold.c (gimple_fold_builtin_strncat): Likewise. * ubsan.c (instrument_object_size): Likewise. * tree-object-size.c (object_sizes): Change type to vec. (initval): New function. (unknown): Use it. (size_unknown_p, size_initval, size_unknown): New functions. (object_sizes_unknown_p): Use it. (object_sizes_get): Return tree. (object_sizes_initialize): Rename from object_sizes_set_force and set VAL parameter type as tree. Add new parameter WHOLEVAL. (object_sizes_set): Set VAL parameter type as tree and adjust implementation. Add new parameter WHOLEVAL. (size_for_offset): New function. (decl_init_size): Adjust comment. (addr_object_size): Change PSIZE parameter to tree and adjust implementation. Add new parameter PWHOLESIZE. (alloc_object_size): Return tree. (compute_builtin_object_size): Return tree in PSIZE. (expr_object_size, call_object_size, unknown_object_size): Adjust for object_sizes_set change. (merge_object_sizes): Drop OFFSET parameter and adjust implementation for tree change. (plus_stmt_object_size): Call collect_object_sizes_for directly instead of merge_object_size and call size_for_offset to get net size. (cond_expr_object_size, collect_object_sizes_for, object_sizes_execute): Adjust for change of type from HOST_WIDE_INT to tree. (check_for_plus_in_loops_1): Likewise and skip non-positive offsets. gcc/testsuite/ChangeLog: * gcc.dg/builtin-object-size-1.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-2.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-3.c (test9): New test. (main): Call it. * gcc.dg/builtin-object-size-4.c (test8): New test. (main): Call it. * gcc.dg/builtin-object-size-5.c (test5, test6, test7): New tests. Signed-off-by: Siddhesh Poyarekar --- Changes from v2: - Incorporated review suggestions. - Added support for negative offsets. 
 gcc/builtins.c                               |  10 +-
 gcc/gimple-fold.c                            |  11 +-
 gcc/testsuite/gcc.dg/builtin-object-size-1.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-2.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-3.c |  31 ++
 gcc/testsuite/gcc.dg/builtin-object-size-4.c |  30 ++
 gcc/testsuite/gcc.dg/builtin-object-size-5.c |  25 ++
 gcc/tree-object-size.c                       | 388 +++++++++++++---------
 gcc/tree-object-size.h                       |   2 +-
 gcc/ubsan.c                                  |   5 +-
 10 files changed, 403 insertions(+), 159 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 384864bfb3a..50e66692775 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10226,7 +10226,7 @@ maybe_emit_sprintf_chk_warning (tree exp, enum built_in_function fcode)
 static tree
 fold_builtin_object_size (tree ptr, tree ost)
 {
-  unsigned HOST_WIDE_INT bytes;
+  tree bytes;
   int object_size_type;
 
   if (!validate_arg (ptr, POINTER_TYPE)
@@ -10251,8 +10251,8 @@ fold_builtin_object_size (tree ptr, tree ost)
   if (TREE_CODE (ptr) == ADDR_EXPR)
     {
       compute_builtin_object_size (ptr, object_size_type, &bytes);
-      if (wi::fits_to_tree_p (bytes, size_type_node))
-	return build_int_cstu (size_type_node, bytes);
+      if (int_fits_type_p (bytes, size_type_node))
+	return fold_convert (size_type_node, bytes);
     }
   else if (TREE_CODE (ptr) == SSA_NAME)
     {
@@ -10260,8 +10260,8 @@ fold_builtin_object_size (tree ptr, tree ost)
	 later.  Maybe subsequent passes will help determining
	 it.  */
       if (compute_builtin_object_size (ptr, object_size_type, &bytes)
-	  && wi::fits_to_tree_p (bytes, size_type_node))
-	return build_int_cstu (size_type_node, bytes);
+	  && int_fits_type_p (bytes, size_type_node))
+	return fold_convert (size_type_node, bytes);
     }
 
   return NULL_TREE;
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 73f090b
[PATCH v3 4/8] __builtin_dynamic_object_size: Recognize builtin
Recognize the __builtin_dynamic_object_size builtin and add paths in
the object size pass to deal with it, but treat it like
__builtin_object_size for now.  Also add tests to provide the same
testing coverage for the new builtin name.

gcc/ChangeLog:

	* builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
	* tree-object-size.h: Move object size type bits enum from
	tree-object-size.c and add new value OST_DYNAMIC.
	* builtins.c (expand_builtin, fold_builtin_2): Handle it.
	(fold_builtin_object_size): Handle new builtin and adjust for
	change to compute_builtin_object_size.
	* tree-object-size.c: Include builtins.h.
	(compute_builtin_object_size): Adjust.
	(early_object_sizes_execute_one,
	dynamic_object_sizes_execute_one): New functions.
	(object_sizes_execute): Rename insert_min_max_p argument to
	early.  Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new
	functions.
	* doc/extend.texi (__builtin_dynamic_object_size): Document new
	builtin.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/builtin-dynamic-object-size1.C: New test.
	* g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
	* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-5.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-7.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-8.c: Likewise.
	* gcc.dg/builtin-dynamic-object-size-9.c: Likewise.
	* gcc.dg/builtin-object-size-16.c: Adjust to allow inclusion
	from builtin-dynamic-object-size-16.c.
	* gcc.dg/builtin-object-size-17.c: Likewise.

Signed-off-by: Siddhesh Poyarekar
---
Changes from v2:
- Incorporated review suggestions.
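For readers unfamiliar with the new builtin, a minimal usage sketch
(not part of the patch; the function name is invented): at this point
in the series it folds exactly like __builtin_object_size, so with a
constant-size object only the spelling differs, and later patches make
non-constant sizes work.

char buf[32];

__SIZE_TYPE__
bdos_example (void)
{
  /* With a constant-size object, both builtins fold to 32.  */
  return __builtin_dynamic_object_size (buf, 0);
}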
 gcc/builtins.c                                |  11 +-
 gcc/builtins.def                              |   1 +
 gcc/doc/extend.texi                           |  13 ++
 .../g++.dg/ext/builtin-dynamic-object-size1.C |   5 +
 .../g++.dg/ext/builtin-dynamic-object-size2.C |   5 +
 .../gcc.dg/builtin-dynamic-alloc-size.c       |   7 +
 .../gcc.dg/builtin-dynamic-object-size-1.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-10.c   |   9 ++
 .../gcc.dg/builtin-dynamic-object-size-11.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-12.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-13.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-14.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-15.c   |   5 +
 .../gcc.dg/builtin-dynamic-object-size-16.c   |   6 +
 .../gcc.dg/builtin-dynamic-object-size-17.c   |   7 +
 .../gcc.dg/builtin-dynamic-object-size-18.c   |   8 +
 .../gcc.dg/builtin-dynamic-object-size-19.c   | 104 ++++++++++++++++
 .../gcc.dg/builtin-dynamic-object-size-2.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-3.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-4.c    |   6 +
 .../gcc.dg/builtin-dynamic-object-size-5.c    |   7 +
 .../gcc.dg/builtin-dynamic-object-size-6.c    |   5 +
 .../gcc.dg/builtin-dynamic-object-size-7.c    |   5 +
 .../gcc.dg/builtin-dynamic-object-size-8.c    |   5 +
 .../gcc.dg/builtin-dynamic-object-size-9.c    |   5 +
 gcc/testsuite/gcc.dg/builtin-object-size-16.c |   2 +
 gcc/testsuite/gcc.dg/builtin-object-size-17.c |   2 +
 gcc/tree-object-size.c                        | 152 +++++++++++++---
 gcc/tree-object-size.h                        |  10 ++
 29 files changed, 378 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/builtin-dynamic-object-size2.C
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-alloc-size.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-10.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-11.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-12.c
 create mode 100644 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-13.c
 create mode 100644 gcc/testsuite/gcc
[PATCH v3 6/8] tree-object-size: Handle function parameters
Handle hints provided by __attribute__ ((access (...))) to compute
dynamic sizes for objects.

gcc/ChangeLog:

	* tree-object-size.c: Include tree-dfa.h.
	(parm_object_size): New function.
	(collect_object_sizes_for): Call it.

gcc/testsuite/ChangeLog:

	* gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple):
	New function.
	(main): Call it.

Signed-off-by: Siddhesh Poyarekar
---
 .../gcc.dg/builtin-dynamic-object-size-0.c    | 11 +++++
 gcc/tree-object-size.c                        | 50 ++++++++++++++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index ddedf6a49bd..ce0f4eb17f3 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -46,6 +46,14 @@ test_deploop (size_t sz, size_t cond)
   return __builtin_dynamic_object_size (bin, 0);
 }
 
+size_t
+__attribute__ ((access (__read_write__, 1, 2)))
+__attribute__ ((noinline))
+test_parmsz_simple (void *obj, size_t sz)
+{
+  return __builtin_dynamic_object_size (obj, 0);
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -64,6 +72,9 @@ main (int argc, char **argv)
     FAIL ();
   if (test_deploop (128, 129) != 32)
     FAIL ();
+  if (test_parmsz_simple (argv[0], __builtin_strlen (argv[0]) + 1)
+      != __builtin_strlen (argv[0]) + 1)
+    FAIL ();
 
   if (nfails > 0)
     __builtin_abort ();
diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 5b4dcb619cd..48b1ec6e26a 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "stringpool.h"
 #include "attribs.h"
 #include "builtins.h"
@@ -1446,6 +1447,53 @@ cond_expr_object_size (struct object_size_info *osi, tree var, gimple *stmt)
   return reexamine;
 }
 
+/* Find size of an object passed as a parameter to the function.  */
+
+static void
+parm_object_size (struct object_size_info *osi, tree var)
+{
+  int object_size_type = osi->object_size_type;
+  tree parm = SSA_NAME_VAR (var);
+
+  if (!(object_size_type & OST_DYNAMIC) || !POINTER_TYPE_P (TREE_TYPE (parm)))
+    expr_object_size (osi, var, parm);
+
+  /* Look for access attribute.  */
+  rdwr_map rdwr_idx;
+
+  tree fndecl = cfun->decl;
+  const attr_access *access = get_parm_access (rdwr_idx, parm, fndecl);
+  tree typesize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (parm)));
+  tree sz = NULL_TREE;
+
+  if (access && access->sizarg != UINT_MAX)
+    {
+      tree fnargs = DECL_ARGUMENTS (fndecl);
+      tree arg = NULL_TREE;
+      unsigned argpos = 0;
+
+      /* Walk through the parameters to pick the size parameter and
	 safely scale it by the type size.  */
+      for (arg = fnargs; argpos != access->sizarg && arg;
	   arg = TREE_CHAIN (arg), ++argpos);
+
+      if (arg != NULL_TREE && INTEGRAL_TYPE_P (TREE_TYPE (arg)))
+	{
+	  sz = get_or_create_ssa_default_def (cfun, arg);
+	  if (sz != NULL_TREE)
+	    {
+	      sz = fold_convert (sizetype, sz);
+	      if (typesize)
+		sz = size_binop (MULT_EXPR, sz, typesize);
+	    }
+	}
+    }
+  if (!sz)
+    sz = size_unknown (object_size_type);
+
+  object_sizes_set (osi, SSA_NAME_VERSION (var), sz, sz);
+}
+
 /* Compute an object size expression for VAR, which is the result of a PHI
    node.  */
@@ -1603,7 +1651,7 @@ collect_object_sizes_for (struct object_size_info *osi, tree var)
     case GIMPLE_NOP:
       if (SSA_NAME_VAR (var)
	  && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL)
-	expr_object_size (osi, var, SSA_NAME_VAR (var));
+	parm_object_size (osi, var);
       else
	/* Uninitialized SSA names point nowhere.  */
	unknown_object_size (osi, var);
-- 
2.31.1
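To illustrate the scaling logic in parm_object_size above, here is a
hedged variant of the test (not in the patch's testsuite; the function
name is invented): since typesize is TYPE_SIZE_UNIT of the pointee
type, a non-void pointer whose access attribute names an element-count
argument should report that count scaled by the element size.

__SIZE_TYPE__
__attribute__ ((access (__read_write__, 1, 2)))
__attribute__ ((noinline))
test_parmsz_scaled (int *obj, __SIZE_TYPE__ sz)
{
  /* For int *, typesize is sizeof (int), so this should evaluate to
     sz * sizeof (int); for void * (as in test_parmsz_simple), typesize
     is NULL_TREE and sz is used unscaled.  */
  return __builtin_dynamic_object_size (obj, 0);
}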