Re: Fix type compatibility for types with flexible array member [PR113688,PR114014,PR117724]

2024-11-30 Thread Martin Uecker
Am Dienstag, dem 26.11.2024 um 15:15 + schrieb Qing Zhao:
> 
> > On Nov 25, 2024, at 16:46, Martin Uecker  wrote:
> > 
> > 
> > Hi Qing,
> > 
> > Am Montag, dem 25.11.2024 um 17:40 + schrieb Qing Zhao:
> > > Hi, Martin,
> > > 
> > > I didn’t go through all the details of your patch.
> > > 
> > > But I have one question:
> > > 
> > > Did you consider the effect of the option -fstrict-flex-array 
> > > (https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays)
> > >  on how gcc treats the zero size trailing array, 1-element trailing array 
> > > as flexible array member in the patch?
> > 
> > I used the function which was already there which
> > does not take this into account.  For the new version
> > of the patch this should not matter anymore.
> 
> Why it’s not matter anymore?
> 
> For the following testing case:
> 
> struct S{int x,y[1];}*a;
> int main(void){
>  struct S{int x,y[];};
> }
> 
> With your latest patch,  the two structures are considered as compatible with 
> -g;
> However, if we add -fstrict-flex-array=2 or -fstrict-flex-array=3,  the 
> trailing array y[1] is NOT treated
> as FAM anymore, as a result, these two structure are NOT compatible too. 
> 
> Do I miss anything obvious? 

It is not about compatibility from a language semantic point of you
but for TBAA-compatibility which needs to be consistent with it but
can (and must be) more general.

For TBAA, I think we want 

struct foo { int x; int y[]; };

to be TBAA-compatible to

struct foo { int x; int y[3]; };

even when we do not treat the later as FAM (i.e. still forbid
out-of-bounds accesses).

E.g. see Richard's comment: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114713#c2


Martin

> Thanks.
> 
> Qing
> > 
> > Martin
> > 
> > 
> > > 
> > > thanks.
> > > 
> > > Qing
> > > > On Nov 23, 2024, at 14:45, Martin Uecker  wrote:
> > > > 
> > > > 
> > > > This patch tries fixes the errors we have because of
> > > > flexible array members.  I am bit unsure about the exception
> > > > for the mode. 
> > > > 
> > > > Bootstrapped and regression tested on x86_64.
> > > > 
> > > > 
> > > > 
> > > >   Fix type compatibility for types with flexible array member 
> > > > [PR113688,PR114014,PR117724]
> > > > 
> > > >   verify_type checks the compatibility of TYPE_CANONICAL using
> > > >   gimple_canonical_types_compatible_p.   But it is stricter than what 
> > > > the
> > > >   C standard requires and therefor inconsistent with how TYPE_CANONICAL 
> > > > is set
> > > >   in the C FE.  Here, the logic is changed to ignore array size when 
> > > > one of the
> > > >   types is a flexible array member.  To not get errors because of 
> > > > inconsistent
> > > >   number of members, zero-sized arrays are not ignored anymore when 
> > > > checking
> > > >   fields of a struct (which is stricter than what was done before).
> > > >   Finally, a exception is added that allows the TYPE_MODE of a type with
> > > >   flexible array member to differ from another compatible type.
> > > > 
> > > >   PR c/113688
> > > >   PR c/114014
> > > >   PR c/117724
> > > > 
> > > >   gcc/ChangeLog:
> > > >   * tree.cc (gimple_canonical_types_compatible_p): Revise
> > > >   logic for types with FAM.
> > > >   (verify_type): Add exception for mode for types with FAM.
> > > > 
> > > >   gcc/testsuite/ChangeLog:
> > > >   * gcc.dg/pr113688.c: New test.
> > > >   * gcc.dg/pr114014.c: New test.
> > > >   * gcc.dg/pr117724.c: New test.
> > > > 
> > > > diff --git a/gcc/testsuite/gcc.dg/pr113688.c 
> > > > b/gcc/testsuite/gcc.dg/pr113688.c
> > > > new file mode 100644
> > > > index 000..8dee8c86f1b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr113688.c
> > > > @@ -0,0 +1,8 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-g" } */
> > > > +
> > > > +struct S{int x,y[1];}*a;
> > > > +int main(void){
> > > > + struct S{int x,y[];};
> > > > +}
> > > > +
> > > > diff --git a/gcc/testsuite/gcc.dg/pr114014.c 
> > > > b/gcc/testsuite/gcc.dg/pr114014.c
> > > > new file mode 100644
> > > > index 000..ab783f4f85d
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr114014.c
> > > > @@ -0,0 +1,14 @@
> > > > +/* PR c/114014
> > > > + * { dg-do compile }
> > > > + * { dg-options "-std=c23 -g" } */
> > > > +
> > > > +struct r {
> > > > +  int a;
> > > > +  char b[];
> > > > +};
> > > > +struct r {
> > > > +  int a;
> > > > +  char b[0];
> > > > +}; /* { dg-error "redefinition" } */
> > > > +
> > > > +
> > > > diff --git a/gcc/testsuite/gcc.dg/pr117724.c 
> > > > b/gcc/testsuite/gcc.dg/pr117724.c
> > > > new file mode 100644
> > > > index 000..d631daeb644
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr117724.c
> > > > @@ -0,0 +1,16 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-g" } */
> > > > +
> > > > +struct {
> > > > +  unsigned long len;
> > > > +  unsigned long size;
> > > > +  char data[]

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-11-30 Thread Martin Uecker
Am Dienstag, dem 26.11.2024 um 20:59 + schrieb Qing Zhao:
> Think this over these days, I have another thought that need some feedback:
> 
> The major issue right now is:
> 
> 1. For the following structure in which the “counted_by” attributes is 
> attached to the pointer field.
> 
> struct foo {
>   int n;
>   char *p __attribute__ ((counted_by (n)));
> } *x;

BTW: I also do not like the syntax 'n' for a lookup of a member 
in the member namespace. I think it should be '.n'.  For FAM this 
is less problematic because it always at the end, but here it is 
problematic.
> 
> There is one important additional requirement:
> 
> x->n, x->p can ONLY be changed by changing the whole structure at the same 
> time. 
> Otherwise, x->n might not be consistent with x->p.

By itself, this would still not fix the issue I pointed out.

struct foo x;
x = .. ; // set the whole structure
char *p = x->p;
x = ... ; // set the whole structure

What is the bound for 'p' ?  
With current rules it would be the old bound.


> 
> 2. This new requirement is ONLY for “counted_by” attribute that is attached 
> to the pointer field, not needed
> for flexible array members.
> 
> 3. Then there will be inconsistency for the “counted_by” attribute between 
> FAM and pointer field.


> The major questions I have right now:
> 
> 1. Shall we keep this inconsistency between FAM and pointer field? 
> Or, 
> 
> 2. Shall we keep them consistent by adding this new requirement for the 
> previous FAM? 

Or have a new attribute?  I feel we double down on a bad design
which made sense for FAM addressing a very specific use case
(retrofitting checks to the Linux kernel) but is otherwise not
very strong.

Martin

> 
> Kees, the following feature that you requested for the FAM:
> 
> "
>  One important feature of the attribute is, a reference to the
>  flexible array member field uses the latest value assigned to the
>  field that represents the number of the elements before that
>  reference.  For example,
> 
> p->count = val1;
> p->array[20] = 0;  // ref1 to p->array
> p->count = val2;
> p->array[30] = 0;  // ref2 to p->array
> 
>  in the above, 'ref1' uses 'val1' as the number of the elements in
>  'p->array', and 'ref2' uses 'val2' as the number of elements in
>  'p->array’.
> "
> Has this feature been used by Linux kernel already?
> Is this feature really needed by Linux kernel?
> 
> Thanks a lot for suggestions and comments.
> 
> Qing
> 
> > On Nov 20, 2024, at 14:23, Martin Uecker  wrote:
> > 
> > Am Mittwoch, dem 20.11.2024 um 17:37 + schrieb Qing Zhao:
> > > Hi, Martin,
> > > 
> > > Thanks a lot for pointing this out. 
> > > 
> > > This does look like a problem we need avoid for the pointer arrays.
> > > 
> > > Does  the same problem exist in the language extension too if the n is 
> > > allowed to be changed after initialization?
> > > 
> > > If so, for the future language extension, is there any proposed solution 
> > > to this problem? 
> > > 
> > 
> > There is no specification yet and nothing formally proposed, so
> > it is entirely unclear at this point.
> > 
> > My idea would be to give 'x->buf' the type '(char(*)[x->n])'
> > where 'x->n' is loaded at the same time 'x->buf' is accessed
> > and then the value is frozen (like in the simpler versions
> > of 'counted_by' you had implemented first).  Of course, one
> > would then have to set x->n before setting the buffer (or
> > at the same time). This could be ensured by making the
> > member 'n' const, so that it can only be changed by
> > overwriting the whole struct. But I am still thinking
> > about this.
> > 
> > In any case, I think for "counted_by" this is not an option
> > because it would be confusing if it works differently
> > than for the FAM case. 
> > 
> > Martin
> > 
> > 
> > > Qing
> > > > On Nov 20, 2024, at 10:52, Martin Uecker  wrote:
> > > > 
> > > > Am Mittwoch, dem 20.11.2024 um 15:27 + schrieb Qing Zhao:
> > > > > 
> > > > > > On Nov 19, 2024, at 10:47, Marek Polacek  wrote:
> > > > > > 
> > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> > > > > > > Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao:
> > > > > > > > Hi,
> > > > > 
> > > > ..
> > > > 
> > > > Hi Qing,
> > > > 
> > > > > Per our discussion so far, if treating the following
> > > > > 
> > > > > struct foo {
> > > > > int n;
> > > > > char *p __attribute__ ((counted_by (n)));
> > > > > };
> > > > > 
> > > > > as an array with upper-bounds being “n” is reasonable.
> > > > 
> > > > There is one issue I want to point out, which I just realized during
> > > > a discussion in WG14.  For "counted_by" we defined the semantics
> > > > in a way that later store to 'n' will be taken into account. 
> > > > We did this to support the use case where 'n' is set after
> > > > the access to 'p'.
> > > > 
> > > > struct foo *x = ;
> > > > 
> > > > char *q = x->p;
> > > > x->n = 100; // this should apply

Re: gimplify: Handle void BIND_EXPR as asm input [PR100501]

2024-11-30 Thread Richard Biener



> Am 29.11.2024 um 23:45 schrieb Joseph Myers :
> 
> On Fri, 29 Nov 2024, Richard Biener wrote:
> 
>> I think we're trying to handle errorneous cases by setting TREE_VALUE
>> to error_mark_node
>> before this, so how about the following instead?
> 
> Yes, that works, and also fixes the test in bug 100792 unlike my previous
> patch.  Here's a full, tested patch using your version.
> 
> 
> gimplify: Handle void expression as asm input [PR100501, PR100792]
> 
> As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C
> tests involving a statement expression not returning a value as an asm
> input; this includes the variant bug 100792 where the statement
> expression ends with another asm statement.
> 
> The expected diagnostic for this case (as seen for C++ input) is one
> coming from the gimplifier and so it seems reasonable to fix the
> gimplifier to handle the GENERIC generated for this case by the C
> front end, rather than trying to make the C front end detect it
> earlier.  Thus the gimplifier to handle a void
> expression like other non-lvalues for such a memory input.
> 
> Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to commit?

Ok

Richard 

>PR c/100501
>PR c/100792
> 
> gcc/
>* gimplify.cc (gimplify_asm_expr): Handle void expressions for
>memory inputs like other non-lvalues.
> 
> gcc/testsuite/
>* gcc.dg/pr100501-1.c, gcc.dg/pr100792-1.c: New tests.
>* gcc.dg/pr48552-1.c, gcc.dg/pr48552-2.c,
>gcc.dg/torture/pr98601.c: Update expected errors.
> 
> Co-authored-by: Richard Biener 
> 
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index fb0ca23bfb6c..aa99c0a98f73 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -7453,7 +7453,8 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
>  || TREE_CODE (inputv) == PREINCREMENT_EXPR
>  || TREE_CODE (inputv) == POSTDECREMENT_EXPR
>  || TREE_CODE (inputv) == POSTINCREMENT_EXPR
> -  || TREE_CODE (inputv) == MODIFY_EXPR)
> +  || TREE_CODE (inputv) == MODIFY_EXPR
> +  || VOID_TYPE_P (TREE_TYPE (inputv)))
>TREE_VALUE (link) = error_mark_node;
>  tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p,
>is_gimple_lvalue, fb_lvalue | fb_mayfail);
> diff --git a/gcc/testsuite/gcc.dg/pr100501-1.c 
> b/gcc/testsuite/gcc.dg/pr100501-1.c
> new file mode 100644
> index ..152caac8b5d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr100501-1.c
> @@ -0,0 +1,26 @@
> +/* Test ICE for statement expression returning no value as asm input (bug
> +   100501).  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int x;
> +int g ();
> +
> +void
> +f ()
> +{
> +  __asm__ ("" : : "m" (({}))); /* { dg-error "memory input 0 is not directly 
> addressable" } */
> +  __asm__ ("" : : "m" (({ ; }))); /* { dg-error "memory input 0 is not 
> directly addressable" } */
> +  __asm__ ("" : : "m" (({ (void) 0; }))); /* { dg-error "memory input 0 is 
> not directly addressable" } */
> +  __asm__ ("" : : "m" (({ f (); }))); /* { dg-error "memory input 0 is not 
> directly addressable" } */
> +  __asm__ ("" : : "m" (({ f (); f (); }))); /* { dg-error "memory input 0 is 
> not directly addressable" } */
> +  __asm__ ("" : : "m" (({ x = g (); f (); }))); /* { dg-error "memory input 
> 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ if (1) g (); }))); /* { dg-error "memory input 0 
> is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ if (1) g (); else g (); }))); /* { dg-error 
> "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ test : goto test; }))); /* { dg-error "memory 
> input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ return; }))); /* { dg-error "memory input 0 is not 
> directly addressable" } */
> +  __asm__ ("" : : "m" (({ while (1); }))); /* { dg-error "memory input 0 is 
> not directly addressable" } */
> +  __asm__ ("" : : "m" (({ do {} while (1); }))); /* { dg-error "memory input 
> 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ for (;;); }))); /* { dg-error "memory input 0 is 
> not directly addressable" } */
> +  __asm__ ("" : : "m" (({ switch (x); }))); /* { dg-error "memory input 0 is 
> not directly addressable" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr100792-1.c 
> b/gcc/testsuite/gcc.dg/pr100792-1.c
> new file mode 100644
> index ..52f3aaf83f73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr100792-1.c
> @@ -0,0 +1,10 @@
> +/* Test ICE for statement expression ending with asm as asm input (bug
> +   100792).  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +void
> +f ()
> +{
> +  __asm__ ("" : : "m" (({ __asm__ (""); }))); /* { dg-error "memory input 0 
> is not directly addressable" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr48552-1.c 
> b/gcc/testsuite/gcc.dg/pr48552-1.c
> index 4cd7c59011ef..b3ef453cb1d5 100644
> --- a/gcc/testsuite/gcc.dg/pr48552-1

Re:[pushed] [PATCH 1/2] LoongArch: testsuite: Fix loongarch/vect-frint-scalar.c.

2024-11-30 Thread Lulu Cheng

Pushed to r15-5817.

在 2024/11/26 下午4:06, Lulu Cheng 写道:

In r15-5327, change the default language version for C compilation from
-std=gnu17 to -std=gnu23.

ISO C99 and C11 allow ceil, floor, round and trunc, and their float and
long double variants, to raise the “inexact” exception,
but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as
integrated into ISO C23, does not allow these functions to do so.

So add '-ffp-int-builtin-inexact' to this test case.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-frint-scalar.c: Add
'-ffp-int-builtin-inexact'.
---
  gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c 
b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
index c7cb40be7d4..dbcb9065ad4 100644
--- a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx" } */
+/* { dg-options "-O2 -mlsx -ffp-int-builtin-inexact" } */
  
  #define test(func, suffix) \

  __typeof__ (1.##suffix) \




Re:[pushed] [PATCH 2/2] LoongArch: testsuite: Fix l{a}sx-andn-iorn.c.

2024-11-30 Thread Lulu Cheng

Pushed to r15-5818.

在 2024/11/26 下午4:06, Lulu Cheng 写道:

Add '-fdump-tree-optimized' to this testcases.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/lasx-andn-iorn.c:
Add '-fdump-tree-optimized'.
* gcc.target/loongarch/lsx-andn-iorn.c:
Likewise.
---
  gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c | 2 +-
  gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c 
b/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c
index 4aa5f19a650..86b04dbbb33 100644
--- a/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c
+++ b/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c
@@ -3,7 +3,7 @@
  #include "./lsx-andn-iorn.c"
  
  /* { dg-do compile } */

-/* { dg-options "-O2 -mlasx -ftree-vectorize" } */
+/* { dg-options "-O2 -mlasx -ftree-vectorize -fdump-tree-optimized" } */
  
  /* We should produce a BIT_ANDC and BIT_IORC here.  */
  
diff --git a/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c b/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c

index 7bceccd37d4..14838ab8bff 100644
--- a/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c
+++ b/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -mlsx -ftree-vectorize" } */
+/* { dg-options "-O2 -mlsx -ftree-vectorize -fdump-tree-optimized" } */
  
  #ifndef N

  #define N 4




[PATCH] strlen: Handle vector CONSTRUCTORs [PR117057]

2024-11-30 Thread Jakub Jelinek
Hi!

The following patch handles VECTOR_TYPE_P CONSTRUCTORs in
count_nonzero_bytes, including handling them if they have some elements
non-constant.
If there are still some constant elements before it (in the range queried),
we derive info at least from those bytes and consider the rest as unknown.

The first 3 hunks just punt in IMHO problematic cases, the spaghetti code
considers byte_size 0 as unknown size, determine yourself, so if offset
is equal to exp size, there are 0 bytes to consider (so nothing useful
to determine), but using byte_size 0 would mean use any size.
Similarly, native_encode_expr uses int type for offset (and size), so
padding it offset larger than INT_MAX could be silent miscompilation.

I've guarded the test to just a couple of targets known to handle it,
because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR
into 4 BIT_FIELD_REF stores and I haven't figured out on what exactly
that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec
-mno-vsx, even -m32).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-30  Jakub Jelinek  

PR tree-optimization/117057
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also
when byte_size is equal to offset or nchars.  Punt if offset is bigger
than INT_MAX.  Handle vector CONSTRUCTOR with some elements constant,
possibly followed by non-constant.

* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign
specific scan-tree-dump-times directive.
* gcc.dg/strlenopt-96.c: New test.

--- gcc/tree-ssa-strlen.cc.jj   2024-11-23 13:00:31.476982808 +0100
+++ gcc/tree-ssa-strlen.cc  2024-11-29 22:53:17.465837084 +0100
@@ -4629,7 +4629,7 @@ strlen_pass::count_nonzero_bytes (tree e
return false;
 
   unsigned HOST_WIDE_INT byte_size = tree_to_uhwi (size);
-  if (byte_size < offset)
+  if (byte_size <= offset)
return false;
 
   nbytes = byte_size - offset;
@@ -4682,7 +4682,7 @@ strlen_pass::count_nonzero_bytes (tree e
   if (TREE_CODE (exp) == STRING_CST)
 {
   unsigned nchars = TREE_STRING_LENGTH (exp);
-  if (nchars < offset)
+  if (nchars <= offset)
return false;
 
   if (!nbytes)
@@ -4700,7 +4700,7 @@ strlen_pass::count_nonzero_bytes (tree e
   unsigned char buf[256];
   if (!prep)
 {
-  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
+  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8 || offset > INT_MAX)
return false;
   /* If the pointer to representation hasn't been set above
 for STRING_CST point it at the buffer.  */
@@ -4710,11 +4710,75 @@ strlen_pass::count_nonzero_bytes (tree e
   unsigned repsize = native_encode_expr (exp, buf, sizeof buf, offset);
   if (repsize < nbytes)
{
- /* This should only happen when REPSIZE is zero because EXP
-doesn't denote an object with a known initializer, except
-perhaps when the reference reads past its end.  */
- lenrange[0] = 0;
- prep = NULL;
+ /* Handle vector { 0x12345678, 0x23003412, x_1(D), y_2(D) }
+and similar cases.  Even when not all the elements are constant,
+we can perhaps figure out something from the constant ones
+and assume the others can be anything.  */
+ if (TREE_CODE (exp) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (exp)
+ && VECTOR_TYPE_P (TREE_TYPE (exp))
+ && nbytes <= sizeof buf)
+   {
+ tree v0 = CONSTRUCTOR_ELT (exp, 0)->value;
+ unsigned HOST_WIDE_INT elt_sz
+   = int_size_in_bytes (TREE_TYPE (v0));
+ unsigned int i, s = 0;
+ tree v, idx;
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), i, idx, v)
+   {
+ if (idx
+ && (VECTOR_TYPE_P (TREE_TYPE (v0))
+ || !tree_fits_uhwi_p (idx)
+ || tree_to_uhwi (idx) != i))
+   {
+ s = 0;
+ break;
+   }
+ if ((i + 1) * elt_sz <= offset)
+   continue;
+ unsigned int o = 0;
+ if (i * elt_sz < offset)
+   o = offset % elt_sz;
+ repsize = native_encode_expr (v, buf + s,
+   sizeof (buf) - s, o);
+ if (repsize != elt_sz - o)
+   break;
+ s += repsize;
+   }
+ if (s != 0 && s < nbytes)
+   {
+ unsigned HOST_WIDE_INT n = strnlen (prep, s);
+ if (n < lenrange[0])
+   lenrange[0] = n;
+ if (lenrange[1] < n && n != s)
+   lenrange[1] = n;
+ if (lenrange[2] < nbytes)
+   lenrange[2] = nbytes;
+ 

Re: [PATCH] strlen: Handle vector CONSTRUCTORs [PR117057]

2024-11-30 Thread Richard Biener



> Am 30.11.2024 um 10:53 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The following patch handles VECTOR_TYPE_P CONSTRUCTORs in
> count_nonzero_bytes, including handling them if they have some elements
> non-constant.
> If there are still some constant elements before it (in the range queried),
> we derive info at least from those bytes and consider the rest as unknown.
> 
> The first 3 hunks just punt in IMHO problematic cases, the spaghetti code
> considers byte_size 0 as unknown size, determine yourself, so if offset
> is equal to exp size, there are 0 bytes to consider (so nothing useful
> to determine), but using byte_size 0 would mean use any size.
> Similarly, native_encode_expr uses int type for offset (and size), so
> padding it offset larger than INT_MAX could be silent miscompilation.
> 
> I've guarded the test to just a couple of targets known to handle it,
> because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR
> into 4 BIT_FIELD_REF stores and I haven't figured out on what exactly
> that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec
> -mno-vsx, even -m32).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Thanks,
Richard 

> 2024-11-30  Jakub Jelinek  
> 
>PR tree-optimization/117057
>* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also
>when byte_size is equal to offset or nchars.  Punt if offset is bigger
>than INT_MAX.  Handle vector CONSTRUCTOR with some elements constant,
>possibly followed by non-constant.
> 
>* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign
>specific scan-tree-dump-times directive.
>* gcc.dg/strlenopt-96.c: New test.
> 
> --- gcc/tree-ssa-strlen.cc.jj2024-11-23 13:00:31.476982808 +0100
> +++ gcc/tree-ssa-strlen.cc2024-11-29 22:53:17.465837084 +0100
> @@ -4629,7 +4629,7 @@ strlen_pass::count_nonzero_bytes (tree e
>return false;
> 
>   unsigned HOST_WIDE_INT byte_size = tree_to_uhwi (size);
> -  if (byte_size < offset)
> +  if (byte_size <= offset)
>return false;
> 
>   nbytes = byte_size - offset;
> @@ -4682,7 +4682,7 @@ strlen_pass::count_nonzero_bytes (tree e
>   if (TREE_CODE (exp) == STRING_CST)
> {
>   unsigned nchars = TREE_STRING_LENGTH (exp);
> -  if (nchars < offset)
> +  if (nchars <= offset)
>return false;
> 
>   if (!nbytes)
> @@ -4700,7 +4700,7 @@ strlen_pass::count_nonzero_bytes (tree e
>   unsigned char buf[256];
>   if (!prep)
> {
> -  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
> +  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8 || offset > INT_MAX)
>return false;
>   /* If the pointer to representation hasn't been set above
> for STRING_CST point it at the buffer.  */
> @@ -4710,11 +4710,75 @@ strlen_pass::count_nonzero_bytes (tree e
>   unsigned repsize = native_encode_expr (exp, buf, sizeof buf, offset);
>   if (repsize < nbytes)
>{
> -  /* This should only happen when REPSIZE is zero because EXP
> - doesn't denote an object with a known initializer, except
> - perhaps when the reference reads past its end.  */
> -  lenrange[0] = 0;
> -  prep = NULL;
> +  /* Handle vector { 0x12345678, 0x23003412, x_1(D), y_2(D) }
> + and similar cases.  Even when not all the elements are constant,
> + we can perhaps figure out something from the constant ones
> + and assume the others can be anything.  */
> +  if (TREE_CODE (exp) == CONSTRUCTOR
> +  && CONSTRUCTOR_NELTS (exp)
> +  && VECTOR_TYPE_P (TREE_TYPE (exp))
> +  && nbytes <= sizeof buf)
> +{
> +  tree v0 = CONSTRUCTOR_ELT (exp, 0)->value;
> +  unsigned HOST_WIDE_INT elt_sz
> += int_size_in_bytes (TREE_TYPE (v0));
> +  unsigned int i, s = 0;
> +  tree v, idx;
> +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), i, idx, v)
> +{
> +  if (idx
> +  && (VECTOR_TYPE_P (TREE_TYPE (v0))
> +  || !tree_fits_uhwi_p (idx)
> +  || tree_to_uhwi (idx) != i))
> +{
> +  s = 0;
> +  break;
> +}
> +  if ((i + 1) * elt_sz <= offset)
> +continue;
> +  unsigned int o = 0;
> +  if (i * elt_sz < offset)
> +o = offset % elt_sz;
> +  repsize = native_encode_expr (v, buf + s,
> +sizeof (buf) - s, o);
> +  if (repsize != elt_sz - o)
> +break;
> +  s += repsize;
> +}
> +  if (s != 0 && s < nbytes)
> +{
> +  unsigned HOST_WIDE_INT n = strnlen (prep, s);
> +  if (n < lenrange[0])
> +lenrange[0] = n;
> +  if (lenrange[1] < n && n != s)
> +lenrange[1] = n;
> +  if (lenrange[2] < nbytes)
> +lenrange[2] = nbytes;
> +  /* We haven't processed all bytes, the rest are unknown.
> +

[committed] openmp: Add crtoffloadtableS.o and use it [PR117851]

2024-11-30 Thread Jakub Jelinek
Hi!

Unlike crtoffload{begin,end}.o which just define some symbols at the start/end
of the various .gnu.offload* sections, crtoffloadtable.o contains
const void *const __OFFLOAD_TABLE__[]
  __attribute__ ((__visibility__ ("hidden"))) =
{
  &__offload_func_table, &__offload_funcs_end,
  &__offload_var_table, &__offload_vars_end,
  &__offload_ind_func_table, &__offload_ind_funcs_end,
};
The problem is that linking this into PIEs or shared libraries doesn't
work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC
code is put into .rodata section, but it really needs relocations, so for
PIC it should go into .data.rel.ro/.data.rel.ro.local.
As I think we don't want .data.rel.ro section in non-PIE binaries, this patch
follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o
next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic.

Bootstrapped/regtested on x86_64-linux and i686-linux (without offloading)
and Matthias said he has tested it with offloading (I'll do that early next
week when trying to build distro packages), committed to trunk.

2024-11-30  Jakub Jelinek  

PR libgomp/117851
gcc/
* lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument,
search for crtoffloadtableS.o rather than crtoffloadtable.o if
true.
(run_gcc): Add pie_or_shared variable.  If OPT_pie or OPT_shared or
OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is
seen, set pie_or_shared to false.  Pass it to find_crtoffloadtable.
libgcc/
* configure.ac (extra_parts): Add crtoffloadtableS.o.
* Makefile.in (crtoffloadtableS$(objext)): New goal.
* configure: Regenerated.

--- gcc/lto-wrapper.cc.jj   2024-11-23 13:00:30.083002920 +0100
+++ gcc/lto-wrapper.cc  2024-11-29 21:18:47.362532372 +0100
@@ -1092,13 +1092,16 @@ copy_file (const char *dest, const char
the copy to the linker.  */
 
 static void
-find_crtoffloadtable (int save_temps, const char *dumppfx)
+find_crtoffloadtable (int save_temps, bool pie_or_shared, const char *dumppfx)
 {
   char **paths = NULL;
   const char *library_path = getenv ("LIBRARY_PATH");
   if (!library_path)
 return;
-  unsigned n_paths = parse_env_var (library_path, &paths, 
"/crtoffloadtable.o");
+  unsigned n_paths = parse_env_var (library_path, &paths,
+   pie_or_shared
+   ? "/crtoffloadtableS.o"
+   : "/crtoffloadtable.o");
 
   unsigned i;
   for (i = 0; i < n_paths; i++)
@@ -1117,7 +1120,8 @@ find_crtoffloadtable (int save_temps, co
   }
   if (i == n_paths)
 fatal_error (input_location,
-"installation error, cannot find %");
+"installation error, cannot find %",
+pie_or_shared ? "S" : "");
 
   free_array_of_ptrs ((void **) paths, n_paths);
 }
@@ -1423,6 +1427,11 @@ run_gcc (unsigned argc, char *argv[])
   char **lto_argv, **ltoobj_argv;
   bool linker_output_rel = false;
   bool skip_debug = false;
+#ifdef ENABLE_DEFAULT_PIE
+  bool pie_or_shared = true;
+#else
+  bool pie_or_shared = false;
+#endif
   const char *incoming_dumppfx = dumppfx = NULL;
   static char current_dir[] = { '.', DIR_SEPARATOR, '\0' };
 
@@ -1594,6 +1603,16 @@ run_gcc (unsigned argc, char *argv[])
  global_dc->set_show_highlight_colors (option->value);
  break;
 
+   case OPT_pie:
+   case OPT_shared:
+   case OPT_static_pie:
+ pie_or_shared = true;
+ break;
+
+   case OPT_no_pie:
+ pie_or_shared = false;
+ break;
+
default:
  break;
}
@@ -1802,7 +1821,7 @@ cont1:
 
   if (offload_names)
{
- find_crtoffloadtable (save_temps, dumppfx);
+ find_crtoffloadtable (save_temps, pie_or_shared, dumppfx);
  for (i = 0; offload_names[i]; i++)
printf ("%s\n", offload_names[i]);
  free_array_of_ptrs ((void **) offload_names, i);
--- libgcc/configure.ac.jj  2024-09-24 11:31:48.942619056 +0200
+++ libgcc/configure.ac 2024-11-29 21:05:29.344764142 +0100
@@ -481,7 +481,8 @@ AC_SUBST(accel_dir_suffix)
 AC_SUBST(real_host_noncanonical)
 
 if test x"$enable_offload_targets" != x; then
-  extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o 
crtoffloadtable.o"
+  extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o"
+  extra_parts="${extra_parts} crtoffloadtable.o crtoffloadtableS.o"
 fi
 
 # Check if Solaris/x86 linker supports ZERO terminator unwind entries.
--- libgcc/Makefile.in.jj   2024-09-24 11:31:48.912619466 +0200
+++ libgcc/Makefile.in  2024-11-29 21:03:22.789545531 +0100
@@ -1064,6 +1064,9 @@ crtoffloadend$(objext): $(srcdir)/offloa
 
 crtoffloadtable$(objext): $(srcdir)/offloadstuff.c
$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_TABLE
+
+crtoffloadtableS$(objext): $(srcdir)/offloadstuff.c
+   $(crt_compile) $(CRTSTUFF_T_CFLAGS_S) -

Re: [pushed][PATCH v3] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector

2024-11-30 Thread Lulu Cheng

Pushed to r15-5819..

在 2024/11/28 上午9:26, Jinyang He 写道:

For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` cause overflow
in when emit {w,h,b}. Since the number of bits shifted is the remainder of
the register value, it is actually unnecessary to constrain the range.
Simply mask the shift number with the unit-bit-width, without any
constraint on the shift range.

gcc/ChangeLog:

* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6
_operand to reg_or_vector_same_uimm_operand.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
---
v2: Fix indent in lsx.md and lasx.md.
 Use "dg-do assemble" in test which suggested by Ruoyao.
v3: Re-enable scan-assembler.

  gcc/config/loongarch/constraints.md   | 14 ++--
  gcc/config/loongarch/lasx.md  | 60 
  gcc/config/loongarch/loongarch-protos.h   |  5 +-
  gcc/config/loongarch/lsx.md   | 60 
  gcc/config/loongarch/predicates.md|  8 +--
  .../vector/lasx/lasx-shift-sameimm-vec.c  | 72 +++
  .../vector/lsx/lsx-shift-sameimm-vec.c| 72 +++
  7 files changed, 254 insertions(+), 37 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c

diff --git a/gcc/config/loongarch/constraints.md 
b/gcc/config/loongarch/constraints.md
index 18da8b31f49..66ef1073fad 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -334,19 +334,19 @@
(and (match_code "const_vector")
 (match_test "loongarch_const_vector_same_int_p (op, mode, -16, 15)")))
  
-(define_constraint "Uuv6"

-  "@internal
-   A replicated vector const in which the replicated value is in the range
-   [0,63]."
-  (and (match_code "const_vector")
-   (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 63)")))
-
  (define_constraint "Urv8"
"@internal
 A replicated vector const with replicated byte values as well as elements"
(and (match_code "const_vector")
 (match_test "loongarch_const_vector_same_bytes_p (op, mode)")))
  
+(define_constraint "Uuvx"

+  "@internal
+   A replicated vector const in which the replicated value is in the unsigned
+   range [0,umax]."
+  (and (match_code "const_vector")
+   (match_test "loongarch_const_vector_same_int_p (op, mode)")))
+
  (define_memory_constraint "ZC"
"A memory operand whose address is formed by a base register and offset
 that is suitable for use in instructions with the same addressing mode
diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 457ed163f31..90778dd8ff9 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1013,11 +1013,23 @@
[(set (match_operand:ILASX 0 "register_operand" "=f,f")
(lshiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
"ISA_HAS_LASX"
-  "@
-   xvsrl.\t%u0,%u1,%u2
-   xvsrli.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+{
+case 0:
+  return "xvsrl.\t%u0,%u1,%u2";
+case 1:
+  {
+   unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 
0));
+   operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+   return "xvsrli.\t%u0,%u1,%d2";
+  }
+default:
+  gcc_unreachable ();
+}
+}
[(set_attr "type" "simd_shift")
 (set_attr "mode" "")])
  
@@ -1026,11 +1038,23 @@

[(set (match_operand:ILASX 0 "register_operand" "=f,f")
(ashiftrt:ILASX
  (match_operand:ILASX 1 "register_operand" "f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))]
+ (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))]
"ISA_HAS_LASX"
-  "@
-   xvsra.\t%u0,%u1,%u2
-   xvsrai.\t%u0,%u1,%E2"
+{
+  switch (which_alternative)
+{
+case 0:
+  return "xvsra.\t%u0,%u1,%u2";
+case 1:
+  {
+   unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 
0));
+   operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1));
+   return "xvsrai.\t%u0,%u1,%d2";
+  }
+default:
+  g

[patch, avr] ad PR84211: Prefer split into reg-reg move over reg-const move

2024-11-30 Thread Georg-Johann Lay

When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out as profitable.  When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.

Johann

--

AVR: ad target/84211 - Split MOVW into MOVs in try_split_any.

When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out as profitable.  When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.
gcc/
PR target/84211
* config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer
reg=reg move over reg=const when splitting a reg=reg insn.AVR: ad target/84211 - Split MOVW into MOVs in try_split_any.

When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out as profitable.  When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.
gcc/
PR target/84211
* config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer
reg=reg move over reg=const when splitting a reg=reg insn.

diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index 68ebd31390e..7be5ec25fbc 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -2912,6 +2912,14 @@ optimize_data_t::try_split_any (bbinfo_t *)
 	  xsrc = gen_rtx_REG (HImode, r16);
 	  i += step;
 	}
+	  // ...or a reg-reg move from a multi-byte move...
+	  else if (r_src
+		   // Prefer a reg-reg move over a (potential) load
+		   // of a constant, because the subsequent RTL
+		   // peephole pass may combine it to a MOVW again.
+		   && AVR_HAVE_MOVW
+		   && REG_P (curr.ii.m_src))
+	xsrc = gen_rtx_REG (QImode, r_src);
 	  // ...or a cheap constant...
 	  else if (val8 >= 0
 		   && AVRasm::constant_cost (SET, r_dest, val8) <= 1)


[patch,avr,applied] Fix some coding rule nits

2024-11-30 Thread Georg-Johann Lay

This patch fixed some unrelated coding rule nits.

Johann

--

AVR: Fix some coding rule nits and typos.

gcc/
* config/avr/avr-c.cc: Fix some coding rule nits and typos.
* config/avr/avr-passes.cc: Same
* config/avr/avr.h: Same.
* config/avr/avr.cc: Same.
(avr_function_arg_regno_p, avr_hard_regno_rename_ok)
(avr_epilogue_uses, extra_constraint_Q): Return bool instead of 
int.

* config/avr/avr-protos.h (avr_function_arg_regno_p)
(avr_hard_regno_rename_ok, avr_epilogue_uses)
(extra_constraint_Q): Return bool instead of int.AVR: Fix some coding rule nits and typos.

gcc/
* config/avr/avr-c.cc: Fix some coding rule nits and typos.
* config/avr/avr-passes.cc: Same
* config/avr/avr.h: Same.
* config/avr/avr.cc: Same.
(avr_function_arg_regno_p, avr_hard_regno_rename_ok)
(avr_epilogue_uses, extra_constraint_Q): Return bool instead of int.
* config/avr/avr-protos.h (avr_function_arg_regno_p)
(avr_hard_regno_rename_ok, avr_epilogue_uses)
(extra_constraint_Q): Return bool instead of int.

diff --git a/gcc/config/avr/avr-c.cc b/gcc/config/avr/avr-c.cc
index d3c40d73043..7db88a8c77a 100644
--- a/gcc/config/avr/avr-c.cc
+++ b/gcc/config/avr/avr-c.cc
@@ -53,7 +53,7 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
   tree type0, type1, fold = NULL_TREE;
   avr_builtin_id id = AVR_BUILTIN_COUNT;
   location_t loc = (location_t) iloc;
-  vec &args = * (vec*) vargs;
+  vec &args = * (vec *) vargs;
 
   switch (DECL_MD_FUNCTION_CODE (fndecl))
 {
@@ -61,10 +61,10 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
   break;
 
 case AVR_BUILTIN_ABSFX:
-  if (args.length() != 1)
+  if (args.length () != 1)
 	{
 	  error_at (loc, "%qs expects 1 argument but %d given",
-		"absfx", (int) args.length());
+		"absfx", (int) args.length ());
 
 	  fold = error_mark_node;
 	  break;
@@ -120,10 +120,10 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
   break; // absfx
 
 case AVR_BUILTIN_ROUNDFX:
-  if (args.length() != 2)
+  if (args.length () != 2)
 	{
 	  error_at (loc, "%qs expects 2 arguments but %d given",
-		"roundfx", (int) args.length());
+		"roundfx", (int) args.length ());
 
 	  fold = error_mark_node;
 	  break;
@@ -186,10 +186,10 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs)
   break; // roundfx
 
 case AVR_BUILTIN_COUNTLSFX:
-  if (args.length() != 1)
+  if (args.length () != 1)
 	{
 	  error_at (loc, "%qs expects 1 argument but %d given",
-		"countlsfx", (int) args.length());
+		"countlsfx", (int) args.length ());
 
 	  fold = error_mark_node;
 	  break;
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index 0d324727b47..68ebd31390e 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -1391,10 +1391,10 @@ plies_t::emit_insns (const insninfo_t &ii, const memento_t &memo) const
 }
 
 
-// Helper for .emit_insns().  Emit an ior3 or and3 insns
+// Helper for .emit_insns().  Emit an ior3 or and3 insn
 // that's equivalent to a sequence of contiguous BLDs starting at
-// .plies[ISTART].  Updates N_INSNS according to the number of insns emitted
-// and returns the number of consumed plys in .plies[].
+// .plies[ISTART].  Updates N_INSNS according to the number of insns
+// emitted and returns the number of consumed plys in .plies[].
 int
 plies_t::emit_blds (const insninfo_t &ii, int &n_insns, int istart) const
 {
@@ -1448,7 +1448,7 @@ plies_t::emit_blds (const insninfo_t &ii, int &n_insns, int istart) const
 
 // Emit insns for a contiguous sequence of SET ply_t's starting at
 // .plies[ISTART].  Advances N_INSNS by the number of emitted insns.
-// MEMO ist the state of the GPRs before II es executed, where II
+// MEMO ist the state of the GPRs before II is executed, where II
 // represents the insn under optimization.
 // The emitted insns are "movqi_insn" or "*reload_inqi"
 // when .plies[ISTART].in_set_some is not set, and one "set_some" insn
@@ -1571,7 +1571,7 @@ plies_t::emit_sets (const insninfo_t &ii, int &n_insns, const memento_t &memo,
 
 // Try to find an operation such that  Y = op (X).
 // Shifts and rotates are regarded as unary operaions with
-// an implied 2nd operand.
+// an implied 2nd operand or 1 or 4, respectively.
 static rtx_code
 find_arith (uint8_t y, uint8_t x)
 {
@@ -1606,8 +1606,8 @@ find_arith2 (uint8_t z, uint8_t x, uint8_t y)
 }
 
 
-// Add plies to .plies[] that represent a MOVW, but only ones that reduce the
-// Hamming distance from REGNO[SIZE] to VAL by exactly DHAMM.
+// Add plies to .plies[] that represent a MOVW, but only ones that reduce
+// the Hamming distance from REGNO[SIZE] to VAL by exactly DHAMM.
 void
 plies_t::add_plies_movw (int regno, int siz

Re: [patch,avr,testsuite,applied] gcc.c-torture/execute/memcpy-a*.c

2024-11-30 Thread Maciej W. Rozycki
On Sat, 30 Nov 2024, Georg-Johann Lay wrote:

> The gcc.c-torture/execute/memcpy-a[1248].c tests consumed more time
> than the whole rest of the test suite, just to come up with
> a "memory full" even at -Os.  Skipped thusly.

 As a matter of interest, is the timeout/memory exhaustion observed with 
host compilation or target execution?

 From my observation the optimisation level does not matter much for 
compilation times unless you use -O0, reducing the number of passes and 
code transformations applied.  So whether it's at -O2 or -Os the time 
consumed does not change much.

 The resulting executables themselves take ~560KiB with a VAX target, the 
ultimate CISC machine which also has memory move and memory set hardware 
instructions contributing to size reduction, and ~1200KiB with a POWER9 
target, a 64-bit RISC machine.  I suppose the sizes will lie somewhere in 
between for the majority of our targets, and with smaller embedded ones 
the executables may not fit in the address space available.

  Maciej


Re: [PATCH v2] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`

2024-11-30 Thread Jeff Law




On 11/26/24 3:49 AM, Jovan Vukic wrote:

Thank you for the feedback on the v1 patch.

As requested, I added detailed tests for various signed and unsigned integer
types in the test file bitops-11.c. I also included more complex expressions to
observe how everything behaves at the GIMPLE level and added vector test
examples as well. Since vector expressions are matched on (minus @0 1) instead
of (plus @0 -1), I added a simplification for the minus case in match.pd.

Additionally, I introduced simplifications for the expressions (a - 1) & -a,
(a - 1) | -a, and (a - 1) ^ -a to 0, -1, and -1, respectively, in 
simplify-rtx.cc.

For each of the three expressions, I added two if statements. The first matches
the typical (BIT_OP (plus (A -1)) (neg A)), while the second recognizes the
presence of a SUBREG within the RTL expression. For example, when A is of type
short, the second if statement is triggered. I didn't observe any issues with
match.pd missing any simplifications, but if that happens, the code in
simplify-rtx.cc should help.

Bootstrapped and tested on x86-linux-gnu with no regressions.

2024-11-26  Jovan Vukic

gcc/ChangeLog:

 * match.pd: New pattern.
 * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
 code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a.

gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/bitops-11.c: New test.


CONFIDENTIALITY: The contents of this e-mail are confidential and intended only 
for the above addressee(s). If you are not the intended recipient, or the 
person responsible for delivering it to the intended recipient, copying or 
delivering it to anyone else or using it in any unauthorized manner is 
prohibited and may be unlawful. If you receive this e-mail by mistake, please 
notify the sender and the systems administrator atstraym...@rt-rk.com 
immediately.


patch.patch

---
  gcc/match.pd  |  16 +++
  gcc/simplify-rtx.cc   |  87 
  gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 116 ++
  3 files changed, 219 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0ac5674f24b..c85d4b9ae6c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1472,6 +1472,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bit_and:c @0 (bit_not (bit_xor:c @0 @1)))
   (bit_and @0 @1))
  
+/* Transform:

+   (a - 1) & -a ->  0.
+   (a - 1) | -a -> -1.
+   (a - 1) ^ -a -> -1.  */
+(for bit_op (bit_ior bit_xor bit_and)
+ (simplify
+  (bit_op:c (plus @0 integer_minus_onep@1) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); }))
+ (simplify
+  (bit_op:c (minus @0 integer_onep@1) (negate @0))
+  (if (bit_op == BIT_AND_EXPR)
+{ build_zero_cst (type); }
+{ build_minus_one_cst (type); })))
I'm a little surprised the second form matches anything since I would 
expect canonicalization to turn the second form into the first.  But may 
we're not as good at canonicalizing trees as we are RTL.




+
  /* a & (a == b)  -->  a & b (boolean version of the above). */
  (simplify
   (bit_and:c @0 (nop_convert? (eq:c @0 @1)))
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 893c5f6e1ae..514d21c6ef5 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3530,6 +3530,35 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
  && GET_MODE_CLASS (mode) != MODE_CC)
return CONSTM1_RTX (mode);
  
+  /* (ior (plus (A -1)) (neg A)) is -1.  */

Nit.  Space between the operator "-" and the operand "1".


+  if (((GET_CODE (op1) == NEG
+   && GET_CODE (op0) == PLUS
+   && XEXP (op0, 1) == constm1_rtx)
Most likely constm1_rtx should be CONSTM1_RTX (mode). Similarly for the 
other instances.





+  || (GET_CODE (op0) == NEG
+  && GET_CODE (op1) == PLUS
+  && XEXP (op1, 1) == constm1_rtx))
+ && rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+ && !side_effects_p (XEXP (op0, 0))
+ && !side_effects_p (XEXP (op1, 0)))

If the RTXs are equal, then the second call to !side_effects_p isn't needed.


Given the duplication of RTL scanning across the AND, XOR, IOR cases, 
wouldn't it make sense to factor the tests into a new function and just 
call it from all 3 locations?



Overall it looks pretty good.  I'm a bit concerned about the SUBREG 
case, but I haven't come up with any scenarios where it doesn't work yet.


Jeff




Re: [PATCH v2] Write binary annotations for CodeView S_INLINESITE symbols

2024-11-30 Thread Jeff Law




On 11/30/24 3:39 PM, Mark Harmstone wrote:

Add "binary annotations" at the end of CodeView S_INLINESITE symbols,
which are a series of compressed integers that represent how line
numbers map to addresses.

This requires assembler support; you will need commit b3aa594d ("gas:
add .cv_ucomp and .cv_scomp pseudo-directives") in binutils.

gcc/
* configure.ac (HAVE_GAS_CV_UCOMP): New check.
* configure: Regenerate.
* config.in: Regenerate.
* dwarf2codeview.cc (enum binary_annotation_opcode): Define.
(struct codeview_function): Add htab_next and inline_loc;
(struct cv_func_hasher): Define.
(cv_func_htab): New global variable.
(new_codeview_function): Add new codeview_function to hash table.
(codeview_begin_block): Record location of inline block.
(codeview_end_block): Add dummy source line at end of inline block.
(find_line_function): New function.
(write_binary_annotations): New function.
(write_s_inlinesite): Call write_binary_annotations.
(codeview_debug_finish): Delete cv_func_htab.
---
This second version adds a configure check for .cv_ucomp in the
assembler, and prevents the binary annotations from being emitted if
it's not supported. There's no separate check for .cv_scomp, as the same
binutils commit introduced them both.

OK.
Jeff



Re: [PATCH] sched1: debug/model: dump predecessor list and BB num [NFC]

2024-11-30 Thread Jeff Law




On 11/5/24 3:20 PM, Vineet Gupta wrote:

This is broken out of predecessor promotion patch so that debugging can
proceed during stage1 restrictions.

Signed-off-by: Vineet Gupta 
---
  gcc/haifa-sched.cc | 10 +-
  gcc/sched-rgn.cc   | 14 --
  2 files changed, 17 insertions(+), 7 deletions(-)


OK with a ChangeLog entry.

jeff



Re: [PATCH] v2: Allow limited extended asm at toplevel [PR41045]

2024-11-30 Thread Jeff Law




On 11/22/24 2:45 AM, Jakub Jelinek wrote:

On Thu, Nov 21, 2024 at 09:32:51PM +, Joseph Myers wrote:

On Sat, 2 Nov 2024, Jakub Jelinek wrote:


+Extended @code{asm} statements outside of functions may not use any
+qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or
+@code{&} modifiers in constraints and can only use constraints which don%'t
+allow using any register.


Just ' in Texinfo, not %'.


@@ -3071,7 +3072,62 @@ c_parser_declaration_or_fndef (c_parser
  static void
  c_parser_asm_definition (c_parser *parser)
  {
-  tree asm_str = c_parser_simple_asm_expr (parser);
+  location_t asm_loc = c_parser_peek_token (parser)->location;


The syntax comment above this function needs updating.

The C front-end changes are OK with that fix.


Thanks.
Here is the adjusted patch.

2024-11-22  Jakub Jelinek  

PR c/41045
gcc/
* output.h (insn_noperands): Declare.
* final.cc (insn_noperands): No longer static.
* varasm.cc (assemble_asm): Handle ASM_EXPR.
* lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at
for non-STRING_CST toplevel asm for now.
* doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document
that extended asm is now allowed outside of functions with certain
restrictions.
gcc/c/
* c-parser.cc (c_parser_asm_string_literal): Add forward declaration.
(c_parser_asm_definition): Parse also extended asm without
clobbers/labels.
* c-typeck.cc (build_asm_expr): Allow extended asm outside of
functions and check extra restrictions.
gcc/cp/
* cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument.
* parser.cc (cp_parser_asm_definition): Parse also extended asm
without clobbers/labels outside of functions.
* semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set,
check extra restrictions for extended asm outside of functions.
* pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller.
gcc/testsuite/
* c-c++-common/toplevel-asm-1.c: New test.
* c-c++-common/toplevel-asm-2.c: New test.
* c-c++-common/toplevel-asm-3.c: New test.
final/varasm/lto bits are fine.  Consider adding the note about the LTO 
limitation in the documentation blurb.


Jason will need to chime in on the C++ FE bits if he hasn't already.

Thanks,
jeff




Re: [PATCH 1/2] gimple-lim: Reuse boolean var when moving PHI

2024-11-30 Thread Richard Biener



> Am 01.12.2024 um 03:00 schrieb Andrew Pinski :
> 
> While looking into PR 117859, I noticed that LIM
> sometimes would produce `bool_var2 = bool_var1 != 0` instead
> of just using bool_var2. This patch allows LIM to reuse bool_var1
> in the place where bool_var2 was going to be used.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 

> gcc/ChangeLog:
> 
>* tree-ssa-loop-im.cc (move_computations_worker): While moving
>phi, reuse the lhs of the conditional if it is a boolean type.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/tree-ssa-loop-im.cc | 21 -
> 1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
> index 538a0588d33..0130a809e30 100644
> --- a/gcc/tree-ssa-loop-im.cc
> +++ b/gcc/tree-ssa-loop-im.cc
> @@ -1304,11 +1304,22 @@ move_computations_worker (basic_block bb)
> edges of COND.  */
>  extract_true_false_args_from_phi (dom, stmt, &arg0, &arg1);
>  gcc_assert (arg0 && arg1);
> -  t = make_ssa_name (boolean_type_node);
> -  new_stmt = gimple_build_assign (t, gimple_cond_code (cond),
> -  gimple_cond_lhs (cond),
> -  gimple_cond_rhs (cond));
> -  gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
> +  /* For `bool_val != 0`, reuse bool_val. */
> +  if (gimple_cond_code (cond) == NE_EXPR
> +  && integer_zerop (gimple_cond_rhs (cond))
> +  && types_compatible_p (TREE_TYPE (gimple_cond_lhs (cond)),
> + boolean_type_node))
> +{
> +  t = gimple_cond_lhs (cond);
> +}
> +  else
> +{
> +  t = make_ssa_name (boolean_type_node);
> +  new_stmt = gimple_build_assign (t, gimple_cond_code (cond),
> +  gimple_cond_lhs (cond),
> +  gimple_cond_rhs (cond));
> +  gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
> +}
>  new_stmt = gimple_build_assign (gimple_phi_result (stmt),
>  COND_EXPR, t, arg0, arg1);
>  todo |= TODO_cleanup_cfg;
> --
> 2.43.0
> 


[PUSHED 2/5] testsuite: Fix another issue with sve-sizeless-[12].C

2024-11-30 Thread Andrew Pinski
There is a different error message expected on line 165 (for both files).
It was expecting:
error: cannot convert 'svint16_t' to 'sveint8_t' in initialization
But now we get:
error: cannot convert 'svint16_t' to 'signed char' in initialization

This is because we support constructing scalable vectors rather than before.

So just update error message.

Pushed as obvious after a quick test for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: Update error message for line 165.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +-
 gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
index 0a5c80b92b8..1e00518d0e4 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
@@ -162,7 +162,7 @@ statements (int n)
   svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' 
to 'svint8_t'} }
   svint8_t init_sve_sc5 = {};
   svint8_t init_sve_sc6 = { sve_sc1 };
-  svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'svint8_t'} }
+  svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'signed char'} }
   svint32_t init_sve_vc1 = { 0, 1 };
   svint32_t init_sve_vc2 = { 0, bar () };
   svint32_t init_sve_vc3 = { bar (), n };
diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
index a5ee230f76a..e6423d572f0 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
@@ -162,7 +162,7 @@ statements (int n)
   svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' 
to 'svint8_t'} }
   svint8_t init_sve_sc5 = {};
   svint8_t init_sve_sc6 = { sve_sc1 };
-  svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'svint8_t'} }
+  svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'signed char'} }
   svint32_t init_sve_vc1 = { 0, 1 };
   svint32_t init_sve_vc2 = { 0, bar () };
   svint32_t init_sve_vc3 = { bar (), n };
-- 
2.43.0



[PUSHED 1/5] testsuite: Fix part of sve-sizeless-2.c

2024-11-30 Thread Andrew Pinski
r15-5783-gb5df3eefd70064 missed to update part of sve-sizeless-2.C to
include the declaration of the bar function.
This corrects the oversight there.

Pushed as obvious after testing the tecase for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-2.C: Add declaration of bar.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
index 87937d060d2..a5ee230f76a 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
@@ -124,6 +124,8 @@ void thrower2 () throw (svint8_t); // { dg-error {cannot 
throw or catch SVE type
 void thrower3 () throw (svint8_t); // { dg-error {cannot throw or catch SVE 
type 'svint8_t'} "" { target c++98_only } }
 #endif
 
+extern int bar (void);
+
 // Main tests for statements and expressions.
 
 void
-- 
2.43.0



[PUSHED] testsuite: Fix aarch64/sve/acle/general-c/gnu_vectors_[12].c for taking address of vector element

2024-11-30 Thread Andrew Pinski
After the recent changes for SVE vectors becoming usable as GNU vector 
extensions. You can now get
each of the elements like it was an array. There is no reason why taking the 
address of that
won't be invalid too. especially since we are limiting to the first N elements 
(where N is the
min arch supported elements for these types).
So this removes the error message on these 2 lines and fixes the testcase.

Pushed as obvious after a quick test for these tests for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c: Remove
error message on taking address of an element of a vector.
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c: Likewise.

Signed-off-by: Andrew Pinski 
---
 .../gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c   | 2 +-
 .../gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c
index 7a7eb42f57b..32bb826e240 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c
@@ -79,7 +79,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1,
   /* Vector indexing.  */
 
   sve_u1[0];
-  &sve_u1[0]; /* { dg-error {subscripted value is neither array nor pointer} } 
*/
+  &sve_u1[0];
 
   gnu_u1[0];
   &gnu_u1[0];
diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c
index 2e92b0dd766..c311e162f9e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c
@@ -78,7 +78,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1,
   /* Vector indexing.  */
 
   sve_u1[0];
-  &sve_u1[0]; /* { dg-error {subscripted value is neither array nor pointer} } 
*/
+  &sve_u1[0];
 
   gnu_u1[0];
   &gnu_u1[0];
-- 
2.43.0



[PUSHED 4/5] testsuite: Fix sve-sizeless-[12].C for C++98

2024-11-30 Thread Andrew Pinski
In C++98 `{ a }` for aggregates can only mean constructing by
each element rather than a copy. This adds the expected error
message for SVE vectors for C++98.

Pushed as obvious after a test for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: Add error message for line 164
for C++98 only.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +-
 gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
index 0d62bd7f39e..37bcec1efb9 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
@@ -161,7 +161,7 @@ statements (int n)
   svint8_t init_sve_sc3 = sve_sc1;
   svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' 
to 'svint8_t'} }
   svint8_t init_sve_sc5 = {};
-  svint8_t init_sve_sc6 = { sve_sc1 };
+  svint8_t init_sve_sc6 = { sve_sc1 }; // { dg-error {cannot convert 
'svint8_t' to 'signed char'} "" { target c++98_only } }
   svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'signed char'} }
   svint32_t init_sve_vc1 = { 0, 1 };
   svint32_t init_sve_vc2 = { 0, bar () };
diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
index 51def2441df..3243a17f606 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
@@ -161,7 +161,7 @@ statements (int n)
   svint8_t init_sve_sc3 = sve_sc1;
   svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' 
to 'svint8_t'} }
   svint8_t init_sve_sc5 = {};
-  svint8_t init_sve_sc6 = { sve_sc1 };
+  svint8_t init_sve_sc6 = { sve_sc1 }; // { dg-error {cannot convert 
'svint8_t' to 'signed char'} "" { target c++98_only } }
   svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'signed char'} }
   svint32_t init_sve_vc1 = { 0, 1 };
   svint32_t init_sve_vc2 = { 0, bar () };
-- 
2.43.0



[PUSHED 3/5] testsuite: Fix sve-sizeless-[12].C for aggregate change

2024-11-30 Thread Andrew Pinski
Since r15-5777-g761cf60218890a, the SVE types are considered
an aggregate since they are now acting similar as a GNU vector.

Pushed as obvious after a quick test for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: SVE vectors are now aggregates.
* g++.dg/ext/sve-sizeless-2.C: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +-
 gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
index 1e00518d0e4..0d62bd7f39e 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
@@ -351,7 +351,7 @@ statements (int n)
   { typedef int f[__has_unique_object_representations (svint8_t) ? 1 : -1]; }
   { typedef int f[!__has_virtual_destructor (svint8_t) ? 1 : -1]; }
   { typedef int f[!__is_abstract (svint8_t) ? 1 : -1]; }
-  { typedef int f[!__is_aggregate (svint8_t) ? 1 : -1]; }
+  { typedef int f[__is_aggregate (svint8_t) ? 1 : -1]; }
   { typedef int f[!__is_base_of (svint8_t, svint8_t) ? 1 : -1]; }
   { typedef int f[!__is_base_of (svint8_t, svint16_t) ? 1 : -1]; }
   { typedef int f[!__is_class (svint8_t) ? 1 : -1]; }
diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
index e6423d572f0..51def2441df 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
@@ -351,7 +351,7 @@ statements (int n)
   { typedef int f[__has_unique_object_representations (svint8_t) ? 1 : -1]; }
   { typedef int f[!__has_virtual_destructor (svint8_t) ? 1 : -1]; }
   { typedef int f[!__is_abstract (svint8_t) ? 1 : -1]; }
-  { typedef int f[!__is_aggregate (svint8_t) ? 1 : -1]; }
+  { typedef int f[__is_aggregate (svint8_t) ? 1 : -1]; }
   { typedef int f[!__is_base_of (svint8_t, svint8_t) ? 1 : -1]; }
   { typedef int f[!__is_base_of (svint8_t, svint16_t) ? 1 : -1]; }
   { typedef int f[!__is_class (svint8_t) ? 1 : -1]; }
-- 
2.43.0



Re: [PATCH v2] sched1: parameterize pressure scheduling spilling agressiveness [PR/114729]

2024-11-30 Thread Jeff Law




On 11/5/24 1:11 PM, Vineet Gupta wrote:

changes since v1
   * Changed target hook to --param
   * squash addon patch for RISC-V opting-in, testcase here
   * updated changelog with latest perf numbers
---

sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure sensitive schduling algorithm deliberately ignores
negative values (pressure reduction), making them 0 (neutral), leading
to more spills. This happens due to the assumption that the compiler has
a reasonably accurate processor pipeline scheduling model and thus tries
to aggresively fill pipeline bubbles with spill slots.

This however might not be true, as the model might not be available for
certains uarches or even applicable especially for modern out-of-order cores.

The existing heuristic induces spill frenzy on RISC-V, noticably so on
SPEC2017 507.Cactu. If insn scheduling is disabled completely, the
total dynamic icounts for this workload are reduced in half from
~2.5 trillion insns to ~1.3 (w/ -fno-schedule-insns).

This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.

  - The default (1) preserves existing spill behavior.

  - targets/uarches sensitive to spilling can override the param to (0)
to get the reverse effect. RISC-V backend does so too.

The actual perf numbers are very promising.

(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:

   Before:
   --
   Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

   4,917,712.97 msec task-clock:u #1.000 CPUs 
utilized
  5,314  context-switches:u   #1.081 /sec
  3  cpu-migrations:u #0.001 /sec
204,784  page-faults:u#   41.642 /sec
  7,868,291,222,513  cycles:u #1.600 GHz
  2,615,069,866,153  instructions:u   #0.33  insn per 
cycle
 10,799,381,890  branches:u   #2.196 M/sec
 15,714,572  branch-misses:u  #0.15% of all 
branches

   After:
   -
   Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

   4,552,979.58 msec task-clock:u #0.998 CPUs 
utilized
205,020  context-switches:u   #   45.030 /sec
  2  cpu-migrations:u #0.000 /sec
204,221  page-faults:u#   44.854 /sec
  7,285,176,204,764  cycles:u(7.4% faster)#1.600 GHz
  2,145,284,345,397  instructions:u (17.96% fewer)#0.29  insn per 
cycle
 10,799,382,011  branches:u   #2.372 M/sec
 16,235,628  branch-misses:u  #0.15% of all 
branches

(2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs.

gcc/ChangeLog:
PR target/11472
* params.opt (--param=cycle-accurate-model=): New opt.
* doc/invoke.texi (cycle-accurate-model): Document.
* haifa-sched.cc (model_excess_group_cost): Return negative
delta if param_cycle_accurate_model is 0.
(model_excess_cost): Ceil negative baseECC to 0 only if
param_cycle_accurate_model is 1.
Dump the actual ECC value.
* config/riscv/riscv.cc (riscv_option_override): Set param
to 0.

gcc/testsuite/ChangeLog:
PR target/114729
* gcc.target/riscv/riscv.exp: Enable new tests to build.
* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/riscv.cc |  4 +++
  gcc/doc/invoke.texi   |  7 
  gcc/haifa-sched.cc| 32 ++-
  gcc/params.opt|  4 +++
  gcc/testsuite/gcc.target/riscv/riscv.exp  |  2 ++
  .../gcc.target/riscv/sched1-spills/spill1.cpp | 32 +++
  6 files changed, 73 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7146163d66d0..c1e07e258b25 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17084,6 +17084,13 @@ With @option{--param=openacc-privatization=quiet}, 
don't diagnose.
  This is the current default.
  With @option{--param=openacc-privatization=noisy}, do diagnose.
  
+@item cycle-accurate-model

+Specifies whether GCC should assume that the scheduling description is mostly
+a cycle-accurate model of the target processor, where the code is intended to
+run on, in the absence of cache misses.  Nonzero means that the selected 
scheduling
+model is accuate and likely describes an in-order processor, and that 
scheduling
+will aggressively spill to try and fill any pipeline bubbles.
s/accuate/accurate/  And you should prob

[PUSHED 5/5] testsuite: Fix aarch64/sve/acle/general-c++/gnu_vectors_[12].C for taking address of vector element

2024-11-30 Thread Andrew Pinski
After the recent changes for SVE vectors becoming usable as GNU vector 
extensions. You can now get
each of the elements like it was an array. There is no reason why taking the 
address of that
won't be invalid too. especially since we are limiting to the first N elements 
(where N is the
min arch supported elements for these types).
So this removes the error message on these 2 lines and fixes the testcase.

Pushed as obvious after a quick test for these tests for aarch64-linux-gnu.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C: Remove
error message on taking address of an element of a vector.
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C: Likewise.

Signed-off-by: Andrew Pinski 
---
 .../g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C | 2 +-
 .../g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C 
b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C
index 49b90492ce1..cdd48cbc454 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C
@@ -115,7 +115,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1,
   // Vector indexing.
 
   sve_u1[0];
-  &sve_u1[0]; // { dg-error {subscripted value is neither array nor pointer} }
+  &sve_u1[0];
 
   gnu_u1[0];
   &gnu_u1[0];
diff --git 
a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C 
b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C
index 2e9f688946f..587d28d2363 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C
+++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C
@@ -115,7 +115,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1,
   // Vector indexing.
 
   sve_u1[0];
-  &sve_u1[0]; // { dg-error {subscripted value is neither array nor pointer} }
+  &sve_u1[0];
 
   gnu_u1[0];
   &gnu_u1[0];
-- 
2.43.0



Re: [PATCH 2/2] VN: Don't recurse on for the same value of `a != 0` [PR117859]

2024-11-30 Thread Richard Biener



> Am 01.12.2024 um 02:59 schrieb Andrew Pinski :
> 
> Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case.
> After adding vn_valueize to the handle the `a ==/!= 0` case
> of insert_predicates_for_cond, it would go into an infinite loop
> as the Value number for a could be the same as what it
> is for the whole expression. This avoids that recursion so there is
> no infinite loop here.
> 
> Note lim was introducing `bool_var2 = bool_var1 != 0` originally but
> with the gimple testcase in -2, there is no dependency on what passes
> before hand will do.
> 
> Bootstrapped and tested on x86_64-linux-gnu.

Ok

Richard 


>PR tree-optimization/117859
> 
> gcc/ChangeLog:
> 
>* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
>valueization for the new lhs for `lhs != 0`
>is the same as the old ones, don't recurse.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/torture/pr117859-1.c: New test.
>* gcc.dg/torture/pr117859-2.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
> gcc/testsuite/gcc.dg/torture/pr117859-1.c | 23 ++
> gcc/testsuite/gcc.dg/torture/pr117859-2.c | 51 +++
> gcc/tree-ssa-sccvn.cc |  3 +-
> 3 files changed, 76 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-1.c
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr117859-1.c
> new file mode 100644
> index 000..21e0a26e5ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117859-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ssa-phiopt -fno-expensive-optimizations" } */
> +
> +/* PR tree-optimization/117859 */
> +
> +unsigned char a;
> +int b, c, d, e, g, h, j, k;
> +struct {
> +  int f : 1;
> +} i;
> +char l(int m) { return a > 255 >> m ? 0 : m; }
> +int main() {
> +  if (i.f)
> +d = b;
> +  k = e = 1;
> +  for (; e; e--)
> +g = 0;
> +  h = g ? 1 << g : 1;
> +  c = h || i.f;
> +  while (b)
> +j = l(k && i.f);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-2.c 
> b/gcc/testsuite/gcc.dg/torture/pr117859-2.c
> new file mode 100644
> index 000..83549a80d3e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117859-2.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fgimple" } */
> +
> +/* PR tree-optimization/117859 */
> +
> +void __GIMPLE (ssa,startwith("pre"))
> +m (int p, int b)
> +{
> +  int _6;
> +  _Bool _48;
> +  _Bool _50;
> +  int _57;
> +  _Bool _59;
> +
> +  __BB(2):
> +  _6 = 0; // This needs to be prop'ed to make _48 unconditional
> +  if (_6 != 0)
> +goto __BB5;
> +  else
> +goto __BB14;
> +
> +  __BB(14):
> +  _48 = p_2(D) != 0;
> +  goto __BB7;
> +
> +  __BB(5):
> +  goto __BB7;
> +
> +  __BB(7):
> +  _59 = __PHI (__BB5: _Literal (_Bool) 1, __BB14: _48);
> +  if (b_36(D) != 0)
> +goto __BB8;
> +  else
> +goto __BB13;
> +
> +  __BB(8):
> +  _50 = _59 != _Literal (_Bool) 0; // Important being != 0
> +  _57 = _50 ? 1 : 0; // can be unused but need around for _50 being used
> +  goto __BB9;
> +
> +  __BB(9,loop_header(2)):
> +  if (_59 != _Literal (_Bool) 0)
> +goto __BB13;
> +  else
> +goto __BB9;
> +
> +  __BB(13):
> +  return;
> +
> +
> +}
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 7ef0bd01d3c..8d74731a891 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -7964,7 +7964,8 @@ insert_predicates_for_cond (tree_code code, tree lhs, 
> tree rhs,
>edge nf = false_e;
>if (code == EQ_EXPR)
>  std::swap (nt, nf);
> -insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf);
> +if (lhs != nlhs)
> +  insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf);
>  }
>   /* (a | b) == 0 ->
>on true edge assert: a == 0 & b == 0. */
> --
> 2.43.0
> 


[PATCH v2] Write binary annotations for CodeView S_INLINESITE symbols

2024-11-30 Thread Mark Harmstone
Add "binary annotations" at the end of CodeView S_INLINESITE symbols,
which are a series of compressed integers that represent how line
numbers map to addresses.

This requires assembler support; you will need commit b3aa594d ("gas:
add .cv_ucomp and .cv_scomp pseudo-directives") in binutils.

gcc/
* configure.ac (HAVE_GAS_CV_UCOMP): New check.
* configure: Regenerate.
* config.in: Regenerate.
* dwarf2codeview.cc (enum binary_annotation_opcode): Define.
(struct codeview_function): Add htab_next and inline_loc;
(struct cv_func_hasher): Define.
(cv_func_htab): New global variable.
(new_codeview_function): Add new codeview_function to hash table.
(codeview_begin_block): Record location of inline block.
(codeview_end_block): Add dummy source line at end of inline block.
(find_line_function): New function.
(write_binary_annotations): New function.
(write_s_inlinesite): Call write_binary_annotations.
(codeview_debug_finish): Delete cv_func_htab.
---
This second version adds a configure check for .cv_ucomp in the
assembler, and prevents the binary annotations from being emitted if
it's not supported. There's no separate check for .cv_scomp, as the same
binutils commit introduced them both.

 gcc/config.in |   6 ++
 gcc/configure |  32 +++
 gcc/configure.ac  |   3 +
 gcc/dwarf2codeview.cc | 203 +-
 4 files changed, 240 insertions(+), 4 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 972c0c2034d..d8145a1453b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1459,6 +1459,12 @@
 /* Define 0/1 if your assembler supports .cfi_sections. */
 #undef HAVE_GAS_CFI_SECTIONS_DIRECTIVE
 
+/* Define if your assembler supports .cv_ucomp. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GAS_CV_UCOMP
+#endif
+
+
 /* Define if your assembler supports the .loc discriminator sub-directive. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GAS_DISCRIMINATOR
diff --git a/gcc/configure b/gcc/configure
index 8bb71cfe348..bab4181a940 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -25943,6 +25943,38 @@ $as_echo "#define HAVE_GAS_BASE64 1" >>confdefs.h
 fi
 
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .cv_ucomp" >&5
+$as_echo_n "checking assembler for .cv_ucomp... " >&6; }
+if ${gcc_cv_as_cv_ucomp+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_cv_ucomp=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.cv_ucomp 0' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_cv_ucomp=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_cv_ucomp" >&5
+$as_echo "$gcc_cv_as_cv_ucomp" >&6; }
+if test $gcc_cv_as_cv_ucomp = yes; then
+
+$as_echo "#define HAVE_GAS_CV_UCOMP 1" >>confdefs.h
+
+fi
+
+
 # gnu_indirect_function type is an extension proposed at
 # http://groups.google/com/group/generic-abi/files. It allows dynamic runtime
 # selection of function implementation
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 88a1a44fcf7..b1b21cf4d7b 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3065,6 +3065,9 @@ gcc_GAS_CHECK_FEATURE([.base64], gcc_cv_as_base64,,
.base64 
"Tm9uIHB1ZG9yIGVzdCBuaWwgc2NpcmUsIHB1ZG9yIG5pbCBkaXNjZXJlIHZlbGxlLgo="],,
 [AC_DEFINE(HAVE_GAS_BASE64, 1, [Define if your assembler supports .base64.])])
 
+gcc_GAS_CHECK_FEATURE([.cv_ucomp], gcc_cv_as_cv_ucomp,,[.cv_ucomp 0],,
+[AC_DEFINE(HAVE_GAS_CV_UCOMP, 1, [Define if your assembler supports 
.cv_ucomp.])])
+
 # gnu_indirect_function type is an extension proposed at
 # http://groups.google/com/group/generic-abi/files. It allows dynamic runtime
 # selection of function implementation
diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 821554088d7..33b57a45970 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -1086,6 +1086,25 @@ enum cv_amd64_register {
   CV_AMD64_YMM15D3 = 687
 };
 
+/* This is enum BinaryAnnotationOpcode in Microsoft's cvinfo.h.  */
+
+enum binary_annotation_opcode {
+  ba_op_invalid,
+  ba_op_code_offset,
+  ba_op_change_code_offset_base,
+  ba_op_change_code_offset,
+  ba_op_change_code_length,
+  ba_op_change_file,
+  ba_op_change_line_offset,
+  ba_op_change_line_end_delta,
+  ba_op_change_range_kind,
+  ba_op_change_column_start,
+  ba_op_change_column_end_delta,
+  ba_op_change_code_offset_and_line_offset,
+  ba_op_change_code_length_and_code_offset,
+  ba_op_change_column_end
+};
+
 struct codeview_string
 {
   codeview_string *next;
@@ -1157,11 +1176,13 @@ struct codeview_inlinee_lines

[committed] libstdc++: Improve new testcase for std::optional assignment [PR117858]

2024-11-30 Thread Jonathan Wakely
The copy & paste bug affected two assignment operators, so ensure the
new test covers both.

libstdc++-v3/ChangeLog:

PR libstdc++/117858
* testsuite/20_util/optional/assignment/117858.cc: Also test
assignment from rvalue optional.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc 
b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc
index 9443e160484..e7045b37dd9 100644
--- a/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc
+++ b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc
@@ -13,4 +13,5 @@ struct Focus
 void g(std::optional f)
 {
   f = f;
+  f = std::move(f);
 }
-- 
2.47.0



[PATCH] x86: Add pcmpeq splitters

2024-11-30 Thread H.J. Lu
Add pcmpeq splitters to split

(insn 5 3 7 2 (set (reg:V4SI 100)
(eq:V4SI (reg:V4SI 98)
(reg:V4SI 98))) 7910 {*sse2_eqv4si3}
 (expr_list:REG_DEAD (reg:V4SI 98)
(expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
(const_int -1 [0x]) repeated x4
])
(const_vector:V4SI [
(const_int -1 [0x]) repeated x4
]))
(nil

to

(insn 8 3 7 2 (set (reg:V4SI 100)
(const_vector:V4SI [
(const_int -1 [0x]) repeated x4
])) -1
 (nil))

gcc/

PR target/117863
* config/i386/sse.md: Add pcmpeq splitters.

gcc/testsuite/

PR target/117863
* gcc.dg/rtl/i386/vector_eq-2.c: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/sse.md  | 33 ++
 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 +
 2 files changed, 104 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 498a42d6e1e..e2ce0781cb4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17943,6 +17943,17 @@ (define_insn "*avx2_eq3"
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
 
+(define_split
+  [(set (match_operand:VI_256 0 "register_operand")
+   (eq:VI_256
+ (match_operand:VI_256 1 "nonimmediate_operand")
+ (match_operand:VI_256 2 "nonimmediate_operand")))]
+  "TARGET_AVX2 && rtx_equal_p (operands[1], operands[2])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = CONSTM1_RTX (mode);
+})
+
 (define_insn_and_split "*avx2_pcmp3_1"
  [(set (match_operand:VI_128_256  0 "register_operand")
(vec_merge:VI_128_256
@@ -18227,6 +18238,17 @@ (define_insn "*sse4_1_eqv2di3"
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:V2DI 0 "register_operand")
+   (eq:V2DI
+ (match_operand:V2DI 1 "vector_operand")
+ (match_operand:V2DI 2 "vector_operand")))]
+  "TARGET_SSE4_1 && rtx_equal_p (operands[1], operands[2])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = CONSTM1_RTX (V2DImode);
+})
+
 (define_insn "*sse2_eq3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
(eq:VI124_128
@@ -18243,6 +18265,17 @@ (define_insn "*sse2_eq3"
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
+(define_split
+  [(set (match_operand:VI124_128 0 "register_operand")
+   (eq:VI124_128
+ (match_operand:VI124_128 1 "vector_operand")
+ (match_operand:VI124_128 2 "vector_operand")))]
+  "TARGET_SSE2 && rtx_equal_p (operands[1], operands[2])"
+  [(set (match_dup 0) (match_dup 1))]
+{
+  operands[1] = CONSTM1_RTX (mode);
+})
+
 (define_insn "sse4_2_gtv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
(gt:V2DI
diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c 
b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
new file mode 100644
index 000..871d489b730
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
@@ -0,0 +1,71 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-O2 -march=x86-64-v3" } */
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v2di __attribute__((vector_size(16)));
+
+v4si __RTL (startwith ("vregs1")) foo1 (void)
+{
+(function "foo1"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cnote 2 NOTE_INSN_FUNCTION_BEG)
+  (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) 
(const_int -1) (const_int -1) (const_int -1)])))
+  (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) 
(const_int -1) (const_int -1) (const_int -1)])))
+  (cinsn 5 (set (reg:V4SI <2>)
+   (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>
+  (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
+  (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
+  (edge-to exit (flags "FALLTHRU"))
+)
+  )
+ (crtl (return_rtx (reg/i:V4SI xmm0)))
+)
+}
+
+v8si __RTL (startwith ("vregs1")) foo2 (void)
+{
+(function "foo2"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cnote 2 NOTE_INSN_FUNCTION_BEG)
+  (cinsn 3 (set (reg:V8SI <0>) (const_vector:V8SI [(const_int -1) 
(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) 
(const_int -1) (const_int -1)])))
+  (cinsn 4 (set (reg:V8SI <1>) (const_vector:V8SI [(const_int -1) 
(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) 
(const_int -1) (const_int -1)])))
+  (cinsn 5 (set (reg:V8SI <2>)
+   (eq:V8SI (reg:V8SI <0>) (reg:V8SI <1>
+  (cinsn 6 (set (reg:V8SI <

[PATCH 2/2] VN: Don't recurse on for the same value of `a != 0` [PR117859]

2024-11-30 Thread Andrew Pinski
Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case.
After adding vn_valueize to the handle the `a ==/!= 0` case
of insert_predicates_for_cond, it would go into an infinite loop
as the Value number for a could be the same as what it
is for the whole expression. This avoids that recursion so there is
no infinite loop here.

Note lim was introducing `bool_var2 = bool_var1 != 0` originally but
with the gimple testcase in -2, there is no dependency on what passes
before hand will do.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117859

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
valueization for the new lhs for `lhs != 0`
is the same as the old ones, don't recurse.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117859-1.c: New test.
* gcc.dg/torture/pr117859-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr117859-1.c | 23 ++
 gcc/testsuite/gcc.dg/torture/pr117859-2.c | 51 +++
 gcc/tree-ssa-sccvn.cc |  3 +-
 3 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-1.c 
b/gcc/testsuite/gcc.dg/torture/pr117859-1.c
new file mode 100644
index 000..21e0a26e5ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117859-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ssa-phiopt -fno-expensive-optimizations" } */
+
+/* PR tree-optimization/117859 */
+
+unsigned char a;
+int b, c, d, e, g, h, j, k;
+struct {
+  int f : 1;
+} i;
+char l(int m) { return a > 255 >> m ? 0 : m; }
+int main() {
+  if (i.f)
+d = b;
+  k = e = 1;
+  for (; e; e--)
+g = 0;
+  h = g ? 1 << g : 1;
+  c = h || i.f;
+  while (b)
+j = l(k && i.f);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-2.c 
b/gcc/testsuite/gcc.dg/torture/pr117859-2.c
new file mode 100644
index 000..83549a80d3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117859-2.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fgimple" } */
+
+/* PR tree-optimization/117859 */
+
+void __GIMPLE (ssa,startwith("pre"))
+m (int p, int b)
+{
+  int _6;
+  _Bool _48;
+  _Bool _50;
+  int _57;
+  _Bool _59;
+
+  __BB(2):
+  _6 = 0; // This needs to be prop'ed to make _48 unconditional
+  if (_6 != 0)
+goto __BB5;
+  else
+goto __BB14;
+
+  __BB(14):
+  _48 = p_2(D) != 0;
+  goto __BB7;
+
+  __BB(5):
+  goto __BB7;
+
+  __BB(7):
+  _59 = __PHI (__BB5: _Literal (_Bool) 1, __BB14: _48);
+  if (b_36(D) != 0)
+goto __BB8;
+  else
+goto __BB13;
+
+  __BB(8):
+  _50 = _59 != _Literal (_Bool) 0; // Important being != 0
+  _57 = _50 ? 1 : 0; // can be unused but need around for _50 being used
+  goto __BB9;
+
+  __BB(9,loop_header(2)):
+  if (_59 != _Literal (_Bool) 0)
+goto __BB13;
+  else
+goto __BB9;
+
+  __BB(13):
+  return;
+
+
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 7ef0bd01d3c..8d74731a891 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -7964,7 +7964,8 @@ insert_predicates_for_cond (tree_code code, tree lhs, 
tree rhs,
edge nf = false_e;
if (code == EQ_EXPR)
  std::swap (nt, nf);
-   insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf);
+   if (lhs != nlhs)
+ insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf);
  }
   /* (a | b) == 0 ->
on true edge assert: a == 0 & b == 0. */
-- 
2.43.0



[PATCH 1/2] gimple-lim: Reuse boolean var when moving PHI

2024-11-30 Thread Andrew Pinski
While looking into PR 117859, I noticed that LIM
sometimes would produce `bool_var2 = bool_var1 != 0` instead
of just using bool_var2. This patch allows LIM to reuse bool_var1
in the place where bool_var2 was going to be used.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-loop-im.cc (move_computations_worker): While moving
phi, reuse the lhs of the conditional if it is a boolean type.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-loop-im.cc | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 538a0588d33..0130a809e30 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -1304,11 +1304,22 @@ move_computations_worker (basic_block bb)
 edges of COND.  */
  extract_true_false_args_from_phi (dom, stmt, &arg0, &arg1);
  gcc_assert (arg0 && arg1);
- t = make_ssa_name (boolean_type_node);
- new_stmt = gimple_build_assign (t, gimple_cond_code (cond),
- gimple_cond_lhs (cond),
- gimple_cond_rhs (cond));
- gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
+ /* For `bool_val != 0`, reuse bool_val. */
+ if (gimple_cond_code (cond) == NE_EXPR
+ && integer_zerop (gimple_cond_rhs (cond))
+ && types_compatible_p (TREE_TYPE (gimple_cond_lhs (cond)),
+boolean_type_node))
+   {
+ t = gimple_cond_lhs (cond);
+   }
+ else
+   {
+ t = make_ssa_name (boolean_type_node);
+ new_stmt = gimple_build_assign (t, gimple_cond_code (cond),
+ gimple_cond_lhs (cond),
+ gimple_cond_rhs (cond));
+ gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
+   }
  new_stmt = gimple_build_assign (gimple_phi_result (stmt),
  COND_EXPR, t, arg0, arg1);
  todo |= TODO_cleanup_cfg;
-- 
2.43.0



Re: [PATCH v3] zero_extend(not) -> xor optimization [PR112398]

2024-11-30 Thread Jeff Law




On 11/28/24 5:26 AM, Alexey Merzlyakov wrote:

This patch adds optimization of the following patterns:

   (zero_extend:M (subreg:N (not:O==M (X:Q==M ->
   (xor:M (zero_extend:M (subreg:N (X:M)), mask))
   ... where the mask is GET_MODE_MASK (N).

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:

   (zero_extend:M (subreg:N (not:M (X:M ->
   (xor:M (X:M, mask))

Patch targets to handle code patterns like:
   not   a0,a0
   andi  a0,a0,0xff
to be optimized to:
   xori  a0,a0,255

PR rtl-optimization/112398
PR rtl-optimization/117476

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr112398.c: New test.
* gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su.
* gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka.
Thanks.  I've bootstrapped & regression tested this on x86_64 as well as 
run it through my tester successfully.  Pushed to the trunk.  Hopefully 
no fallout this time :-)


jeff



[committed] libstdc++: Fix constraints on std::optional converting assignments [PR117858]

2024-11-30 Thread Jonathan Wakely
It looks like I copied these constraints from operator=(U&&) and didn't
correct them to account for the parameter being optional not U.

libstdc++-v3/ChangeLog:

PR libstdc++/117858
* include/std/optional (operator=(const optional&)): Fix copy
and paste error in constraints.
(operator=(optional&&)): Likewise.
* testsuite/20_util/optional/assignment/117858.cc: New test.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/optional|  4 ++--
 .../20_util/optional/assignment/117858.cc| 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index b8eedeec781..55e56cfb9ed 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1043,7 +1043,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 #ifdef _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL
-   requires (!is_same_v>)
+   requires (!is_same_v<_Tp, _Up>)
  && is_constructible_v<_Tp, const _Up&>
  && is_assignable_v<_Tp&, const _Up&>
  && (!__converts_from_optional<_Tp, _Up>::value)
@@ -1077,7 +1077,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 #ifdef _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL
-   requires (!is_same_v>)
+   requires (!is_same_v<_Tp, _Up>)
  && is_constructible_v<_Tp, _Up>
  && is_assignable_v<_Tp&, _Up>
  && (!__converts_from_optional<_Tp, _Up>::value)
diff --git a/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc 
b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc
new file mode 100644
index 000..9443e160484
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++17 } }
+
+// PR 117858 std::optional with a constructor template ctor(T)
+
+#include 
+
+struct Focus
+{
+  template
+  Focus(T newValue)  { }
+};
+
+void g(std::optional f)
+{
+  f = f;
+}
-- 
2.47.0



[PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2024-11-30 Thread H.J. Lu
For all different modes of all 0s/1s vectors, we can use the single widest
all 0s/1s vector register for all 0s/1s vector uses in the whole function.
Add a pass to generate a single widest all 0s/1s vector set instruction at
entry of the nearest common dominator for basic blocks with all 0s/1s
vector uses.  On Linux/x86-64, in cc1plus, this patch reduces the number
of vector xor instructions from 4803 to 4714 and pcmpeq instructions from
144 to 142.

This change causes a regression:

FAIL: gcc.dg/rtl/x86_64/vector_eq.c

without the fix for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863

NB: PR target/92080 and PR target/117839 aren't same.  PR target/117839
is for vectors of all 0s and all 1s with different sizes and different
components.  PR target/92080 is for broadcast of the same component to
different vector sizes.  This patch covers only all 0s and all 1s cases
of PR target/92080.

gcc/

PR target/92080
PR target/117839
* config/i386/i386-features.cc (ix86_rrvl_gate): New.
(ix86_place_single_vector_set): Likewise.
(ix86_get_vector_load_mode): Likewise.
(remove_redundant_vector_load): Likewise.
(pass_data_remove_redundant_vector_load): Likewise.
(pass_remove_redundant_vector_load): Likewise.
(make_pass_remove_redundant_vector_load): Likewise.
* config/i386/i386-passes.def: Add
pass_remove_redundant_vector_load after
pass_remove_partial_avx_dependency.
* config/i386/i386-protos.h
(make_pass_remove_redundant_vector_load): New.

gcc/testsuite/

PR target/92080
PR target/117839
* gcc.target/i386/pr117839-1a.c: New test.
* gcc.target/i386/pr117839-1b.c: Likewise.
* gcc.target/i386/pr117839-2.c: Likewise.
* gcc.target/i386/pr92080-1.c: Likewise.
* gcc.target/i386/pr92080-2.c: Likewise.
* gcc.target/i386/pr92080-3.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc| 308 
 gcc/config/i386/i386-passes.def |   1 +
 gcc/config/i386/i386-protos.h   |   2 +
 gcc/testsuite/gcc.target/i386/pr117839-1a.c |  35 +++
 gcc/testsuite/gcc.target/i386/pr117839-1b.c |   5 +
 gcc/testsuite/gcc.target/i386/pr117839-2.c  |  40 +++
 gcc/testsuite/gcc.target/i386/pr92080-1.c   |  54 
 gcc/testsuite/gcc.target/i386/pr92080-2.c   |  59 
 gcc/testsuite/gcc.target/i386/pr92080-3.c   |  48 +++
 9 files changed, 552 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-3.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 003b003e09c..7d8d260750d 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3288,6 +3288,314 @@ make_pass_remove_partial_avx_dependency (gcc::context 
*ctxt)
   return new pass_remove_partial_avx_dependency (ctxt);
 }
 
+static bool
+ix86_rrvl_gate ()
+{
+  return (TARGET_SSE2
+ && optimize
+ && optimize_function_for_speed_p (cfun));
+}
+
+/* Generate a vector set, DEST = SRC, at entry of the nearest dominator
+   for basic block map BBS, which is in the fake loop that contains the
+   whole function, so that there is only a single vector set in the
+   whole function.   */
+
+static void
+ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs)
+{
+  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
+  while (bb->loop_father->latch
+!= EXIT_BLOCK_PTR_FOR_FN (cfun))
+bb = get_immediate_dominator (CDI_DOMINATORS,
+ bb->loop_father->header);
+
+  rtx set = gen_rtx_SET (dest, src);
+
+  rtx_insn *insn = BB_HEAD (bb);
+  while (insn && !NONDEBUG_INSN_P (insn))
+{
+  if (insn == BB_END (bb))
+   {
+ insn = NULL;
+ break;
+   }
+  insn = NEXT_INSN (insn);
+}
+
+  rtx_insn *set_insn;
+  if (insn == BB_HEAD (bb))
+set_insn = emit_insn_before (set, insn);
+  else
+set_insn = emit_insn_after (set,
+   insn ? PREV_INSN (insn) : BB_END (bb));
+  df_insn_rescan (set_insn);
+}
+
+/* Return a machine mode suitable for vector SIZE.  */
+
+static machine_mode
+ix86_get_vector_load_mode (unsigned int size)
+{
+  machine_mode mode;
+  if (size == 64)
+mode = V64QImode;
+  else if (size == 32)
+mode = V32QImode;
+  else
+mode = V16QImode;
+  return mode;
+}
+
+/* At entry of the nearest common dominator for basic blocks with vector
+   CONST0_RTX and integer CONSTM1_RTX uses, generate a single widest
+   vector set instruction for all CONST0_RTX and integer CONSTM1_RTX
+   uses.
+
+   NB: We want to 

Re: [PATCH] Add fancy pointer support in std::map/set

2024-11-30 Thread Jonathan Wakely
On Sat, 30 Nov 2024, 13:52 François Dumont,  wrote:

> Hi
>
> I've applied all your comments below and the ones you did on the PR
> directly.
>
> When all new types totally seperated from the legacy types
> _Rb_tree_node_traits is indeed useless.
>
> Regarding _Rb_tree_helpers I got rid of it but move the insert-and-rebalance
> and rebalance-for-erase function to _Node_traits<>.
>
> PR updated:
>
> https://forge.sourceware.org/gcc/gcc-TEST/pulls/27
>
> libstdc++: Add fancy pointer support in map and set Support fancy
> allocator pointer type in std::_Rb_tree<>. In case of fancy pointer type
> the container is now storing the pointer to __rb_tree::_Node<_ValPtr> as
> a pointer to __rb_tree::_Node_base<_VoidPtr>. Many methods are adapted to
> take and return _Base_ptr in place of _Link_type which has been renamed
> into _Node_ptr. As all node are stored as _Base_ptr have all methods
> working with this type and remove _Const_Base_ptr and _Const_Node_ptr and
> all methods associated with it.
>

Sounds good! I'll take another look on Monday.


libstdc++-v3/ChangeLog: * include/bits/stl_tree.h
> [_GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE]: New macro to control usage of the
> code required to support fancy allocator pointer type. (_Rb_tree_node_
> base::_Const_Base_ptr): Remove. (_Rb_tree_node_base::_S_minimum,
> _Rb_tree_node_base::_S_maximum): Remove overloads for _Const_Base_ptr.
> (_Rb_tree_node_base::_M_base_ptr()): New. 
> (_Rb_tree_node_base::_M_node_ptr<_NodePtr>()):
> New. (_Rb_tree_node::_Link_type): Rename into... (_Rb_tree_node::_Node_ptr):
> ...this. (__rb_tree::_Node_base<>): New. (__rb_tree::_Header<>): New.
> (__rb_tree::_Node<>): New. (_Rb_tree_increment(const
> _Rb_tree_node_base*)): Remove declaration. (_Rb_tree_decrement(const
> _Rb_tree_node_base*)): Remove declaration.
> (_Rb_tree_iterator<>::_Link_type): Rename into...
> (_Rb_tree_iterator<>::_Node_ptr): ...this.
> (_Rb_tree_const_iterator<>::_Link_type): Rename into...
> (_Rb_tree_const_iterator<>::_Node_ptr): ...this.
> (_Rb_tree_const_iterator<>::_M_const_cast): Remove.
> (_Rb_tree_const_iterator<>::_M_node): Change type into _Base_ptr. (__rb_
> tree::_Iterator<>): New. (__rb_tree::_Node_traits<>): New.
> (_Rb_tree<>::_Node_base, _Rb_tree::_Node_type): New.
> (_Rb_tree<>::_Link_type): Rename into... (_Rb_tree<>::_Node_ptr): ...this.
> (_Rb_tree<>::_Const_Base_ptr, _Rb_tree<>::_Const_Node_ptr): Remove.
> (_Rb_tree<>): Adapt to generalize usage of _Base_ptr in place of former
> _Link_type now _Node_ptr. (_Rb_tree<>::_M_mbegin): Remove.
> (_Rb_tree<>::_S_left(_Const_Base_ptr)): Remove.
> (_Rb_tree<>::_S_right(_Const_Base_ptr)): Remove.
> (_Rb_tree<>::_S_maximum(_Const_Base_ptr)): Remove.
> (_Rb_tree<>::_S_minimum(_Const_Base_ptr)): Remove. *
> testsuite/23_containers/map/allocator/ext_ptr.cc: New test case. *
> testsuite/23_containers/multimap/allocator/ext_ptr.cc: New test case. *
> testsuite/23_containers/multiset/allocator/ext_ptr.cc: New test case. *
> testsuite/23_containers/set/allocator/ext_ptr.cc: New test case.
>
> I've tested it in C++98/C++17/C++20 with default
> _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE and with it 9001.
>
> François
> On 18/11/2024 17:39, Jonathan Wakely wrote:
>
> On 18/11/24 06:57 +0100, François Dumont wrote:
>
> Hi
>
> Here is a new proposal with all the cleanup regarding _Const_Base_ptr
> that makes support of allocator's fancy pointer type simpler.
>
> Also submitted as PR:
> https://forge.sourceware.org/gcc/gcc-TEST/pulls/27
>
>   libstdc++: Add fancy pointer support in map and set
>
>Support fancy allocator pointer type in std::_Rb_tree<>.
>
>In case of fancy pointer type the container is now storing the
> pointer to
>_Rb_tree_pnode<> as a pointer to _Rb_tree_pnode_base<>.
>
>Many methods are adapted to take and return _Base_ptr in place of
> _Link_type
>which has been renamed into _Node_ptr.
>
>As all node are stored as _Base_ptr have all methods working with
> this type
>and remove _Const_Base_ptr and all methods associated to it.
>
>libstdc++-v3/ChangeLog:
>
>* include/bits/stl_set.h (std::set<>::upper_bound<>(const
> _Kt&) const): Fix
>decltype typo to use const_iterator.
>* include/bits/stl_tree.h
>(_Rb_tree_ptr_traits<>): New.
>(_Rb_tree_pnode_base<>): New.
>(_Rb_tree_node_base): Inherit from latter.
>(_Rb_tree_node_base::_Const_Base_ptr): Remove.
>(_Rb_tree_node_base::_S_minimum(_Const_Base_ptr)): Remove.
>(_Rb_tree_node_base::_S_maximum(_Const_Base_ptr)): Remove.
>(_Rb_tree_pheader): New.
>(_Rb_tree_header): Inherit from latter.
>(_Rb_tree_node_val): New.
>(_Rb_tree_node): Inherit from latter.
>(_Rb_tree_pnode): New.
>(_Rb_tree_iterator<>::_Link_type): Rename into...
>(_Rb_tree_iterator<>::_Node_ptr): ...this.
>(_Rb_tree_const_iterator<>::_Link_type): Rename into...
>