Re: Fix type compatibility for types with flexible array member [PR113688,PR114014,PR117724]
Am Dienstag, dem 26.11.2024 um 15:15 + schrieb Qing Zhao: > > > On Nov 25, 2024, at 16:46, Martin Uecker wrote: > > > > > > Hi Qing, > > > > Am Montag, dem 25.11.2024 um 17:40 + schrieb Qing Zhao: > > > Hi, Martin, > > > > > > I didn’t go through all the details of your patch. > > > > > > But I have one question: > > > > > > Did you consider the effect of the option -fstrict-flex-array > > > (https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays) > > > on how gcc treats the zero size trailing array, 1-element trailing array > > > as flexible array member in the patch? > > > > I used the function which was already there which > > does not take this into account. For the new version > > of the patch this should not matter anymore. > > Why it’s not matter anymore? > > For the following testing case: > > struct S{int x,y[1];}*a; > int main(void){ > struct S{int x,y[];}; > } > > With your latest patch, the two structures are considered as compatible with > -g; > However, if we add -fstrict-flex-array=2 or -fstrict-flex-array=3, the > trailing array y[1] is NOT treated > as FAM anymore, as a result, these two structure are NOT compatible too. > > Do I miss anything obvious? It is not about compatibility from a language semantic point of you but for TBAA-compatibility which needs to be consistent with it but can (and must be) more general. For TBAA, I think we want struct foo { int x; int y[]; }; to be TBAA-compatible to struct foo { int x; int y[3]; }; even when we do not treat the later as FAM (i.e. still forbid out-of-bounds accesses). E.g. see Richard's comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114713#c2 Martin > Thanks. > > Qing > > > > Martin > > > > > > > > > > thanks. > > > > > > Qing > > > > On Nov 23, 2024, at 14:45, Martin Uecker wrote: > > > > > > > > > > > > This patch tries fixes the errors we have because of > > > > flexible array members. I am bit unsure about the exception > > > > for the mode. > > > > > > > > Bootstrapped and regression tested on x86_64. > > > > > > > > > > > > > > > > Fix type compatibility for types with flexible array member > > > > [PR113688,PR114014,PR117724] > > > > > > > > verify_type checks the compatibility of TYPE_CANONICAL using > > > > gimple_canonical_types_compatible_p. But it is stricter than what > > > > the > > > > C standard requires and therefor inconsistent with how TYPE_CANONICAL > > > > is set > > > > in the C FE. Here, the logic is changed to ignore array size when > > > > one of the > > > > types is a flexible array member. To not get errors because of > > > > inconsistent > > > > number of members, zero-sized arrays are not ignored anymore when > > > > checking > > > > fields of a struct (which is stricter than what was done before). > > > > Finally, a exception is added that allows the TYPE_MODE of a type with > > > > flexible array member to differ from another compatible type. > > > > > > > > PR c/113688 > > > > PR c/114014 > > > > PR c/117724 > > > > > > > > gcc/ChangeLog: > > > > * tree.cc (gimple_canonical_types_compatible_p): Revise > > > > logic for types with FAM. > > > > (verify_type): Add exception for mode for types with FAM. > > > > > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/pr113688.c: New test. > > > > * gcc.dg/pr114014.c: New test. > > > > * gcc.dg/pr117724.c: New test. > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/pr113688.c > > > > b/gcc/testsuite/gcc.dg/pr113688.c > > > > new file mode 100644 > > > > index 000..8dee8c86f1b > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr113688.c > > > > @@ -0,0 +1,8 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-g" } */ > > > > + > > > > +struct S{int x,y[1];}*a; > > > > +int main(void){ > > > > + struct S{int x,y[];}; > > > > +} > > > > + > > > > diff --git a/gcc/testsuite/gcc.dg/pr114014.c > > > > b/gcc/testsuite/gcc.dg/pr114014.c > > > > new file mode 100644 > > > > index 000..ab783f4f85d > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr114014.c > > > > @@ -0,0 +1,14 @@ > > > > +/* PR c/114014 > > > > + * { dg-do compile } > > > > + * { dg-options "-std=c23 -g" } */ > > > > + > > > > +struct r { > > > > + int a; > > > > + char b[]; > > > > +}; > > > > +struct r { > > > > + int a; > > > > + char b[0]; > > > > +}; /* { dg-error "redefinition" } */ > > > > + > > > > + > > > > diff --git a/gcc/testsuite/gcc.dg/pr117724.c > > > > b/gcc/testsuite/gcc.dg/pr117724.c > > > > new file mode 100644 > > > > index 000..d631daeb644 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.dg/pr117724.c > > > > @@ -0,0 +1,16 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-g" } */ > > > > + > > > > +struct { > > > > + unsigned long len; > > > > + unsigned long size; > > > > + char data[]
Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?
Am Dienstag, dem 26.11.2024 um 20:59 + schrieb Qing Zhao: > Think this over these days, I have another thought that need some feedback: > > The major issue right now is: > > 1. For the following structure in which the “counted_by” attributes is > attached to the pointer field. > > struct foo { > int n; > char *p __attribute__ ((counted_by (n))); > } *x; BTW: I also do not like the syntax 'n' for a lookup of a member in the member namespace. I think it should be '.n'. For FAM this is less problematic because it always at the end, but here it is problematic. > > There is one important additional requirement: > > x->n, x->p can ONLY be changed by changing the whole structure at the same > time. > Otherwise, x->n might not be consistent with x->p. By itself, this would still not fix the issue I pointed out. struct foo x; x = .. ; // set the whole structure char *p = x->p; x = ... ; // set the whole structure What is the bound for 'p' ? With current rules it would be the old bound. > > 2. This new requirement is ONLY for “counted_by” attribute that is attached > to the pointer field, not needed > for flexible array members. > > 3. Then there will be inconsistency for the “counted_by” attribute between > FAM and pointer field. > The major questions I have right now: > > 1. Shall we keep this inconsistency between FAM and pointer field? > Or, > > 2. Shall we keep them consistent by adding this new requirement for the > previous FAM? Or have a new attribute? I feel we double down on a bad design which made sense for FAM addressing a very specific use case (retrofitting checks to the Linux kernel) but is otherwise not very strong. Martin > > Kees, the following feature that you requested for the FAM: > > " > One important feature of the attribute is, a reference to the > flexible array member field uses the latest value assigned to the > field that represents the number of the elements before that > reference. For example, > > p->count = val1; > p->array[20] = 0; // ref1 to p->array > p->count = val2; > p->array[30] = 0; // ref2 to p->array > > in the above, 'ref1' uses 'val1' as the number of the elements in > 'p->array', and 'ref2' uses 'val2' as the number of elements in > 'p->array’. > " > Has this feature been used by Linux kernel already? > Is this feature really needed by Linux kernel? > > Thanks a lot for suggestions and comments. > > Qing > > > On Nov 20, 2024, at 14:23, Martin Uecker wrote: > > > > Am Mittwoch, dem 20.11.2024 um 17:37 + schrieb Qing Zhao: > > > Hi, Martin, > > > > > > Thanks a lot for pointing this out. > > > > > > This does look like a problem we need avoid for the pointer arrays. > > > > > > Does the same problem exist in the language extension too if the n is > > > allowed to be changed after initialization? > > > > > > If so, for the future language extension, is there any proposed solution > > > to this problem? > > > > > > > There is no specification yet and nothing formally proposed, so > > it is entirely unclear at this point. > > > > My idea would be to give 'x->buf' the type '(char(*)[x->n])' > > where 'x->n' is loaded at the same time 'x->buf' is accessed > > and then the value is frozen (like in the simpler versions > > of 'counted_by' you had implemented first). Of course, one > > would then have to set x->n before setting the buffer (or > > at the same time). This could be ensured by making the > > member 'n' const, so that it can only be changed by > > overwriting the whole struct. But I am still thinking > > about this. > > > > In any case, I think for "counted_by" this is not an option > > because it would be confusing if it works differently > > than for the FAM case. > > > > Martin > > > > > > > Qing > > > > On Nov 20, 2024, at 10:52, Martin Uecker wrote: > > > > > > > > Am Mittwoch, dem 20.11.2024 um 15:27 + schrieb Qing Zhao: > > > > > > > > > > > On Nov 19, 2024, at 10:47, Marek Polacek wrote: > > > > > > > > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote: > > > > > > > Am Montag, dem 18.11.2024 um 17:55 + schrieb Qing Zhao: > > > > > > > > Hi, > > > > > > > > > .. > > > > > > > > Hi Qing, > > > > > > > > > Per our discussion so far, if treating the following > > > > > > > > > > struct foo { > > > > > int n; > > > > > char *p __attribute__ ((counted_by (n))); > > > > > }; > > > > > > > > > > as an array with upper-bounds being “n” is reasonable. > > > > > > > > There is one issue I want to point out, which I just realized during > > > > a discussion in WG14. For "counted_by" we defined the semantics > > > > in a way that later store to 'n' will be taken into account. > > > > We did this to support the use case where 'n' is set after > > > > the access to 'p'. > > > > > > > > struct foo *x = ; > > > > > > > > char *q = x->p; > > > > x->n = 100; // this should apply
Re: gimplify: Handle void BIND_EXPR as asm input [PR100501]
> Am 29.11.2024 um 23:45 schrieb Joseph Myers : > > On Fri, 29 Nov 2024, Richard Biener wrote: > >> I think we're trying to handle errorneous cases by setting TREE_VALUE >> to error_mark_node >> before this, so how about the following instead? > > Yes, that works, and also fixes the test in bug 100792 unlike my previous > patch. Here's a full, tested patch using your version. > > > gimplify: Handle void expression as asm input [PR100501, PR100792] > > As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C > tests involving a statement expression not returning a value as an asm > input; this includes the variant bug 100792 where the statement > expression ends with another asm statement. > > The expected diagnostic for this case (as seen for C++ input) is one > coming from the gimplifier and so it seems reasonable to fix the > gimplifier to handle the GENERIC generated for this case by the C > front end, rather than trying to make the C front end detect it > earlier. Thus the gimplifier to handle a void > expression like other non-lvalues for such a memory input. > > Bootstrapped with no regressions for x86_64-pc-linux-gnu. OK to commit? Ok Richard >PR c/100501 >PR c/100792 > > gcc/ >* gimplify.cc (gimplify_asm_expr): Handle void expressions for >memory inputs like other non-lvalues. > > gcc/testsuite/ >* gcc.dg/pr100501-1.c, gcc.dg/pr100792-1.c: New tests. >* gcc.dg/pr48552-1.c, gcc.dg/pr48552-2.c, >gcc.dg/torture/pr98601.c: Update expected errors. > > Co-authored-by: Richard Biener > > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc > index fb0ca23bfb6c..aa99c0a98f73 100644 > --- a/gcc/gimplify.cc > +++ b/gcc/gimplify.cc > @@ -7453,7 +7453,8 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, > gimple_seq *post_p) > || TREE_CODE (inputv) == PREINCREMENT_EXPR > || TREE_CODE (inputv) == POSTDECREMENT_EXPR > || TREE_CODE (inputv) == POSTINCREMENT_EXPR > - || TREE_CODE (inputv) == MODIFY_EXPR) > + || TREE_CODE (inputv) == MODIFY_EXPR > + || VOID_TYPE_P (TREE_TYPE (inputv))) >TREE_VALUE (link) = error_mark_node; > tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p, >is_gimple_lvalue, fb_lvalue | fb_mayfail); > diff --git a/gcc/testsuite/gcc.dg/pr100501-1.c > b/gcc/testsuite/gcc.dg/pr100501-1.c > new file mode 100644 > index ..152caac8b5d1 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr100501-1.c > @@ -0,0 +1,26 @@ > +/* Test ICE for statement expression returning no value as asm input (bug > + 100501). */ > +/* { dg-do compile } */ > +/* { dg-options "" } */ > + > +int x; > +int g (); > + > +void > +f () > +{ > + __asm__ ("" : : "m" (({}))); /* { dg-error "memory input 0 is not directly > addressable" } */ > + __asm__ ("" : : "m" (({ ; }))); /* { dg-error "memory input 0 is not > directly addressable" } */ > + __asm__ ("" : : "m" (({ (void) 0; }))); /* { dg-error "memory input 0 is > not directly addressable" } */ > + __asm__ ("" : : "m" (({ f (); }))); /* { dg-error "memory input 0 is not > directly addressable" } */ > + __asm__ ("" : : "m" (({ f (); f (); }))); /* { dg-error "memory input 0 is > not directly addressable" } */ > + __asm__ ("" : : "m" (({ x = g (); f (); }))); /* { dg-error "memory input > 0 is not directly addressable" } */ > + __asm__ ("" : : "m" (({ if (1) g (); }))); /* { dg-error "memory input 0 > is not directly addressable" } */ > + __asm__ ("" : : "m" (({ if (1) g (); else g (); }))); /* { dg-error > "memory input 0 is not directly addressable" } */ > + __asm__ ("" : : "m" (({ test : goto test; }))); /* { dg-error "memory > input 0 is not directly addressable" } */ > + __asm__ ("" : : "m" (({ return; }))); /* { dg-error "memory input 0 is not > directly addressable" } */ > + __asm__ ("" : : "m" (({ while (1); }))); /* { dg-error "memory input 0 is > not directly addressable" } */ > + __asm__ ("" : : "m" (({ do {} while (1); }))); /* { dg-error "memory input > 0 is not directly addressable" } */ > + __asm__ ("" : : "m" (({ for (;;); }))); /* { dg-error "memory input 0 is > not directly addressable" } */ > + __asm__ ("" : : "m" (({ switch (x); }))); /* { dg-error "memory input 0 is > not directly addressable" } */ > +} > diff --git a/gcc/testsuite/gcc.dg/pr100792-1.c > b/gcc/testsuite/gcc.dg/pr100792-1.c > new file mode 100644 > index ..52f3aaf83f73 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr100792-1.c > @@ -0,0 +1,10 @@ > +/* Test ICE for statement expression ending with asm as asm input (bug > + 100792). */ > +/* { dg-do compile } */ > +/* { dg-options "" } */ > + > +void > +f () > +{ > + __asm__ ("" : : "m" (({ __asm__ (""); }))); /* { dg-error "memory input 0 > is not directly addressable" } */ > +} > diff --git a/gcc/testsuite/gcc.dg/pr48552-1.c > b/gcc/testsuite/gcc.dg/pr48552-1.c > index 4cd7c59011ef..b3ef453cb1d5 100644 > --- a/gcc/testsuite/gcc.dg/pr48552-1
Re:[pushed] [PATCH 1/2] LoongArch: testsuite: Fix loongarch/vect-frint-scalar.c.
Pushed to r15-5817. 在 2024/11/26 下午4:06, Lulu Cheng 写道: In r15-5327, change the default language version for C compilation from -std=gnu17 to -std=gnu23. ISO C99 and C11 allow ceil, floor, round and trunc, and their float and long double variants, to raise the “inexact” exception, but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as integrated into ISO C23, does not allow these functions to do so. So add '-ffp-int-builtin-inexact' to this test case. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-frint-scalar.c: Add '-ffp-int-builtin-inexact'. --- gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c index c7cb40be7d4..dbcb9065ad4 100644 --- a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c +++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mlsx" } */ +/* { dg-options "-O2 -mlsx -ffp-int-builtin-inexact" } */ #define test(func, suffix) \ __typeof__ (1.##suffix) \
Re:[pushed] [PATCH 2/2] LoongArch: testsuite: Fix l{a}sx-andn-iorn.c.
Pushed to r15-5818. 在 2024/11/26 下午4:06, Lulu Cheng 写道: Add '-fdump-tree-optimized' to this testcases. gcc/testsuite/ChangeLog: * gcc.target/loongarch/lasx-andn-iorn.c: Add '-fdump-tree-optimized'. * gcc.target/loongarch/lsx-andn-iorn.c: Likewise. --- gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c | 2 +- gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c b/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c index 4aa5f19a650..86b04dbbb33 100644 --- a/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c +++ b/gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c @@ -3,7 +3,7 @@ #include "./lsx-andn-iorn.c" /* { dg-do compile } */ -/* { dg-options "-O2 -mlasx -ftree-vectorize" } */ +/* { dg-options "-O2 -mlasx -ftree-vectorize -fdump-tree-optimized" } */ /* We should produce a BIT_ANDC and BIT_IORC here. */ diff --git a/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c b/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c index 7bceccd37d4..14838ab8bff 100644 --- a/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c +++ b/gcc/testsuite/gcc.target/loongarch/lsx-andn-iorn.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mlsx -ftree-vectorize" } */ +/* { dg-options "-O2 -mlsx -ftree-vectorize -fdump-tree-optimized" } */ #ifndef N #define N 4
[PATCH] strlen: Handle vector CONSTRUCTORs [PR117057]
Hi! The following patch handles VECTOR_TYPE_P CONSTRUCTORs in count_nonzero_bytes, including handling them if they have some elements non-constant. If there are still some constant elements before it (in the range queried), we derive info at least from those bytes and consider the rest as unknown. The first 3 hunks just punt in IMHO problematic cases, the spaghetti code considers byte_size 0 as unknown size, determine yourself, so if offset is equal to exp size, there are 0 bytes to consider (so nothing useful to determine), but using byte_size 0 would mean use any size. Similarly, native_encode_expr uses int type for offset (and size), so padding it offset larger than INT_MAX could be silent miscompilation. I've guarded the test to just a couple of targets known to handle it, because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR into 4 BIT_FIELD_REF stores and I haven't figured out on what exactly that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec -mno-vsx, even -m32). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-11-30 Jakub Jelinek PR tree-optimization/117057 * tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also when byte_size is equal to offset or nchars. Punt if offset is bigger than INT_MAX. Handle vector CONSTRUCTOR with some elements constant, possibly followed by non-constant. * gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign specific scan-tree-dump-times directive. * gcc.dg/strlenopt-96.c: New test. --- gcc/tree-ssa-strlen.cc.jj 2024-11-23 13:00:31.476982808 +0100 +++ gcc/tree-ssa-strlen.cc 2024-11-29 22:53:17.465837084 +0100 @@ -4629,7 +4629,7 @@ strlen_pass::count_nonzero_bytes (tree e return false; unsigned HOST_WIDE_INT byte_size = tree_to_uhwi (size); - if (byte_size < offset) + if (byte_size <= offset) return false; nbytes = byte_size - offset; @@ -4682,7 +4682,7 @@ strlen_pass::count_nonzero_bytes (tree e if (TREE_CODE (exp) == STRING_CST) { unsigned nchars = TREE_STRING_LENGTH (exp); - if (nchars < offset) + if (nchars <= offset) return false; if (!nbytes) @@ -4700,7 +4700,7 @@ strlen_pass::count_nonzero_bytes (tree e unsigned char buf[256]; if (!prep) { - if (CHAR_BIT != 8 || BITS_PER_UNIT != 8) + if (CHAR_BIT != 8 || BITS_PER_UNIT != 8 || offset > INT_MAX) return false; /* If the pointer to representation hasn't been set above for STRING_CST point it at the buffer. */ @@ -4710,11 +4710,75 @@ strlen_pass::count_nonzero_bytes (tree e unsigned repsize = native_encode_expr (exp, buf, sizeof buf, offset); if (repsize < nbytes) { - /* This should only happen when REPSIZE is zero because EXP -doesn't denote an object with a known initializer, except -perhaps when the reference reads past its end. */ - lenrange[0] = 0; - prep = NULL; + /* Handle vector { 0x12345678, 0x23003412, x_1(D), y_2(D) } +and similar cases. Even when not all the elements are constant, +we can perhaps figure out something from the constant ones +and assume the others can be anything. */ + if (TREE_CODE (exp) == CONSTRUCTOR + && CONSTRUCTOR_NELTS (exp) + && VECTOR_TYPE_P (TREE_TYPE (exp)) + && nbytes <= sizeof buf) + { + tree v0 = CONSTRUCTOR_ELT (exp, 0)->value; + unsigned HOST_WIDE_INT elt_sz + = int_size_in_bytes (TREE_TYPE (v0)); + unsigned int i, s = 0; + tree v, idx; + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), i, idx, v) + { + if (idx + && (VECTOR_TYPE_P (TREE_TYPE (v0)) + || !tree_fits_uhwi_p (idx) + || tree_to_uhwi (idx) != i)) + { + s = 0; + break; + } + if ((i + 1) * elt_sz <= offset) + continue; + unsigned int o = 0; + if (i * elt_sz < offset) + o = offset % elt_sz; + repsize = native_encode_expr (v, buf + s, + sizeof (buf) - s, o); + if (repsize != elt_sz - o) + break; + s += repsize; + } + if (s != 0 && s < nbytes) + { + unsigned HOST_WIDE_INT n = strnlen (prep, s); + if (n < lenrange[0]) + lenrange[0] = n; + if (lenrange[1] < n && n != s) + lenrange[1] = n; + if (lenrange[2] < nbytes) + lenrange[2] = nbytes; +
Re: [PATCH] strlen: Handle vector CONSTRUCTORs [PR117057]
> Am 30.11.2024 um 10:53 schrieb Jakub Jelinek : > > Hi! > > The following patch handles VECTOR_TYPE_P CONSTRUCTORs in > count_nonzero_bytes, including handling them if they have some elements > non-constant. > If there are still some constant elements before it (in the range queried), > we derive info at least from those bytes and consider the rest as unknown. > > The first 3 hunks just punt in IMHO problematic cases, the spaghetti code > considers byte_size 0 as unknown size, determine yourself, so if offset > is equal to exp size, there are 0 bytes to consider (so nothing useful > to determine), but using byte_size 0 would mean use any size. > Similarly, native_encode_expr uses int type for offset (and size), so > padding it offset larger than INT_MAX could be silent miscompilation. > > I've guarded the test to just a couple of targets known to handle it, > because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR > into 4 BIT_FIELD_REF stores and I haven't figured out on what exactly > that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec > -mno-vsx, even -m32). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok Thanks, Richard > 2024-11-30 Jakub Jelinek > >PR tree-optimization/117057 >* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also >when byte_size is equal to offset or nchars. Punt if offset is bigger >than INT_MAX. Handle vector CONSTRUCTOR with some elements constant, >possibly followed by non-constant. > >* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign >specific scan-tree-dump-times directive. >* gcc.dg/strlenopt-96.c: New test. > > --- gcc/tree-ssa-strlen.cc.jj2024-11-23 13:00:31.476982808 +0100 > +++ gcc/tree-ssa-strlen.cc2024-11-29 22:53:17.465837084 +0100 > @@ -4629,7 +4629,7 @@ strlen_pass::count_nonzero_bytes (tree e >return false; > > unsigned HOST_WIDE_INT byte_size = tree_to_uhwi (size); > - if (byte_size < offset) > + if (byte_size <= offset) >return false; > > nbytes = byte_size - offset; > @@ -4682,7 +4682,7 @@ strlen_pass::count_nonzero_bytes (tree e > if (TREE_CODE (exp) == STRING_CST) > { > unsigned nchars = TREE_STRING_LENGTH (exp); > - if (nchars < offset) > + if (nchars <= offset) >return false; > > if (!nbytes) > @@ -4700,7 +4700,7 @@ strlen_pass::count_nonzero_bytes (tree e > unsigned char buf[256]; > if (!prep) > { > - if (CHAR_BIT != 8 || BITS_PER_UNIT != 8) > + if (CHAR_BIT != 8 || BITS_PER_UNIT != 8 || offset > INT_MAX) >return false; > /* If the pointer to representation hasn't been set above > for STRING_CST point it at the buffer. */ > @@ -4710,11 +4710,75 @@ strlen_pass::count_nonzero_bytes (tree e > unsigned repsize = native_encode_expr (exp, buf, sizeof buf, offset); > if (repsize < nbytes) >{ > - /* This should only happen when REPSIZE is zero because EXP > - doesn't denote an object with a known initializer, except > - perhaps when the reference reads past its end. */ > - lenrange[0] = 0; > - prep = NULL; > + /* Handle vector { 0x12345678, 0x23003412, x_1(D), y_2(D) } > + and similar cases. Even when not all the elements are constant, > + we can perhaps figure out something from the constant ones > + and assume the others can be anything. */ > + if (TREE_CODE (exp) == CONSTRUCTOR > + && CONSTRUCTOR_NELTS (exp) > + && VECTOR_TYPE_P (TREE_TYPE (exp)) > + && nbytes <= sizeof buf) > +{ > + tree v0 = CONSTRUCTOR_ELT (exp, 0)->value; > + unsigned HOST_WIDE_INT elt_sz > += int_size_in_bytes (TREE_TYPE (v0)); > + unsigned int i, s = 0; > + tree v, idx; > + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), i, idx, v) > +{ > + if (idx > + && (VECTOR_TYPE_P (TREE_TYPE (v0)) > + || !tree_fits_uhwi_p (idx) > + || tree_to_uhwi (idx) != i)) > +{ > + s = 0; > + break; > +} > + if ((i + 1) * elt_sz <= offset) > +continue; > + unsigned int o = 0; > + if (i * elt_sz < offset) > +o = offset % elt_sz; > + repsize = native_encode_expr (v, buf + s, > +sizeof (buf) - s, o); > + if (repsize != elt_sz - o) > +break; > + s += repsize; > +} > + if (s != 0 && s < nbytes) > +{ > + unsigned HOST_WIDE_INT n = strnlen (prep, s); > + if (n < lenrange[0]) > +lenrange[0] = n; > + if (lenrange[1] < n && n != s) > +lenrange[1] = n; > + if (lenrange[2] < nbytes) > +lenrange[2] = nbytes; > + /* We haven't processed all bytes, the rest are unknown. > +
[committed] openmp: Add crtoffloadtableS.o and use it [PR117851]
Hi! Unlike crtoffload{begin,end}.o which just define some symbols at the start/end of the various .gnu.offload* sections, crtoffloadtable.o contains const void *const __OFFLOAD_TABLE__[] __attribute__ ((__visibility__ ("hidden"))) = { &__offload_func_table, &__offload_funcs_end, &__offload_var_table, &__offload_vars_end, &__offload_ind_func_table, &__offload_ind_funcs_end, }; The problem is that linking this into PIEs or shared libraries doesn't work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC code is put into .rodata section, but it really needs relocations, so for PIC it should go into .data.rel.ro/.data.rel.ro.local. As I think we don't want .data.rel.ro section in non-PIE binaries, this patch follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic. Bootstrapped/regtested on x86_64-linux and i686-linux (without offloading) and Matthias said he has tested it with offloading (I'll do that early next week when trying to build distro packages), committed to trunk. 2024-11-30 Jakub Jelinek PR libgomp/117851 gcc/ * lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument, search for crtoffloadtableS.o rather than crtoffloadtable.o if true. (run_gcc): Add pie_or_shared variable. If OPT_pie or OPT_shared or OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is seen, set pie_or_shared to false. Pass it to find_crtoffloadtable. libgcc/ * configure.ac (extra_parts): Add crtoffloadtableS.o. * Makefile.in (crtoffloadtableS$(objext)): New goal. * configure: Regenerated. --- gcc/lto-wrapper.cc.jj 2024-11-23 13:00:30.083002920 +0100 +++ gcc/lto-wrapper.cc 2024-11-29 21:18:47.362532372 +0100 @@ -1092,13 +1092,16 @@ copy_file (const char *dest, const char the copy to the linker. */ static void -find_crtoffloadtable (int save_temps, const char *dumppfx) +find_crtoffloadtable (int save_temps, bool pie_or_shared, const char *dumppfx) { char **paths = NULL; const char *library_path = getenv ("LIBRARY_PATH"); if (!library_path) return; - unsigned n_paths = parse_env_var (library_path, &paths, "/crtoffloadtable.o"); + unsigned n_paths = parse_env_var (library_path, &paths, + pie_or_shared + ? "/crtoffloadtableS.o" + : "/crtoffloadtable.o"); unsigned i; for (i = 0; i < n_paths; i++) @@ -1117,7 +1120,8 @@ find_crtoffloadtable (int save_temps, co } if (i == n_paths) fatal_error (input_location, -"installation error, cannot find %"); +"installation error, cannot find %", +pie_or_shared ? "S" : ""); free_array_of_ptrs ((void **) paths, n_paths); } @@ -1423,6 +1427,11 @@ run_gcc (unsigned argc, char *argv[]) char **lto_argv, **ltoobj_argv; bool linker_output_rel = false; bool skip_debug = false; +#ifdef ENABLE_DEFAULT_PIE + bool pie_or_shared = true; +#else + bool pie_or_shared = false; +#endif const char *incoming_dumppfx = dumppfx = NULL; static char current_dir[] = { '.', DIR_SEPARATOR, '\0' }; @@ -1594,6 +1603,16 @@ run_gcc (unsigned argc, char *argv[]) global_dc->set_show_highlight_colors (option->value); break; + case OPT_pie: + case OPT_shared: + case OPT_static_pie: + pie_or_shared = true; + break; + + case OPT_no_pie: + pie_or_shared = false; + break; + default: break; } @@ -1802,7 +1821,7 @@ cont1: if (offload_names) { - find_crtoffloadtable (save_temps, dumppfx); + find_crtoffloadtable (save_temps, pie_or_shared, dumppfx); for (i = 0; offload_names[i]; i++) printf ("%s\n", offload_names[i]); free_array_of_ptrs ((void **) offload_names, i); --- libgcc/configure.ac.jj 2024-09-24 11:31:48.942619056 +0200 +++ libgcc/configure.ac 2024-11-29 21:05:29.344764142 +0100 @@ -481,7 +481,8 @@ AC_SUBST(accel_dir_suffix) AC_SUBST(real_host_noncanonical) if test x"$enable_offload_targets" != x; then - extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o crtoffloadtable.o" + extra_parts="${extra_parts} crtoffloadbegin.o crtoffloadend.o" + extra_parts="${extra_parts} crtoffloadtable.o crtoffloadtableS.o" fi # Check if Solaris/x86 linker supports ZERO terminator unwind entries. --- libgcc/Makefile.in.jj 2024-09-24 11:31:48.912619466 +0200 +++ libgcc/Makefile.in 2024-11-29 21:03:22.789545531 +0100 @@ -1064,6 +1064,9 @@ crtoffloadend$(objext): $(srcdir)/offloa crtoffloadtable$(objext): $(srcdir)/offloadstuff.c $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_TABLE + +crtoffloadtableS$(objext): $(srcdir)/offloadstuff.c + $(crt_compile) $(CRTSTUFF_T_CFLAGS_S) -
Re: [pushed][PATCH v3] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector
Pushed to r15-5819.. 在 2024/11/28 上午9:26, Jinyang He 写道: For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` cause overflow in when emit {w,h,b}. Since the number of bits shifted is the remainder of the register value, it is actually unnecessary to constrain the range. Simply mask the shift number with the unit-bit-width, without any constraint on the shift range. gcc/ChangeLog: * config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6, add Uuvx as replicated vector const with unsigned range [0,umax]. * config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift offset by its unit bits. * config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise. * config/loongarch/loongarch-protos.h (loongarch_const_vector_same_int_p): Set default for low and high. * config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6 _operand to reg_or_vector_same_uimm_operand. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test. * gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test. --- v2: Fix indent in lsx.md and lasx.md. Use "dg-do assemble" in test which suggested by Ruoyao. v3: Re-enable scan-assembler. gcc/config/loongarch/constraints.md | 14 ++-- gcc/config/loongarch/lasx.md | 60 gcc/config/loongarch/loongarch-protos.h | 5 +- gcc/config/loongarch/lsx.md | 60 gcc/config/loongarch/predicates.md| 8 +-- .../vector/lasx/lasx-shift-sameimm-vec.c | 72 +++ .../vector/lsx/lsx-shift-sameimm-vec.c| 72 +++ 7 files changed, 254 insertions(+), 37 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md index 18da8b31f49..66ef1073fad 100644 --- a/gcc/config/loongarch/constraints.md +++ b/gcc/config/loongarch/constraints.md @@ -334,19 +334,19 @@ (and (match_code "const_vector") (match_test "loongarch_const_vector_same_int_p (op, mode, -16, 15)"))) -(define_constraint "Uuv6" - "@internal - A replicated vector const in which the replicated value is in the range - [0,63]." - (and (match_code "const_vector") - (match_test "loongarch_const_vector_same_int_p (op, mode, 0, 63)"))) - (define_constraint "Urv8" "@internal A replicated vector const with replicated byte values as well as elements" (and (match_code "const_vector") (match_test "loongarch_const_vector_same_bytes_p (op, mode)"))) +(define_constraint "Uuvx" + "@internal + A replicated vector const in which the replicated value is in the unsigned + range [0,umax]." + (and (match_code "const_vector") + (match_test "loongarch_const_vector_same_int_p (op, mode)"))) + (define_memory_constraint "ZC" "A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 457ed163f31..90778dd8ff9 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1013,11 +1013,23 @@ [(set (match_operand:ILASX 0 "register_operand" "=f,f") (lshiftrt:ILASX (match_operand:ILASX 1 "register_operand" "f,f") - (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))] "ISA_HAS_LASX" - "@ - xvsrl.\t%u0,%u1,%u2 - xvsrli.\t%u0,%u1,%E2" +{ + switch (which_alternative) +{ +case 0: + return "xvsrl.\t%u0,%u1,%u2"; +case 1: + { + unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0)); + operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1)); + return "xvsrli.\t%u0,%u1,%d2"; + } +default: + gcc_unreachable (); +} +} [(set_attr "type" "simd_shift") (set_attr "mode" "")]) @@ -1026,11 +1038,23 @@ [(set (match_operand:ILASX 0 "register_operand" "=f,f") (ashiftrt:ILASX (match_operand:ILASX 1 "register_operand" "f,f") - (match_operand:ILASX 2 "reg_or_vector_same_uimm6_operand" "f,Uuv6")))] + (match_operand:ILASX 2 "reg_or_vector_same_uimm_operand" "f,Uuvx")))] "ISA_HAS_LASX" - "@ - xvsra.\t%u0,%u1,%u2 - xvsrai.\t%u0,%u1,%E2" +{ + switch (which_alternative) +{ +case 0: + return "xvsra.\t%u0,%u1,%u2"; +case 1: + { + unsigned HOST_WIDE_INT val = UINTVAL (CONST_VECTOR_ELT (operands[2], 0)); + operands[2] = GEN_INT (val & (GET_MODE_UNIT_BITSIZE (mode) - 1)); + return "xvsrai.\t%u0,%u1,%d2"; + } +default: + g
[patch, avr] ad PR84211: Prefer split into reg-reg move over reg-const move
When splitting multi-byte REG-REG moves in try_split_any(), it's not clear whether propagating constants will turn out as profitable. When MOVW is available, split into REG-REG moves instead of a possible REG-CONST. Johann -- AVR: ad target/84211 - Split MOVW into MOVs in try_split_any. When splitting multi-byte REG-REG moves in try_split_any(), it's not clear whether propagating constants will turn out as profitable. When MOVW is available, split into REG-REG moves instead of a possible REG-CONST. gcc/ PR target/84211 * config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer reg=reg move over reg=const when splitting a reg=reg insn.AVR: ad target/84211 - Split MOVW into MOVs in try_split_any. When splitting multi-byte REG-REG moves in try_split_any(), it's not clear whether propagating constants will turn out as profitable. When MOVW is available, split into REG-REG moves instead of a possible REG-CONST. gcc/ PR target/84211 * config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer reg=reg move over reg=const when splitting a reg=reg insn. diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc index 68ebd31390e..7be5ec25fbc 100644 --- a/gcc/config/avr/avr-passes.cc +++ b/gcc/config/avr/avr-passes.cc @@ -2912,6 +2912,14 @@ optimize_data_t::try_split_any (bbinfo_t *) xsrc = gen_rtx_REG (HImode, r16); i += step; } + // ...or a reg-reg move from a multi-byte move... + else if (r_src + // Prefer a reg-reg move over a (potential) load + // of a constant, because the subsequent RTL + // peephole pass may combine it to a MOVW again. + && AVR_HAVE_MOVW + && REG_P (curr.ii.m_src)) + xsrc = gen_rtx_REG (QImode, r_src); // ...or a cheap constant... else if (val8 >= 0 && AVRasm::constant_cost (SET, r_dest, val8) <= 1)
[patch,avr,applied] Fix some coding rule nits
This patch fixed some unrelated coding rule nits. Johann -- AVR: Fix some coding rule nits and typos. gcc/ * config/avr/avr-c.cc: Fix some coding rule nits and typos. * config/avr/avr-passes.cc: Same * config/avr/avr.h: Same. * config/avr/avr.cc: Same. (avr_function_arg_regno_p, avr_hard_regno_rename_ok) (avr_epilogue_uses, extra_constraint_Q): Return bool instead of int. * config/avr/avr-protos.h (avr_function_arg_regno_p) (avr_hard_regno_rename_ok, avr_epilogue_uses) (extra_constraint_Q): Return bool instead of int.AVR: Fix some coding rule nits and typos. gcc/ * config/avr/avr-c.cc: Fix some coding rule nits and typos. * config/avr/avr-passes.cc: Same * config/avr/avr.h: Same. * config/avr/avr.cc: Same. (avr_function_arg_regno_p, avr_hard_regno_rename_ok) (avr_epilogue_uses, extra_constraint_Q): Return bool instead of int. * config/avr/avr-protos.h (avr_function_arg_regno_p) (avr_hard_regno_rename_ok, avr_epilogue_uses) (extra_constraint_Q): Return bool instead of int. diff --git a/gcc/config/avr/avr-c.cc b/gcc/config/avr/avr-c.cc index d3c40d73043..7db88a8c77a 100644 --- a/gcc/config/avr/avr-c.cc +++ b/gcc/config/avr/avr-c.cc @@ -53,7 +53,7 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs) tree type0, type1, fold = NULL_TREE; avr_builtin_id id = AVR_BUILTIN_COUNT; location_t loc = (location_t) iloc; - vec &args = * (vec*) vargs; + vec &args = * (vec *) vargs; switch (DECL_MD_FUNCTION_CODE (fndecl)) { @@ -61,10 +61,10 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs) break; case AVR_BUILTIN_ABSFX: - if (args.length() != 1) + if (args.length () != 1) { error_at (loc, "%qs expects 1 argument but %d given", - "absfx", (int) args.length()); + "absfx", (int) args.length ()); fold = error_mark_node; break; @@ -120,10 +120,10 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs) break; // absfx case AVR_BUILTIN_ROUNDFX: - if (args.length() != 2) + if (args.length () != 2) { error_at (loc, "%qs expects 2 arguments but %d given", - "roundfx", (int) args.length()); + "roundfx", (int) args.length ()); fold = error_mark_node; break; @@ -186,10 +186,10 @@ avr_resolve_overloaded_builtin (unsigned int iloc, tree fndecl, void *vargs) break; // roundfx case AVR_BUILTIN_COUNTLSFX: - if (args.length() != 1) + if (args.length () != 1) { error_at (loc, "%qs expects 1 argument but %d given", - "countlsfx", (int) args.length()); + "countlsfx", (int) args.length ()); fold = error_mark_node; break; diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc index 0d324727b47..68ebd31390e 100644 --- a/gcc/config/avr/avr-passes.cc +++ b/gcc/config/avr/avr-passes.cc @@ -1391,10 +1391,10 @@ plies_t::emit_insns (const insninfo_t &ii, const memento_t &memo) const } -// Helper for .emit_insns(). Emit an ior3 or and3 insns +// Helper for .emit_insns(). Emit an ior3 or and3 insn // that's equivalent to a sequence of contiguous BLDs starting at -// .plies[ISTART]. Updates N_INSNS according to the number of insns emitted -// and returns the number of consumed plys in .plies[]. +// .plies[ISTART]. Updates N_INSNS according to the number of insns +// emitted and returns the number of consumed plys in .plies[]. int plies_t::emit_blds (const insninfo_t &ii, int &n_insns, int istart) const { @@ -1448,7 +1448,7 @@ plies_t::emit_blds (const insninfo_t &ii, int &n_insns, int istart) const // Emit insns for a contiguous sequence of SET ply_t's starting at // .plies[ISTART]. Advances N_INSNS by the number of emitted insns. -// MEMO ist the state of the GPRs before II es executed, where II +// MEMO ist the state of the GPRs before II is executed, where II // represents the insn under optimization. // The emitted insns are "movqi_insn" or "*reload_inqi" // when .plies[ISTART].in_set_some is not set, and one "set_some" insn @@ -1571,7 +1571,7 @@ plies_t::emit_sets (const insninfo_t &ii, int &n_insns, const memento_t &memo, // Try to find an operation such that Y = op (X). // Shifts and rotates are regarded as unary operaions with -// an implied 2nd operand. +// an implied 2nd operand or 1 or 4, respectively. static rtx_code find_arith (uint8_t y, uint8_t x) { @@ -1606,8 +1606,8 @@ find_arith2 (uint8_t z, uint8_t x, uint8_t y) } -// Add plies to .plies[] that represent a MOVW, but only ones that reduce the -// Hamming distance from REGNO[SIZE] to VAL by exactly DHAMM. +// Add plies to .plies[] that represent a MOVW, but only ones that reduce +// the Hamming distance from REGNO[SIZE] to VAL by exactly DHAMM. void plies_t::add_plies_movw (int regno, int siz
Re: [patch,avr,testsuite,applied] gcc.c-torture/execute/memcpy-a*.c
On Sat, 30 Nov 2024, Georg-Johann Lay wrote: > The gcc.c-torture/execute/memcpy-a[1248].c tests consumed more time > than the whole rest of the test suite, just to come up with > a "memory full" even at -Os. Skipped thusly. As a matter of interest, is the timeout/memory exhaustion observed with host compilation or target execution? From my observation the optimisation level does not matter much for compilation times unless you use -O0, reducing the number of passes and code transformations applied. So whether it's at -O2 or -Os the time consumed does not change much. The resulting executables themselves take ~560KiB with a VAX target, the ultimate CISC machine which also has memory move and memory set hardware instructions contributing to size reduction, and ~1200KiB with a POWER9 target, a 64-bit RISC machine. I suppose the sizes will lie somewhere in between for the majority of our targets, and with smaller embedded ones the executables may not fit in the address space available. Maciej
Re: [PATCH v2] match.pd: Add pattern to simplify `(a - 1) & -a` to `0`
On 11/26/24 3:49 AM, Jovan Vukic wrote: Thank you for the feedback on the v1 patch. As requested, I added detailed tests for various signed and unsigned integer types in the test file bitops-11.c. I also included more complex expressions to observe how everything behaves at the GIMPLE level and added vector test examples as well. Since vector expressions are matched on (minus @0 1) instead of (plus @0 -1), I added a simplification for the minus case in match.pd. Additionally, I introduced simplifications for the expressions (a - 1) & -a, (a - 1) | -a, and (a - 1) ^ -a to 0, -1, and -1, respectively, in simplify-rtx.cc. For each of the three expressions, I added two if statements. The first matches the typical (BIT_OP (plus (A -1)) (neg A)), while the second recognizes the presence of a SUBREG within the RTL expression. For example, when A is of type short, the second if statement is triggered. I didn't observe any issues with match.pd missing any simplifications, but if that happens, the code in simplify-rtx.cc should help. Bootstrapped and tested on x86-linux-gnu with no regressions. 2024-11-26 Jovan Vukic gcc/ChangeLog: * match.pd: New pattern. * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New code to handle (a - 1) & -a, (a - 1) | -a and (a - 1) ^ -a. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bitops-11.c: New test. CONFIDENTIALITY: The contents of this e-mail are confidential and intended only for the above addressee(s). If you are not the intended recipient, or the person responsible for delivering it to the intended recipient, copying or delivering it to anyone else or using it in any unauthorized manner is prohibited and may be unlawful. If you receive this e-mail by mistake, please notify the sender and the systems administrator atstraym...@rt-rk.com immediately. patch.patch --- gcc/match.pd | 16 +++ gcc/simplify-rtx.cc | 87 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c | 116 ++ 3 files changed, 219 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-11.c diff --git a/gcc/match.pd b/gcc/match.pd index 0ac5674f24b..c85d4b9ae6c 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1472,6 +1472,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (bit_and:c @0 (bit_not (bit_xor:c @0 @1))) (bit_and @0 @1)) +/* Transform: + (a - 1) & -a -> 0. + (a - 1) | -a -> -1. + (a - 1) ^ -a -> -1. */ +(for bit_op (bit_ior bit_xor bit_and) + (simplify + (bit_op:c (plus @0 integer_minus_onep@1) (negate @0)) + (if (bit_op == BIT_AND_EXPR) +{ build_zero_cst (type); } +{ build_minus_one_cst (type); })) + (simplify + (bit_op:c (minus @0 integer_onep@1) (negate @0)) + (if (bit_op == BIT_AND_EXPR) +{ build_zero_cst (type); } +{ build_minus_one_cst (type); }))) I'm a little surprised the second form matches anything since I would expect canonicalization to turn the second form into the first. But may we're not as good at canonicalizing trees as we are RTL. + /* a & (a == b) --> a & b (boolean version of the above). */ (simplify (bit_and:c @0 (nop_convert? (eq:c @0 @1))) diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 893c5f6e1ae..514d21c6ef5 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -3530,6 +3530,35 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, && GET_MODE_CLASS (mode) != MODE_CC) return CONSTM1_RTX (mode); + /* (ior (plus (A -1)) (neg A)) is -1. */ Nit. Space between the operator "-" and the operand "1". + if (((GET_CODE (op1) == NEG + && GET_CODE (op0) == PLUS + && XEXP (op0, 1) == constm1_rtx) Most likely constm1_rtx should be CONSTM1_RTX (mode). Similarly for the other instances. + || (GET_CODE (op0) == NEG + && GET_CODE (op1) == PLUS + && XEXP (op1, 1) == constm1_rtx)) + && rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0)) + && !side_effects_p (XEXP (op0, 0)) + && !side_effects_p (XEXP (op1, 0))) If the RTXs are equal, then the second call to !side_effects_p isn't needed. Given the duplication of RTL scanning across the AND, XOR, IOR cases, wouldn't it make sense to factor the tests into a new function and just call it from all 3 locations? Overall it looks pretty good. I'm a bit concerned about the SUBREG case, but I haven't come up with any scenarios where it doesn't work yet. Jeff
Re: [PATCH v2] Write binary annotations for CodeView S_INLINESITE symbols
On 11/30/24 3:39 PM, Mark Harmstone wrote: Add "binary annotations" at the end of CodeView S_INLINESITE symbols, which are a series of compressed integers that represent how line numbers map to addresses. This requires assembler support; you will need commit b3aa594d ("gas: add .cv_ucomp and .cv_scomp pseudo-directives") in binutils. gcc/ * configure.ac (HAVE_GAS_CV_UCOMP): New check. * configure: Regenerate. * config.in: Regenerate. * dwarf2codeview.cc (enum binary_annotation_opcode): Define. (struct codeview_function): Add htab_next and inline_loc; (struct cv_func_hasher): Define. (cv_func_htab): New global variable. (new_codeview_function): Add new codeview_function to hash table. (codeview_begin_block): Record location of inline block. (codeview_end_block): Add dummy source line at end of inline block. (find_line_function): New function. (write_binary_annotations): New function. (write_s_inlinesite): Call write_binary_annotations. (codeview_debug_finish): Delete cv_func_htab. --- This second version adds a configure check for .cv_ucomp in the assembler, and prevents the binary annotations from being emitted if it's not supported. There's no separate check for .cv_scomp, as the same binutils commit introduced them both. OK. Jeff
Re: [PATCH] sched1: debug/model: dump predecessor list and BB num [NFC]
On 11/5/24 3:20 PM, Vineet Gupta wrote: This is broken out of predecessor promotion patch so that debugging can proceed during stage1 restrictions. Signed-off-by: Vineet Gupta --- gcc/haifa-sched.cc | 10 +- gcc/sched-rgn.cc | 14 -- 2 files changed, 17 insertions(+), 7 deletions(-) OK with a ChangeLog entry. jeff
Re: [PATCH] v2: Allow limited extended asm at toplevel [PR41045]
On 11/22/24 2:45 AM, Jakub Jelinek wrote: On Thu, Nov 21, 2024 at 09:32:51PM +, Joseph Myers wrote: On Sat, 2 Nov 2024, Jakub Jelinek wrote: +Extended @code{asm} statements outside of functions may not use any +qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or +@code{&} modifiers in constraints and can only use constraints which don%'t +allow using any register. Just ' in Texinfo, not %'. @@ -3071,7 +3072,62 @@ c_parser_declaration_or_fndef (c_parser static void c_parser_asm_definition (c_parser *parser) { - tree asm_str = c_parser_simple_asm_expr (parser); + location_t asm_loc = c_parser_peek_token (parser)->location; The syntax comment above this function needs updating. The C front-end changes are OK with that fix. Thanks. Here is the adjusted patch. 2024-11-22 Jakub Jelinek PR c/41045 gcc/ * output.h (insn_noperands): Declare. * final.cc (insn_noperands): No longer static. * varasm.cc (assemble_asm): Handle ASM_EXPR. * lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at for non-STRING_CST toplevel asm for now. * doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document that extended asm is now allowed outside of functions with certain restrictions. gcc/c/ * c-parser.cc (c_parser_asm_string_literal): Add forward declaration. (c_parser_asm_definition): Parse also extended asm without clobbers/labels. * c-typeck.cc (build_asm_expr): Allow extended asm outside of functions and check extra restrictions. gcc/cp/ * cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument. * parser.cc (cp_parser_asm_definition): Parse also extended asm without clobbers/labels outside of functions. * semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set, check extra restrictions for extended asm outside of functions. * pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller. gcc/testsuite/ * c-c++-common/toplevel-asm-1.c: New test. * c-c++-common/toplevel-asm-2.c: New test. * c-c++-common/toplevel-asm-3.c: New test. final/varasm/lto bits are fine. Consider adding the note about the LTO limitation in the documentation blurb. Jason will need to chime in on the C++ FE bits if he hasn't already. Thanks, jeff
Re: [PATCH 1/2] gimple-lim: Reuse boolean var when moving PHI
> Am 01.12.2024 um 03:00 schrieb Andrew Pinski : > > While looking into PR 117859, I noticed that LIM > sometimes would produce `bool_var2 = bool_var1 != 0` instead > of just using bool_var2. This patch allows LIM to reuse bool_var1 > in the place where bool_var2 was going to be used. > > Bootstrapped and tested on x86_64-linux-gnu. Ok Richard > gcc/ChangeLog: > >* tree-ssa-loop-im.cc (move_computations_worker): While moving >phi, reuse the lhs of the conditional if it is a boolean type. > > Signed-off-by: Andrew Pinski > --- > gcc/tree-ssa-loop-im.cc | 21 - > 1 file changed, 16 insertions(+), 5 deletions(-) > > diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc > index 538a0588d33..0130a809e30 100644 > --- a/gcc/tree-ssa-loop-im.cc > +++ b/gcc/tree-ssa-loop-im.cc > @@ -1304,11 +1304,22 @@ move_computations_worker (basic_block bb) > edges of COND. */ > extract_true_false_args_from_phi (dom, stmt, &arg0, &arg1); > gcc_assert (arg0 && arg1); > - t = make_ssa_name (boolean_type_node); > - new_stmt = gimple_build_assign (t, gimple_cond_code (cond), > - gimple_cond_lhs (cond), > - gimple_cond_rhs (cond)); > - gsi_insert_on_edge (loop_preheader_edge (level), new_stmt); > + /* For `bool_val != 0`, reuse bool_val. */ > + if (gimple_cond_code (cond) == NE_EXPR > + && integer_zerop (gimple_cond_rhs (cond)) > + && types_compatible_p (TREE_TYPE (gimple_cond_lhs (cond)), > + boolean_type_node)) > +{ > + t = gimple_cond_lhs (cond); > +} > + else > +{ > + t = make_ssa_name (boolean_type_node); > + new_stmt = gimple_build_assign (t, gimple_cond_code (cond), > + gimple_cond_lhs (cond), > + gimple_cond_rhs (cond)); > + gsi_insert_on_edge (loop_preheader_edge (level), new_stmt); > +} > new_stmt = gimple_build_assign (gimple_phi_result (stmt), > COND_EXPR, t, arg0, arg1); > todo |= TODO_cleanup_cfg; > -- > 2.43.0 >
[PUSHED 2/5] testsuite: Fix another issue with sve-sizeless-[12].C
There is a different error message expected on line 165 (for both files). It was expecting: error: cannot convert 'svint16_t' to 'sveint8_t' in initialization But now we get: error: cannot convert 'svint16_t' to 'signed char' in initialization This is because we support constructing scalable vectors rather than before. So just update error message. Pushed as obvious after a quick test for aarch64-linux-gnu. gcc/testsuite/ChangeLog: * g++.dg/ext/sve-sizeless-1.C: Update error message for line 165. * g++.dg/ext/sve-sizeless-2.C: Likewise. Signed-off-by: Andrew Pinski --- gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +- gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C index 0a5c80b92b8..1e00518d0e4 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C @@ -162,7 +162,7 @@ statements (int n) svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' to 'svint8_t'} } svint8_t init_sve_sc5 = {}; svint8_t init_sve_sc6 = { sve_sc1 }; - svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 'svint16_t' to 'svint8_t'} } + svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 'svint16_t' to 'signed char'} } svint32_t init_sve_vc1 = { 0, 1 }; svint32_t init_sve_vc2 = { 0, bar () }; svint32_t init_sve_vc3 = { bar (), n }; diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C index a5ee230f76a..e6423d572f0 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C @@ -162,7 +162,7 @@ statements (int n) svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' to 'svint8_t'} } svint8_t init_sve_sc5 = {}; svint8_t init_sve_sc6 = { sve_sc1 }; - svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 'svint16_t' to 'svint8_t'} } + svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 'svint16_t' to 'signed char'} } svint32_t init_sve_vc1 = { 0, 1 }; svint32_t init_sve_vc2 = { 0, bar () }; svint32_t init_sve_vc3 = { bar (), n }; -- 2.43.0
[PUSHED 1/5] testsuite: Fix part of sve-sizeless-2.c
r15-5783-gb5df3eefd70064 missed to update part of sve-sizeless-2.C to include the declaration of the bar function. This corrects the oversight there. Pushed as obvious after testing the tecase for aarch64-linux-gnu. gcc/testsuite/ChangeLog: * g++.dg/ext/sve-sizeless-2.C: Add declaration of bar. Signed-off-by: Andrew Pinski --- gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C index 87937d060d2..a5ee230f76a 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C @@ -124,6 +124,8 @@ void thrower2 () throw (svint8_t); // { dg-error {cannot throw or catch SVE type void thrower3 () throw (svint8_t); // { dg-error {cannot throw or catch SVE type 'svint8_t'} "" { target c++98_only } } #endif +extern int bar (void); + // Main tests for statements and expressions. void -- 2.43.0
[PUSHED] testsuite: Fix aarch64/sve/acle/general-c/gnu_vectors_[12].c for taking address of vector element
After the recent changes for SVE vectors becoming usable as GNU vector extensions. You can now get each of the elements like it was an array. There is no reason why taking the address of that won't be invalid too. especially since we are limiting to the first N elements (where N is the min arch supported elements for these types). So this removes the error message on these 2 lines and fixes the testcase. Pushed as obvious after a quick test for these tests for aarch64-linux-gnu. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c: Remove error message on taking address of an element of a vector. * gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c: Likewise. Signed-off-by: Andrew Pinski --- .../gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c | 2 +- .../gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c index 7a7eb42f57b..32bb826e240 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c @@ -79,7 +79,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1, /* Vector indexing. */ sve_u1[0]; - &sve_u1[0]; /* { dg-error {subscripted value is neither array nor pointer} } */ + &sve_u1[0]; gnu_u1[0]; &gnu_u1[0]; diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c index 2e92b0dd766..c311e162f9e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c @@ -78,7 +78,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1, /* Vector indexing. */ sve_u1[0]; - &sve_u1[0]; /* { dg-error {subscripted value is neither array nor pointer} } */ + &sve_u1[0]; gnu_u1[0]; &gnu_u1[0]; -- 2.43.0
[PUSHED 4/5] testsuite: Fix sve-sizeless-[12].C for C++98
In C++98 `{ a }` for aggregates can only mean constructing by each element rather than a copy. This adds the expected error message for SVE vectors for C++98. Pushed as obvious after a test for aarch64-linux-gnu. gcc/testsuite/ChangeLog: * g++.dg/ext/sve-sizeless-1.C: Add error message for line 164 for C++98 only. * g++.dg/ext/sve-sizeless-2.C: Likewise. Signed-off-by: Andrew Pinski --- gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +- gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C index 0d62bd7f39e..37bcec1efb9 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C @@ -161,7 +161,7 @@ statements (int n) svint8_t init_sve_sc3 = sve_sc1; svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' to 'svint8_t'} } svint8_t init_sve_sc5 = {}; - svint8_t init_sve_sc6 = { sve_sc1 }; + svint8_t init_sve_sc6 = { sve_sc1 }; // { dg-error {cannot convert 'svint8_t' to 'signed char'} "" { target c++98_only } } svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 'svint16_t' to 'signed char'} } svint32_t init_sve_vc1 = { 0, 1 }; svint32_t init_sve_vc2 = { 0, bar () }; diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C index 51def2441df..3243a17f606 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C @@ -161,7 +161,7 @@ statements (int n) svint8_t init_sve_sc3 = sve_sc1; svint8_t init_sve_sc4 = sve_sh1; // { dg-error {cannot convert 'svint16_t' to 'svint8_t'} } svint8_t init_sve_sc5 = {}; - svint8_t init_sve_sc6 = { sve_sc1 }; + svint8_t init_sve_sc6 = { sve_sc1 }; // { dg-error {cannot convert 'svint8_t' to 'signed char'} "" { target c++98_only } } svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 'svint16_t' to 'signed char'} } svint32_t init_sve_vc1 = { 0, 1 }; svint32_t init_sve_vc2 = { 0, bar () }; -- 2.43.0
[PUSHED 3/5] testsuite: Fix sve-sizeless-[12].C for aggregate change
Since r15-5777-g761cf60218890a, the SVE types are considered an aggregate since they are now acting similar as a GNU vector. Pushed as obvious after a quick test for aarch64-linux-gnu. gcc/testsuite/ChangeLog: * g++.dg/ext/sve-sizeless-1.C: SVE vectors are now aggregates. * g++.dg/ext/sve-sizeless-2.C: Likewise. Signed-off-by: Andrew Pinski --- gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 2 +- gcc/testsuite/g++.dg/ext/sve-sizeless-2.C | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C index 1e00518d0e4..0d62bd7f39e 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C @@ -351,7 +351,7 @@ statements (int n) { typedef int f[__has_unique_object_representations (svint8_t) ? 1 : -1]; } { typedef int f[!__has_virtual_destructor (svint8_t) ? 1 : -1]; } { typedef int f[!__is_abstract (svint8_t) ? 1 : -1]; } - { typedef int f[!__is_aggregate (svint8_t) ? 1 : -1]; } + { typedef int f[__is_aggregate (svint8_t) ? 1 : -1]; } { typedef int f[!__is_base_of (svint8_t, svint8_t) ? 1 : -1]; } { typedef int f[!__is_base_of (svint8_t, svint16_t) ? 1 : -1]; } { typedef int f[!__is_class (svint8_t) ? 1 : -1]; } diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C index e6423d572f0..51def2441df 100644 --- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C +++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C @@ -351,7 +351,7 @@ statements (int n) { typedef int f[__has_unique_object_representations (svint8_t) ? 1 : -1]; } { typedef int f[!__has_virtual_destructor (svint8_t) ? 1 : -1]; } { typedef int f[!__is_abstract (svint8_t) ? 1 : -1]; } - { typedef int f[!__is_aggregate (svint8_t) ? 1 : -1]; } + { typedef int f[__is_aggregate (svint8_t) ? 1 : -1]; } { typedef int f[!__is_base_of (svint8_t, svint8_t) ? 1 : -1]; } { typedef int f[!__is_base_of (svint8_t, svint16_t) ? 1 : -1]; } { typedef int f[!__is_class (svint8_t) ? 1 : -1]; } -- 2.43.0
Re: [PATCH v2] sched1: parameterize pressure scheduling spilling agressiveness [PR/114729]
On 11/5/24 1:11 PM, Vineet Gupta wrote: changes since v1 * Changed target hook to --param * squash addon patch for RISC-V opting-in, testcase here * updated changelog with latest perf numbers --- sched1 computes ECC (Excess Change Cost) for each insn, which represents the register pressure attributed to the insn. Currently the pressure sensitive schduling algorithm deliberately ignores negative values (pressure reduction), making them 0 (neutral), leading to more spills. This happens due to the assumption that the compiler has a reasonably accurate processor pipeline scheduling model and thus tries to aggresively fill pipeline bubbles with spill slots. This however might not be true, as the model might not be available for certains uarches or even applicable especially for modern out-of-order cores. The existing heuristic induces spill frenzy on RISC-V, noticably so on SPEC2017 507.Cactu. If insn scheduling is disabled completely, the total dynamic icounts for this workload are reduced in half from ~2.5 trillion insns to ~1.3 (w/ -fno-schedule-insns). This patch adds --param=cycle-accurate-model={0,1} to gate the spill behavior. - The default (1) preserves existing spill behavior. - targets/uarches sensitive to spilling can override the param to (0) to get the reverse effect. RISC-V backend does so too. The actual perf numbers are very promising. (1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs: Before: -- Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par': 4,917,712.97 msec task-clock:u #1.000 CPUs utilized 5,314 context-switches:u #1.081 /sec 3 cpu-migrations:u #0.001 /sec 204,784 page-faults:u# 41.642 /sec 7,868,291,222,513 cycles:u #1.600 GHz 2,615,069,866,153 instructions:u #0.33 insn per cycle 10,799,381,890 branches:u #2.196 M/sec 15,714,572 branch-misses:u #0.15% of all branches After: - Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par': 4,552,979.58 msec task-clock:u #0.998 CPUs utilized 205,020 context-switches:u # 45.030 /sec 2 cpu-migrations:u #0.000 /sec 204,221 page-faults:u# 44.854 /sec 7,285,176,204,764 cycles:u(7.4% faster)#1.600 GHz 2,145,284,345,397 instructions:u (17.96% fewer)#0.29 insn per cycle 10,799,382,011 branches:u #2.372 M/sec 16,235,628 branch-misses:u #0.15% of all branches (2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs. gcc/ChangeLog: PR target/11472 * params.opt (--param=cycle-accurate-model=): New opt. * doc/invoke.texi (cycle-accurate-model): Document. * haifa-sched.cc (model_excess_group_cost): Return negative delta if param_cycle_accurate_model is 0. (model_excess_cost): Ceil negative baseECC to 0 only if param_cycle_accurate_model is 1. Dump the actual ECC value. * config/riscv/riscv.cc (riscv_option_override): Set param to 0. gcc/testsuite/ChangeLog: PR target/114729 * gcc.target/riscv/riscv.exp: Enable new tests to build. * gcc.target/riscv/sched1-spills/spill1.cpp: Add new test. Signed-off-by: Vineet Gupta --- gcc/config/riscv/riscv.cc | 4 +++ gcc/doc/invoke.texi | 7 gcc/haifa-sched.cc| 32 ++- gcc/params.opt| 4 +++ gcc/testsuite/gcc.target/riscv/riscv.exp | 2 ++ .../gcc.target/riscv/sched1-spills/spill1.cpp | 32 +++ 6 files changed, 73 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 7146163d66d0..c1e07e258b25 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -17084,6 +17084,13 @@ With @option{--param=openacc-privatization=quiet}, don't diagnose. This is the current default. With @option{--param=openacc-privatization=noisy}, do diagnose. +@item cycle-accurate-model +Specifies whether GCC should assume that the scheduling description is mostly +a cycle-accurate model of the target processor, where the code is intended to +run on, in the absence of cache misses. Nonzero means that the selected scheduling +model is accuate and likely describes an in-order processor, and that scheduling +will aggressively spill to try and fill any pipeline bubbles. s/accuate/accurate/ And you should prob
[PUSHED 5/5] testsuite: Fix aarch64/sve/acle/general-c++/gnu_vectors_[12].C for taking address of vector element
After the recent changes for SVE vectors becoming usable as GNU vector extensions. You can now get each of the elements like it was an array. There is no reason why taking the address of that won't be invalid too. especially since we are limiting to the first N elements (where N is the min arch supported elements for these types). So this removes the error message on these 2 lines and fixes the testcase. Pushed as obvious after a quick test for these tests for aarch64-linux-gnu. gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C: Remove error message on taking address of an element of a vector. * g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C: Likewise. Signed-off-by: Andrew Pinski --- .../g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C | 2 +- .../g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C index 49b90492ce1..cdd48cbc454 100644 --- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C +++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C @@ -115,7 +115,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1, // Vector indexing. sve_u1[0]; - &sve_u1[0]; // { dg-error {subscripted value is neither array nor pointer} } + &sve_u1[0]; gnu_u1[0]; &gnu_u1[0]; diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C index 2e9f688946f..587d28d2363 100644 --- a/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C +++ b/gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C @@ -115,7 +115,7 @@ f (svuint8_t sve_u1, svint8_t sve_s1, // Vector indexing. sve_u1[0]; - &sve_u1[0]; // { dg-error {subscripted value is neither array nor pointer} } + &sve_u1[0]; gnu_u1[0]; &gnu_u1[0]; -- 2.43.0
Re: [PATCH 2/2] VN: Don't recurse on for the same value of `a != 0` [PR117859]
> Am 01.12.2024 um 02:59 schrieb Andrew Pinski : > > Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case. > After adding vn_valueize to the handle the `a ==/!= 0` case > of insert_predicates_for_cond, it would go into an infinite loop > as the Value number for a could be the same as what it > is for the whole expression. This avoids that recursion so there is > no infinite loop here. > > Note lim was introducing `bool_var2 = bool_var1 != 0` originally but > with the gimple testcase in -2, there is no dependency on what passes > before hand will do. > > Bootstrapped and tested on x86_64-linux-gnu. Ok Richard >PR tree-optimization/117859 > > gcc/ChangeLog: > >* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the >valueization for the new lhs for `lhs != 0` >is the same as the old ones, don't recurse. > > gcc/testsuite/ChangeLog: > >* gcc.dg/torture/pr117859-1.c: New test. >* gcc.dg/torture/pr117859-2.c: New test. > > Signed-off-by: Andrew Pinski > --- > gcc/testsuite/gcc.dg/torture/pr117859-1.c | 23 ++ > gcc/testsuite/gcc.dg/torture/pr117859-2.c | 51 +++ > gcc/tree-ssa-sccvn.cc | 3 +- > 3 files changed, 76 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-1.c > create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-2.c > > diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-1.c > b/gcc/testsuite/gcc.dg/torture/pr117859-1.c > new file mode 100644 > index 000..21e0a26e5ca > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr117859-1.c > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-ssa-phiopt -fno-expensive-optimizations" } */ > + > +/* PR tree-optimization/117859 */ > + > +unsigned char a; > +int b, c, d, e, g, h, j, k; > +struct { > + int f : 1; > +} i; > +char l(int m) { return a > 255 >> m ? 0 : m; } > +int main() { > + if (i.f) > +d = b; > + k = e = 1; > + for (; e; e--) > +g = 0; > + h = g ? 1 << g : 1; > + c = h || i.f; > + while (b) > +j = l(k && i.f); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-2.c > b/gcc/testsuite/gcc.dg/torture/pr117859-2.c > new file mode 100644 > index 000..83549a80d3e > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr117859-2.c > @@ -0,0 +1,51 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fgimple" } */ > + > +/* PR tree-optimization/117859 */ > + > +void __GIMPLE (ssa,startwith("pre")) > +m (int p, int b) > +{ > + int _6; > + _Bool _48; > + _Bool _50; > + int _57; > + _Bool _59; > + > + __BB(2): > + _6 = 0; // This needs to be prop'ed to make _48 unconditional > + if (_6 != 0) > +goto __BB5; > + else > +goto __BB14; > + > + __BB(14): > + _48 = p_2(D) != 0; > + goto __BB7; > + > + __BB(5): > + goto __BB7; > + > + __BB(7): > + _59 = __PHI (__BB5: _Literal (_Bool) 1, __BB14: _48); > + if (b_36(D) != 0) > +goto __BB8; > + else > +goto __BB13; > + > + __BB(8): > + _50 = _59 != _Literal (_Bool) 0; // Important being != 0 > + _57 = _50 ? 1 : 0; // can be unused but need around for _50 being used > + goto __BB9; > + > + __BB(9,loop_header(2)): > + if (_59 != _Literal (_Bool) 0) > +goto __BB13; > + else > +goto __BB9; > + > + __BB(13): > + return; > + > + > +} > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > index 7ef0bd01d3c..8d74731a891 100644 > --- a/gcc/tree-ssa-sccvn.cc > +++ b/gcc/tree-ssa-sccvn.cc > @@ -7964,7 +7964,8 @@ insert_predicates_for_cond (tree_code code, tree lhs, > tree rhs, >edge nf = false_e; >if (code == EQ_EXPR) > std::swap (nt, nf); > -insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf); > +if (lhs != nlhs) > + insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf); > } > /* (a | b) == 0 -> >on true edge assert: a == 0 & b == 0. */ > -- > 2.43.0 >
[PATCH v2] Write binary annotations for CodeView S_INLINESITE symbols
Add "binary annotations" at the end of CodeView S_INLINESITE symbols, which are a series of compressed integers that represent how line numbers map to addresses. This requires assembler support; you will need commit b3aa594d ("gas: add .cv_ucomp and .cv_scomp pseudo-directives") in binutils. gcc/ * configure.ac (HAVE_GAS_CV_UCOMP): New check. * configure: Regenerate. * config.in: Regenerate. * dwarf2codeview.cc (enum binary_annotation_opcode): Define. (struct codeview_function): Add htab_next and inline_loc; (struct cv_func_hasher): Define. (cv_func_htab): New global variable. (new_codeview_function): Add new codeview_function to hash table. (codeview_begin_block): Record location of inline block. (codeview_end_block): Add dummy source line at end of inline block. (find_line_function): New function. (write_binary_annotations): New function. (write_s_inlinesite): Call write_binary_annotations. (codeview_debug_finish): Delete cv_func_htab. --- This second version adds a configure check for .cv_ucomp in the assembler, and prevents the binary annotations from being emitted if it's not supported. There's no separate check for .cv_scomp, as the same binutils commit introduced them both. gcc/config.in | 6 ++ gcc/configure | 32 +++ gcc/configure.ac | 3 + gcc/dwarf2codeview.cc | 203 +- 4 files changed, 240 insertions(+), 4 deletions(-) diff --git a/gcc/config.in b/gcc/config.in index 972c0c2034d..d8145a1453b 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -1459,6 +1459,12 @@ /* Define 0/1 if your assembler supports .cfi_sections. */ #undef HAVE_GAS_CFI_SECTIONS_DIRECTIVE +/* Define if your assembler supports .cv_ucomp. */ +#ifndef USED_FOR_TARGET +#undef HAVE_GAS_CV_UCOMP +#endif + + /* Define if your assembler supports the .loc discriminator sub-directive. */ #ifndef USED_FOR_TARGET #undef HAVE_GAS_DISCRIMINATOR diff --git a/gcc/configure b/gcc/configure index 8bb71cfe348..bab4181a940 100755 --- a/gcc/configure +++ b/gcc/configure @@ -25943,6 +25943,38 @@ $as_echo "#define HAVE_GAS_BASE64 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .cv_ucomp" >&5 +$as_echo_n "checking assembler for .cv_ucomp... " >&6; } +if ${gcc_cv_as_cv_ucomp+:} false; then : + $as_echo_n "(cached) " >&6 +else + gcc_cv_as_cv_ucomp=no + if test x$gcc_cv_as != x; then +$as_echo '.cv_ucomp 0' > conftest.s +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s >&5' + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 + (eval $ac_try) 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; } +then + gcc_cv_as_cv_ucomp=yes +else + echo "configure: failed program was" >&5 + cat conftest.s >&5 +fi +rm -f conftest.o conftest.s + fi +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_cv_ucomp" >&5 +$as_echo "$gcc_cv_as_cv_ucomp" >&6; } +if test $gcc_cv_as_cv_ucomp = yes; then + +$as_echo "#define HAVE_GAS_CV_UCOMP 1" >>confdefs.h + +fi + + # gnu_indirect_function type is an extension proposed at # http://groups.google/com/group/generic-abi/files. It allows dynamic runtime # selection of function implementation diff --git a/gcc/configure.ac b/gcc/configure.ac index 88a1a44fcf7..b1b21cf4d7b 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3065,6 +3065,9 @@ gcc_GAS_CHECK_FEATURE([.base64], gcc_cv_as_base64,, .base64 "Tm9uIHB1ZG9yIGVzdCBuaWwgc2NpcmUsIHB1ZG9yIG5pbCBkaXNjZXJlIHZlbGxlLgo="],, [AC_DEFINE(HAVE_GAS_BASE64, 1, [Define if your assembler supports .base64.])]) +gcc_GAS_CHECK_FEATURE([.cv_ucomp], gcc_cv_as_cv_ucomp,,[.cv_ucomp 0],, +[AC_DEFINE(HAVE_GAS_CV_UCOMP, 1, [Define if your assembler supports .cv_ucomp.])]) + # gnu_indirect_function type is an extension proposed at # http://groups.google/com/group/generic-abi/files. It allows dynamic runtime # selection of function implementation diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 821554088d7..33b57a45970 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -1086,6 +1086,25 @@ enum cv_amd64_register { CV_AMD64_YMM15D3 = 687 }; +/* This is enum BinaryAnnotationOpcode in Microsoft's cvinfo.h. */ + +enum binary_annotation_opcode { + ba_op_invalid, + ba_op_code_offset, + ba_op_change_code_offset_base, + ba_op_change_code_offset, + ba_op_change_code_length, + ba_op_change_file, + ba_op_change_line_offset, + ba_op_change_line_end_delta, + ba_op_change_range_kind, + ba_op_change_column_start, + ba_op_change_column_end_delta, + ba_op_change_code_offset_and_line_offset, + ba_op_change_code_length_and_code_offset, + ba_op_change_column_end +}; + struct codeview_string { codeview_string *next; @@ -1157,11 +1176,13 @@ struct codeview_inlinee_lines
[committed] libstdc++: Improve new testcase for std::optional assignment [PR117858]
The copy & paste bug affected two assignment operators, so ensure the new test covers both. libstdc++-v3/ChangeLog: PR libstdc++/117858 * testsuite/20_util/optional/assignment/117858.cc: Also test assignment from rvalue optional. --- Tested x86_64-linux. Pushed to trunk. libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc index 9443e160484..e7045b37dd9 100644 --- a/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc +++ b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc @@ -13,4 +13,5 @@ struct Focus void g(std::optional f) { f = f; + f = std::move(f); } -- 2.47.0
[PATCH] x86: Add pcmpeq splitters
Add pcmpeq splitters to split (insn 5 3 7 2 (set (reg:V4SI 100) (eq:V4SI (reg:V4SI 98) (reg:V4SI 98))) 7910 {*sse2_eqv4si3} (expr_list:REG_DEAD (reg:V4SI 98) (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [ (const_int -1 [0x]) repeated x4 ]) (const_vector:V4SI [ (const_int -1 [0x]) repeated x4 ])) (nil to (insn 8 3 7 2 (set (reg:V4SI 100) (const_vector:V4SI [ (const_int -1 [0x]) repeated x4 ])) -1 (nil)) gcc/ PR target/117863 * config/i386/sse.md: Add pcmpeq splitters. gcc/testsuite/ PR target/117863 * gcc.dg/rtl/i386/vector_eq-2.c: New test. Signed-off-by: H.J. Lu --- gcc/config/i386/sse.md | 33 ++ gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 + 2 files changed, 104 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 498a42d6e1e..e2ce0781cb4 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -17943,6 +17943,17 @@ (define_insn "*avx2_eq3" (set_attr "prefix" "vex") (set_attr "mode" "OI")]) +(define_split + [(set (match_operand:VI_256 0 "register_operand") + (eq:VI_256 + (match_operand:VI_256 1 "nonimmediate_operand") + (match_operand:VI_256 2 "nonimmediate_operand")))] + "TARGET_AVX2 && rtx_equal_p (operands[1], operands[2])" + [(set (match_dup 0) (match_dup 1))] +{ + operands[1] = CONSTM1_RTX (mode); +}) + (define_insn_and_split "*avx2_pcmp3_1" [(set (match_operand:VI_128_256 0 "register_operand") (vec_merge:VI_128_256 @@ -18227,6 +18238,17 @@ (define_insn "*sse4_1_eqv2di3" (set_attr "prefix" "orig,orig,vex") (set_attr "mode" "TI")]) +(define_split + [(set (match_operand:V2DI 0 "register_operand") + (eq:V2DI + (match_operand:V2DI 1 "vector_operand") + (match_operand:V2DI 2 "vector_operand")))] + "TARGET_SSE4_1 && rtx_equal_p (operands[1], operands[2])" + [(set (match_dup 0) (match_dup 1))] +{ + operands[1] = CONSTM1_RTX (V2DImode); +}) + (define_insn "*sse2_eq3" [(set (match_operand:VI124_128 0 "register_operand" "=x,x") (eq:VI124_128 @@ -18243,6 +18265,17 @@ (define_insn "*sse2_eq3" (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) +(define_split + [(set (match_operand:VI124_128 0 "register_operand") + (eq:VI124_128 + (match_operand:VI124_128 1 "vector_operand") + (match_operand:VI124_128 2 "vector_operand")))] + "TARGET_SSE2 && rtx_equal_p (operands[1], operands[2])" + [(set (match_dup 0) (match_dup 1))] +{ + operands[1] = CONSTM1_RTX (mode); +}) + (define_insn "sse4_2_gtv2di3" [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x") (gt:V2DI diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c new file mode 100644 index 000..871d489b730 --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c @@ -0,0 +1,71 @@ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-additional-options "-O2 -march=x86-64-v3" } */ + +typedef int v4si __attribute__((vector_size(16))); +typedef int v8si __attribute__((vector_size(32))); +typedef int v2di __attribute__((vector_size(16))); + +v4si __RTL (startwith ("vregs1")) foo1 (void) +{ +(function "foo1" + (insn-chain +(block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 5 (set (reg:V4SI <2>) + (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1> + (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>))) + (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>))) + (edge-to exit (flags "FALLTHRU")) +) + ) + (crtl (return_rtx (reg/i:V4SI xmm0))) +) +} + +v8si __RTL (startwith ("vregs1")) foo2 (void) +{ +(function "foo2" + (insn-chain +(block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V8SI <0>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 4 (set (reg:V8SI <1>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 5 (set (reg:V8SI <2>) + (eq:V8SI (reg:V8SI <0>) (reg:V8SI <1> + (cinsn 6 (set (reg:V8SI <
[PATCH 2/2] VN: Don't recurse on for the same value of `a != 0` [PR117859]
Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case. After adding vn_valueize to the handle the `a ==/!= 0` case of insert_predicates_for_cond, it would go into an infinite loop as the Value number for a could be the same as what it is for the whole expression. This avoids that recursion so there is no infinite loop here. Note lim was introducing `bool_var2 = bool_var1 != 0` originally but with the gimple testcase in -2, there is no dependency on what passes before hand will do. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/117859 gcc/ChangeLog: * tree-ssa-sccvn.cc (insert_predicates_for_cond): If the valueization for the new lhs for `lhs != 0` is the same as the old ones, don't recurse. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr117859-1.c: New test. * gcc.dg/torture/pr117859-2.c: New test. Signed-off-by: Andrew Pinski --- gcc/testsuite/gcc.dg/torture/pr117859-1.c | 23 ++ gcc/testsuite/gcc.dg/torture/pr117859-2.c | 51 +++ gcc/tree-ssa-sccvn.cc | 3 +- 3 files changed, 76 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-1.c create mode 100644 gcc/testsuite/gcc.dg/torture/pr117859-2.c diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-1.c b/gcc/testsuite/gcc.dg/torture/pr117859-1.c new file mode 100644 index 000..21e0a26e5ca --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr117859-1.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-ssa-phiopt -fno-expensive-optimizations" } */ + +/* PR tree-optimization/117859 */ + +unsigned char a; +int b, c, d, e, g, h, j, k; +struct { + int f : 1; +} i; +char l(int m) { return a > 255 >> m ? 0 : m; } +int main() { + if (i.f) +d = b; + k = e = 1; + for (; e; e--) +g = 0; + h = g ? 1 << g : 1; + c = h || i.f; + while (b) +j = l(k && i.f); + return 0; +} diff --git a/gcc/testsuite/gcc.dg/torture/pr117859-2.c b/gcc/testsuite/gcc.dg/torture/pr117859-2.c new file mode 100644 index 000..83549a80d3e --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr117859-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fgimple" } */ + +/* PR tree-optimization/117859 */ + +void __GIMPLE (ssa,startwith("pre")) +m (int p, int b) +{ + int _6; + _Bool _48; + _Bool _50; + int _57; + _Bool _59; + + __BB(2): + _6 = 0; // This needs to be prop'ed to make _48 unconditional + if (_6 != 0) +goto __BB5; + else +goto __BB14; + + __BB(14): + _48 = p_2(D) != 0; + goto __BB7; + + __BB(5): + goto __BB7; + + __BB(7): + _59 = __PHI (__BB5: _Literal (_Bool) 1, __BB14: _48); + if (b_36(D) != 0) +goto __BB8; + else +goto __BB13; + + __BB(8): + _50 = _59 != _Literal (_Bool) 0; // Important being != 0 + _57 = _50 ? 1 : 0; // can be unused but need around for _50 being used + goto __BB9; + + __BB(9,loop_header(2)): + if (_59 != _Literal (_Bool) 0) +goto __BB13; + else +goto __BB9; + + __BB(13): + return; + + +} diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc index 7ef0bd01d3c..8d74731a891 100644 --- a/gcc/tree-ssa-sccvn.cc +++ b/gcc/tree-ssa-sccvn.cc @@ -7964,7 +7964,8 @@ insert_predicates_for_cond (tree_code code, tree lhs, tree rhs, edge nf = false_e; if (code == EQ_EXPR) std::swap (nt, nf); - insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf); + if (lhs != nlhs) + insert_predicates_for_cond (nc, nlhs, nrhs, nt, nf); } /* (a | b) == 0 -> on true edge assert: a == 0 & b == 0. */ -- 2.43.0
[PATCH 1/2] gimple-lim: Reuse boolean var when moving PHI
While looking into PR 117859, I noticed that LIM sometimes would produce `bool_var2 = bool_var1 != 0` instead of just using bool_var2. This patch allows LIM to reuse bool_var1 in the place where bool_var2 was going to be used. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-loop-im.cc (move_computations_worker): While moving phi, reuse the lhs of the conditional if it is a boolean type. Signed-off-by: Andrew Pinski --- gcc/tree-ssa-loop-im.cc | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc index 538a0588d33..0130a809e30 100644 --- a/gcc/tree-ssa-loop-im.cc +++ b/gcc/tree-ssa-loop-im.cc @@ -1304,11 +1304,22 @@ move_computations_worker (basic_block bb) edges of COND. */ extract_true_false_args_from_phi (dom, stmt, &arg0, &arg1); gcc_assert (arg0 && arg1); - t = make_ssa_name (boolean_type_node); - new_stmt = gimple_build_assign (t, gimple_cond_code (cond), - gimple_cond_lhs (cond), - gimple_cond_rhs (cond)); - gsi_insert_on_edge (loop_preheader_edge (level), new_stmt); + /* For `bool_val != 0`, reuse bool_val. */ + if (gimple_cond_code (cond) == NE_EXPR + && integer_zerop (gimple_cond_rhs (cond)) + && types_compatible_p (TREE_TYPE (gimple_cond_lhs (cond)), +boolean_type_node)) + { + t = gimple_cond_lhs (cond); + } + else + { + t = make_ssa_name (boolean_type_node); + new_stmt = gimple_build_assign (t, gimple_cond_code (cond), + gimple_cond_lhs (cond), + gimple_cond_rhs (cond)); + gsi_insert_on_edge (loop_preheader_edge (level), new_stmt); + } new_stmt = gimple_build_assign (gimple_phi_result (stmt), COND_EXPR, t, arg0, arg1); todo |= TODO_cleanup_cfg; -- 2.43.0
Re: [PATCH v3] zero_extend(not) -> xor optimization [PR112398]
On 11/28/24 5:26 AM, Alexey Merzlyakov wrote: This patch adds optimization of the following patterns: (zero_extend:M (subreg:N (not:O==M (X:Q==M -> (xor:M (zero_extend:M (subreg:N (X:M)), mask)) ... where the mask is GET_MODE_MASK (N). For the cases when X:M doesn't have any non-zero bits outside of mode N, (zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M) and whole optimization will be: (zero_extend:M (subreg:N (not:M (X:M -> (xor:M (X:M, mask)) Patch targets to handle code patterns like: not a0,a0 andi a0,a0,0xff to be optimized to: xori a0,a0,255 PR rtl-optimization/112398 PR rtl-optimization/117476 gcc/ChangeLog: * simplify-rtx.cc (simplify_context::simplify_unary_operation_1): Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG)) when X doesn't have any non-zero bits outside of SUBREG mode. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr112398.c: New test. * gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su. * gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka. Thanks. I've bootstrapped & regression tested this on x86_64 as well as run it through my tester successfully. Pushed to the trunk. Hopefully no fallout this time :-) jeff
[committed] libstdc++: Fix constraints on std::optional converting assignments [PR117858]
It looks like I copied these constraints from operator=(U&&) and didn't correct them to account for the parameter being optional not U. libstdc++-v3/ChangeLog: PR libstdc++/117858 * include/std/optional (operator=(const optional&)): Fix copy and paste error in constraints. (operator=(optional&&)): Likewise. * testsuite/20_util/optional/assignment/117858.cc: New test. --- Tested x86_64-linux. Pushed to trunk. libstdc++-v3/include/std/optional| 4 ++-- .../20_util/optional/assignment/117858.cc| 16 2 files changed, 18 insertions(+), 2 deletions(-) create mode 100644 libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional index b8eedeec781..55e56cfb9ed 100644 --- a/libstdc++-v3/include/std/optional +++ b/libstdc++-v3/include/std/optional @@ -1043,7 +1043,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template #ifdef _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL - requires (!is_same_v>) + requires (!is_same_v<_Tp, _Up>) && is_constructible_v<_Tp, const _Up&> && is_assignable_v<_Tp&, const _Up&> && (!__converts_from_optional<_Tp, _Up>::value) @@ -1077,7 +1077,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template #ifdef _GLIBCXX_USE_CONSTRAINTS_FOR_OPTIONAL - requires (!is_same_v>) + requires (!is_same_v<_Tp, _Up>) && is_constructible_v<_Tp, _Up> && is_assignable_v<_Tp&, _Up> && (!__converts_from_optional<_Tp, _Up>::value) diff --git a/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc new file mode 100644 index 000..9443e160484 --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/optional/assignment/117858.cc @@ -0,0 +1,16 @@ +// { dg-do compile { target c++17 } } + +// PR 117858 std::optional with a constructor template ctor(T) + +#include + +struct Focus +{ + template + Focus(T newValue) { } +}; + +void g(std::optional f) +{ + f = f; +} -- 2.47.0
[PATCH] x86: Add a pass to remove redundant all 0s/1s vector load
For all different modes of all 0s/1s vectors, we can use the single widest all 0s/1s vector register for all 0s/1s vector uses in the whole function. Add a pass to generate a single widest all 0s/1s vector set instruction at entry of the nearest common dominator for basic blocks with all 0s/1s vector uses. On Linux/x86-64, in cc1plus, this patch reduces the number of vector xor instructions from 4803 to 4714 and pcmpeq instructions from 144 to 142. This change causes a regression: FAIL: gcc.dg/rtl/x86_64/vector_eq.c without the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863 NB: PR target/92080 and PR target/117839 aren't same. PR target/117839 is for vectors of all 0s and all 1s with different sizes and different components. PR target/92080 is for broadcast of the same component to different vector sizes. This patch covers only all 0s and all 1s cases of PR target/92080. gcc/ PR target/92080 PR target/117839 * config/i386/i386-features.cc (ix86_rrvl_gate): New. (ix86_place_single_vector_set): Likewise. (ix86_get_vector_load_mode): Likewise. (remove_redundant_vector_load): Likewise. (pass_data_remove_redundant_vector_load): Likewise. (pass_remove_redundant_vector_load): Likewise. (make_pass_remove_redundant_vector_load): Likewise. * config/i386/i386-passes.def: Add pass_remove_redundant_vector_load after pass_remove_partial_avx_dependency. * config/i386/i386-protos.h (make_pass_remove_redundant_vector_load): New. gcc/testsuite/ PR target/92080 PR target/117839 * gcc.target/i386/pr117839-1a.c: New test. * gcc.target/i386/pr117839-1b.c: Likewise. * gcc.target/i386/pr117839-2.c: Likewise. * gcc.target/i386/pr92080-1.c: Likewise. * gcc.target/i386/pr92080-2.c: Likewise. * gcc.target/i386/pr92080-3.c: Likewise. Signed-off-by: H.J. Lu --- gcc/config/i386/i386-features.cc| 308 gcc/config/i386/i386-passes.def | 1 + gcc/config/i386/i386-protos.h | 2 + gcc/testsuite/gcc.target/i386/pr117839-1a.c | 35 +++ gcc/testsuite/gcc.target/i386/pr117839-1b.c | 5 + gcc/testsuite/gcc.target/i386/pr117839-2.c | 40 +++ gcc/testsuite/gcc.target/i386/pr92080-1.c | 54 gcc/testsuite/gcc.target/i386/pr92080-2.c | 59 gcc/testsuite/gcc.target/i386/pr92080-3.c | 48 +++ 9 files changed, 552 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-1b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr117839-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr92080-3.c diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 003b003e09c..7d8d260750d 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -3288,6 +3288,314 @@ make_pass_remove_partial_avx_dependency (gcc::context *ctxt) return new pass_remove_partial_avx_dependency (ctxt); } +static bool +ix86_rrvl_gate () +{ + return (TARGET_SSE2 + && optimize + && optimize_function_for_speed_p (cfun)); +} + +/* Generate a vector set, DEST = SRC, at entry of the nearest dominator + for basic block map BBS, which is in the fake loop that contains the + whole function, so that there is only a single vector set in the + whole function. */ + +static void +ix86_place_single_vector_set (rtx dest, rtx src, bitmap bbs) +{ + basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs); + while (bb->loop_father->latch +!= EXIT_BLOCK_PTR_FOR_FN (cfun)) +bb = get_immediate_dominator (CDI_DOMINATORS, + bb->loop_father->header); + + rtx set = gen_rtx_SET (dest, src); + + rtx_insn *insn = BB_HEAD (bb); + while (insn && !NONDEBUG_INSN_P (insn)) +{ + if (insn == BB_END (bb)) + { + insn = NULL; + break; + } + insn = NEXT_INSN (insn); +} + + rtx_insn *set_insn; + if (insn == BB_HEAD (bb)) +set_insn = emit_insn_before (set, insn); + else +set_insn = emit_insn_after (set, + insn ? PREV_INSN (insn) : BB_END (bb)); + df_insn_rescan (set_insn); +} + +/* Return a machine mode suitable for vector SIZE. */ + +static machine_mode +ix86_get_vector_load_mode (unsigned int size) +{ + machine_mode mode; + if (size == 64) +mode = V64QImode; + else if (size == 32) +mode = V32QImode; + else +mode = V16QImode; + return mode; +} + +/* At entry of the nearest common dominator for basic blocks with vector + CONST0_RTX and integer CONSTM1_RTX uses, generate a single widest + vector set instruction for all CONST0_RTX and integer CONSTM1_RTX + uses. + + NB: We want to
Re: [PATCH] Add fancy pointer support in std::map/set
On Sat, 30 Nov 2024, 13:52 François Dumont, wrote: > Hi > > I've applied all your comments below and the ones you did on the PR > directly. > > When all new types totally seperated from the legacy types > _Rb_tree_node_traits is indeed useless. > > Regarding _Rb_tree_helpers I got rid of it but move the insert-and-rebalance > and rebalance-for-erase function to _Node_traits<>. > > PR updated: > > https://forge.sourceware.org/gcc/gcc-TEST/pulls/27 > > libstdc++: Add fancy pointer support in map and set Support fancy > allocator pointer type in std::_Rb_tree<>. In case of fancy pointer type > the container is now storing the pointer to __rb_tree::_Node<_ValPtr> as > a pointer to __rb_tree::_Node_base<_VoidPtr>. Many methods are adapted to > take and return _Base_ptr in place of _Link_type which has been renamed > into _Node_ptr. As all node are stored as _Base_ptr have all methods > working with this type and remove _Const_Base_ptr and _Const_Node_ptr and > all methods associated with it. > Sounds good! I'll take another look on Monday. libstdc++-v3/ChangeLog: * include/bits/stl_tree.h > [_GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE]: New macro to control usage of the > code required to support fancy allocator pointer type. (_Rb_tree_node_ > base::_Const_Base_ptr): Remove. (_Rb_tree_node_base::_S_minimum, > _Rb_tree_node_base::_S_maximum): Remove overloads for _Const_Base_ptr. > (_Rb_tree_node_base::_M_base_ptr()): New. > (_Rb_tree_node_base::_M_node_ptr<_NodePtr>()): > New. (_Rb_tree_node::_Link_type): Rename into... (_Rb_tree_node::_Node_ptr): > ...this. (__rb_tree::_Node_base<>): New. (__rb_tree::_Header<>): New. > (__rb_tree::_Node<>): New. (_Rb_tree_increment(const > _Rb_tree_node_base*)): Remove declaration. (_Rb_tree_decrement(const > _Rb_tree_node_base*)): Remove declaration. > (_Rb_tree_iterator<>::_Link_type): Rename into... > (_Rb_tree_iterator<>::_Node_ptr): ...this. > (_Rb_tree_const_iterator<>::_Link_type): Rename into... > (_Rb_tree_const_iterator<>::_Node_ptr): ...this. > (_Rb_tree_const_iterator<>::_M_const_cast): Remove. > (_Rb_tree_const_iterator<>::_M_node): Change type into _Base_ptr. (__rb_ > tree::_Iterator<>): New. (__rb_tree::_Node_traits<>): New. > (_Rb_tree<>::_Node_base, _Rb_tree::_Node_type): New. > (_Rb_tree<>::_Link_type): Rename into... (_Rb_tree<>::_Node_ptr): ...this. > (_Rb_tree<>::_Const_Base_ptr, _Rb_tree<>::_Const_Node_ptr): Remove. > (_Rb_tree<>): Adapt to generalize usage of _Base_ptr in place of former > _Link_type now _Node_ptr. (_Rb_tree<>::_M_mbegin): Remove. > (_Rb_tree<>::_S_left(_Const_Base_ptr)): Remove. > (_Rb_tree<>::_S_right(_Const_Base_ptr)): Remove. > (_Rb_tree<>::_S_maximum(_Const_Base_ptr)): Remove. > (_Rb_tree<>::_S_minimum(_Const_Base_ptr)): Remove. * > testsuite/23_containers/map/allocator/ext_ptr.cc: New test case. * > testsuite/23_containers/multimap/allocator/ext_ptr.cc: New test case. * > testsuite/23_containers/multiset/allocator/ext_ptr.cc: New test case. * > testsuite/23_containers/set/allocator/ext_ptr.cc: New test case. > > I've tested it in C++98/C++17/C++20 with default > _GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE and with it 9001. > > François > On 18/11/2024 17:39, Jonathan Wakely wrote: > > On 18/11/24 06:57 +0100, François Dumont wrote: > > Hi > > Here is a new proposal with all the cleanup regarding _Const_Base_ptr > that makes support of allocator's fancy pointer type simpler. > > Also submitted as PR: > https://forge.sourceware.org/gcc/gcc-TEST/pulls/27 > > libstdc++: Add fancy pointer support in map and set > >Support fancy allocator pointer type in std::_Rb_tree<>. > >In case of fancy pointer type the container is now storing the > pointer to >_Rb_tree_pnode<> as a pointer to _Rb_tree_pnode_base<>. > >Many methods are adapted to take and return _Base_ptr in place of > _Link_type >which has been renamed into _Node_ptr. > >As all node are stored as _Base_ptr have all methods working with > this type >and remove _Const_Base_ptr and all methods associated to it. > >libstdc++-v3/ChangeLog: > >* include/bits/stl_set.h (std::set<>::upper_bound<>(const > _Kt&) const): Fix >decltype typo to use const_iterator. >* include/bits/stl_tree.h >(_Rb_tree_ptr_traits<>): New. >(_Rb_tree_pnode_base<>): New. >(_Rb_tree_node_base): Inherit from latter. >(_Rb_tree_node_base::_Const_Base_ptr): Remove. >(_Rb_tree_node_base::_S_minimum(_Const_Base_ptr)): Remove. >(_Rb_tree_node_base::_S_maximum(_Const_Base_ptr)): Remove. >(_Rb_tree_pheader): New. >(_Rb_tree_header): Inherit from latter. >(_Rb_tree_node_val): New. >(_Rb_tree_node): Inherit from latter. >(_Rb_tree_pnode): New. >(_Rb_tree_iterator<>::_Link_type): Rename into... >(_Rb_tree_iterator<>::_Node_ptr): ...this. >(_Rb_tree_const_iterator<>::_Link_type): Rename into... >