[PING] [PATCH] testsuite: Disable finite math only for test [PR115826]

2024-07-23 Thread Torbjorn SVENSSON

Gentle ping :)

As mentioned in the ticket, I would like to target this to trunk and 
releases/gcc-14.


Kind regards,
Torbjörn

On 2024-07-15 12:16, Torbjörn SVENSSON wrote:

As the test case requires +-Inf and NaN to work and -ffast-math is added
by default for arm-none-eabi, re-enable non-finite math.

gcc/testsuite/ChangeLog:

PR testsuite/115826
* gcc.dg/vect/tsvc/vect-tsvc-s1281.c: Use -fno-finite-math-only.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
index dba95a81973..3e619a3fa5a 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
@@ -4,6 +4,9 @@
  /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
  /* { dg-require-effective-target vect_float } */
  
+/* This test requires +-Inf and NaN, so disable finite-math-only */
+/* { dg-additional-options "-fno-finite-math-only" } */
+
  #include "tsvc.h"
  
  real_t s1281(struct args_t * func_args)
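
For illustration only (this is not part of the patch or of tsvc.h): a minimal sketch of code whose result depends on Inf and NaN being representable. The helper names classify_value and overflowing_product are made up for this example. With -ffinite-math-only, which -ffast-math implies, the compiler is allowed to assume such values never occur and may fold these checks away; built with default options they behave as written.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper, not from the testsuite: returns 'n' for NaN,
   'i' for +-Inf and 'f' for any finite value.  */
char
classify_value (float x)
{
  if (isnan (x))
    return 'n';
  if (isinf (x))
    return 'i';
  return 'f';
}

/* A float product that overflows yields +-Inf under IEEE arithmetic.  */
float
overflowing_product (float a, float b)
{
  return a * b;
}
```

With default options, assert (classify_value (overflowing_product (3.0e38f, 10.0f)) == 'i') holds; under -ffinite-math-only there is no such guarantee, which is presumably why the test needs the extra option.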




[PATCH] gm2: fix bad programming practice identifier warning

2024-07-23 Thread Wilken Gottwalt
Fix using keywords as identifiers to prevent warnings coming from
Modula-2's own libraries.

m2pim/DynamicStrings.mod:1358:27: note: In procedure ‘Slice’: the symbol
name ‘end’ is legal as an identifier, however as such it might cause
confusion and is considered bad programming practice
 1358 |start, end, o: INTEGER ;

m2pim/DynamicStrings.mod:1358:27: note: either the identifier has the
same name as a keyword or alternatively a keyword has the wrong case
(‘END’ and ‘end’)

gcc/gm2:
* gm2-libs/DynamicStrings.mod: Fix bad identifier warning.

Signed-off-by: Wilken Gottwalt 
---
 gcc/m2/gm2-libs/DynamicStrings.mod | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/m2/gm2-libs/DynamicStrings.mod 
b/gcc/m2/gm2-libs/DynamicStrings.mod
index b53f0f285b5..982284d3629 100644
--- a/gcc/m2/gm2-libs/DynamicStrings.mod
+++ b/gcc/m2/gm2-libs/DynamicStrings.mod
@@ -1354,8 +1354,8 @@ END Mult ;
 
 PROCEDURE Slice (s: String; low, high: INTEGER) : String ;
 VAR
-   d, t : String ;
-   start, end, o: INTEGER ;
+   d, t  : String ;
+   start, stop, o: INTEGER ;
 BEGIN
IF PoisonOn
THEN
@@ -1390,7 +1390,7 @@ BEGIN
 ELSE
start := low - o
 END ;
-end := Max (Min (MaxBuf, high - o), 0) ;
+stop := Max (Min (MaxBuf, high - o), 0) ;
 WHILE t^.contents.len = MaxBuf DO
IF t^.contents.next = NIL
THEN
@@ -1408,7 +1408,7 @@ BEGIN
t := t^.contents.next
 END ;
 ConcatContentsAddress (t^.contents,
-   ADR (s^.contents.buf[start]), end - start) ;
+   ADR (s^.contents.buf[start]), stop - start) ;
 INC (o, s^.contents.len) ;
 s := s^.contents.next
  END
-- 
2.45.2



Re: [PING] [PATCH] testsuite: Disable finite math only for test [PR115826]

2024-07-23 Thread Richard Biener
On Tue, 23 Jul 2024, Torbjorn SVENSSON wrote:

> Gentle ping :)
> 
> As mentioned in the ticket, I would like to target this to trunk and
> releases/gcc-14.

OK.

Richard.

> Kind regards,
> Torbjörn
> 
> On 2024-07-15 12:16, Torbjörn SVENSSON wrote:
> > As the test case requires +-Inf and NaN to work and -ffast-math is added
> > by default for arm-none-eabi, re-enable non-finite math.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  PR testsuite/115826
> >  * gcc.dg/vect/tsvc/vect-tsvc-s1281.c: Use -fno-finite-math-only.
> > 
> > Signed-off-by: Torbjörn SVENSSON 
> > ---
> >   gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
> > b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
> > index dba95a81973..3e619a3fa5a 100644
> > --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
> > +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c
> > @@ -4,6 +4,9 @@
> >   /* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> >   /* { dg-require-effective-target vect_float } */
> >   
> > +/* This test requires +-Inf and NaN, so disable finite-math-only */
> > +/* { dg-additional-options "-fno-finite-math-only" } */
> > +
> >   #include "tsvc.h"
> >   
> >   real_t s1281(struct args_t * func_args)
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]

2024-07-23 Thread Jakub Jelinek
On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> > that it is not 0).
> > The following patch fixes that by using IMAGPART_EXPR only if the offset
> > is right and using BITFIELD_REF or whatever else otherwise.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> > and other release branches?
> 
> I think this is not enough - what kind of GIMPLE does this result in?

If it is
  __builtin_memmove (&g, 2 + (char *) &c, 2);
then
  _1 = &c + 2;
  _3 = MEM  [(char * {ref-all})_1];
is optimized to
  _3 = IMAGPART_EXPR ;
as before.  If it is
  __builtin_memmove (&g, 1 + (char *) &c, 2);
from the testcase, then
  _1 = &c + 1;
  _3 = MEM  [(char * {ref-all})_1];
is optimized to
  _3 = BIT_FIELD_REF ;
and that is expanded correctly.
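
As a source-level illustration of the two cases (a sketch, not taken from the patch): the helpers make_complex and read_two_bytes below are hypothetical names, and the code assumes sizeof (unsigned short) == 2 and the GNU complex integer extension. A 2-byte read at offset 0 or 2 sees exactly the real or imaginary part; a read at offset 1 straddles both, so it cannot be expressed as REALPART_EXPR or IMAGPART_EXPR.

```c
#include <assert.h>
#include <string.h>

/* Build a complex integer value (GNU C extension).  */
_Complex unsigned short
make_complex (unsigned short re, unsigned short im)
{
  _Complex unsigned short c;
  __real__ c = re;
  __imag__ c = im;
  return c;
}

/* Read two bytes starting OFFSET bytes into C, in the spirit of the
   __builtin_memmove calls above.  Assumes a 2-byte unsigned short.  */
unsigned short
read_two_bytes (_Complex unsigned short c, size_t offset)
{
  unsigned short result;
  assert (offset + sizeof result <= sizeof c);
  memcpy (&result, (const char *) &c + offset, sizeof result);
  return result;
}
```

For make_complex (0x1122, 0x3344), offset 0 gives 0x1122 and offset 2 gives 0x3344 regardless of endianness, while offset 1 gives a mix of both halves.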

> You should amend the checking in non_rewritable_mem_ref_base as well,
> it should fail when the corresponding rewrite doesn't work.

That is already the case I believe.
non_rewritable_mem_ref_base rejects it in the VECTOR_TYPE/COMPLEX_TYPE
case (so doesn't return NULL), but then falls through the
  /* For integral typed extracts we can use a BIT_FIELD_REF.  */
check and returns NULL_TREE there.
But then maybe_rewrite_mem_ref_base triggers already on the COMPLEX_TYPE
case.

> I suspect it falls through to the BIT_FIELD_REF code?
> 
> That said, can you match up the offset check with that of
> non_rewritable_mem_ref_base then, particularly
> 
>   if ((VECTOR_TYPE_P (TREE_TYPE (decl))
>|| TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
>   && useless_type_conversion_p (TREE_TYPE (base),
> TREE_TYPE (TREE_TYPE (decl)))
>   && known_ge (mem_ref_offset (base), 0)
>   && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
>                mem_ref_offset (base))
>   && multiple_p (mem_ref_offset (base),
>                  wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base)))))
> 
> I suppose this check should be adjusted to use the three arg multiple_p
> and check the factor against 0/1 for COMPLEX_TYPE.

Isn't that just too complex/expensive for something as simple as
mem_ref_offset (base) is 0 or TYPE_SIZE_UNIT?
That
integer_zerop () || tree_int_cst_equal looked much cheaper.
Sure, it could be done on poly_int_cst instead if that looks better:
  && (known_eq (mem_ref_offset (base), 0)
      || known_eq (mem_ref_offset (base),
                   wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))))
And shouldn't we cache those mem_ref_offset (base) and
wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
which are used in multiple spots?

Anyway, yet another option because non_rewritable_mem_ref_base has
the VECTOR/COMPLEX types cases together would be to have them together
in maybe_rewrite_mem_ref_base too, so do:
  if ((VECTOR_TYPE_P (TREE_TYPE (sym))
   || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
  && useless_type_conversion_p (TREE_TYPE (*tp),
TREE_TYPE (TREE_TYPE (sym)))
  && multiple_p (mem_ref_offset (*tp),
                 wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
{
  if (VECTOR_TYPE_P (TREE_TYPE (sym)))
*tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
  TYPE_SIZE (TREE_TYPE (*tp)),
  int_const_binop (MULT_EXPR,
   bitsize_int (BITS_PER_UNIT),
   TREE_OPERAND (*tp, 1)));
  else
*tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
  ? REALPART_EXPR : IMAGPART_EXPR,
  TREE_TYPE (*tp), sym);
}

Jakub



Re: [PATCH] MATCH: add abs support for half float

2024-07-23 Thread Kugan Vivekanandarajah
On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  wrote:
>
> On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
>  wrote:
> >
> > Revised based on the comments and moved it into the existing patterns.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/absfloat16.c: New test.
>
> The testcase needs to make sure it runs only for targets that support
> float16 so like:
>
> /* { dg-require-effective-target float16 } */
> /* { dg-add-options float16 } */
Added in the attached version.

Thanks.
Kugan
>
> (like what is in gcc.dg/c11-floatn-3.c and others).
>
> Other than that it looks good but I can't approve it.
>
> Thanks,
> Andrew Pinski
>
> >
> > Signed-off-by: Kugan Vivekanandarajah 
> >
> > Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?
> > Thanks,
> > Kugan
> >
> > 
> > From: Andrew Pinski 
> > Sent: Monday, 15 July 2024 5:30 AM
> > To: Kugan Vivekanandarajah 
> > Cc: gcc-patches@gcc.gnu.org ; 
> > richard.guent...@gmail.com 
> > Subject: Re: [PATCH] MATCH: add abs support for half float
> >
> >
> > On Sun, Jul 14, 2024 at 1:12 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > This patch extends abs detection in matched for half float.
> > >
> > > Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for
> > > trunk?
> >
> > This is basically this pattern:
> > ```
> >  /* A >=/> 0 ? A : -A    same as abs (A) */
> >  (for cmp (ge gt)
> >   (simplify
> >    (cnd (cmp @0 zerop) @1 (negate @1))
> >     (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> >          && !TYPE_UNSIGNED (TREE_TYPE(@0))
> >          && bitwise_equal_p (@0, @1))
> >      (if (TYPE_UNSIGNED (type))
> >       (absu:type @0)
> >       (abs @0)))))
> > ```
> >
> > except extended to handle an optional convert. Why didn't you just
> > extend the above pattern to handle the convert instead? Also I think
> > you have an issue with unsigned types with the comparison.
> > Also you should extend the -abs(A) pattern right below it in a similar 
> > fashion.
> >
> > Thanks,
> > Andrew Pinski
> >
> >
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd: Add pattern to convert (type)A >=/> 0 ? A : -A into abs (A) 
> > > for half float.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > >
> > > Signed-off-by: Kugan Vivekanandarajah 
> > >
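
A side note on the !HONOR_SIGNED_ZEROS guard in the pattern Andrew quotes: the select form and a true abs differ for negative zero. The sketch below uses plain float rather than a half-float type so that it builds everywhere, and the function names are made up for this example; it assumes float is IEEE binary32.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The source-level form the pattern matches.  */
float
abs_by_select (float a)
{
  return a >= 0 ? a : -a;
}

/* abs computed by clearing the IEEE sign bit.  */
float
abs_by_sign_clear (float a)
{
  uint32_t bits;
  assert (sizeof a == sizeof bits);
  memcpy (&bits, &a, sizeof bits);
  bits &= UINT32_C (0x7fffffff);
  memcpy (&a, &bits, sizeof bits);
  return a;
}
```

For -0.0f the comparison a >= 0 is true, so abs_by_select returns -0.0f with the sign bit still set, whereas abs_by_sign_clear returns +0.0f. The two only agree everywhere when signed zeros need not be honored.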


0001-abshalffloat-v3.patch
Description: Binary data


Re: [PATCH 2/3] aarch64: Add support for moving fpm system register

2024-07-23 Thread Claudio Bantaloukas


On 22/07/2024 10:51, Kyrylo Tkachov wrote:
> Hi Claudio,
> 
>> On 22 Jul 2024, at 11:30, Claudio Bantaloukas  
>> wrote:
>>
>>
>> Unlike most system registers, fpmr can be heavily written to in code that
>> exercises the fp8 functionality. That is because every fp8 intrinsic call
>> can potentially change the value of fpmr.
>> Rather than just use an unspec, we treat the fpmr system register like
>> all other registers and use a move operation to read and write to it.
>>
>> We introduce a new class of moveable system registers that, currently,
>> only accepts fpmr and a new constraint, Umv, that allows us to
>> selectively use mrs and msr instructions when expanding rtl for them.
>> Given that there is code that depends on "real" registers coming before
>> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
>> existing value and renumber registers below that.
>> This requires us to update the bitmaps that describe which registers
>> belong to each register class.
>>
> 
> This approach sounds reasonable to me. Some comments inline below.
> 

-8<-

>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 94ff0eefa77..b654c0b9bb8 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -107,10 +107,14 @@ (define_constants
>>  (P14_REGNUM 82)
>>  (P15_REGNUM 83)
>>  (LAST_SAVED_REGNUM 83)
>> -(FFR_REGNUM 84)
>> +
>> +;; Floating Point Mode Register, used in FP8 insns.
>> +(FPM_REGNUM 84)
>> +
> 
> Why not define FPM_REGNUM...
> 
>> +(FFR_REGNUM 85)
>>  ;; "FFR token": a fake register used for representing the scheduling
>>  ;; restrictions on FFR-related operations.
>> -(FFRT_REGNUM 85)
>> +(FFRT_REGNUM 86)
>>
>>  ;; 
>>  ;; Fake registers
>> @@ -122,17 +126,17 @@ (define_constants
>>  ;; ABI-related lowering is needed.  These placeholders read and
>>  ;; write this register.  Instructions that depend on the lowering
>>  ;; read the register.
>> -(LOWERING_REGNUM 86)
>> +(LOWERING_REGNUM 87)
>>
>>  ;; Represents the contents of the current function's TPIDR2 block,
>>  ;; in abstract form.
>> -(TPIDR2_BLOCK_REGNUM 87)
>> +(TPIDR2_BLOCK_REGNUM 88)
>>
>>  ;; Holds the value that the current function wants PSTATE.ZA to be.
>>  ;; The actual value can sometimes vary, because it does not track
>>  ;; changes to PSTATE.ZA that happen during a lazy save and restore.
>>  ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
>> -(SME_STATE_REGNUM 88)
>> +(SME_STATE_REGNUM 89)
>>
>>  ;; Instructions write to this register if they set TPIDR2_EL0 to a
>>  ;; well-defined value.  Instructions read from the register if they
>> @@ -140,14 +144,14 @@ (define_constants
>>  ;;
>>  ;; The register does not model the architected TPIDR2_ELO, just the
>>  ;; current function's management of it.
>> -(TPIDR2_SETUP_REGNUM 89)
>> +(TPIDR2_SETUP_REGNUM 90)
>>
>>  ;; Represents the property "has an incoming lazy save been committed?".
>> -(ZA_FREE_REGNUM 90)
>> +(ZA_FREE_REGNUM 91)
>>
>>  ;; Represents the property "are the current function's ZA contents
>>  ;; stored in the lazy save buffer, rather than in ZA itself?".
>> -(ZA_SAVED_REGNUM 91)
>> +(ZA_SAVED_REGNUM 92)
>>
>>  ;; Represents the contents of the current function's ZA state in
>>  ;; abstract form.  At various times in the function, these contents
>> @@ -155,10 +159,10 @@ (define_constants
>>  ;;
>>  ;; The contents persist even when the architected ZA is off.  Private-ZA
>>  ;; functions have no effect on its contents.
>> -(ZA_REGNUM 92)
>> +(ZA_REGNUM 93)
>>
>>  ;; Similarly represents the contents of the current function's ZT0 
>> state.
>> -(ZT0_REGNUM 93)
>> +(ZT0_REGNUM 94)
>>
> 
> …. Here as 95. Then you’d avoid having to update all the other regnums to new 
> values. I don’t think we aim to have these register names in alphabetical 
> order...

Hi Kyryl, thank you for the lightning fast review.
The registers are split into two contiguous regions, "non-fake" and 
"fake", and I notice that, in aarch64.h

#define FIRST_PSEUDO_REGISTER   (LAST_FAKE_REGNUM + 1)

I admit I don't know the difference between a fake and a pseudo register 
(the words are synonyms to me!) but it seems to me that the code is 
creating these regions on purpose and maintaining their contiguity is 
useful.
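
To make the contiguity argument concrete, here is a small sketch using the register numbers from the hunk quoted above. The enum mirrors the renumbered constants; is_fake_regnum is a hypothetical helper, not something from aarch64.h, and only shows that a contiguous fake region reduces membership to a range test.

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers as renumbered by the quoted hunk.  */
enum
{
  LAST_SAVED_REGNUM = 83,
  FPM_REGNUM = 84,
  FFR_REGNUM = 85,
  FFRT_REGNUM = 86,
  LOWERING_REGNUM = 87,
  TPIDR2_BLOCK_REGNUM = 88,
  SME_STATE_REGNUM = 89,
  TPIDR2_SETUP_REGNUM = 90,
  ZA_FREE_REGNUM = 91,
  ZA_SAVED_REGNUM = 92,
  ZA_REGNUM = 93,
  ZT0_REGNUM = 94,
  FIRST_FAKE_REGNUM = LOWERING_REGNUM,
  LAST_FAKE_REGNUM = ZT0_REGNUM,
  FIRST_PSEUDO_REGISTER = LAST_FAKE_REGNUM + 1
};

/* Hypothetical helper: true if REGNO lies in the fake register range.  */
bool
is_fake_regnum (int regno)
{
  assert (regno >= 0 && regno < FIRST_PSEUDO_REGISTER);
  return regno >= FIRST_FAKE_REGNUM && regno <= LAST_FAKE_REGNUM;
}
```

With this numbering FPM_REGNUM sits below FIRST_FAKE_REGNUM, so it is not classified as fake, and FIRST_PSEUDO_REGISTER still follows directly after the last fake register.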

> 
> 
>>  (FIRST_FAKE_REGNUM LOWERING_REGNUM)
>>  (LAST_FAKE_REGNUM ZT0_REGNUM)

-8<-

>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c 
>> b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> index b774f28c9f0..10fd128d8f9 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> @@ -1,14

Re: [PATCH] ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]

2024-07-23 Thread Jakub Jelinek
On Tue, Jul 23, 2024 at 10:07:08AM +0200, Jakub Jelinek wrote:
> Anyway, yet another option because non_rewritable_mem_ref_base has
> the VECTOR/COMPLEX types cases together would be to have them together
> in maybe_rewrite_mem_ref_base too, so do:

In patch form that would be:

2024-07-23  Jakub Jelinek  
Andrew Pinski  

PR tree-optimization/116034
* tree-ssa.cc (maybe_rewrite_mem_ref_base): Merge the VECTOR and COMPLEX
type checks.

* gcc.dg/pr116034.c: New test.

--- gcc/tree-ssa.cc.jj  2024-03-11 11:00:46.768915988 +0100
+++ gcc/tree-ssa.cc 2024-07-23 10:24:54.564568968 +0200
@@ -1492,25 +1492,23 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
   && is_gimple_reg_type (TREE_TYPE (*tp))
   && ! VOID_TYPE_P (TREE_TYPE (*tp)))
 {
-  if (VECTOR_TYPE_P (TREE_TYPE (sym))
+  if ((VECTOR_TYPE_P (TREE_TYPE (sym))
+  || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
  && useless_type_conversion_p (TREE_TYPE (*tp),
TREE_TYPE (TREE_TYPE (sym)))
  && multiple_p (mem_ref_offset (*tp),
                 wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
{
- *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
-   TYPE_SIZE (TREE_TYPE (*tp)),
-   int_const_binop (MULT_EXPR,
-bitsize_int (BITS_PER_UNIT),
-TREE_OPERAND (*tp, 1)));
-   }
-  else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
-  && useless_type_conversion_p (TREE_TYPE (*tp),
-TREE_TYPE (TREE_TYPE (sym))))
-   {
- *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
-   ? REALPART_EXPR : IMAGPART_EXPR,
-   TREE_TYPE (*tp), sym);
+ if (VECTOR_TYPE_P (TREE_TYPE (sym)))
+   *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
+ TYPE_SIZE (TREE_TYPE (*tp)),
+ int_const_binop (MULT_EXPR,
+  bitsize_int (BITS_PER_UNIT),
+  TREE_OPERAND (*tp, 1)));
+ else
+   *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
+ ? REALPART_EXPR : IMAGPART_EXPR,
+ TREE_TYPE (*tp), sym);
}
   else if (integer_zerop (TREE_OPERAND (*tp, 1))
   && DECL_SIZE (sym) == TYPE_SIZE (TREE_TYPE (*tp)))
--- gcc/testsuite/gcc.dg/pr116034.c.jj  2024-07-22 21:39:50.050994243 +0200
+++ gcc/testsuite/gcc.dg/pr116034.c 2024-07-23 10:26:29.940340508 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/116034 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-strict-aliasing" } */
+
+int g;
+
+static inline int
+foo (_Complex unsigned short c)
+{
+  __builtin_memmove (&g, 1 + (char *) &c, 2);
+  return g;
+}
+
+int
+main ()
+{
+  if (__SIZEOF_SHORT__ == 2
+  && __CHAR_BIT__ == 8
+  && (foo (__BYTE_ORDER__ != __ORDER_BIG_ENDIAN__ ? 0x100 : 1)
+ != (__BYTE_ORDER__ != __ORDER_BIG_ENDIAN__ ? 1 : 0x100)))
+__builtin_abort ();
+}


Jakub



Re: [PATCH] ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]

2024-07-23 Thread Richard Biener
On Tue, 23 Jul 2024, Jakub Jelinek wrote:

> On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> > On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > > is zero, but for IMAGPART_EXPR it didn't check the exact offset value 
> > > (just
> > > that it is not 0).
> > > The following patch fixes that by using IMAGPART_EXPR only if the offset
> > > is right and using BITFIELD_REF or whatever else otherwise.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> > > and other release branches?
> > 
> > I think this is not enough - what kind of GIMPLE does this result in?
> 
> If it is
>   __builtin_memmove (&g, 2 + (char *) &c, 2);
> then
>   _1 = &c + 2;
>   _3 = MEM  [(char * {ref-all})_1];
> is optimized to
>   _3 = IMAGPART_EXPR ;
> as before.  If it is
>   __builtin_memmove (&g, 1 + (char *) &c, 2);
> from the testcase, then
>   _1 = &c + 1;
>   _3 = MEM  [(char * {ref-all})_1];
> is optimized to
>   _3 = BIT_FIELD_REF ;
> and that is expanded correctly.
> 
> > You should amend the checking in non_rewritable_mem_ref_base as well,
> > it should fail when the corresponding rewrite doesn't work.
> 
> That is already the case I believe.
> non_rewritable_mem_ref_base rejects it in the VECTOR_TYPE/COMPLEX_TYPE
> case (so doesn't return NULL), but then falls through the
>   /* For integral typed extracts we can use a BIT_FIELD_REF.  */
> check and returns NULL_TREE there.
> But then maybe_rewrite_mem_ref_base triggers already on the COMPLEX_TYPE
> case.
> 
> > I suspect it falls through to the BIT_FIELD_REF code?
> > 
> > That said, can you match up the offset check with that of
> > non_rewritable_mem_ref_base then, particularly
> > 
> >   if ((VECTOR_TYPE_P (TREE_TYPE (decl))
> >|| TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
> >   && useless_type_conversion_p (TREE_TYPE (base),
> > TREE_TYPE (TREE_TYPE (decl)))
> >   && known_ge (mem_ref_offset (base), 0)
> >   && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
> >                mem_ref_offset (base))
> >   && multiple_p (mem_ref_offset (base),
> >                  wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base)))))
> > 
> > I suppose this check should be adjusted to use the three arg multiple_p
> > and check the factor against 0/1 for COMPLEX_TYPE.

Ah, reading the above again it should already ensure the offset is
for the real or imag part only - I was thinking it might allow
larger aligned offsets.
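
Spelled out as plain integer arithmetic (a model, not GCC code; complex_part_for_offset and the enum are made-up names): for a complex object with element size s, the quoted condition requires 0 <= offset < 2 * s and offset to be a multiple of s, which leaves only offset 0 and offset s. The model assumes the accessed type has the element size, as the useless_type_conversion_p check implies.

```c
#include <assert.h>

enum complex_part { PART_REAL, PART_IMAG, PART_NONE };

/* Model of the quoted condition for a complex object whose element
   type is ELT_SIZE bytes, so the whole object is 2 * ELT_SIZE bytes.  */
enum complex_part
complex_part_for_offset (long offset, long elt_size)
{
  assert (elt_size > 0);
  long object_size = 2 * elt_size;
  if (offset >= 0                 /* known_ge (offset, 0) */
      && object_size > offset     /* known_gt (size of decl, offset) */
      && offset % elt_size == 0)  /* multiple_p (offset, access size) */
    return offset == 0 ? PART_REAL : PART_IMAG;
  return PART_NONE;
}
```

With elt_size 2, offsets 0 and 2 map to the real and imaginary parts, while 1, 4 and -2 are all rejected.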

Thus your original patch is OK.

I think an improvement would really be to merge the two functions.

Thanks,
Richard.

> Isn't that just too complex/expensive for something as simple as
> mem_ref_offset (base) is 0 or TYPE_SIZE_UNIT?
> That
> integer_zerop () || tree_int_cst_equal looked much cheaper.
> Sure, it could be done on poly_int_cst instead if that looks better:
>   && (known_eq (mem_ref_offset (base), 0)
>       || known_eq (mem_ref_offset (base),
>                    wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))))
> And shouldn't we cache those mem_ref_offset (base) and
> wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
> which are used in multiple spots?
> 
> Anyway, yet another option because non_rewritable_mem_ref_base has
> the VECTOR/COMPLEX types cases together would be to have them together
> in maybe_rewrite_mem_ref_base too, so do:
>   if ((VECTOR_TYPE_P (TREE_TYPE (sym))
>  || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
>   && useless_type_conversion_p (TREE_TYPE (*tp),
> TREE_TYPE (TREE_TYPE (sym)))
>   && multiple_p (mem_ref_offset (*tp),
>                  wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
> {
> if (VECTOR_TYPE_P (TREE_TYPE (sym)))
>   *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym, 
> TYPE_SIZE (TREE_TYPE (*tp)),
> int_const_binop (MULT_EXPR,
>  bitsize_int (BITS_PER_UNIT),
>  TREE_OPERAND (*tp, 1)));
> else
>   *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
> ? REALPART_EXPR : IMAGPART_EXPR,
> TREE_TYPE (*tp), sym);
> }
> 
>   Jakub
> 
> 



Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Andrew Pinski
On Fri, Jul 19, 2024 at 4:11 AM  wrote:
>
> From: Pan Li 
>
> direct_internal_fn_supported_p places no restrictions on the type
> modes.  For example, the bit-field below will be recognized as .SAT_TRUNC.
>
> struct e
> {
>   unsigned pre : 12;
>   unsigned a : 4;
> };
>
> __attribute__((noipa))
> void bug (e * v, unsigned def, unsigned use) {
>   e & defE = *v;
>   defE.a = min_u (use + 1, 0xf);
> }
>
> This patch makes direct_internal_fn_supported_p check strictly, so that
> an ifn type pair is only accepted when each type matches its mode.
>
> The following test suites pass with this patch:
> 1. The rv64gcv full regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 full regression tests.

Note I just ran into basically the same bug with BIT_ANDC/BIT_IORC
(which will be renamed to BIT_ANDN/BIT_IORN).
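
A rough model of why precision matters here, under the assumption (my reading of the report, not something stated in the patch) that the 4-bit field ends up saturated at the width of its 8-bit mode instead of its 4-bit precision. The function names are made up, and the code assumes a 32-bit unsigned.

```c
#include <assert.h>

/* Saturating truncation of X to an unsigned value of BITS bits.  */
unsigned
sat_trunc (unsigned x, unsigned bits)
{
  assert (bits > 0 && bits < 32);
  unsigned max = (1u << bits) - 1;
  return x < max ? x : max;
}

/* Store VALUE into a field of FIELD_BITS bits, keeping only the low
   bits, as a bit-field store does.  */
unsigned
store_to_field (unsigned value, unsigned field_bits)
{
  assert (field_bits > 0 && field_bits < 32);
  return value & ((1u << field_bits) - 1);
}
```

For use + 1 == 34, saturating at 4 bits and then storing gives 0xf, the value the testcase expects. Saturating at 8 bits leaves 34 unchanged, and the store then keeps only the low 4 bits, giving 2.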

>
> PR target/115961
>
> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add new func
> impl to check type strictly matches mode or not.
> (type_pair_strictly_matches_mode_p): Ditto but for tree type
> pair.
> (direct_internal_fn_supported_p): Add above check for the tree
> type pair.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr115961-run-1.C: New test.
> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc| 32 +
>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>  3 files changed, 100 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 95946bfd683..5c21249318e 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn)
>gcc_unreachable ();
>  }
>
> +/* Return true if TYPE's mode has the same format as TYPE, and if there is
> +   a 1:1 correspondence between the values that the mode can store and the
> +   values that the type can store.  */
> +
> +static bool
> +type_strictly_matches_mode_p (const_tree type)
> +{
> +  if (VECTOR_TYPE_P (type))
> +return VECTOR_MODE_P (TYPE_MODE (type));
> +
> +  if (INTEGRAL_TYPE_P (type))
> +return type_has_mode_precision_p (type);
> +
> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return true if both the first and the second type of tree pair are
> +   strictly matches their modes,  or return false.  */
Just a slight comment improvement:
/* Returns true if both types of TYPE_PAIR strictly match their modes,
else returns false.  */

> +
> +static bool
> +type_pair_strictly_matches_mode_p (tree_pair type_pair)
> +{
> +  return type_strictly_matches_mode_p (type_pair.first)
> +&& type_strictly_matches_mode_p (type_pair.second);
> +}
> +
>  /* Return true if FN is supported for the types in TYPES when the
> optimization type is OPT_TYPE.  The types are those associated with
> the "type0" and "type1" fields of FN's direct_internal_fn_info
> @@ -4173,6 +4202,9 @@ bool
>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
> optimization_type opt_type)
>  {
> +  if (!type_pair_strictly_matches_mode_p (types))
> +return false;
> +
>switch (fn)
>  {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C

This testcase could go in g++.dg/torture/ without the -O3 option.

> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE = *v;
> +  defE.a = min_u (use + 1, 0xf);
> +}
> +
> +__attribute__((noipa, optimize(0)))
> +int main(void)
> +{
> +  e v = { 0xded, 3 };
> +
> +  bug(&v, 32, 33);
> +
> +  if (v.a != 0xf)
> +__builtin_abort ();
> +
> +  return 0;
> +}

> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */

Since we are scanning for the negative it should pass on all targets
even ones without SAT_TRUNC support. And then you should not need the
other testcase either.

Thanks,
Andrew Pinski

> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/g

[committed] libstdc++: Do not use isatty on avr [PR115482]

2024-07-23 Thread Jonathan Wakely
I'm pushing this workaround from Detlef.

I incorrectly assumed that <unistd.h> is enough to ensure isatty is
present, but that isn't true for avr-libc.

It might be cleaner to add a proper autoconf check for isatty and dup,
but we don't have any reports of it failing for other targets. This
simple workaround solves the immediate problem in time for the 14.2
release.

Apart from using __write_to_terminal in the 27_io/print/2.cc test, the
functions in this file are not actually used for non-Windows targets. As
long as they compile and don't break the build, that's good enough.

Tested x86_64-linux, built on avr and mingw-w64.

Pushed to trunk, gcc-14 backport when testing finishes.
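
For readers outside libstdc++, the shape of the guard can be sketched in plain C. This is not the code from print.cc: it uses __has_include in place of the configure-generated _GLIBCXX_HAVE_UNISTD_H macro, and SKETCH_HAVE_ISATTY and stream_is_terminal are names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno */
#include <assert.h>
#include <stdio.h>

/* Only use isatty where <unistd.h> exists and the target is not AVR,
   whose unistd.h lacks isatty according to the report.  */
#if defined(__has_include) && !defined(__AVR__) && !defined(_WIN32)
# if __has_include(<unistd.h>)
#  include <unistd.h>
#  define SKETCH_HAVE_ISATTY 1
# endif
#endif

/* Return 1 if F refers to a terminal, 0 otherwise or if unknown.  */
int
stream_is_terminal (FILE *f)
{
  assert (f != NULL);
#ifdef SKETCH_HAVE_ISATTY
  int fd = fileno (f);
  return fd >= 0 && isatty (fd);
#else
  return 0;
#endif
}
```

On targets where the guard is not satisfied the function simply reports "not a terminal", which matches the spirit of the workaround: the code only has to compile and stay harmless there.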

-- >8 --

avrlibc has an incomplete unistd.h that doesn't have isatty.
So building libstdc++ fails when compiling c++23/print.cc.
As a workaround I added a check for AVR.

libstdc++-v3/ChangeLog:

PR libstdc++/115482
* src/c++23/print.cc (__open_terminal) [__AVR__]: Do not use
isatty.
---
 libstdc++-v3/src/c++23/print.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/src/c++23/print.cc b/libstdc++-v3/src/c++23/print.cc
index 99a19cd4500..558dc149d12 100644
--- a/libstdc++-v3/src/c++23/print.cc
+++ b/libstdc++-v3/src/c++23/print.cc
@@ -75,7 +75,7 @@ namespace
 #ifdef _WIN32
if (int fd = ::_fileno(f); fd >= 0)
  return check_for_console((void*)_get_osfhandle(fd));
-#elifdef _GLIBCXX_HAVE_UNISTD_H
+#elif defined _GLIBCXX_HAVE_UNISTD_H && ! defined __AVR__
if (int fd = (::fileno)(f); fd >= 0 && ::isatty(fd))
  return f;
 #endif
@@ -100,7 +100,7 @@ namespace
 #ifdef _WIN32
 if (auto fb = dynamic_cast(sb))
   return check_for_console(fb->native_handle());
-#elifdef _GLIBCXX_HAVE_UNISTD_H
+#elif defined _GLIBCXX_HAVE_UNISTD_H && ! defined __AVR__
 if (auto fb = dynamic_cast(sb))
   if (int fd = fb->native_handle(); fd >= 0 && ::isatty(fd))
return ::fdopen(::dup(fd), "w"); // Caller must call fclose.
-- 
2.45.2



[committed] libstdc++: Use [[maybe_unused]] attribute in src/c++23/print.cc

2024-07-23 Thread Jonathan Wakely
Tested x86_64-linux, built on avr.

Pushed to trunk.

-- >8 --

This avoids some warnings when the preprocessor conditions are not met.

libstdc++-v3/ChangeLog:

* src/c++23/print.cc (__open_terminal): Use [[maybe_unused]] on
parameter.
---
 libstdc++-v3/src/c++23/print.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/src/c++23/print.cc b/libstdc++-v3/src/c++23/print.cc
index 558dc149d12..8ba71405967 100644
--- a/libstdc++-v3/src/c++23/print.cc
+++ b/libstdc++-v3/src/c++23/print.cc
@@ -67,7 +67,7 @@ namespace
   // This returns intptr_t that is either a Windows HANDLE
   // or 1 + a POSIX file descriptor. A zero return indicates failure.
   void*
-  __open_terminal(FILE* f)
+  __open_terminal([[maybe_unused]] FILE* f)
   {
 #ifndef _GLIBCXX_USE_STDIO_PURE
 if (f)
@@ -85,7 +85,7 @@ namespace
   }
 
   void*
-  __open_terminal(std::streambuf* sb)
+  __open_terminal([[maybe_unused]] std::streambuf* sb)
   {
 #if ! defined _GLIBCXX_USE_STDIO_PURE && defined __cpp_rtti
 using namespace __gnu_cxx;
-- 
2.45.2



[PATCH] tree-optimization/116002 - PTA solving slow with degenerate graph

2024-07-23 Thread Richard Biener
When the constraint graph consists of N nodes with only complex
constraints and no copy edges we have to be lucky to arrive at
a constraint solving order that requires the optimal number of
iterations.  What happens in the testcase is that we bottle-neck
on computing the visitation order but propagate changes only
very slowly.  Luckily the testcase complex constraints are
all copy-with-offset and those do provide a way to order
visitation.  The following adds this which reduces the iteration
count to one.
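
The idea can be sketched outside GCC as a depth-first post-order walk that follows both kinds of edges. This is a simplified model, not the code in tree-ssa-structalias.cc: the struct layouts and names are invented, the graph is a dense matrix, and the two edge kinds are walked in one loop rather than two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

struct graph
{
  size_t n_nodes;
  /* copy_edge[r][l]: explicit copy edge from r to l.  */
  bool copy_edge[MAX_NODES][MAX_NODES];
  /* offset_copy[r][l]: complex constraint l = r + offset.  */
  bool offset_copy[MAX_NODES][MAX_NODES];
};

struct order
{
  size_t nodes[MAX_NODES];
  size_t len;
};

/* Visit successors of N first, then record N (post-order).  */
static void
visit (const struct graph *g, struct order *out, bool *visited, size_t n)
{
  visited[n] = true;
  for (size_t k = 0; k < g->n_nodes; k++)
    if ((g->copy_edge[n][k] || g->offset_copy[n][k]) && !visited[k])
      visit (g, out, visited, k);
  out->nodes[out->len++] = n;
}

void
compute_order (const struct graph *g, struct order *out)
{
  bool visited[MAX_NODES] = { false };
  assert (g->n_nodes <= MAX_NODES);
  out->len = 0;
  for (size_t n = 0; n < g->n_nodes; n++)
    if (!visited[n])
      visit (g, out, visited, n);
}
```

For a chain 0 -> 1 -> 2 made only of offset copies, the recorded post-order is 2, 1, 0, so a solver that processes the list from the back sees each source before its destination. If the offset copies were ignored, every node would look isolated and the order would carry no such information.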

Bootstrapped and tested on x86_64-unknown-linux-gnu

Richard

PR tree-optimization/116002
* tree-ssa-structalias.cc (topo_visit): Also consider
SCALAR = SCALAR complex constraints as edges.
---
 gcc/tree-ssa-structalias.cc | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 330e64e65da..65f9132a94f 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -1908,6 +1908,18 @@ topo_visit (constraint_graph_t graph, vec 
&topo_order,
  topo_visit (graph, topo_order, visited, k);
   }
 
+  /* Also consider copy with offset complex constraints as implicit edges.  */
+  for (auto c : graph->complex[n])
+{
+  /* Constraints are ordered so that SCALAR = SCALAR appear first.  */
+  if (c->lhs.type != SCALAR || c->rhs.type != SCALAR)
+   break;
+  gcc_checking_assert (c->rhs.var == n);
+  unsigned k = find (c->lhs.var);
+  if (!bitmap_bit_p (visited, k))
+   topo_visit (graph, topo_order, visited, k);
+}
+
   topo_order.quick_push (n);
 }
 
-- 
2.35.3
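
The ordering idea described in the message above can be sketched outside GCC as follows. This is a minimal toy model, not the code in tree-ssa-structalias.cc: ToyGraph, toy_topo_visit and toy_topo_order are hypothetical names, and the "copy with offset" constraints are reduced to a plain list of target nodes per source node.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the constraint graph: each node has
// explicit copy edges plus "copy with offset" constraints that are treated
// as implicit edges for ordering purposes only.
struct ToyGraph {
  std::vector<std::vector<unsigned>> succs;          // explicit copy edges
  std::vector<std::vector<unsigned>> offset_copies;  // implicit edges
};

// Post-order DFS: a node is pushed only after everything reachable from it.
void toy_topo_visit(const ToyGraph &g, std::vector<unsigned> &order,
                    std::vector<bool> &visited, unsigned n) {
  visited[n] = true;
  for (unsigned k : g.succs[n])
    if (!visited[k]) toy_topo_visit(g, order, visited, k);
  // Also follow the implicit edges, mirroring the idea of the patch.
  for (unsigned k : g.offset_copies[n])
    if (!visited[k]) toy_topo_visit(g, order, visited, k);
  order.push_back(n);
}

std::vector<unsigned> toy_topo_order(const ToyGraph &g) {
  std::vector<unsigned> order;
  std::vector<bool> visited(g.succs.size(), false);
  for (unsigned n = 0; n < g.succs.size(); ++n)
    if (!visited[n]) toy_topo_visit(g, order, visited, n);
  return order;
}
```

With only implicit edges 2 -> 0 -> 1, the resulting post-order is 1, 0, 2; consuming it from the back visits the source node first, so a change can propagate along the whole chain in one pass. Without the implicit edges the order would simply follow the node numbering.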


[Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Tobias Burnus

Hi Andrew, hi all,

to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I 
suggest the attached patch, which also suggests Thomas' Newlib commit 
(April 4, 2024)


ed50a50b9   amdgcn: Implement proper locks: Fix 
'newlib/libc/sys/amdgcn/include/sys/lock.h' for C++


and not only your commit (March 25, 2024)

7dd4eb1db amdgcn: Implement proper locks

Comments or suggestions before I commit it?

Tobias
install.texi (gcn): Suggest newer commit for Newlib

Newlib 4.4.0 lacks two commits: 7dd4eb1db (2024-03-25) to fix device console
output for GFX10/GFX11 and ed50a50b9 (2024-04-04) to make the added lock.h
compilable with C++. This commit now also mentions the second commit.

gcc/ChangeLog:

	* doc/install.texi (amdgcn-x-amdhsa): Suggest newer git version
	for newlib.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index b5456992583..dda623f4410 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3952,9 +3952,9 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
-GFX11 devices).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commits
+7dd4eb1db and ed50a50b9 (2024-04-04, post-4.4.0) fix device console output
+for GFX10 and GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use


Re: [PING] [PATCH] testsuite: Disable finite math only for test [PR115826]

2024-07-23 Thread Torbjorn SVENSSON




On 2024-07-23 09:59, Richard Biener wrote:

On Tue, 23 Jul 2024, Torbjorn SVENSSON wrote:


Gentle ping :)

As mentioned in the ticket, I would like to target this to trunk and
releases/gcc-14.


OK.

Richard.



Pushed as releases/gcc-14.1.0-331-ga544898f6dd and 
basepoints/gcc-15-2224-g7793f5b4194.


Kind regards,
Torbjörn


[PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-23 Thread Jennifer Schmitz
According to the Neoverse V2 Software Optimization Guide (section 4.14), the
instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
implemented so far. This patch implements and tests the two fusion pairs.
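
For readers unfamiliar with the instruction pairs, here is a small illustration of source code that commonly leads to them. The function names are hypothetical, and the instruction sequences named in the comments are what a compiler typically emits for AArch64 at -O2, not a guarantee.

```cpp
#include <cassert>

// A conditional select: typically compiled to CMP followed by CSEL
// (exact code generation depends on the compiler and options).
int select_lt(int a, int b, int x, int y) { return a < b ? x : y; }

// A comparison result materialised as 0 or 1: typically CMP followed by CSET.
int is_lt(int a, int b) { return a < b; }
```

Fusion only affects how the pair is scheduled, so the results of such functions are unchanged.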

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
There was also no non-noise impact on SPEC CPU2017 benchmark.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/

* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
fusion logic.
* config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
(cmp+cset): Likewise.
* config/aarch64/tuning_models/neoversev2.h: Enable logic in
field fusible_ops.

gcc/testsuite/

* gcc.target/aarch64/fuse_cmp_csel.c: New test.
* gcc.target/aarch64/fuse_cmp_cset.c: Likewise.


0001-aarch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch
Description: Binary data




[wwwdocs] Add aarch64 11.5.0 caveat

2024-07-23 Thread Jakub Jelinek
Hi!

Richi suggested to mention the PR116029 bug in 11.5.0 caveats as we can't
fix that anymore.

Here is a patch for that, which attempts to describe (my limited
understanding of) the issue.
As TARGET_CPU_generic is now 64, when config.gcc doesn't set
TARGET_CPU_DEFAULT, we end up with TARGET_CPU_DEFAULT
(64 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
which is treated as I think
TARGET_CPU_cortexa34 | ((AARCH64_CPU_DEFAULT_FLAGS | AARCH64_FL_SIMD) << 6))
Ok for wwwdocs?

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index e010cd08..26189772 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -1164,5 +1164,20 @@ known to be fixed in the 11.5 release. This list might 
not be
 complete (that is, it is possible that some PRs that have been fixed
 are not listed here).
 
+Caveats
+
+aarch64
+
+  
+Due to a late introduced bug if the compiler is configured without
+explicit --with-arch=, --with=cpu= and/or
+--with-tune= configure options the compiler without
+explicit -march= etc. options might act as if asked
+for cortex-a34.  This can be fixed by appling manually the
+<a href="https://gcc.gnu.org/r12-8060">r12-8060</a> commit on top
+of GCC 11.5.0.  See <a href="https://gcc.gnu.org/PR116029">PR116029</a>
+for more details.
+  
+
 
 

Jakub
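
The arithmetic behind the explanation above can be sketched as follows. This is a toy model that assumes the layout implied by the quoted expression (CPU index in the low 6 bits, feature flags above them); encode, decoded_cpu and decoded_flags are hypothetical helpers, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, taken from the expression quoted in the message: the low
// 6 bits hold the CPU index and the remaining bits hold the feature flags.
constexpr unsigned kCpuBits = 6;
constexpr std::uint64_t kCpuMask = (1u << kCpuBits) - 1;

constexpr std::uint64_t encode(std::uint64_t cpu, std::uint64_t flags) {
  return cpu | (flags << kCpuBits);
}
constexpr std::uint64_t decoded_cpu(std::uint64_t v) { return v & kCpuMask; }
constexpr std::uint64_t decoded_flags(std::uint64_t v) { return v >> kCpuBits; }
```

A CPU index of 64 does not fit in 6 bits: it decodes as index 0 and leaks into the lowest flag bit. If index 0 is cortex-a34 and flag bit 0 is AARCH64_FL_SIMD, as the message suggests, this matches the behaviour described.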



Re: [PATCH V5] report message for operator %a on unaddressible operand

2024-07-23 Thread Jiufu Guo
"Kewen.Lin"  writes:

> Hi Jeff,
>
> on 2024/7/16 13:39, Jiufu Guo wrote:
>> Hi,
>> 
> >> For PR96866, when printing asm code for modifier "%a", an addressable
> >> operand is required.  However, the constraint "X" allows any kind of
> >> operand, even one whose address is hard to get directly, e.g. an extern
> >> symbol whose address is in the TOC.
> >> An error message is now reported to indicate the invalid asm operand.
>> 
> >> Compared with the previous version, the test case is updated with -mno-pcrel.
>> 
> >> Bootstrap&regtest pass on ppc64{,le}.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff(Jiufu Guo)
>> 
>>  PR target/96866
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.cc (print_operand_address): Emit message for
>>  Unsupported operand.
>
> Nit: s/Unsupported/unsupported/
>
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/powerpc/pr96866-1.c: New test.
>>  * gcc.target/powerpc/pr96866-2.c: New test.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc  |  7 ++-
>>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 18 ++
>>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 13 +
>>  3 files changed, 37 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 117999613d8..7e7c36a1bad 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -14664,7 +14664,12 @@ print_operand_address (FILE *file, rtx x)
>>  fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>>   reg_names[SMALL_DATA_REG]);
>>else
>> -gcc_assert (!TARGET_TOC);
>> +{
>> +  /* Do not support getting address directly from TOC, emit error.
>> + No more work is needed for !TARGET_TOC. */
>> +  if (TARGET_TOC)
>> +output_operand_lossage ("%%a requires an address of memory");
>> +}
>>  }
>>else if (GET_CODE (x) == PLUS && REG_P (XEXP (x, 0))
>> && REG_P (XEXP (x, 1)))
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> new file mode 100644
>> index 000..bcebbd6e310
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> @@ -0,0 +1,18 @@
>> +/* The "%a" modifier can't get the address of extern symbol directly from 
>> TOC
>> +   with -fPIC, even the symbol is propgated for "X" constraint under -O2. */
>
> Nit: s/propgated/propagated/
>
> OK with these nits tweaked, thanks!
Thanks a lot for your helpful comments and insightful review!
Committed via r15-2225.

BR,
Jeff(Jiufu) Guo

>
> BR,
> Kewen
>
>> +/* { dg-options "-fPIC -O2 -mno-pcrel" } */
>> +
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  
>> */
>> +/* { dg-excess-errors "pr96866-1.c" } */
>> +
>> +int x[2];
>> +
>> +int __attribute__ ((noipa))
>> +f1 (void)
>> +{
>> +  int n;
>> +  int *p = x;
>> +  *p++;
>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>> +  return n;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> new file mode 100644
>> index 000..0577fd6d588
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> @@ -0,0 +1,13 @@
>> +/* The "%a" modifier can't get the address of extern symbol directly from 
>> TOC
>> +   with -fPIC. */
>> +/* { dg-options "-fPIC -O2 -mno-pcrel" } */
>> +
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  
>> */
>> +/* { dg-excess-errors "pr96866-2.c" } */
>> +
>> +void
>> +f (void)
>> +{
>> +  extern int x;
>> +  __asm__ volatile("#%a0" ::"X"(&x));
>> +}


Re: [Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Andrew Stubbs

On 23/07/2024 11:05, Tobias Burnus wrote:

Hi Andrew, hi all,

to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I 
suggest the attached patch, which also suggests Thomas' Newlib commit 
(April 4, 2024)


ed50a50b9   amdgcn: Implement proper locks: Fix 
'newlib/libc/sys/amdgcn/include/sys/lock.h' for C++


and not only your commit (March 25, 2024)

7dd4eb1db amdgcn: Implement proper locks

Comments or suggestions before I commit it?


LGTM.

We should write the same thing on the Wiki Offloading page.

Andrew



Re: [wwwdocs] Add aarch64 11.5.0 caveat

2024-07-23 Thread Richard Biener
On Tue, 23 Jul 2024, Jakub Jelinek wrote:

> Hi!
> 
> Richi suggested to mention the PR116029 bug in 11.5.0 caveats as we can't
> fix that anymore.
> 
> Here is a patch for that, which attempts to describe (my limited
> understanding of) the issue.
> As TARGET_CPU_generic is now 64, when config.gcc doesn't set
> TARGET_CPU_DEFAULT, we end up with TARGET_CPU_DEFAULT
> (64 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
> which is treated as I think
> TARGET_CPU_cortexa34 | ((AARCH64_CPU_DEFAULT_FLAGS | AARCH64_FL_SIMD) << 6))
> Ok for wwwdocs?

OK, but please give arm folks the chance to review this.

Richard.

> diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
> index e010cd08..26189772 100644
> --- a/htdocs/gcc-11/changes.html
> +++ b/htdocs/gcc-11/changes.html
> @@ -1164,5 +1164,20 @@ known to be fixed in the 11.5 release. This list might 
> not be
>  complete (that is, it is possible that some PRs that have been fixed
>  are not listed here).
>  
> +Caveats
> +
> +aarch64
> +
> +  
> +Due to a late introduced bug if the compiler is configured without
> +explicit --with-arch=, --with=cpu= and/or
> +--with-tune= configure options the compiler without
> +explicit -march= etc. options might act as if asked
> +for cortex-a34.  This can be fixed by appling manually the
> +<a href="https://gcc.gnu.org/r12-8060">r12-8060</a> commit on top
> +of GCC 11.5.0.  See <a href="https://gcc.gnu.org/PR116029">PR116029</a>
> +for more details.
> +  
> +
>  
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] libstdc++: Remove duplicate include header from ranges_algobase.h

2024-07-23 Thread Jonathan Wakely
On Mon, 22 Jul 2024 at 20:53, Michael Levine wrote:
>
> The bits/stl_algobase.h header was added to bits/ranges_algobase.h separately 
> through two related commits:
>
> https://github.com/gcc-mirror/gcc/commit/674d213ab91871652e96dc2de06e6f50682eebe0
>
> https://github.com/gcc-mirror/gcc/commit/0bb1db32ccf54a9de59bea718f7575f7ef22abf5
>
> The comment for the first time it is included in the file is also incorrect 
> (my error from that 2nd PR) since it is really being included for __memcmp, 
> not __memcpy

Oops, I thought I'd fixed that when I applied your patch.

> This patch removes the duplicate header include.

Thanks, I'll push this today.



[r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread haochen.jiang
On Linux/x86_64,

88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2 is the first bad commit
commit 88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2
Author: Jeff Law 
Date:   Mon Jul 22 08:45:10 2024 -0600

[NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as 
live in ext-dce

caused

FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2196/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-10.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-10.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-6.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-6.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the 
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-23 Thread Richard Biener
On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:
>
> Hi Richard,
>
> On 5/31/2024 1:48 AM, Richard Biener wrote:
> > On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill  
> > wrote:
> >>
> >> From: Greg McGary 
> >
> > Still a NACK.  If remain ends up zero then
> >
> >  /* Try to use a single smaller load when we are about
> > to load excess elements compared to the unrolled
> > scalar loop.  */
> >  if (known_gt ((vec_num * j + i + 1) * nunits,
> > (group_size * vf - gap)))
> >{
> >  poly_uint64 remain = ((group_size * vf - gap)
> >- (vec_num * j + i) * 
> > nunits);
> >  if (known_ge ((vec_num * j + i + 1) * nunits
> >- (group_size * vf - gap), nunits))
> >/* DR will be unused.  */
> >ltype = NULL_TREE;
> >
> > needs to be re-formulated so that the combined conditions make sure
> > this doesn't happen.  The outer known_gt should already ensure that
> > remain > 0.  For correctness that should possibly be maybe_gt though.
> >

Putting the list back in the loop and CCing Richard S.

> I'm currently looking into this patch and am trying to figure out what
> is going on. Stepping through gdb, I see that remain == {coeffs = {0,
> 2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true
> with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})).
>
>  From what I understand, this falls under the umbrella of 0 <= remain <
> nunits. The divide by zero error is because of the 0 <= remain which is
> coming from the constant_multiple_p function in poly-int.h where it
> performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]).
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971)
>
>
>  >  if (known_ge ((vec_num * j + i + 1) * nunits
>  >- (group_size * vf - gap),
> nunits))
>  >/* DR will be unused.  */
>  >ltype = NULL_TREE;
>
> This if condition is a bit suspicious to me though. I'm seeing that it's
> evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is
> returning false. Should it be maybe_ge instead?

No, we can only not emit a load if we know it won't be used, not if
it eventually cannot be used.

> After running some
> tests, to me it looks like it doesn't vectorize quite as often; however,
> I'm not fully sure what else to do when the coeffs can potentially be
> equal to 0.
>
> Should it even be possible for there to be a {coeffs = {0, n}}
> situation? My understanding of how poly_ints are used for representing
> vectorization is that the first coefficient is the number of elements
> needed to make the minimum supported vector size. That is, if vector
> lengths are 128 bits, element size is 32 bits, coeff[0] should be
> minimum of 4. Is this understanding correct?

I was told n can be negative, but nunits.coeff[0] should be non-zero.

What is j and i when the divisor is zero?
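
To make the values quoted above easier to follow, here is a toy two-coefficient model of the comparisons involved. It is a simplified sketch, not GCC's poly_int: ToyPoly and the toy_* helpers are hypothetical, and the runtime indeterminate is assumed to be non-negative.

```cpp
#include <cassert>

// Toy two-coefficient polynomial c0 + c1 * x with unknown runtime x >= 0.
// A simplified model for illustration; GCC's real poly_int is more general.
struct ToyPoly {
  long c0, c1;
};

// a > b for every x >= 0.
constexpr bool toy_known_gt(ToyPoly a, ToyPoly b) {
  return a.c0 > b.c0 && a.c1 >= b.c1;
}

// a >= b for every x >= 0.
constexpr bool toy_known_ge(ToyPoly a, ToyPoly b) {
  return a.c0 >= b.c0 && a.c1 >= b.c1;
}

// Same shape as the guarded call in the quoted patch: only divide when the
// divisor is known to be positive, so its constant term cannot be zero.
inline bool toy_guarded_multiple(ToyPoly a, ToyPoly b, long *mult) {
  if (!toy_known_gt(b, ToyPoly{0, 0})) return false;
  if (a.c0 % b.c0 != 0) return false;
  long m = a.c0 / b.c0;
  if (a.c1 != m * b.c1) return false;
  *mult = m;
  return true;
}
```

Under this model the outer check on {8, 8} and {6, 8} holds, the "DR will be unused" check on {2, 0} and {2, 2} does not, and remain = {0, 2} is not known to be greater than zero, which is the case where the unguarded modulus by the constant term divides by zero.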


>
> >> gcc/ChangeLog:
> >>  * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
> >> divide-by-zero.
> >>  * testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove 
> >> dg-ice.
> >> ---
> >> No changes in v3. Depends on the risc-v backend option added in patch 1 to
> >> trigger the ICE.
> >> ---
> >>   gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
> >>   gcc/tree-vect-stmts.cc  | 3 ++-
> >>   2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
> >> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >> index dfbe09f01a1..79d03612a22 100644
> >> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> >> @@ -1,6 +1,5 @@
> >>   /* { dg-do compile } */
> >>   /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable 
> >> -O3 -mno-autovec-segment" } */
> >> -/* { dg-ice "Floating point exception" } */
> >>
> >>   enum e { c, d };
> >>   enum g { f };
> >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >> index 4219ad832db..34f5736ba00 100644
> >> --- a/gcc/tree-vect-stmts.cc
> >> +++ b/gcc/tree-vect-stmts.cc
> >> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
> >>   - (vec_num * j + i) * nunits);
> >>  /* remain should now be > 0 and < nunits.  */
> >>  unsigned num;
> >> -   if (constant_multiple_p (nunits, remain, &num))
> >> +   if (known_gt (remain, 0)
> >> +   && constant_multiple_p (nunits, remain, 
> 

Re: [PATCH] c++: missing SFINAE during alias CTAD [PR115296]

2024-07-23 Thread Jason Merrill

On 7/19/24 12:24 PM, Patrick Palka wrote:

On Fri, 19 Jul 2024, Jason Merrill wrote:


On 7/19/24 10:55 AM, Patrick Palka wrote:

On Fri, Jul 5, 2024 at 1:50 PM Patrick Palka  wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?

-- >8 --

During the alias CTAD transformation, if substitution failed for some
guide we should just discard the guide silently.  We currently do
discard the guide, but not silently, which causes us to reject the
below testcase due to forming a too-large array type when transforming
the user-defined deduction guides.

This patch fixes this by passing complain=tf_none instead of
=tf_warning_or_error in the couple of spots where we expect substitution
to possibly fail and so subsequently check for error_mark_node.


Ping.  Alternatively we could set complain=tf_none everywhere.


That sounds better, unless you think there's a reason to have different
complain args for different calls.


I was initially worried about a stray error_mark_node silently leaking
into the rewritten guide signature (since we don't check for
error_mark_node after each substitution) but on second thought that
seems unlikely.  The substitution steps in alias_ctad_tweaks that
aren't checked should probably never fail, since they're just reindexing
template parameters etc.

So like so?


OK.


-- >8 --

Subject: [PATCH] c++: missing SFINAE during alias CTAD [PR115296]

PR c++/115296

gcc/cp/ChangeLog:

* pt.cc (alias_ctad_tweaks): Use complain=tf_none
instead of tf_warning_or_error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias23.C: New test.
---
  gcc/cp/pt.cc  |  2 +-
  .../g++.dg/cpp2a/class-deduction-alias23.C| 19 +++
  2 files changed, 20 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 45453c0d45a..8e9951a9066 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30298,7 +30298,7 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  (INNERMOST_TEMPLATE_PARMS (fullatparms)));
  }
  
-  tsubst_flags_t complain = tf_warning_or_error;

+  tsubst_flags_t complain = tf_none;
tree aguides = NULL_TREE;
tree atparms = INNERMOST_TEMPLATE_PARMS (fullatparms);
unsigned natparms = TREE_VEC_LENGTH (atparms);
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C
new file mode 100644
index 000..117212c67de
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias23.C
@@ -0,0 +1,19 @@
+// PR c++/115296
+// { dg-do compile { target c++20 } }
+
+using size_t = decltype(sizeof(0));
+
+template
+struct span { span(T); };
+
+template
+span(T(&)[N]) -> span; // { dg-bogus "array exceeds maximum" }
+
+template
+requires (sizeof(T[N]) != 42) // { dg-bogus "array exceeds maximum" }
+span(T*) -> span;
+
+template
+using array_view = span;
+
+array_view x = 0;
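
As general background on the language rule the fix relies on, here is a small user-level illustration of SFINAE. It is an analogy only: it does not exercise alias CTAD or the compiler-internal tf_none flag, and classify is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// Candidate 1: substitution fails for non-integral T, so this overload is
// silently dropped from the candidate set instead of producing an error.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, int>::type
classify(T) { return 1; }

// Candidate 2: fallback that is always viable.
inline int classify(...) { return 0; }
```

Calling classify with a double compiles cleanly and picks the fallback; the failed candidate leaves no diagnostic, which is the behaviour the patch restores for deduction guides whose substitution fails.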




Re: [PATCH] c++: normalizing ttp parm constraints [PR115656]

2024-07-23 Thread Jason Merrill

On 7/5/24 12:18 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14 and perhaps 13?

Alternatively we can set current_template_parms from weakly_subsumes
instead, which has only one caller anyway.


Would it also work to pass tmpl instead of NULL_TREE to 
get_normalized_constraints_from_info in weakly_subsumes?



-- >8 --

Here we normalize the constraint same_as for the first
time during constraint subsumption checking of B / TT as part of ttp
coercion.  During this normalization the set of in-scope template
parameters i.e. current_template_parms is empty, which tricks the
satisfaction cache into thinking that the satisfaction value of the
constraint is independent of its template parameters, and we incorrectly
conflate the satisfaction value with auto = bool vs auto = long and
accept the specialization A.

This patch fixes this by setting current_template_parms appropriately
during subsumption checking.

PR c++/115656

gcc/cp/ChangeLog:

* pt.cc (is_compatible_template_arg): Set current_template_parms
around the call to weakly_subsumes.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp7.C: New test.
---
  gcc/cp/pt.cc   |  4 
  gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C | 12 
  2 files changed, 16 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 017cc7fd0ab..1f6553790a5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -8493,6 +8493,10 @@ is_compatible_template_arg (tree parm, tree arg, tree 
args)
  return false;
  }
  
+  /* Normalization needs to know the effective set of in-scope

+ template parameters.  */
+  auto ctp = make_temp_override (current_template_parms,
+DECL_TEMPLATE_PARMS (arg));
return weakly_subsumes (parm_cons, arg);
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C

new file mode 100644
index 000..bc123ecf75e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
@@ -0,0 +1,12 @@
+// PR c++/115656
+// { dg-do compile { target c++20 } }
+
+template concept same_as = __is_same(T, U);
+
+template U, template> class TT>
+struct A { };
+
+template> class B;
+
+A a1;
+A a2; // { dg-error "constraint failure" }




[PATCH v2] AArch64: Add LUTI ACLE for SVE2

2024-07-23 Thread vladimir.miloserdov
Hi All,

Changes since V1: add missing MD constraints, rename intrinsics,
remove SME2 flag for LUT feature.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

This depends on "Extend aarch64_feature_flags to 128 bits" work which is soon 
to be submitted upstream as we ran out of 64-bit flags. 

The patch needs to be committed for me as I don't have commit rights.

Ok for master when the pre-requisites get committed? 

--

This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions are used for efficient table lookups with 2-bit or 4-bit
indices. LUTI2 reads indexed 8-bit or 16-bit elements from the low 128 bits of
the table vector using packed 2-bit indices, while LUTI4 can read from the low
128 or 256 bits of the table vector or from two table vectors using packed 
4-bit indices. These instructions fill the destination vector by copying 
elements indexed by segments of the source vector, selected by the vector 
segment index.

The changes include the addition of a new AArch64 option extension "lut",
__ARM_FEATURE_LUT preprocessor macro, definitions for the new LUTI instruction
shapes, and implementations of the svluti2 and svluti4 builtins.
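
The packed-index lookup described above can be illustrated with a scalar sketch. This is not the architectural definition of LUTI2: the function name is hypothetical, segment selection and vector-length details are left out, and the low-bits-first packing order is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar illustration of a table lookup with packed 2-bit indices.
// Precondition: count <= 4 * packed.size().
std::vector<std::uint8_t> toy_lookup_2bit(const std::uint8_t table[4],
                                          const std::vector<std::uint8_t> &packed,
                                          std::size_t count) {
  std::vector<std::uint8_t> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::uint8_t byte = packed[i / 4];             // four 2-bit indices per byte
    unsigned idx = (byte >> (2 * (i % 4))) & 0x3;  // assumed: low bits first
    out[i] = table[idx];
  }
  return out;
}
```

With the byte 0xE4 (indices 0, 1, 2, 3 from the low bits upwards), the output is simply the four table entries in order.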

BR,
- Vladimir

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): 
Add support for __ARM_FEATURE_LUT preprocessor macro.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): 
Add "lut" option extension.
* config/aarch64/aarch64-sve-builtins-shapes.cc (struct luti_base): 
Define new LUTI ACLE shapes.
(SHAPE): Define shapes for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-shapes.h: Add declarations 
for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-sve2.cc (class svluti_lane_impl): 
Implement support for LUTI instructions.
(FUNCTION): Register svluti2 and svluti4 functions.
* config/aarch64/aarch64-sve-builtins-sve2.def (svluti2): 
Define svluti2 function.
(svluti4): Define svluti4 function.
* config/aarch64/aarch64-sve-builtins-sve2.h: Add declarations 
for svluti2 and svluti4.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_luti): 
Define machine description patterns for LUTI.
* config/aarch64/aarch64.h (AARCH64_ISA_LUT): Define macro for LUTI.
(TARGET_LUT): Likewise.
* config/aarch64/iterators.md: Define mode iterators 
for LUTI MD patterns.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add macro for 
SVE ACLE to enable LUTI tests.
* lib/target-supports.exp: Update to include check for the LUT feature.
* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 59b2246cf8e..c1fc1955c92 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -272,6 +272,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
   aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
   aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
+  aarch64_def_or_undef (TARGET_LUT, "__ARM_FEATURE_LUT", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
  target between implementations that do or do not support ARMv8.2-A
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..e58aea09bfc 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
 AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
 
+AARCH64_OPT_EXTENSION("lut", LUT, (SVE2), (), (), "lut")
+
 #undef AARCH64_OPT_FMV_EXTENSION
 #undef AARCH64_OPT_

[PATCH v4] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-07-23 Thread Kito Cheng
This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.

Proposal in riscv-c-api-doc: 
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74

Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.
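
A caller-side check against such a structure might look like the sketch below. It uses a local mirror of the structure rather than the real __riscv_feature_bits object, toy_has_feature is a hypothetical helper, and the ZBA and V group and bit values are copied from the definitions in the patch that follows.

```cpp
#include <cassert>

// Local mirror of the structure in the patch, for illustration only; real
// code would use the __riscv_feature_bits object provided by libgcc.
struct toy_feature_bits {
  unsigned length;
  unsigned long long features[1];
};

// Group and bit values copied from the patch's ZBA and V definitions.
constexpr unsigned kZbaGroup = 0;
constexpr unsigned long long kZbaMask = 1ULL << 27;
constexpr unsigned kVGroup = 0;
constexpr unsigned long long kVMask = 1ULL << 21;

// Hypothetical helper: true if the bit is set in an in-range group.
inline bool toy_has_feature(const toy_feature_bits &fb, unsigned group,
                            unsigned long long mask) {
  return group < fb.length && (fb.features[group] & mask) != 0;
}
```

Checking the group index against length keeps the query safe if a later version of the structure grows additional groups.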

Changes since v3:
- Fix non-linux build.
- Make __init_riscv_feature_bits a constructor.

Changes since v2:
- Prevent it from being initialized more than once.

Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
  operations during the computation of the feature bit.

libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
---
 libgcc/config/riscv/feature_bits.c | 315 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 316 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c 
b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..208283e4d74
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,315 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 1
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#define ZFHMIN_BITMASK (1ULL << 36)
+#define ZICBOZ_GROUPID 0
+#define ZICBOZ_BITMASK (1ULL << 37)
+#define ZICOND_GROUPID 0
+#define ZICOND_BITMASK (1ULL << 38)
+#define ZIHINTNTL_GROUPID 0
+#define ZIHINTNTL_BITMASK (1ULL << 39)
+#define ZIHINTPAUSE_GROUPID 0
+#define ZIHINTPAUSE_BITMASK (1ULL << 40)
+#define ZKND_GROUPID 0
+#define ZKND_BITMASK (1ULL << 41)
+#define ZKNE_GROUPID 0
+#define ZKNE_BITMASK (1ULL << 42)
+#define ZKNH_GROUPID 0
+#define ZKNH_BITMASK (1ULL << 43)
+#define ZKSED_GROUPID 0
+#define ZKSED_BITMASK (1ULL << 44)
+#define ZKSH_GROUPID 0
+#define ZKSH_BITMASK (1ULL << 45)
+#define ZKT_GROUPID 0
+#define ZKT_BITMASK (1ULL << 46)
+#define ZTSO_GROUPID 0
+#define ZTSO_BITMASK (1ULL << 47)
+#define ZVBB_GROUPID 0
+#define ZVBB_BITMASK (1U
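
A minimal sketch of how a group-ID/bitmask pair like the ones above might be consulted at run time. The `FeatureBits` type, the `has_feature` helper and the constant names are hypothetical stand-ins and not part of the patch; only the layout (a length followed by 64-bit feature words) and the bit positions for V, Zba and Zbb are taken from the definitions above.

```cpp
// Hypothetical mirror of the __riscv_feature_bits layout shown in the patch.
constexpr unsigned kFeatureBitsLength = 1;
struct FeatureBits {
  unsigned length;
  unsigned long long features[kFeatureBitsLength];
};

// Bit positions copied from the patch; the constant names are illustrative.
constexpr unsigned kGroup0 = 0;
constexpr unsigned long long kVBitmask = 1ULL << 21;
constexpr unsigned long long kZbaBitmask = 1ULL << 27;
constexpr unsigned long long kZbbBitmask = 1ULL << 28;

// Return true if the feature word for GROUP exists and has every bit
// in MASK set.  Out-of-range groups are reported as "not present".
bool has_feature(const FeatureBits &bits, unsigned group,
                 unsigned long long mask) {
  if (group >= bits.length || group >= kFeatureBitsLength)
    return false;
  return (bits.features[group] & mask) == mask;
}
```

Checking the group index against the stored length first means a caller built against a newer bit list degrades to "feature absent" rather than reading past the array.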

Re: [wwwdocs] Add aarch64 11.5.0 caveat

2024-07-23 Thread Richard Earnshaw (lists)

On 23/07/2024 11:39, Richard Biener wrote:

On Tue, 23 Jul 2024, Jakub Jelinek wrote:


Hi!

Richi suggested to mention the PR116029 bug in 11.5.0 caveats as we can't
fix that anymore.

Here is a patch for that, which attempts to describe (my limited
understanding of) the issue.
As TARGET_CPU_generic is now 64, when config.gcc doesn't set
TARGET_CPU_DEFAULT, we end up with TARGET_CPU_DEFAULT
(64 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
which is treated as I think
TARGET_CPU_cortexa34 | ((AARCH64_CPU_DEFAULT_FLAGS | AARCH64_FL_SIMD) << 6))
Ok for wwwdocs?
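
The arithmetic behind that observation can be sketched as follows. This is only a model under stated assumptions: the helper names are hypothetical, the 6-bit CPU field is inferred from the `<< 6` in the expression above, and the reading that CPU index 0 is cortex-a34 and flag bit 0 is SIMD is what the message suggests rather than something verified here.

```cpp
// Hypothetical model of the packed TARGET_CPU_DEFAULT value: the low
// 6 bits hold a CPU index and the remaining bits hold ISA flags.
constexpr unsigned kCpuFieldBits = 6;
constexpr unsigned long long kCpuFieldMask = (1ULL << kCpuFieldBits) - 1;

// Pack a CPU index and a flag word the way the quoted expression does.
constexpr unsigned long long pack_default(unsigned long long cpu,
                                          unsigned long long flags) {
  return cpu | (flags << kCpuFieldBits);
}

// Recover the CPU index from the low 6 bits.
constexpr unsigned long long unpack_cpu(unsigned long long packed) {
  return packed & kCpuFieldMask;
}

// Recover the flag word from the bits above the CPU field.
constexpr unsigned long long unpack_flags(unsigned long long packed) {
  return packed >> kCpuFieldBits;
}
```

With a CPU index of 64 the value no longer fits in the 6-bit field: the index reads back as 0 and the overflowing bit lands in flag bit 0, so an input flag word of 0x8 reads back as 0x9.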


OK, but please give arm folks the chance to review this.

Richard.


diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index e010cd08..26189772 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -1164,5 +1164,20 @@ known to be fixed in the 11.5 release. This list might not be
  complete (that is, it is possible that some PRs that have been fixed
  are not listed here).
  
+Caveats

+
+aarch64
+
+  
Due to a late introduced bug if the compiler is configured without
explicit --with-arch=, --with-cpu= and/or
--with-tune= configure options the compiler without
explicit -march= etc. options might act as if asked
for cortex-a34.  This can be fixed by manually applying the
r12-8060 commit (https://gcc.gnu.org/r12-8060) on top
of GCC 11.5.0.  See PR116029 (https://gcc.gnu.org/PR116029)
for more details.


Possibly add that 11.4 and earlier are not affected?  'late' is not very 
specific as to when the bug appeared and people using earlier versions 
might see this and be confused.


Otherwise, LGTM.

R.


+  
+
  
  

Jakub








Re: [PATCH v2] AArch64: Add LUTI ACLE for SVE2

2024-07-23 Thread Kyrylo Tkachov
Hi Vladimir,

Thanks for the update, this patch looks much better now.
Some more comments inline.

> On 23 Jul 2024, at 14:47, vladimir.miloser...@arm.com wrote:
> 
> Hi All,
> 
> Changes since V1: add missing MD constraints, rename intrinsics,
> remove SME2 flag for LUT feature.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> This depends on "Extend aarch64_feature_flags to 128 bits" work which is soon
> to be submitted upstream as we ran out of 64-bit flags.
> 
> The patch needs to be committed for me as I don't have commit rights.
> 
> Ok for master when the pre-requisites get committed?
> 
> --
> 
> This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
> 
> LUTI instructions are used for efficient table lookups with 2-bit or 4-bit
> indices. LUTI2 reads indexed 8-bit or 16-bit elements from the low 128 bits of
> the table vector using packed 2-bit indices, while LUTI4 can read from the low
> 128 or 256 bits of the table vector or from two table vectors using packed
> 4-bit indices. These instructions fill the destination vector by copying
> elements indexed by segments of the source vector, selected by the vector
> segment index.
> 
> The changes include the addition of a new AArch64 option extension "lut",
> __ARM_FEATURE_LUT preprocessor macro, definitions for the new LUTI instruction
> shapes, and implementations of the svluti2 and svluti4 builtins.
> 
> BR,
> - Vladimir
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
>Add support for __ARM_FEATURE_LUT preprocessor macro.
>* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
>Add "lut" option extension.
>* config/aarch64/aarch64-sve-builtins-shapes.cc (struct luti_base):
>Define new LUTI ACLE shapes.
>(SHAPE): Define shapes for luti2 and luti4.
>* config/aarch64/aarch64-sve-builtins-shapes.h: Add declarations
>for luti2 and luti4.
>* config/aarch64/aarch64-sve-builtins-sve2.cc (class svluti_lane_impl):
>Implement support for LUTI instructions.
>(FUNCTION): Register svluti2 and svluti4 functions.
>* config/aarch64/aarch64-sve-builtins-sve2.def (svluti2):
>Define svluti2 function.
>(svluti4): Define svluti4 function.
>* config/aarch64/aarch64-sve-builtins-sve2.h: Add declarations
>for svluti2 and svluti4.
>* config/aarch64/aarch64-sve2.md (@aarch64_sve_luti):
>Define machine description patterns for LUTI.
>* config/aarch64/aarch64.h (AARCH64_ISA_LUT): Define macro for LUTI.
>(TARGET_LUT): Likewise.
>* config/aarch64/iterators.md: Define mode iterators
>for LUTI MD patterns.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add macro for
>SVE ACLE to enable LUTI tests.
>* lib/target-supports.exp: Update to include check for the LUT feature.
>* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
>* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.
> 
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 59b2246cf8e..c1fc1955c92 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -272,6 +272,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
+ aarch64_def_or_undef (TARGET_LUT, "__ARM_FEATURE_LUT", pfile);

From reading the ACLE pull request:
https://github.com/ARM-software/acle/pull/324/files
It looks like __ARM_FEATURE_LUT should guard the Advanced SIMD intrinsics for 
LUTI at least.
Therefore we should add this macro definition only once those are implemented 
as well. So I’d remove this hunk.
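
To illustrate why the timing of the macro definition matters, here is a hedged sketch of how user code typically keys off such a feature macro. The function name is hypothetical; the only identifier taken from the thread is `__ARM_FEATURE_LUT`.

```cpp
// Returns true only when the compiler advertises the LUT feature macro.
// User code would normally place its intrinsic-based fast path in the
// #ifdef branch and a portable fallback in the #else branch.
bool lut_intrinsics_advertised() {
#ifdef __ARM_FEATURE_LUT
  return true;
#else
  return false;
#endif
}
```

If the macro were defined before every intrinsic it guards is implemented, code structured like this would select the fast path and then fail to compile.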

diff --git a/gcc/config/aar

Re: [PATCH] regrename: Skip renaming register pairs [PR115860]

2024-07-23 Thread Jeff Law




On 7/22/24 9:16 PM, Andrew Pinski wrote:


It is interesting how there is a subreg of a hard register after reload
showing up here. Is that on purpose?
In general subregs of hard regs shouldn't exist after allocation.  There 
are just a few exceptions to that rule.  I don't remember where the code 
is, but there's a pass over all the insns after reloading which should 
have removed them.




They come from:
```
(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
        (subreg:DF (match_operand:TF 1 "general_operand" "v") 0))]
...
(define_insn "*tf_to_fprx2_1"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
        (subreg:DF (match_operand:TF 1 "general_operand" "v") 8))]
```
This kind of stuff may inhibit the elimination of hard register subregs 
since after removing the subreg these patterns probably won't match anymore.


Jeff


Re: [PATCH 1/3] aarch64: Add march flags for +fp8 arch extensions

2024-07-23 Thread Claudio Bantaloukas


On 22/07/2024 10:45, Kyrylo Tkachov wrote:
> Hi Claudio,
> Thanks for working on this.
> 
>> On 22 Jul 2024, at 11:30, Claudio Bantaloukas wrote:
>>
>> This introduces the relevant flags to enable access to the fpmr register and 
>> fp8 intrinsics, which will be added subsequently.
>>
>> The `+fp8' -march modifier defines the __ARM_FEATURE_FP8 macro to 1.
> 
> These __ARM_FEATURE* macros are expected to be used for the presence of 
> intrinsics that the developer would want to use in their code.
> Thus, it should be defined only when the relevant intrinsics are implemented, 
> not at the beginning of the series.
> I’m guessing you, or someone else, have patches in the works to support the 
> full set of FP8 intrinsics.
> The definition of __ARM_FEATURE_FP8 should go in at the end once the 
> intrinsics have been implemented.
> 

Hi Kyrill, will address this in the next patch iteration.

Thanks
Claudio

> The rest of the patch looks reasonable.
> Thanks,
> Kyrill
> 
> 
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-c.cc (__ARM_FEATURE_FP8): New.
>> * config/aarch64/aarch64-option-extensions.def (fp8): Likewise.
>> (the): Likewise.
>> * config/aarch64/aarch64.h (TARGET_FP8): Likewise.
>> * doc/invoke.texi (AArch64 Options): Document new -march flags
>> and extensions.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/acle/fp8.c: New test.
>> ---
>> gcc/config/aarch64/aarch64-c.cc   |  1 +
>> .../aarch64/aarch64-option-extensions.def |  2 ++
>> gcc/config/aarch64/aarch64.h  |  3 +++
>> gcc/doc/invoke.texi   |  2 ++
>> gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 19 +++
>> 5 files changed, 27 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-c.cc 
>> b/gcc/config/aarch64/aarch64-c.cc
>> index f9b9e379375..592e71d8404 100644
>> --- a/gcc/config/aarch64/aarch64-c.cc
>> +++ b/gcc/config/aarch64/aarch64-c.cc
>> @@ -276,6 +276,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>>cpp_undef (pfile, "__FLT_EVAL_METHOD_C99__");
>>builtin_define_with_int_value ("__FLT_EVAL_METHOD_C99__",
>> c_flt_eval_method (false));
>> +  aarch64_def_or_undef (TARGET_FP8, "__ARM_FEATURE_FP8", pfile);
>> }
>>
>> /* Implement TARGET_CPU_CPP_BUILTINS.  */
>> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
>> b/gcc/config/aarch64/aarch64-option-extensions.def
>> index 42ec0eec31e..6998627f377 100644
>> --- a/gcc/config/aarch64/aarch64-option-extensions.def
>> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
>> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
>>
>> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
>>
>> +AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
>> +
>> #undef AARCH64_OPT_FMV_EXTENSION
>> #undef AARCH64_OPT_EXTENSION
>> #undef AARCH64_FMV_FEATURE
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index 8056c337957..40793aab814 100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -462,6 +462,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE 
>> ATTRIBUTE_UNUSED
>> && (aarch64_tune_params.extra_tuning_flags \
>>  & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
>>
>> +/* fp8 instructions are enabled through +fp8.  */
>> +#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
>> +
>> /* Standard register usage.  */
>>
>> /* 31 64-bit general purpose registers R0-R30:
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 4850c7379bf..46dee3e4521 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21777,6 +21777,8 @@ Enable support for Armv9.4-a Guarded Control Stack 
>> extension.
>> Enable support for Armv8.9-a/9.4-a translation hardening extension.
>> @item rcpc3
>> Enable the RCpc3 (Release Consistency) extension.
>> +@item fp8
>> +Enable the fp8 (8-bit floating point) extension.
>>
>> @end table
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c 
>> b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> new file mode 100644
>> index 000..b774f28c9f0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> @@ -0,0 +1,19 @@
>> +/* Test the fp8 ACLE intrinsics family.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -march=armv8-a" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#ifdef __ARM_FEATURE_FP8
>> +#error "__ARM_FEATURE_FP8 feature macro defined."
>> +#endif
>> +
>> +#pragma GCC push_options
>> +#pragma GCC target ("arch=armv9.4-a+fp8")
>> +
>> +#include 
>> +
>> +#ifndef __ARM_FEATURE_FP8
>> +#error "__ARM_FEATURE_FP8 feature macro not defined."
>> +#endif
>> +
>> +#pragma GCC pop_options
> 

Re: [PATCH] c++: normalizing ttp parm constraints [PR115656]

2024-07-23 Thread Patrick Palka
On Tue, 23 Jul 2024, Jason Merrill wrote:

> On 7/5/24 12:18 PM, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for trunk/14 and perhaps 13?
> > 
> > Alternatively we can set current_template_parms from weakly_subsumes
> > instead, who has only one caller anyway.
> 
> Would it also work to pass tmpl instead of NULL_TREE to
> get_normalized_constraints_from_info in weakly_subsumes?

That seems to work nicely too since ci has been rewritten in terms of
tmpl's constraints by the caller.  I briefly considered that earlier but
confused myself into thinking that wouldn't do the right thing.

(I might've been looking instead at strictly_subsumes where when called
from process_partial_specialization we have two sets of parameters
involved: tmpl's parms for the primary template and current_template_parms
for the partial specialization, so we can't pass in_decl=tmpl there.
Luckily current_template_parms is always properly set there though.)

Like so?  Bootstrap and regtest in progress.

-- >8 --

Subject: [PATCH] c++: normalizing ttp constraints [PR115656]

Here we normalize the constraint same_as for the first
time during constraint subsumption checking of B / UU as part of ttp
coercion.  During this normalization the set of in-scope template
parameters i.e. current_template_parms is empty, which tricks the
satisfaction cache into thinking that the satisfaction value of the
constraint is independent of its template parameters, and we incorrectly
conflate the satisfaction value with T = bool vs T = long and accept the
specialization A.

Since is_compatible_template_arg is the only caller of weakly_subsumes
which rewrote the ttp's constraints ('ci') in terms of those of the
argument template ('tmpl'), we can in turn normalize the ttp's
constraints relative to tmpl's parameters rather than relying on
current_template_parms by passing in_decl=tmpl from weakly_subsumes.

PR c++/115656

gcc/cp/ChangeLog:

* constraint.cc (weakly_subsumes): Pass in_decl=tmpl to
get_normalized_constraints_from_info.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp7.C: New test.
---
 gcc/cp/constraint.cc   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 5472cc51b8a..d84094762f7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3334,7 +3334,7 @@ strictly_subsumes (tree ci, tree tmpl)
 bool
 weakly_subsumes (tree ci, tree tmpl)
 {
-  tree n1 = get_normalized_constraints_from_info (ci, NULL_TREE);
+  tree n1 = get_normalized_constraints_from_info (ci, tmpl);
   tree n2 = get_normalized_constraints_from_decl (tmpl);
 
   return subsumes (n1, n2);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
new file mode 100644
index 000..2ce884b995c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
@@ -0,0 +1,12 @@
+// PR c++/115656
+// { dg-do compile { target c++20 } }
+
+template concept same_as = __is_same(T, U);
+
+template T, template> class UU>
+struct A { };
+
+template> class B;
+
+A a1;
+A a2; // { dg-error "constraint failure" }
-- 
2.46.0.rc0.106.g1c4a234a1c


> 
> > -- >8 --
> > 
> > Here we normalize the constraint same_as for the first
> > time during constraint subsumption checking of B / TT as part of ttp
> > coercion.  During this normalization the set of in-scope template
> > parameters i.e. current_template_parms is empty, which tricks the
> > satisfaction cache into thinking that the satisfaction value of the
> > constraint is independent of its template parameters, and we incorrectly
> > conflate the satisfaction value with auto = bool vs auto = long and
> > accept the specialization A.
> > 
> > This patch fixes this by setting current_template_parms appropriately
> > during subsumption checking.
> > 
> > PR c++/115656
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (is_compatible_template_arg): Set current_template_parms
> > around the call to weakly_subsumes.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-ttp7.C: New test.
> > ---
> >   gcc/cp/pt.cc   |  4 
> >   gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C | 12 
> >   2 files changed, 16 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 017cc7fd0ab..1f6553790a5 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -8493,6 +8493,10 @@ is_compatible_template_arg (tree parm, tree arg, tree
> > args)
> >   return false;
> >   }
> >   +  /* Normalization needs to know the effective set of in-scope
> > + template parameters.  */
> > +  auto ctp = make_temp_override (current_template_parms,
> > +   

Re: [PATCH] MATCH: add abs support for half float

2024-07-23 Thread Richard Biener
On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
 wrote:
>
> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  wrote:
> >
> > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > Revised based on the comments and moved it into the existing patterns.
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> >
> > The testcase needs to make sure it runs only for targets that support
> > float16 so like:
> >
> > /* { dg-require-effective-target float16 } */
> > /* { dg-add-options float16 } */
> Added in the attached version.

+ /* (type)A >=/> 0 ? A : -A    same as abs (A) */
  (for cmp (ge gt)
   (simplify
-   (cnd (cmp @0 zerop) @1 (negate @1))
-(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
-&& !TYPE_UNSIGNED (TREE_TYPE(@0))
-&& bitwise_equal_p (@0, @1))
+   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
+(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
+&& !TYPE_UNSIGNED (TREE_TYPE (@1))
+&& ((VECTOR_TYPE_P (type)
+ && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@1)))
+   || (!VECTOR_TYPE_P (type)
+   && (TYPE_PRECISION (TREE_TYPE (@1))
+   <= TYPE_PRECISION (TREE_TYPE (@0)
+&& bitwise_equal_p (@1, @2))

I wonder about the bitwise_equal_p which tests @1 against @2 now
with the convert still applied to @1 - that looks odd.  You are allowing
sign-changing conversions but doesn't that change ge/gt behavior?
Also why are sign/zero-extensions not OK for vector types?

+  (absu:type @1)
+  (abs @1)

I think this should use @2 now.
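
For readers following along, the source-level shape being discussed can be sketched like this. It is an illustration only: it uses `float` widened to `double` rather than a half-float type, and the function names are hypothetical.

```cpp
#include <cmath>

// The comparison is done on a widened copy of A while the selected
// values use A itself, which is the "(type)A >= 0 ? A : -A" shape.
float abs_via_select(float a) {
  return (static_cast<double>(a) >= 0.0) ? a : -a;
}

// True when the select form agrees with fabs in both value and sign bit.
bool same_bits_as_fabs(float a) {
  return abs_via_select(a) == std::fabs(a)
         && std::signbit(abs_via_select(a)) == std::signbit(std::fabs(a));
}
```

For `-0.0f` the comparison `-0.0 >= 0.0` is true, so the select form returns `-0.0f` while `fabs` returns `+0.0f`; this is why the pattern is guarded by `!HONOR_SIGNED_ZEROS`.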

> Thanks.
> Kugan
> >
> > (like what is in gcc.dg/c11-floatn-3.c and others).
> >
> > Other than that it looks good but I can't approve it.
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > Signed-off-by: Kugan Vivekanandarajah 
> > >
> > > Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?
> > > Thanks,
> > > Kugan
> > >
> > > 
> > > From: Andrew Pinski 
> > > Sent: Monday, 15 July 2024 5:30 AM
> > > To: Kugan Vivekanandarajah 
> > > Cc: gcc-patches@gcc.gnu.org ; 
> > > richard.guent...@gmail.com 
> > > Subject: Re: [PATCH] MATCH: add abs support for half float
> > >
> > > On Sun, Jul 14, 2024 at 1:12 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > This patch extends abs detection in match.pd for half float.
> > > >
> > > > Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?
> > >
> > > This is basically this pattern:
> > > ```
> > >  /* A >=/> 0 ? A : -A    same as abs (A) */
> > >  (for cmp (ge gt)
> > >   (simplify
> > >(cnd (cmp @0 zerop) @1 (negate @1))
> > > (if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > >  && !TYPE_UNSIGNED (TREE_TYPE(@0))
> > >  && bitwise_equal_p (@0, @1))
> > >  (if (TYPE_UNSIGNED (type))
> > >   (absu:type @0)
> > >   (abs @0)
> > > ```
> > >
> > > except extended to handle an optional convert. Why didn't you just
> > > extend the above pattern to handle the convert instead? Also I think
> > > you have an issue with unsigned types with the comparison.
> > > Also you should extend the -abs(A) pattern right below it in a similar 
> > > fashion.
> > >
> > > Thanks,
> > > Andrew Pinski
> > >
> > >
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd: Add pattern to convert (type)A >=/> 0 ? A : -A into abs (A) 
> > > > for half float.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > > >
> > > > Signed-off-by: Kugan Vivekanandarajah 
> > > >


Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Richard Biener
On Fri, Jul 19, 2024 at 1:10 PM  wrote:
>
> From: Pan Li 
>
> direct_internal_fn_supported_p places no restrictions on the type
> modes.  For example, the bit-field store below is recognized as .SAT_TRUNC.
>
> struct e
> {
>   unsigned pre : 12;
>   unsigned a : 4;
> };
>
> __attribute__((noipa))
> void bug (e * v, unsigned def, unsigned use) {
>   e & defE = *v;
>   defE.a = min_u (use + 1, 0xf);
> }
>
> This patch makes direct_internal_fn_supported_p check strictly: an
> internal function is only supported when each type in the ifn type pair
> matches its mode.
>
> The following test suites pass with this patch:
> 1. The rv64gcv full regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 full regression tests.

LGTM unless Richard S. has any more comments.

Richard.
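
As a rough illustration of why the type's precision matters here, the sketch below contrasts saturating at the 4-bit field maximum with saturating at a wider 8-bit maximum and then storing into the field. The second function is a hypothetical model of the failure mode, assuming the field's mode is 8 bits wide; it is not the code GCC actually generated.

```cpp
struct e {
  unsigned pre : 12;
  unsigned a : 4;
};

static unsigned min_u(unsigned a, unsigned b) { return (b < a) ? b : a; }

// Intended behaviour: clamp to the largest value the 4-bit field holds.
unsigned store_saturated_4bit(unsigned use) {
  e v = {0xded, 3};
  v.a = min_u(use + 1, 0xf);
  return v.a;
}

// Hypothetical model of the bug: clamp to an 8-bit maximum instead, so
// the store into the 4-bit field silently keeps only the low 4 bits.
unsigned store_saturated_8bit_then_truncate(unsigned use) {
  e v = {0xded, 3};
  v.a = min_u(use + 1, 0xff);
  return v.a;
}
```

For `use = 33` the first function yields 0xf, which is what the test case in the patch checks for, while the modelled wrong version yields 34 & 0xf = 2.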

> PR target/115961
>
> gcc/ChangeLog:
>
> * internal-fn.cc (type_strictly_matches_mode_p): Add new func
> impl to check type strictly matches mode or not.
> (type_pair_strictly_matches_mode_p): Ditto but for tree type
> pair.
> (direct_internal_fn_supported_p): Add above check for the tree
> type pair.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr115961-run-1.C: New test.
> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc| 32 +
>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>  3 files changed, 100 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 95946bfd683..5c21249318e 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn)
>gcc_unreachable ();
>  }
>
> +/* Return true if TYPE's mode has the same format as TYPE, and if there is
> +   a 1:1 correspondence between the values that the mode can store and the
> +   values that the type can store.  */
> +
> +static bool
> +type_strictly_matches_mode_p (const_tree type)
> +{
> +  if (VECTOR_TYPE_P (type))
> +return VECTOR_MODE_P (TYPE_MODE (type));
> +
> +  if (INTEGRAL_TYPE_P (type))
> +return type_has_mode_precision_p (type);
> +
> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return true if both the first and the second type of tree pair are
> +   strictly matches their modes,  or return false.  */
> +
> +static bool
> +type_pair_strictly_matches_mode_p (tree_pair type_pair)
> +{
> +  return type_strictly_matches_mode_p (type_pair.first)
> +&& type_strictly_matches_mode_p (type_pair.second);
> +}
> +
>  /* Return true if FN is supported for the types in TYPES when the
> optimization type is OPT_TYPE.  The types are those associated with
> the "type0" and "type1" fields of FN's direct_internal_fn_info
> @@ -4173,6 +4202,9 @@ bool
>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
> optimization_type opt_type)
>  {
> +  if (!type_pair_strictly_matches_mode_p (types))
> +return false;
> +
>switch (fn)
>  {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE = *v;
> +  defE.a = min_u (use + 1, 0xf);
> +}
> +
> +__attribute__((noipa, optimize(0)))
> +int main(void)
> +{
> +  e v = { 0xded, 3 };
> +
> +  bug(&v, 32, 33);
> +
> +  if (v.a != 0xf)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE = *v;
>

Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Jul 19, 2024 at 1:10 PM  wrote:
>>
>> From: Pan Li 
>>
>> direct_internal_fn_supported_p places no restrictions on the type
>> modes.  For example, the bit-field store below is recognized as .SAT_TRUNC.
>>
>> struct e
>> {
>>   unsigned pre : 12;
>>   unsigned a : 4;
>> };
>>
>> __attribute__((noipa))
>> void bug (e * v, unsigned def, unsigned use) {
>>   e & defE = *v;
>>   defE.a = min_u (use + 1, 0xf);
>> }
>>
>> This patch makes direct_internal_fn_supported_p check strictly: an
>> internal function is only supported when each type in the ifn type pair
>> matches its mode.
>>
>> The following test suites pass with this patch:
>> 1. The rv64gcv full regression tests.
>> 2. The x86 bootstrap tests.
>> 3. The x86 full regression tests.
>
> LGTM unless Richard S. has any more comments.

LGTM too with Andrew's comments addressed.

Thanks,
Richard

>
> Richard.
>
>> PR target/115961
>>
>> gcc/ChangeLog:
>>
>> * internal-fn.cc (type_strictly_matches_mode_p): Add new func
>> impl to check type strictly matches mode or not.
>> (type_pair_strictly_matches_mode_p): Ditto but for tree type
>> pair.
>> (direct_internal_fn_supported_p): Add above check for the tree
>> type pair.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/i386/pr115961-run-1.C: New test.
>> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>>
>> Signed-off-by: Pan Li 
>> ---
>>  gcc/internal-fn.cc| 32 +
>>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>>  3 files changed, 100 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>>
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 95946bfd683..5c21249318e 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn)
>>gcc_unreachable ();
>>  }
>>
>> +/* Return true if TYPE's mode has the same format as TYPE, and if there is
>> +   a 1:1 correspondence between the values that the mode can store and the
>> +   values that the type can store.  */
>> +
>> +static bool
>> +type_strictly_matches_mode_p (const_tree type)
>> +{
>> +  if (VECTOR_TYPE_P (type))
>> +return VECTOR_MODE_P (TYPE_MODE (type));
>> +
>> +  if (INTEGRAL_TYPE_P (type))
>> +return type_has_mode_precision_p (type);
>> +
>> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return true if both the first and the second type of tree pair are
>> +   strictly matches their modes,  or return false.  */
>> +
>> +static bool
>> +type_pair_strictly_matches_mode_p (tree_pair type_pair)
>> +{
>> +  return type_strictly_matches_mode_p (type_pair.first)
>> +&& type_strictly_matches_mode_p (type_pair.second);
>> +}
>> +
>>  /* Return true if FN is supported for the types in TYPES when the
>> optimization type is OPT_TYPE.  The types are those associated with
>> the "type0" and "type1" fields of FN's direct_internal_fn_info
>> @@ -4173,6 +4202,9 @@ bool
>>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>> optimization_type opt_type)
>>  {
>> +  if (!type_pair_strictly_matches_mode_p (types))
>> +return false;
>> +
>>switch (fn)
>>  {
>>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
>> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
>> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
>> new file mode 100644
>> index 000..b8c8aef3b17
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
>> @@ -0,0 +1,34 @@
>> +/* PR target/115961 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
>> +
>> +struct e
>> +{
>> +  unsigned pre : 12;
>> +  unsigned a : 4;
>> +};
>> +
>> +static unsigned min_u (unsigned a, unsigned b)
>> +{
>> +  return (b < a) ? b : a;
>> +}
>> +
>> +__attribute__((noipa))
>> +void bug (e * v, unsigned def, unsigned use) {
>> +  e & defE = *v;
>> +  defE.a = min_u (use + 1, 0xf);
>> +}
>> +
>> +__attribute__((noipa, optimize(0)))
>> +int main(void)
>> +{
>> +  e v = { 0xded, 3 };
>> +
>> +  bug(&v, 32, 33);
>> +
>> +  if (v.a != 0xf)
>> +__builtin_abort ();
>> +
>> +  return 0;
>> +}
>> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
>> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C 
>> b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>> new file mode 100644
>> index 000..b8c8aef3b17
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>> @@ -0,0 +1,34 @@
>> +/* PR target/115961 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
>> +
>> +struct e
>

[PATCH] doc: Document -O1 as the preferred level for large machine-generated code

2024-07-23 Thread Sam James
At -O1, the intention is that we compile things in a "reasonable" amount
of time (ditto memory use). In particular, we try to especially avoid
optimizations which scale poorly on pathological cases, as is the case
for large machine-generated code.

Recommend -O1 for large machine-generated code, as has been informally
done on bugs for a while now.

This applies (broadly speaking) both to large machine-generated functions
and, to a lesser extent, to repetitive small-but-still-not-tiny functions
from a generator program.

gcc/ChangeLog:
PR middle-end/114855
* doc/invoke.texi (Optimize options): Mention machine-generated
code for -O1.
---
richi, does this accurately reflect the discussion we had on IRC a little
while ago?

Please push if OK, thanks.

 gcc/doc/invoke.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e0a641213ae4..9fb0925ed292 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12560,6 +12560,11 @@ With @option{-O}, the compiler tries to reduce code 
size and execution
 time, without performing any optimizations that take a great deal of
 compilation time.
 
+@option{-O} is the recommended optimization level for large machine-generated
+code as a sensible balance between time taken to compile and memory use:
+higher optimization levels perform optimizations with greater algorithmic
+complexity than at @option{-O}.
+
 @c Note that in addition to the default_options_table list in opts.cc,
 @c several optimization flags default to true but control optimization
 @c passes that are explicitly disabled at -O0.

-- 
2.45.2



[pushed] cp/coroutines: add a test for PR c++/103953

2024-07-23 Thread Arsen Arsenović
This PR seems to have been fixed by a fix for a seemingly unrelated PR.
Let's add a regression test to make sure it stays fixed.

PR c++/103953 - Leak of coroutine return object

PR c++/103953

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/torture/pr103953.C: New test.

Reviewed-by: Iain Sandoe 
---
Pushed as obvious.

 .../g++.dg/coroutines/torture/pr103953.C  | 75 +++
 1 file changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/torture/pr103953.C

diff --git a/gcc/testsuite/g++.dg/coroutines/torture/pr103953.C 
b/gcc/testsuite/g++.dg/coroutines/torture/pr103953.C
new file mode 100644
index 000..da559f8fa0d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/torture/pr103953.C
@@ -0,0 +1,75 @@
+// { dg-do run }
+// https://gcc.gnu.org/PR103953
+#include <coroutine>
+#include <utility>
+
+static int ctor_dtor_count = 0;
+
+struct task {
+struct promise_type;
+
+using handle_type = std::coroutine_handle<promise_type>;
+
+task(handle_type h) : handle(h) {
+ctor_dtor_count++;
+}
+task(const task & t) : handle(t.handle) {
+ctor_dtor_count++;
+}
+task(task && t) : handle(std::move(t.handle)) {
+ctor_dtor_count++;
+}
+~task() {
+   if (--ctor_dtor_count < 0)
+   __builtin_abort ();
+}
+
+struct promise_type {
+auto get_return_object() {
+return task{handle_type::from_promise(*this)};
+}
+
+auto initial_suspend() {
+return std::suspend_always {};
+}
+
+auto unhandled_exception() {}
+
+auto final_suspend() noexcept {
+return std::suspend_always{};
+}
+
+void return_void() {}
+};
+
+   handle_type handle;
+
+   void await_resume() {
+   handle.resume();
+   }
+
+   auto await_suspend(handle_type) {
+   return handle;
+   }
+
+   auto await_ready() {
+   return false;
+   }
+};
+
+int main() {
+{
+   task coroutine_A = []() ->task {
+   co_return;
+   }();
+
+   task coroutine_B = [&coroutine_A]() ->task {
+   co_await coroutine_A;
+   }();
+
+   coroutine_B.handle.resume();
+}
+
+if (ctor_dtor_count != 0)
+   __builtin_abort ();
+}
-- 
2.45.2



RE: [PATCH][contrib]: support json output from check_GNU_style_lib.py

2024-07-23 Thread Tamar Christina
Hi Both,

> -----Original Message-----
> From: Jonathan Wakely 
> Sent: Monday, July 22, 2024 3:21 PM
> To: Filip Kastl 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd
> 
> Subject: Re: [PATCH][contrib]: support json output from check_GNU_style_lib.py
> 
> On Mon, 22 Jul 2024 at 14:54, Filip Kastl  wrote:
> >
> > Hi Tamar,
> >
> > I wanted to try reviewing a patch and this seemed simple enough, so I gave
> > it a shot.  Hopefully this saves some time for the maintainer who
> > eventually approves this :).

Thanks for the review! :)

> >
> > I don't see any bug in the code.  I also tried running it on my own input
> > and the output was correct.  So functionally the patch is good.  I have
> > some remarks about the style though:
> >
> > - You sometimes put a space between function name and parentheses.  This
> >   doesn't look python-ish and isn't consistent in the file.
> 
> It's the GNU C convention, but I really wish we didn't use it for
> Python code too.
> 
> > - There's one very long line (check_GNU_style_lib.py:335).  I would shorten
> >   it so it is at most 79 characters long.
> > - On the same line, there is a space after { but not before }.  For
> >   consistency, I would erase the space after {
> > - On the same line there are spaces after :.  I think a more python-ish way
> >   would be not to have those spaces there.  Here I'm maybe being too
> >   pedantic so feel free to ignore this.  I think it will look nice either way.
> >
> > To summarize the last 3 points, I would replace this
> >
> > errlines.append({ "file" : locs[0], "row" : locs[1], "column" : locs[2], "err" : e.console_error})

How about this formatting?  I find it a bit easier to read.
I also changed the location values to be numeric, so the quotes around them
are gone.

Ok for master?

Thanks,
Tamar

contrib/ChangeLog:

* check_GNU_style.py: Add json format.
* check_GNU_style_lib.py: Likewise.

-- inline copy of patch --

diff --git a/contrib/check_GNU_style.py b/contrib/check_GNU_style.py
index 6b946a5bc3610b8ef70ba372ea800f892eeac85b..0890947f1f9b60c37ff62e23007c3a0735fd9c14 100755
--- a/contrib/check_GNU_style.py
+++ b/contrib/check_GNU_style.py
@@ -31,7 +31,7 @@ def main():
 parser.add_argument('file', help = 'File with a patch')
 parser.add_argument('-f', '--format', default = 'stdio',
 help = 'Display format',
-choices = ['stdio', 'quickfix'])
+choices = ['stdio', 'quickfix', 'json'])
 args = parser.parse_args()
 filename = args.file
 format = args.format
diff --git a/contrib/check_GNU_style_lib.py b/contrib/check_GNU_style_lib.py
index 6dbe4b53559c63d2e0276d0ff88619cd2f7f8e06..ab21ed4607593668ab95f24715295a41ac7d8a21 100755
--- a/contrib/check_GNU_style_lib.py
+++ b/contrib/check_GNU_style_lib.py
@@ -29,6 +29,7 @@
 import sys
 import re
 import unittest
+import json
 
 def import_pip3(*args):
 missing=[]
@@ -317,6 +318,33 @@ def check_GNU_style_file(file, format):
 else:
 print('%d error(s) written to %s file.' % (len(errors), f))
 exit(1)
+elif format == 'json':
+fn = lambda x: x.error_message
+i = 1
+result = []
+for (k, errors) in groupby(sorted(errors, key = fn), fn):
+errors = list(errors)
+entry = {}
+entry['type'] = i
+entry['msg'] = k
+entry['count'] = len(errors)
+i += 1
+errlines = []
+for e in errors:
+locs = e.error_location ().split(':')
+errlines.append({ "file": locs[0]
+, "row": int(locs[1])
+, "column": int(locs[2])
+, "err": e.console_error })
+entry['errors'] = errlines
+result.append(entry)
+
+if len(errors) == 0:
+exit(0)
+else:
+json_string = json.dumps(result)
+print(json_string)
+exit(1)
 else:
 assert False





Re: [PATCH] c++: normalizing ttp parm constraints [PR115656]

2024-07-23 Thread Jason Merrill

On 7/23/24 9:37 AM, Patrick Palka wrote:

On Tue, 23 Jul 2024, Jason Merrill wrote:


On 7/5/24 12:18 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14 and perhaps 13?

Alternatively we can set current_template_parms from weakly_subsumes
instead, which has only one caller anyway.


Would it also work to pass tmpl instead of NULL_TREE to
get_normalized_constraints_from_info in weakly_subsumes?


That seems to work nicely too since ci has been rewritten in terms of
tmpl's constraints by the caller.  I briefly considered that earlier but
confused myself into thinking that wouldn't do the right thing.

(I might've been looking instead at strictly_subsumes where when called
from process_partial_specialization we have two sets of parameters
involved: tmpl's parms for the primary template and current_template_parms
for the partial specialization, so we can't pass in_decl=tmpl there.
Luckily current_template_parms is always properly set there though.)

Like so?  Bootstrap and regtest in progress.


Let's also rename weakly_subsumes to something like ttp_subsumes so it's 
clearer that it's just used for this case.  OK with that adjustment.



-- >8 --

Subject: [PATCH] c++: normalizing ttp constraints [PR115656]

Here we normalize the constraint same_as for the first
time during constraint subsumption checking of B / UU as part of ttp
coercion.  During this normalization the set of in-scope template
parameters i.e. current_template_parms is empty, which tricks the
satisfaction cache into thinking that the satisfaction value of the
constraint is independent of its template parameters, and we incorrectly
conflate the satisfaction value with T = bool vs T = long and accept the
specialization A.

Since is_compatible_template_arg is the only caller of weakly_subsumes
which rewrote the ttp's constraints ('ci') in terms of those of the
argument template ('tmpl'), we can in turn normalize the ttp's
constraints relative to tmpl's parameters rather than relying on
current_template_parms by passing in_decl=tmpl from weakly_subsumes.

PR c++/115656

gcc/cp/ChangeLog:

* constraint.cc (weakly_subsumes): Pass in_decl=tmpl to
get_normalized_constraints_from_info.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp7.C: New test.
---
  gcc/cp/constraint.cc   |  2 +-
  gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C | 12 
  2 files changed, 13 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 5472cc51b8a..d84094762f7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3334,7 +3334,7 @@ strictly_subsumes (tree ci, tree tmpl)
  bool
  weakly_subsumes (tree ci, tree tmpl)
  {
-  tree n1 = get_normalized_constraints_from_info (ci, NULL_TREE);
+  tree n1 = get_normalized_constraints_from_info (ci, tmpl);
tree n2 = get_normalized_constraints_from_decl (tmpl);
  
return subsumes (n1, n2);

diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
new file mode 100644
index 000..2ce884b995c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp7.C
@@ -0,0 +1,12 @@
+// PR c++/115656
+// { dg-do compile { target c++20 } }
+
+template concept same_as = __is_same(T, U);
+
+template T, template> class UU>
+struct A { };
+
+template> class B;
+
+A a1;
+A a2; // { dg-error "constraint failure" }




Re: [PATCH] doc: Document -O1 as the preferred level for large machine-generated code

2024-07-23 Thread Sam James
Sam James  writes:

(oops, forgot to CC docs maintainers. Done now.)

> At -O1, the intention is that we compile things in a "reasonable" amount
> of time (ditto memory use). In particular, we try to especially avoid
> optimizations which scale poorly on pathological cases, as is the case
> for large machine-generated code.
>
> Recommend -O1 for large machine-generated code, as has been informally
> done on bugs for a while now.
>
> This applies (broadly speaking) both to large machine-generated functions
> and, to a lesser extent, to repetitive small-but-still-not-tiny functions
> from a generator program.
>
> gcc/ChangeLog:
>   PR middle-end/114855
>   * doc/invoke.texi (Optimize options): Mention machine-generated
>   code for -O1.
> ---
> richi, does this accurately reflect the discussion we had on IRC a little
> while ago?
>
> Please push if OK, thanks.
>
>  gcc/doc/invoke.texi | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e0a641213ae4..9fb0925ed292 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12560,6 +12560,11 @@ With @option{-O}, the compiler tries to reduce code 
> size and execution
>  time, without performing any optimizations that take a great deal of
>  compilation time.
>  
> +@option{-O} is the recommended optimization level for large machine-generated
> +code as a sensible balance between time taken to compile and memory use:
> +higher optimization levels perform optimizations with greater algorithmic
> +complexity than at @option{-O}.
> +
>  @c Note that in addition to the default_options_table list in opts.cc,
>  @c several optimization flags default to true but control optimization
>  @c passes that are explicitly disabled at -O0.


Re: [PATCH v3] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-07-23 Thread Palmer Dabbelt

On Mon, 22 Jul 2024 07:16:28 PDT (-0700), kito.ch...@sifive.com wrote:

This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.
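
As a rough illustration of the kind of query this API is meant to support,
here is a minimal sketch in C.  It assumes the group/bitmask layout shown in
the patch; the struct feature_bits type and the has_feature helper are
hypothetical names made up for this example and are not part of the proposal.
Real code would read the __riscv_feature_bits object rather than a local copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local mirror of the layout used in the patch; real code
   would use the __riscv_feature_bits object provided by libgcc.  */
#define FEATURE_BITS_LENGTH 1

struct feature_bits
{
  unsigned length;
  unsigned long long features[FEATURE_BITS_LENGTH];
};

/* Group and bit positions copied from the patch for two extensions.  */
#define V_GROUPID 0
#define V_BITMASK (1ULL << 21)
#define ZBB_GROUPID 0
#define ZBB_BITMASK (1ULL << 28)

/* Return true if every bit in MASK is set in group GROUP of FB.
   Groups beyond FB->length are treated as "not available".  */
static bool
has_feature (const struct feature_bits *fb, unsigned group,
             unsigned long long mask)
{
  assert (fb != 0);
  if (group >= fb->length)
    return false;
  return (fb->features[group] & mask) == mask;
}
```

A function multi-versioning resolver could then pick an implementation with
something like has_feature (&bits, V_GROUPID, V_BITMASK), falling back to a
generic version when the check fails.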

Proposal in riscv-c-api-doc: 
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74


That's not merged, but I'm not sure what the rules are on stability for 
the C API doc.



Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.

Changes since v2:
- Prevent it from being initialized more than once.

Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
  operations during the computation of the feature bit.

libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.
---
 libgcc/config/riscv/feature_bits.c | 313 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 314 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c 
b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..cce4fbfa6be
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,313 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 1
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__((visibility("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#define ZFHMIN_BITMASK (1ULL << 36)
+#define ZICBOZ_GROUPID 0
+#define ZICBOZ_BITMASK (1ULL << 37)
+#define ZICOND_GROUPID 0
+#define ZICOND_BITMASK (1ULL << 38)
+#define ZIHINTNTL_GROUPID 0
+#define ZIHINTNTL_BITMASK (1ULL << 39)
+#define ZIHINTPAUSE_GROUPID 0
+#define ZIHINTPAUSE_BITMASK (1ULL << 40)
+#define ZKND_GROUPID 0
+#define ZKND_BITMASK (1ULL << 41)
+#define ZKNE_GROUPID 0
+#define ZKNE_BITMASK (1ULL << 42)
+#define ZKNH_GROUPID 0
+#define ZKNH_BITMASK (1ULL << 43)
+#define ZKSED_GROUPID 0
+#define ZKSED_BITMASK (1ULL << 44)
+#define ZKSH_GROUPID 0
+#define ZKSH_BITMASK (1ULL << 45)
+#define ZKT_GROUPID 0
+#define ZKT_BITMASK (1ULL << 46)
+#define ZTSO_GROUPID 0
+#define ZTS

Re: [PATCH] regrename: Skip renaming register pairs [PR115860]

2024-07-23 Thread Stefan Schulze Frielinghaus
On Mon, Jul 22, 2024 at 08:16:16PM -0700, Andrew Pinski wrote:
> On Sun, Jul 21, 2024 at 11:47 PM Stefan Schulze Frielinghaus
> > diff --git a/gcc/regrename.cc b/gcc/regrename.cc
> > index 054e601740b..6ae5a2309d0 100644
> > --- a/gcc/regrename.cc
> > +++ b/gcc/regrename.cc
> > @@ -1113,6 +1113,10 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum 
> > reg_class cl, enum scan_actions act
> >
> >   c = create_new_chain (this_regno, this_nregs, loc, insn, cl);
> >
> > + /* Give up early in case of register pairs.  */
> > + if (this_nregs != 1)
> > +   c->cannot_rename = 1;
> 
> 
> I am a bit worried this will make TImode (and DImode for 32bit targets) worse.
> And it might make aarch64's vector struct types much worse than they
> are currently.
> It is interesting how there is a subreg of a hardregister after reload
> showing up here. Is that on purpose?

Good catch.  I don't think this was on purpose.  When looking at the
dump I rather thought this is valid RTL and didn't question it since
subregs for register pairs got "expanded" during final.

> They come from:
> ```
> (define_insn "*tf_to_fprx2_0"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
> (subreg:DF (match_operand:TF1 "general_operand"   "v") 0))]
> ...
> (define_insn "*tf_to_fprx2_1"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
> (subreg:DF (match_operand:TF1 "general_operand"   "v") 8))]
> 
> ```
> 
> I am not sure if that is a valid thing to do. s390 backend is the only
> one that has insn patterns like this. all that uses "+" use either
> strict_lowpart of zero_extract for the lhs or just a pure set.
> Maybe there is a better way of representing this. Maybe using unspec here?

I gave unspec a try and came up with

(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
(unspec:DF [(match_operand:TF1 "general_operand"   "v")] 
UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
  "vpdi\t%v0,%v1,%v0,1"
  [(set_attr "op_type" "VRR")])

(define_insn "*tf_to_fprx2_1"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
(unspec:DF [(match_operand:TF1 "general_operand"   "v")] 
UNSPEC_TF_TO_FPRX2_1))]
  "TARGET_VXE"
  ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
  "vpdi\t%V0,%v1,%V0,5"
  [(set_attr "op_type" "VRR")])

which seems to work.  However, I'm still getting subregs at final:

(insn 3 18 7 (set (reg/v:TF 18 %f4 [orig:62 x ] [62])
(mem/c:TF (reg:DI 2 %r2 [65]) [1 x+0 S16 A64])) "t.c":3:1 421 {movtf_vr}
 (expr_list:REG_DEAD (reg:DI 2 %r2 [65])
(nil)))
(insn 7 3 8 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 0)
(unspec:DF [
(reg/v:TF 18 %f4 [orig:62 x ] [62])
] UNSPEC_TF_TO_FPRX2_0)) "t.c":4:10 569 {*tf_to_fprx2_0}
 (nil))
(insn 8 7 14 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 8)
(unspec:DF [
(reg/v:TF 18 %f4 [orig:62 x ] [62])
] UNSPEC_TF_TO_FPRX2_1)) "t.c":4:10 570 {*tf_to_fprx2_1}
 (expr_list:REG_DEAD (reg/v:TF 18 %f4 [orig:62 x ] [62])
(nil)))

Thus, I'm not sure whether this really solves the problem or rather
shifts around it.  I'm still a bit puzzled why the initial RTL is
invalid.  If I understood you correctly Jeff, then we are missing a
pattern which would match once the subregs are eliminated.  Since none
exists the subregs survive and regrename gets confused.  This basically
means that subregs of register pairs must not survive RA and the unspec
solution from above is no real solution.

Since the only purpose of tf_to_fprx2_0 and tf_to_fprx2_1 are to move a
long double from a vector register into a FP register pair one could
also merge both insn into one and emit two instructions in the assembler
template.  This would at least circumvent the subreg issue.

(define_insn "tf_to_fprx2"
  [(set (match_operand:FPRX2 0 "nonimmediate_operand" "=f")
(unspec:FPRX2 [(match_operand:TF 1 "general_operand"   "v")] 
UNSPEC_TF_TO_FPRX2))]
  "TARGET_VXE"
  "vpdi\t%v0,%v1,%v0,1;vpdi\t%V0,%v1,%V0,5"
  [(set_attr "length" "12")
   (set_attr "op_type" "VRR")])

I will give this a try tomorrow.

Thanks,
Stefan


Re: [PATCH] arm: Update fp16-aapcs-[24].c after insn_propagation patch

2024-07-23 Thread Adhemerval Zanella Netto



On 19/07/24 11:25, Richard Earnshaw (lists) wrote:
> On 11/07/2024 19:31, Richard Sandiford wrote:
>> These tests used to generate:
>>
>> bl  swap
>> ldr r2, [sp, #4]
>> mov r0, r2  @ __fp16
>>
>> but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
>> load directly into r0:
>>
>> bl  swap
>> ldrh    r0, [sp, #4]    @ __fp16
>>
>> This patch updates the tests to "defend" this change.
>>
>> While there, the scans include:
>>
>> mov\tr1, r[03]}
>>
>> But if the spill of r2 occurs first, there's no real reason why
>> r2 couldn't be used as the temporary, instead of r3.
>>
>> The patch tries to update the scans while preserving the spirit
>> of the originals.
>>
>> Spot-checked with armv8l-unknown-linux-gnueabihf.  OK to install?
>>
>> Richard
> 
> OK.
> 
> I'm not sure that these tests are really doing very much; it would probably 
> be better if they could be rewritten using the gcc.target/arm/aapcs 
> framework.  But that's for another day.
> 
> R.

Hi Richard,

It seems that this is not fully fixed on all configurations; I am still seeing:

FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-2.c scan-assembler-times mov\\tr[01], r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-2.c scan-assembler str\\tr2, ([^\\n]*).*ldrh\\tr0, \\1
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-4.c scan-assembler-times mov\\tr[01], r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-4.c scan-assembler str\\tr2, ([^\\n]*).*ldrh\\tr0, \\1

The gcc is configured with --with-float=hard --with-fpu=vfpv3-d16 
--with-mode=thumb --with-tune=cortex-a9 --with-arch=armv7-a 

https://ci.linaro.org/job/tcwg_gnu_cross_check_gcc--master-arm-build/1561/artifact/artifacts/00-sumfiles/


[match.pd PATCH] Fold ctz(-x) as ctz(x).

2024-07-23 Thread Roger Sayle

The subject line pretty much says it all; the count-trailing-zeros function
of -X produces the same result as count-trailing-zeros of X.  This
transformation eliminates a negation which may potentially overflow with
an equivalent expression that doesn't [much like the analogous abs(-X)
simplification in match.pd].  Likewise, the result at zero, which is
undefined, remains undefined.
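
For readers who want to convince themselves of the identity, here is a small
sketch in C (not part of the patch).  Two's complement negation, -x = ~x + 1,
leaves the lowest set bit and the zeros below it unchanged, so the trailing
zero count is the same.  The helper names ctz_u and ctz_negate_agrees are
made up for this example, and the negation is done on unsigned values so that
the check itself cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Count trailing zeros of a nonzero value.  Like __builtin_ctz, the
   result is undefined at zero, so the precondition is asserted.  */
static int
ctz_u (unsigned x)
{
  assert (x != 0);
  return __builtin_ctz (x);
}

/* Return true if ctz (-x) == ctz (x) for nonzero X.  The negation is
   written as 0u - x, which wraps instead of overflowing.  */
static bool
ctz_negate_agrees (unsigned x)
{
  return ctz_u (0u - x) == ctz_u (x);
}
```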

I'd noticed this equivalence, which isn't mentioned in Hacker's Delight,
investigating whether ranger's non_zero_bits can help determine whether
an integer variable may be converted to a floating point type exactly
(without raising FE_INEXACT), but it turns out this observation isn't
novel, as (disappointingly) LLVM already performs this same folding.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2024-07-23  Roger Sayle  

gcc/ChangeLog
* match.pd (ctz (-X) => ctz (X)): New simplification.

gcc/testsuite/ChangeLog
* gcc.dg/fold-ctz-1.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 6818856..d6d61eb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9056,6 +9056,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* CTZ simplifications.  */
 (for ctz (CTZ)
+ /* ctz (-x) => ctz (x).  */
+ (simplify
+  (ctz (nop_convert?@0 (negate @1)))
+  (with { tree t = TREE_TYPE (@0); }
+   (ctz (convert:t @1))))
  (for op (ge gt le lt)
   cmp (eq eq ne ne)
   (simplify
diff --git a/gcc/testsuite/gcc.dg/fold-ctz-1.c 
b/gcc/testsuite/gcc.dg/fold-ctz-1.c
new file mode 100644
index 000..dcc444c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-ctz-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(int x)
+{
+  return __builtin_ctz (-x);
+}
+
+/* { dg-final { scan-tree-dump-not "-x_" "optimized"} } */


Re: [match.pd PATCH] Fold ctz(-x) as ctz(x).

2024-07-23 Thread Andrew Pinski
On Tue, Jul 23, 2024 at 9:30 AM Roger Sayle  wrote:
>
>
> The subject line pretty much says it all; the count-trailing-zeros function
> of -X produces the same result as count-trailing-zeros of X.  This
> transformation eliminates a negation which may potentially overflow with
> an equivalent expression that doesn't [much like the analogous abs(-X)
> simplification in match.pd].  Likewise, the result at zero, which is
> undefined, remains undefined.
>
> I'd noticed this equivalence, which isn't mentioned in Hacker's Delight,
> investigating whether ranger's non_zero_bits can help determine whether
> an integer variable may be converted to a floating point type exactly
> (without raising FE_INEXACT), but it turns out this observation isn't
> novel, as (disappointingly) LLVM already performs this same folding.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
> 2024-07-23  Roger Sayle  
>
> gcc/ChangeLog
> * match.pd (ctz (-X) => ctz (X)): New simplification.
```
int f1(int a)
{
return __builtin_ctz(__builtin_abs(a));
}
```
Should also be handled. Though this might be good to handle in the
backprop pass (gimple-ssa-backprop.cc) which handles sign changes like
this already.
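
A quick way to see why the abs case is equivalent is the sketch below (again
in C, and only an illustration; it says nothing about how match.pd or the
backprop pass would implement it).  The helper names uabs, ctz_of_abs and
ctz_plain are hypothetical, and the magnitude is computed in unsigned
arithmetic so that INT_MIN does not overflow.

```c
#include <assert.h>
#include <limits.h>

/* Magnitude of A, computed in unsigned arithmetic so that INT_MIN
   wraps to its own bit pattern instead of overflowing.  */
static unsigned
uabs (int a)
{
  return a < 0 ? 0u - (unsigned) a : (unsigned) a;
}

/* ctz of |a|; undefined at zero, as with __builtin_ctz.  */
static int
ctz_of_abs (int a)
{
  assert (a != 0);
  return __builtin_ctz (uabs (a));
}

/* ctz of a's bit pattern, without taking the absolute value.  */
static int
ctz_plain (int a)
{
  assert (a != 0);
  return __builtin_ctz ((unsigned) a);
}
```

Since abs either leaves the value alone or negates it, this is the same
argument as for ctz (-x): the lowest set bit does not move.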

Thanks,
Andrew Pinski

>
> gcc/testsuite/ChangeLog
> * gcc.dg/fold-ctz-1.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


[PATCH ver 2] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-23 Thread Carl Love

GCC maintainers:

version 2, Updated patch comments, added missing ChangeLog.  Fixed 
unintended line removal.


The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp
built-ins, as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins.
The difference is that the overloaded built-ins return a vector of booleans or
a vector of long long booleans, whereas the removed built-ins returned a
vector of floats or a vector of doubles.


The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
__builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
vec_cmp[eq|ge|gt] built-in with the required changes for the return 
type.  Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.


The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl
-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins:
 __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
 __builtin_vsx_xvcmpgtsp.

which are similar to the recommended PVIPR documented overloaded
vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The difference is that the overloaded built-ins return a vector of
32-bit booleans.  The removed built-ins returned a vector of floats.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
overloaded built-ins requires the result to be stored in a vector of
boolean of the appropriate size or the result must be cast to the return
type used by the original __builtin_vsx_xvcmp* built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
    __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
    definitions.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xvcmpeqdp,
    __builtin_vsx_xvcmpgtdp, __builtin_vsx_xvcmpgedp,
    __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgtsp,
    __builtin_vsx_xvcmpgesp): Remove.
    (vec_cmpeq, vec_cmpgt, vec_cmpge): Add tests for float
    arguments that     store result in boolean and cast result to
    store result in float.  Add tests for double arguments that
    store the result in long long boolean and cast result to
    double.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 --
 .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 77eb0f7e406..47830b7dcb0 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1579,18 +1579,12 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-    XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}

   const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
 XVCMPGEDP_P vector_ge_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgesp (vf, vf);
-    XVCMPGESP vector_gev4sf {}
-
   const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
 XVCMPGESP_P vector_ge_v4sf_p {pred}

@@ -1600,9 +1594,6 @@
   const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
 XVCMPGTDP_P vector_gt_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
-    XVCMPGTSP vector_gtv4sf {}
-
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c

index 60f91aad23c..d67f97c8011 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -156,13 +156,27 @@ int do_cmp (void)
 {
   int i = 0;

-  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
-
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
+  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and __builtin_vsx_xvcmp[gt|ge|eq]sp
+ have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and
+ vec_cmpge built-ins.  The __builtin_vsx_xvcmp* builtins returned a vector
+ result of the same type as the 

Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-23 Thread Edwin Lu



On 7/23/2024 4:56 AM, Richard Biener wrote:

On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:

Hi Richard,

On 5/31/2024 1:48 AM, Richard Biener wrote:

On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill  wrote:

From: Greg McGary 

Still a NACK.  If remain ends up zero then

  /* Try to use a single smaller load when we are about
 to load excess elements compared to the unrolled
 scalar loop.  */
  if (known_gt ((vec_num * j + i + 1) * nunits,
 (group_size * vf - gap)))
{
  poly_uint64 remain = ((group_size * vf - gap)
- (vec_num * j + i) * nunits);
  if (known_ge ((vec_num * j + i + 1) * nunits
- (group_size * vf - gap), nunits))
/* DR will be unused.  */
ltype = NULL_TREE;

needs to be re-formulated so that the combined conditions make sure
this doesn't happen.  The outer known_gt should already ensure that
remain > 0.  For correctness that should possibly be maybe_gt though.


Putting the list back in the loop and CCing Richard S.


I'm currently looking into this patch and am trying to figure out what
is going on. Stepping through gdb, I see that remain == {coeffs = {0,
2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true
with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})).

  From what I understand, this falls under the umbrella of 0 <= remain <
nunits. The divide by zero error is because of the 0 <= remain which is
coming from the constant_multiple_p function in poly-int.h where it
performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]).
(https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971)


  >  if (known_ge ((vec_num * j + i + 1) * nunits
  >- (group_size * vf - gap),
nunits))
  >/* DR will be unused.  */
  >ltype = NULL_TREE;

This if condition is a bit suspicious to me though. I'm seeing that it's
evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is
returning false. Should it be maybe_ge instead?

No, we can only not emit a load if we know it won't be used, not if
it eventually cannot be used.


After running some
tests, to me it looks like it doesn't vectorize quite as often; however,
I'm not fully sure what else to do when the coeffs can potentially be
equal to 0.

Should it even be possible for there to be a {coeffs = {0, n}}
situation? My understanding of how poly_ints are used for representing
vectorization is that the first coefficient is the number of elements
needed to make the minimum supported vector size. That is, if vector
lengths are 128 bits, element size is 32 bits, coeff[0] should be
minimum of 4. Is this understanding correct?

I was told n can be negative, but nunits.coeff[0] should be non-zero.

What is j and i when the divisor is zero?


The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs = {2, 
2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2 
remain = {coeffs = {0, 2}}
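
As a cross-check, the gdb values above can be plugged into a small model to reproduce remain and the two comparisons seen earlier in the thread.  This is only a sketch: struct poly2, compute_remain and the model_known_* helpers are hypothetical stand-ins, not GCC's poly_int code, and they assume two coefficients with a runtime X >= 0.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a two-coefficient poly_int: the value is
   c0 + c1 * X for some runtime X >= 0.  Not GCC's real type.  */
struct poly2 { long c0, c1; };

static struct poly2 poly_scale (struct poly2 a, long k)
{
  return (struct poly2) { a.c0 * k, a.c1 * k };
}

static struct poly2 poly_sub (struct poly2 a, struct poly2 b)
{
  return (struct poly2) { a.c0 - b.c0, a.c1 - b.c1 };
}

/* True only if a > b holds for every X >= 0 (model of known_gt).  */
static bool model_known_gt (struct poly2 a, struct poly2 b)
{
  return a.c1 >= b.c1 && a.c0 > b.c0;
}

/* True only if a >= b holds for every X >= 0 (model of known_ge).  */
static bool model_known_ge (struct poly2 a, struct poly2 b)
{
  return a.c1 >= b.c1 && a.c0 >= b.c0;
}

/* remain = (group_size * vf - gap) - (vec_num * j + i) * nunits,
   following the expression quoted from vectorizable_load.  */
static struct poly2 compute_remain (long group_size, struct poly2 vf, long gap,
                                    long vec_num, long j, long i,
                                    struct poly2 nunits)
{
  struct poly2 scalar_end = poly_scale (vf, group_size);
  scalar_end.c0 -= gap;
  return poly_sub (scalar_end, poly_scale (nunits, vec_num * j + i));
}
```

With group_size = 4, vf = nunits = {2, 2}, gap = 2, vec_num = 4, j = 0 and i = 3, compute_remain gives {0, 2}; the model also agrees that {8, 8} is known to be greater than {6, 8}, and that {2, 0} is not known to be greater than or equal to {2, 2}.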


What would it mean for the coeffs[0] to be 0? Would that mean the vector length 
supports 0 bits?


gcc/ChangeLog:
  * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
divide-by-zero.
  * testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove dg-ice.
---
No changes in v3. Depends on the risc-v backend option added in patch 1 to
trigger the ICE.
---
   gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
   gcc/tree-vect-stmts.cc  | 3 ++-
   2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
index dfbe09f01a1..79d03612a22 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
@@ -1,6 +1,5 @@
   /* { dg-do compile } */
   /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 
-mno-autovec-segment" } */
-/* { dg-ice "Floating point exception" } */

   enum e { c, d };
   enum g { f };
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4219ad832db..34f5736ba00 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
   - (vec_num * j + i) * nunits);
  /* remain should now be > 0 and < nunits.  */
  unsigned num;
-   if (constant_multiple_p (nunits, remain, &num))
+   if (known_gt (remain, 0)
+   && constant_multiple_p (nunits, remain, &

Re: [PATCH] arm: Update fp16-aapcs-[24].c after insn_propagation patch

2024-07-23 Thread Richard Earnshaw (lists)

On 23/07/2024 17:25, Adhemerval Zanella Netto wrote:



On 19/07/24 11:25, Richard Earnshaw (lists) wrote:

On 11/07/2024 19:31, Richard Sandiford wrote:

These tests used to generate:

 bl  swap
 ldr r2, [sp, #4]
 mov r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
load directly into r0:

 bl  swap
 ldrh r0, [sp, #4]  @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why
r2 couldn't be used as the temporary, instead of r3.

The patch tries to update the scans while preserving the spirit
of the originals.

Spot-checked with armv8l-unknown-linux-gnueabihf.  OK to install?

Richard


OK.

I'm not sure that these tests are really doing very much; it would probably be 
better if they could be rewritten using the gcc.target/arm/aapcs framework.  
But that's for another day.

R.


Hi Richard,

It seems that this is not fully fixed on all configurations; I am still seeing:

FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-2.c scan-assembler-times mov\\tr[01], r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-2.c scan-assembler str\\tr2, ([^\\n]*).*ldrh\\tr0, \\1
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-4.c scan-assembler-times mov\\tr[01], r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-4.c scan-assembler str\\tr2, ([^\\n]*).*ldrh\\tr0, \\1

The gcc is configured with --with-float=hard --with-fpu=vfpv3-d16 
--with-mode=thumb --with-tune=cortex-a9 --with-arch=armv7-a

https://ci.linaro.org/job/tcwg_gnu_cross_check_gcc--master-arm-build/1561/artifact/artifacts/00-sumfiles/


That looks like a wider problem.  Did the test ever work for that set of 
configure options?


R.


[PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-23 Thread Carl Love

GCC maintainers:

The code generated by using C-code to set a vector element versus using 
a built-in has been investigated.  The assembly code generated from the 
C-code is as good or better than the assembly code generated for the 
built-ins for both the -O0 and -O3 levels of optimization.
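
For illustration, "C-code to set a vector element" could look like the following.  This is a minimal sketch using the GNU C vector extension; the v2df typedef and the set_elem helper are hypothetical names chosen here, and the exact C code used in the comparison is not shown in this message.

```c
/* Two doubles in a 16-byte vector, via the GNU C vector extension.
   The typedef name is an assumption made for this example.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Return a copy of V with element IDX (0 or 1) replaced by X, using
   plain subscripting instead of a set built-in.  */
static v2df set_elem (v2df v, double x, int idx)
{
  v[idx & 1] = x;
  return v;
}
```

Starting from {1.0, 2.0}, set_elem (v, 5.0, 1) yields {1.0, 5.0}; the compiler is free to choose the instruction sequence for the element update.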


For the vec_insert built-in, whose resolution previously made use of the 
now-removed vec_set built-ins, the generated code is as good as before 
with optimization.


This two-patch series removes the __builtin_vec_set_v1ti, 
__builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins and the 
__builtin_vsx_set_1ti, __builtin_vsx_set_2df and __builtin_vsx_set_2di 
built-ins in favor of using C code instead.  These built-ins use the 
set built-in attribute in their definitions.  With the removal of these 
six built-ins, the set built-in attribute is no longer used, so the 
related code for the attribute is removed as well.


The first patch in this series, which removes __builtin_vec_set_v1ti, 
__builtin_vec_set_v2df and __builtin_vec_set_v2di, was previously 
posted.  The feedback on that patch was that we could also remove the 
set bif attribute.  Removing the set bif attribute requires also 
removing the __builtin_vsx_set_1ti, __builtin_vsx_set_2df and 
__builtin_vsx_set_2di built-ins.  The second patch removes the vsx set 
built-ins, the now unused set built-in attribute and the associated 
code.


The patches have been tested on a Power 10 LE system with no regressions.

Carl


Re: [PATCH] regrename: Skip renaming register pairs [PR115860]

2024-07-23 Thread Jeff Law




On 7/23/24 9:45 AM, Stefan Schulze Frielinghaus wrote:




They come from:
```
(define_insn "*tf_to_fprx2_0"
   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
 (subreg:DF (match_operand:TF 1 "general_operand"   "v") 0))]
...
(define_insn "*tf_to_fprx2_1"
   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
 (subreg:DF (match_operand:TF 1 "general_operand"   "v") 8))]

```

I am not sure if that is a valid thing to do. The s390 backend is the only
one that has insn patterns like this. All others that use "+" use either
strict_low_part or zero_extract for the lhs, or just a pure set.
Maybe there is a better way of representing this. Maybe using unspec here?


I gave unspec a try and came up with

(define_insn "*tf_to_fprx2_0"
   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
 (unspec:DF [(match_operand:TF 1 "general_operand"   "v")] UNSPEC_TF_TO_FPRX2_0))]
   "TARGET_VXE"
   ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])

(define_insn "*tf_to_fprx2_1"
   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
 (unspec:DF [(match_operand:TF 1 "general_operand"   "v")] UNSPEC_TF_TO_FPRX2_1))]
   "TARGET_VXE"
   ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
   "vpdi\t%V0,%v1,%V0,5"
   [(set_attr "op_type" "VRR")])

which seems to work.  However, I'm still getting subregs at final:

(insn 3 18 7 (set (reg/v:TF 18 %f4 [orig:62 x ] [62])
 (mem/c:TF (reg:DI 2 %r2 [65]) [1 x+0 S16 A64])) "t.c":3:1 421 
{movtf_vr}
  (expr_list:REG_DEAD (reg:DI 2 %r2 [65])
 (nil)))
(insn 7 3 8 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 0)
 (unspec:DF [
 (reg/v:TF 18 %f4 [orig:62 x ] [62])
 ] UNSPEC_TF_TO_FPRX2_0)) "t.c":4:10 569 {*tf_to_fprx2_0}
  (nil))
(insn 8 7 14 (set (subreg:DF (reg:FPRX2 16 %f0 [64]) 8)
 (unspec:DF [
 (reg/v:TF 18 %f4 [orig:62 x ] [62])
 ] UNSPEC_TF_TO_FPRX2_1)) "t.c":4:10 570 {*tf_to_fprx2_1}
  (expr_list:REG_DEAD (reg/v:TF 18 %f4 [orig:62 x ] [62])
 (nil)))

Thus, I'm not sure whether this really solves the problem or rather
shifts around it.  I'm still a bit puzzled why the initial RTL is
invalid.  If I understood you correctly Jeff, then we are missing a
pattern which would match once the subregs are eliminated.  Since none
exists the subregs survive and regrename gets confused.  This basically
means that subregs of register pairs must not survive RA and the unspec
solution from above is no real solution.

I'd tend to agree.  The routine in question is cleanup_subreg_operands 
and from a quick looksie it's not going to work for the insn in question 
because cleanup_subreg_operands actually looks down into the recog data 
structures for each operand.  In the case above the subreg is explicit 
in the RTL rather than matched by the operand predicate.


Jeff


Re: [match.pd PATCH] Fold ctz(-x) as ctz(x).

2024-07-23 Thread Jeff Law




On 7/23/24 10:30 AM, Roger Sayle wrote:


The subject line pretty much says it all; the count-trailing-zeros function
of -X produces the same result as count-trailing-zeros of X.  This
transformation replaces an expression containing a negation, which may
potentially overflow, with an equivalent expression that doesn't [much like
the analogous abs(-X) simplification in match.pd].  Likewise, the
behavior at zero remains
undefined.
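
The equivalence is easy to spot-check from C.  The sketch below is an illustration only, not the new test case: the helper name ctz_neg_matches is hypothetical, and it negates an unsigned value so that the check itself has no signed overflow.

```c
#include <stdbool.h>

/* Check ctz(-x) == ctz(x) for one nonzero X.  Unsigned negation wraps
   modulo 2^N, so it is well defined for every input.  X must be nonzero
   because __builtin_ctz (0) is undefined.  */
static bool ctz_neg_matches (unsigned int x)
{
  return __builtin_ctz (-x) == __builtin_ctz (x);
}
```

Negating a two's complement value leaves the lowest set bit in place and flips only the bits above it, which is why the count of trailing zeros is unchanged.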

I'd noticed this equivalence, which isn't mentioned in Hacker's Delight,
investigating whether ranger's non_zero_bits can help determine whether
an integer variable may be converted to a floating point type exactly
(without raising FE_INEXACT), but it turns out this observation isn't
novel, as (disappointingly) LLVM already performs this same folding.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2024-07-23  Roger Sayle  

gcc/ChangeLog
 * match.pd (ctz (-X) => ctz (X)): New simplification.

gcc/testsuite/ChangeLog
 * gcc.dg/fold-ctz-1.c: New test case.

OK.  Your call on how to handle the additional case that Andrew P. noted.

jeff



Re: [PATCH] gm2: add missing debug output guard

2024-07-23 Thread Gaius Mulley
Wilken Gottwalt  writes:

> The Close() procedure in MemStream is missing a guard to prevent it from
> printing in non-debug mode.
>
> gcc/gm2:
>   * gm2-libs-iso/MemStream.mod: Guard debug output.
>
> Signed-off-by: Wilken Gottwalt 
> ---
>  gcc/m2/gm2-libs-iso/MemStream.mod | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/m2/gm2-libs-iso/MemStream.mod 
> b/gcc/m2/gm2-libs-iso/MemStream.mod
> index 9620ed2ba19..d3204692540 100644
> --- a/gcc/m2/gm2-libs-iso/MemStream.mod
> +++ b/gcc/m2/gm2-libs-iso/MemStream.mod
> @@ -694,7 +694,10 @@ END handlefree ;
>  
>  PROCEDURE Close (VAR cid: ChanId) ;
>  BEGIN
> -   printf ("Close called\n");
> +   IF Debugging
> +   THEN
> +  printf ("Close called\n")
> +   END ;
> IF IsMem(cid)
> THEN
>UnMakeChan(did, cid) ;

many thanks!

regards,
Gaius


Re: [PATCH] gm2: fix bad programming practice identifier warning

2024-07-23 Thread Gaius Mulley
Wilken Gottwalt  writes:

> Fix using keywords as identifiers to prevent warnings coming from
> Modula-2's own libraries.
>
> m2pim/DynamicStrings.mod:1358:27: note: In procedure ‘Slice’: the symbol
> name ‘end’ is legal as an identifier, however as such it might cause
> confusion and is considered bad programming practice
>  1358 |start, end, o: INTEGER ;
>
> m2pim/DynamicStrings.mod:1358:27: note: either the identifier has the
> same name as a keyword or alternatively a keyword has the wrong case
> (‘END’ and ‘end’)
>
> gcc/gm2:
> * gm2-libs/DynamicStrings.mod: Fix bad identifier warning.
>
> Signed-off-by: Wilken Gottwalt 

thank you!


Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-23 Thread Carl Love

GCC maintainers:

This patch removes the vsx set built-ins: __builtin_vsx_set_1ti, 
__builtin_vsx_set_2df, __builtin_vsx_set_2di.  With the removal of 
these built-ins, the built-in attribute "set", used in the built-in 
definition file, is no longer needed.  The "set" attribute and its 
associated code are removed.


The assembly code generated by using C code to set an element of a 
vector versus using the vsx set built-in to set an element was 
investigated.  With -O0 optimization the generated assembly code is 
comparable in terms of the generated assembly instructions and number 
of instructions.  At the -O3 optimization level, for the 2DI and 2DF 
cases the built-ins and the C code generate identical assembly code.  
The assembly code generated for the 1TI case for the C code has one 
fewer instruction.  The built-in generates an extra load instruction.  
Hence, the C code is better as it has fewer load instructions.


The testcase for the __builtin_vsx_set_2df is removed.  The other 
built-ins do not have testcases.


The patch has been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, 
__builtin_vsx_set_2di


The built-ins set a value in a vector.  The same operation can be done
in C-code.  The assembly code generated from the C-code is as good as or
better than the code generated by the built-ins.  With default
optimization the number of assembly instructions generated by the two
methods is similar.  With -O3 optimization, the assembly generated for
the two approaches is identical for the 2DF and 2DI types.  The assembly
for the C-code version of the 1TI requires one fewer assembly instruction.
It also only uses one load versus two loads for the built-in.

With the removal of the built-ins, there are no other uses of the
set built-in attribute.  The code associated with the set built-in
attribute is removed.

Finally, the testcase for the __builtin_vsx_set_2df is removed.  The
other built-ins do not have testcases.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtin.cc (get_element_number,
    altivec_expand_vec_set_builtin): Remove functions.
    (rs6000_expand_builtin): Remove the if statement to call
    altivec_expand_vec_set_builtin.
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
    __builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the
    built-in definitions.
    * config/rs6000/rs6000-gen-builtins.cc (struct attrinfo):
    Remove the isset variable from the structure.
    (parse_bif_attrs): Remove the uses of the isset variable.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
    __builtin_vsx_set_2df built-in.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 53 ---
 gcc/config/rs6000/rs6000-builtins.def | 10 
 gcc/config/rs6000/rs6000-gen-builtins.cc  | 29 --
 .../gcc.target/powerpc/vsx-builtin-3.c    |  6 ---
 4 files changed, 11 insertions(+), 87 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 117cf0125f8..099cbc82245 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,56 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
icode, tree exp, rtx target)

   return target;
 }

-/* Return the integer constant in ARG.  Constrain it to be in the range
-   of the subparts of VEC_TYPE; issue an error if not.  */
-
-static int
-get_element_number (tree vec_type, tree arg)
-{
-  unsigned HOST_WIDE_INT elt, max = TYPE_VECTOR_SUBPARTS (vec_type) - 1;
-
-  if (!tree_fits_uhwi_p (arg)
-  || (elt = tree_to_uhwi (arg), elt > max))
-    {
-  error ("selector must be an integer constant in the range [0, 
%wi]", max);

-  return 0;
-    }
-
-  return elt;
-}
-
-/* Expand vec_set builtin.  */
-static rtx
-altivec_expand_vec_set_builtin (tree exp)
-{
-  machine_mode tmode, mode1;
-  tree arg0, arg1, arg2;
-  int elt;
-  rtx op0, op1;
-
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-  arg2 = CALL_EXPR_ARG (exp, 2);
-
-  tmode = TYPE_MODE (TREE_TYPE (arg0));
-  mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
-  gcc_assert (VECTOR_MODE_P (tmode));
-
-  op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL);
-  op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
-  elt = get_element_number (TREE_TYPE (arg0), arg2);
-
-  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
-    op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
-
-  op0 = force_reg (tmode, op0);
-  op1 = force_reg (mode1, op1);
-
-  rs6000_expand_vector_set (op0, op1, GEN_INT (elt));
-
-  return op0;
-}
-
 /* Expand vec_ext builtin.  */
 static rtx
 altivec_expan

Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-23 Thread Carl Love



GCC maintainers:

This patch was previously posted.  Per the feedback, it is now the first 
of two patches to remove the set built-ins.


This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df 
and __builtin_vec_set_v2di built-ins.  Users should just use normal 
C-code to update the various vector elements.  This change was 
originally intended to be part of the earlier series of cleanup 
patches.  It was initially thought that some additional work would be 
needed to do some gimple generation instead of these built-ins.  
However, the existing default code generation does produce the needed 
code.  For the vec_set bif, the equivalent C code is as good as or better 
than the built-in.  For the vec_insert bif, whose resolving previously 
made use of the vec_set bif, the assembly code generation is as good as 
before with the -O3 optimization.


The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

-
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di


Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
    __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
    definitions.
    * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
    handling for constant vec_insert position with
    VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 47830b7dcb0..75c33aa9ffc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
 CMPGE_16QI vector_nltv16qi {}

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 68519e1397f..04882c396bf 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1524,46 +1524,6 @@ resolve_vec_insert (resolution *res, vec<tree, va_gc> *arglist,

   return error_mark_node;
 }

-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-  && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  wide_int selector = wi::to_wide (arg2);
-  selector = wi::umod_trunc (selector, 2);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  tree call = NULL_TREE;
-  if (mode == V2DFmode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-  else if (mode == V2DImode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
-     reversed.  */
-  if (call)
-    {
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-    }
-
-  else if (mode == V1TImode
-       && VECTOR_UNIT_VSX_P (mode)
-       && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-  wide_int selector = wi::zero(32);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
-     reversed.  */
-  *res = resolved;
-  return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 
with

  VIEW_CONVERT_EXPR.  i.e.:
    D.3192 = v1;
--
2.45.2




Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-23 Thread Richard Sandiford
Edwin Lu  writes:
> On 7/23/2024 4:56 AM, Richard Biener wrote:
>> On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:
>>> Hi Richard,
>>>
>>> On 5/31/2024 1:48 AM, Richard Biener wrote:
>>>> On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill wrote:
>>>>> From: Greg McGary 
>>>> Still a NACK.  If remain ends up zero then
>>>>
>>>>    /* Try to use a single smaller load when we are about
>>>>   to load excess elements compared to the unrolled
>>>>   scalar loop.  */
>>>>    if (known_gt ((vec_num * j + i + 1) * nunits,
>>>>   (group_size * vf - gap)))
>>>>  {
>>>>    poly_uint64 remain = ((group_size * vf - gap)
>>>>  - (vec_num * j + i) * nunits);
>>>>    if (known_ge ((vec_num * j + i + 1) * nunits
>>>>  - (group_size * vf - gap), nunits))
>>>>  /* DR will be unused.  */
>>>>  ltype = NULL_TREE;
>>>>
>>>> needs to be re-formulated so that the combined conditions make sure
>>>> this doesn't happen.  The outer known_gt should already ensure that
>>>> remain > 0.  For correctness that should possibly be maybe_gt though.

Yeah.  FWIW, I mentioned the maybe_gt thing in
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:

  Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
  We can only skip doing it if we know for sure that the load won't cross
  the gap.  (Not sure whether the difference can trigger in practice.)

But AFAICT, the known_gt doesn't inherently prove that remain is known
to be nonzero.  It just proves that the gap between the end of the scalar
accesses and the end of this vector is known to be nonzero.


>> Putting the list back in the loop and CCing Richard S.
>>
>>> I'm currently looking into this patch and am trying to figure out what
>>> is going on. Stepping through gdb, I see that remain == {coeffs = {0,
>>> 2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true
>>> with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})).
>>>
>>>   From what I understand, this falls under the umbrella of 0 <= remain <
>>> nunits. The divide by zero error is because of the 0 <= remain which is
>>> coming from the constant_multiple_p function in poly-int.h where it
>>> performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]).
>>> (https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971)
>>>
>>>
>>>   >  if (known_ge ((vec_num * j + i + 1) * nunits
>>>   >- (group_size * vf - gap),
>>> nunits))
>>>   >/* DR will be unused.  */
>>>   >ltype = NULL_TREE;
>>>
>>> This if condition is a bit suspicious to me though. I'm seeing that it's
>>> evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is
>>> returning false. Should it be maybe_ge instead?
>> No, we can only not emit a load if we know it won't be used, not if
>> it eventually cannot be used.

Agreed.

[switching round for easier reply]
>>> After running some
>>> tests, to me it looks like it doesn't vectorize quite as often; however,
>>> I'm not fully sure what else to do when the coeffs can potentially be
>>> equal to 0.
>>>
>>> Should it even be possible for there to be a {coeffs = {0, n}}
>>> situation? My understanding of how poly_ints are used for representing
>>> vectorization is that the first coefficient is the number of elements
>>> needed to make the minimum supported vector size. That is, if vector
>>> lengths are 128 bits, element size is 32 bits, coeff[0] should be
>>> minimum of 4. Is this understanding correct?
>> I was told n can be negative, but nunits.coeff[0] should be non-zero.
>
> What would it mean for the coeffs[0] to be 0? Would that mean the vector 
> length supports 0 bits?

coeffs = {A,B} just means A+B*X, where X is the number of vector
"chunks" beyond the minimum length.  It's certainly valid for a poly_int
to have a zero coeffs[0] (i.e. zero A).  For example, (the length of a
vector) - (the minimum length) would have this property.
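
To make the A+B*X reading concrete, here is a minimal sketch.  The struct and helper names are hypothetical, chosen for this example; this is not how poly-int.h implements things.

```c
#include <stdbool.h>

/* Hypothetical two-coefficient stand-in for a poly_int: {A, B} means
   A + B * X, where X >= 0 is only known at run time.  */
struct poly2 { unsigned long a, b; };

/* Value of P for a concrete number X of extra vector chunks.  */
static unsigned long poly_eval (struct poly2 p, unsigned long x)
{
  return p.a + p.b * x;
}

/* Whether the value of DIVISOR divides the value of Q at X.  A zero
   divisor is rejected up front instead of reaching the modulus.  */
static bool divides_at (struct poly2 divisor, struct poly2 q, unsigned long x)
{
  unsigned long d = poly_eval (divisor, x);
  return d != 0 && poly_eval (q, x) % d == 0;
}
```

With remain = {0, 2} and nunits = {2, 2}, remain evaluates to 0 at X = 0 (the minimum vector length) and to 2 at X = 1.  The X = 0 case corresponds to the zero constant term that an unguarded modulus would divide by.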

>>
>> What is j and i when the divisor is zero?
>
> The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs = {2, 
> 2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2 
> remain = {coeffs = {0, 2}}

OK, so let's use D to mean "data" and G to mean "gap".  Then, for the
minimum vector length of 2 elements, we have:

  DD GG DD GG

The last load will read beyond the scalar loop if the vector loop happens
to handle all elements of the scalar loop.

For a vector length of 4 elements, we have:

  DDGG DDGG DDGG DDGG

where every load contains both data and gaps.  The same will be true
for larger vectors.
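
The two layouts above can be generated mechanically, which may help when trying other vector lengths.  This is a sketch under the values from the gdb session (group_size = 4, gap = 2, vec_num = 4); the function name layout and its buffer convention are assumptions made for this example.

```c
#include <stddef.h>

/* Write the data/gap pattern of VEC_NUM vectors of NUNITS elements into
   OUT, separated by spaces.  Element k belongs to a group of GROUP_SIZE
   scalar accesses whose last GAP elements are unused.  OUT must hold at
   least VEC_NUM * (NUNITS + 1) bytes.  */
static void layout (unsigned nunits, unsigned group_size, unsigned gap,
                    unsigned vec_num, char *out)
{
  size_t pos = 0;
  for (unsigned v = 0; v < vec_num; v++)
    {
      if (v > 0)
        out[pos++] = ' ';
      for (unsigned e = 0; e < nunits; e++)
        {
          unsigned k = v * nunits + e;
          out[pos++] = (k % group_size < group_size - gap) ? 'D' : 'G';
        }
    }
  out[pos] = '\0';
}
```

Calling layout with nunits = 2 produces "DD GG DD GG", and with nunits = 4 it produces "DDGG DDGG DDGG DDGG", matching the pictures in the text.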

That's where remain={0,2} is coming from.  The last 

Re: [PATCH 2/3] aarch64: Add support for moving fpm system register

2024-07-23 Thread Claudio Bantaloukas


On 22/07/2024 11:07, Alex Coplan wrote:
> Hi Claudio,
> 
> I've left a couple of small comments below.
> 
> On 22/07/2024 09:30, Claudio Bantaloukas wrote:
>>
>> Unlike most system registers, fpmr can be heavily written to in code that
>> exercises the fp8 functionality. That is because every fp8 intrinsic call
>> can potentially change the value of fpmr.
>> Rather than just use an unspec, we treat the fpmr system register like
>> all other registers and use a move operation to read and write to it.
>>
>> We introduce a new class of moveable system registers that, currently,
>> only accepts fpmr and a new constraint, Umv, that allows us to
>> selectively use mrs and msr instructions when expanding rtl for them.
>> Given that there is code that depends on "real" registers coming before
>> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
>> existing value and renumber registers below that.
>> This requires us to update the bitmaps that describe which registers
>> belong to each register class.
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>>  support for MOVEABLE_SYSREGS class.
>>  (aarch64_hard_regno_mode_ok): Only allow 64 bit reads and writes
>>  to fpmr.
>>  (aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>>  (aarch64_class_max_nregs): Likewise.
>>  * config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
>>  (CALL_REALLY_USED_REGISTERS): Likewise.
>>  (REGISTER_NAMES): Likewise.
>>  (enum reg_class): Add MOVEABLE_SYSREGS class.
>>  (REG_CLASS_NAMES): Likewise.
>>  (REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>>  the new MOVEABLE_REGS class and renumbering of registers.
>>  * config/aarch64/aarch64.md: (FPM_REGNUM): added new register
>>  number, reusing old value.
>>  (FFR_REGNUM): Renumber.
>>  (FFRT_REGNUM): Likewise.
>>  (LOWERING_REGNUM): Likewise.
>>  (TPIDR2_BLOCK_REGNUM): Likewise.
>>  (SME_STATE_REGNUM): Likewise.
>>  (TPIDR2_SETUP_REGNUM): Likewise.
>>  (ZA_FREE_REGNUM): Likewise.
>>  (ZA_SAVED_REGNUM): Likewise.
>>  (ZA_REGNUM): Likewise.
>>  (ZT0_REGNUM): Likewise.
>>  (*mov_aarch64): Add support for moveable sysregs.
>>  (*movsi_aarch64): Likewise.
>>  (*movdi_aarch64): Likewise.
>>  * config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/aarch64/acle/fp8.c: New tests.
>> ---
>>   gcc/config/aarch64/aarch64.cc   |   9 ++
>>   gcc/config/aarch64/aarch64.h|  14 ++-
>>   gcc/config/aarch64/aarch64.md   |  30 --
>>   gcc/config/aarch64/constraints.md   |   3 +
>>   gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 107 +++-
>>   5 files changed, 146 insertions(+), 17 deletions(-)
>>
> 
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index 0d9e80d85b2..fa526836c6a 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode 
>> mode)
>>   case PR_HI_REGS:
>> return mode == VNx32BImode ? 2 : 1;
>>   
>> +case MOVEABLE_SYSREGS:
>>   case FFR_REGS:
>>   case PR_AND_FFR_REGS:
>>   case FAKE_REGS:
>> @@ -2045,6 +2046,10 @@ aarch64_hard_regno_mode_ok (unsigned regno, 
>> machine_mode mode)
>>   /* This must have the same size as _Unwind_Word.  */
>>   return mode == DImode;
>>   
>> +  if (regno == FPM_REGNUM)
>> +/* This must have the same size as the FPMR register.  */
>> +return mode == QImode || mode == HImode || mode == SImode || mode == 
>> DImode;
> 
> I'm probably missing something here, but I can't seem to square the
> comment with the logic.  These modes all have different sizes, so how
> can they all be the same size as the FPMR register?

Thanks for catching this. An initial version of the patch only allowed 
64-bit register moves, but we extended support to other modes in order 
to avoid ICEs. I forgot to remove the comment. Will fix in the next patch.

>> +
>> unsigned int vec_flags = aarch64_classify_vector_mode (mode);
>> if (vec_flags == VEC_SVE_PRED)
>>   return pr_or_ffr_regnum_p (regno);
>> @@ -12682,6 +12687,9 @@ aarch64_regno_regclass (unsigned regno)
>> if (PR_REGNUM_P (regno))
>>   return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
>>   
>> +  if (regno == FPM_REGNUM)
>> +return MOVEABLE_SYSREGS;
>> +
>> if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
>>   return FFR_REGS;
>>   
>> @@ -13070,6 +13078,7 @@ aarch64_class_max_nregs (reg_class_t regclass, 
>> machine_mode mode)
>>   case PR_HI_REGS:
>> return mode == VNx32BImode ? 2 : 1;
>>   
>> +case MOVEABLE_SYSREGS:
>>   case STACK_REG:
>>   case FFR_REGS:
>>   case PR_AND_FFR_REGS:
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h

Re: [ARC PATCH] Improve performance of SImode right shifts (take #2)

2024-07-23 Thread Jeff Law




On 7/22/24 11:07 AM, Roger Sayle wrote:
> > Whilst 33-bit pseudo-rotations almost certainly doesn't occur 
> > frequently in real code, this provides a very useful building block.
> > Conventionally, rotations require 2 cycles per bit; one cycle to shift
> > the top-bit out of the source, and one cycle to shift this bit into the
> > destination.  The above pseudo rotation has twice the throughput, but
> > leaves the upper bits in an unusual configuration.  It turns out that
> > masking the top-bits out with AND after the pseudo-rotation provides a
> > fast form of lshr with high shift counts, and even ashr can be improved
> > using either a sign-extension instruction or sign-extension sequence
> > after a lshr.

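
The rotate-then-mask idea quoted above can be sketched in portable C++. This is only an illustration under stated assumptions: it uses an ordinary 32-bit rotate rather than the 33-bit pseudo-rotation through the carry that the patch is about, and the helper names (rotl32, lshr_via_rotate, ashr_via_rotate) are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by n bits (0 < n < 32).
static inline std::uint32_t rotl32(std::uint32_t x, unsigned n) {
  assert(n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

// Logical shift right by n: rotate left by (32 - n), then AND away the
// n bits that wrapped around into the top of the word.
static inline std::uint32_t lshr_via_rotate(std::uint32_t x, unsigned n) {
  assert(n > 0 && n < 32);
  std::uint32_t rotated = rotl32(x, 32 - n);
  std::uint32_t mask = 0xFFFFFFFFu >> n;  // keeps the low (32 - n) bits
  return rotated & mask;
}

// Arithmetic shift right by n: do the logical shift, then sign-extend
// from bit (31 - n), which is where the original sign bit ended up.
static inline std::int32_t ashr_via_rotate(std::uint32_t x, unsigned n) {
  std::uint32_t v = lshr_via_rotate(x, n);
  std::uint32_t sign = 1u << (31 - n);
  return static_cast<std::int32_t>((v ^ sign) - sign);
}
```

For a high shift count such as 28, the left rotation is only by 4 bits, which is why the approach pays off on a target that shifts or rotates one bit per step.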
The H8 uses rotate through the carry in a variety of ways as well.  For 
example, a SImode shift by 15:


shlr.w  // Move bit into the carry
mov.w   // Move low half word into high half word
xor.w   // clear low half word
rotxr.l // rotate right to restore carry bit
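
A host-side simulation of that four-instruction sequence may help. The message does not give the operands or the shift direction, so this sketch rests on assumptions: it reads the sequence as a left shift by 15, with shlr.w applied to the upper half word. The Reg32 struct and the function name are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 32-bit register split into two half words plus a carry flag.
struct Reg32 {
  std::uint16_t hi;  // upper half word (assumed operand of shlr.w)
  std::uint16_t lo;  // lower half word
  bool carry;
};

// Left shift by 15 modelled as: shift left by 16 (mov.w + xor.w), then
// rotate right by one through the carry, which re-inserts bit 16 of the
// original value at bit 31.
static inline std::uint32_t shl15_via_carry(std::uint32_t x) {
  Reg32 r{static_cast<std::uint16_t>(x >> 16),
          static_cast<std::uint16_t>(x & 0xFFFFu), false};

  // shlr.w: bit 0 of the upper half (bit 16 of x) moves into the carry.
  r.carry = (r.hi & 1u) != 0;
  r.hi = static_cast<std::uint16_t>(r.hi >> 1);

  // mov.w: copy the low half word into the high half word.
  r.hi = r.lo;

  // xor.w: clear the low half word (the carry is left alone in this model).
  r.lo = 0;

  // rotxr.l: rotate the full register right by one through the carry.
  std::uint32_t v = (static_cast<std::uint32_t>(r.hi) << 16) | r.lo;
  bool shifted_out = (v & 1u) != 0;
  v = (v >> 1) | (r.carry ? 0x80000000u : 0u);
  r.carry = shifted_out;
  return v;
}
```

Under this reading the result equals x << 15 for every 32-bit x, at a cost of four instructions rather than fifteen single-bit shifts.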
> > Unfortunately without real hardware or a simulator to test on, I can't

> > be 100% confident in this code, but on paper, shifts should now be much
> > faster.  This patch has been tested on a cross-compiler to arc-linux
> > hosted on x86_64 with no new failures in the compilation tests.
> >
> > Now that Claudiu has left Synopsys, is anyone able to test these changes?

I see that Synopsys has QEMU for the arc.  In theory that should be 
sufficient if it has user mode emulation support.  But I don't have the 
time to really dig into that.   Not sure if you want to take that on or not.


I can throw this patch into my tester, but I'm not sure it's going to 
give significantly more coverage than you've done.  I have a dummy 
simulator that always returns 0.  So it'll pretend to run all the 
execution tests, but it really just verifies they compile, assemble and 
link.


Claudiu is still listed as the maintainer for the port, so let's give 
him time to chime in.  He may also have suggestions on how to get things 
set up for doing real execution tests going forward.



Jeff


Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-23 Thread Jeff Law




On 7/11/24 9:17 PM, HAO CHEN GUI wrote:


So why the test for real_isinf on the upper/lower bound?  If op1 is known to be 
a NaN, then why test the bounds at all?  If a bounds test is needed, why only 
test the upper bound?


IMHO, the logic is: if op1 is a NaN, it's not an infinite number. If the upper
and lower bounds are both finite numbers, op1 is not an infinite number.
In both situations, the result should be set to 0, which means op1 isn't an
infinite number.

Understood, but that's not what the code actually implements:


+if (op1.known_isnan ()
+|| (!real_isinf (&op1.lower_bound ())
+&& !real_isinf (&op1.upper_bound ())))
+  {
+r.set_zero (type);
+return true;
+  }
If op1 is a NaN, then it cannot be Inf.  Similarly if both of the 
bounds are known not to be Inf, then op1 is not Inf and thus we should 
be returning false instead of true.  Or am I misunderstanding this API?

+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+  return false;
+
+    if (lhs.zero_p ())
+  {
+    nan_state nan (true);
+    r.set (type, real_min_representable (type),
+   real_max_representable (type), nan);
+    return true;
+  }

If the result of a builtin_isinf is zero, that doesn't mean the input has a nan 
state.  It means we know it's not infinity.  The input argument could be 
anything but an Inf.


If the result of a builtin_isinf is zero, it means the input might be a NAN or
a finite number. So the range should be [min_rep, max_rep] U NAN.

ACK.
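
The two cases discussed in this thread can be written down as a toy model. This is not GCC's frange or range-op API: ToyFRange and both functions are hypothetical stand-ins. The sketch also assumes that "the fold succeeded" and "the folded value" are separate things, modelled here with std::optional, and whether that matches the real API is exactly the open question above.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>

// Toy floating-point range: [lo, hi] for the non-NaN values, plus NaN flags.
struct ToyFRange {
  double lo;
  double hi;
  bool maybe_nan;  // the value may be a NaN
  bool only_nan;   // the value is known to be a NaN
};

// Forward direction: try to fold isinf (op1).  Returns 0 when op1 is
// known not to be infinite, and no value when nothing can be concluded.
static std::optional<int> fold_isinf(const ToyFRange &op1) {
  if (op1.only_nan || (!std::isinf(op1.lo) && !std::isinf(op1.hi)))
    return 0;
  return std::nullopt;
}

// Backward direction: if isinf (op1) is known to be 0, op1 is either a
// finite number or a NaN, i.e. [min_rep, max_rep] U NaN.
static ToyFRange op1_range_when_isinf_is_zero() {
  const double max_rep = std::numeric_limits<double>::max();
  return ToyFRange{-max_rep, max_rep, /*maybe_nan=*/true, /*only_nan=*/false};
}
```

In this model a range such as [0, +Inf] yields no folded value, because the upper bound may still be infinite.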

jeff



Re: [PATCH v10 1/3] C++: Support clang compatible [[musttail]] (PR83324)

2024-07-23 Thread Jason Merrill

On 7/18/24 7:46 PM, Andi Kleen wrote:


Updated patch with the !retval bug fix identified by Marek.


OK.


This patch implements a clang compatible [[musttail]] attribute for
returns.
   
musttail is useful as an alternative to computed goto for interpreters.

With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.
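
A minimal sketch of the interpreter style described above, with each opcode handled by a small function that tail-calls back into the dispatcher. The opcode set, the handler names and the MUSTTAIL macro are invented for this example. The macro expands to nothing on compilers that report neither spelling, in which case the calls are ordinary calls and stack growth is no longer bounded.

```cpp
#include <cassert>
#include <cstdint>

// Pick whichever spelling the compiler reports; fall back to nothing.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(gnu::musttail)
#  define MUSTTAIL [[gnu::musttail]]
# elif __has_cpp_attribute(clang::musttail)
#  define MUSTTAIL [[clang::musttail]]
# endif
#endif
#ifndef MUSTTAIL
# define MUSTTAIL
#endif

enum Op : std::uint8_t { OP_INC, OP_DBL, OP_HALT };

static long dispatch(const std::uint8_t *pc, long acc);

// Each handler does its work and tail-calls the dispatcher for the next
// opcode.  All functions share one signature.
static long op_inc(const std::uint8_t *pc, long acc) {
  MUSTTAIL return dispatch(pc + 1, acc + 1);
}

static long op_dbl(const std::uint8_t *pc, long acc) {
  MUSTTAIL return dispatch(pc + 1, acc * 2);
}

static long dispatch(const std::uint8_t *pc, long acc) {
  switch (*pc) {
    case OP_INC:
      MUSTTAIL return op_inc(pc, acc);
    case OP_DBL:
      MUSTTAIL return op_dbl(pc, acc);
    default:  // OP_HALT or anything unknown ends the run
      return acc;
  }
}

// Run a program that is terminated by OP_HALT, starting from zero.
long run(const std::uint8_t *program) {
  assert(program != nullptr);
  return dispatch(program, 0);
}
```

Running the program {OP_INC, OP_INC, OP_DBL, OP_INC, OP_HALT} takes the accumulator through 1, 2, 4 and 5.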

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail.

Passes bootstrap and full test.
 
gcc/c-family/ChangeLog:
 
 * c-attribs.cc (set_musttail_on_return): New function.

 * c-common.h (set_musttail_on_return): Declare new function.
 
gcc/cp/ChangeLog:
  
 PR c/83324

 * cp-tree.h (AGGR_INIT_EXPR_MUST_TAIL): Add.
 * parser.cc (cp_parser_statement): Handle musttail.
 (cp_parser_jump_statement): Ditto.
 * pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
 * semantics.cc (simplify_aggr_init_expr): Handle musttail.

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5adc7b775eaf..685f212683f4 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -672,6 +672,26 @@ attribute_takes_identifier_p (const_tree attr_id)
  return targetm.attribute_takes_identifier_p (attr_id);
  }
  
+/* Set a musttail attribute MUSTTAIL_P on return expression RETVAL

+   at LOC.  */
+
+void
+set_musttail_on_return (tree retval, location_t loc, bool musttail_p)
+{
+  if (retval && musttail_p)
+{
+  tree t = retval;
+  if (TREE_CODE (t) == TARGET_EXPR)
+   t = TARGET_EXPR_INITIAL (t);
+  if (TREE_CODE (t) != CALL_EXPR)
+   error_at (loc, "cannot tail-call: return value must be a call");
+  else
+   CALL_EXPR_MUST_TAIL_CALL (t) = 1;
+}
+  else if (musttail_p && !retval)
+error_at (loc, "cannot tail-call: return value must be a call");
+}
+
  /* Verify that argument value POS at position ARGNO to attribute NAME
 applied to function FN (which is either a function declaration or function
 type) refers to a function parameter at position POS and the expected type
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index adee822a3ae0..2510ee4dbc9d 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1648,6 +1648,7 @@ extern tree handle_noreturn_attribute (tree *, tree, 
tree, int, bool *);
  extern tree handle_musttail_attribute (tree *, tree, tree, int, bool *);
  extern bool has_attribute (location_t, tree, tree, tree (*)(tree));
  extern tree build_attr_access_from_parms (tree, bool);
+extern void set_musttail_on_return (tree, location_t, bool);
  
  /* In c-format.cc.  */

  extern bool valid_format_string_type_p (tree);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c6f102564ce0..67ba3274eb1b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4236,6 +4236,10 @@ templated_operator_saved_lookups (tree t)
  #define AGGR_INIT_FROM_THUNK_P(NODE) \
(AGGR_INIT_EXPR_CHECK (NODE)->base.protected_flag)
  
+/* Nonzero means that the call was marked musttail.  */

+#define AGGR_INIT_EXPR_MUST_TAIL(NODE) \
+  (AGGR_INIT_EXPR_CHECK (NODE)->base.static_flag)
+
  /* AGGR_INIT_EXPR accessors.  These are equivalent to the CALL_EXPR
 accessors, except for AGGR_INIT_EXPR_SLOT (which takes the place of
 CALL_EXPR_STATIC_CHAIN).  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index efd5d6f29a71..1fa0780944b6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2467,7 +2467,7 @@ static tree cp_parser_perform_range_for_lookup
  static tree cp_parser_range_for_member_function
(tree, tree);
  static tree cp_parser_jump_statement
-  (cp_parser *);
+  (cp_parser *, tree &);
  static void cp_parser_declaration_statement
(cp_parser *);
  
@@ -12757,7 +12757,7 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,

case RID_CO_RETURN:
case RID_GOTO:
  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
- statement = cp_parser_jump_statement (parser);
+ statement = cp_parser_jump_statement (parser, std_attr

Re: [PATCH] c++/modules: Ensure deduction guides are always reachable [PR115231]

2024-07-23 Thread Jason Merrill

On 6/15/24 10:29 PM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

This probably isn't the most efficient approach, since we need to do
name lookup to find deduction guides for a type which will also
potentially do a bunch of pointless lazy loading from imported modules,
but I wasn't able to work out a better approach without completely
reworking how deduction guides are stored and represented.


Indeed.  We likely want to find them more directly from the template; 
it's not clear to me that DECL_INITIAL is used for TEMPLATE_DECL, or we 
could put them in an internal attribute or a separate hash table.



-- >8 --

Deduction guides are represented as 'normal' functions currently, and
have no special handling in modules.  However, this causes some issues;
by [temp.deduct.guide] a deduction guide is not found by normal name
lookup and instead all reachable deduction guides for a class template
should be considered, but this does not happen currently.
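
For readers less familiar with deduction guides, here is a small single-file example (no modules involved) of a user-declared guide changing what class template argument deduction picks. Wrapper and the two helper functions are made up for illustration. The guide is never called by name; it is only consulted during deduction, which is why ordinary name-lookup visibility rules are a poor fit for it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

template <typename T>
struct Wrapper {
  T value;
  explicit Wrapper(T v) : value(std::move(v)) {}
};

// User-declared deduction guide: a string literal should deduce
// Wrapper<std::string> rather than Wrapper<const char *>.
Wrapper(const char *) -> Wrapper<std::string>;

inline std::string deduce_from_literal() {
  Wrapper w("guide");  // uses the guide above
  static_assert(std::is_same_v<decltype(w), Wrapper<std::string>>,
                "the user-declared guide should win");
  assert(!w.value.empty());
  return w.value;
}

inline int deduce_from_int() {
  Wrapper w(42);  // uses the implicit guide generated from the constructor
  static_assert(std::is_same_v<decltype(w), Wrapper<int>>,
                "the implicit guide should apply");
  return w.value;
}
```

If an importer could see Wrapper but not the guide, the first function would silently deduce a different type, which is the kind of problem the patch description is concerned with.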

To solve this, this patch ensures that all deduction guides are
considered exported to ensure that they are always visible to importers
if they are reachable.  Another alternative here would be to add a new
kind of "all reachable" flag to name lookup, but that is complicated by
some difficulties in handling GM entities; this may be a better way to
go if more kinds of entities end up needing this handling, however.

Another issue here is that because deduction guides are "unrelated"
functions, they will usually get discarded from the GMF, so this patch
ensures that when finding dependencies, GMF deduction guides will also
have bindings created.  We do this in find_dependencies so that we don't
unnecessarily create bindings for GMF deduction guides that are never
reached; for consistency we do this for *all* deduction guides, not just
GM ones.


If you fixed the dependency calculation, why do they also need to be 
exported?


Jason



Re: [PATCH] c++: fix wrong ambiguity resolution [PR29834]

2024-07-23 Thread Marek Polacek
On Tue, Jul 23, 2024 at 12:53:07AM -0400, Jason Merrill wrote:
> On 7/20/24 2:31 PM, Marek Polacek wrote:
> > [ Entering the contest to fix the oldest PR in this cycle. ]
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > This 18-year-old PR reports that we parse certain comma expressions
> > as a declaration rather than statement when the statement begins with
> > a functional-style cast expression.  Consider
> > 
> >int(x), 0;
> > 
> > which does not declare x--it only casts x to int--, whereas
> > 
> >int(x), (y);
> > 
> > declares x and y.  We need some kind of look-ahead to decide how we
> > should disambiguate the construct, because cp_parser_init_declarator
> > commits eagerly once it has seen "int(x)", and then it's too late to
> > recover.
> > 
> > This patch makes us try to parse the code as a sequence of declarators;
> > if that fails, we are likely looking at a statement.  That's a simple
> > idea, but it's complicated by code like
> > 
> >void (*p)(void *)(fun);
> > 
> > which initializes a pointer-to-function, or
> > 
> >int(x), (x) + 1;
> > 
> > which is an expression statement, but the second (x) is parsed as
> > a valid declarator, only the + after reveals that the whole thing
> > is an expression.  You can have things like
> > 
> >int(**p)
> > 
> > which by itself doesn't tell you much.  You can have
> > 
> >int(*q)(void*)
> > 
> > which looks like it starts with a functional-style cast, but it is not
> > a cast.  The simple
> > 
> >int(x) = 42;
> > 
> > has an initializer so it declares x; it is not an assignment.  But then,
> > 
> >  int(d) __attribute__(());
> > 
> > does not have an initializer, but the attribute makes it a declaration.
> > 
> > PR c++/29834
> > PR c++/54905
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (cp_parser_lambda_introducer): Use
> > cp_parser_next_token_starts_initializer_p.
> > (cp_parser_simple_declaration): Add look-ahead to decide if we're
> > looking at a declaration or statement.
> > (cp_parser_next_token_starts_initializer_p): New.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/parse/ambig15.C: New test.
> > * g++.dg/parse/ambig16.C: New test.
> > ---
> >   gcc/cp/parser.cc | 73 ++--
> >   gcc/testsuite/g++.dg/parse/ambig15.C | 83 
> >   gcc/testsuite/g++.dg/parse/ambig16.C | 18 ++
> >   3 files changed, 168 insertions(+), 6 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/parse/ambig15.C
> >   create mode 100644 gcc/testsuite/g++.dg/parse/ambig16.C
> > 
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index 1fa0780944b..797cfc3204e 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -2947,6 +2947,8 @@ static bool 
> > cp_parser_next_token_ends_template_argument_p
> > (cp_parser *);
> >   static bool cp_parser_nth_token_starts_template_argument_list_p
> > (cp_parser *, size_t);
> > +static bool cp_parser_next_token_starts_initializer_p
> > +  (cp_parser *);
> >   static enum tag_types cp_parser_token_is_class_key
> > (cp_token *);
> >   static enum tag_types cp_parser_token_is_type_parameter_key
> > @@ -11663,9 +11665,7 @@ cp_parser_lambda_introducer (cp_parser* parser, 
> > tree lambda_expr)
> > }
> > /* Find the initializer for this capture.  */
> > -  if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
> > - || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
> > - || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
> > +  if (cp_parser_next_token_starts_initializer_p (parser))
> > {
> >   /* An explicit initializer exists.  */
> >   if (cxx_dialect < cxx14)
> > @@ -11747,9 +11747,7 @@ cp_parser_lambda_introducer (cp_parser* parser, 
> > tree lambda_expr)
> >   /* If what follows is an initializer, the second '...' is
> >  invalid.  But for cases like [...xs...], the first one
> >  is invalid.  */
> > - if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
> > - || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
> > - || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
> > + if (cp_parser_next_token_starts_initializer_p (parser))
> > ellipsis_loc = loc;
> >   error_at (ellipsis_loc, "too many %<...%> in lambda capture");
> >   continue;
> > @@ -16047,6 +16045,58 @@ cp_parser_simple_declaration (cp_parser* parser,
> >   else
> > break;
> > +  /* If we are still uncommitted, we're probably looking at something like
> > + T(x), which can be a declaration but does not have to be, depending
> > + on what comes after.  Consider
> > +   int(x), 0;
> > + which is _not_ a declaration of x, it's a functional cast, and
> > +   int(x), (y);
> > + which declares x and y.  We need some kind of look-

[patch,avr] Implement PR116056: attribute signal(n) and interrupt(n)

2024-07-23 Thread Georg-Johann Lay

This patch adds support for arguments to the signal and interrupt
function attributes.  It allows the ISR to be specified by means of the
associated IRQ number, in addition to the current attributes, which
require the ISR name, like "__vector_1", to be used as the (assembly) name
of the function.  The new feature is more convenient, e.g. when the
ISR is implemented by a class method or in a namespace.  There is no
requirement that the ISR is externally visible.  The syntax is like:

__attribute__((signal(1, 2, ...), signal(3, 4, ...)))
[static] void isr_function (void)
{
// Code
}
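
A sketch of how user code might adopt the new form while staying buildable elsewhere. It relies on the __HAVE_SIGNAL_N__ built-in macro that this patch defines. The ISR_N macro, the timer namespace, the IRQ numbers 1 and 2 and the host-only simulate_irq helper are hypothetical names chosen for the example.

```cpp
#include <cassert>

// Use signal(n, ...) only where the compiler advertises support for it.
#if defined(__AVR__) && defined(__HAVE_SIGNAL_N__)
# define ISR_N(...) __attribute__((signal(__VA_ARGS__)))
#else
# define ISR_N(...) /* not available: plain function */
#endif

namespace timer {

static volatile unsigned ticks = 0;

// ISR living in a namespace with internal linkage; the IRQ numbers are
// placeholders and depend on the actual device.
ISR_N(1, 2)
static void on_tick() { ticks = ticks + 1; }

#if !defined(__AVR__)
// Host builds only: stand-in for the hardware raising the interrupt.
inline unsigned simulate_irq(unsigned times) {
  assert(times > 0);
  for (unsigned i = 0; i < times; ++i)
    on_tick();
  return ticks;
}
#endif

}  // namespace timer
```

On a host build the attribute disappears and the handler is an ordinary function, so the counting logic can be unit-tested off-target.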

Ok for trunk?

Johann

--

AVR target 116056 - Support attribute signal(n) and interrupt(n).

This patch adds support for arguments to the signal and interrupt
function attributes.  It allows the ISR to be specified by means of the
associated IRQ number, in addition to the current attributes, which
require the ISR name, like "__vector_1", to be used as the (assembly) name
of the function.  The new feature is more convenient, e.g. when the
ISR is implemented by a class method or in a namespace.  There is no
requirement that the ISR is externally visible.  The syntax is like:

__attribute__((signal(1, 2, ...), signal(3, 4, ...)))
[static] void isr_function (void)
{
// Code
}

PR target/116056
gcc/
* config/avr/avr.h (ASM_DECLARE_FUNCTION_NAME): New define.
* config/avr/avr-protos.h (avr_declare_function_name): New proto.
* config/avr/avr-c.cc (avr_cpu_cpp_builtins) <__HAVE_SIGNAL_N__>: New
built-in macro.
* config/avr/avr.cc (avr_declare_function_name): New function.
(avr_attribute_table) : Allow any number of args.
(avr_insert_attributes): Check validity of "signal" and "interrupt"
arguments.
(avr_foreach_function_attribute, avr_interrupt_signal_function)
(avr_isr_number, avr_asm_isr_alias, avr_handle_isr_attribute):
New static functions.
(avr_interrupt_function): New from avr_interrupt_function_p.
Adjust callers.
(avr_signal_function): New from avr_signal_function_p.
Adjust callers.
(avr_set_current_function): Only diagnose non-__vector ISR names
when "signal" or "interrupt" attribute has no args.
(struct avr_fun_cookie): New.
* doc/extend.texi (AVR Function Attributes): Document
signal(num) and interrupt(num).
* doc/invoke.texi (AVR Built-in Macros) <__HAVE_SIGNAL_N__>: Document.
gcc/testsuite/
* gcc.target/avr/torture/signal_n-1.c: New test.
* gcc.target/avr/torture/signal_n-2.c: New test.
* gcc.target/avr/torture/signal_n-3.c: New test.
* gcc.target/avr/torture/signal_n-4.cpp: New test.

diff --git a/gcc/config/avr/avr-c.cc b/gcc/config/avr/avr-c.cc
index 5e7f759ed73..2c5cfb34df6 100644
--- a/gcc/config/avr/avr-c.cc
+++ b/gcc/config/avr/avr-c.cc
@@ -391,6 +391,9 @@ avr_cpu_cpp_builtins (struct cpp_reader *pfile)
   cpp_define (pfile, "__WITH_AVRLIBC__");
 #endif /* WITH_AVRLIBC */
 
+  // We support __attribute__((signal (n1, n2, ...))).
+  cpp_define (pfile, "__HAVE_SIGNAL_N__");
+
   // From configure --with-libf7={|libgcc|math|math-symbols|yes|no}
 
 #ifdef WITH_LIBF7_LIBGCC
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 5fdb1305757..7b666f17718 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -35,6 +35,7 @@ extern void avr_init_expanders (void);
 #ifdef TREE_CODE
 extern void avr_asm_output_aligned_decl_common (FILE*, tree, const char*, unsigned HOST_WIDE_INT, unsigned int, bool);
 extern void avr_asm_asm_output_aligned_bss (FILE *, tree, const char *, unsigned HOST_WIDE_INT, int, void (*) (FILE *, tree, const char *, unsigned HOST_WIDE_INT, int));
+extern void avr_declare_function_name (FILE *, const char *, tree);
 extern void asm_output_external (FILE *file, tree decl, char *name);
 extern int avr_progmem_p (tree decl, tree attributes);
 extern bool avr_addr_space_supported_p (addr_space_t, location_t loc = UNKNOWN_LOCATION);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index e941730452e..7f82858659f 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -1356,6 +1356,33 @@ avr_lookup_function_attribute1 (const_tree func, const char *name)
   return NULL_TREE != lookup_attribute (name, TYPE_ATTRIBUTES (func));
 }
 
+
+/* Call WORKER on all NAME attributes of function FUNC.  */
+
+static void
+avr_foreach_function_attribute (tree func, const char *name,
+void (*worker) (tree, tree, void *),
+void *cookie)
+{
+  tree attrs = NULL_TREE;
+
+  if (TREE_CODE (func) == FUNCTION_DECL)
+attrs = DECL_ATTRIBUTES (func);
+  else if (FUNC_OR_METHOD_TYPE_P (func))
+attrs = TYPE_ATTRIBUTES (TREE_TYPE (func));
+
+  while (attrs)
+{
+  attrs = lookup_attribute (name, attrs);
+  if (attrs)
+	{
+	  worker (func, attrs, cookie);
+	  attrs = TREE_CHAIN (attrs);
+	}
+}
+}
+
+
 /* Return nonzero if FUNC is a naked function.  */
 
 static 

Re: [PATCH] c++/modules: Stream warning suppressions [PR115757]

2024-07-23 Thread Jason Merrill

On 7/7/24 12:39 AM, Nathaniel Shead wrote:

Bootstrapped on x86_64-pc-linux-gnu, successfully regtested modules.exp;
OK for trunk if full regtest passes?


Patrick, I assume this change won't mess with your streaming optimizations?

OK with Patrick's approval or on Friday, whichever comes first.


-- >8 --

Currently we don't stream the contents of 'nowarn_map'; this means that
warning suppressions don't get applied in importers, which is
particularly relevant for templates (as in the linked testcase).

Rather than streaming the whole contents of 'nowarn_map', this patch
instead just streams the exported suppressions for each tree node
individually, to not build up additional locations and suppressions for
tree nodes that do not need to be streamed.
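
The "put" behaviour added in the diagnostic-spec.cc hunk of this patch (an empty bit set removes the entry, anything else stores it) can be modelled with a small stand-alone class. This is a toy: ToyNowarnMap and Location are invented names, and std::unordered_map stands in for GCC's own hash map.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Location = std::uint32_t;  // stand-in for location_t

// Maps a source location to a bit set of suppressed warning groups.
class ToyNowarnMap {
 public:
  // Make the disposition for LOC match BITS exactly: zero bits means
  // "nothing suppressed", so any existing entry is dropped.
  void put(Location loc, unsigned bits) {
    assert(loc != 0);  // toy equivalent of rejecting reserved locations
    if (bits == 0)
      map_.erase(loc);
    else
      map_[loc] = bits;
  }

  bool has(Location loc) const { return map_.count(loc) != 0; }

  unsigned get(Location loc) const {
    auto it = map_.find(loc);
    return it == map_.end() ? 0u : it->second;
  }

 private:
  std::unordered_map<Location, unsigned> map_;
};
```

Streaming per node then amounts to writing get(loc) for nodes where has(loc) is true and calling put(loc, bits) on the importing side, which avoids copying entries for locations that are never exported.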

PR c++/115757

gcc/cp/ChangeLog:

* module.cc (trees_out::core_vals): Write warning specs for
DECLs and EXPRs.
(trees_in::core_vals): Read warning specs.

gcc/ChangeLog:

* tree.h (put_warning_spec_at): Declare new function.
(has_warning_spec): Likewise.
(get_warning_spec): Likewise.
(put_warning_spec): Likewise.
* diagnostic-spec.h (nowarn_spec_t::from_bits): New function.
* diagnostic-spec.cc (put_warning_spec_at): New function.
* warning-control.cc (has_warning_spec): New function.
(get_warning_spec): New function.
(put_warning_spec): New function.

gcc/testsuite/ChangeLog:

* g++.dg/modules/warn-spec-1_a.C: New test.
* g++.dg/modules/warn-spec-1_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc | 12 +
  gcc/diagnostic-spec.cc   | 21 
  gcc/diagnostic-spec.h|  7 ++
  gcc/testsuite/g++.dg/modules/warn-spec-1_a.C | 10 
  gcc/testsuite/g++.dg/modules/warn-spec-1_b.C |  8 ++
  gcc/tree.h   |  9 +++
  gcc/warning-control.cc   | 26 
  7 files changed, 93 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-1_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-1_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index dc5d046f04d..0f9a689dbec 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6000,6 +6000,10 @@ trees_out::core_vals (tree t)
  
if (state)

state->write_location (*this, t->decl_minimal.locus);
+
+  if (streaming_p ())
+   if (has_warning_spec (t))
+ u (get_warning_spec (t));
  }
  
if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))

@@ -6113,6 +6117,10 @@ trees_out::core_vals (tree t)
if (state)
state->write_location (*this, t->exp.locus);
  
+  if (streaming_p ())

+   if (has_warning_spec (t))
+ u (get_warning_spec (t));
+
/* Walk in forward order, as (for instance) REQUIRES_EXPR has a
   bunch of unscoped parms on its first operand.  It's safer to
   create those in order.  */
@@ -6576,6 +6584,8 @@ trees_in::core_vals (tree t)
/* Don't zap the locus just yet, we don't record it correctly
 and thus lose all location information.  */
t->decl_minimal.locus = state->read_location (*this);
+  if (has_warning_spec (t))
+   put_warning_spec (t, u ());
  }
  
if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))

@@ -6654,6 +6664,8 @@ trees_in::core_vals (tree t)
if (CODE_CONTAINS_STRUCT (code, TS_EXP))
  {
t->exp.locus = state->read_location (*this);
+  if (has_warning_spec (t))
+   put_warning_spec (t, u ());
  
bool vl = TREE_CODE_CLASS (code) == tcc_vl_exp;

for (unsigned limit = (vl ? VL_EXP_OPERAND_LENGTH (t)
diff --git a/gcc/diagnostic-spec.cc b/gcc/diagnostic-spec.cc
index 996ad6b273a..addaf089f03 100644
--- a/gcc/diagnostic-spec.cc
+++ b/gcc/diagnostic-spec.cc
@@ -179,6 +179,27 @@ suppress_warning_at (location_t loc, opt_code opt /* = 
all_warnings */,
return true;
  }
  
+/* Change the warning disposition for LOC to match OPTSPEC.  */

+
+void
+put_warning_spec_at (location_t loc, unsigned bits)
+{
+  gcc_checking_assert (!RESERVED_LOCATION_P (loc));
+
+  nowarn_spec_t optspec = nowarn_spec_t::from_bits (bits);
+  if (!optspec)
+{
+  if (nowarn_map)
+   nowarn_map->remove (loc);
+}
+  else
+{
+  if (!nowarn_map)
+   nowarn_map = nowarn_map_t::create_ggc (32);
+  nowarn_map->put (loc, optspec);
+}
+}
+
  /* Copy the no-warning disposition from one location to another.  */
  
  void

diff --git a/gcc/diagnostic-spec.h b/gcc/diagnostic-spec.h
index 22d4c067158..0b155a5cde3 100644
--- a/gcc/diagnostic-spec.h
+++ b/gcc/diagnostic-spec.h
@@ -56,6 +56,13 @@ public:
  
nowarn_spec_t (opt_code);
  
+  static nowarn_spec_t from_bits (unsigned bits)

+  {
+nowarn_spec_t spec;
+spec.m_bits = bits;
+return spec;
+  }
+
/* Return the raw bitset.  */
operator unsign

Re: [PATCH] c++, contracts: Ensure return statements on checkers.

2024-07-23 Thread Jason Merrill

On 6/17/24 8:14 AM, Iain Sandoe wrote:

This is a minor tidy-up, tested on x86_64-darwin,
OK for trunk?
thanks
Iain

--- 8< ---

At present, for pre-conditions and for post-conditions with a void
return, we are not emitting a return statement. This patch adds the
relevant return statements.


Hmm, why would we need a return statement in a void function?


gcc/cp/ChangeLog:

* contracts.cc (finish_function_contracts): Add return
statements to pre-condition and void post-condition
checking functions.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/contracts.cc | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index 634e3cf4fa9..0822624a910 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -2052,6 +2052,7 @@ finish_function_contracts (tree fndecl)
DECL_PENDING_INLINE_P (pre) = false;
start_preparsed_function (pre, DECL_ATTRIBUTES (pre), flags);
remap_and_emit_conditions (fndecl, pre, PRECONDITION_STMT);
+  finish_return_stmt (NULL_TREE);
tree finished_pre = finish_function (false);
expand_or_defer_fn (finished_pre);
  }
@@ -2065,6 +2066,8 @@ finish_function_contracts (tree fndecl)
remap_and_emit_conditions (fndecl, post, POSTCONDITION_STMT);
if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (post))))
finish_return_stmt (get_postcondition_result_parameter (fndecl));
+  else
+   finish_return_stmt (NULL_TREE);
  
tree finished_post = finish_function (false);

expand_or_defer_fn (finished_post);




[pushed] doc: add missing @option for musttail

2024-07-23 Thread Marek Polacek
I'll push this as obvious soon.

-- >8 --
gcc/ChangeLog:

* doc/extend.texi: Add missing @option.
---
 gcc/doc/extend.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b0273927b25..66c99ef7a66 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9995,8 +9995,8 @@ If the compiler cannot generate a @code{musttail} tail 
call it will report
 an error. On some targets tail calls may never be supported.
 Tail calls cannot reference locals in memory, which may affect
 builds without optimization when passing small structures, or passing
-or returning large structures. Enabling -O1 or -O2 can improve
-the success of tail calls.
+or returning large structures.  Enabling @option{-O1} or @option{-O2} can
+improve the success of tail calls.
 @end table
 
 @node Attribute Syntax

base-commit: 8daae81113eeff37b4ae2e08a9797295fbc8b81e
-- 
2.45.2



Re: [PATCH] c++: fix wrong ambiguity resolution [PR29834]

2024-07-23 Thread Jason Merrill

On 7/23/24 4:18 PM, Marek Polacek wrote:

On Tue, Jul 23, 2024 at 12:53:07AM -0400, Jason Merrill wrote:

On 7/20/24 2:31 PM, Marek Polacek wrote:

[ Entering the contest to fix the oldest PR in this cycle. ]

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This 18-year-old PR reports that we parse certain comma expressions
as a declaration rather than statement when the statement begins with
a functional-style cast expression.  Consider

int(x), 0;

which does not declare x--it only casts x to int--, whereas

int(x), (y);

declares x and y.  We need some kind of look-ahead to decide how we
should disambiguate the construct, because cp_parser_init_declarator
commits eagerly once it has seen "int(x)", and then it's too late to
recover.

This patch makes us try to parse the code as a sequence of declarators;
if that fails, we are likely looking at a statement.  That's a simple
idea, but it's complicated by code like

void (*p)(void *)(fun);

which initializes a pointer-to-function, or

int(x), (x) + 1;

which is an expression statement, but the second (x) is parsed as
a valid declarator, only the + after reveals that the whole thing
is an expression.  You can have things like

int(**p)

which by itself doesn't tell you much.  You can have

int(*q)(void*)

which looks like it starts with a functional-style cast, but it is not
a cast.  The simple

int(x) = 42;

has an initializer so it declares x; it is not an assignment.  But then,

  int(d) __attribute__(());

does not have an initializer, but the attribute makes it a declaration.
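
The forms that are unambiguously declarations can be collected into a compilable file. The sketch deliberately leaves the PR's problem case, "int(x), 0;", in a comment only, since compilers without this fix may reject it. The function names are invented, and the pragma merely silences warnings about the redundant parentheses.

```cpp
#include <cassert>

#pragma GCC diagnostic ignored "-Wparentheses"

static int twice(int v) { return 2 * v; }

// "int(x) = 42;" has an initializer, so it declares x.
int declaration_with_initializer() {
  int(x) = 42;
  return x;
}

// "int(a), (b);" is a declaration of both a and b.
int two_declarators() {
  int(a), (b);
  a = 1;
  b = 2;
  return a + b;
}

// "int (*q)(int)(twice);" declares a pointer to function and
// initializes it with twice.
int pointer_to_function() {
  int (*q)(int)(twice);
  assert(q != nullptr);
  return q(21);
}

// By contrast, with an 'x' already in scope,
//   int(x), 0;
// is an expression statement (a cast followed by a comma operator),
// which is the case the PR is about.
```

Each function only compiles if the parser treats its first statement as a declaration, so the file doubles as a quick check of the easy half of the disambiguation.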

PR c++/29834
PR c++/54905

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_introducer): Use
cp_parser_next_token_starts_initializer_p.
(cp_parser_simple_declaration): Add look-ahead to decide if we're
looking at a declaration or statement.
(cp_parser_next_token_starts_initializer_p): New.

gcc/testsuite/ChangeLog:

* g++.dg/parse/ambig15.C: New test.
* g++.dg/parse/ambig16.C: New test.
---
   gcc/cp/parser.cc | 73 ++--
   gcc/testsuite/g++.dg/parse/ambig15.C | 83 
   gcc/testsuite/g++.dg/parse/ambig16.C | 18 ++
   3 files changed, 168 insertions(+), 6 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/parse/ambig15.C
   create mode 100644 gcc/testsuite/g++.dg/parse/ambig16.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1fa0780944b..797cfc3204e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2947,6 +2947,8 @@ static bool cp_parser_next_token_ends_template_argument_p
 (cp_parser *);
   static bool cp_parser_nth_token_starts_template_argument_list_p
 (cp_parser *, size_t);
+static bool cp_parser_next_token_starts_initializer_p
+  (cp_parser *);
   static enum tag_types cp_parser_token_is_class_key
 (cp_token *);
   static enum tag_types cp_parser_token_is_type_parameter_key
@@ -11663,9 +11665,7 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
}
 /* Find the initializer for this capture.  */
-  if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
- || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
- || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+  if (cp_parser_next_token_starts_initializer_p (parser))
{
  /* An explicit initializer exists.  */
  if (cxx_dialect < cxx14)
@@ -11747,9 +11747,7 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
  /* If what follows is an initializer, the second '...' is
 invalid.  But for cases like [...xs...], the first one
 is invalid.  */
- if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
- || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
- || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+ if (cp_parser_next_token_starts_initializer_p (parser))
ellipsis_loc = loc;
  error_at (ellipsis_loc, "too many %<...%> in lambda capture");
  continue;
@@ -16047,6 +16045,58 @@ cp_parser_simple_declaration (cp_parser* parser,
   else
 break;
+  /* If we are still uncommitted, we're probably looking at something like
+ T(x), which can be a declaration but does not have to be, depending
+ on what comes after.  Consider
+   int(x), 0;
+ which is _not_ a declaration of x, it's a functional cast, and
+   int(x), (y);
+ which declares x and y.  We need some kind of look-ahead to decide,
+ cp_parser_init_declarator below will commit eagerly once it has seen
+ "int(x)".  So we try to parse this as a sequence of declarators; if
+ that fails, we are likely looking at a statement.  (We could avoid
+ all of this if there is no non-nested comma.)  */


Unfortunately, this se

Re: [PATCH] c++/modules: Stream warning suppressions [PR115757]

2024-07-23 Thread Patrick Palka
On Tue, 23 Jul 2024, Jason Merrill wrote:

> On 7/7/24 12:39 AM, Nathaniel Shead wrote:
> > Bootstrapped on x86_64-pc-linux-gnu, successfully regtested modules.exp;
> > OK for trunk if full regtest passes?
> 
> Patrick, I assume this change won't mess with your streaming optimizations?

Should be fine, those optimizations are currently confined to
tree_node_bools where we stream many consecutive bools (and so we want
to avoid conditionally streaming a bit, so that the bit buffer position
of each streamed bit is statically known).  I don't think the technique
can really apply to core_vals since it streams mostly trees.
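
For readers less familiar with the streamer, the patch below follows the usual "write the payload only when a flag says so, read it back under the same condition" pattern. A rough sketch of that idea, under loud assumptions: Node, Stream, write_node and read_node are hypothetical stand-ins, not module.cc APIs, and the flag is streamed explicitly here, whereas in the patch has_warning_spec (t) is evaluated on the node itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tree node: a flag plus an optional payload.
struct Node {
  bool has_spec = false;   // plays the role of has_warning_spec (t)
  std::uint32_t spec = 0;  // plays the role of get_warning_spec (t)
};

// Hypothetical stream: an append-only list of unsigned values with a cursor.
struct Stream {
  std::vector<std::uint32_t> data;
  std::size_t pos = 0;
  void u(std::uint32_t v) { data.push_back(v); }  // writer side
  std::uint32_t u() {                             // reader side
    assert(pos < data.size());
    return data[pos++];
  }
};

// The flag is always written; the payload only when the flag is set.
void write_node(Stream &s, const Node &n) {
  s.u(n.has_spec ? 1u : 0u);
  if (n.has_spec)
    s.u(n.spec);
}

// The reader must test exactly the same condition as the writer,
// otherwise every later value in the stream is misinterpreted.
Node read_node(Stream &s) {
  Node n;
  n.has_spec = s.u() != 0;
  if (n.has_spec)
    n.spec = s.u();
  return n;
}
```

Because the payload is conditional, its position in the stream depends on earlier data, which is why this style does not combine with the fixed-position bit packing mentioned above.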

> 
> OK with Patrick's approval or on Friday, whichever comes first.
> 
> > -- >8 --
> > 
> > Currently we don't stream the contents of 'nowarn_map'; this means that
> > warning suppressions don't get applied in importers, which is
> > particularly relevant for templates (as in the linked testcase).
> > 
> > Rather than streaming the whole contents of 'nowarn_map', this patch
> > instead just streams the exported suppressions for each tree node
> > individually, to not build up additional locations and suppressions for
> > tree nodes that do not need to be streamed.
> > 
> > PR c++/115757
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (trees_out::core_vals): Write warning specs for
> > DECLs and EXPRs.
> > (trees_in::core_vals): Read warning specs.
> > 
> > gcc/ChangeLog:
> > 
> > * tree.h (put_warning_spec_at): Declare new function.
> > (has_warning_spec): Likewise.
> > (get_warning_spec): Likewise.
> > (put_warning_spec): Likewise.
> > * diagnostic-spec.h (nowarn_spec_t::from_bits): New function.
> > * diagnostic-spec.cc (put_warning_spec_at): New function.
> > * warning-control.cc (has_warning_spec): New function.
> > (get_warning_spec): New function.
> > (put_warning_spec): New function.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/warn-spec-1_a.C: New test.
> > * g++.dg/modules/warn-spec-1_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/module.cc | 12 +
> >   gcc/diagnostic-spec.cc   | 21 
> >   gcc/diagnostic-spec.h|  7 ++
> >   gcc/testsuite/g++.dg/modules/warn-spec-1_a.C | 10 
> >   gcc/testsuite/g++.dg/modules/warn-spec-1_b.C |  8 ++
> >   gcc/tree.h   |  9 +++
> >   gcc/warning-control.cc   | 26 
> >   7 files changed, 93 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-1_a.C
> >   create mode 100644 gcc/testsuite/g++.dg/modules/warn-spec-1_b.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index dc5d046f04d..0f9a689dbec 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -6000,6 +6000,10 @@ trees_out::core_vals (tree t)
> >   if (state)
> > state->write_location (*this, t->decl_minimal.locus);
> > +
> > +  if (streaming_p ())
> > +   if (has_warning_spec (t))
> > + u (get_warning_spec (t));
> >   }
> >   if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))
> > @@ -6113,6 +6117,10 @@ trees_out::core_vals (tree t)
> > if (state)
> > state->write_location (*this, t->exp.locus);
> >   +  if (streaming_p ())
> > +   if (has_warning_spec (t))
> > + u (get_warning_spec (t));
> > +
> > /* Walk in forward order, as (for instance) REQUIRES_EXPR has a
> >bunch of unscoped parms on its first operand.  It's safer to
> >create those in order.  */
> > @@ -6576,6 +6584,8 @@ trees_in::core_vals (tree t)
> > /* Don't zap the locus just yet, we don't record it correctly
> >  and thus lose all location information.  */
> > t->decl_minimal.locus = state->read_location (*this);
> > +  if (has_warning_spec (t))
> > +   put_warning_spec (t, u ());
> >   }
> >   if (CODE_CONTAINS_STRUCT (code, TS_TYPE_COMMON))
> > @@ -6654,6 +6664,8 @@ trees_in::core_vals (tree t)
> > if (CODE_CONTAINS_STRUCT (code, TS_EXP))
> >   {
> > t->exp.locus = state->read_location (*this);
> > +  if (has_warning_spec (t))
> > +   put_warning_spec (t, u ());
> >   bool vl = TREE_CODE_CLASS (code) == tcc_vl_exp;
> > for (unsigned limit = (vl ? VL_EXP_OPERAND_LENGTH (t)
> > diff --git a/gcc/diagnostic-spec.cc b/gcc/diagnostic-spec.cc
> > index 996ad6b273a..addaf089f03 100644
> > --- a/gcc/diagnostic-spec.cc
> > +++ b/gcc/diagnostic-spec.cc
> > @@ -179,6 +179,27 @@ suppress_warning_at (location_t loc, opt_code opt /* =
> > all_warnings */,
> > return true;
> >   }
> >   +/* Change the warning disposition for LOC to match OPTSPEC.  */
> > +
> > +void
> > +put_warning_spec_at (location_t loc, unsigned bits)
> > +{
> > +  gcc_checking_assert (!RESERVED_LOCATION_P (loc));
> > +
> > +  nowarn_spec_t optspec = nowarn_spec_t::from_bits (bits);
> > + 

Re: [PATCH] c++: array new with value-initialization, again [PR115645]

2024-07-23 Thread Jason Merrill

On 7/17/24 5:33 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


Hmm, I thought I had replied to this already.


-- >8 --
Unfortunately, my r15-1946 fix broke the attached testcase.  In it,
we no longer go into the
   /* P1009: Array size deduction in new-expressions.  */
block, and instead generate an operator new [] call along with a loop
in build_new_1, which we can't constexpr-evaluate.  So this patch
reverts r15-1946 and uses CONSTRUCTOR_IS_PAREN_INIT to distinguish
between () and {} to fix the original testcase (anew7.C).
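
For context, a minimal sketch of the two initialization forms being distinguished (plain ints only, so it does not exercise the explicit-constructor case from the PR; the function names are made up for the example):

```cpp
#include <cassert>
#include <memory>

// "new int[n]()" is value-initialization: every element starts as zero.
int sum_value_initialized(int n) {
  assert(n > 0);
  std::unique_ptr<int[]> p(new int[n]());
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += p[i];
  return sum;
}

// "new int[3]{1, 2}" is list-initialization: the listed elements are
// taken from the braces and the remaining one is zero.
int sum_brace_initialized() {
  std::unique_ptr<int[]> p(new int[3]{1, 2});
  return p[0] + p[1] + p[2];
}
```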

PR c++/115645

gcc/cp/ChangeLog:

* call.cc (convert_like_internal) : Don't report errors
about calling an explicit constructor when the constructor was marked
CONSTRUCTOR_IS_PAREN_INIT.
* init.cc (build_new): Revert r15-1946.  Set CONSTRUCTOR_IS_PAREN_INIT.
(build_vec_init): Maybe set CONSTRUCTOR_IS_PAREN_INIT.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-new23.C: New test.
---
  gcc/cp/call.cc   |  2 ++
  gcc/cp/init.cc   | 17 -
  gcc/testsuite/g++.dg/cpp2a/constexpr-new23.C | 38 
  3 files changed, 49 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-new23.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index a5d3426b70c..2d94d5e0d07 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -8592,6 +8592,8 @@ convert_like_internal (conversion *convs, tree expr, tree 
fn, int argnum,
&& BRACE_ENCLOSED_INITIALIZER_P (expr)
/* Unless this is for direct-list-initialization.  */
&& (!CONSTRUCTOR_IS_DIRECT_INIT (expr) || convs->need_temporary_p)
+   /* And it wasn't a ()-init.  */
+   && !CONSTRUCTOR_IS_PAREN_INIT (expr)
/* And in C++98 a default constructor can't be explicit.  */
&& cxx_dialect >= cxx11)
  {
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index e9561c146d7..4138a6077dd 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4005,20 +4005,17 @@ build_new (location_t loc, vec<tree, va_gc> 
**placement, tree type,
/* P1009: Array size deduction in new-expressions.  */
const bool array_p = TREE_CODE (type) == ARRAY_TYPE;
if (*init
-  /* If the array didn't specify its bound, we have to deduce it.  */
-  && ((array_p && !TYPE_DOMAIN (type))
- /* For C++20 array with parenthesized-init, we have to process
-the parenthesized-list.  But don't do it for (), which is
-value-initialization, and INIT should stay empty.  */
- || (cxx_dialect >= cxx20
- && (array_p || nelts)
- && !(*init)->is_empty ())))
+  /* If ARRAY_P, we have to deduce the array bound.  For C++20 paren-init,
+we have to process the parenthesized-list.  But don't do it for (),
+which is value-initialization, and INIT should stay empty.  */
+  && (array_p || (cxx_dialect >= cxx20 && nelts && !(*init)->is_empty ())))
  {
/* This means we have 'new T[]()'.  */
if ((*init)->is_empty ())
{
  tree ctor = build_constructor (init_list_type_node, NULL);
  CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ CONSTRUCTOR_IS_PAREN_INIT (ctor) = true;
  vec_safe_push (*init, ctor);
}
tree &elt = (**init)[0];
@@ -4735,6 +4732,9 @@ build_vec_init (tree base, tree maxindex, tree init,
bool do_static_init = (DECL_P (obase) && TREE_STATIC (obase));
  
bool empty_list = false;

+  const bool paren_init_p = (init
+&& TREE_CODE (init) == CONSTRUCTOR
+&& CONSTRUCTOR_IS_PAREN_INIT (init));


I think rather than recognizing paren-init in general, we want to 
recognize () specifically, and set explicit_value_init_p...



if (init && BRACE_ENCLOSED_INITIALIZER_P (init)
&& CONSTRUCTOR_NELTS (init) == 0)
  /* Skip over the handling of non-empty init lists.  */
@@ -4927,6 +4927,7 @@ build_vec_init (tree base, tree maxindex, tree init,
  || TREE_CODE (type) == ARRAY_TYPE))
{
  init = build_constructor (init_list_type_node, NULL);
+ CONSTRUCTOR_IS_PAREN_INIT (init) = paren_init_p;
}
  else
{


...by taking the else branch here.  Then we shouldn't need the 
convert_like change.


Jason



Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, variants

2024-07-23 Thread Peter Bergner
On 7/19/24 3:04 PM, Carl Love wrote:
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 5af9bf920a2..2a18ee44526 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> 
>  (define_insn "vsdb_"
> - [(set (match_operand:VI2 0 "register_operand" "=v")
> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> -   (match_operand:VI2 2 "register_operand" "v")
> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> +   (match_operand:VEC_IC 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_12_operand" "n")]
>VSHIFT_DBL_LR))]
>"TARGET_POWER10"

I know the old code used the register_operand predicate for the vector
operands, but those really should be changed to altivec_register_operand.

Peter




Re: [PATCH v2] MATCH: Add simplification for MAX and MIN to match.pd [PR109878]

2024-07-23 Thread Marc Glisse
A few ideas in case you want to generalize the transformation (these are 
not requirements to get your patch in, and this is not a review):


On Fri, 19 Jul 2024, Eikansh Gupta wrote:


--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4321,6 +4321,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@0
@2)))

+/* min (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST0 */
+/* max (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST1 */
+/* If signed a, then both the constants should have same sign. */
+(for minmax (min max)
+ (simplify
+  (minmax (bit_and@3 @0 INTEGER_CST@1) (bit_and@4 @0 INTEGER_CST@2))
+   (if (TYPE_UNSIGNED (type)
+|| (tree_int_cst_sgn (@1) == tree_int_cst_sgn (@2)))
+(with { auto andvalue = wi::to_wide (@1) & wi::to_wide (@2); }
+ (if (andvalue == ((minmax == MIN_EXPR)
+   ? wi::to_wide (@1) : wi::to_wide (@2)))
+  @3
+  (if (andvalue == ((minmax != MIN_EXPR)
+? wi::to_wide (@1) : wi::to_wide (@2)))
+   @4))


Since max(a&1,a&3) is a&3, I think in the signed case we could also 
replace max(a&N,a&3) with a&3 if N is 1 | sign-bit (i.e. -1u/2+2). Indeed, 
either a>=0 and a&N is a&1, or a<0 and a&N < 0 <= a&3.
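
A brute-force sanity check of both claims on 8-bit values (a sketch only, not a proof for wider types; the function names are made up, and the signed case models 8-bit values as sign-extended ints):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned rule from the patch, restated: when (c0 & c1) == c0, the bits
// of a & c0 are a subset of the bits of a & c1, so min picks a & c0 and
// max picks a & c1.
bool unsigned_rule_holds() {
  for (unsigned c0 = 0; c0 < 256; ++c0)
    for (unsigned c1 = 0; c1 < 256; ++c1) {
      if ((c0 & c1) != c0)
        continue;
      for (unsigned a = 0; a < 256; ++a) {
        if (std::min(a & c0, a & c1) != (a & c0))
          return false;
        if (std::max(a & c0, a & c1) != (a & c1))
          return false;
      }
    }
  return true;
}

// Signed suggestion: with N = sign-bit | 1, max (a & N, a & 3) == a & 3.
bool signed_suggestion_holds() {
  const int N = INT8_MIN | 1;  // -127, i.e. 0x81 as an 8-bit pattern
  for (int a = INT8_MIN; a <= INT8_MAX; ++a) {
    const int lhs = a & N;  // a & 1 when a >= 0, negative when a < 0
    const int rhs = a & 3;  // always in [0, 3]
    if (std::max(lhs, rhs) != rhs)
      return false;
  }
  return true;
}
```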



+/* min (a, a & CST) --> a & CST */
+/* max (a, a & CST) --> a */
+(for minmax (min max)
+ (simplify
+  (minmax @0 (bit_and@1 @0 INTEGER_CST@2))


Why do you require that @2 be a constant?


+   (if (TYPE_UNSIGNED(type))
+(if (minmax == MIN_EXPR)
+ @1
+ @0


Do we already have the corresponding transformations for comparisons?

a&b <= a --> true (if unsigned)
etc

Ideally, we would have **1** transformation for max(X,Y) that tries to 
fold X<=Y and if it folds to true then simplifies to Y. This way the 
transformations would only need to be written for comparisons, not minmax.
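
As a quick check of the underlying comparison, here is a sketch (the function name is made up) that exhaustively verifies a&b <= a for 8-bit unsigned operands, with b not required to be a constant, and the min/max results that follow from it:

```cpp
#include <algorithm>
#include <cassert>

// For unsigned operands, a & b can only clear bits of a, so a & b <= a.
// Once that comparison is known, min (a, a & b) is a & b and
// max (a, a & b) is a.
bool and_comparison_folds() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      const unsigned masked = a & b;
      if (!(masked <= a))
        return false;
      if (std::min(a, masked) != masked)
        return false;
      if (std::max(a, masked) != a)
        return false;
    }
  return true;
}
```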


--
Marc Glisse


Re: [PATCH] c++: fix wrong ambiguity resolution [PR29834]

2024-07-23 Thread Marek Polacek
On Tue, Jul 23, 2024 at 05:12:14PM -0400, Jason Merrill wrote:
> On 7/23/24 4:18 PM, Marek Polacek wrote:
> > On Tue, Jul 23, 2024 at 12:53:07AM -0400, Jason Merrill wrote:
> > > On 7/20/24 2:31 PM, Marek Polacek wrote:
> > > > [ Entering the contest to fix the oldest PR in this cycle. ]
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > This 18-year-old PR reports that we parse certain comma expressions
> > > > as a declaration rather than statement when the statement begins with
> > > > a functional-style cast expression.  Consider
> > > > 
> > > > int(x), 0;
> > > > 
> > > > which does not declare x--it only casts x to int--, whereas
> > > > 
> > > > int(x), (y);
> > > > 
> > > > declares x and y.  We need some kind of look-ahead to decide how we
> > > > should disambiguate the construct, because cp_parser_init_declarator
> > > > commits eagerly once it has seen "int(x)", and then it's too late to
> > > > recover.
> > > > 
> > > > This patch makes us try to parse the code as a sequence of declarators;
> > > > if that fails, we are likely looking at a statement.  That's a simple
> > > > idea, but it's complicated by code like
> > > > 
> > > > void (*p)(void *)(fun);
> > > > 
> > > > which initializes a pointer-to-function, or
> > > > 
> > > > int(x), (x) + 1;
> > > > 
> > > > which is an expression statement, but the second (x) is parsed as
> > > > a valid declarator, only the + after reveals that the whole thing
> > > > is an expression.  You can have things like
> > > > 
> > > > int(**p)
> > > > 
> > > > which by itself doesn't tell you much.  You can have
> > > > 
> > > > int(*q)(void*)
> > > > 
> > > > which looks like it starts with a functional-style cast, but it is not
> > > > a cast.  The simple
> > > > 
> > > > int(x) = 42;
> > > > 
> > > > has an initializer so it declares x; it is not an assignment.  But then,
> > > > 
> > > >   int(d) __attribute__(());
> > > > 
> > > > does not have an initializer, but the attribute makes it a declaration.
> > > > 
> > > > PR c++/29834
> > > > PR c++/54905
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.cc (cp_parser_lambda_introducer): Use
> > > > cp_parser_next_token_starts_initializer_p.
> > > > (cp_parser_simple_declaration): Add look-ahead to decide if 
> > > > we're
> > > > looking at a declaration or statement.
> > > > (cp_parser_next_token_starts_initializer_p): New.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/parse/ambig15.C: New test.
> > > > * g++.dg/parse/ambig16.C: New test.
> > > > ---
> > > >gcc/cp/parser.cc | 73 ++--
> > > >gcc/testsuite/g++.dg/parse/ambig15.C | 83 
> > > > 
> > > >gcc/testsuite/g++.dg/parse/ambig16.C | 18 ++
> > > >3 files changed, 168 insertions(+), 6 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/parse/ambig15.C
> > > >create mode 100644 gcc/testsuite/g++.dg/parse/ambig16.C
> > > > 
> > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > index 1fa0780944b..797cfc3204e 100644
> > > > --- a/gcc/cp/parser.cc
> > > > +++ b/gcc/cp/parser.cc
> > > > @@ -2947,6 +2947,8 @@ static bool 
> > > > cp_parser_next_token_ends_template_argument_p
> > > >  (cp_parser *);
> > > >static bool cp_parser_nth_token_starts_template_argument_list_p
> > > >  (cp_parser *, size_t);
> > > > +static bool cp_parser_next_token_starts_initializer_p
> > > > +  (cp_parser *);
> > > >static enum tag_types cp_parser_token_is_class_key
> > > >  (cp_token *);
> > > >static enum tag_types cp_parser_token_is_type_parameter_key
> > > > @@ -11663,9 +11665,7 @@ cp_parser_lambda_introducer (cp_parser* parser, 
> > > > tree lambda_expr)
> > > > }
> > > >  /* Find the initializer for this capture.  */
> > > > -  if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
> > > > - || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
> > > > - || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
> > > > +  if (cp_parser_next_token_starts_initializer_p (parser))
> > > > {
> > > >   /* An explicit initializer exists.  */
> > > >   if (cxx_dialect < cxx14)
> > > > @@ -11747,9 +11747,7 @@ cp_parser_lambda_introducer (cp_parser* parser, 
> > > > tree lambda_expr)
> > > >   /* If what follows is an initializer, the second 
> > > > '...' is
> > > >  invalid.  But for cases like [...xs...], the first 
> > > > one
> > > >  is invalid.  */
> > > > - if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
> > > > - || cp_lexer_next_token_is (parser->lexer, 
> > > > CPP_OPEN_PAREN)
> > > > - || cp_lexer_next_token_is (parser->lexer, 
> > > > CPP_OPEN_BRACE))
> > > > +  

Re: [PATCHv2, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-07-23 Thread Jeff Law




On 7/21/24 7:58 PM, HAO CHEN GUI wrote:

Hi,
   This patch adds const0 move checking for CLEAR_BY_PIECES. The original
vec_duplicate handles duplicates of non-constant inputs. But 0 is a
constant. So even if a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports const0 move by that mode.

   Compared to the previous version, the main change is to do const0
direct move for by-piece clear if the target supports const0 move by
that mode.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643063.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. There are several regressions on aarch64. They could be
fixed by enhancing const0 move on V2x8QImode. Is it OK for trunk?

Can you be more specific about the aarch64 regressions?  Execution? 
Scan-asm?  ICE?


Ideally we'd include a patch to fix those regressions as well.


jeff



Re: Yet another SPEC compare failure on trunk

2024-07-23 Thread Jeff Law




On 7/22/24 10:42 PM, Vineet Gupta wrote:



On 7/22/24 10:58, Jeff Law wrote:

On 7/22/24 11:52 AM, Vineet Gupta wrote:

On 7/9/24 17:26, Jeff Law wrote:

On 7/9/24 6:11 PM, Vineet Gupta wrote:

A couple of weeks ago, 502.gcc was failing (PR/115669), which got fixed promptly.
On today's trunk I'm seeing a runtime compare failure on 500.perlbench.

       2024-07-09 d17889dbffd5 i386: Implement .SAT_TRUNC for unsigned
integers

Anyone else seeing this ?

      > 3830:  mbox2:
dWshe3Aa1EULre4CT5O/ErYFrk+o/EOoebA1kTVjQVQQH2EjT5fHcYnwjj2MdBmZu5y3Ce4Ei4QQZo/SNrry9g
      >    mbox2:
uuWPimQiU0D4UrwFP+LS0lFNph4qL43WV1A6T3tHleatIOUaHixhrJU9NoA2lc9KjwYpdEL0lNTXkvo8ymNHzA
      >   ^
      > 3832:  mbox3:
8f4jdv6GIf0lX3DcdwRdEm6/aZwnmGX6n86GzCvmkwTKFXQjwlwVHc8jy8XlcyiIPr3yXTkgVOiP3cRYvyYQPg
      >    mbox3:
9xQySgP6qbhfxl8Usu1WfGA5UhStB5AN31wueGM6OF4Jp59DkqJPu6ksGblOU5u0nQapQC1e9oYIs16a2mq2NA
      >   ^
      > specdiff run completed

Given it looks like a hash and Robin has indicated that LMUL > 2 is
mucking up the hashing code in xz, I wouldn't be surprised if it's the
same thing.

The Embecosm guys were going to take a looksie.

Quick update:

Edwin bisected this to

      2024-07-06 273f16a125c4 [to-be-committed][v3][RISC-V] Handle bit
manipulation of SImode values

He can provide more details.

Ugh.  That's mine...  We've been running a variant of that internally,
so my first thought is whether or not the generalizations done as I
prepped it for upstream are goofy.

Anyway, I'll own it.


It seems the operands for and+not got swapped in one of the patterns.
The following restores perlbench:

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index f403ba8dbbad..d262430485e7 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -675,7 +675,7 @@
    "#"
    "&& reload_completed"
     [(set (match_dup 4) (match_dup 2))
-    (set (match_dup 4) (and:DI (not:DI (match_dup 4)) (match_dup 1)))
+    (set (match_dup 4) (and:DI (not:DI (match_dup 1)) (match_dup 4)))
  (set (match_dup 0) (any_or:DI (ashift:DI (const_int 1) (match_dup
5)) (match_dup 3)))]
    { operands[5] = gen_lowpart (QImode, operands[4]); }
    [(set_attr "type" "bitmanip")])

Yea, that looks like I just plain goof'd it.   OK for the trunk.

jeff



Re: [PATCH 1/2] Output CodeView type information for references

2024-07-23 Thread Jeff Law




On 7/20/24 1:31 PM, Mark Harmstone wrote:

Translates DW_TAG_reference_type DIEs into LF_POINTER types.

gcc/
* dwarf2codeview.cc (get_type_num_reference_type): New function.
(get_type_num_array_type): Add DW_TAG_reference_type to switch.
(get_type_num): Handle DW_TAG_reference_type DIEs.
* dwarf2codeview.h (CV_PTR_MODE_LVREF): Define.

Both patches in this series are fine.

Thanks!
jeff



Re: [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-23 Thread Andrew MacLeod



On 7/23/24 15:18, Jeff Law wrote:



On 7/11/24 9:17 PM, HAO CHEN GUI wrote:

So why the test for real_isinf on the upper/lower bound?  If op1 is 
known to be a NaN, then why test the bounds at all?  If a bounds 
test is needed, why only test the upper bound?


IMHO, the logic is: if op1 is a NaN, it's not an infinite number. If the
upper and lower bounds are both finite, op1 is not an infinite number
either. In both situations, the result should be set to 0, which means
op1 isn't an infinite number.

Understood, but that's not what the code actually implements:


+    if (op1.known_isnan ()
+    || (!real_isinf (&op1.lower_bound ())
+    && !real_isinf (&op1.upper_bound ())))
+  {
+    r.set_zero (type);
+    return true;
+  }

If op1 is a NaN, then it cannot be Inf.  Similarly if both of the 
bounds are known not to be Inf, then op1 is not Inf and thus we should 
be returning false instead of true.  Or am I mis-understanding this API?



The range is returned in r, which is set to [0,0].  That is the "false"
answer being returned for the range.


The "return true" indicates we determined a range, so use what is in r.

Returning false means we did not find a range to return, so r is garbage.
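
A much simplified sketch of that convention, assuming hypothetical stand-in types (FloatRange and IntRange are not GCC's frange/irange, and fold_isinf is a made-up name), just to separate "the return value" from "the range stored in r":

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the real range classes.
struct FloatRange {
  double lo, hi;   // bounds of the possible values
  bool known_nan;  // true when the value is known to be a NaN
};
struct IntRange {
  int lo, hi;
};

// Returns true when a result range was computed and stored in r.
// Returns false when nothing is known, in which case r must not be used.
bool fold_isinf(const FloatRange &op1, IntRange &r) {
  if (op1.known_nan || (!std::isinf(op1.lo) && !std::isinf(op1.hi))) {
    r = {0, 0};  // isinf (op1) is known to be 0, i.e. "not infinite"
    return true;
  }
  return false;
}
```

So a true return with r == [0,0] means "isinf is known to be false", not "op1 is infinite".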




Re: [PATCH] c++, contracts: Ensure return statements on checkers.

2024-07-23 Thread Iain Sandoe



> On 23 Jul 2024, at 21:26, Jason Merrill  wrote:
> 
> On 6/17/24 8:14 AM, Iain Sandoe wrote:
>> This is a minor tidy-up, tested on x86_64-darwin,
>> OK For trunk?
>> thanks
>> Iain
>> --- 8< ---
>> At present, for pre-conditions and for post-conditions with a void
>> return, we are not emitting a return statement. This patch adds the
>> relevant return statements.
> 
> Hmm, why would we need a return statement in a void function?

we do not need it,
Iain

> 
>> gcc/cp/ChangeLog:
>>  * contracts.cc (finish_function_contracts): Add return
>>  statements to pre-condition and void post-cndition
>>  statements to pre-condition and void post-condition
>> Signed-off-by: Iain Sandoe 
>> ---
>>  gcc/cp/contracts.cc | 3 +++
>>  1 file changed, 3 insertions(+)
>> diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
>> index 634e3cf4fa9..0822624a910 100644
>> --- a/gcc/cp/contracts.cc
>> +++ b/gcc/cp/contracts.cc
>> @@ -2052,6 +2052,7 @@ finish_function_contracts (tree fndecl)
>>DECL_PENDING_INLINE_P (pre) = false;
>>start_preparsed_function (pre, DECL_ATTRIBUTES (pre), flags);
>>remap_and_emit_conditions (fndecl, pre, PRECONDITION_STMT);
>> +  finish_return_stmt (NULL_TREE);
>>tree finished_pre = finish_function (false);
>>expand_or_defer_fn (finished_pre);
>>  }
>> @@ -2065,6 +2066,8 @@ finish_function_contracts (tree fndecl)
>>remap_and_emit_conditions (fndecl, post, POSTCONDITION_STMT);
>>if (!VOID_TYPE_P (TREE_TYPE (TREE_TYPE (post))))
>>  finish_return_stmt (get_postcondition_result_parameter (fndecl));
>> +  else
>> +finish_return_stmt (NULL_TREE);
>>  tree finished_post = finish_function (false);
>>expand_or_defer_fn (finished_post);
> 



[COMMITTED] RISC-V: Fix snafu in SI mode splitters patch

2024-07-23 Thread Vineet Gupta
SPEC2017 perlbench for RISC-V was broken, failing with a runtime output
mismatch.

> 3830:  mbox2: 
> dWshe3Aa1EULre4CT5O/ErYFrk+o/EOoebA1kTVjQVQQH2EjT5fHcYnwjj2MdBmZu5y3Ce4Ei4QQZo/SNrry9g
>mbox2: 
> uuWPimQiU0D4UrwFP+LS0lFNph4qL43WV1A6T3tHleatIOUaHixhrJU9NoA2lc9KjwYpdEL0lNTXkvo8ymNHzA
>   ^
> 3832:  mbox3: 
> 8f4jdv6GIf0lX3DcdwRdEm6/aZwnmGX6n86GzCvmkwTKFXQjwlwVHc8jy8XlcyiIPr3yXTkgVOiP3cRYvyYQPg
>mbox3: 
> 9xQySgP6qbhfxl8Usu1WfGA5UhStB5AN31wueGM6OF4Jp59DkqJPu6ksGblOU5u0nQapQC1e9oYIs16a2mq2NA
>   ^
> specdiff run completed

Edwin bisected this to 273f16a125c4 ("[v3][RISC-V] Handle bit
manipulation of SImode values") which had the operands swapped in one
of the new splitters introduced.
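
To see why the operand order matters, here is a tiny sketch (and_not is a made-up helper that models (and:DI (not:DI x) y) on plain integers):

```cpp
#include <cassert>
#include <cstdint>

// Models RTL (and:DI (not:DI x) y): complement the FIRST operand, then AND.
// The operation is not commutative, so swapping x and y changes the result.
std::uint64_t and_not(std::uint64_t x, std::uint64_t y) {
  return ~x & y;
}
```

For instance, and_not (0xF0, 0xFF) keeps the low nibble, while and_not (0xFF, 0xF0) clears everything.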

No new test, as the reducer narrows it down to the exact test introduced by
the original commit.

gcc/ChangeLog:
* config/riscv/bitmanip.md: Fix splitter.

Reported-by: Edwin Lu 
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/bitmanip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index f403ba8dbbad..d262430485e7 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -675,7 +675,7 @@
   "#"
   "&& reload_completed"
[(set (match_dup 4) (match_dup 2))
-(set (match_dup 4) (and:DI (not:DI (match_dup 4)) (match_dup 1)))
+(set (match_dup 4) (and:DI (not:DI (match_dup 1)) (match_dup 4)))
 (set (match_dup 0) (any_or:DI (ashift:DI (const_int 1) (match_dup 5)) 
(match_dup 3)))]
   { operands[5] = gen_lowpart (QImode, operands[4]); }
   [(set_attr "type" "bitmanip")])
-- 
2.43.0



Re: [RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-23 Thread Andrew Pinski
On Mon, Jul 22, 2024 at 7:41 PM Kewen.Lin  wrote:
>
> Hi Andrew,
>
> on 2024/7/23 08:09, Andrew Pinski wrote:
> > On Sun, Jun 30, 2024 at 11:17 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> As PR115659 shows, assuming c = x CMP y, there are some
> >> folding chances for patterns r = c ? 0/z : z/-1:
> >>   - For r = c ? 0 : z, it can be folded into r = ~c & z.
> >>   - For r = c ? z : -1, it can be folded into r = ~c | z.
> >>
> >> But BIT_AND/BIT_IOR applied to one BIT_NOT operand is a
> >> compound operation, and I'm not sure whether every target with
> >> vector capability has a single vector instruction for it.
> >> If not, it's arguable whether it always beats vector
> >> selection (e.g. the vector constant may get hoisted or combined,
> >> and selection may have the same latency as a normal logical operation).
> >> So IMHO we probably need to query the target with new optabs.
> >> This patch therefore introduces new optabs andc and iorc and their
> >> corresponding internal functions BIT_{ANDC,IORC} (suggestions
> >> for naming the optabs and ifns are welcome).  If a target
> >> defines such optabs for vector modes, it means the target
> >> supports these hardware insns and they should be no worse than
> >> vector selection.  BTW, the rs6000 changes are meant to
> >> give an example of a target supporting andc/iorc.
> >>
> >> Does this sound reasonable?
> >
> > Just a quick FYI (I will be making the change and testing the change).
> > The optab names `andc` and `iorc` unfortunately do not work with
> > scalar modes since there are complex modes which start with c and are
> > combined with the scalar modes. So, for example, a pattern named
> > `andcsi3` is not for the optab `andc` with the mode of si but rather
> > for `and` optab and for the mode `csi`. The same issue happens for
> > `iorc` too.
>
> ah, thanks for pointing out this!  I guess a "_" can help, that is:
>
> OPTAB_D (andc_optab, "andc_$a3")
> OPTAB_D (iorc_optab, "iorc_$a3")
>
> but the downside is the code naming becomes different from "and$a3"
> and "ior$a3", so it seems better to use different names like what
> you proposed.
>
> > Thinking out loud on what names we should use instead; `andn` and
> > `iorn` might be ok? Does anyone else have any suggestions?
>
> FWIW, they look good to me.
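
The naming clash quoted above comes down to plain string concatenation. A sketch, assuming a made-up helper pattern_name that glues an optab base name, a mode suffix and the operand count together the way the quoted `andcsi3` example does:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pattern name = optab base + mode suffix + "3".
// It only illustrates the ambiguity; it is not how GCC's generators
// are implemented.
std::string pattern_name(const std::string &optab, const std::string &mode) {
  return optab + mode + "3";
}
```

With this scheme, "andc" with mode "si" and "and" with mode "csi" both produce "andcsi3", which is the collision described above.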

Just FYI: I also noticed that the powerpc backend could define these optabs
for scalars and would generate better code for the following
example (after I finish up my patches):
```
long f1(long a, long b)
{
a = ~0x4;
return a | ~b;
}
long f2(long a, long b)
{
a = ~0x4;
return a & ~b;
}
```
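
For reference, the two folds from the original proposal quoted above can be checked on a single lane. This is a sketch that assumes the comparison result c is an all-ones or all-zeros mask, as vector comparisons produce per lane; mask_select and folds_hold are made-up names:

```cpp
#include <cassert>
#include <cstdint>

// Lane-wise select: picks t where the mask c is all-ones, f where it is zero.
std::uint32_t mask_select(std::uint32_t c, std::uint32_t t, std::uint32_t f) {
  return (c & t) | (~c & f);
}

// Checks, for both possible mask values:
//   c ? 0 : z   ==  ~c & z   (andc form)
//   c ? z : -1  ==  ~c | z   (iorc form)
bool folds_hold(std::uint32_t z) {
  const std::uint32_t masks[] = {0u, ~0u};
  for (std::uint32_t c : masks) {
    if (mask_select(c, 0u, z) != (~c & z))
      return false;
    if (mask_select(c, z, ~0u) != (~c | z))
      return false;
  }
  return true;
}
```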

Thanks,
Andrew Pinski

>
> BR,
> Kewen
>
> >
> > Note I will also be adding the cond version of them since at least for
> > aarch64 SVE, bic (andc) can be conditional.
> >
> > Thanks,
> > Andrew Pinski
> >
> >>
>
>


Re: [PATCH v3 2/2] Prevent divide-by-zero

2024-07-23 Thread Edwin Lu



On 7/23/2024 11:20 AM, Richard Sandiford wrote:

Edwin Lu  writes:

On 7/23/2024 4:56 AM, Richard Biener wrote:

On Tue, Jul 23, 2024 at 1:03 AM Edwin Lu  wrote:

Hi Richard,

On 5/31/2024 1:48 AM, Richard Biener wrote:

On Thu, May 30, 2024 at 2:11 AM Patrick O'Neill  wrote:

From: Greg McGary 

Still a NACK.  If remain ends up zero then

   /* Try to use a single smaller load when we are about
  to load excess elements compared to the unrolled
  scalar loop.  */
   if (known_gt ((vec_num * j + i + 1) * nunits,
  (group_size * vf - gap)))
 {
   poly_uint64 remain = ((group_size * vf - gap)
 - (vec_num * j + i) * nunits);
   if (known_ge ((vec_num * j + i + 1) * nunits
 - (group_size * vf - gap), nunits))
 /* DR will be unused.  */
 ltype = NULL_TREE;

needs to be re-formulated so that the combined conditions make sure
this doesn't happen.  The outer known_gt should already ensure that
remain > 0.  For correctness that should possibly be maybe_gt though.

Yeah.  FWIW, I mentioned the maybe_gt thing in
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653013.html:

   Pre-existing, but shouldn't this be maybe_gt rather than known_gt?
   We can only skip doing it if we know for sure that the load won't cross
   the gap.  (Not sure whether the difference can trigger in practice.)

But AFAICT, the known_gt doesn't inherently prove that remain is known
to be nonzero.  It just proves that the gap between the end of the scalar
accesses and the end of this vector is known to be nonzero.


Putting the list back in the loop and CCing Richard S.


I'm currently looking into this patch and am trying to figure out what
is going on. Stepping through gdb, I see that remain == {coeffs = {0,
2}} and nunits == {coeffs = {2, 2}} (the outer known_gt returned true
with known_gt({coeffs = {8, 8}}, {coeffs = {6, 8}})).

   From what I understand, this falls under the umbrella of 0 <= remain <
nunits. The divide by zero error is because of the 0 <= remain which is
coming from the constant_multiple_p function in poly-int.h where it
performs the modulus NCa(a.coeffs[0]) % NCb(b.coeffs[0]).
(https://github.com/gcc-mirror/gcc/blob/master/gcc/poly-int.h#L1970-L1971)


   >  if (known_ge ((vec_num * j + i + 1) * nunits
   >- (group_size * vf - gap),
nunits))
   >/* DR will be unused.  */
   >ltype = NULL_TREE;

This if condition is a bit suspicious to me though. I'm seeing that it's
evaluating known_ge({coeffs = {2, 0}}, {coeffs = {2, 2}}) which is
returning false. Should it be maybe_ge instead?

No, we can only not emit a load if we know it won't be used, not if
it eventually cannot be used.

Agreed.

[switching round for easier reply]

After running some
tests, to me it looks like it doesn't vectorize quite as often; however,
I'm not fully sure what else to do when the coeffs can potentially be
equal to 0.

Should it even be possible for there to be a {coeffs = {0, n}}
situation? My understanding of how poly_ints are used for representing
vectorization is that the first coefficient is the number of elements
needed to make the minimum supported vector size. That is, if vector
lengths are 128 bits, element size is 32 bits, coeff[0] should be
minimum of 4. Is this understanding correct?

I was told n can be negative, but nunits.coeff[0] should be non-zero.

What would it mean for the coeffs[0] to be 0? Would that mean the vector length 
supports 0 bits?

coeffs = {A,B} just means A+B*X, where X is the number of vector
"chunks" beyond the minimum length.  It's certainly valid for a poly_int
to have a zero coeffs[0] (i.e. zero A).  For example, (the length of a
vector) - (the minimum length) would have this property.
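
A small sketch of the "A+B*X" reading, assuming a toy two-coefficient type
(coeffs2, value_at and minus are hypothetical names, not GCC's poly_int API).
It shows how subtracting the minimum length from the vector length yields a
value whose first coefficient is zero, like the remain = {0, 2} seen in gdb.

```cpp
#include <cassert>

// Toy model of coeffs = {A, B}: the runtime value is A + B * X.
// Hypothetical helper type for illustration; not GCC's poly_int.
struct coeffs2 { long a, b; };

constexpr long value_at (coeffs2 p, long x) { return p.a + p.b * x; }

constexpr coeffs2 minus (coeffs2 p, coeffs2 q)
{
  return coeffs2{p.a - q.a, p.b - q.b};
}

inline void check_zero_first_coefficient ()
{
  coeffs2 length = {2, 2};    // 2 + 2X elements, like nunits in the gdb dump
  coeffs2 minimum = {2, 0};   // the minimum length: 2 elements for any X
  coeffs2 extra = minus (length, minimum);   // {0, 2}
  assert (extra.a == 0 && extra.b == 2);
  assert (value_at (extra, 0) == 0);   // zero on the smallest hardware
  assert (value_at (extra, 3) == 6);   // non-zero on longer vectors
}
```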


Thanks for the explanation! I have a few clarification questions about this.
If I understand correctly, B would represent the number of elements the vector 
can have (for 128b vector operating on 32b elements, B == 4, but if operating 
on 64b elements B == 2); however, I'm not too sure what A represents.

On the poly_int docs, it says

An indeterminate value of 0 should usually represent the minimum possible 
runtime value, with c0 specifying the value in that case.

"minimum possible runtime value" doesn't make sense to me. Does it mean the 
potential minimum bound of elements it will operate on?


What is j and i when the divisor is zero?

The values I see in gdb are: vec_num = 4 j = 0 i = 3 vf = {coeffs = {2,
2}} nunits = {coeffs = {2, 2}} group_size = 4 gap = 2 vect_align = 2
remain = {coeffs = {0, 2}}

OK, so let's use D to mean "data" and G to mean "gap".  Th

[PATCH v2] RISC-V: Add basic support for the Zacas extension

2024-07-23 Thread Patrick O'Neill
From: Gianluca Guida 

This patch adds support for amocas.{b|h|w|d}. Support for amocas.q
(64/128 bit cas for rv32/64) will be added in a future patch.

Extension: https://github.com/riscv/riscv-zacas
Ratification: https://jira.riscv.org/browse/RVS-680

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zacas when not supported by
the assembler.
* config.in: Regenerate.
* config/riscv/arch-canonicalize: Make zacas imply zaamo.
* config/riscv/riscv.opt: Add zacas.
* config/riscv/sync.md (zacas_atomic_cas_value): New pattern.
(atomic_compare_and_swap): Use new pattern for compare-and-swap ops.
(zalrsc_atomic_cas_value_strong): Rename atomic_cas_value_strong.
* configure: Regenerate.
* configure.ac: Regenerate.
* doc/sourcebuild.texi: Add Zacas documentation.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zacas testsuite infra support.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c: Remove zacas to continue to test the lr/sc pairs.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst.c: Ditto.
* gcc.target/riscv/amo/zabha-zacas-preferred-over-zalrsc.c: New test.
* gcc.target/riscv/amo/zacas-char-requires-zabha.c: New test.
* gcc.target/riscv/amo/zacas-char-requires-zacas.c: New test.
* gcc.target/riscv/amo/zacas-preferred-over-zalrsc.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-acq-rel.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-acquire.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-relaxed.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-release.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-seq-cst.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-compatability-mapping-no-fence.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-compatability-mapping.cc: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-acq-rel.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-acquire.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-relaxed.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-release.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-seq-cst.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-acq-rel.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-acquire.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-relaxed.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-release.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-seq-cst.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-char-seq-cst.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-char.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-compatability-mapping-no-fence.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-compatability-mapping.cc: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-int-seq-cst.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-int.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-short-seq-cst.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-short.c: New test.

Co-authored-by: Patrick O'Neill 
---
V2 Changelog
* Fix functional bug where expected amocas value wasn't initialized.
* Add leading fence to zacas with seq_cst failure mode for PSABI
compatibility (and corresponding testcases).
* Rename ato

Re: [PATCH v2] RISC-V: Add basic support for the Zacas extension

2024-07-23 Thread Patrick O'Neill

  (define_expand "atomic_compare_and_swap"
[(match_operand:SI 0 "register_operand" "")   ;; bool output
 (match_operand:GPR 1 "register_operand" "")  ;; val output
 (match_operand:GPR 2 "memory_operand" "");; memory
-   (match_operand:GPR 3 "reg_or_0_operand" "")  ;; expected value
+   (match_operand:GPR 3 "register_operand" "")  ;; expected value
 (match_operand:GPR 4 "reg_or_0_operand" "")  ;; desired value
 (match_operand:SI 5 "const_int_operand" "")  ;; is_weak
 (match_operand:SI 6 "const_int_operand" "")  ;; mod_s
 (match_operand:SI 7 "const_int_operand" "")] ;; mod_f



-(define_expand "atomic_cas_value_strong"
+(define_expand "zalrsc_atomic_cas_value_strong"
[(match_operand:SHORT 0 "register_operand") ;; val output
 (match_operand:SHORT 1 "memory_operand")   ;; memory
 (match_operand:SHORT 2 "reg_or_0_operand") ;; expected value


Wanted to highlight this for reviewers since it's non-obvious when
reading the diff:
`expected value` is now a `register_operand` since that's needed for
amocas as it reads/writes expected/actual value to the same register.

I left the zalrsc pattern as `reg_or_0_operand` even though it will
only be passed `register_operand` from the expander.
I think it captures the lr/sc pattern's requirements better but I'm
happy to change this if register_operand is preferred for the zalrsc case.

Patrick
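
The "expected value is both read and written" point has a direct analogue at
the C++ level, sketched below with std::atomic. This is only an illustration
of the source-level semantics, not of the RTL patterns or of the code GCC
emits; the names cas_result and try_cas are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Outcome of one strong compare-and-swap attempt.
struct cas_result
{
  bool success;        // did the swap happen?
  int expected_after;  // what the "expected" variable holds afterwards
  int cell_after;      // what memory holds afterwards
};

// Hypothetical helper: run a single CAS on a fresh atomic cell.
inline cas_result try_cas (int initial, int expected, int desired)
{
  std::atomic<int> cell (initial);
  // compare_exchange_strong takes `expected` by reference: it is an input
  // (the value to compare against) and an output (the value actually seen).
  bool ok = cell.compare_exchange_strong (expected, desired);
  return cas_result{ok, expected, cell.load ()};
}

inline void check_expected_is_in_out ()
{
  cas_result miss = try_cas (5, 7, 9);   // memory holds 5, we expected 7
  assert (!miss.success && miss.expected_after == 5 && miss.cell_after == 5);
  cas_result hit = try_cas (5, 5, 9);    // memory holds 5, we expected 5
  assert (hit.success && hit.expected_after == 5 && hit.cell_after == 9);
}
```

Because the same operand carries the value in and out, a constant zero cannot
stand in for it, which is consistent with requiring a register for the amocas
path.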


[PATCH 2/2] xtensa: Add missing speed cost for TYPE_FARITH in TARGET_INSN_COST

2024-07-23 Thread Takayuki 'January June' Suwa

According to the implemented pipeline model, this cost can be assumed to be
1 clock cycle.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_insn_cost):
Add a case statement for TYPE_FARITH.

From b819dd4fb38bedd95ef5d66847a0f80b9ca8ee86 Mon Sep 17 00:00:00 2001
From: Takayuki 'January June' Suwa 
Date: Wed, 24 Jul 2024 06:07:06 +0900
Subject: [PATCH 2/2] xtensa: Add missing speed cost for TYPE_FARITH in
 TARGET_INSN_COST

According to the implemented pipeline model, this cost can be assumed to be
1 clock cycle.

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_insn_cost):
Add a case statement for TYPE_FARITH.
---
 gcc/config/xtensa/xtensa.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index d90e78fe9c4..a7cb3cc59fc 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -4722,6 +4722,7 @@ xtensa_insn_cost (rtx_insn *insn, bool speed)
case TYPE_ARITH:
case TYPE_MULTI:
case TYPE_NOP:
+   case TYPE_FARITH:
case TYPE_FSTORE:
  return COSTS_N_INSNS (n);
 
-- 
2.39.2



[PATCH 1/2] xtensa: Fix suboptimal loading of pooled constant value into hardware single-precision FP register

2024-07-23 Thread Takayuki 'January June' Suwa

We would like to implement the following to store a single-precision FP
constant in a hardware FP register:

- Load the bit-exact integer image of the pooled single-precision FP
  constant into an address (integer) register
- Then, assign from that address register to a hardware single-precision
  FP register

    .literal_position
    .literal .LC1, 0x3f80
    ...
    l32r     a9, .LC1
    wfr      f0, a9

However, it was emitted as follows:

- Load the address of the FP constant entry in litpool into an address
  register
- Then, dereference the address via that address register into a hardware
  single-precision FP register

    .literal_position
    .literal .LC1, 0x3f80
    .literal .LC2, .LC1
    ...
    l32r     a9, .LC2
    lsi      f0, a9, 0

It is obviously inefficient to read the pool twice.
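
As an aside, the "bit-exact integer image" in the preferred sequence is just
the 32-bit pattern of the float, which can travel through an integer register
unchanged. The sketch below models that idea in portable C++; it assumes an
IEEE 754 binary32 float, and the helper names float_bits and bits_to_float
are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert (sizeof (float) == sizeof (std::uint32_t),
               "sketch assumes a 32-bit float");

// Integer image of a float: the same 32 bits, viewed as an integer.
inline std::uint32_t float_bits (float f)
{
  std::uint32_t bits;
  std::memcpy (&bits, &f, sizeof bits);
  return bits;
}

// The reverse move: reinterpret the 32 bits as a float again.
inline float bits_to_float (std::uint32_t bits)
{
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

inline void check_round_trip ()
{
  // On IEEE 754 targets 1.0f has the image 0x3f800000.
  assert (float_bits (1.0f) == 0x3f800000u);
  // Going through the integer image loses nothing.
  assert (bits_to_float (float_bits (2.5f)) == 2.5f);
}
```

Loading the image directly needs one literal-pool entry, whereas loading the
entry's address first needs two, which is the inefficiency described above.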

gcc/ChangeLog:

* config/xtensa/xtensa.md (movsf_internal):
Reorder alternative that corresponds to L32R machine instruction,
and prefix alternatives that correspond to LSI/SSI instructions
with the constraint character '^' so that they are disparaged by
reload/LRA.

From a552e4fca21ff9a0c7a5327dd15ccdada36930c1 Mon Sep 17 00:00:00 2001
From: Takayuki 'January June' Suwa 
Date: Tue, 23 Jul 2024 16:03:12 +0900
Subject: [PATCH 1/2] xtensa: Fix suboptimal loading of pooled constant value
 into hardware single-precision FP register

We would like to implement the following to store a single-precision FP
constant in a hardware FP register:

- Load the bit-exact integer image of the pooled single-precision FP
  constant into an address (integer) register
- Then, assign from that address register to a hardware single-precision
  FP register

    .literal_position
    .literal .LC1, 0x3f80
    ...
    l32r     a9, .LC1
    wfr      f0, a9

However, it was emitted as follows:

- Load the address of the FP constant entry in litpool into an address
  register
- Then, dereference the address via that address register into a hardware
  single-precision FP register

    .literal_position
    .literal .LC1, 0x3f80
    .literal .LC2, .LC1
    ...
    l32r     a9, .LC2
    lsi      f0, a9, 0

It is obviously inefficient to read the pool twice.

gcc/ChangeLog:

* config/xtensa/xtensa.md (movsf_internal):
Reorder alternative that corresponds to L32R machine instruction,
and prefix alternatives that correspond to LSI/SSI instructions
with the constraint character '^' so that they are disparaged by
reload/LRA.
---
 gcc/config/xtensa/xtensa.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index a3b99dc381d..f19e1fd16b5 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1453,8 +1453,8 @@
 })
 
 (define_insn "movsf_internal"
-  [(set (match_operand:SF 0 "nonimmed_operand" "=f,f,U,D,D,R,a,f,a,a,W,a,a,U")
-   (match_operand:SF 1 "move_operand" "f,U,f,d,R,d,r,r,f,Y,iF,T,U,r"))]
+  [(set (match_operand:SF 0 "nonimmed_operand" "=f,f,^U,D,a,D,R,a,f,a,a,W,a,U")
+   (match_operand:SF 1 "move_operand" "f,^U,f,d,T,R,d,r,r,f,Y,iF,U,r"))]
   "((register_operand (operands[0], SFmode)
  || register_operand (operands[1], SFmode))
 && !(FP_REG_P (xt_true_regnum (operands[0]))
@@ -1464,6 +1464,7 @@
%v1lsi\t%0, %1
%v0ssi\t%1, %0
mov.n\t%0, %1
+   %v1l32r\t%0, %1
%v1l32i.n\t%0, %1
%v0s32i.n\t%1, %0
mov\t%0, %1
@@ -1471,12 +1472,11 @@
rfr\t%0, %1
movi\t%0, %y1
const16\t%0, %t1\;const16\t%0, %b1
-   %v1l32r\t%0, %1
%v1l32i\t%0, %1
%v0s32i\t%1, %0"
-  [(set_attr "type" "farith,fload,fstore,move,load,store,move,farith,farith,move,move,load,load,store")
+  [(set_attr "type" "farith,fload,fstore,move,load,load,store,move,farith,farith,move,move,load,store")
    (set_attr "mode" "SF")
-   (set_attr "length"  "3,3,3,2,2,2,3,3,3,3,6,3,3,3")])
+   (set_attr "length"  "3,3,3,2,3,2,2,3,3,3,3,6,3,3")])
 
 (define_insn "*lsiu"
   [(set (match_operand:SF 0 "register_operand" "=f")
-- 
2.39.2



[committed][PR rtl-optimization/115877][6/n] Add testcase from pr115877

2024-07-23 Thread Jeff Law


This just adds the testcase from pr115877.  It's working now on the 
trunk.  I'm not done with cleanups/bugfixing, but there's no reason to 
not have the testcase installed at this point.


Pushed to the trunk.

jeff

commit f9a60d575f02822852aa22513c636be38f9c63ea
Author: Jeff Law 
Date:   Tue Jul 23 19:11:04 2024 -0600

[PR rtl-optimization/115877][6/n] Add testcase from pr115877

This just adds the testcase from pr115877.  It's working now on the trunk.
I'm not done with cleanups/bugfixing, but there's no reason to not have the
testcase installed at this point.

PR rtl-optimization/115877
gcc/testsuite
* gcc.dg/torture/pr115877.c: New test.

diff --git a/gcc/testsuite/gcc.dg/torture/pr115877.c b/gcc/testsuite/gcc.dg/torture/pr115877.c
new file mode 100644
index 000..432b1280b17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115877.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target int128 } } */
+
+char a[16];
+unsigned short u;
+
+__int128
+foo (int i)
+{
+  i -= (unsigned short) ~u;
+  a[(unsigned short) i] = 1;
+  return i;
+}
+
+int
+main ()
+{
+  __int128 x = foo (0);
+  if (x != -0x)
+__builtin_abort();
+}


[PATCH 1/2] cp/coroutines: do not rewrite parameters in unevaluated contexts

2024-07-23 Thread Arsen Arsenović
It is possible to use parameters of a parent function of a lambda in
unevaluated contexts without capturing them.  By not capturing them, we
work around the usual mechanism we use to prevent rewriting captured
parameters.  Prevent this by simply skipping rewrites in unevaluated
contexts.  Those won't mind the value not being present anyway.

gcc/cp/ChangeLog:

PR c++/111728
* coroutines.cc (rewrite_param_uses): Skip unevaluated
subexpressions.

gcc/testsuite/ChangeLog:

PR c++/111728
* g++.dg/coroutines/pr111728.C: New test.
---
Evening!

This 'series' contains two patches for the coroutine implementation to
address two unrelated PRs.

The first prevents an ICE during coroutine parameter substitution by not
performing it in unevaluated contexts.  Those contexts can contain names
that were not captured by lambdas but *are* parameters to coroutines.
In the testcase from the PR, the rewriting machinery finds a param in
the body of the coroutine, which it did not previously encounter while
processing the coroutine declaration, and that does not have a
DECL_VALUE_EXPR, and fails.

Since it is not really useful to rewrite parameter uses in unevaluated
contexts, we can just ignore those, preventing confusion (and the ICE).
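
To illustrate the kind of use being skipped, here is a minimal sketch without
any coroutine machinery: a lambda that names an enclosing local inside sizeof
without capturing it. Since sizeof does not evaluate its operand, no capture
is needed. The function name buffer_size_via_lambda is hypothetical and the
buffer name is borrowed from the PR testcase.

```cpp
#include <cassert>
#include <cstddef>

inline std::size_t buffer_size_via_lambda ()
{
  int static_buffer[10];
  // Empty capture list: static_buffer is only named inside sizeof, which is
  // an unevaluated operand, so the lambda never needs the variable's value.
  auto probe = [] { return sizeof (static_buffer); };
  return probe ();
}

inline void check_unevaluated_use ()
{
  assert (buffer_size_via_lambda () == 10 * sizeof (int));
}
```

In the coroutine case the enclosing "local" is a parameter, and such a use
carries no value that would need rewriting to point into the frame.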

Regression tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA, have a lovely night.

 gcc/cp/coroutines.cc   |  6 +
 gcc/testsuite/g++.dg/coroutines/pr111728.C | 29 ++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr111728.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index e8f028df3ad..fb8f24e6c61 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3755,6 +3755,12 @@ rewrite_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
   return cp_walk_tree (&t, rewrite_param_uses, d, NULL);
 }
 
+  if (unevaluated_p (TREE_CODE (*stmt)))
+{
+  *do_subtree = 0; // Nothing to do.
+  return NULL_TREE;
+}
+
   if (TREE_CODE (*stmt) != PARM_DECL)
 return NULL_TREE;
 
diff --git a/gcc/testsuite/g++.dg/coroutines/pr111728.C b/gcc/testsuite/g++.dg/coroutines/pr111728.C
new file mode 100644
index 000..c1fee4b36a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr111728.C
@@ -0,0 +1,29 @@
+// { dg-do compile }
+// https://gcc.gnu.org/PR111728
+#include <coroutine>
+struct promise;
+struct coroutine : std::coroutine_handle<promise>
+{
+using promise_type = ::promise;
+bool await_ready() { return false; }
+void await_suspend(coroutine_handle h) {}
+int await_resume() { return {} ;}
+};
+struct promise
+{
+coroutine get_return_object() { return {coroutine::from_promise(*this)}; }
+std::suspend_always initial_suspend() noexcept { return {}; }
+std::suspend_always final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+};
+coroutine
+write_fields() {
+  int static_buffer[10];
+  co_await [](auto)
+  -> coroutine
+  {
+if (sizeof(static_buffer));
+  co_return;
+  }(0);
+}
-- 
2.45.2



[PATCH 2/2] cp+coroutines: teach convert_to_void to diagnose discarded co_awaits

2024-07-23 Thread Arsen Arsenović
co_await expressions are nearly calls to Awaitable::await_resume, and,
as such, should inherit its nodiscard.  A discarded co_await expression
should, hence, act as if its call to await_resume was discarded.

CO_AWAIT_EXPR trees do conveniently contain the expression for calling
await_resume in them, so we can discard it.

gcc/cp/ChangeLog:

PR c++/110171
* coroutines.cc (co_await_get_resume_call): New function.
Returns the await_resume expression of a given co_await.
* cp-tree.h (co_await_get_resume_call): New function.
* cvt.cc (convert_to_void): Handle CO_AWAIT_EXPRs and call
maybe_warn_nodiscard on their resume exprs.

gcc/testsuite/ChangeLog:

PR c++/110171
* g++.dg/coroutines/pr110171-1.C: New test.
* g++.dg/coroutines/pr110171.C: New test.
---
This patch teaches convert_to_void how to discard 'through' a
CO_AWAIT_EXPR.  CO_AWAIT_EXPR nodes (most of the time) already contain
their relevant await_resume() call embedded within them, so, when we
discard a CO_AWAIT_EXPR, we can also just discard the await_resume()
call embedded within it.  This results in a [[nodiscard]] diagnostic
that the PR noted was missing.
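
For comparison, this is how [[nodiscard]] behaves on an ordinary call to a
member shaped like await_resume. It is a sketch only: it does not use
coroutines, and the function use_result is hypothetical. The idea of the
patch, as described above, is that a discarded co_await should be diagnosed
like the commented-out line.

```cpp
#include <cassert>

struct must_check_result
{
  bool await_ready () { return false; }
  [[nodiscard]] bool await_resume () { return true; }
};

inline bool use_result ()
{
  must_check_result awaiter;
  bool resumed = awaiter.await_resume ();        // result used: no diagnostic
  // awaiter.await_resume ();                    // discarded: would warn
  static_cast<void> (awaiter.await_resume ());   // explicit discard: no warning
  return resumed;
}

inline void check_result_is_used ()
{
  assert (use_result ());
}
```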

As with the previous patch, regression-tested on x86_64-pc-linux-gnu.

OK for trunk?

TIA.

 gcc/cp/coroutines.cc | 13 
 gcc/cp/cp-tree.h |  3 ++
 gcc/cp/cvt.cc|  8 +
 gcc/testsuite/g++.dg/coroutines/pr110171-1.C | 34 
 gcc/testsuite/g++.dg/coroutines/pr110171.C   | 32 ++
 5 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110171-1.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr110171.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index fb8f24e6c61..05486c2fb19 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -596,6 +596,19 @@ coro_get_destroy_function (tree decl)
   return NULL_TREE;
 }
 
+/* Given a CO_AWAIT_EXPR AWAIT_EXPR, return its resume call.  */
+
+tree*
+co_await_get_resume_call (tree await_expr)
+{
+  gcc_checking_assert (TREE_CODE (await_expr) == CO_AWAIT_EXPR);
+  tree vec = TREE_OPERAND (await_expr, 3);
+  if (!vec)
+return nullptr;
+  return &TREE_VEC_ELT (vec, 2);
+}
+
+
 /* These functions assumes that the caller has verified that the state for
the decl has been initialized, we try to minimize work here.  */
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 856699de82f..c9ae8950bd1 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8763,6 +8763,9 @@ extern tree coro_get_actor_function   (tree);
 extern tree coro_get_destroy_function  (tree);
 extern tree coro_get_ramp_function (tree);
 
+extern tree* co_await_get_resume_call  (tree await_expr);
+
+
 /* contracts.cc */
 extern tree make_postcondition_variable(cp_expr);
 extern tree make_postcondition_variable(cp_expr, tree);
diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index d95e01c118c..7b4bd8a9dc4 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1502,6 +1502,14 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
maybe_warn_nodiscard (expr, implicit);
   break;
 
+case CO_AWAIT_EXPR:
+  {
+   auto awr = co_await_get_resume_call (expr);
+   if (awr && *awr)
+ *awr = convert_to_void (*awr, implicit, complain);
+   break;
+  }
+
 default:;
 }
   expr = resolve_nondeduced_context (expr, complain);
diff --git a/gcc/testsuite/g++.dg/coroutines/pr110171-1.C b/gcc/testsuite/g++.dg/coroutines/pr110171-1.C
new file mode 100644
index 000..d8aff582487
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr110171-1.C
@@ -0,0 +1,34 @@
+// { dg-do compile }
+#include <coroutine>
+
+struct must_check_result
+{
+bool await_ready() { return false; }
+void await_suspend(std::coroutine_handle<>) {}
+[[nodiscard]] bool await_resume() { return {}; }
+};
+
+struct task {};
+
+namespace std
+{
+template
+struct coroutine_traits
+{
+struct promise_type
+{
+task get_return_object() { return {}; }
+suspend_always initial_suspend() noexcept { return {}; }
+suspend_always final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+};
+};
+}
+
+task example(auto)
+{
+co_await must_check_result(); // { dg-warning "-Wunused-result" }
+}
+
+void f() { example(1); }
diff --git a/gcc/testsuite/g++.dg/coroutines/pr110171.C b/gcc/testsuite/g++.dg/coroutines/pr110171.C
new file mode 100644
index 000..4b82e23656c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr110171.C
@@ -0,0 +1,32 @@
+// { dg-do compile }
+#include <coroutine>
+
+struct must_check_result
+{
+bool await_ready() { return false; }
+void await_suspend(std::coroutine_handle<>) {}
+[[nodiscard]] boo

RE: [PATCH v2] Internal-fn: Only allow type matches mode for internal fn[PR115961]

2024-07-23 Thread Li, Pan2
> Just a slight comment improvement:
> /* Returns true if both types of TYPE_PAIR strictly match their modes,
> else returns false.  */

> This testcase could go in g++.dg/torture/ without the -O3 option.

> Since we are scanning for the negative it should pass on all targets
> even ones without SAT_TRUNC support. And then you should not need the
> other testcase either.

Thanks all, I will address the above comments and commit it if testing turns up no surprises.

Pan

-----Original Message-----
From: Richard Sandiford  
Sent: Tuesday, July 23, 2024 10:03 PM
To: Richard Biener 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Internal-fn: Only allow type matches mode for internal 
fn[PR115961]

Richard Biener  writes:
> On Fri, Jul 19, 2024 at 1:10 PM  wrote:
>>
>> From: Pan Li 
>>
>> direct_internal_fn_supported_p places no restrictions on the type
>> modes.  For example, the bit-field below will be recognized as .SAT_TRUNC.
>>
>> struct e
>> {
>>   unsigned pre : 12;
>>   unsigned a : 4;
>> };
>>
>> __attribute__((noipa))
>> void bug (e * v, unsigned def, unsigned use) {
>>   e & defE = *v;
>>   defE.a = min_u (use + 1, 0xf);
>> }
>>
>> This patch makes direct_internal_fn_supported_p check strictly, and only
>> accepts an ifn tree type pair whose types match their modes.
>>
>> The following test suites pass with this patch:
>> 1. The rv64gcv full regression tests.
>> 2. The x86 bootstrap tests.
>> 3. The x86 full regression tests.
>
> LGTM unless Richard S. has any more comments.

LGTM too with Andrew's comments addressed.

Thanks,
Richard
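
To make the type-versus-mode mismatch in the quoted bit-field example more
tangible, here is a sketch of one way it could go wrong. The second function
is a hypothetical model (an assumption, not the code GCC actually generated):
it saturates at the maximum of an 8-bit container instead of the 4-bit field,
then lets the field store truncate.

```cpp
#include <cassert>

struct e
{
  unsigned pre : 12;
  unsigned a : 4;
};

inline unsigned min_u (unsigned a, unsigned b)
{
  return (b < a) ? b : a;
}

// What the source asks for: clamp to 0xf, then store into the 4-bit field.
inline unsigned store_as_written (unsigned use)
{
  e v = {0, 0};
  v.a = min_u (use + 1, 0xf);
  return v.a;
}

// Hypothetical wrong model: saturate to an 8-bit maximum, then truncate to
// the 4 bits the field can hold.
inline unsigned store_if_saturated_to_8_bits (unsigned use)
{
  e v = {0, 0};
  unsigned sat8 = min_u (use + 1, 0xff);
  v.a = sat8 & 0xf;
  return v.a;
}

inline void check_mismatch ()
{
  assert (store_as_written (20) == 15);             // min (21, 15)
  assert (store_if_saturated_to_8_bits (20) == 5);  // 21 & 0xf
}
```

A 4-bit field has fewer value bits than any machine mode that can hold it,
which is the situation the strict check is meant to reject.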

>
> Richard.
>
>> PR target/115961
>>
>> gcc/ChangeLog:
>>
>> * internal-fn.cc (type_strictly_matches_mode_p): Add new func
>> impl to check type strictly matches mode or not.
>> (type_pair_strictly_matches_mode_p): Ditto but for tree type
>> pair.
>> (direct_internal_fn_supported_p): Add above check for the tree
>> type pair.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/i386/pr115961-run-1.C: New test.
>> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>>
>> Signed-off-by: Pan Li 
>> ---
>>  gcc/internal-fn.cc| 32 +
>>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>>  3 files changed, 100 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>>
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 95946bfd683..5c21249318e 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -4164,6 +4164,35 @@ direct_internal_fn_optab (internal_fn fn)
>>gcc_unreachable ();
>>  }
>>
>> +/* Return true if TYPE's mode has the same format as TYPE, and if there is
>> +   a 1:1 correspondence between the values that the mode can store and the
>> +   values that the type can store.  */
>> +
>> +static bool
>> +type_strictly_matches_mode_p (const_tree type)
>> +{
>> +  if (VECTOR_TYPE_P (type))
>> +return VECTOR_MODE_P (TYPE_MODE (type));
>> +
>> +  if (INTEGRAL_TYPE_P (type))
>> +return type_has_mode_precision_p (type);
>> +
>> +  if (SCALAR_FLOAT_TYPE_P (type) || COMPLEX_FLOAT_TYPE_P (type))
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Return true if both the first and the second type of tree pair are
>> +   strictly matches their modes,  or return false.  */
>> +
>> +static bool
>> +type_pair_strictly_matches_mode_p (tree_pair type_pair)
>> +{
>> +  return type_strictly_matches_mode_p (type_pair.first)
>> +&& type_strictly_matches_mode_p (type_pair.second);
>> +}
>> +
>>  /* Return true if FN is supported for the types in TYPES when the
>> optimization type is OPT_TYPE.  The types are those associated with
>> the "type0" and "type1" fields of FN's direct_internal_fn_info
>> @@ -4173,6 +4202,9 @@ bool
>>  direct_internal_fn_supported_p (internal_fn fn, tree_pair types,
>> optimization_type opt_type)
>>  {
>> +  if (!type_pair_strictly_matches_mode_p (types))
>> +return false;
>> +
>>switch (fn)
>>  {
>>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \
>> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
>> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
>> new file mode 100644
>> index 000..b8c8aef3b17
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
>> @@ -0,0 +1,34 @@
>> +/* PR target/115961 */
>> +/* { dg-do run } */
>> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
>> +
>> +struct e
>> +{
>> +  unsigned pre : 12;
>> +  unsigned a : 4;
>> +};
>> +
>> +static unsigned min_u (unsigned a, unsigned b)
>> +{
>> +  return (b < a) ? b : a;
>> +}
>> +
>> +__attribute__((n

Re: [PATCH] libcpp, c++: Optimize initializers using #embed in C++

2024-07-23 Thread Jason Merrill

On 7/17/24 3:47 AM, Jakub Jelinek wrote:

Hi!

This patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655013.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657049.html
patches which just introduce non-optimized support for the C23 feature
and two extensions to it actually optimizes it and on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657053.html
patch which adds optimizations to C & middle-end adds similar
optimizations to the C++ FE.
The first hunk enables use of CPP_EMBED token even for C++, not just
C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
parsing (unless #embed is more than 2GB, in that case it could be
CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
bytes).
Similarly to the C patch, this patch parses it into RAW_DATA_CST tree
in the braced initializers (and from there peels into INTEGER_CSTs unless
it is an initializer of an std::byte array or integral array with CHAR_BIT
element precision), parses CPP_EMBED in cp_parser_expression into just
the last INTEGER_CST in it because I think users don't need millions of
-Wunused-value warnings because they did useless
   int a = (
   #embed "megabyte.dat"
   );
and so most of the inner INTEGER_CSTs would be there just for the warning,
and in the rest of contexts like template argument list, function argument
list, attribute argument list, ...) parse it into a sequence of INTEGER_CSTs
(I wrote a range/iterator classes to simplify that).
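
The reason only the last INTEGER_CST matters in the parenthesized case is the
comma operator. The sketch below spells the expansion out by hand for a
hypothetical 4-byte file containing 127, 69, 76, 70 instead of using #embed,
so it builds with compilers that lack the directive. The function names are
made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for:  int a = ( #embed "four-bytes.dat" );
inline int last_of_parenthesized_list ()
{
  // Comma operator: every operand but the last is evaluated and discarded
  // (a compiler may warn about the unused values).
  int a = (127, 69, 76, 70);
  return a;
}

// Stand-in for:  unsigned char b[] = { #embed "four-bytes.dat" };
inline std::size_t braced_list_length ()
{
  unsigned char b[] = {127, 69, 76, 70};
  return sizeof b;
}

inline void check_expansion_contexts ()
{
  assert (last_of_parenthesized_list () == 70);  // only the last value survives
  assert (braced_list_length () == 4);           // every value becomes an element
}
```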

My dumb
cat embed-11.c
constexpr unsigned char a[] = {
#embed "cc1plus"
};
const unsigned char *b = a;
testcase where cc1plus is 492329008 bytes long when configured
--enable-checking=yes,rtl,extra against recent binutils with .base64 gas
support results in:
time ./xg++ -B ./ -S -O2 embed-11.c

real    0m4.350s
user    0m2.427s
sys     0m0.830s
time ./xg++ -B ./ -c -O2 embed-11.c

real    0m6.932s
user    0m6.034s
sys     0m0.888s
(compared to running out of memory or very long compilation).
On a shorter inclusion,
cat embed-12.c
constexpr unsigned char a[] = {
#embed "xg++"
};
const unsigned char *b = a;
where xg++ is 15225904 bytes long, this takes using GCC with the #embed
patchset except for this patch:
time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c

real    0m33.190s
user    0m32.327s
sys     0m0.790s
and with this patch:
time ./xg++ -B ./ -S -O2 embed-12.c

real    0m0.118s
user    0m0.090s
sys     0m0.028s

The patch doesn't change anything on what the first patch in the series
introduces even for C++, namely that #embed is expanded (actually or as if)
into a sequence of literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
and so each element has int type.
That is how I believe it is in C23, and the different versions of the
C++ P1967 paper specified there some casts, P1967R12 in particular
"Otherwise, the integral constant expression is the value of std::fgetc’s
return is cast to unsigned char."
but please see
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
comment and whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.
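
The choice between plain literals and cast literals is observable in C++,
which is why it matters. A minimal sketch, using a hypothetical overload set
named which to report the type each spelling produces:

```cpp
#include <cassert>
#include <type_traits>

// A plain literal has type int; the cast spellings have type unsigned char.
static_assert (std::is_same<decltype (127), int>::value,
               "plain literal is int");
static_assert (std::is_same<decltype (static_cast<unsigned char> (127)),
                            unsigned char>::value,
               "cast literal is unsigned char");

// Hypothetical overload set that reveals which type an argument has.
inline int which (int) { return 1; }
inline int which (unsigned char) { return 2; }

inline void check_literal_types ()
{
  assert (which (127) == 1);                              // int overload
  assert (which (static_cast<unsigned char> (127)) == 2); // unsigned char
  assert (which ((unsigned char) 127) == 2);              // same, C-style cast
}
```

So in contexts such as function argument lists or template arguments, the two
expansions can select different overloads or deduce different types.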


The discussion at that link suggests that the author is planning to 
propose removing the cast.



@@ -6895,16 +6918,68 @@ reshape_init_array_1 (tree elt_type, tre
  {
tree elt_init;
constructor_elt *old_cur = d->cur;
+  const char *old_ptr = NULL;
+
+  if (TREE_CODE (d->cur->value) == RAW_DATA_CST)
+   old_ptr = RAW_DATA_POINTER (d->cur->value);


Let's call this variable old_raw_data_ptr for clarity, here and in 
reshape_init_class.


  
if (d->cur->index)

CONSTRUCTOR_IS_DESIGNATED_INIT (new_init) = true;
check_array_designated_initializer (d->cur, index);
-  elt_init = reshape_init_r (elt_type, d,
-/*first_initializer_p=*/NULL_TREE,
-complain);
+  if (TREE_CODE (d->cur->value) == RAW_DATA_CST
+ && (TREE_CODE (elt_type) == INTEGER_TYPE
+ || (TREE_CODE (elt_type) == ENUMERAL_TYPE
+ && TYPE_CONTEXT (TYPE_MAIN_VARIANT (elt_type)) == std_node
+ && strcmp (TYPE_NAME_STRING (TYPE_MAIN_VARIANT (elt_type)),
+"byte") == 0))


Maybe is_byte_access_type?  Or finally factor out a function to test 
specifically for std::byte, it's odd that we don't have one yet.



@@ -7158,6 +7244,7 @@ reshape_init_class (tree type, reshape_i
 is initialized by 

RE: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread Jiang, Haochen
It might be a false positive timeout alert. Please ignore it for now.

Thx,
Haochen

> -----Original Message-----
> From: haochen.jiang 
> Sent: Tuesday, July 23, 2024 7:51 PM
> To: j...@ventanamicro.com; gcc-regress...@gcc.gnu.org; gcc-
> patc...@gcc.gnu.org; Jiang, Haochen 
> Subject: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -
> std=gnu++98 execution test on Linux/x86_64
> 
> On Linux/x86_64,
> 
> 88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2 is the first bad commit
> commit 88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2
> Author: Jeff Law 
> Date:   Mon Jul 22 08:45:10 2024 -0600
> 
> [NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as live
> in ext-dce
> 
> caused
> 
> FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
> FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
> FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
> 
> with GCC configured with
> 
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-2196/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-10.c --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-10.c --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-6.c --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dfp.exp=c-c++-common/dfp/convert-bfp-6.c --target_board='unix{-m32\ -march=cascadelake}'"
> 
> (Please do not reply to this email. For questions about this report, contact
> me at haochen dot jiang at intel.com.) (If you run into cascadelake-related
> problems, disabling AVX512F on the command line might help.) (However,
> please make sure that there are no potential problems with AVX512.)


Re: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread Jeff Law




On 7/23/24 7:49 PM, Jiang, Haochen wrote:

It might be a false positive timeout alert. Please ignore that first.

Funny, I was wondering about that -- I couldn't get them to fail.

Jeff



[PATCH v3] RISC-V: Supports Profiles in '-march' option.

2024-07-23 Thread Jiawei
Support RISC-V Profiles[1] in the -march option.

By default, the Profile must come first in the input, before any other
extensions.

V2: Fix some formatting errors and add code comments for the parse function.
Thanks to Jeff Law for the review and comments.

V3: Update the testcases and the Profiles extension support. Remove S/M mode
Profiles.
Thanks to Christoph Müllner and Palmer Dabbelt for the review and comments.

[1]https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc

---
 gcc/common/config/riscv/riscv-common.cc  | 71 +++-
 gcc/config/riscv/riscv-subset.h  |  2 +
 gcc/testsuite/gcc.target/riscv/arch-41.c |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-42.c | 12 
 gcc/testsuite/gcc.target/riscv/arch-43.c | 12 
 5 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-41.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-42.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-43.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 682826c0e34..e092026fe9b 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -231,6 +231,12 @@ struct riscv_ext_version
   int minor_version;
 };
 
+struct riscv_profiles
+{
+  const char *profile_name;
+  const char *profile_string;
+};
+
 /* All standard extensions defined in all supported ISA spec.  */
 static const struct riscv_ext_version riscv_ext_version_table[] =
 {
@@ -442,6 +448,31 @@ static const struct riscv_ext_version riscv_combine_info[] =
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
 
+/* This table records the mapping from RISC-V Profiles into march string.  */
+static const riscv_profiles riscv_profiles_table[] =
+{
+  /* RVI20U only contains the base extension 'i' as mandatory extension.  */
+  {"RVI20U64", "rv64i"},
+  {"RVI20U32", "rv32i"},
+
+  /* RVA20U contains the 'i,m,a,f,d,c,zicsr,zicntr,ziccif,ziccrse,ziccamoa,
+ zicclsm,za128rs' as mandatory extensions.  */
+  {"RVA20U64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_za128rs"},
+
+  /* RVA22U contains the 'i,m,a,f,d,c,zicsr,zihintpause,zba,zbb,zbs,zicntr,
+ zihpm,ziccif,ziccrse,ziccamoa, zicclsm,zic64b,za64rs,zicbom,zicbop,zicboz,
+ zfhmin,zkt' as mandatory extensions.  */
+  {"RVA22U64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt"},
+
+  /* Currently we do not define S/M mode Profiles in gcc part.  */
+
+  /* Terminate the list.  */
+  {NULL, NULL}
+};
+
 static const riscv_cpu_info riscv_cpu_tables[] =
 {
 #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
@@ -1047,6 +1078,42 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   return p;
 }
 
+/* Parsing RISC-V Profiles in -march string.
+   Return string with mandatory extensions of Profiles.  */
+const char *
+riscv_subset_list::parse_profiles (const char * p){
+  /* Check whether the input string contains a Profile.
+ There are two ways to use a Profile in the -march option:
+
+   1. A Profile alone as the -march input.
+   2. A Profile mixed with other extensions.
+
+ Use '+' to separate the Profile from the other extensions.  */
+  for (int i = 0; riscv_profiles_table[i].profile_name != NULL; ++i) {
+const char* match = strstr(p, riscv_profiles_table[i].profile_name);
+const char* plus_ext = strchr(p, '+');
+/* Find profile at the begin.  */
+if (match != NULL && match == p) {
+  /* If there's no '+' sign, return the profile_string directly.  */
+  if(!plus_ext)
+   return riscv_profiles_table[i].profile_string;
+  /* If there's a '+' sign, need to add profiles with other ext.  */
+  else {
+   size_t arch_len = strlen(riscv_profiles_table[i].profile_string)+
+ strlen(plus_ext);
+   /* Reset the input string with Profiles mandatory extensions,
+  end with '_' to connect other additional extensions.  */
+   char* result = new char[arch_len + 2];
+   strcpy(result, riscv_profiles_table[i].profile_string);
+   strcat(result, "_");
+   strcat(result, plus_ext + 1); /* skip the '+'.  */
+   return result;
+  }
+}
+  }
+  return p;
+}
+
 /* Parsing function for base extensions, rv[32|64][i|e|g]
 
Return Value:
@@ -1060,6 +1127,8 @@ riscv_subset_list::parse_base_ext (const char *p)
   unsigned major_version = 0;
   unsigned minor_version = 0;
   bool explicit_version_p = false;
+
+  p = parse_profiles(p);
 
   if (startswith (p, "rv32"))
 {
@@ -1073,7 +1142,7 @@ riscv_subset_list::parse_base_ext (const char *p)
 }
   else
 {
-  error_at (m_loc, "%<-march=%s%>: ISA string must begin with rv32 or rv64",
+  error_at (m_loc, "%<-march=%s%>: ISA string must begin with rv32, rv64 or Profile",
m_arch);
   return NULL;
 }
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index dace4de6575..98fd9877f74 100644
--
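
For readers following the thread, the expansion described in the
parse_profiles comment can be sketched in isolation as below. This is only
an illustration under stated assumptions, not the patch's code: the names
profile_entry, profile_table, expand_profile and check_expand_profile are
hypothetical, the table holds just two of the patch's entries, and
std::string is used in place of the patch's manually sized buffer. Unlike
the patch, the sketch also requires the '+' to follow the Profile name
directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the patch's riscv_profiles table entry.
struct profile_entry
{
  const char *name;  // Profile name as written in -march.
  const char *arch;  // Equivalent ISA string.
};

// Only two entries, copied from the patch, to keep the sketch short.
static const profile_entry profile_table[] = {
  {"RVI20U64", "rv64i"},
  {"RVI20U32", "rv32i"},
  {nullptr, nullptr}
};

// Expand a leading Profile name into its ISA string.  Anything after a
// '+' is appended with '_' as the separator.  Input that does not start
// with a known Profile is returned unchanged.
std::string
expand_profile (const std::string &march)
{
  for (const profile_entry *e = profile_table; e->name != nullptr; ++e)
    {
      std::size_t len = std::strlen (e->name);
      if (march.compare (0, len, e->name) != 0)
        continue;
      if (march.size () == len)
        return e->arch;  // Case 1: Profile only.
      if (march[len] == '+')
        return std::string (e->arch) + "_" + march.substr (len + 1);  // Case 2.
    }
  return march;
}

// Small self-check covering both cases and the pass-through path.
void
check_expand_profile ()
{
  assert (expand_profile ("RVI20U64") == "rv64i");
  assert (expand_profile ("RVI20U32+zba") == "rv32i_zba");
  assert (expand_profile ("rv64gc") == "rv64gc");
}
```

Returning a std::string sidesteps the question of who owns the buffer that
holds the combined string.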

Re: [PATCH v2] RISC-V: Add basic support for the Zacas extension

2024-07-23 Thread Kito Cheng
I am inclined not to add the skip_zacas stuff (skip_zabha is already
there, but that's fine), because this is a different situation from
zaamo/zalrsc: zaamo/zalrsc are appended automatically when the 'a'
extension is present, which is new behavior and involves new extensions.

But zacas is only added when users explicitly put it in the -march
string, unlike zaamo/zalrsc, so I am not sure we need to check for
binutils support and drop it if unsupported.

My biggest concern is: should we do so for every new extension?

I think we didn't do that so far, so we should


[PATCH] optabs/rs6000: Rename iorc and andc to iorn and andn

2024-07-23 Thread Andrew Pinski
When I was trying to add a scalar version of iorc and andc, the optab that
got matched was the one for and/ior with the modes csi and cdi instead of the
iorc and andc optabs for the si and di modes. Since csi/cdi are the complex
integer modes, we need to rename the optabs so they do not end in c. This
changes the c to n, which is neutral and known not to be the first letter of
a mode.
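
The clash can be reproduced with plain string concatenation. The sketch
below is a simplified model, not GCC code: it assumes a pattern name is
just the optab stem, the lowercase mode name and the operand count glued
together, and pattern_name and check_pattern_names are hypothetical
helpers.

```cpp
#include <cassert>
#include <string>

// Simplified model: stem + mode + operand count, with no separator.
std::string
pattern_name (const std::string &stem, const std::string &mode, int nops)
{
  return stem + mode + std::to_string (nops);
}

void
check_pattern_names ()
{
  // "andc" on si and "and" on csi both spell "andcsi3".
  assert (pattern_name ("andc", "si", 3) == pattern_name ("and", "csi", 3));
  // Same for "iorc" on di and "ior" on cdi.
  assert (pattern_name ("iorc", "di", 3) == pattern_name ("ior", "cdi", 3));
  // With the n suffix the names no longer collide.
  assert (pattern_name ("andn", "si", 3) != pattern_name ("and", "csi", 3));
  assert (pattern_name ("iorn", "di", 3) != pattern_name ("ior", "cdi", 3));
}
```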

Bootstrapped and tested on x86_64 and powerpc64le.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def: s/iorc/iorn/. s/andc/andn/
for the code.
* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update
to andn.
* config/rs6000/rs6000.md (andc3): Rename to ...
(andn3): This.
(iorc3): Rename to ...
(iorn3): This.
* doc/md.texi: Update documentation for the rename.
* internal-fn.def (BIT_ANDC): Rename to ...
(BIT_ANDN): This.
(BIT_IORC): Rename to ...
(BIT_IORN): This.
* optabs.def (andc_optab): Rename to ...
(andn_optab): This.
(iorc_optab): Rename to ...
(iorn_optab): This.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the
renamed internal functions, ANDC/IORC to ANDN/IORN.

Signed-off-by: Andrew Pinski 
---
 gcc/config/rs6000/rs6000-builtins.def | 44 +--
 gcc/config/rs6000/rs6000-string.cc|  2 +-
 gcc/config/rs6000/rs6000.md   |  4 +--
 gcc/doc/md.texi   |  8 ++---
 gcc/gimple-isel.cc| 12 
 gcc/internal-fn.def   |  4 +--
 gcc/optabs.def| 10 --
 7 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..ffbeff64d6d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -518,25 +518,25 @@
 VAND_V8HI_UNS andv8hi3 {}
 
   const vsc __builtin_altivec_vandc_v16qi (vsc, vsc);
-VANDC_V16QI andcv16qi3 {}
+VANDC_V16QI andnv16qi3 {}
 
   const vuc __builtin_altivec_vandc_v16qi_uns (vuc, vuc);
-VANDC_V16QI_UNS andcv16qi3 {}
+VANDC_V16QI_UNS andnv16qi3 {}
 
   const vf __builtin_altivec_vandc_v4sf (vf, vf);
-VANDC_V4SF andcv4sf3 {}
+VANDC_V4SF andnv4sf3 {}
 
   const vsi __builtin_altivec_vandc_v4si (vsi, vsi);
-VANDC_V4SI andcv4si3 {}
+VANDC_V4SI andnv4si3 {}
 
   const vui __builtin_altivec_vandc_v4si_uns (vui, vui);
-VANDC_V4SI_UNS andcv4si3 {}
+VANDC_V4SI_UNS andnv4si3 {}
 
   const vss __builtin_altivec_vandc_v8hi (vss, vss);
-VANDC_V8HI andcv8hi3 {}
+VANDC_V8HI andnv8hi3 {}
 
   const vus __builtin_altivec_vandc_v8hi_uns (vus, vus);
-VANDC_V8HI_UNS andcv8hi3 {}
+VANDC_V8HI_UNS andnv8hi3 {}
 
   const vsc __builtin_altivec_vavgsb (vsc, vsc);
 VAVGSB avgv16qi3_ceil {}
@@ -1189,13 +1189,13 @@
 VAND_V2DI_UNS andv2di3 {}
 
   const vd __builtin_altivec_vandc_v2df (vd, vd);
-VANDC_V2DF andcv2df3 {}
+VANDC_V2DF andnv2df3 {}
 
   const vsll __builtin_altivec_vandc_v2di (vsll, vsll);
-VANDC_V2DI andcv2di3 {}
+VANDC_V2DI andnv2di3 {}
 
   const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
-VANDC_V2DI_UNS andcv2di3 {}
+VANDC_V2DI_UNS andnv2di3 {}
 
   const vd __builtin_altivec_vnor_v2df (vd, vd);
 VNOR_V2DF norv2df3 {}
@@ -1975,40 +1975,40 @@
 NEG_V2DI negv2di2 {}
 
   const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
-ORC_V16QI iorcv16qi3 {}
+ORC_V16QI iornv16qi3 {}
 
   const vuc __builtin_altivec_orc_v16qi_uns (vuc, vuc);
-ORC_V16QI_UNS iorcv16qi3 {}
+ORC_V16QI_UNS iornv16qi3 {}
 
   const vsq __builtin_altivec_orc_v1ti (vsq, vsq);
-ORC_V1TI iorcv1ti3 {}
+ORC_V1TI iornv1ti3 {}
 
   const vuq __builtin_altivec_orc_v1ti_uns (vuq, vuq);
-ORC_V1TI_UNS iorcv1ti3 {}
+ORC_V1TI_UNS iornv1ti3 {}
 
   const vd __builtin_altivec_orc_v2df (vd, vd);
-ORC_V2DF iorcv2df3 {}
+ORC_V2DF iornv2df3 {}
 
   const vsll __builtin_altivec_orc_v2di (vsll, vsll);
-ORC_V2DI iorcv2di3 {}
+ORC_V2DI iornv2di3 {}
 
   const vull __builtin_altivec_orc_v2di_uns (vull, vull);
-ORC_V2DI_UNS iorcv2di3 {}
+ORC_V2DI_UNS iornv2di3 {}
 
   const vf __builtin_altivec_orc_v4sf (vf, vf);
-ORC_V4SF iorcv4sf3 {}
+ORC_V4SF iornv4sf3 {}
 
   const vsi __builtin_altivec_orc_v4si (vsi, vsi);
-ORC_V4SI iorcv4si3 {}
+ORC_V4SI iornv4si3 {}
 
   const vui __builtin_altivec_orc_v4si_uns (vui, vui);
-ORC_V4SI_UNS iorcv4si3 {}
+ORC_V4SI_UNS iornv4si3 {}
 
   const vss __builtin_altivec_orc_v8hi (vss, vss);
-ORC_V8HI iorcv8hi3 {}
+ORC_V8HI iornv8hi3 {}
 
   const vus __builtin_altivec_orc_v8hi_uns (vus, vus);
-ORC_V8HI_UNS iorcv8hi3 {}
+ORC_V8HI_UNS iornv8hi3 {}
 
   const vsc __builtin_altivec_vclzb (vsc);
 VCLZB clzv16qi2 {}
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 0f992902f38..55b4133b1a3 100644
--- a/gcc/config/rs600

Re: [PATCH] c++: diagnose failed qualified lookup into current inst

2024-07-23 Thread Andrew Pinski
On Wed, Jul 17, 2024 at 10:55 AM Patrick Palka  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?

Just an FYI. This broke xalancbmk_r in SPEC 2017. clang has a flag to
delay the checking until instantiation time to work around this buggy
code, -fdelayed-template-parsing . Does it make sense to add a similar
one for GCC? See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116064
and https://github.com/llvm/llvm-project/issues/96859 for reference.

Thanks,
Andrew Pinski

>
> -- >8 --
>
> When the scope of a qualified name is the current instantiation, and
> qualified lookup finds nothing at template definition time, then we
> know it'll find nothing at instantiation time (unless the current
> instantiation has dependent bases).  So such qualified name lookup
> failure can be diagnosed ahead of time as per [temp.res.general]/6.
>
> This patch implements that, for qualified names of the form:
>
>   this->non_existent
>   a.non_existent
>   A::non_existent
>   typename A::non_existent
>
> It turns out we already optimistically attempt qualified lookup of
> basically every qualified name, even when it's dependently scoped, and
> just suppress issuing a lookup failure diagnostic after the fact when
> the scope is a dependent type.  So implementing this is mostly a
> matter of restricting the diagnostic suppression to "dependentish"
> scopes, rather than all dependently typed scopes.
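
To make the distinction in the quoted description concrete, here is a small
sketch. It is illustrative only and not taken from the patch or its
testsuite; Base, Derived and Plain are hypothetical names. The failing
lookup is left commented out so that the example compiles.

```cpp
#include <cassert>

template<class T>
struct Base
{
  int value = 42;
};

// The current instantiation has a dependent base, so the lookup of
// this->value cannot be resolved until instantiation time.
template<class T>
struct Derived : Base<T>
{
  int get () { return this->value; }
};

// No dependent bases: every member is known at definition time, so a
// failed lookup here can be reported without instantiating anything.
template<class T>
struct Plain
{
  int value = 7;
  int get () { return this->value; }
  // int bad () { return this->non_existent; }  // the kind of lookup the patch diagnoses early
};
```

Instantiating Derived<int> and Plain<int> and calling get () yields 42 and
7 respectively.
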
>
> The cp_parser_conversion_function_id change is needed to avoid regressing
> lookup/using8.C:
>
>   using A::operator typename A::Nested*;
>
> When resolving A::Nested we consider it not dependently scoped since
> we entered A from cp_parser_conversion_function_id earlier.   But this
> A is the implicit instantiation A not the primary template type A,
> and so the lookup of Nested fails which we now diagnose.  This patch works
> around this by not entering the template scope of a qualified conversion
> function-id in this case, i.e. if we're in an expression vs declaration
> context, by seeing if the type already went through finish_template_type
> with entering_scope=true.
>
> gcc/cp/ChangeLog:
>
> * decl.cc (make_typename_type): Restrict name lookup failure
> punting to dependentish_scope_p instead of dependent_type_p.
> * error.cc (qualified_name_lookup_error): Improve diagnostic
> when the scope is the current instantiation.
> * parser.cc (cp_parser_diagnose_invalid_type_name): Likewise.
> (cp_parser_conversion_function_id): Don't call push_scope on
> a template scope unless we're in a declaration context.
> (cp_parser_lookup_name): Restrict name lookup failure
> punting to dependentish_scope_p instead of dependent_type_p.
> * semantics.cc (finish_id_expression_1): Likewise.
> * typeck.cc (finish_class_member_access_expr): Likewise.
>
> libstdc++-v3/ChangeLog:
>
> * include/experimental/socket
> (basic_socket_iostream::basic_socket_iostream): Fix typo.
> * include/tr2/dynamic_bitset
> (__dynamic_bitset_base::_M_is_proper_subset_of): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp0x/alignas18.C: Expect name lookup error for U::X.
> * g++.dg/cpp0x/forw_enum13.C: Expect name lookup error for
> D3::A and D4::A.
> * g++.dg/parse/access13.C: Declare A::E::V to avoid name lookup
> failure and preserve intent of the test.
> * g++.dg/parse/enum11.C: Expect extra errors, matching the
> non-template case.
> * g++.dg/template/crash123.C: Avoid name lookup failure to
> preserve intent of the test.
> * g++.dg/template/crash124.C: Likewise.
> * g++.dg/template/crash7.C: Adjust expected diagnostics.
> * g++.dg/template/dtor6.C: Declare A::~A() to avoid name lookup
> failure and preserve intent of the test.
> * g++.dg/template/error22.C: Adjust expected diagnostics.
> * g++.dg/template/static30.C: Avoid name lookup failure to
> preserve intent of the test.
> * g++.old-deja/g++.other/decl5.C: Adjust expected diagnostics.
> * g++.dg/template/non-dependent34.C: New test.
> ---
>  gcc/cp/decl.cc|  2 +-
>  gcc/cp/error.cc   |  3 +-
>  gcc/cp/parser.cc  | 10 +++--
>  gcc/cp/semantics.cc   |  2 +-
>  gcc/cp/typeck.cc  |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/alignas18.C|  3 +-
>  gcc/testsuite/g++.dg/cpp0x/forw_enum13.C  |  6 +--
>  gcc/testsuite/g++.dg/parse/access13.C |  1 +
>  gcc/testsuite/g++.dg/parse/enum11.C   |  2 +-
>  gcc/testsuite/g++.dg/template/crash123.C  |  2 +-
>  gcc/testsuite/g++.dg/template/crash124.C  |  4 +-
>  gcc/testsuite/g++.dg/template/crash7.C|  6 +--
>  gcc/testsuite/g++.dg/template/dtor6.C |  3 +-
>  gcc/testsuite/g++.dg/template/e

Re: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread Jakub Jelinek
On Wed, Jul 24, 2024 at 01:49:06AM +, Jiang, Haochen wrote:
> It might be a false positive timeout alert. Please ignore that first.

It is not.  I'm seeing it too consistently on i686-linux:
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++11 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++23 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++26 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++11 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++23 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++26 execution test
obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++11 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++23 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++26 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++11 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++23 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++26 execution test
obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test

The compilation of convert-bfp-6.c itself is identical between the older
(where it didn't fail) and newer (where it fails) builds, what has changed
is libgcc.a.
In particular, what matters is libgcc/bid_binarydecimal.o.
If I link all objects from libgcc from older (good libgcc) but
bid_binarydecimal.o (that one from newer bad libgcc), convert-bfp-6 still
aborts, if I link all objects from libgcc from newer (bad libgcc) but
bid_binarydecimal.o (that one from older good libgcc), convert-bfp-6 works.

Jakub



RE: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c -std=gnu++98 execution test on Linux/x86_64

2024-07-23 Thread Jiang, Haochen



> -Original Message-
> From: Jakub Jelinek 
> Sent: Wednesday, July 24, 2024 1:09 PM
> To: Jiang, Haochen 
> Cc: j...@ventanamicro.com; gcc-regress...@gcc.gnu.org;
> gcc-patc...@gcc.gnu.org
> Subject: Re: [r15-2196 Regression] FAIL: c-c++-common/dfp/convert-bfp-6.c
> -std=gnu++98 execution test on Linux/x86_64
> 
> On Wed, Jul 24, 2024 at 01:49:06AM +, Jiang, Haochen wrote:
> > It might be a false positive timeout alert. Please ignore that first.
> 
> It is not.  I'm seeing it too consistently on i686-linux:
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++11 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++23 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++26 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++11 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++23 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++26 execution test
> obj49/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++11 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++14 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++17 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++20 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++23 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++26 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-10.c  -std=c++98 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++11 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++14 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++17 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++20 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++23 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++26 execution test
> obj51/LOGT:FAIL: c-c++-common/dfp/convert-bfp-6.c  -std=gnu++98 execution test
> 
> The compilation of convert-bfp-6.c itself is identical between the older
> (where it didn't fail) and newer (where it fails) builds, what has changed
> is libgcc.a.
> In particular, what matters is libgcc/bid_binarydecimal.o.
> If I link all objects from libgcc from older (good libgcc) but
> bid_binarydecimal.o (that one from newer bad libgcc), convert-bfp-6 still
> aborts, if I link all objects from libgcc from newer (bad libgcc) but
> bid_binarydecimal.o (that one from older good libgcc), convert-bfp-6 works.

I see. If it is not a false alarm, then it seems to me that
gcc-15-2212-gad642d2c950 from Jeff might fix the problem in the regression
report. But I am not sure whether it really fixes the problem or just
happens to make the tests pass.

Thx,
Haochen

> 
>   Jakub