[PATCH] tree-optimization/118558 - fix alignment compute with VMAT_CONTIGUOUS_REVERSE

2025-01-21 Thread Richard Biener
There are calls to dr_misalignment left that do not correct for the
offset (which is vector type dependent) when the stride is negative.
Notably vect_known_alignment_in_bytes doesn't allow passing such an
offset through, which the following adds (computing the offset in
vect_known_alignment_in_bytes would be possible as well, but the
offset can be shared, as seen).  Eventually this function could go away.

This leads to peeling for gaps not being considered, nor shortening of
the access being applied, which is what fixes the testcase on x86_64.
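
As a rough illustration (my own sketch, not part of the patch): for a
reversed contiguous access the first vector load starts VF - 1 elements
below the scalar start address, which is exactly the offset now passed
to the alignment queries:

#include <stdio.h>

int
main (void)
{
  enum { VF = 4 };
  unsigned long a[8];
  /* The scalar loop walks a[7], a[6], ...; the first vector load
     covers a[4..7], i.e. it starts (VF - 1) * sizeof (a[0]) bytes
     below the scalar start -- the "off" used for dr_misalignment.  */
  printf ("%td\n", (char *) &a[8 - VF] - (char *) &a[7]);
  return 0;
}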

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.  I'll
be watching the CI to see whether other targets also run into this
issue (and whether it's fixed by the patch).

PR tree-optimization/118558
* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
through offset to dr_misalignment.
* tree-vect-stmts.cc (get_group_load_store_type): Compute
offset applied for negative stride and use it when querying
alignment of accesses.
(vectorizable_load): Likewise.

* gcc.dg/vect/pr118558.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr118558.c | 15 +++
 gcc/tree-vect-stmts.cc   | 24 +---
 gcc/tree-vectorizer.h|  5 +++--
 3 files changed, 35 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr118558.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr118558.c 
b/gcc/testsuite/gcc.dg/vect/pr118558.c
new file mode 100644
index 000..5483328d686
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr118558.c
@@ -0,0 +1,15 @@
+#include "tree-vect.h"
+
+static unsigned long g_270[5][2] = {{123}};
+static short g_2312 = 0;
+int main()
+{
+  check_vect ();
+  int g_1168 = 0;
+  unsigned t = 4;
+  for (g_1168 = 3; g_1168 >= 0; g_1168 -= 1)
+for (g_2312 = 0; g_2312 <= 1; g_2312 += 1)
+  t = g_270[g_1168][0];
+  if (t != 123) __builtin_abort();
+}
+
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21fb5cf5bd4..c0550acf6b2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2198,14 +2198,20 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
   " non-consecutive accesses\n");
  return false;
}
+
+ unsigned HOST_WIDE_INT dr_size
+   = vect_get_scalar_dr_size (first_dr_info);
+ poly_int64 off = 0;
+ if (*memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+   off = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * -dr_size;
+
  /* An overrun is fine if the trailing elements are smaller
 than the alignment boundary B.  Every vector access will
 be a multiple of B and so we are guaranteed to access a
 non-gap element in the same B-sized block.  */
  if (overrun_p
  && gap < (vect_known_alignment_in_bytes (first_dr_info,
-  vectype)
-   / vect_get_scalar_dr_size (first_dr_info)))
+  vectype, off) / dr_size))
overrun_p = false;
 
  /* When we have a contiguous access across loop iterations
@@ -2230,7 +2236,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
 by simply loading half of the vector only.  Usually
 the construction with an upper zero half will be elided.  */
  dr_alignment_support alss;
- int misalign = dr_misalignment (first_dr_info, vectype);
+ int misalign = dr_misalignment (first_dr_info, vectype, off);
  tree half_vtype;
  poly_uint64 remain;
  unsigned HOST_WIDE_INT tem, num;
@@ -11991,8 +11997,14 @@ vectorizable_load (vec_info *vinfo,
tree ltype = vectype;
tree new_vtype = NULL_TREE;
unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
+   unsigned HOST_WIDE_INT dr_size
+ = vect_get_scalar_dr_size (first_dr_info);
+   poly_int64 off = 0;
+   if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+ off = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * -dr_size;
unsigned int vect_align
- = vect_known_alignment_in_bytes (first_dr_info, vectype);
+ = vect_known_alignment_in_bytes (first_dr_info, vectype,
+  off);
/* Try to use a single smaller load when we are about
   to load excess elements compared to the unrolled
   scalar loop.  */
@@ -12013,9 +12025,7 @@ vectorizable_load (vec_info *vinfo,
 scalar loop.  */
  ;
else if (known_gt (vect_align,
-  ((nunits - remain)
-   

Re: [PATCH] [ifcombine] avoid dropping tree_could_trap_p [PR118514]

2025-01-21 Thread Richard Biener
On Tue, Jan 21, 2025 at 10:52 AM Jakub Jelinek  wrote:
>
> On Tue, Jan 21, 2025 at 06:31:43AM -0300, Alexandre Oliva wrote:
> > On Jan 21, 2025, Richard Biener  wrote:
> >
> > > you can use bit_field_size () and bit_field_offset () unconditionally,
> >
> > Nice, thanks!
> >
> > > Now, we don't have the same handling on BIT_FIELD_REFs but it
> > > seems it's enough to apply the check to those with a DECL as
> > > object to operate on.
> >
> > I doubt that will be enough.  I'm pretty sure the cases I saw in libgnat
> > in which BIT_FIELD_REF changed could_trap status, compared with the
> > preexisting convert-and-access-field it replaced, were not DECLs, but
> > dereferences.  But I'll check and report back.  (I'll be AFK for most of
> > the day, alas)
>
> I'd think if we know for sure access is out of bounds, we shouldn't be
> creating BIT_FIELD_REF for it (so in case of combine punt on the
> optimization).
> A different thing is if we don't know it, where the base is say a MEM_REF
> or something similar.

But we assume all indirect MEM_REFs may trap, likewise we check
whether MEM_REFs of DECLs do out-of-bounds accesses.  The idea is
to do the same for BIT_FIELD_REF when the access is based on a decl.
I _think_ that should make them tree_could_trap_p at least.

But sure, not creating BIT_FIELD_REFs that are "obviously" out-of-bound
would be nice.  That's why I suggested not creating them when the original
ref was tree_could_trap_p - maybe only when the base get_inner_reference
returned isn't a MEM_REF.
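
For illustration only (not the testcase from the PR): the kind of source
where ifcombine may merge adjacent single-bit tests into one wider load,
which is where a BIT_FIELD_REF that could trap or read out of bounds can
show up:

struct flags { unsigned char a; unsigned char b; };

/* ifcombine may turn the two tests into a single wider load plus mask;
   whether that merged access could trap (indirect base) or read past the
   underlying decl is what the discussion above is about.  */
int
both_set (struct flags *p)
{
  return (p->a & 1) && (p->b & 1);
}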

Richard.

>
> Jakub
>


Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 21:52 +0800, Xi Ruoyao wrote:
> > struct Pair { unsigned long a, b; };
> > 
> > struct Pair
> > test (struct Pair p, long x, long y)
> > {
> >   p.a &= 0x;
> >   p.a <<= 2;
> >   p.a += x;
> >   p.b &= 0x;
> >   p.b <<= 2;
> >   p.b += x;
> >   return p;
> > }
> > 
> > in GCC 13 the result is:
> > 
> >     or  $r12,$r4,$r0
> 
> Hmm, this strange move is caused by "&" in bstrpick_alsl_paired.  Is it
> really needed for the fusion?

Never mind, it's needed, or a = ((a & 0x) << 1) + a will blow up.
Stupid me.

> >     bstrpick.d  $r4,$r12,31,0
> >     alsl.d  $r4,$r4,$r6,2
> >     or  $r12,$r5,$r0
> >     bstrpick.d  $r5,$r12,31,0
> >     alsl.d  $r5,$r5,$r6,2
> >     jr  $r1

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v3] c++: fix wrong-code with constexpr prvalue opt [PR118396]

2025-01-21 Thread Jason Merrill

On 1/20/25 5:58 PM, Marek Polacek wrote:

On Mon, Jan 20, 2025 at 12:39:03PM -0500, Jason Merrill wrote:

On 1/20/25 12:27 PM, Marek Polacek wrote:

On Mon, Jan 20, 2025 at 11:46:44AM -0500, Jason Merrill wrote:

On 1/20/25 10:27 AM, Marek Polacek wrote:

On Fri, Jan 17, 2025 at 06:38:45PM -0500, Jason Merrill wrote:

On 1/17/25 1:31 PM, Marek Polacek wrote:

On Fri, Jan 17, 2025 at 08:10:24AM -0500, Jason Merrill wrote:

On 1/16/25 8:04 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
The recent r15-6369 unfortunately caused a bad wrong-code issue.
Here we have

   TARGET_EXPR 

and call cp_fold_r -> maybe_constant_init with object=D.2996.  In
cxx_eval_outermost_constant_expr we now take the type of the object
if present.  An object can't have type 'void' and so we continue to
evaluate the initializer.  That evaluates into a VOID_CST, meaning
we disregard the whole initializer, and terrible things ensue.


In that case, I'd think we want to use the value of 'object' (which should
be in ctx.ctor?) instead of the return value of
cxx_eval_constant_expression.


Ah, I'm sorry I didn't choose that approach.  Maybe like this, then?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.  Maybe also add an assert that TREE_TYPE (r) is close enough to type?


Thanks.  dg.exp passed with this extra assert:

@@ -8986,7 +8986,11 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
  /* If we got a non-simple TARGET_EXPR, the initializer was a sequence
 of statements, and the result ought to be stored in ctx.ctor.  */
  if (r == void_node && !constexpr_dtor && ctx.ctor)
-r = ctx.ctor;
+{
+  r = ctx.ctor;
+  gcc_checking_assert (same_type_ignoring_top_level_qualifiers_p
+  (TREE_TYPE (r), type));
+}


I was thinking to add that assert in general, not just in this case, to
catch any other instances of trying to return the wrong type.


Unfortunately this
+  /* Check we are not trying to return the wrong type.  */
+  gcc_checking_assert (same_type_ignoring_top_level_qualifiers_p
+  (initialized_type (r), type)


Why not just TREE_TYPE (r)?


Adjusted to use TREE_TYPE now.
  

+  || error_operand_p (type));
breaks too much, e.g. constexpr-prvalue2.C with struct A x struct B,
or pr82128.C
*(((struct C *) a)->D.2903._vptr.A + 8)
x
int (*) ()

I've also tried can_convert, or similar_type_p but no luck.  Any thoughts?


Those both sound like the sort of bugs the assert is intended to catch. But
I suppose we can't add it without fixing them first.

In the latter case, probably by adding an explicit conversion from the vtbl
slot type to the desired function pointer type.

In the former case, I don't see a constant-expression, so we shouldn't be
trying to check the type of a nonexistent constant result?


As discussed earlier, this patch just returns the original expression if
the types don't match:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks!


-- >8 --
The recent r15-6369 unfortunately caused a bad wrong-code issue.
Here we have

   TARGET_EXPR 

and call cp_fold_r -> maybe_constant_init with object=D.2996.  In
cxx_eval_outermost_constant_expr we now take the type of the object
if present.  An object can't have type 'void' and so we continue to
evaluate the initializer.  That evaluates into a VOID_CST, meaning
we disregard the whole initializer, and terrible things ensue.

For non-simple TARGET_EXPRs, we should return ctx.ctor rather than
the result of cxx_eval_constant_expression.

PR c++/118396
PR c++/118523

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): For non-simple
TARGET_EXPRs, return ctx.ctor rather than the result of
cxx_eval_constant_expression.  If TYPE and the type of R don't
match, return the original expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-prvalue4.C: New test.
* g++.dg/cpp1y/constexpr-prvalue3.C: New test.

Reviewed-by: Jason Merrill 
---
  gcc/cp/constexpr.cc   |  9 +++-
  .../g++.dg/cpp0x/constexpr-prvalue4.C | 33 ++
  .../g++.dg/cpp1y/constexpr-prvalue3.C | 45 +++
  3 files changed, 86 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue4.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue3.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 7ff38f8b5e5..9f950ffed74 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8983,6 +8983,11 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
r = cxx_eval_constant_expression (&ctx, r, vc_prvalue,
&non_constant_p, &overflow_p);
  
+  /* If we got a non-simple TARGET_EXPR, the initializer was a sequence

+ of statements, and the result ought to be stored in ctx.ctor.  

Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Qing Zhao



> On Jan 20, 2025, at 16:19, Joseph Myers  wrote:
> 
> On Sat, 18 Jan 2025, Kees Cook wrote:
> 
>> Gaining access to global variables is another gap Linux has -- e.g. we
>> have arrays that are sized by the global number-of-cpus variable. :)
> 
> Note that it's already defined that counted_by takes an identifier for a 
> structure member (i.e. not an expression, not following the name lookup 
> rules used in expressions).  So some different syntax that only takes an 
> expression and not an identifier interpreted as a structure member would 
> be needed for anything that allows use of a global variable.

If we need to add such syntax for counted_by (i.e., an expression), can we
still keep the same attribute name, or do we need a new attribute name for
the new syntax?
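
A small sketch of the two forms under discussion -- the expression-taking
variant (and its counted_by_expr spelling) below is purely hypothetical and
does not exist today:

extern int nr_cpus;

struct flex
{
  int n;
  int data[] __attribute__ ((counted_by (n)));   /* existing: member name */
#if 0
  /* hypothetical: a bound given as an expression over a global */
  int *per_cpu __attribute__ ((counted_by_expr (nr_cpus)));
#endif
};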

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> > > in GCC 13 the result is:
> > > 
> > >   or  $r12,$r4,$r0
> > 
> > Hmm, this strange move is caused by "&" in bstrpick_alsl_paired.  Is it
> > really needed for the fusion?
> 
> Never mind, it's needed or a = ((a & 0x) << 1) + a will blow up.
> Stupid I.

And my code is indeed broken due to the missing '&':

/* { dg-do run } */
/* { dg-options "-O2" } */

register long x asm ("s0");

#define TEST(x) (int)(((x & 0x114) << 3) + x)

[[gnu::noipa]] void
test (void)
{
  x = TEST (x);
}

int
main (void)
{
  x = 0x;
  test ();
  if (x != TEST (0x))
__builtin_trap ();
}

ends up:

0760 :
 760:   034452f7andi$s0, $s0, 0x114
 764:   00055ef7alsl.w  $s0, $s0, $s0, 0x3
 768:   4c20ret

and fails.  The fix would be like https://gcc.gnu.org/r15-5074.

> > >   bstrpick.d  $r4,$r12,31,0
> > >   alsl.d  $r4,$r4,$r6,2
> > >   or  $r12,$r5,$r0
> > >   bstrpick.d  $r5,$r12,31,0
> > >   alsl.d  $r5,$r5,$r6,2
> > >   jr  $r1

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: Fixes for test case pr117546.c

2025-01-21 Thread Georg-Johann Lay

On 18.01.25 at 19:30, Dimitar Dimitrov wrote:

This test fails on AVR.

Debugging the test on an x86 host, I noticed that u in function s sometimes
has the value 16128.  The "t <= 3 * u" expression in the same function
results in signed integer overflow for targets with a 16-bit int.

Fix by requiring the int32 effective target.
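
A minimal sketch of the arithmetic behind the failure (mine, not part of
the patch), using int16_t to stand in for a 16-bit int:

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  int16_t u = 16128;        /* the value observed while debugging */
  long product = 3L * u;    /* 48384, which exceeds INT16_MAX (32767) */
  printf ("3 * %d = %ld\n", u, product);
  return 0;
}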


Thank you.  Though int32plus should be good enough?

Johann


Also add return statement for the main function.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117546.c: Require effective target int32.
(main): Add return statement.

Ok for trunk?

Cc: Sam James 
Signed-off-by: Dimitar Dimitrov 
---
  gcc/testsuite/gcc.dg/torture/pr117546.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr117546.c 
b/gcc/testsuite/gcc.dg/torture/pr117546.c
index 21e2aef18b9..b60f877a906 100644
--- a/gcc/testsuite/gcc.dg/torture/pr117546.c
+++ b/gcc/testsuite/gcc.dg/torture/pr117546.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int32 } } */
  
  typedef struct {

int a;
@@ -81,4 +81,6 @@ int main() {
l.glyf.coords[4] = (e){2, 206};
l.glyf.coords[6] = (e){0, 308, 5};
w(&l);
+
+  return 0;
  }


[PATCH 02/13] i386: Change mnemonics from V[ADDNE, DIVNE, MULNE, RCP, SUBNE]PBF16 to V[ADD, DIV, MUL, RCP, SUB]BF16

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md (div3): Adjust emit_insn.
(avx10_2_nepbf16_): Rename to...
(avx10_2_bf16_): ...this. Change
instruction name output.
(avx10_2_rcppbf16_): Rename to...
(avx10_2_rcpbf16_):...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: Move to ...
* gcc.target/i386/avx10_2-512-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vaddbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vdivbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vmulnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vmulbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrcpbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vsubbf16-2.c: ...here. Adjust
intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-operations-1.c: Move to 
* gcc.target/i386/avx10_2-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-fast-math-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-operations-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vaddbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vdivbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vmulbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vrcppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrcpbf16-2.c: ...here. Adjust intrin call.
* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vsubbf16-2.c: ...here. Adjust intrin call.
* lib/target-supports.exp (check_effective_target_avx10_2):
Adjust asm usage.
(check_effective_target_avx10_2_512): Ditto.
---
 gcc/config/i386/avx10_2-512bf16intrin.h   |  86 -
 gcc/config/i386/avx10_2bf16intrin.h   | 174 +-
 gcc/config/i386/i386-builtin.def  |  54 +++---
 gcc/config/i386/sse.md|  12 +-
 .../i386/avx10_2-512-bf-vector-operations-1.c |  42 -
 .../gcc.target/i386/avx10_2-512-bf16-1.c  |  54 +++---
 .../avx10_2-512-bf16-vector-operations-1.c|  42 +
 ...ddnepbf16-2.c => avx10_2-512-vaddbf16-2.c} |   6 +-
 ...ivnepbf16-2.c => avx10_2-512-vdivbf16-2.c} |   6 +-
 ...ulnepbf16-2.c => avx10_2-512-vmulbf16-2.c} |   6 +-
 ...vrcppbf16-2.c => avx10_2-512-vrcpbf16-2.c} |   0
 ...ubnepbf16-2.c => avx10_2-512-vsubbf16-2.c} |   6 +-
 .../i386/avx10_2-bf-vector-operations-1.c |  79 
 .../gcc.target/i386/avx10_2-bf16-1.c  | 108 +--
 .../i386/avx10_2-bf16-vector-operations-1.c   |  79 
 ...avx10_2-partial-bf16-vector-fast-math-1.c} |   4 +-
 ...vx10_2-partial-bf16-vector-operations-1.c} |   8 +-
 ...0_2-vrcppbf16-2.c => avx10_2-vaddbf16-2.c} |   4 +-
 ...2-vaddnepbf16-2.c => avx10_2-vdivbf16-2.c} |   4 +-
 ...2-vdivnepbf16-2.c => avx10_2-vmulbf16-2.c} |   4 +-
 ...2-vmulnepbf16-2.c => avx10_2-vrcpbf16-2.c} |   4 +-
 .../gcc.target/i386/avx10_2-vsubbf16-2.c  |  16 ++
 .../gcc.target/i386/avx10_2-vsubnepbf16-2.c   |  16 --
 gcc/testsuite/lib/target-supports.exp |   4 +-
 24 files changed, 409 insertions(+), 409 deletions(-)
 delete mode 100644 
gcc/testsuite/gcc.target/i386/avx10_2-512-bf-vector-operations-1.c
 create mode 100644 
gcc/testsuite/gcc.target/i386/avx10_2-512-bf16-vector-operations-1.c
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vaddnepbf16-2.c => 
avx10_2-512-vaddbf16-2.c} (86%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vdivnepbf16-2.c => 
avx10_2-512-vdivbf16-2.c} (86%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vmulnepbf1

[PATCH 01/13] i386: Enhance AMX tests

2025-01-21 Thread Haochen Jiang
After the Binutils change, the previous intrin usage raises assembler
warnings, so it needs to be updated.  Besides that, there are separate
issues for both AMX-MOVRS and AMX-TRANSPOSE.

For AMX-MOVRS, the t2rpntlvwrs tests wrongly used the AMX-TRANSPOSE
intrins.  Since the only difference between them is the "rs" hint,
this did not change the test results.

For AMX-TRANSPOSE, the "t1" hint test is missing.

This patch fixes both of them.  It also renames the AMX-MOVRS test file
to match the other AMX tests.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/amxmovrs-t2rpntlvw-2.c: Move to...
* gcc.target/i386/amxmovrs-2rpntlvwrs-2.c: ...here.
* gcc.target/i386/amxtranspose-2rpntlvw-2.c: Add "t1" hint test.
---
 ...-t2rpntlvw-2.c => amxmovrs-2rpntlvwrs-2.c} | 30 +--
 .../gcc.target/i386/amxtranspose-2rpntlvw-2.c | 21 ++---
 2 files changed, 32 insertions(+), 19 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{amxmovrs-t2rpntlvw-2.c => 
amxmovrs-2rpntlvwrs-2.c} (62%)

diff --git a/gcc/testsuite/gcc.target/i386/amxmovrs-t2rpntlvw-2.c 
b/gcc/testsuite/gcc.target/i386/amxmovrs-2rpntlvwrs-2.c
similarity index 62%
rename from gcc/testsuite/gcc.target/i386/amxmovrs-t2rpntlvw-2.c
rename to gcc/testsuite/gcc.target/i386/amxmovrs-2rpntlvwrs-2.c
index e38c6ea277a..0093ef7883f 100644
--- a/gcc/testsuite/gcc.target/i386/amxmovrs-t2rpntlvw-2.c
+++ b/gcc/testsuite/gcc.target/i386/amxmovrs-2rpntlvwrs-2.c
@@ -5,17 +5,17 @@
 /* { dg-options "-O2 -mamx-movrs -mamx-transpose -mavx512fp16 -mavx512bf16" } 
*/
 #define AMX_MOVRS
 #define AMX_TRANSPOSE
-#define DO_TEST test_amx_movrs_t2rpntlvw
-void test_amx_movrs_t2rpntlvw ();
+#define DO_TEST test_amx_movrs_t2rpntlvwrs
+void test_amx_movrs_t2rpntlvwrs ();
 #include "amx-helper.h"
 
-#define init_pair_tile_reg_and_src_z_t1(tmm_num, src, buffer, ztype, wtype)\
-{ \
-  init_pair_tile_src (tmm_num, &src, buffer, ztype);  \
-  _tile_2rpntlvwz##ztype##wtype (tmm_num, buffer, _STRIDE);\
+#define init_pair_tile_reg_and_src_z_t(tmm_num, src, buffer, ztype, wtype) \
+{  \
+  init_pair_tile_src (tmm_num, &src, buffer, ztype);   \
+  _tile_2rpntlvwz##ztype##rs##wtype (tmm_num, buffer, _STRIDE);
\
 }
 
-void test_amx_movrs_t2rpntlvw ()
+void test_amx_movrs_t2rpntlvwrs ()
 {
   __tilecfg_u cfg;
   __tilepair src;
@@ -28,29 +28,29 @@ void test_amx_movrs_t2rpntlvw ()
   for (i = 0; i < 2048; i++)
 buffer[i] = i % 256;
 
-  /* Check t2rpntlvwz0.  */
-  init_pair_tile_reg_and_src_z_t1 (0, src, buffer, 0,);
+  /* Check t2rpntlvwz0rs.  */
+  init_pair_tile_reg_and_src_z_t (0, src, buffer, 0,);
   _tile_stored (0, ref_0.buf, _STRIDE);
   _tile_stored (1, ref_1.buf, _STRIDE);
   if (!check_pair_tile_register (&ref_0, &ref_1, &src))
 abort ();
 
-  /* Check t2rpntlvwz1.  */
-  init_pair_tile_reg_and_src_z_t1 (1, src, buffer, 1,);
+  /* Check t2rpntlvwz1rs.  */
+  init_pair_tile_reg_and_src_z_t (0, src, buffer, 1,);
   _tile_stored (0, ref_0.buf, _STRIDE);
   _tile_stored (1, ref_1.buf, _STRIDE);
   if (!check_pair_tile_register (&ref_0, &ref_1, &src))
 abort ();
 
-  /* Check t2rpntlvwz0t1.  */
-  init_pair_tile_reg_and_src_z_t1 (0, src, buffer, 0, t1);
+  /* Check t2rpntlvwz0t1rs.  */
+  init_pair_tile_reg_and_src_z_t (0, src, buffer, 0, t1);
   _tile_stored (0, ref_0.buf, _STRIDE);
   _tile_stored (1, ref_1.buf, _STRIDE);
   if (!check_pair_tile_register (&ref_0, &ref_1, &src))
 abort ();
 
-  /* Check t2rpntlvwz1t1.  */
-  init_pair_tile_reg_and_src_z_t1 (1, src, buffer, 1, t1);
+  /* Check t2rpntlvwz1t1rs.  */
+  init_pair_tile_reg_and_src_z_t (0, src, buffer, 1, t1);
   _tile_stored (0, ref_0.buf, _STRIDE);
   _tile_stored (1, ref_1.buf, _STRIDE);
   if (!check_pair_tile_register (&ref_0, &ref_1, &src))
diff --git a/gcc/testsuite/gcc.target/i386/amxtranspose-2rpntlvw-2.c 
b/gcc/testsuite/gcc.target/i386/amxtranspose-2rpntlvw-2.c
index 3b1c8701237..2d018276af9 100644
--- a/gcc/testsuite/gcc.target/i386/amxtranspose-2rpntlvw-2.c
+++ b/gcc/testsuite/gcc.target/i386/amxtranspose-2rpntlvw-2.c
@@ -5,10 +5,10 @@
 #define DO_TEST test_amx_transpose_t2rpntlvw
 void test_amx_transpose_t2rpntlvw ();
 #include "amx-helper.h"
-#define init_pair_tile_reg_and_src_z(tmm_num, src, buffer, ztype)  \
+#define init_pair_tile_reg_and_src_z_t(tmm_num, src, buffer, ztype, wtype) \
 {  \
   init_pair_tile_src (tmm_num, &src, buffer, ztype);   \
-  _tile_2rpntlvwz##ztype (tmm_num, buffer, _STRIDE);   \
+  _tile_2rpntlvwz##ztype##wtype (tmm_num, buffer, _STRIDE);\
 }
 
 void test_amx_transpose_t2rpntlvw ()
@@ -25,17 +25,30 @@ void test_amx_transpose_t2rpntlvw ()
 buffer[i] = i % 256;
 
   /* Check t2rpntlvwz0.  */
-  init_pair_tile_reg

[PATCH 06/13] i386: Change mnemonics from V[GETMANT, REDUCENE, RNDSCALENE]PBF16 to V[GETMANT, REDUCE, RNDSCALE]BF16

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VRNDSCALEBF16): Rename from UNSPEC_VRNDSCALENEPBF16.
(UNSPEC_VREDUCEBF16): Rename from UNSPEC_VREDUCENEPBF16.
(UNSPEC_VGETMANTBF16): Rename from UNSPEC_VGETMANTPBF16.
(BF16IMMOP): Adjust iterator due to UNSPEC name change.
(bf16immop): Ditto.
(avx10_2_pbf16_): Rename to...
(avx10_2_bf16_): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vgetmantbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vreducebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrndscalebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vgetmantbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vreducebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrndscalebf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Adjust intrin call.
* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx10_2-512bf16intrin.h   | 112 -
 gcc/config/i386/avx10_2bf16intrin.h   | 232 +-
 gcc/config/i386/i386-builtin.def  |  18 +-
 gcc/config/i386/sse.md|  22 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |  18 +-
 .../gcc.target/i386/avx10_2-512-bf16-1.c  |  30 +--
 ...pbf16-2.c => avx10_2-512-vgetmantbf16-2.c} |   0
 ...epbf16-2.c => avx10_2-512-vreducebf16-2.c} |   6 +-
 ...bf16-2.c => avx10_2-512-vrndscalebf16-2.c} |   6 +-
 .../gcc.target/i386/avx10_2-bf16-1.c  |  60 ++---
 ...mantpbf16-2.c => avx10_2-vgetmantbf16-2.c} |   4 +-
 ...ucenepbf16-2.c => avx10_2-vreducebf16-2.c} |   4 +-
 ...enepbf16-2.c => avx10_2-vrndscalebf16-2.c} |   4 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  18 +-
 gcc/testsuite/gcc.target/i386/sse-14.c|  36 +--
 gcc/testsuite/gcc.target/i386/sse-22.c|  36 +--
 gcc/testsuite/gcc.target/i386/sse-23.c|  18 +-
 17 files changed, 312 insertions(+), 312 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vgetmantpbf16-2.c => 
avx10_2-512-vgetmantbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vreducenepbf16-2.c => 
avx10_2-512-vreducebf16-2.c} (87%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vrndscalenepbf16-2.c => 
avx10_2-512-vrndscalebf16-2.c} (84%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vgetmantpbf16-2.c => 
avx10_2-vgetmantbf16-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vreducenepbf16-2.c => 
avx10_2-vreducebf16-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vrndscalenepbf16-2.c => 
avx10_2-vrndscalebf16-2.c} (77%)

diff --git a/gcc/config/i386/avx10_2-512bf16intrin.h 
b/gcc/config/i386/avx10_2-512bf16intrin.h
index fcd28534ddc..276a43890bd 100644
--- a/gcc/config/i386/avx10_2-512bf16intrin.h
+++ b/gcc/config/i386/avx10_2-512bf16intrin.h
@@ -468,100 +468,100 @@ _mm512_maskz_getexp_pbh (__mmask32 __U, __m512bh __A)
__U);
 }
 
-/* Intrinsics vrndscalepbf16.  */
+/* Intrinsics vrndscalebf16.  */
 #ifdef __OPTIMIZE__
 extern __inline__ __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_roundscalene_pbh (__m512bh __A, int B)
+_mm512_roundscale_pbh (__m512bh __A, int B)
 {
   return (__m512bh)
-__builtin_ia32_rndscalenepbf16512_mask (__A, B,
-   (__v32bf) _mm512_setzero_si512 (),
-   (__mmask32) -1);
+__builtin_ia32_rndscalebf16512_mask (__A, B,
+(__v32bf) _mm512_setzero_si512 (),
+(__mmask32) -1);
 }
 
 extern __inline__ __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_roundscalene_pbh (__m512bh __W, __mmask32 __U, __m512bh __A, int B

[PATCH 05/13] i386: Change mnemonics from VMINMAXNEPBF16 to VMINMAXBF16

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512minmaxintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2minmaxintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_MINMAXBF16): Rename from UNSPEC_MINMAXNEPBF16.
(avx10_2_minmaxnepbf16_): Rename to...
(avx10_2_minmaxbf16_): ...this. Change
instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-minmax-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vminmaxnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vminmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-minmax-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vminmaxnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vminmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Adjust intrin call.
* gcc.target/i386/sse-22.c: Ditto.
---
 gcc/config/i386/avx10_2-512minmaxintrin.h |  88 +-
 gcc/config/i386/avx10_2minmaxintrin.h | 165 +-
 gcc/config/i386/i386-builtin.def  |   6 +-
 gcc/config/i386/sse.md|   8 +-
 gcc/testsuite/gcc.target/i386/avx-1.c |   6 +-
 .../gcc.target/i386/avx10_2-512-minmax-1.c|  12 +-
 ...epbf16-2.c => avx10_2-512-vminmaxbf16-2.c} |  12 +-
 .../gcc.target/i386/avx10_2-minmax-1.c|  24 +--
 ...maxnepbf16-2.c => avx10_2-vminmaxbf16-2.c} |   4 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|   6 +-
 gcc/testsuite/gcc.target/i386/sse-14.c|  18 +-
 gcc/testsuite/gcc.target/i386/sse-22.c|  18 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|   6 +-
 13 files changed, 187 insertions(+), 186 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vminmaxnepbf16-2.c => 
avx10_2-512-vminmaxbf16-2.c} (73%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vminmaxnepbf16-2.c => 
avx10_2-vminmaxbf16-2.c} (75%)

diff --git a/gcc/config/i386/avx10_2-512minmaxintrin.h 
b/gcc/config/i386/avx10_2-512minmaxintrin.h
index 1dc5949a727..3acdc568f27 100644
--- a/gcc/config/i386/avx10_2-512minmaxintrin.h
+++ b/gcc/config/i386/avx10_2-512minmaxintrin.h
@@ -32,39 +32,39 @@
 #ifdef __OPTIMIZE__
 extern __inline __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_minmax_nepbh (__m512bh __A, __m512bh __B, const int __C)
+_mm512_minmax_pbh (__m512bh __A, __m512bh __B, const int __C)
 {
-  return (__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) __A,
- (__v32bf) __B,
- __C,
- (__v32bf)(__m512bh)
- _mm512_setzero_si512 
(),
- (__mmask32) -1);
+  return (__m512bh) __builtin_ia32_minmaxbf16512_mask ((__v32bf) __A,
+  (__v32bf) __B,
+  __C,
+  (__v32bf)(__m512bh)
+  _mm512_setzero_si512 (),
+  (__mmask32) -1);
 }
 
 extern __inline __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_minmax_nepbh (__m512bh __W, __mmask32 __U,
- __m512bh __A, __m512bh __B, const int __C)
+_mm512_mask_minmax_pbh (__m512bh __W, __mmask32 __U,
+   __m512bh __A, __m512bh __B, const int __C)
 {
-  return (__m512bh) __builtin_ia32_minmaxnepbf16512_mask ((__v32bf) __A,
- (__v32bf) __B,
- __C,
- (__v32bf) __W,
- (__mmask32) __U);
+  return (__m512bh) __builtin_ia32_minmaxbf16512_mask ((__v32bf) __A,
+  (__v32bf) __B,
+  __C,
+  (__v32bf) __W,
+  (__mmask32) __U);
 }
 
 extern __inline __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_minmax_nepbh (__mmask32 __U, __m512bh __A,
-  __m512bh __B, const int __C)
+_mm512_maskz_minmax_pbh (__mmask32 __U, __m512bh __A,
+   

[PATCH 07/13] i386: Change mnemonics from V[RSQRT, SCALEF, SQRTNE]PBF16 to V[RSQRT.SCALEF.SQRT]BF16

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VSCALEFBF16): Rename from UNSPEC_VSCALEFPBF16.
(avx10_2_scalefpbf16_): Rename to...
(avx10_2_scalefbf16_): ...this.
Change instruction name output.
(avx10_2_rsqrtpbf16_): Rename to...
(avx10_2_rsqrtbf16_): ...this.
Change instruction name output.
(avx10_2_sqrtnepbf16_): Rename to...
(avx10_2_sqrtbf16_): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vrsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vscalefbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and intrin
call.
* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vrsqrtbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vscalefbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vsqrtbf16-2.c: ...here.
Adjust intrin call.
---
 gcc/config/i386/avx10_2-512bf16intrin.h   | 46 +-
 gcc/config/i386/avx10_2bf16intrin.h   | 88 +--
 gcc/config/i386/i386-builtin.def  | 24 ++---
 gcc/config/i386/sse.md| 16 ++--
 .../gcc.target/i386/avx10_2-512-bf16-1.c  | 24 ++---
 ...rtpbf16-2.c => avx10_2-512-vrsqrtbf16-2.c} |  0
 ...fpbf16-2.c => avx10_2-512-vscalefbf16-2.c} |  0
 ...tnepbf16-2.c => avx10_2-512-vsqrtbf16-2.c} |  6 +-
 .../gcc.target/i386/avx10_2-bf16-1.c  | 48 +-
 ...vrsqrtpbf16-2.c => avx10_2-vrsqrtbf16-2.c} |  4 +-
 ...calefpbf16-2.c => avx10_2-vscalefbf16-2.c} |  4 +-
 ...vsqrtnepbf16-2.c => avx10_2-vsqrtbf16-2.c} |  4 +-
 12 files changed, 132 insertions(+), 132 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vrsqrtpbf16-2.c => 
avx10_2-512-vrsqrtbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vscalefpbf16-2.c => 
avx10_2-512-vscalefbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vsqrtnepbf16-2.c => 
avx10_2-512-vsqrtbf16-2.c} (86%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vrsqrtpbf16-2.c => 
avx10_2-vrsqrtbf16-2.c} (79%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vscalefpbf16-2.c => 
avx10_2-vscalefbf16-2.c} (79%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vsqrtnepbf16-2.c => 
avx10_2-vsqrtbf16-2.c} (79%)

diff --git a/gcc/config/i386/avx10_2-512bf16intrin.h 
b/gcc/config/i386/avx10_2-512bf16intrin.h
index 276a43890bd..f60ac2cd03f 100644
--- a/gcc/config/i386/avx10_2-512bf16intrin.h
+++ b/gcc/config/i386/avx10_2-512bf16intrin.h
@@ -194,16 +194,16 @@ extern __inline__ __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_scalef_pbh (__m512bh __A, __m512bh __B)
 {
-  return (__m512bh) __builtin_ia32_scalefpbf16512 (__A, __B);
+  return (__m512bh) __builtin_ia32_scalefbf16512 (__A, __B);
 }
 
 extern __inline__ __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_scalef_pbh (__m512bh __W, __mmask32 __U,
- __m512bh __A, __m512bh __B)
+   __m512bh __A, __m512bh __B)
 {
   return (__m512bh)
-__builtin_ia32_scalefpbf16512_mask (__A, __B, __W, __U);
+__builtin_ia32_scalefbf16512_mask (__A, __B, __W, __U);
 }
 
 extern __inline__ __m512bh
@@ -211,9 +211,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm512_maskz_scalef_pbh (__mmask32 __U, __m512bh __A, __m512bh __B)
 {
   return (__m512bh)
-__builtin_ia32_scalefpbf16512_mask (__A, __B,
-   (__v32bf) _mm512_setzero_si512 (),
-   __U);
+__builtin_ia32_scalefbf16512_mask (__A, __B,
+  (__v32bf) _mm512_setzero_si512 (),
+  __U);
 }
 
 extern __inline__ __m512bh
@@ -361,9 +361,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm512_rsqrt_pbh (__m512bh __A)
 {
   return (__m512bh)
-__builtin_ia32_rsqrtpbf16512_mask (__A,
-  (__v32bf) _mm512_

[PATCH 09/13] i386: Change mnemonics from VCOMSBF16 to VCOMISBF16

2025-01-21 Thread Haochen Jiang
Besides the mnemonics change, this patch also uses the compare
pattern instead of an UNSPEC.

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/i386-expand.cc
(ix86_expand_fp_compare): Adjust comments.
(ix86_expand_builtin): Adjust switch case.
* config/i386/i386.md (cmpibf): Change instruction name output.
* config/i386/sse.md (UNSPEC_VCOMSBF16): Removed.
(avx10_2_comisbf16_v8bf): New.
(avx10_2_comsbf16_v8bf): Removed.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-comibf-1.c: Adjust asm check.
* gcc.target/i386/avx10_2-comibf-3.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-1.c: Move to...
* gcc.target/i386/avx10_2-vcomisbf16-1.c: ...here.
Adjust output and intrin call.
* gcc.target/i386/avx10_2-vcomsbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vcomisbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/pr117495.c: Adjust asm check.
---
 gcc/config/i386/avx10_2bf16intrin.h   | 26 -
 gcc/config/i386/i386-builtin.def  | 12 
 gcc/config/i386/i386-expand.cc| 14 -
 gcc/config/i386/i386.md   |  2 +-
 gcc/config/i386/sse.md| 29 +--
 .../gcc.target/i386/avx10_2-comibf-1.c|  2 +-
 .../gcc.target/i386/avx10_2-comibf-3.c|  2 +-
 .../gcc.target/i386/avx10_2-vcomisbf16-1.c| 19 
 ...2-vcomsbf16-2.c => avx10_2-vcomisbf16-2.c} |  2 +-
 .../gcc.target/i386/avx10_2-vcomsbf16-1.c | 19 
 gcc/testsuite/gcc.target/i386/pr117495.c  |  2 +-
 11 files changed, 64 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcomisbf16-1.c
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcomsbf16-2.c => 
avx10_2-vcomisbf16-2.c} (95%)
 delete mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-vcomsbf16-1.c

diff --git a/gcc/config/i386/avx10_2bf16intrin.h 
b/gcc/config/i386/avx10_2bf16intrin.h
index e3fa71f27c0..af3b4afe17f 100644
--- a/gcc/config/i386/avx10_2bf16intrin.h
+++ b/gcc/config/i386/avx10_2bf16intrin.h
@@ -1284,47 +1284,47 @@ _mm_cmp_pbh_mask (__m128bh __A, __m128bh __B, const int 
__imm)
 
 #endif /* __OPIMTIZE__ */
 
-/* Intrinsics vcomsbf16.  */
+/* Intrinsics vcomisbf16.  */
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comeq_sbh (__m128bh __A, __m128bh __B)
+_mm_comieq_sbh (__m128bh __A, __m128bh __B)
 {
-  return __builtin_ia32_vcomsbf16eq (__A, __B);
+  return __builtin_ia32_vcomisbf16eq (__A, __B);
 }
 
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comlt_sbh (__m128bh __A, __m128bh __B)
+_mm_comilt_sbh (__m128bh __A, __m128bh __B)
 {
-  return __builtin_ia32_vcomsbf16lt (__A, __B);
+  return __builtin_ia32_vcomisbf16lt (__A, __B);
 }
 
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comle_sbh (__m128bh __A, __m128bh __B)
+_mm_comile_sbh (__m128bh __A, __m128bh __B)
 {
-  return __builtin_ia32_vcomsbf16le (__A, __B);
+  return __builtin_ia32_vcomisbf16le (__A, __B);
 }
 
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comgt_sbh (__m128bh __A, __m128bh __B)
+_mm_comigt_sbh (__m128bh __A, __m128bh __B)
 {
-  return __builtin_ia32_vcomsbf16gt (__A, __B);
+  return __builtin_ia32_vcomisbf16gt (__A, __B);
 }
 
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comge_sbh (__m128bh __A, __m128bh __B)
+_mm_comige_sbh (__m128bh __A, __m128bh __B)
 {
-  return __builtin_ia32_vcomsbf16ge (__A, __B);
+  return __builtin_ia32_vcomisbf16ge (__A, __B);
 }
 
 extern __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_comneq_sbh (__m128bh __A, __m128bh __B)
+_mm_comineq_sbh (__m128bh __A, __m128bh __B)
 {
-  return __builtin_ia32_vcomsbf16neq (__A, __B);
+  return __builtin_ia32_vcomisbf16neq (__A, __B);
 }
 
 #ifdef __DISABLE_AVX10_2_256__
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index a546cdcaed9..7e1dad2615e 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -3284,12 +3284,12 @@ BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, 
CODE_FOR_avx10_2_fpclassbf16_v8bf_mask,
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_512, CODE_FOR_avx10_2_cmpbf16_v32bf_mask, 
"__builtin_ia32_cmpbf16512_mask", IX86_BUILTIN_CMPBF16512_MASK, UNKNOWN, (int) 
USI_FTYPE_V32BF_V32BF_INT_USI)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2_256, CODE_FOR_avx10_2_cmpbf16_v16bf_mask, 
"__builtin_ia32_cmpbf16256_mask", IX86_BUILTIN_CMPBF16256_MASK, UNKNOWN, (int) 
UHI_FTYPE_V16BF_V16BF_INT_UHI)
 BDESC (0, OPTION_MASK_ISA2_AVX10_2

[PATCH 00/13] Realign x86 GCC after Binutils change [PR118270]

2025-01-21 Thread Haochen Jiang
Hi all,

Recently, the DMR ISAs got a number of mnemonic changes.  The detailed
changes are:

  - NE would be removed for all AVX10.2 new insns
  - VCOMSBF16 -> VCOMISBF16
  - P for packed omitted for AI data types (BF16, TF32, FP8)
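
As a concrete illustration (intrinsic names taken from the patches in this
series; a compile-only sketch that assumes -O2 -mavx10.2-512 and a GCC with
the series applied):

#include <immintrin.h>

__m512bh
minmax_example (__m512bh a, __m512bh b)
{
  /* Before this series: _mm512_minmax_nepbh (a, b, 0), emitting
     vminmaxnepbf16; the NE marker is now gone.  Similarly
     _mm_comeq_sbh becomes _mm_comieq_sbh (vcomsbf16 -> vcomisbf16).  */
  return _mm512_minmax_pbh (a, b, 0);   /* emits vminmaxbf16 */
}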

The AMX-AVX512 change has been upstreamed previously; the remaining
changes are all related to AVX10.2.

Ref:
https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html

You could also refer to the thread in Binutils previously for some more
information: https://sourceware.org/pipermail/binutils/2025-January/138577.html

Binutils has applied all the above changes ahead of GCC due to its release
window.  Now it is time for GCC to get re-aligned with Binutils.

Besides all the mnemonic changes, there will also be two more changes:

  - Due to the way Binutils handles tile register pairs, we need to adjust
the intrin usage in the AMX-MOVRS and AMX-TRANSPOSE testcases to avoid
warnings.
  - Since P for packed is omitted for the AI data types, we also omit "p"
for packed in the intrin names for FP8, since FP8 was only introduced in
AVX10.2.  For BF16, we will stick to the current mixed naming in the
intrin names.

The upcoming 13 patches are all related to the changes. Ok for trunk?

Thx,
Haochen




[PATCH 03/13] i386: Change mnemonics from VF[, N]M[ADD, SUB][132, 213, 231]NEPBF16 to VF[, N]M[ADD, SUB][132, 213, 231]BF16

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
names according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(avx10_2_fmaddnepbf16__maskz): Rename to...
(avx10_2_fmaddbf16__maskz): ...this. Adjust emit_insn.
(avx10_2_fmaddnepbf16_): Rename to...
(avx10_2_fmaddbf16_): ...this.
Change instruction name output.
(avx10_2_fmaddnepbf16__mask): Rename to...
(avx10_2_fmaddbf16__mask): ...this.
Change instruction name output.
(avx10_2_fmaddnepbf16__mask3): Rename to...
(avx10_2_fmaddbf16__mask3): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16__maskz): Rename to...
(avx10_2_fnmaddbf16__maskz): ...this. Adjust emit_insn.
(avx10_2_fnmaddnepbf16_): Rename to...
(avx10_2_fnmaddbf16_): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16__mask): Rename to...
(avx10_2_fnmaddbf16__mask): ...this.
Change instruction name output.
(avx10_2_fnmaddnepbf16__mask3): Rename to...
(avx10_2_fnmaddbf16__mask3): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16__maskz): Rename to...
(avx10_2_fmsubbf16__maskz): ...this. Adjust emit_insn.
(avx10_2_fmsubnepbf16_): Rename to...
(avx10_2_fmsubbf16_): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16__mask): Rename to...
(avx10_2_fmsubbf16__mask): ...this.
Change instruction name output.
(avx10_2_fmsubnepbf16__mask3): Rename to...
(avx10_2_fmsubbf16__mask3): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16__maskz): Rename to...
(avx10_2_fnmsubbf16__maskz): ...this. Adjust emit_insn.
(avx10_2_fnmsubnepbf16_): Rename to...
(avx10_2_fnmsubbf16_): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16__mask): Rename to...
(avx10_2_fnmsubbf16__mask): ...this.
Change instruction name output.
(avx10_2_fnmsubnepbf16__mask3): Rename to...
(avx10_2_fnmsubbf16__mask3): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfnmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfnmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-fma-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfmsubXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfnmaddXXXbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfnmsubXXXbf16-2.c: ...here.
Adjust intrin call.
---
 gcc/config/i386/avx10_2-512bf16intrin.h   |  86 -
 gcc/config/i386/avx10_2bf16intrin.h   | 176 +-
 gcc/config/i386/i386-builtin.def  |  72 +++
 gcc/config/i386/sse.md| 148 +++
 .../i386/avx10_2-512-bf-vector-fma-1.c|  34 
 .../gcc.target/i386/avx10_2-512-bf16-1.c  |  64 +++
 .../i386/avx10_2-512-bf16-vector-fma-1.c  |  34 
 ...bf16-2.c => avx10_2-512-vfmaddXXXbf16-2.c} |   4 +-
 ...bf16-2.c => avx10_2-512-vfmsubXXXbf16-2.c} |   4 +-
 ...f16-2.c => avx10_2-512-vfnmaddXXXbf16-2.c} |   4 +-
 ...f16-2.c => avx10_2-512-

[PATCH 08/13] i386: Change mnemonics from V[GETEXP, FPCLASS]PBF16 to V[GETEXP.FPCLASS]BF16

2025-01-21 Thread Haochen Jiang
Besides the mnemonics change, this patch also fixes an SDE test failure
for FPCLASS.

gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VFPCLASSBF16); Rename from UNSPEC_VFPCLASSPBF16.
(avx10_2_getexppbf16_): Rename to...
(avx10_2_getexpbf16_): ...this.
Change instruction name output.
(avx10_2_fpclasspbf16_):
Rename to...
(avx10_2_fpclassbf16_): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vfpclassbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vgetexpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vgetexpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vfpclassbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
---
 gcc/config/i386/avx10_2-512bf16intrin.h   | 26 +-
 gcc/config/i386/avx10_2bf16intrin.h   | 52 +--
 gcc/config/i386/i386-builtin.def  | 12 ++---
 gcc/config/i386/sse.md| 12 ++---
 gcc/testsuite/gcc.target/i386/avx-1.c |  6 +--
 .../gcc.target/i386/avx10_2-512-bf16-1.c  | 10 ++--
 ...pbf16-2.c => avx10_2-512-vfpclassbf16-2.c} |  2 +-
 ...ppbf16-2.c => avx10_2-512-vgetexpbf16-2.c} |  0
 .../gcc.target/i386/avx10_2-bf16-1.c  | 20 +++
 ...texppbf16-2.c => avx10_2-vfpclassbf16-2.c} |  4 +-
 ...classpbf16-2.c => avx10_2-vgetexpbf16-2.c} |  4 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  6 +--
 gcc/testsuite/gcc.target/i386/sse-23.c|  6 +--
 13 files changed, 80 insertions(+), 80 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vfpclasspbf16-2.c => 
avx10_2-512-vfpclassbf16-2.c} (95%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vgetexppbf16-2.c => 
avx10_2-512-vgetexpbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vgetexppbf16-2.c => 
avx10_2-vfpclassbf16-2.c} (79%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vfpclasspbf16-2.c => 
avx10_2-vgetexpbf16-2.c} (78%)

diff --git a/gcc/config/i386/avx10_2-512bf16intrin.h 
b/gcc/config/i386/avx10_2-512bf16intrin.h
index f60ac2cd03f..307b14a878a 100644
--- a/gcc/config/i386/avx10_2-512bf16intrin.h
+++ b/gcc/config/i386/avx10_2-512bf16intrin.h
@@ -446,16 +446,16 @@ __attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm512_getexp_pbh (__m512bh __A)
 {
   return (__m512bh)
-__builtin_ia32_getexppbf16512_mask (__A,
-   (__v32bf) _mm512_setzero_si512 (),
-   (__mmask32) -1);
+__builtin_ia32_getexpbf16512_mask (__A,
+  (__v32bf) _mm512_setzero_si512 (),
+  (__mmask32) -1);
 }
 
 extern __inline__ __m512bh
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_getexp_pbh (__m512bh __W, __mmask32 __U, __m512bh __A)
 {
-  return (__m512bh) __builtin_ia32_getexppbf16512_mask (__A,  __W,  __U);
+  return (__m512bh) __builtin_ia32_getexpbf16512_mask (__A,  __W,  __U);
 }
 
 extern __inline__ __m512bh
@@ -463,9 +463,9 @@ __attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm512_maskz_getexp_pbh (__mmask32 __U, __m512bh __A)
 {
   return (__m512bh)
-__builtin_ia32_getexppbf16512_mask (__A,
-   (__v32bf) _mm512_setzero_si512 (),
-   __U);
+__builtin_ia32_getexpbf16512_mask (__A,
+  (__v32bf) _mm512_setzero_si512 (),
+  __U);
 }
 
 /* Intrinsics vrndscalebf16.  */
@@ -613,7 +613,7 @@ _mm512_maskz_getmant_pbh (__mmask32 __U, __m512bh __A,
 
 #endif /* __OPTIMIZE__ */
 
-/* Intrinsics vfpclasspbf16.  */
+/* Intrinsics vfpclassbf16.  */
 #ifdef __OPTIMIZE__
 extern __inline __mmask32
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
@@ -621,7 +621,7 @@ _mm512_mask_fpclass_pbh_mask (__mmask32 __U, __m512bh __A,
  const int __imm)
 {
   return (__mmask32)
-__builtin_ia32_fpclasspbf1

Re: [PATCH] tree, c++: Consider TARGET_EXPR invariant like SAVE_EXPR [PR118509]

2025-01-21 Thread Richard Biener
On Mon, 20 Jan 2025, Jason Merrill wrote:

> On 1/20/25 4:21 AM, Jakub Jelinek wrote:
> > On Mon, Jan 20, 2025 at 10:14:58AM +0100, Richard Biener wrote:
> >> OK (it really makes sense).  I do wonder whether there's going to be
> >> more fallout similar to the OMP one - did you try grepping for SAVE_EXPR
> >> checks?
> > 
> > I saw e.g. something for the #pragma omp atomic case, but there it worst
> > case just creates another TARGET_EXPR, so shouldn't be that bad.
> > But I'm afraid if something else will show up we don't have covered in the
> > testsuite yet.
> > 
> >> I'm not really happy doing two different things - if we're not
> >> comfortable with the trunk variant what about reverting the causing
> >> change?
> > 
> > Reverting it will result in other wrong-code issues.  It is unfortunately a
> > can of worms.
> > I could revert r14-10839 and r14-10666 to restore the set of existing
> > wrong-code issues to the 14.2 and earlier state and reapply after two months
> > of getting this change tested on the trunk.
> 
> I think your quick fix of adding save_exprs makes sense, and then we can
> replace it later after this change gets more testing?

If you both agree then I'm fine with that.

Richard.


Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 20:34 +0800, Lulu Cheng wrote:
> 
> 在 2025/1/21 下午6:05, Xi Ruoyao 写道:
> > On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
> > > 在 2025/1/21 下午12:59, Xi Ruoyao 写道:
> > > > On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
> > > > > 在 2025/1/18 下午7:33, Xi Ruoyao 写道:
> > > > > /* snip */
> > > > > >     ;; This code iterator allows unsigned and signed division to be 
> > > > > > generated
> > > > > >     ;; from the same template.
> > > > > > @@ -3083,39 +3084,6 @@ (define_expand "rotl3"
> > > > > >   }
> > > > > >   });
> > > > > >     
> > > > > > -;; The following templates were added to generate "bstrpick.d + 
> > > > > > alsl.d"
> > > > > > -;; instruction pairs.
> > > > > > -;; It is required that the values of const_immalsl_operand and
> > > > > > -;; immediate_operand must have the following correspondence:
> > > > > > -;;
> > > > > > -;; (immediate_operand >> const_immalsl_operand) == 0x
> > > > > > -
> > > > > > -(define_insn "zero_extend_ashift"
> > > > > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > > > > -   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > > > -      (match_operand 2 "const_immalsl_operand" ""))
> > > > > > -   (match_operand 3 "immediate_operand" "")))]
> > > > > > -  "TARGET_64BIT
> > > > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 
> > > > > > 0x)"
> > > > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
> > > > > > -  [(set_attr "type" "arith")
> > > > > > -   (set_attr "mode" "DI")
> > > > > > -   (set_attr "insn_count" "2")])
> > > > > > -
> > > > > > -(define_insn "bstrpick_alsl_paired"
> > > > > > -  [(set (match_operand:DI 0 "register_operand" "=&r")
> > > > > > -   (plus:DI
> > > > > > -     (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > > > -    (match_operand 2 "const_immalsl_operand" 
> > > > > > ""))
> > > > > > -     (match_operand 3 "immediate_operand" ""))
> > > > > > -     (match_operand:DI 4 "register_operand" "r")))]
> > > > > > -  "TARGET_64BIT
> > > > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 
> > > > > > 0x)"
> > > > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
> > > > > > -  [(set_attr "type" "arith")
> > > > > > -   (set_attr "mode" "DI")
> > > > > > -   (set_attr "insn_count" "2")])
> > > > > > -
> > > > > Hi,
> > > > > 
> > > > > In LoongArch, the microarchitecture has performed instruction fusion 
> > > > > on
> > > > > bstrpick.d+alsl.d.
> > > > > 
> > > > > This modification may cause the two instructions to not be close 
> > > > > together.
> > > > > 
> > > > > So I think these two templates cannot be deleted. I will test the 
> > > > > impact
> > > > > of this patch on the spec today.
> > > > Oops.  I guess we can salvage it with TARGET_SCHED_MACRO_FUSION_P and
> > > > TARGET_SCHED_MACRO_FUSION_PAIR_P.  And I'd like to know more details:
> > > > 
> > > > 1. Is the fusion applying to all bstrpick.d + alsl.d, or only bstrpick.d
> > > > rd, rs, 31, 0?
> > > > 2. Is the fusion also applying to bstrpick.d + slli.d, or we really have
> > > > to write the strange "alsl.d rd, rs, r0, shamt" instruction?
> > > > 
> > > Currently, command fusion can only be done in the following situations:
> > > 
> > > bstrpick.d rd, rs, 31, 0 + alsl.d rd1,rj,rk,shamt and "rd = rj"
> > So the easiest solution seems just adding the two patterns back, I'm
> > bootstrapping and regtesting the patch attached.
> 
> It seems more appropriate to handle this through TARGET_SCHED_MACRO_FUSION_P
> and TARGET_SCHED_MACRO_FUSION_PAIR_P. I found the SPEC test item that
> generates this instruction pair, and I have implemented these two hooks to
> see if it works.

And another problem is that w/o bstrpick_alsl_paired some test cases
regress, like:

struct Pair { unsigned long a, b; };

struct Pair
test (struct Pair p, long x, long y)
{
  p.a &= 0xffffffff;
  p.a <<= 2;
  p.a += x;
  p.b &= 0xffffffff;
  p.b <<= 2;
  p.b += x;
  return p;
}

in GCC 13 the result is:

or  $r12,$r4,$r0
bstrpick.d  $r4,$r12,31,0
alsl.d  $r4,$r4,$r6,2
or  $r12,$r5,$r0
bstrpick.d  $r5,$r12,31,0
alsl.d  $r5,$r5,$r6,2
jr  $r1

But now:

addi.w  $r12,$r0,-4 # 0xfffc
lu32i.d $r12,0x3
slli.d  $r5,$r5,2
slli.d  $r4,$r4,2
and $r5,$r5,$r12
and $r4,$r4,$r12
add.d   $r4,$r4,$r6
add.d   $r5,$r5,$r6
jr  $r1

While both are suboptimal, the new code generation is clearly worse.  I'm
still unsure how to fix it, so maybe for now we'd just restore
bstrpick_alsl_paired to fix the regression.

And I guess we'd need zero_extend_ashift anyway because we need to use
alsl.d instead of slli.d for the fusion.
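
For reference, here is a minimal, untested sketch of what the hook pair could
look like in loongarch.cc.  The function names and the RTL shape checks below
are only illustrative assumptions, not the actual LoongArch implementation;
the real version would have to match the exact bstrpick.d/alsl.d insns and
the "rd = rj" condition described above, and it only applies if the pair is
emitted as two separate insns at scheduling time.

/* Sketch only: gate macro fusion and detect the bstrpick.d + alsl.d pair.  */
static bool
loongarch_macro_fusion_p (void)
{
  /* Possibly gated on a tune flag for cores that fuse the pair.  */
  return TARGET_64BIT;
}

static bool
loongarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
{
  rtx prev_set = single_set (prev);
  rtx curr_set = single_set (curr);
  if (!prev_set || !curr_set)
    return false;

  /* PREV should be bstrpick.d rd,rs,31,0, i.e. a zero extension of the
     low 32 bits into rd.  */
  rtx dest = SET_DEST (prev_set);
  rtx src = SET_SRC (prev_set);
  bool is_bstrpick_31_0
    = (REG_P (dest)
       && (GET_CODE (src) == ZERO_EXTEND
           || (GET_CODE (src) == AND
               && CONST_INT_P (XEXP (src, 1))
               && UINTVAL (XEXP (src, 1)) == 0xffffffffULL)));
  if (!is_bstrpick_31_0)
    return false;

  /* CURR should be alsl.d rd1,rj,rk,shamt with rj equal to PREV's rd,
     i.e. a plus whose first operand is a shifted copy of DEST.  */
  rtx curr_src = SET_SRC (curr_set);
  if (GET_CODE (curr_src) != PLUS)
    return false;
  rtx shifted = XEXP (curr_src, 0);
  return (GET_CODE (shifted) == ASHIFT
          && rtx_equal_p (XEXP (shifted, 0), dest));
}

#undef TARGET_SCHED_MACRO_FUSION_P
#define TARGET_SCHED_MACRO_FUSION_P loongarch_macro_fusion_p
#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
#define TARGET_SCHED_MACRO_FUSION_PAIR_P loongarch_macro_fusion_pair_p

Something like this would keep the scheduler from separating the two insns,
but as noted it only helps once the two instructions exist as separate insns.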

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH V5 0/2] RISC-V: Add intrinsics support and testcases for SiFive Xsfvcp extension.

2025-01-21 Thread Kito Cheng
LGTM but defer to GCC 16 :)

On Tue, Jan 21, 2025 at 11:43 AM  wrote:
>
> From: yulong 
>
> This patch implements the SiFive vendor extension Xsfvcp[1]
> support for GCC, providing a flexible mechanism to extend application
> processors with custom coprocessors and variable-latency arithmetic
> unit intrinsics.
>
> [1] 
> https://www.sifive.com/document-file/sifive-vector-coprocessor-interface-vcix-software
>
> Co-Authored by: Jiawei Chen 
> Co-Authored by: Shihua Liao 
> Co-Authored by: Yixuan Chen 
>
> Diff with V4: Delete the sifive_vector.h file.
>
> yulong (2):
>   RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.
>   RISC-V: Add intrinsics testcases for SiFive Xsfvcp extensions.
>
>  gcc/config/riscv/constraints.md   |  10 +
>  gcc/config/riscv/generic-vector-ooo.md|   4 +
>  gcc/config/riscv/genrvv-type-indexer.cc   |   9 +
>  gcc/config/riscv/riscv-c.cc   |   3 +-
>  .../riscv/riscv-vector-builtins-shapes.cc |  48 +
>  .../riscv/riscv-vector-builtins-shapes.h  |   2 +
>  .../riscv/riscv-vector-builtins-types.def |  40 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 362 +++-
>  gcc/config/riscv/riscv-vector-builtins.def|  30 +-
>  gcc/config/riscv/riscv-vector-builtins.h  |   8 +
>  gcc/config/riscv/riscv.md |   5 +-
>  .../riscv/sifive-vector-builtins-bases.cc |  78 ++
>  .../riscv/sifive-vector-builtins-bases.h  |   3 +
>  .../sifive-vector-builtins-functions.def  |  45 +
>  gcc/config/riscv/sifive-vector.md | 871 ++
>  gcc/config/riscv/vector-iterators.md  |  48 +
>  gcc/config/riscv/vector.md|   3 +-
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_f.c  |  88 ++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_i.c  | 132 +++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_v.c  | 107 +++
>  .../gcc.target/riscv/rvv/xsfvector/sf_vc_x.c  | 138 +++
>  21 files changed, 2026 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_f.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_i.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_v.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vc_x.c
>
> --
> 2.34.1
>


[PATCH v5] RISC-V: Add a new constraint to ensure that the vl of XTheadVector does not get a non-zero immediate

2025-01-21 Thread Jin Ma
Although we have handled the vl of XTheadVector correctly in the
expand phase and predicates, the results show that the work is
still insufficient.

In the curr_insn_transform function, the insn is transformed from:
(insn 69 67 225 12 (set (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] 
A32])
(if_then_else:RVVM8SF (unspec:RVVMF4BI [
(const_vector:RVVMF4BI repeat [
(const_int 1 [0x1])
])
(reg:DI 209)
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(reg/v:RVVM8SF 143 [ _xx ])
(mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])))
 (expr_list:REG_DEAD (reg/v:RVVM8SF 143 [ _xx ])
(nil)))
to
(insn 69 284 225 11 (set (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0 
 S[128, 128] A32])
(if_then_else:RVVM8SF (unspec:RVVMF4BI [
(const_vector:RVVMF4BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(reg/v:RVVM8SF 104 v8 [orig:143 _xx ] [143])
(mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 128] 
A32])))
 (nil))

Looking at the log of the reload pass, we find "Changing pseudo 209 in
operand 3 of insn 69 on equiv 0x1".
It converts the vl operand in the insn from the expected register (reg:DI 209)
to the constant 1 (const_int 1 [0x1]).

This conversion occurs because, although the predicate for the vl operand is
restricted by "vector_length_operand" in the pattern, the constraint is still
"rK", which allows the transformation.

Changing the "rK" constraint to "rJ" for the vl operand in the pattern would
prevent this conversion, but unfortunately that would conflict with RVV (the
RISC-V Vector extension).

Based on the review comments, the best solution for now is to create a new
constraint to distinguish between RVV and XTheadVector, which is exactly what
this patch does.

PR 116593

gcc/ChangeLog:

* config/riscv/constraints.md (vl): New.
* config/riscv/thead-vector.md: Replacing rK with rvl.
* config/riscv/vector.md: Likewise.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Enable testsuite of XTheadVector.
* g++.target/riscv/rvv/xtheadvector/pr116593.C: New test.

Reported-by: nihui 
---
 gcc/config/riscv/constraints.md   |   6 +
 gcc/config/riscv/thead-vector.md  |  18 +-
 gcc/config/riscv/vector.md| 476 +-
 gcc/testsuite/g++.target/riscv/rvv/rvv.exp|   3 +
 .../riscv/rvv/xtheadvector/pr116593.C |  47 ++
 5 files changed, 303 insertions(+), 247 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/xtheadvector/pr116593.C

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index f25975dc0208..ba3c6e6a4c44 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -209,6 +209,12 @@ (define_constraint "vk"
   (and (match_code "const_vector")
(match_test "riscv_vector::const_vec_all_same_in_range_p (op, 0, 31)")))
 
+(define_constraint "vl"
+  "A uimm5 for Vector or zero for XTheadVector."
+  (and (match_code "const_int")
+   (ior (match_test "!TARGET_XTHEADVECTOR && satisfies_constraint_K (op)")
+   (match_test "TARGET_XTHEADVECTOR && satisfies_constraint_J (op)"
+
 (define_constraint "Wc0"
   "@internal
  A constraint that matches a vector of immediate all zeros."
diff --git a/gcc/config/riscv/thead-vector.md b/gcc/config/riscv/thead-vector.md
index 5fe9ba08c4eb..5a02debdd207 100644
--- a/gcc/config/riscv/thead-vector.md
+++ b/gcc/config/riscv/thead-vector.md
@@ -108,7 +108,7 @@ (define_insn_and_split "@pred_th_whole_mov"
   [(set (match_operand:V_VLS_VT 0 "reg_or_mem_operand"  "=vr,vr, m")
(unspec:V_VLS_VT
  [(match_operand:V_VLS_VT 1 "reg_or_mem_operand" " vr, m,vr")
-  (match_operand 2 "vector_length_operand"   " rK, rK, rK")
+  (match_operand 2 "vector_length_operand"   "rvl,rvl,rvl")
   (match_operand 3 "const_1_operand" "  i, i, i")
   (reg:SI VL_REGNUM)
   (reg:SI VTYPE_REGNUM)]
@@ -133,7 +133,7 @@ (define_insn_and_split "@pred_th_whole_mov"
   [(set (match_operand:VB 0 "reg_or_mem_operand"  "=vr,vr, m")
(unspec:VB
  [(match_operand:VB 1 "reg_or_mem_operand" " vr, m,vr")
-  (match_operand 2 "vector_length_operand"   " rK, rK, rK")
+  (match_operand 2 "vector_length_operand"   "rvl,rvl,rvl")
   (match_operand 3 "const_1_operand" "  i, i, i")
   (reg:SI VL_REGNUM)
   (reg:SI VTYPE_REGNUM)]

Re: [PATCH] [ifcombine] avoid dropping tree_could_trap_p [PR118514]

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 06:31:43AM -0300, Alexandre Oliva wrote:
> On Jan 21, 2025, Richard Biener  wrote:
> 
> > you can use bit_field_size () and bit_field_offset () unconditionally,
> 
> Nice, thanks!
> 
> > Now, we don't have the same handling on BIT_FIELD_REFs but it
> > seems it's enough to apply the check to those with a DECL as
> > object to operate on.
> 
> I doubt that will be enough.  I'm pretty sure the cases I saw in libgnat
> in which BIT_FIELD_REF changed could_trap status, compared with the
> preexisting convert-and-access-field it replaced, were not DECLs, but
> dereferences.  But I'll check and report back.  (I'll be AFK for most of
> the day, alas)

I'd think that if we know for sure the access is out of bounds, we shouldn't
be creating a BIT_FIELD_REF for it (so in the case of combine, punt on the
optimization).
A different situation is if we don't know it, where the base is, say, a
MEM_REF or something similar.

Jakub



Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
> 
> > On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
> > On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
> > > On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
> > > /* snip */
> > > >    ;; This code iterator allows unsigned and signed division to be 
> > > > generated
> > > >    ;; from the same template.
> > > > @@ -3083,39 +3084,6 @@ (define_expand "rotl3"
> > > >  }
> > > >  });
> > > >    
> > > > -;; The following templates were added to generate "bstrpick.d + alsl.d"
> > > > -;; instruction pairs.
> > > > -;; It is required that the values of const_immalsl_operand and
> > > > -;; immediate_operand must have the following correspondence:
> > > > -;;
> > > > -;; (immediate_operand >> const_immalsl_operand) == 0x
> > > > -
> > > > -(define_insn "zero_extend_ashift"
> > > > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > > > -   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > -      (match_operand 2 "const_immalsl_operand" ""))
> > > > -   (match_operand 3 "immediate_operand" "")))]
> > > > -  "TARGET_64BIT
> > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x)"
> > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
> > > > -  [(set_attr "type" "arith")
> > > > -   (set_attr "mode" "DI")
> > > > -   (set_attr "insn_count" "2")])
> > > > -
> > > > -(define_insn "bstrpick_alsl_paired"
> > > > -  [(set (match_operand:DI 0 "register_operand" "=&r")
> > > > -   (plus:DI
> > > > -     (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > > -    (match_operand 2 "const_immalsl_operand" 
> > > > ""))
> > > > -     (match_operand 3 "immediate_operand" ""))
> > > > -     (match_operand:DI 4 "register_operand" "r")))]
> > > > -  "TARGET_64BIT
> > > > -   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x)"
> > > > -  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
> > > > -  [(set_attr "type" "arith")
> > > > -   (set_attr "mode" "DI")
> > > > -   (set_attr "insn_count" "2")])
> > > > -
> > > Hi,
> > > 
> > > In LoongArch, the microarchitecture performs instruction fusion on
> > > bstrpick.d+alsl.d.
> > > 
> > > This modification may cause the two instructions to not be close together.
> > > 
> > > So I think these two templates cannot be deleted. I will test the impact
> > > of this patch on the spec today.
> > Oops.  I guess we can salvage it with TARGET_SCHED_MACRO_FUSION_P and
> > TARGET_SCHED_MACRO_FUSION_PAIR_P.  And I'd like to know more details:
> > 
> > 1. Is the fusion applying to all bstrpick.d + alsl.d, or only bstrpick.d
> > rd, rs, 31, 0?
> > 2. Is the fusion also applying to bstrpick.d + slli.d, or we really have
> > to write the strange "alsl.d rd, rs, r0, shamt" instruction?
> > 
> Currently, instruction fusion can only be done in the following situations:
> 
> bstrpick.d rd, rs, 31, 0 + alsl.d rd1,rj,rk,shamt and "rd = rj"

So the easiest solution seems just adding the two patterns back, I'm
bootstrapping and regtesting the patch attached.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 88dc215ee55c0e9da05812310964d41c69117416 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Tue, 21 Jan 2025 17:34:36 +0800
Subject: [PATCH 1/2] LoongArch: Add back zero_extend_ashift and
 bstrpick_alsl_paired

This partially reverts r15-7062-g10e98638998.

These two define_insn's are needed for utilizing the macro-fusion of
bstrpick.d rd,rs,31,0 and alsl.d rd,rd,rk,shamt.

Per the GCC Internals section "When the Order of Patterns Matters," having
zero_extend_ashift and bstrpick_alsl_paired before and_shift_reversedi
is enough to make the compiler prefer zero_extend_ashift or
bstrpick_alsl_paired, so we don't need to explicitly reject the case in
and_shift_reversedi.  The test change is also reverted, and now the test
properly demonstrates that bstrpick_alsl_paired should be used.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (zero_extend_ashift): New
	define_insn.
	(bstrpick_alsl_paired): New define_insn.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bstrpick_alsl_paired.c: Revert r15-7062
	change.
---
 gcc/config/loongarch/loongarch.md | 33 +++
 .../loongarch/bstrpick_alsl_paired.c  |  2 +-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 9cde5c58a20..01145fa0f70 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3080,6 +3080,39 @@ (define_expand "rotl3"
   }
   });
 
+;; The following templates were added to generate "bstrpick.d + alsl.d"
+;; instruction pairs.
+;; It is required that the values of const_immalsl_operand and
+;; immediate_operand must have the following correspondence:
+;;
+;; (immediate_operand >> const_immalsl_operand) == 0x
+
+(define_insn "zero_extend_ashift"
+  [(s

Re: [PATCH,LRA] Restrict the reuse of spill slots [PR117868]

2025-01-21 Thread Richard Sandiford
Denis Chertykov  writes:
>   PR rtl-optimization/117868
> gcc/
>   * lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Reuse slots
>   only without allocated memory or only with equal or smaller registers
>   with equal or smaller alignment.
>   (lra_spill): Print slot size as width.
>
>
> diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
> index db78dcd28a3..93a0c92db9f 100644
> --- a/gcc/lra-spills.cc
> +++ b/gcc/lra-spills.cc
> @@ -386,7 +386,18 @@ assign_stack_slot_num_and_sort_pseudos (int 
> *pseudo_regnos, int n)
>   && ! (lra_intersected_live_ranges_p
> (slots[j].live_ranges,
>  lra_reg_info[regno].live_ranges)))
> -   break;
> +   {
> + /* A slot without allocated memory can be shared.  */
> + if (slots[j].mem == NULL_RTX)
> +   break;
> +
> + /* A slot with allocated memory can be shared only with equal
> +or smaller register with equal or smaller alignment.  */
> + if (slots[j].align >= spill_slot_alignment (mode)
> + && compare_sizes_for_sort (slots[j].size,
> +GET_MODE_SIZE (mode)) != -1)

Sorry for piping up late, but I think this should be:

  known_ge (GET_MODE_SIZE (mode), slots[j].size)

From the comment above compare_sizes_for_sort:

/* Compare A and B for sorting purposes, returning -1 if A should come
   before B, 0 if A and B are identical, and 1 if A should come after B.
   This is a lexicographical compare of the coefficients in reverse order.

   A consequence of this is that all constant sizes come before all
   non-constant ones, regardless of magnitude (since a size is never
   negative).  This is what most callers want.  For example, when laying
   data out on the stack, it's better to keep all the constant-sized
   data together so that it can be accessed as a constant offset from a
   single base.  */

For example, compare_sizes_for_sort would return 1 for a slot size
of 2+2X and a mode size of 16, but the slot would be too small for X < 7.
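
To make that concrete, here is a small self-contained sketch (deliberately not
using GCC's poly_int; the type poly2 and the helpers sort_compare and
definitely_ge are made up for this example) showing that the sort comparison
and a "known to be at least as large" check disagree for exactly this case:

#include <cstdio>

/* Value = c0 + c1 * X for an unknown runtime X >= 0.  */
struct poly2 { long c0, c1; };

/* Lexicographic compare of the coefficients in reverse order, as used for
   sorting: -1 if A sorts before B, 0 if equal, 1 if A sorts after B.  */
static int sort_compare (poly2 a, poly2 b)
{
  if (a.c1 != b.c1) return a.c1 < b.c1 ? -1 : 1;
  if (a.c0 != b.c0) return a.c0 < b.c0 ? -1 : 1;
  return 0;
}

/* True only if A >= B for every possible X >= 0.  */
static bool definitely_ge (poly2 a, poly2 b)
{
  return a.c0 >= b.c0 && a.c1 >= b.c1;
}

int main ()
{
  poly2 slot = { 2, 2 };   /* slot size 2 + 2X  */
  poly2 mode = { 16, 0 };  /* mode size 16      */
  printf ("sort_compare = %d\n", sort_compare (slot, mode));   /* 1: slot sorts after mode */
  printf ("definitely_ge = %d\n", definitely_ge (slot, mode)); /* 0: slot too small for X < 7 */
  return 0;
}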

Thanks,
Richard

> +   break;
> +   }
>   }
> if (j >= slots_num)
>   {
> @@ -656,8 +667,7 @@ lra_spill (void)
> for (i = 0; i < slots_num; i++)
>   {
> fprintf (lra_dump_file, "  Slot %d regnos (width = ", i);
> -   print_dec (GET_MODE_SIZE (GET_MODE (slots[i].mem)),
> -  lra_dump_file, SIGNED);
> +   print_dec (slots[i].size, lra_dump_file, SIGNED);
> fprintf (lra_dump_file, "):");
> for (curr_regno = slots[i].regno;;
>  curr_regno = pseudo_slots[curr_regno].next - pseudo_slots)


Re: [PATCH]middle-end: use ncopies both when registering and reading masks [PR118273]

2025-01-21 Thread Richard Biener
On Mon, 20 Jan 2025, Tamar Christina wrote:

> Hi All,
> 
> When registering masks for a SIMD clone we end up using nmasks instead of
> nvectors, where nmasks seems to compute the number of input masks required
> for the call given the current simdlen.
> 
> This is however wrong, as vect_record_loop_mask wants to know how many masks
> you want to create from the given vectype, i.e. which level of rgroups to
> create.
> 
> This ends up mismatching with vect_get_loop_mask, which uses nvectors, and if
> the return type is narrower than the input types there will be a mismatch
> which causes us to try to read from the given rgroup.  It only happens to
> work if the function had an additional argument that's wider or if all
> elements and return types are the same size.
> 
> This fixes it by using nvectors during registration as well, which has already
> taken into account SLP and VF.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?

OK.  This was a fragile bit IIRC but your testing hopefully covered
all important cases (GCN might be missing, but is somewhat peculiar to 
test).

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR middle-end/118273
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use nvectors when
>   doing mask registrations.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/vect-simd-clone-4.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c
> new file mode 100644
> index 
> ..9b52af7039ffa4af2b49c7cef9ad93ca1525
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-4.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile }  */
> +/* { dg-options "-std=c99" } */
> +/* { dg-additional-options "-O3 -march=armv8-a" } */
> +
> +#pragma GCC target ("+sve")
> +
> +extern char __attribute__ ((simd, const)) fn3 (short);
> +void test_fn3 (float *a, float *b, double *c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +a[i] = fn3 (c[i]);
> +}
> +
> +/* { dg-final { scan-assembler {\s+_ZGVsMxv_fn3\n} } } */
> +
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> 833029fcb00108abc605042376e9811651d5cd64..21fb5cf5bd47ad9e37762909c6103adbf8752e2a
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -4561,14 +4561,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   case SIMD_CLONE_ARG_TYPE_MASK:
> if (loop_vinfo
> && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> - {
> -   unsigned nmasks
> - = exact_div (ncopies * bestn->simdclone->simdlen,
> -  TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
> -   vect_record_loop_mask (loop_vinfo,
> -  &LOOP_VINFO_MASKS (loop_vinfo),
> -  nmasks, vectype, op);
> - }
> + vect_record_loop_mask (loop_vinfo,
> +&LOOP_VINFO_MASKS (loop_vinfo),
> +ncopies, vectype, op);
>  
> break;
>   }
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 3/4] RISC-V: Add .note.gnu.property for ZICFILP and ZICFISS ISA extension

2025-01-21 Thread Mark Wielaard
Hi,

On Sat, 2025-01-18 at 09:34 +0800, Monk Chiang wrote:
> Thanks, I will fix it.

Thanks. And if you need help with that please let people know.
The riscv bootstrap has been broken now for 5 days.
And it really looks like it is as simple as just removing that one
line.

Cheers,

Mark
> 

> > Mark Wielaard  於 2025年1月17日 晚上10:32 寫道:
> > 
> > Hi Monk,
> > 
> > > On Fri, Nov 15, 2024 at 06:53:09PM +0800, Monk Chiang wrote:
> > > gcc/ChangeLog:
> > >* gcc/config/riscv/riscv.cc
> > >(riscv_file_end_indicate_exec_stack): Add .note.gnu.property.
> > >* gcc/config/riscv/linux.h (TARGET_ASM_FILE_END): Define.
> > > 
> > > [...]
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index 8201a965ed1..bda982f085c 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -10195,6 +10195,56 @@ riscv_file_start (void)
> > > riscv_emit_attribute ();
> > > }
> > > 
> > > +void
> > > +riscv_file_end_indicate_exec_stack ()
> > > +{
> > > +  file_end_indicate_exec_stack ();
> > > +  long GNU_PROPERTY_RISCV_FEATURE_1_AND  = 0;
> > 
> > This broke the riscv bootstrap because
> > GNU_PROPERTY_RISCV_FEATURE_1_AND is never used.
> > 
> > ../../gcc/gcc/config/riscv/riscv.cc:10340:8: error: unused variable 
> > ‘GNU_PROPERTY_RISCV_FEATURE_1_AND’ [-Werror=unused-variable]
> > 10340 |   long GNU_PROPERTY_RISCV_FEATURE_1_AND  = 0;
> >   |^~~~
> > 
> > See https://builder.sourceware.org/buildbot/#/builders/310/builds/863
> > 
> > Could you fix that?
> > 
> > Thanks,
> > 
> > Mark



[committed] Regenerate aarch64.opt.urls

2025-01-21 Thread alfie.richards

This updates aarch64.opt.urls after my patch earlier today.

Pushing directly as it's an obvious fix.

gcc/ChangeLog:

* config/aarch64/aarch64.opt.urls: Regenerate
---
 gcc/config/aarch64/aarch64.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.opt.urls b/gcc/config/aarch64/aarch64.opt.urls
index 4fa90384378..7ec14a94381 100644
--- a/gcc/config/aarch64/aarch64.opt.urls
+++ b/gcc/config/aarch64/aarch64.opt.urls
@@ -92,3 +92,6 @@ UrlSuffix(gcc/AArch64-Options.html#index-mstack-protector-guard-reg)
 mstack-protector-guard-offset=
 UrlSuffix(gcc/AArch64-Options.html#index-mstack-protector-guard-offset)
 
+Wexperimental-fmv-target
+UrlSuffix(gcc/AArch64-Options.html#index-Wexperimental-fmv-target)
+


Honor dump options for C/C++ '-fdump-tree-original'

2025-01-21 Thread Thomas Schwinge
Hi!

On 2025-01-16T15:57:52+0100, I wrote:
> I have noticed that '-fdump-tree-original-lineno' for Fortran (for
> example) does dump location information, but for C/C++ it does not.
> The reason is that Fortran (and other front ends) use code like:
>
> /* Output the GENERIC tree.  */
> dump_function (TDI_original, fndecl);
>
> ..., but 'gcc/c-family/c-gimplify.cc:c_genericize' has some special code
> to "Dump the C-specific tree IR", and that (unless 'TDF_RAW') calls
> 'gcc/c-family/c-pretty-print.cc:print_c_tree', and appears to completely
> ignore the 'dump_flags_t'.  (Ignores it in 'c_pretty_printer::statement',
> and passes 'TDF_NONE' into 'dump_generic_node'.)
>
> See the attached "Honor dump options for C/C++ '-fdump-tree-original'"
> for what I have quickly hacked up.  Does that make any sense to do like
> this, and if yes, how much more polish does this need, or if no, how
> should we approach this issue otherwise?
>
> (I need this, no surprise, for use in test cases.)

In addition to upcoming use of '-fdump-tree-original-lineno', this patch
actually resolves XFAILs for 'c-c++-common/goacc/pr92793-1.c', which had
gotten added as part of commit fa410314ec94c9df2ad270c1917adc51f9147c2c
"[OpenACC] Elaborate testcases that verify column location information 
[PR92793]".

With 'c-c++-common/goacc/pr92793-1.c' un-XFAILed, is it OK, for now, to
push the attached "Honor dump options for C/C++ '-fdump-tree-original'"?
I've 'make check'ed a number of different GCC targets/configurations.


Regards
 Thomas


>From 37d23691a1087cc2295daa45d8b579ccf6e84d9c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 16 Jan 2025 15:32:56 +0100
Subject: [PATCH] Honor dump options for C/C++ '-fdump-tree-original'

In addition to upcoming use of '-fdump-tree-original-lineno', this patch
actually resolves XFAILs for 'c-c++-common/goacc/pr92793-1.c', which had
gotten added as part of commit fa410314ec94c9df2ad270c1917adc51f9147c2c
"[OpenACC] Elaborate testcases that verify column location information [PR92793]".

	gcc/c-family/
	* c-gimplify.cc (c_genericize): Pass 'local_dump_flags' to
	'print_c_tree'.
	* c-pretty-print.cc (c_pretty_printer::statement): Pass
	'dump_flags' to 'dump_generic_node'.
	(c_pretty_printer::c_pretty_printer): Initialize 'dump_flags'.
	(print_c_tree): Add 'dump_flags_t' formal parameter.
	(debug_c_tree): Adjust.
	* c-pretty-print.h (c_pretty_printer): Add 'dump_flags_t
	dump_flags'.
	(c_pretty_printer::c_pretty_printer): Add 'dump_flags_t' formal
	parameter.
	(print_c_tree): Adjust.
	gcc/testsuite/
	* c-c++-common/goacc/pr92793-1.c: Remove
	'-fdump-tree-original-lineno' XFAILs.
---
 gcc/c-family/c-gimplify.cc   |  2 +-
 gcc/c-family/c-pretty-print.cc   | 29 
 gcc/c-family/c-pretty-print.h|  6 ++--
 gcc/testsuite/c-c++-common/goacc/pr92793-1.c | 21 +++---
 4 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 8f6b4335b17..4055369e5d0 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -728,7 +728,7 @@ c_genericize (tree fndecl)
 	dump_node (DECL_SAVED_TREE (fndecl),
 		   TDF_SLIM | local_dump_flags, dump_orig);
   else
-	print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl));
+	print_c_tree (dump_orig, DECL_SAVED_TREE (fndecl), local_dump_flags);
   fprintf (dump_orig, "\n");
 }
 
diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index 0b6810e1224..1ce19f54988 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -2858,6 +2858,9 @@ c_pretty_printer::statement (tree t)
 {
 
 case SWITCH_STMT:
+  if (dump_flags != TDF_NONE)
+	internal_error ("dump flags not handled here");
+
   pp_c_ws_string (this, "switch");
   pp_space (this);
   pp_c_left_paren (this);
@@ -2875,6 +2878,9 @@ c_pretty_printer::statement (tree t)
 	for ( expression(opt) ; expression(opt) ; expression(opt) ) statement
 	for ( declaration expression(opt) ; expression(opt) ) statement  */
 case WHILE_STMT:
+  if (dump_flags != TDF_NONE)
+	internal_error ("dump flags not handled here");
+
   pp_c_ws_string (this, "while");
   pp_space (this);
   pp_c_left_paren (this);
@@ -2887,6 +2893,9 @@ c_pretty_printer::statement (tree t)
   break;
 
 case DO_STMT:
+  if (dump_flags != TDF_NONE)
+	internal_error ("dump flags not handled here");
+
   pp_c_ws_string (this, "do");
   pp_newline_and_indent (this, 3);
   statement (DO_BODY (t));
@@ -2901,6 +2910,9 @@ c_pretty_printer::statement (tree t)
   break;
 
 case FOR_STMT:
+  if (dump_flags != TDF_NONE)
+	internal_error ("dump flags not handled here");
+
   pp_c_ws_string (this, "for");
   pp_space (this);
   pp_c_left_paren (this);
@@ -2929,6 +2941,9 @@ c_pretty_printer::statement (tree t)
 	continue ;
 	return expression(opt) ;  */
 case BREAK_

Re: [PATCH v5] AArch64: Add LUTI ACLE for SVE2

2025-01-21 Thread Richard Sandiford
Thanks for the update.  LGTM with one trivial fix:

 writes:
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> index ca721dd2c09..d8776a55230 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> @@ -903,6 +903,52 @@ struct load_ext_gather_base : public overloaded_base<1>
>}
>  };
>  
> +
> +/* sv_t svlut[__g](svx_t, svuint8_t, uint64_t)

sv_t 

I think a blank line after this one would help too.

OK with that change, thanks.

Richard


Re: [PATCH v5] AArch64: Add LUTI ACLE for SVE2

2025-01-21 Thread Saurabh Jha

On 1/21/2025 11:37 AM, Richard Sandiford wrote:

Thanks for the update.  LGTM with one trivial fix:

 writes:

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index ca721dd2c09..d8776a55230 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -903,6 +903,52 @@ struct load_ext_gather_base : public overloaded_base<1>
}
  };
  
+

+/* sv_t svlut[__g](svx_t, svuint8_t, uint64_t)


sv_t 

I think a blank line after this one would help too.

OK with that change, thanks.

Sure, fixed and committed. Thanks.


Richard




[PATCH] tree-optimization/118569 - LC SSA broken after unrolling

2025-01-21 Thread Richard Biener
The following amends the previous fix to mark all of the loop's BBs
as needing to be scanned for new LC PHI uses when the loop's nesting
parents change, noting that one caller of fix_loop_placement was already
doing that.  So the following moves this code into fix_loop_placement,
covering both callers now.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/118569
* cfgloopmanip.cc (fix_loop_placement): When the loops
nesting parents changed, mark all blocks to be scanned
for LC PHI uses.
(fix_bb_placements): Remove code moved into fix_loop_placement.

* gcc.dg/torture/pr118569.c: New testcase.
---
 gcc/cfgloopmanip.cc | 22 +++
 gcc/testsuite/gcc.dg/torture/pr118569.c | 36 +
 2 files changed, 47 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118569.c

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 573146b2e28..2c28437b34d 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -154,10 +154,17 @@ fix_loop_placement (class loop *loop, bool 
*irred_invalidated,
  if (e->flags & EDGE_IRREDUCIBLE_LOOP)
*irred_invalidated = true;
  rescan_loop_exit (e, false, false);
- /* Any LC SSA PHIs on e->dest might now be on the wrong edge
-if their defs were in a former outer loop.  */
- if (loop_closed_ssa_invalidated)
-   bitmap_set_bit (loop_closed_ssa_invalidated, e->src->index);
+   }
+  /* Any LC SSA PHIs on e->dest might now be on the wrong edge
+if their defs were in a former outer loop.  Also all uses
+in the original inner loop of defs in the outer loop(s) now
+require LC PHI nodes.  */
+  if (loop_closed_ssa_invalidated)
+   {
+ basic_block *bbs = get_loop_body (loop);
+ for (unsigned i = 0; i < loop->num_nodes; ++i)
+   bitmap_set_bit (loop_closed_ssa_invalidated, bbs[i]->index);
+ free (bbs);
}
 
   ret = true;
@@ -233,13 +240,6 @@ fix_bb_placements (basic_block from,
   loop_closed_ssa_invalidated))
continue;
  target_loop = loop_outer (from->loop_father);
- if (loop_closed_ssa_invalidated)
-   {
- basic_block *bbs = get_loop_body (from->loop_father);
- for (unsigned i = 0; i < from->loop_father->num_nodes; ++i)
-   bitmap_set_bit (loop_closed_ssa_invalidated, bbs[i]->index);
- free (bbs);
-   }
}
   else
{
diff --git a/gcc/testsuite/gcc.dg/torture/pr118569.c 
b/gcc/testsuite/gcc.dg/torture/pr118569.c
new file mode 100644
index 000..c5b404aded5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118569.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-tree-ch -fno-tree-ccp -fno-tree-fre" } */
+
+volatile int a;
+int b, c, d, e, f, g;
+int main() {
+  int i = 2, j = 1;
+k:
+  if (!e)
+;
+  else {
+short l = 1;
+if (0)
+m:
+  d = g;
+f = 0;
+for (; f < 2; f++) {
+  if (f)
+for (; j < 2; j++)
+  if (i)
+goto m;
+  a;
+  if (l)
+continue;
+  i = 0;
+  while (c)
+l++;
+}
+g = 0;
+  }
+  if (b) {
+i = 1;
+goto k;
+  }
+  return 0;
+}
-- 
2.43.0


[PATCH v2] RISC-V: Enable and adjust the testsuite for XTheadVector.

2025-01-21 Thread Jin Ma
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable testsuite of
XTheadVector.
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly.
* gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  2 ++
 .../riscv/rvv/xtheadvector/pr114194.c | 32 +--
 .../riscv/rvv/xtheadvector/prefix.c   |  2 +-
 .../riscv/rvv/xtheadvector/vlb-vsb.c  | 17 ++
 .../riscv/rvv/xtheadvector/vlbu-vsb.c | 17 ++
 .../riscv/rvv/xtheadvector/vlh-vsh.c  | 17 ++
 .../riscv/rvv/xtheadvector/vlhu-vsh.c | 17 ++
 .../riscv/rvv/xtheadvector/vlw-vsw.c  | 17 ++
 .../riscv/rvv/xtheadvector/vlwu-vsw.c | 17 ++
 9 files changed, 79 insertions(+), 59 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index d82710e9c416..3824997c9082 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -41,6 +41,8 @@ dg-runtest [lsort [glob -nocomplain 
$srcdir/$subdir/base/*.\[cS\]]] \
"" $CFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/xsfvector/*.\[cS\]]] \
"" $CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/xtheadvector/*.\[cS\]]] \
+   "" $CFLAGS
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
"" $CFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
index a82e2d3fbfe6..5c9777b071b5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/pr114194.c
@@ -1,11 +1,11 @@
 /* { dg-do compile { target { ! riscv_abi_e } } } */
-/* { dg-options "-march=rv32gc_xtheadvector" { target { rv32 } } } */
-/* { dg-options "-march=rv64gc_xtheadvector" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadvector -O2" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xtheadvector -O2" { target { rv64 } } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 /*
 ** foo0_1:
-** sb\tzero,0([a-x0-9]+)
+** sb\tzero,0\([a-x0-9]+\)
 ** ret
 */
 void foo0_1 (void *p)
@@ -15,13 +15,13 @@ void foo0_1 (void *p)
 
 /*
 ** foo0_7:
-** sb\tzero,0([a-x0-9]+)
-** sb\tzero,1([a-x0-9]+)
-** sb\tzero,2([a-x0-9]+)
-** sb\tzero,3([a-x0-9]+)
-** sb\tzero,4([a-x0-9]+)
-** sb\tzero,5([a-x0-9]+)
-** sb\tzero,6([a-x0-9]+)
+** sb\tzero,0\([a-x0-9]+\)
+** sb\tzero,1\([a-x0-9]+\)
+** sb\tzero,2\([a-x0-9]+\)
+** sb\tzero,3\([a-x0-9]+\)
+** sb\tzero,4\([a-x0-9]+\)
+** sb\tzero,5\([a-x0-9]+\)
+** sb\tzero,6\([a-x0-9]+\)
 ** ret
 */
 void foo0_7 (void *p)
@@ -32,7 +32,7 @@ void foo0_7 (void *p)
 /*
 ** foo1_1:
 ** li\t[a-x0-9]+,1
-** sb\t[a-x0-9]+,0([a-x0-9]+)
+** sb\t[a-x0-9]+,0\([a-x0-9]+\)
 ** ret
 */
 void foo1_1 (void *p)
@@ -43,11 +43,11 @@ void foo1_1 (void *p)
 /*
 ** foo1_5:
 ** li\t[a-x0-9]+,1
-** sb\t[a-x0-9]+,0([a-x0-9]+)
-** sb\t[a-x0-9]+,1([a-x0-9]+)
-** sb\t[a-x0-9]+,2([a-x0-9]+)
-** sb\t[a-x0-9]+,3([a-x0-9]+)
-** sb\t[a-x0-9]+,4([a-x0-9]+)
+** sb\t[a-x0-9]+,0\([a-x0-9]+\)
+** sb\t[a-x0-9]+,1\([a-x0-9]+\)
+** sb\t[a-x0-9]+,2\([a-x0-9]+\)
+** sb\t[a-x0-9]+,3\([a-x0-9]+\)
+** sb\t[a-x0-9]+,4\([a-x0-9]+\)
 ** ret
 */
 void foo1_5 (void *p)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
index eee727ef6b42..0a18e697830c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -9,4 +9,4 @@ prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
   return __riscv_vadd_vv_i32m1 (vx, vy, vl);
 }
 
-/* { dg-final { scan-assembler {\mth\.v\M} } } */
+/* { dg-final { scan-assembler {\mth\.vadd\.vv\M} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
index 3c12c1245974..16073ccb2366 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
@@ -5,7 +5,8 @@
 
 /*
 ** f1:
-** th.vsetivli\tzero,4,e32,m1,tu,ma
+** li\t[a-x0-9]+,4
+** th.vsetvli\tzero,[a-x0-9]+,e32,m1
 ** th.vlb\.v\tv[0-9]+,0

Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> On Tue, 2025-01-21 at 21:52 +0800, Xi Ruoyao wrote:
> > > struct Pair { unsigned long a, b; };
> > > 
> > > struct Pair
> > > test (struct Pair p, long x, long y)
> > > {
> > >   p.a &= 0xffffffff;
> > >   p.a <<= 2;
> > >   p.a += x;
> > >   p.b &= 0xffffffff;
> > >   p.b <<= 2;
> > >   p.b += x;
> > >   return p;
> > > }
> > > 
> > > in GCC 13 the result is:
> > > 
> > >   or  $r12,$r4,$r0
> > 
> > Hmm, this strange move is caused by "&" in bstrpick_alsl_paired.  Is it
> > really needed for the fusion?
> 
> Never mind, it's needed or a = ((a & 0xffffffff) << 1) + a will blow up.
> Stupid I.
> 
> > >   bstrpick.d  $r4,$r12,31,0
> > >   alsl.d  $r4,$r4,$r6,2
> > >   or  $r12,$r5,$r0
> > >   bstrpick.d  $r5,$r12,31,0
> > >   alsl.d  $r5,$r5,$r6,2
> > >   jr  $r1

This fixes the test case and forces emitting alsl.d for
(a & 0xffffffff) << [1234]:

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 10197b9d9d5..7b5b77c56ac 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3134,8 +3134,50 @@ (define_insn_and_split "_shift_reverse"
   mode)"
   "#"
   "&& true"
+  [(set (match_dup 0) (ashift:X (match_dup 0) (match_dup 2)))]
+  {
+operands[3] = loongarch_reassoc_shift_bitwise (,
+  operands[2],
+  operands[3],
+  mode);
+
+if (ins_zero_bitmask_operand (operands[3], mode))
+  {
+   gcc_checking_assert ();
+   emit_move_insn (operands[0], operands[1]);
+   operands[1] = operands[0];  
+  }
+
+emit_insn (gen_3 (operands[0], operands[1], operands[3]));
+
+if (
+   && TARGET_64BIT
+   && si_mask_operand (operands[3], DImode)
+   && const_immalsl_operand (operands[2], SImode))
+  {
+   /* Special case for bstrpick.d + alsl.d fusion
+  TODO: TARGET_SCHED_MACRO_FUSION_PAIR_P */
+   emit_insn (gen_alsldi3 (operands[0], operands[0],
+   operands[2], gen_rtx_REG (DImode, 0)));
+   DONE;
+  }
+  })
+
+(define_insn_and_split "_alsl_reverse"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (plus:X
+ (any_bitwise:X
+   (ashift:X (match_operand:X  1 "register_operand" "r")
+ (match_operand:SI 2 "const_immalsl_operand" "i"))
+   (match_operand:X 3 "const_int_operand" "i"))
+ (match_operand:X 4 "register_operand" "r")))]
+  "loongarch_reassoc_shift_bitwise (, operands[2], operands[3],
+   mode)"
+  "#"
+  "&& true"
   [(set (match_dup 0) (any_bitwise:X (match_dup 1) (match_dup 3)))
-   (set (match_dup 0) (ashift:X (match_dup 0) (match_dup 2)))]
+   (set (match_dup 0) (plus:X (ashift:X (match_dup 0) (match_dup 2))
+ (match_dup 4)))]
   {
 operands[3] = loongarch_reassoc_shift_bitwise (,
   operands[2],

I guess it'll work once combined with a TARGET_SCHED_MACRO_FUSION_PAIR_P
implementation.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


vect: Force alignment peeling to vectorize more early break loops [PR118211]: update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN (was: [gcc r15-6807] vect: Force alignment peeling to vectoriz

2025-01-21 Thread Thomas Schwinge
Hi!

On 2025-01-20T08:40:25+, Tamar Christina  wrote:
>> From: Thomas Schwinge 
>> Sent: Monday, January 13, 2025 9:54 AM

>> On 2025-01-10T21:22:03+, Tamar Christina via Gcc-cvs > c...@gcc.gnu.org> wrote:
>> > https://gcc.gnu.org/g:68326d5d1a593dc0bf098c03aac25916168bc5a9
>> >
>> > commit r15-6807-g68326d5d1a593dc0bf098c03aac25916168bc5a9
>> > Author: Alex Coplan 
>> > Date:   Mon Mar 11 13:09:10 2024 +
>> >
>> > vect: Force alignment peeling to vectorize more early break loops 
>> > [PR118211]
>> 
>> In addition to the regression already noted elsewhere:
>> 
>> PASS: gcc.dg/tree-ssa/predcom-8.c (test for excess errors)
>> PASS: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump pcom "Executing 
>> predictive commoning without unrolling"
>> [-PASS:-]{+FAIL:+} gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom 
>> "Invalid sum"
>> 
>> ..., this commit for for '--target=amdgcn-amdhsa' (tested '-march=gfx908', 
>> '-march=gfx1100') also regresses:
>> 
>> PASS: gcc.dg/vect/vect-switch-search-line-fast.c (test for excess errors)
>> [-XFAIL:-]{+FAIL:+} gcc.dg/vect/vect-switch-search-line-fast.c 
>> scan-tree-dump-times vect "vectorized 1 loops" [-1-]{+0+}
>> 
>> gcc.dg/vect/vect-switch-search-line-fast.c: pattern found 1 times
>> 
>> > --- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
>> > +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
>> > [...]
>> > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { 
>> > xfail *-*-* } } } */
>> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { 
>> > target { ilp32 } } } } */
>> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { 
>> > target { ! ilp32 } } } } */
>> 
>> Presuming that it's correct that GCN continues to be able to vectorize this,
>> what is the appropriate conditional to use?
>
> I don't think we really have a condition for its succeeding on some targets
> for now.

Thanks for checking.

> The original testcase was xfail but it was failing for many different reasons 
> on all targets.

Eh, of course -- it was XFAIL before, sorry.  So, it's not the case that
"GCN continues to be able to vectorize this", but rather that after this
commit, "GCN is now able to vectorize this".  :-)

> So I think just doing { target { ilp32 || { amdgcn-* } } } should work for 
> now.

Pushed to trunk branch commit da75309c635c54a6010b146514d456d2a4c6ab33
"vect: Force alignment peeling to vectorize more early break loops [PR118211]: 
update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN",
see attached.


Regards
 Thomas


>From da75309c635c54a6010b146514d456d2a4c6ab33 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 21 Jan 2025 14:57:37 +0100
Subject: [PATCH] vect: Force alignment peeling to vectorize more early break
 loops [PR118211]: update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN

	PR tree-optimization/118211
	PR tree-optimization/116126
	gcc/testsuite/
	* gcc.dg/vect/vect-switch-search-line-fast.c: Update for GCN.
---
 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
index 21c77f49ebd..678512db319 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
@@ -16,5 +16,5 @@ const unsigned char *search_line_fast2 (const unsigned char *s,
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ilp32 || { amdgcn*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target { ! { ilp32 || { amdgcn*-*-* } } } } } } */
-- 
2.34.1



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Qing Zhao


> On Jan 17, 2025, at 18:13, Joseph Myers  wrote:
> 
> On Fri, 17 Jan 2025, Qing Zhao wrote:
> 
>> struct fc_bulk {
>>  ...
>>  struct fs_bulk fs_bulk;
>>  struct fc fcs[] __counted_by(fs_bulk.len);
>> };
>> 
>> i.e., the “counted_by” field is in an inner structure of the current
>> structure containing the FAM.
>> With the current syntax, it’s not easy to extend to support this.
>> 
>> But with the designator syntax, it might be much easier to extend to
>> support this.
>> 
>> So, Kees and Bill, what’s your opinion on this? I think that it’s better to 
>> have a consistent interface between GCC
>> and Clang. 
>> 
>> Joseph, what’s your opinion on this new syntax?  Shall we support the
>> designator syntax for the counted_by attribute?
> 
> Designator syntax seems reasonable.
> 
> I think basic C language design principles here include:
> 
> * It should be unambiguous in a given context what name space an 
> identifier is to be looked up in.  (So you can have designator syntax 
> where the identifier is always looked up as a member of the relevant 
> struct, or use a plain identifier where the semantics are defined that 
> way.  But what would be a bad idea would be any attempt to automagically 
> guess whether something that looks like an expression should be 
> interpreted as an expression or with identifiers instead looked up as 
> structure members.  If you allow fs_bulk.len above, there should be no 
> possibility of fs_bulk being an ordinary identifier (global variable etc.)

If we use the designator syntax for counted_by, then the correct syntax for
the above should be:

struct fc_bulk {
 …
 struct fs_bulk fs_bulk;
 struct fc fcs[] __counted_by(.fs_bulk.len);
};


> - the name lookup rules should mean it's always only looked up as a member 
> of the current structure.)
> 
> * Don't introduce something "like expression but with different name 
> lookup rules".  Designators aren't expressions and have limited syntax.  
> It would be a bad idea to try to e.g. have something allowing arithmetic 
> on designators.  For example, don't allow __counted_by(.len1 + .len2) 
> where len1 and len2 are both members, as that's inventing a complete new 
> expression-like-but-not-expression syntactic construct.

So, even after we introduce the designator syntax for the counted_by
attribute, arbitrary expressions such as:

counted_by (.len1 + const)
counted_by (.len1 + .len2)

still cannot be supported?

If not, how should we support simple expressions for the counted_by attribute?

Thanks.

Qing

> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com




[PATCH 10/13] i386: Change mnemonics from VCVTNE2PH2[B, H]F8 to VCVT2PH2[B, H]F8

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512convertintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVT2PH2BF8): Rename from UNSPEC_VCVTNE2PH2BF8.
(UNSPEC_VCVT2PH2BF8S): Rename from UNSPEC_VCVTNE2PH2BF8S.
(UNSPEC_VCVT2PH2HF8): Rename from UNSPEC_VCVTNE2PH2HF8.
(UNSPEC_VCVT2PH2HF8S): Rename from UNSPEC_VCVTNE2PH2HF8S.
(UNSPEC_CONVERTFP8_PACK): Rename from UNSPEC_NECONVERTFP8_PACK.
Adjust UNSPEC name.
(convertfp8_pack): Rename from neconvertfp8_pack. Adjust
iterator map.
(vcvt): Rename to...
(vcvt): ...this.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvt2ph2hf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvt2ph2hf8s-2.c: ...here.
Adjust intrin call.
---
 gcc/config/i386/avx10_2-512convertintrin.h| 142 -
 gcc/config/i386/avx10_2convertintrin.h| 286 +-
 gcc/config/i386/i386-builtin.def  |  24 +-
 gcc/config/i386/sse.md|  30 +-
 .../gcc.target/i386/avx10_2-512-convert-1.c   |  56 ++--
 ...ph2bf8-2.c => avx10_2-512-vcvt2ph2bf8-2.c} |   6 +-
 ...2bf8s-2.c => avx10_2-512-vcvt2ph2bf8s-2.c} |   6 +-
 ...ph2hf8-2.c => avx10_2-512-vcvt2ph2hf8-2.c} |   6 +-
 ...2hf8s-2.c => avx10_2-512-vcvt2ph2hf8s-2.c} |   6 +-
 .../gcc.target/i386/avx10_2-convert-1.c   | 104 +++
 ...tne2ph2bf8-2.c => avx10_2-vcvt2ph2bf8-2.c} |   4 +-
 ...ne2ph2hf8-2.c => avx10_2-vcvt2ph2bf8s-2.c} |   4 +-
 ...ne2ph2bf8s-2.c => avx10_2-vcvt2ph2hf8-2.c} |   4 +-
 ...e2ph2hf8s-2.c => avx10_2-vcvt2ph2hf8s-2.c} |   4 +-
 14 files changed, 341 insertions(+), 341 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtne2ph2bf8-2.c => 
avx10_2-512-vcvt2ph2bf8-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtne2ph2bf8s-2.c => 
avx10_2-512-vcvt2ph2bf8s-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtne2ph2hf8-2.c => 
avx10_2-512-vcvt2ph2hf8-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtne2ph2hf8s-2.c => 
avx10_2-512-vcvt2ph2hf8s-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtne2ph2bf8-2.c => 
avx10_2-vcvt2ph2bf8-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtne2ph2hf8-2.c => 
avx10_2-vcvt2ph2bf8s-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtne2ph2bf8s-2.c => 
avx10_2-vcvt2ph2hf8-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtne2ph2hf8s-2.c => 
avx10_2-vcvt2ph2hf8s-2.c} (78%)

diff --git a/gcc/config/i386/avx10_2-512convertintrin.h 
b/gcc/config/i386/avx10_2-512convertintrin.h
index 23b2636139d..c753dd7a946 100644
--- a/gcc/config/i386/avx10_2-512convertintrin.h
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -265,134 +265,134 @@ _mm512_maskz_cvtbiassph_phf8 (__mmask32 __U, __m512i 
__A, __m512h __B)
 
 extern __inline__ __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtne2ph_pbf8 (__m512h __A, __m512h __B)
+_mm512_cvt2ph_bf8 (__m512h __A, __m512h __B)
 {
-  return (__m512i) __builtin_ia32_vcvtne2ph2bf8512_mask ((__v32hf) __A,
-(__v32hf) __B,
-(__v64qi)
-_mm512_setzero_si512 
(),
-(__mmask64) -1);
+  return (__m512i) __builtin_ia32_vcvt2ph2bf8512_mask ((__v32hf) __A,
+  

[PATCH 04/13] i386: Change mnemonics from V[CMP, MAX, MIN]PBF16 to V[CMP, MAX, MIN]BF16

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(avx10_2_pbf16_): Rename to...
(avx10_2_bf16_): ...this.
Change instruction name output.
(avx10_2_cmppbf16_): Rename to...
(avx10_2_cmpbf16_): ...this.
Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-cmp-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-512-bf16-vector-smaxmin-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcmpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-512-vminbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-bf16-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-cmp-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-bf16-vector-smaxmin-1.c: ...here.
Adjust asm check.
* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: Move to...
* gcc.target/i386/avx10_2-partial-bf16-vector-smaxmin-1.c: ...here.
* gcc.target/i386/avx10_2-vcmpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vcmpbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vmaxbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vminpbf16-2.c: Move to...
* gcc.target/i386/avx10_2-vminbf16-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/part-vect-vec_cmpbf.c: Adjust asm check.
* gcc.target/i386/avx-1.c: Adjust builtin call.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
---
 gcc/config/i386/avx10_2-512bf16intrin.h   | 36 +-
 gcc/config/i386/avx10_2bf16intrin.h   | 68 +--
 gcc/config/i386/i386-builtin.def  | 30 
 gcc/config/i386/sse.md|  8 +--
 gcc/testsuite/gcc.target/i386/avx-1.c |  6 +-
 .../gcc.target/i386/avx10_2-512-bf16-1.c  | 16 ++---
 ...pp-1.c => avx10_2-512-bf16-vector-cmp-1.c} |  2 +-
 c => avx10_2-512-bf16-vector-smaxmin-1.c} |  8 +--
 ...vcmppbf16-2.c => avx10_2-512-vcmpbf16-2.c} |  0
 ...vmaxpbf16-2.c => avx10_2-512-vmaxbf16-2.c} |  0
 ...vminpbf16-2.c => avx10_2-512-vminbf16-2.c} |  0
 .../gcc.target/i386/avx10_2-bf16-1.c  | 32 -
 ...r-cmpp-1.c => avx10_2-bf16-vector-cmp-1.c} |  2 +-
 ...in-1.c => avx10_2-bf16-vector-smaxmin-1.c} | 12 ++--
 ...> avx10_2-partial-bf16-vector-smaxmin-1.c} |  4 +-
 ...0_2-vmaxpbf16-2.c => avx10_2-vcmpbf16-2.c} |  4 +-
 ...0_2-vminpbf16-2.c => avx10_2-vmaxbf16-2.c} |  4 +-
 ...0_2-vcmppbf16-2.c => avx10_2-vminbf16-2.c} |  4 +-
 .../gcc.target/i386/part-vect-vec_cmpbf.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c|  6 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  6 +-
 21 files changed, 125 insertions(+), 125 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-bf-vector-cmpp-1.c => 
avx10_2-512-bf16-vector-cmp-1.c} (88%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-bf-vector-smaxmin-1.c => 
avx10_2-512-bf16-vector-smaxmin-1.c} (56%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcmppbf16-2.c => 
avx10_2-512-vcmpbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vmaxpbf16-2.c => 
avx10_2-512-vmaxbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vminpbf16-2.c => 
avx10_2-512-vminbf16-2.c} (100%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-bf-vector-cmpp-1.c => 
avx10_2-bf16-vector-cmp-1.c} (90%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-bf-vector-smaxmin-1.c => 
avx10_2-bf16-vector-smaxmin-1.c} (58%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-partial-bf-vector-smaxmin-1.c => 
avx10_2-partial-bf16-vector-smaxmin-1.c} (87%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vmaxpbf16-2.c => 
avx10_2-vcmpbf16-2.c} (80%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vminpbf16-2.c => 
avx10_2-vmaxbf16-2.c} (80%)
 rename gcc/testsuite/gcc.target

[PATCH 12/13] i386: Change mnemonics from VCVT[, T]NEBF162I[, U]BS to VCVT[, T]BF162I[, U]BS

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512satcvtintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2satcvtintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVTBF162IBS): Rename from UNSPEC_VCVTNEBF162IBS.
(UNSPEC_VCVTBF162IUBS): Rename from UNSPEC_VCVTNEBF162IUBS.
(UNSPEC_VCVTTBF162IBS): Rename from UNSPEC_VCVTTNEBF162IBS.
(UNSPEC_VCVTTBF162IUBS): Rename from UNSPEC_VCVTTNEBF162IUBS.
(UNSPEC_CVTNE_BF16_IBS_ITER): Rename to...
(UNSPEC_CVT_BF16_IBS_ITER): ...this. Adjust UNSPEC name.
(sat_cvt_sign_prefix): Adjust UNSPEC name.
(sat_cvt_trunc_prefix): Ditto.

(avx10_2_cvtnebf162ibs):
Rename to...

(avx10_2_cvtbf162ibs):
...this. Change instruction name output.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-satcvt-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvttbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvttbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-satcvt-1.c: Adjust output and
intrin call.
* gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtbf162iubs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvttbf162ibs-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Move to...
* gcc.target/i386/avx10_2-vcvttbf162iubs-2.c: ...here.
Adjust intrin call.
---
 gcc/config/i386/avx10_2-512satcvtintrin.h | 111 
 gcc/config/i386/avx10_2satcvtintrin.h | 265 +++---
 gcc/config/i386/i386-builtin.def  |  24 +-
 gcc/config/i386/sse.md|  40 +--
 .../gcc.target/i386/avx10_2-512-satcvt-1.c|  48 ++--
 ...62ibs-2.c => avx10_2-512-vcvtbf162ibs-2.c} |   6 +-
 ...iubs-2.c => avx10_2-512-vcvtbf162iubs-2.c} |   6 +-
 ...2ibs-2.c => avx10_2-512-vcvttbf162ibs-2.c} |   6 +-
 ...ubs-2.c => avx10_2-512-vcvttbf162iubs-2.c} |   6 +-
 .../gcc.target/i386/avx10_2-satcvt-1.c|  96 +++
 ...ebf162ibs-2.c => avx10_2-vcvtbf162ibs-2.c} |   4 +-
 ...f162iubs-2.c => avx10_2-vcvtbf162iubs-2.c} |   4 +-
 ...bf162ibs-2.c => avx10_2-vcvttbf162ibs-2.c} |   4 +-
 ...162iubs-2.c => avx10_2-vcvttbf162iubs-2.c} |   4 +-
 14 files changed, 287 insertions(+), 337 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtnebf162ibs-2.c => 
avx10_2-512-vcvtbf162ibs-2.c} (87%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtnebf162iubs-2.c => 
avx10_2-512-vcvtbf162iubs-2.c} (88%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvttnebf162ibs-2.c => 
avx10_2-512-vcvttbf162ibs-2.c} (87%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvttnebf162iubs-2.c => 
avx10_2-512-vcvttbf162iubs-2.c} (87%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtnebf162ibs-2.c => 
avx10_2-vcvtbf162ibs-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtnebf162iubs-2.c => 
avx10_2-vcvtbf162iubs-2.c} (77%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvttnebf162ibs-2.c => 
avx10_2-vcvttbf162ibs-2.c} (77%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvttnebf162iubs-2.c => 
avx10_2-vcvttbf162iubs-2.c} (77%)

diff --git a/gcc/config/i386/avx10_2-512satcvtintrin.h 
b/gcc/config/i386/avx10_2-512satcvtintrin.h
index 902bb884a57..6e864a9a6f8 100644
--- a/gcc/config/i386/avx10_2-512satcvtintrin.h
+++ b/gcc/config/i386/avx10_2-512satcvtintrin.h
@@ -36,126 +36,125 @@
 
 extern __inline __m512i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_ipcvtnebf16_epi16 (__m512bh __A)
+_mm512_ipcvtbf16_epi16 (__m512bh __A)
 {
   return
-(__m512i) __builtin_ia32_cvtnebf162ibs512_mask ((__v32bf) __A,
-   (__v32hi)
-   _mm512_undefined_si512 (),
-   (__mmask32) -1);
+(__m512i) __builtin_ia32_cvtbf162ibs512_mask ((__v32bf) __A,
+ (__v32hi)

Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Lulu Cheng



On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:

On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:

On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
/* snip */

   ;; This code iterator allows unsigned and signed division to be generated
   ;; from the same template.
@@ -3083,39 +3084,6 @@ (define_expand "rotl3"
     }
     });
   
-;; The following templates were added to generate "bstrpick.d + alsl.d"

-;; instruction pairs.
-;; It is required that the values of const_immalsl_operand and
-;; immediate_operand must have the following correspondence:
-;;
-;; (immediate_operand >> const_immalsl_operand) == 0x
-
-(define_insn "zero_extend_ashift"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-      (match_operand 2 "const_immalsl_operand" ""))
-   (match_operand 3 "immediate_operand" "")))]
-  "TARGET_64BIT
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x)"
-  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
-  [(set_attr "type" "arith")
-   (set_attr "mode" "DI")
-   (set_attr "insn_count" "2")])
-
-(define_insn "bstrpick_alsl_paired"
-  [(set (match_operand:DI 0 "register_operand" "=&r")
-   (plus:DI
-     (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-    (match_operand 2 "const_immalsl_operand" ""))
-     (match_operand 3 "immediate_operand" ""))
-     (match_operand:DI 4 "register_operand" "r")))]
-  "TARGET_64BIT
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x)"
-  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
-  [(set_attr "type" "arith")
-   (set_attr "mode" "DI")
-   (set_attr "insn_count" "2")])
-

Hi,

On LoongArch, the microarchitecture performs instruction fusion on
bstrpick.d + alsl.d.

This modification may cause the two instructions not to be adjacent.

So I think these two templates cannot be deleted. I will test the impact
of this patch on SPEC today.

Oops.  I guess we can salvage it with TARGET_SCHED_MACRO_FUSION_P and
TARGET_SCHED_MACRO_FUSION_PAIR_P.  And I'd like to know more details:

1. Does the fusion apply to all bstrpick.d + alsl.d, or only to bstrpick.d
rd, rs, 31, 0?
2. Does the fusion also apply to bstrpick.d + slli.d, or do we really have
to write the strange "alsl.d rd, rs, r0, shamt" instruction?


Currently, instruction fusion can only be done in the following situation:

bstrpick.d rd, rs, 31, 0 + alsl.d rd1, rj, rk, shamt, with "rd == rj"
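
For reference, a minimal sketch of what such a TARGET_SCHED_MACRO_FUSION_PAIR_P
hook could look like (purely illustrative, not a proposed patch; the RTL shapes
checked and the 0xffffffff test are assumptions based on the deleted patterns
quoted above, and TARGET_SCHED_MACRO_FUSION_P would also have to return true):

  /* Return true if PREV (a 32-bit zero-extension, i.e. bstrpick.d
     rd,rs,31,0) and CURR (an alsl.d whose shifted input is rd) should
     be kept adjacent by the scheduler.  */
  static bool
  loongarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  {
    rtx prev_set = single_set (prev);
    rtx curr_set = single_set (curr);
    if (!prev_set || !curr_set)
      return false;

    /* PREV: (set rd (zero_extend:DI (subreg:SI rs)))
       or    (set rd (and:DI rs 0xffffffff)).  */
    rtx prev_dest = SET_DEST (prev_set);
    rtx prev_src = SET_SRC (prev_set);
    bool zext32_p
      = (GET_CODE (prev_src) == ZERO_EXTEND
	 || (GET_CODE (prev_src) == AND
	     && CONST_INT_P (XEXP (prev_src, 1))
	     && UINTVAL (XEXP (prev_src, 1)) == 0xffffffff));
    if (!REG_P (prev_dest) || !zext32_p)
      return false;

    /* CURR: (set rd1 (plus (ashift rj shamt) rk)) with rj == rd.  */
    rtx curr_src = SET_SRC (curr_set);
    if (GET_CODE (curr_src) != PLUS
	|| GET_CODE (XEXP (curr_src, 0)) != ASHIFT)
      return false;
    return rtx_equal_p (XEXP (XEXP (curr_src, 0), 0), prev_dest);
  }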



[PATCH 11/13] i386: Change mnemonics from VCVTNEPH2[B,H]F8 to VCVTPH2[B,H]F8

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

PR target/118270
* config/i386/avx10_2-512convertintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VCVTPH2BF8): Rename from UNSPEC_VCVTNEPH2BF8.
(UNSPEC_VCVTPH2BF8S): Rename from UNSPEC_VCVTNEPH2BF8S.
(UNSPEC_VCVTPH2HF8): Rename from UNSPEC_VCVTNEPH2HF8.
(UNSPEC_VCVTPH2HF8S): Rename from UNSPEC_VCVTNEPH2HF8S.
(UNSPEC_CONVERTPH2FP8): Rename from UNSPEC_NECONVERTPH2FP8.
Adjust UNSPEC name.
(convertph2fp8): Rename from neconvertph2fp8. Adjust
iterator map.
(vcvtv8hf): Rename to...
(vcvtv8hf): ...this.
(*vcvtv8hf): Rename to...
(*vcvtv8hf): ...this.
(vcvtv8hf_mask): Rename to...
(vcvtv8hf_mask): ...this.
(*vcvtv8hf_mask): Rename to...
(*vcvtv8hf_mask): ...this.
(vcvt): Rename to...
(vcvt): ...this.

gcc/testsuite/ChangeLog:

PR target/118270
* gcc.target/i386/avx10_2-512-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-512-vcvtph2hf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-convert-1.c: Adjust output
and intrin call.
* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2bf8s-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8-2.c: ...here.
Adjust intrin call.
* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Move to...
* gcc.target/i386/avx10_2-vcvtph2hf8s-2.c: ...here.
Adjust intrin call.
---
 gcc/config/i386/avx10_2-512convertintrin.h| 112 -
 gcc/config/i386/avx10_2convertintrin.h| 224 +-
 gcc/config/i386/i386-builtin.def  |  24 +-
 gcc/config/i386/sse.md|  50 ++--
 .../gcc.target/i386/avx10_2-512-convert-1.c   |  56 ++---
 ...eph2bf8-2.c => avx10_2-512-vcvtph2bf8-2.c} |   6 +-
 ...h2bf8s-2.c => avx10_2-512-vcvtph2bf8s-2.c} |   6 +-
 ...eph2hf8-2.c => avx10_2-512-vcvtph2hf8-2.c} |   6 +-
 ...h2hf8s-2.c => avx10_2-512-vcvtph2hf8s-2.c} |   6 +-
 .../gcc.target/i386/avx10_2-convert-1.c   | 104 
 ...cvtneph2bf8-2.c => avx10_2-vcvtph2bf8-2.c} |   4 +-
 ...vtneph2hf8-2.c => avx10_2-vcvtph2bf8s-2.c} |   4 +-
 ...vtneph2bf8s-2.c => avx10_2-vcvtph2hf8-2.c} |   4 +-
 ...tneph2hf8s-2.c => avx10_2-vcvtph2hf8s-2.c} |   4 +-
 14 files changed, 305 insertions(+), 305 deletions(-)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtneph2bf8-2.c => 
avx10_2-512-vcvtph2bf8-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtneph2bf8s-2.c => 
avx10_2-512-vcvtph2bf8s-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtneph2hf8-2.c => 
avx10_2-512-vcvtph2hf8-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-512-vcvtneph2hf8s-2.c => 
avx10_2-512-vcvtph2hf8s-2.c} (89%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtneph2bf8-2.c => 
avx10_2-vcvtph2bf8-2.c} (79%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtneph2hf8-2.c => 
avx10_2-vcvtph2bf8s-2.c} (79%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtneph2bf8s-2.c => 
avx10_2-vcvtph2hf8-2.c} (78%)
 rename gcc/testsuite/gcc.target/i386/{avx10_2-vcvtneph2hf8s-2.c => 
avx10_2-vcvtph2hf8s-2.c} (78%)

diff --git a/gcc/config/i386/avx10_2-512convertintrin.h 
b/gcc/config/i386/avx10_2-512convertintrin.h
index c753dd7a946..5c64b9f004b 100644
--- a/gcc/config/i386/avx10_2-512convertintrin.h
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -426,118 +426,118 @@ _mm512_maskz_cvthf8_ph (__mmask32 __U, __m256i __A)
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtneph_pbf8 (__m512h __A)
+_mm512_cvtph_bf8 (__m512h __A)
 {
-  return (__m256i) __builtin_ia32_vcvtneph2bf8512_mask ((__v32hf) __A,
-   (__v32qi) (__m256i)
-   _mm256_undefined_si256 
(),
-   (__mmas

[PATCH 13/13] i386: Omit "p" for packed in intrin name for FP8 convert

2025-01-21 Thread Haochen Jiang
gcc/ChangeLog:

* config/i386/avx10_2-512convertintrin.h:
Omit "p" for packed for FP8.
* config/i386/avx10_2convertintrin.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-512-convert-1.c: Adjust intrin call.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
---
 gcc/config/i386/avx10_2-512convertintrin.h| 38 +-
 gcc/config/i386/avx10_2convertintrin.h| 76 +--
 .../gcc.target/i386/avx10_2-512-convert-1.c   | 30 
 .../i386/avx10_2-512-vcvtbiasph2bf8-2.c   |  6 +-
 .../i386/avx10_2-512-vcvtbiasph2bf8s-2.c  |  6 +-
 .../i386/avx10_2-512-vcvtbiasph2hf8-2.c   |  6 +-
 .../i386/avx10_2-512-vcvtbiasph2hf8s-2.c  |  6 +-
 .../gcc.target/i386/avx10_2-convert-1.c   | 60 +++
 8 files changed, 114 insertions(+), 114 deletions(-)

diff --git a/gcc/config/i386/avx10_2-512convertintrin.h 
b/gcc/config/i386/avx10_2-512convertintrin.h
index 5c64b9f004b..1079e0a2bda 100644
--- a/gcc/config/i386/avx10_2-512convertintrin.h
+++ b/gcc/config/i386/avx10_2-512convertintrin.h
@@ -133,7 +133,7 @@ _mm512_maskz_cvtx_round2ps_ph (__mmask32 __U, __m512 __A,
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtbiasph_pbf8 (__m512i __A, __m512h __B)
+_mm512_cvtbiasph_bf8 (__m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A,
  (__v32hf) __B,
@@ -144,8 +144,8 @@ _mm512_cvtbiasph_pbf8 (__m512i __A, __m512h __B)
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtbiasph_pbf8 (__m256i __W, __mmask32 __U,
-   __m512i __A, __m512h __B)
+_mm512_mask_cvtbiasph_bf8 (__m256i __W, __mmask32 __U,
+  __m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A,
  (__v32hf) __B,
@@ -155,7 +155,7 @@ _mm512_mask_cvtbiasph_pbf8 (__m256i __W, __mmask32 __U,
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtbiasph_pbf8 (__mmask32 __U, __m512i __A, __m512h __B)
+_mm512_maskz_cvtbiasph_bf8 (__mmask32 __U, __m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2bf8512_mask ((__v64qi) __A,
  (__v32hf) __B,
@@ -166,7 +166,7 @@ _mm512_maskz_cvtbiasph_pbf8 (__mmask32 __U, __m512i __A, 
__m512h __B)
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtbiassph_pbf8 (__m512i __A, __m512h __B)
+_mm512_cvtbiassph_bf8 (__m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A,
   (__v32hf) __B,
@@ -177,8 +177,8 @@ _mm512_cvtbiassph_pbf8 (__m512i __A, __m512h __B)
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtbiassph_pbf8 (__m256i __W, __mmask32 __U,
-__m512i __A, __m512h __B)
+_mm512_mask_cvtbiassph_bf8 (__m256i __W, __mmask32 __U,
+   __m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A,
   (__v32hf) __B,
@@ -188,7 +188,7 @@ _mm512_mask_cvtbiassph_pbf8 (__m256i __W, __mmask32 __U,
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_maskz_cvtbiassph_pbf8 (__mmask32 __U, __m512i __A, __m512h __B)
+_mm512_maskz_cvtbiassph_bf8 (__mmask32 __U, __m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2bf8s512_mask ((__v64qi) __A,
   (__v32hf) __B,
@@ -199,7 +199,7 @@ _mm512_maskz_cvtbiassph_pbf8 (__mmask32 __U, __m512i __A, 
__m512h __B)
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_cvtbiasph_phf8 (__m512i __A, __m512h __B)
+_mm512_cvtbiasph_hf8 (__m512i __A, __m512h __B)
 {
   return (__m256i) __builtin_ia32_vcvtbiasph2hf8512_mask ((__v64qi) __A,
  (__v32hf) __B,
@@ -210,8 +210,8 @@ _mm512_cvtbiasph_phf8 (__m512i __A, __m512h __B)
 
 extern __inline__ __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_cvtbiasph_phf8 (__m256i __W, __mmask32 __U, __m512i __A,
-   __m512h __B)
+_mm512_mask_cvtbiasph_hf8 (__m256i __W, __m

Re: [PATCH] c++: 'this' capture clobbered during recursive inst [PR116756]

2025-01-21 Thread Jason Merrill

On 1/16/25 2:02 PM, Patrick Palka wrote:

On Mon, 13 Jan 2025, Jason Merrill wrote:


On 1/10/25 2:20 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

The documentation for LAMBDA_EXPR_THIS_CAPTURE seems outdated because
it says the field is only used at parse time, but apparently it's also
used at instantiation time.

Non-'this' captures don't seem to be affected, because there is no
corresponding LAMBDA_EXPR field that gets clobbered, and instead their
uses get resolved via the local specialization mechanism which is
recursion aware.

The bug also disappears if we explicitly use this in the openSeries call,
i.e. this->openSeries(...), because that sidesteps the use of
maybe_resolve_dummy / LAMBDA_EXPR_THIS_CAPTURE for resolving the
implicit object, and instead gets resolved via the local specialization
mechanism.

Maybe this suggests that there's a better way to fix this, but I'm not
sure...


That does sound like an interesting direction.  Maybe for a generic lambda,
LAMBDA_EXPR_THIS_CAPTURE could just refer to the captured parameter, and we
use retrieve_local_specialization to find the proxy?


Like so?  Tested on x86_64-pc-linux-gnu, full bootstrap+regtest in
progress.

-- >8 --

Subject: [PATCH v2] c++: 'this' capture clobbered during recursive inst
  [PR116756]

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


-- >8 --

Here during instantiation of generic lambda's op() [with I = 0] we
substitute into the call self(self, cst<1>{}) which requires recursive
instantiation of the same op() [with I = 1] (which isn't deferred due to
the lambda's deduced return type).  During this recursive instantiation, the
DECL_EXPR case of tsubst_stmt clobbers LAMBDA_EXPR_THIS_CAPTURE to point
to the child op()'s specialized capture proxy instead of the parent's,
and the original value is never restored.

So later when substituting into the openSeries call in the parent op()
maybe_resolve_dummy uses the 'this' proxy belonging to the child op(),
which leads to a context mismatch ICE during gimplification of the
proxy.

An earlier version of this patch fixed this by making instantiate_body
save/restore LAMBDA_EXPR_THIS_CAPTURE during a lambda op() instantiation.
But it seems cleaner to avoid overwriting LAMBDA_EXPR_THIS_CAPTURE in the
first place by making it point to the non-specialized capture proxy, and
instead call retrieve_local_specialization as needed, which is what this
patch implements.  It's simpler then to not clear LAMBDA_EXPR_THIS_CAPTURE
after parsing/regenerating a lambda.

PR c++/116756

gcc/cp/ChangeLog:

* lambda.cc (lambda_expr_this_capture): Call
retrieve_local_specialization on the result of
LAMBDA_EXPR_THIS_CAPTURE for a generic lambda.
* parser.cc (cp_parser_lambda_expression): Don't clear
LAMBDA_EXPR_THIS_CAPTURE.
* pt.cc (tsubst_stmt) : Don't overwrite
LAMBDA_EXPR_THIS_CAPTURE.
(tsubst_lambda_expr): Don't clear LAMBDA_EXPR_THIS_CAPTURE
afterward.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if-lambda7.C: New test.
---
  gcc/cp/lambda.cc  |  6 +
  gcc/cp/parser.cc  |  3 ---
  gcc/cp/pt.cc  | 11 +
  .../g++.dg/cpp1z/constexpr-if-lambda7.C   | 24 +++
  4 files changed, 31 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda7.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index be8a0fe01cb..4ee8f6c745d 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -785,6 +785,12 @@ lambda_expr_this_capture (tree lambda, int add_capture_p)
tree result;
  
tree this_capture = LAMBDA_EXPR_THIS_CAPTURE (lambda);

+  if (this_capture)
+if (tree spec = retrieve_local_specialization (this_capture))
+  {
+   gcc_checking_assert (generic_lambda_fn_p (lambda_function (lambda)));
+   this_capture = spec;
+  }
  
/* In unevaluated context this isn't an odr-use, so don't capture.  */

if (cp_unevaluated_operand)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 74f4f7cd6d8..16bbb87a815 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11723,9 +11723,6 @@ cp_parser_lambda_expression (cp_parser* parser)
  parser->omp_array_section_p = saved_omp_array_section_p;
}
  
-  /* This field is only used during parsing of the lambda.  */

-  LAMBDA_EXPR_THIS_CAPTURE (lambda_expr) = NULL_TREE;
-
/* This lambda shouldn't have any proxies left at this point.  */
gcc_assert (LAMBDA_EXPR_PENDING_PROXIES (lambda_expr) == NULL);
/* And now that we're done, push proxies for an enclosing lambda.  */
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 961696f333e..64c7d3da405 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -18938,12 +18938,6 @@ tsubst_stmt (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   

Re: [PATCH 1/2] c++: Don't call fold from cp_fold if one of the operands is an error_mark [PR118525]

2025-01-21 Thread Jason Merrill

On 1/16/25 7:10 PM, Andrew Pinski wrote:

While adding a new match pattern, g++.dg/cpp2a/consteval36.C started to ICE,
because we would call fold even if one of the operands of the comparison was
an error_mark_node.  I found a new testcase which also ICEs before this patch,
which shows the issue was latent.

There is code in cp_fold to avoid calling fold when one of the operands
becomes error_mark_node, but with the addition of consteval, an invalid call
is replaced before the call to cp_fold and there is no way for the error_mark
to propagate up.  So this patch changes the current code to check whether the
operands of the expression are error_mark_node before checking whether the
folded operand is different from the previous one.


Hmm, I'm surprised the fold functions don't return error_mark_node for 
error operand.  But the patch is OK, thanks.



Bootstrapped and tested on x86_64-linux-gnu.

PR c++/118525

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold): Check operands of unary, binary,
cond/vec_cond and array_ref for error_mark before checking if the
operands had changed.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval38.C: New test.

Signed-off-by: Andrew Pinski 
---
  gcc/cp/cp-gimplify.cc| 99 ++--
  gcc/testsuite/g++.dg/cpp2a/consteval38.C | 11 +++
  2 files changed, 53 insertions(+), 57 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval38.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index c7074b00cef..4ec3de13008 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -3005,19 +3005,16 @@ cp_fold (tree x, fold_flags_t flags)
loc = EXPR_LOCATION (x);
op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops, flags);
  
-  if (code == CONVERT_EXPR

+  if (op0 == error_mark_node)
+   x = error_mark_node;
+  else if (code == CONVERT_EXPR
  && SCALAR_TYPE_P (TREE_TYPE (x))
  && op0 != void_node)
/* During parsing we used convert_to_*_nofold; re-convert now using the
   folding variants, since fold() doesn't do those transformations.  */
x = fold (convert (TREE_TYPE (x), op0));
else if (op0 != TREE_OPERAND (x, 0))
-   {
- if (op0 == error_mark_node)
-   x = error_mark_node;
- else
-   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
-   }
+   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
else
x = fold (x);
  
@@ -3087,20 +3084,17 @@ cp_fold (tree x, fold_flags_t flags)

op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops, flags);
  
  finish_unary:

-  if (op0 != TREE_OPERAND (x, 0))
+  if (op0 == error_mark_node)
+   x = error_mark_node;
+  else if (op0 != TREE_OPERAND (x, 0))
{
- if (op0 == error_mark_node)
-   x = error_mark_node;
- else
+ x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
+ if (code == INDIRECT_REF
+ && (INDIRECT_REF_P (x) || TREE_CODE (x) == MEM_REF))
{
- x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
- if (code == INDIRECT_REF
- && (INDIRECT_REF_P (x) || TREE_CODE (x) == MEM_REF))
-   {
- TREE_READONLY (x) = TREE_READONLY (org_x);
- TREE_SIDE_EFFECTS (x) = TREE_SIDE_EFFECTS (org_x);
- TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
-   }
+ TREE_READONLY (x) = TREE_READONLY (org_x);
+ TREE_SIDE_EFFECTS (x) = TREE_SIDE_EFFECTS (org_x);
+ TREE_THIS_VOLATILE (x) = TREE_THIS_VOLATILE (org_x);
}
}
else
@@ -3190,13 +3184,10 @@ cp_fold (tree x, fold_flags_t flags)
op0, op1);
}
  
-  if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1))

-   {
- if (op0 == error_mark_node || op1 == error_mark_node)
-   x = error_mark_node;
- else
-   x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
-   }
+  if (op0 == error_mark_node || op1 == error_mark_node)
+   x = error_mark_node;
+  else if (op0 != TREE_OPERAND (x, 0) || op1 != TREE_OPERAND (x, 1))
+   x = fold_build2_loc (loc, code, TREE_TYPE (x), op0, op1);
else
x = fold (x);
  
@@ -3268,17 +3259,14 @@ cp_fold (tree x, fold_flags_t flags)

}
}
  
-  if (op0 != TREE_OPERAND (x, 0)

- || op1 != TREE_OPERAND (x, 1)
- || op2 != TREE_OPERAND (x, 2))
-   {
- if (op0 == error_mark_node
- || op1 == error_mark_node
- || op2 == error_mark_node)
-   x = error_mark_node;
- else
-   x = fold_build3_loc (loc, code, TREE_TYPE (x), op0, op1, op2);
-   }
+  if (op0 == error_mark_node
+ || op1 == error_mark_node
+ || op2 == error_ma

Re: [PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 04:39:58PM -0500, Jason Merrill wrote:
> On 1/21/25 11:15 AM, Jakub Jelinek wrote:
> > On Tue, Jan 21, 2025 at 11:06:35AM -0500, Jason Merrill wrote:
> > > > --- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
> > > > +++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
> > > > @@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
> > > >  return ret;
> > > >}
> > > > -/* Get a new tree vector of the values of a CONSTRUCTOR.  */
> > > > +/* Append to a tree vector the values of a CONSTRUCTOR.
> > > > +   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
> > > > +   should be initialized with make_tree_vector (); followed by
> > > > +   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
> > > > +   optionally followed by pushes of other elements (up to
> > > > +   nelts - CONSTRUCTOR_NELTS (ctor)).  */
> > > 
> > > How about using v->allocated () instead of passing in nelts?
> > 
> > That is not necessarily the same.
> 
> Yeah, it occurred to me later that it doesn't matter what the original
> length or capacity of the vector is, we want to make sure there's enough
> room for the elements of the ctor after whatever's already there.  So we
> want nelts to start as CONSTRUCTOR_NELTS, and then vec_safe_reserve nelts -
> i.

That wouldn't work for the appending case, the vector already can have some
extra elements in it.
But actually starting at
  unsigned nelts = vec_safe_length (v) + CONSTRUCTOR_NELTS (ctor);
is I think exactly what we want.  And I think we should keep the
vec_safe_reserve or vec_alloc in the callers unless there is a RAW_DATA_CST,
CONSTRUCTORs without those will still be the vast majority of cases and for
GC vectors reservation when there aren't enough allocated elements means new
allocation and GC of the old.

In the make_tree_vector_from_ctor case that will be 0 + CONSTRUCTOR_NELTS 
(ctor);
and caller reserving (non-exact) that amount, while in the case of
add_list_candidates it starts from what we have already in the vector (i.e.
the nart extra first args) and then CONSTRUCTOR_NELTS (ctor) again.
And vec_safe_length (v) rather than v->length (), because in the
add_list_candidates case for nart == 0 && CONSTRUCTOR_NELTS (ctor) == 0
vec_alloc (new_args, 0);
just sets new_args = NULL (in the make_tree_vector_from_ctor case
make_tree_vector () actually returns some vector with allocated () in
[4,16]).

So like the patch below (again, just quickly tested on a few tests so far)?

2025-01-21  Jakub Jelinek  

gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.

--- gcc/c-family/c-common.h.jj  2025-01-17 11:29:33.139696380 +0100
+++ gcc/c-family/c-common.h 2025-01-21 09:30:09.520947570 +0100
@@ -1190,6 +1190,8 @@ extern vec *make_tree_vecto
 extern void release_tree_vector (vec *);
 extern vec *make_tree_vector_single (tree);
 extern vec *make_tree_vector_from_list (tree);
+extern vec *append_ctor_to_tree_vector (vec *,
+tree);
 extern vec *make_tree_vector_from_ctor (tree);
 extern vec *make_tree_vector_copy (const vec *);
 
--- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
+++ gcc/c-family/c-common.cc2025-01-21 22:47:30.644455174 +0100
@@ -9010,33 +9010,45 @@ make_tree_vector_from_list (tree list)
   return ret;
 }
 
-/* Get a new tree vector of the values of a CONSTRUCTOR.  */
+/* Append to a tree vector the values of a CONSTRUCTOR.
+   v should be initialized with make_tree_vector (); followed by
+   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
+   optionally followed by pushes of other elements (up to
+   nelts - CONSTRUCTOR_NELTS (ctor)).  */
 
 vec *
-make_tree_vector_from_ctor (tree ctor)
+append_ctor_to_tree_vector (vec *v, tree ctor)
 {
-  vec *ret = make_tree_vector ();
-  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
-  vec_safe_reserve (ret, CONSTRUCTOR_NELTS (ctor));
+  unsigned nelts = vec_safe_length (v) + CONSTRUCTOR_NELTS (ctor);
   for (unsigned i = 0; i < CONSTRUCTOR_NELTS (ctor); ++i)
 if (TREE_CODE (CONSTRUCTOR_ELT (ctor, i)->value) == RAW_DATA_CST)
   {
tree raw_data = CONSTRUCTOR_ELT (ctor, i)->value;
nelts += RAW_DATA_LENGTH (raw_data) - 1;
-   vec_safe_reserve (ret, nelts - ret->length ());
+   vec_safe_reserve (v, nelts - v->length ());
if (TYPE_PRECISION (TREE_TYPE (raw_data)) > CHAR_BIT
|| TYPE_UNSIGNED (TREE_TYPE (raw_data)))
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-   RAW_DATA_UCHAR_ELT (raw_data, j)));
+   v->quick_push (build_int_cst (TREE_TYPE (raw_data),
+   

Re: [PATCH] c++/modules: Fix linkage checks for exported using-decls

2025-01-21 Thread Jason Merrill

On 1/15/25 7:36 PM, yxj-github-437 wrote:

On Fri, Jan 03, 2025 at 05:18:55PM +, xxx wrote:

From: yxj-github-437 <2457369...@qq.com>

This patch attempts to fix an error when building module std. The reason for
the error is that __builtin_va_list (aka struct __va_list) has internal
linkage, so it attempts to handle this builtin type by identifying whether
DECL_SOURCE_LOCATION (entity) is BUILTINS_LOCATION.



Hi, thanks for the patch!  I suspect this may not be sufficient to
completely avoid issues with the __gnuc_va_list type; in particular, if
it's internal linkage that may prevent it from being referred to in
other ways by inline functions in named modules (due to P1815).

Maybe a better approach would be to instead mark this builtin type as
TREE_PUBLIC (presumably in aarch64_build_builtin_va_list)?


Thanks, I change my patch to mark TREE_PUBLIC.


Looks good to me if the ARM maintainers don't object.

This patch is small enough not to worry about copyright, but
"yxj-github-437 <2457369...@qq.com>" seems like a placeholder name, what 
name would you like the commit to use?



-- >8 --

This patch attempts to fix an error when building module std. The reason for
the error is that __builtin_va_list (aka struct __va_list) has internal
linkage, so mark this builtin type as TREE_PUBLIC so that struct __va_list
has external linkage.

/x/gcc-15.0.0/usr/bin/aarch64-linux-android-g++ -fmodules -std=c++23 -fPIC -O2
-fsearch-include-path bits/std.cc -c
/x/gcc-15.0.0/usr/lib/gcc/aarch64-linux-android/15.0.0/include/c++/bits/std.cc:3642:14:
error: exporting ‘typedef __gnuc_va_list va_list’ that does not have external 
linkage
  3642 |   using std::va_list;
   |  ^~~
: note: ‘struct __va_list’ declared here with internal linkage

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): Mark
__builtin_va_list as TREE_PUBLIC.
* config/arm/arm.cc (arm_build_builtin_va_list): Mark __builtin_va_list
as TREE_PUBLIC.

gcc/testsuite/ChangeLog:

* g++.dg/modules/builtin-8.C: New test.
---
  gcc/config/aarch64/aarch64.cc| 1 +
  gcc/config/arm/arm.cc| 1 +
  gcc/testsuite/g++.dg/modules/builtin-8.C | 9 +
  3 files changed, 11 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/builtin-8.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ad31e9d255c..e022526e573 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -21566,6 +21566,7 @@ aarch64_build_builtin_va_list (void)
 get_identifier ("__va_list"),
 va_list_type);
DECL_ARTIFICIAL (va_list_name) = 1;
+  TREE_PUBLIC (va_list_name) = 1;
TYPE_NAME (va_list_type) = va_list_name;
TYPE_STUB_DECL (va_list_type) = va_list_name;
  
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc

index 1e0791dc8c2..86838ebde5f 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -2906,6 +2906,7 @@ arm_build_builtin_va_list (void)
 get_identifier ("__va_list"),
 va_list_type);
DECL_ARTIFICIAL (va_list_name) = 1;
+  TREE_PUBLIC (va_list_name) = 1;
TYPE_NAME (va_list_type) = va_list_name;
TYPE_STUB_DECL (va_list_type) = va_list_name;
/* Create the __ap field.  */
diff --git a/gcc/testsuite/g++.dg/modules/builtin-8.C 
b/gcc/testsuite/g++.dg/modules/builtin-8.C
new file mode 100644
index 000..ff91104e4a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/builtin-8.C
@@ -0,0 +1,9 @@
+// { dg-additional-options -fmodules-ts }
+module;
+#include 
+export module builtins;
+// { dg-module-cmi builtins }
+
+export {
+  using ::va_list;
+}




Re: [PATCH] c++: bogus error with nested lambdas [PR117602]

2025-01-21 Thread Jason Merrill

On 1/16/25 5:42 PM, Marek Polacek wrote:

On Wed, Jan 15, 2025 at 04:18:36PM -0500, Jason Merrill wrote:

On 1/15/25 12:55 PM, Marek Polacek wrote:

On Wed, Jan 15, 2025 at 09:39:41AM -0500, Jason Merrill wrote:

On 11/15/24 9:08 AM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
The check for this error should also verify that we aren't nested in another
lambda; inside a nested lambda, at_function_scope_p() will be false.

PR c++/117602

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_introducer): Check if we're in a lambda
before emitting the error about a non-local lambda with
a capture-default.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval19.C: New test.
---
gcc/cp/parser.cc |  5 -
gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C | 14 ++
2 files changed, 18 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval19.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 07b12224615..dc79ff42a3b 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11611,7 +11611,10 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
  cp_lexer_consume_token (parser->lexer);
  first = false;
-  if (!(at_function_scope_p () || parsing_nsdmi ()))
+  if (!(at_function_scope_p ()
+   || parsing_nsdmi ()
+   || (current_class_type
+   && LAMBDA_TYPE_P (current_class_type


How about using current_nonlambda_scope () instead of at_function_scope_p
()?


I think I remember not using that because current_nonlambda_scope() will
give us a namespace_decl :: for non-local stuff so it won't be null.  Do
you still prefer that (checking the result of current_nonlambda_scope())
to what I did in my patch?


I think so, your change looks to be true for lambdas outside function scope
as well.


I think it works correctly for both

   auto x = [&]() { // error
   [&]() { };
   };
   auto x2 = []() {
   [&]() { };
   };

but current_nonlambda_scope () will return '::' for the nested lambdas too.
Am I missing something?


Ah, good point.  But it doesn't work correctly for an adjustment to the 
PR testcase; with your patch the following is wrongly accepted:


auto x = [](decltype([&]{})){};

Perhaps current_scope should look through closure types, so 
at_function_scope_p gives the right answer?
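
Purely as an illustration of that direction (an untested sketch, not a patch;
the use of TYPE_CONTEXT to step out of a closure type is an assumption):

  /* A current_scope variant that skips closure types, so that
     at_function_scope_p would report the enclosing function rather than
     the lambda's closure type.  */
  static tree
  current_scope_ignoring_closures (void)
  {
    tree scope = current_scope ();
    while (scope && TYPE_P (scope) && LAMBDA_TYPE_P (scope))
      scope = TYPE_CONTEXT (scope);
    return scope;
  }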


Jason



Re: [PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 05:21:52PM -0500, Jason Merrill wrote:
> > +   v should be initialized with make_tree_vector (); followed by
> > +   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
> > +   optionally followed by pushes of other elements (up to
> > +   nelts - CONSTRUCTOR_NELTS (ctor)).  */
> >   vec *
> > -make_tree_vector_from_ctor (tree ctor)
> > +append_ctor_to_tree_vector (vec *v, tree ctor)
> >   {
> > -  vec *ret = make_tree_vector ();
> > -  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
> > -  vec_safe_reserve (ret, CONSTRUCTOR_NELTS (ctor));
> 
> I think we can/should still have
> 
> vec_safe_reserve (v, CONSTRUCTOR_NELTS (ctor));
> 
> here, to place fewer requirements on callers; if it's redundant it will just
> return.

Ok, will add that and test.

Jakub



[PATCH] PR tree-optimization/95801 - infer non-zero for integral division RHS.

2025-01-21 Thread Andrew MacLeod
This patch simply adds an op2_range to operator_div which returns
non-zero if the LHS is not undefined.  This means that, given an integral
division:


       x = y / z

'z' will have a range of   [-INF, -1] [1, +INF]  after execution of the 
statement.


This is relatively straightforward and resolves the PR, but I also get 
that we might not want to proliferate an inferred range of undefined 
behavior at this late stage.
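
As a small illustration of what the inferred range enables (and of the
concern about relying on undefined behavior), consider the following example,
which is not taken from the PR:

  int
  f (int y, int z)
  {
    int x = y / z;	/* If this executes, z cannot be zero ...  */
    if (z == 0)		/* ... so this test can be folded to false.  */
      return -1;
    return x;
  }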


OK for trunk, or defer to stage 1?  Are there any flags that need to be 
checked to make this valid?


Bootstrapped on x86_64-pc-linux-gnu with no regressions.

Andrew
From 83260dd7c035a2317a6a5083d70288c3fdaf6ab4 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 21 Jan 2025 11:49:12 -0500
Subject: [PATCH] infer non-zero for integral division RHS.

Adding op2_range for operator_div allows ranger to notice the divisor
is non-zero after execution.

	PR tree-optimization/95801
	gcc/
	* range-op.cc (operator_div::op2_range): New.

	gcc/testsuite/
	* gcc.dg/tree-ssa/pr95801.c: New.
---
 gcc/range-op.cc | 16 
 gcc/testsuite/gcc.dg/tree-ssa/pr95801.c | 13 +
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr95801.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 6310ce27f03..e6aeefd436f 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2408,8 +2408,11 @@ operator_widen_mult_unsigned::wi_fold (irange &r, tree type,
 class operator_div : public cross_product_operator
 {
   using range_operator::update_bitmask;
+  using range_operator::op2_range;
 public:
   operator_div (tree_code div_kind) { m_code = div_kind; }
+  bool op2_range (irange &r, tree type, const irange &lhs, const irange &,
+		  relation_trio) const;
   virtual void wi_fold (irange &r, tree type,
 		const wide_int &lh_lb,
 		const wide_int &lh_ub,
@@ -2429,6 +2432,19 @@ static operator_div op_floor_div (FLOOR_DIV_EXPR);
 static operator_div op_round_div (ROUND_DIV_EXPR);
 static operator_div op_ceil_div (CEIL_DIV_EXPR);
 
+// Set OP2 to non-zero if the LHS isn't UNDEFINED.
+bool
+operator_div::op2_range (irange &r, tree type, const irange &lhs,
+			 const irange &, relation_trio) const
+{
+  if (!lhs.undefined_p ())
+{
+  r.set_nonzero (type);
+  return true;
+}
+  return false;
+}
+
 bool
 operator_div::wi_op_overflows (wide_int &res, tree type,
 			   const wide_int &w0, const wide_int &w1) const
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95801.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95801.c
new file mode 100644
index 000..c3c80a045cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95801.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-options "-O2 -fdump-tree-evrp" }
+
+int always1(int a, int b) {
+if (a / b)
+return b != 0;
+return 1;
+}
+
+// If b != 0 is optimized by recognizing divide by 0 cannot happen,
+// there should be no PHI node.
+
+// { dg-final { scan-tree-dump-not "PHI" "evrp" } }
-- 
2.45.0



[committed] testsuite: Add testcase for already fixed PR [PR118560]

2025-01-21 Thread Jakub Jelinek
On Mon, Jan 20, 2025 at 05:15:51PM -0500, Vladimir Makarov wrote:
> The patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118560

The fix for this PR has been committed without a testcase.
The following testcase would take at least 15 minutes to compile
on a fast machine (powerpc64-linux both -m32 or -m64), now it takes
100ms.

Committed as obvious to trunk.

2025-01-21  Jakub Jelinek  

PR target/118560
* gcc.dg/dfp/pr118560.c: New test.

--- gcc/testsuite/gcc.dg/dfp/pr118560.c.jj  2025-01-21 14:32:12.059466859 
+0100
+++ gcc/testsuite/gcc.dg/dfp/pr118560.c 2025-01-21 14:41:19.919866909 +0100
@@ -0,0 +1,17 @@
+/* PR target/118560 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+struct { _Decimal32 a; } b;
+void foo (int, _Decimal32);
+
+#define B(n) \
+void   \
+bar##n (int, _Decimal32 d) \
+{  \
+  foo (n, 1);  \
+  b.a = d; \
+}
+
+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) 
B(n##8) B(n##9)
+C(1) C(2) C(3) C(4) C(5)


Jakub



Re: [PATCH v3] c++: fix wrong-code with constexpr prvalue opt [PR118396]

2025-01-21 Thread Jason Merrill

On 1/21/25 9:54 AM, Jason Merrill wrote:

On 1/20/25 5:58 PM, Marek Polacek wrote:

On Mon, Jan 20, 2025 at 12:39:03PM -0500, Jason Merrill wrote:

On 1/20/25 12:27 PM, Marek Polacek wrote:

On Mon, Jan 20, 2025 at 11:46:44AM -0500, Jason Merrill wrote:

On 1/20/25 10:27 AM, Marek Polacek wrote:

On Fri, Jan 17, 2025 at 06:38:45PM -0500, Jason Merrill wrote:

On 1/17/25 1:31 PM, Marek Polacek wrote:

On Fri, Jan 17, 2025 at 08:10:24AM -0500, Jason Merrill wrote:

On 1/16/25 8:04 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
The recent r15-6369 unfortunately caused a bad wrong-code issue.
Here we have

   TARGET_EXPR {.status=0, .data={._vptr.Foo=&_ZTV3Foo + 16}})>


and call cp_fold_r -> maybe_constant_init with object=D.2996.  In
cxx_eval_outermost_constant_expr we now take the type of the 
object
if present.  An object can't have type 'void' and so we 
continue to
evaluate the initializer.  That evaluates into a VOID_CST, 
meaning

we disregard the whole initializer, and terrible things ensue.


In that case, I'd think we want to use the value of 
'object' (which should

be in ctx.ctor?) instead of the return value of
cxx_eval_constant_expression.


Ah, I'm sorry I didn't choose that approach.  Maybe like this, 
then?


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.  Maybe also add an assert that TREE_TYPE (r) is close enough 
to type?


Thanks.  dg.exp passed with this extra assert:

@@ -8986,7 +8986,11 @@ cxx_eval_outermost_constant_expr (tree t, 
bool allow_non_constant,
  /* If we got a non-simple TARGET_EXPR, the initializer was a 
sequence
 of statements, and the result ought to be stored in 
ctx.ctor.  */

  if (r == void_node && !constexpr_dtor && ctx.ctor)
-    r = ctx.ctor;
+    {
+  r = ctx.ctor;
+  gcc_checking_assert (same_type_ignoring_top_level_qualifiers_p
+  (TREE_TYPE (r), type));
+    }


I was thinking to add that assert in general, not just in this 
case, to

catch any other instances of trying to return the wrong type.


Unfortunately this
+  /* Check we are not trying to return the wrong type.  */
+  gcc_checking_assert (same_type_ignoring_top_level_qualifiers_p
+  (initialized_type (r), type)


Why not just TREE_TYPE (r)?


Adjusted to use TREE_TYPE now.

+  || error_operand_p (type));
breaks too much, e.g. constexpr-prvalue2.C with struct A x struct B,
or pr82128.C
*(((struct C *) a)->D.2903._vptr.A + 8)
x
int (*) ()

I've also tried can_convert, or similar_type_p but no luck.  Any 
thoughts?


Those both sound like the sort of bugs the assert is intended to 
catch. But

I suppose we can't add it without fixing them first.

In the latter case, probably by adding an explicit conversion from 
the vtbl

slot type to the desired function pointer type.

In the former case, I don't see a constant-expression, so we 
shouldn't be

trying to check the type of a nonexistent constant result?


As discussed earlier, this patch just returns the original expression if
the types don't match:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks!


-- >8 --
The recent r15-6369 unfortunately caused a bad wrong-code issue.
Here we have

   TARGET_EXPR {.status=0, .data={._vptr.Foo=&_ZTV3Foo + 16}})>


and call cp_fold_r -> maybe_constant_init with object=D.2996.  In
cxx_eval_outermost_constant_expr we now take the type of the object
if present.  An object can't have type 'void' and so we continue to
evaluate the initializer.  That evaluates into a VOID_CST, meaning
we disregard the whole initializer, and terrible things ensue.

For non-simple TARGET_EXPRs, we should return ctx.ctor rather than
the result of cxx_eval_constant_expression.

PR c++/118396
PR c++/118523

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): For non-simple
TARGET_EXPRs, return ctx.ctor rather than the result of
cxx_eval_constant_expression.  If TYPE and the type of R don't
match, return the original expression.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-prvalue4.C: New test.
* g++.dg/cpp1y/constexpr-prvalue3.C: New test.

Reviewed-by: Jason Merrill 
---
  gcc/cp/constexpr.cc   |  9 +++-
  .../g++.dg/cpp0x/constexpr-prvalue4.C | 33 ++
  .../g++.dg/cpp1y/constexpr-prvalue3.C | 45 +++
  3 files changed, 86 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue4.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue3.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 7ff38f8b5e5..9f950ffed74 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8983,6 +8983,11 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,

    r = cxx_eval_constant_expression (&ctx, r, vc_prvalue,
  &non_constant_p, &overflow_p);
+  /* If we got a non-simple TARG

Re: [PATCH v2 02/12] libgomp, AArch64: Add test cases for SVE types in OpenMP shared clause.

2025-01-21 Thread Jakub Jelinek
On Fri, Oct 18, 2024 at 11:52:23AM +0530, Tejas Belagod wrote:
> This patch adds a test scaffold for OpenMP compile tests under the
> gcc.target testsuite.  It also adds a target tests directory libgomp.target
> along with an SVE execution test.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/omp/gomp.exp: New scaffold.

s/scaffold/test driver/ ?
Also, my slight preference would be gomp subdirectory rather than omp,
consistency is nice.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.target/aarch64/aarch64.exp: New scaffold.

Likewise.
Plus I wonder about the libgomp.target name.
In gcc/testsuite/ we have gcc.target, g++.target and gfortran.target
subdirectories so it is clear which languages they handle, but
libgomp.target could mean anything.  So, wouldn't libgomp.target.c
or libgomp.c-target be better directory name?
The latter to match e.g. libgomp.oacc-{c,c++,fortran}.

>   * testsuite/libgomp.target/aarch64/shared.c: New test.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/gomp.exp
> @@ -0,0 +1,46 @@
> +# Copyright (C) 2006-2024 Free Software Foundation, Inc.

s/2024/2025/ before committing anything, otherwise copyright bumping
won't handle it next year either.
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.target/aarch64/aarch64.exp
> @@ -0,0 +1,57 @@
> +# Copyright (C) 2006-2024 Free Software Foundation, Inc.

Ditto.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.target/aarch64/shared.c
> @@ -0,0 +1,186 @@
> +/* { dg-do run { target aarch64_sve256_hw } } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2 
> -fdump-tree-ompexp" } */

Is -std=gnu99 needed (now that gcc defaults to -std=gnu23)?
I guess most of -std=gnu99 is from the time when C99 wasn't the
default.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +svint32_t
> +__attribute__ ((noinline))
> +explicit_shared (svint32_t a, svint32_t b, svbool_t p)
> +{
> +
> +#pragma omp parallel shared (a, b, p) num_threads (1)
> +  {
> +/* 'a', 'b' and 'p' are explicitly shared.  */
> +a = svadd_s32_z (p, a, b);
> +  }

With the num_threads (1) it isn't a good example, then
the parallel is pretty much useless.
Would be better to test without that, doesn't have to be tons of threads,
but at least 2-4.
With num_threads (2) it is then racy though, stores the same a
in all threads.
Can one have arrays of svint32_t?  If not, perhaps
  svint32_t c;
#pragma omp parallel shared (a, b, c, p) num_threads (2)
#pragma omp sections
  {
/* 'a', 'b', 'c' and 'p' are explicitly shared.  */
a = svadd_s32_z (p, a, b);
  #pragma omp section
c = svadd_s32_z (p, a, b);
  }

#pragma omp parallel shared (a, b, c, p) num_threads (2)
#pragma omp sections
  {
a = svadd_s32_z (p, a, b);
  #pragma omp section
c = svadd_s32_z (p, c, b);
  }

  compare_vec (a, c);
  return a;
?

> +svint32_t
> +__attribute__ ((noinline))
> +implicit_shared_default (svint32_t a, svint32_t b, svbool_t p)
> +{
> +
> +#pragma omp parallel default (shared) num_threads (1)
> +  {
> +/* 'a', 'b' and 'p' are implicitly shared.  */
> +a = svadd_s32_z (p, a, b);

Again, bad example, works only with num_threads (1), otherwise it is racy.
> +svint32_t
> +__attribute__ ((noinline))
> +mix_shared (svint32_t b, svbool_t p)
> +{
> +
> +  svint32_t a;
> +  int32_t *m = (int32_t *)malloc (8 * sizeof (int32_t));

Formatting, missing space before malloc

> +  int i;
> +
> +#pragma omp parallel for
> +  for (i = 0; i < 8; i++)
> +m[i] = i;
> +
> +#pragma omp parallel
> +  {
> +/* 'm' is predetermined shared here.  'a' is implicitly shared here.  */
> +a = svld1_s32 (svptrue_b32 (), m);

This is racy.
Either different threads need to write to different shared variables
(or, if arrays of vectors work, to different elements of an array), or
it can be guarded with, say, #pragma omp masked (so that only a specific
thread does the store), or use just a low number of threads and write
to one variable or another depending on omp_get_thread_num ().
Just note that you could get fewer threads than you asked for.
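
For illustration, a minimal non-racy variant along the last suggestion (a
sketch only, not from the original review; it assumes the test's fixed-length
-msve-vector-bits=256 setup, omp.h included, and the existing array m):

  svint32_t a, c;
#pragma omp parallel shared (a, c) num_threads (2)
  {
    /* Each thread writes a different shared vector.  */
    if (omp_get_thread_num () == 0)
      a = svld1_s32 (svptrue_b32 (), m);
    else
      c = svld1_s32 (svptrue_b32 (), m);
  }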

> +#pragma omp parallel num_threads (1)
> +  {
> +/* 'a', 'b' and 'p' are implicitly shared here.  */
> +a = svadd_s32_z (p, a, b);
> +  }
> +
> +#pragma omp parallel shared (a, b, p) num_threads (1)
> +  {
> +/* 'a', 'b' and 'p' are explicitly shared here.  */
> +a = svadd_s32_z (p, a, b);
> +  }

These aren't racy during num_threads (1), but because of that
not really good examples on how shared works.

> +  int32_t *m = (int32_t *)malloc (8 * sizeof (int32_t));

See above.

> +  int i;
> +
> +#pragma omp parallel for
> +  /* 'm' is predetermined shared here.  */
> +  for (i = 0; i < 8; i++)
> +  {
> +m[i] = i;
> +  }

No need for the {}s around the body.
> +
> +#pragma omp parallel
> +  {
> +/* 'a' is predetermined shared here.  */
> +static int64_t n;
> +svint32_t a;
> +#pragma omp parallel
> +{
> +  /* 'n' is predetermined shared here.  */
> +  if (x)
> + 

[PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jakub Jelinek
On Mon, Jan 20, 2025 at 05:14:33PM -0500, Jason Merrill wrote:
> > --- gcc/cp/call.cc.jj   2025-01-15 18:24:36.135503866 +0100
> > +++ gcc/cp/call.cc  2025-01-17 14:42:38.201643385 +0100
> > @@ -4258,11 +4258,30 @@ add_list_candidates (tree fns, tree firs
> > /* Expand the CONSTRUCTOR into a new argument vec.  */
> 
> Maybe we could factor out a function called something like
> append_ctor_to_tree_vector from the common code between this and
> make_tree_vector_from_ctor?
> 
> But this is OK as is if you don't want to pursue that.

I had the previous patch already tested and wanted to avoid delaying
the large initializer speedup re-reversion any further, so I've committed
the patch as is.

Here is an incremental patch to factor that out.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-21  Jakub Jelinek  

gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.

--- gcc/c-family/c-common.h.jj  2025-01-17 11:29:33.139696380 +0100
+++ gcc/c-family/c-common.h 2025-01-21 09:30:09.520947570 +0100
@@ -1190,6 +1190,8 @@ extern vec *make_tree_vecto
 extern void release_tree_vector (vec *);
 extern vec *make_tree_vector_single (tree);
 extern vec *make_tree_vector_from_list (tree);
+extern vec *append_ctor_to_tree_vector (vec *,
+tree, unsigned);
 extern vec *make_tree_vector_from_ctor (tree);
 extern vec *make_tree_vector_copy (const vec *);
 
--- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
+++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
@@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
   return ret;
 }
 
-/* Get a new tree vector of the values of a CONSTRUCTOR.  */
+/* Append to a tree vector the values of a CONSTRUCTOR.
+   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
+   should be initialized with make_tree_vector (); followed by
+   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
+   optionally followed by pushes of other elements (up to
+   nelts - CONSTRUCTOR_NELTS (ctor)).  */
 
 vec *
-make_tree_vector_from_ctor (tree ctor)
+append_ctor_to_tree_vector (vec *v, tree ctor, unsigned nelts)
 {
-  vec *ret = make_tree_vector ();
-  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
-  vec_safe_reserve (ret, CONSTRUCTOR_NELTS (ctor));
   for (unsigned i = 0; i < CONSTRUCTOR_NELTS (ctor); ++i)
 if (TREE_CODE (CONSTRUCTOR_ELT (ctor, i)->value) == RAW_DATA_CST)
   {
tree raw_data = CONSTRUCTOR_ELT (ctor, i)->value;
nelts += RAW_DATA_LENGTH (raw_data) - 1;
-   vec_safe_reserve (ret, nelts - ret->length ());
+   vec_safe_reserve (v, nelts - v->length ());
if (TYPE_PRECISION (TREE_TYPE (raw_data)) > CHAR_BIT
|| TYPE_UNSIGNED (TREE_TYPE (raw_data)))
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-   RAW_DATA_UCHAR_ELT (raw_data, j)));
+   v->quick_push (build_int_cst (TREE_TYPE (raw_data),
+ RAW_DATA_UCHAR_ELT (raw_data, j)));
else
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-   RAW_DATA_SCHAR_ELT (raw_data, j)));
+   v->quick_push (build_int_cst (TREE_TYPE (raw_data),
+ RAW_DATA_SCHAR_ELT (raw_data, j)));
   }
 else
-  ret->quick_push (CONSTRUCTOR_ELT (ctor, i)->value);
-  return ret;
+  v->quick_push (CONSTRUCTOR_ELT (ctor, i)->value);
+  return v;
+}
+
+/* Get a new tree vector of the values of a CONSTRUCTOR.  */
+
+vec *
+make_tree_vector_from_ctor (tree ctor)
+{
+  vec *ret = make_tree_vector ();
+  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
+  vec_safe_reserve (ret, nelts);
+  return append_ctor_to_tree_vector (ret, ctor, nelts);
 }
 
 /* Get a new tree vector which is a copy of an existing one.  */
--- gcc/cp/call.cc.jj   2025-01-21 09:11:58.214113697 +0100
+++ gcc/cp/call.cc  2025-01-21 09:32:29.382005137 +0100
@@ -4262,26 +4262,7 @@ add_list_candidates (tree fns, tree firs
   vec_alloc (new_args, nelts);
   for (unsigned i = 0; i < nart; ++i)
 new_args->quick_push ((*args)[i]);
-  for (unsigned i = 0; i < CONSTRUCTOR_NELTS (init_list); ++i)
-if (TREE_CODE (CONSTRUCTOR_ELT (init_list, i)->value) == RAW_DATA_CST)
-  {
-   tree raw_data = CONSTRUCTOR_ELT (init_list, i)->value;
-   nelts += RAW_DATA_LENGTH (raw_data) - 1;
-   vec_safe_reserve (new_args, nelts - new_args->length ());
-   if (TYPE_PRECISION (TREE_TYPE (raw_data)) > CHAR_BIT
-   || TYPE_UNSIGNED 

Re: [PATCH] c++: Handle CPP_EMBED in cp_parser_objc_message_args [PR118586]

2025-01-21 Thread Jason Merrill

On 1/21/25 10:51 AM, Jakub Jelinek wrote:

Hi!

As the following testcases show, I forgot to handle CPP_EMBED in
cp_parser_objc_message_args which is another place which can parse
possibly long valid lists of CPP_COMMA separated CPP_NUMBER tokens.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-20  Jakub Jelinek  

PR objc++/118586
gcc/cp/
* parser.cc (cp_parser_objc_message_args): Handle CPP_EMBED.
gcc/testsuite/
* objc.dg/embed-1.m: New test.
* obj-c++.dg/embed-1.mm: New test.
* obj-c++.dg/va-meth-2.mm: New test.

--- gcc/cp/parser.cc.jj 2025-01-17 19:27:34.052140136 +0100
+++ gcc/cp/parser.cc2025-01-20 20:16:23.082876036 +0100
@@ -36632,14 +36632,22 @@ cp_parser_objc_message_args (cp_parser*
/* Handle non-selector arguments, if any. */
while (token->type == CPP_COMMA)
  {
-  tree arg;
-
cp_lexer_consume_token (parser->lexer);
-  arg = cp_parser_assignment_expression (parser);
  
-  addl_args

-   = chainon (addl_args,
-  build_tree_list (NULL_TREE, arg));
+  if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+   {
+ tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+ cp_lexer_consume_token (parser->lexer);
+ for (tree argument : raw_data_range (raw_data))
+   addl_args = chainon (addl_args,
+build_tree_list (NULL_TREE, argument));


chainon of each byte of an #embed looks pretty inefficient, walking the 
full list for each new element.  But OK.



+   }
+  else
+   {
+ tree arg = cp_parser_assignment_expression (parser);
+ addl_args = chainon (addl_args,
+  build_tree_list (NULL_TREE, arg));
+   }
  
token = cp_lexer_peek_token (parser->lexer);

  }
--- gcc/testsuite/objc.dg/embed-1.m.jj  2025-01-20 20:41:05.974260340 +0100
+++ gcc/testsuite/objc.dg/embed-1.m 2025-01-20 20:28:54.934427543 +0100
@@ -0,0 +1,14 @@
+/* PR objc++/118586 */
+/* { dg-do compile } */
+
+@interface Foo
++ (int) bar: (int) firstNumber, int secondNumber, ...;
+@end
+
+void
+baz (void)
+{
+  [Foo bar: 1, 2,
+#embed __FILE__
+   , -1];
+}
--- gcc/testsuite/obj-c++.dg/embed-1.mm.jj  2025-01-20 20:45:07.907894733 
+0100
+++ gcc/testsuite/obj-c++.dg/embed-1.mm 2025-01-20 20:49:18.743405280 +0100
@@ -0,0 +1,15 @@
+// PR objc++/118586
+// { dg-do compile }
+// { dg-options "" }
+
+@interface Foo
++ (int) bar: (int) firstNumber, int secondNumber, ...;
+@end
+
+void
+baz (void)
+{
+  [Foo bar: 1, 2,
+#embed __FILE__
+   , -1];
+}
--- gcc/testsuite/obj-c++.dg/va-meth-2.mm.jj2025-01-20 20:34:59.431358606 
+0100
+++ gcc/testsuite/obj-c++.dg/va-meth-2.mm   2025-01-20 20:40:14.413977609 
+0100
@@ -0,0 +1,87 @@
+/* PR objc++/118586 */
+/* Based on objc/execute/va_method.m, by Nicola Pero */
+
+/* { dg-do run } */
+/* { dg-xfail-run-if "Needs OBJC2 ABI" { *-*-darwin* && { lp64 && { ! objc2 } } } { 
"-fnext-runtime" } { "" } } */
+#include "../objc-obj-c++-shared/TestsuiteObject.m"
+#include 
+#include 
+
+/* Test methods with "C-style" trailing arguments, with or without ellipsis. */
+
+@interface MathClass: TestsuiteObject
+/* sum positive numbers; -1 ends the list */
++ (int) sum: (int) firstNumber, int secondNumber, ...;
++ (int) prod: (int) firstNumber, int secondNumber, int thirdNumber;
++ (int) minimum: (int) firstNumber, ...;
+@end
+
+extern "C" int some_func(id self, SEL _cmd, int firstN, int secondN, int 
thirdN, ...) {
+  return firstN + secondN + thirdN;
+}
+
+@implementation MathClass
++ (int) sum: (int) firstNumber, int secondNumber, ...
+{
+  va_list ap;
+  int sum = 0, number = 0;
+
+  va_start (ap, secondNumber);
+  number = firstNumber + secondNumber;
+
+  while (number >= 0)
+{
+  sum += number;
+  number = va_arg (ap, int);
+}
+
+  va_end (ap);
+
+  return sum;
+}
++ (int) prod: (int) firstNumber, int secondNumber, int thirdNumber {
+  return firstNumber * secondNumber * thirdNumber;
+}
++ (int) minimum: (int) firstNumber, ...
+{
+  va_list ap;
+  int minimum = 999, number = 0;
+
+  va_start (ap, firstNumber);
+  number = firstNumber;
+
+  while (number >= 0)
+{
+  minimum = (minimum < number ? minimum: number);
+  number = va_arg (ap, int);
+}
+
+  va_end (ap);
+
+  return minimum;
+}
+@end
+
+int main (void)
+{
+#define ONETOTEN 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
+  if ([MathClass sum: ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, -1] != 1650)
+abort ();
+  if ([MathClass prod: 4, 5, 6] != 120)
+abort ();
+#define TWENTYONETOTHIRTY 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
+  if ([MathClass minimum: 

Re: [PATCH 3/4] RISC-V: Add .note.gnu.property for ZICFILP and ZICFISS ISA extension

2025-01-21 Thread Mark Wielaard
Hi,

On Tue, 2025-01-21 at 14:46 +0100, Mark Wielaard wrote:
> Thanks. And if you need help with that please let people know.
> The riscv bootstrap has been broken now for 5 days.
> And it really looks like it is as simple as just removing that one
> line.

Sorry, I missed that you already pushed the unused variable fix (commit
3c34cea66). But the bootstrap is still broken because of a format error
in the same function:

../../gcc/gcc/config/riscv/riscv.cc: In function ‘void riscv_file_end()’:
../../gcc/gcc/config/riscv/riscv.cc:10378:30: error: format ‘%x’ expects 
argument of type ‘unsigned int’, but argument 3 has type ‘long unsigned int’ 
[-Werror=format=]
10378 |   fprintf (asm_out_file, "\t.long\t%x\n", feature_1_and);
  |  ^~~  ~
  |   |
  |   long unsigned int
../../gcc/gcc/config/riscv/riscv.cc: In function ‘void 
riscv_lshift_subword(machine_mode, rtx, rtx, rtx_def**)’:

And an unused parameter error just below:

../../gcc/gcc/config/riscv/riscv.cc:11962:36: error: unused parameter ‘mode’ 
[-Werror=unused-parameter]
11962 | riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
  |   ~^~~~
cc1plus: all warnings being treated as errors


Re: [PATCH] aarch64: Provide initial specifications for Apple CPU cores.

2025-01-21 Thread Iain Sandoe



> On 20 Jan 2025, at 18:33, Andrew Carlotti  wrote:
> 
> On Mon, Jan 20, 2025 at 06:29:12PM +, Tamar Christina wrote:
>>> -Original Message-
>>> From: Iain Sandoe 
>>> Sent: Monday, January 20, 2025 6:15 PM
>>> To: Andrew Carlotti 
>>> Cc: Kyrylo Tkachov ; GCC Patches >> patc...@gcc.gnu.org>; Tamar Christina ; Richard
>>> Sandiford ; Sam James 
>>> Subject: Re: [PATCH] aarch64: Provide initial specifications for Apple CPU 
>>> cores.
>>> 
>>> 
>>> 
 On 20 Jan 2025, at 17:38, Andrew Carlotti  wrote:
 
 On Sun, Jan 19, 2025 at 09:14:17PM +, Iain Sandoe wrote:

>> 
>> I would say if we can find both core IDs we should use them, otherwise this 
>> is already
>> an improvement on the situation.
> 
> There are some part numbers listed at:
> https://github.com/AsahiLinux/docs/wiki/HW:ARM-System-Registers
> 
> That only seems to cover apple-a12 and apple-m1.

The latest llvm Host.cpp contains the opposite problem:
apple-m1 is mapped to : 0x20, 21, 22, 23, 24, 25, 28, 29
apple-m2 is mapped to : 0x30, 31, 32, 33, 34, 35, 38, 39
apple-m3 is mapped to : 0x48, 49

(if I had to speculate I might say that odd or even numbers relate to 
big/little - but the
 number of options ….)

>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few
>>> things:
>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1);
> 
> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from 
> the
>>> sysctl.
> FEAT_AES: 1
> FEAT_SHA3: 1
> So I’ve added those to the three entries.
 
 There are some architecture feature names that are effectively aliases in the 
 spec,
 although identifying this requires reading the restrictions of the id 
 register
 fields (and at least one version of the spec accidentally omitted one of 
 the
 dependencies).  In summary:
 - +sha2 = FEAT_SHA1 and FEAT_SHA256
 - +aes = FEAT_AES and FEAT_PMULL
 - +sha3 = FEAT_SHA512 and FEAT_SHA3
>>> 
>>> thanks - that was not obvious.
>>> 
>>> However, if I add any of these to the 8.4 spec, LLVM’s back end (at least 
>>> the ones
>>> via xcode) drops the arch rev down and we fail to build libgcc because of 
>>> missing
>>> support for fp16.
>>> 
>>> This is likely a bug - but I don’t really know how to describe it at the 
>>> moment - and
>>> it won’t make any difference to the assemblers already in the wild - so I 
>>> will leave
>>> these out of the list for now.
>>> 
>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> FEAT_FRINTTS: 1
> FEAT_FlagM2: 1
> So I've added those.
>>> 
>>> The build with these added succeeded with no change in test results.

So I have found a way that LLVM’s backend is happy with (I need to test across 
more xcode versions .. but it’s a start):

 AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (AES, SHA2, SHA3, F16FML, 
SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x023, -1)
AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16, AES, SHA2, 
SHA3, F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x033, -1)
AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16, AES, SHA2, 
SHA3, F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x048, -1)

So, although FP16FML is implicit in 8.4, F16 is not - and it seems that I cannot 
specify the missing F16 without causing the other part to get switched off.  
Specifying F16FML is OK because that switches on F16 and is part of 8.4 anyway….

>> - PREDRES (from apple-m1)
> 
> I cannot find FEAT_PREDRES …
> … however we do have
> FEAT_SPECRES: 0
 
 FEAT_SPECRES in the architecture spec is the same as the +predres toolchain
 flag.  LLVM seems to think the is supported from apple-m1.

So what do we do about this?
Do we assume that this is a bug in the sysctl reporting?
Is there anyone on the LLVM toolchain team or within Apple you folks could 
query?
I can try via the Apple Open Source folks - but not sure how long that will 
take.

if we could go with 8.5 and 8.6 that would simplify things;
Also I want to add apple-m4 soon and that also reports FEAT_SPECRES  = 0.

thanks,
Iain




Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Martin Uecker
Am Dienstag, dem 21.01.2025 um 21:15 +0100 schrieb Martin Uecker:
> Am Dienstag, dem 21.01.2025 um 19:45 + schrieb Joseph Myers:
> > On Tue, 21 Jan 2025, Martin Uecker wrote:
> > 
> > > Couldn't you use the rule that .len refers to the closest enclosing 
> > > structure
> > > even without __self__ ?  This would then also disambiguate between 
> > > designators
> > > and other uses.
> > 
> > Right now, an expression cannot start with '.', which provides the 
> > disambiguation between designators and expressions as initializers. 
> 
> You could disambiguate directly after parsing the identifier, which
> does not seem overly problematic.

The bigger issue seems that if you forward reference a member, you
do not yet know its type.  So whatever syntax we pick, general expressions
seem problematic anyway:

struct {
  char *buf [[counted_by(2 * .n + 3)]];
  unsigned int n;
};


Martin

> 
> >  Note 
> > that for counted_by it's the closest enclosing *definition of a structure 
> > type*.  That's different from designators where the *type of an object 
> > being initialized by a brace-enclosed initializer list* is what's 
> > relevant.
> 
> You would have to treat the members of the referenced structure
> type  as in scope.  But this does not seem too absurd, because
> 
> counted_by ( (struct foo){ .len = 1 }.len ) )
> 
> could also be written with an inline definition:
> 
> counted_by ( (struct foo { int len; }){ .len = 1 }.len ) )
> 
> and then it would be natural to think of "len" as being in scope
> inside the initializer.  
> 
> 
> Martin
> 
> 



[PATCH] c++, v2: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 05:35:02PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 21, 2025 at 05:15:17PM +0100, Jakub Jelinek wrote:
> > On Tue, Jan 21, 2025 at 11:06:35AM -0500, Jason Merrill wrote:
> > > > --- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
> > > > +++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
> > > > @@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
> > > > return ret;
> > > >   }
> > > > -/* Get a new tree vector of the values of a CONSTRUCTOR.  */
> > > > +/* Append to a tree vector the values of a CONSTRUCTOR.
> > > > +   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
> > > > +   should be initialized with make_tree_vector (); followed by
> > > > +   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
> > > > +   optionally followed by pushes of other elements (up to
> > > > +   nelts - CONSTRUCTOR_NELTS (ctor)).  */
> > > 
> > > How about using v->allocated () instead of passing in nelts?
> > 
> > That is not necessarily the same.
> > Both vec_safe_reserve (v, nelts) and vec_alloc (v, nelts); actually
> > use exact=false, so they can allocate something larger (doesn't hurt
> > e.g. if RAW_DATA_CST is small enough and fits), but if it used v->allocated
> > (), it could overallocate from the overallocated size.
> > So, e.g. even if nelts + RAW_DATA_LENGTH (x) - 1 <= v->allocated () - 
> > v->length ()
> > and thus we could just use a vector without reallocating,
> > v->allocated () + RAW_DATA_LENGTH (x) - 1 could be too much.
> 
> On the other side, if one uses v = vec_alloc (v, nelts) then v->allocated ()
> is guaranteed to be MAX (4, nelts) and if one uses v = make_tree_vector ();
> vec_safe_reserve (v, nelts); then v->allocated () will be I think at most
> MAX (24, nelts).
> So perhaps not that big deal (at least if the function inside of it uses
> unsigned nelts = v->allocated (); and then uses nelts rather than
> v->allocated () in the loop.  Unless some new caller of the function uses
> a vector reallocated more times.

So, if you prefer that, here is the variant patch, so far lightly tested -
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="dg.exp='embed* pr118532.C explicit20.C class-deduction-aggr16.C'"
can do full bootstrap/regtest tonight.

2025-01-21  Jakub Jelinek  

gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.

--- gcc/c-family/c-common.h.jj  2025-01-17 11:29:33.139696380 +0100
+++ gcc/c-family/c-common.h 2025-01-21 09:30:09.520947570 +0100
@@ -1190,6 +1190,8 @@ extern vec *make_tree_vecto
 extern void release_tree_vector (vec *);
 extern vec *make_tree_vector_single (tree);
 extern vec *make_tree_vector_from_list (tree);
+extern vec *append_ctor_to_tree_vector (vec *,
+tree);
 extern vec *make_tree_vector_from_ctor (tree);
 extern vec *make_tree_vector_copy (const vec *);
 
--- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
+++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
@@ -9010,33 +9010,45 @@ make_tree_vector_from_list (tree list)
   return ret;
 }
 
-/* Get a new tree vector of the values of a CONSTRUCTOR.  */
+/* Append to a tree vector the values of a CONSTRUCTOR.
+   v should be initialized with make_tree_vector (); followed by
+   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
+   optionally followed by pushes of other elements (up to
+   nelts - CONSTRUCTOR_NELTS (ctor)).  */
 
 vec *
-make_tree_vector_from_ctor (tree ctor)
+append_ctor_to_tree_vector (vec *v, tree ctor)
 {
-  vec *ret = make_tree_vector ();
-  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
-  vec_safe_reserve (ret, CONSTRUCTOR_NELTS (ctor));
+  unsigned nelts = v->allocated ();
   for (unsigned i = 0; i < CONSTRUCTOR_NELTS (ctor); ++i)
 if (TREE_CODE (CONSTRUCTOR_ELT (ctor, i)->value) == RAW_DATA_CST)
   {
tree raw_data = CONSTRUCTOR_ELT (ctor, i)->value;
nelts += RAW_DATA_LENGTH (raw_data) - 1;
-   vec_safe_reserve (ret, nelts - ret->length ());
+   vec_safe_reserve (v, nelts - v->length ());
if (TYPE_PRECISION (TREE_TYPE (raw_data)) > CHAR_BIT
|| TYPE_UNSIGNED (TREE_TYPE (raw_data)))
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-   RAW_DATA_UCHAR_ELT (raw_data, j)));
+   v->quick_push (build_int_cst (TREE_TYPE (raw_data),
+ RAW_DATA_UCHAR_ELT (raw_data, j)));
else
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-

[GCC-12/13][committed] d: Fix ICE in build_deref, at d/d-codegen.cc:1650 [PR111650]

2025-01-21 Thread Iain Buclaw
Hi,

This patch was committed some time ago in r14-10036, now it's being
backported to the gcc-13 and gcc-12 release branches.

The ICE in the D front-end was found to be caused by the hidden closure
parameter type being generated too early in some cases for nested
functions.  Better to update the type after the local closure/frame type
has been completed.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
releases/gcc-13 and releases/gcc-12.

Regards,
Iain.

---
PR d/111650

gcc/d/ChangeLog:

* decl.cc (get_fndecl_arguments): Move generation of frame type to ...
(DeclVisitor::visit (FuncDeclaration *)): ... here, after the call to
build_closure.

gcc/testsuite/ChangeLog:

* gdc.dg/pr111650.d: New test.

(cherry picked from commit 4d4929fe0654d51b52a2bf6e6188d7aad0bf17ac)
---
 gcc/d/decl.cc   | 20 ++--
 gcc/testsuite/gdc.dg/pr111650.d | 21 +
 2 files changed, 31 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr111650.d

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 2a135b516aa..84274b3f3c3 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -163,16 +163,6 @@ get_fndecl_arguments (FuncDeclaration *decl)
  tree parm_decl = get_symbol_decl (decl->vthis);
  DECL_ARTIFICIAL (parm_decl) = 1;
  TREE_READONLY (parm_decl) = 1;
-
- if (decl->vthis->type == Type::tvoidptr)
-   {
- /* Replace generic pointer with back-end closure type
-(this wins for gdb).  */
- tree frame_type = FRAMEINFO_TYPE (get_frameinfo (decl));
- gcc_assert (frame_type != NULL_TREE);
- TREE_TYPE (parm_decl) = build_pointer_type (frame_type);
-   }
-
  param_list = chainon (param_list, parm_decl);
}
 
@@ -1060,6 +1050,16 @@ public:
 /* May change cfun->static_chain.  */
 build_closure (d);
 
+/* Replace generic pointer with back-end closure type
+   (this wins for gdb).  */
+if (d->vthis && d->vthis->type == Type::tvoidptr)
+  {
+   tree frame_type = FRAMEINFO_TYPE (get_frameinfo (d));
+   gcc_assert (frame_type != NULL_TREE);
+   tree parm_decl = get_symbol_decl (d->vthis);
+   TREE_TYPE (parm_decl) = build_pointer_type (frame_type);
+  }
+
 if (d->vresult)
   declare_local_var (d->vresult);
 
diff --git a/gcc/testsuite/gdc.dg/pr111650.d b/gcc/testsuite/gdc.dg/pr111650.d
new file mode 100644
index 000..4298a76d38f
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr111650.d
@@ -0,0 +1,21 @@
+// { dg-do compile }
+ref V require(K, V)(ref V[K] aa, K key, lazy V value);
+
+struct Root
+{
+ulong[3] f;
+}
+
+Root[ulong] roots;
+
+Root getRoot(int fd, ulong rootID)
+{
+return roots.require(rootID,
+{
+Root result;
+inoLookup(fd, () => result);
+return result;
+}());
+}
+
+void inoLookup(int, scope Root delegate()) { }
-- 
2.43.0



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Joseph Myers
On Tue, 21 Jan 2025, Martin Uecker wrote:

> The bigger issue seems that if you forward reference a member, you
> do not yet know its type.  So whatever syntax we pick, general expressions
> seem problematic anyway:
> 
> struct {
>   char *buf [[counted_by(2 * .n + 3)]];
>   unsigned int n;

That's why N3188 would require such a not-yet-declared member to have type 
const size_t.
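
A minimal sketch of what that rule would permit (the member names here are
illustrative, not taken from N3188 itself):

  #include <stddef.h>

  struct s {
    char *buf [[counted_by(.n)]]; /* forward reference to n allowed ...     */
    const size_t n;               /* ... only because n is const size_t     */
  };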

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v2 04/12] AArch64: Diagnose OpenMP offloading when SVE types involved.

2025-01-21 Thread Jakub Jelinek
On Fri, Oct 18, 2024 at 11:52:25AM +0530, Tejas Belagod wrote:
> The target clause in OpenMP is used to offload loop kernels to accelerator
> peripherals.  target's 'map' clause is used to move data from and to the
> accelerator.  When the data is an SVE type, it may not be suitable because of
> various reasons i.e. the two SVE targets may not agree on vector size or
> some targets don't support variable vector size.  This makes SVE unsuitable
> for use in OMP's 'map' clause.  This patch diagnoses all such cases and issues
> an error where SVE types are not suitable.
> 
> Co-authored-by: Andrea Corallo 
> 
> gcc/ChangeLog:
> 
>   * target.h (type_context_kind): Add new context kinds for target 
> clauses.
>   * config/aarch64/aarch64-sve-builtins.cc (verify_type_context): Diagnose
>   SVE types for a given OpenMP context.
>   (omp_type_context): New.
>   * gimplify.cc (omp_notice_variable):  Diagnose implicitly-mapped SVE

s/  / / above

>   objects in OpenMP regions.
>   (gimplify_scan_omp_clauses): Diagnose SVE types for various target
>   clauses.
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -4956,12 +4956,35 @@ handle_arm_sve_vector_bits_attribute (tree *node, 
> tree, tree args, int,
>return NULL_TREE;
>  }
>  
> +
> +/* Return true if OpenMP context types.  */
> +
> +static bool
> +omp_type_context (type_context_kind context)
> +{
> +  switch (context)
> +{
> +case TCTX_OMP_MAP:
> +case TCTX_OMP_MAP_IMP_REF:
> +case TCTX_OMP_PRIVATE:
> +case TCTX_OMP_FIRSTPRIVATE:
> +case TCTX_OMP_DEVICE_ADDR:
> +  return true;
> +default:
> +  return false;;
> +}
> +}
> +
>  /* Implement TARGET_VERIFY_TYPE_CONTEXT for SVE types.  */
>  bool
>  verify_type_context (location_t loc, type_context_kind context,
>const_tree type, bool silent_p)

I know nothing about this verify_type_context stuff, will certainly
defer review of it to Richard S.
Just am wondering how can this work at all, is this in some anonymous
or aarch64 specific namespace?
Because tree.cc has verify_type_context definition with the same
types.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -8430,11 +8430,13 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, 
> tree decl, bool in_code)
> | GOVD_MAP_ALLOC_ONLY)) == flags)
>   {
> tree type = TREE_TYPE (decl);
> +   location_t dummy = UNKNOWN_LOCATION;
>  
> if (gimplify_omp_ctxp->target_firstprivatize_array_bases
> && omp_privatize_by_reference (decl))
>   type = TREE_TYPE (type);
> -   if (!omp_mappable_type (type))
> +   if (!omp_mappable_type (type)
> +   || !verify_type_context (dummy, TCTX_OMP_MAP_IMP_REF, type))
>   {
> error ("%qD referenced in target region does not have "
>"a mappable type", decl);
> @@ -12165,6 +12167,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>unsigned int flags;
>tree decl;
>auto_vec addr_tokens;
> +  tree op = NULL_TREE;
> +  location_t loc = OMP_CLAUSE_LOCATION (c);
>  
>if (grp_end && c == OMP_CLAUSE_CHAIN (grp_end))
>   {

Ditto for review here.

> @@ -12172,6 +12176,34 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
> grp_end = NULL_TREE;
>   }
>  
> +  if (code == OMP_TARGET || code == OMP_TARGET_DATA
> +   || code == OMP_TARGET_ENTER_DATA || code == OMP_TARGET_EXIT_DATA)

Just general formatting rule, if condition doesn't fit on one line,
split on every || (so each || goes on a separate line).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel-loop.c
> @@ -0,0 +1,442 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msve-vector-bits=256 -std=gnu99 -fopenmp -O2" } */
> +
> +#include 
> +
> +#define N __ARM_FEATURE_SVE_BITS
> +
> +svint32_t
> +omp_target_vla ()
> +{
> +  int a[N], b[N], c[N];
> +  svint32_t va, vb, vc;
> +  int i;
> +
> +#pragma omp parallel for
> +  for (i = 0; i < N; i++)
> +{
> +  b[i] = i;
> +  c[i] = i + 1;
> +}
> +
> +#pragma omp target parallel loop
> +  for (i = 0; i < 8; i++)
> +{
> +  vb = svld1_s32 (svptrue_b32 (), b); /* { dg-error {'vb' referenced in 
> target region does not have a mappable type} } */
> +  vc = svld1_s32 (svptrue_b32 (), c); /* { dg-error {'vc' referenced in 
> target region does not have a mappable type} } */
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc); /* { dg-error {'va' 
> referenced in target region does not have a mappable type} } */

I think better would be to use the non-mappable types rather than having
something racy (all threads writing the same shared vars and in
the last case also using them).
> +}
> +
> +  return va;
> +}
> +
> +svint32_t
> +omp_target_data_map_1_vla ()
> +{
> +  int a[N], b[N], c[N];

Re: [PATCH v2 05/12] libgomp, AArch64: Test OpenMP lastprivate clause for various constructs.

2025-01-21 Thread Jakub Jelinek
On Fri, Oct 18, 2024 at 11:52:26AM +0530, Tejas Belagod wrote:
> +/* This worksharing construct binds to an implicit outer parallel region in
> +whose scope va is declared and therefore is default private.  This causes
> +the lastprivate clause list item va to be diagnosed as private in the 
> outer
> +context.  Similarly for constructs for and distribute.  */

So just add #pragma omp parallel around it, then it isn't private in outer
context but shared.

> +#pragma omp sections lastprivate (va) /* { dg-error {lastprivate variable 
> 'va' is private in outer context} } */
> +{
> +  #pragma omp section
> +  vb = svld1_s32 (svptrue_b32 (), b);
> +  #pragma omp section
> +  vc = svld1_s32 (svptrue_b32 (), c);
> +  #pragma omp section
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc);

This is again racy (if there is any parallel around it, whether in main or
within the function), while vb and vc are implicitly shared, by the
time the last section is run, the first two might not even have started, or
might be done concurrently with the third one.  And, as the last section
is the only one which modifies the lastprivate variable, it isn't a good
example for it.  lastprivate is primarily private, each thread in the
parallel has its own copy and it is nice if each section say writes to it
as a temporary and then uses it for some operation.
E.g.
  #pragma omp section
  va = svld1_s32 (svptrue_b32 (), b);
  va = svadd_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), c));
and then another section which subtracts instead of adds and yet another
which multiplies rather than adds and then verify lastprivate got the
value from the multiplication.
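
A sketch of that shape, reusing the arrays and variables from the quoted
testcase (svsub_s32_z and svmul_s32_z are assumed here as the subtract and
multiply counterparts of svadd_s32_z):

  /* Sketch: each thread's sections write va as a private temporary;
     lastprivate copies back the value from the lexically last section.  */
  #pragma omp parallel
  #pragma omp sections lastprivate (va)
  {
    #pragma omp section
    {
      va = svld1_s32 (svptrue_b32 (), b);
      va = svadd_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), c));
    }
    #pragma omp section
    {
      va = svld1_s32 (svptrue_b32 (), b);
      va = svsub_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), c));
    }
    #pragma omp section
    {
      va = svld1_s32 (svptrue_b32 (), b);
      va = svmul_s32_z (svptrue_b32 (), va, svld1_s32 (svptrue_b32 (), c));
    }
  }
  /* Afterwards va should hold the product computed by the last section.  */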
> +

Again, put #pragma omp parallel around this

> +#pragma omp for lastprivate (va) /* { dg-error {lastprivate variable 'va' is 
> private in outer context} } */
> +  for (i = 0; i < 1; i++)

and perhaps more than one iteration, ideally do something more interesting,
but on the other side, as different iterations can be handled by different
threads, there can't be dependencies between the iterations.
> +{
> +  vb = svld1_s32 (svptrue_b32 (), b);
> +  vc = svld1_s32 (svptrue_b32 (), c);
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +}

> +#pragma omp parallel
> +#pragma omp sections lastprivate (vb, vc)
> +{
> +  #pragma omp section
> +  vb = svld1_s32 (svptrue_b32 (), b);
> +  #pragma omp section
> +  vc = svld1_s32 (svptrue_b32 (), c);
> +}

This is invalid, vb is used, even when the last
section doesn't write it.  lastprivate for sections
means each thread has its own copy and value
from the thread which executed the last section (lexically)
is copied to the original.
If you are lucky and the same thread handles both sections,
then it would work, but it can be different thread...

> +#pragma omp parallel
> +#pragma omp for lastprivate (va, vb, vc)
> +  for (i = 0; i < 4; i++)
> +{
> +  vb = svld1_s32 (svptrue_b32 (), b + i * 8);
> +  vc = svld1_s32 (svptrue_b32 (), c + i * 8);
> +  va = svadd_s32_z (svptrue_b32 (), vb, vc);
> +  svst1_s32 (svptrue_b32 (), a + i * 8, va);

Is svst1 storing just one element or say 8 elements
and not the whole variable length vector?
If there is overlap between what different threads
write, then it would be racy (or if it can load beyond end of
array).

Jakub



Re: [PATCH,LRA] Restrict the reuse of spill slots [PR117868]

2025-01-21 Thread Denis Chertykov
Richard Sandiford  writes:

> Denis Chertykov  writes:
>>  PR rtl-optimization/117868
>> gcc/
>>  * lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Reuse slots
>>  only without allocated memory or only with equal or smaller registers
>>  with equal or smaller alignment.
>>  (lra_spill): Print slot size as width.
>>
>>
>> diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
>> index db78dcd28a3..93a0c92db9f 100644
>> --- a/gcc/lra-spills.cc
>> +++ b/gcc/lra-spills.cc
>> @@ -386,7 +386,18 @@ assign_stack_slot_num_and_sort_pseudos (int 
>> *pseudo_regnos, int n)
>>  && ! (lra_intersected_live_ranges_p
>>(slots[j].live_ranges,
>> lra_reg_info[regno].live_ranges)))
>> -  break;
>> +  {
>> +/* A slot without allocated memory can be shared.  */
>> +if (slots[j].mem == NULL_RTX)
>> +  break;
>> +
>> +/* A slot with allocated memory can be shared only with equal
>> +   or smaller register with equal or smaller alignment.  */
>> +if (slots[j].align >= spill_slot_alignment (mode)
>> +&& compare_sizes_for_sort (slots[j].size,
>> +   GET_MODE_SIZE (mode)) != -1)
>
> Sorry for piping up late, but I think this should be:
>
>   known_ge (GET_MODE_SIZE (mode), slots[j].size)
>
> From the comment above compare_sizes_for_sort:
>
> /* Compare A and B for sorting purposes, returning -1 if A should come
>before B, 0 if A and B are identical, and 1 if A should come after B.
>This is a lexicographical compare of the coefficients in reverse order.
>
>A consequence of this is that all constant sizes come before all
>non-constant ones, regardless of magnitude (since a size is never
>negative).  This is what most callers want.  For example, when laying
>data out on the stack, it's better to keep all the constant-sized
>data together so that it can be accessed as a constant offset from a
>single base.  */
>
> For example, compare_sizes_for_sort would return 1 for a slot size
> of 2+2X and a mode size of 16, but the slot would be too small for X <
> 7.

Ok.

Committed as obvious ef7ed227fc9

Denis.

gcc/
* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Use known_ge
to compare sizes.

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 93a0c92db9f..fc912c43ce6 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -394,8 +394,7 @@ assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, 
int n)
/* A slot with allocated memory can be shared only with equal
   or smaller register with equal or smaller alignment.  */
if (slots[j].align >= spill_slot_alignment (mode)
-   && compare_sizes_for_sort (slots[j].size,
-  GET_MODE_SIZE (mode)) != -1)
+   && known_ge (slots[j].size, GET_MODE_SIZE (mode)))
  break;
  }
}



Re: [committed] testsuite: Require int32plus for test case pr117546.c

2025-01-21 Thread Sam James
Dimitar Dimitrov  writes:

> Test case is valid even if size of int is more than 32 bits.
>
> Pushed to trunk as obvious.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/torture/pr117546.c: Require effective target int32plus.
>
> Cc: Georg-Johann Lay 
> Cc: Sam James 
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/testsuite/gcc.dg/torture/pr117546.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117546.c 
> b/gcc/testsuite/gcc.dg/torture/pr117546.c
> index b60f877a906..a837d056451 100644
> --- a/gcc/testsuite/gcc.dg/torture/pr117546.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr117546.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target int32 } } */
> +/* { dg-do run { target int32plus } } */
>  
>  typedef struct {
>int a;

Thanks again.


Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Martin Uecker
Am Dienstag, dem 21.01.2025 um 19:45 + schrieb Joseph Myers:
> On Tue, 21 Jan 2025, Martin Uecker wrote:
> 
> > Couldn't you use the rule that .len refers to the closest enclosing structure
> > even without __self__ ?  This would then also disambiguate between 
> > designators
> > and other uses.
> 
> Right now, an expression cannot start with '.', which provides the 
> disambiguation between designators and expressions as initializers. 

You could disambiguate directly after parsing the identifier, which
does not seem overly problematic.

>  Note 
> that for counted_by it's the closest enclosing *definition of a structure 
> type*.  That's different from designators where the *type of an object 
> being initialized by a brace-enclosed initializer list* is what's 
> relevant.

You would have to treat the members of the referenced structure
type  as in scope.  But this does not seem too absurd, because

counted_by ( (struct foo){ .len = 1 }.len ) )

could also be written with an inline definition:

counted_by ( (struct foo { int len; }){ .len = 1 }.len ) )

and then it would be natural to think of "len" as being in scope
inside the initializer.  


Martin




Patch ping^4 (Re: [PATCH] analyzer: Handle nonnull_if_nonzero attribute [PR117023])

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 07, 2025 at 01:49:04PM +0100, Jakub Jelinek wrote:
> On Wed, Dec 18, 2024 at 12:15:15PM +0100, Jakub Jelinek wrote:
> > On Fri, Dec 06, 2024 at 05:07:40PM +0100, Jakub Jelinek wrote:
> > > I'd like to ping the
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668699.html
> > > patch.
> > > 
> > > The patches it depended on are already committed and there is a patch
> > > which depends on this (the builtins shift from nonnull to 
> > > nonnull_if_nonzero
> > > where needed) which has been approved but can't be committed.
> > 
> > Gentle ping on this one.
> 
> Ping.

Ping again.

> Thanks
> 
> > > > 2024-11-14  Jakub Jelinek  
> > > > 
> > > > PR c/117023
> > > > gcc/analyzer/
> > > > * sm-malloc.cc (malloc_state_machine::on_stmt): Handle
> > > > also nonnull_if_nonzero attributes.
> > > > gcc/testsuite/
> > > > * c-c++-common/analyzer/call-summaries-malloc.c
> > > > (test_use_without_check): Pass 4 rather than sz to memset.
> > > > * c-c++-common/analyzer/strncpy-1.c (test_null_dst,
> > > > test_null_src): Pass 42 rather than count to strncpy.

Jakub



Re: [PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 11:06:35AM -0500, Jason Merrill wrote:
> > --- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
> > +++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
> > @@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
> > return ret;
> >   }
> > -/* Get a new tree vector of the values of a CONSTRUCTOR.  */
> > +/* Append to a tree vector the values of a CONSTRUCTOR.
> > +   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
> > +   should be initialized with make_tree_vector (); followed by
> > +   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
> > +   optionally followed by pushes of other elements (up to
> > +   nelts - CONSTRUCTOR_NELTS (ctor)).  */
> 
> How about using v->allocated () instead of passing in nelts?

That is not necessarily the same.
Both vec_safe_reserve (v, nelts) and vec_alloc (v, nelts); actually
use exact=false, so they can allocate something larger (doesn't hurt
e.g. if RAW_DATA_CST is small enough and fits), but if it used
v->allocated (), it could overallocate from the overallocated size.
So, e.g. even if nelts + RAW_DATA_LENGTH (x) - 1 <= v->allocated () - v->length ()
and thus we could just use a vector without reallocating,
v->allocated () + RAW_DATA_LENGTH (x) - 1 could be too much.
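
To make that concrete, a rough illustration (the growth numbers are
assumptions; vec_safe_reserve with exact=false may round up, it need not):

  vec<tree, va_gc> *v = make_tree_vector (); /* nothing reserved yet      */
  vec_safe_reserve (v, 3);                   /* may reserve e.g. 24 slots */
  /* nelts-based:           reserve (3  + RAW_DATA_LENGTH (x) - 1 - v->length ())
     v->allocated ()-based: reserve (24 + RAW_DATA_LENGTH (x) - 1 - v->length ())
     i.e. the latter over-reserves starting from an already rounded-up size.  */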

Jakub



Re: [PATCH 3/4] RISC-V: Add .note.gnu.property for ZICFILP and ZICFISS ISA extension

2025-01-21 Thread Robin Dapp
I'm going to push the attached as obvious if my local test shows
no issues.

Regards
 Robin

[PATCH] RISC-V: Unbreak bootstrap.

This fixes a wrong format specifier and an unused variable which should
re-enable bootstrap.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_file_end): Fix format string.
(riscv_lshift_subword): Mark MODE as unused.
---
 gcc/config/riscv/riscv.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f5e672bb7f5..5a3a0504177 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10375,7 +10375,7 @@ riscv_file_end ()
   fprintf (asm_out_file, "\t.long\t4f - 3f\n");
   fprintf (asm_out_file, "3:\n");
   /* zicfiss, zicfilp.  */
-  fprintf (asm_out_file, "\t.long\t%x\n", feature_1_and);
+  fprintf (asm_out_file, "\t.long\t%lx\n", feature_1_and);
   fprintf (asm_out_file, "4:\n");
   fprintf (asm_out_file, "\t.p2align\t%u\n", p2align);
   fprintf (asm_out_file, "5:\n");
@@ -11959,7 +11959,7 @@ riscv_subword_address (rtx mem, rtx *aligned_mem, rtx 
*shift, rtx *mask,
 /* Leftshift a subword within an SImode register.  */
 
 void
-riscv_lshift_subword (machine_mode mode, rtx value, rtx shift,
+riscv_lshift_subword (machine_mode mode ATTRIBUTE_UNUSED, rtx value, rtx shift,
  rtx *shifted_value)
 {
   rtx value_reg = gen_reg_rtx (SImode);
-- 
2.47.1




Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 23:18 +0800, Xi Ruoyao wrote:
> On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> > > > in GCC 13 the result is:
> > > > 
> > > >     or  $r12,$r4,$r0
> > > 
> > > Hmm, this strange move is caused by "&" in bstrpick_alsl_paired. 
> > > Is it
> > > really needed for the fusion?
> > 
> > Never mind, it's needed or a = ((a & 0x) << 1) + a will blow
> > up.
> > Stupid I.
> 
> And my code is indeed broken due to the missing '&':
> 
> /* { dg-do run } */
> /* { dg-options "-O2" } */
> 
> register long x asm ("s0");
> 
> #define TEST(x) (int)(((x & 0x114) << 3) + x)
> 
> [[gnu::noipa]] void
> test (void)
> {
>   x = TEST (x);
> }
> 
> int
> main (void)
> {
>   x = 0x;
>   test ();
>   if (x != TEST (0x))
>     __builtin_trap ();
> }
> 
> ends up:
> 
> 0760 :
>  760: 034452f7    andi$s0, $s0, 0x114
>  764: 00055ef7    alsl.w  $s0, $s0, $s0, 0x3
>  768: 4c20    ret
> 
> and fails.  The fix would be like https://gcc.gnu.org/r15-5074.

Now bootstrapping & testing two patches attached here instead.  They
should fix the wrong-code and miss-optimization regressions, except the
instruction ordering which requires TARGET_SCHED_MACRO_FUSION_PAIR_P.

> > > >     bstrpick.d  $r4,$r12,31,0
> > > >     alsl.d  $r4,$r4,$r6,2
> > > >     or  $r12,$r5,$r0
> > > >     bstrpick.d  $r5,$r12,31,0
> > > >     alsl.d  $r5,$r5,$r6,2
> > > >     jr  $r1

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 09a4f641331709685b6b5fbcb07d2a26b42a5576 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Tue, 21 Jan 2025 23:01:38 +0800
Subject: [PATCH 1/2] LoongArch: Fix wrong code with
 _alsl_reversesi_extended

The second source register of this insn cannot be the same as the
destination register.

gcc/ChangeLog:

	* config/loongarch/loongarch.md
	(_alsl_reversesi_extended): Add '&' to the destination
	register constraint and append '0' to the first source register
	constraint to indicate the destination register cannot be same
	as the second source register, and change the split condition to
	reload_completed so that the insn will be split only after RA in
	order to obtain allocated registers that satisfy the above
	constraints.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
	test.
---
 gcc/config/loongarch/loongarch.md |  6 +++---
 .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 223e2b9f37f..1392325038c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3160,13 +3160,13 @@ (define_insn_and_split "_shift_reverse"
 ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
 ;; our own.
 (define_insn_and_split "_alsl_reversesi_extended"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=&r")
 	(sign_extend:DI
 	  (plus:SI
 	(subreg:SI
 	  (any_bitwise:DI
 		(ashift:DI
-		  (match_operand:DI 1 "register_operand" "r")
+		  (match_operand:DI 1 "register_operand" "r0")
 		  (match_operand:SI 2 "const_immalsl_operand" ""))
 		(match_operand:DI 3 "const_int_operand" "i"))
 	  0)
@@ -3175,7 +3175,7 @@ (define_insn_and_split "_alsl_reversesi_extended"
&& loongarch_reassoc_shift_bitwise (, operands[2], operands[3],
    SImode)"
   "#"
-  "&& true"
+  "&& reload_completed"
   [; r0 = r1 [&|^] r3 is emitted in PREPARATION-STATEMENTS because we
; need to handle a special case, see below.
(set (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
new file mode 100644
index 000..9985a18ea08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+register long x asm ("s0");
+
+#define TEST(x) (int)(((x & 0x114) << 3) + x)
+
+[[gnu::noipa]] void
+test (void)
+{
+  x = TEST (x);
+}
+
+int
+main (void)
+{
+  x = 0x;
+  test ();
+  if (x != TEST (0x))
+__builtin_trap ();
+}
-- 
2.48.1

From 5c0b402020867110ba5c33760f961733fddfee01 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Tue, 21 Jan 2025 23:36:25 +0800
Subject: [PATCH 2/2] LoongArch: Partially fix code regression from r15-7062

The uarch can fuse bstrpick.d rd,rs1,31,0 and alsl.d rd,rd,rs2,shamt,
so for this special case we should use alsl.d instead of slli.d.  And
I'd hoped late combine would handle slli.d + and + add.d => and + slli.d +
add.d => and + alsl.d, but it does not always work (even before the
alsl.d special case gets in the way).  So let's handle this on our own.

The fix is p

[PATCH] c++: Handle CPP_EMBED in cp_parser_objc_message_args [PR118586]

2025-01-21 Thread Jakub Jelinek
Hi!

As the following testcases show, I forgot to handle CPP_EMBED in
cp_parser_objc_message_args which is another place which can parse
possibly long valid lists of CPP_COMMA separated CPP_NUMBER tokens.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-20  Jakub Jelinek  

PR objc++/118586
gcc/cp/
* parser.cc (cp_parser_objc_message_args): Handle CPP_EMBED.
gcc/testsuite/
* objc.dg/embed-1.m: New test.
* obj-c++.dg/embed-1.mm: New test.
* obj-c++.dg/va-meth-2.mm: New test.

--- gcc/cp/parser.cc.jj 2025-01-17 19:27:34.052140136 +0100
+++ gcc/cp/parser.cc2025-01-20 20:16:23.082876036 +0100
@@ -36632,14 +36632,22 @@ cp_parser_objc_message_args (cp_parser*
   /* Handle non-selector arguments, if any. */
   while (token->type == CPP_COMMA)
 {
-  tree arg;
-
   cp_lexer_consume_token (parser->lexer);
-  arg = cp_parser_assignment_expression (parser);
 
-  addl_args
-   = chainon (addl_args,
-  build_tree_list (NULL_TREE, arg));
+  if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
+   {
+ tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
+ cp_lexer_consume_token (parser->lexer);
+ for (tree argument : raw_data_range (raw_data))
+   addl_args = chainon (addl_args,
+build_tree_list (NULL_TREE, argument));
+   }
+  else
+   {
+ tree arg = cp_parser_assignment_expression (parser);
+ addl_args = chainon (addl_args,
+  build_tree_list (NULL_TREE, arg));
+   }
 
   token = cp_lexer_peek_token (parser->lexer);
 }
--- gcc/testsuite/objc.dg/embed-1.m.jj  2025-01-20 20:41:05.974260340 +0100
+++ gcc/testsuite/objc.dg/embed-1.m 2025-01-20 20:28:54.934427543 +0100
@@ -0,0 +1,14 @@
+/* PR objc++/118586 */
+/* { dg-do compile } */
+
+@interface Foo
++ (int) bar: (int) firstNumber, int secondNumber, ...;
+@end
+
+void
+baz (void)
+{
+  [Foo bar: 1, 2,
+#embed __FILE__
+   , -1];
+}
--- gcc/testsuite/obj-c++.dg/embed-1.mm.jj  2025-01-20 20:45:07.907894733 
+0100
+++ gcc/testsuite/obj-c++.dg/embed-1.mm 2025-01-20 20:49:18.743405280 +0100
@@ -0,0 +1,15 @@
+// PR objc++/118586
+// { dg-do compile }
+// { dg-options "" }
+
+@interface Foo
++ (int) bar: (int) firstNumber, int secondNumber, ...;
+@end
+
+void
+baz (void)
+{
+  [Foo bar: 1, 2,
+#embed __FILE__
+   , -1];
+}
--- gcc/testsuite/obj-c++.dg/va-meth-2.mm.jj2025-01-20 20:34:59.431358606 
+0100
+++ gcc/testsuite/obj-c++.dg/va-meth-2.mm   2025-01-20 20:40:14.413977609 
+0100
@@ -0,0 +1,87 @@
+/* PR objc++/118586 */
+/* Based on objc/execute/va_method.m, by Nicola Pero */
+
+/* { dg-do run } */
+/* { dg-xfail-run-if "Needs OBJC2 ABI" { *-*-darwin* && { lp64 && { ! objc2 } 
} } { "-fnext-runtime" } { "" } } */
+#include "../objc-obj-c++-shared/TestsuiteObject.m"
+#include 
+#include 
+
+/* Test methods with "C-style" trailing arguments, with or without ellipsis. */
+
+@interface MathClass: TestsuiteObject
+/* sum positive numbers; -1 ends the list */
++ (int) sum: (int) firstNumber, int secondNumber, ...;
++ (int) prod: (int) firstNumber, int secondNumber, int thirdNumber;
++ (int) minimum: (int) firstNumber, ...;
+@end
+
+extern "C" int some_func(id self, SEL _cmd, int firstN, int secondN, int 
thirdN, ...) {
+  return firstN + secondN + thirdN;
+}
+
+@implementation MathClass
++ (int) sum: (int) firstNumber, int secondNumber, ...
+{
+  va_list ap;
+  int sum = 0, number = 0;
+
+  va_start (ap, secondNumber);
+  number = firstNumber + secondNumber;
+
+  while (number >= 0)
+{
+  sum += number;
+  number = va_arg (ap, int);
+}
+  
+  va_end (ap);
+
+  return sum;
+}
++ (int) prod: (int) firstNumber, int secondNumber, int thirdNumber {
+  return firstNumber * secondNumber * thirdNumber;
+}
++ (int) minimum: (int) firstNumber, ...
+{
+  va_list ap;
+  int minimum = 999, number = 0;
+  
+  va_start (ap, firstNumber);
+  number = firstNumber;
+  
+  while (number >= 0)
+{
+  minimum = (minimum < number ? minimum: number);
+  number = va_arg (ap, int);
+}
+  
+  va_end (ap);
+  
+  return minimum;
+}
+@end
+
+int main (void)
+{
+#define ONETOTEN 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
+  if ([MathClass sum: ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN, ONETOTEN,
+   ONETOTEN, ONETOTEN, -1] != 1650)
+abort ();
+  if ([MathClass prod: 4, 5, 6] != 120)
+abort ();
+#define TWENTYONETOTHIRTY 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
+  if ([MathClass minimum: TWENTYONETOTHIRTY, TWENTYONETOTHIRTY,
+   TWENTYONETOTHIRTY, TWENTYONETOTHIRTY, TWENTYONETOTHIRTY,
+   17, 9, 133, 84, 35, TWENTYONETOTHIRTY, TWENTYONETOTH

[PATCH] c++: Handle CWG2867 even in namespace scope structured bindings in header modules [PR115769]

2025-01-21 Thread Jakub Jelinek
Hi!

On top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662750.html
patches (where the first one implements CWG2867 for block scope static
or thread_local structured bindings and the latter for namespace scope
structured bindings; CWG2867 for automatic structured bindings is
already committed in r15-3513) the following patch implements the module
streaming of the new STATIC_INIT_DECOMP_BASE_P and
STATIC_INIT_DECOMP_NONBASE_P flags.  As I think namespace scope structured
bindings in the header modules will be pretty rare, I've tried to stream
something extra only when they actually appear, in that case it streams
extra INTEGER_CSTs which mark end of STATIC_INIT_DECOMP_*BASE_P (0),
start of STATIC_INIT_DECOMP_BASE_P for static_aggregates (1), start of
STATIC_INIT_DECOMP_NONBASE_P for static_aggregates (2) and ditto for
tls_aggregates (3 and 4).
The patch also copies with just small tweaks the testcases from the
second patch above as header modules.
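
That is, the extra INTEGER_CSTs encode:
  0  ends the current marked run of initializers
  1  starts a STATIC_INIT_DECOMP_BASE_P run in static_aggregates
  2  starts a STATIC_INIT_DECOMP_NONBASE_P run in static_aggregates
  3  starts a STATIC_INIT_DECOMP_BASE_P run in tls_aggregates
  4  starts a STATIC_INIT_DECOMP_NONBASE_P run in tls_aggregates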

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-21  Jakub Jelinek  

PR c++/115769
gcc/cp/
* module.cc (module_state::write_inits): Verify
STATIC_INIT_DECOMP_{,NON}BASE_P flags and stream changes in those
out.
(module_state::read_inits): Stream those flags in.
gcc/testsuite/
* g++.dg/modules/dr2867-1_a.H: New test.
* g++.dg/modules/dr2867-1_b.C: New test.
* g++.dg/modules/dr2867-2_a.H: New test.
* g++.dg/modules/dr2867-2_b.C: New test.
* g++.dg/modules/dr2867-3_a.H: New test.
* g++.dg/modules/dr2867-3_b.C: New test.
* g++.dg/modules/dr2867-4_a.H: New test.
* g++.dg/modules/dr2867-4_b.C: New test.

--- gcc/cp/module.cc.jj 2025-01-21 09:04:18.085457077 +0100
+++ gcc/cp/module.cc2025-01-21 13:31:40.670938455 +0100
@@ -18723,6 +18723,65 @@ module_state::write_inits (elf_out *to,
   for (tree init = list; init; init = TREE_CHAIN (init))
if (TREE_LANG_FLAG_0 (init))
  {
+   if (STATIC_INIT_DECOMP_BASE_P (init))
+ {
+   /* Ensure that in the returned result chain if the
+  STATIC_INIT_DECOMP_*BASE_P flags are set, there is
+  always one or more STATIC_INIT_DECOMP_BASE_P TREE_LIST
+  followed by one or more STATIC_INIT_DECOMP_NONBASE_P.  */
+   int phase = 0;
+   tree last = NULL_TREE;
+   for (tree init2 = TREE_CHAIN (init);
+init2; init2 = TREE_CHAIN (init2))
+ {
+   if (phase == 0 && STATIC_INIT_DECOMP_BASE_P (init2))
+ ;
+   else if (phase == 0
+&& STATIC_INIT_DECOMP_NONBASE_P (init2))
+ {
+   phase = TREE_LANG_FLAG_0 (init2) ? 2 : 1;
+   last = init2;
+ }
+   else if (IN_RANGE (phase, 1, 2)
+&& STATIC_INIT_DECOMP_NONBASE_P (init2))
+ {
+   if (TREE_LANG_FLAG_0 (init2))
+ phase = 2;
+   last = init2;
+ }
+   else
+ break;
+ }
+   if (phase == 2)
+ {
+   /* In that case, add markers about it so that the
+  STATIC_INIT_DECOMP_BASE_P and
+  STATIC_INIT_DECOMP_NONBASE_P flags can be restored.  */
+   sec.tree_node (build_int_cst (integer_type_node,
+ 2 * passes + 1));
+   phase = 1;
+   for (tree init2 = init; init2 != TREE_CHAIN (last);
+init2 = TREE_CHAIN (init2))
+ if (TREE_LANG_FLAG_0 (init2))
+   {
+ tree decl = TREE_VALUE (init2);
+ if (phase == 1
+ && STATIC_INIT_DECOMP_NONBASE_P (init2))
+   {
+ sec.tree_node (build_int_cst (integer_type_node,
+   2 * passes + 2));
+ phase = 2;
+   }
+ dump ("Initializer:%u for %N", count, decl);
+ sec.tree_node (decl);
+ ++count;
+   }
+   sec.tree_node (integer_zero_node);
+   init = last;
+   continue;
+ }
+ }
+
tree decl = TREE_VALUE (init);
 
dump ("Initializer:%u for %N", count, decl);
@@ -18793,16 +18852,43 @@ module_state::read_inits (unsigned count
   dump.indent ();
 
   lazy_snum = ~0u;
+  int decomp_phase = 0;
   f

Re: [PATCH] testsuite: Fixes for test case pr117546.c

2025-01-21 Thread Dimitar Dimitrov
On Tue, Jan 21, 2025 at 04:28:59PM +0100, Georg-Johann Lay wrote:
> Am 18.01.25 um 19:30 schrieb Dimitar Dimitrov:
> > This test fails on AVR.
> > 
> > Debugging the test on x86 host, I noticed that u in function s sometimes
> > has value 16128.  The "t <= 3 * u" expression in the same function
> > results in signed integer overflow for targets with sizeof(int)=16.
> > 
> > Fix by requiring int32 effective target.
> 
> Thank you.  Though int32plus should be good enough?
> 
> Johann

Yes, you are correct. I'll resend.

Regards,
Dimitar


[PATCH] c++: Improve cp_parser_objc_message_args compile time

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 06:47:53PM +0100, Jakub Jelinek wrote:
> Indeed, I've just used what it was doing without thinking too much about it,
> sorry.
> addl_args = tree_cons (NULL_TREE, arg, addl_args);
> with addl_args = nreverse (addl_args); after the loop might be better,
> can test that incrementally.  sel_args is handled the same and should have
> the same treatment.

Here is incremental patch to do that.

So far verified on the 2 va-meth*.mm testcases (one without CPP_EMBED, one
with) that -fdump-tree-gimple is the same before/after the patch.

Ok for trunk if it passes bootstrap/regtest (or defer for GCC 16?)?

2025-01-21  Jakub Jelinek  

* parser.cc (cp_parser_objc_message_args): Use tree_cons with
nreverse at the end for both sel_args and addl_args, instead of
chainon with build_tree_list second argument.

--- gcc/cp/parser.cc.jj 2025-01-21 18:49:42.478969570 +0100
+++ gcc/cp/parser.cc2025-01-21 18:52:19.035786901 +0100
@@ -36724,9 +36724,7 @@ cp_parser_objc_message_args (cp_parser*
   cp_parser_require (parser, CPP_COLON, RT_COLON);
   arg = cp_parser_assignment_expression (parser);
 
-  sel_args
-   = chainon (sel_args,
-  build_tree_list (selector, arg));
+  sel_args = tree_cons (selector, arg, sel_args);
 
   token = cp_lexer_peek_token (parser->lexer);
 }
@@ -36741,14 +36739,12 @@ cp_parser_objc_message_args (cp_parser*
  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
  cp_lexer_consume_token (parser->lexer);
  for (tree argument : raw_data_range (raw_data))
-   addl_args = chainon (addl_args,
-build_tree_list (NULL_TREE, argument));
+   addl_args = tree_cons (NULL_TREE, argument, addl_args);
}
   else
{
  tree arg = cp_parser_assignment_expression (parser);
- addl_args = chainon (addl_args,
-  build_tree_list (NULL_TREE, arg));
+ addl_args = tree_cons (NULL_TREE, arg, addl_args);
}
 
   token = cp_lexer_peek_token (parser->lexer);
@@ -36760,7 +36756,7 @@ cp_parser_objc_message_args (cp_parser*
   return build_tree_list (error_mark_node, error_mark_node);
 }
 
-  return build_tree_list (sel_args, addl_args);
+  return build_tree_list (nreverse (sel_args), nreverse (addl_args));
 }
 
 /* Parse an Objective-C encode expression.


Jakub



Re: [GCC16/PATCH] combine: Better split point for `(and (not X))` [PR111949]

2025-01-21 Thread Andrew Pinski
On Tue, Jan 21, 2025 at 9:55 AM Jeff Law  wrote:
>
>
>
> On 1/20/25 9:38 PM, Andrew Pinski wrote:
> > In a similar way find_split_point handles `a+b*C`, this adds
> > the split point for `~a & b`.  This allows for better instruction
> > selection when the target has this instruction (aarch64, arm and x86_64
> > are examples which have this).
> >
> > Built and tested for aarch64-linux-gnu.
> >
> >   PR rtl-optimization/111949
> >
> > gcc/ChangeLog:
> >
> >   * combine.cc (find_split_point): Add a split point
> >   for `(and (not X) Y)` if not in the outer set already.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/bic-1.c: New test.
> gcc-16, unless there's a good tie-in to a regression?

I have not yet found a testcase which shows that generating an extra `and`
or not generating the `andn/bic` is a regression.

Thanks,
Andrew

>
> jeff
>


Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Martin Uecker
Am Dienstag, dem 21.01.2025 um 18:40 + schrieb Joseph Myers:
> On Tue, 21 Jan 2025, Qing Zhao wrote:
> 
> > So, even after we introduce the designator syntax for counted_by attribute, 
> >  arbitrary expressions as:
> > 
> > counted_by (.len1 + const)
> > counted_by (.len1 + .len2) 
> > 
> > Still cannot be supported? 
> 
> Indeed.  Attempting to use ".len1" inside something that looks like an 
> expression is fundamentally ambiguous, as I've noted in previous 
> discussions of such syntax. 

One could allow only a very restricted subset of expressions similar to
how  .len = x  in initializers could also be seen as a restricted subset of
the general expression syntax.

Allowing arbitrary expressions also raises other questions, e.g. what to
do about side effects, so my recommendation would be to only allow 
a restricted subset anyway.
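
For instance, limiting it to the forms already floated in this thread,
roughly .member optionally combined with an integer constant (the syntax
here is purely hypothetical):

  struct s {
    unsigned int len1;
    char *buf [[counted_by(.len1 + 4)]]; /* no general expressions, no side effects */
  };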


>  Consider
> 
>   counted_by ((struct s) { .len1 = x }.len1)
> 
> where you now have an ambiguity of whether ".len1 = x" is a designated 
> initializer for the compound literal of type struct s, or an assignment to 
> ".len1" within the structure referred to in counted_by.
> 
> > If not, how should we support simple expressions for counted_by attribute? 
> 
> First, I should point out that the proposed standard feature in this area 
> (which received along-the-lines support at the WG14 meeting in Strasbourg 
> last year, though with a very large number of issues with the proposed 
> wording that would need to be resolved for any such feature to go in, and 
> without support for another paper needed for the feature to be useful) was 
> intentionally limited; it didn't try for general expressions, just for 
> .IDENTIFIER, where IDENTIFIER named a *const-qualified* structure member 
> (that thus had to be set when the structure was initialized and not 
> modified thereafter, so avoiding various of the complications we have with 
> counted_by of defining exactly when the value applies in relation to 
> accesses to different structure members).
> 
> But if you want a less-limited feature that allows for expressions, you 
> need some syntax for referring to a structure member that's not ambiguous.  
> For example, some new notation such as __self__.len1 to refer to a member 
> of the closest enclosing structure definition when in counted_by (while 
> being invalid except in counted_by inside a structure definition).  
> (That's just one example of how you might define syntax that avoids 
> ambiguity.)

Couldn't you use the rule that .len refers to the closest enclosing structure
even without __self__ ?  This would then also disambiguate between designators
and other uses.

Martin

> 



Re: [PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 05:15:17PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 21, 2025 at 11:06:35AM -0500, Jason Merrill wrote:
> > > --- gcc/c-family/c-common.cc.jj   2025-01-20 18:00:35.667875671 +0100
> > > +++ gcc/c-family/c-common.cc  2025-01-21 09:29:23.955582581 +0100
> > > @@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
> > > return ret;
> > >   }
> > > -/* Get a new tree vector of the values of a CONSTRUCTOR.  */
> > > +/* Append to a tree vector the values of a CONSTRUCTOR.
> > > +   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
> > > +   should be initialized with make_tree_vector (); followed by
> > > +   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
> > > +   optionally followed by pushes of other elements (up to
> > > +   nelts - CONSTRUCTOR_NELTS (ctor)).  */
> > 
> > How about using v->allocated () instead of passing in nelts?
> 
> That is not necessarily the same.
> Both vec_safe_reserve (v, nelts) and vec_alloc (v, nelts); actually
> use exact=false, so they can allocate something larger (doesn't hurt
> e.g. if RAW_DATA_CST is small enough and fits), but if it used v->allocated
> (), it could overallocate from the overallocated size.
> So, e.g. even if nelts + RAW_DATA_LENGTH (x) - 1 <= v->allocated () - 
> v->length ()
> and thus we could just use a vector without reallocating,
> v->allocated () + RAW_DATA_LENGTH (x) - 1 could be too much.

On the other side, if one uses v = vec_alloc (v, nelts) then v->allocated ()
is guaranteed to be MAX (4, nelts) and if one uses v = make_tree_vector ();
vec_safe_reserve (v, nelts); then v->allocated () will be I think at most
MAX (24, nelts).
So perhaps not that big deal (at least if the function inside of it uses
unsigned nelts = v->allocated (); and then uses nelts rather than
v->allocated () in the loop.  Unless some new caller of the function uses
a vector reallocated more times.

Jakub



[committed] testsuite: Require int32plus for test case pr117546.c

2025-01-21 Thread Dimitar Dimitrov
Test case is valid even if size of int is more than 32 bits.

Pushed to trunk as obvious.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117546.c: Require effective target int32plus.

Cc: Georg-Johann Lay 
Cc: Sam James 
Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/torture/pr117546.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr117546.c 
b/gcc/testsuite/gcc.dg/torture/pr117546.c
index b60f877a906..a837d056451 100644
--- a/gcc/testsuite/gcc.dg/torture/pr117546.c
+++ b/gcc/testsuite/gcc.dg/torture/pr117546.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target int32 } } */
+/* { dg-do run { target int32plus } } */
 
 typedef struct {
   int a;
-- 
2.48.1



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Joseph Myers
On Tue, 21 Jan 2025, Martin Uecker wrote:

> Couldn't you use the rule that .len refers to the closest enclosing structure
> even without __self__ ?  This would then also disambiguate between designators
> and other uses.

Right now, an expression cannot start with '.', which provides the 
disambiguation between designators and expressions as initializers.  Note 
that for counted_by it's the closest enclosing *definition of a structure 
type*.  That's different from designators where the *type of an object 
being initialized by a brace-enclosed initializer list* is what's 
relevant.
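
A small illustration of the difference (using the existing flexible-array
form of counted_by, which takes a plain identifier):

  /* Designator: .len is resolved against the type of the object being
     initialized by the brace-enclosed initializer list.  */
  struct v { int len; int a[4]; };
  struct v obj = { .len = 4 };

  /* counted_by: the identifier is resolved against the closest enclosing
     definition of a structure type, here struct w.  */
  struct w {
    int len;
    int buf[] __attribute__ ((counted_by (len)));
  };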

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]

2025-01-21 Thread Marek Polacek
Ping.

On Fri, Jan 10, 2025 at 03:07:52PM -0500, Marek Polacek wrote:
> Ping.
> 
> On Fri, Dec 20, 2024 at 08:58:05AM -0500, Marek Polacek wrote:
> > Ping.
> > 
> > On Tue, Nov 26, 2024 at 05:35:50PM -0500, Marek Polacek wrote:
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > -- >8 --
> > > As the manual states, using "-fhardened -fstack-protector" will produce
> > > a warning because -fhardened wants to enable -fstack-protector-strong,
> > > but it can't since it's been overridden by the weaker -fstack-protector.
> > > 
> > > -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> > > logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> > > produce the same warning.  But we don't detect this combination, so
> > > this patch fixes it.  I also renamed a variable to better reflect its
> > > purpose.
> > > 
> > > Also don't check warn_hardened in process_command, since it's always
> > > true there.
> > > 
> > > Also tweak wording in the manual as Jon Wakely suggested on IRC.
> > > 
> > >   PR driver/117739
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * doc/invoke.texi: Tweak wording for -Whardened.
> > >   * gcc.cc (driver_handle_option): If -z lazy or -z norelro was
> > >   specified, don't enable linker hardening.
> > >   (process_command): Don't check warn_hardened.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * c-c++-common/fhardened-16.c: New test.
> > >   * c-c++-common/fhardened-17.c: New test.
> > >   * c-c++-common/fhardened-18.c: New test.
> > >   * c-c++-common/fhardened-19.c: New test.
> > >   * c-c++-common/fhardened-20.c: New test.
> > >   * c-c++-common/fhardened-21.c: New test.
> > > ---
> > >  gcc/doc/invoke.texi   |  4 ++--
> > >  gcc/gcc.cc| 20 ++--
> > >  gcc/testsuite/c-c++-common/fhardened-16.c |  5 +
> > >  gcc/testsuite/c-c++-common/fhardened-17.c |  5 +
> > >  gcc/testsuite/c-c++-common/fhardened-18.c |  5 +
> > >  gcc/testsuite/c-c++-common/fhardened-19.c |  5 +
> > >  gcc/testsuite/c-c++-common/fhardened-20.c |  5 +
> > >  gcc/testsuite/c-c++-common/fhardened-21.c |  5 +
> > >  8 files changed, 46 insertions(+), 8 deletions(-)
> > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-16.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-17.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-18.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-19.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-20.c
> > >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-21.c
> > > 
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index 346ac1369b8..371f723539c 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -7012,8 +7012,8 @@ This warning is enabled by @option{-Wall}.
> > >  Warn when @option{-fhardened} did not enable an option from its set (for
> > >  which see @option{-fhardened}).  For instance, using @option{-fhardened}
> > >  and @option{-fstack-protector} at the same time on the command line 
> > > causes
> > > -@option{-Whardened} to warn because @option{-fstack-protector-strong} is
> > > -not enabled by @option{-fhardened}.
> > > +@option{-Whardened} to warn because @option{-fstack-protector-strong} 
> > > will
> > > +not be enabled by @option{-fhardened}.
> > >  
> > >  This warning is enabled by default and has effect only when 
> > > @option{-fhardened}
> > >  is enabled.
> > > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > > index 92c92996401..d2718d263bb 100644
> > > --- a/gcc/gcc.cc
> > > +++ b/gcc/gcc.cc
> > > @@ -305,9 +305,10 @@ static size_t dumpdir_length = 0;
> > > driver added to dumpdir after dumpbase or linker output name.  */
> > >  static bool dumpdir_trailing_dash_added = false;
> > >  
> > > -/* True if -r, -shared, -pie, or -no-pie were specified on the command
> > > -   line.  */
> > > -static bool any_link_options_p;
> > > +/* True if -r, -shared, -pie, -no-pie, -z lazy, or -z norelro were
> > > +   specified on the command line, and therefore -fhardened should not
> > > +   add -z now/relro.  */
> > > +static bool avoid_linker_hardening_p;
> > >  
> > >  /* True if -static was specified on the command line.  */
> > >  static bool static_p;
> > > @@ -4434,10 +4435,17 @@ driver_handle_option (struct gcc_options *opts,
> > >   }
> > >   /* Record the part after the last comma.  */
> > >   add_infile (arg + prev, "*");
> > > + if (strcmp (arg, "-z,lazy") == 0 || strcmp (arg, "-z,norelro") == 0)
> > > +   avoid_linker_hardening_p = true;
> > >}
> > >do_save = false;
> > >break;
> > >  
> > > +case OPT_z:
> > > +  if (strcmp (arg, "lazy") == 0 || strcmp (arg, "norelro") == 0)
> > > + avoid_linker_hardening_p = true;
> > > +  break;
> > > +
> > >  case OPT_Xlinker:
> > >add_infile (arg, "*");
> > >do_save = false;
> > > @@ -4642,7 +4650,7 @

Re: [PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jason Merrill

On 1/21/25 10:52 AM, Jakub Jelinek wrote:

On Mon, Jan 20, 2025 at 05:14:33PM -0500, Jason Merrill wrote:

--- gcc/cp/call.cc.jj   2025-01-15 18:24:36.135503866 +0100
+++ gcc/cp/call.cc  2025-01-17 14:42:38.201643385 +0100
@@ -4258,11 +4258,30 @@ add_list_candidates (tree fns, tree firs
 /* Expand the CONSTRUCTOR into a new argument vec.  */


Maybe we could factor out a function called something like
append_ctor_to_tree_vector from the common code between this and
make_tree_vector_from_ctor?

But this is OK as is if you don't want to pursue that.


I had the previous patch already tested and wanted to avoid delaying
the large initializer speedup re-reversion any further, so I've committed
the patch as is.

Here is an incremental patch to factor that out.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-01-21  Jakub Jelinek  

gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.

--- gcc/c-family/c-common.h.jj  2025-01-17 11:29:33.139696380 +0100
+++ gcc/c-family/c-common.h 2025-01-21 09:30:09.520947570 +0100
@@ -1190,6 +1190,8 @@ extern vec *make_tree_vecto
  extern void release_tree_vector (vec *);
  extern vec *make_tree_vector_single (tree);
  extern vec *make_tree_vector_from_list (tree);
+extern vec *append_ctor_to_tree_vector (vec *,
+tree, unsigned);
  extern vec *make_tree_vector_from_ctor (tree);
  extern vec *make_tree_vector_copy (const vec *);
  
--- gcc/c-family/c-common.cc.jj	2025-01-20 18:00:35.667875671 +0100

+++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
@@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
return ret;
  }
  
-/* Get a new tree vector of the values of a CONSTRUCTOR.  */

+/* Append to a tree vector the values of a CONSTRUCTOR.
+   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
+   should be initialized with make_tree_vector (); followed by
+   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
+   optionally followed by pushes of other elements (up to
+   nelts - CONSTRUCTOR_NELTS (ctor)).  */


How about using v->allocated () instead of passing in nelts?


  vec *
-make_tree_vector_from_ctor (tree ctor)
+append_ctor_to_tree_vector (vec *v, tree ctor, unsigned nelts)
  {
-  vec *ret = make_tree_vector ();
-  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
-  vec_safe_reserve (ret, CONSTRUCTOR_NELTS (ctor));
for (unsigned i = 0; i < CONSTRUCTOR_NELTS (ctor); ++i)
  if (TREE_CODE (CONSTRUCTOR_ELT (ctor, i)->value) == RAW_DATA_CST)
{
tree raw_data = CONSTRUCTOR_ELT (ctor, i)->value;
nelts += RAW_DATA_LENGTH (raw_data) - 1;
-   vec_safe_reserve (ret, nelts - ret->length ());
+   vec_safe_reserve (v, nelts - v->length ());
if (TYPE_PRECISION (TREE_TYPE (raw_data)) > CHAR_BIT
|| TYPE_UNSIGNED (TREE_TYPE (raw_data)))
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-   RAW_DATA_UCHAR_ELT (raw_data, j)));
+   v->quick_push (build_int_cst (TREE_TYPE (raw_data),
+ RAW_DATA_UCHAR_ELT (raw_data, j)));
else
  for (unsigned j = 0; j < (unsigned) RAW_DATA_LENGTH (raw_data); ++j)
-   ret->quick_push (build_int_cst (TREE_TYPE (raw_data),
-   RAW_DATA_SCHAR_ELT (raw_data, j)));
+   v->quick_push (build_int_cst (TREE_TYPE (raw_data),
+ RAW_DATA_SCHAR_ELT (raw_data, j)));
}
  else
-  ret->quick_push (CONSTRUCTOR_ELT (ctor, i)->value);
-  return ret;
+  v->quick_push (CONSTRUCTOR_ELT (ctor, i)->value);
+  return v;
+}
+
+/* Get a new tree vector of the values of a CONSTRUCTOR.  */
+
+vec *
+make_tree_vector_from_ctor (tree ctor)
+{
+  vec *ret = make_tree_vector ();
+  unsigned nelts = CONSTRUCTOR_NELTS (ctor);
+  vec_safe_reserve (ret, nelts);
+  return append_ctor_to_tree_vector (ret, ctor, nelts);
  }
  
  /* Get a new tree vector which is a copy of an existing one.  */

--- gcc/cp/call.cc.jj   2025-01-21 09:11:58.214113697 +0100
+++ gcc/cp/call.cc  2025-01-21 09:32:29.382005137 +0100
@@ -4262,26 +4262,7 @@ add_list_candidates (tree fns, tree firs
vec_alloc (new_args, nelts);
for (unsigned i = 0; i < nart; ++i)
  new_args->quick_push ((*args)[i]);
-  for (unsigned i = 0; i < CONSTRUCTOR_NELTS (init_list); ++i)
-if (TREE_CODE (CONSTRUCTOR_ELT (init_list, i)->value) == RAW_DATA_CST)
-  {
-   tree raw_data = CONSTRUCTOR_ELT (init_list, i)->value;
-   nelts += RAW_DATA_LENGTH (raw_data) - 1;
-   vec_safe_reserve (new_args, nelts - new_

Re: [PATCH v2 01/12] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2025-01-21 Thread Jakub Jelinek
On Fri, Oct 18, 2024 at 11:52:22AM +0530, Tejas Belagod wrote:
> Currently poly-int type structures are passed by value to OpenMP runtime
> functions for shared clauses etc.  This patch improves on this by passing
> around poly-int structures by address to avoid copy-overhead.
> 
> gcc/ChangeLog
>   * omp-low.c (use_pointer_for_field): Use pointer if the OMP data
>   structure's field type is a poly-int.

I think I've acked this one earlier already.
It is still ok.

> ---
>  gcc/omp-low.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
> index da2051b0279..6b3853ed528 100644
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -466,7 +466,8 @@ static bool
>  use_pointer_for_field (tree decl, omp_context *shared_ctx)
>  {
>if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
> -  || TYPE_ATOMIC (TREE_TYPE (decl)))
> +  || TYPE_ATOMIC (TREE_TYPE (decl))
> +  || POLY_INT_CST_P (DECL_SIZE (decl)))
>  return true;
>  
>/* We can only use copy-in/copy-out semantics for shared variables
> -- 
> 2.25.1

Jakub



[committed] libphobos: Add MIPS64 implementation of fiber_switchContext [PR118584]

2025-01-21 Thread Iain Buclaw
Hi,

This patch adds a MIPS64 implementation of `fiber_switchContext',
replacing the generic implementation.  The `core.thread.fiber' module
already defines version=AsmExternal on mips64el-linux-gnuabi64 targets.
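
For context, the contract the assembly below implements, written as a C
prototype (the actual declaration lives on the D side of druntime):

  /* Save the callee-saved registers and the return address on the current
     stack, store the current stack pointer through OLDSP, then switch to
     NEWSP and reload the registers saved there.  */
  extern void fiber_switchContext (void **oldsp, void *newsp);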

Committed to mainline.

Regards,
Iain.

---
PR d/118584

libphobos/ChangeLog:

* libdruntime/config/mips/switchcontext.S: Add MIPS64 N64 ABI
implementation of fiber_switchContext.
---
 .../libdruntime/config/mips/switchcontext.S   | 78 +++
 1 file changed, 78 insertions(+)

diff --git a/libphobos/libdruntime/config/mips/switchcontext.S 
b/libphobos/libdruntime/config/mips/switchcontext.S
index d2fed64c78c..078ad0b3cce 100644
--- a/libphobos/libdruntime/config/mips/switchcontext.S
+++ b/libphobos/libdruntime/config/mips/switchcontext.S
@@ -99,4 +99,82 @@ fiber_switchContext:
 .end fiber_switchContext
 .size fiber_switchContext,.-fiber_switchContext
 
+#endif /* _MIPS_SIM == _ABIO32 */
+
+#if defined(__mips64) && _MIPS_SIM == _ABI64
+/*
+ * MIPS 64 ASM BITS
+ * $a0 - void** - ptr to old stack pointer
+ * $a1 - void*  - new stack pointer
+ *
+ */
+.text
+.globl fiber_switchContext
+.align 2
+.ent fiber_switchContext,0
+fiber_switchContext:
+.cfi_startproc
+daddiu $sp, $sp, -(10 * 8)
+
+// fp regs and return address are stored below the stack
+// because we don't want the GC to scan them.
+
+#ifdef __mips_hard_float
+#define BELOW (8 * 8 + 8)
+s.d  $f24, (0 * 8 - BELOW)($sp)
+s.d  $f25, (1 * 8 - BELOW)($sp)
+s.d  $f26, (2 * 8 - BELOW)($sp)
+s.d  $f27, (3 * 8 - BELOW)($sp)
+s.d  $f28, (4 * 8 - BELOW)($sp)
+s.d  $f29, (5 * 8 - BELOW)($sp)
+s.d  $f30, (6 * 8 - BELOW)($sp)
+s.d  $f31, (7 * 8 - BELOW)($sp)
+#endif
+sd $ra, -8($sp)
+
+sd  $s0, (0 * 8)($sp)
+sd  $s1, (1 * 8)($sp)
+sd  $s2, (2 * 8)($sp)
+sd  $s3, (3 * 8)($sp)
+sd  $s4, (4 * 8)($sp)
+sd  $s5, (5 * 8)($sp)
+sd  $s6, (6 * 8)($sp)
+sd  $s7, (7 * 8)($sp)
+sd  $gp, (8 * 8)($sp)
+sd  $fp, (9 * 8)($sp)
+
+// swap stack pointer
+sd   $sp, 0($a0)
+move $sp, $a1
+
+#ifdef __mips_hard_float
+l.d  $f24, (0 * 8 - BELOW)($sp)
+l.d  $f25, (1 * 8 - BELOW)($sp)
+l.d  $f26, (2 * 8 - BELOW)($sp)
+l.d  $f27, (3 * 8 - BELOW)($sp)
+l.d  $f28, (4 * 8 - BELOW)($sp)
+l.d  $f29, (5 * 8 - BELOW)($sp)
+l.d  $f30, (6 * 8 - BELOW)($sp)
+l.d  $f31, (7 * 8 - BELOW)($sp)
 #endif
+ld $ra, -8($sp)
+
+ld $s0, (0 * 8)($sp)
+ld $s1, (1 * 8)($sp)
+ld $s2, (2 * 8)($sp)
+ld $s3, (3 * 8)($sp)
+ld $s4, (4 * 8)($sp)
+ld $s5, (5 * 8)($sp)
+ld $s6, (6 * 8)($sp)
+ld $s7, (7 * 8)($sp)
+ld $gp, (8 * 8)($sp)
+ld $fp, (9 * 8)($sp)
+
+daddiu $sp, $sp, (10 * 8)
+
+jr $ra // return
+.cfi_endproc
+.end fiber_switchContext
+.size fiber_switchContext,.-fiber_switchContext
+
+#endif /* defined(__mips64) && _MIPS_SIM == _ABI64 */
-- 
2.43.0



Re: [PATCH v5] RISC-V: Add a new constraint to ensure that the vl of XTheadVector does not get a non-zero immediate

2025-01-21 Thread Jeff Law




On 1/21/25 6:11 AM, Jin Ma wrote:

Although we have handled the vl of XTheadVector correctly in the
expand phase and predicates, the results show that the work is
still insufficient.

In the curr_insn_transform function, the insn is transformed from:
(insn 69 67 225 12 (set (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] 
A32])
 (if_then_else:RVVM8SF (unspec:RVVMF4BI [
 (const_vector:RVVMF4BI repeat [
 (const_int 1 [0x1])
 ])
 (reg:DI 209)
 (const_int 0 [0])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (reg/v:RVVM8SF 143 [ _xx ])
 (mem:RVVM8SF (reg/f:DI 218 [ _77 ]) [0  S[128, 128] A32])))
  (expr_list:REG_DEAD (reg/v:RVVM8SF 143 [ _xx ])
 (nil)))
to
(insn 69 284 225 11 (set (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0 
 S[128, 128] A32])
 (if_then_else:RVVM8SF (unspec:RVVMF4BI [
 (const_vector:RVVMF4BI repeat [
 (const_int 1 [0x1])
 ])
 (const_int 1 [0x1])
 (const_int 0 [0])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (reg/v:RVVM8SF 104 v8 [orig:143 _xx ] [143])
 (mem:RVVM8SF (reg/f:DI 18 s2 [orig:218 _77 ] [218]) [0  S[128, 
128] A32])))
  (nil))

Looking at the log for the reload pass, it is found that "Changing pseudo 209 in
operand 3 of insn 69 on equiv 0x1".
It converts the vl operand in the insn from the expected register (reg:DI 209)
to the constant 1 (const_int 1 [0x1]).

This conversion occurs because, although the predicate for the vl operand is
restricted by "vector_length_operand" in the pattern, the constraint is still
"rK", which allows the transformation.

The issue is that changing the "rK" constraint to "rJ" for the vl operand in
the pattern would prevent this conversion, but unfortunately that would
conflict with RVV (RISC-V Vector Extension).

Based on the review's recommendations, the best solution for now is to create
a new constraint to distinguish between RVV and XTheadVector, which is exactly
what this patch does.

PR 116593

gcc/ChangeLog:

* config/riscv/constraints.md (vl): New.
* config/riscv/thead-vector.md: Replacing rK with rvl.
* config/riscv/vector.md: Likewise.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Enable testsuite of XTheadVector.
* g++.target/riscv/rvv/xtheadvector/pr116593.C: New test.

I've pushed this to the trunk as well.

Thanks,
jeff



Re: [patch,avr] Tweak some 16-bit shifts using MUL.

2025-01-21 Thread Denis Chertykov
Georg-Johann Lay  writes:

> u16 << 5 and u16 << 6 can be tweaked by using MUL instructions.
> Benefit is a better speed ratio with -Os and smaller size with -O2.
>
> No new regressions.
>
> Ok for trunk?

Ok. Please apply.

Denis.


Re: [PATCH 3/4] RISC-V: Add .note.gnu.property for ZICFILP and ZICFISS ISA extension

2025-01-21 Thread Jeff Law




On 1/21/25 10:15 AM, Robin Dapp wrote:

I'm going to push the attached as obvious if my local test shows
no issues.

Yea, please do.  Thanks.

jeff



Re: [PATCH v2] RISC-V: Enable and adjust the testsuite for XTheadVector.

2025-01-21 Thread Jeff Law




On 1/21/25 5:52 AM, Jin Ma wrote:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Enable testsuite of
XTheadVector.
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly.
* gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.

Thanks.  I've pushed this to the trunk.
jeff



Re: [PATCH] c++: Handle CPP_EMBED in cp_parser_objc_message_args [PR118586]

2025-01-21 Thread Jakub Jelinek
On Tue, Jan 21, 2025 at 12:04:36PM -0500, Jason Merrill wrote:
> > --- gcc/cp/parser.cc.jj 2025-01-17 19:27:34.052140136 +0100
> > +++ gcc/cp/parser.cc2025-01-20 20:16:23.082876036 +0100
> > @@ -36632,14 +36632,22 @@ cp_parser_objc_message_args (cp_parser*
> > /* Handle non-selector arguments, if any. */
> > while (token->type == CPP_COMMA)
> >   {
> > -  tree arg;
> > -
> > cp_lexer_consume_token (parser->lexer);
> > -  arg = cp_parser_assignment_expression (parser);
> > -  addl_args
> > -   = chainon (addl_args,
> > -  build_tree_list (NULL_TREE, arg));
> > +  if (cp_lexer_next_token_is (parser->lexer, CPP_EMBED))
> > +   {
> > + tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
> > + cp_lexer_consume_token (parser->lexer);
> > + for (tree argument : raw_data_range (raw_data))
> > +   addl_args = chainon (addl_args,
> > +build_tree_list (NULL_TREE, argument));
> 
> chainon of each byte of an #embed looks pretty inefficient, walking the full
> list for each new element.  But OK.

Indeed, I've just used what it was doing without thinking too much about it,
sorry.
addl_args = tree_cons (NULL_TREE, arg, addl_args);
with addl_args = nreverse (addl_args); after the loop might be better,
can test that incrementally.  sel_args is handled the same and should have
the same treatment.

Jakub



Re: [GCC16/PATCH] combine: Better split point for `(and (not X))` [PR111949]

2025-01-21 Thread Jeff Law




On 1/20/25 9:38 PM, Andrew Pinski wrote:

In a similar way find_split_point handles `a+b*C`, this adds
the split point for `~a & b`.  This allows for better instruction
selection when the target has this instruction (aarch64, arm and x86_64
are examples which have this).
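
For reference, the kind of source this is aimed at (the shape of the pattern
only, not necessarily the new testcase):

  unsigned long
  bic_like (unsigned long a, unsigned long b)   /* illustrative name */
  {
    return ~a & b;   /* BIC on aarch64/arm, ANDN on x86_64 with BMI */
  }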

Built and tested for aarch64-linux-gnu.

PR rtl-optimization/111949

gcc/ChangeLog:

* combine.cc (find_split_point): Add a split point
for `(and (not X) Y)` if not in the outer set already.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/bic-1.c: New test.

gcc-16, unless there's a good tie-in to a regression?

jeff



Calling Convention Semantics for FCSR (was Re: gcc mode switching issue)

2025-01-21 Thread Vineet Gupta
On 1/20/25 19:07, Li, Pan2 wrote:
> Agree, the mode-switch will take care of the frm when meet a call (covered by 
> testcase already).
>
>5   │
>6   │ extern size_t normalize_vl_1 (size_t vl);
>7   │ extern size_t normalize_vl_2 (size_t vl);
>8   │
>9   │ vfloat32m1_t
>   10   │ test_float_point_dynamic_frm (vfloat32m1_t op1, vfloat32m1_t op2,
>   11   │  unsigned count, size_t vl)
>   12   │ {
>   13   │   vfloat32m1_t result = op1;
>   14   │
>   15   │   for (unsigned i = 0; i < count; i++)
>   16   │ {
>   17   │   if (i % 3 == 0)
>   18   │ {
>   19   │   result = __riscv_vfadd_vv_f32m1 (op1, result, vl);
>   20   │   vl = normalize_vl_1 (vl);
>   21   │ }
>   22   │   else
>   23   │ {
>   24   │   result = __riscv_vfadd_vv_f32m1_rm (result, op2, 1, vl);
>   25   │   vl = normalize_vl_2 (vl);
>   26   │ }
>   27   │ }
>   28   │
>   29   │   return result;
>   30   │ }
>
> .L12:
>         csrr    a5,vlenb
>         add     a5,a5,sp
>         vl1re32.v   v1,0(a5)
>         vsetvli zero,a1,e32,m1,ta,ma
>         addiw   s0,s0,1
>         vfadd.vv    v8,v1,v8    // Do not pollute frm, nothing need to do here
>         vs1r.v  v8,0(sp)
>         call    normalize_vl_1
>         vl1re32.v   v8,0(sp)
>         frrm    a4
>         mv      a1,a0
>         beq     s3,s0,.L8
> .L5:
>         mulw    a5,s0,s2
>         mv      a0,a1
>         bleu    a5,s1,.L12
>         fsrmi   1
>         csrr    a5,vlenb
>         slli    a5,a5,1
>         add     a5,a5,sp
>         vl1re32.v   v1,0(a5)
>         vsetvli zero,a1,e32,m1,ta,ma
>         vfadd.vv    v8,v8,v1    // Pollute frm, will restore frm before call
>         vs1r.v  v8,0(sp)
>         fsrm    a4
>         call    normalize_vl_2
>         addiw   s0,s0,1
>         vl1re32.v   v8,0(sp)
>         frrm    a4
>         mv      a1,a0
>         bne     s3,s0,.L5
>
> while for the llround autovec case, it will also restore frm before leaving
> the function.
>
>    8   │ #define TEST_UNARY_CALL_CVT(TYPE_IN, TYPE_OUT, CALL) \
>    9   │   void test_##TYPE_IN##_##TYPE_OUT##_##CALL (\
>   10   │ TYPE_OUT *out, TYPE_IN *in, unsigned count)  \
>   11   │   {  \
>   12   │ for (unsigned i = 0; i < count; i++) \
>   13   │   out[i] = CALL (in[i]); \
>   14   │   }
>
> TEST_UNARY_CALL_CVT (double, int64_t, __builtin_llround)
>
> test_double_int64_t___builtin_llround:
>         frrm    a3
>         beq     a2,zero,.L8
>         fsrmi   4
>         slli    a2,a2,32
>         srli    a2,a2,32
> .L3:

Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Joseph Myers
On Tue, 21 Jan 2025, Qing Zhao wrote:

> So, even after we introduce the designator syntax for counted_by attribute,  
> arbitrary expressions as:
> 
> counted_by (.len1 + const)
> counted_by (.len1 + .len2) 
> 
> Still cannot be supported? 

Indeed.  Attempting to use ".len1" inside something that looks like an 
expression is fundamentally ambiguous, as I've noted in previous 
discussions of such syntax.  Consider

  counted_by ((struct s) { .len1 = x }.len1)

where you now have an ambiguity of whether ".len1 = x" is a designated 
initializer for the compound literal of type struct s, or an assignment to 
".len1" within the structure referred to in counted_by.

> If not, how should we support simple expressions for counted_by attribute? 

First, I should point out that the proposed standard feature in this area 
(which received along-the-lines support at the WG14 meeting in Strasbourg 
last year, though with a very large number of issues with the proposed 
wording that would need to be resolved for any such feature to go in, and 
without support for another paper needed for the feature to be useful) was 
intentionally limited; it didn't try for general expressions, just for 
.IDENTIFIER, where IDENTIFIER named a *const-qualified* structure member 
(that thus had to be set when the structure was initialized and not 
modified thereafter, so avoiding various of the complications we have with 
counted_by of defining exactly when the value applies in relation to 
accesses to different structure members).

But if you want a less-limited feature that allows for expressions, you 
need some syntax for referring to a structure member that's not ambiguous.  
For example, some new notation such as __self__.len1 to refer to a member 
of the closest enclosing structure definition when in counted_by (while 
being invalid except in counted_by inside a structure definition).  
(That's just one example of how you might define syntax that avoids 
ambiguity.)
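
To make that concrete, such hypothetical syntax might read as follows (purely
illustrative; neither the spelling nor the expression form exists today):

  struct s {
    int len1;
    /* __self__.len1 could only name the member of the enclosing struct s,
       so it cannot clash with designated-initializer syntax.  */
    int *p __attribute__ ((counted_by (__self__.len1 + 4)));
  };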

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [GCC16 stage 1][RFC][PATCH 0/3]extend "counted_by" attribute to pointer fields of structures

2025-01-21 Thread Joseph Myers
On Tue, 21 Jan 2025, Qing Zhao wrote:

> > On Jan 20, 2025, at 16:19, Joseph Myers  wrote:
> > 
> > On Sat, 18 Jan 2025, Kees Cook wrote:
> > 
> >> Gaining access to global variables is another gap Linux has -- e.g. we
> >> have arrays that are sized by the global number-of-cpus variable. :)
> > 
> > Note that it's already defined that counted_by takes an identifier for a 
> > structure member (i.e. not an expression, not following the name lookup 
> > rules used in expressions).  So some different syntax that only takes an 
> > expression and not an identifier interpreted as a structure member would 
> > be needed for anything that allows use of a global variable.
> 
> If we need to add such syntax for counted_by (i.e. an expression), can we
> still keep the same attribute name, or do we need a new attribute name for
> the new syntax?

If you want a different syntax that's potentially incompatible or 
ambiguous with cases currently accepted, that would indicate having a new 
attribute name for the new syntax.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: Calling Convention Semantics for FCSR (was Re: gcc mode switching issue)

2025-01-21 Thread Joseph Myers
On Tue, 21 Jan 2025, Vineet Gupta wrote:

> Silly question: what exactly is the procedure calling convention rule for
> FCSR/FRM?  Is it a caller-saved or a callee-saved register?
> The psABI CC doc is not explicit in those terms at least [1]:
> 
> |   "The Floating-Point Control and Status Register (fcsr) must have thread
> |   storage duration in accordance with C11 section 7.6 "Floating-point
> |   environment"."
> 
> Per your llround snippet #2 it seems like callee-saved (the function restores
> the mode before it returns), but then in snippet #1 at the top, why does it
> need to save the value before a function call; can't the callee just restore
> it back?
> I'm surely missing something here.

In the 32-bit Power Architecture ABI we defined rounding mode and similar 
bits as "limited-access" since, being more like a thread-local variable 
than a normal register, they don't fit well into the caller-save / 
callee-save model.

https://www.polyomino.org.uk/publications/2011/Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf#page=38
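
In C terms, the "limited-access" discipline matches the <fenv.h> contract:
code that changes the dynamic rounding mode puts it back before giving up
control, as in this sketch (the function itself is illustrative):

  #include <fenv.h>
  #pragma STDC FENV_ACCESS ON

  double
  scale_toward_zero (double x)
  {
    int old = fegetround ();
    fesetround (FE_TOWARDZERO);   /* temporarily change the dynamic mode */
    double r = x * 3.0;
    fesetround (old);             /* restore the caller's mode */
    return r;
  }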

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] c++: Introduce append_ctor_to_tree_vector

2025-01-21 Thread Jason Merrill

On 1/21/25 11:15 AM, Jakub Jelinek wrote:

On Tue, Jan 21, 2025 at 11:06:35AM -0500, Jason Merrill wrote:

--- gcc/c-family/c-common.cc.jj 2025-01-20 18:00:35.667875671 +0100
+++ gcc/c-family/c-common.cc2025-01-21 09:29:23.955582581 +0100
@@ -9010,33 +9010,46 @@ make_tree_vector_from_list (tree list)
 return ret;
   }
-/* Get a new tree vector of the values of a CONSTRUCTOR.  */
+/* Append to a tree vector the values of a CONSTRUCTOR.
+   nelts should be at least CONSTRUCTOR_NELTS (ctor) and v
+   should be initialized with make_tree_vector (); followed by
+   vec_safe_reserve (v, nelts); or equivalently vec_alloc (v, nelts);
+   optionally followed by pushes of other elements (up to
+   nelts - CONSTRUCTOR_NELTS (ctor)).  */


How about using v->allocated () instead of passing in nelts?


That is not necessarily the same.


Yeah, it occurred to me later that it doesn't matter what the original 
length or capacity of the vector is, we want to make sure there's enough 
room for the elements of the ctor after whatever's already there.  So we 
want nelts to start as CONSTRUCTOR_NELTS, and then vec_safe_reserve 
nelts - i.


Does that make sense to you?

Jason



Re: [PATCH] c++: Improve cp_parser_objc_messsage_args compile time

2025-01-21 Thread Jason Merrill

On 1/21/25 1:02 PM, Jakub Jelinek wrote:

On Tue, Jan 21, 2025 at 06:47:53PM +0100, Jakub Jelinek wrote:

Indeed, I've just used what it was doing without thinking too much about it,
sorry.
addl_args = tree_cons (NULL_TREE, arg, addl_args);
with addl_args = nreverse (addl_args); after the loop might be better,
can test that incrementally.  sel_args is handled the same and should have
the same treatment.


Here is incremental patch to do that.

So far verified on the 2 va-meth*.mm testcases (one without CPP_EMBED, one
with) that -fdump-tree-gimple is the same before/after the patch.

Ok for trunk if it passes bootstrap/regtest (or defer for GCC 16?)?


OK.


2025-01-21  Jakub Jelinek  

* parser.cc (cp_parser_objc_message_args): Use tree_cons with
nreverse at the end for both sel_args and addl_args, instead of
chainon with build_tree_list second argument.

--- gcc/cp/parser.cc.jj 2025-01-21 18:49:42.478969570 +0100
+++ gcc/cp/parser.cc2025-01-21 18:52:19.035786901 +0100
@@ -36724,9 +36724,7 @@ cp_parser_objc_message_args (cp_parser*
cp_parser_require (parser, CPP_COLON, RT_COLON);
arg = cp_parser_assignment_expression (parser);
  
-  sel_args

-   = chainon (sel_args,
-  build_tree_list (selector, arg));
+  sel_args = tree_cons (selector, arg, sel_args);
  
token = cp_lexer_peek_token (parser->lexer);

  }
@@ -36741,14 +36739,12 @@ cp_parser_objc_message_args (cp_parser*
  tree raw_data = cp_lexer_peek_token (parser->lexer)->u.value;
  cp_lexer_consume_token (parser->lexer);
  for (tree argument : raw_data_range (raw_data))
-   addl_args = chainon (addl_args,
-build_tree_list (NULL_TREE, argument));
+   addl_args = tree_cons (NULL_TREE, argument, addl_args);
}
else
{
  tree arg = cp_parser_assignment_expression (parser);
- addl_args = chainon (addl_args,
-  build_tree_list (NULL_TREE, arg));
+ addl_args = tree_cons (NULL_TREE, arg, addl_args);
}
  
token = cp_lexer_peek_token (parser->lexer);

@@ -36760,7 +36756,7 @@ cp_parser_objc_message_args (cp_parser*
return build_tree_list (error_mark_node, error_mark_node);
  }
  
-  return build_tree_list (sel_args, addl_args);

+  return build_tree_list (nreverse (sel_args), nreverse (addl_args));
  }
  
  /* Parse an Objective-C encode expression.



Jakub





[PATCH v6] AArch64: Add LUTI ACLE for SVE2

2025-01-21 Thread saurabh.jha

This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.

LUTI instructions are used for efficient table lookups with 2-bit
or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
the low 128 bits of the table vector using packed 2-bit indices,
while LUTI4 can read from the low 128 or 256 bits of the table
vector or from two table vectors using packed 4-bit indices.
These instructions fill the destination vector by copying elements
indexed by segments of the source vector, selected by the vector
segment index.
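
As a usage sketch (assuming the ACLE spelling svluti2_lane_u8 and a
table/indices/segment-index operand order; the authoritative prototypes are
the ones added by this patch):

  #include <arm_sve.h>

  /* Look up the packed 2-bit indices in INDICES into the low 128 bits of
     TABLE, using vector segment 0.  */
  svuint8_t
  lut2 (svuint8_t table, svuint8_t indices)
  {
    return svluti2_lane_u8 (table, indices, 0);
  }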

The changes include the addition of a new AArch64 option
extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
for the new LUTI instruction shapes, and implementations of the
svluti2 and svluti4 builtins.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): Add new flag TARGET_LUT.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct luti_base): Shape for lut intrinsics.
(SHAPE): Specializations for lut shapes for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-shapes.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svluti_lane_impl): Define expand for lut intrinsics.
(FUNCTION): Define expand for lut intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.def
(REQUIRED_EXTENSIONS): Declare lut intrinsics behind lut flag.
(svluti2_lane): Define intrinsic behind flag.
(svluti4_lane): Define intrinsic behind flag.
* config/aarch64/aarch64-sve-builtins-sve2.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_bh_data): New type for byte and halfword.
(bh_data): Type array for byte and halfword.
(h_data): Type array for halfword.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_luti): Instruction patterns for
lut intrinsics.
* config/aarch64/iterators.md: Iterators and attributes for lut
intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: New test
macro.
* lib/target-supports.exp: Add lut flag to the for loop.
* gcc.target/aarch64/sve/acle/general-c/lut_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_2.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_3.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_4.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.

---

This is a respin of
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674085.html

The only change from the previous version is the addition of
ChangeLog in the commit message.

Ok for master?

Thanks,
Saurabh
---
 gcc/config/aarch64/aarch64-c.cc   |   2 +
 .../aarch64/aarch64-sve-builtins-shapes.cc|  46 +++
 .../aarch64/aarch64-sve-builtins-shapes.h |   2 +
 .../aarch64/aarch64-sve-builtins-sve2.cc  |  17 ++
 .../aarch64/aarch64-sve-builtins-sve2.def |   8 +
 .../aarch64/aarch64-sve-builtins-sve2.h   |   2 +
 gcc/config/aarch64/aarch64-sve-builtins.cc|   8 +-
 gcc/config/aarch64/aarch64-sve2.md|  33 +++
 gcc/config/aarch64/iterators.md   |   7 +
 .../aarch64/sve/acle/asm/test_sve_acle.h  |  16 ++
 .../aarch64/sve/acle/general-c/lut_1.c|  34 +++
 .../aarch64/sve/acle/general-c/lut_2.c|  11 +
 .../aarch64/sve/acle/general-c/lut_3.c|  92 ++
 .../aarch64/sve/acle/general-c/lut_4.c| 262 ++
 .../aarch64/sve2/acle/asm/luti2_bf16.c|  50 
 .../aarch64/sve2/acle/asm/luti2_f16.c |  50 
 .../aarch64/sve2/acle/asm/luti2_s16.c |  50 
 .../aarch64/sve2/acle/asm/luti2_s8.c  |  50 
 .../aarch64/sve2/acle/asm/luti2_u16.c |  50 
 .../aarch64/sve2/acle/asm/luti2_u8.c  |  50 
 ..

Re: [PATCH] [ifcombine] avoid dropping tree_could_trap_p [PR118514]

2025-01-21 Thread Alexandre Oliva
On Jan 21, 2025, Richard Biener  wrote:

> you can use bit_field_size () and bit_field_offset () unconditionally,

Nice, thanks!

> Now, we don't have the same handling on BIT_FIELD_REFs but it
> seems it's enough to apply the check to those with a DECL as
> object to operate on.

I doubt that will be enough.  I'm pretty sure the cases I saw in libgnat
in which BIT_FIELD_REF changed could_trap status, compared with the
preexisting convert-and-access-field it replaced, were not DECLs, but
dereferences.  But I'll check and report back.  (I'll be AFK for most of
the day, alas)

> Note we assume that x.a[i] (variable index) may trap, handling other cases
> where variable size/offset is involved in the same conservative manner looks
> reasomable.

*nod*.  So arranging for BIT_FIELD_REFs to also trap would be ok, and it
wouldn't prevent the optimization if the BIT_FIELD_REF will have the
same trap status.

>> Yeah.  It works.  But then I figured we could take a safe step further
>> and ended up with what I posted.

> What about that short-circuit argument?  How do we end up combining
> two refs that could trap and emit that to a block that's no longer guarded
> by the original separate conditions?

We don't do that, we emit it to the same block as the original
reference.  But if the could_trap status is different (say, because the
original reference could trap while the replacement doesn't), other
optimizers may move it to an unsafe spot.

Now, it occurs to me that, if only one of the references could trap, and
we merge them and insert the merged load next to the one that doesn't,
we have an even more subtle variant of the error at hand.  I'm not sure
how to trigger it, because alignment and size seem to guarantee we won't
combine accesses with different trapping properties (though one could
presumably be marked as non-trapping if it is dominated by another
access to the same word), but I should probably guard against this.
Will do.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re:[pushed] [PATCH v2 0/2] Implement target attribute and pragma.

2025-01-21 Thread Lulu Cheng

Pushed to r15-7092 and r15-7093.

On 2025/1/20 at 5:54 PM, Lulu Cheng wrote:

Currently, the following items are supported:

 __attribute__ ((target ("{no-}strict-align")))
 __attribute__ ((target ("cmodel=")))
 __attribute__ ((target ("arch=")))
 __attribute__ ((target ("tune=")))
 __attribute__ ((target ("{no-}lsx")))
 __attribute__ ((target ("{no-}lasx")))
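
For example, with these in place one can write (a usage sketch; the specific
option values are illustrative):

  __attribute__ ((target ("no-strict-align")))
  int load_unaligned (int *p) { return *p; }

  #pragma GCC push_options
  #pragma GCC target ("lasx")
  /* functions here are compiled as if -mlasx were in effect */
  #pragma GCC pop_options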

v1 -> v2:
1. Correct clerical errors as follows:
--- a/gcc/config/loongarch/loongarch-target-attr.cc
+++ b/gcc/config/loongarch/loongarch-target-attr.cc
@@ -120,7 +120,7 @@ loongarch_handle_option (struct gcc_options *opts,
  
  case OPT_mlasx:

opts->x_la_opt_simd = val ? ISA_EXT_SIMD_LASX
-   : (la_opt_simd == ISA_EXT_SIMD_LSX || la_opt_simd == ISA_EXT_SIMD_LSX
+   : (la_opt_simd == ISA_EXT_SIMD_LASX || la_opt_simd == ISA_EXT_SIMD_LSX
2. Add example to doc.

Lulu Cheng (2):
   LoongArch: Implement target attribute.
   LoongArch: Implement target pragma.

  gcc/attr-urls.def |   6 +
  gcc/config.gcc|   2 +-
  gcc/config/loongarch/loongarch-protos.h   |   5 +
  gcc/config/loongarch/loongarch-target-attr.cc | 472 ++
  gcc/config/loongarch/loongarch.cc |  41 +-
  gcc/config/loongarch/loongarch.h  |   2 +
  gcc/config/loongarch/t-loongarch  |   6 +
  gcc/doc/extend.texi   |  88 
  .../gcc.target/loongarch/arch-func-attr-1.c   |  20 +
  .../gcc.target/loongarch/arch-pragma-attr-1.c |   7 +
  .../loongarch/attr-check-error-message.c  |  30 ++
  .../gcc.target/loongarch/cmodel-func-attr-1.c |  21 +
  .../loongarch/cmodel-pragma-attr-1.c  |   7 +
  .../gcc.target/loongarch/lasx-func-attr-1.c   |  19 +
  .../gcc.target/loongarch/lasx-func-attr-2.c   |  12 +
  .../gcc.target/loongarch/lasx-pragma-attr-1.c |   7 +
  .../gcc.target/loongarch/lasx-pragma-attr-2.c |  12 +
  .../gcc.target/loongarch/lsx-func-attr-1.c|  19 +
  .../gcc.target/loongarch/lsx-func-attr-2.c|  12 +
  .../gcc.target/loongarch/lsx-pragma-attr-1.c  |   7 +
  .../gcc.target/loongarch/lsx-pragma-attr-2.c  |  12 +
  .../gcc.target/loongarch/pragma-push-pop.c|  22 +
  .../loongarch/strict_align-func-attr-1.c  |  21 +
  .../loongarch/strict_align-func-attr-2.c  |  21 +
  .../loongarch/strict_align-pragma-attr-1.c|   7 +
  .../loongarch/strict_align-pragma-attr-2.c|   7 +
  .../gcc.target/loongarch/vector-func-attr-1.c |  19 +
  .../loongarch/vector-pragma-attr-1.c  |   7 +
  28 files changed, 901 insertions(+), 10 deletions(-)
  create mode 100644 gcc/config/loongarch/loongarch-target-attr.cc
  create mode 100644 gcc/testsuite/gcc.target/loongarch/arch-func-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/arch-pragma-attr-1.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/attr-check-error-message.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-func-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-pragma-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lasx-func-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lasx-func-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lasx-pragma-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lasx-pragma-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lsx-func-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lsx-func-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lsx-pragma-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/lsx-pragma-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pragma-push-pop.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/strict_align-func-attr-1.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/strict_align-func-attr-2.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/strict_align-pragma-attr-1.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/strict_align-pragma-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector-func-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector-pragma-attr-1.c





Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Lulu Cheng



On 2025/1/21 at 6:05 PM, Xi Ruoyao wrote:

On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:

On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:

On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:

On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
/* snip */

    ;; This code iterator allows unsigned and signed division to be generated
    ;; from the same template.
@@ -3083,39 +3084,6 @@ (define_expand "rotl3"
  }
  });

-;; The following templates were added to generate "bstrpick.d + alsl.d"

-;; instruction pairs.
-;; It is required that the values of const_immalsl_operand and
-;; immediate_operand must have the following correspondence:
-;;
-;; (immediate_operand >> const_immalsl_operand) == 0x
-
-(define_insn "zero_extend_ashift"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-      (match_operand 2 "const_immalsl_operand" ""))
-   (match_operand 3 "immediate_operand" "")))]
-  "TARGET_64BIT
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x)"
-  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,$r0,%2"
-  [(set_attr "type" "arith")
-   (set_attr "mode" "DI")
-   (set_attr "insn_count" "2")])
-
-(define_insn "bstrpick_alsl_paired"
-  [(set (match_operand:DI 0 "register_operand" "=&r")
-   (plus:DI
-     (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-    (match_operand 2 "const_immalsl_operand" ""))
-     (match_operand 3 "immediate_operand" ""))
-     (match_operand:DI 4 "register_operand" "r")))]
-  "TARGET_64BIT
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0x)"
-  "bstrpick.d\t%0,%1,31,0\n\talsl.d\t%0,%0,%4,%2"
-  [(set_attr "type" "arith")
-   (set_attr "mode" "DI")
-   (set_attr "insn_count" "2")])
-

Hi,

In LoongArch, the microarchitecture performs instruction fusion on
bstrpick.d + alsl.d.

This modification may cause the two instructions to no longer be adjacent.

So I think these two templates cannot be deleted. I will test the impact
of this patch on SPEC today.

Oops.  I guess we can salvage it with TARGET_SCHED_MACRO_FUSION_P and
TARGET_SCHED_MACRO_FUSION_PAIR_P.  And I'd like to know more details:

1. Is the fusion applying to all bstrpick.d + alsl.d, or only bstrpick.d
rd, rs, 31, 0?
2. Is the fusion also applying to bstrpick.d + slli.d, or we really have
to write the strange "alsl.d rd, rs, r0, shamt" instruction?


Currently, instruction fusion can only be done in the following situation:

bstrpick.d rd, rs, 31, 0 + alsl.d rd1,rj,rk,shamt and "rd = rj"

So the easiest solution seems just adding the two patterns back, I'm
bootstrapping and regtesting the patch attached.


It seems to be more formal through TARGET_SCHED_MACRO_FUSION_P and

TARGET_SCHED_MACRO_FUSION_PAIR_P. I found the spec test item that generated

this instruction pair. I implemented these two hooks to see if it works.



Re: [PATCH] c++: Don't ICE in build_class_member_access_expr during error recovery [PR118225]

2025-01-21 Thread Simon Martin
Hi Jason,

On 20 Jan 2025, at 22:50, Jason Merrill wrote:

> On 1/4/25 10:13 AM, Simon Martin wrote:
>> The invalid case in this PR trips on an assertion in
>> build_class_member_access_expr that build_base_path would never 
>> return
>> an error_mark_node, which is actually incorrect if the object 
>> involves a
>> tree with an error_mark_node DECL_INITIAL, like here.
>>
>> This patch simply removes the assertion, even though it has been here
>> for 22+ years (r0-44513-g50ad96428042fa). An alternative would be to
>> assert that object != error_mark_node || seen_error (), but that'd be
>> virtually not asserting anything IMO.
>
> That is an important difference: it asserts that if we run into 
> trouble, we've actually given an error message.  Silently ignoring 
> error_mark_node frequently leads to wrong-code bugs.
That makes sense, thanks for the context.

> OK with that change.
Thanks, applied as r15-7096-g4e4c378ac1f923.

Simon


Re: [PATCH v2 2/2] LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]

2025-01-21 Thread Xi Ruoyao
On Tue, 2025-01-21 at 21:23 +0800, Xi Ruoyao wrote:

/* snip */

> > It seems to be more formal through TARGET_SCHED_MACRO_FUSION_P and
> > 
> > TARGET_SCHED_MACRO_FUSION_PAIR_P. I found the spec test item that
> > generated
> > 
> > this instruction pair. I implemented these two hooks to see if it
> > works.
> 
> And another problem is w/o bstrpick_alsl_paired some test cases are
> regressed, like:
> 
> struct Pair { unsigned long a, b; };
> 
> struct Pair
> test (struct Pair p, long x, long y)
> {
>   p.a &= 0x;
>   p.a <<= 2;
>   p.a += x;
>   p.b &= 0x;
>   p.b <<= 2;
>   p.b += x;
>   return p;
> }
> 
> in GCC 13 the result is:
> 
>   or  $r12,$r4,$r0

Hmm, this strange move is caused by "&" in bstrpick_alsl_paired.  Is it
really needed for the fusion?

>   bstrpick.d  $r4,$r12,31,0
>   alsl.d  $r4,$r4,$r6,2
>   or  $r12,$r5,$r0
>   bstrpick.d  $r5,$r12,31,0
>   alsl.d  $r5,$r5,$r6,2
>   jr  $r1
> 
> But now:
> 
>   addi.w  $r12,$r0,-4 # 0xfffc
>   lu32i.d $r12,0x3
>   slli.d  $r5,$r5,2
>   slli.d  $r4,$r4,2
>   and $r5,$r5,$r12
>   and $r4,$r4,$r12
>   add.d   $r4,$r4,$r6
>   add.d   $r5,$r5,$r6
>   jr  $r1
> 
> While both are suboptimal, the new code generation is more stupid.  I'm
> still unsure how to fix it, so maybe for now we'd just restore
> bstrpick_alsl_paired to fix the regression.
> 
> And I guess we'd need zero_extend_ashift anyway because we need to use
> alsl.d instead of slli.d for the fusion.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


RE: [PATCH v1] RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]

2025-01-21 Thread Li, Pan2
Thanks Jeff for comments.

> So a bit of high level background why this is needed would be helpful.

I see. The problem comes from gen_lowpart when the args are passed to SAT_SUB
directly (aka without going through function args).

For SAT_SUB with function args, we have the input rtx
(subreg/s/u:QI (reg/v:DI 135 [ x ]) 0), and gen_lowpart converts it to
(reg/v:DI 135 [ x ]).
For SAT_SUB without function args, we have the input rtx (reg:QI 141), and
gen_lowpart converts it to (subreg:DI (reg:QI 141) 0).

Unfortunately we don't have scalar sub/add for QImode, thus we need to sign
extend to Xmode to ensure correctness.
For example, for 0xff (-1 in QImode) sub 0x1 (1 in QImode), we actually want
-1 - 1 = -2, but with an Xmode sub we would get 0xff - 1 = 0xfe.
Thus, we need to sign extend 0xff (QImode) to 0xffffffffffffffff (assuming
Xmode is DImode) before the sub.
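
In C terms, the semantics the expander has to implement look like this (a
sketch of the QImode case only, not the RISC-V code itself):

  #include <stdint.h>

  int8_t
  sat_sub_s8 (int8_t x, int8_t y)
  {
    /* The inputs must act as signed values, i.e. be sign-extended into the
       wide registers: -1 - 1 = -2, not 0xff - 1 = 0xfe.  */
    int64_t d = (int64_t) x - (int64_t) y;
    if (d > INT8_MAX)
      d = INT8_MAX;
    if (d < INT8_MIN)
      d = INT8_MIN;
    return (int8_t) d;
  }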

I will add more of this background to the commit log.

> You might be able to (ab)use riscv_expand_comparands rather than making 
> a new function. You'd probably want to rename it in that case.

I would just like to wrap the mess in a separate function to keep the expand
func simple. It also needs to take care of const_int rtx later, similar to
riscv_gen_zero_extend_rtx. That is a separate refactor, thus I didn't include
it in this bug fix.

> The other high level question, wouldn't this be needed for other signed 
> cases, not just those going through expand_sssub?

I think both ssadd and sstrunc may need this as well. To be efficient, I sent
the sssub bugfix first and will validate ssadd and sstrunc in the meantime.

Pan


-Original Message-
From: Jeff Law  
Sent: Wednesday, January 22, 2025 6:46 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Fix incorrect code gen for scalar signed 
SAT_SUB [PR117688]



On 1/20/25 2:18 AM, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to fix the wrong code generation for the scalar
> signed SAT_SUB.  The input can be QI/HI/SI/DI while the alu like sub
> can only work on Xmode, thus we need to make sure the value of input
> are well signed-extended at first.  But the gen_lowpart will generate
> something like lbu which will perform the zero extended.
> 
> The below test suites are passed for this patch.
> * The rv64gcv fully regression test.
> 
> Note we also noticed some possible refinements, like supporting const_int
> inputs, and a similar issue for ssadd and/or sstrunc.  But we would like to
> fix those in separate patch(es).
> 
>   PR target/117688
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_gen_sign_extend_rtx): Add new func
>   to make sure the op is signed extended to Xmode.
>   (riscv_expand_sssub): Leverage above func to perform sign extend
>   if not Xmode.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/pr117688-run-1-s16.c: New test.
>   * gcc.target/riscv/pr117688-run-1-s32.c: New test.
>   * gcc.target/riscv/pr117688-run-1-s8.c: New test.
>   * gcc.target/riscv/pr117688.h: New test.
So a bit of high level background why this is needed would be helpful.

My suspicion is that we're using a functional interface, but without 
going through any of the usual parameter/argument setup routines.  So a 
sub-word object hasn't gone through a mandatory sign extension.



>   
> +/* Generate a REG rtx of Xmode from the given rtx and mode.
> +   The rtx x can be REG (QI/HI/SI/DI).
> +   The machine_mode mode is the original mode from define pattern.
> +
> +   If rtx is REG and Xmode, the RTX x will be returned directly.
> +
> +   If rtx is REG and non-Xmode, the sign extended to new REG of Xmode will be
> +   returned.
> +
> +   Then the underlying expanding can perform the code generation based on
> +   the REG rtx of Xmode, instead of taking care of these in expand func.  */
> +
> +static rtx
> +riscv_gen_sign_extend_rtx (rtx x, machine_mode mode)
> +{
> +  if (mode == Xmode)
> +return x;
> +
> +  rtx xmode_reg = gen_reg_rtx (Xmode);
> +  riscv_emit_unary (SIGN_EXTEND, xmode_reg, x);
> +
> +  return xmode_reg;
> +}
You might be able to (ab)use riscv_expand_comparands rather than making 
a new function. You'd probably want to rename it in that case.

The other high level question, wouldn't this be needed for other signed 
cases, not just those going through expand_sssub?


jeff


Re: [PATCH] lra: emit caller-save register spills before call insn [PR116028]

2025-01-21 Thread Andrew Pinski
On Thu, Aug 8, 2024 at 2:07 PM Andrew Pinski  wrote:
>
> On Fri, Aug 2, 2024 at 7:30 AM Jeff Law  wrote:
> >
> >
> >
> > On 8/1/24 4:12 AM, Surya Kumari Jangala wrote:
> > > lra: emit caller-save register spills before call insn [PR116028]
> > >
> > > LRA emits insns to save caller-save registers in the
> > > inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
> > > Basic Block) and traverses the insns in the EBBs in reverse order from
> > > the last insn to the first insn. When LRA sees a write to a pseudo (that
> > > has been assigned a caller-save register), and there is a read following
> > > the write, with an intervening call insn between the write and read,
> > > then LRA generates a spill immediately after the write and a restore
> > > immediately before the read. The spill is needed because the call insn
> > > will clobber the caller-save register.
> > >
> > > If there is a write insn and a call insn in two separate BBs but
> > > belonging to the same EBB, the spill insn gets generated in the BB
> > > containing the write insn. If the write insn is in the entry BB, then
> > > the spill insn that is generated in the entry BB prevents shrink wrap
> > > from happening. This is because the spill insn references the stack
> > > pointer and hence the prolog gets generated in the entry BB itself.
> > >
> > > This patch ensures that the spill insn is generated before the call insn
> > > instead of after the write. This is also more efficient as the spill now
> > > occurs only in the path containing the call.
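
A minimal C shape of the situation described above (an illustration only, not
the testcase from the PR):

  long g (long);

  long
  f (long x, int cold)
  {
    if (!cold)
      return x + 1;     /* fast path: no call, should be shrink-wrappable */
    return g (x) + x;   /* x is live across the call, so its caller-saved
                           register must be spilled around the call */
  }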
> > >
> > > 2024-08-01  Surya Kumari Jangala  
> > >
> > > gcc/
> > >   PR rtl-optimization/PR116028
> > >   * lra-constraints.cc (split_reg): Spill register before call
> > >   insn.
> > >   (latest_call_insn): New variable.
> > >   (inherit_in_ebb): Track the latest call insn.
> > >
> > > gcc/testsuite/
> > >   PR rtl-optimization/PR116028
> > >   * gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
> > >   * gcc.dg/pr10474.c: Remove xfail for powerpc.
> > Implementation looks fine.  I would suggest a comment indicating why
> > we're inserting before last_call_insn.  Otherwise someone in the future
> > would have to find the patch submission to know why we're handling that
> > case specially.
> >
> > OK with that additional comment.
>
> This causes a bootstrap failure on aarch64-linux-gnu; self-tests fail at
> stage 2.  It looks like wrong code is produced when compiling the stage 2
> compiler.
> I have not looked further than that right now.

I decided to re-apply the patch to the trunk locally and see if I
could debug what was going wrong. The good news is the bootstrap
failure is gone.
The bad news is I don't know why though. I am going to see if I can
bisect where the failure mode I was getting disappears. That should
help decide whether the bug has merely gone latent or is really fixed.

Thanks,
Andrew Pinski

>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > jeff

