[PATCH] tree-optimization/120654 - ICE with range query from IVOPTs

2025-06-20 Thread Richard Biener
The following testcase ICEs because we hand down an UNDEFINED range to a
place where it isn't expected.  Fix by moving the guard that's already
there earlier.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/120654
* vr-values.cc (range_fits_type_p): Check for undefined_p ()
before accessing type ().

* gcc.dg/torture/pr120654.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr120654.c | 24 ++++++++++++++++++++++++
 gcc/vr-values.cc                        | 10 +++++-----
 2 files changed, 29 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr120654.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr120654.c 
b/gcc/testsuite/gcc.dg/torture/pr120654.c
new file mode 100644
index 000..3819b78281d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr120654.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+int a, c, e, f, h, j;
+long g, k;
+void *malloc(long);
+void free(void *);
+int b(int m) {
+  if (m || a)
+return 1;
+  return 0.0f;
+}
+int d(int m, int p2) { return b(m) + m + (1 + p2 + p2); }
+int i() {
+  long l[] = {2, 9, 7, 8, g, g, 9, 0, 2, g};
+  e = l[c] << 6;
+}
+void n() {
+  long o;
+  int *p = malloc(sizeof(int));
+  k = 1 % j;
+  for (; i() + f + h; o++)
+if (p[d(j + 6, (int)k + 1992695866) + h + f + j + (int)k - 1 + o])
+  free(p);
+}
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 799f1bfd91d..ff11656559b 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -944,6 +944,10 @@ range_fits_type_p (const irange *vr,
   widest_int tem;
   signop src_sgn;
 
+  /* Now we can only handle ranges with constant bounds.  */
+  if (vr->undefined_p () || vr->varying_p ())
+return false;
+
   /* We can only handle integral and pointer types.  */
   src_type = vr->type ();
   if (!INTEGRAL_TYPE_P (src_type)
@@ -952,17 +956,13 @@ range_fits_type_p (const irange *vr,
 
   /* An extension is fine unless VR is SIGNED and dest_sgn is UNSIGNED,
  and so is an identity transform.  */
-  src_precision = TYPE_PRECISION (vr->type ());
+  src_precision = TYPE_PRECISION (src_type);
   src_sgn = TYPE_SIGN (src_type);
   if ((src_precision < dest_precision
&& !(dest_sgn == UNSIGNED && src_sgn == SIGNED))
   || (src_precision == dest_precision && src_sgn == dest_sgn))
 return true;
 
-  /* Now we can only handle ranges with constant bounds.  */
-  if (vr->undefined_p () || vr->varying_p ())
-return false;
-
   wide_int vrmin = vr->lower_bound ();
   wide_int vrmax = vr->upper_bound ();
 
-- 
2.43.0


[PATCH] fortran: Mention user variable in SELECT TYPE temporary variable names

2025-06-20 Thread Mikael Morin
From: Mikael Morin 

 Regression-tested on x86_64-pc-linux-gnu.
 Ok for master?

-- >8 --

The temporary variables that are generated to implement SELECT TYPE
and TYPE IS statements have (before this change) a name depending only
on the type.  This can produce confusing dumps with code having multiple
SELECT TYPE statements, as it isn't obvious which SELECT TYPE construct
the variable relates to.  This is especially the case with nested SELECT
TYPE statements and with SELECT TYPE variables having identical types
(and thus identical names).

This change adds one additional user-provided discriminating string in
the variable names, using the value from the SELECT TYPE variable name
or last component reference name.

It's purely a convenience change, not a correctness fix.
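
As a hypothetical illustration (the variable, type, and generated names below
are invented, not taken from the patch), nested constructs like the following
previously produced two temporaries with names derived only from the type,
e.g. both `__tmp_class_t`; with this change the names would be discriminated
along the lines of `__tmp_class_t_x` and `__tmp_class_t_comp`:

```fortran
! Hedged sketch: names and the exact generated temporaries are
! illustrative assumptions, not lifted from the patch itself.
module m
  type t
  end type
  type u
    class(t), allocatable :: comp
  end type
contains
  subroutine s (x, y)
    class(t) :: x
    type(u) :: y
    select type (x)              ! temporary named from variable "x"
    class is (t)
      select type (z => y%comp)  ! temporary named from component "comp"
      class is (t)
      end select
    end select
  end subroutine
end module
```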

gcc/fortran/ChangeLog:

* misc.cc (gfc_var_name_for_select_type_temp): New function.
* gfortran.h (gfc_var_name_for_select_type_temp): Declare it.
* resolve.cc (resolve_select_type): Pick a discriminating name
from the SELECT TYPE variable reference and use it in the name
of the temporary variable that is generated.
* match.cc (select_type_set_tmp): Likewise.  Pass the
discriminating name...
(select_intrinsic_set_tmp): ... to this function.  Use the
discriminating name likewise.
---
 gcc/fortran/gfortran.h |  2 ++
 gcc/fortran/match.cc   | 18 ++++++++++--------
 gcc/fortran/misc.cc    | 21 +++++++++++++++++++++
 gcc/fortran/resolve.cc | 19 +++++++++++--------
 4 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f73b5f9c23f..6848bd1762d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3507,6 +3507,8 @@ void gfc_done_2 (void);
 
 int get_c_kind (const char *, CInteropKind_t *);
 
+const char * gfc_var_name_for_select_type_temp (gfc_expr *);
+
 const char *gfc_closest_fuzzy_match (const char *, char **);
 inline void
 vec_push (char **&optr, size_t &osz, const char *elt)
diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index a99a757bede..4631791015f 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -7171,7 +7171,7 @@ select_type_push (gfc_symbol *sel)
 /* Set the temporary for the current intrinsic SELECT TYPE selector.  */
 
 static gfc_symtree *
-select_intrinsic_set_tmp (gfc_typespec *ts)
+select_intrinsic_set_tmp (gfc_typespec *ts, const char *var_name)
 {
   char name[GFC_MAX_SYMBOL_LEN];
   gfc_symtree *tmp;
@@ -7192,12 +7192,12 @@ select_intrinsic_set_tmp (gfc_typespec *ts)
 charlen = gfc_mpz_get_hwi (ts->u.cl->length->value.integer);
 
   if (ts->type != BT_CHARACTER)
-sprintf (name, "__tmp_%s_%d", gfc_basic_typename (ts->type),
-ts->kind);
+sprintf (name, "__tmp_%s_%d_%s", gfc_basic_typename (ts->type),
+ts->kind, var_name);
   else
 snprintf (name, sizeof (name),
- "__tmp_%s_" HOST_WIDE_INT_PRINT_DEC "_%d",
- gfc_basic_typename (ts->type), charlen, ts->kind);
+ "__tmp_%s_" HOST_WIDE_INT_PRINT_DEC "_%d_%s",
+ gfc_basic_typename (ts->type), charlen, ts->kind, var_name);
 
   gfc_get_sym_tree (name, gfc_current_ns, &tmp, false);
   sym = tmp->n.sym;
@@ -7239,7 +7239,9 @@ select_type_set_tmp (gfc_typespec *ts)
   return;
 }
 
-  tmp = select_intrinsic_set_tmp (ts);
+  gfc_expr *select_type_expr = gfc_state_stack->construct->expr1;
+  const char *var_name = gfc_var_name_for_select_type_temp (select_type_expr);
+  tmp = select_intrinsic_set_tmp (ts, var_name);
 
   if (tmp == NULL)
 {
@@ -7247,9 +7249,9 @@ select_type_set_tmp (gfc_typespec *ts)
return;
 
   if (ts->type == BT_CLASS)
-   sprintf (name, "__tmp_class_%s", ts->u.derived->name);
+   sprintf (name, "__tmp_class_%s_%s", ts->u.derived->name, var_name);
   else
-   sprintf (name, "__tmp_type_%s", ts->u.derived->name);
+   sprintf (name, "__tmp_type_%s_%s", ts->u.derived->name, var_name);
 
   gfc_get_sym_tree (name, gfc_current_ns, &tmp, false);
   sym = tmp->n.sym;
diff --git a/gcc/fortran/misc.cc b/gcc/fortran/misc.cc
index b8bdf7578de..23393066fc7 100644
--- a/gcc/fortran/misc.cc
+++ b/gcc/fortran/misc.cc
@@ -472,3 +472,24 @@ gfc_mpz_set_hwi (mpz_t rop, const HOST_WIDE_INT op)
   const wide_int w = wi::shwi (op, HOST_BITS_PER_WIDE_INT);
   wi::to_mpz (w, rop, SIGNED);
 }
+
+
+/* Extract a name suitable for use in the name of the select type temporary
+   variable.  We pick the last component name in the data reference if there
+   is one, otherwise the user variable name, and return the empty string by
+   default.  */
+
+const char *
+gfc_var_name_for_select_type_temp (gfc_expr *e)
+{
+  const char *name = "";
+  if (e->symtree)
+name = e->symtree->name;
+  for (gfc_ref *r = e->ref; r; r = r->next)
+if (r->type == REF_COMPONENT
+   && !(strcmp (r->u.c.component->name, "_data") == 0
+|| strcmp (r->u.c.component->name, "_vptr") == 0))
+      name = r->u.c.component->name;
+  return name;
+}

Re: [PATCH v1] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread Robin Dapp

Hi Pan,


+(define_special_predicate "vectorization_factor_operand"
+  (match_code "const_int,const_poly_int"))
+


Does immediate_operand () work instead of a new predicate?

--
Regards
Robin



[PATCH] vregs: Use force_subreg when instantiating subregs [PR120721]

2025-06-20 Thread Richard Sandiford
In this PR, we started with:

(subreg:V2DI (reg:DI virtual-reg) 0)

and vregs instantiated the virtual register to the argument pointer.
But:

(subreg:V2DI (reg:DI ap) 0)

is not a sensible subreg, since the argument pointer certainly can't
be referenced in V2DImode.  This is (IMO correctly) rejected after
g:2dcc6dbd8a00caf7cfa8cac17b3fd1c33d658016.

The vregs code that instantiates the subreg above is specific to
rvalues and already creates new instructions for nonzero offsets.
It is therefore safe to use force_subreg instead of simplify_gen_subreg.

I did wonder whether we should instead say that a subreg of a
virtual register is invalid if the same subreg would be invalid
for the associated hard registers.  But the point of virtual registers
is that the offsets from the hard registers are not known until after
expand has finished, and if an offset is nonzero, the virtual register
will be instantiated into a pseudo that contains the sum of the hard
register and the offset.  The subreg would then be correct for that
pseudo.  The subreg is only invalid in this case because there is
no offset.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR rtl-optimization/120721
* function.cc (instantiate_virtual_regs_in_insn): Use force_subreg
instead of simplify_gen_subreg when instantiating an rvalue SUBREG.

gcc/testsuite/
PR rtl-optimization/120721
* g++.dg/torture/pr120721.C: New test.
---
 gcc/function.cc                         | 20 +++++++++-----------
 gcc/testsuite/g++.dg/torture/pr120721.C | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr120721.C

diff --git a/gcc/function.cc b/gcc/function.cc
index a3a74b44b91..48167b0c207 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -1722,19 +1722,17 @@ instantiate_virtual_regs_in_insn (rtx_insn *insn)
  new_rtx = instantiate_new_reg (SUBREG_REG (x), &offset);
  if (new_rtx == NULL)
continue;
+ start_sequence ();
  if (maybe_ne (offset, 0))
-   {
- start_sequence ();
- new_rtx = expand_simple_binop
-   (GET_MODE (new_rtx), PLUS, new_rtx,
-gen_int_mode (offset, GET_MODE (new_rtx)),
-NULL_RTX, 1, OPTAB_LIB_WIDEN);
- seq = end_sequence ();
- emit_insn_before (seq, insn);
-   }
- x = simplify_gen_subreg (recog_data.operand_mode[i], new_rtx,
-  GET_MODE (new_rtx), SUBREG_BYTE (x));
+   new_rtx = expand_simple_binop
+ (GET_MODE (new_rtx), PLUS, new_rtx,
+  gen_int_mode (offset, GET_MODE (new_rtx)),
+  NULL_RTX, 1, OPTAB_LIB_WIDEN);
+ x = force_subreg (recog_data.operand_mode[i], new_rtx,
+   GET_MODE (new_rtx), SUBREG_BYTE (x));
  gcc_assert (x);
+ seq = end_sequence ();
+ emit_insn_before (seq, insn);
  break;
 
default:
diff --git a/gcc/testsuite/g++.dg/torture/pr120721.C 
b/gcc/testsuite/g++.dg/torture/pr120721.C
new file mode 100644
index 000..37dc46cb118
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr120721.C
@@ -0,0 +1,39 @@
+// { dg-additional-options "-w -fno-vect-cost-model" }
+
+template <int __v> struct integral_constant {
+  static constexpr int value = __v;
+};
+template <int __v> using __bool_constant = integral_constant<__v>;
+template <bool> using enable_if_t = int;
+struct function_ref {
+  template <typename Callable>
+  function_ref(
+  Callable,
+  enable_if_t<__bool_constant<__is_same(int, int)>::value> * = nullptr);
+};
+struct ArrayRef {
+  int Data;
+  long Length;
+  int *begin();
+  int *end();
+};
+struct StringRef {
+  char Data;
+  long Length;
+};
+void attributeObject(function_ref);
+struct ScopedPrinter {
+  virtual void printBinaryImpl(StringRef, StringRef, ArrayRef, bool, unsigned);
+};
+struct JSONScopedPrinter : ScopedPrinter {
+  JSONScopedPrinter();
+  void printBinaryImpl(StringRef, StringRef, ArrayRef Value, bool,
+   unsigned StartOffset) {
+attributeObject([&] {
+  StartOffset;
+  for (char Val : Value)
+;
+});
+  }
+};
+JSONScopedPrinter::JSONScopedPrinter() {}
-- 
2.43.0



Re: [PATCH v5 2/3][__bdos]Use the counted_by attribute of pointers in builtinin-object-size.

2025-06-20 Thread Qing Zhao
Hi, Sid,

Thanks a lot for the review. 
I will update the testing cases per your suggestions. 

> On Jun 19, 2025, at 12:07, Siddhesh Poyarekar  wrote:
> 
> On 2025-06-16 18:08, Qing Zhao wrote:
>> gcc/ChangeLog:
>> * tree-object-size.cc (access_with_size_object_size): Handle pointers
>> with counted_by.
> 
> This should probably just say "Update comment for .ACCESS_WITH_SIZE.".

Yes, you are right. 
> 
>> (collect_object_sizes_for): Likewise.
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/pointer-counted-by-4.c: New test.
>> * gcc.dg/pointer-counted-by-5.c: New test.
>> * gcc.dg/pointer-counted-by-6.c: New test.
>> * gcc.dg/pointer-counted-by-7.c: New test.
>> ---
> 
> I can't approve, but here's a review.  In summary, I've suggested some 
> modifications to the tests, the tree-object-size change itself looks fine to 
> me.
> 
>>  gcc/testsuite/gcc.dg/pointer-counted-by-4.c | 63 +
>>  gcc/testsuite/gcc.dg/pointer-counted-by-5.c | 48 
>>  gcc/testsuite/gcc.dg/pointer-counted-by-6.c | 47 +++
>>  gcc/testsuite/gcc.dg/pointer-counted-by-7.c | 30 ++
>>  gcc/tree-object-size.cc | 12 +++-
>>  5 files changed, 197 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-4.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-5.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-6.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pointer-counted-by-7.c
>> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-4.c 
>> b/gcc/testsuite/gcc.dg/pointer-counted-by-4.c
>> new file mode 100644
>> index 000..11e9421c819
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-4.c
>> @@ -0,0 +1,63 @@
>> +/* Test the attribute counted_by for pointer field and its usage in
>> + * __builtin_dynamic_object_size.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +struct pointer_array {
>> +  int b;
>> +  int *c;
>> +} *p_array;
>> +
>> +struct annotated {
>> +  int *c __attribute__ ((counted_by (b)));
>> +  int b;
>> +} *p_array_annotated;
>> +
>> +struct nested_annotated {
>> +  int *c __attribute__ ((counted_by (b)));
>> +  struct {
>> +union {
>> +  int b;
>> +  float f; 
>> +};
>> +int n;
>> +  };
>> +} *p_array_nested_annotated;
>> +
>> +void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
>> +{
>> +  p_array
>> += (struct pointer_array *) malloc (sizeof (struct pointer_array));
>> +  p_array->c = (int *) malloc (sizeof (int) * normal_count);
>> +  p_array->b = normal_count;
>> +
>> +  p_array_annotated
>> += (struct annotated *) malloc (sizeof (struct annotated));
>> +  p_array_annotated->c = (int *) malloc (sizeof (int) * attr_count);
>> +  p_array_annotated->b = attr_count;
>> +
>> +  p_array_nested_annotated
>> += (struct nested_annotated *) malloc (sizeof (struct nested_annotated));
>> +  p_array_nested_annotated->c = (int *) malloc (sizeof (int) * attr_count);
>> +  p_array_nested_annotated->b = attr_count;
>> +
>> +  return;
>> +}
>> +
>> +void __attribute__((__noinline__)) test ()
>> +{
>> +EXPECT(__builtin_dynamic_object_size(p_array->c, 1), -1);
>> +EXPECT(__builtin_dynamic_object_size(p_array_annotated->c, 1),
>> +p_array_annotated->b * sizeof (int));
>> +EXPECT(__builtin_dynamic_object_size(p_array_nested_annotated->c, 1),
>> +p_array_nested_annotated->b * sizeof (int));
>> +}
> 
> Suggest extending this test for at least char so that you verify scaling.  
> e.g. you could have this in pointer-counted-by-4.c:
> 
> ```
> #ifndef PTR_TYPE
> #define PTR_TYPE int
> #endif
> 
> struct pointer_array {
>  int b;
>  PTR_TYPE *c;
> } *p_array;
> 
> struct annotated {
>  PTR_TYPE *c __attribute__ ((counted_by (b)));
>  int b;
> } *p_array_annotated;
> 
> struct nested_annotated {
>  PTR_TYPE *c __attribute__ ((counted_by (b)));
>  struct {
>union {
>  int b;
>  float f; 
>};
>int n;
>  };
> } *p_array_nested_annotated;
> ```
> 
> and then have another test pointer-counted-by-4-char.c  that simply includes 
> this test:
> 
> ```
> #define PTR_TYPE char
> #include "pointer-counted-by-4.c"

Okay. Thanks for the suggestions. I will do that.
> ```
> 
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +  setup (10,10);
>> +  test ();
>> +  DONE ();
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/pointer-counted-by-5.c 
>> b/gcc/testsuite/gcc.dg/pointer-counted-by-5.c
>> new file mode 100644
>> index 000..45918c3e654
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pointer-counted-by-5.c
>> @@ -0,0 +1,48 @@
>> +/* Test the attribute counted_by for pointer fields and its usage in
>> + * __builtin_dynamic_object_size: when the counted_by field is negative.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +struct annotated {
>> +  int b;
+  int *c __attribute__ ((counted_by (b)));

Re: [PATCH v5 1/3][C FE] Extend "counted_by" attribute to pointer fields of structures.

2025-06-20 Thread Qing Zhao



> On Jun 18, 2025, at 17:23, Joseph Myers  wrote:
> 
> On Mon, 16 Jun 2025, Qing Zhao wrote:
> 
>> +The counted_by attribute is not allowed for a pointer to @code{void},
> 
> @code{counted_by}.
> 
> This patch is OK with that fix once the rest of this series is approved.

Thanks a lot for the review.

Will fix the above before committing.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



Re: [PATCH v4 1/4] Hard register constraints

2025-06-20 Thread Vladimir Makarov


On 5/20/25 3:22 AM, Stefan Schulze Frielinghaus wrote:

Implement hard register constraints of the form {regname} where regname
must be a valid register name for the target.  Such constraints may be
used in asm statements as a replacement for register asm and in machine
descriptions.

---
  gcc/config/cris/cris.cc   |   6 +-
  gcc/config/i386/i386.cc   |   6 +
  gcc/config/s390/s390.cc   |   6 +-
  gcc/doc/extend.texi   | 178 ++
  gcc/doc/md.texi   |   6 +
  gcc/function.cc   | 116 
  gcc/genoutput.cc  |  14 ++
  gcc/genpreds.cc   |   4 +-
  gcc/ira.cc|  79 +++-
  gcc/lra-constraints.cc|  13 ++
  gcc/recog.cc  |  11 +-
  gcc/stmt.cc   |  39 
  gcc/stmt.h|   1 +
  gcc/testsuite/gcc.dg/asm-hard-reg-1.c |  85 +
  gcc/testsuite/gcc.dg/asm-hard-reg-2.c |  33 
  gcc/testsuite/gcc.dg/asm-hard-reg-3.c |  25 +++
  gcc/testsuite/gcc.dg/asm-hard-reg-4.c |  50 +
  gcc/testsuite/gcc.dg/asm-hard-reg-5.c |  36 
  gcc/testsuite/gcc.dg/asm-hard-reg-6.c |  60 ++
  gcc/testsuite/gcc.dg/asm-hard-reg-7.c |  41 
  gcc/testsuite/gcc.dg/asm-hard-reg-8.c |  49 +
  .../gcc.target/aarch64/asm-hard-reg-1.c   |  55 ++
  .../gcc.target/i386/asm-hard-reg-1.c  | 115 +++
  .../gcc.target/s390/asm-hard-reg-1.c  | 103 ++
  .../gcc.target/s390/asm-hard-reg-2.c  |  43 +
  .../gcc.target/s390/asm-hard-reg-3.c  |  42 +
  .../gcc.target/s390/asm-hard-reg-4.c  |   6 +
  .../gcc.target/s390/asm-hard-reg-5.c  |   6 +
  .../gcc.target/s390/asm-hard-reg-6.c  | 152 +++
  .../gcc.target/s390/asm-hard-reg-longdouble.h |  18 ++
  30 files changed, 1391 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-1.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-2.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-3.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-4.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-5.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-6.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-7.c
  create mode 100644 gcc/testsuite/gcc.dg/asm-hard-reg-8.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-hard-reg-1.c
  create mode 100644 gcc/testsuite/gcc.target/i386/asm-hard-reg-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-4.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-5.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-6.c
  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-longdouble.h
diff --git a/gcc/ira.cc b/gcc/ira.cc
index 885239d1b43..c32d4ceab88 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -2113,6 +2113,82 @@ ira_get_dup_out_num (int op_num, alternative_mask alts,
  
  
  
+/* Return true if a replacement of SRC by DEST does not lead to unsatisfiable
+   asm.  Thus, a replacement is valid if and only if SRC and DEST are not
+   constrained in asm inputs of a single asm statement.  See
+   match_asm_constraints_2() for more details.  TODO: As in
+   match_asm_constraints_2() consider alternatives more precisely.  */
+
+static bool
+valid_replacement_for_asm_input_p_1 (const_rtx asmops, const_rtx src, 
const_rtx dest)
+{
+  int ninputs = ASM_OPERANDS_INPUT_LENGTH (asmops);
+  rtvec inputs = ASM_OPERANDS_INPUT_VEC (asmops);
+  for (int i = 0; i < ninputs; ++i)
+{
+  rtx input_src = RTVEC_ELT (inputs, i);
+  const char *constraint_src
+   = ASM_OPERANDS_INPUT_CONSTRAINT (asmops, i);
+  if (rtx_equal_p (input_src, src)
+ && strchr (constraint_src, '{') != nullptr)
+   for (int j = 0; j < ninputs; ++j)
+ {
+   rtx input_dest = RTVEC_ELT (inputs, j);
+   const char *constraint_dest
+ = ASM_OPERANDS_INPUT_CONSTRAINT (asmops, j);
+   if (rtx_equal_p (input_dest, dest)
+   && strchr (constraint_dest, '{') != nullptr)
+ return false;
+ }
+}
+  return true;
+}
+
+static bool
+valid_replacement_for_asm_input_p (const_rtx src, const_rtx dest)
+{
+  /* Bail out early if there is no asm statement.  */
+  if (!crtl->has_asm_statement)
+return true;
+  for (df_ref use = DF_REG_USE_CHAIN (REGNO (src));
+   use;
+   use = DF_REF_NEXT_REG (use))
+{
+  struct df_insn_info *use_info = DF_RE

[PATCH] tree-optimization/120729 - limit compile time in uninit_analysis::prune_phi_opnds

2025-06-20 Thread Richard Biener
The testcase in this PR shows, on the GCC 14 branch, that in some
degenerate cases we can spend exponential time pruning always
initialized paths through a web of PHIs.  The following adds
--param uninit-max-prune-work, defaulted to 10, to limit that
to effectively O(1).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Note the issue only reproduces on the GCC 14 branch; GCC 15 and
trunk are fine (though the code is exponential everywhere).

PR tree-optimization/120729
* gimple-predicate-analysis.h (uninit_analysis::prune_phi_opnds):
Add argument of work budget remaining.
* gimple-predicate-analysis.cc (uninit_analysis::prune_phi_opnds):
Likewise.  Maintain and honor it throughout the recursion.
* params.opt (uninit-max-prune-work): New.
* doc/invoke.texi (uninit-max-prune-work): Document.
---
 gcc/doc/invoke.texi              |  3 +++
 gcc/gimple-predicate-analysis.cc | 12 +++++++++---
 gcc/gimple-predicate-analysis.h  |  2 +-
 gcc/params.opt                   |  4 ++++
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dec3c7a1b80..91b0a201e1b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17420,6 +17420,9 @@ predicate chain.
 @item uninit-max-num-chains
 Maximum number of predicates ored in the normalized predicate chain.
 
+@item uninit-max-prune-work
+Maximum amount of work done to prune paths where the variable is always 
initialized.
+
 @item sched-autopref-queue-depth
 Hardware autoprefetcher scheduler model control flag.
 Number of lookahead cycles the model looks into; at '
diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc
index 76f6ab61310..b056b42a17e 100644
--- a/gcc/gimple-predicate-analysis.cc
+++ b/gcc/gimple-predicate-analysis.cc
@@ -385,7 +385,8 @@ bool
 uninit_analysis::prune_phi_opnds (gphi *phi, unsigned opnds, gphi *flag_def,
  tree boundary_cst, tree_code cmp_code,
  hash_set<gphi *> *visited_phis,
- bitmap *visited_flag_phis)
+ bitmap *visited_flag_phis,
+ unsigned &max_attempts)
 {
   /* The Boolean predicate guarding the PHI definition.  Initialized
  lazily from PHI in the first call to is_use_guarded() and cached
@@ -398,6 +399,10 @@ uninit_analysis::prune_phi_opnds (gphi *phi, unsigned 
opnds, gphi *flag_def,
   if (!MASK_TEST_BIT (opnds, i))
continue;
 
+  if (max_attempts == 0)
+   return false;
+  --max_attempts;
+
   tree flag_arg = gimple_phi_arg_def (flag_def, i);
   if (!is_gimple_constant (flag_arg))
{
@@ -432,7 +437,7 @@ uninit_analysis::prune_phi_opnds (gphi *phi, unsigned 
opnds, gphi *flag_def,
  unsigned opnds_arg_phi = m_eval.phi_arg_set (phi_arg_def);
  if (!prune_phi_opnds (phi_arg_def, opnds_arg_phi, flag_arg_def,
boundary_cst, cmp_code, visited_phis,
-   visited_flag_phis))
+   visited_flag_phis, max_attempts))
return false;
 
  bitmap_clear_bit (*visited_flag_phis, SSA_NAME_VERSION (phi_result));
@@ -634,9 +639,10 @@ uninit_analysis::overlap (gphi *phi, unsigned opnds, 
hash_set<gphi *> *visited,
 value that is in conflict with the use guard/predicate.  */
   bitmap visited_flag_phis = NULL;
   gphi *phi_def = as_a <gphi *> (flag_def);
+  unsigned max_attempts = param_uninit_max_prune_work;
   bool all_pruned = prune_phi_opnds (phi, opnds, phi_def, boundary_cst,
 cmp_code, visited,
-&visited_flag_phis);
+&visited_flag_phis, max_attempts);
   if (visited_flag_phis)
BITMAP_FREE (visited_flag_phis);
   if (all_pruned)
diff --git a/gcc/gimple-predicate-analysis.h b/gcc/gimple-predicate-analysis.h
index f71061ec283..67a19aa0905 100644
--- a/gcc/gimple-predicate-analysis.h
+++ b/gcc/gimple-predicate-analysis.h
@@ -152,7 +152,7 @@ private:
   bool is_use_guarded (gimple *, basic_block, gphi *, unsigned,
		       hash_set<gphi *> *);
   bool prune_phi_opnds (gphi *, unsigned, gphi *, tree, tree_code,
-			hash_set<gphi *> *, bitmap *);
+			hash_set<gphi *> *, bitmap *, unsigned &);
   bool overlap (gphi *, unsigned, hash_set<gphi *> *, const predicate &);

   void collect_phi_def_edges (gphi *, basic_block, vec<edge> *,
diff --git a/gcc/params.opt b/gcc/params.opt
index a67f900a63f..31aa0bd5753 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1185,6 +1185,10 @@ predicate chain.
 Common Joined UInteger Var(param_uninit_max_num_chains) Init(8) 
IntegerRange(1, 128) Param Optimization
 Maximum number of predicates ored in the normalized predicate chain.
 
+-param=uninit-max-prune-work=
Common Joined UInteger Var(param_uninit_max_prune_work) Init(10) Param Optimization
+Maximum amount of work done to prune paths where the variable is always initialized.

Re: [PATCH v2] Evaluate the object size by the size of the pointee type when the type is a structure with flexible array member which is annotated with counted_by.

2025-06-20 Thread Qing Zhao


> On Jun 18, 2025, at 20:51, Siddhesh Poyarekar  wrote:
> 
> On 2025-06-18 18:40, Qing Zhao wrote:
 Okay, I guess that I didn’t put enough attention on the above example 
 previously, sorry about that...
 Read it multiple times this time, my question is for the following code 
 portion:
  objsz = __builtin_dynamic_object_size (ptr, 0);
  __memcpy_chk (ptr, src, sz, objsz);
 When program get  to the this point, “ptr” is freed and invalid already,  
 is the program still considered as a valid program when the first argument 
 to the call to __memcpy_chk is an invalid pointer but the 3rd parameter is 
 0?
>>> 
>>> AFAICT, strictly according to the standards it should not be considered 
>>> valid since any use of an invalid pointer (not just dereferencing it) is 
>>> considered undefined behaviour.  However in practice it doesn't result in 
>>> an invalid access because of SZ=0.
>> Then should we follow the standards here? i.e, even though the program does 
>> not result in an invalid access because of SZ=0, the program has undefined 
>> behavior due to the use of invalid pointer?
> 
> It won't be "following", it would be "taking advantage of", which is 
> technically fair, but I don't think it's a good idea to do by default because 
> it has the potential to create vulnerabilities where there wasn't one before. 
>  It would have been fine if the builtin reliably crashed the program, but 
> here we're simply creating an invalid read, which could potentially be 
> silently exploited.
> 
> However like I said to Kees elsewhere in this thread, maybe we could hide 
> this one behind a new --param:
> 
> ```
> --param objsz-allow-dereference-input
> 
> Allow object size expressions generated by the __builtin_dynamic_object_size 
> builtin function to dereference the input pointer.  This may allow the 
> builtin function to get the size of an object when size information is 
> embedded in the object itself, e.g. with structures that have flexible array 
> members at their end, annotated with the __counted_by__ attribute.  Use this 
> parameter with caution because in cases where a non-NULL input pointer is not 
> known to be valid, e.g. when it points to memory that is either protected or 
> freed, enabling this parameter may result in dereferencing that invalid 
> pointer, potentially introducing additional undefined behaviour.

Okay, this is a reasonable solution to this problem. 

I will add a new --param option as suggested, and then guard the generation of
the size expression for:

__builtin_dynamic_object_size (p, 1)

with this option before the NULL pointer check.  Then update the test
cases as well.

Is that reasonable?

Thanks.

Qing

> ```
> 
> Sid



[PATCH v2] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread pan2 . li
From: Pan Li 

There will be an ICE during the expand pass; the backtrace is similar to
the one below.

during RTL pass: expand
red.c: In function 'main':
red.c:20:5: internal compiler error: in require, at machmode.h:323
   20 | int main() {
  | ^~~~
0x2e0b1d6 internal_error(char const*, ...)
../../../gcc/gcc/diagnostic-global-context.cc:517
0xd0d3ed fancy_abort(char const*, int, char const*)
../../../gcc/gcc/diagnostic.cc:1803
0xc3da74 opt_mode::require() const
../../../gcc/gcc/machmode.h:323
0xc3de2f opt_mode::require() const
../../../gcc/gcc/poly-int.h:1383
0xc3de2f riscv_vector::expand_select_vl(rtx_def**)
../../../gcc/gcc/config/riscv/riscv-v.cc:4218
0x21c7d22 gen_select_vldi(rtx_def*, rtx_def*, rtx_def*)
../../../gcc/gcc/config/riscv/autovec.md:1344
0x134db6c maybe_expand_insn(insn_code, unsigned int, expand_operand*)
../../../gcc/gcc/optabs.cc:8257
0x134db6c expand_insn(insn_code, unsigned int, expand_operand*)
../../../gcc/gcc/optabs.cc:8288
0x11b21d3 expand_fn_using_insn
../../../gcc/gcc/internal-fn.cc:318
0xef32cf expand_call_stmt
../../../gcc/gcc/cfgexpand.cc:3097
0xef32cf expand_gimple_stmt_1
../../../gcc/gcc/cfgexpand.cc:4264
0xef32cf expand_gimple_stmt
../../../gcc/gcc/cfgexpand.cc:4411
0xef95b6 expand_gimple_basic_block
../../../gcc/gcc/cfgexpand.cc:6472
0xefb66f execute
../../../gcc/gcc/cfgexpand.cc:7223

The select_vl operands op_1 and op_2 may be the same const_int, like
(const_int 32).  In that case maybe_legitimize_operands will:

1. First move the const op_1 into a reg.
2. Reuse the reg of op_1 for op_2, as op_1 and op_2 are equal.

That breaks the assumption that op_2 of select_vl is an immediate,
or something like CONST_INT_POLY.

The following test suites passed for this patch series:
* The rv64gcv full regression test.

PR target/120652

gcc/ChangeLog:

* config/riscv/autovec.md (select_vl): Use immediate_operand for
operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr120652-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  2 +-
 .../gcc.target/riscv/rvv/autovec/pr120652-1.c |  5 +++++
 .../gcc.target/riscv/rvv/autovec/pr120652-2.c |  5 +++++
 .../gcc.target/riscv/rvv/autovec/pr120652-3.c |  5 +++++
 .../gcc.target/riscv/rvv/autovec/pr120652.h   | 31 +++++++++++++++++++++++++++
 5 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c678eefc700..94a61bdc5cf 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1338,7 +1338,7 @@ (define_insn_and_split "fnms4"
 (define_expand "select_vl"
   [(match_operand:P 0 "register_operand")
(match_operand:P 1 "vector_length_operand")
-   (match_operand:P 2 "")]
+   (match_operand:P 2 "immediate_operand")]
   "TARGET_VECTOR"
 {
   riscv_vector::expand_select_vl (operands);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c
new file mode 100644
index 000..260e4c08f16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-1.c
@@ -0,0 +1,5 @@
+/* Test that we do not ICE during compilation.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvl256b -mabi=lp64d -O3" } */
+
+#include "pr120652.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c
new file mode 100644
index 000..6f859426766
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-2.c
@@ -0,0 +1,5 @@
+/* Test that we do not ICE during compilation.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvl512b -mabi=lp64d -O3" } */
+
+#include "pr120652.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c
new file mode 100644
index 000..9852b5de86a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652-3.c
@@ -0,0 +1,5 @@
+/* Test that we do not ICE during compilation.  */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvl1024b -mabi=lp64d -O3" } */
+
+#include "pr120652.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr120652.h
new file mode 100644
index 000..75f27164b22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rv

Re: [PATCH v2] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread Robin Dapp

OK, thanks.

--
Regards
Robin



Re: [PATCH v2] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread Jeff Law




On 6/20/25 7:04 AM, pan2...@intel.com wrote:

From: Pan Li 

There will be an ICE during the expand pass; the backtrace is similar to the below.

during RTL pass: expand
red.c: In function 'main':
red.c:20:5: internal compiler error: in require, at machmode.h:323
20 | int main() {
   | ^~~~
0x2e0b1d6 internal_error(char const*, ...)
 ../../../gcc/gcc/diagnostic-global-context.cc:517
0xd0d3ed fancy_abort(char const*, int, char const*)
 ../../../gcc/gcc/diagnostic.cc:1803
0xc3da74 opt_mode::require() const
 ../../../gcc/gcc/machmode.h:323
0xc3de2f opt_mode::require() const
 ../../../gcc/gcc/poly-int.h:1383
0xc3de2f riscv_vector::expand_select_vl(rtx_def**)
 ../../../gcc/gcc/config/riscv/riscv-v.cc:4218
0x21c7d22 gen_select_vldi(rtx_def*, rtx_def*, rtx_def*)
 ../../../gcc/gcc/config/riscv/autovec.md:1344
0x134db6c maybe_expand_insn(insn_code, unsigned int, expand_operand*)
 ../../../gcc/gcc/optabs.cc:8257
0x134db6c expand_insn(insn_code, unsigned int, expand_operand*)
 ../../../gcc/gcc/optabs.cc:8288
0x11b21d3 expand_fn_using_insn
 ../../../gcc/gcc/internal-fn.cc:318
0xef32cf expand_call_stmt
 ../../../gcc/gcc/cfgexpand.cc:3097
0xef32cf expand_gimple_stmt_1
 ../../../gcc/gcc/cfgexpand.cc:4264
0xef32cf expand_gimple_stmt
 ../../../gcc/gcc/cfgexpand.cc:4411
0xef95b6 expand_gimple_basic_block
 ../../../gcc/gcc/cfgexpand.cc:6472
0xefb66f execute
 ../../../gcc/gcc/cfgexpand.cc:7223

The select_vl op_1 and op_2 may be the same const_int, e.g. (const_int 32).
maybe_legitimize_operands will then:

1. First move the const op_1 into a reg.
2. Reuse the reg of op_1 for op_2, as op_1 and op_2 are equal.

That breaks the assumption that op_2 of select_vl is an immediate,
or something like CONST_INT_POLY.

The below test suites passed for this patch series.
* The rv64gcv full regression test.

PR target/120652

gcc/ChangeLog:

* config/riscv/predicates.md (vectorization_factor_operand):
Add immediate_operand for select_vl operand 2.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr120652-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr120652.h: New test.

OK
jeff



[PATCH] Add string_slice class.

2025-06-20 Thread Alfie Richards
Thanks for the pointer Joseph.

This update adds tests to gcc/testsuite/g++.dg/warn/Wformat-gcc_diag-1.C
as this seems to be where similar tests are done (e.g., %D for tree).

I couldn't find any tests for the actual output of string slice debug
statements for the other format specifiers so haven't included any. The
formatting is tested indirectly via the later diagnostic tests.

Thanks,
Alfie

-- >8 --

The string_slice inherits from array_slice and is used to refer to a
substring of an array that is memory managed elsewhere without modifying
the underlying array.

For example, this is useful in cases such as when needing to refer to a
substring of an attribute in the syntax tree.

Adds some minimal helper functions for string_slice,
such as a strtok alternative, equality operators, strcmp, and a function
to strip whitespace from the beginning and end of a string_slice.

gcc/c-family/ChangeLog:

* c-format.cc (local_string_slice_node): New node type.
(asm_fprintf_char_table): New entry.
(init_dynamic_diag_info): Add support for string_slice.
* c-format.h (T_STRING_SLICE): New node type.

gcc/ChangeLog:

* pretty-print.cc (format_phase_2): Add support for string_slice.
* vec.cc (string_slice::tokenize): New static method.
(string_slice::strcmp): New static method.
(string_slice::strip): New method.
(test_string_slice_initializers): New test.
(test_string_slice_tokenize): Ditto.
(test_string_slice_strcmp): Ditto.
(test_string_slice_equality): Ditto.
(test_string_slice_inequality): Ditto.
(test_string_slice_invalid): Ditto.
(test_string_slice_strip): Ditto.
(vec_cc_tests): Add new tests.
* vec.h (class string_slice): New class.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wformat-gcc_diag-1.C: Add string_slice "%B" format tests.
---
 gcc/c-family/c-format.cc  |   7 +
 gcc/c-family/c-format.h   |   1 +
 gcc/pretty-print.cc   |  10 +
 .../g++.dg/warn/Wformat-gcc_diag-1.C  |  21 +-
 gcc/vec.cc| 228 ++
 gcc/vec.h |  46 
 6 files changed, 309 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index a44249a0222..80430e9a8f7 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -70,6 +70,7 @@ static GTY(()) tree local_event_ptr_node;
 static GTY(()) tree local_pp_element_ptr_node;
 static GTY(()) tree local_gimple_ptr_node;
 static GTY(()) tree local_cgraph_node_ptr_node;
+static GTY(()) tree local_string_slice_node;
 static GTY(()) tree locus;
 
 static bool decode_format_attr (const_tree, tree, tree, function_format_info *,
@@ -770,6 +771,7 @@ static const format_char_info asm_fprintf_char_table[] =
   { "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "c",  NULL }, \
   { "r",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","//cR",   NULL 
}, \
   { "@",   1, STD_C89, { T_EVENT_PTR,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL }, \
+  { "B",   1, STD_C89, { T_STRING_SLICE,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q", "",   NULL }, \
   { "e",   1, STD_C89, { T_PP_ELEMENT_PTR,   BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"", NULL }, \
   { "<",   0, STD_C89, NOARGUMENTS, "",  "<",   NULL }, \
   { ">",   0, STD_C89, NOARGUMENTS, "",  ">",   NULL }, \
@@ -5211,6 +5213,11 @@ init_dynamic_diag_info (void)
   || local_cgraph_node_ptr_node == void_type_node)
 local_cgraph_node_ptr_node = get_named_type ("cgraph_node");
 
+  /* Similar to the above but for string_slice*.  */
+  if (!local_string_slice_node
+  || local_string_slice_node == void_type_node)
+local_string_slice_node = get_named_type ("string_slice");
+
   /* Similar to the above but for diagnostic_event_id_t*.  */
   if (!local_event_ptr_node
   || local_event_ptr_node == void_type_node)
diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
index 323338cb8e7..d44d3862d83 100644
--- a/gcc/c-family/c-format.h
+++ b/gcc/c-family/c-format.h
@@ -317,6 +317,7 @@ struct format_kind_info
 #define T89_G   { STD_C89, NULL, &local_gimple_ptr_node }
 #define T_CGRAPH_NODE   { STD_C89, NULL, &local_cgraph_node_ptr_node }
 #define T_EVENT_PTR{ STD_C89, NULL, &local_event_ptr_node }
+#define T_STRING_SLICE{ STD_C89, NULL, &local_string_slice_node }
 #define T_PP_ELEMENT_PTR{ STD_C89, NULL, &local_pp_element_ptr_node }
 #define T89_T   { STD_C89, NULL, &local_tree_type_node }
 #define T89_V  { STD_C89, NULL, T_V }
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index

Re: [PATCH v5 3/3][C sanitizer] Use the counted_by attribute of pointers in array bound checker.

2025-06-20 Thread Qing Zhao



> On Jun 18, 2025, at 17:26, Joseph Myers  wrote:
> 
> On Mon, 16 Jun 2025, Qing Zhao wrote:
> 
>> Current array bound checker only instruments ARRAY_REF, and the INDEX
>> information is the 2nd operand of the ARRAY_REF.
>> 
>> When extending the array bound checker to pointer references with
>> counted_by attributes, the hardest part is to get the INDEX of the
>> corresponding array ref from the offset computation expression of
>> the pointer ref.  I.e.
>> 
>> Given an OFFSET expression, and the ELEMENT_SIZE,
>> get the index expression from the OFFSET.
>> For example:
>>  OFFSET:
>>   ((long unsigned int) m * (long unsigned int) SAVE_EXPR ) * 4
>>  ELEMENT_SIZE:
>>   (sizetype) SAVE_EXPR  * 4
>> get the index as (long unsigned int) m.
> 
> This patch is OK once the rest of the series is approved, in the absence 
> of objections within 48 hours.

Thanks a lot for the review.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com




[PATCH] match: Simplify double not and double negate to a non_lvalue

2025-06-20 Thread Mikael Morin
From: Mikael Morin 

Regression tested on x86_64-linux.  OK for master?

-- 8< --

gcc/ChangeLog:

* match.pd (`-(-X)`, `~(~X)`): Add a NON_LVALUE_EXPR wrapper to the
simplification of doubled unary operators NEGATE_EXPR and
BIT_NOT_EXPR.

gcc/testsuite/ChangeLog:

* gfortran.dg/non_lvalue_1.f90: New test.
---
 gcc/match.pd   |  4 ++--
 gcc/testsuite/gfortran.dg/non_lvalue_1.f90 | 23 ++
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/non_lvalue_1.f90

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f53c162fce..ad0fa8f1004 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2357,7 +2357,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~~x -> x */
 (simplify
   (bit_not (bit_not @0))
-  @0)
+  (non_lvalue @0))
 
 /* zero_one_valued_p will match when a value is known to be either
0 or 1 including constants 0 or 1.
@@ -4037,7 +4037,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (negate (nop_convert? (negate @1)))
   (if (!TYPE_OVERFLOW_SANITIZED (type)
&& !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@1)))
-   (view_convert @1)))
+   (non_lvalue (view_convert @1
 
  /* We can't reassociate floating-point unless -fassociative-math
 or fixed-point plus or minus because of saturation to +-Inf.  */
diff --git a/gcc/testsuite/gfortran.dg/non_lvalue_1.f90 
b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
new file mode 100644
index 000..536c86b1eb6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/non_lvalue_1.f90
@@ -0,0 +1,23 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! Check the generation of NON_LVALUE_EXPR expressions in cases where a unary
+! operator expression would simplify to a bare data reference.
+
+! A NON_LVALUE_EXPR is generated for a double negation that would simplify to
+! a bare data reference.
+function f1 (f1_arg1)
+  integer, value :: f1_arg1
+  integer :: f1
+  f1 = -(-f1_arg1)
+end function
+! { dg-final { scan-tree-dump "__result_f1 = NON_LVALUE_EXPR ;" 
"original" } }
+
+! A NON_LVALUE_EXPR is generated for a double complement that would simplify to
+! a bare data reference.
+function f2 (f2_arg1)
+  integer, value :: f2_arg1
+  integer :: f2
+  f2 = not(not(f2_arg1))
+end function
+! { dg-final { scan-tree-dump "__result_f2 = NON_LVALUE_EXPR ;" 
"original" } }
-- 
2.47.2



[committed] amdgcn: allow SImode in VCC_HI [PR120722]

2025-06-20 Thread Andrew Stubbs
This patch isn't fully tested yet, but it fixes the build failure, so that
will do for now.  SImode was not allowed in VCC_HI because there were issues,
way back before the port went upstream, so it's possible we'll find out what
those issues were again soon.

gcc/ChangeLog:

PR target/120722
* config/gcn/gcn.cc (gcn_hard_regno_mode_ok): Allow SImode in VCC_HI.
---
 gcc/config/gcn/gcn.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 31a59dd6f22..2d8dfa3232e 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -585,9 +585,8 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode 
mode)
 case XNACK_MASK_HI_REG:
 case TBA_HI_REG:
 case TMA_HI_REG:
-  return mode == SImode;
 case VCC_HI_REG:
-  return false;
+  return mode == SImode;
 case EXEC_HI_REG:
   return mode == SImode /*|| mode == V32BImode */ ;
 case SCC_REG:
-- 
2.49.0



[PATCH] s390: Fix float vector extract for pre-z13

2025-06-20 Thread Juergen Christ
Also provide the vec_extract patterns for floats on pre-z13 machines
to prevent ICEing in those cases.

Bootstrapped and regtested on s390.

gcc/ChangeLog:

* config/s390/vector.md (VF): Don't restrict modes.
* config/s390/vector.md (VEC_SET_SINGLEFLOAT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-extract-1.c: Fix test on arch11.
* gcc.target/s390/vector/vec-set-1.c: Run test on arch11.
* gcc.target/s390/vector/vec-extract-2.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/vector.md |   4 +-
 .../gcc.target/s390/vector/vec-extract-1.c|  16 +-
 .../gcc.target/s390/vector/vec-extract-2.c| 168 ++
 .../gcc.target/s390/vector/vec-set-1.c|  23 ++-
 4 files changed, 187 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 6f4e1929eb80..7251a76c3aea 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -75,7 +75,7 @@
   V1DF V2DF
   (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
-(define_mode_iterator VF [(V2SF "TARGET_VXE") (V4SF "TARGET_VXE") V2DF])
+(define_mode_iterator VF [V2SF V4SF V2DF])
 
 ; All modes present in V_HW1 and VFT.
 (define_mode_iterator V_HW1_FT [V16QI V8HI V4SI V2DI V1TI V1DF
@@ -512,7 +512,7 @@
 (define_mode_iterator VEC_SET_NONFLOAT
   [V1QI V2QI V4QI V8QI V16QI V1HI V2HI V4HI V8HI V1SI V2SI V4SI V1DI V2DI V2SF 
V4SF])
 ; Iterator for single element float vectors
-(define_mode_iterator VEC_SET_SINGLEFLOAT [(V1SF "TARGET_VXE") V1DF (V1TF 
"TARGET_VXE")])
+(define_mode_iterator VEC_SET_SINGLEFLOAT [V1SF V1DF (V1TF "TARGET_VXE")])
 
 ; FIXME: Support also vector mode operands for 1
 ; FIXME: A target memory operand seems to be useful otherwise we end
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c 
b/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c
index 9df7909a3ea8..83af839963be 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-extract-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=z14 -mzarch" } */
+/* { dg-options "-O2 -march=arch11 -mzarch" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 typedef double V2DF __attribute__((vector_size(16)));
@@ -110,17 +110,6 @@ extractnthfloat (V4SF x, int n)
   return x[n];
 }
 
-/*
-** sumfirstfloat:
-** vfasb   %v0,%v24,%v26
-** br  %r14
-*/
-float
-sumfirstfloat (V4SF x, V4SF y)
-{
-  return (x + y)[0];
-}
-
 /*
 ** extractfirst2:
 ** vlr %v0,%v24
@@ -179,8 +168,7 @@ extractsingled (V1DF x)
 
 /*
 ** extractsingleld:
-** vlr (%v.),%v24
-** vst \1,0\(%r2\),3
+** vst %v24,0\(%r2\),3
 ** br  %r14
 */
 long double
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c 
b/gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c
new file mode 100644
index ..640ac0c8c766
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-extract-2.c
@@ -0,0 +1,168 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=arch11 -mzarch" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+typedef double V2DF __attribute__((vector_size(16)));
+typedef float V4SF __attribute__((vector_size(16)));
+typedef float V2SF __attribute__((vector_size(8)));
+typedef double V1DF __attribute__((vector_size(8)));
+typedef float V1SF __attribute__((vector_size(4)));
+typedef long double V1TF __attribute__((vector_size(16)));
+
+/*
+** extractfirstdouble:
+** vsteg   %v24,0\(%r2\),0
+** br  %r14
+*/
+void
+extractfirstdouble (double *res, V2DF x)
+{
+  *res = x[0];
+}
+
+/*
+** extractseconddouble:
+** vsteg   %v24,0\(%r2\),1
+** br  %r14
+*/
+void
+extractseconddouble (double *res, V2DF x)
+{
+  *res = x[1];
+}
+
+/*
+** extractnthdouble:
+** vlgvg   (%r.),%v24,0\(%r3\)
+** stg \1,0\(%r2\)
+** br  %r14
+*/
+void
+extractnthdouble (double *res, V2DF x, int n)
+{
+  *res = x[n];
+}
+
+/*
+** extractfirstfloat:
+** vstef   %v24,0\(%r2\),0
+** br  %r14
+*/
+void
+extractfirstfloat (float *res, V4SF x)
+{
+  *res = x[0];
+}
+
+/*
+** extractsecondfloat:
+** vstef   %v24,0\(%r2\),1
+** br  %r14
+*/
+void
+extractsecondfloat (float *res, V4SF x)
+{
+  *res = x[1];
+}
+
+/*
+** extractthirdfloat:
+** vstef   %v24,0\(%r2\),2
+** br  %r14
+*/
+void
+extractthirdfloat (float *res, V4SF x)
+{
+  *res = x[2];
+}
+
+/*
+** extractfourthfloat:
+** vstef   %v24,0\(%r2\),3
+** br  %r14
+*/
+void
+extractfourthfloat (float *res, V4SF x)
+{
+  *res = x[3];
+}
+
+/*
+** extractnthfloat:
+** vlgvf   (%r.),%v24,0\(%r3\)
+** st  \1,0\(%r2\)
+** br  %r14
+*/
+void
+extractnthfloat (float *res, V4SF x, int n)
+{
+  *res = x[n];
+}
+
+/*
+** extractfirst2:
+** vstef   %v24,0\(%r2

[PATCH] AArch64: Disable TARGET_CONST_ANCHOR

2025-06-20 Thread Wilco Dijkstra

TARGET_CONST_ANCHOR appears to trigger too often, even on simple immediates.
It inserts extra ADD/SUB instructions even when a single MOV exists.
Disable it to improve overall code quality: on SPEC2017 it removes
1850 ADD/SUB instructions and 630 spill instructions, and SPECINT is ~0.06%
faster on Neoverse V2.  Adjust a testcase that was confusing neg and fneg.

Passes regress, OK for commit?

gcc:
* config/aarch64/aarch64.cc (TARGET_CONST_ANCHOR): Remove.

gcc/testsuite:
* gcc.target/aarch64/vneg_s.c: Update test.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
0a3c246517a86697142589a513a327e5ee930349..51279e29db88f0aa332c40abda68ad3b957b0ef0
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -32444,9 +32444,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_HAVE_SHADOW_CALL_STACK
 #define TARGET_HAVE_SHADOW_CALL_STACK true
 
-#undef TARGET_CONST_ANCHOR
-#define TARGET_CONST_ANCHOR 0x100
-
 #undef TARGET_EXTRA_LIVE_ON_ENTRY
 #define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_extra_live_on_entry
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vneg_s.c 
b/gcc/testsuite/gcc.target/aarch64/vneg_s.c
index 
8ddc4d21c1f89d6c66624a33ee0386cb3a28c512..8d91639faaa1c728095265ce4e61327a4dc441e3
 100644
--- a/gcc/testsuite/gcc.target/aarch64/vneg_s.c
+++ b/gcc/testsuite/gcc.target/aarch64/vneg_s.c
@@ -256,7 +256,7 @@ test_vnegq_s64 ()
   return o1||o2||o2||o4;
 }
 
-/* { dg-final { scan-assembler-times "neg\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 } 
} */
+/* { dg-final { scan-assembler-times "\tneg\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d" 1 
} } */
 
 int
 main (int argc, char **argv)



Re: [PATCH, 4 of 4] Use vector pair for memory operations with -mcpu=future

2025-06-20 Thread Surya Kumari Jangala
Hi Mike,

On 14/06/25 2:13 pm, Michael Meissner wrote:
> This is patch #4 of 4 to add -mcpu=future support to the PowerPC.

I think this should be a separate patch in itself. As such, this
patch is not required to enable the -mcpu=future option.

> 
> In the development for the power10 processor, GCC did not enable using the 
> load
> vector pair and store vector pair instructions when optimizing things like

s/things/functions

> memory copy.  This patch enables using those instructions if -mcpu=future is
> used.
> 
> I have tested these patches on both big endian and little endian PowerPC
> servers, with no regressions.  Can I check these patchs into the trunk?
> 
> 2025-06-13  Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using

Just FUTURE_MASKS_SERVER

>   load vector pair and store vector pair instructions for memory copy
>   operations.
>   (POWERPC_MASKS): Make the bit for enabling using load vector pair and
>   store vector pair operations set and reset when the PowerPC processor is
>   changed.

I think this can be reworded, perhaps something like:
(POWERPC_MASKS): Add the option mask OPTION_MASK_BLOCK_OPS_VECTOR_PAIR.

>   * gcc/config/rs6000/rs6000.cc (rs6000_machine_from_flags): Disable
>   -mblock-ops-vector-pair from influcing .machine selection.

nit: "influencing"

Also, in rs6000.opt, mblock-ops-vector-pair is marked as Undocumented. Should we
change this?

Regards,
Surya

> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/future-3.c: New test.
> ---
>  gcc/config/rs6000/rs6000-cpus.def   |  4 +++-
>  gcc/config/rs6000/rs6000.cc |  2 +-
>  gcc/testsuite/gcc.target/powerpc/future-3.c | 22 +
>  3 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/future-3.c
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 228d0b5e7b5..063591f5c09 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -84,7 +84,8 @@
> | OPTION_MASK_POWER11)
>  
>  #define FUTURE_MASKS_SERVER  (POWER11_MASKS_SERVER   \
> -  | OPTION_MASK_FUTURE)
> +  | OPTION_MASK_FUTURE   \
> +  | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)
>  
>  /* Flags that need to be turned off if -mno-vsx.  */
>  #define OTHER_VSX_VECTOR_MASKS   (OPTION_MASK_EFFICIENT_UNALIGNED_VSX
> \
> @@ -114,6 +115,7 @@
>  
>  /* Mask of all options to set the default isa flags based on -mcpu=.  */
>  #define POWERPC_MASKS(OPTION_MASK_ALTIVEC
> \
> +  | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
>| OPTION_MASK_CMPB \
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DFP  \
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 141d53b1a12..80fc500fcec 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -5907,7 +5907,7 @@ rs6000_machine_from_flags (void)
>  
>/* Disable the flags that should never influence the .machine selection.  
> */
>flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
> OPTION_MASK_ISEL
> -  | OPTION_MASK_ALTIVEC);
> +  | OPTION_MASK_ALTIVEC | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR);
>  
>if ((flags & (FUTURE_MASKS_SERVER & ~ISA_3_1_MASKS_SERVER)) != 0)
>  return "future";
> diff --git a/gcc/testsuite/gcc.target/powerpc/future-3.c 
> b/gcc/testsuite/gcc.target/powerpc/future-3.c
> new file mode 100644
> index 000..afa8b96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/future-3.c
> @@ -0,0 +1,22 @@
> +/* 32-bit doesn't generate vector pair instructions.  */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-mdejagnu-cpu=future -O2" } */
> +
> +/* Test to see that memcpy will use load/store vector pair with
> +   -mcpu=future.  */
> +
> +#ifndef SIZE
> +#define SIZE 4
> +#endif
> +
> +extern vector double to[SIZE], from[SIZE];
> +
> +void
> +copy (void)
> +{
> +  __builtin_memcpy (to, from, sizeof (to));
> +  return;
> +}
> +
> +/* { dg-final { scan-assembler {\mlxvpx?\M}  } } */
> +/* { dg-final { scan-assembler {\mstxvpx?\M} } } */



Re: [PATCH, 4 of 4] Use vector pair for memory operations with -mcpu=future

2025-06-20 Thread Segher Boessenkool
Hi!

On Fri, Jun 20, 2025 at 10:38:30PM +0530, Surya Kumari Jangala wrote:
> On 14/06/25 2:13 pm, Michael Meissner wrote:
> > This is patch #4 of 4 to add -mcpu=future support to the PowerPC.
> 
> I think this should be a separate patch in itself. As such, this
> patch is not required to enable the -mcpu=future option.

It can in theory be helpful to have it in the same series, but yeah, it
certainly does not belong here.  It should be a separate patch, and it
should come with some evidence or at the very least some indication that
it would be a good idea to have it at all, and proof that is not a *bad*
idea!

> > In the development for the power10 processor, GCC did not enable using the 
> > load
> > vector pair and store vector pair instructions when optimizing things like
> 
> s/things/functions

"Things" is nicely non-specific, hehe.

> > * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
> 
> Just FUTURE_MASKS_SERVER

The existing masks are ISA_3_1_MASKS_SERVER (and many older ISAs before
it), and POWER11_MASKS_SERVER .  We do not have to call ISA 3.2
"Future", certainly not by IBM's lawyers, it isn't IBM who will publish
Power Architecture revisions anyway!

Yeah, ISA_FUTURE makes no sense in the first place, "Future" here is a
stand-in for the marketing name for the next IBM Power Server chip.  The
(lawyers') fear is that if we publish the expected name for the next
generation server CPU, and also GCC support for that CPU, that then some
potential customers can argue in the future (har har) that that was a
promise.  So we call it "Future", no specific version or timespan, and
of course we cannot really predict the future, and future plans can
always change, too.

You can expect that in the future (when things have settled) we will
just do a tree-wide search and replace.

> > * gcc/config/rs6000/rs6000.cc (rs6000_machine_from_flags): Disable
> > -mblock-ops-vector-pair from influcing .machine selection.
> 
> nit: "influencing"

Speling fixes are never a nit!  Attention to details is important.

> Also, in rs6000.opt, mblock-ops-vector-pair is marked as Undocumented. Should 
> we
> change this?

Probably yes.  If the option is worth being user-selectable at all, we
should document it.


Segher


[COMMITTED] PR tree-optimization/120701 - Fix range wrap check and enhance verify_range.

2025-06-20 Thread Andrew MacLeod
I wasn't checking the underflow and overflow conditions well enough in
the original patch for range bound snapping.  The testcase in this PR
has a [+INF, +INF] subrange with a bitmask that said it must be an even
value.

The lower bound calculation overflowed (+INF + 1), but it was not
detected, so a new subrange of [-INF, +INF - 1] was created in its place.

Surprisingly, irange::verify_range was confirming that each subrange has
LB <= UB, but it never checked that the UB of the previous pair is less
than the LB of the current pair.  Very surprising, or it would have
triggered an obvious trap.  Instead, it took quite some time to track
down, as it wasn't triggering a trap until much later, when a range was
being loaded from memory.

Regardless, this patch fixes the overflow/underflow issue, and adds
verification to verify_range that the subrange pairs are sorted.

I also made some minor comment adjustments Aldy pointed out earlier.

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From b03e0d69b37f6ea7aef220652635031a89f56a11 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 20 Jun 2025 08:50:39 -0400
Subject: [PATCH] Fix range wrap check and enhance verify_range.

When snapping range bounds to satisfy bitmask constraints, end bound overflow
and underflow checks were not working properly.
Also adjust some comments, and enhance verify_range to make sure range pairs
are sorted properly.

	PR tree-optimization/120701
	gcc/
	* value-range.cc (irange::verify_range): Verify range pairs are
	sorted properly.
	(irange::snap): Check for over/underflow properly.

	gcc/testsuite/
	* gcc.dg/pr120701.c: New.
---
 gcc/testsuite/gcc.dg/pr120701.c | 40 +
 gcc/value-range.cc  | 38 +--
 2 files changed, 61 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr120701.c

diff --git a/gcc/testsuite/gcc.dg/pr120701.c b/gcc/testsuite/gcc.dg/pr120701.c
new file mode 100644
index 000..09f7b6192ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr120701.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int a, b, c, e, f;
+int main() {
+  int d, g, i;
+j:
+  if (d >= 0)
+goto k;
+  if (g >= 0)
+goto l;
+k:
+  i = a + 3;
+m:
+  f = 652685095 + 818172564 * g;
+  if (-1101344938 * f - 1654872807 * d >= 0)
+goto n;
+  goto l;
+o:
+  if (i) {
+c = -b;
+if (-c >= 0)
+  goto l;
+g = b;
+b = i + 5;
+if (b * c)
+  goto n;
+goto o;
+  }
+  if (e)
+goto m;
+  goto j;
+n:
+  d = 978208086 * g - 1963072513;
+  if (d + i)
+return 0;
+  goto k;
+l:
+  goto o;
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 0f0770ad705..ce13acc312d 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1552,6 +1552,11 @@ irange::verify_range ()
   gcc_checking_assert (ub.get_precision () == prec);
   int c = wi::cmp (lb, ub, TYPE_SIGN (m_type));
   gcc_checking_assert (c == 0 || c == -1);
+  // Previous UB should be lower than LB
+  if (i > 0)
+	gcc_checking_assert (wi::lt_p (upper_bound (i - 1),
+   lb,
+   TYPE_SIGN (m_type)));
 }
   m_bitmask.verify_mask ();
 }
@@ -1628,7 +1633,7 @@ irange::contains_p (const wide_int &cst) const
   if (undefined_p ())
 return false;
 
-  // Check is the known bits in bitmask exclude CST.
+  // Check if the known bits in bitmask exclude CST.
   if (!m_bitmask.member_p (cst))
 return false;
 
@@ -2269,7 +2274,7 @@ irange::invert ()
 
 // This routine will take the bounds [LB, UB], and apply the bitmask to those
 // values such that both bounds satisfy the bitmask.  TRUE is returned
-// if either bound changes, and they are retuirned as [NEW_LB, NEW_UB].
+// if either bound changes, and they are returned as [NEW_LB, NEW_UB].
 // if NEW_UB < NEW_LB, then the entire bound is to be removed as none of
 // the values are valid.
 //   ie,   [4, 14] MASK 0xFFFE  VALUE 0x1
@@ -2285,30 +2290,29 @@ irange::snap (const wide_int &lb, const wide_int &ub,
   uint z = wi::ctz (m_bitmask.mask ());
   if (z == 0)
 return false;
-  const wide_int &wild_mask = m_bitmask.mask ();
 
   const wide_int step = (wi::one (TYPE_PRECISION (type ())) << z);
   const wide_int match_mask = step - 1;
   const wide_int value = m_bitmask.value () & match_mask;
 
-  wide_int rem_lb = lb & match_mask;
-
-  wi::overflow_type ov_sub;
-  wide_int diff = wi::sub(value, rem_lb, UNSIGNED, &ov_sub);
-  wide_int offset = diff & match_mask;
+  bool ovf = false;
 
-  wi::overflow_type ov1;
-  new_lb = wi::add (lb, offset, UNSIGNED, &ov1);
+  wide_int rem_lb = lb & match_mask;
+  wide_int offset = (value - rem_lb) & match_mask;
+  new_lb = lb + offset;
+  // Check for overflows at +INF
+  if (wi::lt_p (new_lb, lb, TYPE_SIGN (type (
+ovf = true;
 
   wide_int rem_ub = ub & match_mask;
   wide_int offset_ub = (rem_ub - value) & match_mask;
-
-  wi::overflow_type ov2;
-  new_ub = 

[PATCH] x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

2025-06-20 Thread H.J. Lu
Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is
available.

gcc/

PR target/120728
* config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovdqu8
only with EVEX registers.

gcc/testsuite/

PR target/120728
* gcc.target/i386/pr120728.c: New test.


-- 
H.J.
From fb8db1e46aa4318f8c29853d97e77353dcab1e1c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 20 Jun 2025 16:07:18 +0800
Subject: [PATCH] x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is
available.

gcc/

	PR target/120728
	* config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovdqu8
	only with EVEX registers.

gcc/testsuite/

	PR target/120728
	* gcc.target/i386/pr120728.c: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc  |  8 +++
 gcc/testsuite/gcc.target/i386/pr120728.c | 27 
 2 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120728.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 77853297a2f..c0284fbdf4e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5703,7 +5703,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		  : "%vmovaps");
 	  else
 	opcode = (misaligned_p
-		  ? (TARGET_AVX512BW
+		  ? (TARGET_AVX512BW && evex_reg_p
 			 ? "vmovdqu16"
 			 : "%vmovdqu")
 		  : "%vmovdqa");
@@ -5745,7 +5745,7 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		  : "%vmovaps");
 	  else
 	opcode = (misaligned_p
-		  ? (TARGET_AVX512BW
+		  ? (TARGET_AVX512BW && evex_reg_p
 			 ? "vmovdqu8"
 			 : "%vmovdqu")
 		  : "%vmovdqa");
@@ -5759,13 +5759,13 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		  : "vmovdqa64");
 	  else if (egpr_p)
 	opcode = (misaligned_p
-		  ? (TARGET_AVX512BW
+		  ? (TARGET_AVX512BW && evex_reg_p
 			 ? "vmovdqu16"
 			 : "%vmovups")
 		  : "%vmovaps");
 	  else
 	opcode = (misaligned_p
-		  ? (TARGET_AVX512BW
+		  ? (TARGET_AVX512BW && evex_reg_p
 			 ? "vmovdqu16"
 			 : "%vmovdqu")
 		  : "%vmovdqa");
diff --git a/gcc/testsuite/gcc.target/i386/pr120728.c b/gcc/testsuite/gcc.target/i386/pr120728.c
new file mode 100644
index 000..93d2cd07e2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr120728.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v4" } */
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+, " 3 } } */
+/* { dg-final { scan-assembler-not "vmovdqu8" } } */
+/* { dg-final { scan-assembler-not "vmovdqu16" } } */
+
+typedef char __v32qi __attribute__ ((__vector_size__ (32)));
+typedef char __v32qi_u __attribute__ ((__vector_size__ (32),
+   __aligned__ (1)));
+typedef short __v16hi __attribute__ ((__vector_size__ (32)));
+typedef short __v16hi_u __attribute__ ((__vector_size__ (32),
+	   __aligned__ (1)));
+typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
+typedef _Float16 __v16hf_u __attribute__ ((__vector_size__ (32),
+	   __aligned__ (1)));
+
+extern __v32qi_u v1;
+extern __v16hi_u v2;
+extern __v16hf_u v3;
+
+void
+foo (__v32qi x1, __v16hi x2, __v16hf x3)
+{
+  v1 = x1;
+  v2 = x2;
+  v3 = x3;
+}
-- 
2.49.0



Re: [PATCH] x86: Get the widest vector mode from MOVE_MAX

2025-06-20 Thread Uros Bizjak
On Thu, Jun 19, 2025 at 1:27 PM H.J. Lu  wrote:
>
> Since MOVE_MAX defines the maximum number of bytes that an instruction
> can move quickly between memory and registers, use it to get the widest
> vector mode in vector loop when inlining memcpy and memset.
>
> gcc/
>
> PR target/120708
> * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use
> MOVE_MAX to get the widest vector mode in vector loop.
>
> gcc/testsuite/
>
> PR target/120708
> * gcc.target/i386/memcpy-pr120708-1.c: New test.
> * gcc.target/i386/memcpy-pr120708-2.c: Likewise.
> * gcc.target/i386/memcpy-pr120708-3.c: Likewise.
> * gcc.target/i386/memcpy-pr120708-4.c: Likewise.
> * gcc.target/i386/memcpy-pr120708-5.c: Likewise.
> * gcc.target/i386/memcpy-pr120708-6.c: Likewise.
> * gcc.target/i386/memset-pr120708-1.c: Likewise.
> * gcc.target/i386/memset-pr120708-2.c: Likewise.
> * gcc.target/i386/memcpy-strategy-1.c: Drop dg-skip-if.  Replace
> -march=atom with -mno-avx -msse2 -mtune=generic
> -mtune-ctrl=^sse_typeless_stores.
> * gcc.target/i386/memcpy-strategy-2.c: Likewise.
> * gcc.target/i386/memcpy-vector_loop-1.c: Likewise.
> * gcc.target/i386/memcpy-vector_loop-2.c: Likewise.
> * gcc.target/i386/memset-vector_loop-1.c: Likewise.
> * gcc.target/i386/memset-vector_loop-2.c: Likewise.

OK.

Thanks,
Uros.


Re: [PATCH 0/2] Memory leak fixes in prime paths [PR120634]

2025-06-20 Thread Jørgen Kvalsvik

Thanks, pushed.

On 6/20/25 12:18, Richard Biener wrote:

On Thu, Jun 19, 2025 at 11:21 PM Jørgen Kvalsvik  wrote:


Hi,

These patches fix a memory leak in the prime paths, and some in the
selftests that show up in make selftest-valgrind. After applying these
patches on my x86-64-linux-gnu system and make selftest-valgrind:


OK.

Thanks,
Richard.


-fself-test: 7665942 pass(es) in 8.943705 seconds
==802130==
==802130== HEAP SUMMARY:
==802130== in use at exit: 1,174,596 bytes in 2,428 blocks
==802130==   total heap usage: 3,650,060 allocs, 3,647,632 frees, 1,776,527,949 
bytes allocated
==802130==
==802130== LEAK SUMMARY:
==802130==    definitely lost: 0 bytes in 0 blocks
==802130==    indirectly lost: 0 bytes in 0 blocks
==802130==      possibly lost: 0 bytes in 0 blocks
==802130==    still reachable: 1,174,596 bytes in 2,428 blocks
==802130==         suppressed: 0 bytes in 0 blocks


Jørgen Kvalsvik (2):
   Free buffer on function exit [PR120634]
   Use auto_vec in prime paths selftests [PR120634]

  gcc/prime-paths.cc | 50 ++
  1 file changed, 24 insertions(+), 26 deletions(-)

--
2.39.5





Re: [PATCH 0/2] Memory leak fixes in prime paths [PR120634]

2025-06-20 Thread Richard Biener
On Thu, Jun 19, 2025 at 11:21 PM Jørgen Kvalsvik  wrote:
>
> Hi,
>
> These patches fix a memory leak in the prime paths, and some in the
> selftests that show up in make selftest-valgrind. After applying these
> patches on my x86-64-linux-gnu system and make selftest-valgrind:

OK.

Thanks,
Richard.

> -fself-test: 7665942 pass(es) in 8.943705 seconds
> ==802130==
> ==802130== HEAP SUMMARY:
> ==802130== in use at exit: 1,174,596 bytes in 2,428 blocks
> ==802130==   total heap usage: 3,650,060 allocs, 3,647,632 frees, 
> 1,776,527,949 bytes allocated
> ==802130==
> ==802130== LEAK SUMMARY:
> ==802130==    definitely lost: 0 bytes in 0 blocks
> ==802130==    indirectly lost: 0 bytes in 0 blocks
> ==802130==      possibly lost: 0 bytes in 0 blocks
> ==802130==    still reachable: 1,174,596 bytes in 2,428 blocks
> ==802130==         suppressed: 0 bytes in 0 blocks
>
>
> Jørgen Kvalsvik (2):
>   Free buffer on function exit [PR120634]
>   Use auto_vec in prime paths selftests [PR120634]
>
>  gcc/prime-paths.cc | 50 ++
>  1 file changed, 24 insertions(+), 26 deletions(-)
>
> --
> 2.39.5
>


Re: [RFC PATCH] gimple-simulate: Add a gimple IR interpreter/simulator

2025-06-20 Thread Richard Biener
On Thu, Jun 19, 2025 at 12:13 PM Mikael Morin  wrote:
>
> Le 18/06/2025 à 16:51, Richard Biener a écrit :
> > On Wed, Jun 18, 2025 at 11:23 AM Mikael Morin  
> > wrote:
> >>
> >> From: Mikael Morin 
> >>
> >> Hello,
> >>
> >> I'm proposing here an interpreter/simulator of the gimple IR.
> >> It proved useful for me to debug complicated testcases, where
> >> the misbehaviour is not obvious if you just look at the IR dump.
> >> It produces an execution trace on the standard error stream, where
> >> one can see the values of variables changing as statements are executed.
> >>
> >> I only implemented the bits that were needed in my testcase(s), so there
> >> are some holes in the implementation, especially regarding builtin
> >> functions.
> >>
> >> Here are two sample outputs:
> >>
> >> a = {-2.0e+0, 3.0e+0, -5.0e+0, 7.0e+0, -1.1e+1, 1.3e+1};
> >># a[0] = -2.0e+0
> >># a[1] = 3.0e+0
> >># a[2] = -5.0e+0
> >># a[3] = 7.0e+0
> >># a[4] = -1.1e+1
> >># a[5] = 1.3e+1
> >> b = {1.7e+1, -2.3e+1, 2.9e+1, -3.1e+1, 3.7e+1, -4.1e+1};
> >># b[0] = 1.7e+1
> >># b[1] = -2.3e+1
> >># b[2] = 2.9e+1
> >># b[3] = -3.1e+1
> >># b[4] = 3.7e+1
> >># b[5] = -4.1e+1
> >> # Entering function main
> >># Executing bb 0
> >># Leaving bb 0, preparing to execute bb 2
> >># Executing bb 2
> >>_gfortran_set_args (argc_1(D), argv_2(D));
> >>  # ignored
> >>_gfortran_set_options (7, &options__7[0]);
> >>  # ignored
> >>_3 = __builtin_calloc (12, 1);
> >>  # _3 = &
> >>if (_3 == 0B)
> >>  # Condition evaluates to false
> >># Leaving bb 2, preparing to execute bb 4
> >>__var_3_do_19 = PHI <0(2), _17(5)>
> >>  # __var_3_do_19 = 0
> >>_18 = PHI <0.0(2), _5(5)>
> >>  # _18 = 0.0
> >># Executing bb 4
> >>_17 = __var_3_do_19 + 1;
> >>  # _17 = 1
> >>_14 = (long unsigned int) _17;
> >>  # _14 = 1
> >>_13 = MEM[(real__kind_4_ *)&a + -4B + _14 * 4];
> >>  # _13 = -2.0e+0
> >>_12 = _13 * 2.9e+1;
> >>  # _12 = -5.8e+1
> >>_11 = _12 + _18;
> >>  # _11 = -5.8e+1
> >>MEM[(real__kind_4_ *)_3 + -4B + _14 * 4] = _11;
> >>  # MEM[(real__kind_4_ *)_3 + -4B + _14 * 4] = -5.8e+1
> >>if (_17 == 3)
> >>  # Condition evaluates to false
> >># Leaving bb 4, preparing to execute bb 5
> >># Executing bb 5
> >>_5 = MEM[(real__kind_4_ *)_3 + _14 * 4];
> >>  # _5 = 0.0
> >># Leaving bb 5, preparing to execute bb 4
> >>__var_3_do_19 = PHI <0(2), _17(5)>
> >>  # __var_3_do_19 = 1
> >>_18 = PHI <0.0(2), _5(5)>
> >>  # _18 = 0.0
> >># Executing bb 4
> >>_17 = __var_3_do_19 + 1;
> >>  # _17 = 2
> >>_14 = (long unsigned int) _17;
> >>  # _14 = 2
> >>_13 = MEM[(real__kind_4_ *)&a + -4B + _14 * 4];
> >>  # _13 = 3.0e+0
> >>_12 = _13 * 2.9e+1;
> >>  # _12 = 8.7e+1
> >>_11 = _12 + _18;
> >>  # _11 = 8.7e+1
> >>MEM[(real__kind_4_ *)_3 + -4B + _14 * 4] = _11;
> >>  # MEM[(real__kind_4_ *)_3 + -4B + _14 * 4] = 8.7e+1
> >>if (_17 == 3)
> >>  # Condition evaluates to false
> >># Leaving bb 4, preparing to execute bb 5
> >># Executing bb 5
> >>_5 = MEM[(real__kind_4_ *)_3 + _14 * 4];
> >>  # _5 = 0.0
> >># Leaving bb 5, preparing to execute bb 4
> >>__var_3_do_19 = PHI <0(2), _17(5)>
> >>  # __var_3_do_19 = 2
> >>_18 = PHI <0.0(2), _5(5)>
> >>  # _18 = 0.0
> >># Executing bb 4
> >>...
> >>
> >>MEM  [(character__kind_1_ *)&str] = { 97, 99 };
> >>  # str[0][0] = 97
> >>  # str[1][0] = 99
> >>str[2][0] = 97;
> >>  # str[2][0] = 97
> >>parm__3.data = &str;
> >>  # parm__3.data = &str
> >>parm__3.offset = -1;
> >>  # parm__3.offset = -1
> >>parm__3.dtype.elem_len = 1;
> >>  # parm__3.dtype.elem_len = 1
> >>MEM  [(void *)&parm__3 + 24B] = 6601364733952;
> >>  # parm__3.dtype.version = 0
> >>  # parm__3.dtype.rank = 1
> >>  # parm__3.dtype.type = 6
> >>  # parm__3.dtype.attribute = 0
> >>MEM  [(struct array01_character__kind_1_ *)&parm__3 
> >> + 32B] = { 1, 1 };
> >>  # parm__3.span = 1
> >>  # parm__3.dim[0].spacing = 1
> >>MEM  [(struct array01_character__kind_1_ *)&parm__3 
> >> + 48B] = { 1, 3 };
> >>  # parm__3.dim[0].lbound = 1
> >>  # parm__3.dim[0].ubound = 3
> >>atmp__4.offset = 0;
> >>  # atmp__4.offset = 0
> >>atmp__4.dtype.elem_len = 4;
> >>  # atmp__4.dtype.elem_len = 4
> >>MEM  [(void *)&atmp__4 + 24B] = 1103806595072;
> >>  # atmp__4.dtype.version = 0
> >>  # atmp__4.dtype.rank = 1
> >>  # atmp__4.dtype.type = 1
> >>  # atmp__4.dtype.attribute = 0
> >>MEM  [(struct array01_integer__kind_4_ *)&atmp__4 + 
> >> 32B] = { 4, 4 };
> >>  # atmp__4.span = 4
> >>  # atmp__4.dim[0].spacing = 4
> >>MEM  [(struct array01_integer__kind_4_ *)&atmp__4 + 
> >> 48B] = { 0, 0 };
> >>  # atmp__4.dim[0].lbound = 0
> >>  # atmp__4.dim[0].ubound = 0

[PATCH] mklog.py: Add main function

2025-06-20 Thread Alex Coplan
Hi,

This adds a main() function to mklog.py (like e.g. check_GNU_style.py
has), which makes it easier to import and invoke from another python
script.  This is useful when using a wrapper script to set up the python
environment.

Smoke tested by using the modified mklog.py to generate the ChangeLog
for this patch.

OK to install?

Thanks,
Alex

contrib/ChangeLog:

* mklog.py: Add main() function.
diff --git a/contrib/mklog.py b/contrib/mklog.py
index dcf7dde6333..26d4156b034 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -360,7 +360,7 @@ def skip_line_in_changelog(line):
 return FIRST_LINE_OF_END_RE.match(line) is None
 
 
-if __name__ == '__main__':
+def main():
 extra_args = os.getenv('GCC_MKLOG_ARGS')
 if extra_args:
 sys.argv += json.loads(extra_args)
@@ -447,3 +447,6 @@ if __name__ == '__main__':
 f.write('\n'.join(end))
 else:
 print(output, end='')
+
+if __name__ == '__main__':
+main()


Re: [PATCH v5 2/3][__bdos]Use the counted_by attribute of pointers in builtinin-object-size.

2025-06-20 Thread Qing Zhao


> On Jun 19, 2025, at 12:16, Siddhesh Poyarekar  wrote:
> 
> On 2025-06-19 12:07, Siddhesh Poyarekar wrote:
>> On 2025-06-16 18:08, Qing Zhao wrote:
>>> gcc/ChangeLog:
>>> 
>>> * tree-object-size.cc (access_with_size_object_size): Handle pointers
>>> with counted_by.
>> This should probably just say "Update comment for .ACCESS_WITH_SIZE.".
>>> (collect_object_sizes_for): Likewise.
> 
> Oh, and this should be updated accordingly as well.

Yes, will update. 
> 
>>> @@ -1854,6 +1856,10 @@ collect_object_sizes_for (struct object_size_info 
>>> *osi, tree var)
>>>   if (TREE_CODE (rhs) == SSA_NAME
>>>   && POINTER_TYPE_P (TREE_TYPE (rhs)))
>>> reexamine = merge_object_sizes (osi, var, rhs);
>>> +else if (TREE_CODE (rhs) == MEM_REF
>>> + && POINTER_TYPE_P (TREE_TYPE (rhs))
>>> + && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME)
>>> +  reexamine = merge_object_sizes (osi, var, TREE_OPERAND (rhs, 0));
>>>   else
>>> expr_object_size (osi, var, rhs);
>>> }
>> Interesting, looks like this would improve coverage for more than just this 
>> specific case of pointers within structs.  Nice!  Is this what 
>> pointer-counted-by-7.c covers?

 the mentioned code change:

+else if (TREE_CODE (rhs) == MEM_REF
+ && POINTER_TYPE_P (TREE_TYPE (rhs))
+ && TREE_CODE (TREE_OPERAND (rhs, 0)) == SSA_NAME)
+  reexamine = merge_object_sizes (osi, var, TREE_OPERAND (rhs, 0));

is mainly for the following IR pattern that is common when the object size is 
queried 
for a pointer with “counted_by” attribute:

  _1 = .ACCESS_WITH_SIZE (_3, _4, 1, 0, -1, 0B);
  _5 = *_1; ===> this is the GIMPLE_ASSIGN stmt that the above code 
handles.
  _6 = __builtin_dynamic_object_size (_5, 1);

Yes, this is a minimal and necessary change for using the counted_by of 
pointers in __bdos. 
Hope this is clear. 
I will add comments to the above change. 

> 
> Actually, I wonder if this is incomplete.  We should be able to get the 
> object size even if TREE_OPERAND (rhs, 0) is not an SSA_NAME, like we do in 
> addr_object_size:
> 
> ```
>  if (TREE_CODE (pt_var) == MEM_REF)
>{
>  tree sz, wholesize;
> 
>  if (!osi || (object_size_type & OST_SUBOBJECT) != 0
>  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
>{
>  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
>   object_size_type & ~OST_SUBOBJECT, &sz);
>  wholesize = sz;
>}
>  else
>{
>  tree var = TREE_OPERAND (pt_var, 0);
>  if (osi->pass == 0)
>collect_object_sizes_for (osi, var);
>  if (bitmap_bit_p (computed[object_size_type],
>SSA_NAME_VERSION (var)))
>{
>  sz = object_sizes_get (osi, SSA_NAME_VERSION (var));
>  wholesize = object_sizes_get (osi, SSA_NAME_VERSION (var), true);
>}
>  else
>sz = wholesize = size_unknown (object_size_type);
>}
>  if (!size_unknown_p (sz, object_size_type))
>sz = size_for_offset (sz, TREE_OPERAND (pt_var, 1), wholesize);
> 
>  if (!size_unknown_p (sz, object_size_type)
>  && (TREE_CODE (sz) != INTEGER_CST
>  || compare_tree_int (sz, offset_limit) < 0))
>{
>  pt_var_size = sz;
>  pt_var_wholesize = wholesize;
>}
>}
> ```
> 
> Maybe refactor this to handle MEM_REF and update addr_object_size (and 
> collect_object_sizes_for for the gimple_assign_single_p case) to use it?
> 
> However, that's a more general change and we'd probably need a new test case 
> to validate it.

Yes, I guess that this might be a general improvement to the current __bdos. I 
can do it later in a separate patch. Do you have a good testing case for this?
>  I won't block on this though, your minimal change is a step forward, maybe 
> just add a comment about this?

Yeah, I will add comments to the current change in tree-object-size.cc 
 and also update the testing cases as you
suggested.

Thanks a lot for your review.

Qing



> 
> Thanks,
> Sid




RE: [PATCH v1] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread Li, Pan2
> Does immediate_operand () work instead of a new predicate?

Thanks Robin, the immediate_operand works well here, let me send v2 if no 
surprise from test.

Pan

-Original Message-
From: Robin Dapp  
Sent: Friday, June 20, 2025 5:29 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken ; Liu, Hongtao 
; Robin Dapp 
Subject: Re: [PATCH v1] RISC-V: Fix ICE for expand_select_vldi [PR120652]

Hi Pan,

> +(define_special_predicate "vectorization_factor_operand"
> +  (match_code "const_int,const_poly_int"))
> +

Does immediate_operand () work instead of a new predicate?

-- 
Regards
 Robin



[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

2025-06-20 Thread pan2 . li
From: Pan Li 

This patch would like to combine the vec_duplicate + vsaddu.vv to the
vsaddu.vx.  See the example code below.  The related pattern will depend
on the cost of vec_duplicate from GR2VR.  Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_VX_BINARY(T, FUNC)  \
  void\
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {   \
for (unsigned i = 0; i < n; i++)  \
  out[i] = FUNC (in[i], x);   \
  }

  T sat_add(T a, T b)
  {
return (a + b) | (-(T)((T)(a + b) < a));
  }

  DEF_VX_BINARY(uint32_t, sat_add)

Before this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │ beq a3,zero,.L8
  12   │ vsetvli a5,zero,e32,m1,ta,ma
  13   │ vmv.v.x v2,a2
  14   │ sllia3,a3,32
  15   │ srlia3,a3,32
  16   │ .L3:
  17   │ vsetvli a5,a3,e32,m1,ta,ma
  18   │ vle32.v v1,0(a1)
  19   │ sllia4,a5,2
  20   │ sub a3,a3,a5
  21   │ add a1,a1,a4
  22   │ vsaddu.vv v1,v1,v2
  23   │ vse32.v v1,0(a0)
  24   │ add a0,a0,a4
  25   │ bne a3,zero,.L3

After this patch:
  10   │ test_vx_binary_or_int32_t_case_0:
  11   │ beq a3,zero,.L8
  12   │ sllia3,a3,32
  13   │ srlia3,a3,32
  14   │ .L3:
  15   │ vsetvli a5,a3,e32,m1,ta,ma
  16   │ vle32.v v1,0(a1)
  17   │ sllia4,a5,2
  18   │ sub a3,a3,a5
  19   │ add a1,a1,a4
  20   │ vsaddu.vx v1,v1,a2
  21   │ vse32.v v1,0(a0)
  22   │ add a0,a0,a4
  23   │ bne a3,zero,.L3

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case US_PLUS.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op us_plus.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc  | 2 ++
 gcc/config/riscv/riscv.cc| 1 +
 gcc/config/riscv/vector-iterators.md | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index ac690df3688..45dd9256d02 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5541,6 +5541,7 @@ expand_vx_binary_vec_dup_vec (rtx op_0, rtx op_1, rtx 
op_2,
 case UMAX:
 case SMIN:
 case UMIN:
+case US_PLUS:
   icode = code_for_pred_scalar (code, mode);
   break;
 case MINUS:
@@ -5579,6 +5580,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx 
op_2,
 case UMAX:
 case SMIN:
 case UMIN:
+case US_PLUS:
   icode = code_for_pred_scalar (code, mode);
   break;
 default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3c1bb74675a..42d06336a80 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3995,6 +3995,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
case UDIV:
case MOD:
case UMOD:
+   case US_PLUS:
  *total = get_vector_binary_rtx_cost (op, scalar2vr_cost);
  break;
default:
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 44ae79c48aa..0e1318d1447 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,11 +4042,11 @@ (define_code_iterator any_int_binop [plus minus and ior 
xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_v_vdup [
-  plus minus and ior xor mult div udiv mod umod smax umax smin umin
+  plus minus and ior xor mult div udiv mod umod smax umax smin umin us_plus
 ])
 
 (define_code_iterator any_int_binop_no_shift_vdup_v [
-  plus minus and ior xor mult smax umax smin umin
+  plus minus and ior xor mult smax umax smin umin us_plus
 ])
 
 (define_code_iterator any_int_unop [neg not])
-- 
2.43.0



[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

2025-06-20 Thread pan2 . li
From: Pan Li 

This patch would like to introduce the combine of vec_dup + vsaddu.vv
into vsaddu.vx on the cost value of GR2VR.  The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.  There will be two cases for the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vsaddu.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vsaddu.vv
 |   J L1
 |   ...

Both will be combined to below if the cost of GR2VR is zero.
 |   ...
 | L1:
 |   vsaddu.vx
 |   J L1
 |   ...

The below test suites are passed for this patch series.
* The rv64gcv fully regression test.

Pan Li (3):
  RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 0 with GR2VR cost 
0, 2 and 15
  RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 1 with GR2VR cost 
0, 1 and 2

 gcc/config/riscv/riscv-v.cc   |   2 +
 gcc/config/riscv/riscv.cc |   1 +
 gcc/config/riscv/vector-iterators.md  |   4 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   1 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c|   1 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   |  42 ++--
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vsadd-run-1-u16.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vsadd-run-1-u32.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vsadd-run-1-u64.c|  17 ++
 .../rvv/autovec/vx_vf/vx_vsadd-run-1-u8.c |  17 ++
 33 files changed, 334 insertions(+), 15 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vsadd-run-1-u8.c

-- 
2.43.0



Extend afdo inliner to introduce speculative calls

2025-06-20 Thread Jan Hubicka
Hi,
this patch makes AFDO's VPT happen during early inlining.  This should
make the einline pass inside the afdo pass unnecessary, but some inlining still
happens there - I will need to debug why that happens and will try to drop the
afdo's inliner incrementally.

get_inline_stack_in_node can now be used to produce inline stack out of
callgraph nodes which are marked as inline clones, so we do not need to iterate
tree-inline and IPA decisions phases like old code did.   I also added some
debug facilities - dumping of decisions and inline stacks, so one can match
them with data in gcov profile.

The former VPT pass identified all cases where an indirect call was inlined in
the train run and the inlined callee collected some samples.  In this case it
forced inlining without doing any checks, such as whether inlining is possible.

The new code simply introduces speculative edges into the callgraph and lets
afdo inlining decide.  The old code also marked statements that were introduced
during promotion to prevent double speculation, i.e.

   if (ptr == foo)
  .. inlined foo ...
   else
  ptr ();

to

   if (ptr == foo)
  .. inlined foo ...
   else if (ptr == foo)
  foo (); // for IPA inlining
   else
  ptr ();

Since inlining now happens much earlier, tracking the statements would be quite
hard.  Instead I simply remove the targets from the profile data, which should
have the same effect.

I also noticed that there is nothing setting max_count, so all non-zero profile
counts were considered hot; I fixed that too.

Training with the ref run I now get:
                   Base                  Peak (AFDO)
                   Copies Time   Rate    Copies Time   Rate
500.perlbench_r    1      160    9.93 *  1      162    9.84 *
502.gcc_r          NR                    NR
505.mcf_r          1      186    8.68 *  1      194    8.34 *
520.omnetpp_r      1      183    7.15 *  1      208    6.32 *
523.xalancbmk_r    NR                    NR
525.x264_r         1      85.2  20.5  *  1      85.8  20.4  *
531.deepsjeng_r    1      165    6.93 *  1      176    6.51 *
541.leela_r        1      268    6.18 *  1      282    5.87 *
548.exchange2_r    1      86.3  30.4  *  1      88.9  29.5  *
557.xz_r           1      224    4.81 *  1      224    4.82 *
 Est. SPECrate2017_int_base   9.72
 Est. SPECrate2017_int_peak   9.33

503.bwaves_r       NR                    NR
507.cactuBSSN_r    1      107   11.9  *  1      105   12.0  *
508.namd_r         1      108    8.79 *  1      116    8.18 *
510.parest_r       1      143   18.3  *  1      156   16.8  *
511.povray_r       1      188   12.4  *  1      163   14.4  *
519.lbm_r          1      72.0  14.6  *  1      75.0  14.1  *
521.wrf_r          1      106   21.1  *  1      106   21.1  *
526.blender_r      1      147   10.3  *  1      147   10.4  *
527.cam4_r         1      110   15.9  *  1      118   14.8  *
538.imagick_r      1      104   23.8  *  1      105   23.7  *
544.nab_r          1      146   11.6  *  1      143   11.8  *
549.fotonik3d_r    1      134   29.0  *  1      169   23.1  *
554.roms_r         1      86.6  18.4  *  1      89.3  17.8  *
 Est. SPECrate2017_fp_base    15.4
 Est. SPECrate2017_fp_peak    14.9


Base is without profile feedback and peak is AFDO.

Autofdo bootstrapped/regtested on x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* auto-profile.cc (dump_inline_stack): New function.
(get_inline_stack_in_node): New function.
(get_relative_location_for_stmt): Add FN parameter.
(has_indirect_call): Remove.
(function_instance::find_icall_target_map): Add FN parameter.
(function_instance::remove_icall_target): New function.
(function_instance::read_function_instance): Set sum_max.
(autofdo_source_profile::get_count_info): Add NODE parameter.
(autofdo_source_profile::update_inlined_ind_target): Add NODE parameter.
(autofdo_source_profile::remove_icall_target): New function.
(afdo_indirect_call): Add INDIRECT_EDGE parameter; dump reason
for failure; do not check for recursion; do not inline call.
(afdo_vpt): Add INDIRECT_EDGE parameter.
(afdo_set_bb_count): Do not take PROMOTED set.
(afdo_vpt_for_early_inline): Remove.
(afdo_annotate_cfg): Do not take PROMOTED set.
(auto_profile): Do not call afdo_vpt_for_early_inline.
(afdo_callsite_hot_enough_for_early_inline): Dump count.
(remove_afdo_speculative_target): New func

[to-be-committed][RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-V

2025-06-20 Thread Jeff Law
The RISC-V prefetch support is broken in a few ways.  This addresses the 
data side prefetch problems.  I'd mistakenly thought this BZ was
prefetch.i related (which has deeper problems).


The basic problem is we were accepting any valid address when in fact 
there are restrictions.  This patch more precisely defines the predicate 
such that we allow


REG
REG+D

Where D must have the low 5 bits clear.  Note that absolute addresses 
fall into the REG+D form using the x0 for the register operand since it 
always has the value zero.  The test verifies REG, REG+D, ABS addressing 
modes that are valid as well as REG+D and ABS which must be reloaded 
into a REG because the displacement has low bits set.


An earlier version of this patch has gone through testing in my tester 
on rv32 and rv64.  Obviously I'll wait for pre-commit CI to do its thing 
before moving forward.


This is a good backport candidate after simmering on the trunk for a bit.

Jeff
PR target/118241
gcc/
* config/riscv/predicates.md (prefetch_operand): New predicate.
* config/riscv/constraints.md (Q): New constraint.
* config/riscv/riscv.md (prefetch): Use new predicate and constraint.
(riscv_prefetchi_): Similarly.

gcc/testsuite/
* gcc.target/riscv/pr118241.c: New test.

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 58355cf03f2..ccab1a2e29d 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -325,3 +325,7 @@ (define_constraint "Ou02"
   "A 2-bit unsigned immediate."
   (and (match_code "const_int")
(match_test "IN_RANGE (ival, 0, 3)")))
+
+(define_constraint "Q"
+  "An address operand that is valid for a prefetch instruction"
+  (match_operand 0 "prefetch_operand"))
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 23690792b32..97a0ed2ae66 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,6 +27,18 @@ (define_predicate "arith_operand"
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
+;; REG or REG+D where D fits in a simm12 and has the low 5 bits
+;; off.  The REG+D form can be reloaded into a temporary if needed
+;; after FP elimination if that exposes an invalid offset.
+(define_predicate "prefetch_operand"
+  (ior (match_operand 0 "register_operand")
+   (and (match_test "const_arith_operand (op, VOIDmode)")
+(match_test "(INTVAL (op) & 0x1f) == 0"))
+   (and (match_code "plus")
+(match_test "register_operand (XEXP (op, 0), word_mode)")
+(match_test "const_arith_operand (XEXP (op, 1), VOIDmode)")
+(match_test "(INTVAL (XEXP (op, 1)) & 0x1f) == 0"
+
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3aed25c2588..3406b50518e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4402,7 +4402,7 @@ (define_insn "riscv_zero_"
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand 0 "address_operand" "r")
+  [(prefetch (match_operand 0 "prefetch_operand" "Q")
  (match_operand 1 "imm5_operand" "i")
  (match_operand 2 "const_int_operand" "n"))]
   "TARGET_ZICBOP"
@@ -4422,7 +4422,7 @@ (define_insn "prefetch"
  (const_string "4")))])
 
 (define_insn "riscv_prefetchi_"
-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
+  [(unspec_volatile:X [(match_operand:X 0 "prefetch_operand" "Q")
   (match_operand:X 1 "imm5_operand" "i")]
   UNSPECV_PREI)]
   "TARGET_ZICBOP"
diff --git a/gcc/testsuite/gcc.target/riscv/pr118241.c 
b/gcc/testsuite/gcc.target/riscv/pr118241.c
new file mode 100644
index 000..f1dc44bce0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr118241.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicbop" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicbop" { target { rv32 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+void test1() { __builtin_prefetch((int *)2047); }
+void test2() { __builtin_prefetch((int *)1024); }
+void test3(char *x) { __builtin_prefetch(&x); }
+void test4(char *x) { __builtin_prefetch(&x[2]); }
+void test5(char *x) { __builtin_prefetch(&x[1024]); }
+
+/* So we expect test1, test3 and test4 to be a prefetch
+   with zero offset.  test2 and test5 will have a 1k offset.  */
+/* { dg-final { scan-assembler-times "prefetch.r\t0\\(\[a-x0-9\]+\\)" 3 } } */
+/* { dg-final { scan-assembler-times "prefetch.r\t1024" 2 } } */
+


Re: [PATCH] AArch64: Disable TARGET_CONST_ANCHOR

2025-06-20 Thread Andrew Pinski
On Fri, Jun 20, 2025, 4:47 PM Wilco Dijkstra  wrote:

>
> TARGET_CONST_ANCHOR appears to trigger too often, even on simple
> immediates.
> It inserts extra ADD/SUB instructions even when a single MOV exists.
> Disable it to improve overall code quality: on SPEC2017 it removes
> 1850 ADD/SUB instructions and 630 spill instructions, and SPECINT is ~0.06%
> faster on Neoverse V2.  Adjust a testcase that was confusing neg and fneg.
>
> Passes regress, OK for commit?
>

I am not sure SPEC2017 is the best benchmark for this.
Do you have more data?
Even data on which spills are happening, and where?

This seems like it would be better to improve the RA rather than workaround
it via this.

Also, 0.06% is in the noise as far as I know.

Thanks,
Andrew




> gcc:
> * config/aarch64/aarch64.cc (TARGET_CONST_ANCHOR): Remove.
>
> gcc/testsuite:
> * gcc.target/aarch64/vneg_s.c: Update test.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index
> 0a3c246517a86697142589a513a327e5ee930349..51279e29db88f0aa332c40abda68ad3b957b0ef0
> 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -32444,9 +32444,6 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_HAVE_SHADOW_CALL_STACK
>  #define TARGET_HAVE_SHADOW_CALL_STACK true
>
> -#undef TARGET_CONST_ANCHOR
> -#define TARGET_CONST_ANCHOR 0x100
> -
>  #undef TARGET_EXTRA_LIVE_ON_ENTRY
>  #define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_extra_live_on_entry
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vneg_s.c
> b/gcc/testsuite/gcc.target/aarch64/vneg_s.c
> index
> 8ddc4d21c1f89d6c66624a33ee0386cb3a28c512..8d91639faaa1c728095265ce4e61327a4dc441e3
> 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vneg_s.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vneg_s.c
> @@ -256,7 +256,7 @@ test_vnegq_s64 ()
>return o1||o2||o2||o4;
>  }
>
> -/* { dg-final { scan-assembler-times "neg\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d"
> 1 } } */
> +/* { dg-final { scan-assembler-times "\tneg\\tv\[0-9\]+\.2d,
> v\[0-9\]+\.2d" 1 } } */
>
>  int
>  main (int argc, char **argv)
>
>