[PATCH] RISC-V: Add testcases for unsigned vector SAT_SUB form 11 and form 12

2025-07-10 Thread Ciyan Pan
From: panciyan 

This patch adds testcases for form 11 and form 12, as shown below:

void __attribute__((noinline))                                        \
vec_sat_u_sub_##T##_fmt_11 (T *out, T *op_1, T *op_2, unsigned limit) \
{                                                                     \
  unsigned i;                                                         \
  for (i = 0; i < limit; i++)                                         \
    {                                                                 \
      T x = op_1[i];                                                  \
      T y = op_2[i];                                                  \
      T ret;                                                          \
      T overflow = __builtin_sub_overflow (x, y, &ret);               \
      out[i] = overflow ? 0 : ret;                                    \
    }                                                                 \
}

void __attribute__((noinline))                                        \
vec_sat_u_sub_##T##_fmt_12 (T *out, T *op_1, T *op_2, unsigned limit) \
{                                                                     \
  unsigned i;                                                         \
  for (i = 0; i < limit; i++)                                         \
    {                                                                 \
      T x = op_1[i];                                                  \
      T y = op_2[i];                                                  \
      T ret;                                                          \
      T overflow = __builtin_sub_overflow (x, y, &ret);               \
      out[i] = !overflow ? ret : 0;                                   \
    }                                                                 \
}
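
Both forms compute the same unsigned saturating subtraction, i.e.
out[i] = (x >= y) ? x - y : 0, which the vectorizer can match to RVV's
saturating vssubu.  A minimal scalar sketch of the semantics (for
illustration only, not part of the testsuite change):

unsigned char
sat_sub_u8 (unsigned char a, unsigned char b)
{
  unsigned char ret;
  /* __builtin_sub_overflow returns nonzero when a < b, i.e. when the
     subtraction wraps; the saturated result is then 0.  */
  return __builtin_sub_overflow (a, b, &ret) ? 0 : ret;
}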

Passed the rv64gcv regression test.

Signed-off-by: Ciyan Pan 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add unsigned vector
SAT_SUB form 11 and form 12.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u8.c: New test.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h | 44 +++
 .../rvv/autovec/sat/vec_sat_u_sub-11-u16.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-11-u32.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-11-u64.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-11-u8.c |  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u16.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u32.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u64.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u8.c |  9 
 .../autovec/sat/vec_sat_u_sub-run-11-u16.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-11-u32.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-11-u64.c| 15 +++
 .../rvv/autovec/sat/vec_sat_u_sub-run-11-u8.c | 15 +++
 .../autovec/sat/vec_sat_u_sub-run-12-u16.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-12-u32.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-12-u64.c| 15 +++
 .../rvv/autovec/sat/vec_sat_u_sub-run-12-u8.c | 15 +++
 17 files changed, 236 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/a

[PATCH 2/2] s390: Implement reduction optabs

2025-07-10 Thread Juergen Christ
Implementation and tests for the standard reduction optabs.
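
For reference, these are the kinds of loops the standard reduction
optabs allow the vectorizer to finish with an in-vector reduction
(illustrative examples, not taken from the patch):

double
sum (const double *a, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += a[i];                 /* reduc_plus_scal_* */
  return s;
}

int
smax (const int *a, int n)
{
  int m = a[0];
  for (int i = 1; i < n; i++)
    m = m > a[i] ? m : a[i];   /* reduc_smax_scal_* */
  return m;
}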

Bootstrapped and regtested on s390.  Ok for trunk?

Signed-off-by: Juergen Christ 

gcc/ChangeLog:

* config/s390/vector.md (reduc_plus_scal_): Implement.
(reduc_plus_scal_v2df): Implement.
(reduc_plus_scal_v4sf): Implement.
(REDUC_FMINMAX): New int iterator.
(reduc_fminmax_name): New int attribute.
(reduc_minmax): New code iterator.
(reduc_minmax_name): New code attribute.
(reduc__scal_v2df): Implement.
(reduc__scal_v4sf): Implement.
(reduc__scal_v2df): Implement.
(reduc__scal_v4sf): Implement.
(REDUCBIN): New code iterator.
(reduc_bin_insn): New code attribute.
(reduc__scal_v2di): Implement.
(reduc__scal_v4si): Implement.
(reduc__scal_v8hi): Implement.
(reduc__scal_v16qi): Implement.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add s390 to vect_logical_reduc targets.
* gcc.target/s390/vector/reduc-binops-1.c: New test.
* gcc.target/s390/vector/reduc-minmax-1.c: New test.
* gcc.target/s390/vector/reduc-plus-1.c: New test.
---
 gcc/config/s390/vector.md | 293 +-
 .../gcc.target/s390/vector/reduc-binops-1.c   |  40 +++
 .../gcc.target/s390/vector/reduc-minmax-1.c   | 234 ++
 .../gcc.target/s390/vector/reduc-plus-1.c | 152 +
 gcc/testsuite/lib/target-supports.exp |   4 +-
 5 files changed, 717 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reduc-binops-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reduc-minmax-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reduc-plus-1.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 26753c099cda..98427b37e884 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -3572,11 +3572,6 @@
   "veval\t%v0,%v1,%v2,%v3,%b4"
   [(set_attr "op_type" "VRI")])
 
-; reduc_smin
-; reduc_smax
-; reduc_umin
-; reduc_umax
-
 ; vec_pack_sfix_trunc: convert + pack ?
 ; vec_pack_ufix_trunc
 ; vec_unpacks_float_hi
@@ -3627,3 +3622,291 @@
   (const_int 4)]
  UNSPEC_FMIN))]
   "TARGET_VXE")
+
+; reduc_plus
+(define_expand "reduc_plus_scal_"
+  [(set (match_dup 4)
+   (unspec:V4SI [(match_operand:VI_HW_QH 1 "register_operand")
+(match_dup 2)]
+   UNSPEC_VEC_VSUM))
+   (set (match_dup 5)
+   (unspec:V2DI [(match_dup 4) (match_dup 3)] UNSPEC_VEC_VSUMQ))
+   (set (match_operand: 0 "register_operand")
+   (vec_select: (match_dup 6)
+ (parallel [(match_dup 7)])))]
+  "TARGET_VX"
+{
+  operands[2] = force_reg (mode, CONST0_RTX (mode));
+  operands[3] = simplify_gen_subreg (V4SImode, operands[2], mode, 0);
+  operands[4] = gen_reg_rtx (V4SImode);
+  operands[5] = gen_reg_rtx (V2DImode);
+  operands[6] = simplify_gen_subreg(mode, operands[5], V2DImode, 0);
+  operands[7] = GEN_INT (16 / GET_MODE_SIZE (mode) - 1);
+})
+
+(define_expand "reduc_plus_scal_"
+  [(set (match_dup 3)
+ (unspec:V2DI [(match_operand:VI_HW_SD 1 "register_operand")
+  (match_dup 2)]
+ UNSPEC_VEC_VSUMQ))
+   (set (match_operand: 0 "register_operand")
+   (vec_select: (match_dup 4)
+ (parallel [(match_dup 5)])))]
+  "TARGET_VX"
+{
+  operands[2] = force_reg (mode, CONST0_RTX (mode));
+  operands[3] = gen_reg_rtx (V2DImode);
+  operands[4] = simplify_gen_subreg (mode, operands[3], V2DImode, 0);
+  operands[5] = GEN_INT (16 / GET_MODE_SIZE (mode) - 1);
+})
+
+(define_expand "reduc_plus_scal_v2df"
+  [(set (match_dup 2)
+   (unspec:V2DF [(match_operand:V2DF 1 "register_operand")
+(match_dup 1)
+(const_int 8)]
+   UNSPEC_VEC_SLDBYTE))
+   (set (match_dup 3) (plus:V2DF (match_dup 1) (match_dup 2)))
+   (set (match_operand:DF 0 "register_operand")
+   (vec_select:DF (match_dup 3) (parallel [(const_int 0)])))]
+  "TARGET_VX"
+{
+  operands[2] = gen_reg_rtx (V2DFmode);
+  operands[3] = gen_reg_rtx (V2DFmode);
+})
+
+(define_expand "reduc_plus_scal_v4sf"
+  [(set (match_dup 2)
+   (unspec:V4SF [(match_operand:V4SF 1 "register_operand")
+(match_dup 1)
+(const_int 4)]
+   UNSPEC_VEC_SLDBYTE))
+   (set (match_dup 3) (plus:V4SF (match_dup 1) (match_dup 2)))
+   (set (match_dup 4)
+   (unspec:V4SF [(match_dup 3) (match_dup 3) (const_int 8)]
+UNSPEC_VEC_SLDBYTE))
+   (set (match_dup 5) (plus:V4SF (match_dup 3) (match_dup 4)))
+   (set (match_operand:SF 0 "register_operand")
+   (vec_select:SF (match_dup 5) (parallel [(const_int 0)])))]
+  "TARGET_VXE"
+{
+  operands[2] = gen_reg_rtx (V4SFmode);
+  operands[3] = gen_reg_rtx (V4SFmode);
+  operands[4] = gen_reg_rtx (V4SFmode);
+  operands[5] = gen_reg_rtx (V4SFmode);
+})
+
+; reduc_fmin, reduc_fmax, reduc_smin, reduc_sma

[PATCH 1/2] s390: Remove min-vect-loop-bound override

2025-07-10 Thread Juergen Christ
The default setting of s390 for the parameter min-vect-loop-bound was
set to 2 to prevent certain epilogue loop vectorizations in the past.
Reevaluation of this parameter shows that this setting is no longer
needed and is sometimes even harmful.  Remove the override to align
s390 with other backends.
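
For illustration (a hypothetical example, not from the patch), a
low-trip-count loop of the kind the parameter gates; with the override
removed, whether it is vectorized now follows the generic
--param min-vect-loop-bound default instead of the old s390-specific
value of 2:

void
scale4 (float *restrict y, const float *restrict x, float a)
{
  /* Only four iterations: previously kept scalar on s390 by the
     min-vect-loop-bound override, now subject to the generic
     profitability heuristics.  */
  for (int i = 0; i < 4; i++)
    y[i] = a * x[i];
}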

Signed-off-by: Juergen Christ 

gcc/ChangeLog:

* config/s390/s390.cc (s390_option_override_internal): Remove override.
---
 gcc/config/s390/s390.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index de9c15c7bd42..737b176766a2 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -16566,9 +16566,6 @@ s390_option_override_internal (struct gcc_options *opts,
   else
 SET_OPTION_IF_UNSET (opts, opts_set, param_vect_partial_vector_usage, 0);
 
-  /* Do not vectorize loops with a low trip count for now.  */
-  SET_OPTION_IF_UNSET (opts, opts_set, param_min_vect_loop_bound, 2);
-
   /* Set the default alignment.  */
   s390_default_align (opts);
 
-- 
2.43.5



Re: [COMMITTED] cobol: Development round-up. [PR120765, PR119337, PR120794]

2025-07-10 Thread Richard Biener
On Wed, Jul 9, 2025 at 10:16 PM Robert Dubner  wrote:
>
> From 069bf2fe31e99f0415ddb6acaf76cfb6eee8bb6a Mon Sep 17 00:00:00 2001
> From: Robert Dubner <rdub...@symas.com>
> Date: Wed, 9 Jul 2025 12:24:38 -0400
> Subject: [PATCH] cobol: Development round-up. [PR120765, PR119337,
> PR120794]
>
> This collection of changes reflects development by both Jim Lowden and Bob
> Dubner.  It includes fixes to the cobcd script; refinements to the
> multiple-
> period syntax; changes to the parser; implementation of DISPLAY/ACCEPT to
> and
> from ENVIRONMENT-NAME, ENVIRONMENT-VALUE, ARGUMENT-NUMBER, ARGUMENT-VALUE
> and
> minor changes to genapi.cc to cut down on the number of cppcheck warnings.
>
> Co-authored-by: James K. Lowden <jklow...@cobolworx.com>
> Co-authored-by: Robert Dubner <rdub...@symas.com>

I'm taking this as an opportunity to notify you about the upcoming GCC 15.2
release (I expect a release candidate not earlier than the end of July).

How do you want to go about maintaining Cobol on the GCC 15 branch?  There is
IMO the opportunity to sync what was done on trunk to the branch.  I can't
really tell whether that is, at this point, a significant improvement in
usability compared to what is in GCC 15.1, so I'm not sure if it is worth it.

Before considering backporting of changes I'd give people the chance to discover
any build issues on the development trunk of course.

Richard.


> gcc/cobol/ChangeLog:
>
> PR cobol/120765
> PR cobol/119337
> PR cobol/120794
> * Make-lang.in: Take control of the .cc.o rule.
> * cbldiag.h (error_msg_direct): New declaration.
> (gcc_location_dump): Forward declaration.
> (location_dump): Use gcc_location_dump.
> * cdf.y: Change some tokens.
> * gcobc: Change dialect handling.
> * genapi.cc (parser_call_targets_dump): Temporarily remove from
> service.
> (parser_compile_dcls): Combine temporary arrays.
> (get_binary_value_from_float): Apply const to one parameter.
> (depending_on_value): Localize a boolean variable.
> (normal_normal_compare): Likewise.
> (cobol_compare): Eliminate cppcheck warning.
> (combined_name): Apply const to an input parameter.
> (parser_perform): Apply const to a variable.
> (parser_accept): Improve handling of special_name_t parameter and
> the exception conditions.
> (parser_display): Improve handling of special_name_t parameter; use
> the os_filename[] string when appropriate.
> (program_end_stuff): Rename shadowing variable.
> (parser_division): Consolidate temporary char[] arrays.
> (parser_file_start): Apply const to a parameter.
> (inspect_replacing): Likewise.
> (parser_program_hierarchy): Rename shadowing variable.
> (mh_identical): Apply const to parameters.
> (float_type_of): Likewise.
> (picky_memcpy): Likewise.
> (mh_numeric_display): Likewise.
> (mh_little_endian): Likewise.
> (mh_source_is_group): Apply static to a variable it.
> (move_helper): Quiet a cppcheck warning.
> * genapi.h (parser_accept): Add exceptions to declaration.
> (parser_accept_under_discussion): Add declaration.
> (parser_display): Change to std::vector; add exceptions to
> declaration.
> * lexio.cc (cdf_source_format): Improve source code location
> handling.
> (source_format_t::infer): Likewise.
> (is_fixed_format): Likewise.
> (is_reference_format): Likewise.
> (left_margin): Likewise.
> (right_margin): Likewise.
> (cobol_set_indicator_column): Likewise.
> (include_debug): Likewise.
> (continues_at): Likewise.
> (indicated): Likewise.
> (check_source_format_directive): Likewise.
> (cdftext::free_form_reference_format): Likewise.
> * parse.y: Tokens; program and function names; DISPLAY and ACCEPT
> handling.
> * parse_ante.h (class tokenset_t): Removed.
> (class current_tokens_t): Removed.
> (field_of): Removed.
> * scan.l: Token handling.
> * scan_ante.h (level_found): Comment.
> * scan_post.h (start_condition_str): Remove cast author_state:.
> * symbols.cc (symbols_update): Change error message.
> (symbol_table_init): Correct and reorder entries.
> (symbol_unresolved_file_key): New function definition.
> (cbl_file_key_t::deforward): Change error message.
> * symbols.h (symbol_unresolved_file_key): New declaration.
> (keyword_tok): New function.
> (redefined_token): New function.
> (class current_tokens_t): New class.
> * symfind.cc (symbol_match): Revise error message.
> * token_names.h: Reorder and change numbers in comments.
> * util.cc (class cdf_directives_t): New class.
> (cobol_set_indicator_column): New funct

Re: [PATCH] Change bellow in comments to below

2025-07-10 Thread Richard Biener
On Thu, Jul 10, 2025 at 8:13 AM Jakub Jelinek  wrote:
>
> Hi!
>
> While I'm not a native English speaker, I believe all the uses
> of bellow (roar/bark/...) in comments in gcc are meant to be
> below (beneath/under/...).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2025-07-10  Jakub Jelinek  
>
> gcc/
> * tree-vect-loop.cc (scale_profile_for_vect_loop): Comment
> spelling fix: bellow -> below.
> * ipa-polymorphic-call.cc (record_known_type): Likewise.
> * config/i386/x86-tune.def: Likewise.
> * config/riscv/vector.md (*vsetvldi_no_side_effects_si_extend):
> Likewise.
> * tree-scalar-evolution.cc (iv_can_overflow_p): Likewise.
> * ipa-devirt.cc (add_type_duplicate): Likewise.
> * tree-ssa-loop-niter.cc (maybe_lower_iteration_bound): Likewise.
> * gimple-ssa-sccopy.cc: Likewise.
> * cgraphunit.cc: Likewise.
> * graphite.h (struct poly_dr): Likewise.
> * ipa-reference.cc (ignore_edge_p): Likewise.
> * tree-ssa-alias.cc (ao_compare::compare_ao_refs): Likewise.
> * profile-count.h (profile_probability::probably_reliable_p):
> Likewise.
> * ipa-inline-transform.cc (inline_call): Likewise.
> gcc/ada/
> * par-load.adb: Comment spelling fix: bellow -> below.
> * libgnarl/s-taskin.ads: Likewise.
> gcc/testsuite/
> * gfortran.dg/g77/980310-3.f: Comment spelling fix: bellow -> below.
> * jit.dg/test-debuginfo.c: Likewise.
> libstdc++-v3/
> * testsuite/22_locale/codecvt/codecvt_unicode.h
> (ucs2_to_utf8_out_error): Comment spelling fix: bellow -> below.
> (utf16_to_ucs2_in_error): Likewise.
>
> --- gcc/tree-vect-loop.cc.jj2025-07-09 20:38:59.036628116 +0200
> +++ gcc/tree-vect-loop.cc   2025-07-09 20:42:30.409882136 +0200
> @@ -11489,7 +11489,7 @@ scale_profile_for_vect_loop (class loop
>profile_count entry_count = loop_preheader_edge (loop)->count ();
>
>/* If we have unreliable loop profile avoid dropping entry
> - count bellow header count.  This can happen since loops
> + count below header count.  This can happen since loops
>   has unrealistically low trip counts.  */
>while (vf > 1
>  && loop->header->count > entry_count
> --- gcc/ipa-polymorphic-call.cc.jj  2025-01-02 20:54:32.263128066 +0100
> +++ gcc/ipa-polymorphic-call.cc 2025-07-09 20:42:00.479269537 +0200
> @@ -1353,7 +1353,7 @@ record_known_type (struct type_change_in
>
>/* If we found a constructor of type that is not polymorphic or
>   that may contain the type in question as a field (not as base),
> - restrict to the inner class first to make type matching bellow
> + restrict to the inner class first to make type matching below
>   happier.  */
>if (type
>&& (offset
> --- gcc/config/i386/x86-tune.def.jj 2025-07-09 20:38:58.951629222 +0200
> +++ gcc/config/i386/x86-tune.def2025-07-09 20:41:41.466515624 +0200
> @@ -31,7 +31,7 @@ see the files COPYING3 and COPYING.RUNTI
> - Updating ix86_issue_rate and ix86_adjust_cost in i386.md
> - possibly updating ia32_multipass_dfa_lookahead, ix86_sched_reorder
>   and ix86_sched_init_global if those tricks are needed.
> -- Tunning the flags bellow. Those are split into sections and each
> +- Tunning the flags below. Those are split into sections and each
>section is very roughly ordered by importance.  */
>
>  
> /*/
> --- gcc/config/riscv/vector.md.jj   2025-06-30 13:57:47.898657344 +0200
> +++ gcc/config/riscv/vector.md  2025-07-09 20:41:44.100481531 +0200
> @@ -1783,7 +1783,7 @@ (define_insn_and_split "@vsetvl_no
>[(set_attr "type" "vsetvl")
> (set_attr "mode" "SI")])
>
> -;; This pattern use to combine bellow two insns and then further remove
> +;; This pattern use to combine below two insns and then further remove
>  ;; unnecessary sign_extend operations:
>  ;;   (set (reg:DI 134 [ _1 ])
>  ;;(unspec:DI [
> --- gcc/tree-scalar-evolution.cc.jj 2025-05-09 17:56:52.472682248 +0200
> +++ gcc/tree-scalar-evolution.cc2025-07-09 20:42:16.605060815 +0200
> @@ -3088,7 +3088,7 @@ iv_can_overflow_p (class loop *loop, tre
>type_max = wi::max_value (type);
>
>/* Just sanity check that we don't see values out of the range of the type.
> - In this case the arithmetics bellow would overflow.  */
> + In this case the arithmetics below would overflow.  */
>gcc_checking_assert (wi::ge_p (base_min, type_min, sgn)
>&& wi::le_p (base_max, type_max, sgn));
>
> --- gcc/ipa-devirt.cc.jj2025-03-03 21:44:09.553931609 +0100
> +++ gcc/ipa-devirt.cc   2025-07-09 20:41:55.212337706 +0200
> @@ -1763,7 +1763,7 @@ add_type_duplicate (odr_type val, tree t
>   }
> /* One base is polymorphic and the other not.
>  

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-10 Thread Jan Hubicka
> > 
> > I tried to implement a workaround to match lost discriminator in cases
> > this is uniquely deterined, but it is not so easy to do.
> > My plan is to figure out how to upstream it and then drop the lost
> > discriminator workaround from match.
> > 
> > Do you see warnings with -Wauto-profile?
> 
> Adding -Wauto-profile gets it to work. Let me look into this.

I reproduced the problem on leela in cpu2017.  Indeed -Wauto-profile gets
it to work but also warns about a real issue here.

We have

   :
  [simulator/csimplemodule.cc:379:85] _40 = 
std::__cxx11::basic_string::c_str ([simulator/csimplemodule.cc:379:85] 
&D.80680);
  [simulator/csimplemodule.cc:379:85 discrim 13] _41 = 
[simulator/csimplemodule.cc:379:85] 
&this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
  [simulator/csimplemodule.cc:379:85 discrim 13] _42 = 
&this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
  [simulator/csimplemodule.cc:377:45] _43 = 
this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
  [simulator/csimplemodule.cc:377:45] _44 = _43 + 40;
  [simulator/csimplemodule.cc:377:45] _45 = [simulator/csimplemodule.cc:377:45] 
*_44;
  [simulator/csimplemodule.cc:379:85] D.89001 = OBJ_TYPE_REF(_45;(const struct 
cObject)_42->5B) (_41);

Notice that both calls

  [simulator/csimplemodule.cc:379:85] _40 = 
std::__cxx11::basic_string::c_str ([simulator/csimplemodule.cc:379:85] 
&D.80680);
  [simulator/csimplemodule.cc:379:85] D.89001 = OBJ_TYPE_REF(_45;(const struct 
cObject)_42->5B) (_41);

have the same filename, line and discriminator.

It seems to me that the assign-discriminator pass is simply broken by
design.  I wonder whether there are any other consumers of discriminators
or if it is an auto-profile-only thing?
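
For context, discriminators exist to tell apart multiple statements
sharing the same file:line, as in this minimal hypothetical sketch:

extern int f (int);
extern int g (int);

int
h (int x)
{
  /* Two call statements on one source line: AutoFDO keys samples on
     (relative location, discriminator), so if both calls end up with
     the same discriminator, as in the dump above, their profiles
     cannot be attributed reliably.  */
  return f (g (x));
}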

The fix for match ICE is

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 219676012e7..a2ad478aea9 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1475,19 +1550,21 @@ function_instance::match (cgraph_node *node,
{
  if (inlined_fn
  && inlined_fn->get_call_location ()
- != UNKNOWN_LOCATION
- && warning_at (gimple_location (stmt),
-OPT_Wauto_profile,
-"%q+F contains two calls of the same"
-" relative location +%i,"
-" discrimnator %i,"
-" that leads to lost auto-profile",
-node->decl,
-loc << 16,
-loc & 65535))
+ != UNKNOWN_LOCATION)
{
- inform (inlined_fn->get_call_location (),
- "location of the earlier call");
+ if (warning_at (gimple_location (stmt),
+ OPT_Wauto_profile,
+ "%q+F contains two calls of the same"
+ " relative location +%i,"
+ " discrimnator %i,"
+ " that leads to lost auto-profile",
+ node->decl,
+ loc << 16,
+ loc & 65535))
+   {
+ inform (inlined_fn->get_call_location (),
+ "location of the earlier call");
+   }
  inlined_fn = NULL;
}
  if (inlined_fn)

I have some match improvements cumulated in my tree so I will test them
and commit.


[PATCH v2] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Robin Dapp

Hi,

Changes from v1:
- Use HImode broadcast instead of a float broadcast, saving two conversion
  insns.


Let's be daring and leave the thorough testing to the CI first while my own 
testing is in progress :)


This patch makes the zero-stride load broadcast idiom dependent on a
uarch-tunable "use_zero_stride_load".  Right now we have quite a few
paths that reach a strided load and some of them are not exactly
straightforward.

While broadcast is relatively rare on rv64 targets it is more common on
rv32 targets that want to vectorize 64-bit elements.
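
As an illustration (simplified; the instruction choice depends on the
uarch and the new tunable), the typical source pattern is a scalar
broadcast.  On rv32 a 64-bit scalar does not fit in one GPR, so it can
either be spilled and broadcast with a zero-stride load or synthesized
without touching memory:

void
broadcast_u64 (unsigned long long *out, unsigned long long x, int n)
{
  /* Vectorizes to a broadcast of x; with use_zero_stride_load the
     backend may emit a zero-stride load (vlse64.v vd,(addr),zero)
     from a stack slot, otherwise a non-memory broadcast sequence.  */
  for (int i = 0; i < n; i++)
    out[i] = x;
}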

While the patch is more involved than I would have liked, it could have
touched even more places.  The whole broadcast-like insn path feels a
bit hackish due to the several optimizations we employ.  Some of the
complications stem from the fact that we lump together real broadcasts,
vector single-element sets, and strided broadcasts.  The strided-load
alternatives currently require a memory_constraint to work properly
which causes more complications when trying to disable just these.

In short, the whole pred_broadcast handling in combination with the
sew64_scalar_helper could use work in the future.  I was about to start
with it in this patch but soon realized that it would only distract from
the original intent.  What can help in the future is split strided and
non-strided broadcast entirely, as well as the single-element sets.

It is not yet clear whether we need to pay special attention to
misaligned strided loads (PR120782).

I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
true and false.  With either I didn't observe any new execution failures
but obviously there are new scan failures with strided broadcast turned
off.

Regards
Robin

PR target/118734

gcc/ChangeLog:

* config/riscv/constraints.md (Wdm): Use tunable for Wdm
constraint.
* config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this.
* config/riscv/predicates.md: Use renamed function.
(strided_load_broadcast_p): Declare.
* config/riscv/riscv-selftests.cc (run_broadcast_selftests):
Only run broadcast selftest if strided broadcasts are OK.
* config/riscv/riscv-v.cc (emit_avltype_insn): New function.
(sew64_scalar_helper): Only emit a pred_broadcast if the new
tunable says so.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this and use new tunable.
* config/riscv/riscv.cc (struct riscv_tune_param): Add strided
broad tunable.
(strided_load_broadcast_p): Implement.
* config/riscv/vector.md: Use strided_load_broadcast_p () and
work around 64-bit broadcast on rv32 targets.
---
gcc/config/riscv/constraints.md |  7 +--
gcc/config/riscv/predicates.md  |  2 +-
gcc/config/riscv/riscv-protos.h |  4 +-
gcc/config/riscv/riscv-selftests.cc | 10 +++--
gcc/config/riscv/riscv-v.cc | 58 +
gcc/config/riscv/riscv.cc   | 20 +
gcc/config/riscv/vector.md  | 66 +
7 files changed, 133 insertions(+), 34 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index ccab1a2e29d..5ecaa19eb01 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -237,10 +237,11 @@ (define_constraint "Wb1"
 (and (match_code "const_vector")
  (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE 
(op)))")))

-(define_memory_constraint "Wdm"
+(define_constraint "Wdm"
  "Vector duplicate memory operand"
-  (and (match_code "mem")
-   (match_code "reg" "0")))
+  (and (match_test "strided_load_broadcast_p ()")
+   (and (match_code "mem")
+   (match_code "reg" "0"))))

;; Vendor ISA extension constraints.

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8baad2fae7a..1f9a6b562e5 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -617,7 +617,7 @@ (define_special_predicate "vector_any_register_operand"

;; The scalar operand can be directly broadcast by RVV instructions.
(define_predicate "direct_broadcast_operand"
-  (match_test "riscv_vector::can_be_broadcasted_p (op)"))
+  (match_test "riscv_vector::can_be_broadcast_p (op)"))

;; A CONST_INT operand that has exactly two bits cleared.
(define_predicate "const_nottwobits_operand"
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 38f63ea8424..a41c4c299fa 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -604,6 +604,7 @@ void emit_vlmax_vsetvl (machine_mode, rtx);
void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_insn (unsigned, unsigned, rtx *);
void emit_nonvlmax_insn (unsigned, unsigned, rtx *, rtx);
+void emit_avltype_insn (unsigned, unsigned, rtx *, avl_type, rtx = nullptr);
void emit_vlm

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-10 Thread Richard Sandiford
Soumya AR  writes:
>> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov  wrote:
>> 
>> 
>> 
>>> On 1 Jul 2025, at 17:36, Richard Sandiford  
>>> wrote:
>>> 
>>> Soumya AR  writes:
 From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
 From: Soumya AR 
 Date: Mon, 30 Jun 2025 12:17:30 -0700
 Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with
 RCPC2
 
 This patch adds the ability to fold the address computation into the 
 addressing
 mode for LDAPR instructions using LDAPUR when RCPC2 is available.
 
 LDAPUR emission is controlled by the tune flag enable_ldapur, to enable it 
 on a
 per-core basis. Earlier, the following code:
 
 uint64_t
 foo (std::atomic<uint64_t> *x)
 {
 return x[1].load(std::memory_order_acquire);
 }
 
 would generate:
 
 foo(std::atomic*):
 add x0, x0, 8
 ldapr   x0, [x0]
 ret
 
 but now generates:
 
 foo(std::atomic*):
 ldapur  x0, [x0, 8]
 ret
 
 The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
 regression.
 OK for mainline?
 
 Signed-off-by: Soumya AR 
 
 gcc/ChangeLog:
 
 * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
 Add the enable_ldapur flag to control LDAPUR emission.
 * config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
 * config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
 * config/aarch64/atomics.md: (aarch64_atomic_load_rcpc): Modify
 to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
 (*aarch64_atomic_load_rcpc_zext): Likewise.
 (*aarch64_atomic_load_rcpc_sext): Modified to emit LDAPURS
 for addressing with offsets.
 
 gcc/testsuite/ChangeLog:
 
 * gcc.target/aarch64/ldapur.c: New test.
>>> 
>>> Thanks for doing this.  It generally looks good, but a couple of comments
>>> below:
>>> 
 ---
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 +
 gcc/config/aarch64/aarch64.h|  5 ++
 gcc/config/aarch64/aarch64.md   | 11 +++-
 gcc/config/aarch64/atomics.md   | 22 +---
 gcc/testsuite/gcc.target/aarch64/ldapur.c   | 61 +
 5 files changed, 92 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldapur.c
 
 diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
 b/gcc/config/aarch64/aarch64-tuning-flags.def
 index f2c916e9d77..5bf54165306 100644
 --- a/gcc/config/aarch64/aarch64-tuning-flags.def
 +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
 @@ -44,6 +44,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
 AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
 +AARCH64_EXTRA_TUNING_OPTION ("enable_ldapur", ENABLE_LDAPUR)
 +
>>> 
>>> Let's see what others say, but personally, I think this would be better
>>> as an opt-out, such as avoid_ldapur.  The natural default seems to be to use
>>> the extra addressing capacity when it's available and have CPUs explicitly
>>> flag when they don't want that.
>>> 
>>> A good, conservatively correct, default would probably be to add 
>>> avoid_ldapur
>>> to every *current* CPU that includes rcpc2 and then separately remove it
>>> from those that are known not to need it.  In that sense, it's more work
>>> for current CPUs than the current patch, but it should ease the impact
>>> on future CPUs.
>> 
>> LLVM used to do this folding by default everywhere until it was discovered 
>> that it hurts various CPUs.
>> So they’ve taken the approach you describe, and disable the folding 
>> explicitly for:
>> neoverse-v2 neoverse-v3 cortex-x3 cortex-x4 cortex-x925 
>> I don’t know for sure if those are the only CPUs where this applies.
>> They also disable the folding for generic tuning when -march is between 
>> armv8.4 - armv8.7/armv9.2.
>> I guess we can do the same in GCC.
>
> Thanks for your suggestions, Richard and Kyrill.
>
> I've updated the patch to use avoid_ldapur.
>
> There's now an explicit override in aarch64_override_options_internal to use 
> avoid_ldapur for armv8.4 through armv8.7. 
>
> I added it here because aarch64_adjust_generic_arch_tuning is only called for 
> generic_tunings and not generic_armv{8,9}_a_tunings.
>
> Let me know what you think.

Sounds good to me.  I can see that we wouldn't want armv9.3-a+ generic
tuning to be hampered by pre-armv9.3 cores.

But I'm not sure that we're deliberately avoiding calling
aarch64_adjust_generic_arch_tuning for generic_armv{8,9}_a_tunings.
The current TARGET_SVE2 behaviour makes conceptual sense for
generic_armv8_a_tunings too, if someone (unsually) used
-march=armv8-a+sve2.  The current TARGET_SVE2 behaviour would
also be a nop for generic_armv9_a_tunings.

It's probably more that the current SVE2 behaviour isn't particularly
important for 

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-10 Thread Richard Sandiford
Soumya AR  writes:
>> On 10 Jul 2025, at 3:15 PM, Richard Sandiford  
>> wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> Soumya AR  writes:
 On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov  wrote:
 
 
 
> On 1 Jul 2025, at 17:36, Richard Sandiford  
> wrote:
> 
> Soumya AR  writes:
>> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
>> From: Soumya AR 
>> Date: Mon, 30 Jun 2025 12:17:30 -0700
>> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores 
>> with
>> RCPC2
>> 
>> This patch adds the ability to fold the address computation into the 
>> addressing
>> mode for LDAPR instructions using LDAPUR when RCPC2 is available.
>> 
>> LDAPUR emission is controlled by the tune flag enable_ldapur, to enable 
>> it on a
>> per-core basis. Earlier, the following code:
>> 
>> uint64_t
>> foo (std::atomic<uint64_t> *x)
>> {
>> return x[1].load(std::memory_order_acquire);
>> }
>> 
>> would generate:
>> 
>> foo(std::atomic*):
>> add x0, x0, 8
>> ldapr   x0, [x0]
>> ret
>> 
>> but now generates:
>> 
>> foo(std::atomic*):
>> ldapur  x0, [x0, 8]
>> ret
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>> regression.
>> OK for mainline?
>> 
>> Signed-off-by: Soumya AR 
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
>> Add the enable_ldapur flag to control LDAPUR emission.
>> * config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
>> * config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
>> * config/aarch64/atomics.md: (aarch64_atomic_load_rcpc): Modify
>> to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
>> (*aarch64_atomic_load_rcpc_zext): Likewise.
>> (*aarch64_atomic_load_rcpc_sext): Modified to emit LDAPURS
>> for addressing with offsets.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/aarch64/ldapur.c: New test.
> 
> Thanks for doing this.  It generally looks good, but a couple of comments
> below:
> 
>> ---
>> gcc/config/aarch64/aarch64-tuning-flags.def |  2 +
>> gcc/config/aarch64/aarch64.h|  5 ++
>> gcc/config/aarch64/aarch64.md   | 11 +++-
>> gcc/config/aarch64/atomics.md   | 22 +---
>> gcc/testsuite/gcc.target/aarch64/ldapur.c   | 61 +
>> 5 files changed, 92 insertions(+), 9 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldapur.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
>> b/gcc/config/aarch64/aarch64-tuning-flags.def
>> index f2c916e9d77..5bf54165306 100644
>> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>> @@ -44,6 +44,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
>> AVOID_CROSS_LOOP_FMA)
>> 
>> AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
>> 
>> +AARCH64_EXTRA_TUNING_OPTION ("enable_ldapur", ENABLE_LDAPUR)
>> +
> 
> Let's see what others say, but personally, I think this would be better
> as an opt-out, such as avoid_ldapur.  The natural default seems to be to 
> use
> the extra addressing capacity when it's available and have CPUs explicitly
> flag when they don't want that.
> 
> A good, conservatively correct, default would probably be to add 
> avoid_ldapur
> to every *current* CPU that includes rcpc2 and then separately remove it
> from those that are known not to need it.  In that sense, it's more work
> for current CPUs than the current patch, but it should ease the impact
> on future CPUs.
 
 LLVM used to do this folding by default everywhere until it was discovered 
 that it hurts various CPUs.
 So they’ve taken the approach you describe, and disable the folding 
 explicitly for:
 neoverse-v2 neoverse-v3 cortex-x3 cortex-x4 cortex-x925
 I don’t know for sure if those are the only CPUs where this applies.
 They also disable the folding for generic tuning when -march is between 
 armv8.4 - armv8.7/armv9.2.
 I guess we can do the same in GCC.
>>> 
>>> Thanks for your suggestions, Richard and Kyrill.
>>> 
>>> I've updated the patch to use avoid_ldapur.
>>> 
>>> There's now an explicit override in aarch64_override_options_internal to 
>>> use avoid_ldapur for armv8.4 through armv8.7.
>>> 
>>> I added it here because aarch64_adjust_generic_arch_tuning is only called 
>>> for generic_tunings and not generic_armv{8,9}_a_tunings.
>>> 
>>> Let me know what you think.
>> 
>> Sounds good to me.  I can see that we wouldn't want armv9.3-a+ generic
>> tuning to be hampered by pre-armv9.3 cores.
>> 
>> But

[PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-10 Thread Robin Dapp

Hi,

this patch adds asserts that ensure we only expand an RDIV_EXPR with an
actual float mode.  It also replaces the RDIV_EXPR used when setting a
vectorized loop's length with EXACT_DIV_EXPR.  The code in question is
only used with length-control targets (riscv, powerpc, s390).
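
For background (an illustrative sketch, not from the patch): RDIV_EXPR
denotes floating-point division, while EXACT_DIV_EXPR denotes integer
division whose remainder is known to be zero, which holds here because
the loop length is a multiple of the factor:

unsigned
scale_len (unsigned loop_len, unsigned factor)
{
  /* The vectorizer guarantees loop_len % factor == 0, which is exactly
     what EXACT_DIV_EXPR encodes; RDIV_EXPR would wrongly claim a
     floating-point division on an integer type.  */
  return loop_len / factor;
}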

Bootstrapped and regtested on x86, aarch64, and power10.  Regtested on 
rv64gcv_zvl512b.


Regards
Robin

PR target/121014

gcc/ChangeLog:

* cfgexpand.cc (expand_debug_expr): Assert FLOAT_MODE_P.
* optabs-tree.cc (optab_for_tree_code): Assert FLOAT_TYPE_P.
* tree-vect-loop.cc (vect_get_loop_len): Use EXACT_DIV_EXPR.
---
gcc/cfgexpand.cc  | 2 ++
gcc/optabs-tree.cc| 2 ++
gcc/tree-vect-loop.cc | 2 +-
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 33649d43f71..a656ccebf17 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -5358,6 +5358,8 @@ expand_debug_expr (tree exp)
  return simplify_gen_binary (MULT, mode, op0, op1);

case RDIV_EXPR:
+  gcc_assert (FLOAT_MODE_P (mode));
+  /* Fall through.  */
case TRUNC_DIV_EXPR:
case EXACT_DIV_EXPR:
  if (unsignedp)
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 6dfe8ee4c4e..9308a6dfd65 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -82,6 +82,8 @@ optab_for_tree_code (enum tree_code code, const_tree type,
return unknown_optab;
  /* FALLTHRU */
case RDIV_EXPR:
+  gcc_assert (FLOAT_TYPE_P (type));
+  /* FALLTHRU */
case TRUNC_DIV_EXPR:
case EXACT_DIV_EXPR:
  if (TYPE_SATURATING (type))
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d5044d5fe22..432a248715e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11429,7 +11429,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
gimple_stmt_iterator *gsi,
  factor = exact_div (nunits1, nunits2).to_constant ();
  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
  gimple_seq seq = NULL;
- loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
+ loop_len = gimple_build (&seq, EXACT_DIV_EXPR, iv_type, loop_len,
   build_int_cst (iv_type, factor));
  if (seq)
gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
--
2.50.0



Re: [PATCH, v2] Fortran: fix minor issues with coarrays (extended)

2025-07-10 Thread Andre Vehreschild
Hi Harald,

sorry for all the confusion. Probably my understanding of a pure
elemental routine is imperfect. I would therefore first like to express
what I need in a caf-accessor. In a caf-accessor I have access only to data
that is "exported" to it via the add_data object. The data in there has
to be made available locally on the remote image in the accessor,
because the accessor can be executed on a remote image on a different
machine. This prevents access to data that is pointed to (including
allocated) or has an opaque type.

By using the pure && elemental flag I tried to mimic this. The rules
for which routines are pure and/or elemental are too complicated for me
at the moment. I just don't get all possible combinations into my head.
May be it is safest to enforce that every function call is evaluated on
the source image, thus preventing any misunderstandings.

Did that help? Shall I rephrase the commit to enforce source image
evaluation of functions? In my opinion that at least reduces code
complexity and one is on the safe side.

Regards,
Andre



On Mon, 7 Jul 2025 20:53:16 +0200
Harald Anlauf  wrote:

> Andre,
> 
> I still don't get it, and the present version made it worse for me...
> 
> So let's see what I was thinking.  There are the following types of 
> functions:
> 
> (0) impure, non-elemental functions, which likely have side-effects
> 
> (1) pure functions (in the f95 sense), i.e. pure non-elemental
> 
> (2) pure elemental functions (in the f95 sense)
> 
> (3) impure elemental functions (>= f2008)
> 
> Note that I understand "pure elemental" being different from
> "pure and elemental" as used in the comment: the first version
> really means both pure and elemental, the second could be read
> as either pure or elemental or pure elemental.  A native speaker
> may correct me if I am wrong...
> 
> Back to gfortran: we have in decl.cc::gfc_match_prefix
> 
>/* If IMPURE it not seen but the procedure is ELEMENTAL, mark it
> as PURE.  */
>if (!seen_impure && current_attr.elemental && !current_attr.pure)
>  {
>if (!gfc_add_pure (&current_attr, NULL))
>   goto error;
>  }
> 
> This explains the possible attributes we should see.
> 
> The change to coarray.cc has:
> 
>  case EXPR_FUNCTION:
> - if (!e->symtree->n.sym->attr.pure
> - && !e->symtree->n.sym->attr.elemental
> - && !(e->value.function.isym
> -  && (e->value.function.isym->pure
> -  || e->value.function.isym->elemental)))
> -   /* Treat non-pure/non-elemental functions.  */
> -   check_add_new_comp_handle_array (e, type, add_data);
> + if ((e->symtree->n.sym->attr.pure
> +  && e->symtree->n.sym->attr.elemental)
> + || (e->value.function.isym &&
> e->value.function.isym->pure
> + && e->value.function.isym->elemental))
> +   {
> + /* Only allow pure and elemental function calls in a
> coarray
> +accessor, because all other may have side effects or
> access
> +pointers, which may not be possible in the accessor 
> running on
> +another host.  */
> + for (gfc_actual_arglist *actual =
> e->value.function.actual;
> +  actual; actual = actual->next)
> +   check_add_new_component (type, actual->expr,
> add_data);
> +   }
>else
> -   for (gfc_actual_arglist *actual =
> e->value.function.actual; actual;
> -actual = actual->next)
> - check_add_new_component (type, actual->expr, add_data);
> +   /* Extract the expression, evaluate it and add a
> temporary with its
> +  value to the helper structure.  */
> +   check_add_new_comp_handle_array (e, type, add_data);
> 
> 
> If I read the comment in the if-branch and match it against my
> expectation, I am confused.  Why only "pure elemental"?  Why not
> allow simply "pure"?  And wouldn't it be better to move the explaining
> comment before the "if" to make it easier to read the following?
> E.g. why does a pure non-elemental function need a temporary?
> 
> Thanks,
> Harald
> 
> 
> Am 07.07.25 um 10:40 schrieb Andre Vehreschild:
> > Hi Harald,
> > 
> > I totally understand your confusion. I also had a hard time
> > figuring what is needed there. I got to restructure the code
> > fragment and now only allow pure *and* elemental intrinsic function
> > and pure *and* elemental user-defined functions (hoping that's the
> > opposite of intrinsics) in a caf accessor. For all others a
> > temporary is to be created in the helper structure. I also added a
> > comment to clarify the intention. I think this is better now.
> > Opinions?
> > 
> > Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> > 
> > Regards,
> > Andre
> > 
> > On Fri, 4 Jul 2025 19:29:08 +0200
> > Harald Anlauf  wrote:
> >   
> >> Andre,
> >>
> >> either your patch to coarray.cc is wrong,

Re: [PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

2025-07-10 Thread Jeff Law




On 2/22/25 8:10 AM, Jan Dubiec wrote:

This patch fixes SFtype to UDWtype (aka float to unsigned long long)
conversion on targets without DFmode, e.g. H8/300H. It relies solely
on SFtype->UWtype and UWtype->UDWtype conversions/casts. The existing code
in line 2218 (counter = a) assigns/casts a float which is *always* not
less than Wtype_MAXp1_F to a UWtype int, which of course does not have
enough capacity.
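
Conceptually, the fixed conversion goes through the word type in two
steps.  A hedged sketch with simplified names (the real code uses the
UWtype/UDWtype typedefs and Wtype_MAXp1_F from libgcc2.c):

unsigned long long
fixunssf_di (float a)
{
  const float wordbase = 4294967296.0f;  /* 2**32, i.e. Wtype_MAXp1_F */
  /* SFtype -> UWtype conversion for the high word...  */
  unsigned int hi = (unsigned int) (a / wordbase);
  /* ...then for the low word, with UWtype -> UDWtype casts to glue.  */
  unsigned int lo = (unsigned int) (a - (float) hi * wordbase);
  return ((unsigned long long) hi << 32) | lo;
}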

2025-02-22  Jan Dubiec  

 PR target/116363

libgcc/ChangeLog:

 * libgcc2.c (__fixunssfDI): Fix SFtype to UDWtype conversion for
 targets without LIBGCC2_HAS_DF_MODE defined.
Sorry this has taken so long to resolve.  Too much to do and patches 
affecting something like the H8 rarely bubble to the top.


Anyway, this has been repeatedly bootstrapped & regression tested on
aarch64, ppc64le and other targets.  It's also been through many dozens of
regression testing cycles on the various embedded targets.


Given it relies on intermediate casts working it should be quite safe. 
If that capability doesn't work, then we've got much bigger problems.


Pushed to the trunk.  Thanks again for your patience.

Jeff



Re: [PATCH] [x86] properly compute fp/mode for scalar ops for vectorizer costing

2025-07-10 Thread Richard Biener
On Thu, 10 Jul 2025, Jan Hubicka wrote:

> > The x86 add_stmt_hook relies on the passed vectype to determine
> > the mode and whether it is FP for a scalar operation.  This is
> > unreliable now for stmts involving patterns and in the future when
> > there is no vector type passed for scalar operations.
> > 
> > To be least disruptive I've kept using the vector type if it is passed.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK?
> > 
> > Thanks
> > Richard.
> > 
> > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Use
> > the LHS of a scalar stmt to determine mode and whether it is FP.
> > ---
> >  gcc/config/i386/i386.cc | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index ad7360ec71a..26eefadea64 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -25798,6 +25798,12 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > vect_cost_for_stmt kind,
> >if (scalar_p)
> > mode = TYPE_MODE (TREE_TYPE (vectype));
> >  }
> > +  else if (scalar_p && stmt_info)
> > +if (tree lhs = gimple_get_lhs (stmt_info->stmt))
> > +  {
> > +   fp = FLOAT_TYPE_P (TREE_TYPE (lhs));
> > +   mode = TYPE_MODE (TREE_TYPE (lhs));
> > +  }
> Makes sense to me, but perhaps it would be good idea to add a comment,
> since it looks odd at first glance?

Like

  /* When we are costing a scalar stmt use the scalar stmt to get at the
 type of the operation.  */

?

Richard.

> Honza
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] s390: Support global stack protector canary

2025-07-10 Thread Stefan Schulze Frielinghaus
So far only a per-thread canary in the TLS block is supported.  This
patch adds support for a global canary, too.  For this the new option
-mstack-protector-guard={global,tls} is added which defaults to tls.

The global canary is expected at symbol __stack_chk_guard which means
for a function prologue instructions larl/l(g)fr + mvc are emitted and
for an epilogue larl/l(g)fr + clc.

Furthermore, option -mstack-protector-guard-record is added which is
inspired by -mrecord-mcount and generates section __stack_protector_loc
containing pointers to all instructions which load the address of the
global guard.  Thus, this option has an effect only in conjunction with
-mstack-protector-guard=global.  The intended use is for the Linux
kernel in order to support run-time patching.  In each task_struct of
the kernel a canary is held which will be copied into the lowcore.
Since the kernel supports migration of the lowcore, addresses are not
necessarily constant.  Therefore, the kernel expects all
instructions loading the address of the canary to be of format RIL, or
more precisely to be either larl or lgrl, and that the instructions'
addresses are recorded in section __stack_protector_loc.  The kernel is
then required to patch those instructions, e.g. to llilf, prior to first
execution or whenever the lowcore moves.

In total this means -mstack-protector-guard=global emits code suitable
for user and kernel space.
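
A usage sketch under the assumptions above (frame offsets illustrative):
compiling with -fstack-protector-all -mstack-protector-guard=global makes
the prologue copy the canary with larl + mvc and the epilogue compare it
with larl + clc, both referencing __stack_chk_guard directly.  A
definition of the canary must then be provided, e.g.:

/* Hypothetical provider of the global canary; in user space the program
   or libc supplies it, in the kernel it is patched as described above.  */
unsigned long __stack_chk_guard = 0x00112233445566aaUL;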

gcc/ChangeLog:

* config/s390/s390-opts.h (enum stack_protector_guard): Define
SP_TLS and SP_GLOBAL.
* config/s390/s390.h (TARGET_SP_GLOBAL_GUARD): Define predicate.
(TARGET_SP_TLS_GUARD): Define predicate.
* config/s390/s390.md (stack_protect_global_guard_addr):
New insn.
(stack_protect_set): Also deal with a global guard.
(stack_protect_test): Also deal with a global guard.
* config/s390/s390.opt (-mstack-protector-guard={global,tls}):
New option.
(-mstack-protector-guard-record) New option.

gcc/testsuite/ChangeLog:

* gcc.target/s390/stack-protector-guard-global-1.c: New test.
* gcc.target/s390/stack-protector-guard-global-2.c: New test.
* gcc.target/s390/stack-protector-guard-global-3.c: New test.
* gcc.target/s390/stack-protector-guard-global-4.c: New test.
---
 gcc/config/s390/s390-opts.h   |  8 ++
 gcc/config/s390/s390.h|  3 +
 gcc/config/s390/s390.md   | 87 +++
 gcc/config/s390/s390.opt  | 18 
 .../s390/stack-protector-guard-global-1.c | 27 ++
 .../s390/stack-protector-guard-global-2.c |  5 ++
 .../s390/stack-protector-guard-global-3.c |  6 ++
 .../s390/stack-protector-guard-global-4.c |  6 ++
 8 files changed, 144 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/stack-protector-guard-global-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/stack-protector-guard-global-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/stack-protector-guard-global-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/stack-protector-guard-global-4.c

diff --git a/gcc/config/s390/s390-opts.h b/gcc/config/s390/s390-opts.h
index 9cacb2c29d1..29dd4a5f77f 100644
--- a/gcc/config/s390/s390-opts.h
+++ b/gcc/config/s390/s390-opts.h
@@ -53,4 +53,12 @@ enum indirect_branch {
   indirect_branch_thunk_inline,
   indirect_branch_thunk_extern
 };
+
+
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard
+{
+  SP_TLS,   /* per-thread canary in TLS block */
+  SP_GLOBAL /* global canary */
+};
 #endif
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 8b04bc9a755..2631788df4c 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -251,6 +251,9 @@ enum processor_flags
&& (s390_tune < PROCESSOR_2964_Z13 || (VAL) != const0_rtx)  \
&& (!CONST_INT_P (LEN) || INTVAL ((LEN)) > TARGET_SETMEM_PREFETCH_DISTANCE))
 
+#define TARGET_SP_GLOBAL_GUARD (s390_stack_protector_guard == SP_GLOBAL)
+#define TARGET_SP_TLS_GUARD(s390_stack_protector_guard == SP_TLS)
+
 /* Run-time target specification.  */
 
 /* Defaults for option flags defined only on some subtargets.  */
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 02bc149b0fb..2d3027e1574 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -311,6 +311,7 @@
 
; Stack Protector
UNSPECV_SP_GET_TP
+   UNSPECV_SP_GLOBAL_GUARD_ADDR
   ])
 
 ;;
@@ -11930,6 +11931,36 @@
 ; Stack Protector Patterns
 ;
 
+(define_insn "stack_protect_global_guard_addr"
+  [(set (match_operand:P 0 "register_operand" "=d")
+   (unspec_volatile:P [(const_int 0)] UNSPECV_SP_GLOBAL_GUARD_ADDR))]
+  ""
+{
+  if (flag_s390_stack_protector_guard_record)
+fprintf (asm_out_file, "1:\n");
+  if (flag_pic)
+{
+  if (TARGET_Z10)
+   output_asm_insn ("lrl\t%0,__stack_chk_guard@GOTENT", operands);
+  else
+   

RE: [PATCH] Reject single lane vector types for SLP build

2025-07-10 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 10, 2025 3:09 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> RISC-V CI 
> Subject: RE: [PATCH] Reject single lane vector types for SLP build
> 
> On Thu, 10 Jul 2025, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, July 10, 2025 1:31 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Sandiford ; Tamar Christina
> > > ; RISC-V CI 
> > > Subject: [PATCH] Reject single lane vector types for SLP build
> > >
> > > The following makes us never consider vector(1) T types for
> > > vectorization and ensures this during SLP build.  This is a
> > > long-standing issue for BB vectorization and when we remove
> > > early loop vector type setting we lose the single place we have
> > > that rejects this for loops.
> > >
> > > Once we implement partial loop vectorization we should revisit
> > > this, but then use the original scalar types for the unvectorized
> > > parts.
> >
> > SGTM FWIW,
> >
> > I was also wondering if I should start upstreaming my changes to
> > get the vectorizer to recognize vector types as scalar types as well.
> >
> > Or if you wanted me to wait until I have the lane representations
> > more figured out.
> 
> I think if we can restrict things to cases that have a strong
> overlap with what we intend to use in the end that sounds good.
> Like allow only a single "scalar" vector def per SLP node for now
> and simply stash that into the scalar-stmts array.  In the end
> we'd want to allow mixed scalar and vector defs there.

At least for my use case I'd need to be able to do multiple "scalar"
vector lanes, though restricting them to the same size for each lane is
fine for now.

But I don't think there's actually much difference here between
one "scalar" and multiple "scalars" representation-wise, now is there?

Thanks,
Tamar

> 
> It does require altering code that expects to get at actual _scalar_
> defs for each lane, but I don't think that's much code.
> 
> Richard.
> 
> > Regards,
> > Tamar
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll see
> > > if there's any surprises from the CI, but otherwise I'll go
> > > ahead with this.
> > >
> > > Richard.
> > >
> > >   * tree-vect-slp.cc (vect_build_slp_tree_1): Reject
> > >   single-lane vector types.
> > > ---
> > >  gcc/tree-vect-slp.cc | 9 +
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index ad75386926a..d2ce4ffaa4f 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned
> char
> > > *swap,
> > >matches[0] = false;
> > >return false;
> > >  }
> > > +  if (known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> > > +{
> > > +  if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +  "Build SLP failed: not using single lane "
> > > +  "vector type %T\n", vectype);
> > > +  matches[0] = false;
> > > +  return false;
> > > +}
> > >/* Record nunits required but continue analysis, producing matches[]
> > >   as if nunits was not an issue.  This allows splitting of groups
> > >   to happen.  */
> > > --
> > > 2.43.0
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

2025-07-10 Thread Jan Dubiec

On 10.07.2025 15:42, Jeff Law wrote:
[...]
Anyway, this has been repeatedly bootstrapped & regression tested on 
aarch64, ppc64le and other targets.  It's also been many dozens of 
regression testing cycles on the various embedded targets.


This part of the code does not seem to be used on many targets…


Pushed to the trunk.  Thanks again for your patience.


No worries, thanks!

/J.D.



Re: [Patch, Fortran, Coarray, PR88076, v2] Add a shared memory multi process coarray library.

2025-07-10 Thread Thomas Koenig

On 10.07.25 at 11:27, Andre Vehreschild wrote:

Regtests ok on x86_64-pc-linux-gnu / F41. Ok for mainline?


Did you run extensive tests on all potential race conditions,
and fix the resulting fallout?

If you did, please post your test cases and the results. Otherwise,
https://gcc.gnu.org/pipermail/fortran/2025-June/062378.html still
applies.



Re: [PATCH 1/2] Passing TYPE_SIZE_UNIT of the element as the 6th argument to .ACCESS_WITH_SIZE (PR121000)

2025-07-10 Thread Jakub Jelinek
On Thu, Jul 10, 2025 at 04:03:29PM +, Qing Zhao wrote:
> The size of the element of the FAM _cannot_ reliably depends on the original
> TYPE of the FAM that we passed as the 6th parameter to the .ACCESS_WITH_SIZE:
> 
>  TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (gimple_call_arg (call, 5))))
> 
> when the element of the FAM has a variable length type. Since the variable
>  that represents TYPE_SIZE_UNIT has no explicit usage in the original IL,
> compiler transformations (such as DSE) that are applied before object_size
> phase might eliminate the whole definition to the variable that represents
> the TYPE_SIZE_UNIT of the element of the FAM.
> 
> In order to resolve this issue, instead of passing the original TYPE of the
> FAM as the 6th argument to .ACCESS_WITH_SIZE, we should explicitly pass the
> original TYPE_SIZE_UNIT of the element TYPE of the FAM as the 6th argument
> to the call to  .ACCESS_WITH_SIZE.
> 
> The patches have been bootstrapped and regression tested on both aarch64
> and x86.
> 
> Okay for trunk?
> 
> thanks.
> 
> Qing
> 
>   PR middle-end/121000
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (build_counted_by_ref): Update comments.

You can't trust mklog that much.  You're updating
build_access_with_size_for_counted_by function comment, not
build_counted_by_ref comments.

>   (build_access_with_size_for_counted_by): Pass TYPE_SIZE_UNIT of the
>   element as the 6th argument.
> 
> gcc/ChangeLog:
> 
>   * internal-fn.cc (expand_DEFERRED_INIT): Update comments.
>   * internal-fn.def (DEFERRED_INIT): Update comments.

Nor these two.  It is expand_ACCESS_WITH_SIZE and ACCESS_WITH_SIZE
in these cases.

>   * tree-object-size.cc (access_with_size_object_size): Update comments.
>   Get the element_size from the 6th argument directly.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/flex-array-counted-by-pr121000.c: New test.

Otherwise LGTM with one nit.
> +int main ()

Line break before main.

Jakub



Re: [PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL pattern

2025-07-10 Thread Richard Biener
On Fri, Jul 11, 2025 at 6:51 AM  wrote:
>
> From: Pan Li 
>
> The widen mul has a different source type on different platforms,
> like rv32 or rv64.  On rv32, the source of the widen mul is 32 bits
> wide, while it is 64 bits on rv64.  Thus, leveraging HOST_WIDE_INT is
> not correct and results in pattern match failures on 32-bit systems
> like rv32.
>
> Thus, leverage BITS_PER_WORD instead for this pattern.
>
> gcc/ChangeLog:
>
> * match.pd: Leverage BITS_PER_WORD instead of HOST_WIDE_INT
> for widen mul precision check.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 67b33eee5f7..7f31705b652 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3605,11 +3605,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
>unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
>unsigned cvt6_prec = TYPE_PRECISION (TREE_TYPE (@6));
> -  unsigned hw_int_prec = sizeof (HOST_WIDE_INT) * 8;
>wide_int c2 = wi::to_wide (@2);
>wide_int max = wi::mask (prec, false, widen_prec);
>bool c2_is_max_p = wi::eq_p (c2, max);
> -  bool widen_mult_p = cvt5_prec == cvt6_prec && hw_int_prec == cvt5_prec;
> +  bool widen_mult_p = cvt5_prec == cvt6_prec && BITS_PER_WORD == cvt5_prec;

Why is it important to constrain the widen-mult input to a
fixed precision at all?

>   }
>   (if (widen_prec > prec && c2_is_max_p && widen_mult_p)
>  )
> --
> 2.43.0
>


Re: [PATCH] Change bellow in comments to below

2025-07-10 Thread Kyrylo Tkachov


> On 10 Jul 2025, at 08:09, Jakub Jelinek  wrote:
> 
> Hi!
> 
> While I'm not a native English speaker, I believe all the uses
> of bellow (roar/bark/...) in comments in gcc are meant to be
> below (beneath/under/...).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2025-07-10  Jakub Jelinek  
> 
> gcc/
> * tree-vect-loop.cc (scale_profile_for_vect_loop): Comment
> spelling fix: bellow -> below.
> * ipa-polymorphic-call.cc (record_known_type): Likewise.
> * config/i386/x86-tune.def: Likewise.
> * config/riscv/vector.md (*vsetvldi_no_side_effects_si_extend):
> Likewise.
> * tree-scalar-evolution.cc (iv_can_overflow_p): Likewise.
> * ipa-devirt.cc (add_type_duplicate): Likewise.
> * tree-ssa-loop-niter.cc (maybe_lower_iteration_bound): Likewise.
> * gimple-ssa-sccopy.cc: Likewise.
> * cgraphunit.cc: Likewise.
> * graphite.h (struct poly_dr): Likewise.
> * ipa-reference.cc (ignore_edge_p): Likewise.
> * tree-ssa-alias.cc (ao_compare::compare_ao_refs): Likewise.
> * profile-count.h (profile_probability::probably_reliable_p):
> Likewise.
> * ipa-inline-transform.cc (inline_call): Likewise.
> gcc/ada/
> * par-load.adb: Comment spelling fix: bellow -> below.
> * libgnarl/s-taskin.ads: Likewise.
> gcc/testsuite/
> * gfortran.dg/g77/980310-3.f: Comment spelling fix: bellow -> below.
> * jit.dg/test-debuginfo.c: Likewise.
> libstdc++-v3/
> * testsuite/22_locale/codecvt/codecvt_unicode.h
> (ucs2_to_utf8_out_error): Comment spelling fix: bellow -> below.
> (utf16_to_ucs2_in_error): Likewise.
> 

Looks fine to me and….


> --- gcc/tree-vect-loop.cc.jj 2025-07-09 20:38:59.036628116 +0200
> +++ gcc/tree-vect-loop.cc 2025-07-09 20:42:30.409882136 +0200
> @@ -11489,7 +11489,7 @@ scale_profile_for_vect_loop (class loop
>   profile_count entry_count = loop_preheader_edge (loop)->count ();
> 
>   /* If we have unreliable loop profile avoid dropping entry
> - count bellow header count.  This can happen since loops
> + count below header count.  This can happen since loops
>  has unrealistically low trip counts.  */
>   while (vf > 1
> && loop->header->count > entry_count
> --- gcc/ipa-polymorphic-call.cc.jj 2025-01-02 20:54:32.263128066 +0100
> +++ gcc/ipa-polymorphic-call.cc 2025-07-09 20:42:00.479269537 +0200
> @@ -1353,7 +1353,7 @@ record_known_type (struct type_change_in
> 
>   /* If we found a constructor of type that is not polymorphic or
>  that may contain the type in question as a field (not as base),
> - restrict to the inner class first to make type matching bellow
> + restrict to the inner class first to make type matching below
>  happier.  */
>   if (type
>   && (offset
> --- gcc/config/i386/x86-tune.def.jj 2025-07-09 20:38:58.951629222 +0200
> +++ gcc/config/i386/x86-tune.def 2025-07-09 20:41:41.466515624 +0200
> @@ -31,7 +31,7 @@ see the files COPYING3 and COPYING.RUNTI
> - Updating ix86_issue_rate and ix86_adjust_cost in i386.md
> - possibly updating ia32_multipass_dfa_lookahead, ix86_sched_reorder
>  and ix86_sched_init_global if those tricks are needed.
> -- Tunning the flags bellow. Those are split into sections and each
> +- Tunning the flags below. Those are split into sections and each

… “Tunning” looks like a typo as well, should likely be “Tuning”.
Thanks,
Kyrill


>   section is very roughly ordered by importance.  */
> 
> /*/
> --- gcc/config/riscv/vector.md.jj 2025-06-30 13:57:47.898657344 +0200
> +++ gcc/config/riscv/vector.md 2025-07-09 20:41:44.100481531 +0200
> @@ -1783,7 +1783,7 @@ (define_insn_and_split "@vsetvl_no
>   [(set_attr "type" "vsetvl")
>(set_attr "mode" "SI")])
> 
> -;; This pattern use to combine bellow two insns and then further remove
> +;; This pattern use to combine below two insns and then further remove
> ;; unnecessary sign_extend operations:
> ;;   (set (reg:DI 134 [ _1 ])
> ;;(unspec:DI [
> --- gcc/tree-scalar-evolution.cc.jj 2025-05-09 17:56:52.472682248 +0200
> +++ gcc/tree-scalar-evolution.cc 2025-07-09 20:42:16.605060815 +0200
> @@ -3088,7 +3088,7 @@ iv_can_overflow_p (class loop *loop, tre
>   type_max = wi::max_value (type);
> 
>   /* Just sanity check that we don't see values out of the range of the type.
> - In this case the arithmetics bellow would overflow.  */
> + In this case the arithmetics below would overflow.  */
>   gcc_checking_assert (wi::ge_p (base_min, type_min, sgn)
>   && wi::le_p (base_max, type_max, sgn));
> 
> --- gcc/ipa-devirt.cc.jj 2025-03-03 21:44:09.553931609 +0100
> +++ gcc/ipa-devirt.cc 2025-07-09 20:41:55.212337706 +0200
> @@ -1763,7 +1763,7 @@ add_type_duplicate (odr_type val, tree t
>  }
>/* One base is polymorphic and the other not.
>   This ought to be diagnosed earlier, but do not ICE in the
> -   checking bellow.  */
> +   checking below.  */
>else if (TYPE_BINFO (type1)
> && polymorphic_type_binfo_p (TYPE_BINFO (type1))
>!

[PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov
Hi all,

While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
due to its tied operands, the destination of the movprfx cannot be also
a source operand. But the offending pattern in aarch64-sve2.md tries
to do exactly that for the "=?&w,w,w" alternative and gas warns for the
attached testcase.

This patch just removes that alternative causing RA to emit a normal extra
move.
So for the testcase in the patch we now generate:
nor_z:
nbsl z1.d, z1.d, z2.d, z1.d
mov z0.d, z1.d
ret

instead of the previous:
nor_z:
movprfx z0, z1
nbsl z0.d, z0.d, z2.d, z0.d
ret

which generated a gas warning.
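
For reference, the kind of function involved can be written with ACLE
intrinsics roughly like this (a hypothetical sketch, compiled with
-march=armv8.2-a+sve2; the actual gcc.target/aarch64/sve2/pr120999.c may
differ):

#include <arm_sve.h>

/* NOR of two data vectors.  With SVE2, GCC can implement this with a
   single NBSL, which is where the movprfx restriction bites.  */
svuint64_t
nor_z (svuint64_t a, svuint64_t b)
{
  svbool_t pg = svptrue_b64 ();
  return svnot_x (pg, svorr_x (pg, a, b));
}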

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Do we want to backport it?

Thanks,
Kyrill


Signed-off-by: Kyrylo Tkachov 

gcc/

PR target/120999
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
Remove movprfx alternative.

gcc/testsuite/

PR target/120999
* gcc.target/aarch64/sve2/pr120999.c: New test.



0001-aarch64-PR-target-120999-Avoid-movprfx-for-NBSL-impl.patch
Description: 0001-aarch64-PR-target-120999-Avoid-movprfx-for-NBSL-impl.patch


Re: [PATCH] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-10 Thread Robin Dapp
The original pattern was not exercised by any pre-existing test. I tried but 
failed to come up with a testcase that would expand to

   float_extend ∘ vec_duplicate
rather than
   vec_duplicate ∘ float_extend.


Ok, so we indeed don't have a test and the intrinsics tests unfortunately are 
no help either.  It's high time we move these propagations to gimple...

What's not tested doesn't exist so I won't insist on keeping the previous
order.  Checking other patterns, we have instances of both variants.

At least for our port we should decide on a canonical order, and I'd agree that
the longer we stay in the scalar domain the better.  This means that in time
the remaining 8(?) instances should be migrated as well.
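
For the record, the two orders look roughly like this in RTL (an
illustrative fragment with made-up modes, not taken from the pattern):

;; vec_duplicate ∘ float_extend: extend the scalar, then broadcast it.
(vec_duplicate:V2DF (float_extend:DF (match_operand:SF 1 "register_operand")))
;; float_extend ∘ vec_duplicate: broadcast the scalar, then extend the vector.
(float_extend:V2DF (vec_duplicate:V2SF (match_operand:SF 1 "register_operand")))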

--
Regards
Robin



[PATCH v3] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-10 Thread Björn Schäpers
From: Björn Schäpers 

On Windows there is no API to get the current time zone as IANA name,
instead Windows has its own zones. But there exists a mapping provided
by the Unicode Consortium. This patch adds a script to convert the XML
file with the mapping to a lookup table and adds a Windows code path to
use that mapping.
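
Since the generated table is sorted by (windows, territory), the lookup on
the library side can be a plain binary search.  A minimal sketch of that
idea (hypothetical names; the actual code in tzdb.cc may differ):

#include <algorithm>
#include <initializer_list>
#include <string_view>
#include <utility>

struct windows_zone_map_entry
{
  std::string_view windows;    // Windows time zone key name
  std::string_view territory;  // ISO 3166 code, "001" is the default
  std::string_view iana;       // first IANA zone listed for the pair
};

// Prefer the exact territory, then fall back to the "001" default.
inline std::string_view
find_iana_zone(const windows_zone_map_entry* first,
               const windows_zone_map_entry* last,
               std::string_view windows, std::string_view territory)
{
  auto cmp = [](const windows_zone_map_entry& e,
                const std::pair<std::string_view, std::string_view>& key)
    {
      return e.windows != key.first ? e.windows < key.first
                                    : e.territory < key.second;
    };
  for (std::string_view t : { territory, std::string_view("001") })
    {
      auto it = std::lower_bound(first, last, std::make_pair(windows, t), cmp);
      if (it != last && it->windows == windows && it->territory == t)
        return it->iana;
    }
  return {};
}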

libstdc++-v3/Changelog:

Implement std::chrono::current_zone() for Windows

* scripts/gen_windows_zones_map.py: New file, generates
windows_zones-map.h.
* src/c++20/windows_zones-map.h: New file, contains the look up
table.
* src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.

Signed-off-by: Björn Schäpers 
---
 libstdc++-v3/scripts/gen_windows_zones_map.py | 127 ++
 libstdc++-v3/src/c++20/tzdb.cc| 103 -
 libstdc++-v3/src/c++20/windows_zones-map.h| 407 ++
 3 files changed, 635 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
 create mode 100644 libstdc++-v3/src/c++20/windows_zones-map.h

diff --git a/libstdc++-v3/scripts/gen_windows_zones_map.py 
b/libstdc++-v3/scripts/gen_windows_zones_map.py
new file mode 100644
index 000..9ac559209cc
--- /dev/null
+++ b/libstdc++-v3/scripts/gen_windows_zones_map.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+#
+# Script to generate the map for libstdc++ std::chrono::current_zone under Windows.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# To update the Libstdc++ static data in src/c++20/windows_zones-map.h download the latest:
+# https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/windowsZones.xml
+# Then run this script and save the output to
+# src/c++20/windows_zones-map.h
+
+import os
+import sys
+import xml.etree.ElementTree as et
+
+if len(sys.argv) != 2:
+    print("Usage: %s <windowsZones.xml>" % sys.argv[0], file=sys.stderr)
+    sys.exit(1)
+
+self = os.path.basename(__file__)
+print("// Generated by scripts/{}, do not edit.".format(self))
+print("""
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/windows_zones-map.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{chrono}
+ */
+""")
+
+print("#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP")
+print('# error "This is not a public header, do not include it directly"')
+print("#endif\n")
+
+class WindowsZoneMapEntry:
+    def __init__(self, windows, territory, iana):
+        self.windows = windows
+        self.territory = territory
+        self.iana = iana
+
+    def __lt__(self, other):
+        if self.windows < other.windows:
+            return True
+        if self.windows > other.windows:
+            return False
+        return self.territory < other.territory
+
+windows_zone_map = []
+
+tree = et.parse(sys.argv[1])
+xml_zone_map = tree.getroot().find("windowsZones").find("mapTimezones")
+
+for entry in xml_zone_map.iter("mapZone"):
+    iana = entry.attrib["type"]
+    space = iana.find(" ")
+    if space != -1:
+        iana = iana[0:space]
+    windows_zone_map.append(WindowsZoneMapEntry(entry.attrib["other"], entry.attrib["territory"], iana))
+
+# Sort so we can use binary search on the array.
+windows_zone_map.sort();
+
+# Skip territories which have the same IANA zone as 001, so we can reduce the data.
+last_windows_zone = ""
+for entry in

RE: [PATCH] RISC-V: Add testcases for unsigned vector SAT_SUB form 11 and form 12

2025-07-10 Thread Li, Pan2
> +#define test_data  TEST_UNARY_DATA_WRAP(T, usub)

Should this be ussub instead of usub?  Aka unsigned saturation sub (standard
name ussubm3).
Looks like you also need to update the data defined previously.
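
I.e. something like the below (a sketch following the naming convention of
the earlier forms):

#define test_data  TEST_UNARY_DATA_WRAP(T, ussub)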

Otherwise, LGTM.

Pan

-Original Message-
From: Ciyan Pan  
Sent: Thursday, July 10, 2025 3:14 PM
To: gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; 
juzhe.zh...@rivai.ai; Li, Pan2 ; jeffreya...@gmail.com; 
rdapp@gmail.com; panciyan 
Subject: [PATCH] RISC-V: Add testcases for unsigned vector SAT_SUB form 11 and 
form 12

From: panciyan 

This patch adds testcase for form11 and form12, as shown below:

void __attribute__((noinline))   \
vec_sat_u_sub_##T##_fmt_11 (T *out, T *op_1, T *op_2, unsigned limit) \
{\
  unsigned i;\
  for (i = 0; i < limit; i++)\
{\
  T x = op_1[i]; \
  T y = op_2[i]; \
  T ret; \
  T overflow = __builtin_sub_overflow (x, y, &ret);   \
  out[i] = overflow ? 0 : ret;   \
}\
}

void __attribute__((noinline))\
vec_sat_u_sub_##T##_fmt_12 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
  unsigned i; \
  for (i = 0; i < limit; i++) \
{ \
  T x = op_1[i];  \
  T y = op_2[i];  \
  T ret;  \
  T overflow = __builtin_sub_overflow (x, y, &ret);\
  out[i] = !overflow ? ret : 0;   \
} \
}

Passed the rv64gcv regression test.

Signed-off-by: Ciyan Pan 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add unsigned vector 
SAT_SUB form11 and form12.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-12-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-11-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-12-u8.c: New test.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h | 44 +++
 .../rvv/autovec/sat/vec_sat_u_sub-11-u16.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-11-u32.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-11-u64.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-11-u8.c |  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u16.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u32.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u64.c|  9 
 .../rvv/autovec/sat/vec_sat_u_sub-12-u8.c |  9 
 .../autovec/sat/vec_sat_u_sub-run-11-u16.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-11-u32.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-11-u64.c| 15 +++
 .../rvv/autovec/sat/vec_sat_u_sub-run-11-u8.c | 15 +++
 .../autovec/sat/vec_sat_u_sub-run-12-u16.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-12-u32.c| 15 +++
 .../autovec/sat/vec_sat_u_sub-run-12-u64.c| 15 +++
 .../rvv/autovec/sat/vec_sat_u_sub-run-12-u8.c | 15 +++
 17 files changed, 236 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-11-u16.

[pushed] testsuite: Add -funwind-tables to sve*/pfalse* tests

2025-07-10 Thread Richard Sandiford
The SVE svpfalse folding tests use CFI directives to delimit the
function bodies.  That requires -funwind-tables to be enabled,
which is true by default for *-linux-gnu targets, but not for *-elf.

Tested on aarch64-linux-gnu and aarch64_be-elf.  Pushed as obvious.

Richard


gcc/testsuite/
* gcc.target/aarch64/sve/pfalse-binary.c: Add -funwind-tables.
* gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_rotate.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-binaryxn.c: Likewise.
* gcc.target/aarch64/sve/pfalse-clast.c: Likewise.
* gcc.target/aarch64/sve/pfalse-compare_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-count_pred.c: Likewise.
* gcc.target/aarch64/sve/pfalse-fold_left.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_ext.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_gather_sv.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_gather_vs.c: Likewise.
* gcc.target/aarch64/sve/pfalse-load_replicate.c: Likewise.
* gcc.target/aarch64/sve/pfalse-prefetch.c: Likewise.
* gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: Likewise.
* gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: Likewise.
* gcc.target/aarch64/sve/pfalse-ptest.c: Likewise.
* gcc.target/aarch64/sve/pfalse-rdffr.c: Likewise.
* gcc.target/aarch64/sve/pfalse-reduction.c: Likewise.
* gcc.target/aarch64/sve/pfalse-reduction_wide.c: Likewise.
* gcc.target/aarch64/sve/pfalse-shift_right_imm.c: Likewise.
* gcc.target/aarch64/sve/pfalse-store.c: Likewise.
* gcc.target/aarch64/sve/pfalse-store_scatter_index.c: Likewise.
* gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: Likewise.
* gcc.target/aarch64/sve/pfalse-storexn.c: Likewise.
* gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-ternary_rotate.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_convertxn.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_n.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_pred.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unary_to_uint.c: Likewise.
* gcc.target/aarch64/sve/pfalse-unaryxn.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-binary_wide.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-compare.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c,
* gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c,
* gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c,
* gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c,
* gcc.target/aarch64/sve2/pfalse-unary.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-unary_convert.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: Likewise.
* gcc.target/aarch64/sve2/pfalse-unary_to_int.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary.c| 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_opt_n.c  | 2 +-
 .../gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/pfalse-binary_rotate.c | 2 +-
 .../gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sv

[PATCH] [x86] properly compute fp/mode for scalar ops for vectorizer costing

2025-07-10 Thread Richard Biener
The x86 add_stmt_hook relies on the passed vectype to determine
the mode and whether it is FP for a scalar operation.  This is
unreliable now for stmts involving patterns and in the future when
there is no vector type passed for scalar operations.

To be least disruptive I've kept using the vector type if it is passed.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks
Richard.

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Use
the LHS of a scalar stmt to determine mode and whether it is FP.
---
 gcc/config/i386/i386.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ad7360ec71a..26eefadea64 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25798,6 +25798,12 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
   if (scalar_p)
mode = TYPE_MODE (TREE_TYPE (vectype));
 }
+  else if (scalar_p && stmt_info)
+if (tree lhs = gimple_get_lhs (stmt_info->stmt))
+  {
+   fp = FLOAT_TYPE_P (TREE_TYPE (lhs));
+   mode = TYPE_MODE (TREE_TYPE (lhs));
+  }
 
   if ((kind == vector_stmt || kind == scalar_stmt)
   && stmt_info
-- 
2.43.0


Re: [PATCH] [x86] properly compute fp/mode for scalar ops for vectorizer costing

2025-07-10 Thread Jan Hubicka
> The x86 add_stmt_hook relies on the passed vectype to determine
> the mode and whether it is FP for a scalar operation.  This is
> unreliable now for stmts involving patterns and in the future when
> there is no vector type passed for scalar operations.
> 
> To be least disruptive I've kept using the vector type if it is passed.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?
> 
> Thanks
> Richard.
> 
>   * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Use
>   the LHS of a scalar stmt to determine mode and whether it is FP.
> ---
>  gcc/config/i386/i386.cc | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index ad7360ec71a..26eefadea64 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -25798,6 +25798,12 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>if (scalar_p)
>   mode = TYPE_MODE (TREE_TYPE (vectype));
>  }
> +  else if (scalar_p && stmt_info)
> +if (tree lhs = gimple_get_lhs (stmt_info->stmt))
> +  {
> + fp = FLOAT_TYPE_P (TREE_TYPE (lhs));
> + mode = TYPE_MODE (TREE_TYPE (lhs));
> +  }
Makes sense to me, but perhaps it would be a good idea to add a comment,
since it looks odd at first glance?
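
Something along these lines, perhaps (just a suggested wording, not text
from the patch):

  /* No vector type is passed for scalar stmts created for pattern
     statements, so derive the mode and whether it is FP from the type
     of the statement's LHS instead.  */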

Honza


Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-10 Thread Jan Hubicka
Hi,
this patch fixes several issues I noticed in gimple matching and in the
-Wauto-profile warning.  One problem is that we mismatched symbols with user
names, such as "*strlen" instead of "strlen". I added raw_symbol_name to strip
the extra '*', which is OK on ELF targets, which are the only targets we support
with auto-profile, but eventually we will want to add the user label prefix.
There is a sorry about this.
Also I think dwarf2out is wrong:

static void
add_linkage_attr (dw_die_ref die, tree decl)
{
  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));

  /* Mimic what assemble_name_raw does with a leading '*'.  */
  if (name[0] == '*')
name = &name[1];

The patch also fixes the locations of warnings.  I used the location of the
problematic statement as the warning_at parameter but also included info about
the containing function.  This made warning_at ignore the first location; that
is fixed now.

I also fixed the ICE with -Wno-auto-profile discussed earlier.

Bootstrapped/regtested x86_64-linux.  Autoprofiled bootstrap now fails for
weird reasons for me (it does not build the training stage), so I will try to
debug this before committing.

gcc/ChangeLog:

* auto-profile.cc: Include output.h.
(function_instance::set_call_location): Also sanity check
that location is known.
(raw_symbol_name): Two new static functions.
(dump_inline_stack): Use it.
(string_table::get_index_by_decl): Likewise.
(function_instance::get_cgraph_node): Likewise.
(function_instance::get_function_instance_by_decl): Fix typo
in warning; use raw names; fix lineno decoding.
(match_with_target): Add containing function parameter;
correctly output function and call location in warning.
(function_instance::lookup_count): Fix warning locations.
(function_instance::match): Fix warning locations; avoid
crash with mismatched callee; do not warn about broken callsites
twice.
(autofdo_source_profile::offline_external_functions): Use
raw_assembler_name.
(walk_block): Use raw_assembler_name.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/afdo-inline.c: Add user symbol names.

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 219676012e7..5226e455025 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "auto-profile.h"
 #include "tree-pretty-print.h"
 #include "gimple-pretty-print.h"
+#include "output.h"
 
 /* The following routines implements AutoFDO optimization.
 
@@ -430,7 +431,8 @@ public:
   void
   set_call_location (location_t l)
   {
-gcc_checking_assert (call_location_ == UNKNOWN_LOCATION);
+gcc_checking_assert (call_location_ == UNKNOWN_LOCATION
+&& l != UNKNOWN_LOCATION);
 call_location_= l;
   }
 
@@ -685,6 +687,26 @@ dump_afdo_loc (FILE *f, unsigned loc)
 fprintf (f, "%i", loc >> 16);
 }
 
+/* Return assembler name as in symbol table and DW_AT_linkage_name.  */
+
+static const char *
+raw_symbol_name (const char *asmname)
+{
+  /* If we start supporting user_label_prefixes, add_linkage_attr will also
+ need to be fixed.  */
+  if (strlen (user_label_prefix))
+sorry ("auto-profile is not supported for targets with user label prefix");
+  return asmname + (asmname[0] == '*');
+}
+
+/* Convenience wrapper that looks up assembler name.  */
+
+static const char *
+raw_symbol_name (tree decl)
+{
+  return raw_symbol_name (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+}
+
 /* Dump STACK to F.  */
 
 static void
@@ -695,7 +717,7 @@ dump_inline_stack (FILE *f, inline_stack *stack)
 {
   fprintf (f, "%s%s:",
   first ? "" : "; ",
-  IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (p.decl)));
+  raw_symbol_name (p.decl));
   dump_afdo_loc (f, p.afdo_loc);
   first = false;
 }
@@ -817,7 +839,7 @@ string_table::get_index (const char *name) const
 int
 string_table::get_index_by_decl (tree decl) const
 {
-  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  const char *name = raw_symbol_name (decl);
   int ret = get_index (name);
   if (ret != -1)
 return ret;
@@ -880,10 +902,9 @@ function_instance::~function_instance ()
 cgraph_node *
 function_instance::get_cgraph_node ()
 {
-  for (symtab_node *n = cgraph_node::get_for_asmname
-   (get_identifier
-  (afdo_string_table->get_name (name (;
-   n; n = n->next_sharing_asm_name)
+  const char *sname = afdo_string_table->get_name (name ());
+  symtab_node *n = cgraph_node::get_for_asmname (get_identifier (sname));
+  for (;n; n = n->next_sharing_asm_name)
 if (cgraph_node *cn = dyn_cast  (n))
   if (cn->definition && cn->has_gimple_body_p ())
return cn;
@@ -921,10 +942,10 @@ function_instance::get_function_instance_by_de

Re: [PATCH] x86: Update "*mov_internal" in mmx.md to handle all 1s vectors

2025-07-10 Thread Uros Bizjak
On Thu, Jul 10, 2025 at 2:31 PM Uros Bizjak  wrote:
>
> On Thu, Jul 10, 2025 at 1:57 PM H.J. Lu  wrote:
> >
> > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > Author: H.J. Lu 
> > Date:   Thu Jun 26 06:08:51 2025 +0800
> >
> > x86: Also handle all 1s float vector constant
> >
> > replaces
> >
> > (insn 29 28 30 5 (set (reg:V2SF 107)
> > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 
> > 2031
> >  {*movv2sf_internal}
> >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > (const_double:SF -QNaN [-QNaN]) repeated x2
> > ])
> > (nil)))
> >
> > with
> >
> > (insn 98 13 14 3 (set (reg:V8QI 112)
> > (const_vector:V8QI [
> > (const_int -1 [0xffffffffffffffff]) repeated x8
> > ])) -1
> >  (nil))
> > ...
> > (insn 29 28 30 5 (set (reg:V2SF 107)
> > (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
> >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > (const_double:SF -QNaN [-QNaN]) repeated x2
> > ])
> > (nil)))
> >
> > which leads to
> >
> > pr121015.c: In function ‘render_result_from_bake_h’:
> > pr121015.c:34:1: error: unrecognizable insn:
> >34 | }
> >   | ^
> > (insn 98 13 14 3 (set (reg:V8QI 112)
> > (const_vector:V8QI [
> > (const_int -1 [0xffffffffffffffff]) repeated x8
> > ])) -1
> >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > (const_int -1 [0xffffffffffffffff]) repeated x8
> > ])
> > (nil)))
> > during RTL pass: ira
> >
> > 1. Update constm1_operand to also return true for integer and float all
> > 1s vectors.
> > 2. Add nonimm_or_0_or_m1_operand for nonimmediate, zero or -1 operand.
> > 3. Add BI for constant all 0s/1s operand.
> > 4. Update "*mov_internal" in mmx.md to handle integer all 1s vectors.
> > 5. Update MMXMODE move splitter to also split all 1s source operand.
> >
> > gcc/
> >
> > PR target/121015
> > * config/i386/constraints.md (BI): New constraint.
> > * config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
> > * config/i386/mmx.md (*mov_internal): Replace C with BI
> > memory and integer register destination.
> > Update MMXMODE move splitter to also split all 1s source operand.
> > * config/i386/predicates.md (constm1_operand): Also return true
> > for int_float_vector_all_ones_operand.
> > (nonimm_or_0_or_m1_operand): New predicate.
> >
> > gcc/testsuite/
> >
> > PR target/121015
> > * gcc.target/i386/pr106022-2.c: Adjusted.
> > * gcc.target/i386/pr121015.c: New test.
> >
> > OK for master?
>
> +;; Match exactly -1.
> +(define_predicate "constm1_operand"
> +  (ior (and (match_code "const_int")
> +(match_test "op == constm1_rtx"))
> +   (match_operand 0 "int_float_vector_all_ones_operand")))
>
> No, this predicate should not be repurposed to all-ones predicate.
>
> For SSE we have a  macro that defines different
> constraints for float and int moves, I think we should have the same
> approach for MMX. IMO, you also need to amend the  case.

Assuming that problematic conversion only happens with SSE2+, IMO the
correct approach is to use nonimmediate_or_sse_const_operand predicate
in mov_internal with:

@@ -5448,6 +5448,7 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
   return 2;
 break;
   case 16:
+   case 8:
 if (TARGET_SSE2)
   return 2;
 break;

and adding  alternative after . Something like:

(define_insn "*mov_internal"
  [(set (match_operand:MMXMODE 0 "nonimmediate_operand"
"=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,v,m,r,v,!y,*x")
(match_operand:MMXMODE 1 "nonimmediate_or_sse_const_operand"
"rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,,v,m,v,v,r,*x,!y"))]

The predicate will only allow -1 with SSE2 (the new alternative should
also be enabled only with SSE2), and the register allocator will
always use "v" output for it, avoiding (-1) -> general reg -> xmm
sequence. Maybe also change mov" expander to use
nonimmediate_or_sse_const_operand predicate

Uros.


RE: [PATCH] Reject single lane vector types for SLP build

2025-07-10 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 10, 2025 1:31 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Tamar Christina
> ; RISC-V CI 
> Subject: [PATCH] Reject single lane vector types for SLP build
> 
> The following makes us never consider vector(1) T types for
> vectorization and ensures this during SLP build.  This is a
> long-standing issue for BB vectorization and when we remove
> early loop vector type setting we lose the single place we have
> that rejects this for loops.
> 
> Once we implement partial loop vectorization we should revisit
> this, but then use the original scalar types for the unvectorized
> parts.

SGTM FWIW,

I was also wondering if I should start upstreaming my changes to
get the vectorizer to recognize vector types as scalar types as well.

Or if you wanted me to wait until I have the lane representations
more figured out.

Regards,
Tamar
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll see
> if there's any surprises from the CI, but otherwise I'll go
> ahead with this.
> 
> Richard.
> 
>   * tree-vect-slp.cc (vect_build_slp_tree_1): Reject
>   single-lane vector types.
> ---
>  gcc/tree-vect-slp.cc | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index ad75386926a..d2ce4ffaa4f 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char
> *swap,
>matches[0] = false;
>return false;
>  }
> +  if (known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> +{
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "Build SLP failed: not using single lane "
> +  "vector type %T\n", vectype);
> +  matches[0] = false;
> +  return false;
> +}
>/* Record nunits required but continue analysis, producing matches[]
>   as if nunits was not an issue.  This allows splitting of groups
>   to happen.  */
> --
> 2.43.0


Re: [PING][PATCH] config/rs6000/t-float128: Don't encode full build paths into headers

2025-07-10 Thread Segher Boessenkool
Hi!

On Thu, Jul 10, 2025 at 12:10:16PM +, Sadineni, Harish wrote:
> Ping for [1]https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599882.html.
> 
> This patch avoids embedding full build paths into generated headers by using
> only the basename of the source file. This helps to improve build
> reproducibility, particularly in environments where build paths vary but 
> source
> files are preserved for debugging.
> Can this patch be merged into GCC?  Or let us know if there are any issues, so
> that we can update the patch accordingly.

The patch was not cc:ed to the maintainers, so we didn't see it.

Please repost, cc:ing us.  Also show examples of old and new output!


Segher


Re: [PATCH V2] testsuite: Fix gcc.target/powerpc/vsx-builtin-7.c test [PR119382]

2025-07-10 Thread Surya Kumari Jangala
Hi Jeevitha,

On 24/06/25 3:30 pm, jeevitha wrote:
> Hi All,
> 
> The following patch has been tested on powerpc64le-linux and verified it's
> fixed.
> 
> Changes from V1:
> Added the reason for adding the flag(-fno-ipa-icf) inside the test case.
> 
> The test vsx-builtin-7.c failed on powerpc64le-linux due to Identical
> Code Folding (ICF) merging the functions insert_di_0_v2 and insert_di_0.
> This behavior was introduced by commit r15-7961-gdc47161c1f32c3, which
> enhanced alias analysis in ao_compare::compare_ao_refs, enabling the
> compiler to identify and optimize structurally identical functions. As a
> result, the compiler replaced insert_di_0_v2 with a tail call to
> insert_di_0, altering the expected test behavior.
> 
> This patch adds -fno-ipa-icf to the test's dg-options to disable ICF,
> avoiding function merging and ensuring the test executes correctly.
> 
> 2025-06-24  Jeevitha Palanisamy  
> 
> gcc/testsuite/
>   PR testsuite/119382
>   * gcc.target/powerpc/vsx-builtin-7.c: Add '-fno-ipa-icf' to dg-options.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
> index 5095d5030fd..31e12323922 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
> @@ -1,8 +1,14 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power7 -fno-inline-functions" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -fno-inline-functions 
> -fno-ipa-icf" } */
>  /* { dg-require-effective-target powerpc_vsx } */
>  
> +/* Note: Added -fno-ipa-icf to disable Interprocedural Identical Code
> +   Folding (ICF). Without this, insert_di_0_v2 is merged with insert_di_0 due
> +   to improved alias analysis introduced in commit r15-7961-gdc47161c1f32c3.
> +   This results in a tail call replacement, altering expected test behavior. 
> +   Disabling ICF ensures correct execution of the test.  */

Can you please mention the PR number in the comment above?
Also please reword as follows:
"This results in the compiler replacing insert_di_0_v2 with a tail call to
insert_di_0, altering expected test behavior."

With the above changes, the patch is fine. However, I cannot approve the patch.


Regards,
Surya

> +
>  /* Test simple extract/insert/slat operations.  Make sure all types are
> supported with various options.  */
>  



Re: [PATCH] tree-optimization/120939 - remove uninitialized use of LOOP_VINFO_COST_MODEL_THRESHOLD

2025-07-10 Thread Richard Sandiford
Richard Biener  writes:
> The following removes an optimization that wrongly triggers right now
> because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
> computed yet.
>
> Testing on x86_64 didn't reveal any testsuite coverage.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?
>
>   PR tree-optimization/120939
>   * tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
>   Remove eliding an epilogue based on not computed
>   LOOP_VINFO_COST_MODEL_THRESHOLD.

This regresses:

FAIL: gcc.dg/torture/pr113026-1.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions   (test for bogus messages, line 10)

on aarch64-linux-gnu, with:

.../pr113026-1.c:10:12: warning: writing 1 byte into a region of size 0 
[-Wstringop-overflow=]
.../pr113026-1.c:4:6: note: at offset 16 into destination object 'dst' of size 
16

I haven't looked into why yet, but it does seem superficially similar
to PR60505, which was what this code seems to have been added to fix
(g:090cd8dc70b80183c83d9f43f1e6ab9970481efd).

Thanks,
Richard

> ---
>  gcc/tree-vect-loop.cc | 21 ++---
>  1 file changed, 2 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 46a6399243d..7ac61d4dce2 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1224,13 +1224,6 @@ static bool
>  vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
>  {
>unsigned HOST_WIDE_INT const_vf;
> -  HOST_WIDE_INT max_niter
> -= likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> -
> -  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> -  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
> -th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> -   (loop_vinfo));
>  
>if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>&& LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> @@ -1250,18 +1243,8 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info 
> loop_vinfo)
>VF * N + 1.  That's something of a niche case though.  */
>|| LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>|| !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
> -  || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> -< (unsigned) exact_log2 (const_vf))
> -   /* In case of versioning, check if the maximum number of
> -  iterations is greater than th.  If they are identical,
> -  the epilogue is unnecessary.  */
> -   && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> -   || ((unsigned HOST_WIDE_INT) max_niter
> -   /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
> -  but that's only computed later based on our result.
> -  The following is the most conservative approximation.  */
> -   > (std::max ((unsigned HOST_WIDE_INT) th,
> -const_vf) / const_vf) * const_vf
> +  || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
> +   < (unsigned) exact_log2 (const_vf)))
>  return true;
>  
>return false;


Re: [PATCH V2] testsuite: Fix gcc.target/powerpc/vsx-builtin-7.c test [PR119382]

2025-07-10 Thread Segher Boessenkool
Hi Surya, Jeevitha,

On Thu, Jul 10, 2025 at 08:26:51PM +0530, Surya Kumari Jangala wrote:
> On 24/06/25 3:30 pm, jeevitha wrote:
> > The following patch has been tested on powerpc64le-linux and verified it's
> > fixed.
> > 
> > Changes from V1:
> > Added the reason for adding the flag(-fno-ipa-icf) inside the test case.
> > 
> > The test vsx-builtin-7.c failed on powerpc64le-linux due to Identical
> > Code Folding (ICF) merging the functions insert_di_0_v2 and insert_di_0.
> > This behavior was introduced by commit r15-7961-gdc47161c1f32c3, which
> > enhanced alias analysis in ao_compare::compare_ao_refs, enabling the
> > compiler to identify and optimize structurally identical functions. As a
> > result, the compiler replaced insert_di_0_v2 with a tail call to
> > insert_di_0, altering the expected test behavior.
> > 
> > This patch adds -fno-ipa-icf to the test's dg-options to disable ICF,
> > avoiding function merging and ensuring the test executes correctly.
> > 
> > 2025-06-24  Jeevitha Palanisamy  
> > 
> > gcc/testsuite/
> > PR testsuite/119382
> > * gcc.target/powerpc/vsx-builtin-7.c: Add '-fno-ipa-icf' to dg-options.
> > 
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c 
> > b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
> > index 5095d5030fd..31e12323922 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
> > @@ -1,8 +1,14 @@
> >  /* { dg-do compile { target { powerpc*-*-* } } } */
> >  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > -/* { dg-options "-O2 -mdejagnu-cpu=power7 -fno-inline-functions" } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power7 -fno-inline-functions 
> > -fno-ipa-icf" } */
> >  /* { dg-require-effective-target powerpc_vsx } */
> >  
> > +/* Note: Added -fno-ipa-icf to disable Interprocedural Identical Code
> > +   Folding (ICF). Without this, insert_di_0_v2 is merged with insert_di_0 
> > due
> > +   to improved alias analysis introduced in commit 
> > r15-7961-gdc47161c1f32c3.
> > +   This results in a tail call replacement, altering expected test 
> > behavior. 
> > +   Disabling ICF ensures correct execution of the test.  */
> 
> Can you please mention the PR number in the comment above?
> Also please reword as follows:
> "This results in the compiler replacing insert_di_0_v2 with a tail call to
> insert_di_0, altering expected test behavior."

Yes please.

> With the above changes, the patch is fine. However, I cannot approve the 
> patch.

But I can :-)  Yes, okay with such changes.  Thanks!


Segher


[PING][PATCH] config/rs6000/t-float128: Don't encode full build paths into headers

2025-07-10 Thread Sadineni, Harish
Hi all,

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599882.html.

This patch avoids embedding full build paths into generated headers by using 
only the basename of the source file. This helps to improve build 
reproducibility, particularly in environments where build paths vary but source 
files are preserved for debugging.
Can this patch be merged into GCC?  Or let us know if there are any issues, so 
that we can update the patch accordingly.
thanks,
Harish


Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-10 Thread Soumya AR


> On 10 Jul 2025, at 3:15 PM, Richard Sandiford  
> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Soumya AR  writes:
>>> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov  wrote:
>>> 
>>> 
>>> 
 On 1 Jul 2025, at 17:36, Richard Sandiford  
 wrote:
 
 Soumya AR  writes:
> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
> From: Soumya AR 
> Date: Mon, 30 Jun 2025 12:17:30 -0700
> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores 
> with
> RCPC2
> 
> This patch adds the ability to fold the address computation into the 
> addressing
> mode for LDAPR instructions using LDAPUR when RCPC2 is available.
> 
> LDAPUR emission is controlled by the tune flag enable_ldapur, to enable 
> it on a
> per-core basis. Earlier, the following code:
> 
> uint64_t
> foo (std::atomic<uint64_t> *x)
> {
> return x[1].load(std::memory_order_acquire);
> }
> 
> would generate:
> 
> foo(std::atomic*):
> add x0, x0, 8
> ldapr   x0, [x0]
> ret
> 
> but now generates:
> 
> foo(std::atomic*):
> ldapur  x0, [x0, 8]
> ret
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> regression.
> OK for mainline?
> 
> Signed-off-by: Soumya AR 
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
> Add the enable_ldapur flag to control LDAPUR emission.
> * config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
> * config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
> * config/aarch64/atomics.md: (aarch64_atomic_load_rcpc): Modify
> to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
> (*aarch64_atomic_load_rcpc_zext): Likewise.
> (*aarch64_atomic_load_rcpc_sext): Modified to emit LDAPURS
> for addressing with offsets.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/ldapur.c: New test.
 
 Thanks for doing this.  It generally looks good, but a couple of comments
 below:
 
> ---
> gcc/config/aarch64/aarch64-tuning-flags.def |  2 +
> gcc/config/aarch64/aarch64.h|  5 ++
> gcc/config/aarch64/aarch64.md   | 11 +++-
> gcc/config/aarch64/atomics.md   | 22 +---
> gcc/testsuite/gcc.target/aarch64/ldapur.c   | 61 +
> 5 files changed, 92 insertions(+), 9 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldapur.c
> 
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index f2c916e9d77..5bf54165306 100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -44,6 +44,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
> AVOID_CROSS_LOOP_FMA)
> 
> AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
> 
> +AARCH64_EXTRA_TUNING_OPTION ("enable_ldapur", ENABLE_LDAPUR)
> +
 
 Let's see what others say, but personally, I think this would be better
 as an opt-out, such as avoid_ldapur.  The natural default seems to be to 
 use
 the extra addressing capacity when it's available and have CPUs explicitly
 flag when they don't want that.
 
 A good, conservatively correct, default would probably be to add 
 avoid_ldapur
 to every *current* CPU that includes rcpc2 and then separately remove it
 from those that are known not to need it.  In that sense, it's more work
 for current CPUs than the current patch, but it should ease the impact
 on future CPUs.
>>> 
>>> LLVM used to do this folding by default everywhere until it was discovered 
>>> that it hurts various CPUs.
>>> So they’ve taken the approach you describe, and disable the folding 
>>> explicitly for:
>>> neoverse-v2 neoverse-v3 cortex-x3 cortex-x4 cortex-x925
>>> I don’t know for sure if those are the only CPUs where this applies.
>>> They also disable the folding for generic tuning when -march is between 
>>> armv8.4 - armv8.7/armv9.2.
>>> I guess we can do the same in GCC.
>> 
>> Thanks for your suggestions, Richard and Kyrill.
>> 
>> I've updated the patch to use avoid_ldapur.
>> 
>> There's now an explicit override in aarch64_override_options_internal to use 
>> avoid_ldapur for armv8.4 through armv8.7.
>> 
>> I added it here because aarch64_adjust_generic_arch_tuning is only called 
>> for generic_tunings and not generic_armv{8,9}_a_tunings.
>> 
>> Let me know what you think.
> 
> Sounds good to me.  I can see that we wouldn't want armv9.3-a+ generic
> tuning to be hampered by pre-armv9.3 cores.
> 
> But I'm not sure that we're deliberately avoiding calling
> aarch64_adjust_generic_arch_tuning for generic_armv{8,9}_a_tunings.
> The current TARGET_SVE2 b

Re: [PATCH] aarch64: Fix LD1Q and ST1Q failures for big-endian

2025-07-10 Thread Andrew Pinski
On Thu, Jul 10, 2025 at 6:22 AM Richard Sandiford
 wrote:
>
> LD1Q gathers and ST1Q scatters are unusual in that they operate
> on 128-bit blocks (effectively VNx1TI).  However, we don't have
> modes or ACLE types for 128-bit integers, and 128-bit integers
> are not the intended use case.  Instead, the instructions are
> intended to be used in "hybrid VLA" operations, where each 128-bit
> block is an Advanced SIMD vector.
>
> The normal SVE modes therefore capture the intended use case better
> than VNx1TI would.  For example, VNx2DI is effectively N copies
> of V2DI, VNx4SI N copies of V4SI, etc.
>
> Since there is only one LD1Q instruction and one ST1Q instruction,
> the ACLE support used a single pattern for each, with the loaded or
> stored data having mode VNx2DI.  The ST1Q pattern was generated by:
>
> rtx data = e.args.last ();
> e.args.last () = force_lowpart_subreg (VNx2DImode, data, GET_MODE (data));
> e.prepare_gather_address_operands (1, false);
> return e.use_exact_insn (CODE_FOR_aarch64_scatter_st1q);
>
> where the force_lowpart_subreg bitcast the stored data to VNx2DI.
> But such subregs require an element reverse on big-endian targets
> (see the comment at the head of aarch64-sve.md), which wasn't the
> intention.  The code should have used aarch64_sve_reinterpret instead.
>
> The LD1Q pattern was used as follows:
>
> e.prepare_gather_address_operands (1, false);
> return e.use_exact_insn (CODE_FOR_aarch64_gather_ld1q);
>
> which always returns a VNx2DI value, leaving the caller to bitcast
> that to the correct mode.  That bitcast again uses subregs and has
> the same problem as above.
>
> However, for the reasons explained in the comment, using
> aarch64_sve_reinterpret does not work well for LD1Q.  The patch
> instead parameterises the LD1Q based on the required data mode.
>
> Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Ok.

>
> Richard
>
>
> gcc/
> * config/aarch64/aarch64-sve2.md (aarch64_gather_ld1q): Replace 
> with...
> (@aarch64_gather_ld1q): ...this, parameterizing based on mode.
> * config/aarch64/aarch64-sve-builtins-sve2.cc
> (svld1q_gather_impl::expand): Update accordingly.
> (svst1q_scatter_impl::expand): Use aarch64_sve_reinterpret
> instead of force_lowpart_subreg.
> ---
>  .../aarch64/aarch64-sve-builtins-sve2.cc  |  5 +++--
>  gcc/config/aarch64/aarch64-sve2.md| 21 +--
>  2 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
> index d9922de7ca5..abe21a8b61c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
> @@ -316,7 +316,8 @@ public:
>expand (function_expander &e) const override
>{
>  e.prepare_gather_address_operands (1, false);
> -return e.use_exact_insn (CODE_FOR_aarch64_gather_ld1q);
> +auto icode = code_for_aarch64_gather_ld1q (e.tuple_mode (0));
> +return e.use_exact_insn (icode);
>}
>  };
>
> @@ -722,7 +723,7 @@ public:
>expand (function_expander &e) const override
>{
>  rtx data = e.args.last ();
> -e.args.last () = force_lowpart_subreg (VNx2DImode, data, GET_MODE 
> (data));
> +e.args.last () = aarch64_sve_reinterpret (VNx2DImode, data);
>  e.prepare_gather_address_operands (1, false);
>  return e.use_exact_insn (CODE_FOR_aarch64_scatter_st1q);
>}
> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
> b/gcc/config/aarch64/aarch64-sve2.md
> index 62524f36de6..f39a0a964f2 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -334,12 +334,21 @@ (define_insn "@aarch64__strided4"
>  ;; - LD1Q (SVE2p1)
>  ;; -
>
> -;; Model this as operating on the largest valid element size, which is DI.
> -;; This avoids having to define move patterns & more for VNx1TI, which would
> -;; be difficult without a non-gather form of LD1Q.
> -(define_insn "aarch64_gather_ld1q"
> -  [(set (match_operand:VNx2DI 0 "register_operand")
> -   (unspec:VNx2DI
> +;; For little-endian targets, it would be enough to use a single pattern,
> +;; with a subreg to bitcast the result to whatever mode is needed.
> +;; However, on big-endian targets, the bitcast would need to be an
> +;; aarch64_sve_reinterpret instruction.  That would interact badly
> +;; with the "&" and "?" constraints in this pattern: if the result
> +;; of the reinterpret needs to be in the same register as the index,
> +;; the RA would tend to prefer to allocate a separate register for the
> +;; intermediate (uncast) result, even if the reinterpret prefers tying.
> +;;
> +;; The index is logically VNx1DI rather than VNx2DI, but introducing
> +;; and using VNx1DI would just create more bitcasting.  The ACLE intrinsic
> +;; uses svuint64_t, which corresponds

Re: [PATCH 2/2] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Jakub Jelinek
On Thu, Jul 10, 2025 at 04:03:30PM +, Qing Zhao wrote:
> gcc/c-family/ChangeLog:
> 
>   * c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
>   of the arguments per the new design.
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (build_counted_by_ref): Update comments.
>   (build_access_with_size_for_counted_by): Adjust the arguments per
>   the new design.
> 
> gcc/ChangeLog:
> 
>   * internal-fn.cc (expand_DEFERRED_INIT): Update comments.
>   * internal-fn.def (DEFERRED_INIT): Update comments.
>   * tree-object-size.cc (addr_object_size): Update comments.
>   (access_with_size_object_size): Adjust the arguments per the new
>   design.

Similar comment about ChangeLog entries as on the previous patch.

I see only code passing 0 as the third argument (with the right type),
so don't see why it is documented to be 0/1/2/3.  Or is that something
done in the patch that got reverted?

Jakub



Re: [PATCH 1/2] Passing TYPE_SIZE_UNIT of the element as the 6th argument to .ACCESS_WITH_SIZE (PR121000)

2025-07-10 Thread Qing Zhao



> On Jul 10, 2025, at 12:34, Jakub Jelinek  wrote:
> 
> On Thu, Jul 10, 2025 at 04:03:29PM +, Qing Zhao wrote:
>> The size of the element of the FAM _cannot_ reliably depend on the original
>> TYPE of the FAM that we passed as the 6th parameter to the .ACCESS_WITH_SIZE:
>> 
>> TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (gimple_call_arg (call, 5))))
>> 
>> when the element of the FAM has a variable length type. Since the variable
>> that represents TYPE_SIZE_UNIT has no explicit usage in the original IL,
>> compiler transformations (such as DSE) that are applied before the object_size
>> phase might eliminate the whole definition of the variable that represents
>> the TYPE_SIZE_UNIT of the element of the FAM.
>> 
>> In order to resolve this issue, instead of passing the original TYPE of the
>> FAM as the 6th argument to .ACCESS_WITH_SIZE, we should explicitly pass the
>> original TYPE_SIZE_UNIT of the element TYPE of the FAM as the 6th argument
>> to the call to  .ACCESS_WITH_SIZE.
>> 
>> The patches have been bootstrapped and regression tested on both aarch64
>> and x86.
>> 
>> Okay for trunk?
>> 
>> thanks.
>> 
>> Qing
>> 
>> PR middle-end/121000
>> 
>> gcc/c/ChangeLog:
>> 
>> * c-typeck.cc (build_counted_by_ref): Update comments.
> 
> You can't trust mklog that much.

Oh, thanks for reminding me on this.

Yes, I just checked, you are right.  I will check and update the ChangeLog
accordingly.

>  You're updating
> build_access_with_size_for_counted_by function comment, not
> build_counted_by_ref comments.
> 
>> (build_access_with_size_for_counted_by): Pass TYPE_SIZE_UNIT of the
>> element as the 6th argument.
>> 
>> gcc/ChangeLog:
>> 
>> * internal-fn.cc (expand_DEFERRED_INIT): Update comments.
>> * internal-fn.def (DEFERRED_INIT): Update comments.
> 
> Nor these two.  It is expand_ACCESS_WITH_SIZE and ACCESS_WITH_SIZE
> in these cases.

Will update. 

> 
>> * tree-object-size.cc (access_with_size_object_size): Update comments.
>> Get the element_size from the 6th argument directly.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/flex-array-counted-by-pr121000.c: New test.
> 
> Otherwise LGTM with one nit.

>> +int main ()
> 
> Line break before main.

Okay, will fix this before committing.

Thanks a lot.

Qing

> 
> Jakub




Re: [PATCH] c++, libstdc++, v5: Implement C++26 P3068R5 - constexpr exceptions [PR117785]

2025-07-10 Thread Jason Merrill

On 7/10/25 4:05 AM, Jakub Jelinek wrote:

On Wed, Jul 09, 2025 at 06:45:41PM -0400, Jason Merrill wrote:

+ && reduced_constant_expression_p (val))


And a value doesn't need to be constant to be printable, we should be able
to print it unconditionally.


Sure, the question is if printing a non-constant value is better for users.
The change to do the %qE unconditionally results in
/usr/src/gcc/gcc/testsuite/g++.dg/cpp26/constexpr-eh12.C:71:49: error: uncaught 
exception '(E*)(& heap )'
while previously it was
/usr/src/gcc/gcc/testsuite/g++.dg/cpp26/constexpr-eh12.C:71:49: error: uncaught 
exception of type 'E*'
I've kept the conditional for now, but if you really want that change, I can
remove it in the two spots and tweak the constexpr-eh12.C test's expectations.


Fair enough, we can leave it.


  case CLEANUP_STMT:
r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
non_constant_p, overflow_p,
jump_target);
-  if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
+  if ((!CLEANUP_EH_ONLY (t) || throws (jump_target)) && !*non_constant_p)
{
  iloc_sentinel ils (loc);
+ tree jmp_target = NULL_TREE;
  /* Also evaluate the cleanup.  */
  cxx_eval_constant_expression (ctx, CLEANUP_EXPR (t), vc_discard,
-   non_constant_p, overflow_p);
+   non_constant_p, overflow_p,
+   &jmp_target);
+ if (throws (&jmp_target))
+   {
+ if (throws (jump_target))
+   {
+ /* [except.throw]/9 - If the exception handling mechanism
+handling an uncaught exception directly invokes a function
+that exits via an exception, the function std::terminate is
+invoked.  */
+ if (!ctx->quiet)
+   {
+ auto_diagnostic_group d;
+ diagnose_std_terminate (loc, ctx, *jump_target);
+ inform (loc, "destructor exited with an exception");
+   }
+ *non_constant_p = true;
+ *jump_target = NULL_TREE;
+ r = NULL_TREE;
+   }
+ else
+   *jump_target = jmp_target;


Can't you use merge_jump_target here, too?

OK with that change.

Jason



Re: [PATCH v2 1/1] contrib: add bpf-vmtest-tool to test BPF programs

2025-07-10 Thread Piyush Raj

Hi David,

On 09/07/25 03:33, David Faust wrote:

diff --git a/contrib/bpf-vmtest-tool/.gitignore 
b/contrib/bpf-vmtest-tool/.gitignore
new file mode 100644
index 000..723dfe1d0f4
--- /dev/null
+++ b/contrib/bpf-vmtest-tool/.gitignore
@@ -0,0 +1,23 @@
+.gitignore_local
+.python-version
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+
+# Unit test / coverage reports
+.pytest_cache/
+
+
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Ruff stuff:
+.ruff_cache/


Will the files be e.g. byte-compiled in typical use?  At a glance I think
most (all?) of these will never be created in ordinary usage.  If that's
the case we can probably not include this local .gitignore at all.


In normal use, only __pycache__ would be generated, but I checked, and
the project's root .gitignore already covers it at line 17.  So we can
drop this .gitignore.




+DEVELOPMENT
+===
+
+Development dependencies are specified in `pyproject.toml`, which can be used
+with any suitable Python virtual environment manager.
+
+To run the test suite:
+
+python3 -m pytest


Can the tests be run without a pyproject.toml, or is it required?
(I mean the tests of bpf-vmtest-tool itself)

The current tests can be run with "python3 -m pytest
--import-mode=importlib tests" without pyproject.toml.  But some pytest
options can only be specified in the pyproject.toml file and don't have
command-line alternatives.

As far as I understand it is typically for packages. But in this case
it seems, now that we don't have external deps, to come down to a local
development utility thing along the lines of .clang-format or so, which
we aren't checking in.

Yes, pyproject.toml is used for packaging Python projects, but it's also
used by tools like pytest and ruff under their [tool.*] tables (see:
https://packaging.python.org/en/latest/guides/writing-pyproject-toml/).
While it's not needed to run the script itself, most development tools
depend on it, so I think it makes sense to include it in the source to
help set up consistent development environments.



+# https://git.sr.ht/~brianwitte/gcc-bpf-example/tree/master/item/Makefile


Is this comment attached to anything in particular?

I guess you followed here the same structure outlined in Brian Witte's Makefile,
right?  It would be good to give a bit of context to the link, otherwise it just
seems out of place.
Thanks. I'll add clarification that the current implementation uses 
Brian's Makefile as a reference.


Best regards,
Piyush


Re: [PATCH 2/2] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Qing Zhao


> On Jul 10, 2025, at 12:56, Jakub Jelinek  wrote:
> 
> On Thu, Jul 10, 2025 at 04:03:30PM +, Qing Zhao wrote:
>> gcc/c-family/ChangeLog:
>> 
>> * c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
>> of the arguments per the new design.
>> 
>> gcc/c/ChangeLog:
>> 
>> * c-typeck.cc (build_counted_by_ref): Update comments.
>> (build_access_with_size_for_counted_by): Adjust the arguments per
>> the new design.
>> 
>> gcc/ChangeLog:
>> 
>> * internal-fn.cc (expand_DEFERRED_INIT): Update comments.
>> * internal-fn.def (DEFERRED_INIT): Update comments.
>> * tree-object-size.cc (addr_object_size): Update comments.
>> (access_with_size_object_size): Adjust the arguments per the new
>> design.
> 
> Similar comment about ChangeLog entries as on the previous patch.

Sure, will check the ChangeLog and update accordingly. 

> 
> I see only code passing 0 as the third argument (with the right type),
> so don't see why it is documented to be 0/1/2/3.  Or is that something
> done in the patch that got reverted?

ACCESS_MODE is only for a future work to reimplement the attribute
access with the internal function .ACCESS_WITH_SIZE. 

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute

For the current “counted_by”, this flag is not used at all.  Therefore only 0 
is passed. 
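
For illustration, the counted_by shape this serves is roughly (a minimal
sketch; struct and field names are mine, not from the patch):

  struct pkt
  {
    int len;
    char data[] __attribute__ ((counted_by (len)));
  };

Accesses to the flexible array member are routed through
.ACCESS_WITH_SIZE, with 0 passed as the access-mode argument.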

Hope this is clear. 
> 
> Jakub
> 



Re: [PATCH] Reject single lane vector types for SLP build

2025-07-10 Thread Richard Biener



> On 10.07.2025 at 16:27, Tamar Christina wrote:
> 
> 
>> 
>> -Original Message-
>> From: Richard Biener 
>> Sent: Thursday, July 10, 2025 3:09 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
>> RISC-V CI 
>> Subject: RE: [PATCH] Reject single lane vector types for SLP build
>> 
>> On Thu, 10 Jul 2025, Tamar Christina wrote:
>> 
 -Original Message-
 From: Richard Biener 
 Sent: Thursday, July 10, 2025 1:31 PM
 To: gcc-patches@gcc.gnu.org
 Cc: Richard Sandiford ; Tamar Christina
 ; RISC-V CI 
 Subject: [PATCH] Reject single lane vector types for SLP build
 
 The following makes us never consider vector(1) T types for
 vectorization and ensures this during SLP build.  This is a
 long-standing issue for BB vectorization and when we remove
 early loop vector type setting we lose the single place we have
 that rejects this for loops.
 
 Once we implement partial loop vectorization we should revisit
 this, but then use the original scalar types for the unvectorized
 parts.
>>> 
>>> SGTM FWIW,
>>> 
>>> I was also wondering if I should start upstreaming my changes to
>>> get the vectorizer to recognize vector types as scalar types as well.
>>> 
>>> Or if you wanted me to wait until I have the lane representations
>>> more figured out.
>> 
>> I think if we can restrict things to cases that have a strong
>> overlap with what we intend to use in the end that sounds good.
>> Like allow only a single "scalar" vector def per SLP node for now
>> and simply stash that into the scalar-stmts array.  In the end
>> we'd want to allow mixed scalar and vector defs there.
> 
> At least for my use case I'd need to be able to do multiple "scalar"
> vector lanes, but restricted to the same size for each lane is fine for
> now.
> 
> But I don't think where's actually much difference here between
> one "scalar" and multiple "scalars" representations wise now is there?

Not if they are at least uniform, not mixed vector vs scalar or different
vector sizes.
What’s going to be interesting (or impossible?) might be VLA vectors.

But I’d like to see the BB vectorizer deal with vector lowering, so each
target-unsupported vector stmt would be a BB vect seed.

Richard 

> 
> Thanks,
> Tamar
> 
>> 
>> It does require altering code that expects to get at actual _scalar_
>> defs for each lane, but I don't think that's much code.
>> 
>> Richard.
>> 
>>> Regards,
>>> Tamar
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll see
 if there's any surprises from the CI, but otherwise I'll go
 ahead with this.
 
 Richard.
 
* tree-vect-slp.cc (vect_build_slp_tree_1): Reject
single-lane vector types.
 ---
 gcc/tree-vect-slp.cc | 9 +
 1 file changed, 9 insertions(+)
 
 diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
 index ad75386926a..d2ce4ffaa4f 100644
 --- a/gcc/tree-vect-slp.cc
 +++ b/gcc/tree-vect-slp.cc
 @@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned
>> char
 *swap,
   matches[0] = false;
   return false;
 }
 +  if (known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
 +{
 +  if (dump_enabled_p ())
 +dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 + "Build SLP failed: not using single lane "
 + "vector type %T\n", vectype);
 +  matches[0] = false;
 +  return false;
 +}
   /* Record nunits required but continue analysis, producing matches[]
  as if nunits was not an issue.  This allows splitting of groups
  to happen.  */
 --
 2.43.0
>>> 
>> 
>> --
>> Richard Biener 
>> SUSE Software Solutions Germany GmbH,
>> Frankenstrasse 146, 90461 Nuernberg, Germany;
>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Robin Dapp

Oh, I guess I didn't expand enough about my thought:
I don't care that we have bad performance/bad code gen here if Zvfh is
mandatory for RVA23 since that means not many people and cores will
fall into this code gen path.
But RVA23 will go to this code gen path, which means we will go this
path for years until the next profile becomes the new baseline, so I
think it is worth spending time to improve that :)


Ah, yeah, that makes sense.

--
Regards
Robin



[PATCH 1/5] Pass SLP node down to cost hook for reduction cost

2025-07-10 Thread Richard Biener
The following arranges vector reduction costs to hand down the
SLP node (of the reduction stmt) to the cost hooks, not only the
stmt_info.  This also avoids accessing STMT_VINFO_VECTYPE of a
stmt unrelated to the node that is subject to code generation.

* tree-vect-loop.cc (vect_model_reduction_cost): Get SLP
node instead of stmt_info and use that when recording costs.
---
 gcc/tree-vect-loop.cc | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6f9765b5459..7b260c34a84 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5001,7 +5001,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 
 static void
 vect_model_reduction_cost (loop_vec_info loop_vinfo,
-  stmt_vec_info stmt_info, internal_fn reduc_fn,
+  slp_tree node, internal_fn reduc_fn,
   vect_reduction_type reduction_type,
   int ncopies, stmt_vector_for_cost *cost_vec)
 {
@@ -5017,9 +5017,10 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
   if (reduction_type == COND_REDUCTION)
 ncopies *= 2;
 
-  vectype = STMT_VINFO_VECTYPE (stmt_info);
+  vectype = SLP_TREE_VECTYPE (node);
   mode = TYPE_MODE (vectype);
-  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
+  stmt_vec_info orig_stmt_info
+= vect_orig_stmt (SLP_TREE_REPRESENTATIVE (node));
 
   gimple_match_op op;
   if (!gimple_extract_op (orig_stmt_info->stmt, &op))
@@ -5037,16 +5038,16 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
   if (reduc_fn != IFN_LAST)
/* Count one reduction-like operation per vector.  */
inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
-   stmt_info, 0, vect_body);
+   node, 0, vect_body);
   else
{
  /* Use NELEMENTS extracts and NELEMENTS scalar ops.  */
  unsigned int nelements = ncopies * vect_nunits_for_cost (vectype);
  inside_cost = record_stmt_cost (cost_vec, nelements,
- vec_to_scalar, stmt_info, 0,
+ vec_to_scalar, node, 0,
  vect_body);
  inside_cost += record_stmt_cost (cost_vec, nelements,
-  scalar_stmt, stmt_info, 0,
+  scalar_stmt, node, 0,
   vect_body);
}
 }
@@ -5063,7 +5064,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
/* We need the initial reduction value.  */
prologue_stmts = 1;
   prologue_cost += record_stmt_cost (cost_vec, prologue_stmts,
-scalar_to_vec, stmt_info, 0,
+scalar_to_vec, node, 0,
 vect_prologue);
 }
 
@@ -5080,24 +5081,24 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
{
  /* An EQ stmt and an COND_EXPR stmt.  */
  epilogue_cost += record_stmt_cost (cost_vec, 2,
-vector_stmt, stmt_info, 0,
+vector_stmt, node, 0,
 vect_epilogue);
  /* Reduction of the max index and a reduction of the found
 values.  */
  epilogue_cost += record_stmt_cost (cost_vec, 2,
-vec_to_scalar, stmt_info, 0,
+vec_to_scalar, node, 0,
 vect_epilogue);
  /* A broadcast of the max value.  */
  epilogue_cost += record_stmt_cost (cost_vec, 1,
-scalar_to_vec, stmt_info, 0,
+scalar_to_vec, node, 0,
 vect_epilogue);
}
  else
{
  epilogue_cost += record_stmt_cost (cost_vec, 1, vector_stmt,
-stmt_info, 0, vect_epilogue);
+node, 0, vect_epilogue);
  epilogue_cost += record_stmt_cost (cost_vec, 1,
-vec_to_scalar, stmt_info, 0,
+vec_to_scalar, node, 0,
 vect_epilogue);
}
}
@@ -5107,12 +5108,12 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
  /* Extraction of scalar elements.  */
  epilogue_cost += record_stmt_cost (cost_vec,
 2 * estimated

[PATCH v3 9/9] aarch64: Add memtag-stack tests

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Add basic tests for memtag-stack sanitizer.  Memtag stack sanitizer
uses target hooks to emit AArch64 specific MTE instructions.

gcc/testsuite:

* gcc.target/aarch64/memtag/alloca-1.c: New test.
* gcc.target/aarch64/memtag/alloca-3.c: New test.
* gcc.target/aarch64/memtag/arguments-1.c: New test.
* gcc.target/aarch64/memtag/arguments-2.c: New test.
* gcc.target/aarch64/memtag/arguments-3.c: New test.
* gcc.target/aarch64/memtag/arguments-4.c: New test.
* gcc.target/aarch64/memtag/arguments.c: New test.
* gcc.target/aarch64/memtag/basic-1.c: New test.
* gcc.target/aarch64/memtag/basic-3.c: New test.
* gcc.target/aarch64/memtag/basic-struct.c: New test.
* gcc.target/aarch64/memtag/large-array.c: New test.
* gcc.target/aarch64/memtag/local-no-escape.c: New test.
* gcc.target/aarch64/memtag/memtag.exp: New file.
* gcc.target/aarch64/memtag/no-sanitize-attribute.c: New test.
* gcc.target/aarch64/memtag/value-init.c: New test.
* gcc.target/aarch64/memtag/vararray-gimple.c: New test.
* gcc.target/aarch64/memtag/vararray.c: New test.
* gcc.target/aarch64/memtag/zero-init.c: New test.
* gcc.target/aarch64/memtag/texec-1.c: New test.
* gcc.target/aarch64/memtag/texec-2.c: New test.
* gcc.target/aarch64/memtag/vla-1.c: New test.
* gcc.target/aarch64/memtag/vla-2.c: New test.
* lib/target-supports.exp
(check_effective_target_aarch64_mte): New function.

Co-authored-by: Indu Bhagat 
Signed-off-by: Claudiu Zissulescu 
---
 .../gcc.target/aarch64/memtag/alloca-1.c  | 14 
 .../gcc.target/aarch64/memtag/alloca-3.c  | 27 
 .../gcc.target/aarch64/memtag/arguments-1.c   |  3 +
 .../gcc.target/aarch64/memtag/arguments-2.c   |  3 +
 .../gcc.target/aarch64/memtag/arguments-3.c   |  3 +
 .../gcc.target/aarch64/memtag/arguments-4.c   | 16 +
 .../gcc.target/aarch64/memtag/arguments.c |  3 +
 .../gcc.target/aarch64/memtag/basic-1.c   | 15 +
 .../gcc.target/aarch64/memtag/basic-3.c   | 21 ++
 .../gcc.target/aarch64/memtag/basic-struct.c  | 22 +++
 .../aarch64/memtag/cfi-mte-memtag-frame-1.c   | 11 
 .../gcc.target/aarch64/memtag/large-array.c   | 24 +++
 .../aarch64/memtag/local-no-escape.c  | 20 ++
 .../gcc.target/aarch64/memtag/memtag.exp  | 64 +++
 .../gcc.target/aarch64/memtag/mte-sig.h   | 15 +
 .../aarch64/memtag/no-sanitize-attribute.c| 17 +
 .../gcc.target/aarch64/memtag/texec-1.c   | 27 
 .../gcc.target/aarch64/memtag/texec-2.c   | 22 +++
 .../gcc.target/aarch64/memtag/value-init.c| 14 
 .../aarch64/memtag/vararray-gimple.c  | 17 +
 .../gcc.target/aarch64/memtag/vararray.c  | 14 
 .../gcc.target/aarch64/memtag/vla-1.c | 39 +++
 .../gcc.target/aarch64/memtag/vla-2.c | 48 ++
 .../gcc.target/aarch64/memtag/zero-init.c | 14 
 gcc/testsuite/lib/target-supports.exp | 12 
 25 files changed, 485 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/alloca-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/arguments.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/basic-struct.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/memtag/cfi-mte-memtag-frame-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/large-array.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/local-no-escape.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/memtag.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/mte-sig.h
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/memtag/no-sanitize-attribute.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/texec-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/texec-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/value-init.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray-gimple.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vararray.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vla-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/vla-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/memtag/zero-init.c

diff --git a/gcc/testsuite/gcc.target/aa

[PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Currently, the data type of sanitizer flags is unsigned int, with
SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
enumerator for enum sanitize_code.  Use the 'uint64_t' data type to allow
more distinct instrumentation modes to be added when needed.
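
For illustration, a minimal C sketch of the motivation (the constants
mirror the patch; the helper is mine and only approximates
sanitize_flags_p):

  #include <stdint.h>

  /* Bit 31 is the last flag that fits in a 32-bit unsigned int, so the
     new stack-tagging flag at bit 32 forces the switch to uint64_t.  */
  #define SANITIZE_SHADOW_CALL_STACK (UINT64_C (1) << 31)
  #define SANITIZE_MEMTAG_STACK      (UINT64_C (1) << 32)

  /* Membership tests must then use a 64-bit mask as well.  */
  static inline int
  flag_set_p (uint64_t flag_sanitize, uint64_t flag)
  {
    return (flag_sanitize & flag) != 0;
  }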

gcc/ChangeLog:

* asan.h (sanitize_flags_p): Use 'uint64_t' instead of 'unsigned
int'.
* common.opt: Likewise.
* dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
* opts.cc (find_sanitizer_argument): Likewise.
(report_conflicting_sanitizer_options): Likewise.
(parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* opts.h (parse_sanitizer_options): Likewise.
(parse_no_sanitize_attribute): Likewise.
* tree-cfg.cc (print_no_sanitize_attr_value): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (add_no_sanitize_value): Likewise.
(handle_no_sanitize_attribute): Likewise.
(handle_no_sanitize_address_attribute): Likewise.
(handle_no_sanitize_thread_attribute): Likewise.
(handle_no_address_safety_analysis_attribute): Likewise.
* c-common.h (add_no_sanitize_value): Likewise.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Likewise.

gcc/cp/ChangeLog:

* typeck.cc (get_member_function_from_ptrfunc): Likewise.

gcc/d/ChangeLog:

* d-attribs.cc (d_handle_no_sanitize_attribute): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/asan.h|  5 +++--
 gcc/c-family/c-attribs.cc | 16 
 gcc/c-family/c-common.h   |  2 +-
 gcc/c/c-parser.cc |  4 ++--
 gcc/common.opt|  6 +++---
 gcc/cp/typeck.cc  |  2 +-
 gcc/d/d-attribs.cc|  8 
 gcc/dwarf2asm.cc  |  2 +-
 gcc/opts.cc   | 25 +
 gcc/opts.h|  8 
 gcc/tree-cfg.cc   |  2 +-
 11 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index 064d4f24823..d4443de4620 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -242,9 +242,10 @@ asan_protect_stack_decl (tree decl)
remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES.  */
 
 inline bool
-sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl)
+sanitize_flags_p (uint64_t flag,
+ const_tree fn = current_function_decl)
 {
-  unsigned int result_flags = flag_sanitize & flag;
+  uint64_t result_flags = flag_sanitize & flag;
   if (result_flags == 0)
 return false;
 
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index ea04ed7f0d4..ddb173e3ccf 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1409,23 +1409,23 @@ handle_cold_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
 /* Add FLAGS for a function NODE to no_sanitize_flags in DECL_ATTRIBUTES.  */
 
 void
-add_no_sanitize_value (tree node, unsigned int flags)
+add_no_sanitize_value (tree node, uint64_t flags)
 {
   tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node));
   if (attr)
 {
-  unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr));
+  uint64_t old_value = tree_to_uhwi (TREE_VALUE (attr));
   flags |= old_value;
 
   if (flags == old_value)
return;
 
-  TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags);
+  TREE_VALUE (attr) = build_int_cst (uint64_type_node, flags);
 }
   else
 DECL_ATTRIBUTES (node)
   = tree_cons (get_identifier ("no_sanitize"),
-  build_int_cst (unsigned_type_node, flags),
+  build_int_cst (uint64_type_node, flags),
   DECL_ATTRIBUTES (node));
 }
 
@@ -1436,7 +1436,7 @@ static tree
 handle_no_sanitize_attribute (tree *node, tree name, tree args, int,
  bool *no_add_attrs)
 {
-  unsigned int flags = 0;
+  uint64_t flags = 0;
   *no_add_attrs = true;
   if (TREE_CODE (*node) != FUNCTION_DECL)
 {
@@ -1473,7 +1473,7 @@ handle_no_sanitize_address_attribute (tree *node, tree 
name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SANITIZE_ADDRESS);
+add_no_sanitize_value (*node, (uint64_t) SANITIZE_ADDRESS);
 
   return NULL_TREE;
 }
@@ -1489,7 +1489,7 @@ handle_no_sanitize_thread_attribute (tree *node, tree 
name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SANITIZE_THREAD);
+add_no_sanitize_value (*node, (uint64_t) SANITIZE_THREAD);
 
   return NULL_TREE;
 }
@@ -1506,7 +1506,7 @@ handle_no_address_safety_analysis_attribute (tree *node, 
tree name, tree, int,
   if (TREE_CODE (*node) != FUNCTION_DECL)
 warning (OPT_Wattributes, "%qE attribute ignored", name);
   else
-add_no_sanitize_value (*node, SAN

Rewrite assign_discriminators pass

2025-07-10 Thread Jan Hubicka
Hi,
to assign debug locations to corresponding statements, auto-fdo uses
discriminators.  Documentation says that if a given statement belongs to
multiple basic blocks, the discriminator distinguishes them.

The current implementation however only works for statements that expand
into a sequence of gimple statements forming a linear sequence, since it
essentially tracks a current location and renews it each time a new BB is
found.
This is commonly not true for C++ code as in:

   :
  [simulator/csimplemodule.cc:379:85] _40 = 
std::__cxx11::basic_string::c_str ([simulator/csimplemodule.cc:379:85] 
&D.80680);
  [simulator/csimplemodule.cc:379:85 discrim 13] _41 = 
[simulator/csimplemodule.cc:379:85] 
&this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
  [simulator/csimplemodule.cc:379:85 discrim 13] _42 = 
&this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
  [simulator/csimplemodule.cc:377:45] _43 = 
this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
  [simulator/csimplemodule.cc:377:45] _44 = _43 + 40;
  [simulator/csimplemodule.cc:377:45] _45 = [simulator/csimplemodule.cc:377:45] 
*_44;
  [simulator/csimplemodule.cc:379:85] D.89001 = OBJ_TYPE_REF(_45;(const struct 
cObject)_42->5B) (_41);

This is a fragment of code that is expanded from:


371 if (this!=simulation.getContextModule())
372 throw cRuntimeError("send()/sendDelayed() of module (%s)%s 
called in the context of "
373 "module (%s)%s: method called from the 
latter module "
374 "lacks Enter_Method() or 
Enter_Method_Silent()? "
375 "Also, if message to be sent is passed from 
that module, "
376 "you'll need to call take(msg) after 
Enter_Method() as well",
377 getClassName(), getFullPath().c_str(),
378 
simulation.getContextModule()->getClassName(),
379 
simulation.getContextModule()->getFullPath().c_str());

Notice that 379:85 is interleaved with 377:45 and the pass does not assign a
new discriminator.
With patch we get:

   :
  [simulator/csimplemodule.cc:379:85 discrim 7] _40 = 
std::__cxx11::basic_string::c_str ([simulator/csimplemodule.cc:379:85] 
&D.80680);
  [simulator/csimplemodule.cc:379:85 discrim 8] _41 = 
[simulator/csimplemodule.cc:379:85] 
&this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
  [simulator/csimplemodule.cc:379:85 discrim 8] _42 = 
&this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782;
  [simulator/csimplemodule.cc:377:45 discrim 1] _43 = 
this->D.78503.D.78106.D.72008.D.68585.D.67935.D.67879.D.67782._vptr.cObject;
  [simulator/csimplemodule.cc:377:45 discrim 1] _44 = _43 + 40;
  [simulator/csimplemodule.cc:377:45 discrim 1] _45 = 
[simulator/csimplemodule.cc:377:45] *_44;
  [simulator/csimplemodule.cc:379:85 discrim 8] D.89001 = 
OBJ_TYPE_REF(_45;(const struct cObject)_42->5B) (_41);

There are earlier statements with line number 379, so that is why there is 
discriminator 7 for the call.
After that the discriminator is increased.  There are two reasons for it:
 1) AFDO requires every callsite to have a unique lineno:discriminator pair
 2) the call may not terminate and thus the profile of the first statement
may be higher than the rest.

The old pass also contained logic to skip debug statements.  This is needed
so that discriminators at train time (with -g) and discriminators at feedback
time (say -g0 -fauto-profile=...) are the same.  However keeping debug
statements with broken discriminators is not a good idea since we output
them to the debug output, and if the AFDO tool picks these locations up they
will be misplaced in basic blocks.

Debug statements are naturally quite useful to track back the AFDO profiles,
and in the meantime LLVM folks implemented something similar called pseudoprobe.
I think it makes sense to enable debug statements with -fauto-profile even if
debug info is off and make use of them as done in this patch.

Sadly the AFDO tool is quite broken and built around the assumption that
every address has at most one debug location assigned to it (i.e. debug info
before debug statements were introduced).  I have a WIP patch fixing this.
The fact that it ignores all but the last location assigned to an address
sort of mitigates the problem with debug statements.  If they are
immediately succeeded by another location, the tool ignores them.

Note that LLVM also has -fdebug-info-for-auto-profile (on by default it seems)
that controls discriminator production and some other little bits.  I wonder if
we want to have something similar.  Should it be -g instead?

Bootstrapped/regtested x86_64-linux, OK?


I am including the new code here as diff did not do a good job.

/* Auto-profile needs discriminators to distinguish statements with the same
   line number (file name is ignored) which are in different basic blocks.
   This map keeps track of the current discriminator for a give

[PATCH v3 8/9] aarch64: Add support for memetag-stack sanitizer using MTE insns

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

MEMTAG sanitizer, which is based on the HWASAN sanitizer, will invoke
the target-specific hooks to create a random tag, add a tag to a memory
address, and finally tag and untag memory.

Implement the target hooks to emit MTE instructions if MEMTAG sanitizer
is in effect.  Continue to use the default target hook if HWASAN is
being used.  Following target hooks are implemented:
   - TARGET_MEMTAG_INSERT_RANDOM_TAG
   - TARGET_MEMTAG_ADD_TAG
   - TARGET_MEMTAG_EXTRACT_TAG
   - TARGET_MEMTAG_COMPOSE_OFFSET_TAG

Apart from the target-specific hooks, set the following to values
defined by the Memory Tagging Extension (MTE) in aarch64:
   - TARGET_MEMTAG_TAG_SIZE
   - TARGET_MEMTAG_GRANULE_SIZE

The following instructions were (re-)defined:
   - addg/subg (used by the TARGET_MEMTAG_ADD_TAG and
     TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
   - stg/st2g: used to tag/untag a memory granule.
   - tag_memory: a target-specific instruction; it emits MTE
     instructions to tag/untag memory of a given size.
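
For reference, a minimal user-level sketch of what irg/stg do, using the
existing ACLE memtag builtins from arm_acle.h (compile with
-march=armv8.5-a+memtag; the sanitizer emits the equivalent instructions
directly rather than calling these):

  #include <arm_acle.h>

  void *
  tag_granule (void *p)
  {
    void *q = __arm_mte_create_random_tag (p, 0);  /* irg */
    __arm_mte_set_tag (q);                         /* stg */
    return q;
  }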

Add documentation in gcc/doc/invoke.texi.
(AARCH64_MEMTAG_TAG_SIZE): Define.

gcc/

* config/aarch64/aarch64.md (addg): Update pattern to use
addg/subg instructions.
(stg): Update pattern.
(st2g): New pattern.
(tag_memory): Likewise.
* config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
Define.
(AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
(AARCH64_MEMTAG_TAG_MEMORY_LOOP_THRESHOLD): Likewise.
(aarch64_override_options_internal): Error out if MTE instructions
are not available.
(aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
(aarch64_can_tag_addresses): Add MEMTAG specific handling.
(aarch64_memtag_tag_bitsize): New function
(aarch64_memtag_granule_size): Likewise.
(aarch64_memtag_insert_random_tag): Likwise.
(aarch64_memtag_add_tag): Likewise.
(aarch64_memtag_compose_offset_tag): Likewise.
(aarch64_memtag_extract_tag): Likewise.
(aarch64_granule16_memory_address_p): Likewise.
(aarch64_emit_stxg_insn): Likewise.
(aarch64_gen_tag_memory_postindex): Likewise.
(aarch64_memtag_tag_memory_via_loop): New definition.
(aarch64_expand_tag_memory): Likewise.
(aarch64_check_memtag_ops): Likewise.
(aarch64_gen_tag_memory_postindex): Likewise.
(TARGET_MEMTAG_TAG_SIZE): Define.
(TARGET_MEMTAG_GRANULE_SIZE): Likewise.
(TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
(TARGET_MEMTAG_ADD_TAG): Likewise.
(TARGET_MEMTAG_EXTRACT_TAG): Likewise.
(TARGET_MEMTAG_COMPOSE_OFFSET_TAG): Likewise.
* config/aarch64/aarch64-builtins.cc
(aarch64_expand_builtin_memtag): Update set tag builtin logic.
* config/aarch64/aarch64-linux.h: Pass memtag-stack sanitizer
specific options to the linker.
* config/aarch64/aarch64-protos.h
(aarch64_granule16_memory_address_p): New prototype.
(aarch64_check_memtag_ops): Likewise.
(aarch64_expand_tag_memory): Likewise.
* config/aarch64/constraints.md (Umg): New memory constraint.
(Uag): New constraint.
(Ung): Likewise.
(Utg): Likewise.
* config/aarch64/predicates.md (aarch64_memtag_tag_offset):
Refactor it.
(aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and
refactor it.
(aarch64_granule16_memory_operand): New constraint.

doc/
* invoke.texi: Update documentation.

gcc/testsuite:

* gcc.target/aarch64/acle/memtag_1.c: Update test.

Co-authored-by: Indu Bhagat 
Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/aarch64/aarch64-builtins.cc|   7 +-
 gcc/config/aarch64/aarch64-linux.h|   4 +-
 gcc/config/aarch64/aarch64-protos.h   |   4 +
 gcc/config/aarch64/aarch64.cc | 370 +-
 gcc/config/aarch64/aarch64.md |  60 ++-
 gcc/config/aarch64/constraints.md |  26 ++
 gcc/config/aarch64/predicates.md  |  13 +-
 gcc/doc/invoke.texi   |   6 +-
 .../gcc.target/aarch64/acle/memtag_1.c|   2 +-
 9 files changed, 464 insertions(+), 28 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 93f939a9c83..b2427e73880 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -3668,8 +3668,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx 
target)
pat = GEN_FCN (icode) (target, op0, const0_rtx);
break;
   case AARCH64_MEMTAG_BUILTIN_SET_TAG:
-   pat = GEN_FCN (icode) (op0, op0, const0_rtx);
-   break;
+   {
+ rtx mem = gen_rtx_MEM (TImode, op0);
+ pat = GEN_FCN (icode) (mem, mem, op0);
+ break;
+   }
   default:
gcc_unreachable();
 }
diff --git a/gcc/config/aarch64/aarch64-li

[PATCH v3 6/9] asan: add new memtag sanitizer

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Add a new command-line option -fsanitize=memtag-stack with the following
new param:
--param memtag-instrument-allocas [0,1] (default 1) to use MTE insns
for enabling dynamic checking of stack allocas.

Along with the new SANITIZE_MEMTAG_STACK, define a SANITIZE_MEMTAG
which will be set if any kind of memtag sanitizer is in effect (e.g.,
later we may add -fsanitize=memtag-globals).  Add errors to convey
that memtag sanitizer does not work with hwaddress and address
sanitizers.  Also error out if memtag ISA extension is not enabled.

MEMTAG sanitizer will use the HWASAN machinery, but with a few
differences:
  - The tags are always generated at runtime by the hardware, so
-fsanitize=memtag-stack enforces a --param hwasan-random-frame-tag=1

Add documentation in gcc/doc/invoke.texi.

gcc/
* builtins.def: Adjust the macro to include the new
SANITIZE_MEMTAG_STACK.
* flag-types.h (enum sanitize_code): Add new enumerator for
SANITIZE_MEMTAG and SANITIZE_MEMTAG_STACK.
* opts.cc (finish_options): memtag-stack sanitizer conflicts with
hwaddress and address sanitizers.
(sanitizer_opts): Add new memtag-stack sanitizer.
(parse_sanitizer_options): memtag-stack sanitizer cannot recover.
* params.opt: Add new params for memtag-stack sanitizer.

doc/
* invoke.texi: Update documentation.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/builtins.def|  1 +
 gcc/doc/invoke.texi | 13 -
 gcc/flag-types.h|  4 
 gcc/opts.cc | 22 +-
 gcc/params.opt  |  4 
 5 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index d7b2894bcfa..5f0b1107347 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -257,6 +257,7 @@ along with GCC; see the file COPYING3.  If not see
   true, true, true, ATTRS, true, \
  (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
| SANITIZE_HWADDRESS \
+   | SANITIZE_MEMTAG_STACK \
| SANITIZE_UNDEFINED \
| SANITIZE_UNDEFINED_NONDEFAULT) \
   || flag_sanitize_coverage))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 74f5ee26042..d8f11201361 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17261,7 +17261,7 @@ When using stack instrumentation, decide tags for stack 
variables using a
 deterministic sequence beginning at a random tag for each frame.  With this
 parameter unset tags are chosen using the same sequence but beginning from 1.
 This is enabled by default for @option{-fsanitize=hwaddress} and unavailable
-for @option{-fsanitize=kernel-hwaddress}.
+for @option{-fsanitize=kernel-hwaddress} and @option{-fsanitize=memtag-stack}.
 To disable it use @option{--param hwasan-random-frame-tag=0}.
 
 @item hwasan-instrument-allocas
@@ -17294,6 +17294,11 @@ and @option{-fsanitize=kernel-hwaddress}.
 To disable instrumentation of builtin functions use
 @option{--param hwasan-instrument-mem-intrinsics=0}.
 
+@item memtag-instrument-allocas
+Enable hardware-assisted memory tagging of dynamically sized stack-allocated
+variables.  This kind of code generation is enabled by default when using
+@option{-fsanitize=memtag-stack}.
+
 @item use-after-scope-direct-emission-threshold
 If the size of a local variable in bytes is smaller or equal to this
 number, directly poison (or unpoison) shadow memory instead of using
@@ -18225,6 +18230,12 @@ possible by specifying the command-line options
 @option{--param hwasan-instrument-allocas=1} respectively. Using a random frame
 tag is not implemented for kernel instrumentation.
 
+@opindex fsanitize=memtag-stack
+@item -fsanitize=memtag-stack
+Use Memory Tagging Extension instructions instead of instrumentation to allow
+the detection of memory errors.  This option is available only on those AArch64
+architectures that support Memory Tagging Extensions.
+
 @opindex fsanitize=pointer-compare
 @item -fsanitize=pointer-compare
 Instrument comparison operation (<, <=, >, >=) with pointer operands.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 9a3cc4a2e16..0c9c863a654 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -337,6 +337,10 @@ enum sanitize_code {
   SANITIZE_KERNEL_HWADDRESS = 1UL << 30,
   /* Shadow Call Stack.  */
   SANITIZE_SHADOW_CALL_STACK = 1UL << 31,
+  /* Memory Tagging for Stack.  */
+  SANITIZE_MEMTAG_STACK = 1ULL << 32,
+  /* Memory Tagging.  */
+  SANITIZE_MEMTAG = SANITIZE_MEMTAG_STACK,
   SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
   SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
   | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
diff --git a/gcc/opts.cc b/gcc/opts.cc
index d00e05f6321..b4f516fdce6 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1307,6 +1307,24 @@ finish_options (struct gcc_opt

[PATCH v2] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-10 Thread Paul-Antoine Arras
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL
instruction.

Before this patch, we have three instructions, e.g.:
  fcvt.s.h   fa5,fa5
  vfmv.v.f   v24,fa5
  vfmadd.vv  v8,v24,v16

After, we get only one:
  vfwmacc.vf v8,fa5,v16
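
A C loop of the kind that should now combine this way looks roughly like
this (a hedged sketch, not from the patch; assumes something like
-O3 -march=rv64gcv_zvfh):

  void
  widen_macc (float *restrict out, const _Float16 *restrict in,
              _Float16 x, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] += (float) in[i] * (float) x;  /* widen, then fused macc */
  }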

PR target/119100

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vfwmacc_vf_): New pattern to
handle both vfwmacc and vfwmsac.
(*extend_vf_): New pattern that serves as an intermediate combine
step.
* config/riscv/vector-iterators.md (vsubel): New mode attribute. This is
just the lower-case version of VSUBEL.
* config/riscv/vector.md (@pred_widen_mul__scalar): Reorder
and swap operands to match the RTL emitted by expand, i.e. first
float_extend then vec_duplicate.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwmacc and
vfwmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. Also check
for fcvt and vfmv.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Add vfwmacc and
vfwmsac.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. Also check
for fcvt and vfmv.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: Add support for
widening variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_widen_run.h: New test
helper.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f32.c: New test.
---
 gcc/config/riscv/autovec-opt.md   | 48 +++
 gcc/config/riscv/vector-iterators.md  | 41 
 gcc/config/riscv/vector.md|  8 ++--
 .../riscv/rvv/autovec/vx_vf/vf-1-f16.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-1-f32.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-2-f16.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-2-f32.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-3-f16.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-3-f32.c|  4 ++
 .../riscv/rvv/autovec/vx_vf/vf-4-f16.c|  3 ++
 .../riscv/rvv/autovec/vx_vf/vf-4-f32.c|  3 ++
 .../riscv/rvv/autovec/vx_vf/vf_mulop.h| 30 
 .../rvv/autovec/vx_vf/vf_mulop_widen_run.h| 32 +
 .../rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c  | 17 +++
 .../rvv/autovec/vx_vf/vf_vfwmacc-run-1-f32.c  | 17 +++
 .../rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c  | 17 +++
 .../rvv/autovec/vx_vf/vf_vfwmsac-run-1-f32.c  | 17 +++
 17 files changed, 253 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_widen_run.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f32.c

diff --git gcc/config/riscv/autovec-opt.md gcc/config/riscv/autovec-opt.md
index 8df7f6494cf..f372f0e6a69 100644
--- gcc/config/riscv/autovec-opt.md
+++ gcc/config/riscv/autovec-opt.md
@@ -1725,6 +1725,8 @@ (define_insn_and_split "*_vx_"
 ;; - vfmsac.vf
 ;; - vfnmacc.vf
 ;; - vfnmsac.vf
+;; - vfwmacc.vf
+;; - vfwmsac.vf
 ;; 
=
 
 ;; vfmadd.vf, vfmsub.vf, vfmacc.vf, vfmsac.vf
@@ -1796,3 +1798,49 @@ (define_insn_and_split "*vfnmadd_"
   }
   [(set_attr "type" "vfmuladd")]
 )
+
+;; vfwmacc.vf, vfwmsac.vf
+(define_insn_and_split "*vfwmacc_vf_"
+  [(set (match_operand:VWEXTF 0 "register_operand")
+(plus_minus:VWEXTF
+   (mult:VWEXTF
+ (float_extend:VWEXTF
+   (match_operand: 3 "register_operand"))
+ (vec_duplicate:VWEXTF
+   (float_extend:
+ (match_operand: 2 "register_operand"
+   (match_operand:VWEXTF 1 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3]};
+riscv_vector::emit_vlmax_insn (code_for_pred_widen_mul_scalar (, 
mode),
+  riscv_vector::WIDEN_TERNARY_OP_FRM_DYN, ops);
+DONE;
+  }
+  [(set_attr "type" "vfwmuladd")]
+)
+
+

[PATCH 4/5] Adjust reduction with conversion SLP build

2025-07-10 Thread Richard Biener
The following adjusts how we set SLP_TREE_VECTYPE for the conversion
node we build when fixing up the reduction with conversion SLP instance.
This should probably see more TLC, but the following avoids relying
on STMT_VINFO_VECTYPE for this.

 * tree-vect-slp.cc (vect_build_slp_instance): Do not use
 SLP_TREE_VECTYPE to determine the conversion back to the
 reduction IV.
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 68ef1ddda77..5ef45fd60f5 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4067,7 +4067,12 @@ vect_build_slp_instance (vec_info *vinfo,
  for (unsigned i = 0; i < group_size; ++i)
scalar_stmts.quick_push (next_info);
  slp_tree conv = vect_create_new_slp_node (scalar_stmts, 1);
- SLP_TREE_VECTYPE (conv) = STMT_VINFO_VECTYPE (next_info);
+ SLP_TREE_VECTYPE (conv)
+   = get_vectype_for_scalar_type (vinfo,
+  TREE_TYPE
+(gimple_assign_lhs
+  (scalar_def)),
+  group_size);
  SLP_TREE_CHILDREN (conv).quick_push (node);
  SLP_INSTANCE_TREE (new_instance) = conv;
  /* We also have to fake this conversion stmt as SLP reduction
-- 
2.43.0



[PATCH v3 7/9] asan: memtag-stack add support for MTE instructions

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Memory tagging is used for detecting memory safety bugs.  On AArch64, the
memory tagging extension (MTE) helps in reducing the overheads of memory
tagging:
 - CPU: MTE instructions for efficiently tagging and untagging memory.
 - Memory: New memory type, Normal Tagged Memory, added to the Arm
   Architecture.

The MEMory TAGging (MEMTAG) sanitizer uses the same infrastructure as
HWASAN.  MEMTAG and HWASAN are both hardware-assisted solutions, and
rely on the same sanitizer machinery in parts.  So, define new
constructs that allow MEMTAG and HWASAN to share the infrastructure:

  - hwassist_sanitize_p () is true when either SANITIZE_MEMTAG or
SANITIZE_HWASAN is true.
  - hwassist_sanitize_stack_p () is true when hwassist_sanitize_p () holds
and stack variables are to be sanitized.

MEMTAG and HWASAN do have differences, however, and hence, the need to
conditionalize using memtag_sanitize_p () in the relevant places. E.g.,

  - Instead of generating the libcall __hwasan_tag_memory, MEMTAG needs
to invoke the target-specific hook TARGET_MEMTAG_TAG_MEMORY to tag
memory.  Similar approach can be seen for handling
handle_builtin_alloca, where instead of doing the gimple
transformations, target hooks are used.

  - Add a new internal function HWASAN_ALLOCA_POISON to handle
dynamically allocated stack when the MEMTAG sanitizer is enabled.  At
expansion, this allows us, in turn, to invoke target hooks to increment
the tag, and use the generated tag to finally tag the dynamically
allocated memory.

The usual pattern:
irg  x0, x0, x0
subg x0, x0, #16, #0
creates a tag in x0 and so on.  For alloca, we need to apply the
generated tag to the new sp.  In the absence of an extract-tag insn, the
implementation in expand_HWASAN_ALLOCA_POISON resorts to invoking irg
again.
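
For context, a hedged sketch (names mine) of the kind of source whose
dynamic stack allocation this instruments:

  void
  use_alloca (int n)
  {
    char *p = __builtin_alloca (n);  /* the new sp-derived pointer is the
                                        one that needs the generated tag */
    p[0] = 1;
  }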

gcc/ChangeLog:

* asan.cc (handle_builtin_stack_restore): Accommodate MEMTAG
sanitizer.
(handle_builtin_alloca): Expand differently if MEMTAG sanitizer.
(get_mem_refs_of_builtin_call): Include MEMTAG along with
HWASAN.
(memtag_sanitize_stack_p): New definition.
(memtag_sanitize_allocas_p): Likewise.
(memtag_memintrin): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(report_error_func): Include MEMTAG along with HWASAN.
(build_check_stmt): Likewise.
(instrument_derefs): MEMTAG too does not deal with globals yet.
(instrument_builtin_call):
(maybe_instrument_call): Include MEMTAG along with HWASAN.
(asan_expand_mark_ifn): Likewise.
(asan_expand_check_ifn): Likewise.
(asan_expand_poison_ifn): Expand differently if MEMTAG sanitizer.
(asan_instrument):
(hwasan_frame_base):
(hwasan_record_stack_var):
(hwasan_emit_prologue): Expand differently if MEMTAG sanitizer.
(hwasan_emit_untag_frame): Likewise.
* asan.h (hwasan_record_stack_var):
(memtag_sanitize_stack_p): New declaration.
(memtag_sanitize_allocas_p): Likewise.
(hwassist_sanitize_p): Likewise.
(hwassist_sanitize_stack_p): Likewise.
(asan_sanitize_use_after_scope): Include MEMTAG along with
HWASAN.
* cfgexpand.cc (align_local_variable): Likewise.
(expand_one_stack_var_at): Likewise.
(expand_stack_vars): Likewise.
(expand_one_stack_var_1): Likewise.
(init_vars_expansion): Likewise.
(expand_used_vars): Likewise.
(pass_expand::execute): Likewise.
* gimplify.cc (asan_poison_variable): Likewise.
* internal-fn.cc (expand_HWASAN_ALLOCA_POISON): New definition.
(expand_HWASAN_ALLOCA_UNPOISON): Expand differently if MEMTAG
sanitizer.
(expand_HWASAN_MARK): Likewise.
* internal-fn.def (HWASAN_ALLOCA_POISON): Define new.
* params.opt: Document new param. FIXME.
* sanopt.cc (pass_sanopt::execute): Include MEMTAG along with
HWASAN.
* gcc.c (sanitize_spec_function): Add check for memtag-stack.

Co-authored-by: Indu Bhagat 
Signed-off-by: Claudiu Zissulescu 
---
 gcc/asan.cc | 214 +---
 gcc/asan.h  |  10 ++-
 gcc/cfgexpand.cc|  29 +++---
 gcc/gcc.cc  |   2 +
 gcc/gimplify.cc |   5 +-
 gcc/internal-fn.cc  |  68 --
 gcc/internal-fn.def |   1 +
 gcc/params.opt  |   4 +
 gcc/sanopt.cc   |   2 +-
 9 files changed, 258 insertions(+), 77 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 748b289d6f9..711e6a71eee 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -762,14 +762,15 @@ static void
 handle_builtin_stack_restore (gcall *call, gimple_stmt_iterator *iter)
 {
   if (!iter
-  || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()))
+  || !(asan_sanitize_allocas_p () || hwasan_sanitize_allocas_p ()
+  || memtag_sanitize_alloc

[PATCH 5/5] Handle failed gcond pattern gracefully

2025-07-10 Thread Richard Biener
SLP analysis of early break conditions asserts pattern recognition
canonicalized all of them.  But the pattern can fail, for example
when vector types cannot be computed.  So be graceful here, so
we don't ICE when we didn't yet compute vector types.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_analyze_slp): Fail for non-canonical
gconds.
---
 gcc/tree-vect-slp.cc | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 5ef45fd60f5..ad75386926a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5068,9 +5068,15 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size,
  tree args0 = gimple_cond_lhs (stmt);
  tree args1 = gimple_cond_rhs (stmt);
 
- /* These should be enforced by cond lowering.  */
- gcc_assert (gimple_cond_code (stmt) == NE_EXPR);
- gcc_assert (zerop (args1));
+ /* These should be enforced by cond lowering, but if it failed
+bail.  */
+ if (gimple_cond_code (stmt) != NE_EXPR
+ || TREE_TYPE (args0) != boolean_type_node
+ || !integer_zerop (args1))
+   {
+ roots.release ();
+ continue;
+   }
 
  /* An argument without a loop def will be codegened from vectorizing 
the
 root gcond itself.  As such we don't need to try to build an SLP 
tree
-- 
2.43.0


[PATCH] x86: Update "*mov_internal" in mmx.md to handle all 1s vectors

2025-07-10 Thread H.J. Lu
commit 77473a27bae04da99d6979d43e7bd0a8106f4557
Author: H.J. Lu 
Date:   Thu Jun 26 06:08:51 2025 +0800

x86: Also handle all 1s float vector constant

replaces

(insn 29 28 30 5 (set (reg:V2SF 107)
(mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 2031
 {*movv2sf_internal}
 (expr_list:REG_EQUAL (const_vector:V2SF [
(const_double:SF -QNaN [-QNaN]) repeated x2
])
(nil)))

with

(insn 98 13 14 3 (set (reg:V8QI 112)
(const_vector:V8QI [
(const_int -1 [0xffffffffffffffff]) repeated x8
])) -1
 (nil))
...
(insn 29 28 30 5 (set (reg:V2SF 107)
(subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
 (expr_list:REG_EQUAL (const_vector:V2SF [
(const_double:SF -QNaN [-QNaN]) repeated x2
])
(nil)))

which leads to

pr121015.c: In function ‘render_result_from_bake_h’:
pr121015.c:34:1: error: unrecognizable insn:
   34 | }
  | ^
(insn 98 13 14 3 (set (reg:V8QI 112)
(const_vector:V8QI [
(const_int -1 [0xffffffffffffffff]) repeated x8
])) -1
 (expr_list:REG_EQUIV (const_vector:V8QI [
(const_int -1 [0xffffffffffffffff]) repeated x8
])
(nil)))
during RTL pass: ira

1. Update constm1_operand to also return true for integer and float all
1s vectors.
2. Add nonimm_or_0_or_m1_operand for nonimmediate, zero or -1 operand.
3. Add BI for constant all 0s/1s operand.
4. Update "*mov_internal" in mmx.md to handle integer all 1s vectors.
5. Update MMXMODE move splitter to also split all 1s source operand.

gcc/

PR target/121015
* config/i386/constraints.md (BI): New constraint.
* config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
* config/i386/mmx.md (*mov_internal): Replace C with BI
memory and integer register destination.
Update MMXMODE move splitter to also split all 1s source operand.
* config/i386/predicates.md (constm1_operand): Also return true
for int_float_vector_all_ones_operand.
(nonimm_or_0_or_m1_operand): New predicate.

gcc/testsuite/

PR target/121015
* gcc.target/i386/pr106022-2.c: Adjusted.
* gcc.target/i386/pr121015.c: New test.

OK for master?

-- 
H.J.
From a9846fdd5e8e43c60fa8ea8e78a2cb72da7a12b9 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 10 Jul 2025 06:21:58 +0800
Subject: [PATCH] x86: Update "*mov_internal" in mmx.md to handle all 1s
 vectors

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/constraints.md |  5 
 gcc/config/i386/i386.cc| 11 +--
 gcc/config/i386/mmx.md | 13 +
 gcc/config/i386/predicates.md  | 26 ++

[PATCH] aarch64: Fix LD1Q and ST1Q failures for big-endian

2025-07-10 Thread Richard Sandiford
LD1Q gathers and ST1Q scatters are unusual in that they operate
on 128-bit blocks (effectively VNx1TI).  However, we don't have
modes or ACLE types for 128-bit integers, and 128-bit integers
are not the intended use case.  Instead, the instructions are
intended to be used in "hybrid VLA" operations, where each 128-bit
block is an Advanced SIMD vector.

The normal SVE modes therefore capture the intended use case better
than VNx1TI would.  For example, VNx2DI is effectively N copies
of V2DI, VNx4SI N copies of V4SI, etc.
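
To make the "N copies" view concrete (an illustration, not from the
patch):

  /* VNx2DI: | d0 d1 | d2 d3 | ...             -- each 128-bit block a V2DI
     VNx4SI: | s0 s1 s2 s3 | s4 s5 s6 s7 | ... -- each block a V4SI  */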

Since there is only one LD1Q instruction and one ST1Q instruction,
the ACLE support used a single pattern for each, with the loaded or
stored data having mode VNx2DI.  The ST1Q pattern was generated by:

rtx data = e.args.last ();
e.args.last () = force_lowpart_subreg (VNx2DImode, data, GET_MODE (data));
e.prepare_gather_address_operands (1, false);
return e.use_exact_insn (CODE_FOR_aarch64_scatter_st1q);

where the force_lowpart_subreg bitcast the stored data to VNx2DI.
But such subregs require an element reverse on big-endian targets
(see the comment at the head of aarch64-sve.md), which wasn't the
intention.  The code should have used aarch64_sve_reinterpret instead.

The LD1Q pattern was used as follows:

e.prepare_gather_address_operands (1, false);
return e.use_exact_insn (CODE_FOR_aarch64_gather_ld1q);

which always returns a VNx2DI value, leaving the caller to bitcast
that to the correct mode.  That bitcast again uses subregs and has
the same problem as above.

However, for the reasons explained in the comment, using
aarch64_sve_reinterpret does not work well for LD1Q.  The patch
instead parameterises the LD1Q based on the required data mode.

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


gcc/
* config/aarch64/aarch64-sve2.md (aarch64_gather_ld1q): Replace with...
(@aarch64_gather_ld1q): ...this, parameterizing based on mode.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svld1q_gather_impl::expand): Update accordingly.
(svst1q_scatter_impl::expand): Use aarch64_sve_reinterpret
instead of force_lowpart_subreg.
---
 .../aarch64/aarch64-sve-builtins-sve2.cc  |  5 +++--
 gcc/config/aarch64/aarch64-sve2.md| 21 +--
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
index d9922de7ca5..abe21a8b61c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc
@@ -316,7 +316,8 @@ public:
   expand (function_expander &e) const override
   {
 e.prepare_gather_address_operands (1, false);
-return e.use_exact_insn (CODE_FOR_aarch64_gather_ld1q);
+auto icode = code_for_aarch64_gather_ld1q (e.tuple_mode (0));
+return e.use_exact_insn (icode);
   }
 };
 
@@ -722,7 +723,7 @@ public:
   expand (function_expander &e) const override
   {
 rtx data = e.args.last ();
-e.args.last () = force_lowpart_subreg (VNx2DImode, data, GET_MODE (data));
+e.args.last () = aarch64_sve_reinterpret (VNx2DImode, data);
 e.prepare_gather_address_operands (1, false);
 return e.use_exact_insn (CODE_FOR_aarch64_scatter_st1q);
   }
diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 62524f36de6..f39a0a964f2 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -334,12 +334,21 @@ (define_insn "@aarch64__strided4"
 ;; - LD1Q (SVE2p1)
 ;; -
 
-;; Model this as operating on the largest valid element size, which is DI.
-;; This avoids having to define move patterns & more for VNx1TI, which would
-;; be difficult without a non-gather form of LD1Q.
-(define_insn "aarch64_gather_ld1q"
-  [(set (match_operand:VNx2DI 0 "register_operand")
-   (unspec:VNx2DI
+;; For little-endian targets, it would be enough to use a single pattern,
+;; with a subreg to bitcast the result to whatever mode is needed.
+;; However, on big-endian targets, the bitcast would need to be an
+;; aarch64_sve_reinterpret instruction.  That would interact badly
+;; with the "&" and "?" constraints in this pattern: if the result
+;; of the reinterpret needs to be in the same register as the index,
+;; the RA would tend to prefer to allocate a separate register for the
+;; intermediate (uncast) result, even if the reinterpret prefers tying.
+;;
+;; The index is logically VNx1DI rather than VNx2DI, but introducing
+;; and using VNx1DI would just create more bitcasting.  The ACLE intrinsic
+;; uses svuint64_t, which corresponds to VNx2DI.
+(define_insn "@aarch64_gather_ld1q"
+  [(set (match_operand:SVE_FULL 0 "register_operand")
+   (unspec:SVE_FULL
  [(match_operand:VNx2BI 1 "register_operand")
   (match_operand:DI 2 "aarch64_reg_or_zero")
   (match_operand:VNx

[PATCH] tree-optimization/120939 - remove uninitialized use of LOOP_VINFO_COST_MODEL_THRESHOLD

2025-07-10 Thread Richard Biener
The following removes an optimization that wrongly triggers right now
because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
computed yet.

Testing on x86_64 didn't reveal any testsuite coverage.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

PR tree-optimization/120939
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Remove eliding an epilogue based on not computed
LOOP_VINFO_COST_MODEL_THRESHOLD.
---
 gcc/tree-vect-loop.cc | 21 ++---
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 46a6399243d..7ac61d4dce2 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1224,13 +1224,6 @@ static bool
 vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
 {
   unsigned HOST_WIDE_INT const_vf;
-  HOST_WIDE_INT max_niter
-= likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
-
-  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-  if (!th && LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo))
-th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
- (loop_vinfo));
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
   && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
@@ -1250,18 +1243,8 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info 
loop_vinfo)
 VF * N + 1.  That's something of a niche case though.  */
   || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
-  || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
-  < (unsigned) exact_log2 (const_vf))
- /* In case of versioning, check if the maximum number of
-iterations is greater than th.  If they are identical,
-the epilogue is unnecessary.  */
- && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
- || ((unsigned HOST_WIDE_INT) max_niter
- /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
-but that's only computed later based on our result.
-The following is the most conservative approximation.  */
- > (std::max ((unsigned HOST_WIDE_INT) th,
-  const_vf) / const_vf) * const_vf
+  || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
+ < (unsigned) exact_log2 (const_vf)))
 return true;
 
   return false;
-- 
2.43.0


[PATCH 1/2] Passing TYPE_SIZE_UNIT of the element as the 6th argument to .ACCESS_WITH_SIZE (PR121000)

2025-07-10 Thread Qing Zhao
The size of the element of the FAM _cannot_ reliably be derived from the
original TYPE of the FAM that we passed as the 6th parameter to the
.ACCESS_WITH_SIZE:

 TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (gimple_call_arg (call, 5))))

when the element of the FAM has a variable length type.  Since the
variable that represents the TYPE_SIZE_UNIT has no explicit use in the
original IL, compiler transformations (such as DSE) that are applied
before the object_size phase might eliminate the definition of the
variable that represents the TYPE_SIZE_UNIT of the element of the FAM.

In order to resolve this issue, instead of passing the original TYPE of the
FAM as the 6th argument to .ACCESS_WITH_SIZE, we should explicitly pass the
original TYPE_SIZE_UNIT of the element TYPE of the FAM as the 6th argument
to the call to .ACCESS_WITH_SIZE.
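
A minimal sketch of the problematic shape (field names hypothetical, and
assuming a variably-sized FAM element type as in the PR):

  void
  use (int n, void *buf)
  {
    struct s {
      int count;
      int fam[][n] __attribute__ ((counted_by (count)));
    } *p = buf;
    /* Sizing p->fam needs count * (n * sizeof (int)); the
       "n * sizeof (int)" part is the element's TYPE_SIZE_UNIT, held in
       a compiler temporary that DSE could previously delete before the
       object_size pass consumed it.  */
  }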

The patches have been bootstrapped and regression tested on both aarch64
and x86.

Okay for trunk?

thanks.

Qing

PR middle-end/121000

gcc/c/ChangeLog:

* c-typeck.cc (build_counted_by_ref): Update comments.
(build_access_with_size_for_counted_by): Pass TYPE_SIZE_UNIT of the
element as the 6th argument.

gcc/ChangeLog:

* internal-fn.cc (expand_DEFERRED_INIT): Update comments.
* internal-fn.def (DEFERRED_INIT): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Get the element_size from the 6th argument directly.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-pr121000.c: New test.
---
 gcc/c/c-typeck.cc | 10 +++--
 gcc/internal-fn.cc| 10 ++---
 gcc/internal-fn.def   |  2 +-
 .../gcc.dg/flex-array-counted-by-pr121000.c   | 42 +++
 gcc/tree-object-size.cc   | 28 ++---
 5 files changed, 68 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-pr121000.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e24629be918..de3d6c78db8 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2983,7 +2983,7 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
to:
 
(*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
-   (TYPE_OF_ARRAY *)0))
+   TYPE_SIZE_UNIT for element)
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2995,8 +2995,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
-   The 6th argument of the call is a constant 0 with the pointer TYPE
-   to the original flexible array type.
+   The 6th argument of the call is the TYPE_SIZE_UNIT of the element TYPE
+   of the FAM.
 
   */
 static tree
@@ -3007,6 +3007,8 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
   /* The result type of the call is a pointer to the flexible array type.  */
   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
+  tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ref)));
+
   tree first_param
 = c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
   tree second_param
@@ -3020,7 +3022,7 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
build_int_cst (integer_type_node, -1),
-   build_int_cst (result_type, 0));
+   element_size);
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index ed6ef0e4c64..c6e705cb6f5 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3443,7 +3443,7 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
 
 /* Expand the IFN_ACCESS_WITH_SIZE function:
ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
-TYPE_OF_SIZE, ACCESS_MODE)
+TYPE_OF_SIZE, ACCESS_MODE, TYPE_SIZE_UNIT for element)
which returns the REF_TO_OBJ same as the 1st argument;
 
1st argument REF_TO_OBJ: The reference to the object;
@@ -3451,16 +3451,16 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE 
represents
  0: the number of bytes.
  1: the number of the elements of the object type;
-   4th argument TYPE_OF_SIZE: A constant 0 with its TYPE being the same as the 
TYPE
-of the object referenced by REF_TO_SIZE
+   4th argument TYPE_OF_SIZE: A constant 0 wi

[PATCH 2/2] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Qing Zhao
This is an improvement to the design of internal function .ACCESS_WITH_SIZE.

Currently, the .ACCESS_WITH_SIZE is designed as:

   ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
 TYPE_OF_SIZE, ACCESS_MODE, TYPE_SIZE_UNIT for element)
   which returns the REF_TO_OBJ same as the 1st argument;

   1st argument REF_TO_OBJ: The reference to the object;
   2nd argument REF_TO_SIZE: The reference to the size of the object,
   3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE represents
 0: the number of bytes.
 1: the number of the elements of the object type;
   4th argument TYPE_OF_SIZE: A constant 0 with its TYPE being the same as the
 TYPE of the object referenced by REF_TO_SIZE
   5th argument ACCESS_MODE:
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write
   6th argument: The TYPE_SIZE_UNIT of the element TYPE of the FAM when 3rd
  argument is 1. NULL when 3rd argument is 0.

Among the 6 arguments:
 A. The 3rd argument CLASS_OF_SIZE is not needed. If the REF_TO_SIZE represents
the number of bytes, simply pass 1 to the TYPE_SIZE_UNIT argument.
 B. The 4th and the 5th arguments can be combined into 1 argument, whose TYPE
represents the TYPE_OF_SIZE, and the constant value represents the
ACCESS_MODE.

As a result, the new design of the .ACCESS_WITH_SIZE is:

   ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE,
 TYPE_OF_SIZE + ACCESS_MODE, TYPE_SIZE_UNIT for element)
   which returns the REF_TO_OBJ same as the 1st argument;

   1st argument REF_TO_OBJ: The reference to the object;
   2nd argument REF_TO_SIZE: The reference to the size of the object,
   3rd argument TYPE_OF_SIZE + ACCESS_MODE: An integer constant with a pointer
 TYPE.
 The pointee TYPE of the pointer TYPE is the TYPE of the object referenced
by REF_TO_SIZE.
 The integer constant value represents the ACCESS_MODE:
0: none
1: read_only
2: write_only
3: read_write
   4th argument: The TYPE_SIZE_UNIT of the element TYPE of the array.
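
A hedged before/after sketch (argument values illustrative, not taken
from the patch), for "struct s { int count; struct elt fam[]
__attribute__ ((counted_by (count))); }" with sizeof (struct elt) == 16
and read_write access:

   old, 6 arguments:
 .ACCESS_WITH_SIZE (&p->fam, &p->count, 1, (int) 0, 3, 16)
   new, 4 arguments:
 .ACCESS_WITH_SIZE (&p->fam, &p->count, (int *) 3, 16)

Here "(int *) 3" folds two old arguments into one: the pointee type int
records the TYPE of the object referenced by &p->count, and the
constant value 3 encodes read_write.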

The patches have been bootstrapped and regression tested on both aarch64
and x86.

Okay for trunk?

thanks.

Qing



gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
of the arguments per the new design.

gcc/c/ChangeLog:

* c-typeck.cc (build_counted_by_ref): Update comments.
(build_access_with_size_for_counted_by): Adjust the arguments per
the new design.

gcc/ChangeLog:

* internal-fn.cc (expand_DEFERRED_INIT): Update comments.
* internal-fn.def (DEFERRED_INIT): Update comments.
* tree-object-size.cc (addr_object_size): Update comments.
(access_with_size_object_size): Adjust the arguments per the new
design.
---
 gcc/c-family/c-ubsan.cc | 10 ++
 gcc/c/c-typeck.cc   | 18 +-
 gcc/internal-fn.cc  | 28 +---
 gcc/internal-fn.def |  2 +-
 gcc/tree-object-size.cc | 34 +-
 5 files changed, 38 insertions(+), 54 deletions(-)

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 78b78685469..a4dc31066af 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -397,8 +397,7 @@ get_bound_from_access_with_size (tree call)
 return NULL_TREE;
 
   tree ref_to_size = CALL_EXPR_ARG (call, 1);
-  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
-  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree type = TREE_TYPE (TREE_TYPE (CALL_EXPR_ARG (call, 2)));
   tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
   build_int_cst (ptr_type_node, 0));
   /* If size is negative value, treat it as zero.  */
@@ -410,12 +409,7 @@ get_bound_from_access_with_size (tree call)
build_zero_cst (type), size);
   }
 
-  /* Only when class_of_size is 1, i.e, the number of the elements of
- the object type, return the size.  */
-  if (class_of_size != 1)
-return NULL_TREE;
-  else
-size = fold_convert (sizetype, size);
+  size = fold_convert (sizetype, size);
 
   return size;
 }
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index de3d6c78db8..9a5eb0da3a1 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2982,7 +2982,7 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, (* TYPE_OF_SIZE)0,
TYPE_SIZE_UNIT for element)
 
NOTE: The return type of this function is the POINTER type pointing
@@ -2992,11 +2992,11 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The type of the first argument of this function is a POINTER type
to the original flexible array type.
 
-   The 4th 

[PATCH v3 1/9] targhooks: i386: rename TAG_SIZE to TAG_BITSIZE

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

gcc/Changelog:

* asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize.
* config/i386/i386.cc (ix86_memtag_tag_size): Rename to
ix86_memtag_tag_bitsize.
(TARGET_MEMTAG_TAG_SIZE): Renamed to TARGET_MEMTAG_TAG_BITSIZE.
* doc/tm.texi (TARGET_MEMTAG_TAG_SIZE): Likewise.
* doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE): Likewise.
* target.def (tag_size): Rename to tag_bitsize.
* targhooks.cc (default_memtag_tag_size): Rename to
default_memtag_tag_bitsize.
	* targhooks.h (default_memtag_tag_size): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/asan.h  | 2 +-
 gcc/config/i386/i386.cc | 8 
 gcc/doc/tm.texi | 2 +-
 gcc/doc/tm.texi.in  | 2 +-
 gcc/target.def  | 4 ++--
 gcc/targhooks.cc| 2 +-
 gcc/targhooks.h | 2 +-
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/asan.h b/gcc/asan.h
index 273d6745c58..064d4f24823 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -103,7 +103,7 @@ extern hash_set  *asan_used_labels;
independently here.  */
 /* How many bits are used to store a tag in a pointer.
The default version uses the entire top byte of a pointer (i.e. 8 bits).  */
-#define HWASAN_TAG_SIZE targetm.memtag.tag_size ()
+#define HWASAN_TAG_SIZE targetm.memtag.tag_bitsize ()
 /* Tag Granule of HWASAN shadow stack.
This is the size in real memory that each byte in the shadow memory refers
to.  I.e. if a variable is X bytes long in memory then its tag in shadow
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b64175d6c93..17faf7ebd24 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -27095,9 +27095,9 @@ ix86_memtag_can_tag_addresses ()
   return ix86_lam_type != lam_none && TARGET_LP64;
 }
 
-/* Implement TARGET_MEMTAG_TAG_SIZE.  */
+/* Implement TARGET_MEMTAG_TAG_BITSIZE.  */
 unsigned char
-ix86_memtag_tag_size ()
+ix86_memtag_tag_bitsize ()
 {
   return IX86_HWASAN_TAG_SIZE;
 }
@@ -28071,8 +28071,8 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MEMTAG_UNTAGGED_POINTER
 #define TARGET_MEMTAG_UNTAGGED_POINTER ix86_memtag_untagged_pointer
 
-#undef TARGET_MEMTAG_TAG_SIZE
-#define TARGET_MEMTAG_TAG_SIZE ix86_memtag_tag_size
+#undef TARGET_MEMTAG_TAG_BITSIZE
+#define TARGET_MEMTAG_TAG_BITSIZE ix86_memtag_tag_bitsize
 
 #undef TARGET_GEN_CCMP_FIRST
 #define TARGET_GEN_CCMP_FIRST ix86_gen_ccmp_first
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..3f87abf97b2 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12860,7 +12860,7 @@ At preset, this feature does not support address 
spaces.  It also requires
 @code{Pmode} to be the same as @code{ptr_mode}.
 @end deftypefn
 
-@deftypefn {Target Hook} uint8_t TARGET_MEMTAG_TAG_SIZE ()
+@deftypefn {Target Hook} uint8_t TARGET_MEMTAG_TAG_BITSIZE ()
 Return the size of a tag (in bits) for this platform.
 
 The default returns 8.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..040d26c40f1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8124,7 +8124,7 @@ maintainer is familiar with.
 
 @hook TARGET_MEMTAG_CAN_TAG_ADDRESSES
 
-@hook TARGET_MEMTAG_TAG_SIZE
+@hook TARGET_MEMTAG_TAG_BITSIZE
 
 @hook TARGET_MEMTAG_GRANULE_SIZE
 
diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..db48df9498d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7457,11 +7457,11 @@ At preset, this feature does not support address 
spaces.  It also requires\n\
  bool, (), default_memtag_can_tag_addresses)
 
 DEFHOOK
-(tag_size,
+(tag_bitsize,
  "Return the size of a tag (in bits) for this platform.\n\
 \n\
 The default returns 8.",
-  uint8_t, (), default_memtag_tag_size)
+  uint8_t, (), default_memtag_tag_bitsize)
 
 DEFHOOK
 (granule_size,
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index c79458e374e..0696f95adeb 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2806,7 +2806,7 @@ default_memtag_can_tag_addresses ()
 }
 
 uint8_t
-default_memtag_tag_size ()
+default_memtag_tag_bitsize ()
 {
   return 8;
 }
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f16b58798c2..c9e57e475dc 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -310,7 +310,7 @@ extern bool speculation_safe_value_not_needed (bool);
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 
 extern bool default_memtag_can_tag_addresses ();
-extern uint8_t default_memtag_tag_size ();
+extern uint8_t default_memtag_tag_bitsize ();
 extern uint8_t default_memtag_granule_size ();
 extern rtx default_memtag_insert_random_tag (rtx, rtx);
 extern rtx default_memtag_add_tag (rtx, poly_int64, uint8_t);
-- 
2.50.0



[PATCH v3 0/9] Add memtag-stack sanitizer using MTE instructions.

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Hi,

Please find a new series of patches that implements a stack sanitizer
using AArch64 MTE instructions. This new series is based on Indu's
previous patch series.

What is new:
 - Introduces a new target instruction tag_memory.
 - Introduces a new target hook to deal with tag computation
   (TARGET_MEMTAG_COMPOSE_OFFSET_TAG).
 - Simplify the stg/st2g instruction patterns to accept POST/PRE
   modify type of addresses.
 - Minimize asan.cc modification.
 - Add execution tests.
 - Improve and fix emitting stg/st2g instructions.
 - Various text improvements.

Thank you,
Claudiu

==
MTE on AArch64 and Memory Tagging

Memory Tagging Extension (MTE) is an AArch64 extension.  This
extension allows coloring of 16-byte memory granules with 4-bit tag
values.  The extension provides additional instructions in ISA and a
new memory type, Normal Tagged Memory, added to the Arm Architecture.
This hardware-assisted mechanism can be used to detect memory bugs
like buffer overrun or use-after-free.  The detection is
probabilistic.

Under the hood, the MTE extension introduces two types of tags:
  - Address Tags, and,
  - Allocation Tags (a.k.a., Memory Tags)

Address Tag: which acts as the key.  This adds four bits to the top of
a virtual address.  It is built on the AArch64 'top-byte-ignore' (TBI)
feature.

Allocation Tag: which acts as the lock.  Allocation tags also consist
of four bits, linked with every aligned 16-byte region in the physical
memory space.  Arm refers to these 16-byte regions as tag granules.
The way Allocation tags are stored is a hardware implementation
detail.

A subset of the MTE instructions which are relevant in the current
context are:

[Xn, Xd are registers containing addresses].

- irg Xd, Xn
  Copy Xn into Xd, insert a random 4-bit Address Tag into Xd.
- addg Xd, Xn, #immA, #immB
  Xd = Xn + immA, with Address Tag modified by #immB. Similarly, there
  exists a subg.
- stg Xd, [Xn]
  (Store Allocation Tag) updates Allocation Tag for [Xn, Xn + 16) to the
  Address Tag of Xd.

Additionally, note that load and store instructions with SP base
register do not check tags.
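
For orientation, a C sketch of where the Address Tag sits (bit positions
inferred from the MEMTAG_TAG_MASK constant used later in the series,
i.e. bits 59:56; illustration only, not code from the series):

  #include <stdint.h>

  static inline void *
  set_address_tag (void *p, unsigned tag)
  {
    uintptr_t u = (uintptr_t) p;
    u &= ~((uintptr_t) 0xf << 56);       /* clear tag bits 59:56 */
    u |= (uintptr_t) (tag & 0xf) << 56;  /* insert the 4-bit tag */
    return (void *) u;                   /* TBI ignores the top byte */
  }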

MEMTAG sanitizer for stack
Use MTE instructions to instrument stack accesses to detect memory safety
issues.

Detecting stack-related memory bugs requires the compiler to:
  - Ensure that each object on the stack is allocated in its own
    16-byte granule.
  - Tag/Color: put tags into each stack variable pointer.
  - Untag: the function epilogue will untag the (stack) memory.
The above should work with dynamic stack allocation as well.
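
A hedged C illustration of the bug class this catches (stack layout
assumed purely for illustration):

  int
  oob (volatile int i)
  {
    int a[4];   /* own 16-byte granule, colored with tag T1 */
    int b[4];   /* own 16-byte granule, colored with tag T2 != T1 */
    a[0] = 1;
    b[0] = 2;
    /* For i == 4, &a[i] still carries tag T1 but points into the next
       granule (tag T2), so the access faults with high probability
       instead of silently reading another object.  */
    return a[i];
  }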

GCC has HWASAN machinery for coloring stack variables.  Extend the
machinery to emit MTE instructions when MEMTAG sanitizer is in effect.

Deploying and running user space programs built with -fsanitize=memtag will
need the following additional pieces in place.  If there is any existing work /
ideas on any of the following, please send comments to help define the work.

Additional necessary pieces

* MTE aware exception handling and unwinding routines
The additional stack coloring must work with C++ exceptions and C 
setjmp/longjmp.

* When unwinding the stack for handling C++ exceptions, the unwinder
additionally also needs to untag the stack frame.  As per the
AADWARF64 document: "The character 'G' indicates that associated
frames may modify MTE tags on the stack space they use."

* When restoring the context in longjmp, we need to additionally untag the 
stack.

Claudiu Zissulescu (4):
  target-insns.def: (tag_memory) New pattern.
  targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG
  asan: memtag-stack add support for MTE instructions
  aarch64: Add support for memtag-stack sanitizer using MTE insns

Indu Bhagat (5):
  targhooks: i386: rename TAG_SIZE to TAG_BITSIZE
  opts: use uint64_t for sanitizer flags
  aarch64: add new constants for MTE insns
  asan: add new memtag sanitizer
  aarch64: Add memtag-stack tests

 gcc/asan.cc   | 214 +++---
 gcc/asan.h|  17 +-
 gcc/builtins.def  |   1 +
 gcc/c-family/c-attribs.cc |  16 +-
 gcc/c-family/c-common.h   |   2 +-
 gcc/c/c-parser.cc |   4 +-
 gcc/cfgexpand.cc  |  29 +-
 gcc/common.opt|   6 +-
 gcc/config/aarch64/aarch64-builtins.cc|   7 +-
 gcc/config/aarch64/aarch64-linux.h|   4 +-
 gcc/config/aarch64/aarch64-protos.h   |   4 +
 gcc/config/aarch64/aarch64.cc | 370 +-
 gcc/config/aarch64/aarch64.md |  78 ++--
 gcc/config/aarch64/constraints.md |  26 ++
 gcc/config/aarch64/predicates.md  |  13 +-
 gcc/config/i386/i386.cc   |   8 +-
 gcc/cp/typeck.cc  |   2 +-
 gcc/d/d-attribs.cc|   8 +-
 gcc/doc/invoke.texi   

[PATCH v3 3/9] target-insns.def: (tag_memory) New pattern.

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Add a new target instruction. Hardware-assisted sanitizers on
architectures providing instructions to tag/untag memory can then
make use of this new instruction pattern. For example, the
memtag-stack sanitizer uses these instructions to tag and untag a
memory granule.

gcc/doc/

* md.texi (tag_memory): Add documentation.

gcc/

* target-insns.def (tag_memory): New target instruction.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/doc/md.texi  | 6 ++
 gcc/target-insns.def | 1 +
 2 files changed, 7 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 28159b2e820..e4c9a472e3f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8578,6 +8578,12 @@ the values were equal.
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
 
+@cindex @code{tag_memory} instruction pattern
+@item @samp{tag_memory}
+This pattern tags an object that begins at the address specified by
+operand 0, has the size indicated by operand 2, and uses the tag
+from operand 1.
+
 @cindex @code{clear_cache} instruction pattern
 @item @samp{clear_cache}
 This pattern, if defined, flushes the instruction cache for a region of
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 59025a20bf7..16e1d8cf565 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -102,6 +102,7 @@ DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx 
x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
+DEF_TARGET_INSN (tag_memory, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (trap, (void))
 DEF_TARGET_INSN (unique, (void))
 DEF_TARGET_INSN (untyped_call, (rtx x0, rtx x1, rtx x2))
-- 
2.50.0



Re: [PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-10 Thread Richard Biener
On Thu, Jul 10, 2025 at 12:38 PM Robin Dapp  wrote:
>
> Hi,
>
> this patch adds asserts that ensure we only expand an RDIV_EXPR with
> actual float mode.  It also replaces the RDIV_EXPR in setting a
> vectorized loop's length by EXACT_DIV_EXPR.  The code in question is
> only used with length-control targets (riscv, powerpc, s390).
>
> Bootstrapped and regtested on x86, aarch64, and power10.  Regtested on
> rv64gcv_zvl512b.

OK.

I'm not sure what we use for division for fixed-point modes (all mode
kinds in the end use the sdiv optab).

> Regards
>  Robin
>
> PR target/121014
>
> gcc/ChangeLog:
>
> * cfgexpand.cc (expand_debug_expr): Assert FLOAT_MODE_P.
> * optabs-tree.cc (optab_for_tree_code): Assert FLOAT_TYPE_P.
> * tree-vect-loop.cc (vect_get_loop_len): Use EXACT_DIV_EXPR.
> ---
>  gcc/cfgexpand.cc  | 2 ++
>  gcc/optabs-tree.cc| 2 ++
>  gcc/tree-vect-loop.cc | 2 +-
>  3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 33649d43f71..a656ccebf17 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -5358,6 +5358,8 @@ expand_debug_expr (tree exp)
>return simplify_gen_binary (MULT, mode, op0, op1);
>
>  case RDIV_EXPR:
> +  gcc_assert (FLOAT_MODE_P (mode));
> +  /* Fall through.  */
>  case TRUNC_DIV_EXPR:
>  case EXACT_DIV_EXPR:
>if (unsignedp)
> diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
> index 6dfe8ee4c4e..9308a6dfd65 100644
> --- a/gcc/optabs-tree.cc
> +++ b/gcc/optabs-tree.cc
> @@ -82,6 +82,8 @@ optab_for_tree_code (enum tree_code code, const_tree type,
> return unknown_optab;
>/* FALLTHRU */
>  case RDIV_EXPR:
> +  gcc_assert (FLOAT_TYPE_P (type));
> +  /* FALLTHRU */
>  case TRUNC_DIV_EXPR:
>  case EXACT_DIV_EXPR:
>if (TYPE_SATURATING (type))
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index d5044d5fe22..432a248715e 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -11429,7 +11429,7 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> gimple_stmt_iterator *gsi,
>   factor = exact_div (nunits1, nunits2).to_constant ();
>   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>   gimple_seq seq = NULL;
> - loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> + loop_len = gimple_build (&seq, EXACT_DIV_EXPR, iv_type, loop_len,
>build_int_cst (iv_type, factor));
>   if (seq)
> gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> --
> 2.50.0
>


[PATCH 2/5] Avoid vect_is_simple_use call from get_load_store_type

2025-07-10 Thread Richard Biener
This isn't the required refactoring of vect_check_gather_scatter
but it avoids a now unnecessary call to vect_is_simple_use which
is problematic because it looks at STMT_VINFO_VECTYPE which we
want to get rid of.  SLP build already ensures vect_is_simple_use
on all lane defs, so all we need is to populate the offset_vectype
and offset_dt which is not always set by vect_check_gather_scatter.
That's both easy to get from the SLP child directly.

* tree-vect-stmts.cc (get_load_store_type): Do not use
vect_is_simple_use to fill gather/scatter offset operand
vectype and dt.
---
 gcc/tree-vect-stmts.cc | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e5971e4a357..4aa69da2218 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2466,17 +2466,10 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info 
stmt_info,
 vls_type == VLS_LOAD ? "gather" : "scatter");
  return false;
}
-  else if (!vect_is_simple_use (gs_info->offset, vinfo,
-   &gs_info->offset_dt,
-   &gs_info->offset_vectype))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"%s index use not simple.\n",
-vls_type == VLS_LOAD ? "gather" : "scatter");
- return false;
-   }
-  else if (gs_info->ifn == IFN_LAST && !gs_info->decl)
+  slp_tree offset_node = SLP_TREE_CHILDREN (slp_node)[0];
+  gs_info->offset_dt = SLP_TREE_DEF_TYPE (offset_node);
+  gs_info->offset_vectype = SLP_TREE_VECTYPE (offset_node);
+  if (gs_info->ifn == IFN_LAST && !gs_info->decl)
{
  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ()
  || !TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype).is_constant ()
-- 
2.43.0



[PATCH v3 4/9] aarch64: add new constants for MTE insns

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Indu Bhagat 

Define new constants to be used by the MTE pattern definitions.

gcc/

* config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define
constant.
(MEMTAG_ADDR_MASK): Likewise.
(irg, subp, ldg): Use new constants.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/aarch64/aarch64.md | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 27efc9155dc..bade8af7997 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -440,6 +440,16 @@ (define_constants
; must not operate on inactive inputs if doing so could induce a fault.
(SVE_STRICT_GP 1)])
 
+;; These constants are used as a const_int in MTE instructions
+(define_constants
+  [; 0xf0ff...
+   ; Tag mask for the 4-bit tag stored in the top 8 bits of a pointer.
+   (MEMTAG_TAG_MASK -1080863910568919041)
+
+   ;  0x00ff...
+   ; Tag mask 56-bit address used by subp instruction.
+   (MEMTAG_ADDR_MASK 72057594037927935)])
+
 (include "constraints.md")
 (include "predicates.md")
 (include "iterators.md")
@@ -8556,7 +8566,7 @@ (define_insn "irg"
   [(set (match_operand:DI 0 "register_operand" "=rk")
(ior:DI
 (and:DI (match_operand:DI 1 "register_operand" "rk")
-(const_int -1080863910568919041)) ;; 0xf0ff...
+(const_int MEMTAG_TAG_MASK)) ;; 0xf0ff...
 (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")]
 UNSPEC_GEN_TAG_RND)
(const_int 56]
@@ -8599,9 +8609,9 @@ (define_insn "subp"
   [(set (match_operand:DI 0 "register_operand" "=r")
(minus:DI
  (and:DI (match_operand:DI 1 "register_operand" "rk")
- (const_int 72057594037927935)) ;; 0x00ff...
+ (const_int MEMTAG_ADDR_MASK)) ;; 0x00ff...
  (and:DI (match_operand:DI 2 "register_operand" "rk")
- (const_int 72057594037927935] ;; 0x00ff...
+ (const_int MEMTAG_ADDR_MASK] ;; 0x00ff...
   "TARGET_MEMTAG"
   "subp\\t%0, %1, %2"
   [(set_attr "type" "memtag")]
@@ -8611,7 +8621,7 @@ (define_insn "subp"
 (define_insn "ldg"
   [(set (match_operand:DI 0 "register_operand" "+r")
(ior:DI
-(and:DI (match_dup 0) (const_int -1080863910568919041)) ;; 0xf0ff...
+(and:DI (match_dup 0) (const_int MEMTAG_TAG_MASK)) ;; 0xf0ff...
 (ashift:DI
  (mem:QI (unspec:DI
   [(and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
-- 
2.50.0



[PATCH v3 5/9] targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG

2025-07-10 Thread claudiu . zissulescu-ianculescu
From: Claudiu Zissulescu 

Add a new target hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG to perform
addition between two tags.

The default of this hook is to byte add the inputs.

Hardware-assisted sanitizers can then use it on architectures that
provide instructions to compose (add) two tags, as is the case on
AArch64.
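
A hedged usage sketch (caller shape assumed, not taken from this
series):

  /* Stack coloring derives each variable's tag from the frame's base
     tag; a target like AArch64 can expand this to a single ADDG.  */
  rtx var_tag
    = targetm.memtag.compose_offset_tag (base_tag, tag_offset);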

gcc/

* doc/tm.texi: Re-generate.
* doc/tm.texi.in: Add documentation for new target hooks.
* target.def: Add new hook.
* targhooks.cc (default_memtag_compose_offset_tag): New hook.
* targhooks.h (default_memtag_compose_offset_tag): Likewise.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/doc/tm.texi| 6 ++
 gcc/doc/tm.texi.in | 2 ++
 gcc/target.def | 7 +++
 gcc/targhooks.cc   | 7 +++
 gcc/targhooks.h| 2 +-
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 3f87abf97b2..a4fba6d21b3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12917,6 +12917,12 @@ Store the result in @var{target} if convenient.
 The default clears the top byte of the original pointer.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_COMPOSE_OFFSET_TAG (rtx 
@var{base_tag}, uint8_t @var{tag_offset})
+Return an RTX that represents the result of composing @var{tag_offset} with
+the base tag @var{base_tag}.
+The default of this hook is to byte add @var{tag_offset} to @var{base_tag}.
+@end deftypefn
+
 @deftypevr {Target Hook} bool TARGET_HAVE_SHADOW_CALL_STACK
 This value is true if the target platform supports
 @option{-fsanitize=shadow-call-stack}.  The default value is false.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 040d26c40f1..ff381b486e1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8138,6 +8138,8 @@ maintainer is familiar with.
 
 @hook TARGET_MEMTAG_UNTAGGED_POINTER
 
+@hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG
+
 @hook TARGET_HAVE_SHADOW_CALL_STACK
 
 @hook TARGET_HAVE_LIBATOMIC
diff --git a/gcc/target.def b/gcc/target.def
index db48df9498d..89f96ca73c5 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7521,6 +7521,13 @@ Store the result in @var{target} if convenient.\n\
 The default clears the top byte of the original pointer.",
   rtx, (rtx tagged_pointer, rtx target), default_memtag_untagged_pointer)
 
+DEFHOOK
+(compose_offset_tag,
+ "Return an RTX that represnts the result of composing @var{tag_offset} with\n\
+the base tag @var{base_tag}.\n\
+The default of this hook is to byte add @var{tag_offset} to @var{base_tag}.",
+  rtx, (rtx base_tag, uint8_t tag_offset), default_memtag_compose_offset_tag)
+
 HOOK_VECTOR_END (memtag)
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 0696f95adeb..cfea4a70403 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -2904,4 +2904,11 @@ default_memtag_untagged_pointer (rtx tagged_pointer, rtx 
target)
   return untagged_base;
 }
 
+/* The default implementation of TARGET_MEMTAG_COMPOSE_OFFSET_TAG.  */
+rtx
+default_memtag_compose_offset_tag (rtx base_tag, uint8_t tag_offset)
+{
+  return plus_constant (QImode, base_tag, tag_offset);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index c9e57e475dc..76afce71baa 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -317,5 +317,5 @@ extern rtx default_memtag_add_tag (rtx, poly_int64, 
uint8_t);
 extern rtx default_memtag_set_tag (rtx, rtx, rtx);
 extern rtx default_memtag_extract_tag (rtx, rtx);
 extern rtx default_memtag_untagged_pointer (rtx, rtx);
-
+extern rtx default_memtag_compose_offset_tag (rtx, uint8_t);
 #endif /* GCC_TARGHOOKS_H */
-- 
2.50.0



[PATCH 3/5] Avoid vect_is_simple_use call from vectorizable_reduction

2025-07-10 Thread Richard Biener
When analyzing the reduction cycle we look to determine the
reduction input vector type.  For lane-reducing ops we look
at the input, but instead of using vect_is_simple_use, which
is problematic for SLP, we should simply get at the SLP
operand's vector type.  If that's not set and we make up one,
we should also ensure it stays so.

* tree-vect-loop.cc (vectorizable_reduction): Avoid
vect_is_simple_use and record a vector type if we come
up with one.
---
 gcc/tree-vect-loop.cc | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7b260c34a84..8ea0f45d79f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7378,23 +7378,20 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
  if (lane_reducing_op_p (op.code))
{
- enum vect_def_type dt;
- tree vectype_op;
-
  /* The last operand of lane-reducing operation is for
 reduction.  */
  gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
 
- if (!vect_is_simple_use (op.ops[0], loop_vinfo, &dt, &vectype_op))
-   return false;
-
+ slp_tree op_node = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
+ tree vectype_op = SLP_TREE_VECTYPE (op_node);
  tree type_op = TREE_TYPE (op.ops[0]);
-
  if (!vectype_op)
{
  vectype_op = get_vectype_for_scalar_type (loop_vinfo,
type_op);
- if (!vectype_op)
+ if (!vectype_op
+ || !vect_maybe_update_slp_op_vectype (op_node,
+   vectype_op))
return false;
}
 
-- 
2.43.0



[PATCH] MicroBlaze : Enhance support for atomics. Fix PR118280

2025-07-10 Thread Gopi Kumar Bulusu
Greetings,

Please find the patch attached. This addresses a regression for MicroBlaze
(PR118280).

Atomic support enhanced to fix existing atomic_compare_and_swapsi pattern
to handle side effects; new patterns atomic_fetch_op and atomic_test_and_set
added. As MicroBlaze has no QImode test/set instruction, use shift magic
to implement atomic_test_and_set. Make -mxl-barrel-shift the default to keep
the default atomics code tight.
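
For the QImode case, the "shift magic" is the usual subword trick: run a
word-sized LL/SC sequence and operate on the byte inside the word.  A
hedged C rendition of the idea (load_linked/store_conditional are
hypothetical stand-ins; the real code derives these pieces via
microblaze_subword_address):

  uintptr_t addr = (uintptr_t) ptr;
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  /* Byte placement shown for big-endian; little-endian would use
     (addr & 3) * 8.  */
  unsigned shift = (3 - (addr & 3)) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old, set = (uint32_t) 1 << shift;
  do
    old = load_linked (word);                             /* hypothetical */
  while (!store_conditional (word, (old & ~mask) | set)); /* hypothetical */
  return (old & mask) != 0;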

Files Changed

* gcc/config/microblaze/iterators.md: New
* microblaze-protos.h/microblaze.cc : Add microblaze_subword_address
* gcc/config/microblaze/microblaze.md: constants: Add UNSPECV_CAS_BOOL,
  UNSPECV_CAS_MEM, UNSPECV_CAS_VAL, UNSPECV_ATOMIC_FETCH_OP
  type: add atomic
* gcc/config/microblaze/microblaze.h: TARGET_DEFAULT : Add MASK_BARREL_SHIFT
* gcc/config/microblaze/sync.md: Add atomic_fetch_si
  atomic_test_and_set

Target Checked
microblazeel-amd-linux

Testing

deja-g++

=== g++ Summary ===

# of expected passes		237906
# of unexpected failures	4165
# of unexpected successes	3
# of expected failures		2180
# of unresolved testcases	645
# of unsupported tests		2658


deja-libstdcpp

=== libstdc++ Summary ===

# of expected passes		18180
# of unexpected failures	311
# of expected failures		133
# of unresolved testcases	18
# of unsupported tests		853

Includes Test case 29_atomics/atomic_flag/clear/1.cc (which checks for
atomic_test_and_set)

Thank you,
gopi
From 2503eec66df23178ab8e906ec963e08fb4fa3081 Mon Sep 17 00:00:00 2001
From: Gopi Kumar Bulusu 
Date: Thu, 10 Jul 2025 12:44:44 +0530
Subject: [PATCH] MicroBlaze : Enhance support for atomics. Fix PR118280

Atomic support enhanced to fix existing atomic_compare_and_swapsi pattern
to handle side effects; new patterns atomic_fetch_op and atomic_test_and_set
added. As MicroBlaze has no QImode test/set instruction, use shift magic
to implement atomic_test_and_set. Make -mxl-barrel-shift the default to keep
the default atomics code tight.

Files Changed

* gcc/config/microblaze/iterators.md: New
* microblaze-protos.h/microblaze.cc : Add microblaze_subword_address
* gcc/config/microblaze/microblaze.md: constants: Add UNSPECV_CAS_BOOL,
  UNSPECV_CAS_MEM, UNSPECV_CAS_VAL, UNSPECV_ATOMIC_FETCH_OP
  type: add atomic
* gcc/config/microblaze/microblaze.h: TARGET_DEFAULT : Add MASK_BARREL_SHIFT
* gcc/config/microblaze/sync.md: Add atomic_fetch_si
  atomic_test_and_set

Signed-off-by: Kirk Meyer 
Signed-off-by: David Holsgrove 
Signed-off-by: Gopi Kumar Bulusu 
---
 gcc/config/microblaze/iterators.md|  25 +
 gcc/config/microblaze/microblaze-protos.h |   1 +
 gcc/config/microblaze/microblaze.cc   |  36 
 gcc/config/microblaze/microblaze.h|   2 +-
 gcc/config/microblaze/microblaze.md   |   7 +-
 gcc/config/microblaze/sync.md | 108 ++
 6 files changed, 159 insertions(+), 20 deletions(-)
 create mode 100644 gcc/config/microblaze/iterators.md

diff --git a/gcc/config/microblaze/iterators.md b/gcc/config/microblaze/iterators.md
new file mode 100644
index 000..2ffc2422a0a
--- /dev/null
+++ b/gcc/config/microblaze/iterators.md
@@ -0,0 +1,25 @@
+;; Iterator definitions for GCC MicroBlaze machine description files.
+;; Copyright (C) 2012-2024 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+; atomics code iterator
+(define_code_iterator any_atomic [plus ior xor and])
+
+; atomics code attribute
+(define_code_attr atomic_optab
+  [(plus "add") (ior "or") (xor "xor") (and "and")])
diff --git a/gcc/config/microblaze/microblaze-protos.h b/gcc/config/microblaze/microblaze-protos.h
index 90b79cfe716..654537578b2 100644
--- a/gcc/config/microblaze/microblaze-protos.h
+++ b/gcc/config/microblaze/microblaze-protos.h
@@ -62,6 +62,7 @@ extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
 extern bool microblaze_cannot_force_const_mem (machine_mode, rtx);
 extern void microblaze_eh_return (rtx op0);
+extern void microblaze_subword_address (rtx, rtx *, rtx *, rtx *);
 #endif  /* RTX_CODE */
 
 /* Declare functions in microblaze-c.cc.  */
diff --git a/gcc/config/microblaze/microblaze.cc b/gcc/config/microblaze/microblaze.cc
index 2ab5ada4ec9..9b34baf9c5b 100644
--- a/gcc/config/microblaze/micr

[PATCH] RISC-V: Improve bswap8 when zbb is enabled

2025-07-10 Thread Dusan Stojkovic
This peephole pattern combines the following instructions:
bswap8:
	rev8	a5,a0
->	li	a4,-65536
->	srai	a5,a5,32
->	and	a5,a5,a4
->	roriw	a5,a5,16
	and	a0,a0,a4
	or	a0,a0,a5
	sext.w	a0,a0
	ret

And emits this assembly:
bswap8:
	rev8	a5,a0
->	li	a4,-65536
->	srai	a5,a5,48
	and	a0,a0,a4
	or	a0,a0,a5
	sext.w	a0,a0
	ret

Since the load instruction is required for the rest of the test function
in the PR, the pattern preserves the load.

2025-07-10  Dusan Stojkovic  

PR target/120920

gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb_bswap8.c: New test.


---
 gcc/config/riscv/peephole.md| 28 +
 gcc/testsuite/gcc.target/riscv/zbb_bswap8.c | 10 
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb_bswap8.c

diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index b5cc1924c76..b07de8bf83e 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -66,3 +66,31 @@
   (set (match_dup 2)
(match_dup 3))])]
 )
+
+;; ZBB
+
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand")
+(ashiftrt:DI (match_operand:DI 1 "register_operand")
+  (match_operand 2 "const_int_operand")))
+  (set (match_operand:DI 3 "register_operand")
+(match_operand 4 "const_int_operand"))
+  (set (match_dup 1)
+(and:DI (match_dup 1) (match_dup 3)))
+  (set (match_operand:SI 5 "register_operand")
+(rotatert:SI (match_operand:SI 6 "register_operand")
+  (match_operand 7 "const_int_operand")))]
+  "TARGET_ZBB && TARGET_64BIT
+  && (REGNO (operands[0]) == REGNO (operands[5]))
+  && (REGNO (operands[1]) == REGNO (operands[6]))
+  && (ctz_hwi (INTVAL (operands[4])) == INTVAL (operands[7]))"
+  [(set (match_dup 3)
+(match_dup 4))
+  (set (match_dup 0)
+(ashiftrt:DI (match_dup 1)
+  (match_dup 7)))]
+{
+  unsigned HOST_WIDE_INT mask = INTVAL (operands[4]);
+  int trailing = ctz_hwi (mask);
+  operands[7] = GEN_INT (trailing + INTVAL (operands[2]));
+})
diff --git a/gcc/testsuite/gcc.target/riscv/zbb_bswap8.c 
b/gcc/testsuite/gcc.target/riscv/zbb_bswap8.c
new file mode 100644
index 000..77441b720b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb_bswap8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64d -O2" { target { rv64 } } } */
+
+unsigned int bswap8(unsigned int n)
+{
+  return (n & 0x) | ((n & 0xff00) >> 8) | ((n & 0xff) << 8);
+}
+
+/* { dg-final { scan-assembler {\mrev8} } } */
+/* { dg-final { scan-assembler {\msrai\s+[ax][0-9]+,\s*[ax][0-9]+,\s*48} } } */
-- 
2.43.0



Re: [PATCH] libgcc: PR target/116363 Fix SFtype to UDWtype conversion

2025-07-10 Thread Jeff Law




On 7/10/25 8:37 AM, Jan Dubiec wrote:
> On 10.07.2025 15:42, Jeff Law wrote:
> [...]
>> Anyway, this has been repeatedly bootstrapped & regression tested on
>> aarch64, ppc64le and other targets.  It's also been many dozens of
>> regression testing cycles on the various embedded targets.
>
> This part of code does not seem to be used on many targets…
Definitely.  There's only a very small number that do the DF=SF trick
when building libgcc.  In fact, it's possible (but I haven't verified)
that the H8 is the only one left doing that.


Jeff



Re: [PATCH v2] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Kito Cheng
LGTM :)

On Thu, Jul 10, 2025 at 6:00 PM Robin Dapp  wrote:
>
> Hi,
>
> Changes from v1:
>  - Use Himode broadcast instead of float broadcast, saving two conversion
>insns.
>
> Let's be daring and leave the thorough testing to the CI first while my own
> testing is in progress :)
>
> This patch makes the zero-stride load broadcast idiom dependent on a
> uarch-tunable "use_zero_stride_load".  Right now we have quite a few
> paths that reach a strided load and some of them are not exactly
> straightforward.
>
> While broadcast is relatively rare on rv64 targets it is more common on
> rv32 targets that want to vectorize 64-bit elements.
>
> While the patch is more involved than I would have liked it could have
> even touched more places.  The whole broadcast-like insn path feels a
> bit hackish due to the several optimizations we employ.  Some of the
> complications stem from the fact that we lump together real broadcasts,
> vector single-element sets, and strided broadcasts.  The strided-load
> alternatives currently require a memory_constraint to work properly
> which causes more complications when trying to disable just these.
>
> In short, the whole pred_broadcast handling in combination with the
> sew64_scalar_helper could use work in the future.  I was about to start
> with it in this patch but soon realized that it would only distract from
> the original intent.  What can help in the future is split strided and
> non-strided broadcast entirely, as well as the single-element sets.
>
> Yet unclear is whether we need to pay special attention for misaligned
> strided loads (PR120782).
>
> I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
> true and false.  With either I didn't observe any new execution failures
> but obviously there are new scan failures with strided broadcast turned
> off.
>
> Regards
>  Robin
>
> PR target/118734
>
> gcc/ChangeLog:
>
> * config/riscv/constraints.md (Wdm): Use tunable for Wdm
> constraint.
> * config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
> (can_be_broadcasted_p): Rename to...
> (can_be_broadcast_p): ...this.
> * config/riscv/predicates.md: Use renamed function.
> (strided_load_broadcast_p): Declare.
> * config/riscv/riscv-selftests.cc (run_broadcast_selftests):
> Only run broadcast selftest if strided broadcasts are OK.
> * config/riscv/riscv-v.cc (emit_avltype_insn): New function.
> (sew64_scalar_helper): Only emit a pred_broadcast if the new
> tunable says so.
> (can_be_broadcasted_p): Rename to...
> (can_be_broadcast_p): ...this and use new tunable.
> * config/riscv/riscv.cc (struct riscv_tune_param): Add strided
> broad tunable.
> (strided_load_broadcast_p): Implement.
> * config/riscv/vector.md: Use strided_load_broadcast_p () and
> work around 64-bit broadcast on rv32 targets.
> ---
>  gcc/config/riscv/constraints.md |  7 +--
>  gcc/config/riscv/predicates.md  |  2 +-
>  gcc/config/riscv/riscv-protos.h |  4 +-
>  gcc/config/riscv/riscv-selftests.cc | 10 +++--
>  gcc/config/riscv/riscv-v.cc | 58 +
>  gcc/config/riscv/riscv.cc   | 20 +
>  gcc/config/riscv/vector.md  | 66 +
>  7 files changed, 133 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
> index ccab1a2e29d..5ecaa19eb01 100644
> --- a/gcc/config/riscv/constraints.md
> +++ b/gcc/config/riscv/constraints.md
> @@ -237,10 +237,11 @@ (define_constraint "Wb1"
>   (and (match_code "const_vector")
>(match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask 
> (GET_MODE (op)))")))
>
> -(define_memory_constraint "Wdm"
> +(define_constraint "Wdm"
>"Vector duplicate memory operand"
> -  (and (match_code "mem")
> -   (match_code "reg" "0")))
> +  (and (match_test "strided_load_broadcast_p ()")
> +   (and (match_code "mem")
> +   (match_code "reg" "0"
>
>  ;; Vendor ISA extension constraints.
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 8baad2fae7a..1f9a6b562e5 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -617,7 +617,7 @@ (define_special_predicate "vector_any_register_operand"
>
>  ;; The scalar operand can be directly broadcast by RVV instructions.
>  (define_predicate "direct_broadcast_operand"
> -  (match_test "riscv_vector::can_be_broadcasted_p (op)"))
> +  (match_test "riscv_vector::can_be_broadcast_p (op)"))
>
>  ;; A CONST_INT operand that has exactly two bits cleared.
>  (define_predicate "const_nottwobits_operand"
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 38f63ea8424..a41c4c299fa 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -604,6 +604,7 @@ void emi

Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-10 Thread Andrew Pinski
On Thu, Jul 10, 2025, 4:12 AM 
wrote:

> From: Indu Bhagat 
>
> Currently, the data type of sanitizer flags is unsigned int, with
> SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
> enumerator for enum sanitize_code.  Use the 'uint64_t' data type to allow
> more distinct instrumentation modes to be added when needed.
>


I have not looked yet but does it make sense to use `unsigned
HOST_WIDE_INT` instead of uint64_t? HWI should be the same as uint64_t but
it is more consistent with the rest of gcc.
Plus, tree_to_uhwi would be more consistent there.

Thanks,
Andrew



> gcc/ChangeLog:
>
> * asan.h (sanitize_flags_p): Use 'uint64_t' instead of 'unsigned
> int'.
> * common.opt: Likewise.
> * dwarf2asm.cc (dw2_output_indirect_constant_1): Likewise.
> * opts.cc (find_sanitizer_argument): Likewise.
> (report_conflicting_sanitizer_options): Likewise.
> (parse_sanitizer_options): Likewise.
> (parse_no_sanitize_attribute): Likewise.
> * opts.h (parse_sanitizer_options): Likewise.
> (parse_no_sanitize_attribute): Likewise.
> * tree-cfg.cc (print_no_sanitize_attr_value): Likewise.
>
> gcc/c-family/ChangeLog:
>
> * c-attribs.cc (add_no_sanitize_value): Likewise.
> (handle_no_sanitize_attribute): Likewise.
> (handle_no_sanitize_address_attribute): Likewise.
> (handle_no_sanitize_thread_attribute): Likewise.
> (handle_no_address_safety_analysis_attribute): Likewise.
> * c-common.h (add_no_sanitize_value): Likewise.
>
> gcc/c/ChangeLog:
>
> * c-parser.cc (c_parser_declaration_or_fndef): Likewise.
>
> gcc/cp/ChangeLog:
>
> * typeck.cc (get_member_function_from_ptrfunc): Likewise.
>
> gcc/d/ChangeLog:
>
> * d-attribs.cc (d_handle_no_sanitize_attribute): Likewise.
>
> Signed-off-by: Claudiu Zissulescu <
> claudiu.zissulescu-iancule...@oracle.com>
> ---
>  gcc/asan.h|  5 +++--
>  gcc/c-family/c-attribs.cc | 16 
>  gcc/c-family/c-common.h   |  2 +-
>  gcc/c/c-parser.cc |  4 ++--
>  gcc/common.opt|  6 +++---
>  gcc/cp/typeck.cc  |  2 +-
>  gcc/d/d-attribs.cc|  8 
>  gcc/dwarf2asm.cc  |  2 +-
>  gcc/opts.cc   | 25 +
>  gcc/opts.h|  8 
>  gcc/tree-cfg.cc   |  2 +-
>  11 files changed, 41 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/asan.h b/gcc/asan.h
> index 064d4f24823..d4443de4620 100644
> --- a/gcc/asan.h
> +++ b/gcc/asan.h
> @@ -242,9 +242,10 @@ asan_protect_stack_decl (tree decl)
> remove all flags mentioned in "no_sanitize" of DECL_ATTRIBUTES.  */
>
>  inline bool
> -sanitize_flags_p (unsigned int flag, const_tree fn =
> current_function_decl)
> +sanitize_flags_p (uint64_t flag,
> + const_tree fn = current_function_decl)
>  {
> -  unsigned int result_flags = flag_sanitize & flag;
> +  uint64_t result_flags = flag_sanitize & flag;
>if (result_flags == 0)
>  return false;
>
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index ea04ed7f0d4..ddb173e3ccf 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -1409,23 +1409,23 @@ handle_cold_attribute (tree *node, tree name, tree
> ARG_UNUSED (args),
>  /* Add FLAGS for a function NODE to no_sanitize_flags in
> DECL_ATTRIBUTES.  */
>
>  void
> -add_no_sanitize_value (tree node, unsigned int flags)
> +add_no_sanitize_value (tree node, uint64_t flags)
>  {
>tree attr = lookup_attribute ("no_sanitize", DECL_ATTRIBUTES (node));
>if (attr)
>  {
> -  unsigned int old_value = tree_to_uhwi (TREE_VALUE (attr));
> +  uint64_t old_value = tree_to_uhwi (TREE_VALUE (attr));
>flags |= old_value;
>
>if (flags == old_value)
> return;
>
> -  TREE_VALUE (attr) = build_int_cst (unsigned_type_node, flags);
> +  TREE_VALUE (attr) = build_int_cst (uint64_type_node, flags);
>  }
>else
>  DECL_ATTRIBUTES (node)
>= tree_cons (get_identifier ("no_sanitize"),
> -  build_int_cst (unsigned_type_node, flags),
> +  build_int_cst (uint64_type_node, flags),
>DECL_ATTRIBUTES (node));
>  }
>
> @@ -1436,7 +1436,7 @@ static tree
>  handle_no_sanitize_attribute (tree *node, tree name, tree args, int,
>   bool *no_add_attrs)
>  {
> -  unsigned int flags = 0;
> +  uint64_t flags = 0;
>*no_add_attrs = true;
>if (TREE_CODE (*node) != FUNCTION_DECL)
>  {
> @@ -1473,7 +1473,7 @@ handle_no_sanitize_address_attribute (tree *node,
> tree name, tree, int,
>if (TREE_CODE (*node) != FUNCTION_DECL)
>  warning (OPT_Wattributes, "%qE attribute ignored", name);
>else
> -add_no_sanitize_value (*node, SANITIZE_ADDRESS);
> +add_no_sanitize_value (*node, (uint64_t) SANITIZE_ADDRESS);
>
>return NULL_TREE;
>  }
> @@ -1489,7 +1489,7 @@ handle_no_sanitize

Re: [PATCH] x86: Update "*mov_internal" in mmx.md to handle all 1s vectors

2025-07-10 Thread Uros Bizjak
On Thu, Jul 10, 2025 at 1:57 PM H.J. Lu  wrote:
>
> commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> Author: H.J. Lu 
> Date:   Thu Jun 26 06:08:51 2025 +0800
>
> x86: Also handle all 1s float vector constant
>
> replaces
>
> (insn 29 28 30 5 (set (reg:V2SF 107)
> (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S8 A64])) 
> 2031
>  {*movv2sf_internal}
>  (expr_list:REG_EQUAL (const_vector:V2SF [
> (const_double:SF -QNaN [-QNaN]) repeated x2
> ])
> (nil)))
>
> with
>
> (insn 98 13 14 3 (set (reg:V8QI 112)
> (const_vector:V8QI [
> (const_int -1 [0x]) repeated x8
> ])) -1
>  (nil))
> ...
> (insn 29 28 30 5 (set (reg:V2SF 107)
> (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
>  (expr_list:REG_EQUAL (const_vector:V2SF [
> (const_double:SF -QNaN [-QNaN]) repeated x2
> ])
> (nil)))
>
> which leads to
>
> pr121015.c: In function ‘render_result_from_bake_h’:
> pr121015.c:34:1: error: unrecognizable insn:
>34 | }
>   | ^
> (insn 98 13 14 3 (set (reg:V8QI 112)
> (const_vector:V8QI [
> (const_int -1 [0x]) repeated x8
> ])) -1
>  (expr_list:REG_EQUIV (const_vector:V8QI [
> (const_int -1 [0x]) repeated x8
> ])
> (nil)))
> during RTL pass: ira
>
> 1. Update constm1_operand to also return true for integer and float all
> 1s vectors.
> 2. Add nonimm_or_0_or_m1_operand for nonimmediate, zero or -1 operand.
> 3. Add BI for constant all 0s/1s operand.
> 4. Update "*mov_internal" in mmx.md to handle integer all 1s vectors.
> 5. Update MMXMODE move splitter to also split all 1s source operand.
>
> gcc/
>
> PR target/121015
> * config/i386/constraints.md (BI): New constraint.
> * config/i386/i386.cc (ix86_print_operand): Support CONSTM1_RTX.
> * config/i386/mmx.md (*mov_internal): Replace C with BI
> memory and integer register destination.
> Update MMXMODE move splitter to also split all 1s source operand.
> * config/i386/predicates.md (constm1_operand): Also return true
> for int_float_vector_all_ones_operand.
> (nonimm_or_0_or_m1_operand): New predicate.
>
> gcc/testsuite/
>
> PR target/121015
> * gcc.target/i386/pr106022-2.c: Adjusted.
> * gcc.target/i386/pr121015.c: New test.
>
> OK for master?

+;; Match exactly -1.
+(define_predicate "constm1_operand"
+  (ior (and (match_code "const_int")
+(match_test "op == constm1_rtx"))
+   (match_operand 0 "int_float_vector_all_ones_operand")))

No, this predicate should not be repurposed into an all-ones predicate.

For SSE we have a macro that defines different constraints for float
and int moves; I think we should have the same approach for MMX.  IMO,
you also need to amend the corresponding case.

Uros.


[PATCH] Reject single lane vector types for SLP build

2025-07-10 Thread Richard Biener
The following makes us never consider vector(1) T types for
vectorization and ensures this during SLP build.  This is a
long-standing issue for BB vectorization and when we remove
early loop vector type setting we lose the single place we have
that rejects this for loops.

Once we implement partial loop vectorization we should revisit
this, but then use the original scalar types for the unvectorized
parts.
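
For reference, the kind of type being rejected is a single-lane GNU C
vector (a sketch, not from the patch):

/* vector(1) double: one lane only; SLP build now fails for these.  */
typedef double v1df __attribute__ ((vector_size (sizeof (double))));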

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll see
if there's any surprises from the CI, but otherwise I'll go
ahead with this.

Richard.

* tree-vect-slp.cc (vect_build_slp_tree_1): Reject
single-lane vector types.
---
 gcc/tree-vect-slp.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ad75386926a..d2ce4ffaa4f 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   matches[0] = false;
   return false;
 }
+  if (known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: not using single lane "
+"vector type %T\n", vectype);
+  matches[0] = false;
+  return false;
+}
   /* Record nunits required but continue analysis, producing matches[]
  as if nunits was not an issue.  This allows splitting of groups
  to happen.  */
-- 
2.43.0


Re: [PATCH] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Kito Cheng
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 6753b01db59..866aaf1e8a0 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1580,8 +1580,27 @@ (define_insn_and_split "*vec_duplicate"
>"&& 1"
>[(const_int 0)]
>{
> -riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
> -   riscv_vector::UNARY_OP, operands);
> +if (!strided_load_broadcast_p ()
> +   && TARGET_ZVFHMIN && !TARGET_ZVFH && mode == HFmode)
> +  {
> +   /* For Float16, load, convert to float, then broadcast and
> +  truncate.  */
> +   rtx tmpsf = gen_reg_rtx (SFmode);
> +   emit_insn (gen_extendhfsf2 (tmpsf, operands[1]));
> +   poly_uint64 nunits = GET_MODE_NUNITS (mode);

This could be HF -> HI (bitcast), then a HI pred_broadcast, then a
bitcast back to an HF vector again; that would avoid introducing the
trunc here.  I would prefer to improve this, since the RVA23 profile
only mandates Zvfhmin, not Zvfh.

> +   machine_mode vmodesf
> + = riscv_vector::get_vector_mode (SFmode, nunits).require ();
> +   rtx tmp = gen_reg_rtx (vmodesf);
> +   rtx ops[] =  {tmp, tmpsf};
> +   riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (vmodesf),
> +  riscv_vector::UNARY_OP, ops);
> +   rtx ops2[] = {operands[0], tmp};
> +   riscv_vector::emit_vlmax_insn (code_for_pred_trunc (vmodesf),
> +  riscv_vector::UNARY_OP_FRM_DYN, ops2);
> +  }
> +else
> +  riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
> +riscv_vector::UNARY_OP, operands);
>  DONE;
>}
>[(set_attr "type" "vector")]
> @@ -2171,7 +2190,7 @@ (define_expand "@pred_broadcast"
> }
>  }
>else if (GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode)
> -   && (immediate_operand (operands[3], Pmode)
> +  && (immediate_operand (operands[3], Pmode)
>|| (CONST_POLY_INT_P (operands[3])
>&& known_ge (rtx_to_poly_int64 (operands[3]), 0U)
>&& known_le (rtx_to_poly_int64 (operands[3]), 
> GET_MODE_SIZE (mode)
> @@ -2224,12 +2243,7 @@ (define_insn_and_split "*pred_broadcast"
>"(register_operand (operands[3], mode)
>|| CONST_POLY_INT_P (operands[3]))
>&& GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode)"
> -  [(set (match_dup 0)
> -   (if_then_else:V_VLSI (unspec: [(match_dup 1) (match_dup 4)
> -(match_dup 5) (match_dup 6) (match_dup 7)
> -(reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> - (vec_duplicate:V_VLSI (match_dup 3))
> - (match_dup 2)))]
> +  [(const_int 0)]
>{
>  gcc_assert (can_create_pseudo_p ());
>  if (CONST_POLY_INT_P (operands[3]))
> @@ -2238,12 +2252,6 @@ (define_insn_and_split "*pred_broadcast"
> emit_move_insn (tmp, operands[3]);
> operands[3] = tmp;
>}
> -rtx m = assign_stack_local (mode, GET_MODE_SIZE (mode),
> -   GET_MODE_ALIGNMENT (mode));
> -m = validize_mem (m);
> -emit_move_insn (m, operands[3]);
> -m = gen_rtx_MEM (mode, force_reg (Pmode, XEXP (m, 0)));
> -operands[3] = m;
>
>  /* For SEW = 64 in RV32 system, we expand vmv.s.x:
> andi a2,a2,1
> @@ -2254,6 +2262,35 @@ (define_insn_and_split "*pred_broadcast"
> operands[4] = riscv_vector::gen_avl_for_scalar_move (operands[4]);
> operands[1] = CONSTM1_RTX (mode);
>}
> +
> +/* If the target doesn't want a strided-load broadcast we go with a 
> regular
> +   V1DImode load and a broadcast gather.  */
> +if (strided_load_broadcast_p ())
> +  {
> +   rtx mem = assign_stack_local (mode, GET_MODE_SIZE (mode),
> + GET_MODE_ALIGNMENT (mode));
> +   mem = validize_mem (mem);
> +   emit_move_insn (mem, operands[3]);
> +   mem = gen_rtx_MEM (mode, force_reg (Pmode, XEXP (mem, 0)));
> +
> +   emit_insn
> + (gen_pred_broadcast
> +  (operands[0], operands[1], operands[2], mem,
> +   operands[4], operands[5], operands[6], operands[7]));
> +  }
> +else
> +  {
> +   rtx tmp = gen_reg_rtx (V1DImode);
> +   emit_move_insn (tmp, lowpart_subreg (V1DImode, operands[3],
> +mode));
> +   tmp = lowpart_subreg (mode, tmp, V1DImode);
> +
> +   emit_insn
> + (gen_pred_gather_scalar
> +  (operands[0], operands[1], operands[2], tmp, CONST0_RTX (Pmode),
> +   operands[4], operands[5], operands[6], operands[7]));
> +  }
> +DONE;
>}
>[(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
> (set_attr "mode" "")])
> @@ -2293,9 +2330,9 @@ (define_insn "*pred_broadcast_zvfhmin"
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_

Re: [PATCH] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Robin Dapp

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 6753b01db59..866aaf1e8a0 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1580,8 +1580,27 @@ (define_insn_and_split "*vec_duplicate"
   "&& 1"
   [(const_int 0)]
   {
-riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
-   riscv_vector::UNARY_OP, operands);
+if (!strided_load_broadcast_p ()
+   && TARGET_ZVFHMIN && !TARGET_ZVFH && mode == HFmode)
+  {
+   /* For Float16, load, convert to float, then broadcast and
+  truncate.  */
+   rtx tmpsf = gen_reg_rtx (SFmode);
+   emit_insn (gen_extendhfsf2 (tmpsf, operands[1]));
+   poly_uint64 nunits = GET_MODE_NUNITS (mode);


This could be HF -> HI (bitcast), then a HI pred_broadcast, then a
bitcast back to an HF vector again; that would avoid introducing the
trunc here.  I would prefer to improve this, since the RVA23 profile
only mandates Zvfhmin, not Zvfh.


Sure, I can do the reinterpret way.  But what's the issue with Zvfhmin?

--
Regards
Robin



Re: [PATCH] expmed: Prevent non-canonical subreg generation in store_bit_field [PR118873]

2025-07-10 Thread Konstantinos Eleftheriou
Hi all!

So, should we go ahead with the proposed fix inside `store_bit_field`?
Since updating `gen_lowpart_common` seems tricky, we should at least
prevent `store_bit_field` from generating non-canonical subregs.

Thanks,
Konstantinos

On Thu, Jun 12, 2025 at 10:47 AM Konstantinos Eleftheriou
 wrote:
>
> After looking at this again, I found out that in both cases, inside
> `store_bit_field` and in the callers, the subregs are generated by
> `gen_lowpart_common` and a call to `lowpart_subreg`. So,
> theoretically, fixing this inside `gen_lowpart_common`, preventing the
> use of `lowpart_subreg` when dealing with vectors, would cover both of
> those cases.
>
> The problem is that, from what I’m seeing, `gen_lowpart_common`
> doesn’t directly modify the RTL and returns a single RTX. For the
> vector cases we would need to generate multiple vector operations to
> access the low part of a register. Another less concerning issue, is
> that we won’t have access to functions like `extract_bit_field`, that
> we use in our implementation.
>
> So, if we can’t find a better solution, we could at least apply this
> one to prevent `store_bit_field` from generating more of those
> subregs.
>
> Konstantinos
>
> On Mon, Jun 2, 2025 at 6:44 AM Jeff Law  wrote:
> >
> >
> >
> > On 5/30/25 12:12 AM, Richard Biener wrote:
> > > On Thu, May 29, 2025 at 12:27 PM Konstantinos Eleftheriou
> > >  wrote:
> > >>
> > >> Hi Richard, thanks for the response.
> > >>
> > >> On Mon, May 26, 2025 at 11:55 AM Richard Biener  
> > >> wrote:
> > >>>
> > >>> On Mon, 26 May 2025, Konstantinos Eleftheriou wrote:
> > >>>
> >  In `store_bit_field_1`, when the value to be written in the bitfield
> >  and/or the bitfield itself have vector modes, non-canonical subregs
> >  are generated, like `(subreg:V4SI (reg:V8SI x) 0)`. If one them is
> >  a scalar, this happens only when the scalar mode is different than the
> >  vector's inner mode.
> > 
> >  This patch tries to prevent this, using vec_set patterns when
> >  possible.
> > >>>
> > >>> I know almost nothing about this code, but why does the patch
> > >>> fixup things after the fact rather than avoid generating the
> > >>> SUBREG in the first place?
> > >>
> > >> That's what we are doing, we are trying to prevent the non-canonical
> > >> subreg generation (it's not always possible). But, there are cases
> > >> where these types of subregs are passed into `store_bit_field` by its
> > >> caller, in which case we choose not to touch them.
> > >>
> > >>> ISTR it also (unfortunately) depends on the target which forms
> > >>> are considered canonical.
> > >>
> > >> But, the way that we interpret the documentation, the
> > >> canonicalizations are machine-independent. Is that not true? Or,
> > >> specifically for the subregs that operate on vectors, is there any
> > >> target that considers them canonical?
> > >>
> > >>> I'm also not sure you got endianess right for all possible
> > >>> values of SUBREG_BYTE.  One more reason to not generate such
> > >>> subreg in the first place but stick to vec_select/concat.
> > >>
> > >> The only way that we would generate subregs are from the calls to
> > >> `extract_bit_field` or `store_bit_field_1` and these should handle the
> > >> endianness. Also, these subregs wouldn't operate on vectors. Do you
> > >> mean that something could go wrong with these calls?
> > >
> > > I wanted to remark that endianess WRT memory order (which is
> > > what store/extract_bit_field deal with) isn't always the same as
> > > endianess in register order (which is what vec_concat and friends
> > > operate on).  If we can avoid transitioning from one to the other
> > > this will help avoid mistakes.
> > >
> > > In general it would be more obvious (to me) if you fixed the callers
> > > that create those subregs.
> > >
> > > Now, I didn't want to pretend I'm reviewing the patch - so others please
> > > do that (as said, I'm not familiar enough with the code to tell whether
> > > it's actually correct).
> > Well, I'd tend to think your core concern is correct -- if something is
> > generating a non-canonical SUBREG, then that needs to be fixed.
> >
> > If I understand Konstantinos's comments correctly they're tackling one
> > of those paths and instead generating a correct form.  My understanding
> > is also that this patch doesn't catch all the known cases.
> >
> > I don't think we anyone anymore that *really* knows the code in
> > question.  We're kind of slogging along as best as we can, but it's not
> > an ideal situation.
> >
> > WRT fixing this earlier in the callers, I think it's the right thing to
> > check.  So I think that means understanding how we get into this routine
> > with VALUE being a SUBREG and FIELDMODE being a vector mode.
> >
> > Jeff
> >
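
(To make the canonical/non-canonical distinction above concrete, here
is a sketch using the usual rtl helpers -- illustrative only, not from
the patch.  Given a V8SI register x:)

/* The non-canonical form the thread is about:
   (subreg:V4SI (reg:V8SI x) 0).  */
rtx bad = gen_rtx_SUBREG (V4SImode, x, 0);

/* A canonical way to extract the low half instead.  */
rtx sel = gen_rtx_PARALLEL (VOIDmode,
			    gen_rtvec (4, GEN_INT (0), GEN_INT (1),
				       GEN_INT (2), GEN_INT (3)));
rtx good = gen_rtx_VEC_SELECT (V4SImode, x, sel);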


Re: [PATCH] RTEMS: Add riscv multilibs

2025-07-10 Thread Kito Cheng
OK for trunk.  Although I didn't build a RISC-V RTEMS toolchain, I
believe you have verified that change :)

On Thu, Jul 10, 2025 at 1:55 PM Sebastian Huber
 wrote:
>
> gcc/ChangeLog:
>
> * config/riscv/t-rtems: Add -mstrict-align multilibs for
> targets without support for misaligned access in hardware.
> ---
>  gcc/config/riscv/t-rtems | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/riscv/t-rtems b/gcc/config/riscv/t-rtems
> index f596e764f9d..c58b32ff8e9 100644
> --- a/gcc/config/riscv/t-rtems
> +++ b/gcc/config/riscv/t-rtems
> @@ -1,8 +1,8 @@
>  MULTILIB_OPTIONS   =
>  MULTILIB_DIRNAMES  =
>
> -MULTILIB_OPTIONS   += 
> march=rv32i/march=rv32iac/march=rv32im/march=rv32imf/march=rv32ima/march=rv32imac/march=rv32imaf/march=rv32imafc/march=rv32imafd/march=rv32imafdc/march=rv64ima/march=rv64imac/march=rv64imafd/march=rv64imafdc
> -MULTILIB_DIRNAMES  += rv32i   rv32iac   rv32im   rv32imf 
>  rv32ima   rv32imac   rv32imaf   rv32imafc   rv32imafd   
> rv32imafdc   rv64ima   rv64imac   rv64imafd   rv64imafdc
> +MULTILIB_OPTIONS   += 
> march=rv32i/march=rv32iac/march=rv32im/march=rv32imf/march=rv32ima/march=rv32imac/march=rv32imaf/march=rv32imafc/march=rv32imafd/march=rv32imafdc/march=rv64ima/march=rv64imac/march=rv64imafd/march=rv64imafdc/march=rv64imc
> +MULTILIB_DIRNAMES  += rv32i   rv32iac   rv32im   rv32imf 
>   rv32ima   rv32imac   rv32imaf   rv32imafc   rv32imafd   
> rv32imafdc   rv64ima   rv64imac   rv64imafd   rv64imafdc  
>  rv64imc
>
>  MULTILIB_OPTIONS   += 
> mabi=ilp32/mabi=ilp32f/mabi=ilp32d/mabi=lp64/mabi=lp64d
>  MULTILIB_DIRNAMES  += ilp32  ilp32f  ilp32d  lp64  lp64d
> @@ -10,6 +10,9 @@ MULTILIB_DIRNAMES += ilp32  ilp32f  ilp32d  
> lp64  lp64d
>  MULTILIB_OPTIONS   += mcmodel=medany
>  MULTILIB_DIRNAMES  += medany
>
> +MULTILIB_OPTIONS   += mstrict-align
> +MULTILIB_DIRNAMES  += strict-align
> +
>  MULTILIB_REQUIRED  =
>  MULTILIB_REQUIRED  += march=rv32i/mabi=ilp32
>  MULTILIB_REQUIRED  += march=rv32iac/mabi=ilp32
> @@ -22,6 +25,6 @@ MULTILIB_REQUIRED += march=rv32imafc/mabi=ilp32f
>  MULTILIB_REQUIRED  += march=rv32imafd/mabi=ilp32d
>  MULTILIB_REQUIRED  += march=rv32imafdc/mabi=ilp32d
>  MULTILIB_REQUIRED  += march=rv64ima/mabi=lp64/mcmodel=medany
> -MULTILIB_REQUIRED  += march=rv64imac/mabi=lp64/mcmodel=medany
>  MULTILIB_REQUIRED  += march=rv64imafd/mabi=lp64d/mcmodel=medany
> -MULTILIB_REQUIRED  += march=rv64imafdc/mabi=lp64d/mcmodel=medany
> +MULTILIB_REQUIRED  += 
> march=rv64imafdc/mabi=lp64d/mcmodel=medany/mstrict-align
> +MULTILIB_REQUIRED  += 
> march=rv64imc/mabi=lp64/mcmodel=medany/mstrict-align
> --
> 2.43.0
>


[PATCH] Remove dead code dealing with non-SLP

2025-07-10 Thread Richard Biener
After vect_analyze_loop_operations is gone we can clean up
vect_analyze_stmt as it is no longer called out of SLP context.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vectorizer.h (vect_analyze_stmt): Remove stmt-info
and need_to_vectorize arguments.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
Adjust.
* tree-vect-stmts.cc (can_vectorize_live_stmts): Remove
stmt_info argument and remove non-SLP path.
(vect_analyze_stmt): Remove stmt_info and need_to_vectorize
argument and prune paths no longer reachable.
(vect_transform_stmt): Adjust.
---
 gcc/tree-vect-slp.cc   |   6 +-
 gcc/tree-vect-stmts.cc | 180 +
 gcc/tree-vectorizer.h  |   3 +-
 3 files changed, 38 insertions(+), 151 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f97a3635cff..68ef1ddda77 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7898,8 +7898,6 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, 
slp_tree node,
slp_instance node_instance,
stmt_vector_for_cost *cost_vec)
 {
-  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
-
   /* Calculate the number of vector statements to be created for the scalar
  stmts in this node.  It is the number of scalar elements in one scalar
  iteration (DR_GROUP_SIZE) multiplied by VF divided by the number of
@@ -7928,9 +7926,7 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, 
slp_tree node,
   return true;
 }
 
-  bool dummy;
-  return vect_analyze_stmt (vinfo, stmt_info, &dummy,
-   node, node_instance, cost_vec);
+  return vect_analyze_stmt (vinfo, node, node_instance, cost_vec);
 }
 
 static int
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 081dd653fd4..e5971e4a357 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13186,37 +13186,27 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
VEC_STMT_P is as for vectorizable_live_operation.  */
 
 static bool
-can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
+can_vectorize_live_stmts (vec_info *vinfo,
  slp_tree slp_node, slp_instance slp_node_instance,
  bool vec_stmt_p,
  stmt_vector_for_cost *cost_vec)
 {
   loop_vec_info loop_vinfo = dyn_cast  (vinfo);
-  if (slp_node)
-{
-  stmt_vec_info slp_stmt_info;
-  unsigned int i;
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
-   {
- if (slp_stmt_info
- && (STMT_VINFO_LIVE_P (slp_stmt_info)
- || (loop_vinfo
- && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
- && STMT_VINFO_DEF_TYPE (slp_stmt_info)
- == vect_induction_def))
- && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
-  slp_node_instance, i,
-  vec_stmt_p, cost_vec))
-   return false;
-   }
+  stmt_vec_info slp_stmt_info;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
+{
+  if (slp_stmt_info
+ && (STMT_VINFO_LIVE_P (slp_stmt_info)
+ || (loop_vinfo
+ && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+ && STMT_VINFO_DEF_TYPE (slp_stmt_info)
+ == vect_induction_def))
+ && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
+  slp_node_instance, i,
+  vec_stmt_p, cost_vec))
+   return false;
 }
-  else if ((STMT_VINFO_LIVE_P (stmt_info)
-   || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-   && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
-  && !vectorizable_live_operation (vinfo, stmt_info,
-   slp_node, slp_node_instance, -1,
-   vec_stmt_p, cost_vec))
-return false;
 
   return true;
 }
@@ -13225,115 +13215,42 @@ can_vectorize_live_stmts (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
 opt_result
 vect_analyze_stmt (vec_info *vinfo,
-  stmt_vec_info stmt_info, bool *need_to_vectorize,
   slp_tree node, slp_instance node_instance,
   stmt_vector_for_cost *cost_vec)
 {
+  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
   bb_vec_info bb_vinfo = dyn_cast  (vinfo);
   enum vect_relevant relevance = STMT_VINFO_RELEVANT (stmt_info);
   bool ok;
-  gimple_seq pattern_def_seq;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: %G",
 stmt_info->stmt);
 
   if (gimple_has_volatile_ops (stmt_info

Re: [PATCH] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Kito Cheng
On Thu, Jul 10, 2025 at 5:31 PM Robin Dapp  wrote:
>
> >> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> >> index 6753b01db59..866aaf1e8a0 100644
> >> --- a/gcc/config/riscv/vector.md
> >> +++ b/gcc/config/riscv/vector.md
> >> @@ -1580,8 +1580,27 @@ (define_insn_and_split "*vec_duplicate"
> >>"&& 1"
> >>[(const_int 0)]
> >>{
> >> -riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
> >> -   riscv_vector::UNARY_OP, operands);
> >> +if (!strided_load_broadcast_p ()
> >> +   && TARGET_ZVFHMIN && !TARGET_ZVFH && mode == HFmode)
> >> +  {
> >> +   /* For Float16, load, convert to float, then broadcast and
> >> +  truncate.  */
> >> +   rtx tmpsf = gen_reg_rtx (SFmode);
> >> +   emit_insn (gen_extendhfsf2 (tmpsf, operands[1]));
> >> +   poly_uint64 nunits = GET_MODE_NUNITS (mode);
> >
> > This could be HF -> HI (bitcast), then a HI pred_broadcast, then a
> > bitcast back to an HF vector again; that would avoid introducing the
> > trunc here.  I would prefer to improve this, since the RVA23 profile
> > only mandates Zvfhmin, not Zvfh.
>
> Sure, I can do the reinterpret way.  But what's the issue with Zvfhmin?

Oh, I guess I didn't expand enough on my thinking:
I wouldn't care about bad performance/bad code gen here if Zvfh were
mandatory for RVA23, since that would mean not many people and cores
would fall into this code gen path.
But RVA23 only mandates Zvfhmin, so we will go down this path for years
until the next profile becomes the new baseline, which is why I think
it is worth spending time to improve it :)
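
For reference, a rough sketch of the reinterpret expansion being
discussed, using only helpers already visible in the patch (untested,
purely illustrative):

rtx tmphi = gen_reg_rtx (HImode);
emit_move_insn (tmphi, lowpart_subreg (HImode, operands[1], HFmode));
poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (operands[0]));
machine_mode vmodehi
  = riscv_vector::get_vector_mode (HImode, nunits).require ();
rtx tmpv = gen_reg_rtx (vmodehi);
rtx ops[] = {tmpv, tmphi};
riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (vmodehi),
			       riscv_vector::UNARY_OP, ops);
emit_move_insn (operands[0],
		lowpart_subreg (GET_MODE (operands[0]), tmpv, vmodehi));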

>
> --
> Regards
>  Robin
>


[PATCH v3 1/1] aarch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-07-10 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of
a builtin if at least one argument is a vector highpart and any
others are VECTOR_CSTs that we can cheaply extend to 128-bits.

This eliminates data movement instructions.  For example, we prefer
UMULL2 here over DUP+UMULL

uint16x8_t
foo (const uint8x16_t s)
{
const uint8x8_t f0 = vdup_n_u8 (4);
return vmull_u8 (vget_high_u8 (s), f0);
}
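
That is, after the fold the call behaves as if it had been written with
the highpart intrinsic directly (a hand-written equivalent, for
illustration):

#include <arm_neon.h>

uint16x8_t
foo_folded (const uint8x16_t s)
{
  /* The constant operand is widened to 128 bits so the highpart
     builtin can be used; no DUP of a 64-bit vector is needed.  */
  return vmull_high_u8 (s, vdupq_n_u8 (4));
}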

gcc/ChangeLog:
PR target/117850
* config/aarch64/aarch64-builtins.cc (LO_HI_PAIRINGS):  New
(aarch64_get_highpart_builtin):  New function.
(aarch64_v128_highpart_ref):  New function.  Helper to Look
for BIT_FIELD_REFs to the high 64 bits of 128-bit vectors.
(aarch64_build_vector_cst):  New function.  Build a new
VECTOR_CST from the elements of another.
(aarch64_fold_lo_call_to_hi):  New function.  Main logic
for the fold.
(aarch64_general_gimple_fold_builtin):  Add cases for the
pairs in aarch64-builtin-pairs.def.
* config/aarch64/aarch64-builtin-pairs.def: New file.

gcc/testsuite/ChangeLog:
PR target/117850
* gcc.target/aarch64/simd/vabal_combine.c: Removed.  This is
covered by fold_to_highpart_1.c
* gcc.target/aarch64/simd/fold_to_highpart_1.c: New test.
* gcc.target/aarch64/simd/fold_to_highpart_2.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_3.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_4.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_5.c: Likewise.
* gcc.target/aarch64/simd/fold_to_highpart_6.c: Likewise.
---
 gcc/config/aarch64/aarch64-builtin-pairs.def  |  73 ++
 gcc/config/aarch64/aarch64-builtins.cc| 183 +
 .../aarch64/simd/fold_to_highpart_1.c | 717 ++
 .../aarch64/simd/fold_to_highpart_2.c |  89 +++
 .../aarch64/simd/fold_to_highpart_3.c |  83 ++
 .../aarch64/simd/fold_to_highpart_4.c |  38 +
 .../aarch64/simd/fold_to_highpart_5.c |  92 +++
 .../aarch64/simd/fold_to_highpart_6.c |  37 +
 .../gcc.target/aarch64/simd/vabal_combine.c   |  72 --
 9 files changed, 1312 insertions(+), 72 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-builtin-pairs.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_6.c
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vabal_combine.c

diff --git a/gcc/config/aarch64/aarch64-builtin-pairs.def 
b/gcc/config/aarch64/aarch64-builtin-pairs.def
new file mode 100644
index 000..83cb0e2fe3a
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-builtin-pairs.def
@@ -0,0 +1,73 @@
+/* Pairings of AArch64 builtins that can be folded into each other.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* Pairs of single and half integer modes.  */
+#define LO_HI_PAIR_V_HSI(T, LO, HI) \
+  LO_HI_PAIR (T##_##LO##v2si, T##_##HI##v4si) \
+  LO_HI_PAIR (T##_##LO##v4hi, T##_##HI##v8hi)
+
+#define LO_HI_PAIR_V_US_HSI(T, LO, HI) \
+  LO_HI_PAIR_V_HSI (T, s##LO, s##HI) \
+  LO_HI_PAIR_V_HSI (T##U, u##LO, u##HI)
+
+/* Pairs of widenable integer modes.  */
+#define LO_HI_PAIR_V_WI(T, LO, HI) \
+  LO_HI_PAIR_V_HSI (T, LO, HI) \
+  LO_HI_PAIR (T##_##LO##v8qi, T##_##HI##v16qi)
+
+#define LO_HI_PAIR_V_US_WI(T, LO, HI) \
+  LO_HI_PAIR_V_WI (T, s##LO, s##HI) \
+  LO_HI_PAIR_V_WI (T##U, u##LO, u##HI)
+
+#define UNOP_LONG_LH_PAIRS \
+  LO_HI_PAIR (UNOP_sxtlv8hi,  UNOP_vec_unpacks_hi_v16qi) \
+  LO_HI_PAIR (UNOP_sxtlv4si,  UNOP_vec_unpacks_hi_v8hi) \
+  LO_HI_PAIR (UNOP_sxtlv2di,  UNOP_vec_unpacks_hi_v4si) \
+  LO_HI_PAIR (UNOPU_uxtlv8hi, UNOPU_vec_unpacku_hi_v16qi) \
+  LO_HI_PAIR (UNOPU_uxtlv4si, UNOPU_vec_unpacku_hi_v8hi) \
+  LO_HI_PAIR (UNOPU_uxtlv2di, UNOPU_vec_unpacku_hi_v4si)
+
+#define BINOP_LONG_LH_PAIRS \
+  LO_HI_PAIR_V_US_WI (BINOP,  addl, addl2) \
+  LO_HI_PAIR_V_US_WI (BINOP,  subl, subl2) \
+  LO_HI_PAI

[PATCH v3 0/1] aarch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-07-10 Thread Spencer Abson
Hi all,

This is V3 of a fix for PR117850.  V2 wasn't picked up or pinged by me as my
work changed quickly, so I've taken the opportunity here to fix it up a bit.
The major differences from V1 are based on the feedback given there.

There are two things that I'd like to note:

* This fold is more general than that required to fix PR117850, in that
it chooses the highpart builtin whenever all of the interesting 
arguments
are the highparts of vector variables.  This case is already (mostly)
handled by combine, so it might not be necessary to do this here.  The
real improvement that this fold makes is where we have vector highparts
and vector constants (cases that look like the PR).

* This code does not handle VOPs, which, IIUC, would need to be done
when folding builtins which satisfy aarch64_modifies_global_state_p or
aarch64_reads_global_state_p.  An example of where this might be useful
is for extending floating-point conversions.  But, as I say, I'm not
sure if this is worth it given combine can usually do this (the only
counter-example I've run into being vcvt_f32_bf16).

V1: 
https://inbox.sourceware.org/gcc-patches/20250217190739.1680451-2-spencer.ab...@arm.com/
PR117850: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117850

Bootstrapped & regtested on aarch64-linux-gnu.  OK for master?

Thanks,
Spencer

Spencer Abson (1):
  aarch64: Fold builtins with highpart args to highpart equivalent
[PR117850]

 gcc/config/aarch64/aarch64-builtin-pairs.def  |  73 ++
 gcc/config/aarch64/aarch64-builtins.cc| 183 +
 .../aarch64/simd/fold_to_highpart_1.c | 717 ++
 .../aarch64/simd/fold_to_highpart_2.c |  89 +++
 .../aarch64/simd/fold_to_highpart_3.c |  83 ++
 .../aarch64/simd/fold_to_highpart_4.c |  38 +
 .../aarch64/simd/fold_to_highpart_5.c |  92 +++
 .../aarch64/simd/fold_to_highpart_6.c |  37 +
 .../gcc.target/aarch64/simd/vabal_combine.c   |  72 --
 9 files changed, 1312 insertions(+), 72 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-builtin-pairs.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/fold_to_highpart_6.c
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vabal_combine.c

-- 
2.34.1



Re: [PATCH] gcov: Split atomic bitwise-or for some targets

2025-07-10 Thread Jeff Law




On 7/9/25 11:53 PM, Sebastian Huber wrote:

There are targets which only offer 32-bit atomic operations (for
example 32-bit RISC-V).  For these targets, split the 64-bit atomic
bitwise-or operation into two parts.
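
Loosely, the split amounts to the following (a minimal C sketch,
assuming a little-endian layout of the 64-bit counter; not the actual
gcov code):

#include <stdint.h>

static inline void
atomic_or64_split (uint32_t *parts, uint64_t mask)
{
  /* Two 32-bit atomic ORs replace one 64-bit atomic OR.  */
  __atomic_fetch_or (&parts[0], (uint32_t) mask, __ATOMIC_RELAXED);
  __atomic_fetch_or (&parts[1], (uint32_t) (mask >> 32), __ATOMIC_RELAXED);
}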

For this test case

int a(int i);
int b(int i);

int f(int i)
{
   if (i) {
 return a(i);
   } else {
 return b(i);
   }
}

with options

-O2 -fprofile-update=atomic -fcondition-coverage

the code generation to 64-bit vs. 32-bit RISC-V looks like:

 addia5,a5,%lo(.LANCHOR0)
 beq a0,zero,.L2
 li  a4,1
-   amoor.d zero,a4,0(a5)
-   addia5,a5,8
-   amoor.d zero,zero,0(a5)
+   amoor.w zero,a4,0(a5)
+   addia4,a5,4
+   amoor.w zero,zero,0(a4)
+   addia4,a5,8
+   amoor.w zero,zero,0(a4)
+   addia5,a5,12
+   amoor.w zero,zero,0(a5)
 taila
  .L2:
-   amoor.d zero,zero,0(a5)
+   amoor.w zero,zero,0(a5)
+   addia4,a5,4
+   amoor.w zero,zero,0(a4)
 li  a4,1
-   addia5,a5,8
-   amoor.d zero,a4,0(a5)
+   addia3,a5,8
+   amoor.w zero,a4,0(a3)
+   addia5,a5,12
+   amoor.w zero,zero,0(a5)
 tailb

Not related to this patch: even with -O2 the compiler generates
no-op atomics like

amoor.d zero,zero,0(a5)

and

amoor.w zero,zero,0(a5)

Would this be possible to filter out in instrument_decisions()?
Just to touch on the last issue.  We don't generally try that hard to 
optimize atomics, so I'm not terribly surprised to see that kind of dumb 
code.


If you've got a testcase for the nop-atomics, definitely get it filed as 
a bug.  My worry with those cases is we may not have the semantics 
exposed in RTL to allow for optimization, though it's more likely we 
have them in gimple.
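
(If the mask operand is a literal zero at instrumentation time, a guard
along these lines in instrument_decisions would suppress the no-op
update -- a hypothetical sketch, not actual GCC code:)

/* Skip the atomic OR entirely when the mask is known to be zero;
   or-ing zero into the counter cannot change it.  */
if (integer_zerop (mask))
  continue;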


Jeff


[PATCH] arm: Fix CMSE nonecure calls [PR 120977]

2025-07-10 Thread Christophe Lyon
As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685733.html
the operand of the call should be a mem rather than an unspec.

This patch moves the unspec to an additional argument of the parallel
and adjusts cmse_nonsecure_call_inline_register_clear accordingly.

The scan-rtl-dump in cmse-18.c needs a fix since we no longer emit the
'unspec' part.

In addition, I noticed that since arm_v8_1m_mve_ok is always true in
the context of the test (we know we support CMSE as per cmse.exp, and
arm_v8_1m_mve_ok finds the adequate options), we actually only use the
more permissive regex.  To improve that, the patch duplicates the
test, such that cmse-18.c forces -march=armv8-m.main+fp (so FPCXP is
disabled), and cmse-19.c forces -march=armv8.1-m.main+mve (so FPCXP is
enabled).  Each test uses the appropriate scan-rtl-dump, and also
checks we are using UNSPEC_NONSECURE_MEM (we need to remove -slim for
that).  The tests enable an FPU via -march so that the test passes
whether the testing harness forces -mfloat-abi or not.

2025-07-08  Christophe Lyon  

PR target/120977
gcc/
* config/arm/arm.md (call): Move unspec parameter to parallel.
(nonsecure_call_internal): Likewise.
(call_value): Likewise.
(nonsecure_call_value_internal): Likewise.
* config/arm/thumb1.md (nonsecure_call_reg_thumb1_v5): Likewise.
(nonsecure_call_value_reg_thumb1_v5): Likewise.
* config/arm/thumb2.md (nonsecure_call_reg_thumb2_fpcxt):
Likewise.
(nonsecure_call_reg_thumb2): Likewise.
(nonsecure_call_value_reg_thumb2_fpcxt): Likewise.
(nonsecure_call_value_reg_thumb2): Likewise.
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Likewise.

gcc/testsuite
* gcc.target/arm/cmse/cmse-18.c: Check only the case when FPCXT is
not enabled.
* gcc.target/arm/cmse/cmse-19.c: New test.
CI-tag:always-notify
---
 gcc/config/arm/arm.cc   |  5 +++--
 gcc/config/arm/arm.md   | 13 +++--
 gcc/config/arm/thumb1.md|  9 -
 gcc/config/arm/thumb2.md| 21 ++---
 gcc/testsuite/gcc.target/arm/cmse/cmse-18.c |  7 ---
 gcc/testsuite/gcc.target/arm/cmse/cmse-19.c | 14 ++
 6 files changed, 42 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/cmse/cmse-19.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index bde06f3fa86..5d6574761bc 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -18982,7 +18982,8 @@ cmse_nonsecure_call_inline_register_clear (void)
  call = SET_SRC (call);
 
  /* Check if it is a cmse_nonsecure_call.  */
- unspec = XEXP (call, 0);
+ unspec = XVECEXP (pat, 0, 2);
+
  if (GET_CODE (unspec) != UNSPEC
  || XINT (unspec, 1) != UNSPEC_NONSECURE_MEM)
continue;
@@ -19009,7 +19010,7 @@ cmse_nonsecure_call_inline_register_clear (void)
 
  /* Make sure the register used to hold the function address is not
 cleared.  */
- address = RTVEC_ELT (XVEC (unspec, 0), 0);
+ address = XEXP (call, 0);
  gcc_assert (MEM_P (address));
  gcc_assert (REG_P (XEXP (address, 0)));
  address_regnum = REGNO (XEXP (address, 0));
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5e5e1120e77..537a3e26a45 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8623,7 +8623,7 @@ (define_expand "call"
 if (detect_cmse_nonsecure_call (addr))
   {
pat = gen_nonsecure_call_internal (operands[0], operands[1],
-  operands[2]);
+  operands[2], const0_rtx);
emit_call_insn (pat);
   }
 else
@@ -8665,10 +8665,10 @@ (define_expand "call_internal"
  (clobber (reg:SI LR_REGNUM))])])
 
 (define_expand "nonsecure_call_internal"
-  [(parallel [(call (unspec:SI [(match_operand 0 "memory_operand")]
-  UNSPEC_NONSECURE_MEM)
+  [(parallel [(call (match_operand 0 "memory_operand")
(match_operand 1 "general_operand"))
  (use (match_operand 2 "" ""))
+ (unspec:SI [(match_operand 3)] UNSPEC_NONSECURE_MEM)
  (clobber (reg:SI LR_REGNUM))])]
   "use_cmse"
   {
@@ -8745,7 +8745,8 @@ (define_expand "call_value"
 if (detect_cmse_nonsecure_call (addr))
   {
pat = gen_nonsecure_call_value_internal (operands[0], operands[1],
-operands[2], operands[3]);
+operands[2], operands[3],
+const0_rtx);
emit_call_insn (pat);
   }
 else
@@ -8779,10 +8780,10 @@ (define_expand "call_value_internal"
 
 (define_expand "nonsecure_call_value_internal"
   [

RE: [PATCH] Reject single lane vector types for SLP build

2025-07-10 Thread Richard Biener
On Thu, 10 Jul 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, July 10, 2025 1:31 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Tamar Christina
> > ; RISC-V CI 
> > Subject: [PATCH] Reject single lane vector types for SLP build
> > 
> > The following makes us never consider vector(1) T types for
> > vectorization and ensures this during SLP build.  This is a
> > long-standing issue for BB vectorization and when we remove
> > early loop vector type setting we lose the single place we have
> > that rejects this for loops.
> > 
> > Once we implement partial loop vectorization we should revisit
> > this, but then use the original scalar types for the unvectorized
> > parts.
> 
> SGTM FWIW,
> 
> I was also wondering if I should start upstreaming my changes to
> get the vectorizer to recognize vector types as scalar types as well.
> 
> Or if you wanted me to wait until I have the lane representations
> more figured out.

I think if we can restrict things to cases that have a strong
overlap with what we intend to use in the end that sounds good.
Like allow only a single "scalar" vector def per SLP node for now
and simply stash that into the scalar-stmts array.  In the end
we'd want to allow mixed scalar and vector defs there.

It does require altering code that expects to get at actual _scalar_
defs for each lane, but I don't think that's much code.

Richard.

> Regards,
> Tamar
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll see
> > if there's any surprises from the CI, but otherwise I'll go
> > ahead with this.
> > 
> > Richard.
> > 
> > * tree-vect-slp.cc (vect_build_slp_tree_1): Reject
> > single-lane vector types.
> > ---
> >  gcc/tree-vect-slp.cc | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index ad75386926a..d2ce4ffaa4f 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char
> > *swap,
> >matches[0] = false;
> >return false;
> >  }
> > +  if (known_le (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"Build SLP failed: not using single lane "
> > +"vector type %T\n", vectype);
> > +  matches[0] = false;
> > +  return false;
> > +}
> >/* Record nunits required but continue analysis, producing matches[]
> >   as if nunits was not an issue.  This allows splitting of groups
> >   to happen.  */
> > --
> > 2.43.0
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 2/2] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Jakub Jelinek
On Thu, Jul 10, 2025 at 05:27:50PM +, Qing Zhao wrote:
> ACCESS_MODE is only for a future work to reimplement the attribute
> access with the internal function .ACCESS_WITH_SIZE. 
> 
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute
> 
> For the current “counted_by”, this flag is not used at all.  Therefore only 0 
> is passed. 
> 
> Hope this is clear. 

Ok, the patch is ok then with ChangeLog fixed.

Jakub



Re: [PATCH 2/2] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Qing Zhao


> On Jul 10, 2025, at 13:27, Qing Zhao  wrote:
> 
> 
> 
>> On Jul 10, 2025, at 12:56, Jakub Jelinek  wrote:
>> 
>> On Thu, Jul 10, 2025 at 04:03:30PM +, Qing Zhao wrote:
>>> gcc/c-family/ChangeLog:
>>> 
>>> * c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
>>> of the arguments per the new design.
>>> 
>>> gcc/c/ChangeLog:
>>> 
>>> * c-typeck.cc (build_counted_by_ref): Update comments.
>>> (build_access_with_size_for_counted_by): Adjust the arguments per
>>> the new design.
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * internal-fn.cc (expand_DEFERRED_INIT): Update comments.
>>> * internal-fn.def (DEFERRED_INIT): Update comments.
>>> * tree-object-size.cc (addr_object_size): Update comments.
>>> (access_with_size_object_size): Adjust the arguments per the new
>>> design.
>> 
>> Similar comment about ChangeLog entries as on the previous patch.
> 
> Sure, will check the ChangeLog and update accordingly.
> 
>> 
>> I see only code passing 0 as the third argument (with the right type),
>> so don't see why it is documented to be 0/1/2/3.  Or is that something
>> done in the patch that got reverted?
> 
> ACCESS_MODE is only for a future work to reimplement the attribute
> access with the internal function .ACCESS_WITH_SIZE. 
> 
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute
> 
> For the current “counted_by”, this flag is not used at all.  Therefore only 0 
> is passed. 

One more note here, previously, ACCESS_MODE has 5 values:
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write

For counted_by, I passed “-1” to the .ACCESS_WITH_SIZE.

With the new design, ACCESS_MODE has only 4 values (I think that the -1 is not 
necessary, 0 should be enough)

0: none
1: read_only
2: write_only
3: read_write

For counted_by, I passed “0” to the .ACCESS_WITH_SIZE.

Later, when we reimplement the “access” attribute by using .ACCESS_WITH_SIZE, 
the ACCESS_MODE will be used
At that time. 

What do you think?

thanks.

Qing
> 
> Hope this is clear. 
>> 
>> Jakub




[To-commit][PATCH v2 2/2] Reduce the # of arguments of .ACCESS_WITH_SIZE from 6 to 4.

2025-07-10 Thread Qing Zhao
This is the 2nd version of the patch.

Updated the ChangeLog per Jakub's comments.

I will commit this version soon.

thanks.

Qing



This is an improvement to the design of internal function .ACCESS_WITH_SIZE.

Currently, the .ACCESS_WITH_SIZE is designed as:

   ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
 TYPE_OF_SIZE, ACCESS_MODE, TYPE_SIZE_UNIT for element)
   which returns the REF_TO_OBJ same as the 1st argument;

   1st argument REF_TO_OBJ: The reference to the object;
   2nd argument REF_TO_SIZE: The reference to the size of the object,
   3rd argument CLASS_OF_SIZE: The size referenced by the REF_TO_SIZE represents
 0: the number of bytes.
 1: the number of the elements of the object type;
   4th argument TYPE_OF_SIZE: A constant 0 with its TYPE being the same as the
 TYPE of the object referenced by REF_TO_SIZE
   5th argument ACCESS_MODE:
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write
   6th argument: The TYPE_SIZE_UNIT of the element TYPE of the FAM when 3rd
  argument is 1. NULL when 3rd argument is 0.

Among the 6 arguments:
 A. The 3rd argument CLASS_OF_SIZE is not needed. If the REF_TO_SIZE represents
the number of bytes, simply pass 1 to the TYPE_SIZE_UNIT argument.
 B. The 4th and the 5th arguments can be combined into 1 argument, whose TYPE
represents the TYPE_OF_SIZE, and the constant value represents the
ACCESS_MODE.

As a result, the new design of the .ACCESS_WITH_SIZE is:

   ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE,
 TYPE_OF_SIZE + ACCESS_MODE, TYPE_SIZE_UNIT for element)
   which returns the REF_TO_OBJ same as the 1st argument;

   1st argument REF_TO_OBJ: The reference to the object;
   2nd argument REF_TO_SIZE: The reference to the size of the object,
   3rd argument TYPE_OF_SIZE + ACCESS_MODE: An integer constant with a pointer
 TYPE.
 The pointee TYPE of the pointer TYPE is the TYPE of the object referenced
by REF_TO_SIZE.
 The integer constant value represents the ACCESS_MODE:
0: none
1: read_only
2: write_only
3: read_write
   4th argument: The TYPE_SIZE_UNIT of the element TYPE of the array.
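
To make the new encoding concrete, consider a hypothetical counted_by
struct (names illustrative; the call shown is a sketch, not literal
GIMPLE):

struct pkt {
  int n;
  int data[] __attribute__ ((counted_by (n)));
};

/* An access to p->data is conceptually rewritten to:

     .ACCESS_WITH_SIZE (&p->data, &p->n, (int *) 0, 4)

   3rd argument: the pointer type "int *" encodes TYPE_OF_SIZE == int
   (the type of n), and the constant value 0 encodes ACCESS_MODE == none;
   4th argument: TYPE_SIZE_UNIT of the element type, 4 for int.  */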

gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): Adjust the position
of the arguments per the new design.

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Update comments.
Adjust the arguments per the new design.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): Update comments.
* internal-fn.def (ACCESS_WITH_SIZE): Update comments.
* tree-object-size.cc (access_with_size_object_size): Update comments.
Adjust the arguments per the new design.
---
 gcc/c-family/c-ubsan.cc | 10 ++
 gcc/c/c-typeck.cc   | 18 +-
 gcc/internal-fn.cc  | 28 +---
 gcc/internal-fn.def |  2 +-
 gcc/tree-object-size.cc | 34 +-
 5 files changed, 38 insertions(+), 54 deletions(-)

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 78b78685469..a4dc31066af 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -397,8 +397,7 @@ get_bound_from_access_with_size (tree call)
 return NULL_TREE;
 
   tree ref_to_size = CALL_EXPR_ARG (call, 1);
-  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
-  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree type = TREE_TYPE (TREE_TYPE (CALL_EXPR_ARG (call, 2)));
   tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
   build_int_cst (ptr_type_node, 0));
   /* If size is negative value, treat it as zero.  */
@@ -410,12 +409,7 @@ get_bound_from_access_with_size (tree call)
build_zero_cst (type), size);
   }
 
-  /* Only when class_of_size is 1, i.e, the number of the elements of
- the object type, return the size.  */
-  if (class_of_size != 1)
-return NULL_TREE;
-  else
-size = fold_convert (sizetype, size);
+  size = fold_convert (sizetype, size);
 
   return size;
 }
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index de3d6c78db8..9a5eb0da3a1 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2982,7 +2982,7 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, (* TYPE_OF_SIZE)0,
TYPE_SIZE_UNIT for element)
 
NOTE: The return type of this function is the POINTER type pointing
@@ -2992,11 +2992,11 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The type of the first argument of this function is a POINTER type
to the original flexible array type.
 
-   The 4th argument of the call is a constant 0

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-10 Thread Soumya AR


> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 1 Jul 2025, at 17:36, Richard Sandiford  wrote:
>> 
>> Soumya AR  writes:
>>> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
>>> From: Soumya AR 
>>> Date: Mon, 30 Jun 2025 12:17:30 -0700
>>> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with
>>> RCPC2
>>> 
>>> This patch adds the ability to fold the address computation into the 
>>> addressing
>>> mode for LDAPR instructions using LDAPUR when RCPC2 is available.
>>> 
>>> LDAPUR emission is controlled by the tune flag enable_ldapur, to enable it 
>>> on a
>>> per-core basis. Earlier, the following code:
>>> 
>>> uint64_t
>>> foo (std::atomic<uint64_t> *x)
>>> {
>>> return x[1].load(std::memory_order_acquire);
>>> }
>>> 
>>> would generate:
>>> 
>>> foo(std::atomic*):
>>> add x0, x0, 8
>>> ldapr   x0, [x0]
>>> ret
>>> 
>>> but now generates:
>>> 
>>> foo(std::atomic*):
>>> ldapur  x0, [x0, 8]
>>> ret
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Soumya AR 
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
>>> Add the enable_ldapur flag to control LDAPUR emission.
>>> * config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
>>> * config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
>>> * config/aarch64/atomics.md: (aarch64_atomic_load_rcpc): Modify
>>> to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
>>> (*aarch64_atomic_load_rcpc_zext): Likewise.
>>> (*aarch64_atomic_load_rcpc_sext): Modified to emit LDAPURS
>>> for addressing with offsets.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> * gcc.target/aarch64/ldapur.c: New test.
>> 
>> Thanks for doing this.  It generally looks good, but a couple of comments
>> below:
>> 
>>> ---
>>> gcc/config/aarch64/aarch64-tuning-flags.def |  2 +
>>> gcc/config/aarch64/aarch64.h|  5 ++
>>> gcc/config/aarch64/aarch64.md   | 11 +++-
>>> gcc/config/aarch64/atomics.md   | 22 +---
>>> gcc/testsuite/gcc.target/aarch64/ldapur.c   | 61 +
>>> 5 files changed, 92 insertions(+), 9 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldapur.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
>>> b/gcc/config/aarch64/aarch64-tuning-flags.def
>>> index f2c916e9d77..5bf54165306 100644
>>> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>>> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>>> @@ -44,6 +44,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
>>> AVOID_CROSS_LOOP_FMA)
>>> 
>>> AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
>>> 
>>> +AARCH64_EXTRA_TUNING_OPTION ("enable_ldapur", ENABLE_LDAPUR)
>>> +
>> 
>> Let's see what others say, but personally, I think this would be better
>> as an opt-out, such as avoid_ldapur.  The natural default seems to be to use
>> the extra addressing capacity when it's available and have CPUs explicitly
>> flag when they don't want that.
>> 
>> A good, conservatively correct, default would probably be to add avoid_ldapur
>> to every *current* CPU that includes rcpc2 and then separately remove it
>> from those that are known not to need it.  In that sense, it's more work
>> for current CPUs than the current patch, but it should ease the impact
>> on future CPUs.
> 
> LLVM used to do this folding by default everywhere until it was discovered 
> that it hurts various CPUs.
> So they’ve taken the approach you describe, and disable the folding 
> explicitly for:
> neoverse-v2 neoverse-v3 cortex-x3 cortex-x4 cortex-x925 
> I don’t know for sure if those are the only CPUs where this applies.
> They also disable the folding for generic tuning when -march is between 
> armv8.4 - armv8.7/armv9.2.
> I guess we can do the same in GCC.

Thanks for your suggestions, Richard and Kyrill.

I've updated the patch to use avoid_ldapur.

There's now an explicit override in aarch64_override_options_internal to use 
avoid_ldapur for armv8.4 through armv8.7. 

I added it here because aarch64_adjust_generic_arch_tuning is only called for 
generic_tunings and not generic_armv{8,9}_a_tunings.

Let me know what you think.

Thanks,
Soumya

> Thanks,
> Kyrill
> 
>> 
>>> /* Enable is the target prefers to use a fresh register for predicate 
>>> outputs
>>>   rather than re-use an input predicate register.  */
>>> AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
>>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>>> index e8bd8c73c12..08ad8350fbc 100644
>>> --- a/gcc/config/aarch64/aarch64.h
>>> +++ b/gcc/config/aarch64/aarch64.h
>>> @@ -490,6 +490,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE 
>>> ATTRIBUTE_UNUSED
>>>  (bool (aarch64_tune_params.extra_tuning_flags \
>>> & AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE))
>>> 
>>> +/* Enable folding address computation into LDAPUR when

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Richard Sandiford
Kyrylo Tkachov  writes:
> Hi all,
>
> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
> due to its tied operands, the destination of the movprfx cannot be also
> a source operand. But the offending pattern in aarch64-sve2.md tries
> to do exactly that for the "=?&w,w,w" alternative and gas warns for the
> attached testcase.
>
> This patch just removes that alternative causing RA to emit a normal extra
> move.
> So for the testcase in the patch we now generate:
> nor_z:
>   nbsl z1.d, z1.d, z2.d, z1.d
>   mov z0.d, z1.d
>   ret
>
> instead of the previous:
> nor_z:
>   movprfx z0, z1
>   nbsl z0.d, z0.d, z2.d, z0.d
>   ret
>
> which generated a gas warning.

Shouldn't we instead change it to:

 [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %1.d

?  The "&" ensures that %1 is still valid in the NBSL.

(That's OK if it works.)
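
For the testcase in the patch that should then give something like
(expected output, untested):

nor_z:
	movprfx	z0, z1
	nbsl	z0.d, z0.d, z2.d, z1.d
	ret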

Thanks,
Richard

> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
> Do we want to backport it?
>
> Thanks,
> Kyrill
>
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
> PR target/120999
>   * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
>   Remove movprfx alternative.
>
> gcc/testsuite/
>
>   PR target/120999
>   * gcc.target/aarch64/sve2/pr120999.c: New test.
>
> From bd24ce298461ee8129befda1983acf1b37a7215a Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov 
> Date: Wed, 9 Jul 2025 10:04:01 -0700
> Subject: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL
>  implementation of NOR
>
> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
> due to its tied operands, the destination of the movprfx cannot be also
> a source operand.  But the offending pattern in aarch64-sve2.md tries
> to do exactly that for the "=?&w,w,w" alternative and gas warns for the
> attached testcase.
>
> This patch just removes that alternative causing RA to emit a normal extra
> move.
> So for the testcase in the patch we now generate:
> nor_z:
> nbslz1.d, z1.d, z2.d, z1.d
> mov z0.d, z1.d
> ret
>
> instead of the previous:
> nor_z:
> movprfx z0, z1
> nbslz0.d, z0.d, z2.d, z0.d
> ret
>
> which generated a gas warning.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Signed-off-by: Kyrylo Tkachov 
>
> gcc/
>
>   PR target/120999
>   * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
>   Remove movprfx alternative.
>
> gcc/testsuite/
>
>   PR target/120999
>   * gcc.target/aarch64/sve2/pr120999.c: New test.
> ---
>  gcc/config/aarch64/aarch64-sve2.md   | 11 ---
>  gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c | 15 +++
>  2 files changed, 19 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
> b/gcc/config/aarch64/aarch64-sve2.md
> index 15714712d3b..504ba6fc39b 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -1616,20 +1616,17 @@
>  
>  ;; Use NBSL for vector NOR.
>  (define_insn_and_rewrite "*aarch64_sve2_nor"
> -  [(set (match_operand:SVE_FULL_I 0 "register_operand")
> +  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
>   (unspec:SVE_FULL_I
> [(match_operand 3)
>  (and:SVE_FULL_I
>(not:SVE_FULL_I
> -(match_operand:SVE_FULL_I 1 "register_operand"))
> +(match_operand:SVE_FULL_I 1 "register_operand" "%0"))
>(not:SVE_FULL_I
> -(match_operand:SVE_FULL_I 2 "register_operand")))]
> +(match_operand:SVE_FULL_I 2 "register_operand" "w")))]
> UNSPEC_PRED_X))]
>"TARGET_SVE2"
> -  {@ [ cons: =0 , %1 , 2 ; attrs: movprfx ]
> - [ w, 0  , w ; *  ] nbsl\t%0.d, %0.d, %2.d, %0.d
> - [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, 
> %0.d, %2.d, %0.d
> -  }
> +  "nbsl\t%0.d, %0.d, %2.d, %0.d"
>"&& !CONSTANT_P (operands[3])"
>{
>  operands[3] = CONSTM1_RTX (mode);
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c 
> b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
> new file mode 100644
> index 000..1cdfa4107ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
> @@ -0,0 +1,15 @@
> +/* PR target/120999.  */
> +/* { dg-do assemble } */
> +/* { dg-options "-O2 --save-temps" } */
> +
> +/* We shouldn't be generating a MOVPRFX form of NBSL here or we'll get
> +   an assembler warning.  */
> +
> +#include 
> +
> +#define NOR(x, y)   (~((x) | (y)))
> +
> +svuint64_t nor_z(svuint64_t c, svuint64_t a, svuint64_t b) { return NOR(a, 
> b); }
> +
> +/* { dg-final { scan-assembler-not {\tmovprfx} } } */
> +


[r16-2084 Regression] FAIL: 23_containers/forward_list/debug/move_neg.cc -std=gnu++17 (test for excess errors) on Linux/x86_64

2025-07-10 Thread haochen.jiang
On Linux/x86_64,

2fd6f42c17a8040dbd3460ca34d93695dacf8575 is the first bad commit
commit 2fd6f42c17a8040dbd3460ca34d93695dacf8575
Author: François Dumont 
Date:   Thu Mar 27 19:02:59 2025 +0100

libstdc++: Make debug iterator pointer sequence const [PR116369]

caused

FAIL: 23_containers/forward_list/cons/self_move.cc  -std=gnu++17 (test for 
excess errors)
FAIL: 23_containers/forward_list/debug/construct4_neg.cc  -std=gnu++17 (test 
for excess errors)
FAIL: 23_containers/forward_list/debug/move_assign_neg.cc  -std=gnu++17 (test 
for excess errors)
FAIL: 23_containers/forward_list/debug/move_neg.cc  -std=gnu++17 (test for 
excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users3/haochenj/src/gcc-bisect/master/master/r16-2084/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/cons/self_move.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/cons/self_move.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/cons/self_move.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/cons/self_move.cc 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/construct4_neg.cc
 --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/construct4_neg.cc
 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/construct4_neg.cc
 --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/construct4_neg.cc
 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_assign_neg.cc
 --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_assign_neg.cc
 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_assign_neg.cc
 --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_assign_neg.cc
 --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_neg.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_neg.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_neg.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/forward_list/debug/move_neg.cc 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r16-2103 Regression] FAIL: gcc.dg/guality/pr41447-1.c -Os -DPREVENT_OPTIMIZATION execution test on Linux/x86_64

2025-07-10 Thread haochen.jiang
On Linux/x86_64,

ad2bab693f74cad239615ba8725a691d435b3a97 is the first bad commit
commit ad2bab693f74cad239615ba8725a691d435b3a97
Author: Richard Biener 
Date:   Tue Jul 8 13:46:01 2025 +0200

Avoid IPA opts around guality plumbing

caused

FAIL: gcc.dg/guality/pr41447-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
FAIL: gcc.dg/guality/pr41447-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41447-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41447-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution 
test
FAIL: gcc.dg/guality/pr41447-1.c   -Os  -DPREVENT_OPTIMIZATION  execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users3/haochenj/src/gcc-bisect/master/master/r16-2103/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41447-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41447-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH] aarch64: Add support for NVIDIA GB10

2025-07-10 Thread Kyrylo Tkachov


> On 18 Jun 2025, at 17:26, Kyrylo Tkachov  wrote:
> 
> Hi all,
> 
> This adds support for -mcpu=gb10. This is a big.LITTLE configuration
> involving Cortex-X925 and Cortex-A725 cores. The appropriate MIDR numbers
> are added to detect them in -mcpu=native. We did not add an
> -mcpu=cortex-x925.cortex-a725 option because GB10 does include the crypto
> instructions which we want on by default, and the current convention is to not
> enable such extensions for Arm Cortex cores in -mcpu where they are optional
> in the IP.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> I’m leaving it up for comments as I’m away for the next week and a half.
> I’ll push it to trunk and GCC 15 when I’m back.

Pushed to GCC 15 now after a bootstrap and test run.
Thanks,
Kyrill

> 
> Thanks,
> Kyrill
> 
> Signed-off-by: Kyrylo Tkachov 
> 
> gcc/
> 
> * config/aarch64/aarch64-cores.def (gb10): New entry.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/invoke.texi (AArch64 Options): Document the above.
> 
> <0001-aarch64-Add-support-for-NVIDIA-GB10.patch>



Re: [PATCH] Change bellow in comments to below

2025-07-10 Thread Jakub Jelinek
On Thu, Jul 10, 2025 at 07:26:52AM +, Kyrylo Tkachov wrote:
> … “Tunning” looks like a typo as well, should likely be “Tuning”.

You're right, but because, as often happens, it occurs in another place as
well, I've committed this separately as obvious.

Thanks for finding this.

2025-07-10  Jakub Jelinek  

* config/i386/x86-tune.def: Change "Tunning the" to "tuning" in
comment and use semicolon instead of dot in comment.
* loop-unroll.cc (decide_unroll_stupid): Comment spelling fix,
tunning -> tuning.

--- gcc/config/i386/x86-tune.def.jj	2025-07-10 10:16:37.724549554 +0200
+++ gcc/config/i386/x86-tune.def	2025-07-10 10:20:38.687404068 +0200
@@ -31,7 +31,7 @@ see the files COPYING3 and COPYING.RUNTI
- Updating ix86_issue_rate and ix86_adjust_cost in i386.md
- possibly updating ia32_multipass_dfa_lookahead, ix86_sched_reorder
  and ix86_sched_init_global if those tricks are needed.
-- Tunning the flags below. Those are split into sections and each
+- tuning flags below; those are split into sections and each
   section is very roughly ordered by importance.  */
 
 /*************************************************************************/
--- gcc/loop-unroll.cc.jj   2025-05-20 08:14:06.251408343 +0200
+++ gcc/loop-unroll.cc  2025-07-10 10:21:00.281122186 +0200
@@ -1185,7 +1185,7 @@ decide_unroll_stupid (class loop *loop,
 
   /* Do not unroll loops with branches inside -- it increases number
  of mispredicts.
- TODO: this heuristic needs tunning; call inside the loop body
+ TODO: this heuristic needs tuning; call inside the loop body
  is also relatively good reason to not unroll.  */
   if (num_loop_branches (loop) > 1)
 {


Jakub



[PATCH] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Robin Dapp

Hi,

This patch makes the zero-stride load broadcast idiom dependent on a
uarch-tunable "use_zero_stride_load".  Right now we have quite a few
paths that reach a strided load and some of them are not exactly
straightforward.

While broadcast is relatively rare on rv64 targets it is more common on
rv32 targets that want to vectorize 64-bit elements.
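
As an illustration of the rv32 case (a sketch of mine, not part of the
patch): vectorizing the 64-bit broadcast below needs either a zero-stride
load or the sew64_scalar_helper path, since the element does not fit in a
single 32-bit GPR.

/* Sketch, assuming an rv32gcv target: splatting the 64-bit X can become
   a zero-stride vlse64.v from X's stack slot when the tunable allows it,
   instead of synthesizing the value from two 32-bit GPRs.  */
void
broadcast_u64 (unsigned long long *out, unsigned long long x, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = x;
}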

While the patch is more involved than I would have liked it could have
even touched more places.  The whole broadcast-like insn path feels a
bit hackish due to the several optimizations we employ.  Some of the
complications stem from the fact that we lump together real broadcasts,
vector single-element sets, and strided broadcasts.  The strided-load
alternatives currently require a memory_constraint to work properly
which causes more complications when trying to disable just these.

In short, the whole pred_broadcast handling in combination with the
sew64_scalar_helper could use work in the future.  I was about to start
with it in this patch but soon realized that it would only distract from
the original intent.  What can help in the future is to split strided and
non-strided broadcast entirely, as well as the single-element sets.

It is not yet clear whether we need to pay special attention to misaligned
strided loads (PR120782).

I regtested on rv32 and rv64 with strided_load_broadcast_p forced to
true and false.  With either I didn't observe any new execution failures
but obviously there are new scan failures with strided broadcast turned
off.  It can't hurt to have the CI test everything again.

Regards
Robin

PR target/118734

gcc/ChangeLog:

* config/riscv/constraints.md (Wdm): Use tunable for Wdm
constraint.
* config/riscv/riscv-protos.h (emit_avltype_insn): Declare.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this.
(strided_load_broadcast_p): Declare.
* config/riscv/predicates.md: Use renamed function.
* config/riscv/riscv-selftests.cc (run_broadcast_selftests):
Only run broadcast selftest if strided broadcasts are OK.
* config/riscv/riscv-v.cc (emit_avltype_insn): New function.
(sew64_scalar_helper): Only emit a pred_broadcast if the new
tunable says so.
(can_be_broadcasted_p): Rename to...
(can_be_broadcast_p): ...this and use new tunable.
* config/riscv/riscv.cc (struct riscv_tune_param): Add strided
broadcast tunable.
(strided_load_broadcast_p): Implement.
* config/riscv/vector.md: Use strided_load_broadcast_p () and
work around 64-bit broadcast on rv32 targets.
---
gcc/config/riscv/constraints.md |  7 +--
gcc/config/riscv/predicates.md  |  2 +-
gcc/config/riscv/riscv-protos.h |  4 +-
gcc/config/riscv/riscv-selftests.cc | 10 ++--
gcc/config/riscv/riscv-v.cc | 58 +++
gcc/config/riscv/riscv.cc   | 20 
gcc/config/riscv/vector.md  | 71 ++---
7 files changed, 138 insertions(+), 34 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index ccab1a2e29d..5ecaa19eb01 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -237,10 +237,11 @@ (define_constraint "Wb1"
 (and (match_code "const_vector")
  (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE 
(op)))")))

-(define_memory_constraint "Wdm"
+(define_constraint "Wdm"
  "Vector duplicate memory operand"
-  (and (match_code "mem")
-   (match_code "reg" "0")))
+  (and (match_test "strided_load_broadcast_p ()")
+   (and (match_code "mem")
+   (match_code "reg" "0"))))

;; Vendor ISA extension constraints.

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8baad2fae7a..1f9a6b562e5 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -617,7 +617,7 @@ (define_special_predicate "vector_any_register_operand"

;; The scalar operand can be directly broadcast by RVV instructions.
(define_predicate "direct_broadcast_operand"
-  (match_test "riscv_vector::can_be_broadcasted_p (op)"))
+  (match_test "riscv_vector::can_be_broadcast_p (op)"))

;; A CONST_INT operand that has exactly two bits cleared.
(define_predicate "const_nottwobits_operand"
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 38f63ea8424..a41c4c299fa 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -604,6 +604,7 @@ void emit_vlmax_vsetvl (machine_mode, rtx);
void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_insn (unsigned, unsigned, rtx *);
void emit_nonvlmax_insn (unsigned, unsigned, rtx *, rtx);
+void emit_avltype_insn (unsigned, unsigned, rtx *, avl_type, rtx = nullptr);
void emit_vlmax_insn_lra (unsigned, unsigned, rtx *, rtx);
enum vlmul_type get_vlmul (machine_mode);
rtx get_vlmax_rtx (machine_mode);
@@ -760,7 +761,7 @@ uint8_t get_sew 

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov


> On 10 Jul 2025, at 10:40, Richard Sandiford  wrote:
> 
> Kyrylo Tkachov  writes:
>> Hi all,
>> 
>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>> due to its tied operands, the destination of the movprfx cannot be also
>> a source operand. But the offending pattern in aarch64-sve2.md tries
>> to do exactly that for the "=?&w,w,w" alternative and gas warns for the
>> attached testcase.
>> 
>> This patch just removes that alternative causing RA to emit a normal extra
>> move.
>> So for the testcase in the patch we now generate:
>> nor_z:
>> nbsl z1.d, z1.d, z2.d, z1.d
>> mov z0.d, z1.d
>> ret
>> 
>> instead of the previous:
>> nor_z:
>> movprfx z0, z1
>> nbsl z0.d, z0.d, z2.d, z0.d
>> ret
>> 
>> which generated a gas warning.
> 
> Shouldn't we instead change it to:
> 
> [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, 
> %2.d, %1.d
> 
> ?  The "&" ensures that %1 is still valid in the NBSL.
> 
> (That's OK if it works.)

Yes, that seems to work, thanks.
I’ll push this version after some more testing.

Kyrill
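
For reference, a sketch of the alternative block with that suggestion
applied; the final NBSL operand of the movprfx alternative reads %1.d, so
the prefixed destination is no longer also a source:

  {@ [ cons: =0 , %1 , 2 ; attrs: movprfx ]
     [ w        , 0  , w ; *              ] nbsl\t%0.d, %0.d, %2.d, %0.d
     [ ?&w      , w  , w ; yes            ] movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %1.d
  }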

> 
> Thanks,
> Richard
> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Ok for trunk?
>> Do we want to backport it?
>> 
>> Thanks,
>> Kyrill
>> 
>> 
>> Signed-off-by: Kyrylo Tkachov 
>> 
>> gcc/
>> 
>> PR target/120999
>> * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
>> Remove movprfx alternative.
>> 
>> gcc/testsuite/
>> 
>> PR target/120999
>> * gcc.target/aarch64/sve2/pr120999.c: New test.
>> 
>> From bd24ce298461ee8129befda1983acf1b37a7215a Mon Sep 17 00:00:00 2001
>> From: Kyrylo Tkachov 
>> Date: Wed, 9 Jul 2025 10:04:01 -0700
>> Subject: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL
>> implementation of NOR
>> 
>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>> due to its tied operands, the destination of the movprfx cannot be also
>> a source operand.  But the offending pattern in aarch64-sve2.md tries
>> to do exactly that for the "=?&w,w,w" alternative and gas warns for the
>> attached testcase.
>> 
>> This patch just removes that alternative causing RA to emit a normal extra
>> move.
>> So for the testcase in the patch we now generate:
>> nor_z:
>>      nbsl    z1.d, z1.d, z2.d, z1.d
>>      mov     z0.d, z1.d
>>      ret
>> 
>> instead of the previous:
>> nor_z:
>>      movprfx z0, z1
>>      nbsl    z0.d, z0.d, z2.d, z0.d
>>      ret
>> 
>> which generated a gas warning.
>> 
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> 
>> Signed-off-by: Kyrylo Tkachov 
>> 
>> gcc/
>> 
>> PR target/120999
>> * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nor):
>> Remove movprfx alternative.
>> 
>> gcc/testsuite/
>> 
>> PR target/120999
>> * gcc.target/aarch64/sve2/pr120999.c: New test.
>> ---
>> gcc/config/aarch64/aarch64-sve2.md   | 11 ---
>> gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c | 15 +++
>> 2 files changed, 19 insertions(+), 7 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
>> b/gcc/config/aarch64/aarch64-sve2.md
>> index 15714712d3b..504ba6fc39b 100644
>> --- a/gcc/config/aarch64/aarch64-sve2.md
>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>> @@ -1616,20 +1616,17 @@
>> 
>> ;; Use NBSL for vector NOR.
>> (define_insn_and_rewrite "*aarch64_sve2_nor<mode>"
>> -  [(set (match_operand:SVE_FULL_I 0 "register_operand")
>> +  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
>> (unspec:SVE_FULL_I
>>   [(match_operand 3)
>>(and:SVE_FULL_I
>>  (not:SVE_FULL_I
>> -(match_operand:SVE_FULL_I 1 "register_operand"))
>> +(match_operand:SVE_FULL_I 1 "register_operand" "%0"))
>>  (not:SVE_FULL_I
>> -(match_operand:SVE_FULL_I 2 "register_operand")))]
>> +(match_operand:SVE_FULL_I 2 "register_operand" "w")))]
>>   UNSPEC_PRED_X))]
>>   "TARGET_SVE2"
>> -  {@ [ cons: =0 , %1 , 2 ; attrs: movprfx ]
>> - [ w, 0  , w ; *  ] nbsl\t%0.d, %0.d, %2.d, %0.d
>> - [ ?&w  , w  , w ; yes] movprfx\t%0, %1\;nbsl\t%0.d, 
>> %0.d, %2.d, %0.d
>> -  }
>> +  "nbsl\t%0.d, %0.d, %2.d, %0.d"
>>   "&& !CONSTANT_P (operands[3])"
>>   {
>> operands[3] = CONSTM1_RTX (<MODE>mode);
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>> new file mode 100644
>> index 000..1cdfa4107ae
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr120999.c
>> @@ -0,0 +1,15 @@
>> +/* PR target/120999.  */
>> +/* { dg-do assemble } */
>> +/* { dg-options "-O2 --save-temps" } */
>> +
>> +/* We shouldn't be generating a MOVPRFX form of NBSL here or we'll get
>> +   an assembler warning.  */
>> +
>> +#include <arm_sve.h>
>> +
>> +#define NOR(x, y)   (~((x) | (y)))
>> +
>> +svuint64_t nor_z(svuint64_t c, svuint64_t a, svuint64_t b) { return NOR(a, 
>> b); }
>> +
>> +/* { dg-final { scan-assembler-not {\tmovprfx} } } */
>> +





Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-10 Thread Jonathan Wakely
On Wed, 9 Jul 2025 at 20:21, Björn Schäpers wrote:
>
> Am 09.07.2025 um 16:16 schrieb Jonathan Wakely:
> > On Wed, 9 Jul 2025 at 15:13, Jonathan Wakely  wrote:
> >>
> >> On Tue, 8 Jul 2025 at 21:47, Björn Schäpers  wrote:
> >>> index 9923d14b7a7..bfbba077b92 100644
> >>> --- a/libstdc++-v3/src/c++20/tzdb.cc
> >>> +++ b/libstdc++-v3/src/c++20/tzdb.cc
> >>> @@ -48,6 +48,8 @@
> >>>   # define WIN32_LEAN_AND_MEAN
> >>>   # include 
> >>>   # include 
> >>> +
> >>> +# include 
> >>>   #endif
> >>>
> >>>   #if defined __GTHREADS && ATOMIC_POINTER_LOCK_FREE == 2
> >>> @@ -1768,6 +1770,98 @@ namespace std::chrono
> >>>
> >>> return nullptr; // not found
> >>>   }
> >>> +
> >>> +#ifdef _GLIBCXX_HAVE_WINDOWS_H
> >>> +string_view
> >>> +detect_windows_zone() noexcept
> >>> +{
> >>> +  DYNAMIC_TIME_ZONE_INFORMATION information{};
> >>> +  if (GetDynamicTimeZoneInformation(&information) == 
> >>> TIME_ZONE_ID_INVALID)
> >>> +   return {};
> >>> +
> >>> +  constexpr SYSTEMTIME all_zero_time{};
> >>> +  const wstring_view zone_name{ information.TimeZoneKeyName };
> >>> +  auto equal = [](const SYSTEMTIME &lhs, const SYSTEMTIME &rhs) 
> >>> noexcept
> >>> +   { return memcmp(&lhs, &rhs, sizeof(SYSTEMTIME)) == 0; };
> >>> +  // The logic is copied from icu, couldn't find the source.
> >>> +  // Detect if DST is disabled.
> >>> +  if (information.DynamicDaylightTimeDisabled
> >>> + && equal(information.StandardDate, information.DaylightDate)
> >>> + && ((!zone_name.empty()
> >>> +  && equal(information.StandardDate, all_zero_time))
> >>> + || (zone_name.empty()
> >>> + && !equal(information.StandardDate, all_zero_time))))
> >>> +   {
> >>> + if (information.Bias == 0)
> >>> +   return "Etc/UTC";
> >>> +
> >>> + if (information.Bias % 60 != 0)
> >>> +   // If the offset is not in full hours, we can't do anything 
> >>> really.
> >>> +   return {};
> >>> +
> >>> + const auto raw_index = information.Bias / 60;
> >>> +
> >>> + // The bias added to the local time equals UTC. And GMT+X 
> >>> corresponds
> >>> + // to UTC-X, the sign is negated. Thus we can use the hourly 
> >>> bias as
> >>> + // an index into an array.
> >>> + if (raw_index < 0 && raw_index >= -14)
> >>> +   {
> >>> + static array table{
> >>> +   "Etc/GMT-1",  "Etc/GMT-2",  "Etc/GMT-3",  "Etc/GMT-4",
> >>> +   "Etc/GMT-5",  "Etc/GMT-6",  "Etc/GMT-7",  "Etc/GMT-8",
> >>> +   "Etc/GMT-9",  "Etc/GMT-10", "Etc/GMT-11", "Etc/GMT-12",
> >>> +   "Etc/GMT-13", "Etc/GMT-14"
> >>> + };
> >>> + return table[-raw_index - 1];
> >>> +   }
> >>> + else if (raw_index > 0 && raw_index <= 12)
> >>> +   {
> >>> + static array table{
> >>> +   "Etc/GMT+1", "Etc/GMT+2",  "Etc/GMT+3",  "Etc/GMT+4",
> >>> +   "Etc/GMT+5", "Etc/GMT+6",  "Etc/GMT+7",  "Etc/GMT+8",
> >>> +   "Etc/GMT+9", "Etc/GMT+10", "Etc/GMT+11", "Etc/GMT+12"
> >>> + };
> >>> + return table[raw_index - 1];
> >>> +   }
> >>> + return {};
> >>> +   }
> >>> +
> >>> +#define _GLIBCXX_GET_WINDOWS_ZONES_MAP
> >>> +#include 
> >>> +#ifdef _GLIBCXX_GET_WINDOWS_ZONES_MAP
> >>> +# error "Invalid windows_zones map"
> >>> +#endif
> >>> +
> >>> +  const auto zone_range
> >>> + = ranges::equal_range(windows_zone_map, zone_name, {},
> >>> +   &windows_zone_map_entry::windows_name);
> >>> +
> >>> +  const auto size = ranges::size(zone_range);
> >>> +  if (size == 0)
> >>> +   // Unknown zone, we can't detect anything.
> >>> +   return {};
> >>> +
> >>> +  if (size == 1)
> >>> +   // Some zones have only one territory, use the quick path.
> >>> +   return zone_range.front().iana_name;
> >>> +
> >>> +  const auto geo_id = GetUserGeoID(GEOCLASS_NATION);
> >>> +  wstring territory;
> >>> +  territory.resize(2); // The terminating zero is always added on 
> >>> top.
> >>> +  if (GetGeoInfoW(geo_id, GEO_ISO2, territory.data(), 3, 0) == 0)
> >>
> >> I'm confused here. The buffer is 2 wchar_t values, but the cchData
> >> argument is 3. The docs at
> >> https://learn.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getgeoinfoa
> >> say that value should be "Size of the buffer indicated by lpGeoData.
> >> The size is the number of bytes for the ANSI version of the function,
> >> or the number of words for the Unicode version." I utterly despise
> >> this idiotic convention in MS docs of saying "ANSI" to mean ... who
> >> knows ... not something actually defined by any ANSI standard ... is
> >> GetGeoInfoA the "ANSI" version and GetGeoInfoW the "Unicode" version?
> >> Why not call them that? Or narrow and wide? This has nothing to do
> >> with ANSI.
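
A two-call pattern would sidestep the size question entirely. A sketch
(mine, not part of the patch), relying only on the documented behaviour
that GetGeoInfoW returns the required length in WCHARs, including the
terminator, when cchData is 0:

  // Sketch: query the required buffer length first, then fetch.
  const int needed = GetGeoInfoW(geo_id, GEO_ISO2, nullptr, 0, 0);
  if (needed == 0)
    return {};                    // unknown territory
  std::wstring territory(needed, L'\0');
  if (GetGeoInfoW(geo_id, GEO_ISO2, territory.data(), needed, 0) == 0)
    return {};
  territory.resize(needed - 1);   // drop the embedded terminator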

[PING][PATCH v4] reassoc: Optimize CMP/XOR expressions [PR116860]

2025-07-10 Thread Konstantinos Eleftheriou
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2025-June/687530.html .

Thanks,
Konstantinos


RE: [PATCH] Reject single lane vector types for SLP build

2025-07-10 Thread Richard Biener
On Thu, 10 Jul 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, July 10, 2025 6:42 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> > CI RISC-V 
> > Subject: Re: [PATCH] Reject single lane vector types for SLP build
> > 
> > 
> > 
> > > On 10.07.2025 at 16:27, Tamar Christina  wrote:
> > >
> > > 
> > >>
> > >> -Original Message-
> > >> From: Richard Biener 
> > >> Sent: Thursday, July 10, 2025 3:09 PM
> > >> To: Tamar Christina 
> > >> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > ;
> > >> RISC-V CI 
> > >> Subject: RE: [PATCH] Reject single lane vector types for SLP build
> > >>
> > >> On Thu, 10 Jul 2025, Tamar Christina wrote:
> > >>
> >  -Original Message-
> >  From: Richard Biener 
> >  Sent: Thursday, July 10, 2025 1:31 PM
> >  To: gcc-patches@gcc.gnu.org
> >  Cc: Richard Sandiford ; Tamar Christina
> >  ; RISC-V CI 
> >  Subject: [PATCH] Reject single lane vector types for SLP build
> > 
> >  The following makes us never consider vector(1) T types for
> >  vectorization and ensures this during SLP build.  This is a
> >  long-standing issue for BB vectorization and when we remove
> >  early loop vector type setting we lose the single place we have
> >  that rejects this for loops.
> > 
> >  Once we implement partial loop vectorization we should revisit
> >  this, but then use the original scalar types for the unvectorized
> >  parts.
> > >>>
> > >>> SGTM FWIW,
> > >>>
> > >>> I was also wondering if I should start upstreaming my changes to
> > >>> get the vectorizer to recognize vector types as scalar types as well.
> > >>>
> > >>> Or if you wanted me to wait until I have the lane representations
> > >>> more figured out.
> > >>
> > >> I think if we can restrict things to cases that have a strong
> > >> overlap with what we intend to use in the end that sounds good.
> > >> Like allow only a single "scalar" vector def per SLP node for now
> > >> and simply stash that into the scalar-stmts array.  In the end
> > >> we'd want to allow mixed scalar and vector defs there.
> > >
> > > At least for my use case I'd need to be able to do multiple "scalar"
> > > vector lanes, but restricted to the same size for each lane is fine for
> > > now.
> > >
> > > But I don't think there's actually much difference here between
> > > one "scalar" and multiple "scalars" representation-wise now, is there?
> > 
> > Not if they are at least uniform, not mixed vector vs scalar or different 
> > vector sizes.
> > What’s going to be interesting (or impossible?) might be VLA vectors.
> 
> Yeah, mixed-size vectors will be a challenge; they weren't on my list for
> now, and neither is VLA, or even VLS.  The problem is that, for SVE at
> least, many of the control structures are opaque.  So niters has no clue,
> DF analysis doesn't know about the loads, etc.
> 
> So VLA, and SVE in general, as an input is a bit further out.
> 
> > 
> > But I’d like to see the BB vectorizer deal with vector lowering, so each 
> > target
> > unsupported vector stmt would be a BB vect seed.
> > 
> 
> Hmm, so is the idea to treat them as supported for the purpose of
> vectorization and then decompose them during codegen?  Do we not then run
> the risk of missing out when other passes end up making the operation
> supported?

Vectorization takes existing vectors just as another way to encode the
scalar instruction flow, so it shouldn't matter to it whether the
operations use a vector mode and are supported or whether they are
BLKmode or unsupported.  But the vectorization result would be of course
supported.  Iff (re-)vectorization fails we'd have a SLP instance we
can scalarize - and we are in a better position to do this sensibly
than the current vector lowering pass which works stmt-by-stmt.
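
A hypothetical example of such a seed (mine, not from the thread): a
generic vector wider than anything the target supports, which the current
vector-lowering pass splits one statement at a time:

/* Hypothetical BB-vectorization seed: on a target with at most 128-bit
   vectors this 512-bit addition is unsupported as-is and is currently
   decomposed by generic vector lowering, stmt by stmt.  */
typedef unsigned char v64qi __attribute__ ((vector_size (64)));

v64qi
add_v64qi (v64qi a, v64qi b)
{
  return a + b;
}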

Richard.

> Cheers,
> Tamar
> 
> > Richard
> > 
> > >
> > > Thanks,
> > > Tamar
> > >
> > >>
> > >> It does require altering code that expects to get at actual _scalar_
> > >> defs for each lane, but I don't think that's much code.
> > >>
> > >> Richard.
> > >>
> > >>> Regards,
> > >>> Tamar
> > 
> >  Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'll see
> >  if there's any surprises from the CI, but otherwise I'll go
> >  ahead with this.
> > 
> >  Richard.
> > 
> > * tree-vect-slp.cc (vect_build_slp_tree_1): Reject
> > single-lane vector types.
> >  ---
> >  gcc/tree-vect-slp.cc | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> >  diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> >  index ad75386926a..d2ce4ffaa4f 100644
> >  --- a/gcc/tree-vect-slp.cc
> >  +++ b/gcc/tree-vect-slp.cc
> >  @@ -1114,6 +1114,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> >    matches[0] = false;
> >    return false;
> >  }
>

Re: [PATCH] ipa: Disallow signature changes in fun->has_musttail functions [PR121023]

2025-07-10 Thread Richard Biener
On Fri, 11 Jul 2025, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcase shows, e.g. on ia32, letting IPA opts change the
> signature of functions which have [[{gnu,clang}::musttail]] calls
> can turn programs that would be compiled normally into something
> that is rejected because the caller has fewer argument stack slots
> than the function being tail called.
> 
> The following patch prevents signature changes for such functions.
> It is perhaps too big hammer in some cases, but it might be hard
> to try to figure out what signature changes are still acceptable and which
> are not at IPA time.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/15.2?

OK, but please give Martin/Honza a chance to comment.

Thanks,
Richard.

> 2025-07-11  Jakub Jelinek  
>   Martin Jambor  
> 
>   PR ipa/121023
>   * ipa-fnsummary.cc (compute_fn_summary): Disallow signature changes
>   on cfun->has_musttail functions.
> 
>   * c-c++-common/musttail32.c: New test.
> 
> --- gcc/ipa-fnsummary.cc.jj   2025-07-04 09:01:47.507516910 +0200
> +++ gcc/ipa-fnsummary.cc  2025-07-10 14:00:19.488185173 +0200
> @@ -3421,6 +3421,21 @@ compute_fn_summary (struct cgraph_node *
>info->inlinable = tree_inlinable_function_p (node->decl);
>  
> bool no_signature = false;
> +
> +   /* Don't allow signature changes for functions which have
> +   [[gnu::musttail]] or [[clang::musttail]] calls.  Sometimes
> +   (more often on targets which pass everything on the stack)
> +   signature changes can result in tail calls being impossible
> +   even when without the signature changes they would be ok.
> +   See PR121023.  */
> +   if (cfun->has_musttail)
> +  {
> +if (dump_file)
> + fprintf (dump_file, "No signature change:"
> +  " function has calls with musttail attribute.\n");
> +no_signature = true;
> +  }
> +
> /* Type attributes can use parameter indices to describe them.
> Special case fn spec since we can safely preserve them in
> modref summaries.  */
> --- gcc/testsuite/c-c++-common/musttail32.c.jj	2025-07-10 14:00:56.760698477 +0200
> +++ gcc/testsuite/c-c++-common/musttail32.c	2025-07-10 14:02:21.945586151 +0200
> @@ -0,0 +1,23 @@
> +/* PR ipa/121023 */
> +/* { dg-do compile { target musttail } } */
> +/* { dg-options "-O2" } */
> +
> +struct S { int a, b; };
> +
> +[[gnu::noipa]] int
> +foo (struct S x, int y, int z)
> +{
> +  return x.a + y + z;
> +}
> +
> +[[gnu::noinline]] static int
> +bar (struct S x, int y, int z)
> +{
> +  [[gnu::musttail]] return foo ((struct S) { x.a, 0 }, y, 1);
> +}
> +
> +int
> +baz (int x)
> +{
> +  return bar ((struct S) { 1, 2 }, x, 2) + bar ((struct S) { 2, 3 }, x + 1, 
> 2);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

