date:20241121

Re: [RFC PATCH 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-11-21 Thread Alex Coplan

On 19/11/2024 20:12, Richard Sandiford wrote:
> Alex Coplan  writes:
> > On 19/11/2024 17:02, Richard Sandiford wrote:
> >> Sorry for the slow review.  Finally catching up on backlog.
> >> 
> >> Richard Biener  writes:
> >> > On Mon, 28 Oct 2024, Alex Coplan wrote:
> >> >
> >> >> This allows us to vectorize more loops with early exits by forcing
> >> >> peeling for alignment to make sure that we're guaranteed to be able to
> >> >> safely read an entire vector iteration without crossing a page boundary.
> >> >> 
> >> >> To make this work for VLA architectures we have to allow compile-time
> >> >> non-constant target alignments.  We also have to override the result of
> >> >> the target's preferred_vector_alignment hook if it isn't a power-of-two
> >> >> multiple of the TYPE_SIZE of the chosen vector type.
> >> >> 
> >> >> There is currently an implicit assumption that the TYPE_SIZE of the
> >> >> vector type is itself a power of two.  For non-VLA types this
> >> >> could be checked directly in the vectorizer.  For VLA types I
> >> >> had discussed offline with Richard S about adding a target hook to allow
> >> >> the vectorizer to query the backend to confirm that a given VLA type
> >> >> is known to have a power-of-two size at runtime.
> >> >
> >> > GCC assumes all vectors have power-of-two size, so I don't think we
> >> > need to check anything but we'd instead have to make sure the
> >> > target constrains the hardware when this assumption doesn't hold
> >> > in silicon.
> >> 
> >> We did at one point support non-power-of-2 for VLA only.  But things
> >> might have crept in since that break it even for VLA.  It's no longer
> >> something that matters for SVE because the architecture has been
> >> tightened to remove the non-power-of-2 option.
> >> 
> >> My main comment on the patch is about:
> >> 
> >> +  /* Below we reject compile-time non-constant target alignments, but if
> >> +     our misalignment is zero, then we are known to already be aligned
> >> +     w.r.t. any such possible target alignment.  */
> >> +  if (known_eq (misalignment, 0))
> >> +    return 0;
> >> 
> >> When is that true for VLA?  It seems surprising that we can guarantee
> >> alignment to an unknown boundary :)  However, I agree that it's the
> >> natural consequence of the formula.
> >
> > My vague memory is that the alignment peeling machinery forces the
> > dr_info->misalignment to 0 after we've decided to peel for alignment
> > (for DRs which we know we will have made aligned by peeling).  So the
> > check is designed to handle that case.
> 
> Ah, yeah, of course.  Sorry for the dumb question.  I'd forgotten that
> that was what the misalignment represented here, rather than the
> incoming/"natural" misalignment.

Not a dumb question at all, it is quite non-obvious.  Thanks for taking
a look at the patch.

Alex

> 
> Thanks,
> Richard
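
The page-boundary argument in this thread rests on plain address arithmetic: once an access is aligned to its own power-of-two size, and that size is no larger than a page, it cannot straddle a page boundary. A minimal sketch of that arithmetic follows. It is not GCC's implementation; `peel_count` and `crosses_page` are hypothetical helpers, and the page size is passed in as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of scalar iterations to peel so that the
// access at ADDR becomes aligned to ALIGN bytes.  Assumes ALIGN is a power
// of two, ELEM_SIZE divides ALIGN, and ADDR is a multiple of ELEM_SIZE.
inline std::uint64_t
peel_count (std::uint64_t addr, std::uint64_t elem_size, std::uint64_t align)
{
  std::uint64_t misalignment = addr & (align - 1);
  return ((align - misalignment) & (align - 1)) / elem_size;
}

// Hypothetical helper: true if the SIZE-byte access starting at ADDR touches
// two different pages of PAGE_SIZE bytes (PAGE_SIZE a power of two).
inline bool
crosses_page (std::uint64_t addr, std::uint64_t size, std::uint64_t page_size)
{
  return (addr / page_size) != ((addr + size - 1) / page_size);
}
```

For example, a 16-byte vector read starting at 0xff8 spills into the next 4096-byte page, whereas after peeling two 4-byte elements the read starts at 0x1000 and stays within one page.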

Re: [PATCH v6] forwprop: Try to blend two isomorphic VEC_PERM sequences

2024-11-21 Thread Christoph Müllner

On Thu, Nov 21, 2024 at 1:17 PM Richard Biener  wrote:
>
> On Thu, 21 Nov 2024, Christoph Müllner wrote:
>
> > This extends forwprop by yet another VEC_PERM optimization:
> > It attempts to blend two isomorphic vector sequences by using the
> > redundancy in the lane utilization in these sequences.
> > This redundancy in lane utilization comes from the way specific
> > scalar statements end up vectorized: two VEC_PERMs on top, binary operations
> > on both of them, and a final VEC_PERM to create the result.
> > Here is an example of this sequence:
> >
> >   v_in = {e0, e1, e2, e3}
> >   v_1 = VEC_PERM <v_in, v_in, {0, 2, 0, 2}>
> >   // v_1 = {e0, e2, e0, e2}
> >   v_2 = VEC_PERM <v_in, v_in, {1, 3, 1, 3}>
> >   // v_2 = {e1, e3, e1, e3}
> >
> >   v_x = v_1 + v_2
> >   // v_x = {e0+e1, e2+e3, e0+e1, e2+e3}
> >   v_y = v_1 - v_2
> >   // v_y = {e0-e1, e2-e3, e0-e1, e2-e3}
> >
> >   v_out = VEC_PERM <v_x, v_y, {0, 1, 6, 7}>
> >   // v_out = {e0+e1, e2+e3, e0-e1, e2-e3}
> >
> > To remove the redundancy, lanes 2 and 3 can be freed, which allows us to
> > change the last statement into:
> >   v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}>
> >   // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
> >
> > The cost of eliminating the redundancy in the lane utilization is that
> > lowering the VEC_PERM expression could get more expensive because of
> > tighter packing of the lanes.  Therefore this optimization is not done
> > alone, but only in case we identify two such sequences that can be
> > blended.
> >
> > Once all candidate sequences have been identified, we try to blend them,
> > so that we can use the freed lanes for the second sequence.
> > On success we convert 2x (2x BINOP + 1x VEC_PERM) to
> > 2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP.
> >
> > The implemented transformation reuses (rewrites) the statements
> > of the first sequence and the last VEC_PERM of the second sequence.
> > The remaining four statements of the second sequence are left untouched
> > and will be eliminated by DCE later.
> >
> > This targets x264_pixel_satd_8x4, which calculates the sum of absolute
> > transformed differences (SATD) using Hadamard transformation.
> > We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
> > speedup on an AArch64 machine.
> >
> > Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).
>
> OK.

Thanks!

However, this patch triggered an ICE during testing as reported by the
Linaro CI runs for ARM and AArch64:
  
https://patchwork.sourceware.org/project/gcc/patch/20241120231524.3885158-1-christoph.muell...@vrull.eu/

The reason is that I used the following code to initialize a vector:
  auto_vec lane_assignment;
  lane_assignment.reserve (2 * nelts);
  /* Mark all lanes as free.  */
  for (i = 0; i < 2 * nelts; i++)
    lane_assignment[i] = 0;  // ICE is triggered here

I've understood the issue and fixed the code as follows:
  auto_vec lane_assignment;
  lane_assignment.create (2 * nelts);
  /* Mark all lanes as free.  */
  lane_assignment.quick_grow_cleared (2 * nelts);

I did not see this myself before sending out, because I bootstrapped/tested
with "--enable-checking=release".
I've changed my test procedures to use "--enable-checking".

I'll push including the change mentioned above once all tests are done
(expected within the next 30 minutes).

Thanks,
Christoph
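
The distinction behind this ICE is that reserving storage does not create elements. The sketch below shows the same pitfall with `std::vector`, which is used here only as an analogy for GCC's `vec<>`; `make_lane_assignment` is a hypothetical helper and not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analogy only: std::vector stands in for GCC's vec<>.
// Returns a lane-assignment table with 2 * nelts entries, all marked free (0).
inline std::vector<unsigned>
make_lane_assignment (std::size_t nelts)
{
  std::vector<unsigned> lanes;
  lanes.reserve (2 * nelts);   // capacity only; size () is still 0 here
  // Writing lanes[i] at this point would be an out-of-bounds access.
  lanes.resize (2 * nelts);    // now size () == 2 * nelts, elements zeroed
  return lanes;
}
```

The fix quoted above plays the role of `resize` here: it grows the vector to its final length and clears the new elements in one step, so the initialization loop is no longer needed. A bounds check that is only active in checking builds would also be consistent with the ICE not showing up under "--enable-checking=release".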


>
> Thanks,
> Richard.
>
> > gcc/ChangeLog
> >
> >   * tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
> >   structure to store analysis results of a vec perm simplify sequence.
> >   (get_vect_selector_index_map): Helper to get an index map from the
> >   provided vector permute selector.
> >   (recognise_vec_perm_simplify_seq): Helper to recognise a
> >   vec perm simplify sequence.
> >   (narrow_vec_perm_simplify_seq): Helper to pack the lanes more
> >   tightly.
> >   (can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
> >   sequences can be blended.
> >   (calc_perm_vec_perm_simplify_seqs): Helper to calculate the new
> >   permutation indices.
> >   (blend_vec_perm_simplify_seqs): Helper to blend two vec perm
> >   simplify sequences.
> >   (process_vec_perm_simplify_seq_list): Helper to process a list
> >   of vec perm simplify sequences.
> >   (append_vec_perm_simplify_seq_list): Helper to add a vec perm
> >   simplify sequence to the list.
> >   (pass_forwprop::execute): Integrate new functionality.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/satd-hadamard.c: New test.
> >   * gcc.dg/tree-ssa/vector-10.c: New test.
> >   * gcc.dg/tree-ssa/vector-8.c: New test.
> >   * gcc.dg/tree-ssa/vector-9.c: New test.
> >   * gcc.target/aarch64/sve/satd-hadamard.c: New test.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> > Changes in v6:
> > * Use 'unsigned int' instead of unsigned HWI for vector indices
> > * Remove hash maps and replace functionality with vec<>
> > * Inline get_tree_def () and eliminate redundant checks
> > * Ensure sequences remain in a BB
>

Re: [PATCH v6] forwprop: Try to blend two isomorphic VEC_PERM sequences

2024-11-21 Thread Sam James

The default on trunk is --enable-checking=yes,extra (when gcc/DEV-PHASE
contains "experimental"), otherwise it's --enable-checking=release.

I personally do most testing with --enable-checking=yes,rtl,extra but
you can do less than that if you want to quickly get results.

The minimum for testing patches on trunk should be yes,extra (as it's
the default).

thanks,
sam

Re: [RFC PATCH] dwarf2out: Use post-DWARF 5 DW_LANG_* codes for -gdwarf-5 -gno-strict-dwarf

2024-11-21 Thread Richard Biener

On Thu, 21 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> DWARF now maintains DW_LANG_* code assignment online and 27 language codes
> have been assigned already after DWARF 5 has been released, see
> https://dwarfstd.org/languages.html
> including one added yesterday (DW_LANG_C23).
> DWARF 6 plans to use something different, DW_AT_language_{name,version}
> pair where the new language versions will be just dealt with automatically
> rather than adding new codes, say for C23 we'll be able to use
> DW_LNAME_C 202311 while for C2Y for now to use
> DW_LNAME_C 202500 until the standard is finalized.
> 
> Now, the question is whether the toolchain should use those post DWARF 5
> codes for -gdwarf-5 -gno-strict-dwarf, or if we'll just ignore those
> and only switch to DWARF 6 stuff when the standard is released and people
> use -gdwarf-6 (or when we switch over to that as default).
> 
> The following patch starts using those new codes (just for C/C++ for now,
> Ada/Fortran not switched, Ada because I'm really not familiar with Ada and
> Fortran because it doesn't say 2018 in the language string).
> 
> The problem with the patch is that it regresses quite a few tests,
> in particular
> gcc.dg/guality/pr78726.c
> g++.dg/guality/redeclaration1.C
> libstdc++-prettyprinters/*.cc
> libstdc++-xmethods/deque.cc
> because my gdb doesn't handle those (but git trunk gdb doesn't either),
> so for those the new codes are just unknown languages rather than newer
> revisions of C or C++.
> From what I can read in gdb, it doesn't seem to care about exact standard
> revision, all it cares about is if the TU is C, C++, Fortran, Ada etc.
> So, from this POV perhaps we shouldn't switch at all and ignore all the
> post-DWARF 5 codes.
> Or shall we wait until gdb, elfutils, whatever else actually looks at
> DW_AT_language values is changed to handle the new codes and apply this
> patch after that (still one would need a new version of gdb/elfutils/etc.)?
> Or wait say half a year or year after that support is added in the
> consumers?

I'd say we at least wait until some consumers have support, if it
regresses behavior with "older" consumers I don't think we want to
have this as default for now anyway.

Richard.

> The DWARF 6 planned scheme was designed exactly to overcome this problem,
> consumers that only care if something is C or C++ etc. will be able to
> hardcode the code once and if they care for some behavior on something
> more specific, they can just compare the version, DW_AT_language_version >=
> 201703 for C++ (or < etc.), or for Fortran DW_AT_language_version >= 2008,
> ...
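
The difference between the two schemes can be sketched from a consumer's point of view. This is an illustration under stated assumptions: the enumerators below are stand-ins and do not carry the real DWARF constant values, and `old_consumer_is_c`, `LangInfo`, `is_c` and `is_cxx17_or_later` are hypothetical names.

```cpp
#include <cassert>

// Illustrative stand-ins, not the real DWARF constants or their values.
enum class LangCode { C99, C11, C17, C23, Cxx14, Cxx17, Cxx23 };
enum class LangName { C, Cxx, Fortran };

// DWARF 5 style: a consumer hardcodes every code it knows.  A consumer
// written before the C17/C23 codes existed only lists the older ones, so
// the newer codes look like unknown languages to it.
inline bool
old_consumer_is_c (LangCode code)
{
  return code == LangCode::C99 || code == LangCode::C11;
}

// Planned DWARF 6 style: language name and version are separate values.
struct LangInfo
{
  LangName name;
  unsigned version;   // e.g. 202311 for C23, 201703 for C++17
};

// The language test is written once and keeps working for new revisions.
inline bool
is_c (const LangInfo &info)
{
  return info.name == LangName::C;
}

// Revision-specific behavior becomes a numeric comparison.
inline bool
is_cxx17_or_later (const LangInfo &info)
{
  return info.name == LangName::Cxx && info.version >= 201703;
}
```

With the single-code scheme every new revision needs a consumer update, which is the regression seen with gdb above; with the name/version pair only consumers that care about a specific revision need to look at the version.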
> 
> 2024-11-21  Jakub Jelinek  
> 
> gcc/
>   * dwarf2out.cc (is_c): Handle also DW_LANG_C{17,23}.
>   (is_cxx): Handle also DW_LANG_C_plus_plus_{17,20,23}.
>   (is_fortran): Handle also DW_LANG_Fortran18.
>   (is_ada): Handle also DW_LANG_Ada20{05,12}.
>   (lower_bound_default): Handle also
>   DW_LANG_{C{17,23},C_plus_plus_{17,20,23},Fortran18,Ada20{05,12}}.
>   (add_prototyped_attribute): Handle DW_LANG_C{17,23}.
>   (gen_compile_unit_die): Use DW_LANG_C17 if not -gstrict-dwarf
>   for C17.  Use DW_LANG_C23 if not -gstrict-dwarf for C23/C2Y.  Use
>   DW_LANG_C_plus_plus_{17,20,23} if not -gstrict-dwarf for C++{17,20,23}
>   and the last one also for C++26.  Handle DW_LANG_Fortran18.
> include/
>   * g++.dg/debug/dwarf2/lang-cpp17.C: Add -gno-strict-dwarf to
>   dg-options and expect different DW_AT_language value.
>   * g++.dg/debug/dwarf2/lang-cpp20.C: Likewise.
>   * g++.dg/debug/dwarf2/lang-cpp23.C: New test.
> 
> --- gcc/dwarf2out.cc.jj   2024-10-25 10:00:29.445768186 +0200
> +++ gcc/dwarf2out.cc  2024-11-20 21:49:48.237062064 +0100
> @@ -5540,7 +5540,8 @@ is_c (void)
>unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
>  
>return (lang == DW_LANG_C || lang == DW_LANG_C89 || lang == DW_LANG_C99
> -   || lang == DW_LANG_C11 || lang == DW_LANG_ObjC);
> +   || lang == DW_LANG_C11 || lang == DW_LANG_C17 || lang == DW_LANG_C23
> +   || lang == DW_LANG_ObjC);
>  
>  
>  }
> @@ -5553,7 +5554,9 @@ is_cxx (void)
>unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
>  
>return (lang == DW_LANG_C_plus_plus || lang == DW_LANG_ObjC_plus_plus
> -   || lang == DW_LANG_C_plus_plus_11 || lang == DW_LANG_C_plus_plus_14);
> +   || lang == DW_LANG_C_plus_plus_11 || lang == DW_LANG_C_plus_plus_14
> +   || lang == DW_LANG_C_plus_plus_17 || lang == DW_LANG_C_plus_plus_20
> +   || lang == DW_LANG_C_plus_plus_23);
>  }
>  
>  /* Return TRUE if DECL was created by the C++ frontend.  */
> @@ -5581,7 +5584,8 @@ is_fortran (void)
> || lang == DW_LANG_Fortran90
> || lang == DW_LANG_Fortran95
> || lang == DW_LANG_Fortran03
> -   || lang == DW_LANG_Fortran08);
> +   || lang == DW_LANG_Fortran08
> +   || lang == DW_LANG_Fortran18);
>  }
>  
>  static inline bool
> @@ -5617,7 +5621,8 @@ is_ada (voi

Re: [PATCH v6] forwprop: Try to blend two isomorphic VEC_PERM sequences

2024-11-21 Thread Richard Biener

On Thu, 21 Nov 2024, Christoph Müllner wrote:

> This extends forwprop by yet another VEC_PERM optimization:
> It attempts to blend two isomorphic vector sequences by using the
> redundancy in the lane utilization in these sequences.
> This redundancy in lane utilization comes from the way specific
> scalar statements end up vectorized: two VEC_PERMs on top, binary operations
> on both of them, and a final VEC_PERM to create the result.
> Here is an example of this sequence:
> 
>   v_in = {e0, e1, e2, e3}
>   v_1 = VEC_PERM <v_in, v_in, {0, 2, 0, 2}>
>   // v_1 = {e0, e2, e0, e2}
>   v_2 = VEC_PERM <v_in, v_in, {1, 3, 1, 3}>
>   // v_2 = {e1, e3, e1, e3}
> 
>   v_x = v_1 + v_2
>   // v_x = {e0+e1, e2+e3, e0+e1, e2+e3}
>   v_y = v_1 - v_2
>   // v_y = {e0-e1, e2-e3, e0-e1, e2-e3}
> 
>   v_out = VEC_PERM <v_x, v_y, {0, 1, 6, 7}>
>   // v_out = {e0+e1, e2+e3, e0-e1, e2-e3}
> 
> To remove the redundancy, lanes 2 and 3 can be freed, which allows us to
> change the last statement into:
>   v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}>
>   // v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
> 
> The cost of eliminating the redundancy in the lane utilization is that
> lowering the VEC_PERM expression could get more expensive because of
> tighter packing of the lanes.  Therefore this optimization is not done
> alone, but only in case we identify two such sequences that can be
> blended.
> 
> Once all candidate sequences have been identified, we try to blend them,
> so that we can use the freed lanes for the second sequence.
> On success we convert 2x (2x BINOP + 1x VEC_PERM) to
> 2x VEC_PERM + 2x BINOP + 2x VEC_PERM traded for 4x VEC_PERM + 2x BINOP.
> 
> The implemented transformation reuses (rewrites) the statements
> of the first sequence and the last VEC_PERM of the second sequence.
> The remaining four statements of the second sequence are left untouched
> and will be eliminated by DCE later.
> 
> This targets x264_pixel_satd_8x4, which calculates the sum of absolute
> transformed differences (SATD) using Hadamard transformation.
> We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
> speedup on an AArch64 machine.
> 
> Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).

OK.

Thanks,
Richard.
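
The lane bookkeeping in the quoted example can be checked with a small model of a 4-lane VEC_PERM. This is a sketch for illustration, not the forwprop code: `vec_perm`, `sequence_original` and `sequence_narrowed` are hypothetical names, and the selector vectors are the ones implied by the lane comments in the example.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<int, 4>;

// Model of a 4-lane VEC_PERM <a, b, sel>: selector values 0..3 pick lanes
// from A, values 4..7 pick lanes from B.
inline V4
vec_perm (const V4 &a, const V4 &b, const std::array<unsigned, 4> &sel)
{
  V4 out{};
  for (unsigned i = 0; i < 4; ++i)
    out[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
  return out;
}

// Lane-wise a + sign * b.
inline V4
combine (const V4 &a, const V4 &b, int sign)
{
  V4 out{};
  for (unsigned i = 0; i < 4; ++i)
    out[i] = a[i] + sign * b[i];
  return out;
}

// The sequence as written: the final permute reads lanes 2 and 3 of v_y.
inline V4
sequence_original (const V4 &v_in)
{
  V4 v_1 = vec_perm (v_in, v_in, {0, 2, 0, 2});
  V4 v_2 = vec_perm (v_in, v_in, {1, 3, 1, 3});
  V4 v_x = combine (v_1, v_2, +1);
  V4 v_y = combine (v_1, v_2, -1);
  return vec_perm (v_x, v_y, {0, 1, 6, 7});
}

// The narrowed sequence: only lanes 0 and 1 of v_x and v_y are read, so
// lanes 2 and 3 are free to carry a second sequence.
inline V4
sequence_narrowed (const V4 &v_in)
{
  V4 v_1 = vec_perm (v_in, v_in, {0, 2, 0, 2});
  V4 v_2 = vec_perm (v_in, v_in, {1, 3, 1, 3});
  V4 v_x = combine (v_1, v_2, +1);
  V4 v_y = combine (v_1, v_2, -1);
  return vec_perm (v_x, v_y, {0, 1, 4, 5});
}
```

Both functions return the same vector for any input, which is what makes lanes 2 and 3 of the intermediate values redundant.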

> gcc/ChangeLog
> 
>   * tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
>   structure to store analysis results of a vec perm simplify sequence.
>   (get_vect_selector_index_map): Helper to get an index map from the
>   provided vector permute selector.
>   (recognise_vec_perm_simplify_seq): Helper to recognise a
>   vec perm simplify sequence.
>   (narrow_vec_perm_simplify_seq): Helper to pack the lanes more
>   tightly.
>   (can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
>   sequences can be blended.
>   (calc_perm_vec_perm_simplify_seqs): Helper to calculate the new
>   permutation indices.
>   (blend_vec_perm_simplify_seqs): Helper to blend two vec perm
>   simplify sequences.
>   (process_vec_perm_simplify_seq_list): Helper to process a list
>   of vec perm simplify sequences.
>   (append_vec_perm_simplify_seq_list): Helper to add a vec perm
>   simplify sequence to the list.
>   (pass_forwprop::execute): Integrate new functionality.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/satd-hadamard.c: New test.
>   * gcc.dg/tree-ssa/vector-10.c: New test.
>   * gcc.dg/tree-ssa/vector-8.c: New test.
>   * gcc.dg/tree-ssa/vector-9.c: New test.
>   * gcc.target/aarch64/sve/satd-hadamard.c: New test.
> 
> Signed-off-by: Christoph Müllner 
> ---
> Changes in v6:
> * Use 'unsigned int' instead of unsigned HWI for vector indices
> * Remove hash maps and replace functionality with vec<>
> * Inline get_tree_def () and eliminate redundant checks
> * Ensure sequences remain in a BB
> * Avoid temporary objects that need to be converted later
> * Simplify lane calculation when blending
> 
> Changes in v5:
> * Improve coding style.
> 
> Changes in v4:
> * Fix test condition for writing to the dump file
> * Use gimple UIDs instead of expensive walks for comparing ordering.
> * Ensure to not blend across assignments to SSA_NAMES.
> * Restrict list to fixed-size vector with 8 entries.
> * Remove calls of expensive vec methods by restructuring the code.
> * Improved wording.
> 
> Changes in v3:
> * Moved code to tree-ssa-forwprop.cc where similar VEC_PERM
>   optimizations are implemented.
> * Test operand order less strict in case of commutative operators.
> * Made naming more consistent.
> * Added a test for testing dependencies between two sequences.
> * Removed the instruction reordering (not necessary without dependencies).
> * Added tests based on __builtin_shuffle ().
> 
> Changes in v2:
> * Moved code from tree-vect-slp.cc into a new pass (from where it could
>   be moved elsewhere).
> * Only deduplicate lanes if sequences will be merged later on.
> * Split functionality stricter into analysis and transformation parts.
> 
> Manolis Tsamis was the patch's initial author before I took i

Re: [PATCH v1 4/4] aarch64: Add SEH, stack unwinding and C++ exceptions

2024-11-21 Thread Richard Sandiford

Evgeny Karpov  writes:
> From 69ce2026b10711b32595d58e23f92f54e6c718c2 Mon Sep 17 00:00:00 2001
> From: Evgeny Karpov 
> Date: Fri, 15 Nov 2024 13:14:18 +0100
> Subject: [PATCH v1 4/4] aarch64: Add SEH, stack unwinding and C++ exceptions
>
> This patch reuses the existing SEH, stack unwinding and C++ exceptions
> from ix86 and implements the required changes for AArch64.
>
> gcc/ChangeLog:
>
>   * common/config/aarch64/aarch64-common.cc (aarch64_except_unwind_info):
>   Add unwind info for AArch64.
>   (defined): Likewise.
>   (TARGET_EXCEPT_UNWIND_INFO): Likewise.
>   * config/aarch64/aarch64.cc (aarch64_print_reg):
>   Print a reg.
>   (defined): Enable unwind tables.
>   (aarch64_declare_function_name): Add unwinding.
>   * config/aarch64/cygming.h (SYMBOL_REF_STUBVAR_P):
>   Enable SEH, declare required functions and parameters.
>   (TARGET_SEH): Likewise.
>   (SEH_MAX_FRAME_SIZE): Likewise.
>   (TARGET_ASM_UNWIND_EMIT): Likewise.
>   (TARGET_ASM_UNWIND_EMIT_BEFORE_INSN): Likewise.
>   (TARGET_ASM_FUNCTION_END_PROLOGUE): Likewise.
>   (TARGET_ASM_EMIT_EXCEPT_PERSONALITY): Likewise.
>   (TARGET_ASM_INIT_SECTIONS): Likewise.
>   (SUBTARGET_ASM_UNWIND_INIT): Likewise.
>   (aarch64_pe_seh_unwind_emit): Likewise.
>   (aarch64_print_reg): Likewise.
>   (ASM_DECLARE_FUNCTION_SIZE): Likewise.
>   (ASM_DECLARE_COLD_FUNCTION_SIZE): Likewise.
>   (ASM_DECLARE_COLD_FUNCTION_NAME): Likewise.
>   * config/mingw/winnt.cc (defined):
>   Add AArch64 implementation for SEH.
>   (CALLEE_SAVED_REG_NUMBER): Likewise.
>   (seh_parallel_offset): Likewise.
>   (seh_pattern_emit): Likewise.
>   (aarch64_pe_seh_unwind_emit): Likewise.
>
> libgcc/ChangeLog:
>
>   * config.host: Support AArch64.
>   * unwind-seh.c (defined): Likewise.
>   (_Unwind_Backtrace): Likewise.
> ---
>  gcc/common/config/aarch64/aarch64-common.cc |  30 +++
>  gcc/config/aarch64/aarch64.cc   |  15 ++
>  gcc/config/aarch64/cygming.h|  51 +++-
>  gcc/config/mingw/winnt.cc   | 260 +++-
>  libgcc/config.host  |   2 +-
>  libgcc/unwind-seh.c |  37 ++-
>  6 files changed, 382 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 2bfc597e333..c46c0a8547d 100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -447,6 +447,36 @@ aarch64_rewrite_mcpu (int argc, const char **argv)
>return aarch64_rewrite_selected_cpu (argv[argc - 1]);
>  }
>  
> +#if TARGET_AARCH64_MS_ABI
> +
> +/* Implement TARGET_EXCEPT_UNWIND_INFO.  */
> +
> +static enum unwind_info_type
> +aarch64_except_unwind_info (struct gcc_options *opts)
> +{
> +  /* Honor the --enable-sjlj-exceptions configure switch.  */
> +#ifdef CONFIG_SJLJ_EXCEPTIONS
> +  if (CONFIG_SJLJ_EXCEPTIONS)
> +return UI_SJLJ;
> +#endif
> +
> +  /* On windows 64, prefer SEH exceptions over anything else.  */
> +#if defined (TARGET_AARCH64_MS_ABI)
> +  if (opts->x_flag_unwind_tables)
> +return UI_SEH;
> +#endif
> +
> +  if (DWARF2_UNWIND_INFO)
> +return UI_DWARF2;
> +
> +  return UI_SJLJ;
> +}
> +
> +#undef  TARGET_EXCEPT_UNWIND_INFO
> +#define TARGET_EXCEPT_UNWIND_INFO  aarch64_except_unwind_info
> +
> +#endif // TARGET_AARCH64_MS_ABI
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
>  
>  #undef AARCH64_CPU_NAME_LENGTH
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index f02f9c88b6e..6dd1ba8f085 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -12773,6 +12773,12 @@ aarch64_label_mentioned_p (rtx x)
>return 0;
>  }
>  
> +void
> +aarch64_print_reg (rtx x, int code, FILE *file)
> +{
> +  aarch64_print_operand (file, x, code);
> +}

Missing comment above function (to describe the parameters).  But since
aarch64_print_operand's parameter order is the standard one (for
TARGET_PRINT_OPERAND), could we instead adjust the callers of
aarch64_print_reg to use that order too, perhaps via

  targetm.asm_out.print_operand

?

> +
>  /* Implement REGNO_REG_CLASS.  */
>  
>  enum reg_class
> @@ -18468,6 +18474,12 @@ aarch64_override_options_after_change_1 (struct 
> gcc_options *opts)
>   intermediary step for the former.  */
>if (flag_mlow_precision_sqrt)
>  flag_mrecip_low_precision_sqrt = true;
> +
> +  /* Enable unwind tables for MS.  */
> +#if defined (TARGET_AARCH64_MS_ABI)
> +  if (opts->x_flag_unwind_tables == 0)
> +opts->x_flag_unwind_tables = 1;
> +#endif // TARGET_AARCH64_MS_ABI

The usual way to do this is via:

SET_OPTION_IF_UNSET (opts, opts_set, flag_unwind_tables, 1);

(which is also what x86 seems to use).  This means that an explicit
-fno-unwind-tables still works.
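
The behavioral difference can be sketched with a simplified model in which each option has a value plus a record of whether the user set it explicitly. This is not GCC's option machinery; `default_if_unset` and `force_if_zero` are hypothetical helpers used only to contrast the two approaches.

```cpp
#include <cassert>

// "Set if unset": apply the target default only when the user did not pass
// the option.  EXPLICITLY_SET models the separate record of user choices.
inline int
default_if_unset (int current, bool explicitly_set, int target_default)
{
  return explicitly_set ? current : target_default;
}

// The pattern from the patch: a value of 0 is overwritten regardless of
// whether it came from the default or from an explicit user request.
inline int
force_if_zero (int current)
{
  return current == 0 ? 1 : current;
}
```

With an explicit -fno-unwind-tables (value 0, explicitly set), the first helper keeps 0 while the second turns it into 1, which is the case the review comment is about.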

Would that work here too, or we

Re: [PATCH v4 3/5] aarch64: add svcvt* FP8 intrinsics

2024-11-21 Thread Richard Sandiford

Thanks for the updated series and sorry for the slow reply.

Claudio Bantaloukas  writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c 
> b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
> new file mode 100644
> index 000..e65774deadc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
> @@ -0,0 +1,46 @@
> +/* { dg-additional-options "-march=armv8.5-a+sve2+bf16+fp8" } */
> +/* { dg-require-effective-target aarch64_asm_fp8_ok }  */
> +/* { dg-require-effective-target aarch64_asm_bf16_ok }  */
> +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
> +
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include "test_sve_acle.h"

Sorry for not noticing this last time, but: since the intrinsics are
streaming-compatible, I think we should avoid skipping the test for
that case and instead replace the lines above with:

/* { dg-do assemble { target aarch64_asm_fp8_ok } } */
/* { dg-do compile { target { ! aarch64_asm_fp8_ok } } } */
/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

#include "test_sve_acle.h"

#pragma GCC target "+bf16+fp8"
#ifdef STREAMING_COMPATIBLE
#pragma GCC target "+sme2"
#endif

(untested!).  The idea behind the first two lines is to make sure
that as many testers as possible will at least cover the compilation
part of the test, including the check-function-bodies.  Testers with
modern binutils will continue to get the assembly part too.
(And fp8 should imply bf16, so I don't think we need to check both.)

The pragmas mean that we'll enable SME2 for the streaming-compatible
test, rather than skip that case altogether.

Same for the other tests.

LGTM otherwise.

Thanks,
Richard

Re: [PATCH v4 3/5] aarch64: add svcvt* FP8 intrinsics

2024-11-21 Thread Claudio Bantaloukas




On 21/11/2024 13:09, Richard Sandiford wrote:

> Thanks for the updated series and sorry for the slow reply.
>
> Claudio Bantaloukas  writes:
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
> > new file mode 100644
> > index 000..e65774deadc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
> > @@ -0,0 +1,46 @@
> > +/* { dg-additional-options "-march=armv8.5-a+sve2+bf16+fp8" } */
> > +/* { dg-require-effective-target aarch64_asm_fp8_ok }  */
> > +/* { dg-require-effective-target aarch64_asm_bf16_ok }  */
> > +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
> > +
> > +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> > +
> > +#include "test_sve_acle.h"
>
> Sorry for not noticing this last time, but: since the intrinsics are
> streaming-compatible, I think we should avoid skipping the test for
> that case and instead replace the lines above with:
>
> /* { dg-do assemble { target aarch64_asm_fp8_ok } } */
> /* { dg-do compile { target { ! aarch64_asm_fp8_ok } } } */
> /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>
> #include "test_sve_acle.h"
>
> #pragma GCC target "+bf16+fp8"
> #ifdef STREAMING_COMPATIBLE
> #pragma GCC target "+sme2"
> #endif
>
> (untested!).  The idea behind the first two lines is to make sure
> that as many testers as possible will at least cover the compilation
> part of the test, including the check-function-bodies.  Testers with
> modern binutils will continue to get the assembly part too.
> (And fp8 should imply bf16, so I don't think we need to check both.)
>
> The pragmas mean that we'll enable SME2 for the streaming-compatible
> test, rather than skip that case altogether.
>
> Same for the other tests.

Will do!

> LGTM otherwise.
>
> Thanks,
> Richard

Re: [PATCH v2 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-11-21 Thread Richard Biener

On Thu, 21 Nov 2024, Alex Coplan wrote:

> On 21/11/2024 10:02, Richard Biener wrote:
> > On Fri, 15 Nov 2024, Alex Coplan wrote:
> > 
> > > Hi,
> > > 
> > > This is a v2 which hopefully addresses the feedback for v1 of the 1/5
> > > patch, originally posted here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/48.html
> > > 
> > > As mentioned on IRC, it will need follow-up work to fix up latent
> > > profile issues, but that can be done during stage 3.  We will also need
> > > a simple (hopefully obvious, even) follow-up patch to fix expectations
> > > for various tests (since we now vectorize loops which we previously
> > > couldn't).
> > > 
> > > OK for trunk?
> > 
> > I'm still looking at
> > 
> > +  if (dr_info->need_peeling_for_alignment)
> > +{
> > +  /* Vector size in bytes.  */
> > +  poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
> > (vectype));
> > +
> > +  /* We can only peel for loops, of course.  */
> > +  gcc_checking_assert (loop_vinfo);
> > +
> > +  /* Calculate the number of vectors read per vector iteration.  If
> > +it is a power of two, multiply through to get the required
> > +alignment in bytes.  Otherwise, fail analysis since alignment
> > +peeling wouldn't work in such a case.  */
> > +  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > +   vf *= DR_GROUP_SIZE (stmt_info);
> > 
> > so this is the total number of scalars we load, so maybe call it
> > that way, num_scalars.
> 
> Will do.
> 
> > 
> > +
> > +  auto num_vectors = vect_get_num_vectors (vf, vectype);
> > +  if (!pow2p_hwi (num_vectors))
> > 
> > side-note - with all these multiplies there's the possibility that
> > we get a testcase that has safe_align > PAGE_SIZE, meaning it's
> > no longer a good way to avoid trapping.  This problem of course
> > exists generally, we avoid it elsewhere by not having very large
> > vectors or limiting the group-size.  The actual problem is that
> > we don't know the actual page size, but we maybe could configure
> > a minimum as part of the platform configuration.  Should we for
> > now simply add
> > 
> >   || safe_align > 4096
> > 
> > here?  A testcase would load 512 contiguous uint64 to form an early exit
> > condition, quite unlikely I guess.
> 
> Good point.  I suppose this really depends whether there are
> targets/platforms that GCC supports with a page size smaller than 4k.
> Perhaps a min_page_size target hook (defaulting to 4k) would be
> sensible.  Then if there are any such targets they can override the
> hook.  WDYT?

Yeah, though this might be overkill at this point and a hard-coded
4096 is fine ...
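
The check being discussed can be sketched as follows. This is an illustration of the arithmetic only, not the patch: `required_safe_align` is a hypothetical helper, sizes are plain integers rather than poly_ints, and the 4096-byte minimum page size is the assumption floated above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed minimum page size, as floated in the thread; not a GCC constant.
constexpr std::uint64_t assumed_min_page_size = 4096;

inline bool
is_pow2 (std::uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: returns the alignment in bytes needed so that one
// vector iteration never crosses a page, or 0 if alignment peeling cannot
// provide that guarantee.  VECTOR_BYTES is assumed to be a power of two.
inline std::uint64_t
required_safe_align (std::uint64_t vector_bytes, std::uint64_t num_scalars,
                     std::uint64_t scalar_bytes)
{
  std::uint64_t total_bytes = num_scalars * scalar_bytes;
  if (vector_bytes == 0 || total_bytes % vector_bytes != 0)
    return 0;

  // Number of vectors read per vector iteration must be a power of two.
  std::uint64_t num_vectors = total_bytes / vector_bytes;
  if (!is_pow2 (num_vectors))
    return 0;

  // Aligning beyond the page size no longer says anything about trapping.
  std::uint64_t safe_align = vector_bytes * num_vectors;
  if (safe_align > assumed_min_page_size)
    return 0;

  return safe_align;
}
```

A single 16-byte vector of four 4-byte scalars needs 16-byte alignment, three vectors per iteration are rejected because three is not a power of two, and a read of 8192 bytes per iteration is rejected because it exceeds the assumed page size.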

> > 
> > With DR_GROUP_SIZE != 1 there's also the question whether we can
> > ever reach the desired alignment since each peel will skip
> > DR_GROUP_SIZE scalar elements - either those are already
> > DR_GROUP_SIZE aligned or the result will never be.
> 
> I had a look at this, I think this should already be handled by
> vector_alignment_reachable_p which already includes a suitable check for
> STMT_VINFO_GROUPED_ACCESS.  E.g. for the following testcase:
> 
> double *a;
> int f(int n) {
>   for (int i = 0; i < n; i += 2)
>     if (a[i])
>       __builtin_abort();
> }
> 
> we have DR_GROUP_SIZE = 2 and in the dump (-O3, with the alignment peeling
> patches) we see:
> 
> note:   === vect_enhance_data_refs_alignment ===
> missed:   vector alignment may not be reachable
> note: vect_can_advance_ivs_p:
> note:   Analyze phi: i_12 = PHI 
> note:   Alignment of access forced using versioning.
> note:   Versioning for alignment will be applied.
> 
> so we decide to version instead, since peeling isn't viable here
> (vector_alignment_reachable_p correctly returns false).

Ah yeah, I thought you might need to adjust that but indeed the check
there should already work.
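
The point about group size can be illustrated with address arithmetic: each peeled iteration advances the access by the group size times the element size, so some starting misalignments can never be corrected. The sketch below is a brute-force illustration and is not how vector_alignment_reachable_p is implemented; `alignment_reachable_by_peeling` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if peeling some number of scalar iterations can
// bring the access at ADDR to an ALIGN-byte boundary, given that each
// iteration advances the address by GROUP_SIZE * ELEM_SIZE bytes.
// ALIGN must be nonzero; the residues repeat after at most ALIGN steps, so
// trying ALIGN candidates is enough.
inline bool
alignment_reachable_by_peeling (std::uint64_t addr, std::uint64_t elem_size,
                                std::uint64_t group_size, std::uint64_t align)
{
  std::uint64_t step = elem_size * group_size;
  for (std::uint64_t k = 0; k < align; ++k)
    if ((addr + k * step) % align == 0)
      return true;
  return false;
}
```

With 8-byte doubles, a group size of 2 and a 16-byte target alignment, an access that starts 8 bytes off a boundary stays 8 bytes off after every peel, while the same access with a group size of 1 becomes aligned after one peeled iteration.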

> Perhaps that says the DR flag should really be called
> need_peeling_or_versioning_for_alignment (although better I guess just
> to update the comment above its definition to mention versioning).

Or simply ->requires_alignment however we arrange for that.  But it's
just a name, so ...

> > 
> > @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
> > dr_vec_info *dr_info,
> >if (misalignment == DR_MISALIGNMENT_UNKNOWN)
> >  is_packed = not_size_aligned (DR_REF (dr));
> >if (targetm.vectorize.support_vector_misalignment (mode, type, 
> > misalignment,
> > -is_packed))
> > +is_packed)
> > +  && !dr_info->need_peeling_for_alignment)
> >  return dr_unaligned_supported;
> > 
> > I think you need to do this earlier, like with
> > 
> >   if (misalignment == 0)
> > return dr_aligned;
> > + else if (dr_info->need_peeling_for_alignment)
> > +   return dr_unaligned_unsupported;
> 
> That seems more obviously correct indeed.  I'll

Re: [PATCH] testsuite: Fix up vector-{8,9,10}.c tests

2024-11-21 Thread Richard Biener

On Fri, 22 Nov 2024, Jakub Jelinek wrote:

> On Thu, Nov 21, 2024 at 01:30:39PM +0100, Christoph Müllner wrote:
> > > >   * gcc.dg/tree-ssa/satd-hadamard.c: New test.
> > > >   * gcc.dg/tree-ssa/vector-10.c: New test.
> > > >   * gcc.dg/tree-ssa/vector-8.c: New test.
> > > >   * gcc.dg/tree-ssa/vector-9.c: New test.
> 
> I see FAILs on i686-linux or on x86_64-linux (in the latter
> with -m32 testing).
> 
> One problem is that vector-10.c doesn't use -Wno-psabi option
> and uses a function which returns a vector and takes vector
> as first parameter, the other problems are that 3 other
> tests don't arrange for at least basic vector ISA support,
> plus non-standardly test only on x86_64-*-*, while normally
> one would allow both i?86-*-* x86_64-*-* and if it is e.g.
> specific to 64-bit, also check for lp64 or int128 or whatever
> else is needed.  E.g. Solaris I think has i?86-*-* triplet even
> for 64-bit code, etc.
> 
> The following patch fixes these.
> Tested on x86_64-linux with
> make check-gcc 
> RUNTESTFLAGS="--target_board=unix\{-m32,-m32/-mno-mmx/-mno-sse,-m64\} 
> tree-ssa.exp='vector-*.c satd-hadamard.c'"
> ok for trunk?

OK.

> 2024-11-22  Jakub Jelinek  
> 
>   * gcc.dg/tree-ssa/satd-hadamard.c: Add -msse2 as dg-additional-options
>   on x86.  Also scan-tree-dump on i?86-*-*.
>   * gcc.dg/tree-ssa/vector-8.c: Likewise.
>   * gcc.dg/tree-ssa/vector-9.c: Likewise.
>   * gcc.dg/tree-ssa/vector-10.c: Add -Wno-psabi to dg-additional-options.
> 
> --- gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c.jj  2024-11-22 
> 00:06:56.341057153 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c 2024-11-22 
> 00:17:38.539656767 +0100
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-additional-options "-O3 -fdump-tree-forwprop4-details" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
>  
>  #include 
>  
> @@ -40,4 +41,4 @@ x264_pixel_satd_8x4_simplified (uint8_t
>return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
>  }
>  
> -/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop4" { 
> target { aarch64*-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop4" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> --- gcc/testsuite/gcc.dg/tree-ssa/vector-8.c.jj   2024-11-22 
> 00:06:56.341057153 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/vector-8.c  2024-11-22 00:16:51.047298247 
> +0100
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
>  
>  typedef int vec __attribute__((vector_size (4 * sizeof (int))));
>  
> @@ -30,5 +31,5 @@ void f (vec *p_v_in_1, vec *p_v_in_2, ve
>*p_v_out_2 = v_out_2;
>  }
>  
> -/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
> -/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
> target { aarch64*-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> --- gcc/testsuite/gcc.dg/tree-ssa/vector-9.c.jj   2024-11-22 
> 00:06:56.341057153 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/vector-9.c  2024-11-22 00:17:16.050960523 
> +0100
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
>  
>  typedef int vec __attribute__((vector_size (4 * sizeof (int))));
>  
> @@ -30,5 +31,5 @@ void f (vec *p_v_in_1, vec *p_v_in_2, ve
>*p_v_out_2 = v_out_2;
>  }
>  
> -/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
> -/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
> target { aarch64*-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been 
> blended" "forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
> target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
> --- gcc/testsuite/gcc.dg/tree-ssa/vector-10.c.jj  2024-11-22 
> 00:06:56.341057153 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/vector-10.c 2024-11-22 00:15:28.875406026 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
> +/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } 
> */
>  
>  typedef int vec __attribute__((vector_size (4 * sizeof (int))));
>  
> 
>   Jakub
> 
> 

--

Re: [PATCH] i386/testsuite: Do not append AVX10.2 option for check_effective_target

2024-11-21 Thread Hongtao Liu

On Fri, Nov 22, 2024 at 2:40 PM Haochen Jiang  wrote:
>
> Hi all,
>
> When -mavx10.2 meets -march with AVX512 enabled, it will report a warning
> about a vector size conflict. The warning prevents the tests from running
> on GCC with an arch native build on those platforms, because the
> check_effective_target check fails.
>
> Remove the AVX10.2 options: we are using inline asm, which does not
> actually need them. This eliminates the warning.
>
> Tested with -march=native with AVX512. Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp (check_effective_target_avx10_2):
> Remove AVX10.2 option.
> (check_effective_target_avx10_2_512): Ditto.
> ---
>  gcc/testsuite/lib/target-supports.exp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index f3828793986..301254afcf5 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -10805,7 +10805,7 @@ proc check_effective_target_avx10_2 { } {
>   __asm__ volatile ("vcvtph2ibs\t%ymm5, %ymm6");
>   __asm__ volatile ("vminmaxpd\t$123, %ymm4, %ymm5, %ymm6");
> }
> -} "-mavx10.2" ]
> +} "" ]
>  }
>
>  # Return 1 if avx10.2-512 instructions can be compiled.
> @@ -10820,7 +10820,7 @@ proc check_effective_target_avx10_2_512 { } {
>   __asm__ volatile ("vcvtph2ibs\t%zmm5, %zmm6");
>   __asm__ volatile ("vminmaxpd\t$123, %zmm4, %zmm5, %zmm6");
> }
> -} "-mavx10.2-512" ]
> +} "" ]
>  }
>
>  # Return 1 if amx-avx512 instructions can be compiled.
> --
> 2.31.1
>


-- 
BR,
Hongtao

Re: [pushed][PATCH 0/2] Remove redundant code.

2024-11-21 Thread Lulu Cheng


Pushed to r15-5583 and r15-5584.

On 2024/11/2 at 10:48 AM, Lulu Cheng wrote:

Lulu Cheng (2):
   LoongArch: Remove redundant code.
   LoongArch: Modify the document to remove options that don't exist.

  gcc/config/loongarch/loongarch-builtins.cc | 102 -
  gcc/config/loongarch/loongarch-protos.h|   1 -
  gcc/config/loongarch/loongarch.cc  |   8 --
  gcc/doc/invoke.texi|  10 +-
  4 files changed, 5 insertions(+), 116 deletions(-)

Re: [PATCH] doc: mention STAGE1_CFLAGS

2024-11-21 Thread Richard Biener

On Wed, Nov 20, 2024 at 7:16 PM Sam James  wrote:
>
> Sam James  writes:
>
> > STAGE1_CFLAGS can be used to accelerate the just-built stage1 compiler
> > which especially improves its performance on some of the large generated
> > files during bootstrap. It defaults to nothing (i.e. -O0).
> >
> > The downside is that if the native compiler is buggy, there's a greater
> > risk of a failed bootstrap. Those with a modern native compiler, ideally
> > a recent version of GCC, should be able to use -O1 or -O2 without issue
> > to get a faster build.
> >
> >   PR rtl-optimization/111619
> >   * doc/install.texi (Building a native compiler): Discuss 
> > STAGE1_CFLAGS.
> > ---
> > This came out of a discussion between mjw and I a little while ago when
> > working on the buildbots. OK?
>
> Ping.
>
> >
> >  gcc/doc/install.texi | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> > index 705440ffd330..4bd60555af9b 100644
> > --- a/gcc/doc/install.texi
> > +++ b/gcc/doc/install.texi
> > @@ -3017,7 +3017,11 @@ bootstrapped, you can use @code{CFLAGS_FOR_TARGET} 
> > to modify their
> >  compilation flags, as for non-bootstrapped target libraries.
> >  Again, if the native compiler miscompiles the stage1 compiler, you may
> >  need to work around this by avoiding non-working parts of the stage1
> > -compiler.  Use @code{STAGE1_TFLAGS} to this end.
> > +compiler.  Use @code{STAGE1_CFLAGS} and @code{STAGE1_TFLAGS} (for target
> > +libraries) to this end.  The default value for @code{STAGE1_CFLAGS} is
> > +@samp{STAGE1_CFLAGS='-O0'} to increase the chances of a successful 
> > bootstrap

The default for STAGE1_CFLAGS is '-g', not -O0 (this can make a
difference for non-GCC host compilers).  The default for STAGE1_TFLAGS
is '-O2 -g'; note the stage1 target libraries are built by the stage1
compiler, not the host compiler.  I think the sentence above talks
about STAGE1_TFLAGS for a reason, and a better change would be to
simply add

'You can use @code{STAGE1_CFLAGS} to set the flags passed to the host
compiler when building the stage1 compiler.  The default is to pass
@option{-g}; when the host compiler is GCC this results in a
non-optimizing build of the stage1 compiler.  You can speed up the
bootstrap by using @samp{STAGE1_CFLAGS='-O2'} at the increased chance
to miscompile the stage1 compiler when the host compiler is buggy.'

Richard.

> > +with a buggy native compiler.  Changing this to @code{-O1} or @code{-O2}
> > +can improve bootstrap times, with some greater risk of a failed bootstrap.
> >
> >  If you used the flag @option{--enable-languages=@dots{}} to restrict
> >  the compilers to be built, only those you've actually enabled will be
> >
> > base-commit: 00448f9b5a123b4b6b3e6f45d2fecf0a5dca66b3

Re: [PATCH v1 2/2] Match: Refactor the unsigned SAT_ADD match pattern [NFC]

2024-11-21 Thread Richard Biener

On Thu, Nov 21, 2024 at 3:09 AM Li, Pan2  wrote:
>
> Almost forgot this patch; a kind reminder.

OK.

Richard.

> Pan
>
> -Original Message-
> From: Li, Pan2 
> Sent: Monday, November 11, 2024 4:44 PM
> To: gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
> rdapp@gmail.com; Li, Pan2 
> Subject: [PATCH v1 2/2] Match: Refactor the unsigned SAT_ADD match pattern 
> [NFC]
>
> From: Pan Li 
>
> This patch would like to refactor the unsigned SAT_ADD pattern by:
> * Extract type check outside.
> * Extract common sub pattern.
> * Re-arrange the related match pattern forms together.
> * Remove unnecessary helper pattern matches.
>
> The test suites below pass with this patch.
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Refactor sorts of unsigned SAT_ADD match pattern.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 109 +--
>  1 file changed, 45 insertions(+), 64 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index af3272eac55..3a76757 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3086,14 +3086,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> || POINTER_TYPE_P (itype))
>&& wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
>
> -/* Unsigned Saturation Add */
> -/* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
> -   SAT_ADD = (X + Y) | -((X + Y) < X)  */
> -(match (usadd_left_part_1 @0 @1)
> - (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1))))
> -
>  /* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
> SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> 
> != 0) */
>  (match (usadd_left_part_2 @0 @1)
> @@ -3101,20 +3093,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>    && types_match (type, @0, @1))))
>
> -/* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
> -   SAT_ADD = (X + Y) | -((type)(X + Y) < X)  */
> -(match (usadd_right_part_1 @0 @1)
> - (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1))))
> -
> -/* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
> -   SAT_ADD = (X + Y) | -(X > (X + Y))  */
> -(match (usadd_right_part_1 @0 @1)
> - (negate (convert (gt @0 (plus:c @0 @1))))
> - (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && types_match (type, @0, @1))))
> -
>  /* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
> SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> 
> != 0) */
>  (match (usadd_right_part_2 @0 @1)
> @@ -3129,33 +3107,62 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>    && types_match (type, @0, @1))))
>
> +(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
> + (match (usadd_overflow_mask @0 @1)
> +  /* SAT_U_ADD = (X + Y) | -(X > (X + Y)).
> + Overflow_Mask = -(X > (X + Y)).  */
> +  (negate (convert (gt @0 (plus:c @0 @1))))
> +  (if (types_match (type, @0, @1))))
> + (match (usadd_overflow_mask @0 @1)
> +  /* SAT_U_ADD = (X + Y) | -(X > (X + Y)).
> + Overflow_Mask = -((X + Y) < X).  */
> +  (negate (convert (lt (plus:c @0 @1) @0)))
> +  (if (types_match (type, @0, @1))))
> + (match (unsigned_integer_sat_add @0 @1)
> +  /* SAT_U_ADD = (X + Y) | Overflow_Mask  */
> +  (bit_ior:c (plus:c @0 @1) (usadd_overflow_mask @0 @1))
> +  (if (types_match (type, @0, @1))))
> + (match (unsigned_integer_sat_add @0 @1)
> +  /* SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1  */
> +  (cond^ (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> +  (if (types_match (type, @0, @1))))
> + (match (unsigned_integer_sat_add @0 @1)
> +  /* SAT_U_ADD = (X + Y) < X ? -1 : (X + Y)  */
> +  (cond^ (lt (plus:c@2 @0 @1) @0) integer_minus_onep @2)
> +  (if (types_match (type, @0, @1))))
> + (match (unsigned_integer_sat_add @0 @1)
> +  /* SAT_U_ADD = X <= (X + Y) ? (X + Y) : -1  */
> +  (cond^ (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> +  (if (types_match (type, @0, @1))))
> + (match (unsigned_integer_sat_add @0 @1)
> +  /* SAT_U_ADD = X > (X + Y) ? -1 : (X + Y)  */
> +  (cond^ (gt @0 (plus:c@2 @0 @1)) integer_minus_onep @2)
> +  (if (types_match (type, @0, @1))))
> + (match (unsigned_integer_sat_add @0 @1)
> +  /* SAT_U_ADD = (X + IMM) >= x ? (X + IMM) : -1  */
> +  (plus (min @0 INTEGER_CST@2) INTEGER_CST@1)
> +  (if (types_match (type, @0, @1))
> +   (with
> +{
> + unsigned precision = TYPE_PRECISION (type);
> + wide_int cst_1 = wi::to_wide (@1);
> + wide_int cst_2 = wi::to_wide (@2);
> + wide_int max = wi::mask (precision, false, precision);
> + wide_int sum = wi::add (cst_1, cst_2);
> +}
> +(if (wi::eq_p (max, sum)))
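
For readers less familiar with the source forms these patterns are meant to recognize, the following is a small C sketch of three of them, based on the comments in the patch.  The function names are invented for illustration, uint32_t is just one example type, and the constant 10 in the immediate form is arbitrary.

```c
#include <stdint.h>

/* Mask form from the patch comments:
   SAT_U_ADD = (X + Y) | -((X + Y) < X).
   The comparison yields 1 on wrap-around, and negating it gives an
   all-ones mask.  */
static inline uint32_t
sat_u_add_mask (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}

/* Conditional form: SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1.  */
static inline uint32_t
sat_u_add_cond (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum >= x ? sum : UINT32_MAX;
}

/* Immediate form: min (X, C2) + C1 with C1 + C2 equal to the maximum
   value of the type.  C1 = 10 is an arbitrary example constant.  */
static inline uint32_t
sat_u_add_imm10 (uint32_t x)
{
  const uint32_t c1 = 10;
  const uint32_t c2 = UINT32_MAX - c1;
  return (x < c2 ? x : c2) + c1;
}
```

All three return the plain sum when it fits and the maximum value of the type otherwise, which is why they can share a single unsigned_integer_sat_add match.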

[RFC PATCH] dwarf2out: Use post-DWARF 5 DW_LANG_* codes for -gdwarf-5 -gno-strict-dwarf

2024-11-21 Thread Jakub Jelinek

Hi!

DWARF now maintains DW_LANG_* code assignment online and 27 language codes
have been assigned already after DWARF 5 has been released, see
https://dwarfstd.org/languages.html
including one added yesterday (DW_LANG_C23).
DWARF 6 plans to use something different, DW_AT_language_{name,version}
pair where the new language versions will be just dealt with automatically
rather than adding new codes, say for C23 we'll be able to use
DW_LNAME_C 202311 while for C2Y for now to use
DW_LNAME_C 202500 until the standard is finalized.

Now, the question is whether the toolchain should use those post DWARF 5
codes for -gdwarf-5 -gno-strict-dwarf, or if we'll just ignore those
and only switch to DWARF 6 stuff when the standard is released and people
use -gdwarf-6 (or when we switch over to that as default).

The following patch starts using those new codes (just for C/C++ for now,
Ada/Fortran not switched, Ada because I'm really not familiar with Ada and
Fortran because it doesn't say 2018 in the language string).

The problem with the patch is that it regresses quite a few tests,
in particular
gcc.dg/guality/pr78726.c
g++.dg/guality/redeclaration1.C
libstdc++-prettyprinters/*.cc
libstdc++-xmethods/deque.cc
because my gdb doesn't handle those (but git trunk gdb doesn't either),
so for those the new codes are just unknown languages rather than newer
revisions of C or C++.
From what I can read in gdb, it doesn't seem to care about exact standard
revision, all it cares about is if the TU is C, C++, Fortran, Ada etc.
So, from this POV perhaps we shouldn't switch at all and ignore all the
post-DWARF 5 codes.
Or shall we wait until gdb, elfutils, whatever else actually looks at
DW_AT_language values is changed to handle the new codes and apply this
patch after that (still one would need a new version of gdb/elfutils/etc.)?
Or wait say half a year or year after that support is added in the
consumers?

The DWARF 6 planned scheme was designed exactly to overcome this problem,
consumers that only care if something is C or C++ etc. will be able to
hardcode the code once and if they care for some behavior on something
more specific, they can just compare the version, DW_AT_language_version >=
201703 for C++ (or < etc.), or for Fortran DW_AT_language_version >= 2008,
...
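
As a rough illustration of how a consumer could use the planned name/version pair, here is a minimal C sketch.  Everything in it is an assumption made for the example: the enum values are placeholders rather than the real DW_LNAME_* encodings, and the struct and function names are hypothetical and do not come from gdb, elfutils or GCC.

```c
#include <stdbool.h>

/* Placeholder language names; these are NOT the real DW_LNAME_* values.  */
enum lang_name { LNAME_C = 1, LNAME_CXX = 2, LNAME_FORTRAN = 3 };

/* Hypothetical per-CU record holding the two planned attributes.  */
struct cu_language
{
  enum lang_name name;  /* Models DW_AT_language_name.  */
  unsigned version;     /* Models DW_AT_language_version, e.g. 201703.  */
};

/* A consumer that only cares about the language family checks the name
   once and never needs updating for new standard revisions.  */
static bool
is_cxx (const struct cu_language *lang)
{
  return lang->name == LNAME_CXX;
}

/* A consumer that cares about a specific revision compares the version.  */
static bool
is_cxx_at_least (const struct cu_language *lang, unsigned version)
{
  return is_cxx (lang) && lang->version >= version;
}
```

With this shape, a future C++ revision is still recognized as C++ by is_cxx without any new code, which is the property the single-code DW_LANG_* scheme lacks.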

2024-11-21  Jakub Jelinek  

gcc/
* dwarf2out.cc (is_c): Handle also DW_LANG_C{17,23}.
(is_cxx): Handle also DW_LANG_C_plus_plus_{17,20,23}.
(is_fortran): Handle also DW_LANG_Fortran18.
(is_ada): Handle also DW_LANG_Ada20{05,12}.
(lower_bound_default): Handle also
DW_LANG_{C{17,23},C_plus_plus_{17,20,23},Fortran18,Ada20{05,12}}.
(add_prototyped_attribute): Handle DW_LANG_C{17,23}.
(gen_compile_unit_die): Use DW_LANG_C17 if not -gstrict-dwarf
for C17.  Use DW_LANG_C23 if not -gstrict-dwarf for C23/C2Y.  Use
DW_LANG_C_plus_plus_{17,20,23} if not -gstrict-dwarf for C++{17,20,23}
and the last one also for C++26.  Handle DW_LANG_Fortran18.
gcc/testsuite/
* g++.dg/debug/dwarf2/lang-cpp17.C: Add -gno-strict-dwarf to
dg-options and expect different DW_AT_language value.
* g++.dg/debug/dwarf2/lang-cpp20.C: Likewise.
* g++.dg/debug/dwarf2/lang-cpp23.C: New test.

--- gcc/dwarf2out.cc.jj 2024-10-25 10:00:29.445768186 +0200
+++ gcc/dwarf2out.cc 2024-11-20 21:49:48.237062064 +0100
@@ -5540,7 +5540,8 @@ is_c (void)
   unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
 
   return (lang == DW_LANG_C || lang == DW_LANG_C89 || lang == DW_LANG_C99
- || lang == DW_LANG_C11 || lang == DW_LANG_ObjC);
+ || lang == DW_LANG_C11 || lang == DW_LANG_C17 || lang == DW_LANG_C23
+ || lang == DW_LANG_ObjC);
 
 
 }
@@ -5553,7 +5554,9 @@ is_cxx (void)
   unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
 
   return (lang == DW_LANG_C_plus_plus || lang == DW_LANG_ObjC_plus_plus
- || lang == DW_LANG_C_plus_plus_11 || lang == DW_LANG_C_plus_plus_14);
+ || lang == DW_LANG_C_plus_plus_11 || lang == DW_LANG_C_plus_plus_14
+ || lang == DW_LANG_C_plus_plus_17 || lang == DW_LANG_C_plus_plus_20
+ || lang == DW_LANG_C_plus_plus_23);
 }
 
 /* Return TRUE if DECL was created by the C++ frontend.  */
@@ -5581,7 +5584,8 @@ is_fortran (void)
  || lang == DW_LANG_Fortran90
  || lang == DW_LANG_Fortran95
  || lang == DW_LANG_Fortran03
- || lang == DW_LANG_Fortran08);
+ || lang == DW_LANG_Fortran08
+ || lang == DW_LANG_Fortran18);
 }
 
 static inline bool
@@ -5617,7 +5621,8 @@ is_ada (void)
 {
   unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
 
-  return lang == DW_LANG_Ada95 || lang == DW_LANG_Ada83;
+  return (lang == DW_LANG_Ada95 || lang == DW_LANG_Ada83
+ || lang == DW_LANG_Ada2005 || lang == DW_LANG_Ada2012);
 }
 
 /* Return TRUE if the language is D.  */
@@ -21645,9 +21650,14 @@ lower_bound_default (void)
 ca

Re: [PATCH v2 04/14] tree-phinodes: Use 4 instead of 2 as the minimum number of phi args

2024-11-21 Thread Lewis Hyatt

On Wed, Nov 20, 2024 at 10:19:13AM +0100, Richard Biener wrote:
> On Tue, Nov 19, 2024 at 5:46 PM Lewis Hyatt  wrote:
> >
> > On Tue, Nov 19, 2024 at 9:59 AM Richard Biener
> >  wrote:
> > >
> > > On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt  wrote:
> > > >
> > > > Currently, when we allocate a gphi object, we round up the capacity for 
> > > > the
> > > > trailing arguments array such that it will make full use of the page 
> > > > size
> > > > that ggc will allocate. While there is also an explicit minimum of 2
> > > > arguments, in practice after rounding to the ggc page size there is 
> > > > always
> > > > room for at least 4.
> > > >
> > > > It seems we have some code that has come to depend on there being this 
> > > > much
> > > > room before reallocation of a PHI is required. For example, the function
> > > > loop_version () used during loop optimization will make sure there is 
> > > > room
> > > > for an additional edge on each PHI that it processes. But there are call
> > > > sites which cache a PHI pointer prior to calling loop_version () and 
> > > > assume
> > > > it remains valid afterward, thus implicitly assuming that the PHI will 
> > > > have
> > > > spare capacity. Examples include split_loop () and gen_parallel_loop ().
> > > >
> > > > This works fine now, but if the size of a gphi becomes larger, e.g. due 
> > > > to
> > > > configuring location_t to be a 64-bit type, then on 32-bit platforms it 
> > > > ends
> > > > up being possible to get a gphi with only 2 arguments of capacity, 
> > > > causing
> > > > the above call sites of loop_version () to fail. (They store a pointer 
> > > > to a
> > > > gphi object that no longer has the same meaning it did before it got
> > > > reallocated.) The testcases gcc.dg/torture/pr113707-2.c and
> > > > gcc.dg/graphite/pr81945.c exhibit that failure mode.
> > > >
> > > > It may be necessary to adjust those call sites to make this more 
> > > > robust, but
> > > > in the meantime, changing the minimum from 2 to 4 does no harm given the
> > > > minimum is practically 4 anyway, and it resolves the issue for 32-bit
> > > > platforms.
> > >
> > > We need to fix the users.  Note ideal_phi_node_len rounds up to a power 
> > > of two
> > > but extra_order_size_table also has MAX_ALIGNMENT * n with n from 1 to 16
> > > buckets, so such extensive rounding up is not needed.
> > >
> > > The cache is also quite useless this way (I didn't fix this when last 
> > > working
> > > there).
> > >
> >
> > Adjusting the call sites definitely sounds right, but I worry it's
> > potentially a big change?
> 
> I already had to fixup quite some places because gphis now can be
> ggc_free()d when removed or re-allocated.  It seems you simply uncovered
> more of those places.
> 
> > So one of the call sites that caused problems here was around line 620
> > in tree-ssa-loop-split.cc:
> >
> >  /* Find a loop PHI node that defines guard_iv directly,
> >or create one doing that.  */
> > gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
> > if (!phi)
> >
> > It remembers "phi" and reuses it later when it might have been
> > invalidated. That one is easy to fix, find_or_create_guard_phi() can
> > just be called again later.
> 
> One trick is to instead remember the PHI result SSA variable and
> later check its SSA_NAME_DEF_STMT which will be the re-allocated
> PHI.  I think this should work almost everywhere for the re-allocation issue.
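
The idea of holding on to the result name instead of the PHI pointer can be modelled outside GCC.  The sketch below is a self-contained toy in C: struct phi, struct ssa_name, phi_create and phi_add_arg are all invented for this example and only mimic the relationship between a gphi, its result and SSA_NAME_DEF_STMT.

```c
#include <stdlib.h>
#include <string.h>

struct phi;

/* Toy stand-in for an SSA name: it always points at its current
   defining statement.  */
struct ssa_name
{
  struct phi *def;
};

/* Toy stand-in for a gphi with a trailing argument array.  */
struct phi
{
  struct ssa_name *result;
  unsigned capacity;
  unsigned nargs;
  int args[];
};

/* Create a PHI with room for CAPACITY (> 0) arguments defining RES.  */
static struct phi *
phi_create (struct ssa_name *res, unsigned capacity)
{
  struct phi *p = malloc (sizeof *p + capacity * sizeof (int));
  if (!p)
    return NULL;
  p->result = res;
  p->capacity = capacity;
  p->nargs = 0;
  res->def = p;
  return p;
}

/* Append ARG.  When the PHI is full it is re-allocated and the old
   object is freed, so pointers to the old PHI become stale; the result
   name is updated to point at the new object.  */
static struct phi *
phi_add_arg (struct phi *p, int arg)
{
  if (p->nargs == p->capacity)
    {
      unsigned new_cap = p->capacity * 2;
      struct phi *q = malloc (sizeof *q + new_cap * sizeof (int));
      if (!q)
        return NULL;
      memcpy (q, p, sizeof *p + p->nargs * sizeof (int));
      q->capacity = new_cap;
      q->result->def = q;
      free (p);
      p = q;
    }
  p->args[p->nargs++] = arg;
  return p;
}
```

A caller that cached the struct phi pointer before the third phi_add_arg on a two-slot PHI would be left with a dangling pointer, while a caller that kept the struct ssa_name and re-read its def field afterwards sees the re-allocated object.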
> 
> > But the other one I ran into with testing was in tree-parloops.cc:
> >
> > /* Element of the hashtable, representing a
> >reduction in the current loop.  */
> > struct reduction_info
> > {
> >   gimple *reduc_stmt;           /* reduction statement.  */
> >   gimple *reduc_phi;            /* The phi node defining the reduction.  */
> >   enum tree_code reduction_code;/* code for the reduction operation.  */
> >   unsigned reduc_version;       /* SSA_NAME_VERSION of original reduc_phi
> >                                    result.  */
> >   gphi *keep_res;               /* The PHI_RESULT of this phi is the
> >                                    resulting value of the reduction
> >                                    variable when existing the loop. */
> >   tree initial_value;           /* The initial value of the reduction var
> >                                    before entering the loop.  */
> >   tree field;                   /*  the name of the field in the parloop
> >                                    data structure intended for reduction.  */
> >   tree reduc_addr;              /* The address of the reduction variable for
> >                                    openacc reductions.  */
> >   tree init;                    /* reduction initialization value.  */
> >   gphi *new_phi;                /* (helper field) Newly created phi node
> >                                    whose result will be passed to the
> >                                    atomic operation.  Represents the local
> >                                    result each thread computed for the
> >                                    reduction operation.  */

Re: [RFC PATCH] dwarf2out: Use post-DWARF 5 DW_LANG_* codes for -gdwarf-5 -gno-strict-dwarf

2024-11-21 Thread Mark Wielaard

Hi Jakub,

On Thu, Nov 21, 2024 at 10:16:13AM +0100, Jakub Jelinek via Gdb wrote:
> From what I can read in gdb, it doesn't seem to care about exact standard
> revision, all it cares about is if the TU is C, C++, Fortran, Ada etc.
> So, from this POV perhaps we shouldn't switch at all and ignore all the
> post-DWARF 5 codes.
> Or shall we wait until gdb, elfutils, whatever else actually looks at
> DW_AT_language values is changed to handle the new codes and apply this
> patch after that (still one would need a new version of gdb/elfutils/etc.)?
> Or wait say half a year or year after that support is added in the
> consumers?

I'll work with Sasha to make sure support is there for binutils, gdb,
elfutils and valgrind (patches should be simple) and backport it so
those consumers should be ready before end of year.

> The DWARF 6 planned scheme was designed exactly to overcome this problem,
> consumers that only care if something is C or C++ etc. will be able to
> hardcode the code once and if they care for some behavior on something
> more specific, they can just compare the version, DW_AT_language_version >=
> 201703 for C++ (or < etc.), or for Fortran DW_AT_language_version >= 2008,
> ...

Which is obviously much nicer. But I think it will be a little
confusing to mix pre-DWARF6 (which won't be done before GCC15
releases) with DWARF5. So let's work on that for GCC16.

The patch itself looks good to me.

Cheers,

Mark

[PATCH v2] s390: Add expander for uaddc/usubc optabs

2024-11-21 Thread Stefan Schulze Frielinghaus

Bootstrap and regtest are still running.  If those are successful and
there are no further comments I will push this one in the coming days.
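
For context on what the uaddc/usubc optabs in the subject compute, here is a portable C sketch of add-with-carry and subtract-with-borrow.  It reflects my understanding of the optab semantics (a sum or difference plus a 0/1 carry or borrow out, given a 0/1 carry or borrow in); the function names are invented for this example and nothing here is taken from the s390 patch itself.

```c
#include <stdint.h>

/* Return A + B + CARRY_IN and set *CARRY_OUT to 1 iff the unsigned
   addition overflowed.  CARRY_IN must be 0 or 1.  */
static inline uint64_t
add_with_carry (uint64_t a, uint64_t b, unsigned carry_in,
                unsigned *carry_out)
{
  uint64_t s1 = a + b;
  unsigned c1 = s1 < a;         /* Wrap-around in the first addition.  */
  uint64_t s2 = s1 + carry_in;
  unsigned c2 = s2 < s1;        /* Wrap-around when adding the carry.  */
  *carry_out = c1 | c2;         /* At most one of the two can be set.  */
  return s2;
}

/* Return A - B - BORROW_IN and set *BORROW_OUT to 1 iff the unsigned
   subtraction wrapped around.  BORROW_IN must be 0 or 1.  */
static inline uint64_t
sub_with_borrow (uint64_t a, uint64_t b, unsigned borrow_in,
                 unsigned *borrow_out)
{
  uint64_t d1 = a - b;
  unsigned b1 = a < b;
  uint64_t d2 = d1 - borrow_in;
  unsigned b2 = d1 < borrow_in;
  *borrow_out = b1 | b2;
  return d2;
}
```

Chaining these helpers limb by limb, feeding each carry out into the next carry in, gives multi-word arithmetic; that chained shape is the kind of code an add-with-carry instruction sequence is meant to serve.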

-- >8 --

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_emit_compare): Add mode
parameter for the resulting RTX.
* config/s390/s390.cc (s390_emit_compare): Ditto.
(s390_emit_compare_and_swap): Change.
(s390_expand_vec_strlen): Change.
(s390_expand_cs_hqi): Change.
(s390_expand_split_stack_prologue): Change.
* config/s390/s390.md (*add3_carry1_cc): Renamed to ...
(add3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub3_borrow_cc): Renamed to ...
(sub3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add3_alc_carry1_cc): Renamed to ...
(add3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub3_slb_borrow1_cc): New.
(uaddc5): New.
(usubc5): New.

gcc/testsuite/ChangeLog:

* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/uaddc-3.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
* gcc.target/s390/usubc-3.c: New test.
---
 gcc/config/s390/s390-protos.h   |   2 +-
 gcc/config/s390/s390.cc |  20 +--
 gcc/config/s390/s390.md | 115 +
 gcc/testsuite/gcc.target/s390/uaddc-1.c | 156 
 gcc/testsuite/gcc.target/s390/uaddc-2.c |  25 
 gcc/testsuite/gcc.target/s390/uaddc-3.c |  27 
 gcc/testsuite/gcc.target/s390/usubc-1.c | 156 
 gcc/testsuite/gcc.target/s390/usubc-2.c |  25 
 gcc/testsuite/gcc.target/s390/usubc-3.c |  29 +
 9 files changed, 519 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/usubc-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/usubc-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/usubc-3.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index e7ac59d17da..b8604394391 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -86,7 +86,7 @@ extern int tls_symbolic_operand (rtx);
 extern bool s390_match_ccmode (rtx_insn *, machine_mode);
 extern machine_mode s390_tm_ccmode (rtx, rtx, bool);
 extern machine_mode s390_select_ccmode (enum rtx_code, rtx, rtx);
-extern rtx s390_emit_compare (enum rtx_code, rtx, rtx);
+extern rtx s390_emit_compare (machine_mode, enum rtx_code, rtx, rtx);
 extern rtx_insn *s390_emit_jump (rtx, rtx);
 extern bool symbolic_reference_mentioned_p (rtx);
 extern bool tls_symbolic_reference_mentioned_p (rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c9172d1153a..4c8bf21539c 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -2029,9 +2029,9 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx 
*op1,
the IF_THEN_ELSE of the conditional branch testing the result.  */
 
 rtx
-s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
+s390_emit_compare (machine_mode mode, enum rtx_code code, rtx op0, rtx op1)
 {
-  machine_mode mode = s390_select_ccmode (code, op0, op1);
+  machine_mode cc_mode = s390_select_ccmode (code, op0, op1);
   rtx cc;
 
   /* Force OP1 into register in order to satisfy VXE TFmode patterns.  */
@@ -2043,17 +2043,17 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
   /* Do not output a redundant compare instruction if a
 compare_and_swap pattern already computed the result and the
 machine modes are compatible.  */
-  gcc_assert (s390_cc_modes_compatible (GET_MODE (op0), mode)
+  gcc_assert (s390_cc_modes_compatible (GET_MODE (op0), cc_mode)
  == GET_MODE (op0));
   cc = op0;
 }
   else
 {
-  cc = gen_rtx_REG (mode, CC_REGNUM);
-  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (mode, op0, op1)));
+  cc = gen_rtx_REG (cc_mode, CC_REGNUM);
+  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (cc_mode, op0, op1)));
 }
 
-  return gen_rtx_fmt_ee (code, VOIDmode, cc, const0_rtx);
+  return gen_rtx_fmt_ee (code, mode, cc, const0_rtx);
 }
 
 /* If MEM is not a legitimate compare-and-swap memory operand, return a new
@@ -2103,7 +2103,7 @@ s390_emit_compare_and_swap (enum rtx_code code, rtx old, 
rtx mem,
 default:
   gcc_unreachable ();
 }
-  return s390_emit_compare (code, cc, const0_rtx);
+  return s390_emit_compare (VOIDmode, code, cc, const0_rtx);
 }
 
 /* Emit a jump instruction to TARGET and return it.  If COND is
@@ -6647,7 +6647,7 @@ s390_expand_vec_strle

[PATCH] include: Add new post-DWARF 5 DW_LANG_* enumerators

2024-11-21 Thread Jakub Jelinek

Hi!

DWARF changed the language code assignment to be on a web page and
after DWARF 5 has been published already 27 codes have been assigned.
We have some of those already in the header, but most of them were missing,
including one added just yesterday (DW_LANG_C23).
Note, this is really post-DWARF 5 stuff rather than DWARF 6, because
DWARF 6 plans to switch from DW_AT_language to DW_AT_language_{name,version}
pair where we'll say DW_LNAME_C with 202311 version instead of this.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-21  Jakub Jelinek  

* dwarf2.h (enum dwarf_source_language): Add comment where
the post DWARF 5 additions start.  Refresh list from
https://dwarfstd.org/languages.html.

--- include/dwarf2.h.jj 2024-09-24 11:31:48.867620081 +0200
+++ include/dwarf2.h 2024-11-20 21:47:16.058161071 +0100
@@ -379,6 +379,8 @@ enum dwarf_source_language
 DW_LANG_Fortran08 = 0x0023,
 DW_LANG_RenderScript = 0x0024,
 DW_LANG_BLISS = 0x0025,
+/* Post DWARF 5 additions to the DWARF set.
+   See https://dwarfstd.org/languages.html .  */
 DW_LANG_Kotlin = 0x0026,
 DW_LANG_Zig = 0x0027,
 DW_LANG_Crystal = 0x0028,
@@ -388,6 +390,24 @@ enum dwarf_source_language
 DW_LANG_Fortran18 = 0x002d,
 DW_LANG_Ada2005 = 0x002e,
 DW_LANG_Ada2012 = 0x002f,
+DW_LANG_HIP = 0x0030,
+DW_LANG_Assembly = 0x0031,
+DW_LANG_C_sharp = 0x0032,
+DW_LANG_Mojo = 0x0033,
+DW_LANG_GLSL = 0x0034,
+DW_LANG_GLSL_ES = 0x0035,
+DW_LANG_HLSL = 0x0036,
+DW_LANG_OpenCL_CPP = 0x0037,
+DW_LANG_CPP_for_OpenCL = 0x0038,
+DW_LANG_SYCL = 0x0039,
+DW_LANG_C_plus_plus_23 = 0x003a,
+DW_LANG_Odin = 0x003b,
+DW_LANG_P4 = 0x003c,
+DW_LANG_Metal = 0x003d,
+DW_LANG_C23 = 0x003e,
+DW_LANG_Ruby = 0x0040,
+DW_LANG_Move = 0x0041,
+DW_LANG_Hylo = 0x0042,
 
 DW_LANG_lo_user = 0x8000,  /* Implementation-defined range start.  */
 DW_LANG_hi_user = 0xffff,  /* Implementation-defined range start.  */

Jakub
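
As a rough illustration of how a consumer might use the new codes, here is a minimal C sketch. The enumerator values are copied from the diff above; the enum name dwarf_source_language_subset and the helper lang_name are hypothetical and are not part of dwarf2.h.

```c
#include <stddef.h>

/* Subset of the enumerators added by the patch; values copied from the
   diff.  The enum name is made up for this sketch.  */
enum dwarf_source_language_subset
{
  DW_LANG_C_plus_plus_23 = 0x003a,
  DW_LANG_C23 = 0x003e,
  DW_LANG_Ruby = 0x0040
};

/* Hypothetical helper, not part of dwarf2.h: return a printable name for
   a DW_AT_language value, or NULL if this sketch does not know the code.  */
static const char *
lang_name (unsigned int code)
{
  switch (code)
    {
    case DW_LANG_C_plus_plus_23:
      return "C++23";
    case DW_LANG_C23:
      return "C23";
    case DW_LANG_Ruby:
      return "Ruby";
    default:
      return NULL;
    }
}
```

Note that the diff assigns 0x003e and then 0x0040, so a lookup of 0x003f falls through to the default case in this sketch.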

[PATCH v1 1/7] RISC-V: Rearrange the test files for vector SAT_SUB [NFC]

2024-11-21 Thread pan2 . li

From: Pan Li 

The test files of vector SAT_SUB only have numbers as the suffix.
Rename the files to use the suffix -{form number}-{target-type}.  For
example, test form 3 for uint32_t SAT_SUB will have -3-u32.c for the asm
check and -run-3-u32.c for the run test.

Also move all related test files to riscv/rvv/autovec/sat/.

This is a test-only patch and fairly obvious, so I will commit it
directly if there are no comments within the next 48 hours.
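
The naming scheme can be sketched in C as follows. The helper sat_sub_test_name is hypothetical, and the full name of the run variant is inferred from the "-run-3-u32.c" suffix mentioned above rather than taken from the ChangeLog.

```c
#include <stdio.h>

/* Hypothetical helper: build "vec_sat_u_sub-{form}-{type}.c" for the asm
   check, or "vec_sat_u_sub-run-{form}-{type}.c" when RUN is nonzero.
   Returns the value of snprintf.  */
static int
sat_sub_test_name (char *buf, size_t len, int form, const char *type,
                   int run)
{
  return snprintf (buf, len, "vec_sat_u_sub-%s%d-%s.c",
                   run ? "run-" : "", form, type);
}
```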

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-1-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-1-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-1-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-1-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-10-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-10-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-40.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-10-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-10-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-6.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-2-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-7.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-2-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-8.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-2-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-5.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-2-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-3-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-3-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-3-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-3-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-4-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-4-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-4-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-4-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-5-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-5-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-5-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-5-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-6-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-6-u32.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-6-u64.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-6-u8.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-7-u16.c: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Move to...
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-7-u32.c: ...here.
* gcc.target/riscv/rvv/a

Re: [PATCH v2 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-11-21 Thread Richard Biener

On Fri, 15 Nov 2024, Alex Coplan wrote:

> Hi,
> 
> This is a v2 which hopefully addresses the feedback for v1 of the 1/5
> patch, originally posted here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/48.html
> 
> As mentioned on IRC, it will need follow-up work to fix up latent
> profile issues, but that can be done during stage 3.  We will also need
> a simple (hopefully obvious, even) follow-up patch to fix expectations
> for various tests (since we now vectorize loops which we previously
> couldn't).
> 
> OK for trunk?

I'm still looking at

+  if (dr_info->need_peeling_for_alignment)
+{
+  /* Vector size in bytes.  */
+  poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
+
+  /* We can only peel for loops, of course.  */
+  gcc_checking_assert (loop_vinfo);
+
+  /* Calculate the number of vectors read per vector iteration.  If
+it is a power of two, multiply through to get the required
+alignment in bytes.  Otherwise, fail analysis since alignment
+peeling wouldn't work in such a case.  */
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   vf *= DR_GROUP_SIZE (stmt_info);

So this is the total number of scalars we load; maybe name it
accordingly, e.g. num_scalars.

+
+  auto num_vectors = vect_get_num_vectors (vf, vectype);
+  if (!pow2p_hwi (num_vectors))

side-note - with all these multiplies there's the possibility that
we get a testcase that has safe_align > PAGE_SIZE, meaning it's
no longer a good way to avoid trapping.  This problem of course
exists generally, we avoid it elsewhere by not having very large
vectors or limiting the group-size.  The actual problem is that
we don't know the actual page size, but we maybe could configure
a minimum as part of the platform configuration.  Should we for
now simply add

  || safe_align > 4096

here?  A testcase would load 512 contiguous uint64 to form an early exit
condition, quite unlikely I guess.
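
To make the arithmetic concrete, here is a minimal C sketch under stated assumptions: 4096 is the bound floated above, not a queried page size, and all helper names are hypothetical (is_pow2 stands in for the vectorizer's power-of-two check).

```c
#include <stdbool.h>
#include <stdint.h>

/* The bound suggested in the review; the real page size is not known.  */
#define ASSUMED_MIN_PAGE_SIZE 4096u

static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Bytes read per vector iteration for NUM_SCALARS elements of ELEM_SIZE
   bytes each.  */
static uint64_t
bytes_per_iteration (uint64_t num_scalars, uint64_t elem_size)
{
  return num_scalars * elem_size;
}

/* Hypothetical predicate: the alignment must be a power of two and must
   not exceed the assumed minimum page size.  */
static bool
safe_align_usable (uint64_t safe_align)
{
  return is_pow2 (safe_align) && safe_align <= ASSUMED_MIN_PAGE_SIZE;
}

/* True if reading SIZE bytes starting at ADDR touches more than one page
   of PAGE bytes.  */
static bool
crosses_page (uint64_t addr, uint64_t size, uint64_t page)
{
  return addr / page != (addr + size - 1) / page;
}
```

With these definitions, 512 uint64 elements give 512 * 8 = 4096 bytes, which sits exactly on the bound, while 1024 elements give 8192 bytes and would be rejected.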

With DR_GROUP_SIZE != 1 there's also the question whether we can
ever reach the desired alignment since each peel will skip
DR_GROUP_SIZE scalar elements - either those are already
DR_GROUP_SIZE aligned or the result will never be.
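
The reachability concern can be checked by brute force. The following C sketch is only an illustration: peeling_can_align is a hypothetical helper, and step is assumed to be the group size times the element size in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if peeling some number of scalar iterations can move an
   access that starts MISALIGN bytes past an ALIGN-byte boundary onto such
   a boundary, when each peeled iteration advances the address by STEP
   bytes.  ALIGN must be nonzero.  */
static bool
peeling_can_align (uint64_t misalign, uint64_t step, uint64_t align)
{
  uint64_t offset = misalign % align;
  uint64_t advance = step % align;

  /* The sequence of offsets repeats after at most ALIGN steps, so trying
     ALIGN peel counts covers every reachable offset.  */
  for (uint64_t k = 0; k < align; k++)
    {
      if (offset == 0)
        return true;
      offset = (offset + advance) % align;
    }
  return false;
}
```

For example, with a group of two 4-byte elements (step 8) and a 16-byte target alignment, a misalignment of 8 is fixed by one peel, but a misalignment of 4 only ever cycles between offsets 4 and 12.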

@@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, dr_vec_info *dr_info,
   if (misalignment == DR_MISALIGNMENT_UNKNOWN)
 is_packed = not_size_aligned (DR_REF (dr));
   if (targetm.vectorize.support_vector_misalignment (mode, type, misalignment,
-is_packed))
+is_packed)
+  && !dr_info->need_peeling_for_alignment)
 return dr_unaligned_supported;

I think you need to do this earlier, like with

  if (misalignment == 0)
return dr_aligned;
+ else if (dr_info->need_peeling_for_alignment)
+   return dr_unaligned_unsupported;

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index c8dc7153298..be2c2a1bc75 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   int estimated_vf;
   int prolog_peeling = 0;
   bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
-  /* We currently do not support prolog peeling if the target alignment is not
- known at compile time.  'vect_gen_prolog_loop_niters' depends on the
- target alignment being constant.  */
-  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
-  if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ())
-return NULL;

this indeed looks like an odd (wrong) place to enforce this, so I
suppose you figured the check is not needed, independent of the patch?

OK with the above changes.

Thanks,
Richard.


> Thanks,
> Alex
> 
> -- >8 --
> 
> This allows us to vectorize more loops with early exits by forcing
> peeling for alignment to make sure that we're guaranteed to be able to
> safely read an entire vector iteration without crossing a page boundary.
> 
> To make this work for VLA architectures we have to allow compile-time
> non-constant target alignments.  We also have to override the result of
> the target's preferred_vector_alignment hook if it isn't already a
> power-of-two multiple of the amount read per vector iteration.
> 
> gcc/ChangeLog:
> 
>   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
>   Set need_peeling_for_alignment flag on read DRs instead of
>   failing vectorization.  Punt on gathers and strided_p accesses.
>   (dr_misalignment): Handle non-constant target alignments.
>   (vect_compute_data_ref_alignment): If need_peeling_for_alignment
>   flag is set on the DR, then override the target alignment chosen
>   by the preferred_vector_alignment hook to choose a safe
>   alignment.  Add new result parameter.  Use it ...
>   (vect_analyze_data_refs_alignment

Re: [PATCH] include: Add new post-DWARF 5 DW_LANG_* enumerators

2024-11-21 Thread Richard Biener

On Thu, 21 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> DWARF changed the language code assignment to be on a web page and
> after DWARF 5 has been published already 27 codes have been assigned.
> We have some of those already in the header, but most of them were missing,
> including one added just yesterday (DW_LANG_C23).
> Note, this is really post-DWARF 5 stuff rather than DWARF 6, because
> DWARF 6 plans to switch from DW_AT_language to DW_AT_language_{name,version}
> pair where we'll say DW_LNAME_C with 202311 version instead of this.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-11-21  Jakub Jelinek  
> 
>   * dwarf2.h (enum dwarf_source_language): Add comment where
>   the post DWARF 5 additions start.  Refresh list from
>   https://dwarfstd.org/languages.html.
> 
> --- include/dwarf2.h.jj   2024-09-24 11:31:48.867620081 +0200
> +++ include/dwarf2.h  2024-11-20 21:47:16.058161071 +0100
> @@ -379,6 +379,8 @@ enum dwarf_source_language
>  DW_LANG_Fortran08 = 0x0023,
>  DW_LANG_RenderScript = 0x0024,
>  DW_LANG_BLISS = 0x0025,
> +/* Post DWARF 5 additions to the DWARF set.
> +   See https://dwarfstd.org/languages.html .  */
>  DW_LANG_Kotlin = 0x0026,
>  DW_LANG_Zig = 0x0027,
>  DW_LANG_Crystal = 0x0028,
> @@ -388,6 +390,24 @@ enum dwarf_source_language
>  DW_LANG_Fortran18 = 0x002d,
>  DW_LANG_Ada2005 = 0x002e,
>  DW_LANG_Ada2012 = 0x002f,
> +DW_LANG_HIP = 0x0030,
> +DW_LANG_Assembly = 0x0031,
> +DW_LANG_C_sharp = 0x0032,
> +DW_LANG_Mojo = 0x0033,
> +DW_LANG_GLSL = 0x0034,
> +DW_LANG_GLSL_ES = 0x0035,
> +DW_LANG_HLSL = 0x0036,
> +DW_LANG_OpenCL_CPP = 0x0037,
> +DW_LANG_CPP_for_OpenCL = 0x0038,
> +DW_LANG_SYCL = 0x0039,
> +DW_LANG_C_plus_plus_23 = 0x003a,
> +DW_LANG_Odin = 0x003b,
> +DW_LANG_P4 = 0x003c,
> +DW_LANG_Metal = 0x003d,
> +DW_LANG_C23 = 0x003e,
> +DW_LANG_Ruby = 0x0040,
> +DW_LANG_Move = 0x0041,
> +DW_LANG_Hylo = 0x0042,
>  
>  DW_LANG_lo_user = 0x8000,  /* Implementation-defined range start.  */
>  DW_LANG_hi_user = 0xffff,  /* Implementation-defined range start.  */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] tree-optimization/117720 - check alignment for VMAT_STRIDED_SLP

2024-11-21 Thread Richard Biener

While vectorizable_store was already checking the alignment requirement
of the stores and falling back to elementwise accesses if it was not
honored, the vectorizable_load path wasn't doing this.  After the previous
change to disregard alignment checking for VMAT_STRIDED_SLP in
get_group_load_store_type this now tripped on power.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/117720
* tree-vect-stmts.cc (vectorizable_load): For VMAT_STRIDED_SLP
verify the chosen load type is OK with regard to alignment.
---
 gcc/tree-vect-stmts.cc | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 8700d1787b4..271c6da2a25 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10650,9 +10650,19 @@ vectorizable_load (vec_info *vinfo,
 of it.  */
  if (n == const_nunits)
{
- nloads = 1;
- lnel = const_nunits;
- ltype = vectype;
+ int mis_align = dr_misalignment (first_dr_info, vectype);
+ dr_alignment_support dr_align
+   = vect_supportable_dr_alignment (vinfo, dr_info, vectype,
+mis_align);
+ if (dr_align == dr_aligned
+ || dr_align == dr_unaligned_supported)
+   {
+ nloads = 1;
+ lnel = const_nunits;
+ ltype = vectype;
+ alignment_support_scheme = dr_align;
+ misalignment = mis_align;
+   }
}
  /* Else use the biggest vector we can load the group without
 accessing excess elements.  */
-- 
2.43.0
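
The decision made by the hunk above can be modelled in isolation. This is a toy C sketch, not the vectorizer code: the enum and struct are hypothetical stand-ins, and the elementwise default is assumed from the commit message's mention of falling back to elementwise accesses.

```c
/* Toy stand-in for the alignment support result; not the GCC enum.  */
enum align_support
{
  ALIGN_UNSUPPORTED,
  ALIGN_UNALIGNED_SUPPORTED,
  ALIGN_ALIGNED
};

/* How a group is loaded: NLOADS loads of LNEL elements each.  */
struct load_plan
{
  unsigned nloads;
  unsigned lnel;
};

/* Pick one full-vector load only when the group fills the vector and the
   alignment is acceptable; otherwise keep the elementwise default.  */
static struct load_plan
plan_group_load (unsigned nunits, unsigned group_size,
                 enum align_support support)
{
  struct load_plan plan = { nunits, 1 };   /* elementwise default */

  if (group_size == nunits
      && (support == ALIGN_ALIGNED
          || support == ALIGN_UNALIGNED_SUPPORTED))
    {
      plan.nloads = 1;
      plan.lnel = nunits;
    }
  return plan;
}
```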

RE: [PATCH v1 2/2] Match: Refactor the unsigned SAT_ADD match pattern [NFC]

2024-11-21 Thread Li, Pan2

I almost forgot this patch; this is a kind reminder.

Pan

-----Original Message-----
From: Li, Pan2  
Sent: Monday, November 11, 2024 4:44 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2 

Subject: [PATCH v1 2/2] Match: Refactor the unsigned SAT_ADD match pattern [NFC]

From: Pan Li 

This patch refactors the unsigned SAT_ADD pattern by:
* Extracting the type check to the outside.
* Extracting the common sub-pattern.
* Grouping the related match pattern forms together.
* Removing unnecessary helper pattern matches.

The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
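
For readers less familiar with the forms being matched, here is a minimal C sketch of three of the SAT_U_ADD shapes named in the pattern comments of this patch. The function names are hypothetical, uint32_t is chosen only for illustration, and the constant 7 is an arbitrary immediate.

```c
#include <stdint.h>

/* Branchless form: SAT_U_ADD = (X + Y) | -((X + Y) < X).  The comparison
   yields 1 on wrap-around, and negating it gives an all-ones mask.  */
static uint32_t
sat_u_add_mask (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}

/* Conditional form: SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1.  */
static uint32_t
sat_u_add_cond (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum >= x ? sum : UINT32_MAX;
}

/* Immediate form: min (X, MAX - IMM) + IMM, where the two constants sum
   to the maximum value of the type.  */
static uint32_t
sat_u_add_imm7 (uint32_t x)
{
  const uint32_t imm = 7;
  const uint32_t limit = UINT32_MAX - imm;
  return (x < limit ? x : limit) + imm;
}
```

All three return the plain sum when it fits and UINT32_MAX when the addition would wrap.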

gcc/ChangeLog:

* match.pd: Refactor the various unsigned SAT_ADD match patterns.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 109 +--
 1 file changed, 45 insertions(+), 64 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index af3272eac55..3a76757 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3086,14 +3086,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
|| POINTER_TYPE_P (itype))
   && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
 
-/* Unsigned Saturation Add */
-/* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
-   SAT_ADD = (X + Y) | -((X + Y) < X)  */
-(match (usadd_left_part_1 @0 @1)
- (plus:c @0 @1)
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && types_match (type, @0, @1
-
 /* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> != 0) */
 (match (usadd_left_part_2 @0 @1)
@@ -3101,20 +3093,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
-/* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
-   SAT_ADD = (X + Y) | -((type)(X + Y) < X)  */
-(match (usadd_right_part_1 @0 @1)
- (negate (convert (lt (plus:c @0 @1) @0)))
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && types_match (type, @0, @1
-
-/* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
-   SAT_ADD = (X + Y) | -(X > (X + Y))  */
-(match (usadd_right_part_1 @0 @1)
- (negate (convert (gt @0 (plus:c @0 @1
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && types_match (type, @0, @1
-
 /* SAT_ADD = usadd_left_part_2 | usadd_right_part_2, aka:
SAT_ADD = REALPART_EXPR <.ADD_OVERFLOW> | (IMAGPART_EXPR <.ADD_OVERFLOW> != 0) */
 (match (usadd_right_part_2 @0 @1)
@@ -3129,33 +3107,62 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
+ (match (usadd_overflow_mask @0 @1)
+  /* SAT_U_ADD = (X + Y) | -(X > (X + Y)).
+ Overflow_Mask = -(X > (X + Y)).  */
+  (negate (convert (gt @0 (plus:c @0 @1
+  (if (types_match (type, @0, @1
+ (match (usadd_overflow_mask @0 @1)
+  /* SAT_U_ADD = (X + Y) | -(X > (X + Y)).
+ Overflow_Mask = -((X + Y) < X).  */
+  (negate (convert (lt (plus:c @0 @1) @0)))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_add @0 @1)
+  /* SAT_U_ADD = (X + Y) | Overflow_Mask  */
+  (bit_ior:c (plus:c @0 @1) (usadd_overflow_mask @0 @1))
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_add @0 @1)
+  /* SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1  */
+  (cond^ (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_add @0 @1)
+  /* SAT_U_ADD = (X + Y) < X ? -1 : (X + Y)  */
+  (cond^ (lt (plus:c@2 @0 @1) @0) integer_minus_onep @2)
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_add @0 @1)
+  /* SAT_U_ADD = X <= (X + Y) ? (X + Y) : -1  */
+  (cond^ (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_add @0 @1)
+  /* SAT_U_ADD = X > (X + Y) ? -1 : (X + Y)  */
+  (cond^ (gt @0 (plus:c@2 @0 @1)) integer_minus_onep @2)
+  (if (types_match (type, @0, @1
+ (match (unsigned_integer_sat_add @0 @1)
+  /* SAT_U_ADD = (X + IMM) >= x ? (X + IMM) : -1  */
+  (plus (min @0 INTEGER_CST@2) INTEGER_CST@1)
+  (if (types_match (type, @0, @1))
+   (with
+{
+ unsigned precision = TYPE_PRECISION (type);
+ wide_int cst_1 = wi::to_wide (@1);
+ wide_int cst_2 = wi::to_wide (@2);
+ wide_int max = wi::mask (precision, false, precision);
+ wide_int sum = wi::add (cst_1, cst_2);
+}
+(if (wi::eq_p (max, sum)))
+
 /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
because the sub part of left_part_2 cannot work with right_part_1.
For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
right_part_1 has nothing to do with .ADD_OVERFLOW.  */
 
-/* Unsigned saturatio

Re: [PATCH]middle-end: Pass along SLP node when costing vector loads/stores

2024-11-21 Thread Richard Biener

On Wed, 20 Nov 2024, Tamar Christina wrote:

> Hi All,
> 
> With the support to SLP only we now pass the VMAT through the SLP node,
> however the majority of the costing calls inside vectorizable_load and
> vectorizable_store do not pass the SLP node along.  Due to this the backend
> costing never sees the VMAT for these cases anymore.
> 
> Additionally the helper around record_stmt_cost when both SLP and stmt_vinfo
> are passed would only pass the SLP node along.  However the SLP node doesn't
> contain all the info available in the stmt_vinfo and we'd have to go through
> the SLP_TREE_REPRESENTATIVE anyway.  As such I changed the function to just
> always pass both along.  Unlike the VMAT changes, I don't believe there to be
> a correctness issue here but this would minimize the amount of churn in the
> backend costing until vectorizer costing as a whole is revisited in GCC 16.

I agree this is the best way forward at this point - I originally
thought to never pass both and treat the calls with SLP node as being
the "future" way, but clearly we're not even close to a "future" costing
API right now.

> These changes re-enable the cost model on AArch64 and also correctly find the
> VMATs on loads and stores fixing testcases such as sve_iters_low_2.c.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP
>   node.
>   * tree-vect-stmts.cc (record_stmt_cost): Expose.
>   (vect_get_store_cost, vect_get_load_cost): Extend with SLP node.
>   (vectorizable_store, vectorizable_load): Pass SLP node to all costing.
>   * tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and
>   stmt_vinfo to costing.
>   (vect_get_load_cost, vect_get_store_cost): Extend with SLP node.
> 
> ---
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 3ea5fb883b1a5289195142171eb45fa422910a95..d87ca79b8e4c16d242e67431d1b527bdb8cb74e4 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1729,12 +1729,14 @@ vect_get_data_access_cost (vec_info *vinfo, dr_vec_info *dr_info,
>  ncopies = vect_get_num_copies (loop_vinfo, STMT_VINFO_VECTYPE (stmt_info));
>  
>if (DR_IS_READ (dr_info->dr))
> -vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
> - misalignment, true, inside_cost,
> - outside_cost, prologue_cost_vec, body_cost_vec, false);
> +vect_get_load_cost (vinfo, stmt_info, NULL, ncopies,
> + alignment_support_scheme, misalignment, true,
> + inside_cost, outside_cost, prologue_cost_vec,
> + body_cost_vec, false);
>else
> -vect_get_store_cost (vinfo,stmt_info, ncopies, alignment_support_scheme,
> -  misalignment, inside_cost, body_cost_vec);
> +vect_get_store_cost (vinfo,stmt_info, NULL, ncopies,
> +  alignment_support_scheme, misalignment, inside_cost,
> +  body_cost_vec);
>  
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 7a92da00f7ddcfdf146fa1c2511f609e8bc40e9e..46543c15c00f00e5127d06446f58fce79951c3b0 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -93,7 +93,7 @@ stmt_in_inner_loop_p (vec_info *vinfo, class _stmt_vec_info 
> *stmt_info)
> target model or by saving it in a vector for later processing.
> Return a preliminary estimate of the statement's cost.  */
>  
> -static unsigned
> +unsigned
>  record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
> enum vect_cost_for_stmt kind,
> stmt_vec_info stmt_info, slp_tree node,
> @@ -1008,8 +1008,8 @@ cfun_returns (tree decl)
>  
>  /* Calculate cost of DR's memory access.  */
>  void
> -vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
> -  dr_alignment_support alignment_support_scheme,
> +vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> +  int ncopies, dr_alignment_support alignment_support_scheme,
>int misalignment,
>unsigned int *inside_cost,
>stmt_vector_for_cost *body_cost_vec)
> @@ -1019,7 +1019,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info 
> stmt_info, int ncopies,
>  case dr_aligned:
>{
>   *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> -   vector_store, stmt_info, 0,
> +   vector_store, stmt_info, slp_node, 0,
> vect_body);
>  
>  if (dump_enabled_p ())
>

Re: [PATCH v13] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2024-11-21 Thread Nicolas Boulenguez

Hello.

I have rebased the patches on gcc-15.
The new v13 and a diff between v12 are available at PR114065.

The merge of
  commit 093894adbdf0638a494257bfe4bc42eb7ad13f6b
  Subject: ada: GNAT Calendar Support for 64-bit Unix Time
  Author: Douglas B Rupp   2024-10-09 21:44:16
  Validator: Marc Poulhiès   2024-11-12 14:00:48
requires some design decisions that I am not in a position to take,
so for now I have ignored this conflict and worked on the other ones.

I think that 093894ad is redundant with the introduction of
System.C_Time and should be reverted except for the Obsolescent
pragmas. Am I missing some use cases?

Re: [PATCH] s390: Add expander for uaddc/usubc optabs

2024-11-21 Thread Stefan Schulze Frielinghaus

On Mon, Nov 18, 2024 at 09:32:29AM +0100, Andreas Krebbel wrote:
> Hi Stefan,
> 
> 
> On 11/12/24 10:35, Stefan Schulze Frielinghaus wrote:
> > > > +  rtx cond = gen_rtx_LTU (mode, gen_rtx_REG (CCL1mode, 
> > > > CC_REGNUM), const0_rtx);
> > > > +  if (operands[4] == const0_rtx)
> > > > +emit_insn (gen_add3_carry1_cc (operands[0], operands[2], 
> > > > operands[3]));
> > > > +  else
> > > If we would just generate the alc pattern with a carry in of 0, wouldn't
> > > the other optimizers be able to figure out that a normal add would do
> > > here?  This path does not seem to get exercised by your testcases.
> > The zero carry in can occur due to
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a7aec76a74dd38524be325343158d3049b6ab3ac
> > which is why we have a special case just in order to emit an add with
> > carry out only.
> 
> Ok. I was hoping that this downgrade from add with carry in/out to add with
> carry out would happen somewhat "automatically", if the carry in happens
> to be a constant zero. But probably not.
> 
> Your testcases also invoke the pattern with a constant 0 as carry in, but
> since you prevent inlining the pattern is never expanded with a const0_rtx.
> The testcase in the commit above is x86 specific, so it might make sense to
> add an invocation which triggers the code path explicitly. Just to be sure.

Ok, just out of curiosity, my take is that the middle end only emits an
u{add,sub}c optab with a constant zero if a pattern is detected as it
would occur e.g. for bitints, i.e., where multiple limbs are about to be
handled and the first one has a zero carry in.  Thus, just invoking the
test case with a zero results in an add-overflow only as expected.
Anyhow, I have added tests u{add,sub}c-3.c in order to test this path,
too.
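
The multi-limb situation described above can be sketched in portable C. This only illustrates the arithmetic shape (the first limb has a constant zero carry in); addc and add128 are hypothetical names, and whether a given compiler turns this exact source into a u{add,sub}c expansion is not claimed here.

```c
#include <stdint.h>

/* Add A, B and CARRY_IN (0 or 1) on one 64-bit limb; store the carry out
   (0 or 1) in *CARRY_OUT and return the low 64 bits of the sum.  */
static uint64_t
addc (uint64_t a, uint64_t b, unsigned carry_in, unsigned *carry_out)
{
  uint64_t sum = a + b;
  unsigned c1 = sum < a;        /* carry from a + b */
  uint64_t res = sum + carry_in;
  unsigned c2 = res < sum;      /* carry from adding the carry in */
  *carry_out = c1 | c2;
  return res;
}

/* 128-bit addition as a chain of two limbs, least significant first.
   The first limb has a constant zero carry in, so only its carry out
   matters.  */
static void
add128 (const uint64_t a[2], const uint64_t b[2], uint64_t r[2])
{
  unsigned carry;
  r[0] = addc (a[0], b[0], 0, &carry);
  r[1] = addc (a[1], b[1], carry, &carry);
}
```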

> 
> > > create a CCU mode comparison result, which is currently consumed as CCL1
> > > by the ALC pattern. This seems to be inconsistent. s390_emit_compare
> > > returns the condition, which I think needs to be fed into the
> > > alc_carry1_cc pattern as input condition.
> > > Also is LTU really the correct code here? CCU + LTU would expect CC1, but
> > > you want CC2 or CC3 for the carry in, so GTU should be the better choice.
> > > s390_alc_comparison should make sure that only valid combinations are
> > > accepted, CCU + GTU would be one of them.
> > I was coming up with my own condition since conditions created by
> > s390_emit_compare() are of void mode which is why the alc predicates
> > s390_alc_comparison() failed since these require GPR mode.  I've fixed
> > that by using PUT_MODE_RAW on those.  I think we could also remove the
> > mode from the match_operand which might be more appropriate.  I've done
> > it the latter way for sub3_slb_borrow1_cc.  Once we have settled
> > for one or the other version I will align uaddc/usubc.
> 
> The PLUS/MINUS arithmetic operations require both operands to have a proper
> integer mode. So I think dropping the mode from match_operand would be
> wrong. On the other hand, an IF_THEN_ELSE always requires the comparison to
> have VOIDmode, that's what s390_emit_compare is supposed to be used for. So
> taking what s390_emit_compare generates and changing the mode is the right
> thing here. Since PUT_MODE is always a bit ugly, we might also want to go with
> extending s390_emit_compare with a target mode operand defaulting to
> VOIDmode instead.

Ok, I have added a mode operand to s390_emit_compare() and changed
everything accordingly.

> 
> > > > +/* { dg-do run } */
> > > > +/* { dg-options "-O2 -mzarch -save-temps -fdump-tree-optimized" }  */
> > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 2 "optimized" { target lp64 } } } */
> > > > +/* { dg-final { scan-tree-dump-times "\\.UADDC \\(" 1 "optimized" { target { ! lp64 } } } } */
> > > > +/* { dg-final { scan-assembler-times "\\talcr\\t" 1 } } */
> > > > +/* { dg-final { scan-assembler-times "\\talcgr\\t" 1 { target lp64 } } } */
> > > Your checks seem to rely on the testcase being compiled with at least
> > > -march=z13.
> 
> I think for the run test you would have to make sure that the test is not
> executed on machines older than z13 then.
> 
> /* { dg-do run { target { s390_useable_hw } } } */

Didn't know about this neat predicate.  Will probably use it more often
in the future :)

I will send a v2 shortly.

Cheers,
Stefan

[patch,avr] PR117726: Improve 4-byte ASHIFT insns

2024-11-21 Thread Georg-Johann Lay


This patch improves the 4-byte ASHIFT insns.
1) It adds a "r,r,C15" alternative for improved long << 15.
2) It adds 3-operand alternatives (depending on options) and
   splits them after peephole2 / before avr-fuse-move into
   a 3-operand byte shift and a 2-operand residual bit shift.
For better control, it introduces new option -msplit-bit-shift
that's activated at -O2 and higher per default.  2) is even
performed with -Os, but not with -Oz.
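
The split described in 2) can be illustrated in plain C. This sketch shows only the arithmetic identity behind it (a shift by n equals a shift by a multiple of 8 followed by a shift by the remaining 0..7 bits); shl_split is a hypothetical name, and the patch's actual decisions about when to split are not reproduced.

```c
#include <stdint.h>

/* Shift X left by N (0 <= N < 32) in two steps: first by whole bytes,
   then by the residual bit count.  */
static uint32_t
shl_split (uint32_t x, unsigned n)
{
  unsigned byte_part = n & ~7u;  /* 0, 8, 16 or 24 */
  unsigned bit_part = n & 7u;    /* 0 .. 7 */

  /* The byte shift goes from X into a new value (the 3-operand step).  */
  uint32_t tmp = x << byte_part;

  /* The residual bit shift operates in place (the 2-operand step).  */
  tmp <<= bit_part;
  return tmp;
}
```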

No new regressions.

Ok for trunk?

Johann

--

AVR: target/117726 - Better optimizations of ASHIFT:SI insns.

This patch improves the 4-byte ASHIFT insns.
1) It adds a "r,r,C15" alternative for improved long << 15.
2) It adds 3-operand alternatives (depending on options) and
   splits them after peephole2 / before avr-fuse-move into
   a 3-operand byte shift and a 2-operand residual bit shift.
For better control, it introduces new option -msplit-bit-shift
that's activated at -O2 and higher per default.  2) is even
performed with -Os, but not with -Oz.

PR target/117726
gcc/
* config/avr/avr.opt (-msplit-bit-shift): Add new optimization option.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_2_PLUS]: Turn on -msplit-bit-shift.
* config/avr/avr.h (machine_function.n_avr_fuse_add_executed):
New bool component.
* config/avr/avr.md (attr "isa") <2op, 3op>: Add new values.
(attr "enabled"): Handle them.
(ashlsi3, *ashlsi3, *ashlsi3_const): Add "r,r,C15" alternative.
Add "r,0,C4l" and "r,r,C4l" alternatives (depending on 2op / 3op).
(define_split) [avr_split_bit_shift]: Add 2 new ashift:ALL4 splitters.
(define_peephole2) [ashift:ALL4]: Add (match_dup 3) so that the scratch
won't overlap with the output operand of the matched insn.
(*ashl3_const_split): Remove unused ashift:ALL4 splitter.
* config/avr/avr-passes.cc (emit_valid_insn)
(emit_valid_move_clobbercc): Move out of anonymous namespace.
(make_avr_pass_fuse_add) : Don't override.
: Set n_avr_fuse_add_executed according to
func->machine->n_avr_fuse_add_executed.
(pass_data avr_pass_data_split_after_peephole2): New object.
(avr_pass_split_after_peephole2): New rtl_opt_pass.
(avr_emit_shift): New static function.
(avr_shift_is_3op, avr_split_shift_p, avr_split_shift)
(make_avr_pass_split_after_peephole2): New functions.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Insert new pass after pass_peephole2.
* config/avr/avr-protos.h
(n_avr_fuse_add_executed, avr_shift_is_3op, avr_split_shift_p)
(avr_split_shift, avr_optimize_size_level)
(make_avr_pass_split_after_peephole2): New prototypes.
* config/avr/avr.cc (n_avr_fuse_add_executed): New global variable.
(avr_optimize_size_level): New function.
(avr_set_current_function): Set n_avr_fuse_add_executed
according to cfun->machine->n_avr_fuse_add_executed.
(ashlsi3_out) [case 15]: Output optimized code for this offset.
(avr_rtx_costs_1) [ASHIFT, SImode]: Adjust costs of offsets 15, 16.
* config/avr/constraints.md (C4a, C4r, C4l): New constraints.
* pass_manager.h (pass_manager): Adjust comments.

Re: [PATCH v4 4/5] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-21 Thread Richard Sandiford

Claudio Bantaloukas  writes:
> [...]
> @@ -4004,6 +4008,44 @@ SHAPE (ternary_bfloat_lane)
>  typedef ternary_bfloat_lane_base<2> ternary_bfloat_lanex2_def;
>  SHAPE (ternary_bfloat_lanex2)
 
> +/* sv_t svfoo[_t0](sv_t, svmfloat8_t, svmfloat8_t, uint64_t)
> +
> +   where the final argument is an integer constant expression in the range
> +   [0, 15].  */
> +struct ternary_mfloat8_lane_def
> +: public ternary_resize2_lane_base<8, TYPE_mfloat, TYPE_mfloat>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const 
> override
> +  {
> +gcc_assert (group.fpm_mode == FPM_set);
> +b.add_overloaded_functions (group, MODE_none);
> +build_all (b, "v0,v0,vM,vM,su64", group, MODE_none);
> +  }
> +
> +  bool
> +  check (function_checker &c) const override
> +  {
> +return c.require_immediate_lane_index (3, 2, 1);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +type_suffix_index type;
> +if (!r.check_num_arguments (5)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> + || !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
> + || !r.require_vector_type (2, VECTOR_TYPE_svmfloat8_t)
> + || !r.require_integer_immediate (3)
> + || !r.require_scalar_type (4, "int64_t"))

uint64_t

> +  return error_mark_node;
> +
> +return r.resolve_to (r.mode_suffix_id, type, TYPE_SUFFIX_mf8, 
> GROUP_none);
> +  }
> +};
> +SHAPE (ternary_mfloat8_lane)
> +
>  /* sv_t svfoo[_t0](sv_t, svbfloatt16_t, svbfloat16_t)
> sv_t svfoo[_n_t0](sv_t, svbfloat16_t, bfloat16_t).  */
>  struct ternary_bfloat_opt_n_def
> @@ -4019,6 +4061,46 @@ struct ternary_bfloat_opt_n_def
>  };
>  SHAPE (ternary_bfloat_opt_n)
>  
> +/* sv_t svfoo[_t0](sv_t, svmfloatt8_t, svmfloat8_t)
> +   sv_t svfoo[_n_t0](sv_t, svmfloat8_t, bfloat8_t).  */
> +struct ternary_mfloat8_opt_n_def
> +: public ternary_resize2_opt_n_base<8, TYPE_mfloat, TYPE_mfloat>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const 
> override
> +  {
> +gcc_assert (group.fpm_mode == FPM_set);
> +b.add_overloaded_functions (group, MODE_none);
> +build_all (b, "v0,v0,vM,vM", group, MODE_none);
> +build_all (b, "v0,v0,vM,sM", group, MODE_n);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +type_suffix_index type;
> +if (!r.check_num_arguments (4)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> + || !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
> + || !r.require_scalar_type (3, "int64_t"))
> +  return error_mark_node;
> +
> +tree scalar_form
> + = r.lookup_form (MODE_n, type, TYPE_SUFFIX_mf8, GROUP_none);
> +if (r.scalar_argument_p (2))
> +  {
> + if (scalar_form)
> +   return scalar_form;
> + return error_mark_node;

It looks like this would return error_mark_node without reporting
an error first.

> +  }
> +if (scalar_form && !r.require_vector_or_scalar_type (2))
> +  return error_mark_node;
> +
> +return r.resolve_to (r.mode_suffix_id, type, TYPE_SUFFIX_mf8, 
> GROUP_none);
> +  }

In this context (unlike finish_opt_n_resolution) we know that there is
a bijection between the vector and scalar forms.  So I think we can just
add require_vector_or_scalar_type to the initial checks:

if (!r.check_num_arguments (4)
|| (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
|| !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
|| !r.require_vector_or_scalar_type (2)
|| !r.require_scalar_type (3, "int64_t"))
  return error_mark_node;

auto mode = r.mode_suffix_id;
if (r.scalar_argument_p (2))
  mode = MODE_n;
else if (!r.require_vector_type (2, VECTOR_TYPE_svmfloat8_t))
  return error_mark_node;

return r.resolve_to (mode, type, TYPE_SUFFIX_mf8, GROUP_none);

(untested).

> [...]
> +;; -
> +;;  [FP] Mfloat8 Multiply-and-accumulate operations
> +;; -
> +;; Includes:
> +;; - FMLALB (vectors, FP8 to FP16)
> +;; - FMLALT (vectors, FP8 to FP16)
> +;; - FMLALB (indexed, FP8 to FP16)
> +;; - FMLALT (indexed, FP8 to FP16)
> +;; - FMLALLBB (vectors)
> +;; - FMLALLBB (indexed)
> +;; - FMLALLBT (vectors)
> +;; - FMLALLBT (indexed)
> +;; - FMLALLTB (vectors)
> +;; - FMLALLTB (indexed)
> +;; - FMLALLTT (vectors)
> +;; - FMLALLTT (indexed)
> +;; -
> +
> +(define_insn "@aarch64_sve_add_"
> +  [(set (match_operand:SVE_FULL_HSF 0 "register_operand")
> + (unspec:SVE_FULL_HSF
> +   [(match_operand:SVE_FULL_HSF 1 "register_operand")
> +(match_operand:VNx16QI 2 "register_operand")
> +(match_operand:VNx16QI 3 "register_operand")
> +(reg:DI FPM_REGNUM)]
> +   SVE2_FP8_TERNARY))]
> +  "TARGET_SSVE_FP8FMA"
> +  {@ [ co

[COMMITTED] libgomp: testsuite: Fix libgomp.c/alloc-pinned-3.c etc. for C23 on non-Linux

2024-11-21 Thread Rainer Orth

Since the switch to a C23 default, three libgomp tests FAIL on Solaris:

FAIL: libgomp.c/alloc-pinned-3.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-3.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-4.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-4.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-6.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-6.c compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.c/alloc-pinned-3.c:104:3:
 error: too many arguments to function 'set_pin_limit'

Fixed by adding the missing size argument to the stub functions.
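
For context, the language change behind the error: in C23 an empty parameter list `()` means `(void)`, so a call that passes an argument to such a function no longer compiles. A minimal sketch of the fixed stub shape follows; `pin_limit_calls` is a hypothetical counter added only so the snippet has observable behaviour and is not part of the tests.

```c
/* Non-Linux stub in the style of the alloc-pinned tests.  Under C23,
   'void set_pin_limit ()' would declare a function taking no
   arguments, so the definition has to name the parameter that the
   callers pass.  */
static int pin_limit_calls;  /* hypothetical, for illustration only */

void
set_pin_limit (int size)
{
  (void) size;        /* stub: the argument is ignored */
  pin_limit_calls++;
}
```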

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Committed as obvious.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-11-20  Rainer Orth  

libgomp:
* testsuite/libgomp.c/alloc-pinned-3.c [!__linux__]
(set_pin_limit): Add size arg.
* testsuite/libgomp.c/alloc-pinned-4.c [!__linux__]
(set_pin_limit): Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c [!__linux__]
(set_pin_limit): Likewise.

# HG changeset patch
# Parent  178537d2f5cc1ab82ba799681cb22547b9c0f888
libgomp: testsuite: Fix libgomp.c/alloc-pinned-3.c for C23 on non-Linux

diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-3.c
@@ -57,7 +57,7 @@ get_pinned_mem ()
 }
 
 void
-set_pin_limit ()
+set_pin_limit (int size)
 {
 }
 #endif
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-4.c
@@ -57,7 +57,7 @@ get_pinned_mem ()
 }
 
 void
-set_pin_limit ()
+set_pin_limit (int size)
 {
 }
 #endif
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-6.c
@@ -56,7 +56,7 @@ get_pinned_mem ()
 }
 
 void
-set_pin_limit ()
+set_pin_limit (int size)
 {
 }
 #endif

[PATCH 0/2] asm() operand 'c' modifier handling

2024-11-21 Thread Jan Beulich

Documentation is pretty clear here: "Require a constant operand and
print the constant expression with no punctuation." See the patches for
further details.

1: fix asm() operand 'c' modifier handling
2: x86: fix asm() operand 'c' modifier handling

Technically for x86 the 2nd patch alone ought to be sufficient to address
the issue there. The first patch covers a sub-case more generally, for all
targets not having special handling (e.g. RISC-V). Further per-target
special casing may be necessary to fully support non-integer-literal
operands.

Jan

[PATCH 1/2] fix asm() operand 'c' modifier handling

2024-11-21 Thread Jan Beulich

Documentation is pretty clear here: "Require a constant operand and
print the constant expression with no punctuation." IOW any integer
value of whatever magnitude or sign ought to be acceptable.
---
RFC: If this (doing the change in generic code) is the way to go, in a
 few cases arch-specific code can (or even needs to be, e.g. Arm32)
 cleaned up. Alternatively arch-specific code needs to learn to deal
 with 'c' (at least x86 and RISC-V, just from code inspection also
 ia64, and I didn't check many others that I'm not familiar with at
 all).

The differences in what CONSTANT_ADDRESS_P() expands to are perplexing:
It goes from e.g. ia64 not permitting anything to PPC covering
CONST_INT_P() there as well. As CONSTANT_ADDRESS_P() is used in other
places too, CONST_INT_P() uses in its expansion likely can't be dropped
(unless they're there _just_ for this purpose).

My interpretation of the doc goes even further: Extended integer values
ought to be okay, too. Yet since floating point (and alike) values
aren't excluded either, that's somewhat ambiguous.

--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -3506,7 +3506,10 @@ output_asm_insn (const char *templ, rtx
  output_address (VOIDmode, operands[opnum]);
else if (letter == 'c')
  {
-   if (CONSTANT_ADDRESS_P (operands[opnum]))
+   if (CONST_INT_P (operands[opnum]))
+ fprintf (asm_out_file, HOST_WIDE_INT_PRINT_DEC,
+  INTVAL (operands[opnum]));
+   else if (CONSTANT_ADDRESS_P (operands[opnum]))
  output_addr_const (asm_out_file, operands[opnum]);
else
  output_operand (operands[opnum], 'c');
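
The order of checks in the hunk above can be modelled outside the compiler. The sketch below is a toy, not GCC code: `operand_kind`, `toy_operand` and `print_c_operand` are hypothetical stand-ins for the rtx predicates and output routines. It only shows that an integer constant is printed directly, sign included, before the constant-address and target-specific paths are tried.

```c
#include <stdio.h>

enum operand_kind { OP_CONST_INT, OP_CONST_ADDR, OP_OTHER };

struct toy_operand
{
  enum operand_kind kind;
  long long value;     /* used when kind == OP_CONST_INT */
  const char *symbol;  /* used when kind == OP_CONST_ADDR */
};

/* Format OP the way the patched 'c' handling would; return 0 on
   success and -1 when the operand would be handed to target code.  */
int
print_c_operand (char *buf, size_t len, const struct toy_operand *op)
{
  if (op->kind == OP_CONST_INT)
    snprintf (buf, len, "%lld", op->value);   /* new first check */
  else if (op->kind == OP_CONST_ADDR)
    snprintf (buf, len, "%s", op->symbol);    /* pre-existing path */
  else
    return -1;                                /* target-specific path */
  return 0;
}
```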

[PATCH 2/2] x86: fix asm() operand 'c' modifier handling

2024-11-21 Thread Jan Beulich

Documentation is pretty clear here: "Require a constant operand and
print the constant expression with no punctuation"; the internal use for
condition codes is entirely undocumented. IOW any constant value of
whatever kind, magnitude, or sign ought to be acceptable as long as it's
expressible. Wire the handling of constants to how 'p' is handled.

--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -13932,8 +13932,16 @@ ix86_print_operand (FILE *file, rtx x, i
  gcc_fallthrough ();
 #endif
 
-   case 'C':
case 'c':
+ if (code == 'c' && CONSTANT_P (x))
+   {
+ /* Handle all constants (which common code causes to make it
+here) like 'p', for 'c' being overloaded. */
+ code = 'p';
+ break;
+   }
+ gcc_fallthrough ();
+   case 'C':
  if (!COMPARISON_P (x))
{
  output_operand_lossage ("operand is not a condition code, "

[PATCH, commited] apx-ndd-tls-1[ab].c: Add -std=gnu17

2024-11-21 Thread H.J. Lu

Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c
to avoid:

gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’:
gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to
function ‘l’
gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here

* gcc.target/i386/apx-ndd-tls-1a.c: -std=gnu17.
* gcc.target/i386/apx-ndd-tls-1b.c: Likewise.

I am checking this patch.

-- 
H.J.
From 6e4b213fbae69ddbfb17717b38ae0bf403465fa0 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 21 Nov 2024 19:08:03 +0800
Subject: [PATCH] apx-ndd-tls-1[ab].c: Add -std=gnu17
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c
to avoid:

gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’:
gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to function ‘l’
gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here

	* gcc.target/i386/apx-ndd-tls-1a.c: -std=gnu17.
	* gcc.target/i386/apx-ndd-tls-1b.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/apx-ndd-tls-1a.c | 2 +-
 gcc/testsuite/gcc.target/i386/apx-ndd-tls-1b.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1a.c b/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1a.c
index 5bf57a76ef7..b4b0e9380a9 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1a.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1a.c
@@ -1,7 +1,7 @@
 /* PR target/113733 */
 /* { dg-do assemble { target { apxf && { ! ia32 } } } } */
 /* { dg-require-effective-target tls } */
-/* { dg-options "-mapxf -O3 -w" } */
+/* { dg-options "-std=gnu17 -mapxf -O3 -w" } */
 
 extern __thread int a, j;
 enum b
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1b.c b/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1b.c
index a3eb8106508..d0637034f11 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1b.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-tls-1b.c
@@ -2,7 +2,7 @@
 /* { dg-do assemble { target { apxf && { ! ia32 } } } } */
 /* { dg-require-effective-target tls } */
 /* { dg-require-effective-target code_6_gottpoff_reloc } */
-/* { dg-options "-save-temps -mapxf -O3 -w" } */
+/* { dg-options "-save-temps -std=gnu17 -mapxf -O3 -w" } */
 
 #include "apx-ndd-tls-1a.c"
 
-- 
2.47.0

[PATCH] i386/testsuite: Enhance AVX10.2 vmovd/w testcases

2024-11-21 Thread Haochen Jiang

Hi all,

Under -fno-omit-frame-pointer, which is the Solaris/x86 default, %ebp
will be used. Check for both %ebp and %esp to avoid failures there.
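
To see what the relaxed pattern accepts, here is a small sketch using POSIX `regex.h`. This is an assumption-laden stand-in: DejaGnu matches with Tcl regular expressions, not POSIX ones, and `matches_stack_vmovd` is a hypothetical helper. The pattern is the POSIX ERE spelling of the new scan-assembler expression for vmovd.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains a vmovd load from a stack slot addressed
   through %esp or %ebp, 0 if not, -1 if the pattern fails to compile.  */
int
matches_stack_vmovd (const char *line)
{
  regex_t re;
  /* Any decimal offset, and either %esp or %ebp as the base.  */
  if (regcomp (&re, "vmovd\t[0-9]+\\(%e[bs]p\\), %xmm0",
               REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Both the `4(%esp)` form and a frame-pointer form such as `8(%ebp)` match, while a register-to-register vmovd does not.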

Tested under -m32 w/ and w/o -fno-omit-frame-pointer. Ok for trunk?

Thx,
Haochen

gcc/testsuite/ChangeLog:

PR target/117697
* gcc.target/i386/avx10_2-vmovd-1.c: Both check %esp and %ebp.
* gcc.target/i386/avx10_2-vmovw-1.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c | 4 ++--
 gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c | 3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c
index 6a5d84ac6cd..21bd1a1ef0a 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmovd-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64-v3 -mavx10.2" } */
-/* { dg-final { scan-assembler-times "vmovd\t4\\(%esp\\), %xmm0" 1 { target 
ia32 } } } */
-/* { dg-final { scan-assembler-times "vmovss\t4\\(%esp\\), %xmm0" 1 { target 
ia32 } } } */
+/* { dg-final { scan-assembler-times "vmovd\t\[0-9\]+\\(%e\[bs\]p\\), %xmm0" 1 
{ target ia32 } } } */
+/* { dg-final { scan-assembler-times "vmovss\t\[0-9\]+\\(%e\[bs\]p\\), %xmm0" 
1 { target ia32 } } } */
 /* { dg-final { scan-assembler-times "vmovd\t%xmm0, %xmm0" 3 { target ia32 } } 
} */
 /* { dg-final { scan-assembler-times "vmovd\t%edi, %xmm0" 1 { target { ! ia32 
} } } } */
 /* { dg-final { scan-assembler-times "vmovd\t%xmm0, %xmm0" 4 { target { ! ia32 
} } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c 
b/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c
index 6e05f72f637..49fa51dc2ec 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-vmovw-1.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=x86-64-v3 -mavx10.2" } */
-/* { dg-final { scan-assembler-times "vmovw\t4\\(%esp\\), %xmm0" 3 { target 
ia32 } } } */
-/* { dg-final { scan-assembler-times "vmovw\t8\\(%ebp\\), %xmm0" 1 { target 
ia32 } } } */
+/* { dg-final { scan-assembler-times "vmovw\t\[0-9\]+\\(%e\[bs\]p\\), %xmm0" 4 
{ target ia32 } } } */
 /* { dg-final { scan-assembler-times "vmovw\t%xmm0, %xmm0" 4 { target ia32 } } 
} */
 /* { dg-final { scan-assembler-times "vmovw\t%edi, %xmm0" 1 { target { ! ia32 
} } } } */
 /* { dg-final { scan-assembler-times "vmovw\t%xmm0, %xmm0" 7 { target { ! ia32 
} } } } */
-- 
2.31.1

Re: [PATCH 09/17] testsuite: arm: Use effective-target for nomve_fp_1.c test

2024-11-21 Thread Torbjorn SVENSSON





On 2024-11-19 17:49, Richard Earnshaw (lists) wrote:

On 19/11/2024 10:23, Torbjörn SVENSSON wrote:

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* g++.target/arm/mve/general-c++/nomve_fp_1.c: Added option
"-mcpu=unset".

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c 
b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
index a2069d353cf..fd8c05b0eed 100644
--- a/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
+++ b/gcc/testsuite/g++.target/arm/mve/general-c++/nomve_fp_1.c
@@ -4,7 +4,7 @@
  /* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
 which could imply mve+fp depending on the user settings. We want to make
 sure the '+fp' extension is not enabled.  */
-/* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
+/* { dg-options "-mfpu=auto -mcpu=unset -march=armv8.1-m.main+mve" } */
  /* { dg-add-options arm_fp } */
  
  #include 



OK


Pushed as r15-5545-g115ae676fc7.

Kind regards,
Torbjörn



R.

Re: [PATCH 08/11] c: c++: flag to disable fetch_op handling fenv exceptions

2024-11-21 Thread Matthew Malcomson


Attempting to resend since this got rejected by the gcc-patches mailing list.
(Apologies about the duplication to those on Cc).

On 11/18/24 11:25, Matthew Malcomson wrote:

On 11/14/24 18:44, Joseph Myers wrote:



On Thu, 14 Nov 2024, mmalcom...@nvidia.com wrote:


N.b. I would appreciate any feedback about how one should handle such a
situation when working with C11 _Atomic types.  They have the same
problem that they require libatomic and sometimes libatomic is not
available.  Is this just something that will stay as a known limitation
for certain platforms?  Is there something in the works to make it more
viable?

libatomic, like libgcc, is part of the language support - you can't get
full language support (including e.g. support for _Atomic structures)
without sometimes needing to link with libatomic.  Disabling libatomic
means you don't have full language support.  (It's possible you need to
add more libatomic support for particular OS configurations, or arrange
some system for responsibility to be split between libatomic and a board
support package linked in with custom linker scripts if the way to do e.g.
locking for large _Atomic operations genuinely depends on details of the
RTOS in use where a *-elf target is used with GCC that doesn't know about
those details.  But ultimately, whether provided by libatomic or a BSP,
code needs to be linked in to provide certain interfaces required for full
_Atomic support.)


Ah, I see -- thanks for the clarification.



Based on that -- should the same reasoning apply to the new builtins?
I.e. do you believe it would be reasonable to say that the new builtins
require libatomic, and remove this flag entirely?
(Or possibly just keep it hidden from the user and only available as a
developer flag for things like testing).




--
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] rs6000, fix test builtins-1-p10-runnable.c

2024-11-21 Thread Carl Love


Ping 6


On 11/14/24 1:36 PM, Carl Love wrote:


Ping 5

On 11/5/24 8:27 AM, Carl Love wrote:


Ping 4

On 10/28/24 4:28 PM, Carl Love wrote:

Ping 3


On 10/17/24 1:31 PM, Carl Love wrote:

Ping 2


On 10/9/24 7:43 AM, Carl Love wrote:

Ping, FYI this is a fairly simple fix to a testcase.


On 10/3/24 8:11 AM, Carl Love wrote:

GCC maintainers:

The builtins-1-p10-runnable.c test has debugging inadvertently
enabled.  The test uses #ifdef to enable/disable the debugging.
Unfortunately, the #define DEBUG was set to 0 to disable
debugging and enable the call to abort in case of error.  The
#define should have been removed to disable debugging.
Additionally, a change in the expected output which was made for
testing purposes was not removed.  Hence, the test prints
that there was an error instead of calling abort.  The result is
that the test does not get reported as failing.
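
The preprocessor behaviour at the root of this can be shown in isolation. In the sketch below only `DEBUG` comes from the test; the `IFDEF_SEES_DEBUG` and `IF_SEES_DEBUG` macros and the two functions are hypothetical names used to make the difference observable.

```c
#define DEBUG 0

/* #ifdef only asks whether the macro exists, so this branch is taken
   even though DEBUG expands to 0.  */
#ifdef DEBUG
#define IFDEF_SEES_DEBUG 1
#else
#define IFDEF_SEES_DEBUG 0
#endif

/* #if evaluates the macro's value, so DEBUG == 0 disables this branch.  */
#if DEBUG
#define IF_SEES_DEBUG 1
#else
#define IF_SEES_DEBUG 0
#endif

int ifdef_sees_debug (void) { return IFDEF_SEES_DEBUG; }
int if_sees_debug (void) { return IF_SEES_DEBUG; }
```

With `#define DEBUG 0` in place, the `#ifdef` form still selects the debug path, which is why removing the definition (rather than setting it to 0) is the fix.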


This patch removes the #define DEBUG to enable the call to abort 
and restores the expected output to the correct value. The patch 
was tested on a Power 10 without the #define DEBUG to verify that 
the test does fail with the incorrect expected value.  The 
correct expected value was then restored.  The test reports 19 
expected passes and no errors.


Please let me know if this patch is acceptable for mainline. Thanks.

Carl


--- 



rs6000, fix test builtins-1-p10-runnable.c

The test has two issues:

1) The test should execute abort() if an error is found.
However, the test contains a #define DEBUG 0 which actually enables the
error prints instead of executing abort() because the debug code is
protected by an #ifdef not #if.  The #define DEBUG needs to be removed
so the test will abort on an error.

2) The vec_i_expected output was tweaked to test that it would fail.
The test value was not removed.

By removing the #define DEBUG, the test fails and reports 1 failure.
Removing the intentionally wrong expected value results in the test
passing with no errors as expected.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove #define
    DEBUG.    Replace vec_i_expected value with correct value.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
index 222c8b3a409..3e8a1c736e3 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -25,8 +25,6 @@
 #include 
 #include 

-#define DEBUG 0
-
 #ifdef DEBUG
 #include 
 #endif
@@ -281,8 +279,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };

 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);
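
The restored expected vector can be cross-checked with a scalar model of one lane. This is a sketch, not the builtin's implementation: `mulh32` is a hypothetical helper, and it assumes that the initializer 2147483648 ends up as INT32_MIN in each `int` lane, as it does with GCC.

```c
#include <stdint.h>

/* Scalar model of one lane of a signed word multiply-high: the upper
   32 bits of the 64-bit product, i.e. floor (a * b / 2^32).  */
int32_t
mulh32 (int32_t a, int32_t b)
{
  int64_t p = (int64_t) a * b;
  /* Floor division without relying on >> of a negative value.  */
  if (p < 0)
    p -= 0xFFFFFFFFLL;
  return (int32_t) (p / 4294967296LL);
}
```

Under that assumption, multiplying INT32_MIN by 2, 3, 4 and 5 gives -1, -2, -2 and -3, which agrees with the corrected `vec_i_expected` value.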

Re: [PATCH v2] RISC-V: Minimal support for svvptc extension.

2024-11-21 Thread Kito Cheng

Hi Dongyan:

Thanks for your patch, it seems good but just a few minor comments :)

> @@ -1721,6 +1722,9 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
>RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
>
> +
> +  RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_ptc_subext, MASK_SVVPTC),

Use x_riscv_sv_subext rather than x_riscv_ptc_subext here.

> +
>RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
>
>RISCV_EXT_FLAG_ENTRY ("xcvmac",  x_riscv_xcv_subext, MASK_XCVMAC),
> diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def 
> b/gcc/common/config/riscv/riscv-ext-bitmask.def
> index ca5df1740f3..c42ce152ce3 100644
> --- a/gcc/common/config/riscv/riscv-ext-bitmask.def
> +++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
> @@ -79,5 +79,7 @@ RISCV_EXT_BITMASK ("zcd", 1,  4)
>  RISCV_EXT_BITMASK ("zcf",  1,  5)
>  RISCV_EXT_BITMASK ("zcmop",1,  6)
>  RISCV_EXT_BITMASK ("zawrs",1,  7)
> +RISCV_EXT_BITMASK ("svvptc",   1,  8)
> +

^^^ drop this blankline

>
>  #undef RISCV_EXT_BITMASK
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index ab9d6e82723..3d9aae80858 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -466,6 +466,11 @@ Mask(SVINVAL) Var(riscv_sv_subext)
>
>  Mask(SVNAPOT) Var(riscv_sv_subext)
>
> +TargetVariable
> +int riscv_ptc_subext

Drop this

> +
> +Mask(SVVPTC) Var(riscv_ptc_subext)

and just use riscv_sv_subext here.

> +
>  TargetVariable
>  int riscv_ztso_subext
>

>
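
The effect of the comments above is that svvptc gets one more bit in the existing sv mask variable instead of a variable of its own. A minimal sketch of that layout follows; the bit values are hypothetical (in GCC the MASK_* values are generated from the Mask() records in riscv.opt), as are the two helper functions.

```c
/* Hypothetical bit assignments, for illustration only.  */
#define MASK_SVINVAL (1 << 0)
#define MASK_SVNAPOT (1 << 1)
#define MASK_SVVPTC  (1 << 2)

/* One shared variable for all sv* extensions.  */
static int riscv_sv_subext;

/* Hypothetical helpers: set and query a bit in the shared variable.  */
void enable_sv_ext (int mask) { riscv_sv_subext |= mask; }
int sv_ext_enabled_p (int mask) { return (riscv_sv_subext & mask) != 0; }
```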

Re: [PATCH v2 1/5] testsuite: arm: Use effective-target for pr56184.C and pr59985.C

2024-11-21 Thread Christophe Lyon





On 11/21/24 15:24, Torbjörn SVENSSON wrote:

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* g++.dg/other/pr56184.C: Use effective-target
arm_arch_v7a_neon_thumb.
* g++.dg/other/pr59985.C: Use effective-target
arm_arch_v7a_fp_hard.
* lib/target-supports.exp: Define effective-target
arm_arch_v7a_fp_hard, arm_arch_v7a_neon_thumb

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/other/pr56184.C  | 7 +--
  gcc/testsuite/g++.dg/other/pr59985.C  | 7 ---
  gcc/testsuite/lib/target-supports.exp | 2 ++
  3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/g++.dg/other/pr56184.C 
b/gcc/testsuite/g++.dg/other/pr56184.C
index dc949283c98..651c6280c35 100644
--- a/gcc/testsuite/g++.dg/other/pr56184.C
+++ b/gcc/testsuite/g++.dg/other/pr56184.C
@@ -1,6 +1,9 @@
  /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || arm_thumb2_ok } 
} } */
-/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon 
-mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors -Wno-return-type" } */
+/* { dg-require-effective-target arm_arch_v7a_neon_thumb_ok } */
+/* { dg-options "-fno-short-enums -O2 -fno-section-anchors -Wno-return-type" } 
*/
+/* { dg-add-options arm_arch_v7a_neon_thumb } */
+/* { dg-additional-options "-mthumb -mtune=cortex-a9" } */

Isn't -mthumb already included by dg-add-options arm_arch_v7a_neon_thumb?

Thanks,

Christophe


+
  
  typedef unsigned int size_t;

  __extension__ typedef int __intptr_t;
diff --git a/gcc/testsuite/g++.dg/other/pr59985.C 
b/gcc/testsuite/g++.dg/other/pr59985.C
index 7c9bfab35f1..e96db431633 100644
--- a/gcc/testsuite/g++.dg/other/pr59985.C
+++ b/gcc/testsuite/g++.dg/other/pr59985.C
@@ -1,7 +1,8 @@
  /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { arm_thumb1 } } */
-/* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 
-mfpu=vfpv3-d16 -mfloat-abi=hard" } */
-/* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-require-effective-target arm_arch_v7a_fp_hard_ok } */
+/* { dg-options "-g -fcompare-debug -O2" } */
+/* { dg-add-options arm_arch_v7a_fp_hard } */
+/* { dg-additional-options "-mtune=cortex-a9" } */
  
  extern void *f1 (unsigned long, unsigned long);

  extern const struct line_map *f2 (void *, int, unsigned int, const char *, 
unsigned int);
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f3828793986..e2c839b233a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5777,7 +5777,9 @@ foreach { armfunc armflag armdefs } {
v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
+   v7a_fp_hard "-march=armv7-a+fp -mfpu=auto -mfloat-abi=hard" 
__ARM_ARCH_7A__
v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" "__ARM_ARCH_7A__ 
&& __ARM_NEON__"
+   v7a_neon_thumb "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp -mthumb" "__ARM_ARCH_7A__ 
&& __ARM_NEON__ && __thumb__"
v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__

Re: [PATCH ver2 0/4] rs6000, remove redundant built-ins and add more test cases

2024-11-21 Thread Carl Love




Ping 6

On 11/14/24 1:36 PM, Carl Love wrote:

Ping 5


On 11/5/24 8:28 AM, Carl Love wrote:


Ping 4

On 10/28/24 4:29 PM, Carl Love wrote:

Ping 3


On 10/17/24 1:31 PM, Carl Love wrote:

Ping 2


On 10/9/24 7:44 AM, Carl Love wrote:


Ping


On 10/1/24 8:12 AM, Carl Love wrote:


GCC maintainers:

The following version 2 of a series of patches for PowerPC 
removes some built-ins that are covered by existing overloaded 
built-ins. Additionally, there are patches to add missing 
testcases and documentation.  The original version of the patch 
series was posted on 8/7/2024.  It was originally reviewed by Kewen.


The patches have been updated per the review.  Note patches 2 and 
3 in the series were approved with minor changes.  I will post 
the entire series for review for completeness.


The patch series has been re-tested on Power 10 LE and BE with no 
regressions.


Please let me know if the patches are acceptable for mainline. 
Thanks.


    Carl

testsuite: robustify gcc.target/m68k/20100512-1.c

2024-11-21 Thread Andreas Schwab

This has been failing since r5-2883-g8cb65b3725f0c3 which caused the
memset to be optimized out.  Add an unoptimizable reference to the local
variable to keep it.

Committed.

* gcc.target/m68k/20100512-1.c (doTest1, doTest2): Add asm that
references foo.
---
 gcc/testsuite/gcc.target/m68k/20100512-1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/m68k/20100512-1.c 
b/gcc/testsuite/gcc.target/m68k/20100512-1.c
index d07bb519abc..ab54a92e965 100644
--- a/gcc/testsuite/gcc.target/m68k/20100512-1.c
+++ b/gcc/testsuite/gcc.target/m68k/20100512-1.c
@@ -9,8 +9,10 @@
 void doTest1(void) {
   volatile char foo[10];
   memset((void *)foo, 1, 100);
+  asm volatile("# %0" : : "g"(foo));
 }
 void doTest2(void) {
   volatile char foo[10];
   memset((void *)foo, 1, 100);
+  asm volatile("# %0" : : "g"(foo));
 }
-- 
2.47.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH 17/17] testsuite: arm: Use effective-target for pr96939 test

2024-11-21 Thread Richard Earnshaw (lists)

On 20/11/2024 13:00, Torbjorn SVENSSON wrote:
> 
> 
> On 2024-11-19 18:57, Richard Earnshaw (lists) wrote:
>> On 19/11/2024 10:24, Torbjörn SVENSSON wrote:
>>> Update test case to use -mcpu=unset/-march=unset feature introduced in
>>> r15-3606-g7d6c6a0d15c.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/arm/lto/pr96939_0.c: Use effective-target
>>> arm_arch_v8a.
>>> * gcc.target/arm/lto/pr96939_1.c: Remove dg-options.
>>>
>>> Signed-off-by: Torbjörn SVENSSON 
>>> ---
>>>   gcc/testsuite/gcc.target/arm/lto/pr96939_0.c | 4 ++--
>>>   gcc/testsuite/gcc.target/arm/lto/pr96939_1.c | 1 -
>>>   2 files changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c 
>>> b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
>>> index 241ffd5da0a..3bb74bd1a1d 100644
>>> --- a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
>>> +++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
>>> @@ -1,7 +1,7 @@
>>>   /* PR target/96939 */
>>>   /* { dg-lto-do link } */
>>> -/* { dg-require-effective-target arm_arch_v8a_ok } */
>>> -/* { dg-lto-options { { -flto -O2 } } } */
>>> +/* { dg-require-effective-target arm_arch_v8a_link } */
>>> +/* { dg-lto-options { { -flto -O2 -mcpu=unset -march=armv8-a+simd+crc } } 
>>> } */
>>>     extern unsigned crc (unsigned, const void *);
>>>   typedef unsigned (*fnptr) (unsigned, const void *);
>>> diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c 
>>> b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
>>> index 4afdbdaf5ad..c641b5580ab 100644
>>> --- a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
>>> +++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
>>> @@ -1,5 +1,4 @@
>>>   /* PR target/96939 */
>>> -/* { dg-options "-march=armv8-a+simd+crc" } */
>>>     #include 
>>>   
>>
>> I'm not sure this is right.  The PR talks about handling streaming in of 
>> objects built with different options, which are supposed to be recorded in 
>> the streaming data.  But your change alters what will be recorded AFAICT.
> 
> I was unsure what path I should take to address this test case.
> Maybe we should go with the following:
> 
> gcc.target/arm/lto/pr96939_0.c:
> /* { dg-lto-do link } */
> /* { dg-require-effective-target arm_arch_v8a_link } */
> /* { dg-lto-options { { -flto -O2 } } } */
> 
> gcc.target/arm/lto/pr96939_1.c:
> /* { dg-options "-mcpu=unset -march=armv8-a+simd+crc -mfpu=auto" } */
> 

Yes, that looks about right.

> 
> Should I also define an effective-target for arm_arch_v8a_crc that checks 
> using -march=armv8-a+crc+simd -mfpu=auto -mfloat-abi=softfp and add dg-r-e-t 
> for it in the pr96939_0.c file? Or is it safe to assume that this 
> architecture is available if v8a is available?

LTO tests are slightly special as they require multiple source files to be 
compiled in the test.  I don't think it would really work to have different 
effective targets for the _1.c files compared to the _0.c files.

> 
> Keep in mind that I cannot rely on dg-add-options in an LTO test.
> Do we want to run this in -mfloat-abi=softfp or -mfloat-abi=hard mode?

I would just copy any -mfloat-abi value that exists in v8a_link (which by the 
looks of things is none).  The two files must be compiled with the same ABI or 
the link will fail.

We don't normally add many comments (other than dg- directives) to tests, but 
in this case one might be worthwhile about the need for the options to remain 
compatible at the ABI level.

R.

> 
> Kind regards
> Torbjörn
>>
>> R.
>

[COMMITTED 1/3] ada: Cleanup in expansion of aggregates in object declarations with aspects

2024-11-21 Thread Marc Poulhiès

From: Eric Botcazou 

The strategy to expand aggregates present as initialization expressions in
object declarations, originally with a subsequent address clause given for
the object and later with aspects whose resolution needs to be delayed up
to the freeze point, has been to block their resolution, so as to block
their expansion, during the processing of the declarations, lest they be
nonstatic and expanded in place and therefore generate assignments to the
object before its freeze point, which is forbidden.  Instead a temporary
is created at the declaration point and the aggregates are assigned to it,
and finally the temporary is copied into the object at the freeze point.

Of course this general strategy cannot be applied to limited types because
the copy operation is forbidden for them, so instead aggregates of limited
types are resolved but have their expansion delayed, before being eventually
expanded through Convert_Aggr_In_Object_Decl, which uses the mechanism based
on Initialization_Statements to insert them at the freeze point.

After the series of cleanups, all the aggregates that are initialization
expressions in object declarations and get expanded in place, go through the
Convert_Aggr_In_Object_Decl mechanism, exactly like those of limited type
with address clause/aspects have done historically.  This means that we no
longer need to block the resolution of those of nonlimited type with address
clause/aspects.

gcc/ada/ChangeLog:

* exp_ch3.adb: Remove clauses for Expander.
(Expand_N_Object_Declaration): Remove special processing for delayed
aggregates of limited types as initialization expressions.
* freeze.adb (Warn_Overlay): Bail out if No_Initialization is set on
the declaration node of the entity.
* sem_ch3.adb (Delayed_Aspect_Present): Delete.
(Expand_N_Object_Declaration): Do not block the resolution of the
initialization expression that is an aggregate when the object has
an address clause or delayed aspects.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 24 ++--
 gcc/ada/freeze.adb  |  6 +++
 gcc/ada/sem_ch3.adb | 92 ++---
 3 files changed, 22 insertions(+), 100 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 639fe50cd53..7d8a7fd4fed 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -32,7 +32,6 @@ with Einfo;  use Einfo;
 with Einfo.Entities; use Einfo.Entities;
 with Einfo.Utils;use Einfo.Utils;
 with Errout; use Errout;
-with Expander;   use Expander;
 with Exp_Aggr;   use Exp_Aggr;
 with Exp_Atag;   use Exp_Atag;
 with Exp_Ch4;use Exp_Ch4;
@@ -7701,28 +7700,13 @@ package body Exp_Ch3 is
 
  Expr_Q := Unqualify (Expr);
 
- --  When we have the appropriate type of aggregate in the expression
- --  (it has been determined during analysis of the aggregate by
- --  setting the delay flag), let's perform in place assignment and
- --  thus avoid creating a temporary.
+ --  When we have the appropriate kind of aggregate in the expression
+ --  (this has been determined during analysis of the aggregate by
+ --  setting the Expansion_Delayed flag), let's perform in place
+ --  assignment and thus avoid creating a temporary.
 
  if Is_Delayed_Aggregate (Expr_Q) then
 
---  An aggregate that must be built in place is not resolved and
---  expanded until the enclosing construct is expanded. This will
---  happen when the aggregate is limited and the declared object
---  has a following address clause. Resolution is done without
---  expansion because it will take place when the declaration
---  itself is expanded.
-
-if Is_Limited_Type (Typ)
-  and then not Analyzed (Expr)
-then
-   Expander_Mode_Save_And_Set (False);
-   Resolve (Expr, Typ);
-   Expander_Mode_Restore;
-end if;
-
 --  For a special return object, the transformation must wait until
 --  after the object is turned into an allocator.
 
diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 67a51899f95..b52898f4212 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -11034,6 +11034,12 @@ package body Freeze is
  return;
   end if;
 
+  --  No warning if there is no default initialization
+
+  if No_Initialization (Declaration_Node (Ent)) then
+ return;
+  end if;
+
   --  We only give the warning for non-imported entities of a type for
   --  which a non-null base init proc is defined, or for objects of access
   --  types with implicit null initialization, or when Normalize_Scalars
diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index aa950692473..4a3d020330c 100644
--- a/gcc/ada/sem

[PATCH][middle-end] For multiplication try swapping operands when matching complex multiply [PR116463]

2024-11-21 Thread Tamar Christina

Hi All,

This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on the
GCC 14 branch and some of the ones on the master.

The current matching just looks for one order for multiplication and was relying
on canonicalization to always give the right order because of the TWO_OPERANDS.

However, for multiplication, trying only one order is fragile because the
operands can be flipped.

The failing tests on the branch are:

#include 

#define TYPE double
#define N 200

void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
   _Complex TYPE c[restrict N]) {
  for (int i = 0; i < N; i++)
c[i] -= a[i] * (b[i] * I * I);
}

void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
   _Complex TYPE c[restrict N]) {
  for (int i = 0; i < N; i++)
c[i] -= (a[i] * I * I) * b[i];
}

The issue is just a small difference in commutative operations.
We look for {R,R} * {R,I} but found {R,I} * {R,R}.
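
As a small illustration of why the two failing loops are the same computation
up to operand order, the sketch below evaluates one iteration of each loop body
with std::complex.  The helper names fms180snd_step and fms180fst_step are
made up for this example and are not part of the testsuite; multiplying by
I * I simply negates the operand, so both forms reduce to c + a * b.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// One iteration of the fms180snd loop body: c -= a * (b * I * I).
// (Hypothetical helper, named after the test function for readability.)
cplx fms180snd_step(cplx c, cplx a, cplx b) {
  const cplx I(0.0, 1.0);
  return c - a * (b * I * I);
}

// One iteration of the fms180fst loop body: c -= (a * I * I) * b.
// The rotation is applied to the other multiplicand, which is the
// operand-order difference the pattern matcher has to cope with.
cplx fms180fst_step(cplx c, cplx a, cplx b) {
  const cplx I(0.0, 1.0);
  return c - (a * I * I) * b;
}
```

For small integer-valued inputs both helpers return exactly the same value,
which is also c + a * b.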

Since the DF analysis is cached, we should be able to swap operands and retry
for multiply cheaply.

There is a constraint being checked by vect_validate_multiplication for the data
flow of the operands feeding the multiplications.  So e.g.

between the nodes:

note:   node 0x4d1d210 (max_nunits=2, refcnt=3) vector(2) double
note:   op template: _27 = _10 * _25;
note:  stmt 0 _27 = _10 * _25;
note:  stmt 1 _29 = _11 * _25;
note:   node 0x4d1d060 (max_nunits=2, refcnt=2) vector(2) double
note:   op template: _26 = _11 * _24;
note:  stmt 0 _26 = _11 * _24;
note:  stmt 1 _28 = _10 * _24;

we require the lanes to come from the same source which
vect_validate_multiplication checks.  As such it doesn't make sense to flip them
individually because that would invalidate the earlier linear_loads_p checks
which have validated that the arguments all come from the same datarefs.

This patch thus flips the operands in unison to still maintain this invariant,
but also honor the commutative nature of multiplication.
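
The retry can be sketched in isolation as follows.  This is only a rough model
under stated assumptions: Operand, OperandPair, Validator and match_with_swap
are hypothetical stand-ins invented for the example (the real code works on
SLP nodes and calls vect_validate_multiplication with its caches), and the
validator here is an arbitrary callback.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an SLP operand; the vectorizer uses tree nodes.
using Operand = int;
using OperandPair = std::array<Operand, 2>;

// Hypothetical stand-in for the validation step: returns true when the two
// operand pairs are laid out in the order the matcher expects.
using Validator =
    std::function<bool(const OperandPair &, const OperandPair &)>;

// Try the operands as given.  If that fails, swap BOTH pairs together and
// retry once, so the lanes of the two multiplications stay paired up.
bool match_with_swap(OperandPair &left, OperandPair &right,
                     const Validator &validate) {
  if (validate(left, right))
    return true;
  std::swap(left[0], left[1]);
  std::swap(right[0], right[1]);
  return validate(left, right);
}
```

Swapping only one of the two pairs is deliberately not attempted in this
sketch, mirroring the point above that the operands must be flipped in unison.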

Bootstrapped and regtested on aarch64-none-linux-gnu and no issues on master and
GCC 14 branch.

Ok for master and backport to GCC 14?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimizations/116463
* tree-vect-slp-patterns.cc (complex_mul_pattern::matches,
complex_fms_pattern::matches): Try swapping operands on multiply.

---
diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc
index d62682be43c98f2d16af8bf6a6a049c73100ef16..2535d46db3e84700d7591cc8d1bae3b0d098a803 100644
--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -1076,7 +1076,15 @@ complex_mul_pattern::matches (complex_operation_t op,
   enum _conj_status status;
   if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
 right_op, false, &status))
-return IFN_LAST;
+{
+  /* Try swapping the order and re-trying since multiplication is
+commutative.  */
+  std::swap (left_op[0], left_op[1]);
+  std::swap (right_op[0], right_op[1]);
+  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
+right_op, false, &status))
+   return IFN_LAST;
+}
 
   if (status == CONJ_NONE)
 {
@@ -1293,7 +1301,15 @@ complex_fms_pattern::matches (complex_operation_t op,
   enum _conj_status status;
   if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
 left_op, true, &status))
-return IFN_LAST;
+{
+  /* Try swapping the order and re-trying since multiplication is
+commutative.  */
+  std::swap (left_op[0], left_op[1]);
+  std::swap (right_op[0], right_op[1]);
+  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
+left_op, true, &status))
+   return IFN_LAST;
+}
 
   if (status == CONJ_NONE)
 ifn = IFN_COMPLEX_FMS;





[PATCH] pa: Remove pa_section_type_flags

2024-11-21 Thread Xi Ruoyao

It's no longer needed since r15-4842 (when the target-independent code
started to handle the case).

gcc/ChangeLog:

* config/pa/pa.cc (pa_section_type_flags): Remove.
(TARGET_SECTION_TYPE_FLAGS): Remove.
---

I don't have a hppa machine to test this, but conceptually this should
be correct.  Ok for trunk?

 gcc/config/pa/pa.cc | 21 -
 1 file changed, 21 deletions(-)

diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc
index 94ee7dbfa8e..783b922a5fc 100644
--- a/gcc/config/pa/pa.cc
+++ b/gcc/config/pa/pa.cc
@@ -407,8 +407,6 @@ static size_t n_deferred_plabels = 0;
 
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P pa_legitimate_constant_p
-#undef TARGET_SECTION_TYPE_FLAGS
-#define TARGET_SECTION_TYPE_FLAGS pa_section_type_flags
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P pa_legitimate_address_p
 
@@ -10900,25 +10898,6 @@ pa_legitimate_constant_p (machine_mode mode, rtx x)
   return true;
 }
 
-/* Implement TARGET_SECTION_TYPE_FLAGS.  */
-
-static unsigned int
-pa_section_type_flags (tree decl, const char *name, int reloc)
-{
-  unsigned int flags;
-
-  flags = default_section_type_flags (decl, name, reloc);
-
-  /* Function labels are placed in the constant pool.  This can
- cause a section conflict if decls are put in ".data.rel.ro"
- or ".data.rel.ro.local" using the __attribute__ construct.  */
-  if (strcmp (name, ".data.rel.ro") == 0
-  || strcmp (name, ".data.rel.ro.local") == 0)
-flags |= SECTION_WRITE | SECTION_RELRO;
-
-  return flags;
-}
-
 /* pa_legitimate_address_p recognizes an RTL expression that is a
valid memory address for an instruction.  The MODE argument is the
machine mode for the MEM expression that wants to use this address.
-- 
2.47.0

Re: [PATCH][middle-end] For multiplication try swapping operands when matching complex multiply [PR116463]

2024-11-21 Thread Richard Biener

On Thu, 21 Nov 2024, Tamar Christina wrote:

> Hi All,
> 
> This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on the
> GCC 14 branch and some of the ones on the master.
> 
> The current matching just looks for one order for multiplication and was 
> relying
> on canonicalization to always give the right order because of the 
> TWO_OPERANDS.
> 
> However when it comes to the multiplication trying only one order is a bit
> fragile as they can be flipped.
> 
> The failing tests on the branch are:
> 
> #include 
> 
> #define TYPE double
> #define N 200
> 
> void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
>_Complex TYPE c[restrict N]) {
>   for (int i = 0; i < N; i++)
> c[i] -= a[i] * (b[i] * I * I);
> }
> 
> void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
>_Complex TYPE c[restrict N]) {
>   for (int i = 0; i < N; i++)
> c[i] -= (a[i] * I * I) * b[i];
> }
> 
> The issue is just a small difference in commutative operations.
> we look for {R,R} * {R,I} but found {R,I} * {R,R}.
> 
> Since the DF analysis is cached, we should be able to swap operands and retry
> for multiply cheaply.
> 
> There is a constraint being checked by vect_validate_multiplication for the 
> data
> flow of the operands feeding the multiplications.  So e.g.
> 
> between the nodes:
> 
> note:   node 0x4d1d210 (max_nunits=2, refcnt=3) vector(2) double
> note:   op template: _27 = _10 * _25;
> note:  stmt 0 _27 = _10 * _25;
> note:  stmt 1 _29 = _11 * _25;
> note:   node 0x4d1d060 (max_nunits=2, refcnt=2) vector(2) double
> note:   op template: _26 = _11 * _24;
> note:  stmt 0 _26 = _11 * _24;
> note:  stmt 1 _28 = _10 * _24;
> 
> we require the lanes to come from the same source which
> vect_validate_multiplication checks.  As such it doesn't make sense to flip 
> them
> individually because that would invalidate the earlier linear_loads_p checks
> which have validated that the arguments all come from the same datarefs.
> 
> This patch thus flips the operands in unison to still maintain this invariant,
> but also honor the commutative nature of multiplication.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues on master and
> GCC 14 branch.
> 
> Ok for master and backport to GCC 14?

OK for master and branch (after a week or so).

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimizations/116463
>   * tree-vect-slp-patterns.cc (complex_mul_pattern::matches,
>   complex_fms_pattern::matches): Try swapping operands on multiply.
> 
> ---
> diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc
> index 
> d62682be43c98f2d16af8bf6a6a049c73100ef16..2535d46db3e84700d7591cc8d1bae3b0d098a803
>  100644
> --- a/gcc/tree-vect-slp-patterns.cc
> +++ b/gcc/tree-vect-slp-patterns.cc
> @@ -1076,7 +1076,15 @@ complex_mul_pattern::matches (complex_operation_t op,
>enum _conj_status status;
>if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
>right_op, false, &status))
> -return IFN_LAST;
> +{
> +  /* Try swapping the order and re-trying since multiplication is
> +  commutative.  */
> +  std::swap (left_op[0], left_op[1]);
> +  std::swap (right_op[0], right_op[1]);
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
> +  right_op, false, &status))
> + return IFN_LAST;
> +}
>  
>if (status == CONJ_NONE)
>  {
> @@ -1293,7 +1301,15 @@ complex_fms_pattern::matches (complex_operation_t op,
>enum _conj_status status;
>if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
>left_op, true, &status))
> -return IFN_LAST;
> +{
> +  /* Try swapping the order and re-trying since multiplication is
> +  commutative.  */
> +  std::swap (left_op[0], left_op[1]);
> +  std::swap (right_op[0], right_op[1]);
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
> +  left_op, true, &status))
> + return IFN_LAST;
> +}
>  
>if (status == CONJ_NONE)
>  ifn = IFN_COMPLEX_FMS;
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH v2 1/5] testsuite: arm: Use effective-target for pr56184.C and pr59985.C

2024-11-21 Thread Torbjörn SVENSSON

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* g++.dg/other/pr56184.C: Use effective-target
arm_arch_v7a_neon_thumb.
* g++.dg/other/pr59985.C: Use effective-target
arm_arch_v7a_fp_hard.
* lib/target-supports.exp: Define effective-target
arm_arch_v7a_fp_hard, arm_arch_v7a_neon_thumb

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/g++.dg/other/pr56184.C  | 7 +--
 gcc/testsuite/g++.dg/other/pr59985.C  | 7 ---
 gcc/testsuite/lib/target-supports.exp | 2 ++
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/g++.dg/other/pr56184.C b/gcc/testsuite/g++.dg/other/pr56184.C
index dc949283c98..651c6280c35 100644
--- a/gcc/testsuite/g++.dg/other/pr56184.C
+++ b/gcc/testsuite/g++.dg/other/pr56184.C
@@ -1,6 +1,9 @@
 /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
-/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a -mfpu=neon -mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors -Wno-return-type" } */
+/* { dg-require-effective-target arm_arch_v7a_neon_thumb_ok } */
+/* { dg-options "-fno-short-enums -O2 -fno-section-anchors -Wno-return-type" } */
+/* { dg-add-options arm_arch_v7a_neon_thumb } */
+/* { dg-additional-options "-mthumb -mtune=cortex-a9" } */
+
 
 typedef unsigned int size_t;
 __extension__ typedef int __intptr_t;
diff --git a/gcc/testsuite/g++.dg/other/pr59985.C b/gcc/testsuite/g++.dg/other/pr59985.C
index 7c9bfab35f1..e96db431633 100644
--- a/gcc/testsuite/g++.dg/other/pr59985.C
+++ b/gcc/testsuite/g++.dg/other/pr59985.C
@@ -1,7 +1,8 @@
 /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { arm_thumb1 } } */
-/* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 -mfpu=vfpv3-d16 -mfloat-abi=hard" } */
-/* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-require-effective-target arm_arch_v7a_fp_hard_ok } */
+/* { dg-options "-g -fcompare-debug -O2" } */
+/* { dg-add-options arm_arch_v7a_fp_hard } */
+/* { dg-additional-options "-mtune=cortex-a9" } */
 
 extern void *f1 (unsigned long, unsigned long);
 extern const struct line_map *f2 (void *, int, unsigned int, const char *, unsigned int);
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f3828793986..e2c839b233a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5777,7 +5777,9 @@ foreach { armfunc armflag armdefs } {
v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
+   v7a_fp_hard "-march=armv7-a+fp -mfpu=auto -mfloat-abi=hard" __ARM_ARCH_7A__
v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" "__ARM_ARCH_7A__ && __ARM_NEON__"
+   v7a_neon_thumb "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp -mthumb" "__ARM_ARCH_7A__ && __ARM_NEON__ && __thumb__"
v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__
-- 
2.25.1

[PATCH v2 5/5] testsuite: arm: Use effective-target for pr96939 test

2024-11-21 Thread Torbjörn SVENSSON

Update test case to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* gcc.target/arm/lto/pr96939_0.c: Use effective-target
arm_arch_v8a.
* gcc.target/arm/lto/pr96939_1.c: Remove dg-options.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/lto/pr96939_0.c | 3 ++-
 gcc/testsuite/gcc.target/arm/lto/pr96939_1.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
index 241ffd5da0a..21d2c1d70a4 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_0.c
@@ -1,6 +1,7 @@
 /* PR target/96939 */
 /* { dg-lto-do link } */
-/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-require-effective-target arm_arch_v8a_link } */
+/* { dg-options "-mcpu=unset -march=armv8-a+simd -mfpu=auto" } */
 /* { dg-lto-options { { -flto -O2 } } } */
 
 extern unsigned crc (unsigned, const void *);
diff --git a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
index 4afdbdaf5ad..e079ec941a5 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr96939_1.c
@@ -1,5 +1,5 @@
 /* PR target/96939 */
-/* { dg-options "-march=armv8-a+simd+crc" } */
+/* { dg-options "-mcpu=unset -march=armv8-a+simd+crc -mfpu=auto" } */
 
 #include 
 
-- 
2.25.1

[PATCH v2 4/5] testsuite: arm: Use effective-target for its.c test [PR94531]

2024-11-21 Thread Torbjörn SVENSSON

The test case gcc.target/arm/its.c was created together with restriction
of IT blocks for Cortex-M7. As the test case fails on all tunes that
do not match Cortex-M7, explicitly test it for Cortex-M7. To have some
additional faith that GCC does the correct thing, I also added another
variant of the test for Cortex-M3 that should allow longer IT blocks.
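
To make the Cortex-M3 check more concrete, the sketch below applies the same
regular expression that its-2.c passes to scan-assembler, \sit[te]{2}, to a few
sample assembly lines.  It is only an approximation: the testsuite uses Tcl
regular expressions while this uses std::regex (ECMAScript syntax), and the
helper name has_long_it_block is invented for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the text contains whitespace followed by "it" and two
// further t/e condition letters (for example "itte" or "ittt").
// Hypothetical helper; it mimics the scan-assembler search, it is not
// part of the GCC testsuite.
bool has_long_it_block(const std::string &asm_text) {
  static const std::regex pattern(R"(\sit[te]{2})");
  return std::regex_search(asm_text, pattern);
}
```

A mnemonic such as "itte" satisfies the pattern, whereas a bare "it" or "ite"
does not, because the pattern requires two suffix letters after "it".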

gcc/testsuite/ChangeLog:

PR testsuite/94531
* gcc.target/arm/its.c: Removed.
* gcc.target/arm/its-1.c: Copy of gcc.target/arm/its.c. Use
effective-target arm_cpu_cortex_m7.
* gcc.target/arm/its-2.c: Copy of gcc.target/arm/its.c. Use
effective-target arm_cpu_cortex_m3.

Signed-off-by: Torbjörn SVENSSON 
---
 .../gcc.target/arm/{its.c => its-1.c} |  5 ++--
 gcc/testsuite/gcc.target/arm/its-2.c  | 24 +++
 2 files changed, 27 insertions(+), 2 deletions(-)
 rename gcc/testsuite/gcc.target/arm/{its.c => its-1.c} (79%)
 create mode 100644 gcc/testsuite/gcc.target/arm/its-2.c

diff --git a/gcc/testsuite/gcc.target/arm/its.c b/gcc/testsuite/gcc.target/arm/its-1.c
similarity index 79%
rename from gcc/testsuite/gcc.target/arm/its.c
rename to gcc/testsuite/gcc.target/arm/its-1.c
index f81a0df51cd..2730a2c349e 100644
--- a/gcc/testsuite/gcc.target/arm/its.c
+++ b/gcc/testsuite/gcc.target/arm/its-1.c
@@ -1,7 +1,8 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_cortex_m } */
-/* { dg-require-effective-target arm_thumb2 } */
+/* { dg-require-effective-target arm_cpu_cortex_m7_ok } */
 /* { dg-options "-O2" }  */
+/* { dg-add-options arm_cpu_cortex_m7 } */
+
 int test (int a, int b)
 {
   int r;
diff --git a/gcc/testsuite/gcc.target/arm/its-2.c b/gcc/testsuite/gcc.target/arm/its-2.c
new file mode 100644
index 00000000000..2f4f629f6e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/its-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_cpu_cortex_m3_ok } */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_cpu_cortex_m3 } */
+
+int test (int a, int b)
+{
+  int r;
+  if (a > 10)
+{
+  r = a - b;
+  r += 10;
+}
+  else
+{
+  r = b - a;
+  r -= 7;
+}
+  if (r > 0)
+r -= 3;
+  return r;
+}
+/* Ensure there is an IT block with at least 2 instructions.  */
+/* { dg-final { scan-assembler "\\sit\[te\]{2}" } } */
-- 
2.25.1

[PATCH v2 3/5] testsuite: arm: Use -mcpu=unset when overriding -march

2024-11-21 Thread Torbjörn SVENSSON

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:
* gcc.dg/pr41574.c: Added option "-mcpu=unset".
* gcc.dg/pr59418.c: Likewise.
* lib/target-supports.exp (add_options_for_vect_early_break):
Likewise.
(add_options_for_arm_v8_neon): Likewise.
(check_effective_target_arm_neon_ok_nocache): Likewise.
(check_effective_target_arm_simd32_ok_nocache): Likewise.
(check_effective_target_arm_sat_ok_nocache): Likewise.
(check_effective_target_arm_dsp_ok_nocache): Likewise.
(check_effective_target_arm_crc_ok_nocache): Likewise.
(check_effective_target_arm_v8_neon_ok_nocache): Likewise.
(check_effective_target_arm_v8_1m_mve_fp_ok_nocache): Likewise.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Likewise.
(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache):
Likewise.
(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache):
Likewise.
(check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache):
Likewise.
(check_effective_target_arm_v8_1m_mve_ok_nocache): Likewise.
(check_effective_target_arm_v8_2a_i8mm_ok_nocache): Likewise.
(check_effective_target_arm_fp16fml_neon_ok_nocache): Likewise.
(check_effective_target_arm_v8_2a_bf16_neon_ok_nocache):
Likewise.
(check_effective_target_arm_v8m_main_cde_ok_nocache): Likewise.
(check_effective_target_arm_v8m_main_cde_fp_ok_nocache):
Likewise.
(check_effective_target_arm_v8_1m_main_cde_mve_ok_nocache):
Likewise.
(check_effective_target_arm_v8_1m_main_cde_mve_fp_ok_nocache):
Likewise.
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache):
Likewise.
(check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache):
Likewise.
(check_effective_target_arm_v8_1_lob_ok): Likewise.
---
 gcc/testsuite/gcc.dg/pr41574.c|  2 +-
 gcc/testsuite/gcc.dg/pr59418.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 60 +--
 3 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr41574.c b/gcc/testsuite/gcc.dg/pr41574.c
index 062c0044532..e25295bc4fd 100644
--- a/gcc/testsuite/gcc.dg/pr41574.c
+++ b/gcc/testsuite/gcc.dg/pr41574.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=armv7-a -mfloat-abi=softfp -mfpu=neon -fno-unsafe-math-optimizations -fdump-rtl-combine" { target { arm*-*-* } } } */
+/* { dg-options "-O2 -mcpu=unset -march=armv7-a -mfloat-abi=softfp -mfpu=neon -fno-unsafe-math-optimizations -fdump-rtl-combine" { target { arm*-*-* } } } */
 /* { dg-options "-O2 -fno-unsafe-math-optimizations -fdump-rtl-combine" { target { ! arm*-*-* } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/pr59418.c b/gcc/testsuite/gcc.dg/pr59418.c
index 4b54ef2b42d..6ab46ecde8a 100644
--- a/gcc/testsuite/gcc.dg/pr59418.c
+++ b/gcc/testsuite/gcc.dg/pr59418.c
@@ -3,7 +3,7 @@
 
 /* { dg-do compile } */
 /* { dg-options "-Os -g" } */
-/* { dg-options "-march=armv7-a+fp -mfloat-abi=hard -Os -g" { target { arm*-*-* && { ! arm_thumb1 } } } } */
+/* { dg-options "-mcpu=unset -march=armv7-a+fp -mfloat-abi=hard -Os -g" { target { arm*-*-* && { ! arm_thumb1 } } } } */
 
 extern int printf (const char *__format, ...);
 double bar (const char *, int);
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1ad4ea462c5..c1e8dafe941 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4351,7 +4351,7 @@ proc add_options_for_vect_early_break { flags } {
 
 if { [check_effective_target_arm_v8_neon_ok] } {
global et_arm_v8_neon_flags
-   return "$flags $et_arm_v8_neon_flags -march=armv8-a"
+   return "$flags $et_arm_v8_neon_flags -mcpu=unset -march=armv8-a"
 }
 
 if { [check_effective_target_sse4] } {
@@ -5122,7 +5122,7 @@ proc add_options_for_arm_v8_neon { flags } {
return "$flags"
 }
 global et_arm_v8_neon_flags
-return "$flags $et_arm_v8_neon_flags -march=armv8-a"
+return "$flags $et_arm_v8_neon_flags -mcpu=unset -march=armv8-a"
 }
 
 # Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
@@ -5196,7 +5196,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
 global et_arm_neon_flags
 set et_arm_neon_flags ""
 if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon" "-mfpu=neon -mfloat-abi=softfp" "-mfpu=neon -mfloat-abi=softfp -march=armv7-a" "-mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard -march=armv7-a"} {
+   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon" "-mfpu=neon -mfloat-abi=softfp" "-mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a" "-mfloat-abi=hard" "-mfpu=neon -mfloat-abi=hard" "-mfpu=neon -mfloat-abi=h

[PATCH v2 0/5] testsuite: arm: Leverage -mcpu=unset/-march=unset

2024-11-21 Thread Torbjörn SVENSSON

Hi,

Changes since v1:

- Pushed part of the patch series that was ok'ed.
- Updated the remaining patches based on the review comments. Please let
  me know if I missed something.

Ok for trunk?

Kind regards,
Torbjörn

[PATCH v2 2/5] testsuite: arm: Use -march=unset for bfloat16_scalar* tests

2024-11-21 Thread Torbjörn SVENSSON

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* gcc.target/arm/bfloat16_scalar_1_1.c: Use effective-target
arm_arch_v8_2a_bf16_hard.
* gcc.target/arm/bfloat16_scalar_2_1.c: Likewise.
* gcc.target/arm/bfloat16_scalar_3_1.c: Likewise.
* gcc.target/arm/bfloat16_scalar_1_2.c: Use effective-target
arm_arch_v8_2a_bf16.
* gcc.target/arm/bfloat16_scalar_2_2.c: Likewise.
* gcc.target/arm/bfloat16_scalar_3_2.c: Likewise.
* lib/target-supports.exp: Define effective-target
v8_2a_bf16 and v8_2a_bf16_hard.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c | 7 +++
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c | 5 ++---
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c | 5 ++---
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c | 5 ++---
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c | 5 ++---
 gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_2.c | 5 ++---
 gcc/testsuite/lib/target-supports.exp  | 2 ++
 7 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c
index 7a6c1772676..f7361d63fc2 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c
@@ -1,8 +1,7 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
-/* { dg-require-effective-target arm_hard_ok } */
-/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
-/* { dg-add-options arm_v8_2a_bf16_neon }  */
-/* { dg-additional-options "-O3 --save-temps -std=gnu90 -mfloat-abi=hard" } */
+/* { dg-require-effective-target arm_arch_v8_2a_bf16_hard_ok } */
+/* { dg-add-options arm_arch_v8_2a_bf16_hard } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
 #include 
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c
index 8293cafcc14..079814ef337 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c
@@ -1,7 +1,6 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
-/* { dg-require-effective-target arm_v8_neon_ok } */
-/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
-/* { dg-additional-options "-march=armv8.2-a+bf16 -mfloat-abi=softfp 
-mfpu=auto" } */
+/* { dg-require-effective-target arm_arch_v8_2a_bf16_ok } */
+/* { dg-add-options arm_arch_v8_2a_bf16 } */
 /* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c
index e84f837e162..de06c4d68d2 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c
@@ -1,7 +1,6 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
-/* { dg-require-effective-target arm_v8_neon_ok } */
-/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
-/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
+/* { dg-require-effective-target arm_arch_v8_2a_bf16_hard_ok } */
+/* { dg-add-options arm_arch_v8_2a_bf16_hard } */
 /* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c
index 93ec059819a..fc252b94edc 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c
@@ -1,7 +1,6 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
-/* { dg-require-effective-target arm_v8_neon_ok } */
-/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
-/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=softfp -mfpu=neon-fp-armv8" } */
+/* { dg-require-effective-target arm_arch_v8_2a_bf16_ok } */
+/* { dg-add-options arm_arch_v8_2a_bf16 } */
 /* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c
index a1a70690322..f7361d63fc2 100644
--- a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c
@@ -1,7 +1,6 @@
 /* { dg-do assemble { target { arm*-*-* } } } */
-/* { dg-require-effective-target arm_v8_neon_ok } */
-/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
-/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
+/* { dg-require-effective-target arm_arch_v8_2a_bf16_hard_ok } */
+/* { dg-add-options arm_arch_v8_2a_bf16_hard } */
 /* { dg-additional

RE: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-21 Thread Tamar Christina

> > I tried writing automated testcases for these, however the testsuite doesn't
> > want to scan the output of -### and it makes the excess error tests always
> > fail unless you use dg-error, which also looks for "error:".  So tested manually:
> 
> You might be able to use dg-message instead. dg-message does not look
> for a `note:` (dg-note), `error:` (dg-error) or `warning:`
> (dg-warning).
> 
> From gcc-dg.exp:
> ```
> # Look for messages that don't have standard prefixes.
> proc dg-message { args } {
> ```

Thanks 😊 It was mostly the excess errors that were an issue, but I found you
can suppress them.
Updated version with tests below.

---

Hi All,

This patch makes the driver suppress the Cortex-A53 errata workarounds when
you request one of them but have also specified an -march or -mcpu that we
know is not affected by the erratum.

This is a driver only patch as the linker invocation needs to be changed as
well.  The linker and cc SPECs are different because for the linker we didn't
seem to add an inversion flag for the option.  That said, it's also not possible
to configure the linker with it on by default.  So not passing the flag is
sufficient to turn it off.

For the compilers however we have an inversion flag using -mno-, which is needed
to disable the workarounds when the compiler has been configured with it by
default.

In case it's unclear how the patch does what it does (it took me a while to
figure out the syntax):

  * Early matching replaces any -march=native or -mcpu=native with its
    expanded form and erases the native arguments from the buffer.
  * Because of this, as long as the new code runs after the erasure, it
    only has to handle the expanded form.
  * The expanded form needs to handle -march=+extensions and
    -mcpu=+extensions, so we can't use normal string matching and
    instead use strstr in a custom driver function that's common between
    native and non-native builds.
  * For the compilers we output -mno- and for the linker we just
    erase the --fix- option.
  * The extra internal matching, e.g. the duplicate match of mcpu inside:
    mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the glob
    using %* because the outer match would otherwise reset at the %{.  The reason
    for the outer glob at all is to skip the block early if no matches are found.
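
To illustrate why plain string equality is not enough once extensions are appended, here is a minimal sketch.  It assumes (this is not stated in the patch) that a value such as armv8-a+crc should be treated like its base architecture.  The helper names are hypothetical and are not the driver functions the patch adds; the patch itself uses strstr inside a spec function.

```cpp
#include <cstddef>

// Length of the architecture name up to, but not including, the first '+'.
constexpr std::size_t
base_len (const char *s)
{
  std::size_t n = 0;
  while (s[n] != '\0' && s[n] != '+')
    ++n;
  return n;
}

// True if ARCH with any "+extension" suffix stripped equals NAME exactly.
constexpr bool
base_arch_is (const char *arch, const char *name)
{
  std::size_t n = base_len (arch);
  for (std::size_t i = 0; i < n; ++i)
    if (name[i] == '\0' || arch[i] != name[i])
      return false;
  return name[n] == '\0';
}

// Hypothetical policy modelled on the expected output in this mail: the
// workaround is only kept for the baseline armv8-a architecture.
constexpr bool
keeps_a53_workaround (const char *arch)
{
  return base_arch_is (arch, "armv8-a");
}

static_assert (keeps_a53_workaround ("armv8-a"), "baseline keeps it");
static_assert (keeps_a53_workaround ("armv8-a+crc"), "extensions are ignored");
static_assert (!keeps_a53_workaround ("armv8.1-a"), "later revision drops it");
```

A direct comparison of "armv8-a+crc" against "armv8-a" would fail, which is the case the third bullet is guarding against.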


The workaround has the effect of suppressing certain inlining and multiply-add
formation, which leads to a roughly 1% SPECCPU 2017 Intrate regression on modern
cores.  This patch is needed because most distros configure GCC with the
workaround enabled by default.

Expected output:

> gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
0

> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
5

> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
5

> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
0

> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
0

> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
1

> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
1

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Cross-built and regtested on aarch64-none-elf with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC,
TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC,
CA53_ERR_843419_COMPILE_SPEC): New.
(CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them.
(AARCH64_ERRATA_COMPILE_SPEC):
* config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add
AARCH64_ERRATA_COMPILE_SPEC.
* config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
* common/config/aarch64/aarch64-common.cc
(is_host_cpu_not_armv8_base): New.
* config/aarch64/aarch64.h (is_host_cpu_not_armv8_base): New.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Add is_local_not_armv8_base.
(EXTRA_SPEC_FUNCTIONS): Add is_local_cpu_armv8_base.
* doc/invoke.texi: Document it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_30: New test.
* gcc.target/aarch64/cpunative/info_31: New test.
* gcc.target/aarch64/cpunative/info_32: New test.
* gcc.target/aarch64/cpunative/info_33: New test.
* gcc.target/aarch64/cpunative/native_cpu_30.c: New test.
* gcc.target/aarch64/cpunative/nat

Re: [patch,avr] PR117726: Improve 4-byte ASHIFT insns

2024-11-21 Thread Denis Chertykov

On Thu, 21 Nov 2024 at 18:08, Georg-Johann Lay wrote:
>
> This patch improves the 4-byte ASHIFT insns.
> 1) It adds a "r,r,C15" alternative for improved long << 15.
> 2) It adds 3-operand alternatives (depending on options) and
> splits them after peephole2 / before avr-fuse-move into
> a 3-operand byte shift and a 2-operand residual bit shift.
> For better control, it introduces the new option -msplit-bit-shift,
> which is enabled by default at -O2 and higher.  2) is even
> performed with -Os, but not with -Oz.
>
> No new regressions.
>
> Ok for trunk?

Please apply.

Denis.

[PATCH] testsuite: tree-ssa: Limit targets for vec perm tests

2024-11-21 Thread Christoph Müllner

Recently added test cases assume optimized code generation for certain
vectorized code.  However, this optimization might not be applied if
the backends don't support the optimized permutation.

The tests are confirmed to work on aarch64 and x86-64, so this
patch restricts the tests accordingly.

Tested on x86-64.

PR117728

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/satd-hadamard.c: Restrict to aarch64 and x86-64.
* gcc.dg/tree-ssa/vector-8.c: Likewise.
* gcc.dg/tree-ssa/vector-9.c: Likewise.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vector-8.c  | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/vector-9.c  | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c b/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
index 576ef01628c..7a22772f2e6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c
@@ -40,4 +40,4 @@ x264_pixel_satd_8x4_simplified (uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_
   return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
 }
 
-/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop4" } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop4" { target { aarch64*-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c b/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
index bc2269065e4..3a7b62b640d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-8.c
@@ -30,5 +30,5 @@ void f (vec *p_v_in_1, vec *p_v_in_2, vec *p_v_out_1, vec *p_v_out_2)
   *p_v_out_2 = v_out_2;
 }
 
-/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" "forwprop1" } } */
-/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" "forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c b/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
index e5f898e0281..ba34fb163d6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vector-9.c
@@ -30,5 +30,5 @@ void f (vec *p_v_in_1, vec *p_v_in_2, vec *p_v_out_1, vec *p_v_out_2)
   *p_v_out_2 = v_out_2;
 }
 
-/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" "forwprop1" } } */
-/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" } } */
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" "forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
-- 
2.47.0

Re: [PATCH] Use decl size in Solaris ASM_DECLARE_OBJECT_NAME [PR102296]

2024-11-21 Thread Rainer Orth

Rainer Orth  writes:

> Solaris has modified versions of ASM_DECLARE_OBJECT_NAME on both i386
> and sparc.  When
>
> commit ce597aedd79e646c4a5517505088d380239cbfa5
> Author: Ilya Enkovich 
> Date:   Thu Aug 7 08:04:55 2014 +
>
> elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size.
>
> was applied, those were missed.  At the same time, the testcase was
> restricted to Linux though there's nothing Linux-specific in there, so
> the error remained undetected.
>
> This patch fixes the definitions to match elfos.h and enables the test
> on Solaris, too.
>
> Bootstrapped without regressions on i386-pc-solaris2.11 and
> sparc-sun-solaris2.11.
>
> Ok for trunk?

I've installed the patch as is for now to at least fix the Solaris side
of things.

> I noticed that both openbsd.h and mcore/mcore-elf.h have the same
> problem.  Since I can test neither of those, I left them alone.

The OpenBSD and Mcore maintainers can worry about their ports at their
leisure ...

> Besides, it should be possible to move the testcase out of
> gcc.target/i386, simultaneously restricting it to ELF targets.

... and we can move the testcase out of gcc.target/i386 if we want.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH v6] forwprop: Try to blend two isomorphic VEC_PERM sequences

2024-11-21 Thread Christoph Müllner

On Thu, Nov 21, 2024 at 1:34 PM Sam James  wrote:
>
> The default on trunk is --enable-checking=yes,extra (when gcc/DEV-PHASE
> contains "experimental"), otherwise it's --enable-checking=release.
>
> I personally do most testing with --enable-checking=yes,rtl,extra but
> you can do less than that if you want to quickly get results.
>
> The minimum for testing patches on trunk should be yes,extra (as it's
> the default).

Ok, I will change to `--enable-checking=yes,extra`.
For development this is usually not a big deal, because I do incremental
builds and restrict the test cases with RUNTESTFLAGS to get quick
turnaround times.

For proper testing (e.g. before sending a patch out) the duration is
already so long (because of bootstrapping and running the whole test
suite) that additional checks won't matter too much.

Thanks,
Christoph

>
> thanks,
> sam

Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-21 Thread Ramana Radhakrishnan



> On 19 Nov 2024, at 4:18 PM, Tamar Christina  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
>> -Original Message-
>> From: Andrew Pinski 
>> Sent: Friday, November 15, 2024 7:16 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; ktkac...@gcc.gnu.org; Richard Sandiford
>> 
>> Subject: Re: [PATCH]AArch64 Suppress default options when march or mcpu used
>> is not affected by it.
>> 
>> On Fri, Nov 15, 2024 at 6:30 AM Tamar Christina 
>> wrote:
>>> 
>>> Hi All,
>>> 
>>> This patch makes it so that when you use any of the Cortex-A53 errata
>>> workarounds but have specified an -march or -mcpu we know is not affected by
>>> it that we suppress the errata workaround.
>>> 
>>> This is a driver only patch as the linker invocation needs to be changed as
>>> well.  The linker and cc SPECs are different because for the linker we didn't
>>> seem to add an inversion flag for the option.  That said, it's also not
>>> possible to configure the linker with it on by default.  So not passing the
>>> flag is sufficient to turn it off.
>>> 
>>> For the compilers however we have an inversion flag using -mno-, which is
>>> needed to disable the workarounds when the compiler has been configured with
>>> it by default.
>>> 
>>> Note that theoretically speaking -mcpu=native on a Cortex-A53 would turn it
>>> off, but this should be ok because it's unlikely anyone is running GCC-15+ on
>>> a Cortex-A53 which needs it.  If this is a concern, then for targets that have
>>> HAVE_LOCAL_CPU_DETECT I can adjust the patch to add a new custom function that
>>> re-queries host detection to see if it's an affected system.
>>> 
>>> The workaround has the effect of suppressing certain inlining and
>>> multiply-add formation which leads to a roughly 1% SPECCPU 2017 Intrate
>>> regression on modern cores.  This patch is needed because most distros
>>> configure GCC with the workaround enabled by default.
>>> 
>>> I tried writing automated testcases for these, however the testsuite doesn't
>>> want to scan the output of -### and it makes the excess error tests always
>>> fail unless you use dg-error, which also looks for "error:".  So tested
>>> manually:
>> 
>> You might be able to use dg-message instead. dg-message does not look
>> for a `note:` (dg-note), `error:` (dg-error) or `warning:`
>> (dg-warning).
>> 
>> From gcc-dg.exp:
>> ```
>> # Look for messages that don't have standard prefixes.
>> proc dg-message { args } {
>> ```
>> 
> 
> Thanks for the suggestion, I did try it before but both dg-output and
> dg-message fail the test for excess errors since the -### output goes to
> stderr.
> 
> But I realized I misunderstood the dg-message syntax and found a way to
> silence the excess error. So respinning the patch.
> 
> Thanks,
> Tamar
>> Thanks,
>> Andrew Pinski
>> 
>>> 
>>>> gcc -mcpu=neoverse-v1 -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
>>> 0
>>> 
>>>> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
>>> 5
>>> 
>>>> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
>>> 5
>>> 
>>>> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-mfix" | wc -l
>>> 0
>>> 
>>>> gcc -mfix-cortex-a53-835769 -march=armv8.1-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
>>> 0
>>> 
>>>> gcc -mfix-cortex-a53-835769 -march=armv8-a -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
>>> 1
>>> 
>>>> gcc -mfix-cortex-a53-835769 -xc - -O3 -o - < /dev/null -### 2>&1 | grep "\-\-fix" | wc -l
>>> 1
>>> 
>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>> 
>>> Ok for master?
>>> 
>>> Thanks,
>>> Tamar
>>> 
>>> gcc/ChangeLog:
>>> 
>>>* config/aarch64/aarch64-errata.h (TARGET_SUPPRESS_OPT_SPEC,
>>>TARGET_TURN_OFF_OPT_SPEC, CA53_ERR_835769_COMPILE_SPEC,
>>>CA53_ERR_843419_COMPILE_SPEC): New.
>>>(CA53_ERR_835769_SPEC, CA53_ERR_843419_SPEC): Use them.
>>>(AARCH64_ERRATA_COMPILE_SPEC):
>>>* config/aarch64/aarch64-elf-raw.h (CC1_SPEC, CC1PLUS_SPEC): Add
>>>AARCH64_ERRATA_COMPILE_SPEC.
>>>* config/aarch64/aarch64-freebsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
>>>* config/aarch64/aarch64-gnu.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
>>>* config/aarch64/aarch64-linux.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
>>>* config/aarch64/aarch64-netbsd.h (CC1_SPEC, CC1PLUS_SPEC): Likewise.
>>>* doc/invoke.texi: Document it.
>>> 
>>> ---
>>> diff --git a/gcc/config/aarch64/aarch64-elf-raw.h b/gcc/config/aarch64/aarch64-elf-raw.h
>>> index 5396da9b2d626e23e4c4d56e19cd7aa70804c475..8442a664c4fdedd9696da90e6727293c4d472a3f 100644
>>> --- a/gcc/config/aarch64/aarch64-elf-raw.h
>>> +++ b/gcc/config/aarch64/aarch64-elf-raw.h

[PATCH] RISC-V: Minimal support for svvptc extension.

2024-11-21 Thread Dongyan Chen

This patch adds minimal support for the svvptc extension, so that GCC
recognizes and processes it correctly at compile time.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Ditto.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-44.c: New test.

add svvptc testsuite
---
 gcc/common/config/riscv/riscv-common.cc   | 4 
 gcc/common/config/riscv/riscv-ext-bitmask.def | 2 ++
 gcc/config/riscv/riscv.opt| 5 +
 gcc/testsuite/gcc.target/riscv/arch-44.c  | 5 +
 4 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-44.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index b0e49eb82c0..4b57b57f6e4 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -405,6 +405,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"svvptc",  ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1721,6 +1722,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
   RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
 
+
+  RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_ptc_subext, MASK_SVVPTC),
+
   RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
 
   RISCV_EXT_FLAG_ENTRY ("xcvmac",  x_riscv_xcv_subext, MASK_XCVMAC),
diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def b/gcc/common/config/riscv/riscv-ext-bitmask.def
index ca5df1740f3..c42ce152ce3 100644
--- a/gcc/common/config/riscv/riscv-ext-bitmask.def
+++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
@@ -79,5 +79,7 @@ RISCV_EXT_BITMASK ("zcd", 1,  4)
 RISCV_EXT_BITMASK ("zcf",  1,  5)
 RISCV_EXT_BITMASK ("zcmop",1,  6)
 RISCV_EXT_BITMASK ("zawrs",1,  7)
+RISCV_EXT_BITMASK ("svvptc",   1,  8)
+
 
 #undef RISCV_EXT_BITMASK
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ab9d6e82723..3d9aae80858 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -466,6 +466,11 @@ Mask(SVINVAL) Var(riscv_sv_subext)
 
 Mask(SVNAPOT) Var(riscv_sv_subext)
 
+TargetVariable
+int riscv_ptc_subext
+
+Mask(SVVPTC) Var(riscv_ptc_subext)
+
 TargetVariable
 int riscv_ztso_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-44.c b/gcc/testsuite/gcc.target/riscv/arch-44.c
new file mode 100644
index 000..80dc19a7083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-44.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_svvptc -mabi=lp64" } */
+int foo()
+{
+}
-- 
2.43.0
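
As a reading aid for the RISCV_EXT_BITMASK ("svvptc", 1, 8) entry in the patch above, here is a minimal sketch of how such an entry could be consumed.  It assumes the two numeric fields are a group index and a bit position within a 64-bit word for that group.  The struct, the function names and the feature array are hypothetical and are not taken from GCC.

```cpp
#include <cstdint>

// Hypothetical mirror of one RISCV_EXT_BITMASK (NAME, GROUP, BIT) entry.
struct ext_bit
{
  const char *name;
  unsigned group;
  unsigned bit;
};

constexpr ext_bit svvptc = { "svvptc", 1, 8 };

// Mask that selects the extension's bit inside its group word.
constexpr std::uint64_t
ext_mask (ext_bit e)
{
  return std::uint64_t (1) << e.bit;
}

// True if FEATURES, holding one 64-bit word per group, has the bit set.
constexpr bool
has_ext (const std::uint64_t *features, ext_bit e)
{
  return (features[e.group] & ext_mask (e)) != 0;
}

// Example feature words: group 0 empty, group 1 with only bit 8 set.
constexpr std::uint64_t example_features[2] = { 0, 0x100 };

static_assert (ext_mask (svvptc) == 0x100, "bit 8 of the group word");
static_assert (has_ext (example_features, svvptc), "bit 8 in group 1 is set");
```

Under that assumption, a new extension has to take the next free bit in its group, which is consistent with the entry following ("zawrs", 1, 7) in the patch.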

Re: [PATCH v4 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-21 Thread Richard Sandiford

Claudio Bantaloukas  writes:
> This patch adds support for the following intrinsics:
> - svdot[_f32_mf8]_fpm
> - svdot_lane[_f32_mf8]_fpm
> - svdot[_f16_mf8]_fpm
> - svdot_lane[_f16_mf8]_fpm
>
> The first two are available under a combination of the FP8DOT4 and SVE2
> features.
> Alternatively under the SSVE_FP8DOT4 feature under streaming mode.
> The final two are available under a combination of the FP8DOT2 and SVE2
> features.
> Alternatively under the SSVE_FP8DOT2 feature under streaming mode.

Some of the comments from the previous patches apply here too
(e.g. the boilerplate at the start of the tests, and testing the
highest in-range index).

It looks like the patch is missing a change to doc/invoke.texi.

Otherwise it's just banal trivia, sorry:

> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 022163f0726..65df48a3e65 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -835,21 +835,28 @@ public:
>rtx
>expand (function_expander &e) const override
>{
> -/* In the optab, the multiplication operands come before the accumulator
> -   operand.  The optab is keyed off the multiplication mode.  */
> -e.rotate_inputs_left (0, 3);
>  insn_code icode;
> -if (e.type_suffix_ids[1] == NUM_TYPE_SUFFIXES)
> -  icode = e.convert_optab_handler_for_sign (sdot_prod_optab,
> - udot_prod_optab,
> - 0, e.result_mode (),
> - GET_MODE (e.args[0]));
> +if (e.fpm_mode == aarch64_sve::FPM_set)
> +  {
> + icode = code_for_aarch64_sve_dot (e.result_mode ());
> +  }

Formatting nit, but: no braces around single statements, with the body
then being indented by 2 spaces relative to the "if".
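
For readers unfamiliar with the convention, a minimal sketch of the layout being asked for follows.  The function and parameter names are placeholders invented for this illustration and do not come from aarch64-sve-builtins-base.cc.

```cpp
// Hypothetical stand-alone function, formatted in GNU style.
constexpr int
select_icode (bool fpm_set, int fp8_icode, int default_icode)
{
  int icode = default_icode;
  /* Single statement: no braces, body indented two spaces past the "if".  */
  if (fpm_set)
    icode = fp8_icode;
  return icode;
}

static_assert (select_icode (true, 1, 2) == 1, "FP8 path selected");
static_assert (select_icode (false, 1, 2) == 2, "default path kept");
```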

> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> index 09f343e7118..9f79f6e28c7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
> @@ -3994,6 +3994,34 @@ struct ternary_bfloat_def
>  };
>  SHAPE (ternary_bfloat)
>  
> +/* sv_t svfoo[_t0](sv_t, svmfloat8_t, svmfloat8_t).  */
> +struct ternary_mfloat8_def
> +: public ternary_resize2_base<8, TYPE_mfloat, TYPE_mfloat>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group) const 
> override
> +  {
> +gcc_assert (group.fpm_mode == FPM_set);
> +b.add_overloaded_functions (group, MODE_none);
> +build_all (b, "v0,v0,vM,vM", group, MODE_none);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +type_suffix_index type;
> +if (!r.check_num_arguments (4)
> + || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
> + || !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
> + || !r.require_vector_type (2, VECTOR_TYPE_svmfloat8_t)
> + || !r.require_scalar_type (3, "int64_t"))

uint64_t

> +  return error_mark_node;
> +
> +return r.resolve_to (r.mode_suffix_id, type, TYPE_SUFFIX_mf8, 
> GROUP_none);
> +  }
> +};
> +SHAPE (ternary_mfloat8)
> +
>  /* sv_t svfoo[_t0](sv_t, svbfloat16_t, svbfloat16_t, uint64_t)
>  
> where the final argument is an integer constant expression in the range
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def 
> b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
> index c84c153e913..7d90e3b5e20 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
> @@ -363,3 +363,15 @@ DEF_SVE_FUNCTION_GS_FPM (svmlallbb_lane, 
> ternary_mfloat8_lane, s_float_mf8, none
>  DEF_SVE_FUNCTION_GS_FPM (svmlallbt_lane, ternary_mfloat8_lane, s_float_mf8, 
> none, none, set)
>  DEF_SVE_FUNCTION_GS_FPM (svmlalltb_lane, ternary_mfloat8_lane, s_float_mf8, 
> none, none, set)
>  #undef REQUIRED_EXTENSIONS
> +
> +#define REQUIRED_EXTENSIONS \
> +  streaming_compatible (AARCH64_FL_SVE2 | AARCH64_FL_FP8DOT4, AARCH64_FL_SSVE_FP8DOT4)

Elsewhere we've been putting the non-streaming and streaming requirements
on separate lines if the whole thing doesn't fit on one line:

#define REQUIRED_EXTENSIONS \
  streaming_compatible (AARCH64_FL_SVE2 | AARCH64_FL_FP8DOT4, \
AARCH64_FL_SSVE_FP8DOT4)

Same below.

Looks good to me otherwise, thanks.

Richard

Re: [PATCH] testsuite: tree-ssa: Limit targets for vec perm tests

2024-11-21 Thread Jeff Law





On 11/21/24 9:10 AM, Christoph Müllner wrote:
> Recently added test cases assume optimized code generation for certain
> vectorized code.  However, this optimization might not be applied if
> the backends don't support the optimized permutation.
> 
> The tests are confirmed to work on aarch64 and x86-64, so this
> patch restricts the tests accordingly.
> 
> Tested on x86-64.
> 
> PR117728
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/tree-ssa/satd-hadamard.c: Restrict to aarch64 and x86-64.
> * gcc.dg/tree-ssa/vector-8.c: Likewise.
> * gcc.dg/tree-ssa/vector-9.c: Likewise.

Looks fine to me.  We don't have TRN on RISC-V, but I'm planning to take
a looksie at this set to see how it impacts codegen on RISC-V.


Jeff

[patch,avr] avr.opt: Refactor Var(avr_) to Var(avropt_)

2024-11-21 Thread Georg-Johann Lay


This is a no-op refactoring that uses a prefix of avropt_
(formerly: avr_) for variables defined via Var() directives
in avr.opt.  This makes it easier to spot values that come directly
from avr.opt in the rest of the backend.

Ok for trunk?

Johann

--

AVR: Use Var(avropt_xxx) for option variables in avr.opt.

This is a no-op refactoring that uses a prefix of avropt_
(formerly: avr_) for variables defined via Var() directives
in avr.opt.  This makes it easier to spot values that come directly
from avr.opt.

gcc/
* config/avr/avr.opt (avr_bits_e, avr_lra_p, avr_mmcu)
(avr_gasisr_prologues, avr_n_flash, avr_log_details)
(avr_branch_cost, avr_split_bit_shift, avr_strict_X)
(avr_flmap, avr_rodata_in_ram, avr_sp8, avr_fuse_add)
(avr_warn_addr_space_convert, avr_warn_misspelled_isr)
(avr_fuse_move, avr_double, avr_long_double): Rename
to respectively: avropt_bits_e, avropt_lra_p, avropt_mmcu,
avropt_gasisr_prologues, avropt_n_flash, avropt_log_details,
avropt_branch_cost, avropt_split_bit_shift, avropt_strict_X,
avropt_flmap, avropt_rodata_in_ram, avropt_sp8, avropt_fuse_add,
avropt_warn_addr_space_convert, avropt_warn_misspelled_isr,
avropt_fuse_move, avropt_double, avropt_long_double.
* config/avr/avr.h: Same.
* config/avr/avr.cc: Same.
* config/avr/avr.md: Same.
* config/avr/avr-passes.cc: Same.
* config/avr/avr-log.cc: Same.
* common/config/avr/avr-common.cc: Same.

diff --git a/gcc/config/avr/avr-log.cc b/gcc/config/avr/avr-log.cc
index 6f567d845e1..c48b0fbb6a6 100644
--- a/gcc/config/avr/avr-log.cc
+++ b/gcc/config/avr/avr-log.cc
@@ -342,17 +342,17 @@ avr_log_set_avr_log (void)
   bool all = TARGET_ALL_DEBUG != 0;
 
   if (all)
-avr_log_details = "all";
+avropt_log_details = "all";
 
-  if (all || avr_log_details)
+  if (all || avropt_log_details)
 {
   /* Adding , at beginning and end of string makes searching easier.  */
 
-  char *str = (char*) alloca (3 + strlen (avr_log_details));
+  char *str = (char*) alloca (3 + strlen (avropt_log_details));
   bool info;
 
   str[0] = ',';
-  strcat (stpcpy (str+1, avr_log_details), ",");
+  strcat (stpcpy (str+1, avropt_log_details), ",");
 
   all |= strstr (str, ",all,") != NULL;
   info = strstr (str, ",?,") != NULL;
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index b854f186a7a..57c3fed1e41 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -1227,7 +1227,7 @@ public:
 
   unsigned int execute (function *func) final override
   {
-if (optimize > 0 && avr_fuse_move > 0)
+if (optimize > 0 && avropt_fuse_move > 0)
   {
 	df_note_add_problem ();
 	df_analyze ();
@@ -3181,13 +3181,13 @@ bbinfo_t::optimize_one_function (function *func)
   // use arith 1 1 1 1  1 1 1 1  3
 
   // Which optimization(s) to perform.
-  bbinfo_t::try_fuse_p = avr_fuse_move & 0x1;  // Digit 0 in [0, 1].
-  bbinfo_t::try_bin_arg1_p = avr_fuse_move & 0x2;  // Digit 1 in [0, 1].
-  bbinfo_t::try_split_any_p = avr_fuse_move & 0x4; // Digit 2 in [0, 1].
-  bbinfo_t::try_split_ldi_p = avr_fuse_move >> 3;   // Digit 3 in [0, 2].
-  bbinfo_t::use_arith_p = (avr_fuse_move >> 3) >= 2;// Digit 3 in [0, 2].
+  bbinfo_t::try_fuse_p = avropt_fuse_move & 0x1;  // Digit 0 in [0, 1].
+  bbinfo_t::try_bin_arg1_p = avropt_fuse_move & 0x2;  // Digit 1 in [0, 1].
+  bbinfo_t::try_split_any_p = avropt_fuse_move & 0x4; // Digit 2 in [0, 1].
+  bbinfo_t::try_split_ldi_p = avropt_fuse_move >> 3;// Digit 3 in [0, 2].
+  bbinfo_t::use_arith_p = (avropt_fuse_move >> 3) >= 2; // Digit 3 in [0, 2].
   bbinfo_t::use_set_some_p = bbinfo_t::try_split_ldi_p; // Digit 3 in [0, 2].
-  bbinfo_t::try_simplify_p = avr_fuse_move != 0;
+  bbinfo_t::try_simplify_p = avropt_fuse_move != 0;
 
   // Topologically sort BBs from last to first.
 
@@ -4221,7 +4221,7 @@ public:
 func->machine->n_avr_fuse_add_executed += 1;
 n_avr_fuse_add_executed = func->machine->n_avr_fuse_add_executed;
 
-if (optimize && avr_fuse_add > 0)
+if (optimize && avropt_fuse_add > 0)
   return execute1 (func);
 return 0;
   }
@@ -4349,7 +4349,7 @@ avr_maybe_adjust_cfa (rtx_insn *insn, rtx reg, int addend)
   if (addend
   && frame_pointer_needed
   && REGNO (reg) == FRAME_POINTER_REGNUM
-  && avr_fuse_add == 3)
+  && avropt_fuse_add == 3)
 {
   rtx plus = plus_constant (Pmode, reg, addend);
   RTX_FRAME_RELATED_P (insn) = 1;
@@ -4443,7 +4443,7 @@ avr_pass_fuse_add::avr_pass_fuse_add::Mem_Insn::Mem_Insn (rtx_insn *insn)
 
   addr_regno = REGNO (addr_reg);
 
-  if (avr_fuse_add == 2
+  if (avropt_fuse_add == 2
   && frame_pointer_needed
   && addr_regno == FRAME_POINTER_REGNUM)
 MEM_VOLATILE_P (mem) = 0;
@@ -4829,7 +4829,7 @@ avr_shift_is_3op ()
   // For OPTIMIZE_SIZE_BALANCED (-Os), we s
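
For readers following the bbinfo_t::optimize_one_function hunk above, the
way the fuse-move option value is split into individual switches can be
modelled in isolation.  This is only a sketch: FuseMoveFlags and
decode_fuse_move are hypothetical names that do not exist in the AVR
backend, and only the bit tests are taken from the hunk.

```cpp
// Hypothetical standalone model of the decoding in the hunk above.
struct FuseMoveFlags
{
  bool try_fuse;       // Digit 0 in [0, 1].
  bool try_bin_arg1;   // Digit 1 in [0, 1].
  bool try_split_any;  // Digit 2 in [0, 1].
  bool try_split_ldi;  // Digit 3 in [0, 2], non-zero enables.
  bool use_arith;      // Digit 3 in [0, 2], needs value 2.
  bool use_set_some;   // Same condition as try_split_ldi.
  bool try_simplify;   // Any non-zero option value.
};

constexpr FuseMoveFlags
decode_fuse_move (int v)
{
  return FuseMoveFlags {
    (v & 0x1) != 0,
    (v & 0x2) != 0,
    (v & 0x4) != 0,
    (v >> 3) != 0,
    (v >> 3) >= 2,
    (v >> 3) != 0,
    v != 0
  };
}
```

With this model, a value of 8 enables try_split_ldi but not use_arith,
while 16 enables both.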

[pushed] c++: tweak for -Wrange-loop-construct (13 backport) [PR116731]

2024-11-21 Thread Marek Polacek

This patch differs from the trunk patch slightly: before r14-125
constructible_expr took a TREE_LIST, not a TREE_VEC, so the patch
adjusts for that.

Bootstrapped/regtested on x86_64-pc-linux-gnu.  Pushed.

-- >8 --
This PR reports that the warning would be better off using a check
for trivially constructible rather than trivially copyable.

LLVM accepted a similar fix:
https://github.com/llvm/llvm-project/issues/47355

PR c++/116731

gcc/cp/ChangeLog:

* parser.cc (warn_for_range_copy): Check if TYPE is trivially
constructible, not copyable.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wrange-loop-construct3.C: New test.
---
 gcc/cp/parser.cc  |  7 ++-
 .../g++.dg/warn/Wrange-loop-construct3.C  | 57 +++
 2 files changed, 61 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index a9241bd7e7a..47c15cff346 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -13851,11 +13851,12 @@ warn_for_range_copy (tree decl, tree expr)
   else if (!CP_TYPE_CONST_P (type))
 return;
 
-  /* Since small trivially copyable types are cheap to copy, we suppress the
- warning for them.  64B is a common size of a cache line.  */
+  /* Since small trivially constructible types are cheap to construct, we
+ suppress the warning for them.  64B is a common size of a cache line.  */
+  tree list = build_tree_list (NULL_TREE, TREE_TYPE (expr));
   if (TREE_CODE (TYPE_SIZE_UNIT (type)) != INTEGER_CST
   || (tree_to_uhwi (TYPE_SIZE_UNIT (type)) <= 64
- && trivially_copyable_p (type)))
+ && is_trivially_xible (INIT_EXPR, type, list)))
 return;
 
   /* If we can initialize a reference directly, suggest that to avoid the
diff --git a/gcc/testsuite/g++.dg/warn/Wrange-loop-construct3.C 
b/gcc/testsuite/g++.dg/warn/Wrange-loop-construct3.C
new file mode 100644
index 000..3d9d0c9088e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wrange-loop-construct3.C
@@ -0,0 +1,57 @@
+// PR c++/116731
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wrange-loop-construct" }
+
+void
+f0 ()
+{
+  struct S {
+char a[64];
+S& operator=(const S&) { return *this; };
+  };
+
+  S arr[8];
+  for (const auto r : arr)
+(void) r;
+}
+
+void
+f1 ()
+{
+  struct S {
+char a[65];
+S& operator=(const S&) { return *this; };
+  };
+
+  S arr[8];
+  for (const auto r : arr) // { dg-warning "creates a copy" }
+(void) r;
+}
+
+void
+f2 ()
+{
+  struct S {
+char a[64];
+S& operator=(const S&) { return *this; };
+~S() { }
+  };
+
+  S arr[8];
+  for (const auto r : arr) // { dg-warning "creates a copy" }
+(void) r;
+}
+
+void
+f3 ()
+{
+  struct S {
+char a[65];
+S& operator=(const S&) { return *this; };
+~S() { }
+  };
+
+  S arr[8];
+  for (const auto r : arr) // { dg-warning "creates a copy" }
+(void) r;
+}
-- 
2.47.0
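
The difference between the two checks can be seen with the standard
library traits, used here as an approximation of the front end's
trivially_copyable_p and is_trivially_xible predicates (that
correspondence is an assumption of this sketch, not something the patch
states).  The struct mirrors S from f0 in the new test: a user-provided
copy assignment operator makes it non-trivially-copyable, yet copy
construction is still trivial.

```cpp
#include <type_traits>

// Same shape as S in f0 of Wrange-loop-construct3.C.
struct S
{
  char a[64];
  S& operator= (const S&) { return *this; }
};

// What the old check looked at: false because of the user-provided
// copy assignment operator.
constexpr bool s_trivially_copyable
  = std::is_trivially_copyable<S>::value;

// What the new check looks at: the implicit copy constructor is trivial.
constexpr bool s_trivially_copy_constructible
  = std::is_trivially_constructible<S, const S&>::value;
```

So for a 64-byte type like this one, the old condition did not suppress
the warning while the new one does.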

[PATCH] wwwdocs: Align the DCO text for the GNU Toolchain to match community usage.

2024-11-21 Thread Carlos O'Donell

Adjust the DCO text to match the broader community usage including
the Linux kernel use around "real names."

These changes clarify what was meant by "real name" and that it is
not required to be a "legal name" or any other stronger requirement
than a known identity that could be contacted to discuss the
contribution.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/process/submitting-patches.rst?id=d4563201f33a022fc0353033d9dfeb1606a88330
Link: https://github.com/cncf/foundation/blob/659fd32c86dc/dco-guidelines.md
Link: https://github.com/cncf/foundation/issues/383
---
 htdocs/dco.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/dco.html b/htdocs/dco.html
index 68fa183b..7f6cb882 100644
--- a/htdocs/dco.html
+++ b/htdocs/dco.html
@@ -54,8 +54,8 @@ then you just add a line saying:
 
 Signed-off-by: Random J Developer 

 
-using your real name (sorry, no pseudonyms or anonymous contributions.)  This
-will be done for you automatically if you use `git commit -s`.
+using a known identity (sorry, no anonymous contributions.)
+This will be done for you automatically if you use `git commit -s`.
 
 Some people also put extra optional tags at the end.  The GCC project does
 not require tags from anyone other than the original author of the patch, but
-- 
2.46.0

[PATCH v2] RISC-V: Minimal support for svvptc extension.

2024-11-21 Thread Dongyan Chen

This patch adds minimal support for the svvptc extension [1], so that GCC
recognizes and processes the extension correctly at compile time.

[1] https://github.com/riscv/riscv-svvptc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Ditto.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-44.c: New test.

add svvptc testsuite
---
 gcc/common/config/riscv/riscv-common.cc   | 4 
 gcc/common/config/riscv/riscv-ext-bitmask.def | 2 ++
 gcc/config/riscv/riscv.opt| 5 +
 gcc/testsuite/gcc.target/riscv/arch-44.c  | 5 +
 4 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-44.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index b0e49eb82c0..4b57b57f6e4 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -405,6 +405,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"svvptc",  ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1721,6 +1722,9 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
   RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
 
+
+  RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_ptc_subext, MASK_SVVPTC),
+
   RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
 
   RISCV_EXT_FLAG_ENTRY ("xcvmac",  x_riscv_xcv_subext, MASK_XCVMAC),
diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def 
b/gcc/common/config/riscv/riscv-ext-bitmask.def
index ca5df1740f3..c42ce152ce3 100644
--- a/gcc/common/config/riscv/riscv-ext-bitmask.def
+++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
@@ -79,5 +79,7 @@ RISCV_EXT_BITMASK ("zcd", 1,  4)
 RISCV_EXT_BITMASK ("zcf",  1,  5)
 RISCV_EXT_BITMASK ("zcmop",1,  6)
 RISCV_EXT_BITMASK ("zawrs",1,  7)
+RISCV_EXT_BITMASK ("svvptc",   1,  8)
+
 
 #undef RISCV_EXT_BITMASK
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ab9d6e82723..3d9aae80858 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -466,6 +466,11 @@ Mask(SVINVAL) Var(riscv_sv_subext)
 
 Mask(SVNAPOT) Var(riscv_sv_subext)
 
+TargetVariable
+int riscv_ptc_subext
+
+Mask(SVVPTC) Var(riscv_ptc_subext)
+
 TargetVariable
 int riscv_ztso_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-44.c 
b/gcc/testsuite/gcc.target/riscv/arch-44.c
new file mode 100644
index 000..80dc19a7083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-44.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_svvptc -mabi=lp64" } */
+int foo()
+{
+}
-- 
2.43.0

[PATCH] i386/testsuite: Do not append AVX10.2 option for check_effective_target

2024-11-21 Thread Haochen Jiang

Hi all,

When -mavx10.2 meets a -march with AVX512 enabled, GCC reports a warning
about a vector size conflict.  On those platforms, with a GCC built for
arch native, the warning prevents the tests from running when
check_effective_target is evaluated.

Remove the AVX10.2 options: we are using inline asm, which does not
actually need them.  This eliminates the warning.

Tested with -march=native with AVX512.  Ok for trunk?

Thx,
Haochen

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_avx10_2):
Remove AVX10.2 option.
(check_effective_target_avx10_2_512): Ditto.
---
 gcc/testsuite/lib/target-supports.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f3828793986..301254afcf5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10805,7 +10805,7 @@ proc check_effective_target_avx10_2 { } {
  __asm__ volatile ("vcvtph2ibs\t%ymm5, %ymm6");
  __asm__ volatile ("vminmaxpd\t$123, %ymm4, %ymm5, %ymm6");
}
-} "-mavx10.2" ]
+} "" ]
 }
 
 # Return 1 if avx10.2-512 instructions can be compiled.
@@ -10820,7 +10820,7 @@ proc check_effective_target_avx10_2_512 { } {
  __asm__ volatile ("vcvtph2ibs\t%zmm5, %zmm6");
  __asm__ volatile ("vminmaxpd\t$123, %zmm4, %zmm5, %zmm6");
}
-} "-mavx10.2-512" ]
+} "" ]
 }
 
 # Return 1 if amx-avx512 instructions can be compiled.
-- 
2.31.1

[pushed] c++: modules and tsubst_friend_class

2024-11-21 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In 20_util/function_objects/mem_fn/constexpr.cc we start to instantiate
_Mem_fn_base's friend declaration of _Bind_check_arity before we've loaded
the namespace-scope declaration, so lookup_imported_hidden_friend doesn't
find it.  But then we load the namespace-scope declaration in
lookup_template_class during substitution, and so when we get around to
pushing the result of substitution, they conflict.  Fixed by calling
lazy_load_pendings in lookup_imported_hidden_friend.

gcc/cp/ChangeLog:

* name-lookup.cc (lookup_imported_hidden_friend): Call
lazy_load_pendings.
---
 gcc/cp/name-lookup.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 2dca57a14fd..76987997642 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -4569,6 +4569,8 @@ lookup_imported_hidden_friend (tree friend_tmpl)
   || !DECL_MODULE_IMPORT_P (inner))
 return NULL_TREE;
 
+  lazy_load_pendings (friend_tmpl);
+
   tree bind = get_mergeable_namespace_binding
 (current_namespace, DECL_NAME (inner), DECL_MODULE_ATTACH_P (inner));
   if (!bind)

base-commit: 806563f11eb7a677468f0ef40864da6f749b05a8
-- 
2.47.0

[pushed] c++: inline variables and modules

2024-11-21 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We weren't writing out the definition of an inline variable, so the importer
either got an undefined symbol or 0.

gcc/cp/ChangeLog:

* module.cc (has_definition): Also true for inline vars.

gcc/testsuite/ChangeLog:

* g++.dg/modules/inline-1_a.C: New test.
* g++.dg/modules/inline-1_b.C: New test.
---
 gcc/cp/module.cc  |  3 ++-
 gcc/testsuite/g++.dg/modules/inline-1_a.C | 11 +++
 gcc/testsuite/g++.dg/modules/inline-1_b.C |  8 
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/inline-1_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/inline-1_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 3b25f956928..617bf4c68b1 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11919,7 +11919,8 @@ has_definition (tree decl)
   since there's no TU to emit them in otherwise.  */
return true;
 
- if (!decl_maybe_constant_var_p (decl))
+ if (!decl_maybe_constant_var_p (decl)
+ && !DECL_INLINE_VAR_P (decl))
return false;
 
  return true;
diff --git a/gcc/testsuite/g++.dg/modules/inline-1_a.C 
b/gcc/testsuite/g++.dg/modules/inline-1_a.C
new file mode 100644
index 000..eafd450e667
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inline-1_a.C
@@ -0,0 +1,11 @@
+// { dg-additional-options -fmodules }
+// { dg-module-do run }
+export module M;
+
+inline int b = 42;
+struct A
+{
+  static inline int a = 4200;
+};
+
+export inline int f() { return b+A::a; }
diff --git a/gcc/testsuite/g++.dg/modules/inline-1_b.C 
b/gcc/testsuite/g++.dg/modules/inline-1_b.C
new file mode 100644
index 000..af319b16071
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/inline-1_b.C
@@ -0,0 +1,8 @@
+// { dg-additional-options -fmodules }
+import M;
+
+int main()
+{
+  if (f() != 4242)
+__builtin_abort ();
+}

base-commit: 806563f11eb7a677468f0ef40864da6f749b05a8
prerequisite-patch-id: 31e3d047a448e14e52ee30721009e58feb3633e3
prerequisite-patch-id: 020aa4f96c4f50df4217a2c23667256a12b3f149
-- 
2.47.0

Ping^5 [PATCH 0/2] Prime path coverage to gcc/gcov

2024-11-21 Thread Jørgen Kvalsvik


Ping.

On 11/12/24 09:56, Jørgen Kvalsvik wrote:

Ping.

On 10/30/24 13:55, Jørgen Kvalsvik wrote:

Ping.

On 10/21/24 15:21, Jørgen Kvalsvik wrote:

Ping.

On 10/10/24 10:08, Jørgen Kvalsvik wrote:

Ping.

On 10/3/24 12:46, Jørgen Kvalsvik wrote:

This is both a ping and a minor update. A few of the patches from the
previous set have been merged, but the big feature still needs review.

Since then it has been quiet, but there are two notable changes:

1. The --prime-paths-{lines,source} flags take an optional argument to
    print covered or uncovered paths, or both. By default, uncovered
    paths are printed like before.
2. Fixed a bad vector access when independent functions share compiler
    generated statements. A reproducing case is in gcov-23.C which
    relied on printing the uncovered path of multiple destructors of
    static objects.

Jørgen Kvalsvik (2):
   gcov: branch, conds, calls in function summaries
   Add prime path coverage to gcc/gcov

  gcc/Makefile.in    |    6 +-
  gcc/builtins.cc    |    2 +-
  gcc/collect2.cc    |    5 +-
  gcc/common.opt |   16 +
  gcc/doc/gcov.texi  |  184 +++
  gcc/doc/invoke.texi    |   36 +
  gcc/gcc.cc |    4 +-
  gcc/gcov-counter.def   |    3 +
  gcc/gcov-io.h  |    3 +
  gcc/gcov.cc    |  531 ++-
  gcc/ipa-inline.cc  |    2 +-
  gcc/passes.cc  |    4 +-
  gcc/path-coverage.cc   |  782 +
  gcc/prime-paths.cc | 2031 +++
  gcc/profile.cc |    6 +-
  gcc/selftest-run-tests.cc  |    1 +
  gcc/selftest.h |    1 +
  gcc/testsuite/g++.dg/gcov/gcov-22.C    |  170 ++
  gcc/testsuite/g++.dg/gcov/gcov-23-1.h  |    9 +
  gcc/testsuite/g++.dg/gcov/gcov-23-2.h  |    9 +
  gcc/testsuite/g++.dg/gcov/gcov-23.C    |   30 +
  gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
  gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
  gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
  gcc/testsuite/gcc.misc-tests/gcov-33.c |   27 +
  gcc/testsuite/gcc.misc-tests/gcov-34.c |   29 +
  gcc/testsuite/lib/gcov.exp |  118 +-
  gcc/tree-profile.cc    |   11 +-
  29 files changed, 5795 insertions(+), 22 deletions(-)
  create mode 100644 gcc/path-coverage.cc
  create mode 100644 gcc/prime-paths.cc
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-1.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-2.h
  create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23.C
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-33.c
  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-34.c

Re: [PATCH] wwwdocs: Align the DCO text for the GNU Toolchain to match community usage.

2024-11-21 Thread Carlos O'Donell

On 11/21/24 1:47 PM, Sam James wrote:
> Mark Wielaard  writes:
> 
>> Hi Carlos,
>>
>> On Thu, 2024-11-21 at 12:04 -0500, Carlos O'Donell wrote:
>>> Adjust the DCO text to match the broader community usage including
>>> the Linux kernel use around "real names."
>>
>> We made a similar change to switch from "real names" to "known
>> identities" for elfutils a year ago:
>> https://sourceware.org/cgit/elfutils/commit/CONTRIBUTING?id=b770e1c4def3532c7b59c4d2e4cd3cee26d4548b
>>
>> I suggest including the actual clarification in the explanation, so
>> there is no confusion about what is meant by "known identity":
>>
>> diff --git a/htdocs/dco.html b/htdocs/dco.html
>> index 68fa183b9fc0..f4bf17d2a6ec 100644
>> --- a/htdocs/dco.html
>> +++ b/htdocs/dco.html
>> @@ -54,8 +54,10 @@ then you just add a line saying:
>>  
>>  Signed-off-by: Random J Developer 
>> 
>>  
>> -using your real name (sorry, no pseudonyms or anonymous contributions.)  
>> This
>> -will be done for you automatically if you use `git commit -s`.
>> +using a known identity (sorry, no pseudonyms or anonymous contributions.)
>> +The name you use as your identity should not be an anonymous id or
>> +false name that misrepresents who you are.
>> +This will be done for you automatically if you use `git commit -s`.
>>  
>>  Some people also put extra optional tags at the end.  The GCC project 
>> does
>>  not require tags from anyone other than the original author of the patch, 
>> but
>>
>> It looks like the cncf has an almost identical clarification.
> 
> I will note that our DCO in Gentoo was based on the kernel's, and we
> changed ours in April last year accordingly to align with this update
> too.

That's good to know.

Did you include any additional clarifying language?

I'd like to keep the language as simple as possible, and so I do not plan to
make any further changes to my patch.

I'm concerned that terms like "false" or "misrepresent" are context dependent
and may lead to more confusion.

I like that the linux kernel text is succinct.

-- 
Cheers,
Carlos.

Re: [PATCH] wwwdocs: Align the DCO text for the GNU Toolchain to match community usage.

2024-11-21 Thread Mark Wielaard

Hi Carlos,

On Thu, 2024-11-21 at 12:04 -0500, Carlos O'Donell wrote:
> Adjust the DCO text to match the broader community usage including
> the Linux kernel use around "real names."

We made a similar change to switch from "real names" to "known
identities" for elfutils a year ago:
https://sourceware.org/cgit/elfutils/commit/CONTRIBUTING?id=b770e1c4def3532c7b59c4d2e4cd3cee26d4548b

I suggest including the actual clarification in the explanation, so
there is no confusion about what is meant by "known identity":

diff --git a/htdocs/dco.html b/htdocs/dco.html
index 68fa183b9fc0..f4bf17d2a6ec 100644
--- a/htdocs/dco.html
+++ b/htdocs/dco.html
@@ -54,8 +54,10 @@ then you just add a line saying:
 
 Signed-off-by: Random J Developer 

 
-using your real name (sorry, no pseudonyms or anonymous contributions.)  This
-will be done for you automatically if you use `git commit -s`.
+using a known identity (sorry, no pseudonyms or anonymous contributions.)
+The name you use as your identity should not be an anonymous id or
+false name that misrepresents who you are.
+This will be done for you automatically if you use `git commit -s`.
 
 Some people also put extra optional tags at the end.  The GCC project does
 not require tags from anyone other than the original author of the patch, but

It looks like the cncf has an almost identical clarification.

Cheers,

Mark

Re: [PATCH] doc/cpp: Document __has_include_next

2024-11-21 Thread Arsen Arsenović

Joseph Myers  writes:

> On Fri, 18 Oct 2024, Arsen Arsenović wrote:
>
>> -The @code{__has_include} operator by itself, without any @var{operand} or
>> -parentheses, acts as a predefined macro so that support for it can be tested
>> -in portable code.  Thus, the recommended use of the operator is as follows:
>> +The @code{__has_include} and @code{__has_include_next} operators by
>> +themselves, without any @var{operand} or parentheses, acts as a
>> +predefined macro so that support for it can be tested in portable code.
>> +Thus, the recommended use of the operators is as follows:
>
> Some things need updating for the plural: "*act* as predefined *macros* so 
> that support for *them* can be tested" (not "acts", "macro", "it", and 
> remove "a" before "predefined").
>
> OK with those fixes.

Ah, duh.  I missed that in review - fixed.

Pushed with the following text:

  The @code{__has_include} and @code{__has_include_next} operators by
  themselves, without any @var{operand} or parentheses, act as
  predefined macros so that support for them can be tested in portable
  code.  Thus, the recommended use of the operators is as follows:
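
  As a minimal sketch of that pattern (not the manual's own example):
  the operator is probed with defined before it is used, so the code
  still preprocesses where it is missing.  The header name
  "no_such_project_config.h" and the HAVE_* macros are hypothetical.

```cpp
// Probe for the operator itself first, then for individual headers.
#if defined(__has_include)
#  if __has_include(<climits>)
#    include <climits>
#    define HAVE_CLIMITS 1
#  endif
#  if __has_include("no_such_project_config.h")  // hypothetical header
#    include "no_such_project_config.h"
#    define HAVE_PROJECT_CONFIG 1
#  endif
#endif

// Fallbacks when the operator or the header is unavailable.
#ifndef HAVE_CLIMITS
#  define HAVE_CLIMITS 0
#endif
#ifndef HAVE_PROJECT_CONFIG
#  define HAVE_PROJECT_CONFIG 0
#endif

// __has_include_next can be probed the same way; it is mainly useful
// inside a header that wraps another header of the same name.
#if defined(__has_include_next)
#  define HAVE_HAS_INCLUDE_NEXT 1
#else
#  define HAVE_HAS_INCLUDE_NEXT 0
#endif

constexpr bool have_climits = HAVE_CLIMITS != 0;
constexpr bool have_project_config = HAVE_PROJECT_CONFIG != 0;
```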

Thanks, have a lovely night.
-- 
Arsen Arsenović



[PATCH v3] RISC-V: Minimal support for svvptc extension.

2024-11-21 Thread Dongyan Chen

This patch adds minimal support for the svvptc extension [1], so that GCC
recognizes and processes the extension correctly at compile time.

[1] https://github.com/riscv/riscv-svvptc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Ditto.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-44.c: New test.

add svvptc testsuite
---
 gcc/common/config/riscv/riscv-common.cc   | 2 ++
 gcc/common/config/riscv/riscv-ext-bitmask.def | 1 +
 gcc/config/riscv/riscv.opt| 2 ++
 gcc/testsuite/gcc.target/riscv/arch-44.c  | 5 +
 4 files changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-44.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index b0e49eb82c0..199b449a22f 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -405,6 +405,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"svvptc",  ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1720,6 +1721,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
   RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
+  RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),
 
   RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
 
diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def 
b/gcc/common/config/riscv/riscv-ext-bitmask.def
index ca5df1740f3..a733533df98 100644
--- a/gcc/common/config/riscv/riscv-ext-bitmask.def
+++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
@@ -79,5 +79,6 @@ RISCV_EXT_BITMASK ("zcd", 1,  4)
 RISCV_EXT_BITMASK ("zcf",  1,  5)
 RISCV_EXT_BITMASK ("zcmop",1,  6)
 RISCV_EXT_BITMASK ("zawrs",1,  7)
+RISCV_EXT_BITMASK ("svvptc",   1,  8)
 
 #undef RISCV_EXT_BITMASK
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ab9d6e82723..b9e980e30f5 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -466,6 +466,8 @@ Mask(SVINVAL) Var(riscv_sv_subext)
 
 Mask(SVNAPOT) Var(riscv_sv_subext)
 
+Mask(SVVPTC) Var(riscv_sv_subext)
+
 TargetVariable
 int riscv_ztso_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-44.c 
b/gcc/testsuite/gcc.target/riscv/arch-44.c
new file mode 100644
index 000..80dc19a7083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-44.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_svvptc -mabi=lp64" } */
+int foo()
+{
+}
-- 
2.43.0
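
To illustrate what the v3 change amounts to, here is a rough standalone
model of an extension name mapping to a mask bit in a shared variable,
as svvptc now does with riscv_sv_subext.  Everything in it is an
assumption made for the sketch: the mask values are invented (the real
ones are generated from riscv.opt), and ExtFlag, sv_flags, str_eq and
mask_for are hypothetical names rather than GCC interfaces.

```cpp
// Hypothetical mask values sharing one flag word.
enum : unsigned
{
  MASK_SVINVAL = 1u << 0,
  MASK_SVNAPOT = 1u << 1,
  MASK_SVVPTC  = 1u << 2
};

// Hypothetical, simplified stand-in for a flag table entry.
struct ExtFlag
{
  const char *name;
  unsigned mask;
};

constexpr ExtFlag sv_flags[] = {
  {"svinval", MASK_SVINVAL},
  {"svnapot", MASK_SVNAPOT},
  {"svvptc",  MASK_SVVPTC},
};

// Compile-time string comparison, written recursively for C++11.
constexpr bool
str_eq (const char *a, const char *b)
{
  return *a == *b && (*a == '\0' || str_eq (a + 1, b + 1));
}

// Return the mask for NAME, or 0 if the extension is not in the table.
constexpr unsigned
mask_for (const char *name, unsigned i = 0)
{
  return i == sizeof (sv_flags) / sizeof (sv_flags[0]) ? 0u
	 : str_eq (sv_flags[i].name, name) ? sv_flags[i].mask
	 : mask_for (name, i + 1);
}
```

Because all three entries share one flag word, enabling svinval and
svvptc together is just the OR of their masks.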

Re:[pushed] [PATCH] LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned

2024-11-21 Thread Lulu Cheng


Pushed to r15-5580.

We searched the multimedia package and found no uses of
__builtin_lsx_vorn_v or __builtin_lasx_xvorn_v, so the interface type has
been modified in the form of a bugfix.

Thanks!


On 2024/10/31 at 11:58 PM, Xi Ruoyao wrote:

Align them with other vector bitwise builtins.

This may break programs directly invoking __builtin_lsx_vorn_v or
__builtin_lasx_xvorn_v, but doing so is not supported (as builtins are
not documented, only intrinsics are documented and users should use them
instead).

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc (vorn_v, xvorn_v): Use
unsigned vector modes.
* config/loongarch/lsxintrin.h (__lsx_vorn_v): Cast arguments to
v16u8.
* config/loongarch/lasxintrin.h (__lasx_xvorn_v): Cast arguments
to v32u8.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lsx/lsx-builtin.c (__lsx_vorn_v):
Change arguments and return value to v16u8.
* gcc.target/loongarch/vector/lasx/lasx-builtin.c
(__lasx_xvorn_v): Change arguments and return value to v32u8.
---

Now running bootstrap & regtest.  Posted early as a context for some
LLVM patch.  I'll post the regtest result once it finishes.

  gcc/config/loongarch/lasxintrin.h | 4 ++--
  gcc/config/loongarch/loongarch-builtins.cc| 4 ++--
  gcc/config/loongarch/lsxintrin.h  | 4 ++--
  gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c | 4 ++--
  gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-builtin.c   | 4 ++--
  5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/loongarch/lasxintrin.h 
b/gcc/config/loongarch/lasxintrin.h
index 16b21455d81..82ab625c58c 100644
--- a/gcc/config/loongarch/lasxintrin.h
+++ b/gcc/config/loongarch/lasxintrin.h
@@ -3564,11 +3564,11 @@ __m256i __lasx_xvssrln_w_d (__m256i _1, __m256i _2)
  }
  
  /* Assembly instruction format:	xd, xj, xk.  */

-/* Data types in instruction templates:  V32QI, V32QI, V32QI.  */
+/* Data types in instruction templates:  UV32QI, UV32QI, UV32QI.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m256i __lasx_xvorn_v (__m256i _1, __m256i _2)
  {
-  return (__m256i)__builtin_lasx_xvorn_v ((v32i8)_1, (v32i8)_2);
+  return (__m256i)__builtin_lasx_xvorn_v ((v32u8)_1, (v32u8)_2);
  }
  
  /* Assembly instruction format:	xd, i13.  */

diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index 7d13f2f3aab..b25ae26fb8a 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -1629,7 +1629,7 @@ static const struct loongarch_builtin_description 
loongarch_builtins[] = {
LSX_BUILTIN (vssrln_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI),
LSX_BUILTIN (vssrln_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI),
LSX_BUILTIN (vssrln_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI),
-  LSX_BUILTIN (vorn_v, LARCH_V16QI_FTYPE_V16QI_V16QI),
+  LSX_BUILTIN (vorn_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI),
LSX_BUILTIN (vldi, LARCH_V2DI_FTYPE_HI),
LSX_BUILTIN (vshuf_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI),
LSX_BUILTIN (vldx, LARCH_V16QI_FTYPE_CVPOINTER_DI),
@@ -2179,7 +2179,7 @@ static const struct loongarch_builtin_description 
loongarch_builtins[] = {
LASX_BUILTIN (xvssrln_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI),
LASX_BUILTIN (xvssrln_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI),
LASX_BUILTIN (xvssrln_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI),
-  LASX_BUILTIN (xvorn_v, LARCH_V32QI_FTYPE_V32QI_V32QI),
+  LASX_BUILTIN (xvorn_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI),
LASX_BUILTIN (xvldi, LARCH_V4DI_FTYPE_HI),
LASX_BUILTIN (xvldx, LARCH_V32QI_FTYPE_CVPOINTER_DI),
LASX_NO_TARGET_BUILTIN (xvstx, LARCH_VOID_FTYPE_V32QI_CVPOINTER_DI),
diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h
index 9ab8269db70..0f47b5929d1 100644
--- a/gcc/config/loongarch/lsxintrin.h
+++ b/gcc/config/loongarch/lsxintrin.h
@@ -4745,11 +4745,11 @@ __m128i __lsx_vssrln_w_d (__m128i _1, __m128i _2)
  }
  
  /* Assembly instruction format:	vd, vj, vk.  */

-/* Data types in instruction templates:  V16QI, V16QI, V16QI.  */
+/* Data types in instruction templates:  UV16QI, UV16QI, UV16QI.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  __m128i __lsx_vorn_v (__m128i _1, __m128i _2)
  {
-  return (__m128i)__builtin_lsx_vorn_v ((v16i8)_1, (v16i8)_2);
+  return (__m128i)__builtin_lsx_vorn_v ((v16u8)_1, (v16u8)_2);
  }
  
  /* Assembly instruction format:	vd, i13.  */

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c
index b1a903b4a2b..64ff870a4c5 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c
@@ -3178,8 +3178,8 @@ __lasx_xvssrln_w_d (v4i64 _1, v4i64 _2)
  {
return

[PATCH] libsanitizer: Move language level from gnu++14 to gnu++17

2024-11-21 Thread Andrew Pinski

While compiling libsanitizer for aarch64-linux-gnu, I noticed the new warning:
```
../../../../libsanitizer/asan/asan_interceptors.cpp: In function ‘char* 
___interceptor_strcpy(char*, const char*)’:
../../../../libsanitizer/asan/asan_interceptors.cpp:554:6: warning: ‘if 
constexpr’ only available with ‘-std=c++17’ or ‘-std=gnu++17’ 
[-Wc++17-extensions]
  554 |   if constexpr (SANITIZER_APPLE) {
  |  ^
```
compiler-rt upstream compiles this as gnu++17 (the current default for
clang), so let's update libsanitizer to match.

Built and tested on aarch64-linux-gnu.
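
For reference, the construct that triggers the warning can be reduced to
a few lines.  This is a sketch rather than the libsanitizer code:
kSanitizerApple is a hypothetical stand-in for SANITIZER_APPLE and
strcpy_path is an invented function.  It compiles cleanly with
-std=gnu++17 and draws the -Wc++17-extensions warning with -std=gnu++14.

```cpp
// Hypothetical stand-in for the SANITIZER_APPLE platform macro.
constexpr bool kSanitizerApple = false;

constexpr int
strcpy_path ()
{
  // The condition is evaluated at compile time (C++17 feature).
  if constexpr (kSanitizerApple)
    {
      return 1;  // Apple-only path, not taken here.
    }
  else
    {
      return 2;  // Generic path.
    }
}
```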

PR sanitizer/117731
libsanitizer/ChangeLog:

* asan/Makefile.am: Replace gnu++14 with gnu++17.
* asan/Makefile.in: Regenerate.
* hwasan/Makefile.am: Replace gnu++14 with gnu++17.
* hwasan/Makefile.in: Regenerate.
* interception/Makefile.am: Replace gnu++14 with gnu++17.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.am: Replace gnu++14 with gnu++17.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.am: Replace gnu++14 with gnu++17.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.am: Replace gnu++14 with gnu++17.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.am: Replace gnu++14 with gnu++17.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.am: Replace gnu++14 with gnu++17.
* ubsan/Makefile.in: Regenerate.

Signed-off-by: Andrew Pinski 
---
 libsanitizer/asan/Makefile.am | 2 +-
 libsanitizer/asan/Makefile.in | 2 +-
 libsanitizer/hwasan/Makefile.am   | 2 +-
 libsanitizer/hwasan/Makefile.in   | 2 +-
 libsanitizer/interception/Makefile.am | 2 +-
 libsanitizer/interception/Makefile.in | 2 +-
 libsanitizer/libbacktrace/Makefile.am | 2 +-
 libsanitizer/libbacktrace/Makefile.in | 2 +-
 libsanitizer/lsan/Makefile.am | 2 +-
 libsanitizer/lsan/Makefile.in | 2 +-
 libsanitizer/sanitizer_common/Makefile.am | 2 +-
 libsanitizer/sanitizer_common/Makefile.in | 2 +-
 libsanitizer/tsan/Makefile.am | 2 +-
 libsanitizer/tsan/Makefile.in | 2 +-
 libsanitizer/ubsan/Makefile.am| 2 +-
 libsanitizer/ubsan/Makefile.in| 2 +-
 16 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/libsanitizer/asan/Makefile.am b/libsanitizer/asan/Makefile.am
index 5c77b3b2c67..5074673e028 100644
--- a/libsanitizer/asan/Makefile.am
+++ b/libsanitizer/asan/Makefile.am
@@ -9,7 +9,7 @@ DEFS += -DMAC_INTERPOSE_FUNCTIONS -DMISSING_BLOCKS_SUPPORT
 endif
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic 
-Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fno-rtti 
-fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros 
-fno-ipa-icf
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
-AM_CXXFLAGS += -std=gnu++14
+AM_CXXFLAGS += -std=gnu++17
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
 AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
diff --git a/libsanitizer/asan/Makefile.in b/libsanitizer/asan/Makefile.in
index ef4b65dda97..c30e9d5f84c 100644
--- a/libsanitizer/asan/Makefile.in
+++ b/libsanitizer/asan/Makefile.in
@@ -422,7 +422,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter 
-Wwrite-strings -pedantic \
-Wno-long-long -fPIC -fno-builtin -fno-exceptions -fno-rtti \
-fomit-frame-pointer -funwind-tables -fvisibility=hidden \
-Wno-variadic-macros -fno-ipa-icf \
-   $(LIBSTDCXX_RAW_CXX_CXXFLAGS) -std=gnu++14 $(EXTRA_CXXFLAGS)
+   $(LIBSTDCXX_RAW_CXX_CXXFLAGS) -std=gnu++17 $(EXTRA_CXXFLAGS)
 AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
 toolexeclib_LTLIBRARIES = libasan.la
diff --git a/libsanitizer/hwasan/Makefile.am b/libsanitizer/hwasan/Makefile.am
index 653fc8c4720..deef9f8e1db 100644
--- a/libsanitizer/hwasan/Makefile.am
+++ b/libsanitizer/hwasan/Makefile.am
@@ -6,7 +6,7 @@ gcc_version := $(shell @get_gcc_base_ver@ 
$(top_srcdir)/../gcc/BASE-VER)
 DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS 
-D__STDC_LIMIT_MACROS -DCAN_SANITIZE_UB=0 -DHWASAN_WITH_INTERCEPTORS=1
 AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic 
-Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fno-rtti -funwind-tables 
-fvisibility=hidden -Wno-variadic-macros -fno-ipa-icf
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
-AM_CXXFLAGS += -std=gnu++14
+AM_CXXFLAGS += -std=gnu++17
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
 AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
diff --git a/libsanitizer/hwasan/Makefile.in b/libsanitizer/hwasan/Makefile.in
index 4420bd6a7a9..ae4ba7c14fc 100644
--- a/libsanitizer/hwasan/Makefile.in
+++ b/libsanitizer/hwasan/Makefile.in
@@ -414,7 +414,7 @@ gcc_version := $(shell @get_gcc_base_ver@ 
$(top_srcdir)/../gcc/BASE-VER)
 AM_CXXFLAGS = -

Re: [PATCH v3] RISC-V: Minimal support for svvptc extension.

2024-11-21 Thread Kito Cheng

LGTM, will commit to trunk once I pass the local test :)

On Fri, Nov 22, 2024 at 1:15 PM Dongyan Chen
 wrote:
>
> This patch supports the svvptc extension[1], enabling GCC to recognize and
> process the extension correctly at compile time.
>
> [1] https://github.com/riscv/riscv-svvptc
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: New extension.
> * common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): 
> Ditto.
> * config/riscv/riscv.opt: New mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/arch-44.c: New test.
>
> add svvptc testsuite
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 2 ++
>  gcc/common/config/riscv/riscv-ext-bitmask.def | 1 +
>  gcc/config/riscv/riscv.opt| 2 ++
>  gcc/testsuite/gcc.target/riscv/arch-44.c  | 5 +
>  4 files changed, 10 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-44.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index b0e49eb82c0..199b449a22f 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -405,6 +405,7 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
>{"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
>{"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"svvptc",  ISA_SPEC_CLASS_NONE, 1, 0},
>
>{"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
> @@ -1720,6 +1721,7 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>
>RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
>RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
> +  RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),
>
>RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
>
> diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def 
> b/gcc/common/config/riscv/riscv-ext-bitmask.def
> index ca5df1740f3..a733533df98 100644
> --- a/gcc/common/config/riscv/riscv-ext-bitmask.def
> +++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
> @@ -79,5 +79,6 @@ RISCV_EXT_BITMASK ("zcd", 1,  4)
>  RISCV_EXT_BITMASK ("zcf",  1,  5)
>  RISCV_EXT_BITMASK ("zcmop",1,  6)
>  RISCV_EXT_BITMASK ("zawrs",1,  7)
> +RISCV_EXT_BITMASK ("svvptc",   1,  8)
>
>  #undef RISCV_EXT_BITMASK
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index ab9d6e82723..b9e980e30f5 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -466,6 +466,8 @@ Mask(SVINVAL) Var(riscv_sv_subext)
>
>  Mask(SVNAPOT) Var(riscv_sv_subext)
>
> +Mask(SVVPTC) Var(riscv_sv_subext)
> +
>  TargetVariable
>  int riscv_ztso_subext
>
> diff --git a/gcc/testsuite/gcc.target/riscv/arch-44.c 
> b/gcc/testsuite/gcc.target/riscv/arch-44.c
> new file mode 100644
> index 000..80dc19a7083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/arch-44.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_svvptc -mabi=lp64" } */
> +int foo()
> +{
> +}
> --
> 2.43.0
>

Re: [pushed] [PATCH] LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned

2024-11-21 Thread Lulu Cheng


Pushed to r14-10960.


On 2024/11/22 9:52 AM, Lulu Cheng wrote:

Pushed to r15-5580.

We searched the multimedia package and found no uses of
__builtin_lsx_vorn_v or __builtin_lasx_xvorn_v, so the interface type has
been changed as a bugfix.

Thanks!


On 2024/10/31 11:58 PM, Xi Ruoyao wrote:

Align them with other vector bitwise builtins.

This may break programs directly invoking __builtin_lsx_vorn_v or
__builtin_lasx_xvorn_v, but doing so is not supported (as builtins are
not documented, only intrinsics are documented and users should use them
instead).

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc (vorn_v, xvorn_v): Use
unsigned vector modes.
* config/loongarch/lsxintrin.h (__lsx_vorn_v): Cast arguments to
v16u8.
* config/loongarch/lasxintrin.h (__lasx_xvorn_v): Cast arguments
to v32u8.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lsx/lsx-builtin.c (__lsx_vorn_v):
Change arguments and return value to v16u8.
* gcc.target/loongarch/vector/lasx/lasx-builtin.c
(__lasx_xvorn_v): Change arguments and return value to v32u8.
---

Now running bootstrap & regtest.  Posted early as a context for some
LLVM patch.  I'll post the regtest result once it finishes.

  gcc/config/loongarch/lasxintrin.h | 4 ++--
  gcc/config/loongarch/loongarch-builtins.cc | 4 ++--
  gcc/config/loongarch/lsxintrin.h | 4 ++--
  gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c | 4 ++--
  gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-builtin.c | 4 ++--
  5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/loongarch/lasxintrin.h 
b/gcc/config/loongarch/lasxintrin.h

index 16b21455d81..82ab625c58c 100644
--- a/gcc/config/loongarch/lasxintrin.h
+++ b/gcc/config/loongarch/lasxintrin.h
@@ -3564,11 +3564,11 @@ __m256i __lasx_xvssrln_w_d (__m256i _1, 
__m256i _2)

  }
    /* Assembly instruction format:    xd, xj, xk.  */
-/* Data types in instruction templates:  V32QI, V32QI, V32QI. */
+/* Data types in instruction templates:  UV32QI, UV32QI, UV32QI.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))

  __m256i __lasx_xvorn_v (__m256i _1, __m256i _2)
  {
-  return (__m256i)__builtin_lasx_xvorn_v ((v32i8)_1, (v32i8)_2);
+  return (__m256i)__builtin_lasx_xvorn_v ((v32u8)_1, (v32u8)_2);
  }
    /* Assembly instruction format:    xd, i13.  */
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc

index 7d13f2f3aab..b25ae26fb8a 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -1629,7 +1629,7 @@ static const struct 
loongarch_builtin_description loongarch_builtins[] = {

    LSX_BUILTIN (vssrln_b_h, LARCH_V16QI_FTYPE_V8HI_V8HI),
    LSX_BUILTIN (vssrln_h_w, LARCH_V8HI_FTYPE_V4SI_V4SI),
    LSX_BUILTIN (vssrln_w_d, LARCH_V4SI_FTYPE_V2DI_V2DI),
-  LSX_BUILTIN (vorn_v, LARCH_V16QI_FTYPE_V16QI_V16QI),
+  LSX_BUILTIN (vorn_v, LARCH_UV16QI_FTYPE_UV16QI_UV16QI),
    LSX_BUILTIN (vldi, LARCH_V2DI_FTYPE_HI),
    LSX_BUILTIN (vshuf_b, LARCH_V16QI_FTYPE_V16QI_V16QI_V16QI),
    LSX_BUILTIN (vldx, LARCH_V16QI_FTYPE_CVPOINTER_DI),
@@ -2179,7 +2179,7 @@ static const struct 
loongarch_builtin_description loongarch_builtins[] = {

    LASX_BUILTIN (xvssrln_b_h, LARCH_V32QI_FTYPE_V16HI_V16HI),
    LASX_BUILTIN (xvssrln_h_w, LARCH_V16HI_FTYPE_V8SI_V8SI),
    LASX_BUILTIN (xvssrln_w_d, LARCH_V8SI_FTYPE_V4DI_V4DI),
-  LASX_BUILTIN (xvorn_v, LARCH_V32QI_FTYPE_V32QI_V32QI),
+  LASX_BUILTIN (xvorn_v, LARCH_UV32QI_FTYPE_UV32QI_UV32QI),
    LASX_BUILTIN (xvldi, LARCH_V4DI_FTYPE_HI),
    LASX_BUILTIN (xvldx, LARCH_V32QI_FTYPE_CVPOINTER_DI),
    LASX_NO_TARGET_BUILTIN (xvstx, LARCH_VOID_FTYPE_V32QI_CVPOINTER_DI),
diff --git a/gcc/config/loongarch/lsxintrin.h 
b/gcc/config/loongarch/lsxintrin.h

index 9ab8269db70..0f47b5929d1 100644
--- a/gcc/config/loongarch/lsxintrin.h
+++ b/gcc/config/loongarch/lsxintrin.h
@@ -4745,11 +4745,11 @@ __m128i __lsx_vssrln_w_d (__m128i _1, __m128i 
_2)

  }
    /* Assembly instruction format:    vd, vj, vk.  */
-/* Data types in instruction templates:  V16QI, V16QI, V16QI. */
+/* Data types in instruction templates:  UV16QI, UV16QI, UV16QI.  */
  extern __inline __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))

  __m128i __lsx_vorn_v (__m128i _1, __m128i _2)
  {
-  return (__m128i)__builtin_lsx_vorn_v ((v16i8)_1, (v16i8)_2);
+  return (__m128i)__builtin_lsx_vorn_v ((v16u8)_1, (v16u8)_2);
  }
    /* Assembly instruction format:    vd, i13.  */
diff --git 
a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c

index b1a903b4a2b..64ff870a4c5 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-builtin.c
@@ -3178,8 +3178,8 @@ __lasx_xvssrln_w_d (v4i64 _1, v4i64 _2)
  {
    return __builtin_lasx_xvssrln_w_d (_1, _2);
  }

Re:[pushed] [PATCH] LoongArch: Fix clerical errors in lasx_xvreplgr2vr_* and lsx_vreplgr2vr_*.

2024-11-21 Thread Lulu Cheng


Pushed to r15-5581 and r14-10961.

On 2024/11/2 3:37 PM, Lulu Cheng wrote:

[x]vldi.{b/h/w/d} is not implemented in LoongArch.
Use the [x]vrepli.{b/h/w/d} macro instead.

gcc/ChangeLog:

* config/loongarch/lasx.md: Fixed.
* config/loongarch/lsx.md: Fixed.
---
  gcc/config/loongarch/lasx.md | 2 +-
  gcc/config/loongarch/lsx.md  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index d37b2e83c21..457ed163f31 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1402,7 +1402,7 @@ (define_insn "lasx_xvreplgr2vr_"
"ISA_HAS_LASX"
  {
if (which_alternative == 1)
-return "xvldi.b\t%u0,0" ;
+return "xvrepli.b\t%u0,0";
  
return "xvreplgr2vr.\t%u0,%z1";

  }
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index fcba28b0751..a9004290371 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1275,7 +1275,7 @@ (define_insn "lsx_vreplgr2vr_"
"ISA_HAS_LSX"
  {
if (which_alternative == 1)
-return "vldi.\t%w0,0";
+return "vrepli.b\t%w0,0";
  
return "vreplgr2vr.\t%w0,%z1";

  }

[PATCH 0/1] Add ACLE macro _CHKFEAT_GCS

2024-11-21 Thread Yury Khrustalev

Add ACLE macro _CHKFEAT_GCS for AArch64.

Regression tested on aarch64-none-linux-gnu and no regressions have
been found. Is it OK for trunk?

Applies to dbc38dd9e96.

I don't have commit access so I need someone to commit on my behalf.

---

Yury Khrustalev (1):
  aarch64: add ACLE macro _CHKFEAT_GCS

 gcc/config/aarch64/arm_acle.h  | 3 +++
 libgcc/config/aarch64/aarch64-unwind.h | 6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.39.5

[PATCH] testsuite: arm: Use -mtune=cortex-m4 for thumb-ifcvt.c test

2024-11-21 Thread Torbjörn SVENSSON

Ok for trunk and releases/gcc-14?

--

On Cortex-M4, the code generated is:
 cmp     r0, r1
 itte    ne
 lslne   r0, r0, r1
 asrne   r0, r0, #1
 moveq   r0, r1
 add     r0, r0, r1
 bx      lr

On Cortex-M7, the code generated is:
 cmp     r0, r1
 beq     .L3
 lsls    r0, r0, r1
 asrs    r0, r0, #1
 add     r0, r0, r1
 bx      lr
.L3:
 mov     r0, r1
 add     r0, r0, r1
 bx      lr

As Cortex-M7 only allows a maximum of one conditional instruction, force
Cortex-M4 tuning to get a stable test case.
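
For reference, a C function consistent with both listings above might look
like the following.  This is a reconstruction from the assembly, not a copy
of thumb-ifcvt.c, and the name shift_if_differs is hypothetical.

```c
/* Hypothetical reconstruction from the assembly listings; not the
   actual contents of thumb-ifcvt.c.  */
int
shift_if_differs (int a, int b)
{
  /* Both statements in the body only run when a != b, which is what
     the "ne" instructions in the Cortex-M4 listing encode.  */
  if (a != b)
    {
      a = a << b;
      a = a >> 1;
    }
  /* When a == b, a already holds the value of b, matching the
     "moveq r0, r1" / "mov r0, r1" instructions.  */
  return a + b;
}
```

With Cortex-M4 tuning the two conditional statements can be if-converted
into one conditional sequence, whereas with Cortex-M7 tuning the listing
shows that the branch is kept.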

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb-ifcvt.c: Use -mtune=cortex-m4.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/thumb-ifcvt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/thumb-ifcvt.c 
b/gcc/testsuite/gcc.target/arm/thumb-ifcvt.c
index 02e56f53b0d..c7786faae76 100644
--- a/gcc/testsuite/gcc.target/arm/thumb-ifcvt.c
+++ b/gcc/testsuite/gcc.target/arm/thumb-ifcvt.c
@@ -1,7 +1,7 @@
 /* Check that Thumb 16-bit shifts can be if-converted.  */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb2_ok } */
-/* { dg-options "-O2 -mthumb -mno-restrict-it" } */
+/* { dg-options "-O2 -mthumb -mtune=cortex-m4 -mno-restrict-it" } */
 
 int
 foo (int a, int b)
-- 
2.25.1

[PATCH] diagnostics: UX: add doc URLs for attributes (v2)

2024-11-21 Thread David Malcolm

This is v2 of the patch; v1 was here:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html

Changed in v2:
* added a new TARGET_DOCUMENTATION_NAME hook for figuring out which
  documentation URL to use when there are multiple per-target docs,
  such as for __attribute__((interrupt)); implemented this for all
  targets that have target-specific attributes
* moved attribute_urlifier and its support code to a new
  gcc-attribute-urlifier.cc since it needs to use targetm for the
  above; gcc-urlifier.o is used by the driver.
* fixed extend.texi so that some attributes that failed to appear in
  attr-urls.def now do so (affected nvptx "kernel" and "shared" attrs)
* regenerated attr-urls.def for the above fix, and bringing in
  attributes added since v1 of the patch

In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
documentation URLs to quoted strings in diagnostics.
In r14-6920-g9e49746da303b8 I added a mechanism to generate URLs for
mentions of command-line options in quoted strings in diagnostics.

This patch does a similar thing for attributes.  It adds a new Python 3
script to scrape the generated HTML looking for documentation of
attributes, and uses this to (re)generate a new gcc/attr-urls.def file.

Running "make regenerate-attr-urls" after rebuilding the HTML docs will
regenerate gcc/attr-urls.def in the source directory.

The patch uses this to optionally add doc URLs for attributes in any
diagnostic emitted during the lifetime of an auto_urlify_attributes
instance, and adds such instances everywhere that a diagnostic refers
to an attribute within quotes (based on grepping the source tree
for references to attributes in strings and in code).

For example, given:

$ ./xgcc -B. -S ../../src/gcc/testsuite/gcc.dg/attr-access-2.c
../../src/gcc/testsuite/gcc.dg/attr-access-2.c:14:16: warning:
attribute ‘access(read_write, 2, 3)’ positional argument 2 conflicts
with previous designation by argument 1 [-Wattributes]

with this patch the quoted text `access(read_write, 2, 3)'
automatically gains the URL for our docs for "access":
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute
in a sufficiently modern terminal.
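
Terminal hyperlinks of this kind are commonly encoded with the OSC 8 escape
sequence.  The following is a minimal sketch under that assumption; the
helper name format_osc8_link is hypothetical and this is not the code GCC
itself uses to emit URLs.

```c
#include <stdio.h>

/* Write TEXT wrapped in an OSC 8 hyperlink pointing at URL into BUF.
   Returns the number of characters snprintf would have written.
   Hypothetical helper for illustration only.  */
int
format_osc8_link (char *buf, size_t size, const char *url, const char *text)
{
  /* "ESC ] 8 ; ; URL ST" opens the link and "ESC ] 8 ; ; ST" closes it,
     where ST (string terminator) is "ESC \".  */
  return snprintf (buf, size, "\033]8;;%s\033\\%s\033]8;;\033\\", url, text);
}
```

A terminal that understands the sequence shows only the text and makes it
clickable; one that does not should ignore the escape and show plain text.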

Like r14-6920-g9e49746da303b8 this avoids the Makefile target
depending on the generated HTML, since a missing URL is a minor
problem, whereas requiring all users to build HTML docs seems more
involved.  Doing so also avoids Python 3 as a build requirement for
everyone, but instead just for developers adding attributes.
Like the options, we could add a CI test for this.

The patch gathers both general and target-specific attributes.
For example, the function attribute "interrupt" has 19 URLs within our
docs: one common, and 18 target-specific ones.
The patch adds a new target hook used when selecting the most
appropriate one.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?

Signed-off-by: David Malcolm 

gcc/ChangeLog:
* Makefile.in (OBJS): Add gcc-attribute-urlifier.o.
(ATTR_URLS_HTML_DEPS): New.
(regenerate-attr-urls): New.
(regenerate-attr-urls-unit-test): New.
* attr-urls.def: New file.
* attribs.cc: Include "gcc-urlifier.h".
(decl_attributes): Use auto_urlify_attributes.
* config/aarch64/aarch64.cc (TARGET_DOCUMENTATION_NAME): New.
* config/arc/arc.cc (TARGET_DOCUMENTATION_NAME): New.
* config/arm/arm.cc (TARGET_DOCUMENTATION_NAME): New.
* config/bfin/bfin.cc (TARGET_DOCUMENTATION_NAME): New.
* config/bpf/bpf.cc (TARGET_DOCUMENTATION_NAME): New.
* config/epiphany/epiphany.cc (TARGET_DOCUMENTATION_NAME): New.
* config/gcn/gcn.cc (TARGET_DOCUMENTATION_NAME): New.
* config/h8300/h8300.cc (TARGET_DOCUMENTATION_NAME): New.
* config/i386/i386.cc (TARGET_DOCUMENTATION_NAME): New.
* config/ia64/ia64.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m32c/m32c.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m32r/m32r.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m68k/m68k.cc (TARGET_DOCUMENTATION_NAME): New.
* config/mcore/mcore.cc (TARGET_DOCUMENTATION_NAME): New.
* config/microblaze/microblaze.cc (TARGET_DOCUMENTATION_NAME):
New.
* config/mips/mips.cc (TARGET_DOCUMENTATION_NAME): New.
* config/msp430/msp430.cc (TARGET_DOCUMENTATION_NAME): New.
* config/nds32/nds32.cc (TARGET_DOCUMENTATION_NAME): New.
* config/nvptx/nvptx.cc (TARGET_DOCUMENTATION_NAME): New.
* config/riscv/riscv.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rl78/rl78.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rs6000/rs6000.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rx/rx.cc (TARGET_DOCUMENTATION_NAME): New.
* config/s390/s390.cc (TARGET_DOCUMENTATION_NAME): New.
* config/sh/sh.cc (TARGET_DOCUMENTATION_NAME): New.
* config/stormy16/stormy16.cc (TARGET_DOCUMENTATION

[PATCH] testsuite: Fix up vector-{8,9,10}.c tests

2024-11-21 Thread Jakub Jelinek

On Thu, Nov 21, 2024 at 01:30:39PM +0100, Christoph Müllner wrote:
> > >   * gcc.dg/tree-ssa/satd-hadamard.c: New test.
> > >   * gcc.dg/tree-ssa/vector-10.c: New test.
> > >   * gcc.dg/tree-ssa/vector-8.c: New test.
> > >   * gcc.dg/tree-ssa/vector-9.c: New test.

I see FAILs on i686-linux or on x86_64-linux (in the latter
with -m32 testing).

One problem is that vector-10.c doesn't use the -Wno-psabi option even
though it has a function which returns a vector and takes a vector as
its first parameter.  The other problems are that the 3 other tests
don't arrange for at least basic vector ISA support, and that they
non-standardly test only on x86_64-*-*, while normally one would allow
both i?86-*-* and x86_64-*-* and, if the test is e.g. specific to
64-bit, also check for lp64 or int128 or whatever else is needed.
E.g. Solaris I think has the i?86-*-* triplet even for 64-bit code.

The following patch fixes these.
Tested on x86_64-linux with
make check-gcc RUNTESTFLAGS="--target_board=unix\{-m32,-m32/-mno-mmx/-mno-sse,-m64\} tree-ssa.exp='vector-*.c satd-hadamard.c'"
Ok for trunk?

2024-11-22  Jakub Jelinek  

* gcc.dg/tree-ssa/satd-hadamard.c: Add -msse2 as dg-additional-options
on x86.  Also scan-tree-dump on i?86-*-*.
* gcc.dg/tree-ssa/vector-8.c: Likewise.
* gcc.dg/tree-ssa/vector-9.c: Likewise.
* gcc.dg/tree-ssa/vector-10.c: Add -Wno-psabi to dg-additional-options.

--- gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c.jj2024-11-22 
00:06:56.341057153 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/satd-hadamard.c   2024-11-22 
00:17:38.539656767 +0100
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-O3 -fdump-tree-forwprop4-details" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 #include 
 
@@ -40,4 +41,4 @@ x264_pixel_satd_8x4_simplified (uint8_t
   return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
 }
 
-/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop4" { 
target { aarch64*-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop4" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
--- gcc/testsuite/gcc.dg/tree-ssa/vector-8.c.jj 2024-11-22 00:06:56.341057153 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/vector-8.c2024-11-22 00:16:51.047298247 
+0100
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int))));
 
@@ -30,5 +31,5 @@ void f (vec *p_v_in_1, vec *p_v_in_2, ve
   *p_v_out_2 = v_out_2;
 }
 
-/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
-/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
target { aarch64*-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
--- gcc/testsuite/gcc.dg/tree-ssa/vector-9.c.jj 2024-11-22 00:06:56.341057153 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/vector-9.c2024-11-22 00:17:16.050960523 
+0100
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int))));
 
@@ -30,5 +31,5 @@ void f (vec *p_v_in_1, vec *p_v_in_2, ve
   *p_v_out_2 = v_out_2;
 }
 
-/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* x86_64-*-* } } } } */
-/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
target { aarch64*-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "Vec perm simplify sequences have been blended" 
"forwprop1" { target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 3, 6, 7 }" "forwprop1" { 
target { aarch64*-*-* i?86-*-* x86_64-*-* } } } } */
--- gcc/testsuite/gcc.dg/tree-ssa/vector-10.c.jj2024-11-22 
00:06:56.341057153 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/vector-10.c   2024-11-22 00:15:28.875406026 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details" } */
+/* { dg-additional-options "-O3 -fdump-tree-forwprop1-details -Wno-psabi" } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int))));
 

Jakub

[PATCH 1/3] aarch64: Fix up flags for vget_low_, vget_high_ and vreinterpret intrinsics

2024-11-21 Thread Andrew Pinski

These 3 intrinsics will not raise an fp exception or read FPCR.  They are
folded into a VIEW_CONVERT_EXPR or a BIT_FIELD_REF, both of which are already
treated as const expressions.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VREINTERPRET_BUILTIN): Use
FLAG_NONE instead of FLAG_AUTO_FP.
(VGET_LOW_BUILTIN): Likewise.
(VGET_HIGH_BUILTIN): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-builtins.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index e26ee323a2d..04ae16a0c76 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -911,7 +911,7 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
2, \
{ SIMD_INTR_MODE(A, L), SIMD_INTR_MODE(B, L) }, \
{ SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(B) }, \
-   FLAG_AUTO_FP, \
+   FLAG_NONE, \
SIMD_INTR_MODE(A, L) == SIMD_INTR_MODE(B, L) \
  && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
   },
@@ -923,7 +923,7 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
2, \
{ SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
{ SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
-   FLAG_AUTO_FP, \
+   FLAG_NONE, \
false \
   },
 
@@ -934,7 +934,7 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
2, \
{ SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
{ SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
-   FLAG_AUTO_FP, \
+   FLAG_NONE, \
false \
   },
 
-- 
2.43.0

[PATCH 3/3] aarch64: Add attributes to the data intrinsics.

2024-11-21 Thread Andrew Pinski

None of the data intrinsics read or write memory, nor are they fp related,
so adding the attributes will improve code generation slightly.
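
To illustrate why these operations qualify, here is a portable C model of
what a 32-bit bit reversal and a 32-bit rev16 compute.  It is only a sketch:
the names model_rbit32 and model_rev16_32 are hypothetical and are not the
builtins touched by this patch.

```c
#include <stdint.h>

/* Portable model of a 32-bit bit reversal: bit 0 becomes bit 31,
   bit 1 becomes bit 30, and so on.  */
uint32_t
model_rbit32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

/* Portable model of rev16 on 32 bits: swap the two bytes inside each
   16-bit halfword.  */
uint32_t
model_rev16_32 (uint32_t x)
{
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}
```

In both cases the result depends only on the argument, so a compiler that
knows this may merge repeated calls with the same argument or drop calls
whose result is unused.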

Built and tested for aarch64-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_init_data_intrinsics): 
Call
aarch64_get_attributes and update calls to aarch64_general_add_builtin.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-builtins.cc | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 9705f2de090..bc1719adbaa 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2162,6 +2162,8 @@ aarch64_init_ls64_builtins (void)
 static void
 aarch64_init_data_intrinsics (void)
 {
+  /* These intrinsics are not fp nor they read/write memory. */
+  tree attrs = aarch64_get_attributes (FLAG_NONE, SImode);
   tree uint32_fntype = build_function_type_list (uint32_type_node,
 uint32_type_node, NULL_TREE);
   tree ulong_fntype = build_function_type_list (long_unsigned_type_node,
@@ -2171,22 +2173,22 @@ aarch64_init_data_intrinsics (void)
 uint64_type_node, NULL_TREE);
   aarch64_builtin_decls[AARCH64_REV16]
 = aarch64_general_add_builtin ("__builtin_aarch64_rev16", uint32_fntype,
-  AARCH64_REV16);
+  AARCH64_REV16, attrs);
   aarch64_builtin_decls[AARCH64_REV16L]
 = aarch64_general_add_builtin ("__builtin_aarch64_rev16l", ulong_fntype,
-  AARCH64_REV16L);
+  AARCH64_REV16L, attrs);
   aarch64_builtin_decls[AARCH64_REV16LL]
 = aarch64_general_add_builtin ("__builtin_aarch64_rev16ll", uint64_fntype,
-  AARCH64_REV16LL);
+  AARCH64_REV16LL, attrs);
   aarch64_builtin_decls[AARCH64_RBIT]
 = aarch64_general_add_builtin ("__builtin_aarch64_rbit", uint32_fntype,
-  AARCH64_RBIT);
+  AARCH64_RBIT, attrs);
   aarch64_builtin_decls[AARCH64_RBITL]
 = aarch64_general_add_builtin ("__builtin_aarch64_rbitl", ulong_fntype,
-  AARCH64_RBITL);
+  AARCH64_RBITL, attrs);
   aarch64_builtin_decls[AARCH64_RBITLL]
 = aarch64_general_add_builtin ("__builtin_aarch64_rbitll", uint64_fntype,
-  AARCH64_RBITLL);
+  AARCH64_RBITLL, attrs);
 }
 
 /* Implement #pragma GCC aarch64 "arm_acle.h".  */
-- 
2.43.0

[PATCH] testsuite: arm: Fix build error for thumb2-slow-flash-data-3.c test

2024-11-21 Thread Torbjörn SVENSSON

I'm not sure how to verify that adding the parameter won't destroy the test.
I've tried to reproduce the ICE on old Arm builds of arm-none-eabi, but none of
them ICE. I suppose it should be safe to add the parameter as the PR talks
about the literal pools.

Ok for trunk and releases/gcc-14?

--

Without this change, build fails with:

.../thumb2-slow-flash-data-3.c: In function 'fn3':
.../thumb2-slow-flash-data-3.c:23:3: error: too many arguments to function 'fn1'
.../thumb2-slow-flash-data-3.c:10:6: note: declared here
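
The error is consistent with "void fn1 ();" being treated as a declaration
with no parameters, as C23 specifies; that cause is an assumption, since the
message above does not state it.  A minimal sketch of the corrected shape,
with hypothetical function bodies that are not part of the test:

```c
#include <stddef.h>

/* Records the last argument so that the call can be observed.
   Hypothetical; the real test only declares fn1.  */
static void *last_arg = NULL;

/* Declaring the parameter explicitly keeps the call below valid even
   when an empty parameter list means "no parameters".  */
void
fn1 (void *p)
{
  last_arg = p;
}

/* Mirrors a caller that passes one pointer argument to fn1.  */
int
call_fn1 (void)
{
  static int object;
  fn1 (&object);
  return last_arg == &object;
}
```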

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb2-slow-flash-data-3.c: Added argument to
fn1 to avoid compile error.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
index 09d25d62002..eb841367c91 100644
--- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
@@ -7,7 +7,7 @@
 /* From PR71607 */
 
 float b;
-void fn1 ();
+void fn1 (void*);
 
 float
 fn2 ()
-- 
2.25.1

[PATCH 1/1] aarch64: add ACLE macro _CHKFEAT_GCS

2024-11-21 Thread Yury Khrustalev

gcc/ChangeLog:
* config/aarch64/arm_acle.h (_CHKFEAT_GCS): New.

libgcc/ChangeLog:

* config/aarch64/aarch64-unwind.h (_Unwind_Frames_Extra): Update.
(_Unwind_Frames_Increment): Update.
---
 gcc/config/aarch64/arm_acle.h  | 3 +++
 libgcc/config/aarch64/aarch64-unwind.h | 6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 7fe61c736ed..7351d1de70b 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -194,6 +194,9 @@ __rint64x (double __a)
 #pragma GCC push_options
 #pragma GCC target ("+nothing")
 
+/* Feature constants for CHKFEAT operation.  */
+#define _CHKFEAT_GCS 1
+
 __extension__ extern __inline uint64_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __chkfeat (uint64_t __feat)
diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
b/libgcc/config/aarch64/aarch64-unwind.h
index cf4ec749c05..85468f9685e 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -29,6 +29,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 
 #include "ansidecl.h"
 #include 
+#include 
 
 #define AARCH64_DWARF_REGNUM_RA_STATE 34
 #define AARCH64_DWARF_RA_STATE_MASK   0x1
@@ -179,7 +180,6 @@ aarch64_demangle_return_addr (struct _Unwind_Context 
*context,
 }
 
 /* GCS enable flag for chkfeat instruction.  */
-#define CHKFEAT_GCS 1
 
 /* SME runtime function local to libgcc, streaming compatible
and preserves more registers than the base PCS requires, but
@@ -194,7 +194,7 @@ void __libgcc_arm_za_disable (void);
   do   \
 {  \
   __libgcc_arm_za_disable ();  \
-  if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) == 0)\
+  if (__builtin_aarch64_chkfeat (_CHKFEAT_GCS) == 0)   \
{   \
  for (_Unwind_Word n = (x); n != 0; n--)   \
__builtin_aarch64_gcspopm ();   \
@@ -233,7 +233,7 @@ void __libgcc_arm_za_disable (void);
   do   \
 {  \
   frames++;\
-  if (__builtin_aarch64_chkfeat (CHKFEAT_GCS) != 0 \
+  if (__builtin_aarch64_chkfeat (_CHKFEAT_GCS) != 0\
  || exc->exception_class == 0  \
  || _Unwind_GetIP (context) == 0)  \
break;  \
-- 
2.39.5

[PATCH] genemit: Distribute evenly to files [PR111600].

2024-11-21 Thread Robin Dapp

Hi,

currently we distribute insn patterns in genemit, partitioning them
by the number of patterns per file.  The first 100 into file 1, the
next 100 into file 2, and so on.  Depending on the patterns this
can lead to files of very uneven size.

Similar to the genmatch split, this patch introduces a dynamic
choose_output () which considers the size of the output files
and selects the shortest one for the next pattern.
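
The idea can be sketched in a few lines of C.  This is an illustration under
simplifying assumptions (file lengths tracked in an array rather than queried
from the open files), with hypothetical names pick_shortest and distribute;
it is not the choose_output added by the patch.

```c
#include <stddef.h>

/* Return the index of the smallest entry in SIZES[0..N-1]; ties go to
   the lowest index.  N must be nonzero.  */
size_t
pick_shortest (const long *sizes, size_t n)
{
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (sizes[i] < sizes[best])
      best = i;
  return best;
}

/* Greedily assign COUNT patterns with the given output lengths to N
   files, always appending to the currently shortest file.  SIZES
   accumulates the resulting file lengths and must start zeroed.  */
void
distribute (const long *pattern_len, size_t count, long *sizes, size_t n)
{
  for (size_t i = 0; i < count; i++)
    sizes[pick_shortest (sizes, n)] += pattern_len[i];
}
```

Splitting by pattern count alone would ignore how much output each pattern
produces; choosing the shortest file each time lets one large pattern be
balanced by several small ones in another file.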

Bootstrapped and regtested on x86 and power10, aarch64 running.
Regtested on rv64gcv.

gcc/ChangeLog:

PR target/111600

* genemit.cc (handle_arg): Use files instead of filenames.
(main): Ditto.
* gensupport.cc (SIZED_BASED_CHUNKS): Define.
(choose_output): New function.
* gensupport.h (choose_output): Declare.
---
 gcc/genemit.cc| 53 ++-
 gcc/gensupport.cc | 33 +
 gcc/gensupport.h  |  1 +
 3 files changed, 50 insertions(+), 37 deletions(-)

diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 5d3d10f5061..518fb85ce8c 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -905,14 +905,15 @@ from the machine description file `md'.  */\n\n");
   fprintf (file, "#include \"target.h\"\n\n");
 }
 
-auto_vec<const char *> output_files;
+auto_vec<FILE *> output_files;
 
 static bool
 handle_arg (const char *arg)
 {
   if (arg[1] == 'O')
 {
-  output_files.safe_push (&arg[2]);
+  FILE *file = fopen (&arg[2], "w");
+  output_files.safe_push (file);
   return true;
 }
   return false;
@@ -933,47 +934,21 @@ main (int argc, const char **argv)
   /* Assign sequential codes to all entries in the machine description
  in parallel with the tables in insn-output.cc.  */
 
-  int npatterns = count_patterns ();
   md_rtx_info info;
 
-  bool to_stdout = false;
-  int npatterns_per_file = npatterns;
-  if (!output_files.is_empty ())
-npatterns_per_file = npatterns / output_files.length () + 1;
-  else
-to_stdout = true;
-
-  gcc_assert (npatterns_per_file > 1);
+  if (output_files.is_empty ())
+output_files.safe_push (stdout);
 
-  /* Reverse so we can pop the first-added element.  */
-  output_files.reverse ();
+  for (auto f : output_files)
+print_header (f);
 
-  int count = 0;
   FILE *file = NULL;
+  unsigned file_idx;
 
   /* Read the machine description.  */
   while (read_md_rtx (&info))
 {
-  if (count == 0 || count == npatterns_per_file)
-   {
- bool is_last = !to_stdout && output_files.is_empty ();
- if (file && !is_last)
-   if (fclose (file) != 0)
- return FATAL_EXIT_CODE;
-
- if (!output_files.is_empty ())
-   {
- const char *const filename = output_files.pop ();
- file = fopen (filename, "w");
-   }
- else if (to_stdout)
-   file = stdout;
- else
-   break;
-
- print_header (file);
- count = 0;
-   }
+  file = choose_output (output_files, file_idx);
 
   switch (GET_CODE (info.def))
{
@@ -999,10 +974,10 @@ main (int argc, const char **argv)
default:
  break;
}
-
-  count++;
 }
 
+  file = choose_output (output_files, file_idx);
+
   /* Write out the routines to add CLOBBERs to a pattern and say whether they
  clobber a hard reg.  */
   output_add_clobbers (&info, file);
@@ -1015,5 +990,9 @@ main (int argc, const char **argv)
   handle_overloaded_gen (oname, file);
 }
 
-  return (fclose (file) != 0 ? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE);
+  for (FILE *f : output_files)
+if (fclose (f) != 0)
+  return FATAL_EXIT_CODE;
+
+  return SUCCESS_EXIT_CODE;
 }
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 3a02132c876..e0adf0c1bc5 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3913,3 +3913,36 @@ find_optab (optab_pattern *p, const char *name)
 }
   return false;
 }
+
+/* Find the file to write into next.  We try to evenly distribute the contents
+   over the different files.  */
+
+#define SIZED_BASED_CHUNKS 1
+
+FILE *
+choose_output (const vec &parts, unsigned &idx)
+{
+  if (parts.length () == 0)
+gcc_unreachable ();
+#ifdef SIZED_BASED_CHUNKS
+  FILE *shortest = NULL;
+  long min = 0;
+  idx = 0;
+  for (unsigned i = 0; i < parts.length (); i++)
+{
+  FILE *part  = parts[i];
+  long len = ftell (part);
+  if (!shortest || min > len)
+   {
+ shortest = part;
+ min = len;
+ idx = i;
+   }
+}
+  return shortest;
+#else
+  static int current_file;
+  idx = current_file++ % parts.length ();
+  return parts[idx];
+#endif
+}
diff --git a/gcc/gensupport.h b/gcc/gensupport.h
index b7a1da34518..781c9e9ffce 100644
--- a/gcc/gensupport.h
+++ b/gcc/gensupport.h
@@ -231,5 +231,6 @@ extern file_location get_file_location (rtx);
 extern const char *get_emit_function (rtx);
 extern bool needs_barrier_p (rtx);
 extern bool find_optab (optab_pattern *, const char *);
+extern FILE *choose_ou

[PATCH] testsuite: RISC-V: Fix vector flags handling [PR117603]

2024-11-21 Thread Dimitar Dimitrov

The DejaGnu routine "riscv_get_arch" fails to infer the correct
architecture string when GCC is built for RV32EC.  This causes an invalid
architecture string to be produced by "add_options_for_riscv_v":
  xgcc: error: '-march=rv32cv': first ISA subset must be 'e', 'i' or 'g'

Fix by adding the E base ISA variant to the list of possible architecture
modifiers.

Also, the V extension is added to the machine string without checking
whether dependent extensions are available.  This results in errors when
GCC is built for RV32EC:
  Executing on host: .../xgcc ... -march=rv32ecv ...
  cc1: error: ILP32E ABI does not support the 'D' extension
  cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension

Fix by disabling vector tests for RISC-V if V extension cannot be added
to current architecture.

Tested riscv32-none-elf for -march=rv32ec using GNU simulator.  Most of
the remaining failures are due to explicit addition of vector options,
yet missing "dg-require-effective-target riscv_v_ok":

=== gcc Summary ===

 # of expected passes            211958
 # of unexpected failures        1826
 # of expected failures          1059
 # of unresolved testcases       5209
 # of unsupported tests          15513

Ensured riscv64-unknown-linux-gnu tested with qemu has no new passing or
failing tests, before and after applying this patch:

 Running target riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow
 ...
=== gcc Summary ===

 # of expected passes            237209
 # of unexpected failures        335
 # of expected failures          1670
 # of unresolved testcases       43
 # of unsupported tests          16767

Ok for trunk once CI is green?

PR target/117603

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (riscv_get_arch): Add comment about
function purpose.  Add E ISA to list of possible
modifiers.
(check_vect_support_and_set_flags): Do not advertise vector
support if V extension cannot be enabled.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/lib/target-supports.exp | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index e13c138d916..9a1a3e86301 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2201,10 +2201,13 @@ proc check_effective_target_riscv_v_misalign_ok { } {
 return 0
 }
 
+# Deduce the string for the RISC-V architecture targeted by the compiler
+# under test.  Also take into account the global compiler flags passed
+# by testsuite.
 proc riscv_get_arch { } {
 set gcc_march ""
 # ??? do we neeed to add more extensions to the list below?
-foreach ext { i m a f d q c b v zicsr zifencei zfh zba zbb zbc zbs zvbb 
zvfh ztso zaamo zalrsc zabha zacas } {
+foreach ext { i e m a f d q c b v zicsr zifencei zfh zba zbb zbc zbs zvbb 
zvfh ztso zaamo zalrsc zabha zacas } {
if { [check_no_compiler_messages  riscv_ext_$ext assembly [string map 
[list DEF __riscv_$ext] {
#ifndef DEF
#error "Not DEF"
@@ -12044,12 +12047,16 @@ proc check_vect_support_and_set_flags { } {
if [check_effective_target_riscv_v_misalign_ok] {
lappend DEFAULT_VECTCFLAGS "-mno-vector-strict-align"
}
-   } else {
+   } elseif [check_effective_target_riscv_v_ok] {
foreach item [add_options_for_riscv_v ""] {
lappend DEFAULT_VECTCFLAGS $item
}
set dg-do-what-default compile
-   }
+   } else {
+   # Current architecture cannot support vectors (e.g. the
+   # dependent D extension is missing).
+   return 0
+}
 } elseif [istarget loongarch*-*-*] {
   # Set the default vectorization option to "-mlsx" due to the problem
   # of non-aligned memory access when using 256-bit vectorization.
-- 
2.47.0

Re: [PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-11-21 Thread Andi Kleen

On Fri, Nov 15, 2024 at 10:43:57AM +0100, Filip Kastl wrote:
> Hi,
> 
> Andi's greedy bit test finding algorithm was reverted.  I found a fix for the
> problem that caused the revert.  I made this patch to reintroduce the greedy
> alg into GCC.  However I think we should keep the old slow but more powerful
> algorithm so I added a limit on the number of cases of the switch statement
> that decides which of the two algorithms to use.

Do we actually have a case where the DP algorithm is better?

In the bootstrap comparison, greedy produces fewer or the same number of clusters.


> +  k = 0;
> +  while (i + k < l)
> + {
> +   cluster *end_cluster = clusters[i + k];
> +
> +   /* Does value range fit into the BITS_PER_WORD window?  */
> +   HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
> + end_cluster->get_high ());
> +   if (w == 0 || w > BITS_PER_WORD)
> + break;
> +
> +   /* Check for max # of targets.  */
> +   if (targets.elements () == m_max_case_bit_tests
> +   && !targets.contains (end_cluster->m_case_bb))
> + break;
> +
> +   targets.add (end_cluster->m_case_bb);
> +   k++;
> + }
>  
> +  if (is_beneficial (k, targets.elements ()))

Equivalent to the old test in DP would be k + 1
I'm not sure it makes a lot of difference though.
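
To make the shape of the quoted loop easier to follow, here is a stripped-down
sketch of the greedy windowing step on plain integers.  It is an assumption-laden
simplification, not the switch-lowering code: greedy_windows is a hypothetical
name, the input is assumed to be sorted ascending, and both the limit on the
number of targets and the is_beneficial check are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Greedily split the sorted case values VALS[0..N) into windows whose
   value range (high - low + 1) is at most WINDOW, and return how many
   windows were formed.  Overflow of the range computation is not
   handled; this is an illustration only.  */
static size_t
greedy_windows (const int64_t *vals, size_t n, int64_t window)
{
  assert (window > 0);
  size_t count = 0;
  size_t i = 0;
  while (i < n)
    {
      /* Extend the window starting at vals[i] for as long as the next
         value still fits.  */
      size_t k = 1;
      while (i + k < n && vals[i + k] - vals[i] + 1 <= window)
        k++;
      count++;
      i += k;
    }
  return count;
}
```

Here k counts the clusters placed in the current window, which is the quantity
the k versus k + 1 remark above is about.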


-Andi

Re: [PATCH] Allow limited extended asm at toplevel [PR41045]

2024-11-21 Thread Joseph Myers

On Sat, 2 Nov 2024, Jakub Jelinek wrote:

> +Extended @code{asm} statements outside of functions may not use any
> +qualifiers, may not specify clobbers, may not use @code{%}, @code{+} or
> +@code{&} modifiers in constraints and can only use constraints which don%'t
> +allow using any register.

Just ' in Texinfo, not %'.

> @@ -3071,7 +3072,62 @@ c_parser_declaration_or_fndef (c_parser
>  static void
>  c_parser_asm_definition (c_parser *parser)
>  {
> -  tree asm_str = c_parser_simple_asm_expr (parser);
> +  location_t asm_loc = c_parser_peek_token (parser)->location;

The syntax comment above this function needs updating.

The C front-end changes are OK with that fix.

-- 
Joseph S. Myers
josmy...@redhat.com

[committed] c: Give errors more consistently for void parameters [PR114816]

2024-11-21 Thread Joseph Myers

Cases of void parameters, other than a parameter list of (void) (or
equivalent with a typedef for void) in its entirety, have been made a
constraint violation in C2Y (N3344 alternative 1 was adopted), as part
of a series of changes to eliminate unnecessary undefined behavior by
turning it into constraint violations, implementation-defined behavior
or something else with stricter bounds on what behavior is allowed.
Previously, these were implicitly undefined behavior (see DR#295),
with only some cases listed in Annex J as undefined (but even those
cases not having wording in the normative text to make them explicitly
undefined).

As discussed in bug 114816, GCC is not entirely consistent about
diagnosing such usages; unnamed void parameters get errors when not
the entire parameter list, while qualified and register void (the
cases listed in Annex J) get errors as a single unnamed parameter, but
named void parameters are accepted with a warning (in a declaration
that's not a definition; it's not possible to define a function with
incomplete parameter types).

Following C2Y, make all these cases into errors.  The errors are not
conditional on the standard version, given that this was previously
implicit undefined behavior.  Since it wasn't possible anyway to
define such functions, only declare them without defining them (or
otherwise use such parameters in function type names that can't
correspond to any defined function), hopefully the risks of
compatibility issues are small.

Bootstrapped with no regressions for x86-64-pc-linux-gnu.

PR c/114816

gcc/c/
* c-decl.cc (grokparms): Do not warn for void parameter type here.
(get_parm_info): Give errors for void parameters even when named.

gcc/testsuite/
* gcc.dg/c2y-void-parm-1.c: New test.
* gcc.dg/noncompile/920616-2.c, gcc.dg/noncompile/921116-1.c,
gcc.dg/parm-incomplete-1.c: Update expected diagnostics.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index c58ff4ab2488..a84b35ea23e7 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -8480,8 +8480,8 @@ grokparms (struct c_arg_info *arg_info, bool funcdef_flag)
 or definition of the function.  In the case where the tag was
 first declared within the parameter list, a warning has
 already been given.  If a parameter has void type, then
-however the function cannot be defined or called, so
-warn.  */
+this has already received an error (constraint violation in C2Y,
+previously implicitly undefined behavior).  */
 
   for (parm = arg_info->parms, typelt = arg_types, parmno = 1;
   parm;
@@ -8508,17 +8508,6 @@ grokparms (struct c_arg_info *arg_info, bool 
funcdef_flag)
  TREE_TYPE (parm) = error_mark_node;
  arg_types = NULL_TREE;
}
- else if (VOID_TYPE_P (type))
-   {
- if (DECL_NAME (parm))
-   warning_at (input_location, 0,
-   "parameter %u (%q+D) has void type",
-   parmno, parm);
- else
-   warning_at (DECL_SOURCE_LOCATION (parm), 0,
-   "parameter %u has void type",
-   parmno);
-   }
}
 
  if (DECL_NAME (parm) && TREE_USED (parm))
@@ -8627,12 +8616,14 @@ get_parm_info (bool ellipsis, tree expr)
  if (TREE_ASM_WRITTEN (decl))
error_at (b->locus,
  "parameter %q+D has just a forward declaration", decl);
- /* Check for (..., void, ...) and issue an error.  */
- else if (VOID_TYPE_P (type) && !DECL_NAME (decl))
+ /* Check for (..., void, ...) and named void parameters and issue an
+error.  */
+ else if (VOID_TYPE_P (type))
{
  if (!gave_void_only_once_err)
{
- error_at (b->locus, "%<void%> must be the only parameter");
+ error_at (b->locus,
+   "%<void%> must be the only parameter and unnamed");
  gave_void_only_once_err = true;
}
}
diff --git a/gcc/testsuite/gcc.dg/c2y-void-parm-1.c 
b/gcc/testsuite/gcc.dg/c2y-void-parm-1.c
new file mode 100644
index ..aff757ee21c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2y-void-parm-1.c
@@ -0,0 +1,152 @@
+/* Test errors for void parameters (constraint violation in C2Y, previously
+   implicit undefined behavior; bug 114816).  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2y -pedantic-errors" } */
+
+typedef void Void;
+typedef const void CVoid;
+typedef volatile void VVoid;
+
+/* Valid cases.  */
+void f1 (void);
+void f2 (Void);
+void df1 (void) {}
+void df2 (Void) {}
+
+/* All other variants are invalid.  */
+
+void g1 (const void); /* { dg-error "void" } */
+void g2 (CVoid); /* { dg-error "void" } */
+void g3 (volatile void); /* { dg-error "void" } */
+v

Re: [PATCH v2 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-11-21 Thread Alex Coplan

On 21/11/2024 10:02, Richard Biener wrote:
> On Fri, 15 Nov 2024, Alex Coplan wrote:
> 
> > Hi,
> > 
> > This is a v2 which hopefully addresses the feedback for v1 of the 1/5
> > patch, originally posted here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/48.html
> > 
> > As mentioned on IRC, it will need follow-up work to fix up latent
> > profile issues, but that can be done during stage 3.  We will also need
> > a simple (hopefully obvious, even) follow-up patch to fix expectations
> > for various tests (since we now vectorize loops which we previously
> > couldn't).
> > 
> > OK for trunk?
> 
> I'm still looking at
> 
> +  if (dr_info->need_peeling_for_alignment)
> +{
> +  /* Vector size in bytes.  */
> +  poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
> (vectype));
> +
> +  /* We can only peel for loops, of course.  */
> +  gcc_checking_assert (loop_vinfo);
> +
> +  /* Calculate the number of vectors read per vector iteration.  If
> +it is a power of two, multiply through to get the required
> +alignment in bytes.  Otherwise, fail analysis since alignment
> +peeling wouldn't work in such a case.  */
> +  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> +   vf *= DR_GROUP_SIZE (stmt_info);
> 
> so this is the total number of scalars we load, so maybe call it
> that way, num_scalars.

Will do.

> 
> +
> +  auto num_vectors = vect_get_num_vectors (vf, vectype);
> +  if (!pow2p_hwi (num_vectors))
> 
> side-note - with all these multiplies there's the possibility that
> we get a testcase that has safe_align > PAGE_SIZE, meaning it's
> no longer a good way to avoid trapping.  This problem of course
> exists generally, we avoid it elsewhere by not having very large
> vectors or limiting the group-size.  The actual problem is that
> we don't know the actual page size, but we maybe could configure
> a minimum as part of the platform configuration.  Should we for
> now simply add
> 
>   || safe_align > 4096
> 
> here?  A testcase would load 512 contiguous uint64 to form an early exit
> condition, quite unlikely I guess.

Good point.  I suppose this really depends on whether there are
targets/platforms that GCC supports with a page size smaller than 4k.
Perhaps a min_page_size target hook (defaulting to 4k) would be
sensible.  Then if there are any such targets they can override the
hook.  WDYT?
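
As a sanity check on the reasoning, the property being relied on can be
written down directly.  This is only a sketch: the function names are made up
for illustration, and the 4096-byte page size used in the comments is the
value suggested above, not something queried from a target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if X is a nonzero power of two.  */
static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* True if the byte range [ADDR, ADDR + SIZE) touches more than one page.  */
static bool
crosses_page (uint64_t addr, uint64_t size, uint64_t page_size)
{
  assert (size > 0 && is_pow2 (page_size));
  return addr / page_size != (addr + size - 1) / page_size;
}

/* True if a SIZE-byte access whose address is a multiple of SIZE can
   never cross a page boundary.  This holds when SIZE is a power of two
   that is no larger than the (power-of-two) page size, e.g. 64 bytes
   with 4096-byte pages, but not 8192 bytes with 4096-byte pages.  */
static bool
aligned_access_is_page_safe (uint64_t size, uint64_t page_size)
{
  return is_pow2 (size) && is_pow2 (page_size) && size <= page_size;
}
```

With safe_align larger than the page size, even a fully aligned access spans
more than one page, which is the case the proposed bound is meant to reject.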

> 
> With DR_GROUP_SIZE != 1 there's also the question whether we can
> ever reach the desired alignment since each peel will skip
> DR_GROUP_SIZE scalar elements - either those are already
> DR_GROUP_SIZE aligned or the result will never be.

I had a look at this, I think this should already be handled by
vector_alignment_reachable_p which already includes a suitable check for
STMT_VINFO_GROUPED_ACCESS.  E.g. for the following testcase:

double *a;
int f(int n) {
  for (int i = 0; i < n; i += 2)
    if (a[i])
      __builtin_abort();
}

we have DR_GROUP_SIZE = 2 and in the dump (-O3, with the alignment peeling
patches) we see:

note:   === vect_enhance_data_refs_alignment ===
missed:   vector alignment may not be reachable
note: vect_can_advance_ivs_p:
note:   Analyze phi: i_12 = PHI 
note:   Alignment of access forced using versioning.
note:   Versioning for alignment will be applied.

so we decide to version instead, since peeling isn't viable here
(vector_alignment_reachable_p correctly returns false).
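
The arithmetic behind "may not be reachable" can be sketched as follows.  This
is not what vector_alignment_reachable_p does internally; it is a brute-force
illustration under the assumption that each peeled scalar iteration advances
the address by a fixed step (element size times group size), with
peel_iters_for_alignment being a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Return the number of scalar iterations K to peel so that
   ADDR + K * STEP is a multiple of ALIGN, or -1 if no such K exists.
   ALIGN must be a power of two; STEP is the number of bytes advanced
   per scalar iteration.  */
static int64_t
peel_iters_for_alignment (uint64_t addr, uint64_t step, uint64_t align)
{
  assert (step > 0 && align > 0 && (align & (align - 1)) == 0);
  /* The residues of ADDR + K * STEP modulo ALIGN repeat with a period of
     at most ALIGN, so trying ALIGN values of K is sufficient.  */
  for (uint64_t k = 0; k < align; k++)
    if ((addr + k * step) % align == 0)
      return (int64_t) k;
  return -1;
}
```

For example, with 8-byte doubles and a group size of 2 the step is 16 bytes:
a pointer that is only 8-byte aligned can then never be peeled to a 32-byte
boundary, whereas with a step of 8 it can.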

Perhaps that says the DR flag should really be called
need_peeling_or_versioning_for_alignment (although better I guess just
to update the comment above its definition to mention versioning).

> 
> @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
> dr_vec_info *dr_info,
>if (misalignment == DR_MISALIGNMENT_UNKNOWN)
>  is_packed = not_size_aligned (DR_REF (dr));
>if (targetm.vectorize.support_vector_misalignment (mode, type, 
> misalignment,
> -is_packed))
> +is_packed)
> +  && !dr_info->need_peeling_for_alignment)
>  return dr_unaligned_supported;
> 
> I think you need to do this earlier, like with
> 
>   if (misalignment == 0)
> return dr_aligned;
> + else if (dr_info->need_peeling_for_alignment)
> +   return dr_unaligned_unsupported;

That seems more obviously correct indeed.  I'll make that change.

> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index c8dc7153298..be2c2a1bc75 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>int estimated_vf;
>int prolog_peeling = 0;
>bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
> -  /* We currently do not support prolog peeling if the target alignment 
> is not
> - known at compile time.  'vect_gen_

Re: [PATCH v2 1/5] testsuite: arm: Use effective-target for pr56184.C and pr59985.C

2024-11-21 Thread Torbjorn SVENSSON





On 2024-11-21 15:49, Christophe Lyon wrote:



On 11/21/24 15:24, Torbjörn SVENSSON wrote:

Update test cases to use -mcpu=unset/-march=unset feature introduced in
r15-3606-g7d6c6a0d15c.

gcc/testsuite/ChangeLog:

* g++.dg/other/pr56184.C: Use effective-target
arm_arch_v7a_neon_thumb.
* g++.dg/other/pr59985.C: Use effective-target
arm_arch_v7a_fp_hard.
* lib/target-supports.exp: Define effective-target
arm_arch_v7a_fp_hard, arm_arch_v7a_neon_thumb

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/other/pr56184.C  | 7 +--
  gcc/testsuite/g++.dg/other/pr59985.C  | 7 ---
  gcc/testsuite/lib/target-supports.exp | 2 ++
  3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/g++.dg/other/pr56184.C b/gcc/testsuite/g+ 
+.dg/other/pr56184.C

index dc949283c98..651c6280c35 100644
--- a/gcc/testsuite/g++.dg/other/pr56184.C
+++ b/gcc/testsuite/g++.dg/other/pr56184.C
@@ -1,6 +1,9 @@
  /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { ! { arm_thumb1_ok || 
arm_thumb2_ok } } } */
-/* { dg-options "-fno-short-enums -O2 -mthumb -march=armv7-a - 
mfpu=neon -mfloat-abi=softfp -mtune=cortex-a9 -fno-section-anchors - 
Wno-return-type" } */

+/* { dg-require-effective-target arm_arch_v7a_neon_thumb_ok } */
+/* { dg-options "-fno-short-enums -O2 -fno-section-anchors -Wno- 
return-type" } */

+/* { dg-add-options arm_arch_v7a_neon_thumb } */
+/* { dg-additional-options "-mthumb -mtune=cortex-a9" } */

Isn't -mthumb already included by dg-add-options arm_arch_v7a_neon_thumb?


Ah, yes. Silly mistake on my part. Fixed in my local commit.

Kind regards,
Torbjörn



Thanks,

Christophe


+
  typedef unsigned int size_t;
  __extension__ typedef int __intptr_t;
diff --git a/gcc/testsuite/g++.dg/other/pr59985.C b/gcc/testsuite/g+ 
+.dg/other/pr59985.C

index 7c9bfab35f1..e96db431633 100644
--- a/gcc/testsuite/g++.dg/other/pr59985.C
+++ b/gcc/testsuite/g++.dg/other/pr59985.C
@@ -1,7 +1,8 @@
  /* { dg-do compile { target arm*-*-* } } */
-/* { dg-skip-if "incompatible options" { arm_thumb1 } } */
-/* { dg-options "-g -fcompare-debug -O2 -march=armv7-a -mtune=cortex- 
a9 -mfpu=vfpv3-d16 -mfloat-abi=hard" } */
-/* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } 
{ "" } } */

+/* { dg-require-effective-target arm_arch_v7a_fp_hard_ok } */
+/* { dg-options "-g -fcompare-debug -O2" } */
+/* { dg-add-options arm_arch_v7a_fp_hard } */
+/* { dg-additional-options "-mtune=cortex-a9" } */
  extern void *f1 (unsigned long, unsigned long);
  extern const struct line_map *f2 (void *, int, unsigned int, const 
char *, unsigned int);
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/ 
lib/target-supports.exp

index f3828793986..e2c839b233a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5777,7 +5777,9 @@ foreach { armfunc armflag armdefs } {
  v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
  v7a "-march=armv7-a+fp" __ARM_ARCH_7A__
  v7a_arm "-march=armv7-a+fp -marm" "__ARM_ARCH_7A__ && !__thumb__"
+    v7a_fp_hard "-march=armv7-a+fp -mfpu=auto -mfloat-abi=hard" 
__ARM_ARCH_7A__
  v7a_neon "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp" 
"__ARM_ARCH_7A__ && __ARM_NEON__"
+    v7a_neon_thumb "-march=armv7-a+simd -mfpu=auto -mfloat-abi=softfp 
-mthumb" "__ARM_ARCH_7A__ && __ARM_NEON__ && __thumb__"

  v7r "-march=armv7-r+fp" __ARM_ARCH_7R__
  v7m "-march=armv7-m -mthumb -mfloat-abi=soft" __ARM_ARCH_7M__
  v7em "-march=armv7e-m+fp -mthumb" __ARM_ARCH_7EM__

Re: [PATCH 08/11] c: c++: flag to disable fetch_op handling fenv exceptions

2024-11-21 Thread Joseph Myers

On Thu, 21 Nov 2024, Matthew Malcomson wrote:

> Based on that -- should the same reasoning apply to the new builtins?
> I.e. do you believe it would be reasonable to say that the new builtins
> require libatomic, and remove this flag entirely?

I think it's reasonable to say that atomic built-in functions (beyond 
maybe the limited case of existing __sync_* in cases where they worked 
before the addition of libatomic) can require libatomic.

-- 
Joseph S. Myers
josmy...@redhat.com

[PATCH 2/3] aarch64: add attributes to the prefetch_builtins

2024-11-21 Thread Andrew Pinski

This adds the attributes associated with prefetch to the builtins.
Just call aarch64_get_attributes with FLAG_PREFETCH_MEMORY to get the 
attributes.

Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_init_prefetch_builtin):
Update call to aarch64_general_add_builtin in 
AARCH64_INIT_PREFETCH_BUILTIN.
Add new variable prefetch_attrs.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-builtins.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 04ae16a0c76..9705f2de090 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2024,10 +2024,12 @@ aarch64_init_prefetch_builtin (void)
 {
 #define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)
\
   aarch64_builtin_decls[INDEX] =   \
-aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
+aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX,  \
+prefetch_attrs)
 
   tree ftype;
   tree cv_argtype;
+  tree prefetch_attrs = aarch64_get_attributes (FLAG_PREFETCH_MEMORY, DImode);
   cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
 | TYPE_QUAL_VOLATILE);
   cv_argtype = build_pointer_type (cv_argtype);
-- 
2.43.0

Re: [RFC PATCH] inline asm, v2: Add new constraint for symbol definitions

2024-11-21 Thread Joseph Myers

On Wed, 6 Nov 2024, Jakub Jelinek wrote:

> +   error_at (loc, "%<:%> constraint operand is not address "
> +  "of a function or non-automatic variable");

I think a testcase for this error is needed.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] inline-asm: Add - constraint modifier support for toplevel extended asm [PR41045]

2024-11-21 Thread Joseph Myers

On Mon, 18 Nov 2024, Jakub Jelinek wrote:

> +@smallexample
> +extern void foo (void), bar (void);
> +int v;
> +extern int w;
> +asm (".globl %cc0, %cc2; .text; %cc0: call %cc1; ret; .data; %cc2: .word 
> %cc3"
> + :: ":" (foo), "-s" (&bar), ":" (&w), "-i" (&v));
> +@end smallexample
> +
> +This asm declaration tells the compiler it defines function foo and variable
> +w and uses function bar and variable v.  This will compile even with PIC,
> +but it is up to the user to ensure it will assemble correctly and have the
> +expected behavior.

That should be @code{foo}, @code{w}, @code{bar}, @code{v}.

The C front-end changes in this patch are OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: Stage1 patch ping

2024-11-21 Thread Joseph Myers

On Tue, 19 Nov 2024, Jakub Jelinek wrote:

> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667737.html
>   inline-asm: Add support for cc operand modifier

> https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667949.html
>   inline-asm, i386: Add "redzone" clobber support
>   This one needs primarily C-family FE review, Uros already acked the
>   i386 part

I don't see any front-end changes in either of these two patches.

-- 
Joseph S. Myers
josmy...@redhat.com
