Re: [PATCH] Fortran: fix treatment of character, value, optional dummy arguments [PR107444]

2022-11-13 Thread Andreas Schwab
On Nov 10 2022, Harald Anlauf via Gcc-patches wrote:

> Dear Fortranners,
>
> the attached patch is a follow-up to the fix for PR107441,
> as it finally fixes the treatment of character dummy arguments
> that have the value,optional attribute, and allows for checking
> of the presence of such arguments.
>
> This entails a small ABI clarification, as the previous text
> was not really clear on the argument passing conventions,
> and the previously generated code was inconsistent at best,
> or rather wrong, for this kind of procedure arguments.
> (E.g. the number of passed arguments was varying...)
>
> Testcase cross-checked with NAG 7.1.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This breaks aarch64:

$ /opt/gcc/gcc-20221113/Build/./gcc/xgcc -B/opt/gcc/gcc-20221113/Build/./gcc/ 
-B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem 
/usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include 
-fchecking=1 ../../../../libgomp/testsuite/libgomp.fortran/is_device_ptr-2.f90 
-mabi=lp64 -B/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/ 
-B/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/.libs 
-I/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp 
-I../../../../libgomp/testsuite/../../include 
-I../../../../libgomp/testsuite/.. -fmessage-length=0 
-fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -O 
-fdump-tree-original 
-B/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/../libgfortran/.libs 
-fintrinsic-modules-path=/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp
 -L/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/.libs 
-L/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/../libgfortran/.libs 
-lgfortran -foffload=-lgfortran -lm -o ./is_device_ptr-2.exe
during GIMPLE pass: omplower
../../../../libgomp/testsuite/libgomp.fortran/is_device_ptr-2.f90:66:77: 
internal compiler error: in gfc_omp_check_optional_argument, at 
fortran/trans-openmp.cc:137
0x8acb63 gfc_omp_check_optional_argument(tree_node*, bool)
../../gcc/fortran/trans-openmp.cc:137
0xd29fc3 lower_omp_target
../../gcc/omp-low.cc:13632
0xd314b3 lower_omp_1
../../gcc/omp-low.cc:14523
0xd314b3 lower_omp
../../gcc/omp-low.cc:14662
0xd31283 lower_omp_1
../../gcc/omp-low.cc:14436
0xd31283 lower_omp
../../gcc/omp-low.cc:14662
0xd318a3 lower_omp_1
../../gcc/omp-low.cc:14452
0xd318a3 lower_omp
../../gcc/omp-low.cc:14662
0xd377fb execute_lower_omp
../../gcc/omp-low.cc:14701
0xd377fb execute
../../gcc/omp-low.cc:14755
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH] builtins: Commonise default handling of nonlocal_goto

2022-11-13 Thread Richard Sandiford via Gcc-patches
expand_builtin_longjmp and expand_builtin_nonlocal_goto both
emit nonlocal gotos.  They first try to use a target-provided
pattern and fall back to generic code otherwise.  These pieces
of generic code are almost identical, and having them inline
like this makes it difficult to define a nonlocal_goto pattern
that only wants to add extra steps, not change the default ones.
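
For readers unfamiliar with the builtins involved, here is a minimal sketch of the user-level construct that expand_builtin_longjmp ends up lowering.  The function and variable names are made up for illustration; only __builtin_setjmp and __builtin_longjmp are real GCC builtins.  The buffer has five words, and the second argument to __builtin_longjmp must be the constant 1.

```c
/* Illustrative only: jump_buffer, bail_out and run_once are
   hypothetical names, not part of the patch.  */
static void *jump_buffer[5];

/* The longjmp must not be in the same function as the setjmp,
   so keep it out of line.  */
static void __attribute__ ((noinline))
bail_out (void)
{
  __builtin_longjmp (jump_buffer, 1);
}

/* Returns 0 if bail_out returned normally (which it never does)
   and 1 if control came back through the nonlocal goto.  */
int
run_once (void)
{
  if (__builtin_setjmp (jump_buffer) == 0)
    {
      bail_out ();
      return 0;
    }
  return 1;
}
```

On targets without a nonlocal_goto pattern, the jump in bail_out is the case that the generic fallback sequence handles.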

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard

gcc/
* builtins.h (emit_standard_nonlocal_goto): Declare.
* builtins.cc (emit_standard_nonlocal_goto): New function,
commonizing code from...
(expand_builtin_longjmp, expand_builtin_nonlocal_goto): ...here.
* genemit.cc (main): Emit an include of builtins.h.
---
 gcc/builtins.cc | 103 +---
 gcc/builtins.h  |   1 +
 gcc/genemit.cc  |   1 +
 3 files changed, 47 insertions(+), 58 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4dc1ca672b2..2507745c17a 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -998,6 +998,49 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
   emit_insn (gen_blockage ());
 }
 
+/* Emit the standard sequence for a nonlocal_goto.  The arguments are
+   the operands to the .md pattern.  */
+
+void
+emit_standard_nonlocal_goto (rtx value, rtx label, rtx stack, rtx fp)
+{
+  emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
+  emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
+
+  label = copy_to_reg (label);
+
+  /* Restore the frame pointer and stack pointer.  We must use a
+     temporary since the setjmp buffer may be a local.  */
+  fp = copy_to_reg (fp);
+  emit_stack_restore (SAVE_NONLOCAL, stack);
+
+  /* Ensure the frame pointer move is not optimized.  */
+  emit_insn (gen_blockage ());
+  emit_clobber (hard_frame_pointer_rtx);
+  emit_clobber (frame_pointer_rtx);
+  emit_move_insn (hard_frame_pointer_rtx, fp);
+
+  /* USE of hard_frame_pointer_rtx added for consistency;
+     not clear if really needed.  */
+  emit_use (hard_frame_pointer_rtx);
+  emit_use (stack_pointer_rtx);
+
+  /* If the architecture is using a GP register, we must
+     conservatively assume that the target function makes use of it.
+     The prologue of functions with nonlocal gotos must therefore
+     initialize the GP register to the appropriate value, and we
+     must then make sure that this value is live at the point
+     of the jump.  (Note that this doesn't necessarily apply
+     to targets with a nonlocal_goto pattern; they are free
+     to implement it in their own way.  Note also that this is
+     a no-op if the GP register is a global invariant.)  */
+  unsigned regnum = PIC_OFFSET_TABLE_REGNUM;
+  if (value == const0_rtx && regnum != INVALID_REGNUM && fixed_regs[regnum])
+    emit_use (pic_offset_table_rtx);
+
+  emit_indirect_jump (label);
+}
+
 /* __builtin_longjmp is passed a pointer to an array of five words (not
all will be used on all machines).  It operates similarly to the C
library function of the same name, but is more efficient.  Much of
@@ -1049,27 +1092,7 @@ expand_builtin_longjmp (rtx buf_addr, rtx value)
   what that value is, because builtin_setjmp does not use it.  */
emit_insn (targetm.gen_nonlocal_goto (value, lab, stack, fp));
   else
-   {
- emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
- emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
-
- lab = copy_to_reg (lab);
-
- /* Restore the frame pointer and stack pointer.  We must use a
-temporary since the setjmp buffer may be a local.  */
- fp = copy_to_reg (fp);
- emit_stack_restore (SAVE_NONLOCAL, stack);
-
- /* Ensure the frame pointer move is not optimized.  */
- emit_insn (gen_blockage ());
- emit_clobber (hard_frame_pointer_rtx);
- emit_clobber (frame_pointer_rtx);
- emit_move_insn (hard_frame_pointer_rtx, fp);
-
- emit_use (hard_frame_pointer_rtx);
- emit_use (stack_pointer_rtx);
- emit_indirect_jump (lab);
-   }
+   emit_standard_nonlocal_goto (value, lab, stack, fp);
 }
 
   /* Search backwards and mark the jump insn as a non-local goto.
@@ -1201,43 +1224,7 @@ expand_builtin_nonlocal_goto (tree exp)
   if (targetm.have_nonlocal_goto ())
 emit_insn (targetm.gen_nonlocal_goto (const0_rtx, r_label, r_sp, r_fp));
   else
-{
-  emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
-  emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
-
-  r_label = copy_to_reg (r_label);
-
-  /* Restore the frame pointer and stack pointer.  We must use a
-temporary since the setjmp buffer may be a local.  */
-  r_fp = copy_to_reg (r_fp);
-  emit_stack_restore (SAVE_NONLOCAL, r_sp);
-
-  /* Ensure the frame pointer move is not optimized.  */
-  emit_insn (gen_blockage ());
-  emit_clobb

[PATCH 00/16] aarch64: Add support for SME

2022-11-13 Thread Richard Sandiford via Gcc-patches
This series adds support for the Armv9-A Scalable Matrix Extension (SME).
Details about the extension are available here:

  https://developer.arm.com/documentation/ddi0616/aa/?lang=en

The ABI and ACLE documentation is available on github:

  https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
  https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
  
https://github.com/ARM-software/acle/blob/main/main/acle.md#scalable-matrix-extension-sme

Series tested on aarch64-linux-gnu.  It depends on other patches
posted recently, and I'll give some time for comments & reviews,
so I won't be applying just yet.

Thanks,
Richard


[PATCH 01/16] aarch64: Add arm_streaming(_compatible) attributes

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds support for recognising the SME arm_streaming
and arm_streaming_compatible attributes.  arm_streaming says that
the processor is definitely in "streaming mode" (PSTATE.SM==1),
arm_streaming_compatible says that we don't know at compile time
either way, and the absence of both attributes says that the
processor is definitely not in streaming mode (PSTATE.SM==0).

As far as the compiler is concerned, this effectively creates three
ISA submodes: streaming mode enables things that are not available
in non-streaming mode, non-streaming mode enables things that are not
available in streaming mode, and streaming-compatible mode has to stick
to the common subset.  This means that some instructions are conditional
on PSTATE.SM==1 and some are conditional on PSTATE.SM==0.
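
The three submodes can be sketched as a small bitmask check.  This is only a model of the idea, under the assumption that each known PSTATE.SM state is one flag bit; the names below are hypothetical and are not the AARCH64_FL_* constants that the patch defines.

```c
#include <stdbool.h>

/* Hypothetical flag bits: what is known about PSTATE.SM.  */
enum
{
  KNOWN_SM_OFF = 1u << 0,	/* PSTATE.SM is definitely 0.  */
  KNOWN_SM_ON = 1u << 1		/* PSTATE.SM is definitely 1.  */
};

/* Streaming-compatible code knows neither, so its mode is 0.  */
#define MODE_NON_STREAMING KNOWN_SM_OFF
#define MODE_STREAMING KNOWN_SM_ON
#define MODE_STREAMING_COMPATIBLE 0u

/* An instruction is usable if everything it requires is known to
   hold in the current mode.  REQUIRED is 0 for instructions in the
   common subset.  */
static bool
insn_available_p (unsigned int required, unsigned int mode)
{
  return (required & ~mode) == 0;
}
```

With this encoding, an instruction that requires KNOWN_SM_ON is rejected in both non-streaming and streaming-compatible code, while an instruction with no requirement is accepted everywhere.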

I wondered about recording the streaming state in a new variable.
However, the set of available instructions is also influenced by
PSTATE.ZA (added later), so I think it makes sense to view this
as an instance of a more general mechanism.  Also, keeping the
PSTATE.SM state in the same flag variable as the other ISA
features makes it possible to sum up the requirements of an
ACLE function in a single value.

The patch therefore adds a new set of feature flags called "ISA modes".
Unlike the other two sets of flags (optional features and architecture-
level features), these ISA modes are not controlled directly by
command-line parameters or "target" attributes.

arm_streaming and arm_streaming_compatible are function type attributes
rather than function declaration attributes.  This means that we need
to find somewhere to copy the type information across to a function's
target options.  The patch does this in aarch64_set_current_function.

We also need to record which ISA mode a callee expects/requires
to be active on entry.  (The same mode is then active on return.)
The patch extends the current UNSPEC_CALLEE_ABI cookie to include
this information, as well as the PCS variant that it recorded
previously.
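
As a rough illustration of carrying both pieces of information in one cookie, the sketch below packs an ISA mode and a PCS variant into a single integer and unpacks them again.  The bit layout, the field width and all names are assumptions made for the example; they are not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical layout: the ISA mode occupies the low bits and the
   PCS variant sits above it.  */
#define COOKIE_ISA_MODE_BITS 2
#define COOKIE_ISA_MODE_MASK ((UINT64_C (1) << COOKIE_ISA_MODE_BITS) - 1)

static uint64_t
make_callee_cookie (unsigned int isa_mode, unsigned int pcs_variant)
{
  return (isa_mode & COOKIE_ISA_MODE_MASK)
	 | ((uint64_t) pcs_variant << COOKIE_ISA_MODE_BITS);
}

static unsigned int
cookie_isa_mode (uint64_t cookie)
{
  return (unsigned int) (cookie & COOKIE_ISA_MODE_MASK);
}

static unsigned int
cookie_pcs_variant (uint64_t cookie)
{
  return (unsigned int) (cookie >> COOKIE_ISA_MODE_BITS);
}
```

The point of the single value is that a call instruction can carry it as one operand and later passes can recover either field without extra bookkeeping.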

gcc/
* config/aarch64/aarch64-isa-modes.def: New file.
* config/aarch64/aarch64.h: Include it in the feature enumerations.
(AARCH64_FL_SM_STATE, AARCH64_FL_ISA_MODES): New constants.
(AARCH64_FL_DEFAULT_ISA_MODE): Likewise.
(AARCH64_ISA_MODE): New macro.
(CUMULATIVE_ARGS): Add an isa_mode field.
* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Declare.
(aarch64_tlsdesc_abi_id): Return an arm_pcs.
* config/aarch64/aarch64.cc (attr_streaming_exclusions): New variable.
(aarch64_attribute_table): Add arm_streaming and
arm_streaming_compatible.
(aarch64_fntype_sm_state, aarch64_fntype_isa_mode): New functions.
(aarch64_fndecl_sm_state, aarch64_fndecl_isa_mode): Likewise.
(aarch64_gen_callee_cookie, aarch64_callee_abi): Likewise.
(aarch64_insn_callee_cookie, aarch64_insn_callee_abi): Use them.
(aarch64_function_arg, aarch64_output_mi_thunk): Likewise.
(aarch64_init_cumulative_args): Initialize the isa_mode field.
(aarch64_override_options): Add the ISA mode to the feature set.
(aarch64_temporary_target::copy_from_fndecl): Likewise.
(aarch64_fndecl_options, aarch64_handle_attr_arch): Likewise.
(aarch64_set_current_function): Maintain the correct ISA mode.
(aarch64_tlsdesc_abi_id): Return an arm_pcs.
(aarch64_comp_type_attributes): Handle arm_streaming and
arm_streaming_compatible.
* config/aarch64/aarch64.md (tlsdesc_small_): Use
aarch64_gen_callee_cookie to get the ABI cookie.
* config/aarch64/t-aarch64 (TM_H): Add all feature-related .def files.

gcc/testsuite/
* gcc.target/aarch64/sme/aarch64-sme.exp: New harness.
* gcc.target/aarch64/sme/streaming_mode_1.c: New test.
* gcc.target/aarch64/sme/streaming_mode_2.c: Likewise.
* gcc.target/aarch64/auto-init-1.c: Only expect the call insn
to contain 1 (const_int 0), not 2.
---
 gcc/config/aarch64/aarch64-isa-modes.def  |  35 
 gcc/config/aarch64/aarch64-protos.h   |   3 +-
 gcc/config/aarch64/aarch64.cc | 194 +++---
 gcc/config/aarch64/aarch64.h  |  24 ++-
 gcc/config/aarch64/aarch64.md |   3 +-
 gcc/config/aarch64/t-aarch64  |   5 +-
 .../gcc.target/aarch64/auto-init-1.c  |   3 +-
 .../gcc.target/aarch64/sme/aarch64-sme.exp|  41 
 .../gcc.target/aarch64/sme/streaming_mode_1.c | 106 ++
 .../gcc.target/aarch64/sme/streaming_mode_2.c |  25 +++
 10 files changed, 403 insertions(+), 36 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-isa-modes.def
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c

diff --git a/gcc/config/aarch64/aarch64-isa-modes.def 
b/gcc/config/aarch64/aarch6

[PATCH 02/16] aarch64: Add +sme

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code.  (arm_streaming_compatible code
does not necessarily assume the presence of SME.  It just has to
work when SME is present and streaming mode is enabled.)

gcc/
* 
doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
Document SME.
* 
doc/gccint/testsuites/directives-used-within-dejagnu-tests/keywords-describing-target-attributes.rst:
Document aarch64_sve.
* config/aarch64/aarch64-option-extensions.def (sme): Define.
* config/aarch64/aarch64.h (AARCH64_ISA_SME): New macro.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Ensure that SME is present when compiling streaming code.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sme): New
target test.
* gcc.target/aarch64/sme/aarch64-sme.exp: Force SME to be enabled
if it isn't by default.
* gcc.target/aarch64/sme/streaming_mode_3.c: New test.
---
 .../aarch64/aarch64-option-extensions.def |  2 +
 gcc/config/aarch64/aarch64.cc | 33 ++
 gcc/config/aarch64/aarch64.h  |  1 +
 .../aarch64-options.rst   |  3 +
 .../keywords-describing-target-attributes.rst |  3 +
 .../gcc.target/aarch64/sme/aarch64-sme.exp| 10 ++-
 .../gcc.target/aarch64/sme/streaming_mode_3.c | 63 +++
 .../gcc.target/aarch64/sme/streaming_mode_4.c | 22 +++
 gcc/testsuite/lib/target-supports.exp | 12 
 9 files changed, 147 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index bdf4baf309c..402a9832f87 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -129,6 +129,8 @@ AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), 
(), (), "svesha3")
 AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
  "svebitperm")
 
+AARCH64_OPT_EXTENSION("sme", SME, (SVE2), (), (), "sme")
+
 AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a2e910daddf..fc6f0bc208a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -11374,6 +11374,23 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, 
unsigned int *p2)
   return true;
 }
 
+/* Implement TARGET_START_CALL_ARGS.  */
+
+static void
+aarch64_start_call_args (cumulative_args_t ca_v)
+{
+  CUMULATIVE_ARGS *ca = get_cumulative_args (ca_v);
+
+  if (!TARGET_SME && (ca->isa_mode & AARCH64_FL_SM_ON))
+    {
+      error ("calling a streaming function requires the ISA extension %qs",
+             "sme");
+      inform (input_location, "you can enable %qs using the command-line"
+              " option %<-march%>, or by using the %<target%>"
+              " attribute or pragma", "sme");
+    }
+}
+
 /* This function is used by the call expanders of the machine description.
RESULT is the register in which the result is returned.  It's NULL for
"call" and "sibcall".
@@ -17865,6 +17882,19 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   && !fixed_regs[R18_REGNUM])
 error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
+  if ((opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+      && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
+    {
+      error ("streaming functions require the ISA extension %qs", "sme");
+      inform (input_location, "you can enable %qs using the command-line"
+              " option %<-march%>, or by using the %<target%>"
+              " attribute or pragma", "sme");
+      opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
+      auto new_flags = (opts->x_aarch64_asm_isa_flags
+                        | feature_deps::SME ().enable);
+      aarch64_set_asm_isa_flags (opts, new_flags);
+    }
+
   initialize_aarch64_code_model (opts);
   initialize_aarch64_tls_size (opts);
 
@@ -27721,6 +27751,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FUNCTION_VALUE_REGNO_P
 #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p
 
+#undef TARGET_START_CALL_ARGS
+#define TARGET_START_CALL_ARGS aarch64_start_call_args
+
 #undef TARGET_GIMPLE_FOLD_BUILTIN
 #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1ac37b902bf..c47f27eefec 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -214,6 +214,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SVE2_BITPERM  (aarch64_isa_flags & AARCH64_F

[PATCH 03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns

2022-11-13 Thread Richard Sandiford via Gcc-patches
The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are.  This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative
in aarch64_simd_mov_from_high, but I'd rather provide it
than not.

gcc/
* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
(TARGET_SIMD): Require PSTATE.SM to be 0.
(AARCH64_ISA_SM_OFF): New macro.
* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
(aarch64_print_operand): Support '%Z'.
(aarch64_secondary_reload): Expect SVE moves to be used for
Advanced SIMD modes if SVE is enabled and non-streaming
Advanced SIMD isn't.
(aarch64_register_move_cost): Likewise.
(aarch64_simd_container_mode): Extend Advanced SIMD mode
handling to TARGET_BASE_SIMD.
(aarch64_expand_cpymem): Expand commentary.
* config/aarch64/aarch64.md (arches): Add base_simd.
(arch_enabled): Handle it.
(*mov_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
(*movti_aarch64): Use an SVE move instruction if non-streaming
SIMD isn't available.
(*mov_aarch64): Likewise.
(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
(store_pair_dw_tftf): Likewise.
(loadwb_pair_): Likewise.
(storewb_pair_): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov):
Allow UMOV in streaming mode.
(*aarch64_simd_mov): Use an SVE move instruction
if non-streaming SIMD isn't available.
(aarch64_store_lane0): Depend on TARGET_FLOAT rather than
TARGET_SIMD.
(aarch64_simd_mov_from_low): Likewise.  Use fmov if
Advanced SIMD is completely disabled.
(aarch64_simd_mov_from_high): Use SVE EXT instructions if
non-streaming SIMD isn't available.

gcc/testsuite/
* gcc.target/aarch64/movdf_2.c: New test.
* gcc.target/aarch64/movdi_3.c: Likewise.
* gcc.target/aarch64/movhf_2.c: Likewise.
* gcc.target/aarch64/movhi_2.c: Likewise.
* gcc.target/aarch64/movqi_2.c: Likewise.
* gcc.target/aarch64/movsf_2.c: Likewise.
* gcc.target/aarch64/movsi_2.c: Likewise.
* gcc.target/aarch64/movtf_3.c: Likewise.
* gcc.target/aarch64/movtf_4.c: Likewise.
* gcc.target/aarch64/movti_3.c: Likewise.
* gcc.target/aarch64/movti_4.c: Likewise.
* gcc.target/aarch64/movv16qi_4.c: Likewise.
* gcc.target/aarch64/movv16qi_5.c: Likewise.
* gcc.target/aarch64/movv8qi_4.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md| 43 ++
 gcc/config/aarch64/aarch64.cc | 22 +++--
 gcc/config/aarch64/aarch64.h  | 12 ++-
 gcc/config/aarch64/aarch64.md | 45 +-
 gcc/testsuite/gcc.target/aarch64/movdf_2.c| 51 +++
 gcc/testsuite/gcc.target/aarch64/movdi_3.c| 59 +
 gcc/testsuite/gcc.target/aarch64/movhf_2.c| 53 
 gcc/testsuite/gcc.target/aarch64/movhi_2.c| 61 +
 gcc/testsuite/gcc.target/aarch64/movqi_2.c| 59 +
 gcc/testsuite/gcc.target/aarch64/movsf_2.c| 51 +++
 gcc/testsuite/gcc.target/aarch64/movsi_2.c| 59 +
 gcc/testsuite/gcc.target/aarch64/movtf_3.c| 81 +
 gcc/testsuite/gcc.target/aarch64/movtf_4.c| 78 +
 gcc/testsuite/gcc.target/aarch64/movti_3.c| 86 +++
 gcc/testsuite/gcc.target/aarch64/movti_4.c| 83 ++
 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++
 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +
 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c  | 55 
 .../gcc.target/aarch64/sme/arm_neon_1.c   | 13 +++
 .../gcc.target/aarch64/sme/arm_neon_2.c   | 11 +++
 .../gcc.target/aarch64/sme/arm_neon_3.c   | 11 +++
 21 files changed, 1047 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsi_2.c
 create mode 100644 

[PATCH 08/16] aarch64: Add a VNx1TI mode

2022-11-13 Thread Richard Sandiford via Gcc-patches
Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others.  It's purely an RTL
convenience and isn't (yet) a valid storage mode.
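
The ADJUST_NUNITS arithmetic in the diff below can be checked with a small worked example.  This is a sketch that assumes aarch64_sve_vg counts the 64-bit granules in one SVE vector (so a 128-bit vector has vg == 2); the helper name is made up.

```c
/* Number of elements of ELEMENT_BITS bits held by NVECS SVE vectors
   when each vector contains VG 64-bit granules.  For 128-bit (TI)
   elements this is VG * NVECS / 2, which is exact because VG is
   always even.  */
static unsigned int
sve_mode_nunits (unsigned int vg, unsigned int nvecs,
		 unsigned int element_bits)
{
  return vg * nvecs * 64 / element_bits;
}
```

For a 256-bit vector length (vg == 4), a single vector holds 4 DI elements and 2 TI elements; at the minimum 128-bit length it holds exactly one TI element, which is where the VNx1 prefix comes from.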

gcc/
* config/aarch64/aarch64-modes.def: Add VNx1TI.
---
 gcc/config/aarch64/aarch64-modes.def | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index 0fd4c32ad0b..e960b649a6b 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -148,7 +148,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
for 8-bit, 16-bit, 32-bit and 64-bit elements respectively.  It isn't
strictly necessary to set the alignment here, since the default would
be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer.  */
-#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+#define SVE_MODES(NVECS, VB, VH, VS, VD, VT) \
   VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   \
@@ -156,6 +156,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
   ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+  ADJUST_NUNITS (VT##TI, exact_div (aarch64_sve_vg * NVECS, 2)); \
   ADJUST_NUNITS (VH##BF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
@@ -165,17 +166,23 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_ALIGNMENT (VH##HI, 16); \
   ADJUST_ALIGNMENT (VS##SI, 16); \
   ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VT##TI, 16); \
   ADJUST_ALIGNMENT (VH##BF, 16); \
   ADJUST_ALIGNMENT (VH##HF, 16); \
   ADJUST_ALIGNMENT (VS##SF, 16); \
   ADJUST_ALIGNMENT (VD##DF, 16);
 
-/* Give SVE vectors the names normally used for 256-bit vectors.
-   The actual number depends on command-line flags.  */
-SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
-SVE_MODES (2, VNx32, VNx16, VNx8, VNx4)
-SVE_MODES (3, VNx48, VNx24, VNx12, VNx6)
-SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
+/* Give SVE vectors names of the form VNxX, where X describes what is
+   stored in each 128-bit unit.  The actual size of the mode depends
+   on command-line flags.
+
+   VNx1TI isn't really a native SVE mode, but it can be useful in some
+   limited situations.  */
+VECTOR_MODE_WITH_PREFIX (VNx, INT, TI, 1, 1);
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2, VNx1)
+SVE_MODES (2, VNx32, VNx16, VNx8, VNx4, VNx2)
+SVE_MODES (3, VNx48, VNx24, VNx12, VNx6, VNx3)
+SVE_MODES (4, VNx64, VNx32, VNx16, VNx8, VNx4)
 
 /* Partial SVE vectors:
 
-- 
2.25.1



[PATCH 07/16] aarch64: Add a register class for w12-w15

2022-11-13 Thread Richard Sandiford via Gcc-patches
Some SME instructions use w12-w15 to index ZA.  This patch
adds a register class for that range.
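
As a quick sanity check of the range test and of the class mask, here is a standalone sketch.  It assumes that general register N has hard register number N, and the helper names are hypothetical; only the w12-w15 range comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: R12 and R15 have hard register numbers 12 and 15.  */
enum { FIRST_ZA_INDEX_REG = 12, LAST_ZA_INDEX_REG = 15 };

/* Equivalent of an IN_RANGE test on the register number.  */
static bool
za_index_regnum_p (unsigned int regno)
{
  return regno >= FIRST_ZA_INDEX_REG && regno <= LAST_ZA_INDEX_REG;
}

/* Build the first 32-bit word of the register-class mask by setting
   one bit per member register.  */
static uint32_t
za_index_class_mask (void)
{
  uint32_t mask = 0;
  for (unsigned int regno = 0; regno < 32; regno++)
    if (za_index_regnum_p (regno))
      mask |= UINT32_C (1) << regno;
  return mask;
}
```

Bits 12 to 15 give a mask of 0xf000, which is the value used for the new class in REG_CLASS_CONTENTS.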

gcc/
* config/aarch64/aarch64.h (ZA_INDEX_REGNUM_P): New macro.
(ZA_INDEX_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
* config/aarch64/aarch64.cc (aarch64_regno_regclass)
(aarch64_class_max_nregs, aarch64_register_move_cost): Handle
ZA_INDEX_REGS.
---
 gcc/config/aarch64/aarch64.cc | 12 +++-
 gcc/config/aarch64/aarch64.h  |  6 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b200d2a9f80..d29cfefee6b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -13553,6 +13553,9 @@ aarch64_label_mentioned_p (rtx x)
 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (ZA_INDEX_REGNUM_P (regno))
+return ZA_INDEX_REGS;
+
   if (STUB_REGNUM_P (regno))
 return STUB_REGS;
 
@@ -13917,6 +13920,7 @@ aarch64_class_max_nregs (reg_class_t regclass, 
machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
 {
+case ZA_INDEX_REGS:
 case STUB_REGS:
 case TAILCALL_ADDR_REGS:
 case POINTER_REGS:
@@ -16252,13 +16256,11 @@ aarch64_register_move_cost (machine_mode mode,
   const struct cpu_regmove_cost *regmove_cost
 = aarch64_tune_params.regmove_cost;
 
-  /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
-  || to == STUB_REGS)
+  /* Treat any subset of GENERAL_REGS as though it were GENERAL_REGS.  */
+  if (reg_class_subset_p (to, GENERAL_REGS))
 to = GENERAL_REGS;
 
-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
-  || from == STUB_REGS)
+  if (reg_class_subset_p (from, GENERAL_REGS))
 from = GENERAL_REGS;
 
   /* Make RDFFR very expensive.  In particular, if we know that the FFR
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b5877e7e61e..bfa28726221 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -643,6 +643,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
&& (REGNO) != R17_REGNUM \
&& (REGNO) != R30_REGNUM) \
 
+#define ZA_INDEX_REGNUM_P(REGNO) \
+  IN_RANGE (REGNO, R12_REGNUM, R15_REGNUM)
+
 #define FP_REGNUM_P(REGNO) \
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
 
@@ -666,6 +669,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 enum reg_class
 {
   NO_REGS,
+  ZA_INDEX_REGS,
   TAILCALL_ADDR_REGS,
   STUB_REGS,
   GENERAL_REGS,
@@ -690,6 +694,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {  \
   "NO_REGS",   \
+  "ZA_INDEX_REGS", \
   "TAILCALL_ADDR_REGS",\
   "STUB_REGS", \
   "GENERAL_REGS",  \
@@ -711,6 +716,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS \
 {  \
  { 0x00000000, 0x00000000, 0x00000000 },  /* NO_REGS */   \
+  { 0x0000f000, 0x00000000, 0x00000000 },  /* ZA_INDEX_REGS */ \
  { 0x00030000, 0x00000000, 0x00000000 },  /* TAILCALL_ADDR_REGS */\
  { 0x3ffcffff, 0x00000000, 0x00000000 },  /* STUB_REGS */ \
  { 0x7fffffff, 0x00000000, 0x00000003 },  /* GENERAL_REGS */  \
-- 
2.25.1



[PATCH 05/16] aarch64: Switch PSTATE.SM around calls

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds support for switching to the appropriate SME mode
for each call.  Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction.  If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.
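
The rule in the paragraph above can be written as a small decision function.  This is a sketch of the described behaviour only, with hypothetical type and enumerator names; it does not reflect how the new pass is structured.

```c
/* Hypothetical model of the three possible PSTATE.SM contracts.  */
enum sm_state { SM_OFF, SM_ON, SM_COMPATIBLE };

/* What has to be emitted before a call.  */
enum sm_switch
{
  SWITCH_NONE,			/* No mode change needed.  */
  SWITCH_SMSTART,		/* Unconditional SMSTART SM.  */
  SWITCH_SMSTOP,		/* Unconditional SMSTOP SM.  */
  SWITCH_SMSTART_IF_OFF,	/* SMSTART SM only if PSTATE.SM==0.  */
  SWITCH_SMSTOP_IF_ON		/* SMSTOP SM only if PSTATE.SM==1.  */
};

static enum sm_switch
switch_before_call (enum sm_state caller, enum sm_state callee)
{
  /* A streaming-compatible callee works in either mode, and a callee
     with the caller's own mode needs no change.  */
  if (callee == SM_COMPATIBLE || caller == callee)
    return SWITCH_NONE;

  /* A streaming-compatible caller only switches if the current mode
     is the opposite of the one that the callee needs.  */
  if (caller == SM_COMPATIBLE)
    return callee == SM_ON ? SWITCH_SMSTART_IF_OFF : SWITCH_SMSTOP_IF_ON;

  return callee == SM_ON ? SWITCH_SMSTART : SWITCH_SMSTOP;
}
```

After the call, the caller applies the inverse switch so that it continues in the mode it was in before.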

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late prologue/
epilogue insertion.  (It doesn't use md_reorg because later additions
need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.

Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance
critical, so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.

gcc/
* config/aarch64/aarch64-passes.def
(pass_late_thread_prologue_and_epilogue): New pass.
* config/aarch64/aarch64-sme.md: New file.
* config/aarch64/aarch64.md: Include it.
(*tb1): Rename to...
(@aarch64_tb): ...this.
(call, call_value, sibcall, sibcall_value): Don't require operand 2
to be a CONST_INT.
* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
the insn.
(make_pass_switch_sm_state): Declare.
* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
(TARGET_SME): Likewise.
(aarch64_frame::old_svcr_offset): New member variable.
(machine_function::call_switches_sm_state): Likewise.
(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
(aarch64_cfun_incoming_sm_state): New function.
(aarch64_call_switches_sm_state): Likewise.
(aarch64_callee_isa_mode): Likewise.
(aarch64_insn_callee_isa_mode): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_switch_pstate_sm): Likewise.
(aarch64_sme_mode_switch_regs): New class.
(aarch64_record_sme_mode_switch_args): New function.
(aarch64_finish_sme_mode_switch_args): Likewise.
(aarch64_function_arg): Handle the end marker by returning a
PARALLEL that contains the ABI cookie that we used previously
alongside the result of aarch64_finish_sme_mode_switch_args.
(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
(aarch64_function_arg_advance): If a call would switch SM state,
record all argument registers that would need to be saved around
the mode switch.
(aarch64_need_old_pstate_sm): New function.
(aarch64_layout_frame): Decide whether the frame needs to store the
incoming value of PSTATE.SM and allocate a save slot for it if so.
(aarch64_old_svcr_mem): New function.
(aarch64_read_old_svcr): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_expand_prologue): Initialize any SVCR save slot.
(aarch64_expand_call): Allow the cookie to be a PARALLEL that contains
both the UNSPEC_CALLEE_ABI value and a list of registers that need
to be preserved across a change to PSTATE.SM.  If the call does
involve such a change to PSTATE.SM, record the registers that
would be clobbered by this process.  Update call_switches_sm_state
accordingly.
(aarch64_emit_call_insn): Return the emitted instruction.
(aarch64_frame_pointer_required): New function.
(aarch64_switch_sm_state_for_call): Likewise.
(pass_data_switch_sm_state): New pass variable.
(pass_switch_sm_state): New pass class.
(make_pass_switch_sm_state): New function.
(TARGET_FRAME_POINTER_REQUIRED): Define.
* config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.

gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_1.c: New test.
* gcc.target/aarch64/sme/call_sm_switch_2.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_4.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_6.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_7.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
* gcc.target/aarch64/s

[PATCH 14/16] aarch64: Add support for arm_locally_streaming

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds support for the arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI.  The attribute is valid but redundant for
arm_streaming functions.

gcc/
* config/aarch64/aarch64.cc (aarch64_attribute_table): Add
arm_locally_streaming.
(aarch64_fndecl_is_locally_streaming): New function.
(aarch64_fndecl_sm_state): Handle arm_locally_streaming functions.
(aarch64_cfun_enables_pstate_sm): New function.
(aarch64_add_offset): Add an argument that specifies whether
the streaming vector length should be used instead of the
prevailing one.
(aarch64_split_add_offset, aarch64_add_sp, aarch64_sub_sp): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_mov_immediate): Update calls accordingly.
(aarch64_need_old_pstate_sm): Return true for locally-streaming
streaming-compatible functions.
(aarch64_layout_frame): Force all call-preserved Z and P registers
to be saved and restored if the function switches PSTATE.SM in the
prologue.
(aarch64_get_separate_components): Disable shrink-wrapping of
such Z and P saves and restores.
(aarch64_use_late_prologue_epilogue): New function.
(aarch64_expand_prologue): Measure SVE lengths in the streaming
vector length for locally-streaming functions, then emit code
to enable streaming mode.  Combine separate SMSTART ZA and
SMSTART SM instructions into a single SMSTART where possible.
(aarch64_expand_epilogue): Likewise in reverse.
(TARGET_USE_LATE_PROLOGUE_EPILOGUE): Define.
* config/aarch64/aarch64-sme.md (UNSPEC_SMSTART): New unspec.
(UNSPEC_SMSTOP): Likewise.
(aarch64_smstart, aarch64_smstop): New patterns.

gcc/testsuite/
* gcc.target/aarch64/sme/locally_streaming_1.c: New test.
* gcc.target/aarch64/sme/locally_streaming_2.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-sme.md |  82 
 gcc/config/aarch64/aarch64.cc | 237 --
 .../aarch64/sme/locally_streaming_1.c | 433 ++
 .../aarch64/sme/locally_streaming_2.c | 177 +++
 .../aarch64/sme/locally_streaming_3.c | 273 +++
 5 files changed, 1164 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_3.c

diff --git a/gcc/config/aarch64/aarch64-sme.md 
b/gcc/config/aarch64/aarch64-sme.md
index 7b3ccea2e11..70be7adba28 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -281,6 +281,88 @@ (define_insn_and_split "aarch64_restore_za"
 DONE;
   }
 )
+
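+;; -------------------------------------------------------------------------
+;; ---- Combined PSTATE.SM and PSTATE.ZA management
+;; -------------------------------------------------------------------------
+;; Includes
+;; - SMSTART
+;; - SMSTOP
+;; -------------------------------------------------------------------------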
+
+(define_c_enum "unspec" [
+  UNSPEC_SMSTART
+  UNSPEC_SMSTOP
+])
+
+;; Enable SM and ZA, starting with fresh ZA contents.  This is only valid when
+;; SME is present, but the pattern does not depend on TARGET_SME since it can
+;; be used conditionally.
+(define_insn "aarch64_smstart"
+  [(unspec_volatile [(const_int 0)] UNSPEC_SMSTART)
+   (clobber (reg:V4x16QI V0_REGNUM))
+   (clobber (reg:V4x16QI V4_REGNUM))
+   (clobber (reg:V4x16QI V8_REGNUM))
+   (clobber (reg:V4x16QI V12_REGNUM))
+   (clobber (reg:V4x16QI V16_REGNUM))
+   (clobber (reg:V4x16QI V20_REGNUM))
+   (clobber (reg:V4x16QI V24_REGNUM))
+   (clobber (reg:V4x16QI V28_REGNUM))
+   (clobber (reg:VNx16BI P0_REGNUM))
+   (clobber (reg:VNx16BI P1_REGNUM))
+   (clobber (reg:VNx16BI P2_REGNUM))
+   (clobber (reg:VNx16BI P3_REGNUM))
+   (clobber (reg:VNx16BI P4_REGNUM))
+   (clobber (reg:VNx16BI P5_REGNUM))
+   (clobber (reg:VNx16BI P6_REGNUM))
+   (clobber (reg:VNx16BI P7_REGNUM))
+   (clobber (reg:VNx16BI P8_REGNUM))
+   (clobber (reg:VNx16BI P9_REGNUM))
+   (clobber (reg:VNx16BI P10_REGNUM))
+   (clobber (reg:VNx16BI P11_REGNUM))
+   (clobber (reg:VNx16BI P12_REGNUM))
+   (clobber (reg:VNx16BI P13_REGNUM))
+   (clobber (reg:VNx16BI P14_REGNUM))
+   (clobber (reg:VNx16BI P15_REGNUM))
+   (clobber (reg:VNx16QI ZA_REGNUM))]
+  ""
+  "smstart"
+)
+
+;; Disable SM and ZA, and discard its current contents.  This is only valid
+;; when SME is present, but the pattern does not depend on TARGET_SME since
+;; it can be used conditionally.
+(define_insn "aarch64_smstop"
+  [(unspec_volatile [(reg:VNx16QI OLD_ZA_REGNUM)] UNSPEC_SMSTOP)
+   (clobber (reg:V4x16QI V0_REGNUM))
+   (clobber (reg:V4x16QI V4_

[PATCH 10/16] aarch64: Generalise unspec_based_function_base

2022-11-13 Thread Richard Sandiford via Gcc-patches
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.
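
For readers who want to see the selection rule in isolation, here is a
small standalone model of it.  It is only a sketch: suffix_model,
instance_model and unspec_chooser are hypothetical stand-ins for the GCC
classes, and the unspec codes are arbitrary integers.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-suffix type information.
struct suffix_model
{
  bool integer_p;
  bool unsigned_p;
};

// Hypothetical stand-in for function_instance, with two type suffixes.
struct instance_model
{
  suffix_model suffixes[2];

  const suffix_model &type_suffix (unsigned int i) const
  {
    assert (i < 2);
    return suffixes[i];
  }
};

// Model of the generalised base class: the suffix that picks the
// unspec is a constructor parameter that defaults to 0.
struct unspec_chooser
{
  constexpr unspec_chooser (int for_sint, int for_uint, int for_fp,
                            unsigned int suffix_index = 0)
    : m_for_sint (for_sint), m_for_uint (for_uint), m_for_fp (for_fp),
      m_suffix_index (suffix_index)
  {}

  // Floating-point suffixes pick the FP code; integer suffixes pick
  // the unsigned or signed code.
  int unspec_for (const instance_model &instance) const
  {
    const suffix_model &suffix = instance.type_suffix (m_suffix_index);
    return (!suffix.integer_p ? m_for_fp
            : suffix.unsigned_p ? m_for_uint
            : m_for_sint);
  }

  int m_for_sint;
  int m_for_uint;
  int m_for_fp;
  unsigned int m_suffix_index;
};
```

Existing users keep the default index of 0, so their behaviour should be
unchanged; only new SME-style users would pass 1.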

gcc/
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Allow type suffix 1 to determine
the mode of the operation.
(unspec_based_fused_function): Update accordingly.
(unspec_based_fused_lane_function): Likewise.
---
 .../aarch64/aarch64-sve-builtins-functions.h  | 29 ---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h 
b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 472e26c17ff..2fd135aab07 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -229,18 +229,21 @@ class unspec_based_function_base : public function_base
 public:
   CONSTEXPR unspec_based_function_base (int unspec_for_sint,
int unspec_for_uint,
-   int unspec_for_fp)
+   int unspec_for_fp,
+   unsigned int suffix_index = 0)
 : m_unspec_for_sint (unspec_for_sint),
   m_unspec_for_uint (unspec_for_uint),
-  m_unspec_for_fp (unspec_for_fp)
+  m_unspec_for_fp (unspec_for_fp),
+  m_suffix_index (suffix_index)
   {}
 
   /* Return the unspec code to use for INSTANCE, based on type suffix 0.  */
   int
   unspec_for (const function_instance &instance) const
   {
-return (!instance.type_suffix (0).integer_p ? m_unspec_for_fp
-   : instance.type_suffix (0).unsigned_p ? m_unspec_for_uint
+auto &suffix = instance.type_suffix (m_suffix_index);
+return (!suffix.integer_p ? m_unspec_for_fp
+   : suffix.unsigned_p ? m_unspec_for_uint
: m_unspec_for_sint);
   }
 
@@ -249,6 +252,9 @@ public:
   int m_unspec_for_sint;
   int m_unspec_for_uint;
   int m_unspec_for_fp;
+
+  /* Which type suffix is used to choose between the unspecs.  */
+  unsigned int m_suffix_index;
 };
 
 /* A function_base for functions that have an associated unspec code.
@@ -301,7 +307,8 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-return e.use_exact_insn (CODE (unspec_for (e), e.vector_mode (0)));
+return e.use_exact_insn (CODE (unspec_for (e),
+  e.vector_mode (m_suffix_index)));
   }
 };
 
@@ -355,16 +362,16 @@ public:
   {
 int unspec = unspec_for (e);
 insn_code icode;
-if (e.type_suffix (0).float_p)
+if (e.type_suffix (m_suffix_index).float_p)
   {
/* Put the operands in the normal (fma ...) order, with the accumulator
   last.  This fits naturally since that's also the unprinted operand
   in the asm output.  */
e.rotate_inputs_left (0, e.pred != PRED_none ? 4 : 3);
-   icode = code_for_aarch64_sve (unspec, e.vector_mode (0));
+   icode = code_for_aarch64_sve (unspec, e.vector_mode (m_suffix_index));
   }
 else
-  icode = INT_CODE (unspec, e.vector_mode (0));
+  icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
 return e.use_exact_insn (icode);
   }
 };
@@ -385,16 +392,16 @@ public:
   {
 int unspec = unspec_for (e);
 insn_code icode;
-if (e.type_suffix (0).float_p)
+if (e.type_suffix (m_suffix_index).float_p)
   {
/* Put the operands in the normal (fma ...) order, with the accumulator
   last.  This fits naturally since that's also the unprinted operand
   in the asm output.  */
e.rotate_inputs_left (0, e.pred != PRED_none ? 5 : 4);
-   icode = code_for_aarch64_lane (unspec, e.vector_mode (0));
+   icode = code_for_aarch64_lane (unspec, e.vector_mode (m_suffix_index));
   }
 else
-  icode = INT_CODE (unspec, e.vector_mode (0));
+  icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
 return e.use_exact_insn (icode);
   }
 };
-- 
2.25.1



[PATCH 16/16] aarch64: Update sibcall handling for SME

2022-11-13 Thread Richard Sandiford via Gcc-patches
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").

Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function.  Any function can tail-call a streaming-compatible function.
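
The rules above can be modelled as a pair of bit tests.  This is a
sketch under assumptions: the enum names and values are hypothetical
(streaming-compatible is modelled as "no mode bit required"), and the
caller state is the state on entry to the function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: a streaming-compatible function requires
// neither mode bit, so it is represented as 0.
enum sm_state : std::uint8_t { SM_COMPATIBLE = 0, SM_OFF = 1, SM_ON = 2 };
enum za_state : std::uint8_t { ZA_PRIVATE, ZA_SHARED };

struct fn_state
{
  sm_state sm;
  za_state za;
};

// Return true if a function entered in state CALLER may tail-call a
// function whose type has state CALLEE.
bool ok_for_sibcall (fn_state caller, fn_state callee)
{
  // The callee must not require a PSTATE.SM setting that the caller's
  // incoming state does not guarantee.
  if (callee.sm & ~caller.sm)
    return false;
  // The PSTATE.ZA setting must match exactly.
  if (callee.za != caller.za)
    return false;
  return true;
}

// Spot checks of the three rules stated in the text.
void check_sibcall_rules ()
{
  fn_state normal = {SM_OFF, ZA_PRIVATE};
  fn_state streaming = {SM_ON, ZA_PRIVATE};
  fn_state compatible = {SM_COMPATIBLE, ZA_PRIVATE};

  assert (ok_for_sibcall (normal, normal));
  assert (ok_for_sibcall (streaming, streaming));
  assert (!ok_for_sibcall (normal, streaming));
  assert (ok_for_sibcall (normal, compatible));
  assert (ok_for_sibcall (streaming, compatible));
}
```

Note that in this model a streaming-compatible caller can only tail-call
another streaming-compatible function, since it guarantees neither
PSTATE.SM setting.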

gcc/
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
Enforce PSTATE.SM and PSTATE.ZA restrictions.
(aarch64_expand_epilogue): Save and restore the arguments
to a sibcall around any change to PSTATE.SM.

gcc/testsuite/
* gcc.target/aarch64/sme/locally_streaming_4.c: New test.
* gcc.target/aarch64/sme/sibcall_1.c: Likewise.
* gcc.target/aarch64/sme/sibcall_2.c: Likewise.
* gcc.target/aarch64/sme/sibcall_3.c: Likewise.
* gcc.target/aarch64/sme/sibcall_4.c: Likewise.
* gcc.target/aarch64/sme/sibcall_5.c: Likewise.
* gcc.target/aarch64/sme/sibcall_6.c: Likewise.
* gcc.target/aarch64/sme/sibcall_7.c: Likewise.
* gcc.target/aarch64/sme/sibcall_8.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc |   9 +-
 .../aarch64/sme/locally_streaming_4.c | 129 ++
 .../gcc.target/aarch64/sme/sibcall_1.c|  45 ++
 .../gcc.target/aarch64/sme/sibcall_2.c|  45 ++
 .../gcc.target/aarch64/sme/sibcall_3.c|  45 ++
 .../gcc.target/aarch64/sme/sibcall_4.c|  45 ++
 .../gcc.target/aarch64/sme/sibcall_5.c|  45 ++
 .../gcc.target/aarch64/sme/sibcall_6.c|  26 
 .../gcc.target/aarch64/sme/sibcall_7.c|  26 
 .../gcc.target/aarch64/sme/sibcall_8.c|  19 +++
 10 files changed, 433 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_8.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9a4a469a078..0d4c20f5c6a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8110,6 +8110,11 @@ aarch64_function_ok_for_sibcall (tree, tree exp)
   if (crtl->abi->id () != expr_callee_abi (exp).id ())
 return false;
 
+  tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+  if (aarch64_fntype_sm_state (fntype) & ~aarch64_cfun_incoming_sm_state ())
+return false;
+  if (aarch64_fntype_za_state (fntype) != aarch64_cfun_incoming_za_state ())
+return false;
   return true;
 }
 
@@ -11236,7 +11241,9 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
  aarch64_isa_flags);
   aarch64_sme_mode_switch_regs args_switch;
-  if (crtl->return_rtx && REG_P (crtl->return_rtx))
+  if (sibcall)
+   args_switch.add_call_args (sibcall);
+  else if (crtl->return_rtx && REG_P (crtl->return_rtx))
args_switch.add_reg (GET_MODE (crtl->return_rtx),
 REGNO (crtl->return_rtx));
   args_switch.emit_prologue ();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c 
b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c
new file mode 100644
index 000..b0e4759ed11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c
@@ -0,0 +1,129 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include 
+#include 
+
+/*
+** test_d0:
+** ...
+** fmov x10, d0
+** smstop  sm
+** fmov d0, x10
+** ...
+*/
+void consume_d0 (double d0);
+
+void __attribute__((arm_locally_streaming))
+test_d0 ()
+{
+  consume_d0 (1.0);
+}
+
+/*
+** test_d7:
+** ...
+** fmov x10, d0
+** fmov x11, d1
+** fmov x12, d2
+** fmov x13, d3
+** fmov x14, d4
+** fmov x15, d5
+** fmov x16, d6
+** fmov x17, d7
+** smstop  sm
+** fmov d0, x10
+** fmov d1, x11
+** fmov d2, x12
+** fmov d3, x13
+** fmov d4, x14
+** fmov d5, x15
+** fmov d6, x16
+** fmov d7, x17
+** ...
+*/
+void consume_d7 (double d0, double d1, double d2, double d3,
+double d4, double d5, double d6, double d7);
+void __attribute__((arm_locally_streaming))
+test_d7 ()
+{
+  consume_d7 (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
+}
+
+/*
+**

[PATCH 06/16] aarch64: Add support for SME ZA attributes

2022-11-13 Thread Richard Sandiford via Gcc-patches
SME has an array called ZA that can be enabled and disabled separately
from streaming mode.  A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.

In C and C++, the state of PSTATE.ZA is controlled using function
attributes.  If a function's type has an arm_shared_za attribute,
PSTATE.ZA==1 on entry to the function and on return from the function,
and the function shares the contents of ZA with its caller.  Otherwise,
the caller and callee have separate ZA contexts; they do not use ZA to
share data.

Although normal non-arm_shared_za functions have a separate
ZA context from their callers, nested uses of ZA are expected
to be rare.  The ABI therefore defines a cooperative lazy saving
scheme that allows saves and restores of ZA to be kept to a minimum.
(Callers still have the option of doing a full save and restore
if they prefer.)

Functions that want to use ZA internally have an arm_new_za
attribute, which tells the compiler to enable PSTATE.ZA for
the duration of the function body.  It also tells the compiler
to commit any lazy save initiated by a caller.

There is also a function type attribute called arm_preserves_za,
which a function can use to guarantee to callers that it doesn't
change ZA (and so that callers don't need to save and restore it).
A known flaw is that it should be possible to assign preserves-ZA
functions to normal function pointers, but currently that results
in a diagnostic.  (The opposite way is invalid and rightly rejected.)

gcc/
* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
(aarch64_output_rdsvl, aarch64_restore_za): Declare.
* config/aarch64/constraints.md (UsR): New constraint.
* config/aarch64/aarch64.md (ZA_REGNUM, OLD_ZA_REGNUM): New constants.
(UNSPEC_SME_VQ): New unspec.
(arches): Add sme.
(arch_enabled): Handle it.
(*cb1): Rename to...
(aarch64_cb1): ...this.
(*movsi_aarch64): Add an alternative for RDSVL.
(*movdi_aarch64): Likewise.
* config/aarch64/aarch64-sme.md (UNSPEC_SMSTART_ZA, UNSPEC_SMSTOP_ZA)
(UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE, UNSPEC_READ_TPIDR2)
(UNSPEC_CLEAR_TPIDR2): New unspecs.
(aarch64_smstart_za, aarch64_smstop_za, aarch64_tpidr2_save)
(aarch64_tpidr2_restore, aarch64_read_tpidr2, aarch64_clear_tpidr2)
(aarch64_save_za, aarch64_restore_za): New patterns.
* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
(FIXED_REGISTERS, REGISTER_NAMES): Add the ZA registers.
(CALL_USED_REGISTERS): Replace with...
(CALL_REALLY_USED_REGISTERS): ...this and add the ZA registers.
(FIRST_PSEUDO_REGISTER): Bump to include ZA registers.
(ZA_REGS): New register class.
(REG_CLASS_NAMES): Update accordingly.
(REG_CLASS_CONTENTS): Likewise.
(aarch64_frame::has_new_za_state): New member variable.
(machine_function::tpidr2_block): Likewise.
(machine_function::tpidr2_block_ptr): Likewise.
(machine_function::za_save_buffer): Likewise.
(CUMULATIVE_ARGS::preserves_za): Likewise.
* config/aarch64/aarch64.cc (handle_arm_new_za_attribute): New
function.
(attr_arm_new_za_exclusions): New variable.
(attr_no_arm_new_za): Likewise.
(aarch64_attribute_table): Add arm_new_za, arm_shared_za, and
arm_preserves_za.
(aarch64_hard_regno_nregs): Handle the ZA registers.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_regno_regclass): Likewise.
(aarch64_class_max_nregs): Likewise.
(aarch64_conditional_register_usage): Likewise.
(aarch64_fntype_za_state): New function.
(aarch64_fntype_isa_mode): Call it.
(aarch64_fntype_preserves_za): New function.
(aarch64_fndecl_has_new_za_state): Likewise.
(aarch64_fndecl_za_state): Likewise.
(aarch64_fndecl_isa_mode): Call it.
(aarch64_fndecl_preserves_za): New function.
(aarch64_cfun_incoming_za_state): Likewise.
(aarch64_cfun_has_new_za_state): Likewise.
(aarch64_sme_vq_immediate): Likewise.
(aarch64_sme_vq_unspec_p): Likewise.
(aarch64_rdsvl_immediate_p): Likewise.
(aarch64_output_rdsvl): Likewise.
(aarch64_expand_mov_immediate): Handle RDSVL immediates.
(aarch64_mov_operand_p): Likewise.
(aarch64_init_cumulative_args): Record whether the call preserves ZA.
(aarch64_layout_frame): Check whether the current function creates
new ZA state.  Record that it clobbers LR if so.
(aarch64_epilogue_uses): Handle ZA_REGNUM.
(aarch64_expand_prologue): Handle functions that create new ZA state.
(aarch64_expand_epilogue): Likewise.
(aarch64_create_tpidr2_block): New function.
(aarch64_restore_za): Likewise.
(aarch64_start_call_ar

[PATCH 11/16] aarch64: Generalise _m rules for SVE intrinsics

2022-11-13 Thread Richard Sandiford via Gcc-patches
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.

That rule began to break down in SVE2, and it continues to do
so in SME.  This patch therefore adds a virtual function to
specify whether the separate initial argument is present or not.
The old rule is still the default.
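
As an illustration of the approach, the following standalone sketch
models a shape class with a default rule and one override.  The
*_model names are hypothetical; only has_merge_argument_p and the
"? 0 : 1" choice of merge argument are taken from the patch.

```cpp
#include <cassert>

enum predication_model { PRED_none, PRED_m, PRED_x, PRED_z };

// Hypothetical stand-in for function_instance.
struct instance_model
{
  predication_model pred;
};

// Hypothetical stand-in for function_shape.
struct shape_model
{
  virtual ~shape_model () = default;

  // Default: the original SVE rule, under which only unary _m
  // functions have a separate initial merge argument.
  virtual bool has_merge_argument_p (const instance_model &instance,
                                     unsigned int nargs) const
  {
    return nargs == 1 && instance.pred == PRED_m;
  }
};

// A shape that always has the initial argument, whatever the
// predication, in the way that unary_convert_narrowt does.
struct narrowt_shape_model : shape_model
{
  bool has_merge_argument_p (const instance_model &,
                             unsigned int) const override
  {
    return true;
  }
};

// Index of the argument that supplies the inactive lanes.
unsigned int merge_argno (const shape_model &shape,
                          const instance_model &instance,
                          unsigned int nops)
{
  return shape.has_merge_argument_p (instance, nops) ? 0 : 1;
}
```

With this split, a new shape that breaks the old rule only needs to
override one function instead of adding another special case at each
caller.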

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_shape::has_merge_argument_p): New member function.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::check_gp_argument): Use it.
(function_expander::get_fallback_value): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(apply_predication): Likewise.
(unary_convert_narrowt_def::has_merge_argument_p): New function.
---
 gcc/config/aarch64/aarch64-sve-builtins-shapes.cc | 10 --
 gcc/config/aarch64/aarch64-sve-builtins.cc|  4 ++--
 gcc/config/aarch64/aarch64-sve-builtins.h | 13 +
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 8e26bd8a60f..5b47dff0b41 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -66,8 +66,8 @@ apply_predication (const function_instance &instance, tree 
return_type,
 the same type as the result.  For unary_convert_narrowt it also
 provides the "bottom" half of active elements, and is present
 for all types of predication.  */
-  if ((argument_types.length () == 2 && instance.pred == PRED_m)
- || instance.shape == shapes::unary_convert_narrowt)
+  auto nargs = argument_types.length () - 1;
+  if (instance.shape->has_merge_argument_p (instance, nargs))
argument_types.quick_insert (0, return_type);
 }
 }
@@ -3238,6 +3238,12 @@ SHAPE (unary_convert)
predicate.  */
 struct unary_convert_narrowt_def : public overloaded_base<1>
 {
+  bool
+  has_merge_argument_p (const function_instance &, unsigned int) const override
+  {
+return true;
+  }
+
   void
   build (function_builder &b, const function_group_info &group) const override
   {
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index cb3eb76dd77..450a8d958a8 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2152,7 +2152,7 @@ function_resolver::check_gp_argument (unsigned int nops,
   if (pred != PRED_none)
 {
   /* Unary merge operations should use resolve_unary instead.  */
-  gcc_assert (nops != 1 || pred != PRED_m);
+  gcc_assert (!shape->has_merge_argument_p (*this, nops));
   nargs = nops + 1;
   if (!check_num_arguments (nargs)
  || !require_vector_type (i, VECTOR_TYPE_svbool_t))
@@ -2790,7 +2790,7 @@ function_expander::get_fallback_value (machine_mode mode, 
unsigned int nops,
 
   gcc_assert (pred == PRED_m || pred == PRED_x);
   if (merge_argno == DEFAULT_MERGE_ARGNO)
-merge_argno = nops == 1 && pred == PRED_m ? 0 : 1;
+merge_argno = shape->has_merge_argument_p (*this, nops) ? 0 : 1;
 
   if (merge_argno == 0)
 return args[argno++];
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
b/gcc/config/aarch64/aarch64-sve-builtins.h
index 0d130b871d0..623b9e3a07b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -636,6 +636,9 @@ public:
 class function_shape
 {
 public:
+  virtual bool has_merge_argument_p (const function_instance &,
+unsigned int) const;
+
   virtual bool explicit_type_suffix_p (unsigned int) const = 0;
 
   /* Define all functions associated with the given group.  */
@@ -877,6 +880,16 @@ function_base::call_properties (const function_instance 
&instance) const
   return flags;
 }
 
+/* Return true if INSTANCE (which has NARGS arguments) has an initial
+   vector argument whose only purpose is to specify the values of
+   inactive lanes.  */
+inline bool
+function_shape::has_merge_argument_p (const function_instance &instance,
+ unsigned int nargs) const
+{
+  return nargs == 1 && instance.pred == PRED_m;
+}
+
 }
 
 #endif
-- 
2.25.1



[PATCH 09/16] aarch64: Make AARCH64_FL_SVE requirements explicit

2022-11-13 Thread Richard Sandiford via Gcc-patches
So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code.  It's therefore necessary to make
the implicit SVE requirement explicit.
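
A toy model of the difference, for illustration only: the flag values,
the group names and the available () helper below are all hypothetical
and are not the GCC definitions.

```cpp
#include <cassert>
#include <cstdint>

using feature_flags = std::uint64_t;

// Hypothetical flag values, chosen only for this sketch.
constexpr feature_flags FL_SVE = feature_flags (1) << 0;
constexpr feature_flags FL_BF16 = feature_flags (1) << 1;

// Hypothetical stand-in for one entry in the function group table.
struct group_model
{
  const char *name;
  feature_flags required;
};

// A group is usable if every required flag is enabled.
constexpr bool available (feature_flags enabled, const group_model &group)
{
  return (group.required & ~enabled) == 0;
}

// With explicit requirements, each table entry spells out FL_SVE
// itself, and an entry that any code may call can require nothing.
constexpr group_model sve_group = { "sve_only", FL_SVE };
constexpr group_model bf16_group = { "sve_bf16", FL_SVE | FL_BF16 };
constexpr group_model any_code_group = { "callable_anywhere", 0 };
```

If the table added FL_SVE to every entry implicitly, there would be no
way to express the last entry, which is the case the patch is making
room for.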

gcc/
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Remove
implied requirement on SVE.
* config/aarch64/aarch64-sve-builtins-base.def: Explicitly require SVE.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.def | 26 ---
 .../aarch64/aarch64-sve-builtins-sve2.def | 18 -
 gcc/config/aarch64/aarch64-sve-builtins.cc|  2 +-
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def 
b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index a2d0cea6c5b..d35cdffe20f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -17,7 +17,7 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
-#define REQUIRED_EXTENSIONS 0
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE
 DEF_SVE_FUNCTION (svabd, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svabs, unary, all_float_and_signed, mxz)
 DEF_SVE_FUNCTION (svacge, compare_opt_n, all_float, implicit)
@@ -255,7 +255,7 @@ DEF_SVE_FUNCTION (svzip2, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2, binary_pred, all_pred, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_SM_OFF
 DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
 DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
 DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
@@ -321,7 +321,7 @@ DEF_SVE_FUNCTION (svtssel, binary_uint, all_float, none)
 DEF_SVE_FUNCTION (svwrffr, setffr, none, implicit)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_BF16
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_BF16
 DEF_SVE_FUNCTION (svbfdot, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfdot_lane, ternary_bfloat_lanex2, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalb, ternary_bfloat_opt_n, s_float, none)
@@ -332,27 +332,33 @@ DEF_SVE_FUNCTION (svcvt, unary_convert, cvt_bfloat, mxz)
 DEF_SVE_FUNCTION (svcvtnt, unary_convert_narrowt, cvt_bfloat, mx)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_BF16 | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+| AARCH64_FL_BF16 \
+| AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svbfmmla, ternary_bfloat, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_I8MM
 DEF_SVE_FUNCTION (svsudot, ternary_intq_uintq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svsudot_lane, ternary_intq_uintq_lane, s_signed, none)
 DEF_SVE_FUNCTION (svusdot, ternary_uintq_intq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svusdot_lane, ternary_uintq_intq_lane, s_signed, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+| AARCH64_FL_I8MM \
+| AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
 DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F32MM | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+| AARCH64_FL_F32MM \
+| AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svmmla, mmla, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F64MM
 DEF_SVE_FUNCTION (svtrn1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svuzp1q, binary, all_data, none)
@@ -361,7 +367,9 @@ DEF_SVE_FUNCTION (svzip1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM | AARCH64_FL_SM_OFF
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+| AARCH64_FL_F64MM \
+| AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 #undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def 
b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 4e0466b4cf8..3c0a0e072f2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -17,7 +17,7 @@
along with GCC; see the file COPYING3.  If not see


[PATCH 15/16] aarch64: Enforce inlining restrictions for SME

2022-11-13 Thread Richard Sandiford via Gcc-patches
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.

A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
(The callee's function type doesn't matter here: one locally-streaming
function can be inlined into another.)
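
The two conditions can be sketched as follows.  This is a simplified
model, not the GCC code: fn_model and the flag constants are
hypothetical, and "body_mode" stands for the PSTATE.SM setting that the
function body requires (for the callee) or guarantees (for the caller).

```cpp
#include <cassert>
#include <cstdint>

using isa_flags = std::uint32_t;

// Hypothetical feature and mode bits for this sketch.  A
// streaming-compatible body has neither mode bit set.
constexpr isa_flags FL_SVE = 1u << 0;
constexpr isa_flags FL_SME = 1u << 1;
constexpr isa_flags MODE_SM_OFF = 1u << 8;
constexpr isa_flags MODE_SM_ON = 1u << 9;

struct fn_model
{
  isa_flags features;    // architecture features the function uses
  isa_flags body_mode;   // PSTATE.SM setting of the function body
  bool has_new_za_state; // function creates its own ZA state
};

bool can_inline (const fn_model &caller, const fn_model &callee)
{
  // ZA switches are only managed at function scope, so a callee with
  // local ZA state must stay out of line.
  if (callee.has_new_za_state)
    return false;

  // Everything the callee's body needs must be provided by the
  // caller's body: both the features and the PSTATE.SM mode.
  isa_flags caller_isa = caller.features | caller.body_mode;
  isa_flags callee_isa = callee.features | callee.body_mode;
  return (callee_isa & ~caller_isa) == 0;
}
```

Because the check looks at the body's mode rather than the function
type, two locally-streaming functions both count as MODE_SM_ON here and
so can be inlined into each other.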

gcc/
* config/aarch64/aarch64.cc (aarch64_function_attribute_inlinable_p):
New function.
(aarch64_can_inline_p): Use aarch64_fndecl_isa_mode to populate
the ISA mode bits when comparing the ISA flags of the two functions.
(TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): Define.

gcc/testsuite/
* gcc.target/aarch64/sme/inlining_1.c: New test.
* gcc.target/aarch64/sme/inlining_2.c: Likewise.
* gcc.target/aarch64/sme/inlining_3.c: Likewise.
* gcc.target/aarch64/sme/inlining_4.c: Likewise.
* gcc.target/aarch64/sme/inlining_5.c: Likewise.
* gcc.target/aarch64/sme/inlining_6.c: Likewise.
* gcc.target/aarch64/sme/inlining_7.c: Likewise.
* gcc.target/aarch64/sme/inlining_8.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 33 ---
 .../gcc.target/aarch64/sme/inlining_1.c   | 26 +++
 .../gcc.target/aarch64/sme/inlining_2.c   | 26 +++
 .../gcc.target/aarch64/sme/inlining_3.c   | 26 +++
 .../gcc.target/aarch64/sme/inlining_4.c   | 26 +++
 .../gcc.target/aarch64/sme/inlining_5.c   | 26 +++
 .../gcc.target/aarch64/sme/inlining_6.c   | 18 ++
 .../gcc.target/aarch64/sme/inlining_7.c   | 18 ++
 .../gcc.target/aarch64/sme/inlining_8.c   | 18 ++
 9 files changed, 212 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_8.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 48bf2de4b3d..9a4a469a078 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20554,6 +20554,17 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, 
tree args, int)
   return ret;
 }
 
+/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P.  Use an opt-out
+   rather than an opt-in list.  */
+
+static bool
+aarch64_function_attribute_inlinable_p (const_tree fndecl)
+{
+  /* A function that has local ZA state cannot be inlined into its caller,
+ since we only support managing ZA switches at function scope.  */
+  return !aarch64_fndecl_has_new_za_state (fndecl);
+}
+
 /* Helper for aarch64_can_inline_p.  In the case where CALLER and CALLEE are
tri-bool options (yes, no, don't care) and the default value is
DEF, determine whether to reject inlining.  */
@@ -20597,12 +20608,20 @@ aarch64_can_inline_p (tree caller, tree callee)
   : target_option_default_node);
 
   /* Callee's ISA flags should be a subset of the caller's.  */
-  if ((caller_opts->x_aarch64_asm_isa_flags
-   & callee_opts->x_aarch64_asm_isa_flags)
-  != callee_opts->x_aarch64_asm_isa_flags)
+  auto caller_asm_isa = (caller_opts->x_aarch64_asm_isa_flags
+& ~AARCH64_FL_ISA_MODES);
+  auto callee_asm_isa = (callee_opts->x_aarch64_asm_isa_flags
+& ~AARCH64_FL_ISA_MODES);
+  if (callee_asm_isa & ~caller_asm_isa)
 return false;
-  if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags)
-  != callee_opts->x_aarch64_isa_flags)
+
+  auto caller_isa = ((caller_opts->x_aarch64_isa_flags
+ & ~AARCH64_FL_ISA_MODES)
+| aarch64_fndecl_isa_mode (caller));
+  auto callee_isa = ((callee_opts->x_aarch64_isa_flags
+ & ~AARCH64_FL_ISA_MODES)
+| aarch64_fndecl_isa_mode (callee));
+  if (callee_isa & ~caller_isa)
 return false;
 
   /* Allow non-strict aligned functions inlining into strict
@@ -29150,6 +29169,10 @@ aarch64_run_selftests (void)
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE aarch64_can_eliminate
 
+#undef TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P
+#define TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P \
+  aarch64_function_attribute_inlinable_p
+
 #undef TARGET_CAN_INLINE_P
 #define TARGET_CAN_INLINE_P aarch64_can_inline_p
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c
new file mode 100644
index 000..63d23cb8b41
--- 

[PATCH 12/16] aarch64: Tweaks to function_resolver::resolve_to

2022-11-13 Thread Richard Sandiford via Gcc-patches
This patch adds a new interface to function_resolver::resolve_to
in which the mode suffix stays the same (which is the common case).
It then moves the handling of explicit first type suffixes from
function_resolver::resolve_unary to this new function.

This makes things slightly simpler for existing code.  However, the
main reason for doing it is that it helps require_derived_vector_type
handle explicit type suffixes correctly, which in turn improves the
error messages generated by the manual C overloading code in a
follow-up SME patch.
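
As a rough illustration of the forwarding overload described above, here is a minimal sketch. The types `mode_suffix`, `type_suffix` and `resolver_sketch` are hypothetical stand-ins invented for this example, not the real `function_resolver` interface, and the returned strings merely make the forwarding visible.

```cpp
#include <string>

// Hypothetical stand-ins for the real suffix enumerations.
enum class mode_suffix { none, n };
enum class type_suffix { s32, u8, unset };

struct resolver_sketch
{
  mode_suffix mode_suffix_id;  // mode suffix of the call being resolved

  // Existing-style interface: the caller names the mode suffix explicitly.
  std::string resolve_to (mode_suffix mode, type_suffix type0,
                          type_suffix type1 = type_suffix::unset) const
  {
    return "mode=" + std::to_string (static_cast<int> (mode))
           + " type0=" + std::to_string (static_cast<int> (type0))
           + " type1=" + std::to_string (static_cast<int> (type1));
  }

  // New-style interface: keep the current mode suffix (the common case)
  // and forward to the overload above.
  std::string resolve_to (type_suffix type0,
                          type_suffix type1 = type_suffix::unset) const
  {
    return resolve_to (mode_suffix_id, type0, type1);
  }
};
```

With this shape, a call site such as `r.resolve_to (r.mode_suffix_id, type)` can be shortened to `r.resolve_to (type)`, which is the mechanical change made throughout the shapes file.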

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_resolver::resolve_to): Add an overload that takes
only the type suffixes.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::resolve_to): Likewise.  Handle explicit type
suffixes here rather than...
(function_resolver::resolve_unary): ...here.
(function_resolver::require_derived_vector_type): Simplify accordingly.
(function_resolver::finish_opt_n_resolution): Likewise.
(function_resolver::resolve_uniform): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(binary_imm_narrowt_base::resolve): Likewise.
(load_contiguous_base::resolve): Likewise.
(mmla_def::resolve): Likewise.
(ternary_resize2_base::resolve): Likewise.
(ternary_resize2_lane_base::resolve): Likewise.
(unary_narrowt_base::resolve): Likewise.
(binary_n_def::resolve): Likewise.
(binary_uint_def::resolve): Likewise.
(binary_uint_n_def::resolve): Likewise.
(binary_uint64_n_def::resolve): Likewise.
(binary_wide_def::resolve): Likewise.
(compare_ptr_def::resolve): Likewise.
(compare_scalar_def::resolve): Likewise.
(fold_left_def::resolve): Likewise.
(get_def::resolve): Likewise.
(inc_dec_pred_def::resolve): Likewise.
(inc_dec_pred_scalar_def::resolve): Likewise.
(set_def::resolve): Likewise.
(store_def::resolve): Likewise.
(tbl_tuple_def::resolve): Likewise.
(ternary_qq_lane_rotate_def::resolve): Likewise.
(ternary_qq_rotate_def::resolve): Likewise.
(ternary_uint_def::resolve): Likewise.
(unary_def::resolve): Likewise.
(unary_widen_def::resolve): Likewise.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc| 48 +--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 34 +
 gcc/config/aarch64/aarch64-sve-builtins.h |  1 +
 3 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 5b47dff0b41..df2d5414c07 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -550,7 +550,7 @@ struct binary_imm_narrowt_base : public overloaded_base<0>
|| !r.require_integer_immediate (i + 2))
   return error_mark_node;
 
-return r.resolve_to (r.mode_suffix_id, type);
+return r.resolve_to (type);
   }
 };
 
@@ -649,7 +649,7 @@ struct load_contiguous_base : public overloaded_base<0>
|| (vnum_p && !r.require_scalar_type (i + 1, "int64_t")))
   return error_mark_node;
 
-return r.resolve_to (r.mode_suffix_id, type);
+return r.resolve_to (type);
   }
 };
 
@@ -739,7 +739,7 @@ struct mmla_def : public overloaded_base<0>
 
 /* Make sure that the function exists now, since not all forms
follow a set pattern after this point.  */
-tree res = r.resolve_to (r.mode_suffix_id, type);
+tree res = r.resolve_to (type);
 if (res == error_mark_node)
   return res;
 
@@ -896,7 +896,7 @@ struct ternary_resize2_base : public overloaded_base<0>
   MODIFIER))
   return error_mark_node;
 
-return r.resolve_to (r.mode_suffix_id, type);
+return r.resolve_to (type);
   }
 };
 
@@ -921,7 +921,7 @@ struct ternary_resize2_lane_base : public overloaded_base<0>
|| !r.require_integer_immediate (i + 3))
   return error_mark_node;
 
-return r.resolve_to (r.mode_suffix_id, type);
+return r.resolve_to (type);
   }
 };
 
@@ -1012,7 +1012,7 @@ struct unary_narrowt_base : public overloaded_base<0>
|| !r.require_derived_vector_type (i, i + 1, type, CLASS, r.HALF_SIZE))
   return error_mark_node;
 
-return r.resolve_to (r.mode_suffix_id, type);
+return r.resolve_to (type);
   }
 };
 
@@ -1218,7 +1218,7 @@ struct binary_n_def : public overloaded_base<0>
|| !r.require_derived_scalar_type (i + 1, r.SAME_TYPE_CLASS))
   return error_mark_node;
 
-return r.resolve_to (r.mode_suffix_id, type);
+return r.resolve_to (type);
   }
 };
 SHAPE (binary_n)
@@ -1399,7 +1399,7 @@ struct binary_uint_def : public overloaded_base<0>
|| !r.require_derived_vector_type (i + 1, i, type, TYPE_unsigned))
   return error_mark_node;
 
-return r.resolve_

Re: Proxy ping [PATCH] Fortran: diagnostics for actual arguments to pointer dummy arguments [PR94104]

2022-11-13 Thread Mikael Morin

On 09/11/2022 at 21:50, Harald Anlauf via Fortran wrote:

Dear all,

Jose posted a patch here that was never reviewed:

   https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html

I think the diagnostics improvement is helpful, as it adjusts
to the changes from F2003 to F2008.

The patch suffered a little from bitrot, but was otherwise
straightforward to apply.  I slightly edited the commit
message, as I found the original one difficult to parse.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK, thanks.


Re: [PATCH 3/5] Fortran: Narrow return types [PR78798]

2022-11-13 Thread Janne Blomqvist via Gcc-patches
On Sun, Nov 13, 2022 at 1:47 AM Bernhard Reutner-Fischer via Fortran
 wrote:
> --- a/gcc/fortran/arith.cc
> +++ b/gcc/fortran/arith.cc
> @@ -1135,7 +1135,7 @@ compare_complex (gfc_expr *op1, gfc_expr *op2)
> strings.  We return -1 for a < b, 0 for a == b and 1 for a > b.
> We use the processor's default collating sequence.  */
>
> -int
> +signed char
>  gfc_compare_string (gfc_expr *a, gfc_expr *b)
>  {
>size_t len, alen, blen, i;
> @@ -1162,7 +1162,7 @@ gfc_compare_string (gfc_expr *a, gfc_expr *b)
>  }

Hmm, really? PR 78798 mentions changing int to bool, where
appropriate, which I think is uncontroversial, but this?


-- 
Janne Blomqvist


Re: [PATCH 3/5] Fortran: Narrow return types [PR78798]

2022-11-13 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 13 Nov 2022 12:13:26 +0200
Janne Blomqvist  wrote:

> On Sun, Nov 13, 2022 at 1:47 AM Bernhard Reutner-Fischer via Fortran
>  wrote:
> > --- a/gcc/fortran/arith.cc
> > +++ b/gcc/fortran/arith.cc
> > @@ -1135,7 +1135,7 @@ compare_complex (gfc_expr *op1, gfc_expr *op2)
> > strings.  We return -1 for a < b, 0 for a == b and 1 for a > b.
> > We use the processor's default collating sequence.  */
> >
> > -int
> > +signed char
> >  gfc_compare_string (gfc_expr *a, gfc_expr *b)
> >  {
> >size_t len, alen, blen, i;
> > @@ -1162,7 +1162,7 @@ gfc_compare_string (gfc_expr *a, gfc_expr *b)
> >  }  
> 
> Hmm, really? PR 78798 mentions changing int to bool, where
> appropriate, which I think is uncontroversial, but this?

Well, we could leave this spot, or all spots where a bool is
insufficient, alone if you prefer.

In the case of gfc_compare_string, the only user is simplify, which only
checks whether the result is >=, >, <= or < 0.

Re: [committed] libstdc++: Add C++20 clocks

2022-11-13 Thread Jonathan Wakely via Gcc-patches
On Sun, 13 Nov 2022 at 01:17, Jonathan Wakely via Libstdc++
 wrote:
>
> Tested x86_64-linux and powerpc64le-linux. Pushed to trunk.
>
> -- >8 --
>
> Also add the basic types for timezones, without the non-inline
> definitions needed to actually use them.

This is the patch for the rest of the time zone support.

Not pushed yet.
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 225d6dc482b..8e5d0bf081f 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2485,6 +2485,14 @@ GLIBCXX_3.4.31 {
 
 _ZSt15__try_use_facet*;
 
+_ZNSt6chrono11reload_tzdbEv;
+_ZNSt6chrono8get_tzdbEv;
+_ZNSt6chrono13get_tzdb_listEv;
+_ZNSt6chrono14remote_versionB5cxx11Ev;
+_ZNSt6chrono14remote_versionEv;
+_ZNKSt6chrono4tzdb12current_zoneEv;
+_ZNKSt6chrono4tzdb11locate_zoneESt17basic_string_viewIcSt11char_traitsIcEE;
+
 } GLIBCXX_3.4.30;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 27dfa2be2f3..08ce9abbcbc 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -175,6 +175,7 @@ bits_headers = \
${bits_srcdir}/char_traits.h \
${bits_srcdir}/charconv.h \
${bits_srcdir}/chrono.h \
+   ${bits_srcdir}/chrono_io.h \
${bits_srcdir}/codecvt.h \
${bits_srcdir}/cow_string.h \
${bits_srcdir}/deque.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 64621922f77..401d0eead58 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -528,6 +528,7 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/char_traits.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/charconv.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/chrono.h \
+@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/chrono_io.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/codecvt.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/cow_string.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/deque.tcc \
diff --git a/libstdc++-v3/include/bits/chrono.h b/libstdc++-v3/include/bits/chrono.h
index 05987ca09df..432b25affea 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -1069,6 +1069,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 const time_point<_Clock, _Dur2>& __rhs)
   { return !(__lhs < __rhs); }
 
+#if __cpp_variable_templates
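+template<typename _Tp>
+  inline constexpr bool __is_duration_v = false;
+template<typename _Rep, typename _Period>
+  inline constexpr bool __is_duration_v<duration<_Rep, _Period>> = true;
+template<typename _Tp>
+  inline constexpr bool __is_time_point_v = false;
+template<typename _Clock, typename _Dur>
+  inline constexpr bool __is_time_point_v<time_point<_Clock, _Dur>> = true;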
+#endif
+
 /// @}
 /// @} group chrono
 
diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
new file mode 100644
index 000..779a65ece91
--- /dev/null
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -0,0 +1,1633 @@
+// <chrono> Formatting -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file include/bits/chrono_io.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{chrono}
+ */
+
+#ifndef _GLIBCXX_CHRONO_IO_H
+#define _GLIBCXX_CHRONO_IO_H 1
+
+#pragma GCC system_header
+
+#if __cplusplus >= 202002L
+
+#include <sstream> // ostringstream
+#include <iomanip> // setw, setfill
+#include 
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+namespace chrono
+{
+/// @addtogroup chrono
+/// @{
+
+/// @cond undocumented
+namespace __detail
+{
+  // STATICALLY-WIDEN, see C++20 [time.general]
+  // It doesn't matter for format strings (which can only be char or wchar_t)
+  // but this returns the narrow string for anything that isn't wchar_t. This
+  // is done because const char* can be inserted into any ostream type, and
+  // will be widened at runtime if necessar

Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-13 Thread Martin Liška
On 11/11/22 18:25, Arnaud Charlet wrote:
> 
>> Similarly to other manuals, we should include the page
>> in HTML builder.
>>
>> What Ada folks think about it?
> 
> The latest changes have broken our build of the Ada doc at AdaCore, so until
> further notice, please do not make any additional changes to the Ada doc
> while we review in detail all the recent changes and find a way to recover.
> Thank you.

Hello.

Sorry for the breakage. However, I contacted you (and your colleague) and
haven't received any feedback for a couple of weeks.

Martin

> 
> Arno 



[PATCH] c++, v2: Implement CWG 2654 - Un-deprecation of compound volatile assignments

2022-11-13 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 11, 2022 at 08:43:04AM +0100, Jakub Jelinek wrote:
> Again, because stage1 close is near, posting the following patch
> to implement CWG 2654.
> 
> Ok for trunk if it passes bootstrap/regtest and is voted into C++23
> and C++20 as a DR?

Here is an updated patch that passed bootstrap/regtest; the only
difference is a few further testsuite tweaks.
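
For readers unfamiliar with the issue, the following minimal sketch shows the kind of code affected. The function `bump` and its register parameter are hypothetical names chosen for illustration. Before this change, GCC in C++20 mode diagnosed the arithmetic compound assignment with -Wvolatile, while the bitwise one was already accepted silently; after CWG 2654 neither form is deprecated.

```cpp
#include <cstdint>

// `reg` stands in for something like a memory-mapped device register.
std::uint32_t
bump (volatile std::uint32_t &reg, std::uint32_t delta)
{
  reg += delta;  // arithmetic compound assignment: un-deprecated by CWG 2654
  reg |= 0x1u;   // bitwise compound assignment: was never warned about
  return reg;    // volatile read of the updated value
}
```

Note that the testsuite changes still expect a warning for forms such as `vi = vi += 42`, where the result of an assignment to a volatile object is itself used.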

2022-11-13  Jakub Jelinek  

* typeck.cc (cp_build_modify_expr): Implement CWG 2654
- Un-deprecation of compound volatile assignments.  Remove
-Wvolatile warning about compound volatile assignments.

* g++.dg/cpp2a/volatile1.C (fn2, fn3, racoon): Adjust expected
diagnostics.
* g++.dg/cpp2a/volatile3.C (fn2, fn3, racoon): Likewise.
* g++.dg/cpp2a/volatile5.C (f): Likewise.
* g++.dg/ext/vector25.C (foo): Don't expect a warning.
* g++.dg/cpp1y/new1.C (test_unused): Likewise.

--- gcc/cp/typeck.cc.jj 2022-11-09 11:22:42.617628059 +0100
+++ gcc/cp/typeck.cc 2022-11-10 23:19:00.394228067 +0100
@@ -9513,19 +9513,6 @@ cp_build_modify_expr (location_t loc, tr
 && MAYBE_CLASS_TYPE_P (TREE_TYPE (lhstype)))
|| MAYBE_CLASS_TYPE_P (lhstype)));
 
- /* An expression of the form E1 op= E2.  [expr.ass] says:
-"Such expressions are deprecated if E1 has volatile-qualified
-type and op is not one of the bitwise operators |, &, ^."
-We warn here rather than in cp_genericize_r because
-for compound assignments we are supposed to warn even if the
-assignment is a discarded-value expression.  */
- if (modifycode != BIT_AND_EXPR
- && modifycode != BIT_IOR_EXPR
- && modifycode != BIT_XOR_EXPR
- && (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype)))
-   warning_at (loc, OPT_Wvolatile,
-   "compound assignment with %<volatile%>-qualified left "
-   "operand is deprecated");
  /* Preevaluate the RHS to make sure its evaluation is complete
 before the lvalue-to-rvalue conversion of the LHS:
 
--- gcc/testsuite/g++.dg/cpp2a/volatile1.C.jj   2022-08-16 13:15:22.739043862 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/volatile1.C  2022-11-10 23:23:18.949717772 
+0100
@@ -74,17 +74,17 @@ fn2 ()
   decltype(i = vi = 42) x3 = i;
 
   // Compound assignments.
-  vi += i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
-  vi -= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
-  vi %= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
+  vi += i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
+  vi -= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
+  vi %= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
   vi ^= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
   vi |= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
   vi &= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
-  vi /= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
+  vi /= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
   vi = vi += 42; // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
   vi += vi = 42; // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
   i *= vi;
-  decltype(vi -= 42) x2 = vi; // { dg-warning "assignment with 
.volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  decltype(vi -= 42) x2 = vi; // { dg-bogus "assignment with 
.volatile.-qualified left operand is deprecated" }
 
   // Structured bindings.
   int a[] = { 10, 5 };
@@ -107,12 +107,12 @@ fn3 ()
   volatile U u;
   u.c = 42;
   i = u.c = 42; // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
-  u.c += 42; // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
+  u.c += 42; // { dg-bogus "assignment with .volatile.-qualified left operand 
is deprecated" }
 
   volatile T t;
   t.a = 3;
   j = t.a = 3; // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
-  t.a += 3; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
+  t.a += 3; // { dg-bogus "assignment with .volatile.-qualified left operand 
is deprecated" }
 
   volatile int *src = &i;
   *src; // No assignment, don't warn.
@@ -135,7 +135,7 @@ void raccoon ()
   volatile T t, u;
   t = 42;
  

[PATCH] c++, v2: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-13 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 11, 2022 at 06:07:04PM +0100, Jakub Jelinek wrote:
> The following patch on top of Marek's P2448 PR106649 patch
> (mainly because that patch implements the previous __cpp_constexpr
> feature test macro bump so this can't go in earlier; OT,
> P2280R4 doesn't have any feature test macro?) implements this
> simple paper.
> 
> Ok for trunk if it passes bootstrap/regtest and is voted into C++23?

Here is an updated patch that passed bootstrap/regtest; the only
change is another testcase tweak.
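
As a small illustration of what the paper permits, here is a sketch modelled loosely on the new testcase. The function name `nth_letter` is hypothetical, and the feature-test guard is an addition made here so that the example also compiles with compilers that do not yet implement P2647R1; in that case it falls back to a non-static local.

```cpp
constexpr char
nth_letter (int i)
{
#if __cpp_constexpr >= 202211L
  // Allowed by P2647R1: a static constexpr variable inside a constexpr
  // function that is usable in constant expressions.
  static constexpr char letters[] = "Hello World";
#else
  // Fallback for compilers without P2647R1: a non-static constexpr local.
  constexpr char letters[] = "Hello World";
#endif
  return letters[i];
}

static_assert (nth_letter (5) == ' ', "index 5 of \"Hello World\" is a space");
```

The guard value 202211L is the bumped `__cpp_constexpr` value introduced by the patch itself.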

2022-11-13  Jakub Jelinek  

gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Bump __cpp_constexpr
value from 202207L to 202211L.
gcc/cp/
* constexpr.cc (cxx_eval_constant_expression): Implement C++23
P2647R1 - Permitting static constexpr variables in constexpr functions.
Allow decl_maybe_constant_var_p static or thread_local vars for
C++23.
(potential_constant_expression_1): Likewise.
gcc/testsuite/
* g++.dg/cpp23/constexpr-nonlit17.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected __cpp_constexpr
value.
* g++.dg/ext/stmtexpr19.C: Don't expect an error for C++23 or later. 

--- gcc/c-family/c-cppbuiltin.cc.jj 2022-11-11 17:14:52.021613271 +0100
+++ gcc/c-family/c-cppbuiltin.cc 2022-11-11 17:17:45.065265246 +0100
@@ -1074,7 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
- cpp_define (pfile, "__cpp_constexpr=202207L");
+ cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202110L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
--- gcc/cp/constexpr.cc.jj  2022-11-11 17:14:52.024613231 +0100
+++ gcc/cp/constexpr.cc 2022-11-11 17:16:54.384952917 +0100
@@ -7085,7 +7085,8 @@ cxx_eval_constant_expression (const cons
&& (TREE_STATIC (r)
|| (CP_DECL_THREAD_LOCAL_P (r) && !DECL_REALLY_EXTERN (r)))
/* Allow __FUNCTION__ etc.  */
-   && !DECL_ARTIFICIAL (r))
+   && !DECL_ARTIFICIAL (r)
+   && (cxx_dialect < cxx23 || !decl_maybe_constant_var_p (r)))
  {
if (!ctx->quiet)
  {
@@ -9577,7 +9578,10 @@ potential_constant_expression_1 (tree t,
   tmp = DECL_EXPR_DECL (t);
   if (VAR_P (tmp) && !DECL_ARTIFICIAL (tmp))
{
- if (CP_DECL_THREAD_LOCAL_P (tmp) && !DECL_REALLY_EXTERN (tmp))
+ if (CP_DECL_THREAD_LOCAL_P (tmp)
+ && !DECL_REALLY_EXTERN (tmp)
+ && (cxx_dialect < cxx23
+ || !decl_maybe_constant_var_p (tmp)))
{
  if (flags & tf_error)
constexpr_error (DECL_SOURCE_LOCATION (tmp), fundef_p,
@@ -9585,7 +9589,9 @@ potential_constant_expression_1 (tree t,
 "%<constexpr%> context", tmp);
  return false;
}
- else if (TREE_STATIC (tmp))
+ else if (TREE_STATIC (tmp)
+  && (cxx_dialect < cxx23
+  || !decl_maybe_constant_var_p (tmp)))
{
  if (flags & tf_error)
constexpr_error (DECL_SOURCE_LOCATION (tmp), fundef_p,
--- gcc/testsuite/g++.dg/cpp23/constexpr-nonlit17.C.jj  2022-11-11 
17:59:59.972852793 +0100
+++ gcc/testsuite/g++.dg/cpp23/constexpr-nonlit17.C 2022-11-11 
17:59:38.725141231 +0100
@@ -0,0 +1,12 @@
+// P2647R1 - Permitting static constexpr variables in constexpr functions
+// { dg-do compile { target c++23 } }
+
+constexpr char
+test ()
+{
+  static const int x = 5;
+  static constexpr char c[] = "Hello World";
+  return *(c + x);
+}
+
+static_assert (test () == ' ');
--- gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C.jj  2022-11-11 17:14:52.194610922 
+0100
+++ gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C 2022-11-11 17:48:56.038865825 
+0100
@@ -134,8 +134,8 @@
 
 #ifndef __cpp_constexpr
 #  error "__cpp_constexpr"
-#elif __cpp_constexpr != 202207
-#  error "__cpp_constexpr != 202207"
+#elif __cpp_constexpr != 202211
+#  error "__cpp_constexpr != 202211"
 #endif
 
 #ifndef __cpp_decltype_auto
--- gcc/testsuite/g++.dg/ext/stmtexpr19.C.jj2020-01-14 20:02:46.839608995 
+0100
+++ gcc/testsuite/g++.dg/ext/stmtexpr19.C   2022-11-12 09:17:40.706245495 
+0100
@@ -8,7 +8,7 @@ const test* setup()
 {
   static constexpr test atest =
 {
-  ({ static const int inner = 123; &inner; }) // { dg-error "static" }
+  ({ static const int inner = 123; &inner; }) // { dg-error "static" "" { 
target c++20_down } }
 };
 
   return &atest;


Jakub



[PATCH] c++, v2: Implement CWG2635 - Constrained structured bindings

2022-11-13 Thread Jakub Jelinek via Gcc-patches
On Sat, Nov 12, 2022 at 12:23:56PM +0100, Jakub Jelinek wrote:
> The following patch implements CWG2635.
> 
> So far tested on
> GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ 
> RUNTESTFLAGS="dg.exp=decomp*"
> ok for trunk if it passes full bootstrap/regtest and it is voted in?

Here is another version of the patch that passed bootstrap/regtest; the only
changes are tweaks to two further testcases.

2022-11-13  Jakub Jelinek  

* decl.cc (grokdeclarator): Implement
CWG2635 - Constrained structured bindings.  Diagnose constrained
auto type.

* g++.dg/cpp2a/decomp5.C: New test.
* g++.dg/cpp2a/concepts-placeholder7.C: Adjust expected diagnostics.
* g++.dg/cpp2a/concepts-placeholder8.C: Likewise.

--- gcc/cp/decl.cc.jj   2022-11-11 17:14:33.103869977 +0100
+++ gcc/cp/decl.cc  2022-11-12 12:13:52.217239729 +0100
@@ -12660,7 +12660,8 @@ grokdeclarator (const cp_declarator *dec
  gcc_unreachable ();
}
   if (TREE_CODE (type) != TEMPLATE_TYPE_PARM
- || TYPE_IDENTIFIER (type) != auto_identifier)
+ || TYPE_IDENTIFIER (type) != auto_identifier
+ || PLACEHOLDER_TYPE_CONSTRAINTS_INFO (type))
{
  if (type != error_mark_node)
{
--- gcc/testsuite/g++.dg/cpp2a/decomp5.C.jj 2022-11-12 12:17:21.024392082 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/decomp5.C2022-11-12 12:20:00.700214521 
+0100
@@ -0,0 +1,20 @@
+// CWG2635 - Constrained structured bindings 
+// { dg-do compile { target c++20 } }
+
+namespace std {
+  template<typename T> struct tuple_size;
+  template<int, typename> struct tuple_element;
+}
+
+struct A {
+  int i;
+  A(int x) : i(x) {}
+  template <int I> int& get() { return i; }
+};
+
+template<> struct std::tuple_size<A> { static const int value = 2; };
+template<int I> struct std::tuple_element<I,A> { using type = int; };
+
+template<class T> concept C = true;
+C auto [x, y] = A{1}; // { dg-error "structured binding declaration cannot 
have type 'auto \\\[requires ::C<, >\\\]'" }
+ // { dg-message "type must be cv-qualified 'auto' or 
reference to cv-qualified 'auto'" "" { target *-*-* } .-1 }
--- gcc/testsuite/g++.dg/cpp2a/concepts-placeholder7.C.jj   2021-04-06 
23:49:08.22716 +0200
+++ gcc/testsuite/g++.dg/cpp2a/concepts-placeholder7.C  2022-11-13 
12:17:38.570047510 +0100
@@ -7,16 +7,16 @@ template 
 void f() {
   int x[] = {1,2};
   int y[] = {3};
-  C1 auto [a,b] = x;
-  C1 auto [c] = y; // { dg-error "constraints" }
+  C1 auto [a,b] = x;   // { dg-error "structured binding declaration cannot 
have type 'auto \\\[requires ::C1<, >\\\]'" }
+  C1 auto [c] = y; // { dg-error "structured binding declaration cannot 
have type 'auto \\\[requires ::C1<, >\\\]'" }
 }
 
 template 
 void g() {
   T x[] = {1,2};
   T y[] = {3};
-  C1 auto [a,b] = x;
-  C1 auto [c] = y; // { dg-error "constraints" }
+  C1 auto [a,b] = x;   // { dg-error "structured binding declaration cannot 
have type 'auto \\\[requires ::C1<, >\\\]'" }
+  C1 auto [c] = y; // { dg-error "structured binding declaration cannot 
have type 'auto \\\[requires ::C1<, >\\\]'" }
 }
 template void g();
 
@@ -27,6 +27,6 @@ struct S { int a, b; } s;
 
 template 
 void h() {
-  const C2 auto& [a, b] = s;
+  const C2 auto& [a, b] = s;// { dg-error "structured binding 
declaration cannot have type 'const auto \\\[requires ::C2<, 
>\\\]'" }
 }
 template void h();
--- gcc/testsuite/g++.dg/cpp2a/concepts-placeholder8.C.jj   2021-04-06 
23:49:08.22716 +0200
+++ gcc/testsuite/g++.dg/cpp2a/concepts-placeholder8.C  2022-11-13 
12:19:17.034684881 +0100
@@ -5,6 +5,6 @@ template  concept is_const = __
 void f() {
   int x[] = {1,2};
   const int y[] = {3};
-  const is_const auto [a,b] = x; // { dg-error "constraints" }
-  const is_const auto [c] = y;
+  const is_const auto [a,b] = x;   // { dg-error "structured binding 
declaration cannot have type 'const auto \\\[requires ::is_const<, 
>\\\]'" }
+  const is_const auto [c] = y; // { dg-error "structured binding 
declaration cannot have type 'const auto \\\[requires ::is_const<, 
>\\\]'" }
 }


Jakub



Re: ginclude: C2x header version macros

2022-11-13 Thread Richard Biener via Gcc-patches



> On 12.11.2022 at 19:18, Joseph Myers wrote:
> 
> C2x adds __STDC_VERSION_*_H__ macros to individual headers with
> interface changes compared to C17.  All the new header features in
> headers provided by GCC have now been implemented, so define those
> macros to the value given in the current working draft.
> 
> Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to
> commit?

Ok.

Richard 

> gcc/
>* ginclude/float.h [__STDC_VERSION__ > 201710L]
>(__STDC_VERSION_FLOAT_H__): New macro.
>* ginclude/stdarg.h [__STDC_VERSION__ > 201710L]
>(__STDC_VERSION_STDARG_H__): New macro.
>* ginclude/stdatomic.h [__STDC_VERSION__ > 201710L]
>(__STDC_VERSION_STDATOMIC_H__): New macro.
>* ginclude/stddef.h [__STDC_VERSION__ > 201710L]
>(__STDC_VERSION_STDDEF_H__): New macro.
>* ginclude/stdint-gcc.h [__STDC_VERSION__ > 201710L]
>(__STDC_VERSION_STDINT_H__): New macro.
>* glimits.h [__STDC_VERSION__ > 201710L]
>(__STDC_VERSION_LIMITS_H__): New macro.
> 
> gcc/testsuite/
>* gcc.dg/c11-float-8.c, gcc.dg/c11-limits-1.c,
>gcc.dg/c11-stdarg-4.c, gcc.dg/c11-stdatomic-3.c,
>gcc.dg/c11-stddef-1.c, gcc.dg/c11-stdint-1.c,
>gcc.dg/c2x-float-13.c, gcc.dg/c2x-limits-1.c,
>gcc.dg/c2x-stdarg-5.c, gcc.dg/c2x-stdatomic-1.c,
>gcc.dg/c2x-stddef-1.c, gcc.dg/c2x-stdint-1.c: New tests.
> 
> diff --git a/gcc/ginclude/float.h b/gcc/ginclude/float.h
> index bc5439d664f..172b9de477f 100644
> --- a/gcc/ginclude/float.h
> +++ b/gcc/ginclude/float.h
> @@ -624,4 +624,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
> 
> #endif /* __DEC32_MANT_DIG__ */
> 
> +#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
> +#define __STDC_VERSION_FLOAT_H__ 202311L
> +#endif
> +
> #endif /* _FLOAT_H___ */
> diff --git a/gcc/ginclude/stdarg.h b/gcc/ginclude/stdarg.h
> index c704c9ffcf2..5149f7b3f4f 100644
> --- a/gcc/ginclude/stdarg.h
> +++ b/gcc/ginclude/stdarg.h
> @@ -125,6 +125,10 @@ typedef __gnuc_va_list va_list;
> 
> #endif /* not __svr4__ */
> 
> +#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
> +#define __STDC_VERSION_STDARG_H__ 202311L
> +#endif
> +
> #endif /* _STDARG_H */
> 
> #endif /* not _ANSI_STDARG_H_ */
> diff --git a/gcc/ginclude/stdatomic.h b/gcc/ginclude/stdatomic.h
> index a56ba5d9639..e16b072ccde 100644
> --- a/gcc/ginclude/stdatomic.h
> +++ b/gcc/ginclude/stdatomic.h
> @@ -248,4 +248,8 @@ extern void atomic_flag_clear (volatile atomic_flag *);
> extern void atomic_flag_clear_explicit (volatile atomic_flag *, memory_order);
> #define atomic_flag_clear_explicit(PTR, MO)   __atomic_clear ((PTR), (MO))
> 
> +#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
> +#define __STDC_VERSION_STDATOMIC_H__ 202311L
> +#endif
> +
> #endif  /* _STDATOMIC_H */
> diff --git a/gcc/ginclude/stddef.h b/gcc/ginclude/stddef.h
> index 2767edf51de..7980045e712 100644
> --- a/gcc/ginclude/stddef.h
> +++ b/gcc/ginclude/stddef.h
> @@ -454,6 +454,7 @@ typedef struct {
> 
> #if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
> #define unreachable() (__builtin_unreachable ())
> +#define __STDC_VERSION_STDDEF_H__ 202311L
> #endif
> 
> #endif /* _STDDEF_H was defined this time */
> diff --git a/gcc/ginclude/stdint-gcc.h b/gcc/ginclude/stdint-gcc.h
> index 6be01ae28b8..eab651d968a 100644
> --- a/gcc/ginclude/stdint-gcc.h
> +++ b/gcc/ginclude/stdint-gcc.h
> @@ -362,4 +362,8 @@ typedef __UINTMAX_TYPE__ uintmax_t;
> 
> #endif
> 
> +#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
> +#define __STDC_VERSION_STDINT_H__ 202311L
> +#endif
> +
> #endif /* _GCC_STDINT_H */
> diff --git a/gcc/glimits.h b/gcc/glimits.h
> index 8d74c8b88d6..994f7e33bbe 100644
> --- a/gcc/glimits.h
> +++ b/gcc/glimits.h
> @@ -156,6 +156,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
> # define BOOL_MAX 1
> # undef BOOL_WIDTH
> # define BOOL_WIDTH 1
> +
> +# define __STDC_VERSION_LIMITS_H__ 202311L
> #endif
> 
> #endif /* _LIMITS_H___ */
> diff --git a/gcc/testsuite/gcc.dg/c11-float-8.c b/gcc/testsuite/gcc.dg/c11-float-8.c
> new file mode 100644
> index 000..7fb1e0a5683
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/c11-float-8.c
> @@ -0,0 +1,9 @@
> +/* Test __STDC_VERSION_FLOAT_H__ not in C11.  */
> +/* { dg-do preprocess } */
> +/* { dg-options "-std=c11 -pedantic-errors" } */
> +
> +#include <float.h>
> +
> +#ifdef __STDC_VERSION_FLOAT_H__
> +#error "__STDC_VERSION_FLOAT_H__ defined"
> +#endif
> diff --git a/gcc/testsuite/gcc.dg/c11-limits-1.c b/gcc/testsuite/gcc.dg/c11-limits-1.c
> new file mode 100644
> index 000..6dc5737024d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/c11-limits-1.c
> @@ -0,0 +1,9 @@
> +/* Test __STDC_VERSION_LIMITS_H__ not in C11.  */
> +/* { dg-do preprocess } */
> +/* { dg-options "-std=c11 -pedantic-errors" } */
> +
> +#include <limits.h>
> +
> +#ifdef __STDC_VERSION_LIMITS_H__
> +#error "__STDC_VERSION_LIMITS_H__ defined"
> +#endif
> diff --git a/gcc/testsuite/gcc.dg/c11-

Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-13 Thread Arnaud Charlet via Gcc-patches


> Sorry for the breakage. However, I contacted you (and your colleague) and
> haven't received any feedback for a couple of weeks.

Right, although I did give you feedback that what you sent wasn’t in a
suitable form for review wrt Ada.

Arno

Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-13 Thread Martin Liška
On 11/13/22 13:26, Arnaud Charlet wrote:
> 
>> Sorry for the breakage. However, I contacted you (and your colleague) and
>> haven't received any feedback for a couple of weeks.
> 
> Right, although I did give you feedback that what you sent wasn’t in a
> suitable form for review wrt Ada.

Sure, but sending a patch set to gcc-patches wouldn't have worked either;
we've got quite a strict email size limit.

Anyway, I hope the AdaCore build is fixable with a reasonable amount of
effort?

Martin

> 
> Arno



[RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds a new pass that looks up function pointer assignments,
and adds guarded direct calls to the call sites of the function
pointers.

E.g.: Let's assume an assignment to a function pointer as follows:
    b->cb = &myfun;
  Other parts of the program can use the function pointer as follows:
    b->cb ();
  With this pass the invocation will be transformed to:
    if (b->cb == myfun)
      myfun ();
    else
      b->cb ();

The impact of the dynamic guard is expected to be less than the speedup
gained by enabled optimizations (e.g. inlining or constant propagation).
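As a rough illustration of the transformation described above, here is a hand-written C sketch of the guarded form next to the plain indirect call. It is not output of the pass; the names `struct handler`, `other`, `call_indirect`, `call_guarded` and `guarded_result` are made up for this example, and only `myfun` and the `cb` field come from the description.

```c
/* Hypothetical struct holding the function pointer field.  */
struct handler
{
  int (*cb) (int);
};

/* The function whose address is assumed to be stored in the field.  */
int myfun (int x) { return x + 1; }

/* A second, unrelated target, used to exercise the fallback path.  */
int other (int x) { return x * 2; }

/* Original form: a plain indirect call through the struct field.  */
int call_indirect (struct handler *b, int x)
{
  return b->cb (x);
}

/* Guarded form, written by hand: compare the pointer against the one
   known target and call it directly on a match.  */
int call_guarded (struct handler *b, int x)
{
  if (b->cb == myfun)
    return myfun (x);   /* direct call, visible to the inliner */
  else
    return b->cb (x);   /* fallback keeps the original behaviour */
}

/* Small helper for trying both paths.  */
int guarded_result (int use_myfun, int x)
{
  struct handler h = { use_myfun ? myfun : other };
  return call_guarded (&h, x);
}
```

Both forms return the same value for any target; the guard only gives later passes a direct call to work with when the pointer matches.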

PR ipa/107666
gcc/ChangeLog:

* Makefile.in: Add new pass.
* common.opt: Add flag -fipa-guarded-deref.
* lto-section-in.cc: Add new section "ipa_guarded_deref".
* lto-streamer.h (enum lto_section_type): Add new section.
* passes.def: Add new pass.
* timevar.def (TV_IPA_GUARDED_DEREF): Add time var.
* tree-pass.h (make_pass_ipa_guarded_deref): New prototype.
* ipa-guarded-deref.cc: New file.

Signed-off-by: Christoph Müllner 
---
 gcc/Makefile.in  |1 +
 gcc/common.opt   |4 +
 gcc/ipa-guarded-deref.cc | 1115 ++
 gcc/lto-section-in.cc|1 +
 gcc/lto-streamer.h   |1 +
 gcc/passes.def   |1 +
 gcc/timevar.def  |1 +
 gcc/tree-pass.h  |1 +
 8 files changed, 1125 insertions(+)
 create mode 100644 gcc/ipa-guarded-deref.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f672e6ea549..402c4a6ea3f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1462,6 +1462,7 @@ OBJS = \
ipa-sra.o \
ipa-devirt.o \
ipa-fnsummary.o \
+   ipa-guarded-deref.o \
ipa-polymorphic-call.o \
ipa-split.o \
ipa-inline.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index bce3e514f65..8344940ae5b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1933,6 +1933,10 @@ fipa-bit-cp
 Common Var(flag_ipa_bit_cp) Optimization
 Perform interprocedural bitwise constant propagation.
 
+fipa-guarded-deref
+Common Var(flag_ipa_guarded_deref) Optimization
+Perform guarded function pointer dereferencing.
+
 fipa-modref
 Common Var(flag_ipa_modref) Optimization
 Perform interprocedural modref analysis.
diff --git a/gcc/ipa-guarded-deref.cc b/gcc/ipa-guarded-deref.cc
new file mode 100644
index 000..198fb9b33ad
--- /dev/null
+++ b/gcc/ipa-guarded-deref.cc
@@ -0,0 +1,1115 @@
+/* IPA pass to transform indirect calls to guarded direct calls.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Christoph Muellner (Vrull GmbH)
+   Based on work by Erick Ochoa (Vrull GmbH)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Indirect calls are used to separate callees from their call sites.
+   This helps to implement proper abstraction layers, but prevents
+   optimizations like constant-propagation or function specialization.
+
+   Assuming that we identify a function pointer that gets assigned
+   only a small amount of times, we can convert the indirect calls
+   to the target function into guarded direct calls and let later
+   passes apply additional optimizations.
+
+   This pass does this by:
+   * Identifying function pointers that are assigned up to N=1 times
+ to struct fields.
+   * Convert the indirect calls into a test for the call target
+ and a direct call
+   * If the test fails, then the indirect call will be executed.
+
+   E.g.:
+   - function foo's address is taken and stored in a field of struct
+   o->func = foo;
+   - the program writes into this struct field only once
+   - it is possible, that we miss a store (we would need strong guarantees)
+ therefore, we do the following conversion:
+   o->func ()
+ <-->
+   if (o->func == foo)
+foo ()
+   else
+o->func ()
+
+   This pass is implemented as a full IPA pass that uses the LTO section
+   "ipa_guarded_deref".  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "tree-cfg.h"
+#include "ssa.h"
+#include "cgraph.h"
+#include "gimple-pretty-print.h"
+#include "gimple-iterator.h"
+#include "symbol-summary.h"
+#include "ipa-utils.h"
+
+#include "

[PATCH (pushed)] configure: always set SPHINX_BUILD

2022-11-13 Thread Martin Liška
During the Sphinx-migration development, I used
SPHINX_BUILD='' in order to skip building info and manual
pages in gcc folder. However, we've got HAS_SPHINX_BUILD
which is the correct flag for that.

With the patch, one will get a nicer error message when
sphinx-build is missing and one builds (explicitly) a target which
depends on it.

PR other/107620

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Always set sphinx-build.

libgomp/ChangeLog:

* configure: Regenerate.
* configure.ac: Always set sphinx-build.

libiberty/ChangeLog:

* configure: Regenerate.
* configure.ac: Always set sphinx-build.

libitm/ChangeLog:

* configure: Regenerate.
* configure.ac: Always set sphinx-build.

libquadmath/ChangeLog:

* configure: Regenerate.
* configure.ac: Always set sphinx-build.
---
 gcc/configure| 2 +-
 gcc/configure.ac | 2 +-
 libgomp/configure| 2 +-
 libgomp/configure.ac | 2 +-
 libiberty/configure  | 2 +-
 libiberty/configure.ac   | 2 +-
 libitm/configure | 2 +-
 libitm/configure.ac  | 2 +-
 libquadmath/configure| 2 +-
 libquadmath/configure.ac | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 1a32f894394..752d3bf0a99 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -8849,7 +8849,7 @@ $as_echo "$as_me: WARNING:
   *** Info and man pages documentation will not be built." >&2;}
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/gcc/configure.ac b/gcc/configure.ac
index f87fab97edd..f4c2ab1789e 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1283,7 +1283,7 @@ else
   *** sphinx-build is missing or too old.
   *** Info and man pages documentation will not be built.])
   AC_MSG_RESULT(no)
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libgomp/configure b/libgomp/configure
index c018cb92f80..ac3acdaceb6 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -16916,7 +16916,7 @@ $as_echo "$as_me: WARNING:
   *** Info and man pages documentation will not be built." >&2;}
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index 1aeac2d3cca..6e8300aba2b 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -499,7 +499,7 @@ else
   *** sphinx-build is missing or too old.
   *** Info and man pages documentation will not be built.])
   AC_MSG_RESULT(no)
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libiberty/configure b/libiberty/configure
index 7ee5d6002f7..c04fa3732fb 100755
--- a/libiberty/configure
+++ b/libiberty/configure
@@ -2535,7 +2535,7 @@ $as_echo "$as_me: WARNING:
   *** Info and man pages documentation will not be built." >&2;}
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libiberty/configure.ac b/libiberty/configure.ac
index b2cfef90489..d9f7b16b752 100644
--- a/libiberty/configure.ac
+++ b/libiberty/configure.ac
@@ -50,7 +50,7 @@ else
   *** sphinx-build is missing or too old.
   *** Info and man pages documentation will not be built.])
   AC_MSG_RESULT(no)
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libitm/configure b/libitm/configure
index 20f20039f3b..2a7d21e9d73 100755
--- a/libitm/configure
+++ b/libitm/configure
@@ -18094,7 +18094,7 @@ $as_echo "$as_me: WARNING:
   *** Info and man pages documentation will not be built." >&2;}
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libitm/configure.ac b/libitm/configure.ac
index 6930f5abaae..be5d7158745 100644
--- a/libitm/configure.ac
+++ b/libitm/configure.ac
@@ -331,7 +331,7 @@ else
   *** sphinx-build is missing or too old.
   *** Info and man pages documentation will not be built.])
   AC_MSG_RESULT(no)
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libquadmath/configure b/libquadmath/configure
index b7928de4ef4..26fd6a49012 100755
--- a/libquadmath/configure
+++ b/libquadmath/configure
@@ -13311,7 +13311,7 @@ $as_echo "$as_me: WARNING:
   *** Info and man pages documentation will not be built." >&2;}
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
-  SPHINX_BUILD=
+  SPHINX_BUILD=sphinx-build
   HAS_SPHINX_BUILD=
 fi
 rm -rf $tempdir
diff --git a/libquadmath/configure.ac b/libquadmath/configure.ac
index f2bef

[RFC PATCH] ipa-cp: Speculatively call specialized functions

2022-11-13 Thread Christoph Muellner
From: mtsamis 

The IPA CP pass offers a wide range of optimizations, where most of them
lead to specialized functions that are called from a call site.
This can lead to multiple specialized function clones, if more than
one call-site allows such an optimization.
If not all call-sites can be optimized, the program might end
up with call-sites to the original function.

This pass assumes that non-optimized call-sites (i.e. call-sites
that don't call specialized functions) are likely to be called
with arguments that would allow calling specialized clones.
Since we cannot guarantee this (for obvious reasons), we can't
replace the existing calls. However, we can introduce dynamic
guards that test the arguments for the collected constants
and call the specialized function if there is a match.

To demonstrate the effect, let's consider the following program part:

  func_1()
    myfunc(1)
  func_2()
    myfunc(2)
  func_i(i)
    myfunc(i)

In this case the transformation would do the following:

  func_1()
    myfunc.constprop.1() // myfunc() with arg0 == 1
  func_2()
    myfunc.constprop.2() // myfunc() with arg0 == 2
  func_i(i)
    if (i == 1)
      myfunc.constprop.1() // myfunc() with arg0 == 1
    else if (i == 2)
      myfunc.constprop.2() // myfunc() with arg0 == 2
    else
      myfunc(i)

The pass consists of two main parts:
* collecting all specialized functions and the argument/constant pair(s)
* insertion of the guards during materialization

The patch integrates well into ipa-cp and related IPA functionality.
Given the nature of IPA, the changes are touching many IPA-related
files as well as call-graph data structures.

The impact of the dynamic guard is expected to be less than the speedup
gained by enabled optimizations (e.g. inlining or constant propagation).
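The guard insertion described above can be mimicked by hand in C. This is only a sketch under assumptions: the body of `myfunc` is invented, and `myfunc_constprop_1`, `myfunc_constprop_2` and `func_i_guarded` are hypothetical stand-ins for the clones and the guarded caller that the pass would produce.

```c
/* Generic function; the body is an arbitrary placeholder.  */
int myfunc (int i)
{
  return i * 3 + 7;
}

/* Stand-ins for the specialized clones, with the argument folded in.  */
int myfunc_constprop_1 (void) { return 1 * 3 + 7; }  /* arg0 == 1 */
int myfunc_constprop_2 (void) { return 2 * 3 + 7; }  /* arg0 == 2 */

/* Caller whose argument is not known at compile time: test it against
   the collected constants and fall back to the generic function.  */
int func_i_guarded (int i)
{
  if (i == 1)
    return myfunc_constprop_1 ();
  else if (i == 2)
    return myfunc_constprop_2 ();
  else
    return myfunc (i);
}
```

For every `i` the guarded caller returns the same value as `myfunc (i)`; the benefit only shows up when the clones are cheaper than the generic function.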

PR ipa/107667
gcc/ChangeLog:

* cgraph.cc (cgraph_add_edge_to_call_site_hash): Add support for 
guarded specialized edges.
(cgraph_edge::set_call_stmt): Likewise.
(symbol_table::create_edge): Likewise.
(cgraph_edge::remove): Likewise.
(cgraph_edge::make_speculative): Likewise.
(cgraph_edge::make_specialized): Likewise.
(cgraph_edge::remove_specializations): Likewise.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
(cgraph_edge::dump_edge_flags): Likewise.
(verify_speculative_call): Likewise.
(verify_specialized_call): Likewise.
(cgraph_node::verify_node): Likewise.
* cgraph.h (class GTY): Add new class that contains info of specialized 
edges.
* cgraphclones.cc (cgraph_edge::clone): Add support for guarded 
specialized edges.
(cgraph_node::set_call_stmt_including_clones): Likewise.
* ipa-cp.cc (want_remove_some_param_p): Likewise.
(create_specialized_node): Likewise.
(add_specialized_edges): Likewise.
(ipcp_driver): Likewise.
* ipa-fnsummary.cc (redirect_to_unreachable): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(analyze_function_body): Likewise.
(estimate_edge_size_and_time): Likewise.
(remap_edge_summaries): Likewise.
* ipa-inline-transform.cc (inline_transform): Likewise.
* ipa-inline.cc (edge_badness): Likewise.
* lto-cgraph.cc (lto_output_edge): Likewise.
(input_edge): Likewise.
* tree-inline.cc (copy_bb): Likewise.
* value-prof.cc (gimple_sc): Add function to create guarded 
specializations.
* value-prof.h (gimple_sc): Likewise.

Signed-off-by: Manolis Tsamis 
---
 gcc/cgraph.cc   | 316 +++-
 gcc/cgraph.h| 102 
 gcc/cgraphclones.cc |  30 
 gcc/common.opt  |   4 +
 gcc/ipa-cp.cc   | 105 
 gcc/ipa-fnsummary.cc|  42 +
 gcc/ipa-inline-transform.cc |  11 ++
 gcc/ipa-inline.cc   |   5 +
 gcc/lto-cgraph.cc   |  46 ++
 gcc/tree-inline.cc  |  54 ++
 gcc/value-prof.cc   | 214 
 gcc/value-prof.h|   1 +
 12 files changed, 923 insertions(+), 7 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 5851b2ffc6c..ee819c87261 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -718,18 +718,24 @@ cgraph_add_edge_to_call_site_hash (cgraph_edge *e)
  one indirect); always hash the direct one.  */
   if (e->speculative && e->indirect_unknown_callee)
 return;
+  /* There are potentially multiple specialization edges for every
+ specialized call; always hash the base edge.  */
+  if (e->guarded_specialization_edge_p ())
+return;
   cgraph_edge **slot = e->caller->call_site_hash->find_slot_with_hash
   (e->call_stmt, cgraph_edge_hasher::hash (e->call_stmt), INSERT);
   if (*slot)
 {
-  gcc_assert (((cgraph_edge *)*slot)->speculative);
+  gcc_assert (((cgraph_edge *)*slot)->speculative
+ || ((cgraph_edge *)*slot)->specialized);
   if (e->

Re: [PATCH v2] RISC-V: costs: support shift-and-add in strength-reduction

2022-11-13 Thread Philipp Tomsich
Applied to master. Thanks!

Note that the multiply-by-200 (in the testcase) originates from Dhrystone.

Philipp.


On Sun, 13 Nov 2022 at 02:23, Jeff Law  wrote:
>
>
> On 11/10/22 14:34, Philipp Tomsich wrote:
> > The strength-reduction implementation in expmed.cc will assess the
> > profitability of using shift-and-add using a RTL expression that wraps
> > a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
> > function recognizes this as expressing a sh[123]add instruction, we
> > will return an inflated cost---thus defeating the optimization.
> >
> > This change adds the necessary idiom recognition to provide an
> > accurate cost for this form of expressing sh[123]add.
> >
> > Instead of expanding to
> >   li      a5,200
> >   mulw    a0,a5,a0
> > with this change, the expression 'a * 200' is synthesized as:
> >   sh2add  a0,a0,a0   // *5 = a + 4 * a
> >   sh2add  a0,a0,a0   // *5 = a + 4 * a
> >   slli    a0,a0,3    // *8
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
> >   if expressed as a plus and multiplication with a power-of-2.
> >   Split costing for MINUS from PLUS.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zba-shNadd-07.c: New test.
>
> OK.  Note that getting this right can impact one of the spec2017 integer
> benchmarks notably.  I don't recall which one, but it has a div and a
> mod by the same constant which is fairly reasonably implemented with
> shifts and adds.  You won't see it in instruction count data, but would
> see it if you had cycle count data or instrumented for div/mod instructions.
>
>
> Jeff
>
>
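For reference, the decomposition 200 = 5 * 5 * 8 from the quoted commit message can be checked with a small C model. This is a sketch, not the compiler's code: the helper `sh2add` models the Zba instruction (rd = rs2 + (rs1 << 2)), `mul200` is a made-up name, and 64-bit unsigned arithmetic is assumed instead of the 32-bit `mulw` in the example.

```c
#include <stdint.h>

/* Model of the Zba sh2add instruction: rs2 + (rs1 << 2).  */
uint64_t sh2add (uint64_t rs1, uint64_t rs2)
{
  return rs2 + (rs1 << 2);
}

/* a * 200 as two shift-and-adds followed by a shift.  */
uint64_t mul200 (uint64_t a)
{
  a = sh2add (a, a);   /* a * 5  = a + 4 * a */
  a = sh2add (a, a);   /* a * 25 = 5a + 4 * 5a */
  return a << 3;       /* a * 200 = 25a * 8 */
}
```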


[PATCH (pushed)] sphinx: include todolist only if INCLUDE_TODO env. set

2022-11-13 Thread Martin Liška
It is confusing that 'Indexes and tables' contains TODO. One gets
the index by clicking on the Index link.

PR web/107643

ChangeLog:

* doc/baseconf.py: Set include_todo tag if INCLUDE_TODO env
is set.
* doc/indices-and-tables.rst: Use include_todo tag.
---
 doc/baseconf.py| 3 +++
 doc/indices-and-tables.rst | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/doc/baseconf.py b/doc/baseconf.py
index 8021a101e77..c91675d3d04 100644
--- a/doc/baseconf.py
+++ b/doc/baseconf.py
@@ -51,6 +51,7 @@ gcc_REVISION = read_file('REVISION')
 VERSION_PACKAGE = os.getenv('VERSION_PACKAGE')
 BUGURL = os.getenv('BUGURL')
 MONOCHROMATIC = os.getenv('MONOCHROMATIC')
+INCLUDE_TODO = os.getenv('INCLUDE_TODO')
 
 YEAR = time.strftime('%Y')
 
@@ -215,6 +216,8 @@ def set_common(name, module):
 if gcc_DEVPHASE == 'experimental':
 module['todo_include_todos'] = True
 module['tags'].add('development')
+if INCLUDE_TODO:
+module['tags'].add('include_todo')
 
 html_theme_options['source_edit_link'] = 
f'https://gcc.gnu.org/onlinedocs/{name}' \
  '/_sources/{filename}.txt'
diff --git a/doc/indices-and-tables.rst b/doc/indices-and-tables.rst
index 56b33139280..0f4cd2fdc28 100644
--- a/doc/indices-and-tables.rst
+++ b/doc/indices-and-tables.rst
@@ -5,7 +5,7 @@
 
   :ref:`genindex`
 
-  .. only:: development
+  .. only:: include_todo
 
 TODO
 
-- 
2.38.1



[PATCH] doc: Update Jeff Law's email-address in contrib.rst

2022-11-13 Thread Philipp Tomsich
Applied to master as obvious.

ChangeLog:

* doc/contrib.rst: Update Jeff Law's email address.

Signed-off-by: Philipp Tomsich 
---

 doc/contrib.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/contrib.rst b/doc/contrib.rst
index 96bf2a56af4..15e358b7903 100644
--- a/doc/contrib.rst
+++ b/doc/contrib.rst
@@ -13,7 +13,7 @@ Contributors to GCC
 The GCC project would like to thank its many contributors.  Without them the
 project would not have been nearly as successful as it has been.  Any omissions
 in this list are accidental.  Feel free to contact
-l...@redhat.com or ger...@pfeifer.com if you have been left
+j...@ventanamicro.com or ger...@pfeifer.com if you have been left
 out or some of your contributions are not listed.  Please keep this list in
 alphabetical order.
 
-- 
2.34.1



Re: Announcement: Porting the Docs to Sphinx - tomorrow

2022-11-13 Thread Martin Liška
On 11/11/22 22:10, Sandra Loosemore wrote:
> On 11/11/22 13:52, Gerald Pfeifer wrote:
>> On Tue, 8 Nov 2022, Martin Liška wrote:
>>> After the migration, people should be able to build (and install) GCC
>>> even if they miss Sphinx (similar happens now if you miss makeinfo).
>>
>> My nightly *install* (not build) on amd64-unknown-freebsd12.2 broke
>> (from what I can tell due to this - it's been working fine most of
>> the last several 1000 days):
>>
>>    if [ -f doc/g++.1 ]; then rm -f 
>> /home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; /usr/bin/install -c -m 
>> 644 doc/g++.1 /home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; chmod a-x 
>> /home/gerald/gcc-ref12-amd64/share/man/man1/g++.1; fi
>> make -C 
>> /scratch/tmp/gerald/GCC-HEAD/gcc/../doc man 
>> SOURCEDIR=/scratch/tmp/gerald/GCC-HEAD/gcc/fortran/doc/gfortran 
>> BUILDDIR=/scratch/tmp/gerald/OBJ--0954/gcc/doc/gfortran/man SPHINXBUILD=
>>    make[3]: make[3]: don't know how to make w. Stop
>>    make[3]: stopped in /scratch/tmp/gerald/GCC-HEAD/doc
>>    gmake[2]: *** [/scratch/tmp/gerald/GCC-HEAD/gcc/fortran/Make-lang.in:164: 
>> doc/gfortran/man/man/gfortran.1] Error 2
>>    gmake[2]: Leaving directory '/scratch/tmp/gerald/OBJ--0954/gcc'
>>    gmake[1]: *** [Makefile:5310: install-strip-gcc] Error 2
>>    gmake[1]: Leaving directory '/scratch/tmp/gerald/OBJ--0954'
>>    gmake: *** [Makefile:2734: install-strip] Error 2
>>
>> (This appears to be the case with "make -j1 install-strip". Not sure where
>> that "w" target is coming from?)
> 
> I've seen something similar:  "make install" seems to be passing an empty 
> SPHINXBUILD= option to the docs Makefile which is not equipped to handle 
> that.  I know the fix is to get a recent-enough version of Sphinx installed 
> (and I'm going to work on that over the weekend), but it ought to fail more 
> gracefully, or not try to install docs that cannot be built without Sphinx.
> 
> -Sandra
> 

Can you please update the current master, you should get a proper error message.

I'm going to take a look at make install-strip target.

Martin


Re: [PATCH] RISC-V: optimize '(a >= 0) ? b : 0' to srai + andn, if compiling for Zbb

2022-11-13 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Sun, 13 Nov 2022 at 01:24, Jeff Law  wrote:
>
>
> On 11/8/22 12:54, Philipp Tomsich wrote:
> > If-conversion is turning '(a >= 0) ? b : 0' into a branchless sequence
> >   not a5,a0
> >   srai a5,a5,63
> >   and a0,a1,a5
> > missing the opportunity to combine the NOT and AND into an ANDN.
> >
> > This adds a define_split to help the combiner reassociate the NOT with
> > the AND.
> >
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md: New define_split.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zbb-srai-andn.c: New test.
>
> OK.
>
>
> FWIW, combine can be pretty sneaky in manipulating the result of a scc
> style insn.  I've seen a port with pages and pages of special patterns
> to match what simplify_if_then_else would do.
>
>
> Jeff
>
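The equivalence behind the srai + andn sequence can be sketched in C. The function names here are invented for illustration, and the branchless version assumes that `>>` on a negative signed value is an arithmetic shift, which is implementation-defined in ISO C but is what GCC does.

```c
#include <stdint.h>

/* Reference form: '(a >= 0) ? b : 0'.  */
int64_t select_ref (int64_t a, int64_t b)
{
  return a >= 0 ? b : 0;
}

/* Branchless form mirroring srai + andn.  */
int64_t select_branchless (int64_t a, int64_t b)
{
  int64_t mask = a >> 63;   /* srai: all ones if a < 0, else zero
                               (assumes arithmetic right shift) */
  return b & ~mask;         /* andn: b AND NOT mask */
}
```

The sequence before the patch computes the same value as `b & (~a >> 63)`; moving the NOT to the other side of the shift is what lets it fold into a single andn.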


[PATCH] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-13 Thread Philipp Tomsich
This patch adds support for the Ampere-1A CPU:
 - recognizes the name of the core and provides detection for -mcpu=native,
 - updates extra_costs,
 - adds a new fusion pair for (A+B+1 and A-B-1).

Ampere-1A and Ampere-1 have more timing difference than the extra
costs indicate, but these don't propagate through to the headline
items in our extra costs (e.g. the change in latency for scalar sqrt
doesn't have a corresponding table entry).

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
* config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
* config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
registers and then +1/-1).
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
idiom-matcher for the new fusion pair.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64-cores.def|   1 +
 gcc/config/aarch64/aarch64-cost-tables.h| 107 
 gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
 gcc/config/aarch64/aarch64-tune.md  |   2 +-
 gcc/config/aarch64/aarch64.cc   |  63 
 5 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index d2671778928..aead587cec1 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 
 /* Ampere Computing ('\xC0') cores. */
 AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
ampere1, 0xC0, 0xac3, -1)
+AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
MEMTAG), ampere1a, 0xC0, 0xac4, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 760d7b30368..48522606fbe 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -775,4 +775,111 @@ const struct cpu_cost_table ampere1_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1a_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  COSTS_N_INSNS (3),   /* flag_setting.  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (19)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (35)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (4), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (5), /* loadf.  */
+COSTS_N_INSNS (5), /* loadd.  */
+COSTS_N_INSNS (5), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2), /* storef.  */
+COSTS_N_INSNS (2), /* stored.  */
+COSTS_N_INSNS (2), /* store_unaligned.  */
+COSTS_N_INSNS (3), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (25),  /* div.  */
+  COSTS_N_INSNS (4),   /* mult.  */
+  COSTS_N_INSNS (4),  

Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-13 Thread Arnaud Charlet via Gcc-patches
> >> Sorry for the breakage. However, I contacted you (and your colleague) and 
> >> haven't received
> >> any feedback for a couple of weeks.
> > 
> > Right although I did give you feedback that what you sent wasn’t in a 
> > suitable form for review wrt Ada.
> 
> Sure, but sending a patch set to gcc-patches wouldn't have worked either, 
> we've got quite a strict
> email size limit.
> 
> Anyway, hope the AdaCore build would be fixable with a reasonable amount of 
> effort?

Unclear yet. We'll probably need to change and possibly partially revert the
Ada changes, we'll see.

Arno


[committed] hppa: Skip guality tests on hppa*-*-hpux*

2022-11-13 Thread John David Anglin
Committed to trunk.

Dave
---

Skip guality tests on hppa-hpux.

The guality check command hangs. This causes TCL errors in
other tests and slows testsuite execution.

2022-11-13  John David Anglin  

gcc/testsuite/ChangeLog:

* g++.dg/guality/guality.exp: Skip on hppa*-*-hpux*.
* gcc.dg/guality/guality.exp: Likewise.
* gfortran.dg/guality/guality.exp: Likewise.

diff --git a/gcc/testsuite/g++.dg/guality/guality.exp 
b/gcc/testsuite/g++.dg/guality/guality.exp
index 1d5b65fef57..2d736d292e9 100644
--- a/gcc/testsuite/g++.dg/guality/guality.exp
+++ b/gcc/testsuite/g++.dg/guality/guality.exp
@@ -8,6 +8,10 @@ if { [istarget *-*-darwin*] } {
 return
 }
 
+if { [istarget hppa*-*-hpux*] } {
+return
+}
+
 if { [istarget "powerpc-ibm-aix*"] } {
 set torture_execute_xfail "powerpc-ibm-aix*"
 return
diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp 
b/gcc/testsuite/gcc.dg/guality/guality.exp
index ba87132aef2..075bebe34e8 100644
--- a/gcc/testsuite/gcc.dg/guality/guality.exp
+++ b/gcc/testsuite/gcc.dg/guality/guality.exp
@@ -8,6 +8,10 @@ if { [istarget *-*-darwin*] } {
 return
 }
 
+if { [istarget hppa*-*-hpux*] } {
+return
+}
+
 if { [istarget "powerpc-ibm-aix*"] } {
 set torture_execute_xfail "powerpc-ibm-aix*"
 return
diff --git a/gcc/testsuite/gfortran.dg/guality/guality.exp 
b/gcc/testsuite/gfortran.dg/guality/guality.exp
index 0375edfffe4..86a966a9133 100644
--- a/gcc/testsuite/gfortran.dg/guality/guality.exp
+++ b/gcc/testsuite/gfortran.dg/guality/guality.exp
@@ -8,6 +8,10 @@ if { [istarget *-*-darwin*] } {
   return
 }
 
+if { [istarget hppa*-*-hpux*] } {
+return
+}
+
 if { [istarget "powerpc-ibm-aix*"] } {
 set torture_execute_xfail "powerpc-ibm-aix*"
 return



[PATCH] libstdc++: Fix python/ not making install directories

2022-11-13 Thread Arsen Arsenović via Gcc-patches
I'm unsure why this issue only started manifesting now with how old this
code is, but this should fix it.

libstdc++-v3/ChangeLog:

* python/Makefile.am: Call mkinstalldirs before INSTALL_DATA
when installing gdb scripts.
* python/Makefile.in: Regenerate.
---
Hi,

Someone on IRC spotted an error: if trying to install to a fresh
prefix/sysroot with --enable-libstdcxx-debug, the install fails since its
intended target directories don't exist.  I could replicate this on
r13-3944-g43435c7eb0ff60 using

$ ../gcc/configure --disable-bootstrap \
--enable-libstdcxx-debug \
--enable-languages=c,c++ \
--prefix=$(pwd)/pfx

Install tested on x86_64-pc-linux-gnu with and without
--enable-libstdcxx-debug.

 libstdc++-v3/python/Makefile.am | 4 
 libstdc++-v3/python/Makefile.in | 4 
 2 files changed, 8 insertions(+)

diff --git a/libstdc++-v3/python/Makefile.am b/libstdc++-v3/python/Makefile.am
index f523d3a44dc..7987d33e6d9 100644
--- a/libstdc++-v3/python/Makefile.am
+++ b/libstdc++-v3/python/Makefile.am
@@ -58,9 +58,13 @@ install-data-local: gdb.py
  libname=`sed -ne "/^old_library=/{s/.*='//;s/'$$//;s/ .*//;p;}" \
  $(DESTDIR)$(toolexeclibdir)/libstdc++.la`; \
fi; \
+   echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)"; \
+   $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir); \
echo " $(INSTALL_DATA) gdb.py 
$(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py"; \
$(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
if [ -n "$(debug_gdb_py)" ]; then \
  sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug"; \
+ $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug; \
  $(INSTALL_DATA) debug-gdb.py 
$(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
fi
diff --git a/libstdc++-v3/python/Makefile.in b/libstdc++-v3/python/Makefile.in
index 05e79b5ac1e..a68c1836481 100644
--- a/libstdc++-v3/python/Makefile.in
+++ b/libstdc++-v3/python/Makefile.in
@@ -623,10 +623,14 @@ install-data-local: gdb.py
  libname=`sed -ne "/^old_library=/{s/.*='//;s/'$$//;s/ .*//;p;}" \
  $(DESTDIR)$(toolexeclibdir)/libstdc++.la`; \
fi; \
+   echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)"; \
+   $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir); \
echo " $(INSTALL_DATA) gdb.py 
$(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py"; \
$(INSTALL_DATA) gdb.py $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
if [ -n "$(debug_gdb_py)" ]; then \
  sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
+ echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug"; \
+ $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug; \
  $(INSTALL_DATA) debug-gdb.py 
$(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
fi
 
-- 
2.38.1



Re: [PATCH (pushed)] sphinx: fix building if sphinx-build is missing

2022-11-13 Thread Stephan Bergmann via Gcc-patches

On 11/9/22 10:09, Martin Liška wrote:

I noticed I modified Makefile.in files in the Sphinx series, while I was
supposed to modify Makefile.am and use automake. I'm going to fix that soon.


A recent master build against (apparently too old) 
python3-sphinx-5.0.2-2.fc37.noarch failed for me with


[...]

make -C ../../src/gcc/../doc man 
SOURCEDIR=.../build/gcc/../../src/gcc/fortran/doc/gfortran 
BUILDDIR=.../build/gcc/doc/gfortran/man SPHINXBUILD=sphinx-build
make[3]: Entering directory '.../src/doc'
sphinx-build -b "man" -d .../build/gcc/doc/gfortran/man/doctrees  -q  
.../build/gcc/../../src/gcc/fortran/doc/gfortran ".../build/gcc/doc/gfortran/man/man"

Sphinx version error:
This project needs at least Sphinx v5.3 and therefore cannot be built with this 
version.
make[3]: *** [Makefile:100: man] Error 2
make[3]: Leaving directory '.../src/doc'
make[2]: *** [../../src/gcc/fortran/Make-lang.in:164: 
doc/gfortran/man/man/gfortran.1] Error 2
make[2]: Leaving directory '.../build/gcc'
make[1]: *** [Makefile:5300: install-gcc] Error 2
make[1]: Leaving directory '.../build'
make: *** [Makefile:2576: install] Error 2


which would be fixed by


diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in
index 48acbed1754..852b6f3327f 100644
--- a/gcc/fortran/Make-lang.in
+++ b/gcc/fortran/Make-lang.in
@@ -161,7 +161,9 @@ fortran.install-pdf: $(F95_PDFFILES)
 F95_MANFILES = doc/gfortran/man/man/gfortran.1
 
 doc/gfortran/man/man/gfortran.1: $(SPHINX_FILES)

-   + make -C $(srcdir)/../doc man 
SOURCEDIR=$(abs_srcdir)/fortran/doc/gfortran 
BUILDDIR=$(objdir)/doc/gfortran/man SPHINXBUILD=$(SPHINX_BUILD)
+   + if [ x$(HAS_SPHINX_BUILD) = xhas-sphinx-build ]; then \
+ make -C $(srcdir)/../doc man 
SOURCEDIR=$(abs_srcdir)/fortran/doc/gfortran 
BUILDDIR=$(objdir)/doc/gfortran/man SPHINXBUILD=$(SPHINX_BUILD) \
+   else true; fi
 
 fortran.man: $(F95_MANFILES)
 





Re: [committed] libstdc++: Avoid redundant checks in std::use_facet [PR103755]

2022-11-13 Thread Stephan Bergmann via Gcc-patches

On 11/12/22 03:47, Jonathan Wakely wrote:

On Fri, 11 Nov 2022 at 21:00, Stephan Bergmann  wrote:


On 11/11/22 06:30, Jonathan Wakely via Gcc-patches wrote:

As discussed in the PR, this makes it three times faster to construct
iostreams objects.

Tested x86_64-linux. Pushed to trunk.


I haven't yet tried to track down what's going on, but with various
versions of Clang (e.g. clang-15.0.4-1.fc37.x86_64):


$ cat test.cc
#include <regex>
int main(int, char ** argv) {
 std::regex_traits<char>().transform(argv[0], argv[0] + 1);
}



$ clang++ --gcc-toolchain=... -fsanitize=undefined -O2 test.cc
/usr/bin/ld: /tmp/test-c112b1.o: in function `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
std::__cxx11::regex_traits<char>::transform<char*>(char*, char*) const':
test.cc:(.text._ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_[_ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_]+0x1b):
 undefined reference to `std::__cxx11::collate<char> const*
std::__try_use_facet<std::__cxx11::collate<char> >(std::locale const&)'
clang-15: error: linker command failed with exit code 1 (use -v to see 
invocation)


That should be present, and is present in my builds:

_ZSt15__try_use_facetINSt7__cxx117collateIcEEEPKT_RKSt6locale
std::__cxx11::collate<char> const*
std::__try_use_facet<std::__cxx11::collate<char> >(std::locale const&)
version status: compatible
GLIBCXX_3.4.31
type: function
status: added

Was this a clean build, or incremental? I'm guessing the latter.


Yes, indeed.  And a full rebuild fixed the issue for me.

Sorry for the noise.



Re: [PATCH (pushed)] sphinx: fix building if sphinx-build is missing

2022-11-13 Thread Martin Liška
On 11/13/22 20:18, Stephan Bergmann wrote:
> On 11/9/22 10:09, Martin Liška wrote:
>> I noticed I modified Makefile.in files in the Sphinx series, while I was
>> supposed to modify Makefile.am and use automake. I'm going to fix that
>> soon.
> 
> A recent master build against (apparently too old) 
> python3-sphinx-5.0.2-2.fc37.noarch failed for me with

I see!

> 
> [...]
>> make -C ../../src/gcc/../doc man 
>> SOURCEDIR=.../build/gcc/../../src/gcc/fortran/doc/gfortran 
>> BUILDDIR=.../build/gcc/doc/gfortran/man SPHINXBUILD=sphinx-build
>> make[3]: Entering directory '.../src/doc'
>> sphinx-build -b "man" -d .../build/gcc/doc/gfortran/man/doctrees  -q  
>> .../build/gcc/../../src/gcc/fortran/doc/gfortran 
>> ".../build/gcc/doc/gfortran/man/man"
>>
>> Sphinx version error:
>> This project needs at least Sphinx v5.3 and therefore cannot be built with 
>> this version.
>> make[3]: *** [Makefile:100: man] Error 2
>> make[3]: Leaving directory '.../src/doc'
>> make[2]: *** [../../src/gcc/fortran/Make-lang.in:164: 
>> doc/gfortran/man/man/gfortran.1] Error 2
>> make[2]: Leaving directory '.../build/gcc'
>> make[1]: *** [Makefile:5300: install-gcc] Error 2
>> make[1]: Leaving directory '.../build'
>> make: *** [Makefile:2576: install] Error 2
> 
> which would be fixed by

The patch is fine, please send it to gcc-patches ML and install it.

Thanks,
Martin

> 
>> diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in
>> index 48acbed1754..852b6f3327f 100644
>> --- a/gcc/fortran/Make-lang.in
>> +++ b/gcc/fortran/Make-lang.in
>> @@ -161,7 +161,9 @@ fortran.install-pdf: $(F95_PDFFILES)
>>  F95_MANFILES = doc/gfortran/man/man/gfortran.1
>>  
>>  doc/gfortran/man/man/gfortran.1: $(SPHINX_FILES)
>> -   + make -C $(srcdir)/../doc man 
>> SOURCEDIR=$(abs_srcdir)/fortran/doc/gfortran 
>> BUILDDIR=$(objdir)/doc/gfortran/man SPHINXBUILD=$(SPHINX_BUILD)
>> +   + if [ x$(HAS_SPHINX_BUILD) = xhas-sphinx-build ]; then \
>> + make -C $(srcdir)/../doc man 
>> SOURCEDIR=$(abs_srcdir)/fortran/doc/gfortran 
>> BUILDDIR=$(objdir)/doc/gfortran/man SPHINXBUILD=$(SPHINX_BUILD) \
>> +   else true; fi
>>  
>>  fortran.man: $(F95_MANFILES)
>>  
> 
> 



Re: [committed] libstdc++: Avoid redundant checks in std::use_facet [PR103755]

2022-11-13 Thread Jonathan Wakely via Gcc-patches
On Sun, 13 Nov 2022, 19:23 Stephan Bergmann via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> On 11/12/22 03:47, Jonathan Wakely wrote:
> > On Fri, 11 Nov 2022 at 21:00, Stephan Bergmann 
> wrote:
> >>
> >> On 11/11/22 06:30, Jonathan Wakely via Gcc-patches wrote:
> >>> As discussed in the PR, this makes it three times faster to construct
> >>> iostreams objects.
> >>>
> >>> Tested x86_64-linux. Pushed to trunk.
> >>
> >> I haven't yet tried to track down what's going on, but with various
> >> versions of Clang (e.g. clang-15.0.4-1.fc37.x86_64):
> >>
> >>> $ cat test.cc
> >>> #include <regex>
> >>> int main(int, char ** argv) {
> >>>  std::regex_traits<char>().transform(argv[0], argv[0] + 1);
> >>> }
> >>
> >>> $ clang++ --gcc-toolchain=... -fsanitize=undefined -O2 test.cc
> >>> /usr/bin/ld: /tmp/test-c112b1.o: in function
> `std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >
> std::__cxx11::regex_traits<char>::transform<char*>(char*, char*) const':
> >>>
> test.cc:(.text._ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_[_ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_]+0x1b):
> undefined reference to `std::__cxx11::collate<char> const*
> std::__try_use_facet<std::__cxx11::collate<char> >(std::locale const&)'
> >>> clang-15: error: linker command failed with exit code 1 (use -v to see
> invocation)
> >
> > That should be present, and is present in my builds:
> >
> > _ZSt15__try_use_facetINSt7__cxx117collateIcEEEPKT_RKSt6locale
> > std::__cxx11::collate<char> const*
> > std::__try_use_facet<std::__cxx11::collate<char> >(std::locale const&)
> > version status: compatible
> > GLIBCXX_3.4.31
> > type: function
> > status: added
> >
> > Was this a clean build, or incremental? I'm guessing the latter.
>
> Yes, indeed.  And a full rebuild fixed the issue for me.
>

Ah good.


> Sorry for the noise.
>

No problem, you do a good job of keeping us working well with clang.


Re: old install to a different folder

2022-11-13 Thread Martin Liška
On 11/12/22 01:06, Joseph Myers wrote:
> On Fri, 11 Nov 2022, Tobias Burnus wrote:
> 
>> For /onlinedocs/, I concur that we want to have the old doc there as there 
>> are
>> many
>> deep links. Still, we should consider adding a disclaimer box to all former
>> mainline
>> documentation stating that this data is no longer updated + point to the new
>> overview
>> page + we could redirect access which goes directly to '//' and
>> not a (sub)html
>> page to the new site, as you proposed.
> 
> Note that if we do this, I think my previous comments (to keep the 
> unmodified files somewhere long-term outside the directory served by the 
> webserver, with an automated process for modifying headers, so that 
> modified versions of that process can be re-run in future if we have 
> further changes to apply to the added headers) apply to such old mainline 
> documentation just as to release documentation.
> 

Hello.

So in the context of [1] and [2] we will have to move content between files,
and thus newly taken URLs (from gcc.gnu.org/onlinedocs/$man and
gcc.gnu.org/install) can become invalid.

So Gerald, I'm suggesting a new URL base gcc.gnu.org/docs that will be filled
with the new manuals, while the gcc.gnu.org/onlinedocs/$man and
gcc.gnu.org/install locations should point to the older (trunk) manuals
(prev folder on the server, I guess).
Having that, the new manuals will not be available through navigation and will
get some time for further changes.

Thanks.
Martin

[1] https://gcc.gnu.org/pipermail/gcc/2022-November/239922.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107634


Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-13 Thread Jonathan Wakely via Gcc-patches
On Sun, 13 Nov 2022, 18:06 Arsen Arsenović via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> I'm unsure why this issue only started manifesting now with how old this
> code is, but this should fix it.
>

I just pushed a change to how the debug build makefiles are generated,
which presumably uncovered this latent bug. I'll review the patch in the
morning.



> libstdc++-v3/ChangeLog:
>
> * python/Makefile.am: Call mkinstalldirs before INSTALL_DATA
> when installing gdb scripts.
> * python/Makefile.in: Regenerate.
> ---
> Hi,
>
> Someone on IRC spotted an error: if trying to install to a fresh
> prefix/sysroot with --enable-libstdcxx-debug, the install fails since its
> intended target directories don't exist.  I could replicate this on
> r13-3944-g43435c7eb0ff60 using
>
> $ ../gcc/configure --disable-bootstrap \
> --enable-libstdcxx-debug \
> --enable-languages=c,c++ \
> --prefix=$(pwd)/pfx
>
> Install tested on x86_64-pc-linux-gnu with and without
> --enable-libstdcxx-debug.
>
>  libstdc++-v3/python/Makefile.am | 4 
>  libstdc++-v3/python/Makefile.in | 4 
>  2 files changed, 8 insertions(+)
>
> diff --git a/libstdc++-v3/python/Makefile.am
> b/libstdc++-v3/python/Makefile.am
> index f523d3a44dc..7987d33e6d9 100644
> --- a/libstdc++-v3/python/Makefile.am
> +++ b/libstdc++-v3/python/Makefile.am
> @@ -58,9 +58,13 @@ install-data-local: gdb.py
>   libname=`sed -ne "/^old_library=/{s/.*='//;s/'$$//;s/ .*//;p;}" \
>   $(DESTDIR)$(toolexeclibdir)/libstdc++.la`; \
> fi; \
> +   echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)"; \
> +   $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir); \
> echo " $(INSTALL_DATA) gdb.py
> $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py"; \
> $(INSTALL_DATA) gdb.py
> $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
> if [ -n "$(debug_gdb_py)" ]; then \
>   sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
> + echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug"; \
> + $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug; \
>   $(INSTALL_DATA) debug-gdb.py
> $(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
> fi
> diff --git a/libstdc++-v3/python/Makefile.in
> b/libstdc++-v3/python/Makefile.in
> index 05e79b5ac1e..a68c1836481 100644
> --- a/libstdc++-v3/python/Makefile.in
> +++ b/libstdc++-v3/python/Makefile.in
> @@ -623,10 +623,14 @@ install-data-local: gdb.py
>   libname=`sed -ne "/^old_library=/{s/.*='//;s/'$$//;s/ .*//;p;}" \
>   $(DESTDIR)$(toolexeclibdir)/libstdc++.la`; \
> fi; \
> +   echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)"; \
> +   $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir); \
> echo " $(INSTALL_DATA) gdb.py
> $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py"; \
> $(INSTALL_DATA) gdb.py
> $(DESTDIR)$(toolexeclibdir)/$$libname-gdb.py ; \
> if [ -n "$(debug_gdb_py)" ]; then \
>   sed "/^libdir = /s;'$$;/debug';" gdb.py > debug-gdb.py ; \
> + echo " $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug"; \
> + $(mkinstalldirs) $(DESTDIR)$(toolexeclibdir)/debug; \
>   $(INSTALL_DATA) debug-gdb.py
> $(DESTDIR)$(toolexeclibdir)/debug/$$libname-gdb.py ; \
> fi
>
> --
> 2.38.1
>
>


[PATCH] [range-ops] Implement sqrt.

2022-11-13 Thread Aldy Hernandez via Gcc-patches
It seems SQRT is relatively straightforward, and it's something Jakub
wanted for this release.

Jakub, what do you think?

p.s. Too tired to think about op1_range.

gcc/ChangeLog:

* gimple-range-op.cc (class cfn_sqrt): New.
(gimple_range_op_handler::maybe_builtin_call): Add cases for sqrt.
---
 gcc/gimple-range-op.cc | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 7764166d5fb..240cd8b6a11 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "range.h"
 #include "value-query.h"
 #include "gimple-range.h"
+#include "fold-const-call.h"
 
 // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
 // on the statement.  For efficiency, it is an error to not pass in enough
@@ -301,6 +302,41 @@ public:
   }
 } op_cfn_constant_p;
 
+// Implement range operator for SQRT.
+class cfn_sqrt : public range_operator_float
+{
+  using range_operator_float::fold_range;
+private:
+  REAL_VALUE_TYPE real_sqrt (const REAL_VALUE_TYPE &arg, tree type) const
+  {
+tree targ = build_real (type, arg);
+tree res = fold_const_call (as_combined_fn (BUILT_IN_SQRT), type, targ);
+return *TREE_REAL_CST_PTR (res);
+  }
+  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
+   tree type,
+   const REAL_VALUE_TYPE &lh_lb,
+   const REAL_VALUE_TYPE &lh_ub,
+   const REAL_VALUE_TYPE &,
+   const REAL_VALUE_TYPE &,
+   relation_kind) const final override
+  {
+if (real_compare (LT_EXPR, &lh_ub, &dconst0))
+  {
+   real_nan (&lb, "", 0, TYPE_MODE (type));
+   ub = lb;
+   maybe_nan = true;
+   return;
+  }
+lb = real_sqrt (lh_lb, type);
+ub = real_sqrt (lh_ub, type);
+if (real_compare (GE_EXPR, &lh_lb, &dconst0))
+  maybe_nan = false;
+else
+  maybe_nan = true;
+  }
+} fop_cfn_sqrt;
+
 // Implement range operator for CFN_BUILT_IN_SIGNBIT.
 class cfn_signbit : public range_operator_float
 {
@@ -907,6 +943,12 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_int = &op_cfn_parity;
   break;
 
+CASE_CFN_SQRT:
+CASE_CFN_SQRT_FN:
+  m_valid = true;
+  m_float = &fop_cfn_sqrt;
+  break;
+
 default:
   break;
 }
-- 
2.38.1



Re: [PATCH 3/5] Fortran: Narrow return types [PR78798]

2022-11-13 Thread Harald Anlauf via Gcc-patches

Am 13.11.22 um 11:39 schrieb Bernhard Reutner-Fischer via Gcc-patches:

On Sun, 13 Nov 2022 12:13:26 +0200
Janne Blomqvist  wrote:


On Sun, Nov 13, 2022 at 1:47 AM Bernhard Reutner-Fischer via Fortran
 wrote:

--- a/gcc/fortran/arith.cc
+++ b/gcc/fortran/arith.cc
@@ -1135,7 +1135,7 @@ compare_complex (gfc_expr *op1, gfc_expr *op2)
 strings.  We return -1 for a < b, 0 for a == b and 1 for a > b.
 We use the processor's default collating sequence.  */

-int
+signed char
  gfc_compare_string (gfc_expr *a, gfc_expr *b)
  {
size_t len, alen, blen, i;
@@ -1162,7 +1162,7 @@ gfc_compare_string (gfc_expr *a, gfc_expr *b)
  }


Hmm, really? PR 78798 mentions changing int to bool, where
appropriate, which I think is uncontroversial, but this?


Well we could leave this or all spots alone where a bool is
insufficient, if you prefer.

In the case of gfc_compare_string, the only user is simplify which only
looks at ge/gt/le/lt 0


My reading of the mentioned PR is that there is a fundamental
disagreement with the subject:

Bug 78798 - [cleanup] some int-valued functions should be bool

I see that as an issue of (a minor lack of) conciseness;
it is *not* about narrowing.

Replacing "int" by "signed char" adds confusion and makes code
less understandable, so I would oppose it, as we don't solve a
real problem and rather add confusion.



Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-13 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 13 Nov 2022 19:42:52 +
Jonathan Wakely via Gcc-patches  wrote:

> On Sun, 13 Nov 2022, 18:06 Arsen Arsenović via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:  
> 
> > I'm unsure why this issue only started manifesting now with how old this
> > code is, but this should fix it.
> >  
> 
> I just pushed a change to how the debug build makefiles are generated,
> which presumably uncovered this latent bug. I'll review the patch in the
> morning.

Ah, you removed debugdir everywhere but in the install-debug rule :)
I.e.:

$ git diff
diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
index b545ebf0dcf..bfa031ea395 100644
--- a/libstdc++-v3/src/Makefile.am
+++ b/libstdc++-v3/src/Makefile.am
@@ -422,5 +422,5 @@ build-debug: stamp-debug $(debug_backtrace_supported_h)
 
 # Install debug library.
 install-debug: build-debug
-   (cd ${debugdir} && $(MAKE) CXXFLAGS='$(DEBUG_FLAGS)' \
-   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install) ;
+   $(MAKE) -C debug CXXFLAGS='$(DEBUG_FLAGS)' \
+   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install
diff --git a/libstdc++-v3/src/Makefile.in b/libstdc++-v3/src/Makefile.in
index f54ee282fb0..8479d297389 100644
--- a/libstdc++-v3/src/Makefile.in
+++ b/libstdc++-v3/src/Makefile.in
@@ -1142,8 +1142,8 @@ build-debug: stamp-debug $(debug_backtrace_supported_h)
 
 # Install debug library.
 install-debug: build-debug
-   (cd ${debugdir} && $(MAKE) CXXFLAGS='$(DEBUG_FLAGS)' \
-   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install) ;
+   $(MAKE) -C debug CXXFLAGS='$(DEBUG_FLAGS)' \
+   toolexeclibdir=$(glibcxx_toolexeclibdir)/debug install
 
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.

I personally did not experience the gdb.py install bug Arsen seems to
have encountered though.

thanks!
> 
> 
> 
> > libstdc++-v3/ChangeLog:
> >
> > * python/Makefile.am: Call mkinstalldirs before INSTALL_DATA
> > when installing gdb scripts.
> > * python/Makefile.in: Regenerate.
> > ---
> > Hi,
> >
> > Someone on IRC spotted an error: if trying to install to a fresh
> > prefix/sysroot with --enable-libstdcxx-debug, the install fails since its
> > intended target directories don't exist.  I could replicate this on
> > r13-3944-g43435c7eb0ff60 using
> >
> > $ ../gcc/configure --disable-bootstrap \
> > --enable-libstdcxx-debug \
> > --enable-languages=c,c++ \
> > --prefix=$(pwd)/pfx


Re: [PATCH] [range-ops] Implement sqrt.

2022-11-13 Thread Jakub Jelinek via Gcc-patches
On Sun, Nov 13, 2022 at 09:05:53PM +0100, Aldy Hernandez wrote:
> It seems SQRT is relatively straightforward, and it's something Jakub
> wanted for this release.
> 
> Jakub, what do you think?
> 
> p.s. Too tired to think about op1_range.

That would be multiplication of the same value twice, i.e.
fop_mult with trio that has op1_op2 () == VREL_EQ?
But see below, as sqrt won't be always precise, we need to account for
some errors.

> gcc/ChangeLog:
> 
>   * gimple-range-op.cc (class cfn_sqrt): New.
>   (gimple_range_op_handler::maybe_builtin_call): Add cases for sqrt.

Yes, I'd like to see SQRT support in.
The only thing I'm worried about is that, unlike {+,-,*,/}, negation etc.,
which are typically implemented in hardware or precise soft-float, sqrt is
often implemented in a library using multiple floating point arithmetic
functions.  And different implementations have different accuracy.

So, I wonder if we don't need to add a target hook where targets will be
able to provide upper bound on error for floating point functions for
different floating point modes and some way to signal unknown accuracy/can't
be trusted, in which case we would give up or return just the range for
VARYING.
Then we could write some tests that, say, in a loop construct random
floating point values (perhaps sanitized to be non-NAN), call the libm function
and the same mpfr one, and return the maximum error in ulps.
And then record those, initially for glibc and the most common targets, and
gradually maintainers could supply more.
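
The measurement loop described above could look roughly like the sketch
below.  It is only an illustration, not the proposed GCC test: instead of
mpfr it uses double-precision std::sqrt as the higher-precision reference
for a float sqrt, and the names ulp_of and max_sqrtf_error_ulps are made up
for this example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <random>

// Size of one ulp at the float value V, taken as the distance from |V| to
// the next float above it.  V must be finite.
static double ulp_of (float v)
{
  assert (std::isfinite (v));
  float a = std::fabs (v);
  float next = std::nextafter (a, std::numeric_limits<float>::infinity ());
  return (double) next - (double) a;
}

// Return the largest observed error, in ulps, of float sqrt over N random
// non-negative inputs.  Double-precision sqrt stands in for mpfr here
// (an assumption of this sketch).
static double max_sqrtf_error_ulps (unsigned n, unsigned seed)
{
  std::mt19937 gen (seed);
  std::uniform_real_distribution<float> dist (0.0f, 1.0e6f);
  double worst = 0.0;
  for (unsigned i = 0; i < n; ++i)
    {
      float x = dist (gen);
      float got = std::sqrt (x);             // function under test
      double ref = std::sqrt ((double) x);   // higher-precision reference
      double err = std::fabs ((double) got - ref) / ulp_of (got);
      if (err > worst)
        worst = err;
    }
  return worst;
}
```

A per-target table would then record the value returned for each function
and floating point mode.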

If we add an infrastructure for that within a few days, then we could start
filling the details.  One would hope that sqrt has < 10ulps accuracy if not
already the 0.5ulp one, but for various other functions I think it can be
much more.  Oh, nanq in libquadmath has terrible accuracy, but that one
fortunately is not builtin...

If we have some small integer for ulps accuracy of calls (we could use
0 for 0.5ulps accuracy aka precise), wonder if we'd handle it just as a loop
of doing n times frange_nextafter or something smarter.
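
As a rough sketch of the "loop of n times nextafter" variant, using plain
doubles rather than frange: the interval struct and widen_by_ulps are
hypothetical names, and std::nextafter stands in for frange_nextafter.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the lower/upper bound pair of an frange.
struct interval { double lb, ub; };

// Widen R outward by ULPS representable values on each side, so that a
// library result that is off by up to ULPS ulps still falls inside.
static interval widen_by_ulps (interval r, unsigned ulps)
{
  assert (r.lb <= r.ub);
  const double inf = std::numeric_limits<double>::infinity ();
  for (unsigned i = 0; i < ulps; ++i)
    {
      r.lb = std::nextafter (r.lb, -inf);
      r.ub = std::nextafter (r.ub, inf);
    }
  return r;
}
```

With ulps == 0 (the "precise" case) the range is returned unchanged; the
loop is linear in the ulps count, which is only reasonable for small values.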

> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "range.h"
>  #include "value-query.h"
>  #include "gimple-range.h"
> +#include "fold-const-call.h"
>  
>  // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
>  // on the statement.  For efficiency, it is an error to not pass in enough
> @@ -301,6 +302,41 @@ public:
>}
>  } op_cfn_constant_p;
>  
> +// Implement range operator for SQRT.
> +class cfn_sqrt : public range_operator_float
> +{
> +  using range_operator_float::fold_range;
> +private:
> +  REAL_VALUE_TYPE real_sqrt (const REAL_VALUE_TYPE &arg, tree type) const
> +  {
> +tree targ = build_real (type, arg);
> +tree res = fold_const_call (as_combined_fn (BUILT_IN_SQRT), type, targ);
> +return *TREE_REAL_CST_PTR (res);
> +  }
> +  void rv_fold (REAL_VALUE_TYPE &lb, REAL_VALUE_TYPE &ub, bool &maybe_nan,
> + tree type,
> + const REAL_VALUE_TYPE &lh_lb,
> + const REAL_VALUE_TYPE &lh_ub,
> + const REAL_VALUE_TYPE &,
> + const REAL_VALUE_TYPE &,
> + relation_kind) const final override
> +  {
> +if (real_compare (LT_EXPR, &lh_ub, &dconst0))
> +  {
> + real_nan (&lb, "", 0, TYPE_MODE (type));
> + ub = lb;
> + maybe_nan = true;
> + return;
> +  }
> +lb = real_sqrt (lh_lb, type);
> +ub = real_sqrt (lh_ub, type);
> +if (real_compare (GE_EXPR, &lh_lb, &dconst0))
> +  maybe_nan = false;
> +else
> +  maybe_nan = true;

Doesn't this for say VARYING range result in [NAN, +INF] range?
We want [-0.0, +INF].
So perhaps the real_compare should be done before doing the real_sqrt calls
and for the maybe_nan case use hardcoded -0.0 as lb?
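
A minimal sketch of that ordering, on plain doubles instead of
REAL_VALUE_TYPE (sqrt_range and fold_sqrt are hypothetical names, and this
ignores the ulps question entirely):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical result type: bounds of the non-NAN part plus a NAN flag.
struct sqrt_range { double lb, ub; bool maybe_nan; };

// Fold the operand range [LO, HI] (LO <= HI, neither one NAN) through sqrt.
static sqrt_range fold_sqrt (double lo, double hi)
{
  assert (lo <= hi);
  if (hi < 0.0)
    {
      // Every operand is negative, so the result is always NAN.
      double nan = std::numeric_limits<double>::quiet_NaN ();
      return { nan, nan, true };
    }
  if (lo < 0.0)
    // Negative operands give NAN; the remaining ones start at -0.0, so
    // do the comparison first and never call sqrt on the negative bound.
    return { -0.0, std::sqrt (hi), true };
  return { std::sqrt (lo), std::sqrt (hi), false };
}
```

For a VARYING operand this yields [-0.0, +INF] with maybe_nan set, rather
than a NAN lower bound.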

BTW, as for the ulps, another thing to test is whether even when
the library has certain number of ulps error worst case whether it still
obeys the basic math properties of the function or not.
Say for sqrt that it always fits into [-0.0, +INF] (guess because of the
flush denormals to zero we wouldn't have a problem here for say 30ulps
sqrt that [nextafter (0.0, 1.0) * 16.0, 64.0] wouldn't be considered
[-nextafter (0.0, 1.0) * 16.0, 8.0 + 30ulps] but just
[-0.0, 8.0 + 30ulps], but later on say sin/cos, which mathematically should
have result always in [-1.0, 1.0] +-NAN, it would be interesting to see
if there aren't some implementations that would happily return 1.0 + 15ulps
or -1.0 - 20ulps.

And unrelated thought about reverse y = x * x; if we know y's range
- op1_range/op2_range in that case could be handled as sqrt without the
library ulps treatment (if we assume that multiplication is always precise),
but the question is if op1_range or op2_range is called at all in those
cases and whether we could similarly use trio to derive in that case the
x's range.
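
To illustrate the reverse direction on plain doubles (a sketch only: the
interval struct and square_operand_range are hypothetical, and rounding of
the multiplication and of sqrt itself is ignored here, which a real
implementation could not do):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the lower/upper bound pair of an frange.
struct interval { double lb, ub; };

// Given y = x * x with y known to lie in Y (Y.ub >= 0), return a range
// for x.  A single interval cannot express the gap around zero that a
// positive Y.lb would imply, so only Y.ub is used.
static interval square_operand_range (interval y)
{
  assert (y.lb <= y.ub && y.ub >= 0.0);
  double bound = std::sqrt (y.ub);
  return { -bound, bound };
}
```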

Jakub



[PATCH] RISC-V: Use .p2align for code-alignment

2022-11-13 Thread Philipp Tomsich
RISC-V's .p2align (currently) ignores the max-skip argument.  As we
have experimental patches underway to address this in a
backwards-compatible manner, let's prepare GCC for the day when
binutils gets updated.

gcc/ChangeLog:

* config/riscv/riscv.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Implement.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 2d0d170645c..c216173cf6b 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -938,6 +938,24 @@ typedef struct {
   fprintf (STREAM, "\t.word\t%sL%d-%sL%d\n",   \
   LOCAL_LABEL_PREFIX, VALUE, LOCAL_LABEL_PREFIX, REL)
 
+#ifdef HAVE_GAS_MAX_SKIP_P2ALIGN
+/* Support for -falign-* switches.  Use .p2align to ensure that code
+   sections are padded with NOP instructions, rather than zeros.  */
+#define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE, LOG, MAX_SKIP) \
+  do   \
+{  \
+  if ((LOG) != 0)  \
+   {   \
+ if ((MAX_SKIP) == 0)  \
+   fprintf ((FILE), "\t.p2align %d\n", (int) (LOG));   \
+ else  \
+   fprintf ((FILE), "\t.p2align %d,,%d\n", \
+(int) (LOG), (int) (MAX_SKIP));\
+   }   \
+} while (0)
+
+#endif /* HAVE_GAS_MAX_SKIP_P2ALIGN */
+
 /* This is how to output an assembler line
that says to advance the location counter
to a multiple of 2**LOG bytes.  */
-- 
2.34.1



[PATCH] RISC-V: Zihintpause: add __builtin_riscv_pause

2022-11-13 Thread Philipp Tomsich
The Zihintpause extension uses an opcode from the 'fence' opcode range
to add a true hint instruction (i.e. if it is not supported on any
given platform, the 'fence' that is encoded will not enforce any
specific ordering on memory accesses) for entering a low-power state
(e.g. in an idle thread).  We expose this new instruction through a
machine-dependent builtin to allow generating it without a requirement
for any inline assembly.

Given that the encoding of 'pause' is valid (as a 'fence' encoding)
even for processors that do not (yet) support Zihintpause, we make
this builtin available without any further TARGET_* constraints.
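
As an illustration of the intended use (a sketch, not part of the patch), a
spin-wait loop could call the builtin on each iteration.  The CPU_RELAX
macro and spin_until_set are hypothetical names; the __has_builtin check and
the empty fallback are only there so the example also builds with compilers
that lack the builtin.

```cpp
#include <atomic>
#include <cassert>

// Use the builtin from this patch when the compiler provides it; otherwise
// fall back to doing nothing so the sketch still builds elsewhere.
#if defined (__has_builtin)
# if __has_builtin (__builtin_riscv_pause)
#  define CPU_RELAX() __builtin_riscv_pause ()
# endif
#endif
#ifndef CPU_RELAX
# define CPU_RELAX() ((void) 0)
#endif

// Spin until FLAG becomes true, hinting the core on every iteration.
// Returns the number of iterations spent waiting.
static unsigned long spin_until_set (const std::atomic<bool> &flag)
{
  unsigned long spins = 0;
  while (!flag.load (std::memory_order_acquire))
    {
      CPU_RELAX ();
      ++spins;
    }
  return spins;
}
```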

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (struct riscv_builtin_description):
add the pause machine-dependent builtin with no result and no
arguments; mark it as always present (pause is a true hint
that encodes into a fence-insn, if not supported with the new
pause semantics).
* config/riscv/riscv-ftypes.def: Add type for void -> void.
* config/riscv/riscv.md (riscv_pause): Add riscv_pause and UNSPECV_PAUSE.
* 
doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst:
Document.
* optabs.cc (maybe_gen_insn): Allow nops == 0 (void -> void).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv-builtins.cc |  6 +++---
 gcc/config/riscv/riscv-ftypes.def  |  1 +
 gcc/config/riscv/riscv.md  |  8 
 .../target-builtins/risc-v-built-in-functions.rst  |  4 
 gcc/optabs.cc  |  2 ++
 gcc/testsuite/gcc.target/riscv/builtin_pause.c | 10 ++
 6 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtin_pause.c

diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 021f6c6b69a..24ae22c99cd 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -88,8 +88,6 @@ struct riscv_builtin_description {
 };
 
 AVAIL (hard_float, TARGET_HARD_FLOAT || TARGET_ZFINX)
-
-
 AVAIL (clean32, TARGET_ZICBOM && !TARGET_64BIT)
 AVAIL (clean64, TARGET_ZICBOM && TARGET_64BIT)
 AVAIL (flush32, TARGET_ZICBOM && !TARGET_64BIT)
@@ -100,6 +98,7 @@ AVAIL (zero32,  TARGET_ZICBOZ && !TARGET_64BIT)
 AVAIL (zero64,  TARGET_ZICBOZ && TARGET_64BIT)
 AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
 AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
+AVAIL (always, (!0))
 
 /* Construct a riscv_builtin_description from the given arguments.
 
@@ -148,7 +147,8 @@ static const struct riscv_builtin_description 
riscv_builtins[] = {
   #include "riscv-cmo.def"
 
   DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
-  DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float)
+  DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float),
+  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, always),
 };
 
 /* Index I is the function declaration for riscv_builtins[I], or null if the
diff --git a/gcc/config/riscv/riscv-ftypes.def 
b/gcc/config/riscv/riscv-ftypes.def
index c2b45c63ea1..bf2d30782d9 100644
--- a/gcc/config/riscv/riscv-ftypes.def
+++ b/gcc/config/riscv/riscv-ftypes.def
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 argument type.  */
 
 DEF_RISCV_FTYPE (0, (USI))
+DEF_RISCV_FTYPE (0, (VOID))
 DEF_RISCV_FTYPE (1, (VOID, USI))
 DEF_RISCV_FTYPE (1, (VOID, VOID_PTR))
 DEF_RISCV_FTYPE (1, (SI, SI))
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d1f3270a3c8..a933764e897 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -94,6 +94,9 @@
   UNSPECV_INVAL
   UNSPECV_ZERO
   UNSPECV_PREI
+
+  ;; Zihintpause unspec
+  UNSPECV_PAUSE
 ])
 
 (define_constants
@@ -1982,6 +1985,11 @@
   "TARGET_ZIFENCEI"
   "fence.i")
 
+(define_insn "riscv_pause"
+  [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
+  ""
+  "pause")
+
 ;;
 ;;  
 ;;
diff --git 
a/gcc/doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst
 
b/gcc/doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst
index fca4852ad74..b2f59b310fb 100644
--- 
a/gcc/doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst
+++ 
b/gcc/doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst
@@ -14,3 +14,7 @@ processors.
 .. function:: void * __builtin_thread_pointer (void)
 
   Returns the value that is currently set in the :samp:`tp` register.
+
+.. function:: void __builtin_riscv_pause (void)
+
+  Generates the :samp:`pause` (hint) machine instruction
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 9fc9b1fc6e9..09d3b08cb00 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ 

[PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-13 Thread Philipp Tomsich


This series provides support for the Ventana VT1 (a 4-way superscalar
rv64gc_zba_zbb_zbc_zbs_zifencei_xventanacondops core) including support
for the supported instruction fusion patterns.

This includes the addition of the fusion-aware scheduling
infrastructure for RISC-V and implements idiom recognition for the
fusion patterns supported by VT1.

Note that we don't signal support for XVentanaCondOps at this point,
as the XVentanaCondOps support is in-flight separately.  Changing the
defaults for VT1 can happen late in the cycle, so no need to link the
two different changesets.

Changes in v2:
- Rebased and changed over to .rst-based documentation
- Updated to catch more fusion cases
- Signals support for Zifencei

Philipp Tomsich (2):
  RISC-V: Add basic support for the Ventana-VT1 core
  RISC-V: Add instruction fusion (for ventana-vt1)

 gcc/config/riscv/riscv-cores.def  |   3 +
 gcc/config/riscv/riscv-opts.h |   2 +-
 gcc/config/riscv/riscv.cc | 233 ++
 .../risc-v-options.rst|   5 +-
 4 files changed, 240 insertions(+), 3 deletions(-)

-- 
2.34.1



[PATCH v2 1/2] RISC-V: Add basic support for the Ventana-VT1 core

2022-11-13 Thread Philipp Tomsich
The Ventana-VT1 core is compatible with rv64gc, Zb[abcs], Zifencei and
XVentanaCondOps.
This introduces a placeholder -mcpu=ventana-vt1, so tooling and
scripts don't need to change once full support (pipeline, tuning,
etc.) becomes public later.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add ventana-vt1.
(RISCV_CORE): Ditto.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Ditto.
* config/riscv/riscv.cc: Add tune_info for ventana-vt1.
* config/riscv/riscv.md: Add ventana-vt1.
* 
doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst:
Document -mcpu= and -mtune with ventana-vt1.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Rebased and changed over to .rst-based documentation
- Updated to catch more fusion cases
- Signals support for Zifencei
- Rebase to master, adjusting for the new way to define cores.
- Change documentation to the new way (.rst)
- Include Zifencei in the VT1 definition.

 gcc/config/riscv/riscv-cores.def   |  3 +++
 gcc/config/riscv/riscv-opts.h  |  2 +-
 gcc/config/riscv/riscv.cc  | 14 ++
 .../machine-dependent-options/risc-v-options.rst   |  5 +++--
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 31ad34682c5..aef1e92ae24 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
 RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
 RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
+RISCV_TUNE("ventana-vt1", generic, ventana_vt1_tune_info)
 RISCV_TUNE("size", generic, optimize_size_tune_info)
 
 #undef RISCV_TUNE
@@ -73,4 +74,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
 
+RISCV_CORE("ventana-vt1", "rv64imafdc_zba_zbb_zbc_zbs_zifencei",   
"ventana-vt1")
+
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 84c987626bc..7962dbe5018 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -52,7 +52,7 @@ extern enum riscv_isa_spec_class riscv_isa_spec;
 /* Keep this list in sync with define_attr "tune" in riscv.md.  */
 enum riscv_microarchitecture_type {
   generic,
-  sifive_7
+  sifive_7,
 };
 extern enum riscv_microarchitecture_type riscv_microarchitecture;
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c04e5db21df..31d651f8744 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -360,6 +360,20 @@ static const struct riscv_tune_param 
optimize_size_tune_info = {
   false,   /* slow_unaligned_access */
 };
 
+/* Costs to use when optimizing for Ventana Micro VT1.  */
+static const struct riscv_tune_param ventana_vt1_tune_info = {
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_add */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
+  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},/* fp_div */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
+  4,   /* issue_rate */
+  4,   /* branch_cost */
+  5,   /* memory_cost */
+  8,   /* fmv_cost */
+  false,   /* slow_unaligned_access */
+};
+
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
diff --git 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst
index 2b5167b56b2..5a0345ae2b3 100644
--- 
a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst
+++ 
b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst
@@ -95,14 +95,15 @@ These command-line options are defined for RISC-V targets:
   Permissible values for this option are: :samp:`sifive-e20`, 
:samp:`sifive-e21`,
   :samp:`sifive-e24`, :samp:`sifive-e31`, :samp:`sifive-e34`, 
:samp:`sifive-e76`,
   :samp:`sifive-s21`, :samp:`sifive-s51`, :samp:`sifive-s54`, 
:samp:`sifive-s76`,
-  :samp:`sifive-u54`, and :samp:`sifive-u74`.
+  :samp:`sifive-u54`, :samp:`sifive-u74`, and :samp:`ventana-vt1`.
 
 .. option:: -mtune={processor-string}
 
   Optimize the output for the given processor, specified by microarchitecture 
or
   particular CPU name.  Permissible values for this option are: :samp:`rocket`,
 

[PATCH] RISC-V: Split "(a & (1UL << bitno)) ? 0 : -1" to bext + addi

2022-11-13 Thread Philipp Tomsich
For a straightforward application of bext for the following function
  long bext64(long a, char bitno)
  {
    return (a & (1UL << bitno)) ? 0 : -1;
  }
we generate
        srl     a0,a0,a1        # 7     [c=4 l=4]  lshrdi3
        andi    a0,a0,1         # 8     [c=4 l=4]  anddi3/1
        addi    a0,a0,-1        # 14    [c=4 l=4]  adddi3/1
due to the following failed match at combine time:
  (set (reg:DI 82)
   (zero_extract:DI (reg:DI 83)
(const_int 1 [0x1])
(reg:DI 84)))

The existing pattern for bext requires the 3rd argument to
zero_extract to be a QImode register wrapped in a zero_extension.
This adds an additional pattern that allows an Xmode argument.

With this change, the testcase compiles to
        bext    a0,a0,a1        # 8     [c=4 l=4]  *bextdi
        addi    a0,a0,-1        # 14    [c=4 l=4]  adddi3/1
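
As an illustration only (not part of the patch), the equivalence between the
source expression and the bext + addi sequence can be sketched in plain C.
The helper model_bext is a hypothetical software model of the Zbs
instruction, and the shift amount is masked to 6 bits on the assumption that
this mirrors RV64 behaviour:

```c
#include <stdint.h>

/* Hypothetical software model of Zbs "bext" on RV64: bit (bitno mod 64)
   of A is moved into bit 0 of the result, all other bits are zero.  */
static inline uint64_t
model_bext (uint64_t a, uint64_t bitno)
{
  return (a >> (bitno & 63)) & 1;
}

/* Reference: the source-level expression from the testcase.  */
int64_t
select_ref (int64_t a, unsigned char bitno)
{
  return ((uint64_t) a & (UINT64_C (1) << (bitno & 63))) ? 0 : -1;
}

/* Model of the two-instruction sequence: bext, then addi with -1.
   The extracted bit is 0 or 1, so subtracting 1 gives -1 or 0.  */
int64_t
select_bext_addi (int64_t a, unsigned char bitno)
{
  return (int64_t) model_bext ((uint64_t) a, bitno) - 1;
}
```

Both functions should agree for every input, which is the property the new
pattern relies on.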

gcc/ChangeLog:

* config/riscv/bitmanip.md (*bext): Add an additional
pattern that allows the 3rd argument to zero_extract to be
an Xmode register operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bext.c: Add testcases.
* gcc.target/riscv/zbs-bexti.c: Add testcases.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md   | 12 +++
 gcc/testsuite/gcc.target/riscv/zbs-bext.c  | 23 +---
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 25 --
 3 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index f1d8f24c2d3..8222c600ca5 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -644,6 +644,18 @@
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
+;; usually has the `bitno` typed as X-mode (i.e. no further
+;; zero-extension is performed around the bitno).
+(define_insn "*bext"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (zero_extract:X (match_operand:X 1 "register_operand" "r")
+   (const_int 1)
+   (match_operand:X 2 "register_operand" "r")))]
+  "TARGET_ZBS"
+  "bext\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 (define_insn "*bexti"
   [(set (match_operand:X 0 "register_operand" "=r")
(zero_extract:X (match_operand:X 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
index 47982396119..8de9c5a167c 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bext.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
 
 /* bext */
 long
@@ -16,6 +16,23 @@ foo1 (long i)
   return 1L & (i >> 20);
 }
 
+long bext64_1(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? 1 : 0;
+}
+
+long bext64_2(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? 0 : -1;
+}
+
+long bext64_3(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? -1 : 0;
+}
+
 /* { dg-final { scan-assembler-times "bexti\t" 1 } } */
-/* { dg-final { scan-assembler-times "bext\t" 1 } } */
-/* { dg-final { scan-assembler-not "andi" } } */
+/* { dg-final { scan-assembler-times "bext\t" 4 } } */
+/* { dg-final { scan-assembler-times "addi\t" 1 } } */
+/* { dg-final { scan-assembler-times "neg\t" 1 } } */
+/* { dg-final { scan-assembler-not "andi" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
index 99e3b58309c..30b69c9bc3e 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -1,14 +1,25 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
 
 /* bexti */
-#define BIT_NO  4
+#define BIT_NO  21
 
-long
-foo0 (long a)
+long bexti64_1(long a, char bitno)
 {
-  return (a & (1 << BIT_NO)) ? 0 : -1;
+  return (a & (1UL << BIT_NO)) ? 1 : 0;
 }
 
-/* { dg-final { scan-assembler "bexti" } } */
-/* { dg-final { scan-assembler "addi" } } */
+long bexti64_2(long a, char bitno)
+{
+  return (a & (1UL << BIT_NO)) ? 0 : -1;
+}
+
+long bexti64_3(long a, char bitno)
+{
+  return (a & (1UL << BIT_NO)) ? -1 : 0;
+}
+
+/* { dg-final { scan-assembler-times "bexti\t" 3 } } */
+/* { dg-final { scan-assembler-times "addi\t" 1 } } */
+/* { dg-final { scan-assembler-times "neg\t" 1 } } */
-- 
2.34.1



[PATCH v2 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2022-11-13 Thread Philipp Tomsich
The Ventana VT1 core supports quad-issue and instruction fusion.
This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
together and adds idiom matching for the supported fusion cases.
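
A minimal sketch of the bookkeeping involved, assuming each fusion kind is
one bit in a per-core mask.  The enum values, struct and function names below
are simplified stand-ins chosen for illustration, not the identifiers used in
riscv.cc:

```c
#include <stdbool.h>

/* Hypothetical subset of fusion kinds, one bit per kind.  */
enum fusion_pairs
{
  FUSE_NOTHING    = 0,
  FUSE_LUI_ADDI   = 1 << 0,
  FUSE_AUIPC_ADDI = 1 << 1,
  FUSE_LDINDEXED  = 1 << 2
};

/* Stand-in for the per-core tuning record.  */
struct tune_param
{
  unsigned int fusible_ops;
};

/* True if the core fuses anything at all.  */
bool
macro_fusion_p (const struct tune_param *tune)
{
  return tune->fusible_ops != FUSE_NOTHING;
}

/* True if the specific fusion kind OP is enabled for the core.  */
bool
fusion_enabled_p (const struct tune_param *tune, enum fusion_pairs op)
{
  return (tune->fusible_ops & op) != 0;
}
```

A pair-matching hook would presumably check fusion_enabled_p for a given kind
before trying to recognise the corresponding two-instruction idiom.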

gcc/ChangeLog:

* config/riscv/riscv.cc (enum riscv_fusion_pairs): Add symbolic
constants to identify supported fusion patterns.
(struct riscv_tune_param): Add fusible_op field.
(riscv_macro_fusion_p): Implement.
(riscv_fusion_enabled_p): Implement.
(riscv_macro_fusion_pair_p): Implement and recognize fusible
idioms for Ventana VT1.
(TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Update fusion patterns and catch some missing idioms/fusion pairs.

 gcc/config/riscv/riscv.cc | 219 ++
 1 file changed, 219 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 31d651f8744..43ba520885c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -215,6 +215,19 @@ struct riscv_integer_op {
The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
 #define RISCV_MAX_INTEGER_OPS 8
 
+enum riscv_fusion_pairs
+{
+  RISCV_FUSE_NOTHING = 0,
+  RISCV_FUSE_ZEXTW = (1 << 0),
+  RISCV_FUSE_ZEXTH = (1 << 1),
+  RISCV_FUSE_ZEXTWS = (1 << 2),
+  RISCV_FUSE_LDINDEXED = (1 << 3),
+  RISCV_FUSE_LUI_ADDI = (1 << 4),
+  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
+  RISCV_FUSE_LUI_LD = (1 << 6),
+  RISCV_FUSE_AUIPC_LD = (1 << 7),
+};
+
 /* Costs of various operations on the different architectures.  */
 
 struct riscv_tune_param
@@ -229,6 +242,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  unsigned int fusible_ops;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -316,6 +330,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -330,6 +345,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   3,   /* memory_cost */
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -344,6 +360,7 @@ static const struct riscv_tune_param thead_c906_tune_info = 
{
   5,/* memory_cost */
   8,   /* fmv_cost */
   false,/* slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for size.  */
@@ -358,6 +375,7 @@ static const struct riscv_tune_param 
optimize_size_tune_info = {
   2,   /* memory_cost */
   8,   /* fmv_cost */
   false,   /* slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for Ventana Micro VT1.  */
@@ -372,6 +390,10 @@ static const struct riscv_tune_param ventana_vt1_tune_info 
= {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   false,   /* slow_unaligned_access */
+  ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |   /* fusible_ops */
+RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
+RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
+RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
 };
 
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -5627,6 +5649,199 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Implement TARGET_SCHED_MACRO_FUSION_P.  Return true if target supports
+   instruction fusion of some sort.  */
+
+static bool
+riscv_macro_fusion_p (void)
+{
+  return tune_param->fusible_ops != RISCV_FUSE_NOTHING;
+}
+
+/* Return true iff the instruction fusion described by OP is enabled.  */
+
+static bool
+riscv_fusion_enabled_p(enum riscv_fusion_pairs op)
+{
+  return tune_param->fusible_ops & op;
+}
+
+/* Implement TARGET_SCHED_MACRO_FUSION_PAIR_P.  Return true if PREV and CURR
+   should be kept together during scheduling.  */
+
+static bool
+riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
+{
+  rtx prev_set = single_set (prev);
+  rtx curr_set = single_set (curr);
+  /* prev and curr are simple SET insns i.e. no flag setting or branching.  */
+  bool si

[PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-13 Thread Philipp Tomsich
Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) ..."
that can be expressed as bexti + bexti + andn.
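
The idea can be sketched in plain C, as an illustration rather than the
generated code.  It assumes the mask has exactly two bits set and the
reference value is one of them; model_bexti and lowest_set_bit are
hypothetical helpers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of Zbs "bexti": extract bit BIT of A into bit 0.  */
static inline uint64_t
model_bexti (uint64_t a, unsigned bit)
{
  return (a >> bit) & 1;
}

/* Index of the lowest set bit; X must be nonzero.  */
static unsigned
lowest_set_bit (uint64_t x)
{
  unsigned n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Reference: the source-level test.  */
bool
twobits_ref (uint64_t a, uint64_t twobits, uint64_t singlebit)
{
  return (a & twobits) == singlebit;
}

/* bexti + bexti + andn: the bit that must be set is extracted into S,
   the bit that must be clear into C, and the test holds iff S & ~C is
   nonzero.  */
bool
twobits_zbs (uint64_t a, uint64_t twobits, uint64_t singlebit)
{
  unsigned setbit = lowest_set_bit (singlebit);
  unsigned clearbit = lowest_set_bit (twobits & ~singlebit);
  uint64_t s = model_bexti (a, setbit);
  uint64_t c = model_bexti (a, clearbit);
  return (s & ~c) != 0;
}
```

Since the combined value is compared against zero, the branch condition is
the inverse of the original equality test.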

gcc/ChangeLog:

* config/riscv/bitmanip.md (*branch_mask_twobits_equals_singlebit):
Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C has
one of these two bits set.
* config/riscv/predicates.md (const_twobits_operand): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-if_then_else-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md  | 42 +++
 gcc/config/riscv/predicates.md|  5 +++
 .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
 3 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7a8f4e35880..2cea394671f 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -690,3 +690,45 @@
   "TARGET_ZBS"
   [(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
(set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
+
+;; IF_THEN_ELSE: test for 2 bits of opposite polarity
+(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
+  [(set (pc)
+   (if_then_else (match_operator 1 "equality_operator"
+  [(and:X (match_operand:X 2 "register_operand" "r")
+  (match_operand:X 3 "const_twobits_operand" "i"))
+   (match_operand:X 4 "single_bit_mask_operand" "i")])
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:X 5 "=&r"))
+   (clobber (match_scratch:X 6 "=&r"))]
+  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (zero_extract:X (match_dup 2)
+ (const_int 1)
+ (match_dup 8)))
+   (set (match_dup 6) (zero_extract:X (match_dup 2)
+ (const_int 1)
+ (match_dup 9)))
+   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
+   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
+  (label_ref (match_dup 0))
+  (pc)))]
+{
+   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
+   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
+
+   /* Make sure that the reference value has one of the bits of the mask set */
+   if ((twobits_mask & singlebit_mask) == 0)
+  FAIL;
+
+   int setbit = ctz_hwi (singlebit_mask);
+   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
+
+   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
+mode, operands[6], GEN_INT(0));
+
+   operands[8] = GEN_INT (setbit);
+   operands[9] = GEN_INT (clearbit);
+})
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 490bff688a7..6e34829a59b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -321,6 +321,11 @@
   (and (match_code "const_int")
(match_test "popcount_hwi (~UINTVAL (op)) == 2")))
 
+;; A CONST_INT operand that has exactly two bits set.
+(define_predicate "const_twobits_operand"
+  (and (match_code "const_int")
+   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
+
 ;; A CONST_INT operand that fits into the unsigned half of a
 ;; signed-immediate after the top bit has been cleared.
 (define_predicate "uimm_extra_bit_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c 
b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
new file mode 100644
index 000..d249a841ff9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+
+void g();
+
+void f1 (long a)
+{
+  if ((a & ((1ul << 33) | (1 << 4))) == (1ul << 33))
+g();
+}
+
+void f2 (long a)
+{
+  if ((a & 0x12) == 0x10)
+g();
+}
+
+/* { dg-final { scan-assembler-times "bexti\t" 2 } } */
+/* { dg-final { scan-assembler-times "andn\t" 1 } } */
-- 
2.34.1



[PATCH] RISC-V: Split "(a & (1UL << bitno)) ? 0 : 1" to bext + xori

2022-11-13 Thread Philipp Tomsich
We avoid reassociating "(~(a >> BIT_NO)) & 1" into "((~a) >> BIT_NO) & 1"
by splitting it into a zero-extraction (bext) and an xori.  This both
avoids burning a register on a temporary and generates a sequence that
clearly captures 'extract bit, then invert bit'.

This change improves the previously generated
srl   a0,a0,a1
not   a0,a0
andi  a0,a0,1
into
bext  a0,a0,a1
xori  a0,a0,1
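
For illustration only, the 'extract bit, then invert bit' sequence can be
modelled in C and compared against the source expression.  model_bext is a
hypothetical software model of the instruction, with the shift amount masked
to 6 bits on the assumption that this matches RV64:

```c
#include <stdint.h>

/* Hypothetical software model of Zbs "bext" on RV64.  */
static inline uint64_t
model_bext (uint64_t a, uint64_t bitno)
{
  return (a >> (bitno & 63)) & 1;
}

/* Reference: the source-level expression.  */
int64_t
notbit_ref (int64_t a, unsigned char bitno)
{
  return ((uint64_t) a & (UINT64_C (1) << (bitno & 63))) ? 0 : 1;
}

/* Model of bext followed by xori with 1: only bit 0 is inverted, so no
   temporary holding ~a is needed.  */
int64_t
notbit_bext_xori (int64_t a, unsigned char bitno)
{
  return (int64_t) (model_bext ((uint64_t) a, bitno) ^ 1);
}
```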

Signed-off-by: Philipp Tomsich 

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add split covering
"(a & (1 << BIT_NO)) ? 0 : 1".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bext.c: Add testcases.
* gcc.target/riscv/zbs-bexti.c: Add testcases.

---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbs-bext.c  | 10 --
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 10 --
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8222c600ca5..7a8f4e35880 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -677,3 +677,16 @@
   "TARGET_ZBS"
   [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 
2)))
(set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
+
+;; Split for "(a & (1 << BIT_NO)) ? 0 : 1":
+;; We avoid reassociating "(~(a >> BIT_NO)) & 1" into "((~a) >> BIT_NO) & 1",
+;; so we don't have to use a temporary.  Instead we extract the bit and then
+;; invert bit 0 ("a ^ 1") only.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+(and:X (not:X (lshiftrt:X (match_operand:X 1 "register_operand")
+  (subreg:QI (match_operand:X 2 
"register_operand") 0)))
+   (const_int 1)))]
+  "TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
index 8de9c5a167c..a8aadb60390 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bext.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
@@ -23,16 +23,22 @@ long bext64_1(long a, char bitno)
 
 long bext64_2(long a, char bitno)
 {
-  return (a & (1UL << bitno)) ? 0 : -1;
+  return (a & (1UL << bitno)) ? 0 : 1;
 }
 
 long bext64_3(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? 0 : -1;
+}
+
+long bext64_4(long a, char bitno)
 {
   return (a & (1UL << bitno)) ? -1 : 0;
 }
 
 /* { dg-final { scan-assembler-times "bexti\t" 1 } } */
-/* { dg-final { scan-assembler-times "bext\t" 4 } } */
+/* { dg-final { scan-assembler-times "bext\t" 5 } } */
+/* { dg-final { scan-assembler-times "xori\t|snez\t" 1 } } */
 /* { dg-final { scan-assembler-times "addi\t" 1 } } */
 /* { dg-final { scan-assembler-times "neg\t" 1 } } */
 /* { dg-final { scan-assembler-not "andi" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
index 30b69c9bc3e..c15098eb6cc 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -12,14 +12,20 @@ long bexti64_1(long a, char bitno)
 
 long bexti64_2(long a, char bitno)
 {
-  return (a & (1UL << BIT_NO)) ? 0 : -1;
+  return (a & (1UL << BIT_NO)) ? 0 : 1;
 }
 
 long bexti64_3(long a, char bitno)
+{
+  return (a & (1UL << BIT_NO)) ? 0 : -1;
+}
+
+long bexti64_4(long a, char bitno)
 {
   return (a & (1UL << BIT_NO)) ? -1 : 0;
 }
 
-/* { dg-final { scan-assembler-times "bexti\t" 3 } } */
+/* { dg-final { scan-assembler-times "bexti\t" 4 } } */
+/* { dg-final { scan-assembler-times "xori\t|snez\t" 1 } } */
 /* { dg-final { scan-assembler-times "addi\t" 1 } } */
 /* { dg-final { scan-assembler-times "neg\t" 1 } } */
-- 
2.34.1



[PATCH] aarch64: Add bfloat16_t support for aarch64

2022-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

x86_64/i686 has had working std::bfloat16_t support for a few weeks; __bf16
there is no longer a storage-only type, but can be used for arithmetic
and is supported in libgcc and libstdc++.

The following patch adds similar support for AArch64.

Bootstrapped/regtested on aarch64-linux.

Regressions are:
+FAIL: 26_numerics/headers/cmath/functions_std_c++23.cc (test for excess errors)
this one is something I need to look at:
functions_std_c++23.cc:(.text._Z14test_functionsIDFb16_EvPT_PiPlPx[_Z14test_functionsIDFb16_EvPT_PiPlPx]+0x738):
 undefined reference to `__floatdibf'
(4 times).  I need to compare to x86, I believe we want to do a DI -> SF
conversion followed by SF -> BF, but it is unclear why that isn't happening.
+FAIL: gcc.target/aarch64/sve/acle/general-c/ternary_bfloat16_opt_n_1.c 
-march=armv8.2-a+sve -moverride=tune=none  (test for errors, line 21)
  svbfdot (f32, bf16, 0); /* { dg-error {invalid conversion to type 
'bfloat16_t'} } */
This test tests for something that no longer fails, so could be just
adjusted.
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++11  scan-assembler 
\\t.global\\t_Z1fPu6__bf16
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++11  scan-assembler 
\\t.global\\t_Z1gPu6__bf16S_
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++11  scan-assembler 
\\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++14  scan-assembler 
\\t.global\\t_Z1fPu6__bf16
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++14  scan-assembler 
\\t.global\\t_Z1gPu6__bf16S_
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++14  scan-assembler 
\\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++17  scan-assembler 
\\t.global\\t_Z1fPu6__bf16
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++17  scan-assembler 
\\t.global\\t_Z1gPu6__bf16S_
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++17  scan-assembler 
\\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++20  scan-assembler 
\\t.global\\t_Z1fPu6__bf16
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++20  scan-assembler 
\\t.global\\t_Z1gPu6__bf16S_
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++20  scan-assembler 
\\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++2b  scan-assembler 
\\t.global\\t_Z1fPu6__bf16
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++2b  scan-assembler 
\\t.global\\t_Z1gPu6__bf16S_
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++2b  scan-assembler 
\\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++98  scan-assembler 
\\t.global\\t_Z1fPu6__bf16
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++98  scan-assembler 
\\t.global\\t_Z1gPu6__bf16S_
+FAIL: g++.dg/ext/arm-bf16/bf16-mangle-aarch64-1.C  -std=c++98  scan-assembler 
\\t.global\\t_ZN1SIu6__bf16u6__bf16E1iE
These test the mangling, which changed from u6__bf16 to the standard DF16b.

Now, while on x86 we change the mangling and behavior of __bf16, it doesn't
necessarily need to be like that on aarch64 (although it would be nice for
consistency): portable C++ code would just use the std::bfloat16_t type,
which libstdc++ defines as decltype(0.0bf16).
So, if you want to keep the previous mangling of the __bf16 type or keep it a
storage-only type, we can always register some other name (__bfloat16_t or
whatever), make __bf16 and __bfloat16_t distinct types (the former
aarch64_bf16_type_node in the compiler, the latter bfloat16_type_node) and thus
have 0.0bf16 be of the latter type, with libstdc++ using it.

2022-11-13  Jakub Jelinek  

gcc/
* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
(aarch64_bf16_ptr_type_node): Adjust comment.
* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
bfloat16_type_node rather than aarch64_bf16_type_node.
(aarch64_mangle_type): Mangle BFmode as DF16b.
(aarch64_libgcc_floating_mode_supported_p,
aarch64_scalar_mode_supported_p): Also support BFmode.
(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
(aarch64_invalid_binary_op): Remove BFmode related rejections.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
aarch64_bf16_type_node.
(aarch64_init_simd_builtin_types): Likewise.
(aarch64_init_bf16_types): Likewise.  Don't create bfloat16_type_node,
which is created in tree.cc already.
* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
libgcc/
* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf dfbf sfbf hfbf.
  

Re: [PATCH] Fortran: fix treatment of character, value, optional dummy arguments [PR107444]

2022-11-13 Thread Harald Anlauf via Gcc-patches

Am 13.11.22 um 09:51 schrieb Andreas Schwab:

This breaks aarch64:

$ /opt/gcc/gcc-20221113/Build/./gcc/xgcc -B/opt/gcc/gcc-20221113/Build/./gcc/ 
-B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem 
/usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include 
-fchecking=1 ../../../../libgomp/testsuite/libgomp.fortran/is_device_ptr-2.f90 
-mabi=lp64 -B/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/ 
-B/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/.libs 
-I/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp 
-I../../../../libgomp/testsuite/../../include 
-I../../../../libgomp/testsuite/.. -fmessage-length=0 
-fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -O 
-fdump-tree-original 
-B/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/../libgfortran/.libs 
-fintrinsic-modules-path=/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp
 -L/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/.libs 
-L/opt/gcc/gcc-20221113/Build/aarch64-suse-linux/./libgomp/../libgfortran/.libs 
-lgfortran -foffload=-lgfortran -lm -o ./is_device_ptr-2.exe
during GIMPLE pass: omplower
../../../../libgomp/testsuite/libgomp.fortran/is_device_ptr-2.f90:66:77: 
internal compiler error: in gfc_omp_check_optional_argument, at 
fortran/trans-openmp.cc:137
0x8acb63 gfc_omp_check_optional_argument(tree_node*, bool)
 ../../gcc/fortran/trans-openmp.cc:137
0xd29fc3 lower_omp_target
 ../../gcc/omp-low.cc:13632
0xd314b3 lower_omp_1
 ../../gcc/omp-low.cc:14523
0xd314b3 lower_omp
 ../../gcc/omp-low.cc:14662
0xd31283 lower_omp_1
 ../../gcc/omp-low.cc:14436
0xd31283 lower_omp
 ../../gcc/omp-low.cc:14662
0xd318a3 lower_omp_1
 ../../gcc/omp-low.cc:14452
0xd318a3 lower_omp
 ../../gcc/omp-low.cc:14662
0xd377fb execute_lower_omp
 ../../gcc/omp-low.cc:14701
0xd377fb execute
 ../../gcc/omp-low.cc:14755
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).



I apologize for forgetting to add the attached change, which does
the adjustment of the name of the generated internal symbol.
Can you please confirm that it fixes your issues?

Thanks,
Harald


From 872ed50812d3ca13554411e107317161777ecf5d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 13 Nov 2022 21:53:58 +0100
Subject: [PATCH] Fortran: fix treatment of character, value, optional dummy
 arguments [PR107444]

gcc/fortran/ChangeLog:

	PR fortran/107444
	* trans-openmp.cc (gfc_omp_check_optional_argument): Adjust to change
	of prefix of internal symbol for presence status to '.'.
---
 gcc/fortran/trans-openmp.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 4bfdf85cd9b..9070c03353d 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -123,7 +123,7 @@ gfc_omp_check_optional_argument (tree decl, bool for_present_check)
   char name[GFC_MAX_SYMBOL_LEN + 2];
   tree tree_name;
 
-  name[0] = '_';
+  name[0] = '.';
   strcpy (&name[1], IDENTIFIER_POINTER (DECL_NAME (decl)));
   tree_name = get_identifier (name);
 
-- 
2.35.3



Re: [PATCH 02/12] ipa-cp: Do not consider useless aggregate constants

2022-11-13 Thread Martin Jambor
On Sat, Nov 12 2022, Martin Jambor wrote:
> Hi,
>
> When building vectors of known aggregate values, there is no point in
> including those for parameters which are not used in any way
> whatsoever.
>
> Bootstrapped and tested on x86_64-linux.  OK for master?
>
> Thanks,
>
> Martin

When doing LTO profiled-bootstrap, the original patch triggered a
checking assert that a selected constant was not among those discovered
to be known.  The following avoids doing any heuristics for unused
parameters too.  With this patch, the whole series passes LTO
profiled-bootstrap (and testing) on x86_64.

Martin



When building vectors of known aggregate values, there is no point in
including those for parameters which are not used in any way
whatsoever.  This patch avoids that, and does the same for other kinds of
constants.

gcc/ChangeLog:

2022-11-13  Martin Jambor  

* ipa-cp.cc (push_agg_values_from_edge): Do not consider constants
in unused aggregate parameters.
---
 gcc/ipa-cp.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d2bcd5e5e69..313336a9ccd 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -5783,7 +5783,8 @@ push_agg_values_from_edge (struct cgraph_edge *cs,
}
 
   ipcp_param_lattices *plats = ipa_get_parm_lattices (dest_info, index);
-  if (plats->aggs_bottom)
+  if (!ipa_is_param_used (dest_info, index)
+ || plats->aggs_bottom)
continue;
   push_agg_values_for_index_from_edge (cs, index, res, interim);
 }
@@ -6147,6 +6148,9 @@ decide_whether_version_node (struct cgraph_node *node)
 
   for (i = 0; i < count;i++)
 {
+  if (!ipa_is_param_used (info, i))
+   continue;
+
   class ipcp_param_lattices *plats = ipa_get_parm_lattices (info, i);
   ipcp_lattice *lat = &plats->itself;
   ipcp_lattice *ctxlat = &plats->ctxlat;
-- 
2.38.0



[PATCH] sphinx: more build fixing if sphinx-build is missing

2022-11-13 Thread Stephan Bergmann via Gcc-patches
...similar to 1f9c79367e136e0ca5b775562e6111e1a0d0046f "sphinx: fix
building if sphinx-build is missing"

gcc/ChangeLog:

* fortran/Make-lang.in: Build info pages conditionally.
---
 gcc/fortran/Make-lang.in | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in
index 48acbed1754..852b6f3327f 100644
--- a/gcc/fortran/Make-lang.in
+++ b/gcc/fortran/Make-lang.in
@@ -161,7 +161,9 @@ fortran.install-pdf: $(F95_PDFFILES)
 F95_MANFILES = doc/gfortran/man/man/gfortran.1

 doc/gfortran/man/man/gfortran.1: $(SPHINX_FILES)
-	+ make -C $(srcdir)/../doc man 
SOURCEDIR=$(abs_srcdir)/fortran/doc/gfortran 
BUILDDIR=$(objdir)/doc/gfortran/man SPHINXBUILD=$(SPHINX_BUILD)

+   + if [ x$(HAS_SPHINX_BUILD) = xhas-sphinx-build ]; then \
+	  make -C $(srcdir)/../doc man 
SOURCEDIR=$(abs_srcdir)/fortran/doc/gfortran 
BUILDDIR=$(objdir)/doc/gfortran/man SPHINXBUILD=$(SPHINX_BUILD) \

+   else true; fi

 fortran.man: $(F95_MANFILES)

--
2.38.1



Re: [PATCH (pushed)] sphinx: fix building if sphinx-build is missing

2022-11-13 Thread Stephan Bergmann via Gcc-patches

On 11/13/22 20:26, Martin Liška wrote:

The patch is fine, please send it to gcc-patches ML and install it.


I sent a patch now, but don't have commit rights.



[PATCH v2 1/8] RISC-V: Recognize xventanacondops extension

2022-11-13 Thread Philipp Tomsich
This adds the xventanacondops extension to the option parsing and as a
default for the ventana-vt1 core:

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize
  "xventanacondops" as part of an architecture string.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): Define.
(TARGET_XVENTANACONDOPS): Define.
* config/riscv/riscv.opt: Add "riscv_xventanacondops".
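
A rough sketch of how such a table-driven flag could work, written in plain C
for illustration.  GCC's real table uses pointers to members of gcc_options;
the struct, the offsetof-based table and the function names here are
assumptions made to keep the example self-contained:

```c
#include <stddef.h>
#include <string.h>

#define MASK_XVENTANACONDOPS (1 << 0)

/* Stand-in for the options structure holding the extension flag word.  */
struct options
{
  int xventanacondops;
};

/* One table entry: extension name, offset of its flag word, mask bit.  */
struct ext_flag
{
  const char *ext;
  size_t offset;
  int mask;
};

static const struct ext_flag ext_flag_table[] = {
  { "xventanacondops", offsetof (struct options, xventanacondops),
    MASK_XVENTANACONDOPS },
  { NULL, 0, 0 }
};

/* Set the flag for EXT in OPTS.  Return 1 if EXT is known, else 0.  */
int
enable_extension (struct options *opts, const char *ext)
{
  for (const struct ext_flag *e = ext_flag_table; e->ext != NULL; e++)
    if (strcmp (e->ext, ext) == 0)
      {
        int *flags = (int *) ((char *) opts + e->offset);
        *flags |= e->mask;
        return 1;
      }
  return 0;
}

/* Counterpart of a TARGET_* test macro.  */
int
target_xventanacondops (const struct options *opts)
{
  return (opts->xventanacondops & MASK_XVENTANACONDOPS) != 0;
}
```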

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Restore a (during rebase) dropped line to xventanacondops.md
- Include the change to add xventanacondops to the VT1 code definition
  as a separate patch.

 gcc/common/config/riscv/riscv-common.cc | 2 ++
 gcc/config/riscv/riscv-opts.h   | 3 +++
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..6b2bdda5feb 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1247,6 +1247,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
   {"svnapot", &gcc_options::x_riscv_sv_subext, MASK_SVNAPOT},
 
+  {"xventanacondops", &gcc_options::x_riscv_xventanacondops, 
MASK_XVENTANACONDOPS},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1be83b5107c..7962dbe5018 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -189,4 +189,7 @@ enum stack_protector_guard {
? 0 \
: 32 << (__builtin_popcount (riscv_zvl_flags) - 1))
 
+#define MASK_XVENTANACONDOPS (1 << 0)
+#define TARGET_XVENTANACONDOPS ((riscv_xventanacondops & MASK_XVENTANACONDOPS) 
!= 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7c3ca48d1cc..9595078bdd4 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -233,6 +233,9 @@ int riscv_zm_subext
 TargetVariable
 int riscv_sv_subext
 
+TargetVariable
+int riscv_xventanacondops = 0
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.34.1



[PATCH v2 2/8] RISC-V: Generate vt.maskc on noce_try_store_flag_mask if-conversion

2022-11-13 Thread Philipp Tomsich
Adds a pattern to map the output of noce_try_store_flag_mask
if-conversion in the combiner onto vt.maskc; the input patterns
supported are similar to the following:
  (set (reg/v/f:DI 75 [  ])
   (and:DI (neg:DI (ne:DI (reg:DI 82)
   (const_int 0 [0])))
   (reg/v/f:DI 75 [  ])))
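For readers less familiar with RTL, the pattern above can be read as a plain C idiom. The following is only an illustrative sketch, not part of the patch: the function names are made up for this example, and the first function follows the shape of the ne3 () test added here.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy source form, following the shape of the ne3 () test.  */
int64_t
select_or_zero (int64_t cond, int64_t value)
{
  if (cond != 0)
    return value;
  return 0;
}

/* Branchless form mirroring (and (neg (ne cond 0)) value):
   (cond != 0) is 0 or 1, so its negation is 0 or all-ones.  */
int64_t
select_or_zero_masked (int64_t cond, int64_t value)
{
  int64_t mask = -(int64_t) (cond != 0);
  return mask & value;
}

/* Hypothetical helper: both forms must agree for any input pair.  */
void
check_select_or_zero (int64_t cond, int64_t value)
{
  assert (select_or_zero (cond, value)
	  == select_or_zero_masked (cond, value));
}
```

The masked form is what the combiner sees after noce_try_store_flag_mask; with the extension present, the whole expression is intended to collapse into one instruction.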

This reduces dynamic instruction counts for the perlbench-workload in
SPEC CPU2017 by 0.8230%, 0.4689%, and 0.2332% (respectively, for each
of the 3 workloads in the 'ref'-workload).

To ensure that the combine-pass doesn't get confused about
profitability, we recognize the idiom as requiring a single
instruction when the XVentanaCondOps extension is present.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize idiom for
  vt.maskc as a single insn with TARGET_XVENTANACONDOPS.
* config/riscv/riscv.md: Include xventanacondops.md.
* config/riscv/xventanacondops.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-ne-03.c: New test.
* gcc.target/riscv/xventanacondops-ne-04.c: New test.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Ran whitespace-cleanup on xventanacondops-ne-03.c
- Ran whitespace-cleanup on xventanacondops-ne-04.c

 gcc/config/riscv/riscv.cc | 14 +
 gcc/config/riscv/riscv.md |  1 +
 gcc/config/riscv/xventanacondops.md   | 30 +++
 .../gcc.target/riscv/xventanacondops-ne-03.c  | 13 
 .../gcc.target/riscv/xventanacondops-ne-04.c  | 13 
 5 files changed, 71 insertions(+)
 create mode 100644 gcc/config/riscv/xventanacondops.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-04.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 56fa3600f4c..43ba520885c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2336,6 +2336,20 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case AND:
+  /* vt.maskc/vt.maskcn for XVentanaCondOps */
+  if (TARGET_XVENTANACONDOPS && mode == word_mode
+ && GET_CODE (XEXP (x, 0)) == NEG)
+   {
+ rtx inner = XEXP (XEXP (x, 0), 0);
+
+ if ((GET_CODE (inner) == EQ || GET_CODE (inner) == NE)
+ && CONST_INT_P (XEXP (inner, 1))
+ && INTVAL (XEXP (inner, 1)) == 0)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+   }
   /* slli.uw pattern for zba.  */
   if (TARGET_ZBA && TARGET_64BIT && mode == DImode
  && GET_CODE (XEXP (x, 0)) == ASHIFT)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 1514e10dbd1..4331842b7b2 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3196,3 +3196,4 @@
 (include "generic.md")
 (include "sifive-7.md")
 (include "vector.md")
+(include "xventanacondops.md")
diff --git a/gcc/config/riscv/xventanacondops.md 
b/gcc/config/riscv/xventanacondops.md
new file mode 100644
index 000..641cef0e44e
--- /dev/null
+++ b/gcc/config/riscv/xventanacondops.md
@@ -0,0 +1,30 @@
+;; Machine description for X-Ventana-CondOps
+;; Copyright (C) 2022 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_code_iterator eq_or_ne [eq ne])
+(define_code_attr n [(eq "n") (ne "")])
+
+(define_insn "*vt.maskc"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (neg:DI (eq_or_ne:DI
+   (match_operand:DI 1 "register_operand" "r")
+   (const_int 0)))
+   (match_operand:DI 2 "register_operand" "r")))]
+  "TARGET_XVENTANACONDOPS"
+  "vt.maskc\t%0,%2,%1")
diff --git a/gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c 
b/gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
new file mode 100644
index 000..4a762a1ed61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_xventanacondops -mabi=lp64 -mtune=thead-c906" } 
*/
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" "-Oz" } } */
+
+long long ne3(long long a, long long b)
+{
+  if (a != 0)
+return b;
+
+  return 0;
+}
+
+/* { dg

[PATCH v2 4/8] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-13 Thread Philipp Tomsich
Users might use explicit arithmetic operations to create a mask and
then and it, in a sequence like
cond = (bits >> SHIFT) & 1;
mask = ~(cond - 1);
val &= mask;
which will present as a single-bit sign-extract.

Depending on which combination of XVentanaCondOps and Zbs is
available, this will map to the following sequences:
 - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
 - andi + vt.maskc, if only XVentanaCondOps is available and the
sign-extract is operating on bits 10:0 (bit
11 can't be reached, as the immediate is
sign-extended)
 - slli + srli + and, otherwise.
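As a rough sketch of the source-level idiom described above (the function names are hypothetical and not taken from the patch), the mask construction and the bit-10 limit for andi can be written in C as follows. The limit follows from RISC-V I-type immediates being 12 bits wide and sign-extended, so the largest positive immediate is 2047.

```c
#include <assert.h>
#include <stdint.h>

/* Keep VAL if bit SHIFT of BITS is set, otherwise return 0.  */
uint64_t
mask_by_bit (uint64_t bits, unsigned shift, uint64_t val)
{
  assert (shift < 64);
  uint64_t cond = (bits >> shift) & 1;	/* 0 or 1 */
  uint64_t mask = ~(cond - 1);		/* 0 -> 0, 1 -> all-ones */
  return val & mask;
}

/* Nonzero if the single-bit mask (1 << SHIFT) fits into a positive
   12-bit sign-extended immediate, i.e. if one andi can test the bit.  */
int
andi_can_test_bit (unsigned shift)
{
  assert (shift < 64);
  return ((uint64_t) 1 << shift) <= 2047;
}
```

Bit 10 gives the mask 1024, which still fits; bit 11 would need 2048, which does not.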

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
  of a single-bit followed by AND for XVentanaCondOps.

Signed-off-by: Philipp Tomsich 
---

(no changes since v1)

 gcc/config/riscv/xventanacondops.md | 45 +
 1 file changed, 45 insertions(+)

diff --git a/gcc/config/riscv/xventanacondops.md 
b/gcc/config/riscv/xventanacondops.md
index f23058b95b2..a4068e53c13 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -74,3 +74,48 @@
   [(set (match_dup 5) (match_dup 1))
(set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
 (match_dup 4)))])
+
+;; Users might use explicit arithmetic operations to create a mask and
+;; then and it, in a sequence like
+;;cond = (bits >> SHIFT) & 1;
+;;mask = ~(cond - 1);
+;;val &= mask;
+;; which will present as a single-bit sign-extract in the combiner.
+;;
+;; This will give rise to any of the following cases:
+;; - with Zbs and XVentanaCondOps: bexti + vt.maskc
+;; - with XVentanaCondOps (but w/o Zbs):
+;;   - andi + vt.maskc, if the mask is representable in the immediate
+;;  (which requires extra care due to the immediate
+;;   being sign-extended)
+;;   - slli + srli + and
+;; - otherwise: slli + srli + and
+
+;; With Zbs, we have bexti for all possible bits...
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_XVENTANACONDOPS && TARGET_ZBS"
+  [(set (match_dup 4) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 4) (const_int 0)))
+(match_dup 3)))])
+
+;; ...whereas RV64I only allows us access to bits 0..10 in a single andi.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_XVENTANACONDOPS && !TARGET_ZBS && (UINTVAL (operands[2]) < 11)"
+  [(set (match_dup 4) (and:X (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 4) (const_int 0)))
+(match_dup 3)))]
+{
+  operands[2] = GEN_INT(1 << UINTVAL(operands[2]));
+})
-- 
2.34.1



[PATCH v2 0/8] RISC-V: Backend support for XVentanaCondOps/ZiCondops

2022-11-13 Thread Philipp Tomsich


Both the XVentanaCondOps (a vendor-defined extension from Ventana
Microsystems) and the proposed ZiCondOps extensions define a
conditional-zero(-or-value) instruction, which is similar to the
following C construct:
  rd = rc ? rs : 0

This functionality can be tied back into if-conversion and also matches
some typical programming idioms.  This series includes backend support
for XVentanaCondOps and infrastructure to handle conditional-zero
constructions in if-conversion.
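A minimal C model of the two conditional-zero forms may help when reading the rest of the series. This is a sketch for illustration only: the C functions maskc, maskcn and cond_select are names invented here (modelled on the vt.maskc/vt.maskcn mnemonics used in the patches), and the combination into a full select is a general property of such instructions rather than something this series claims to emit.

```c
#include <assert.h>
#include <stdint.h>

/* Positive form: rd = (rc != 0) ? rs : 0.  */
uint64_t
maskc (uint64_t rs, uint64_t rc)
{
  return rc != 0 ? rs : 0;
}

/* Negated form: rd = (rc == 0) ? rs : 0.  */
uint64_t
maskcn (uint64_t rs, uint64_t rc)
{
  return rc == 0 ? rs : 0;
}

/* A full "rc ? a : b" built from the two forms and an OR.  */
uint64_t
cond_select (uint64_t rc, uint64_t a, uint64_t b)
{
  uint64_t taken = maskc (a, rc);
  uint64_t not_taken = maskcn (b, rc);
  /* By construction at most one of the two can be non-zero.  */
  assert (taken == 0 || not_taken == 0);
  return taken | not_taken;
}
```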

Tested against SPEC CPU 2017.


Changes in v2:
- Restore a (during rebase) dropped line to xventanacondops.md
- Include the change to add xventanacondops to the VT1 core definition
  as a separate patch.

Philipp Tomsich (8):
  RISC-V: Recognize xventanacondops extension
  RISC-V: Generate vt.maskc on noce_try_store_flag_mask if-conversion
  RISC-V: Support noce_try_store_flag_mask as vt.maskc
  RISC-V: Recognize sign-extract + and cases for XVentanaCondOps
  RISC-V: Recognize bexti in negated if-conversion
  RISC-V: Support immediates in XVentanaCondOps
  RISC-V: Ventana-VT1 supports XVentanaCondOps
  ifcvt: add if-conversion to conditional-zero instructions

 gcc/common/config/riscv/riscv-common.cc   |   2 +
 gcc/config/riscv/predicates.md|  12 +
 gcc/config/riscv/riscv-cores.def  |   2 +-
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv.cc |  14 ++
 gcc/config/riscv/riscv.md |  27 +++
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/xventanacondops.md   | 151 
 gcc/ifcvt.cc  | 214 ++
 .../gcc.target/riscv/xventanacondops-and-01.c |  16 ++
 .../gcc.target/riscv/xventanacondops-and-02.c |  15 ++
 .../gcc.target/riscv/xventanacondops-eq-01.c  |  11 +
 .../gcc.target/riscv/xventanacondops-eq-02.c  |  14 ++
 .../riscv/xventanacondops-ifconv-imm.c|  19 ++
 .../gcc.target/riscv/xventanacondops-le-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-lt-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-lt-03.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-ne-01.c  |  10 +
 .../gcc.target/riscv/xventanacondops-ne-03.c  |  13 ++
 .../gcc.target/riscv/xventanacondops-ne-04.c  |  13 ++
 .../gcc.target/riscv/xventanacondops-xor-01.c |  14 ++
 21 files changed, 600 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/xventanacondops.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-04.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c

-- 
2.34.1



[PATCH v2 6/8] RISC-V: Support immediates in XVentanaCondOps

2022-11-13 Thread Philipp Tomsich
When if-conversion encounters sequences using immediates, the
sequences can't trivially map back onto vt.maskc/vt.maskcn (even if
beneficial) due to vt.maskc and vt.maskcn not having immediate forms.

This adds a splitter to rewrite opportunities for XVentanaCondOps that
operate on an immediate by first putting the immediate into a register
to enable the non-immediate vt.maskc/vt.maskcn instructions to operate
on the value.

Consider code, such as

  long func2 (long a, long c)
  {
    if (c)
      a = 2;
    else
      a = 5;
    return a;
  }

which will be converted to

  func2:
    seqz    a0,a2
    neg     a0,a0
    andi    a0,a0,3
    addi    a0,a0,2
    ret

Following this change, we generate

    li          a0,3
    vt.maskcn   a0,a0,a2
    addi        a0,a0,2
    ret
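
To make the arithmetic behind both sequences explicit, here is a small C sketch (not part of the patch; all function names are hypothetical). It treats vt.maskcn as "keep the value if the condition is zero, else produce 0" and relies on 5 = 3 + 2, so only the difference between the two constants needs to be masked.

```c
#include <assert.h>

/* Reference semantics of func2, ignoring the unused incoming A.  */
long
func2_branchy (long c)
{
  return c ? 2 : 5;
}

/* Models seqz + neg + andi + addi.  */
long
func2_mask (long c)
{
  long t = (c == 0);	/* seqz: 1 if c is zero, else 0 */
  t = -t;		/* neg:  all-ones or 0 */
  t &= 3;		/* andi: 3 or 0 */
  return t + 2;		/* addi: 5 or 2 */
}

/* Models li + vt.maskcn + addi.  */
long
func2_condzero (long c)
{
  long t = 3;			/* li */
  t = (c == 0) ? t : 0;		/* vt.maskcn: keep 3 only if c == 0 */
  return t + 2;			/* addi */
}

/* Hypothetical helper: all three variants must agree.  */
void
check_func2 (long c)
{
  assert (func2_mask (c) == func2_branchy (c));
  assert (func2_condzero (c) == func2_branchy (c));
}
```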

This commit also introduces a simple unit test for if-conversion with
immediate (literal) values as the sources for simple sets in the THEN
and ELSE blocks. The test checks that Ventana's conditional mask
instruction (vt.maskc) is emitted as part of the resultant branchless
instruction sequence.

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Support immediates for
  vt.maskc/vt.maskcn through a splitter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-ifconv-imm.c: New test.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Henry Brausen 

---
Ref #204

(no changes since v1)

 gcc/config/riscv/xventanacondops.md   | 24 +--
 .../riscv/xventanacondops-ifconv-imm.c| 19 +++
 2 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c

diff --git a/gcc/config/riscv/xventanacondops.md 
b/gcc/config/riscv/xventanacondops.md
index 81e5e8a8298..e3e109828a2 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -29,6 +29,26 @@
   "TARGET_XVENTANACONDOPS"
   "vt.maskc\t%0,%2,%1")
 
+;; XVentanaCondOps does not have immediate forms, so we need to do extra
+;; work to support these: if we encounter a vt.maskc/n with an immediate,
+;; we split this into a load-immediate followed by a vt.maskc/n.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (and:DI (neg:DI (match_operator:DI 1 "equality_operator"
+  [(match_operand:DI 2 "register_operand")
+   (const_int 0)]))
+   (match_operand:DI 3 "immediate_operand")))
+   (clobber (match_operand:DI 4 "register_operand"))]
+  "TARGET_XVENTANACONDOPS"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (and:DI (neg:DI (match_dup 1))
+ (match_dup 4)))]
+{
+  /* Eliminate the clobber/temporary, if it is not needed. */
+  if (!rtx_equal_p (operands[0], operands[2]))
+ operands[4] = operands[0];
+})
+
 ;; Make order operators digestible to the vt.maskc logic by
 ;; wrapping their result in a comparison against (const_int 0).
 
@@ -37,7 +57,7 @@
   [(set (match_operand:X 0 "register_operand")
(and:X (neg:X (match_operator:X 1 "anyge_operator"
 [(match_operand:X 2 "register_operand")
- (match_operand:X 3 "register_operand")]))
+ (match_operand:X 3 "arith_operand")]))
   (match_operand:X 4 "register_operand")))
(clobber (match_operand:X 5 "register_operand"))]
   "TARGET_XVENTANACONDOPS"
@@ -54,7 +74,7 @@
   [(set (match_operand:X 0 "register_operand")
(and:X (neg:X (match_operator:X 1 "anygt_operator"
 [(match_operand:X 2 "register_operand")
- (match_operand:X 3 "register_operand")]))
+ (match_operand:X 3 "arith_operand")]))
   (match_operand:X 4 "register_operand")))
(clobber (match_operand:X 5 "register_operand"))]
   "TARGET_XVENTANACONDOPS"
diff --git a/gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c 
b/gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
new file mode 100644
index 000..0012e7b669c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_xventanacondops -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+
+/* Each function below should emit a vt.maskcn instruction */
+
+long
+foo0 (long a, long b, long c)
+{
+  if (c)
+a = 0;
+  else
+a = 5;
+  return a;
+}
+
+/* { dg-final { scan-assembler-times "vt.maskcn\t" 1 } } */
+/* { dg-final { scan-assembler-not "beqz\t" } } */
+/* { dg-final { scan-assembler-not "bnez\t" } } */
-- 
2.34.1



[PATCH v2 3/8] RISC-V: Support noce_try_store_flag_mask as vt.maskc

2022-11-13 Thread Philipp Tomsich
When if-conversion in noce_try_store_flag_mask starts the sequence off
with an order-operator, our patterns for vt.maskc will receive the
result of the order-operator as a register argument; consequently,
they can't know that the result will be either 1 or 0.

To convey this information (and make vt.maskc applicable), we wrap
the result of the order-operator in an eq/ne against (const_int 0).
This commit adds the split pattern to handle these cases.

During if-conversion, if noce_try_store_flag_mask succeeds, we may see
if (cur < next) {
next = 0;
}
transformed into
   27: r82:SI=ltu(r76:DI,r75:DI)
  REG_DEAD r76:DI
   28: r81:SI=r82:SI^0x1
  REG_DEAD r82:SI
   29: r80:DI=zero_extend(r81:SI)
  REG_DEAD r81:SI

This currently escapes the combiner, as RISC-V does not have a pattern
to apply the 'slt' instruction to 'geu' verbs.  By adding a pattern in
this commit, we match such cases.
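
The identities these splits rely on can be sketched in C as below. This is an illustration under the assumption that slt and seqz behave as "set if less than" and "set if equal to zero"; the helper names are invented here, and only the signed variants are shown (the unsigned ones are analogous).

```c
#include <assert.h>
#include <stdint.h>

/* Model of slt: 1 if a < b, else 0.  */
static int64_t
slt (int64_t a, int64_t b)
{
  return a < b;
}

/* Model of seqz: 1 if x == 0, else 0.  */
static int64_t
seqz (int64_t x)
{
  return x == 0;
}

/* "a >= b" is "!(a < b)".  */
int64_t
ge_via_slt (int64_t a, int64_t b)
{
  return seqz (slt (a, b));
}

/* "a <= b" is "!(b < a)", i.e. the same with swapped operands.  */
int64_t
le_via_slt (int64_t a, int64_t b)
{
  return seqz (slt (b, a));
}

/* Hypothetical helper comparing against the C operators.  */
void
check_order (int64_t a, int64_t b)
{
  assert (ge_via_slt (a, b) == (a >= b));
  assert (le_via_slt (a, b) == (a <= b));
}
```

Because the final step is an explicit comparison against zero, the result is visibly 0 or 1, which is the property the vt.maskc patterns need.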

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Add split to wrap an
  order-operator suitably for generating vt.maskc.
* config/riscv/predicates.md (anyge_operator): Define.
(anygt_operator): Define.
(anyle_operator): Define.
(anylt_operator): Define.
* config/riscv/riscv.md: Helpers for ge(u) & le(u).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-le-01.c: New test.
* gcc.target/riscv/xventanacondops-lt-03.c: New test.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Fixed a pattern that was truncated during a rebase (last line
  missing).
- Ran whitespace-cleanup on xventanacondops-le-01.c
- Ran whitespace-cleanup on xventanacondops-lt-03.c

 gcc/config/riscv/predicates.md| 12 +
 gcc/config/riscv/riscv.md | 26 +++
 gcc/config/riscv/xventanacondops.md   | 46 +++
 .../gcc.target/riscv/xventanacondops-le-01.c  | 16 +++
 .../gcc.target/riscv/xventanacondops-lt-03.c  | 16 +++
 5 files changed, 116 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-03.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index b368c11c930..490bff688a7 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -204,6 +204,18 @@
 (define_predicate "equality_operator"
   (match_code "eq,ne"))
 
+(define_predicate "anyge_operator"
+  (match_code "ge,geu"))
+
+(define_predicate "anygt_operator"
+  (match_code "gt,gtu"))
+
+(define_predicate "anyle_operator"
+  (match_code "le,leu"))
+
+(define_predicate "anylt_operator"
+  (match_code "lt,ltu"))
+
 (define_predicate "order_operator"
   (match_code "eq,ne,lt,ltu,le,leu,ge,geu,gt,gtu"))
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4331842b7b2..d1f3270a3c8 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2636,6 +2636,19 @@
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (match_operator:GPR 1 "anyle_operator"
+  [(match_operand:X 2 "register_operand")
+   (match_operand:X 3 "register_operand")]))]
+  "TARGET_XVENTANACONDOPS"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
+ {
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == LE ? LT : LTU,
+   mode, operands[3], operands[2]);
+ })
+
 (define_insn "*slt_"
   [(set (match_operand:GPR   0 "register_operand" "= r")
(any_lt:GPR (match_operand:X 1 "register_operand" "  r")
@@ -2657,6 +2670,19 @@
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (match_operator:GPR 1 "anyge_operator"
+  [(match_operand:X 2 "register_operand")
+   (match_operand:X 3 "register_operand")]))]
+  "TARGET_XVENTANACONDOPS"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
+{
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == GE ? LT : LTU,
+   mode, operands[2], operands[3]);
+})
+
 ;;
 ;;  
 ;;
diff --git a/gcc/config/riscv/xventanacondops.md 
b/gcc/config/riscv/xventanacondops.md
index 641cef0e44e..f23058b95b2 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -28,3 +28,49 @@
(match_operand:DI 2 "register_operand" "r")))]
   "TARGET_XVENTANACONDOPS"
   "vt.maskc\t%0,%2,%1")
+
+;; Make order operators digestible to the vt.maskc logic by
+;; wrapping their result in a comparison against (const_int 0).
+
+;; "a >= b" is "!(a < b)"
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (neg:X (match_operator:X 1 "anyge_operator"
+[(match_operand:X 2 "reg

[PATCH v2 5/8] RISC-V: Recognize bexti in negated if-conversion

2022-11-13 Thread Philipp Tomsich
While the positive case "if ((bits >> SHAMT) & 1)" for SHAMT 0..10 can
trigger conversion into efficient branchless sequences
  - with Zbs (bexti + neg + and)
  - with XVentanaCondOps (andi + vt.maskc)
the inverted/negated case results in
  andi a5,a0,1024
  seqz a5,a5
  neg a5,a5
  and a5,a5,a1
due to how the sequence presents to the combine pass.

This adds an additional splitter to reassociate the polarity reversed
case into bexti + addi, if Zbs is present.
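
The reassociation works because, for a single extracted bit b in {0, 1}, -(b == 0) equals b - 1. A short C sketch of both sequences (function names are hypothetical, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Models andi + seqz + neg: all-ones if bit SHAMT is clear, else 0.  */
uint64_t
inverted_mask_seqz (uint64_t bits, unsigned shamt)
{
  assert (shamt < 64);
  uint64_t t = bits & ((uint64_t) 1 << shamt);	/* andi */
  t = (t == 0);					/* seqz */
  return 0 - t;					/* neg  */
}

/* Models bexti + addi -1: extract the bit, then subtract one.  */
uint64_t
inverted_mask_bexti (uint64_t bits, unsigned shamt)
{
  assert (shamt < 64);
  uint64_t bit = (bits >> shamt) & 1;	/* bexti */
  return bit - 1;			/* addi: 1 -> 0, 0 -> all-ones */
}
```

The result is then ANDed with the value as before, so the four-instruction sequence shrinks to three.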

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Add split to reassociate
  "andi + seqz + neg" into "bexti + addi".

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Removed spurious empty line at the end of xventanacondops.md.

 gcc/config/riscv/xventanacondops.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/riscv/xventanacondops.md 
b/gcc/config/riscv/xventanacondops.md
index a4068e53c13..81e5e8a8298 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -119,3 +119,13 @@
 {
   operands[2] = GEN_INT(1 << UINTVAL(operands[2]));
 })
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (neg:X (eq:X (zero_extract:X (match_operand:X 1 "register_operand")
+(const_int 1)
+(match_operand 2 "immediate_operand"))
+(const_int 0]
+  "!TARGET_XVENTANACONDOPS && TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (plus:X (match_dup 0) (const_int -1)))])
-- 
2.34.1



[PATCH v2 8/8] ifcvt: add if-conversion to conditional-zero instructions

2022-11-13 Thread Philipp Tomsich
Some architectures, as is the case on RISC-V with the proposed
ZiCondOps and the vendor-defined XVentanaCondOps, define a
conditional-zero instruction that is equivalent to:
 - the positive form:  rd = (rc != 0) ? rs : 0
 - the negated form:   rd = (rc == 0) ? rs : 0

While noce_try_store_flag_mask will somewhat work for this case, it
will generate a number of atomic RTX that will misdirect the cost
calculation and may be too long (i.e., 4 RTX and more) to successfully
merge at combine-time.

Instead, we add two new transforms that attempt to build up what we
define as the canonical form of a conditional-zero expression:

  (set (match_operand 0 "register_operand" "=r")
   (and (neg (eq_or_ne (match_operand 1 "register_operand" "r")
   (const_int 0)))
(match_operand 2 "register_operand" "r")))

Architectures that provide a conditional-zero are thus expected to
define an instruction matching this pattern in their backend.

Based on this, we support the following cases:
 - noce_try_condzero:
  a ? a : b
  a ? b : 0  (and then/else swapped)
 !a ? b : 0  (and then/else swapped)
 - noce_try_condzero_arith:
 conditional-plus, conditional-minus, conditional-and,
 conditional-or, conditional-xor, conditional-shift
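
The idea behind the arithmetic cases can be sketched in C for the operations where 0 is the neutral element. This is only an illustration with invented names (condzero, cond_add, cond_xor, cond_shl), not the code in ifcvt.cc. Note that 0 is not neutral for AND, so the conditional-and case cannot be expressed this way and is deliberately left out of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the positive conditional-zero form.  */
static uint64_t
condzero (uint64_t rc, uint64_t rs)
{
  return rc != 0 ? rs : 0;
}

/* if (c) a += b;  becomes  a + (c ? b : 0).  */
uint64_t
cond_add (uint64_t c, uint64_t a, uint64_t b)
{
  return a + condzero (c, b);
}

/* if (c) a ^= b;  becomes  a ^ (c ? b : 0).  */
uint64_t
cond_xor (uint64_t c, uint64_t a, uint64_t b)
{
  return a ^ condzero (c, b);
}

/* if (c) a <<= n;  becomes  a << (c ? n : 0).  */
uint64_t
cond_shl (uint64_t c, uint64_t a, unsigned n)
{
  assert (n < 64);
  return a << condzero (c, n);
}
```

In each case the branch disappears: the second operand is either the real value or the neutral 0, and the arithmetic operation runs unconditionally.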

Given that this is hooked into the CE passes, it is less powerful than
a tree-pass (e.g., it can not transform cases where an extension, such
as for uint16_t operations is in either the then or else-branch
together with the arithmetic) but already covers a good array of cases
and triggers across SPEC CPU 2017.
Adding transformations in a tree pass will be considered as a future
improvement.

gcc/ChangeLog:

* ifcvt.cc (noce_emit_insn): Add prototype.
(noce_emit_condzero): Helper for noce_try_condzero and
noce_try_condzero_arith transforms.
(noce_try_condzero): New transform.
(noce_try_condzero_arith): New transform for conditional
arithmetic that can be built up by exploiting that the
conditional-zero instruction will inject 0, which acts
as the neutral element for operations.
(noce_process_if_block): Call noce_try_condzero and
noce_try_condzero_arith.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-and-01.c: New test.
* gcc.target/riscv/xventanacondops-and-02.c: New test.
* gcc.target/riscv/xventanacondops-eq-01.c: New test.
* gcc.target/riscv/xventanacondops-eq-02.c: New test.
* gcc.target/riscv/xventanacondops-lt-01.c: New test.
* gcc.target/riscv/xventanacondops-ne-01.c: New test.
* gcc.target/riscv/xventanacondops-xor-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Ran whitespace-cleanup on xventanacondops-ne-01.c.

 gcc/ifcvt.cc  | 214 ++
 .../gcc.target/riscv/xventanacondops-and-01.c |  16 ++
 .../gcc.target/riscv/xventanacondops-and-02.c |  15 ++
 .../gcc.target/riscv/xventanacondops-eq-01.c  |  11 +
 .../gcc.target/riscv/xventanacondops-eq-02.c  |  14 ++
 .../gcc.target/riscv/xventanacondops-lt-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-ne-01.c  |  10 +
 .../gcc.target/riscv/xventanacondops-xor-01.c |  14 ++
 8 files changed, 310 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index eb8efb89a89..41c58876d05 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -97,6 +97,7 @@ static int find_if_case_2 (basic_block, edge, edge);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
+static rtx_insn *noce_emit_insn (rtx);
 static rtx_insn *block_has_only_trap (basic_block);
 static void need_cmov_or_rewire (basic_block, hash_set *,
 hash_map *);
@@ -787,6 +788,9 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
rtx, rtx_insn **);
 static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
+static rtx noce_emit_condzero (struct noce_if_info *, rtx, bool = false);
+static int noce_try_condzero (struct noce_if_info *);
+static int noce_try_condzero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1664,6 +1668,212 @

[PATCH v2 7/8] RISC-V: Ventana-VT1 supports XVentanaCondOps

2022-11-13 Thread Philipp Tomsich
gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Update the
Ventana-VT1 definition to include the xventanacondops
extension.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- New in v2.

 gcc/config/riscv/riscv-cores.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index aef1e92ae24..9e38e9dc72e 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -74,6 +74,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
 
-RISCV_CORE("ventana-vt1", "rv64imafdc_zba_zbb_zbc_zbs_zifencei",   
"ventana-vt1")
+RISCV_CORE("ventana-vt1", 
"rv64imafdc_zba_zbb_zbc_zbs_zifencei_xventanacondops",   "ventana-vt1")
 
 #undef RISCV_CORE
-- 
2.34.1



[PATCH 0/7] Add XThead* support

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This series adds support for the following vendor extensions
from T-Head:

* XTheadCmo, XTheadSync
* XTheadBa, XTheadBb, XTheadBs
* XTheadCondMov
* XTheadMac
* XTheadFmv, XTheadInt

No regressions observed.

Christoph Müllner (7):
  riscv: Add basic XThead* vendor extension support
  riscv: riscv-cores.def: Add T-Head XuanTie C906
  riscv: thead: Add support for XTheadBa and XTheadBs ISA extensions
  riscv: thead: Add support for XTheadCondMov ISA extensions
  riscv: thead: Add support for XTheadBb ISA extension
  riscv: thead: Add support for XTheadMac ISA extension
  riscv: Add basic extension support for XTheadFmv and XTheadInt

 gcc/common/config/riscv/riscv-common.cc   |  24 ++
 gcc/config/riscv/bitmanip.md  |  47 +++-
 gcc/config/riscv/iterators.md |   4 +
 gcc/config/riscv/riscv-cores.def  |   2 +
 gcc/config/riscv/riscv-opts.h |  23 ++
 gcc/config/riscv/riscv.cc |  67 -
 gcc/config/riscv/riscv.md |  52 +++-
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/thead.md | 252 ++
 .../gcc.target/riscv/mcpu-thead-c906.c|  18 ++
 gcc/testsuite/gcc.target/riscv/thead-mula-1.c |  40 +++
 gcc/testsuite/gcc.target/riscv/thead-mula-2.c |  28 ++
 .../gcc.target/riscv/xtheadba-addsl-64.c  |  18 ++
 .../gcc.target/riscv/xtheadba-addsl.c |  20 ++
 gcc/testsuite/gcc.target/riscv/xtheadba.c |  13 +
 gcc/testsuite/gcc.target/riscv/xtheadbb-ext.c |  19 ++
 .../gcc.target/riscv/xtheadbb-extu.c  |  12 +
 gcc/testsuite/gcc.target/riscv/xtheadbb-rev.c |  40 +++
 .../gcc.target/riscv/xtheadbb-srri.c  |  18 ++
 gcc/testsuite/gcc.target/riscv/xtheadbb.c |  13 +
 gcc/testsuite/gcc.target/riscv/xtheadbs-tst.c |  12 +
 gcc/testsuite/gcc.target/riscv/xtheadbs.c |  13 +
 gcc/testsuite/gcc.target/riscv/xtheadcmo.c|  13 +
 .../riscv/xtheadcondmov-mveqz-imm-eqz.c   |  37 +++
 .../riscv/xtheadcondmov-mveqz-imm-not.c   |  37 +++
 .../riscv/xtheadcondmov-mveqz-reg-eqz.c   |  37 +++
 .../riscv/xtheadcondmov-mveqz-reg-not.c   |  37 +++
 .../riscv/xtheadcondmov-mvnez-imm-cond.c  |  37 +++
 .../riscv/xtheadcondmov-mvnez-imm-nez.c   |  37 +++
 .../riscv/xtheadcondmov-mvnez-reg-cond.c  |  37 +++
 .../riscv/xtheadcondmov-mvnez-reg-nez.c   |  37 +++
 .../gcc.target/riscv/xtheadcondmov.c  |  13 +
 .../gcc.target/riscv/xtheadfmemidx.c  |  13 +
 gcc/testsuite/gcc.target/riscv/xtheadfmv.c|  14 +
 gcc/testsuite/gcc.target/riscv/xtheadint.c|  14 +
 gcc/testsuite/gcc.target/riscv/xtheadmac.c|  13 +
 gcc/testsuite/gcc.target/riscv/xtheadmemidx.c |  13 +
 gcc/testsuite/gcc.target/riscv/xtheadsync.c   |  13 +
 38 files changed, 1123 insertions(+), 17 deletions(-)
 create mode 100644 gcc/config/riscv/thead.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/thead-mula-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/thead-mula-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadba-addsl-64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadba-addsl.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadba.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-ext.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-extu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-rev.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-srri.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbs-tst.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbs.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcmo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadint.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmac.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmemidx.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadsync.c

-- 
2.38.1



[PATCH 1/7] riscv: Add basic XThead* vendor extension support

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds basic support for the following XThead* ISA extensions:

* XTheadCmo, XTheadSync
* XTheadBa, XTheadBb, XTheadBs, XTheadCondMov
* XTheadMac
* XTheadFMemIdx, XTheadMemIdx

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add xthead* extensions.
* config/riscv/riscv-opts.h (MASK_XTHEADBA): New.
(TARGET_XTHEADBA): New.
(MASK_XTHEADBB): New.
(TARGET_XTHEADBB): New.
(MASK_XTHEADBS): New.
(TARGET_XTHEADBS): New.
(MASK_XTHEADCMO): New.
(TARGET_XTHEADCMO): New.
(MASK_XTHEADCONDMOV): New.
(TARGET_XTHEADCONDMOV): New.
(MASK_XTHEADFMEMIDX): New.
(TARGET_XTHEADFMEMIDX): New.
(MASK_XTHEADMAC): New.
(TARGET_XTHEADMAC): New.
(MASK_XTHEADMEMIDX): New.
(TARGET_XTHEADMEMIDX): New.
(MASK_XTHEADSYNC): New.
(TARGET_XTHEADSYNC): New.
* config/riscv/riscv.opt: Add riscv_xthead_subext.
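
The MASK_*/TARGET_* pairs listed above follow a one-bit-per-extension
scheme (the riscv-opts.h hunk in patch 7/7 shows the macro shape). A
minimal stand-alone sketch of that scheme follows; every DEMO_* name, the
demo_xthead_subext variable, and the bit positions are hypothetical
stand-ins, not the definitions from the patch:

```c
#include <assert.h>

/* Hypothetical masks: one bit per sub-extension (positions are illustrative). */
#define DEMO_MASK_XTHEADBA (1u << 0)
#define DEMO_MASK_XTHEADBB (1u << 1)
#define DEMO_MASK_XTHEADBS (1u << 2)

/* Hypothetical stand-in for the option variable collecting enabled bits. */
static unsigned demo_xthead_subext;

/* Each TARGET-style macro tests one bit of the shared variable. */
#define DEMO_TARGET_XTHEADBA ((demo_xthead_subext & DEMO_MASK_XTHEADBA) != 0)
#define DEMO_TARGET_XTHEADBB ((demo_xthead_subext & DEMO_MASK_XTHEADBB) != 0)
#define DEMO_TARGET_XTHEADBS ((demo_xthead_subext & DEMO_MASK_XTHEADBS) != 0)

/* Enable one sub-extension; the mask must have exactly one bit set. */
static void demo_enable(unsigned mask)
{
  assert(mask != 0 && (mask & (mask - 1)) == 0);
  demo_xthead_subext |= mask;
}
```

Enabling one extension leaves the others untouched, which is why each
new extension only needs a fresh bit and a fresh macro pair.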

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadba.c: New test.
* gcc.target/riscv/xtheadbb.c: New test.
* gcc.target/riscv/xtheadbs.c: New test.
* gcc.target/riscv/xtheadcmo.c: New test.
* gcc.target/riscv/xtheadcondmov.c: New test.
* gcc.target/riscv/xtheadfmemidx.c: New test.
* gcc.target/riscv/xtheadmac.c: New test.
* gcc.target/riscv/xtheadmemidx.c: New test.
* gcc.target/riscv/xtheadsync.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/common/config/riscv/riscv-common.cc   | 20 +++
 gcc/config/riscv/riscv-opts.h | 19 ++
 gcc/config/riscv/riscv.opt|  3 +++
 gcc/testsuite/gcc.target/riscv/xtheadba.c | 13 
 gcc/testsuite/gcc.target/riscv/xtheadbb.c | 13 
 gcc/testsuite/gcc.target/riscv/xtheadbs.c | 13 
 gcc/testsuite/gcc.target/riscv/xtheadcmo.c| 13 
 .../gcc.target/riscv/xtheadcondmov.c  | 13 
 .../gcc.target/riscv/xtheadfmemidx.c  | 13 
 gcc/testsuite/gcc.target/riscv/xtheadmac.c| 13 
 gcc/testsuite/gcc.target/riscv/xtheadmemidx.c | 13 
 gcc/testsuite/gcc.target/riscv/xtheadsync.c   | 13 
 12 files changed, 159 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadba.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbs.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcmo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmemidx.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmac.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmemidx.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadsync.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..8e1449d3543 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -222,6 +222,16 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadbs", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadcmo", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadcondmov", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadfmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadmac", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -1247,6 +1257,16 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
   {"svnapot", &gcc_options::x_riscv_sv_subext, MASK_SVNAPOT},
 
+  {"xtheadba",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBA},
+  {"xtheadbb",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBB},
+  {"xtheadbs",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBS},
+  {"xtheadcmo", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADCMO},
+  {"xtheadcondmov", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADCONDMOV},
+  {"xtheadfmemidx", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADFMEMIDX},
+  {"xtheadmac", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMAC},
+  {"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
+  {"xtheadsync",&gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 25fd85b09b1..18daac40dbd 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -189,4 +189,23 @@ enum stack_protector_guard {
? 0 \
: 32 << (__builtin_popcount (riscv_

[PATCH 2/7] riscv: riscv-cores.def: Add T-Head XuanTie C906

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This adds T-Head's XuanTie C906 to the list of known cores as "thead-c906".
The C906 has been shipping for quite some time (it is the core of the Allwinner D1).
Note that the tuning struct for the C906 is already part of GCC (it is
also named "thead-c906").

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Add "thead-c906".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-thead-c906.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-cores.def   |  2 ++
 .../gcc.target/riscv/mcpu-thead-c906.c | 18 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 31ad34682c5..648a010e09b 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -73,4 +73,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
 
+RISCV_CORE("thead-c906",  "rv64imafdc", "thead-c906")
+
 #undef RISCV_CORE
diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c b/gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c
new file mode 100644
index 000..f579e7e2215
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-skip-if "-march given" { *-*-* } { "-march=*" } } */
+/* { dg-options "-mcpu=thead-c906" { target { rv64 } } } */
+/* T-Head XuanTie C906 => rv64imafdc */
+
+#if !((__riscv_xlen == 64) \
+  && !defined(__riscv_32e) \
+  && defined(__riscv_mul)  \
+  && defined(__riscv_atomic)   \
+  && (__riscv_flen == 64)  \
+  && defined(__riscv_compressed))
+#error "unexpected arch"
+#endif
+
+int main()
+{
+  return 0;
+}
-- 
2.38.1



[PATCH 4/7] riscv: thead: Add support for XTheadCondMov ISA extensions

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the XTheadCondMov ISA extension.
The extension brings a one-sided conditional move (no else-assignment).
Given that GCC has a great if-conversion pass, we don't need to do much
besides expanding movcc accordingly and adjusting the cost
model.
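
To illustrate what "one-sided" means here, the following C model sketches
how a full select can be built from two such moves. It assumes that
th.mveqz writes its destination only when the condition register is zero
and th.mvnez only when it is non-zero, as the mnemonics suggest; the
model_* function names are hypothetical and this is not a transcription
of riscv_expand_conditional_move_onesided:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: dest is overwritten only if cond == 0. */
static int64_t model_mveqz(int64_t dest, int64_t src, int64_t cond)
{
  return cond == 0 ? src : dest;
}

/* Assumed semantics: dest is overwritten only if cond != 0. */
static int64_t model_mvnez(int64_t dest, int64_t src, int64_t cond)
{
  return cond != 0 ? src : dest;
}

/* dest = cond ? cons : alt, expressed as two one-sided moves. */
static int64_t model_select(int64_t cond, int64_t cons, int64_t alt)
{
  int64_t dest = 0;                      /* initial value never survives */
  dest = model_mvnez(dest, cons, cond);  /* taken when cond != 0 */
  dest = model_mveqz(dest, alt, cond);   /* taken when cond == 0 */
  assert(dest == (cond ? cons : alt));
  return dest;
}
```

Exactly one of the two moves fires for any condition value, so the
result never depends on the initial content of the destination.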

gcc/ChangeLog:

* config/riscv/iterators.md (TARGET_64BIT): Add GPR2 iterator.
* config/riscv/riscv.cc (riscv_rtx_costs): Add costs for
  XTheadCondMov.
(riscv_expand_conditional_move_onesided): New function.
(riscv_expand_conditional_move): New function.
* config/riscv/riscv.md: Add support for XTheadCondMov.
* config/riscv/thead.md (*th_cond_mov): Add
  support for XTheadCondMov.
(*th_cond_gpr_mov): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c: New test.
* gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c: New test.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c: New test.
* gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c: New test.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c: New test.
* gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c: New test.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c: New test.
* gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/iterators.md |  4 ++
 gcc/config/riscv/riscv.cc | 53 ---
 gcc/config/riscv/riscv.md |  7 +--
 gcc/config/riscv/thead.md | 35 
 .../riscv/xtheadcondmov-mveqz-imm-eqz.c   | 37 +
 .../riscv/xtheadcondmov-mveqz-imm-not.c   | 37 +
 .../riscv/xtheadcondmov-mveqz-reg-eqz.c   | 37 +
 .../riscv/xtheadcondmov-mveqz-reg-not.c   | 37 +
 .../riscv/xtheadcondmov-mvnez-imm-cond.c  | 37 +
 .../riscv/xtheadcondmov-mvnez-imm-nez.c   | 37 +
 .../riscv/xtheadcondmov-mvnez-reg-cond.c  | 37 +
 .../riscv/xtheadcondmov-mvnez-reg-nez.c   | 37 +
 12 files changed, 386 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 50380ecfac9..3932eab14fa 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -26,6 +26,10 @@
 ;; from the same template.
 (define_mode_iterator GPR [SI (DI "TARGET_64BIT")])
 
+;; A copy of GPR that can be used when a pattern has two independent
+;; modes.
+(define_mode_iterator GPR2 [SI (DI "TARGET_64BIT")])
+
 ;; This mode iterator allows :P to be used for patterns that operate on
 ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
 (define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index decade0fedd..9a795264e00 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2269,8 +2269,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case IF_THEN_ELSE:
-  if (TARGET_SFB_ALU
- && register_operand (XEXP (x, 1), mode)
+  if ((TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
+ && reg_or_0_operand (XEXP (x, 1), mode)
  && sfb_alu_operand (XEXP (x, 2), mode)
  && comparison_operator (XEXP (x, 0), VOIDmode))
{
@@ -3231,16 +3231,57 @@ riscv_expand_conditional_branch (rtx label, rtx_code code, rtx op0, rtx op1)
   emit_jump_insn (gen_condjump (condition, label));
 }
 
+/* Helper to emit two one-sided conditional moves for the movecc.  */
+
+static void
+riscv_expand_conditional_move_onesided (rtx dest, rtx cons, rtx alt,
+   rtx_code code, rtx op0, rtx op1)
+{
+  machine_mode mode = GET_MODE (dest);
+
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
+  gcc_assert (reg_or_0_operand (cons, mode));
+  gcc_assert (reg_or_0_operand (alt, mode));
+
+  riscv_emit_int_compare (&code, &op0, &op1);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_rtx_SET (tmp1, gen_rtx_IF_THEN_ELSE (mode, cond,
+  

[PATCH 5/7] riscv: thead: Add support for XTheadBb ISA extension

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

The XTheadBb ISA extension provides instructions similar to Zbb:
* th.srri/th.srriw
* th.ext/th.extu
* th.ff1 (count-leading-zeros)
* th.rev/th.revw

Instructions that are not covered, because they don't fit into a
pattern:
* th.ff0 (count-leading-ones)
* th.tstnbz

For the cases where the RISC-V backend already provides instruction
patterns with GCC standard pattern names (e.g. rotatert), this
patch simply uses expanders so we can match the pattern using unnamed
instruction patterns for zb* and xtheadb*.
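
As an illustration of source code that the rotatert standard pattern is
meant to catch, here is a minimal C sketch of a rotate-right idiom. The
function names are hypothetical, and whether a given GCC build turns the
constant-amount variant into a single rotate instruction depends on the
-march string and optimization level, which this sketch does not verify:

```c
#include <assert.h>
#include <stdint.h>

/* Generic rotate right; n must be in the range 0..31. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
  assert(n < 32);
  /* "& 31" keeps the left-shift amount in range when n == 0. */
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotate by a compile-time constant. Note that the rotr expander in this
   patch FAILs for XTheadBb when the amount is not an immediate.  */
static uint32_t rotr32_by_8(uint32_t x)
{
  return rotr32(x, 8);
}
```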

gcc/ChangeLog:

* config/riscv/bitmanip.md (clz2): Add expand.
(ctz2): Add expand.
(popcount2): Add expand.
(*si2): Hide pattern name.
(*di2): Hide pattern name.
(rotr3): Add expand.
(*rotrsi3): Hide pattern name.
(*rotrdi3): Hide pattern name.
(*rotrsi3_sext): Hide pattern name.
(bswapdi2): Add expand.
(bswapsi2): Add expand.
(*bswap2): Hide pattern name.
* config/riscv/riscv.cc (riscv_rtx_costs): Add support for
  sign-extract.
* config/riscv/riscv.md (extv): New expand.
(extzv): New expand.
* config/riscv/thead.md (*th_srrisi3): New pattern.
(*th_srridi3): New pattern.
(*th_ext): New pattern.
(*th_extu): New pattern.
(*th_clz): New pattern.
(*th_revsi2): New pattern.
(*th_revdi2): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-ext.c: New test.
* gcc.target/riscv/xtheadbb-extu.c: New test.
* gcc.target/riscv/xtheadbb-rev.c: New test.
* gcc.target/riscv/xtheadbb-srri.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/bitmanip.md  | 47 --
 gcc/config/riscv/riscv.cc | 10 +++
 gcc/config/riscv/riscv.md | 26 
 gcc/config/riscv/thead.md | 62 +++
 gcc/testsuite/gcc.target/riscv/xtheadbb-ext.c | 19 ++
 .../gcc.target/riscv/xtheadbb-extu.c  | 12 
 gcc/testsuite/gcc.target/riscv/xtheadbb-rev.c | 40 
 .../gcc.target/riscv/xtheadbb-srri.c  | 18 ++
 8 files changed, 228 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-ext.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-extu.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-rev.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-srri.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index b44fb9517e7..3dbb92b6115 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -119,6 +119,21 @@ (define_insn "*slliuw"
 
 ;; ZBB extension.
 
+(define_expand "clz2"
+  [(set (match_operand:GPR 0 "register_operand")
+   (clz:GPR (match_operand:GPR 1 "register_operand")))]
+  "TARGET_ZBB || TARGET_XTHEADBB")
+
+(define_expand "ctz2"
+  [(set (match_operand:GPR 0 "register_operand")
+   (ctz:GPR (match_operand:GPR 1 "register_operand")))]
+  "TARGET_ZBB")
+
+(define_expand "popcount2"
+  [(set (match_operand:GPR 0 "register_operand")
+   (popcount:GPR (match_operand:GPR 1 "register_operand")))]
+  "TARGET_ZBB")
+
 (define_insn "*_not"
   [(set (match_operand:X 0 "register_operand" "=r")
 (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
@@ -137,7 +152,7 @@ (define_insn "*xor_not"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
-(define_insn "si2"
+(define_insn "*si2"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (clz_ctz_pcnt:SI (match_operand:SI 1 "register_operand" "r")))]
   "TARGET_ZBB"
@@ -154,7 +169,7 @@ (define_insn "*disi2"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "SI")])
 
-(define_insn "di2"
+(define_insn "*di2"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (clz_ctz_pcnt:DI (match_operand:DI 1 "register_operand" "r")))]
   "TARGET_64BIT && TARGET_ZBB"
@@ -194,7 +209,17 @@ (define_insn "*zero_extendhi2_zbb"
   [(set_attr "type" "bitmanip,load")
(set_attr "mode" "HI")])
 
-(define_insn "rotrsi3"
+(define_expand "rotr3"
+  [(set (match_operand:GPR 0 "register_operand")
+   (rotatert:GPR (match_operand:GPR 1 "register_operand")
+(match_operand:QI 2 "arith_operand")))]
+  "TARGET_ZBB || TARGET_XTHEADBB"
+{
+  if (TARGET_XTHEADBB && !immediate_operand (operands[2], VOIDmode))
+FAIL;
+})
+
+(define_insn "*rotrsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
(rotatert:SI (match_operand:SI 1 "register_operand" "r")
 (match_operand:QI 2 "arith_operand" "rI")))]
@@ -202,7 +227,7 @@ (define_insn "rotrsi3"
   "ror%i2%~\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
-(define_insn "rotrdi3"
+(define_insn "*rotrdi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
(rotatert:DI (match_operand:DI 1 "register_operand" "r")
 (match_operand

[PATCH 3/7] riscv: thead: Add support for XTheadBa and XTheadBs ISA extensions

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the following T-Head vendor extensions:
* XTheadBa
* XTheadBs

Each extension provides just one instruction, which has a counterpart
in the similarly named Bitmanip ISA extension.
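
For readers unfamiliar with the two operations, the following C model
sketches the arithmetic that the *th_addsl and *th_tst patterns in this
patch describe: a shift-and-add with a shift amount of 0..3, and a
single-bit extraction. The model_* names are hypothetical and the sketch
makes no claim about the assembly operand order:

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add: base + (index << shift), with shift limited to 0..3
   as in the insn condition of *th_addsl.  */
static uint64_t model_shift_add(uint64_t base, uint64_t index, unsigned shift)
{
  assert(shift <= 3);
  return base + (index << shift);
}

/* Single-bit test: extract bit `pos` of `value` as 0 or 1, matching a
   zero_extract of width 1.  */
static uint64_t model_bit_test(uint64_t value, unsigned pos)
{
  assert(pos < 64);
  return (value >> pos) & 1u;
}
```

Array indexing such as x[n] for 2-, 4- and 8-byte element types maps onto
the shift-and-add form with shift amounts 1, 2 and 3, which is what the
xtheadba-addsl tests scan for.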

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Adjust for th.tst.
* config/riscv/riscv.md: Include thead.md.
* config/riscv/thead.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadba-addsl-64.c: New test.
* gcc.target/riscv/xtheadba-addsl.c: New test.
* gcc.target/riscv/xtheadbs-tst.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc |  4 +-
 gcc/config/riscv/riscv.md |  1 +
 gcc/config/riscv/thead.md | 38 +++
 .../gcc.target/riscv/xtheadba-addsl-64.c  | 18 +
 .../gcc.target/riscv/xtheadba-addsl.c | 20 ++
 gcc/testsuite/gcc.target/riscv/xtheadbs-tst.c | 12 ++
 6 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/thead.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadba-addsl-64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadba-addsl.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbs-tst.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 02a01ca0b7c..decade0fedd 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2369,8 +2369,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (SINGLE_SHIFT_COST);
  return true;
}
-  /* bext pattern for zbs.  */
-  if (TARGET_ZBS && outer_code == SET
+  /* bit extraction pattern (zbs:bext, xtheadbs:tst).  */
+  if ((TARGET_ZBS || TARGET_XTHEADBS) && outer_code == SET
  && GET_CODE (XEXP (x, 1)) == CONST_INT
  && INTVAL (XEXP (x, 1)) == 1)
{
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 798f7370a08..a9254df7820 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3009,3 +3009,4 @@ (define_insn "riscv_prefetchi_"
 (include "generic.md")
 (include "sifive-7.md")
 (include "vector.md")
+(include "thead.md")
diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
new file mode 100644
index 000..676d10b71d7
--- /dev/null
+++ b/gcc/config/riscv/thead.md
@@ -0,0 +1,38 @@
+;; Machine description for T-Head vendor extensions
+;; Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "*th_addsl"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (plus:X (ashift:X (match_operand:X 1 "register_operand" "r")
+ (match_operand:QI 2 "immediate_operand" "I"))
+   (match_operand:X 3 "register_operand" "r")))]
+  "TARGET_XTHEADBA
+   && (INTVAL (operands[2]) >= 0) && (INTVAL (operands[2]) <= 3)"
+  "th.addsl\t%0,%1,%3,%2"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
+(define_insn "*th_tst"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (zero_extract:X (match_operand:X 1 "register_operand" "r")
+   (const_int 1)
+   (match_operand 2 "immediate_operand" "i")))]
+  "TARGET_XTHEADBS"
+  "th.tst\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadba-addsl-64.c b/gcc/testsuite/gcc.target/riscv/xtheadba-addsl-64.c
new file mode 100644
index 000..7f47929967a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadba-addsl-64.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_xtheadba -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+/* RV64 only.  */
+int foos(short *x, int n){
+  return x[n];
+}
+int fooi(int *x, int n){
+  return x[n];
+}
+int fooll(long long *x, int n){
+  return x[n];
+}
+
+/* { dg-final { scan-assembler-times "th.addsl\[ \t\]*a\[0-9\]+,a\[0-9\]+,a\[0-9\]+,1" 1 } } */
+/* { dg-final { scan-assembler-times "th.addsl\[ \t\]*a\[0-9\]+,a\[0-9\]+,a\[0-9\]+,2" 1 } } */
+/* { dg-final { scan-assembler-times "th.addsl\[ \t\]*a\[0-9\]+,a\[0-9\]+,a\[0-9\]+,3" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadba-addsl.c 
b/gcc/testsuite/gcc.target/riscv/xtheadba

[PATCH 7/7] riscv: Add basic extension support for XTheadFmv and XTheadInt

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds basic support for the XTheadFmv and XTheadInt
ISA extensions. As both extensions only contain instructions
that are not supposed to be emitted by the compiler, the support
only covers awareness of the extension name in the march string
and the definition of a feature test macro.
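
A minimal sketch of how user code might consume such feature test macros
is shown below. The macro names __riscv_xtheadfmv and __riscv_xtheadint
are the ones checked by the tests in this patch; the HAVE_* names and the
helper function are hypothetical:

```c
#include <assert.h>

/* Map the compiler-provided feature test macros to 0/1 constants. */
#ifdef __riscv_xtheadfmv
#define HAVE_XTHEADFMV 1
#else
#define HAVE_XTHEADFMV 0
#endif

#ifdef __riscv_xtheadint
#define HAVE_XTHEADINT 1
#else
#define HAVE_XTHEADINT 0
#endif

/* Compile-time sanity check (static_assert comes from <assert.h> in C11). */
static_assert(HAVE_XTHEADFMV == 0 || HAVE_XTHEADFMV == 1, "flag must be 0 or 1");

/* Bit 0: XTheadFmv available, bit 1: XTheadInt available. */
static int xthead_feature_bits(void)
{
  return HAVE_XTHEADFMV | (HAVE_XTHEADINT << 1);
}
```

Built without the extensions in the march string, both constants are 0,
so code guarded by them compiles away.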

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add xtheadfmv and
  xtheadint.
* config/riscv/riscv-opts.h (MASK_XTHEADMAC): New.
(MASK_XTHEADFMV): New.
(TARGET_XTHEADFMV): New.
(MASK_XTHEADINT): New.
(TARGET_XTHEADINT): New.
(MASK_XTHEADMEMIDX): New.
(MASK_XTHEADSYNC): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmv.c: New test.
* gcc.target/riscv/xtheadint.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/common/config/riscv/riscv-common.cc|  4 
 gcc/config/riscv/riscv-opts.h  | 10 +++---
 gcc/testsuite/gcc.target/riscv/xtheadfmv.c | 14 ++
 gcc/testsuite/gcc.target/riscv/xtheadint.c | 14 ++
 4 files changed, 39 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadfmv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadint.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 8e1449d3543..b3e6732 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -228,6 +228,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"xtheadcmo", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadcondmov", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadfmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadfmv", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadint", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1263,6 +1265,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"xtheadcmo", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADCMO},
   {"xtheadcondmov", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADCONDMOV},
   {"xtheadfmemidx", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADFMEMIDX},
+  {"xtheadfmv", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADFMV},
+  {"xtheadint", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADINT},
   {"xtheadmac", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMAC},
   {"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
   {"xtheadsync",&gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 18daac40dbd..c1868dcf284 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -201,11 +201,15 @@ enum stack_protector_guard {
 #define TARGET_XTHEADCONDMOV   ((riscv_xthead_subext & MASK_XTHEADCONDMOV) != 0)
 #define MASK_XTHEADFMEMIDX (1 << 5)
 #define TARGET_XTHEADFMEMIDX   ((riscv_xthead_subext & MASK_XTHEADFMEMIDX) != 0)
-#define MASK_XTHEADMAC (1 << 6)
+#define MASK_XTHEADFMV (1 << 6)
+#define TARGET_XTHEADFMV   ((riscv_xthead_subext & MASK_XTHEADFMV) != 0)
+#define MASK_XTHEADINT (1 << 7)
+#define TARGET_XTHEADINT   ((riscv_xthead_subext & MASK_XTHEADINT) != 0)
+#define MASK_XTHEADMAC (1 << 8)
 #define TARGET_XTHEADMAC   ((riscv_xthead_subext & MASK_XTHEADMAC) != 0)
-#define MASK_XTHEADMEMIDX  (1 << 7)
+#define MASK_XTHEADMEMIDX  (1 << 9)
 #define TARGET_XTHEADMEMIDX((riscv_xthead_subext & MASK_XTHEADMEMIDX) != 0)
-#define MASK_XTHEADSYNC(1 << 8)
+#define MASK_XTHEADSYNC(1 << 10)
 #define TARGET_XTHEADSYNC  ((riscv_xthead_subext & MASK_XTHEADSYNC) != 0)
 
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv.c
new file mode 100644
index 000..e97e8f461f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadfmv" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xtheadfmv" { target { rv64 } } } */
+
+#ifndef __riscv_xtheadfmv
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
+
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadint.c b/gcc/testsuite/gcc.target/riscv/xtheadint.c
new file mode 100644
index 000..ee6989a380e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadint.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadint" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_xtheadint" { target { rv64 } } } */
+
+#ifndef __riscv_xtheadint
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
+
-- 
2.38.1



[PATCH 6/7] riscv: thead: Add support for XTheadMac ISA extension

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

The XTheadMac ISA extension provides multiply-accumulate/subtract
instructions:
* mula/mulaw/mulah
* muls/mulsw/mulsh

To benefit from middle-end passes, we expand the following named
patterns in riscv.md (as they are not T-Head-specific):
* maddhisi4
* msubhisi4
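
As a sketch of the arithmetic behind maddhisi4 and msubhisi4 (two
sign-extended 16-bit operands multiplied, then added to or subtracted
from a 32-bit accumulator), consider the following C model. The function
names are hypothetical, and the example values are kept small so that no
signed overflow occurs:

```c
#include <assert.h>
#include <stdint.h>

/* acc + (sign-extended a) * (sign-extended b), the maddhisi4 shape. */
static int32_t madd_hi(int32_t acc, int16_t a, int16_t b)
{
  return acc + (int32_t)a * (int32_t)b;
}

/* acc - (sign-extended a) * (sign-extended b), the msubhisi4 shape. */
static int32_t msub_hi(int32_t acc, int16_t a, int16_t b)
{
  return acc - (int32_t)a * (int32_t)b;
}

/* A 16-bit dot product: the kind of loop where a multiply-accumulate
   instruction can replace a separate multiply and add.  */
static int32_t dot16(const int16_t *x, const int16_t *y, int n)
{
  assert(n >= 0);
  int32_t acc = 0;
  for (int i = 0; i < n; i++)
    acc = madd_hi(acc, x[i], y[i]);
  return acc;
}
```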

gcc/ChangeLog:

* config/riscv/riscv.md (maddhisi4): New expand.
(msubhisi4): New expand.
* config/riscv/thead.md (*th_mula): New pattern.
(*th_mulawsi): New pattern.
(*th_mulawsi2): New pattern.
(*th_maddhisi4): New pattern.
(*th_sextw_maddhisi4): New pattern.
(*th_muls): New pattern.
(*th_mulswsi): New pattern.
(*th_mulswsi2): New pattern.
(*th_msubhisi4): New pattern.
(*th_sextw_msubhisi4): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/thead-mula-1.c: New test.
* gcc.target/riscv/thead-mula-2.c: New test.

Co-Developed-by: quxm 
Signed-off-by: quxm 
Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md |  18 +++
 gcc/config/riscv/thead.md | 117 ++
 gcc/testsuite/gcc.target/riscv/thead-mula-1.c |  40 ++
 gcc/testsuite/gcc.target/riscv/thead-mula-2.c |  28 +
 4 files changed, 203 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/thead-mula-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/thead-mula-2.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index cfe1fd6baea..998169115f2 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3029,6 +3029,24 @@ (define_expand "extzv"
 FAIL;
 })
 
+(define_expand "maddhisi4"
+  [(set (match_operand:SI 0 "register_operand")
+   (plus:SI
+ (mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand"))
+  (sign_extend:SI (match_operand:HI 2 "register_operand")))
+ (match_operand:SI 3 "register_operand")))]
+  "TARGET_XTHEADMAC"
+)
+
+(define_expand "msubhisi4"
+  [(set (match_operand:SI 0 "register_operand")
+   (minus:SI
+ (match_operand:SI 3 "register_operand")
+ (mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand"))
+  (sign_extend:SI (match_operand:HI 2 "register_operand")))))]
+  "TARGET_XTHEADMAC"
+)
+
 (include "bitmanip.md")
 (include "sync.md")
 (include "peephole.md")
diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index ad42c03c0ce..f31ba18aa84 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -133,3 +133,120 @@ (define_insn "*th_cond_gpr_mov"
th.mveqz\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
+
+;; XTheadMac
+
+(define_insn "*th_mula"
+  [(set (match_operand:X 0 "register_operand" "=r")
+ (plus:X (mult:X (match_operand:X 1 "register_operand" "r")
+ (match_operand:X 2 "register_operand" "r"))
+ (match_operand:X 3 "register_operand" "0")))]
+  "TARGET_XTHEADMAC"
+  "th.mula\\t%0,%1,%2"
+  [(set_attr "type" "imul")
+   (set_attr "mode" "")]
+)
+
+(define_insn "*th_mulawsi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "r")
+   (match_operand:SI 2 "register_operand" "r"))
+  (match_operand:SI 3 "register_operand" "0"))))]
+  "TARGET_XTHEADMAC && TARGET_64BIT"
+  "th.mulaw\\t%0,%1,%2"
+  [(set_attr "type" "imul")
+   (set_attr "mode" "SI")]
+)
+
+(define_insn "*th_mulawsi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+ (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "r")
+   (match_operand:SI 2 "register_operand" "r"))
+  (match_operand:SI 3 "register_operand" "0")))]
+  "TARGET_XTHEADMAC && TARGET_64BIT"
+  "mulaw\\t%0,%1,%2"
+  [(set_attr "type" "imul")
+   (set_attr "mode" "SI")]
+)
+
+(define_insn "*th_maddhisi4"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (plus:SI (mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand" " r"))
+ (sign_extend:SI (match_operand:HI 2 "register_operand" " r")))
+(match_operand:SI 3 "register_operand" " 0")))]
+  "TARGET_XTHEADMAC"
+  "th.mulah\\t%0,%1,%2"
+  [(set_attr "type" "imul")
+   (set_attr "mode" "SI")]
+)
+
+(define_insn "*th_sextw_maddhisi4"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (plus:SI (mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand" " r"))
+   (sign_extend:SI (match_operand:HI 2 "register_operand" " r")))
+  (match_operand:SI 3 "register_operand" " 0"))))]
+  "TARGET_XTHEADMAC && TARGET_64BIT"
+  "th.mulah\\t%0,%1,%2"
+  [(set_attr "type" "imul")
+   (set_attr "mode" "SI")]
+)
+
+(define_insn "*th_muls"
+  [(set (match_operand:X 0 "register_operand" "=r")
+ 

Re: [PATCH 3/5] Fortran: Narrow return types [PR78798]

2022-11-13 Thread Bernhard Reutner-Fischer via Gcc-patches
On 13 November 2022 21:29:50 CET, Harald Anlauf  wrote:

>Replacing "int" by "signed char" adds confusion and makes code
>less understandable, so I would oppose it, as we don't solve a
>real problem and rather add confusion.

Ok so consider the non-bool hunks dropped, they just fell out of my helper and 
I thought I'd ask.

I can send an updated patch during the weekend.

thanks,


[RFC PATCH] riscv: thead: Add support for XTheadMemPair ISA extension

2022-11-13 Thread Christoph Muellner
From: "moiz.hussain" 

The XTheadMemPair ISA extension provides load/store
pair instructions:
* th.ldd
* th.sdd
* th.lwd
* th.lwud
* th.swd

We added the following unnamed patterns to the
peephole.md stage:
* load/store pair patterns for 4 instructions
* load/store pair patterns for 2 instructions

It was also required to add define_insn patterns to thead.md:
* th_mov_mempair_
* th_mov_mempair_di_si_zero_ext
* th_mov_mempair_di_si_sign_ext
* th_mov_mempair_si_si_zero_ext
* th_mov_mempair_si_si_sign_ext

Much of the code has been inspired by the MIPS and aarch64 backends.
Also the new test cases were inspired by other architectures.

The patch is meant to apply on top of the XThead* series
that has been posted here:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605980.html

This patch is in RFC mode as we still need to sort out some failing
tests. However, we'd like to get some feedback from the RISC-V
maintainers on the approach that we have taken. Especially since the
patch is already quite big.

Signed-off-by: M. Moiz Hussain 
---
 gcc/common/config/riscv/riscv-common.cc   |   2 +
 gcc/config/riscv/peephole.md  | 336 +
 gcc/config/riscv/riscv-opts.h |   4 +-
 gcc/config/riscv/riscv-protos.h   |   9 +
 gcc/config/riscv/riscv.cc | 701 ++
 gcc/config/riscv/thead.md |  83 +++
 .../gcc.target/riscv/xtheadmempair-1.c|  39 +
 .../gcc.target/riscv/xtheadmempair-10.c   |  35 +
 .../gcc.target/riscv/xtheadmempair-11.c   |  22 +
 .../gcc.target/riscv/xtheadmempair-12.c   |  23 +
 .../gcc.target/riscv/xtheadmempair-13.c   |  31 +
 .../gcc.target/riscv/xtheadmempair-14.c   |  36 +
 .../gcc.target/riscv/xtheadmempair-15.c   |  20 +
 .../gcc.target/riscv/xtheadmempair-16.c   |  24 +
 .../gcc.target/riscv/xtheadmempair-17.c   |  16 +
 .../gcc.target/riscv/xtheadmempair-18.c   |  67 ++
 .../gcc.target/riscv/xtheadmempair-18.h   |  59 ++
 .../gcc.target/riscv/xtheadmempair-19.c   |  86 +++
 .../gcc.target/riscv/xtheadmempair-2.c|  24 +
 .../gcc.target/riscv/xtheadmempair-20.c   |  23 +
 .../gcc.target/riscv/xtheadmempair-3.c|  35 +
 .../gcc.target/riscv/xtheadmempair-4.c|  25 +
 .../gcc.target/riscv/xtheadmempair-5.c|  23 +
 .../gcc.target/riscv/xtheadmempair-6.c|  22 +
 .../gcc.target/riscv/xtheadmempair-7.c|  22 +
 .../gcc.target/riscv/xtheadmempair-8.c|  31 +
 .../gcc.target/riscv/xtheadmempair-9.c|  36 +
 27 files changed, 1833 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-18.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-18.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-19.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-9.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 8e1449d3543..b541c55976b 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -230,6 +230,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"xtheadfmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
 
   /* Terminate the list.  */
@@ -1265,6 +1266,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"xtheadfmemidx", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADFMEMIDX},
   {"xtheadmac", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMAC},
   {"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMID

[committed] analyzer: new warning: -Wanalyzer-tainted-assertion [PR106235]

2022-11-13 Thread David Malcolm via Gcc-patches
This patch adds a new -Wanalyzer-tainted-assertion warning to
-fanalyzer's "taint" mode (which also requires -fanalyzer-checker=taint).

It complains about attacker-controlled values being used in assertions,
or in any expression affecting control flow that guards a "noreturn"
function.  As noted in the docs part of the patch, in such cases:

  - when assertion-checking is enabled: an attacker could trigger
a denial of service by injecting an assertion failure

  - when assertion-checking is disabled, such as by defining NDEBUG,
an attacker could inject data that subverts the process, since it
presumably violates a precondition that is being assumed by the code.

For example, given:

#include <assert.h>

int __attribute__((tainted_args))
test_tainted_assert (int n)
{
  assert (n > 0);
  return n * n;
}

compiling with
  -fanalyzer -fanalyzer-checker=taint
gives:

t.c: In function 'test_tainted_assert':
t.c:6:3: warning: use of attacked-controlled value in condition for assertion 
[CWE-617] [-Wanalyzer-tainted-assertion]
6 |   assert (n > 0);
  |   ^~
  'test_tainted_assert': event 1
|
|4 | test_tainted_assert (int n)
|  | ^~~
|  | |
|  | (1) function 'test_tainted_assert' marked with 
'__attribute__((tainted_args))'
|
+--> 'test_tainted_assert': event 2
   |
   |4 | test_tainted_assert (int n)
   |  | ^~~
   |  | |
   |  | (2) entry to 'test_tainted_assert'
   |
 'test_tainted_assert': events 3-6
   |
   |/usr/include/assert.h:106:10:
   |  106 |   if (expr) 
\
   |  |  ^
   |  |  |
   |  |  (3) use of attacker-controlled value for control 
flow
   |  |  (4) following 'false' branch (when 'n <= 0')...
   |..
   |  109 | __assert_fail (#expr, __FILE__, __LINE__, 
__ASSERT_FUNCTION);   \
   |  | ~
   |  | |
   |  | (5) ...to here
   |  | (6) treating '__assert_fail' as an assertion 
failure handler due to '__attribute__((__noreturn__))'
   |

The testcases have various examples for BUG and BUG_ON from the
Linux kernel; there, the diagnostic treats "panic" as an assertion
failure handler, due to '__attribute__((__noreturn__))'.
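
For comparison, a function like the one in the example could validate the
attacker-controlled value on an ordinary error path instead of asserting
on it. This is only a sketch: checked_square and its return convention are
hypothetical, they are not part of the patch or its testcases, and the
code has not been run through -fanalyzer.

```c
#include <limits.h>

/* Hypothetical helper (not from the patch): validate N on a normal
   error path rather than with assert, so that no "noreturn" handler
   is guarded by the untrusted value.  Returns 0 on success and -1 if
   N is rejected.  */
int
checked_square (int n, int *result)
{
  if (n <= 0)
    return -1;
  /* Reject values whose square would overflow int.  */
  if (n > INT_MAX / n)
    return -1;
  *result = n * n;
  return 0;
}
```

The caller then decides how to handle a rejected value, independently of
whether NDEBUG is defined.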

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-3947-gd777b38cde91a8.

gcc/analyzer/ChangeLog:
PR analyzer/106235
* analyzer.opt (Wanalyzer-tainted-assertion): New.
* checker-path.cc (checker_path::fixup_locations): Pass false to
pending_diagnostic::fixup_location.
* diagnostic-manager.cc (get_emission_location): Pass true to
pending_diagnostic::fixup_location.
* pending-diagnostic.cc (pending_diagnostic::fixup_location): Add
bool param.
* pending-diagnostic.h (pending_diagnostic::fixup_location): Add
bool param to decl.
* sm-taint.cc (taint_state_machine::m_tainted_control_flow): New.
(taint_diagnostic::describe_state_change): Drop "final".
(class tainted_assertion): New.
(taint_state_machine::taint_state_machine): Initialize
m_tainted_control_flow.
(taint_state_machine::alt_get_inherited_state): Support
comparisons being tainted, based on their arguments.
(is_assertion_failure_handler_p): New.
(taint_state_machine::on_stmt): Complain about calls to assertion
failure handlers guarded by an attacker-controlled conditional.
Detect attacker-controlled gcond conditionals and gswitch index
values.
(taint_state_machine::check_control_flow_arg_for_taint): New.

gcc/ChangeLog:
PR analyzer/106235
* doc/gcc/gcc-command-options/option-summary.rst: Add
-Wno-analyzer-tainted-assertion.
* doc/gcc/gcc-command-options/options-that-control-static-analysis.rst:
Add -Wno-analyzer-tainted-assertion.

gcc/testsuite/ChangeLog:
PR analyzer/106235
* gcc.dg/analyzer/taint-assert-BUG_ON.c: New test.
* gcc.dg/analyzer/taint-assert-macro-expansion.c: New test.
* gcc.dg/analyzer/taint-assert.c: New test.
* gcc.dg/analyzer/taint-assert-system-header.c: New test.
* gcc.dg/analyzer/test-assert.h: New header.
* gcc.dg/plugin/analyzer_gil_plugin.c
(gil_diagnostic::fixup_location): Add bool param.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/checker-path.cc  |   2 +-
 gcc/analyzer/diagnostic-manager.cc|   2 +-
 gcc/analyzer/pending-diagnostic.cc|   2 +-
 gcc/analyzer/pending-diagnostic.h |   8 +-
 gcc/analyzer/sm-taint.cc  |

[PATCH 3/7] riscv: Enable overlap-by-pieces via tune param

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds the field overlap_op_by_pieces to the struct
riscv_tune_param, which allows enabling the overlap_op_by_pieces
infrastructure.
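
To illustrate what "overlapping" means here, the following C sketch copies
15 bytes with two 8-byte accesses that share one byte, rather than with an
8 + 4 + 2 + 1 byte sequence. The function name copy15_overlapping is made
up for this illustration; it models the idea only and is not code from the
patch or from the by-pieces infrastructure.

```c
#include <stdint.h>
#include <string.h>

/* Illustration only: copy exactly 15 bytes using two 8-byte accesses.
   The second access starts at offset 7, so byte 7 is read and written
   twice.  */
void
copy15_overlapping (void *dst, const void *src)
{
  uint64_t lo, hi;
  memcpy (&lo, src, 8);                        /* bytes 0..7  */
  memcpy (&hi, (const char *) src + 7, 8);     /* bytes 7..14 */
  memcpy (dst, &lo, 8);
  memcpy ((char *) dst + 7, &hi, 8);
}
```

Whether this is profitable depends on the cost of unaligned accesses on the
target, which is why the behaviour is tied to the tuning parameters.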

gcc/ChangeLog:

* config/riscv/riscv.cc (struct riscv_tune_param): New field.
(riscv_overlap_op_by_pieces): New function.
(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
riscv_overlap_op_by_pieces.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc | 17 +-
 .../gcc.target/riscv/memcpy-nonoverlapping.c  | 54 +++
 .../gcc.target/riscv/memcpy-overlapping.c | 50 +
 .../gcc.target/riscv/memset-nonoverlapping.c  | 45 
 .../gcc.target/riscv/memset-overlapping.c | 43 +++
 5 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-overlapping.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a0c00cfb66f..7357cf51cdf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -243,6 +243,7 @@ struct riscv_tune_param
   unsigned short fmv_cost;
   bool slow_unaligned_access;
   unsigned int fusible_ops;
+  bool overlap_op_by_pieces;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -331,6 +332,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -346,6 +348,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   8,   /* fmv_cost */
   true,/* 
slow_unaligned_access */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -361,6 +364,7 @@ static const struct riscv_tune_param thead_c906_tune_info = 
{
   8,   /* fmv_cost */
   false,/* slow_unaligned_access */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for size.  */
@@ -376,6 +380,7 @@ static const struct riscv_tune_param 
optimize_size_tune_info = {
   8,   /* fmv_cost */
   false,   /* slow_unaligned_access */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
+  false,   /* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Ventana Micro VT1.  */
@@ -393,7 +398,8 @@ static const struct riscv_tune_param ventana_vt1_tune_info 
= {
   ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |   /* fusible_ops */
 RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
 RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
-RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
+RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD ),
+  true,/* overlap_op_by_pieces 
*/
 };
 
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -6444,6 +6450,12 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
   return riscv_slow_unaligned_access_p;
 }
 
+static bool
+riscv_overlap_op_by_pieces (void)
+{
+  return tune_param->overlap_op_by_pieces;
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
 
 static bool
@@ -6974,6 +6986,9 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, 
unsigned int *factor,
 #undef TARGET_SLOW_UNALIGNED_ACCESS
 #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
 
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
+
 #undef TARGET_SECONDARY_MEMORY_NEEDED
 #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
 
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c 
b/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
new file mode 100644
index 000..1c99e13fc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=sifive-u74 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+
+#define COPY_N(N)  \
+void copy##N (char *src, char *dst)\
+{  \
+  dst = __builtin_assume_aligned 

[PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

The current implementation of riscv_block_move_straight() emits a couple
of load-store pairs with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces(), which emits code based
on target settings like slow_unaligned_access and overlap_op_by_pieces.

move_by_pieces() will emit overlapping memory accesses with maximum
width only if the given length exceeds the size of one access
(e.g. 15-bytes for 8-byte accesses).

This patch changes the implementation of riscv_block_move_straight()
to preserve a remainder within the interval [delta..2*delta) instead
of [0..delta), so that overlapping memory accesses may be emitted
(if their requirements are met).
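
The effect of the changed loop condition can be modelled in plain C. This
is a sketch with made-up names (struct split, split_block); it only mirrors
the two loop conditions from the diff and does not emit any RTL.

```c
#include <stddef.h>

/* Result of splitting LENGTH bytes into full-width load/store pairs
   plus a remainder that is handed to move_by_pieces().  */
struct split
{
  size_t full_pairs;
  size_t remainder;
};

/* Model of the loop in riscv_block_move_straight().
   KEEP_REMAINDER == 0: old condition, offset + delta <= length.
   KEEP_REMAINDER != 0: new condition, offset + 2 * delta <= length.  */
struct split
split_block (size_t length, size_t delta, int keep_remainder)
{
  size_t need = keep_remainder ? 2 * delta : delta;
  size_t offset = 0;
  size_t pairs = 0;

  for (; offset + need <= length; offset += delta)
    pairs++;

  struct split s = { pairs, length - offset };
  return s;
}
```

For length 31 and delta 8, the old condition gives 3 pairs and a remainder
of 7, while the new one gives 2 pairs and a remainder of 15. That remainder
can be covered by two overlapping 8-byte accesses, which matches the
"4x {ld,sd}" expectation for COPY_N(31) in the updated test.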

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight):
  Adjust range for emitted load/store pairs.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-string.cc  |  8 
 .../gcc.target/riscv/memcpy-overlapping.c | 19 ---
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6882f0be269..1137df475be 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -57,18 +57,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length)
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
  the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 {
   regs[i] = gen_reg_rtx (mode);
   riscv_emit_move (regs[i], adjust_address (src, mode, offset));
 }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
@@ -166,7 +166,7 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
 
   if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
{
- riscv_block_move_straight (dest, src, INTVAL (length));
+ riscv_block_move_straight (dest, src, hwi_length);
  return true;
}
   else if (optimize && align >= BITS_PER_WORD)
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c 
b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
index ffb7248bfd1..ef95bfb879b 100644
--- a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
@@ -25,26 +25,23 @@ COPY_N(15)
 /* Emits 2x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(19)
 
-/* Emits 3x ld and 3x sd.  */
+/* Emits 3x {ld,sd}.  */
 COPY_N(23)
 
 /* The by-pieces infrastructure handles up to 24 bytes.
So the code below is emitted via cpymemsi/block_move_straight.  */
 
-/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+/* Emits 3x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(27)
 
-/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(29)
 
-/* Emits 3x {ld,sd} and 2x {lw,sw}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(31)
 
-/* { dg-final { scan-assembler-times "ld\t" 21 } } */
-/* { dg-final { scan-assembler-times "sd\t" 21 } } */
+/* { dg-final { scan-assembler-times "ld\t" 23 } } */
+/* { dg-final { scan-assembler-times "sd\t" 23 } } */
 
-/* { dg-final { scan-assembler-times "lw\t" 5 } } */
-/* { dg-final { scan-assembler-times "sw\t" 5 } } */
-
-/* { dg-final { scan-assembler-times "lbu\t" 2 } } */
-/* { dg-final { scan-assembler-times "sb\t" 2 } } */
+/* { dg-final { scan-assembler-times "lw\t" 3 } } */
+/* { dg-final { scan-assembler-times "sw\t" 3 } } */
-- 
2.38.1



[PATCH 0/7] riscv: Improve builtins expansion

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patchset includes patches to improve the following builtin
expansions:

* cpymemsi: Allow by-pieces to generate overlapping memory accesses
* cmpstrsi: Add expansion for strcmp and strncmp for aligned strings when 
zbb/orc.b is available
* strlen: Add expansion for strlen when zbb/orc.b is available

These changes were largely inspired by the PowerPC backend
(e.g. moving the expansion code into riscv-string.cc and emitting an
unrolled loop which eventually calls the C runtime if necessary).

This series is meant to be applied on top of the VT1 support series:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605959.html

The patches come with their own tests and show no regressions in
existing tests. Furthermore, the series has been tested with all SPEC CPU2017
benchmarks.

Christoph Müllner (6):
  riscv: bitmanip/zbb: Add prefix/postfix and enable visiblity
  riscv: Enable overlap-by-pieces via tune param
  riscv: Move riscv_block_move_loop to separate file
  riscv: Use by-pieces to do overlapping accesses in block_move_straight
  riscv: Add support for strlen inline expansion
  riscv: Add support for str(n)cmp inline expansion

Philipp Tomsich (1):
  riscv: bitmanip: add orc.b as an unspec

 gcc/config.gcc|   3 +-
 gcc/config/riscv/bitmanip.md  |  12 +-
 gcc/config/riscv/riscv-protos.h   |   7 +-
 gcc/config/riscv/riscv-string.cc  | 687 ++
 gcc/config/riscv/riscv.cc | 172 +
 gcc/config/riscv/riscv.md | 105 ++-
 gcc/config/riscv/riscv.opt|   5 +
 gcc/config/riscv/t-riscv  |   4 +
 .../gcc.target/riscv/memcpy-nonoverlapping.c  |  54 ++
 .../gcc.target/riscv/memcpy-overlapping.c |  47 ++
 .../gcc.target/riscv/memset-nonoverlapping.c  |  45 ++
 .../gcc.target/riscv/memset-overlapping.c |  43 ++
 .../gcc.target/riscv/zbb-strcmp-unaligned.c   |  36 +
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c   |  55 ++
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  13 +
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  18 +
 16 files changed, 1132 insertions(+), 174 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

-- 
2.38.1



[PATCH 1/7] riscv: bitmanip: add orc.b as an unspec

2022-11-13 Thread Christoph Muellner
From: Philipp Tomsich 

As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.

gcc/ChangeLog:

* config/riscv/bitmanip.md (orcb2): Add orc.b as an
  unspec.
* config/riscv/riscv.md: Add UNSPEC_ORC_B.

Signed-off-by: Philipp Tomsich 
---
 gcc/config/riscv/bitmanip.md | 8 
 gcc/config/riscv/riscv.md| 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index b44fb9517e7..3dbe6002974 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -242,6 +242,14 @@ (define_insn "rotlsi3_sext"
   "rolw\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; orc.b (or-combine) is added as an unspec for the benefit of the support
+;; for optimized string functions (such as strcmp).
+(define_insn "orcb2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (unspec:X [(match_operand:X 1 "register_operand" "r")] UNSPEC_ORC_B))]
+  "TARGET_ZBB"
+  "orc.b\t%0,%1")
+
 (define_insn "bswap2"
   [(set (match_operand:X 0 "register_operand" "=r")
 (bswap:X (match_operand:X 1 "register_operand" "r")))]
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 798f7370a08..532289dd178 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -62,6 +62,9 @@ (define_c_enum "unspec" [
 
   ;; Stack tie
   UNSPEC_TIE
+
+  ;; OR-COMBINE
+  UNSPEC_ORC_B
 ])
 
 (define_c_enum "unspecv" [
-- 
2.38.1



[PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visiblity

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

INSNs are usually postfixed by a number representing the argument count.
Given the instructions will be used in a later commit, let's make them
visible, but add a "riscv_" prefix to avoid conflicts with standard
INSNs.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_not): Rename INSN.
(riscv__not3): Rename INSN.
(*xor_not): Rename INSN.
(xor_not3): Rename INSN.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/bitmanip.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 3dbe6002974..d6d94e5cdf8 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -119,7 +119,7 @@ (define_insn "*slliuw"
 
 ;; ZBB extension.
 
-(define_insn "*_not"
+(define_insn "riscv__not3"
   [(set (match_operand:X 0 "register_operand" "=r")
 (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
 (match_operand:X 2 "register_operand" "r")))]
@@ -128,7 +128,7 @@ (define_insn "*_not"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
-(define_insn "*xor_not"
+(define_insn "riscv_xor_not3"
   [(set (match_operand:X 0 "register_operand" "=r")
 (not:X (xor:X (match_operand:X 1 "register_operand" "r")
                   (match_operand:X 2 "register_operand" "r"))))]
-- 
2.38.1



[PATCH 4/7] riscv: Move riscv_block_move_loop to separate file

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

Let's try not to accumulate too much functionality in one single file,
as this does not really help with maintaining or extending the code.
So in order to add more functionality similar to riscv_block_move_loop,
let's move this function to a separate file.

This change makes no functional changes.
It does modify a single line in the existing code
that check_GNU_style.py complained about.

gcc/ChangeLog:

* config.gcc: Add new object riscv-string.o
* config/riscv/riscv-protos.h (riscv_expand_block_move): Remove
  duplicated prototype and move to new section for
  riscv-string.cc.
* config/riscv/riscv.cc (riscv_block_move_straight): Remove function.
(riscv_adjust_block_mem): Likewise.
(riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv.md (cpymemsi): Move to new section for
  riscv-string.cc.
* config/riscv/t-riscv: Add compile rule for riscv-string.o
* config/riscv/riscv-string.cc: New file.

Signed-off-by: Christoph Müllner 
---
 gcc/config.gcc   |   3 +-
 gcc/config/riscv/riscv-protos.h  |   5 +-
 gcc/config/riscv/riscv-string.cc | 194 +++
 gcc/config/riscv/riscv.cc| 155 
 gcc/config/riscv/riscv.md|  28 ++---
 gcc/config/riscv/t-riscv |   4 +
 6 files changed, 218 insertions(+), 171 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b5eda046033..fc9e582e713 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -518,7 +518,8 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o riscv-selftests.o"
+   extra_objs="${extra_objs} riscv-string.o riscv-v.o"
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
d_target_objs="riscv-d.o"
extra_headers="riscv_vector.h"
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a718bb62b4..344515dbaf4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -62,7 +62,6 @@ extern void riscv_expand_conditional_move (rtx, rtx, rtx, 
rtx_code, rtx, rtx);
 #endif
 extern rtx riscv_legitimize_call_address (rtx);
 extern void riscv_set_return_address (rtx, rtx);
-extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern rtx riscv_return_addr (int, rtx);
 extern poly_int64 riscv_initial_elimination_offset (int, int);
 extern void riscv_expand_prologue (void);
@@ -70,7 +69,6 @@ extern void riscv_expand_epilogue (int);
 extern bool riscv_epilogue_uses (unsigned int);
 extern bool riscv_can_use_return_insn (void);
 extern rtx riscv_function_value (const_tree, const_tree, enum machine_mode);
-extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
 extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
@@ -96,6 +94,9 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 
+/* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_move (rtx, rtx, rtx);
+
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
   /* This CPU's canonical name.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
new file mode 100644
index 000..6882f0be269
--- /dev/null
+++ b/gcc/config/riscv/riscv-string.cc
@@ -0,0 +1,194 @@
+/* Subroutines used to expand string and block move, clear,
+   compare and other operations for RISC-V.
+   Copyright (C) 2011-2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "ta

[PATCH 6/7] riscv: Add support for strlen inline expansion

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch implements the expansion of the strlen builtin
using Zbb instructions (if available) for aligned strings
using the following sequence:

      li      a3,-1
      addi    a4,a0,8
.L2:  ld      a5,0(a0)
      addi    a0,a0,8
      orc.b   a5,a5
      beq     a5,a3,6 <.L2>
      not     a5,a5
      ctz     a5,a5
      srli    a5,a5,0x3
      add     a0,a0,a5
      sub     a0,a0,a4

This allows inlining calls to strlen(), with optimized code for
determining the length of a string.
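
The sequence above can be modelled in portable C as follows. This is a
sketch, not the code the expander emits: orc_b, load_le64, ctz64 and
strlen_orcb_model are names made up here, orc.b is emulated in software,
and the model assumes the buffer can be read in whole 8-byte words (which
the aligned case guarantees on the target).

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of orc.b: every non-zero byte becomes 0xff,
   every zero byte stays 0x00.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Load 8 bytes as a little-endian word, as "ld" does on RISC-V.  */
static uint64_t
load_le64 (const unsigned char *p)
{
  uint64_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (uint64_t) p[i] << (8 * i);
  return w;
}

/* Count trailing zero bits; X must be non-zero.  */
static unsigned
ctz64 (uint64_t x)
{
  unsigned n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Model of the listing: scan word by word until orc.b reports a zero
   byte, then locate that byte with not/ctz/srli.  */
size_t
strlen_orcb_model (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  const unsigned char *first_end = p + 8;   /* a4 in the listing */
  uint64_t w;

  do
    {
      w = orc_b (load_le64 (p));
      p += 8;
    }
  while (w == UINT64_MAX);                  /* beq a5,a3,.L2 */

  w = ~w;                                   /* not  */
  p += ctz64 (w) >> 3;                      /* ctz, srli, add */
  return (size_t) (p - first_end);          /* sub  */
}
```

For a buffer such as char buf[32] = "hello, world", the second word is the
first one containing a zero byte, and the model returns 12.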

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_strlen): New
  prototype.
* config/riscv/riscv-string.cc (riscv_emit_unlikely_jump): New
  function.
(GEN_EMIT_HELPER2): New helper macro.
(GEN_EMIT_HELPER3): New helper macro.
(do_load_from_addr): New helper function.
(riscv_expand_strlen_zbb): New function.
(riscv_expand_strlen): New function.
* config/riscv/riscv.md (strlen): Invoke expansion
  functions for strlen.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-string.cc  | 149 ++
 gcc/config/riscv/riscv.md |  28 
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  13 ++
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  18 +++
 5 files changed, 209 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 344515dbaf4..18187e3bd78 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -96,6 +96,7 @@ rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
+extern bool riscv_expand_strlen (rtx[]);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 1137df475be..bf96522b608 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -38,6 +38,81 @@
 #include "predict.h"
 #include "optabs.h"
 
+/* Emit unlikely jump instruction.  */
+
+static rtx_insn *
+riscv_emit_unlikely_jump (rtx insn)
+{
+  rtx_insn *jump = emit_jump_insn (insn);
+  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
+  return jump;
+}
+
+/* Emit proper instruction depending on type of dest.  */
+
+#define GEN_EMIT_HELPER2(name) \
+static rtx_insn *  \
+do_## name ## 2(rtx dest, rtx src) \
+{  \
+  rtx_insn *insn;  \
+  if (GET_MODE (dest) == DImode)   \
+insn = emit_insn (gen_ ## name ## di2 (dest, src));\
+  else \
+insn = emit_insn (gen_ ## name ## si2 (dest, src));\
+  return insn; \
+}
+
+/* Emit proper instruction depending on type of dest.  */
+
+#define GEN_EMIT_HELPER3(name) \
+static rtx_insn *  \
+do_## name ## 3(rtx dest, rtx src1, rtx src2)  \
+{  \
+  rtx_insn *insn;  \
+  if (GET_MODE (dest) == DImode)   \
+insn = emit_insn (gen_ ## name ## di3 (dest, src1, src2)); \
+  else \
+insn = emit_insn (gen_ ## name ## si3 (dest, src1, src2)); \
+  return insn; \
+}
+
+GEN_EMIT_HELPER3(add) /* do_add3  */
+GEN_EMIT_HELPER3(sub) /* do_sub3  */
+GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
+GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
+GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
+GEN_EMIT_HELPER2(clz) /* do_clz2  */
+GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
+GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+
+/* Helper function to load a byte or a Pmode register.
+
+   MODE is the mode to use for the load (QImode or Pmode).
+   DEST is the destination register for the data.
+   ADDR_REG is the register that holds the address.
+   ADDR is the address expression to load from.
+
+   This function returns an rtx containing the register,
+   where the ADDR is stored.  */
+
+static rtx
+do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr_reg);
+  MEM_COPY_ATTRIBUTES (mem, addr);
+  set_mem_size (mem, GET_MODE_SIZE (mode));
+
+  if (mode == QImode)
+do_zero_extendqi2 (dest, mem);
+  else if (mode == Pmode)
+emit_move_insn (dest, mem);
+  else
+gcc_unreacha

[PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch implements expansions for the cmpstrsi and the cmpstrnsi
builtins using Zbb instructions (if available).
This allows calls to strcmp() and strncmp() to be inlined.
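
As a rough illustration of why Zbb helps here: its orc.b instruction turns every non-zero byte of a register into 0xff and every zero byte into 0x00, so a whole XLEN-bit word can be tested for a NUL terminator at once. The Python model below is only a sketch; the helper names and the 64-bit default are assumptions made for illustration, and the truncated diff does not show whether the patch uses exactly this test.

```python
XLEN = 64  # assumed register width in bits (RV64); pass 32 for RV32


def orc_b(x: int, xlen: int = XLEN) -> int:
    """Emulate Zbb orc.b: each byte becomes 0xff if non-zero, else 0x00."""
    out = 0
    for i in range(xlen // 8):
        if (x >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out


def has_nul_byte(word: int, xlen: int = XLEN) -> bool:
    """A word contains a NUL byte iff orc.b(word) is not all ones."""
    all_ones = (1 << xlen) - 1
    return orc_b(word, xlen) != all_ones


def first_nul_index(word: int, xlen: int = XLEN) -> int:
    """Byte index of the first NUL in a little-endian word, or -1 if none."""
    mask = (1 << xlen) - 1
    inverted = ~orc_b(word, xlen) & mask  # 0xff now marks the NUL bytes
    if inverted == 0:
        return -1
    # Position of the lowest set bit, i.e. what a ctz instruction returns.
    trailing_zeros = (inverted & -inverted).bit_length() - 1
    return trailing_zeros // 8
```

For example, the eight bytes "abc\0defg" loaded as a little-endian word report a NUL at byte index 3, while "abcdefgh" reports none.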

The expansion basically emits a peeled comparison sequence (i.e. a peeled
comparison loop) which compares XLEN bits per step if possible.

The emitted sequence can be controlled by setting the maximum number
of compared bytes (-mstring-compare-inline-limit).
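
The peeling described above can be sketched as a small planning function. It mirrors only the head of the loop that is visible in the truncated diff further down (a byte load when a single byte remains, a full-register load otherwise); the function name, the tuple layout, and the way the offset advances are assumptions, since the rest of the loop is cut off in the archive.

```python
def plan_compare_steps(bytes_to_compare: int, xlen_bytes: int = 8):
    """Return (offset, load_size, cmp_bytes) for each peeled compare step.

    xlen_bytes stands in for GET_MODE_SIZE (Pmode): 8 on RV64, 4 on RV32.
    """
    steps = []
    offset = 0
    while bytes_to_compare > 0:
        # One byte left: byte load (QImode). Otherwise: full-register load.
        load_size = xlen_bytes if bytes_to_compare > 1 else 1
        cmp_bytes = min(load_size, bytes_to_compare)
        steps.append((offset, load_size, cmp_bytes))
        # Assumption: each step advances by the bytes it actually compared.
        offset += cmp_bytes
        bytes_to_compare -= cmp_bytes
    return steps
```

With a 20-byte budget on RV64 this yields three steps, the last of which loads 8 bytes but only compares 4 of them; a larger inline limit simply produces a longer peeled sequence.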

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
  prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
  macros.
(GEN_EMIT_HELPER2): New helper macros.
(expand_strncmp_zbb_sequence): New function.
(riscv_emit_str_compare_zbb): New function.
(riscv_expand_strn_compare): New function.
* config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
  for strn_compare.
(cmpstrsi): Invoke expansion functions for strn_compare.
* config/riscv/riscv.opt: Add new parameter
  '-mstring-compare-inline-limit'.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-string.cc  | 344 ++
 gcc/config/riscv/riscv.md |  46 +++
 gcc/config/riscv/riscv.opt|   5 +
 .../gcc.target/riscv/zbb-strcmp-unaligned.c   |  36 ++
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c   |  55 +++
 6 files changed, 487 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 18187e3bd78..7f334be333c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -97,6 +97,7 @@ rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_expand_strlen (rtx[]);
+extern bool riscv_expand_strn_compare (rtx[], int);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index bf96522b608..f157e04ac0c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -84,6 +84,11 @@ GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
 GEN_EMIT_HELPER2(clz) /* do_clz2  */
 GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
 GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+GEN_EMIT_HELPER3(xor) /* do_xor3  */
+GEN_EMIT_HELPER3(ashl) /* do_ashl3  */
+GEN_EMIT_HELPER2(bswap) /* do_bswap2  */
+GEN_EMIT_HELPER3(riscv_ior_not) /* do_riscv_ior_not3  */
+GEN_EMIT_HELPER3(riscv_and_not) /* do_riscv_and_not3  */
 
 /* Helper function to load a byte or a Pmode register.
 
@@ -268,6 +273,345 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
   return false;
 }
 
+/* Generate the sequence of compares for strcmp/strncmp using zbb instructions.
+   BYTES_TO_COMPARE is the number of bytes to be compared.
+   BASE_ALIGN is the smaller of the alignment of the two strings.
+   ORIG_SRC1 is the unmodified rtx for the first string.
+   ORIG_SRC2 is the unmodified rtx for the second string.
+   DATA1 is the register for loading the first string.
+   DATA2 is the register for loading the second string.
+   HAS_NUL is the register holding non-NUL bytes for NUL-bytes in the string.
+   TARGET is the rtx for the result register (SImode)
+   EQUALITY_COMPARE_REST if set, then we hand over to libc if string matches.
+   END_LABEL is the location before the calculation of the result value.
+   FINAL_LABEL is the location after the calculation of the result value.  */
+
+static void
+expand_strncmp_zbb_sequence (unsigned HOST_WIDE_INT bytes_to_compare,
+rtx src1, rtx src2, rtx data1, rtx data2,
+rtx target, rtx orc, bool equality_compare_rest,
+rtx end_label, rtx final_label)
+{
+  const unsigned HOST_WIDE_INT p_mode_size = GET_MODE_SIZE (Pmode);
+  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+  unsigned HOST_WIDE_INT offset = 0;
+
+  rtx m1 = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (m1, constm1_rtx));
+
+  /* Generate a compare sequence.  */
+  while (bytes_to_compare > 0)
+{
+  machine_mode load_mode = QImode;
+  unsigned HOST_WIDE_INT load_mode_size = 1;
+  if (bytes_to_compare > 1)
+   {
+ load_mode = Pmode;
+ load_mode_size = p_mode_size;
+   }
+  unsigned HOST_WIDE_INT cmp_bytes = 0;
+
+  if (bytes_to_compare >= load_mode_size)
+   cmp_bytes = load_mode_size;
+  else
+   cmp_bytes = bytes_to_compare;
+
+  unsigned HOST_WIDE_INT remain = bytes_to_compare - cmp_bytes;
+
+  /* load_mode_size...byte

Re: old install to a different folder

2022-11-13 Thread Gerald Pfeifer
On Sun, 13 Nov 2022, Martin Liška wrote:
> So Gerald, I'm suggesting a new url base gcc.gnu.org/docs that will be 
> filled with the new manuals and gcc.gnu.org/onlinedocs/$man and 
> gcc.gnu.org/install locations should point to older (trunk) manuals 
> (prev folder at server I guess). Having that, the new manuals will not be
> available through navigation and we will get some time for further changes.

I feel I may be missing something.

Why don't we

 (1a) keep /onlinedocs for all docs < GCC 13,
 (1b) possibly introduce /docs as an alternative URL (though we need
  to keep /onlinedocs for all the existing ones),

  (2) add sphinx docs for trunk, GCC 13 and later to /docs (and at the
  same time /onlinedocs which is just an alias)

  (3) put current installation documentation under /install and simply
  add a few redirects for those pages that have changed names?

That way we can retain existing structures, possibly replace /onlinedocs 
with the shorter /docs, and have one consistent index for all manuals.

Gerald


Re: [PATCH V2] Enable small loop unrolling for O2

2022-11-13 Thread Hongtao Liu via Gcc-patches
On Wed, Nov 9, 2022 at 9:29 AM Hongyu Wang  wrote:
>
> > Although ix86_small_unroll_insns is coming from issue_rate, it's tuned
> > for codesize.
> > Make it exact as issue_rate and using factor * issue_width /
> > loop->ninsns may increase code size too much.
> > So I prefer to add those 2 parameters to the cost table for core
> > tunings instead of 1.
>
> Yes, here is the updated patch that changes the cost table.
>
> Bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
OK. Note that the GCC documentation has been ported to Sphinx, so you need
to move your invoke.texi changes to the new Sphinx files.
>
> On Tue, Nov 8, 2022 at 11:05, Hongtao Liu via Gcc-patches wrote:
> >
> > On Mon, Nov 7, 2022 at 10:25 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Wed, Nov 2, 2022 at 4:37 AM Hongyu Wang  wrote:
> > > >
> > > > Hi, this is the updated patch of
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604345.html,
> > > > which uses targetm.loop_unroll_adjust as gate to enable small loop 
> > > > unroll.
> > > >
> > > > This patch does not change rs6000/s390 since I don't have machine to
> > > > test them, but I suppose the default behavior is the same since they
> > > > enable flag_unroll_loops at O2.
> > > >
> > > > Bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > >
> > > > Ok for trunk?
> > > >
> > > > -- Patch content 
> > > >
> > > > Modern processors have multi-way instruction decoders.
> > > > For x86, icelake/zen3 has a width of 5 uops, so for a small loop with <= 4
> > > > instructions (usually 3 uops, with a cmp/jmp pair that can be
> > > > macro-fused), the decoder would have a 2-uop bubble in each iteration
> > > > and the pipeline could not be fully utilized.
> > > >
> > > > Therefore, this patch enables loop unrolling for small loops at O2
> > > > to fill the decoder as much as possible. It turns on rtl loop
> > > > unrolling when targetm.loop_unroll_adjust exists, and only at O2 when
> > > > optimizing for speed. In the x86 backend the default behavior is to
> > > > unroll small loops with fewer than 4 insns once.
> > > >
> > > > This improves 548.exchange2 by 9% on icelake and 7.4% on zen3 with a
> > > > 0.9% code-size increase. For other benchmarks the variations are minor
> > > > and overall code size increased by 0.2%.
> > > >
> > > > The kernel image size increased by 0.06%, and no impact on eembc.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * common/config/i386/i386-common.cc (ix86_optimization_table):
> > > > Enable small loop unroll at O2 by default.
> > > > * config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
> > > > factor if -munroll-only-small-loops enabled and -funroll-loops/
> > > > -funroll-all-loops are disabled.
> > > > * config/i386/i386.opt: Add -munroll-only-small-loops,
> > > > -param=x86-small-unroll-ninsns= for loop insn limit,
> > > > -param=x86-small-unroll-factor= for unroll factor.
> > > > * doc/invoke.texi: Document -munroll-only-small-loops,
> > > > x86-small-unroll-ninsns and x86-small-unroll-factor.
> > > > * loop-init.cc (pass_rtl_unroll_loops::gate): Enable rtl
> > > > loop unrolling for -O2-speed and above if target hook
> > > > loop_unroll_adjust exists.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/guality/loop-1.c: Add additional option
> > > >   -mno-unroll-only-small-loops.
> > > > * gcc.target/i386/pr86270.c: Add -mno-unroll-only-small-loops.
> > > > * gcc.target/i386/pr93002.c: Likewise.
> > > > ---
> > > >  gcc/common/config/i386/i386-common.cc   |  1 +
> > > >  gcc/config/i386/i386.cc | 18 ++
> > > >  gcc/config/i386/i386.opt| 13 +
> > > >  gcc/doc/invoke.texi | 16 
> > > >  gcc/loop-init.cc| 10 +++---
> > > >  gcc/testsuite/gcc.dg/guality/loop-1.c   |  2 ++
> > > >  gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
> > > >  gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
> > > >  8 files changed, 59 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/gcc/common/config/i386/i386-common.cc 
> > > > b/gcc/common/config/i386/i386-common.cc
> > > > index f66bdd5a2af..c6891486078 100644
> > > > --- a/gcc/common/config/i386/i386-common.cc
> > > > +++ b/gcc/common/config/i386/i386-common.cc
> > > > @@ -1724,6 +1724,7 @@ static const struct default_options 
> > > > ix86_option_optimization_table[] =
> > > >  /* The STC algorithm produces the smallest code at -Os, for x86.  
> > > > */
> > > >  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
> > > >REORDER_BLOCKS_ALGORITHM_STC },
> > > > +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, 
> > > > NULL, 1 },
> > > >  /* Turn off -fschedule-insns by default.  It tends to make the
> > > > problem with not enough registers even worse.  */
> > > >  { OPT_LE

Re: [wwwdocs] gcc-13: Mention Intel new ISA and march support.

2022-11-13 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 10, 2022 at 2:04 PM Haochen Jiang via Gcc-patches
 wrote:
>
> Hi all,
>
> This patch aims to mention newly added Intel ISA and march support.
>
> Ok for trunk?
Ok.
>
> BRs,
> Haochen
>
> ---
>  htdocs/gcc-13/changes.html | 50 ++
>  1 file changed, 50 insertions(+)
>
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index bd11cbec..0daf921b 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -240,6 +240,56 @@ a work-in-progress.
>__bf16 type to x86 psABI. Users need to adjust their
>AVX512BF16-related source code when upgrading GCC12 to GCC13.
>
> +  New ISA extension support for Intel AVX-IFMA was added to GCC.
> +  AVX-IFMA intrinsics are available via the -mavxifma
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AVX-VNNI-INT8 was added to GCC.
> +  AVX-VNNI-INT8 intrinsics are available via the 
> -mavxvnniint8
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AVX-NE-CONVERT was added to GCC.
> +  AVX-NE-CONVERT intrinsics are available via the
> +  -mavxneconvert compiler switch.
> +  
> +  New ISA extension support for Intel CMPccXADD was added to GCC.
> +  CMPccXADD intrinsics are available via the -mcmpccxadd
> +  compiler switch.
> +  
> +  New ISA extension support for Intel AMX-FP16 was added to GCC.
> +  AMX-FP16 intrinsics are available via the -mamx-fp16
> +  compiler switch.
> +  
> +  New ISA extension support for Intel PREFETCHI was added to GCC.
> +  PREFETCHI intrinsics are available via the -mprefetchi
> +  compiler switch.
> +  
> +  New ISA extension support for Intel RAO-INT was added to GCC.
> +  RAO-INT intrinsics are available via the -mraoint
> +  compiler switch.
> +  
> +  GCC now supports the Intel CPU named Raptor Lake through
> +-march=raptorlake.
> +Raptor Lake is based on Alder Lake.
> +  
> +  GCC now supports the Intel CPU named Meteor Lake through
> +-march=meteorlake.
> +Meteor Lake is based on Alder Lake.
> +  
> +  GCC now supports the Intel CPU named Sierra Forest through
> +-march=sierraforest.
> +The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT and
> +CMPccXADD ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Grand Ridge through
> +-march=grandridge.
> +The switch enables the AVX-IFMA, AVX-VNNI-INT8, AVX-NE-CONVERT, CMPccXADD
> +and RAO-INT ISA extensions.
> +  
> +  GCC now supports the Intel CPU named Granite Rapids through
> +-march=graniterapids.
> +The switch enables the AMX-FP16 and PREFETCHI ISA extensions.
> +  
>  
>
>  
> --
> 2.18.1
>


-- 
BR,
Hongtao


Re: [PATCH][i386]: Update ix86_can_change_mode_class target hook to accept QImode conversions

2022-11-13 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 11, 2022 at 10:47 PM Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> The i386 implementation of TARGET_CAN_CHANGE_MODE_CLASS is currently
> not useful before regalloc.
>
> In particular, before regalloc, optimization passes query the hook using
> ALL_REGS, but because of the
>
>   if (MAYBE_FLOAT_CLASS_P (regclass))
>   return false;
>
> the hook returns false for all modes, even integer ones, because ALL_REGS
> overlaps with floating point regs.
>
> The vector permute fallback cases used to unconditionally convert vector 
> integer
> permutes to vector QImode ones as a fallback plan.  This is incorrect and can
> result in incorrect code if the target doesn't support this conversion.
>
> To fix this some more checks were added, however that ended up introducing 
> ICEs
> in the i386 backend because e.g. the hook would reject conversions between 
> modes
> like V2TImode and V32QImode.
>
> My understanding is that for x87 we don't want to allow floating point
> conversions, but integers are fine.  So I have modified the check such that it
> also checks the modes, not just the register class groups.
>
> The second part of the code is needed because now that integer modes aren't
> uniformly rejected, the i386 backend triggers further optimizations.  However,
> the backend lacks instructions to deal with canonical RTL representations of
> certain instructions.  For instance, the back end seems to prefer vec_select 0
> instead of subregs.
>
> So to prevent the canonicalization I reject integer modes when the sizes of to
> and from don't match and when we would have exited with false previously.
>
> This fixes all the ICEs and codegen regressions, but perhaps an x86 maintainer
> should take a deeper look at this hook implementation.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_can_change_mode_class): Update the target
> hook.
>
> --- inline copy of patch --
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 
> c4d0e36e9c0a2256f5dde1f4dc021c0328aa0cba..477dd007ea80272680751b61e35cc3eec79b66c3
>  100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -19682,7 +19682,15 @@ ix86_can_change_mode_class (machine_mode from, 
> machine_mode to,
>
>/* x87 registers can't do subreg at all, as all values are reformatted
>   to extended precision.  */
> -  if (MAYBE_FLOAT_CLASS_P (regclass))
> +  if (MAYBE_FLOAT_CLASS_P (regclass)
> +  && VALID_FP_MODE_P (from)
> +  && VALID_FP_MODE_P (to))
> +return false;
This change looks reasonable since only VALID_FP_MODE_P modes will be
allocated to FLOAT_CLASS.
> +
> +  /* Reject integer modes if the sizes aren't the same.  It would have
> + normally exited above.  */
> +  if (MAYBE_FLOAT_CLASS_P (regclass)
> +  && GET_MODE_SIZE (from) != GET_MODE_SIZE (to))
>  return false;
Do you have a test case (or a patch so I can reproduce the regression
myself) that shows the regression, so I can take a deeper look?
>
>if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
>
>
>
>
> --



-- 
BR,
Hongtao
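
For readers following the review, the two float-class checks in the hunk quoted above can be modeled roughly as below. This is a sketch under simplifying assumptions, not GCC code: the mode names, the size table, and the FP_MODES set (a small stand-in for what VALID_FP_MODE_P accepts) are all illustrative, and the function only covers the two checks under discussion, not the rest of the hook.

```python
# Simplified stand-ins; real GCC uses machine_mode values and target macros.
MODE_SIZE = {"SI": 4, "DI": 8, "SF": 4, "DF": 8, "V2TI": 32, "V32QI": 32}
FP_MODES = {"SF", "DF"}  # assumed subset of the modes VALID_FP_MODE_P accepts


def float_class_checks(from_mode: str, to_mode: str, maybe_float_class: bool):
    """Return False if one of the two patched checks rejects the change,
    or None if control would fall through to the rest of the hook."""
    # First check: x87 values are reformatted, so FP-to-FP changes are refused.
    if maybe_float_class and from_mode in FP_MODES and to_mode in FP_MODES:
        return False
    # Second check: still refuse changes between modes of different size.
    if maybe_float_class and MODE_SIZE[from_mode] != MODE_SIZE[to_mode]:
        return False
    return None
```

Under this model a V2TImode to V32QImode change queried with ALL_REGS is no longer rejected up front, whereas the original single MAYBE_FLOAT_CLASS_P test would have refused it.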


Re: [PATCH v2 0/4] LoongArch: Add some floating-point operations

2022-11-13 Thread tangxiaolin

How about I do this work on Glibc?



On 2022/11/12 at 3:08 PM, Xi Ruoyao wrote:

On Wed, 2022-11-09 at 21:53 +0800, Xi Ruoyao wrote:

These patches allow the following builtins to be expanded to floating-point
instructions for LoongArch:

- __builtin_rint{,f}
- __builtin_{l,ll}rint{,f}
- __builtin_{l,ll}floor{,f}
- __builtin_{l,ll}ceil{,f}
- __builtin_scalb{n,ln}{,f}
- __builtin_logb{,f}

Bootstrapped and regtested on loongarch64-linux-gnu.  A modified
Glibc using the builtins for rint{,f}, {l,ll}rint{,f}, and logb{,f}
also passed the Glibc test suite.

Please review ASAP because GCC 13 stage 1 will end on Nov. 13th.

v1 -> v2: Only use ftint{rm,rp} instructions if floor and ceil are
allowed to raise the inexact exception.

Xi Ruoyao (4):
   LoongArch: Rename frint_ to rint2
   LoongArch: Add ftint{,rm,rp}.{w,l}.{s,d} instructions
   LoongArch: Add fscaleb.{s,d} instructions as ldexp{sf,df}3
   LoongArch: Add flogb.{s,d} instructions and expand logb{sf,df}2

  gcc/config/loongarch/loongarch.md | 95 ++-
  gcc/testsuite/gcc.target/loongarch/flogb.c    | 18 
  gcc/testsuite/gcc.target/loongarch/frint.c    | 16 
  gcc/testsuite/gcc.target/loongarch/fscaleb.c  | 48 ++
  .../gcc.target/loongarch/ftint-no-inexact.c   | 44 +
  gcc/testsuite/gcc.target/loongarch/ftint.c    | 44 +
  6 files changed, 261 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/flogb.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/frint.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/fscaleb.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/ftint-no-inexact.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/ftint.c


Pushed r13-3922.

I'll be busy in the following week.  Will do the work on Glibc side
after Nov. 20.
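
As a reference for what the builtins in the quoted list compute, their value semantics can be modeled with the Python standard library. This is only a sketch for finite inputs under the default round-to-nearest-even mode; the function names are chosen here for illustration, and signed zeros, NaN, infinities, and exception flags are not modeled.

```python
import math


def rint(x: float) -> float:
    """Round to an integral value; ties go to even (default rounding mode)."""
    return float(round(x))


def lrint(x: float) -> int:
    """Like rint, but the result is an integer."""
    return round(x)


def lfloor(x: float) -> int:
    """Largest integer not greater than x."""
    return math.floor(x)


def lceil(x: float) -> int:
    """Smallest integer not less than x."""
    return math.ceil(x)


def scalbn(x: float, n: int) -> float:
    """x * 2**n, computed without an intermediate power."""
    return math.ldexp(x, n)


def logb(x: float) -> float:
    """Unbiased binary exponent of a finite, non-zero x."""
    # frexp returns a mantissa in [0.5, 1), so the exponent is one too high.
    return float(math.frexp(x)[1] - 1)
```

For instance, rint(2.5) is 2.0 and rint(3.5) is 4.0, which is why a floor- or ceil-style instruction cannot be substituted for rint.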





Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param

2022-11-13 Thread Vineet Gupta




On 11/13/22 15:05, Christoph Muellner wrote:
  
+static bool
+riscv_overlap_op_by_pieces (void)
+{
+  return tune_param->overlap_op_by_pieces;

Doesn't this need to be gated on unaligned access being enabled as well?

-Vineet
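
The reason the question matters can be seen from a small model of how a block operation is split into pieces. The sketch below is a simplification written for illustration (the function names and the power-of-two tail rule are assumptions, and the buffer is assumed to start on a word boundary); it is not the by-pieces code in GCC.

```python
def pieces(nbytes: int, word: int = 8, overlap: bool = False):
    """Return the (offset, size) accesses used to cover nbytes."""
    ops, offset = [], 0
    # Full-width accesses first.
    while nbytes - offset >= word:
        ops.append((offset, word))
        offset += word
    rest = nbytes - offset
    if rest == 0:
        return ops
    if overlap:
        # Cover the tail with one access that backs up into covered bytes.
        size = 1
        while size < rest:
            size *= 2
        if size <= nbytes:
            ops.append((nbytes - size, size))
            return ops
    # No overlap: fall back to successively smaller power-of-two accesses.
    size = word // 2
    while rest > 0:
        if size <= rest:
            ops.append((offset, size))
            offset += size
            rest -= size
        else:
            size //= 2
    return ops


def has_misaligned_access(ops) -> bool:
    """True if any access is not naturally aligned to its own size."""
    return any(offset % size for offset, size in ops)
```

For a 7-byte operation with 4-byte words, the non-overlapping split is 4+2+1 with every access naturally aligned, while the overlapping split is two 4-byte accesses, the second at offset 3, which is misaligned.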



Revert Sphinx documentation [Was: Issues with Sphinx]

2022-11-13 Thread Martin Liška
Hi.

The situation with the Sphinx migration got out of control. The TODO list
overwhelmed me and there are road-blocks that can't be easily fixed with what
Sphinx currently supports. Fixing them would require adding support upstream
and possibly a new Sphinx release.

Let me summarize the biggest road blocks:

1) PR107634 - documentation is divided among many more files than it used to be; plus
   the current filenames tend to be very long
2) Index page regressions: PR107643 and PR107651
3) PR107656 - c::macro and c::function does replace [Macro:], [Target Hook:] 
well
4) Makefile.am, build system issues: missing support for lib*/Makefile.am and
   various limitations when it comes to 'make install-*'. Moreover,
   gcc_release --enable-generated-files-in-srcdir is not supported yet.

Plus, there are other issues linked in PR107655 and we face the issue that
many web links (to GCC documentation) are in the wild and should not become 404.

I would like to apologize to anybody who wasted time adapting to the
Sphinx format, which will eventually be reverted. Special thanks belong to
all the people who helped me and prepared various patches.

I'm going to revert the patchset today (Monday), and I'll send a patch
with the couple of new changes that landed while we were using Sphinx.

Thank you for your understanding.
Martin


[PATCH (pushed)] gcc-changelog: temporarily disable check_line_start

2022-11-13 Thread Martin Liška
The following patch will be needed for ChangeLog entry emission
of the reverted Sphinx commits.

Pushed.
Martin

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Temporarily disable
check_line_start.
---
 contrib/gcc-changelog/git_commit.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 3bd671011f2..59f96800da9 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -615,6 +615,8 @@ class GitCommit:
 self.errors.append(Error(msg, entry.parentheses_stack[-1]))
 
 def check_line_start(self):
+# FIXME: temporarily disable
+return
 for entry in self.changelog_entries:
 for line in entry.lines:
 if line.startswith('\t '):
-- 
2.38.1
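
To make the effect of the early return concrete, here is a stand-alone sketch of the test that the diff context shows: flagging ChangeLog lines that begin with a tab followed by a space. The function name and the `enabled` switch are hypothetical, and what the real method does with a matching line is not visible in the hunk above.

```python
def find_bad_line_starts(lines, enabled=True):
    """Return the lines that start with a tab immediately followed by a space.

    With enabled=False the check is skipped entirely, which is what the
    unconditional early return in the patch amounts to.
    """
    if not enabled:
        return []
    return [line for line in lines if line.startswith('\t ')]
```

With the check disabled, a line such as "\t check_line.start." passes through unreported, which is what the reverted commits' ChangeLog entries need.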



Re: [PATCH] doc: Ada: include Indices and Tables in manuals

2022-11-13 Thread Martin Liška
On 11/13/22 18:03, Arnaud Charlet wrote:
 Sorry for the breakage. However, I contacted you (and your colleague) and 
 haven't received
 any feedback for a couple of weeks.
>>>
>>> Right although I did give you feedback that what you sent wasn’t in a 
>>> suitable form for review wrt Ada.
>>
>> Sure, but sending a patch set to gcc-patches wouldn't have worked either, 
>> we've got quite a strict
>> email size limit.
>>
>> Anyway, hope the AdaCore build would be fixable with a reasonable amount of 
>> effort?
> 
> Unclear yet. We'll probably need to change and possibly partially revert the
> Ada changes, we'll see.
> 
> Arno

Hello.

Note the Sphinx changes will be reverted today:
https://gcc.gnu.org/pipermail/gcc/2022-November/239983.html

Sorry for your extra work.

Martin

