date:20231118

On Sat, 2023-11-18 at 08:43 +, Jonathan Wakely wrote:
> Hi,
> 
> Libstdc++ patches need to be CC'd to the gcc-patches list as well.
> 
> On Wed, 8 Nov 2023 at 10:26, Peng Fan  wrote:
> > 
> > libstdc++-v3:
> > 
> >     * configure.host: Add abi_baseline_pair for LoongArch.
> >     * config/abi/post/riscv64-linux-gnu: New directory.
> >     * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: New file.
> 
> Looks like the wrong filename here.

Try "git gcc-verify ${sha}" (you need to configure the local git repo
with contrib/gcc-git-customization.sh first), it will detect such issues
before submission.

> > Signed-off-by: Peng Fan 

This should be unneeded because Loongson should have already
signed an FSF copyright assignment.

/* snip */

> 
> > +TLS:8:_ZSt11__once_call
> > +TLS:8:_ZSt11__once_call@@GLIBCXX_3.4.11
> > +TLS:8:_ZSt15__once_callable
> > +TLS:8:_ZSt15__once_callable@@GLIBCXX_3.4.11
> 
> Does this target support TLS unconditionally? These TLS symbols are
> not included in the baseline symbols for x86_64-pc-linux-gnu, for
> example.

A hardware register is used as TP on this target.  But anyway TLS may be
disabled via --disable-tls, though I don't know if this configuration
really works on loongarch64-linux-gnu (nobody has really tested it, I
guess).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: gfortran.dg/dg.exp debug messages pollute test output

2023-11-18 Thread FX Coudert

> I suppose 'set t [...]' can be let go, too?

Duh (x2).
Pushed, on top of the previous patch.

FX



0001-Testsuite-remove-unused-variables.patch
Description: Binary data

Re: [PATCH] LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

On Sat, 2023-11-18 at 14:59 +0800, Guo Jie wrote:
> For the following immediate load operation in 
> gcc/testsuite/gcc.target/loongarch/imm-load1.c:
> 
>   long long r = 0x0101010101010101;
> 
> Before this patch:
> 
>   lu12i.w     $r15,16842752>>12
>   ori     $r15,$r15,257
>   lu32i.d     $r15,0x10101>>32
>   lu52i.d     $r15,$r15,0x100>>52
> 
> After this patch:
> 
>   lu12i.w $r15,16842752>>12
>   ori $r15,$r15,257
>   bstrins.d   $r15,$r15,63,32
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.cc (enum loongarch_load_imm_method): Add 
> new method.
>   (loongarch_build_integer): Add relevant implementations for new method.
>   (loongarch_move_integer): Ditto.

IIRC the ChangeLog line should be wrapped at 72 characters.

/* snip */

>  struct loongarch_integer_op
> @@ -1556,11 +1560,23 @@ loongarch_build_integer (struct loongarch_integer_op 
> *codes,
>  
>    int sign31 = (value & (HOST_WIDE_INT_1U << 31)) >> 31;
>    int sign51 = (value & (HOST_WIDE_INT_1U << 51)) >> 51;
> +
> +  unsigned HOST_WIDE_INT hival = value >> 32;
> +  unsigned HOST_WIDE_INT loval = value << 32 >> 32;

Use

uint32_t hival = (uint32_t) (value >> 32);
uint32_t loval = (uint32_t) value;

instead, because "value << 32" may trigger a left-shift of negative
value.

C++11 doesn't allow left-shifting any negative value.  Yes, it's allowed
as a GCC extension and it's also allowed by C++23, but the GCC codebase
is still C++11.  So it may break GCC if bootstrapping from a different
compiler, and --with-build-config=bootstrap-ubsan will complain.
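
As a rough illustration of the suggestion above (a sketch only, not the
code from the patch): both halves can be taken with truncating casts, so
no left shift of a possibly negative value is needed.  The names
imm_halves, split_imm and same_halves_p are made up for this example,
and std::int64_t stands in for HOST_WIDE_INT.

```cpp
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves of a 64-bit immediate.
struct imm_halves
{
  std::uint32_t hi;
  std::uint32_t lo;
};

// Split VALUE without ever shifting a negative value to the left.
// The right shift of a negative value is implementation-defined in
// C++11 (not undefined), and the truncating cast keeps bits 32..63
// whether the shift is arithmetic or logical.
inline imm_halves split_imm (std::int64_t value)
{
  imm_halves h;
  h.hi = (std::uint32_t) (value >> 32);
  h.lo = (std::uint32_t) value;
  return h;
}

// True when the high and low 32-bit halves are identical, which is the
// case the patch wants to detect.
inline bool same_halves_p (std::int64_t value)
{
  imm_halves h = split_imm (value);
  return h.hi == h.lo;
}
```

With this, same_halves_p (0x0101010101010101) holds, and negative inputs
such as -1 are handled without triggering UBSan.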

Otherwise LGTM.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: RISC-V: Support XTheadVector extensions

2023-11-18 Thread Christoph Müllner

On Fri, Nov 17, 2023 at 12:40 PM juzhe.zh...@rivai.ai
 wrote:
>
> 90% of the theadvector extension reuses the current RVV 1.0 instruction patterns;
> just change the ASM. For example:
>
> @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
>   (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] VMULH)
>(match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
>"TARGET_VECTOR"
> -  "vmulh.vx\t%0,%3,%z4%p1"
> +  "%^vmulh.vx\t%0,%3,%z4%p1"
>[(set_attr "type" "vimul")
> (set_attr "mode" "")])
>
> +  if (letter == '^')
> +{
> +  if (TARGET_XTHEADVECTOR)
> + fputs ("th.", file);
> +  return;
> +}
>
>
> For almost all patterns, you simply prepend "th." to the ASM mnemonic,
> e.g. change "vmulh.vv" -> "th.vmulh.vv".
>
> Almost all theadvector instructions are not new features; they are the same as in RVV 1.0.
> Why did you invent such an ISA, which doesn't include any features that RVV 1.0
> doesn't satisfy?
>
> I am not explicitly objecting to this patch, but I should know the reason.

Palmer already outlined the reason why this has been implemented in HW.
I want to add some comments on the specification and the design of the
SW support.

We have to face the fact here, that T-Head implemented the 0.7.1 draft
version of RVV (plus a VLENB CSR).
I don't want to waste time and discuss who is to blame for that (this
has been done elsewhere in enough detail).
Also, there are mechanisms now in place to avoid that something like
this happens again.

When we call this extension "RVV-0.7.1-draft" (plus VLENB), then we
are facing arguments that claim that a RVV "draft" cannot be treated as
a ratified extension. Further, there are arguments that only one RVV
extension exists (the one that was ratified). Therefore, T-Head's
vector extension was several times described as a "custom-extension",
which is "non-conforming" (uses encoding space of standard extension).
Of course, this hides the fact that the extension is identical to RVV
1.0 to a high degree. Anyway, I don't think that these arguments and
descriptions are wrong.

So, in order to avoid pointless discussions about what it is, and why
it is what it is, we simply accepted this description and gave the
extension the name XTheadVector. In RISC-V, vendor instructions and
CSRs need to have vendor prefixes ("th."). The specification can be
found (together with all other XThead* extensions) here:

https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadvector.adoc

Some further details, which are worth mentioning here about this specification:
* Fractional LMUL values are not supported.
* Zvlsseg was an extension in RVV 0.7.1, but is part of RVV 1.0.
  Since T-Head has these instructions as well, we followed the RVV 1.0
  idea and made these instructions mandatory for XTheadVector (i.e.
  avoiding the introduction of useless extensions).
* Zvamo was an extension in RVV 0.7.1, which was dropped in RVV 1.0.
  Since T-Head has these instructions as well, we defined XTheadZvamo.

So the result is that we have a custom extension, which uses the RVI
encoding space and which "by accident" has a huge overlap with RVV 1.0.
We are all fine with this, as long as this is our ticket to get the
extension supported upstream (in the sense that everyone's opinions are
respected and a solution is found which will not trigger useless
discussions about things that happened a long time ago).

The implementation follows this idea: it is a vendor extension and is
kept as separate as possible from standard extensions. However,
avoiding duplication was one of our most important goals, so we came up
with reusing the overlapping functionality by just adding the
instruction prefixes.
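
To make the prefixing idea concrete, here is a small stand-alone sketch
of how a '%^' punctuation marker in an output template could be
expanded.  It is only a model of the behaviour quoted above, not the GCC
implementation: expand_prefix is a hypothetical function, and the bool
parameter stands in for TARGET_XTHEADVECTOR.  All other '%' operands are
left untouched.

```cpp
#include <cstddef>
#include <string>

// Expand every "%^" in TEMPL: emit "th." when XTHEADVECTOR is set and
// nothing otherwise.  Everything else is copied through unchanged.
std::string expand_prefix (const std::string &templ, bool xtheadvector)
{
  std::string out;
  for (std::size_t i = 0; i < templ.size (); ++i)
    {
      if (templ[i] == '%' && i + 1 < templ.size () && templ[i + 1] == '^')
        {
          if (xtheadvector)
            out += "th.";
          ++i;  // Skip the '^' that follows the '%'.
        }
      else
        out += templ[i];
    }
  return out;
}
```

With the template "%^vmulh.vx\t%0,%3,%z4%p1", the same pattern yields
"th.vmulh.vx ..." for XTheadVector and plain "vmulh.vx ..." for RVV 1.0,
which is why a single define_insn can serve both.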

For the intrinsics API, we use a more user-friendly (pragmatic) approach:
* state the set of supported RVV intrinsic functions
* state the missing support of fractional LMUL values
* list the extension-specific intrinsic functions for the additional
load/store functionality

I hope this gives a good overview of our reasoning.
Let me know if you have further questions.

BR
Christoph

>
> Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long 
> as all other RISC-V maintainers agree.
>
>
> 
> juzhe.zh...@rivai.ai

Re: RISC-V: Support XTheadVector extensions

2023-11-18 Thread Philipp Tomsich

On Fri, 17 Nov 2023 at 22:47, Jeff Law  wrote:
>
>
>
> On 11/17/23 04:39, juzhe.zh...@rivai.ai wrote:
> > 90% of the theadvector extension reuses the current RVV 1.0 instruction patterns;
> > just change the ASM. For example:
> >
> > @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
> >(match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] 
> > VMULH)
> > (match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
> > "TARGET_VECTOR"
> > -  "vmulh.vx\t%0,%3,%z4%p1"
> > +  "%^vmulh.vx\t%0,%3,%z4%p1"
> > [(set_attr "type" "vimul")
> >  (set_attr "mode" "")])
> >
> > +  if (letter == '^')
> > +{
> > +  if (TARGET_XTHEADVECTOR)
> > + fputs ("th.", file);
> > +  return;
> > +}
> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> the preferred way to go when all we need to do is prefix the instruction
> with "th.".
>
>
> >
> > Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as
> > long as all other RISC-V maintainers agree.
> I *think* it's a gcc-15 issue.  Philipp T. and I briefly spoke about
> this at the RVI summit a couple weeks back and he indicated the thead
> vector work was targeting gcc-15.

To restate the intent clearly:
- Getting this merged into GCC14 would be our most favored outcome, as
boards with XTheadV are quite common in the field: Allwinner D1,
BeagleBoard BeagleV-Ahead, Sophgo Milk-V;
- If that is not possible and we end up with an "ok for 15", we can
still resolve the downstream ecosystem issues (primarily felt by the
BeagleV-Ahead community) gracefully.
From our brief discussion, I understood you thought it more realistic
to land this early into GCC15.

If we end up targeting GCC15, I would still like to achieve an
agreement on design early.  This would allow our team to make any
needed changes and maintain them in a vendor-branch (on the GCC Git
repository) until GCC15 opens up.

Philipp.

Re: [PATCH] RISC-V: Fix bug of tuple move splitter[PR112561]

> On 11/17/23 07:18, Kito Cheng wrote:
> > I haven't taken a closer look at the IRA/LRA dump yet, but my feeling
> > is that it may be caused by the earlyclobber modifier not working as expected?
> >
> > Let me take closer look tomorrow.
> Remember that constraints aren't checked until register allocation.  So
> the combiner, splitters, etc don't know about "earlyclobber".  It's a
> relatively common goof.
>
> Not sure if that's playing a role here, but I've seen it happen several
> times in the past.

Oh, okay, I found that IRA and LRA both did the right job.  The problem
is just that we don't use that clobber register correctly: it is only
used, never defined, so cprop_hardreg thinks it can do that and then
screws up.  Ju-Zhe explained this and fixed it the right way, but I
just didn't get the point yesterday :P

Re: [PATCH V2] RISC-V: Fix bug of tuple move splitter

LGTM, and could you add one more comment before that condition: /*
Non-fractional LMUL has whole register moves that don't require a
vsetvl for VLMAX.  */

On Fri, Nov 17, 2023 at 9:48 PM Juzhe-Zhong  wrote:
>
> Fix segment fault on tuple move:
>
> bbl loader
> z   ra 000102ac sp 003ffaf0 gp 
> 0001c0b8
> tp  t0 000104a0 t1 000f t2 
> 
> s0  s1  a0 003ffb30 a1 
> 003ffb58
> a2  a3  a4  a5 
> 0001c340
> a6 0004 a7 0004 s2  s3 
> 
> s4  s5  s6  s7 
> 
> s8  s9  sA  sB 
> 
> t3  t4  t5  t6 
> 
> pc 000101aa va/inst 0004 sr 80026620
> User store segfault @ 0x0004
>
> PR target/112561
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (expand_tuple_move): Fix bug.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr112561.c: New test.
>
> ---
>  gcc/config/riscv/riscv-v.cc  |  2 ++
>  .../gcc.target/riscv/rvv/autovec/pr112561.c  | 16 
>  2 files changed, 18 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 6a2009ffb05..91bb6ea520d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -2148,6 +2148,8 @@ expand_tuple_move (rtx *ops)
>   offset = ops[2];
> }
>
> +  if (fractional_p)
> +   emit_vlmax_vsetvl (subpart_mode, ops[4]);
>if (MEM_P (ops[1]))
> {
>   /* Load operations.  */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
> new file mode 100644
> index 000..25e61fa12c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run { target { riscv_v } } } */
> +/* { dg-options "-O3 -ftree-vectorize 
> --param=riscv-autovec-preference=fixed-vlmax -mcmodel=medlow" } */
> +
> +int printf(char *, ...);
> +int a, b, c, e;
> +short d[7][7] = {};
> +int main() {
> +  short f;
> +  c = 0;
> +  for (; c <= 6; c++) {
> +e |= d[c][c] & 1;
> +b &= f & 3;
> +  }
> +  printf("%d\n", a);
> +  return 0;
> +}
> --
> 2.36.3
>
>

Re: [PATCH v2 1/9] RISC-V: minimal support for xtheadvector

On Sat, Nov 18, 2023 at 12:27 PM Jun Sha (Joshua)
 wrote:
>
> This patch is to introduce basic XTheadVector support
> (march string parsing and a test for __riscv_xtheadvector)
> according to https://github.com/T-head-Semi/thead-extension-spec/
>
> Contributors:
> Jun Sha (Joshua) 
> Jin Ma 
> Christoph Müllner 
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_subset_list::parse): Add new vendor extension.
> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
> Add test macro.
> * config/riscv/riscv.opt: Add new mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/predef-__riscv_th_v_intrinsic.c: New test.
> * gcc.target/riscv/rvv/xtheadvector.c: New test.
> ---
>  gcc/common/config/riscv/riscv-common.cc | 10 ++
>  gcc/config/riscv/riscv-c.cc |  4 
>  gcc/config/riscv/riscv.opt  |  2 ++
>  .../riscv/predef-__riscv_th_v_intrinsic.c   | 11 +++
>  gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c   | 13 +
>  5 files changed, 40 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 526dbb7603b..914924171fd 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -75,6 +75,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
>
>{"v", "zvl128b"},
>{"v", "zve64d"},
> +  {"xtheadvector", "zvl128b"},
> +  {"xtheadvector", "zve64d"},

^^^ Don't imply zve64d; it will mix V 1.0 in.  I know why you want to
do that, so I have given some suggestions below.

>
>{"zve32f", "f"},
>{"zve64f", "f"},
> @@ -325,6 +327,7 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"xtheadvector", ISA_SPEC_CLASS_NONE, 1, 0},
>
>{"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
>
> @@ -1495,6 +1498,10 @@ riscv_subset_list::parse (const char *arch, location_t 
> loc)
>  error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
>"extensions", arch);
>
> +  if (subset_list->lookup ("v") && subset_list->lookup ("xtheadvector"))
> +error_at (loc, "%<-march=%s%>: xtheadvector conflicts with vector "
> +  "extensions", arch);
> +
>/* 'H' hypervisor extension requires base ISA with 32 registers.  */
>if (subset_list->lookup ("e") && subset_list->lookup ("h"))
>  error_at (loc, "%<-march=%s%>: h extension requires i extension", arch);
> @@ -1680,6 +1687,9 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>{"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
>{"xtheadmempair", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
>{"xtheadsync",&gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
> +  {"xtheadvector",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADVECTOR},
> +  {"xtheadvector",  &gcc_options::x_target_flags, MASK_FULL_V},
> +  {"xtheadvector",  &gcc_options::x_target_flags, MASK_VECTOR},

Add the following two lines, then you don't need zve64d:
 {"xtheadvector", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
 {"xtheadvector", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64},

>
>{"xventanacondops", &gcc_options::x_riscv_xventana_subext, 
> MASK_XVENTANACONDOPS},
>

Re: [PATCH v2 2/9] RISC-V: Handle differences between xtheadvector and vector

> diff --git a/gcc/config/riscv/riscv_th_vector.h 
> b/gcc/config/riscv/riscv_th_vector.h
> new file mode 100644
> index 000..194652032bc
> --- /dev/null
> +++ b/gcc/config/riscv/riscv_th_vector.h
...
> +/* NOTE: This implementation of riscv_vector.h is intentionally short.  It 
> does
> +   not define the RVV types and intrinsic functions directly in C and C++
> +   code, but instead uses the following pragma to tell GCC to insert the
> +   necessary type and function definitions itself.  The net effect is the
> +   same, and the file is a complete implementation of riscv_vector.h.  */
> +#pragma riscv intrinsic "vector"

Please use #pragma riscv intrinsic "thead_vector"

> @@ -1135,7 +1135,7 @@ (define_expand "@mov_lra"
>  [(set (match_operand:V_FRACT 0 "reg_or_mem_operand")
>   (match_operand:V_FRACT 1 "reg_or_mem_operand"))
> (clobber (match_scratch:P 2))])]
> -  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +  "TARGET_VECTOR &&  (lra_in_progress || reload_completed)"

It's an accident, right?

>  {})
>
>  (define_expand "@mov_lra"
> @@ -1143,14 +1143,14 @@ (define_expand "@mov_lra"
>  [(set (match_operand:VB 0 "reg_or_mem_operand")
>   (match_operand:VB 1 "reg_or_mem_operand"))
> (clobber (match_scratch:P 2))])]
> -  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +  "TARGET_VECTOR &&  (lra_in_progress || reload_completed)"

Ditto.

>  {})
>
>  (define_insn_and_split "*mov_lra"
>[(set (match_operand:V_FRACT 0 "reg_or_mem_operand" "=vr, m,vr")
> (match_operand:V_FRACT 1 "reg_or_mem_operand" "  m,vr,vr"))
> (clobber (match_scratch:P 2 "=&r,&r,X"))]
> -  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +  "TARGET_VECTOR &&  (lra_in_progress || reload_completed)"

Ditto.

>"#"
>"&& reload_completed"
>[(const_int 0)]
> @@ -1172,7 +1172,7 @@ (define_insn_and_split "*mov_lra"
>[(set (match_operand:VB 0 "reg_or_mem_operand" "=vr, m,vr")
> (match_operand:VB 1 "reg_or_mem_operand" "  m,vr,vr"))
> (clobber (match_scratch:P 2 "=&r,&r,X"))]
> -  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +  "TARGET_VECTOR &&  (lra_in_progress || reload_completed)"

Ditto.

>"#"
>"&& reload_completed"
>[(const_int 0)]
> @@ -1286,14 +1286,14 @@ (define_expand "@mov_lra"
>  [(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand")
>   (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand"))
> (clobber (match_scratch:P 2))])]
> -  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +  "TARGET_VECTOR &&  (lra_in_progress || reload_completed)"

Ditto.

>  {})
>
>  (define_insn_and_split "*mov_lra"
>[(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand" "=vr, m,vr")
> (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand" "  m,vr,vr"))
> (clobber (match_scratch:P 2 "=&r,&r,X"))]
> -  "TARGET_VECTOR && (lra_in_progress || reload_completed)
> +  "TARGET_VECTOR &&  (lra_in_progress || reload_completed)

Ditto.

> && (register_operand (operands[0], mode)
> || register_operand (operands[1], mode))"
>"#"
> @@ -1359,7 +1359,7 @@ (define_expand "movmisalign"
>  (define_expand "movmisalign"
>[(set (match_operand:V 0 "nonimmediate_operand")
> (match_operand:V 1 "general_operand"))]
> -  "TARGET_VECTOR && TARGET_VECTOR_MISALIGN_SUPPORTED"
> +  "TARGET_VECTOR &&  TARGET_VECTOR_MISALIGN_SUPPORTED"

Ditto.

>{
>  emit_move_insn (operands[0], operands[1]);
>  DONE;
> @@ -1396,7 +1396,7 @@ (define_insn_and_split "*vec_duplicate"
>[(set (match_operand:V_VLS 0 "register_operand")
>  (vec_duplicate:V_VLS
>(match_operand: 1 "direct_broadcast_operand")))]
> -  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "TARGET_VECTOR &&  can_create_pseudo_p ()"

Ditto.

>"#"
>"&& 1"
>[(const_int 0)]

[PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693]

Hi!

On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:
> As a follow-up, I'm considering changing in this routine the popcount
> call to IFN_POPCOUNT with 2 arguments and during expansion test costs.

Here is the follow-up which does the rtx costs testing.
While having to tweak internal-fn.def so that POPCOUNT can have a custom
expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL*
macros (most of them) were undefined at the end of internal-fn.def (but in
some cases uselessly undefined again after inclusion), while others were not
(and sometimes undefined after the inclusion).  I've changed it to always
undefine at the end of internal-fn.def.

Ok for trunk if it passes bootstrap/regtest?
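
For readers unfamiliar with the internal-fn.def convention described
above, the following is a self-contained sketch of the same pattern.
All DEMO_* names are hypothetical; a single list macro stands in for the
.def file, whereas the real file is pulled in with #include.  A consumer
that does not define the EXT variant gets the fallback to the plain
macro, while a consumer that generates generic expanders defines it to
nothing.

```cpp
#include <string>
#include <vector>

// Stand-in for the .def file: one entry per internal function.
#define DEMO_FN_LIST           \
  DEMO_INT_FN (CLZ)            \
  DEMO_INT_EXT_FN (POPCOUNT)   \
  DEMO_INT_FN (CTZ)

// Consumer 1: wants every function name.  It only defines DEMO_INT_FN,
// so the EXT variant falls back to it, as the #ifndef block in the
// patch does for DEF_INTERNAL_INT_EXT_FN.
std::vector<std::string> all_names ()
{
  std::vector<std::string> v;
#define DEMO_INT_FN(NAME) v.push_back (#NAME);
#ifndef DEMO_INT_EXT_FN
#define DEMO_INT_EXT_FN(NAME) DEMO_INT_FN (NAME)
#endif
  DEMO_FN_LIST
  // Undefine everything after use so the next consumer starts clean.
#undef DEMO_INT_EXT_FN
#undef DEMO_INT_FN
  return v;
}

// Consumer 2: generates generic expanders, so it defines the EXT
// variant to nothing and skips entries with a hand-written expander.
std::vector<std::string> generic_expanders ()
{
  std::vector<std::string> v;
#define DEMO_INT_FN(NAME) v.push_back (#NAME);
#define DEMO_INT_EXT_FN(NAME)
  DEMO_FN_LIST
#undef DEMO_INT_EXT_FN
#undef DEMO_INT_FN
  return v;
}
```

Here all_names () lists CLZ, POPCOUNT and CTZ, while
generic_expanders () lists only CLZ and CTZ.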

2023-11-18  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with
result only used in equality comparison against 1 with direct optab
support as .POPCOUNT call with 2 arguments.
* internal-fn.h (expand_POPCOUNT): Declare.
* internal-fn.def: Document missing DEF_INTERNAL* macros and make sure
they are all undefined at the end.
(DEF_INTERNAL_INT_EXT_FN): New macro.
(POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN.
* internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn,
widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN
macros after inclusion of internal-fn.def.
(DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to
define expanders.
(expand_POPCOUNT): New function.
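
The equivalence named in the subject line can be checked outside the
compiler with a few lines of C++.  This is only a sanity-check sketch on
32-bit unsigned values; popcount_ref, single_bit_p and forms_agree_p are
names invented for this example and do not appear in the patch.

```cpp
#include <cstdint>

// Reference population count: clear the lowest set bit until none remain.
inline unsigned popcount_ref (std::uint32_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// The popcount-free form: true exactly when X has a single bit set.
// For X == 0 both sides are all-ones, so the comparison is false.
inline bool single_bit_p (std::uint32_t x)
{
  return (x ^ (x - 1)) > (x - 1);
}

// Compare both forms for every value in [0, COUNT).
inline bool forms_agree_p (std::uint32_t count)
{
  for (std::uint32_t x = 0; x < count; ++x)
    if ((popcount_ref (x) == 1) != single_bit_p (x))
      return false;
  return true;
}
```

Which form is cheaper depends on the target, which is why the patch
defers the choice to expansion time and compares rtx costs there.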

--- gcc/tree-ssa-math-opts.cc.jj2023-11-18 09:38:03.460813910 +0100
+++ gcc/tree-ssa-math-opts.cc   2023-11-18 10:25:18.751207936 +0100
@@ -5195,7 +5195,16 @@ match_single_bit_test (gimple_stmt_itera
   if (!INTEGRAL_TYPE_P (type))
 return;
   if (direct_internal_fn_supported_p (IFN_POPCOUNT, type, OPTIMIZE_FOR_BOTH))
-return;
+{
+  /* Tell expand_POPCOUNT the popcount result is only used in equality
+comparison with one, so that it can decide based on rtx costs.  */
+  gimple *g = gimple_build_call_internal (IFN_POPCOUNT, 2, arg,
+ integer_one_node);
+  gimple_call_set_lhs (g, gimple_call_lhs (call));
+  gimple_stmt_iterator gsi2 = gsi_for_stmt (call);
+  gsi_replace (&gsi2, g, true);
+  return;
+}
   tree argm1 = make_ssa_name (type);
   gimple *g = gimple_build_assign (argm1, PLUS_EXPR, arg,
   build_int_cst (type, -1));
--- gcc/internal-fn.h.jj2023-11-02 12:15:12.223998565 +0100
+++ gcc/internal-fn.h   2023-11-18 10:14:25.834340505 +0100
@@ -262,6 +262,7 @@ extern void expand_MULBITINT (internal_f
 extern void expand_DIVMODBITINT (internal_fn, gcall *);
 extern void expand_FLOATTOBITINT (internal_fn, gcall *);
 extern void expand_BITINTTOFLOAT (internal_fn, gcall *);
+extern void expand_POPCOUNT (internal_fn, gcall *);
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
--- gcc/internal-fn.def.jj  2023-11-17 15:51:02.642802521 +0100
+++ gcc/internal-fn.def 2023-11-18 10:12:10.329236626 +0100
@@ -33,9 +33,13 @@ along with GCC; see the file COPYING3.
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SIGNED_OPTAB,
   UNSIGNED_OPTAB, TYPE)
  DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_FLT_FLOATN_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_INT_EXT_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_COND_FN (NAME, FLAGS, OPTAB, TYPE)
  DEF_INTERNAL_SIGNED_COND_FN (NAME, FLAGS, OPTAB, TYPE)
+ DEF_INTERNAL_WIDENING_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB,
+TYPE)
 
where NAME is the name of the function, FLAGS is a set of
ECF_* flags and FNSPEC is a string describing functions fnspec.
@@ -97,6 +101,10 @@ along with GCC; see the file COPYING3.
says that the function extends the C-level BUILT_IN_{,L,LL,IMAX}
group of functions to any integral mode (including vector modes).
 
+   DEF_INTERNAL_INT_EXT_FN is like DEF_INTERNAL_INT_FN, except that it
+   has expand_##NAME defined in internal-fn.cc to override the
+   DEF_INTERNAL_INT_FN expansion behavior.
+
DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal
functions with DEF_INTERNAL_SIGNED_OPTAB_FN:
- one that describes a widening operation with the same number of elements
@@ -160,6 +168,11 @@ along with GCC; see the file COPYING3.
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_INT_EXT_FN
+#define DEF_INTERNAL_INT_EXT_FN(NAME, FLAGS, OPTAB, TYPE) \
+  DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
+#endif
+
 #ifndef DEF_INTERNAL_WIDENING_OPTAB_FN
 #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, 
TYPE)\
   DEF_INTERN

Re: RISC-V: Support XTheadVector extensions

I guess it is worth stating my thoughts publicly:

I *support* adding the T-Head vector extension (a.k.a. vector 0.7) to
upstream GCC, since a large enough number of boards with T-Head vector
have already shipped; also, as Palmer described in another mail, it's
not really T-Head's problem.

My biggest concern before was that T-Head folks weren't very involved
in community work, so accepting this would definitely increase the
work for maintainers.  However, I see T-Head folks are now trying to
contribute to upstream, so that may no longer be a concern.  I also
believe accepting this patch will encourage them to work more with
upstream, which benefits both sides.

Back to one of the biggest issues for the patch set: GCC 14 or GCC
15.  My general thought is that it should be OK for GCC 14 if it's not
too invasive, but I don't have a strong opinion.  As you know, I am
not the main developer of the vector part, so I will let Ju-Zhe make
the final decision, because he is the one who has contributed the most
to RISC-V vector support in GCC.

[committed] RISC-V: Fix mismatched new delete for unique_ptr

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::parse_arch): Use char[] for
std::unique_ptr to prevent mismatched new delete issue.
(riscv_process_one_target_attr): Ditto.
(riscv_process_target_attr): Ditto.
---
 gcc/config/riscv/riscv-target-attr.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index 78f259d0c96..c4bd99d8632 100644
--- a/gcc/config/riscv/riscv-target-attr.cc
+++ b/gcc/config/riscv/riscv-target-attr.cc
@@ -105,7 +105,7 @@ riscv_target_attr_parser::parse_arch (const char *str)
 {
   /* Parsing the extension list like "+[,+]*".  */
   size_t len = strlen (str);
-  std::unique_ptr<char> buf (new char[len]);
+  std::unique_ptr<char[]> buf (new char[len]);
   char *str_to_check = buf.get ();
   strcpy (str_to_check, str);
   const char *token = strtok_r (str_to_check, ",", &str_to_check);
@@ -241,7 +241,7 @@ riscv_process_one_target_attr (char *arg_str,
   return false;
 }
 
-  std::unique_ptr<char> buf (new char[len]);
+  std::unique_ptr<char[]> buf (new char[len]);
   char *str_to_check = buf.get();
   strcpy (str_to_check, arg_str);
 
@@ -327,7 +327,7 @@ riscv_process_target_attr (tree args, location_t loc, 
struct gcc_options *opts)
   return false;
 }
 
-  std::unique_ptr<char> buf (new char[len]);
+  std::unique_ptr<char[]> buf (new char[len]);
   char *str_to_check = buf.get ();
   strcpy (str_to_check, TREE_STRING_POINTER (args));
 
-- 
2.40.1
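
For reference, a minimal stand-alone sketch of the pattern this commit
switches to.  The function first_token is hypothetical and only mimics
the "copy the string, then cut it at a comma" use in the parser; it is
not code from riscv-target-attr.cc.  The sketch allocates len + 1 bytes
to leave room for the terminating NUL, which is a choice made here and
not something the patch above changes.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <type_traits>

// unique_ptr<char[]> selects default_delete<char[]>, whose call
// operator uses delete[] and therefore matches new char[n].
static_assert (std::is_same<std::unique_ptr<char[]>::deleter_type,
                            std::default_delete<char[]> >::value,
               "the array form uses the array deleter");

// Return the part of STR before the first comma, working on a private
// copy so the caller's string is not modified.
std::string first_token (const char *str)
{
  std::size_t len = std::strlen (str);
  std::unique_ptr<char[]> buf (new char[len + 1]);
  std::strcpy (buf.get (), str);
  if (char *comma = std::strchr (buf.get (), ','))
    *comma = '\0';
  return std::string (buf.get ());
}  // buf is released with delete[] here.
```

With std::unique_ptr<char> instead, the destructor would call plain
delete on memory obtained from new[], which is what
-Wmismatched-new-delete reports.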

Re: [PATCH v2] RISC-V: Implement target attribute

Fixed on upstream, thanks for reporting.  I guess my host GCC is just
too old; it doesn't even report that bug with -Wall -Wextra.

On Fri, Nov 17, 2023 at 11:41 PM Andreas Schwab  wrote:
>
> In file included from 
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/memory:78,
>  from ../../gcc/system.h:769,
>  from ../../gcc/config/riscv/riscv-target-attr.cc:25:
> In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const 
> [with _Tp = char]',
> inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
> _Dp = std::default_delete<char>]' at
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
> inlined from 'bool riscv_process_one_target_attr(char*, location_t, 
> {anonymous}::riscv_target_attr_parser&)' at 
> ../../gcc/config/riscv/riscv-target-attr.cc:274:1,
> inlined from 'bool riscv_process_target_attr(tree, location_t, 
> gcc_options*)' at ../../gcc/config/riscv/riscv-target-attr.cc:346:37:
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
>  error: 'void operator delete(void*, std::size_t)' called on pointer returned 
> from a mismatched allocation function [-Werror=mismatched-new-delete]
>93 | delete __ptr;
>   | ^~~~
> In function 'bool riscv_process_one_target_attr(char*, location_t, 
> {anonymous}::riscv_target_attr_parser&)',
> inlined from 'bool riscv_process_target_attr(tree, location_t, 
> gcc_options*)' at ../../gcc/config/riscv/riscv-target-attr.cc:346:37:
> ../../gcc/config/riscv/riscv-target-attr.cc:244:42: note: returned from 
> 'void* operator new [](std::size_t)'
>   244 |   std::unique_ptr<char> buf (new char[len]);
>   |  ^
> In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const 
> [with _Tp = char]',
> inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
> _Dp = std::default_delete<char>]' at
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
> inlined from 'bool riscv_process_target_attr(tree, location_t, 
> gcc_options*)' at ../../gcc/config/riscv/riscv-target-attr.cc:361:1:
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
>  error: 'void operator delete(void*, std::size_t)' called on pointer returned 
> from a mismatched allocation function [-Werror=mismatched-new-delete]
>93 | delete __ptr;
>   | ^~~~
> ../../gcc/config/riscv/riscv-target-attr.cc: In function 'bool 
> riscv_process_target_attr(tree, location_t, gcc_options*)':
> ../../gcc/config/riscv/riscv-target-attr.cc:330:42: note: returned from 
> 'void* operator new [](std::size_t)'
>   330 |   std::unique_ptr<char> buf (new char[len]);
>   |  ^
> In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const 
> [with _Tp = char]',
> inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
> _Dp = std::default_delete<char>]' at
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
> inlined from 'bool 
> {anonymous}::riscv_target_attr_parser::parse_arch(const char*)' at 
> ../../gcc/config/riscv/riscv-target-attr.cc:140:5,
> inlined from 'bool 
> {anonymous}::riscv_target_attr_parser::handle_arch(const char*)' at 
> ../../gcc/config/riscv/riscv-target-attr.cc:158:21:
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
>  error: 'void operator delete(void*, std::size_t)' called on pointer returned 
> from a mismatched allocation function [-Werror=mismatched-new-delete]
>93 | delete __ptr;
>   | ^~~~
> In member function 'bool 
> {anonymous}::riscv_target_attr_parser::parse_arch(const char*)',
> inlined from 'bool 
> {anonymous}::riscv_target_attr_parser::handle_arch(const char*)' at 
> ../../gcc/config/riscv/riscv-target-attr.cc:158:21:
> ../../gcc/config/riscv/riscv-target-attr.cc:108:46: note: returned from 
> 'void* operator new [](std::size_t)'
>   108 |   std::unique_ptr<char> buf (new char[len]);
>   |  ^
> In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const 
> [with _Tp = char]',
> inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
> _Dp = std::default_delete<char>]' at
> /daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
> inlined from 'bool 
> {anonymous}::riscv_target_attr_parser::parse_arch(const char*)' at 
> ../../gcc/config/riscv/riscv-target-attr.cc:140:5,
> inlined from 'bool 
> {anonymous}::riscv_target_attr_parser::handle_arch(const char*)' at 
> .

Re: [PATCH v3] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-18 Thread Cassio Neri

Actually, disregard this patch. I'm preparing a better one which also
tackles UB in

month - months{INT_MIN}
weekday - days{INT_MIN}

Best wishes,
Cassio.

On Sat, 18 Nov 2023, 00:19 Cassio Neri,  wrote:

> The following functions invoke signed integer overflow (UB) for some
> extreme values of days and months [1]:
>
>   weekday operator+(const weekday& x, const days& y); // #1
>   month operator+(const month& x, const months& y);   // #2
>
> For #1 the problem is that in libstdc++ days::rep is int64_t. Other
> implementations use int32_t and cast operands to int64_t. Hence they
> perform arithmetic operations without fear of overflowing. For instance,
> #1 evaluates:
>
>   modulo(static_cast<long long>(unsigned{x}._M_wd) + __y.count(), 7);
>
> For x86-64, long long is int64 so the cast is useless.  For #2, casting to
> a larger type could help but all implementations follow the Standard's
> "Returns clause" and evaluate:
>
>    modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);
>
> Hence, overflow occurs when __y.count() is the minimum value of its type.
> When long long is larger than months::rep, this is a fix:
>
>    modulo(static_cast<long long>(unsigned{__x}) + 11 + __y.count(), 12);
>
> Again, this is not possible for libstdc++.  The fix uses this new function:
>
>   template <unsigned __d, typename _T>
>   unsigned __add_modulo(unsigned __x, _T __y);
>
> which returns the remainder of Euclidean division of __x + __y by __d
> without overflowing. This function replaces
>
>   constexpr unsigned __modulo(long long __n, unsigned __d);
>
> In addition to solving the UB issues, __add_modulo allows shorter
> branchless code on x86-64 and ARM [2].
>
> [1] https://godbolt.org/z/WqvosbrvG
> [2] https://godbolt.org/z/o63794GEE
>
> libstdc++-v3/ChangeLog:
>
> * include/std/chrono: Fix operator+ for months and weekdays.
> * testsuite/std/time/month/1.cc: Add constexpr tests against
> overflow.
> * testsuite/std/time/month/2.cc: New test for extreme values.
> * testsuite/std/time/weekday/1.cc: Add constexpr tests against
> overflow.
> * testsuite/std/time/weekday/2.cc: New test for extreme values.
> ---
>
> Changes with respect to previous versions:
>  v3: Fix screwed up email send with v2. (Sorry about that. I shall learn at
>  some point.)
>  v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from
> test.
>
>  libstdc++-v3/include/std/chrono  | 61 
>  libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
>  libstdc++-v3/testsuite/std/time/month/2.cc   | 30 ++
>  libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
>  libstdc++-v3/testsuite/std/time/weekday/2.cc | 30 ++
>  5 files changed, 114 insertions(+), 24 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
>  create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc
>
> diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> index 10bdd1c4ede..691bb106bb9 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  namespace __detail
>  {
> -  // Compute the remainder of the Euclidean division of __n divided by __d.
> -  // Euclidean division truncates toward negative infinity and always
> -  // produces a remainder in the range of [0,__d-1] (whereas standard
> -  // division truncates toward zero and yields a nonpositive remainder
> -  // for negative __n).
> +  // Compute the remainder of the Euclidean division of __x + __y divided by
> +  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
> +  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
> +  // For instance, [time.cal.month.nonmembers] says that given month x and
> +  // months y, to get x + y one must calculate:
> +  //
> +  // modulo(static_cast<long long>(unsigned{x}) + (y.count() - 1), 12) + 1.
> +  //
> +  // Since y.count() is a 64-bits signed value the subtraction y.count() - 1
> +  // or the addition of this value with static_cast<long long>(unsigned{x})
> +  // might overflow.  This function can be used to avoid this problem:
> +  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
> +  // (More details in the implementation of operator+(month, months).)
> +  template <unsigned __d, typename _Tp>
>constexpr unsigned
> -  __modulo(long long __n, unsigned __d)
> -  {
> -   if (__n >= 0)
> - return __n % __d;
> -   else
> - return (__d + (__n % __d)) % __d;
> +  __add_modulo(unsigned __x, _Tp __y)
> +  {
> +   using _Up = make_unsigned_t<_Tp>;
> +   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
> +   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
> +   // doesn't overflow since the range of _Up contains many more positive
> +   // values than _Tp's.
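
The quoted patch is cut off above, so the following is only a sketch of the idea and not the `__add_modulo` from the patch. It assumes a simpler (and probably slower) strategy: reduce the signed offset first, then add. The names `add_modulo_sketch` and `month_plus` are invented for this illustration.

```cpp
#include <climits>

// Illustrative only: NOT the __add_modulo from the patch.
// Returns the Euclidean remainder of (x + y) divided by D for a signed
// integral y, without ever forming the possibly overflowing sum x + y.
template <unsigned D, typename T>
constexpr unsigned add_modulo_sketch(unsigned x, T y)
{
  static_assert(D > 0, "modulus must be nonzero");
  // y % D truncates toward zero, so r lies in (-D, D) and cannot overflow.
  const T r = y % static_cast<T>(D);
  // Shift a negative remainder into [0, D - 1] (Euclidean convention).
  const unsigned ry = static_cast<unsigned>(r < 0 ? r + static_cast<T>(D) : r);
  // Both operands are below D, so the sum stays below 2 * D.
  return (x % D + ry) % D;
}

// month + months using the rearranged "Returns" formula quoted above:
// modulo(x + (y - 1), 12) + 1 == add_modulo<12>(x + 11, y) + 1.
constexpr unsigned month_plus(unsigned month, long long months)
{
  return add_modulo_sketch<12>(month + 11, months) + 1;
}
```

Because the reduction happens before the addition, even `month_plus(1, LLONG_MIN)` is well defined; the price is an extra remainder operation, which the patch's branchless version presumably avoids.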

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-18 Thread 钟居哲

I am currently starting work on full-coverage testing (running the GCC
testsuite with different compile options) and on fixing bugs, which is
definitely the highest priority.

I am not able to find the time to review this patch for GCC-14 for now.

So, conservatively, let's postpone it to GCC-15.

If we are lucky and I stabilize RVV support quickly, we still have a chance
to land it in GCC-14.
It all depends on my review.

But no worries, I will review it eventually.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-11-18 18:32
To: Philipp Tomsich
CC: Jeff Law; juzhe.zh...@rivai.ai; gcc-patches; kito.cheng; cooper.joshua; 
Robin Dapp; jkridner
Subject: Re: RISC-V: Support XTheadVector extensions
I guess it would be worth stating my thoughts publicly:
 
I *support* adding the T-Head vector extension (a.k.a. vector 0.7) to upstream
GCC, since T-Head vector has already shipped on a large enough number of
boards; also, it's not really T-Head's problem, as Palmer described in another
mail.
 
My biggest concern before was that the T-Head folks weren't much involved in
community work, so accepting this would definitely increase the
work for maintainers. However, I see the T-Head folks are trying to
contribute to upstream now, so that may no longer be a concern. I also
believe accepting this patch will encourage them to work more with upstream,
which benefits both sides.
 
Back to one of the biggest issues for the patch set: GCC 14 or GCC
15. My general thought is that it may be OK for GCC 14 if it is
non-invasive enough, but I don't have a strong opinion, since
as you know I am not the main developer of the vector part. So I will
let Ju-Zhe make the final decision, because he is the one who
contributes the most to RISC-V vector support in GCC.

[PATCH 13/44] RISC-V/testsuite: Add branchless cases for FP cond-move operations

Verify, for short forward branch, T-Head, Ventana and Zicond targets and 
the ordered floating-point conditional-move operations that already work 
as expected, that if-conversion triggers via `noce_try_cmove' at the 
respective sufficiently high `-mbranch-cost=' settings that make 
branchless code sequences produced by if-conversion cheaper than their 
original branched equivalents, and that extraneous instructions such as 
SNEZ, etc. are not present in output.  Cover all ordered floating-point 
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdifge-sfb.c: New test.
* gcc.target/riscv/movdifge-thead.c: New test.
* gcc.target/riscv/movdifge-ventana.c: New test.
* gcc.target/riscv/movdifge-zicond.c: New test.
* gcc.target/riscv/movdifgt-sfb.c: New test.
* gcc.target/riscv/movdifgt-thead.c: New test.
* gcc.target/riscv/movdifgt-ventana.c: New test.
* gcc.target/riscv/movdifgt-zicond.c: New test.
* gcc.target/riscv/movdifle-sfb.c: New test.
* gcc.target/riscv/movdifle-thead.c: New test.
* gcc.target/riscv/movdifle-ventana.c: New test.
* gcc.target/riscv/movdifle-zicond.c: New test.
* gcc.target/riscv/movdiflt-sfb.c: New test.
* gcc.target/riscv/movdiflt-thead.c: New test.
* gcc.target/riscv/movdiflt-ventana.c: New test.
* gcc.target/riscv/movdiflt-zicond.c: New test.
* gcc.target/riscv/movdifne-sfb.c: New test.
* gcc.target/riscv/movdifne-thead.c: New test.
* gcc.target/riscv/movdifne-ventana.c: New test.
* gcc.target/riscv/movdifne-zicond.c: New test.
* gcc.target/riscv/movsifge-sfb.c: New test.
* gcc.target/riscv/movsifge-thead.c: New test.
* gcc.target/riscv/movsifge-ventana.c: New test.
* gcc.target/riscv/movsifge-zicond.c: New test.
* gcc.target/riscv/movsifgt-sfb.c: New test.
* gcc.target/riscv/movsifgt-thead.c: New test.
* gcc.target/riscv/movsifgt-ventana.c: New test.
* gcc.target/riscv/movsifgt-zicond.c: New test.
* gcc.target/riscv/movsifle-sfb.c: New test.
* gcc.target/riscv/movsifle-thead.c: New test.
* gcc.target/riscv/movsifle-ventana.c: New test.
* gcc.target/riscv/movsifle-zicond.c: New test.
* gcc.target/riscv/movsiflt-sfb.c: New test.
* gcc.target/riscv/movsiflt-thead.c: New test.
* gcc.target/riscv/movsiflt-ventana.c: New test.
* gcc.target/riscv/movsiflt-zicond.c: New test.
* gcc.target/riscv/movsifne-sfb.c: New test.
* gcc.target/riscv/movsifne-thead.c: New test.
* gcc.target/riscv/movsifne-ventana.c: New test.
* gcc.target/riscv/movsifne-zicond.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdifge-sfb.c |   26 
 gcc/testsuite/gcc.target/riscv/movdifge-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movdifge-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifge-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifgt-sfb.c |   26 
 gcc/testsuite/gcc.target/riscv/movdifgt-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movdifgt-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifgt-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifle-sfb.c |   26 
 gcc/testsuite/gcc.target/riscv/movdifle-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movdifle-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifle-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdiflt-sfb.c |   26 
 gcc/testsuite/gcc.target/riscv/movdiflt-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movdiflt-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdiflt-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifne-sfb.c |   27 +
 gcc/testsuite/gcc.target/riscv/movdifne-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movdifne-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifne-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifge-sfb.c |   26 
 gcc/testsuite/gcc.target/riscv/movsifge-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movsifge-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifge-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifgt-sfb.c |   26 
 gcc/testsuite/gcc.target/riscv/movsifgt-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movsifgt-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifgt-zicond

Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

On Sat, 2023-11-18 at 16:16 +0800, chenglulu wrote:
> Pushed to r14-5567.
> 
> On 2023/11/16 at 3:27 PM, Lulu Cheng wrote:
> > When compiling with '-mcmodel=medium', the function call is made through
> > 'pcaddu18i+jirl' if binutils supports call36, otherwise the
> > native implementation 'pcalau12i+jirl' is used.
> > 
> > gcc/ChangeLog:
> > 
> > * config.in: Regenerate.
> > * config/loongarch/loongarch-opts.h (HAVE_AS_SUPPORT_CALL36): Define 
> > macro.
> > * config/loongarch/loongarch.cc (loongarch_legitimize_call_address):
> > If binutils supports call36, the function call is not split over expand.
> > * config/loongarch/loongarch.md: Add call36 generation code.
> > * config/loongarch/predicates.md: Likewise.
> > * configure: Regenerate.
> > * configure.ac: Check whether binutils supports call36.

With this change I get some test failures with "old" Binutils 2.41:

FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test:.*la.global\\t.*g\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test1:.*la.global\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test:.*la.global\\t.*g\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test1:.*la.local\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-3.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
test1:.*la.local\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl

Something strange is happening: with -mexplicit-relocs=auto or always I
get pcalau12i + jirl as expected, but with -mexplicit-relocs=none I get
"pcaddu18i $r1,%call36(g)" and jirl.  This seems ironic (!).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Fix "-mexplicit-relocs=none -mcmodel=medium" producing %call36 when the assembler does not support it

Even if !HAVE_AS_SUPPORT_CALL36, const_call_insn_operand should still
return false when -mexplicit-relocs=none -mcmodel=medium to make
loongarch_legitimize_call_address emit la.local or la.global.

gcc/ChangeLog:

* config/loongarch/predicates.md (const_call_insn_operand):
Remove buggy "HAVE_AS_SUPPORT_CALL36" conditions.  Change "1" to
"true" to make the coding style consistent.
---

Not fully regtested, but it should be obvious and it indeed fixes the
func-call-medium-*.c test failures.  Ok for trunk?

 gcc/config/loongarch/predicates.md | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/predicates.md 
b/gcc/config/loongarch/predicates.md
index 56f7c48e126..d02e846cb12 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -444,21 +444,19 @@ (define_predicate "const_call_insn_operand"
 case SYMBOL_PCREL:
   if (TARGET_CMODEL_EXTREME
  || (TARGET_CMODEL_MEDIUM
- && HAVE_AS_SUPPORT_CALL36
  && (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)))
return false;
   else
-   return 1;
+   return true;
 
 case SYMBOL_GOT_DISP:
   if (TARGET_CMODEL_EXTREME
  || !flag_plt
  || (flag_plt && TARGET_CMODEL_MEDIUM
- && HAVE_AS_SUPPORT_CALL36
  && (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)))
return false;
   else
-   return 1;
+   return true;
 
 default:
   return false;
-- 
2.42.1

Re: [PATCH 13/44] RISC-V/testsuite: Add branchless cases for FP cond-move operations





On 11/18/23 09:50, Maciej W. Rozycki wrote:

> Verify, for short forward branch, T-Head, Ventana and Zicond targets and
> the ordered floating-point conditional-move operations that already work
> as expected, that if-conversion triggers via `noce_try_cmove' at the
> respective sufficiently high `-mbranch-cost=' settings that make
> branchless code sequences produced by if-conversion cheaper than their
> original branched equivalents, and that extraneous instructions such as
> SNEZ, etc. are not present in output.  Cover all ordered floating-point
> relational operations to make sure no corner case escapes.
>
> gcc/testsuite/
> * gcc.target/riscv/movdifge-sfb.c: New test.
> * gcc.target/riscv/movdifge-thead.c: New test.
> * gcc.target/riscv/movdifge-ventana.c: New test.
> * gcc.target/riscv/movdifge-zicond.c: New test.
> * gcc.target/riscv/movdifgt-sfb.c: New test.
> * gcc.target/riscv/movdifgt-thead.c: New test.
> * gcc.target/riscv/movdifgt-ventana.c: New test.
> * gcc.target/riscv/movdifgt-zicond.c: New test.
> * gcc.target/riscv/movdifle-sfb.c: New test.
> * gcc.target/riscv/movdifle-thead.c: New test.
> * gcc.target/riscv/movdifle-ventana.c: New test.
> * gcc.target/riscv/movdifle-zicond.c: New test.
> * gcc.target/riscv/movdiflt-sfb.c: New test.
> * gcc.target/riscv/movdiflt-thead.c: New test.
> * gcc.target/riscv/movdiflt-ventana.c: New test.
> * gcc.target/riscv/movdiflt-zicond.c: New test.
> * gcc.target/riscv/movdifne-sfb.c: New test.
> * gcc.target/riscv/movdifne-thead.c: New test.
> * gcc.target/riscv/movdifne-ventana.c: New test.
> * gcc.target/riscv/movdifne-zicond.c: New test.
> * gcc.target/riscv/movsifge-sfb.c: New test.
> * gcc.target/riscv/movsifge-thead.c: New test.
> * gcc.target/riscv/movsifge-ventana.c: New test.
> * gcc.target/riscv/movsifge-zicond.c: New test.
> * gcc.target/riscv/movsifgt-sfb.c: New test.
> * gcc.target/riscv/movsifgt-thead.c: New test.
> * gcc.target/riscv/movsifgt-ventana.c: New test.
> * gcc.target/riscv/movsifgt-zicond.c: New test.
> * gcc.target/riscv/movsifle-sfb.c: New test.
> * gcc.target/riscv/movsifle-thead.c: New test.
> * gcc.target/riscv/movsifle-ventana.c: New test.
> * gcc.target/riscv/movsifle-zicond.c: New test.
> * gcc.target/riscv/movsiflt-sfb.c: New test.
> * gcc.target/riscv/movsiflt-thead.c: New test.
> * gcc.target/riscv/movsiflt-ventana.c: New test.
> * gcc.target/riscv/movsiflt-zicond.c: New test.
> * gcc.target/riscv/movsifne-sfb.c: New test.
> * gcc.target/riscv/movsifne-thead.c: New test.
> * gcc.target/riscv/movsifne-ventana.c: New test.
> * gcc.target/riscv/movsifne-zicond.c: New test.

Is this dependent on any of the other patches in this series?  Or is it 
independent and ready to go as-is?  I ask because it's marked as 13/44 
and I haven't seen the other 43 patches in the series :-)


If it's independent and been tested, then it's OK for the trunk.  I'll 
trust your regexps :-)


jeff

Re: [pushed][PATCH] LoongArch: Increase cost of vector aligned store/load.

On Fri, 2023-11-17 at 10:21 +0800, chenglulu wrote:
> Pushed to r14-5545.
> 
> On 2023/11/16 at 4:44 PM, Jiahao Xu wrote:
> > Based on SPEC2017 performance evaluation results, it's better to make them 
> > equal
> > to the cost of unaligned store/load so as to avoid odd alignment peeling.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.cc
> > (loongarch_builtin_vectorization_cost): Adjust.

/* snip */

> > +  case vector_load:
> > +  case vector_store:
> >     case unaligned_load:
> >     case unaligned_store:
> >     return 2;

This seems to penalize vectorization and causes:

FAIL: gcc.target/loongarch/vector/lasx/lasx-xvstelm.c  -mlasx  
scan-assembler-times xvstelm.w 8

Maybe we can make unaligned_load and unaligned_store cost 1 too instead
of increasing vector_load and vector_store?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-18 Thread Arsen Arsenović


David Edelsohn  writes:

> On Fri, Nov 17, 2023 at 10:17 AM Arsen Arsenović  wrote:
>
>>
>> David Edelsohn  writes:
>>
>> > On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović  wrote:
>> >
>> >>
>> >> David Edelsohn  writes:
>> >>
>> >> > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović 
>> wrote:
>> >> >
>> >> > [snip]
>> >> >> Sure, but my patch does insert --disable-shared:
>> >> >>
>> >> >> --8<---cut here---start->8---
>> >> >> host_modules= { module= gettext; bootstrap=true; no_install=true;
>> >> >> module_srcdir= "gettext/gettext-runtime";
>> >> >> // We always build gettext with pic, because some
>> >> packages
>> >> >> (e.g. gdbserver)
>> >> >> // need it in some configuratons, which is determined
>> >> via
>> >> >> nontrivial tests.
>> >> >> // Always enabling pic seems to make sense for
>> something
>> >> >> tied to
>> >> >> // user-facing output.
>> >> >> extra_configure_flags='--disable-shared
>> --disable-java
>> >> >> --disable-csharp --with-pic';
>> >> >> lib_path=intl/.libs; };
>> >> >> --8<---cut here---end--->8---
>> >> >>
>> >> >> ... and it is applied:
>> >> >>
>> >> >> --8<---cut here---start->8---
>> >> >> -bash-5.1$ ./config.status --config
>> >> >> --srcdir=../../gcc/gettext/gettext-runtime
>> --cache-file=./config.cache
>> >> >>   --disable-werror --with-gmp=/opt/cfarm
>> >> >>   --with-libiconv-prefix=/opt/cfarm --disable-libstdcxx-pch
>> >> >>   --with-included-gettext --program-transform-name=s,y,y,
>> >> >>   --disable-option-checking --build=powerpc-ibm-aix7.3.1.0
>> >> >>   --host=powerpc-ibm-aix7.3.1.0 --target=powerpc-ibm-aix7.3.1.0
>> >> >>   --disable-intermodule --enable-checking=yes,types,extra
>> >> >>   --disable-coverage --enable-languages=c,c++
>> >> >>   --disable-build-format-warnings --disable-shared --disable-java
>> >> >>   --disable-csharp --with-pic build_alias=powerpc-ibm-aix7.3.1.0
>> >> >>   host_alias=powerpc-ibm-aix7.3.1.0
>> target_alias=powerpc-ibm-aix7.3.1.0
>> >> >>   CC=gcc CFLAGS=-g 'LDFLAGS=-static-libstdc++ -static-libgcc
>> >> >>   -Wl,-bbigtoc' 'CXX=g++ -std=c++11' CXXFLAGS=-g
>> >> >> --8<---cut here---end--->8---
>> >> >>
>> >> >> I'm unsure how to tell what the produced binaries are w.r.t static or
>> >> >> shared, but I only see .o files inside intl/.libs/libintl.a, while I
>> see
>> >> >> a .so.1 in (e.g.) /lib/libz.a, hinting at it not being shared (?)
>> >> >>
>> >> >
>> >> > An AIX shared library created by libtool will look like
>> >> > libfoo.a[libfoo.so.N], where N is the package major version number.
>> >> > Normally with one file.
>> >>
>> >> > An AIX static library will look like libfoo.a[a.o, b.o, c.o]
>> >> > with multiple object files.
>> >> >
>> >> > An AIX archive can contain a combination of shared objects and
>> >> > normal object files.
>> >> >
>> >> > AIX normally uses the convention shr.o or shr_64.o for the name
>> >> > of the shared object file.  Hint, hint, an AIX archive can contain
>> >> > both 32 bit and 64 bit object files or shared objects.
>> >> >
>> >> > I don't know why the gettext build system would create
>> >> > /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8)
>> >> > if --disable-shared was requested.  That clearly is using the
>> >> > naming of a libtool AIX shared object and failing due to
>> >> > the missing shared object.  Although in this case, the problem
>> >> > seems to be the shared library load path.  AIX uses LIBPATH,
>> >> > not LD_LIBRARY_PATH.
>> >>
>> >> It doesn't create libintl.a with a libintl.so.8 inside of it.  The
>> >> libintl.a contains a bunch of objects, as I'd expect of a static
>> >> library:
>> >>
>> >> --8<---cut here---start->8---
>> >> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a  | grep libintl
>> >> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a
>> >> bindtextdom.o
>> >> dcgettext.o
>> >> ...
>> >> --8<---cut here---end--->8---
>> >>
>> >>
>> >> > Also, for me, the out of tree path was
>> >> >
>> >> > gettext/gettext-runtime/intl/.libs
>> >> >
>> >> > Is your search path missing a level?
>> >>
>> >> No, the above is generated by the GCC build system and builds
>> >> gettext-runtime directly (per Bruno's recommendation a while ago) as it
>> >> is replacing intl/ of similar functionality.
>> >>
>> >> I'm currently building GCC with libintl with the threads hack you
>> >> mentioned applied (as I got undefined references to the pthread
>> >> functions you discovered).  I suspect that, bar this issue (which, IIUC,
>> >> Bruno will fix in a new release?) the patch above will fix the issues
>> >> you've encountered on AIX (note that if you want to use gettext in-tree,
>> >> you'd still have to fetch gettext into the tree).
>> >>
>> >> Maybe we shou

[PATCH] c-family, middle-end: Add __builtin_c[lt]zg (arg, 0ULL) exception

Hi!

In https://sourceware.org/pipermail/libc-alpha/2023-November/152819.html
Florian Weimer raised the concern that the type-generic stdbit.h macros
currently being considered suffer from a problem similar to that of the old
tgmath.h implementation: the macros expand their arguments multiple times
during preprocessing, and if one nests these stdbit.h type-generic
macros several times, that can result in extremely large preprocessed source
and long compile times, even when the argument is only actually evaluated
once at runtime for side-effects.

As I'd strongly prefer not to add new builtins for all the 14 stdbit.h
type-generic macros, I think it is better to build the macros from
smaller building blocks.

The following patch adds the first one.
While one can, say, implement e.g. the stdc_leading_zeros(value) macro as
((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
that expands the argument 3 times, and even if it just used
((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) -1)))
relying on two's complement, that is still twice.

I'd prefer not to add an optional 3rd argument to these, but given that the
second argument, if specified, right now has to have signed int type,
the following patch adds an exception that allows 0ULL as a magic
value for that argument, meaning "fill in the precision of the first argument".
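
To make the role of that fallback value concrete, here is a portable sketch of what stdc_leading_zeros has to return: the count of leading zero bits, or the full precision of the operand's type when the value is zero. It deliberately does not use `__builtin_clzg` or the proposed 0ULL form, so it says nothing about how the builtin is implemented; `leading_zeros_sketch` is a name invented for this illustration.

```cpp
#include <limits>
#include <type_traits>

// Portable stand-in for the value stdc_leading_zeros(value) must produce.
// For value == 0 the loop never runs and the result is the precision of T,
// which is exactly the number the second __builtin_clzg argument supplies.
template <typename T>
constexpr unsigned leading_zeros_sketch(T value)
{
  static_assert(std::is_unsigned<T>::value, "unsigned operands only");
  unsigned width = 0;  // number of bits needed to represent value
  for (; value != 0; value >>= 1)
    ++width;
  return std::numeric_limits<T>::digits - width;
}
```

The precision depends only on the type of the first argument, which is why letting the compiler fill it in avoids repeating the argument tokens inside the macro.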

Ok for trunk if it passes bootstrap/regtest?

2023-11-18  Jakub Jelinek  

PR c/111309
gcc/
* builtins.cc (fold_builtin_bit_query): If arg1 is 0ULL, use
TYPE_PRECISION (arg0_type) instead of it.
* fold-const-call.cc (fold_const_call_sss): Rename arg0_type
argument to arg_type, add arg1_type argument, if for CLZ/CTZ
second argument is unsigned long long, use
TYPE_PRECISION (arg0_type).
(fold_const_call_1): Pass also TREE_TYPE (arg1) to
fold_const_call_sss.
* doc/extend.texi (__builtin_clzg, __builtin_ctzg): Document
behavior for second argument 0ULL.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): If args[1] is
0ULL, use TYPE_PRECISION (TREE_TYPE (args[0])) instead of it.
gcc/testsuite/
* c-c++-common/pr111309-3.c: New test.
* gcc.dg/torture/bitint-43.c: Add tests with 0ULL second argument.

--- gcc/builtins.cc.jj  2023-11-14 10:52:16.170276318 +0100
+++ gcc/builtins.cc 2023-11-18 13:55:02.996395917 +0100
@@ -9591,6 +9591,10 @@ fold_builtin_bit_query (location_t loc,
 case BUILT_IN_CLZG:
   if (arg1 && TREE_CODE (arg1) != INTEGER_CST)
return NULL_TREE;
+  if (arg1
+ && (TYPE_MAIN_VARIANT (TREE_TYPE (arg1))
+ == long_long_unsigned_type_node))
+   arg1 = build_int_cst (integer_type_node, TYPE_PRECISION (arg0_type));
   ifn = IFN_CLZ;
   fcodei = BUILT_IN_CLZ;
   fcodel = BUILT_IN_CLZL;
@@ -9599,6 +9603,10 @@ fold_builtin_bit_query (location_t loc,
 case BUILT_IN_CTZG:
   if (arg1 && TREE_CODE (arg1) != INTEGER_CST)
return NULL_TREE;
+  if (arg1
+ && (TYPE_MAIN_VARIANT (TREE_TYPE (arg1))
+ == long_long_unsigned_type_node))
+   arg1 = build_int_cst (integer_type_node, TYPE_PRECISION (arg0_type));
   ifn = IFN_CTZ;
   fcodei = BUILT_IN_CTZ;
   fcodel = BUILT_IN_CTZL;
--- gcc/fold-const-call.cc.jj   2023-11-14 10:52:16.186276097 +0100
+++ gcc/fold-const-call.cc  2023-11-18 13:49:57.514641417 +0100
@@ -1543,13 +1543,13 @@ fold_const_call_sss (real_value *result,
 
   *RESULT = FN (ARG0, ARG1)
 
-   where ARG_TYPE is the type of ARG0 and PRECISION is the number of bits in
-   the result.  Return true on success.  */
+   where ARG0_TYPE is the type of ARG0, ARG1_TYPE is the type of ARG1 and
+   PRECISION is the number of bits in the result.  Return true on success.  */
 
 static bool
 fold_const_call_sss (wide_int *result, combined_fn fn,
 const wide_int_ref &arg0, const wide_int_ref &arg1,
-unsigned int precision, tree arg_type ATTRIBUTE_UNUSED)
+unsigned int precision, tree arg0_type, tree arg1_type)
 {
   switch (fn)
 {
@@ -1559,6 +1559,8 @@ fold_const_call_sss (wide_int *result, c
int tmp;
if (wi::ne_p (arg0, 0))
  tmp = wi::clz (arg0);
+   else if (TYPE_MAIN_VARIANT (arg1_type) == long_long_unsigned_type_node)
+ tmp = TYPE_PRECISION (arg0_type);
else
  tmp = arg1.to_shwi ();
*result = wi::shwi (tmp, precision);
@@ -1571,6 +1573,8 @@ fold_const_call_sss (wide_int *result, c
int tmp;
if (wi::ne_p (arg0, 0))
  tmp = wi::ctz (arg0);
+   else if (TYPE_MAIN_VARIANT (arg1_type) == long_long_unsigned_type_node)
+ tmp = TYPE_PRECISION (arg0_type);
else
  tmp = arg1.to_shwi ();
*result = wi::shwi (tmp, precision);
@@ -1625,7 +1629,7 @@ fold_const_call_1 (combined_fn fn, tree
  wide_int result;

[PATCH] c: Add __builtin_bit_complement

Hi!

Another obstacle mentioned in the
https://sourceware.org/pipermail/libc-alpha/2023-November/152819.html
thread is that half of the stdc_{leading,trailing}_{zeros,ones},
stdc_first_{leading,trailing}_{zero,one} and stdc_count_{zeros,ones}
type-generic macros actually need to use the __builtin_*g type-generic
builtins on inverted values, and have to use
(__typeof (value)) ~(value)
for that, because a value of unsigned char or unsigned short type
is promoted to int by the ~ operator; that form suffers from the
nested macros problem as well.

The following patch adds a builtin for that.  It is C only (if we'd want
to make stdbit.h work in C++, I think best would be to use a function
template), especially because the builtin needs a type-generic return type.
It is functionally equivalent to (__typeof (arg)) ~(arg).
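
The promotion issue, and the function-template route mentioned for C++, can be sketched as follows. This is an illustration only, not part of the patch, and `bit_complement_sketch` is a hypothetical name.

```cpp
#include <type_traits>

// Applying ~ to an unsigned char operand promotes it to int first, so the
// result type is int rather than unsigned char.
static_assert(std::is_same<decltype(~static_cast<unsigned char>(0)), int>::value,
              "~ promotes narrow operands to int");

// Function-template counterpart of (__typeof (arg)) ~(arg): the cast back
// to T discards the bits introduced by the promotion.
template <typename T>
constexpr T bit_complement_sketch(T value)
{
  static_assert(std::is_integral<T>::value, "integral operands only");
  return static_cast<T>(~value);
}
```

A template names its argument once and deduces the return type, which is what the C builtin has to provide by other means because C has no templates.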

Ok for trunk if it passes bootstrap/regtest?

2023-11-18  Jakub Jelinek  

gcc/
* doc/extend.texi (__builtin_bit_complement): Document.
gcc/c-family/
* c-common.h (enum rid): Add RID_BUILTIN_BIT_COMPLEMENT.
* c-common.cc (c_common_reswords): Add __builtin_bit_complement.
Move __builtin_assoc_barrier alphabetically.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle
RID_BUILTIN_BIT_COMPLEMENT.
* c-decl.cc (names_builtin_p): Likewise.  Move
RID_BUILTIN_ASSOC_BARRIER alphabetically.
gcc/testsuite/
* gcc.dg/builtin-bit-complement-1.c: New test.
* gcc.dg/builtin-bit-complement-2.c: New test.

--- gcc/doc/extend.texi.jj  2023-11-18 13:17:40.982551766 +0100
+++ gcc/doc/extend.texi 2023-11-18 18:39:24.303561182 +0100
@@ -15066,6 +15066,14 @@ unsigned integer (standard, extended or
 promotions are performed on the argument.
 @enddefbuiltin
 
+@defbuiltin{@var{type} __builtin_bit_complement (@var{type} @var{arg})}
+The @code{__builtin_bit_complement} function is available only
+in C.  It is type-generic, the argument can be any integral type, and
+is equivalent to @code{(__typeof (@var{arg})) ~(@var{arg})}, except that
+there is no need to specify the argument tokens twice.  No integral argument
+promotions are performed on the argument.
+@enddefbuiltin
+
 @defbuiltin{double __builtin_powi (double, int)}
 @defbuiltinx{float __builtin_powif (float, int)}
 @defbuiltinx{{long double} __builtin_powil (long double, int)}
--- gcc/c-family/c-common.h.jj  2023-11-09 09:04:18.422546151 +0100
+++ gcc/c-family/c-common.h 2023-11-18 17:05:00.223049400 +0100
@@ -110,6 +110,7 @@ enum rid
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, 
RID_BUILTIN_SHUFFLE,
   RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
   RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,
+  RID_BUILTIN_BIT_COMPLEMENT,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
--- gcc/c-family/c-common.cc.jj 2023-11-18 13:20:55.751844490 +0100
+++ gcc/c-family/c-common.cc  2023-11-18 17:03:17.838466559 +0100
@@ -380,7 +380,9 @@ const struct c_common_resword c_common_r
   { "__attribute__",   RID_ATTRIBUTE,  0 },
   { "__auto_type", RID_AUTO_TYPE,  D_CONLY },
   { "__builtin_addressof", RID_ADDRESSOF, D_CXXONLY },
+  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_bit_cast", RID_BUILTIN_BIT_CAST, D_CXXONLY },
+  { "__builtin_bit_complement", RID_BUILTIN_BIT_COMPLEMENT, D_CONLY },
   { "__builtin_call_with_static_chain",
 RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
@@ -388,7 +390,6 @@ const struct c_common_resword c_common_r
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
-  { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
--- gcc/c/c-parser.cc.jj  2023-11-11 08:51:54.324206082 +0100
+++ gcc/c/c-parser.cc   2023-11-18 17:16:23.228591823 +0100
@@ -11743,6 +11743,46 @@ c_parser_postfix_expression (c_parser *p
set_c_expr_source_range (&expr, start_loc, end_loc);
  }
  break;
+   case RID_BUILTIN_BIT_COMPLEMENT:
+ {
+   vec<c_expr_t, va_gc> *cexpr_list;
+   c_expr_t *arg_p;
+   location_t close_paren_loc;
+
+   c_parser_consume_token (parser);
+   if (!c_parser_get_builtin_args (parser,
+   "__builtin_bit_complement",
+   &cexpr_list, false,
+   &close_paren_loc))
+ {
+   expr.set_error ();
+   break;
+ }
+
+   if (vec_safe_length (cexpr_list) != 1)
+ {
+   error_at (loc, "wrong

[PATCH] c: Add __builtin_stdc_bit_{width,floor,ceil} builtins

Hi!

For these 3 type-generic macros I'm out of ideas for how to satisfy
all the requirements (no use of ({ ... }), not expanding the argument
multiple times, not evaluating side-effects multiple times) using some
small building blocks, so the following patch introduces 3 new C-only
builtins which can be used to implement those macros.

As a bonus, I think with this and the previous 2 patches all the type-generic
stdbit.h macros can actually be used in constant expressions.
In theory the __builtin_stdc_bit_width builtin could be a C/C++ builtin in
builtins.def, given that it has an unsigned return type rather than the
type-generic type.  But because its implementation is very similar to the other
two and it is not really useful for C++, I chose to implement all 3 as C-only
builtins.
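
To make the arithmetic behind the three builtins concrete, here is a minimal
sketch in plain C, restricted to unsigned int.  The sketch_* names are made up
for illustration and are not part of the patch; the real builtins are
type-generic and are documented in terms of __builtin_clzg rather than a loop.
The assert in sketch_bit_ceil is an assumption of this sketch (the result has
to fit in the type), not a statement about what the builtin does in that case.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits needed to represent x: 0 for x == 0,
   otherwise 1 + floor (log2 (x)).  */
static unsigned int
sketch_bit_width (unsigned int x)
{
  unsigned int width = 0;
  while (x != 0)
    {
      ++width;
      x >>= 1;
    }
  return width;
}

/* Largest power of two not greater than x, or 0 for x == 0.  */
static unsigned int
sketch_bit_floor (unsigned int x)
{
  return x == 0 ? 0u : 1u << (sketch_bit_width (x) - 1);
}

/* Smallest power of two not less than x.  This sketch assumes the
   result is representable in unsigned int.  */
static unsigned int
sketch_bit_ceil (unsigned int x)
{
  if (x <= 1)
    return 1u;
  /* Same idea as prec - clz (x - 1) in the documentation.  */
  unsigned int width = sketch_bit_width (x - 1);
  assert (width < sizeof (unsigned int) * CHAR_BIT);
  return 1u << width;
}
```

For example, sketch_bit_width (255) is 8, sketch_bit_floor (5) is 4 and
sketch_bit_ceil (5) is 8.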

Ok for trunk if it passes bootstrap/regtest?

2023-11-18  Jakub Jelinek  

gcc/
* doc/extend.texi (__builtin_stdc_bit_width, __builtin_stdc_bit_floor,
__builtin_stdc_bit_ceil): Document.
gcc/c-family/
* c-common.h (enum rid): Add RID_BUILTIN_STDC_BIT_WIDTH,
RID_BUILTIN_STDC_BIT_FLOOR and RID_BUILTIN_STDC_BIT_CEIL.
* c-common.cc (c_common_reswords): Add __builtin_stdc_bit_width,
__builtin_stdc_bit_floor and __builtin_stdc_bit_ceil.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle
RID_BUILTIN_STDC_BIT_WIDTH, RID_BUILTIN_STDC_BIT_FLOOR and
RID_BUILTIN_STDC_BIT_CEIL.
* c-decl.cc (names_builtin_p): Likewise.
gcc/testsuite/
* gcc.dg/builtin-stdc-bit-1.c: New test.
* gcc.dg/builtin-stdc-bit-2.c: New test.

--- gcc/doc/extend.texi.jj  2023-11-18 19:29:08.056176234 +0100
+++ gcc/doc/extend.texi 2023-11-18 19:35:08.454176312 +0100
@@ -15074,6 +15074,37 @@ there is no need to specify the argument
 promotions are performed on the argument.
 @enddefbuiltin
 
+@defbuiltin{unsigned int __builtin_stdc_bit_width (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_width} function is available only
+in C.  It is type-generic, the argument can be any unsigned integer
+(standard, extended or bit-precise).  No integral argument promotions are
+performed on the argument.  It is equivalent to
+@code{(unsigned int) (@var{prec} - __builtin_clzg (@var{arg}, @var{prec}))}
+where @var{prec} is bit width of @var{type}.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_stdc_bit_floor (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_floor} function is available only
+in C.  It is type-generic, the argument can be any unsigned integer
+(standard, extended or bit-precise).  No integral argument promotions are
+performed on the argument.  It is equivalent to
+@code{@var{arg} == 0 ? (@var{type}) 0
+: (@var{type}) 1 << (@var{prec} - 1 - __builtin_clzg (@var{arg}))}
+where @var{prec} is bit width of @var{type}, except that side-effects
+in @var{arg} are evaluated just once.
+@enddefbuiltin
+
+@defbuiltin{@var{type} __builtin_stdc_bit_ceil (@var{type} @var{arg})}
+The @code{__builtin_stdc_bit_ceil} function is available only
+in C.  It is type-generic, the argument can be any unsigned integer
+(standard, extended or bit-precise).  No integral argument promotions are
+performed on the argument.  It is equivalent to
+@code{@var{arg} <= 1 ? (@var{type}) 1
+: (@var{type}) 1 << (@var{prec} - __builtin_clzg ((@var{type}) (@var{arg} - 1)))}
+where @var{prec} is bit width of @var{type}, except that side-effects
+in @var{arg} are evaluated just once.
+@enddefbuiltin
+
 @defbuiltin{double __builtin_powi (double, int)}
 @defbuiltinx{float __builtin_powif (float, int)}
 @defbuiltinx{{long double} __builtin_powil (long double, int)}
--- gcc/c-family/c-common.h.jj  2023-11-18 17:05:00.223049400 +0100
+++ gcc/c-family/c-common.h 2023-11-18 18:46:53.162346148 +0100
@@ -110,7 +110,8 @@ enum rid
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,  RID_BUILTIN_SHUFFLE,
   RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
   RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_ASSOC_BARRIER,
-  RID_BUILTIN_BIT_COMPLEMENT,
+  RID_BUILTIN_BIT_COMPLEMENT,  RID_BUILTIN_STDC_BIT_WIDTH,
+  RID_BUILTIN_STDC_BIT_CEIL,   RID_BUILTIN_STDC_BIT_FLOOR,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
--- gcc/c-family/c-common.cc.jj 2023-11-18 17:03:17.838466559 +0100
+++ gcc/c-family/c-common.cc 2023-11-18 18:47:47.809588817 +0100
@@ -392,6 +392,9 @@ const struct c_common_resword c_common_r
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
+  { "__builtin_stdc_bit_ceil", RID_BUILTIN_STDC_BIT_CEIL, D_CONLY },
+  { "__builtin_stdc_bit_floor", RID_BUILTIN_STDC_BIT_FLOOR, D_CONLY },
+  { "__builtin_stdc_bit_width", RID_BUILTIN_STDC_BIT_WIDTH, D_CONLY },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_C

[PATCH] libstdc++: implement std::generator

2023-11-18 Thread Arsen Arsenović

libstdc++-v3/ChangeLog:

* include/Makefile.am: Install std/generator, bits/elements_of.h
as freestanding.
* include/Makefile.in: Regenerate.
* include/bits/version.def: Add __cpp_lib_generator.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Include <generator>.
* include/std/ranges: Include bits/elements_of.h.
* include/bits/elements_of.h: New file.
* include/std/generator: New file.
* testsuite/24_iterators/range_generators/01.cc: New test.
* testsuite/24_iterators/range_generators/02.cc: New test.
* testsuite/24_iterators/range_generators/copy.cc: New test.
* testsuite/24_iterators/range_generators/except.cc: New test.
* testsuite/24_iterators/range_generators/synopsis.cc: New test.
* testsuite/24_iterators/range_generators/subrange.cc: New test.
---
Evening,

This is an implementation of <generator> from C++23.  It should be
feature-complete, though it doesn't have all the tests that it ought to
and is missing a few tweaks.

Posting to get reviews in the meanwhile, in case something obvious was
missed.

Have a lovely night :-)

 libstdc++-v3/include/Makefile.am  |   2 +
 libstdc++-v3/include/Makefile.in  |   2 +
 libstdc++-v3/include/bits/elements_of.h   |  72 ++
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |  11 +
 libstdc++-v3/include/precompiled/stdc++.h |   1 +
 libstdc++-v3/include/std/generator| 820 ++
 libstdc++-v3/include/std/ranges   |   4 +
 .../24_iterators/range_generators/01.cc   |  55 ++
 .../24_iterators/range_generators/02.cc   | 219 +
 .../24_iterators/range_generators/copy.cc |  97 +++
 .../24_iterators/range_generators/except.cc   |  97 +++
 .../24_iterators/range_generators/subrange.cc |  45 +
 .../24_iterators/range_generators/synopsis.cc |  38 +
 14 files changed, 1472 insertions(+)
 create mode 100644 libstdc++-v3/include/bits/elements_of.h
 create mode 100644 libstdc++-v3/include/std/generator
 create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/01.cc
 create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/02.cc
 create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/copy.cc
 create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/except.cc
 create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/subrange.cc
 create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/synopsis.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 17d9d9cec313..0b764f2b8a9e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -35,6 +35,7 @@ std_freestanding = \
${std_srcdir}/coroutine \
${std_srcdir}/expected \
${std_srcdir}/functional \
+   ${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
${std_srcdir}/memory \
@@ -122,6 +123,7 @@ bits_freestanding = \
${bits_srcdir}/concept_check.h \
${bits_srcdir}/char_traits.h \
${bits_srcdir}/cpp_type_traits.h \
+   ${bits_srcdir}/elements_of.h \
${bits_srcdir}/enable_special_members.h \
${bits_srcdir}/functexcept.h \
${bits_srcdir}/functional_hash.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index f038af709cc4..7f1a6592942e 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -393,6 +393,7 @@ std_freestanding = \
${std_srcdir}/coroutine \
${std_srcdir}/expected \
${std_srcdir}/functional \
+   ${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
${std_srcdir}/memory \
@@ -477,6 +478,7 @@ bits_freestanding = \
${bits_srcdir}/concept_check.h \
${bits_srcdir}/char_traits.h \
${bits_srcdir}/cpp_type_traits.h \
+   ${bits_srcdir}/elements_of.h \
${bits_srcdir}/enable_special_members.h \
${bits_srcdir}/functexcept.h \
${bits_srcdir}/functional_hash.h \
diff --git a/libstdc++-v3/include/bits/elements_of.h b/libstdc++-v3/include/bits/elements_of.h
new file mode 100644
index ..663e15a94aa7
--- /dev/null
+++ b/libstdc++-v3/include/bits/elements_of.h
@@ -0,0 +1,72 @@
+// Tag type for yielding ranges rather than values in   -*- C++ -*-
+
+// Copyright (C) 2023 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied

C runtime checking for assignment of VM types

This is another revised series for checking for
bounds consistency when assigning VM types.

Based on feedback, I disentangled this from UBSan for
three reasons:

- I think it makes sense as a stand-alone feature
similar to other run-time instrumentation features
GCC already has.

- Not all checks are, strictly speaking, for UB, i.e. the
instrumentation also triggers for strictly conforming code
which has inconsistent bounds.  For this feature, it makes
sense to assume that bounds are correct (and GCC has already
warned about inconsistently declared bounds by default
for a while).

- So far, there is no upstream support in libubsan
which we could use.



Bootstrapped and regression tested on x86_64.


Martin

[PATCH 1/4] c: runtime checking for assignment of VM types 1/4




When checking compatibility of types during assignment, collect
all pairs of types where the outermost bound needs to match at
run-time.  This list is then processed to add runtime checks for
each bound.
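
As an illustration of the kind of assignment this targets, here is a small
sketch in plain C that does by hand what the instrumentation is meant to do
for `char (*p)[n] = &a;`.  The helper names (vm_bounds_agree,
assign_to_vm_pointer) are hypothetical and do not appear in the patch, and the
sketch returns 0 where the instrumented code would call __builtin_trap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the emitted check: nonzero when the two
   outermost bounds agree.  The patch traps instead of returning.  */
static int
vm_bounds_agree (size_t dst_bound, size_t src_bound)
{
  return dst_bound == src_bound;
}

/* Models the assignment `char (*p)[n] = &a;' for `char a[3]'.
   Returns 1 if the assignment was consistent and carried out,
   0 where the instrumented program would trap.  */
static int
assign_to_vm_pointer (int n)
{
  char a[3];
  assert (n > 0);

  if (!vm_bounds_agree ((size_t) n, sizeof a / sizeof a[0]))
    return 0;

  /* The bounds agree here, so the variably-modified pointer type
     and the type of &a describe the same array.  */
  char (*p)[n] = &a;
  (*p)[0] = 'x';
  return a[0] == 'x';
}
```

With this sketch, assign_to_vm_pointer (3) goes through, while
assign_to_vm_pointer (4) takes the path where the run-time check would fire.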

gcc/c-family:
* c.opt (fvla-bounds): New flag.

gcc/c:
* c-typeck.cc (struct instrument_data): New structure.
(comp_target_types_instr, convert_for_assignment_instrument): New
interfaces for existing functions.
(struct comptypes_data): Add instrumentation.
(comptypes_check_enum_int_intr): New interface.
(comptypes_check_enum_int): Old interface (calls new).
(comptypes_internal): Collect VLA types needed for UBSan.
(comp_target_types_instr): New interface.
(comp_target_types): Old interface (calls new).
(function_types_compatible_p): No instrumentation for function
arguments.
(process_vm_constraints): New function.
(convert_argument): Adapt.
(convert_for_assignment_instrument): New interface.
(convert_for_assignment): Instrument assignments.
(c_instrument_vm_assign): Helper function.
(process_vm_constraints): Helper function.

gcc/doc/:
* invoke.texi (fvla-bounds): Document new flag.

gcc/testsuite:
* gcc.dg/vla-bounds-1.c: New test.
* gcc.dg/vla-bounds-assign-1.c: New test.
* gcc.dg/vla-bounds-assign-2.c: New test.
* gcc.dg/vla-bounds-assign-3.c: New test.
* gcc.dg/vla-bounds-assign-4.c: New test.
* gcc.dg/vla-bounds-func-1.c: New test.
* gcc.dg/vla-bounds-init-1.c: New test.
* gcc.dg/vla-bounds-init-2.c: New test.
* gcc.dg/vla-bounds-init-3.c: New test.
* gcc.dg/vla-bounds-init-4.c: New test.
* gcc.dg/vla-bounds-nest-1.c: New test.
* gcc.dg/vla-bounds-nest-2.c: New test.
* gcc.dg/vla-bounds-ret-1.c: New test.
* gcc.dg/vla-bounds-ret-2.c: New test.
---
 gcc/c-family/c.opt |   4 +
 gcc/c/c-typeck.cc  | 171 ++---
 gcc/doc/invoke.texi|  15 ++
 gcc/testsuite/gcc.dg/vla-bounds-1.c|  85 ++
 gcc/testsuite/gcc.dg/vla-bounds-assign-1.c | 126 +++
 gcc/testsuite/gcc.dg/vla-bounds-assign-2.c | 126 +++
 gcc/testsuite/gcc.dg/vla-bounds-assign-3.c | 126 +++
 gcc/testsuite/gcc.dg/vla-bounds-assign-4.c | 133 
 gcc/testsuite/gcc.dg/vla-bounds-func-1.c   |  56 +++
 gcc/testsuite/gcc.dg/vla-bounds-init-1.c   | 125 +++
 gcc/testsuite/gcc.dg/vla-bounds-init-2.c   | 125 +++
 gcc/testsuite/gcc.dg/vla-bounds-init-3.c   | 126 +++
 gcc/testsuite/gcc.dg/vla-bounds-init-4.c   | 125 +++
 gcc/testsuite/gcc.dg/vla-bounds-nest-1.c   |  39 +
 gcc/testsuite/gcc.dg/vla-bounds-nest-2.c   |  33 
 gcc/testsuite/gcc.dg/vla-bounds-ret-1.c| 132 
 gcc/testsuite/gcc.dg/vla-bounds-ret-2.c| 133 
 17 files changed, 1661 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-assign-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-assign-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-assign-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-assign-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-init-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-init-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-init-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-init-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-nest-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-nest-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-ret-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-ret-2.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index b10c6057cd1..29bc0956181 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2280,6 +2280,10 @@ fvisibility-ms-compat
 C++ ObjC++ Var(flag_visibility_ms_compat)
 Changes visibility to match Microsoft Visual Studio by default.
 
+fvla-bounds
+C Var(flag_vla_bounds)
+Emit run-time consistency checks for variably-modified types.
+
 fvtable-gc
 C++ ObjC++ WarnRemoved
 No longer supported.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 1dbb4471a88..cb5887b6255 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -93,11 +93,13 @@ static tree qualify_type (tree, tree);
 struct comptypes_data;
 static bool tagged_types_tu_compatible_p (const_tree, const_tree,
  struct comptypes_data *);
-static bool comp_target_types (location_t, tree, tree);
 static bool function_types_compatible_p (const_tree, const_tree,
 struct comptypes_data *);
 static bool type_lists_compatible_p (const_tree, const_tree,

[PATCH 2/4] c: runtime checking for assignment of VM types 2/4




Support instrumentation of function arguments for functions
called via a declaration.  We can support only simple size
expressions without side effects, because the run-time
instrumentation is done before the call, but the expressions
are evaluated in the callee.
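
A rough sketch of the caller-side idea in plain C, for a callee declared as
`fill (int n, char (*a)[n])`: because the size expression is just the
parameter n, the caller can compare the argument value with the actual bound
before the call is made.  The names fill and checked_call are hypothetical,
and the -1 return stands in for the trap the instrumentation would emit.

```c
#include <assert.h>
#include <stddef.h>

/* Callee with a variably-modified parameter whose bound is the simple,
   side-effect-free expression n.  Returns the size it believes the
   array has.  */
static size_t
fill (int n, char (*a)[n])
{
  assert (n > 0);
  for (int i = 0; i < n; i++)
    (*a)[i] = 0;
  return sizeof *a;
}

/* Hypothetical caller: the bound check happens before the call, using
   the argument value for n and the known bound of buf.  */
static int
checked_call (int n)
{
  char buf[3];

  if ((size_t) n != sizeof buf)
    return -1;	/* The instrumented code would trap here.  */

  return (int) fill (n, &buf);
}
```

Here checked_call (3) performs the call, and checked_call (4) is rejected
before fill can write past the end of buf.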

gcc/c:
* c-typeck.cc (process_vm_constraints): Add support
for instrumenting function arguments.
(convert_arguments): Instrument function arguments.
(convert_for_assignment): Adapt.

gcc/testsuite/gcc.dg:
* vla-bounds-func-1.c: Update.
* vla-bounds-func-2.c: New test.
* vla-bounds-func-3.c: New test.
* vla-bounds-func-4.c: New test.
* vla-bounds-func-5.c: New test.
* vla-bounds-func-6.c: New test.
* vla-bounds-func-7.c: New test.
* vla-bounds-func-8.c: New test.
* vla-bounds-func-9.c: New test.
* vla-bounds-func-10.c: New test.
* vla-bounds-func-11.c: New test.
* vla-bounds-func-12.c: New test.
* vla-bounds-func-13.c: New test.
* vla-bounds-func-14.c: New test.
* vla-bounds-func-15.c: New test.
* vla-bounds-func-16.c: New test.
* vla-bounds-func-17.c: New test.
* vla-bounds-func-18.c: New test.
* vla-bounds-func-19.c: New test.
* vla-bounds-func-20.c: New test.
---
 gcc/c/c-typeck.cc | 151 +++---
 gcc/testsuite/gcc.dg/vla-bounds-func-1.c  |   4 +-
 gcc/testsuite/gcc.dg/vla-bounds-func-10.c |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-11.c |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-12.c |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-13.c |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-14.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-15.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-16.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-17.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-18.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-19.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-2.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-20.c |  45 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-3.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-4.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-5.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-6.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-7.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-8.c  |  99 ++
 gcc/testsuite/gcc.dg/vla-bounds-func-9.c  |  99 ++
 21 files changed, 1641 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-11.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-12.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-13.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-14.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-15.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-16.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-17.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-18.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-19.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-20.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-func-9.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index cb5887b6255..b65fc450940 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -1067,19 +1067,13 @@ common_type (tree t1, tree t2)
 /* Instrument assignment of variably modified types.  */
 
 static tree
-c_instrument_vm_assign (location_t loc, tree a, tree b)
+c_instrument_vm_assign (location_t loc, tree a, tree b, tree as, tree bs)
 {
   gcc_assert (flag_vla_bounds);
 
   gcc_assert (TREE_CODE (a) == ARRAY_TYPE);
   gcc_assert (TREE_CODE (b) == ARRAY_TYPE);
 
-  tree as = TYPE_MAX_VALUE (TYPE_DOMAIN (a));
-  tree bs = TYPE_MAX_VALUE (TYPE_DOMAIN (b));
-
-  as = fold_build2 (PLUS_EXPR, sizetype, as, size_one_node);
-  bs = fold_build2 (PLUS_EXPR, sizetype, bs, size_one_node);
-
   tree t = build2 (NE_EXPR, boolean_type_node, as, bs);
   tree tt = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
 
@@ -3286,7 +3280,8 @@ static tree
 convert_argument (location_t ploc, tree function, tree fundecl,
  tree type, tree origtype, tree val, tree valtype,
  bool npc, tree rname, int parmnum, int argnum,
- boo

[PATCH 3/4] c: runtime checking for assignment of VM types 3/4




Support instrumentation of functions called via pointers.  To do so,
record the declaration with the parameter types, so that it can be
retrieved later.

gcc/c:
* c-decl.cc (get_parm_info): Record function declaration
for arguments.
* c-typeck.cc (process_vm_constraints): Instrument functions
called via pointers.

gcc/testsuite/gcc.dg:
* vla-bounds-func-1.c: Add warning.
* vla-bounds-fnptr.c: New test.
* vla-bounds-fnptr-1.c: New test.
* vla-bounds-fnptr-2.c: New test.
* vla-bounds-fnptr-3.c: New test.
* vla-bounds-fnptr-4.c: New test.
* vla-bounds-fnptr-5.c: New test.
---
 gcc/c/c-decl.cc   |  4 ++
 gcc/c/c-typeck.cc | 14 +++-
 gcc/testsuite/gcc.dg/vla-bounds-fnptr-1.c | 78 +++
 gcc/testsuite/gcc.dg/vla-bounds-fnptr-2.c | 78 +++
 gcc/testsuite/gcc.dg/vla-bounds-fnptr-3.c | 78 +++
 gcc/testsuite/gcc.dg/vla-bounds-fnptr-4.c | 78 +++
 gcc/testsuite/gcc.dg/vla-bounds-fnptr-5.c | 78 +++
 gcc/testsuite/gcc.dg/vla-bounds-fnptr.c   | 78 +++
 gcc/testsuite/gcc.dg/vla-bounds-func-1.c  |  2 +-
 9 files changed, 485 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-fnptr-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-fnptr-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-fnptr-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-fnptr-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-fnptr-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vla-bounds-fnptr.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 64d3a941cb9..84a30f7476a 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -8549,6 +8549,10 @@ get_parm_info (bool ellipsis, tree expr)
 declared types.  The back end may override this later.  */
  DECL_ARG_TYPE (decl) = type;
  types = tree_cons (0, type, types);
+
+ /* Record the decl for use for VLA bounds checking.  */
+ if (flag_vla_bounds)
+   TREE_PURPOSE (types) = decl;
}
  break;
 
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index b65fc450940..1200abc2f4a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3472,9 +3472,19 @@ process_vm_constraints (location_t location,
}
   else
{
- /* Functions called via pointers are not yet supported.  */
- return void_node;
+ while (FUNCTION_TYPE != TREE_CODE (function))
+   function = TREE_TYPE (function);
+
+ args = TREE_PURPOSE (TYPE_ARG_TYPES (function));
+
+ if (!args)
+   {
+ /* FIXME: this can happen when forming composite types for the
+conditional operator.  */
+ return void_node;
+   }
}
+  gcc_assert (PARM_DECL == TREE_CODE (args));
 }
 
   for (struct instrument_data* d = *instr_vec; d; d = d->next)
diff --git a/gcc/testsuite/gcc.dg/vla-bounds-fnptr-1.c b/gcc/testsuite/gcc.dg/vla-bounds-fnptr-1.c
new file mode 100644
index 000..b9af87f6338
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vla-bounds-fnptr-1.c
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-options "-fvla-bounds" } */
+
+#include <signal.h>
+#include <stdlib.h>
+
+static void handler(int) { exit(0); }
+
+#define TRY(...) __VA_ARGS__ __builtin_abort();
+#define ERROR(...)
+
+
+
+void foo1(void (*p)(int n, char (*a)[n]))
+{
+   char A0[3];
+   (*p)(3, &A0);
+TRY(   (*p)(4, &A0); ) // 4 != 3
+}
+
+void b0(int n, char (*a)[n]) { }
+
+
+int n;
+
+void foo2(void (*p)(int n, char (*a)[n]))
+{
+   n = 4;
+   char A0[3];
+   (*p)(3, &A0);
+ERROR( (*p)(4, &A0); ) // 4 != 3
+}
+
+void foo3(void (*p)(int n0, char (*a)[n]))
+{
+   n = 4;
+   char A0[3];
+ERROR( (*p)(3, &A0); ) // 4 != 3
+ERROR( (*p)(4, &A0); ) // 4 != 3 
+}
+
+void foo4(void (*p)(int n, char (*a)[n]))
+{
+   n = 3;
+   char A0[3];
+   (*p)(3, &A0);
+ERROR( (*p)(4, &A0); ) // 4 != 3
+}
+
+
+void foo5(void (*p)(int n0, char (*a)[n]))
+{
+   n = 3;
+   char A0[3];
+   (*p)(3, &A0);
+   (*p)(4, &A0);
+}
+
+
+void b1(int n0, char (*a)[n]) { }
+
+
+
+int main()
+{
+   signal(SIGILL, handler);
+
+   foo1(&b0);
+
+   foo2(&b1);
+   foo3(&b1); // we should diagnose mismatch and run-time discrepancies
+
+   foo4(&b1);
+   foo5(&b1); // we should diagnose mismatch and run-time discrepancies
+}
+
+
+
diff --git a/gcc/testsuite/gcc.dg/vla-bounds-fnptr-2.c b/gcc/testsuite/gcc.dg/vla-bounds-fnptr-2.c
new file mode 100644
index 000..4ec326af06c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vla-bounds-fnptr-2.c
@@ -0,0 +1,78 @@
+/* { dg-do run } */
+/* { dg-options "-fvla-bounds" } */
+
+#include <signal.h>
+#include <stdlib.h>
+
+static void handler(int) { exit(0); }
+
+#define TRY(...) __VA_ARGS__ __builtin_abort();
+#define ERROR(...)
+
+
+
+void foo1(voi

[PATCH 4/4] c: runtime checking for assignment of VM types 4/4

Add a warning for the case when a function call cannot be instrumented.

gcc/c-family/:
* c.opt (Wvla-parameter-missing-check): Add warning.

gcc/c/:
* c-typeck.cc (process_vm_constraints): Add warning.

gcc/doc/:
* invoke.texi (Wvla-parameter-missing-check): Document warning.
(flag_vla_bounds): Update.

gcc/testsuite/:
* gcc.dg/vla-bounds-func-1.c: Add warning.
---
 gcc/c-family/c.opt   |  5 +
 gcc/c/c-typeck.cc|  4 
 gcc/doc/invoke.texi  | 11 ---
 gcc/testsuite/gcc.dg/vla-bounds-func-1.c |  6 +++---
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 29bc0956181..bd45ba577bd 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1485,6 +1485,11 @@ Wvla-parameter
 C ObjC C++ ObjC++ Var(warn_vla_parameter) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn about mismatched declarations of VLA parameters.
 
+Wvla-parameter-missing-check
+C ObjC Var(warn_vla_parameter_check) Warning Init(0)
+When using run-time checking of VLA bounds, warn about function calls which
+could not be instrumented.
+
 Wvolatile
 C++ ObjC++ Var(warn_volatile) Warning
 Warn about deprecated uses of volatile qualifier.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 1200abc2f4a..a4fb0a6b527 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3481,6 +3481,8 @@ process_vm_constraints (location_t location,
{
  /* FIXME: this can happen when forming composite types for the
 conditional operator.  */
+ warning_at (location, OPT_Wvla_parameter_missing_check,
+ "Function call not instrumented");
  return void_node;
}
}
@@ -3564,6 +3566,8 @@ process_vm_constraints (location_t location,
  also not instrument any of the others because it may have
  side effects affecting them.  (We could restart and instrument
  only the ones with integer constants.)   */
+   warning_at (location, OPT_Wvla_parameter_missing_check,
+   "Function call not instrumented");
return void_node;
}
 cont:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c94ca59086b..6f4bbd43919 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10269,6 +10269,7 @@ void g (int n)
 @option{-Warray-parameter} option triggers warnings for similar problems
 involving ordinary array arguments.
 
+
 @opindex Wvla-parameter-missing-check
 @item -Wvla-parameter-missing-check
 Warn when function calls can not be instrumented with the use of
@@ -20063,9 +20064,13 @@ The @var{string} should be different for every file you compile.
 @item -fvla-bounds
 This option is only available when compiling C code.  If activated,
 additional code is emitted that verifies at run time for assignments
-involving variably-modified types that corresponding size expressions
-evaluate to the same value.
-
+and function calls involving variably-modified types that corresponding
+size expressions evaluate to the same value.  Note that for function
+calls the visible declaration needs to have a size expression that
+matches the size expression in the definition.  A mismatch seen by
+the compiler is diagnosed by @option{-Wvla-parameter}.  In some cases,
+a function call can not be instrumented.  This can be diagnosed by
+@option{-Wvla-parameter-missing-check}.
 
 @opindex save-temps
 @item -save-temps
diff --git a/gcc/testsuite/gcc.dg/vla-bounds-func-1.c b/gcc/testsuite/gcc.dg/vla-bounds-func-1.c
index 72dba39107b..205e5174185 100644
--- a/gcc/testsuite/gcc.dg/vla-bounds-func-1.c
+++ b/gcc/testsuite/gcc.dg/vla-bounds-func-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fvla-bounds" } */
+/* { dg-options "-fvla-bounds -Wvla-parameter-missing-check" } */
 
 // make sure we do not ICE on any of these
 
@@ -31,7 +31,7 @@ void f(void)
 
int u = 3; int v = 4;
char a[u][v];
-   (1 ? f1 : f2)(u, v, a); /* "Function call not instrumented." */
+   (1 ? f1 : f2)(u, v, a); /* { dg-warning "Function call" "not instrumented." } */
 }
 
 /* size expression in parameter */
@@ -51,6 +51,6 @@ int c(int u, char (*a)[u]) { }
 int d(void)
 {
char a[3];
-   c(3, &a);   /* "Function call not instrumented." */
+   c(3, &a);   /* { dg-warning "Function call" "not instrumented." } */
 }
 
-- 
2.39.2

RFA: RISC-V: Add support for XCVhwlp extension in CV32E40P

2023-11-18 Thread Joern Rennecke

This patch adds support for hardware loops as described in:
https://docs.openhwgroup.org/projects/cv32e40p-user-manual/en/cv32e40p_v1.3.2/instruction_set_extensions.html#hardware-loops
.

riscv32-corev-elf (using newlib) regression tested for multilibs:
rv32imc_zicsr-ilp32--
rv32imfc_zicsr-ilp32--
rv32imc_zicsr_zfinx-ilp32--
rv32imfc_zicsr_xcvmac_xcvalu-ilp32--

also tested against this:

rv32imc_zicsr_xcvhwlp-ilp32--
rv32imfc_zicsr_xcvhwlp-ilp32--
rv32imc_zicsr_zfinx_xcvhwlp-ilp32--
rv32imfc_zicsr_xcvmac_xcvalu_xcvhwlp-ilp32-

Bootstrapped on x86_64

build 'all-gcc' for x86_64 x sh-elf
Add support for XCVhwlp extension in CV32E40P

2023-11-18  Joern Rennecke  

gcc/
* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add xcvhwlp.
(riscv_ext_flag_table): Likewise.
* config.gcc (riscv*): Add corev.o to extra_objs.
* config/riscv/constraints.md (xcvl0s, xcvl0e): New constraints.
(xcvl0c, xcvl1s, xcvl1e, xcvl1c): Likewise.
(CVl0, xcvlb5, xcvlbs, xcvlbe, CV12): Likewise.
* config/riscv/corev.cc: New file.
* config/riscv/corev.md (UNSPEC_CV_LOOPBUG): New constant.
(UNSPECV_CV_LOOPALIGN, UNSPEC_CV_FOLLOWS): Likewise.
(UNSPEC_CV_LP_START_12): Likewise.
(UNSPEC_CV_LP_END_5, UNSPEC_CV_LP_END_12): Likewise.
(doloop_end_i, *cv_start, *cv_end, *cv_count): New insn patterns.
(doloop_align): Likewise.
(doloop_end, doloop_begin): New expanders.
(doloop_begin_i): New define_insn_and_split.
(doloop_begin_i+1): New splitter.
* config/riscv/predicates.md (lpstart_reg_op): New predicate.
(lpend_reg_op, lpcount_reg_op): Likewise.
(label_register_operand, move_dest_operand): Likewise.
* config/riscv/riscv-passes.def (pass_riscv_doloop_begin): Add.
(pass_riscv_doloop_ranges):
Insert before and after register allocation.
* config/riscv/riscv-protos.h (make_pass_riscv_doloop_begin): Declare.
(make_pass_riscv_doloop_ranges): Likewise.
(riscv_can_use_doloop_p, riscv_invalid_within_doloop): Likewise.
(hwloop_setupi_p, add_label_op_ref, corev_label_align): Likewise.
* config/riscv/riscv.cc (riscv_regno_to_class): Add classes for
hardware loop start, end and counter registers.
(riscv_strip_unspec_address): Also strip UNSPEC_CV_LP_START_12,
UNSPEC_CV_LP_END_5 and UNSPEC_CV_LP_END_12.
(riscv_output_move): Add support to read loop counter registers.
(TARGET_CAN_USE_DOLOOP_P, TARGET_INVALID_WITHIN_DOLOOP): Override.
* config/riscv/riscv.h (enum reg_class): Add items for hardware
loop start, end and counter registers.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(REGISTER_NAMES): Likewise.
(LABEL_ALIGN): Define.
* config/riscv/riscv.md (LPSTART0_REGNUM): New constant.
(LPEND0_REGNUM, LPCOUNT0_REGNUM): Likewise.
(LPSTART1_REGNUM, LPEND1_REGNUM, LPCOUNT1_REGNUM): Likewise.
(attr ext): New value xcvhwlp.
(attr enabled): Handle xcvhwlp.
(movsi_internal): Add alternatives to read loop counters.
Use move_dest_operand.
* config/riscv/riscv.opt (XCVHWLP): New Mask.
* config/riscv/t-riscv (corev.o): New rule.
* doc/md.texi (doloop_end): Document optional operand 2.
* loop-doloop.cc (doloop_optimize): Provide 3rd operand to
gen_doloop_end.
* target-insns.def (doloop_end): Add optional 3rd operand.
gcc/testsuite/
* gcc.target/riscv/cv-hwlp-shiftsub.c: New test.

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 5111626157b..55b56235134 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -312,6 +312,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
 
   {"xcvmac", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xcvalu", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xcvhwlp", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1676,6 +1677,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"xcvmac",&gcc_options::x_riscv_xcv_subext, MASK_XCVMAC},
   {"xcvalu",&gcc_options::x_riscv_xcv_subext, MASK_XCVALU},
+  {"xcvhwlp",   &gcc_options::x_riscv_xcv_subext, MASK_XCVHWLP},
 
   {"xtheadba",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBA},
   {"xtheadbb",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADBB},
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6d51bd93f3f..8cddfbb12b3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -546,7 +546,7 @@ riscv*)
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o 
riscv-a

[committed v2] libstdc++: Add fast path for std::format("{}", x) [PR110801]

2023-11-18 Thread Jonathan Wakely

Here's an improved version of this patch, which I've pushed to trunk.

Tested x86_64-linux.

-- >8 --

This optimizes the simple case of formatting a single string, integer
or bool, with no format-specifier (so no padding, alignment, alternate
form etc.)
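
As a rough illustration of the reserve-then-bump idea named in the ChangeLog below, here is a minimal sketch in plain C++17. It is not the libstdc++ implementation: BufferedSink, reserve, bump and format_int are hypothetical names, and the buffer size is arbitrary. It only shows why writing digits straight into reserved sink space avoids an intermediate copy.

```cpp
#include <charconv>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical, simplified sink: buffers characters and flushes them
// to a std::string.  Not the libstdc++ _Sink class.
class BufferedSink
{
  char buf_[32];
  std::size_t used_ = 0;
  std::string& out_;

public:
  explicit BufferedSink(std::string& out) : out_(out) { }

  // Append the buffered characters to the output and empty the buffer.
  void flush()
  {
    out_.append(buf_, used_);
    used_ = 0;
  }

  // Try to reserve n contiguous characters.  Returns a pointer to write
  // through, or nullptr if the buffer can never hold n characters.
  char* reserve(std::size_t n)
  {
    if (n > sizeof(buf_))
      return nullptr;
    if (n > sizeof(buf_) - used_)
      flush(); // make more space available
    return buf_ + used_;
  }

  // Record that n characters were written through a reserved pointer.
  void bump(std::size_t n) { used_ += n; }

  // Slow path: copy one character at a time.
  void write(char c)
  {
    if (used_ == sizeof(buf_))
      flush();
    buf_[used_++] = c;
  }
};

// Format an int with no format-spec, writing digits directly if possible.
inline void format_int(BufferedSink& sink, int value)
{
  // Maximum decimal digits of an int plus one for the sign.
  constexpr std::size_t max_len = std::numeric_limits<int>::digits10 + 2;
  if (char* p = sink.reserve(max_len))
    {
      auto res = std::to_chars(p, p + max_len, value);
      sink.bump(static_cast<std::size_t>(res.ptr - p));
      return;
    }
  // Fallback: format into a temporary and copy it character by character.
  for (char c : std::to_string(value))
    sink.write(c);
}
```

A caller would construct a BufferedSink over a std::string, call format_int any number of times, and call flush() once at the end.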

libstdc++-v3/ChangeLog:

PR libstdc++/110801
* include/std/format (_Sink_iter::_M_reserve): New member
function.
(_Sink::_Reservation): New nested class.
(_Sink::_M_reserve, _Sink::_M_bump): New virtual functions.
(_Seq_sink::_M_reserve, _Seq_sink::_M_bump): New virtual
overrides.
(_Iter_sink::_M_reserve): Likewise.
(__do_vformat_to): Use new functions to optimize "{}" case.
---
 libstdc++-v3/include/std/format | 164 +++-
 1 file changed, 163 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 8ec1c8a0b9a..7c52cce5dbb 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2442,6 +2442,10 @@ namespace __format
   iter_difference_t<_Out> size;
 };
 
+_GLIBCXX_BEGIN_NAMESPACE_CONTAINER
+template class vector;
+_GLIBCXX_END_NAMESPACE_CONTAINER
+
 /// @cond undocumented
 namespace __format
 {
@@ -2492,6 +2496,10 @@ namespace __format
   [[__gnu__::__always_inline__]]
   constexpr _Sink_iter
   operator++(int) { return *this; }
+
+  auto
+  _M_reserve(size_t __n) const
+  { return _M_sink->_M_reserve(__n); }
 };
 
   // Abstract base class for type-erased character sinks.
@@ -2508,6 +2516,7 @@ namespace __format
   // Called when the span is full, to make more space available.
   // Precondition: _M_next != _M_span.begin()
   // Postcondition: _M_next != _M_span.end()
+  // TODO: remove the precondition? could make overflow handle it.
   virtual void _M_overflow() = 0;
 
 protected:
@@ -2572,6 +2581,46 @@ namespace __format
  }
   }
 
+  // A successful _Reservation can be used to directly write
+  // up to N characters to the sink to avoid unwanted buffering.
+  struct _Reservation
+  {
+   // True if the reservation was successful, false otherwise.
+   explicit operator bool() const noexcept { return _M_sink; }
+   // A pointer to write directly to the sink.
+   _CharT* get() const noexcept { return _M_sink->_M_next.operator->(); }
+   // Add n to the _M_next iterator for the sink.
+   void _M_bump(size_t __n) { _M_sink->_M_bump(__n); }
+   _Sink* _M_sink;
+  };
+
+  // Attempt to reserve space to write n characters to the sink.
+  // If anything is written to the reservation then there must be a call
+  // to _M_bump(N2) before any call to another member function of *this,
+  // where N2 is the number of characters written.
+  virtual _Reservation
+  _M_reserve(size_t __n)
+  {
+   auto __avail = _M_unused();
+   if (__n <= __avail.size())
+ return { this };
+
+   if (__n <= _M_span.size()) // Cannot meet the request.
+ {
+   _M_overflow(); // Make more space available.
+   __avail = _M_unused();
+   if (__n <= __avail.size())
+ return { this };
+ }
+   return { nullptr };
+  }
+
+  // Update the next output position after writing directly to the sink.
+  // pre: no calls to _M_write or _M_overflow since _M_reserve.
+  virtual void
+  _M_bump(size_t __n)
+  { _M_next += __n; }
+
 public:
   _Sink(const _Sink&) = delete;
   _Sink& operator=(const _Sink&) = delete;
@@ -2596,6 +2645,8 @@ namespace __format
   { }
 };
 
+  using _GLIBCXX_STD_C::vector;
+
   // A sink that fills a sequence (e.g. std::string, std::vector, std::deque).
   // Writes to a buffer then appends that to the sequence when it fills up.
   template
@@ -2619,6 +2670,45 @@ namespace __format
this->_M_rewind();
   }
 
+  typename _Sink<_CharT>::_Reservation
+  _M_reserve(size_t __n) override
+  {
+   if constexpr (__is_specialization_of<_Seq, basic_string>
+   || __is_specialization_of<_Seq, vector>)
+ {
+   // Flush the buffer to _M_seq first:
+   if (this->_M_used().size())
+ _M_overflow();
+   // Expand _M_seq to make __n new characters available:
+   const auto __sz = _M_seq.size();
+   if constexpr (is_same_v || is_same_v)
+ _M_seq.__resize_and_overwrite(__sz + __n,
+   [](auto, auto __n2) {
+ return __n2;
+   });
+   else
+ _M_seq.resize(__sz + __n);
+   // Set _M_used() to be a span over the original part of _M_seq:
+   this->_M_reset(_M_seq, __sz);
+   return { this };
+ }
+   else // Try to use the base class' buffer.
+ return

[committed] libstdc++: Check string value_type in std::make_format_args [PR112607]

2023-11-18 Thread Jonathan Wakely

Tested x86_64-linux. Pushed to trunk. Backport to gcc-13 needed too.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/112607
* include/std/format (basic_format_arg::_S_to_arg_type): Check
value_type for basic_string_view and basic_string
specializations.
* testsuite/std/format/arguments/112607.cc: New test.
---
 libstdc++-v3/include/std/format   | 12 +---
 .../testsuite/std/format/arguments/112607.cc  | 30 +++
 2 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/arguments/112607.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 7c52cce5dbb..58cd310db4d 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3220,10 +3220,14 @@ namespace __format
return type_identity<__format::__float128_t>();
 # endif
 #endif
- else if constexpr (__is_specialization_of<_Td, basic_string_view>)
-   return type_identity>();
- else if constexpr (__is_specialization_of<_Td, basic_string>)
-   return type_identity>();
+ else if constexpr (__is_specialization_of<_Td, basic_string_view>
+   || __is_specialization_of<_Td, basic_string>)
+   {
+ if constexpr (is_same_v)
+   return type_identity>();
+ else
+   return type_identity();
+   }
  else if constexpr (is_same_v, const _CharT*>)
return type_identity();
  else if constexpr (is_same_v, _CharT*>)
diff --git a/libstdc++-v3/testsuite/std/format/arguments/112607.cc 
b/libstdc++-v3/testsuite/std/format/arguments/112607.cc
new file mode 100644
index 000..19eec765ea5
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/arguments/112607.cc
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++20 } }
+
+// PR libstdc++/112607
+// _Normalize does not consider char_type for the basic_string_view case
+
+#include <format>
+
+template
+struct Alloc
+{
+  using value_type = T;
+  Alloc() = default;
+  template
+Alloc(const Alloc&) { }
+  T* allocate(std::size_t);
+  void deallocate(T*, std::size_t);
+  bool operator==(const Alloc&) const;
+};
+
+template
+using String = std::basic_string, Alloc>;
+
+template<>
+struct std::formatter> : std::formatter {
+  auto format(const String&, auto& ctx) const {
+return std::formatter::format(" ", ctx);
+  }
+};
+
+std::string str = std::format("{}", String{});
-- 
2.41.0

[PATCH] Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]

2023-11-18 Thread Harald Anlauf

Hi all,

Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK.
The attached patch implements these.

I was struggling with the way we should handle features that are sort-of
deleted in a new standard, but not described as such in the standard,
which is why we do not have GFC_STD_F2023_DEL.  As -std=gnu should not
apply this restriction, I came up with the solution in the patch.
While playing, I hit a gcc_unreachable in notify_std_msg due to a
missing case, also fixed.
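
To make the intended behaviour easier to follow, here is a small stand-alone model of the new checks, written in C++ rather than as front-end code. The STD_* bit values, strict_f2023 and system_clock_kinds_ok are hypothetical names for this sketch only; the real GFC_STD_* constants and the existing non-default-kind diagnostics are not reproduced.

```cpp
#include <initializer_list>
#include <optional>

// Hypothetical bit values standing in for gfortran's GFC_STD_* masks.
constexpr unsigned STD_F2023 = 1u << 0;
constexpr unsigned STD_GNU   = 1u << 1;

// Mirrors the f2023 flag in the patch: F2023 is allowed and GNU
// extensions are not, so -std=gnu does not apply the restriction.
constexpr bool strict_f2023(unsigned allow_std)
{
  return (allow_std & STD_F2023) != 0 && (allow_std & STD_GNU) == 0;
}

// Each optional holds the kind of an integer argument; an empty optional
// means the argument is absent or not of integer type.
// Returns true if the combination is accepted.
inline bool system_clock_kinds_ok(unsigned allow_std, int default_kind,
                                  std::optional<int> count,
                                  std::optional<int> count_rate,
                                  std::optional<int> count_max)
{
  if (!strict_f2023(allow_std))
    return true; // restriction not applied in this mode

  int first_kind = -1;
  for (const std::optional<int>& k : {count, count_rate, count_max})
    {
      if (!k)
        continue;
      if (*k < default_kind)
        return false; // kind smaller than default integer
      if (first_kind < 0)
        first_kind = *k;
      else if (*k != first_kind)
        return false; // integer arguments differ in kind
    }
  return true;
}
```

With a default kind of 4, the model accepts mixed kinds when the GNU bit is set and rejects both integer(2) arguments and mixed integer(4)/integer(8) arguments when only the F2023 bit is set.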

Interestingly, the standard now has a recommendation:

16.9.202 SYSTEM_CLOCK

It is recommended that all references to SYSTEM_CLOCK use integer
arguments with a decimal exponent range of at least 18. ...

In case the user chooses integer(4), shall we emit a warning
e.g. under -pedantic, or some other flag?  This is not done
in the patch, but could be added.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 44814d9436b2e0be14b76b137602e40f3fdaf805 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sat, 18 Nov 2023 22:51:35 +0100
Subject: [PATCH] Fortran: restrictions on integer arguments to SYSTEM_CLOCK
 [PR112609]

Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK to
have a decimal exponent range at least as large as a default integer,
and that all integer arguments have the same kind type parameter.

gcc/fortran/ChangeLog:

	PR fortran/112609
	* check.cc (gfc_check_system_clock): Add checks on integer arguments
	to SYSTEM_CLOCK specific to F2023.
	* error.cc (notify_std_msg): Adjust to handle new features added
	in F2023.

gcc/testsuite/ChangeLog:

	PR fortran/112609
	* gfortran.dg/system_clock_4.f90: New test.
---
 gcc/fortran/check.cc | 57 
 gcc/fortran/error.cc |  4 +-
 gcc/testsuite/gfortran.dg/system_clock_4.f90 | 24 +
 3 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/system_clock_4.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 6c45e6542f0..8c2534ae1c9 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -6774,6 +6774,10 @@ bool
 gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,
 			gfc_expr *count_max)
 {
+  int first_int_kind = -1;
+  bool f2023 = ((gfc_option.allow_std & GFC_STD_F2023) != 0
+		&& (gfc_option.allow_std & GFC_STD_GNU) == 0);
+
   if (count != NULL)
 {
   if (!scalar_check (count, 0))
@@ -6788,8 +6792,18 @@ gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,
 			  &count->where))
 	return false;

+  if (f2023 && count->ts.kind < gfc_default_integer_kind)
+	{
+	  gfc_error ("Fortran 2023: COUNT argument to SYSTEM_CLOCK "
+		 "at %L must have kind of at least default integer",
+		 &count->where);
+	  return false;
+	}
+
   if (!variable_check (count, 0, false))
 	return false;
+
+  first_int_kind = count->ts.kind;
 }

   if (count_rate != NULL)
@@ -6816,6 +6830,17 @@ gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,
   "SYSTEM_CLOCK at %L has non-default kind",
   &count_rate->where))
 	return false;
+
+	  if (f2023 && count_rate->ts.kind < gfc_default_integer_kind)
+	{
+	  gfc_error ("Fortran 2023: COUNT_RATE argument to SYSTEM_CLOCK "
+			 "at %L must have kind of at least default integer",
+			 &count_rate->where);
+	  return false;
+	}
+
+	  if (first_int_kind < 0)
+	first_int_kind = count_rate->ts.kind;
 	}

 }
@@ -6836,6 +6861,38 @@ gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,

   if (!variable_check (count_max, 2, false))
 	return false;
+
+  if (f2023 && count_max->ts.kind < gfc_default_integer_kind)
+	{
+	  gfc_error ("Fortran 2023: COUNT_MAX argument to SYSTEM_CLOCK "
+		 "at %L must have kind of at least default integer",
+		 &count_max->where);
+	  return false;
+	}
+
+  if (first_int_kind < 0)
+	first_int_kind = count_max->ts.kind;
+}
+
+  if (f2023 && first_int_kind > 0)
+{
+  if (count_rate
+	  && count_rate->ts.type == BT_INTEGER
+	  && count_rate->ts.kind != first_int_kind)
+	{
+	  gfc_error ("Fortran 2023: all integer arguments to SYSTEM_CLOCK "
+		 "at %L must have the same kind",
+		 &count_rate->where);
+	  return false;
+	}
+
+  if (count_max && count_max->ts.kind != first_int_kind)
+	{
+	  gfc_error ("Fortran 2023: all integer arguments to SYSTEM_CLOCK "
+		 "at %L must have the same kind",
+		 &count_max->where);
+	  return false;
+	}
 }

   return true;
diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index 2ac51e95e4d..b8b36c0cd7c 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -980,7 +980,9 @@ char const*
 notify_std_msg(int std)
 {

-  if (std & GFC_STD_F2018_DEL)
+  if (std & GFC_STD_F2023)
+return _("Fortran 2023:");
+  else if (std & GFC_STD_F2018_DEL)
 return _("Fortran 2018 deleted feature:");
   else if (std & GFC_STD_F2018_OBS)
 return _("Fortran 2018 obsolesc

Re: [PATCH] Makefile.tpl: Avoid race condition in generating site.exp from the top level





On 11/17/23 15:19, Lewis Hyatt wrote:

Hello-

I often find it convenient to run a new c-c++-common test from the
main build dir like:

$ make -j 2 RUNTESTFLAGS=dg.exp=new-test.c check-gcc-{c,c++}

I noticed that sometimes this produces a corrupted site.exp and then no
tests work until it is remade manually. To avoid the issue, it is necessary
to do "cd gcc; make site.exp" before running a parallel make from the top
level directory. The below patch fixes it by just making that dependency on
site.exp explicit in the top level Makefile. Is it OK please? Thanks...

-Lewis

-- >8 --

A command like "make -j 2 check-gcc-c check-gcc-c++" run in the top level of
a fresh build directory does not work reliably. That will spawn two
independent make processes inside the "gcc" directory, and each of those
will attempt to create site.exp if it doesn't exist and will interfere with
each other, often producing a corrupted or empty site.exp. Resolve that by
making these targets depend on a new phony target which makes sure site.exp
is created first before starting the recursive makes.

ChangeLog:

* Makefile.in: Regenerate.
* Makefile.tpl: Add dependency on site.exp to check-gcc-* targets

OK
jeff

Re: [PATCH] Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]

2023-11-18 Thread Steve Kargl

On Sat, Nov 18, 2023 at 11:12:55PM +0100, Harald Anlauf wrote:
> 
> Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK.
> The attached patch implements these.
> 
> I was struggling with the way we should handle features that are sort-of
> deleted in a new standard, but not described as such in the standard,
> which is why we do not have GFC_STD_F2023_DEL.  As -std=gnu should not
> apply this restriction, I came up with the solution in the patch.
> While playing, I hit a gcc_unreachable in notify_std_msg due to a
> missing case, also fixed.
> 
> Interestingly, the standard now has a recommendation:
> 
> 16.9.202 SYSTEM_CLOCK
> 
> It is recommended that all references to SYSTEM_CLOCK use integer
> arguments with a decimal exponent range of at least 18. ...
> 
> In case the user chooses integer(4), shall we emit a warning
> e.g. under -pedantic, or some other flag?  This is not done
> in the patch, but could be added.
> 
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> 

Not in its current form.

>  {
> +  int first_int_kind = -1;
> +  bool f2023 = ((gfc_option.allow_std & GFC_STD_F2023) != 0
> + && (gfc_option.allow_std & GFC_STD_GNU) == 0);
> +

If you use the gfc_notify_std(), then you should not need the
above check on GFC_STD_GNU as it should include GFC_STD_F2023.

> 
> +  if (f2023 && count->ts.kind < gfc_default_integer_kind)
> + {
> +   gfc_error ("Fortran 2023: COUNT argument to SYSTEM_CLOCK "
> +  "at %L must have kind of at least default integer",
> +  &count->where);
> +   return false;
> + }

Elsewhere in the FE, gfortran uses gfc_notify_std() to enforce 
requirements of a Fortran standard.  The above would be 

  if (count->ts.kind < gfc_default_integer_kind
  && gfc_notify_std (GFC_STD_F2023, "COUNT argument to SYSTEM_CLOCK "
 "at %L must have kind of at least default integer",
 &count->where))

Note, gfc_notify_std() should add the 'Fortran 2023: ' string,
if not, that should be fixed.

Of course, I seldom provide patches, so if others don't have a
comment then do as you like.

-- 
Steve

Propagate value ranges of return values

2023-11-18 Thread Jan Hubicka

Hi,
this patch implements very basic propagation of return value ranges from VRP
pass.  This helps std::vector's push_back since we work out value range of
allocated block.  This propagates only within single translation unit.  I hoped
we will also do the propagation at WPA stage, but that needs more work on
ipa-cp side.
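
A small example of the kind of code this is aimed at, as a sketch only. The function names are made up for illustration and this is not the test case added by the patch. Once the range [0, 7] is recorded for the callee's return value, the comparison in the caller is known to be false within the same translation unit. Whether a particular compiler build actually removes the branch is not claimed here.

```cpp
// Hypothetical example.  noinline keeps the call visible as a call, so
// any folding in the caller has to come from the recorded return range
// rather than from inlining.
__attribute__((noinline)) static int
low_bits (int x)
{
  return x & 7;   // result is always in [0, 7]
}

int
caller (int x)
{
  int r = low_bits (x);
  if (r > 7)      // can never be true given the range above
    return -1;
  return r;
}
```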

I also added code auto-detecting return_nonnull and corresponding 
-Wsuggest-attribute

Variant of this patch bootstrapped/regtested x86_64-linux, testing with
this version is running.  I plan to commit the patch on Monday provided
there are no issues.

gcc/ChangeLog:

* cgraph.cc (add_detected_attribute_1): New function.
(cgraph_node::add_detected_attribute): New member function.
* cgraph.h (struct cgraph_node): Declare it.
* common.opt: Add Wsuggest-attribute=returns_nonnull.
* doc/invoke.texi: Document -Wsuggest-attribute=returns_nonnull.
* gimple-range-fold.cc: Include ipa-prop and dependencies.
(fold_using_range::range_of_call): Look for return value range.
* ipa-prop.cc (struct ipa_return_value_summary): New structure.
(class ipa_return_value_sum_t): New summary.
(ipa_record_return_value_range): New function.
(ipa_return_value_range): New function.
* ipa-prop.h (ipa_return_value_range): Declare.
(ipa_record_return_value_range): Declare.
* ipa-pure-const.cc (warn_function_returns_nonnull): New function.
* ipa-utils.h (warn_function_returns_nonnull): Declare.
* symbol-summary.h: Fix comment typo.
* tree-vrp.cc (execute_ranger_vrp): Record return values.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/return-value-range-1.c: New test.

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index e41e5ad3ae7..71dacf23ce1 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+add_detected_attribute_1 (cgraph_node *node, const char *attr, bool *changed)
+{
+  if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl)))
+{
+  DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr),
+NULL_TREE, DECL_ATTRIBUTES 
(node->decl));
+  *changed = true;
+}
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+{
+  cgraph_node *alias = dyn_cast (ref->referring);
+  if (alias->get_availability () > AVAIL_INTERPOSABLE)
+   add_detected_attribute_1 (alias, attr, changed);
+}
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+if (e->caller->thunk
+   && (e->caller->get_availability () > AVAIL_INTERPOSABLE))
+  add_detected_attribute_1 (e->caller, attr, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::add_detected_attribute (const char *attr)
+{
+  bool changed = false;
+
+  if (get_availability () > AVAIL_INTERPOSABLE)
+add_detected_attribute_1 (this, attr, &changed);
+  else
+{
+  ipa_ref *ref;
+
+  FOR_EACH_ALIAS (this, ref)
+   {
+ cgraph_node *alias = dyn_cast (ref->referring);
+ if (alias->get_availability () > AVAIL_INTERPOSABLE)
+   add_detected_attribute_1 (alias, attr, &changed);
+   }
+}
+  return changed;
+}
+
 /* Worker to set noreturng flag.  */
 static void
 set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cedaaac3a45..cfdd9f693a8 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1190,6 +1190,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
 
   bool set_pure_flag (bool pure, bool looping);
 
+  /* Add attribute ATTR to cgraph_node's decl and on aliases of the node
+ if any.  */
+  bool add_detected_attribute (const char *attr);
+
   /* Call callback on function and aliases associated to the function.
  When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
  skipped. */
diff --git a/gcc/common.opt b/gcc/common.opt
index d21db5d4a20..0be4f02677c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -781,6 +781,10 @@ Wsuggest-attribute=malloc
 Common Var(warn_suggest_attribute_malloc) Warning
 Warn about functions which might be candidates for __attribute__((malloc)).
 
+Wsuggest-attribute=returns_nonnull
+Common Var(warn_suggest_attribute_malloc) Warning
+Warn about functions which might be candidates for __attribute__((malloc)).
+
 Wsuggest-final-types
 Common Var(warn_suggest_final_types) Warning
 Warn about C++ polymorphic types where adding final keyword would improve code 
quality.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 557d613a1e6..b9e98843613 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8092,7 +8092,7 @@ if the array is referenced as a flexible array member.
 
 @opindex Wsuggest-attribute=
 @opindex Wno-suggest-attribute=
-@item 
-Wsuggest-attribute=@

[pushed] analyzer: new warning: -Wanalyzer-undefined-behavior-strtok [PR107573]

2023-11-18 Thread David Malcolm

This patch:
- adds support to the analyzer for tracking API-private state
  for which we don't have a decl (such as strtok's internal state),
- uses it to implement a new -Wanalyzer-undefined-behavior-strtok which
  warns when strtok (NULL, delim) is called as the first call to
  strtok after main.
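
For readers less familiar with the API, here is a minimal sketch of the well-defined calling pattern. The helper name split is hypothetical. The first strtok call passes the buffer to scan and only the later calls pass a null pointer; the new warning targets the case where a null pointer is passed on the very first call.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits a copy of INPUT on any character in DELIMS using strtok.
inline std::vector<std::string>
split (const std::string& input, const char* delims)
{
  std::vector<std::string> tokens;
  std::string copy = input;   // strtok modifies the buffer it scans

  // First call: pass the buffer.  Later calls: pass a null pointer so
  // that strtok continues from its saved internal position.
  for (char* tok = std::strtok (&copy[0], delims);
       tok != nullptr;
       tok = std::strtok (nullptr, delims))
    tokens.push_back (tok);

  return tokens;
}
```

Note that strtok keeps a single hidden position per program (or per thread, depending on the C library), so this helper must not be interleaved with other strtok users.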

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Integration testing showed no changes.
Pushed to trunk as r14-5591-gf65f63c4d86a48.

gcc/analyzer/ChangeLog:
PR analyzer/107573
* analyzer.h (register_known_functions): Add region_model_manager
param.
* analyzer.opt (Wanalyzer-undefined-behavior-strtok): New.
* call-summary.cc
(call_summary_replay::convert_region_from_summary_1): Handle
RK_PRIVATE.
* engine.cc (impl_run_checkers): Pass model manager to
register_known_functions.
* kf.cc (class undefined_function_behavior): New.
(class kf_strtok): New.
(register_known_functions): Add region_model_manager param.
Use it to register "strtok".
* region-model-manager.cc
(region_model_manager::get_or_create_conjured_svalue): Add "idx"
param.
* region-model-manager.h
(region_model_manager::get_or_create_conjured_svalue): Add "idx"
param.
(region_model_manager::get_root_region): New accessor.
* region-model.cc (region_model::scan_for_null_terminator): Handle
"expr" being null.
(region_model::get_representative_path_var_1): Handle RK_PRIVATE.
* region-model.h (region_model::called_from_main_p): Make public.
* region.cc (region::get_memory_space): Handle RK_PRIVATE.
(region::can_have_initial_svalue_p): Handle MEMSPACE_PRIVATE.
(private_region::dump_to_pp): New.
* region.h (MEMSPACE_PRIVATE): New.
(RK_PRIVATE): New.
(class private_region): New.
(is_a_helper ::test): New.
* store.cc (store::replay_call_summary_cluster): Handle
RK_PRIVATE.
* svalue.h (struct conjured_svalue::key_t): Add "idx" param to
ctor and "m_idx" field.
(class conjured_svalue::conjured_svalue): Likewise.

gcc/ChangeLog:
PR analyzer/107573
* doc/invoke.texi: Add -Wanalyzer-undefined-behavior-strtok.

gcc/testsuite/ChangeLog:
PR analyzer/107573
* c-c++-common/analyzer/strtok-1.c: New test.
* c-c++-common/analyzer/strtok-2.c: New test.
* c-c++-common/analyzer/strtok-3.c: New test.
* c-c++-common/analyzer/strtok-4.c: New test.
* c-c++-common/analyzer/strtok-cppreference.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |   3 +-
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/call-summary.cc  |   1 +
 gcc/analyzer/engine.cc|   3 +-
 gcc/analyzer/kf.cc| 320 +-
 gcc/analyzer/region-model-manager.cc  |  10 +-
 gcc/analyzer/region-model-manager.h   |   4 +-
 gcc/analyzer/region-model.cc  |   7 +-
 gcc/analyzer/region-model.h   |   3 +-
 gcc/analyzer/region.cc|  14 +
 gcc/analyzer/region.h |  41 ++-
 gcc/analyzer/store.cc |   1 +
 gcc/analyzer/svalue.h |  13 +-
 gcc/doc/invoke.texi   |  14 +
 .../c-c++-common/analyzer/strtok-1.c  |  62 
 .../c-c++-common/analyzer/strtok-2.c  |  18 +
 .../c-c++-common/analyzer/strtok-3.c  |  26 ++
 .../c-c++-common/analyzer/strtok-4.c  |  42 +++
 .../analyzer/strtok-cppreference.c|  50 +++
 19 files changed, 619 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strtok-1.c
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strtok-2.c
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strtok-3.c
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strtok-4.c
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strtok-cppreference.c

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 777293ff4b9..f08572bb633 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -326,7 +326,8 @@ public:
   void impl_call_pre (const call_details &cd) const override;
 };
 
-extern void register_known_functions (known_function_manager &mgr);
+extern void register_known_functions (known_function_manager &kfm,
+ region_model_manager &rmm);
 extern void register_known_analyzer_functions (known_function_manager &kfm);
 extern void register_known_fd_functions (known_function_manager &kfm);
 extern void register_known_file_functions (known_function_manager &kfm);
diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index fae2649389a..a3c30caf2ab 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.

[Committed V3] RISC-V: Fix bug of tuple move splitter

2023-11-18 Thread Juzhe-Zhong



PR target/112561

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_tuple_move): Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112561.c: New test.

---
 gcc/config/riscv/riscv-v.cc  |  4 
 .../gcc.target/riscv/rvv/autovec/pr112561.c  | 16 
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6a2009ffb05..f769c1474e0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2148,6 +2148,10 @@ expand_tuple_move (rtx *ops)
  offset = ops[2];
}
 
+  /* Non-fractional LMUL has whole register moves that don't require a
+vsetvl for VLMAX.  */
+  if (fractional_p)
+   emit_vlmax_vsetvl (subpart_mode, ops[4]);
   if (MEM_P (ops[1]))
{
  /* Load operations.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
new file mode 100644
index 000..25e61fa12c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 -ftree-vectorize 
--param=riscv-autovec-preference=fixed-vlmax -mcmodel=medlow" } */
+
+int printf(char *, ...);
+int a, b, c, e;
+short d[7][7] = {};
+int main() {
+  short f;
+  c = 0;
+  for (; c <= 6; c++) {
+e |= d[c][c] & 1;
+b &= f & 3;
+  }
+  printf("%d\n", a);
+  return 0;
+}
-- 
2.36.3

Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization [PR90693]





On 11/17/23 07:01, Jakub Jelinek wrote:

Hi!

Per the earlier discussions on this PR, the following patch folds
popcount (x) == 1 (and != 1) into (x ^ (x - 1)) > x - 1 (or <=)
if the corresponding popcount optab isn't implemented (I think any
double-word popcount or call will be necessarily slower than the
above cheap 3 op check and even for -Os larger or same size).
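
The identity can be checked outside the compiler with a few lines of C++. This is only a sketch to illustrate the arithmetic, with hypothetical function names. For a power of two 2^k, x ^ (x - 1) is 2^(k+1) - 1, which exceeds x - 1. For zero, x - 1 wraps to the maximum value, so the comparison is false. When two or more bits are set, x ^ (x - 1) only covers the lowest set bit and the bits below it, which is smaller than x - 1.

```cpp
#include <cstdint>

// popcount (x) == 1 expressed without a popcount operation.
inline bool
single_bit_xor (std::uint32_t x)
{
  return (x ^ (x - 1)) > (x - 1);
}

// Reference implementation: clear the lowest set bit until none remain.
inline int
popcount_ref (std::uint32_t x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}
```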

I've noticed e.g. C++ aligned new starts with std::has_single_bit
which does popcount (x) == 1.

As a follow-up, I'm considering changing in this routine the popcount
call to IFN_POPCOUNT with 2 arguments and during expansion test costs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-17  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): New function.
(math_opts_dom_walker::after_dom_children): Call it for EQ_EXPR
and NE_EXPR assignments and GIMPLE_CONDs.

* gcc.target/i386/pr90693.c: New test.

OK.

Jeff

Re: [PATCH] tree-ssa-math-opts: popcount (X) == 1 to (X ^ (X - 1)) > (X - 1) optimization for direct optab [PR90693]





On 11/18/23 03:27, Jakub Jelinek wrote:

Hi!

On Fri, Nov 17, 2023 at 03:01:04PM +0100, Jakub Jelinek wrote:

As a follow-up, I'm considering changing in this routine the popcount
call to IFN_POPCOUNT with 2 arguments and during expansion test costs.


Here is the follow-up which does the rtx costs testing.
While having to tweak internal-fn.def so that POPCOUNT can have a custom
expand_POPCOUNT, I have noticed we are inconsistent, some DEF_INTERNAL*
macros (most of them) were undefined at the end of internal-fn.def (but in
some cases uselessly undefined again after inclusion), while others were not
(and sometimes undefined after the inclusion).  I've changed it to always
undefine at the end of internal-fn.def.

Ok for trunk if it passes bootstrap/regtest?

2023-11-18  Jakub Jelinek  

PR tree-optimization/90693
* tree-ssa-math-opts.cc (match_single_bit_test): Mark POPCOUNT with
result only used in equality comparison against 1 with direct optab
support as .POPCOUNT call with 2 arguments.
* internal-fn.h (expand_POPCOUNT): Declare.
* internal-fn.def: Document missing DEF_INTERNAL* macros and make sure
they are all undefined at the end.
(DEF_INTERNAL_INT_EXT_FN): New macro.
(POPCOUNT): Use it instead of DEF_INTERNAL_INT_FN.
* internal-fn.cc (lookup_hilo_internal_fn, lookup_evenodd_internal_fn,
widening_fn_p, get_len_internal_fn): Don't undef DEF_INTERNAL_*FN
macros after inclusion of internal-fn.def.
(DEF_INTERNAL_INT_EXT_FN): Define to nothing before inclusion to
define expanders.
(expand_POPCOUNT): New function.




+  unsigned cmp_cost = seq_cost (cmp_insns, speed_p);
+  if (popcount_cost < cmp_cost)
+emit_insn (popcount_insns);
+  else
+{
+  emit_insn (cmp_insns);
+  plhs = expand_normal (lhs);
+  if (GET_MODE (cmp) != GET_MODE (plhs))
+   cmp = convert_to_mode (GET_MODE (plhs), cmp, 1);
+  emit_move_insn (plhs, cmp);
+}

Did you want <= for the test to use popcount?  That seems like a better
choice in that scenario to me as the popcount is likely smaller.


OK for the trunk as-is or using a <= test.

jeff

Re: [PATCH] g++: Add require-effective-target to multi-input file testcase pr95401.cc

On 11/10/23 11:00, Patrick O'Neill wrote:

On 11/9/23 17:34, Jeff Law wrote:

On 11/3/23 00:18, Patrick O'Neill wrote:

On non-vector targets dejagnu attempts dg-do compile for pr95401.cc.
This produces a command like this:
g++ pr95401.cc pr95401a.cc -S -o pr95401.s

which isn't valid (gcc does not accept multiple input files when using
-S with -o).

This patch adds require-effective-target vect_int to avoid the case
where the testcase is invoked with dg-do compile.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr95401.cc: Add require-effective-target vect_int.

Sorry, I must be missing something here. I fail to see how adding an
effective target check would/should impact the problem you've
described above with the dg-additional-sources interaction with -S.

It's not intuitive (& probably not the cleanest way of solving it).

pr95401.cc is an invalid testcase when run with dg-do compile (for the
reasons above).

pr95401.cc does not define a dg-do, which means the testcase uses
dg-do-what-default to determine what to do.
dg-do-what-default is set by target-supports.exp.

The two options here are set dg-do-what-default run or compile.
On non-vector targets the pr95401 is set to compile (which is invalid).

Ideally we would say if dg-do-what-default == compile don't run, but
AFAIK that isn't possible.
I didn't want to duplicate the check_vect_support_and_set_flags logic to
return true/false since that'll probably get out of sync.

I used require-effective-target vect_int as a proxy for
check_vect_support_and_set_flags (also since the testcase only contains
integer arrays).

That way we do this now:
dg-do-what-default run -> run
dg-do-what-default compile -> skip test

If there's a cleaner/better approach I'm happy to revise.
Another approach would be to make this a run test -- without actually
running the vector bits. ie, set it up so that it can safely run on any
target.

volatile bool x = false;

int
main ()
{
  if (x)
    {
      /* Call the vector function that already exists.  */
    }
  exit (0);
}

Though that may run afoul of other issues.

So if you want to go forward with your patch, that's fine, just add a
comment about how adding the effective-target test works around the problem.

Thanks for the detailed explanation,

jeff

Re: [PING] [PATCH] gfortran: Rely on dg-do-what-default to avoid running pr85853.f90, pr107254.f90 and vect-alias-check-1.F90 on non-vector targets





On 11/15/23 17:03, Patrick O'Neill wrote:

Ping.

Testsuite fixup similar to:
https://inbox.sourceware.org/gcc-patches/974e9e5e-8f07-46dd-b9b9-db8aa4685...@gmail.com/T/#t
https://inbox.sourceware.org/gcc-patches/7e78cd70-70c9-41b1-8a98-6977a1034...@rivosinc.com/T/#t

OK.


Jeff

[PATCH 00/44] RISC-V: Various if-conversion fixes and improvements

Hi,

 This patch series has come out from a simple change to add generic 
conditional-move and conditional-add expansions for a yet-out-of-tree 
target, which has relatively expensive branches and no conditional 
operations beyond the base architecture conditional-set instructions.  At 
one point I have concluded it may make sense to release this code to the 
general public, especially as some of the conditional execution sequences 
will trigger for targets we already have support for.  Naturally as a part 
of a proper upstream submission I chose to add suitable test cases.

 Now these test cases triggered a lot of issues in our existing code and 
as I fixed them what was supposed to be a couple of patches has turned 
into this humongous patch series, including a branch costing model rework.  
Oh well.

 Please see individual change descriptions for the details.  The overall 
patch series structure is as follows:

- 01-02 add test cases covering the existing state that won't change 
  throughout the patch series,

- 03-08 make small preparatory clean-ups that do not change semantics,

- 09-13 implement a branch cost model rework and add the associated test 
  cases,

- 14-24 make various improvements for integer conditional operations and 
  add the associated test cases,

- 25-28 add generic `movMODEcc' support and the associated test cases,

- 29-31 add generic `addMODEcc' support and the associated test cases,

- 32-44 make various improvements for floating-point conditional 
  operations and add the associated test cases.

There is potential here for middle end improvement, in particular branch 
costing is already documented in ifcvt.cc to be intended to consistently 
use BRANCH_COST, and then the generic conditional-move and conditional-add 
sequences could I suppose be emitted there in a target-agnostic way rather 
than being supplied by the backend.  This I suppose could be investigated 
in the future if the RISC-V approach turned out potentially useful for 
other targets.

 This has been so far verified as follows, using SiFive HiFive Unmatched 
hardware and the `riscv64-linux-gnu' target:

- New target test cases have been run with `-mtune=sifive-5-series',
  `-mtune=sifive-5-series/-march=rv32gc/-mabi=ilp32d' and 
  `-mtune=sifive-5-series/-mmovcc/-mbranch-cost=8' DejaGNU board options.

- The C language test suite has been run at significant points in the 
  patch series with `-mtune=sifive-5-series' and (past 26/44) also with 
  `-mtune=sifive-5-series/-mmovcc/-mbranch-cost=8', and selectively with 
  `-mtune=sifive-7-series' and
  `-mtune=sifive-7-series/-mmovcc/-mbranch-cost=8' DejaGNU board options.

Since this is huge and every test iteration takes a couple of hours I will 
continue running testing and may investigate running QEMU testing for the 
features the Unmatched does not support such as Zicond.  I don't expect 
real issues however.

 There are a bunch of issues triggered with `-mmovcc/-mbranch-cost=8' or 
with lone `-mbranch-cost=8' even and the vector test cases, which are 
either due to match patterns expecting an assembly label that has been 
reordered or are similar to PR target/112092 and which are not a problem 
with this patch series, but rather one with the vector testsuite or code.

 Any questions, comments, or concerns?  Otherwise OK to apply?

  Maciej

[PATCH 01/44] testsuite: Add cases for conditional-move and conditional-add operations

Add generic execution tests for expressions that are expected to expand 
to conditional-move and conditional-add operations where supported.  To 
ensure no corner case escapes all relational operators are extensively 
covered for integer comparisons and all ordered operators are covered 
for floating-point comparisons.  Unordered operators are not covered at 
this point as they'd require a different input data set.
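As a rough illustration of the expressions meant here, a minimal sketch follows.  The conditional-move shape matches the `movdieq' test shown later in this series; the conditional-add shape and both function names are assumptions made for illustration only and are not copied from the new test files.

```c
/* Hypothetical sketch; names and the conditional-add shape are
   assumptions, not taken from the patch.  */

/* Conditional move: select Y or Z according to an integer comparison.  */
int
sketch_movieq (int w, int x, int y, int z)
{
  return w == x ? y : z;
}

/* Conditional add: add Z to Y only if the comparison holds.  */
int
sketch_addieq (int w, int x, int y, int z)
{
  return w == x ? y + z : y;
}
```

An execution test would then call such functions with inputs that make the condition both true and false and compare the results against known values.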

gcc/testsuite/
* gcc.dg/torture/addieq.c: New test.
* gcc.dg/torture/addifeq.c: New test.
* gcc.dg/torture/addifge.c: New test.
* gcc.dg/torture/addifgt.c: New test.
* gcc.dg/torture/addifle.c: New test.
* gcc.dg/torture/addiflt.c: New test.
* gcc.dg/torture/addifne.c: New test.
* gcc.dg/torture/addige.c: New test.
* gcc.dg/torture/addigeu.c: New test.
* gcc.dg/torture/addigt.c: New test.
* gcc.dg/torture/addigtu.c: New test.
* gcc.dg/torture/addile.c: New test.
* gcc.dg/torture/addileu.c: New test.
* gcc.dg/torture/addilt.c: New test.
* gcc.dg/torture/addiltu.c: New test.
* gcc.dg/torture/addine.c: New test.
* gcc.dg/torture/addleq.c: New test.
* gcc.dg/torture/addlfeq.c: New test.
* gcc.dg/torture/addlfge.c: New test.
* gcc.dg/torture/addlfgt.c: New test.
* gcc.dg/torture/addlfle.c: New test.
* gcc.dg/torture/addlflt.c: New test.
* gcc.dg/torture/addlfne.c: New test.
* gcc.dg/torture/addlge.c: New test.
* gcc.dg/torture/addlgeu.c: New test.
* gcc.dg/torture/addlgt.c: New test.
* gcc.dg/torture/addlgtu.c: New test.
* gcc.dg/torture/addlle.c: New test.
* gcc.dg/torture/addlleu.c: New test.
* gcc.dg/torture/addllt.c: New test.
* gcc.dg/torture/addlltu.c: New test.
* gcc.dg/torture/addlne.c: New test.
* gcc.dg/torture/movieq.c: New test.
* gcc.dg/torture/movifeq.c: New test.
* gcc.dg/torture/movifge.c: New test.
* gcc.dg/torture/movifgt.c: New test.
* gcc.dg/torture/movifle.c: New test.
* gcc.dg/torture/moviflt.c: New test.
* gcc.dg/torture/movifne.c: New test.
* gcc.dg/torture/movige.c: New test.
* gcc.dg/torture/movigeu.c: New test.
* gcc.dg/torture/movigt.c: New test.
* gcc.dg/torture/movigtu.c: New test.
* gcc.dg/torture/movile.c: New test.
* gcc.dg/torture/movileu.c: New test.
* gcc.dg/torture/movilt.c: New test.
* gcc.dg/torture/moviltu.c: New test.
* gcc.dg/torture/movine.c: New test.
* gcc.dg/torture/movleq.c: New test.
* gcc.dg/torture/movlfeq.c: New test.
* gcc.dg/torture/movlfge.c: New test.
* gcc.dg/torture/movlfgt.c: New test.
* gcc.dg/torture/movlfle.c: New test.
* gcc.dg/torture/movlflt.c: New test.
* gcc.dg/torture/movlfne.c: New test.
* gcc.dg/torture/movlge.c: New test.
* gcc.dg/torture/movlgeu.c: New test.
* gcc.dg/torture/movlgt.c: New test.
* gcc.dg/torture/movlgtu.c: New test.
* gcc.dg/torture/movlle.c: New test.
* gcc.dg/torture/movlleu.c: New test.
* gcc.dg/torture/movllt.c: New test.
* gcc.dg/torture/movlltu.c: New test.
* gcc.dg/torture/movlne.c: New test.
---
 gcc/testsuite/gcc.dg/torture/addieq.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addifeq.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addifge.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addifgt.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addifle.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addiflt.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addifne.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addige.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addigeu.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addigt.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addigtu.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addile.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addileu.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addilt.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addiltu.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addine.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addleq.c  |   31 +++
 gcc/testsuite/gcc.dg/torture/addlfeq.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addlfge.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addlfgt.c |   31 +++
 gcc/testsuite/gcc.dg/torture/addlfle.c |   31 +

[PATCH 02/44] RISC-V/testsuite: Add cases for integer SFB cond-move operations

Verify, for short forward branch targets and the conditional-move 
operations that already work as expected, that if-conversion triggers 
via `noce_try_cmove' already at `-mbranch-cost=1' and that extraneous 
instructions such as SNEZ, etc. are not present in output.  Cover all 
integer relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdieq-sfb.c: New test.
* gcc.target/riscv/movdige-sfb.c: New test.
* gcc.target/riscv/movdigeu-sfb.c: New test.
* gcc.target/riscv/movdigt-sfb.c: New test.
* gcc.target/riscv/movdigtu-sfb.c: New test.
* gcc.target/riscv/movdile-sfb.c: New test.
* gcc.target/riscv/movdileu-sfb.c: New test.
* gcc.target/riscv/movdilt-sfb.c: New test.
* gcc.target/riscv/movdiltu-sfb.c: New test.
* gcc.target/riscv/movdine-sfb.c: New test.
* gcc.target/riscv/movsieq-sfb.c: New test.
* gcc.target/riscv/movsige-sfb.c: New test.
* gcc.target/riscv/movsigeu-sfb.c: New test.
* gcc.target/riscv/movsigt-sfb.c: New test.
* gcc.target/riscv/movsigtu-sfb.c: New test.
* gcc.target/riscv/movsile-sfb.c: New test.
* gcc.target/riscv/movsileu-sfb.c: New test.
* gcc.target/riscv/movsilt-sfb.c: New test.
* gcc.target/riscv/movsiltu-sfb.c: New test.
* gcc.target/riscv/movsine-sfb.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdieq-sfb.c  |   25 +
 gcc/testsuite/gcc.target/riscv/movdige-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdigeu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdigt-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdigtu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdile-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdileu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdilt-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdiltu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdine-sfb.c  |   25 +
 gcc/testsuite/gcc.target/riscv/movsieq-sfb.c  |   25 +
 gcc/testsuite/gcc.target/riscv/movsige-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsigeu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsigt-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsigtu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsile-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsileu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsilt-sfb.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsiltu-sfb.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsine-sfb.c  |   25 +
 20 files changed, 516 insertions(+)

gcc-riscv-test-movcc.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdieq-sfb.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdieq-sfb.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-7-series -mbranch-cost=1 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect short forward branch assembly like:
+
+   bne a0,a1,1f    # movcc
+   mv  a3,a2
+1:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s\[^\\s\]+\\s# movcc\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdige-sfb.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdige-sfb.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-7-series -mbranch-cost=1 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect short forward branch assembly like:
+
+   blt a0,a1,1f    # movcc
+   mv  a3,a2
+1:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion s

[PATCH 03/44] RISC-V: Reorder comment on SFB patterns

Our `movcc' expander is no longer specific to short forward branch 
targets, so move its associated comment accordingly.

gcc/
* config/riscv/riscv.md (movcc): Move comment on SFB 
patterns over to...
(*movcc): ... here.
---
 gcc/config/riscv/riscv.md |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-riscv-sfb-comment-move.diff
Index: gcc-master/gcc/config/riscv/riscv.md
===
--- gcc-master.orig/gcc/config/riscv/riscv.md
+++ gcc-master/gcc/config/riscv/riscv.md
@@ -2655,8 +2655,6 @@
   [(set_attr "type" "branch")
(set_attr "mode" "none")])
 
-;; Patterns for implementations that optimize short forward branches.
-
 (define_expand "movcc"
   [(set (match_operand:GPR 0 "register_operand")
(if_then_else:GPR (match_operand 1 "comparison_operator")
@@ -2671,6 +2669,8 @@
 FAIL;
 })
 
+;; Patterns for implementations that optimize short forward branches.
+
 (define_insn "*movcc"
   [(set (match_operand:GPR 0 "register_operand" "=r,r")
(if_then_else:GPR

[PATCH 04/44] RISC-V: Sanitise NEED_EQ_NE_P case with `riscv_emit_int_compare'

For the NEED_EQ_NE_P case `riscv_emit_int_compare' is documented to only 
emit EQ or NE comparisons against zero, however it does not catch 
incorrect use where a non-equality comparison has been requested and then 
falls through to the general case.  Add a safety guard to catch such a case.

Arguably the NEED_EQ_NE_P case would best be moved into a function of 
its own, but let's leave it for a separate cleanup.

gcc/
* config/riscv/riscv.cc (riscv_emit_int_compare): Bail out if
NEED_EQ_NE_P but the comparison is neither EQ nor NE.   
---
FWIW the structure of code here clearly shows the NEED_EQ_NE_P case has 
been bolted on as an afterthought rather than how this piece would look 
if written from scratch right away.  Let's defer any further cleanups at 
this stage of the development cycle though.
---
 gcc/config/riscv/riscv.cc |1 +
 1 file changed, 1 insertion(+)

gcc-riscv-emit-int-compare-need-eq-ne.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -3779,6 +3779,7 @@ riscv_emit_int_compare (enum rtx_code *c
  *op1 = const0_rtx;
  return;
}
+  gcc_unreachable ();
 }
 
   if (splittable_const_int_operand (*op1, VOIDmode))

[PATCH 05/44] RISC-V: Fix `mode' usage in `riscv_expand_conditional_move'

In `riscv_expand_conditional_move' `mode' is initialized right away from 
`GET_MODE (dest)', so remove needless references that refrain from using 
the local variable.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use 
`mode' for `GET_MODE (dest)' throughout.
---
 gcc/config/riscv/riscv.cc |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-riscv-expand-conditional-move-mode-dest.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -3999,8 +3999,8 @@ riscv_expand_conditional_move (rtx dest,
 arm of the conditional move.  That allows us to support more
 cases for extensions which are more general than SFB.  But
 does mean we need to force CONS into a register at this point.  */
-  cons = force_reg (GET_MODE (dest), cons);
-  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (GET_MODE (dest),
+  cons = force_reg (mode, cons);
+  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode,
  cond, cons, alt)));
   return true;
 }

[PATCH 06/44] RISC-V: Avoid repeated GET_MODE calls in `riscv_expand_conditional_move'

Use `mode0' and `mode1' shorthands respectively for `GET_MODE (op0)' and 
`GET_MODE (op1)' to improve code readability.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use 
`mode0' and `mode1' for `GET_MODE (op0)' and `GET_MODE (op1)'.
---
 gcc/config/riscv/riscv.cc |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

gcc-riscv-expand-conditional-move-mode-cmp.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4007,12 +4007,15 @@ riscv_expand_conditional_move (rtx dest,
   else if (TARGET_ZICOND_LIKE
   && GET_MODE_CLASS (mode) == MODE_INT)
 {
+  machine_mode mode0 = GET_MODE (op0);
+  machine_mode mode1 = GET_MODE (op1);
+
   /* The comparison must be comparing WORD_MODE objects.   We must
 enforce that so that we don't strip away a sign_extension
 thinking it is unnecessary.  We might consider using
 riscv_extend_operands if they are not already properly extended.  */
-  if ((GET_MODE (op0) != word_mode && GET_MODE (op0) != VOIDmode)
- || (GET_MODE (op1) != word_mode && GET_MODE (op1) != VOIDmode))
+  if ((mode0 != word_mode && mode0 != VOIDmode)
+ || (mode1 != word_mode && mode1 != VOIDmode))
return false;
 
   /* Canonicalize the comparison.  It must be an equality comparison
@@ -4032,9 +4035,9 @@ riscv_expand_conditional_move (rtx dest,
  rtx tmp = gen_reg_rtx (word_mode);
 
  /* We can support both FP and integer conditional moves.  */
- if (INTEGRAL_MODE_P (GET_MODE (XEXP (op, 0))))
+ if (INTEGRAL_MODE_P (mode0))
riscv_expand_int_scc (tmp, code, op0, op1, invert_ptr);
- else if (FLOAT_MODE_P (GET_MODE (XEXP (op, 0)))
+ else if (FLOAT_MODE_P (mode0)
   && fp_scc_comparison (op, GET_MODE (op)))
riscv_expand_float_scc (tmp, code, op0, op1);
  else

[PATCH 08/44] RISC-V: Simplify EQ vs NE selection in `riscv_expand_conditional_move'

Just choose between EQ and NE at `gen_rtx_fmt_ee' invocation, removing 
an extraneous variable only referred once and improving code clarity.
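The equivalence relied upon here can be sketched in isolation as below.  The enumeration and function names are stand-ins invented for this illustration and are not GCC identifiers.

```c
/* Stand-ins for the EQ and NE RTX codes; hypothetical names.  */
enum sketch_code { SKETCH_EQ, SKETCH_NE };

/* Old form: default to NE, then overwrite if INVERT has been set.  */
enum sketch_code
sketch_select_old (int invert)
{
  enum sketch_code new_code = SKETCH_NE;
  if (invert)
    new_code = SKETCH_EQ;
  return new_code;
}

/* New form: choose directly at the point of use.  */
enum sketch_code
sketch_select_new (int invert)
{
  return invert ? SKETCH_EQ : SKETCH_NE;
}
```

Both forms yield the same code for either value of the flag, so the change is not expected to alter semantics.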

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Remove 
extraneous variable for EQ vs NE operation selection.
---
FWIW I have no idea what "We need to know where so that we can adjust it 
for our needs." refers to, but that would have to be for another change.
---
 gcc/config/riscv/riscv.cc |   12 
 1 file changed, 4 insertions(+), 8 deletions(-)

gcc-riscv-expand-conditional-move-new-code.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4023,10 +4023,12 @@ riscv_expand_conditional_move (rtx dest,
 we can then use an equality comparison against zero.  */
   if (!equality_operator (op, VOIDmode) || op1 != CONST0_RTX (mode))
{
- enum rtx_code new_code = NE;
  bool *invert_ptr = nullptr;
  bool invert = false;
 
+ /* If riscv_expand_int_scc inverts the condition, then it will
+flip the value of INVERT.  We need to know where so that
+we can adjust it for our needs.  */
  if (code == LE || code == GE)
invert_ptr = &invert;
 
@@ -4043,13 +4045,7 @@ riscv_expand_conditional_move (rtx dest,
  else
return false;
 
- /* If riscv_expand_int_scc inverts the condition, then it will
-flip the value of INVERT.  We need to know where so that
-we can adjust it for our needs.  */
- if (invert)
-   new_code = EQ;
-
- op = gen_rtx_fmt_ee (new_code, mode, tmp, const0_rtx);
+ op = gen_rtx_fmt_ee (invert ? EQ : NE, mode, tmp, const0_rtx);
 
  /* We've generated a new comparison.  Update the local variables.  */
  code = GET_CODE (op);

[PATCH 07/44] RISC-V: Use `nullptr' in `riscv_expand_conditional_move'

Use `nullptr' for consistency rather than 0 to initialize `invert_ptr'.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Use 
`nullptr' rather than 0 to initialize a pointer.
---
 gcc/config/riscv/riscv.cc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-riscv-expand-conditional-move-nullptr.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4024,7 +4024,7 @@ riscv_expand_conditional_move (rtx dest,
   if (!equality_operator (op, VOIDmode) || op1 != CONST0_RTX (mode))
{
  enum rtx_code new_code = NE;
- bool *invert_ptr = 0;
+ bool *invert_ptr = nullptr;
  bool invert = false;
 
  if (code == LE || code == GE)

[PATCH 09/44] RISC-V: Rework branch costing model for if-conversion

The generic branch costing model for if-conversion assumes a fixed cost 
of COSTS_N_INSNS (2) for a conditional branch, and that one half of that 
cost comes from a preceding condition-set instruction, such as with 
MODE_CC targets, and then the other half of that cost is for the actual 
branch instruction.  This is hardcoded for `if_info.original_cost' in 
`noce_find_if_block' and regardless of the cost set for branches via 
BRANCH_COST.

Then `default_max_noce_ifcvt_seq_cost' instructs if-conversion to prefer 
a branchless sequence costing as much as triple the BRANCH_COST value 
set.  This is apparently to make up for the inability to accurately 
guess the branch penalty.

Consequently for the BRANCH_COST of 3 we commonly set for tuning, 
if-conversion will consider branchless sequences costing 3 * 3 - 2 = 7 
instruction units more than a corresponding branch sequence.  For the 
BRANCH_COST of 4 such as with `sifive-7-series' tuning this is even 
worse, at 3 * 4 - 2 = 10.  Effectively it means a branchless sequence 
will always be chosen if available, even a very inefficient one.
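The arithmetic above can be written out as a small sketch, working in instruction units only.  The function name is hypothetical, and the sketch merely restates the figures quoted in this description rather than reproducing the actual code in ifcvt.cc.

```c
/* Hypothetical helper: how many instruction units more than the
   branched sequence a branchless sequence may cost under the generic
   model described above.  */
int
sketch_generic_slack_insns (int branch_cost)
{
  int max_seq_insns = 3 * branch_cost;  /* Triple the BRANCH_COST.  */
  int original_insns = 2;               /* Fixed COSTS_N_INSNS (2).  */
  return max_seq_insns - original_insns;
}
```

For a BRANCH_COST of 3 this gives 7 and for a BRANCH_COST of 4 it gives 10, matching the numbers above.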

Rework the branch costing model to better match our architecture, 
observing in particular that we have no preparatory instructions for 
branches so that the cost of a branch is naked BRANCH_COST plus any 
extra overhead the processing of a branch's source RTX might incur.

Provide TARGET_INSN_COST and TARGET_MAX_NOCE_IFCVT_SEQ_COST handlers 
that return suitable cost based on BRANCH_COST.  The latter hook 
usually returns a value that is lower than the cost of the corresponding 
branched sequence.  This is because we don't really want to produce a 
branchless sequence that is more expensive than the original branched 
sequence.  If this turns out too conservative for some corner case, then 
this choice might be revisited.

Then we don't want to fiddle with `noce_find_if_block' without a lot of 
cross-target verification, so add TARGET_NOCE_CONVERSION_PROFITABLE_P 
defined such that it subtracts the fixed COSTS_N_INSNS (2) cost from the 
cost of the original branched sequence supplied and instead adds actual 
branch cost calculated from the conditional branch instruction used.  It 
is then further tweaked according to simple analysis of the replacement 
branchless sequence produced so as to cancel the cost of an extraneous 
zero extend operation produced by `noce_try_store_flag_mask' as observed 
with gcc/testsuite/gcc.target/riscv/pr105314.c.
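The basic adjustment described in this paragraph might be sketched as follows.  This is only an illustration under stated assumptions: the function and macro names are hypothetical, the macro mirrors the usual COSTS_N_INSNS scaling by 4, and the further tweak for the extraneous zero extend operation is deliberately left out.

```c
/* Mirrors the usual COSTS_N_INSNS scaling; hypothetical name.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

/* Replace the fixed branch cost assumed by the generic code with the
   cost calculated for the actual conditional branch instruction.  */
unsigned int
sketch_adjust_original_cost (unsigned int original_cost,
			     unsigned int branch_insn_cost)
{
  return original_cost - SKETCH_COSTS_N_INSNS (2) + branch_insn_cost;
}
```

The result would then be compared against the cost of the branchless replacement sequence to decide whether the conversion is profitable.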

Tweak the testsuite accordingly and set `-mbranch-cost=' explicitly for 
the relevant cases so that the expected if-conversion transformation is 
made regardless of the default BRANCH_COST value of tuning in effect.  
Some of these settings will be lowered later on as deficiencies in 
branchless sequence generation have been fixed that lower their cost 
calculated by if-conversion.

gcc/
* config/riscv/riscv.cc (riscv_insn_cost): New function.
(riscv_max_noce_ifcvt_seq_cost): Likewise.
(riscv_noce_conversion_profitable_p): Likewise.
(TARGET_INSN_COST): New macro.
(TARGET_MAX_NOCE_IFCVT_SEQ_COST): New macro.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): New macro.

gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Explicitly set the branch cost.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
---
FWIW I don't understand why the test cases absolutely HAD to have such 
overlong names guaranteed to exceed our 80 column limit in any context.  
It's such a pain to handle.
---
 gcc/config/riscv/riscv.cc                                                             |  120 ++
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c |    4
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c |    4
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c |    4
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c |    4
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c |    4
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c |    4
 7 files changed, 132 insertions(+), 12 deletions(-)

gcc-riscv-branch-cost.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -43,6 +4

[PATCH 10/44] RISC-V/testsuite: Add branched cases for integer cond-move operations

Verify, for T-Head, Ventana and Zicond targets and the integer 
conditional-move operations that already work as expected, that 
if-conversion does *not* trigger at the respective sufficiently low 
`-mbranch-cost=' settings that make original branched code sequences 
cheaper than their branchless equivalents if-conversion would emit.  
Cover all integer relational operations to make sure no corner case 
escapes.

The reason to XFAIL movdibne-thead.c and movsibne-thead.c is the 
branchless T-Head sequence:

sub a1,a0,a1
th.mveqz a2,a3,a1
mv  a0,a2
ret

produced rather than its original branched counterpart:

beq a0,a1,.L3
mv  a0,a2
ret
.L3:
mv  a0,a3
ret

at `-mbranch-cost=1', even though under this setting the latter sequence 
is obviously cheaper performance-wise.  This is because the final move 
instruction in the branchless sequence is not counted towards its cost 
and consequently the cost of both sequences works out at 8 each, making 
if-conversion prefer the branchless variant.  Use the XFAIL mark to keep 
track of these cases for future consideration.

gcc/testsuite/
* gcc.target/riscv/movdibeq-thead.c: New test.
* gcc.target/riscv/movdibge-ventana.c: New test.
* gcc.target/riscv/movdibge-zicond.c: New test.
* gcc.target/riscv/movdibgeu-ventana.c: New test.
* gcc.target/riscv/movdibgeu-zicond.c: New test.
* gcc.target/riscv/movdibgt-ventana.c: New test.
* gcc.target/riscv/movdibgt-zicond.c: New test.
* gcc.target/riscv/movdible-ventana.c: New test.
* gcc.target/riscv/movdible-zicond.c: New test.
* gcc.target/riscv/movdibleu-ventana.c: New test.
* gcc.target/riscv/movdibleu-zicond.c: New test.
* gcc.target/riscv/movdiblt-ventana.c: New test.
* gcc.target/riscv/movdiblt-zicond.c: New test.
* gcc.target/riscv/movdibne-thead.c: New test.
* gcc.target/riscv/movsibeq-thead.c: New test.
* gcc.target/riscv/movsibge-ventana.c: New test.
* gcc.target/riscv/movsibge-zicond.c: New test.
* gcc.target/riscv/movsibgeu-ventana.c: New test.
* gcc.target/riscv/movsibgeu-zicond.c: New test.
* gcc.target/riscv/movsibgt-ventana.c: New test.
* gcc.target/riscv/movsibgt-zicond.c: New test.
* gcc.target/riscv/movsible-ventana.c: New test.
* gcc.target/riscv/movsible-zicond.c: New test.
* gcc.target/riscv/movsibleu-ventana.c: New test.
* gcc.target/riscv/movsibleu-zicond.c: New test.
* gcc.target/riscv/movsiblt-ventana.c: New test.
* gcc.target/riscv/movsiblt-zicond.c: New test.
* gcc.target/riscv/movsibne-thead.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibeq-thead.c|   27 +++
 gcc/testsuite/gcc.target/riscv/movdibge-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdibge-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movdibgeu-ventana.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibgeu-zicond.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdibgt-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdibgt-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movdible-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdible-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movdibleu-ventana.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibleu-zicond.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdiblt-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdiblt-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movdibne-thead.c|   29 +
 gcc/testsuite/gcc.target/riscv/movsibeq-thead.c|   27 +++
 gcc/testsuite/gcc.target/riscv/movsibge-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsibge-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movsibgeu-ventana.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibgeu-zicond.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsibgt-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsibgt-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movsible-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsible-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movsibleu-ventana.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibleu-zicond.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsiblt-ventana.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsiblt-zicond.c   |   28 
 gcc/testsuite/gcc.target/riscv/movsibne-thead.c|

[PATCH 11/44] RISC-V/testsuite: Add branchless cases for integer cond-move operations

Verify, for T-Head, Ventana and Zicond targets and the integer 
conditional-move operations that already work as expected, that 
if-conversion triggers via `noce_try_cmove' at the respective sufficiently high 
`-mbranch-cost=' settings that make branchless code sequences produced 
by if-conversion cheaper than their original branched equivalents, and 
that extraneous instructions such as SNEZ, etc. are not present in 
output.  Cover all integer relational operations to make sure no corner 
case escapes.

gcc/testsuite/
* gcc.target/riscv/movdieq-thead.c: New test.
* gcc.target/riscv/movdige-ventana.c: New test.
* gcc.target/riscv/movdige-zicond.c: New test.
* gcc.target/riscv/movdigeu-ventana.c: New test.
* gcc.target/riscv/movdigeu-zicond.c: New test.
* gcc.target/riscv/movdigt-ventana.c: New test.
* gcc.target/riscv/movdigt-zicond.c: New test.
* gcc.target/riscv/movdile-ventana.c: New test.
* gcc.target/riscv/movdile-zicond.c: New test.
* gcc.target/riscv/movdileu-ventana.c: New test.
* gcc.target/riscv/movdileu-zicond.c: New test.
* gcc.target/riscv/movdilt-ventana.c: New test.
* gcc.target/riscv/movdilt-zicond.c: New test.
* gcc.target/riscv/movdine-thead.c: New test.
* gcc.target/riscv/movsieq-thead.c: New test.
* gcc.target/riscv/movsige-ventana.c: New test.
* gcc.target/riscv/movsige-zicond.c: New test.
* gcc.target/riscv/movsigeu-ventana.c: New test.
* gcc.target/riscv/movsigeu-zicond.c: New test.
* gcc.target/riscv/movsigt-ventana.c: New test.
* gcc.target/riscv/movsigt-zicond.c: New test.
* gcc.target/riscv/movsile-ventana.c: New test.
* gcc.target/riscv/movsile-zicond.c: New test.
* gcc.target/riscv/movsileu-ventana.c: New test.
* gcc.target/riscv/movsileu-zicond.c: New test.
* gcc.target/riscv/movsilt-ventana.c: New test.
* gcc.target/riscv/movsilt-zicond.c: New test.
* gcc.target/riscv/movsine-thead.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdieq-thead.c|   26 
 gcc/testsuite/gcc.target/riscv/movdige-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdige-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movdigeu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdigeu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdigt-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdigt-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movdile-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdile-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movdileu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdileu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdilt-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdilt-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movdine-thead.c|   26 
 gcc/testsuite/gcc.target/riscv/movsieq-thead.c|   26 
 gcc/testsuite/gcc.target/riscv/movsige-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsige-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movsigeu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsigeu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsigt-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsigt-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movsile-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsile-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movsileu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsileu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsilt-ventana.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsilt-zicond.c   |   28 ++
 gcc/testsuite/gcc.target/riscv/movsine-thead.c|   26 
 28 files changed, 776 insertions(+)

gcc-riscv-branch-cost-test-movcc.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdieq-thead.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdieq-thead.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov -mtune=thead-c906 -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ?

[PATCH 12/44] RISC-V/testsuite: Add branched cases for FP cond-move operations

Verify, for Ventana and Zicond targets and the ordered floating-point 
conditional-move operations that already work as expected, that 
if-conversion does *not* trigger at `-mbranch-cost=2' setting, which 
makes original branched code sequences cheaper than their branchless 
equivalents if-conversion would emit.  Cover all ordered floating-point 
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdibfge-ventana.c: New test.
* gcc.target/riscv/movdibfge-zicond.c: New test.
* gcc.target/riscv/movdibfgt-ventana.c: New test.
* gcc.target/riscv/movdibfgt-zicond.c: New test.
* gcc.target/riscv/movdibfle-ventana.c: New test.
* gcc.target/riscv/movdibfle-zicond.c: New test.
* gcc.target/riscv/movdibflt-ventana.c: New test.
* gcc.target/riscv/movdibflt-zicond.c: New test.
* gcc.target/riscv/movdibfne-ventana.c: New test.
* gcc.target/riscv/movdibfne-zicond.c: New test.
* gcc.target/riscv/movsibfge-ventana.c: New test.
* gcc.target/riscv/movsibfge-zicond.c: New test.
* gcc.target/riscv/movsibfgt-ventana.c: New test.
* gcc.target/riscv/movsibfgt-zicond.c: New test.
* gcc.target/riscv/movsibfle-ventana.c: New test.
* gcc.target/riscv/movsibfle-zicond.c: New test.
* gcc.target/riscv/movsibflt-ventana.c: New test.
* gcc.target/riscv/movsibflt-zicond.c: New test.
* gcc.target/riscv/movsibfne-ventana.c: New test.
* gcc.target/riscv/movsibfne-zicond.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibfge-ventana.c |   29 
 gcc/testsuite/gcc.target/riscv/movdibfge-zicond.c  |   29 
 gcc/testsuite/gcc.target/riscv/movdibfgt-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfgt-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfle-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfle-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movdibflt-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movdibflt-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfne-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfne-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfge-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfge-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfgt-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfgt-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfle-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfle-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movsibflt-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movsibflt-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfne-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfne-zicond.c  |   30 +
 20 files changed, 598 insertions(+)

gcc-riscv-branch-cost-test-movccf-branch.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfge-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfge-ventana.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifge (double w, double x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   fge.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fge\\.d|fle\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskc\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskcn\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfge-zicond.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfge-zicond.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zicond -mtune=rocket -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+t

[PATCH 16/44] RISC-V/testsuite: Add branchless cases for GEU and LEU cond-move operations

Verify, for Ventana and Zicond targets and the GEU and LEU 
conditional-move operations, that if-conversion triggers via 
`noce_try_cmove' at `-mbranch-cost=4' setting, which makes branchless 
code sequences produced by if-conversion cheaper than their original 
branched equivalents, and that extraneous instructions such as SEQZ, 
etc. are not present in output.

gcc/testsuite/
* gcc.target/riscv/movdigtu-ventana.c: New test.
* gcc.target/riscv/movdigtu-zicond.c: New test.
* gcc.target/riscv/movdiltu-ventana.c: New test.
* gcc.target/riscv/movdiltu-zicond.c: New test.
* gcc.target/riscv/movsigtu-ventana.c: New test.
* gcc.target/riscv/movsigtu-zicond.c: New test.
* gcc.target/riscv/movsiltu-ventana.c: New test.
* gcc.target/riscv/movsiltu-zicond.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdigtu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdigtu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdiltu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdiltu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsigtu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsigtu-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsiltu-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsiltu-zicond.c  |   28 ++
 8 files changed, 224 insertions(+)

gcc-riscv-expand-conditional-move-geu-leu-test-movcc.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdigtu-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdigtu-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=4 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdigtu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w > x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sgtu    a1,a0,a1
+   vt.maskcn   a3,a3,a1
+   vt.maskc    a1,a2,a1
+   or  a0,a1,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:sgtu|sltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\svt\\.maskc\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\svt\\.maskcn\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:bgeu|bgtu|bleu|bltu)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdigtu-zicond.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdigtu-zicond.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zicond -mtune=rocket -mbranch-cost=4 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdigtu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w > x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sgtu    a1,a0,a1
+   czero.nez   a3,a3,a1
+   czero.eqz   a1,a2,a1
+   or  a0,a1,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:sgtu|sltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sczero\\.eqz\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sczero\\.nez\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:bgeu|bgtu|bleu|bltu)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdiltu-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdiltu-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=4 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdiltu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w < x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sltu    a1,a0,a1
+   vt.maskcn   a3,a3,a1
+   vt.maskc    a1,a2,a1
+   or  a0,a1,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1

[PATCH 17/44] RISC-V: Avoid extraneous EQ or NE operation in cond-move expansion

In the non-zero case there is no need for the conditional value used by
Ventana and Zicond integer conditional operations to be specifically 1.
Regardless, we canonicalize it by producing an extraneous conditional-set
operation, such as with the sequence below:

(insn 22 6 23 2 (set (reg:DI 141)
(minus:DI (reg/v:DI 135 [ w ])
(reg/v:DI 136 [ x ]))) 11 {subdi3}
 (nil))
(insn 23 22 24 2 (set (reg:DI 140)
(ne:DI (reg:DI 141)
(const_int 0 [0]))) 307 {*sne_zero_didi}
 (nil))
(insn 24 23 25 2 (set (reg:DI 143)
(if_then_else:DI (eq:DI (reg:DI 140)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 13 a3 [ z ]))) 27913 {*czero.eqz.didi}
 (nil))
(insn 25 24 26 2 (set (reg:DI 142)
(if_then_else:DI (ne:DI (reg:DI 140)
(const_int 0 [0]))
(const_int 0 [0])
(reg/v:DI 137 [ y ]))) 27914 {*czero.nez.didi}
 (nil))
(insn 26 25 18 2 (set (reg/v:DI 138 [ z ])
(ior:DI (reg:DI 142)
(reg:DI 143))) 105 {iordi3}
 (nil))

where insn 23 can be removed without changing the semantics of the
sequence.  This is actually fixed up later on by combine, and the insn
does not make it to output, meaning no SNEZ (or SEQZ in the reverse case)
appears in the assembly produced.  However, it counts towards the cost of
the sequence calculated by if-conversion, raising the trigger level for
the branchless sequence to be chosen.  Arguably, emitting this extraneous
operation can also be considered rather sloppy of our backend.

Remove the check for operand 1 being constant 0 in the Ventana/Zicond 
case for equality comparisons then, observing that `riscv_zero_if_equal' 
called via `riscv_emit_int_compare' will canonicalize the comparison if 
required, removing the extraneous insn from output:

(insn 22 6 23 2 (set (reg:DI 142)
(minus:DI (reg/v:DI 135 [ w ])
(reg/v:DI 136 [ x ]))) 11 {subdi3}
 (nil))
(insn 23 22 24 2 (set (reg:DI 141)
(if_then_else:DI (eq:DI (reg:DI 142)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 13 a3 [ z ]))) 27913 {*czero.eqz.didi}
 (nil))
(insn 24 23 25 2 (set (reg:DI 140)
(if_then_else:DI (ne:DI (reg:DI 142)
(const_int 0 [0]))
(const_int 0 [0])
(reg/v:DI 137 [ y ]))) 27914 {*czero.nez.didi}
 (nil))
(insn 25 24 18 2 (set (reg/v:DI 138 [ z ])
(ior:DI (reg:DI 140)
(reg:DI 141))) 105 {iordi3}
 (nil))

while keeping actual assembly produced the same.
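
As a sanity check of the claim above, here is a minimal C model, a sketch
rather than anything taken from the patch.  The helper names `czero_eqz',
`czero_nez', `select_eq_canonical' and `select_eq_direct' are made up for
illustration, and the helpers assume the conditional-zero semantics where
the result is zero if the condition matches and the source value
otherwise.  It shows that feeding the raw difference `w - x' to the
conditional-zero operations selects the same value as first normalising
it to 0 or 1:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of czero.eqz: zero if COND is zero, else VAL.  */
static uint64_t
czero_eqz (uint64_t val, uint64_t cond)
{
  return cond == 0 ? 0 : val;
}

/* Hypothetical model of czero.nez: zero if COND is nonzero, else VAL.  */
static uint64_t
czero_nez (uint64_t val, uint64_t cond)
{
  return cond != 0 ? 0 : val;
}

/* w == x ? y : z with the extraneous SNEZ-like normalisation step.  */
static uint64_t
select_eq_canonical (uint64_t w, uint64_t x, uint64_t y, uint64_t z)
{
  uint64_t diff = w - x;
  uint64_t cond = diff != 0;
  return czero_nez (y, cond) | czero_eqz (z, cond);
}

/* The same selection using the raw difference as the condition.  */
static uint64_t
select_eq_direct (uint64_t w, uint64_t x, uint64_t y, uint64_t z)
{
  uint64_t diff = w - x;
  return czero_nez (y, diff) | czero_eqz (z, diff);
}

/* Compare both forms against the plain C expression on a few values.  */
static void
check_select_eq (void)
{
  static const uint64_t v[] = { 0, 1, 2, 42, UINT64_MAX };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        uint64_t want = v[i] == v[j] ? 10 : 20;
        assert (select_eq_canonical (v[i], v[j], 10, 20) == want);
        assert (select_eq_direct (v[i], v[j], 10, 20) == want);
      }
}
```

Only the zero/non-zero property of the condition is consumed by either
helper, which is why the normalisation step contributes nothing to the
result.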

Adjust branch costs across the test cases affected accordingly.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Remove 
the check for operand 1 being constant 0 in the Ventana/Zicond
case for equality comparisons.

gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Lower `-mbranch-cost=' setting.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
---
 gcc/config/riscv/riscv.cc                                                             |    6 +++---
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c |    4 ++--
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c |    4 ++--
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c |    4 ++--
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c |    4 ++--
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c |    4 ++--
 gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c |    4 ++--
 7 files changed, 15 insertions(+), 15 deletions(-)

gcc-riscv-expand-conditional-move-zicond-equality.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4132,9 +4132,9 @@ riscv_expand_conditional_move (rtx dest,
return false;
 
   /* Canonicalize the comparison.  It must be an equality comparison
-against 0.  If it isn't, then emit an SCC instruction so that
-we can then use an equality comparison against zero.  */
-  if (!equality_operator (op, VOIDmode) || op1 != CONST0_RTX (mode))
+of integer operands.  If it isn't, then emit an SCC instruction
+so that we can then use

[PATCH 14/44] RISC-V: Also invert the cond-move condition for GEU and LEU

Update `riscv_expand_conditional_move' and handle the missing GEU and 
LEU operators there, avoiding an extraneous conditional set operation, 
such as with this output:

sgtu    a0,a0,a1
seqz    a1,a0
czero.eqz   a3,a3,a1
czero.nez   a1,a2,a1
or  a0,a1,a3

produced when optimizing for Zicond targets from:

int
movsigtu (int w, int x, int y, int z)
{
  return w > x ? y : z;
}

These operators can be inverted producing optimal code such as this:

sgtu    a1,a0,a1
czero.nez   a3,a3,a1
czero.eqz   a1,a2,a1
or  a0,a1,a3

which this change causes to happen.
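
The inversion idea can be sketched with a small C model.  This is an
illustration under stated assumptions, not the code in
`riscv_expand_conditional_move': the helper names are made up, `sgtu' is
modelled as an unsigned greater-than yielding 0 or 1, and the
conditional-zero helpers assume the result is zero when the condition
matches and the source value otherwise.  For an unsigned `w <= x'
selection, keeping the GTU result and swapping which arm each
conditional-zero operation handles gives the same answer as inverting
the flag first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of czero.eqz: zero if COND is zero, else VAL.  */
static uint64_t
czero_eqz (uint64_t val, uint64_t cond)
{
  return cond == 0 ? 0 : val;
}

/* Hypothetical model of czero.nez: zero if COND is nonzero, else VAL.  */
static uint64_t
czero_nez (uint64_t val, uint64_t cond)
{
  return cond != 0 ? 0 : val;
}

/* Hypothetical model of SGTU: 1 if A > B as unsigned values, else 0.  */
static uint64_t
sgtu (uint64_t a, uint64_t b)
{
  return a > b;
}

/* w <= x ? y : z computed as SGTU followed by an extra SEQZ-like step.  */
static uint64_t
select_leu_with_seqz (uint64_t w, uint64_t x, uint64_t y, uint64_t z)
{
  uint64_t gtu = sgtu (w, x);
  uint64_t leu = gtu == 0;	/* The extraneous inversion.  */
  return czero_eqz (y, leu) | czero_nez (z, leu);
}

/* The same selection with the condition kept as GTU and the arms swapped.  */
static uint64_t
select_leu_inverted (uint64_t w, uint64_t x, uint64_t y, uint64_t z)
{
  uint64_t gtu = sgtu (w, x);
  return czero_nez (y, gtu) | czero_eqz (z, gtu);
}

/* Compare both forms against the plain C expression on a few values.  */
static void
check_select_leu (void)
{
  static const uint64_t v[] = { 0, 1, 2, 42, UINT64_MAX };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        uint64_t want = v[i] <= v[j] ? 10 : 20;
        assert (select_leu_with_seqz (v[i], v[j], 10, 20) == want);
        assert (select_leu_inverted (v[i], v[j], 10, 20) == want);
      }
}
```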

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Also 
invert the condition for GEU and LEU.
---
 gcc/config/riscv/riscv.cc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-riscv-expand-conditional-move-geu-leu.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4142,7 +4142,7 @@ riscv_expand_conditional_move (rtx dest,
  /* If riscv_expand_int_scc inverts the condition, then it will
 flip the value of INVERT.  We need to know where so that
 we can adjust it for our needs.  */
- if (code == LE || code == GE)
+ if (code == LE || code == LEU || code == GE || code == GEU)
invert_ptr = &invert;
 
  /* Emit an scc like instruction into a temporary

[PATCH 18/44] RISC-V/testsuite: Add branched cases for equality cond-move operations

Verify, for Ventana and Zicond targets and the equality conditional-move 
operations, that if-conversion does *not* trigger at the respective 
sufficiently low `-mbranch-cost=' settings that make original branched 
code sequences cheaper than their branchless equivalents if-conversion 
would emit.

gcc/testsuite/
* gcc.target/riscv/movdibeq-ventana.c: New test.
* gcc.target/riscv/movdibeq-zicond.c: New test.
* gcc.target/riscv/movdibne-ventana.c: New test.
* gcc.target/riscv/movdibne-zicond.c: New test.
* gcc.target/riscv/movsibeq-ventana.c: New test.
* gcc.target/riscv/movsibeq-zicond.c: New test.
* gcc.target/riscv/movsibne-ventana.c: New test.
* gcc.target/riscv/movsibne-zicond.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibeq-ventana.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibeq-zicond.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdibne-ventana.c |   30 ++
 gcc/testsuite/gcc.target/riscv/movdibne-zicond.c  |   30 ++
 gcc/testsuite/gcc.target/riscv/movsibeq-ventana.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibeq-zicond.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsibne-ventana.c |   30 ++
 gcc/testsuite/gcc.target/riscv/movsibne-zicond.c  |   30 ++
 8 files changed, 232 insertions(+)

gcc-riscv-expand-conditional-move-zicond-equality-test-movcc-branch.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibeq-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibeq-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bne a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\ssub\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskc\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskcn\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibeq-zicond.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibeq-zicond.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zicond -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bne a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\ssub\\s" } } */
+/* { dg-final { scan-assembler-not "\\sczero\\.eqz\\s" } } */
+/* { dg-final { scan-assembler-not "\\sczero\\.nez\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibne-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibne-ventana.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdine (int_t w, int_t x, int_t y, int_t z)
+{
+  return w != x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   beq a0,a1,.L3
+   mv  a0,a2
+   ret
+.L3:
+   mv  a0,a3
+   ret
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\ssub\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskc\\s" } }

[PATCH 19/44] RISC-V/testsuite: Add branchless cases for equality cond-move operations

Verify, for Ventana and Zicond targets and the equality conditional-move 
operations, that if-conversion triggers via `noce_try_cmove' at the 
respective sufficiently high `-mbranch-cost=' settings that make 
branchless code sequences produced by if-conversion cheaper than their 
original branched equivalents, and that extraneous instructions such as 
SNEZ, etc. are not present in output.

gcc/testsuite/
* gcc.target/riscv/movdieq-ventana.c: New test.
* gcc.target/riscv/movdieq-zicond.c: New test.
* gcc.target/riscv/movdine-ventana.c: New test.
* gcc.target/riscv/movdine-zicond.c: New test.
* gcc.target/riscv/movsieq-ventana.c: New test.
* gcc.target/riscv/movsieq-zicond.c: New test.
* gcc.target/riscv/movsine-ventana.c: New test.
* gcc.target/riscv/movsine-zicond.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdieq-ventana.c |   28 +++
 gcc/testsuite/gcc.target/riscv/movdieq-zicond.c  |   28 +++
 gcc/testsuite/gcc.target/riscv/movdine-ventana.c |   28 +++
 gcc/testsuite/gcc.target/riscv/movdine-zicond.c  |   28 +++
 gcc/testsuite/gcc.target/riscv/movsieq-ventana.c |   28 +++
 gcc/testsuite/gcc.target/riscv/movsieq-zicond.c  |   28 +++
 gcc/testsuite/gcc.target/riscv/movsine-ventana.c |   28 +++
 gcc/testsuite/gcc.target/riscv/movsine-zicond.c  |   28 +++
 8 files changed, 224 insertions(+)

gcc-riscv-expand-conditional-move-zicond-equality-test-movcc.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdieq-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdieq-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=4 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sub a1,a0,a1
+   vt.maskc    a3,a3,a1
+   vt.maskcn   a1,a2,a1
+   or  a0,a1,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\ssub\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\svt\\.maskc\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\svt\\.maskcn\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdieq-zicond.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdieq-zicond.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zicond -mtune=rocket -mbranch-cost=4 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sub a1,a0,a1
+   czero.eqz   a3,a3,a1
+   czero.nez   a1,a2,a1
+   or  a0,a1,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\ssub\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sczero\\.eqz\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\sczero\\.nez\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdine-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdine-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdine (int_t w, int_t x, int_t y, int_t z)
+{
+  return w != x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sub a1,a0,a1
+   vt.maskcn   a3,a3,a1
+   vt.maskc    a1,a2,a1
+   or  a0,a1,a3
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times

[PATCH 15/44] RISC-V/testsuite: Add branched cases for GEU and LEU cond-move operations

Verify, for Ventana and Zicond targets and the GEU and LEU 
conditional-move operations, that if-conversion does *not* trigger at 
`-mbranch-cost=3' setting, which makes original branched code sequences 
cheaper than their branchless equivalents if-conversion would emit.

gcc/testsuite/
* gcc.target/riscv/movdibgtu-ventana.c: New test.
* gcc.target/riscv/movdibgtu-zicond.c: New test.
* gcc.target/riscv/movdibltu-ventana.c: New test.
* gcc.target/riscv/movdibltu-zicond.c: New test.
* gcc.target/riscv/movsibgtu-ventana.c: New test.
* gcc.target/riscv/movsibgtu-zicond.c: New test.
* gcc.target/riscv/movsibltu-ventana.c: New test.
* gcc.target/riscv/movsibltu-zicond.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibgtu-ventana.c |   28 +
 gcc/testsuite/gcc.target/riscv/movdibgtu-zicond.c  |   28 +
 gcc/testsuite/gcc.target/riscv/movdibltu-ventana.c |   28 +
 gcc/testsuite/gcc.target/riscv/movdibltu-zicond.c  |   28 +
 gcc/testsuite/gcc.target/riscv/movsibgtu-ventana.c |   28 +
 gcc/testsuite/gcc.target/riscv/movsibgtu-zicond.c  |   28 +
 gcc/testsuite/gcc.target/riscv/movsibltu-ventana.c |   28 +
 gcc/testsuite/gcc.target/riscv/movsibltu-zicond.c  |   28 +
 8 files changed, 224 insertions(+)

gcc-riscv-expand-conditional-move-geu-leu-test-movcc-branch.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibgtu-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibgtu-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdigtu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w > x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bleu    a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bgeu|bgtu|bleu|bltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:sgtu|sltu)\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskc\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskcn\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibgtu-zicond.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibgtu-zicond.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zicond -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdigtu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w > x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bleu    a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bgeu|bgtu|bleu|bltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:sgtu|sltu)\\s" } } */
+/* { dg-final { scan-assembler-not "\\sczero\\.eqz\\s" } } */
+/* { dg-final { scan-assembler-not "\\sczero\\.nez\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibltu-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibltu-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdiltu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w < x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bgeu    a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bgeu|bgtu|bleu|bltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:sgtu|sltu)\\s" } } */
+/

[PATCH 20/44] RISC-V: Also accept constants for T-Head cond-move comparison operands

There is no need for the requirement for conditional-move comparison 
operands to be stricter for T-Head targets than for other targets and 
limit them to registers only.  Constants will be reloaded if required 
just as with branches or other-target conditional-move operations and 
there is no extra overhead specific to the T-Head case.  This enables 
more opportunities for a branchless sequence to be produced.
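
For illustration only, the kind of source this is aimed at might look
like the function below.  The name `movdieq_const' and the constant 1000
are made up, and whether a given compiler build actually emits a
branchless sequence for it depends on the options and cost settings in
effect:

```c
#include <stdint.h>

/* Hypothetical input: an equality conditional move whose second
   comparison operand is a nonzero constant rather than a register.  */
int64_t
movdieq_const (int64_t w, int64_t y, int64_t z)
{
  return w == 1000 ? y : z;
}
```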

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Also 
accept constants for T-Head comparison operands.
---
 gcc/config/riscv/riscv.cc |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-riscv-expand-conditional-move-thead-op.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4097,8 +4097,8 @@ riscv_expand_conditional_move (rtx dest,
   && reg_or_0_operand (cons, mode)
   && reg_or_0_operand (alt, mode)
   && (GET_MODE (op) == mode || GET_MODE (op) == E_VOIDmode)
-  && GET_MODE (op0) == mode
-  && GET_MODE (op1) == mode
+  && (GET_MODE (op0) == mode || CONST_INT_P (op0))
+  && (GET_MODE (op1) == mode || CONST_INT_P (op1))
   && (code == EQ || code == NE))
 need_eq_ne_p = true;

[PATCH 21/44] RISC-V: Also accept constants for T-Head cond-move data input operands

There is no need for the requirement for conditional-move data input 
operands to be stricter for T-Head targets than for short forward branch 
targets and limit them to registers only.  They are keyed according to 
the `sfb_alu_operand' predicate, which lets certain constants through.  
Such constants are already forced into a register for the `cons' operand 
in the analogous short forward branch case and we can force them for the 
`alt' operand and T-Head as well.  This enables more opportunities for a 
branchless sequence to be produced.
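
As a hypothetical example of input that has a constant data operand, the
function below uses a constant for the value selected when the condition
is false.  The name `movdieq_alt_const' and the constant 5 are made up
for illustration, and no claim is made here about which constants the
`sfb_alu_operand' predicate accepts or about the code a particular
compiler build produces:

```c
#include <stdint.h>

/* Hypothetical input: the `alt' value of the conditional move is a
   constant, so it would have to be placed in a register before a
   register-only conditional-move instruction could use it.  */
int64_t
movdieq_alt_const (int64_t w, int64_t x, int64_t y)
{
  return w == x ? y : 5;
}
```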

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Also
accept constants for T-Head data input operands.
---
 gcc/config/riscv/riscv.cc |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

gcc-riscv-expand-conditional-move-thead-alt.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4094,8 +4094,6 @@ riscv_expand_conditional_move (rtx dest,
 
   if (TARGET_XTHEADCONDMOV
   && GET_MODE_CLASS (mode) == MODE_INT
-  && reg_or_0_operand (cons, mode)
-  && reg_or_0_operand (alt, mode)
   && (GET_MODE (op) == mode || GET_MODE (op) == E_VOIDmode)
   && (GET_MODE (op0) == mode || CONST_INT_P (op0))
   && (GET_MODE (op1) == mode || CONST_INT_P (op1))
@@ -4113,6 +4111,8 @@ riscv_expand_conditional_move (rtx dest,
 cases for extensions which are more general than SFB.  But
 does mean we need to force CONS into a register at this point.  */
   cons = force_reg (mode, cons);
+  /* With XTheadCondMov we need to force ALT into a register too.  */
+  alt = force_reg (mode, alt);
   emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode,
  cond, cons, alt)));
   return true;

[PATCH 22/44] RISC-V: Fold all the cond-move variants together

Code in `riscv_expand_conditional_move' for Ventana and Zicond targets 
seems to have been bolted on as an afterthought rather than properly 
merged so as to handle all the cases together.

Fold the existing code pieces together then (observing that for short 
forward branch targets no integer comparisons need to be canonicalized), 
letting T-Head targets produce branchless sequences for all the integer 
comparisons rather than for equality ones only, and preparing for the 
handling of floating-point comparisons here across all conditional-move 
targets.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Unify
conditional-move handling across all the relevant targets.
---
 gcc/config/riscv/riscv.cc |   58 +++---
 1 file changed, 25 insertions(+), 33 deletions(-)

gcc-riscv-expand-conditional-move-sfb-alu-thead.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4090,35 +4090,9 @@ riscv_expand_conditional_move (rtx dest,
   rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
   rtx op1 = XEXP (op, 1);
-  bool need_eq_ne_p = false;
-
-  if (TARGET_XTHEADCONDMOV
-  && GET_MODE_CLASS (mode) == MODE_INT
-  && (GET_MODE (op) == mode || GET_MODE (op) == E_VOIDmode)
-  && (GET_MODE (op0) == mode || CONST_INT_P (op0))
-  && (GET_MODE (op1) == mode || CONST_INT_P (op1))
-  && (code == EQ || code == NE))
-need_eq_ne_p = true;
-
-  if (need_eq_ne_p
-  || (TARGET_SFB_ALU && GET_MODE (op0) == word_mode))
-{
-  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
-  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
 
-  /* The expander is a bit loose in its specification of the true
-arm of the conditional move.  That allows us to support more
-cases for extensions which are more general than SFB.  But
-does mean we need to force CONS into a register at this point.  */
-  cons = force_reg (mode, cons);
-  /* With XTheadCondMov we need to force ALT into a register too.  */
-  alt = force_reg (mode, alt);
-  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode,
- cond, cons, alt)));
-  return true;
-}
-  else if (TARGET_ZICOND_LIKE
-  && GET_MODE_CLASS (mode) == MODE_INT)
+  if ((TARGET_ZICOND_LIKE && GET_MODE_CLASS (mode) == MODE_INT)
+  || TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
 {
   machine_mode mode0 = GET_MODE (op0);
   machine_mode mode1 = GET_MODE (op1);
@@ -4132,9 +4106,11 @@ riscv_expand_conditional_move (rtx dest,
return false;
 
   /* Canonicalize the comparison.  It must be an equality comparison
-of integer operands.  If it isn't, then emit an SCC instruction
+of integer operands, or with SFB it can be any comparison of
+integer operands.  If it isn't, then emit an SCC instruction
 so that we can then use an equality comparison against zero.  */
-  if (!equality_operator (op, VOIDmode) || !INTEGRAL_MODE_P (mode0))
+  if ((!TARGET_SFB_ALU && !equality_operator (op, VOIDmode))
+ || !INTEGRAL_MODE_P (mode0))
{
  bool *invert_ptr = nullptr;
  bool invert = false;
@@ -4166,10 +4142,26 @@ riscv_expand_conditional_move (rtx dest,
  op1 = XEXP (op, 1);
}
 
+  if (TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
+   {
+ riscv_emit_int_compare (&code, &op0, &op1, !TARGET_SFB_ALU);
+ rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+
+ /* The expander is a bit loose in its specification of the true
+arm of the conditional move.  That allows us to support more
+cases for extensions which are more general than SFB.  But
+does mean we need to force CONS into a register at this point.  */
+ cons = force_reg (mode, cons);
+ /* With XTheadCondMov we need to force ALT into a register too.  */
+ alt = force_reg (mode, alt);
+ emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, cond,
+ cons, alt)));
+ return true;
+   }
   /* 0, reg or 0, imm */
-  if (cons == CONST0_RTX (mode)
- && (REG_P (alt)
- || (CONST_INT_P (alt) && alt != CONST0_RTX (mode))))
+  else if (cons == CONST0_RTX (mode)
+  && (REG_P (alt)
+  || (CONST_INT_P (alt) && alt != CONST0_RTX (mode))))
{
  riscv_emit_int_compare (&code, &op0, &op1, true);
  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);

[PATCH 23/44] RISC-V/testsuite: Add branched cases for T-Head non-equality cond moves

Verify, for T-Head targets and the non-equality integer conditional-move 
operations, that if-conversion does *not* trigger at `-mbranch-cost=1' 
setting, which makes original branched code sequences cheaper than their 
branchless equivalents if-conversion would emit.

gcc/testsuite/
* gcc.target/riscv/movdibge-thead.c: New test.
* gcc.target/riscv/movdibgeu-thead.c: New test.
* gcc.target/riscv/movdibgt-thead.c: New test.
* gcc.target/riscv/movdibgtu-thead.c: New test.
* gcc.target/riscv/movdible-thead.c: New test.
* gcc.target/riscv/movdibleu-thead.c: New test.
* gcc.target/riscv/movdiblt-thead.c: New test.
* gcc.target/riscv/movdibltu-thead.c: New test.
* gcc.target/riscv/movsibge-thead.c: New test.
* gcc.target/riscv/movsibgeu-thead.c: New test.
* gcc.target/riscv/movsibgt-thead.c: New test.
* gcc.target/riscv/movsibgtu-thead.c: New test.
* gcc.target/riscv/movsible-thead.c: New test.
* gcc.target/riscv/movsibleu-thead.c: New test.
* gcc.target/riscv/movsiblt-thead.c: New test.
* gcc.target/riscv/movsibltu-thead.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibge-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movdibgeu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movdibgt-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movdibgtu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movdible-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movdibleu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movdiblt-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movdibltu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movsibge-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movsibgeu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movsibgt-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movsibgtu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movsible-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movsibleu-thead.c |   27 +++
 gcc/testsuite/gcc.target/riscv/movsiblt-thead.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/movsibltu-thead.c |   27 +++
 16 files changed, 432 insertions(+)

gcc-riscv-expand-conditional-move-sfb-alu-thead-test-movcc-branch.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibge-thead.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibge-thead.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov -mtune=thead-c906 -mbranch-cost=1 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   blt a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bge|bgt|ble|blt)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:sgt|slt)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:th\\.mveqz|th\\.mvnez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibgeu-thead.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibgeu-thead.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov -mtune=thead-c906 -mbranch-cost=1 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdigeu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bltu a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bgeu|bgtu|bleu|bltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:sgtu|sltu)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:th\\.mveqz|th\\.mvnez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibgt-thead.c

[PATCH 25/44] RISC-V: Implement `riscv_emit_unary' helper

Add a `riscv_emit_unary' helper for unary operations, complementing 
`riscv_emit_binary'.

gcc/
* config/riscv/riscv-protos.h (riscv_emit_unary): New prototype.
* config/riscv/riscv.cc (riscv_emit_unary): New function.
---
 gcc/config/riscv/riscv-protos.h |    1 +
 gcc/config/riscv/riscv.cc       |    8 ++++++++
 2 files changed, 9 insertions(+)

gcc-riscv-emit-unary.diff
Index: gcc/gcc/config/riscv/riscv-protos.h
===
--- gcc.orig/gcc/config/riscv/riscv-protos.h
+++ gcc/gcc/config/riscv/riscv-protos.h
@@ -134,6 +134,7 @@ riscv_zcmp_valid_stack_adj_bytes_p (HOST
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
+extern rtx riscv_emit_unary (enum rtx_code code, rtx dest, rtx x);
 extern rtx riscv_emit_binary (enum rtx_code code, rtx dest, rtx x, rtx y);
 #endif
 extern bool riscv_expand_conditional_move (rtx, rtx, rtx, rtx);
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -1703,6 +1703,14 @@ riscv_emit_set (rtx target, rtx src)
   return target;
 }
 
+/* Emit an instruction of the form (set DEST (CODE X)).  */
+
+rtx
+riscv_emit_unary (enum rtx_code code, rtx dest, rtx x)
+{
+  return riscv_emit_set (dest, gen_rtx_fmt_e (code, GET_MODE (dest), x));
+}
+
 /* Emit an instruction of the form (set DEST (CODE X Y)).  */
 
 rtx

[PATCH 24/44] RISC-V/testsuite: Add branchless cases for T-Head non-equality cond moves

Verify, for T-Head targets and the non-equality integer conditional-move 
operations, that if-conversion triggers via `noce_try_cmove' at 
`-mbranch-cost=2' setting, which makes branchless code sequences 
produced by if-conversion cheaper than their original branched 
equivalents, and that extraneous instructions such as SNEZ, etc. are not 
present in output.
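
The branchless sequence expected by the first of these tests (SLT followed 
by TH.MVNEZ) can be modelled in plain C as below.  This is a sketch for 
illustration only: the helper names `slt', `th_mvnez' and `movdige_model' 
are hypothetical, and TH.MVNEZ is assumed to replace the destination with 
its first source when the second source is non-zero and to leave it 
unchanged otherwise.

```c
#include <stdint.h>

/* Model of SLT: 1 if A is less than B (signed), 0 otherwise.  */
static int64_t
slt (int64_t a, int64_t b)
{
  return a < b;
}

/* Model of TH.MVNEZ rd,rs1,rs2 (assumed semantics): yield RS1 if RS2
   is non-zero, otherwise keep the previous value of RD.  */
static int64_t
th_mvnez (int64_t rd, int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? rs1 : rd;
}

/* Model of `w >= x ? y : z' without a branch.  */
int64_t
movdige_model (int64_t w, int64_t x, int64_t y, int64_t z)
{
  int64_t t = slt (w, x);	/* slt      a0,a0,a1 */
  y = th_mvnez (y, z, t);	/* th.mvnez a2,a3,a0 */
  return y;			/* mv       a0,a2 */
}
```

The inverted comparison (W < X rather than W >= X) is what lets a single 
set-on-less-than feed the conditional move directly, with no extra SEQZ or 
SNEZ needed.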

gcc/testsuite/
* gcc.target/riscv/movdige-thead.c: New test.
* gcc.target/riscv/movdigeu-thead.c: New test.
* gcc.target/riscv/movdigt-thead.c: New test.
* gcc.target/riscv/movdigtu-thead.c: New test.
* gcc.target/riscv/movdile-thead.c: New test.
* gcc.target/riscv/movdileu-thead.c: New test.
* gcc.target/riscv/movdilt-thead.c: New test.
* gcc.target/riscv/movdiltu-thead.c: New test.
* gcc.target/riscv/movsige-thead.c: New test.
* gcc.target/riscv/movsigeu-thead.c: New test.
* gcc.target/riscv/movsigt-thead.c: New test.
* gcc.target/riscv/movsigtu-thead.c: New test.
* gcc.target/riscv/movsile-thead.c: New test.
* gcc.target/riscv/movsileu-thead.c: New test.
* gcc.target/riscv/movsilt-thead.c: New test.
* gcc.target/riscv/movsiltu-thead.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdige-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movdigeu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movdigt-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movdigtu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movdile-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movdileu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movdilt-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movdiltu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movsige-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movsigeu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movsigt-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movsigtu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movsile-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movsileu-thead.c |   26 
 gcc/testsuite/gcc.target/riscv/movsilt-thead.c  |   26 
 gcc/testsuite/gcc.target/riscv/movsiltu-thead.c |   26 
 16 files changed, 416 insertions(+)

gcc-riscv-expand-conditional-move-sfb-alu-thead-test-movcc.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdige-thead.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdige-thead.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov -mtune=thead-c906 -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   slt a0,a0,a1
+   th.mvnez a2,a3,a0
+   mv  a0,a2
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:sgt|slt)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:th\\.mveqz|th\\.mvnez)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:bge|bgt|ble|blt)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdigeu-thead.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdigeu-thead.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov -mtune=thead-c906 -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef unsigned int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdigeu (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sltu a0,a0,a1
+   th.mvnez a2,a3,a0
+   mv  a0,a2
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:sgtu|sltu)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:th\\.mveqz|th\\.mvnez)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-as

[PATCH 26/44] RISC-V: Add `movMODEcc' implementation for generic targets

Provide RTL expansion of conditional-move operations for generic targets 
using a suitable sequence of base integer machine instructions according 
to cost evaluation by if-conversion.  Add `-mmovcc' command line option 
to enable this transformation, off by default.

For the generic sequences, small immediates as per the `arith_operand' 
predicate are cost-equivalent to registers, because we can use them in 
place of a register as input to the respective AND[I] machine operations.  
However we need to reject immediates fulfilling `lui_operand', because 
they would require reloading into a register, making the operation more 
costly.  Therefore add a `movcc_operand' predicate and use it accordingly.
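
The distinction between the two classes of constants can be sketched at 
the ISA level as follows.  This is a simplified approximation, not a copy 
of the `arith_operand' or `lui_operand' definitions: the function names 
`fits_imm12' and `fits_lui' are hypothetical, and only the constant case 
of each predicate is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V fits the 12-bit signed immediate field of an I-type
   instruction such as ANDI, so it can be used without a register.  */
bool
fits_imm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* True if V can be materialized by a single LUI: the low 12 bits are
   clear and the value is representable as a sign-extended 32-bit
   quantity.  Such a constant still has to be loaded into a register
   before it can be an AND operand, hence the extra cost.  */
bool
fits_lui (int64_t v)
{
  return (v & 0xfff) == 0 && v >= INT32_MIN && v <= INT32_MAX;
}
```

Note that zero satisfies both checks in this sketch; the point is only 
that a non-zero constant such as 0x12345000 passes the second check but 
not the first.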

There is a need to adjust zbs-bext-02.c, which can also serve as emitted 
code example, because with certain compilation options an AND operation 
can now legitimately appear in output despite BEXT having been produced 
as expected, such as with `-march=rv64gc -O2':

foo:
mv  a3,a0
li  a5,0
mv  a0,a1
li  a2,64
li  a1,1
.L3:
sll a4,a1,a5
and a4,a4,a3
addiw   a5,a5,1
beq a4,zero,.L2
addiw   a0,a0,1
.L2:
bne a5,a2,.L3
ret

vs `-march=rv64gc_zbs -O2':

foo:
mv  a4,a0
li  a5,0
mv  a0,a1
li  a3,64
.L3:
bext a2,a4,a5
beq a2,zero,.L2
addiw   a0,a0,1
.L2:
addiw   a5,a5,1
bne a5,a3,.L3
ret

and then with `-march=rv64gc -mmovcc -mbranch-cost=7':

foo:
mv  a6,a0
li  a4,0
mv  a0,a1
li  a7,1
li  a1,64
.L3:
sll a5,a7,a4
and a5,a5,a6
snez a5,a5
neg a5,a5
not a2,a5
addiw   a3,a0,1
and a5,a5,a3
and a0,a2,a0
addiw   a4,a4,1
or  a0,a5,a0
bne a4,a1,.L3
ret

vs `-march=rv64gc_zbs -mmovcc -mbranch-cost=7':

foo:
mv  a6,a0
li  a4,0
mv  a0,a1
li  a1,64
.L3:
bext a5,a6,a4
neg a5,a5
not a2,a5
addiw   a3,a0,1
and a5,a5,a3
and a0,a2,a0
addiw   a4,a4,1
or  a0,a5,a0
bne a4,a1,.L3
ret

However BEXT is supposed to replace an SLL operation so adjust the test 
case to reject SLL rather than AND, letting the test case pass even with 
`/-mmovcc/-mbranch-cost=7' specified as DejaGNU test flags (and in the 
absence of target-specific conditional-move operations enabled either by 
default or with other test flags).
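
The SNEZ, NEG, NOT, AND, AND, OR pattern visible in the `-mmovcc' listings 
above amounts to a mask-based select.  A minimal C sketch of that idea 
follows; the function name `select_mask' is hypothetical and the code is 
an illustration of the technique rather than a description of what the 
expander emits in every case.

```c
#include <stdint.h>

/* Return IF_NONZERO when COND is non-zero and IF_ZERO otherwise,
   without a branch.  */
uint64_t
select_mask (uint64_t cond, uint64_t if_nonzero, uint64_t if_zero)
{
  /* snez; neg: all-ones when COND is non-zero, all-zeros otherwise.  */
  uint64_t mask = -(uint64_t) (cond != 0);

  /* and; not; and; or: keep exactly one of the two inputs.  */
  return (mask & if_nonzero) | (~mask & if_zero);
}
```

With Zbs the condition is already 0 or 1 after BEXT, which is why the 
SNEZ step disappears from the last listing.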

gcc/
* config/riscv/predicates.md (movcc_operand): New predicate.
* config/riscv/riscv.cc (riscv_expand_conditional_move): Handle
generic targets.
* config/riscv/riscv.md (movcc): Likewise.
* config/riscv/riscv.opt (mmovcc): New option.
* doc/invoke.texi (Option Summary): Document it.

gcc/testsuite/
* gcc.target/riscv/zbs-bext-02.c: Adjust to reject SLL rather 
than AND.
---
 gcc/config/riscv/predicates.md   |6 +++
 gcc/config/riscv/riscv.cc|   41 ++-
 gcc/config/riscv/riscv.md|7 ++--
 gcc/config/riscv/riscv.opt   |4 ++
 gcc/doc/invoke.texi  |9 +
 gcc/testsuite/gcc.target/riscv/zbs-bext-02.c |2 -
 6 files changed, 58 insertions(+), 11 deletions(-)

gcc-riscv-movcc.diff
Index: gcc/gcc/config/riscv/predicates.md
===
--- gcc.orig/gcc/config/riscv/predicates.md
+++ gcc/gcc/config/riscv/predicates.md
@@ -41,6 +41,12 @@
   (ior (match_operand 0 "arith_operand")
(match_operand 0 "lui_operand")))
 
+(define_predicate "movcc_operand"
+  (if_then_else (match_test "TARGET_SFB_ALU || TARGET_XTHEADCONDMOV
+|| TARGET_ZICOND_LIKE")
+   (match_operand 0 "sfb_alu_operand")
+   (match_operand 0 "arith_operand")))
+
 (define_predicate "const_csr_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 31)")))
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4099,7 +4099,9 @@ riscv_expand_conditional_move (rtx dest,
   rtx op0 = XEXP (op, 0);
   rtx op1 = XEXP (op, 1);
 
-  if ((TARGET_ZICOND_LIKE && GET_MODE_CLASS (mode) == MODE_INT)
+  if (((TARGET_ZICOND_LIKE
+   || (arith_operand (cons, mode) && arith_operand (alt, mode)))
+   && (GET_MODE_CLASS (mode) == MODE_INT))
   || TARGET_SFB_ALU || TARGET_XTHEADCONDMOV)
 {
   machine_mode mode0 = GET_MODE (op0);
@@ -4113,6 +4115,15 @@ riscv_expand_conditional_move (rtx dest,
  || (mode1 != word_mode && mode1 != VOID

[PATCH 28/44] RISC-V/testsuite: Add branchless cases for generic integer cond moves

Verify, for generic integer conditional-move operations, that 
if-conversion triggers via `noce_try_cmove' at the respective 
sufficiently high `-mbranch-cost=' settings that make branchless code 
sequences produced by if-conversion cheaper than their original branched 
equivalents, and, where applicable, that extraneous instructions such as 
SNEZ, etc. are not present in output.  Cover all integer relational 
operations to make sure no corner case escapes.
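
For the equality case the expected sequence is SUB, SNEZ, NEG, AND, NOT, 
AND, OR.  A step-by-step C model of it is sketched below; the function 
name `movdieq_model' is hypothetical, and unsigned arithmetic is used so 
that the subtraction and negation wrap as the machine instructions do.

```c
#include <stdint.h>

/* Model of `w == x ? y : z' following the expected instruction
   sequence one operation at a time.  */
int64_t
movdieq_model (int64_t w, int64_t x, int64_t y, int64_t z)
{
  uint64_t t = (uint64_t) w - (uint64_t) x;	/* sub  a5,a0,a1 */
  t = t != 0;					/* snez a5,a5 */
  t = -t;					/* neg  a5,a5 */
  uint64_t alt = t & (uint64_t) z;		/* and  a3,a5,a3 */
  t = ~t;					/* not  a5,a5 */
  t = t & (uint64_t) y;				/* and  a5,a5,a2 */
  return (int64_t) (alt | t);			/* or   a0,a3,a5 */
}
```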

gcc/testsuite/
* gcc.target/riscv/movdieq.c: New test.
* gcc.target/riscv/movdige.c: New test.
* gcc.target/riscv/movdigeu.c: New test.
* gcc.target/riscv/movdigt.c: New test.
* gcc.target/riscv/movdigtu.c: New test.
* gcc.target/riscv/movdile.c: New test.
* gcc.target/riscv/movdileu.c: New test.
* gcc.target/riscv/movdilt.c: New test.
* gcc.target/riscv/movdiltu.c: New test.
* gcc.target/riscv/movdine.c: New test.
* gcc.target/riscv/movsieq.c: New test.
* gcc.target/riscv/movsige.c: New test.
* gcc.target/riscv/movsigeu.c: New test.
* gcc.target/riscv/movsigt.c: New test.
* gcc.target/riscv/movsigtu.c: New test.
* gcc.target/riscv/movsile.c: New test.
* gcc.target/riscv/movsileu.c: New test.
* gcc.target/riscv/movsilt.c: New test.
* gcc.target/riscv/movsiltu.c: New test.
* gcc.target/riscv/movsine.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdieq.c  |   29 +
 gcc/testsuite/gcc.target/riscv/movdige.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdigeu.c |   28 
 gcc/testsuite/gcc.target/riscv/movdigt.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdigtu.c |   28 
 gcc/testsuite/gcc.target/riscv/movdile.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdileu.c |   28 
 gcc/testsuite/gcc.target/riscv/movdilt.c  |   28 
 gcc/testsuite/gcc.target/riscv/movdiltu.c |   28 
 gcc/testsuite/gcc.target/riscv/movdine.c  |   29 +
 gcc/testsuite/gcc.target/riscv/movsieq.c  |   29 +
 gcc/testsuite/gcc.target/riscv/movsige.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsigeu.c |   28 
 gcc/testsuite/gcc.target/riscv/movsigt.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsigtu.c |   28 
 gcc/testsuite/gcc.target/riscv/movsile.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsileu.c |   28 
 gcc/testsuite/gcc.target/riscv/movsilt.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsiltu.c |   28 
 gcc/testsuite/gcc.target/riscv/movsine.c  |   29 +
 20 files changed, 564 insertions(+)

gcc-riscv-test-movcc-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdieq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdieq.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=7 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   sub a5,a0,a1
+   snez a5,a5
+   neg a5,a5
+   and a3,a5,a3
+   not a5,a5
+   and a5,a5,a2
+   or  a0,a3,a5
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\ssub\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:seqz|snez)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdige.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdige.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=6 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   slt a1,a0,a1
+   neg a1,a1
+   and a3,a1,a3
+   not a1,a1
+   and a

[PATCH 30/44] RISC-V/testsuite: Add branched cases for generic integer cond adds

Verify, for generic integer conditional-add operations, that 
if-conversion does *not* trigger at the respective sufficiently low 
`-mbranch-cost=' settings that make original branched code sequences 
cheaper than the branchless equivalents if-conversion would emit.  Cover 
all integer relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/adddibeq.c: New test.
* gcc.target/riscv/adddibge.c: New test.
* gcc.target/riscv/adddibgeu.c: New test.
* gcc.target/riscv/adddibgt.c: New test.
* gcc.target/riscv/adddibgtu.c: New test.
* gcc.target/riscv/adddible.c: New test.
* gcc.target/riscv/adddibleu.c: New test.
* gcc.target/riscv/adddiblt.c: New test.
* gcc.target/riscv/adddibltu.c: New test.
* gcc.target/riscv/adddibne.c: New test.
* gcc.target/riscv/addsibeq.c: New test.
* gcc.target/riscv/addsibge.c: New test.
* gcc.target/riscv/addsibgeu.c: New test.
* gcc.target/riscv/addsibgt.c: New test.
* gcc.target/riscv/addsibgtu.c: New test.
* gcc.target/riscv/addsible.c: New test.
* gcc.target/riscv/addsibleu.c: New test.
* gcc.target/riscv/addsiblt.c: New test.
* gcc.target/riscv/addsibltu.c: New test.
* gcc.target/riscv/addsibne.c: New test.
---
 gcc/testsuite/gcc.target/riscv/adddibeq.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibge.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibgeu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibgt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibgtu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddible.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibleu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddiblt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibltu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibne.c  |   28 
 gcc/testsuite/gcc.target/riscv/addsibeq.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibge.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibgeu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibgt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibgtu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsible.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibleu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsiblt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibltu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibne.c  |   28 
 20 files changed, 524 insertions(+)

gcc-riscv-test-addcc-branch-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/adddibeq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddibeq.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=4 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   bne a0,a1,.L2
+   add a2,a2,a3
+.L2:
+   mv  a0,a2
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\ssub\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/adddibge.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddibge.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   blt a0,a1,.L2
+   add a2,a2,a3
+.L2:
+   mv  a0,a2
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bge|bgt|ble|blt

[PATCH 27/44] RISC-V/testsuite: Add branched cases for generic integer cond moves

Verify, for generic integer conditional-move operations, that 
if-conversion does *not* trigger at the respective sufficiently low 
`-mbranch-cost=' settings that make original branched code sequences 
cheaper than the branchless equivalents if-conversion would emit.  Cover 
all integer relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdibeq.c: New test.
* gcc.target/riscv/movdibge.c: New test.
* gcc.target/riscv/movdibgeu.c: New test.
* gcc.target/riscv/movdibgt.c: New test.
* gcc.target/riscv/movdibgtu.c: New test.
* gcc.target/riscv/movdible.c: New test.
* gcc.target/riscv/movdibleu.c: New test.
* gcc.target/riscv/movdiblt.c: New test.
* gcc.target/riscv/movdibltu.c: New test.
* gcc.target/riscv/movdibne.c: New test.
* gcc.target/riscv/movsibeq.c: New test.
* gcc.target/riscv/movsibge.c: New test.
* gcc.target/riscv/movsibgeu.c: New test.
* gcc.target/riscv/movsibgt.c: New test.
* gcc.target/riscv/movsibgtu.c: New test.
* gcc.target/riscv/movsible.c: New test.
* gcc.target/riscv/movsibleu.c: New test.
* gcc.target/riscv/movsiblt.c: New test.
* gcc.target/riscv/movsibltu.c: New test.
* gcc.target/riscv/movsibne.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibeq.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibge.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibgeu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibgt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibgtu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdible.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibleu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdiblt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibltu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movdibne.c  |   28 
 gcc/testsuite/gcc.target/riscv/movsibeq.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibge.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibgeu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibgt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibgtu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsible.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibleu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsiblt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibltu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/movsibne.c  |   28 
 20 files changed, 524 insertions(+)

gcc-riscv-test-movcc-branch-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibeq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibeq.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=6 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   bne a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\ssub\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibge.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibge.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=5 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   blt a0,a1,.L2
+   mv  a3,a2
+.L2:
+   mv  a0,a3
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:bge|bgt|ble|blt)\\s" 1 } } *

[PATCH 31/44] RISC-V/testsuite: Add branchless cases for generic integer cond adds

Verify, for generic integer conditional-add operations, that
if-conversion triggers via `noce_try_addcc' at the respective
sufficiently high `-mbranch-cost=' settings that make the branchless
code sequences produced by if-conversion cheaper than their original
branched equivalents, and, where applicable, that extraneous
instructions such as SNEZ, etc. are not present in output.  Cover all
integer relational operations to make sure no corner case escapes.

The reason to XFAIL SImode tests for RV64 targets is that the compiler 
thinks it has to sign-extend addends, which causes if-conversion to give 
up.
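
For illustration only, the branchless sequence these tests expect
(SUB, SEQZ, NEG, AND, ADD) can be modelled in plain C as below.  This
is a sketch under the assumption of a 64-bit target; the function names
`addcc_branched' and `addcc_branchless' are made up for this example
and are not part of the patch.

```c
#include <stdint.h>

/* Branched form, as written in the test sources.  */
int64_t
addcc_branched (int64_t w, int64_t x, int64_t y, int64_t z)
{
  return w == x ? y + z : y;
}

/* Branchless form; each statement models one machine instruction of
   the expected sequence.  Unsigned arithmetic is used so that the
   intermediate steps wrap rather than overflow.  */
int64_t
addcc_branchless (int64_t w, int64_t x, int64_t y, int64_t z)
{
  uint64_t diff = (uint64_t) w - (uint64_t) x;	/* sub  */
  uint64_t flag = diff == 0;			/* seqz: 1 if w == x  */
  uint64_t mask = -flag;			/* neg: 0 or all-ones  */
  uint64_t addend = mask & (uint64_t) z;	/* and  */
  return (int64_t) (addend + (uint64_t) y);	/* add  */
}
```

The branchless form always executes five ALU operations, which is why
it only pays off once `-mbranch-cost=' is set high enough.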

gcc/testsuite/
* gcc.target/riscv/adddieq.c: New test.
* gcc.target/riscv/adddige.c: New test.
* gcc.target/riscv/adddigeu.c: New test.
* gcc.target/riscv/adddigt.c: New test.
* gcc.target/riscv/adddigtu.c: New test.
* gcc.target/riscv/adddile.c: New test.
* gcc.target/riscv/adddileu.c: New test.
* gcc.target/riscv/adddilt.c: New test.
* gcc.target/riscv/adddiltu.c: New test.
* gcc.target/riscv/adddine.c: New test.
* gcc.target/riscv/addsieq.c: New test.
* gcc.target/riscv/addsige.c: New test.
* gcc.target/riscv/addsigeu.c: New test.
* gcc.target/riscv/addsigt.c: New test.
* gcc.target/riscv/addsigtu.c: New test.
* gcc.target/riscv/addsile.c: New test.
* gcc.target/riscv/addsileu.c: New test.
* gcc.target/riscv/addsilt.c: New test.
* gcc.target/riscv/addsiltu.c: New test.
* gcc.target/riscv/addsine.c: New test.
---
 gcc/testsuite/gcc.target/riscv/adddieq.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/adddige.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddigeu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddigt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddigtu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddile.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddileu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddilt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/adddiltu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddine.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/addsieq.c  |   27 +++
 gcc/testsuite/gcc.target/riscv/addsige.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsigeu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsigt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsigtu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsile.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsileu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsilt.c  |   26 ++
 gcc/testsuite/gcc.target/riscv/addsiltu.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsine.c  |   27 +++
 20 files changed, 524 insertions(+)

gcc-riscv-test-addcc-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/adddieq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddieq.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=5 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddieq (int_t w, int_t x, int_t y, int_t z)
+{
+  return w == x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   sub a1,a0,a1
+   seqz a1,a1
+   neg a1,a1
+   and a1,a1,a3
+   add a0,a1,a2
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_addcc" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\ssub\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:seqz|snez)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/adddige.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddige.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=4 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddige (int_t w, int_t x, int_t y, int_t z)
+{
+  return w >= x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   slt a1,a0,a1
+   addi a1,a1,-1
+

[PATCH 33/44] RISC-V: Also allow FP conditions in `riscv_expand_conditional_move'

In `riscv_expand_conditional_move' we only let integer conditions 
through at the moment, even though code has already been prepared to 
handle floating-point conditions as well.

Lift this restriction and only bail out if a non-word-mode integer 
condition has been requested, as we cannot handle this specific case 
owing to a machine instruction set restriction.  We already take care 
of the non-integer, non-floating-point case later on.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Don't 
bail out on floating-point conditions.
---
 gcc/config/riscv/riscv.cc |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

gcc-riscv-expand-conditional-move-fp.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4109,12 +4109,12 @@ riscv_expand_conditional_move (rtx dest,
   machine_mode mode0 = GET_MODE (op0);
   machine_mode mode1 = GET_MODE (op1);
 
-  /* The comparison must be comparing WORD_MODE objects.   We must
-     enforce that so that we don't strip away a sign_extension
+  /* An integer comparison must be comparing WORD_MODE objects.  We
+     must enforce that so that we don't strip away a sign_extension
      thinking it is unnecessary.  We might consider using
      riscv_extend_operands if they are not already properly extended.  */
-  if ((mode0 != word_mode && mode0 != VOIDmode)
-      || (mode1 != word_mode && mode1 != VOIDmode))
+  if ((INTEGRAL_MODE_P (mode0) && mode0 != word_mode)
+      || (INTEGRAL_MODE_P (mode1) && mode1 != word_mode))
     return false;
 
   /* In the fallback generic case use MODE rather than WORD_MODE for

[PATCH 29/44] RISC-V: Add `addMODEcc' implementation for generic targets

Provide RTL expansion of conditional-add operations for generic targets 
using a suitable sequence of base integer machine instructions according 
to cost evaluation by if-conversion.  Use the existing `-mmovcc' command 
line option to enable this transformation.
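
As a rough model of what the expander emits, the mask is derived from
the conditional-set result either by negation or, when the inverse
condition was computed, by adding -1.  The C sketch below shows both
paths under the assumption of a 64-bit target; `mask_from_flag' and
`addcc_ge' are hypothetical names used for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* FLAG is the 0/1 result of a conditional-set instruction.  If INVERT
   is set, FLAG holds the inverse of the requested condition.  Return
   all-ones if the requested condition holds, zero otherwise.  */
uint64_t
mask_from_flag (uint64_t flag, bool invert)
{
  if (invert)
    return flag - 1;	/* addi -1: 1 -> 0, 0 -> all-ones  */
  else
    return -flag;	/* neg: 1 -> all-ones, 0 -> 0  */
}

/* Model of `w >= x ? y + z : y', computed via the inverse condition
   `w < x', since SLT is the instruction available.  */
int64_t
addcc_ge (int64_t w, int64_t x, int64_t y, int64_t z)
{
  uint64_t flag = w < x;			/* slt  */
  uint64_t mask = mask_from_flag (flag, true);	/* addi -1  */
  uint64_t addend = mask & (uint64_t) z;	/* and  */
  return (int64_t) (addend + (uint64_t) y);	/* add  */
}
```

Either way the mask costs a single instruction, so an inverted
condition does not make the sequence longer.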

gcc/
* config/riscv/riscv.md (addcc): New expander.
---
 gcc/config/riscv/riscv.md |   41 +
 1 file changed, 41 insertions(+)

Index: gcc/gcc/config/riscv/riscv.md
===
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -2655,6 +2655,8 @@
   [(set_attr "type" "branch")
(set_attr "mode" "none")])
 
+;; Conditional move and add patterns.
+
 (define_expand "movcc"
   [(set (match_operand:GPR 0 "register_operand")
(if_then_else:GPR (match_operand 1 "comparison_operator")
@@ -2670,6 +2672,45 @@
 FAIL;
 })
 
+(define_expand "addcc"
+  [(match_operand:GPR 0 "register_operand")
+   (match_operand 1 "comparison_operator")
+   (match_operand:GPR 2 "arith_operand")
+   (match_operand:GPR 3 "arith_operand")]
+  "TARGET_MOVCC"
+{
+  rtx cmp = operands[1];
+  rtx cmp0 = XEXP (cmp, 0);
+  rtx cmp1 = XEXP (cmp, 1);
+  machine_mode mode0 = GET_MODE (cmp0);
+
+  /* We only handle word mode integer compares for now.  */
+  if (INTEGRAL_MODE_P (mode0) && mode0 != word_mode)
+    FAIL;
+
+  enum rtx_code code = GET_CODE (cmp);
+  rtx reg0 = gen_reg_rtx (mode);
+  rtx reg1 = gen_reg_rtx (mode);
+  rtx reg2 = gen_reg_rtx (mode);
+  bool invert = false;
+
+  if (INTEGRAL_MODE_P (mode0))
+    riscv_expand_int_scc (reg0, code, cmp0, cmp1, &invert);
+  else if (FLOAT_MODE_P (mode0) && fp_scc_comparison (cmp, GET_MODE (cmp)))
+    riscv_expand_float_scc (reg0, code, cmp0, cmp1);
+  else
+    FAIL;
+
+  if (invert)
+    riscv_emit_binary (PLUS, reg1, reg0, constm1_rtx);
+  else
+    riscv_emit_unary (NEG, reg1, reg0);
+  riscv_emit_binary (AND, reg2, reg1, operands[3]);
+  riscv_emit_binary (PLUS, operands[0], reg2, operands[2]);
+
+  DONE;
+})
+
 ;; Patterns for implementations that optimize short forward branches.
 
 (define_insn "*movcc"

[PATCH 34/44] RISC-V: Provide FP conditional-branch instructions for if-conversion

Do not expand right away those floating-point conditional-branch RTL 
instructions that use a comparison operation that is either directly 
available as a machine conditional-set instruction or is NE, which can 
be emulated by EQ.  This is so that if-conversion sees them in their 
original form and has fewer operations to account for in the branchless 
code sequence it tries, compared to when such an instruction has already 
been converted to a sequence of a floating-point conditional-set RTL 
instruction followed by an integer conditional-branch RTL instruction.  
Then split any floating-point conditional-branch RTL instructions still 
remaining after reload.

Adjust the testsuite accordingly: since the middle end uses the inverse 
condition internally, an inverse conditional-set instruction may make it 
to assembly output and also `cond_move_process_if_block' will be used by 
if-conversion rather than `noce_process_if_block', because the latter 
function has not yet been updated to handle inverted conditions.

gcc/
* config/riscv/predicates.md (ne_operator): New predicate.
* config/riscv/riscv.cc (riscv_insn_cost): Handle branches on a 
floating-point condition.
* config/riscv/riscv.md (@cbranch4): Rename expander to...
(@cbranch4): ... this.  Only expand the RTX via 
`riscv_expand_conditional_branch' for `!signed_order_operator' 
operators, otherwise let it through.
(*cbranch4, *cbranch4): New insns and 
splitters.

gcc/testsuite/
* gcc.target/riscv/movdifge-sfb.c: Reject "if-conversion 
succeeded through" rather than accepting it.
* gcc.target/riscv/movdifge-thead.c: Likewise.
* gcc.target/riscv/movdifge-ventana.c: Likewise.
* gcc.target/riscv/movdifge-zicond.c: Likewise.
* gcc.target/riscv/movdifgt-sfb.c: Likewise.
* gcc.target/riscv/movdifgt-thead.c: Likewise.
* gcc.target/riscv/movdifgt-ventana.c: Likewise.
* gcc.target/riscv/movdifgt-zicond.c: Likewise.
* gcc.target/riscv/movdifle-sfb.c: Likewise.
* gcc.target/riscv/movdifle-thead.c: Likewise.
* gcc.target/riscv/movdifle-ventana.c: Likewise.
* gcc.target/riscv/movdifle-zicond.c: Likewise.
* gcc.target/riscv/movdiflt-sfb.c: Likewise.
* gcc.target/riscv/movdiflt-thead.c: Likewise.
* gcc.target/riscv/movdiflt-ventana.c: Likewise.
* gcc.target/riscv/movdiflt-zicond.c: Likewise.
* gcc.target/riscv/movsifge-sfb.c: Likewise.
* gcc.target/riscv/movsifge-thead.c: Likewise.
* gcc.target/riscv/movsifge-ventana.c: Likewise.
* gcc.target/riscv/movsifge-zicond.c: Likewise.
* gcc.target/riscv/movsifgt-sfb.c: Likewise.
* gcc.target/riscv/movsifgt-thead.c: Likewise.
* gcc.target/riscv/movsifgt-ventana.c: Likewise.
* gcc.target/riscv/movsifgt-zicond.c: Likewise.
* gcc.target/riscv/movsifle-sfb.c: Likewise.
* gcc.target/riscv/movsifle-thead.c: Likewise.
* gcc.target/riscv/movsifle-ventana.c: Likewise.
* gcc.target/riscv/movsifle-zicond.c: Likewise.
* gcc.target/riscv/movsiflt-sfb.c: Likewise.
* gcc.target/riscv/movsiflt-thead.c: Likewise.
* gcc.target/riscv/movsiflt-ventana.c: Likewise.
* gcc.target/riscv/movsiflt-zicond.c: Likewise.
* gcc.target/riscv/smax-ieee.c: Also accept FLT.D.
* gcc.target/riscv/smaxf-ieee.c: Also accept FLT.S.
* gcc.target/riscv/smin-ieee.c: Also accept FGT.D.
* gcc.target/riscv/sminf-ieee.c: Also accept FGT.S.
---
 gcc/config/riscv/predicates.md|3 
 gcc/config/riscv/riscv.cc |   22 +++--
 gcc/config/riscv/riscv.md |   89 +++---
 gcc/testsuite/gcc.target/riscv/movdifge-sfb.c |2 
 gcc/testsuite/gcc.target/riscv/movdifge-thead.c   |2 
 gcc/testsuite/gcc.target/riscv/movdifge-ventana.c |2 
 gcc/testsuite/gcc.target/riscv/movdifge-zicond.c  |2 
 gcc/testsuite/gcc.target/riscv/movdifgt-sfb.c |2 
 gcc/testsuite/gcc.target/riscv/movdifgt-thead.c   |2 
 gcc/testsuite/gcc.target/riscv/movdifgt-ventana.c |2 
 gcc/testsuite/gcc.target/riscv/movdifgt-zicond.c  |2 
 gcc/testsuite/gcc.target/riscv/movdifle-sfb.c |2 
 gcc/testsuite/gcc.target/riscv/movdifle-thead.c   |2 
 gcc/testsuite/gcc.target/riscv/movdifle-ventana.c |2 
 gcc/testsuite/gcc.target/riscv/movdifle-zicond.c  |2 
 gcc/testsuite/gcc.target/riscv/movdiflt-sfb.c |2 
 gcc/testsuite/gcc.target/riscv/movdiflt-thead.c   |2 
 gcc/testsuite/gcc.target/riscv/movdiflt-ventana.c |2 
 gcc/testsuite/gcc.target/riscv/movdiflt-zicond.c  |2 
 gcc/testsuite/gcc.target/riscv/movsifge-sfb.c |2 
 gcc/testsuite/gcc.target/riscv/movsifge-thead.c   |2 
 gcc/testsuite/gcc.target/riscv/movsifge-ventana.c |2 
 gcc/testsuite/gcc.target/riscv/movsifge-zicond.c  |2 
 gcc/testsuite/gc

[PATCH 37/44] RISC-V/testsuite: Add branchless cases for generic FP cond moves

Verify, for generic floating-point conditional-move operations that have 
a corresponding conditional-set machine instruction, that if-conversion 
triggers (via `cond_move_convert_if_block', which doesn't report) at 
`-mbranch-cost=5' setting, which makes branchless code sequences emitted 
by if-conversion cheaper than their original branched equivalents, and 
that extraneous instructions such as SNEZ, etc. are not present in 
output.
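
For reference, the expected branchless sequence for these tests
(conditional-set, NEG, AND, NOT, AND, OR) amounts to a mask-based
select.  A minimal C model follows, assuming a 64-bit target; the name
`movcc_fge' is hypothetical and chosen only to mirror the test naming.

```c
#include <stdint.h>

/* Model of `w >= x ? y : z' without a branch; each statement
   corresponds to one instruction of the expected sequence.  */
int64_t
movcc_fge (double w, double x, int64_t y, int64_t z)
{
  uint64_t flag = w >= x;			/* fge.d: 0 or 1  */
  uint64_t mask = -flag;			/* neg: 0 or all-ones  */
  uint64_t if_true = mask & (uint64_t) y;	/* and  */
  uint64_t nmask = ~mask;			/* not  */
  uint64_t if_false = nmask & (uint64_t) z;	/* and  */
  return (int64_t) (if_true | if_false);	/* or  */
}
```

Since the FP comparison already yields 0 or 1, no SEQZ or SNEZ is
needed before the mask is formed, which is what the tests check for.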

gcc/testsuite/
* gcc.target/riscv/movdifge.c: New test.
* gcc.target/riscv/movdifgt.c: New test.
* gcc.target/riscv/movdifle.c: New test.
* gcc.target/riscv/movdiflt.c: New test.
* gcc.target/riscv/movdifne.c: New test.
* gcc.target/riscv/movsifge.c: New test.
* gcc.target/riscv/movsifgt.c: New test.
* gcc.target/riscv/movsifle.c: New test.
* gcc.target/riscv/movsiflt.c: New test.
* gcc.target/riscv/movsifne.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdifge.c |   28 
 gcc/testsuite/gcc.target/riscv/movdifgt.c |   28 
 gcc/testsuite/gcc.target/riscv/movdifle.c |   28 
 gcc/testsuite/gcc.target/riscv/movdiflt.c |   28 
 gcc/testsuite/gcc.target/riscv/movdifne.c |   28 
 gcc/testsuite/gcc.target/riscv/movsifge.c |   28 
 gcc/testsuite/gcc.target/riscv/movsifgt.c |   28 
 gcc/testsuite/gcc.target/riscv/movsifle.c |   28 
 gcc/testsuite/gcc.target/riscv/movsiflt.c |   28 
 gcc/testsuite/gcc.target/riscv/movsifne.c |   28 
 10 files changed, 280 insertions(+)

gcc-riscv-test-movccf-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdifge.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdifge.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=5 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifge (double w, double x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   fge.d   a5,fa0,fa1
+   neg a5,a5
+   and a0,a5,a0
+   not a5,a5
+   and a5,a5,a1
+   or  a0,a0,a5
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fge\\.d|fle\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdifgt.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdifgt.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=5 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifgt (double w, double x, int_t y, int_t z)
+{
+  return w > x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   fgt.d   a5,fa0,fa1
+   neg a5,a5
+   and a0,a5,a0
+   not a5,a5
+   and a5,a5,a1
+   or  a0,a0,a5
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fgt\\.d|flt\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdifle.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdifle.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=5 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifle (double w, double x, int_t y, int_t z)
+{
+  return w <= x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   fle.d   a5,fa0,fa1
+   neg a5,a5
+   and a0,a5,a0
+   not a5,a5
+   and a5,a5,a1
+   or  a0,a0,a5
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-not

[PATCH 32/44] RISC-V: Only use SUBREG if applicable in `riscv_expand_float_scc'

A subsequent change to enable the processing of conditional moves on a 
floating-point condition by `riscv_expand_conditional_move' will cause 
`riscv_expand_float_scc' to be called for word-mode target RTX with RV64 
targets.  In that case an invalid insn such as:

(insn 25 24 0 (set (reg:DI 141)
(subreg:SI (reg:DI 143) 0)) -1
 (nil))

would be produced, which would crash the compiler later on.  Since the 
output operand of the SET operation to be produced already has the same 
mode as the input operand does, just omit the use of SUBREG and assign 
directly.

gcc/
* config/riscv/riscv.cc (riscv_expand_float_scc): Suppress the 
use of SUBREG if the conditional-set target is word-mode.
---
 gcc/config/riscv/riscv.cc |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

gcc-riscv-expand-float-scc-subreg.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4071,7 +4071,9 @@ riscv_expand_float_scc (rtx target, enum
   riscv_emit_float_compare (&code, &op0, &op1);
 
   rtx cmp = riscv_force_binary (word_mode, code, op0, op1);
-  riscv_emit_set (target, lowpart_subreg (SImode, cmp, word_mode));
+  if (GET_MODE (target) != word_mode)
+    cmp = lowpart_subreg (GET_MODE (target), cmp, word_mode);
+  riscv_emit_set (target, cmp);
 }
 
 /* Jump to LABEL if (CODE OP0 OP1) holds.  */

[PATCH 35/44] RISC-V: Avoid extraneous integer comparison for FP comparisons

We have floating-point conditional-set machine instructions for a subset 
of FP comparisons, so avoid going through a comparison against constant 
zero in `riscv_expand_float_scc' where not necessary, preventing an 
extraneous RTL instruction from being produced that counts against the 
cost of the replacement branchless code sequence in if-conversion, e.g.:

(insn 29 6 30 2 (set (reg:DI 142)
(ge:DI (reg/v:DF 135 [ w ])
(reg/v:DF 136 [ x ]))) 297 {*cstoredfdi4}
 (nil))
(insn 30 29 31 2 (set (reg:DI 143)
(ne:DI (reg:DI 142)
(const_int 0 [0]))) 319 {*sne_zero_didi}
 (nil))
(insn 31 30 32 2 (set (reg:DI 141)
(reg:DI 143)) 206 {*movdi_64bit}
 (nil))
(insn 32 31 33 2 (set (reg:DI 144)
(neg:DI (reg:DI 141))) 15 {negdi2}
 (nil))
(insn 33 32 34 2 (set (reg:DI 145)
(and:DI (reg:DI 144)
(reg/v:DI 137 [ y ]))) 102 {*anddi3}
 (nil))
(insn 34 33 35 2 (set (reg:DI 146)
(not:DI (reg:DI 144))) 111 {one_cmpldi2}
 (nil))
(insn 35 34 36 2 (set (reg:DI 147)
(and:DI (reg:DI 146)
(reg/v:DI 138 [ z ]))) 102 {*anddi3}
 (nil))
(insn 36 35 21 2 (set (reg/v:DI 138 [ z ])
(ior:DI (reg:DI 145)
(reg:DI 147))) 105 {iordi3}
 (nil))

where the second insn effectively just copies its input.  This now gets 
simplified to:

(insn 29 6 30 2 (set (reg:DI 141)
(ge:DI (reg/v:DF 135 [ w ])
(reg/v:DF 136 [ x ]))) 297 {*cstoredfdi4}
 (nil))
(insn 30 29 31 2 (set (reg:DI 142)
(neg:DI (reg:DI 141))) 15 {negdi2}
 (nil))
(insn 31 30 32 2 (set (reg:DI 143)
(and:DI (reg:DI 142)
(reg/v:DI 137 [ y ]))) 102 {*anddi3}
 (nil))
(insn 32 31 33 2 (set (reg:DI 144)
(not:DI (reg:DI 142))) 111 {one_cmpldi2}
 (nil))
(insn 33 32 34 2 (set (reg:DI 145)
(and:DI (reg:DI 144)
(reg/v:DI 138 [ z ]))) 102 {*anddi3}
 (nil))
(insn 34 33 21 2 (set (reg/v:DI 138 [ z ])
(ior:DI (reg:DI 143)
(reg:DI 145))) 105 {iordi3}
 (nil))

lowering the cost of the code sequence produced (even though combine 
would swallow the second insn anyway).

We still need to produce a comparison against constant zero where the 
instruction following a floating-point conditional-set operation is a 
branch, so add canonicalization to `riscv_expand_conditional_branch' 
instead.
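
The redundancy being removed can be seen in a small C model: an FP
conditional-set result is already 0 or 1, so comparing it against zero
again just reproduces it.  This is only an illustrative sketch; the
function names are made up and do not appear in the patch.

```c
#include <stdint.h>

/* Old expansion: conditional-set followed by a comparison of its
   result against constant zero.  */
uint64_t
scc_with_extra_compare (double w, double x)
{
  uint64_t flag = w >= x;	/* fge.d: already 0 or 1  */
  uint64_t copy = flag != 0;	/* snez: reproduces FLAG  */
  return copy;
}

/* New expansion: use the conditional-set result directly.  */
uint64_t
scc_direct (double w, double x)
{
  return w >= x;		/* fge.d  */
}
```

A branch, on the other hand, has to test the flag against zero, which
is why that comparison moves to the branch expander.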

gcc/
* config/riscv/riscv.cc (riscv_emit_float_compare) : Handle 
separately.
: Return operands supplied as is.
(riscv_expand_float_scc): Call `riscv_emit_binary' directly rather 
than going through a temporary register for word-mode targets.
(riscv_expand_conditional_branch): Canonicalize the comparison 
if not against constant zero.
---
 gcc/config/riscv/riscv.cc |   29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

gcc-riscv-emit-float-compare-fcmp.diff
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -4029,9 +4029,10 @@ riscv_emit_float_compare (enum rtx_code
 #undef UNORDERED_COMPARISON
 
 case NE:
-  fp_code = EQ;
   *code = EQ;
-  /* Fall through.  */
+  *op0 = riscv_force_binary (word_mode, EQ, cmp_op0, cmp_op1);
+  *op1 = const0_rtx;
+  break;
 
 case EQ:
 case LE:
@@ -4039,8 +4040,9 @@ riscv_emit_float_compare (enum rtx_code
 case GE:
 case GT:
   /* We have instructions for these cases.  */
-  *op0 = riscv_force_binary (word_mode, fp_code, cmp_op0, cmp_op1);
-  *op1 = const0_rtx;
+  *code = fp_code;
+  *op0 = cmp_op0;
+  *op1 = cmp_op1;
   break;
 
 case LTGT:
@@ -4080,10 +4082,14 @@ riscv_expand_float_scc (rtx target, enum
 {
   riscv_emit_float_compare (&code, &op0, &op1);
 
-  rtx cmp = riscv_force_binary (word_mode, code, op0, op1);
-  if (GET_MODE (target) != word_mode)
-    cmp = lowpart_subreg (GET_MODE (target), cmp, word_mode);
-  riscv_emit_set (target, cmp);
+  machine_mode mode = GET_MODE (target);
+  if (mode != word_mode)
+    {
+      rtx cmp = riscv_force_binary (word_mode, code, op0, op1);
+      riscv_emit_set (target, lowpart_subreg (mode, cmp, word_mode));
+    }
+  else
+    riscv_emit_binary (code, target, op0, op1);
 }
 
 /* Jump to LABEL if (CODE OP0 OP1) holds.  */
@@ -4096,6 +4102,13 @@ riscv_expand_conditional_branch (rtx lab
   else
     riscv_emit_int_compare (&code, &op0, &op1);
 
+  if (FLOAT_MODE_P (GET_MODE (op0)))
+    {
+      op0 = riscv_force_binary (word_mode, code, op0, op1);
+      op1 = const0_rtx;
+      code = NE;
+    }
+
   rtx condition = gen_rtx_fmt_ee (code, VOIDmode, op0, op1);
   emit_jump_insn (gen_condjump (condition, label));
 }

[PATCH 38/44] RISC-V/testsuite: Add branched cases for generic FP cond adds

Verify, for generic floating-point conditional-add operations that have 
a corresponding conditional-set machine instruction, that if-conversion 
does *not* trigger at `-mbranch-cost=2' setting, which makes original 
branched code sequences cheaper than their branchless equivalents 
if-conversion would emit.  Cover all the relevant floating-point 
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/adddibfeq.c: New test.
* gcc.target/riscv/adddibfge.c: New test.
* gcc.target/riscv/adddibfgt.c: New test.
* gcc.target/riscv/adddibfle.c: New test.
* gcc.target/riscv/adddibflt.c: New test.
* gcc.target/riscv/addsibfeq.c: New test.
* gcc.target/riscv/addsibfge.c: New test.
* gcc.target/riscv/addsibfgt.c: New test.
* gcc.target/riscv/addsibfle.c: New test.
* gcc.target/riscv/addsibflt.c: New test.
---
 gcc/testsuite/gcc.target/riscv/adddibfeq.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibfge.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibfgt.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibfle.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddibflt.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibfeq.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibfge.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibfgt.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibfle.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibflt.c |   26 ++
 10 files changed, 260 insertions(+)

gcc-riscv-test-addccf-branch-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/adddibfeq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddibfeq.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=2 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   feq.d   a5,fa0,fa1
+   beq a5,zero,.L2
+   add a0,a0,a1
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/adddibfge.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddibfge.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=2 -mmovcc -ffinite-math-only -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifge (double w, double x, int_t y, int_t z)
+{
+  return w >= x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   fge.d   a5,fa0,fa1
+   beq a5,zero,.L2
+   add a0,a0,a1
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fge\\.d|fgt\\.d|fle\\.d|flt\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/adddibfgt.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddibfgt.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=2 -mmovcc -ffinite-math-only -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifgt (double w, double x, int_t y, int_t z)
+{
+  return w > x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   fgt.d   a5,fa0,fa1
+   beq a5,zero,.L2
+   add a0,a0,a1
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times 
"\\s(?:fge\\.d|fgt\\.d|fle\\.d|flt\\.d)\\s"

[PATCH 36/44] RISC-V/testsuite: Add branched cases for generic FP cond moves

Verify, for generic floating-point conditional-move operations that have 
a corresponding conditional-set machine instruction, that if-conversion 
does *not* trigger at `-mbranch-cost=4' setting, which makes original 
branched code sequences cheaper than their branchless equivalents 
if-conversion would emit.  Cover all the relevant floating-point 
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdibfge.c: New test.
* gcc.target/riscv/movdibfgt.c: New test.
* gcc.target/riscv/movdibfle.c: New test.
* gcc.target/riscv/movdibflt.c: New test.
* gcc.target/riscv/movdibfne.c: New test.
* gcc.target/riscv/movsibfge.c: New test.
* gcc.target/riscv/movsibfgt.c: New test.
* gcc.target/riscv/movsibfle.c: New test.
* gcc.target/riscv/movsibflt.c: New test.
* gcc.target/riscv/movsibfne.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibfge.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibfgt.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibfle.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibflt.c |   28 
 gcc/testsuite/gcc.target/riscv/movdibfne.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibfge.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibfgt.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibfle.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibflt.c |   28 
 gcc/testsuite/gcc.target/riscv/movsibfne.c |   28 
 10 files changed, 280 insertions(+)

gcc-riscv-test-movccf-branch-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfge.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfge.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=4 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifge (double w, double x, int_t y, int_t z)
+{
+  return w >= x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   fge.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fge\\.d|fle\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfgt.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfgt.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=4 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifgt (double w, double x, int_t y, int_t z)
+{
+  return w > x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   fgt.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fgt\\.d|flt\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfle.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfle.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=4 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifle (double w, double x, int_t y, int_t z)
+{
+  return w <= x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   fle.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" }

[PATCH 40/44] RISC-V: Handle FP NE operator via inversion in cond-operation expansion

We have no FNE.fmt machine instructions, but we can emulate them for the 
purpose of conditional-move and conditional-add operations by using the 
respective FEQ.fmt instruction and then swapping the data input operands 
or complementing the mask for the conditional addend respectively, so 
update our handlers accordingly.
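
The inversion described above can be sketched in plain C. This is only an illustrative model, not the code from the patch: the names `float_compare`, `cond_move` and `cond_add` and the `enum cmp` type are hypothetical stand-ins, and the sketch assumes the helper toggles a caller-supplied flag for NE and otherwise inverts the result itself.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison codes for this sketch only.  */
enum cmp { CMP_EQ, CMP_NE };

/* Model of the compare step.  Only an "equal" comparison is available,
   so NE is computed as EQ.  If the caller supplied a flag, toggle it and
   hand back the raw EQ result; otherwise invert the result here.  */
long
float_compare (enum cmp code, double w, double x, bool *invert_ptr)
{
  long eq = (w == x);
  if (code == CMP_EQ)
    return eq;
  if (invert_ptr != NULL)
    {
      *invert_ptr = !*invert_ptr;
      return eq;
    }
  return eq == 0;
}

/* Conditional move: an inverted condition is absorbed by swapping the
   two data inputs.  */
long
cond_move (enum cmp code, double w, double x, long y, long z)
{
  bool invert = false;
  long flag = float_compare (code, w, x, &invert);
  if (invert)
    {
      long tmp = y;
      y = z;
      z = tmp;
    }
  return flag ? y : z;
}

/* Conditional add: an inverted condition is absorbed by complementing
   the mask (flag - 1 instead of -flag) applied to the addend.  */
long
cond_add (enum cmp code, double w, double x, long y, long z)
{
  bool invert = false;
  long flag = float_compare (code, w, x, &invert);
  long mask = invert ? flag - 1 : -flag;
  return y + (z & mask);
}
```

With a NaN input the EQ result is 0, so the NE variants select `y` (move) or add `z` (add), which agrees with C's `w != x` being true for unordered operands.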

gcc/
* config/riscv/riscv-protos.h (riscv_expand_float_scc): Add 
`invert_ptr' parameter.
* config/riscv/riscv.cc (riscv_emit_float_compare): Add NE 
inversion handling.
(riscv_expand_float_scc): Pass `invert_ptr' through to 
`riscv_emit_float_compare'.
(riscv_expand_conditional_move): Pass `&invert' to 
`riscv_expand_float_scc'.
* config/riscv/riscv.md (addcc): Likewise.
---
 gcc/config/riscv/riscv-protos.h |3 ++-
 gcc/config/riscv/riscv.cc   |   23 +++
 gcc/config/riscv/riscv.md   |2 +-
 3 files changed, 18 insertions(+), 10 deletions(-)

gcc-riscv-emit-float-compare-ne.diff
Index: gcc/gcc/config/riscv/riscv-protos.h
===
--- gcc.orig/gcc/config/riscv/riscv-protos.h
+++ gcc/gcc/config/riscv/riscv-protos.h
@@ -132,7 +132,8 @@ riscv_zcmp_valid_stack_adj_bytes_p (HOST
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
-extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
+extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx,
+   bool *invert_ptr = nullptr);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
 extern rtx riscv_emit_unary (enum rtx_code code, rtx dest, rtx x);
 extern rtx riscv_emit_binary (enum rtx_code code, rtx dest, rtx x, rtx y);
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -3965,7 +3965,8 @@ riscv_emit_int_compare (enum rtx_code *c
 /* Like riscv_emit_int_compare, but for floating-point comparisons.  */
 
 static void
-riscv_emit_float_compare (enum rtx_code *code, rtx *op0, rtx *op1)
+riscv_emit_float_compare (enum rtx_code *code, rtx *op0, rtx *op1,
+ bool *invert_ptr = nullptr)
 {
   rtx tmp0, tmp1, cmp_op0 = *op0, cmp_op1 = *op1;
   enum rtx_code fp_code = *code;
@@ -4029,10 +4030,15 @@ riscv_emit_float_compare (enum rtx_code
 #undef UNORDERED_COMPARISON
 
 case NE:
-  *code = EQ;
-  *op0 = riscv_force_binary (word_mode, EQ, cmp_op0, cmp_op1);
-  *op1 = const0_rtx;
-  break;
+  fp_code = EQ;
+  if (invert_ptr != nullptr)
+   *invert_ptr = !*invert_ptr;
+  else
+   {
+ cmp_op0 = riscv_force_binary (word_mode, fp_code, cmp_op0, cmp_op1);
+ cmp_op1 = const0_rtx;
+   }
+  gcc_fallthrough ();
 
 case EQ:
 case LE:
@@ -4078,9 +4084,10 @@ riscv_expand_int_scc (rtx target, enum r
 /* Like riscv_expand_int_scc, but for floating-point comparisons.  */
 
 void
-riscv_expand_float_scc (rtx target, enum rtx_code code, rtx op0, rtx op1)
+riscv_expand_float_scc (rtx target, enum rtx_code code, rtx op0, rtx op1,
+   bool *invert_ptr)
 {
-  riscv_emit_float_compare (&code, &op0, &op1);
+  riscv_emit_float_compare (&code, &op0, &op1, invert_ptr);
 
   machine_mode mode = GET_MODE (target);
   if (mode != word_mode)
@@ -4171,7 +4178,7 @@ riscv_expand_conditional_move (rtx dest,
riscv_expand_int_scc (tmp, code, op0, op1, invert_ptr);
  else if (FLOAT_MODE_P (mode0)
   && fp_scc_comparison (op, GET_MODE (op)))
-   riscv_expand_float_scc (tmp, code, op0, op1);
+   riscv_expand_float_scc (tmp, code, op0, op1, &invert);
  else
return false;
 
Index: gcc/gcc/config/riscv/riscv.md
===
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -2697,7 +2697,7 @@
   if (INTEGRAL_MODE_P (mode0))
 riscv_expand_int_scc (reg0, code, cmp0, cmp1, &invert);
   else if (FLOAT_MODE_P (mode0) && fp_scc_comparison (cmp, GET_MODE (cmp)))
-riscv_expand_float_scc (reg0, code, cmp0, cmp1);
+riscv_expand_float_scc (reg0, code, cmp0, cmp1, &invert);
   else
 FAIL;

[PATCH 39/44] RISC-V/testsuite: Add branchless cases for generic FP cond adds

Verify, for generic floating-point conditional-add operations that have 
a corresponding conditional-set machine instruction, that if-conversion 
triggers via `noce_try_addcc' at `-mbranch-cost=3' setting, which makes 
branchless code sequences emitted by if-conversion cheaper than their 
original branched equivalents, and that extraneous instructions such as 
SNEZ, etc. are not present in output.

The reason to XFAIL SImode tests for RV64 targets is the compiler thinks
it has to sign-extend addends, which causes if-conversion to give up.
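
As a purely arithmetic illustration of the sign-extension remark (not a description of what if-conversion does internally), the low 32 bits of a masked sum depend only on the low 32 bits of its inputs. The function names below are hypothetical, and the final conversion to `int32_t` relies on the usual two's-complement wrapping that GCC implements.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit conditional add computed entirely in 32 bits.  */
int32_t
addsi_narrow (int cond, int32_t y, int32_t z)
{
  uint32_t mask = -(uint32_t) (cond != 0);
  return (int32_t) ((uint32_t) y + ((uint32_t) z & mask));
}

/* The same operation on 64-bit registers whose upper halves may hold
   anything; only the low 32 bits of the sum are kept at the end, much
   as a 32-bit add on RV64 would do.  */
int32_t
addsi_wide (int cond, uint64_t y_reg, uint64_t z_reg)
{
  uint64_t mask = -(uint64_t) (cond != 0);
  uint64_t sum = y_reg + (z_reg & mask);
  return (int32_t) (uint32_t) sum;
}
```

Both functions agree whenever the low halves of `y_reg` and `z_reg` match `y` and `z`, whatever the upper halves contain.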

gcc/testsuite/
* gcc.target/riscv/adddifeq.c: New test.
* gcc.target/riscv/adddifge.c: New test.
* gcc.target/riscv/adddifgt.c: New test.
* gcc.target/riscv/adddifle.c: New test.
* gcc.target/riscv/adddiflt.c: New test.
* gcc.target/riscv/addsifeq.c: New test.
* gcc.target/riscv/addsifge.c: New test.
* gcc.target/riscv/addsifgt.c: New test.
* gcc.target/riscv/addsifle.c: New test.
* gcc.target/riscv/addsiflt.c: New test.
---
 gcc/testsuite/gcc.target/riscv/adddifeq.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddifge.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddifgt.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddifle.c |   26 ++
 gcc/testsuite/gcc.target/riscv/adddiflt.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsifeq.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsifge.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsifgt.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsifle.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsiflt.c |   26 ++
 10 files changed, 260 insertions(+)

gcc-riscv-test-addccf-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/adddifeq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddifeq.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   feq.d   a5,fa0,fa1
+   neg a5,a5
+   and a5,a5,a1
+   add a0,a5,a0
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_addcc" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/adddifge.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddifge.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -ffinite-math-only -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifge (double w, double x, int_t y, int_t z)
+{
+  return w >= x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   fge.d   a5,fa0,fa1
+   neg a5,a5
+   and a5,a5,a1
+   add a0,a5,a0
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_addcc" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\s(?:fge\\.d|fgt\\.d|fle\\.d|flt\\.d)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/adddifgt.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddifgt.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -ffinite-math-only -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifgt (double w, double x, int_t y, int_t z)
+{
+  return w > x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   fgt.d   a5,fa0,fa1
+   neg a5,a5
+   and a5,a5,a1
+   add a0,a5,a0
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 
"ce1" } }

[PATCH 41/44] RISC-V/testsuite: Add branched cases for FP NE cond-move operations

Verify, for generic, Ventana and Zicond targets and the floating-point 
NE conditional-move operation, that if-conversion does *not* trigger at 
the respective sufficiently low `-mbranch-cost=' settings that make 
original branched code sequences cheaper than their branchless 
equivalents if-conversion would emit.

gcc/testsuite/
* gcc.target/riscv/movdibfeq-ventana.c: New test.
* gcc.target/riscv/movdibfeq-zicond.c: New test.
* gcc.target/riscv/movdibfeq.c: New test.
* gcc.target/riscv/movsibfeq-ventana.c: New test.
* gcc.target/riscv/movsibfeq-zicond.c: New test.
* gcc.target/riscv/movsibfeq.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdibfeq-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfeq-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movdibfeq.c |   28 +++
 gcc/testsuite/gcc.target/riscv/movsibfeq-ventana.c |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfeq-zicond.c  |   30 +
 gcc/testsuite/gcc.target/riscv/movsibfeq.c |   28 +++
 6 files changed, 176 insertions(+)

gcc-riscv-test-movccne-branch-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfeq-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfeq-ventana.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   feq.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskc\\s" } } */
+/* { dg-final { scan-assembler-not "\\svt\\.maskcn\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfeq-zicond.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfeq-zicond.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zicond -mtune=rocket -mbranch-cost=2 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   feq.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\sczero\\.eqz\\s" } } */
+/* { dg-final { scan-assembler-not "\\sczero\\.nez\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdibfeq.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdibfeq.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=4 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branched assembly like:
+
+   feq.d   a4,fa0,fa1
+   mv  a5,a0
+   mv  a0,a1
+   beq a4,zero,.L2
+   mv  a0,a5
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movsibfeq-ventana.c
===
---

[PATCH 42/44] RISC-V/testsuite: Add branchless cases for FP NE cond-move operations

Verify, for the floating-point NE conditional-move operation, that 
if-conversion triggers via `noce_try_cmove' at the respective 
sufficiently high `-mbranch-cost=' settings that make branchless code 
sequences produced by if-conversion cheaper than their original branched 
equivalents, and that extraneous instructions such as SNEZ, etc. are not 
present in output.
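
A minimal C model of the branchless conditional-move sequence for the conditional-zero style instructions follows. It is a sketch under stated assumptions: the helper names are hypothetical, and the semantics assumed are `czero.eqz rd,rs1,rs2` yielding 0 when `rs2` is zero and `rs1` otherwise, with `czero.nez` the opposite (the `vt.maskc`/`vt.maskcn` pair is assumed to behave the same way).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of czero.eqz (and vt.maskc): 0 if rs2 == 0,
   otherwise rs1.  */
int64_t
czero_eqz (int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* Assumed semantics of czero.nez (and vt.maskcn): 0 if rs2 != 0,
   otherwise rs1.  */
int64_t
czero_nez (int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* Model of `w != x ? y : z' with only an "equal" comparison available:
   compute EQ once, then let the two data inputs trade places.  */
int64_t
movdifne_model (double w, double x, int64_t y, int64_t z)
{
  int64_t eq = (w == x);              /* feq.d */
  int64_t keep_z = czero_eqz (z, eq); /* z when equal, else 0 */
  int64_t keep_y = czero_nez (y, eq); /* y when not equal, else 0 */
  return keep_y | keep_z;             /* or */
}
```

No SEQZ/SNEZ-style fix-up of the comparison result is needed in this model, because the inversion is absorbed by which input goes to which conditional-zero operation.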

gcc/testsuite/
* gcc.target/riscv/movdifeq-sfb.c: New test.
* gcc.target/riscv/movdifeq-thead.c: New test.
* gcc.target/riscv/movdifeq-ventana.c: New test.
* gcc.target/riscv/movdifeq-zicond.c: New test.
* gcc.target/riscv/movdifeq.c: New test.
* gcc.target/riscv/movsifeq-sfb.c: New test.
* gcc.target/riscv/movsifeq-thead.c: New test.
* gcc.target/riscv/movsifeq-ventana.c: New test.
* gcc.target/riscv/movsifeq-zicond.c: New test.
* gcc.target/riscv/movsifeq.c: New test.
---
 gcc/testsuite/gcc.target/riscv/movdifeq-sfb.c |   27 +
 gcc/testsuite/gcc.target/riscv/movdifeq-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movdifeq-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifeq-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movdifeq.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifeq-sfb.c |   27 +
 gcc/testsuite/gcc.target/riscv/movsifeq-thead.c   |   25 +++
 gcc/testsuite/gcc.target/riscv/movsifeq-ventana.c |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifeq-zicond.c  |   28 ++
 gcc/testsuite/gcc.target/riscv/movsifeq.c |   28 ++
 10 files changed, 272 insertions(+)

gcc-riscv-test-movccne-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/movdifeq-sfb.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdifeq-sfb.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-7-series -mbranch-cost=1 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect short forward branch assembly like:
+
+   feq.d   a5,fa0,fa1
+   beq a5,zero,1f  # movcc
+   mv  a1,a0
+1:
+   mv  a0,a1
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s\[^\\s\]+\\s# movcc\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdifeq-thead.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdifeq-thead.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-options "-march=rv64gc_xtheadcondmov -mtune=thead-c906 -mbranch-cost=1 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   feq.d   a5,fa0,fa1
+   th.mveqz a0,a1,a5
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_cmove" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:th\\.mveqz|th\\.mvnez)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/movdifeq-ventana.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/movdifeq-ventana.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_xventanacondops -mtune=rocket -mbranch-cost=3 -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+movdifeq (double w, double x, int_t y, int_t z)
+{
+  return w == x ? y : z;
+}
+
+/* Expect branchless assembly like:
+
+   feq.d   a5,fa0,fa1
+   vt.maskcn   a1,a1,a5
+   vt.maskc a0,a0,a5
+   or  a0,a0,a1
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversi

[PATCH 43/44] RISC-V/testsuite: Add branched cases for FP NE cond-add operation

Verify, for the generic floating-point NE conditional-add operation, 
that if-conversion does *not* trigger at `-mbranch-cost=2' setting, 
which makes original branched code sequences cheaper than their 
branchless equivalents if-conversion would emit.

gcc/testsuite/
* gcc.target/riscv/adddibfne.c: New test.
* gcc.target/riscv/addsibfne.c: New test.
---
 gcc/testsuite/gcc.target/riscv/adddibfne.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsibfne.c |   26 ++
 2 files changed, 52 insertions(+)

gcc-riscv-test-addccne-branch-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/adddibfne.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddibfne.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=2 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifne (double w, double x, int_t y, int_t z)
+{
+  return w != x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   feq.d   a5,fa0,fa1
+   bne a5,zero,.L2
+   add a0,a0,a1
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/addsibfne.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/addsibfne.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv32gc -mtune=sifive-5-series -mbranch-cost=2 -mmovcc -fdump-rtl-ce1" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=2 -mmovcc -fdump-rtl-ce1" { target { rv64 } } } */
+
+typedef int __attribute__ ((mode (SI))) int_t;
+
+int_t
+addsifne (double w, double x, int_t y, int_t z)
+{
+  return w != x ? y + z : y;
+}
+
+/* Expect branched assembly like:
+
+   feq.d   a5,fa0,fa1
+   bne a5,zero,.L2
+   add[w]  a0,a0,a1
+.L2:
+ */
+
+/* { dg-final { scan-rtl-dump-not "Conversion succeeded on pass \[0-9\]+\\." "ce1" } } */
+/* { dg-final { scan-rtl-dump-not "if-conversion succeeded through" "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-times "\\s(?:beq|bne)\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */

[PATCH 44/44] RISC-V/testsuite: Add branchless cases for FP NE cond-add operation

Verify, for the generic floating-point NE conditional-add operation, 
that if-conversion triggers via `noce_try_addcc' at `-mbranch-cost=3' 
setting, which makes branchless code sequences emitted by if-conversion 
cheaper than their original branched equivalents, and that extraneous 
instructions such as SNEZ, etc. are not present in output.

The reason to XFAIL the SImode test for RV64 targets is GCC thinks it 
has to sign-extend addends, which causes if-conversion to give up.

gcc/testsuite/
* gcc.target/riscv/adddifne.c: New test.
* gcc.target/riscv/addsifne.c: New test.
---
 gcc/testsuite/gcc.target/riscv/adddifne.c |   26 ++
 gcc/testsuite/gcc.target/riscv/addsifne.c |   26 ++
 2 files changed, 52 insertions(+)

gcc-riscv-test-addccne-generic.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/adddifne.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/adddifne.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -fdump-rtl-ce1" } */
+
+typedef int __attribute__ ((mode (DI))) int_t;
+
+int_t
+adddifne (double w, double x, int_t y, int_t z)
+{
+  return w != x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   feq.d   a5,fa0,fa1
+   addi    a5,a5,-1
+   and a5,a5,a1
+   add a0,a5,a0
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_addcc" 1 "ce1" } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/addsifne.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/addsifne.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv32gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -fdump-rtl-ce1" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc -mtune=sifive-5-series -mbranch-cost=3 -mmovcc -fdump-rtl-ce1" { target { rv64 } } } */
+
+typedef int __attribute__ ((mode (SI))) int_t;
+
+int_t
+addsifne (double w, double x, int_t y, int_t z)
+{
+  return w != x ? y + z : y;
+}
+
+/* Expect branchless assembly like:
+
+   feq.d   a5,fa0,fa1
+   addi[w] a5,a5,-1
+   and a5,a5,a1
+   add[w]  a0,a5,a0
+ */
+
+/* { dg-final { scan-rtl-dump-times "Conversion succeeded on pass 1\\." 1 "ce1" { xfail rv64 } } } */
+/* { dg-final { scan-rtl-dump-times "if-conversion succeeded through noce_try_addcc" 1 "ce1" { xfail rv64 } } } */
+/* { dg-final { scan-assembler-times "\\sfeq\\.d\\s" 1 } } */
+/* { dg-final { scan-assembler-not "\\s(?:seqz|snez)\\s" } } */
+/* { dg-final { scan-assembler-not "\\s(?:beq|bne)\\s" { xfail rv64 } } } */

Re: [PATCH 00/44] RISC-V: Various if-conversion fixes and improvements

A quick response to this patch set: it's a really huge number of
patches, so I'll review them individually. Feel free to commit each
individual patch once it has got an LGTM :P


On Sun, Nov 19, 2023 at 1:35 PM Maciej W. Rozycki  wrote:
>
> Hi,
>
>  This patch series has come out from a simple change to add generic
> conditional-move and conditional-add expansions for a yet-out-of-tree
> target, which has relatively expensive branches and no conditional
> operations beyond the base architecture conditional-set instructions.  At
> one point I have concluded it may make sense to release this code to the
> general public, especially as some of the conditional execution sequences
> will trigger for targets we already have support for.  Naturally as a part
> of a proper upstream submission I chose to add suitable test cases.
>
>  Now these test cases triggered a lot of issues in our existing code and
> as I fixed them what was supposed to be a couple of patches has turned
> into this humongous patch series, including a branch costing model rework.
> Oh well.
>
>  Please see individual change descriptions for the details.  The overall
> patch series structure is as follows:
>
> - 01-02 add test cases covering the existing state that won't change
>   throughout the patch series,
>
> - 03-08 make small preparatory clean-ups that do not change semantics,
>
> - 09-13 implement a branch cost model rework and add the associated test
>   cases,
>
> - 14-24 make various improvements for integer conditional operations and
>   add the associated test cases,
>
> - 25-28 add generic `movMODEcc' support and the associated test cases,
>
> - 29-31 add generic `addMODEcc' support and the associated test cases,
>
> - 32-44 make various improvements for floating-point conditional
>   operations and add the associated test cases.
>
> There is potential here for middle end improvement, in particular branch
> costing is already documented in ifcvt.cc to be intended to consistently
> use BRANCH_COST, and then the generic conditional-move and conditional-add
> sequences could I suppose be emitted there in a target-agnostic way rather
> than being supplied by the backend.  This I suppose could be investigated
> in the future if the RISC-V approach turned out potentially useful for
> other targets.
>
>  This has been so far verified as follows, using SiFive HiFive Unmatched
> hardware and the `riscv64-linux-gnu' target:
>
> - New target test cases have been run with `-mtune=sifive-5-series',
>   `-mtune=sifive-5-series/-march=rv32gc/-mabi=ilp32d' and
>   `-mtune=sifive-5-series/-mmovcc/-mbranch-cost=8' DejaGNU board options.
>
> - The C language test suite has been run at significant points in the
>   patch series with `-mtune=sifive-5-series' and (past 26/44) also with
>   `-mtune=sifive-5-series/-mmovcc/-mbranch-cost=8', and selectively with
>   `-mtune=sifive-7-series' and
>   `-mtune=sifive-7-series/-mmovcc/-mbranch-cost=8' DejaGNU board options.
>
> Since this is huge and every test iteration takes a couple of hours I will
> continue running testing and may investigate running QEMU testing for the
> features the Unmatched does not support such as Zicond.  I don't expect
> real issues however.
>
>  There are a bunch of issues triggered with `-mmovcc/-mbranch-cost=8' or
> with lone `-mbranch-cost=8' even and the vector test cases, which are
> either due to match patterns expecting an assembly label that has been
> reordered or are similar to PR target/112092 and which are not a problem
> with this patch series, but rather one with the vector testsuite or code.
>
>  Any questions, comments, or concerns?  Otherwise OK to apply?
>
>   Maciej

Re: [PATCH 01/44] testsuite: Add cases for conditional-move and conditional-add operations

ok

On Sun, Nov 19, 2023 at 1:35 PM Maciej W. Rozycki  wrote:
>
> Add generic execution tests for expressions that are expected to expand
> to conditional-move and conditional-add operations where supported.  To
> ensure no corner case escapes all relational operators are extensively
> covered for integer comparisons and all ordered operators are covered
> for floating-point comparisons.  Unordered operators are not covered at
> this point as they'd require a different input data set.
>
> gcc/testsuite/
> * gcc.dg/torture/addieq.c: New test.
> * gcc.dg/torture/addifeq.c: New test.
> * gcc.dg/torture/addifge.c: New test.
> * gcc.dg/torture/addifgt.c: New test.
> * gcc.dg/torture/addifle.c: New test.
> * gcc.dg/torture/addiflt.c: New test.
> * gcc.dg/torture/addifne.c: New test.
> * gcc.dg/torture/addige.c: New test.
> * gcc.dg/torture/addigeu.c: New test.
> * gcc.dg/torture/addigt.c: New test.
> * gcc.dg/torture/addigtu.c: New test.
> * gcc.dg/torture/addile.c: New test.
> * gcc.dg/torture/addileu.c: New test.
> * gcc.dg/torture/addilt.c: New test.
> * gcc.dg/torture/addiltu.c: New test.
> * gcc.dg/torture/addine.c: New test.
> * gcc.dg/torture/addleq.c: New test.
> * gcc.dg/torture/addlfeq.c: New test.
> * gcc.dg/torture/addlfge.c: New test.
> * gcc.dg/torture/addlfgt.c: New test.
> * gcc.dg/torture/addlfle.c: New test.
> * gcc.dg/torture/addlflt.c: New test.
> * gcc.dg/torture/addlfne.c: New test.
> * gcc.dg/torture/addlge.c: New test.
> * gcc.dg/torture/addlgeu.c: New test.
> * gcc.dg/torture/addlgt.c: New test.
> * gcc.dg/torture/addlgtu.c: New test.
> * gcc.dg/torture/addlle.c: New test.
> * gcc.dg/torture/addlleu.c: New test.
> * gcc.dg/torture/addllt.c: New test.
> * gcc.dg/torture/addlltu.c: New test.
> * gcc.dg/torture/addlne.c: New test.
> * gcc.dg/torture/movieq.c: New test.
> * gcc.dg/torture/movifeq.c: New test.
> * gcc.dg/torture/movifge.c: New test.
> * gcc.dg/torture/movifgt.c: New test.
> * gcc.dg/torture/movifle.c: New test.
> * gcc.dg/torture/moviflt.c: New test.
> * gcc.dg/torture/movifne.c: New test.
> * gcc.dg/torture/movige.c: New test.
> * gcc.dg/torture/movigeu.c: New test.
> * gcc.dg/torture/movigt.c: New test.
> * gcc.dg/torture/movigtu.c: New test.
> * gcc.dg/torture/movile.c: New test.
> * gcc.dg/torture/movileu.c: New test.
> * gcc.dg/torture/movilt.c: New test.
> * gcc.dg/torture/moviltu.c: New test.
> * gcc.dg/torture/movine.c: New test.
> * gcc.dg/torture/movleq.c: New test.
> * gcc.dg/torture/movlfeq.c: New test.
> * gcc.dg/torture/movlfge.c: New test.
> * gcc.dg/torture/movlfgt.c: New test.
> * gcc.dg/torture/movlfle.c: New test.
> * gcc.dg/torture/movlflt.c: New test.
> * gcc.dg/torture/movlfne.c: New test.
> * gcc.dg/torture/movlge.c: New test.
> * gcc.dg/torture/movlgeu.c: New test.
> * gcc.dg/torture/movlgt.c: New test.
> * gcc.dg/torture/movlgtu.c: New test.
> * gcc.dg/torture/movlle.c: New test.
> * gcc.dg/torture/movlleu.c: New test.
> * gcc.dg/torture/movllt.c: New test.
> * gcc.dg/torture/movlltu.c: New test.
> * gcc.dg/torture/movlne.c: New test.
> ---
>  gcc/testsuite/gcc.dg/torture/addieq.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addifeq.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addifge.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addifgt.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addifle.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addiflt.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addifne.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addige.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addigeu.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addigt.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addigtu.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addile.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addileu.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addilt.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addiltu.c |   31 +++
>  gcc/testsuite/gcc.dg/torture/addine.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addleq.c  |   31 +++
>  gcc/testsuite/gcc.dg/torture/addlfeq.c |   31 ++

Re: [PATCH 07/44] RISC-V: Use `nullptr' in `riscv_expand_conditional_move'