Re: [PATCH v2] aarch64: Improve on ldp-stp policies code structure.

2023-09-29 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Fri, 29 Sept 2023 at 12:34, Richard Sandiford
 wrote:
>
> Manos Anagnostakis  writes:
> > Improves on: 834fc2bf
> >
> > This improves the code structure of the ldp-stp policies
> > patch introduced in 834fc2bf.
> >
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >   * config/aarch64/aarch64-opts.h (enum aarch64_ldp_policy): Removed.
> >   (enum aarch64_ldp_stp_policy): Merged enums aarch64_ldp_policy
> >   and aarch64_stp_policy to aarch64_ldp_stp_policy.
> >   (enum aarch64_stp_policy): Removed.
> >   * config/aarch64/aarch64-protos.h (struct tune_params): Removed
> >   aarch64_ldp_policy_model and aarch64_stp_policy_model enum types,
> >   keeping only the definitions in aarch64-opts.h.
> >   * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): Removed.
> >   (aarch64_parse_stp_policy): Removed.
> >   (aarch64_override_options_internal): Removed calls to parsing
> >   functions and added obvious direct assignments.
> >   (aarch64_mem_ok_with_ldpstp_policy_model): Improved
> >   code quality based on the new changes.
> >   * config/aarch64/aarch64.opt: Use single enum type
> >   aarch64_ldp_stp_policy for both ldp and stp options.
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/aarch64/ldp_aligned.c: Split into this and
> >   ldp_unaligned.
> >   * gcc.target/aarch64/stp_aligned.c: Split into this and
> >   stp_unaligned.
> >   * gcc.target/aarch64/ldp_unaligned.c: New test.
> >   * gcc.target/aarch64/stp_unaligned.c: New test.
>
> Nice!  OK for trunk, thanks.
>
> Sorry again for my mix-up with the original review.
>
> Richard
>
> > Signed-off-by: Manos Anagnostakis 
> > ---
> >  gcc/config/aarch64/aarch64-opts.h |  26 ++-
> >  gcc/config/aarch64/aarch64-protos.h   |  25 +--
> >  gcc/config/aarch64/aarch64.cc | 160 +++---
> >  gcc/config/aarch64/aarch64.opt|  29 +---
> >  .../gcc.target/aarch64/ldp_aligned.c  |  28 ---
> >  .../gcc.target/aarch64/ldp_unaligned.c|  40 +
> >  .../gcc.target/aarch64/stp_aligned.c  |  25 ---
> >  .../gcc.target/aarch64/stp_unaligned.c|  37 
> >  8 files changed, 155 insertions(+), 215 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_unaligned.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_unaligned.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-opts.h 
> > b/gcc/config/aarch64/aarch64-opts.h
> > index db8348507a3..831e28ab52a 100644
> > --- a/gcc/config/aarch64/aarch64-opts.h
> > +++ b/gcc/config/aarch64/aarch64-opts.h
> > @@ -108,20 +108,18 @@ enum aarch64_key_type {
> >AARCH64_KEY_B
> >  };
> >
> > -/* Load pair policy type.  */
> > -enum aarch64_ldp_policy {
> > -  LDP_POLICY_DEFAULT,
> > -  LDP_POLICY_ALWAYS,
> > -  LDP_POLICY_NEVER,
> > -  LDP_POLICY_ALIGNED
> > -};
> > -
> > -/* Store pair policy type.  */
> > -enum aarch64_stp_policy {
> > -  STP_POLICY_DEFAULT,
> > -  STP_POLICY_ALWAYS,
> > -  STP_POLICY_NEVER,
> > -  STP_POLICY_ALIGNED
> > +/* An enum specifying how to handle load and store pairs using
> > +   a fine-grained policy:
> > +   - LDP_STP_POLICY_DEFAULT: Use the policy defined in the tuning 
> > structure.
> > +   - LDP_STP_POLICY_ALIGNED: Emit ldp/stp if the source pointer is aligned
> > +   to at least double the alignment of the type.
> > +   - LDP_STP_POLICY_ALWAYS: Emit ldp/stp regardless of alignment.
> > +   - LDP_STP_POLICY_NEVER: Do not emit ldp/stp.  */
> > +enum aarch64_ldp_stp_policy {
> > +  AARCH64_LDP_STP_POLICY_DEFAULT,
> > +  AARCH64_LDP_STP_POLICY_ALIGNED,
> > +  AARCH64_LDP_STP_POLICY_ALWAYS,
> > +  AARCH64_LDP_STP_POLICY_NEVER
> >  };
> >
> >  #endif
> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
> > b/gcc/config/aarch64/aarch64-protos.h
> > index 5c6802b4fe8..60a55f4bc19 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -568,30 +568,9 @@ struct tune_params
> >/* Place prefetch struct pointer at the end to enable type checking
> >   errors when tune_params misses elements (e.g., from erroneous 
> > merges).  */
> >const struct cpu_prefetch_tune *prefetch;
> > -/* An enum specifying how to handle load pairs using a fine-grained policy:
> > -   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> > -   to at least double the alignment of the type.
> > -   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> > -   - LDP_POLICY_NEVER: Do not emit ldp.  */
> >
> > -  enum aarch64_ldp_policy_model
> > -  {
> > -LDP_POLICY_ALIGNED,
> > -LDP_POLICY_ALWAYS,
> > -LDP_POLICY_NEVER
> > -  } ldp_policy_model;
> > -/* An enum specifying how to handle store pairs using a fine-grained 
> > policy:
> > -   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> > -   to at least double the alignment of the type.
> > -   - ST
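[Editor's note: the merged policy enum above can be modelled in a few lines of C. This is an illustrative sketch, not the actual GCC logic; the real decision lives in aarch64_mem_ok_with_ldpstp_policy_model, and the helper name below is invented.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal model of the merged aarch64_ldp_stp_policy enum and the
   ALIGNED check described above: emit ldp/stp only if the pointer is
   aligned to at least double the alignment of the accessed type.  */
enum ldp_stp_policy {
  POLICY_DEFAULT,   /* Defer to the policy in the tuning structure.  */
  POLICY_ALIGNED,
  POLICY_ALWAYS,
  POLICY_NEVER
};

static int
emit_pair_p (enum ldp_stp_policy policy, uintptr_t addr, size_t type_align)
{
  switch (policy)
    {
    case POLICY_ALWAYS:  return 1;
    case POLICY_NEVER:   return 0;
    case POLICY_ALIGNED: return addr % (2 * type_align) == 0;
    default:             return 1;  /* DEFAULT: tuning-structure choice.  */
    }
}
```

For example, under POLICY_ALIGNED an 8-byte double at a 16-byte-aligned address qualifies for ldp, while the same access at an 8-byte-aligned address does not.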

Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Philipp Tomsich
Assuming a fully pipelined vector unit (and from experience on
AArch64), a u-arch's scalar-to-vector move cost is likely to play a
significant role in whether this will be profitable or not.

--Philipp.

On Wed, 31 May 2023 at 00:10, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/30/23 16:01, 钟居哲 wrote:
> > I agree with Andrew.
> >
> > And I don't think this patch is appropriate for the following reasons:
> > 1. This patch increases the vector workload on the machine, since
> >   it converts scalar load + vmv.v.x into vmv.v.i + vsll.vi.
> This is probably uarch dependent.  I can probably construct cases where
> the first will be better and I can probably construct cases where the
> latter will be better.  In fact the recommendation from our uarch team
> is to generally do this stuff on the vector side.
>
>
>
> > 2. For a multi-issue OoO machine, scalar instructions are very cheap
> >  when they are interleaved with vector code. For example, a sequence
> >  like this:
> >scalar insn
> >scalar insn
> >vector insn
> >scalar insn
> > vector insn
> >
> >In such a situation, we can issue multiple instructions simultaneously,
> >and the latency of the scalar instructions will be hidden, so the
> >scalar instructions are cheap. Whereas this patch, by increasing the
> >vector pipeline workload, is not friendly to the OoO machines I
> >mentioned above.
> I probably need to be careful what I say here :-)  I'll go with mixing
> vector/scalar code may incur certain penalties on some
> microarchitectures depending on the exact code sequences involved.
>
>
> > 3.   I can imagine the only benefit of this patch is that we can reduce
> >scalar register pressure in some extreme circumstances. However,
> >I don't think this benefit is "real", since GCC should schedule the
> >instruction sequence well once we tune the vector instruction
> >scheduling model and cost model to keep such register live ranges
> >very short when scalar register pressure is very high.
> >
> > Overall, I disagree with this patch.
> What I think this all argues is that it'll likely need to be uarch
> dependent.  I'm not yet sure how to describe the properties of the
> uarch in a concise manner to put into our costing structure, though.
>
> jeff
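[Editor's note: the two code sequences being compared can be modelled in scalar C. This is a behavioural sketch of the transformation under discussion, splatting a power-of-two constant 2^k into a vector, not actual RVV codegen; the function names are invented for illustration.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequence 1 (before the patch): materialize 2^k in a scalar register
   (load or li), then broadcast it with vmv.v.x.  */
static void
splat_via_scalar (int64_t *v, size_t n, unsigned k)
{
  int64_t imm = (int64_t) 1 << k;   /* scalar-side materialization */
  for (size_t i = 0; i < n; i++)    /* vmv.v.x: broadcast scalar */
    v[i] = imm;
}

/* Sequence 2 (after the patch): vmv.v.i of the immediate 1, then a
   vector shift left by k (vsll.vi), keeping all work on the vector side.  */
static void
splat_via_vector (int64_t *v, size_t n, unsigned k)
{
  for (size_t i = 0; i < n; i++)    /* vmv.v.i: broadcast immediate 1 */
    v[i] = 1;
  for (size_t i = 0; i < n; i++)    /* vsll.vi: shift each element by k */
    v[i] <<= k;
}
```

Both produce the same vector; the debate above is purely about which sequence is cheaper on a given microarchitecture (scalar-to-vector move cost vs. extra vector-pipeline work).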


Re: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

2023-06-01 Thread Philipp Tomsich
On Thu, 1 Jun 2023 at 18:49, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/1/23 01:01, juzhe.zh...@rivai.ai wrote:
> > I plan to implement BF16 vector in GCC but still waiting for ISA
> > ratified since GCC policy doesn't allow un-ratified ISA.
> Right.  So those specs need to move along further before we can start
> integrating code.

Doesn't our policy require specs to only pass the FREEZE milestone
(i.e., the requirement for public review) before we can start
integrating them?
This should give us at least a six-week head start on ratification
(a minimum 30-day public review plus two weeks for the TSC vote to
send the spec up for ratification), with the small risk of minor
changes required due to review comments, to start integrating support
for new extensions.

Best,
Philipp.

p.s.: Just for reference, the RISC-V Lifecycle Guide (defining these
milestones in specification development) is linked from
https://wiki.riscv.org/ for details.


> >
> > Currently, we are working on INT8,INT16,INT32,INT64,FP16,FP32,FP64
> > auto-vectorizaiton.
> > It should very simple BF16 in current vector framework in GCC.
> In prior architectures I've worked on the bulk of BF16 work was just
> adding additional entries to existing iterators.  So I agree, it should
> be very simple :-)
>
> Jeff
>


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-08 Thread Philipp Tomsich
On Thu 8. Jun 2023 at 09:35, Kito Cheng via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> > diff --git a/gcc/config/riscv/riscv-cores.def
> b/gcc/config/riscv/riscv-cores.def
> > index 7d87ab7ce28..4078439e562 100644
> > --- a/gcc/config/riscv/riscv-cores.def
> > +++ b/gcc/config/riscv/riscv-cores.def
> > @@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic,
> rocket_tune_info)
> >  RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
> >  RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
> >  RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
> > +RISCV_TUNE("veyron-v1", veyron_v1, veyron_v1_tune_info)
> >  RISCV_TUNE("size", generic, optimize_size_tune_info)
> >
> >  #undef RISCV_TUNE
> > @@ -77,4 +78,7 @@ RISCV_CORE("thead-c906",
> "rv64imafdc_xtheadba_xtheadbb_xtheadbs_xtheadcmo_"
> >   "xtheadcondmov_xtheadfmemidx_xtheadmac_"
> >   "xtheadmemidx_xtheadmempair_xtheadsync",
> >   "thead-c906")
> > +
> > +RISCV_CORE("veyron-v1",
>  "rv64imafdc_zba_zbb_zbc_zbs_zifencei_xventanacondops",
> > + "veyron-v1")
>
> Seems like xventanacondops have not in the trunk yet, I saw Jeff has
> approved before but not commit yet


We couldn't apply it back then, as Veyron V1 had been unannounced.
Can we move this forward now?

Philipp.

>


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-08 Thread Philipp Tomsich
On Thu 8. Jun 2023 at 16:17, Jeff Law  wrote:

>
>
> On 6/8/23 04:22, Kito Cheng wrote:
>
> >
> >
> > Oh, okay, I get the awkwardness point... I am OK with that on the GCC
> > side, but I would like binutils to support it first, or to remove the
> > extension from the mcpu temporarily until binutils supports it;
> > otherwise it is just broken support for that CPU on trunk GCC.
> I pushed the binutils bits into the repo a couple months ago:
>
> > commit 1656d3f8ef56a16745689c03269412988ebcaa54
> > Author: Philipp Tomsich 
> > Date:   Wed Apr 26 14:09:34 2023 -0600
> >
> > RISC-V: Support XVentanaCondOps extension
> [ ... ]
>
> I'd very much like to see the condops go into GCC as well, but I've been
> hesitant to move it forward myself.  We're still waiting on hardware and
> it wasn't clear to me that we really had consensus agreement to move the
> bits forward based on an announcement vs waiting on actual hardware
> availability (based on the comments from Palmer when I upstreamed the
> binutils bits).


Zicondops will go to ratification in the next couple of weeks, and the plan
is to revise the patches by then.

So I would propose that we move Zicond forward as that happens and (given
how small XVentanaCondOps is on top of Zicond) we pick it up then.


> IIRC there was general consensus on rewriting the lowest level


That was part of the “moving forward”… this needs a rebase and a major
revision.


> primitives as if-then-else constructs.  Something like this:
>
> > (define_code_iterator eq_or_ne [eq ne])
> > (define_code_attr n [(eq "") (ne "n")])
> > (define_code_attr rev [(eq "n") (ne "")])
> >
> > (define_insn "*vt.maskc<n>"
> >   [(set (match_operand:X 0 "register_operand" "=r")
> > (if_then_else:X
> >  (eq_or_ne (match_operand:X 1 "register_operand" "r")
> >  (const_int 0))
> >  (const_int 0)
> >  (match_operand:X 2 "register_operand" "r")))]
> >   "TARGET_XVENTANACONDOPS"
> >   "vt.maskc<n>\t%0,%2,%1")
> >
> > (define_insn "*vt.maskc<rev>_reverse"
> >   [(set (match_operand:X 0 "register_operand" "=r")
> > (if_then_else:X
> >  (eq_or_ne (match_operand:X 1 "register_operand" "r")
> >  (const_int 0))
> >  (match_operand:X 2 "register_operand" "r")
> >  (const_int 0)))]
> >   "TARGET_XVENTANACONDOPS"
> >   "vt.maskc<rev>\t%0,%2,%1")
>
> That's what we're using internally these days.  I would expect zicond to
> work in exactly the same manner, but with a different instruction being
> emitted.
>
> We've also got bits here which wire this up in the conditional move
> expander and which adjust the ifcvt.cc bits from VRULL to use the
> if-then-else form.  All this will be useful for zicond as well.
>
> I don't mind letting zicond go first.  It's frozen so it ought to be
> non-controversial.  We can then upstream the various improvements to
> utilize zicond better.  That moves things forward in a meaningful manner
> and buys time to meet the hardware requirement for xventanacondops which
> will be trivial to add if zicond is already supported.
>
>
>
>
> Jeff
>


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-15 Thread Philipp Tomsich
Rebased, retested, and applied to trunk.  Thanks!
--Philipp.


On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> > propagation.
> Thanks for the clarification.  This is OK for the trunk.  It looks
> generic enough to have value going forward now rather than waiting.
>
> jeff


Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Philipp Tomsich
Looks like I missed the OK on this one.
I can pick it up today, unless you, Kito, already have it in flight?

Thanks,
Philipp.

On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:

> Hi Christoph:
>
> Ooops, I thought Philipp will push those patches, does here any other
> patches got approved but not committed? I can help to push those
> patches tomorrow.
>
> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
>  wrote:
> >
> > Hi Cooper,
> >
> > I addressed this in April this year.
> > It even got an "ok", but nobody pushed it:
> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> >
> > BR
> > Christoph
> >
> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu 
> wrote:
> > >
> > > The frame-related load/store instructions should not be
> > > scheduled between each other, and the REG_FRAME_RELATED_EXPR
> > > note should be added to those instructions
> > > to prevent this.
> > > This bug causes an ICE during GCC bootstrap, and it will also ICE
> > > in the simplified case mempair-4.c; compilation fails with:
> > > during RTL pass: dwarf2
> > > theadmempair-4.c:20:1: internal compiler error: in
> dwarf2out_frame_debug_cfa_offset, at dwarf2cfi.cc:1376
> > > 0xa8c017 dwarf2out_frame_debug_cfa_offset
> > > ../../../gcc/gcc/dwarf2cfi.cc:1376
> > > 0xa8c017 dwarf2out_frame_debug
> > > ../../../gcc/gcc/dwarf2cfi.cc:2285
> > > 0xa8c017 scan_insn_after
> > > ../../../gcc/gcc/dwarf2cfi.cc:2726
> > > 0xa8cc97 scan_trace
> > > ../../../gcc/gcc/dwarf2cfi.cc:2893
> > > 0xa8d84d create_cfi_notes
> > > ../../../gcc/gcc/dwarf2cfi.cc:2933
> > > 0xa8d84d execute_dwarf2_frame
> > > ../../../gcc/gcc/dwarf2cfi.cc:3309
> > > 0xa8d84d execute
> > > ../../../gcc/gcc/dwarf2cfi.cc:3799
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/thead.cc (th_mempair_save_regs): Add
> > > REG_FRAME_RELATED_EXPR note for mempair instructions.
> > >
> > > gcc/testsuite/ChangeLog:
> > > * gcc.target/riscv/xtheadmempair-4.c: New test.
> > > ---
> > >  gcc/config/riscv/thead.cc |  6 +++--
> > >  .../gcc.target/riscv/xtheadmempair-4.c| 26 +++
> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > >
> > > diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> > > index 75203805310..2df709226f9 100644
> > > --- a/gcc/config/riscv/thead.cc
> > > +++ b/gcc/config/riscv/thead.cc
> > > @@ -366,10 +366,12 @@ th_mempair_save_regs (rtx operands[4])
> > >  {
> > >rtx set1 = gen_rtx_SET (operands[0], operands[1]);
> > >rtx set2 = gen_rtx_SET (operands[2], operands[3]);
> > > +  rtx dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
> > >rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> set1, set2)));
> > >RTX_FRAME_RELATED_P (insn) = 1;
> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
> > > +  XVECEXP (dwarf, 0, 0) = copy_rtx (set1);
> > > +  XVECEXP (dwarf, 0, 1) = copy_rtx (set2);
> > > +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> > >  }
> > >
> > >  /* Similar like riscv_restore_reg, but restores two registers from
> memory
> > > diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> b/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > > new file mode 100644
> > > index 000..d653f056ef4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" }
> } */
> > > +/* { dg-options "-march=rv64gc_xtheadmempair -O2 -g
> -mtune=thead-c906" { target { rv64 } } } */
> > > +/* { dg-options "-march=rv32gc_xtheadmempair -O2 -g
> -mtune=thead-c906" { target { rv32 } } } */
> > > +
> > > +void a();
> > > +void b(char *);
> > > +void m_fn1(int);
> > > +int e;
> > > +
> > > +int foo(int ee, int f, int g) {
> > > +  char *h = (char *)__builtin_alloca(1);
> > > +  b(h);
> > > +  b("");
> > > +  int i = ee;
> > > +  e = g;
> > > +  m_fn1(f);
> > > +  a();
> > > +  e = i;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } }
> } } */
> > > +/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } }
> } } */
> > > +
> > > +/* { dg-final { scan-assembler-times "th.lwd\t" 3 { target { rv32 } }
> } } */
> > > +/* { dg-final { scan-assembler-times "th.swd\t" 3 { target { rv32 } }
> } } */
> > > --
> > > 2.17.1
> > >
>


Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Philipp Tomsich
Awesome, thanks!

On Wed, 12 Jul 2023 at 09:18, Kito Cheng  wrote:

> Yeah, I've applied patches on my local tree and running the testsuite.
>
> On Wed, Jul 12, 2023 at 3:11 PM Philipp Tomsich
>  wrote:
> >
> > Looks like I missed the OK on this one.
> > I can pick it up today, unless you, Kito, already have it in flight?
> >
> > Thanks,
> > Philipp.
> >
> > On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:
> >>
> >> Hi Christoph:
> >>
> >> Ooops, I thought Philipp will push those patches, does here any other
> >> patches got approved but not committed? I can help to push those
> >> patches tomorrow.
> >>
> >> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
> >>  wrote:
> >> >
> >> > Hi Cooper,
> >> >
> >> > I addressed this in April this year.
> >> > It even got an "ok", but nobody pushed it:
> >> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> >> >
> >> > BR
> >> > Christoph
> >> >
> >> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu <
> cooper...@linux.alibaba.com> wrote:
> >> > >
> >> > > The frame-related load/store instructions should not be
> >> > > scheduled between each other, and the REG_FRAME_RELATED_EXPR
> >> > > note should be added to those instructions
> >> > > to prevent this.
> >> > > This bug causes an ICE during GCC bootstrap, and it will also ICE
> >> > > in the simplified case mempair-4.c; compilation fails with:
> >> > > during RTL pass: dwarf2
> >> > > theadmempair-4.c:20:1: internal compiler error: in
> dwarf2out_frame_debug_cfa_offset, at dwarf2cfi.cc:1376
> >> > > 0xa8c017 dwarf2out_frame_debug_cfa_offset
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:1376
> >> > > 0xa8c017 dwarf2out_frame_debug
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2285
> >> > > 0xa8c017 scan_insn_after
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2726
> >> > > 0xa8cc97 scan_trace
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2893
> >> > > 0xa8d84d create_cfi_notes
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2933
> >> > > 0xa8d84d execute_dwarf2_frame
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:3309
> >> > > 0xa8d84d execute
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:3799
> >> > >
> >> > > gcc/ChangeLog:
> >> > >
> >> > > * config/riscv/thead.cc (th_mempair_save_regs): Add
> >> > > REG_FRAME_RELATED_EXPR note for mempair instructions.
> >> > >
> >> > > gcc/testsuite/ChangeLog:
> >> > > * gcc.target/riscv/xtheadmempair-4.c: New test.
> >> > > ---
> >> > >  gcc/config/riscv/thead.cc |  6 +++--
> >> > >  .../gcc.target/riscv/xtheadmempair-4.c| 26
> +++
> >> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> >> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> >> > >
> >> > > diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> >> > > index 75203805310..2df709226f9 100644
> >> > > --- a/gcc/config/riscv/thead.cc
> >> > > +++ b/gcc/config/riscv/thead.cc
> >> > > @@ -366,10 +366,12 @@ th_mempair_save_regs (rtx operands[4])
> >> > >  {
> >> > >rtx set1 = gen_rtx_SET (operands[0], operands[1]);
> >> > >rtx set2 = gen_rtx_SET (operands[2], operands[3]);
> >> > > +  rtx dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
> >> > >rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> set1, set2)));
> >> > >RTX_FRAME_RELATED_P (insn) = 1;
> >> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
> >> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
> >> > > +  XVECEXP (dwarf, 0, 0) = copy_rtx (set1);
> >> > > +  XVECEXP (dwarf, 0, 1) = copy_rtx (set2);
> >> > > +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> >> > >  }
> >> > >
> >> > >  /* Similar like riscv_restore_reg, but restores two registers from
> memory
> >> > > d

Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Philipp Tomsich
On Wed, 12 Jul 2023 at 16:05, Jeff Law  wrote:

>
>
> On 7/12/23 06:48, Christoph Müllner wrote:
> > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 7/10/23 22:44, Christoph Muellner wrote:
> >>> From: Christoph Müllner 
> >>>
> >>> Recently, two identical XTheadCondMov tests have been added, which
> both fail.
> >>> Let's fix that by changing the following:
> >>> * Merge both files into one (no need for separate tests for rv32 and
> rv64)
> >>> * Drop unrelated attribute check test (we already test for `th.mveqz`
> >>> and `th.mvnez` instructions, so there is little additional value)
> >>> * Fix the pattern to allow matching
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
> >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
> >> I thought this stuff got fixed recently.  Certainly happy to see the
> >> files merged though.  Here's what I got from the July 4 run:
> >
> > I have the following with a GCC master from today
> > (a454325bea77a0dd79415480d48233a7c296bc0a):
> >
> > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> > scan-assembler .attribute arch,
> > "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> > scan-assembler .attribute arch,
> > "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >
> > With this patch the fails are gone.
> Then it's fine with me :-)


For the avoidance of all doubt: could I hear an "OK"?

Thanks,
Philipp.


Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Philipp Tomsich
Thanks, applied to trunk!

Philipp.

On Wed, 12 Jul 2023 at 16:08, Jeff Law  wrote:

>
>
> On 7/12/23 08:07, Philipp Tomsich wrote:
> >
> >
> > On Wed, 12 Jul 2023 at 16:05, Jeff Law  > <mailto:jeffreya...@gmail.com>> wrote:
> >
> >
> >
> > On 7/12/23 06:48, Christoph Müllner wrote:
> >  > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  > <mailto:jeffreya...@gmail.com>> wrote:
> >  >>
> >  >>
> >  >>
> >  >> On 7/10/23 22:44, Christoph Muellner wrote:
> >  >>> From: Christoph Müllner  > <mailto:christoph.muell...@vrull.eu>>
> >  >>>
> >  >>> Recently, two identical XTheadCondMov tests have been added,
> > which both fail.
> >  >>> Let's fix that by changing the following:
> >  >>> * Merge both files into one (no need for separate tests for
> > rv32 and rv64)
> >  >>> * Drop unrelated attribute check test (we already test for
> > `th.mveqz`
> >  >>> and `th.mvnez` instructions, so there is little additional
> > value)
> >  >>> * Fix the pattern to allow matching
> >  >>>
> >  >>> gcc/testsuite/ChangeLog:
> >  >>>
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved
> > to...
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c:
> Removed.
> >  >> I thought this stuff got fixed recently.  Certainly happy to see
> the
> >  >> files merged though.  Here's what I got from the July 4 run:
> >  >
> >  > I have the following with a GCC master from today
> >  > (a454325bea77a0dd79415480d48233a7c296bc0a):
> >  >
> >  > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> >  > scan-assembler .attribute arch,
> >  >
> >
>  "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >  > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> >  > scan-assembler .attribute arch,
> >  >
> >
>  "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >  >
> >  > With this patch the fails are gone.
> > Then it's fine with me :-)
> >
> >
> > For the avoidance of all doubt: could I hear an "OK"?
> OK for the trunk.
> jeff
>


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Philipp Tomsich
On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:
>
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize 
> it.
>
> void zd(double *) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd fa5,0(a0)
> | ret
>
> With patch
>
> | sd  zero,0(a0)
> | ret
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"

> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.

Should we add a "Fixes: "?

> Ran thru full multilib testsuite, there was 1 false failure due to
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:

PR target/110748

>
> * config/riscv/predicates.md (const_0_operand): Add back
>   const_double.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr110748-1.c: New Test.
> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>   patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/predicates.md |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
>  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
>  3 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index ..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0;  }
> +void zf(float *f)  { *f = 0.0;  }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> --
> 2.34.1
>
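[Editor's note: the premise of this patch, that IEEE-754 +0.0 is bitwise all zeros in both binary64 and binary32 (so a store from the integer zero register is equivalent), can be checked in plain C. A sketch, with illustrative helper names:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-level view of a double, via memcpy to avoid aliasing issues.
   +0.0 must come back as all-zero bits for "sd zero,0(a0)" to be a
   valid replacement for fmv.d.x + fsd.  */
static uint64_t
double_bits (double d)
{
  uint64_t b;
  memcpy (&b, &d, sizeof b);
  return b;
}

static uint32_t
float_bits (float f)
{
  uint32_t b;
  memcpy (&b, &f, sizeof b);
  return b;
}
```

Note that -0.0 is not all zeros (its sign bit is set), which is why the predicate tests `op == CONST0_RTX (GET_MODE (op))` rather than matching every floating-point zero.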


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Philipp Tomsich
+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/1/23 13:14, Vineet Gupta wrote:
>
> >
> > I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> > avoid the Thunderbird mangling the test formatting)
> Thanks.  Of particular importance is the leela change.  My recollection
> was that the f-m-o work also picked up that case.  But if my memory is
> faulty (always a possibility), then that shows a clear case where
> Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accommodate it... looks like we need to
revisit against trunk.

Philipp.

> And on the other direction we can see that deepsjeng isn't helped by
> Jivan's work, but is helped by Manolis's new pass.
>
> I'd always hoped/expected we'd have cases where one patch clearly helped
> over the other.  While the .25% to .37% improvements for the three most
> impacted benchmarks doesn't move the needle much across the whole suite
> they do add up over time.
>
> Jeff


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Philipp Tomsich
Very helpful! Looks as if regcprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.

On Wed, 2 Aug 2023 at 01:03, Vineet Gupta  wrote:
>
>
>
> On 8/1/23 15:07, Philipp Tomsich wrote:
> > +Manolis Tsamis
> >
> > On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >> On 8/1/23 13:14, Vineet Gupta wrote:
> >>
> >>> I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> >>> avoid the Thunderbird mangling the test formatting)
> >> Thanks.  Of particular importance is the leela change.  My recollection
> >> was that the f-m-o work also picked up that case.  But if my memory is
> >> faulty (always a possibility), then that shows a clear case where
> >> Jivan's work picks up a case not handled by Manolis's work.
> > f-m-o originally targeted (and benefited) the leela-case.  I wonder if
> > other optimizations/changes over the last year interfere with this and
> > what needs to be changed to accommodate this... looks like we need to
> > revisit against trunk.
> >
> > Philipp.
> >
> >> And on the other direction we can see that deepsjeng isn't helped by
> >> Jivan's work, but is helped by Manolis's new pass.
> >>
> >> I'd always hoped/expected we'd have cases where one patch clearly helped
> >> over the other.  While the .25% to .37% improvements for the three most
> >> impacted benchmarks doesn't move the needle much across the whole suite
> >> they do add up over time.
> >>
> >> Jeff
>
> I took a quick look at Leela, the significant difference is from
> additional insns with SP not getting propagated.
>
> e.g.
>
> 231b6:  mv      a4,sp
> 231b8:  sh2add  a5,a5,a4
>
> vs.
>
> 1e824:  sh2add  a5,a5,sp
>
> There are 5 such instances which more or less make up for the delta.
>
> -Vineet
>
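The extra "mv" in the first sequence typically comes from indexing a stack-allocated array. As a rough illustration (the function and sizes are invented here, and whether the compiler emits sh2add with sp directly or via a scratch copy depends on -march including Zba and on the propagation discussed above), a minimal reproducer might look like:

```c
#include <assert.h>

/* Hypothetical reproducer: indexing a stack array by a runtime index.
   On RISC-V with Zba, the address computation sp + (i << 2) can be done
   either as "sh2add a5,a5,sp" (SP propagated) or as
   "mv a4,sp; sh2add a5,a5,a4" (SP not propagated) -- the extra "mv" per
   occurrence is the delta observed in leela.  */
int
sum_window (int i)
{
  int buf[64];
  for (int k = 0; k < 64; k++)
    buf[k] = k;
  return buf[i] + buf[i + 1];
}
```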


Re: [PATCH] cprop_hardreg: Allow more propagation of the stack pointer.

2023-08-07 Thread Philipp Tomsich
Applied to master, thanks!
--Philipp.


On Mon, 7 Aug 2023 at 19:20, Jeff Law  wrote:
>
>
>
> On 8/7/23 05:31, Manolis Tsamis wrote:
> > The stack pointer propagation fix 736f8fd3 turned out to be more restrictive
> > than needed by rejecting propagation of the stack pointer when REG_POINTER
> > didn't match.
> >
> > This commit removes this check:
> > When the stack pointer is propagated it is fine for this to result in
> > REG_POINTER becoming true from false, which is what the original code 
> > checked.
> >
> > This simplification makes the previously introduced function
> > maybe_copy_reg_attrs obsolete and the logic can be inlined at the call 
> > sites,
> > as it was before 736f8fd3.
> >
> > gcc/ChangeLog:
> >
> >   * regcprop.cc (maybe_copy_reg_attrs): Remove unnecessary function.
> >   (find_oldest_value_reg): Inline stack_pointer_rtx check.
> >   (copyprop_hardreg_forward_1): Inline stack_pointer_rtx check.
> OK
> jeff


Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-13 Thread Philipp Tomsich
On Sat, 12 Aug 2023 at 01:31, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/9/23 16:39, Tsukasa OI wrote:
> > On 2023/08/10 5:05, Jeff Law wrote:
>
> >> I'd tend to think we do not want to expose the intrinsic unless the
> >> right extensions are enabled -- even though the encoding is a no-op and
> >> we could emit it as a .insn.
> >
> > I think that makes sense.  The only reason I implemented the
> > no-'Zihintpause' version is because GCC 13 implemented the built-in
> > unconditionally.  If the compatibility breakage is considered minimum (I
> > don't know, though), I'm ready to submit 'Zihintpause'-only version of
> > this patch set.
> While it's a compatibility break I don't think we have a need to
> preserve this kind of compatibility.  I suspect anyone using
> __builtin_riscv_pause was probably already turning on Zihintpause and if
> they weren't they should have been :-0
>
>
> I'm sure we'll kick this around in the Tuesday meeting and hopefully
> make a decision about the desired direction.  You're obviously welcome
> to join if you're inclined.  Let me know if you need an invite.

The original discussion (and I believe that Andrew was the decisive
voice in the end) came to the conclusion that—given that pause is a
true hint—it could always be enabled.
We had originally expected to enable it only if Zihintpause was part
of the target architecture, but viewing it as "just a name for an
already existing pure hint" also made sense.
Note that on systems that don't implement Zihintpause, the hint is
guaranteed to not have an architectural effect.
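As an illustration, the typical use of the builtin is as a spin-wait hint. The sketch below is not from this thread; the cpu_relax name and the fallback structure are assumptions, and the builtin is only invoked when compiling for RISC-V:

```c
#include <stdatomic.h>

/* Hypothetical spin-wait helper: on RISC-V, emit the PAUSE hint.  On
   implementations without Zihintpause the encoding executes as a plain
   no-op hint; on other architectures we simply busy-wait.  */
static inline void
cpu_relax (void)
{
#if defined(__riscv)
  __builtin_riscv_pause ();
#endif
}

static atomic_flag lock = ATOMIC_FLAG_INIT;

static void
spin_lock (void)
{
  while (atomic_flag_test_and_set_explicit (&lock, memory_order_acquire))
    cpu_relax ();   /* hint to the core that we are spinning */
}

static void
spin_unlock (void)
{
  atomic_flag_clear_explicit (&lock, memory_order_release);
}
```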

That said, I don't really have a strong leaning one way or another.
Philipp.


Re: [RFC PATCH v2 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-16 Thread Philipp Tomsich
On Wed, 16 Aug 2023 at 03:27, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/9/23 20:25, Tsukasa OI wrote:
> > From: Tsukasa OI 
> >
> > The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
> > (in the assembler).  However, GCC emits "pause" unconditionally, causing
> > an assembler error when compiling code that uses __builtin_riscv_pause
> > with the 'Zihintpause' extension disabled.
> >
> > However, the "pause" instruction code (0x0100000f) is a HINT and emitting
> > its instruction code is safe in any environment.
> >
> > This commit implements handling for the 'Zihintpause' extension and emits
> > ".insn 0x0100000f" instead of "pause" only if the extension is disabled
> > (making the diagnostics better).
> >
> > gcc/ChangeLog:
> >
> >   * common/config/riscv/riscv-common.cc
> >   (riscv_ext_version_table): Implement the 'Zihintpause' extension,
> >   version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
> >   * config/riscv/riscv-builtins.cc: Remove availability predicate
> >   "always" and add "hint_pause" and "hint_pause_pseudo", corresponding
> >   the existence of the 'Zihintpause' extension.
> >   (riscv_builtins) Split builtin implementation depending on the
> >   existence of the 'Zihintpause' extension.
> >   * config/riscv/riscv-opts.h
> >   (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
> >   * config/riscv/riscv.md (riscv_pause): Make it only available when
> >   the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
> >   "pause" implementation when the 'Zihintpause' extension is disabled.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/builtin_pause.c: Removed.
> >   * gcc.target/riscv/zihintpause-1.c:
> >   New test when the 'Zihintpause' extension is enabled.
> >   * gcc.target/riscv/zihintpause-2.c: Likewise.
> >   * gcc.target/riscv/zihintpause-noarch.c:
> >   New test when the 'Zihintpause' extension is disabled.
> So the conclusion from today's meeting was to make this available
> irrespective of the extension set.  So I've dropped the alternate patch
> from patchwork.
>
>
> > diff --git a/gcc/config/riscv/riscv-builtins.cc 
> > b/gcc/config/riscv/riscv-builtins.cc
> > index 79681d759628..554fb7f69bb0 100644
> > --- a/gcc/config/riscv/riscv-builtins.cc
> > +++ b/gcc/config/riscv/riscv-builtins.cc
> > @@ -122,7 +122,8 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || 
> > TARGET_ZBC) && !TARGET_64BIT)
> >   AVAIL (clmul_zbkc64_or_zbc64, (TARGET_ZBKC || TARGET_ZBC) && TARGET_64BIT)
> >   AVAIL (clmulr_zbc32, TARGET_ZBC && !TARGET_64BIT)
> >   AVAIL (clmulr_zbc64, TARGET_ZBC && TARGET_64BIT)
> > -AVAIL (always, (!0))
> > +AVAIL (hint_pause, TARGET_ZIHINTPAUSE)
> > +AVAIL (hint_pause_pseudo, !TARGET_ZIHINTPAUSE)
> >
> >   /* Construct a riscv_builtin_description from the given arguments.
> >
> > @@ -179,7 +180,8 @@ static const struct riscv_builtin_description 
> > riscv_builtins[] = {
> >
> > DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
> > DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float),
> > -  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, always),
> > +  RISCV_BUILTIN (pause, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> > RISCV_VOID_FTYPE, hint_pause),
> > +  RISCV_BUILTIN (pause_insn, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> > RISCV_VOID_FTYPE, hint_pause_pseudo),
> >   };
> >
> >   /* Index I is the function declaration for riscv_builtins[I], or null if 
> > the
> > diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> > index 28d9b81bd800..a6c3e0c9098f 100644
> > --- a/gcc/config/riscv/riscv-opts.h
> > +++ b/gcc/config/riscv/riscv-opts.h
> > @@ -102,10 +102,12 @@ enum riscv_entity
> >   #define MASK_ZICSR(1 << 0)
> >   #define MASK_ZIFENCEI (1 << 1)
> >   #define MASK_ZIHINTNTL (1 << 2)
> > +#define MASK_ZIHINTPAUSE (1 << 3)
> >
> >   #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
> >   #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> >   #define TARGET_ZIHINTNTL ((riscv_zi_subext & MASK_ZIHINTNTL) != 0)
> > +#define TARGET_ZIHINTPAUSE ((riscv_zi_subext & MASK_ZIHINTPAUSE) != 0)
> >
> >   #define MASK_ZAWRS   (1 << 0)
> >   #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 688fd697255b..a6cdb32e9408 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -2192,9 +2192,14 @@
> >
> >   (define_insn "riscv_pause"
> > [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
> > -  ""
> > +  "TARGET_ZIHINTPAUSE"
> > "pause")
> >
> > +(define_insn "riscv_pause_insn"
> > +  [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
> > +  ""
> > +  ".insn\t0x0100000f")
> > +
> So I was wondering if we'd be better off always emitting the .insn form
> with a comment on the line indicating it's a pause.  ie something like
>
> .insn\t0x0100000f ;; pause

Re: RISC-V: Added support for CRC.

2023-08-16 Thread Philipp Tomsich
On Wed, 16 Aug 2023 at 21:10, Alexander Monakov  wrote:
>
>
> On Tue, 15 Aug 2023, Jeff Law wrote:
>
> > Because if the compiler can optimize it automatically, then the projects 
> > have
> > to do literally nothing to take advantage of it.  They just compile normally
> > and their bitwise CRC gets optimized down to either a table lookup or a 
> > clmul
> > variant.  That's the real goal here.
>
> The only high-profile FOSS project that carries a bitwise CRC implementation
> I'm aware of is the 'xz' compression library. There bitwise CRC is used for
> populating the lookup table under './configure --enable-small':
>
> https://github.com/tukaani-project/xz/blob/2b871f4dbffe3801d0da3f89806b5935f758d5f3/src/liblzma/check/crc64_small.c
>
> It's a well-reasoned choice and your compiler would be undoing it
> (reintroducing the table when the bitwise CRC is employed specifically
> to avoid carrying the table).
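For readers unfamiliar with the trade-off being discussed: a bitwise (table-free) CRC processes one bit per iteration, and this idiom is exactly what a CRC-recognition pass would rewrite. A generic sketch (standard reflected CRC-32 rather than xz's CRC-64 code) looks like:

```c
#include <stdint.h>
#include <stddef.h>

/* Table-free, bit-at-a-time CRC-32 (reflected, polynomial 0xEDB88320).
   This is the shape a CRC-recognition pass would turn into a table lookup
   or a CLMUL sequence -- and also the shape xz's --enable-small build uses
   precisely because it carries no 1 KiB table.  */
uint32_t
crc32_bitwise (const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
  return crc ^ 0xFFFFFFFFu;
}
```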
>
> > One final note.  Elsewhere in this thread you described performance 
> > concerns.
> > Right now clmuls can be implemented in 4c, fully piped.
>
> Pipelining doesn't matter in the implementation being proposed here, because
> the builtin is expanded to
>
>    li    a4,quotient
>    li    a5,polynomial
>    xor   a0,a1,a0
>    clmul a0,a0,a4
>    srli  a0,a0,crc_size
>    clmul a0,a0,a5
>    slli  a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
>    srli  a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
>
> making CLMULs data-dependent, so the second can only be started one cycle
> after the first finishes, and consecutive invocations of __builtin_crc
> are likewise data-dependent (with three cycles between CLMUL). So even
> when you get CLMUL down to 3c latency, you'll have two CLMULs and 10 cycles
> per input block, while state of the art is one widening CLMUL per input block
> (one CLMUL per 32-bit block on a 64-bit CPU) limited by throughput, not 
> latency.
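To make the data dependence concrete, here is a plain-C model of the carry-less multiply primitive the expansion relies on (a software stand-in for the clmul instruction; the CRC reduction constants are omitted since they depend on the polynomial and width):

```c
#include <stdint.h>

/* Software model of RISC-V clmul: the low 64 bits of the carry-less
   (GF(2)[x]) product of A and B.  In the expanded sequence above, each
   clmul consumes the previous one's result, so the chain is bounded by
   latency rather than throughput.  */
uint64_t
clmul64 (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}
```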
>
> > I fully expect that latency to drop within the next 12-18 months.  In that
> > world, there's not going to be much benefit to using hand-coded libraries vs
> > just letting the compiler do it.

I would also hope that the hand-coded libraries would eventually have
a code path for compilers that support the built-in.
For what it's worth, there now is CRC in Boost:
https://www.boost.org/doc/libs/1_83_0/doc/html/crc.html

Cheers,
philipp.


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-22 Thread Philipp Tomsich
This should be covered by PR110308 (proposed fix attached there) and PR110313.
Our bootstrap runs are still in progress to confirm.


On Thu, 22 Jun 2023 at 09:40, Richard Biener  wrote:
>
> On Thu, Jun 22, 2023 at 1:42 AM Thiago Jung Bauermann
>  wrote:
> >
> >
> > Hello,
> >
> > Jeff Law  writes:
> >
> > > On 6/19/23 22:52, Tamar Christina wrote:
> > >
> > >>> It's a bit hackish, but could we reject the stack pointer for operand1 
> > >>> in the
> > >>> stack-tie?  And if we do so, does it help?
> > >> Yeah this one I had to defer until later this week to look at closer 
> > >> because what I'm
> > >> wondering about is whether the optimization should apply to frame related
> > >> RTX as well.
> > >> Looking at the description of RTX_FRAME_RELATED_P that this optimization 
> > >> may
> > >> end up de-optimizing RISC targets by creating an offset that is larger 
> > >> than offset
> > >> which can be used from a SP making reload having to spill.  i.e. 
> > >> sometimes the
> > >> move was explicitly done. So perhaps it should not apply it to
> > >> RTX_FRAME_RELATED_P in find_oldest_value_reg and 
> > >> copyprop_hardreg_forward_1?
> > >> Other parts of this pass already seems to bail out in similar 
> > >> situations. So I needed
> > >> to
> > >> write some testcases to check what would happen in these cases hence the 
> > >> deferral.
> > >> to later in the week.
> > > Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably 
> > > better in general to
> > > me.  The cases where we're looking to clean things up aren't really in the
> > > prologue/epilogue, but instead in the main body after register 
> > > elimination has turned fp
> > > into sp + offset, thus making all kinds of things no longer valid.
> >
> > The problems I reported were fixed by commits:
> >
> > 580b74a79146 "aarch64: Robustify stack tie handling"
> > 079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"
> >
> > Thanks!
> >
> > But unfortunately I'm still seeing bootstrap failures (ICE segmentation
> > fault) in today's trunk with build config bootstrap-lto in both
> > armv8l-linux-gnueabihf and aarch64-linux-gnu.
>
> If there's not yet a bugreport for this please make sure to open one so
> this issue doesn't get lost.
>
> > If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
> > the stack pointer if possible" from trunk then both bootstraps succeed.
> >
> > Here's the command I'm using to build on armv8l:
> >
> > ~/src/configure \
> > SHELL=/bin/bash \
> > --with-gnu-as \
> > --with-gnu-ld \
> > --disable-libmudflap \
> > --enable-lto \
> > --enable-shared \
> > --without-included-gettext \
> > --enable-nls \
> > --with-system-zlib \
> > --disable-sjlj-exceptions \
> > --enable-gnu-unique-object \
> > --enable-linker-build-id \
> > --disable-libstdcxx-pch \
> > --enable-c99 \
> > --enable-clocale=gnu \
> > --enable-libstdcxx-debug \
> > --enable-long-long \
> > --with-cloog=no \
> > --with-ppl=no \
> > --with-isl=no \
> > --disable-multilib \
> > --with-float=hard \
> > --with-fpu=neon-fp-armv8 \
> > --with-mode=thumb \
> > --with-arch=armv8-a \
> > --enable-threads=posix \
> > --enable-multiarch \
> > --enable-libstdcxx-time=yes \
> > --enable-gnu-indirect-function \
> > --disable-werror \
> > --enable-checking=yes \
> > --enable-bootstrap \
> > --with-build-config=bootstrap-lto \
> > --enable-languages=c,c++,fortran,lto \
> > && make \
> > profiledbootstrap \
> > SHELL=/bin/bash \
> > -w \
> > -j 40 \
> > CFLAGS_FOR_BUILD="-pipe -g -O2" \
> > CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> > LDFLAGS_FOR_BUILD="-static-libgcc" \
> > MAKEINFOFLAGS=--force \
> > BUILD_INFO="" \
> > MAKEINFO=echo
> >
> > And here's the slightly different one for aarch64-linux:
> >
> > ~/src/configure \
> > SHELL=/bin/bash \
> > --with-gnu-as \
> > --with-gnu-ld \
> > --disable-libmudflap \
> > --enable-lto \
> > --enable-shared \
> > --without-included-gettext \
> > --enable-nls \
> > --with-system-zlib \
> > --disable-sjlj-exceptions \
> > --enable-gnu-unique-object \
> > --enable-linker-build-id \
> > --disable-libstdcxx-pch \
> > --enable-c99 \
> > --enable-clocale=gnu \
> > --enable-libstdcxx-debug \
> > --enable-long-long \
> > --with-cloog=no \
> > --with-ppl=no \
> > --with-isl=no \
> > --disable-multilib \
> > --enable-fix-cortex-a53-835769 \
> > --enable-fix-cortex-a53-843419 \
> > --with-arch=armv8-a \
> > --enable-threads=posix \
> > --enable-multiarch \
> > --enable-libstdcxx-time=yes \
> > --enable-gnu-indirect-function \
> > --disable-werror \
> > --enable-checking=yes \
> > --enable-bootstrap \
> > --with-build-config=bootstrap-lto \
> > --enable-languages=c,c++

[PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-22 Thread Philipp Tomsich
From: Manolis Tsamis 

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx, which caused regressions (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds a check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR 110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(find_oldest_value_reg): Special handling of stack_pointer_rtx.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 
Signed-off-by: Philipp Tomsich 

---
This addresses both the PRs (110308 and 110313) and was confirmed to
resolve the AArch64 bootstrap issue reported by Thiago.

OK for trunk?

 gcc/regcprop.cc | 43 +
 gcc/testsuite/g++.dg/torture/pr110308.C | 30 +
 2 files changed, 60 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 6cbfadb181f..fe75b7f1fa0 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
 {
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
return stack_pointer_rtx;
   else
return NULL_RTX;
@@ -487,9 +487,14 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct 
value_data *vd)
   new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, regno);
   if (new_rtx)
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
- REG_ATTRS (new_rtx) = REG_ATTRS (reg);
- REG_POINTER (new_rtx) = REG_POINTER (reg);
+ if (new_rtx != stack_pointer_rtx)
+   {
+ ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
+ REG_ATTRS (new_rtx) = REG_ATTRS (reg);
+ REG_POINTER (new_rtx) = REG_POINTER (reg);
+   }
+ else if (REG_POINTER (new_rtx) != REG_POINTER (reg))
+   return NULL_RTX;
  return new_rtx;
}
 }
@@ -965,15 +970,27 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
 
  if (validate_change (insn, &SET_SRC (set), new_rtx, 0))
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
- REG_ATTRS (new_rtx) = REG_ATTRS (src);
- REG_POINTER (new_rtx) = REG_POINTER (src);
- if (dump_file)
-   fprintf (dump_file,
-"insn %u: replaced reg %u with %u\n",
-INSN_UID (insn), regno, REGNO (new_rtx));
- changed = true;
- goto did_replacement;
+ bool can_change;
+ if (new_rtx != stack_pointer_rtx)
+   {
+ ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
+ REG_ATTRS (new_rtx) = REG_ATTRS (src);
+ REG_POINTER (new_rtx) = REG_POINTER (src);
+ can_change = true;
+   }
+ else
+   can_change
+ = (REG_POINTER (new_rtx) == REG_POINTER (src));
+
+ if (can_change)
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"insn %u: replaced reg %u with %u\n",
+INSN_UID (insn), regno, REGNO (new_rtx));
+ changed = true;
+ goto did_replacement;
+   }
}
  /* We need to re-extract as validate_change clobbers
 recog_data.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
b/gcc/testsuite/g++.dg/torture/pr110308.C
new file mode 100644
index 000..ddd30d4fc3f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110308.C
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-g2 -O2" } */
+
+int channelCount, decodeBlock_outputLength;
+struct BlockCodec {
+  virtual int decodeBlock(const unsigned char *, short *);
+};
+struct ms_adpcm_state {
+  char predictorIndex;
+  int sample1;
+  ms_adpcm_state();
+};
+bool decodeBlock_ok;
+void encodeBlock() { ms_adpcm_state(); }
+struct MSADPCM : BlockCodec {
+  int decodeBlock(const unsigned char 

Re: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-22 Thread Philipp Tomsich
Richard,

OK for backport to GCC-13?

Thanks,
Philipp.

On Thu, 22 Jun 2023 at 16:18, Richard Sandiford via Gcc-patches
 wrote:
>
> Di Zhao OS via Gcc-patches  writes:
> > This patch enables reassociation of floating-point additions on ampere1.
> > This brings about 1% overall benefit on spec2017 fprate cases. (There
> > are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
> >
> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
> >
> > Thanks,
> > Di Zhao
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1
>
> Thanks, pushed to trunk.
>
> Richard
>
> > ---
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index d16565b5581..301c9f6c0cd 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
> >"32:12",   /* loop_align.  */
> >2, /* int_reassoc_width.  */
> >4, /* fp_reassoc_width.  */
> > -  1, /* fma_reassoc_width.  */
> > +  4, /* fma_reassoc_width.  */
> >2, /* vec_reassoc_width.  */
> >2, /* min_div_recip_mul_sf.  */
> >2, /* min_div_recip_mul_df.  */


[COMMITTED, PR 110308] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-28 Thread Philipp Tomsich
From: Manolis Tsamis 

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx, which caused regressions (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds a check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR debug/110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(maybe_copy_reg_attrs): New function.
(find_oldest_value_reg): Use maybe_copy_reg_attrs.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 
Signed-off-by: Philipp Tomsich 

---

 gcc/regcprop.cc | 52 +
 gcc/testsuite/g++.dg/torture/pr110308.C | 29 ++
 2 files changed, 65 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 6cbfadb181f..d28a4d5aca8 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
 {
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
return stack_pointer_rtx;
   else
return NULL_RTX;
@@ -451,6 +451,31 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
   return NULL_RTX;
 }
 
+/* Helper function to copy attributes when replacing OLD_REG with NEW_REG.
+   If the changes required for NEW_REG are invalid return NULL_RTX, otherwise
+   return NEW_REG.  This is intended to be used with maybe_mode_change.  */
+
+static rtx
+maybe_copy_reg_attrs (rtx new_reg, rtx old_reg)
+{
+  if (new_reg != stack_pointer_rtx)
+{
+  /* NEW_REG is assumed to be a register copy resulting from
+maybe_mode_change.  */
+  ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (old_reg);
+  REG_ATTRS (new_reg) = REG_ATTRS (old_reg);
+  REG_POINTER (new_reg) = REG_POINTER (old_reg);
+}
+  else if (REG_POINTER (new_reg) != REG_POINTER (old_reg))
+{
+  /* Only a single instance of STACK_POINTER_RTX must exist and we cannot
+modify it.  Allow propagation if REG_POINTER for OLD_REG matches and
+don't touch ORIGINAL_REGNO and REG_ATTRS.  */
+  return NULL_RTX;
+}
+  return new_reg;
+}
+
 /* Find the oldest copy of the value contained in REGNO that is in
register class CL and has mode MODE.  If found, return an rtx
of that oldest register, otherwise return NULL.  */
@@ -486,12 +511,7 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct 
value_data *vd)
 
   new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, regno);
   if (new_rtx)
-   {
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
- REG_ATTRS (new_rtx) = REG_ATTRS (reg);
- REG_POINTER (new_rtx) = REG_POINTER (reg);
- return new_rtx;
-   }
+   return maybe_copy_reg_attrs (new_rtx, reg);
 }
 
   return NULL_RTX;
@@ -965,15 +985,15 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
 
  if (validate_change (insn, &SET_SRC (set), new_rtx, 0))
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
- REG_ATTRS (new_rtx) = REG_ATTRS (src);
- REG_POINTER (new_rtx) = REG_POINTER (src);
- if (dump_file)
-   fprintf (dump_file,
-"insn %u: replaced reg %u with %u\n",
-INSN_UID (insn), regno, REGNO (new_rtx));
- changed = true;
- goto did_replacement;
+ if (maybe_copy_reg_attrs (new_rtx, src))
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"insn %u: replaced reg %u with %u\n",
+INSN_UID (insn), regno, REGNO (new_rtx));
+ changed = true;
+ goto did_replacement;
+   }
}
  /* We need to re-extract as validate_change clobbers
 recog_data.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
b/gcc/testsuite/g++.dg/torture/pr110308.C
new file mode 100644
index 000..36c6d382121
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110308.C
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+
+int channelCount,

Re: [PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-28 Thread Philipp Tomsich
Thanks! Applied to master with the requested changes as
417b8379b32945d61f1ce3d8281bee063eea1937.
Note that the final version factors out the duplicated logic, so we
now have a single place to add the comments.

Philipp.


On Sun, 25 Jun 2023 at 06:09, Jeff Law  wrote:
>
>
>
> On 6/22/23 05:11, Philipp Tomsich wrote:
> > From: Manolis Tsamis 
> >
> > Fixes: 6a2e8dcbbd4bab3
> >
> > Propagation for the stack pointer in regcprop was enabled in
> > 6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
> > stack_pointer_rtx, which caused regressions (e.g., PR 110313, PR 110308).
> >
> > This fix adds special handling for stack_pointer_rtx in the places
> > where maybe_mode_change is called. This also adds a check in
> > maybe_mode_change to return the stack pointer only when the requested
> > mode matches the mode of stack_pointer_rtx.
> >
> >   PR 110308
> Should be
> PR debug/110308
>
>
> >
> > gcc/ChangeLog:
> >
> >   * regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
> >   (find_oldest_value_reg): Special handling of stack_pointer_rtx.
> >   (copyprop_hardreg_forward_1): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/torture/pr110308.C: New test.
> I don't doubt the need for the special handling of the stack pointer,
> but it's not obvious why it's needed.  So my request is that both hunks
> which specialize handling of ORIGINAL_REGNO, REG_ATTRS & REG_POINTER
> have a comment indicating why we must not adjust those values when
> NEW_RTX is STACK_POINTER_RTX.
>
> OK with that change.
>
> Jeff


Re: [PATCH v2] RISC-V: Add support for vector crypto extensions

2023-07-03 Thread Philipp Tomsich
Thanks, applied to master.
--Philipp.

On Mon, 3 Jul 2023 at 15:42, Kito Cheng  wrote:

> Thanks, LGTM :)
>
> Christoph Muellner 於 2023年7月3日 週一,19:08寫道:
>
>> From: Christoph Müllner 
>>
>> This series adds basic support for the vector crypto extensions:
>> * Zvbb
>> * Zvbc
>> * Zvkg
>> * Zvkned
>> * Zvkhn[a,b]
>> * Zvksed
>> * Zvksh
>> * Zvkn
>> * Zvknc
>> * Zvkng
>> * Zvks
>> * Zvksc
>> * Zvksg
>> * Zvkt
>>
>> This patch is based on the v20230620 version of the Vector Cryptography
>> specification. The specification is frozen and can be found here:
>>   https://github.com/riscv/riscv-crypto/releases/tag/v20230620
>>
>> Binutils support has been merged upstream a few days ago.
>>
>> All extensions come with tests for the feature test macros.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc: Add support for zvbb,
>> zvbc, zvkg, zvkned, zvknha, zvknhb, zvksed, zvksh, zvkn,
>> zvknc, zvkng, zvks, zvksc, zvksg, zvkt and the implied subsets.
>> * config/riscv/arch-canonicalize: Add canonicalization info for
>> zvkn, zvknc, zvkng, zvks, zvksc, zvksg.
>> * config/riscv/riscv-opts.h (MASK_ZVBB): New macro.
>> (MASK_ZVBC): Likewise.
>> (TARGET_ZVBB): Likewise.
>> (TARGET_ZVBC): Likewise.
>> (MASK_ZVKG): Likewise.
>> (MASK_ZVKNED): Likewise.
>> (MASK_ZVKNHA): Likewise.
>> (MASK_ZVKNHB): Likewise.
>> (MASK_ZVKSED): Likewise.
>> (MASK_ZVKSH): Likewise.
>> (MASK_ZVKN): Likewise.
>> (MASK_ZVKNC): Likewise.
>> (MASK_ZVKNG): Likewise.
>> (MASK_ZVKS): Likewise.
>> (MASK_ZVKSC): Likewise.
>> (MASK_ZVKSG): Likewise.
>> (MASK_ZVKT): Likewise.
>> (TARGET_ZVKG): Likewise.
>> (TARGET_ZVKNED): Likewise.
>> (TARGET_ZVKNHA): Likewise.
>> (TARGET_ZVKNHB): Likewise.
>> (TARGET_ZVKSED): Likewise.
>> (TARGET_ZVKSH): Likewise.
>> (TARGET_ZVKN): Likewise.
>> (TARGET_ZVKNC): Likewise.
>> (TARGET_ZVKNG): Likewise.
>> (TARGET_ZVKS): Likewise.
>> (TARGET_ZVKSC): Likewise.
>> (TARGET_ZVKSG): Likewise.
>> (TARGET_ZVKT): Likewise.
>> * config/riscv/riscv.opt: Introduction of riscv_zv{b,k}_subext.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zvbb.c: New test.
>> * gcc.target/riscv/zvbc.c: New test.
>> * gcc.target/riscv/zvkg.c: New test.
>> * gcc.target/riscv/zvkn-1.c: New test.
>> * gcc.target/riscv/zvkn.c: New test.
>> * gcc.target/riscv/zvknc-1.c: New test.
>> * gcc.target/riscv/zvknc-2.c: New test.
>> * gcc.target/riscv/zvknc.c: New test.
>> * gcc.target/riscv/zvkned.c: New test.
>> * gcc.target/riscv/zvkng-1.c: New test.
>> * gcc.target/riscv/zvkng-2.c: New test.
>> * gcc.target/riscv/zvkng.c: New test.
>> * gcc.target/riscv/zvknha.c: New test.
>> * gcc.target/riscv/zvknhb.c: New test.
>> * gcc.target/riscv/zvks-1.c: New test.
>> * gcc.target/riscv/zvks.c: New test.
>> * gcc.target/riscv/zvksc-1.c: New test.
>> * gcc.target/riscv/zvksc-2.c: New test.
>> * gcc.target/riscv/zvksc.c: New test.
>> * gcc.target/riscv/zvksed.c: New test.
>> * gcc.target/riscv/zvksg-1.c: New test.
>> * gcc.target/riscv/zvksg-2.c: New test.
>> * gcc.target/riscv/zvksg.c: New test.
>> * gcc.target/riscv/zvksh.c: New test.
>> * gcc.target/riscv/zvkt.c: New test.
>>
>> Signed-off-by: Christoph Müllner 
>> ---
>> Changes for v2:
>> - Update patch for specification version v20230620
>>
>>  gcc/common/config/riscv/riscv-common.cc  | 55 
>>  gcc/config/riscv/arch-canonicalize   |  7 +++
>>  gcc/config/riscv/riscv-opts.h| 34 +++
>>  gcc/config/riscv/riscv.opt   |  6 +++
>>  gcc/testsuite/gcc.target/riscv/zvbb.c| 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvbc.c| 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvkg.c| 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvkn-1.c  | 29 +
>>  gcc/testsuite/gcc.target/riscv/zvkn.c| 29 +
>>  gcc/testsuite/gcc.target/riscv/zvknc-1.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvknc-2.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvknc.c   | 37 
>>  gcc/testsuite/gcc.target/riscv/zvkned.c  | 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvkng-1.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvkng-2.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvkng.c   | 37 
>>  gcc/testsuite/gcc.target/riscv/zvknha.c  | 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvknhb.c  | 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvks-1.c  | 29 +
>>  gcc/testsuite/gcc.target/riscv/zvks.c| 29 +
>>  gcc/testsuite/gcc.target/riscv/zvksc-1.c | 37 +

Re: [PING][PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-07 Thread Philipp Tomsich
On Fri, 7 Jul 2023 at 10:28, Di Zhao OS via Gcc-patches
 wrote:
>
> Update the patch so it can apply.
>
> Tested on spec2017 fprate cases again. With option "-funroll-loops -Ofast 
> -flto",
> the improvements of 1-copy run are:
>
> Ampere1:
> 508.namd_r  4.26%
> 510.parest_r2.55%
> Overall 0.54%
> Intel Xeon:
> 503.bwaves_r1.3%
> 508.namd_r  1.58%
> overall 0.42%

This looks like a worthwhile improvement.

From reviewing the patch, a few nit-picks:
- given that 'has_fma' can now take three values { -1, 0, 1 }, an enum
with more descriptive names for these 3 states should be used;
- the "has_fma >= 0" and "has_fma > 0" tests are hard to read; after
changing this to an enum, you can use macros or helper functions to
test the predicates (i.e., *_P macros or *_p helpers) for readability
- the meaning of the return values of rank_ops_for_fma should be
documented in the comment describing the function
- changing convert_mult_to_fma_1 to return a tree* (i.e., return_lhs
or NULL_TREE) removes the need for an in/out parameter

Thanks,
Philipp.

>
>
> Thanks,
> Di Zhao
>
>
> > -Original Message-
> > From: Di Zhao OS
> > Sent: Friday, June 16, 2023 4:51 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] tree-optimization/110279- Check for nested FMA chains in
> > reassoc
> >
> > This patch is to fix the regressions found in SPEC2017 fprate cases
> >  on aarch64.
> >
> > 1. Reused code in pass widening_mul to check for nested FMA chains
> >  (those connected by MULT_EXPRs), since re-writing to parallel
> >  generates worse code.
> >
> > 2. Avoid re-arrange to produce less FMA chains that can be slow.
> >
> > Tested on ampere1 and neoverse-n1, this fixed the regressions in
> > 508.namd_r and 510.parest_r 1 copy run. While I'm still collecting data
> > on x86 machines we have, I'd like to know what you think of this.
> >
> > (Previously I tried to improve things with FMA by adding a widening_mul
> > pass before reassoc2, since it's easier to recognize different patterns
> > of FMA chains and decide whether to split them. But I suppose handling
> > them all in reassoc pass is more efficient.)
> >
> > Thanks,
> > Di Zhao
> >
> > ---
> > gcc/ChangeLog:
> >
> > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Add new parameter.
> > Support new mode that merely do the checking.
> > (struct fma_transformation_info): Moved to header.
> > (class fma_deferring_state): Moved to header.
> > (convert_mult_to_fma): Add new parameter.
> > * tree-ssa-math-opts.h (struct fma_transformation_info):
> > (class fma_deferring_state): Moved from .cc.
> > (convert_mult_to_fma): Add function decl.
> > * tree-ssa-reassoc.cc (rewrite_expr_tree_parallel):
> > (rank_ops_for_fma): Return -1 if nested FMAs are found.
> > (reassociate_bb): Avoid rewriting to parallel if nested FMAs are
> > found.
>


Re: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-10 Thread Philipp Tomsich
Jakub,

it looks like you did a lot of work on reassoc in the past — could you
have a quick look and comment?

Thanks,
Philipp.


On Tue, 11 Jul 2023 at 04:59, Di Zhao OS  wrote:
>
> Attached is an updated version of the patch.
>
> Based on Philipp's review, some changes:
>
> 1. Defined new enum fma_state to describe the state of FMA candidates
>for a list of operands. (Since the tests seem simple after the
>change, I didn't add predicates on it.)
> 2. Changed return type of convert_mult_to_fma_1 and convert_mult_to_fma
>to tree, to remove the in/out parameter.
> 3. Added description of the return values of rank_ops_for_fma.
>
> ---
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new parameter
> check_only_p. Changed return type to tree.
> (struct fma_transformation_info): Moved to header.
> (class fma_deferring_state): Moved to header.
> (convert_mult_to_fma): Added new parameter check_only_p. Changed
> return type to tree.
> * tree-ssa-math-opts.h (struct fma_transformation_info): Moved from 
> .cc.
> (class fma_deferring_state): Moved from .cc.
> (convert_mult_to_fma): Add function decl.
> * tree-ssa-reassoc.cc (enum fma_state): Defined new enum to describe
> the state of FMA candidates for a list of operands.
> (rewrite_expr_tree_parallel): Changed boolean parameter to enum type.
> (rank_ops_for_fma): Return enum fma_state.
> (reassociate_bb): Avoid rewriting to parallel if nested FMAs are 
> found.
>
> Thanks,
> Di Zhao
>
>


Re: [PATCH] RISC-V: Fix wrong partial subreg check for bsetidisi

2023-02-28 Thread Philipp Tomsich
On Tue, 28 Feb 2023 at 06:00, Lin Sinan  wrote:
>
> From: Lin Sinan 
>
> The partial subreg check should be for the subreg operand (operand 1) instead of
> the immediate operand (operand 2). This change also fixes pr68648.c in zbs.

Good catch.
Reviewed-by: 


Re: [PATCH] RISC-V: costs: miscomputed shiftadd_cost triggering synth_mult [PR/108987]

2023-03-01 Thread Philipp Tomsich
On Wed, 1 Mar 2023 at 20:53, Vineet Gupta  wrote:
>
> This showed up as a dynamic icount regression in SPEC 531.deepsjeng with 
> upstream
> gcc (vs. gcc 12.2). gcc was resorting to synthetic multiply using shift+add(s)
> even when multiply had clear cost benefit.
>
> |000133b8  .constprop.0]+0x382>:
> |   133b8:  srl a3,a1,s6
> |   133bc:  and a3,a3,s5
> |   133c0:  slli a4,a3,0x9
> |   133c4:  add a4,a4,a3
> |   133c6:  slli a4,a4,0x9
> |   133c8:  add a4,a4,a3
> |   133ca:  slli a3,a4,0x1b
> |   133ce:  add a4,a4,a3
>
> vs. gcc 12 doing something like below.
>
> |000131c4  .constprop.0]+0x35c>:
> |   131c4:  ld  s1,8(sp)
> |   131c6:  srl a3,a1,s4
> |   131ca:  and a3,a3,s11
> |   131ce:  mul a3,a3,s1
>
> Bisected this to f90cb39235c4 ("RISC-V: costs: support shift-and-add in
> strength-reduction"). The intent was to optimize cost for
> shift-add-pow2-{1,2,3} corresponding to bitmanip insns SH*ADD, but ended
> up doing that for all shift values, which seems to favor synthesizing
> multiply among others.
>
> The bug itself is trivial: IN_RANGE() was testing pow2p_hwi(), which returns
> bool, instead of exact_log2(), which returns the power of 2.
>
> This fix also requires update to the test introduced by the same commit
> which now generates MUL vs. synthesizing it.
>
> gcc/Changelog:
>
> * config/riscv/riscv.cc (riscv_rtx_costs): Fixed IN_RANGE() to
>   use exact_log2().
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zba-shNadd-07.c: f2(i*783) now generates MUL vs.
>   5 insn sh1add+slli+add+slli+sub.
> * gcc.target/riscv/pr108987.c: New test.
>
> Signed-off-by: Vineet Gupta 

Reviewed-by: Philipp Tomsich 


Re: [PATCH v4 0/9] RISC-V: Add XThead* extension support

2023-03-15 Thread Philipp Tomsich
On Sun, 5 Mar 2023 at 11:19, Kito Cheng  wrote:

> LGTM :)
>

Applied to master, thanks!
--Philipp.

On Thu, Mar 2, 2023 at 4:36 PM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This series introduces support for the T-Head specific RISC-V ISA
> extensions
> > which are available e.g. on the T-Head XuanTie C906.
> >
> > The ISA spec can be found here:
> >   https://github.com/T-head-Semi/thead-extension-spec
> >
> > This series adds support for the following XThead* extensions:
> > * XTheadBa
> > * XTheadBb
> > * XTheadBs
> > * XTheadCmo
> > * XTheadCondMov
> > * XTheadFmv
> > * XTheadInt
> > * XTheadMac
> > * XTheadMemPair
> > * XTheadSync
> >
> > All extensions are properly integrated and the included tests
> > demonstrate the improvements of the generated code.
> >
> > The series also introduces support for "-mcpu=thead-c906", which also
> > enables all available XThead* ISA extensions of the T-Head C906.
> >
> > All patches have been tested and don't introduce regressions for RV32 or
> RV64.
> > The patches have also been tested with SPEC CPU2017 on QEMU and real HW
> > (D1 board).
> >
> > Support patches for these extensions for Binutils, QEMU, and LLVM have
> > already been merged in the corresponding upstream projects.
> >
> > Patches 1-8 from this series (everything except the last one) got an ACK
> > by Kito. However, since there were a few comments after the ACK, I
> > decided to send out a v4, so that reviewers can verify that their
> > comments have been addressed properly.
> >
> > Note, that there was a concern raised by Andrew Pinski (on CC), which
> > might not be resolved with this series (I could not reproduce the issue,
> > but I might have misunderstood something).
> >
> > Changes in v4:
> > - Drop XTheadMemIdx and XTheadFMemIdx (will be a follow-up series)
> > - Replace 'immediate_operand' by 'const_int_operand' in many patterns
> > - Small cleanups in XTheadBb
> > - Factor out C code into thead.cc (XTheadMemPair) to minimize changes in
> >   riscv.cc
> >
> > Changes in v3:
> > - Bugfix in XTheadBa
> > - Rewrite of XTheadMemPair
> > - Inclusion of XTheadMemIdx and XTheadFMemIdx
> >
> > Christoph Müllner (9):
> >   riscv: Add basic XThead* vendor extension support
> >   riscv: riscv-cores.def: Add T-Head XuanTie C906
> >   riscv: thead: Add support for the XTheadBa ISA extension
> >   riscv: thead: Add support for the XTheadBs ISA extension
> >   riscv: thead: Add support for the XTheadBb ISA extension
> >   riscv: thead: Add support for the XTheadCondMov ISA extensions
> >   riscv: thead: Add support for the XTheadMac ISA extension
> >   riscv: thead: Add support for the XTheadFmv ISA extension
> >   riscv: thead: Add support for the XTheadMemPair ISA extension
> >
> >  gcc/common/config/riscv/riscv-common.cc   |  26 ++
> >  gcc/config.gcc|   1 +
> >  gcc/config/riscv/bitmanip.md  |  52 ++-
> >  gcc/config/riscv/constraints.md   |   8 +
> >  gcc/config/riscv/iterators.md |   4 +
> >  gcc/config/riscv/peephole.md  |  56 +++
> >  gcc/config/riscv/riscv-cores.def  |   4 +
> >  gcc/config/riscv/riscv-opts.h |  26 ++
> >  gcc/config/riscv/riscv-protos.h   |  16 +-
> >  gcc/config/riscv/riscv.cc | 226 +++--
> >  gcc/config/riscv/riscv.md |  67 ++-
> >  gcc/config/riscv/riscv.opt|   3 +
> >  gcc/config/riscv/t-riscv  |   4 +
> >  gcc/config/riscv/thead.cc | 427 ++
> >  gcc/config/riscv/thead.md | 346 ++
> >  .../gcc.target/riscv/mcpu-thead-c906.c|  28 ++
> >  .../gcc.target/riscv/xtheadba-addsl.c |  55 +++
> >  gcc/testsuite/gcc.target/riscv/xtheadba.c |  14 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb-ext.c |  20 +
> >  .../gcc.target/riscv/xtheadbb-extu-2.c|  22 +
> >  .../gcc.target/riscv/xtheadbb-extu.c  |  22 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb-ff1.c |  18 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb-rev.c |  45 ++
> >  .../gcc.target/riscv/xtheadbb-srri.c  |  25 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb.c |  14 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbs-tst.c |  13 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbs.c |  14 +
> >  gcc/testsuite/gcc.target/riscv/xtheadcmo.c|  14 +
> >  .../riscv/xtheadcondmov-mveqz-imm-eqz.c   |  38 ++
> >  .../riscv/xtheadcondmov-mveqz-imm-not.c   |  38 ++
> >  .../riscv/xtheadcondmov-mveqz-reg-eqz.c   |  38 ++
> >  .../riscv/xtheadcondmov-mveqz-reg-not.c   |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-imm-cond.c  |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-imm-nez.c   |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-reg-cond.c  |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-reg-nez.c   |  38 ++
> >  .../gcc.target/riscv/xtheadcondmov.c  |  14 +

Re: [wwwdocs] gcc-13: riscv: Document the T-Head CPU support

2023-03-15 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.

On Sun, 5 Mar 2023 at 11:18, Kito Cheng  wrote:

> LGTM :)
>
>
> On Fri, Feb 24, 2023 at 7:19 PM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch documents the new T-Head CPU support for RISC-V.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  htdocs/gcc-13/changes.html | 24 +++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index a803f501..ce5ba35c 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -490,7 +490,29 @@ a work-in-progress.
> >
> >  RISC-V
> >  
> > -New ISA extension support for zawrs.
> > +  New ISA extension support for Zawrs.
> > +  Support for the following vendor extensions has been added:
> > +
> > +  XTheadBa
> > +  XTheadBb
> > +  XTheadBs
> > +  XTheadCmo
> > +  XTheadCondMov
> > +  XTheadFMemIdx
> > +  XTheadFmv
> > +  XTheadInt
> > +  XTheadMac
> > +  XTheadMemIdx
> > +  XTheadMemPair
> > +  XTheadSync
> > +
> > +  
> > +  The following new CPUs are supported through the
> -mcpu
> > +  option (GCC identifiers in parentheses).
> > +
> > +  T-Head's XuanTie C906 (thead-c906).
> > +
> > +  
> >  
> >
> >  
> > --
> > 2.39.2
> >
>


Re: [PATCH v3] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2023-05-11 Thread Philipp Tomsich
Bootstrapped and reg-tested overnight for x86 and aarch64.
Applied to master, thanks!

Philipp.

On Tue, 9 May 2023 at 09:13, Richard Biener  wrote:
>
> On Tue, Dec 20, 2022 at 1:23 PM Manolis Tsamis  
> wrote:
> >
> > When using SWAR (SIMD in a register) techniques a comparison operation 
> > within
> > such a register can be made by using a combination of shifts, bitwise AND,
> > and
> > multiplication. If code using this scheme is vectorized then there is 
> > potential
> > to replace all these operations with a single vector comparison, by 
> > reinterpreting
> > the vector types to match the width of the SWAR register.
> >
> > For example, for the test function packed_cmp_16_32, the original generated 
> > code is:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > ushrv0.4s, v0.4s, 15
> > and v0.16b, v0.16b, v2.16b
> > shl v1.4s, v0.4s, 16
> > sub v0.4s, v1.4s, v0.4s
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > with this pattern the above can be optimized to:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > cmltv0.8h, v0.8h, #0
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > The effect is similar for x86-64.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Simplify vector shift + bit_and + multiply in some 
> > cases.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
>
> OK if it still bootstraps/tests OK.
>
> Thanks,
> Richard.
>
> > Signed-off-by: Manolis Tsamis 
> >
> > ---
> >
> > Changes in v3:
> > - Changed pattern to use vec_cond_expr.
> > - Changed pattern to work with VLA vector.
> > - Added both expand_vec_cmp_expr_p and
> >   expand_vec_cond_expr_p check.
> > - Fixed type compatibility issues.
> >
> >  gcc/match.pd  | 61 
> >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
> >  2 files changed, 133 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 67a0a682f31..320437f8aa3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -301,6 +301,67 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (view_convert (bit_and:itype (view_convert @0)
> >  (ne @1 { build_zero_cst (type); })))
> >
> > +/* In SWAR (SIMD within a register) code a signed comparison of packed data
> > +   can be constructed with a particular combination of shift, bitwise and,
> > +   and multiplication by constants.  If that code is vectorized we can
> > +   convert this pattern into a more efficient vector comparison.  */
> > +(simplify
> > + (mult (bit_and (rshift @0 uniform_integer_cst_p@1)
> > +   uniform_integer_cst_p@2)
> > +uniform_integer_cst_p@3)
> > + (with {
> > +   tree rshift_cst = uniform_integer_cst_p (@1);
> > +   tree bit_and_cst = uniform_integer_cst_p (@2);
> > +   tree mult_cst = uniform_integer_cst_p (@3);
> > +  }
> > +  /* Make sure we're working with vectors and uniform vector constants.  */
> > +  (if (VECTOR_TYPE_P (type)
> > +   && tree_fits_uhwi_p (rshift_cst)
> > +   && tree_fits_uhwi_p (mult_cst)
> > +   && tree_fits_uhwi_p (bit_and_cst))
> > +   /* Compute what constants would be needed for this to represent a packed
> > +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> > +   (with {
> > + HOST_WIDE_INT vec_elem_bits = vector_element_bits (type);
> > + poly_int64 vec_nelts = TYPE_VECTOR_SUBPARTS (type);
> > + poly_int64 vec_bits = vec_elem_bits * vec_nelts;
> > + unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
> > + unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
> > + cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
> > + mult_i = tree_to_uhwi (mult_cst);
> > + target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
> > + bit_and_i = tree_to_uhwi (bit_and_cst);
> > + target_bit_and_i = 0;
> > +
> > + /* The bit pattern in BIT_AND_I should be a mask for the least
> > +   significant bit of each packed element that is CMP_BITS wide.  */
> > + for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
> > +   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
> > +}
> > +(if ((exact_log2 (cmp_bits_i)) >= 0
> > +&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
> > +&& multiple_p (vec_bits, cmp_bits_i)
> > +&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
> > +&& target_mult_i == mult_i
> > +&& target_bit_and_i == bit_and_i)
> > + /* Compute the vector shape for the comparison and check if the 
> > target is
> > +   able to expand the comparison with that type.  */
> > + (with {
> > +   /* We're doing a signed comparison.  */
> > +   tree cmp_type = build_n

Re: [PATCH] RISC-V: Add rounding mode operand for fixed-point patterns

2023-05-15 Thread Philipp Tomsich
On Mon, 15 May 2023 at 10:18,  wrote:
>
> From: Juzhe-Zhong 
>
> Since we are going to have fixed-point intrinsics that model the rounding 
> mode
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> We should have operand to specify rounding mode in fixed-point instructions.
> We don't support these modeling rounding mode intrinsics yet but we will 
> definitely
> support them later.
>
> This is a preparatory patch for the upcoming intrinsics.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (enum vxrm_field_enum): New enum.
> * config/riscv/riscv-vector-builtins.cc 
> (function_expander::use_exact_insn): Add default rounding mode operand.
> * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add VXRM_REGNUM.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_conditional_register_usage): Ditto.
> * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
> (VXRM_REG_P): Ditto.
> (RISCV_DWARF_VXRM): Ditto.
> * config/riscv/riscv.md: Ditto.
> * config/riscv/vector.md: Ditto.
>
> ---
>  gcc/config/riscv/riscv-protos.h   |  8 +++
>  gcc/config/riscv/riscv-vector-builtins.cc |  7 +++
>  gcc/config/riscv/riscv.cc |  5 +-
>  gcc/config/riscv/riscv.h  |  5 +-
>  gcc/config/riscv/riscv.md |  1 +
>  gcc/config/riscv/vector.md| 74 +--
>  6 files changed, 77 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index bc71f9cbbba..835bb802fc6 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -223,6 +223,14 @@ machine_mode preferred_simd_mode (scalar_mode);
>  opt_machine_mode get_mask_mode (machine_mode);
>  void expand_vec_series (rtx, rtx, rtx);
>  void expand_vec_init (rtx, rtx);
> +/* Rounding mode bitfield for fixed point VXRM.  */
> +enum vxrm_field_enum
> +{
> +  VXRM_RNU,
> +  VXRM_RNE,
> +  VXRM_RDN,
> +  VXRM_ROD
> +};
>  }
>
>  /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 0f56f29f7aa..1de075fb90d 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -3288,6 +3288,13 @@ function_expander::use_exact_insn (insn_code icode)
>
>if (base->apply_vl_p ())
>  add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
> +
> +  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
> mode.
> + We add default rounding mode for the intrinsics that didn't model 
> rounding
> + mode yet.  */
> +  if (opno != insn_data[icode].n_generator_args)
> +add_input_operand (Pmode, const0_rtx);
> +
>return generate_insn (icode);
>  }
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index a770fdfaa0e..c9c8861f84a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -6082,7 +6082,7 @@ riscv_hard_regno_nregs (unsigned int regno, 
> machine_mode mode)
>
>/* mode for VL or VTYPE are just a marker, not holding value,
>   so it always consume one register.  */
> -  if (regno == VTYPE_REGNUM || regno == VL_REGNUM)
> +  if (regno == VTYPE_REGNUM || regno == VL_REGNUM || regno == VXRM_REGNUM)

Shouldn't this be VXRM_REG_P(...), VTYPE_REG_P(...), and VL_REG_P(...)?

>  return 1;
>
>/* Assume every valid non-vector mode fits in one vector register.  */
> @@ -6150,7 +6150,7 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>if (lmul != 1)
> return ((regno % lmul) == 0);
>  }
> -  else if (regno == VL_REGNUM || regno == VTYPE_REGNUM)
> +  else if (regno == VL_REGNUM || regno == VTYPE_REGNUM || regno == 
> VXRM_REGNUM)

Ditto.

>  return true;
>else
>  return false;
> @@ -6586,6 +6586,7 @@ riscv_conditional_register_usage (void)
>
>fixed_regs[VTYPE_REGNUM] = call_used_regs[VTYPE_REGNUM] = 1;
>fixed_regs[VL_REGNUM] = call_used_regs[VL_REGNUM] = 1;
> +  fixed_regs[VXRM_REGNUM] = call_used_regs[VXRM_REGNUM] = 1;
>  }
>  }
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 4473115d3a9..f74b70de562 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -121,7 +121,8 @@ ASM_MISA_SPEC
>
>  /* The mapping from gcc register number to DWARF 2 CFA column number.  */
>  #define DWARF_FRAME_REGNUM(REGNO)
>   \
> -  (VL_REG_P (REGNO) ? RISCV_DWARF_VL 
>   \
> +  (VXRM_REG_P (REGNO) ? RISCV_DWARF_VXRM 
>   \
> +   : VL_REG_P (REGNO) ? RISCV_DWARF_VL   
>   \
> : VTYPE_REG_P (REGNO) 
>   \
>   ? RISCV_DWARF_VTYPE 
>   \

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Philipp Tomsich
On Thu, 25 May 2023 at 16:14, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/25/23 07:50, Richard Biener wrote:
> > On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> >>> wrote:
> 
>  Implementation of the new RISC-V optimization pass for memory offset
>  calculations, documentation and testcases.
> >>>
> >>> Why do fwprop or combine not what you want to do?

At least for stack variables, the virtual-stack-vars register is not
resolved until reload.
So combine will be running much too early to be of any use (and I
haven't recently looked at whether one of the propagation passes runs
after).

Philipp.

> >> I think a lot of them end up coming from register elimination.
> >
> > Why isn't this a problem for other targets then?  Or maybe it is and this
> > shouldn't be a machine specific pass?  Maybe postreload-gcse should
> > perform strength reduction (I can't think of any other post reload pass
> > that would do something even remotely related).
> It is to some degree.  I ran into similar problems at my prior employer.
>   We ended up working around it in the target files in a different way
> -- which didn't work when I quickly tried it on RISC-V.
>
> Seems like it would be worth another investigative step as part of the
> evaluation of this patch.  I wasn't at 100% when I did that poking
> around many months ago.
>
> Jeff


Re: [PATCH] RISC-V: Optimize TARGET_XTHEADCONDMOV

2023-05-26 Thread Philipp Tomsich
LGTM.  Happy to move this forward, once it receives an OK from one of you.

--Philipp.

On Fri, 26 May 2023 at 02:53, Die Li  wrote:
>
> This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV is 
> enabled.
>
> Provide an example from the existing testcases.
>
> Testcase:
> int ConEmv_imm_imm_reg(int x, int y){
>   if (x == 1000) return 10;
>   return y;
> }
>
> Cflags:
> -O2 -march=rv64gc_xtheadcondmov -mabi=lp64d
>
> before patch:
> ConEmv_imm_imm_reg:
> addia5,a0,-1000
> li  a0,10
> th.mvneza0,zero,a5
> th.mveqza1,zero,a5
> or  a0,a0,a1
> ret
>
> after patch:
> ConEmv_imm_imm_reg:
> addia5,a0,-1000
> li  a0,10
> th.mvneza0,a1,a5
> ret
>
> Signed-off-by: Die Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_expand_conditional_move_onesided): 
> Delete.
> (riscv_expand_conditional_move):  Reuse the TARGET_SFB_ALU expand 
> process for TARGET_XTHEADCONDMOV
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
> * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
> ---
>  gcc/config/riscv/riscv.cc | 44 +++--
>  .../riscv/xtheadcondmov-indirect-rv32.c   | 48 +++
>  .../riscv/xtheadcondmov-indirect-rv64.c   | 48 +++
>  3 files changed, 42 insertions(+), 98 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 09fc9e5d95e..8b8ac9181ba 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3442,37 +3442,6 @@ riscv_expand_conditional_branch (rtx label, rtx_code 
> code, rtx op0, rtx op1)
>emit_jump_insn (gen_condjump (condition, label));
>  }
>
> -/* Helper to emit two one-sided conditional moves for the movecc.  */
> -
> -static void
> -riscv_expand_conditional_move_onesided (rtx dest, rtx cons, rtx alt,
> -   rtx_code code, rtx op0, rtx op1)
> -{
> -  machine_mode mode = GET_MODE (dest);
> -
> -  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
> -  gcc_assert (reg_or_0_operand (cons, mode));
> -  gcc_assert (reg_or_0_operand (alt, mode));
> -
> -  riscv_emit_int_compare (&code, &op0, &op1, true);
> -  rtx cond = gen_rtx_fmt_ee (code, mode, op0, op1);
> -
> -  rtx tmp1 = gen_reg_rtx (mode);
> -  rtx tmp2 = gen_reg_rtx (mode);
> -
> -  emit_insn (gen_rtx_SET (tmp1, gen_rtx_IF_THEN_ELSE (mode, cond,
> - cons, const0_rtx)));
> -
> -  /* We need to expand a sequence for both blocks and we do that such,
> - that the second conditional move will use the inverted condition.
> - We use temporaries that are or'd to the dest register.  */
> -  cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, mode, op0, op1);
> -  emit_insn (gen_rtx_SET (tmp2, gen_rtx_IF_THEN_ELSE (mode, cond,
> - alt, const0_rtx)));
> -
> -  emit_insn (gen_rtx_SET (dest, gen_rtx_IOR (mode, tmp1, tmp2)));
> - }
> -
>  /* Emit a cond move: If OP holds, move CONS to DEST; else move ALT to DEST.
> Return 0 if expansion failed.  */
>
> @@ -3483,6 +3452,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
> cons, rtx alt)
>rtx_code code = GET_CODE (op);
>rtx op0 = XEXP (op, 0);
>rtx op1 = XEXP (op, 1);
> +  bool need_eq_ne_p = false;
>
>if (TARGET_XTHEADCONDMOV
>&& GET_MODE_CLASS (mode) == MODE_INT
> @@ -3492,14 +3462,12 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
> cons, rtx alt)
>&& GET_MODE (op0) == mode
>&& GET_MODE (op1) == mode
>&& (code == EQ || code == NE))
> +need_eq_ne_p = true;
> +
> +  if (need_eq_ne_p || (TARGET_SFB_ALU
> +  && GET_MODE (op0) == word_mode))
>  {
> -  riscv_expand_conditional_move_onesided (dest, cons, alt, code, op0, 
> op1);
> -  return true;
> -}
> -  else if (TARGET_SFB_ALU
> -  && GET_MODE (op0) == word_mode)
> -{
> -  riscv_emit_int_compare (&code, &op0, &op1);
> +  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
>rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
>
>/* The expander allows (const_int 0) for CONS for the benefit of
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> index 9afdc2eabfd..e2b135f3d00 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> @@ -1,15 +1,13 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -march=rv32gc_xtheadcondmov -mabi=ilp32 
> -mriscv-attribute" } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" } } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } 
> */
>  /* { dg-final { check-function-b

[PATCH] aarch64: update Ampere-1 core definition

2022-10-03 Thread Philipp Tomsich
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.

Note that some kernel versions may misreport the presence of PAUTH and
PREDRES (i.e., -mcpu=native will add 'nopauth' and 'nopredres').

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
  Ampere-1 core entry.

Signed-off-by: Philipp Tomsich 

---
Ok for backport?

 gcc/config/aarch64/aarch64-cores.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 60299160bb6..9090f80b4b7 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -69,7 +69,7 @@ AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  (CRC, CRYPTO), 
thunderx,  0x43, 0x0a3, -1)
 
 /* Ampere Computing ('\xC0') cores. */
-AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (), ampere1, 0xC0, 0xac3, 
-1)
+AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RCPC, RNG, AES, 
SHA3), ampere1, 0xC0, 0xac3, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
-- 
2.34.1



[PATCH] aarch64: fix off-by-one in reading cpuinfo

2022-10-03 Thread Philipp Tomsich
Fixes: 341573406b39

Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string.  This issue would
cause individual characters (where the 128-byte buffers are stitched
together) to be lost.

gcc/ChangeLog:

* config/aarch64/driver-aarch64.cc (readline): Fix off-by-one.

Signed-off-by: Philipp Tomsich 

---

 gcc/config/aarch64/driver-aarch64.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/driver-aarch64.cc 
b/gcc/config/aarch64/driver-aarch64.cc
index 52ff537908e..48250e68034 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -203,9 +203,9 @@ readline (FILE *f)
return std::string ();
   /* If we're not at the end of the line then override the
 \0 added by fgets.  */
-  last = strnlen (buf, size) - 1;
+  last = strnlen (buf, size);
 }
-  while (!feof (f) && buf[last] != '\n');
+  while (!feof (f) && (last > 0 && buf[last - 1] != '\n'));
 
   std::string result (buf);
   free (buf);
-- 
2.34.1



Re: [PATCH] aarch64: update Ampere-1 core definition

2022-10-06 Thread Philipp Tomsich
On Tue, 4 Oct 2022 at 18:43, Richard Sandiford
 wrote:
>
> Philipp Tomsich  writes:
> > This brings the extensions detected by -mcpu=native on Ampere-1 systems
> > in sync with the defaults generated for -mcpu=ampere1.
> >
> > Note that some kernel versions may misreport the presence of PAUTH and
> > PREDRES (i.e., -mcpu=native will add 'nopauth' and 'nopredres').
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
> >   Ampere-1 core entry.
> >
> > Signed-off-by: Philipp Tomsich 
> >
> > ---
> > Ok for backport?
> >
> >  gcc/config/aarch64/aarch64-cores.def | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def 
> > b/gcc/config/aarch64/aarch64-cores.def
> > index 60299160bb6..9090f80b4b7 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -69,7 +69,7 @@ AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  
> > V8A,  (CRC, CRYPTO), thu
> >  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  (CRC, 
> > CRYPTO), thunderx,  0x43, 0x0a3, -1)
> >
> >  /* Ampere Computing ('\xC0') cores. */
> > -AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (), ampere1, 0xC0, 
> > 0xac3, -1)
> > +AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RCPC, RNG, AES, 
> > SHA3), ampere1, 0xC0, 0xac3, -1)
>
> The fact that you had include RCPC here shows that there was a bug
> in the definition of Armv8.3-A.  I've just pushed a fix for that.
>
> Otherwise, this seems to line up with the LLVM definition, except
> that this definition enables RNG/AEK_RAND whereas the LLVM one doesn't
> seem to.  Which one's right (or is it me that's wrong)?

I just rechecked, and the latest documents (consistent with the
/proc/cpuinfo output) confirm that FEAT_RNG is implemented.

LLVM needs to be updated to reflect that RNG is implemented.

>
> Thanks,
> Richard
>
>
> >  /* Do not swap around "emag" and "xgene1",
> > this order is required to handle variant correctly. */
> >  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), 
> > emag, 0x50, 0x000, 3)


[PATCH v2] aarch64: fix off-by-one in reading cpuinfo

2022-10-06 Thread Philipp Tomsich
Fixes: 341573406b39

Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string.  This issue would
cause individual characters (where the 128-byte buffers are stitched
together) to be lost.

gcc/ChangeLog:

* config/aarch64/driver-aarch64.cc (readline): Fix off-by-one.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_18: New test.
* gcc.target/aarch64/cpunative/native_cpu_18.c: New test.

Signed-off-by: Philipp Tomsich 

---

Changes in v2:
- Add a regression test (as per review comment).

 gcc/config/aarch64/driver-aarch64.cc  |  4 ++--
 .../gcc.target/aarch64/cpunative/info_18  |  8 
 .../gcc.target/aarch64/cpunative/native_cpu_18.c  | 15 +++
 3 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_18
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c

diff --git a/gcc/config/aarch64/driver-aarch64.cc 
b/gcc/config/aarch64/driver-aarch64.cc
index 52ff537908e..48250e68034 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -203,9 +203,9 @@ readline (FILE *f)
return std::string ();
   /* If we're not at the end of the line then override the
 \0 added by fgets.  */
-  last = strnlen (buf, size) - 1;
+  last = strnlen (buf, size);
 }
-  while (!feof (f) && buf[last] != '\n');
+  while (!feof (f) && (last > 0 && buf[last - 1] != '\n'));
 
   std::string result (buf);
   free (buf);
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_18 
b/gcc/testsuite/gcc.target/aarch64/cpunative/info_18
new file mode 100644
index 000..25061a4abe8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_18
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 2000.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp 
asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit 
uscat ilrcpc flagm ssbs sb dcpodp flagm2 frint i8mm bf16 rng ecv
+CPU implementer: 0xc0
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xac3
+CPU revision   : 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
new file mode 100644
index 000..b5f0a3005f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO 
"$srcdir/gcc.target/aarch64/cpunative/info_18" } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8.6-a\+crc\+fp16\+aes\+sha3\+rng} } 
} */
+
+/* Test one where the boundary of buffer size would overwrite the last
+   character read when stitching the fgets-calls together.  With the
+   test data provided, this would truncate the 'sha512' into 'ha512'
+   (dropping the 'sha3' feature). */
-- 
2.34.1



[PATCH v2] aarch64: update Ampere-1 core definition

2022-10-06 Thread Philipp Tomsich
This brings the extensions detected by -mcpu=native on Ampere-1 systems
in sync with the defaults generated for -mcpu=ampere1.

Note that some early kernel versions on Ampere1 may misreport the
presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
and 'nopredres').

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
  Ampere-1 core entry.

Signed-off-by: Philipp Tomsich 

---
Ok for backport?

Changes in v2:
- Removed explicit RCPC, as the feature is now implicitly included
  in the 8.3 feature definition.

 gcc/config/aarch64/aarch64-cores.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index b50628d6b51..e9a4b622be0 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -69,7 +69,7 @@ AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  (CRC, CRYPTO), 
thunderx,  0x43, 0x0a3, -1)
 
 /* Ampere Computing ('\xC0') cores. */
-AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (), ampere1, 0xC0, 0xac3, 
-1)
+AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
ampere1, 0xC0, 0xac3, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
-- 
2.34.1



Re: [PATCH v2] aarch64: fix off-by-one in reading cpuinfo

2022-10-06 Thread Philipp Tomsich
On Thu, 6 Oct 2022 at 12:06, Richard Sandiford 
wrote:

> Philipp Tomsich  writes:
> > Fixes: 341573406b39
> >
> > Don't subtract one from the result of strnlen() when trying to point
> > to the first character after the current string.  This issue would
> > cause individual characters (where the 128 byte buffers are stitched
> > together) to be lost.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/driver-aarch64.cc (readline): Fix off-by-one.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/cpunative/info_18: New test.
> >   * gcc.target/aarch64/cpunative/native_cpu_18.c: New test.
> >
> > Signed-off-by: Philipp Tomsich 
> >
> > ---
> >
> > Changes in v2:
> > - Add a a regression test (as per review comment).
> >
> >  gcc/config/aarch64/driver-aarch64.cc  |  4 ++--
> >  .../gcc.target/aarch64/cpunative/info_18  |  8 
> >  .../gcc.target/aarch64/cpunative/native_cpu_18.c  | 15 +++
> >  3 files changed, 25 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_18
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> >
> > diff --git a/gcc/config/aarch64/driver-aarch64.cc
> b/gcc/config/aarch64/driver-aarch64.cc
> > index 52ff537908e..48250e68034 100644
> > --- a/gcc/config/aarch64/driver-aarch64.cc
> > +++ b/gcc/config/aarch64/driver-aarch64.cc
> > @@ -203,9 +203,9 @@ readline (FILE *f)
> >   return std::string ();
> >/* If we're not at the end of the line then override the
> >\0 added by fgets.  */
> > -  last = strnlen (buf, size) - 1;
> > +  last = strnlen (buf, size);
> >  }
> > -  while (!feof (f) && buf[last] != '\n');
> > +  while (!feof (f) && (last > 0 && buf[last - 1] != '\n'));
>
> Very minor, but: I think the normal GCC style would be to avoid the
> extra (...).
>
> OK with that change, thanks.  OK for backports too after a settling period.
>

Applied to master (with that change). Thanks!

I'll backport around the end of this month, if no new issues are caused by
this change.

Philipp.


>
> Richard
>
> >
> >std::string result (buf);
> >free (buf);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_18
> b/gcc/testsuite/gcc.target/aarch64/cpunative/info_18
> > new file mode 100644
> > index 000..25061a4abe8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_18
> > @@ -0,0 +1,8 @@
> > +processor: 0
> > +BogoMIPS : 2000.00
> > +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm
> dit uscat ilrcpc flagm ssbs sb dcpodp flagm2 frint i8mm bf16 rng ecv
> > +CPU implementer  : 0xc0
> > +CPU architecture: 8
> > +CPU variant  : 0x0
> > +CPU part : 0xac3
> > +CPU revision : 0
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> > new file mode 100644
> > index 000..b5f0a3005f5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
> > +/* { dg-set-compiler-env-var GCC_CPUINFO
> "$srcdir/gcc.target/aarch64/cpunative/info_18" } */
> > +/* { dg-additional-options "-mcpu=native" } */
> > +
> > +int main()
> > +{
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-assembler {\.arch
> armv8.6-a\+crc\+fp16\+aes\+sha3\+rng} } } */
> > +
> > +/* Test one where the boundary of buffer size would overwrite the last
> > +   character read when stitching the fgets-calls together.  With the
> > +   test data provided, this would truncate the 'sha512' into 'ha512'
> > +   (dropping the 'sha3' feature). */
>


Re: [PATCH v2] aarch64: update Ampere-1 core definition

2022-10-06 Thread Philipp Tomsich
Applied to master. Thanks!

Philipp.

On Thu, 6 Oct 2022 at 12:07, Richard Sandiford 
wrote:

> Philipp Tomsich  writes:
> > This brings the extensions detected by -mcpu=native on Ampere-1 systems
> > in sync with the defaults generated for -mcpu=ampere1.
> >
> > Note that some early kernel versions on Ampere1 may misreport the
> > presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth'
> > and 'nopredres').
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
> >   Ampere-1 core entry.
> >
> > Signed-off-by: Philipp Tomsich 
>
> OK, thanks.
>
> > Ok for backport?
>
> Yeah.  I'll try to backport the RCPC change soon -- think it would
> be best to get that in first.
>
> Richard
>
> >
> > Changes in v2:
> > - Removed explicit RCPC, as the feature is now implicitly included
> >   in the 8.3 feature definition.
> >
> >  gcc/config/aarch64/aarch64-cores.def | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> > index b50628d6b51..e9a4b622be0 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -69,7 +69,7 @@ AARCH64_CORE("thunderxt81",   thunderxt81,
>  thunderx,  V8A,  (CRC, CRYPTO), thu
> >  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  (CRC,
> CRYPTO), thunderx,  0x43, 0x0a3, -1)
> >
> >  /* Ampere Computing ('\xC0') cores. */
> > -AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (), ampere1, 0xC0,
> 0xac3, -1)
> > +AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES,
> SHA3), ampere1, 0xC0, 0xac3, -1)
> >  /* Do not swap around "emag" and "xgene1",
> > this order is required to handle variant correctly. */
> >  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO),
> emag, 0x50, 0x000, 3)
>


Re: [PATCH 3/3] RISC-V:Cache Management Operation instructions testcases

2022-03-18 Thread Philipp Tomsich
On Fri, Mar 18, 2022 at 7:58 AM Kito Cheng  wrote:

> I would suggest rename those __builtin_riscv_* to
> __builtin_riscv_cmo_*, that's less confusing,  __builtin_riscv_zero
> just seems like it will return a zero value.
>

You meant cbo_zero, right?
CMO was only the task-group name, but the extensions ended up having "cbo"
in their name…

On Fri, Mar 4, 2022 at 10:52 AM  wrote:
> >
> > From: yulong-plct 
> >
> > This commit adds testcases about CMO instructions.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/cmo-zicbom-1.c: New test.
> > * gcc.target/riscv/cmo-zicbom-2.c: New test.
> > * gcc.target/riscv/cmo-zicbop-1.c: New test.
> > * gcc.target/riscv/cmo-zicbop-2.c: New test.
> > * gcc.target/riscv/cmo-zicboz-1.c: New test.
> > * gcc.target/riscv/cmo-zicboz-2.c: New test.
> >
> > ---
> >  gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 +
> >  gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 +
> >  gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 +++
> >  gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 +++
> >  gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
> >  gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
> >  6 files changed, 106 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
> b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
> > new file mode 100644
> > index 000..16935ff3d31
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc_zicbom -mabi=lp64" } */
> > +
> > +int foo1()
> > +{
> > +return __builtin_riscv_clean();
> > +}
> > +
> > +int foo2()
> > +{
> > +return __builtin_riscv_flush();
> > +}
> > +
> > +int foo3()
> > +{
> > +return __builtin_riscv_inval();
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
> > +/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
> > +/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
> b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
> > new file mode 100644
> > index 000..fc14f2b9c2b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32gc_zicbom -mabi=ilp32" } */
> > +
> > +int foo1()
> > +{
> > +return __builtin_riscv_clean();
> > +}
> > +
> > +int foo2()
> > +{
> > +return __builtin_riscv_flush();
> > +}
> > +
> > +int foo3()
> > +{
> > +return __builtin_riscv_inval();
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
> > +/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
> > +/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
> b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
> > new file mode 100644
> > index 000..b8bac2e8c51
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile target { { rv64-*-*}}} */
> > +/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
> > +
> > +void foo (char *p)
> > +{
> > +  __builtin_prefetch (p, 0, 0);
> > +  __builtin_prefetch (p, 0, 1);
> > +  __builtin_prefetch (p, 0, 2);
> > +  __builtin_prefetch (p, 0, 3);
> > +  __builtin_prefetch (p, 1, 0);
> > +  __builtin_prefetch (p, 1, 1);
> > +  __builtin_prefetch (p, 1, 2);
> > +  __builtin_prefetch (p, 1, 3);
> > +}
> > +
> > +int foo1()
> > +{
> > +  return __builtin_riscv_prefetchi(1);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
> > +/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
> > +/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
> b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
> > new file mode 100644
> > index 000..5ace6e2b349
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile target { { rv32-*-*}}} */
> > +/* { dg-options "-march=rv32gc_zicbop -mabi=ilp32" } */
> > +
> > +void foo (char *p)
> > +{
> > +  __builtin_prefetch (p, 0, 0);
> > +  __builtin_prefetch (p, 0, 1);
> > +  __builtin_prefetch (p, 0, 2);
> > +  __builtin_prefetc

Re: [PATCH] RISC-V: Enable TARGET_SUPPORTS_WIDE_INT

2022-02-07 Thread Philipp Tomsich
Vineet,

On Mon, 7 Feb 2022 at 07:06, Vineet Gupta  wrote:
>
> This is at par with other major arches such as aarch64, i386, s390 ...
>
> No testsuite regressions: same numbers w/ w/o

Putting that check in seems like a good idea, but I haven't seen any cases
related to this get through anyway.
Have you seen any instances where the backend got this wrong? If
so, please share, so we can run a fuller regression and check for any
performance impact.

Thanks,
Philipp.

> |   === gcc Summary ===
> |
> |# of expected passes   113392
> |# of unexpected failures   27
> |# of unexpected successes  3
> |# of expected failures 605
> |# of unsupported tests 2523
> |
> |   === g++ Summary ===
> |
> |# of expected passes   172997
> |# of unexpected failures   26
> |# of expected failures 706
> |# of unsupported tests 9566
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/predicates.md | 2 +-
>  gcc/config/riscv/riscv.c   | 6 ++
>  gcc/config/riscv/riscv.h   | 2 ++
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 3da6fd4c0491..cf902229954b 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -52,7 +52,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_double,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "reg_or_0_operand"
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index c830cd8f4ad1..d2f2d9e0276f 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -1774,6 +1774,12 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
> outer_code, int opno ATTRIBUTE_UN
>  case SYMBOL_REF:
>  case LABEL_REF:
>  case CONST_DOUBLE:
> +  /* With TARGET_SUPPORTS_WIDE_INT const int can't be in CONST_DOUBLE
> + rtl object. Weird recheck due to switch-case fall through above.  */
> +  if (GET_CODE (x) == CONST_DOUBLE)
> +gcc_assert (GET_MODE (x) != VOIDmode);
> +  /* Fall through.  */
> +
>  case CONST:
>if ((cost = riscv_const_insns (x)) > 0)
> {
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index ff6729aedac2..91cfc82b4aa4 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -997,4 +997,6 @@ extern void riscv_remove_unneeded_save_restore_calls 
> (void);
>
>  #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, TO)
>
> +#define TARGET_SUPPORTS_WIDE_INT 1
> +
>  #endif /* ! GCC_RISCV_H */
> --
> 2.32.0
>


Re: [PATCH v1 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2021-11-17 Thread Philipp Tomsich
On Wed, 17 Nov 2021 at 20:40, Palmer Dabbelt  wrote:

> [This is my first time trying my Rivos address on the lists, so sorry if
> something goes off the rails.]
>
> On Wed, 17 Nov 2021 06:05:04 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> > Hi Philipp:
> >
> > Thanks for the patch, I like this approach, that can easily configure
> > different capabilities for each core :)
> >
> > So there are only a few minor comments for this patch.
> >
> > On Mon, Nov 15, 2021 at 5:49 AM Philipp Tomsich
> >  wrote:
> >>
> >> From: Philipp Tomsich 
> >>
> >> The Ventana VT1 core supports quad-issue and instruction fusion.
> >> This implemented TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
> >> together and adds idiom matcheing for the supported fusion cases.
>
> There's a typo at "matcheing".
>
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv.c (enum riscv_fusion_pairs): Add symbolic
> >> constants to identify supported fusion patterns.
> >> (struct riscv_tune_param): Add fusible_op field.
> >> (riscv_macro_fusion_p): Implement.
> >> (riscv_fusion_enabled_p): Implement.
> >> (riscv_macro_fusion_pair_p): Implement and recognize fusible
> >> idioms for Ventana VT1.
> >> (TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
> >> (TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to
> riscv_macro_fusion_pair_p.
> >>
> >> Signed-off-by: Philipp Tomsich 
>
> This doesn't match the From (though admittedly I'm pretty new to the SoB
> stuff in GCC, so I'm not sure if that's even a rule here).
>

I noticed that I hadn't reset the authors and that patman had inserted a
Signed-off-by: for that reason, right after I sent this out.
Given that it's all me and there's both individual assignment paperwork and
company disclaimers on file for all of the email-addresses, this should be
fine.

>> ---
> >>
> >>  gcc/config/riscv/riscv.c | 196 +++
> >>  1 file changed, 196 insertions(+)
> >>
> >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> >> index 6b918db65e9..8eac52101a3 100644
> >> --- a/gcc/config/riscv/riscv.c
> >> +++ b/gcc/config/riscv/riscv.c
> >> @@ -211,6 +211,19 @@ struct riscv_integer_op {
> >> The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
> >>  #define RISCV_MAX_INTEGER_OPS 8
> >>
> >> +enum riscv_fusion_pairs
> >> +{
> >> +  RISCV_FUSE_NOTHING = 0,
> >> +  RISCV_FUSE_ZEXTW = (1 << 0),
> >> +  RISCV_FUSE_ZEXTH = (1 << 1),
> >> +  RISCV_FUSE_ZEXTWS = (1 << 2),
> >> +  RISCV_FUSE_LDINDEXED = (1 << 3),
> >
> > RISCV_FUSE_LDINDEXED -> RISCV_FUSE_LD_INDEXED
> >
> > Could you add some comment for above enums, like that:
> > /* slli rx, rx, 32 + srli rx, rx, 32 */
> > RISCV_FUSE_ZEXTW
> >
> > So that we could know what kind of instruction will be fused for this
> enum.
> >
> >> +  RISCV_FUSE_LUI_ADDI = (1 << 4),
> >> +  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
> >> +  RISCV_FUSE_LUI_LD = (1 << 6),
> >> +  RISCV_FUSE_AUIPC_LD = (1 << 7),
> >> +};
> >> +
> >>  /* Costs of various operations on the different architectures.  */
> >>
> >>  struct riscv_tune_param
> >> @@ -224,6 +237,7 @@ struct riscv_tune_param
> >>unsigned short branch_cost;
> >>unsigned short memory_cost;
> >>bool slow_unaligned_access;
> >> +  unsigned int fusible_ops;
> >>  };
> >>
> >>  /* Information about one micro-arch we know about.  */
> >> @@ -289,6 +303,7 @@ static const struct riscv_tune_param
> rocket_tune_info = {
> >>3,   /* branch_cost */
> >>5,   /* memory_cost */
> >>true,/*
> slow_unaligned_access */
> >> +  RISCV_FUSE_NOTHING,   /* fusible_ops */
> >>  };
>
> There's some tab/space issues here (and in the below ones).  They align
> when merged, but the new lines are spaces-only and the old ones have
> internal spaces mixed with tabs (IIRC that's to the GCC style, if not we
> should fix these to at least be consistent).
>
> >>
> >>  /

Re: [PATCH] RISC-V: Add Zawrs ISA extension support

2022-11-02 Thread Philipp Tomsich
On Wed, 2 Nov 2022 at 15:21, Christoph Müllner
 wrote:
>
>
>
> On Thu, Oct 27, 2022 at 10:51 PM Palmer Dabbelt  wrote:
>>
>> On Thu, 27 Oct 2022 11:23:17 PDT (-0700), christoph.muell...@vrull.eu wrote:
>> > On Thu, Oct 27, 2022 at 8:11 PM Christoph Muellner <
>> > christoph.muell...@vrull.eu> wrote:
>> >
>> >> From: Christoph Muellner 
>> >>
>> >> This patch adds support for the Zawrs ISA extension.
>> >> The patch depends on the corresponding Binutils patch
>> >> to be usable (see [1])
>> >>
>> >> The specification can be found here:
>> >> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
>> >>
>> >> Note, that the Zawrs extension is not frozen or ratified yet.
>> >> Therefore this patch is an RFC and not intended to get merged.
>> >>
>> >
>> > Sorry, forgot to update this part:
>> > The Zawrs extension is frozen but not ratified.
>> > Let me know if I should send a v2 for this change of the commit msg.
>>
>> IMO it's fine to just fix it up at commit time.  This LGTM, we just need
>> the NEWS entry too.  I also don't see any build/test results.
>
>
> I ran the GCC regression test suite with rv32 and rv64 toolchains
> using the riscv-gnu-toolchain repo and did not see any regressions.
>
> Where can I create the news entry?

News are generated from
  git://gcc.gnu.org/git/gcc-wwwdocs.git

You'll want to add to
  htdocs/gcc-13/changes.html

Thanks,
Philipp.

>>
>>
>> Thanks!
>>
>> > Binutils support has been merged recently:
>> >
>> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=eb668e50036e979fb0a74821df4eee0307b44e66
>> >
>> >
>> >>
>> >> [1] https://sourceware.org/pipermail/binutils/2022-April/120559.html
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> * common/config/riscv/riscv-common.cc: Add zawrs extension.
>> >> * config/riscv/riscv-opts.h (MASK_ZAWRS): New.
>> >> (TARGET_ZAWRS): New.
>> >> * config/riscv/riscv.opt: New.
>> >>
>> >> gcc/testsuite/ChangeLog:
>> >>
>> >> * gcc.target/riscv/zawrs.c: New test.
>> >>
>> >> Signed-off-by: Christoph Muellner 
>> >> ---
>> >>  gcc/common/config/riscv/riscv-common.cc |  4 
>> >>  gcc/config/riscv/riscv-opts.h   |  3 +++
>> >>  gcc/config/riscv/riscv.opt  |  3 +++
>> >>  gcc/testsuite/gcc.target/riscv/zawrs.c  | 13 +
>> >>  4 files changed, 23 insertions(+)
>> >>  create mode 100644 gcc/testsuite/gcc.target/riscv/zawrs.c
>> >>
>> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
>> >> b/gcc/common/config/riscv/riscv-common.cc
>> >> index d6404a01205..4b7f777c103 100644
>> >> --- a/gcc/common/config/riscv/riscv-common.cc
>> >> +++ b/gcc/common/config/riscv/riscv-common.cc
>> >> @@ -163,6 +163,8 @@ static const struct riscv_ext_version
>> >> riscv_ext_version_table[] =
>> >>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
>> >>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
>> >>
>> >> +  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
>> >> +
>> >>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
>> >>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
>> >>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
>> >> @@ -1180,6 +1182,8 @@ static const riscv_ext_flag_table_t
>> >> riscv_ext_flag_table[] =
>> >>{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
>> >>{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
>> >>
>> >> +  {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
>> >> +
>> >>{"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
>> >>{"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
>> >>{"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
>> >> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
>> >> index 1dfe8c89209..25fd85b09b1 100644
>> >> --- a/gcc/config/riscv/riscv-opts.h
>> >> +++ b/gcc/config/riscv/riscv-opts.h
>> >> @@ -73,6 +73,9 @@ enum stack_protector_guard {
>> >>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
>> >>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
>> >>
>> >> +#define MASK_ZAWRS   (1 << 0)
>> >> +#define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
>> >> +
>> >>  #define MASK_ZBA  (1 << 0)
>> >>  #define MASK_ZBB  (1 << 1)
>> >>  #define MASK_ZBC  (1 << 2)
>> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
>> >> index 426ea95cd14..7c3ca48d1cc 100644
>> >> --- a/gcc/config/riscv/riscv.opt
>> >> +++ b/gcc/config/riscv/riscv.opt
>> >> @@ -203,6 +203,9 @@ long riscv_stack_protector_guard_offset = 0
>> >>  TargetVariable
>> >>  int riscv_zi_subext
>> >>
>> >> +TargetVariable
>> >> +int riscv_za_subext
>> >> +
>> >>  TargetVariable
>> >>  int riscv_zb_subext
>> >>
>> >> diff --git a/gcc/testsuite/gcc.target/riscv/zawrs.c
>> >> b/gcc/testsuite/gcc.target/riscv/zawrs.c
>> >> new file mode 100644
>> >> index 000..0b7e2662343
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/riscv/zawrs.c
>> >> @@ -0,0 +1,13 @@
>> >> +/* { dg-do compile } */
>> >> +/* { dg-options "-march=rv64gc_zawrs" { target { rv64 } } } */
>> 

Re: [wwwdocs] gcc-13: riscv: Document the Zawrs support

2022-11-02 Thread Philipp Tomsich
Applied to gcc-wwwdocs/master. Thanks!
Philipp.

On Wed, 2 Nov 2022 at 17:12, Kito Cheng  wrote:
>
> LGTM, thanks!
>
> On Wed, Nov 2, 2022 at 7:59 AM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch documents the new RISC-V Zawrs support.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  htdocs/gcc-13/changes.html | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index 7c6bfa6e..5e6e054b 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -261,7 +261,10 @@ a work-in-progress.
> >
> >  
> >
> > -
> > +RISC-V
> > +
> > +New ISA extension support for zawrs.
> > +
> >
> >  
> >
> > --
> > 2.38.1
> >


Re: [PATCH] RISC-V: Add Zawrs ISA extension support

2022-11-02 Thread Philipp Tomsich
Applied to master (with a fixed-up commit message), thanks!
Note that the Zawrs extension was approved for ratification by the RISC-V
BoD on Oct 20th.

--Philipp.


On Thu, 27 Oct 2022 at 22:51, Palmer Dabbelt  wrote:
>
> On Thu, 27 Oct 2022 11:23:17 PDT (-0700), christoph.muell...@vrull.eu wrote:
> > On Thu, Oct 27, 2022 at 8:11 PM Christoph Muellner <
> > christoph.muell...@vrull.eu> wrote:
> >
> >> From: Christoph Muellner 
> >>
> >> This patch adds support for the Zawrs ISA extension.
> >> The patch depends on the corresponding Binutils patch
> >> to be usable (see [1])
> >>
> >> The specification can be found here:
> >> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
> >>
> >> Note, that the Zawrs extension is not frozen or ratified yet.
> >> Therefore this patch is an RFC and not intended to get merged.
> >>
> >
> > Sorry, forgot to update this part:
> > The Zawrs extension is frozen but not ratified.
> > Let me know if I should send a v2 for this change of the commit msg.
>
> IMO it's fine to just fix it up at commit time.  This LGTM, we just need
> the NEWS entry too.  I also don't see any build/test results.
>
> Thanks!
>
> > Binutils support has been merged recently:
> >
> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=eb668e50036e979fb0a74821df4eee0307b44e66
> >
> >
> >>
> >> [1] https://sourceware.org/pipermail/binutils/2022-April/120559.html
> >>
> >> gcc/ChangeLog:
> >>
> >> * common/config/riscv/riscv-common.cc: Add zawrs extension.
> >> * config/riscv/riscv-opts.h (MASK_ZAWRS): New.
> >> (TARGET_ZAWRS): New.
> >> * config/riscv/riscv.opt: New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/riscv/zawrs.c: New test.
> >>
> >> Signed-off-by: Christoph Muellner 
> >> ---
> >>  gcc/common/config/riscv/riscv-common.cc |  4 
> >>  gcc/config/riscv/riscv-opts.h   |  3 +++
> >>  gcc/config/riscv/riscv.opt  |  3 +++
> >>  gcc/testsuite/gcc.target/riscv/zawrs.c  | 13 +
> >>  4 files changed, 23 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/riscv/zawrs.c
> >>
> >> diff --git a/gcc/common/config/riscv/riscv-common.cc
> >> b/gcc/common/config/riscv/riscv-common.cc
> >> index d6404a01205..4b7f777c103 100644
> >> --- a/gcc/common/config/riscv/riscv-common.cc
> >> +++ b/gcc/common/config/riscv/riscv-common.cc
> >> @@ -163,6 +163,8 @@ static const struct riscv_ext_version
> >> riscv_ext_version_table[] =
> >>{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
> >>{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
> >>
> >> +  {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
> >> +
> >>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> >>{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
> >> @@ -1180,6 +1182,8 @@ static const riscv_ext_flag_table_t
> >> riscv_ext_flag_table[] =
> >>{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
> >>{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
> >>
> >> +  {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
> >> +
> >>{"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
> >>{"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
> >>{"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
> >> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> >> index 1dfe8c89209..25fd85b09b1 100644
> >> --- a/gcc/config/riscv/riscv-opts.h
> >> +++ b/gcc/config/riscv/riscv-opts.h
> >> @@ -73,6 +73,9 @@ enum stack_protector_guard {
> >>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
> >>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> >>
> >> +#define MASK_ZAWRS   (1 << 0)
> >> +#define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> >> +
> >>  #define MASK_ZBA  (1 << 0)
> >>  #define MASK_ZBB  (1 << 1)
> >>  #define MASK_ZBC  (1 << 2)
> >> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> >> index 426ea95cd14..7c3ca48d1cc 100644
> >> --- a/gcc/config/riscv/riscv.opt
> >> +++ b/gcc/config/riscv/riscv.opt
> >> @@ -203,6 +203,9 @@ long riscv_stack_protector_guard_offset = 0
> >>  TargetVariable
> >>  int riscv_zi_subext
> >>
> >> +TargetVariable
> >> +int riscv_za_subext
> >> +
> >>  TargetVariable
> >>  int riscv_zb_subext
> >>
> >> diff --git a/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> b/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> new file mode 100644
> >> index 000..0b7e2662343
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/riscv/zawrs.c
> >> @@ -0,0 +1,13 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-march=rv64gc_zawrs" { target { rv64 } } } */
> >> +/* { dg-options "-march=rv32gc_zawrs" { target { rv32 } } } */
> >> +
> >> +#ifndef __riscv_zawrs
> >> +#error Feature macro not defined
> >> +#endif
> >> +
> >> +int
> >> +foo (int a)
> >> +{
> >> +  return a;
> >> +}
> >> --
> >> 2.37.3
> >>
> >>


Re: [RFC] RISC-V: Minimal supports for new extensions in profile.

2022-11-02 Thread Philipp Tomsich
On Wed, 2 Nov 2022 at 23:06, Palmer Dabbelt  wrote:
>
> On Wed, 02 Nov 2022 05:52:34 PDT (-0700), jia...@iscas.ac.cn wrote:
> > This patch just adds name support for the extensions contained in profiles.
> > Set the extension version as 0.1.
>
> Or maybe v0.8, as they're in the v0.8 profile spec?  I doubt it really
> matters, though.  Either way we'll need a -mprofile-spec-version (or
> whatever) for these, as these one-phrase definitions will almost
> certainly change.
>
> This also doesn't couple these new extensions to the profiles in any
> way.  IMO that's a sane thing to do, but they're only defined as part of
> the mandatory profile section so I'm just double-checking here.
>
> We'll also need news entries and I don't see any testing results, though
> those are probably pretty easy here.
>
> >
> > gcc/ChangeLog:
> >
> > * common/config/riscv/riscv-common.cc: New extensions.
> > * config/riscv/riscv-opts.h (MASK_ZICCAMOA): New mask.
> > (MASK_ZICCIF): Ditto.
> > (MASK_ZICCLSM): Ditto.
> > (MASK_ZICCRSE): Ditto.
> > (MASK_ZICNTR): Ditto.
> > (MASK_ZIHINTPAUSE): Ditto.
> > (MASK_ZIHPM): Ditto.
> > (TARGET_ZICCAMOA): New target.
> > (TARGET_ZICCIF): Ditto.
> > (TARGET_ZICCLSM): Ditto.
> > (TARGET_ZICCRSE): Ditto.
> > (TARGET_ZICNTR): Ditto.
> > (TARGET_ZIHINTPAUSE): Ditto.
> > (TARGET_ZIHPM): Ditto.
> > (MASK_SVPBMT): New mask.
> >
> > ---
> >  gcc/common/config/riscv/riscv-common.cc | 20 
> >  gcc/config/riscv/riscv-opts.h   | 15 +++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/gcc/common/config/riscv/riscv-common.cc 
> > b/gcc/common/config/riscv/riscv-common.cc
> > index d6404a01205..602491c638d 100644
> > --- a/gcc/common/config/riscv/riscv-common.cc
> > +++ b/gcc/common/config/riscv/riscv-common.cc
> > @@ -163,6 +163,15 @@ static const struct riscv_ext_version 
> > riscv_ext_version_table[] =
> >{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
> >{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
> >
> > +  {"ziccamoa", ISA_SPEC_CLASS_NONE, 0, 1},
> > +  {"ziccif", ISA_SPEC_CLASS_NONE, 0, 1},
>
> IMO Ziccif should be sufficiently visible in the object that we can
> reject running binaries that require that on systems that don't support
> it.  It's essentially the same as Ztso, we're adding more constraints to
> existing instructions.
>
> > +  {"zicclsm", ISA_SPEC_CLASS_NONE, 0, 1},
> > +  {"ziccrse", ISA_SPEC_CLASS_NONE, 0, 1},
> > +  {"zicntr", ISA_SPEC_CLASS_NONE, 0, 1},
>
> As per Andrew's post here,
> Zicntr and Zihpm should be ignored by software.
>
> I think you could make that compatibility argument for Zicclsm and
> Ziccrse as well, but given that the core of the Zicntr/Zihpm argument is
> based on userspace not knowing about priv-spec details such as PMAs I'm
> guessing it'd go that way too.  That said, these are all listed in the
> "features available to user-mode execution environments" section.
>
> > +
> > +  {"zihintpause", ISA_SPEC_CLASS_NONE, 0, 1},
>
> We should probably have a builtin for this, there's a handful of
> userspace cpu_relax()-type calls and having something to select the
> flavor of pause instruction based on the target seems generally useful.

I had originally submitted this in early 2021 (including a builtin),
but we never agreed on details (e.g. whether this should be gated, as
it is a true hint):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562936.html

Let me know what behavior we want and I'll submit a v2.

Philipp.

> > +  {"zihpm", ISA_SPEC_CLASS_NONE, 0, 1},
>
> See above.
>
> > +
> >{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
> >{"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
> >{"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
>
> There's some missing ones, just poking through the profile I can find:
> Za64rs and Zic64b, but there's a lot in there and I'm kind of getting my
> eyes crossed already.
>
> I'd argue that Za64rs should be handled like Ziccif, but we don't have a
> lot of bits left in the header.  I just sent some patches to the ELF
> psABI spec: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/351
>
> > @@ -219,6 +228,7 @@ static const struct riscv_ext_version 
> > riscv_ext_version_table[] =
> >
> >{"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
> >{"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
> > +  {"svpbmt", ISA_SPEC_CLASS_NONE, 0, 1},
> >
> >/* Terminate the list.  */
> >{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
> > @@ -1179,6 +1189,14 @@ static const riscv_ext_flag_table_t 
> > riscv_ext_flag_table[] =
> >
> >{"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
> >{"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
> > +  {"ziccamoa", &gcc_options::x_riscv_zi_subext, MASK_ZICCAMOA},
> > +  {"ziccif", &gcc_options::x_riscv_zi_subext

[PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-09-05 Thread Philipp Tomsich
TARGET_MODE_REP_EXTENDED is supposed to match LOAD_EXTEND_OP, so this
adds an implementation using the same logic as in LOAD_EXTEND_OP.

This reduces the number of extension operations, as evidenced in the
reduction of dynamic instructions for the xz benchmark in SPEC CPU:

# dynamic instructions
                 baseline       new            improvement
xz, workload 1   384681308026   374464538911   2.66%
xz, workload 2   985995327109   974304030498   1.19%
xz, workload 3   545372994523   533717744260   2.14%

The shift-shift-2.c testcase needs to be adjusted, as it will no
longer use slliw/srliw for sub5, but will instead emit slli/srli.

No new regressions running the riscv.exp suite.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_mode_rep_extended): New function.
(TARGET_MODE_REP_EXTENDED): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/shift-shift-2.c: Adjust.

Signed-off-by: Philipp Tomsich 

---

 gcc/config/riscv/riscv.cc  | 15 +++
 gcc/testsuite/gcc.target/riscv/shift-shift-2.c |  2 --
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 675d92c0961..cf829f390ab 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5053,6 +5053,18 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
   return true;
 }
 
+/* Implement TARGET_MODE_REP_EXTENDED.  */
+
+static int
+riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep)
+{
+  /* On 64-bit targets, SImode register values are sign-extended to DImode.  */
+  if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
+    return SIGN_EXTEND;
+
+  return UNKNOWN;
+}
+
 /* Implement TARGET_MODES_TIEABLE_P.
 
Don't allow floating-point modes to be tied, since type punning of
@@ -6153,6 +6165,9 @@ riscv_init_libfuncs (void)
 #undef TARGET_HARD_REGNO_MODE_OK
 #define TARGET_HARD_REGNO_MODE_OK riscv_hard_regno_mode_ok
 
+#undef TARGET_MODE_REP_EXTENDED
+#define TARGET_MODE_REP_EXTENDED riscv_mode_rep_extended
+
 #undef TARGET_MODES_TIEABLE_P
 #define TARGET_MODES_TIEABLE_P riscv_modes_tieable_p
 
diff --git a/gcc/testsuite/gcc.target/riscv/shift-shift-2.c 
b/gcc/testsuite/gcc.target/riscv/shift-shift-2.c
index 5f93be15ac5..2f38b3f0fec 100644
--- a/gcc/testsuite/gcc.target/riscv/shift-shift-2.c
+++ b/gcc/testsuite/gcc.target/riscv/shift-shift-2.c
@@ -38,5 +38,3 @@ sub5 (unsigned int i)
 }
 /* { dg-final { scan-assembler-times "slli" 5 } } */
 /* { dg-final { scan-assembler-times "srli" 5 } } */
-/* { dg-final { scan-assembler-times "slliw" 1 } } */
-/* { dg-final { scan-assembler-times "srliw" 1 } } */
-- 
2.34.1



[PATCH v1] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-04-23 Thread Philipp Tomsich
The Zbb support has introduced ctz and clz to the backend, but some
transformations in GCC need to know what the value of c[lt]z at zero
is. This affects how the optab is generated and may suppress use of
CLZ/CTZ in tree passes.

Among other things, this is needed for the transformation of
table-based ctz-implementations, such as in deepsjeng, to work
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).

Prior to this change, the test case from PR90838 would compile to
on RISC-V targets with Zbb:
  myctz:
lui a4,%hi(.LC0)
ld  a4,%lo(.LC0)(a4)
neg a5,a0
and a5,a5,a0
mul a5,a5,a4
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
srli a5,a5,58
sh2add  a5,a5,a4
lw  a0,0(a5)
ret

After this change, we get:
  myctz:
ctz a0,a0
andi a0,a0,63
ret

Testing this with deepsjeng_r (from SPEC 2017) against QEMU, this
shows a clear reduction in dynamic instruction count:
 - before  1961888067076
 - after   1907928279874 (2.75% reduction)

gcc/ChangeLog:

* config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
  when compiling for riscv64.
* gcc.target/riscv/zbb-ctz.c: New test.

Signed-off-by: Philipp Tomsich 
Signed-off-by: Manolis Tsamis 
Co-developed-by: Manolis Tsamis 

---
 gcc/config/riscv/riscv.h|  5 ++
 gcc/testsuite/gcc.dg/pr90838.c  |  2 +
 gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c | 65 
 gcc/testsuite/gcc.target/riscv/zbb-ctz.c| 66 +
 4 files changed, 138 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-ctz.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 4210e252255..95f72e2fd3f 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1019,4 +1019,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
 
 #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, TO)
 
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/testsuite/gcc.dg/pr90838.c b/gcc/testsuite/gcc.dg/pr90838.c
index 41c5dab9a5c..162bd6f51d0 100644
--- a/gcc/testsuite/gcc.dg/pr90838.c
+++ b/gcc/testsuite/gcc.dg/pr90838.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
+/* { dg-additional-options "-march=rv64gc_zbb" { target riscv64*-*-* } } */
 
 int ctz1 (unsigned x)
 {
@@ -57,3 +58,4 @@ int ctz4 (unsigned long x)
 }
 
 /* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target 
aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target 
riscv64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c 
b/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
new file mode 100644
index 000..b903517197a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zbb -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int ctz1 (unsigned x)
+{
+  static const char table[32] =
+{
+  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
+};
+
+  return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
+}
+
+int ctz2 (unsigned x)
+{
+#define u 0
+  static short table[64] =
+{
+  32, 0, 1,12, 2, 6, u,13, 3, u, 7, u, u, u, u,14,
+  10, 4, u, u, 8, u, u,25, u, u, u, u, u,21,27,15,
+  31,11, 5, u, u, u, u, u, 9, u, u,24, u, u,20,26,
+  30, u, u, u, u,23, u,19,29, u,22,18,28,17,16, u
+};
+
+  x = (x & -x) * 0x0450FBAF;
+  return table[x >> 26];
+}
+
+int ctz3 (unsigned x)
+{
+  static int table[32] =
+{
+  0, 1, 2,24, 3,19, 6,25, 22, 4,20,10,16, 7,12,26,
+  31,23,18, 5,21, 9,15,11,30,17, 8,14,29,13,28,27
+};
+
+  if (x == 0) return 32;
+  x = (x & -x) * 0x04D7651F;
+  return table[x >> 27];
+}
+
+static const unsigned long long magic = 0x03f08c5392f756cdULL;
+
+static const char table[64] = {
+ 0,  1, 12,  2, 13, 22, 17,  3,
+14, 33, 23, 36, 18, 58, 28,  4,
+62, 15, 34, 26, 24, 48, 50, 37,
+19, 55, 59, 52, 29, 44, 39,  5,
+63, 11, 21, 16, 32, 35, 57, 27,
+61, 25, 47, 49, 54, 51, 43, 38,
+10, 20, 31, 56, 60, 46, 53, 42,
+ 9, 30, 45, 41,  8, 40,  7,  6,
+};
+
+int ctz4 (unsigned long x)
+{
+  unsigned long lsb = x & -x;
+  return table[(lsb * magic) >> 58];

Re: [PATCH v1] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-04-28 Thread Philipp Tomsich
Kito,

Did you have a chance to take a look at this one?

I assume this will have to wait until we reopen for 13...
OK for 13?  Also: OK for a backport (once a branch for that exists)?

Philipp.


On Sun, 24 Apr 2022 at 01:44, Philipp Tomsich  wrote:
>
> The Zbb support has introduced ctz and clz to the backend, but some
> transformations in GCC need to know what the value of c[lt]z at zero
> is. This affects how the optab is generated and may suppress use of
> CLZ/CTZ in tree passes.
>
> Among other things, this is needed for the transformation of
> table-based ctz-implementations, such as in deepsjeng, to work
> (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).
>
> Prior to this change, the test case from PR90838 would compile to
> on RISC-V targets with Zbb:
>   myctz:
> lui a4,%hi(.LC0)
> ld  a4,%lo(.LC0)(a4)
> neg a5,a0
> and a5,a5,a0
> mul a5,a5,a4
> lui a4,%hi(.LANCHOR0)
> addi a4,a4,%lo(.LANCHOR0)
> srli a5,a5,58
> sh2add  a5,a5,a4
> lw  a0,0(a5)
> ret
>
> After this change, we get:
>   myctz:
> ctz a0,a0
> andi a0,a0,63
> ret
>
> Testing this with deepsjeng_r (from SPEC 2017) against QEMU, this
> shows a clear reduction in dynamic instruction count:
>  - before  1961888067076
>  - after   1907928279874 (2.75% reduction)
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
> (CTZ_DEFINED_VALUE_AT_ZERO): Same.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
>   when compiling for riscv64.
> * gcc.target/riscv/zbb-ctz.c: New test.
>
> Signed-off-by: Philipp Tomsich 
> Signed-off-by: Manolis Tsamis 
> Co-developed-by: Manolis Tsamis 
>
> ---
>  gcc/config/riscv/riscv.h|  5 ++
>  gcc/testsuite/gcc.dg/pr90838.c  |  2 +
>  gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c | 65 
>  gcc/testsuite/gcc.target/riscv/zbb-ctz.c| 66 +
>  4 files changed, 138 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-ctz.c
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 4210e252255..95f72e2fd3f 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1019,4 +1019,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
> (void);
>
>  #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, TO)
>
> +#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> +  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> +  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> +
>  #endif /* ! GCC_RISCV_H */
> diff --git a/gcc/testsuite/gcc.dg/pr90838.c b/gcc/testsuite/gcc.dg/pr90838.c
> index 41c5dab9a5c..162bd6f51d0 100644
> --- a/gcc/testsuite/gcc.dg/pr90838.c
> +++ b/gcc/testsuite/gcc.dg/pr90838.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
> +/* { dg-additional-options "-march=rv64gc_zbb" { target riscv64*-*-* } } */
>
>  int ctz1 (unsigned x)
>  {
> @@ -57,3 +58,4 @@ int ctz4 (unsigned long x)
>  }
>
>  /* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target 
> aarch64*-*-* } } } */
> +/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target 
> riscv64*-*-* } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c 
> b/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
> new file mode 100644
> index 000..b903517197a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-ctz-32.c
> @@ -0,0 +1,65 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zbb -mabi=ilp32" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
> +
> +int ctz1 (unsigned x)
> +{
> +  static const char table[32] =
> +{
> +  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
> +  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
> +};
> +
> +  return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
> +}
> +
> +int ctz2 (unsigned x)
> +{
> +#define u 0
> +  static short table[64] =
> +{
> +  32, 0, 1,12, 2, 6, u,13, 3, u, 7, u, u, u, u,14,
> +  10, 4, u, u, 8, u, u,25, u, u, u, u, u,21,27,15,
> +  31,11, 5, u, u, u, u, u, 9, u, u,24, u, u,20,26,
> +  30, u, u, u, u,23, u,19,29, u,22,18,28,17,16, u
>

[PATCH v2] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-05-12 Thread Philipp Tomsich
The Zbb support has introduced ctz and clz to the backend, but some
transformations in GCC need to know what the value of c[lt]z at zero
is. This affects how the optab is generated and may suppress use of
CLZ/CTZ in tree passes.

Among other things, this is needed for the transformation of
table-based ctz-implementations, such as in deepsjeng, to work
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).

Prior to this change, the test case from PR90838 would compile to
on RISC-V targets with Zbb:
  myctz:
lui a4,%hi(.LC0)
ld  a4,%lo(.LC0)(a4)
neg a5,a0
and a5,a5,a0
mul a5,a5,a4
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
srli a5,a5,58
sh2add  a5,a5,a4
lw  a0,0(a5)
ret

After this change, we get:
  myctz:
ctz a0,a0
andi a0,a0,63
ret

Testing this with deepsjeng_r (from SPEC 2017) against QEMU, this
shows a clear reduction in dynamic instruction count:
 - before  1961888067076
 - after   1907928279874 (2.75% reduction)

This also merges the various target-specific test-cases (for x86-64,
aarch64 and riscv) within gcc.dg/pr90838.c.

This extends the macros (i.e., effective-target keywords) used in
testing (lib/target-supports.exp) to reliably distinguish between RV32
and RV64 via __riscv_xlen (i.e., the integer register bitwidth) :
testing for ILP32 could be misleading (as ILP32 is a valid memory
model for 64bit systems).

gcc/ChangeLog:

* config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.
* doc/sourcebuild.texi: Add documentation for RISC-V specific
test target keywords.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
  when compiling for riscv64 and subsume gcc.target/aarch64/pr90838.c
  and gcc.target/i386/pr95863-2.c.
* gcc.target/riscv/zbb-ctz.c: New test.
* gcc.target/aarch64/pr90838.c: Removed.
* gcc.target/i386/pr95863-2.c: Removed.
* lib/target-supports.exp: Recognize RV32 or RV64 via XLEN.

Signed-off-by: Philipp Tomsich 
Signed-off-by: Manolis Tsamis 
Co-developed-by: Manolis Tsamis 

---
Changes in v2:
- Address review comments
- Merge the different target-specific testcases for CLZ into one
- Add RV32 tests
- Fix pr90383.c testcase for x86_64

 gcc/config/riscv/riscv.h   |  5 ++
 gcc/doc/sourcebuild.texi   | 12 
 gcc/testsuite/gcc.dg/pr90838.c | 25 +
 gcc/testsuite/gcc.target/aarch64/pr90838.c | 64 --
 gcc/testsuite/gcc.target/i386/pr95863-2.c  | 27 -
 gcc/testsuite/lib/target-supports.exp  | 30 ++
 6 files changed, 72 insertions(+), 91 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/pr90838.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/pr95863-2.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 8a4d2cf7f85..b191606edb4 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1004,4 +1004,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
(void);
 
 #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, TO)
 
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 613ac29967b..71c04841df2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2420,6 +2420,18 @@ PowerPC target pre-defines macro _ARCH_PWR9 which means 
the @code{-mcpu}
 setting is Power9 or later.
 @end table
 
+@subsection RISC-V specific attributes
+
+@table @code
+
+@item rv32
+Test system has an integer register width of 32 bits.
+
+@item rv64
+Test system has an integer register width of 64 bits.
+
+@end table
+
 @subsubsection Other hardware attributes
 
 @c Please keep this table sorted alphabetically.
diff --git a/gcc/testsuite/gcc.dg/pr90838.c b/gcc/testsuite/gcc.dg/pr90838.c
index 41c5dab9a5c..ae8652f3c39 100644
--- a/gcc/testsuite/gcc.dg/pr90838.c
+++ b/gcc/testsuite/gcc.dg/pr90838.c
@@ -1,5 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
+/* { dg-additional-options "-mbmi" { target { { i?86-*-* x86_64-*-* } && { ! { 
ia32 } } } } } */
+/* { dg-additional-options "-march=rv64gc_zbb" { target { lp64 && riscv64*-*-* 
} } } */
+/* { dg-additional-options "-march=rv32gc_zbb" { target { ilp32 && 
riscv64*-*-* } } } */
 
 int ctz1 (unsigned x)
 {
@@ -56,4 +59,26 @@ int ctz4 (unsigned long x)
   return table[(lsb * magic) >> 58];
 }
 
+/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target { { 
i?86-*-* x86_64-*-* } &&

Re: [PATCH v2] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-05-13 Thread Philipp Tomsich
+Jakub Jelinek

Jakub,

I see you have recently worked on lib/target-support.exp: could you do
the review of that part?

Thanks,
Philipp.

On Fri, 13 May 2022 at 00:36, Palmer Dabbelt  wrote:
>
> On Thu, 12 May 2022 11:33:34 PDT (-0700), philipp.toms...@vrull.eu wrote:
> > The Zbb support has introduced ctz and clz to the backend, but some
> > transformations in GCC need to know what the value of c[lt]z at zero
> > is. This affects how the optab is generated and may suppress use of
> > CLZ/CTZ in tree passes.
> >
> > Among other things, this is needed for the transformation of
> > table-based ctz-implementations, such as in deepsjeng, to work
> > (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).
> >
> > Prior to this change, the test case from PR90838 would compile to
> > on RISC-V targets with Zbb:
> >   myctz:
> >   lui a4,%hi(.LC0)
> >   ld  a4,%lo(.LC0)(a4)
> >   neg a5,a0
> >   and a5,a5,a0
> >   mul a5,a5,a4
> >   lui a4,%hi(.LANCHOR0)
> >   addi a4,a4,%lo(.LANCHOR0)
> >   srli a5,a5,58
> >   sh2add  a5,a5,a4
> >   lw  a0,0(a5)
> >   ret
> >
> > After this change, we get:
> >   myctz:
> >   ctz a0,a0
> >   andi a0,a0,63
> >   ret
> >
> > Testing this with deepsjeng_r (from SPEC 2017) against QEMU, this
> > shows a clear reduction in dynamic instruction count:
> >  - before  1961888067076
> >  - after   1907928279874 (2.75% reduction)
> >
> > This also merges the various target-specific test-cases (for x86-64,
> > aarch64 and riscv) within gcc.dg/pr90838.c.
> >
> > This extends the macros (i.e., effective-target keywords) used in
> > testing (lib/target-supports.exp) to reliably distinguish between RV32
> > and RV64 via __riscv_xlen (i.e., the integer register bitwidth) :
> > testing for ILP32 could be misleading (as ILP32 is a valid memory
> > model for 64bit systems).
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
> >   (CTZ_DEFINED_VALUE_AT_ZERO): Same.
> >   * doc/sourcebuild.texi: add documentation for RISC-V specific
> >   test target keywords
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
> > when compiling for riscv64 and subsume gcc.target/aarch64/pr90838.c
> > and gcc.target/i386/pr95863-2.c.
> >   * gcc.target/riscv/zbb-ctz.c: New test.
> >   * gcc.target/aarch64/pr90838.c: Removed.
> >   * gcc.target/i386/pr95863-2.c: Removed.
> >   * lib/target-supports.exp: Recognize RV32 or RV64 via XLEN
> >
> > Signed-off-by: Philipp Tomsich 
> > Signed-off-by: Manolis Tsamis 
> > Co-developed-by: Manolis Tsamis 
> >
> > ---
> > Changes in v2:
> > - Address review comments
> > - Merge the different target-specific testcases for CLZ into one
> > - Add RV32 tests
> > - Fix pr90383.c testcase for x86_64
> >
> >  gcc/config/riscv/riscv.h   |  5 ++
> >  gcc/doc/sourcebuild.texi   | 12 
> >  gcc/testsuite/gcc.dg/pr90838.c | 25 +
> >  gcc/testsuite/gcc.target/aarch64/pr90838.c | 64 --
> >  gcc/testsuite/gcc.target/i386/pr95863-2.c  | 27 -
> >  gcc/testsuite/lib/target-supports.exp  | 30 ++
> >  6 files changed, 72 insertions(+), 91 deletions(-)
> >  delete mode 100644 gcc/testsuite/gcc.target/aarch64/pr90838.c
> >  delete mode 100644 gcc/testsuite/gcc.target/i386/pr95863-2.c
>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 
>
> For the RISC-V bits, though presumably we need a global reviewer to
> handle the non-RISC-V stuff.
>
> Thanks!
>
> >
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 8a4d2cf7f85..b191606edb4 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1004,4 +1004,9 @@ extern void riscv_remove_unneeded_save_restore_calls 
> > (void);
> >
> >  #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, 
> > TO)
> >
> > +#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> > +  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> > +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> > +  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> > +
> >  #endif /* ! GCC_RISCV_H */
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>

Re: Supporting RISC-V Vendor Extensions in the GNU Toolchain

2022-05-13 Thread Philipp Tomsich
On Fri, 13 May 2022 at 12:00, Christoph Müllner  wrote:
>
> On Wed, May 11, 2022 at 2:02 AM Palmer Dabbelt  wrote:
> >
> > [Sorry for cross-posting to a bunch of lists, I figured it'd be best to
> > have all the discussions in one thread.]
> >
> > We currently only support what is defined by official RISC-V
> > specifications in the various GNU toolchain projects.  There's certainly
> > some grey areas there, but in general that means not taking code that
> > relies on drafts or vendor defined extensions, even if that would result
> > in higher performance or more featured systems for users.
> >
> > The original goal of these policies were to steer RISC-V implementers
> > towards a common set of specifications, but over the last year or so
> > it's become abundantly clear that this is causing more harm that good.
> > All extant RISC-V systems rely on behaviors defined outside the official
> > specifications, and while that's technically always been the case we've
> > gotten to the point where trying to ignore that fact is impacting real
> > users on real systems.  There's been consistent feedback from users that
> > we're not meeting their needs, which can clearly be seen in the many out
> > of tree patch sets in common use.
> >
> > There's been a handful of discussions about this, but we've yet to have
> > a proper discussion on the mailing lists.  From the various discussions
> > I've had it seems that folks are broadly in favor of supporting vendor
> > extensions, but the devil's always in the details with this sort of
> > thing so I thought it'd be best to write something up so we can have a
> > concrete discussion.
> >
> > The idea is to start taking code that depends on vendor-defined behavior
> > into the core GNU toolchain ports, as long as it meets the following
> > criteria:
> >
> > * An ISA manual is available that can be redistributed/archived, defines
> >   the behaviors in question as one or more vendor-specific extensions,
> >   and is clearly versioned.  The RISC-V foundation is setting various
> >   guidelines around how vendor-defined extensions and instructions
> >   should be named, we strongly suggest that vendors follow those
> >   conventions whenever possible (this is all new, though, so exactly
> >   what's necessary from vendor specifications will likely evolve as we
> >   learn).
> > * There is a substantial user base that depends on the behavior in
> >   question, which probably means there is hardware in the wild that
> >   implements the extensions and users that require those extensions in
> >   order for that hardware to be useful for common applications.  This is
> >   always going to be a grey area, but it's essentially the same spot
> >   everyone else is in.

I must take exception to the "There is a substantial user base" rule,
as this conflicts with the goal of avoiding fragmentation: the support
for vendor-defined extensions should ideally have landed in an
upstream release before the silicon is widely released.  This would
see these extensions being sent upstream significantly before
widespread sampling (and sometimes around the time of the announcement
of a device on the roadmap).  Simply put: I want everyone defining
vendor extensions to contribute to our mainline development efforts
instead of extending their own ancient forks.

I suspect that this rule is intended to ensure that experimental,
purely academic, or "closed" (as in: even if you have the silicon, it
will be so deeply embedded that no one can run their own software —
e.g. radio baseband controllers) extensions don't make the maintenance
work harder.  If that is the case: could we use wording such as (a
native speaker might wordsmith something more accurate) "accessible to
run user-developed software" and "intended for a wider audience"?

> > * There is a mechanism for testing the code in question without direct
> >   access to hardware, which in practice means a QEMU port (or whatever
> >   simulator is relevant in the space and that folks use for testing) or
> >   some community commitment to long-term availability of the hardware
> >   for testing (something like the GCC compile farm, for example).
> > * It is possible to produce binaries that are compatible with all
> >   upstream vendors' implementations.  That means we'll need mechanisms
> >   to allow extensions from multiple vendors to be linked together and
> >   then probed at runtime.  That's not to say that all binaries will be
> >   compatible, as users are always free to skip the compatibility code
> >   and there will be conflicting definitions of instruction encodings,
> >   but we can at least provide users with the option of compatibility.

We today have:
- Tag_RISCV_arch (see
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#tag_riscv_arch-5-ntbssubarch)
- ifunc support

Admittedly, there's some loose ends in the end-to-end story (e.g.
Unified Discovery -> DTB -> glibc ifunc initialisation): we know just
t

Re: Supporting RISC-V Vendor Extensions in the GNU Toolchain

2022-05-13 Thread Philipp Tomsich
On Fri, 13 May 2022 at 12:58, Florian Weimer  wrote:
>
> * Christoph Müllner via Binutils:
>
> > I'd like to add two points to this topic and raise two questions.
> >
> > 1) Accepting vendor extensions = avoidance of fragmentation
> >
> > RISC-V implementors are actively encouraged to implement their
> > own ISA extensions. To avoid fragmentation in the SW ecosystem
> > (every vendor maintains a fork of tools, distros and binaries) there
> > needs to be a principle acceptance to get vendor extension support
> > upstream.
>
> If you eventually want portable binaries, it's necessary to converge on
> a small set of widely implemented extensions.  x86 didn't have this, and
> adoption was poor outside specialized libraries (and JIT, of course).
> Yet everything was as upstream as possible (ISA manuals, assemblers,
> compiler intrinsics, even automated adoption by optimizers).  So
> upstreaming is only the first step.

Some of the earlier discussion seems to have mixed two different goals:
1. making the vendor-defined features available to the developer and
ensuring that no unintended consequences (e.g., "accidental"
interlinking) happen, so developers can choose to adopt them (e.g.
through dynamic detection) where appropriate;
2. having widespread adoption for features across
libraries/applications that take advantage of all implemented features.

As this is cross-posted to projects that provide the infrastructure,
tools, and plumbing, we should IMO focus on goal #1.
Coming from the RISC-V ISA philosophy, this also makes excellent
sense: after all, RISC-V is (in its purest form) an "ISA construction
kit": one can add extensions or leave extensions off.

For the essential development tools, this flexibility is reflected in
the myriad of combinations that "-march" can have (just consider that
there are 4 distinct Zb[abcs] extensions that add addressing, basic
bit-manipulation, carryless multiplication, and single-bit
operations…).  If individual downstream users see benefits from any of
these (e.g., Zbb for strlen; Zbc for GHASH, …), they will contribute
optimized code-paths under ifunc (or whatever other mechanism a given
library/application uses); however: we first need to have our tools
support these extensions (both standard and vendor-defined) and ensure
that no accidental interlinking happens.

Finally, to enable binary distributions, a basic architecture level
that everyone agrees on (these are being defined at the RISC-V
Foundation under the "Profiles" and "Platforms" umbrellas) provides a
baseline to target that will provide some level of "runs everywhere"
based on such a "small set of widely implemented extensions".

Philipp.


[PATCH v3] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-05-13 Thread Philipp Tomsich
The Zbb support has introduced ctz and clz to the backend, but some
transformations in GCC need to know what the value of c[lt]z at zero
is. This affects how the optab is generated and may suppress use of
CLZ/CTZ in tree passes.

Among other things, this is needed for the transformation of
table-based ctz-implementations, such as in deepsjeng, to work
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838).

Prior to this change, the test case from PR90838 would compile to
the following on RISC-V targets with Zbb:
  myctz:
lui a4,%hi(.LC0)
ld  a4,%lo(.LC0)(a4)
neg a5,a0
and a5,a5,a0
mul a5,a5,a4
lui a4,%hi(.LANCHOR0)
addi    a4,a4,%lo(.LANCHOR0)
srli    a5,a5,58
sh2add  a5,a5,a4
lw  a0,0(a5)
ret

After this change, we get:
  myctz:
ctz a0,a0
andi    a0,a0,63
ret

Testing this with deepsjeng_r (from SPEC 2017) against QEMU
shows a clear reduction in dynamic instruction count:
 - before  1961888067076
 - after   1907928279874 (2.75% reduction)

This also merges the various target-specific test-cases (for x86-64,
aarch64 and riscv) into gcc.dg/pr90838.c.

This extends the macros (i.e., effective-target keywords) used in
testing (lib/target-supports.exp) to reliably distinguish between RV32
and RV64 via __riscv_xlen (i.e., the integer register bitwidth):
testing for ILP32 could be misleading, as ILP32 is a valid data
model for 64-bit systems.

gcc/ChangeLog:

* config/riscv/riscv.h (CLZ_DEFINED_VALUE_AT_ZERO): Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.
* doc/sourcebuild.texi: Add documentation for RISC-V specific
test target keywords.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90838.c: Add additional flags (dg-additional-options)
  when compiling for riscv64 and subsume gcc.target/aarch64/pr90838.c
  and gcc.target/i386/pr95863-2.c.
* gcc.target/riscv/zbb-ctz.c: New test.
* gcc.target/aarch64/pr90838.c: Removed.
* gcc.target/i386/pr95863-2.c: Removed.
* lib/target-supports.exp: Recognize RV32 or RV64 via XLEN

Signed-off-by: Philipp Tomsich 
Signed-off-by: Manolis Tsamis 
Co-developed-by: Manolis Tsamis 

---
Changes in v3:
- Address nit from Kito (use rv64 and rv32 on gcc.dg/pr90838.c
  consistently).

Changes in v2:
- Address review comments from Palmer (merging testcases)
- Merge the different target-specific testcases for CLZ into one
- Add RV32 tests
- Fix pr90838.c testcase for x86_64

 gcc/config/riscv/riscv.h   |  5 ++
 gcc/doc/sourcebuild.texi   | 12 
 gcc/testsuite/gcc.dg/pr90838.c | 25 +
 gcc/testsuite/gcc.target/aarch64/pr90838.c | 64 --
 gcc/testsuite/gcc.target/i386/pr95863-2.c  | 27 -
 gcc/testsuite/lib/target-supports.exp  | 30 ++
 6 files changed, 72 insertions(+), 91 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/pr90838.c
 delete mode 100644 gcc/testsuite/gcc.target/i386/pr95863-2.c

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 8a4d2cf7f85..b191606edb4 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1004,4 +1004,9 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
 
 #define HARD_REGNO_RENAME_OK(FROM, TO) riscv_hard_regno_rename_ok (FROM, TO)
 
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 613ac29967b..71c04841df2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2420,6 +2420,18 @@ PowerPC target pre-defines macro _ARCH_PWR9 which means the @code{-mcpu}
 setting is Power9 or later.
 @end table
 
+@subsection RISC-V specific attributes
+
+@table @code
+
+@item rv32
+Test system has an integer register width of 32 bits.
+
+@item rv64
+Test system has an integer register width of 64 bits.
+
+@end table
+
 @subsubsection Other hardware attributes
 
 @c Please keep this table sorted alphabetically.
diff --git a/gcc/testsuite/gcc.dg/pr90838.c b/gcc/testsuite/gcc.dg/pr90838.c
index 41c5dab9a5c..7502b846346 100644
--- a/gcc/testsuite/gcc.dg/pr90838.c
+++ b/gcc/testsuite/gcc.dg/pr90838.c
@@ -1,5 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-forwprop2-details" } */
+/* { dg-additional-options "-mbmi" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 } } } } } */
+/* { dg-additional-options "-march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-additional-options "-march=rv32gc_zbb" { target { rv32 } } } */
 
 int ctz1 (unsigned x)
 {
@@ -56,4 +59,26 @@ int ctz4 (unsigned long x)
   return table[(lsb * magic) >> 58];
 }
 
+/* { dg-final { scan-tree-dump-times

Re: [PATCH v3] RISC-V: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2022-05-13 Thread Philipp Tomsich
Added the two nits from Kito's review and … Applied to trunk!


On Fri, 13 May 2022 at 22:16, Philipp Tomsich  wrote:
>
> [snip]

Re: Supporting RISC-V Vendor Extensions in the GNU Toolchain

2022-05-16 Thread Philipp Tomsich
A generous [snip], as this has been getting a bit long.

On Sun, 15 May 2022 at 03:21, Palmer Dabbelt  wrote:

> I am worried about bad
> actors leveraging any policy to make a bunch of noise, as that's a
> pretty persistent problem in RISC-V land and it looks like things are
> going to get worse before they get better.
>

I don't follow. Maybe you can walk me through the "bad actors" comment next
time we talk…


> > We today have:
> > - Tag_RISCV_arch (see
> >
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#tag_riscv_arch-5-ntbssubarch
> )
> > - ifunc support
> >
> > Admittedly, there's some loose ends in the end-to-end story (e.g.
> > Unified Discovery -> DTB -> glibc ifunc initialisation): we know just
> > too well how this plays out as there are optimised string/memory
> > functions (Zbb, Zicboz, cache-block-length, …) in our pipeline as well
> > as OpenSSL support for Zbb and Zbc.  However, this is a known gap and
> > will be fully addressed in the future.
> >
> > Is there something specific beyond this that you'd be looking for?
>
> I might be forgetting something, but at least:
>
> * Tag_RISCV_arch attributes are really fundamentally based around
>   compatible extension sets and just don't work when faced with the
>   realities of what RISC-V is today -- that's true even for standard
>   extensions, but it's going to be way worse with vendor extensions.
> * Some scheme that allows relocations from multiple vendors to be linked
>   together.  There's been some proposals here, but nothing that the
>   psABI folks seem to like (and also might not play well with dynamic
>   relocations).
>

I would recommend deferring solving the vendor-defined relocations to a
later time.
All vendor-defined extension proposals already on the table for upstream
inclusion (X-Ventana-CondOps, X-THead-CMO) don't require custom
relocation.  I don't expect anything requiring these shortly — and whoever
submits it will have to provide a proposal for vendor-defined relocations
that finds some consensus.


> * There's a lot between device tree and ifunc (not to mention ACPI).
>   Kito had a proposal for how to get this up to userspace, there's an
>   earlier version from Plumbers last year but there's a lot of work that
>   needs to be done to turn that into reality.
>

Agreed. Our team is looking into this already as Zbb and Zicboz are useful
in GLIBC.


> * Some use cases won't be met by ifunc, there's a whole lot of
>   techniques available and we at least want to allow those to function.
>   In the long run binary compatibility is going to be a losing battle,
>   but we can at least try to keep things sane so the folks in charge at
>   the foundation have a chance to understand what a hole we're in with
>   enough time left to fix it.
>
> I know it's a lot more work to give users the option of compatibility,
> but once that's gone it'll never come back so I'm willing to at least
> try -- though of course that'll put a burden on everyone, even those
> outside the RISC-V ports, so everyone needs to be on board.
>

I have been discussing "fat binaries" on and off in the context of
reconciling the vector fragmentation.
This is a follow-on topic to getting things enabled and ensuring that no
accidental interworking occurs — once the basic support is mature enough, I
hope there will be takers for fat-binary support.

I hope this further clarifies my thinking: I would like to roll support for
vendor-defined extensions out in an incremental manner: starting with
rolling up some extensions into the development tools (assembler, linker,
and compiler); and only then improving runtime detection and library
usage.  For vendor-defined relocations, I would build consensus once we
first encounter the need for them.

Philipp.


Re: [PATCH] [PR/target 105666] RISC-V: Inhibit FP <--> int register moves via tune param

2022-05-23 Thread Philipp Tomsich
Good catch!

On Mon, 23 May 2022 at 20:12, Vineet Gupta  wrote:

> Under extreme register pressure, compiler can use FP <--> int
> moves as a cheap alternate to spilling to memory.
> This was seen with SPEC2017 FP benchmark 507.cactu:
> ML_BSSN_Advect.cc:ML_BSSN_Advect_Body()
>
> |   fmv.d.x fa5,s9  # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
> | .LVL325:
> |   ld  s9,184(sp)  # _12469, %sfp
> | ...
> | .LVL339:
> |   fmv.x.d s4,fa5  # PDupwindNthSymm2Xt1, PDupwindNthSymm2Xt1
> |
>
> The FMV instructions could be costlier (than stack spill) on certain
> micro-architectures, thus this needs to be a per-cpu tunable
> (default being to inhibit on all existing RV cpus).
>
> Testsuite run with new test reports 10 failures without the fix
> corresponding to the build variations of pr105666.c
>
> |   === gcc Summary ===
> |
> | # of expected passes  123318   (+10)
> | # of unexpected failures  34   (-10)
> | # of unexpected successes 4
> | # of expected failures780
> | # of unresolved testcases 4
> | # of unsupported tests2796
>
> gcc/Changelog:
>
> * config/riscv/riscv.cc: (struct riscv_tune_param): Add
>   fmv_cost.
> (rocket_tune_info): Add default fmv_cost 8.
> (sifive_7_tune_info): Ditto.
> (thead_c906_tune_info): Ditto.
> (optimize_size_tune_info): Ditto.
> (riscv_register_move_cost): Use fmv_cost for int<->fp moves.
>
> gcc/testsuite/Changelog:
>
> * gcc.target/riscv/pr105666.c: New test.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/riscv.cc |  9 
>  gcc/testsuite/gcc.target/riscv/pr105666.c | 55 +++
>  2 files changed, 64 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr105666.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index ee756aab6940..f3ac0d8865f0 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -220,6 +220,7 @@ struct riscv_tune_param
>unsigned short issue_rate;
>unsigned short branch_cost;
>unsigned short memory_cost;
> +  unsigned short fmv_cost;
>bool slow_unaligned_access;
>  };
>
> @@ -285,6 +286,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>1,   /* issue_rate */
>3,   /* branch_cost */
>5,   /* memory_cost */
> +  8,   /* fmv_cost */
>   true,					/* slow_unaligned_access */
>  };
>
> @@ -298,6 +300,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
>2,   /* issue_rate */
>4,   /* branch_cost */
>3,   /* memory_cost */
> +  8,   /* fmv_cost */
>   true,					/* slow_unaligned_access */
>  };
>
> @@ -311,6 +314,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
>1,/* issue_rate */
>3,/* branch_cost */
>5,/* memory_cost */
> +  8,   /* fmv_cost */
>   false,				/* slow_unaligned_access */
>  };
>
> @@ -324,6 +328,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
>1,   /* issue_rate */
>1,   /* branch_cost */
>2,   /* memory_cost */
> +  8,   /* fmv_cost */
>false,   /* slow_unaligned_access */
>  };
>
> @@ -4737,6 +4742,10 @@ static int
>  riscv_register_move_cost (machine_mode mode,
>   reg_class_t from, reg_class_t to)
>  {
> +  if ((from == FP_REGS && to == GR_REGS) ||
> +  (from == GR_REGS && to == FP_REGS))
> +return tune_param->fmv_cost;
> +
>return riscv_secondary_memory_needed (mode, from, to) ? 8 : 2;
>  }
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr105666.c b/gcc/testsuite/gcc.target/riscv/pr105666.c
> new file mode 100644
> index ..904f3bc0763f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr105666.c
> @@ -0,0 +1,55 @@
> +/* Shamelessly plugged off gcc/testsuite/gcc.c-torture/execute/pr28982a.c.
> +
> +   The idea is to induce high register pressure for both int/fp registers
> +   so that they spill. By default FMV instructions would be used to stash
> +   int reg to a fp reg (and vice-versa) but that could be costlier than
> +   spilling to stack.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64g -ffast-math" } */
> +
> +#define NITER 4
> +#define NVARS 20
> +#define MULTI(X) \
> +  X( 0), X( 1), X( 2), X( 3), X( 4), X( 5), X( 6), X( 7), 

[PATCH v1 0/3] RISC-V: Improve sequences with shifted zero-extended operands

2022-05-24 Thread Philipp Tomsich


Code-generation currently misses some opportunities for optimized
sequences when zero-extension is combined with shifts.


Philipp Tomsich (3):
  RISC-V: add consecutive_bits_operand predicate
  RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w
  RISC-V: Replace zero_extendsidi2_shifted with generalized split

 gcc/config/riscv/bitmanip.md   | 44 ++
 gcc/config/riscv/predicates.md | 11 ++
 gcc/config/riscv/riscv.md  | 37 +-
 gcc/testsuite/gcc.target/riscv/zba-shadd.c | 13 +++
 4 files changed, 88 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shadd.c

-- 
2.34.1



[PATCH v1 1/3] RISC-V: add consecutive_bits_operand predicate

2022-05-24 Thread Philipp Tomsich
Provide an easy way to constrain operands to constants that are a
single, consecutive run of ones.

gcc/ChangeLog:

* config/riscv/predicates.md (consecutive_bits_operand):
  Implement new predicate.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/predicates.md | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index c37caa2502b..90db5dfcdd5 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -243,3 +243,14 @@ (define_predicate "const63_operand"
 (define_predicate "imm5_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) < 5")))
+
+;; A CONST_INT operand that consists of a single run of consecutive set bits.
+(define_predicate "consecutive_bits_operand"
+  (match_code "const_int")
+{
+   unsigned HOST_WIDE_INT val = UINTVAL (op);
+   if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0)
+   return false;
+
+   return true;
+})
-- 
2.34.1



[PATCH v1 2/3] RISC-V: Split slli+sh[123]add.uw opportunities to avoid zext.w

2022-05-24 Thread Philipp Tomsich
When encountering a prescaled (biased) value as a candidate for
sh[123]add.uw, the combine pass will present this as shifted by the
aggregate amount (prescale + shift-amount) with an appropriately
adjusted mask constant that has fewer than 32 bits set.

E.g., here's the failing expression seen in combine for a prescale of
1 and a shift of 2 (note how 0x3fffffff8 >> 3 is 0x7fffffff).
  Trying 7, 8 -> 10:
  7: r78:SI=r81:DI#0<<0x1
REG_DEAD r81:DI
  8: r79:DI=zero_extend(r78:SI)
REG_DEAD r78:SI
 10: r80:DI=r79:DI<<0x2+r82:DI
REG_DEAD r79:DI
REG_DEAD r82:DI
  Failed to match this instruction:
  (set (reg:DI 80 [ cD.1491 ])
  (plus:DI (and:DI (ashift:DI (reg:DI 81)
   (const_int 3 [0x3]))
   (const_int 17179869176 [0x3fffffff8]))
  (reg:DI 82)))

To address this, we introduce a splitter handling these cases.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add split to handle opportunities
  for slli + sh[123]add.uw

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shadd.c: New test.

Signed-off-by: Philipp Tomsich 
Co-developed-by: Manolis Tsamis 

---

 gcc/config/riscv/bitmanip.md   | 44 ++
 gcc/testsuite/gcc.target/riscv/zba-shadd.c | 13 +++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shadd.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 0ab9ffe3c0b..6c1ccc6f8c5 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -79,6 +79,50 @@ (define_insn "*shNadduw"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "DI")])
 
+;; During combine, we may encounter an attempt to combine
+;;   slli rtmp, rs, #imm
+;;   zext.w rtmp, rtmp
+;;   sh[123]add rd, rtmp, rs2
+;; which will lead to the immediate not satisfying the above constraints.
+;; By splitting the compound expression, we can simplify to a slli and a
+;; sh[123]add.uw.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (plus:DI (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
+   (match_operand:QI 2 "immediate_operand"))
+(match_operand:DI 3 "consecutive_bits_operand"))
+(match_operand:DI 4 "register_operand")))
+   (clobber (match_operand:DI 5 "register_operand"))]
+  "TARGET_64BIT && TARGET_ZBA"
+  [(set (match_dup 5) (ashift:DI (match_dup 1) (match_dup 6)))
+   (set (match_dup 0) (plus:DI (and:DI (ashift:DI (match_dup 5)
+ (match_dup 7))
+  (match_dup 8))
+  (match_dup 4)))]
+{
+   unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
+   /* scale: shift within the sh[123]add.uw */
+   int scale = 32 - clz_hwi (mask);
+   /* bias:  pre-scale amount (i.e. the prior shift amount) */
+   int bias = ctz_hwi (mask) - scale;
+
+   /* If the bias + scale don't add up to operand[2], reject. */
+   if ((scale + bias) != UINTVAL (operands[2]))
+  FAIL;
+
+   /* If the shift-amount is out-of-range for sh[123]add.uw, reject. */
+   if ((scale < 1) || (scale > 3))
+  FAIL;
+
+   /* If there's no bias, the '*shNadduw' pattern should have matched. */
+   if (bias == 0)
+  FAIL;
+
+   operands[6] = GEN_INT (bias);
+   operands[7] = GEN_INT (scale);
+   operands[8] = GEN_INT (0xffffffffULL << scale);
+})
+
 (define_insn "*add.uw"
   [(set (match_operand:DI 0 "register_operand" "=r")
(plus:DI (zero_extend:DI
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shadd.c 
b/gcc/testsuite/gcc.target/riscv/zba-shadd.c
new file mode 100644
index 000..33da2530f3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba-shadd.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zba -mabi=lp64" } */
+
+unsigned long foo(unsigned int a, unsigned long b)
+{
+a = a << 1;
+unsigned long c = (unsigned long) a;
+unsigned long d = b + (c<<2);
+return d;
+}
+
+/* { dg-final { scan-assembler "sh2add.uw" } } */
+/* { dg-final { scan-assembler-not "zext" } } */
\ No newline at end of file
-- 
2.34.1



[PATCH v1 3/3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2022-05-24 Thread Philipp Tomsich
The current method of treating shifts of extended values on RISC-V
frequently causes sequences of 3 shifts, despite the presence of the
'zero_extendsidi2_shifted' pattern.

Consider:
unsigned long f(unsigned int a, unsigned long b)
{
a = a << 1;
unsigned long c = (unsigned long) a;
c = b + (c<<4);
return c;
}
which will present at combine-time as:
Trying 7, 8 -> 9:
7: r78:SI=r81:DI#0<<0x1
  REG_DEAD r81:DI
8: r79:DI=zero_extend(r78:SI)
  REG_DEAD r78:SI
9: r72:DI=r79:DI<<0x4
  REG_DEAD r79:DI
Failed to match this instruction:
(set (reg:DI 72 [ _1 ])
(and:DI (ashift:DI (reg:DI 81)
(const_int 5 [0x5]))
(const_int 68719476704 [0xfffe0])))
and produce the following (optimized) assembly:
f:
slliw   a5,a0,1
slli    a5,a5,32
srli    a5,a5,28
add a0,a5,a1
ret

The current way of handling this (in 'zero_extendsidi2_shifted')
doesn't apply for two reasons:
- this is seen before reload, and
- (more importantly) the constant mask is not 0xffffffffull.

To address this, we introduce a generalized version of shifting
zero-extended values that supports any mask of consecutive ones as
long as the number of trailing zeros is the inner shift-amount.

With this new split, we generate the following assembly for the
aforementioned function:
f:
slli    a0,a0,33
srli    a0,a0,28
add a0,a0,a1
ret

gcc/ChangeLog:

* config/riscv/riscv.md (zero_extendsidi2_shifted): Replace
  with a generalized split that requires no clobber, runs
  before reload and works for smaller masks.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.md | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b8ab0cf169a..cc10cd90a74 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2119,23 +2119,26 @@ (define_split
 ;; occur when unsigned int is used for array indexing.  Split this into two
 ;; shifts.  Otherwise we can get 3 shifts.
 
-(define_insn_and_split "zero_extendsidi2_shifted"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
-  (match_operand:QI 2 "immediate_operand" "I"))
-   (match_operand 3 "immediate_operand" "")))
-   (clobber (match_scratch:DI 4 "=&r"))]
-  "TARGET_64BIT && !TARGET_ZBA
-   && ((INTVAL (operands[3]) >> INTVAL (operands[2])) == 0xffffffff)"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 4)
-   (ashift:DI (match_dup 1) (const_int 32)))
-   (set (match_dup 0)
-   (lshiftrt:DI (match_dup 4) (match_dup 5)))]
-  "operands[5] = GEN_INT (32 - (INTVAL (operands [2])));"
-  [(set_attr "type" "shift")
-   (set_attr "mode" "DI")])
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
+  (match_operand:QI 2 "immediate_operand"))
+   (match_operand:DI 3 "consecutive_bits_operand")))]
+  "TARGET_64BIT"
+  [(set (match_dup 0) (ashift:DI (match_dup 1) (match_dup 4)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (match_dup 5)))]
+{
+   unsigned HOST_WIDE_INT mask = UINTVAL (operands[3]);
+   int leading = clz_hwi (mask);
+   int trailing = ctz_hwi (mask);
+
+   /* The shift-amount must match the number of trailing zero bits.  */
+   if (trailing != UINTVAL (operands[2]))
+  FAIL;
+
+   operands[4] = GEN_INT (leading + trailing);
+   operands[5] = GEN_INT (leading);
+})
 
 ;;
 ;;  
-- 
2.34.1



[PATCH v1 1/3] RISC-V: Split "(a & (1 << BIT_NO)) ? 0 : -1" to bexti + addi

2022-05-24 Thread Philipp Tomsich
Consider creating a polarity-reversed mask from a set-bit (i.e., if
the bit is set, produce all-ones; otherwise: all-zeros).  Using Zbs,
this can be expressed as bexti, followed by an addi of minus-one.  To
enable the combiner to discover this opportunity, we need to split the
canonical expression for "(a & (1 << BIT_NO)) ? 0 : -1" into a form
combinable into bexti.

Consider the function:
long f(long a)
{
  return (a & (1 << BIT_NO)) ? 0 : -1;
}
This produces the following sequence prior to this change:
andi    a0,a0,16
seqz    a0,a0
neg a0,a0
ret
Following this change, it results in:
bexti   a0,a0,4
addi    a0,a0,-1
    ret

Signed-off-by: Philipp Tomsich 

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add a splitter to generate
  polarity-reversed masks from a set bit using bexti + addi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti.c: New test.

---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 ++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 0ab9ffe3c0b..ea5dea13cfb 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -340,3 +340,16 @@ (define_insn "*bexti"
   "TARGET_ZBS"
   "bexti\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
+
+;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 })
+;; using a bext(i) followed by an addi instruction.
+;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1".
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+	(neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 "register_operand")
+					   (const_int 1)
+					   (match_operand 2))
+			 (const_int 0))))]
+  "TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 2)))
+   (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
new file mode 100644
index 000..99e3b58309c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+
+/* bexti */
+#define BIT_NO  4
+
+long
+foo0 (long a)
+{
+  return (a & (1 << BIT_NO)) ? 0 : -1;
+}
+
+/* { dg-final { scan-assembler "bexti" } } */
+/* { dg-final { scan-assembler "addi" } } */
-- 
2.34.1



[PATCH v1 2/3] RISC-V: Split "(a & (1UL << bitno)) ? 0 : -1" to bext + addi

2022-05-24 Thread Philipp Tomsich
For a straightforward application of bext for the following function
long bext64(long a, char bitno)
{
  return (a & (1UL << bitno)) ? 0 : -1;
}
we generate
srl     a0,a0,a1        # 7     [c=4 l=4]  lshrdi3
andi    a0,a0,1         # 8     [c=4 l=4]  anddi3/1
addi    a0,a0,-1        # 14    [c=4 l=4]  adddi3/1
due to the following failed match at combine time:
(set (reg:DI 82)
 (zero_extract:DI (reg:DI 83)
  (const_int 1 [0x1])
  (reg:DI 84)))

The existing pattern for bext requires the 3rd argument to
zero_extract to be a QImode register wrapped in a zero_extension.
This adds an additional pattern that allows an Xmode argument.

With this change, the testcase compiles to
bext    a0,a0,a1        # 8     [c=4 l=4]  *bextdi
addi    a0,a0,-1        # 14    [c=4 l=4]  adddi3/1

gcc/ChangeLog:

* config/riscv/bitmanip.md (*bext): Add an additional
pattern that allows the 3rd argument to zero_extract to be
an Xmode register operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bext.c: Add testcases.
* gcc.target/riscv/zbs-bexti.c: Add testcases.

Signed-off-by: Philipp Tomsich 
Co-developed-by: Manolis Tsamis 

---

 gcc/config/riscv/bitmanip.md   | 12 +++
 gcc/testsuite/gcc.target/riscv/zbs-bext.c  | 23 +++---
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 23 --
 3 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index ea5dea13cfb..5d7c20e9fdc 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -332,6 +332,18 @@ (define_insn "*bext"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
+;; usually has the `bitno` typed as X-mode (i.e. no further
+;; zero-extension is performed around the bitno).
+(define_insn "*bext"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (zero_extract:X (match_operand:X 1 "register_operand" "r")
+   (const_int 1)
+   (match_operand:X 2 "register_operand" "r")))]
+  "TARGET_ZBS"
+  "bext\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 (define_insn "*bexti"
   [(set (match_operand:X 0 "register_operand" "=r")
(zero_extract:X (match_operand:X 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext.c b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
index 47982396119..8de9c5a167c 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bext.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
 
 /* bext */
 long
@@ -16,6 +16,23 @@ foo1 (long i)
   return 1L & (i >> 20);
 }
 
+long bext64_1(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? 1 : 0;
+}
+
+long bext64_2(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? 0 : -1;
+}
+
+long bext64_3(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? -1 : 0;
+}
+
 /* { dg-final { scan-assembler-times "bexti\t" 1 } } */
-/* { dg-final { scan-assembler-times "bext\t" 1 } } */
-/* { dg-final { scan-assembler-not "andi" } } */
+/* { dg-final { scan-assembler-times "bext\t" 4 } } */
+/* { dg-final { scan-assembler-times "addi\t" 1 } } */
+/* { dg-final { scan-assembler-times "neg\t" 1 } } */
+/* { dg-final { scan-assembler-not "andi" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
index 99e3b58309c..8182a61707d 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -1,14 +1,25 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
 
 /* bexti */
 #define BIT_NO  4
 
-long
-foo0 (long a)
+long bexti64_1(long a, char bitno)
 {
-  return (a & (1 << BIT_NO)) ? 0 : -1;
+  return (a & (1UL << BIT_NO)) ? 1 : 0;
 }
 
-/* { dg-final { scan-assembler "bexti" } } */
-/* { dg-final { scan-assembler "addi" } } */
+long bexti64_2(long a, char bitno)
+{
+  return (a & (1UL << BIT_NO)) ? 0 : -1;
+}
+
+long bexti64_3(long a, char bitno)
+{
+  return (a & (1UL << BIT_NO)) ? -1 : 0;
+}
+
+/* { dg-final { scan-assembler-times "bexti\t" 3 } } */
+/* { dg-final { scan-assembler-times "addi\t" 1 } } */
+/* { dg-final { scan-assembler-times "neg\t" 1 } } */
\ No newline at end of file
-- 
2.34.1



[PATCH v1 3/3] RISC-V: Split "(a & (1UL << bitno)) ? 0 : 1" to bext + xori

2022-05-24 Thread Philipp Tomsich
We avoid reassociating "(~(a >> BIT_NO)) & 1" into "((~a) >> BIT_NO) & 1"
by splitting it into a zero-extraction (bext) and an xori.  This both
avoids burning a register on a temporary and generates a sequence that
clearly captures 'extract bit, then invert bit'.

This change improves the previously generated
srl   a0,a0,a1
not   a0,a0
andi  a0,a0,1
into
    bext  a0,a0,a1
xori  a0,a0,1
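The identity this sequence relies on can be checked with a small C sketch (function names are illustrative, not from the patch): extracting the bit and then flipping bit 0 gives the same 0/1 selection as the original ternary.

```c
/* Reference semantics of the source expression.  */
long sel_0_or_1_ref (long a, unsigned bitno)
{
  return (a & (1UL << bitno)) ? 0 : 1;
}

/* The two-instruction sequence the split produces.  */
long sel_0_or_1_split (long a, unsigned bitno)
{
  long bit = (a >> bitno) & 1;   /* bext a0,a0,a1 */
  return bit ^ 1;                /* xori a0,a0,1  */
}
```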

Signed-off-by: Philipp Tomsich 

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add split covering
"(a & (1 << BIT_NO)) ? 0 : 1".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bext.c: Add testcases.
* gcc.target/riscv/zbs-bexti.c: Add testcases.

---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbs-bext.c  | 10 --
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 10 --
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 5d7c20e9fdc..c4b61880e0c 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -365,3 +365,16 @@ (define_split
   "TARGET_ZBS"
  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 2)))
(set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
+
+;; Split for "(a & (1 << BIT_NO)) ? 0 : 1":
+;; We avoid reassociating "(~(a >> BIT_NO)) & 1" into "((~a) >> BIT_NO) & 1",
+;; so we don't have to use a temporary.  Instead we extract the bit and then
+;; invert bit 0 ("a ^ 1") only.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+(and:X (not:X (lshiftrt:X (match_operand:X 1 "register_operand")
+  (subreg:QI (match_operand:X 2 "register_operand") 0)))
+   (const_int 1)))]
+  "TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 2)))
+   (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext.c b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
index 8de9c5a167c..a8aadb60390 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bext.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
@@ -23,16 +23,22 @@ long bext64_1(long a, char bitno)
 
 long bext64_2(long a, char bitno)
 {
-  return (a & (1UL << bitno)) ? 0 : -1;
+  return (a & (1UL << bitno)) ? 0 : 1;
 }
 
 long bext64_3(long a, char bitno)
+{
+  return (a & (1UL << bitno)) ? 0 : -1;
+}
+
+long bext64_4(long a, char bitno)
 {
   return (a & (1UL << bitno)) ? -1 : 0;
 }
 
 /* { dg-final { scan-assembler-times "bexti\t" 1 } } */
-/* { dg-final { scan-assembler-times "bext\t" 4 } } */
+/* { dg-final { scan-assembler-times "bext\t" 5 } } */
+/* { dg-final { scan-assembler-times "xori\t|snez\t" 1 } } */
 /* { dg-final { scan-assembler-times "addi\t" 1 } } */
 /* { dg-final { scan-assembler-times "neg\t" 1 } } */
 /* { dg-final { scan-assembler-not "andi" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
index 8182a61707d..aa13487b357 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -12,14 +12,20 @@ long bexti64_1(long a, char bitno)
 
 long bexti64_2(long a, char bitno)
 {
-  return (a & (1UL << BIT_NO)) ? 0 : -1;
+  return (a & (1UL << BIT_NO)) ? 0 : 1;
 }
 
 long bexti64_3(long a, char bitno)
+{
+  return (a & (1UL << BIT_NO)) ? 0 : -1;
+}
+
+long bexti64_4(long a, char bitno)
 {
   return (a & (1UL << BIT_NO)) ? -1 : 0;
 }
 
-/* { dg-final { scan-assembler-times "bexti\t" 3 } } */
+/* { dg-final { scan-assembler-times "bexti\t" 4 } } */
+/* { dg-final { scan-assembler-times "xori\t|snez\t" 1 } } */
 /* { dg-final { scan-assembler-times "addi\t" 1 } } */
 /* { dg-final { scan-assembler-times "neg\t" 1 } } */
\ No newline at end of file
-- 
2.34.1



[PATCH v1] RISC-V: bitmanip: improve constant-loading for (1ULL << 31) in DImode

2022-05-24 Thread Philipp Tomsich
The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
bits above 31 only (to side-step any issues with the negative SImode
value 0x80000000).  This moves the special handling of this SImode
value (i.e. the check for -2147483648) to riscv.cc and relaxes the
SINGLE_BIT_MASK_OPERAND() test.

This changes the code-generation for loading (1ULL << 31) from:
li  a0,1
sllia0,a0,31
to:
bseti   a0,zero,31
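The sign-extension invariant this patch works around can be illustrated in C (the helper name is mine): a 32-bit value held in a 64-bit RV64 register is kept sign-extended, so the in-register image of SImode 0x80000000 is the negative value -2147483648.

```c
#include <stdint.h>

/* Model of how RV64 keeps SImode values in 64-bit registers:
   always sign-extended from bit 31.  */
int64_t simode_in_reg (uint32_t v)
{
  return (int64_t) (int32_t) v;
}
```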

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Rewrite value as
-2147483648 for the single-bit case, when operating on 0x80000000
in SImode.
* config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for
any single-bit value, moving the special case for 0x80000000 to
riscv_build_integer_1 (in riscv.cc).

Signed-off-by: Philipp Tomsich 

---

 gcc/config/riscv/riscv.cc |  9 +
 gcc/config/riscv/riscv.h  | 11 ---
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f83dc796d88..fe8196f5c80 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -420,6 +420,15 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
   /* Simply BSETI.  */
   codes[0].code = UNKNOWN;
   codes[0].value = value;
+
+  /* RISC-V sign-extends all 32bit values that live in a 32bit
+     register.  To avoid paradoxes, we thus need to use the
+     sign-extended (negative) representation for the value, if we
+     want to build 0x80000000 in SImode.  This will then expand
+     to an ADDI/LI instruction.  */
+  if (mode == SImode && value == 0x80000000)
+   codes[0].value = -2147483648;
+
   return 1;
 }
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 5083a1c24b0..6f7f4d3fbdc 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -528,13 +528,10 @@ enum reg_class
   (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH)  \
|| ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
 
-/* If this is a single bit mask, then we can load it with bseti.  But this
-   is not useful for any of the low 31 bits because we can use addi or lui
-   to load them.  It is wrong for loading SImode 0x80000000 on rv64 because it
-   needs to be sign-extended.  So we restrict this to the upper 32-bits
-   only.  */
-#define SINGLE_BIT_MASK_OPERAND(VALUE) \
-  (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32))
+/* If this is a single bit mask, then we can load it with bseti.  Special
+   handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */
+#define SINGLE_BIT_MASK_OPERAND(VALUE) \
+  (pow2p_hwi (VALUE))
 
 /* Stack layout; function entry, exit and calling.  */
 
-- 
2.34.1



[PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich
From: Philipp Tomsich 

The function
long f(long a)
{
return (a & 0xffffffffull) << 3;
}
is folded into
_1 = a_2(D) << 3;
_3 = _1 & 34359738360;
whereas the construction
return (a & 0xffffffffull) * 8;
results in
_1 = a_2(D) & 4294967295;
_3 = _1 * 8;

This leads to suboptimal code-generation for RISC-V (march=rv64g), as
the shifted constant needs to be expanded into 3 RTX and 2 RTX (one
each for the LSHIFT_EXPR and the BIT_AND_EXPR) which will overwhelm
the combine pass (a sequence of 5 RTX are not considered):
li	a5,1	# tmp78,	# 23	[c=4 l=4]  *movdi_64bit/1
slli	a5,a5,35	#, tmp79, tmp78	# 24	[c=4 l=4]  ashldi3
addi	a5,a5,-8	#, tmp77, tmp79	# 9	[c=4 l=4]  adddi3/1
slli	a0,a0,3	#, tmp76, tmp80	# 6	[c=4 l=4]  ashldi3
and	a0,a0,a5	# tmp77,, tmp76	# 15	[c=4 l=4]  anddi3/0
ret	# 28	[c=0 l=4]  simple_return
instead of:
slli	a0,a0,32	#, tmp76, tmp79	# 26	[c=4 l=4]  ashldi3
srli	a0,a0,29	#,, tmp76	# 27	[c=4 l=4]  lshrdi3
ret # 24[c=0 l=4]  simple_return

We address this by adding a simplification for
   (a << s) & M, where ((M >> s) << s) == M
to
   (a & M_unshifted) << s, where M_unshifted := (M >> s)
which undistributes the LSHIFT.
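The equivalence the undistribution relies on can be checked with a small C sketch (function names are illustrative) for the concrete case s = 3, M = 0xffffffff << 3:

```c
#include <stdint.h>

/* Shift distributed over the mask, as the existing canonicalization
   produces it.  */
uint64_t masked_shift_distributed (uint64_t a)
{
  return (a << 3) & (0xffffffffULL << 3);
}

/* Undistributed form, as produced by the proposed rule.  */
uint64_t masked_shift_undistributed (uint64_t a)
{
  return (a & 0xffffffffULL) << 3;
}
```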

Signed-off-by: Philipp Tomsich 
---
 gcc/match.pd| 11 +--
 gcc/testsuite/gcc.target/riscv/zextws.c | 18 ++
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zextws.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 349eab6..6bb9535 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3079,6 +3079,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 }
 }
  }
+(if (GIMPLE && (((mask >> shiftc) << shiftc) == mask)
+   && (exact_log2((mask >> shiftc) + 1) >= 0)
+   && (shift == LSHIFT_EXPR))
+(with
+ { tree newmaskt = build_int_cst_type(TREE_TYPE (@2), mask >> shiftc); }
+ (shift (convert (bit_and:shift_type (convert @0) { newmaskt; })) @1))
  /* ((X << 16) & 0xff00) is (X, 0).  */
  (if ((mask & zerobits) == mask)
   { build_int_cst (type, 0); }
@@ -3100,7 +3106,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (!tree_int_cst_equal (newmaskt, @2))
(if (shift_type != TREE_TYPE (@3))
	  (bit_and (convert (shift:shift_type (convert @3) @1)) { newmaskt; })
-(bit_and @4 { newmaskt; })
+(bit_and @4 { newmaskt; }))
 
 /* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)
(X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1).  */
@@ -3108,7 +3114,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (for bit_op (bit_and bit_xor bit_ior)
   (simplify
(shift (convert?:s (bit_op:s @0 INTEGER_CST@2)) INTEGER_CST@1)
-   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
+&& wi::exact_log2 (wi::to_wide (@2) + 1) < 0)
 (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
  (bit_op (shift (convert @0) @1) { mask; }))
 
diff --git a/gcc/testsuite/gcc.target/riscv/zextws.c 
b/gcc/testsuite/gcc.target/riscv/zextws.c
new file mode 100644
index 000..8ac93f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zextws.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g -mabi=lp64 -O2" } */
+
+/* Test for
+ (a << s) & M, where ((M >> s) << s) == M
+   being undistributed into
+ (a & M_unshifted) << s, where M_unshifted := (M >> s)
+   to produce the sequence (or similar)
+ slli  a0,a0,32
+ srli  a0,a0,29
+*/
+long
+zextws_mask (long i)
+{
+  return (i & 0xffffffffULL) << 3;
+}
+/* { dg-final { scan-assembler "slli" } } */
+/* { dg-final { scan-assembler "srli" } } */
-- 
1.8.3.1



[PATCH] match.pd: rewrite x << C with C > precision to (const_int 0)

2020-11-11 Thread Philipp Tomsich
From: Philipp Tomsich 

csmith managed to sneak a shift wider than the bit-width of a register
past the frontend (found when addressing a bug in our bitmanip machine
description): no warning is given and an unneeded shift is generated.
This behaviour was validated for the resulting assembly both for RISC-V
and AArch64.

This matches (x << C), where C is constant and C > precision(x), and
rewrites it to (const_int 0).  This has been confirmed to remove the
redundant shift instruction both for AArch64 and RISC-V.
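A C model of the fold (the function name is mine) makes the rule concrete: for a constant amount strictly greater than the type's precision, the shift is undefined in C and the simplification replaces it with the zero constant.

```c
#include <stdint.h>

/* Model of the match.pd rule: an amount strictly greater than the
   precision (undefined behaviour in C) folds to zero; in-range
   amounts shift normally.  */
uint32_t fold_lshift_u32 (uint32_t x, unsigned amount)
{
  const unsigned prec = 32;
  if (amount > prec)
    return 0;           /* what the new rule produces */
  return x << amount;   /* caller must keep amount < prec here */
}
```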
---
 gcc/match.pd | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 349eab6..2309175 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -764,6 +764,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cabss (ops @0))
(cabss @0
 
+/* Fold (x << C), where C > precision(type) into 0. */
+(simplify
+ (lshift @0 INTEGER_CST@1)
+  (if (wi::ltu_p (TYPE_PRECISION (TREE_TYPE (@0)), wi::to_wide(@1)))
+   { build_zero_cst (TREE_TYPE (@0)); } ))
+
 /* Fold (a * (1 << b)) into (a << b)  */
 (simplify
  (mult:c @0 (convert? (lshift integer_onep@1 @2)))
-- 
1.8.3.1



Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich
On Wed, 11 Nov 2020 at 19:17, Jeff Law  wrote:
>
>
> On 11/11/20 3:55 AM, Jakub Jelinek via Gcc-patches wrote:
> > On Wed, Nov 11, 2020 at 11:43:34AM +0100, Philipp Tomsich wrote:
> >> The patch addresses this by disallowing that rule, if an exact power-of-2 is
> >> seen as C1.  The reason why I would prefer to have this canonicalised the
> >> same way the (X & C1) * C2 is canonicalised, is that cleaning this up during
> >> combine is more difficult on some architectures that require multiple insns
> >> to represent the shifted constant (i.e. C1 << C2).
> > It is bad to have many exceptions for the canonicalization
> > and it is unclear why exactly these were chosen, and it doesn't really deal
> > with say:
> > (x & 0xabcdef12ULL) << 13
> > being less expensive on some targets than
> > (x << 13) & (0xabcdef12ULL << 13).
> > (x & 0x7) << 3 vs. (x << 3) & 0x38 on the other side is a wash on
> > many targets.
> > As I said, it is better to decide which one is better before or during
> > expansion based on target costs, sure, combine can't catch everything.
>
> I think Jakub is hitting a key point here.  Gimple should canonicalize
> on what is simpler from a gimple standpoint, not what is better for some
> set of targets.   Target dependencies like this shouldn't be introduced
> until expansion time.

The simplification that distributes the shift (i.e. the one that Jakub referred
to as fighting the new rule) is also run after GIMPLE has been expanded to
RTX.  In my understanding, this still implies that even if we have a cost-aware
expansion, this existing rule will nonetheless distribute the shift.

Philipp.


Re: [PATCH] match.pd: undistribute (a << s) & C, when C = (M << s) and exact_log2(M - 1)

2020-11-11 Thread Philipp Tomsich
On Wed, 11 Nov 2020 at 20:59, Jakub Jelinek  wrote:
> >
> > The simplification that distributes the shift (i.e. the one that Jakub referred
> > to as fighting the new rule) is also run after GIMPLE has been expanded to
> > RTX.  In my understanding, this still implies that even if we have a cost-aware
> > expansion, this existing rule will nonetheless distribute the shift.
>
> At the RTL level, such simplifications should not happen if it is
> against costs (e.g. combine but various other passes too check costs and
> punt if the new code would be more costly than the old one).

I agree.
Let me go back and investigate if the cost-model is misreading things, before we
continue the discussion.

Philipp.


[PATCH v1 1/2] Simplify shifts wider than the bitwidth of types

2020-11-16 Thread Philipp Tomsich
From: Philipp Tomsich 

While most shifts wider than the bitwidth of a type will be caught by
other passes, it is possible that these show up for VRP.
Consider the following example:
  int func (int a, int b, int c)
  {
return (a << ((b && c) - 1));
  }

This adds simplify_using_ranges::simplify_lshift_using_ranges to
detect and rewrite such cases.  If the intersection of meaningful
shift amounts for the underlying type and the value-range computed
for the shift-amount (whether an integer constant or a variable) is
empty, the statement is replaced with the zero-constant of the same
precision as the result.
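The emptiness test at the heart of this can be sketched in C (a hypothetical simplification of the value_range machinery, names mine): the statement folds to zero only when the computed range [lo, hi] of the shift amount misses [0, prec - 1] entirely.

```c
#include <stdbool.h>

/* Hypothetical sketch: the intersection of [lo, hi] with the
   meaningful shift amounts [0, prec - 1] is empty exactly when the
   whole range lies below 0 or above prec - 1.  */
bool lshift_folds_to_zero (long lo, long hi, unsigned prec)
{
  return hi < 0 || lo > (long) prec - 1;
}
```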

gcc/ChangeLog:

   * vr-values.h (simplify_using_ranges): Declare.
   * vr-values.c (simplify_lshift_using_ranges): New function.
   (simplify): Use simplify_lshift_using_ranges for LSHIFT_EXPR.

---

 gcc/ChangeLog   |  6 ++
 gcc/vr-values.c | 59 +
 gcc/vr-values.h |  1 +
 3 files changed, 66 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 89317d4..b8b9beb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2020-11-16  Philipp Tomsich  
+
+   * vr-values.h (simplify_using_ranges): Declare.
+   * vr-values.c (simplify_lshift_using_ranges): New function.
+   (simplify): Use simplify_lshift_using_ranges for LSHIFT_EXPR.
+
 2020-11-13  Jan Hubicka  
 
* tree-ssa-alias.c (ao_ref_base_alias_ptr_type): Remove accidental
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 9f5943a..da7208e 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -3318,6 +3318,58 @@ simplify_using_ranges::simplify_div_or_mod_using_ranges
   return false;
 }
 
+/* Simplify a lshift, if the shift-amount is larger than the
+   bit-width of the type.  Return true if we do simplify.  */
+bool
+simplify_using_ranges::simplify_lshift_using_ranges
+   (gimple_stmt_iterator *gsi,
+gimple *stmt)
+{
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+  value_range vr1;
+
+  /* We only support integral types.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (op0)))
+    return false;
+
+  if (TREE_CODE (op1) == INTEGER_CST)
+vr1.set (op1);
+  else if (TREE_CODE (op1) == SSA_NAME)
+vr1 = *(query->get_value_range (op1, stmt));
+  else
+return false;
+
+  if (vr1.varying_p () || vr1.undefined_p ())
+return false;
+
+  /* Shift amounts are valid up to the type precision.  Any shift that
+ is outside of the range [0, type precision - 1] can be rewritten
+ to a constant result.  */
+  const unsigned prec = TYPE_PRECISION (TREE_TYPE (op0));
+  value_range valid (build_zero_cst (TREE_TYPE (op1)),
+build_int_cst (TREE_TYPE (op1), prec - 1),
+VR_RANGE);
+
+  valid.intersect (vr1);
+  if (valid.undefined_p ())
+{
+  /* If the intersection is empty (i.e. undefined), then we can replace the
+shift with the zero-constant.  */
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "\nReplacing shift beyond precision in stmt: ");
+ print_gimple_stmt (dump_file, stmt, 0);
+   }
+  gimple_assign_set_rhs_from_tree (gsi, build_zero_cst (TREE_TYPE (op0)));
+  update_stmt (gsi_stmt (*gsi));
+  return true;
+}
+
+  return false;
+}
+
 /* Simplify a min or max if the ranges of the two operands are
disjoint.   Return true if we do simplify.  */
 
@@ -4422,6 +4474,13 @@ simplify_using_ranges::simplify (gimple_stmt_iterator *gsi)
case MAX_EXPR:
  return simplify_min_or_max_using_ranges (gsi, stmt);
 
+   case LSHIFT_EXPR:
+ if ((TREE_CODE (rhs1) == SSA_NAME
+  || TREE_CODE (rhs1) == INTEGER_CST)
+ && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
+   return simplify_lshift_using_ranges (gsi, stmt);
+ break;
+
default:
  break;
}
diff --git a/gcc/vr-values.h b/gcc/vr-values.h
index 59fac0c..18fd5c1 100644
--- a/gcc/vr-values.h
+++ b/gcc/vr-values.h
@@ -48,6 +48,7 @@ private:
   bool simplify_div_or_mod_using_ranges (gimple_stmt_iterator *, gimple *);
   bool simplify_abs_using_ranges (gimple_stmt_iterator *, gimple *);
   bool simplify_bit_ops_using_ranges (gimple_stmt_iterator *, gimple *);
+  bool simplify_lshift_using_ranges (gimple_stmt_iterator *, gimple *);
   bool simplify_min_or_max_using_ranges (gimple_stmt_iterator *, gimple *);
   bool simplify_cond_using_ranges_1 (gcond *);
   bool fold_cond (gcond *);
-- 
1.8.3.1



[PATCH v1 2/2] RISC-V: Adjust predicates for immediate shift operands

2020-11-16 Thread Philipp Tomsich
From: Philipp Tomsich 

In case a negative shift operand makes it through into the backend,
it will be treated as unsigned and truncated (using a mask) to fit
into the range 0..31 (for SImode) and 0..63 (for DImode).

Consider the following output, which illustrates the issue and shows how
the shift amount is truncated:
 #(insn 16 15 53 (set (reg:DI 10 a0 [orig:72  ] [72])
 #(sign_extend:DI (ashift:SI (reg:SI 15 a5 [orig:73 a ] [73])
 #(const_int -1 [0xffffffffffffffff]))) "isolated.c":3:13 168 {*ashlsi3_extend}
 # (expr_list:REG_DEAD (reg:SI 15 a5 [orig:73 a ] [73])
 #(nil)))
  slliw	a0,a5,31	#, , a	# 16	[c=8 l=4]  *ashlsi3_extend
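The hardware truncation being discussed can be modelled in C (function names are mine): RISC-V SLLW masks the amount to the low 5 bits and SLL to the low 6, so a "negative" amount is simply reinterpreted and masked.

```c
#include <stdint.h>

/* RISC-V shift semantics: the amount is truncated to log2(XLEN) bits.  */
uint32_t rv_sllw (uint32_t a, uint32_t amt) { return a << (amt & 31); }
uint64_t rv_sll  (uint64_t a, uint64_t amt) { return a << (amt & 63); }
```

This is why an amount of -1 in the dump above becomes an effective shift by 31.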

This change adjusts the predicates to allow immediate shifts for the
supported ranges 0..31 and 0..63, respectively, only.  As immediates
outside of these ranges can no longer pass the constraint-check, the
implementation of the patterns emitting the respective shift can also
be simplified.  Larger shift amounts will now be forced along a path
resulting in a non-immediate shift.

A new testcase is added to check that non-immediate shift instructions
are emitted.

gcc/ChangeLog:

   * config/riscv/predicates.md (riscv_shift_imm_si, riscv_shift_si,
   riscv_shift_imm_di, riscv_shift_di): New.
   * config/riscv/riscv.md: Use 'riscv_shift_si' and 'riscv_shift_di'
   in definition of shift instructions; remove (now unnecessary)
   truncation of immediates.

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/shift-negative-amount.c: New.

---

 gcc/ChangeLog   |  5 +
 gcc/config/riscv/predicates.md  | 16 
 gcc/config/riscv/riscv.md   | 21 -
 gcc/testsuite/ChangeLog |  4 
 .../gcc.target/riscv/shift-negative-amount.c| 14 ++
 5 files changed, 43 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/shift-negative-amount.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8b9beb..6ca8ee0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2020-11-16  Philipp Tomsich  
 
+   * config/riscv/predicates.md (riscv_shift_imm_si, riscv_shift_si,
+   riscv_shift_imm_di, riscv_shift_di): New.
+   * config/riscv/riscv.md: Use 'riscv_shift_si' and 'riscv_shift_di'
+   in definition of shift instructions; remove (now unnecessary)
+   truncation of immediates.
* vr-values.h (simplify_using_ranges): Declare.
* vr-values.c (simplify_lshift_using_ranges): New function.
(simplify): Use simplify_lshift_using_ranges for LSHIFT_EXPR.
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index f764fe7..fb35871 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,6 +27,22 @@
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
+(define_predicate "riscv_shift_imm_si"
+  (and (match_code "const_int")
+   (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) < 32")))
+
+(define_predicate "riscv_shift_si"
+  (ior (match_operand 0 "riscv_shift_imm_si")
+   (match_operand 0 "register_operand")))
+
+(define_predicate "riscv_shift_imm_di"
+  (and (match_code "const_int")
+   (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) < 64")))
+
+(define_predicate "riscv_shift_di"
+  (ior (match_operand 0 "riscv_shift_imm_di")
+   (match_operand 0 "register_operand")))
+
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index f15bad3..7b34839 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1574,13 +1574,9 @@
   [(set (match_operand:SI 0 "register_operand" "= r")
(any_shift:SI
(match_operand:SI 1 "register_operand" "  r")
-   (match_operand:QI 2 "arith_operand"" rI")))]
+   (match_operand:QI 2 "riscv_shift_si"   " rI")))]
   ""
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
-operands[2] = GEN_INT (INTVAL (operands[2])
-  & (GET_MODE_BITSIZE (SImode) - 1));
-
   return TARGET_64BIT ? "%i2w\t%0,%1,%2" : "%i2\t%0,%1,%2";
 }
   [(set_attr "type" "shift")
@@ -1629,13 +1625,9 @@
   [(set (match_operand:DI 0 "register_operand" "= r")
(any_shift:DI
(match_operand:DI 1 "register_operand" "  r")
-   (match_operand:QI 2 "arith_operand"

Re: [PATCH v1 2/2] RISC-V: Adjust predicates for immediate shift operands

2020-11-16 Thread Philipp Tomsich
Jim,

On Mon, 16 Nov 2020 at 23:28, Jim Wilson  wrote:
>
> On Mon, Nov 16, 2020 at 10:57 AM Philipp Tomsich wrote:
>>
>> In case a negative shift operand makes it through into the backend,
>> it will be treated as unsigned and truncated (using a mask) to fit
>> into the range 0..31 (for SImode) and 0..63 (for DImode).
>
>
> This is a de-optimization.  This doesn't make any sense.  The ISA manual
> clearly states the shift counts are truncated.  Some targets do this with
> SHIFT_COUNT_TRUNCATED, but that is known to cause problems, so the RISC-V
> port is doing it in the shift expanders.  I believe that other targets do
> this too.

This is a de-optimization only if applied without patch 1 from the
series: the change to VRP ensures that the backend will never see a shift
wider than the immediate field.
The problem is that if a negative shift-amount makes it to the backend,
unintended code may be generated (as a shift-amount, in my reading, should
always be interpreted as unsigned).

Note that with tree-vrp turned on (and patch 1 of the series applied), you
will see

.L3:
li a0,0

generated, anyway.

> Also, note that replacing
>   slli a0, a0, 31
> with
>   li a1, -1;
>   sll a0, a0, a1
> doesn't change the operation performed.  The shift count is still
> truncated to 31, and so you get the exact same result from both code
> sequences.  All you have done is make the code bigger and slower, which is
> undesirable.

I do agree that this does not address the issue of a shift that is wider
than the register width, even though it makes sure we reject this from the
immediate field.
That said: what is the correct behavior/result of this operation?

> Also note that the testcase has implementation-defined results, so there
> is no wrong answer here, and nothing wrong with what the RISC-V port is
> doing.
>
>> +/* { dg-final { scan-assembler "sll" } } */
>
>
> I don't think that this will work, as a grep for sll will also match
> slli.  You would need to add a space or tab or maybe both to the search
> string to prevent matches with slli.  Or alternatively use
> scan-assembler-not "slli", which will match and fail for both slli and slliw.

Good catch. I turned this check around, but submitted the wrong one.

Thanks,
Philipp.


Re: [PATCH v1 1/2] Simplify shifts wider than the bitwidth of types

2020-11-16 Thread Philipp Tomsich
On Mon, 16 Nov 2020 at 23:38, Jim Wilson  wrote:

> On Mon, Nov 16, 2020 at 10:57 AM Philipp Tomsich 
> wrote:
>
>> This adds simplify_using_ranges::simplify_lshift_using_ranges to
>> detect and rewrite such cases.  If the intersection of meaningful
>> shift amounts for the underlying type and the value-range computed
>> for the shift-amount (whether an integer constant or a variable) is
>> empty, the statement is replaced with the zero-constant of the same
>> precision as the result.
>>
>
> This has the risk of breaking some user code.  I've seen people write code
> for RISC-V knowing that the hardware truncates shift counts, and so not
> doing the full calculation to get the right value but just letting the
> compiler/hardware calculate it for them via truncation.  Of course this
> code has implemented defined result, but there is no reason to break it
> unnecessarily.
>

While this is undefined behavior (as per the C standard), GCC uses a
predictable behavior for negative shift-amounts (and shifts that are
wider than the type):

int func(int a)
{
  return a << -1;
}

will raise the following warning:

shift-neg.c: In function 'func':
shift-neg.c:3:12: warning: left shift count is negative [-Wshift-count-negative]
    3 |   return a << -1;
      |            ^~

and return 0:

func:
li a0,0
ret
.size func, .-func


Having two different results generated here, depending on what parts of GCC
"see" the shift-amount, doesn't seem sensible and likely to cause breakage
in the long term.
I fully agree that this is undefined behavior (so no well-formed program
should rely on it), but I would prefer to have a common behavior
independent of when the constant is known.

Philipp.


[PATCH v1] aarch64: enable Ampere-1 CPU

2021-11-01 Thread Philipp Tomsich
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.

The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).

This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for
Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.

---

 gcc/config/aarch64/aarch64-cores.def |   3 +-
 gcc/config/aarch64/aarch64-cost-tables.h | 107 +++
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.c |  78 +
 gcc/doc/invoke.texi  |   2 +-
 5 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 77da31084de..617cde42fba 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -68,7 +68,8 @@ AARCH64_CORE("octeontx83",octeontxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a2, -1)
AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a3, -1)
 
-/* Ampere Computing cores. */
+/* Ampere Computing ('\xC0') cores. */
+AARCH64_CORE("ampere1", ampere1, cortexa57, 8_6A, AARCH64_FL_FOR_ARCH8_6, ampere1, 0xC0, 0xac3, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, emag, 0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index bb499a1eae6..e6ded65b67d 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -668,4 +668,111 @@ const struct cpu_cost_table a64fx_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  COSTS_N_INSNS (3),   /* flag_setting.  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (18)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (34)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (4), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (5), /* loadf.  */
+COSTS_N_INSNS (5), /* loadd.  */
+COSTS_N_INSNS (5), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2), /* storef.  */
+COSTS_N_INSNS (2), /* stored.  */
+COSTS_N_INSNS (2), /* store_unaligned.  */
+COSTS_N_INSNS (3), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (25),  /* div.  */
+  COSTS_N_INSNS (4),

Re: [PATCH v1] aarch64: enable Ampere-1 CPU

2021-11-03 Thread Philipp Tomsich
Richard,

On Wed, 3 Nov 2021 at 10:08, Richard Sandiford
 wrote:
>
> Philipp Tomsich  writes:
> > This adds support and a basic tuning model for the Ampere Computing
> > "Ampere-1" CPU.
> >
> > The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
> > modelled as a 4-wide issue (as with all modern micro-architectures,
> > the chosen issue rate is a compromise between the maximum dispatch
> > rate and the maximum rate of uops issued to the scheduler).
> >
> > This adds the -mcpu=ampere1 command-line option and the relevant cost
> > information/tuning tables for the Ampere-1.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
> >   core.
> >   * config/aarch64/aarch64-tune.md: Regenerate.
> >   * config/aarch64/aarch64-cost-tables.h: Add extra costs for
> >   Ampere-1.
> >   * config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
>
> OK, thanks.

Would this be eligible for a backport to gcc-11 as well, to ensure
command-line compatibility?

Thanks,
Philipp.


[PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode

2021-11-11 Thread Philipp Tomsich
The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
for rv64) bswap instruction (rev8).  While, with the current master,
SImode is synthesized correctly from DImode, HImode is not.

This change adds an appropriate expansion for a HImode bswap, if a
wider bswap is available.

Without this change, the following rv64gc_zbb code is generated for
__builtin_bswap16():
slliw   a5,a0,8
zext.h  a0,a0
srliw   a0,a0,8
or  a0,a5,a0
sext.h  a0,a0  // this is a 16bit sign-extension following
   // the byteswap (e.g. on a 'short' function
   // return).

After this change, a bswap (rev8) is used and any extensions are
combined into the shift-right:
rev8    a0,a0
srai    a0,a0,48   // the sign-extension is combined into the
   // shift; a srli is emitted otherwise...

gcc/ChangeLog:

* optabs.c (expand_unop): Support expanding a HImode bswap
  using SImode or DImode, followed by a shift.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-bswap.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/optabs.c   |  6 ++
 gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 019bbb62882..7a3ffbe4525 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, 
rtx target,
return temp;
}
 
+ /* If we are missing a HImode BSWAP, but have one for SImode or
+DImode, use a BSWAP followed by a SHIFT.  */
+ temp = widen_bswap (as_a <scalar_int_mode> (mode), op0, target);
+ if (temp)
+   return temp;
+
  last = get_last_insn ();
 
  temp1 = expand_binop (mode, ashl_optab, op0,
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c 
b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
new file mode 100644
index 000..6ee27d9f47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
+
+unsigned long
+func64 (unsigned long i)
+{
+  return __builtin_bswap64(i);
+}
+
+unsigned int
+func32 (unsigned int i)
+{
+  return __builtin_bswap32(i);
+}
+
+unsigned short
+func16 (unsigned short i)
+{
+  return __builtin_bswap16(i);
+}
+
+/* { dg-final { scan-assembler-times "rev8" 3 } } */
-- 
2.32.0



[PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support

2021-11-11 Thread Philipp Tomsich


This series provides assorted improvements for the RISC-V Zb[abcs]
support collected over the last year and a half and forward-ported to
the recently merged upstream support for the Zb[abcs] extensions.

Improvements include:
 - synthesis of HImode bswap from SImode/DImode rev8
 - cost-model change to support shift-and-add (sh[123]add) in the
   strength-reduction of multiplication operations
 - support for constant-loading of (1ULL << 31) on RV64 using bseti
 - generating a polarity-reversed mask from a bit-test
 - adds orc.b as UNSPEC
 - improves min/minu/max/maxu patterns to suppress redundant extensions


Philipp Tomsich (8):
  bswap: synthesize HImode bswap from SImode or DImode
  RISC-V: costs: handle BSWAP
  RISC-V: costs: support shift-and-add in strength-reduction
  RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode
  RISC-V: bitmanip: improvements to rotate instructions
  RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ?
0 : -1"
  RISC-V: bitmanip: add orc.b as an unspec
  RISC-V: bitmanip: relax minmax to operate on GPR

 gcc/config/riscv/bitmanip.md | 74 +---
 gcc/config/riscv/riscv.c | 31 
 gcc/config/riscv/riscv.h | 11 ++-
 gcc/config/riscv/riscv.md|  3 +
 gcc/optabs.c |  6 ++
 gcc/testsuite/gcc.target/riscv/zbb-bswap.c   | 22 ++
 gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +-
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c   | 14 
 8 files changed, 162 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

-- 
2.32.0



[PATCH v1 2/8] RISC-V: costs: handle BSWAP

2021-11-11 Thread Philipp Tomsich
The BSWAP operation is not handled in rtx_costs. Add it.

gcc/ChangeLog:

* config/riscv/riscv.c (rtx_costs): Add BSWAP.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index c77b0322869..8480cf09294 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2131,6 +2131,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
   return false;
 
+case BSWAP:
+  if (TARGET_ZBB)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+  return false;
+
 case FLOAT:
 case UNSIGNED_FLOAT:
 case FIX:
-- 
2.32.0



[PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction

2021-11-11 Thread Philipp Tomsich
The strength-reduction implementation in expmed.c will assess the
profitability of using shift-and-add using a RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost, thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.

Instead of expanding to
li      a5,200
mulw    a0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
sh2add  a0,a0,a0   // *5 = a + 4 * a
sh2add  a0,a0,a0   // *5 = a + 4 * a
slli    a0,a0,3    // *8

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
if expressed as a plus and multiplication with a power-of-2.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 8480cf09294..dff4e370471 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2020,6 +2020,20 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
+  /* Before strength-reduction, the shNadd can be expressed as the addition
+of a multiplication with a power-of-two.  If this case is not handled,
+the strength-reduction in expmed.c will calculate an inflated cost. */
+  if (TARGET_ZBA
+ && ((!TARGET_64BIT && (mode == SImode)) ||
+ (TARGET_64BIT && (mode == DImode)))
+ && (GET_CODE (XEXP (x, 0)) == MULT)
+ && REG_P (XEXP (XEXP (x, 0), 0))
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+ && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
   /* shNadd.uw pattern for zba.
 [(set (match_operand:DI 0 "register_operand" "=r")
   (plus:DI
-- 
2.32.0



[PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode

2021-11-11 Thread Philipp Tomsich
The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
bits above 31 only (to side-step any issues with the negative SImode
value 0x80000000).  This moves the special handling of this SImode
value (i.e. the check for -2147483648) to riscv.c and relaxes the
SINGLE_BIT_MASK_OPERAND() test.

This changes the code-generation for loading (1ULL << 31) from:
li  a0,1
sllia0,a0,31
to:
bseti   a0,zero,31

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_build_integer_1): Rewrite value as
-2147483648 for the single-bit case, when operating on 0x80000000
in SImode.
* gcc/config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for
any single-bit value, moving the special case for 0x80000000 to
riscv_build_integer_1 (in riscv.c).

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c |  9 +
 gcc/config/riscv/riscv.h | 11 ---
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index dff4e370471..4c30d4e521d 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -415,6 +415,15 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   /* Simply BSETI.  */
   codes[0].code = UNKNOWN;
   codes[0].value = value;
+
+  /* RISC-V sign-extends all 32bit values that live in a 32bit
+     register.  To avoid paradoxes, we thus need to use the
+     sign-extended (negative) representation for the value, if we
+     want to build 0x80000000 in SImode.  This will then expand
+     to an ADDI/LI instruction.  */
+  if (mode == SImode && value == 0x80000000)
+   codes[0].value = -2147483648;
+
   return 1;
 }
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 64287124735..abb121ddbea 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -526,13 +526,10 @@ enum reg_class
   (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH)  \
|| ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
 
-/* If this is a single bit mask, then we can load it with bseti.  But this
-   is not useful for any of the low 31 bits because we can use addi or lui
to load them.  It is wrong for loading SImode 0x80000000 on rv64 because it
-   needs to be sign-extended.  So we restrict this to the upper 32-bits
-   only.  */
-#define SINGLE_BIT_MASK_OPERAND(VALUE) \
-  (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32))
+/* If this is a single bit mask, then we can load it with bseti.  Special
+   handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1.  */
+#define SINGLE_BIT_MASK_OPERAND(VALUE) \
+  (pow2p_hwi (VALUE))
 
 /* Stack layout; function entry, exit and calling.  */
 
-- 
2.32.0



[PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions

2021-11-11 Thread Philipp Tomsich
This change improves rotate instructions (motivated by a review of the
code generated for OpenSSL): a rotate-left by a constant is synthesized
using a rotate-right-immediate to avoid putting the shift-amount into
a temporary; to do so, we allow either a register or an immediate for
the expansion of rotl<mode>3 and then check if the shift-amount is a
constant.

Without these changes, the function
unsigned int f(unsigned int a)
{
  return (a << 2) | (a >> 30);
}
turns into
li  a5,2
rolw    a0,a0,a5
while these changes give us:
roriw   a0,a0,30

gcc/ChangeLog:

* config/riscv/bitmanip.md (rotlsi3, rotldi3, rotlsi3_sext):
Synthesize rotate-left-by-immediate from a rotate-right insn.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md | 39 ++--
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 59779b48f27..178d1ca0e4b 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -204,25 +204,52 @@ (define_insn "rotrsi3_sext"
 (define_insn "rotlsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
(rotate:SI (match_operand:SI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r")))]
+  (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_ZBB"
-  { return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2"; }
+  {
+/* If the rotate-amount is constant, let's synthesize using a
+   rotate-right-immediate instead of using a temporary. */
+
+if (CONST_INT_P(operands[2])) {
+  operands[2] = GEN_INT(32 - INTVAL(operands[2]));
+  return TARGET_64BIT ? "roriw\t%0,%1,%2" : "rori\t%0,%1,%2";
+}
+
+return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "rotldi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
(rotate:DI (match_operand:DI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r")))]
+  (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_64BIT && TARGET_ZBB"
-  "rol\t%0,%1,%2"
+  {
+if (CONST_INT_P(operands[2])) {
+  operands[2] = GEN_INT(64 - INTVAL(operands[2]));
+  return "rori\t%0,%1,%2";
+}
+
+return "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
+;; Until we have improved REE to understand that sign-extending the result of
+;; an implicitly sign-extending operation is redundant, we need an additional
+;; pattern to gobble up the redundant sign-extension.
 (define_insn "rotlsi3_sext"
   [(set (match_operand:DI 0 "register_operand" "=r")
(sign_extend:DI (rotate:SI (match_operand:SI 1 "register_operand" "r")
-  (match_operand:QI 2 "register_operand" "r"))))]
+  (match_operand:QI 2 "arith_operand" "rI"))))]
   "TARGET_64BIT && TARGET_ZBB"
-  "rolw\t%0,%1,%2"
+  {
+if (CONST_INT_P(operands[2])) {
+  operands[2] = GEN_INT(32 - INTVAL(operands[2]));
+  return "roriw\t%0,%1,%2";
+}
+
+return "rolw\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
(define_insn "bswap<mode>2"
-- 
2.32.0



[PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1"

2021-11-11 Thread Philipp Tomsich
Consider creating a polarity-reversed mask from a set-bit (i.e., if
the bit is set, produce all-ones; otherwise: all-zeros).  Using Zbb,
this can be expressed as bexti, followed by an addi of minus-one.  To
enable the combiner to discover this opportunity, we need to split the
canonical expression for "(a & (1 << BIT_NO)) ? 0 : -1" into a form
combinable into bexti.

Consider the function:
long f(long a)
{
  return (a & (1 << BIT_NO)) ? 0 : -1;
}
This produces the following sequence prior to this change:
andi    a0,a0,16
seqz    a0,a0
neg a0,a0
ret
Following this change, it results in:
bexti   a0,a0,4
addi    a0,a0,-1
ret

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add a splitter to generate
  polarity-reversed masks from a set bit using bexti + addi.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md   | 13 +
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 ++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 178d1ca0e4b..9e10280e306 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -367,3 +367,16 @@ (define_insn "*bexti"
   "TARGET_ZBS"
   "bexti\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
+
+;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 
})
+;; using a bext(i) followed by an addi instruction.
+;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1".
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 
"register_operand")
+  (const_int 1)
+  (match_operand 2))
+(const_int 0))))]
+  "TARGET_ZBB"
+  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
new file mode 100644
index 000..d02c3f7a98d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+
+/* bexti */
+#define BIT_NO  27
+
+long
+foo0 (long a)
+{
+  return (a & (1 << BIT_NO)) ? 0 : -1;
+}
+
+/* { dg-final { scan-assembler "bexti" } } */
+/* { dg-final { scan-assembler "addi" } } */
-- 
2.32.0



[PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec

2021-11-11 Thread Philipp Tomsich
As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.

gcc/ChangeLog:

* config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an unspec.
* config/riscv/riscv.md: Add UNSPEC_ORC_B.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md | 8 
 gcc/config/riscv/riscv.md| 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 9e10280e306..000deb48b16 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -267,6 +267,14 @@ (define_insn "<bitmanip_optab><mode>3"
  "<bitmanip_insn>\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; orc.b (or-combine) is added as an unspec for the benefit of the support
+;; for optimized string functions (such as strcmp).
+(define_insn "orcb<mode>2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (unspec:X [(match_operand:X 1 "register_operand")] UNSPEC_ORC_B))]
+  "TARGET_ZBB"
+  "orc.b\t%0,%1")
+
 ;; ZBS extension.
 
 (define_insn "*bset"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 225e5b259c1..7a2501ec7a9 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -45,6 +45,9 @@ (define_c_enum "unspec" [
 
   ;; Stack tie
   UNSPEC_TIE
+
+  ;; Zbb OR-combine instruction
+  UNSPEC_ORC_B
 ])
 
 (define_c_enum "unspecv" [
-- 
2.32.0



[PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR

2021-11-11 Thread Philipp Tomsich
While min/minu/max/maxu instructions are provided for XLEN only, these
can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
always sign-extended, which ensures that the XLEN-wide instructions
can be used for signed and unsigned comparisons on SImode yielding a
correct ordering of values.

This commit
 - relaxes the minmax pattern to express for GPR (instead of X only),
   providing both a si3 and di3 expansion on RV64
 - adds a sign-extending form for the si3 pattern for RV64 to allow REE
   to eliminate redundant extensions
 - adds test-cases for both

gcc/ChangeLog:

* config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
  DImode) on RV64.
* config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
  pattern for REE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
  operands checking that no redundant sign- or zero-extensions
  are emitted.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/bitmanip.md | 14 +++---
 gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +---
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 000deb48b16..2a28f78f5f6 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
   [(set_attr "type" "bitmanip")])
 
(define_insn "<bitmanip_optab><mode>3"
-  [(set (match_operand:X 0 "register_operand" "=r")
-(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "register_operand" "r")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "register_operand" "r")))]
   "TARGET_ZBB"
  "<bitmanip_insn>\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn "<bitmanip_optab>si3_sext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 
"register_operand" "r")
+(match_operand:SI 2 "register_operand" "r"]
+  "TARGET_64BIT && TARGET_ZBB"
+  "<bitmanip_insn>\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
 ;; for optimized string functions (such as strcmp).
(define_insn "orcb<mode>2"
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c 
b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
index f44c398ea08..7169e873551 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
 
 long
 foo1 (long i, long j)
@@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
   return i > j ? i : j;
 }
 
+unsigned int
+foo5(unsigned int a, unsigned int b)
+{
+  return a > b ? a : b;
+}
+
+int
+foo6(int a, int b)
+{
+  return a > b ? a : b;
+}
+
 /* { dg-final { scan-assembler-times "min" 3 } } */
-/* { dg-final { scan-assembler-times "max" 3 } } */
+/* { dg-final { scan-assembler-times "max" 4 } } */
 /* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "maxu" 1 } } */
+/* { dg-final { scan-assembler-times "maxu" 3 } } */
+/* { dg-final { scan-assembler-not "zext.w" } } */
+/* { dg-final { scan-assembler-not "sext.w" } } */
-- 
2.32.0



Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR

2021-11-11 Thread Philipp Tomsich
Kito,

Unless I am missing something, the problem is not the relaxation to
GPR, but rather the sign-extending pattern I had squashed into the
same patch.
If you disable "si3_sext", a sext.w will have to be
emitted after the 'max' and before the return (or before the SImode
output is consumed as a DImode), pushing the REE opportunity to a
subsequent consumer (e.g. an addw).

This will generate
   foo6:
  max a0,a0,a1
  sext.w a0,a0
  ret
which (assuming that the inputs to max are properly sign-extended
SImode values living in DImode registers) will be the same as
performing the two sext.w before the max.

Having a second set of eyes on this is appreciated — let me know if
you agree and I'll revise, once I have collected feedback on the
remaining patches of the series.

Philipp.


On Thu, 11 Nov 2021 at 17:00, Kito Cheng  wrote:
>
> Hi Philipp:
>
> We can't pretend we have SImode min/max instruction without that semantic.
> Given this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
> but with this patch and compile with rv64gc_zba_zbb -O3, the output
> become 8589934592 8589934591 = 8589934592
>
> -Testcase---
> #include 
> long long __attribute__((noinline, noipa))
> foo6(long long a, long long b)
> {
>   int xa = a;
>   int xb = b;
>   return (xa > xb ? xa : xb);
> }
> int main() {
>   long long a = 0x200000000ll;
>   long long b = 0x1ffffffffll;
>   long long c = foo6(a, b);
>   printf ("%lld %lld = %lld\n", a, b, c);
>   return 0;
> }
> --
> rv64gc_zba_zbb -O3 w/o this patch:
> foo6:
> sext.w  a1,a1
> sext.w  a0,a0
> max a0,a0,a1
> ret
>
> ----------
> rv64gc_zba_zbb -O3 w/ this patch:
> foo6:
> max a0,a0,a1
> ret
>
> On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
>  wrote:
> >
> > While min/minu/max/maxu instructions are provided for XLEN only, these
> > can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
> > always sign-extended, which ensures that the XLEN-wide instructions
> > can be used for signed and unsigned comparisons on SImode yielding a
> > correct ordering of value.
> >
> > This commit
> >  - relaxes the minmax pattern to express for GPR (instead of X only),
> >providing both a si3 and di3 expansion on RV64
> >  - adds a sign-extending form for the si3 pattern for RV64 to allow REE
> >to eliminate redundant extensions
> >  - adds test-cases for both
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
> >   DImode) on RV64.
> > * config/riscv/bitmanip.md (si3_sext): Add
> >   pattern for REE.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
> >   operands checking that no redundant sign- or zero-extensions
> >   are emitted.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >  gcc/config/riscv/bitmanip.md | 14 +++---
> >  gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +---
> >  2 files changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 000deb48b16..2a28f78f5f6 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -260,13 +260,21 @@ (define_insn "bswap2"
> >[(set_attr "type" "bitmanip")])
> >
> >  (define_insn "<bitmanip_optab><mode>3"
> > -  [(set (match_operand:X 0 "register_operand" "=r")
> > -(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> > -  (match_operand:X 2 "register_operand" "r")))]
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +(bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
> > +(match_operand:GPR 2 "register_operand" "r")))]
> >"TARGET_ZBB"
> >"<bitmanip_insn>\t%0,%1,%2"
> >[(set_attr "type" "bitmanip")])
> >
> > +(define_insn "<bitmanip_optab>si3_sext"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +(sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 
> > "register_operand" "r")
> > +(match_oper

Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR

2021-11-11 Thread Philipp Tomsich
Kito,

Thanks for the reality-check: the subreg-expressions are getting in the way.
I'll drop this from v2, as a permanent resolution for this will be a
bit more involved.

Philipp.

On Thu, 11 Nov 2021 at 17:42, Kito Cheng  wrote:
>
> Hi Philipp:
>
> This testcase got wrong result with this patch even w/o
> si3_sext pattern:
>
> #include 
>
> #define MAX(A, B) ((A) > (B) ? (A) : (B))
>
> long long __attribute__((noinline, noipa))
> foo6(long long a, long long b, int c)
> {
>   int xa = a;
>   int xb = b;
>   return MAX(MAX(xa, xb), c);
> }
> int main() {
>   long long a = 0x200000000ll;
>   long long b = 0x1ffffffffll;
>   int c = 10;
>   long long d = foo6(a, b, c);
>   printf ("%lld %lld %d = %lld\n", a, b, c, d);
>   return 0;
> }
>
> On Fri, Nov 12, 2021 at 12:27 AM Kito Cheng  wrote:
> >
> > IIRC it's not work even without sign extend pattern since I did similar 
> > experimental before (not for RISC-V, but same concept), I guess I need more 
> > time to test that.
> >
> > Philipp Tomsich  於 2021年11月12日 週五 00:18 寫道:
> >>
> >> Kito,
> >>
> >> Unless I am missing something, the problem is not the relaxation to
> >> GPR, but rather the sign-extending pattern I had squashed into the
> >> same patch.
> >> If you disable "si3_sext", a sext.w will have to be
> >> emitted after the 'max' and before the return (or before the SImode
> >> output is consumed as a DImode), pushing the REE opportunity to a
> >> subsequent consumer (e.g. an addw).
> >>
> >> This will generate
> >>foo6:
> >>   max a0,a0,a1
> >>   sext.w a0,a0
> >>   ret
> >> which (assuming that the inputs to max are properly sign-extended
> >> SImode values living in DImode registers) will be the same as
> >> performing the two sext.w before the max.
> >>
> >> Having a second set of eyes on this is appreciated — let me know if
> >> you agree and I'll revise, once I have collected feedback on the
> >> remaining patches of the series.
> >>
> >> Philipp.
> >>
> >>
> >> On Thu, 11 Nov 2021 at 17:00, Kito Cheng  wrote:
> >> >
> >> > Hi Philipp:
> >> >
> >> > We can't pretend we have SImode min/max instruction without that 
> >> > semantic.
> >> > Give this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
> >> > but with this patch and compile with rv64gc_zba_zbb -O3, the output
> >> > become 8589934592 8589934591 = 8589934592
> >> >
> >> > -Testcase---
> >> > #include 
> >> > long long __attribute__((noinline, noipa))
> >> > foo6(long long a, long long b)
> >> > {
> >> >   int xa = a;
> >> >   int xb = b;
> >> >   return (xa > xb ? xa : xb);
> >> > }
> >> > int main() {
> >> >   long long a = 0x2ll;
> >> >   long long b = 0x1l;
> >> >   long long c = foo6(a, b);
> >> >   printf ("%lld %lld = %lld\n", a, b, c);
> >> >   return 0;
> >> > }
> >> > --
> >> > v64gc_zba_zbb -O3 w/o this patch:
> >> > foo6:
> >> > sext.w  a1,a1
> >> > sext.w  a0,a0
> >> > max a0,a0,a1
> >> > ret
> >> >
> >> > --
> >> > v64gc_zba_zbb -O3 w/ this patch:
> >> > foo6:
> >> > max a0,a0,a1
> >> > ret
> >> >
> >> > On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
> >> >  wrote:
> >> > >
> >> > > While min/minu/max/maxu instructions are provided for XLEN only, these
> >> > > can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
> >> > > always sign-extended, which ensures that the XLEN-wide instructions
> >> > > can be used for signed and unsigned comparisons on SImode yielding a
> >> > > correct ordering of value.
> >> > >
> >> > > This commit
> >> > >  - relaxes the minmax pattern to express for GPR (instead of X only),
> >> > >providing both a si3 and di3 expansion on RV64
> >> > >  - adds a sign-extending form for thee si3 pattern for RV64 to all REE
> >> > >to elimin

[PATCH v1 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2021-11-14 Thread Philipp Tomsich


This series provides support for the Ventana VT1 (a 4-way superscalar
rv64gc_zba_zbb_zbc_zbs core) including support for the supported
instruction fusion patterns.

This includes the addition of the fusion-aware scheduling
infrastructure for RISC-V and implements idiom recognition for the
fusion patterns supported by VT1.


Philipp Tomsich (2):
  RISC-V: Add basic support for the Ventana-VT1 core
  RISC-V: Add instruction fusion (for ventana-vt1)

 gcc/config/riscv/riscv-cores.def |   2 +
 gcc/config/riscv/riscv-opts.h|   3 +-
 gcc/config/riscv/riscv.c | 210 +++
 gcc/config/riscv/riscv.md|   2 +-
 gcc/doc/invoke.texi  |   4 +-
 5 files changed, 217 insertions(+), 4 deletions(-)

-- 
2.32.0



[PATCH v1 1/2] RISC-V: Add basic support for the Ventana-VT1 core

2021-11-14 Thread Philipp Tomsich
From: Philipp Tomsich 

The Ventana-VT1 core is compatible with rv64gc and Zb[abcs].
This introduces a placeholder -mcpu=ventana-vt1, so tooling and
scripts don't need to change once full support (pipeline, tuning,
etc.) becomes public later.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Add ventana-vt1.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add 
ventana_vt1.
* config/riscv/riscv.c: Add tune-info for ventana-vt1.
* config/riscv/riscv.md (tune): Add ventana_vt1.
* doc/invoke.texi: Add ventana-vt1.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv-cores.def |  2 ++
 gcc/config/riscv/riscv-opts.h|  3 ++-
 gcc/config/riscv/riscv.c | 14 ++
 gcc/config/riscv/riscv.md|  2 +-
 gcc/doc/invoke.texi  |  4 ++--
 5 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index bf5aaba49c3..f6f225d3c5f 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -46,4 +46,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
 
+RISCV_CORE("ventana-vt1", "rv64imafdc_zba_zbb_zbc_zbs","ventana-vt1")
+
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 2efc4b80f1f..32d6a9db1bd 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -52,7 +52,8 @@ extern enum riscv_isa_spec_class riscv_isa_spec;
 /* Keep this list in sync with define_attr "tune" in riscv.md.  */
 enum riscv_microarchitecture_type {
   generic,
-  sifive_7
+  sifive_7,
+  ventana_vt1
 };
 extern enum riscv_microarchitecture_type riscv_microarchitecture;
 
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index df66abeb6ce..6b918db65e9 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -330,6 +330,19 @@ static const struct riscv_tune_param 
optimize_size_tune_info = {
   false,   /* slow_unaligned_access */
 };
 
+/* Costs to use when optimizing for Ventana Micro VT1.  */
+static const struct riscv_tune_param ventana_vt1_tune_info = {
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_add */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
+  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},/* fp_div */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
+  4,   /* issue_rate */
+  4,   /* branch_cost */
+  5,   /* memory_cost */
+  false,   /* slow_unaligned_access */
+};
+
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
@@ -366,6 +379,7 @@ static const struct riscv_tune_info riscv_tune_info_table[] 
= {
   { "sifive-5-series", generic, &rocket_tune_info },
   { "sifive-7-series", sifive_7, &sifive_7_tune_info },
   { "thead-c906", generic, &thead_c906_tune_info },
+  { "ventana-vt1", ventana_vt1, &ventana_vt1_tune_info },
   { "size", generic, &optimize_size_tune_info },
 };
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b06a26bffb3..be7ccc753a4 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -270,7 +270,7 @@ (define_attr "cannot_copy" "no,yes" (const_string "no"))
 ;; Microarchitectures we know how to tune for.
 ;; Keep this in sync with enum riscv_microarchitecture.
 (define_attr "tune"
-  "generic,sifive_7"
+  "generic,sifive_7,ventana_vt1"
   (const (symbol_ref "((enum attr_tune) riscv_microarchitecture)")))
 
 ;; Describe a user's asm statement.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99cdeb90c7c..b5934183a88 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27358,14 +27358,14 @@ by particular CPU name.
 Permissible values for this option are: @samp{sifive-e20}, @samp{sifive-e21},
 @samp{sifive-e24}, @samp{sifive-e31}, @samp{sifive-e34}, @samp{sifive-e76},
 @samp{sifive-s21}, @samp{sifive-s51}, @samp{sifive-s54}, @samp{sifive-s76},
-@samp{sifive-u54}, and @samp{sifive-u74}.
+@samp{sifive-u54}, @samp{sifive-u74}, and @samp{ventana-vt1} .
 
 @item -mtune=@var{processor-string}
 @opindex mtune
 Optimize the output for the given processor, specified by microarchitecture or
 particular CPU name.  Permissible values for this 

[PATCH v1 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2021-11-14 Thread Philipp Tomsich
From: Philipp Tomsich 

The Ventana VT1 core supports quad-issue and instruction fusion.
This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
together and adds idiom matching for the supported fusion cases.
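[Editor's note: reduced to its core, the gating scheme in this patch is a per-tune bitmask of supported idioms plus two predicates. A simplified sketch follows; the enum names mirror the patch, but the surrounding GCC plumbing is omitted:]

```c
#include <assert.h>
#include <stdbool.h>

/* Each fusible idiom is one bit in a per-tune mask.  */
enum riscv_fusion_pairs
{
  RISCV_FUSE_NOTHING  = 0,
  RISCV_FUSE_ZEXTW    = 1 << 0,
  RISCV_FUSE_ZEXTH    = 1 << 1,
  RISCV_FUSE_LUI_ADDI = 1 << 4,
};

/* Stand-in for tune_param->fusible_ops of the selected -mtune.  */
static unsigned int fusible_ops = RISCV_FUSE_ZEXTW | RISCV_FUSE_LUI_ADDI;

/* TARGET_SCHED_MACRO_FUSION_P: does the target fuse anything at all?  */
static bool macro_fusion_p (void)
{
  return fusible_ops != RISCV_FUSE_NOTHING;
}

/* Is one particular idiom enabled for this tune?  */
static bool fusion_enabled_p (enum riscv_fusion_pairs op)
{
  return (fusible_ops & op) != 0;
}
```

The scheduler then asks `fusion_enabled_p` per candidate pair and keeps matching insns adjacent.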

gcc/ChangeLog:

* config/riscv/riscv.c (enum riscv_fusion_pairs): Add symbolic
constants to identify supported fusion patterns.
(struct riscv_tune_param): Add fusible_op field.
(riscv_macro_fusion_p): Implement.
(riscv_fusion_enabled_p): Implement.
(riscv_macro_fusion_pair_p): Implement and recognize fusible
idioms for Ventana VT1.
(TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.c | 196 +++
 1 file changed, 196 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 6b918db65e9..8eac52101a3 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -211,6 +211,19 @@ struct riscv_integer_op {
The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
 #define RISCV_MAX_INTEGER_OPS 8
 
+enum riscv_fusion_pairs
+{
+  RISCV_FUSE_NOTHING = 0,
+  RISCV_FUSE_ZEXTW = (1 << 0),
+  RISCV_FUSE_ZEXTH = (1 << 1),
+  RISCV_FUSE_ZEXTWS = (1 << 2),
+  RISCV_FUSE_LDINDEXED = (1 << 3),
+  RISCV_FUSE_LUI_ADDI = (1 << 4),
+  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
+  RISCV_FUSE_LUI_LD = (1 << 6),
+  RISCV_FUSE_AUIPC_LD = (1 << 7),
+};
+
 /* Costs of various operations on the different architectures.  */
 
 struct riscv_tune_param
@@ -224,6 +237,7 @@ struct riscv_tune_param
   unsigned short branch_cost;
   unsigned short memory_cost;
   bool slow_unaligned_access;
+  unsigned int fusible_ops;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -289,6 +303,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   3,   /* branch_cost */
   5,   /* memory_cost */
   true,/* 
slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -302,6 +317,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   4,   /* branch_cost */
   3,   /* memory_cost */
   true,/* 
slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -328,6 +344,7 @@ static const struct riscv_tune_param 
optimize_size_tune_info = {
   1,   /* branch_cost */
   2,   /* memory_cost */
   false,   /* slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for Ventana Micro VT1.  */
@@ -341,6 +358,10 @@ static const struct riscv_tune_param ventana_vt1_tune_info 
= {
   4,   /* branch_cost */
   5,   /* memory_cost */
   false,   /* slow_unaligned_access */
+  ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |   /* fusible_ops */
+RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
+RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
+RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
 };
 
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -4909,6 +4930,177 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Implement TARGET_SCHED_MACRO_FUSION_P.  Return true if target supports
+   instruction fusion of some sort.  */
+
+static bool
+riscv_macro_fusion_p (void)
+{
+  return tune_param->fusible_ops != RISCV_FUSE_NOTHING;
+}
+
+/* Return true iff the instruction fusion described by OP is enabled.  */
+
+static bool
+riscv_fusion_enabled_p(enum riscv_fusion_pairs op)
+{
+  return tune_param->fusible_ops & op;
+}
+
+/* Implement TARGET_SCHED_MACRO_FUSION_PAIR_P.  Return true if PREV and CURR
+   should be kept together during scheduling.  */
+
+static bool
+riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
+{
+  rtx prev_set = single_set (prev);
+  rtx curr_set = single_set (curr);
+  /* prev and curr are simple SET insns i.e. no flag setting or branching.  */
+  bool simple_sets_p = prev_set && curr_set && !any_condjump_p (curr);
+
+  if (!riscv_macro_fusion_p ())
+return false;
+
+  if (simple_sets_p && (riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW) ||
+   riscv_fusion_enabled_p (RISCV_FUSE_ZEXTH)))
+{
+   

[GCC-11 PATCH] aarch64: enable Ampere-1 CPU (backport to GCC11)

2021-11-15 Thread Philipp Tomsich
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.

The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is
modelled as a 4-wide issue (as with all modern micro-architectures,
the chosen issue rate is a compromise between the maximum dispatch
rate and the maximum rate of uops issued to the scheduler).

This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
core.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64-cost-tables.h: Add extra costs for
Ampere-1.
* config/aarch64/aarch64.c: Add tuning structures for Ampere-1.

(cherry picked from 67b0d47e20e655c0dd53a76ea88aab60fafb2059)

---
This is a backport from master and only affects the AArch64 backend.

OK for GCC-11?

 gcc/config/aarch64/aarch64-cores.def |   3 +-
 gcc/config/aarch64/aarch64-cost-tables.h | 104 +++
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.c |  78 +
 gcc/doc/invoke.texi  |   2 +-
 5 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index b2aa1670561..4643e0e2795 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -68,7 +68,8 @@ AARCH64_CORE("octeontx83",octeontxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH
 AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a2, -1)
 AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a3, -1)
 
-/* Ampere Computing cores. */
+/* Ampere Computing ('\xC0') cores. */
+AARCH64_CORE("ampere1", ampere1, cortexa57, 8_6A, AARCH64_FL_FOR_ARCH8_6, 
ampere1, 0xC0, 0xac3, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, emag, 0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index dd2e7e7cbb1..4b7e4e034a2 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -650,4 +650,108 @@ const struct cpu_cost_table a64fx_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  COSTS_N_INSNS (3),   /* flag_setting.  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (18)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (3),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (3),   /* extend.  */
+  COSTS_N_INSNS (4),   /* add.  */
+  COSTS_N_INSNS (4),   /* extend_add.  */
+  COSTS_N_INSNS (34)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (4), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (5), /* loadf.  */
+COSTS_N_INSNS (5), /* loadd.  */
+COSTS_N_INSNS (5), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (2), /* storef.  */
+COSTS_N_INSNS (2), /* stored.  */
+COSTS_N_INSNS (2), /* store_unaligned.  */
+COSTS_N_INSNS (3), /* loadv.  */
+COSTS_

Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-03-27 Thread Philipp Tomsich
Jeff,

just a heads-up that that trunk (i.e., the soon-to-be GCC14) still
generates the suboptimal sequence:
  https://godbolt.org/z/K9YYEPsvY

Thanks,
Philipp.


On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich  wrote:
>
> On Sun, 20 Nov 2022 at 17:38, Jeff Law  wrote:
> >
> >
> > On 11/9/22 16:10, Philipp Tomsich wrote:
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >  unsigned long f(unsigned int a, unsigned long b)
> > >  {
> > >  a = a << 1;
> > >  unsigned long c = (unsigned long) a;
> > >  c = b + (c<<4);
> > >  return c;
> > >  }
> > > which will present at combine-time as:
> > >  Trying 7, 8 -> 9:
> > >  7: r78:SI=r81:DI#0<<0x1
> > >REG_DEAD r81:DI
> > >  8: r79:DI=zero_extend(r78:SI)
> > >REG_DEAD r78:SI
> > >  9: r72:DI=r79:DI<<0x4
> > >REG_DEAD r79:DI
> > >  Failed to match this instruction:
> > >  (set (reg:DI 72 [ _1 ])
> > >  (and:DI (ashift:DI (reg:DI 81)
> > >  (const_int 5 [0x5]))
> > >   (const_int 68719476704 [0xfffe0])))
> > > and produce the following (optimized) assembly:
> > >  f:
> > >   slliw   a5,a0,1
> > >   sllia5,a5,32
> > >   srlia5,a5,28
> > >   add a0,a5,a1
> > >   ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xul.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of trailing zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >  f:
> > >   sllia0,a0,33
> > >   srlia0,a0,28
> > >   add a0,a0,a1
> > >   ret
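[Editor's illustration: the equivalence the new split relies on can be stated in plain C. For the example above, the mask 68719476704 is 0xfffffffe0, a run of consecutive ones covering bits 5 to 35, and its five trailing zeros equal the shift amount of the combined RTX. All three forms below compute the same value:]

```c
#include <assert.h>
#include <stdint.h>

/* Three-operation form: shift SImode, zero-extend, shift DImode
   (slliw + slli/srli in the unpatched output).  */
static uint64_t three_shift_form (uint64_t a)
{
  uint32_t t = (uint32_t) (a << 1);   /* slliw a, 1 */
  return (uint64_t) t << 4;           /* zero-extend, then << 4 */
}

/* Combine's view: a single shift by 5, masked to bits [5, 35].  */
static uint64_t masked_form (uint64_t a)
{
  return (a << 5) & 0xfffffffe0ull;
}

/* The split's output: shift left by 33, logical shift right by 28.  */
static uint64_t two_shift_form (uint64_t a)
{
  return (a << 33) >> 28;
}
```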
> > >
> > > Unfortunately, all of this causes some fallout (especially in how it
> > > interacts with Zb* extensions and zero_extract expressions formed
> > > during combine): this is addressed through additional instruction
> > > splitting and handling of zero_extract.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
> > >  as an and:DI.
> > >   (*andi_add.uw): New pattern.
> > >   (*slli_slli_uw): New pattern.
> > >   (*shift_then_shNadd.uw): New pattern.
> > >   (*slliuw): Rename to riscv_slli_uw.
> > >   (riscv_slli_uw): Renamed from *slliuw.
> > >   (*zeroextract2_highbits): New pattern.
> > >   (*zero_extract): New pattern, which will be split to
> > >   shift-left + shift-right.
> > >   * config/riscv/predicates.md (dimode_shift_operand):
> > >   * config/riscv/riscv.md (*zero_extract_lowbits):
> > >   (zero_extendsidi2_shifted): Rename.
> > >   (*zero_extendsidi2_shifted): Generalize.
> > >   (*shift_truthvalue): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/riscv/shift-shift-6.c: New test.
> > >   * gcc.target/riscv/shift-shift-7.c: New test.
> > >   * gcc.target/riscv/shift-shift-8.c: New test.
> > >   * gcc.target/riscv/shift-shift-9.c: New test.
> > >   * gcc.target/riscv/snez.c: New test.
> > >
> > > Commit notes:
> > > - Depends on a predicate posted in "RISC-V: Optimize branches testing
> > >a bit-range or a shifted immediate".  Depending on the order of
> > >applying these, I'll take care to pull that part out of the other
> > >patch if needed.
> > >
> > > Version-changes: 2
> > > - refactor
> > > - optimise for additional corner cases and deal with fallout
> > >
> > > Version-changes: 3
> > > - removed the [WIP] from the commit message (no other changes)
> > >
> > > Signed-o

Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-04-06 Thread Philipp Tomsich
On Sat 6. Apr 2024 at 06:52, Jeff Law  wrote:

>
>
> On 3/27/24 4:55 AM, Philipp Tomsich wrote:
> > Jeff,
> >
> > just a heads-up that that trunk (i.e., the soon-to-be GCC14) still
> > generates the suboptimal sequence:
> >https://godbolt.org/z/K9YYEPsvY
> Realistically it's too late to get this into gcc-14.


I didn’t expect this for 14, but wanted to make sure we didn’t forget about
it once the branch for 15 opens up.

Thanks,
Philipp.

>


Re: [PATCH] aarch64: Check the ldp/stp policy model correctly when mem ops are reversed.

2024-01-29 Thread Philipp Tomsich
Applied to master, thanks!
--Philipp.

On Wed, 24 Jan 2024 at 12:43, Richard Sandiford 
wrote:

> Manos Anagnostakis  writes:
> > The current ldp/stp policy framework implementation was missing cases
> > where the memory operands were reversed.  Therefore the call to the
> > framework function is moved after the lower mem check with the suitable
> > parameters.  Also removes the mode of aarch64_operands_ok_for_ldpstp,
> > which becomes unused and triggers a warning on bootstrap.
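[Editor's illustration: the reason operand order matters is that a candidate pair may present its two accesses in either order, while the policy model must always be consulted for the lower address. A simplified sketch with hypothetical names follows; `policy_ok` stands in for the real aarch64 policy model:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy hook: accept a pair starting at OFFSET when it
   is naturally aligned for the paired access (an "aligned" policy).  */
static bool policy_ok (int64_t offset, int64_t size)
{
  return (offset % (2 * size)) == 0;
}

/* The two mems may arrive in either order; normalize to the lower
   address before consulting the policy.  */
static bool pair_ok (int64_t off1, int64_t off2, int64_t size)
{
  int64_t lower  = off1 < off2 ? off1 : off2;
  int64_t higher = off1 < off2 ? off2 : off1;

  /* The accesses must be adjacent to form a pair at all.  */
  if (higher != lower + size)
    return false;

  return policy_ok (lower, size);
}
```

The key property is that `pair_ok` gives the same answer for both operand orders, which is what the fix restores.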
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-ldpstp.md: Remove unused mode.
> > * config/aarch64/aarch64-protos.h
> (aarch64_operands_ok_for_ldpstp):
> >   Likewise.
> > * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> >   Call on framework moved later.
>
> OK, thanks.  The policy infrastructure is new to GCC 14 and so I think
> the change qualifies for stage 4.
>
> Richard
>
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > ---
> >  gcc/config/aarch64/aarch64-ldpstp.md | 22 +++---
> >  gcc/config/aarch64/aarch64-protos.h  |  2 +-
> >  gcc/config/aarch64/aarch64.cc| 18 +-
> >  3 files changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldpstp.md
> b/gcc/config/aarch64/aarch64-ldpstp.md
> > index b668fa8e2a6..b7c0bf05cd1 100644
> > --- a/gcc/config/aarch64/aarch64-ldpstp.md
> > +++ b/gcc/config/aarch64/aarch64-ldpstp.md
> > @@ -23,7 +23,7 @@
> >   (match_operand:GPI 1 "memory_operand" ""))
> > (set (match_operand:GPI 2 "register_operand" "")
> >   (match_operand:GPI 3 "memory_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, true)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, true);
> > @@ -35,7 +35,7 @@
> >   (match_operand:GPI 1 "aarch64_reg_or_zero" ""))
> > (set (match_operand:GPI 2 "memory_operand" "")
> >   (match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, false)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, false);
> > @@ -47,7 +47,7 @@
> >   (match_operand:GPF 1 "memory_operand" ""))
> > (set (match_operand:GPF 2 "register_operand" "")
> >   (match_operand:GPF 3 "memory_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, true)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, true);
> > @@ -59,7 +59,7 @@
> >   (match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
> > (set (match_operand:GPF 2 "memory_operand" "")
> >   (match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, false)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, false);
> > @@ -71,7 +71,7 @@
> >   (match_operand:DREG 1 "memory_operand" ""))
> > (set (match_operand:DREG2 2 "register_operand" "")
> >   (match_operand:DREG2 3 "memory_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, true)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, true);
> > @@ -83,7 +83,7 @@
> >   (match_operand:DREG 1 "register_operand" ""))
> > (set (match_operand:DREG2 2 "memory_operand" "")
> >   (match_operand:DREG2 3 "register_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, false)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, false);
> > @@ -96,7 +96,7 @@
> > (set (match_operand:VQ2 2 "register_operand" "")
> >   (match_operand:VQ2 3 "memory_operand" ""))]
> >"TARGET_FLOAT
> > -   && aarch64_operands_ok_for_ldpstp (operands, true, mode)
> > +   && aarch64_operands_ok_for_ldpstp (operands, true)
> > && (aarch64_tune_params.extra_tuning_flags
> >   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
> >[(const_int 0)]
> > @@ -111,7 +111,7 @@
> > (set (match_operand:VQ2 2 "memory_operand" "")
> >   (match_operand:VQ2 3 "register_operand" ""))]
> >"TARGET_FLOAT
> > -   && aarch64_operands_ok_for_ldpstp (operands, false, mode)
> > +   && aarch64_operands_ok_for_ldpstp (operands, false)
> > && (aarch64_tune_params.extra_tuning_flags
> >   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
> >[(const_int 0)]
> > @@ -128,7 +128,7 @@
> >   (sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
> > (set (match_operand:DI 2 "register_operand" "")
> >   (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
> > +  "aarch6

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Philipp Tomsich
On Wed, 6 Dec 2023 at 23:32, Richard Biener  wrote:
>
> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>  wrote:
> >
> > This is an RTL pass that detects store forwarding from stores to larger 
> > loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for 
> > some benchmarks,
> > through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldp  d31, d17, [sp, #312] # Large load from small store
> >
> > to
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldr  d31, [sp, #312]
> > ldr  d17, [sp, #320]
> >
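[Editor's illustration: the hazard avoided here is a load that covers, but is strictly wider than, a recent store. The containment test at the heart of such a pass can be sketched as follows; the names are hypothetical, not the pass's actual code:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does the load [load_off, load_off + load_size) cover the earlier
   store [store_off, store_off + store_size) while being strictly
   wider?  That is the "small store forwards into larger load" case
   that stalls, as in the str/ldp example above.  */
static bool store_forwarding_stall_p (int64_t store_off, int64_t store_size,
                                      int64_t load_off, int64_t load_size)
{
  bool covered = store_off >= load_off
                 && store_off + store_size <= load_off + load_size;
  return covered && load_size > store_size;
}
```

When the predicate fires, the transformation splits the wide load into narrower loads that each match a forwarding-friendly size.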
> > Currently, the pass is disabled by default on all architectures and enabled 
> > by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on 
> > ampere1/ampere1a,
> > or other architectures as well, without needing to be turned on by this 
> > option.
>
> What is aarch64-specific about the pass?
>
> I see an increasingly large number of target specific passes pop up (probably
> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> and this feels like getting out of hand?

We had an OK from Richard Sandiford on the earlier (v5) version, with
v6 just fixing an obvious bug... so I was about to merge this
when you commented.

Given that this had months of test exposure on our end, I would prefer
to move this forward for GCC14 in its current form.
The project of replacing architecture-specific store-forwarding passes
with a generalized infrastructure could then be addressed in the GCC15
timeframe (or beyond)?

--Philipp.

>
> The x86 backend also has its store-forwarding "pass" as part of mdreorg
> in ix86_split_stlf_stall_load.
>
> Richard.
>
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> > * config/aarch64/aarch64-protos.h 
> > (make_pass_avoid_store_forwarding): Declare.
> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
> > (aarch64-store-forwarding-threshold): New param.
> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > * doc/invoke.texi: Document new option and new param.
> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > Co-Authored-By: Philipp Tomsich 
> > ---
> > Changes in v6:
> > - An obvious change. insn_cnt was incremented only on
> >   stores and not for every insn in the bb. Now restored.
> >
> >  gcc/config.gcc|   1 +
> >  gcc/config/aarch64/aarch64-passes.def |   1 +
> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
> >  gcc/config/aarch64/aarch64.opt|   9 +
> >  gcc/config/aarch64/t-aarch64  |  10 +
> >  gcc/doc/invoke.texi   |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> >  10 files changed, 449 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 6450448f2f0..7c48429eb82 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> > cxx_target_objs="aarch64-c.o"
> > d_target_objs="aarch64-d.o"
> > extra_objs="aarch64-builtins.o aarch-common.o 
> > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
> > aarch64-sve-builtins-base.o aarch64-sve-b

Re: [PATCH v2 05/11] riscv: thead: Add support for the XTheadBa ISA extension

2022-12-19 Thread Philipp Tomsich
On Mon, 19 Dec 2022 at 05:20, Kito Cheng  wrote:
>
> LGTM with a nit:
>
> ...
> > +  "TARGET_XTHEADBA
> > +   && (INTVAL (operands[2]) >= 0) && (INTVAL (operands[2]) <= 3)"
>
> IN_RANGE(INTVAL(operands[2]), 0, 3)
>
> and I am a little bit surprised it can be zero

So was I, when reading the specification — and I reconfirmed that bit
by checking with the folks at T-Head.

We discussed this internally before submitting: while this case should
never occur (as other pieces in the compiler are smart enough to
simplify the RTX), we decided to include the 0 as it is an accurate
reflection of the instruction semantics.

Philipp.

>
> > +  "th.addsl\t%0,%1,%3,%2"
> > +  [(set_attr "type" "bitmanip")
> > +   (set_attr "mode" "")])
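[Editor's note: for readers unfamiliar with XTheadBa, th.addsl computes rd = rs1 + (rs2 << imm2) with imm2 restricted to [0, 3]; imm2 = 0 degenerates to a plain add, which is the surprising-but-valid case discussed above. A behavioural sketch:]

```c
#include <assert.h>
#include <stdint.h>

/* Behavioural model of XTheadBa th.addsl: rd = rs1 + (rs2 << imm2),
   with the shift amount limited to the range [0, 3].  */
static uint64_t th_addsl (uint64_t rs1, uint64_t rs2, unsigned imm2)
{
  assert (imm2 <= 3);   /* mirrors the pattern's range check */
  return rs1 + (rs2 << imm2);
}
```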


Re: [RFC PATCH] RISC-V: Add support for vector crypto extensions

2022-12-27 Thread Philipp Tomsich
On Tue, 27 Dec 2022 at 19:58, Palmer Dabbelt  wrote:
>
> On Tue, 27 Dec 2022 09:35:55 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >
> >
> > On 12/21/22 11:31, Christoph Muellner wrote:
> >> From: Christoph Müllner 
> >>
> >> This series adds basic support for the vector crypto extensions:
> >> * Zvkb
> >> * Zvkg
> >> * Zvkh[a,b]
> >> * Zvkn
> >> * Zvksed
> >> * Zvksh
> >>
> >> The implementation follows the version 20221220 of the specification,
> >> which can be found here:
> >>https://github.com/riscv/riscv-crypto/releases/tag/v20221220
> >>
> >> Note, that this specification is not frozen yet, meaning that
> >> incompatible changes are possible.
> >> Therefore, this patchset is marked as RFC and should not be considered
> >> for upstream inclusion.
> >>
> >> All extensions come with (passing) tests for the feature test macros.
> >>
> >> A Binutils patch series for vector crypto support can be found here:
> >>https://sourceware.org/pipermail/binutils/2022-December/125272.html
> >>
> >> Signed-off-by: Christoph Müllner 
> >> ---
> >>   gcc/common/config/riscv/riscv-common.cc | 16 
> >>   gcc/config/riscv/riscv-opts.h   | 16 
> >>   gcc/config/riscv/riscv.opt  |  3 +++
> >>   gcc/testsuite/gcc.target/riscv/zvkb.c   | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkg.c   | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkha.c  | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkhb.c  | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkn.c   | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvksed.c | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvksh.c  | 13 +
> >>   10 files changed, 126 insertions(+)
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkg.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkha.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkhb.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkn.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvksed.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvksh.c
> > I don't see anything objectionable in here.  I'd guess that most (but
> > perhaps not all) of these will wire up as builtins at some point in the
> > not too distant future.
>
> These allow things like `-march=rv64gc_zvksh`, it's not really clear
> what the intended behavior is there -- specifically, does that
> implicitly enable some base vector extension?
>
> I've just skimmed the ISA manual here, but all I can find is a bit
> ambiguous
>
> With the exception of Zvknhb, each of these Vector Crypto Extensions
> can be build on any base Vector Extension, embedded (Zve*) or
> application ("V"). Zvknhb requires ELEN=64 and therefore cannot be
> implemented on a Zve32* base.
>
> I doubt it really matters which way we pick, but it is something we're
> going to need to keep consistent moving forwards as otherwise users
> might get some surprising behavior.  This has come up a bunch of times,
> but there's slightly different wording each time in the specs and I'm
> never really sure what to make of it.
>
> I don't think that alone would be enough to delay this for gcc-14, but
> as far as I can tell binutils is branching very soon for a target
> release in the middle of January.  I'm guessing these extensions will
> not be frozen by then, which would be a blocker.
>
> I'm not sure if anyone has a pressing need for these?  If not, I think
> it's best to delay them until binutils-2.41 (and presumably then
> gcc-14).

Given that the encodings last changed on Dec 21st, I would also prefer
if we could hold off until after binutils-2.40 has been released.

Philipp.


Re: [PATCH v2 01/11] riscv: attr: Synchronize comments with code

2022-12-27 Thread Philipp Tomsich
Applied to master, thanks!

Philipp.

On Mon, 19 Dec 2022 at 03:49, Kito Cheng  wrote:

> LGTM, you can commit this separately if you want :)
>
> On Mon, Dec 19, 2022 at 9:09 AM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > The comment above the enumeration of existing attributes got out of
> > order and a few entries were forgotten.
> > This patch synchronizes the comments according to the list.
> > This commit does not include any functional change.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.md: Sync comments with code.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/config/riscv/riscv.md | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index df57e2b0b4a..a8bb331f25c 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -220,7 +220,6 @@ (define_attr "enabled" "no,yes"
> >  ;; mfc transfer from coprocessor
> >  ;; const   load constant
> >  ;; arith   integer arithmetic instructions
> > -;; auipc   integer addition to PC
> >  ;; logical  integer logical instructions
> >  ;; shift   integer shift instructions
> >  ;; slt set less than instructions
> > @@ -236,9 +235,13 @@ (define_attr "enabled" "no,yes"
> >  ;; fcvtfloating point convert
> >  ;; fsqrt   floating point square root
> >  ;; multi   multiword sequence (or user asm statements)
> > +;; auipc   integer addition to PC
> > +;; sfb_alu  SFB ALU instruction
> >  ;; nop no operation
> >  ;; ghost   an instruction that produces no real code
> >  ;; bitmanipbit manipulation instructions
> > +;; rotate   rotation instructions
> > +;; atomic   atomic instructions
> >  ;; Classification of RVV instructions which will be added to each RVV
> .md pattern and used by scheduler.
> >  ;; rdvlenb vector byte length vlenb csrr read
> >  ;; rdvlvector length vl csrr read
> > --
> > 2.38.1
> >
>


Re: [PATCH v2 02/11] riscv: Restructure callee-saved register save/restore code

2022-12-27 Thread Philipp Tomsich
Applied to master (with the change from the reviews), thanks!

Philipp.

On Mon, 19 Dec 2022 at 07:30, Kito Cheng  wrote:

> just one more nit: Use INVALID_REGNUM as sentinel value for
> riscv_next_saved_reg, otherwise LGTM, and feel free to commit that
> separately :)
>
> On Mon, Dec 19, 2022 at 9:08 AM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch restructures the loop over the GP registers
> > which saves/restores then as part of the prologue/epilogue.
> > No functional change is intended by this patch, but it
> > offers the possibility to use load-pair/store-pair instructions.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.cc (riscv_next_saved_reg): New function.
> > (riscv_is_eh_return_data_register): New function.
> > (riscv_for_each_saved_reg): Restructure loop.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/config/riscv/riscv.cc | 94 +++
> >  1 file changed, 66 insertions(+), 28 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 6dd2ab2d11e..a8d5e1dac7f 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -4835,6 +4835,49 @@ riscv_save_restore_reg (machine_mode mode, int
> regno,
> >fn (gen_rtx_REG (mode, regno), mem);
> >  }
> >
> > +/* Return the next register up from REGNO up to LIMIT for the callee
> > +   to save or restore.  OFFSET will be adjusted accordingly.
> > +   If INC is set, then REGNO will be incremented first.  */
> > +
> > +static unsigned int
> > +riscv_next_saved_reg (unsigned int regno, unsigned int limit,
> > + HOST_WIDE_INT *offset, bool inc = true)
> > +{
> > +  if (inc)
> > +regno++;
> > +
> > +  while (regno <= limit)
> > +{
> > +  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> > +   {
> > + *offset = *offset - UNITS_PER_WORD;
> > + break;
> > +   }
> > +
> > +  regno++;
> > +}
> > +  return regno;
> > +}
> > +
> > +/* Return TRUE if provided REGNO is eh return data register.  */
> > +
> > +static bool
> > +riscv_is_eh_return_data_register (unsigned int regno)
> > +{
> > +  unsigned int i, regnum;
> > +
> > +  if (!crtl->calls_eh_return)
> > +return false;
> > +
> > +  for (i = 0; (regnum = EH_RETURN_DATA_REGNO (i)) != INVALID_REGNUM;
> i++)
> > +if (regno == regnum)
> > +  {
> > +   return true;
> > +  }
> > +
> > +  return false;
> > +}
> > +
> >  /* Call FN for each register that is saved by the current function.
> > SP_OFFSET is the offset of the current stack pointer from the start
> > of the frame.  */
> > @@ -4844,36 +4887,31 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
> riscv_save_restore_fn fn,
> >   bool epilogue, bool maybe_eh_return)
> >  {
> >HOST_WIDE_INT offset;
> > +  unsigned int regno;
> > +  unsigned int start = GP_REG_FIRST;
> > +  unsigned int limit = GP_REG_LAST;
> >
> >/* Save the link register and s-registers. */
> > -  offset = (cfun->machine->frame.gp_sp_offset - sp_offset).to_constant
> ();
> > -  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> > -if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> > -  {
> > -   bool handle_reg =
> !cfun->machine->reg_is_wrapped_separately[regno];
> > -
> > -   /* If this is a normal return in a function that calls the
> eh_return
> > -  builtin, then do not restore the eh return data registers as
> that
> > -  would clobber the return value.  But we do still need to save
> them
> > -  in the prologue, and restore them for an exception return, so
> we
> > -  need special handling here.  */
> > -   if (epilogue && !maybe_eh_return && crtl->calls_eh_return)
> > - {
> > -   unsigned int i, regnum;
> > -
> > -   for (i = 0; (regnum = EH_RETURN_DATA_REGNO (i)) !=
> INVALID_REGNUM;
> > -i++)
> > - if (regno == regnum)
> > -   {
> > - handle_reg = FALSE;
> > - break;
> > -   }
> > - }
> > -
> > -   if (handle_reg)
> > - riscv_save_restore_reg (word_mode, regno, offset, fn);
> > -   offset -= UNITS_PER_WORD;
> > -  }
> > +  offset = (cfun->machine->frame.gp_sp_offset - sp_offset).to_constant
> ()
> > +  + UNITS_PER_WORD;
> > +  for (regno = riscv_next_saved_reg (start, limit, &offset, false);
> > +   regno <= limit;
> > +   regno = riscv_next_saved_reg (regno, limit, &offset))
> > +{
> > +  if (cfun->machine->reg_is_wrapped_separately[regno])
> > +   continue;
> > +
> > +  /* If this is a normal return in a function that calls the
> eh_return
> > +builtin, then do not restore the eh return data registers as
> that
> > +would clobber the return value.  But we do still need to save
> them
> > +in the prol

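The restructured iteration from the patch above can be illustrated outside of GCC. The following is a hypothetical standalone sketch of the same pattern: walking a callee-save bit mask with a helper that both finds the next saved register and keeps the running stack offset in sync. The names and constants (`next_saved_reg`, `WORD_SIZE`, `REG_LIMIT`, the demo mask) are invented for illustration; the real `riscv_next_saved_reg` reads `cfun->machine->frame.mask` and uses `UNITS_PER_WORD`.

```c
#define WORD_SIZE 8   /* stand-in for UNITS_PER_WORD on rv64 */
#define REG_LIMIT 31  /* stand-in for GP_REG_LAST - GP_REG_FIRST */

/* Return the next register in MASK at index >= REGNO (strictly greater
   if INC is nonzero), decrementing *OFFSET by one word when a saved
   register is found.  A return value above REG_LIMIT terminates the
   walk, matching the "regno <= limit" loop condition in the patch.  */
static unsigned
next_saved_reg (unsigned mask, unsigned regno, long *offset, int inc)
{
  if (inc)
    regno++;

  while (regno <= REG_LIMIT)
    {
      if (mask & (1u << regno))
	{
	  *offset -= WORD_SIZE;
	  break;
	}
      regno++;
    }
  return regno;
}
```

With a mask containing, say, ra/s0/s1 (bits 1, 8 and 9) and a starting offset one word past the save area, the walk visits exactly three registers and consumes three words, so each visited register is paired with its stack slot just as in the restructured prologue/epilogue loop.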
Re: [PATCH] RISC-V: Optimize min/max with SImode sources on 64-bit

2022-12-29 Thread Philipp Tomsich
On Wed, 28 Dec 2022 at 19:18, Raphael Moreira Zinsly <
rzin...@ventanamicro.com> wrote:

> The Zbb min/max pattern was not matching 32-bit sources when
> compiling for 64-bit.
> This patch separates the pattern into SImode and DImode, and
> uses a define_expand to handle SImode on 64-bit.
> zbb-min-max-02.c generates different code as a result of the new
> expander.  The resulting code is as efficient as the old code.
> Furthermore, the special sh1add pattern that appeared in
> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md
> (<bitmanip_optab><mode>3): Divide pattern into
> <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> (<bitmanip_optab>si3): Handle SImode sources on
> TARGET_64BIT.
>
> gcc/testsuite:
>
> * gcc.target/riscv/zbb-abs.c: New test.
> * gcc.target/riscv/zbb-min-max-02.c: Adapt the
> expected output.
> ---
>  gcc/config/riscv/bitmanip.md  | 38 ---
>  gcc/testsuite/gcc.target/riscv/zbb-abs.c  | 18 +
>  .../gcc.target/riscv/zbb-min-max-02.c |  2 +-
>  3 files changed, 52 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index d17133d58c1..abf08a29e89 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -360,14 +360,42 @@
>DONE;
>  })
>
> -(define_insn "<bitmanip_optab><mode>3"
> -  [(set (match_operand:X 0 "register_operand" "=r")
> -(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> -  (match_operand:X 2 "register_operand" "r")))]
> -  "TARGET_ZBB"
> +(define_insn "<bitmanip_optab>si3_insn"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +(match_operand:SI 2 "register_operand" "r")))]
> +  "!TARGET_64BIT && TARGET_ZBB"
>"\t%0,%1,%2"
>[(set_attr "type" "bitmanip")])
>
> +(define_insn "<bitmanip_optab>di3"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +(bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
> +(match_operand:DI 2 "register_operand" "r")))]
> +  "TARGET_64BIT && TARGET_ZBB"
> +  "\t%0,%1,%2"
> +  [(set_attr "type" "bitmanip")])
> +
> +(define_expand "<bitmanip_optab>si3"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +(match_operand:SI 2 "register_operand" "r")))]
> +  "TARGET_ZBB"
> +  "
> +{
> +  if (TARGET_64BIT)
> +{
> +  rtx op1_x = gen_reg_rtx (DImode);
> +  emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
> +  rtx op2_x = gen_reg_rtx (DImode);
> +  emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
> +  rtx dst_x = gen_reg_rtx (DImode);
> +  emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
> +  emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
> +  DONE;
> +}
> +}")
>

We have two issues around min/max here:
1. That it doesn't apply to the SImode abs case (which is due to
expand_abs_nojump() blindly testing for the current mode in smax_optab).
2. That we have to reduce the number of extensions to the least amount.

The above addresses expand_abs_nojump(), but makes the general solution
harder as the middle-end needs to know there is no native SImode min/max
available.
We still plan (proof-of-concept works, but a final patch will likely not be
ready before very late in January) to submit a patch to improve the
expansion of MIN_EXPR/MAX_EXPR that utilizes the type-precision and
value-ranges to not even create the sign-extensions in the first place.  If
we do the above, the middle-end will blindly emit this sequence with the 2
sign-extensions — which may or may not be eliminated later by combining
with a w-form.
I'll also add an enhancement to expand_abs_nojump() to our list of changes
for the min/max enhancement during the lowering.

Note that, if we decide to go ahead with using this as a temporary solution
until our change is ready, you'll also need to add a cost for the SImode
max.

Philipp.


> +
>  ;; Optimize the common case of a SImode min/max against a constant
>  ;; that is safe both for sign- and zero-extension.
>  (define_insn_and_split "*minmax"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> new file mode 100644
> index 000..6ef7efdbd49
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +#define ABS(x) (((x) >= 0) ? (x) : -(x))
> +
> +int
> +foo (int x)
> +{
> +  return ABS(x);
> +}
> +
> +/* { dg-final { scan-assembler-times "neg" 1 } } */
> +/* { dg-final { scan-assemble
