Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread YunQiang Su
> > Yes. I also guess so.  Any new idea?
> Well, I see multiple intertwined issues and I think MIPS has largely
> mucked this up.
>
> At a high level DI -> SI truncation is not a nop on MIPS64.  We must
> explicitly sign extend the value from SI->DI to preserve the invariant
> that SI mode objects are extended to DImode.  If we fail to do that,
> then the SImode conditional branch patterns simply aren't going to work.
>

MIPS64 never claims that DI -> SI is a nop; instead, it claims that SI -> DI
is a nop. Also, MIPS64 has only one type of branch, which works for both SI
and DI.

MIPS64 has 3 groups of instructions:
1. The instructions inherited from MIPS32, including the 32-bit calculations.
   These instructions expect their source values to be properly sign-extended;
   otherwise, the result is UNPREDICTABLE.
   They also make sure that their destinations are sign-extended.
   Let's use INS here as an example.
2. The newly added 64-bit ops.
   Let's use DINS here as an example.
3. Branch instructions.
   They operate on the full value of the registers.

> What doesn't make sense to me is that for truncation, the output mode is
> going to be smaller than the input mode.  Which makes logical sense and
> is codified in the documentation:
>
> > @deftypefn {Target Hook} bool TARGET_TRULY_NOOP_TRUNCATION (poly_uint64 
> > @var{outprec}, poly_uint64 @var{inprec})
> > This hook returns true if it is safe to ``convert'' a value of
> > @var{inprec} bits to one of @var{outprec} bits (where @var{outprec} is
> > smaller than @var{inprec}) by merely operating on it as if it had only
> > @var{outprec} bits.  The default returns true unconditionally, which
> > is correct for most machines.  When @code{TARGET_TRULY_NOOP_TRUNCATION}
> > returns false, the machine description should provide a @code{trunc}
> > optab to specify the RTL that performs the required truncation.
>
>
> Yet the implementation in the mips backend:
>
> > static bool
> > mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
> > {
> >   return !TARGET_64BIT || inprec <= 32 || outprec > 32;
> > }
>
>
> Can you verify what values are getting in here?  If we're being called
> with inprec as 32 and outprec as 64, we're going to return true which
> makes absolutely no sense at all.
>

One of the key design rules of MIPS is to avoid hardware mode switches as
much as possible.

Converting from 32 to 64 bits is indeed a nop, IF the 32-bit value is properly
sign-extended. Because of the nature of the 1st group of instructions (the
result is always sign-extended), the MIPS software stack carefully keeps SI
values sign-extended.
So MIPS64 can claim that SI -> DI is a nop.
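
To make the argument order concrete, here is a minimal standalone sketch of
the predicate quoted above. It is only a model: `target_64bit` is a
hypothetical stand-in for GCC's TARGET_64BIT, plain unsigned values replace
poly_uint64, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the quoted mips_truly_noop_truncation body.  TARGET_64BIT is
   passed in as a plain flag (an assumption made for this sketch).  */
static bool
model_truly_noop_truncation (bool target_64bit,
                             unsigned outprec, unsigned inprec)
{
  return !target_64bit || inprec <= 32 || outprec > 32;
}

/* Walk through the cases discussed in the thread.  */
static void
check_truncation_model (void)
{
  /* 64-bit target, DI -> SI (outprec 32, inprec 64): not a nop.  */
  assert (!model_truly_noop_truncation (true, 32, 64));
  /* 64-bit target, called "backwards" with outprec 64, inprec 32:
     returns true, which is the case Jeff asks about.  */
  assert (model_truly_noop_truncation (true, 64, 32));
  /* 64-bit target, SI -> HI: treated as a nop.  */
  assert (model_truly_noop_truncation (true, 16, 32));
  /* 32-bit target: every truncation is treated as a nop.  */
  assert (model_truly_noop_truncation (false, 32, 64));
}
```

On a 64-bit target, the only case this model reports as a real truncation is
one that narrows a value wider than 32 bits down to 32 bits or fewer.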

With this design, 32-bit software is supported smoothly on 64-bit hardware,
without any mode switch.

The only exception is here: the 64-bit bit operations pollute an SImode hard
register:
lbu  $3,3($4)
dins $2,$3,24,8
bltz $2,.L2
LBU loads a value from memory; that is fine so far.
DINS then inserts it into $2 and pollutes bit 31: since DINS is a newly added
MIPS64 instruction, it won't copy bit 31 to bits [32:63].
This leaves $2 as a malformed SImode hard register (bits 31-63 are not all
the same).
BLTZ will then treat it as a normal DI/64-bit value.

We have two options to solve this problem:
1. Use INS instead of DINS.
In the original C code, the variable `val` is an SImode value.
So we should use INS here, as INS makes sure the result is properly
sign-extended.
That's what I tried with the previous patches:
[V1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624174.html
[V2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626137.html

2. Add a truncate or sign_extend instruction right after the instruction that
may pollute bits 31+ of an SImode register.
That's what I am trying to do with this V3 patch.
The resulting asm code will look like this:
lbu  $3,3($4)
dins $2,$3,24,8
sll  $2,$2,0   # <--- newly added instruction
bltz $2,.L2
Background: SLL (shift left logical) is a special instruction in MIPS:
 1. SLL rX,rY,0 on MIPS64 is a 32-bit sign-extend operation.
 2. SLL r0,r0,0 is the NOP instruction.
 3. SLL r0,r0,1 is SSNOP (Superscalar No Operation).
 4. SLL r0,r0,3 is EHB (Execution Hazard Barrier).
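
The sequence above can be modelled with a small C sketch. This is only an
illustration of the described behaviour, not code from the patch: the helper
names (sll0, dins, ins, bltz_taken, is_canonical_si) and the sample register
values are made up, and a register is modelled as a plain 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of REG to 64 bits, which is what
   "sll rX,rY,0" is described as doing on MIPS64.  */
static uint64_t
sll0 (uint64_t reg)
{
  uint64_t low = reg & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* True if REG is a well-formed SImode value: bits 63..31 are all equal.  */
static bool
is_canonical_si (uint64_t reg)
{
  return reg == sll0 (reg);
}

/* DINS model: copy the low SIZE bits of RS into RT at bit POS and leave
   every other bit of RT untouched (no sign extension).  */
static uint64_t
dins (uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
  assert (size >= 1 && pos + size <= 32);
  uint64_t mask = (UINT64_C (1) << size) - 1;
  return (rt & ~(mask << pos)) | ((rs & mask) << pos);
}

/* INS model: same insertion, with the result sign-extended from bit 31.
   Inputs are assumed to be canonical.  */
static uint64_t
ins (uint64_t rt, uint64_t rs, unsigned pos, unsigned size)
{
  return sll0 (dins (rt, rs, pos, size));
}

/* BLTZ model: the branch looks at the full 64-bit register.  */
static bool
bltz_taken (uint64_t reg)
{
  return (reg >> 63) != 0;
}

static void
check_pollution_model (void)
{
  uint64_t r2 = UINT64_C (0x00112233);  /* sample canonical SImode value */
  uint64_t r3 = UINT64_C (0x80);        /* sample byte loaded by lbu */

  uint64_t polluted = dins (r2, r3, 24, 8);
  assert (polluted == UINT64_C (0x0000000080112233));
  assert (!is_canonical_si (polluted));
  assert (!bltz_taken (polluted));      /* wrong for a negative int */

  /* Appending "sll $2,$2,0" restores the invariant.  */
  assert (sll0 (polluted) == UINT64_C (0xffffffff80112233));
  assert (bltz_taken (sll0 (polluted)));
  /* In this model, INS gives the same result as DINS followed by SLL.  */
  assert (ins (r2, r3, 24, 8) == sll0 (polluted));
}
```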




> Jeff


Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread YunQiang Su
Roger Sayle wrote on Sun, Dec 24, 2023 at 08:49:
>
>
> Hi YunQiang (and Jeff),
>
> > MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true
> > based on that the hard register is always sign-extended, but here
> > the hard register is polluted by zero_extract.
>
> I suspect that the bug here is that the MIPS backend shouldn't be returning
> true for TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode).   It's true
> that the backend stores SImode values in DImode registers by sign extending
> them, but this doesn't mean that any DImode pseudo register can be truncated
> to an SImode pseudo just by SUBREG/register naming.  As you point out, if
> the high bits of a DImode value are random, truncation isn't a no-op, and
> requires an explicit sign-extension instruction.
>

Yes, you are right. In most cases, though, the software/compiler is careful:
when working with SI variables, only these instructions are used:
   1. instructions whose result is properly sign-extended, normally the
      instructions from MIPS32;
   2. or LW to load the value: LW also sign-extends the register.

> There's a PR in Bugzilla around this representational issue on MIPS, but I
> can't find it straight away.
>
> Out of curiosity, how badly affected is the testsuite if mips.cc's
> mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
> is changed to just return !TARGET_64BIT ?
>

It will cause some performance regression. Using our new test as an example,
the result will be:
lbu  $2,0($4)
SLL # newly added
lbu  $3,1($4)
dins $2,$3,8,8
SLL # newly added
lbu  $3,2($4)
dins $2,$3,16,8
SLL # newly added
lbu  $3,3($4)
dins $2,$3,24,8
sll   $2,$2

In fact, if we replace all DINS with INS here, as my previous patch did,
it will just work.

> I agree with Jeff there's an invariant that isn't correctly being modelled
> by the MIPS machine description.  A machine description probably shouldn't
> define an addsi3  pattern if what it actually supports is
> (sign_extend:DI (truncate:SI (plus:DI (reg:DI x) (reg:DI y))))
> Trying to model this as SImode addition plus a SUBREG_PROMOTED flag
> is less than ideal.
>

The addsi3 on MIPS64 is like:
(define_insn "*addsi3_extended"
  [(set (match_operand:DI 0 "register_operand" "=d,d")
        (sign_extend:DI
             (plus:SI (match_operand:SI 1 "register_operand" "d,d")
                      (match_operand:SI 2 "arith_operand" "d,Q"))))]
  "TARGET_64BIT && !TARGET_MIPS16"
  "@
    addu\t%0,%1,%2
    addiu\t%0,%1,%2"
  [(set_attr "alu_type" "add")
   (set_attr "mode" "SI")])

It expects the source registers to be in SImode. In fact, according to the ISA
documents, the result of ADD/ADDU is UNPREDICTABLE if the sources are not
properly sign-extended.
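
As a rough illustration of what this pattern expresses, the following C
sketch models the RTL semantics (a 32-bit add whose result is sign-extended
to 64 bits). The function names are invented for this sketch, and it says
nothing about what real hardware does with non-canonical inputs, which the
ISA leaves UNPREDICTABLE.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of REG to 64 bits.  */
static uint64_t
sext32 (uint64_t reg)
{
  uint64_t low = reg & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* Model of (sign_extend:DI (plus:SI op1 op2)): add the low 32 bits with
   wrap-around and sign-extend the 32-bit sum.  */
static uint64_t
model_addsi3_extended (uint64_t op1, uint64_t op2)
{
  return sext32 (op1 + op2);
}

static void
check_addsi3_model (void)
{
  /* 1 + 2 stays a small positive value.  */
  assert (model_addsi3_extended (1, 2) == 3);
  /* INT32_MAX + 1 wraps to INT32_MIN and is sign-extended.  */
  assert (model_addsi3_extended (UINT64_C (0x7fffffff), 1)
          == UINT64_C (0xffffffff80000000));
  /* -1 + 1 gives 0; the carry out of bit 31 is dropped.  */
  assert (model_addsi3_extended (UINT64_C (0xffffffffffffffff), 1) == 0);
}
```

The result is always in canonical SImode form, which is why a value produced
this way can be reused as DImode without a separate extension.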


> Just my thoughts.  I'm curious what other folks think.
>
> Cheers,
> Roger
> --
>
>


RE: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread Roger Sayle


> What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a
> truncation!  The output precision is first, the input precision is second.
> The docs explicitly state the output precision should be smaller than the
> input precision (which makes sense for truncation).
> 
> That's where I'd start with trying to untangle this mess.

Thanks (both) for correcting my misunderstanding.
At the very least might I suggest that we introduce a new
TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS
can use for this purpose?  It'd help reduce confusion, and keep
the documentation/function naming correct.

When Richard Sandiford "hookized" truly_noop_truncation in 2017
https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg00836.html
he mentions the inprec/outprec confusion [deciding not to add a
gcc_assert outprec < inprec here, which might be a good idea].

The next question is whether this is just
TRULY_NOOP_SIGN_EXTENSION_MODES_P
or whether there are any targets that usefully ensure some modes
are zero-extended forms of others.  TRULY_NOOP_ZERO_EXTENSION...

My vote is that a DINS instruction that updates the most significant
bit of an SImode value should then expand or define_insn_and_split
with an explicit following sign-extension operation.  To avoid this being
eliminated by the RTL optimizers/combine the DINS should return a
DImode result, with the following extension truncating it to canonical
SImode form.  This preserves the required backend invariant (and
doesn't require tweaking machine-independent code in combine).
SImode DINS instructions that don't/can't affect the MSB, can be a
single SImode instruction.
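
The distinction between inserts that can and cannot affect the MSB can be
sketched as a simple predicate. This is a hypothetical helper written for
illustration, not code from the MIPS backend; it assumes a bit-field insert
described by a start bit `pos` and a width `size` within the low 32 bits.

```c
#include <assert.h>
#include <stdbool.h>

/* True if inserting SIZE bits at bit POS can change bit 31 of the
   destination, in which case a following sign extension is needed to
   keep the SImode value canonical.  */
static bool
insert_may_change_msb (unsigned pos, unsigned size)
{
  assert (size >= 1 && pos + size <= 32);
  return pos + size > 31;
}

/* Count how many of N inserts need a following sign extension.  */
static unsigned
count_fixups (const unsigned pos[], const unsigned size[], unsigned n)
{
  unsigned fixups = 0;
  for (unsigned i = 0; i < n; i++)
    if (insert_may_change_msb (pos[i], size[i]))
      fixups++;
  return fixups;
}

static void
check_msb_predicate (void)
{
  /* The three byte inserts from the example earlier in the thread.  */
  static const unsigned pos[] = { 8, 16, 24 };
  static const unsigned size[] = { 8, 8, 8 };

  assert (!insert_may_change_msb (8, 8));
  assert (!insert_may_change_msb (16, 8));
  assert (insert_may_change_msb (24, 8));
  /* Only the last insert reaches bit 31.  */
  assert (count_fixups (pos, size, 3) == 1);
}
```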

Cheers,
Roger
--




Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread YunQiang Su
Roger Sayle wrote on Sun, Dec 24, 2023 at 16:51:
>
>
> > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a
> > truncation!  The output precision is first, the input precision is second.
> > The docs explicitly state the output precision should be smaller than the
> > input precision (which makes sense for truncation).
> >
> > That's where I'd start with trying to untangle this mess.
>
> Thanks (both) for correcting my misunderstanding.
> At the very least might I suggest that we introduce a new
> TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS
> can use for this purpose?  It'd help reduce confusion, and keep
> the documentation/function naming correct.
>

Yes, that works for me.
T_N_T_M_P is a really confusing name.

> When Richard Sandiford "hookized" truly_noop_truncation in 2017
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg00836.html
> he mentions the inprec/outprec confusion [deciding not to add a
> gcc_assert outprec < inprec here, which might be a good idea].
>
> The next question is whether this is just
> TRULY_NOOP_SIGN_EXTENSION_MODES_P
> or whether there are any targets that usefully ensure some modes
> are zero-extended forms of others.  TRULY_NOOP_ZERO_EXTENSION...
>

I guess ARM64 would be the one for TRULY_NOOP_ZERO_EXTENSION?

> My vote is that a DINS instruction that updates the most significant
> bit of an SImode value should then expand or define_insn_and_split
> with an explicit following sign-extension operation.  To avoid this being
> eliminated by the RTL optimizers/combine the DINS should return a
> DImode result, with the following extension truncating it to canonical

Is it this one?
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626137.html

> SImode form.  This preserves the required backend invariant (and
> doesn't require tweaking machine-independent code in combine).
> SImode DINS instructions that don't/can't affect the MSB, can be a
> single SImode instruction.
>

Yes. On most MIPS microarchitectures, INS may have slightly better
performance than DINS.

However, I am worried: will somebody do something like
INS ,,24,8
In this case, if  is not sign-extended, the result will be
UNPREDICTABLE.
Because of this, for now I prefer to use DINS and append an SLL.

I tried to write C code that can produce this case, but have not yet
succeeded.


> Cheers,
> Roger
> --
>
>


Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread Andrew Pinski
On Sun, Dec 24, 2023, 01:18 YunQiang Su  wrote:

> Roger Sayle wrote on Sun, Dec 24, 2023 at 16:51:
> >
> >
> > > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a
> > > truncation!  The output precision is first, the input precision is
> > > second.  The docs explicitly state the output precision should be
> > > smaller than the input precision (which makes sense for truncation).
> > >
> > > That's where I'd start with trying to untangle this mess.
> >
> > Thanks (both) for correcting my misunderstanding.
> > At the very least might I suggest that we introduce a new
> > TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS
> > can use for this purpose?  It'd help reduce confusion, and keep
> > the documentation/function naming correct.
> >
>
> Yes. It is good for me.
> T_N_T_M_P is a really confusion naming.
>
> > When Richard Sandiford "hookized" truly_noop_truncation in 2017
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2017-09/msg00836.html
> > he mentions the inprec/outprec confusion [deciding not to add a
> > gcc_assert outprec < inprec here, which might be a good idea].
> >
> > The next question is whether this is just
> > TRULY_NOOP_SIGN_EXTENSION_MODES_P
> > or whether there are any targets that usefully ensure some modes
> > are zero-extended forms of others.  TRULY_NOOP_ZERO_EXTENSION...
> >
>
> I guess ARM64 is the one TRULY_NOOP_ZERO_EXTENSION?
>

I am not 100% convinced that is true. Yes, aarch64 has many zero-extending
instructions and ones that ignore the top 32 bits. That is a different
requirement from MIPS.



> > My vote is that a DINS instruction that updates the most significant
> > bit of an SImode value should then expand or define_insn_and_split
> > with an explicit following sign-extension operation.  To avoid this being
> > eliminated by the RTL optimizers/combine the DINS should return a
> > DImode result, with the following extension truncating it to canonical
>
> Is it this one?
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626137.html
>
> > SImode form.  This preserves the required backend invariant (and
> > doesn't require tweaking machine-independent code in combine).
> > SImode DINS instructions that don't/can't affect the MSB, can be a
> > single SImode instruction.
> >
>
> Yes. As most of MIPS microarchitecture, INS may have slight better
> performance than DINS.
>

This is not true. Cavium's octeon had the same performance characteristics
for dins and ins. Though I doubt that microarch matters any more.

Thanks,
Andrew




> While, I am worrying that: will some body do something like
> INS ,,24,8
> In this case, if  is not sign-extended, the result will be
> UNPREDICTABLE.
> For this, now, I prefer to use DINS and append a SLL.
>
> I tried to write a C code that can produce this case, but not yet
> success.
>
>
> > Cheers,
> > Roger
> > --
> >
> >
>


RE: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread Roger Sayle


> > > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't
> > > actually a truncation!  The output precision is first, the input
> > > precision is second.  The docs explicitly state the output precision
> > > should be smaller than the input precision (which makes sense for 
> > > truncation).
> > >
> > > That's where I'd start with trying to untangle this mess.
> >
> > Thanks (both) for correcting my misunderstanding.
> > At the very least might I suggest that we introduce a new
> > TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS can use for this
> > purpose?  It'd help reduce confusion, and keep the
> > documentation/function naming correct.
> >
> 
> Yes. It is good for me.
> T_N_T_M_P is a really confusion naming.

Ignore my suggestion for a new target hook.  GCC already has one.
You shouldn't be using TRULY_NOOP_TRUNCATION_MODES_P
with incorrectly ordered arguments. The correct target hook is 
TARGET_MODE_REP_EXTENDED, which the MIPS backend correctly
defines via mips_mode_rep_extended.

It's the MIPS definition of (and interpretation of) mips_truly_noop_truncation
that's suspect.

My latest theory is that these sign extensions should be:
(set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
and not
(set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))
If the RTL optimizers ever split this instruction, the semantics of
the SUBREG intermediate are incorrect.  Another (less desirable)
approach might be to use an UNSPEC.




[PATCH v2] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-24 Thread Xi Ruoyao
gcc/ChangeLog:

* config/loongarch/loongarch.md (rotl3):
New define_expand.
* config/loongarch/simd.md (vrotl3): Likewise.
(rotl3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/rotl-with-rotr.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-d.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-d.c: New test.
---

Change from [v1]:
- Wrap the negated wrapping amount with subreg: for
  rotl, to avoid an ICE left rotating QI and HI vectors.
- Add tests for QI, HI, and DI vectors.

[v1]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640872.html

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
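
The identity behind this expansion, that a left rotate equals a right rotate
by the negated amount, can be checked with a small C sketch. It is only an
illustration for the 32-bit scalar case; the function names are made up, and
it assumes the rotate uses only the low 5 bits of the amount, which the
masking below models.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X right by N, using only the low 5 bits of N.  */
static uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Reference left rotate, written directly.  */
static uint32_t
rotl32_ref (uint32_t x, unsigned n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* Left rotate expressed as a right rotate by the negated amount.  */
static uint32_t
rotl32_via_rotr (uint32_t x, unsigned n)
{
  return rotr32 (x, 0u - n);
}

static void
check_rotl_via_rotr (void)
{
  const uint32_t x = UINT32_C (0x80000001);
  for (unsigned n = 0; n < 64; n++)
    assert (rotl32_via_rotr (x, n) == rotl32_ref (x, n));
  assert (rotl32_via_rotr (x, 1) == UINT32_C (0x00000003));
}
```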

 gcc/config/loongarch/loongarch.md | 12 
 gcc/config/loongarch/simd.md  | 29 +++
 .../gcc.target/loongarch/rotl-with-rotr.c |  9 ++
 .../gcc.target/loongarch/rotl-with-vrotr-b.c  |  7 +
 .../gcc.target/loongarch/rotl-with-vrotr-d.c  |  7 +
 .../gcc.target/loongarch/rotl-with-vrotr-h.c  |  7 +
 .../gcc.target/loongarch/rotl-with-vrotr-w.c  | 28 ++
 .../gcc.target/loongarch/rotl-with-xvrotr-b.c |  7 +
 .../gcc.target/loongarch/rotl-with-xvrotr-d.c |  7 +
 .../gcc.target/loongarch/rotl-with-xvrotr-h.c |  7 +
 .../gcc.target/loongarch/rotl-with-xvrotr-w.c |  7 +
 11 files changed, 127 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-b.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-d.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-h.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-vrotr-w.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-b.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-d.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-h.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/rotl-with-xvrotr-w.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 30025bf1908..939432b83e0 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2903,6 +2903,18 @@ (define_insn "rotrsi3_extend"
   [(set_attr "type" "shift,shift")
(set_attr "mode" "SI")])
 
+;; Expand left rotate to right rotate.
+(define_expand "rotl3"
+  [(set (match_dup 3)
+   (neg:SI (match_operand:SI 2 "register_operand")))
+   (set (match_operand:GPR 0 "register_operand")
+   (rotatert:GPR (match_operand:GPR 1 "register_operand")
+ (match_dup 3)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (SImode);
+  });
+
 ;; The following templates were added to generate "bstrpick.d + alsl.d"
 ;; instruction pairs.
 ;; It is required that the values of const_immalsl_operand and
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 13202f79bee..93fb39abcf5 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -268,6 +268,35 @@ (define_insn "vrotr3"
   [(set_attr "type" "simd_int_arith")
(set_attr "mode" "")])
 
+;; Expand left rotate to right rotate.
+(define_expand "vrotl3"
+  [(set (match_dup 3)
+   (neg:IVEC (match_operand:IVEC 2 "register_operand")))
+   (set (match_operand:IVEC 0 "register_operand")
+   (rotatert:IVEC (match_operand:IVEC 1 "register_operand")
+  (match_dup 3)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (mode);
+  });
+
+;; Expand left rotate with a scalar amount to right rotate: negate the
+;; scalar before broadcasting it because scalar negation is cheaper than
+;; vector negation.
+(define_expand "rotl3"
+  [(set (match_dup 3)
+   (neg:SI (match_operand:SI 2 "register_operand")))
+   (set (match_dup 4)
+   (vec_duplicate:IVEC (subreg: (match_dup 3) 0)))
+   (set (match_operand:IVEC 0 "register_operand")
+   (rotatert:IVEC (match_operand:IVEC 1 "register_operand")
+  (match_dup 4)))]
+  ""
+  {
+operands[3] = gen_reg_rtx (SImode);
+operands[4] = gen_reg_rtx (mode);
+  });
+
 ;; vrotri.{b/h/w/d}
 
 (define_insn "rotr3"
diff --git a/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c 
b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
new file mode 100644
index 000..84cc53cecaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/rotl-with-rotr.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "rotr\\.w" } } */
+
+unsigned
+t (unsigned a, 

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-24 Thread Xi Ruoyao
On Sun, 2023-12-24 at 01:04 +0800, Xi Ruoyao wrote:
> On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote:
> > On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote:
> > > Hi,
> > > 
> > > This patch will cause the following tests to fail:
> > > 
> > > +FAIL: gcc.dg/vect/pr97081-2.c (internal compiler error: in extract_insn, 
> > > at recog.cc:2812)
> > > +FAIL: gcc.dg/vect/pr97081-2.c (test for excess errors)
> > > +FAIL: gcc.dg/vect/pr97081-2.c -flto -ffat-lto-objects (internal compiler 
> > > error: in extract_insn, at recog.cc:2812)
> > > +FAIL: gcc.dg/vect/pr97081-2.c -flto -ffat-lto-objects (test for excess 
> > > errors)
> > 
> > I can reproduce it now but it did not happen when I submitted the patch.
> > The difference may be caused by a different binutils version or some
> > other changes in GCC.  I'll figure it out...
> 
> Phew, it was simple.  I uploaded an earlier draft version of this patch
> onto the dev box :(.

Fixed the issue in v2.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-24 Thread Xi Ruoyao
On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote:
> On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote:
> > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:
> > > > The performance drop has nothing to do with this patch. I found that 
> > > > the h264 performance compiled 
> > > > by r14-6787 compared to r14-6421 dropped by 6.4%. 
> > 
> > Then I guess we should create a bug report...
> > 
> > >  But there is a problem. My regression test has the following two fail 
> > > items.(based on r14-6787)
> > 
> > > +FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
> 
> I guess this is https://gcc.gnu.org/PR28123.
> 
> > > +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6
> 
> I'll take a look on this.  Maybe it will show up with Binutils trunk (I
> just realized I tested this patch with Binutils 2.41, and it's not
> sufficient to really test the change).

I cannot reproduce the issue on a Gentoo dev machine with Binutils
2.41.50.20231218 and the patch on top of r14-6819.  And in my manual
testing (for ruling out the difference caused by default PIE and SSP)
the test also passes:

xry111@nanmen2 ~/git-repos/gcc-build $ /home/xry111/git-repos/gcc-build/gcc/xgcc \
  -B/home/xry111/git-repos/gcc-build/gcc/ \
  /home/xry111/git-repos/gcc/gcc/testsuite/gcc.dg/pr86617.c \
  -fdiagnostics-plain-output -Os -fdump-rtl-final -ffat-lto-objects \
  -S -o pr86617.s -fno-stack-protector -fno-pie \
  && grep -c mem/v pr86617.c.348r.final
6

Could you recheck with latest GCC master?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[committed] hppa: Fix pr110279-1.c on hppa

2023-12-24 Thread John David Anglin
This test needs fma support.  It is only available on hppa in PA 2.0.

Tested on hppa-unknown-linux-gnu.  Committed to trunk.

Dave
---

hppa: Fix pr110279-1.c on hppa

2023-12-24  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/pr110279-1.c: Add -march=2.0 option on hppa*-*-*.

diff --git a/gcc/testsuite/gcc.dg/pr110279-1.c 
b/gcc/testsuite/gcc.dg/pr110279-1.c
index f25b6aec967..291824c0a48 100644
--- a/gcc/testsuite/gcc.dg/pr110279-1.c
+++ b/gcc/testsuite/gcc.dg/pr110279-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast --param avoid-fma-max-bits=512 --param 
tree-reassoc-width=4 -fdump-tree-widening_mul-details" } */
 /* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
+/* { dg-additional-options "-march=2.0" { target hppa*-*-* } } */
 
 #define LOOP_COUNT 8
 typedef double data_e;




[PATCH][testsuite]: Add more pragma novector to new tests

2023-12-24 Thread Tamar Christina
Hi All,

This patch was pre-approved by Richi.

This updates the testsuite and adds more #pragma GCC novector to various tests
that would otherwise vectorize the vector result checking code.

This cleans out the testsuite since the last rebase and prepares for the landing
of the early break patch.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Pushed to master.
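
For readers unfamiliar with the pragma, here is a minimal sketch of a result
checking loop marked with #pragma GCC novector, placed on the inner loop as
the diff below does for nested loops. The function and array names are
invented for illustration and are not taken from any of the listed tests;
compilers that do not know the pragma are expected to ignore it.

```c
#include <assert.h>

#define N 16

/* Count the elements of A that differ from EXPECTED.  The pragma asks
   GCC not to vectorize the inner checking loop.  */
static int
count_mismatches (int a[N][N], int expected)
{
  int bad = 0;
  for (int i = 0; i < N; i++)
    {
#pragma GCC novector
      for (int j = 0; j < N; j++)
        if (a[i][j] != expected)
          bad++;
    }
  return bad;
}

static void
check_novector_sketch (void)
{
  static int a[N][N];
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      a[i][j] = 8;

  assert (count_mismatches (a, 8) == 0);
  a[3][5] = 7;
  assert (count_mismatches (a, 8) == 1);
}
```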

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-slp-30.c: Add pragma GCC novector to abort
loop.
* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
* gcc.dg/vect/no-section-anchors-vect-69.c: Likewise.
* gcc.target/aarch64/vect-xorsign_exec.c: Likewise.
* gcc.target/i386/avx512er-vrcp28ps-3.c: Likewise.
* gcc.target/i386/avx512er-vrsqrt28ps-3.c: Likewise.
* gcc.target/i386/avx512er-vrsqrt28ps-5.c: Likewise.
* gcc.target/i386/avx512f-ceil-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-ceil-vec-1.c: Likewise.
* gcc.target/i386/avx512f-ceilf-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-ceilf-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floor-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floor-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floorf-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floorf-vec-1.c: Likewise.
* gcc.target/i386/avx512f-rint-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-rintf-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-round-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-roundf-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-trunc-vec-1.c: Likewise.
* gcc.target/i386/avx512f-truncf-vec-1.c: Likewise.
* gcc.target/i386/vect-alignment-peeling-1.c: Likewise.
* gcc.target/i386/vect-alignment-peeling-2.c: Likewise.
* gcc.target/i386/vect-pack-trunc-1.c: Likewise.
* gcc.target/i386/vect-pack-trunc-2.c: Likewise.
* gcc.target/i386/vect-perm-even-1.c: Likewise.
* gcc.target/i386/vect-unpack-1.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
index 
00d0eca56eeca6aee6f11567629dc955c0924c74..534bee4a1669a7cbd95cf6007f28dafd23bab8da
 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
@@ -24,9 +24,9 @@ main1 ()
}
 
   /* check results:  */
-#pragma GCC novector
for (j = 0; j < N; j++)
{
+#pragma GCC novector
 for (i = 0; i < N; i++)
   {
 if (out[i*4] != 8
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
index 
48b6a9b0681cf1fe410755c3e639b825b27895b0..22817a57ef81398cc018a78597755397d20e0eb9
 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
@@ -27,6 +27,7 @@ main1 ()
 #pragma GCC novector
  for (i = 0; i < N; i++)
{
+#pragma GCC novector
 for (j = 0; j < N; j++) 
   {
 if (a[i][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c 
b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
index 
a0e53d5fef91868dfdbd542dd0a98dff92bd265b..0861d488e134d3f01a2fa83c56eff7174f36ddfb
 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
@@ -83,9 +83,9 @@ int main1 ()
 }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N; i++)
 {
+#pragma GCC novector
   for (j = 0; j < N; j++)
{
   if (tmp1[2].e.n[1][i][j] != 8)
@@ -103,9 +103,9 @@ int main1 ()
 }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N - NINTS; i++)
 {
+#pragma GCC novector
   for (j = 0; j < N - NINTS; j++)
{
   if (tmp2[2].e.n[1][i][j] != 8)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c 
b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
index 
cfa22115831272cb1d4e1a38512f10c3a1c6ad77..84f33d3f6cce9b0017fd12ab961019041245ffae
 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
@@ -33,6 +33,7 @@ main (void)
 r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
 if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
   abort ();
@@ -41,6 +42,7 @@ main (void)
 rd[i] = ad[i] * __builtin_copysign (1.0d, bd[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
 if (rd[i] != ad[i] * __builtin_copysign (1.0d, bd[i]))
   abort ();
diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c 
b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-3.c
index 
c0b1f7b31027f9438ab1641d3002887eabd34efa..1e68926a3180fffc6cbc8c6eed639a567fc32566
 100644
--- a/gcc/testsuite/g

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-24 Thread chenglulu



On 2023/12/24 at 8:59 PM, Xi Ruoyao wrote:
> On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote:
> > On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote:
> > > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:
> > > > > The performance drop has nothing to do with this patch. I found that
> > > > > the h264 performance compiled
> > > > > by r14-6787 compared to r14-6421 dropped by 6.4%.
> > >
> > > Then I guess we should create a bug report...
> > >
> > > >  But there is a problem. My regression test has the following two fail
> > > > items.(based on r14-6787)
> > >
> > > > +FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors)
> >
> > I guess this is https://gcc.gnu.org/PR28123.
> >
> > > > +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6
> >
> > I'll take a look on this.  Maybe it will show up with Binutils trunk (I
> > just realized I tested this patch with Binutils 2.41, and it's not
> > sufficient to really test the change).
>
> I cannot reproduce the issue on a Gentoo dev machine with Binutils
> 2.41.50.20231218 and the patch on top of r14-6819.  And in my manual
> testing (for ruling out the difference caused by default PIE and SSP)
> the test also passes:
>
> xry111@nanmen2 ~/git-repos/gcc-build $ /home/xry111/git-repos/gcc-build/gcc/xgcc \
>   -B/home/xry111/git-repos/gcc-build/gcc/ \
>   /home/xry111/git-repos/gcc/gcc/testsuite/gcc.dg/pr86617.c \
>   -fdiagnostics-plain-output -Os -fdump-rtl-final -ffat-lto-objects \
>   -S -o pr86617.s -fno-stack-protector -fno-pie \
>   && grep -c mem/v pr86617.c.348r.final
> 6
>
> Could you recheck with latest GCC master?

Ok, I'll test again with the latest code.






[PATCH v1] LoongArch: Fixed bug in *bstrins__for_ior_mask template.

2023-12-24 Thread Li Wei
We found that GCC built from the latest trunk causes a miscompare error when
running the SPEC2006 400.perlbench test with -flto turned on.  Testing showed
that only the LoongArch architecture reports the error.  Using git bisect,
the first bad commit was identified as r14-3773-g5b857e87201335.  Debugging
showed that the split condition of the *bstrins__for_ior_mask template was
empty, whereas it should be consistent with the insn condition.

gcc/ChangeLog:

* config/loongarch/loongarch.md: Adjust.
---
 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 7021105b241..2b0609f2f31 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1489,7 +1489,7 @@ (define_insn_and_split "*bstrins__for_ior_mask"
   "loongarch_pre_reload_split () && \
loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
   "#"
-  ""
+  "&& true"
   [(set (match_dup 0) (match_dup 1))
(set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4))
(match_dup 3))]
-- 
2.39.3



[COMMITTED] match: Improve `(a != b) ? (a + b) : (2 * a)` pattern [PR19832]

2023-12-24 Thread Andrew Pinski
In the testcase provided, we would match f_plus but not g_plus
due to a missing `:c` on the plus operator. This fixes the oversight
there.

This was reported in https://github.com/llvm/llvm-project/issues/76318 .

Committed as obvious after bootstrap/test on x86_64-linux-gnu.

PR tree-optimization/19832

gcc/ChangeLog:

* match.pd (`(a != b) ? (a + b) : (2 * a)`): Add `:c`
on the plus operator.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-same-2.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  |  2 +-
 .../gcc.dg/tree-ssa/phi-opt-same-2.c  | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index d57e29bfe1d..a980c4d7e94 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5694,7 +5694,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 @2)))
  /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
  (simplify
-  (cnd (ne:c @0 @1) (plus@2 @0 @1) (mult @0 uniform_integer_cst_p@3))
+  (cnd (ne:c @0 @1) (plus:c@2 @0 @1) (mult @0 uniform_integer_cst_p@3))
   (if (wi::to_wide (uniform_integer_cst_p (@3)) == 2)
@2))
 )
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-2.c
new file mode 100644
index 000..94fb6a92cea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt1 -fdump-tree-optimized" } */
+/* PR tree-optimization/19832 */
+
+int f_plus(int a, int b)
+{
+  if (a != b) return a + b;
+  return a + a;
+}
+
+int g_plus(int a, int b)
+{
+  if (b != a) return a + b;
+  return a + a;
+}
+
+/* All of the above function's if should have been optimized away even in phiopt1. */
+/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
+/* { dg-final { scan-tree-dump-not "if " "optimized" } } */
-- 
2.39.3



[Committed] RISC-V: Add one more ASM check in PR113112-1.c

2023-12-24 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Add one more ASM check.

---
 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
index a44a1c041af..31b41ba707e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
@@ -20,6 +20,7 @@ foo (int n){
   return 0;
 }
 
+/* { dg-final { scan-assembler {e32,m4} } } */
 /* { dg-final { scan-assembler-not {jr} } } */
 /* { dg-final { scan-assembler-times {ret} 1 } } */
 /* { dg-final { scan-tree-dump "Maximum lmul = 8" "vect" } } */
-- 
2.36.3



[PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2023-12-24 Thread Jun Sha (Joshua)
This patch adds the "th." prefix to all XTheadVector instructions by
implementing a new assembly output function. In this version, we
follow Kito's suggestion and only check whether the opcode starts
with 'v', so that no extra attribute is needed.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_asm_output_opcode): 
New function to add assembler insn code prefix/suffix.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
* config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv.cc | 19 +++
 gcc/config/riscv/riscv.h  |  4 
 .../riscv/rvv/xtheadvector/prefix.c   | 12 
 4 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..5ea54b45703 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,7 @@ struct riscv_address_info {
 };
 
 /* Routines implemented in riscv.cc.  */
+extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
 extern int riscv_float_const_rtx_index_for_fli (rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..30e6ced5f3f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5636,6 +5636,25 @@ riscv_get_v_regno_alignment (machine_mode mode)
   return lmul;
 }
 
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  if (!TARGET_XTHEADVECTOR)
+    return p;
+
+  if (current_output_insn == NULL_RTX)
+    return p;
+
+  /* We need to add th. prefix to all the xtheadvector
+     instructions here.  */
+  if (p[0] == 'v')
+    fputs ("th.", asm_out_file);
+
+  return p;
+}
+
 /* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
 
'h' Print the high-part relocation associated with OP, after stripping
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..c33361a254d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   asm_fprintf ((FILE), "%U%s", (NAME));\
   } while (0)
 
+#undef ASM_OUTPUT_OPCODE
+#define ASM_OUTPUT_OPCODE(STREAM, PTR) \
+  (PTR) = riscv_asm_output_opcode(STREAM, PTR)
+
 #define JUMP_TABLES_IN_TEXT_SECTION 0
 #define CASE_VECTOR_MODE SImode
 #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
new file mode 100644
index 000..48867f4ddfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
+{
+  return __riscv_vadd_vv_i32m1 (vx, vy, vl);
+}
+
+/* { dg-final { scan-assembler {\mth\.v\M} } } */
\ No newline at end of file
-- 
2.17.1



[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

2023-12-24 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can reuse the current
RVV 1.0 patterns directly by simply adding the "th." prefix. For
XTheadVector instructions that have different names but share the
same patterns as RVV 1.0 instructions, we will use the ASM target hook
to rewrite the whole instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so that we do
not generate instructions that XTheadVector does not support,
such as vmv1r and vsext.vf2.

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   8 +-
 gcc/config/riscv/riscv-string.cc  |   3 +
 gcc/config/riscv/riscv-v.cc   |  13 +-
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv-vsetvl.cc  |  10 +
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h|  49 +
 gcc/config/riscv/thead-vector.md  | 120 +++
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md|  36 +++-
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 16 files changed, 449 insertions(+), 189 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..1445d98c147 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8b8a92f10a1..1fac56c7095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
   {
 riscv_vector::expand_rawmemchr(mode, operands[0], operands[1],
 

[PATCH v4 6/6] RISC-V: Add support for xtheadvector-specific intrinsics.

2023-12-24 Thread Jun Sha (Joshua)
This patch only covers the generation of the XTheadVector-specific
load/store instructions and the vext instructions.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: Add new patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 .../riscv/riscv-vector-builtins-shapes.cc | 126 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   3 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 313 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/t-riscv  |  16 +
 .../riscv/thead-vector-builtins-functions.def |  39 +++
 gcc/config/riscv/thead-vector-builtins.cc | 200 +++
 gcc/config/riscv/thead-vector-builtins.h  |  64 
 gcc/config/riscv/thead-vector.md  | 255 +-
 11 files changed, 1138 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector-builtins.cc
 create mode 100644 gcc/config/riscv/thead-vector-builtins.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1445d98c147..4478395ab77 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -547,7 +547,7 @@ riscv*)
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o riscv-avlprop.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
-   extra_objs="${extra_objs} thead.o riscv-target-attr.o"
+   extra_objs="${extra_objs} thead.o riscv-target-attr.o thead-vector-builtins.o"
d_target_objs="riscv-d.o"
extra_headers="riscv_vector.h riscv_th_vector.h"
target

Re: [PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2023-12-24 Thread juzhe.zh...@rivai.ai
+  if (current_output_insn == NULL_RTX)
+return p;

What is this used for?

How about:

+  /* We need to add th. prefix to all the xtheadvector
+     instructions here.  */
+  if (TARGET_XTHEADVECTOR && p[0] == 'v')
+    fputs ("th.", asm_out_file);

\ No newline at end of file

A newline should be added at the end of prefix.c.



juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2023-12-25 14:25
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.


Re: [PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2023-12-24 Thread joshua
+ if (current_output_insn == NULL_RTX)
+ return p;
This is for the inline assembly case.
--
From: juzhe.zh...@rivai.ai
Sent: Monday, 2023-12-25 14:37
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw;
"christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of
XTheadVector.
+ if (current_output_insn == NULL_RTX)
+ return p;
What is this used for ?
How about:
+ /* We need to add th. prefix to all the xtheadvector
+ insturctions here.*/
+ if (TARGET_XTHEADVECTOR && p[0] == 'v')
+ fputs ("th.", asm_out_file);
\ No newline at end of file
New line should be added into prefix.c


Re: Re: [PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2023-12-24 Thread juzhe.zh...@rivai.ai
OK. This sub-patch is OK to commit after adding a newline at the end of prefix.c.



juzhe.zh...@rivai.ai
 
From: joshua
Sent: 2023-12-25 15:08
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw;
christoph.muellner; jinma; cooper.qu
Subject: Re: [PATCH v4 4/6] RISC-V: Adds the prefix "th." for the instructions of
XTheadVector.
+  if (current_output_insn == NULL_RTX)
+return p;

This is for inline assembly case.


RE: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors) on Linux/x86_64

2023-12-24 Thread Jiang, Haochen
It is not a target-specific issue; it fails whenever AVX is enabled.

e.g.:

$ /export/users/haochenj/env/build_no_bootstrap_master/gcc/xgcc \
    -B/export/users/haochenj/env/build_no_bootstrap_master/gcc/ \
    /export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c \
    -m64 -mavx -fdiagnostics-plain-output -std=gnu23 -S -o gnu23-tag-4.s
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c: In function ‘bar’:
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/gnu23-tag-4.c:18:47: error: initialization of ‘struct g *’ from incompatible pointer type ‘struct g *’ [-Wincompatible-pointer-types]

Thx,
Haochen

> -Original Message-
> From: Martin Uecker 
> Sent: Friday, December 22, 2023 5:39 PM
> To: gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; Jiang, Haochen
> ; Joseph Myers 
> Subject: Re: [r14-6770 Regression] FAIL: gcc.dg/gnu23-tag-4.c (test for excess
> errors) on Linux/x86_64
> 
> 
> Hm, this is weird, as it really seems to depend on the -march=  So if there is
> really a difference between those structs which make them incompatible on
> some archs, we should not consider them to be compatible in general.
> 
> struct g { int a[n]; int b; } *y;
> { struct g { int a[4]; int b; } *y2 = y; }
> 
> But I do not see what could go wrong here as sizeof / alignment is the same for
> n = 4.  So there is something else I missed
> 
> 
> 
> Am Freitag, dem 22.12.2023 um 05:07 +0800 schrieb haochen.jiang:
> > On Linux/x86_64,
> >
> > 23fee88f84873b0b8b41c8e5a9b229d533fb4022 is the first bad commit
> > commit 23fee88f84873b0b8b41c8e5a9b229d533fb4022
> > Author: Martin Uecker 
> > Date:   Tue Aug 15 14:58:32 2023 +0200
> >
> > c23: tag compatibility rules for struct and unions
> >
> > caused
> >
> > FAIL: gcc.dg/gnu23-tag-4.c (test for excess errors)
> >
> > with GCC configured with
> >
> > ../../gcc/configure
> > --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6770/usr
> > --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-tag-4.c --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/gnu23-tag-4.c --target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report,
> > contact me at haochen dot jiang at intel.com.) (If you met problems
> > with cascadelake related, disabling AVX512F in command line might save
> > that.) (However, please make sure that there is no potential problems with
> > AVX512.)