[PATCH v3 3/3] testsuite: LoongArch: Enable 16B atomic tests if the test machine supports LSX and SCQ

2025-09-08 Thread Xi Ruoyao
Enable those tests so we won't make such stupid mistakes in the 16B atomic implementation anymore. All these tests passed on a Loongson 3C6000/S except atomic-other-int128.c. With GDB patched to support sc.q (https://sourceware.org/pipermail/gdb-patches/2025-August/220034.html) this test also XPASSes. g

Re: [PATCH v2 1/3] LoongArch: Fix the semantic of 16B CAS

2025-09-08 Thread Xi Ruoyao
On Fri, 2025-09-05 at 08:50 +0800, Lulu Cheng wrote: > > On 2025/9/4 at 7:48 PM, Lulu Cheng wrote: > > > > On 2025/8/22 at 4:14 PM, Xi Ruoyao wrote: > > > In a CAS operation, even if expected != *memory we still need to > > > do an > > > atomic load of *mem

[PATCH v3 1/3] LoongArch: Fix the "%t" modifier handling for (const_int 0)

2025-09-08 Thread Xi Ruoyao
This modifier is intended to output $r0 for (const_int 0), but the logic: GET_MODE (op) != TImode || (op != CONST0_RTX (TImode) && code != REG) will reject (const_int 0) because (const_int 0) actually does not have a mode and GET_MODE will return VOIDmode for it. Use reg_or_0_operand instead to
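
A minimal sketch of the idea behind the fix (not the actual patch; the surrounding output_operand code is assumed): in GCC RTL a (const_int 0) carries VOIDmode, so any check of the form GET_MODE (op) == TImode rejects it, while a predicate call can accept both a TImode register and the shared (const_int 0).

    /* Hypothetical check, assuming the LoongArch reg_or_0_operand predicate:
       accept a TImode register or (const_int 0), reject everything else.  */
    if (!reg_or_0_operand (op, TImode))
      output_operand_lossage ("invalid use of '%%t'");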

[PATCH v3 0/3] LoongArch: Fix ICE and semantic issue of 16B CAS

2025-09-08 Thread Xi Ruoyao
ation test that we know to fail with current GDB, not the compile test. Xi Ruoyao (3): LoongArch: Fix the "%t" modifier handling for (const_int 0) LoongArch: Fix the semantic of 16B CAS testsuite: LoongArch: Enable 16B atomic tests if the test machine supports LSX and S

[PATCH] testsuite: Another fixup for fixed-point/bitint-1.c test

2025-09-08 Thread Xi Ruoyao
Besides r16-3595, there's another bug in this test: with -std=c23 the token _Sat isn't recognized as a keyword at all, so an error message different from the expected one is emitted. Fix it by using -std=gnu23 instead. gcc/testsuite: * gcc.dg/fixed-point/bitint-1.c (dg-options): Use

Re: [PATCH] bitint: Fix regressions [PR117599]

2025-09-03 Thread Xi Ruoyao
ng build_qualified_type when it won't be needed. > > At least on s390x-linux (tried cross) bitint-14.c doesn't ICE with it > anymore. I'll submit this to my test facility for loongarch64-linux too. -- Xi Ruoyao

Re: [PATCH] bitint: Fix regressions [PR117599]

2025-09-02 Thread Xi Ruoyao
On Tue, 2025-09-02 at 22:59 +0800, Xi Ruoyao wrote: > On Tue, 2025-09-02 at 16:51 +0200, Jakub Jelinek wrote: > > On Tue, Aug 19, 2025 at 07:37:29PM +0800, Yang Yujie wrote: > > > This patch fixes regressions of the gcc.dg/torture/bitint-* tests > > > caused by r1

Ping: [PATCH] bitint: Fix regressions [PR117599]

2025-09-01 Thread Xi Ruoyao
(build_pointer_type (limb_type_a)); > +   gimple *g = gimple_build_assign (base, build_fold_addr_expr (ret)); > +   insert_before (g); > + > +   tree ptrtype = build_pointer_type (ltype); > +   ret = build2 (MEM_REF, ltype, add_cast (ptrtype, base), > + build_zero_cst (ptrtype)); > + } > } >    if (!write_p && !useless_type_conversion_p (atype, ltype)) > { -- Xi Ruoyao

Re: [pushed] wwwdocs: gcc-14: Editorial changes in the RISC-V section

2025-08-30 Thread Xi Ruoyao
wwwdocs:, and this feature > require glibc 2.40, > -  thanks to Tatsuyuki Ishi from > +  configured with --with-tls=[trad|desc] wwwdocs:. > +  This feature requires glibc 2.40. Thanks to Tatsuyuki Ishi from Just noticed this claim but unfortunately... this is not correct. The RISC-V TLS desc support has not made it into Glibc (even 2.42) yet. -- Xi Ruoyao

[PATCH] LoongArch: Fix ICE in highway-1.3.0 testsuite [PR121634]

2025-08-23 Thread Xi Ruoyao
I can't believe I made such a stupid pasto and the regression test didn't detect anything wrong. PR target/121634 gcc/ * config/loongarch/simd.md (simd_maddw_evod__): Use WVEC_HALF instead of WVEC for the mode of the sign_extend for the rhs of multiplication. gcc

[PATCH v2 0/3] LoongArch: Fix semantic issue and ICE of 16B CAS

2025-08-22 Thread Xi Ruoyao
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Changes from v1: - Use { xfail } in { dg-final } instead of { dg-xfail-if }, because it's the thread simulation test that we know to fail with current GDB, not the compile test. Xi Ruoyao (3): LoongArch: Fix the sem

[PATCH v2 3/3] testsuite: LoongArch: Enable 16B atomic tests if the test machine supports LSX and SCQ

2025-08-22 Thread Xi Ruoyao
Enable those tests so we won't make such stupid mistakes in the 16B atomic implementation anymore. All these tests passed on a Loongson 3C6000/S except atomic-other-int128.c. With GDB patched to support sc.q (https://sourceware.org/pipermail/gdb-patches/2025-August/220034.html) this test also XPASSes. g

[PATCH v2 1/3] LoongArch: Fix the semantic of 16B CAS

2025-08-22 Thread Xi Ruoyao
In a CAS operation, even if expected != *memory, we still need to do an atomic load of *memory into output. But I made a mistake in the initial implementation, causing the output to contain junk in this situation. Like a normal atomic load, the atomic load embedded in the CAS semantics is required
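
For reference, a user-level sketch of the semantics described above (illustrative code, not the testcase from the series): on failure, __atomic_compare_exchange_n must write the value it atomically loaded from *mem into *expected, so the output may not contain junk.

    #include <stdbool.h>

    /* Sketch: 16-byte CAS via the GCC builtin; names are illustrative.  */
    bool
    try_update (__int128 *mem, __int128 *expected, __int128 desired)
    {
      /* On success *mem becomes desired; on failure *expected is overwritten
         with the value atomically loaded from *mem.  Either way the embedded
         load of *mem has to be a genuine atomic load.  */
      return __atomic_compare_exchange_n (mem, expected, desired, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    }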

[PATCH v2 2/3] LoongArch: Fix ICE on atomic-compare-exchange-5.c

2025-08-22 Thread Xi Ruoyao
Fix the ICE: ../gcc/gcc/testsuite/gcc.dg/atomic-compare-exchange-5.c:88:1: internal compiler error: output_operand: invalid use of '%t' 88 | } | ^ The ICE is because we have an incorrect condition "GET_MODE (op) != TImode": we may use (const_int 0) here but it is in VOIDmode. Use reg_o

[PATCH 1/3] LoongArch: Fix the semantic of 16B CAS

2025-08-19 Thread Xi Ruoyao
In a CAS operation, even if expected != *memory, we still need to do an atomic load of *memory into output. But I made a mistake in the initial implementation, causing the output to contain junk in this situation. Like a normal atomic load, the atomic load embedded in the CAS semantics is required

[PATCH 3/3] testsuite: LoongArch: Enable 16B atomic tests if the test machine supports LSX and SCQ

2025-08-19 Thread Xi Ruoyao
Enable those tests so we won't make such stupid mistakes in the 16B atomic implementation anymore. All these tests passed on a Loongson 3C6000/S except atomic-other-int128.c. With GDB patched to support sc.q (https://sourceware.org/pipermail/gdb-patches/2025-August/220034.html) this test also XPASSes. g

[PATCH 2/3] LoongArch: Fix ICE on atomic-compare-exchange-5.c

2025-08-19 Thread Xi Ruoyao
Fix the ICE: ../gcc/gcc/testsuite/gcc.dg/atomic-compare-exchange-5.c:88:1: internal compiler error: output_operand: invalid use of '%t' 88 | } | ^ The ICE is because we have an incorrect condition "GET_MODE (op) != TImode": we may use (const_int 0) here but it is in VOIDmode. Use reg_o

[PATCH 0/3] LoongArch: Fix semantic issue and ICE of 16B CAS

2025-08-19 Thread Xi Ruoyao
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (3): LoongArch: Fix the semantic of 16B CAS LoongArch: Fix ICE on atomic-compare-exchange-5.c testsuite: LoongArch: Enable 16B atomic tests if the test machine supports LSX and SCQ gcc/config/loongarch

Re: [pushed] [PATCH 00/17] LoongArch: Clean up atomic operations and implement 16-byte atomic operations

2025-08-18 Thread Xi Ruoyao
On Mon, 2025-08-18 at 11:10 +0800, Xi Ruoyao wrote: > On Mon, 2025-08-18 at 09:15 +0800, Lulu Cheng wrote: > > Pushed to r16-3247 ... r16-3264. > > > > Sorry it took so long to merge. > > > >    LoongArch: Implement 16-byte CAS with sc.q > > Sorry

Re: [pushed] [PATCH 00/17] LoongArch: Clean up atomic operations and implement 16-byte atomic operations

2025-08-17 Thread Xi Ruoyao
it's better to revert this for now until I can find a proper solution. > -- Xi Ruoyao

Re: [PATCH] LoongArch: Don't set movgr2cf cost for LA664 [PR120476]

2025-08-12 Thread Xi Ruoyao
Dropped in favor of https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692394.html. On Tue, 2025-08-05 at 17:34 +0800, Xi Ruoyao wrote: > Although LA664 has a 1-cycle movgr2cf in reality, it seems setting the correct > value in the cost model has puzzled the register allocator and se

Ping: [PATCH v2] testsuite: i386: Fix gcc.target/i386/pr90579.c when PIE is enabled [PR118885]

2025-08-09 Thread Xi Ruoyao
+/* { dg-options "-O3 -mavx2 -mfpmath=sse -fno-pie" } */ >   >  extern double r[6]; >  extern double a[]; -- Xi Ruoyao

[PATCH] LoongArch: Don't set movgr2cf cost for LA664 [PR120476]

2025-08-05 Thread Xi Ruoyao
Although LA664 has a 1-cycle movgr2cf in reality, it seems setting the correct value in the cost model has puzzled the register allocator and severely impacted the performance, esp. for some workloads like OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r. As movgr2cf is very rarely used (we cannot even

Re: [PATCH v5 0/3] Hard Register Constraints

2025-07-21 Thread Xi Ruoyao
with a > simple loop. It's not about whether string.h is included; it's about whether string.h provides rawmemchr. rawmemchr is not a standard C function: it's a GNU extension, and GCC is expected to work on various non-GNU systems. -- Xi Ruoyao

[PATCH] LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST [PR121064]

2025-07-14 Thread Xi Ruoyao
When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the same pseudo as op0 and/or op1. Loading the selector into target would clobber the input, producing wrong code like vld $vr0, $t0 vshuf.w $vr0, $vr0, $vr1 So don't load the selector into d->target; use a new pseudo to h

Re: [EXT] Re: [PATCH 2/2] lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]

2025-07-12 Thread Xi Ruoyao
On Fri, 2025-07-11 at 14:01 -0500, Peter Bergner wrote: > On 7/11/25 10:22 AM, Vladimir Makarov wrote: > > On 7/8/25 9:43 PM, Xi Ruoyao wrote: > > > > > > IIUC "recog does not look at constraints until reload" has been a > > > well-established rule

[PATCH 2/2] lra: Reallow reloading user hard registers if the insn is not asm [PR 120983]

2025-07-08 Thread Xi Ruoyao
The PR 87600 fix has disallowed reloading user hard registers to resolve an earlyclobber-induced conflict. However, before reload, recog completely ignores the constraints of insns, so the RTL passes may produce insns where some user hard registers violate an earlyclobber. Then we'll get an ICE witho

[PATCH 1/2] testsuite: Enable the PR 87600 tests for LoongArch

2025-07-08 Thread Xi Ruoyao
I'm going to refine a part of the PR 87600 fix which seems to trigger PR 120983, from which LoongArch is particularly suffering. Enable the PR 87600 tests so I won't regress PR 87600. gcc/testsuite/ChangeLog: PR rtl-optimization/87600 PR rtl-optimization/120983 * gcc.dg/pr87600

[PATCH 0/2] Fix PR120983

2025-07-08 Thread Xi Ruoyao
Bootstrapped and regtested on aarch64-linux-gnu, loongarch64-linux-gnu, and x86_64-linux-gnu. Ok for trunk? Xi Ruoyao (2): testsuite: Enable the PR 87600 tests for LoongArch lra: Reallow reloading user hard registers if the insn is not asm [PR 120983] gcc/lra-constraints.cc

Re: [PATCH] LoongArch: Fix ICE caused by _alsl_reversesi_extended.

2025-07-05 Thread Xi Ruoyao
On Sat, 2025-07-05 at 14:10 -0500, Segher Boessenkool wrote: > Hi! > > On Sat, Jul 05, 2025 at 11:10:05PM +0800, Xi Ruoyao wrote: > > Possibly this is https://gcc.gnu.org/PR101882.  Specifically comment 5 > > from Segher: > > > > "The LRA change is corre

Re: [PATCH] LoongArch: Fix ICE caused by _alsl_reversesi_extended.

2025-07-05 Thread Xi Ruoyao
On Sat, 2025-07-05 at 17:55 +0800, Xi Ruoyao wrote: > On Sat, 2025-07-05 at 11:20 +0800, Lulu Cheng wrote: > > For the gcc.target/loongarch/bitwise-shift-reassoc-clobber.c, > > some extensions are eliminated in ext_dce in commit r16-1835. > > > > This will result

Re: [PATCH] LoongArch: Fix ICE caused by _alsl_reversesi_extended.

2025-07-05 Thread Xi Ruoyao
"register_operand" "r"] >    "TARGET_64BIT >     && loongarch_reassoc_shift_bitwise (, operands[2], operands[3], > -    SImode)" > +    SImode) > +   && !(GP_REG_P (REGNO (operands[0])) > + && REGNO (operands[0]) == REGNO (operands[4]))" >    "#" >    "&& reload_completed" >    [; r0 = r1 [&|^] r3 is emitted in PREPARATION-STATEMENTS because we -- Xi Ruoyao

Re: [PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-03 Thread Xi Ruoyao
On Fri, 2025-07-04 at 11:14 +0800, Xi Ruoyao wrote: > On Fri, 2025-07-04 at 09:47 +0800, Lulu Cheng wrote: > > > > On 2025/7/2 at 3:31 PM, Xi Ruoyao wrote: > > > The register_operand predicate can match subreg, then we'd have a > > > subreg > > > of subreg

Re: [PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-03 Thread Xi Ruoyao
On Fri, 2025-07-04 at 09:47 +0800, Lulu Cheng wrote: > > On 2025/7/2 at 3:31 PM, Xi Ruoyao wrote: > > The register_operand predicate can match subreg, then we'd have a subreg > > of subreg and it's invalid.  Use lowpart_subreg to avoid the nested > >

[PATCH] LoongArch: testsuite: Adapt bstrpick_alsl_paired.c for GCC 16 change

2025-07-03 Thread Xi Ruoyao
In GCC 16 the compiler is smarter and it optimizes away the unneeded zero-extension during the expand pass. Thus we can no longer match and_alsl_reversed. Drop the scan-rtl-dump for and_alsl_reversed and add scan-assembler-not against bstrpick.d to detect the unneeded zero-extension in case it re

[PATCH] LoongArch: Prevent subreg of subreg in CRC

2025-07-02 Thread Xi Ruoyao
The register_operand predicate can match subreg, then we'd have a subreg of subreg and it's invalid. Use lowpart_subreg to avoid the nested subreg. gcc/ChangeLog: * config/loongarch/loongarch.md (crc_combine): Avoid nested subreg. gcc/testsuite/ChangeLog: * gcc.c-tortu

Re: [RFC PATCH v1 12/31] LoongArch: Forbid k, ZC constraints for movsi_internal

2025-06-10 Thread Xi Ruoyao
load,store,mgtf,fpload,mftg,fpstore") > +   (set_attr "mode" "SI")]) > + > +(define_insn_and_split "*movsi_internal_la32" > +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,*f,f,*r,*m") > + (match_operand:SI 1 "move_operand" "r,Yd,m,rJ,*r*J,m,*f,*f"))] > +  "TARGET_32BIT > +   && (register_operand (operands[0], SImode) > +   || reg_or_0_operand (operands[1], SImode))" >    { return loongarch_output_move (operands); } >    "CONST_INT_P (operands[1]) && REG_P (operands[0]) && GP_REG_P (REGNO >    (operands[0]))" -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 15/31] LoongArch: Disable k constraint on LA32

2025-06-10 Thread Xi Ruoyao
(op, 0), mode)"))) > +   (match_test "TARGET_64BIT && loongarch_base_index_address_p (XEXP > (op, 0), mode)"))) IMO it's more natural to do (and (match_code "mem") (match_test "TARGET_64BIT") (match_test "loongarch_b

Re: [RFC PATCH v1 16/31] LoongArch: Add support for atomic on LA32

2025-06-10 Thread Xi Ruoyao
> +} > +  [(set (attr "length") (const_int 20))]) >   >  (define_insn "atomic_exchange" >    [(set (match_operand:GPR 0 "register_operand" "=&r") > @@ -217,10 +244,21 @@ (define_insn "atomic_exchange" >      (match_operand:

Re: [RFC PATCH v1 25/31] LoongArch: macro instead enum for base abi type

2025-06-10 Thread Xi Ruoyao
n the lines below). > +#define ABI_BASE_ILP32F   1 > +#define ABI_BASE_ILP32S   2 > +#define ABI_BASE_LP64D   3 > +#define ABI_BASE_LP64F   4 > +#define ABI_BASE_LP64S   5 > +#define N_ABI_BASE_TYPES  6 >   >  extern loongarch_def_array >    loongarch_abi_base_strings; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 08/31] LoongArch: Forbid ADDRESS_REG_REG in loongarch32

2025-06-10 Thread Xi Ruoyao
ns "-march=loongarch32 -mabi=ilp32d -O2" } */ > +long long foo(long long *arr, long long index) > +{ > + return arr[index]; > +} > \ No newline at end of file Please don't leave files with no newline at end. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 13/31] LoongArch: Forbid stptr/ldptr when enable -fshrink-wrap.

2025-06-10 Thread Xi Ruoyao
+ if (IMM12_OPERAND (offset) > +     || (TARGET_64BIT && (offset < 32768))) >     bitmap_set_bit (components, regno); >   >   offset -= UNITS_PER_WORD; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [RFC PATCH v1 10/31] LoongArch: Disable extreme code model for crtbeginS.o on LA32

2025-06-10 Thread Xi Ruoyao
) IIRC this change is caused by a downstream autoconf patch used by some distro. Thus we need to regenerate the configure script with the vanilla autoconf-2.69 to avoid the unrelated change. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] ext-dce: Don't refine live width with SUBREG mode if !TRULY_NOOP_TRUNCATION_MODES_P [PR 120050]

2025-06-04 Thread Xi Ruoyao
On Wed, 2025-05-28 at 18:17 +0100, Richard Sandiford wrote: > Sorry for the slow reply, had a few days off. > > Xi Ruoyao writes: > > If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the > > truncation is not a noop, then all bits of the inner reg are live.  We

Re: [RFC] RISC-V: Support -mcpu for XiangShan Kunminghu cpu.

2025-06-04 Thread Xi Ruoyao
zimop_zkn_zknd_zkne_zknh_zksed_zksh_zkt_zvbb_zvfh_" > +   "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b", IIUC zvl128b implies zvl32b and zvl64b, so do we still need to explicitly give zvl32b and zvl64b here? > +   "xiangshan-kunminghu") -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Fwd: [PATCH] testsuite: Fix up dg-do-if

2025-05-26 Thread Xi Ruoyao
I forgot to send this to the list :(. Forwarded Message From: Xi Ruoyao To: Alexandre Oliva Cc: Xi Ruoyao Subject: [PATCH] testsuite: Fix up dg-do-if Date: 05/26/25 17:59:32 The line number needs to be passed to dg-do, instead of being stripped. Fixes 'compile: syntax

[PATCH v2] ext-dce: Don't refine live width with SUBREG mode if !TRULY_NOOP_TRUNCATION_MODES_P [PR 120050]

2025-05-23 Thread Xi Ruoyao
If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the truncation is not a noop, then all bits of the inner reg are live. We cannot reduce the live mask to that of the mode of the subreg. gcc/ChangeLog: PR rtl-optimization/120050 * ext-dce.cc (ext_dce_process_uses): Break

Re: [PATCH 1/3] LoongArch: testsuite: Fix pr112325.c and pr117888-1.c.

2025-05-23 Thread Xi Ruoyao
} } > */ >  /* { dg-additional-options "--param max-completely-peeled-insns=200" > { target powerpc64*-*-* } } */ > +/* { dg-additional-options "-mlsx" { target loongarch64-*-* } } */ >   >  typedef unsigned short ggml_fp16_t; >  static float table_f32_f16[1 << 16]; -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: RISC-V TLS Descriptors in GCC

2025-05-22 Thread Xi Ruoyao
upport for RISC-V TLSDESC? I don't think it's accurate. The RISC-V TLSDESC support is just not merged into Glibc yet. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH] doc: Document the 'q' constraint for LoongArch

2025-05-21 Thread Xi Ruoyao
The kernel developers have requested such a constraint to use csrxchg in inline assembly. gcc/ChangeLog: * doc/md.texi: Document the 'q' constraint for LoongArch. --- Ok for trunk? gcc/doc/md.texi | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
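
A hedged sketch of the kind of inline assembly the kernel developers have in mind (the CSR number and operand layout here are illustrative assumptions, not taken from the patch): csrxchg cannot encode $r0 or $r1 as its mask register, which is what a dedicated constraint can express.

    /* Illustrative only: exchange CSR bits under a mask.  The 'q' constraint
       is assumed to restrict the mask operand to registers csrxchg can
       actually encode.  */
    static inline unsigned long
    csr_xchg (unsigned long val, unsigned long mask)
    {
      __asm__ __volatile__ ("csrxchg %0, %1, 0x0"  /* 0x0: made-up CSR number.  */
                            : "+r" (val)
                            : "q" (mask)
                            : "memory");
      return val;
    }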

Re: [PATCH 3/5] libstdc++: keep subtree sizes in pb_ds binary search trees (PR 81806)

2025-05-20 Thread Xi Ruoyao
On Tue, 2025-05-20 at 13:06 +0100, Jonathan Wakely wrote: > On 13/07/20 16:45 +0800, Xi Ruoyao via Libstdc++ wrote: > > > > > The second and third patch together resolve PR 81806. > > > > The attached patch modifies split_finish to use the subtree size we > >

Re: [RFC PATCH 0/3] _BitInt(N) support for LoongArch

2025-05-20 Thread Xi Ruoyao
quired by the ISA spec. I'm trying to fix an ext-dce bug regarding !TARGET_TRULY_NOOP_TRUNCATION so I just decided to chime in and explain this :). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] ext-dce: Only transform extend to subreg if TRULY_NOOP_TRUNCATION [PR 120050]

2025-05-12 Thread Xi Ruoyao
On Mon, 2025-05-12 at 12:59 +0100, Richard Sandiford wrote: > Xi Ruoyao writes: > > The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these > > machines the hardware may look at bits outside of the given mode. > > > > gcc/ChangeLog: > > &

[PATCH] ext-dce: Only transform extend to subreg if TRULY_NOOP_TRUNCATION [PR 120050]

2025-05-12 Thread Xi Ruoyao
The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these machines the hardware may look at bits outside of the given mode. gcc/ChangeLog: PR rtl-optimization/120050 * ext-dce.cc (ext_dce_try_optimize_insn): Only transform the insn if TRULY_NOOP_TRUNCATION. -

Re: [PATCH 30/61] MSA: Make MSA and microMIPS R5 unsupported

2025-04-27 Thread Xi Ruoyao
bination: %s", "-mmicromips -mmsa"); And should this line be updated too, like "-mmicromips -mmsa is only supported for MIPSr6"? Unfortunately the original patch is already applied and breaks even a non-bootstrapping build for MIPS. Thus a fix is needed ASAP, or we'll have to revert the original patch. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Pushed r15-9167: [PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

2025-04-04 Thread Xi Ruoyao
On Thu, 2025-04-03 at 10:13 +0800, Lulu Cheng wrote: > > On 2025/4/2 at 11:19 AM, Xi Ruoyao wrote: > > Avoid using gensub, which FreeBSD awk lacks; use gsub and split, which > > each > > of gawk, mawk, and FreeBSD awk provides. > > > > Reported-by: mp...@vip.163.com &

[PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

2025-04-01 Thread Xi Ruoyao
Avoid using gensub, which FreeBSD awk lacks; use gsub and split, which each of gawk, mawk, and FreeBSD awk provides. Reported-by: mp...@vip.163.com Link: https://man.freebsd.org/cgi/man.cgi?query=awk gcc/ChangeLog: * config/loongarch/genopts/gen-evolution.awk: Avoid using gensub tha

[gcc-14 PATCH] Reuse scratch registers generated by LRA

2025-03-27 Thread Xi Ruoyao
From: Denis Chertykov Test file: udivmoddi.c problem insn: 484 Before LRA pass we have: (insn 484 483 485 72 (parallel [ (set (reg/v:SI 143 [ __q1 ]) (plus:SI (reg/v:SI 143 [ __q1 ]) (const_int -2 [0xfffe]))) (clobber (scrat

[PATCH] LoongArch: Add ABI names for FPR

2025-03-15 Thread Xi Ruoyao
We already allow the ABI names for GPR in the inline asm clobber list, so for consistency allow the ABI names for FPR as well. Reported-by: Yao Zi gcc/ChangeLog: * config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add fa0-fa7, ft0-ft15, and fs0-fs7. gcc/testsuite/ChangeLog:
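
A small usage sketch of what the change permits (the asm body is purely illustrative): an FPR ABI name can now appear in the clobber list just like a GPR ABI name.

    void
    clobbers_fa0 (void)
    {
      /* With this change "fa0" is accepted in the clobber list in addition
         to the raw register name.  */
      __asm__ volatile ("fmov.d $fa0, $fa1" : : : "fa0");
    }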

[PATCH] LoongArch: Don't use C++17 feature [PR119238]

2025-03-12 Thread Xi Ruoyao
Structured binding is a C++17 feature but the GCC code base is in C++14. gcc/ChangeLog: PR target/119238 * config/loongarch/simd.md (dot_prod): Stop using structured binding. --- Ok for trunk? gcc/config/loongarch/simd.md | 14 -- 1 file changed, 8 insertion

[PATCH] LoongArch: Fix ICE when trying to recognize bitwise + alsl.w pair [PR119127]

2025-03-11 Thread Xi Ruoyao
When we call loongarch_reassoc_shift_bitwise for _alsl_reversesi_extend, the mask is in DImode but we are trying to operate on it in SImode, causing an ICE. To fix the issue, sign-extend the mask into the mode we want. And also specially handle the case where the mask is extended into -1 to avoid a miss-op

Re: [PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-04 Thread Xi Ruoyao
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote: > LGTM! Pushed to trunk. The draft of the gcc-14 backport is attached; I'll push it if it builds & tests fine and there's no objection. -- Xi Ruoyao School of Aerospace Science and Technology, Xidia

[PATCH 08/17] LoongArch: Implement subword atomic_fetch_{and, or, xor} with am*.w instructions

2025-03-03 Thread Xi Ruoyao
We can just shift the mask and fill the other bits with 0 (for ior/xor) or 1 (for and), and use an am*.w instruction to perform the atomic operation, instead of using an LL-SC loop. gcc/ChangeLog: * config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND): Remove. (UNSPEC_COM
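
A C-level sketch of the shifted-mask idea (a generic illustration, not the actual sync.md expansion): a byte-wide atomic OR can be done with one word-wide atomic OR whose operand holds the identity value (0 for or/xor, 1 for and) in the untouched bit positions.

    #include <stdint.h>

    /* Atomically OR val into the byte at p using a 32-bit atomic OR on the
       aligned word containing it.  Assumes little-endian byte order, as on
       LoongArch.  */
    static inline void
    byte_atomic_or (uint8_t *p, uint8_t val)
    {
      uint32_t *word = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
      unsigned shift = ((uintptr_t) p & 3) * 8;
      /* Bits outside the target byte are 0, the identity for or/xor; for and
         they would have to be 1 instead.  */
      __atomic_fetch_or (word, (uint32_t) val << shift, __ATOMIC_SEQ_CST);
    }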

[PATCH 13/17] LoongArch: Add -m[no-]scq option

2025-03-03 Thread Xi Ruoyao
We'll use the sc.q instruction for some 16-byte atomic operations, but it was only added in the LoongArch 1.1 evolution, so we need to gate it with an option. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (scq): New evolution feature. * config/loongarch/loongarch-evo

[PATCH] LoongArch: Fix incorrect reorder of __lsx_vldx and __lasx_xvldx [PR119084]

2025-03-02 Thread Xi Ruoyao
They could be incorrectly reordered with store instructions like st.b because the RTL expression does not have a memory_operand or a (mem) expression. The incorrect reordering has been observed in an openh264 LTO build. Expand them to a (mem) expression instead of an unspec to fix the issue. Then we need

[PATCH 01/17] LoongArch: (NFC) Remove atomic_optab and use amop instead

2025-03-02 Thread Xi Ruoyao
They are the same. gcc/ChangeLog: * config/loongarch/sync.md (atomic_optab): Remove. (atomic_): Change atomic_optab to amop. (atomic_fetch_): Likewise. --- gcc/config/loongarch/sync.md | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/gcc/config/lo

[PATCH 05/17] LoongArch: Don't emit overly-restrictive barrier for LL-SC loops

2025-03-01 Thread Xi Ruoyao
For LL-SC loops, if the atomic operation has succeeded, the SC instruction always implies a full barrier, so the barrier we manually inserted only needs to take into account the failure memorder, not the success memorder (the barrier is skipped with "b 3f" on success anyway). Note that if we use

[PATCH 17/17] LoongArch: Implement 16-byte atomic add, sub, and, or, xor, and nand with sc.q

2025-03-01 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec. (UNSPEC_TI_FETCH_SUB): Likewise. (UNSPEC_TI_FETCH_AND): Likewise. (UNSPEC_TI_FETCH_XOR): Likewise. (UNSPEC_TI_FETCH_OR): Likewise. (UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Like

[PATCH 11/17] LoongArch: Implement 16-byte atomic load with LSX

2025-03-01 Thread Xi Ruoyao
If the vector is naturally aligned, it cannot cross cache lines so the LSX load is guaranteed to be atomic. Thus we can use LSX to do the lock-free atomic load, instead of using a lock. gcc/ChangeLog: * config/loongarch/sync.md (atomic_loadti_lsx): New define_insn. (atomic_loadti
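
A user-level sketch of what relies on this guarantee (assumption: the object is naturally, i.e. 16-byte, aligned, so a single 128-bit LSX access cannot straddle a cache line; exact code generation depends on the options used):

    /* With -mlsx this atomic load can become a single vld instead of a
       lock-based libatomic call (a sketch, not the patch's testcase).  */
    __int128
    load_i128 (__int128 *p)
    {
      __int128 v;
      __atomic_load (p, &v, __ATOMIC_ACQUIRE);
      return v;
    }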

[PATCH 03/17] LoongArch: Don't use "+" for atomic_{load, store} "m" constraint

2025-02-28 Thread Xi Ruoyao
Atomic load does not modify the memory. Atomic store does not read the memory, thus we can use "=" instead. gcc/ChangeLog: * config/loongarch/sync.md (atomic_load): Remove "+" for the memory operand. (atomic_store): Use "=" instead of "+" for the memory operand. -

[PATCH 15/17] LoongArch: Implement 16-byte CAS with sc.q

2025-02-28 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/sync.md (atomic_compare_and_swapti_scq): New define_insn. (atomic_compare_and_swapti): New define_expand. --- gcc/config/loongarch/sync.md | 89 1 file changed, 89 insertions(+) diff --git a/gcc/config

[PATCH 10/17] LoongArch: Implement atomic_fetch_nand

2025-02-28 Thread Xi Ruoyao
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand is expanded to a loop containing a CAS in the body, and CAS itself is an LL-SC loop, so we have a nested loop. This is obviously not a good idea as we just need one LL-SC loop in fact. As ~(atom & mask) is (~mask) | (~atom), w
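
The identity relied on here is just De Morgan's law; a quick illustrative check:

    #include <assert.h>

    int
    main (void)
    {
      unsigned atom = 0x1234u, mask = 0x00ffu;
      /* ~(atom & mask) == (~mask) | (~atom), so a masked NAND can be built
         from an OR once both operands are inverted.  */
      assert ((~(atom & mask)) == ((~mask) | (~atom)));
      return 0;
    }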

[PATCH 06/17] LoongArch: Remove unneeded "b 3f" instruction after LL-SC loops

2025-02-28 Thread Xi Ruoyao
This instruction is used to skip a redundant barrier if -mno-ld-seq-sa or the memory model requires a barrier on failure. But with -mld-seq-sa and other memory models the barrier may not exist at all, and we should remove the "b 3f" instruction as well. The implementation uses a new operand

[PATCH 16/17] LoongArch: Implement 16-byte atomic exchange with sc.q

2025-02-28 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/sync.md (atomic_exchangeti_scq): New define_insn. (atomic_exchangeti): New define_expand. --- gcc/config/loongarch/sync.md | 35 +++ 1 file changed, 35 insertions(+) diff --git a/gcc/config/loongarch/sync.m

[PATCH 09/17] LoongArch: Don't expand atomic_fetch_sub_{hi, qi} to LL-SC loop if -mlam-bh

2025-02-28 Thread Xi Ruoyao
With -mlam-bh, we should negate the addend first, and use an amadd instruction. Disabling the expander makes the compiler do it correctly. gcc/ChangeLog: * config/loongarch/sync.md (atomic_fetch_sub): Disable if ISA_HAS_LAM_BH. --- gcc/config/loongarch/sync.md | 2 +- 1 file cha

[PATCH 14/17] LoongArch: Implement 16-byte atomic store with sc.q

2025-02-28 Thread Xi Ruoyao
When LSX is not available but sc.q is (for example on LA664 where the SIMD unit is not enabled), we can use an LL-SC loop for 16-byte atomic store. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand_reloc): Accept "%t" for printing the number of the 64-bit mach

[PATCH 12/17] LoongArch: Implement 16-byte atomic store with LSX

2025-02-28 Thread Xi Ruoyao
If the vector is naturally aligned, it cannot cross cache lines so the LSX store is guaranteed to be atomic. Thus we can use LSX to do the lock-free atomic store, instead of using a lock. gcc/ChangeLog: * config/loongarch/sync.md (atomic_storeti_lsx): New define_insn. (at

[PATCH 07/17] LoongArch: Remove unneeded "andi offset, addr, 3" instruction in atomic_test_and_set

2025-02-28 Thread Xi Ruoyao
On LoongArch sll.w and srl.w instructions only take the [4:0] bits of rk (shift amount) into account, and we've already defined SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact, thus we don't need this instruction. gcc/ChangeLog: * config/loongarch/sync.md (atomic_test_and_set):
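
The arithmetic behind dropping the masking instruction, as a quick illustrative check: because the hardware shift reads only the low 5 bits of the amount, masking the byte offset to 2 bits before multiplying by 8 changes nothing.

    #include <assert.h>
    #include <stdint.h>

    int
    main (void)
    {
      for (uintptr_t addr = 0; addr < 64; addr++)
        /* sll.w/srl.w use only bits [4:0] of the shift amount, so shifting
           by addr * 8 and by (addr & 3) * 8 select the same byte lane.  */
        assert ((addr * 8) % 32 == (addr & 3) * 8);
      return 0;
    }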

[PATCH 02/17] LoongArch: (NFC) Remove amo and use size instead

2025-02-28 Thread Xi Ruoyao
They are the same. gcc/ChangeLog: * config/loongarch/sync.md: Use instead of . (amo): Remove. --- gcc/config/loongarch/sync.md | 53 +--- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loo

[PATCH 04/17] LoongArch: Allow using bstrins for masking the address in atomic_test_and_set

2025-02-28 Thread Xi Ruoyao
We can use bstrins for masking the address here. As people are already working on LA32R (which lacks bstrins instructions), for future-proofing we check whether (const_int -4) is an and_operand and force it into a register if not. gcc/ChangeLog: * config/loongarch/sync.md (atomic_test_a

[PATCH 00/17] LoongArch: Clean up atomic operations and implement 16-byte atomic operations

2025-02-28 Thread Xi Ruoyao
The entire patch bootstrapped and regtested on loongarch64-linux-gnu with -march=la664, and I've also tried several simple 16-byte atomic operation tests locally. OK for trunk? Or maybe the clean up is OK but the 16-byte atomic implementation still needs to be confirmed by the hardware team

[PATCH] LoongArch: Add a dedicated pattern for bitwise + alsl

2025-02-28 Thread Xi Ruoyao
We've implemented the slli + bitwise => bitwise + slli reassociation in r15-7062. I'd hoped late combine could handle slli.d + bitwise + add.d => bitwise + slli.d + add.d => bitwise + alsl.d, but it does not always work, for example a |= 0xfff; b |= 0xfff; a <<= 2; b <<= 2; a += x; b

Re: [PATCH] LoongArch: Avoid unnecessary zero-initialization using LSX for scalar popcount

2025-02-25 Thread Xi Ruoyao
On Tue, 2025-02-25 at 20:49 +0800, Lulu Cheng wrote: > > On 2025/2/22 at 3:34 PM, Xi Ruoyao wrote: > > Now for __builtin_popcountl we are getting things like > > > > vrepli.b $vr0,0 > > vinsgr2vr.d $vr0,$r4,0 > > vpcnt.d $vr0,$vr0 > >

[PATCH] LoongArch: Avoid unnecessary zero-initialization using LSX for scalar popcount

2025-02-21 Thread Xi Ruoyao
Now for __builtin_popcountl we are getting things like vrepli.b $vr0,0 vinsgr2vr.d $vr0,$r4,0 vpcnt.d $vr0,$vr0 vpickve2gr.du $r4,$vr0,0 slli.w $r4,$r4,0 jr $r1 The "vrepli.b" instruction is introduced by the init-regs pass (see PR618

Re: [RFC] RISC-V: The optimization ignored the side effects of the rounding mode, resulting in incorrect results.

2025-02-19 Thread Xi Ruoyao
assumptions about the rounding modes in > floating-point > calculations, such as in float_extend, which may prevent CSE optimizations. > Could > this also lead to lost optimization opportunities in other areas that don't > require > this option? I'm not sure. > > I suspect that the best approach would be to define relevant > attributes (perhaps similar to -frounding-math) within specific related > patterns/built-ins > to inform optimizers we are using a rounding mode and to avoid > over-optimization. The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that we've not implemented. See https://gcc.gnu.org/PR34678. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Ping: [PATCH] testsuite: Fix up toplevel-asm-1.c for LoongArch

2025-02-18 Thread Xi Ruoyao
On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote: > Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs > even with -fno-pic. > > gcc/testsuite/ChangeLog: > > * c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3 > %c4 on LoongArc

[PATCH] LoongArch: Use normal RTL pattern instead of UNSPEC for {x, }vsr{a, l}ri instructions

2025-02-14 Thread Xi Ruoyao
This allows (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding shift operation. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove. (UNSPEC_LASX_XVSRLRI): Remove. (lasx_xvsrari_): Remove. (lasx_xvsrlri_): Remove. * config/loonga
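
A scalar illustration of the expression being matched (values chosen arbitrarily):

    #include <assert.h>

    int
    main (void)
    {
      unsigned long t = 37, imm = 3;
      /* (t + (1ul << imm >> 1)) >> imm rounds to nearest instead of
         truncating: 37 / 8 = 4.625, which rounds to 5.  */
      assert (((t + (1ul << imm >> 1)) >> imm) == 5);
      return 0;
    }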

Re: [PATCH v2 2/8] LoongArch: Allow moving TImode vectors

2025-02-14 Thread Xi Ruoyao
On Fri, 2025-02-14 at 15:46 +0800, Lulu Cheng wrote: > Hi, > > If only apply the first and second patches, the code will not compile. > > Otherwise LGTM. Fixed in v3: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675776.html -- Xi Ruoyao School of Aerospace Science

[PATCH v3 5/8] LoongArch: Simplify {lsx_,lasx_x}vmaddw description

2025-02-14 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder two operands of the outer plus in the template, so combine will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}. gcc/ChangeL

[PATCH v3 4/8] LoongArch: Simplify {lsx_, lasx_x}vh{add, sub}w description

2025-02-14 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. gcc/ChangeLog: * config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove. (UNSPEC_LASX_XVHSUBW_Q_D): Remove. (UNSPEC_LASX

[PATCH v3 6/8] LoongArch: Simplify lsx_vpick description

2025-02-14 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. This is not suitable for LASX where lasx_xvpick has a different semantic. gcc/ChangeLog: * config/loongarch/simd.md (LVEC): New define_mode_attr. (simdfmt_as_

[PATCH v3 8/8] LoongArch: Implement [su]dot_prod* for LSX and LASX modes

2025-02-14 Thread Xi Ruoyao
Although it's just a special case of "a widening product whose result is used for reduction," having these standard names allows the dot product pattern to be recognized earlier, which may be beneficial for optimization. Also fix some test failures with the test cases: - gcc.dg/vect/vect-reduc-chai

[PATCH v3 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

2025-02-14 Thread Xi Ruoyao
Since PR116142 has been fixed, we can now add the standard names so the compiler will generate better code if the result of a widening product is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_mult__): New define_expand. gcc/test

[PATCH v3 1/8] LoongArch: Try harder using vrepli instructions to materialize const vectors

2025-02-14 Thread Xi Ruoyao
For a = (v4si){0x, 0x, 0x, 0x} we just want vrepli.b $vr0, 0xdd but the compiler actually produces a load: la.local $r14,.LC0 vld $vr0,$r14,0 It's because we only tried vrepli.d which wouldn't work. Try all vrepli instructions for const int vector

[PATCH v3 2/8] LoongArch: Allow moving TImode vectors

2025-02-14 Thread Xi Ruoyao
We have some vector instructions for operations on 128-bit integer, i.e. TImode, vectors. Previously they had been modeled with unspecs, but it's more natural to just model them with TImode vector RTL expressions. In preparation, allow moving V1TImode and V2TImode vectors in LSX and LASX reg

[PATCH v3 3/8] LoongArch: Simplify {lsx_, lasx_x}v{add, sub, mul}l{ev, od} description

2025-02-14 Thread Xi Ruoyao
These pattern definitions are tediously long, invoking 32 UNSPECs and many hard-coded long const vectors. To simplify them, we first use the TImode vector operations instead of the UNSPECs, then we adopt an approach from AArch64: using a special predicate to match the const vectors for odd/even i

[PATCH v3 0/8] LoongArch: SIMD odd/even/horizontal widening arithmetic cleanup and optimization

2025-02-14 Thread Xi Ruoyao
tested on loongarch64-linux-gnu, no new code change in v3. Ok for trunk? Xi Ruoyao (8): LoongArch: Try harder using vrepli instructions to materialize const vectors LoongArch: Allow moving TImode vectors LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description LoongArch: Si

[PATCH v2 7/8] LoongArch: Implement vec_widen_mult_{even, odd}_* for LSX and LASX modes

2025-02-13 Thread Xi Ruoyao
Since PR116142 has been fixed, we can now add the standard names so the compiler will generate better code if the result of a widening product is reduced. gcc/ChangeLog: * config/loongarch/simd.md (even_odd): New define_int_attr. (vec_widen_mult__): New define_expand. gcc/test

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-13 Thread Xi Ruoyao
n test the optimal > values > > for -malign-{functions,labels,jumps,loops} on that basis. Thanks! -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH v2 6/8] LoongArch: Simplify lsx_vpick description

2025-02-13 Thread Xi Ruoyao
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates instead of hard-coded const vectors. This is not suitable for LASX where lasx_xvpick has a different semantic. gcc/ChangeLog: * config/loongarch/simd.md (LVEC): New define_mode_attr. (simdfmt_as_
