On Thu, 2025-04-03 at 10:13 +0800, Lulu Cheng wrote:
>
> On 2025/4/2 at 11:19 AM, Xi Ruoyao wrote:
> > Avoid using gensub, which FreeBSD awk lacks; use gsub and split,
> > which gawk, mawk, and FreeBSD awk each provide.
> >
> > Reported-by: mp...@vip.163.com
Avoid using gensub, which FreeBSD awk lacks; use gsub and split, which
gawk, mawk, and FreeBSD awk each provide.
Reported-by: mp...@vip.163.com
Link: https://man.freebsd.org/cgi/man.cgi?query=awk
gcc/ChangeLog:
* config/loongarch/genopts/gen-evolution.awk: Avoid using gensub
tha
From: Denis Chertykov
Test file: udivmoddi.c
problem insn: 484
Before LRA pass we have:
(insn 484 483 485 72 (parallel [
(set (reg/v:SI 143 [ __q1 ])
(plus:SI (reg/v:SI 143 [ __q1 ])
(const_int -2 [0xfffe])))
(clobber (scrat
We already allow the ABI names for GPR in inline asm clobber list, so
for consistency allow the ABI names for FPR as well.
Reported-by: Yao Zi
gcc/ChangeLog:
* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
fa0-fa7, ft0-ft15, and fs0-fs7.
gcc/testsuite/ChangeLog:
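A minimal C sketch (hypothetical test, not part of the patch) of the kind of inline asm this enables -- clobbering an FPR by its ABI name:

void
touch_ft0 (void)
{
  /* "ft0" is the ABI name for $f8; with this change it is accepted in
     the clobber list, just as the GPR ABI names already are.  */
  asm volatile ("fmov.d\t$f8,$f8" ::: "ft0");
}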
Structured binding is a C++17 feature but the GCC code base is in C++14.
gcc/ChangeLog:
PR target/119238
* config/loongarch/simd.md (dot_prod):
Stop using structured binding.
---
Ok for trunk?
gcc/config/loongarch/simd.md | 14 --
1 file changed, 8 insertion
When we call loongarch_reassoc_shift_bitwise for
_alsl_reversesi_extend, the mask is in DImode but we are trying
to operate on it in SImode, causing an ICE.
To fix the issue, sign-extend the mask into the mode we want. Also
specially handle the case where the mask is extended into -1, to avoid a
miss-op
On Wed, 2025-03-05 at 10:52 +0800, Lulu Cheng wrote:
> LGTM!
Pushed to trunk. The draft of the gcc-14 backport is attached; I'll push
it if it builds & tests fine and there's no objection.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using an LL-SC loop.
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND):
Remove.
(UNSPEC_COM
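A C sketch of the masking idea (illustrative only, assuming little-endian byte numbering; this is not the compiler's internal code):

#include <stdint.h>

/* Operand for a full-word amor.w implementing a 1-byte atomic OR:
   the other bits are 0, so the OR leaves them unchanged.  */
static uint32_t
or_operand (uint8_t val, uintptr_t addr)
{
  unsigned shift = (addr & 3) * 8;
  return (uint32_t) val << shift;
}

/* Operand for amand.w implementing a 1-byte atomic AND: the other
   bits are 1, so the AND leaves them unchanged.  */
static uint32_t
and_operand (uint8_t val, uintptr_t addr)
{
  unsigned shift = (addr & 3) * 8;
  return ((uint32_t) val << shift) | ~((uint32_t) 0xff << shift);
}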
We'll use the sc.q instruction for some 16-byte atomic operations, but
it was only added in the LoongArch 1.1 evolution, so we need to gate it
with an option.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (scq): New evolution
feature.
* config/loongarch/loongarch-evo
They could be incorrectly reordered with store instructions like st.b
because the RTL expression does not have a memory_operand or a (mem)
expression. The incorrect reordering has been observed in an openh264
LTO build.
Expand them to a (mem) expression instead of an unspec to fix the issue.
Then we need
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_): Change atomic_optab to amop.
(atomic_fetch_): Likewise.
---
gcc/config/loongarch/sync.md | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/gcc/config/lo
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
inserted only needs to take into account the failure memorder, not
the success memorder (the barrier is skipped with "b 3f" on success
anyway).
Note that if we use
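A hedged illustration of the distinction: the barrier after the LL-SC loop only has to satisfy the failure memorder, not the success memorder.

/* The dbar emitted after the LL-SC loop only has to satisfy the
   failure memorder (__ATOMIC_RELAXED here); the success path already
   gets a full barrier from the sc instruction.  */
_Bool
cas_acq_rel (int *p, int *expected, int desired)
{
  return __atomic_compare_exchange_n (p, expected, desired, 0,
                                      __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
}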
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec.
(UNSPEC_TI_FETCH_SUB): Likewise.
(UNSPEC_TI_FETCH_AND): Likewise.
(UNSPEC_TI_FETCH_XOR): Likewise.
(UNSPEC_TI_FETCH_OR): Likewise.
(UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Like
If the vector is naturally aligned, it cannot cross cache lines so the
LSX load is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic load, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_loadti_lsx): New define_insn.
(atomic_loadti
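An illustrative source-level operation this concerns (hypothetical function name):

/* With LSX and a naturally aligned object, this 16-byte atomic load
   can be done with a single vld instead of going through a lock.  */
__int128
load16 (__int128 *p)
{
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
}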
Atomic load does not modify the memory. Atomic store does not read the
memory, thus we can use "=" instead.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_load): Remove "+" for
the memory operand.
(atomic_store): Use "=" instead of "+" for the memory
operand.
-
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_compare_and_swapti_scq): New
define_insn.
(atomic_compare_and_swapti): New define_expand.
---
gcc/config/loongarch/sync.md | 89
1 file changed, 89 insertions(+)
diff --git a/gcc/config
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand
is expanded to a loop containing a CAS in the body, and CAS itself is an
LL-SC loop, so we have a nested loop. This is obviously not a good idea
as we really just need one LL-SC loop.
As ~(atom & mask) is (~mask) | (~atom), w
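For reference, the builtin in question (illustrative):

/* With direct atomic_fetch_nandsi/di patterns this needs only one
   LL-SC loop instead of a CAS loop wrapped around another loop.  */
unsigned int
fetch_nand (unsigned int *p, unsigned int mask)
{
  return __atomic_fetch_nand (p, mask, __ATOMIC_SEQ_CST);
}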
This instruction is used to skip a redundant barrier when -mno-ld-seq-sa
is in effect or the memory model requires a barrier on failure. But with
-mld-seq-sa and other memory models the barrier may not exist at all, and
we should remove the "b 3f" instruction as well.
The implementation uses a new operand
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_exchangeti_scq): New
define_insn.
(atomic_exchangeti): New define_expand.
---
gcc/config/loongarch/sync.md | 35 +++
1 file changed, 35 insertions(+)
diff --git a/gcc/config/loongarch/sync.m
With -mlam-bh, we should negate the addend first, and use an amadd
instruction. Disabling the expander makes the compiler do it correctly.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_fetch_sub):
Disable if ISA_HAS_LAM_BH.
---
gcc/config/loongarch/sync.md | 2 +-
1 file cha
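An illustrative source form of the affected operation:

/* With -mlam-bh the subtraction is done by negating v and using an
   amadd.h instruction, rather than expanding an LL-SC loop.  */
short
fetch_sub_short (short *p, short v)
{
  return __atomic_fetch_sub (p, v, __ATOMIC_RELAXED);
}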
When LSX is not available but sc.q is (for example on LA664 where the
SIMD unit is not enabled), we can use an LL-SC loop for a 16-byte atomic
store.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Accept "%t" for printing the number of the 64-bit mach
If the vector is naturally aligned, it cannot cross cache lines so the
LSX store is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic store, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_storeti_lsx): New
define_insn.
(at
On LoongArch, the sll.w and srl.w instructions only take bits [4:0] of
rk (the shift amount) into account, and we've already defined
SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact; thus we
don't need this instruction.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set):
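For illustration, the redundant masking at the source level:

/* sll.w only reads bits [4:0] of the shift amount and
   SHIFT_COUNT_TRUNCATED is 1, so the "& 31" costs no extra
   instruction; a separate masking instruction in the pattern is
   therefore unnecessary.  */
unsigned int
shift_left (unsigned int x, unsigned int n)
{
  return x << (n & 31);
}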
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md: Use instead of .
(amo): Remove.
---
gcc/config/loongarch/sync.md | 53 +---
1 file changed, 25 insertions(+), 28 deletions(-)
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loo
We can use bstrins for masking the address here. As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into a
register if not.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_a
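The address computation in question, as plain C (illustrative):

#include <stdint.h>

/* Align the byte address down to the containing 32-bit word.  On LA64
   the "& -4" can be a single bstrins.d clearing bits [1:0]; on LA32R,
   which lacks bstrins, the -4 may have to be forced into a register.  */
static uintptr_t
align_word (uintptr_t addr)
{
  return addr & ~(uintptr_t) 3;
}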
The entire patch series was bootstrapped and regtested on
loongarch64-linux-gnu with -march=la664, and I've also tried several
simple 16-byte atomic operation tests locally.
OK for trunk? Or maybe the cleanup is OK but the 16-byte atomic
implementation still needs to be confirmed by the hardware team
We've implemented the slli + bitwise => bitwise + slli reassociation in
r15-7062. I'd hoped late combine could handle slli.d + bitwise + add.d
=> bitwise + slli.d + add.d => bitwise + alsl.d, but it does not always
work, for example:
a |= 0xfff;
b |= 0xfff;
a <<= 2;
b <<= 2;
a += x;
b
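A hedged reconstruction of the kind of code meant (the example above is cut off in the archive):

long
reassoc_example (long a, long x)
{
  a |= 0xfff;
  a <<= 2;
  return a + x;   /* ideally ori + alsl.d, not ori + slli.d + add.d */
}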
On Tue, 2025-02-25 at 20:49 +0800, Lulu Cheng wrote:
>
> On 2025/2/22 at 3:34 PM, Xi Ruoyao wrote:
> > Now for __builtin_popcountl we are getting things like
> >
> > vrepli.b$vr0,0
> > vinsgr2vr.d $vr0,$r4,0
> > vpcnt.d $vr0,$vr0
> >
Now for __builtin_popcountl we are getting things like
vrepli.b $vr0,0
vinsgr2vr.d $vr0,$r4,0
vpcnt.d $vr0,$vr0
vpickve2gr.du $r4,$vr0,0
slli.w $r4,$r4,0
jr $r1
The "vrepli.b" instruction is introduced by the init-regs pass (see
PR618
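The kind of source producing the sequence above (illustrative):

/* Compiled for loongarch64 with LSX this becomes the vinsgr2vr.d +
   vpcnt.d + vpickve2gr.du sequence shown above; the leading vrepli.b
   is introduced by the init-regs pass.  */
int
popcount (unsigned long x)
{
  return __builtin_popcountl (x);
}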
assumptions about the rounding modes in floating-point calculations,
> such as in float_extend, which may prevent CSE optimizations. Could
> this also lead to lost optimization opportunities in other areas that
> don't require this option? I'm not sure.
>
> I suspect that the best approach would be to define relevant
> attributes (perhaps similar to -frounding-math) within specific related
> patterns/built-ins
> to inform optimizers we are using a rounding mode and to avoid
> over-optimization.
The "special pattern" is supposed to be #pragma STDC FENV_ACCESS that
we've not implemented. See https://gcc.gnu.org/PR34678.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2025-02-05 at 08:57 +0800, Xi Ruoyao wrote:
> Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
> even with -fno-pic.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
> %c4 on LoongArc
Allow (t + (1ul << imm >> 1)) >> imm to be recognized as a rounding
shift operation.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVSRARI): Remove.
(UNSPEC_LASX_XVSRLRI): Remove.
(lasx_xvsrari_): Remove.
(lasx_xvsrlri_): Remove.
* config/loonga
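A scalar C illustration of the formula (the patch itself is about the vector forms, which map to [x]vsrari/[x]vsrlri; imm = 7 is just an example value):

int
rounding_shift (int t)
{
  /* (t + (1 << imm >> 1)) >> imm with imm == 7: add half of the
     shifted-out range before shifting, i.e. a rounding right shift.  */
  return (t + (1 << 7 >> 1)) >> 7;
}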
On Fri, 2025-02-14 at 15:46 +0800, Lulu Cheng wrote:
> Hi,
>
> If only the first and second patches are applied, the code will not compile.
>
> Otherwise LGTM.
Fixed in v3:
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675776.html
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
gcc/ChangeL
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.
This is not suitable for LASX, where lasx_xvpick has different
semantics.
gcc/ChangeLog:
* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_
Although it's just a special case of "a widening product whose result
is used for reduction," having these standard names allows the dot
product pattern to be recognized earlier, which may benefit
optimization. Also fix some test failures with the test cases:
- gcc.dg/vect/vect-reduc-chai
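A source-level sketch (illustrative) of a loop the dot_prod standard name lets the vectorizer handle earlier:

int
dot_product (signed char *a, signed char *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];   /* widening product reduced into sum */
  return sum;
}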
Since PR116142 has been fixed, now we can add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.
gcc/ChangeLog:
* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.
gcc/test
For
a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}
we just want
vrepli.b $vr0, 0xdd
but the compiler actually produces a load:
la.local $r14,.LC0
vld $vr0,$r14,0
It's because we only tried vrepli.d which wouldn't work. Try all vrepli
instructions for const int vector
We have some vector instructions for operations on 128-bit integer
(i.e. TImode) vectors. Previously they had been modeled with unspecs,
but it's more natural to just model them with TImode vector RTL
expressions.
For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX reg
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, we first use
TImode vector operations instead of the UNSPECs, then we adopt an
approach used in AArch64: using a special predicate to match the const
vectors for odd/even i
tested on loongarch64-linux-gnu, no new code
change in v3. Ok for trunk?
Xi Ruoyao (8):
LoongArch: Try harder using vrepli instructions to materialize const
vectors
LoongArch: Allow moving TImode vectors
LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
LoongArch: Si
Since PR116142 has been fixed, now we can add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.
gcc/ChangeLog:
* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.
gcc/test
n test the optimal values
> for -malign-{functions,labels,jumps,loops} on that basis.
Thanks!
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.
This is not suitable for LASX, where lasx_xvpick has different
semantics.
gcc/ChangeLog:
* config/loongarch/simd.md (LVEC): New define_mode_attr.
(simdfmt_as_
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
gcc/ChangeL
Although it's just a special case of "a widening product whose result
is used for reduction," having these standard names allows the dot
product pattern to be recognized earlier, which may benefit
optimization. Also fix some test failures with the test cases:
- gcc.dg/vect/vect-reduc-chai
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, we first use
TImode vector operations instead of the UNSPECs, then we adopt an
approach used in AArch64: using a special predicate to match the const
vectors for odd/even i
We have some vector instructions for operations on 128-bit integer
(i.e. TImode) vectors. Previously they had been modeled with unspecs,
but it's more natural to just model them with TImode vector RTL
expressions.
For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX reg
For
a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}
we just want
vrepli.b $vr0, 0xdd
but the compiler actually produces a load:
la.local $r14,.LC0
vld $vr0,$r14,0
It's because we only tried vrepli.d which wouldn't work. Try all vrepli
instructions for const int vector
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX
is selected for the left operand of addsub. Swap the operands if
needed when outputting the asm.
- Fix typos in commit subjects.
- Mention V2TI in loongarch-modes.def.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (8):
LoongArch: Try harder using vrepli instructions
On Thu, 2025-02-13 at 09:24 +0800, Lulu Cheng wrote:
>
> On 2025/2/12 at 6:19 PM, Xi Ruoyao wrote:
> > On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote:
> >
> > /* snip */
> >
> > > diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
> > > b
oongarch/pr118828-4.c
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-options "-mtune=la464" } */
> +
> +#include
> +#include
> +#include
> +
> +#ifndef __loongarch_tune
> +#error __loongarch_tune should not be available here
Likewise.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Tue, 2025-02-11 at 16:52 +0800, Lulu Cheng wrote:
>
> On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> /* snip */
> > -
> > -(define_insn "lasx_xvpickev_w"
> > - [(set (match_operand:V8SI 0 "register_operand" "=f")
> > - (vec_select:V8S
_LSX)
> - {
> - builtin_define ("__loongarch_simd");
> - builtin_define ("__loongarch_sx");
> -
> - if (!ISA_HAS_LASX)
> - builtin_define ("__loongarch_simd_width=128");
> - }
> -
> - if (ISA_HAS_LASX)
> - {
>
On Tue, 2025-02-11 at 15:49 +0800, Lulu Cheng wrote:
> It seems that the title here is "{lsx_,lasx_x}vmaddw".
Will fix in v2.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote:
> Hi,
>
> I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be
> "{lsx_,lasx_x}vh{add,sub}w".
Indeed.
>
> On 2025/2/7 at 8:09 PM, Xi Ruoyao wrote:
> > Like what we've done for {ls
Since r15-1120, multi-word shifts/rotates produce PLUS instead of IOR.
It's generally a good thing (allowing the use of our alsl instruction or
similar instructions on other architectures), but it's preventing us
from using bytepick. For example, if we shift a __int128 by 16 bits,
the higher word can
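An illustrative case where bytepick would help:

/* Shifting an __int128 left by 16 bits: the high half is
   (hi << 16) | (lo >> 48), which bytepick.d can compute in one
   instruction once the PLUS form is recognized like the old IOR.  */
__int128
shift_left_16 (__int128 x)
{
  return x << 16;
}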
Since PR116142 has been fixed, now we can add the standard names so the
compiler will generate better code if the result of a widening
product is reduced.
gcc/ChangeLog:
* config/loongarch/simd.md (even_odd): New define_int_attr.
(vec_widen_mult__): New define_expand.
gcc/test
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
Also reorder two operands of the outer plus in the template, so combine
will recognize {x,}vadd + {x,}vmulw{ev,od} => {x,}vmaddw{ev,od}.
gcc/ChangeL
These pattern definitions are tediously long, invoking 32 UNSPECs and
many hard-coded long const vectors. To simplify them, we first use
TImode vector operations instead of the UNSPECs, then we adopt an
approach used in AArch64: using a special predicate to match the const
vectors for odd/even i
Although it's just a special case of "a widening product whose result
is used for reduction," having these standard names allows the dot
product pattern to be recognized earlier, which may benefit
optimization. Also fix some test failures with the test cases:
- gcc.dg/vect/vect-reduc-chai
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates instead of hard-coded const vectors.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvpickev_b): Remove.
(lasx_xvpickev_h): Remove.
(lasx_xvpickev_w): Remove.
(lasx_xvpickev_w_f):
Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use
special predicates and TImode RTL instead of hard-coded const vectors
and UNSPECs.
gcc/ChangeLog:
* config/loongarch/lasx.md (UNSPEC_LASX_XVHADDW_Q_D): Remove.
(UNSPEC_LASX_XVHSUBW_Q_D): Remove.
(UNSPEC_LASX
For
a = (v4si){0xdddddddd, 0xdddddddd, 0xdddddddd, 0xdddddddd}
we just want
vrepli.b $vr0, 0xdd
but the compiler actually produces a load:
la.local $r14,.LC0
vld $vr0,$r14,0
It's because we only tried vrepli.d which wouldn't work. Try all vrepli
instructions for const int vector
We have some vector instructions for operations on 128-bit integer
(i.e. TImode) vectors. Previously they had been modeled with unspecs,
but it's more natural to just model them with TImode vector RTL
expressions.
For the preparation, allow moving V1TImode and V2TImode vectors in LSX
and LASX reg
.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (8):
LoongArch: Try harder using vrepli instructions to materialize const
vectors
LoongArch: Allow moving TImode vectors
LoongArch: Simplify {lsx_,lasx_x}v{add,sub,mul}l{ev,od} description
LoongArch: Simplify
Now that the default C standard is C23, we can no longer use LSX/LASX
instructions for these operations, as the standard disallows raising
INEXACT exceptions. So LoongArch is no longer suitable for these
effective targets.
Fix the test failures on gcc.dg/vect/vect-rounding-*.c. For the old
standards or -ff
quot; "")])
-
(define_insn "lsx_vbitseli_b"
[(set (match_operand:V16QI 0 "register_operand" "=f")
(ior:V16QI (and:V16QI (not:V16QI
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 611d1f87dd2..49070e829ca 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -950,6 +950,19 @@ (define_expand "_maddw_q_du_d_punned"
DONE;
})
+(define_insn "@simd_vbitsel"
+ [(set (match_operand:ALLVEC 0 "register_operand" "=f")
+ (ior:ALLVEC
+ (and:ALLVEC
+ (not:ALLVEC (match_operand:ALLVEC 3 "register_operand" "f"))
+ (match_operand:ALLVEC 1 "register_operand" "f"))
+ (and:ALLVEC (match_dup 3)
+ (match_operand:ALLVEC 2 "register_operand" "f"))))]
+ ""
+ "vbitsel.v\t%0,%1,%2,%3"
+ [(set_attr "type" "simd_bitmov")
+ (set_attr "mode" "")])
+
; The LoongArch SX Instructions.
(include "lsx.md")
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Like RISC-V, on LoongArch we don't really support %cN for SYMBOL_REFs
even with -fno-pic.
gcc/testsuite/ChangeLog:
* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3
%c4 on LoongArch.
---
Ok for trunk?
gcc/testsuite/c-c++-common/toplevel-asm-1.c | 2 +-
1 file change
With things like
// signed char a_14, b_16;
a.0_4 = (unsigned char) a_14;
_5 = (int) a.0_4;
b.1_6 = (unsigned char) b_16;
_7 = (int) b.1_6;
c_17 = _5 - _7;
_8 = ABS_EXPR <c_17>;
r_18 = _8 + r_23;
An ABD pattern will be recognized for _8:
patt_31 = .ABD (a.0_4, b.1_6);
It's still cor
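A hedged reconstruction of source code that produces the GIMPLE above:

int
abs_diff_accumulate (signed char a, signed char b, int r)
{
  int c = (int) (unsigned char) a - (int) (unsigned char) b;
  return __builtin_abs (c) + r;   /* _8 = ABS_EXPR <c_17>; r_18 = _8 + r_23 */
}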
On Thu, 2025-01-23 at 11:21 +0800, Lulu Cheng wrote:
>
> On 2025/1/22 at 9:26 PM, Xi Ruoyao wrote:
> > The test case added in r15-7073 now triggers an ICE, indicating we need
> > the same fix as AArch64.
> >
> > gcc/ChangeLog:
> >
> > PR target/1185
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
inserted only needs to take into account the failure memorder, not
the success memorder (the barrier is skipped with "b 3f" on success
anyway).
Note that if we use
We can use bstrins for masking the address here. As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into a
register if not.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_a
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (5):
LoongArch: (NFC) Remove atomic_optab and use amop instead
LoongArch: Don't use "+" for atomic_{load,store} "m" constraint
LoongArch: Allow using bstrins for masking the address i
This instruction is used to skip a redundant barrier when -mno-ld-seq-sa
is in effect or the memory model requires a barrier on failure. But with
-mld-seq-sa and other memory models the barrier may not exist at all, and
we should remove the "b 3f" instruction as well.
The implementation uses a new operand
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_): Change atomic_optab to amop.
(atomic_fetch_): Likewise.
---
gcc/config/loongarch/sync.md | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/gcc/config/lo
Atomic load does not modify the memory. Atomic store does not read the
memory, thus we can use "=" instead.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_load): Remove "+" for
the memory operand.
(atomic_store): Use "=" instead of "+" for the memory
operand.
-
The test case added in r15-7073 now triggers an ICE, indicating we need
the same fix as AArch64.
gcc/ChangeLog:
PR target/118501
* config/loongarch/loongarch.md (@xorsign3): Use
force_lowpart_subreg.
---
Bootstrapped and regtested on loongarch64-linux-gnu, ok for trunk?
On Wed, 2025-01-22 at 10:53 +0800, Xi Ruoyao wrote:
> On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote:
> >
> > On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
> > > The second source register of this insn cannot be the same as the
> > > destination reg
C, 31, 0
alsl.d $a0, $a0, $a1, 2
.endr
addi.d $t1, $t1, -1
bnez $t1, 1b
ret
-DAVOID_RD_EQUAL_RS actually makes the program slower (on LA464)... So
I don't think it's really beneficial to deliberately insert a move.
--
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
On Wed, 2025-01-22 at 10:37 +0800, Lulu Cheng wrote:
>
> On 2025/1/22 at 8:49 AM, Xi Ruoyao wrote:
> > The second source register of this insn cannot be the same as the
> > destination register.
> >
> > gcc/ChangeLog:
> >
> > * config/loongarch/loongarch.
The uarch can fuse bstrpick.d rd,rs1,31,0 and alsl.d rd,rd,rs2,shamt,
so for this special case we should use alsl.d instead of slli.d. And
I'd hoped late combine would handle slli.d + and + add.d => and + slli.d +
add.d => and + alsl.d, but it does not always work (even before the
alsl.d special case
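The fusible pair at the source level (illustrative):

unsigned long
masked_shift_add (unsigned long a, unsigned long x)
{
  /* Zero-extend the low 32 bits, shift by 2, add: ideally
     bstrpick.d + alsl.d, which the uarch can fuse.  */
  return ((a & 0xffffffffUL) << 2) + x;
}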
The second source register of this insn cannot be the same as the
destination register.
gcc/ChangeLog:
* config/loongarch/loongarch.md
(_alsl_reversesi_extended): Add '&' to the destination
register constraint and append '0' to the first source register
constraint
the uarch macro-fused operation. The fix is partial because
TARGET_SCHED_MACRO_FUSION_PAIR_P will be needed to guarantee the
bstrpick.d and alsl instructions are not separated.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
Xi Ruoyao (2):
LoongArch: Fix wrong code with
On Tue, 2025-01-21 at 23:18 +0800, Xi Ruoyao wrote:
> On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> > > > in GCC 13 the result is:
> > > >
> > > > or $r12,$r4,$r0
> > >
> > > Hmm, this strange move is caused by "&"
On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> > > in GCC 13 the result is:
> > >
> > > or $r12,$r4,$r0
> >
> > Hmm, this strange move is caused by "&" in bstrpick_alsl_paired. Is it
> > really needed for the fusion?
>
On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> On Tue, 2025-01-21 at 21:52 +0800, Xi Ruoyao wrote:
> > > struct Pair { unsigned long a, b; };
> > >
> > > struct Pair
> > > test (struct Pair p, long x, long y)
> > > {
> > >
On Tue, 2025-01-21 at 21:52 +0800, Xi Ruoyao wrote:
> > struct Pair { unsigned long a, b; };
> >
> > struct Pair
> > test (struct Pair p, long x, long y)
> > {
> > p.a &= 0xffffffff;
> > p.a <<= 2;
> > p.a += x;
> > p.
On Tue, 2025-01-21 at 21:23 +0800, Xi Ruoyao wrote:
/* snip */
> > It seems to be more formal through TARGET_SCHED_MACRO_FUSION_P and
> >
> > TARGET_SCHED_MACRO_FUSION_PAIR_P. I found the spec test item that
> > generated
> >
> > this instruction pair. I i
On Tue, 2025-01-21 at 20:34 +0800, Lulu Cheng wrote:
>
> On 2025/1/21 at 6:05 PM, Xi Ruoyao wrote:
> > On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
> > > On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
> > > > On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
> >
On Tue, 2025-01-21 at 16:41 +0800, Lulu Cheng wrote:
>
> On 2025/1/21 at 12:59 PM, Xi Ruoyao wrote:
> > On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
> > > On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
> > > /* snip */
> > > > ;; This code iterator
On Tue, 2025-01-21 at 11:46 +0800, Lulu Cheng wrote:
>
> On 2025/1/18 at 7:33 PM, Xi Ruoyao wrote:
> /* snip */
> > ;; This code iterator allows unsigned and signed division to be generated
> > ;; from the same template.
> > @@ -3083,39 +3084
For mask{eq,ne}z, rk is always compared with 0 in the full width, thus
the mode for rk should be X.
I found the issue reviewing a patch fixing a similar issue for RISC-V
XTheadCondMov [1], but interestingly I cannot find a test case really
blowing up on LoongArch. But as the issue is obvious enou
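For reference, the kind of conditional select mask{eq,ne}z implements (illustrative):

long
cond_select (long a, long b, long c)
{
  /* maskeqz + masknez + or.  c (rk) is compared with zero in the
     full register width, hence mode X for rk.  */
  return c ? a : b;
}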
e/gcc.target/riscv/xtheadcondmov-bug.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { rv64 } } } */
> +/* { dg-options "-march=rv64gc_xtheadcondmov -mabi=lp64d -O2" } */
> +
> +__attribute__((noinline, noclone)) long long int
The attributes are useless as nothing is
For things like
(x | 0x101) << 11
It's obvious to write:
ori $r4,$r4,257
slli.d $r4,$r4,11
But we are actually generating something insane:
lu12i.w $r12,524288>>12 # 0x8
ori $r12,$r12,2048
slli.d $r4,$r4,11
or
For bstrins, we can merge it into and3 instead of having a
separate define_insn.
For bstrpick, we can use the constraints to ensure the first source
register and the destination register are the same hardware register,
instead of emitting a move manually.
This will simplify the next commit where
On January 15, 2025 at 6:12:34 PM GMT+08:00, Xi Ruoyao wrote:
>For things like
>
>(x | 0x101) << 11
>
>It's obvious to write:
>
>ori $r4,$r4,257
>slli.d $r4,$r4,11
>
>But we are actually generating something insane:
>
>
On Thu, 2025-01-16 at 20:52 +0800, Xi Ruoyao wrote:
> On Thu, 2025-01-16 at 20:30 +0800, Lulu Cheng wrote:
> >
> > On 2025/1/15 at 6:10 PM, Xi Ruoyao wrote:
> > > diff --git a/gcc/config/loongarch/loongarch.cc
> > > b/gcc/config/loongarch/loongarch.cc
> >