Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote: > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote: > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote: > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/pr104992.c: Added addition

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote: > Xi Ruoyao wrote on Monday, 2024-01-15 at 12:11: > > > > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote: > > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote: > > > > At 15:28 +0800 on Saturday 2024-01-1

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote: > At 14:42 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote: > > > Xi Ruoyao wrote at 12:11pm on Monday, January > > > 15, 2024: > > >

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-15 Thread Xi Ruoyao
On Tue, 2024-01-16 at 10:57 +0800, chenxiaolong wrote: > At 15:50 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote: > > > At 14:42 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > > > On Mon, 2024-01-15 at

Ping: [PATCH] LoongArch: Remove constraint z from movsi_internal

2024-01-15 Thread Xi Ruoyao
Ping. On Fri, 2023-12-15 at 20:56 +0800, Xi Ruoyao wrote: > We don't allow SImode in FCC, so constraint z is never really used > here. > > gcc/ChangeLog: > > * config/loongarch/loongarch.md (movsi_internal): Remove > constraint z. > --- > > Bootst

Re: Ping: [PATCH] LoongArch: Remove constraint z from movsi_internal

2024-01-15 Thread Xi Ruoyao
On Tue, 2024-01-16 at 14:16 +0800, chenglulu wrote: > > > On 2024/1/16 at 1:34 PM, Xi Ruoyao wrote: > > Ping. > > > > On Fri, 2023-12-15 at 20:56 +0800, Xi Ruoyao wrote: > > > We don't allow SImode in FCC, so constraint z is never really us

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-16 Thread Xi Ruoyao
On Tue, 2024-01-16 at 12:58 +0800, Xi Ruoyao wrote: > On Tue, 2024-01-16 at 10:57 +0800, chenxiaolong wrote: > > At 15:50 +0800 on Monday, 2024-01-15, Xi Ruoyao wrote: > > > On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote: > > > > At 14:42 +0800 on Monday, 2024-01-15,

Re: [PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-01-16 Thread Xi Ruoyao
ite/lib/dg-options.exp > +++ b/libstdc++-v3/testsuite/lib/dg-options.exp > @@ -337,6 +337,7 @@ proc add_options_for_libatomic { flags } { >    || ([istarget powerpc*-*-*] && [check_effective_target_ilp32]) >    || [istarget riscv*-*-*] >    || ([istarget sparc*-*-linux-gnu] &

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-17 Thread Xi Ruoyao
On Wed, 2024-01-17 at 17:38 +0800, chenglulu wrote: > > On 2024/1/13 at 9:05 PM, Xi Ruoyao wrote: > > At 15:01 +0800 on Saturday, 2024-01-13, chenglulu wrote: > > > On 2024/1/12 at 7:42 PM, Xi Ruoyao wrote: > > > > At 09:46 +0800 on Friday, 2024-01-12, chenglulu wrote: > > > > > > >

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-17 Thread Xi Ruoyao
't understand the purpose of adding > '-fno-tree-vectorize' here. I don't think -fno-tree-vectorize will make a difference here. This test case uses __attribute__((vector_size(...))) explicitly so the vector operation will be used even with -fno-tree-vectorize. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
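To illustrate the point about explicit vector types, here is a minimal sketch using the GCC/Clang vector extension. The names v4si and vec_mod are made up for this illustration and are not taken from pr104992.c. Because the operands are declared as vectors, the modulo is a vector operation at the source level whether or not the auto-vectorizer runs; how it is lowered still depends on the target.

```c
/* Hypothetical example, not the contents of pr104992.c.
   v4si is a vector of four 32-bit ints (GCC/Clang extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise modulo on explicit vector operands.  No loop is
   involved, so -fno-tree-vectorize has nothing to disable here. */
static inline v4si vec_mod(v4si a, v4si b)
{
    return a % b;
}
```

Calling vec_mod on {7, 8, 9, 10} and {2, 3, 4, 5} yields the element-wise remainders.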

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-18 Thread Xi Ruoyao
gister_operand" "=r") (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))] With this the buggy REG_UNUSED notes were gone. But it then prevented the CSE when loading the address of __tls_get_addr (i.e. if we address 10 TLS LD symbols in a function it would emit 10 instance

[PATCH] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-22 Thread Xi Ruoyao
Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler macro. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p): If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false for SYMBOL_TLS_LDM and SYMBOL_TLS_GD. (loon

Pushed: [PATCH v2] LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=auto

2024-01-23 Thread Xi Ruoyao
On Tue, 2024-01-23 at 10:37 +0800, chenglulu wrote: > LGTM! > > Thanks! Pushed v2 as attached. The only change is in the comment: Qinggang told me TLS LE relaxation actually *requires* explicit relocs. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian Univer

[PATCH] LoongArch: testsuite: Disable stack protector for got-load.C

2024-01-23 Thread Xi Ruoyao
When building GCC with --enable-default-ssp, the stack protector is enabled for got-load.C, causing additional GOT loads for __stack_chk_guard. So mem/u will be matched more than 2 times and the test will fail. Disable stack protector to fix this issue. gcc/testsuite: * g++.target/loong

[PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-23 Thread Xi Ruoyao
The vect_int_mod target selector is evaluated with the options in DEFAULT_VECTCFLAGS in effect, but these options are not automatically passed to tests out of the vect directories. So this test fails on targets where integer vector modulo operation is supported but requiring an option to enable, f

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-23 Thread Xi Ruoyao
k only papers over the same issue caused spec2006 failure. I tried a bootstrap with BOOT_CFLAGS=-O2 -g -mcmodel=extreme and TARGET_DELEGITIMIZE_ADDRESS commented out, and there is no more spurious "note: non-delegitimized UNSPEC UNSPEC_LA_PCREL_64_PART1 (42) found in variable location" things. I feel that this hook is still written in a buggy way, so maybe removing it will solve the spec2017 issue. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functions

2024-01-24 Thread Xi Ruoyao
n __inline float >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) >  __frecipe_s (float _1) >  { > -  __builtin_loongarch_frecipe_s ((float) _1); > +  return (float) __builtin_loongarch_frecipe_s ((float) _1); I don't think the (float) conversion is needed. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote: > On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote: > > The vect_int_mod target selector is evaluated with the options in > > DEFAULT_VECTCFLAGS in effect, but these options are not automatically > > passed to

Re: [PATCH] testsuite: Make pr104992.c irrelated to target vector feature [PR113418]

2024-01-24 Thread Xi Ruoyao
On Wed, 2024-01-24 at 19:08 +0800, chenxiaolong wrote: > At 19:00 +0800 on Wednesday, 2024-01-24, Xi Ruoyao wrote: > > On Wed, 2024-01-24 at 18:32 +0800, chenxiaolong wrote: > > > On 20:09 +0800 on Tuesday, 2024-01-23, Xi Ruoyao wrote: > > > > The vect_int_mod target

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-24 Thread Xi Ruoyao
On Thu, 2024-01-25 at 08:48 +0800, chenglulu wrote: > > On 2024/1/24 at 3:36 AM, Xi Ruoyao wrote: > > On Mon, 2024-01-22 at 15:27 +0800, chenglulu wrote: > > > > > The failure of this test case was because the compiler believes that > > > > > two > > >

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread Xi Ruoyao
eme TLS GD/LD with -mexplicit-relocs=auto. I've rebased and attached the patch to fix the bad split in -mexplicit- relocs={always,auto} -mcmodel=extreme on top of this series. I've not tested it seriously though (only tested the added and modified test cases). -- Xi Ruoyao School of Aero

Re: [PATCH v4 1/4] LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.

2024-01-26 Thread Xi Ruoyao
n "la.tls.le\t%0,%1"; > +    case SYMBOL_TLS_IE: > +  return "la.tls.ie\t%0,%1"; > +    case SYMBOL_TLSLDM: > +  return "la.tls.ld\t%0,%1"; > +    case SYMBOL_TLSGD: > +  return "la.tls.gd\t%0,%1"; /* snip */ > +    default:

Re: [PATCH v4 2/4] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-26 Thread Xi Ruoyao
turn "la.tls.gd\t%0,%2,%1"; > +    case SYMBOL_TLSLDM: > +  return "la.tls.ld\t%0,%2,%1"; > + > +    default: > +  gcc_unreachable (); > +  } > +} > + "&& REG_P (operands[1]) && find_reg_note (insn, REG_UNUSED, operands[2]) != > 0" > + [(set (match_dup 0) (match_dup 1))] > + "" > + [(set_attr "mode" "DI") > +  (set_attr "length" "5")]) Should be 20, in bytes. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-26 Thread Xi Ruoyao
On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote: > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: > > > v3 -> v4: > > >    1. Add macro support for TLS symbols > > >    2. Added support for loading __get_t

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread Xi Ruoyao
On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote: > > On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote: > > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: > > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote: > > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: > > >

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread Xi Ruoyao
On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote: > On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote: > > > > On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote: > > > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: > > > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote: > > >

Re: [PATCH] LoongArch: Fix soft-float builds of libffi

2024-01-31 Thread Xi Ruoyao
at. You need to wait until the PR is accepted by the libffi maintainers. Frankly I don't know what libffi maintainers are busy on and I'm frustrated as well (having a MIPS patch unreviewed there for a month) but this is the procedure :(. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]

2023-12-13 Thread Xi Ruoyao
On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote: On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote: Replace the instruction costs in loongarch_rtx_cost_data constructor based on micro-benchmark results on LA464 and LA664. This allows optimizations like "x * 17" to alsl, and "x * 68" to
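The arithmetic behind the "x * 17" and "x * 68" remark can be sketched in plain C. This is only a model: the helper alsl below mimics a shift-and-add of the form (a << sa) + b with sa between 1 and 4, which is my understanding of what the alsl instruction computes, and mul17 and mul68 are hypothetical names. The snippet above is cut off before it names the sequence chosen for x * 68, so the decomposition shown here is one possibility rather than the compiler's confirmed output.

```c
#include <stdint.h>

/* Model of a shift-and-add: (a << sa) + b.
   Assumption: sa is in the range 1..4. */
static inline uint64_t alsl(uint64_t a, uint64_t b, unsigned sa)
{
    return (a << sa) + b;
}

/* x * 17 == (x << 4) + x: one shift-and-add, no multiply. */
static inline uint64_t mul17(uint64_t x)
{
    return alsl(x, x, 4);
}

/* x * 68 == ((x << 4) + x) << 2: shift-and-add, then a shift. */
static inline uint64_t mul68(uint64_t x)
{
    return alsl(x, x, 4) << 2;
}
```

Whether such a sequence beats a multiply depends on the instruction costs, which is what the patch adjusts.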

[PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4

2023-12-13 Thread Xi Ruoyao
We used a branch to load floating-point comparison results into GPR. This is very slow when the branch is not predictable. Use the movcf2gr instruction to implement cstore4 if movcf2gr is fast enough. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (muse-movcf2gr): New

Re: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4

2023-12-14 Thread Xi Ruoyao
0x1206ac93f execute > ../../gcc/gcc/ira.cc:6161 -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH v2] LoongArch: Implement FCCmode reload and cstore4

2023-12-15 Thread Xi Ruoyao
We used a branch to load floating-point comparison results into GPR. This is very slow when the branch is not predictable. Implement movfcc so we can reload FCCmode into GPRs, FPRs, and MEM. Then implement cstore4. gcc/ChangeLog: * config/loongarch/loongarch-tune.h (loongarch_rtx

[PATCH] LoongArch: Remove constraint z from movsi_internal

2023-12-15 Thread Xi Ruoyao
We don't allow SImode in FCC, so constraint z is never really used here. gcc/ChangeLog: * config/loongarch/loongarch.md (movsi_internal): Remove constraint z. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch.md | 6 +++--- 1

[PATCH] LoongArch: Fix FP vector comparsons [PR113034]

2023-12-17 Thread Xi Ruoyao
We had the following mappings between vfcmp submnemonics and RTX codes: (define_code_attr fcc [(unordered "cun") (ordered "cor") (eq "ceq") (ne "cne") (uneq "cueq") (unle "cule") (unlt "cult") (le "cle")

[PATCH] LoongArch: Add sign_extend pattern for 32-bit rotate shift

2023-12-17 Thread Xi Ruoyao
Remove a redundant sign extension. gcc/ChangeLog: * config/loongarch/loongarch.md (rotrsi3_extend): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotrw.c: New test. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/l

Pushed: [PATCH 0/3] LoongArch: Fix instruction costs

2023-12-17 Thread Xi Ruoyao
On Sun, 2023-12-10 at 01:03 +0800, Xi Ruoyao wrote: > Update LoongArch instruction costs based on the micro-benchmark results > on LA464 and LA664.  In particular, this allows generating alsl/slli or > alsl/slli + add pairs for multiplying some constants as on LA464/LA664 > a mul instr

[PATCH] middle-end: Call negate_rtx instead of simplify_gen_unary expanding rotate shift [PR113033]

2023-12-18 Thread Xi Ruoyao
With simplify_gen_unary we end up with a not fully expanded RTX like (set (reg:SI 90) (and:SI (neg:SI (reg:SI 80)) (const_int 63))) Then it will cause an ICE with unrecognizable insn. gcc/ChangeLog: PR middle-end/113033 * expmed.cc (expand_shift_1): When expanding rotate shi

[PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-18 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/loongarch.md (rotl3): New define_expand. * config/loongarch/simd.md (vrotl3): Likewise. (rotl3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotl-with-rotr.c: New test. * gcc.target/loongarch/rotl-wit
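The identity named in the subject line, a left rotate being a right rotate by the negated amount, can be checked in portable C. This is a sketch of the arithmetic only, not of the define_expand in the patch; rotr32 and rotl32_via_rotr are hypothetical names.

```c
#include <stdint.h>

/* Rotate right by n modulo 32.  Both shift counts are masked, so
   n == 0 does not produce an out-of-range shift. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* Rotate left by n, written as a rotate right by -n.  Unsigned
   negation wraps, and the mask in rotr32 reduces it to (32 - n) % 32. */
static inline uint32_t rotl32_via_rotr(uint32_t x, unsigned n)
{
    return rotr32(x, -n);
}
```

A target that only has a rotate-right instruction can therefore serve rotate-left requests by negating the count first.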

Re: [PATCH] middle-end: Call negate_rtx instead of simplify_gen_unary expanding rotate shift [PR113033]

2023-12-18 Thread Xi Ruoyao
On Mon, 2023-12-18 at 08:39 -0700, Jeff Law wrote: > > > On 12/18/23 06:42, Xi Ruoyao wrote: > > With simplify_gen_unary we end up with a not fully expanded RTX like > > > > (set (reg:SI 90) (and:SI (neg:SI (reg:SI 80)) (const_int 63))) > > &

Re: [PATCH] middle-end: Call negate_rtx instead of simplify_gen_unary expanding rotate shift [PR113033]

2023-12-18 Thread Xi Ruoyao
On Mon, 2023-12-18 at 18:45 +0100, Jakub Jelinek wrote: > On Tue, Dec 19, 2023 at 12:48:46AM +0800, Xi Ruoyao wrote: > > > > gcc/ChangeLog: > > > > > > > > PR middle-end/113033 > > > > * expmed.cc (expand_shift_1): When expa

Re: [PATCH] middle-end: Call negate_rtx instead of simplify_gen_unary expanding rotate shift [PR113033]

2023-12-18 Thread Xi Ruoyao
> I've looked e.g. at i386 vec_init and that is exactly what it does, > see the various tests + force_reg calls in ix86_expand_vector_init*. Ok, I'm abandoning this patch and I'll rework it. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH 0/2] LoongArch: Fix PR113033 and clean up code

2023-12-18 Thread Xi Ruoyao
code clean up is separated into the 2nd patch to make reviewing easier. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (2): LoongArch: Use force_reg instead of gen_reg_rtx + emit_move_insn in vec_init expander [PR113033] LoongArch: Clean up vec_init expa

[PATCH 1/2] LoongArch: Use force_reg instead of gen_reg_rtx + emit_move_insn in vec_init expander [PR113033]

2023-12-18 Thread Xi Ruoyao
Jakub says: Then that seems like a bug in the loongarch vec_init pattern(s). Those really don't have a predicate in any of the backends on the input operand, so they need to force_reg it if it is something it can't handle. I've looked e.g. at i386 vec_init and that is exactly w

[PATCH 2/2] LoongArch: Clean up vec_init expander

2023-12-18 Thread Xi Ruoyao
Non-functional change, clean up the code. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_vector_init_same): Remove "temp2" and reuse "temp" instead. (loongarch_expand_vector_init): Use gcc_unreachable () instead of gcc_assert (0), and fix

Re: Fwd: [PATCH] LoongArch: Fix FP vector comparsons [PR113034]

2023-12-19 Thread Xi Ruoyao
e LSX/LASX code is wrong. > > Most seriously, the RTX code NE should be mapped to "cneq", not "cne". > > The "cneq" in the commit info may be "cune" according to the context? Oops, indeed. I'll push the patch with this typo fixed. > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Added TLS Le Relax support.

2023-12-19 Thread Xi Ruoyao
_r". Or we'll hit: t.c:11:1: internal compiler error: output_operand: operand number missing after %-letter > +  [(set_attr "type" "move")] > +) > + -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-21 Thread Xi Ruoyao
Ping :). On Tue, 2023-12-12 at 14:47 +0800, Xi Ruoyao wrote: > The problem with peephole2 is it uses a naive sliding-window algorithm > and misses many cases.  For example: > >     float a[1]; >     float t() { return a[0] + a[8000]; } > > is compiled to: >

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-21 Thread Xi Ruoyao
g the peephole besides the new define_insn_and_split produces a better result instead of solely relying on define_insn_and_split? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-23 Thread Xi Ruoyao
here is a problem. My regression test has the following two fail > items.(based on r14-6787) > +FAIL: gcc.dg/cpp/_Pragma3.c (test for excess errors) > +FAIL: gcc.dg/pr86617.c scan-rtl-dump-times final "mem/v" 6 Strange. I didn't see them on r14-6650 (with or without the patch)

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-23 Thread Xi Ruoyao
On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote: > > > The performance drop has nothing to do with this patch. I found that the > > > h264 performance compiled > > > by r14-6787 compared to r14-6421 dropped

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-23 Thread Xi Ruoyao
ence may be caused by a different binutils version or some other changes in GCC. I'll figure it out... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-23 Thread Xi Ruoyao
On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote: > > Hi, > > > > This patch will cause the following tests to fail: > > > > +FAIL: gcc.dg/vect/pr97081-2.c (internal compiler error: in extract_insn, > &

[PATCH v2] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-24 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/loongarch.md (rotl3): New define_expand. * config/loongarch/simd.md (vrotl3): Likewise. (rotl3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotl-with-rotr.c: New test. * gcc.target/loongarch/rotl-wit

Re: [PATCH] LoongArch: Expand left rotate to right rotate with negated amount

2023-12-24 Thread Xi Ruoyao
On Sun, 2023-12-24 at 01:04 +0800, Xi Ruoyao wrote: > On Sun, 2023-12-24 at 00:56 +0800, Xi Ruoyao wrote: > > On Sat, 2023-12-23 at 15:00 +0800, chenglulu wrote: > > > Hi, > > > > > > This patch will cause the following tests to fail: > > > > >

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-24 Thread Xi Ruoyao
On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote: > > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote: > > > > The performance drop has nothing to do with this patch. I found that > > > > the h264 performa

Re: [PATCH v1] LoongArch: Fixed bug in *bstrins__for_ior_mask template.

2023-12-25 Thread Xi Ruoyao
> +  "&& true" >    [(set (match_dup 0) (match_dup 1)) >     (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4)) >   (match_dup 3))] -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-25 Thread Xi Ruoyao
On Mon, 2023-12-25 at 10:08 +0800, chenglulu wrote: > > On 2023/12/24 at 8:59 PM, Xi Ruoyao wrote: > > On Sat, 2023-12-23 at 18:47 +0800, Xi Ruoyao wrote: > > > On Sat, 2023-12-23 at 18:44 +0800, Xi Ruoyao wrote: > > > > On Sat, 2023-12-23 at 10:29 +0800, chenglulu wrote:

[PATCH v2] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-25 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example: float a[1]; float t() { return a[0] + a[8000]; } is compiled to: la.local $r13,a la.local $r12,a+32768 fld.s $f1,$r13,0 fld.s $f0,$r12,-768

[PATCH] LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]

2023-12-26 Thread Xi Ruoyao
The GCC internal doc says: X might be a pseudo-register or a 'subreg' of a pseudo-register, which could either be in a hard register or in memory. Use 'true_regnum' to find out; it will return -1 if the pseudo is in memory and the hard register number if it is in a register.

Re: Ping: [PATCH] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-27 Thread Xi Ruoyao
ymbol_ref:DI ("*.LANCHOR0") [flags 0x182])) [0 S1 > A8]))) "volatile.c":5:11 -1 >  (nil)) > > The volatile property of the mem here is gone, so the test fails. Phew. I guess I couldn't reproduce it because I have Jeff's ext-dce patch in my local repo, which removed the zero_extend... I'll rework this patch. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Merge constant vector permuatation implementations.

2023-12-28 Thread Xi Ruoyao
; >nelt, > +    > rperm)); > +   tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0); Likewise. > +   emit_move_insn (tmp, sel); > +   break; > +     case E_V8SFmode: > +   sel = ge

[PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-28 Thread Xi Ruoyao
The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example: float a[1]; float t() { return a[0] + a[8000]; } is compiled to: la.local $r13,a la.local $r12,a+32768 fld.s $f1,$r13,0 fld.s $f0,$r12,-768

Re: [PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-29 Thread Xi Ruoyao
+  op = XEXP (op, 0); > > +  return symbolic_pcrel_operand (op, Pmode) || > > +symbolic_pcrel_offset_operand (op, Pmode); > > +}) > > + > >   > Symbol '||' It shouldn't be at the end of the line. Indeed. > > +  return symbolic_pcrel_operand (op, Pmode) > +    || symbolic_pcrel_offset_operand (op, Pmode); > > Others LGTM. > Thanks! > > /* snip */ > -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Pushed: [PATCH v4] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine

2023-12-29 Thread Xi Ruoyao
Pushed v4 as attached, with the format issues fixed and a minor adjustment in the commit message ("define_insn_and_split" is changed to "define_insn_and_rewrite" to match the actual change). On Fri, 2023-12-29 at 19:55 +0800, Xi Ruoyao wrote: > On Fri, 2023-12-29 at 15:57

[PATCH pushed] LoongArch: Fix the format of bstrins__for_ior_mask condition (NFC)

2023-12-29 Thread Xi Ruoyao
gcc/ChangeLog: * config/loongarch/loongarch.md (bstrins__for_ior_mask): For the condition, remove unneeded trailing "\" and move "&&" to follow GNU coding style. NFC. --- Pushed as obvious. gcc/config/loongarch/loongarch.md | 4 ++-- 1 file changed, 2 insertions(+), 2 d

Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Xi Ruoyao
but not reduc_fmin_scal_*? > If so, we probably need a new target selector for fmin/fmax reduction. Let me check whether the [x]vf{min,max} instructions are IEEE-conformant. They've still not released volume 2 of the instruction manual so I can only try... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: testsuite:Add the "-ffast-math" compilation option for the file vect-fmin-3.c.

2023-12-30 Thread Xi Ruoyao
On Sat, 2023-12-30 at 20:25 +0800, Xi Ruoyao wrote: > On Sat, 2023-12-30 at 12:15 +, Richard Sandiford wrote: > > This shouldn't be necessary.  The test does: > > > >   for (int i = 0; i < n; i += 2) > >     { > >   x0 = __builtin_fmin (x0, ptr[i

[PATCH] LoongArch: Provide fmin/fmax RTL pattern for vectors

2023-12-31 Thread Xi Ruoyao
We already had smin/smax RTL pattern using vfmin/vfmax instructions. But for smin/smax, it's unspecified what will happen if either operand contains a NaN. So we would not vectorize the loop with -fno-finite-math-only (the default for all optimization levels except -Ofast). But, LoongA
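Why NaN handling matters for a minimum can be shown with a small scalar sketch. The functions naive_min and nan_aware_min are hypothetical names; nan_aware_min follows the rule of the C library fmin for the case where exactly one argument is NaN (the other argument is returned), and is written without libm calls. Nothing here models the LoongArch vfmin/vfmax instructions themselves, whose behaviour the truncated snippet above does not spell out.

```c
#include <math.h>   /* only for the NAN macro */

/* A plain comparison-based minimum.  With a NaN operand the
   comparison is false, so the result depends on operand order. */
static inline double naive_min(double a, double b)
{
    return a < b ? a : b;
}

/* Minimum that returns the non-NaN operand when exactly one
   operand is NaN.  x != x is true only for NaN. */
static inline double nan_aware_min(double a, double b)
{
    if (a != a)
        return b;
    if (b != b)
        return a;
    return a < b ? a : b;
}
```

With naive_min, swapping the operands changes the answer when one of them is NaN, which is the kind of unspecified outcome that rules out using such a pattern when NaNs must be honoured.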

Pushed: [PATCH] LoongArch: Provide fmin/fmax RTL pattern for vectors

2024-01-03 Thread Xi Ruoyao
On Wed, 2024-01-03 at 16:24 +0800, chenglulu wrote: > LGTM! > > Thanks! Pushed r14-6890. FWIW sometimes the tree optimizer still fails to emit .reduc_f{max,min} or it emits them sub-optimally. I've commented in PR112457 but maybe I should've created a new ticket... > On 2024

Re: [PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-03 Thread Xi Ruoyao
match_operand:DI 2 "register_operand" "=&r"))] And use gen_movdi_pcrel64 (operands[0], operands[1], gen_reg_rtx(DImode)) in expand. > + "TARGET_64BIT" > + "la.local %0,$r15,%1" > + [(set_attr "mode" "DI") > +  (set_attr "length" "5")]) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 1/2] LoongArch: Add the macro implementation of mcmodel=extreme.

2024-01-03 Thread Xi Ruoyao
On Thu, 2024-01-04 at 11:58 +0800, chenglulu wrote: > > On 2024/1/4 at 11:51 AM, Xi Ruoyao wrote: > > On Wed, 2023-12-27 at 16:46 +0800, Lulu Cheng wrote: > > > +(define_insn "movdi_pcrel64" > > > + [(set (match_operand:DI 0 "register_oper

Re: [RFA] [V3] new pass for sign/zero extension elimination

2024-01-04 Thread Xi Ruoyao
's to get as much testing > as possible.  Assuming the rest is ACK'd for the trunk we'll put it into > the list of optimizations enabled by -O2. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-04 Thread Xi Ruoyao
_effective_target_s390_vx]) > > +|| ([istarget riscv*-*-*] > > + && [check_effective_target_riscv_v]) > > Unless I'm missing something, we have copysign in the scalar > floating-point ISAs as well.  So I think this should be > >   || ([istarget riscv*-*-*] >   && [check_effective_target_hard_float]) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
e several hours trying to implement this... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote: > > On 2024/1/5 at 4:37 PM, Xi Ruoyao wrote: > > On Fri, 2024-01-05 at 11:40 +0800, Lulu Cheng wrote: > > >   bool > > >   loongarch_explicit_relocs_p (enum loongarch_symbol_type type) > > >   { > > > +

Re: [PATCH]middle-end: Don't apply copysign optimization if target does not implement optab [PR112468]

2024-01-05 Thread Xi Ruoyao
fective_target_loongarch_sx] ||" because SIMD requires hard float. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
On Fri, 2024-01-05 at 18:25 +0800, Xi Ruoyao wrote: > On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote: > > > > On 2024/1/5 at 4:37 PM, Xi Ruoyao wrote: > > > On Fri, 2024-01-05 at 11:40 +0800, Lulu Cheng wrote: > > > >   bool > > > >   loongarch_ex

Re: [PATCH 1/4] LoongArch: Handle ISA evolution switches along with other options

2024-01-05 Thread Xi Ruoyao
HAS_DIV32 etc. in the code base? It seems some of them are not replaced. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-05 Thread Xi Ruoyao
On Fri, 2024-01-05 at 20:45 +0800, chenglulu wrote: > > On 2024/1/5 at 7:55 PM, Xi Ruoyao wrote: > > On Fri, 2024-01-05 at 18:25 +0800, Xi Ruoyao wrote: > > > On Fri, 2024-01-05 at 17:57 +0800, chenglulu wrote: > > > > On 2024/1/5 at 4:37 PM, Xi Ruoyao wrote: > > > >

Re: [PATCH 2/3] LoongArch: Redundant sign extension elimination optimization.

2024-01-06 Thread Xi Ruoyao
_rtx (DImode); > +   emit_insn (gen_addsi3_extended (t, operands[1], operands[2])); AFAIK if !TARGET_64BIT a DImode should be actually a pair of hardware registers, but addsi3_extended doesn't output such a pair so this seems invalid... > +   t = gen_lowpart (SImode, t); > +

Re: [PATCH 1/3] LoongArch: Optimized some of the symbolic expansion instructions generated during bitwise operations.

2024-01-06 Thread Xi Ruoyao
"")]) >   > +(define_insn "*nsi_internal" > +  [(set (match_operand:SI 0 "register_operand" "=r") > + (neg_bitwise:SI > +     (not:SI (match_operand:SI 1 "register_operand" "r")) > +     (match_operand:SI 2 "register_operand" "r")))] > +  "TARGET_64BIT" > +  "n\t%0,%2,%1" > +  [(set_attr "type" "logical") > +   (set_attr "mode" "SI")]) >   >  ;; >  ;;  > @@ -3167,7 +3210,6 @@ (define_expand "condjump" >     (label_ref (match_operand 1)) >     (pc)))]) >   > - >   >  ;; >  ;;  > @@ -3967,10 +4009,13 @@ (define_insn "bytepick_w_" >  (define_insn "bytepick_w__extend" >    [(set (match_operand:DI 0 "register_operand" "=r") >   (sign_extend:DI > -   (ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r") > -     (const_int )) > -   (ashift (match_operand:SI 2 "register_operand" "r") > -   (const_int bytepick_w_ashift_amount)] > + (subreg:SI > +   (ior:DI (subreg:DI (lshiftrt > +   (match_operand:SI 1 "register_operand" "r") > +   (const_int )) 0) > +   (subreg:DI (ashift > +   (match_operand:SI 2 "register_operand" "r") > +   (const_int bytepick_w_ashift_amount)) 0)) 0)))] >    "TARGET_64BIT" >    "bytepick.w\t%0,%1,%2," >    [(set_attr "mode" "SI")]) > diff --git a/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c > b/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c > new file mode 100644 > index 000..5753ef69db2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/loongarch/sign-extend-bitwise.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mabi=lp64d -O2" } */ > +/* { dg-final { scan-assembler-not "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" } > } */ > + > +struct pmop > +{ > +  unsigned int op_pmflags; > +  unsigned int op_pmpermflags; > +}; > +unsigned int PL_hints; > + > +struct pmop *pmop; > +void > +Perl_newPMOP (int type, int flags) > +{ > +  if (PL_hints & 0x0010) > +    pmop->op_pmpermflags |= 0x0001; > +  if (PL_hints & 0x0004) > +    pmop->op_pmpermflags |= 0x0800; > +  pmop->op_pmflags = pmop->op_pmpermflags; > +} -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH 3/3] LoongArch: Redundant sign extension elimination optimization 2.

2024-01-06 Thread Xi Ruoyao
can-assembler-times "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" > 0 } } */ Use scan-assembler-not instead of scan-assembler-times ... 0. Otherwise LGTM. >  #include >  #define my_min(x, y) ((x) < (y) ? (x) : (y)) -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
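A minimal sketch of the form suggested in the review above, assuming a compile-only test laid out like the other LoongArch tests: the "must not appear" check is written with scan-assembler-not rather than scan-assembler-times with a count of 0. The function or_low_bit is a hypothetical placeholder, not taken from the patch, and nothing here claims what GCC actually emits for it.

```c
/* { dg-do compile } */
/* { dg-options "-mabi=lp64d -O2" } */
/* Preferred spelling: fail if the pattern appears anywhere in the
   generated assembly, instead of counting occurrences and expecting 0.  */
/* { dg-final { scan-assembler-not "slli.w\t\\\$r\[0-9\]+,\\\$r\[0-9\]+,0" } } */

/* Hypothetical function under test (an assumption for illustration):
   a 32-bit bitwise operation of the kind the patch series is about.  */
unsigned int
or_low_bit (unsigned int x)
{
  return x | 0x0001;
}
```

Both spellings check the same condition; the review simply asks for the directive that states the intent directly.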

[PATCH v2] doc: Update the status of build directory not fully separated

2023-11-30 Thread Xi Ruoyao
Recently some people have tried building GCC with srcdir == objdir, and the attempts just failed [1]. So stop saying "it should work". OTOH objdir as a subdirectory of srcdir works: we've built GCC in LFS [2] and BLFS [3] this way for decades and this is confirmed during the review of a previous ve

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-30 Thread Xi Ruoyao
On Thu, 2023-11-30 at 08:44 -0700, Jeff Law wrote: > > > On 11/29/23 02:33, Xi Ruoyao wrote: > > On Mon, 2023-11-27 at 23:06 -0700, Jeff Law wrote: > > > This has (of course) been tested on rv64.  It's also been bootstrapped > > > and regression tested on x8

Ping: [PATCH v2] Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]

2023-11-30 Thread Xi Ruoyao
Ping. On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote: > With -fno-fp-int-builtin-inexact, trunc is not allowed to raise > FE_INEXACT and it should produce an integral result (if the input is not > NaN or Inf).  Thus FE_INEXACT should not be raised. > > But (int)x may raise FE

Re: [PATCH v2 3/3] libphobos: LoongArch hardware support.

2023-12-01 Thread Xi Ruoyao
    version (D_SoftFloat) > +    return; > +    else > +    { > +    asm nothrow @nogc > +    { > +    "movgr2fcsr $r0,%0" : > +    : "r" (newState & (roundingMask | > allExceptions)); > +    } >

Re: [PATCH] LoongArch: Add support for TLS descriptors

2023-12-01 Thread Xi Ruoyao
CC is configured to decide the default. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Add support for TLS descriptors

2023-12-01 Thread Xi Ruoyao
On Fri, 2023-12-01 at 18:01 +0800, Xi Ruoyao wrote: > On Fri, 2023-12-01 at 17:55 +0800, mengqinggang wrote: > > Generate la.tls.desc macro instruction for TLS descriptors model. > > > > la.tls.desc expand to > >   pcalau12i $a0, %desc_pc_hi20(a) > >   ld.d 

Re: [PATCH] LoongArch: Add support for TLS descriptors

2023-12-01 Thread Xi Ruoyao
ult if it's supported by the assembler and --with-glibc-version= setting is high enough... Currently the only architecture (AFAIK) having TLS desc as the default is AArch64 because it supports TLS desc since the birthday. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-02 Thread Xi Ruoyao
> +#if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) >  #include "loongarch-def.h" > +#endif With this change we can revert r14-5634 (remove the #if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) guards in loongarch-def.h as they'll be unneeded). -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1 1/2] LoongArch: Switch loongarch-def from C to C++ to make it possible.

2023-12-02 Thread Xi Ruoyao
t that the code can't go here, I will add a prompt > message here.:-( If I read the code correctly, this is indeed unreachable so we can just put gcc_unreachable() here. But maybe I'm wrong. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH] gcc/doc: spelling mistakes and example

2023-12-02 Thread Xi Ruoyao
>  @end smallexample No, this is definitely incorrect. srcdir is the path (it may be relative or absolute) to the GCC source tree. It does not have to be placed in the parent directory of objdir. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Modify the check type of the vector builtin function.

2023-12-04 Thread Xi Ruoyao
    > \ > -  int *temp_ref = &ref[i], *temp_res = &res[i];   > \ > +  int *temp_ref = (int *)&ref[i], *temp_res = (int *)&res[i]; > \ >        if (abs (*temp_ref - *temp_res) > 0)    > \ > { > \ >    printf (" error: %s at line %ld , expected " #ref   > \ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v1] LoongArch: Modify the check type of the vector builtin function.

2023-12-04 Thread Xi Ruoyao
On Mon, 2023-12-04 at 20:31 +0800, Xi Ruoyao wrote: > On Mon, 2023-12-04 at 20:14 +0800, chenxiaolong wrote: > > On LoongArch architecture, using the latest gcc14 in regression test, > > it is found that the vector test cases in vector directory appear FAIL > > entries with un

Re: [PATCH v1] LoongArch: Modify the check type of the vector builtin function.

2023-12-05 Thread Xi Ruoyao
nt main() { float x[4] = {}; int y[4] = {}; assert_eq(x, y, __LINE__); } This is C++, not C. But IMO we can port the tests to C++ anyway. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-12-06 Thread Xi Ruoyao
Martin Uecker has pointed out the alignment may be different with the different order of arguments, per C23 (N2293). With earlier versions of the standard some people believe the alignment should not be different, while the other people disagree (as the text is not very clear). -- Xi Ruoya

Re: [PATCH v3] LoongArch: Fix eh_return epilogue for normal returns

2023-12-06 Thread Xi Ruoyao
_save_restore_reg (word_mode, regno, offset, fn); > + > +   offset -= UNITS_PER_WORD; > + } > +    } I don't like this pair of {} for the for statement. It's not necessary and it changes the indent level, making the diff hard to review. Otherwise LGTM. I'm not sure why I didn't notice the eh_return issue when I learnt shrink wrapping from RISC-V... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

Re: [PATCH v3] LoongArch: Fix eh_return epilogue for normal returns

2023-12-07 Thread Xi Ruoyao
On Thu, 2023-12-07 at 14:18 +0800, Yang Yujie wrote: > On Thu, Dec 07, 2023 at 11:02:58AM +0800, Xi Ruoyao wrote: > > > > I don't like this pair of {} for the for statement.  It's not necessary > > and it changes the indent level, causing the diff hard to review. >

[PATCH] LoongArch: Allow -mcmodel=extreme and model attribute with -mexplicit-relocs=auto

2023-12-07 Thread Xi Ruoyao
There seems to be no real reason to require -mexplicit-relocs=always for -mcmodel=extreme or model attribute. As the linker does not know how to relax a 3-operand la.local or la.global pseudo instruction, just emit explicit relocs for SYMBOL_PCREL64, and under TARGET_CMODEL_EXTREME also SYMBOL_GOT_DISP.

Re: [PATCH v2 3/3] libphobos: LoongArch hardware support.

2023-12-07 Thread Xi Ruoyao
> Hi, > > Changes to this module should go first to github.com/dlang/phobos. > > I also notice that theses SoftFloat static conditions in all LoongArch > support code doesn't exist in upstream either.  Can a pull request be > raised to sort out the discrepancy? It looks like this patch has been dropped in V3. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University

[PATCH] LoongArch: Fix warnings building libgcc

2023-12-09 Thread Xi Ruoyao
We are excluding loongarch-opts.h from target libraries, but now struct loongarch_target and gcc_options are not declared in the target libraries, causing: In file included from ../.././gcc/options.h:8, from ../.././gcc/tm.h:49, from ../../../gcc/libgcc/fixed-bit.
