[PATCH v2] LoongArch: Add prefetch instructions.

2022-11-11 Thread Lulu Cheng
Co-Authored-By: xujiahao gcc/ChangeLog: * config/loongarch/loongarch-def.c: Initialize the number of parallel prefetches. * config/loongarch/loongarch-tune.h (struct loongarch_cache): Define the number of parallel prefetches. * config/loongarch/loongarch.cc (loongarch_option_ove

[PATCH v2] LoongArch: Optimize the implementation of stack check.

2022-11-12 Thread Lulu Cheng
The old stack check was performed before the stack was dropped, which would cause the detection tool to report a memory leak. The current stack check scheme is as follows: '-fstack-clash-protection': 1. When the frame->total_size is smaller than the guard page size, the stack is dropped accord

Re: [PATCH v2] LoongArch: Add prefetch instructions.

2022-11-15 Thread Lulu Cheng
On 2022/11/15 5:17 PM, Xi Ruoyao wrote: On Sat, 2022-11-12 at 17:45 +0800, Xi Ruoyao via Gcc-patches wrote: void prefetch(char *ptr, int off) { return __builtin_prefetch(ptr + off); } It's compiled to "preldx 0,$r4,$r5". I don't think it's correct because according to the doc, rk should
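
For context, `__builtin_prefetch (addr, rw, locality)` is GCC's generic prefetch hint: `rw` is 0 for a read and 1 for a write, `locality` ranges from 0 to 3, and both must be compile-time constants. The sketch below shows typical software prefetching in a loop. `PREFETCH_DIST` is a hypothetical distance chosen only for illustration, and which instruction (if any) GCC emits for the hint depends on the target and options.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; tune per machine. */
#define PREFETCH_DIST 64

/* Sum an array while hinting that data PREFETCH_DIST elements ahead
   will be read soon (rw = 0) with high temporal locality (3).
   The hint never changes the result.  */
long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        sum += a[i];
    }
    return sum;
}
```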

[PATCH v3] LoongArch: Add prefetch instructions.

2022-11-15 Thread Lulu Cheng
v2 -> v3: 1. Remove preldx support. --- Enable software prefetching at -O3 and higher. Co-Authored-By: xujiahao gcc/ChangeLog: * config/loongarch/constraints.md (ZD): New constraint. * config/loongarch/loongarch-def.c: Initialize the number of parallel pr

Re: [PATCH v3] LoongArch: Add prefetch instructions.

2022-11-15 Thread Lulu Cheng
On 2022/11/16 11:06 AM, WANG Xuerui wrote: On 2022/11/16 10:10, Lulu Cheng wrote: v2 -> v3: 1. Remove preldx support. --- Enable sw prefetching at -O3 and higher. Co-Authored-By: xujiahao gcc/ChangeLog: * config/loongarch/constraints.md (ZD):

[PATCH v4] LoongArch: Optimize immediate load.

2022-11-17 Thread Lulu Cheng
v1 -> v2: 1. Change the code format. 2. Fix bugs in the code. v2 -> v3: Rewrite code that relied on undefined behavior. v3 -> v4: Move the part of the immediate number decomposition from the expand pass to the split pass. Both regression tests and spec2006 passed. The problem mentioned in the
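
On LA64 a full 64-bit constant can be assembled with `lu12i.w` (bits 31..12), `ori` (bits 11..0), `lu32i.d` (bits 51..32) and `lu52i.d` (bits 63..52). The sketch below models each instruction's effect on the destination register to show how an immediate is decomposed into those fields. It always uses all four steps, whereas a compiler can skip steps that are redundant for a particular constant; the function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits (bits < 64) of v to 64 bits. */
static uint64_t sext(uint64_t v, int bits)
{
    uint64_t m = 1ULL << (bits - 1);
    v &= (1ULL << bits) - 1;
    return (v ^ m) - m;
}

/* Rebuild `imm` the way a lu12i.w / ori / lu32i.d / lu52i.d sequence
   would, modelling each instruction's effect on the register.  */
uint64_t build_imm(uint64_t imm)
{
    uint64_t hi20_lo = (imm >> 12) & 0xfffff;  /* bits 31..12 */
    uint64_t lo12    = imm & 0xfff;            /* bits 11..0  */
    uint64_t hi20_hi = (imm >> 32) & 0xfffff;  /* bits 51..32 */
    uint64_t top12   = (imm >> 52) & 0xfff;    /* bits 63..52 */

    /* lu12i.w: bits 31..12 = si20, result sign-extended from bit 31. */
    uint64_t r = sext(hi20_lo << 12, 32);
    /* ori: OR in the zero-extended 12-bit immediate. */
    r |= lo12;
    /* lu32i.d: bits 51..32 = si20, sign-extended into bits 63..52;
       bits 31..0 are kept.  */
    r = (r & 0xffffffffULL) | (sext(hi20_hi, 20) << 32);
    /* lu52i.d: bits 63..52 = si12; bits 51..0 are kept. */
    r = (r & 0x000fffffffffffffULL) | (top12 << 52);
    return r;
}
```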

Re: [GCC14 PATCH] LoongArch: Improve cpymemsi expansion [PR109465]

2023-04-18 Thread Lulu Cheng
On 2023/4/12 8:16 PM, Xi Ruoyao wrote: We'd been generating really bad block move sequences, which kernel developers who tried __builtin_memcpy recently complained about. To improve it: 1. Take advantage of -mno-strict-align. When it is set, set mode size to UNITS_PER_WORD regardless of th
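
A C-level sketch of the same idea: when unaligned accesses are allowed (the -mno-strict-align case), copy in word-sized chunks and finish with a byte-wise tail. `memcpy` into a local `uint64_t` is the portable way to express a possibly unaligned 8-byte load or store in C. The sketch assumes an 8-byte word as on LA64, the function name is hypothetical, and it is not the expansion the patch generates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes using 8-byte chunks, then a byte-wise tail. */
void copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* possibly unaligned load  */
        memcpy(d, &w, sizeof w);   /* possibly unaligned store */
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n--)                    /* remaining 0..7 bytes */
        *d++ = *s++;
}
```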

Re: [pushed][PATCH v3] gcc-13: Add changelog for LoongArch.

2023-04-18 Thread Lulu Cheng
Pushed to master. Thanks! On 2023/4/19 2:04 PM, Gerald Pfeifer wrote: On Tue, 18 Apr 2023, Lulu Cheng wrote: v1 -> v2: Modify syntax errors and description information. v2 -> v3: Modify some description information. Thank you, and thank you to Xuerui for their feedback! Please go ahe

Re: [PATCH] LoongArch: fix MUSL_DYNAMIC_LINKER

2023-04-19 Thread Lulu Cheng
On 2023/4/17 2:51 PM, 樊鹏 wrote: Yes, https://wiki.musl-libc.org/guidelines-for-distributions.html, "Multilib/multi-arch" section of this introduces it. Hi, fanpeng: I agree with ruoyao, add this link to the commit message. I have no problem with the rest. Thanks! -Original Messages- F

Re: [pushed][PATCH] LoongArch: fix MUSL_DYNAMIC_LINKER

2023-04-21 Thread Lulu Cheng
Pushed to r14-130. On 2023/4/19 4:23 PM, Peng Fan wrote: The system based on musl has no '/lib64', so change it. https://wiki.musl-libc.org/guidelines-for-distributions.html, "Multilib/multi-arch" section of this introduces it. gcc/ * config/loongarch/gnu-user.h (MUSL_DYNAMIC_LINKER: Redef

Re: [PATCH] LoongArch: Enable shrink wrapping

2023-04-24 Thread Lulu Cheng
OK, I will run a SPEC performance comparison as soon as possible. Thanks! On 2023/4/23 9:19 PM, Xi Ruoyao wrote: This commit implements the target macros for shrink wrapping of function prologues/epilogues on LoongArch. Bootstrapped and regtested on loongarch64-linux-gnu. I don't

Re: [PATCH] LoongArch: Enable shrink wrapping

2023-04-25 Thread Lulu Cheng
+guojie On 2023/4/23 9:19 PM, Xi Ruoyao wrote: This commit implements the target macros for shrink wrapping of function prologues/epilogues on LoongArch. Bootstrapped and regtested on loongarch64-linux-gnu. I don't have access to SPEC CPU so I hope the reviewer can perform a benc

Re: [PATCH] LoongArch: Enable shrink wrapping

2023-04-26 Thread Lulu Cheng
Hi, ruoyao: The SPEC2006 performance test has finished. The integer benchmark 400.perlbench improved by about 3%, the other integer benchmarks are basically unchanged, and the floating-point results have basically remained the same. Do you have any questions about the test cases mentioned

Re: [PATCH] LoongArch: Enable shrink wrapping

2023-04-26 Thread Lulu Cheng
On 2023/4/26 6:02 PM, WANG Xuerui wrote: On 2023/4/26 17:53, Lulu Cheng wrote: Hi, ruoyao: The SPEC2006 performance test has finished. The integer benchmark 400.perlbench improved by about 3%, the other integer benchmarks are basically unchanged, and the floating-point tests have basically

Re: [PATCH] LoongArch: Generate bytepick.[wd] for suitable bit operation pattern

2023-02-06 Thread Lulu Cheng
On 2023/2/4 1:50 AM, Xi Ruoyao wrote: We can use bytepick.[wd] for a << (8 * x) | b >> (8 * (sizeof(a) - x)) when a and b are uint32_t or uint64_t. This is useful for some cases, for example: https://sourceware.org/pipermail/libc-alpha/2023-February/145203.html Bootstrapped and regtested o
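
A minimal C sketch of the expression described above, written once for `uint32_t` and once for `uint64_t`; the function names are hypothetical. `x` must stay in `[1, sizeof(a) - 1]`, because `x == 0` would shift `b` by its full width, which is undefined in C.

```c
#include <stdint.h>

/* High part from a shifted left by 8*x bits, low part from the top
   bytes of b.  Valid for 1 <= x <= 3.  */
uint32_t pick32(uint32_t a, uint32_t b, unsigned x)
{
    return a << (8 * x) | b >> (8 * (sizeof(a) - x));
}

/* Same shape for 64-bit values.  Valid for 1 <= x <= 7. */
uint64_t pick64(uint64_t a, uint64_t b, unsigned x)
{
    return a << (8 * x) | b >> (8 * (sizeof(a) - x));
}
```

For example, `pick32(0x11223344, 0xAABBCCDD, 1)` keeps the low three bytes of `a` and appends the top byte of `b`, giving `0x223344AA`.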

[pushed][PATCH] LoongArch: Remove undefined behavior from code [PR 106097]

2022-06-29 Thread Lulu Cheng
Description in C++17 and earlier standards: The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a
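
A common way to avoid undefined behavior when the left operand may be negative is to shift in the corresponding unsigned type and convert back. The sketch assumes GCC, which documents that converting an out-of-range value to a signed integer type wraps modulo 2^N (ISO C leaves the conversion implementation-defined). `shl_wrap` is a hypothetical helper, not the code changed by the patch.

```c
#include <stdint.h>

/* Left-shift a possibly negative value without signed-shift UB:
   shift in uint64_t, where the result is reduced modulo 2^64, then
   convert back (implementation-defined; GCC wraps).
   n must be < 64: shifting by >= the width is undefined even for
   unsigned types.  */
int64_t shl_wrap(int64_t v, unsigned n)
{
    return (int64_t)((uint64_t)v << n);
}
```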

[PATCH v1] LoongArch: Fixed a compilation failure with '%c' in inline assembly [PR107731].

2022-11-22 Thread Lulu Cheng
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_classify_address): Add processing for CONST_INT. (loongarch_print_operand): Add handling of '%c'. gcc/testsuite/ChangeLog: * gcc.target/loongarch/tst-asm-const.c: Moved to... * gcc.target
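
As a reminder of what `%c` means: GCC's generic operand modifier `c` requires a constant operand and prints it with no punctuation, so an `"i"` operand is emitted as a bare number. The sketch below is a hypothetical use in LoongArch inline assembly (`add16` and the `addi.d` template are illustrative only); other targets take a plain C fallback so the file still builds.

```c
/* Add a compile-time constant through inline asm on LoongArch. */
long add16(long x)
{
#if defined(__loongarch__)
    long r;
    /* %c2 prints the "i" operand (16) as a bare constant. */
    __asm__("addi.d %0, %1, %c2" : "=r"(r) : "r"(x), "i"(16));
    return r;
#else
    return x + 16;   /* fallback for other targets */
#endif
}
```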

Re: [pushed][PATCH v4] LoongArch: Optimize immediate load.

2022-11-27 Thread Lulu Cheng
bootstrap-ubsan. And the compiled result of imm-load1.c seems OK. And it's doing the correct thing for the Glibc "improved generic string functions" patch, producing some really tight loops now. On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote: v1 -> v2: 1. Change the code format.

[PATCH] doc: Correct a clerical error in the document.

2022-12-06 Thread Lulu Cheng
gcc/ChangeLog: * doc/rtl.texi: Correct a clerical error in the document. --- gcc/doc/rtl.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi index 43c9ee8bffe..44858d12892 100644 --- a/gcc/doc/rtl.texi +++ b/gcc/doc/rtl.texi @@ -214

Re: [PATCH] doc: Correct a clerical error in the document.

2022-12-07 Thread Lulu Cheng
On 2022/12/7 6:05 PM, Richard Sandiford wrote: Lulu Cheng writes: gcc/ChangeLog: * doc/rtl.texi: Correct a clerical error in the document. --- gcc/doc/rtl.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi index 43c9ee8bffe

[PATCH v3] LoongArch: Fixed a compilation failure with '%c' in inline assembly [PR107731].

2022-12-09 Thread Lulu Cheng
Section 17.5 of gccint.pdf describes '%c', '%n', '%a' and '%l'. So my understanding is that these are modifiers implemented in common code that every back end has to support, right? But I don't see these modifiers used in gcc.pdf. Now I want to add the descriptor informa

Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates

2023-04-03 Thread Lulu Cheng
/* snip */ diff --git a/gcc/testsuite/gcc.target/loongarch/add-const.c b/gcc/testsuite/gcc.target/loongarch/add-const.c new file mode 100644 index 000..3a9f72fe83d --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/add-const.c @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-opt

Re: [GCC14 PATCH v2] LoongArch: Optimize additions with immediates

2023-04-04 Thread Lulu Cheng
On 2023/4/4 4:38 PM, Xi Ruoyao wrote: 1. Use addu16i.d for TARGET_64BIT and suitable immediates. 2. Split one addition with immediate into two addu16i.d or addi.{d/w} instructions if possible. This can avoid using a temp register without increasing the instruction count. Inspired by https://
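
A minimal sketch of the decomposition described above, assuming `addi.d` adds a signed 12-bit immediate and `addu16i.d` adds a signed 16-bit immediate shifted left by 16. The order in which the combinations are tried and the helper name `split_add_imm` are assumptions for illustration, not the patch's actual algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

static bool fits_si12(int64_t v) { return v >= -2048 && v <= 2047; }
static bool fits_si16(int64_t v) { return v >= -32768 && v <= 32767; }

/* Express "x + imm" as at most two immediate additions.  Stores the
   addends in part[] and returns the instruction count (1 or 2), or 0
   if a temporary register would be needed.  Assumes |imm| < 2^62 so
   the arithmetic below cannot overflow.  */
int split_add_imm(int64_t imm, int64_t part[2])
{
    if (fits_si12(imm)) {                              /* addi.d */
        part[0] = imm;
        return 1;
    }
    if (imm % 65536 == 0 && fits_si16(imm / 65536)) {  /* addu16i.d */
        part[0] = imm;
        return 1;
    }
    /* addu16i.d + addi.d: the sign-extended low 16 bits must fit si12. */
    int64_t lo = (int64_t)(((uint64_t)imm & 0xffff) ^ 0x8000) - 0x8000;
    int64_t hi = (imm - lo) / 65536;
    if (fits_si12(lo) && fits_si16(hi)) {
        part[0] = hi * 65536;
        part[1] = lo;
        return 2;
    }
    if (imm >= -4096 && imm <= 4094) {                 /* two addi.d */
        part[0] = imm > 0 ? 2047 : -2048;
        part[1] = imm - part[0];
        return 2;
    }
    if (imm % 65536 == 0 && imm / 65536 >= -65536
        && imm / 65536 <= 65534) {                     /* two addu16i.d */
        int64_t h = imm / 65536;
        int64_t h0 = h > 0 ? 32767 : -32768;
        part[0] = h0 * 65536;
        part[1] = (h - h0) * 65536;
        return 2;
    }
    return 0;
}
```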

Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates

2023-04-04 Thread Lulu Cheng
On 2023/4/4 4:40 PM, Xi Ruoyao wrote: On Tue, 2023-04-04 at 16:00 +0800, Xi Ruoyao via Gcc-patches wrote: On Tue, 2023-04-04 at 11:01 +0800, Lulu Cheng wrote: /* snip */ +unsigned long f10 (unsigned long x) { return x - 0x8000l * 2; } +unsigned long f11 (unsigned long x) { return x

[PATCH] LoongArch: Add built-in functions description of LoongArch BASE instruction set instructions.

2023-04-06 Thread Lulu Cheng
gcc/ChangeLog: * doc/extend.texi: Add section for LoongArch BASE Built-in functions. --- gcc/doc/extend.texi | 89 + 1 file changed, 89 insertions(+) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 3adb67aa47a..417af6c368d 100644 -

[PATCH v2] LoongArch: Add built-in functions description of LoongArch Base instruction set instructions.

2023-04-07 Thread Lulu Cheng
gcc/ChangeLog: * doc/extend.texi: Add section for LoongArch Base Built-in functions. --- gcc/doc/extend.texi | 129 1 file changed, 129 insertions(+) --- v1 -> v2: (1) Do not use i8, u8, i16, u16 etc. (2) Add the description informati

Re: [PATCH] LoongArch: Improve GAR store for va_list

2023-04-10 Thread Lulu Cheng
Sorry, that's my fault. I still have some questions I haven't figured out, so I haven't replied to the email yet. :-( On 2023/4/10 5:04 PM, Xi Ruoyao wrote: Ping. Or maybe I've lost some replies here because my mail server crashed several days ago :). On Wed, 2023-03-29 at 02:01 +0800, Xi Ruo

[PATCH] LoongArch: Remove the definition of the macro LOGICAL_OP_NON_SHORT_CIRCUIT under the architecture and use the default definition instead.

2023-04-13 Thread Lulu Cheng
In some cases, setting this macro as the default can reduce the number of conditional branch instructions. gcc/ChangeLog: * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove the macro definition. --- gcc/config/loongarch/loongarch.h | 1 - 1 file changed, 1 de
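
To illustrate the trade-off the macro controls: when both operands are cheap and free of side effects, `a && b` can be evaluated without short-circuiting, by computing both comparisons and combining them with a bitwise AND so that at most one conditional branch remains. The two hypothetical functions below return the same value for all inputs; which form GCC actually chooses depends on this macro and the target's branch cost.

```c
#include <stdbool.h>

/* Short-circuit form: naively two conditional branches. */
bool both_positive_sc(int a, int b)
{
    return a > 0 && b > 0;
}

/* Non-short-circuit form: both comparisons are evaluated and combined
   with '&'.  Only equivalent because the operands have no side
   effects.  */
bool both_positive_nsc(int a, int b)
{
    return (a > 0) & (b > 0);
}
```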

Re: [PATCH] LoongArch: Remove the definition of the macro LOGICAL_OP_NON_SHORT_CIRCUIT under the architecture and use the default definition instead.

2023-04-13 Thread Lulu Cheng
On 2023/4/13 8:24 PM, Xi Ruoyao wrote: On Thu, 2023-04-13 at 19:51 +0800, Lulu Cheng wrote: In some cases, setting this macro as the default can reduce the number of conditional branch instructions. gcc/ChangeLog: * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove

Re: [pushed][PATCH] LoongArch: Remove the definition of the macro LOGICAL_OP_NON_SHORT_CIRCUIT under the architecture and use the default definition instead.

2023-04-17 Thread Lulu Cheng
Pushed to r14-15. Because of a delay on my part, this change missed the creation of the releases/gcc-13 branch. Can I still commit it to releases/gcc-13? :-( On 2023/4/13 8:24 PM, Xi Ruoyao wrote: On Thu, 2023-04-13 at 19:51 +0800, Lulu Cheng wrote: In some cases

Re: [pushed][PATCH v2] LoongArch: Add built-in functions description of LoongArch Base instruction set instructions.

2023-04-17 Thread Lulu Cheng
Pushed to r14-14. On 2023/4/7 4:38 PM, Lulu Cheng wrote: gcc/ChangeLog: * doc/extend.texi: Add section for LoongArch Base Built-in functions. --- gcc/doc/extend.texi | 129 1 file changed, 129 insertions(+) --- v1 -> v2: (1) Does

[PATCH] gcc-13: Add changelog for LoongArch.

2023-04-17 Thread Lulu Cheng
--- htdocs/gcc-13/changes.html | 39 ++ 1 file changed, 39 insertions(+) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index f3b9afed..c75e341b 100644 --- a/htdocs/gcc-13/changes.html +++ b/htdocs/gcc-13/changes.html @@ -563,6 +563,45 @@

[PATCH v2] gcc-13: Add changelog for LoongArch.

2023-04-18 Thread Lulu Cheng
--- htdocs/gcc-13/changes.html | 41 ++ 1 file changed, 41 insertions(+) --- v1 -> v2: Modify syntax errors and description information. diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index f3b9afed..4324c2d1 100644 --- a/htdocs/gcc-13/

Re: [PATCH] gcc-13: Add changelog for LoongArch.

2023-04-18 Thread Lulu Cheng
On 2023/4/18 2:44 PM, Gerald Pfeifer wrote: Here, and in the other cases, the closing (that I marked above) should follow , since both the heading and the are part of the same list item. (See the RISC-V entry, for example.) This change is fine with the changes highlighted above. (If you prefer

Re: [PATCH] gcc-13: Add changelog for LoongArch.

2023-04-18 Thread Lulu Cheng
On 2023/4/18 3:29 PM, WANG Xuerui wrote: Hi, Just some minor fixes ;-) On 2023/4/18 14:15, Lulu Cheng wrote: --- htdocs/gcc-13/changes.html | 39 ++ 1 file changed, 39 insertions(+) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index

[PATCH v3] gcc-13: Add changelog for LoongArch.

2023-04-18 Thread Lulu Cheng
--- htdocs/gcc-13/changes.html | 42 ++ 1 file changed, 42 insertions(+) --- v1 -> v2: Modify syntax errors and description information. v2 -> v3: Modify some description information. diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index

Re: [PATCH] LoongArch: Improve GAR store for va_list

2023-04-18 Thread Lulu Cheng
On 2023/4/18 5:27 PM, Xi Ruoyao wrote: On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: Sorry, that's my fault. I still have some questions I haven't figured out, so I haven't replied to the email yet. :-( I've verified the value of cfun->va_list_gpr_size with

Re: [PATCH] LoongArch: Improve GAR store for va_list

2023-04-18 Thread Lulu Cheng
On 2023/4/18 7:48 PM, Xi Ruoyao wrote: On Tue, 2023-04-18 at 19:21 +0800, Lulu Cheng wrote: On 2023/4/18 5:27 PM, Xi Ruoyao wrote: On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: Sorry, that's my fault. I still have some questions I haven't figured out, so I haven't replied t

Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops

2023-04-18 Thread Lulu Cheng
Hi, ruoyao: Thank you so much for making this submission. But we are testing the impact of these two alignment parameters (as well as -falign-jumps and -falign-labels) on performance. So until the results come out, this patch will not be merged into the main branch for the time being.

Re:[pushed] [PATCH] LoongArch: Add support to annotate tablejump

2024-10-07 Thread Lulu Cheng
Pushed to r15-4130. On 2024/7/11 7:43 PM, Xi Ruoyao wrote: This is per the request from the kernel developers. For generating the ORC unwind info, the objtool program needs to analyze the control flow of a .o file. If a jump table is used, objtool has to correlate the jump instruction with the tab
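
A jump table maps a dense index to code addresses, and dispatch is an indirect jump through that table; that jump/table pairing is what objtool needs to recover. The sketch below makes both halves explicit with GCC's labels-as-values extension (`&&label`, `goto *`); `classify` is a hypothetical function, and a dense `switch` is what the compiler would normally lower into this shape on its own.

```c
/* Dispatch through an explicit table of code addresses using the GNU C
   labels-as-values extension.  */
int classify(unsigned op)
{
    static void *const table[] = { &&op_zero, &&op_one, &&op_two };

    if (op >= sizeof table / sizeof table[0])
        return -1;               /* out-of-range index: no table jump */
    goto *table[op];             /* the indirect "table jump" */

op_zero:
    return 100;
op_one:
    return 200;
op_two:
    return 300;
}
```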

Re: [PATCH] LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned

2024-11-01 Thread Lulu Cheng
On 2024/11/2 1:10 AM, Xi Ruoyao wrote: On Thu, 2024-10-31 at 23:58 +0800, Xi Ruoyao wrote: /* snip */ --- Now running bootstrap & regtest. Posted early as a context for some LLVM patch. I'll post the regtest result once it finishes. Done, no regressions. The LLVM patch is https://github.com/

[PATCH] LoongArch: Fix clerical errors in lasx_xvreplgr2vr_* and lsx_vreplgr2vr_*.

2024-11-02 Thread Lulu Cheng
[x]vldi.{b/h/w/d} is not implemented in LoongArch. Use the [x]vrepli.{b/h/w/d} macros instead. gcc/ChangeLog: * config/loongarch/lasx.md: Fixed. * config/loongarch/lsx.md: Fixed. --- gcc/config/loongarch/lasx.md | 2 +- gcc/config/loongarch/lsx.md | 2 +- 2 files changed, 2 in

Re: Pushed: [PATCH] LoongArch: testsuite: Add -O for jump-table-annotate.c

2024-11-01 Thread Lulu Cheng
On 2024/11/2 1:36 AM, Xi Ruoyao wrote: Without optimization, GCC does not emit a jump table for the test case. I'm not sure if the test case has been wrong in the first place or something has changed in these months... It was r15-4756 that turned -fjump-tables off at -O0. I wa

[PATCH 0/2] Remove redundant code.

2024-11-01 Thread Lulu Cheng
Lulu Cheng (2): LoongArch: Remove redundant code. LoongArch: Modify the document to remove options that don't exist. gcc/config/loongarch/loongarch-builtins.cc | 102 - gcc/config/loongarch/loongarch-protos.h| 1 - gcc/config/loongarch/loongarch.cc

[PATCH 1/2] LoongArch: Remove redundant code.

2024-11-01 Thread Lulu Cheng
TARGET_ASM_ALIGNED_{HI,SI,QI}_OP are defined more than once; delete the redundant definitions. gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (loongarch_builtin_vectorized_function): Delete. (LARCH_GET_BUILTIN): Delete. * config/loongarch/loongarch-protos.h (loongarch_built

[PATCH 2/2] LoongArch: Modify the document to remove options that don't exist.

2024-11-01 Thread Lulu Cheng
gcc/ChangeLog: * doc/invoke.texi: Remove the non-existent option '-msmall-data-limit' and add a description of '-G'. --- gcc/doc/invoke.texi | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index fd6c0c44709..

Re:[pushed] [PATCH] LoongArch: Fix soft-float builds of libffi

2024-10-23 Thread Lulu Cheng
Pushed to r15-4588. On 2024/1/27 3:09 PM, Yang Yujie wrote: This patch corresponds to the upstream PR: https://github.com/libffi/libffi/pull/817 libffi/ChangeLog: * src/loongarch64/ffi.c: Avoid defining floats in struct call_context if the ABI is soft-float. --- libffi/src/loongarch

Re: [pushed][PATCH 0/2] Remove redundant code.

2024-11-21 Thread Lulu Cheng
Pushed to r15-5583 and r15-5584. On 2024/11/2 10:48 AM, Lulu Cheng wrote: Lulu Cheng (2): LoongArch: Remove redundant code. LoongArch: Modify the document to remove options that don't exist. gcc/config/loongarch/loongarch-builtins.cc | 102 - gcc/config/loon

[PATCH] Regenerate opt urls for r15-5584.

2024-11-22 Thread Lulu Cheng
gcc/ChangeLog: * config/g.opt.urls: Regenerate. * config/i386/nto.opt.urls: Regenerate. * config/riscv/riscv.opt.urls: Regenerate. * config/rx/rx.opt.urls: Regenerate. * config/sol2.opt.urls: Regenerate. --- gcc/config/g.opt.urls | 2 +- gcc/confi

Re: [PATCH v3] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector

2024-11-27 Thread Lulu Cheng
On 2024/11/28 9:26 AM, Jinyang He wrote: For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes overflow when emitting {w,h,b}. Since the shift amount is the register value modulo the element width, it is actually unnecessary to constrain the range. Simply mask the shift number with the
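
A scalar model of the behavior the fix relies on: each lane's shift count is taken modulo the element width, so masking a same-immediate shift amount with `width - 1` cannot change the result. The model covers an 8-bit logical right shift over 16 lanes (a 128-bit LSX-sized vector); the function name and lane layout are assumptions for illustration.

```c
#include <stdint.h>

#define LANES 16  /* 128-bit vector of 8-bit elements (LSX-sized) */

/* Per-lane logical right shift where the count is interpreted modulo
   the element width (8 bits), i.e. only its low 3 bits are used.  */
void vsrl_b_model(uint8_t dst[LANES], const uint8_t src[LANES],
                  const uint8_t cnt[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (uint8_t)(src[i] >> (cnt[i] & 7));
}
```

Under this model a shift count of 9 behaves exactly like a count of 1, which is why masking the immediate is safe.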

Re: [PATCH] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector.

2024-11-26 Thread Lulu Cheng
On 2024/11/27 3:10 PM, Xi Ruoyao wrote: On Wed, 2024-11-27 at 14:24 +0800, Lulu Cheng wrote: On 2024/11/27 12:06 PM, Xi Ruoyao wrote: On Wed, 2024-11-27 at 11:58 +0800, Lulu Cheng wrote: --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c @@ -0,0 +1,72

Re: [PATCH] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector.

2024-11-26 Thread Lulu Cheng
On 2024/11/27 10:14 AM, Xi Ruoyao wrote: On Tue, 2024-11-26 at 18:37 +0800, Jinyang He wrote: For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes overflow when emitting {w,h,b}. Since the shift amount is the register value modulo the element width, it is actually unnecessary to const

Re: [PATCH] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector.

2024-11-26 Thread Lulu Cheng
On 2024/11/27 12:06 PM, Xi Ruoyao wrote: On Wed, 2024-11-27 at 11:58 +0800, Lulu Cheng wrote: --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c @@ -0,0 +1,72 @@ +/* Test shift bits overflow in vector */ +/* { dg-do compile } */ +/* { dg-options "-mlas

Re:[pushed] [PATCH 1/2] LoongArch: testsuite: Fix loongarch/vect-frint-scalar.c.

2024-11-30 Thread Lulu Cheng
Pushed to r15-5817. On 2024/11/26 4:06 PM, Lulu Cheng wrote: r15-5327 changed the default language version for C compilation from -std=gnu17 to -std=gnu23. ISO C99 and C11 allow ceil, floor, round and trunc, and their float and long double variants, to raise the “inexact” exception, but ISO/IEC

Re:[pushed] [PATCH 2/2] LoongArch: testsuite: Fix l{a}sx-andn-iorn.c.

2024-11-30 Thread Lulu Cheng
Pushed to r15-5818. On 2024/11/26 4:06 PM, Lulu Cheng wrote: Add '-fdump-tree-optimized' to these test cases. gcc/testsuite/ChangeLog: * gcc.target/loongarch/lasx-andn-iorn.c: Add '-fdump-tree-optimized'. * gcc.target/loongarch/lsx-andn-iorn.c:

Re: [pushed][PATCH v3] LoongArch: Mask shift offset when emit {xv,v}{srl,sll,sra} with sameimm vector

2024-11-30 Thread Lulu Cheng
Pushed to r15-5819. On 2024/11/28 9:26 AM, Jinyang He wrote: For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes overflow when emitting {w,h,b}. Since the shift amount is the register value modulo the element width, it is actually unnecessary to constrain the range. Simply mask the sh

[PATCH 1/2] LoongArch: testsuite: Fix loongarch/vect-frint-scalar.c.

2024-11-26 Thread Lulu Cheng
r15-5327 changed the default language version for C compilation from -std=gnu17 to -std=gnu23. ISO C99 and C11 allow ceil, floor, round and trunc, and their float and long double variants, to raise the “inexact” exception, but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as integra

[PATCH 2/2] LoongArch: testsuite: Fix l{a}sx-andn-iorn.c.

2024-11-26 Thread Lulu Cheng
Add '-fdump-tree-optimized' to these test cases. gcc/testsuite/ChangeLog: * gcc.target/loongarch/lasx-andn-iorn.c: Add '-fdump-tree-optimized'. * gcc.target/loongarch/lsx-andn-iorn.c: Likewise. --- gcc/testsuite/gcc.target/loongarch/lasx-andn-iorn.c | 2 +- gcc/test

[PATCH] LoongArch: Adjust the cost of ADDRESS_REG_REG [PR114978].

2025-01-06 Thread Lulu Cheng
Changing this cost from 1 to 3 improves the performance of SPEC2006 401, 473, 416, 465 and 482 by about 2% on LA664. Add option '-maddr-reg-reg-cost='. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in: Add option '-maddr-reg-reg-cost='. * config/loongarch

Re:[pushed] [PATCH v3] LoongArch: Implement vector cbranch optab for LSX and LASX

2024-12-31 Thread Lulu Cheng
Pushed to r15-6477. On 2024/12/25 5:59 PM, Jiahao Xu wrote: In order to support vectorization of loops with multiple exits, this patch adds the implementation of the conditional branch optab for LoongArch LSX/LASX instructions. This patch causes the gen-vect-{2,25}.c tests to fail. This is because
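
For context, a loop with multiple exits looks like the search below: besides the normal trip-count exit, it can leave early when an element matches. Vectorizing it means comparing a whole vector of elements and branching if any lane is true, which is what a vector conditional-branch pattern provides. The function is a hypothetical illustration of the loop shape only; whether a given GCC version vectorizes it depends on the target and options.

```c
#include <stddef.h>

/* Return the index of the first element greater than `limit`, or n
   if there is none.  The early 'return i' is the second loop exit.  */
size_t find_first_above(const int *a, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] > limit)
            return i;
    return n;
}
```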

Re: [pushed][PATCH] LoongArch: Remove useless UNSPECs and define_mode_attrs

2025-01-01 Thread Lulu Cheng
Pushed to r15-6487. On 2024/12/30 10:34 AM, Guo Jie wrote: gcc/ChangeLog: * config/loongarch/lasx.md: Remove useless code. * config/loongarch/lsx.md: Ditto. --- gcc/config/loongarch/lasx.md | 66 gcc/config/loongarch/lsx.md | 35 -

Re:[pushed] [PATCH v2] LoongArch: Add standard patterns uabd and sabd

2025-01-01 Thread Lulu Cheng
Pushed to r15-6492. On 2024/12/30 3:12 PM, Guo Jie wrote: gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvabsd_s_): Remove. (abd3): New insn pattern. (lasx_xvabsd_u_): Remove. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vabsd_b): Rename. (
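
For reference, scalar C versions of the two operations on 8-bit lanes: `sabd` is the absolute difference of signed inputs and `uabd` of unsigned ones, and both results are unsigned because `|a - b|` can reach 255. The loop at the end is a hypothetical example of the shape a vectorizer could map onto such a pattern; none of this is the RTL the patch adds.

```c
#include <stdint.h>

/* Signed absolute difference on 8-bit lanes.  The operands are
   promoted to int, so a - b cannot overflow; the result can reach 255
   (e.g. 127 - (-128)), hence the unsigned return type.  */
uint8_t sabd8(int8_t a, int8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Unsigned absolute difference on 8-bit lanes. */
uint8_t uabd8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Element-wise loop a vectorizer could turn into a uabd pattern. */
void uabd_array(uint8_t *dst, const uint8_t *x, const uint8_t *y, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = uabd8(x[i], y[i]);
}
```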

Re:[pushed] [PATCH] LoongArch: Adjust insn patterns for better combine

2025-01-01 Thread Lulu Cheng
Pushed to r15-6490. On 2024/12/30 10:38 AM, Guo Jie wrote: For some instruction patterns with commutative operands, the order of operands needs to be adjusted to match the rules. gcc/ChangeLog: * config/loongarch/loongarch.md (bytepick_d__rev): New combiner. (bstrpick_alsl_p

Re:[pushed] [PATCH] LoongArch: Fix bugs in insn patterns lasx_xvrepl128vei_b/h/w/d_internal

2025-01-01 Thread Lulu Cheng
Pushed to r15-6489. On 2024/12/30 10:37 AM, Guo Jie wrote: There are two aspects that affect the matching of instruction templates: 1. vec_duplicate is redundant in the following operations. set (match_operand:V4DI ...) (vec_duplicate:V4DI (vec_select:V4DI ...)) 2. The range of values

Re: [pushed][PATCH] LoongArch: Add some vector pack/unpack patterns

2025-01-01 Thread Lulu Cheng
Pushed to r15-6491. On 2024/12/30 10:38 AM, Guo Jie wrote: gcc/ChangeLog: * config/loongarch/lasx.md (vec_unpacks_lo_): Redefine. (vec_unpacku_lo_): Ditto. (lasx_vext2xv_h_b): Replaced by vec_unpack_lo_v32qi. (vec_unpack_lo_v32qi): New insn. (lasx_vext2xv_w_h)

Re:[pushed] [PATCH] LoongArch: Optimize for conditional move operations

2025-01-01 Thread Lulu Cheng
Pushed to r15-6493. On 2024/12/30 10:39 AM, Guo Jie wrote: The optimization example is as follows. From: if (condition) dest += 1 << 16; To: dest += (condition ? 1 : 0) << 16; It does not use maskeqz and masknez, thus reducing the number of instructions. gcc/ChangeLog: * config
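
The before and after forms from the description, written as two hypothetical C functions that return the same value for every input. The second form turns the condition into 0 or 1 and shifts it into place, which is why no conditional-select sequence is needed.

```c
#include <stdint.h>

/* Branchy form from the patch description. */
int64_t add_if_branch(int64_t dest, int condition)
{
    if (condition)
        dest += 1 << 16;
    return dest;
}

/* Rewritten form: the condition becomes 0 or 1 and is shifted into
   place, so the addition is unconditional.  */
int64_t add_if_shift(int64_t dest, int condition)
{
    dest += (int64_t)(condition ? 1 : 0) << 16;
    return dest;
}
```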

Re: [pushed][PATCH] LoongArch: Fix selector error in lasx_xvexth_h/w/d* patterns

2025-01-01 Thread Lulu Cheng
Pushed to r15-6488. On 2024/12/30 10:37 AM, Guo Jie wrote: The xvexth related instructions operate SEPARATELY on the high and low 128 bits, and sign/zero extend the upper half of every 128 bits in src to the corresponding 128 bits in dest. For xvexth.d.w, the rule for the first element of
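
Following the description, a scalar model of xvexth.d.w: the 256-bit source is treated as two independent 128-bit halves, and in each half the upper two 32-bit elements are sign-extended into the two 64-bit elements of the same half of the destination. The element numbering (index 0 is the least significant element) and the function name are assumptions of the model.

```c
#include <stdint.h>

/* Model of xvexth.d.w on a 256-bit register viewed as 8 x int32. */
void xvexth_d_w_model(int64_t dst[4], const int32_t src[8])
{
    for (int half = 0; half < 2; half++) {   /* each 128-bit half */
        const int32_t *s = src + 4 * half;   /* 4 words per half  */
        int64_t *d = dst + 2 * half;         /* 2 dwords per half */
        d[0] = s[2];                         /* sign-extend upper */
        d[1] = s[3];                         /* two words of half */
    }
}
```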

Re: [PATCH v1] LoongArch: Optimize the cost of vec_construct.

2025-01-07 Thread Lulu Cheng
On 2025/1/7 12:47 PM, chenxiaolong wrote: When analyzing 525 on the LoongArch architecture, we found that the for loop of the hotspot function x264_pixel_satd_8x4 could not be vectorized with 256-bit vectors because of the vec_construct cost setting. After readjusting vec_construct, the performance of 525 program

[PATCH 0/2] Implement target attribute and pragma.

2025-01-07 Thread Lulu Cheng
__attribute__ ((target ("{no-}lsx"))) __attribute__ ((target ("{no-}lasx"))) Lulu Cheng (2): LoongArch: Implement target attribute. LoongArch: Implement target pragma. gcc/attr-urls.def | 6 + gcc/config.gcc| 2 +-

[PATCH 1/2] LoongArch: Implement target attribute.

2025-01-07 Thread Lulu Cheng
Add function attributes support for LoongArch. Currently, the following items are supported: __attribute__ ((target ("{no-}strict-align"))) __attribute__ ((target ("cmodel="))) __attribute__ ((target ("arch="))) __attribute__ ((target ("tune="))) __attribut

Re: [pushed][PATCH] LoongArch: Optimize initializing fp register to zero

2025-01-07 Thread Lulu Cheng
Pushed to r15-6617. On 2024/12/31 7:33 PM, Deng Jianbo wrote: LoongArch currently uses the movgr2fr.{d|w} instruction to move zero from a general-purpose register to a floating-point register when initializing an fp register to zero. When LSX or LASX is enabled, we can use the vxor.v instruction, which has lower la

Re: [PATCH] LoongArch: combine related slli operations

2025-01-07 Thread Lulu Cheng
On 2025/1/2 5:46 PM, Zhou Zhao wrote: If an SImode register is left-shifted twice in succession, combine the related instructions into one. gcc/ChangeLog: * config/loongarch/loongarch.md (extsv_ashlsi3): New template Hi, zhaozhou: The indentation here is wrong; it needs to be aligned with *.

[PATCH 2/2] LoongArch: Implement target pragma.

2025-01-07 Thread Lulu Cheng
The target pragmas defined correspond to the target function attributes. This implementation is derived from AArch64. gcc/ChangeLog: * config/loongarch/loongarch-protos.h (loongarch_reset_previous_fndecl): Add function declaration. (loongarch_save_restore_target_globals)
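
A usage sketch of the attribute and the pragma side by side, assuming a GCC that includes this series and a CPU that implements LSX. The option string "lsx" comes from the series' cover letter; the function names are hypothetical, and the `__loongarch__` guard only keeps the file buildable on other targets, which take a plain C path.

```c
#if defined(__loongarch__)
/* Per-function form: enable LSX for this function only. */
__attribute__((target("lsx")))
void scale_attr(float *v, int n, float k)
{
    for (int i = 0; i < n; i++)
        v[i] *= k;
}

/* Region form: the same option string applied via the pragma. */
#pragma GCC push_options
#pragma GCC target("lsx")
void scale_pragma(float *v, int n, float k)
{
    for (int i = 0; i < n; i++)
        v[i] *= k;
}
#pragma GCC pop_options
#endif

/* Portable entry point.  On LoongArch it calls the LSX-enabled
   variant, which assumes the running CPU implements LSX.  */
void scale(float *v, int n, float k)
{
#if defined(__loongarch__)
    scale_attr(v, n, k);
#else
    for (int i = 0; i < n; i++)
        v[i] *= k;
#endif
}
```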

Re: [pushed][PATCH v2] LoongArch: Support immediate_operand for vec_cmp

2024-12-26 Thread Lulu Cheng
Pushed to r15-6445. On 2024/12/18 3:45 PM, Jiahao Xu wrote: We can't vectorize the code into instructions like vslti.w that compare with an immediate_operand, because immediate_operand support for integer comparisons is missing. gcc/ChangeLog: * config/loongarch/lasx.md (vec_cmp): Remove.

Re: [pushed][PATCH] LoongArch: Fix ICE caused by illegal calls to builtin functions [PR118561].

2025-02-06 Thread Lulu Cheng
Pushed to r14-11275 and r15-7386. On 2025/1/23 11:44 AM, Lulu Cheng wrote: PR target/118561 gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (loongarch_expand_builtin_lsx_test_branch): NULL_RTX will not be returned when an error is detected

Re: [PATCH] testsuite: LoongArch: Remove from btrunc, ceil, and floor effective target allowlist

2025-02-07 Thread Lulu Cheng
On 2025/2/7 7:51 PM, Xi Ruoyao wrote: Now that the C default is C23, we can no longer use LSX/LASX instructions for these operations, as the standard disallows raising INEXACT exceptions. So LoongArch is no longer suitable for these effective targets. Fix the test failures on gcc.dg/vect/vect-roundi

Re: [PATCH] LoongArch: Correct the mode for mask{eq,ne}z

2025-02-06 Thread Lulu Cheng
On 2025/1/20 9:30 AM, Xi Ruoyao wrote: For mask{eq,ne}z, rk is always compared with 0 in the full width, thus the mode for rk should be X. LGTM! I agree with your point of view. Thank you. I found the issue reviewing a patch fixing a similar issue for RISC-V XTheadCondMov [1], but interestin
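
For reference, a C model of the two instructions (maskeqz yields 0 when rk is zero and rj otherwise; masknez is the converse) and of the usual branchless select built from them. The test on `rk` uses all 64 bits, which is the point of the patch: a value whose low 32 bits are zero still counts as nonzero. The model functions are illustrative, not GCC code.

```c
#include <stdint.h>

/* maskeqz rd, rj, rk: rd = (rk == 0) ? 0 : rj, testing all 64 bits. */
uint64_t maskeqz(uint64_t rj, uint64_t rk) { return rk == 0 ? 0 : rj; }

/* masknez rd, rj, rk: rd = (rk != 0) ? 0 : rj. */
uint64_t masknez(uint64_t rj, uint64_t rk) { return rk != 0 ? 0 : rj; }

/* cond ? a : b without a branch: exactly one operand survives. */
uint64_t select_nz(uint64_t cond, uint64_t a, uint64_t b)
{
    return maskeqz(a, cond) | masknez(b, cond);
}
```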

Re: [PATCH 6/8] LoongArch: Simplify {lsx,lasx_x}vpick description

2025-02-11 Thread Lulu Cheng
On 2025/2/7 8:09 PM, Xi Ruoyao wrote: /* snip */ - -(define_insn "lasx_xvpickev_w" - [(set (match_operand:V8SI 0 "register_operand" "=f") - (vec_select:V8SI - (vec_concat:V16SI - (match_operand:V8SI 1 "register_operand" "f") - (match_operand:V8SI 2 "register_operan

Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description

2025-02-11 Thread Lulu Cheng
On 2025/2/11 4:37 PM, Xi Ruoyao wrote: On Tue, 2025-02-11 at 15:48 +0800, Lulu Cheng wrote: Hi, I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be "{lsx_,lasx_x}vh{add,sub}w". Indeed. On 2025/2/7 8:09 PM, Xi Ruoyao wrote: Like what we've done for {lsx_,las

Re: [PATCH 4/8] LoongArch: Simplify {lsx_,lasx_x}hv{add,sub}w description

2025-02-10 Thread Lulu Cheng
Hi, I think the "{lsx_,lasx_x}hv{add,sub}w" in the title should be "{lsx_,lasx_x}vh{add,sub}w". On 2025/2/7 8:09 PM, Xi Ruoyao wrote: Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. /* snip */

Re: [PATCH 5/8] LoongArch: Simplify {lsx_,lasx_x}maddw description

2025-02-10 Thread Lulu Cheng
It seems the title here should be "{lsx_,lasx_x}vmaddw". On 2025/2/7 8:09 PM, Xi Ruoyao wrote: Like what we've done for {lsx_,lasx_x}v{add,sub,mul}l{ev,od}, use special predicates and TImode RTL instead of hard-coded const vectors and UNSPECs. Also reorder two operands of the outer plus in the templat

[PATCH 3/3] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-11 Thread Lulu Cheng
PR target/118828 gcc/ChangeLog: * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): Update the predefined macros. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118828.c: New test. Change-Id: I13f7b44b11bba2080db797157a0389cc1bd65ac6 --- gcc/co

[PATCH 2/3] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-11 Thread Lulu Cheng
Split the implementation of the function loongarch_cpu_cpp_builtins into two parts: 1. Macro definitions that do not change (only considering 64-bit architecture) 2. Macro definitions that change with different compilation options. gcc/ChangeLog: * config/loongarch/loongarch-c.cc (bu

[PATCH 0/3] Organize the code and fix PR118828.

2025-02-11 Thread Lulu Cheng
Follow the AArch64 implementation to fix PR118828. Lulu Cheng (3): LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc. LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions. LoongArch: After setting the compilation options, update

[PATCH 1/3] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.

2025-02-11 Thread Lulu Cheng
gcc/ChangeLog: * config/loongarch/loongarch-target-attr.cc (loongarch_pragma_target_parse): Move to ... (loongarch_register_pragmas): Move to ... * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): ... here. (loongarch_register_pragmas

[PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Lulu Cheng
PR target/118828 gcc/ChangeLog: * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): Update the predefined macros. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118828.c: New test. * gcc.target/loongarch/pr118828-2.c: New test. *

[PATCH v2 1/4] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.

2025-02-12 Thread Lulu Cheng
gcc/ChangeLog: * config/loongarch/loongarch-target-attr.cc (loongarch_pragma_target_parse): Move to ... (loongarch_register_pragmas): Move to ... * config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse): ... here. (loongarch_register_pragmas

[PATCH v2 4/4] LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined [PR118843].

2025-02-12 Thread Lulu Cheng
PR target/118843 gcc/ChangeLog: * config/loongarch/loongarch-c.cc (loongarch_update_cpp_builtins): Fix macro definition issues. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118843.c: New test. Change-Id: I777e46ccbc80bfa8948e7d416ac86853c8f4c16d --- gcc/co

[PATCH v2 0/4] Organize the code and fix PR118828 and PR118843.

2025-02-12 Thread Lulu Cheng
v1 -> v2: 1. Move __loongarch_{arch,tune}, _LOONGARCH_{ARCH,TUNE}, __loongarch_{div32,am_bh,amcas,ld_seq_sa}, and __loongarch_version_major/__loongarch_version_minor to the update function. 2. Fix PR118843. 3. Add test cases. Lulu Cheng (4): LoongArch: Move the funct

[PATCH v2 2/4] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-12 Thread Lulu Cheng
Split the implementation of the function loongarch_cpu_cpp_builtins into two parts: 1. Macro definitions that do not change (only considering 64-bit architecture) 2. Macro definitions that change with different compilation options. gcc/ChangeLog: * config/loongarch/loongarch-c.cc (bu

Re: [PATCH 2/5] LoongArch: Add bit reverse operations

2024-12-16 Thread Lulu Cheng
On 2024/12/17 at 12:30 PM, Xi Ruoyao wrote: On Tue, 2024-12-17 at 11:27 +0800, Lulu Cheng wrote: On 2024/12/16 at 9:20 PM, Xi Ruoyao wrote: /* snip */ +;; For HImode it's a little complicated... +(define_expand "rbithi" I didn't find rbithi's template description. Are there any tes

Re: [PATCH 2/5] LoongArch: Add bit reverse operations

2024-12-16 Thread Lulu Cheng
On 2024/12/16 at 9:20 PM, Xi Ruoyao wrote: /* snip */ +;; For HImode it's a little complicated... +(define_expand "rbithi" I didn't find rbithi's template description. Are there any test cases ? + [(match_operand:HI 0 "register_operand") + (match_operand:HI 1 "register_operand")] + "" + { +r
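One way to reduce the HImode case to a 32-bit reversal is sketched below in portable C: zero-extend the halfword, reverse all 32 bits (the operation bitrev.w performs on a word), then shift right by 16. This is an assumption about a possible lowering, not necessarily what the rbithi expander in the patch emits, and the function names are hypothetical.

```c
#include <stdint.h>

/* Portable model of a 32-bit bit reversal (swap adjacent bits, then pairs,
   nibbles, bytes and finally halfwords).  */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* 16-bit reversal: bit k of the zero-extended input lands in bit 31 - k,
   so shifting right by 16 moves the reversed halfword into bits 15..0.  */
static uint16_t rbit16(uint16_t x)
{
    return (uint16_t)(rbit32(x) >> 16);
}
```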

Re: [PATCH 0/5] LoongArch: CRC optimization

2024-12-17 Thread Lulu Cheng
On 2024/12/16 at 9:19 PM, Xi Ruoyao wrote: A generic CRC optimization pass has been implemented in r15-5850. But without target-specific code, it'll only optimize the CRC loop to a table lookup. With LoongArch-specific code we can do it better: for 64-bit LoongArch and the IEEE 802.3 polynomial or th
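The kind of source loop such a pass targets is a bit-at-a-time CRC. Below is a minimal sketch of the IEEE 802.3 CRC-32 in its reflected form (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). Whether GCC's pass recognizes exactly this loop shape is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32, IEEE 802.3 polynomial in reflected form.  */
static uint32_t crc32_ieee_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; if the bit shifted out was 1, fold in the polynomial.  */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}
```

The standard check value for the ASCII string "123456789" under this CRC is 0xCBF43926.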

Re: [pushed] [PATCH] LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned

2024-11-21 Thread Lulu Cheng
Pushed to r15-5580. We searched the multimedia package and found no uses of __builtin_lsx_vorn_v or __builtin_lasx_xvorn_v, so the interface types have been changed as a bugfix. Thanks! On 2024/10/31 at 11:58 PM, Xi Ruoyao wrote: Align them with other vector bitwise builtins.
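As a scalar illustration of why an unsigned element type suits this builtin, the sketch below models vorn.v byte by byte, assuming its semantics are vd = vj | ~vk. A bitwise OR-NOT has no notion of sign, so unsigned byte lanes describe it directly. The function name is hypothetical.

```c
#include <stdint.h>

/* Scalar model of vorn.v: each result byte is a | ~b.  */
static void vorn_v_model(uint8_t d[16], const uint8_t a[16],
                         const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        d[i] = (uint8_t)(a[i] | (uint8_t)~b[i]);
}
```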

Re: [pushed] [PATCH] LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned

2024-11-21 Thread Lulu Cheng
Pushed to r14-10960. On 2024/11/22 at 9:52 AM, Lulu Cheng wrote: Pushed to r15-5580. We searched the multimedia package and found no uses of __builtin_lsx_vorn_v or __builtin_lasx_xvorn_v, so the interface types have been changed as a bugfix. Thanks! On 2024/10/31 at 11:58 PM

Re: [pushed] [PATCH] LoongArch: Fix clerical errors in lasx_xvreplgr2vr_* and lsx_vreplgr2vr_*.

2024-11-21 Thread Lulu Cheng
Pushed to r15-5581 and r14-10961. On 2024/11/2 at 3:37 PM, Lulu Cheng wrote: [x]vldi.{b/h/w/d} is not implemented in LoongArch; use the macro [x]vrepli.{b/h/w/d} instead. gcc/ChangeLog: * config/loongarch/lasx.md: Fixed. * config/loongarch/lsx.md: Fixed. --- gcc/config/loongarch

Re: [pushed][PATCH v2] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-10 Thread Lulu Cheng
Pushed to r15-6817. On 2025/1/10 at 10:27 AM, mengqinggang wrote: Generate 0x1010 instead of 0x101>>12 for lu12i.w. lu32i.d and lu52i.d are handled the same way. gcc/ChangeLog: * config/loongarch/lasx.md: Use new loongarch_output_move. * config/loongarch/loongarch-protos.h (loongarc
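To make "final immediate" concrete: assuming the LA64 semantics of these instructions (lu12i.w sets bits 31..12 and sign-extends, ori fills bits 11..0, lu32i.d sets bits 51..32 and sign-extends, lu52i.d sets bits 63..52), a 64-bit constant splits into four fields. The sketch extracts each field and simulates the four-instruction sequence to check the round trip. It is not loongarch_output_move; the struct and function names are hypothetical, and GCC may emit fewer instructions for many constants.

```c
#include <stdint.h>

/* Hypothetical split of a 64-bit constant into the immediates used by the
   sequence lu12i.w / ori / lu32i.d / lu52i.d.  */
struct imm_parts {
    uint32_t lu12i_w;  /* bits 31..12, 20-bit field */
    uint32_t ori;      /* bits 11..0, 12-bit field  */
    uint32_t lu32i_d;  /* bits 51..32, 20-bit field */
    uint32_t lu52i_d;  /* bits 63..52, 12-bit field */
};

static struct imm_parts split_imm64(uint64_t v)
{
    struct imm_parts p;
    p.lu12i_w = (uint32_t)(v >> 12) & 0xFFFFFu;
    p.ori     = (uint32_t)v & 0xFFFu;
    p.lu32i_d = (uint32_t)(v >> 32) & 0xFFFFFu;
    p.lu52i_d = (uint32_t)(v >> 52) & 0xFFFu;
    return p;
}

/* Sign-extend the low `bits` bits of x to 64 bits.  */
static uint64_t sext(uint64_t x, int bits)
{
    uint64_t m = 1ULL << (bits - 1);
    x &= (1ULL << bits) - 1;
    return (x ^ m) - m;
}

/* Simulate the four instructions to rebuild the constant.  */
static uint64_t rebuild_imm64(struct imm_parts p)
{
    uint64_t r = sext((uint64_t)p.lu12i_w << 12, 32);              /* lu12i.w */
    r |= p.ori;                                                     /* ori     */
    r = (r & 0xFFFFFFFFULL) | (sext(p.lu32i_d, 20) << 32);          /* lu32i.d */
    r = (r & 0x000FFFFFFFFFFFFFULL) | ((uint64_t)p.lu52i_d << 52);  /* lu52i.d */
    return r;
}
```

For example, split_imm64(0x1010000) yields a lu12i.w field of 0x1010.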

Re: [pushed] [PATCH v1] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-09 Thread Lulu Cheng
Pushed to r15-6755. On 2025/1/6 at 4:16 PM, mengqinggang wrote: Generate 0x1010 instead of 0x101>>12 for lu12i.w. lu32i.d and lu52i.d are handled the same way. gcc/ChangeLog: * config/loongarch/lasx.md: Use new loongarch_output_move. * config/loongarch/loongarch-protos.h (loongarch_

Re: [pushed] [PATCH v1] LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

2025-01-09 Thread Lulu Cheng
On 2025/1/10 at 10:03 AM, Lulu Cheng wrote: Pushed to r15-6755. Sorry, I replied to the wrong email. On 2025/1/6 at 4:16 PM, mengqinggang wrote: Generate 0x1010 instead of 0x101>>12 for lu12i.w. lu32i.d and lu52i.d are handled the same way. gcc/ChangeLog: * config/loongarch/lasx.md: U

Re: [pushed] [PATCH v2] LoongArch: Optimize the cost of vec_construct.

2025-01-09 Thread Lulu Cheng
Pushed to r15-6755. On 2025/1/7 at 9:04 PM, chenxiaolong wrote: When analyzing 525 on the LoongArch architecture, we found that the for loop in the hotspot function x264_pixel_satd_8x4 could not be vectorized with 256-bit vectors because of the vec_construct cost setting. After re-adjusting the vec_construct cost, the performan
