On 2023/12/29 at 12:11 AM, Xi Ruoyao wrote:
The problem with peephole2 is it uses a naive sliding-window algorithm
and misses many cases. For example:
float a[1];
float t() { return a[0] + a[8000]; }
is compiled to:
la.local $r13,a
la.local $r12,a+32768
fld.s
On 2023-12-25 16:45 Kito Cheng wrote:
>+++ b/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
>@@ -0,0 +1,29 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -fno-schedule-insns
>-fno-schedule-insns2" } */
>+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-ob
In the LoongArch architecture, GCC supports the vectorization function tested
by vect/slp-26.c, but there is no detection of loongarch in dg-finals. Add
loongarch to the appropriate dg-finals.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-26.c: Add loongarch.
---
gcc/testsuite/gcc.dg/vect/
In the GCC code of LoongArch architecture, IFN_STORE_LANES optimization
operation is not supported, and four SLP statements are used for vectorization
in slp-21.c. So add loongarch*-*-* to the corresponding dg-finals.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-21.c: Add loongarch.
---
gc
chenxiaolong writes:
> In order to improve and check vectorization support on the
> LoongArch architecture, tests on the vector instruction set are provided
> in target-supports.exp.
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp: Add LoongArch to the list of supported
>
Thanks Jeff.
I think I have located where aarch64 performs the trick here.
1. In the .final we have rtl like
(insn:TI 6 8 29 (set (reg:SF 32 v0)
(const_double:SF -0.0 [-0x0.0p+0]))
"/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79
{*movsf_aarch64}
(nil))
2. t
This fixes the gcc.dg/tree-ssa/gen-vect-26.c testcase by adding
`#pragma GCC novector` in front of the loop that is doing the checking
of the result. We only want to test the first loop to see if it can be
vectorized.
Committed as obvious after testing on x86_64-linux-gnu with -m32.
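A minimal sketch of the idea, not the actual gen-vect-26.c source: the
function names and the array size N below are hypothetical. The pragma is
placed directly in front of the checking loop so that only the first loop
is a vectorization candidate.

```c
#define N 16

/* First loop: the one a vectorizer test would want to see vectorized. */
static void fill(int *a)
{
  for (int i = 0; i < N; i++)
    a[i] = i * 2;
}

/* Checking loop: the pragma asks GCC not to vectorize this loop, so a
   dump scan would count only the loop in fill().  Compilers that do not
   know the pragma ignore it (possibly with a warning).  */
static int check(const int *a)
{
#pragma GCC novector
  for (int i = 0; i < N; i++)
    if (a[i] != i * 2)
      return 0;
  return 1;
}
```

Calling fill() on an array of N ints and then check() on it returns 1;
corrupting any element makes check() return 0.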
gcc/testsuite/
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* con
This patch is to handle the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that derive directly from current
RVV1.0 by simply adding the "th." prefix. For different name xtheadvector
instructions but share sa
This patch adds th. prefix to all XTheadVector instructions by
implementing new assembly output functions. We only check the
prefix is 'v', so that no extra attribute is needed.
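The prefix check described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: th_prefix is a
hypothetical helper, not the riscv_asm_output_opcode function from the
patch, and it works on a plain C string rather than on GCC's output
stream.

```c
#include <stdio.h>

/* Hypothetical helper: copies the mnemonic into out, adding "th." in
   front when the mnemonic starts with 'v'.  Returns out.  */
static const char *th_prefix(const char *opcode, char *out, size_t outsz)
{
  if (opcode[0] == 'v')
    snprintf(out, outsz, "th.%s", opcode);   /* vector insn: add prefix */
  else
    snprintf(out, outsz, "%s", opcode);      /* scalar insn: unchanged */
  return out;
}
```

Because the only test is on the first character, no per-pattern attribute
is consulted in this sketch, which matches the intent stated in the
message.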
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_asm_output_opcode):
New function to add assembler
This patch is to introduce basic XTheadVector support
(march string parsing and a test for __riscv_xtheadvector)
according to https://github.com/T-head-Semi/thead-extension-spec/
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse): Add new vendor extens
This patch uses vector_length_operand instead of csr_operand for
vsetvl patterns, so that changes for vector will not affect scalar
patterns using csr_operand in riscv.md.
gcc/ChangeLog:
* config/riscv/vector.md:
Use vector_length_operand for vsetvl patterns.
Co-authored-by: Jin M
This patch moves the definition of the enums lst_type and
frm_op_type into riscv-vector-builtins-bases.h and removes
the static visibility of fold_fault_load(), so these
can be used in other compile units.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (enum lst_type):
This patch series presents gcc implementation of the XTheadVector
extension [1].
[1] https://github.com/T-head-Semi/thead-extension-spec/
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in order not to
generate instructions that xtheadvector does not
After the detection of maximum reduction is enabled on LoongArch architecture,
the regression test of GCC finds that vect-fmin-3.c fails. Currently, in the
target-supports.exp file, only aarch64, arm, riscv, and LoongArch architectures
are supported. Through analysis, the "-ffast-math" compilation op
Hi Juzhe,
These vsetvl patterns were written by you with csr_operand initially.
Are you sure it can be replaced by vector_length_operand?
Joshua
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:25
To: "cooper.joshua";
"gc
On Fri, Dec 29, 2023 at 00:54, Roger Sayle wrote:
>
>
>
> The current (default) behavior is that when the target doesn’t define
>
> TARGET_INSN_COST the middle-end uses the backend’s
>
> TARGET_RTX_COSTS, so multiplications are slower than additions,
>
> but about the same size when optimizing for size (with -O
We do not have vector_length_operand in vsetvl patterns.
(define_insn "@vsetvl"
[(set (match_operand:P 0 "register_operand" "=r")
(unspec:P [(match_operand:P 1 "vector_csr_operand" "rK")
(match_operand 2 "const_int_operand" "i")
(match_operand 3 "con
Hi Juzhe,
For vector_csr_operand, please refer to
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641124.html.
Joshua
--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:14
To: "cooper.joshua";
"gcc-patches"
Cc: Jim W
When gcc enabled the vectorization of the common layer, some FAIL items
appeared in GCC regression tests, such as gcc.dg/fma-{3,4,6,7}.c. On LoongArch
architecture, for example, the result of fmsub.s instruction is a*b-c, and
there is a problem of positive and negative zero inequality between the r
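As a hedged illustration of how the sign of zero can differ between two
algebraically equal forms (the message above is truncated, so this is not
necessarily the exact case the tests hit), consider c - a*b versus
-(a*b - c) when a*b equals c. The function names are hypothetical, and
default round-to-nearest without -ffast-math is assumed.

```c
#include <math.h>

/* c - a*b: when a*b == c, the result is +0.0 in round-to-nearest. */
static double sub_form(double a, double b, double c)
{
  return c - a * b;
}

/* -(a*b - c): a*b - c is +0.0 when a*b == c, and negating it gives -0.0. */
static double neg_form(double a, double b, double c)
{
  return -(a * b - c);
}

/* Returns 1 when the sign bit of x is set (negative value or -0.0). */
static int is_negative(double x)
{
  return signbit(x) != 0;
}
```

With a = 2, b = 3, c = 6 the two results compare equal with ==, yet only
the second has its sign bit set, which is the kind of difference a
bit-exact or pattern-based test can trip over.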
In the GCC regression test result, it is found that the
bind_c_array_params_2.f90 test fails. After analysis, it is found that the
reason why the test fails is that the regular expression in the test result
cannot correctly detect the correct assembly code (such as bl %plt(myBindC))
generated on th
When gcc enables the file test under gcc.dg/vect, it is found that vect-{82,83}.c
does not support the test. Through analysis, LoongArch architecture
supports the detection function of this test case. Therefore, the detection
of LoongArch architecture is added to the test rules to solve the situat
After the detection procedure under the gcc.dg/vect directory was added to
GCC, FAIL entries of vector multiplication transformations of different types
appeared in the gcc regression test results. After debugging analysis, the main
problem is that the 128-bit vector of LoongArch architecture does
When GCC is able to detect vectorized test cases in the common layer, FAIL
entries appear in some test cases after regression testing. The cause of the
error is that the vectorization option was not set when testing the program,
and the vectorization code could not be generated, so additional suppo
When using binutils that does not support vectorization and gcc compiler
toolchain that supports vectorization, regression tests found that pr60510.f
had a FAIL entry. The reason is that the default setting of the program is
the execution state, which will cause problems in the assembly stage when
When the toolchain is built using binutils that does not support vectorization
and gcc that supports vectorization, the regression test results of GCC show
that the vect-bic-bitmask-{12,23}.c file fails. The reason is that it carries
out two stages of compilation and assembly test, in the assembly
In order to improve and check vectorization support on the
LoongArch architecture, tests on the vector instruction set are provided
in target-supports.exp.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add LoongArch to the list of supported
targets.
---
gcc/testsuite/li
Hi Juzhe,
This patch "RISC-V: Handle differences between XTheadvector and
Vector" is addressing some code generation issues for RVV1.0
instructions that xtheadvector does not have, not with intrinsics.
BTW, what about the following patch " RISC-V: Add support for
xtheadvector-specific intrinsics"?
When using binutils, which does not support vectorization, and the gcc compiler
toolchain, which does support vectorization, the following two types of error
problems occur in gcc regression testing.
1. Failure of common tests in the gcc.dg/vect directory:
Regression testing of GCC has found tha
In general, I agree with this change.
With gcc12 on RV64, more than one `sext.w` is produced with our test.
(Note, use -O1).
>
> There are two things that help here. The first is that the most significant
> bit never appears in the middle of a field, so we don't have to worry about
> overlap
On Fri, Dec 29, 2023 at 02:23, Jeff Law wrote:
>
>
>
> On 12/28/23 07:59, Roger Sayle wrote:
> >
> > This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
> > that fields are written into a pseudo register that needs to be kept sign
> > extended.
> Well, I think "fixes" is a bit of a stretch.
This patch only involves the generation of xtheadvector
special load/store instructions and vext instructions.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* con
This patch is to handle the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that derive directly from current
RVV1.0 by simply adding the "th." prefix. For different name xtheadvector
instructions but share sa
After implementing the cost model on the LoongArch architecture, the GCC
compiler code has this feature turned on by default, which causes the
lasx-xvstelm.c file test to fail. Through analysis, this test case can
generate vectorization instructions required for detection only after
disabling the f
Hi Jeff,
Perhaps fold_fault_load cannot be moved to riscv-protos.h since
gimple_folder is declared in riscv-vector-builtins.h. It's not reasonable
to include riscv-vector-builtins.h in riscv-protos.h.
In fact, fold_fault_load is defined specially for some builtin functions, and
it would be bette
The redundant dump check is fragile and easily broken by changes; it is not necessary.
Tested on both RV32/RV64 with no regression.
Removed it and committed.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant checks.
---
gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr
This patch fixes the following case of choosing an unexpectedly big LMUL, which causes
register spilling.
Before this patch, choosing LMUL = 4:
addi sp,sp,-160
addiw t1,a2,-1
li a5,7
bleu t1,a5,.L16
vsetivli zero,8,e64,m4,ta,ma
vmv.v.x v4,a0
On 12/28/23 17:42, Li, Pan2 wrote:
Thanks Jeff for comments, and Happy new year!
Interesting. So I'd actually peel one more layer off this onion. Why
do the aarch64 and riscv targets generate different constants (0.0 vs
-0.0)?
Yeah, it surprised me too when debugging the foo function. But
Thanks Jeff for comments, and Happy new year!
> Interesting. So I'd actually peel one more layer off this onion. Why
> do the aarch64 and riscv targets generate different constants (0.0 vs
> -0.0)?
Yeah, it surprised me too when debugging the foo function. But didn't dig into
it in previous a
Hi Rimvydas!
On 28.12.23 at 08:09, Rimvydas Jasinskas wrote:
On Wed, Dec 27, 2023 at 10:34 PM Harald Anlauf wrote:
The patch is almost fine, except for a strange wording here:
+@smallexample
+gfortran -save-temps -c foo.F90
+@end smallexample
+
+preprocesses to in @file{foo.fii}, compiles to
Hi Jeff,
Thanks for the speedy review.
> On 12/28/23 07:59, Roger Sayle wrote:
> > This patch fixes PR rtl-optimization/104914 by tweaking/improving the
> > way that fields are written into a pseudo register that needs to be
> > kept sign extended.
> Well, I think "fixes" is a bit of a stretch.
On 12/24/23 05:24, Roger Sayle wrote:
What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't
actually a truncation! The output precision is first, the input
precision is second. The docs explicitly state the output precision
should be smaller than the input precision (which makes sen
On 12/28/23 07:59, Roger Sayle wrote:
This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.
Well, I think "fixes" is a bit of a stretch. We're avoiding the issue
by changing the early RTL
On 12/24/23 01:11, YunQiang Su wrote:
Yes. I also guess so. Any new idea?
Well, I see multiple intertwined issues and I think MIPS has largely
mucked this up.
At a high level DI -> SI truncation is not a nop on MIPS64. We must
explicitly sign extend the value from SI->DI to preserve the in
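A small sketch of what "not a nop" means here, using plain C integers
rather than RTL: the helper names are hypothetical, and the narrowing
casts rely on GCC's documented modulo behaviour for out-of-range
conversions to signed types.

```c
#include <stdint.h>

/* Model of a DI -> SI truncation that keeps the 64-bit register in
   canonical form: take the low 32 bits, then sign-extend them back.  */
static int64_t trunc_di_to_si(int64_t x)
{
  return (int64_t)(int32_t)(uint32_t)x;
}

/* Returns 1 when the 64-bit value is already a sign-extended 32-bit one,
   i.e. when simply reinterpreting it as SImode would be safe.  */
static int is_sign_extended_si(int64_t x)
{
  return x == (int64_t)(int32_t)x;
}
```

For a value such as 0x180000000 the low 32 bits are 0x80000000, so the
canonical result is INT32_MIN and the upper bits change; treating the
original register as if it already held a valid 32-bit value would be
wrong in this model.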
The current (default) behavior is that when the target doesn't define
TARGET_INSN_COST the middle-end uses the backend's
TARGET_RTX_COSTS, so multiplications are slower than additions,
but about the same size when optimizing for size (with -Os or -Oz).
All of this gets disabled with your
On 12/26/23 02:34, pan2...@intel.com wrote:
From: Pan Li
This patch XFAILs the test case pr30957-1.c for RVV when
building the ELF with some configurations (listed at the end of the log).
It will be vectorized during vect_transform_loop with a variable factor.
It won't benefit fro
On 12/26/23 19:38, Juzhe-Zhong wrote:
Notice we have this following situation:
vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma ---> This is redundant since
VLMAX AVL = 4 when it
The problem with peephole2 is it uses a naive sliding-window algorithm
and misses many cases. For example:
float a[1];
float t() { return a[0] + a[8000]; }
is compiled to:
la.local $r13,a
la.local $r12,a+32768
fld.s $f1,$r13,0
fld.s $f0,$r12,-768
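The numbers in the assembly above can be reproduced with a short worked
example. a[8000] is at byte offset 8000 * 4 = 32000, which is split into
a base adjustment of 32768 and a displacement of -768. The sketch below
assumes a signed 12-bit displacement field and a 4096-byte granule for
the base part; split_offset and split_off are hypothetical names, not
GCC functions.

```c
#include <stdint.h>

typedef struct {
  int64_t hi;   /* part folded into the base address (multiple of 4096) */
  int64_t lo;   /* remaining displacement, in [-2048, 2047] */
} split_off;

/* Split a byte offset so that hi + lo == off and lo fits in a signed
   12-bit immediate.  */
static split_off split_offset(int64_t off)
{
  split_off s;
  s.lo = ((off & 0xfff) ^ 0x800) - 0x800;  /* sign-extend the low 12 bits */
  s.hi = off - s.lo;
  return s;
}
```

For off = 32000 the low 12 bits are 0xD00 (3328), which sign-extends to
-768, leaving hi = 32768. Both loads in the example are therefore
relative to the same symbol, which is why two separate la.local
instructions look like a missed combination.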
On 12/23/23 16:37, Roger Sayle wrote:
One of the cool features of the H8 backend is its use of tables to select
optimal shift implementations for different CPU variants. This patch
borrows (plagiarizes) that idiom for SImode left shifts in the ARC backend
(for CPUs without a barrel-shifter).
On 12/25/23 01:45, Kito Cheng wrote:
The `interrupt` function backs up the fcsr register, but the save is fixed to SImode.
That is not a big issue since fcsr only uses 8 bits so far; however, the
offset should still use UNITS_PER_WORD to prevent the stack offset
from becoming non-8-byte aligned, it will cause problem
On 12/26/23 19:47, Kito Cheng wrote:
Thanks Feng, the patch LGTM from my side. I am happy to accept
vector crypto stuff for GCC 14; it's mostly intrinsic stuff, and the
few non-intrinsic parts are also low risk enough (e.g. vrol, vctz)
I won't object. I'm disappointed that we're in a sim
On 12/26/23 19:49, joshua wrote:
Hi Jeff,
Yes, I will change something in vector_csr_operand in the following
patches.
Constraints will be added so that the AVL cannot be encoded as an
immediate for xtheadvector vsetvl.
Ah. Thanks. Makes sense.
jeff
The truly_noop_truncation target hook is documented, in target.def, as
"true if it is safe to convert a value of inprec bits to one of outprec
bits (where outprec is smaller than inprec) by merely operating on it
as if it had only outprec bits", i.e. the middle-end can use a SUBREG
instead of a TR
This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.
The motivating example from the bugzilla PR is:
extern void ext(int);
void foo(const unsigned char *buf) {
int val;
((unsigned char*)&v
The MIPS backend already has some information about each INSN, including length,
count etc.
And since some instructions are more costly, let's add a new
attr `perf_ratio`. Its default value is (const_int 1).
The return value of mips_insn_cost is
insn_count * perf_ratio * 4.
The magic `4` here, is due to that
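The formula quoted above can be written out as a tiny sketch. The names
COST_SCALE and sketch_insn_cost are hypothetical and this is not the
mips_insn_cost implementation. The explanation of the factor 4 is cut off
in the message; it may be related to GCC's COSTS_N_INSNS macro, which
also scales by 4, but that reading is an assumption.

```c
/* Scaling factor taken from the formula in the message. */
enum { COST_SCALE = 4 };

/* Cost of an insn under the quoted formula:
   insn_count * perf_ratio * 4.  A perf_ratio of 1 is the default
   mentioned in the message.  */
static int sketch_insn_cost(int insn_count, int perf_ratio)
{
  return insn_count * perf_ratio * COST_SCALE;
}
```

Under this formula a single instruction with the default ratio costs 4,
and a two-instruction sequence with a ratio of 3 costs 24.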
With new glibc one more loop can be vectorized via simd exp in libmvec.
Found by the Linaro TCWG CI.
gcc/testsuite/ChangeLog:
* gfortran/vect/vect-8.f90: Accept more vectorized loops.
---
gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
I also have the same doubts about vector instructions.😂
Sorry, I can't prove it, so I used simplify_gen_subreg instead to make sure
there won't be problems (I submitted the v2 version); my oversight.
> -Original Message-
> From: "Xi Ruoyao"
> Sent: 2023-12-28 18:55:01 (Thursday)
> To: "Li Wei" , gcc-patche
There are currently two versions of the implementations of constant
vector permutation: loongarch_expand_vec_perm_const_1 and
loongarch_expand_vec_perm_const_2. The implementations of the two
versions are different. Currently, only the implementation of
loongarch_expand_vec_perm_const_1 is used fo
Move ix86_expand_unary_operator from i386.cc to i386-expand.cc, re-arrange
prototypes and do some cosmetic changes with the usage of TARGET_APX_NDD.
No functional changes.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_unary_operator_ok): Move from here...
* config/i386/i386-expand.cc (ix86_
On Thu, 2023-12-28 at 14:59 +0800, Li Wei wrote:
> There are currently two versions of the implementations of constant
> vector permutation: loongarch_expand_vec_perm_const_1 and
> loongarch_expand_vec_perm_const_2. The implementations of the two
> versions are different. Currently, only the imple
On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle wrote:
>
>
> This patch resolves the failure of pr43644-2.c in the testsuite, a code
> quality test I added back in July, that started failing as the code GCC
> generates for 128-bit values (and their parameter passing) has been in
> flux. After a few
There are currently two versions of the implementations of constant
vector permutation: loongarch_expand_vec_perm_const_1 and
loongarch_expand_vec_perm_const_2. The implementations of the two
versions are different. Currently, only the implementation of
loongarch_expand_vec_perm_const_1 is used fo