In order to support vectorization of loops with multiple exits, this
patch adds the implementation of the conditional branch optab for
LoongArch LSX/LASX instructions.
This patch causes the gen-vect-{2,25}.c tests to fail. This is because
the support for vectorizing loops with multiple exits has
In order to support vectorization of loops with multiple exits, this
patch adds the implementation of the conditional branch optab for
LoongArch LSX/LASX instructions.
This patch causes the gen-vect-{2,25}.c tests to fail. This is because
the support for vectorizing loops with multiple exits has
We can't vectorize the code into instructions like vslti.w that compare
with an immediate_operand, because immediate_operand support for
integer comparisons is missing.
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_cmp): Remove.
(vec_cmpu): Remove.
* config/loongarch/loongarch.
On 2024/12/17 at 10:58, Xi Ruoyao wrote:
On Tue, 2024-12-17 at 10:41 +0800, Jiahao Xu wrote:
/* snip */
+(define_expand "cbranch4"
+ [(set (pc)
+ (if_then_else
+ (match_operator 0 "equality_operator"
+ [(match_operand:ILASX
We can't vectorize the code into instructions like vslti.w that compare
with an immediate_operand, because immediate_operand support for
integer comparisons is missing.
gcc/ChangeLog:
* config/loongarch/lasx.md: Support immediate_operand.
* config/loongarch/loongarch.cc (loongarch_expan
The hook changes the allocno class to either FP_REGS or GR_REGS depending on
the mode of the register. This results in better register allocation overall,
fewer spills and reduced codesize - particularly in SPEC2017 lbm.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_ir
In order to support vectorization of loops with multiple exits, this
patch adds the implementation of the conditional branch optab for
LoongArch LSX/LASX instructions.
This patch causes the gen-vect-{2,25}.c tests to fail. This is because
the support for vectorizing loops with multiple exits has
Machines that satisfy ISA_HAS_LSX && !TARGET_64BIT are not supported now and
will not be supported in the future, so this patch removes the unused code.
gcc/ChangeLog:
* config/loongarch/lasx.md: Remove unused code.
* config/loongarch/loongarch-protos.h (loongarch_split_lsx_copy_d):
Rem
Machines that satisfy ISA_HAS_LSX && !TARGET_64BIT are not supported now and
will not be supported in the future, so this patch removes the unused code.
This patch also adds sign/zero-extend operations to vpickve2gr.d to match
the actual
instruction behavior, and integrates the template definition of vp
On 2024/1/25 at 3:46 PM, chenglulu wrote:
Jiahao:
Note that the LoongArch 'a' in the title needs to be capitalized.
I modified this patch and incorporated it first.
Thanks, I'll pay attention next time.
On 2024/1/24 at 5:19 PM, Jiahao Xu wrote:
It is incorrect to use vld/vori t
On 2024/1/24 at 5:48 PM, Xi Ruoyao wrote:
On Wed, 2024-01-24 at 17:19 +0800, Jiahao Xu wrote:
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc
It is incorrect to use vld/vori to implement vec_concatz, because when an LSX
instruction is used to update the value of a vector register, the upper 128
bits of the vector register are not zeroed.
gcc/ChangeLog:
* config/loongarch/lasx.md (@vec_concatz): Remove this
define_i
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
diff
-tree-dump-times ch2 "Will duplicate
bb" 2
+FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid
sum" 0
On 2024/1/16 at 10:32 AM, Jiahao Xu wrote:
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the
short-circuit operation in
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the
short-circuit operation instead of the non-short-circuit operation.
The SPEC2017 performance evaluation shows a 1% improvement in the fprate
GEOMEAN and no obvious regression elsewhere. Especially, 526.blender_r +10.
The vec_concatz pattern I implemented in
r14-7022-34d339bbd0c1f5b4ad9587e7ae8387c912cb028b does not support the reg+reg
addressing mode. This patch fixes that.
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_concatz): Fix pattern to
support reg+reg addressing mode.
gcc/t
The pattern below can be treated as a simple move, because floating-point
and vector registers share the same register file on loongarch64.
(set (reg/v:SF 32 $f0 [orig:93 res ] [93])
(vec_select:SF (reg:V8SF 32 $f0 [115])
(parallel [
(const_int 0 [0])
])))
gcc/Cha
This patch implements more vec_init optabs that can handle two LSX vectors
producing a LASX vector by concatenating them. When an LSX vector is
concatenated with an LSX const_vector of zeroes, the vec_concatz pattern can be
used effectively. For example:
typedef short v8hi __attribute__
For zero_extendqisi2 and zero_extendqidi2, use andi instead of bstrpick.w,
because andi is 6 times faster than bstrpick.w.
gcc/ChangeLog:
* config/loongarch/loongarch.md:
(zero_extend2): Rename to ..
(zero_extendhi2): .. this, use hi.
(zero_extendqihi2): Rename to
For the xvpermi.q instruction, the unused bits in operands[3] need to be set to
0 to avoid causing undefined behavior on LA464.
gcc/ChangeLog:
* config/loongarch/lasx.md: Set the unused bits in operand[3] to 0.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-xvpremi.c: Rem
SPECCPU 2017 and SPECCPU 2006 successfully built and tested, and this
patch gives a 1.3% improvement in SPECCPU 2017 fprate on 3A6000, no
performance regression was found. This is an effective optimization and
looks good.
On 2023/12/15 at 4:57 PM, Xi Ruoyao wrote:
We used a branch to load floating-po
When I attempt to enable the vect_usad_char effective target for LoongArch, the
slp-reduc-sad.c and vect-reduc-sad*.c tests fail. These tests fail because the
sad pattern generates bad code. This patch fixes them: for sad patterns, use
zero extension instead of sign extension for the reduction.
Currentl
When I attempt to enable the vect_usad_char effective target for LoongArch, some
tests fail. These tests fail because the sad pattern generates bad code. This
patch fixes them: for sad patterns, use zero extension instead of sign
extension for the reduction.
Currently, we are fixing failed vectorized t
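The difference between zero extension and sign extension in a sum of absolute
differences can be sketched in plain C. This is only an illustration of the
arithmetic, not the code the patch generates; the function names are
hypothetical. An absolute difference of two unsigned bytes can be as large as
255, so widening it as a signed byte turns any value above 127 into a negative
number and corrupts the reduction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute difference of two unsigned bytes; the result is in 0..255. */
static uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Correct reduction: each byte difference is zero-extended before the add. */
static int32_t sad_zero_extend(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)abs_diff_u8(a[i], b[i]);
    return sum;
}

/* Buggy reduction: each byte difference is sign-extended before the add.
   Converting a value above 127 to int8_t is implementation-defined in C;
   GCC reduces it modulo 2^8, so 200 becomes -56. */
static int32_t sad_sign_extend(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)(int8_t)abs_diff_u8(a[i], b[i]);
    return sum;
}
```

With inputs whose differences stay below 128 the two functions agree, which is
why a bug of this kind can go unnoticed until a test uses larger differences.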
The implementation of this patch has some issues. When I compile 521.wrf
with -Ofast -mlasx -flto -muse-movcf2gr, it results in an ICE:
during RTL pass: reload
module_mp_fast_sbm.fppized.f90: In function 'fast_sbm.constprop':
module_mp_fast_sbm.fppized.f90:1369:25: internal compiler error: maxi
On 2023/12/13 at 2:21 PM, Xi Ruoyao wrote:
On Wed, 2023-12-13 at 14:17 +0800, Jiahao Xu wrote:
This test was extracted from the hot functions of 526.blender_r. Setting
LOGICAL_OP_NON_SHORT_CIRCUIT to 0 resulted in a 26% decrease in dynamic
instruction count and a 13.4% performance improvement. After
On 2023/12/13 at 2:27 AM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 20:39 +0800, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 19:59 +0800, Jiahao Xu wrote:
I guess here the problem is floating-point compare instruction is much
more costly than other instructions but the fact is not correctly
modeled yet
On 2023/12/12 at 7:26 PM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 19:14 +0800, Jiahao Xu wrote:
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the
short-circuit operation instead of the non-short-circuit operation.
This gives a 1.8% improvement in SPECCPU 2017 fprate on
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the
short-circuit operation instead of the non-short-circuit operation.
This gives a 1.8% improvement in SPECCPU 2017 fprate on 3A6000.
gcc/ChangeLog:
* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT):
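The two forms that LOGICAL_OP_NON_SHORT_CIRCUIT chooses between can be written
out at the source level. The sketch below is an illustration only, with
hypothetical function names; which form the compiler actually emits depends on
the macro, the operands and the optimization level. For operands without side
effects both forms return the same value, so the choice affects only how many
comparisons and branches are executed.

```c
#include <assert.h>
#include <stdbool.h>

/* Short-circuit form: the second comparison is evaluated only when the
   first one is true, at the cost of an extra conditional branch. */
static bool in_range_short_circuit(double x, double lo, double hi)
{
    return x >= lo && x <= hi;
}

/* Non-short-circuit form: both comparisons are always evaluated and their
   results are combined with a bitwise AND, with no intermediate branch. */
static bool in_range_non_short_circuit(double x, double lo, double hi)
{
    return (x >= lo) & (x <= hi);
}

/* The two forms must agree for side-effect-free operands. */
static void check_equivalent(double x, double lo, double hi)
{
    assert(in_range_short_circuit(x, lo, hi)
           == in_range_non_short_circuit(x, lo, hi));
}
```

When the comparisons are expensive, as suggested for floating-point compares
elsewhere in this thread, skipping the second one can outweigh the cost of the
additional branch.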
On 2023/12/12 at 6:05 PM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 17:50 +0800, Jiahao Xu wrote:
diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c
b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
new file mode 100644
index 000..2cef0193466
--- /dev/null
+++ b/gcc/testsuite
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the
short-circuit operation instead of the non-short-circuit operation.
This gives a 1.8% improvement in SPECCPU 2017 fprate on 3A6000.
gcc/ChangeLog:
* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT):
On 2023/12/6 at 3:04 PM, Jiahao Xu wrote:
LoongArch V1.1 adds support for approximate instructions, which are used
together with additional Newton-Raphson steps to implement single-precision
floating-point division, square root and reciprocal square root operations
for better throughput.
The patches
When both the -mrecip and -mfrecipe options are enabled, use approximate
reciprocal instructions and approximate reciprocal square root instructions
with additional Newton-Raphson steps to implement single-precision
floating-point division, square root and reciprocal square root operations,
for a
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with standard
pattern name. Define function use_rsqrt_p to decide when to use rsqrt optab.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(rsqrt2): .. this.
* config/loongarch/loongarch-builti
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and
rsqrtf. The number of generated instructions is close to or exceeds the maximum
number of instructions that LoongArch can issue per cycle, so vectorized loop
unrolling is not performed on them.
gcc/ChangeLog:
* config/loo
Redefine the patterns for [x]vfrecip instructions to use rtx codes instead of
unspec, and enable [x]vfrecip instructions to be generated during
auto-vectorization.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/loong
This patch adds define_insn/builtins/intrinsics for these instructions, and adds
the option -mfrecipe to control instruction generation.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (fecipe): Add.
* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
(__
patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html
Jiahao Xu (5):
LoongArch: Add support for LoongArch V1.1 approximate instructions.
LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
instructions.
LoongArch: Redefine pattern for xvfrecip/vfrecip instructions
patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html.
Jiahao Xu (5):
LoongArch: Add support for LoongArch V1.1 approximate instructions.
LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
instructions.
LoongArch: Redefine pattern for xvfrecip/vfrecip
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and
rsqrtf. The number of generated instructions is close to or exceeds the maximum
number of instructions that LoongArch can issue per cycle, so vectorized loop
unrolling is not performed on them.
gcc/ChangeLog:
* config/loo
When both the -mrecip and -mfrecipe options are enabled, use approximate
reciprocal instructions and approximate reciprocal square root instructions
with additional Newton-Raphson steps to implement single-precision
floating-point division, square root and reciprocal square root operations,
for a
Redefine the patterns for [x]vfrecip instructions to use rtx codes instead of
unspec, and enable [x]vfrecip instructions to be generated during
auto-vectorization.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/loong
This patch adds define_insn/builtins/intrinsics for these instructions, and adds
the option -mfrecipe to control instruction generation.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (fecipe): Add.
* config/loongarch/larchintrin.h (__frecipe_s): New intrinsic.
(__
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with standard
pattern name. Define function use_rsqrt_p to decide when to use rsqrt optab.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(rsqrt2): .. this.
* config/loongarch/loongarch-builti
loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which GCC
does not support, and this causes an ICE:
ice.c:55:1: error: unrecognizable insn:
55 | }
| ^
(insn 63 62 64 8 (set (reg:V4DI 278)
(subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1
For [x]vshuf instructions, an index value in the selector that exceeds 63
triggers undefined behavior on LA464, but not on LA664. To keep these two tests
compatible with both LA464 and LA664, we have modified them so that the index
value in the selector does not exceed 63.
On 2023/11/29 at 10:33 AM, Xi Ruoyao wrote:
On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:
On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
diff --git a/gcc/config/loongarch/predicates.md
b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53
On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:
On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
diff --git a/gcc/config/loongarch/predicates.md
b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
instructions by implementing '-mrecip' and '-mrecip='.
Jiahao Xu (5):
LoongArch: Add support for approximate instructions.
LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
instructions.
LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
Lo
Redefine the patterns for [x]vfrecip instructions to use rtx codes instead of
unspec, and enable [x]vfrecip instructions to be generated during
auto-vectorization.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to ..
(recip3): .. this.
* config/loongarch/loong
Using -mrecip generates a sequence of instructions to replace divf, sqrtf and
rsqrtf. The number of generated instructions is close to or exceeds the maximum
issue width of LoongArch, so vectorized loop unrolling is not performed on them.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(l
When the -mrecip option is turned on, use approximate reciprocal instructions
and approximate reciprocal square root instructions with additional
Newton-Raphson steps to implement single-precision floating-point division,
square root and reciprocal square root operations for better throughput.
gcc/
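The Newton-Raphson refinement mentioned in these patches can be sketched in
scalar C. This is a minimal illustration of the textbook iteration, not the
instruction sequence GCC emits; the function names are hypothetical, and the
initial estimate is passed in by the caller as a stand-in for the result of a
hardware approximate instruction.

```c
#include <assert.h>

/* One Newton-Raphson step for 1/a, starting from the estimate x0:
   x1 = x0 * (2 - a * x0). */
static float refine_recip(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), starting from the estimate x0:
   x1 = x0 * (1.5 - 0.5 * a * x0 * x0). */
static float refine_rsqrt(float a, float x0)
{
    assert(a > 0.0f);
    return x0 * (1.5f - 0.5f * a * x0 * x0);
}

/* Division built from a refined reciprocal: n / d ~= n * (1/d). */
static float approx_div(float n, float d, float recip_estimate)
{
    return n * refine_recip(d, recip_estimate);
}

/* Square root built from a refined reciprocal square root:
   sqrt(a) = a * (1/sqrt(a)). */
static float approx_sqrt(float a, float rsqrt_estimate)
{
    return a * refine_rsqrt(a, rsqrt_estimate);
}
```

Each step roughly doubles the number of correct bits in the estimate, which is
why a low-precision hardware approximation followed by a small number of steps
can be enough for single precision.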
Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with standard
pattern name.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to ..
(*rsqrt2): .. this.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vfrsqrt_d): Redefine to standard p
LA664 introduces new instructions for reciprocal approximation and reciprocal
square
root approximation. It includes the scalar instructions frecipe and frsqrte, as
well
as their corresponding vector instructions [x]vfrecipe and [x]vfrsqrte. This
patch
adds define_insn/builtins/intrinsics for the
On 2023/11/19 at 2:25 AM, Xi Ruoyao wrote:
On Fri, 2023-11-17 at 10:21 +0800, chenglulu wrote:
Pushed to r14-5545.
On 2023/11/16 at 4:44 PM, Jiahao Xu wrote:
Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alig
This patch adds support for xorsign pattern to scalar fp and vector. With the
new expands, uniformly using vector bitwise logical operations to handle
xorsign.
On LoongArch64, floating-point registers and vector registers share the same
register,
so this patch also allows conversion between LSX
This patch adds support for xorsign pattern to scalar fp and vector. With the
new expands, uniformly using vector bitwise logical operations to handle
xorsign.
On LoongArch64, floating-point registers and vector registers share the same
register,
so this patch also allows conversion between LSX
This patch adds support for xorsign pattern to scalar fp and vector. With the
new expands, uniformly using vector bitwise logical operations to handle
xorsign.
On LoongArch64, floating-point registers and vector registers share the same
register,
so this patch also allows conversion between LSX
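The bitwise view of xorsign that makes it a good fit for vector logical
operations can be sketched in scalar C. This is an illustration only, with a
hypothetical function name, and it assumes float is a 32-bit IEEE 754 value:
xorsign(x, y) flips the sign of x when y is negative, which is an AND to
isolate the sign bit of y followed by an XOR into x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* xorsign(x, y) == x * copysign(1.0, y), computed with bitwise operations.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
static float xorsign_f32(float x, float y)
{
    assert(sizeof(float) == sizeof(uint32_t));

    uint32_t xbits, ybits;
    memcpy(&xbits, &x, sizeof xbits);   /* reinterpret without aliasing issues */
    memcpy(&ybits, &y, sizeof ybits);

    /* Keep only the sign bit of y, then XOR it into x. */
    xbits ^= ybits & UINT32_C(0x80000000);

    float result;
    memcpy(&result, &xbits, sizeof result);
    return result;
}
```

Because the operation is a pure AND plus XOR on the bit pattern, the same
sequence works lane by lane on a vector register.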
These tests have failed since they were first added; this patch adjusts the
scan-assembler-times counts to fix them.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler
times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/
anged after they were added?)
On Thu, 2023-11-16 at 20:08 +0800, Jiahao Xu wrote:
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler
times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/ls
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler
times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Di
Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alignment peeling.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.
diff --git a/gcc/config/loonga
Based on SPEC2017 performance evaluation results, it's better to make them equal
to the cost of unaligned store/load so as to avoid odd alignment peeling.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.
diff --git a/gcc/config/loonga
Based on SPEC2017 performance evaluation results, making them equal to the
cost of unaligned store/load to avoid odd alignment peeling is better.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.
diff --git a/gcc/config/loongarch/loong
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md
(vcond_mask_): Change to
(vcond_mask_): this.
* config/loongarch/lsx.md
(vcond_mask_): Change to
Add support for vec_widen lo/hi patterns. These do not directly
match LoongArch LASX instructions but can be emulated with
even/odd + vector merge.
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_widen_add_hi_,
vec_widen_add_lo_,
vec_widen_sub_hi_, vec_widen_sub_lo_,
ve
gcc/ChangeLog:
* config/loongarch/lasx.md (avg3_floor, uavg3_floor,
avg3_ceil, uavg3_ceil, ssadv16qi, usadv16qi): New patterns.
* config/loongarch/lsx.md (avg3_floor, uavg3_floor,
avg3_ceil, uavg3_ceil, ssadv16qi, usadv16qi): New patterns.
gcc/testsuite/ChangeLog:
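The arithmetic behind the floor and ceiling averaging patterns can be sketched
for a single unsigned byte lane. This is an illustration with hypothetical
function names, not the vector code the patterns expand to: the sum is formed
in a wider type so that it cannot overflow, then shifted right by one, with 1
added first for the ceiling variant.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average: (a + b) >> 1, with the sum computed in a wider type. */
static uint8_t uavg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* Ceiling average: (a + b + 1) >> 1, with the sum computed in a wider type. */
static uint8_t uavg_ceil_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* The two variants differ by exactly 1 when a + b is odd, and are equal
   otherwise; the parity of a + b is the low bit of a ^ b. */
static void check_avg_relation(uint8_t a, uint8_t b)
{
    assert(uavg_ceil_u8(a, b) - uavg_floor_u8(a, b) == ((a ^ b) & 1));
}
```

Doing the addition directly in 8 bits would wrap for inputs such as 255 and
254, which is why the widening step matters.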
This patch makes LoongArch use the new vector hooks and implements the costing
function determine_suggested_unroll_factor, so that it can suggest the
unroll factor for a given loop being vectorized, based on vec_ops analysis
during vector costing and the available issue information. Referring to a
determine_suggested_unroll_factor, so that it can suggest the
unroll factor for a given loop being vectorized, based on vec_ops analysis
during vector costing and the available issue information. The patch also
adjusts the cost model based on performance analysis.
Jiahao Xu (3):
LoongArch:Implement
71 matches