This FAIL was introduced by r14-6908. The reason is that the 128-bit matching
case was not fully considered when the constant vector permutation
implementations were merged. In fact, after the merge, the expansion of
128-bit vectors only supports the value-based 4-element set shuffle, so this time is a
c
We found that the current combine optimization pass in GCC cannot handle
the following redundant sign-extension case:
(insn 77 76 78 5 (set (reg:SI 143)
(plus:SI (subreg/s/u:SI (reg/v:DI 104 [ len ]) 0)
(const_int 1 [0x1]))) {addsi3}
(expr_list:REG_DEAD (reg/v:DI 104
Eliminate the redundant sign extension that exists after the conditional
move when the target register is SImode.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Adjust.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/sign-extend-2.c:
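
For illustration only, here is a hypothetical C function of the general shape that could lead to an SImode add on the low half of a 64-bit value followed by a conditional move. It is not taken from sign-extend-2.c; the name pick_len and its parameters are assumptions made for this sketch.

```c
/* Hypothetical reduction, not the actual testcase: a 64-bit value is
   narrowed to int, incremented in SImode, and then chosen by a
   conditional expression whose result is also an int.  */
int pick_len(long long len, int fallback, int use_len)
{
    int n = (int)len + 1;          /* add on the low 32 bits of len */
    return use_len ? n : fallback; /* candidate for a conditional move */
}
```

Because both arms of the conditional are already SImode values, no further sign extension of the result should be needed on a 64-bit target.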
We found that when only 128-bit vectorization is enabled, 549.fotonik3d_r
fails to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt entries that match the multiply-add pattern to favor
128-bit vectorization.
The experimental results show that after the modification,
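
As a rough illustration of what "multiply-add pattern" means here, the loop below computes one multiply followed by one add per element. It is a generic sketch, not code from 549.fotonik3d_r, and the function name muladd is an assumption.

```c
#include <stddef.h>

/* Generic multiply-add loop: each iteration is d[i] = a[i] * b[i] + c[i],
   which a vectorizer can map onto a fused multiply-add per vector lane.  */
void muladd(double *d, const double *a, const double *b,
            const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = a[i] * b[i] + c[i];
}
```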
We found that in the SPEC2017 521.wrf program, some loop-invariant code generated
from the single-precision floating-point approximate division calculation failed to
be hoisted out of the loop. This is because the pseudo-register that stores the
intermediate temporary calculation results is rewritten in the implemen
We found that a GCC built from the latest sources causes a miscompare error
when running the SPEC2006 400.perlbench test with -flto enabled. Testing
showed that only the LoongArch architecture reports the error.
Using git bisect, the first bad commit was identified as
r14-377
There are currently two implementations of constant vector
permutation: loongarch_expand_vec_perm_const_1 and
loongarch_expand_vec_perm_const_2, and the two differ from each
other. Currently, only the implementation of
loongarch_expand_vec_perm_const_1 is used fo
LoongArch already defines ctz and clz in the backend, but for GCC to
perform the CTZ transformation in the forwprop2 pass, GCC needs to know
the value of c[lt]z at zero. This may benefit some test cases
(such as spec2017 deepsjeng_r).
After implementing the macro, we test dynamic instru
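
To make the "value at zero" point concrete, the sketch below shows a table-based count-trailing-zeros idiom of the kind a CTZ recognition pass could plausibly replace with a single ctz instruction. It is an illustrative assumption, not code from deepsjeng_r; the names debruijn_ctz32 and ctz32_table are made up for this example. Note that for x == 0 the idiom returns table entry 0, so replacing it is only safe if the compiler knows what the hardware ctz yields for zero.

```c
#include <stdint.h>

/* Standard 32-bit De Bruijn lookup table for the constant 0x077CB531.  */
static const unsigned char debruijn_ctz32[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* Index of the lowest set bit of x.  For x == 0 the isolated bit is 0,
   so the lookup yields debruijn_ctz32[0], which is 0.  */
unsigned ctz32_table(uint32_t x)
{
    uint32_t lowest = x & (UINT32_C(0) - x);   /* isolate lowest set bit */
    return debruijn_ctz32[(uint32_t)(lowest * UINT32_C(0x077CB531)) >> 27];
}
```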
LoongArch has instructions for vector popcount but none for scalar
popcount. Currently, the scalar popcount is computed with a loop, and
a value that is not a power of two requires several iterations, so we
consider using the vector popcount instruction as an
optimization.
gcc/C
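
For reference, a loop-based scalar popcount of the kind described above might look like the following. This is a generic sketch (clearing the lowest set bit on each iteration), not the code GCC emits; the name popcount_loop is an assumption.

```c
/* Loop-based popcount: each iteration clears the lowest set bit, so the
   loop runs once per set bit.  A power of two needs one iteration; any
   other non-zero value needs several.  */
unsigned popcount_loop(unsigned long long x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}
```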
For vector constant extract-{even/odd} permutation, replace the default
[x]vshuf instruction combination with the [x]vilv{l/h} instruction, which
reduces the instruction count and improves performance.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_is_odd_extraction):
Supplement
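
As a scalar model of what an extract-even or extract-odd permutation computes (independent of which instruction implements it), consider the sketch below. The function name and parameters are assumptions for illustration.

```c
#include <stddef.h>

/* Scalar model: treat a and b (n elements each) as one 2n-element
   sequence and keep every second element, starting at index 0 (even)
   or index 1 (odd).  The result has n elements.  */
void extract_even_odd(int *out, const int *a, const int *b,
                      size_t n, int odd)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = 2 * i + (odd ? 1 : 0);   /* index into concat(a, b) */
        out[i] = idx < n ? a[idx] : b[idx - n];
    }
}
```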
In the r14-5547 commit, both C[LT]Z_DEFINED_VALUE_AT_ZERO macros were
defined, but CLZ_DEFINED_VALUE_AT_ZERO had already been
defined, so remove the duplicate definition.
gcc/ChangeLog:
* config/loongarch/loongarch.h (CTZ_DEFINED_VALUE_AT_ZERO): Add
description.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_builtin_vectorization_cost):
---
gcc/config/loongarch/loongarch.cc | 21 ++---
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
in
Currently, on LoongArch, a shuffle that selects elements from two vectors
at corresponding positions is implemented with the [x]vshuf instruction,
but this introduces additional index copies. In this case, the
[x]vbitsel.v instruction can be used instead.
gcc/ChangeLog:
* config/loong
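
The idea of a position-wise select can be modelled in scalar C as a bitwise blend. This is a sketch of the general bit-select technique only; it does not model the operand order or encoding of [x]vbitsel.v, and the name blend_bitsel is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise select: where a mask bit is 0 take the bit from a, where it
   is 1 take the bit from b.  With all-zeros or all-ones mask elements
   this picks whole elements from a or b at the same position.  */
void blend_bitsel(uint32_t *out, const uint32_t *a, const uint32_t *b,
                  const uint32_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
}
```

Because output element i only ever comes from position i of one of the inputs, a constant mask is enough and no index vector has to be materialized.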
In LoongArch, when the permutation indices come from different vectors and
are not repeated, for V8SI/V8SF/V4DI/V4DF type vectors, we can use
two xvperm.w + one xvbitsel.v instructions or two xvpermi.d + one
xvbitsel.v instructions for shuffle optimization.
gcc/ChangeLog:
* config/loongar
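
A scalar model of the "two permutes plus one select" decomposition for an 8-element vector is sketched below. It only illustrates the general idea; it does not reproduce the backend's matching conditions or the exact semantics of xvperm.w, xvpermi.d or xvbitsel.v, and all names are assumptions.

```c
enum { NELT = 8 };   /* e.g. V8SI: eight 32-bit elements */

/* idx[i] in [0, 2*NELT) indexes the concatenation of a and b.
   Step 1: permute a and b separately using idx[i] % NELT.
   Step 2: select per position, taking the permuted a when idx[i]
   refers to a and the permuted b otherwise.  */
void shuffle_two_perm_blend(int *out, const int *a, const int *b,
                            const unsigned *idx)
{
    int pa[NELT], pb[NELT];
    for (int i = 0; i < NELT; i++) {
        pa[i] = a[idx[i] % NELT];   /* single-input permute of a */
        pb[i] = b[idx[i] % NELT];   /* single-input permute of b */
    }
    for (int i = 0; i < NELT; i++)
        out[i] = idx[i] < NELT ? pa[i] : pb[i];   /* position-wise select */
}
```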
In LoongArch, the xvshuf.{b/h/w/d} instructions can handle the case in
which all low 128-bit elements of the target vector are shuffled from the
concatenation of the low 128-bit elements of the two input vectors, and
all high 128-bit elements of the target vector are shuffled likewise.
Therefore,
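
The lane-wise behaviour described above can be modelled in scalar C for a vector of eight 32-bit elements as follows. This is a simplified sketch under stated assumptions: the selector encoding and operand order of the real xvshuf instructions are not modelled, and the function name is made up.

```c
/* Model of a per-128-bit-lane shuffle on 8 x 32-bit elements.
   Each lane holds 4 elements.  A selector value s in [0, 8) picks from
   the lane-local concatenation: s < 4 takes element s of a's lane,
   otherwise element s - 4 of b's lane.  Elements never cross lanes.  */
void lane_shuffle_v8(int *out, const int *a, const int *b,
                     const unsigned *sel)
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            unsigned s = sel[lane * 4 + i] & 7;
            const int *src = s < 4 ? a : b;
            out[lane * 4 + i] = src[lane * 4 + (s & 3)];
        }
    }
}
```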