[PATCH v1] LoongArch: Merge constant vector permutation implementations.
There are currently two implementations of constant vector permutation:
loongarch_expand_vec_perm_const_1 and loongarch_expand_vec_perm_const_2.
The two implementations differ, and currently only
loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We hope
to streamline the code as much as possible while retaining the
better-performing of the two implementations.  By repeatedly testing
spec2006 and spec2017, we arrived at the merged version below.  Compared
with the pre-merge version, the number of lines of code in loongarch.cc
has been reduced by 888 lines.  At the same time, the performance of
SPECint2006 under Ofast has been improved by 0.97%, and the performance
of SPEC2017 fprate has been improved by 0.27%.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_is_odd_extraction):
	Remove useless forward declaration.
	(loongarch_is_even_extraction): Remove useless forward declaration.
	(loongarch_try_expand_lsx_vshuf_const): Removed.
	(loongarch_expand_vec_perm_const_1): Merged.
	(loongarch_is_double_duplicate): Removed.
	(loongarch_is_center_extraction): Ditto.
	(loongarch_is_reversing_permutation): Ditto.
	(loongarch_is_di_misalign_extract): Ditto.
	(loongarch_is_si_misalign_extract): Ditto.
	(loongarch_is_lasx_lowpart_extract): Ditto.
	(loongarch_is_op_reverse_perm): Ditto.
	(loongarch_is_single_op_perm): Ditto.
	(loongarch_is_divisible_perm): Ditto.
	(loongarch_is_triple_stride_extract): Ditto.
	(loongarch_expand_vec_perm_const_2): Merged.
	(loongarch_expand_vec_perm_const): New.
	(loongarch_vectorize_vec_perm_const): Adjust.
---
 gcc/config/loongarch/loongarch.cc | 1302 +
 1 file changed, 207 insertions(+), 1095 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 1d4d8f0b256..12408042d48 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8769,143 +8769,6 @@ loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
     }
 }
 
-static bool
-loongarch_is_odd_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_is_even_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
-{
-  int i;
-  rtx target, op0, op1, sel, tmp;
-  rtx rperm[MAX_VECT_LEN];
-
-  if (d->vmode == E_V2DImode || d->vmode == E_V2DFmode
-      || d->vmode == E_V4SImode || d->vmode == E_V4SFmode
-      || d->vmode == E_V8HImode || d->vmode == E_V16QImode)
-    {
-      target = d->target;
-      op0 = d->op0;
-      op1 = d->one_vector_p ? d->op0 : d->op1;
-
-      if (GET_MODE (op0) != GET_MODE (op1)
-	  || GET_MODE (op0) != GET_MODE (target))
-	return false;
-
-      if (d->testing_p)
-	return true;
-
-      /* If match extract-even and extract-odd permutations pattern, use
-       * vselect much better than vshuf.  */
-      if (loongarch_is_odd_extraction (d)
-	  || loongarch_is_even_extraction (d))
-	{
-	  if (loongarch_expand_vselect_vconcat (d->target, d->op0, d->op1,
-						d->perm, d->nelt))
-	    return true;
-
-	  unsigned char perm2[MAX_VECT_LEN];
-	  for (i = 0; i < d->nelt; ++i)
-	    perm2[i] = (d->perm[i] + d->nelt) & (2 * d->nelt - 1);
-
-	  if (loongarch_expand_vselect_vconcat (d->target, d->op1, d->op0,
-						perm2, d->nelt))
-	    return true;
-	}
-
-      for (i = 0; i < d->nelt; i += 1)
-	{
-	  rperm[i] = GEN_INT (d->perm[i]);
-	}
-
-      if (d->vmode == E_V2DFmode)
-	{
-	  sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
-	  tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
-	  emit_move_insn (tmp, sel);
-	}
-      else if (d->vmode == E_V4SFmode)
-	{
-	  sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
-	  tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
-	  emit_move_insn (tmp, sel);
-	}
-      else
-	{
-	  sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
-	  emit_move_insn (d->target, sel);
-	}
-
-      switch (d->vmode)
-	{
-	case E_V2DFmode:
-	  emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0));
-	  break;
-	case E_V2DImode:
-	  emit_insn (gen_lsx_vshuf_d (target, target, op1, op0));
-	  break;
-	case E_V4SFmode:
-	  emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0));
-	  break;
-	case E_V4SImode:
-	  emit_insn (gen_lsx_vshuf_w (target, target, op1, op0));
-	  break;
-	case E_V8HImode:
-	  emit_insn (gen_lsx_vshuf_h (target, target, op1, op0));
-
Re: [x86_PATCH] peephole2 to resolve failure of gcc.target/i386/pr43644-2.c
On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle wrote:
>
> This patch resolves the failure of pr43644-2.c in the testsuite, a code
> quality test I added back in July, that started failing as the code GCC
> generates for 128-bit values (and their parameter passing) has been in
> flux.  After a few attempts at tweaking pattern constraints in the hope
> of convincing reload to produce a more aggressive (but potentially
> unsafe) register allocation, I think the best solution is to use a
> peephole2 to catch/clean-up this specific case.
>
> Specifically, the function:
>
> unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
>   return x+y;
> }
>
> currently generates:
>
> foo:	movq	%rdx, %rcx
> 	movq	%rdi, %rax
> 	movq	%rsi, %rdx
> 	addq	%rcx, %rax
> 	adcq	$0, %rdx
> 	ret
>
> and with this patch/peephole2 now generates:
>
> foo:	movq	%rdx, %rax
> 	movq	%rsi, %rdx
> 	addq	%rdi, %rax
> 	adcq	$0, %rdx
> 	ret
>
> which I believe is optimal.

How about simply moving the assignment to the MSB in the split pattern
after the LSB calculation:

   [(set (match_dup 0) (match_dup 4))
-   (set (match_dup 5) (match_dup 2))
    (parallel [(set (reg:CCC FLAGS_REG)
		    (compare:CCC
		      (plus:DWIH (match_dup 0) (match_dup 1))
		      (match_dup 0)))
	       (set (match_dup 0)
		    (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (set (match_dup 5) (match_dup 2))
    (parallel [(set (match_dup 5)
		    (plus:DWIH
		      (plus:DWIH

There is an earlyclobber on the output operand, so we are sure that
assignments to (op 0) and (op 5) won't clobber anything.  The
cprop_hardreg pass will then do the cleanup for us, resulting in:

foo:	movq	%rdi, %rax
	addq	%rdx, %rax
	movq	%rsi, %rdx
	adcq	$0, %rdx

Uros.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-12-21  Roger Sayle
>
> gcc/ChangeLog
>     PR target/43644
>     * config/i386/i386.md (define_peephole2): Tweak register allocation
>     of *add3_doubleword_concat_zext.
>
> gcc/testsuite/ChangeLog
>     PR target/43644
>     * gcc.target/i386/pr43644-2.c: Expect 2 movq instructions.
>
>
> Thanks in advance, and for your patience with this testsuite noise.
> Roger
> --
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 4c6368bf3b7..9f97d407975 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6411,13 +6411,13 @@ (define_insn_and_split "*add3_doubleword_concat_zext"
>    "#"
>    "&& reload_completed"
>    [(set (match_dup 0) (match_dup 4))
> -   (set (match_dup 5) (match_dup 2))
>     (parallel [(set (reg:CCC FLAGS_REG)
> 		    (compare:CCC
> 		      (plus:DWIH (match_dup 0) (match_dup 1))
> 		      (match_dup 0)))
> 	       (set (match_dup 0)
> 		    (plus:DWIH (match_dup 0) (match_dup 1)))])
> +   (set (match_dup 5) (match_dup 2))
>     (parallel [(set (match_dup 5)
> 		    (plus:DWIH
> 		      (plus:DWIH
Re: [PATCH v1] LoongArch: Merge constant vector permutation implementations.
On Thu, 2023-12-28 at 14:59 +0800, Li Wei wrote:
> There are currently two versions of the implementations of constant
> vector permutation: loongarch_expand_vec_perm_const_1 and
> loongarch_expand_vec_perm_const_2.  The implementations of the two
> versions are different.  Currently, only the implementation of
> loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
> hope to streamline the code as much as possible while retaining the
> better-performing implementation of the two.  By repeatedly testing
> spec2006 and spec2017, we got the following merged version.
> Compared with the pre-merger version, the number of lines of code
> in loongarch.cc has been reduced by 888 lines.  At the same time,
> the performance of SPECint2006 under Ofast has been improved by 0.97%,
> and the performance of SPEC2017 fprate has been improved by 0.27%.

/* snip */

> - * 3. What LASX permutation instruction does:
> - * In short, it just executes two independent 128-bit vector permutations,
> - * and that's the reason we need to do the jobs below.  We will explain it.
> - * op0, op1, target, and selector will be separated into high 128 bits and
> - * low 128 bits, and permutation is done as described below:
> - *
> - * a) op0's low 128 bits and op1's low 128 bits "combine" into a 256-bit
> - * temp vector storage (TVS1); elements are indexed as below:
> - *      0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
> - *    |--------------------|--------------------|  TVS1
> - *      op0's low 128 bits    op1's low 128 bits
> - * op0's high 128 bits and op1's high 128 bits are "combined" into TVS2
> - * in the same way.
> - *      0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
> - *    |--------------------|--------------------|  TVS2
> - *      op0's high 128 bits   op1's high 128 bits
> - * b) The selector's low 128 bits describe which elements from TVS1 will
> - * fit into the target vector's low 128 bits.  No TVS2 elements are allowed.
> - * c) The selector's high 128 bits describe which elements from TVS2 will
> - * fit into the target vector's high 128 bits.  No TVS1 elements are allowed.

Just curious: why did the hardware engineers create such a bizarre
instruction? :)

/* snip */

> +      rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
> +      rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);

Can we prove d->op0, d->op1, and d->target are never SUBREGs?  Otherwise
I'd use lowpart_subreg (E_V4DImode, d->xxx, d->vmode) here to avoid
creating a nested SUBREG (a nested SUBREG will cause an ICE, and it has
happened several times before).

/* snip */

> +      switch (d->vmode)
> 	{
> -	  remapped[i] = d->perm[i];
> +	case E_V4DFmode:
> +	  sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (d->nelt,
> 							       rperm));
> +	  tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0);

Likewise.

> +	  emit_move_insn (tmp, sel);
> +	  break;
> +	case E_V8SFmode:
> +	  sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d->nelt,
> 							       rperm));
> +	  tmp = gen_rtx_SUBREG (E_V8SImode, d->target, 0);

Likewise.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
[committed] i386: Cleanup ix86_expand_{unary|binary}_operator issues
Move ix86_expand_unary_operator from i386.cc to i386-expand.cc,
re-arrange prototypes and do some cosmetic changes with the usage of
TARGET_APX_NDD.

No functional changes.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_unary_operator_ok): Move from here...
	* config/i386/i386-expand.cc (ix86_unary_operator_ok): ... to here.
	* config/i386/i386-protos.h: Re-arrange ix86_{unary|binary}_operator_ok
	and ix86_expand_{unary|binary}_operator prototypes.
	* config/i386/i386.md: Cosmetic changes with the usage of
	TARGET_APX_NDD in ix86_expand_{unary|binary}_operator and
	ix86_{unary|binary}_operator_ok function calls.

Bootstrapped and regression tested on x86_64-linux-gnu {-m32}.

Uros.

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 57a108ae4a7..fd1b2a9ff36 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1537,6 +1537,23 @@ ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
   emit_move_insn (operands[0], dst);
 }
 
+/* Return TRUE or FALSE depending on whether the unary operator meets the
+   appropriate constraints.  */
+
+bool
+ix86_unary_operator_ok (enum rtx_code,
+			machine_mode,
+			rtx operands[2],
+			bool use_ndd)
+{
+  /* If one of operands is memory, source and destination must match.  */
+  if ((MEM_P (operands[0])
+       || (!use_ndd && MEM_P (operands[1])))
+      && ! rtx_equal_p (operands[0], operands[1]))
+    return false;
+  return true;
+}
+
 /* Predict just emitted jump instruction to be taken with probability PROB.  */
 
 static void

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 56349064a6c..9ee08d8ecc0 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -108,15 +108,20 @@ extern void ix86_expand_clear (rtx);
 extern void ix86_expand_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move_misalign (machine_mode, rtx[]);
-extern rtx ix86_fixup_binary_operands (enum rtx_code,
-				       machine_mode, rtx[], bool = false);
-extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
-						machine_mode, rtx[], bool = false);
-extern void ix86_expand_binary_operator (enum rtx_code,
-					 machine_mode, rtx[], bool = false);
+extern rtx ix86_fixup_binary_operands (enum rtx_code, machine_mode,
+				       rtx[], bool = false);
+extern void ix86_fixup_binary_operands_no_copy (enum rtx_code, machine_mode,
+						rtx[], bool = false);
+extern void ix86_expand_binary_operator (enum rtx_code, machine_mode,
+					 rtx[], bool = false);
+extern bool ix86_binary_operator_ok (enum rtx_code, machine_mode,
+				     rtx[3], bool = false);
+extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
+					rtx[], bool = false);
+extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode,
+				    rtx[2], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
 						 machine_mode, rtx[]);
-extern bool ix86_binary_operator_ok (enum rtx_code, machine_mode, rtx[3], bool = false);
 extern bool ix86_avoid_lea_for_add (rtx_insn *, rtx[]);
 extern bool ix86_use_lea_for_mov (rtx_insn *, rtx[]);
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
@@ -126,12 +131,9 @@ extern int ix86_last_zero_store_uid;
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
-extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
-					rtx[], bool = false);
 extern rtx ix86_build_const_vector (machine_mode, bool, rtx);
 extern rtx ix86_build_signbit_mask (machine_mode, bool, bool);
-extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx,
-							   machine_mode);
+extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx, machine_mode);
 extern void ix86_split_convert_uns_si_sse (rtx[]);
 extern void ix86_expand_convert_uns_didf_sse (rtx, rtx);
 extern void ix86_expand_convert_uns_sixf_sse (rtx, rtx);
@@ -147,8 +149,6 @@ extern void ix86_split_fp_absneg_operator (enum rtx_code, machine_mode,
 					   rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_expand_xorsign (rtx []);
-extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2],
-				    bool = false);
extern bool ix86_match_c
[PATCH v2] LoongArch: Merge constant vector permutation implementations.
There are currently two implementations of constant vector permutation:
loongarch_expand_vec_perm_const_1 and loongarch_expand_vec_perm_const_2.
The two implementations differ, and currently only
loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We hope
to streamline the code as much as possible while retaining the
better-performing of the two implementations.  By repeatedly testing
spec2006 and spec2017, we arrived at the merged version below.  Compared
with the pre-merge version, the number of lines of code in loongarch.cc
has been reduced by 888 lines.  At the same time, the performance of
SPECint2006 under Ofast has been improved by 0.97%, and the performance
of SPEC2017 fprate has been improved by 0.27%.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_is_odd_extraction):
	Remove useless forward declaration.
	(loongarch_is_even_extraction): Remove useless forward declaration.
	(loongarch_try_expand_lsx_vshuf_const): Removed.
	(loongarch_expand_vec_perm_const_1): Merged.
	(loongarch_is_double_duplicate): Removed.
	(loongarch_is_center_extraction): Ditto.
	(loongarch_is_reversing_permutation): Ditto.
	(loongarch_is_di_misalign_extract): Ditto.
	(loongarch_is_si_misalign_extract): Ditto.
	(loongarch_is_lasx_lowpart_extract): Ditto.
	(loongarch_is_op_reverse_perm): Ditto.
	(loongarch_is_single_op_perm): Ditto.
	(loongarch_is_divisible_perm): Ditto.
	(loongarch_is_triple_stride_extract): Ditto.
	(loongarch_expand_vec_perm_const_2): Merged.
	(loongarch_expand_vec_perm_const): New.
	(loongarch_vectorize_vec_perm_const): Adjust.
---
 gcc/config/loongarch/loongarch.cc | 1308 +
 1 file changed, 210 insertions(+), 1098 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 1d4d8f0b256..d5bf6a02a12 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8769,143 +8769,6 @@ loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
     }
 }
 
-static bool
-loongarch_is_odd_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_is_even_extraction (struct expand_vec_perm_d *);
-
-static bool
-loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
-{
-  int i;
-  rtx target, op0, op1, sel, tmp;
-  rtx rperm[MAX_VECT_LEN];
-
-  if (d->vmode == E_V2DImode || d->vmode == E_V2DFmode
-      || d->vmode == E_V4SImode || d->vmode == E_V4SFmode
-      || d->vmode == E_V8HImode || d->vmode == E_V16QImode)
-    {
-      target = d->target;
-      op0 = d->op0;
-      op1 = d->one_vector_p ? d->op0 : d->op1;
-
-      if (GET_MODE (op0) != GET_MODE (op1)
-	  || GET_MODE (op0) != GET_MODE (target))
-	return false;
-
-      if (d->testing_p)
-	return true;
-
-      /* If match extract-even and extract-odd permutations pattern, use
-       * vselect much better than vshuf.  */
-      if (loongarch_is_odd_extraction (d)
-	  || loongarch_is_even_extraction (d))
-	{
-	  if (loongarch_expand_vselect_vconcat (d->target, d->op0, d->op1,
-						d->perm, d->nelt))
-	    return true;
-
-	  unsigned char perm2[MAX_VECT_LEN];
-	  for (i = 0; i < d->nelt; ++i)
-	    perm2[i] = (d->perm[i] + d->nelt) & (2 * d->nelt - 1);
-
-	  if (loongarch_expand_vselect_vconcat (d->target, d->op1, d->op0,
-						perm2, d->nelt))
-	    return true;
-	}
-
-      for (i = 0; i < d->nelt; i += 1)
-	{
-	  rperm[i] = GEN_INT (d->perm[i]);
-	}
-
-      if (d->vmode == E_V2DFmode)
-	{
-	  sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
-	  tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
-	  emit_move_insn (tmp, sel);
-	}
-      else if (d->vmode == E_V4SFmode)
-	{
-	  sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
-	  tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
-	  emit_move_insn (tmp, sel);
-	}
-      else
-	{
-	  sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
-	  emit_move_insn (d->target, sel);
-	}
-
-      switch (d->vmode)
-	{
-	case E_V2DFmode:
-	  emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0));
-	  break;
-	case E_V2DImode:
-	  emit_insn (gen_lsx_vshuf_d (target, target, op1, op0));
-	  break;
-	case E_V4SFmode:
-	  emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0));
-	  break;
-	case E_V4SImode:
-	  emit_insn (gen_lsx_vshuf_w (target, target, op1, op0));
-	  break;
-	case E_V8HImode:
-	  emit_insn (gen_lsx_vshuf_h (target, target, op1, op0));
-
Re: Re: [PATCH v1] LoongArch: Merge constant vector permutation implementations.
I also have the same doubts about vector instructions. 😂
Sorry, I can't prove it, so I used simplify_gen_subreg instead to make
sure there won't be problems (I submitted the v2 version); my oversight.

> ----- Original Message -----
> From: "Xi Ruoyao"
> Sent: 2023-12-28 18:55:01 (Thursday)
> To: "Li Wei" , gcc-patches@gcc.gnu.org
> Cc: i...@xen0n.name, xucheng...@loongson.cn, chengl...@loongson.cn
> Subject: Re: [PATCH v1] LoongArch: Merge constant vector permutation
> implementations.
>
> On Thu, 2023-12-28 at 14:59 +0800, Li Wei wrote:
> > There are currently two versions of the implementations of constant
> > vector permutation: loongarch_expand_vec_perm_const_1 and
> > loongarch_expand_vec_perm_const_2.  The implementations of the two
> > versions are different.  Currently, only the implementation of
> > loongarch_expand_vec_perm_const_1 is used for 256-bit vectors.  We
> > hope to streamline the code as much as possible while retaining the
> > better-performing implementation of the two.  By repeatedly testing
> > spec2006 and spec2017, we got the following merged version.
> > Compared with the pre-merger version, the number of lines of code
> > in loongarch.cc has been reduced by 888 lines.  At the same time,
> > the performance of SPECint2006 under Ofast has been improved by 0.97%,
> > and the performance of SPEC2017 fprate has been improved by 0.27%.
>
> /* snip */
>
> > - * 3. What LASX permutation instruction does:
> > - * In short, it just executes two independent 128-bit vector
> > permutations, and
> > - * that's the reason we need to do the jobs below.  We will explain it.
> > - * op0, op1, target, and selector will be separated into high 128 bits
> > and low
> > - * 128 bits, and permutation is done as described below:
> > - *
> > - * a) op0's low 128 bits and op1's low 128 bits "combine" into a 256-bit
> > - * temp vector storage (TVS1); elements are indexed as below:
> > - *      0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
> > - *    |--------------------|--------------------|  TVS1
> > - *      op0's low 128 bits    op1's low 128 bits
> > - * op0's high 128 bits and op1's high 128 bits are "combined" into TVS2
> > in the
> > - * same way.
> > - *      0 ~ nelt / 2 - 1      nelt / 2 ~ nelt - 1
> > - *    |--------------------|--------------------|  TVS2
> > - *      op0's high 128 bits   op1's high 128 bits
> > - * b) The selector's low 128 bits describe which elements from TVS1 will
> > fit into
> > - * the target vector's low 128 bits.  No TVS2 elements are allowed.
> > - * c) The selector's high 128 bits describe which elements from TVS2
> > will fit into
> > - * the target vector's high 128 bits.  No TVS1 elements are allowed.
>
> Just curious: why did the hardware engineers create such a bizarre
> instruction? :)
>
> /* snip */
>
> > +      rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
> > +      rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
>
> Can we prove d->op0, d->op1, and d->target are never SUBREGs?  Otherwise
> I'd use lowpart_subreg (E_V4DImode, d->xxx, d->vmode) here to avoid
> creating a nested SUBREG (a nested SUBREG will cause an ICE, and it has
> happened several times before).
>
> /* snip */
>
> > +      switch (d->vmode)
> > 	{
> > -	  remapped[i] = d->perm[i];
> > +	case E_V4DFmode:
> > +	  sel = gen_rtx_CONST_VECTOR (E_V4DImode, gen_rtvec_v (d->nelt,
> > 							       rperm));
> > +	  tmp = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
>
> Likewise.
>
> > +	  emit_move_insn (tmp, sel);
> > +	  break;
> > +	case E_V8SFmode:
> > +	  sel = gen_rtx_CONST_VECTOR (E_V8SImode, gen_rtvec_v (d->nelt,
> > 							       rperm));
> > +	  tmp = gen_rtx_SUBREG (E_V8SImode, d->target, 0);
>
> Likewise.
> --
> Xi Ruoyao
> School of Aerospace Science and Technology, Xidian University

This email and its attachments contain confidential information from
Loongson Technology, which is intended only for the person or entity
whose address is listed above.  Any use of the information contained
herein in any way (including, but not limited to, total or partial
disclosure, reproduction or dissemination) by persons other than the
intended recipient(s) is prohibited.  If you receive this email in
error, please notify the sender by phone or email immediately and
delete it.
[PATCH] aarch64: fortran: Adjust vect-8.f90 for libmvec
With new glibc, one more loop can be vectorized via simd exp in libmvec.

Found by the Linaro TCWG CI.

gcc/testsuite/ChangeLog:

	* gfortran.dg/vect/vect-8.f90: Accept more vectorized loops.
---
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index ca72ddcffca..938dfc29754 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -704,7 +704,7 @@ CALL track('KERNEL  ')
 RETURN
 END SUBROUTINE kernel
 
-! { dg-final { scan-tree-dump-times "vectorized 25 loops" 1 "vect" { target aarch64_sve } } }
-! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
+! { dg-final { scan-tree-dump-times "vectorized 2\[56\] loops" 1 "vect" { target aarch64_sve } } }
+! { dg-final { scan-tree-dump-times "vectorized 2\[45\] loops" 1 "vect" { target { aarch64*-*-* && { ! aarch64_sve } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
-- 
2.25.1
[PATCH] MIPS: Implement TARGET_INSN_COSTS
The MIPS backend has some information about insns, including length,
count, etc.  Since some instructions are more costly than others, add a
new attr `perf_ratio`.  Its default value is (const_int 1).

The return value of mips_insn_cost is insn_count * perf_ratio * 4.
The magic `4` here is because `rtx_cost` returns 4 for simple
instructions.

gcc/ChangeLog:

	* config/mips/mips.cc (mips_insn_cost): New function.
	(TARGET_INSN_COST): Define to mips_insn_cost.
	* config/mips/mips.md (perf_ratio): New attr.
---
 gcc/config/mips/mips.cc | 14 ++++++++++++++
 gcc/config/mips/mips.md |  4 ++++
 2 files changed, 18 insertions(+)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 9180dbbf843..fddb1519d76 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -4170,6 +4170,18 @@ mips_set_reg_reg_cost (machine_mode mode)
     }
 }
 
+/* Implement TARGET_INSN_COST.  */
+
+static int
+mips_insn_cost (rtx_insn *x, bool speed ATTRIBUTE_UNUSED)
+{
+  if (GET_CODE (PATTERN (x)) != SET)
+    return pattern_cost (PATTERN (x), speed);
+  return get_attr_insn_count (x)
+	 * get_attr_perf_ratio (x)
+	 * 4;
+}
+
 /* Implement TARGET_RTX_COSTS.  */
 
 static bool
@@ -23069,6 +23081,8 @@ mips_bit_clear_p (enum machine_mode mode, unsigned HOST_WIDE_INT m)
 #define TARGET_RTX_COSTS mips_rtx_costs
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST mips_address_cost
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST mips_insn_cost
 #undef TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 #define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P mips_no_speculation_in_delay_slots_p
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 0666310734e..d6c4ba13f47 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -312,6 +312,10 @@ (define_attr "sync_insn2" "nop,and,xor,not"
 ;; "11" specifies MEMMODEL_ACQUIRE.
 (define_attr "sync_memmodel" "" (const_int 10))
 
+;; Performance ratio.  Used by mips_insn_cost: it returns
+;; insn_count * perf_ratio * 4.  Add this attr to the slow insns.
+(define_attr "perf_ratio" "" (const_int 1))
+
 ;; Accumulator operand for madd patterns.
 (define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none"))
-- 
2.39.2
[PATCH] Improved RTL expansion of field assignments into promoted registers.
This patch fixes PR rtl-optimization/104914 by tweaking/improving the
way that fields are written into a pseudo register that needs to be
kept sign extended.

The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf)
{
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  if (val > 0)
    ext(1);
  else
    ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
  int val;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  unsigned char _4;
  int val.5_5;

  [local count: 1073741824]:
  _1 = *buf_7(D);
  MEM[(unsigned char *)&val] = _1;
  _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
  MEM[(unsigned char *)&val + 1B] = _2;
  _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
  MEM[(unsigned char *)&val + 2B] = _3;
  _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
  MEM[(unsigned char *)&val + 3B] = _4;
  val.5_5 = val;
  if (val.5_5 > 0)
    goto ; [59.00%]
  else
    goto ; [41.00%]

  [local count: 633507681]:
  ext (1);
  goto ; [100.00%]

  [local count: 440234144]:
  ext (0);

  [local count: 1073741824]:
  val ={v} {CLOBBER(eol)};
  return;
}

Here four bytes are sequentially written into the SImode value val.
On some platforms, such as MIPS64, this SImode value is kept in a
64-bit register, suitably sign-extended.  The function
expand_assignment contains logic to handle this via
SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc), which outputs an
explicit extension operation after each store_field (typically insv)
to such promoted/extended pseudos.

The first observation is that there's no need to perform sign
extension after each byte in the example above; the extension is only
required after changes to the most significant byte (i.e. to a field
that overlaps the most significant bit).

The bug fix is actually a bit more subtle: at this point during code
expansion it's not safe to use a SUBREG when sign-extending this
field.  Currently, GCC generates
(sign_extend:DI (subreg:SI (reg:DI) 0)),
but combine (and other RTL optimizers) later realize that because
SImode values are always sign-extended in their 64-bit hard registers,
this is a no-op and eliminates it.  The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly
sign-extended, and the usual backend invariants don't hold.  At these
critical points, the middle-end needs to use an explicit TRUNCATE rtx
(as this isn't a TRULY_NOOP_TRUNCATION), so that the explicit
sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI))),
which avoids the problem.  Note that
MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should imply)
!TRULY_NOOP_TRUNCATION (NARROW, WIDE).

I've another (independent) patch that I'll post in a few minutes.

This middle-end patch has been tested on x86_64-pc-linux-gnu with make
bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  The cc1 from a
cross-compiler to mips64 appears to generate much better code for the
above test case.  Ok for mainline?

2023-12-28  Roger Sayle

gcc/ChangeLog
	PR rtl-optimization/104914
	* expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P,
	a sign or zero extension is only required if the modified field
	overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
	targets, don't refer to the temporarily incorrectly extended value
	using a SUBREG, but instead generate an explicit TRUNCATE rtx.

Thanks in advance,
Roger
--

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9fef2bf6585..1a34b48e38f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6272,19 +6272,32 @@ expand_assignment (tree to, tree from, bool nontemporal)
	      && known_eq (bitpos, 0)
	      && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
	    result = store_expr (from, to_rtx, 0, nontemporal, false);
-	  else
+	  /* Check if the field overlaps the MSB, requiring extension.  */
+	  else if (known_eq (bitpos + bitsize,
+			     GET_MODE_BITSIZE (GET_MODE (to_rtx))))
	    {
-	      rtx to_rtx1
-		= lowpart_subreg (subreg_unpromoted_mode (to_rtx),
-				  SUBREG_REG (to_rtx),
-				  subreg_promoted_mode (to_rtx));
+	      scalar_int_mode imode = subreg_unpromoted_mode (to_rtx);
+	      scalar_int_mode omode = subreg_promoted_mode (to_rtx);
+	      rtx to_rtx1 = lowpart_subreg (imode, SUBREG_REG (to_rtx),
+					    omode);
	      result = store_field (to_rtx1, bitsize, bitpos,
[middle-end PATCH] Only call targetm.truly_noop_truncation for truncations.
The truly_noop_truncation target hook is documented, in target.def, as "true if it is safe to convert a value of inprec bits to one of outprec bits (where outprec is smaller than inprec) by merely operating on it as if it had only outprec bits", i.e. the middle-end can use a SUBREG instead of a TRUNCATE. What's perhaps potentially a little ambiguous in the above description is whether it is the caller or the callee that's responsible for ensuring or checking whether "outprec < inprec". The name TRULY_NOOP_TRUNCATION_P, like SUBREG_PROMOTED_P, may be prone to being understood as a predicate that confirms that something is a no-op truncation or a promoted subreg, when in fact the caller must first confirm this is a truncation/subreg and only then call the "classification" macro. Alas making the following minor tweak (for testing) to the i386 backend: static bool ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec) { gcc_assert (outprec < inprec); return true; } #undef TARGET_TRULY_NOOP_TRUNCATION #define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation reveals that there are numerous callers in middle-end that rely on the default behaviour of silently returning true for any (invalid) input. These are fixed below. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-12-28 Roger Sayle gcc/ChangeLog * combine.cc (make_extraction): Confirm that OUTPREC is less than INPREC before calling TRULY_NOOP_TRUNCATION_MODES_P. * expmed.cc (store_bit_field_using_insv): Likewise. (extract_bit_field_using_extv): Likewise. (extract_bit_field_as_subreg): Likewise. * optabs-query.cc (get_best_extraction_insn): Likewise. * optabs.cc (expand_parity): Likewise. * rtlhooks.cc (gen_lowpart_general): Likewise. * simplify-rtx.cc (simplify_truncation): Disallow truncations to the same precision. 
(simplify_unary_operation_1) : Move optimization of truncations to the same mode earlier. Thanks in advance, Roger -- diff --git a/gcc/combine.cc b/gcc/combine.cc index f2c64a9..5aa2f57 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -7613,7 +7613,8 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, && (pos == 0 || REG_P (inner)) && (inner_mode == tmode || !REG_P (inner) - || TRULY_NOOP_TRUNCATION_MODES_P (tmode, inner_mode) + || (known_lt (GET_MODE_SIZE (tmode), GET_MODE_SIZE (inner_mode)) + && TRULY_NOOP_TRUNCATION_MODES_P (tmode, inner_mode)) || reg_truncated_to_mode (tmode, inner)) && (! in_dest || (REG_P (inner) @@ -7856,6 +7857,8 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, /* On the LHS, don't create paradoxical subregs implicitely truncating the register unless TARGET_TRULY_NOOP_TRUNCATION. */ if (in_dest + && known_lt (GET_MODE_SIZE (GET_MODE (inner)), + GET_MODE_SIZE (wanted_inner_mode)) && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (inner), wanted_inner_mode)) return NULL_RTX; diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 0bba93f..8940d47 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -26707,6 +26707,16 @@ ix86_libm_function_max_error (unsigned cfn, machine_mode mode, #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests #endif /* #if CHECKING_P */ +static bool +ix86_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec) +{ + gcc_assert (outprec < inprec); + return true; +} + +#undef TARGET_TRULY_NOOP_TRUNCATION +#define TARGET_TRULY_NOOP_TRUNCATION ix86_truly_noop_truncation + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-i386.h" diff --git a/gcc/expmed.cc b/gcc/expmed.cc index 05331dd..6398bf9 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -651,6 +651,7 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0, X) 0)) is (reg:N X). 
*/ if (GET_CODE (xop0) == SUBREG && REG_P (SUBREG_REG (xop0)) + && paradoxical_subreg_p (xop0) && !TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (SUBREG_REG (xop0)), op_mode)) { @@ -1585,7 +1586,11 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0, mode. Instead, create a temporary and use convert_move to set the target. */ if (REG_P (target) - && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode) + && (known_lt (GET_MODE_SIZE (GET_MODE (target)), + GET_MODE_SIZE (ext_mode)) + ? TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode) + : known_eq (GET_MODE_SIZE (GET_MOD
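The outprec < inprec contract discussed above can be illustrated outside of GCC. The sketch below is a hypothetical stand-in for the target hook (the name `checked_truly_noop_truncation` and the plain `unsigned` precisions are illustrative assumptions, not GCC's actual poly_uint64 interface): a caller must first establish that it really has a truncation before asking whether that truncation is a no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for TARGET_TRULY_NOOP_TRUNCATION: the hook's
   documented precondition is outprec < inprec, so an implementation is
   entitled to assert it, exactly like the ix86 testing tweak above.  */
static bool
checked_truly_noop_truncation (unsigned outprec, unsigned inprec)
{
  assert (outprec < inprec);	/* the contract the patch enforces */
  return true;
}

/* What the fixed middle-end callers do: only consult the hook when the
   conversion really narrows.  A same-precision "conversion" or a
   widening is never a truncation, so the hook must not be asked.  */
static bool
safe_noop_narrowing_p (unsigned outprec, unsigned inprec)
{
  return outprec < inprec
	 && checked_truly_noop_truncation (outprec, inprec);
}
```

The short-circuit in `safe_noop_narrowing_p` is the whole point of the patch: the guard runs in the caller, before the hook can see an invalid argument pair.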
Re: 回复:[PATCH v3 2/6] RISC-V: Split csr_operand in predicates.md for vector patterns.
On 12/26/23 19:49, joshua wrote: Hi Jeff, Yes, I will change something in vector_csr_operand in the following patches. Constraints will be added so that the AVL cannot be encoded as an immediate for xtheadvector vsetvl. Ah. Thanks. Makes sense. jeff
Re: [PATCH] RISC-V: Add crypto machine descriptions
On 12/26/23 19:47, Kito Cheng wrote: Thanks Feng, the patch is LGTM from my side, I am happy to accept vector crypto stuffs for GCC 14, it's mostly intrinsic stuff, and the only few non-intrinsic stuff also low risk enough (e.g. vrol, vctz) I won't object. I'm disappointed that we're in a similar situation as last year, but at least the scope is smaller. jeff
Re: [PATCH] RISC-V: Fix misaligned stack offset for interrupt function
On 12/25/23 01:45, Kito Cheng wrote: The `interrupt` function will back up the fcsr register, but the save slot was fixed to SImode. That's not a big issue since fcsr only uses 8 bits so far; however, the offset should still use UNITS_PER_WORD to prevent the stack offset from becoming non-8-byte aligned, which causes problems for RV64.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_for_each_saved_reg): Adjust the
	offset of fcsr.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/interrupt-misaligned.c: New.

OK jeff
Re: [ARC PATCH] Table-driven ashlsi implementation for better code/rtx_costs.
On 12/23/23 16:37, Roger Sayle wrote: One of the cool features of the H8 backend is its use of tables to select optimal shift implementations for different CPU variants. This patch borrows (plagiarizes) that idiom for SImode left shifts in the ARC backend (for CPUs without a barrel-shifter). This provides a convenient mechanism for both selecting the best implementation strategy (for speed vs. size), and providing accurate rtx_costs [without duplicating a lot of logic]. Left shift RTX costs are especially important for use in synth_mult.

An example improvement is:

int foo(int x) { return 32768*x; }

which is now generated with -O2 -mcpu=em -mswap as:

foo:	bmsk_s	r0,r0,16
	swap	r0,r0
	j_s.d	[blink]
	ror	r0,r0

where previously the ARC backend would generate a loop:

foo:	mov	lp_count,15
	lp	2f
	add	r0,r0,r0
	nop
2:	# end single insn loop
	j_s	[blink]

Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's and/or Jeff's testing? [Thanks again to Jeff for finding the typo in my last ARC patch]

So just an FYI. There's no upstream gdbsim for the arc, so my tester just uses a dummy simulator which says everything passes. So I could include your patch to test that the compiler doesn't ICE, produces results that will assemble/link, but it won't test the correctness of the resulting code. Jeff
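For what it's worth, the bmsk/swap/ror sequence can be checked arithmetically: `bmsk_s r0,r0,16` keeps bits 0..16, and `swap` (rotate by 16) followed by `ror` (rotate right by 1) amounts to a rotate left by 15, so the whole sequence computes x*32768 modulo 2^32. A quick host-side sketch (plain C modelling the instructions, not ARC code):

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n (0 < n < 32).  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Model of the ARC sequence: mask to bits 0..16 (bmsk_s r0,r0,16),
   then rotate left by 15 (swap = rotate 16, ror = rotate right 1).
   Masking first guarantees the rotate only brings in zero bits, so
   the result equals x << 15, i.e. x * 32768 mod 2^32.  */
static uint32_t
arc_mul_32768 (uint32_t x)
{
  return rotl32 (x & 0x1ffff, 15);
}
```

This confirms the branch-free sequence is an exact replacement for the shift loop.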
[PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine
The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example:

float a[1];
float t() { return a[0] + a[8000]; }

is compiled to:

	la.local	$r13,a
	la.local	$r12,a+32768
	fld.s	$f1,$r13,0
	fld.s	$f0,$r12,-768
	fadd.s	$f0,$f1,$f0

by trunk. But as we've explained in r14-4851, the following would be better with -mexplicit-relocs=auto:

	pcalau12i	$r13,%pc_hi20(a)
	pcalau12i	$r12,%pc_hi20(a+32000)
	fld.s	$f1,$r13,%pc_lo12(a)
	fld.s	$f0,$r12,%pc_lo12(a+32000)
	fadd.s	$f0,$f1,$f0

However the sliding-window algorithm just won't detect the pcalau12i/fld pair to be optimized. Using a define_insn_and_split in the combine pass works around the issue.

gcc/ChangeLog:

	* config/loongarch/predicates.md (symbolic_pcrel_offset_operand):
	New define_predicate.
	(mem_simple_ldst_operand): Likewise.
	* config/loongarch/loongarch-protos.h
	(loongarch_rewrite_mem_for_simple_ldst): Declare.
	* config/loongarch/loongarch.cc
	(loongarch_rewrite_mem_for_simple_ldst): Implement.
	* config/loongarch/loongarch.md (simple_load): New
	define_insn_and_rewrite.
	(simple_load_ext): Likewise.
	(simple_store): Likewise.
	(define_peephole2): Remove la.local/[f]ld peepholes.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
	New test.
	* gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c:
	New test.

---
Changes from [v2]:
- Match (mem (symbol_ref ...)) instead of (symbol_ref ...) to retain the attributes of the MEM.
- Add a test to make sure the attributes of the MEM are retained.

[v2]: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641430.html

Bootstrapped & regtested on loongarch64-linux-gnu. Ok for trunk?
gcc/config/loongarch/loongarch-protos.h | 1 + gcc/config/loongarch/loongarch.cc | 16 +++ gcc/config/loongarch/loongarch.md | 114 +- gcc/config/loongarch/predicates.md| 13 ++ ...explicit-relocs-auto-single-load-store-2.c | 11 ++ ...explicit-relocs-auto-single-load-store-3.c | 18 +++ 6 files changed, 86 insertions(+), 87 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index 7bf21a45c69..024f3117604 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -163,6 +163,7 @@ extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT); extern bool loongarch_check_zero_div_p (void); extern bool loongarch_pre_reload_split (void); extern int loongarch_use_bstrins_for_ior_with_mask (machine_mode, rtx *); +extern rtx loongarch_rewrite_mem_for_simple_ldst (rtx); union loongarch_gen_fn_ptrs { diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 1d4d8f0b256..9f2b3e98bf0 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -5717,6 +5717,22 @@ loongarch_use_bstrins_for_ior_with_mask (machine_mode mode, rtx *op) return 0; } +/* Rewrite a MEM for simple load/store under -mexplicit-relocs=auto + -mcmodel={normal/medium}. */ +rtx +loongarch_rewrite_mem_for_simple_ldst (rtx mem) +{ + rtx addr = XEXP (mem, 0); + rtx hi = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), + UNSPEC_PCALAU12I_GR); + rtx new_mem; + + addr = gen_rtx_LO_SUM (Pmode, force_reg (Pmode, hi), addr); + new_mem = gen_rtx_MEM (GET_MODE (mem), addr); + MEM_COPY_ATTRIBUTES (new_mem, mem); + return new_mem; +} + /* Print the text for PRINT_OPERAND punctation character CH to FILE. 
The punctuation characters are: diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index ce8fcd5b572..0de7e516d56 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -4135,101 +4135,41 @@ (define_insn "loongarch_crcc_w__w" ;; ;; And if the pseudo op cannot be relaxed, we'll get a worse result (with ;; 3 instructions). -(define_peephole2 - [(set (match_operand:P 0 "register_operand") - (match_operand:P 1 "symbolic_pcrel_operand")) - (set (match_operand:LD_AT_LEAST_32_BIT 2 "register_operand") - (mem:LD_AT_LEAST_32_BIT (match_dup 0)))] - "la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \ - && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM) \ - && (peep2_reg_dead_p (2, operands[0]) \ - || REGNO (operands[0]) == REGNO (operands[2]))" - [(set (match_dup 2) - (mem:LD_AT_LEAST_32_BIT (lo_sum:P (match_dup 0) (match_dup 1] - { -emit_insn (gen_pcalau12i_gr (operands[0], operands[1])); - }) - -(define_p
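The %pc_hi20/%pc_lo12 pair in the example above follows the usual rounded split of a 32-bit pc-relative offset into a page-aligned high part and a signed 12-bit low part; this is exactly why a+32000 in trunk's la.local output becomes a reference to a+32768 with a -768 displacement. A quick host-side sketch of that split (illustrative helper name, not code from the patch):

```c
#include <stdint.h>

/* Split a pc-relative offset into the high part materialized by
   pcalau12i (a multiple of 4096) and the signed 12-bit low part used
   by the load/store.  Rounding by +0x800 before masking keeps the low
   part in [-2048, 2047].  */
static void
split_hi20_lo12 (int32_t offset, int32_t *hi, int32_t *lo)
{
  *hi = (offset + 0x800) & ~0xfff;	/* rounded to a 4 KiB boundary */
  *lo = offset - *hi;			/* always fits in 12 signed bits */
}
```

For offset 32000 (the a[8000] float access) this yields hi = 32768 and lo = -768, matching the la.local/fld.s pair shown in the message.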
Re: [PATCH V2] RISC-V: Disallow transformation into VLMAX AVL for cond_len_xxx when length is in range [0,31]
On 12/26/23 19:38, Juzhe-Zhong wrote: Notice we have this following situation:

	vsetivli	zero,4,e32,m1,ta,ma
	vlseg4e32.v	v4,(a5)
	vlseg4e32.v	v12,(a3)
	vsetvli	a5,zero,e32,m1,tu,ma  ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
	vfadd.vf	v3,v13,fa0
	vfadd.vf	v1,v12,fa1
	vfmul.vv	v17,v3,v5
	vfmul.vv	v16,v1,v5

The root cause is that we transform COND_LEN_xxx into VLMAX AVL when len == NUNITS blindly. However, we don't need to transform all of them, since when len is in the range [0,31] we don't need to consume scalar registers. After this patch:

	vsetivli	zero,4,e32,m1,tu,ma
	addi	a4,a5,400
	vlseg4e32.v	v12,(a3)
	vfadd.vf	v3,v13,fa0
	vfadd.vf	v1,v12,fa1
	vlseg4e32.v	v4,(a4)
	vfadd.vf	v2,v14,fa1
	vfmul.vv	v17,v3,v5
	vfmul.vv	v16,v1,v5

Tested on both RV32 and RV64, no regression.

So it looks like the two fragments above are from different sources, though I guess it's also possible one of the cut-n-pastes just got truncated. Note the differing number of vfadd instructions. That doesn't invalidate the patch, but does make it slightly harder to reason about what you're doing.

Ok for trunk ?

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (is_vlmax_len_p): New function.
	(expand_load_store): Disallow transformation into VLMAX when len
	is in range of [0,31].
	(expand_cond_len_op): Ditto.
	(expand_gather_scatter): Ditto.
	(expand_lanes_load_store): Ditto.
	(expand_fold_extract_last): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/post-ra-avl.c: Adapt test.
	* gcc.target/riscv/rvv/base/vf_avl-2.c: New test.
--- gcc/config/riscv/riscv-v.cc | 21 +-- .../riscv/rvv/autovec/post-ra-avl.c | 2 +- .../gcc.target/riscv/rvv/base/vf_avl-2.c | 21 +++ 3 files changed, 37 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-2.c diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 038ab084a37..0cc7af58da6 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -68,6 +68,16 @@ imm_avl_p (machine_mode mode) : false; } +/* Return true if LEN is equal to NUNITS that outbounds range of [0, 31]. */ Perhaps "that is out of the range [0, 31]."? OK with the comment nit fixed. jeff
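The [0, 31] bound comes from the 5-bit unsigned immediate AVL field of vsetivli: lengths in that range can stay as an immediate and never consume a scalar register, which is why the patch only switches to VLMAX outside it. A minimal sketch of the predicate's shape (hypothetical simplification mirroring the intent of imm_avl_p/is_vlmax_len_p in riscv-v.cc, not the actual GCC code):

```c
#include <stdbool.h>

/* AVL values 0..31 fit vsetivli's 5-bit unsigned immediate field,
   so they cost no scalar register.  */
static bool
imm_avl_range_p (int len)
{
  return len >= 0 && len <= 31;
}

/* Only prefer the VLMAX form (vsetvli rd,zero,...) when len equals
   NUNITS *and* cannot be encoded as an immediate anyway; otherwise
   keep the cheap vsetivli immediate.  */
static bool
use_vlmax_p (int len, int nunits)
{
  return len == nunits && !imm_avl_range_p (len);
}
```

Under this rule the redundant `vsetvli a5,zero,...` in the first fragment disappears: len = 4 equals NUNITS but fits the immediate, so the immediate form wins.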
Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
On 12/26/23 02:34, pan2...@intel.com wrote: From: Pan Li

This patch would like to XFAIL the test case pr30957-1.c for the RVV when building the elf with some configurations (listed at the end of the log). It will be vectorized during vect_transform_loop with a variable factor. It won't benefit from unrolling/peeling and marks the loop->unroll as 1. Of course, it will do nothing during unroll_loops when loop->unroll is 1.

The aarch64_sve may have a similar issue, but it initializes the const `0.0 / -5.0` in the test file to `+0.0` before passing it to the function foo. Then it will pass the execution test.

aarch64:
	movi	v0.2s, #0x0
	stp	x29, x30, [sp, #-16]!
	mov	w0, #0xa
	mov	x29, sp
	bl	400280   <== s0 is +0.0

Unfortunately, riscv initializes the const `0.0 / -5.0` to `-0.0`, and then passes it to the function foo. Of course the execution test will fail.

riscv:
	flw	fa0,388(gp)   # 1299c <__SDATA_BEGIN__+0x4>
	addi	sp,sp,-16
	li	a0,10
	sd	ra,8(sp)
	jal	101fc   <== fa0 is -0.0

After this patch the loops vectorized with a variable factor of the RVV will be treated as XFAIL by the tree dump when riscv_v and variable_vect_length. The below configurations are validated as XFAIL for RV64.

Interesting. So I'd actually peel one more layer off this onion. Why do the aarch64 and riscv targets generate different constants (0.0 vs -0.0)? Is it possible that the aarch64 is generating 0.0 when asked for -0.0 and -fno-signed-zeros is in effect? That's a valid thing to do when -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROS in the aarch64 backend. Jeff
[PATCH] MIPS: Implement TARGET_INSN_COSTS
The current (default) behavior is that when the target doesn't define TARGET_INSN_COST the middle-end uses the backend's TARGET_RTX_COSTS, so multiplications are slower than additions, but about the same size when optimizing for size (with -Os or -Oz). All of this gets disabled with your proposed patch. [If you don't check speed, you probably shouldn't touch insn_cost].

I agree that a backend can fine-tune the (speed and size) costs of instructions (especially complex !single_set instructions) via attributes in the machine description, but these should be used to override/fine-tune rtx_costs, not override/replace/duplicate them. Having accurate rtx_costs also helps RTL expansion and the earlier optimizers, but insn_cost is used by combine and the later RTL optimization passes, once instructions have been recognized.

Might I also recommend that instead of insn_count*perf_ratio*4, or even the slightly better COSTS_N_INSNS (insn_count*perf_ratio), you encode the relative cost in the attribute, avoiding the multiplication (at runtime), and allowing fine tuning like "COSTS_N_INSNS(2) - 1". Likewise, COSTS_N_BYTES is a very useful macro for a backend to define/use in rtx_costs. Conveniently for many RISC machines, 1 instruction takes about 4 bytes, so COSTS_N_INSNS (1) is (approximately) comparable to COSTS_N_BYTES (4). I hope this helps.

Perhaps something like:

static int
mips_insn_cost (rtx_insn *insn, bool speed)
{
  int cost;
  if (recog_memoized (insn) >= 0)
    {
      if (speed)
	{
	  /* Use cost if provided.  */
	  cost = get_attr_cost (insn);
	  if (cost > 0)
	    return cost;
	}
      else
	{
	  /* If optimizing for size, we want the insn size.  */
	  return get_attr_length (insn);
	}
    }

  if (rtx set = single_set (insn))
    cost = set_rtx_cost (set, speed);
  else
    cost = pattern_cost (PATTERN (insn), speed);

  /* If the cost is zero, then it's likely a complex insn.  We don't
     want the cost of these to be less than something we know about.  */
  return cost ? cost : COSTS_N_INSNS (2);
}
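For reference, COSTS_N_INSNS in GCC's rtl.h scales by a factor of 4, which is what makes "COSTS_N_INSNS(2) - 1" a meaningful fine-tuning step and a 4-byte RISC instruction roughly comparable to one cost unit per byte. The COSTS_N_BYTES definition below is only a sketch of the backend-local convention the mail alludes to (each backend that wants it defines its own; one-unit-per-byte is an assumption inferred from the "COSTS_N_INSNS (1) ~ COSTS_N_BYTES (4)" equivalence above):

```c
/* From GCC's rtl.h: relative cost of N "typical" single-word insns.
   The factor of 4 leaves room for fine tuning between whole-insn
   costs, e.g. COSTS_N_INSNS (2) - 1.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Illustrative backend-local size-cost convention: one unit per byte,
   so a 4-byte RISC insn costs the same as COSTS_N_INSNS (1).  */
#define COSTS_N_BYTES(N) (N)
```

With these units, an attribute encoding "slightly cheaper than two insns" stores 7 directly instead of recomputing insn_count*perf_ratio*4 at query time.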
Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
On 12/24/23 01:11, YunQiang Su wrote: Yes. I also guess so. Any new idea?

Well, I see multiple intertwined issues and I think MIPS has largely mucked this up.

At a high level DI -> SI truncation is not a nop on MIPS64. We must explicitly sign extend the value from SI -> DI to preserve the invariant that SImode objects are extended to DImode. If we fail to do that, then the SImode conditional branch patterns simply aren't going to work.

MIPS64 never claims DI -> SI is a nop; instead it claims SI -> DI is a nop.

And that just seems wrong, at least for truncation, which implies the input precision must be larger than the output precision. If you adjust the MIPS implementation of TARGET_TRULY_NOOP_TRUNCATION to return false when the input precision is smaller than the output precision, does that fix this problem?

And for MIPS64, it has only one type of branch. It works for both SI and DI.

Agreed, but the SImode variant is really just a DImode comparison that relies on the sign-extending property of the MIPS architecture. I'm not 100% sure that's safe in the presence of bit manipulation instructions which do not preserve the sign-extending property. We actually don't allow some bit manipulations on RV64 for a similar underlying reason.

Converting from 32 to 64 is indeed a nop, IF the 32-bit value is properly sign extended.

But that's not a *truncation*, that's an *extension*. Jeff
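The invariant under discussion can be stated concretely: on MIPS64, every SImode value held in a 64-bit register must equal the sign-extension of its own low 32 bits, and an SImode comparison compiled as a DImode branch is correct only while that holds. A small host-side model (illustrative, using int64_t to stand in for the 64-bit register; not GCC code):

```c
#include <stdbool.h>
#include <stdint.h>

/* MIPS64 invariant: an SImode object in a 64-bit register is kept
   sign-extended, i.e. bits 32..63 are copies of bit 31.  */
static bool
si_invariant_holds (int64_t reg)
{
  return reg == (int64_t) (int32_t) reg;
}

/* The single 64-bit signed branch.  It implements the SImode compare
   correctly only when both inputs satisfy the invariant; a register
   whose upper bits were clobbered (e.g. by a bit-manipulation insn
   that doesn't re-extend) gives the wrong SImode answer.  */
static bool
si_less_via_di (int64_t a, int64_t b)
{
  return a < b;		/* DImode compare standing in for SImode */
}
```

The test below includes a counterexample: a register holding 1<<32 violates the invariant, and the DImode branch then disagrees with the true SImode comparison of its low 32 bits.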
Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.
On 12/28/23 07:59, Roger Sayle wrote: This patch fixes PR rtl-optimization/104914 by tweaking/improving the way that fields are written into a pseudo register that needs to be kept sign extended.

Well, I think "fixes" is a bit of a stretch. We're avoiding the issue by changing the early RTL generation, but if I understand what's going on in the RTL optimizers and MIPS backend correctly, the core bug still remains. Admittedly I haven't put it under a debugger, but that MIPS definition of NOOP_TRUNCATION just seems badly wrong and is just waiting to pop its ugly head up again.

The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf)
{
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  if(val > 0)
    ext(1);
  else
    ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
  int val;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  unsigned char _4;
  int val.5_5;

  [local count: 1073741824]:
  _1 = *buf_7(D);
  MEM[(unsigned char *)&val] = _1;
  _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
  MEM[(unsigned char *)&val + 1B] = _2;
  _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
  MEM[(unsigned char *)&val + 2B] = _3;
  _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
  MEM[(unsigned char *)&val + 3B] = _4;
  val.5_5 = val;
  if (val.5_5 > 0)
    goto ; [59.00%]
  else
    goto ; [41.00%]

  [local count: 633507681]:
  ext (1);
  goto ; [100.00%]

  [local count: 440234144]:
  ext (0);

  [local count: 1073741824]:
  val ={v} {CLOBBER(eol)};
  return;

}

Here four bytes are being sequentially written into the SImode value val. On some platforms, such as MIPS64, this SImode value is kept in a 64-bit register, suitably sign-extended.
The function expand_assignment contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an explicit extension operation after each store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension after each byte in the example above; the extension is only required after changes to the most significant byte (i.e. to a field that overlaps the most significant bit).

True.

The bug fix is actually a bit more subtle, but at this point during code expansion it's not safe to use a SUBREG when sign-extending this field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0)) but combine (and other RTL optimizers) later realize that because SImode values are always sign-extended in their 64-bit hard registers that this is a no-op and eliminates it. The trouble is that it's unsafe to refer to the SImode lowpart of a 64-bit register using SUBREG at those critical points when temporarily the value isn't correctly sign-extended, and the usual backend invariants don't hold. At these critical points, the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.

Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE). I've another (independent) patch that I'll post in a few minutes.

This middle-end patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. The cc1 from a cross-compiler to mips64 appears to generate much better code for the above test case. Ok for mainline?
2023-12-28 Roger Sayle gcc/ChangeLog PR rtl-optimization/104914 * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P a sign or zero extension is only required if the modified field overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED targets, don't refer to the temporarily incorrectly extended value using a SUBREG, but instead generate an explicit TRUNCATE rtx. [ ... ] + /* Check if the field overlaps the MSB, requiring extension. */ + else if (known_eq (bitpos + bitsize, +GET_MODE_BITSIZE (GET_MODE (to_rtx Do you need to look at the size of the field as well? ie, the starting position might be before the sign bit, but the width of the field might cover the mode's sign bit? I'm not real good in the RTL expansion code, so if I'm offbase on this, just let me know. jeff
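On the question of a field that starts before the sign bit but covers it: for a store into a field that lies entirely within the mode (bitpos + bitsize <= mode bitsize), the field touches the most significant bit exactly when bitpos + bitsize equals the mode's bitsize, so the known_eq test does catch such fields. A hedged sketch of that reasoning (plain unsigned arithmetic instead of poly_int, and a hypothetical helper name, not the patch's actual code):

```c
#include <stdbool.h>

/* Does the field [bitpos, bitpos + bitsize) touch the MSB of an
   N-bit mode?  The MSB is bit mode_bits - 1.  For fields that fit in
   the mode (bitpos + bitsize <= mode_bits) the general overlap
   condition "end > mode_bits - 1" collapses to equality, which is
   what a known_eq (bitpos + bitsize, mode_bits) check relies on.  */
static bool
field_overlaps_msb (unsigned bitpos, unsigned bitsize, unsigned mode_bits)
{
  unsigned end = bitpos + bitsize;	/* one past the last bit written */
  return end == mode_bits;
}
```

In particular a full-width field starting at bit 0 (bitpos = 0, bitsize = mode_bits) still satisfies the equality, so a field whose start precedes the sign bit but whose width covers it is not missed.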
Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode
On 12/24/23 05:24, Roger Sayle wrote: What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a truncation! The output precision is first, the input precision is second. The docs explicitly state the output precision should be smaller than the input precision (which makes sense for truncation). That's where I'd start with trying to untangle this mess.

Thanks (both) for correcting my misunderstanding. At the very least might I suggest that we introduce a new TRULY_NOOP_EXTENSION_MODES_P target hook that MIPS can use for this purpose? It'd help reduce confusion, and keep the documentation/function naming correct.

Yes. It is good for me. T_N_T_M_P is a really confusing name.

Ignore my suggestion for a new target hook. GCC already has one. You shouldn't be using TRULY_NOOP_TRUNCATION_MODES_P with incorrectly ordered arguments. The correct target hook is TARGET_MODE_REP_EXTENDED, which the MIPS backend correctly defines via mips_mode_rep_extended. It's MIPS's definition of (and interpretation of) mips_truly_noop_truncation that's suspect.

My latest theory is that these sign extensions should be:

(set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))

and not

(set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))

In isolation these are the same. I think the fact that the MIPS backend wipes out the sign extension turning the result into a NOP is what makes them different. Of course that's kind of the point behind the TRULY_NOOP_TRUNCATION macro. That's what allows the MIPS target to wipe out the sign extension. ISTM this might be worth noting in the docs for TRULY_NOOP_TRUNCATION. Jeff
RE: [PATCH] Improved RTL expansion of field assignments into promoted registers.
Hi Jeff,

Thanks for the speedy review.

> On 12/28/23 07:59, Roger Sayle wrote:
> > This patch fixes PR rtl-optimization/104914 by tweaking/improving the
> > way that fields are written into a pseudo register that needs to be
> > kept sign extended.
> Well, I think "fixes" is a bit of a stretch. We're avoiding the issue by
> changing the early RTL generation, but if I understand what's going on in
> the RTL optimizers and MIPS backend correctly, the core bug still remains.
> Admittedly I haven't put it under a debugger, but that MIPS definition of
> NOOP_TRUNCATION just seems badly wrong and is just waiting to pop its
> ugly head up again.

I think this really is the/a correct fix. The MIPS backend defines NOOP_TRUNCATION to false, so it's not correct to use a SUBREG to convert from DImode to SImode. The problem then is where in the compiler (middle-end or backend) is this invalid SUBREG being created and how can it be fixed. In this particular case, the fault is in RTL expansion. There may be other places where a SUBREG is inappropriately used instead of a TRUNCATE, but this is the place where things go wrong for PR rtl-optimization/104914.

Once an inappropriate SImode SUBREG is in the RTL stream, it can remain harmlessly latent (most of the time), unless it gets split, simplified or spilled. Copying this SImode expression into its own pseudo results in incorrect code. One approach might be to use an UNSPEC for places where backend invariants are temporarily invalid, but in this case it's machine independent middle-end code that's using SUBREGs as though the target was an x86/pdp11.

So I agree that on the surface, both of these appear to be identical:

> (set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
> (set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))

But should they get split or spilled by reload:

(set (reg_tmp:SI) (subreg:SI (reg:DI)))
(set (reg:DI) (sign_extend:DI (reg_tmp:SI)))

is invalid as the reg_tmp isn't correctly sign-extended for SImode.
But, (set (reg_tmp:SI) (truncate:SI (reg:DI)) (set (reg:DI) (sign_extend:DI (reg_tmp:SI)) is fine. The difference is the instant in time, when the SUBREG's invariants aren't yet valid (and its contents shouldn't be thought of as SImode). On nvptx, where truly_noop_truncation is always "false", it'd show the same bug/failure, if it were not for that fact that nvptx doesn't attempt to store values in "mode extended" (SUBREG_PROMOTED_VAR_P) registers. The bug is really in MODE_REP_EXTENDED support. > > The motivating example from the bugzilla PR is: > > > > extern void ext(int); > > void foo(const unsigned char *buf) { > >int val; > >((unsigned char*)&val)[0] = *buf++; > >((unsigned char*)&val)[1] = *buf++; > >((unsigned char*)&val)[2] = *buf++; > >((unsigned char*)&val)[3] = *buf++; > >if(val > 0) > > ext(1); > >else > > ext(0); > > } > > > > which at the end of the tree optimization passes looks like: > > > > void foo (const unsigned char * buf) > > { > >int val; > >unsigned char _1; > >unsigned char _2; > >unsigned char _3; > >unsigned char _4; > >int val.5_5; > > > > [local count: 1073741824]: > >_1 = *buf_7(D); > >MEM[(unsigned char *)&val] = _1; > >_2 = MEM[(const unsigned char *)buf_7(D) + 1B]; > >MEM[(unsigned char *)&val + 1B] = _2; > >_3 = MEM[(const unsigned char *)buf_7(D) + 2B]; > >MEM[(unsigned char *)&val + 2B] = _3; > >_4 = MEM[(const unsigned char *)buf_7(D) + 3B]; > >MEM[(unsigned char *)&val + 3B] = _4; > >val.5_5 = val; > >if (val.5_5 > 0) > > goto ; [59.00%] > >else > > goto ; [41.00%] > > > > [local count: 633507681]: > >ext (1); > >goto ; [100.00%] > > > > [local count: 440234144]: > >ext (0); > > > > [local count: 1073741824]: > >val ={v} {CLOBBER(eol)}; > >return; > > > > } > > > > Here four bytes are being sequentially written into the SImode value > > val. On some platforms, such as MIPS64, this SImode value is kept in > > a 64-bit register, suitably sign-extended. 
The function > > expand_assignment contains logic to handle this via > > SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an > > explicit extension operation after each store_field (typically insv) to such > promoted/extended pseudos. > > > > The first observation is that there's no need to perform sign > > extension after each byte in the example above; the extension is only > > required after changes to the most significant byte (i.e. to a field > > that overlaps the most significant bit). > True. > > > The bug fix is actually a bit more subtle, but at this point during > > code expansion it's not safe to use a SUBREG when sign-extending this > > field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) > > 0)) but combine (and other RTL optimizers) later realize that because > > SImode values are always sign-extended in their 64-bit hard registers > > that
Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]
Hi Rimvydas! Am 28.12.23 um 08:09 schrieb Rimvydas Jasinskas: On Wed, Dec 27, 2023 at 10:34 PM Harald Anlauf wrote: The patch is almost fine, except for a strange wording here:

+@smallexample
+gfortran -save-temps -c foo.F90
+@end smallexample
+
+preprocesses to in @file{foo.fii}, compiles to an intermediate
+@file{foo.s}, and then assembles to the (implied) output file
+@file{foo.o}, whereas:

I understand the formulation is copied from gcc/doc/invoke.texi, where it does not fully make sense to me either. How about: "preprocesses input file @file{foo.F90} to @file{foo.fii}, ..." Furthermore,

+@smallexample
+gfortran -save-temps -S foo.F
+@end smallexample
+
+saves the (no longer) temporary preprocessed file in @file{foo.fi}, and
+then compiles to the (implied) output file @file{foo.s}.

Even if this is copied from the gcc texinfo file, how about: "saves the preprocessor output in @file{foo.fi}, ..." which I find easier to read. Can you also add a reference to the PR number in the commit message? I agree, wording sounds a lot better, included in v2 together with PR number. Yes, this is OK. Pushed: https://gcc.gnu.org/g:2cb93e6686e4af5725d8c919cf19f535a7f3aa33 Thanks for the patch!

Is there a specific reason why -fc-prototypes (Interoperability Options section) is excluded from the manpage? Can you be more specific? I get here (since gcc-9):

% man /opt/gcc/14/share/man/man1/gfortran.1 |grep -A 1 "Interoperability Options"
   Interoperability Options
       -fc-prototypes -fc-prototypes-external

although no detailed explanation (-> gfortran.info). The https://gcc.gnu.org/onlinedocs/gfortran/Invoking-GNU-Fortran.html does contain a working link to https://gcc.gnu.org/onlinedocs/gfortran/Interoperability-Options.html However the manpage has the Interoperability section explicitly disabled with "@c man end" ... "@c man begin ENVIRONMENT".
After digging into git log it seems that Interoperability section was unintentionally added after this comment mark in https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e655a6cc43 Yes, that might have been unintentional. Can you open a PR, and if you have a fix, attach it there? Thanks, Harald Best regards, Rimvydas
RE: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
Thanks Jeff for comments, and Happy new year! > Interesting. So I'd actually peel one more layer off this onion. Why > do the aarch64 and riscv targets generate different constants (0.0 vs > -0.0)? Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize. > Is it possible that the aarch64 is generating 0.0 when asked for -0.0 > and -fno-signed-zeros is in effect? That's a valid thing to do when > -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROs in the aarch64 > backend. Sure, will have a try for making the -0.0 happen in aarch64. Pan -Original Message- From: Jeff Law Sent: Friday, December 29, 2023 12:39 AM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; kito.ch...@gmail.com; richard.guent...@gmail.com Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor On 12/26/23 02:34, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to XFAIL the test case pr30957-1.c for the RVV when > build the elf with some configurations (list at the end of the log) > It will be vectorized during vect_transform_loop with a variable factor. > It won't benefit from unrolling/peeling and mark the loop->unroll as 1. > Of course, it will do nothing during unroll_loops when loop->unroll is 1. > > The aarch64_sve may have the similar issue but it initialize the const > `0.0 / -5.0` in the test file to `+0.0` before pass to the function foo. > Then it will pass the execution test. > > aarch64: > moviv0.2s, #0x0 > stp x29, x30, [sp, #-16]! > mov w0, #0xa > mov x29, sp > bl 400280 <== s0 is +0.0 > > Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the > `-0.0`, and then pass it to the function foo. Of course it the execution > test will fail. 
> riscv:
> 	flw	fa0,388(gp)   # 1299c <__SDATA_BEGIN__+0x4>
> 	addi	sp,sp,-16
> 	li	a0,10
> 	sd	ra,8(sp)
> 	jal	101fc   <== fa0 is -0.0
>
> After this patch the loops vectorized with a variable factor on the RVV
> will be treated as XFAIL by the tree dump when riscv_v and
> variable_vect_length.
>
> The below configurations are validated as XFAIL for RV64.

Interesting. So I'd actually peel one more layer off this onion. Why do the aarch64 and riscv targets generate different constants (0.0 vs -0.0)?

Is it possible that the aarch64 is generating 0.0 when asked for -0.0 and -fno-signed-zeros is in effect? That's a valid thing to do when -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROS in the aarch64 backend.

Jeff
Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for the comments, and Happy New Year!
>
>> Interesting. So I'd actually peel one more layer off this onion. Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
>
> Yeah, it surprised me too when debugging the foo function. But I didn't
> dig into it previously as it may be unrelated to vectorization.
>
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect? That's a valid thing to do when
>> -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROS in the aarch64
>> backend.
>
> Sure, will have a try at making the -0.0 happen in aarch64.

I would first look at the .optimized dump, then I'd look at the .final dump alongside the resulting assembly for aarch64. I bet we're going to find that the aarch64 target internally converts -0.0 to 0.0 when we're not honoring signed zeros.

jeff
[PATCH] RISC-V: Count pointer type SSA into RVV regs liveness for dynamic LMUL cost model
This patch fixes the following case of choosing an unexpectedly big LMUL, which causes register spills. Before this patch, choosing LMUL = 4:

	addi	sp,sp,-160
	addiw	t1,a2,-1
	li	a5,7
	bleu	t1,a5,.L16
	vsetivli	zero,8,e64,m4,ta,ma
	vmv.v.x	v4,a0
	vs4r.v	v4,0(sp)	---> spill to the stack.
	vmv.v.x	v4,a1
	addi	a5,sp,64
	vs4r.v	v4,0(a5)	---> spill to the stack.

The root cause is the following code:

  if (poly_int_tree_p (var)
      || (is_gimple_val (var)
	  && !POINTER_TYPE_P (TREE_TYPE (var))))

We count the variable as consuming an RVV register group when it is not POINTER_TYPE. That is right for a load/store STMT, for example:

  _1 = (MEM)*addr  -->  addr won't be allocated an RVV vector group.

However, we find it is not right for a non-load/store STMT:

  _3 = _1 == x_8(D);

_1 is pointer type too, but we do allocate an RVV register group for it. So after this patch, we are choosing the perfect LMUL for the testcase in this patch:

	ble	a2,zero,.L17
	addiw	a7,a2,-1
	li	a5,3
	bleu	a7,a5,.L15
	srliw	a5,a7,2
	slli	a6,a5,1
	add	a6,a6,a5
	lui	a5,%hi(replacements)
	addi	t1,a5,%lo(replacements)
	slli	a6,a6,5
	lui	t4,%hi(.LANCHOR0)
	lui	t3,%hi(.LANCHOR0+8)
	lui	a3,%hi(.LANCHOR0+16)
	lui	a4,%hi(.LC1)
	vsetivli	zero,4,e16,mf2,ta,ma
	addi	t4,t4,%lo(.LANCHOR0)
	addi	t3,t3,%lo(.LANCHOR0+8)
	addi	a3,a3,%lo(.LANCHOR0+16)
	addi	a4,a4,%lo(.LC1)
	add	a6,t1,a6
	addi	a5,a5,%lo(replacements)
	vle16.v	v18,0(t4)
	vle16.v	v17,0(t3)
	vle16.v	v16,0(a3)
	vmsgeu.vi	v25,v18,4
	vadd.vi	v24,v18,-4
	vmsgeu.vi	v23,v17,4
	vadd.vi	v22,v17,-4
	vlm.v	v21,0(a4)
	vmsgeu.vi	v20,v16,4
	vadd.vi	v19,v16,-4
	vsetvli	zero,zero,e64,m2,ta,mu
	vmv.v.x	v12,a0
	vmv.v.x	v14,a1
.L4:
	vlseg3e64.v	v6,(a5)
	vmseq.vv	v2,v6,v12
	vmseq.vv	v0,v8,v12
	vmsne.vv	v1,v8,v12
	vmand.mm	v1,v1,v2
	vmerge.vvm	v2,v8,v14,v0
	vmv1r.v	v0,v1
	addi	a4,a5,24
	vmerge.vvm	v6,v6,v14,v0
	vmerge.vim	v2,v2,0,v0
	vrgatherei16.vv	v4,v6,v18
	vmv1r.v	v0,v25
	vrgatherei16.vv	v4,v2,v24,v0.t
	vs1r.v	v4,0(a5)
	addi	a3,a5,48
	vmv1r.v	v0,v21
	vmv2r.v	v4,v2
	vcompress.vm	v4,v6,v0
	vs1r.v	v4,0(a4)
	vmv1r.v	v0,v23
	addi	a4,a5,72
	vrgatherei16.vv	v4,v6,v17
	vrgatherei16.vv	v4,v2,v22,v0.t
	vs1r.v	v4,0(a3)
	vmv1r.v	v0,v20
	vrgatherei16.vv	v4,v6,v16
	addi	a5,a5,96
	vrgatherei16.vv	v4,v2,v19,v0.t
	vs1r.v	v4,0(a4)
	bne	a6,a5,.L4

No spills, and no use of the "sp" register.

Tested on both RV32 and RV64, no regression. OK for trunk?

	PR target/113112

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode):
	Fix pointer type liveness count.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc        | 12 ++--
 .../vect/costmodel/riscv/rvv/pr113112-4.c     | 28 +++
 2 files changed, 37 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 0c485dc4f29..b41a79429d4 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -277,9 +277,12 @@ compute_local_live_ranges (
     {
       unsigned int point = program_point.point;
       gimple *stmt = program_point.stmt;
+      stmt_vec_info stmt_info = program_point.stmt_info;
       tree lhs = gimple_get_lhs (stmt);
       if (lhs != NULL_TREE && is_gimple_reg (lhs)
-	  && !POINTER_TYPE_P (TREE_TYPE (lhs)))
+	  && (!POINTER_TYPE_P (TREE_TYPE (lhs))
+	      || STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
+		   != store_vec_info_type))
 	{
 	  biggest_mode = get_biggest_mode (biggest_mode, TYPE_MODE (TREE_TYPE (lhs)));
@@ -305,7 +308,10 @@ compute_local_live_ranges (
 	     the future.  */
 	  if (poly_int_tree_p (var)
 	      || (is_gimple_val (var)
-		  && !POINTER_TYPE_P (TREE_TYPE (var))))
+		  && (!POINTER_TYPE_P (TREE_TYPE (var))
+		      || STMT_VINFO_TYPE (
+
[Committed] RISC-V: Robustify testcase pr113112-1.c
The redundant dump checks are fragile, easily perturbed, and not necessary. Tested on both RV32/RV64 with no regression. Removed them and committed.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c: Remove redundant
	checks.

---
 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
index 95df7809d49..2dc39ad8e8b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-1.c
@@ -24,6 +24,3 @@ foo (int n){
 /* { dg-final { scan-assembler-not {jr} } } */
 /* { dg-final { scan-assembler-times {ret} 1 } } */
 /* { dg-final { scan-tree-dump-times "Preferring smaller LMUL loop because it has unexpected spills" 1 "vect" } } */
-/* { dg-final { scan-tree-dump "At most 8 number of live V_REG at program point 1 for bb 4" "vect" } } */
-/* { dg-final { scan-tree-dump "At most 40 number of live V_REG at program point 1 for bb 3" "vect" } } */
-/* { dg-final { scan-tree-dump "At most 8 number of live V_REG at program point 1 for bb 5" "vect" } } */
-- 
2.36.3
Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc
Hi Jeff,

Perhaps fold_fault_load cannot be moved to riscv-protos.h, since gimple_folder is declared in riscv-vector-builtins.h and it's not reasonable to include riscv-vector-builtins.h in riscv-protos.h. In fact, fold_fault_load is defined specifically for some builtin functions, and it would be better to just prototype it in riscv-vector-builtins-bases.h.

Joshua

------------------------------------------------------------------
From: Jeff Law
Sent: Thursday, December 21, 2023, 02:14
To: "Jun Sha (Joshua)"; "gcc-patches"
Cc: "jim.wilson.gcc"; palmer; andrew; "philipp.tomsich"; "christoph.muellner"; "juzhe.zhong"; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v3 1/6] RISC-V: Refactor riscv-vector-builtins-bases.cc

On 12/20/23 05:25, Jun Sha (Joshua) wrote:
> This patch moves the definition of the enums lst_type and
> frm_op_type into riscv-vector-builtins-bases.h and removes
> the static visibility of fold_fault_load(), so these
> can be used in other compile units.
>
> gcc/ChangeLog:
>
>	* config/riscv/riscv-vector-builtins-bases.cc (enum lst_type):
>	(enum frm_op_type): move to riscv-vector-builtins-bases.h
>	* config/riscv/riscv-vector-builtins-bases.h
>	(GCC_RISCV_VECTOR_BUILTINS_BASES_H): Add header files.
>	(enum lst_type): move from
>	(enum frm_op_type): riscv-vector-builtins-bases.cc
>	(fold_fault_load): riscv-vector-builtins-bases.cc

I'm largely hoping to leave the heavy review lifting here to Juzhe, who knows GCC's RV vector bits as well as anyone.

Just one small issue: would it be better to prototype fold_fault_load elsewhere and avoid the gimple.h inclusion in riscv-vector-builtins-bases.h? Perhaps riscv-protos.h? You might consider prefixing the function name with riscv_. It's not strictly necessary, but it appears to be relatively common in the RISC-V port.

Thanks,
Jeff
[PATCH v1] LoongArch: testsuite: Fix FAIL in lasx-xvstelm.c file.
After the cost model was implemented on the LoongArch architecture, GCC turns this feature on by default, which causes the lasx-xvstelm.c test to fail. Analysis shows that this test case only generates the vectorized instructions required for detection after the cost model is disabled with the "-fno-vect-cost-model" compile option.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vector/lasx/lasx-xvstelm.c: Add compile
	option "-fno-vect-cost-model" to dg-options.

---
 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c
index 1a7b0e86f8b..4b846204a65 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvstelm.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mlasx" } */
+/* { dg-options "-O3 -mlasx -fno-vect-cost-model" } */
 /* { dg-final { scan-assembler-times "xvstelm.w" 8} } */
 #define LEN 256
-- 
2.20.1
[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
This patch handles the differences in instruction generation between Vector and XTheadVector. In this version, we only support the subset of xtheadvector instructions that can be leveraged directly from current RVV 1.0 by simply adding the "th." prefix.

For XTheadVector instructions that have different names but share the same patterns as RVV 1.0 instructions, we will use an ASM target hook to rewrite the whole instruction string in the following patches.

For some vector patterns that cannot be avoided, we use "!TARGET_XTHEADVECTOR" to disable them in vector.md in order not to generate instructions that xtheadvector does not support, like vmv1r and vsext.vf2.

gcc/ChangeLog:

	* config.gcc: Add files for XTheadVector intrinsics.
	* config/riscv/autovec.md: Guard XTheadVector.
	* config/riscv/riscv-string.cc (expand_block_move):
	Guard XTheadVector.
	* config/riscv/riscv-v.cc (legitimize_move): New expansion.
	(get_prefer_tail_policy): Give specific value for tail.
	(get_prefer_mask_policy): Give specific value for mask.
	(vls_mode_valid_p): Avoid autovec.
	* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
	(build_one): New function.
	* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
	(DEF_THEAD_RVV_FUNCTION): Add new macros.
	(check_required_extensions):
	(handle_pragma_vector):
	* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
	(RVV_REQUIRE_XTHEADVECTOR):
	Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
	(struct function_group_info):
	* config/riscv/riscv-vector-switch.def (ENTRY):
	Disable fractional mode for the XTheadVector extension.
	(TUPLE_ENTRY): Likewise.
	* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
	* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
	Guard XTheadVector.
	(riscv_v_adjust_bytesize): Likewise.
	(riscv_preferred_simd_mode): Likewise.
	(riscv_autovectorize_vector_modes): Likewise.
	(riscv_vector_mode_supported_any_target_p): Likewise.
	(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL. * config/riscv/vector.md: Include thead-vector.md. * config/riscv/riscv_th_vector.h: New file. * config/riscv/thead-vector.md: New file. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector. * gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector. * lib/target-supports.exp: Add target for XTheadVector. Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config.gcc| 2 +- gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/predicates.md| 8 +- gcc/config/riscv/riscv-string.cc | 3 + gcc/config/riscv/riscv-v.cc | 13 +- .../riscv/riscv-vector-builtins-bases.cc | 3 + .../riscv/riscv-vector-builtins-shapes.cc | 23 +++ gcc/config/riscv/riscv-vector-switch.def | 150 +++--- gcc/config/riscv/riscv-vsetvl.cc | 10 + gcc/config/riscv/riscv.cc | 20 +- gcc/config/riscv/riscv_th_vector.h| 49 + gcc/config/riscv/thead-vector.md | 142 + gcc/config/riscv/vector-iterators.md | 186 +- gcc/config/riscv/vector.md| 36 +++- .../gcc.target/riscv/rvv/base/abi-1.c | 2 +- .../gcc.target/riscv/rvv/base/pragma-1.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 17 files changed, 474 insertions(+), 189 deletions(-) create mode 100644 gcc/config/riscv/riscv_th_vector.h create mode 100644 gcc/config/riscv/thead-vector.md diff --git a/gcc/config.gcc b/gcc/config.gcc index f0676c830e8..1445d98c147 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -549,7 +549,7 @@ riscv*) extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o" extra_objs="${extra_objs} thead.o riscv-target-attr.o" d_target_objs="riscv-d.o" - extra_headers="riscv_vector.h" + extra_headers="riscv_vector.h riscv_th_vector.h" target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc" target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h" ;; diff --git a/gcc/config/riscv/autovec.md 
b/gcc/config/riscv/autovec.md index 8b8a92f10a1..1fac56c7095 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2579,7 +2579,7 @@ [(match_operand 0 "register_operand") (match_operand 1 "memory_operand") (match_operand:ANYI 2 "const_int_operand")] - "TARGET_VECTOR" + "TARGET_VECTOR && !TARGET_XTHEADVECTOR" { riscv_vector::expand_rawmemchr(mode
[PATCH v4 6/6] RISC-V: Add support for xtheadvector-specific intrinsics.
This patch only involves the generation of xtheadvector special load/store instructions and vext instructions. gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (class th_loadstore_width): Define new builtin bases. (BASE): Define new builtin bases. * config/riscv/riscv-vector-builtins-bases.h: Define new builtin class. * config/riscv/riscv-vector-builtins-functions.def (vlsegff): Include thead-vector-builtins-functions.def. * config/riscv/riscv-vector-builtins-shapes.cc (struct th_loadstore_width_def): Define new builtin shapes. (struct th_indexed_loadstore_width_def): Define new builtin shapes. (SHAPE): Define new builtin shapes. * config/riscv/riscv-vector-builtins-shapes.h: Define new builtin shapes. * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_I8_OPS): Add datatypes for XTheadVector. (DEF_RVV_I16_OPS): Add datatypes for XTheadVector. (DEF_RVV_I32_OPS): Add datatypes for XTheadVector. (DEF_RVV_U8_OPS): Add datatypes for XTheadVector. (DEF_RVV_U16_OPS): Add datatypes for XTheadVector. (DEF_RVV_U32_OPS): Add datatypes for XTheadVector. (vint8m1_t): Add datatypes for XTheadVector. (vint8m2_t): Likewise. (vint8m4_t): Likewise. (vint8m8_t): Likewise. (vint16m1_t): Likewise. (vint16m2_t): Likewise. (vint16m4_t): Likewise. (vint16m8_t): Likewise. (vint32m1_t): Likewise. (vint32m2_t): Likewise. (vint32m4_t): Likewise. (vint32m8_t): Likewise. (vint64m1_t): Likewise. (vint64m2_t): Likewise. (vint64m4_t): Likewise. (vint64m8_t): Likewise. (vuint8m1_t): Likewise. (vuint8m2_t): Likewise. (vuint8m4_t): Likewise. (vuint8m8_t): Likewise. (vuint16m1_t): Likewise. (vuint16m2_t): Likewise. (vuint16m4_t): Likewise. (vuint16m8_t): Likewise. (vuint32m1_t): Likewise. (vuint32m2_t): Likewise. (vuint32m4_t): Likewise. (vuint32m8_t): Likewise. (vuint64m1_t): Likewise. (vuint64m2_t): Likewise. (vuint64m4_t): Likewise. (vuint64m8_t): Likewise. * config/riscv/riscv-vector-builtins.cc (DEF_RVV_I8_OPS): Add datatypes for XTheadVector. 
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector. (DEF_RVV_I32_OPS): Add datatypes for XTheadVector. (DEF_RVV_U8_OPS): Add datatypes for XTheadVector. (DEF_RVV_U16_OPS): Add datatypes for XTheadVector. (DEF_RVV_U32_OPS): Add datatypes for XTheadVector. * config/riscv/thead-vector-builtins-functions.def: New file. * config/riscv/thead-vector.md: Add new patterns. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test. * gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test. * gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test. * gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test. * gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test. * gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test. Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config.gcc| 2 +- .../riscv/riscv-vector-builtins-shapes.cc | 126 +++ .../riscv/riscv-vector-builtins-shapes.h | 3 + .../riscv/riscv-vector-builtins-types.def | 120 +++ gcc/config/riscv/riscv-vector-builtins.cc | 313 +- gcc/config/riscv/riscv-vector-builtins.h | 3 + gcc/config/riscv/t-riscv | 16 + .../riscv/thead-vector-builtins-functions.def | 39 +++ gcc/config/riscv/thead-vector-builtins.cc | 200 +++ gcc/config/riscv/thead-vector-builtins.h | 64 gcc/config/riscv/thead-vector.md | 253 ++ .../riscv/rvv/xtheadvector/vlb-vsb.c | 68 .../riscv/rvv/xtheadvector/vlbu-vsb.c | 68 .../riscv/rvv/xtheadvector/vlh-vsh.c | 68 .../riscv/rvv/xtheadvector/vlhu-vsh.c | 68 .../riscv/rvv/xtheadvector/vlw-vsw.c | 68 .../riscv/rvv/xtheadvector/vlwu-vsw.c | 68 17 files changed, 1545 insertions(+), 2 deletions(-) create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def create mode 100644 gcc/config/riscv/thead-vector-builtins.cc create mode 100644 gcc/config/riscv/thead-vector-builtins.h create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c creat
Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.
Jeff Law wrote on Friday, December 29, 2023 at 02:23:
>
> On 12/28/23 07:59, Roger Sayle wrote:
> >
> > This patch fixes PR rtl-optimization/104914 by tweaking/improving the way
> > that fields are written into a pseudo register that needs to be kept sign
> > extended.
> Well, I think "fixes" is a bit of a stretch. We're avoiding the issue
> by changing the early RTL generation, but if I understand what's going
> on in the RTL optimizers and MIPS backend correctly, the core bug still
> remains. Admittedly I haven't put it under a debugger, but that MIPS
> definition of NOOP_TRUNCATION just seems badly wrong and is just waiting
> to pop its ugly head up again.

Yes. I am trying to get rid of it from MIPS64. It may reduce our maintenance workload.

> > The motivating example from the bugzilla PR is:
> >
> > extern void ext(int);
> > void foo(const unsigned char *buf) {
> >   int val;
> >   ((unsigned char*)&val)[0] = *buf++;
> >   ((unsigned char*)&val)[1] = *buf++;
> >   ((unsigned char*)&val)[2] = *buf++;
> >   ((unsigned char*)&val)[3] = *buf++;
> >   if(val > 0)
> >     ext(1);
> >   else
> >     ext(0);
> > }
> >
> > which at the end of the tree optimization passes looks like:
> >
> > void foo (const unsigned char * buf)
> > {
> >   int val;
> >   unsigned char _1;
> >   unsigned char _2;
> >   unsigned char _3;
> >   unsigned char _4;
> >   int val.5_5;
> >
> >   [local count: 1073741824]:
> >   _1 = *buf_7(D);
> >   MEM[(unsigned char *)&val] = _1;
> >   _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
> >   MEM[(unsigned char *)&val + 1B] = _2;
> >   _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
> >   MEM[(unsigned char *)&val + 2B] = _3;
> >   _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
> >   MEM[(unsigned char *)&val + 3B] = _4;
> >   val.5_5 = val;
> >   if (val.5_5 > 0)
> >     goto ; [59.00%]
> >   else
> >     goto ; [41.00%]
> >
> >   [local count: 633507681]:
> >   ext (1);
> >   goto ; [100.00%]
> >
> >   [local count: 440234144]:
> >   ext (0);
> >
> >   [local count: 1073741824]:
> >   val ={v} {CLOBBER(eol)};
> >   return;
> >
> > }
> >
> > Here four bytes are
> > being sequentially written into the SImode value
> > val. On some platforms, such as MIPS64, this SImode value is kept in
> > a 64-bit register, suitably sign-extended. The function expand_assignment
> > contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
> > in expr.cc) which outputs an explicit extension operation after each
> > store_field (typically insv) to such promoted/extended pseudos.
> >
> > The first observation is that there's no need to perform sign extension
> > after each byte in the example above; the extension is only required
> > after changes to the most significant byte (i.e. to a field that overlaps
> > the most significant bit).
> True.
> >
> > The bug fix is actually a bit more subtle, but at this point during
> > code expansion it's not safe to use a SUBREG when sign-extending this
> > field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
> > but combine (and other RTL optimizers) later realize that because SImode
> > values are always sign-extended in their 64-bit hard registers that
> > this is a no-op and eliminates it. The trouble is that it's unsafe to
> > refer to the SImode lowpart of a 64-bit register using SUBREG at those
> > critical points when temporarily the value isn't correctly sign-extended,
> > and the usual backend invariants don't hold. At these critical points,
> > the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
> > TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
> > (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.
> >
> > Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKNOWN implies (or should
> > imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE). I've another (independent)
> > patch that I'll post in a few minutes.
> >
> > This middle-end patch has been tested on x86_64-pc-linux-gnu with
> > make bootstrap and make -k check, both with and without
> > --target_board=unix{-m32} with no new failures.
> > The cc1 from a cross-compiler to mips64 appears to generate much
> > better code for the above test case. Ok for mainline?
> >
> > 2023-12-28  Roger Sayle
> >
> > gcc/ChangeLog
> > 	PR rtl-optimization/104914
> > 	* expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
> > 	a sign or zero extension is only required if the modified field
> > 	overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED
> > 	targets, don't refer to the temporarily incorrectly extended value
> > 	using a SUBREG, but instead generate an explicit TRUNCATE rtx.
> [ ... ]
> > +  /* Check if the field overlaps the MSB, requiring extension.  */
> > +  else if (known_eq (bitpos + bitsize,
> > +		     GET_MODE_BITSIZE (GET_MODE (to_rtx
> Do you nee
Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.
In general, I agree with this change. With GCC 12 on RV64, more than one `sext.w` is produced in our tests (note: with -O1).

> There are two things that help here. The first is that the most significant
> bit never appears in the middle of a field, so we don't have to worry about
> overlapping, nor writes to the paradoxical bits of the SUBREG. And secondly,
> bits are numbered from zero for least significant, to MODE_BITSIZE (mode) - 1
> for most significant, irrespective of the endian-ness. So the code only needs

I am worried that bits higher than MODE_BITSIZE (mode) - 1 might also be modified. In that case, we also need to truncate/sign_extend, though I cannot produce such a C test case yet.

> to check the highest value bitpos + bitsize is the maximum value for the mode.
> The above logic stays the same, but which byte insert requires extension will
> change between mips64be and mips64le. i.e. we test that the most significant
> bit of the field/byte being written in the most significant bit of the SUBREG
> target. [That's my understanding/rationalization, I could be wrong].

The bits higher than MODE_BITSIZE (mode) - 1 also matter. The MIPS ISA requires that the source register of SImode instructions be sign-extended, otherwise the behavior is UNPREDICTABLE. It means

	li	$r2, 0xfff0 0001
	addu	$r1, $r0, $r2

is not allowed.

> One thing I could be more cautious about is using maybe_eq instead of
> known_eq, but the rest of the code (including truly_noop_truncation) assumes
> scalar integer modes, so variable length vectors aren't (yet) a concern.

Would using maybe_eq be better coding style?

> Cheers,
> Roger
[PATCH v1 0/8] LoongArch: Enable testing for common
When using a binutils that does not support vectorization together with a gcc compiler toolchain that does support vectorization, the following two types of errors occur in gcc regression testing.

1. Failure of common tests in the gcc.dg/vect directory.
GCC regression testing found that vect-bic-bitmask-{12,23}.c fails at compile time, and similar problems exist on various architectures (e.g. x86, aarch64, riscv, etc.). The reason is that these tests run through the assemble stage, and the vector instructions cannot be recognized by the assembler, so an error occurs.

2. FAILs in common vectorization tests once they are supported.
When the LoongArch architecture supports the common vector test cases, GCC regression testing shows many failures. Reasons include missing target detection rules, missing vectorization options, missing specific compile options, instruction-set differences, and the test behavior configured by the programs. For details, see the following patches:

chenxiaolong (8):
  LoongArch: testsuite: Add detection procedures supported by the target.
  LoongArch: testsuite: Modify the test behavior of the vect-bic-bitmask-{12,23}.c file.
  LoongArch: testsuite: Added test support for vect-{82,83}.c.
  LoongArch: testsuite: Fix FAIL in file bind_c_array_params_2.f90.
  LoongArch: testsuite: Modify the test behavior in file pr60510.f.
  LoongArch: testsuite: Added additional vectorization "-mlasx" compilation option.
  LoongArch: testsuite: Added additional vectorization "-mlsx" compilation option.
  LoongArch: testsuite: Modify the result check in the FMA file.
gcc/testsuite/gcc.dg/fma-3.c | 2 +- gcc/testsuite/gcc.dg/fma-4.c | 2 +- gcc/testsuite/gcc.dg/fma-6.c | 2 +- gcc/testsuite/gcc.dg/fma-7.c | 2 +- gcc/testsuite/gcc.dg/signbit-2.c | 1 + gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 1 + gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c | 1 + .../gcc.dg/vect/slp-widen-mult-half.c | 1 + gcc/testsuite/gcc.dg/vect/vect-82.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-83.c | 2 +- .../gcc.dg/vect/vect-bic-bitmask-12.c | 2 +- .../gcc.dg/vect/vect-bic-bitmask-23.c | 2 +- .../gcc.dg/vect/vect-widen-mult-const-s16.c | 1 + .../gcc.dg/vect/vect-widen-mult-const-u16.c | 1 + .../gcc.dg/vect/vect-widen-mult-half-u8.c | 1 + .../gcc.dg/vect/vect-widen-mult-half.c| 1 + .../gcc.dg/vect/vect-widen-mult-u16.c | 1 + .../gcc.dg/vect/vect-widen-mult-u8-s16-s32.c | 1 + .../gcc.dg/vect/vect-widen-mult-u8-u32.c | 1 + .../gcc.dg/vect/vect-widen-mult-u8.c | 1 + .../gfortran.dg/bind_c_array_params_2.f90 | 4 +- .../gfortran.dg/graphite/vect-pr40979.f90 | 1 + .../gfortran.dg/vect/fast-math-mgrid-resid.f | 1 + gcc/testsuite/gfortran.dg/vect/pr60510.f | 1 - gcc/testsuite/lib/target-supports.exp | 219 +- 25 files changed, 186 insertions(+), 68 deletions(-) -- 2.20.1
Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe,

This patch, "RISC-V: Handle differences between XTheadvector and Vector", addresses some code generation issues for RVV 1.0 instructions that xtheadvector does not have; it does not deal with intrinsics.

BTW, what about the following patch, "RISC-V: Add support for xtheadvector-specific intrinsics"? It adds support for new xtheadvector instructions. Is it OK to be merged?

Joshua

------------------------------------------------------------------
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023, 09:58
To: "cooper.joshua"; "gcc-patches"
Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

I am confused by the series patches. I thought this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641417.html
is enough to support the partial theadvector that can leverage RVV 1.0 directly?

Could you clean up and resend the patches based on the patch above (supposing it is merged already)?

juzhe.zh...@rivai.ai

From: Jun Sha (Joshua)
Date: 2023-12-29 09:46
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation between Vector and XTheadVector. In this version, we only support the subset of xtheadvector instructions that can be leveraged directly from current RVV 1.0 by simply adding the "th." prefix.

For XTheadVector instructions that have different names but share the same patterns as RVV 1.0 instructions, we will use an ASM target hook to rewrite the whole instruction string in the following patches.

For some vector patterns that cannot be avoided, we use "!TARGET_XTHEADVECTOR" to disable them in vector.md in order not to generate instructions that xtheadvector does not support, like vmv1r and vsext.vf2.

gcc/ChangeLog:

	* config.gcc: Add files for XTheadVector intrinsics.
	* config/riscv/autovec.md: Guard XTheadVector.
	* config/riscv/riscv-string.cc (expand_block_move):
	Guard XTheadVector.
	* config/riscv/riscv-v.cc (legitimize_move): New expansion.
	(get_prefer_tail_policy): Give specific value for tail.
	(get_prefer_mask_policy): Give specific value for mask.
	(vls_mode_valid_p): Avoid autovec.
	* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
	(build_one): New function.
	* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
	(DEF_THEAD_RVV_FUNCTION): Add new macros.
	(check_required_extensions):
	(handle_pragma_vector):
	* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
	(RVV_REQUIRE_XTHEADVECTOR):
	Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
	(struct function_group_info):
	* config/riscv/riscv-vector-switch.def (ENTRY):
	Disable fractional mode for the XTheadVector extension.
	(TUPLE_ENTRY): Likewise.
	* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
	* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
	Guard XTheadVector.
	(riscv_v_adjust_bytesize): Likewise.
	(riscv_preferred_simd_mode): Likewise.
	(riscv_autovectorize_vector_modes): Likewise.
	(riscv_vector_mode_supported_any_target_p): Likewise.
	(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
	* config/riscv/vector-iterators.md: Remove fractional LMUL.
	* config/riscv/vector.md: Include thead-vector.md.
	* config/riscv/riscv_th_vector.h: New file.
	* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
	* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
	* lib/target-supports.exp: Add target for XTheadVector.
Co-authored-by: Jin Ma Co-authored-by: Xianmiao Qu Co-authored-by: Christoph Müllner --- gcc/config.gcc | 2 +- gcc/config/riscv/autovec.md | 2 +- gcc/config/riscv/predicates.md | 8 +- gcc/config/riscv/riscv-string.cc | 3 + gcc/config/riscv/riscv-v.cc | 13 +- .../riscv/riscv-vector-builtins-bases.cc | 3 + .../riscv/riscv-vector-builtins-shapes.cc | 23 +++ gcc/config/riscv/riscv-vector-switch.def | 150 +++--- gcc/config/riscv/riscv-vsetvl.cc | 10 + gcc/config/riscv/riscv.cc | 20 +- gcc/config/riscv/riscv_th_vector.h | 49 + gcc/config/riscv/thead-vector.md | 142 + gcc/config/riscv/vector-iterators.md | 186 +- gcc/config/riscv/vector.md | 36 +++- .../gcc.target/riscv/rvv/base/abi-1.c | 2 +- .../gcc
Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe, This patch "RISC-V: Handle differences between XTheadvector and Vector" addresses some code generation issues for RVV 1.0 instructions that xtheadvector does not have; it does not deal with intrinsics. BTW, what about the following patch "RISC-V: Add support for xtheadvector-specific intrinsics"? It adds support for new xtheadvector instructions. Is it OK to be merged? Joshua -- From: juzhe.zh...@rivai.ai Sent: Friday, December 29, 2023, 09:58 To: "cooper.joshua"; "gcc-patches" Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu" Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector I am confused by the series patches. I thought this patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641417.html is enough to support the part of theadvector that can directly leverage RVV 1.0? Could you clean up and resend the patches based on the patch above (supposing it is merged already)? juzhe.zh...@rivai.ai From: Jun Sha (Joshua) Date: 2023-12-29 09:46 To: gcc-patches CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu Subject: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector This patch handles the differences in instruction generation between Vector and XTheadVector. In this version, we only support the partial xtheadvector instructions that can be derived directly from current RVV 1.0 by simply adding the "th." prefix. For xtheadvector instructions that have different names but share the same patterns as RVV 1.0 instructions, we will use the ASM target hook to rewrite the whole instruction string in the following patches. For some vector patterns that cannot be avoided, we use "!TARGET_XTHEADVECTOR" to disable them in vector.md so as not to generate instructions that xtheadvector does not support, like vmv1r and vsext.vf2. gcc/ChangeLog: * config.gcc: Add files for XTheadVector intrinsics.
[PATCH v1 1/8] LoongArch: testsuite: Add detection procedures supported by the target.
In order to improve and check vectorization support on the LoongArch architecture, detection procedures for the vector instruction sets are provided in target-supports.exp. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add LoongArch to the list of supported targets. --- gcc/testsuite/lib/target-supports.exp | 219 +++--- 1 file changed, 161 insertions(+), 58 deletions(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 14e3e119792..b90aaf8cabe 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3811,7 +3811,11 @@ proc add_options_for_bfloat16 { flags } { # (fma, fms, fnma, and fnms) for both float and double. proc check_effective_target_scalar_all_fma { } { -return [istarget aarch64*-*-*] +if { [istarget aarch64*-*-*] +|| [istarget loongarch*-*-*]} { + return 1 +} +return 0 } # Return 1 if the target supports compiling fixed-point, @@ -4017,7 +4021,7 @@ proc check_effective_target_vect_cmdline_needed { } { || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) || [istarget aarch64*-*-*] || [istarget amdgcn*-*-*] -|| [istarget riscv*-*-*]} { +|| [istarget riscv*-*-*] } { return 0 } else { return 1 @@ -4047,6 +4051,8 @@ proc check_effective_target_vect_int { } { && [check_effective_target_s390_vx]) || ([istarget riscv*-*-*] && [check_effective_target_riscv_v]) +|| ([istarget loongarch*-*-*] +&& [check_effective_target_loongarch_sx]) }}] } @@ -4176,7 +4182,9 @@ proc check_effective_target_vect_intfloat_cvt { } { || ([istarget s390*-*-*] && [check_effective_target_s390_vxe2]) || ([istarget riscv*-*-*] -&& [check_effective_target_riscv_v]) }}] +&& [check_effective_target_riscv_v]) +|| ([istarget loongarch*-*-*] +&& [check_effective_target_loongarch_sx]) }}] } # Return 1 if the target supports signed double->int conversion @@ -4197,7 +4205,9 @@ proc check_effective_target_vect_doubleint_cvt { } { || ([istarget s390*-*-*] && [check_effective_target_s390_vx]) || ([istarget 
riscv*-*-*] -&& [check_effective_target_riscv_v]) }}] +&& [check_effective_target_riscv_v]) +|| ([istarget loongarch*-*-*] +&& [check_effective_target_loongarch_sx]) }}] } # Return 1 if the target supports signed int->double conversion @@ -4218,7 +4228,9 @@ proc check_effective_target_vect_intdouble_cvt { } { || ([istarget s390*-*-*] && [check_effective_target_s390_vx]) || ([istarget riscv*-*-*] -&& [check_effective_target_riscv_v]) }}] +&& [check_effective_target_riscv_v]) +|| ([istarget loongarch*-*-*] +&& [check_effective_target_loongarch_sx]) }}] } #Return 1 if we're supporting __int128 for target, 0 otherwise. @@ -4251,7 +4263,9 @@ proc check_effective_target_vect_uintfloat_cvt { } { || ([istarget s390*-*-*] && [check_effective_target_s390_vxe2]) || ([istarget riscv*-*-*] -&& [check_effective_target_riscv_v]) }}] +&& [check_effective_target_riscv_v]) +|| ([istarget loongarch*-*-*] +&& [check_effective_target_loongarch_sx]) }}] } @@ -4270,7 +4284,9 @@ proc check_effective_target_vect_floatint_cvt { } { || ([istarget s390*-*-*] && [check_effective_target_s390_vxe2]) || ([istarget riscv*-*-*] -&& [check_effective_target_riscv_v]) }}] +&& [check_effective_target_riscv_v]) +|| ([istarget loongarch*-*-*] +&& [check_effective_target_loongarch_sx]) }}] } # Return 1 if the target supports unsigned float->int conversion @@ -4287,7 +4303,9 @@ proc check_effective_target_vect_floatuint_cvt { } { || ([istarget s390*-*-*] && [check_effective_target_s390_vxe2]) || ([istarget riscv*-*-*] - && [check_effective_target_riscv_v]) }}] + && [check_effective_target_riscv_v]) + || ([istarget loongarch*-*-*] + && [check_effective_target_loongarch_sx]) }}] } # Return 1 if the target supports vector integer char -> long long extend optab @@ -4296,7 +4314,9 @@ proc check_effective_target_vect_floatuint_cvt { } { proc check_effective_target_vect_ext_char_longlong { } { return [check_cached_effective_target_indexed vect_ext_char_longlong { expr { ([istarget riscv*-*-*] - && 
[check_effective_target_riscv_v]) }}] + && [check_effective_target_riscv_v]) + || ([istarget loongar
[PATCH v1 2/8] LoongArch: testsuite: Modify the test behavior of the vect-bic-bitmask-{12,23}.c files.
When the toolchain is built with a binutils that does not support vector instructions and a gcc that does, the GCC regression test results show that the vect-bic-bitmask-{12,23}.c files fail. The reason is that the tests go through both the compilation and the assembly stage, and the assembler does not recognize the vector instructions, while in fact only the compilation stage is needed. To solve this problem, change the default action from assemble to compile only, so that other architectures do not run into similar problems. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-bic-bitmask-12.c: Change the default action from assemble to compile. * gcc.dg/vect/vect-bic-bitmask-23.c: Ditto. --- gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c index 36ec5a8b19b..213e4c2a418 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c @@ -1,5 +1,5 @@ /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */ -/* { dg-do assemble } */ +/* { dg-do compile } */ /* { dg-additional-options "-O3 -fdump-tree-dce -w" } */ #include diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c index 5b4c3b6e19b..5dceb4bbcb6 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c @@ -1,5 +1,5 @@ /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */ -/* { dg-do assemble } */ +/* { dg-do compile } */ /* { dg-additional-options "-O1 -fdump-tree-dce -w" } */ #include -- 2.20.1
[PATCH v1 5/8] LoongArch: testsuite: Modify the test behavior in file pr60510.f.
When using a binutils that does not support vector instructions together with a gcc toolchain that does, regression testing found a FAIL entry for pr60510.f. The reason is that the test defaults to being executed, which causes problems at the assembly stage when the vector instructions cannot be recognized. To solve this problem, the default run behavior is removed, so that the behavior of the test depends on whether the target supports vectorization. gcc/testsuite/ChangeLog: * gfortran.dg/vect/pr60510.f: Delete the default run behavior of the test. --- gcc/testsuite/gfortran.dg/vect/pr60510.f | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/testsuite/gfortran.dg/vect/pr60510.f b/gcc/testsuite/gfortran.dg/vect/pr60510.f index 6cae82acece..d4fd42a664a 100644 --- a/gcc/testsuite/gfortran.dg/vect/pr60510.f +++ b/gcc/testsuite/gfortran.dg/vect/pr60510.f @@ -1,4 +1,3 @@ -! { dg-do run } ! { dg-require-effective-target vect_double } ! { dg-require-effective-target vect_intdouble_cvt } ! { dg-additional-options "-fno-inline -ffast-math" } -- 2.20.1
[PATCH v1 7/8] LoongArch: testsuite: Add additional vectorization "-mlsx" compilation option.
When GCC became able to detect vectorized test cases at the common layer, FAIL entries appeared in some test cases after regression testing. The cause of the failures is that no vectorization option was set when the tests were built, so no vector code could be generated; the additional "-mlsx" option therefore needs to be added on the LoongArch architecture. gcc/testsuite/ChangeLog: * gcc.dg/signbit-2.c: Add additional "-mlsx" compilation option. * gcc.dg/tree-ssa/scev-16.c: Ditto. * gfortran.dg/graphite/vect-pr40979.f90: Ditto. * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto. --- gcc/testsuite/gcc.dg/signbit-2.c | 1 + gcc/testsuite/gcc.dg/tree-ssa/scev-16.c| 1 + gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90| 1 + gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 + 4 files changed, 4 insertions(+) diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c index 62bb4047d74..2f65df16e43 100644 --- a/gcc/testsuite/gcc.dg/signbit-2.c +++ b/gcc/testsuite/gcc.dg/signbit-2.c @@ -5,6 +5,7 @@ /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */ /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */ /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */ +/* { dg-additional-options "-mlsx" { target loongarch*-*-* } } */ /* { dg-skip-if "no fallback for MVE" { arm_mve } } */ #include diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c index 120f40c0b6c..acaa1156419 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-require-effective-target vect_int } */ /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ +/* { dg-additional-options "-mlsx" { target loongarch*-*-* } } */ int A[1024 * 2]; diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 
b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 index a42290948c4..4c251aacbe3 100644 --- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 +++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90 @@ -1,6 +1,7 @@ ! { dg-do compile } ! { dg-require-effective-target vect_double } ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } +! { dg-additional-options "-mlsx" { target loongarch*-*-* } } module mqc_m integer, parameter, private :: longreal = selected_real_kind(15,90) diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f index 08965cc5e20..97b88821731 100644 --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f @@ -2,6 +2,7 @@ ! { dg-require-effective-target vect_double } ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" } ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } } +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } } ! { dg-additional-options "-mzarch" { target { s390*-*-* } } } *** RESID COMPUTES THE RESIDUAL: R = V - AU -- 2.20.1
[PATCH v1 6/8] LoongArch: testsuite: Add additional vectorization "-mlasx" compilation option.
After the detection procedures under the gcc.dg/vect directory were added to GCC, FAIL entries for vector widening-multiplication transformations of various types appeared in the gcc regression test results. Debugging analysis shows that the main problem is that the 128-bit vector support for the LoongArch architecture does not implement these operations. To solve this problem, the "-mlasx" option is used to enable the 256-bit vector implementation. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-pattern-1.c: Add the "-mlasx" compilation option on LoongArch so that vectorized code is generated. * gcc.dg/vect/slp-widen-mult-half.c: Ditto. * gcc.dg/vect/vect-widen-mult-const-s16.c: Ditto. * gcc.dg/vect/vect-widen-mult-const-u16.c: Ditto. * gcc.dg/vect/vect-widen-mult-half-u8.c: Ditto. * gcc.dg/vect/vect-widen-mult-half.c: Ditto. * gcc.dg/vect/vect-widen-mult-u16.c: Ditto. * gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Ditto. * gcc.dg/vect/vect-widen-mult-u8-u32.c: Ditto. * gcc.dg/vect/vect-widen-mult-u8.c: Ditto. 
--- gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c | 1 + gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c| 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c | 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c | 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c| 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c | 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c| 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c | 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c | 1 + gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c | 1 + 10 files changed, 10 insertions(+) diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c index a3ff0f5b3da..5ae99225273 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c @@ -1,4 +1,5 @@ /* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-mlasx" { target loongarch*-*-* } } */ #include #include "tree-vect.h" diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c index 72811eb852e..b69ade33886 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c +++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c @@ -1,6 +1,7 @@ /* Disabling epilogues until we find a better way to deal with scans. 
*/ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ /* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-mlasx" { target loongarch*-*-* } } */ #include "tree-vect.h" diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c index dfbb2171c00..53c9b84ca01 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c @@ -2,6 +2,7 @@ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ /* { dg-require-effective-target vect_int } */ /* { dg-additional-options "-fno-ipa-icf" } */ +/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */ #include "tree-vect.h" diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c index c2ad58f69e7..e9db8285b66 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c @@ -2,6 +2,7 @@ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ /* { dg-require-effective-target vect_int } */ /* { dg-additional-options "-fno-ipa-icf" } */ +/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */ #include "tree-vect.h" diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c index bfdcbaa09fb..607f3178f90 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c @@ -2,6 +2,7 @@ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ /* { dg-require-effective-target vect_int } */ /* { dg-additional-options "-fno-ipa-icf" } */ +/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */ #include "tree-vect.h" diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c index e46b0cc3135..cd13d826937 100644 --- 
a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c @@ -1,6 +1,7 @@ /* Disabling epilogues until we find a better way to deal with scans. */ /* { dg-additional-options "--param vect-epilogues-nomask=0" } */ /* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-mlasx" { target loongarch*-*-*} } */ #include "tree-vect.h" diff --git a/gcc/testsui
[PATCH v1 3/8] LoongArch: testsuite: Add test support for vect-{82,83}.c.
When gcc runs the tests under gcc.dg/vect, it is found that vect-{82,83}.c are treated as unsupported. Analysis shows that the LoongArch architecture does support what these test cases check. Therefore, LoongArch is added to the target selectors of these tests so that they are no longer skipped. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-82.c: Add the LoongArch architecture to the target detection framework. * gcc.dg/vect/vect-83.c: Ditto. --- gcc/testsuite/gcc.dg/vect/vect-82.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-83.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/vect-82.c b/gcc/testsuite/gcc.dg/vect/vect-82.c index 4b2d5a8a464..5c761e92a3a 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-82.c +++ b/gcc/testsuite/gcc.dg/vect/vect-82.c @@ -1,4 +1,4 @@ -/* { dg-skip-if "powerpc and integer vectorization only" { ! { powerpc*-*-* && vect_int } } } */ +/* { dg-skip-if "powerpc/loongarch and integer vectorization only" { ! { { powerpc*-*-* || loongarch*-*-* } && vect_int } } } */ /* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */ #include diff --git a/gcc/testsuite/gcc.dg/vect/vect-83.c b/gcc/testsuite/gcc.dg/vect/vect-83.c index 1a173daa140..7fe1b050cee 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-83.c +++ b/gcc/testsuite/gcc.dg/vect/vect-83.c @@ -1,4 +1,4 @@ -/* { dg-skip-if "powerpc and integer vectorization only" { ! { powerpc*-*-* && vect_int } } } */ +/* { dg-skip-if "powerpc/loongarch and integer vectorization only" { ! { { powerpc*-*-* || loongarch*-*-* } && vect_int } } } */ /* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */ #include -- 2.20.1
[PATCH v1 4/8] LoongArch: testsuite: Fix FAIL in file bind_c_array_params_2.f90.
The GCC regression test results show that the bind_c_array_params_2.f90 test fails. Analysis shows that the test fails because the regular expression in the test cannot match the assembly code generated for the call on the LoongArch architecture (such as bl %plt(myBindC)); it only matches forms such as the x86 function call (call myBindC). gcc/testsuite/ChangeLog: * gfortran.dg/bind_c_array_params_2.f90: Add code test rules to support testing of the LoongArch architecture. --- gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 b/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 index 0825efc7a2f..aa6a37b4850 100644 --- a/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 +++ b/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 @@ -2,6 +2,7 @@ ! { dg-options "-std=f2008ts -fdump-tree-original" } ! { dg-additional-options "-mno-explicit-relocs" { target alpha*-*-* } } ! { dg-additional-options "-mno-relax-pic-calls" { target mips*-*-* } } +! { dg-additional-options "-fplt -mcmodel=normal" { target loongarch*-*-* } } ! ! Check that assumed-shape variables are correctly passed to BIND(C) ! as defined in TS 29913 @@ -16,7 +17,8 @@ integer :: aa(4,4) call test(aa) end -! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC" 1 { target { ! { hppa*-*-* s390*-*-* *-*-cygwin* amdgcn*-*-* powerpc-ibm-aix* *-*-ming* } } } } } +! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC" 1 { target { ! { hppa*-*-* s390*-*-* *-*-cygwin* amdgcn*-*-* powerpc-ibm-aix* *-*-ming* loongarch*-*-* } } } } } +! { dg-final { scan-assembler-times "bl\t%plt\\(myBindC\\)" 1 { target loongarch*-*-* } } } ! { dg-final { scan-assembler-times "myBindC,%r2" 1 { target { hppa*-*-* } } } } ! { dg-final { scan-assembler-times "call\tmyBindC" 1 { target { *-*-cygwin* *-*-ming* } } } } ! 
{ dg-final { scan-assembler-times "brasl\t%r\[0-9\]*,myBindC" 1 { target { s390*-*-* } } } } -- 2.20.1
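Why the generic scan-assembler pattern misses the LoongArch call can be reproduced outside the testsuite. The sketch below (plain Python `re`; the sample assembly lines are hand-written assumptions modeled on the forms quoted in the commit message) shows that the `%plt(...)` wrapper prevents the generic pattern from matching, which is why a LoongArch-specific rule is added:

```python
import re

# Generic pattern from the test: the callee name must follow whitespace,
# optionally preceded by characters like "$", ",", "_", digits
# (matches e.g. the x86 form "call\tmyBindC").
generic = re.compile(r"[ \t][$,_0-9]*myBindC")
# LoongArch-specific pattern added by the patch: the callee is wrapped
# in a %plt(...) relocation operator.
loong = re.compile(r"bl\t%plt\(myBindC\)")

x86_asm = "\tcall\tmyBindC"        # assumed x86-style call line
loong_asm = "\tbl\t%plt(myBindC)"  # assumed LoongArch-style call line

assert generic.search(x86_asm) is not None   # generic rule counts this
assert generic.search(loong_asm) is None     # "(" breaks the generic match
assert loong.search(loong_asm) is not None   # the new rule matches instead
```

This is why the patch both excludes loongarch*-*-* from the generic `scan-assembler-times` rule and adds a dedicated one.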
[PATCH v1 8/8] LoongArch: testsuite: Modify the result check in the FMA files.
After vectorization was enabled at the common layer, some FAIL items appeared in the GCC regression tests, such as gcc.dg/fma-{3,4,6,7}.c. On the LoongArch architecture, for example, the fmsub.s instruction computes a*b-c, and negating that result to obtain c-a*b can differ from a directly computed c-a*b in the sign of zero; detection of such patterns therefore needs to be marked as unsupported on LoongArch. gcc/testsuite/ChangeLog: * gcc.dg/fma-3.c: The dump for this test does not contain the .FNMA operation on LoongArch, so skip the scan rule there. * gcc.dg/fma-4.c: Likewise for the .FNMS operation. * gcc.dg/fma-6.c: The cause is the same as fma-3.c. * gcc.dg/fma-7.c: The cause is the same as fma-4.c. --- gcc/testsuite/gcc.dg/fma-3.c | 2 +- gcc/testsuite/gcc.dg/fma-4.c | 2 +- gcc/testsuite/gcc.dg/fma-6.c | 2 +- gcc/testsuite/gcc.dg/fma-7.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.dg/fma-3.c b/gcc/testsuite/gcc.dg/fma-3.c index 699aa2c9530..6649b54b6f9 100644 --- a/gcc/testsuite/gcc.dg/fma-3.c +++ b/gcc/testsuite/gcc.dg/fma-3.c @@ -12,4 +12,4 @@ f2 (double a, double b, double c) return c - a * b; } -/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target { scalar_all_fma && { ! 
loongarch*-*-* } } } } } */ diff --git a/gcc/testsuite/gcc.dg/fma-4.c b/gcc/testsuite/gcc.dg/fma-4.c index bff928f1fac..f1701c1961a 100644 --- a/gcc/testsuite/gcc.dg/fma-4.c +++ b/gcc/testsuite/gcc.dg/fma-4.c @@ -12,4 +12,4 @@ f2 (double a, double b, double c) return -(a * b) - c; } -/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target { scalar_all_fma && { ! loongarch*-*-* } } } } } */ diff --git a/gcc/testsuite/gcc.dg/fma-6.c b/gcc/testsuite/gcc.dg/fma-6.c index 87258cec4a2..9e49b62b6de 100644 --- a/gcc/testsuite/gcc.dg/fma-6.c +++ b/gcc/testsuite/gcc.dg/fma-6.c @@ -64,4 +64,4 @@ f10 (double a, double b, double c) return -__builtin_fma (a, b, -c); } -/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target scalar_all_fma } } } */ +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target { scalar_all_fma && { ! loongarch*-*-* } } } } } */ diff --git a/gcc/testsuite/gcc.dg/fma-7.c b/gcc/testsuite/gcc.dg/fma-7.c index f409cc8ee3c..86aacad7b90 100644 --- a/gcc/testsuite/gcc.dg/fma-7.c +++ b/gcc/testsuite/gcc.dg/fma-7.c @@ -64,4 +64,4 @@ f10 (double a, double b, double c) return -__builtin_fma (a, b, c); } -/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target scalar_all_fma } } } */ +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target { scalar_all_fma && { ! loongarch*-*-* } } } } } */ -- 2.20.1
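The sign-of-zero problem described in the commit message can be demonstrated in a few lines. This sketch models the result shape -(a*b - c) that a negated multiply-subtract would produce for the source expression c - a*b; the fused instruction is modeled unfused here, since fusion itself does not affect which zero comes out:

```python
import math

def neg_msub(a, b, c):
    # Models -(a*b - c): the negation of the multiply-subtract result,
    # as opposed to computing c - a*b directly.
    return -(a * b - c)

a, b, c = 2.0, 3.0, 6.0
plain = c - a * b          # IEEE 754: 6.0 - 6.0 yields +0.0
negated = neg_msub(a, b, c)  # -(6.0 - 6.0) yields -0.0

# The two results compare equal, yet carry different zero signs --
# which is exactly why the transformation is not sign-of-zero safe.
assert plain == negated
assert math.copysign(1.0, plain) == 1.0
assert math.copysign(1.0, negated) == -1.0
```

Because GCC must preserve the sign of zero at default options, it cannot form the .FNMA/.FNMS operations this way, and the dump scans are skipped on LoongArch.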
Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe, For vector_csr_operand, please refer to https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641124.html. Joshua -- From: juzhe.zh...@rivai.ai Sent: Friday, December 29, 2023, 10:14 To: "cooper.joshua"; "gcc-patches" Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu" Subject: Re: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector No, we should handle this carefully, step by step. First, after the first kind of theadvector is merged, we can talk about the second kind of theadvector later. I am confused by this patch, for example: (define_predicate "vector_csr_operand"- (ior (match_operand 0 "const_csr_operand")- (match_operand 0 "register_operand")))+ (ior (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")+ (match_operand 0 "const_csr_operand"))+(match_operand 0 "register_operand"))) I just checked the upstream code; we don't have vector_csr_operand. So, to let me easily review and trace the code, please send the patches better organized. Thanks. juzhe.zh...@rivai.ai
Joshua
Re:Re:[PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
We do not have vector_length_operand in vsetvl patterns. (define_insn "@vsetvl" [(set (match_operand:P 0 "register_operand" "=r") (unspec:P [(match_operand:P 1 "vector_csr_operand" "rK") (match_operand 2 "const_int_operand" "i") (match_operand 3 "const_int_operand" "i") (match_operand 4 "const_int_operand" "i") (match_operand 5 "const_int_operand" "i")] UNSPEC_VSETVL)) (set (reg:SI VL_REGNUM) (unspec:SI [(match_dup 1) (match_dup 2) (match_dup 3)] UNSPEC_VSETVL)) (set (reg:SI VTYPE_REGNUM) (unspec:SI [(match_dup 2) (match_dup 3) (match_dup 4) (match_dup 5)] UNSPEC_VSETVL))] "TARGET_VECTOR" "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5" [(set_attr "type" "vsetvl") (set_attr "mode" "") (set (attr "sew") (symbol_ref "INTVAL (operands[2])")) (set (attr "vlmul") (symbol_ref "INTVAL (operands[3])")) (set (attr "ta") (symbol_ref "INTVAL (operands[4])")) (set (attr "ma") (symbol_ref "INTVAL (operands[5])"))]) -- From: juzhe.zh...@rivai.ai Sent: Friday, December 29, 2023, 10:22 To: "cooper.joshua"; "gcc-patches" Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu" Subject: Re: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector Why add vector_csr_operand? Why not use vector_length_operand? juzhe.zh...@rivai.ai From: joshua Sent: 2023-12-29 10:17 To: juzhe.zh...@rivai.ai; gcc-patches Cc: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector Hi Juzhe, For vector_csr_operand, please refer to https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641124.html. Joshua -- From: juzhe.zh...@rivai.ai Sent: Friday, December 29, 2023, 10:14 To: "cooper.joshua"; "gcc-patches" Cc: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu" Subject: Re: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector No, we should handle this carefully, step by step. 
First, after the first kind of theadvector is merged, we can talk about the second kind of theadvector later. I am confused by this patch, for example:

(define_predicate "vector_csr_operand"
-  (ior (match_operand 0 "const_csr_operand")
-       (match_operand 0 "register_operand")))
+  (ior (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")
+            (match_operand 0 "const_csr_operand"))
+       (match_operand 0 "register_operand")))

I just checked the upstream code; we don't have vector_csr_operand. So, to make it easy for me to review and trace the code, please send the patches better organized. Thanks.

juzhe.zh...@rivai.ai

From: joshua
Sent: 2023-12-29 10:09
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Jim Wilson; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; jinma; cooper.qu
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

Hi Juzhe,

This patch "RISC-V: Handle differences between XTheadvector and Vector" addresses some code generation issues for RVV1.0 instructions that xtheadvector does not have; it does not deal with intrinsics. BTW, what about the following patch "RISC-V: Add support for xtheadvector-specific intrinsics"? It adds support for new xtheadvector instructions. Is it OK to be merged?

Joshua

--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 09:58
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; "cooper.joshua"; jinma; "cooper.qu"
Subject: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

I am confused by the series of patches. I thought this patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641417.html is enough to support the partial theadvector that can leverage RVV1.0 directly. Could you clean up and resend the patches based on the patch above (supposing it is merged already)?
juzhe.zh...@rivai.ai

From: Jun Sha (Joshua)
Date: 2023-12-29 09:46
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

This patch is to handle the differences in instruction generation between Vector and XTheadVector. In this version, we only support the subset of XTheadVector instructions that can be leveraged directly from current RVV1.0 by simply adding the "th." prefix. For XTheadVector instructions that have different names but share same
Re: [PATCH] MIPS: Implement TARGET_INSN_COSTS
Roger Sayle wrote on Friday, December 29, 2023 at 00:54:
>
> The current (default) behavior is that when the target doesn't define
> TARGET_INSN_COST the middle-end uses the backend's TARGET_RTX_COSTS,
> so multiplications are slower than additions, but about the same size
> when optimizing for size (with -Os or -Oz).
>
> All of this gets disabled with your proposed patch.  [If you don't
> check speed, you probably shouldn't touch insn_cost].
>
> I agree that a backend can fine tune the (speed and size) costs of
> instructions (especially complex !single_set instructions) via
> attributes in the machine description, but these should be used to
> override/fine-tune rtx_costs, not override/replace/duplicate them.
>
> Having accurate rtx_costs also helps RTL expansion and the earlier
> optimizers, but insn_cost is used by combine and the later RTL
> optimization passes, once instructions have been recognized.

Yes. I found this problem when trying to combine sign_extend and zero_extract. When I tried to add a new define_insn for

(set (reg/v:DI 200 [ val ])
     (sign_extend:DI
       (ior:SI (and:SI (subreg:SI (reg/v:DI 200 [ val ]) 0)
                       (const_int 16777215 [0xffffff]))
               (ashift:SI (subreg:SI (reg:QI 205 [ MEM[(const unsigned char *)buf_8(D) + 3B] ]) 0)
                          (const_int 24 [0x18])))))

to generate an `ins` instruction, it was refused by `combine_validate_cost`, because `combine_validate_cost` considers our RTX to have cost COSTS_N_INSNS (3) instead of COSTS_N_INSNS (1). So we need a method to override that.

I guess we need a framework for all ports: `rtx_cost` should also tell me how many instructions it believes an RTX takes. That may help us accept some more complex RTX insns and convert them to 1 or 2 instructions, so we can combine insns more aggressively. If so, we can calculate a ratio: total / insn_count. For MUL/DIV, the ratio may be a number > COSTS_N_INSNS (1). For our example above, the ratio will be COSTS_N_INSNS (1). So we can decide whether we should accept the new RTX.
> Might I also recommend that instead of insn_count*perf_ratio*4, or
> even the slightly better COSTS_N_INSNS (insn_count*perf_ratio), you
> encode the relative cost in the attribute, avoiding the multiplication
> (at runtime), and allowing fine tuning like "COSTS_N_INSNS (2) - 1".
> Likewise, COSTS_N_BYTES is a very useful macro for a backend to
> define/use in rtx_costs.  Conveniently for many RISC machines,
> 1 instruction takes about 4 bytes, so COSTS_N_INSNS (1) is
> (approximately) comparable to COSTS_N_BYTES (4).
>
> I hope this helps.  Perhaps something like:
>
> static int
> mips_insn_cost (rtx_insn *insn, bool speed)
> {
>   int cost;
>   if (recog_memoized (insn) >= 0)
>     {
>       if (speed)
>         {
>           /* Use cost if provided.  */
>           cost = get_attr_cost (insn);
>           if (cost > 0)
>             return cost;
>         }
>       else
>         {
>           /* If optimizing for size, we want the insn size.  */
>           return get_attr_length (insn);
>         }
>     }
>
>   if (rtx set = single_set (insn))
>     cost = set_rtx_cost (set, speed);
>   else
>     cost = pattern_cost (PATTERN (insn), speed);
>
>   /* If the cost is zero, then it's likely a complex insn.  We don't
>      want the cost of these to be less than something we know about.  */
>   return cost ? cost : COSTS_N_INSNS (2);
> }
Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector
Hi Juzhe,

These vsetvl patterns were initially written by you with csr_operand. Are you sure it can be replaced by vector_length_operand?

Joshua

--
From: juzhe.zh...@rivai.ai
Sent: Friday, December 29, 2023 10:25
To: "cooper.joshua"; "gcc-patches"
CC: Jim Wilson; palmer; andrew; "philipp.tomsich"; jeffreyalaw; "christoph.muellner"; jinma; "cooper.qu"
Subject: Re: Re: [PATCH v4 5/6] RISC-V: Handle differences between XTheadvector and Vector

Change it into vector_length_operand.

juzhe.zh...@rivai.ai
[PATCH v1] LoongArch: testsuite: Add the "-ffast-math" compilation option for vect-fmin-3.c.
After detection of maximum reduction was enabled on the LoongArch architecture, GCC regression testing found that vect-fmin-3.c fails. Currently, only the aarch64, arm, riscv, and LoongArch architectures are covered in the target-supports.exp file. Analysis shows that the "-ffast-math" compilation option needs to be added to the test case so that the loop can be successfully vectorized as a reduction. The test was originally submitted by Richard Sandiford; the original commit is:

commit e32b9eb32d7cd2d39bf9c70497890ac61b9ee14c

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-fmin-3.c: Add an extra "-ffast-math" to the
	compilation options to ensure that the loop can be successfully
	reduced to a maximum.

---
 gcc/testsuite/gcc.dg/vect/vect-fmin-3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c b/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
index 2e282ba6878..edef57925c1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-fmin-3.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-ffast-math" } */
 
 #include "tree-vect.h"
-- 
2.20.1
[PATCH v4] RISC-V: Support XTheadVector extension
This patch series presents the GCC implementation of the XTheadVector extension [1].

[1] https://github.com/T-head-Semi/thead-extension-spec/

For some vector patterns that cannot be avoided, we use "!TARGET_XTHEADVECTOR" to disable them so as not to generate instructions that xtheadvector does not support, causing 36 changes in vector.md. For the "th." prefix issue, we use current_output_insn and the ASM_OUTPUT_OPCODE hook instead of directly modifying patterns in vector.md.

We have run the GCC test suite and can confirm that there are no regressions. All the test results can be found at the following links.
Run without xtheadvector: https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803686.html
Run with xtheadvector: https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/803687.html

Furthermore, we have run the tests in https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/main/examples, and all the tests passed.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner

RISC-V: Refactor riscv-vector-builtins-bases.cc
RISC-V: Change csr_operand into vector_length_operand for vsetvl patterns
RISC-V: Introduce XTheadVector as a subset of V1.0.0
RISC-V: Adds the prefix "th." for the instructions of XTheadVector
RISC-V: Handle differences between XTheadvector and Vector
RISC-V: Add support for xtheadvector-specific intrinsics
RISC-V: ...
---
 gcc/common/config/riscv/riscv-common.cc       |   23 +
 gcc/config.gcc                                |    4 +-
 gcc/config/riscv/autovec.md                   |    2 +-
 gcc/config/riscv/predicates.md                |    8 +-
 gcc/config/riscv/riscv-c.cc                   |    8 +-
 gcc/config/riscv/riscv-protos.h               |    1 +
 gcc/config/riscv/riscv-string.cc              |    3 +
 gcc/config/riscv/riscv-v.cc                   |   13 +-
 .../riscv/riscv-vector-builtins-bases.cc      |   18 +-
 .../riscv/riscv-vector-builtins-bases.h       |   19 +
 .../riscv/riscv-vector-builtins-shapes.cc     |  149 +
 .../riscv/riscv-vector-builtins-shapes.h      |    3 +
 .../riscv/riscv-vector-builtins-types.def     |  120 +
 gcc/config/riscv/riscv-vector-builtins.cc     |  315 +-
 gcc/config/riscv/riscv-vector-builtins.h      |    5 +-
 gcc/config/riscv/riscv-vector-switch.def      |  150 +-
 gcc/config/riscv/riscv.cc                     |   46 +-
 gcc/config/riscv/riscv.h                      |    4 +
 gcc/config/riscv/riscv.opt                    |    2 +
 gcc/config/riscv/riscv_th_vector.h            |   49 +
 gcc/config/riscv/t-riscv                      |   16 +
 .../riscv/thead-vector-builtins-functions.def |  659
 gcc/config/riscv/thead-vector-builtins.cc     |  887 ++
 gcc/config/riscv/thead-vector-builtins.h      |  123 +
 gcc/config/riscv/thead-vector.md              | 2827 +
 gcc/config/riscv/vector-iterators.md          |  186 +-
 gcc/config/riscv/vector.md                    |   44 +-
 .../riscv/predef-__riscv_th_v_intrinsic.c     |   11 +
 .../gcc.target/riscv/rvv/base/abi-1.c         |    2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c      |    2 +-
 .../gcc.target/riscv/rvv/xtheadvector.c       |   13 +
 .../riscv/rvv/xtheadvector/prefix.c           |   12 +
 .../riscv/rvv/xtheadvector/vlb-vsb.c          |   68 +
 .../riscv/rvv/xtheadvector/vlbu-vsb.c         |   68 +
 .../riscv/rvv/xtheadvector/vlh-vsh.c          |   68 +
 .../riscv/rvv/xtheadvector/vlhu-vsh.c         |   68 +
 .../riscv/rvv/xtheadvector/vlw-vsw.c          |   68 +
 .../riscv/rvv/xtheadvector/vlwu-vsw.c         |   68 +
 gcc/testsuite/lib/target-supports.exp         |   12 +
 39 files changed, 5931 insertions(+), 213 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector-builtins.cc
 create mode 100644 gcc/config/riscv/thead-vector-builtins.h
 create mode 100644 gcc/config/riscv/thead-vector.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c
[PATCH v4] RISC-V: Refactor riscv-vector-builtins-bases.cc
This patch moves the definition of the enums lst_type and frm_op_type into riscv-vector-builtins-bases.h and removes the static visibility of fold_fault_load (), so that these can be used in other compilation units.

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-bases.cc (enum lst_type):
	Move to riscv-vector-builtins-bases.h.
	(enum frm_op_type): Likewise.
	(fold_fault_load): Remove static visibility.
	* config/riscv/riscv-vector-builtins-bases.h
	(GCC_RISCV_VECTOR_BUILTINS_BASES_H): Add header files.
	(enum lst_type): Move from riscv-vector-builtins-bases.cc.
	(enum frm_op_type): Likewise.
	(fold_fault_load): Declare.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner
---
 .../riscv/riscv-vector-builtins-bases.cc      | 18 +-----------------
 .../riscv/riscv-vector-builtins-bases.h       | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..c51affde353 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -48,24 +48,8 @@ using namespace riscv_vector;
 
 namespace riscv_vector {
 
-/* Enumerates types of loads/stores operations.
-   It's only used in here so we don't define it
-   in riscv-vector-builtins-bases.h.  */
-enum lst_type
-{
-  LST_UNIT_STRIDE,
-  LST_STRIDED,
-  LST_INDEXED,
-};
-
-enum frm_op_type
-{
-  NO_FRM,
-  HAS_FRM,
-};
-
 /* Helper function to fold vleff and vlsegff.  */
-static gimple *
+gimple *
 fold_fault_load (gimple_folder &f)
 {
   /* fold fault_load (const *base, size_t *new_vl, size_t vl)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 131041ea66f..42d0cd17dc1 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -21,8 +21,27 @@
 #ifndef GCC_RISCV_VECTOR_BUILTINS_BASES_H
 #define GCC_RISCV_VECTOR_BUILTINS_BASES_H
 
+#include "gimple.h"
+#include "riscv-vector-builtins.h"
+
 namespace riscv_vector {
 
+/* Enumerates types of loads/stores operations.  */
+enum lst_type
+{
+  LST_UNIT_STRIDE,
+  LST_STRIDED,
+  LST_INDEXED,
+};
+
+enum frm_op_type
+{
+  NO_FRM,
+  HAS_FRM,
+};
+
+extern gimple *fold_fault_load (gimple_folder &f);
+
 namespace bases {
 extern const function_base *const vsetvl;
 extern const function_base *const vsetvlmax;
-- 
2.17.1
[PATCH v4] RISC-V: Change csr_operand into vector_length_operand for vsetvl patterns.
This patch uses vector_length_operand instead of csr_operand for vsetvl patterns, so that changes for vector will not affect scalar patterns using csr_operand in riscv.md.

gcc/ChangeLog:

	* config/riscv/vector.md: Use vector_length_operand for vsetvl
	patterns.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner
---
 gcc/config/riscv/vector.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f607d768b26..b5a9055cdc4 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1496,7 +1496,7 @@
 (define_insn "@vsetvl<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:P 1 "csr_operand" "rK")
+	(unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
 		   (match_operand 2 "const_int_operand" "i")
 		   (match_operand 3 "const_int_operand" "i")
 		   (match_operand 4 "const_int_operand" "i")
@@ -1542,7 +1542,7 @@
 ;; in vsetvl instruction pattern.
 (define_insn "@vsetvl_discard_result"
   [(set (reg:SI VL_REGNUM)
-	(unspec:SI [(match_operand:P 0 "csr_operand" "rK")
+	(unspec:SI [(match_operand:P 0 "vector_length_operand" "rK")
 		    (match_operand 1 "const_int_operand" "i")
 		    (match_operand 2 "const_int_operand" "i")] UNSPEC_VSETVL))
    (set (reg:SI VTYPE_REGNUM)
@@ -1564,7 +1564,7 @@
 ;; such pattern can allow us gain benefits of these optimizations.
 (define_insn_and_split "@vsetvl_no_side_effects"
   [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:P 1 "csr_operand" "rK")
+	(unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
 		   (match_operand 2 "const_int_operand" "i")
 		   (match_operand 3 "const_int_operand" "i")
 		   (match_operand 4 "const_int_operand" "i")
@@ -1608,7 +1608,7 @@
   [(set (match_operand:DI 0 "register_operand")
	(sign_extend:DI
	  (subreg:SI
-	    (unspec:DI [(match_operand:P 1 "csr_operand")
+	    (unspec:DI [(match_operand:P 1 "vector_length_operand")
			(match_operand 2 "const_int_operand")
			(match_operand 3 "const_int_operand")
			(match_operand 4 "const_int_operand")
-- 
2.17.1
[PATCH v4] RISC-V: Introduce XTheadVector as a subset of V1.0.0
This patch is to introduce basic XTheadVector support (march string parsing and a test for __riscv_xtheadvector) according to https://github.com/T-head-Semi/thead-extension-spec/

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc
	(riscv_subset_list::parse): Add new vendor extension.
	* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add test
	macro.
	* config/riscv/riscv.opt: Add new mask.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/predef-__riscv_th_v_intrinsic.c: New test.
	* gcc.target/riscv/rvv/xtheadvector.c: New test.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner
---
 gcc/common/config/riscv/riscv-common.cc            | 23 +++
 gcc/config/riscv/riscv-c.cc                        |  8 +--
 gcc/config/riscv/riscv.opt                         |  2 ++
 .../riscv/predef-__riscv_th_v_intrinsic.c          | 11 +
 .../gcc.target/riscv/rvv/xtheadvector.c            | 13 +++
 5 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index f20d179568d..66b20c154a9 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -368,6 +368,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadvector", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1251,6 +1252,15 @@ riscv_subset_list::check_conflict_ext ()
       if (lookup ("zcmp"))
	error_at (m_loc, "%<-march=%s%>: zcd conflicts with zcmp", m_arch);
     }
+
+  if ((lookup ("v") || lookup ("zve32x")
+       || lookup ("zve64x") || lookup ("zve32f")
+       || lookup ("zve64f") || lookup ("zve64d")
+       || lookup ("zvl32b") || lookup ("zvl64b")
+       || lookup ("zvl128b") || lookup ("zvfh"))
+      && lookup ("xtheadvector"))
+    error_at (m_loc,
+	      "%<-march=%s%>: xtheadvector conflicts with vector "
+	      "extension or its sub-extensions", m_arch);
 }
 
 /* Parsing function for multi-letter extensions.
@@ -1743,6 +1753,19 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"xtheadmemidx",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
   {"xtheadmempair", &gcc_options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
   {"xtheadsync",    &gcc_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
+  {"xtheadvector",  &gcc_options::x_riscv_xthead_subext, MASK_XTHEADVECTOR},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_32},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64},
+  {"xtheadvector",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16},
+  {"xtheadvector",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL32B},
+  {"xtheadvector",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL64B},
+  {"xtheadvector",  &gcc_options::x_riscv_zvl_flags, MASK_ZVL128B},
+  {"xtheadvector",  &gcc_options::x_riscv_zf_subext, MASK_ZVFHMIN},
+  {"xtheadvector",  &gcc_options::x_riscv_zf_subext, MASK_ZVFH},
+  {"xtheadvector",  &gcc_options::x_target_flags, MASK_FULL_V},
+  {"xtheadvector",  &gcc_options::x_target_flags, MASK_VECTOR},
 
   {"xventanacondops", &gcc_options::x_riscv_xventana_subext, MASK_XVENTANACONDOPS},
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index d70eb8ed361..d7c63ead147 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -138,6 +138,10 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
				     riscv_ext_version_value (0, 11));
     }
 
+  if (TARGET_XTHEADVECTOR)
+    builtin_define_with_int_value ("__riscv_th_v_intrinsic",
+				   riscv_ext_version_value (0, 11));
+
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
 
@@ -191,8 +195,8 @@ riscv_pragma_intrinsic (cpp_reader *)
     {
       if (!TARGET_VECTOR)
	{
-	  error ("%<#pragma riscv intrinsic%> option %qs needs 'V' extension "
-		 "enabled",
+	  error ("%<#pragma riscv intrinsic%> option %qs needs 'V' or "
+		 "'XTHEADVECTOR' extension enabled",
		 name);
	  return;
	}
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ede2d655e73..7de5f18e11b 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -449,6 +449,8 @@ Mask(XTHEADMEMP
[PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.
This patch adds the "th." prefix to all XTheadVector instructions by implementing a new assembly output function. We only check that the prefix is 'v', so no extra attribute is needed.

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (riscv_asm_output_opcode): New
	function to add assembler insn code prefix/suffix.
	* config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
	* config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner
---
 gcc/config/riscv/riscv-protos.h                     |  1 +
 gcc/config/riscv/riscv.cc                           | 14 ++++++++++++++
 gcc/config/riscv/riscv.h                            |  4 ++++
 .../gcc.target/riscv/rvv/xtheadvector/prefix.c      | 12 ++++++++++++
 4 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..5ea54b45703 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,7 @@ struct riscv_address_info {
 };
 
 /* Routines implemented in riscv.cc.  */
+extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
 extern int riscv_float_const_rtx_index_for_fli (rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..ea1d59d9cf2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
   return lmul;
 }
 
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  /* We need to add the "th." prefix to all the XTheadVector
+     instructions here.  */
+  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX
+      && p[0] == 'v')
+    fputs ("th.", asm_out_file);
+
+  return p;
+}
+
 /* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
 
    'h'	Print the high-part relocation associated with OP, after stripping
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..c33361a254d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   asm_fprintf ((FILE), "%U%s", (NAME));\
 } while (0)
 
+#undef ASM_OUTPUT_OPCODE
+#define ASM_OUTPUT_OPCODE(STREAM, PTR)	\
+  (PTR) = riscv_asm_output_opcode (STREAM, PTR)
+
 #define JUMP_TABLES_IN_TEXT_SECTION 0
 #define CASE_VECTOR_MODE SImode
 #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
new file mode 100644
index 000..eee727ef6b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
+{
+  return __riscv_vadd_vv_i32m1 (vx, vy, vl);
+}
+
+/* { dg-final { scan-assembler {\mth\.v\M} } } */
-- 
2.17.1
[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector
This patch is to handle the differences in instruction generation between Vector and XTheadVector. In this version, we only support partial xtheadvector instructions that leverage directly from current RVV1.0 with simple adding "th." prefix. For different name xtheadvector instructions but share same patterns as RVV1.0 instructions, we will use ASM targethook to rewrite the whole string of the instructions in the following patches. For some vector patterns that cannot be avoided, we use "!TARGET_XTHEADVECTOR" to disable them in vector.md in order not to generate instructions that xtheadvector does not support, like vmv1r and vsext.vf2. gcc/ChangeLog: * config.gcc: Add files for XTheadVector intrinsics. * config/riscv/autovec.md: Guard XTheadVector. * config/riscv/riscv-string.cc (expand_block_move): Guard XTheadVector. * config/riscv/riscv-v.cc (legitimize_move): New expansion. (get_prefer_tail_policy): Give specific value for tail. (get_prefer_mask_policy): Give specific value for mask. (vls_mode_valid_p): Avoid autovec. * config/riscv/riscv-vector-builtins-shapes.cc (check_type): (build_one): New function. * config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION): (DEF_THEAD_RVV_FUNCTION): Add new marcos. (check_required_extensions): (handle_pragma_vector): * config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR): (RVV_REQUIRE_XTHEADVECTOR): Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR. (struct function_group_info): * config/riscv/riscv-vector-switch.def (ENTRY): Disable fractional mode for the XTheadVector extension. (TUPLE_ENTRY): Likewise. * config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector. * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Guard XTheadVector. (riscv_v_adjust_bytesize): Likewise. (riscv_preferred_simd_mode): Likewsie. (riscv_autovectorize_vector_modes): Likewise. (riscv_vector_mode_supported_any_target_p): Likewise. (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise. 
	* config/riscv/vector-iterators.md: Remove fractional LMUL.
	* config/riscv/vector.md: Include thead-vector.md.
	* config/riscv/riscv_th_vector.h: New file.
	* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
	* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
	* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner
---
 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/autovec.md                   |   2 +-
 gcc/config/riscv/predicates.md                |   3 +-
 gcc/config/riscv/riscv-string.cc              |   3 +
 gcc/config/riscv/riscv-v.cc                   |  13 +-
 .../riscv/riscv-vector-builtins-bases.cc      |   3 +
 .../riscv/riscv-vector-builtins-shapes.cc     |  23 +++
 gcc/config/riscv/riscv-vector-switch.def      | 150 +++---
 gcc/config/riscv/riscv-vsetvl.cc              |  10 +
 gcc/config/riscv/riscv.cc                     |  20 +-
 gcc/config/riscv/riscv_th_vector.h            |  49 +
 gcc/config/riscv/thead-vector.md              | 142 +
 gcc/config/riscv/vector-iterators.md          | 186 +-
 gcc/config/riscv/vector.md                    |  36 +++-
 .../gcc.target/riscv/rvv/base/abi-1.c         |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c      |   2 +-
 gcc/testsuite/lib/target-supports.exp         |  12 ++
 17 files changed, 471 insertions(+), 187 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..1445d98c147 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
 	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	extra_objs="${extra_objs} thead.o riscv-target-attr.o"
 	d_target_objs="riscv-d.o"
-	extra_headers="riscv_vector.h"
+	extra_headers="riscv_vector.h riscv_th_vector.h"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
 	;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8b8a92f10a1..1fac56c7095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand 0 "register_operand")
    (match_operand 1 "memory_operand")
    (match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
  {
    riscv_vector::expand_rawmemchr(mode
[PATCH v4 6/6] RISC-V: Add support for xtheadvector-specific intrinsics.
This patch only involves the generation of the xtheadvector special load/store instructions and vext instructions.

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-bases.cc
	(class th_loadstore_width): Define new builtin bases.
	(BASE): Define new builtin bases.
	* config/riscv/riscv-vector-builtins-bases.h: Define new builtin
	class.
	* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
	Include thead-vector-builtins-functions.def.
	* config/riscv/riscv-vector-builtins-shapes.cc
	(struct th_loadstore_width_def): Define new builtin shapes.
	(struct th_indexed_loadstore_width_def): Define new builtin shapes.
	(SHAPE): Define new builtin shapes.
	* config/riscv/riscv-vector-builtins-shapes.h: Define new builtin
	shapes.
	* config/riscv/riscv-vector-builtins-types.def
	(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
	(vint8m1_t): Add datatypes for XTheadVector.
	(vint8m2_t): Likewise.
	(vint8m4_t): Likewise.
	(vint8m8_t): Likewise.
	(vint16m1_t): Likewise.
	(vint16m2_t): Likewise.
	(vint16m4_t): Likewise.
	(vint16m8_t): Likewise.
	(vint32m1_t): Likewise.
	(vint32m2_t): Likewise.
	(vint32m4_t): Likewise.
	(vint32m8_t): Likewise.
	(vint64m1_t): Likewise.
	(vint64m2_t): Likewise.
	(vint64m4_t): Likewise.
	(vint64m8_t): Likewise.
	(vuint8m1_t): Likewise.
	(vuint8m2_t): Likewise.
	(vuint8m4_t): Likewise.
	(vuint8m8_t): Likewise.
	(vuint16m1_t): Likewise.
	(vuint16m2_t): Likewise.
	(vuint16m4_t): Likewise.
	(vuint16m8_t): Likewise.
	(vuint32m1_t): Likewise.
	(vuint32m2_t): Likewise.
	(vuint32m4_t): Likewise.
	(vuint32m8_t): Likewise.
	(vuint64m1_t): Likewise.
	(vuint64m2_t): Likewise.
	(vuint64m4_t): Likewise.
	(vuint64m8_t): Likewise.
	* config/riscv/riscv-vector-builtins.cc (DEF_RVV_I8_OPS): Add
	datatypes for XTheadVector.
	(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
	(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
	* config/riscv/thead-vector-builtins-functions.def: New file.
	* config/riscv/thead-vector.md: Add new patterns.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
	* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
	* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
	* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
	* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
	* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.

Co-authored-by: Jin Ma
Co-authored-by: Xianmiao Qu
Co-authored-by: Christoph Müllner
---
 gcc/config.gcc                                |   2 +-
 .../riscv/riscv-vector-builtins-shapes.cc     | 126 +++
 .../riscv/riscv-vector-builtins-shapes.h      |   3 +
 .../riscv/riscv-vector-builtins-types.def     | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc     | 313 +-
 gcc/config/riscv/riscv-vector-builtins.h      |   3 +
 gcc/config/riscv/t-riscv                      |  16 +
 .../riscv/thead-vector-builtins-functions.def |  39 +++
 gcc/config/riscv/thead-vector-builtins.cc     | 200 +++
 gcc/config/riscv/thead-vector-builtins.h      |  64 
 gcc/config/riscv/thead-vector.md              | 253 ++
 .../riscv/rvv/xtheadvector/vlb-vsb.c          |  68 
 .../riscv/rvv/xtheadvector/vlbu-vsb.c         |  68 
 .../riscv/rvv/xtheadvector/vlh-vsh.c          |  68 
 .../riscv/rvv/xtheadvector/vlhu-vsh.c         |  68 
 .../riscv/rvv/xtheadvector/vlw-vsw.c          |  68 
 .../riscv/rvv/xtheadvector/vlwu-vsw.c         |  68 
 17 files changed, 1545 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector-builtins.cc
 create mode 100644 gcc/config/riscv/thead-vector-builtins.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 creat
[PATCH] Fix gen-vect-26.c testcase after loops with multiple exits [PR113167]
This fixes the gcc.dg/tree-ssa/gen-vect-26.c testcase by adding `#pragma GCC novector` in front of the loop that checks the results. We only want to test whether the first loop can be vectorized.

Committed as obvious after testing on x86_64-linux-gnu with -m32.

gcc/testsuite/ChangeLog:

	PR testsuite/113167
	* gcc.dg/tree-ssa/gen-vect-26.c: Mark the test/check loop as
	novector.

Signed-off-by: Andrew Pinski
---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
index 710696198bb..fdcec67bde6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
@@ -19,6 +19,7 @@ int main ()
 }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != 5)
-- 
2.39.3
RE: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
Thanks Jeff. I think I located where aarch64 performs the trick here.

1. In the .final dump we have rtl like:

(insn:TI 6 8 29 (set (reg:SF 32 v0)
        (const_double:SF -0.0 [-0x0.0p+0]))
     "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79
     {*movsf_aarch64} (nil))

2. The movsf_aarch64 insn comes from the aarch64.md file, similar to the rtl below. That is, it will generate "movi\t%0.2s, #0" if aarch64_reg_or_fp_zero is true.

1640 (define_insn "*mov<mode>_aarch64"
1641   [(set (match_operand:SFD 0 "nonimmediate_operand")
1642         (match_operand:SFD 1 "general_operand"))]
1643   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
1644     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
1645   {@ [ cons: =0 , 1 ; attrs: type , arch ]
1646      [ w , Y ; neon_move , simd ] movi\t%0.2s, #0

3. Then aarch64_float_const_zero_rtx_p is called, and the -0.0 input rtl returns true at line 10873 because -fno-signed-zeros is given.

10863 bool
10864 aarch64_float_const_zero_rtx_p (rtx x)
10865 {
10866   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
10867      zr as our callers expect, so no need to check the actual
10868      value if X is of Decimal Floating Point type.  */
10869   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
10870     return false;
10871
10872   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
10873     return !HONOR_SIGNED_ZEROS (GET_MODE (x));
10874   return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
10875 }

I think that explains why we have +0.0 on aarch64 here.

Pan

-----Original Message-----
From: Jeff Law
Sent: Friday, December 29, 2023 9:04 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for comments, and Happy new year!
>
>> Interesting.  So I'd actually peel one more layer off this onion.  Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
>
> Yeah, it surprised me too when debugging the foo function.  But I didn't
> dig into it previously as it may be unrelated to vectorization.
>
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect?  That's a valid thing to do when
>> -fno-signed-zeros is on.  Look for HONOR_SIGNED_ZEROS in the aarch64
>> backend.
>
> Sure, will have a try for making the -0.0 happen in aarch64.

I would first look at the .optimized dump, then I'd look at the .final dump alongside the resulting assembly for aarch64.  I bet we're going to find that the aarch64 target internally converts -0.0 to 0.0 when we're not honoring signed zeros.

jeff
Re: [PATCH v1 1/8] LoongArch: testsuite:Add detection procedures supported by the target.
chenxiaolong writes:
> In order to improve and check the function of vector quantization in
> LoongArch architecture, tests on vector instruction set are provided
> in target-support.exp.
>
> gcc/testsuite/ChangeLog:
>
> 	* lib/target-supports.exp:Add LoongArch to the list of supported
> 	targets.

Should be a space after ":".

> ---
>  gcc/testsuite/lib/target-supports.exp | 219 +++---
>  1 file changed, 161 insertions(+), 58 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 14e3e119792..b90aaf8cabe 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3811,7 +3811,11 @@ proc add_options_for_bfloat16 { flags } {
>  # (fma, fms, fnma, and fnms) for both float and double.
>
>  proc check_effective_target_scalar_all_fma { } {
> -    return [istarget aarch64*-*-*]
> +    if { [istarget aarch64*-*-*]

Trailing whitespace.

> +	 || [istarget loongarch*-*-*]} {
> +	return 1
> +    }
> +    return 0
>  }
>
>  # Return 1 if the target supports compiling fixed-point,
> @@ -4017,7 +4021,7 @@ proc check_effective_target_vect_cmdline_needed { } {
>	   || ([istarget arm*-*-*] && [check_effective_target_arm_neon])
>	   || [istarget aarch64*-*-*]
>	   || [istarget amdgcn*-*-*]
> -	   || [istarget riscv*-*-*]} {
> +	   || [istarget riscv*-*-*] } {

Missing something?

>  	return 0
>      } else {
>  	return 1
> @@ -4047,6 +4051,8 @@ proc check_effective_target_vect_int { } {
>	     && [check_effective_target_s390_vx])
>	   || ([istarget riscv*-*-*]
>	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx])
>	 }}]
>  }
>
> @@ -4176,7 +4182,9 @@ proc check_effective_target_vect_intfloat_cvt { } {
>	   || ([istarget s390*-*-*]
>	     && [check_effective_target_s390_vxe2])
>	   || ([istarget riscv*-*-*]
> -	     && [check_effective_target_riscv_v]) }}]
> +	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx]) }}]
>  }
>
> # Return 1 if the target supports signed double->int conversion
> @@ -4197,7 +4205,9 @@ proc check_effective_target_vect_doubleint_cvt { } {
>	   || ([istarget s390*-*-*]
>	     && [check_effective_target_s390_vx])
>	   || ([istarget riscv*-*-*]
> -	     && [check_effective_target_riscv_v]) }}]
> +	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx]) }}]
>  }
>
> # Return 1 if the target supports signed int->double conversion
> @@ -4218,7 +4228,9 @@ proc check_effective_target_vect_intdouble_cvt { } {
>	   || ([istarget s390*-*-*]
>	     && [check_effective_target_s390_vx])
>	   || ([istarget riscv*-*-*]
> -	     && [check_effective_target_riscv_v]) }}]
> +	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx]) }}]
>  }
>
> # Return 1 if we're supporting __int128 for target, 0 otherwise.
> @@ -4251,7 +4263,9 @@ proc check_effective_target_vect_uintfloat_cvt { } {
>	   || ([istarget s390*-*-*]
>	     && [check_effective_target_s390_vxe2])
>	   || ([istarget riscv*-*-*]
> -	     && [check_effective_target_riscv_v]) }}]
> +	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx]) }}]
>  }
>
>
> @@ -4270,7 +4284,9 @@ proc check_effective_target_vect_floatint_cvt { } {
>	   || ([istarget s390*-*-*]
>	     && [check_effective_target_s390_vxe2])
>	   || ([istarget riscv*-*-*]
> -	     && [check_effective_target_riscv_v]) }}]
> +	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx]) }}]
>  }
>
> # Return 1 if the target supports unsigned float->int conversion
> @@ -4287,7 +4303,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
>	   || ([istarget s390*-*-*]
>	     && [check_effective_target_s390_vxe2])
>	   || ([istarget riscv*-*-*]
> -	     && [check_effective_target_riscv_v]) }}]
> +	     && [check_effective_target_riscv_v])
> +	   || ([istarget loongarch*-*-*]
> +	     && [check_effective_target_loongarch_sx]) }}]
>  }
>
> # Return 1 if the target supports vector integer char -> long long extend
> optab
> @@ -4296,7 +4314,9 @@ proc check_effective_target_vect_floatuint_cvt { } {
> proc check_effective_target_vect_ext_char_longlong { } {
> ret
[PATCH v1] LoongArch: testsuite:Add loongarch to gcc.dg/vect/slp-21.c.
In the GCC code for the LoongArch architecture, the IFN_STORE_LANES optimization is not supported, so four SLP statements are used for vectorization in slp-21.c. Add loongarch*-*-* to the corresponding dg-finals.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-21.c: Add loongarch.
---
 gcc/testsuite/gcc.dg/vect/slp-21.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 712a73b69d7..58751688414 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -213,7 +213,7 @@ int main (void)
    Not all vect_perm targets support that, and it's a bit too specific to have
    its own effective-target selector, so we just test targets directly.  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided4 } } } } } */
-- 
2.20.1
[PATCH v1] LoongArch: testsuite:Add loongarch to gcc.dg/vect/slp-26.c.
In the LoongArch architecture, GCC supports the vectorization tested by vect/slp-26.c, but loongarch is not among the targets checked in the dg-finals. Add loongarch to the appropriate dg-finals.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-26.c: Add loongarch.
---
 gcc/testsuite/gcc.dg/vect/slp-26.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index c964635c91c..cfb763bf519 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -47,7 +47,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || riscv_v } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || riscv_v } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || riscv_v } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || riscv_v } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
-- 
2.20.1
Re: [PATCH] RISC-V: Fix misaligned stack offset for interrupt function
On 2023-12-25 16:45, Kito Cheng wrote:
>+++ b/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
>@@ -0,0 +1,29 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -fno-schedule-insns -fno-schedule-insns2" } */
>+/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
>+
>+/* Make sure no stack offset are misaligned.
>+**  interrupt:
>+**  ...
>+**  sd\tt0,40\(sp\)
>+**  frcsr\tt0
>+**  sw\tt0,32\(sp\)
>+**  sd\tt1,24\(sp\)
>+**  fsd\tft0,8\(sp\)
>+**  ...
>+**  lw\tt0,32\(sp\)
>+**  fscsr\tt0
>+**  ld\tt0,40\(sp\)
>+**  ld\tt1,24\(sp\)
>+**  fld\tft0,8\(sp\)
>+**  ...
>+*/

Hi Kito,

The fix is fine, but maybe using s0 instead of t0 is better:
1. simpler code.
2. less stack size.

Current implementation:

>+**  sd\tt0,40\(sp\)
>+**  frcsr\tt0
>+**  sw\tt0,32\(sp\)    // save content of frcsr on the stack

Using s0:

>+**  sd\tt0,40\(sp\)
>+**  frcsr\ts0          // save content of frcsr in s0 instead of the
                         // stack.  If s0 is used as a callee-saved
                         // register, it will be saved again later by
                         // the existing code.

Also, adding this change in riscv_expand_prologue & epilogue would be consistent with the current stack allocation logic. I can try it if you think it necessary.

BR
Fei

>+
>+
>+void interrupt(void) __attribute__((interrupt));
>+void interrupt(void)
>+{
>+  asm volatile ("# clobber!":::"t0", "t1", "ft0");
>+}
>+
>+/* { dg-final { check-function-bodies "**" "" } } */
>--
>2.40.1
Re: [PATCH v3] LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with combine
On 2023/12/29 00:11 AM, Xi Ruoyao wrote:

The problem with peephole2 is that it uses a naive sliding-window algorithm and misses many cases. For example:

float a[1];
float t() { return a[0] + a[8000]; }

is compiled to:

    la.local   $r13,a
    la.local   $r12,a+32768
    fld.s      $f1,$r13,0
    fld.s      $f0,$r12,-768
    fadd.s     $f0,$f1,$f0

by trunk. But as we've explained in r14-4851, the following would be better with -mexplicit-relocs=auto:

    pcalau12i  $r13,%pc_hi20(a)
    pcalau12i  $r12,%pc_hi20(a+32000)
    fld.s      $f1,$r13,%pc_lo12(a)
    fld.s      $f0,$r12,%pc_lo12(a+32000)
    fadd.s     $f0,$f1,$f0

However, the sliding-window algorithm just won't detect the pcalau12i/fld pair to be optimized. Using a define_insn_and_split in the combine pass will work around the issue.

gcc/ChangeLog:

	* config/loongarch/predicates.md (symbolic_pcrel_offset_operand):
	New define_predicate.
	(mem_simple_ldst_operand): Likewise.
	* config/loongarch/loongarch-protos.h
	(loongarch_rewrite_mem_for_simple_ldst): Declare.
	* config/loongarch/loongarch.cc
	(loongarch_rewrite_mem_for_simple_ldst): Implement.
	* config/loongarch/loongarch.md (simple_load): New
	define_insn_and_rewrite.
	(simple_load_ext): Likewise.
	(simple_store): Likewise.
	(define_peephole2): Remove la.local/[f]ld peepholes.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
	New test.
	* gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c:
	New test.
---
Changes from [v2]:

- Match (mem (symbol_ref ...)) instead of (symbol_ref ...) to retain
  the attributes of the MEM.
- Add a test to make sure the attributes of the MEM are retained.

[v2]: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641430.html

Bootstrapped & regtested on loongarch64-linux-gnu. Ok for trunk?
 gcc/config/loongarch/loongarch-protos.h       |   1 +
 gcc/config/loongarch/loongarch.cc             |  16 +++
 gcc/config/loongarch/loongarch.md             | 114 +-
 gcc/config/loongarch/predicates.md            |  13 ++
 ...explicit-relocs-auto-single-load-store-2.c |  11 ++
 ...explicit-relocs-auto-single-load-store-3.c |  18 +++
 6 files changed, 86 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h

/* snip */

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md

/* snip */

+(define_insn_and_rewrite "simple_load"
+  [(set (match_operand:LD_AT_LEAST_32_BIT 0 "register_operand" "=r,f")
+	(match_operand:LD_AT_LEAST_32_BIT 1 "mem_simple_ldst_operand" ""))]
+  "loongarch_pre_reload_split () \
+   && la_opt_explicit_relocs == EXPLICIT_RELOCS_AUTO \

Is the '\' here dispensable? I don't seem to have added it when I wrote the conditions.

+   && (TARGET_CMODEL_NORMAL || TARGET_CMODEL_MEDIUM)"
+  "#"
+  "&& true"
 {
-    emit_insn (gen_pcalau12i_gr (operands[0], operands[1]));
+    operands[1] = loongarch_rewrite_mem_for_simple_ldst (operands[1]);
 })

/* snip */

;; Synchronization instructions.
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 83fea08315c..2158fe7538c 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -579,6 +579,19 @@ (define_predicate "symbolic_pcrel_operand"
   return loongarch_symbolic_constant_p (op, &type) && type == SYMBOL_PCREL;
 })
 
+(define_predicate "symbolic_pcrel_offset_operand"
+  (and (match_code "plus")
+       (match_operand 0 "symbolic_pcrel_operand")
+       (match_operand 1 "const_int_operand")))
+
+(define_predicate "mem_simple_ldst_operand"
+  (match_code "mem")
+{
+  op = XEXP (op, 0);
+  return symbolic_pcrel_operand (op, Pmode) ||
+	 symbolic_pcrel_offset_operand (op, Pmode);
+})
+

The '||' operator shouldn't be at the end of the line; it should be:

+  return symbolic_pcrel_operand (op, Pmode)
+	 || symbolic_pcrel_offset_operand (op, Pmode);

Others LGTM. Thanks!

/* snip */