[PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]
For a lane-reducing operation (dot-prod/widen-sum/sad) in a loop reduction, the current vectorizer can only handle the pattern if the reduction chain contains no other operation, whether normal or lane-reducing.  Actually, to allow multiple arbitrary lane-reducing operations, we need to support vectorization of a loop reduction chain with mixed input vectypes.  Since the number of lanes of a vectype may vary with the operation, the effective ncopies of the vectorized statements may also differ between operations, which causes a mismatch in the vectorized def-use cycles.  A simple way out is to align all operations with the one that has the most ncopies, filling the gap with extra trivial pass-through copies.  For example:

  int sum = 0;
  for (i)
    {
      sum += d0[i] * d1[i];      // dot-prod
      sum += w[i];               // widen-sum
      sum += abs(s0[i] - s1[i]); // sad
      sum += n[i];               // normal
    }

The vector size is 128-bit, and the vectorization factor is 16.  The reduction statements would be transformed as:

  vector<4> int sum_v0 = { 0, 0, 0, 0 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };

  for (i / 16)
    {
      sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
      sum_v1 = sum_v1;  // copy
      sum_v2 = sum_v2;  // copy
      sum_v3 = sum_v3;  // copy

      sum_v0 = sum_v0;  // copy
      sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
      sum_v2 = sum_v2;  // copy
      sum_v3 = sum_v3;  // copy

      sum_v0 = sum_v0;  // copy
      sum_v1 = sum_v1;  // copy
      sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
      sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);

      sum_v0 += n_v0[i: 0 ~ 3 ];
      sum_v1 += n_v1[i: 4 ~ 7 ];
      sum_v2 += n_v2[i: 8 ~ 11];
      sum_v3 += n_v3[i: 12 ~ 15];
    }

Moreover, to achieve higher instruction parallelism in the final vectorized loop, the effective vectorized lane-reducing statements are distributed evenly among all def-use cycles.  In the above example, DOT_PROD, WIDEN_SUM and the SADs are generated in separate cycles.
Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Feng

---
gcc/
	PR tree-optimization/114440
	* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
	reduc_result_pos.
	(vectorizable_lane_reducing): New function declaration.
	* tree-vect-stmts.cc (vectorizable_condition): Treat the condition
	statement that is pointed by stmt_vec_info of reduction PHI as the
	real "for_reduction" statement.
	(vect_analyze_stmt): Call new function vectorizable_lane_reducing
	to analyze lane-reducing operation.
	* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove
	parameter loop_vinfo.  Get input vectype from stmt_info instead of
	reduction PHI.
	(vect_model_reduction_cost): Remove cost computation code related
	to emulated_mixed_dot_prod.
	(vect_reduction_use_partial_vector): New function.
	(vectorizable_lane_reducing): New function.
	(vectorizable_reduction): Allow multiple lane-reducing operations
	in loop reduction.  Move some original lane-reducing related code
	to vectorizable_lane_reducing, and move partial vectorization
	checking code to vect_reduction_use_partial_vector.
	(vect_transform_reduction): Extend transformation to support
	reduction statements with mixed input vectypes.
gcc/testsuite/
	PR tree-optimization/114440
	* gcc.dg/vect/vect-reduc-chain-1.c
	* gcc.dg/vect/vect-reduc-chain-2.c
	* gcc.dg/vect/vect-reduc-chain-3.c
	* gcc.dg/vect/vect-reduc-dot-slp-1.c
	* gcc.dg/vect/vect-reduc-dot-slp-2.c
---
 .../gcc.dg/vect/vect-reduc-chain-1.c   |  62 ++
 .../gcc.dg/vect/vect-reduc-chain-2.c   |  77 ++
 .../gcc.dg/vect/vect-reduc-chain-3.c   |  66 ++
 .../gcc.dg/vect/vect-reduc-dot-slp-1.c |  97 +++
 .../gcc.dg/vect/vect-reduc-dot-slp-2.c |  81 +++
 gcc/tree-vect-loop.cc                  | 668 --
 gcc/tree-vect-stmts.cc                 |  13 +-
 gcc/tree-vectorizer.h                  |   8 +
 8 files changed, 863 insertions(+), 209 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index 000..04bfc419dbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,62 @@
+/* Disabling epilogues until we find a
[PATCH v2] Internal-fn: Introduce new internal function SAT_ADD
From: Pan Li

Update in v2:
* Fix one failure for x86 bootstrap.

Original log:

This patch would like to add the middle-end representation for the saturation add, i.e. set the result of the add to the max value when it overflows.  It matches a pattern similar to the one below.

  SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as an example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.

The patch also implements SAT_ADD in the RISC-V backend as a sample for both scalar and vector.  Given the below example:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:

uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT
}

After this patch:

uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}

For vectorization, we leverage the existing vect pattern recognition to find a pattern similar to the scalar one and let the vectorizer perform the rest for the standard name usadd3 in vector mode.  The RISC-V vector backend has the "Vector Single-Width Saturating Add and Subtract" insns, which can be leveraged when expanding usadd3 in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli    a5,a3,e64,m1,ta,ma
  vle64.v    v0,0(a1)
  vle64.v    v1,0(a2)
  slli       a4,a5,3
  sub        a3,a3,a5
  add        a1,a1,a4
  add        a2,a2,a4
  vadd.vv    v1,v0,v1
  vmsgtu.vv  v0,v0,v1
  vmerge.vim v1,v1,-1,v0
  vse64.v    v1,0(a0)
  ...

After this patch:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli   a5,a3,e64,m1,ta,ma
  vle64.v   v1,0(a1)
  vle64.v   v2,0(a2)
  slli      a4,a5,3
  sub       a3,a3,a5
  add       a1,a1,a4
  add       a2,a2,a4
  vsaddu.vv v1,v1,v2
  vse64.v   v1,0(a0)
  ...

To limit the patch size for review, only the unsigned version of usadd3 is involved here.  The signed version will be covered in follow-up patch(es).

The below test suites passed for this patch:
* The riscv fully regression tests.
* The aarch64 fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd3): New pattern expand for
	unsigned SAT_ADD vector.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
	to expand usadd3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
	the vsadd insn.
	(expand_vec_usadd): New func impl to expand usadd3 for vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
	expand usadd3 for scalar.
	* config/riscv/riscv.md (usadd3): New pattern expand for unsigned
	SAT_ADD scalar.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.
	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD matc
Re: [PATCH 0/2] Condition coverage fixes
On 07/04/2024 08:26, Richard Biener wrote:
Am 06.04.2024 um 22:41 schrieb Jørgen Kvalsvik:
On 06/04/2024 13:15, Jørgen Kvalsvik wrote:
On 06/04/2024 07:50, Richard Biener wrote:
Am 05.04.2024 um 21:59 schrieb Jørgen Kvalsvik:

Hi,

I propose these fixes for the current issues with the condition coverage.

Rainer, I propose to simply delete the test with __sigsetjmp.  I don't think it actually detects anything reasonable any more; I kept it around to prevent a regression.  Since then I have built a lot of programs (with optimization enabled) and not really seen this problem.

H.J., the problem you found with -O2 was really a problem of tree-inlining, which was actually caught earlier by Jan [1].  It probably warrants some more testing, but I could reproduce it by tuning your test case to use always_inline and not -O2 and trigger the error.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html

Ok

Thanks, committed.

I am wondering if the fn->cond_uids access (in tree-profile.cc) should always be guarded.  Right now there is the assumption that if condition coverage is requested the map will exist and be populated, but as this shows there may be other circumstances where this is not true.  Or perhaps there should be a gcc_assert to (reliably) detect cases where the map is not constructed properly?

Thanks,
Jørgen

I gave this some more thought, and realised I was too eager to fix the segfault.  While trunk no longer crashes (at least on my x86_64 linux) the fix itself is bad.  It copies the gcond -> uid mappings into the caller, but the stmts are deep copied into the caller, so no gcond will ever be a hit when we look up the condition_uids in tree-profile.cc.  I did a very quick prototype to confirm.
By applying this patch:

@@ -2049,6 +2049,9 @@ copy_bb (copy_body_data *id, basic_block bb,
   copy_gsi = gsi_start_bb (copy_basic_block);

+  if (!cfun->cond_uids && id->src_cfun->cond_uids)
+    cfun->cond_uids = new hash_map <gcond*, unsigned> ();
+
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple_seq stmts;
@@ -2076,6 +2079,12 @@ copy_bb (copy_body_data *id, basic_block bb,
       if (gimple_nop_p (stmt))
	continue;

+      if (id->src_cfun->cond_uids && is_a <gcond *> (orig_stmt))
+	{
+	  unsigned *v = id->src_cfun->cond_uids->get (as_a <gcond *> (orig_stmt));
+	  if (v)
+	    cfun->cond_uids->put (as_a <gcond *> (stmt), *v);
+	}
+

and this test program:

__attribute__((always_inline))
inline int
inlinefn (int a)
{
  if (a > 5)
    {
      printf ("a > 5\n");
      return a;
    }
  else
    printf ("a < 5, was %d\n", a);
  return a * a - 2;
}

int
mcdc027e (int a, int b)
{
  int y = inlinefn (a);
  return y + b;
}

gcov reports:

        2:   18:mcdc027e (int a, int b)
condition outcomes covered 1/2
condition  0 not covered (true)
        -:   19:{
        2:   20:    int y = inlinefn (a);
        2:   21:    return y + b;
        -:   22:}

but without the patch, gcov prints nothing.

I am not sure if this approach is even ideal.  Probably the most problematic part is the source line mapping, which is all messed up.  I checked with gcov --branch-probabilities and it too reports the callee at the top of the caller.  If you think it is a good strategy I can clean up the prototype and submit a patch.  I suppose the function _totals_ should be accurate, even if the source mapping is a bit surprising.  What do you think?  I am open to other strategies, too.

I think the most important bit is that the segfault is gone.

The interaction of coverage with inlining, or even with other optimizations when applying optimization to coverage, should be documented better.  Does condition coverage apply on top of regular coverage counting, or is it an either/or?

On top.  It is perfectly reasonable (and desirable) to measure statement/line coverage in addition to condition coverage.
That being said, if you achieve MC/DC you also achieve branch coverage, but gcc -fprofile-arcs + --branch-counts/--branch-probabilities measures more than just taken/not taken, so -fcondition-coverage does not completely replace it.  You might also not care about MC/DC, only branch coverage.

Personally, I have come around to this strategy being alright.  It can, and might even be, documented that inlined functions will be anchored to the top of the calling function, and the summaries will still be useful.  A future project could be to improve the source mapping through inlining as well.  In practice this is ok because code under test tends not to be inlined so much.

Thanks,
Jørgen

Thanks,
Richard

Jørgen Kvalsvik (2):
  Remove unnecessary and broken MC/DC compile test
  Copy condition->expr map when inlining [PR114599]

 gcc/testsuite/gcc.misc-tests/gcov-19.c       | 11 -
 gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25
 gcc/tree-inlin
[PATCH] LoongArch: Enable switchable target
This patch fixes the back-end context switching in cases where functions should be built with their own target contexts instead of the global one, such as LTO linking and functions with target attributes (TBD).

	PR target/113233

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_reg_init):
	Reinitialize the loongarch_regno_mode_ok cache.
	(loongarch_option_override): Same.
	(loongarch_save_restore_target_globals): Restore target globals.
	(loongarch_set_current_function): Restore the target contexts
	for functions.
	(TARGET_SET_CURRENT_FUNCTION): Define.
	* config/loongarch/loongarch.h (SWITCHABLE_TARGET): Enable
	switchable target context.
	* config/loongarch/loongarch-builtins.cc (loongarch_init_builtins):
	Initialize all builtin functions at startup.
	(loongarch_expand_builtin): Turn assertion of builtin availability
	into a test.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Define condition loongarch_sx_as.
	* gcc.dg/lto/pr113233_0.c: New test.

---
 gcc/config/loongarch/loongarch-builtins.cc | 25 +++---
 gcc/config/loongarch/loongarch.cc          | 91 --
 gcc/config/loongarch/loongarch.h           |  2 +
 gcc/testsuite/gcc.dg/lto/pr113233_0.c      | 14
 gcc/testsuite/lib/target-supports.exp      | 12 +++
 5 files changed, 127 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113233_0.c

diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index efe7e5e5ebc..fbe46833c9b 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -2512,14 +2512,11 @@ loongarch_init_builtins (void)
   for (i = 0; i < ARRAY_SIZE (loongarch_builtins); i++)
     {
       d = &loongarch_builtins[i];
-      if (d->avail ())
-	{
-	  type = loongarch_build_function_type (d->function_type);
-	  loongarch_builtin_decls[i]
-	    = add_builtin_function (d->name, type, i, BUILT_IN_MD, NULL,
-				    NULL);
-	  loongarch_get_builtin_decl_index[d->icode] = i;
-	}
+      type = loongarch_build_function_type (d->function_type);
+      loongarch_builtin_decls[i]
+	= add_builtin_function (d->name, type, i, BUILT_IN_MD, NULL,
+				NULL);
+      loongarch_get_builtin_decl_index[d->icode] = i;
     }
 }

@@ -3105,15 +3102,21 @@ loongarch_expand_builtin (tree exp, rtx target,
			  rtx subtarget ATTRIBUTE_UNUSED,
			  int ignore ATTRIBUTE_UNUSED)
 {
   tree fndecl;
-  unsigned int fcode, avail;
+  unsigned int fcode;
   const struct loongarch_builtin_description *d;

   fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   fcode = DECL_MD_FUNCTION_CODE (fndecl);
   gcc_assert (fcode < ARRAY_SIZE (loongarch_builtins));
   d = &loongarch_builtins[fcode];
-  avail = d->avail ();
-  gcc_assert (avail != 0);
+
+  if (!d->avail ())
+    {
+      error_at (EXPR_LOCATION (exp),
+		"built-in function %qD is not enabled", fndecl);
+      return target;
+    }
+
   switch (d->builtin_type)
     {
     case LARCH_BUILTIN_DIRECT:
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index c90b701a533..6b92e7034c5 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7570,15 +7570,19 @@ loongarch_global_init (void)
     loongarch_dwarf_regno[i] = INVALID_REGNUM;
   }

+  /* Function to allocate machine-dependent function status.  */
+  init_machine_status = &loongarch_init_machine_status;
+};
+
+static void
+loongarch_reg_init (void)
+{
   /* Set up loongarch_hard_regno_mode_ok.  */
   for (int mode = 0; mode < MAX_MACHINE_MODE; mode++)
     for (int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
       loongarch_hard_regno_mode_ok_p[mode][regno]
	= loongarch_hard_regno_mode_ok_uncached (regno, (machine_mode) mode);
-
-  /* Function to allocate machine-dependent function status.  */
-  init_machine_status = &loongarch_init_machine_status;
-};
+}

 static void
 loongarch_option_override_internal (struct loongarch_target *target,
@@ -7605,20 +7609,92 @@ loongarch_option_override_internal (struct loongarch_target *target,
   /* Override some options according to the resolved target.  */
   loongarch_target_option_override (target, opts, opts_set);
+
+  target_option_default_node = target_option_current_node
+    = build_target_option_node (opts, opts_set);
+
+  loongarch_reg_init ();
+}
+
+/* Remember the last target of loongarch_set_current_function.  */
+
+static GTY(()) tree loongarch_previous_fndecl;
+
+/* Restore or save the TREE_TARGET_GLOBALS from or to new_tree.
+   Used by loongarch_set_current_function to
+   make sure optab availability predicates are recomputed when necessary.  */
+
+static void
+loongarch_save_restore_target_globals (tree new_tree)
+{
+  if (TREE_TARGE
Re: Combine patch ping
> Am 01.04.2024 um 21:28 schrieb Uros Bizjak : > > Hello! > > I'd like to ping the > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html > PR112560 P1 patch. Ok. Thanks, Richard > Thanks, > Uros.
Re: [PATCH] LoongArch: Enable switchable target
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > This patch fixes the back-end context switching in cases where functions > should be built with their own target contexts instead of the > global one, such as LTO linking and functions with target attributes (TBD). > > PR target/113233 Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option save/restore"? Should I reopen it? -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Enable switchable target
On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote: > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > > This patch fixes the back-end context switching in cases where functions > > should be built with their own target contexts instead of the > > global one, such as LTO linking and functions with target attributes (TBD). > > > > PR target/113233 > > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option > save/restore"? Should I reopen it? > > -- > Xi Ruoyao > School of Aerospace Science and Technology, Xidian University Yes, the issue was not fixed with that patch. This one should do.
Re: [PATCH] LoongArch: Enable switchable target
On Sun, 2024-04-07 at 16:23 +0800, Yang Yujie wrote: > On Sun, Apr 07, 2024 at 04:23:53PM +0800, Xi Ruoyao wrote: > > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > > > This patch fixes the back-end context switching in cases where functions > > > should be built with their own target contexts instead of the > > > global one, such as LTO linking and functions with target attributes > > > (TBD). > > > > > > PR target/113233 > > > > Oops, so this PR isn't fixed with r14-7134 "LoongArch: Implement option > > save/restore"? Should I reopen it? > > > > -- > > Xi Ruoyao > > School of Aerospace Science and Technology, Xidian University > > Yes, the issue was not fixed with that patch. This one should do. So reopened the PR. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote:
> The patch has been approved by Honza in Bugzilla.  (I hope.  He did write
> it looked reasonable.)  Together with the patch for PR 113907, it has
> passed bootstrap, LTO bootstrap and LTO profiledbootstrap and testing on
> x86_64-linux, and bootstrap and LTO bootstrap on ppc64le-linux.  It also
> passed normal bootstrap on aarch64-linux, but there many testcases failed
> because the compiler timed out.  The machine is old and slow and might
> have been oversubscribed, so my plan is to try again on gcc185 from the
> cfarm.  If that goes well, I intend to commit the patch and then start
> working on backports.

I've tried these two patches out on my own 24-core AArch64 machine.
Bootstrapped (but no LTO or PGO) and regtested fine.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University
Re: [PATCH] ICF&SRA: Make ICF and SRA agree on padding
On Thu, 2024-04-04 at 23:19 +0200, Martin Jambor wrote: > +/* Given two types in an assignment, return true either if any one cannot be > + totally scalarized or if they have padding (i.e. not copied bits) */ > + > +bool > +sra_total_scalarization_would_copy_same_data_p (tree t1, tree t2) > +{ > + sra_padding_collecting p1; > + if (!check_ts_and_push_padding_to_vec (t1, &p1)) > + return true; > + > + sra_padding_collecting p2; > + if (!check_ts_and_push_padding_to_vec (t2, &p2)) > + return true; > + > + unsigned l = p1.m_padding.length (); > + if (l != p2.m_padding.length ()) > + return false; > + for (unsigned i = 0; i < l; i++) > + if (p1.m_padding[i].first != p2.m_padding[i].first > + || p1.m_padding[i].second != p2.m_padding[i].second) > + return false; > + > + return true; > +} > + Better remove this trailing empty line from tree-sra.cc. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] LoongArch: Enable switchable target
On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > * config/loongarch/loongarch-builtins.cc > (loongarch_init_builtins): > Initialize all builtin functions at startup. git gcc-verify complains that tab should be used instead of space for this line. > (loongarch_expand_builtin): Turn assertion of builtin > availability > into a test. and this line. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)
I'm picking up Will's patches for this bug.  As an FYI, this is the bug where _ARCH_PWR8 is conditional on TARGET_DIRECT_MOVE, which can be disabled with -mno-vsx, which is bad.  I already posted the cleanup patch that the updated patch for this bug will rely on, which removed OPTION_MASK_DIRECT_MOVE because it is fully redundant with OPTION_MASK_P8_VECTOR.  I've also incorporated some of Ke Wen's review comments on Will's original patch.  I have a couple of comments on your review though...

On 10/17/22 1:08 PM, Segher Boessenkool wrote:
> On Mon, Sep 19, 2022 at 11:13:20AM -0500, will schmidt wrote:
>> @@ -24046,10 +24045,11 @@ static struct rs6000_opt_mask const
>> rs6000_opt_masks[] =
>>   { "block-ops-vector-pair", OPTION_MASK_BLOCK_OPS_VECTOR_PAIR, false, true },
>>   { "cmpb",                  OPTION_MASK_CMPB,                  false, true },
>>   { "crypto",                OPTION_MASK_CRYPTO,                false, true },
>>   { "direct-move",           OPTION_MASK_DIRECT_MOVE,           false, true },
>> + { "power8",                OPTION_MASK_POWER8,                false, true },
>
> Why would we want a #pragma power8 ?

Agreed, we don't want that.  We have the target attribute cpu=power8 for that.

>> +mpower8
>> +Target Mask(POWER8) Var(rs6000_isa_flags)
>> +Use instructions added in ISA 2.07 (power8).
>
> There should not be such an option.  It is set by -mcpu=power8 and
> later, but can never be enabled or disabled directly by the user.

So we need an OPTION_MASK_POWER8 to be created for use in rs6000_isa_flags, but the only way I see that we can do that is to create an option in rs6000.opt.  Did I miss that there is another way?  Otherwise, I was thinking of creating a dummy option that is WarnRemoved from the start, a la:

+;; This option exists only for its MASK.  It is not intended for users.
+mpower8
+Target Mask(POWER8) Var(rs6000_isa_flags) WarnRemoved
+

Is there a better way?  The problem is P8 created lots of new instructions, but they were basically all vector and htm instructions.
There were no general GPR or FPR instructions (i.e., what we'd think of as base architecture) added, so there's no other OPTION_MASK_*/TARGET_* we can use as a P8 base architecture test.  I'll note I tried just a bare "Target Mask(POWER8) Var(rs6000_isa_flags)" with no option name mentioned at all, but that didn't work, as no OPTION_MASK_POWER8 was created.

Peter
Re: [PATCH 2/9] wwwdocs: gcc-14: add URLs to some options
On Thu, 4 Apr 2024, David Malcolm wrote:

> Signed-off-by: David Malcolm
> ---
>  htdocs/gcc-14/changes.html | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 5cc729c5..397458d5 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -149,26 +149,33 @@ a work-in-progress.
>      to enable additional hardening.
>
> -New option -fhardened, an umbrella option that enables a set
> -of hardening flags.  The options it enables can be displayed using the
> +New option <a
> + href="https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened">-fhardened</a>,

Shouldn't those URLs better point to a specific version, lest they break with any newer release?

The question is "a bit" rhetorical, since there appears to be nothing at onlinedocs/gcc-14.0.0/ (and "nearby numbers").  Still, maybe there ought to be a copy of onlinedocs/gcc/ that is frozen at time of release.

brgds, H-P
Re:[pushed] [PATCH v1] LoongArch: Set default alignment for functions jumps and loops [PR112919].
On 2024/4/6 at 5:53 PM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 15:03 +0800, Lulu Cheng wrote:
> > +/* Alignment for functions loops and jumps for best performance.  For new
> > +   uarchs the value should be measured via benchmarking.  See the documentation
> > +   for -falign-functions -falign-loops and -falign-jumps in invoke.texi for the
> > +   format.  */
>
> Better have two commas here ("functions, loops and jumps" and
> "-falign-functions, -falign-loops").  Otherwise it should be OK.

Modified the comment accordingly and pushed as r14-9824.
[PATCH] aarch64: Fix vld1/st1_x4 intrinsic test
The test for this intrinsic was failing silently, so it failed to report the bug described in PR 114521.  This patch modifies the test to report the result.

Bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114521

Signed-off-by: Jonathan Swinney
---
 .../gcc.target/aarch64/advsimd-intrinsics/vld1x4.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
index 89b289bb21d..17db262a31a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
@@ -3,6 +3,7 @@
 /* { dg-skip-if "unimplemented" { arm*-*-* } } */
 /* { dg-options "-O3" } */

+#include <stdbool.h>
 #include <arm_neon.h>
 #include "arm-neon-ref.h"

@@ -71,13 +72,16 @@ VARIANT (float64, 2, q_f64)
 VARIANTS (TESTMETH)

 #define CHECKS(BASE, ELTS, SUFFIX)	\
-  if (test_vld1##SUFFIX##_x4 () != 0)	\
-    fprintf (stderr, "test_vld1##SUFFIX##_x4");
+  if (test_vld1##SUFFIX##_x4 () != 0) {	\
+    fprintf (stderr, "test_vld1" #SUFFIX "_x4 failed\n"); \
+    failed = true; \
+  }

 int
 main (int argc, char **argv)
 {
+  bool failed = false;
   VARIANTS (CHECKS)
-  return 0;
+  return (failed) ? 1 : 0;
 }
-- 
2.40.1
Re: [PATCH] LoongArch: Enable switchable target
On Sun, Apr 07, 2024 at 08:56:53PM +0800, Xi Ruoyao wrote: > On Sun, 2024-04-07 at 15:47 +0800, Yang Yujie wrote: > > * config/loongarch/loongarch-builtins.cc > > (loongarch_init_builtins): > > Initialize all builtin functions at startup. > > git gcc-verify complains that tab should be used instead of space for > this line. > > > (loongarch_expand_builtin): Turn assertion of builtin > > availability > > into a test. > > and this line. > > -- > Xi Ruoyao > School of Aerospace Science and Technology, Xidian University Thanks! I will fix it soon.
Re: [PATCH 0/2] Condition coverage fixes
Jørgen Kvalsvik writes:

> Hi,
>
> I propose these fixes for the current issues with the condition
> coverage.
>
> Rainer, I propose to simply delete the test with __sigsetjmp.  I don't
> think it actually detects anything reasonable any more; I kept it around
> to prevent a regression.  Since then I have built a lot of programs (with
> optimization enabled) and not really seen this problem.
>
> H.J., the problem you found with -O2 was really a problem of
> tree-inlining, which was actually caught earlier by Jan [1].  It probably
> warrants some more testing, but I could reproduce it by tuning your test
> case to use always_inline and not -O2 and trigger the error.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648785.html

I couldn't find your BZ account, but FWIW:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114627.

Thanks.

> Thanks,
> Jørgen
>
> Jørgen Kvalsvik (2):
>   Remove unnecessary and broken MC/DC compile test
>   Copy condition->expr map when inlining [PR114599]
>
>  gcc/testsuite/gcc.misc-tests/gcov-19.c       | 11 -
>  gcc/testsuite/gcc.misc-tests/gcov-pr114599.c | 25
>  gcc/tree-inline.cc                           | 20 +++-
>  3 files changed, 44 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr114599.c