RE: [PATCH] arm: Auto-vectorization for MVE: vorn
> -Original Message- > From: Christophe Lyon > Sent: 02 February 2021 07:19 > To: Kyrylo Tkachov > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] arm: Auto-vectorization for MVE: vorn > > On Mon, 1 Feb 2021 at 10:08, Kyrylo Tkachov > wrote: > > > > > > > > > -Original Message- > > > From: Christophe Lyon > > > Sent: 29 January 2021 18:18 > > > To: Kyrylo Tkachov > > > Cc: gcc-patches@gcc.gnu.org > > > Subject: Re: [PATCH] arm: Auto-vectorization for MVE: vorn > > > > > > On Fri, 29 Jan 2021 at 16:03, Kyrylo Tkachov > > > wrote: > > > > > > > > > > > > > > > > > -Original Message- > > > > > From: Gcc-patches On Behalf > Of > > > > > Christophe Lyon via Gcc-patches > > > > > Sent: 29 January 2021 14:55 > > > > > To: gcc-patches@gcc.gnu.org > > > > > Subject: [PATCH] arm: Auto-vectorization for MVE: vorn > > > > > > > > > > This patch enables MVE vornq instructions for auto-vectorization. > MVE > > > > > vornq insns in mve.md are modified to use ior instead of unspec > > > > > expression to support ior3. The ior3 expander is > added > > > to > > > > > vec-common.md > > > > > > > > > > 2021-01-29 Christophe Lyon > > > > > > > > > > gcc/ > > > > > * config/arm/iterators.md (supf): Remove VORNQ_S and > VORNQ_U. > > > > > (VORNQ): Remove. > > > > > * config/arm/mve.md (mve_vornq_s): New entry for vorn > > > > > instruction using expression ior. > > > > > (mve_vornq_u): New expander. > > > > > (mve_vornq_f): Use ior code instead of unspec. > > > > > * config/arm/unspecs.md (VORNQ_S, VORNQ_U, VORNQ_F): > > > > > Remove. > > > > > * config/arm/vec-common.md (orn3): New expander. > > > > > > > > > > gcc/testsuite/ > > > > > * gcc.target/arm/simd/mve-vorn.c: Add vorn tests. > > > > > --- > > > > > gcc/config/arm/iterators.md | 3 +-- > > > > > gcc/config/arm/mve.md| 23 +++-- > > > > > gcc/config/arm/unspecs.md| 3 --- > > > > > gcc/config/arm/vec-common.md | 8 ++ > > > > > gcc/testsuite/gcc.target/arm/simd/mve-vorn.c | 38 > > > > > > > > > > 5 files changed, 62 insertions(+), 13 deletions(-) > > > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vorn.c > > > > > > > > > > diff --git a/gcc/config/arm/iterators.md > b/gcc/config/arm/iterators.md > > > > > index b902790..43aab23 100644 > > > > > --- a/gcc/config/arm/iterators.md > > > > > +++ b/gcc/config/arm/iterators.md > > > > > @@ -1293,7 +1293,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") > > > > > (VCVTQ_TO_F_U "u") (VREV16Q_S "s") > > > > > (VMULLBQ_INT_S "s") (VMULLBQ_INT_U "u") > > > > > (VQADDQ_S "s") > > > > > (VMULLTQ_INT_S "s") (VMULLTQ_INT_U "u") > > > > > (VQADDQ_U "u") > > > > > (VMULQ_N_S "s") (VMULQ_N_U "u") (VMULQ_S "s") > > > > > -(VMULQ_U "u") (VORNQ_S "s") (VORNQ_U "u") > > > > > +(VMULQ_U "u") > > > > > (VQADDQ_N_S "s") (VQADDQ_N_U "u") > > > > > (VQRSHLQ_N_S "s") (VQRSHLQ_N_U "u") (VQRSHLQ_S > > > > > "s") > > > > > (VQRSHLQ_U "u") (VQSHLQ_N_S "s") > > > > > (VQSHLQ_N_U "u") > > > > > @@ -1563,7 +1563,6 @@ (define_int_iterator VMULLBQ_INT > > > > > [VMULLBQ_INT_U VMULLBQ_INT_S]) > > > > > (define_int_iterator VMULLTQ_INT [VMULLTQ_INT_U > VMULLTQ_INT_S]) > > > > > (define_int_iterator VMULQ [VMULQ_U VMULQ_S]) > > > > > (define_int_iterator VMULQ_N [VMULQ_N_U VMULQ_N_S]) > > > > > -(define_int_iterator VORNQ [VORNQ_U VORNQ_S]) > > > > > (define_int_iterator VQADDQ [VQADDQ_U VQADDQ_S]) > > > > > (define_int_iterator VQADDQ_N [VQADDQ_N_S VQADDQ_N_U]) > > > > > (define_int_iterator VQRSHLQ [VQRSHLQ_S VQRSHLQ_U]) > > > > > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > > > > > index 465f71c..ec0ef7b 100644 > > > > > --- a/gcc/config/arm/mve.md > > > > > +++ b/gcc/config/arm/mve.md > > > > > @@ -1634,18 +1634,26 @@ (define_insn "mve_vmulq" > > > > > ;; > > > > > ;; [vornq_u, vornq_s]) > > > > > ;; > > > > > -(define_insn "mve_vornq_" > > > > > +(define_insn "mve_vornq_s" > > > > >[ > > > > > (set (match_operand:MVE_2 0 "s_register_operand" "=w") > > > > > - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" > "w") > > > > > -(match_operand:MVE_2 2 "s_register_operand" "w")] > > > > > - VORNQ)) > > > > > + (ior:MVE_2 (not:MVE_2 (match_operand:MVE_2 2 > > > > > "s_register_operand" "w")) > > > > > +(match_operand:MVE_2 1 "s_register_operand" "w"))) > > > > >] > > > > >"TARGET_HAVE_MVE" > > > > > - "vorn %q0, %q1, %q2" > > > > > + "vorn\t%q0, %q1, %q2" > > > > >[(set_attr "type" "mve_move") > > > > > ]) > > > > > > > > > > +(define_expand "mve_vornq_u" > > > > > + [ > > > > > + (set (match_operand:MVE_2 0 "s_register_operand") > > > > >
Re: [PATCH v2] PR target/98743: Fix ICE in convert_move for RISC-V
On Tue, Feb 02, 2021 at 03:21:24PM +0800, Kito Cheng wrote: > - Check `from` mode is not BLMmode before call store_expr, calling store_expr >with BLKmode will cause ICE. > > - Verified with riscv64, x86_64 and aarch64, no introduce new regression. > > Note: Those logic was introduced by 3e60ddeb8220ed388819bb3f14e8caa9309fd3c2, > so I cc Jakub for reivew. > > Changes for V2: > > - Checking mode of `from` rather than mode of `to`. > - Verified on riscv64, x86_64 and aarch64 again. > > gcc/ChangeLog: > > PR target/98743 > * expr.c: Check mode before calling store_expr. Ok, with a small nit. > > gcc/testsuite/ChangeLog: > > PR target/98743 > * g++.target/riscv/pr98743.C: New. There is nothing riscv specific on the testcase, so it should go into g++.dg/opt/ instead. Jakub
Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE
On Tue, Feb 02, 2021 at 08:51:07AM +0100, Richard Biener wrote: > > Shouldn't we gather statistics from larger codebase first and perhaps > > compare against tree-ssa-dse statistics? I mean, in many functions there > > are no DSE opportunities at all. > > Of course. Some DSE will definitely be required because we expose > ABI details only on RTL and expand sometimes is quite stupid. ISTR > either DCE or CSE performs some limited amount of DSE as well? > > The most needed and interesting work will be to disentangle RTL expansion > into the "complex" bits to be done on (lowered) GIMPLE and the > mechanical detail of GIMPLE to RTL one instruction at a time. I guess > only during this work we'll learn what we need in lowered GIMPLE. BTW, more complete RTL DSE statistics (from x86_64-linux and i686-linux bootstraps/regtests together) are below (columns are number of occurences of specific local and global deletion pairs, local deletions, global deletions), so it is true RTL DSE only removed something in 1.6% of functions during those. But RTL DSE is also doing the SCCVN-like optimization of optimizing memory reads into constants if those constants were stored before and that isn't tracked in this. 5908970 0 0 20144 1 0 14162 0 3 11413 0 1 10862 2 0 5268 0 2 5256 3 0 4924 0 6 3506 4 0 2838 0 4 1734 0 9 1621 5 0 1449 0 5 1014 0 8 856 0 12 754 0 7 729 6 0 487 7 0 464 8 0 422 14 0 412 9 0 381 0 15 373 0 10 297 1 1 259 0 18 248 2 2 245 0 11 237 3 3 230 12 0 220 10 0 168 0 16 144 11 0 138 0 14 133 4 6 133 0 13 123 0 24 115 15 0 114 0 21 94 0 20 91 0 17 88 1 4 84 1 2 75 16 0 74 17 0 71 0 27 69 0 30 55 2 1 50 4 1 50 1 9 50 1 3 46 0 36 44 0 33 44 0 25 42 2 3 41 18 0 40 5 3 39 34 0 36 13 0 35 2 18 35 0 19 34 3 1 33 1 7 31 1 5 31 0 48 31 0 23 30 4 3 30 1 6 30 0 22 28 0 29 28 0 26 26 0 28 25 24 0 24 1 18 22 2 6 21 0 32 20 4 4 20 3 2 19 4 2 18 0 40 17 39 0 17 22 0 16 5 9 16 5 5 15 129 0 15 0 66 15 0 31 14 20 0 14 19 0 13 6 3 12 1 8 12 0 42 11 32 0 11 2 4 10 4 9 10 42 0 10 0 44 9 3 4 9 30 0 9 259 0 9 0 57 9 0 39 8 4 8 8 46 0 8 31 0 8 25 35 8 1 15 8 0 56 8 0 54 8 0 45 8 0 43 8 0 38 7 3 8 7 35 0 7 33 0 7 130 0 7 1 24 7 12 1 7 1 14 7 0 60 6 9 3 6 5 6 6 5 4 6 5 1 6 4 49 6 3 13 6 15 9 6 12 6 6 0 84 6 0 41 5 6 12 5 44 0 5 27 0 5 2 7 5 25 0 5 23 0 5 14 12 5 12 2 5 1 21 5 10 2 5 0 35 5 0 150 4 86 0 4 7 6 4 69 0 4 6 8 4 56 0 4 45 0 4 4 16 4 29 0 4 266 0 4 263 0 4 258 0 4 2 22 4 12 4 4 1 12 4 0 83 4 0 61 4 0 51 4 0 34 3 8 9 3 8 6 3 6 1 3 55 0 3 5 10 3 49 0 3 4 77 3 4 5 3 4 18 3 4 13 3 2 9 3 28 0 3 2 38 3 2 12 3 14 6 3 1 13 3 0 88 3 0 86 3 0 75 3 0 69 3 0 198 3 0 100 2 90 0 2 6 39 2 6 21 2 53 0 2 5 2 2 512 0 2 5 11 2 50 0 2 4 7 2 4 33 2 4 10 2 40 0 2 3 9 2 3 6 2 33 1 2 2 8 2 262 0 2 26 0 2 252 0 2 24 12 2 24 1 2 23 29 2 22 22 2 21 5 2 21 0 2 16 5 2 14 18 2 128 0 2 12 5 2 1 20 2 11 40 2 1 11 2 0 91 2 0 37 2 0 158 2 0 136 2 0 128 2 0 104 2 0 103 1 9 6 1 9 10 1 8 7 1 8 2 1 8 16 1 8 15 1 81 0 1 8 1 1 7 5 1 74 0 1 73 6 1 7 12 1 7 1 1 6 7 1 54 0 1 52 0 1 48 0 1 36 0 1 3 5 1 31 3 1 2 5 1 2 21 1 2 13 1 21 18 1 2 10 1 19 20 1 19 2 1 18 12 1 17 10 1 16 1 1 14 9 1 145 0 1 14 15 1 13 1 1 12 8 1 11 11 1 1 10 1 10 7 1 10 11 1 10 1 1 0 89 1 0 80 1 0 70 1 0 67 1 0 59 1 0 55 1 0 49 1 0 244 1 0 148 1 0 140 Jakub
Re: [PATCH v2] PR target/98743: Fix ICE in convert_move for RISC-V
Hi Jakub: Thanks for your review, committed with testcase movement. On Tue, Feb 2, 2021 at 4:41 PM Jakub Jelinek wrote: > > On Tue, Feb 02, 2021 at 03:21:24PM +0800, Kito Cheng wrote: > > - Check `from` mode is not BLMmode before call store_expr, calling > > store_expr > >with BLKmode will cause ICE. > > > > - Verified with riscv64, x86_64 and aarch64, no introduce new regression. > > > > Note: Those logic was introduced by > > 3e60ddeb8220ed388819bb3f14e8caa9309fd3c2, > > so I cc Jakub for reivew. > > > > Changes for V2: > > > > - Checking mode of `from` rather than mode of `to`. > > - Verified on riscv64, x86_64 and aarch64 again. > > > > gcc/ChangeLog: > > > > PR target/98743 > > * expr.c: Check mode before calling store_expr. > > Ok, with a small nit. > > > > gcc/testsuite/ChangeLog: > > > > PR target/98743 > > * g++.target/riscv/pr98743.C: New. > > There is nothing riscv specific on the testcase, so it should go into > g++.dg/opt/ instead. > > Jakub >
[committed] testsuite: Add testcase for already fixed PR [PR97960]
Hi! This testcase has been fixed by r11-5904-g4cf70c20cb10acd6fb1016611d05540728176b60 so I'm checking it in so that we can close the PR. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as obvious. 2021-02-02 Jakub Jelinek PR tree-optimization/97960 * g++.dg/torture/pr97960.C: New test. --- gcc/testsuite/g++.dg/torture/pr97960.C.jj 2021-02-01 19:00:07.912833437 +0100 +++ gcc/testsuite/g++.dg/torture/pr97960.C 2021-02-01 18:57:04.616927413 +0100 @@ -0,0 +1,30 @@ +// PR tree-optimization/97960 +// { dg-do run } + +#if __CHAR_BIT__ == 8 && __SIZEOF_INT__ == 4 +const int & +foo (const int &d, const int &f) +{ + if (d < f) +return f; + return d; +} + +short a[575]; +unsigned b[25]; +unsigned char g; +#endif + +int +main () +{ +#if __CHAR_BIT__ == 8 && __SIZEOF_INT__ == 4 + for (int e = 0; e < 23; ++e) +a[e * 23] = 16137; + for (signed char h = (unsigned char) (foo (g, 253) + 3); h < 24; h++) +b[h] = 1064739102; + for (int e = 0; e < 23; ++e) +if (a[e * 23] != 16137) + __builtin_abort (); +#endif +} Jakub
[PATCH] tree-vect-patterns: Don't create over widening patterns for stmts used in reductions [PR98848]
Hi! As discussed in the PR, the reduction code isn't able to cope with type promotions/demotions in the reduction computation, so if we recognize an over-widening pattern that has vect_reduction_def type, we most likely make it non-vectorizable. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for turnk? 2021-02-02 Jakub Jelinek PR tree-optimization/98848 * tree-vect-patterns.c (vect_recog_over_widening_pattern): Punt if STMT_VINFO_DEF_TYPE (last_stmt_info) is vect_reduction_def. * gcc.dg/vect/pr98848.c: New test. * gcc.dg/vect/pr92205.c: Remove xfail. --- gcc/tree-vect-patterns.c.jj 2021-01-04 10:25:38.650235896 +0100 +++ gcc/tree-vect-patterns.c2021-02-01 11:04:48.136814236 +0100 @@ -1579,6 +1579,10 @@ vect_recog_over_widening_pattern (vec_in tree type = TREE_TYPE (lhs); tree_code code = gimple_assign_rhs_code (last_stmt); + /* Punt for reductions where we don't handle the type conversions. */ + if (STMT_VINFO_DEF_TYPE (last_stmt_info) == vect_reduction_def) +return NULL; + /* Keep the first operand of a COND_EXPR as-is: only the other two operands are interesting. */ unsigned int first_op = (code == COND_EXPR ? 2 : 1); --- gcc/testsuite/gcc.dg/vect/pr98848.c.jj 2021-02-01 11:15:11.965633750 +0100 +++ gcc/testsuite/gcc.dg/vect/pr98848.c 2021-02-01 11:15:07.348686891 +0100 @@ -0,0 +1,18 @@ +/* PR tree-optimization/98848 */ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +short a[9000]; + +int +foo (void) +{ + int b = a[0]; + int i; + for (i = 1; i < 9000; i ++) +if (a[i] < b) + b = a[i]; + return b; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { xfail { vect_no_int_add || vect_no_int_min_max } } } } */ --- gcc/testsuite/gcc.dg/vect/pr92205.c.jj 2020-01-12 11:54:37.645394821 +0100 +++ gcc/testsuite/gcc.dg/vect/pr92205.c 2021-02-02 09:59:40.998104863 +0100 @@ -10,4 +10,4 @@ int b(int n, unsigned char *a) return d; } -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { vect_unpack && { ! vect_no_bitwise } } } } } */ Jakub
[PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
Hi! The following testcase ICEs, because after the veclower pass which is the last point which lowers unsupported vector operations to supported ones (or scalars) match.pd simplifies a supported vector operation into unsupported one (vec << 1 >> 1 into vec & { ~1, ... }). Guarding each match.pd pattern that could do it manually would be IMHO a nightmare and it could affect hundreds of them, so this patch instead performs the verification and punting in the infrastructure (of course only in PROP_gimple_lvec state which is set after the vector lowering). The function attempts to match expand_vector_operations_1 (except I haven't added VEC_PERM_EXPR, VEC_COND_EXPR and COND_EXPR cases there, so those aren't checked and can be added later if we find it a problem). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-02-02 Jakub Jelinek PR tree-optimization/98287 * gimple-match.h (check_gimple_lvec): New declaration. * gimple-match-head.c (check_gimple_lvec): New function. (maybe_push_res_to_seq): Use it. * gimple-fold.c: Include tree-pass.h. (replace_stmt_with_simplification): Use it. * gcc.dg/pr98287.c: New test. --- gcc/gimple-match.h.jj 2021-01-04 10:25:38.456238093 +0100 +++ gcc/gimple-match.h 2021-02-01 13:19:30.393860766 +0100 @@ -335,6 +335,7 @@ extern tree (*mprts_hook) (gimple_match_ bool gimple_simplify (gimple *, gimple_match_op *, gimple_seq *, tree (*)(tree), tree (*)(tree)); +bool check_gimple_lvec (gimple_match_op *); tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *, tree res = NULL_TREE); void maybe_build_generic_op (gimple_match_op *); --- gcc/gimple-match-head.c.jj 2021-01-16 19:46:53.511672837 +0100 +++ gcc/gimple-match-head.c 2021-02-01 13:18:58.676225427 +0100 @@ -549,6 +549,109 @@ build_call_internal (internal_fn fn, gim res_op->op_or_null (4)); } +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return + true if RES_OP is not appropriate - would require vector operations that + would need to be lowered before expansion. */ + +bool +check_gimple_lvec (gimple_match_op *res_op) +{ + enum tree_code code = res_op->code; + enum gimple_rhs_class rhs_class = get_gimple_rhs_class (code); + if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS) +return false; + + tree rhs1 = res_op->op_or_null (0); + tree rhs2 = res_op->op_or_null (1); + tree type = res_op->type; + + if (!VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (rhs1))) +return false; + + /* A scalar operation pretending to be a vector one. */ + if (VECTOR_BOOLEAN_TYPE_P (type) + && !VECTOR_MODE_P (TYPE_MODE (type)) + && TYPE_MODE (type) != BLKmode + && (TREE_CODE_CLASS (code) != tcc_comparison + || (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)) + && !VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (rhs1))) + && TYPE_MODE (TREE_TYPE (rhs1)) != BLKmode))) +return false; + + machine_mode mode = TYPE_MODE (type); + optab op; + switch (code) +{ +CASE_CONVERT: +case FLOAT_EXPR: +case FIX_TRUNC_EXPR: +case VIEW_CONVERT_EXPR: + return false; + +case VEC_UNPACK_FLOAT_HI_EXPR: +case VEC_UNPACK_FLOAT_LO_EXPR: +case VEC_PACK_FLOAT_EXPR: + return false; + +case WIDEN_SUM_EXPR: +case VEC_WIDEN_PLUS_HI_EXPR: +case VEC_WIDEN_PLUS_LO_EXPR: +case VEC_WIDEN_MINUS_HI_EXPR: +case VEC_WIDEN_MINUS_LO_EXPR: +case VEC_WIDEN_MULT_HI_EXPR: +case VEC_WIDEN_MULT_LO_EXPR: +case VEC_WIDEN_MULT_EVEN_EXPR: +case VEC_WIDEN_MULT_ODD_EXPR: +case VEC_UNPACK_HI_EXPR: +case VEC_UNPACK_LO_EXPR: +case VEC_UNPACK_FIX_TRUNC_HI_EXPR: +case VEC_UNPACK_FIX_TRUNC_LO_EXPR: +case VEC_PACK_TRUNC_EXPR: +case VEC_PACK_SAT_EXPR: +case VEC_PACK_FIX_TRUNC_EXPR: +case VEC_WIDEN_LSHIFT_HI_EXPR: +case VEC_WIDEN_LSHIFT_LO_EXPR: + return false; + +case LSHIFT_EXPR: +case RSHIFT_EXPR: +case LROTATE_EXPR: +case RROTATE_EXPR: + if (!VECTOR_MODE_P (mode)) + return true; + if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2))) + { + op = optab_for_tree_code (code, type, optab_scalar); + if (op && optab_handler (op, mode) != CODE_FOR_nothing) + return false; + } + op = optab_for_tree_code (code, type, optab_vector); + if (op && optab_handler (op, mode) != CODE_FOR_nothing) + return false; + return true; + +case MULT_HIGHPART_EXPR: + if (!VECTOR_MODE_P (mode) + || !can_mult_highpart_p (mode, TYPE_UNSIGNED (type))) + return true; + return false; + +default: + if (!VECTOR_MODE_P (mode)) + return true; + op = optab_for_tree_code (code, type, optab_default); + if (op == unknown_optab + && code == NEGATE_EXPR + && INTEGRAL_TYPE_P (TREE_TYPE (type))) +
Re: [PATCH] tree-vect-patterns: Don't create over widening patterns for stmts used in reductions [PR98848]
On Tue, 2 Feb 2021, Jakub Jelinek wrote: > Hi! > > As discussed in the PR, the reduction code isn't able to cope with type > promotions/demotions in the reduction computation, so if we recognize an > over-widening pattern that has vect_reduction_def type, we most likely make > it non-vectorizable. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > turnk? OK. Thanks, Richard. > 2021-02-02 Jakub Jelinek > > PR tree-optimization/98848 > * tree-vect-patterns.c (vect_recog_over_widening_pattern): Punt if > STMT_VINFO_DEF_TYPE (last_stmt_info) is vect_reduction_def. > > * gcc.dg/vect/pr98848.c: New test. > * gcc.dg/vect/pr92205.c: Remove xfail. > > --- gcc/tree-vect-patterns.c.jj 2021-01-04 10:25:38.650235896 +0100 > +++ gcc/tree-vect-patterns.c 2021-02-01 11:04:48.136814236 +0100 > @@ -1579,6 +1579,10 @@ vect_recog_over_widening_pattern (vec_in >tree type = TREE_TYPE (lhs); >tree_code code = gimple_assign_rhs_code (last_stmt); > > + /* Punt for reductions where we don't handle the type conversions. */ > + if (STMT_VINFO_DEF_TYPE (last_stmt_info) == vect_reduction_def) > +return NULL; > + >/* Keep the first operand of a COND_EXPR as-is: only the other two > operands are interesting. */ >unsigned int first_op = (code == COND_EXPR ? 2 : 1); > --- gcc/testsuite/gcc.dg/vect/pr98848.c.jj2021-02-01 11:15:11.965633750 > +0100 > +++ gcc/testsuite/gcc.dg/vect/pr98848.c 2021-02-01 11:15:07.348686891 > +0100 > @@ -0,0 +1,18 @@ > +/* PR tree-optimization/98848 */ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +short a[9000]; > + > +int > +foo (void) > +{ > + int b = a[0]; > + int i; > + for (i = 1; i < 9000; i ++) > +if (a[i] < b) > + b = a[i]; > + return b; > +} > + > +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { xfail { > vect_no_int_add || vect_no_int_min_max } } } } */ > --- gcc/testsuite/gcc.dg/vect/pr92205.c.jj2020-01-12 11:54:37.645394821 > +0100 > +++ gcc/testsuite/gcc.dg/vect/pr92205.c 2021-02-02 09:59:40.998104863 > +0100 > @@ -10,4 +10,4 @@ int b(int n, unsigned char *a) >return d; > } > > -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } > } */ > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { > vect_unpack && { ! vect_no_bitwise } } } } } */ > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
[PATCH] lra-constraints: Fix error-recovery for bad inline-asms [PR97971]
Hi! The following testcase has ice-on-invalid, it can't be reloaded, but we shouldn't ICE the compiler because the user typed non-sense. In current_insn_transform we have: if (process_alt_operands (reused_alternative_num)) alt_p = true; if (check_only_p) return ! alt_p || best_losers != 0; /* If insn is commutative (it's safe to exchange a certain pair of operands) then we need to try each alternative twice, the second time matching those two operands as if we had exchanged them. To do this, really exchange them in operands. If we have just tried the alternatives the second time, return operands to normal and drop through. */ if (reused_alternative_num < 0 && commutative >= 0) { curr_swapped = !curr_swapped; if (curr_swapped) { swap_operands (commutative); goto try_swapped; } else swap_operands (commutative); } if (! alt_p && ! sec_mem_p) { /* No alternative works with reloads?? */ if (INSN_CODE (curr_insn) >= 0) fatal_insn ("unable to generate reloads for:", curr_insn); error_for_asm (curr_insn, "inconsistent operand constraints in an %"); lra_asm_error_p = true; ... and so handle inline asms there differently (and delete/nullify them after this) - fatal_insn is only called for non-inline asm. But in process_alt_operands we do: /* Both the earlyclobber operand and conflicting operand cannot both be user defined hard registers. */ if (HARD_REGISTER_P (operand_reg[i]) && REG_USERVAR_P (operand_reg[i]) && operand_reg[j] != NULL_RTX && HARD_REGISTER_P (operand_reg[j]) && REG_USERVAR_P (operand_reg[j])) fatal_insn ("unable to generate reloads for " "impossible constraints:", curr_insn); and thus ICE even for inline-asms. I think it is inappropriate to delete/nullify the insn in process_alt_operands, as it could be done e.g. in the check_only_p mode, so this patch just returns false in that case, which results in the caller have alt_p false, and as inline asm isn't simple move, sec_mem_p will be also false (and it isn't commutative either), so for check_only_p it will suggests to the callers it isn't ok and otherwise will emit error and delete/nullify the inline asm insn. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-02-02 Jakub Jelinek PR middle-end/97971 * lra-constraints.c (process_alt_operands): For inline asm, don't call fatal_insn, but instead return false. * gcc.target/i386/pr97971.c: New test. --- gcc/lra-constraints.c.jj2021-01-30 10:48:21.380667825 +0100 +++ gcc/lra-constraints.c 2021-02-01 14:20:53.053603804 +0100 @@ -3106,8 +3106,13 @@ process_alt_operands (int only_alternati && operand_reg[j] != NULL_RTX && HARD_REGISTER_P (operand_reg[j]) && REG_USERVAR_P (operand_reg[j])) - fatal_insn ("unable to generate reloads for " - "impossible constraints:", curr_insn); + { + /* For asm, let curr_insn_transform diagnose it. */ + if (INSN_CODE (curr_insn) < 0) + return false; + fatal_insn ("unable to generate reloads for " + "impossible constraints:", curr_insn); + } } if (last_conflict_j < 0) continue; --- gcc/testsuite/gcc.target/i386/pr97971.c.jj 2021-02-01 14:24:26.159160855 +0100 +++ gcc/testsuite/gcc.target/i386/pr97971.c 2021-02-01 14:25:10.586652128 +0100 @@ -0,0 +1,12 @@ +/* PR middle-end/97971 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int +foo (void) +{ + register _Complex long a asm ("rax"); + register int b asm ("rdx"); + asm ("# %0 %1" : "=&r" (a), "=r" (b)); /* { dg-error "inconsistent operand constraints in an 'asm'" } */ + return a; +} Jakub
[PATCH] ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]
Hi! As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in some insns) at expand_binop or expand_unop and expects it to do something (and then will check if it created valid insns and punts if not). These functions in the end if the operands don't match try to copy_to_mode_reg the operands, which does if (!general_operand (x, VOIDmode)) x = force_operand (x, temp); but, force_operand is far from handling all possible RTLs, it will ICE for all more unusual RTL codes. Basically handles just simple arithmetic and unary RTL operations if they have an optab and expand_simple_binop/expand_simple_unop ICE on others. The following patch fixes it by adding some operand verification (whether there is a hope that copy_to_mode_reg will succeed on those). It is added both to noce_emit_move_insn (not needed for this exact testcase, that function simply tries to recog the insn as is and if it fails, handles some simple binop/unop cases; the patch performs the verification of their operands) and noce_try_sign_mask. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-02-02 Jakub Jelinek PR middle-end/97487 * ifcvt.c (noce_can_force_operand): New function. (noce_emit_move_insn): Use it. (noce_try_sign_mask): Likewise. Formatting fix. * gcc.dg/pr97487-1.c: New test. * gcc.dg/pr97487-2.c: New test. --- gcc/ifcvt.c.jj 2021-01-04 10:25:39.663224426 +0100 +++ gcc/ifcvt.c 2021-02-01 16:57:35.483766472 +0100 @@ -885,6 +885,60 @@ noce_emit_store_flag (struct noce_if_inf || code == GEU || code == GTU), normalize); } +/* Return true if X can be safely forced into a register by copy_to_mode_reg + / force_operand. */ + +static bool +noce_can_force_operand (rtx x) +{ + if (general_operand (x, VOIDmode)) +return true; + if (SUBREG_P (x)) +{ + if (!noce_can_force_operand (SUBREG_REG (x))) + return false; + return true; +} + if (ARITHMETIC_P (x)) +{ + if (!noce_can_force_operand (XEXP (x, 0)) + || !noce_can_force_operand (XEXP (x, 1))) + return false; + switch (GET_CODE (x)) + { + case MULT: + case DIV: + case MOD: + case UDIV: + case UMOD: + return true; + default: + return code_to_optab (GET_CODE (x)); + } +} + if (UNARY_P (x)) +{ + if (!noce_can_force_operand (XEXP (x, 0))) + return false; + switch (GET_CODE (x)) + { + case ZERO_EXTEND: + case SIGN_EXTEND: + case TRUNCATE: + case FLOAT_EXTEND: + case FLOAT_TRUNCATE: + case FIX: + case UNSIGNED_FIX: + case FLOAT: + case UNSIGNED_FLOAT: + return true; + default: + return code_to_optab (GET_CODE (x)); + } +} + return false; +} + /* Emit instruction to move an rtx, possibly into STRICT_LOW_PART. X is the destination/target and Y is the value to copy. */ @@ -943,7 +997,7 @@ noce_emit_move_insn (rtx x, rtx y) { case RTX_UNARY: ot = code_to_optab (GET_CODE (y)); - if (ot) + if (ot && noce_can_force_operand (XEXP (y, 0))) { start_sequence (); target = expand_unop (GET_MODE (y), ot, XEXP (y, 0), x, 0); @@ -960,7 +1014,9 @@ noce_emit_move_insn (rtx x, rtx y) case RTX_BIN_ARITH: case RTX_COMM_ARITH: ot = code_to_optab (GET_CODE (y)); - if (ot) + if (ot + && noce_can_force_operand (XEXP (y, 0)) + && noce_can_force_operand (XEXP (y, 1))) { start_sequence (); target = expand_binop (GET_MODE (y), ot, @@ -2763,15 +2819,18 @@ noce_try_sign_mask (struct noce_if_info INSN_B which can happen for e.g. conditional stores to memory. For the cost computation use the block TEST_BB where the evaluation will end up after the transformation. */ - t_unconditional = -(t == if_info->b - && (if_info->insn_b == NULL_RTX -|| BLOCK_FOR_INSN (if_info->insn_b) == if_info->test_bb)); + t_unconditional += (t == if_info->b + && (if_info->insn_b == NULL_RTX + || BLOCK_FOR_INSN (if_info->insn_b) == if_info->test_bb)); if (!(t_unconditional || (set_src_cost (t, mode, if_info->speed_p) < COSTS_N_INSNS (2 return FALSE; + if (!noce_can_force_operand (t)) +return FALSE; + start_sequence (); /* Use emit_store_flag to generate "m < 0 ? -1 : 0" instead of expanding "(signed) m >> 31" directly. This benefits targets with specialized --- gcc/testsuite/gcc.dg/pr97487-1.c.jj 2021-02-01 17:11:48.281063978 +0100 +++ gcc/testsuite/gcc.dg/pr97487-1.c2021-02-01 17:11:07.530527523 +0100 @@ -0,0 +1,9 @@ +/* PR middle-end/97487 */ +/* { dg-do compile } */ +/* { dg-options "-O2 --param max-rtl-if-
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
Jakub Jelinek via Gcc-patches writes: > Hi! > > The following testcase ICEs, because after the veclower pass which is the > last point which lowers unsupported vector operations to supported ones > (or scalars) match.pd simplifies a supported vector operation into > unsupported one (vec << 1 >> 1 into vec & { ~1, ... }). > Guarding each match.pd pattern that could do it manually would be IMHO a > nightmare and it could affect hundreds of them, so this patch instead > performs the verification and punting in the infrastructure (of course only > in PROP_gimple_lvec state which is set after the vector lowering). > The function attempts to match expand_vector_operations_1 (except I haven't > added VEC_PERM_EXPR, VEC_COND_EXPR and COND_EXPR cases there, so those > aren't checked and can be added later if we find it a problem). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-02-02 Jakub Jelinek > > PR tree-optimization/98287 > * gimple-match.h (check_gimple_lvec): New declaration. > * gimple-match-head.c (check_gimple_lvec): New function. > (maybe_push_res_to_seq): Use it. > * gimple-fold.c: Include tree-pass.h. > (replace_stmt_with_simplification): Use it. > > * gcc.dg/pr98287.c: New test. > > --- gcc/gimple-match.h.jj 2021-01-04 10:25:38.456238093 +0100 > +++ gcc/gimple-match.h2021-02-01 13:19:30.393860766 +0100 > @@ -335,6 +335,7 @@ extern tree (*mprts_hook) (gimple_match_ > > bool gimple_simplify (gimple *, gimple_match_op *, gimple_seq *, > tree (*)(tree), tree (*)(tree)); > +bool check_gimple_lvec (gimple_match_op *); > tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *, > tree res = NULL_TREE); > void maybe_build_generic_op (gimple_match_op *); > --- gcc/gimple-match-head.c.jj2021-01-16 19:46:53.511672837 +0100 > +++ gcc/gimple-match-head.c 2021-02-01 13:18:58.676225427 +0100 > @@ -549,6 +549,109 @@ build_call_internal (internal_fn fn, gim >res_op->op_or_null (4)); > } > > +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return > + true if RES_OP is not appropriate - would require vector operations that > + would need to be lowered before expansion. */ > + > +bool > +check_gimple_lvec (gimple_match_op *res_op) Guess this is personal preference, but the return value seems inverted. I'd have expected true if something is OK and false if something isn't. > +{ > + enum tree_code code = res_op->code; > + enum gimple_rhs_class rhs_class = get_gimple_rhs_class (code); > + if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS) > +return false; > + > + tree rhs1 = res_op->op_or_null (0); > + tree rhs2 = res_op->op_or_null (1); > + tree type = res_op->type; > + > + if (!VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (rhs1))) > +return false; > + > + /* A scalar operation pretending to be a vector one. */ > + if (VECTOR_BOOLEAN_TYPE_P (type) > + && !VECTOR_MODE_P (TYPE_MODE (type)) > + && TYPE_MODE (type) != BLKmode > + && (TREE_CODE_CLASS (code) != tcc_comparison > + || (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)) > + && !VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (rhs1))) > + && TYPE_MODE (TREE_TYPE (rhs1)) != BLKmode))) > +return false; > + > + machine_mode mode = TYPE_MODE (type); > + optab op; > + switch (code) > +{ > +CASE_CONVERT: > +case FLOAT_EXPR: > +case FIX_TRUNC_EXPR: > +case VIEW_CONVERT_EXPR: > + return false; > + > +case VEC_UNPACK_FLOAT_HI_EXPR: > +case VEC_UNPACK_FLOAT_LO_EXPR: > +case VEC_PACK_FLOAT_EXPR: > + return false; > + > +case WIDEN_SUM_EXPR: > +case VEC_WIDEN_PLUS_HI_EXPR: > +case VEC_WIDEN_PLUS_LO_EXPR: > +case VEC_WIDEN_MINUS_HI_EXPR: > +case VEC_WIDEN_MINUS_LO_EXPR: > +case VEC_WIDEN_MULT_HI_EXPR: > +case VEC_WIDEN_MULT_LO_EXPR: > +case VEC_WIDEN_MULT_EVEN_EXPR: > +case VEC_WIDEN_MULT_ODD_EXPR: > +case VEC_UNPACK_HI_EXPR: > +case VEC_UNPACK_LO_EXPR: > +case VEC_UNPACK_FIX_TRUNC_HI_EXPR: > +case VEC_UNPACK_FIX_TRUNC_LO_EXPR: > +case VEC_PACK_TRUNC_EXPR: > +case VEC_PACK_SAT_EXPR: > +case VEC_PACK_FIX_TRUNC_EXPR: > +case VEC_WIDEN_LSHIFT_HI_EXPR: > +case VEC_WIDEN_LSHIFT_LO_EXPR: > + return false; > + > +case LSHIFT_EXPR: > +case RSHIFT_EXPR: > +case LROTATE_EXPR: > +case RROTATE_EXPR: > + if (!VECTOR_MODE_P (mode)) > + return true; > + if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2))) > + { > + op = optab_for_tree_code (code, type, optab_scalar); > + if (op && optab_handler (op, mode) != CODE_FOR_nothing) > + return false; > + } > + op = optab_for_tree_code (code, type, optab_vector); > + if (op && optab_handler (op, mode) != CODE_FOR_nothing) > + return false; > + return true; > + > +case MULT_HIGHP
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
On Tue, Feb 02, 2021 at 09:38:09AM +, Richard Sandiford wrote: > > +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return > > + true if RES_OP is not appropriate - would require vector operations that > > + would need to be lowered before expansion. */ > > + > > +bool > > +check_gimple_lvec (gimple_match_op *res_op) > > Guess this is personal preference, but the return value seems inverted. > I'd have expected true if something is OK and false if something isn't. Sure, I can change that. > > +default: > > + if (!VECTOR_MODE_P (mode)) > > + return true; > > + op = optab_for_tree_code (code, type, optab_default); > > + if (op == unknown_optab > > + && code == NEGATE_EXPR > > + && INTEGRAL_TYPE_P (TREE_TYPE (type))) > > + op = optab_for_tree_code (MINUS_EXPR, type, optab_default); > > + if (op && optab_handler (op, mode) != CODE_FOR_nothing) > > + return false; > > + return true; > > This doesn't look right for things like SVE comparisons, where the > result is a VECTOR_BOOLEAN_TYPE_P with MODE_VECTOR_BOOL and where > the inputs are something else. I guess it might also affect FP > comparisons on most targets. So how does that get through expand_vector_operations_1 ? I don't see there anything that would catch it until: if (compute_type == NULL_TREE) compute_type = get_compute_type (code, op, type); if (compute_type == type) return; and the above is an attempt to do what get_compute_type does. expand_vector_comparison indeed has one case: /* As seen in PR95830, we should not expand comparisons that are only feeding a VEC_COND_EXPR statement. */ ... and if expand_vector_comparison returns NULL, nothing is lowered. But we can't really look at immediate uses during the checking :(. So do you suggest for now letting all comparisons get through? Jakub
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
Jakub Jelinek via Gcc-patches writes: > On Tue, Feb 02, 2021 at 09:38:09AM +, Richard Sandiford wrote: >> > +default: >> > + if (!VECTOR_MODE_P (mode)) >> > + return true; >> > + op = optab_for_tree_code (code, type, optab_default); >> > + if (op == unknown_optab >> > +&& code == NEGATE_EXPR >> > +&& INTEGRAL_TYPE_P (TREE_TYPE (type))) >> > + op = optab_for_tree_code (MINUS_EXPR, type, optab_default); >> > + if (op && optab_handler (op, mode) != CODE_FOR_nothing) >> > + return false; >> > + return true; >> >> This doesn't look right for things like SVE comparisons, where the >> result is a VECTOR_BOOLEAN_TYPE_P with MODE_VECTOR_BOOL and where >> the inputs are something else. I guess it might also affect FP >> comparisons on most targets. > > So how does that get through expand_vector_operations_1 ? > > I don't see there anything that would catch it until: > if (compute_type == NULL_TREE) > compute_type = get_compute_type (code, op, type); > if (compute_type == type) > return; > and the above is an attempt to do what get_compute_type does. > > expand_vector_comparison indeed has one case: > /* As seen in PR95830, we should not expand comparisons that are only > feeding a VEC_COND_EXPR statement. */ > ... > and if expand_vector_comparison returns NULL, nothing is lowered. Right. But I think the main thing for SVE is the expand_vec_cmp_expr_p test, which takes both the lhs and rhs types. So… > But we can't really look at immediate uses during the checking :(. > > So do you suggest for now letting all comparisons get through? …I guess we could just use expand_vec_cmp_expr_p for the comparisons. But in general, I think it would be good to avoid duplicating so much of the tests. At least the: if (CONVERT_EXPR_CODE_P (code) || code == FLOAT_EXPR || code == FIX_TRUNC_EXPR || code == VIEW_CONVERT_EXPR) return; /* The signedness is determined from input argument. */ if (code == VEC_UNPACK_FLOAT_HI_EXPR || code == VEC_UNPACK_FLOAT_LO_EXPR || code == VEC_PACK_FLOAT_EXPR) { /* We do not know how to scalarize those. */ return; } /* For widening/narrowing vector operations, the relevant type is of the arguments, not the widened result. VEC_UNPACK_FLOAT_*_EXPR is calculated in the same way above. */ if (code == WIDEN_SUM_EXPR || code == VEC_WIDEN_PLUS_HI_EXPR || code == VEC_WIDEN_PLUS_LO_EXPR || code == VEC_WIDEN_MINUS_HI_EXPR || code == VEC_WIDEN_MINUS_LO_EXPR || code == VEC_WIDEN_MULT_HI_EXPR || code == VEC_WIDEN_MULT_LO_EXPR || code == VEC_WIDEN_MULT_EVEN_EXPR || code == VEC_WIDEN_MULT_ODD_EXPR || code == VEC_UNPACK_HI_EXPR || code == VEC_UNPACK_LO_EXPR || code == VEC_UNPACK_FIX_TRUNC_HI_EXPR || code == VEC_UNPACK_FIX_TRUNC_LO_EXPR || code == VEC_PACK_TRUNC_EXPR || code == VEC_PACK_SAT_EXPR || code == VEC_PACK_FIX_TRUNC_EXPR || code == VEC_WIDEN_LSHIFT_HI_EXPR || code == VEC_WIDEN_LSHIFT_LO_EXPR) { /* We do not know how to scalarize those. */ return; } part seems like it could be split out into a predicate. Maybe we could also split out the optab choice, e.g. returning the optab if a split is possible and null otherwise. Just a suggestion though. Thanks, Richard
RE: [backport gcc10, gcc9] Requet to backport PR97969
> On 2021-01-18 7:50 a.m., Richard Biener wrote: > > On Mon, 18 Jan 2021, Przemyslaw Wirkus wrote: > > > >> Hi all, > >> > >> Can we backport PR97969 patch to GCC 10 and (maybe) GCC 9 ?: > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 > >> > >> IMHO bug is severe and could land in GCC 10 and 9. Vladimir's original > patch: > >> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563322.html > >> applies without changes to both gcc-10 and gcc-9. > >> > >> I've regression tested this patch on both gcc-10 and gcc-9 branched > >> for > >> x86_64 cross (arm-eabi target) and no issues. > >> > >> OK for gcc-10 and gcc-9 ? > > I see two fallout PRs with a trivial search: PR98643 and PR98722. LRA > > patches quite easily trigger unexpected fallout unfortunately ... > > > Yes, I am agree. We should wait until the new regressions are fixed. I am > going to work on this patch more to fix the new regressions. Although the > basic idea of the original problem solution probably will stay the same. I've retested series of three patches which are related to this PR: 19af25c0b3aa2a78b4d45d295359ec26cb9fc607 [PR98777] 79c57603602c4493b6baa1d47ed451e8f5e9c0f3 [PR98722] 34aa56af2547e1646c0f07b9b88b210ebdb2a9f5 [PR97969] on top of gcc-10 branch. Bootstrapped and regression tested on aarch64-linux-gnu machine and no issues. Regression tested on x86_64 host (arm-eabi target) cross and no issues. OK for gcc-10 ? > >> PS: I can commit if approved. > >>
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
On Tue, 2 Feb 2021, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs, because after the veclower pass which is the > last point which lowers unsupported vector operations to supported ones > (or scalars) match.pd simplifies a supported vector operation into > unsupported one (vec << 1 >> 1 into vec & { ~1, ... }). > Guarding each match.pd pattern that could do it manually would be IMHO a > nightmare and it could affect hundreds of them, so this patch instead > performs the verification and punting in the infrastructure (of course only > in PROP_gimple_lvec state which is set after the vector lowering). > The function attempts to match expand_vector_operations_1 (except I haven't > added VEC_PERM_EXPR, VEC_COND_EXPR and COND_EXPR cases there, so those > aren't checked and can be added later if we find it a problem). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? So I fear this only covers parts of the paths simplifications can end up used. Now one question is whether we want to allow "invalid" intermediate transforms if a final match makes it "valid" again. If not then doing the verification in gimple-match.c itself might be preferable. OTOH I think that most patterns should verify themselves - it's also odd that v << 1 >> 1 is supported but not v & vec-cst - we might want to require some basic support from targets. I realize the PR is one of the odd V1mode cases which we might want to support better during RTL expansion (or require targets to be more sane here). So in the end I don't like the patch much, not at this stage anyway. Thanks, Richard. > 2021-02-02 Jakub Jelinek > > PR tree-optimization/98287 > * gimple-match.h (check_gimple_lvec): New declaration. > * gimple-match-head.c (check_gimple_lvec): New function. > (maybe_push_res_to_seq): Use it. > * gimple-fold.c: Include tree-pass.h. > (replace_stmt_with_simplification): Use it. > > * gcc.dg/pr98287.c: New test. > > --- gcc/gimple-match.h.jj 2021-01-04 10:25:38.456238093 +0100 > +++ gcc/gimple-match.h2021-02-01 13:19:30.393860766 +0100 > @@ -335,6 +335,7 @@ extern tree (*mprts_hook) (gimple_match_ > > bool gimple_simplify (gimple *, gimple_match_op *, gimple_seq *, > tree (*)(tree), tree (*)(tree)); > +bool check_gimple_lvec (gimple_match_op *); > tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *, > tree res = NULL_TREE); > void maybe_build_generic_op (gimple_match_op *); > --- gcc/gimple-match-head.c.jj2021-01-16 19:46:53.511672837 +0100 > +++ gcc/gimple-match-head.c 2021-02-01 13:18:58.676225427 +0100 > @@ -549,6 +549,109 @@ build_call_internal (internal_fn fn, gim >res_op->op_or_null (4)); > } > > +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return > + true if RES_OP is not appropriate - would require vector operations that > + would need to be lowered before expansion. */ > + > +bool > +check_gimple_lvec (gimple_match_op *res_op) > +{ > + enum tree_code code = res_op->code; > + enum gimple_rhs_class rhs_class = get_gimple_rhs_class (code); > + if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS) > +return false; > + > + tree rhs1 = res_op->op_or_null (0); > + tree rhs2 = res_op->op_or_null (1); > + tree type = res_op->type; > + > + if (!VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (rhs1))) > +return false; > + > + /* A scalar operation pretending to be a vector one. */ > + if (VECTOR_BOOLEAN_TYPE_P (type) > + && !VECTOR_MODE_P (TYPE_MODE (type)) > + && TYPE_MODE (type) != BLKmode > + && (TREE_CODE_CLASS (code) != tcc_comparison > + || (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)) > + && !VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (rhs1))) > + && TYPE_MODE (TREE_TYPE (rhs1)) != BLKmode))) > +return false; > + > + machine_mode mode = TYPE_MODE (type); > + optab op; > + switch (code) > +{ > +CASE_CONVERT: > +case FLOAT_EXPR: > +case FIX_TRUNC_EXPR: > +case VIEW_CONVERT_EXPR: > + return false; > + > +case VEC_UNPACK_FLOAT_HI_EXPR: > +case VEC_UNPACK_FLOAT_LO_EXPR: > +case VEC_PACK_FLOAT_EXPR: > + return false; > + > +case WIDEN_SUM_EXPR: > +case VEC_WIDEN_PLUS_HI_EXPR: > +case VEC_WIDEN_PLUS_LO_EXPR: > +case VEC_WIDEN_MINUS_HI_EXPR: > +case VEC_WIDEN_MINUS_LO_EXPR: > +case VEC_WIDEN_MULT_HI_EXPR: > +case VEC_WIDEN_MULT_LO_EXPR: > +case VEC_WIDEN_MULT_EVEN_EXPR: > +case VEC_WIDEN_MULT_ODD_EXPR: > +case VEC_UNPACK_HI_EXPR: > +case VEC_UNPACK_LO_EXPR: > +case VEC_UNPACK_FIX_TRUNC_HI_EXPR: > +case VEC_UNPACK_FIX_TRUNC_LO_EXPR: > +case VEC_PACK_TRUNC_EXPR: > +case VEC_PACK_SAT_EXPR: > +case VEC_PACK_FIX_TRUNC_EXPR: > +case VEC_WIDEN_LSHIFT_HI_EXPR: > +case VEC_WIDEN_LSHIFT_LO_EXPR: > + return false; > + > +case LSHIFT_E
RE: [backport gcc10, gcc9] Requet to backport PR97969
On Tue, 2 Feb 2021, Przemyslaw Wirkus wrote: > > On 2021-01-18 7:50 a.m., Richard Biener wrote: > > > On Mon, 18 Jan 2021, Przemyslaw Wirkus wrote: > > > > > >> Hi all, > > >> > > >> Can we backport PR97969 patch to GCC 10 and (maybe) GCC 9 ?: > > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 > > >> > > >> IMHO bug is severe and could land in GCC 10 and 9. Vladimir's original > > patch: > > >> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563322.html > > >> applies without changes to both gcc-10 and gcc-9. > > >> > > >> I've regression tested this patch on both gcc-10 and gcc-9 branched > > >> for > > >> x86_64 cross (arm-eabi target) and no issues. > > >> > > >> OK for gcc-10 and gcc-9 ? > > > I see two fallout PRs with a trivial search: PR98643 and PR98722. LRA > > > patches quite easily trigger unexpected fallout unfortunately ... > > > > > Yes, I am agree. We should wait until the new regressions are fixed. I > > am > > going to work on this patch more to fix the new regressions. Although the > > basic idea of the original problem solution probably will stay the same. > > I've retested series of three patches which are related to this PR: > > 19af25c0b3aa2a78b4d45d295359ec26cb9fc607 [PR98777] > 79c57603602c4493b6baa1d47ed451e8f5e9c0f3 [PR98722] > 34aa56af2547e1646c0f07b9b88b210ebdb2a9f5 [PR97969] > > on top of gcc-10 branch. > > Bootstrapped and regression tested on aarch64-linux-gnu machine and no issues. > Regression tested on x86_64 host (arm-eabi target) cross and no issues. > > OK for gcc-10 ? I think this warrants waiting until at least the GCC 11 release. Richard.
Re: [committed] libstdc++: Update C++17 status table for
On 01/02/21 16:08 +, Jonathan Wakely wrote: libstdc++-v3/ChangeLog: * doc/xml/manual/status_cxx2011.xml: Update std::call_once status. * doc/xml/manual/status_cxx2014.xml: Likewise. * doc/xml/manual/status_cxx2017.xml: Likewise. Update std::from_chars and std::to_chars status. Fix formatting. * doc/html/manual/status.html: Regenerate. I noticed we have an unwanted extra column at the right of the C++11/14/17 tables, due to a stray element. Fixed by this patch. Committed to trunk. commit 886f9f519c0c6297c95887839e318fa79cba2052 Author: Jonathan Wakely Date: Tue Feb 2 09:55:52 2021 libstdc++: Fix markup for status tables in docs libstdc++-v3/ChangeLog: * doc/xml/manual/status_cxx2011.xml: Remove stray table cell. * doc/xml/manual/status_cxx2014.xml: Likewise. * doc/xml/manual/status_cxx2017.xml: Likewise. * doc/html/manual/status.html: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml index e13ca566ea3..be873962597 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml @@ -2174,7 +2174,6 @@ particular release. Class template regex_traits Partial transform_primary is not correctly implemented - 28.8 diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml index 7b2d4603b24..61bea5adad5 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml @@ -1142,7 +1142,6 @@ not in any particular release. Class template regex_traits Partial transform_primary is not correctly implemented - 28.8 diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml index 7b5df3d1385..aa34b8c3cf5 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml @@ -2224,7 +2224,6 @@ since C++14 and the implementation is complete. Class template regex_traits Partial transform_primary is not correctly implemented - 31.8
[r11-7011 Regression] FAIL: g++.dg/cpp0x/alias-decl-dr1558.C -std=c++17 (test for excess errors) on Linux/x86_64
On Linux/x86_64, 6e0a231a4aa2407bb7167daf98a37795a67364d8 is the first bad commit commit 6e0a231a4aa2407bb7167daf98a37795a67364d8 Author: Jason Merrill Date: Wed Jan 27 17:15:39 2021 -0500 c++: alias in qualified-id in template arg [PR98570] caused FAIL: g++.dg/modules/xtreme-header-1_a.H module-cmi (gcm.cache/$srcdir/g++.dg/modules/xtreme-header-1_a.H.gcm) FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2a (internal compiler error) FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2a (test for excess errors) FAIL: g++.dg/pr77550.C (internal compiler error) FAIL: g++.dg/pr77550.C (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-7011/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header-1_a.H --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr77550.C --target_board='unix{-m32}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)
Hi all, the attached patch now handles -fcoarray=single and access to the local image like no coarray access, using the same check a before. Actually, I think we could remove the if (..coarray..) check completely: For 'single', it is like no coarrays; for 'lib' both AA[remote] =... and "AA = ... AA[remove]" will create a function call and uses already temporaries. I don't know what 'native'/'threads' does – but either it the image index is different, then no temporary is needed at all – or it is the same (in particular: this_image / not present), but then the noncoarray analysis would work. (Knowing that for i != this_image(), AA and AA[i] – or i /= j for AA[i] and AA[j] – we could further relax the checks, but I don't think that that's needed.) Hence: Is the patch OK for the trunk? Or should we completely get rid of if-conditional block? Tobias On 01.02.21 12:52, Tobias Burnus wrote: On 01.02.21 08:42, Thomas Koenig via Fortran wrote: I have a question with -Warray-temporaries in the test below. With the AA coarray that warning appears in the first loop (on its local part), but not with the BB array with the same task, i.e. [...] It seems that dependency checking does not detect that no overlap can exist in that case, and so generates a temporary. Probably, dependency checking was not updated when coarrays were introduced. This is a missed optimization, not a correctness issue. I have now filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98913 - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf Fortran: Fix Array dependency with local coarrays [PR98913] gcc/fortran/ChangeLog: PR fortran/98913 * dependency.c (gfc_dep_resolver): Treat local access to coarrays like any array access in dependency analysis. gcc/testsuite/ChangeLog: PR fortran/98913 * gfortran.dg/coarray/array_temporary.f90: New test. gcc/fortran/dependency.c | 11 +++- .../gfortran.dg/coarray/array_temporary.f90| 74 ++ 2 files changed, 83 insertions(+), 2 deletions(-) diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c index c9baca80cbc..0de5d093aab 100644 --- a/gcc/fortran/dependency.c +++ b/gcc/fortran/dependency.c @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3. If not see #include "dependency.h" #include "constructor.h" #include "arith.h" +#include "options.h" /* static declarations */ /* Enums */ @@ -2143,8 +2144,14 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse, case REF_ARRAY: - /* For now, treat all coarrays as dangerous. */ - if (lref->u.ar.codimen || rref->u.ar.codimen) + /* For now, treat all nonlocal coarrays as dangerous. */ + if (flag_coarray != GFC_FCOARRAY_SINGLE + && ((lref->u.ar.codimen + && lref->u.ar.dimen_type[lref->u.ar.dimen] + != DIMEN_THIS_IMAGE) + || (rref->u.ar.codimen + && lref->u.ar.dimen_type[lref->u.ar.dimen] + != DIMEN_THIS_IMAGE))) return 1; if (ref_same_as_full_array (lref, rref)) diff --git a/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 new file mode 100644 index 000..86460a7c282 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-additional-options "-Warray-temporaries" } +! +! PR fortran/98913 +! +! Contributed by Jorge D'Elia +! +! Did create an array temporary for local access to coarray +! (but not for identical noncoarray use). +! + +program test + implicit none + integer, parameter :: iin = kind (1) + integer, parameter :: idp = kind (1.0d0) + real(kind=idp), allocatable :: AA (:,:)[:] + real(kind=idp), allocatable :: BB (:,:) + real(kind=idp), allocatable :: UU (:) + integer (kind=iin) :: nn, n1, n2 + integer (kind=iin) :: j, k, k1 + ! + nn = 5 + n1 = 1 + n2 = 10 + ! + allocate (AA (1:nn,n1:n2)[*]) + allocate (BB (1:nn,n1:n2)) + allocate (UU (1:nn)) + ! + k = 1 + k1 = k + 1 + ! + AA = 1.0_idp + BB = 1.0_idp + UU = 2.0_idp + + ! AA - coarrays + ! No temporary needed: + do j = 1, nn +AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) ! { dg-bogus "Creating array temporary" } + end do + do j = 1, nn +AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1-1:nn-1,j) ! { dg-bogus "Creating array temporary" } + end do + do j = 1, nn +AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j) ! { dg-bogus "Creating array temporary" } + end do + + ! But: + do j = 1, nn +AA (k1:nn,j) = AA (k1-1:nn-1,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j) ! { dg-warning "Creating array temporary" } + end do + + ! BB - no coarrays + ! No temporary needed: + do j = 1, nn +BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j) ! { dg-bogus "Creating a
[PATCH] aarch64: Relax builtin flags for integer builtins
Hi all, This patch relaxes the flags for most integer builtins to NONE as they don't read/write memory and don't care about the FPCR/FPSR or exceptions so we should be more aggressive with them. This leads to fallout in a testcase where the result of an intrinsic was unused and it is now DCE'd. The testcase is adjusted. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (clrsb, clz, ctz, popcount, vec_smult_lane_, vec_smlal_lane_, vec_smult_laneq_, vec_smlal_laneq_, vec_umult_lane_, vec_umlal_lane_, vec_umult_laneq_, vec_umlal_laneq_, ashl, sshl, ushl, srshl, urshl, sdot_lane, udot_lane, sdot_laneq, udot_laneq, usdot_lane, usdot_laneq, sudot_lane, sudot_laneq, ashr, ashr_simd, lshr, lshr_simd, srshr_n, urshr_n, ssra_n, usra_n, srsra_n, ursra_n, sshll_n, ushll_n, sshll2_n, ushll2_n, ssri_n, usri_n, ssli_n, ssli_n, usli_n, bswap, rbit, simd_bsl, eor3q, rax1q, xarq, bcaxq): Use NONE builtin flags. gcc/testsuite/ChangeLog: * gcc.target/aarch64/arg-type-diagnostics-1.c: Return result from foo. flag-int.patch Description: flag-int.patch
[PATCH] aarch64: Relax flags for floating-point builtins to FP where appropriate
Hi all, This patch relaxes various floating-point builtins to use the FP flags to signify they made use the FPCR or raise exceptions. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (fcmla_lane0, fcmla_lane90, fcmla_lane180, fcmla_lane270, fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270, scvtf, ucvtf, fcvtzs, fcvtzu, scvtfsi, scvtfdi, ucvtfsi, ucvtfdi, fcvtzshf, fcvtzuhf, fmlal_lane_low, fmlsl_lane_low, fmlal_laneq_low, fmlsl_laneq_low, fmlalq_lane_low, fmlslq_lane_low, fmlalq_laneq_low, fmlslq_laneq_low, fmlal_lane_high, fmlsl_lane_high, fmlal_laneq_high, fmlsl_laneq_high, fmlalq_lane_high, fmlslq_lane_high, fmlalq_laneq_high, fmlslq_laneq_high): Use FP flags. fp-flag.patch Description: fp-flag.patch
[PATCH] aarch64: Relax some builtins to AUTO_FP
Hi all, This patch relaxes the flags for some builtins to AUTO_FP. These builtins do permutes and similar, so they shouldn't get the FP flags when operating on floating-point modes as they don't care about FPCR/FPSR and exceptions. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (combine, zip1, zip2, uzp1, uzp2, trn1, trn2, simd_bsl): Use AUTO_FP flags. autofp-flag.patch Description: autofp-flag.patch
[PATCH] aarch64: Add and use FLAG_LOAD in builtins
Hi all, We already have a STORE flag that we use for builtins. This patch introduces a LOAD set that uses AUTO_FP and FLAG_READ_MEMORY. This allows for more aggressive optimisation of the load intrinsics. Turns out we have a great many testcases that do: float16x4x2_t f_vld2_lane_f16 (float16_t * p, float16x4x2_t v) { float16x4x2_t res; /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */ res = vld2_lane_f16 (p, v, 4); /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */ res = vld2_lane_f16 (p, v, -1); return res; } but since the first res is unused it now gets eliminated early on before we get to give an error message. Ideally we'd like to warn for both. This patch takes the conservative approach and doesn't convert the load-lane builtins to LOAD ; that's something we can improve later. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (FLAG_LOAD): Define. * config/aarch64/aarch64-simd-builtins.def (ld1x2, ld2, ld3, ld4, ld2r, ld3r, ld4r, ld1, ld1x3, ld1x4): Use LOAD flags. load-flag.patch Description: load-flag.patch
[PATCH] aarch64: Update flags for bfloat16 builtins
Hi all, This patch updates the flags for the bfloat16 builtins. The bfdot ones aren't affected by the FPCR/FPSR so can be AUTO_FP whereas the bfmlal ones follow the normal floating-point instructions and get FP. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (bfdot_lane, bfdot_laneq): Use AUTO_FP flags. (bfmlalb_lane, bfmlalt_lane, bfmlalb_lane_q, bfmlalt_lane_q): Use FP flags. bfloat-flag.patch Description: bfloat-flag.patch
[Patch, fortran] PR98897 - Erroneous procedure attribute for associate name
This is more or less 'obvious' and does not require any further explanation. Regtests with FC33/x86_64 - OK for master (and )? Paul Fortran: Fix calls to associate name typebound subroutines [PR98897]. 2021-02-02 Paul Thomas gcc/fortran PR fortran/98897 * match.c (gfc_match_call): Include associate names as possible entities with typebound subroutines. The target needs to be resolved for the type. gcc/testsuite/ PR fortran/98897 * gfortran.dg/typebound_call_32.f90: New test. diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c index f0469e25da6..2df6191d7e6 100644 --- a/gcc/fortran/match.c +++ b/gcc/fortran/match.c @@ -4999,10 +4999,16 @@ gfc_match_call (void) sym = st->n.sym; /* If this is a variable of derived-type, it probably starts a type-bound - procedure call. */ - if ((sym->attr.flavor != FL_PROCEDURE - || gfc_is_function_return_value (sym, gfc_current_ns)) - && (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)) + procedure call. Associate variable targets have to be resolved for the + target type. */ + if (((sym->attr.flavor != FL_PROCEDURE + || gfc_is_function_return_value (sym, gfc_current_ns)) + && (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)) + || + (sym->assoc && sym->assoc->target + && gfc_resolve_expr (sym->assoc->target) + && (sym->assoc->target->ts.type == BT_DERIVED + || sym->assoc->target->ts.type == BT_CLASS))) return match_typebound_call (st); /* If it does not seem to be callable (include functions so that the ! { dg-do run } ! ! Test the fix for PR98897 in which typebound subroutines of associate names ! were not recognised in a call. Functions were OK but this is tested below. ! ! Contributed by Damian Rouson ! module output_data_m implicit none type output_data_t integer, private :: i = 0 contains procedure output, return_value end type contains subroutine output(self) implicit none class(output_data_t) self self%i = 1234 end subroutine integer function return_value(self) implicit none class(output_data_t) self return_value = self%i end function end module use output_data_m implicit none associate(output_data => output_data_t()) call output_data%output if (output_data%return_value() .ne. 1234) stop 1 end associate end
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
On Tue, Feb 02, 2021 at 11:06:33AM +0100, Richard Biener wrote: > So I fear this only covers parts of the paths simplifications can > end up used. Now one question is whether we want to allow > "invalid" intermediate transforms if a final match makes it > "valid" again. If not then doing the verification in > gimple-match.c itself might be preferable. OTOH I think that > most patterns should verify themselves - it's also odd that > v << 1 >> 1 is supported but not v & vec-cst - we might want to > require some basic support from targets. I realize the PR is > one of the odd V1mode cases which we might want to support better > during RTL expansion (or require targets to be more sane here). > > So in the end I don't like the patch much, not at this stage > anyway. I'm afraid we can't just ignore it for GCC 11, there are many passes after veclower and we have hundreds of folders on vector ops and many of them can just turn something that is handled into something that isn't, even with normal vector types and not the corner cases. Not all targets support all optabs... What about a modification of the patch which instead of preventing the simplifications clears PROP_gimple_lvec property and schedule another veclower pass right before isel which would be only run if the property isn't set (I think the -O0 version of the pass already works that way). Jakub
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
On Tue, 2 Feb 2021, Jakub Jelinek wrote: > On Tue, Feb 02, 2021 at 11:06:33AM +0100, Richard Biener wrote: > > So I fear this only covers parts of the paths simplifications can > > end up used. Now one question is whether we want to allow > > "invalid" intermediate transforms if a final match makes it > > "valid" again. If not then doing the verification in > > gimple-match.c itself might be preferable. OTOH I think that > > most patterns should verify themselves - it's also odd that > > v << 1 >> 1 is supported but not v & vec-cst - we might want to > > require some basic support from targets. I realize the PR is > > one of the odd V1mode cases which we might want to support better > > during RTL expansion (or require targets to be more sane here). > > > > So in the end I don't like the patch much, not at this stage > > anyway. > > I'm afraid we can't just ignore it for GCC 11, there are many passes after > veclower and we have hundreds of folders on vector ops and many of them can > just turn something that is handled into something that isn't, even with > normal vector types and not the corner cases. Not all targets support all > optabs... > What about a modification of the patch which instead of preventing the > simplifications clears PROP_gimple_lvec property and schedule another > veclower pass right before isel which would be only run if the property > isn't set (I think the -O0 version of the pass already works that way). I don't say ignore the problem. All I say is that the x86 target should either not advertise V1DF shifts or advertise the basic ops that reasonable simplification would expect to exist. The testcases sofar are artificially generating V1DF operations. Now, if we want to have any fully automatic way of disabling transforms like you propose then IMHO it should be based on common a "is this op supported" machinery shared by vector lowering and the new machinery (and eventually also used by RTL expansion and the optab checks that exist for the vectorizers purpose). Otherwise it's just a quite fragile (and invisble) machinery again. Btw, I just can find V1DI mentioned in mmx.md but I can't find rotate or shift patterns that would match? That said, for match.pd result verification IMHO the correct place is a gimple_match_op::target_supported_p routine or so that we'd call in gimple-match.c like if (__builtin_expect (dump_file && (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", "match.pd", 600, __FILE__, __LINE__); { res_op->set_op (NOP_EXPR, type, 1); { tree _o1[2], _r1; _o1[0] = captures[0]; _o1[1] = captures[1]; gimple_match_op tem_op (res_op->cond.any_else (), TRUNC_MOD_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); tem_op.resimplify (lseq, valueize); < here _r1 = maybe_push_res_to_seq (&tem_op, lseq); vector lowering could then build a gimple_match_op and just call the new routine as predicate. We could return a tri-state to explicitely signal "don't know" (aka not fully implement the function but try to be spot-on for vector optabs). That said, I'd prefer a x86 specific solution for the PR at this point (just kill those shift V1DImode optab entries?) Richard.
[PATCH 0/4] openacc: Mixing array elements and derived types
This patch series fixes support for mixing arbitrary derived type accesses and array element accesses in OpenACC directives, and fixes a bug in derived-type lowering. Somewhat relatedly, it also allows strided accesses in update directives. Further commentary alongside each patch. Some of the test cases were originally written by Tobias Burnus. Tested with offloading to AMD GCN. OK for mainline (or for stage1)? Thanks, Julian Julian Brown (4): openacc: Remove dereference for non-pointer derived-type members openacc: Use class_pointer instead of pointer attribute for class types openacc: Fix lowering for derived-type mappings through array elements openacc: Allow strided arrays in update directives gcc/fortran/openmp.c | 5 +- gcc/fortran/trans-openmp.c| 196 +++--- gcc/gimplify.c| 12 ++ .../gfortran.dg/goacc/array-with-dt-1.f90 | 11 + .../gfortran.dg/goacc/array-with-dt-2.f90 | 10 + .../gfortran.dg/goacc/array-with-dt-3.f90 | 14 ++ .../gfortran.dg/goacc/array-with-dt-4.f90 | 18 ++ .../gfortran.dg/goacc/array-with-dt-5.f90 | 12 ++ .../goacc/derived-classtypes-1.f90| 129 .../derivedtypes-arrays-1.f90 | 109 ++ .../libgomp.oacc-fortran/update-dt-array.f90 | 53 + 11 files changed, 487 insertions(+), 82 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-5.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/update-dt-array.f90 -- 2.29.2
[PATCH 1/4] openacc: Remove dereference for non-pointer derived-type members
Derived-type members that are themselves derived types are not always represented as pointers, so it is not always correct to dereference them. The attached test case fails during gimplification without this patch. Tested with offloading to AMD GCN. OK for mainline? Thanks, Julian 2020-02-02 Julian Brown gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Handle non-pointer type. gcc/testsuite/ * gfortran.dg/goacc/derived-classtypes-1.f95: New test. --- gcc/fortran/trans-openmp.c| 7 +- .../goacc/derived-classtypes-1.f95| 129 ++ 2 files changed, 134 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 00358ca4d39..8d8da4593c3 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -3013,7 +3013,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } OMP_CLAUSE_DECL (node) - = build_fold_indirect_ref (data); + = (POINTER_TYPE_P (TREE_TYPE (data)) + ? build_fold_indirect_ref (data) : data); OMP_CLAUSE_SIZE (node) = size; node2 = build_omp_clause (input_location, OMP_CLAUSE_MAP); @@ -3021,7 +3022,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, openacc ? GOMP_MAP_ATTACH_DETACH : GOMP_MAP_ALWAYS_POINTER); - OMP_CLAUSE_DECL (node2) = data; + OMP_CLAUSE_DECL (node2) + = (POINTER_TYPE_P (TREE_TYPE (data)) + ? data : build_fold_addr_expr (data)); OMP_CLAUSE_SIZE (node2) = size_int (0); } else diff --git a/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 new file mode 100644 index 000..e6cf09c6d3c --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 @@ -0,0 +1,129 @@ +type :: type1 + integer :: a +end type type1 + +type :: type2 + integer, pointer :: b +end type type2 + +type :: aux1 + integer :: y +end type aux1 + +type, extends(aux1) :: aux + integer :: x +end type aux + +type :: type3 + class(aux), pointer :: c(:) +end type type3 + +type :: type4 + integer, pointer :: d(:) +end type type4 + +type :: type5 + type(aux) :: e +end type type5 + +type :: type6 + type(aux), pointer :: f +end type type6 + +type :: type7 + class(aux), pointer :: g +end type type7 + +type(type1) :: foo +type(type2) :: bar +type(type3) :: qux +type(type4) :: quux +type(type5) :: fred +type(type6) :: jim +type(type7) :: shiela + +type(type1), pointer :: pfoo +type(type2), pointer :: pbar +type(type3), pointer :: pqux +type(type4), pointer :: pquux +type(type5), pointer :: pfred +type(type6), pointer :: pjim +type(type7), pointer :: pshiela + +class(type1), pointer :: cfoo +class(type2), pointer :: cbar +class(type3), pointer :: cqux +class(type4), pointer :: cquux +class(type5), pointer :: cfred +class(type6), pointer :: cjim +class(type7), pointer :: cshiela + +class(type1), allocatable :: acfoo +class(type2), allocatable :: acbar +class(type3), allocatable :: acqux +class(type4), allocatable :: acquux +class(type5), allocatable :: acfred +class(type6), allocatable :: acjim +class(type7), allocatable :: acshiela + +!$acc enter data copyin(foo) +!$acc enter data copyin(foo%a) +!$acc enter data copyin(bar) +!$acc enter data copyin(bar%b) +!$acc enter data copyin(qux) +!!$acc enter data copyin(qux%c) +!$acc enter data copyin(quux) +!$acc enter data copyin(quux%d) +!$acc enter data copyin(fred) +!$acc enter data copyin(fred%e) +!$acc enter data copyin(jim) +!$acc enter data copyin(jim%f) +!$acc enter data copyin(shiela) +!$acc enter data copyin(shiela%g) + +!$acc enter data copyin(pfoo) +!$acc enter data copyin(pfoo%a) +!$acc enter data copyin(pbar) +!$acc enter data copyin(pbar%b) +!$acc enter data copyin(pqux) +!!$acc enter data copyin(pqux%c) +!$acc enter data copyin(pquux) +!$acc enter data copyin(pquux%d) +!$acc enter data copyin(pfred) +!$acc enter data copyin(pfred%e) +!$acc enter data copyin(pjim) +!$acc enter data copyin(pjim%f) +!$acc enter data copyin(pshiela) +!$acc enter data copyin(pshiela%g) + +!$acc enter data copyin(cfoo) +!$acc enter data copyin(cfoo%a) +!$acc enter data copyin(cbar) +!$acc enter data copyin(cbar%b) +!$acc enter data copyin(cqux) +!!$acc enter data copyin(cqux%c) +!$acc enter data copyin(cquux) +!$acc enter data copyin(cquux%d) +!$acc enter data copyin(cfred) +!$acc enter
[PATCH 2/4] openacc: Use class_pointer instead of pointer attribute for class types
Elsewhere in the Fortran front-end, the class_pointer attribute is used for BT_CLASS entities instead of the pointer attribute. This patch follows suit for OpenACC. I couldn't actually come up with a test case where this makes a difference (i.e., where "class_pointer" and "pointer" have different values at this point in the code), but this may nonetheless fix a latent bug. Tested with offloading to AMD GCN. OK for mainline? Thanks, Julian 2020-02-02 Julian Brown gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Use class_pointer attribute for BT_CLASS. --- gcc/fortran/trans-openmp.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 8d8da4593c3..7be34ef9a35 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -2997,7 +2997,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, if (lastcomp->u.c.component->ts.type == BT_DERIVED || lastcomp->u.c.component->ts.type == BT_CLASS) { - if (sym_attr.pointer || (openacc && sym_attr.allocatable)) + bool pointer + = (lastcomp->u.c.component->ts.type == BT_CLASS + ? sym_attr.class_pointer : sym_attr.pointer); + if (pointer || (openacc && sym_attr.allocatable)) { tree data, size; -- 2.29.2
[PATCH 4/4] openacc: Allow strided arrays in update directives
OpenACC 3.0 ("2.14.4. Update Directive") states: Noncontiguous subarrays may appear. It is implementation-specific whether noncontiguous regions are updated by using one transfer for each contiguous subregion, or whether the non-contiguous data is packed, transferred once, and unpacked, or whether one or more larger subarrays (no larger than the smallest contiguous region that contains the specified subarray) are updated. This patch relaxes some conditions in the Fortran front-end so that strided accesses are permitted for update directives. Tested with offloading to AMD GCN. OK for mainline? Thanks, Julian 2020-02-02 Julian Brown gcc/fortran/ * openmp.c (resolve_omp_clauses): Omit OpenACC update in contiguity check and stride-specified error. gcc/testsuite/ * gfortran.dg/goacc/array-with-dt-2.f90: New test. --- gcc/fortran/openmp.c| 5 +++-- gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 | 10 ++ 2 files changed, 13 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 9a3a8f63b5e..1c8c5315329 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -5192,7 +5192,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses, array isn't contiguous. An expression such as arr(-n:n,-n:n) could be contiguous even if it looks like it may not be. */ - if (list != OMP_LIST_CACHE + if (code->op != EXEC_OACC_UPDATE + && list != OMP_LIST_CACHE && list != OMP_LIST_DEPEND && !gfc_is_simply_contiguous (n->expr, false, true) && gfc_is_not_contiguous (n->expr)) @@ -5224,7 +5225,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses, int i; gfc_array_ref *ar = &array_ref->u.ar; for (i = 0; i < ar->dimen; i++) - if (ar->stride[i]) + if (ar->stride[i] && code->op != EXEC_OACC_UPDATE) { gfc_error ("Stride should not be specified for " "array section in %s clause at %L", diff --git a/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 b/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 new file mode 100644 index 000..807580d75a9 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 @@ -0,0 +1,10 @@ +type t + integer, allocatable :: A(:,:) +end type t + +type(t), allocatable :: b(:) + +!$acc update host(b(::2)) +!$acc update host(b(1)%A(::3,::4)) +end + -- 2.29.2
[PATCH 3/4] openacc: Fix lowering for derived-type mappings through array elements
This patch fixes lowering of derived-type mappings which select elements of arrays of derived types, and similar. These would previously lead to ICEs. With this change, update directives and enter/exit data directives can pass through constructs that are no longer recognized by the gimplifier, hence alterations are needed there also. Tested with offloading to AMD GCN. OK for mainline? Thanks, Julian 2020-02-02 Julian Brown gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Handle element selection for arrays of derived types. gcc/ * gimplify.c (gimplify_scan_omp_clauses): Handle ATTACH_DETACH for non-decls. gcc/testsuite/ * gfortran.dg/goacc/array-with-dt-1.f90: New test. * gfortran.dg/goacc/array-with-dt-3.f90: Likewise. * gfortran.dg/goacc/array-with-dt-4.f90: Likewise. * gfortran.dg/goacc/array-with-dt-5.f90: Likewise. * gfortran.dg/goacc/derived-classtypes-1.f95: Uncomment previously-broken directives. libgomp/ * testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: New test. * testsuite/libgomp.oacc-fortran/update-dt-array.f90: Likewise. --- gcc/fortran/trans-openmp.c| 192 ++ gcc/gimplify.c| 12 ++ .../gfortran.dg/goacc/array-with-dt-1.f90 | 11 + .../gfortran.dg/goacc/array-with-dt-3.f90 | 14 ++ .../gfortran.dg/goacc/array-with-dt-4.f90 | 18 ++ .../gfortran.dg/goacc/array-with-dt-5.f90 | 12 ++ ...sstypes-1.f95 => derived-classtypes-1.f90} | 8 +- .../derivedtypes-arrays-1.f90 | 109 ++ .../libgomp.oacc-fortran/update-dt-array.f90 | 53 + 9 files changed, 344 insertions(+), 85 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-5.f90 rename gcc/testsuite/gfortran.dg/goacc/{derived-classtypes-1.f95 => derived-classtypes-1.f90} (95%) create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/update-dt-array.f90 diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 7be34ef9a35..0ab08dabe9a 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -2675,6 +2675,32 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, tree decl = gfc_trans_omp_variable (n->sym, false); if (DECL_P (decl)) TREE_ADDRESSABLE (decl) = 1; + + gfc_ref *lastref = NULL; + + if (n->expr) + for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next) + if (ref->type == REF_COMPONENT || ref->type == REF_ARRAY) + lastref = ref; + + bool allocatable = false, pointer = false; + + if (lastref && lastref->type == REF_COMPONENT) + { + gfc_component *c = lastref->u.c.component; + + if (c->ts.type == BT_CLASS) + { + pointer = CLASS_DATA (c)->attr.class_pointer; + allocatable = CLASS_DATA (c)->attr.allocatable; + } + else + { + pointer = c->attr.pointer; + allocatable = c->attr.allocatable; + } + } + if (n->expr == NULL || (n->expr->ref->type == REF_ARRAY && n->expr->ref->u.ar.type == AR_FULL)) @@ -2911,74 +2937,79 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } else if (n->expr && n->expr->expr_type == EXPR_VARIABLE - && n->expr->ref->type == REF_COMPONENT) + && n->expr->ref->type == REF_ARRAY + && !n->expr->ref->next) { - gfc_ref *lastcomp; - - for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next) - if (ref->type == REF_COMPONENT) - lastcomp = ref; - - symbol_attribute sym_attr; - - if (lastcomp->u.c.component->ts.type == BT_CLASS) - sym_attr = CLASS_DATA (lastcomp->u.c.component)->attr; - else - sym_attr = lastcomp->u.c.component->attr; - + /* An array element or array section which is not part of a +derived type, etc. */ + bool element = n->expr->ref->u.ar.type == AR_ELEMENT; + gfc_trans_omp_array_section (block, n, decl, element, + GOMP_MAP_POINTER, node, node2, +
Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns
On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: > > Hi Richard(s), > > I'm just looking to see if I'm going about this the right way, based on the > discussion we had on IRC. I've managed to hack something together, I've > attached a (very) WIP patch which gives the correct codegen for the testcase > in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would > obviously need to support other widening patterns and differentiate between > big/little endian among other things. > > I added a backend pattern because I wasn't quite clear which changes to make > in order to allow the existing backend patterns to be used with a V8QI, or > how to represent V16QI where we don't care about the top/bottom 8. I made > some attempt in optabs.c, which is in the patch commented out, but I'm not > sure if I'm going about this the right way. Hmm, as said, I'd try to arrange like illustrated in the attachment, confined to vectorizable_conversion. The only complication might be sub-optimal code-gen for the vector-vector CTOR compensating for the input vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI) Richard. > Joel p Description: Binary data
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
On Tue, Feb 02, 2021 at 02:23:55PM +0100, Richard Biener wrote: > Btw, I just can find V1DI mentioned in mmx.md but I can't find > rotate or shift patterns that would match? The backend has several V1?Imode shifts, but optab only for those V1DImode ones: grep '[la]sh[lr]v1[qhsdtox]' tmp-mddump.md (define_insn ("mmx_ashlv1di3") (define_insn ("mmx_lshrv1di3") (define_insn ("avx512bw_ashlv1ti3") (define_insn ("avx512bw_lshrv1ti3") (define_insn ("sse2_ashlv1ti3") (define_insn ("sse2_lshrv1ti3") (define_expand ("ashlv1di3") (define_expand ("lshrv1di3") emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]), I think it has been introduced with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021#c13 Before we didn't have any V1DImode expanders (except mov/movmisalign, but those are needed and are supplied for other V1??mode modes too). So I'll test: 2021-02-02 Jakub Jelinek PR tree-optimization/98287 * config/i386/mmx.md (3): For shifts don't enable expander for V1DImode. * gcc.dg/pr98287.c: New test. --- gcc/config/i386/mmx.md.jj 2021-01-07 15:29:52.601974578 +0100 +++ gcc/config/i386/mmx.md 2021-02-02 14:48:52.310935516 +0100 @@ -1528,9 +1528,9 @@ (define_insn "mmx_3" (set_attr "mode" "DI,TI,TI")]) (define_expand "3" - [(set (match_operand:MMXMODE248 0 "register_operand") -(any_lshift:MMXMODE248 - (match_operand:MMXMODE248 1 "register_operand") + [(set (match_operand:MMXMODE24 0 "register_operand") +(any_lshift:MMXMODE24 + (match_operand:MMXMODE24 1 "register_operand") (match_operand:DI 2 "nonmemory_operand")))] "TARGET_MMX_WITH_SSE") --- gcc/testsuite/gcc.dg/pr98287.c.jj 2021-02-02 14:50:05.583107569 +0100 +++ gcc/testsuite/gcc.dg/pr98287.c 2021-02-02 14:50:05.583107569 +0100 @@ -0,0 +1,19 @@ +/* PR tree-optimization/98287 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fno-tree-ccp -fno-tree-forwprop -Wno-psabi -w" } */ + +typedef unsigned long __attribute__((__vector_size__ (8))) V; +V v; + +static __attribute__((noinline, noclone)) V +bar (unsigned short s) +{ + return v >> s << s | v >> s >> 63; +} + +unsigned long +foo (void) +{ + V x = bar (1); + return x[0]; +} Jakub
Re: [Patch, fortran] PR98897 - Erroneous procedure attribute for associate name
Hi Paul, On 02.02.21 13:20, Paul Richard Thomas via Gcc-patches wrote: This is more or less 'obvious' and does not require any further explanation. Well, I am not sure whether calling resolve is premature or not. In any case, it still fails for the attached testcase. (Related but separate issue.) The second testcase fails with "Selector at (1) has no type" / "Symbol 'var' at (1) has no IMPLICIT type". Disclaimer: I am not 100% sure whether those two or your/the PR's testcase is valid. (It fails to compile with ifort 19.1. I have not read the spec and assume that the original testcase is valid.) Thus, please confirm that all three are valid. If so, do you see a way to make the two new ones pass as well? If we are sure that the current patch is still the right approach, I am also fine to do it stepwise. Thanks, Tobias Regtests with FC33/x86_64 - OK for master (and )? Paul Fortran: Fix calls to associate name typebound subroutines [PR98897]. 2021-02-02 Paul Thomas gcc/fortran PR fortran/98897 * match.c (gfc_match_call): Include associate names as possible entities with typebound subroutines. The target needs to be resolved for the type. gcc/testsuite/ PR fortran/98897 * gfortran.dg/typebound_call_32.f90: New test. - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf module m implicit none contains subroutine double(i) integer :: i i = 2*i end subroutine double function bar() result(res) procedure(double), pointer :: res res => double end function bar subroutine foo(i) integer :: i ! This works: procedure(), pointer :: proc call double(i) proc => bar() call proc(i) ! This fails: associate (var => bar()) call var(i) ! { dg-bogus "VARIABLE attribute of 'var' conflicts with PROCEDURE attribute" } end associate end subroutine foo end module m program test use m implicit none (type, external) integer :: i i = 50 call foo(i) if (i /= 50*2*2) stop 1 end program test module m implicit none contains subroutine double(i) integer :: i i = 2*i end subroutine double function bar() result(res) procedure(double), pointer :: res res => double end function bar subroutine foo(i) integer :: i procedure(bar) :: var procedure(double), pointer :: proc associate (var => bar()) proc => var end associate call proc(i) end subroutine foo end module m program test use m implicit none (type, external) integer :: i i = 50 call foo(i) if (i /= 50*2) stop 1 end program test
Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]
On Tue, Feb 2, 2021 at 2:56 PM Jakub Jelinek wrote: > > On Tue, Feb 02, 2021 at 02:23:55PM +0100, Richard Biener wrote: > > Btw, I just can find V1DI mentioned in mmx.md but I can't find > > rotate or shift patterns that would match? > > The backend has several V1?Imode shifts, but optab only for those V1DImode > ones: > > grep '[la]sh[lr]v1[qhsdtox]' tmp-mddump.md > (define_insn ("mmx_ashlv1di3") > (define_insn ("mmx_lshrv1di3") > (define_insn ("avx512bw_ashlv1ti3") > (define_insn ("avx512bw_lshrv1ti3") > (define_insn ("sse2_ashlv1ti3") > (define_insn ("sse2_lshrv1ti3") > (define_expand ("ashlv1di3") > (define_expand ("lshrv1di3") > emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]), > > I think it has been introduced with > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021#c13 > > Before we didn't have any V1DImode expanders (except mov/movmisalign, but > those are needed and are supplied for other V1??mode modes too). > > So I'll test: > > 2021-02-02 Jakub Jelinek > > PR tree-optimization/98287 > * config/i386/mmx.md (3): For shifts don't enable expander > for V1DImode. > > * gcc.dg/pr98287.c: New test. OK. Please note that V1DI vectors are intended only for builtins. They were introduced to prevent former DImode MMX builtins from interfering with DImode integer operations. Note that MMX shifts don't clobber flags, so their RTX pattern was preferred in comparison to integer DImode shifts. The same happened with TImode shifts of XMM builtins. Uros. > --- gcc/config/i386/mmx.md.jj 2021-01-07 15:29:52.601974578 +0100 > +++ gcc/config/i386/mmx.md 2021-02-02 14:48:52.310935516 +0100 > @@ -1528,9 +1528,9 @@ (define_insn "mmx_3" > (set_attr "mode" "DI,TI,TI")]) > > (define_expand "3" > - [(set (match_operand:MMXMODE248 0 "register_operand") > -(any_lshift:MMXMODE248 > - (match_operand:MMXMODE248 1 "register_operand") > + [(set (match_operand:MMXMODE24 0 "register_operand") > +(any_lshift:MMXMODE24 > + (match_operand:MMXMODE24 1 "register_operand") > (match_operand:DI 2 "nonmemory_operand")))] >"TARGET_MMX_WITH_SSE") > > --- gcc/testsuite/gcc.dg/pr98287.c.jj 2021-02-02 14:50:05.583107569 +0100 > +++ gcc/testsuite/gcc.dg/pr98287.c 2021-02-02 14:50:05.583107569 +0100 > @@ -0,0 +1,19 @@ > +/* PR tree-optimization/98287 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-tree-ccp -fno-tree-forwprop -Wno-psabi -w" } */ > + > +typedef unsigned long __attribute__((__vector_size__ (8))) V; > +V v; > + > +static __attribute__((noinline, noclone)) V > +bar (unsigned short s) > +{ > + return v >> s << s | v >> s >> 63; > +} > + > +unsigned long > +foo (void) > +{ > + V x = bar (1); > + return x[0]; > +} > > > Jakub >
Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)
Hi Tobias,, - Mensaje original - > De: "Tobias Burnus" > Para: "Thomas Koenig" , "jdelia" , "Gfortran List" > gcc.gnu.org>, "GCC Patches" > Enviados: Martes, 2 de Febrero 2021 8:46:00 > Asunto: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] > (was: Re: A question about coarrays and > -Warray-temporaries) > > Hi all, > > the attached patch now handles -fcoarray=single and access to the local > image like no coarray access, using the same check a before. I would like to apply the patch but, sorry, how do I proceed? Where in the gcc tree to apply the patch file? Should I use git apply or diff? e.g. it does not work with $ pwd /home/bigpack/gcc-paq/sources/gcc-11.0-20210202 $ git apply dep-caf.diff error: patch failed: gcc/fortran/dependency.c:30 error: gcc/fortran/dependency.c: patch does not apply > Actually, I think we could remove the if (..coarray..) check completely: > For 'single', it is like no coarrays; for 'lib' both AA[remote] =... and > "AA = ... AA[remove]" will create a function call and uses already > temporaries. I don't know what 'native'/'threads' does – but either it > the image index is different, then no temporary is needed at all – or it > is the same (in particular: this_image / not present), but then the > noncoarray analysis would work. > > (Knowing that for i != this_image(), AA and AA[i] – or i /= j for AA[i] > and AA[j] – we could further relax the checks, but I don't think that > that's needed.) > > Hence: Is the patch OK for the trunk? Or should we completely get rid of > if-conditional block? Thanks for the fast work with this code optimization issue. Regards, Jorge. > -- > On 01.02.21 12:52, Tobias Burnus wrote: > >> On 01.02.21 08:42, Thomas Koenig via Fortran wrote: >>>> I have a question with -Warray-temporaries in the test below. >>>> With the AA coarray that warning appears in the first loop (on its >>>> local part), >>>> but not with the BB array with the same task, i.e. [...] >>> It seems that dependency checking does not detect that no overlap >>> can exist in that case, and so generates a temporary. Probably, >>> dependency checking was not updated when coarrays were introduced. >>> This is a missed optimization, not a correctness issue. >> I have now filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98913 > - > Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank > Thürauf
Re: [[C++ PATCH]] Implement C++2a P0330R2 - Literal Suffixes for ptrdiff_t and size_t
On 2/2/21 1:19 AM, Ed Smith-Rowland wrote: On 2/2/21 12:12 AM, Jason Merrill wrote: On 2/1/21 9:15 PM, Ed Smith-Rowland wrote: On 2/1/21 2:23 PM, Jakub Jelinek wrote: On Mon, Feb 01, 2021 at 01:46:13PM -0500, Ed Smith-Rowland wrote: @@ -0,0 +1,8 @@ +// { dg-do compile { target c++23 } } + +#include +#include + +static_assert(std::is_same_v); +static_assert(std::is_same_v); Shouldn't this be std::make_signed::type instead of std::ptrdiff_t Yes it should. The paper goes on about ptrdiff_t but at the very end they punt on that in favor of what you have. +std::ptrdiff_t pd1 = 1234z; // { dg-warning "use of C\+\+23 ptrdiff_t integer constant" "" { target c++20_down } } +std::ptrdiff_t PD1 = 5678Z; // { dg-warning "use of C\+\+23 ptrdiff_t integer constant" "" { target c++20_down } } Ditto here. Agree. + const char *message = (result & CPP_N_UNSIGNED) == CPP_N_UNSIGNED + ? N_("use of C++23 size_t integer constant") + : N_("use of C++23 ptrdiff_t integer constant"); And here too (perhaps %::type%> )? And maybe % too. Agree. --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -500,6 +500,9 @@ struct cpp_options /* Nonzero means tokenize C++20 module directives. */ unsigned char module_directives; + /* Nonzero for C++23 ptrdiff_t and size_t literals. */ And drop "ptrdiff_t and " here? +#define CPP_N_SIZE_T 0x200 /* C++23 size_t or ptrdiff_t literal */ And " or ptrdiff_t" here? While ptrdiff_t will usually be the same type, seems there is e.g.: config/darwin.h:#define SIZE_TYPE "long unsigned int" config/darwin.h:#define PTRDIFF_TYPE "int" config/i386/djgpp.h:#define SIZE_TYPE "long unsigned int" config/i386/djgpp.h:#define PTRDIFF_TYPE "int" config/m32c/m32c.h:#define PTRDIFF_TYPE (TARGET_A16 ? "int" : "long int") config/m32c/m32c.h:#define SIZE_TYPE "unsigned int" config/rs6000/rs6000.h:#define PTRDIFF_TYPE "int" config/rs6000/rs6000.h:#define SIZE_TYPE "long unsigned int" config/s390/linux.h:#define SIZE_TYPE "long unsigned int" config/s390/linux.h:#define PTRDIFF_TYPE (TARGET_64BIT ? "long int" : "int") config/visium/visium.h:#define SIZE_TYPE "unsigned int" config/visium/visium.h:#define PTRDIFF_TYPE "long int" config/vms/vms.h:#define SIZE_TYPE "unsigned int" config/vms/vms.h:#define PTRDIFF_TYPE (flag_vms_pointer_size == VMS_POINTER_SIZE_NONE ? \ config/vms/vms.h- "int" : "long long int") so quite a few differences. Jakub Here is my last patch with all the concerns addressed. I am not smart enough to get the dg-warning regex in Wsize_t-literals.C to work. If someone could carry this over the finish line that would be great. Or give me pointers. I can't any more. Your regex will work fine if you wrap it in {} instead of "", e.g. { dg-warning {use of C\+\+23 .size_t. integer constant} } Jason Thank you Jason, So here is the latest in testing. +++ b/gcc/testsuite/g++.dg/warn/Wsize_t-literals.C @@ -0,0 +1,16 @@ +// { dg-do compile { target c++20_down } } + +#if __cplusplus >= 201103L + +#include +#include + +std::size_t s1 = 1234zu; // { dg-warning {use of C\+\+23 .size_t. integer constant} "" { target { ! c++98_only } } } +std::size_t S1 = 5678ZU; // { dg-warning {use of C\+\+23 .size_t. integer constant} "" { target { ! c++98_only } } } +std::size_t s2 = 1234uz; // { dg-warning {use of C\+\+23 .size_t. integer constant} "" { target { ! c++98_only } } } +std::size_t S2 = 5678UZ; // { dg-warning {use of C\+\+23 .size_t. integer constant} "" { target { ! c++98_only } } } + +std::make_signed::type pd1 = 1234z; // { dg-warning {use of C\+\+23 .make_signed::type. integer constant} "" { target { ! c++98_only } } } +std::make_signed::type PD1 = 5678Z; // { dg-warning {use of C\+\+23 .make_signed::type. integer constant} "" { target { ! c++98_only } } } + +#endif I'd suggest switching the target constraints around: use target c++11 in the dg-do line, drop the #if, and use target c++20_down in the diagnostics. OK with that change. Jason
Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)
Am 02.02.21 um 12:46 schrieb Tobias Burnus: Hi all, the attached patch now handles -fcoarray=single and access to the local image like no coarray access, using the same check a before. Actually, I think we could remove the if (..coarray..) check completely: For 'single', it is like no coarrays; for 'lib' both AA[remote] =... and "AA = ... AA[remove]" will create a function call and uses already temporaries. I don't know what 'native'/'threads' does – but either it the image index is different, then no temporary is needed at all – or it is the same (in particular: this_image / not present), but then the noncoarray analysis would work. That analysis is correct, also as far as the shared memory coarray goes (where you do offsets as an extra dimension, either it points to the same memory or to different memory). So, while your patch is OK, I think a simple removal of the test is also OK. Take your pick :-) Best regards Thomas
Re: [PATCH 13/16] Improve test codegen for interpreting assembly
On 27/01/21 21:42 +0100, Matthias Kretz wrote: From: Matthias Kretz In many failure cases it is helpful to inspect the instructions leading up to the test failure. After this change the location is easier to find and the branch after failure is easier to find. libstdc++-v3/ChangeLog: * testsuite/experimental/simd/tests/bits/verify.h (verify): Add instruction pointer data member. Ensure that the `if (m_failed)` branch is always inlined into the calling code. The body of the conditional can still be a function call. Move the get_ip call into the verify ctor to simplify the ctor calls. (COMPARE): Don't mention the use of all_of for reduction of a simd_mask. It only distracts from the real issue. --- .../experimental/simd/tests/bits/verify.h | 44 +-- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h b/ libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h Several of these patches have been mangled by your mailer. The line above has been wrapped in the middle of the filename, making this an invalid patch header. @@ -220,24 +223,21 @@ template #define COMPARE(_a, _b) \ And all these lines ending with a backslash have been wrapped. I can easily fix the filename in the header, but my simplistic attempts to put this back together have failed. Could you please resend patch 13/16 as an attachment, not inline? I think the others are OK.
Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns
Richard Biener writes: > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: >> >> Hi Richard(s), >> >> I'm just looking to see if I'm going about this the right way, based on the >> discussion we had on IRC. I've managed to hack something together, I've >> attached a (very) WIP patch which gives the correct codegen for the testcase >> in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would >> obviously need to support other widening patterns and differentiate between >> big/little endian among other things. >> >> I added a backend pattern because I wasn't quite clear which changes to make >> in order to allow the existing backend patterns to be used with a V8QI, or >> how to represent V16QI where we don't care about the top/bottom 8. I made >> some attempt in optabs.c, which is in the patch commented out, but I'm not >> sure if I'm going about this the right way. > > Hmm, as said, I'd try to arrange like illustrated in the attachment, > confined to vectorizable_conversion. The > only complication might be sub-optimal code-gen for the vector-vector > CTOR compensating for the input > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI) Yeah. I don't really like this because it means that it'll be impossible to remove the redundant work in gimple. The extra elements are just a crutch to satisfy the type system. As far as Joel's patch goes, I was imagining that the new operation would be an internal function rather than a tree code. However, if we don't want that, maybe we should just emit separate conversions and a normal subtraction, like we would for (signed) x - (unsigned) y. Thanks, Richard
Re: [Patch, fortran] PR98897 - Erroneous procedure attribute for associate name
Hi Tobias, ifort (IFORT) 2021.1 Beta 20201112 is happy with the testcase in the patch. In foo.f90, if I remove call var(i) ! { dg-bogus "VARIABLE attribute of 'var' conflicts with PROCEDURE attribute" } gfortran correctly complains 23 | associate (var => bar()) | 1 Error: Selector at (1) has no type ifort complains: ../pr98897/foo.f90(11): error #8179: The procedure pointer and the procedure target must both be functions or subroutines. res => double The responses from both compilers to foo3.f90 are the same. Cheers Paul On Tue, 2 Feb 2021 at 13:59, Tobias Burnus wrote: > Hi Paul, > > On 02.02.21 13:20, Paul Richard Thomas via Gcc-patches wrote: > > This is more or less 'obvious' and does not require any further > explanation. > > Well, I am not sure whether calling resolve is premature or not. In any > case, it still fails for the attached testcase. (Related but separate > issue.) > > The second testcase fails with "Selector at (1) has no type" / "Symbol > 'var' at (1) has no IMPLICIT type". > > Disclaimer: I am not 100% sure whether those two or your/the PR's > testcase is valid. (It fails to compile with ifort 19.1. I have not read > the spec and assume that the original testcase is valid.) > > Thus, please confirm that all three are valid. If so, do you see a way > to make the two new ones pass as well? If we are sure that the current > patch is still the right approach, I am also fine to do it stepwise. > > Thanks, > > Tobias > > > Regtests with FC33/x86_64 - OK for master (and )? > > > > Paul > > > > Fortran: Fix calls to associate name typebound subroutines [PR98897]. > > > > 2021-02-02 Paul Thomas > > > > gcc/fortran > > PR fortran/98897 > > * match.c (gfc_match_call): Include associate names as possible > > entities with typebound subroutines. The target needs to be > > resolved for the type. > > > > gcc/testsuite/ > > PR fortran/98897 > > * gfortran.dg/typebound_call_32.f90: New test. > - > Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München > Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank > Thürauf > -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns
On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: > >> > >> Hi Richard(s), > >> > >> I'm just looking to see if I'm going about this the right way, based on > >> the discussion we had on IRC. I've managed to hack something together, > >> I've attached a (very) WIP patch which gives the correct codegen for the > >> testcase in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). > >> It would obviously need to support other widening patterns and > >> differentiate between big/little endian among other things. > >> > >> I added a backend pattern because I wasn't quite clear which changes to > >> make in order to allow the existing backend patterns to be used with a > >> V8QI, or how to represent V16QI where we don't care about the top/bottom > >> 8. I made some attempt in optabs.c, which is in the patch commented out, > >> but I'm not sure if I'm going about this the right way. > > > > Hmm, as said, I'd try to arrange like illustrated in the attachment, > > confined to vectorizable_conversion. The > > only complication might be sub-optimal code-gen for the vector-vector > > CTOR compensating for the input > > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI) > > Yeah. I don't really like this because it means that it'll be > impossible to remove the redundant work in gimple. The extra elements > are just a crutch to satisfy the type system. We can certainly devise a more clever way to represent a paradoxical subreg, but at least the actual operation (WIDEN_MINUS_LOW) would match what the hardware can do. OTOH we could simply accept half of a vector for the _LOW (little-endial) or _HIGH (big-endian) op and have the expander deal with subreg frobbing? Not that I'd like that very much though, even a VIEW_CONVERT (v4hi-reg) would be cleaner IMHO (not sure how to go about endianess here ... the _LOW/_HIGH paints us into some corner here) A new IFN (direct optab?) means targets with existing support for _LO/HI do not automatically benefit which is a shame. > As far as Joel's patch goes, I was imagining that the new operation > would be an internal function rather than a tree code. However, > if we don't want that, maybe we should just emit separate conversions > and a normal subtraction, like we would for (signed) x - (unsigned) y. > > Thanks, > Richard
Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
> On Feb 2, 2021, at 1:43 AM, Richard Biener wrote: > > On Mon, 1 Feb 2021, Qing Zhao wrote: > >> Hi, Richard, >> >> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion. >> >> And now the routine “bump_map” in 511.povray is like following: >> ... >> >> # DEBUG BEGIN_STMT >> xcoor = 0.0; >> ycoor = 0.0; >> # DEBUG BEGIN_STMT >> index = .DEFERRED_INIT (index, 2); >> index2 = .DEFERRED_INIT (index2, 2); >> index3 = .DEFERRED_INIT (index3, 2); >> # DEBUG BEGIN_STMT >> colour1 = .DEFERRED_INIT (colour1, 2); >> colour2 = .DEFERRED_INIT (colour2, 2); >> colour3 = .DEFERRED_INIT (colour3, 2); >> # DEBUG BEGIN_STMT >> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2); >> # DEBUG p1$0 => p1$0_181 >> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2); >> # DEBUG p1$1 => p1$1_184 >> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2); >> # DEBUG p1$2 => p1$2_172 >> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2); >> # DEBUG p2$0 => p2$0_177 >> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2); >> # DEBUG p2$1 => p2$1_135 >> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2); >> # DEBUG p2$2 => p2$2_137 >> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2); >> # DEBUG p3$0 => p3$0_377 >> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2); >> # DEBUG p3$1 => p3$1_379 >> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2); >> # DEBUG p3$2 => p3$2_381 >> >> >> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of >> the components of p1, p2 and p3. >> >> With this change, the stack usage numbers with -fstack-usage for approach A, >> old approach D and new D with the splitting in SRA are: >> >> Approach A Approach D-old Approach D-new >> >> 272 624 368 >> >> From the above, we can see that splitting the call to DEFERRED_INIT in SRA >> can reduce the stack usage increase dramatically. >> >> However, looks like that the stack size for D is still bigger than A. >> >> I checked the IR again, and found that the alias analysis might be >> responsible for this (by compare the image.cpp.026t.ealias for both A and D): >> >> (Due to the call to: >> >> colour1 = .DEFERRED_INIT (colour1, 2); >> ) >> >> **Approach A: >> >> Points_to analysis: >> >> Constraints: >> … >> colour1 = &NULL >> … >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> ... >> callarg(53) = &colour1 >> ... >> _53 = colour1 >> >> Points_to sets: >> … >> colour1 = { NULL ESCAPED NONLOCAL } same as _53 >> ... >> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 } >> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as >> CALLUSED(48) >> ... >> callarg(53) = { NULL ESCAPED NONLOCAL colour1 } >> >> **Apprach D: >> >> Points_to analysis: >> >> Constraints: >> … >> callarg(19) = colour1 >> callarg(19) = &NONLOCAL >> colour1 = callarg(19) + UNKNOWN >> colour1 = &NONLOCAL >> … >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> colour1 = &NONLOCAL >> … >> callarg(74) = &colour1 >> callarg(74) = callarg(74) + UNKNOWN >> callarg(74) = *callarg(74) + UNKNOWN >> … >> _53 = colour1 >> _54 = _53 >> _55 = _54 + UNKNOWN >> _55 = &NONLOCAL >> _56 = colour1 >> _57 = _56 >> _58 = _57 + UNKNOWN >> _58 = &NONLOCAL >> _59 = _55 + UNKNOWN >> _59 = _58 + UNKNOWN >> _60 = colour1 >> _61 = _60 >> _62 = _61 + UNKNOWN >> _62 = &NONLOCAL >> _63 = _59 + UNKNOWN >> _63 = _62 + UNKNOWN >> _64 = _63 + UNKNOWN >> .. >> Points_to set: >> … >> colour1 = { ESCAPED NONLOCAL } same as callarg(19) >> … >> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 } >> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69) >> callarg(71) = { ESCAPED NONLOCAL } >> callarg(72) = { ESCAPED NONLOCAL } >> callarg(73) = { ESCAPED NONLOCAL } >> callarg(74) = { ESCAPED NONLOCAL colour1 } >> >> My question: >> >> Is it possible to adjust alias analysis to resolve this issue? > > You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c > find_func_aliases_for_call (it's not a builtin but you can look in > the respective subroutine for examples). Specifically you want to > avoid making anything escaped or clobbered. Okay, thanks. Will check on that. Qing >> > > -- > Richard Biener mailto:rguent...@suse.de>> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
[PATCH] aarch64: Reimplement vqmovun_high* intrinsics using builtins
Hi all, Another transition from inline asm to builtin. Only 3 intrinsics converted this time but they use the "+w" constraint in their inline asm so are more likely to generate redundant moves so benefit more from reimplementation. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (sqxtun2): Define builtin. * config/aarch64/aarch64-simd.md (aarch64_sqxtun2_le): Define. (aarch64_sqxtun2_be): Likewise. (aarch64_sqxtun2): Likewise. * config/aarch64/arm_neon.h (vqmovun_high_s16): Reimplement using builtin. (vqmovun_high_s32): Likewise. (vqmovun_high_s64): Likewise. * config/aarch64/iterators.md (UNSPEC_SQXTUN2): Define. gcc/testsuite/ChangeLog: * gcc.target/aarch64/narrow_high-intrinsics.c: Adjust sqxtun2 scan. vqmovun2.patch Description: vqmovun2.patch
[PATCH] aarch64: Reimplement vrsqrte* intrinsics with builtins
Hi all, Another very simple move from inline asm to builtins. Only two intrinsics this time. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (ursqrte): Define builtin. * config/aarch64/aarch64-simd.md (aarch64_ursqrte): New pattern. * config/aarch64/arm_neon.h (vrsqrte_u32): Reimplement using builtin. (vrsqrteq_u32): Likewise. ursqrt.patch Description: ursqrt.patch
Re: [Patch, fortran] PR98897 - Erroneous procedure attribute for associate name
Hi, first, I have attached a new example – it works if I move bar/hello up, but if 'foo' comes first, it fails. I think it is valid. (ifort 19 also compiles it.) Sorry for trying hard to find examples where it does not work – but I have simply the feeling that resolving things during parsing cannot work in all cases. On the other hand, I think your patch at least does not break valid code as I had feared before. :-) Thus, in that sense it would work for me. * * * Regarding my previous examples, they are invalid because of: C1105 (R1105) expr shall not be a designator of a procedure pointer or a function reference that returns a procedure pointer. However: On 02.02.21 16:05, Paul Richard Thomas via Fortran wrote: In foo.f90, if I remove call var(i) ! { dg-bogus "VARIABLE attribute of 'var' conflicts with PROCEDURE attribute" } gfortran correctly complains 23 | associate (var => bar()) | 1 Error: Selector at (1) has no type Which is not quite right. bar() has a type – it returns a procedure pointer; even in cases where gfortran could know at parse time, it does not diagnose C1105 but shows an odd error instead. ifort complains: ../pr98897/foo.f90(11): error #8179: The procedure pointer and the procedure target must both be functions or subroutines. res => double Okay, we found a bug in ifort. 'double' and 'res' have the same interface by construction – and both are subroutines. It seems to be a similar bug to the ifort bug I got before: When 'double' is parsed, ifort expects that 'precision' follows ('double precision'). The responses from both compilers to foo3.f90 are the same. (I forgot to comment/remove 'procedure(bar) :: var' when playing around.) Again, this code violates C1105 – and the error messages are still odd. On Tue, 2 Feb 2021 at 13:59, Tobias Burnus wrote: On 02.02.21 13:20, Paul Richard Thomas via Gcc-patches wrote: Regtests with FC33/x86_64 - OK for master (and )? Fortran: Fix calls to associate name typebound subroutines [PR98897]. 2021-02-02 Paul Thomas gcc/fortran PR fortran/98897 * match.c (gfc_match_call): Include associate names as possible entities with typebound subroutines. The target needs to be resolved for the type. gcc/testsuite/ PR fortran/98897 * gfortran.dg/typebound_call_32.f90: New test. - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf module m implicit none type t contains procedure, nopass :: hello end type t contains subroutine foo() associate (var => bar()) call var%hello() end associate end subroutine foo subroutine hello print *, 'Hello' end type(t) function bar() end end module m program test use m call foo() end program test
Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns
Richard Biener writes: > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: >> >> >> >> Hi Richard(s), >> >> >> >> I'm just looking to see if I'm going about this the right way, based on >> >> the discussion we had on IRC. I've managed to hack something together, >> >> I've attached a (very) WIP patch which gives the correct codegen for the >> >> testcase in question >> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would obviously >> >> need to support other widening patterns and differentiate between >> >> big/little endian among other things. >> >> >> >> I added a backend pattern because I wasn't quite clear which changes to >> >> make in order to allow the existing backend patterns to be used with a >> >> V8QI, or how to represent V16QI where we don't care about the top/bottom >> >> 8. I made some attempt in optabs.c, which is in the patch commented out, >> >> but I'm not sure if I'm going about this the right way. >> > >> > Hmm, as said, I'd try to arrange like illustrated in the attachment, >> > confined to vectorizable_conversion. The >> > only complication might be sub-optimal code-gen for the vector-vector >> > CTOR compensating for the input >> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI) >> >> Yeah. I don't really like this because it means that it'll be >> impossible to remove the redundant work in gimple. The extra elements >> are just a crutch to satisfy the type system. > > We can certainly devise a more clever way to represent a paradoxical subreg, > but at least the actual operation (WIDEN_MINUS_LOW) would match what > the hardware can do. At least for the Arm ISAs, the low parts are really 64-bit → 128-bit operations. E.g. the low-part intrinsic for signed 8-bit integers is: int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b); whereas the high-part intrinsic is: int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b); So representing the low part as a 128-bit → 128-bit operation is already a little artifical. > OTOH we could simply accept half of a vector for > the _LOW (little-endial) or _HIGH (big-endian) op and have the expander > deal with subreg frobbing? Not that I'd like that very much though, even > a VIEW_CONVERT (v4hi-reg) would be cleaner IMHO (not sure > how to go about endianess here ... the _LOW/_HIGH paints us into some > corner here) I think it only makes sense for the low part. But yeah, I guess that would work (although I agree it doesn't seem very appealing :-)). > A new IFN (direct optab?) means targets with existing support for _LO/HI > do not automatically benefit which is a shame. In practice this will only affect targets that choose to use mixed vector sizes, and I think it's reasonable to optimise only for the case in which such targets support widening conversions. So what do you think about the idea of emitting separate conversions and a normal subtract? We'd be relying on RTL to fuse them together, but at least there would be no redundancy to eliminate. Thanks, Richard > >> As far as Joel's patch goes, I was imagining that the new operation >> would be an internal function rather than a tree code. However, >> if we don't want that, maybe we should just emit separate conversions >> and a normal subtraction, like we would for (signed) x - (unsigned) y. >> >> Thanks, >> Richard
Re: [PATCH 1/4] openacc: Remove dereference for non-pointer derived-type members
(added fortran@ at it is about Fortran) I note that some lines in the testcase are commented (polymorphic arrays), but I note further that Patch 3/4 enables them. On 02.02.21 14:28, Julian Brown wrote: Derived-type members that are themselves derived types are not always represented as pointers, so it is not always correct to dereference them. The attached test case fails during gimplification without this patch. Tested with offloading to AMD GCN. OK for mainline? 2020-02-02 Julian Brown gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Handle non-pointer type. gcc/testsuite/ * gfortran.dg/goacc/derived-classtypes-1.f95: New test. --- gcc/fortran/trans-openmp.c| 7 +- .../goacc/derived-classtypes-1.f95| 129 ++ 2 files changed, 134 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 00358ca4d39..8d8da4593c3 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -3013,7 +3013,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } OMP_CLAUSE_DECL (node) - = build_fold_indirect_ref (data); + = (POINTER_TYPE_P (TREE_TYPE (data)) +? build_fold_indirect_ref (data) : data); OMP_CLAUSE_SIZE (node) = size; node2 = build_omp_clause (input_location, OMP_CLAUSE_MAP); I am a bit surprised given that: if (sym_attr.pointer || (openacc && sym_attr.allocatable)) I wonder whether we have a size problem: data = inner; size = TYPE_SIZE_UNIT (TREE_TYPE (inner)); Testing shows that it fails for: 80 | !$acc enter data copyin(jim%f) 95 | !$acc enter data copyin(pjim%f) 110 | !$acc enter data copyin(cjim%f) 125 | !$acc enter data copyin(acjim%f) where the component is 'type(aux), pointer :: f'. As sizeof(void*) and 2*sizeof(int) is the same, it does not matter in this case, but I fear it might in other cases. ('aux' contains two 'integer', one direct and one by inheriting it from 'aux1'.) Otherwise, I think the patch is fine. However, it also might be useful to add some character tests (kind=1 and kind=4) for completeness. NOTE: character(len=:), allocatable/pointer is the most interesting case, but that fails due to https://gcc.gnu.org/PR95868. Thus, we can only test for len=. Tobias @@ -3021,7 +3022,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, openacc ? GOMP_MAP_ATTACH_DETACH : GOMP_MAP_ALWAYS_POINTER); - OMP_CLAUSE_DECL (node2) = data; + OMP_CLAUSE_DECL (node2) + = (POINTER_TYPE_P (TREE_TYPE (data)) +? data : build_fold_addr_expr (data)); OMP_CLAUSE_SIZE (node2) = size_int (0); } else diff --git a/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 new file mode 100644 index 000..e6cf09c6d3c --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 @@ -0,0 +1,129 @@ +type :: type1 + integer :: a +end type type1 + +type :: type2 + integer, pointer :: b +end type type2 + +type :: aux1 + integer :: y +end type aux1 + +type, extends(aux1) :: aux + integer :: x +end type aux + +type :: type3 + class(aux), pointer :: c(:) +end type type3 + +type :: type4 + integer, pointer :: d(:) +end type type4 + +type :: type5 + type(aux) :: e +end type type5 + +type :: type6 + type(aux), pointer :: f +end type type6 + +type :: type7 + class(aux), pointer :: g +end type type7 + +type(type1) :: foo +type(type2) :: bar +type(type3) :: qux +type(type4) :: quux +type(type5) :: fred +type(type6) :: jim +type(type7) :: shiela + +type(type1), pointer :: pfoo +type(type2), pointer :: pbar +type(type3), pointer :: pqux +type(type4), pointer :: pquux +type(type5), pointer :: pfred +type(type6), pointer :: pjim +type(type7), pointer :: pshiela + +class(type1), pointer :: cfoo +class(type2), pointer :: cbar +class(type3), pointer :: cqux +class(type4), pointer :: cquux +class(type5), pointer :: cfred +class(type6), pointer :: cjim +class(type7), pointer :: cshiela + +class(type1), allocatable :: acfoo +class(type2), allocatable :: acbar +class(type3), allocatable :: acqux +class(type4), allocatable :: acquux +class(type5), allocatable :: acfred +class(type6), allocatable :: acjim +class(type7), allocatable :: acshiela + +!$acc enter data copyin(foo) +!$ac
[pushed] c++: Member fns and deduction guide rewriting [PR98929]
My patch for 96199 had us re-substitute the parameter types of a constructor in order to rewrite mentions of members into dependent references. We need to do that for member functions, too. Tested x86_64-pc-linux-gnu, applying to trunk. gcc/cp/ChangeLog: PR c++/98929 PR c++/96199 * error.c (dump_expr): Ignore dummy object. * pt.c (tsubst_baselink): Handle dependent scope. gcc/testsuite/ChangeLog: PR c++/98929 * g++.dg/cpp1z/class-deduction-decltype1.C: New test. --- gcc/cp/error.c| 3 ++- gcc/cp/pt.c | 10 ++ .../g++.dg/cpp1z/class-deduction-decltype1.C | 11 +++ 3 files changed, 23 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C diff --git a/gcc/cp/error.c b/gcc/cp/error.c index 60f4fd691f3..5213a8030ca 100644 --- a/gcc/cp/error.c +++ b/gcc/cp/error.c @@ -2352,7 +2352,8 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags) if (INDIRECT_REF_P (ob)) { ob = TREE_OPERAND (ob, 0); - if (!is_this_parameter (ob)) + if (!is_this_parameter (ob) + && !is_dummy_object (ob)) { dump_expr (pp, ob, flags | TFF_EXPR_IN_PARENS); if (TYPE_REF_P (TREE_TYPE (ob))) diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index db0ff73bdeb..aa1687a9f2a 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -16196,6 +16196,16 @@ tsubst_baselink (tree baselink, tree object_type, if (IDENTIFIER_CONV_OP_P (name)) name = make_conv_op_name (optype); + /* See maybe_dependent_member_ref. */ + if (dependent_scope_p (qualifying_scope)) + { + if (template_id_p) + name = build2 (TEMPLATE_ID_EXPR, unknown_type_node, name, + template_args); + return build_qualified_name (NULL_TREE, qualifying_scope, name, + /* ::template */false); + } + if (name == complete_dtor_identifier) /* Treat as-if non-dependent below. */ dependent_p = false; diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C new file mode 100644 index 000..b83f7adecc7 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C @@ -0,0 +1,11 @@ +// PR c++/98929 +// { dg-do compile { target c++17 } } + +template +struct A { + void foo (); + using c = decltype (foo ()); + A (c); // { dg-message {decltype \(A::foo} } +}; +A d; // { dg-error "deduction failed" } +// { dg-error "no match" "" { target *-*-* } .-1 } base-commit: d14cf89b94299d6d66c150fbbb9899a5dd91e7d4 -- 2.27.0
Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake
Richard Biener writes: > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek > wrote: >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote: >>> OK, so I'd prefer we simply unset the flag after processing deferred >>rescan. I clearly misread the function to do that. >> >>This works too, will bootstrap/regtest it now. > > OK. FWIW, I'm still not convinced we need to defer the rescan here. AIUI, the concern was that the pass introduces uses of something and then only later introduces the definition. But that's OK: the two things don't have to be added in a set order. In particular, the normal rescan doesn't propagate the effects throughout the cfg; it just updates the list of references in the instruction itself and marks the block as dirty for later processing. The usual reason for deferring rescans is if the instruction can't cope with the df_refs changing from under them like that, e.g. because they're iterating through a list of df_refs, or because they're using df_refs to represent value numbers. It shouldn't be a problem for this pass. Thanks, Richard
Re: [RFC] test builtin ratio for loop distribution
On Jan 28, 2021, Richard Biener wrote: > That would allow turning back the memset into the original loop (but > with optimal IVs, etc.). Is this sort of what you had in mind? I haven't tested the inline expansion of memset much yet; and that of memcpy, not at all; this really is mainly to check that I understood what you had in mind before I spend further time polishing it. test builtin ratio for loop distribution From: Alexandre Oliva The ldist pass turns even very short loops into memset calls. E.g., the TFmode emulation calls end with a loop of up to 3 iterations, to zero out trailing words, and the loop distribution pass turns them into calls of the memset builtin. Though short constant-length memsets are usually dealt with efficiently, for non-constant-length ones, the options are setmemM, or a function calls. RISC-V doesn't have any setmemM pattern, so the loops above end up "optimized" into memset calls, incurring not only the overhead of an explicit call, but also discarding the information the compiler has about the alignment of the destination, and that the length is a multiple of the word alignment. This patch adds, to the loop distribution pass, the ability to perform inline expansion of bounded variable-length memset and memcpy in ways that take advantage of known alignments and size's factors, when preexisting *_RATIO macros suggest the inline expansion is advantageous. for gcc/ChangeLog * tree-loop-distribution.c: Include builtins.h. (generate_memset_builtin): Introduce optional inline expansion of bounded variable-sized memset calls. (generate_memcpy_builtin): Likewise for memcpy only. (loop_distribution::execute): Fix loop structure afterwards. --- gcc/tree-loop-distribution.c | 280 ++ 1 file changed, 279 insertions(+), 1 deletion(-) diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index bb15fd3723fb6..3be7a4c1ac281 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -115,6 +115,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" #include "gimple-fold.h" +#include "builtins.h" #define MAX_DATAREFS_NUM \ @@ -1148,6 +1149,23 @@ generate_memset_builtin (class loop *loop, partition *partition) /* The new statements will be placed before LOOP. */ gsi = gsi_last_bb (loop_preheader_edge (loop)->src); + /* Compute builtin->size range and ctz before it's gimplified; sub-expressions + thereof are rewritten in place, so they end up referencing SSA_NAMEs for + which we don't have VR info. */ + unsigned align = get_pointer_alignment (builtin->dst_base) / BITS_PER_UNIT; + unsigned alctz, szctz, xbits; + wide_int szmin, szmax; + value_range_kind vrk; + if (align) +{ + alctz = wi::ctz (align); + szctz = MIN (tree_ctz (builtin->size), alctz); + xbits = alctz - szctz; + vrk = determine_value_range (builtin->size, &szmin, &szmax); + if (szmin == szmax) + align = 0; +} + nb_bytes = rewrite_to_non_trapping_overflow (builtin->size); nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE, false, GSI_CONTINUE_LINKING); @@ -1172,6 +1190,127 @@ generate_memset_builtin (class loop *loop, partition *partition) val = tem; } + unsigned HOST_WIDE_INT ratio; + if (integer_zerop (val)) +ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop)); + else +ratio = SET_RATIO (optimize_loop_for_speed_p (loop)); + + /* Compare the ratio with the number of (most-aligned) required stores needed + for szmax bytes, with the ratio: a loop that iterates up to szmax >> alctz, + and then one conditional store for each extra bit of alignment that the + destination has over the size. */ + if (align && vrk == VR_RANGE + && wi::ltu_p (wi::rshift (szmax, alctz, UNSIGNED) + xbits, ratio)) +{ + gimple *stmt; + tree bytes = create_tmp_var_raw (size_type_node, "ldistbytes"); + tree ptr = create_tmp_var_raw (build_pointer_type (char_type_node), +"ldistptr"); + tree stval = force_gimple_operand_gsi (&gsi, +fold_convert +(unsigned_char_type_node, val), +true, NULL_TREE, false, +GSI_CONTINUE_LINKING); + + for (unsigned i = 1; i != alctz; i *= 2) + { + unsigned bits = i * 2 * BITS_PER_UNIT; + tree type = build_nonstandard_integer_type (bits, true); + stval = fold_convert (type, stval); + tree op2 = fold_build2_loc (partition->loc, LSHIFT_EXPR, + TREE_TYPE (stval), stval, + build_int_cst (type, i * BITS_PER_UNIT)); + stval
Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)
Hi Thomas, On 02.02.21 15:54, Thomas Koenig wrote: So, while your patch is OK, I think a simple removal of the test is also OK. Take your pick :-) I think I will do a combination: If 'identical' is true, I think I cannot remove it. If it is false, it can be identical or nonoverlapping – which makes sense. Unless there are further comments, I will commit it later. Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf Fortran: Fix Array dependency with local coarrays [PR98913] gcc/fortran/ChangeLog: PR fortran/98913 * dependency.c (gfc_dep_resolver): Treat local access to coarrays like any array access in dependency analysis. gcc/testsuite/ChangeLog: PR fortran/98913 * gfortran.dg/coarray/array_temporary.f90: New test. gcc/fortran/dependency.c | 15 - .../gfortran.dg/coarray/array_temporary.f90| 74 ++ 2 files changed, 86 insertions(+), 3 deletions(-) diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c index c9baca80cbc..58593ba535b 100644 --- a/gcc/fortran/dependency.c +++ b/gcc/fortran/dependency.c @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3. If not see #include "dependency.h" #include "constructor.h" #include "arith.h" +#include "options.h" /* static declarations */ /* Enums */ @@ -2142,9 +2143,17 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse, return (fin_dep == GFC_DEP_OVERLAP) ? 1 : 0; case REF_ARRAY: - - /* For now, treat all coarrays as dangerous. */ - if (lref->u.ar.codimen || rref->u.ar.codimen) + /* Coarrays: If there is a coindex, either the image differs and there + is no overlap or the image is the same - then the normal analysis + applies. Hence, return early only if 'identical' is required and + either ref is coindexed and more than one image can exist. */ + if (identical && flag_coarray != GFC_FCOARRAY_SINGLE + && ((lref->u.ar.codimen + && lref->u.ar.dimen_type[lref->u.ar.dimen] + != DIMEN_THIS_IMAGE) + || (rref->u.ar.codimen + && lref->u.ar.dimen_type[lref->u.ar.dimen] + != DIMEN_THIS_IMAGE))) return 1; if (ref_same_as_full_array (lref, rref)) diff --git a/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 new file mode 100644 index 000..86460a7c282 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-additional-options "-Warray-temporaries" } +! +! PR fortran/98913 +! +! Contributed by Jorge D'Elia +! +! Did create an array temporary for local access to coarray +! (but not for identical noncoarray use). +! + +program test + implicit none + integer, parameter :: iin = kind (1) + integer, parameter :: idp = kind (1.0d0) + real(kind=idp), allocatable :: AA (:,:)[:] + real(kind=idp), allocatable :: BB (:,:) + real(kind=idp), allocatable :: UU (:) + integer (kind=iin) :: nn, n1, n2 + integer (kind=iin) :: j, k, k1 + ! + nn = 5 + n1 = 1 + n2 = 10 + ! + allocate (AA (1:nn,n1:n2)[*]) + allocate (BB (1:nn,n1:n2)) + allocate (UU (1:nn)) + ! + k = 1 + k1 = k + 1 + ! + AA = 1.0_idp + BB = 1.0_idp + UU = 2.0_idp + + ! AA - coarrays + ! No temporary needed: + do j = 1, nn +AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) ! { dg-bogus "Creating array temporary" } + end do + do j = 1, nn +AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1-1:nn-1,j) ! { dg-bogus "Creating array temporary" } + end do + do j = 1, nn +AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j) ! { dg-bogus "Creating array temporary" } + end do + + ! But: + do j = 1, nn +AA (k1:nn,j) = AA (k1-1:nn-1,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j) ! { dg-warning "Creating array temporary" } + end do + + ! BB - no coarrays + ! No temporary needed: + do j = 1, nn +BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j) ! { dg-bogus "Creating array temporary" } + end do + do j = 1, nn +BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j) - UU(k) * BB (k1-1:nn-1,j) ! { dg-bogus "Creating array temporary" } + end do + do j = 1, nn +BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j) - UU(k) * BB (k1+1:nn+1,j) ! { dg-bogus "Creating array temporary" } + end do + + ! But: + do j = 1, nn +BB (k1:nn,j) = BB (k1-1:nn-1,j) - UU (k1:nn) * BB (k,j) - UU(k) * BB (k1+1:nn+1,j) ! { dg-warning "Creating array temporary" } + end do + + deallocate (AA) + deallocate (BB) + deallocate (UU) +end program test
Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
Hi, This is a gcc-9 backport of the PR97528 fix that has been applied to trunk and gcc-10. Bootstraped on arm-linux-gnueabihf and regression tested. OK for gcc-9 branch? 2021-02-02 Andre Vieira Backport from mainline 2020-11-20 Jakub Jelinek PR target/97528 * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require first POST_MODIFY operand is a REG and is equal to the first operand of PLUS. * gcc.target/arm/pr97528.c: New test. On 20/11/2020 11:25, Kyrylo Tkachov via Gcc-patches wrote: -Original Message- From: Jakub Jelinek Sent: 19 November 2020 18:57 To: Richard Earnshaw ; Ramana Radhakrishnan ; Kyrylo Tkachov Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528] Hi! The documentation for POST_MODIFY says: Currently, the compiler can only handle second operands of the form (plus (reg) (reg)) and (plus (reg) (const_int)), where the first operand of the PLUS has to be the same register as the first operand of the *_MODIFY. The following testcase ICEs, because combine just attempts to simplify things and ends up with (post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1)) but the target predicates accept it, because they only verify that POST_MODIFY's second operand is PLUS and the second operand of the PLUS is a REG. The following patch fixes this by performing further verification that the POST_MODIFY is in the form it should be. Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk and release branches after a while? Ok. Thanks, Kyrill 2020-11-19 Jakub Jelinek PR target/97528 * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require first POST_MODIFY operand is a REG and is equal to the first operand of PLUS. * gcc.target/arm/pr97528.c: New test. --- gcc/config/arm/arm.c.jj 2020-11-13 19:00:46.729620560 +0100 +++ gcc/config/arm/arm.c2020-11-18 17:05:44.656867343 +0100 @@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ /* Allow post-increment by register for VLDn */ if (type == 2 && GET_CODE (ind) == POST_MODIFY && GET_CODE (XEXP (ind, 1)) == PLUS - && REG_P (XEXP (XEXP (ind, 1), 1))) + && REG_P (XEXP (XEXP (ind, 1), 1)) + && REG_P (XEXP (ind, 0)) + && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0))) return true; /* Match: --- gcc/testsuite/gcc.target/arm/pr97528.c.jj 2020-11-18 17:09:58.195053288 +0100 +++ gcc/testsuite/gcc.target/arm/pr97528.c 2020-11-18 17:09:47.839168237 +0100 @@ -0,0 +1,28 @@ +/* PR target/97528 */ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-O1" } */ +/* { dg-add-options arm_neon } */ + +#include + +typedef __simd64_int16_t T; +typedef __simd64_uint16_t U; +unsigned short c; +int d; +U e; + +void +foo (void) +{ + unsigned short *dst = &c; + int g = d, b = 4; + U dc = e; + for (int h = 0; h < b; h++) +{ + unsigned short *i = dst; + U j = dc; + vst1_s16 ((int16_t *) i, (T) j); + dst += g; +} +} Jakub diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 04edd637d43198ad801bb5ada8f1807faae7001e..4679da75dd823778d5a3e37c497ee10793e9c7d7 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12730,7 +12730,9 @@ neon_vector_mem_operand (rtx op, int type, bool strict) /* Allow post-increment by register for VLDn */ if (type == 2 && GET_CODE (ind) == POST_MODIFY && GET_CODE (XEXP (ind, 1)) == PLUS - && REG_P (XEXP (XEXP (ind, 1), 1))) + && REG_P (XEXP (XEXP (ind, 1), 1)) + && REG_P (XEXP (ind, 0)) + && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0))) return true; /* Match: diff --git a/gcc/testsuite/gcc.target/arm/pr97528.c b/gcc/testsuite/gcc.target/arm/pr97528.c new file mode 100644 index ..6cc59f2158c5f8c8dd78e5083ca7ebc4e5f63a44 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr97528.c @@ -0,0 +1,28 @@ +/* PR target/97528 */ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options "-O1" } */ +/* { dg-add-options arm_neon } */ + +#include + +typedef __simd64_int16_t T; +typedef __simd64_uint16_t U; +unsigned short c; +int d; +U e; + +void +foo (void) +{ + unsigned short *dst = &c; + int g = d, b = 4; + U dc = e; + for (int h = 0; h < b; h++) +{ + unsigned short *i = dst; + U j = dc; + vst1_s16 ((int16_t *) i, (T) j); + dst += g; +} +}
Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake
On Tue, 2 Feb 2021, Richard Sandiford wrote: > Richard Biener writes: > > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek > > wrote: > >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote: > >>> OK, so I'd prefer we simply unset the flag after processing deferred > >>rescan. I clearly misread the function to do that. > >> > >>This works too, will bootstrap/regtest it now. > > > > OK. > > FWIW, I'm still not convinced we need to defer the rescan here. > AIUI, the concern was that the pass introduces uses of something > and then only later introduces the definition. But that's OK: > the two things don't have to be added in a set order. > > In particular, the normal rescan doesn't propagate the effects > throughout the cfg; it just updates the list of references in the > instruction itself and marks the block as dirty for later processing. > > The usual reason for deferring rescans is if the instruction can't > cope with the df_refs changing from under them like that, e.g. > because they're iterating through a list of df_refs, or because > they're using df_refs to represent value numbers. It shouldn't > be a problem for this pass. I guess that's correct but doing the scan on the "final" insn feels much cleaner - given all the confusion it also looks safer (doing defered scans as before) since I intend to backport this. But if you feel strongly about this we can of course change it. Thanks, Richard.
[PATCH v7] Practical improvement to libgcc complex divide
Changes in this version from Version 6: Updated copyrights for following three files to -2021. gcc/c-family/c-cppbuiltin.c Moved code for setting LIBGCC modename to FUNC_EXT section. Added code to set modename for any additional modes such as FLT128 or FLT64X. libgcc/libgcc2.c Changed RMIN2 & RMINSCAL for XF and TF modes to use the matching LIBGCC_?F_EPSILON instead of LIBGCC_DF_EPSILON libgcc/config/rs6000/_divkc3.c Changed RMIN2 & RMINSCAL for TF mode to use the matching LIBGCC_TF_EPSILON instead of LIBGCC_DF_EPSILON Correctness and performance test programs used during development of this project may be found in the attachment to: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html Summary of Purpose This patch to libgcc/libgcc2.c __divdc3 provides an opportunity to gain important improvements to the quality of answers for the default complex divide routine (half, float, double, extended, long double precisions) when dealing with very large or very small exponents. The current code correctly implements Smith's method (1962) [2] further modified by c99's requirements for dealing with NaN (not a number) results. When working with input values where the exponents are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are substantially different from the answers provided by quad precision more than 1% of the time. This error rate may be unacceptable for many applications that cannot a priori restrict their computations to the safe range. The proposed method reduces the frequency of "substantially different" answers by more than 99% for double precision at a modest cost of performance. Differences between current gcc methods and the new method will be described. Then accuracy and performance differences will be discussed. Background This project started with an investigation related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe[1] provided an overview of past and recent practice for computing complex divide. The current glibc implementation is based on Robert Smith's algorithm [2] from 1962. A google search found the paper by Baudin and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's proposed patch [4] is based on that paper. I developed two sets of test data by randomly distributing values over a restricted range and the full range of input values. The current complex divide handled the restricted range well enough, but failed on the full range more than 1% of the time. Baudin and Smith's primary test for "ratio" equals zero reduced the cases with 16 or more error bits by a factor of 5, but still left too many flawed answers. Adding debug print out to cases with substantial errors allowed me to see the intermediate calculations for test values that failed. I noted that for many of the failures, "ratio" was a subnormal. Changing the "ratio" test from check for zero to check for subnormal reduced the 16 bit error rate by another factor of 12. This single modified test provides the greatest benefit for the least cost, but the percentage of cases with greater than 16 bit errors (double precision data) is still greater than 0.027% (2.7 in 10,000). Continued examination of remaining errors and their intermediate computations led to the various tests of input value tests and scaling to avoid under/overflow. The current patch does not handle some of the rare and most extreme combinations of input values, but the random test data is only showing 1 case in 10 million that has an error of greater than 12 bits. That case has 18 bits of error and is due to subtraction cancellation. These results are significantly better than the results reported by Baudin and Smith. Support for half, float, double, extended, and long double precision is included as all are handled with suitable preprocessor symbols in a single source routine. Since half precision is computed with float precision as per current libgcc practice, the enhanced algorithm provides no benefit for half precision and would cost performance. Further investigation showed changing the half precision algorithm to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no loss of precision and modest improvement in performance. The existing constants for each precision: float: FLT_MAX, FLT_MIN; double: DBL_MAX, DBL_MIN; extended and/or long double: LDBL_MAX, LDBL_MIN are used for avoiding the more common overflow/underflow cases. This use is made generic by defining appropriate __LIBGCC2_* macros in c-cppbuiltin.c. Tests are added for when both parts of the denominator have exponents small enough to allow shifting any subnormal values to normal values all input values could be scaled up without risking overflow. That gained a clear improvement in accuracy. Similarly, when either numerator was subnormal and the other numerator and both denominator values were not too large, scaling could be used to reduce r
RE: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
> -Original Message- > From: Andre Vieira (lists) > Sent: 02 February 2021 17:27 > To: gcc-patches@gcc.gnu.org > Cc: Kyrylo Tkachov ; ja...@redhat.com > Subject: Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528] > > Hi, > > This is a gcc-9 backport of the PR97528 fix that has been applied to > trunk and gcc-10. > Bootstraped on arm-linux-gnueabihf and regression tested. > > OK for gcc-9 branch? Ok. Thanks, Kyrill > > 2021-02-02 Andre Vieira > > Backport from mainline > 2020-11-20 Jakub Jelinek > > PR target/97528 > * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, > require > first POST_MODIFY operand is a REG and is equal to the first operand > of PLUS. > > * gcc.target/arm/pr97528.c: New test. > > On 20/11/2020 11:25, Kyrylo Tkachov via Gcc-patches wrote: > > > >> -Original Message- > >> From: Jakub Jelinek > >> Sent: 19 November 2020 18:57 > >> To: Richard Earnshaw ; Ramana > >> Radhakrishnan ; Kyrylo Tkachov > >> > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528] > >> > >> Hi! > >> > >> The documentation for POST_MODIFY says: > >> Currently, the compiler can only handle second operands of the > >> form (plus (reg) (reg)) and (plus (reg) (const_int)), where > >> the first operand of the PLUS has to be the same register as > >> the first operand of the *_MODIFY. > >> The following testcase ICEs, because combine just attempts to simplify > >> things and ends up with > >> (post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1)) > >> but the target predicates accept it, because they only verify > >> that POST_MODIFY's second operand is PLUS and the second operand > >> of the PLUS is a REG. > >> > >> The following patch fixes this by performing further verification that > >> the POST_MODIFY is in the form it should be. > >> > >> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk > >> and release branches after a while? > > Ok. > > Thanks, > > Kyrill > > > >> 2020-11-19 Jakub Jelinek > >> > >>PR target/97528 > >>* config/arm/arm.c (neon_vector_mem_operand): For > >> POST_MODIFY, require > >>first POST_MODIFY operand is a REG and is equal to the first operand > >>of PLUS. > >> > >>* gcc.target/arm/pr97528.c: New test. > >> > >> --- gcc/config/arm/arm.c.jj2020-11-13 19:00:46.729620560 +0100 > >> +++ gcc/config/arm/arm.c 2020-11-18 17:05:44.656867343 +0100 > >> @@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ > >> /* Allow post-increment by register for VLDn */ > >> if (type == 2 && GET_CODE (ind) == POST_MODIFY > >> && GET_CODE (XEXP (ind, 1)) == PLUS > >> - && REG_P (XEXP (XEXP (ind, 1), 1))) > >> + && REG_P (XEXP (XEXP (ind, 1), 1)) > >> + && REG_P (XEXP (ind, 0)) > >> + && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0))) > >>return true; > >> > >> /* Match: > >> --- gcc/testsuite/gcc.target/arm/pr97528.c.jj 2020-11-18 > >> 17:09:58.195053288 +0100 > >> +++ gcc/testsuite/gcc.target/arm/pr97528.c 2020-11-18 > >> 17:09:47.839168237 +0100 > >> @@ -0,0 +1,28 @@ > >> +/* PR target/97528 */ > >> +/* { dg-do compile } */ > >> +/* { dg-require-effective-target arm_neon_ok } */ > >> +/* { dg-options "-O1" } */ > >> +/* { dg-add-options arm_neon } */ > >> + > >> +#include > >> + > >> +typedef __simd64_int16_t T; > >> +typedef __simd64_uint16_t U; > >> +unsigned short c; > >> +int d; > >> +U e; > >> + > >> +void > >> +foo (void) > >> +{ > >> + unsigned short *dst = &c; > >> + int g = d, b = 4; > >> + U dc = e; > >> + for (int h = 0; h < b; h++) > >> +{ > >> + unsigned short *i = dst; > >> + U j = dc; > >> + vst1_s16 ((int16_t *) i, (T) j); > >> + dst += g; > >> +} > >> +} > >> > >> > >>Jakub
[PATCH] fix memory leaks
This fixes various vec<> memory leaks as discovered compiling 521.wrf_r. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-02-02 Richard Biener * gimple-loop-interchange.cc (prepare_data_references): Release vectors. * gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise. * tree-ssa-loop-im.c (hoist_memory_references): Likewise. * tree-vect-stmts.c (vectorizable_condition): Do not allocate vectors. (vectorizable_comparison): Likewise. --- gcc/gimple-loop-interchange.cc | 10 -- gcc/gimple-loop-jam.c | 6 ++ gcc/tree-ssa-loop-im.c | 1 + gcc/tree-vect-stmts.c | 13 - 4 files changed, 11 insertions(+), 19 deletions(-) diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc index a1dadd8e5d4..f45b9364644 100644 --- a/gcc/gimple-loop-interchange.cc +++ b/gcc/gimple-loop-interchange.cc @@ -1940,7 +1940,10 @@ prepare_data_references (class loop *loop, vec *datarefs) delete bb_refs; } else if (bb_refs->is_empty ()) - delete bb_refs; + { + bb_refs->release (); + delete bb_refs; + } else bb->aux = bb_refs; } @@ -1954,7 +1957,10 @@ prepare_data_references (class loop *loop, vec *datarefs) bb_refs = (vec *) bb->aux; if (loop_nest && flow_bb_inside_loop_p (loop_nest, bb)) - datarefs->safe_splice (*bb_refs); + { + datarefs->safe_splice (*bb_refs); + bb_refs->release (); + } else free_data_refs (*bb_refs); diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c index 485f5a9bf61..69dbaeb6cb9 100644 --- a/gcc/gimple-loop-jam.c +++ b/gcc/gimple-loop-jam.c @@ -505,15 +505,13 @@ tree_loop_unroll_and_jam (void) if (!unroll_jam_possible_p (outer, loop)) continue; - vec datarefs; - vec dependences; + vec datarefs = vNULL; + vec dependences = vNULL; unsigned unroll_factor, profit_unroll, removed; class tree_niter_desc desc; bool unroll = false; auto_vec loop_nest; - dependences.create (10); - datarefs.create (10); if (!compute_data_dependences_for_loop (outer, true, &loop_nest, &datarefs, &dependences)) { diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 445b93f7979..8034cf68d27 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -2508,6 +2508,7 @@ hoist_memory_references (class loop *loop, bitmap mem_refs, if (res != 1) { bitmap_copy (refs_not_supported, mem_refs); + seq.release (); break; } sms.safe_push (std::make_pair (e, seq)); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index f180ced3124..5eb7b2d1d6e 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -10085,14 +10085,6 @@ vectorizable_condition (vec_info *vinfo, /* Transform. */ - if (!slp_node) -{ - vec_oprnds0.create (1); - vec_oprnds1.create (1); - vec_oprnds2.create (1); - vec_oprnds3.create (1); -} - /* Handle def. */ scalar_dest = gimple_assign_lhs (stmt); if (reduction_type != EXTRACT_LAST_REDUCTION) @@ -10480,11 +10472,6 @@ vectorizable_comparison (vec_info *vinfo, } /* Transform. */ - if (!slp_node) -{ - vec_oprnds0.create (1); - vec_oprnds1.create (1); -} /* Handle def. */ lhs = gimple_assign_lhs (stmt); -- 2.26.2
[PATCH]middle-end slp: Split out patterns away from using SLP_ONLY into their own flag
Hi All, Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way to dissolve the SLP only patterns during SLP cancellation. However it seems like the semantics for STMT_VINFO_SLP_VECT_ONLY are slightly different than what I expected. Namely that the non-SLP path can still use a statement marked STMT_VINFO_SLP_VECT_ONLY. One such example is masked loads which are used both in the SLP and non-SLP path. To fix this I now introduce a new flag STMT_VINFO_SLP_VECT_ONLY_PATTERN which is used only by the pattern matcher. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR tree-optimization/98928 * tree-vect-loop.c (vect_analyze_loop_2): Change STMT_VINFO_SLP_VECT_ONLY to STMT_VINFO_SLP_VECT_ONLY_PATTERN. * tree-vect-slp-patterns.c (complex_pattern::build): Likewise. * tree-vectorizer.h (STMT_VINFO_SLP_VECT_ONLY_PATTERN): New. (class _stmt_vec_info): Add slp_vect_pattern_only_p. gcc/testsuite/ChangeLog: PR tree-optimization/98928 * gcc.target/i386/pr98928.c: New test. --- inline copy of patch -- diff --git a/gcc/testsuite/gcc.target/i386/pr98928.c b/gcc/testsuite/gcc.target/i386/pr98928.c new file mode 100644 index ..9503b579a88d95c427d3e3e5a71565b0c048c125 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr98928.c @@ -0,0 +1,59 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-Ofast -march=skylake-avx512 -fwhole-program -w" } */ + +typedef float MagickRealType; +typedef short Quantum; +float InterpolateMagickPixelPacket_alpha[1]; +int InterpolateMagickPixelPacket_i; + +void InterpolateMagickPixelPacket(); + +void main() { InterpolateMagickPixelPacket(); } + +typedef struct { + MagickRealType red, green, blue, opacity, index; +} MagickPixelPacket; +typedef struct { + Quantum blue, green, red, opacity; +} PixelPacket; +struct _Image { + int colorspace; + int matte; +} GetMagickPixelPacket(MagickPixelPacket *pixel) { + pixel->red = pixel->green = pixel->blue = 0.0; +} +int AlphaBlendMagickPixelPacket(struct _Image *image, PixelPacket *color, +Quantum *indexes, MagickPixelPacket *pixel, +MagickRealType *alpha) { + if (image->matte) { +*alpha = pixel->red = pixel->green = pixel->blue = pixel->opacity = +color->opacity; +pixel->index = 0.0; +if (image->colorspace) + pixel->index = *indexes; +return 0; + } + *alpha = 1.0 / 0.2; + pixel->red = *alpha * color->red; + pixel->green = *alpha * color->green; + pixel->blue = *alpha * color->blue; + pixel->opacity = pixel->index = 0.0; + if (image->colorspace && indexes) +pixel->index = *indexes; +} +MagickPixelPacket InterpolateMagickPixelPacket_pixels[1]; +PixelPacket InterpolateMagickPixelPacket_p; + +void +InterpolateMagickPixelPacket(struct _Image *image) { + Quantum *indexes; + for (; InterpolateMagickPixelPacket_i; InterpolateMagickPixelPacket_i++) { +GetMagickPixelPacket(InterpolateMagickPixelPacket_pixels + + InterpolateMagickPixelPacket_i); +AlphaBlendMagickPixelPacket( +image, &InterpolateMagickPixelPacket_p + InterpolateMagickPixelPacket_i, +indexes + InterpolateMagickPixelPacket_i, +InterpolateMagickPixelPacket_pixels + InterpolateMagickPixelPacket_i, +InterpolateMagickPixelPacket_alpha + InterpolateMagickPixelPacket_i); + } +} diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index acfd1952e3b803ea79cf51433101466743c9793e..200ed27b32ef4aa54c6783afa1864924b6f55582 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2700,7 +2700,7 @@ again: { stmt_vec_info pattern_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info); - if (STMT_VINFO_SLP_VECT_ONLY (pattern_stmt_info)) + if (STMT_VINFO_SLP_VECT_ONLY_PATTERN (pattern_stmt_info)) STMT_VINFO_IN_PATTERN_P (stmt_info) = false; gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info); diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index d25560fab97bb852e949884850d51c6148b14a68..f0817da9f622d22e3df2e30410d1cf610b4ffa1d 100644 --- a/gcc/tree-vect-slp-patterns.c +++ b/gcc/tree-vect-slp-patterns.c @@ -599,7 +599,7 @@ complex_pattern::build (vec_info *vinfo) the call there. */ vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt, SLP_TREE_VECTYPE (node)); - STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true; + STMT_VINFO_SLP_VECT_ONLY_PATTERN (call_stmt_info) = true; /* Since we are replacing all the statements in the group with the same thing it doesn't really matter. So just set it every time a new stmt diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index f8bf4488d0ea32e7909f6be2bf4e7cdaee4f55fe..e
Re: [PATCH] expand: Expand x / y * y as x - x % y if the latter is cheaper [PR96696]
On 1/16/21 11:13 AM, Jakub Jelinek wrote: > Hi! > > The following patch tests both x / y * y and x - x % y expansion for the > former GIMPLE code and chooses the cheaper of those sequences. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-01-16 Jakub Jelinek > > PR tree-optimization/96696 > * expr.c (expand_expr_divmod): New function. > (expand_expr_real_2) : Use it for truncations and > divisions. Formatting fixes. > : Optimize x / y * y as x - x % y if the latter is > cheaper. > > * gcc.target/i386/pr96696.c: New test. Given this is strictly a missed optimization, I'd lean towards deferring to gcc-12 at this point. Thoughts? jeff
Re: [PATCH] match.pd: Add some __builtin_ctz (x) cmp cst simplifications [PR95527]
On 1/16/21 11:19 AM, Jakub Jelinek via Gcc-patches wrote: > Hi! > > This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by > testing if the low 3 bits are zero, etc.). > > In addition, I've noticed that in the CLZ case, the > #ifdef CLZ_DEFINED_VALUE_AT_ZERO don't really work as intended, they > are evaluated during genmatch and the macro is not defined then > (but, because of the missing tm.h includes it isn't defined in > gimple-match.c or generic-match.c either). And when tm.h is included, > defaults.h is included which defines a fallback version of that macro. > > For GCC 12, I wonder if it wouldn't be better to say in addition to > __builtin_c[lt]z* > is always UB at zero that it would be undefined for .C[LT]Z ifn too if it > has just one operand and use a second operand to be the constant we expect > at zero. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-01-16 Jakub Jelinek > > PR tree-optimization/95527 > * generic-match-head.c: Include tm.h. > * gimple-match-head.c: Include tm.h. > * match.pd (CLZ == INTEGER_CST): Don't use > #ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO > if clz == CFN_CLZ. Add missing val declaration. > (CTZ cmp CST): New simplifications. > > * gcc.dg/tree-ssa/pr95527-2.c: New test. Similarly. I'd lean towards deferring to gcc-12. jeff
Re: [PATCH] match.pd: Add some __builtin_ctz (x) cmp cst simplifications [PR95527]
On Tue, Feb 02, 2021 at 11:39:30AM -0700, Jeff Law wrote: > > This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by > > testing if the low 3 bits are zero, etc.). > > > > In addition, I've noticed that in the CLZ case, the > > #ifdef CLZ_DEFINED_VALUE_AT_ZERO don't really work as intended, they > > are evaluated during genmatch and the macro is not defined then > > (but, because of the missing tm.h includes it isn't defined in > > gimple-match.c or generic-match.c either). And when tm.h is included, > > defaults.h is included which defines a fallback version of that macro. > > > > For GCC 12, I wonder if it wouldn't be better to say in addition to > > __builtin_c[lt]z* > > is always UB at zero that it would be undefined for .C[LT]Z ifn too if it > > has just one operand and use a second operand to be the constant we expect > > at zero. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > 2021-01-16 Jakub Jelinek > > > > PR tree-optimization/95527 > > * generic-match-head.c: Include tm.h. > > * gimple-match-head.c: Include tm.h. > > * match.pd (CLZ == INTEGER_CST): Don't use > > #ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO > > if clz == CFN_CLZ. Add missing val declaration. > > (CTZ cmp CST): New simplifications. > > > > * gcc.dg/tree-ssa/pr95527-2.c: New test. > Similarly. I'd lean towards deferring to gcc-12. Ok, will repost at the start of stage1 then (for both). Jakub
[PATCH] Add test-case.
Pushed to master. gcc/testsuite/ChangeLog: PR target/97510 * gcc.target/i386/pr97510.c: New test. --- gcc/testsuite/gcc.target/i386/pr97510.c | 18 ++ 1 file changed, 18 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97510.c diff --git a/gcc/testsuite/gcc.target/i386/pr97510.c b/gcc/testsuite/gcc.target/i386/pr97510.c new file mode 100644 index 000..4f967b59c9a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97510.c @@ -0,0 +1,18 @@ +/* PR target/97510 */ +/* { dg-do compile } */ +/* { dg-options "-Os -fexcess-precision=standard -mfpmath=387 -funsafe-math-optimizations" } */ + +float compute_rsqrt_ref_r_0; + +__attribute__((optimize(1))) +void compute_rsqrt_ref() { + compute_rsqrt_ref_r_0 = 1.0 / 0.0; +} + +int icompute_rsqrt_ref(float *); + +void test_512() { + float in[0]; + for (int i;;) +in[i] = 8.6756 * icompute_rsqrt_ref(in); +} -- 2.30.0
[PATCH] release pointer_query cache when done (PR 98937)
The strlen pass initializes its pointer_query member object with a cache consisting of a couple of vec's. Because vec doesn't implement RAII its memory must be explicitly released to avoid memory leaks. The attached patch adds a dtor to the strlen_dom_walker to do that. Tested on x86_64-linux and by verifying that the cache leaks are gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind. I'll plan to commit this change as "obvious" tomorrow unless there are suggestions for changes. Martin PS Valgrind shows a fair number of leaks even with the patch but none of them due to the pointer_query cache. PR tree-optimization/98937 - pointer_query cache leaks gcc/ChangeLog: PR tree-optimization/98937 * tree-ssa-strlen.c (strlen_dom_walker::~strlen_dom_walker): Define. Flush pointer_query cache. diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c index c6f74051607..8912a113de9 100644 --- a/gcc/tree-ssa-strlen.c +++ b/gcc/tree-ssa-strlen.c @@ -5491,6 +5491,8 @@ public: m_cleanup_cfg (false) { } + ~strlen_dom_walker (); + virtual edge before_dom_children (basic_block); virtual void after_dom_children (basic_block); @@ -5508,6 +5510,13 @@ public: bool m_cleanup_cfg; }; +/* Release pointer_query cache. */ + +strlen_dom_walker::~strlen_dom_walker () +{ + ptr_qry.flush_cache (); +} + /* Callback for walk_dominator_tree. Attempt to optimize various string ops by remembering string lengths pointed by pointer SSA_NAMEs. */
gotools patch committed: Test file embedding
This patch to the gotools Makefile runs "go test embed/internal/embedtest" using the newly built go tool, to test that file embedding works. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian * Makefile.am (check-embed): New target. (check): Depend on check-embed. Examine embed-testlog. (mostlyclean-local): Add check-embed-dir. (.PHONY): Add check-embed. * Makefile.in: Regenerate. 345484a6755260cd6a01c17ac5547da669422b20 diff --git a/gotools/Makefile.am b/gotools/Makefile.am index 3bbccb96c28..6576fe77b85 100644 --- a/gotools/Makefile.am +++ b/gotools/Makefile.am @@ -102,7 +102,7 @@ MOSTLYCLEANFILES = \ mostlyclean-local: if test -d check-go-dir; then chmod -R u+w check-go-dir; fi rm -rf check-go-dir cgo-test-dir carchive-test-dir \ - check-vet-dir gocache-test + check-vet-dir check-embed-dir gocache-test if NATIVE @@ -307,11 +307,28 @@ check-vet: go$(EXEEXT) $(noinst_PROGRAMS) check-head check-gccgo check-gcc (cd check-vet-dir/src/cmd/vet && $(abs_builddir)/go$(EXEEXT) test -test.short -test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v) >> cmd_vet-testlog 2>&1 || echo "--- $${fl}: go test cmd/vet (0.00s)" >> cmd_vet-testlog grep '^--- ' cmd_vet-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' | sort -k 2 +# check-embed runs `go test embed/internal/embedtest` in our environment. +check-embed: go$(EXEEXT) $(noinst_PROGRAMS) check-head check-gccgo check-gcc + rm -rf check-embed-dir embed-testlog + $(MKDIR_P) check-embed-dir/src/embed/internal + cp $(libgosrcdir)/go.mod check-embed-dir/src/ + $(MKDIR_P) check-embed-dir/src/vendor + cp $(libgosrcdir)/vendor/modules.txt check-embed-dir/src/vendor/ + cp -r $(libgosrcdir)/embed/internal/embedtest check-embed-dir/src/embed/internal + @abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \ + abs_checkdir=`cd check-embed-dir && $(PWD_COMMAND)`; \ + echo "cd check-embed-dir/src/embed/internal/embedtest && $(ECHO_ENV) GOPATH=$${abs_checkdir} $(abs_builddir)/go$(EXEEXT) test -test.short -test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v" > embed-testlog + $(CHECK_ENV) \ + GOPATH=`cd check-embed-dir && $(PWD_COMMAND)`; \ + export GOPATH; \ + (cd check-embed-dir/src/embed/internal/embedtest && $(abs_builddir)/go$(EXEEXT) test -test.short -test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v) >> embed-testlog 2>&1 || echo "--- $${fl}: go test embed/internal/embedtest (0.00s)" >> embed-testlog + grep '^--- ' embed-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' | sort -k 2 + # The check targets runs the tests and assembles the output files. -check: check-head check-go-tool check-runtime check-cgo-test check-carchive-test check-vet +check: check-head check-go-tool check-runtime check-cgo-test check-carchive-test check-vet check-embed @mv gotools.head gotools.sum @cp gotools.sum gotools.log - @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog cmd_vet-testlog; do \ + @for file in cmd_go-testlog runtime-testlog cgo-testlog carchive-testlog cmd_vet-testlog embed-testlog; do \ testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \ echo "Running $${testname}" >> gotools.sum; \ echo "Running $${testname}" >> gotools.log; \ @@ -337,7 +354,8 @@ check: check-head check-go-tool check-runtime check-cgo-test check-carchive-test @echo "runtest completed at `date`" >> gotools.log @if grep '^FAIL' gotools.sum >/dev/null 2>&1; then exit 1; fi -.PHONY: check check-head check-go-tool check-runtime check-cgo-test check-carchive-test check-vet +.PHONY: check check-head check-go-tool check-runtime check-cgo-test +.PHONY: check-carchive-test check-vet check-embed else
Re: [PATCH] release pointer_query cache when done (PR 98937)
On Tue, 2021-02-02 at 12:57 -0700, Martin Sebor via Gcc-patches wrote: > The strlen pass initializes its pointer_query member object with > a cache consisting of a couple of vec's. Because vec doesn't > implement RAII its memory must be explicitly released to avoid > memory leaks. The attached patch adds a dtor to > the strlen_dom_walker to do that. > > Tested on x86_64-linux and by verifying that the cache leaks are > gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind. > > I'll plan to commit this change as "obvious" tomorrow unless there > are suggestions for changes. Why not make the vecs within struct pointer_query::cache_type's be auto_vecs? Then presumably the autogenerated dtor for pointer_query::cache_type would clean things up, as called from the autogenerated dtor for strlen_dom_walker, when cleaning up the var_cache field? Or am I missing something? (sorry, I'm not familiar with this code) Dave > Martin > > PS Valgrind shows a fair number of leaks even with the patch but > none of them due to the pointer_query cache.
Re: [PATCH] Make asm not contain prefixed addresses.
Hi, On Mon, Feb 01, 2021 at 11:24:42PM -0500, Michael Meissner wrote: > In PR target/98519, the assembler does not like asm memory references that are > prefixed. We can't automatically change the instruction to prefixed form with > a 'p' like we do for normal RTL insns, since this is assembly code. It is incorrect in general, too, for at least three reasons. > Instead, > the patch prevents prefixed memory addresses from being passed by default. > > This patch uses the TARGET_MD_ASM_ADJUST target hook to change the 'm' and 'o' > constraints to be 'em' and 'eo'. This does not work correctly, and is a gross misuse of this hook anyway. Like I said before, just make 'm' (or even general_operand) not allow prefixed memory in inline asm. > + while ((ch = *constraint++) != '\0') Don't do assignments (or any other surprising side effects) in conditionals please. Code should be *readable*, not a puzzle. 0 is written as 0, not as 000 or '\0' or 0x0 or anything else. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr98519.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ This is the default. > +/* { dg-require-effective-target powerpc_prefixed_addr } */ You do not want this line, you want to compile this *always*. You set cpu=power10 anyway! Also, the test should work *without it*! > +/* { dg-final { scan-assembler-not {\m[@]pcrel\M} } } */ You can just write @ instead of [@]. Putting \m immediately in front of a non-letter (or \M immediately after) does not do anything, either. Segher
[PATCH] c++: ICE with late parsing of noexcept in nested class [PR98899]
Here we crash with a noexcept-specifier in a nested template class, because my handling of such deferred-parse noexcept-specifiers was gronked when we need to instantiate a DEFERRED_PARSE before it was actually parsed at the end of the outermost class. In struct S { template struct B { B() noexcept(noexcept(x)); int x; }; struct A : B { A() : B() {} }; }; we call complete_type for B which triggers tsubsting S::B::B() whose noexcept-specifier still contains a DEFERRED_PARSE. The trick is to stash such noexcept-specifiers into DEFPARSE_INSTANTIATIONS so that we can replace it later when we've finally parsed all deferred noexcept-specifiers. In passing, fix missing usage of UNPARSED_NOEXCEPT_SPEC_P. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? gcc/cp/ChangeLog: PR c++/98899 * parser.c (cp_parser_class_specifier_1): Use any possible DEFPARSE_INSTANTIATIONS to update DEFERRED_NOEXCEPT_PATTERN. (cp_parser_save_noexcept): Initialize DEFPARSE_INSTANTIATIONS. * pt.c (tsubst_exception_specification): Stash new_specs into DEFPARSE_INSTANTIATIONS. * tree.c (fixup_deferred_exception_variants): Use UNPARSED_NOEXCEPT_SPEC_P. gcc/testsuite/ChangeLog: PR c++/98899 * g++.dg/cpp0x/noexcept65.C: New test. --- gcc/cp/parser.c | 13 ++--- gcc/cp/pt.c | 16 +++ gcc/cp/tree.c | 3 +-- gcc/testsuite/g++.dg/cpp0x/noexcept65.C | 35 + 4 files changed, 62 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept65.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index abadaf972d6..5da8670f0e2 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -25026,8 +25026,8 @@ cp_parser_class_specifier_1 (cp_parser* parser) pushed_scope = push_scope (class_type); } - tree spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl)); - spec = TREE_PURPOSE (spec); + tree def_parse = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl)); + def_parse = TREE_PURPOSE (def_parse); /* Make sure that any template parameters are in scope. */ maybe_begin_member_template_processing (decl); @@ -25044,7 +25044,7 @@ cp_parser_class_specifier_1 (cp_parser* parser) parser->local_variables_forbidden_p |= THIS_FORBIDDEN; /* Now we can parse the noexcept-specifier. */ - spec = cp_parser_late_noexcept_specifier (parser, spec); + tree spec = cp_parser_late_noexcept_specifier (parser, def_parse); if (spec == error_mark_node) spec = NULL_TREE; @@ -25052,6 +25052,12 @@ cp_parser_class_specifier_1 (cp_parser* parser) /* Update the fn's type directly -- it might have escaped beyond this decl :( */ fixup_deferred_exception_variants (TREE_TYPE (decl), spec); + /* Update any instantiations we've already created. We must +keep the new noexcept-specifier wrapped in a DEFERRED_NOEXCEPT +so that maybe_instantiate_noexcept can tsubst the NOEXCEPT_EXPR +in the pattern. */ + for (tree i : DEFPARSE_INSTANTIATIONS (def_parse)) + DEFERRED_NOEXCEPT_PATTERN (TREE_PURPOSE (i)) = TREE_PURPOSE (spec); /* Restore the state of local_variables_forbidden_p. */ parser->local_variables_forbidden_p = local_variables_forbidden_p; @@ -26695,6 +26701,7 @@ cp_parser_save_noexcept (cp_parser *parser) /* Save away the noexcept-specifier; we will process it when the class is complete. */ DEFPARSE_TOKENS (expr) = cp_token_cache_new (first, last); + DEFPARSE_INSTANTIATIONS (expr) = nullptr; expr = build_tree_list (expr, NULL_TREE); return expr; } diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index aa1687a9f2a..4781519d00f 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -15189,6 +15189,22 @@ tsubst_exception_specification (tree fntype, /*integral_constant_expression_p=*/true); } new_specs = build_noexcept_spec (new_specs, complain); + /* We've instantiated a template before a noexcept-specifier +contained therein has been parsed. This can happen for +a nested template class: + + struct S { + template struct B { B() noexcept(...); }; + struct A : B { ... use B() ... }; + }; + +where completing B will trigger instantiating the +noexcept, even though we only parse it at the end of S. */ + if (UNPARSED_NOEXCEPT_SPEC_P (specs)) + { + gcc_checking_assert (defer_ok); + vec_safe_push (DEFPARSE_INSTANTIATIONS (expr), new_specs); + } } else if (specs) { diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index 2e5a1f198e8..e6ced274959 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -2738,8 +2738,7 @@ fixup_deferred_exception_variants (tr
[PATCH v7] Practical improvement to libgcc complex divide
Changes in this version from Version 6: Updated copyrights for following three files to -2021. gcc/c-family/c-cppbuiltin.c Moved code for setting LIBGCC modename to FUNC_EXT section. Added code to set modename for any additional modes such as FLT128 or FLT64X. libgcc/libgcc2.c Changed RMIN2 & RMINSCAL for XF and TF modes to use the matching LIBGCC_?F_EPSILON instead of LIBGCC_DF_EPSILON libgcc/config/rs6000/_divkc3.c Changed RMIN2 & RMINSCAL for TF mode to use the matching LIBGCC_TF_EPSILON instead of LIBGCC_DF_EPSILON Resubmitted with correction to following link. No other changes from previous PATCH v7. Correctness and performance test programs used during development of this project may be found in the attachment to: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564239.html with the attachment at https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210125/b5a7df3b/attachment-0001.bin Summary of Purpose This patch to libgcc/libgcc2.c __divdc3 provides an opportunity to gain important improvements to the quality of answers for the default complex divide routine (half, float, double, extended, long double precisions) when dealing with very large or very small exponents. The current code correctly implements Smith's method (1962) [2] further modified by c99's requirements for dealing with NaN (not a number) results. When working with input values where the exponents are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are substantially different from the answers provided by quad precision more than 1% of the time. This error rate may be unacceptable for many applications that cannot a priori restrict their computations to the safe range. The proposed method reduces the frequency of "substantially different" answers by more than 99% for double precision at a modest cost of performance. Differences between current gcc methods and the new method will be described. Then accuracy and performance differences will be discussed. Background This project started with an investigation related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe[1] provided an overview of past and recent practice for computing complex divide. The current glibc implementation is based on Robert Smith's algorithm [2] from 1962. A google search found the paper by Baudin and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's proposed patch [4] is based on that paper. I developed two sets of test data by randomly distributing values over a restricted range and the full range of input values. The current complex divide handled the restricted range well enough, but failed on the full range more than 1% of the time. Baudin and Smith's primary test for "ratio" equals zero reduced the cases with 16 or more error bits by a factor of 5, but still left too many flawed answers. Adding debug print out to cases with substantial errors allowed me to see the intermediate calculations for test values that failed. I noted that for many of the failures, "ratio" was a subnormal. Changing the "ratio" test from check for zero to check for subnormal reduced the 16 bit error rate by another factor of 12. This single modified test provides the greatest benefit for the least cost, but the percentage of cases with greater than 16 bit errors (double precision data) is still greater than 0.027% (2.7 in 10,000). Continued examination of remaining errors and their intermediate computations led to the various tests of input value tests and scaling to avoid under/overflow. The current patch does not handle some of the rare and most extreme combinations of input values, but the random test data is only showing 1 case in 10 million that has an error of greater than 12 bits. That case has 18 bits of error and is due to subtraction cancellation. These results are significantly better than the results reported by Baudin and Smith. Support for half, float, double, extended, and long double precision is included as all are handled with suitable preprocessor symbols in a single source routine. Since half precision is computed with float precision as per current libgcc practice, the enhanced algorithm provides no benefit for half precision and would cost performance. Further investigation showed changing the half precision algorithm to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no loss of precision and modest improvement in performance. The existing constants for each precision: float: FLT_MAX, FLT_MIN; double: DBL_MAX, DBL_MIN; extended and/or long double: LDBL_MAX, LDBL_MIN are used for avoiding the more common overflow/underflow cases. This use is made generic by defining appropriate __LIBGCC2_* macros in c-cppbuiltin.c. Tests are added for when both parts of the denominator have exponents small enough to allow shifting any subnormal values to normal values all input values could be scaled up without risking over
Re: [PATCH v6] Practical improvement to libgcc complex divide
I saw the gcc.gnu.org at the end of the address and did not realize the link was not part gnu.org. The gcc.gnu.org link is: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564239.html with the attachment at https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210125/b5a7df3b/attachment-0001.bin I'll resubmit the patch with the corrected link. - patrick On 2/1/2021 5:49 PM, Joseph Myers wrote: On Mon, 1 Feb 2021, Patrick McGehearty via Gcc-patches wrote: The message which contains the attached gzipped tarball of the development test programs is: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html I'll include that link in the new patch as well. I think it's best to include the URL in the archives on gcc.gnu.org rather than on a third-party site.
Re: [PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg
On 2021-01-28 5:40 a.m., Ilya Leoshkevich via Gcc-patches wrote: Hello, I would like to ping the following patch: lra: clear lra_insn_recog_data after simplifying a mem subreg https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html Sorry, I missed your original email. The patch is ok to submit into the trunk. Thank you for addressing the issue.
Merge from trunk to gccgo branch
I merged trunk revision 8e4a738d2540ab6aff77506d368bf4e3fa6963bd to the gccgo branch. Ian
Re: [PATCH] ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]
On 2/2/21 2:29 AM, Jakub Jelinek wrote: > Hi! > > As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in > some insns) at expand_binop or expand_unop and expects it to do something > (and then will check if it created valid insns and punts if not). > These functions in the end if the operands don't match try to > copy_to_mode_reg the operands, which does > if (!general_operand (x, VOIDmode)) > x = force_operand (x, temp); > but, force_operand is far from handling all possible RTLs, it will ICE for > all more unusual RTL codes. Basically handles just simple arithmetic and > unary RTL operations if they have an optab and > expand_simple_binop/expand_simple_unop ICE on others. > > The following patch fixes it by adding some operand verification (whether > there is a hope that copy_to_mode_reg will succeed on those). It is added > both to noce_emit_move_insn (not needed for this exact testcase, > that function simply tries to recog the insn as is and if it fails, > handles some simple binop/unop cases; the patch performs the verification > of their operands) and noce_try_sign_mask. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-02-02 Jakub Jelinek > > PR middle-end/97487 > * ifcvt.c (noce_can_force_operand): New function. > (noce_emit_move_insn): Use it. > (noce_try_sign_mask): Likewise. Formatting fix. > > * gcc.dg/pr97487-1.c: New test. > * gcc.dg/pr97487-2.c: New test. OK. One might consider having force_operand call can_force_operand but that feels more like a future thing than something we'd want to do right now. Jeff
Re: [PATCH] lra-constraints: Fix error-recovery for bad inline-asms [PR97971]
On 2/2/21 2:21 AM, Jakub Jelinek via Gcc-patches wrote: > Hi! > > The following testcase has ice-on-invalid, it can't be reloaded, but we > shouldn't ICE the compiler because the user typed non-sense. > > In current_insn_transform we have: > if (process_alt_operands (reused_alternative_num)) > alt_p = true; > > if (check_only_p) > return ! alt_p || best_losers != 0; > > /* If insn is commutative (it's safe to exchange a certain pair of > operands) then we need to try each alternative twice, the second > time matching those two operands as if we had exchanged them. To > do this, really exchange them in operands. > > If we have just tried the alternatives the second time, return > operands to normal and drop through. */ > > if (reused_alternative_num < 0 && commutative >= 0) > { > curr_swapped = !curr_swapped; > if (curr_swapped) > { > swap_operands (commutative); > goto try_swapped; > } > else > swap_operands (commutative); > } > > if (! alt_p && ! sec_mem_p) > { > /* No alternative works with reloads?? */ > if (INSN_CODE (curr_insn) >= 0) > fatal_insn ("unable to generate reloads for:", curr_insn); > error_for_asm (curr_insn, > "inconsistent operand constraints in an %"); > lra_asm_error_p = true; > ... > and so handle inline asms there differently (and delete/nullify them after > this) - fatal_insn is only called for non-inline asm. > But in process_alt_operands we do: > /* Both the earlyclobber operand and conflicting operand >cannot both be user defined hard registers. */ > if (HARD_REGISTER_P (operand_reg[i]) > && REG_USERVAR_P (operand_reg[i]) > && operand_reg[j] != NULL_RTX > && HARD_REGISTER_P (operand_reg[j]) > && REG_USERVAR_P (operand_reg[j])) > fatal_insn ("unable to generate reloads for " > "impossible constraints:", curr_insn); > and thus ICE even for inline-asms. > > I think it is inappropriate to delete/nullify the insn in > process_alt_operands, as it could be done e.g. in the check_only_p mode, > so this patch just returns false in that case, which results in the > caller have alt_p false, and as inline asm isn't simple move, sec_mem_p > will be also false (and it isn't commutative either), so for check_only_p > it will suggests to the callers it isn't ok and otherwise will emit > error and delete/nullify the inline asm insn. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-02-02 Jakub Jelinek > > PR middle-end/97971 > * lra-constraints.c (process_alt_operands): For inline asm, don't call > fatal_insn, but instead return false. > > * gcc.target/i386/pr97971.c: New test. OK jeff
Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
Hi, With the following patch: [qinzhao@localhost gcc]$ git diff tree-ssa-structalias.c diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index cf653be..bd18841 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -4851,6 +4851,30 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t) return false; } +static void +find_func_aliases_for_deferred_init (gcall *t) +{ + + tree lhsop = gimple_call_lhs (t); + enum auto_init_type init_type += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (t, 1)); + auto_vec lhsc; + auto_vec rhsc; + struct constraint_expr temp; + + get_constraint_for (lhsop, &lhsc); + if (init_type == AUTO_INIT_ZERO && flag_delete_null_pointer_checks) +temp.var = nothing_id; + else +temp.var = nonlocal_id; + temp.type = ADDRESSOF; + temp.offset = 0; + rhsc.safe_push (temp); + + process_all_all_constraints (lhsc, rhsc); + return; +} + /* Create constraints for the call T. */ static void @@ -4864,6 +4888,12 @@ find_func_aliases_for_call (struct function *fn, gcall *t) && find_func_aliases_for_builtin_call (fn, t)) return; + if (gimple_call_internal_p (t, IFN_DEFERRED_INIT)) +{ + find_func_aliases_for_deferred_init (t); + return; +} + The *.ealias dump for the routine “bump_map” are exactly the same for approach A and D. However, the stack size for D still bigger than A. Any suggestions? Qing On Feb 2, 2021, at 9:17 AM, Qing Zhao via Gcc-patches wrote: > > > >> On Feb 2, 2021, at 1:43 AM, Richard Biener wrote: >> >> On Mon, 1 Feb 2021, Qing Zhao wrote: >> >>> Hi, Richard, >>> >>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you >>> suggestion. >>> >>> And now the routine “bump_map” in 511.povray is like following: >>> ... >>> >>> # DEBUG BEGIN_STMT >>> xcoor = 0.0; >>> ycoor = 0.0; >>> # DEBUG BEGIN_STMT >>> index = .DEFERRED_INIT (index, 2); >>> index2 = .DEFERRED_INIT (index2, 2); >>> index3 = .DEFERRED_INIT (index3, 2); >>> # DEBUG BEGIN_STMT >>> colour1 = .DEFERRED_INIT (colour1, 2); >>> colour2 = .DEFERRED_INIT (colour2, 2); >>> colour3 = .DEFERRED_INIT (colour3, 2); >>> # DEBUG BEGIN_STMT >>> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2); >>> # DEBUG p1$0 => p1$0_181 >>> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2); >>> # DEBUG p1$1 => p1$1_184 >>> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2); >>> # DEBUG p1$2 => p1$2_172 >>> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2); >>> # DEBUG p2$0 => p2$0_177 >>> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2); >>> # DEBUG p2$1 => p2$1_135 >>> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2); >>> # DEBUG p2$2 => p2$2_137 >>> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2); >>> # DEBUG p3$0 => p3$0_377 >>> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2); >>> # DEBUG p3$1 => p3$1_379 >>> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2); >>> # DEBUG p3$2 => p3$2_381 >>> >>> >>> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of >>> the components of p1, p2 and p3. >>> >>> With this change, the stack usage numbers with -fstack-usage for approach >>> A, old approach D and new D with the splitting in SRA are: >>> >>> Approach A Approach D-old Approach D-new >>> >>> 272 624 368 >>> >>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA >>> can reduce the stack usage increase dramatically. >>> >>> However, looks like that the stack size for D is still bigger than A. >>> >>> I checked the IR again, and found that the alias analysis might be >>> responsible for this (by compare the image.cpp.026t.ealias for both A and >>> D): >>> >>> (Due to the call to: >>> >>> colour1 = .DEFERRED_INIT (colour1, 2); >>> ) >>> >>> **Approach A: >>> >>> Points_to analysis: >>> >>> Constraints: >>> … >>> colour1 = &NULL >>> … >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> ... >>> callarg(53) = &colour1 >>> ... >>> _53 = colour1 >>> >>> Points_to sets: >>> … >>> colour1 = { NULL ESCAPED NONLOCAL } same as _53 >>> ... >>> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 } >>> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as >>> CALLUSED(48) >>> ... >>> callarg(53) = { NULL ESCAPED NONLOCAL colour1 } >>> >>> **Apprach D: >>> >>> Points_to analysis: >>> >>> Constraints: >>> … >>> callarg(19) = colour1 >>> callarg(19) = &NONLOCAL >>> colour1 = callarg(19) + UNKNOWN >>> colour1 = &NONLOCAL >>> … >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> colour1 = &NONLOCAL >>> … >>> callarg(74) = &colour1 >>> callarg(74) = callarg(74) + UNKNOWN >>> callarg(74) = *callarg(74) + UNKNOWN >>> … >>> _53 = colour1 >>> _54 = _53 >>> _55 = _54 + UNKNOWN >>> _55 = &NONLOCAL >>> _56 = colour1 >>> _57 = _56 >>> _58 = _57 + UNKNOWN >>> _58 = &NONLOCAL >>> _59 = _55 + UNKNOWN >>> _59 = _58 + UNKNOW
Re: [PATCH] release pointer_query cache when done (PR 98937)
On 2/2/21 2:29 PM, David Malcolm wrote: On Tue, 2021-02-02 at 12:57 -0700, Martin Sebor via Gcc-patches wrote: The strlen pass initializes its pointer_query member object with a cache consisting of a couple of vec's. Because vec doesn't implement RAII its memory must be explicitly released to avoid memory leaks. The attached patch adds a dtor to the strlen_dom_walker to do that. Tested on x86_64-linux and by verifying that the cache leaks are gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind. I'll plan to commit this change as "obvious" tomorrow unless there are suggestions for changes. Why not make the vecs within struct pointer_query::cache_type's be auto_vecs? Then presumably the autogenerated dtor for pointer_query::cache_type would clean things up, as called from the autogenerated dtor for strlen_dom_walker, when cleaning up the var_cache field? Or am I missing something? (sorry, I'm not familiar with this code) Dave It would work as long as the cache isn't copied or assigned anywhere. I don't think it is either, and GCC compiles and no C tests fail, so it should be okay. But I'm leery of using auto_vec as a class member because of pr90904. Looks like auto_vec now has a move ctor and move assignment but still no safe copy ctor or assignment. The cache copy ctor and assignment operator could be deleted to avoid accidental copies, but at that point it starts to become more involved than just flushing the cache. If you or someone else has a preference for using auto_vec despite this I'll change it. Otherwise I'd just as soon leave it as is. Martin Martin PS Valgrind shows a fair number of leaks even with the patch but none of them due to the pointer_query cache.
Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]
On Fri, Jan 29, 2021 at 11:11:23AM +0800, HAO CHEN GUI via Gcc-patches wrote: > This patch tries to optimize PowerPC 64 bit constant generation when the > constant can be transformed from a 32 bit or 16 bit constant by rotating, > shifting and mask AND. All and more of what you are doing here for rotated 16-bit constants is covered by https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html That patch is still waiting on review. Hmm, I see my local copy of that patch has one extra line in gcc/testsuite/gcc.target/powerpc/rot_cst2.c +/* { dg-additional-options "-mno-prefixed" { target { lp64 } } } */ in order to keep scan-assembler-times counts correct for power10. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH 2/2] RISC-V: Add riscv{32, 64}be with big endian as default
> > Could you extract the endian related LINK_SPEC change to > > ENDIAN_LINK_SPEC to riscv.h, so that we can prevent > > duplicate this several times. > > You mean a define which expands to > > "-melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" Yeah, but I'd like to include following 2 lines too: %{mbig-endian:-EB} \ %{mlittle-endian:-EL} \ I saw it's just the same among 3 files. > > ? Sure, but I don't think ENDIAN_LINK_SPEC would be a good name for > it since it defines the word size as well as the endianness, and also > ELF in general. > > Maybe ELF_LINK_SPEC? The word size and endianness are also ELF > properties (as encoded in EI_CLASS and EI_DATA). Either ENDIAN_LINK_SPEC or ELF_LINK_SPEC is ok to me, I don't have strong preference on nanming. > > > // Marcus > >
[PATCH] rs6000: Use rldimi for vec init instead of shift + ior
Hi, This patch merges the previously approved one[1] and its relied patch made by Segher here[2], it's to make unsigned int vector init go with rldimi to merge two integers instead of shift and ior. Segher's patch in [2] is required to make the test case pass, otherwise the costing for new pseudo-to-pseudo copies and the folding with nonzero_bits in combine will make the rl*imi pattern become compact and split into ior and shift unexpectedly. The commit log of Segher's patch describes it in more details: "An rl*imi is usually written as an IOR of an ASHIFT or similar, and an AND of a register with a constant mask. In some cases combine knows that that AND doesn't do anything (because all zero bits in that mask correspond to bits known to be already zero), and then no pattern matches. This patch adds a define_split for such cases. It uses nonzero_bits in the condition of the splitter, but does not need it afterwards for the instruction to be recognised. This is necessary because later passes can see fewer nonzero_bits. Because it is a splitter, combine will only use it when starting with three insns (or more), even though the result is just one. This isn't a huge problem in practice, but some possible combinations still won't happen." Bootstrapped/regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9. Is it ok for trunk? BR, Kewen [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html gcc/ChangeLog: 2020-02-03 Segher Boessenkool Kewen Lin * config/rs6000/rs6000.md (*rotl3_insert_3): Renamed to... (rotl3_insert_3): ...this. (plus_ior_xor): New code_iterator. (define_split for GPR rl*imi): New splitter. * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3 for integer merging. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vec-init-10.c: New test. - diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index bb9fb42f82a..dca311ebc80 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4067,7 +4067,7 @@ [(set_attr "type" "insert")]) ; There are also some forms without one of the ANDs. -(define_insn "*rotl3_insert_3" +(define_insn "rotl3_insert_3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0") (match_operand:GPR 4 "const_int_operand" "n")) @@ -4082,6 +4082,24 @@ } [(set_attr "type" "insert")]) +(define_code_iterator plus_ior_xor [plus ior xor]) + +(define_split + [(set (match_operand:GPR 0 "gpc_reg_operand") + (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand") + (match_operand:SI 2 "const_int_operand")) + (match_operand:GPR 3 "gpc_reg_operand")))] + "nonzero_bits (operands[3], mode) + < HOST_WIDE_INT_1U << INTVAL (operands[2])" + [(set (match_dup 0) + (ior:GPR (and:GPR (match_dup 3) + (match_dup 4)) +(ashift:GPR (match_dup 1) +(match_dup 2] +{ + operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1); +}) + (define_insn "*rotl3_insert_4" [(set (match_operand:GPR 0 "gpc_reg_operand" "=r") (ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0") diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 0c1bda522a9..07c2f7ffa6e 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -3008,28 +3008,22 @@ (use (match_operand:SI 4 "gpc_reg_operand"))] "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT" { - rtx a = gen_reg_rtx (DImode); - rtx b = gen_reg_rtx (DImode); - rtx c = gen_reg_rtx (DImode); - rtx d = gen_reg_rtx (DImode); - emit_insn (gen_zero_extendsidi2 (a, operands[1])); - emit_insn (gen_zero_extendsidi2 (b, operands[2])); - emit_insn (gen_zero_extendsidi2 (c, operands[3])); - emit_insn (gen_zero_extendsidi2 (d, operands[4])); + rtx a = gen_lowpart_SUBREG (DImode, operands[1]); + rtx b = gen_lowpart_SUBREG (DImode, operands[2]); + rtx c = gen_lowpart_SUBREG (DImode, operands[3]); + rtx d = gen_lowpart_SUBREG (DImode, operands[4]); if (!BYTES_BIG_ENDIAN) { std::swap (a, b); std::swap (c, d); } - rtx aa = gen_reg_rtx (DImode); rtx ab = gen_reg_rtx (DImode); - rtx cc = gen_reg_rtx (DImode); rtx cd = gen_reg_rtx (DImode); - emit_insn (gen_ashldi3 (aa, a, GEN_INT (32))); - emit_insn (gen_ashldi3 (cc, c, GEN_INT (32))); - emit_insn (gen_iordi3 (ab, aa, b)); - emit_insn (gen_iordi3 (cd, cc, d)); + emit_insn (gen_rotldi3_insert_3 (ab, a, GEN_INT (32), b, + GEN_INT (0x))); + emit_insn (gen_rotldi3_insert_3 (cd, c, GEN_INT (32), d, + GEN_INT (0x
Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]
Alan, Thanks for your info. Just notice your patch. I will wait for your patch being reviewed. On 3/2/2021 上午 10:32, Alan Modra wrote: On Fri, Jan 29, 2021 at 11:11:23AM +0800, HAO CHEN GUI via Gcc-patches wrote: This patch tries to optimize PowerPC 64 bit constant generation when the constant can be transformed from a 32 bit or 16 bit constant by rotating, shifting and mask AND. All and more of what you are doing here for rotated 16-bit constants is covered by https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html That patch is still waiting on review. Hmm, I see my local copy of that patch has one extra line in gcc/testsuite/gcc.target/powerpc/rot_cst2.c +/* { dg-additional-options "-mno-prefixed" { target { lp64 } } } */ in order to keep scan-assembler-times counts correct for power10.