[PATCH] gm2: add missing debug output guard
The Close() procedure in MemStream is missing a guard to prevent it from
printing in non-debug mode.

gcc/gm2:
	* gm2-libs-iso/MemStream.mod: Guard debug output.

Signed-off-by: Wilken Gottwalt
---
 gcc/m2/gm2-libs-iso/MemStream.mod | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/m2/gm2-libs-iso/MemStream.mod b/gcc/m2/gm2-libs-iso/MemStream.mod
index 9620ed2ba19..d3204692540 100644
--- a/gcc/m2/gm2-libs-iso/MemStream.mod
+++ b/gcc/m2/gm2-libs-iso/MemStream.mod
@@ -694,7 +694,10 @@ END handlefree ;

 PROCEDURE Close (VAR cid: ChanId) ;
 BEGIN
-   printf ("Close called\n");
+   IF Debugging
+   THEN
+      printf ("Close called\n")
+   END ;
    IF IsMem(cid)
    THEN
       UnMakeChan(did, cid) ;
--
2.45.2
[RFC] Generalize formation of lane-reducing ops in loop reduction
Hi,

I composed some patches to generalize lane-reducing (dot-product is a typical
representative) pattern recognition, and prepared an RFC document to help with
review. The original intention was to make a complete solution for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440. The work is surely
limited in some ways, so your comments are welcome. Thanks.

1. Background

For a loop reduction that accumulates the result of a widening operation, the
preferred pattern is a lane-reducing operation, if supported by the target.
Because this kind of operation does not need to preserve the intermediate
results of the widening operation, and only produces a reduced number of final
results for accumulation, choosing the pattern can lead to pretty compact
codegen.

Three lane-reducing opcodes are defined in gcc, belonging to two kinds of
operations: dot-product (DOT_PROD_EXPR) and sum-of-absolute-difference
(SAD_EXPR). WIDEN_SUM_EXPR could be seen as a degenerate dot-product whose
second operand is the constant "1". Currently, gcc only supports recognition
of the simple lane-reducing case, in which each accumulation statement of the
loop reduction forms one pattern:

  char *d0, *d1;
  short *s0, *s1;

  for (i) {
    sum += d0[i] * d1[i];          // = DOT_PROD
    sum += abs(s0[i] - s1[i]);     // = SAD
  }

We could rewrite the example as below using only one statement, whose
non-reduction addend is the sum of the above right-hand sides. As a whole, the
addend would match nothing, while its two sub-expressions could be recognized
as the corresponding lane-reducing patterns.

  for (i) {
    sum += d0[i] * d1[i] + abs(s0[i] - s1[i]);
  }

This case might be too elaborately crafted to be very common in reality.
However, we do find seemingly different but essentially similar code patterns
in some AI applications, which use matrix-vector operations extensively; some
usages are just a single loop reduction composed of multiple dot-products.
A code snippet from ggml:

  for (int j = 0; j < qk/2; ++j) {
    const uint8_t xh_0 = ((qh >> (j +  0)) << 4) & 0x10;
    const uint8_t xh_1 = ((qh >> (j + 12))     ) & 0x10;

    const int32_t x0 = (x[i].qs[j] & 0xF) | xh_0;
    const int32_t x1 = (x[i].qs[j] >>  4) | xh_1;

    sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
  }

At the source level, this appears to be a natural and minor scaling-up of the
simple single lane-reducing pattern, but it is beyond the capability of the
current vectorization pattern recognition, and needs some kind of generic
extension to the framework.

2. Reasoning on validity of transform

First of all, we should tell what kind of expression is appropriate for the
lane-reducing transform. Given a loop, we use the language of mathematics to
define an abstract function f(x, i), whose first independent variable "x"
denotes a value that will participate in sum-based loop reduction either
directly or indirectly, and whose second variable "i" specifies the index of a
loop iteration, which implies other intra-iteration factors irrelevant to "x".
The function itself represents the value obtained by applying a series of
operations on "x" in the context of the "i"th loop iteration; this value is
directly accumulated into the loop reduction result. For the purpose of
vectorization, it is implicitly assumed that f(x, i) is a pure function, and
free of loop dependencies.

Additionally, for a value "x" defined in the loop, let "X" be the vector
<x0, x1, ..., xM>, consisting of the "x" values in all iterations; to be
specific, "X[i]" corresponds to "x" at iteration "i", or "xi". With sequential
execution order, a loop reduction regarding f(x, i) would be expanded to:

  sum += f(x0, 0);
  sum += f(x1, 1);
  ...
  sum += f(xM, M);

2.1 Lane-reducing vs. Lane-combining

Following the lane-reducing semantics, we introduce a new, similar
lane-combining operation that also manipulates a subset of lanes/elements in a
vector, by accumulating all of them into one of them, while at the same time
clearing the rest of the lanes to zero.
The two operations are equivalent in essence, while a major difference is that
the lane-combining operation does not reduce the number of lanes of the
vector. One advantage of this is that codegen of a lane-combining operation
can seamlessly inter-operate with that of normal (non-lane-reducing) vector
operations. Any lane-combining operation can be synthesized by a sequence of
the most basic two-lane operations, which become the focus of our analysis.
Given two lanes "i" and "j", and letting X' = lane-combine(X, i, j), we have:

  X  = <..., xi     , ..., xj, ...>
  X' = <..., xi + xj, ..., 0 , ...>

2.2 Equations for loop reduction invariance

Since the combining strategy of lane-reducing operations is target-specific,
for example, accumulating quad lanes into one (#0 + #1 + #2 + #3 => #0), or
low to high (#0 + #4 => #4), we just make a conservative assumption that
combining could happen on arbitrary two lanes in either order. Under this
precondition, it is legitimate to optimize evaluation of a value "x" with a
lane-reducing pattern, only if the loop reduction always produces an
invariant result no matter w
[RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns
The work for the RFC
(https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html) involves
quite a lot of code change, so I have to separate it into several batches of
patchsets. This and the following patches constitute the first batch.

Since a pattern statement coexists with normal statements in a way that it is
not linked into the function body, we should not invoke utility procedures
that depend on the def/use graph on a pattern statement, such as counting the
uses of a pseudo value defined by a pattern statement. This patch fixes a bug
of this type in vect pattern formation.

Thanks,
Feng
---
gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only
	call single_imm_use if statement is not generated by pattern
	recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4570c25b664..ca8809e7cfd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2700,7 +2700,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !STMT_VINFO_RELATED_STMT (stmt_info)
+      && single_imm_use (lhs, &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
--
2.17.1

From 52e1725339fc7e4552eb7916570790c4ab7f133d Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Fri, 14 Jun 2024 15:49:23 +0800
Subject: [PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

Since a pattern statement coexists with normal statements in a way that it is
not linked into the function body, we should not invoke utility procedures
that depend on the def/use graph on a pattern statement, such as counting the
uses of a pseudo value defined by a pattern statement.

This patch fixes a bug of this type in vect pattern formation.

2024-06-14  Feng Xue

gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only
	call single_imm_use if statement is not generated by pattern
	recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4570c25b664..ca8809e7cfd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2700,7 +2700,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !STMT_VINFO_RELATED_STMT (stmt_info)
+      && single_imm_use (lhs, &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
--
2.17.1
[RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not loop reduction statement
This patch extends the original vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly. The operation itself is not a reduction statement, but its value
would finally be accumulated into the reduction result.

Thanks,
Feng
---
gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow indirect
	lane-reducing operation.
	(vect_transform_reduction): Extend transform for indirect
	lane-reducing operation.
---
 gcc/tree-vect-loop.cc | 48 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..c344158b419 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7520,9 +7520,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));

-  /* TODO: Support lane-reducing operation that does not directly participate
-     in loop reduction. */
-  if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0)
+  if (!reduc_info)
     return false;

   /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not
@@ -7530,7 +7528,16 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   gcc_assert (STMT_VINFO_DEF_TYPE (reduc_info) == vect_reduction_def);
   gcc_assert (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION);

-  for (int i = 0; i < (int) gimple_num_ops (stmt) - 1; i++)
+  int sum_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+  int num_ops = (int) gimple_num_ops (stmt) - 1;
+
+  /* Participate in loop reduction either directly or indirectly.  */
+  if (sum_idx >= 0)
+    gcc_assert (sum_idx == num_ops - 1);
+  else
+    sum_idx = num_ops - 1;
+
+  for (int i = 0; i < num_ops; i++)
     {
       stmt_vec_info def_stmt_info;
       slp_tree slp_op;
@@ -7573,7 +7580,24 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,

   tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);

-  gcc_assert (vectype_in);
+  if (!vectype_in)
+    {
+      enum vect_def_type dt;
+      tree rhs1 = gimple_assign_rhs1 (stmt);
+
+      if (!vect_is_simple_use (rhs1, loop_vinfo, &dt, &vectype_in))
+	return false;
+
+      if (!vectype_in)
+	{
+	  vectype_in = get_vectype_for_scalar_type (loop_vinfo,
+						    TREE_TYPE (rhs1));
+	  if (!vectype_in)
+	    return false;
+	}
+
+      STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
+    }

   /* Compute number of effective vector statements for costing. */
   unsigned int ncopies_for_cost = vect_get_num_copies (loop_vinfo, slp_node,
@@ -8750,9 +8774,17 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   gcc_assert (single_defuse_cycle || lane_reducing);

   if (lane_reducing)
-    {
-      /* The last operand of lane-reducing op is for reduction. */
-      gcc_assert (reduc_index == (int) op.num_ops - 1);
+    {
+      if (reduc_index < 0)
+	{
+	  reduc_index = (int) op.num_ops - 1;
+	  single_defuse_cycle = false;
+	}
+      else
+	{
+	  /* The last operand of lane-reducing op is for reduction. */
+	  gcc_assert (reduc_index == (int) op.num_ops - 1);
+	}
     }

   /* Create the destination vector */
--
2.17.1

From 5e65c65786d9594c172b58a6cd1af50c67efb927 Mon Sep 17 00:00:00 2001
From: Feng Xue
Date: Wed, 24 Apr 2024 16:46:49 +0800
Subject: [PATCH 3/5] vect: Enable lane-reducing operation that is not loop
 reduction statement

This patch extends the original vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly. The operation itself is not a reduction statement, but its value
would finally be accumulated into the reduction result.

2024-04-24  Feng Xue

gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow indirect
	lane-reducing operation.
	(vect_transform_reduction): Extend transform for indirect
	lane-reducing operation.
---
 gcc/tree-vect-loop.cc | 48 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..c344158b419 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7520,9 +7520,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));

-  /* TODO: Support lane-reducing operation that does not directly participate
-     in loop reduction. */
-  if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0)
+  if (!reduc_info)
     return false;

   /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not
@@ -7530,7 +7528,16 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   gcc_assert (ST
[RFC][PATCH 2/5] vect: Introduce loop reduction affine closure to vect pattern recog
For a sum-based loop reduction, its affine closure is composed of statements
whose results and derived computations only end up in the reduction, and are
not used in any non-linear transform operation. This concept underlies the
generalized lane-reducing pattern recognition in the coming patches.

As mathematically proved, it is legitimate to optimize evaluation of a value
with a lane-reducing pattern only if its definition statement is located in
the affine closure. That is to say, the canonicalized representation for the
loop reduction could be of the following affine form, in which "opX" denotes
an operation for a lane-reducing pattern, and h(i) represents the remaining
operations irrelevant to those patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

At initialization, we invoke a preprocessing step to mark all statements in
the affine closure, which eases retrieval of the property during pattern
matching. Since a pattern hit would replace the original statement with new
pattern statements, we resort to a postprocessing step after recognition, to
parse the semantics of the new statements and incrementally update the affine
closure, or to roll back the pattern change if it would break completeness of
the existing closure. Thus, inside the affine closure, the recog framework
could universally handle both lane-reducing and normal patterns. Also with
this patch, we are able to add more complicated logic to enhance lane-reducing
patterns.

Thanks,
Feng
---
gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only
	allow statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index ca8809e7cfd..02f6b942026 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -750,7 +750,6 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
	  gimple_stmt_iterator gsi = gsi_for_stmt (stmt2_info->stmt, def_seq);
	  gsi_insert_before_without_update (&gsi, stmt1, GSI_SAME_STMT);
	}
-      return true;
     }
   else
     {
@@ -783,9 +782,35 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
	  dump_printf_loc (MSG_NOTE, vect_location, "and: %G",
			   (gimple *) new_stmt2);
	}
+    }

-      return true;
+  /* Since this function would change existing conversion statement no matter
+     the pattern is finally applied or not, we should check whether affine
+     closure of loop reduction need to be adjusted for impacted statements. */
+  unsigned int status = stmt2_info->reduc_pattern_status;
+
+  if (status != rpatt_none)
+    {
+      tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (stmt1));
+      tree new_rhs_type = TREE_TYPE (new_rhs);
+
+      /* The new statement generated by splitting is a nature widening
+	 conversion. */
+      gcc_assert (TYPE_PRECISION (rhs_type) < TYPE_PRECISION (new_rhs_type));
+      gcc_assert (TYPE_UNSIGNED (rhs_type) || !TYPE_UNSIGNED (new_rhs_type));
+
+      /* The new statement would not break transform invariance of lane-
+	 reducing operation, if the original conversion depends on the one
+	 formed previously. For the case, it should also be marked with
+	 rpatt_formed status. */
+      if (status & rpatt_formed)
+	vinfo->lookup_stmt (stmt1)->reduc_pattern_status = rpatt_formed;
+
+      if (!is_pattern_stmt_p (stmt2_info))
+	STMT_VINFO_RELATED_STMT (stmt2_info)->reduc_pattern_status = status;
     }
+
+  return true;
 }

 /* Look for the following pattern
@@ -890,7 +915,10 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
     return wide_int_to_tree (type, wi::to_widest (unprom->op));

   tree input = unprom->op;
-  if (unprom->caster)
+
+  /* We should not reuse conversion, if it is just the statement under pattern
+     recognition. */
+  if (unprom->caster && unprom->cast
[RFC][PATCH 4/5] vect: Extend lane-reducing patterns to non-loop-reduction statement
Previously, only the simple lane-reducing case is supported, in which one loop
reduction statement forms one pattern match:

  char *d0, *d1, *s0, *s1, *w;

  for (i) {
    sum += d0[i] * d1[i];         // sum = DOT_PROD(d0, d1, sum);
    sum += abs(s0[i] - s1[i]);    // sum = SAD(s0, s1, sum);
    sum += w[i];                  // sum = WIDEN_SUM(w, sum);
  }

This patch removes the limitation of the current lane-reducing matching
strategy, and extends the candidate scope to the whole loop reduction affine
closure. Thus, we could optimize the reduction with as many lane-reducing
operations as possible, which ends up with the generalized pattern recognition
("opX" denotes an operation for a lane-reducing pattern):

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

A lane-reducing operation contains two aspects: the main primitive operation
and the appendant result-accumulation. The original design handles matching of
the compound semantics in a single pattern, but that approach is not suitable
for an operation that does not directly participate in loop reduction. In this
patch, we only focus on the basic aspect, and leave another patch to cover the
rest. An example with dot-product:

  sum = DOT_PROD(d0, d1, sum);       // original
  sum = DOT_PROD(d0, d1, 0) + sum;   // now

Thanks,
Feng
---
gcc/
	* tree-vect-patterns (vect_reassociating_reduction_p): Remove the
	function.
	(vect_recog_dot_prod_pattern): Relax check to allow any statement
	in reduction affine closure.
	(vect_recog_sad_pattern): Likewise.
	(vect_recog_widen_sum_pattern): Likewise. And use dot-product if
	widen-sum is not supported.
	(vect_vect_recog_func_ptrs): Move lane-reducing patterns to the
	topmost.
gcc/testsuite/ * gcc.dg/vect/vect-reduc-affine-1.c * gcc.dg/vect/vect-reduc-affine-2.c * gcc.dg/vect/vect-reduc-affine-slp-1.c --- .../gcc.dg/vect/vect-reduc-affine-1.c | 112 ++ .../gcc.dg/vect/vect-reduc-affine-2.c | 81 + .../gcc.dg/vect/vect-reduc-affine-slp-1.c | 74 gcc/tree-vect-patterns.cc | 321 ++ 4 files changed, 372 insertions(+), 216 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c new file mode 100644 index 000..a5e99ce703b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c @@ -0,0 +1,112 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#define N 50 + +#define FN(name, S1, S2) \ +S1 int __attribute__ ((noipa)) \ +name (S1 int res, \ + S2 char *restrict a, \ + S2 char *restrict b, \ + S2 int *restrict c, \ + S2 int cst1, \ + S2 int cst2, \ + int shift) \ +{ \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i] + 16; \ + \ + asm volatile ("" ::: "memory"); \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i] + cst1; \ + \ + asm volatile ("" ::: "memory"); \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i] + c[i]; \ + \ + asm volatile ("" ::: "memory"); \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i] * 23; \ + \ + asm volatile ("" ::: "memory"); \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i] << 6; \ + \ + asm volatile ("" ::: "memory"); \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i] * cst2; \ + \ + asm volatile ("" ::: 
"memory"); \ + for
[RFC][PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation
This patch adds a pattern to fold a summation into the last operand of a
lane-reducing operation when appropriate, which is a supplement to those
operation-specific patterns for dot-prod/sad/widen-sum.

  sum = lane-reducing-op(..., 0) + value;
=>
  sum = lane-reducing-op(..., value);

Thanks,
Feng
---
gcc/
	* tree-vect-patterns (vect_recog_lane_reducing_accum_pattern): New
	pattern function.
	(vect_vect_recog_func_ptrs): Add the new pattern function.
	* params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-accum-pattern.c
---
 gcc/params.opt                                |   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c    |  61 ++++++++++
 gcc/tree-vect-patterns.cc                     | 106 ++++++++++++++++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

diff --git a/gcc/params.opt b/gcc/params.opt
index c17ba17b91b..b94bdc26cbd 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.

+-param=vect-lane-reducing-accum-pattern=
+Common Joined UInteger Var(param_vect_lane_reducing_accum_pattern) Init(2) IntegerRange(0, 2) Param Optimization
+Allow pattern of combining plus into lane reducing operation or not. If value is 2, allow this for all statements, or if 1, only for reduction statement, otherwise, disable it.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(15) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c new file mode 100644 index 000..80a2c4f047e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c @@ -0,0 +1,61 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */ +/* { dg-add-options arm_v8_2a_dotprod_neon } */ + +#include "tree-vect.h" + +#define N 50 + +#define FN(name, S1, S2) \ +S1 int __attribute__ ((noipa)) \ +name (S1 int res, \ + S2 char *restrict a, \ + S2 char *restrict b, \ + S2 char *restrict c, \ + S2 char *restrict d) \ +{ \ + for (int i = 0; i < N; i++) \ +res += a[i] * b[i];\ + \ + asm volatile ("" ::: "memory"); \ + for (int i = 0; i < N; ++i) \ +res += (a[i] * b[i] + c[i] * d[i]) << 3; \ + \ + return res; \ +} + +FN(f1_vec, signed, signed) + +#pragma GCC push_options +#pragma GCC optimize ("O0") +FN(f1_novec, signed, signed) +#pragma GCC pop_options + +#define BASE2 ((signed int) -1 < 0 ? 
-126 : 4) +#define OFFSET 20 + +int +main (void) +{ + check_vect (); + + signed char a[N], b[N]; + signed char c[N], d[N]; + +#pragma GCC novector + for (int i = 0; i < N; ++i) +{ + a[i] = BASE2 + i * 5; + b[i] = BASE2 + OFFSET + i * 4; + c[i] = BASE2 + i * 6; + d[i] = BASE2 + OFFSET + i * 5; +} + + if (f1_vec (0x12345, a, b, c, d) != f1_novec (0x12345, a, b, c, d)) +__builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */ +/* { dg-final { scan-tree-dump "vect_recog_lane_reducing_accum_pattern: detected" "vect" { target { vect_sdot_qi } } } } */ diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index bb037af0b68..9a6b16532e4 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -1490,6 +1490,111 @@ vect_recog_abd_pattern (vec_info *vinfo, return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out); } +/* Function vect_recog_lane_reducing_accum_pattern + + Try to fold a summation into the last operand of lane-reducing operation. + + sum = lane-reducing-op(..., 0) + value; + + A lane-reducing operation contains two aspects: main primitive operation + and appendant result-accumulation. Pattern matching for the basic aspect + is handled in specific pattern for dot-prod/sad/widen-sum respectively. + The function is in charge of the other aspect. + + Input: + + * STMT_VINFO: The stmt from which the pattern se
[PATCH v1] RISC-V: Rearrange the test helper files for vector .SAT_*
From: Pan Li

Rearrange the test helper header files, as well as align the naming
conventions.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: Move to...
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vvv_run.h: ...here.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: Move to...
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vvx_run.h: ...here.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: Move to...
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx_run.h: ...here.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Adjust the
	include file names.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-29.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-30.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-31.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: Ditto.
	* gcc.target/riscv/rvv/au
[PATCH] tree-optimization/58416 - SRA wrt FP type replacements
As in other places, we have to be careful about using FP modes to represent
the underlying bit representation of an object. With x87 floating-point types
there are no load or store instructions that preserve the bit representation,
and XFmode can have padding.

When SRA faces the situation that a field is accessed with multiple effective
types, as happens for example with unions, it generally chooses an integer
type if one is available. But in the case in the PR there is only an aggregate
type or a floating-point type, and we end up choosing the register type. SRA
deals with similar situations for bit-precision integer types and adjusts the
replacement type to one covering the size of the object. The following patch
makes sure we do the same when the replacement has float mode and there were
possibly two ways the object was accessed.

I've chosen to use bitwise_type_for_mode in this case, as done for example by
memcpy folding, to avoid creating an unsigned:96 replacement type on i?86
where sizeof(long double) is 12. This means we can fail to find an integer
type for a replacement, which slightly complicates the patch, and it causes
the testcase to no longer be SRAed on i?86.

Bootstrapped on x86_64-unknown-linux-gnu; there is some fallout in the
testsuite I need to compare to a clean run.

Comments welcome.

Richard.

	PR tree-optimization/58416
	* tree-sra.cc (analyze_access_subtree): For FP mode replacements
	with multiple access paths use a bitwise type instead or fail if
	not available.

	* gcc.dg/torture/pr58416.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr58416.c | 32 
 gcc/tree-sra.cc                        | 72 ++
 2 files changed, 83 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr58416.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr58416.c b/gcc/testsuite/gcc.dg/torture/pr58416.c
new file mode 100644
index 000..0922b0e7089
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr58416.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+struct s {
+  char s[sizeof(long double)];
+};
+
+union u {
+  long double d;
+  struct s s;
+};
+
+int main()
+{
+  union u x = {0};
+#if __SIZEOF_LONG_DOUBLE__ == 16
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 12
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 8
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 4
+  x.s = (struct s){""};
+#endif
+
+  union u y = x;
+
+  for (unsigned char *p = (unsigned char *)&y + sizeof y;
+       p-- > (unsigned char *)&y;)
+    if (*p != (unsigned char)'x')
+      __builtin_abort ();
+  return 0;
+}

diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 8040b0c5645..bc9a7b3ee04 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -2868,40 +2868,70 @@ analyze_access_subtree (struct access *root, struct access *parent,
   /* Always create access replacements that cover the whole access.
      For integral types this means the precision has to match.
      Avoid assumptions based on the integral type kind, too.  */
-  if (INTEGRAL_TYPE_P (root->type)
-      && ((TREE_CODE (root->type) != INTEGER_TYPE
-	   && TREE_CODE (root->type) != BITINT_TYPE)
-	  || TYPE_PRECISION (root->type) != root->size)
-      /* But leave bitfield accesses alone.  */
-      && (TREE_CODE (root->expr) != COMPONENT_REF
-	  || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1))))
+  if ((INTEGRAL_TYPE_P (root->type)
+       && ((TREE_CODE (root->type) != INTEGER_TYPE
+	    && TREE_CODE (root->type) != BITINT_TYPE)
+	   || TYPE_PRECISION (root->type) != root->size)
+       /* But leave bitfield accesses alone.  */
+       && (TREE_CODE (root->expr) != COMPONENT_REF
+	   || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1))))
+      /* Avoid a floating-point replacement when there's multiple
+	 ways this field is accessed.  On some targets this can
+	 cause correctness issues, see PR58416.  */
+      || (FLOAT_MODE_P (TYPE_MODE (root->type))
+	  && !root->grp_same_access_path))
     {
       tree rt = root->type;
       gcc_assert ((root->offset % BITS_PER_UNIT) == 0
		   && (root->size % BITS_PER_UNIT) == 0);
       if (TREE_CODE (root->type) == BITINT_TYPE)
	 root->type = build_bitint_type (root->size, TYPE_UNSIGNED (rt));
+      else if (FLOAT_MODE_P (TYPE_MODE (root->type)))
+	{
+	  tree bt = bitwise_type_for_mode (TYPE_MODE (root->type));
+	  if (!bt)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Failed to change the type of a "
+			   "replacement for ");
+		  print_generic_expr (dump_file, root->base);
+		  fprintf (dump_file, " offset: %u, size: %u ",
+
[PATCH][RFC] tree-optimization/114659 - VN and FP to int punning
The following addresses another case where x87 FP loads mangle the bit representation and thus are not suitable as a representative for accesses in other types. VN was value-numbering a later integer load of 'x' the same as a former float load of 'x'. The following disables this when the result is not a known constant.

This now regresses gcc.dg/tree-ssa/ssa-fre-7.c, but for x87 float the optimization might elide an FP load/store "noop" move that isn't a noop on x87, and thus the desired transform is invalid. Nevertheless it's bad to pessimize all targets for this. I was wondering if it's possible to key this on reg_raw_mode[] but that needs a hard register number (and suspiciously the array has no DFmode or SFmode on x86_64 but only XFmode). So would this need a new target hook? Should this use some other mechanism to query for the correctness of performing the load in another mode and then punning to the destination mode?

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	PR tree-optimization/114659
	* tree-ssa-sccvn.cc (visit_reference_op_load): Do not pun from
	a scalar floating point mode load to a different type unless we
	can do so by constant folding.

	* gcc.target/i386/pr114659.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr114659.c | 62 
 gcc/tree-ssa-sccvn.cc                    |  7 +++
 2 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114659.c

diff --git a/gcc/testsuite/gcc.target/i386/pr114659.c b/gcc/testsuite/gcc.target/i386/pr114659.c
new file mode 100644
index 000..e1e24d55687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114659.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+my_totalorderf (float const *x, float const *y)
+{
+  int xs = __builtin_signbit (*x);
+  int ys = __builtin_signbit (*y);
+  if (!xs != !ys)
+    return xs;
+
+  int xn = __builtin_isnan (*x);
+  int yn = __builtin_isnan (*y);
+  if (!xn != !yn)
+    return !xn == !xs;
+  if (!xn)
+    return *x <= *y;
+
+  unsigned int extended_sign = -!!xs;
+  union { unsigned int i; float f; } xu = {0}, yu = {0};
+  __builtin_memcpy (&xu.f, x, sizeof (float));
+  __builtin_memcpy (&yu.f, y, sizeof (float));
+  return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
+
+static float
+positive_NaNf ()
+{
+  float volatile nan = 0.0f / 0.0f;
+  return (__builtin_signbit (nan) ? - nan : nan);
+}
+
+typedef union { float value; unsigned int word[1]; } memory_float;
+
+static memory_float
+construct_memory_SNaNf (float quiet_value)
+{
+  memory_float m;
+  m.value = quiet_value;
+  m.word[0] ^= (unsigned int) 1 << 22;
+  m.word[0] |= (unsigned int) 1;
+  return m;
+}
+
+memory_float x[7] =
+  {
+    { 0 },
+    { 1e-5 },
+    { 1 },
+    { 1e37 },
+    { 1.0f / 0.0f },
+  };
+
+int
+main ()
+{
+  x[5] = construct_memory_SNaNf (positive_NaNf ());
+  x[6] = (memory_float) { positive_NaNf () };
+  if (! my_totalorderf (&x[5].value, &x[6].value))
+    __builtin_abort ();
+  return 0;
+}

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 0139f1b4e30..62f3de11b56 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5825,6 +5825,13 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
	result = NULL_TREE;
       else if (CONSTANT_CLASS_P (result))
	result = const_unop (VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
+      /* Do not treat a float-mode load as preserving the bit
+	 representation.  See PR114659, for x87 FP modes there
+	 is no load instruction that does not at least turn sNaNs
+	 into qNaNs.  But allow the case of a constant FP value we can
+	 fold above.  */
+      else if (SCALAR_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (result))))
+	result = NULL_TREE;
       else
	{
	  /* We will be setting the value number of lhs to the value number
-- 
2.43.0
Re: [PATCH] LoongArch: Implement scalar isinf, isnormal, and isfinite via fclass
On Mon, 2024-07-15 at 15:53 +0800, Lulu Cheng wrote: > Hi, > > g++.dg/opt/pr107569.C and range-sincos.c vrp-float-abs-1.c is the same > issue, right? > > And I have no objection to code modifications. But I think it's better > to wait until this builtin > > function is fixed. Oops https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html won't be enough for pr107569.C. For pr107569.C I guess we need to add range ops for __builtin_isfinite but the patch only handles __builtin_isinf. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] gcc: stop adding -fno-common for checking builds
Richard Biener writes: >> Am 20.07.2024 um 02:31 schrieb Andrew Pinski : >> >> On Fri, Jul 19, 2024 at 5:23 PM Sam James wrote: >>> >>> Originally added in r0-44646-g204250d2fcd084 and r0-44627-gfd350d241fecf6 >>> whic >>> moved -fno-common from all builds to just checking builds. >>> >>> Since r10-4867-g6271dd984d7f92, GCC defaults to -fno-common. There's no need >>> to pass it specially for checking builds. >>> >>> We could keep it for older bootstrap compilers with checking but I don't see >>> much value in that, it was already just a bonus before. >> >> Considering -fno-common has almost no effect on C++ code, removing it >> fully is a decent thing to do. >> It was added back when GCC was written in C and then never removed >> when GCC started to build as C++. > > Ok Thank you! Arsen has kindly pushed for me. > > Richard > >> Thanks, >> Andrew Pinski >> >>> >>> gcc/ChangeLog: >>>* Makefile.in (NOCOMMON_FLAG): Delete. >>>(GCC_WARN_CFLAGS): Drop NOCOMMON_FLAG. >>>(GCC_WARN_CXXFLAGS): Drop NOCOMMON_FLAG. >>>* configure.ac: Ditto. >>>* configure: Regenerate. >>> >>> gcc/d/ChangeLog: >>>* Make-lang.in (WARN_DFLAGS): Drop NOCOMMON_FLAG. >>> --- >>> This came out of a discussion with pinskia last year but I punted it >>> until stage1. Been running with it since then. >>> >>> gcc/Makefile.in| 8 ++-- >>> gcc/configure | 8 ++-- >>> gcc/configure.ac | 3 --- >>> gcc/d/Make-lang.in | 2 +- >>> 4 files changed, 5 insertions(+), 16 deletions(-) >>> >>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in >>> index f4bb4a88cf31..4fc86ed7938b 100644 >>> --- a/gcc/Makefile.in >>> +++ b/gcc/Makefile.in >>> @@ -185,10 +185,6 @@ C_LOOSE_WARN = @c_loose_warn@ >>> STRICT_WARN = @strict_warn@ >>> C_STRICT_WARN = @c_strict_warn@ >>> >>> -# This is set by --enable-checking. The idea is to catch forgotten >>> -# "extern" tags in header files. 
>>> -NOCOMMON_FLAG = @nocommon_flag@ >>> - >>> NOEXCEPTION_FLAGS = @noexception_flags@ >>> >>> ALIASING_FLAGS = @aliasing_flags@ >>> @@ -215,8 +211,8 @@ VALGRIND_DRIVER_DEFINES = @valgrind_path_defines@ >>> .-warn = $(STRICT_WARN) >>> build-warn = $(STRICT_WARN) >>> rtl-ssa-warn = $(STRICT_WARN) >>> -GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if >>> $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) >>> $(NOCOMMON_FLAG) $($@-warn) >>> -GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn) >>> +GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if >>> $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) $($@-warn) >>> +GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $($@-warn) >>> >>> # 1 2 3 ... >>> one_to__0:=1 2 3 4 5 6 7 8 9 >>> diff --git a/gcc/configure b/gcc/configure >>> index 4faae0fa5fb8..01acca7fb5cc 100755 >>> --- a/gcc/configure >>> +++ b/gcc/configure >>> @@ -862,7 +862,6 @@ valgrind_command >>> valgrind_path_defines >>> valgrind_path >>> TREECHECKING >>> -nocommon_flag >>> noexception_flags >>> warn_cxxflags >>> warn_cflags >>> @@ -7605,17 +7604,14 @@ do >>> done >>> IFS="$ac_save_IFS" >>> >>> -nocommon_flag="" >>> if test x$ac_checking != x ; then >>> >>> $as_echo "#define CHECKING_P 1" >>confdefs.h >>> >>> - nocommon_flag=-fno-common >>> else >>> $as_echo "#define CHECKING_P 0" >>confdefs.h >>> >>> fi >>> - >>> if test x$ac_extra_checking != x ; then >>> >>> $as_echo "#define ENABLE_EXTRA_CHECKING 1" >>confdefs.h >>> @@ -21410,7 +21406,7 @@ else >>> lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >>> lt_status=$lt_dlunknown >>> cat > conftest.$ac_ext <<_LT_EOF >>> -#line 21413 "configure" >>> +#line 21409 "configure" >>> #include "confdefs.h" >>> >>> #if HAVE_DLFCN_H >>> @@ -21516,7 +21512,7 @@ else >>> lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >>> lt_status=$lt_dlunknown >>> cat > conftest.$ac_ext <<_LT_EOF >>> -#line 21519 "configure" >>> +#line 21515 
"configure" >>> #include "confdefs.h" >>> >>> #if HAVE_DLFCN_H >>> diff --git a/gcc/configure.ac b/gcc/configure.ac >>> index 3da1eaa70646..3f20c107b6aa 100644 >>> --- a/gcc/configure.ac >>> +++ b/gcc/configure.ac >>> @@ -697,16 +697,13 @@ do >>> done >>> IFS="$ac_save_IFS" >>> >>> -nocommon_flag="" >>> if test x$ac_checking != x ; then >>> AC_DEFINE(CHECKING_P, 1, >>> [Define to 0/1 if you want more run-time sanity checks. This one gets a >>> grab >>> bag of miscellaneous but relatively cheap checks.]) >>> - nocommon_flag=-fno-common >>> else >>> AC_DEFINE(CHECKING_P, 0) >>> fi >>> -AC_SUBST(nocommon_flag) >>> if test x$ac_extra_checking != x ; then >>> AC_DEFINE(ENABLE_EXTRA_CHECKING, 1, >>> [Define to 0/1 if you want extra run-time checking that might affect code >>> diff --git a/gcc/d/Make-lang.in b/gcc/d/Make-lang.in >>> index eaea6e039cf7..077668faae64 100644 >>> --- a/gcc/d/Make-lang.in >>> +++ b/gcc/d/Make-lang.in >>> @@ -55,7 +55,7 @@ CHECKING_DFLAGS = -frelease >>> else >>> CHECKING_DFLAGS = >>> endif
[committed] [PR rtl-optimization/115877] Fix livein computation for ext-dce
So I'm not yet sure how I'm going to break everything down, but this is easy enough to break out as 1/N of the ext-dce fixes/improvements.

When handling uses in an insn, we first determine what bits are set in the destination, which is represented in DST_MASK. Then we use that to refine what bits are live in the source operands. In the source operand handling section we *modify* DST_MASK if the source operand is a SUBREG (ugh!). So if the first operand is a SUBREG, then we can incorrectly compute which bit groups are live in the second operand, especially if it is a SUBREG as well.

This was seen when testing a larger set of patches on the rl78 port (builtin-arith-overflow-p-7 & pr71631 execution failures), so no new test for this bugfix.

Run through my tester (in conjunction with other ext-dce changes) on the various cross targets. Run individually through a bootstrap and regression test cycle on x86_64 as well.

Pushing to the trunk.

jeff

	PR rtl-optimization/115877
gcc/
	* ext-dce.cc (ext_dce_process_uses): Restore the value of DST_MASK
	for each operand.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 6d4b8858ec6..c4c38659701 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -591,8 +678,10 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap live_tmp)
	     making things live.  Breaking from this loop will cause the
	     iterator to work on sub-rtxs, so it is safe to break if we
	     see something we don't know how to handle.  */
+	  unsigned HOST_WIDE_INT save_mask = dst_mask;
	  for (;;)
	    {
+	      dst_mask = save_mask;
	      /* Strip an outer paradoxical subreg.  The bits outside
		 the inner mode are don't cares.  So we can just strip
		 and process the inner object.  */
[committed][PR rtl-optimization/115877][2/n] Improve liveness computation for constant initialization
While debugging pr115877, I noticed we were failing to remove the destination register from the LIVENOW bitmap when it was set to a constant value, i.e. (set (dest) (const_int)). This was a trivial oversight in safe_for_live_propagation.

I don't have an example of this affecting code generation, but it certainly could. More importantly, by making LIVENOW more accurate it's easier to debug when LIVENOW differs from expectations.

As with the prior patch this has been tested as part of a larger patchset with the crosses as well as individually on x86_64. Pushing to the trunk,

Jeff

	PR rtl-optimization/115877
gcc/
	* ext-dce.cc (safe_for_live_propagation): Handle RTX_CONST_OBJ.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 6d4b8858ec6..cbecfc53dba 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -69,6 +69,7 @@ safe_for_live_propagation (rtx_code code)
   switch (GET_RTX_CLASS (code))
     {
     case RTX_OBJ:
+    case RTX_CONST_OBJ:
       return true;

     case RTX_COMPARE:
Re: [PATCH] testsuite: fix pr115929-1.c with -Wformat-security
Xi Ruoyao writes: > On Sat, 2024-07-20 at 06:52 +0100, Sam James wrote: >> Some distributions like Gentoo make -Wformat and -Wformat-security >> enabled by default. Pass -Wno-format to the test to avoid a spurious >> fail in such environments. >> >> gcc/testsuite/ >> PR rtl-optimization/115929 >> * gcc.dg/torture/pr115929-1.c: Pass -Wno-format. >> --- > > IMO if you are patching GCC downstream to enable some options, you can > patch the test case in the same .patch file anyway instead of pushing it > upstream. > > If we take the responsibility to make the test suite anticipate random > downstream changes, the test suite will ended up filled with different > workarounds for 42 distros. Yeah, I'm worried about that too. > If we have to anticipate downstream changes we should make a policy > about which changes we must anticipate (hmm and if we'll anticipate - > Wformat by default why not add a configuration option for it by the > way?), or do it in a more generic way (using a .spec file to explicitly > give the "baseline" options for testing?) Two systematic ways of dealing with this under the current testsuite framework would be: (1) Make dg-torture.exp add -w by default. This is what gcc.c-torture already does. Then, tests that want to test for warnings can enable them explicitly. Some of the existing dg-warnings are already due to lack of -w, rather than something that the test was originally designed for. E.g. pr26565.c. (2) Make dg-torture.exp add -Wall -Wextra by default, so that tests have to suppress any warnings they don't want. Personally, I'd prefer one of those two rather than patching upstream tests for downstream changes. Thanks, Richard
Re: [PATCH] testsuite: fix pr115929-1.c with -Wformat-security
Richard Sandiford writes: > Xi Ruoyao writes: >> On Sat, 2024-07-20 at 06:52 +0100, Sam James wrote: >>> Some distributions like Gentoo make -Wformat and -Wformat-security >>> enabled by default. Pass -Wno-format to the test to avoid a spurious >>> fail in such environments. >>> >>> gcc/testsuite/ >>> PR rtl-optimization/115929 >>> * gcc.dg/torture/pr115929-1.c: Pass -Wno-format. >>> --- >> >> IMO if you are patching GCC downstream to enable some options, you can >> patch the test case in the same .patch file anyway instead of pushing it >> upstream. >> >> If we take the responsibility to make the test suite anticipate random >> downstream changes, the test suite will ended up filled with different >> workarounds for 42 distros. > > Yeah, I'm worried about that too. > >> If we have to anticipate downstream changes we should make a policy >> about which changes we must anticipate (hmm and if we'll anticipate - >> Wformat by default why not add a configuration option for it by the >> way?), or do it in a more generic way (using a .spec file to explicitly >> give the "baseline" options for testing?) > > Two systematic ways of dealing with this under the current testsuite > framework would be: > > (1) Make dg-torture.exp add -w by default. This is what gcc.c-torture > already does. Then, tests that want to test for warnings can > enable them explicitly. > > Some of the existing dg-warnings are already due to lack of -w, > rather than something that the test was originally designed for. > E.g. pr26565.c. > > (2) Make dg-torture.exp add -Wall -Wextra by default, so that tests > have to suppress any warnings they don't want. > > Personally, I'd prefer one of those two rather than patching upstream > tests for downstream changes. I don't mind doing the work once we have consensus. (1) feels more pure but (2) is more progressive and lets us make things error out by default in future upstream with a bit more freedom. 
In the meantime, I'll return to other testsuite bits I have in mind. thanks, sam
[PATCH] doc: document all.cross and *.encap make targets
Information was taken from gcc/Makefile.in.
---
 gcc/doc/sourcebuild.texi | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 66c4206bfc2..455836a583d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -712,9 +712,12 @@ standard rule in @file{gcc/Makefile.in} to the variable
 @table @code
 @item all.cross
-@itemx start.encap
-@itemx rest.encap
-FIXME: exactly what goes in each of these targets?
+This is what to compile if making a cross-compiler.
+@item start.encap
+Build what must be done before installing GCC and converting libraries.
+@item rest.encap
+Build what must be done before installing GCC and converting libraries
+that cannot be done in @code{start.encap}.
 @item tags
 Build an @command{etags} @file{TAGS} file in the language subdirectory
 in the source tree.
-- 
2.44.2
Re: [PATCH] testsuite: powerpc: fix dg-do run typo
Hi Sam, on 2024/7/20 07:10, Sam James wrote: > "Kewen.Lin" writes: > >> Hi Sam, > > Hi Kewen, > >> >> on 2024/7/19 11:28, Sam James wrote: >>> 'dg-run' is not a valid dejagnu directive, 'dg-do run' is needed here >>> for the test to be executed. >>> >>> 2024-07-18 Sam James >>> >>> PR target/108699 >>> * gcc.target/powerpc/pr108699.c: Fix 'dg-run' typo. >>> --- >>> Kewen, could you check this on powerpc to ensure it doesn't execute >>> beforehand >>> and now it does? I could do it on powerpc but I don't have anything setup >>> right now. >> >> Oops, thanks for catching and fixing this stupid typo! Yes, I just >> confirmed that, >> w/ this fix pr108699.exe gets generated and executed (# of expected passes >> is changed >> from 1 to 2). > > Many thanks! Could you push for me please? Sure, pushed as r15-2190. BR, Kewen > >> >> BR, >> Kewen > > best, > sam > >> >>> >>> gcc/testsuite/gcc.target/powerpc/pr108699.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108699.c >>> b/gcc/testsuite/gcc.target/powerpc/pr108699.c >>> index f02bac130cc7..beb8b601fd51 100644 >>> --- a/gcc/testsuite/gcc.target/powerpc/pr108699.c >>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108699.c >>> @@ -1,4 +1,4 @@ >>> -/* { dg-run } */ >>> +/* { dg-do run } */ >>> /* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ >>> >>> #define N 16 >>>
[PATCHv2, expand] Add const0 move checking for CLEAR_BY_PIECES optabs
Hi,
This patch adds const0 move checking for CLEAR_BY_PIECES. The original vec_duplicate check handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it could still do clear by pieces if it supports a const0 move in that mode.

Compared to the previous version, the main change is to do a direct const0 move for a by-pieces clear if the target supports const0 moves in that mode.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643063.html

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. There are several regressions on aarch64. They could be fixed by enhancing the const0 move on V2x8QImode. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it could still do clear by pieces if it supports a const0 move. This patch adds the checking.

gcc/
	* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
	for CLEAR_BY_PIECES.
	(op_by_pieces_d::run): Pass const0 to do the move if the target
	supports direct const0 moves in the mode.
patch.diff

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fc5e998e329..97764eb9ebe 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;

+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;
@@ -1490,7 +1497,7 @@ op_by_pieces_d::run ()
   do
     {
       unsigned int size = GET_MODE_SIZE (mode);
-      rtx to1 = NULL_RTX, from1;
+      rtx to1 = NULL_RTX, from1 = NULL_RTX;

       while (length >= size)
	{
@@ -1500,12 +1507,26 @@ op_by_pieces_d::run ()
	  to1 = m_to.adjust (mode, m_offset, &to_prev);
	  to_prev.data = to1;
	  to_prev.mode = mode;
-	  from1 = m_from.adjust (mode, m_offset, &from_prev);
-	  from_prev.data = from1;
-	  from_prev.mode = mode;

	  m_to.maybe_predec (-(HOST_WIDE_INT)size);
-	  m_from.maybe_predec (-(HOST_WIDE_INT)size);
+
+	  /* Pass CONST0_RTX for memory clear when target supports CONST0
+	     direct move.  */
+	  if (m_op == CLEAR_BY_PIECES
+	      && VECTOR_MODE_P (mode)
+	      && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+	    {
+	      enum insn_code icode = optab_handler (mov_optab, mode);
+	      if (insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+		from1 = CONST0_RTX (mode);
+	    }
+	  else
+	    {
+	      from1 = m_from.adjust (mode, m_offset, &from_prev);
+	      from_prev.data = from1;
+	      from_prev.mode = mode;
+	      m_from.maybe_predec (-(HOST_WIDE_INT)size);
+	    }

	  generate (to1, from1, mode);
Ping [PATCH-1v4] Value Range: Add range op for builtin isinf
Hi,
Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html

Thanks
Gui Haochen

On 2024/7/11 15:32, HAO CHEN GUI wrote:
> Hi,
>   The builtin isinf is not folded at front end if the corresponding optab
> exists. It causes the range evaluation to fail on targets which have
> optab_isinf. For instance, range-sincos.c will fail on targets which
> have optab_isinf as it calls builtin_isinf.
>
>   This patch fixed the problem by adding a range op for builtin isinf. It
> also fixed the issue in PR114678.
>
>   Compared with the previous version, the main change is to remove xfail for
> s390 in range-sincos.c and vrp-float-abs-1.c.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> Value Range: Add range op for builtin isinf
>
> The builtin isinf is not folded at front end if the corresponding optab
> exists. So the range op for isinf is needed for value range analysis.
> This patch adds a range op for builtin isinf.
>
> gcc/
>	PR target/114678
>	* gimple-range-op.cc (class cfn_isinf): New.
>	(op_cfn_isinf): New variable.
>	(gimple_range_op_handler::maybe_builtin_call): Handle
>	CASE_FLT_FN (BUILT_IN_ISINF).
>
> gcc/testsuite/
>	PR target/114678
>	* gcc.dg/tree-ssa/range-isinf.c: New test.
>	* gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390.
>	* gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise.
> > patch.diff > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc > index a80b93cf063..24559951dd6 100644 > --- a/gcc/gimple-range-op.cc > +++ b/gcc/gimple-range-op.cc > @@ -1153,6 +1153,63 @@ private: >bool m_is_pos; > } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true); > > +// Implement range operator for CFN_BUILT_IN_ISINF > +class cfn_isinf : public range_operator > +{ > +public: > + using range_operator::fold_range; > + using range_operator::op1_range; > + virtual bool fold_range (irange &r, tree type, const frange &op1, > +const irange &, relation_trio) const override > + { > +if (op1.undefined_p ()) > + return false; > + > +if (op1.known_isinf ()) > + { > + wide_int one = wi::one (TYPE_PRECISION (type)); > + r.set (type, one, one); > + return true; > + } > + > +if (op1.known_isnan () > + || (!real_isinf (&op1.lower_bound ()) > + && !real_isinf (&op1.upper_bound ( > + { > + r.set_zero (type); > + return true; > + } > + > +r.set_varying (type); > +return true; > + } > + virtual bool op1_range (frange &r, tree type, const irange &lhs, > + const frange &, relation_trio) const override > + { > +if (lhs.undefined_p ()) > + return false; > + > +if (lhs.zero_p ()) > + { > + nan_state nan (true); > + r.set (type, real_min_representable (type), > +real_max_representable (type), nan); > + return true; > + } > + > +if (!range_includes_zero_p (lhs)) > + { > + // The range is [-INF,-INF][+INF,+INF], but it can't be represented. 
> + // Set range to [-INF,+INF] > + r.set_varying (type); > + r.clear_nan (); > + return true; > + } > + > +r.set_varying (type); > +return true; > + } > +} op_cfn_isinf; > > // Implement range operator for CFN_BUILT_IN_ > class cfn_parity : public range_operator > @@ -1246,6 +1303,11 @@ gimple_range_op_handler::maybe_builtin_call () >m_operator = &op_cfn_signbit; >break; > > +CASE_FLT_FN (BUILT_IN_ISINF): > + m_op1 = gimple_call_arg (call, 0); > + m_operator = &op_cfn_isinf; > + break; > + > CASE_CFN_COPYSIGN_ALL: >m_op1 = gimple_call_arg (call, 0); >m_op2 = gimple_call_arg (call, 1); > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c > b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c > new file mode 100644 > index 000..468f1bcf5c7 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c > @@ -0,0 +1,44 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-evrp" } */ > + > +#include > +void link_error(); > + > +void > +test1 (double x) > +{ > + if (x > __DBL_MAX__ && !__builtin_isinf (x)) > +link_error (); > + if (x < -__DBL_MAX__ && !__builtin_isinf (x)) > +link_error (); > +} > + > +void > +test2 (float x) > +{ > + if (x > __FLT_MAX__ && !__builtin_isinf (x)) > +link_error (); > + if (x < -__FLT_MAX__ && !__builtin_isinf (x)) > +link_error (); > +} > + > +void > +test3 (double x) > +{ > + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__) > +link_error (); > + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__) > +link_error (); > +} > + > +void > +test4 (float x) > +{ > + if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
Ping^4 [PATCH-3v2] Value Range: Add range op for builtin isnormal
Hi,
Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html

Thanks
Gui Haochen

On 2024/7/1 9:12, HAO CHEN GUI wrote:
> Hi,
> Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
>
> Thanks
> Gui Haochen
>
> On 2024/6/24 9:41, HAO CHEN GUI wrote:
>> Hi,
>> Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
>>
>> Thanks
>> Gui Haochen
>>
>> On 2024/6/20 14:58, HAO CHEN GUI wrote:
>>> Hi,
>>> Gently ping it.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> On 2024/5/30 10:46, HAO CHEN GUI wrote:
 Hi,
   This patch adds the range op for builtin isnormal. It also adds two
 helper functions in frange to detect the range of normal floating-point
 values and the range of subnormal or zero values.

 Compared to the previous version, the main change is to set the range to 1
 if it's a normal number and otherwise to 0.
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html

 Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
 regressions. Is it OK for the trunk?

 Thanks
 Gui Haochen

 ChangeLog
 Value Range: Add range op for builtin isnormal

 The former patch adds optab for builtin isnormal. Thus builtin isnormal
 might not be folded at front end. So the range op for isnormal is needed
 for value range analysis. This patch adds range op for builtin isnormal.

 gcc/
	* gimple-range-op.cc (class cfn_isnormal): New.
	(op_cfn_isnormal): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CFN_BUILT_IN_ISNORMAL.
	* value-range.h (class frange): Declare known_isnormal and
	known_isdenormal_or_zero.
	(frange::known_isnormal): Define.
	(frange::known_isdenormal_or_zero): Define.

 gcc/testsuite/
	* gcc.dg/tree-ssa/range-isnormal.c: New test.
patch.diff

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 5ec5c828fa4..6787f532f11 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1289,6 +1289,61 @@ public:
   }
 } op_cfn_isfinite;
 
+// Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isnormal ())
+      {
+	wide_int one = wi::one (TYPE_PRECISION (type));
+	r.set (type, one, one);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| op1.known_isinf ()
+	|| op1.known_isdenormal_or_zero ())
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+      return false;
+
+    if (lhs.zero_p ())
+      {
+	r.set_varying (type);
+	return true;
+      }
+
+    if (!range_includes_zero_p (lhs))
+      {
+	nan_state nan (false);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_isfinite;
       break;
 
+    case CFN_BUILT_IN_ISNORMAL:
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isnormal;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
       m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+    link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+    link_error ();
+}
Ping^4 [PATCH-2v4] Value Range: Add range op for builtin isfinite
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html

Thanks
Gui Haochen

On 2024/5/30 10:46, HAO CHEN GUI wrote:
Hi,
  This patch adds the range op for builtin isfinite.

  Compared to the previous version, the main change is to set the range to 1
if it's a finite number, otherwise to 0.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite, so builtin isfinite
might not be folded at the front end and the range op for isfinite is
needed for value range analysis.  This patch adds the range op for builtin
isfinite.

gcc/
	* gimple-range-op.cc (class cfn_isfinite): New.
	(op_cfn_isfinite): New variable.
	(gimple_range_op_handler::maybe_builtin_call): Handle
	CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
	* gcc.dg/tree-ssa/range-isfinite.c: New test.
patch.diff

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 4e60a42eaac..5ec5c828fa4 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1233,6 +1233,62 @@ public:
   }
 } op_cfn_isinf;
 
+// Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+			   const irange &, relation_trio) const override
+  {
+    if (op1.undefined_p ())
+      return false;
+
+    if (op1.known_isfinite ())
+      {
+	wide_int one = wi::one (TYPE_PRECISION (type));
+	r.set (type, one, one);
+	return true;
+      }
+
+    if (op1.known_isnan ()
+	|| op1.known_isinf ())
+      {
+	r.set_zero (type);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+			  const frange &, relation_trio) const override
+  {
+    if (lhs.undefined_p ())
+      return false;
+
+    if (lhs.zero_p ())
+      {
+	// The range is [-INF,-INF][+INF,+INF] NAN, but it can't be
+	// represented.  Set range to varying.
+	r.set_varying (type);
+	return true;
+      }
+
+    if (!range_includes_zero_p (lhs))
+      {
+	nan_state nan (false);
+	r.set (type, real_min_representable (type),
+	       real_max_representable (type), nan);
+	return true;
+      }
+
+    r.set_varying (type);
+    return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call ()
       m_operator = &op_cfn_isinf;
       break;
 
+    case CFN_BUILT_IN_ISFINITE:
+      m_op1 = gimple_call_arg (call, 0);
+      m_operator = &op_cfn_isfinite;
+      break;
+
     CASE_CFN_COPYSIGN_ALL:
       m_op1 = gimple_call_arg (call, 0);
       m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+    link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+    link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+    link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __
[Bug fortran/59104] [15 Regression] Wrong result with SIZE specification expression
After an OK from Harald, commit r15-2187-g838999bb23303edc14e96b6034cd837fa4454cfd

Author: Paul Thomas
Date:   Sun Jul 21 17:48:47 2024 +0100

    Fortran: Fix regression caused by r14-10477 [PR59104]

    2024-07-21  Paul Thomas

    gcc/fortran
    	PR fortran/59104
    	* gfortran.h: Add decl_order to gfc_symbol.
    	* symbol.cc: Add static next_decl_order.
    	(gfc_set_sym_referenced): Set symbol decl_order.
    	* trans-decl.cc: Include dependency.h.
    	(decl_order): Replace symbol declared_at.lb->location with
    	decl_order.

    gcc/testsuite/
    	PR fortran/59104
    	* gfortran.dg/dependent_decls_3.f90: New test.

You are the assignee for the bug.
Re: [PATCH] LoongArch: Implement scalar isinf, isnormal, and isfinite via fclass
On Sun, Jul 21, 2024 at 3:57 AM Xi Ruoyao wrote:
>
> On Mon, 2024-07-15 at 15:53 +0800, Lulu Cheng wrote:
> > Hi,
> >
> > g++.dg/opt/pr107569.C and range-sincos.c vrp-float-abs-1.c is the same
> > issue, right?
> >
> > And I have no objection to code modifications. But I think it's better
> > to wait until this builtin function is fixed.
>
> Oops https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html
> won't be enough for pr107569.C.  For pr107569.C I guess we need to add
> range ops for __builtin_isfinite but the patch only handles
> __builtin_isinf.

There is a patch for that; all 3 were pinged this morning:
isinf: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657879.html
isnormal: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657880.html
isfinite: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657881.html

Thanks,
Andrew Pinski

> --
> Xi Ruoyao
> School of Aerospace Science and Technology, Xidian University
[PATCH] i386: Change prefetchi output template
Hi all,

For prefetchi instructions, a RIP-relative address is explicitly required
for the operand, and the assembler obeys that rule strictly.  This makes an
instruction like:

	prefetchit0	bar

illegal for the assembler, although it should be a common usage for
prefetchi.  Explicitly add (%rip) after the function label to make it legal
in the assembler, so that it can be passed to the linker to get the real
address.

Ok for trunk and backport to GCC 14 and GCC 13, since the prefetchi
instructions were introduced in GCC 13?

Thx,
Haochen

gcc/ChangeLog:

	* config/i386/i386.md (prefetchi): Add explicit (%rip) after
	function label.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/prefetchi-1.c: Check (%rip).
---
 gcc/config/i386/i386.md                     | 2 +-
 gcc/testsuite/gcc.target/i386/prefetchi-1.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 90d3aa450f0..3ec51bad6fe 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -28004,7 +28004,7 @@
   "TARGET_PREFETCHI && TARGET_64BIT"
 {
   static const char * const patterns[2] = {
-    "prefetchit1\t%0", "prefetchit0\t%0"
+    "prefetchit1\t{%p0(%%rip)|%p0[rip]}", "prefetchit0\t{%p0(%%rip)|%p0[rip]}"
   };
 
   int locality = INTVAL (operands[1]);
diff --git a/gcc/testsuite/gcc.target/i386/prefetchi-1.c b/gcc/testsuite/gcc.target/i386/prefetchi-1.c
index 80f25e70e8e..03dfdc55e86 100644
--- a/gcc/testsuite/gcc.target/i386/prefetchi-1.c
+++ b/gcc/testsuite/gcc.target/i386/prefetchi-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-mprefetchi -O2" } */
-/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit0\[ \\t\]+" 2 } } */
-/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit1\[ \\t\]+" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit0\[ \\t\]+bar\\(%rip\\)" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit1\[ \\t\]+bar\\(%rip\\)" 2 } } */
 
 #include
-- 
2.31.1
[PATCH] regrename: Skip renaming register pairs [PR115860]
It is not trivial to decide when a write of a register pair terminates or
starts a new chain.  For example, prior to regrename we have

(insn 91 38 36 5 (set (reg:FPRX2 16 %f0 [orig:76 x ] [76])
        (const_double:FPRX2 0.0 [0x0.0p+0])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1507 {*movfprx2_64}
     (expr_list:REG_EQUAL (const_double:FPRX2 0.0 [0x0.0p+0])
        (nil)))
(insn 36 91 37 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 0)
        (mem/c:DF (plus:DI (reg/f:DI 15 %r15)
                (const_int 160 [0xa0])) [7 %sfp+-32 S8 A64])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
     (nil))
(insn 37 36 43 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 8)
        (mem/c:DF (plus:DI (reg/f:DI 15 %r15)
                (const_int 168 [0xa8])) [7 %sfp+-24 S8 A64])) "float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
     (nil))

where insn 91 writes both registers of a register pair, and it is clear
that an existing chain must be terminated and a new one started.  Insns 36
and 37 write only into one register of the corresponding register pair.
For each such write on its own it is not obvious when to terminate an
existing chain and start a new one.  In other words, once insn 36
materializes and insn 37 hasn't yet, we are in a kind of limbo state.
Tracking this correctly is inherently hard, and I'm not entirely sure
whether optimizations could even lead to more complicated cases where it
is even less clear when a chain terminates and a new one has to be
started.  Therefore, skip renaming of register pairs.

Bootstrapped and regtested on x86_64, aarch64, powerpc64le, and s390.
Ok for mainline?
This fixes on s390:

FAIL: g++.dg/cpp23/ext-floating14.C -std=gnu++23 execution test
FAIL: g++.dg/cpp23/ext-floating14.C -std=gnu++26 execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c -O2 execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -O0 execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -O1 execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -O2 execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -O3 -g execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c -Os execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -O0 execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -O1 execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -O2 execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -O3 -g execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c -Os execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -O0 execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -O1 execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -O2 execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -O3 -g execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c -Os execution test
FAIL: gfortran.dg/pr96711.f90 -O0 execution test
FAIL: TestSignalForwardingExternal
FAIL: go test misc/cgo/testcarchive
FAIL: libffi.closures/nested_struct5.c -W -Wall -Wno-psabi -O2 output pattern test
FAIL: libphobos.phobos/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos/std/conv.d execution test
FAIL: libphobos.phobos/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos/std/variant.d execution test
FAIL: libphobos.phobos_shared/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos_shared/std/conv.d execution test
FAIL: libphobos.phobos_shared/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos_shared/std/variant.d execution test

gcc/ChangeLog:

	PR rtl-optimization/115860
	* regrename.cc (scan_rtx_reg): Do not try to rename register
	pairs.
---
 gcc/regrename.cc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/regrename.cc b/gcc/regrename.cc
index 054e601740b..6ae5a2309d0 100644
---
[PATCH,c++,wwwdocs] bugs: Remove old "export" non-bug
We have been carrying this note on the "original" export feature for ages,
and I believe it's not actually a FAQ, if it ever was.

Jonathan moved this down when adding a note on ADL last fall.  I now
propose to drop it.

Thoughts?

Gerald

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index 40355911..7f2f485c 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -622,17 +622,6 @@ and the scope operator, or compile using C++11 or later.
 Defect report 1104 changed the parser rules so that <:: works as expected.
 
-export
-Most C++ compilers (G++ included) never implemented C++98
-export, which was removed in C++11, and the keyword reused in
-C++20 by the Modules feature. The C++98 feature was intended to support
-separate compilation of template declarations and
-definitions. Without export, a template definition must be in
-scope to be used. The obvious workaround is simply to place all definitions in
-the header itself. Alternatively, the compilation unit containing template
-definitions may be included from the header.
-
-
 Common problems when upgrading the compiler
 ABI changes
Re: [PATCH] Reduce iteration counts of tsvc tests
On Fri, Jul 19, 2024 at 4:25 AM Joern Wolfgang Rennecke wrote:
>
> As discussed before on gcc@gcc.gnu.org, this patch reduces the iteration
> counts of the tsvc tests to avoid timeouts when using simulators.
> A few tests needed special attention because they divided "iterations"
> by some constant, so putting 10 in there would lead to a zero iteration
> count, and thus the to-be-vectorized code removed.  For nine of these
> files, that was a simple adjustment of iterations to 256 (AKA LEN_2D),
> but vect-tsvc-s176.c needed 3200 to avoid a zero outer loop iteration
> count, and then it took too long on a simulator, so I curtailed the inner
> loop unless run_expensive_tests is set; I targeted the inner loop
> because it already had a variable as the loop end bound, and it was just
> a matter of adjusting that variable.
>
> Regression tested in 9846b0916c1a9b9f3e9df4657670ef4419617134 on
> x86_64-pc-linux-gnu (--disable-multilibs) by running
>   make check-gcc 'RUNTESTFLAGS=vect.exp' -j32
> and comparing gcc.sum without and with this patch.

OK.

Richard.