[PATCH] gm2: add missing debug output guard

2024-07-21 Thread Wilken Gottwalt
The Close() procedure in MemStream is missing a guard to prevent it from
printing in non-debug mode.

gcc/gm2:
* gm2-libs-iso/MemStream.mod: Guard debug output.

Signed-off-by: Wilken Gottwalt 
---
 gcc/m2/gm2-libs-iso/MemStream.mod | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/m2/gm2-libs-iso/MemStream.mod b/gcc/m2/gm2-libs-iso/MemStream.mod
index 9620ed2ba19..d3204692540 100644
--- a/gcc/m2/gm2-libs-iso/MemStream.mod
+++ b/gcc/m2/gm2-libs-iso/MemStream.mod
@@ -694,7 +694,10 @@ END handlefree ;
 
 PROCEDURE Close (VAR cid: ChanId) ;
 BEGIN
-   printf ("Close called\n");
+   IF Debugging
+   THEN
+  printf ("Close called\n")
+   END ;
IF IsMem(cid)
THEN
   UnMakeChan(did, cid) ;
-- 
2.45.2



[RFC] Generalize formation of lane-reducing ops in loop reduction

2024-07-21 Thread Feng Xue OS
Hi,

  I composed some patches to generalize lane-reducing (dot-product is a typical
representative) pattern recognition, and prepared an RFC document to help with
review. The original intention was to make a complete solution for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114440.  The work is surely
limited in some ways, so I hope for your comments. Thanks.

-

1. Background

For a loop reduction that accumulates the result of a widening operation, the
preferred pattern is a lane-reducing operation, if supported by the target.
Because this kind of operation need not preserve the intermediate results of
the widening operation, and only produces a reduced number of final results
for accumulation, choosing the pattern can lead to pretty compact codegen.

Three lane-reducing opcodes are defined in gcc, belonging to two kinds of
operations: dot-product (DOT_PROD_EXPR) and sum-of-absolute-difference
(SAD_EXPR). WIDEN_SUM_EXPR can be seen as a degenerate dot-product with a
constant operand of "1". Currently, gcc only supports recognition of the
simple lane-reducing case, in which each accumulation statement of the loop
reduction forms one pattern:

 char  *d0, *d1;
 short *s0, *s1;

 for (i) {
   sum += d0[i] * d1[i];  //  = DOT_PROD 
   sum += abs(s0[i] - s1[i]); //  = SAD 
 }

We could rewrite the example as below using only one statement, whose non-
reduction addend is the sum of the above right-hand sides. As a whole, the
addend would match nothing, while its two sub-expressions could be recognized
as the corresponding lane-reducing patterns.

 for (i) {
   sum += d0[i] * d1[i] + abs(s0[i] - s1[i]);
 }

This case might be too elaborately crafted to be very common in reality.
However, we do find seemingly variant but essentially similar code patterns in
some AI applications, which use matrix-vector operations extensively; some
usages are just a single loop reduction composed of multiple dot-products. A
code snippet from ggml:

 for (int j = 0; j < qk/2; ++j) {
   const uint8_t xh_0 = ((qh >> (j +  0)) << 4) & 0x10;
   const uint8_t xh_1 = ((qh >> (j + 12)) ) & 0x10;

   const int32_t x0 = (x[i].qs[j] & 0xF) | xh_0;
   const int32_t x1 = (x[i].qs[j] >>  4) | xh_1;

   sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
 }

At the source level, it appears to be a natural and minor scaling-up of the
simple one-statement lane-reducing pattern, but it is beyond the capability of
the current vectorization pattern recognition, and needs some kind of generic
extension to the framework.

2. Reasoning on validity of transform

First of all, we should tell what kind of expression is appropriate for the
lane-reducing transform. Given a loop, we use the language of mathematics to
define an abstract function f(x, i), whose first independent variable "x"
denotes a value that will participate in sum-based loop reduction either
directly or indirectly, and whose 2nd variable "i" specifies the index of a
loop iteration, which implies other intra-iteration factors irrelevant to "x".
The function itself represents the value obtained by applying a series of
operations on "x" in the context of the "i"th loop iteration, and this value
is directly accumulated into the loop reduction result. For the purpose of
vectorization, it is implicitly supposed that f(x, i) is a pure function, and
free of loop dependency.

Additionally, for a value "x" defined in the loop, let "X" be the vector
<x0, x1, ..., xM>, consisting of the "x" values in all iterations; to be
specific, "X[i]" corresponds to "x" at iteration "i", or "xi". With sequential
execution order, a loop reduction regarding f(x, i) would be expanded to:

 sum += f(x0, 0);
 sum += f(x1, 1);
 ...
 sum += f(xM, M);

2.1 Lane-reducing vs. Lane-combining

Following lane-reducing semantics, we introduce a new, similar lane-combining
operation that also manipulates a subset of lanes/elements in a vector, by
accumulating all of them into one of them, while clearing the rest of the
lanes to zero. The two operations are equivalent in essence; a major
difference is that the lane-combining operation does not reduce the number of
lanes of the vector. One advantage of this is that codegen of a lane-combining
operation could seamlessly inter-operate with that of normal
(non-lane-reducing) vector operations.

Any lane-combining operation could be synthesized by a sequence of the most
basic two-lane operations, which become the focus of our analysis. Given two
lanes "i" and "j", and letting X' = lane-combine(X, i, j), we have:

  X  = <..., xi,      ..., xj, ...>
  X' = <..., xi + xj, ...,  0, ...>

2.2 Equations for loop reduction invariance

Since the combining strategy of lane-reducing operations is target-specific,
for example accumulating quad lanes into one (#0 + #1 + #2 + #3 => #0), or low
to high (#0 + #4 => #4), we just make a conservative assumption that combining
could happen on arbitrary two lanes in either order. Under this precondition,
it is legitimate to optimize the evaluation of a value "x" with a lane-reducing
pattern, only if the loop reduction always produces an invariant result no matter
w

[RFC][PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

2024-07-21 Thread Feng Xue OS
The work for the RFC
(https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657860.html)
involves quite a lot of code change, so I have to separate it into several
batches of patchsets. This and the following patches constitute the first batch.

Since pattern statements coexist with normal statements in a way that they are
not linked into the function body, we should not invoke utility procedures
that depend on the def/use graph on a pattern statement, such as counting uses
of a pseudo value defined by a pattern statement. This patch fixes a bug of
this type in vect pattern formation.

Thanks,
Feng
---
gcc/
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
single_imm_use if statement is not generated by pattern recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4570c25b664..ca8809e7cfd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2700,7 +2700,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
  PLUS_EXPR then do the shift last as some targets can combine the shift and
  add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !STMT_VINFO_RELATED_STMT (stmt_info)
+  && single_imm_use (lhs, &use_p, &use_stmt))
 {
   if (gimple_code (use_stmt) == GIMPLE_ASSIGN
  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
-- 
2.17.1

From 52e1725339fc7e4552eb7916570790c4ab7f133d Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Fri, 14 Jun 2024 15:49:23 +0800
Subject: [PATCH 1/5] vect: Fix single_imm_use in tree_vect_patterns

Since pattern statements coexist with normal statements in a way that they are
not linked into the function body, we should not invoke utility procedures
that depend on the def/use graph on a pattern statement, such as counting uses
of a pseudo value defined by a pattern statement. This patch fixes a bug of
this type in vect pattern formation.

2024-06-14 Feng Xue 

gcc/
	* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
	single_imm_use if statement is not generated by pattern recognition.
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4570c25b664..ca8809e7cfd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2700,7 +2700,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
  PLUS_EXPR then do the shift last as some targets can combine the shift and
  add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !STMT_VINFO_RELATED_STMT (stmt_info)
+  && single_imm_use (lhs, &use_p, &use_stmt))
 {
   if (gimple_code (use_stmt) == GIMPLE_ASSIGN
 	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
-- 
2.17.1



[RFC][PATCH 3/5] vect: Enable lane-reducing operation that is not loop reduction statement

2024-07-21 Thread Feng Xue OS
This patch extends the original vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly. The operation itself is not a reduction statement, but its value
is finally accumulated into the reduction result.

Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vectorizable_lane_reducing): Allow indirect lane-
reducing operation.
(vect_transform_reduction): Extend transform for indirect lane-reducing
operation.
---
 gcc/tree-vect-loop.cc | 48 +++
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..c344158b419 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7520,9 +7520,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
 
   stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
 
-  /* TODO: Support lane-reducing operation that does not directly participate
- in loop reduction.  */
-  if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0)
+  if (!reduc_info)
 return false;
 
   /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not
@@ -7530,7 +7528,16 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   gcc_assert (STMT_VINFO_DEF_TYPE (reduc_info) == vect_reduction_def);
   gcc_assert (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION);
 
-  for (int i = 0; i < (int) gimple_num_ops (stmt) - 1; i++)
+  int sum_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+  int num_ops = (int) gimple_num_ops (stmt) - 1;
+
+  /* Participate in loop reduction either directly or indirectly.  */
+  if (sum_idx >= 0)
+gcc_assert (sum_idx  == num_ops - 1);
+  else
+sum_idx = num_ops - 1;
+
+  for (int i = 0; i < num_ops; i++)
 {
   stmt_vec_info def_stmt_info;
   slp_tree slp_op;
@@ -7573,7 +7580,24 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
 
   tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
 
-  gcc_assert (vectype_in);
+  if (!vectype_in)
+{
+  enum vect_def_type dt;
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+
+  if (!vect_is_simple_use (rhs1, loop_vinfo, &dt, &vectype_in))
+   return false;
+
+  if (!vectype_in)
+   {
+ vectype_in = get_vectype_for_scalar_type (loop_vinfo,
+   TREE_TYPE (rhs1));
+ if (!vectype_in)
+   return false;
+   }
+
+  STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
+}
 
   /* Compute number of effective vector statements for costing.  */
   unsigned int ncopies_for_cost = vect_get_num_copies (loop_vinfo, slp_node,
@@ -8750,9 +8774,17 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   gcc_assert (single_defuse_cycle || lane_reducing);
 
   if (lane_reducing)
-{
-  /* The last operand of lane-reducing op is for reduction.  */
-  gcc_assert (reduc_index == (int) op.num_ops - 1);
+{  
+  if (reduc_index < 0)
+   {
+ reduc_index = (int) op.num_ops - 1;
+ single_defuse_cycle = false;
+   }
+  else
+   {
+ /* The last operand of lane-reducing op is for reduction.  */
+ gcc_assert (reduc_index == (int) op.num_ops - 1);
+   }
 }
 
   /* Create the destination vector  */
-- 
2.17.1

From 5e65c65786d9594c172b58a6cd1af50c67efb927 Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Wed, 24 Apr 2024 16:46:49 +0800
Subject: [PATCH 3/5] vect: Enable lane-reducing operation that is not loop
 reduction statement

This patch extends the original vect analysis and transform to support a new
kind of lane-reducing operation that participates in loop reduction
indirectly. The operation itself is not a reduction statement, but its value
is finally accumulated into the reduction result.

2024-04-24 Feng Xue 

gcc/
	* tree-vect-loop.cc (vectorizable_lane_reducing): Allow indirect lane-
	reducing operation.
	(vect_transform_reduction): Extend transform for indirect lane-reducing
	operation.
---
 gcc/tree-vect-loop.cc | 48 +++
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7d628efa60..c344158b419 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7520,9 +7520,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
 
   stmt_vec_info reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
 
-  /* TODO: Support lane-reducing operation that does not directly participate
- in loop reduction.  */
-  if (!reduc_info || STMT_VINFO_REDUC_IDX (stmt_info) < 0)
+  if (!reduc_info)
 return false;
 
   /* Lane-reducing pattern inside any inner loop of LOOP_VINFO is not
@@ -7530,7 +7528,16 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
   gcc_assert (ST

[RFC][PATCH 2/5] vect: Introduce loop reduction affine closure to vect pattern recog

2024-07-21 Thread Feng Xue OS
For sum-based loop reduction, its affine closure is composed of statements
whose results and derived computations only end up in the reduction, and are
not used in any non-linear transform operation. The concept underlies the
generalized lane-reducing pattern recognition in the coming patches. As
mathematically proved, it is legitimate to optimize the evaluation of a value
with a lane-reducing pattern only if its definition statement is located in
the affine closure. That is to say, the canonicalized representation for loop
reduction could be of the following affine form, in which "opX" denotes an
operation for a lane-reducing pattern, and h(i) represents the remaining
operations irrelevant to those patterns.

  for (i)
sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

At initialization, we invoke a preprocessing step to mark all statements in
the affine closure, which eases retrieval of the property during pattern
matching. Since a pattern hit would replace the original statement with new
pattern statements, we resort to a postprocessing step after recognition, to
parse the semantics of those new statements and incrementally update the
affine closure, or to roll back the pattern change if it would break the
completeness of the existing closure.

Thus, inside the affine closure, the recog framework can universally handle
both lane-reducing and normal patterns. Also with this patch, we are able to
add more complicated logic to enhance lane-reducing patterns.

Thanks,
Feng
---
gcc/
* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
(_stmt_vec_info): Add a new field reduc_pattern_status.
* tree-vect-patterns.cc (vect_split_statement): Adjust statement
status for reduction affine closure.
(vect_convert_input): Do not reuse conversion statement in process.
(vect_reassociating_reduction_p): Add a condition check to only allow
statement in reduction affine closure.
(vect_pattern_expr_invariant_p): New function.
(vect_get_affine_operands_mask): Likewise.
(vect_mark_reduction_affine_closure): Likewise.
(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
(vect_get_prev_reduction_stmt): Likewise.
(vect_mark_reduction_pattern_sequence_formed): Likewise.
(vect_check_pattern_stmts_for_reduction): Likewise.
(vect_pattern_recog_1): Check if a pattern recognition would break
existing lane-reducing pattern statements.
(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +-
 gcc/tree-vectorizer.h |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index ca8809e7cfd..02f6b942026 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -750,7 +750,6 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt2_info->stmt, def_seq);
  gsi_insert_before_without_update (&gsi, stmt1, GSI_SAME_STMT);
}
-  return true;
 }
   else
 {
@@ -783,9 +782,35 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
  dump_printf_loc (MSG_NOTE, vect_location, "and: %G",
   (gimple *) new_stmt2);
}
+}
 
-  return true;
+  /* Since this function would change existing conversion statement no matter
+ the pattern is finally applied or not, we should check whether affine
+ closure of loop reduction need to be adjusted for impacted statements.  */
+  unsigned int status = stmt2_info->reduc_pattern_status;
+
+  if (status != rpatt_none)
+{
+  tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (stmt1));
+  tree new_rhs_type = TREE_TYPE (new_rhs);
+
+  /* The new statement generated by splitting is a natural widening
+conversion. */
+  gcc_assert (TYPE_PRECISION (rhs_type) < TYPE_PRECISION (new_rhs_type));
+  gcc_assert (TYPE_UNSIGNED (rhs_type) || !TYPE_UNSIGNED (new_rhs_type));
+
+  /* The new statement would not break transform invariance of lane-
+reducing operation, if the original conversion depends on the one
+formed previously.  For the case, it should also be marked with
+rpatt_formed status.  */
+  if (status & rpatt_formed)
+   vinfo->lookup_stmt (stmt1)->reduc_pattern_status = rpatt_formed;
+
+  if (!is_pattern_stmt_p (stmt2_info))
+   STMT_VINFO_RELATED_STMT (stmt2_info)->reduc_pattern_status = status;
 }
+
+  return true;
 }
 
 /* Look for the following pattern
@@ -890,7 +915,10 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 return wide_int_to_tree (type, wi::to_widest (unprom->op));
 
   tree input = unprom->op;
-  if (unprom->caster)
+
+  /* We should not reuse conversion, if it is just the statement under pattern
+ recognition.  */
+  if (unprom->caster && unprom->cast

[RFC][PATCH 4/5] vect: Extend lane-reducing patterns to non-loop-reduction statement

2024-07-21 Thread Feng Xue OS
Previously, only the simple lane-reducing case was supported, in which one
loop reduction statement forms one pattern match:

  char *d0, *d1, *s0, *s1, *w;
  for (i) {
sum += d0[i] * d1[i];  // sum = DOT_PROD(d0, d1, sum);
sum += abs(s0[i] - s1[i]); // sum = SAD(s0, s1, sum);
sum += w[i];   // sum = WIDEN_SUM(w, sum);
  }

This patch removes the limitation of the current lane-reducing matching
strategy, and extends the candidate scope to the whole loop reduction affine
closure. Thus, we can form as many lane-reducing operations in a reduction as
possible, which ends up with generalized pattern recognition as ("opX" denotes
an operation for a lane-reducing pattern):

 for (i)
   sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

A lane-reducing operation contains two aspects: the main primitive operation
and the appendant result-accumulation. The original design handles the match
of the compound semantics in a single pattern, but that means is not suitable
for an operation that does not directly participate in loop reduction. In
this patch, we only focus on the basic aspect, and leave another patch to
cover the rest. An example with dot-product:

sum = DOT_PROD(d0, d1, sum);   // original
sum = DOT_PROD(d0, d1, 0) + sum;   // now

Thanks,
Feng
---
gcc/
* tree-vect-patterns (vect_reassociating_reduction_p): Remove the
function.
(vect_recog_dot_prod_pattern): Relax check to allow any statement in
reduction affine closure.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise. And use dot-product if
widen-sum is not supported.
(vect_vect_recog_func_ptrs): Move lane-reducing patterns to the topmost.

gcc/testsuite/
* gcc.dg/vect/vect-reduc-affine-1.c
* gcc.dg/vect/vect-reduc-affine-2.c
* gcc.dg/vect/vect-reduc-affine-slp-1.c
---
 .../gcc.dg/vect/vect-reduc-affine-1.c | 112 ++
 .../gcc.dg/vect/vect-reduc-affine-2.c |  81 +
 .../gcc.dg/vect/vect-reduc-affine-slp-1.c |  74 
 gcc/tree-vect-patterns.cc | 321 ++
 4 files changed, 372 insertions(+), 216 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-affine-slp-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
new file mode 100644
index 000..a5e99ce703b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-affine-1.c
@@ -0,0 +1,112 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2)   \
+S1 int __attribute__ ((noipa)) \
+name (S1 int res,  \
+  S2 char *restrict a, \
+  S2 char *restrict b, \
+  S2 int *restrict c,  \
+  S2 int cst1, \
+  S2 int cst2, \
+  int shift)   \
+{  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i] + 16;   \
+   \
+  asm volatile ("" ::: "memory");  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i] + cst1; \
+   \
+  asm volatile ("" ::: "memory");  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i] + c[i]; \
+   \
+  asm volatile ("" ::: "memory");  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i] * 23;   \
+   \
+  asm volatile ("" ::: "memory");  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i] << 6;   \
+   \
+  asm volatile ("" ::: "memory");  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i] * cst2; \
+   \
+  asm volatile ("" ::: "memory");  \
+  for 

[RFC][PATCH 5/5] vect: Add accumulating-result pattern for lane-reducing operation

2024-07-21 Thread Feng Xue OS
This patch adds a pattern to fold a summation into the last operand of a
lane-reducing operation when appropriate, which is a supplement to the
operation-specific patterns for dot-prod/sad/widen-sum.

  sum = lane-reducing-op(..., 0) + value;
=>
  sum = lane-reducing-op(..., value);

Thanks,
Feng
---
gcc/
* tree-vect-patterns (vect_recog_lane_reducing_accum_pattern): New
pattern function.
(vect_vect_recog_func_ptrs): Add the new pattern function.
* params.opt (vect-lane-reducing-accum-pattern): New parameter.

gcc/testsuite/
* gcc.dg/vect/vect-reduc-accum-pattern.c
---
 gcc/params.opt|   4 +
 .../gcc.dg/vect/vect-reduc-accum-pattern.c|  61 ++
 gcc/tree-vect-patterns.cc | 106 ++
 3 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c

diff --git a/gcc/params.opt b/gcc/params.opt
index c17ba17b91b..b94bdc26cbd 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 
1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-lane-reducing-accum-pattern=
+Common Joined UInteger Var(param_vect_lane_reducing_accum_pattern) Init(2) IntegerRange(0, 2) Param Optimization
+Allow pattern of combining plus into lane reducing operation or not.  If the value is 2, allow this for all statements; if 1, only for reduction statements; otherwise, disable it.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(15) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
new file mode 100644
index 000..80a2c4f047e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-accum-pattern.c
@@ -0,0 +1,61 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#define FN(name, S1, S2)   \
+S1 int __attribute__ ((noipa)) \
+name (S1 int res,  \
+  S2 char *restrict a, \
+  S2 char *restrict b, \
+  S2 char *restrict c, \
+  S2 char *restrict d) \
+{  \
+  for (int i = 0; i < N; i++)  \
+res += a[i] * b[i];\
+   \
+  asm volatile ("" ::: "memory");  \
+  for (int i = 0; i < N; ++i)  \
+res += (a[i] * b[i] + c[i] * d[i]) << 3;   \
+   \
+  return res;  \
+}
+
+FN(f1_vec, signed, signed)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(f1_novec, signed, signed)
+#pragma GCC pop_options
+
+#define BASE2 ((signed int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  signed char a[N], b[N];
+  signed char c[N], d[N];
+
+#pragma GCC novector
+  for (int i = 0; i < N; ++i)
+{
+  a[i] = BASE2 + i * 5;
+  b[i] = BASE2 + OFFSET + i * 4;
+  c[i] = BASE2 + i * 6;
+  d[i] = BASE2 + OFFSET + i * 5;
+}
+
+  if (f1_vec (0x12345, a, b, c, d) != f1_novec (0x12345, a, b, c, d))
+__builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "vect_recog_lane_reducing_accum_pattern: detected" "vect" { target { vect_sdot_qi } } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index bb037af0b68..9a6b16532e4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1490,6 +1490,111 @@ vect_recog_abd_pattern (vec_info *vinfo,
   return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);
 }
 
+/* Function vect_recog_lane_reducing_accum_pattern
+
+   Try to fold a summation into the last operand of lane-reducing operation.
+
+   sum = lane-reducing-op(..., 0) + value;
+
+   A lane-reducing operation contains two aspects: main primitive operation
+   and appendant result-accumulation.  Pattern matching for the basic aspect
+   is handled in specific pattern for dot-prod/sad/widen-sum respectively.
+   The function is in charge of the other aspect.
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern se

[PATCH v1] RISC-V: Rearrange the test helper files for vector .SAT_*

2024-07-21 Thread pan2 . li
From: Pan Li 

Rearrange the test helper header files, as well as align the naming
conventions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: Move to...
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vvv_run.h: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: Move to...
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vvx_run.h: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: Move to...
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx_run.h: ...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Adjust
the include file names.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: Ditto.
* gcc.target/riscv/rvv/au

[PATCH] tree-optimization/58416 - SRA wrt FP type replacements

2024-07-21 Thread Richard Biener
As in other places we have to be careful to use FP modes to represent
the underlying bit representation of an object.  With x87 floating-point
types there are no load or store instructions that preserve this and
XFmode can have padding.

When SRA faces the situation that a field is accessed with multiple
effective types, as happens for example for unions, it generally
chooses an integer type if available.  But in the case in the PR
there's only an aggregate type or a floating-point type and we end
up choosing the register type.

SRA deals with similar situations for bit-precision integer types
and adjusts the replacement type to one covering the size of the
object.  The following patch makes sure we do the same when the
replacement has float mode and there were possibly two ways the
object was accessed.  I've chosen to use bitwise_type_for_mode
in this case as done for example by memcpy folding to avoid
creating an unsigned:96 replacement type on i?86 where
sizeof(long double) is 12.  This means we can fail to find an
integer type for a replacement which slightly complicates the patch
and it causes the testcase to no longer be SRAed on i?86.
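To make the failure mode concrete, here is a minimal sketch (hypothetical names, not part of the patch or its testcase) of the two effective access types for the same storage: copying through the byte view is bit-preserving on every target, whereas copying through the long double view need not be on x87, which is why a same-size bitwise integer type is the safe replacement.

```c
/* Hypothetical sketch, not from the patch: a union gives one object two
   effective types.  Copying through the byte view preserves every bit
   on any target; copying through the long double view may not on x87
   (XFmode has padding bytes, and loads can quiet sNaNs). */
#include <string.h>

struct bytes { unsigned char b[sizeof (long double)]; };
union both { long double d; struct bytes s; };

/* Safe "bitwise" copy: never goes through an FP register.  */
union both copy_via_bytes (const union both *src)
{
  union both dst;
  memcpy (&dst.s, &src->s, sizeof dst.s);
  return dst;
}
```

The unsafe direction would be `dst.d = src->d;`, which on i?86 may go through an x87 register and change the stored bytes.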

Bootstrapped on x86_64-unknown-linux-gnu, there is some fallout in
the testsuite I need to compare to a clean run.  Comments welcome.

Richard.

PR tree-optimization/58416
* tree-sra.cc (analyze_access_subtree): For FP mode replacements
with multiple access paths use a bitwise type instead or fail
if not available.

* gcc.dg/torture/pr58416.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr58416.c | 32 
 gcc/tree-sra.cc| 72 ++
 2 files changed, 83 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr58416.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr58416.c 
b/gcc/testsuite/gcc.dg/torture/pr58416.c
new file mode 100644
index 000..0922b0e7089
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr58416.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+struct s {
+  char s[sizeof(long double)];
+};
+
+union u {
+  long double d;
+  struct s s;
+};
+
+int main()
+{
+  union u x = {0};
+#if __SIZEOF_LONG_DOUBLE__ == 16
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 12
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 8
+  x.s = (struct s){""};
+#elif __SIZEOF_LONG_DOUBLE__ == 4
+  x.s = (struct s){""};
+#endif
+
+  union u y = x;
+
+  for (unsigned char *p = (unsigned char *)&y + sizeof y;
+   p-- > (unsigned char *)&y;)
+if (*p != (unsigned char)'x')
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 8040b0c5645..bc9a7b3ee04 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -2868,40 +2868,70 @@ analyze_access_subtree (struct access *root, struct 
access *parent,
   /* Always create access replacements that cover the whole access.
  For integral types this means the precision has to match.
 Avoid assumptions based on the integral type kind, too.  */
-  if (INTEGRAL_TYPE_P (root->type)
- && ((TREE_CODE (root->type) != INTEGER_TYPE
-  && TREE_CODE (root->type) != BITINT_TYPE)
- || TYPE_PRECISION (root->type) != root->size)
- /* But leave bitfield accesses alone.  */
- && (TREE_CODE (root->expr) != COMPONENT_REF
- || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1
+  if ((INTEGRAL_TYPE_P (root->type)
+  && ((TREE_CODE (root->type) != INTEGER_TYPE
+   && TREE_CODE (root->type) != BITINT_TYPE)
+  || TYPE_PRECISION (root->type) != root->size)
+  /* But leave bitfield accesses alone.  */
+  && (TREE_CODE (root->expr) != COMPONENT_REF
+  || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1
+ /* Avoid a floating-point replacement when there's multiple
+ways this field is accessed.   On some targets this can
+cause correctness issues, see PR58416.  */
+ || (FLOAT_MODE_P (TYPE_MODE (root->type))
+ && !root->grp_same_access_path))
{
  tree rt = root->type;
  gcc_assert ((root->offset % BITS_PER_UNIT) == 0
  && (root->size % BITS_PER_UNIT) == 0);
  if (TREE_CODE (root->type) == BITINT_TYPE)
root->type = build_bitint_type (root->size, TYPE_UNSIGNED (rt));
+ else if (FLOAT_MODE_P (TYPE_MODE (root->type)))
+   {
+ tree bt = bitwise_type_for_mode (TYPE_MODE (root->type));
+ if (!bt)
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Failed to change the type of a "
+  "replacement for ");
+ print_generic_expr (dump_file, root->base);
+ fprintf (dump_file, " offset: %u, size: %u ",
+   

[PATCH][RFC] tree-optimization/114659 - VN and FP to int punning

2024-07-21 Thread Richard Biener
The following addresses another case where x87 FP loads mangle the
bit representation and thus are not suitable for a representative
in other types.  VN was value-numbering a later integer load of 'x'
as the same as a former float load of 'x'.

The following disables this when the result is not known constant.

This now regresses gcc.dg/tree-ssa/ssa-fre-7.c but for x87 float
the optimization might elide a FP load/store "noop" move that isn't
noop on x87 and thus the desired transform is invalid.

Nevertheless it's bad to pessimize all targets for this.  I was
wondering if it's possible to key this on reg_raw_mode[] but
that needs a hard register number (and suspiciously the array
has no DFmode or SFmode on x86_64 but only XFmode).  So would
this need a new target hook?  Should this use some other
mechanism to query for the correctness of performing the load
in another mode and then punning to the destination mode?
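As a small illustration of the hazard (a hypothetical helper, not the patch's testcase): the bit-preserving way to read float storage as an integer is an integer-mode load of the same bytes, which is exactly the value VN must not conflate with a float-mode load of that location.

```c
/* Hypothetical sketch, not the patch's testcase: reading the same
   storage once as float and once as integer.  The integer-mode load
   below is bit-preserving; an x87 float-mode load of the same bytes
   is not (it can quiet sNaNs), so VN must not reuse the float load's
   value number for the integer load. */
#include <string.h>
#include <stdint.h>

uint32_t bits_of (const float *p)
{
  uint32_t u;
  memcpy (&u, p, sizeof u);   /* integer-mode load of the storage */
  return u;
}
```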

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/114659
* tree-ssa-sccvn.cc (visit_reference_op_load): Do not
pun from a scalar floating point mode load to a different
	type unless we can do so by constant folding.

* gcc.target/i386/pr114659.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr114659.c | 62 
 gcc/tree-ssa-sccvn.cc|  7 +++
 2 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114659.c

diff --git a/gcc/testsuite/gcc.target/i386/pr114659.c 
b/gcc/testsuite/gcc.target/i386/pr114659.c
new file mode 100644
index 000..e1e24d55687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114659.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+my_totalorderf (float const *x, float const *y)
+{
+  int xs = __builtin_signbit (*x);
+  int ys = __builtin_signbit (*y);
+  if (!xs != !ys)
+return xs;
+
+  int xn = __builtin_isnan (*x);
+  int yn = __builtin_isnan (*y);
+  if (!xn != !yn)
+return !xn == !xs;
+  if (!xn)
+return *x <= *y;
+
+  unsigned int extended_sign = -!!xs;
+  union { unsigned int i; float f; } xu = {0}, yu = {0};
+  __builtin_memcpy (&xu.f, x, sizeof (float));
+  __builtin_memcpy (&yu.f, y, sizeof (float));
+  return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
+
+static float
+positive_NaNf ()
+{
+  float volatile nan = 0.0f / 0.0f;
+  return (__builtin_signbit (nan) ? - nan : nan);
+}
+
+typedef union { float value; unsigned int word[1]; } memory_float;
+
+static memory_float
+construct_memory_SNaNf (float quiet_value)
+{
+  memory_float m;
+  m.value = quiet_value;
+  m.word[0] ^= (unsigned int) 1 << 22;
+  m.word[0] |= (unsigned int) 1;
+  return m;
+}
+
+memory_float x[7] =
+  {
+{ 0 },
+{ 1e-5 },
+{ 1 },
+{ 1e37 },
+{ 1.0f / 0.0f },
+  };
+
+int
+main ()
+{
+  x[5] = construct_memory_SNaNf (positive_NaNf ());
+  x[6] = (memory_float) { positive_NaNf () };
+  if (! my_totalorderf (&x[5].value, &x[6].value))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 0139f1b4e30..62f3de11b56 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5825,6 +5825,13 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
result = NULL_TREE;
   else if (CONSTANT_CLASS_P (result))
result = const_unop (VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
+  /* Do not treat a float-mode load as preserving the bit
+representation.  See PR114659: for x87 FP modes there
+is no load instruction that does not at least turn sNaNs
+into qNaNs.  But allow the case of a constant FP value we can
+fold above.  */
+  else if (SCALAR_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (result
+   result = NULL_TREE;
   else
{
  /* We will be setting the value number of lhs to the value number
-- 
2.43.0


Re: [PATCH] LoongArch: Implement scalar isinf, isnormal, and isfinite via fclass

2024-07-21 Thread Xi Ruoyao
On Mon, 2024-07-15 at 15:53 +0800, Lulu Cheng wrote:
> Hi,
> 
> g++.dg/opt/pr107569.C and range-sincos.c vrp-float-abs-1.c is the same 
> issue, right?
> 
> And I have no objection to code modifications. But I think it's better
> to wait until this builtin
> 
> function is fixed.

Oops https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html
won't be enough for pr107569.C.  For pr107569.C I guess we need to add
range ops for __builtin_isfinite but the patch only handles
__builtin_isinf.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] gcc: stop adding -fno-common for checking builds

2024-07-21 Thread Sam James
Richard Biener  writes:

>> Am 20.07.2024 um 02:31 schrieb Andrew Pinski :
>> 
>> On Fri, Jul 19, 2024 at 5:23 PM Sam James  wrote:
>>> 
>>> Originally added in r0-44646-g204250d2fcd084 and r0-44627-gfd350d241fecf6 
>>> which
>>> moved -fno-common from all builds to just checking builds.
>>> 
>>> Since r10-4867-g6271dd984d7f92, GCC defaults to -fno-common. There's no need
>>> to pass it specially for checking builds.
>>> 
>>> We could keep it for older bootstrap compilers with checking but I don't see
>>> much value in that, it was already just a bonus before.
>> 
>> Considering -fno-common has almost no effect on C++ code, removing it
>> fully is a decent thing to do.
>> It was added back when GCC was written in C and then never removed
>> when GCC started to build as C++.
>
> Ok

Thank you! Arsen has kindly pushed for me.

>
> Richard 
>
>> Thanks,
>> Andrew Pinski
>> 
>>> 
>>> gcc/ChangeLog:
>>>* Makefile.in (NOCOMMON_FLAG): Delete.
>>>(GCC_WARN_CFLAGS): Drop NOCOMMON_FLAG.
>>>(GCC_WARN_CXXFLAGS): Drop NOCOMMON_FLAG.
>>>* configure.ac: Ditto.
>>>* configure: Regenerate.
>>> 
>>> gcc/d/ChangeLog:
>>>* Make-lang.in (WARN_DFLAGS): Drop NOCOMMON_FLAG.
>>> ---
>>> This came out of a discussion with pinskia last year but I punted it
>>> until stage1. Been running with it since then.
>>> 
>>> gcc/Makefile.in| 8 ++--
>>> gcc/configure  | 8 ++--
>>> gcc/configure.ac   | 3 ---
>>> gcc/d/Make-lang.in | 2 +-
>>> 4 files changed, 5 insertions(+), 16 deletions(-)
>>> 
>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>>> index f4bb4a88cf31..4fc86ed7938b 100644
>>> --- a/gcc/Makefile.in
>>> +++ b/gcc/Makefile.in
>>> @@ -185,10 +185,6 @@ C_LOOSE_WARN = @c_loose_warn@
>>> STRICT_WARN = @strict_warn@
>>> C_STRICT_WARN = @c_strict_warn@
>>> 
>>> -# This is set by --enable-checking.  The idea is to catch forgotten
>>> -# "extern" tags in header files.
>>> -NOCOMMON_FLAG = @nocommon_flag@
>>> -
>>> NOEXCEPTION_FLAGS = @noexception_flags@
>>> 
>>> ALIASING_FLAGS = @aliasing_flags@
>>> @@ -215,8 +211,8 @@ VALGRIND_DRIVER_DEFINES = @valgrind_path_defines@
>>> .-warn = $(STRICT_WARN)
>>> build-warn = $(STRICT_WARN)
>>> rtl-ssa-warn = $(STRICT_WARN)
>>> -GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if 
>>> $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) 
>>> $(NOCOMMON_FLAG) $($@-warn)
>>> -GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn)
>>> +GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if 
>>> $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) $($@-warn)
>>> +GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $($@-warn)
>>> 
>>> # 1 2 3 ... 
>>> one_to__0:=1 2 3 4 5 6 7 8 9
>>> diff --git a/gcc/configure b/gcc/configure
>>> index 4faae0fa5fb8..01acca7fb5cc 100755
>>> --- a/gcc/configure
>>> +++ b/gcc/configure
>>> @@ -862,7 +862,6 @@ valgrind_command
>>> valgrind_path_defines
>>> valgrind_path
>>> TREECHECKING
>>> -nocommon_flag
>>> noexception_flags
>>> warn_cxxflags
>>> warn_cflags
>>> @@ -7605,17 +7604,14 @@ do
>>> done
>>> IFS="$ac_save_IFS"
>>> 
>>> -nocommon_flag=""
>>> if test x$ac_checking != x ; then
>>> 
>>> $as_echo "#define CHECKING_P 1" >>confdefs.h
>>> 
>>> -  nocommon_flag=-fno-common
>>> else
>>>   $as_echo "#define CHECKING_P 0" >>confdefs.h
>>> 
>>> fi
>>> -
>>> if test x$ac_extra_checking != x ; then
>>> 
>>> $as_echo "#define ENABLE_EXTRA_CHECKING 1" >>confdefs.h
>>> @@ -21410,7 +21406,7 @@ else
>>>   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>>>   lt_status=$lt_dlunknown
>>>   cat > conftest.$ac_ext <<_LT_EOF
>>> -#line 21413 "configure"
>>> +#line 21409 "configure"
>>> #include "confdefs.h"
>>> 
>>> #if HAVE_DLFCN_H
>>> @@ -21516,7 +21512,7 @@ else
>>>   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>>>   lt_status=$lt_dlunknown
>>>   cat > conftest.$ac_ext <<_LT_EOF
>>> -#line 21519 "configure"
>>> +#line 21515 "configure"
>>> #include "confdefs.h"
>>> 
>>> #if HAVE_DLFCN_H
>>> diff --git a/gcc/configure.ac b/gcc/configure.ac
>>> index 3da1eaa70646..3f20c107b6aa 100644
>>> --- a/gcc/configure.ac
>>> +++ b/gcc/configure.ac
>>> @@ -697,16 +697,13 @@ do
>>> done
>>> IFS="$ac_save_IFS"
>>> 
>>> -nocommon_flag=""
>>> if test x$ac_checking != x ; then
>>>   AC_DEFINE(CHECKING_P, 1,
>>> [Define to 0/1 if you want more run-time sanity checks.  This one gets a 
>>> grab
>>> bag of miscellaneous but relatively cheap checks.])
>>> -  nocommon_flag=-fno-common
>>> else
>>>   AC_DEFINE(CHECKING_P, 0)
>>> fi
>>> -AC_SUBST(nocommon_flag)
>>> if test x$ac_extra_checking != x ; then
>>>   AC_DEFINE(ENABLE_EXTRA_CHECKING, 1,
>>> [Define to 0/1 if you want extra run-time checking that might affect code
>>> diff --git a/gcc/d/Make-lang.in b/gcc/d/Make-lang.in
>>> index eaea6e039cf7..077668faae64 100644
>>> --- a/gcc/d/Make-lang.in
>>> +++ b/gcc/d/Make-lang.in
>>> @@ -55,7 +55,7 @@ CHECKING_DFLAGS = -frelease
>>> else
>>> CHECKING_DFLAGS =
>>> endif

[committed] [PR rtl-optimization/115877] Fix livein computation for ext-dce

2024-07-21 Thread Jeff Law
So I'm not yet sure how I'm going to break everything down, but this is 
easy enough to break out as 1/N of ext-dce fixes/improvements.



When handling uses in an insn, we first determine what bits are set in 
the destination which is represented in DST_MASK.  Then we use that to 
refine what bits are live in the source operands.


In the source operand handling section we *modify* DST_MASK if the 
source operand is a SUBREG (ugh!).  So if the first operand is a SUBREG, 
then we can incorrectly compute which bit groups are live in the second 
operand, especially if it is a SUBREG as well.
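In the abstract, the bug and its fix look like this (hypothetical names and types; the real code operates on RTL and bit-group masks):

```c
/* Abstract sketch of the bug class, with hypothetical names: a value
   that must be recomputed per operand (dst_mask) was narrowed in place
   while handling a SUBREG first operand, so the second operand saw the
   stale, narrowed mask.  The fix snapshots the value before the loop
   and restores it at the top of each operand's iteration. */
#include <stdint.h>

static uint64_t refine (uint64_t mask, int is_subreg)
{
  /* Stand-in for the SUBREG handling that narrows the mask.  */
  return is_subreg ? mask & 0xffffffffu : mask;
}

uint64_t live_bits (uint64_t dst_mask, const int *op_is_subreg, int n_ops)
{
  uint64_t live = 0;
  uint64_t save_mask = dst_mask;       /* snapshot before the loop */
  for (int i = 0; i < n_ops; i++)
    {
      dst_mask = save_mask;            /* restore per operand */
      live |= refine (dst_mask, op_is_subreg[i]);
    }
  return live;
}
```

Without the restore, a SUBREG first operand would leave the upper bits dead for the second operand as well.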


This was seen when testing a larger set of patches on the rl78 port 
(builtin-arith-overflow-p-7 & pr71631 execution failures), so no new 
test for this bugfix.


Run through my tester (in conjunction with other ext-dce changes) on the 
various cross targets.  Run individually through a bootstrap and 
regression test cycle on x86_64 as well.


Pushing to the trunk.


jeff

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_uses): Restore the value of DST_MASK
	for each operand.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 6d4b8858ec6..c4c38659701 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -591,8 +678,10 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
 making things live.  Breaking from this loop will cause
 the iterator to work on sub-rtxs, so it is safe to break
 if we see something we don't know how to handle.  */
+ unsigned HOST_WIDE_INT save_mask = dst_mask;
  for (;;)
{
+ dst_mask = save_mask;
  /* Strip an outer paradoxical subreg.  The bits outside
 the inner mode are don't cares.  So we can just strip
 and process the inner object.  */



[committed][PR rtl-optimization/115877][2/n] Improve liveness computation for constant initialization

2024-07-21 Thread Jeff Law
While debugging pr115877, I noticed we were failing to remove the 
destination register from LIVENOW bitmap when it was set to a constant 
value, i.e. (set (dest) (const_int)).  This was a trivial oversight in 
safe_for_live_propagation.


I don't have an example of this affecting code generation, but it 
certainly could.  More importantly, by making LIVENOW more accurate it's 
easier to debug when LIVENOW differs from expectations.


As with the prior patch this has been tested as part of a larger 
patchset with the crosses as well as individually on x86_64.


Pushing to the trunk,
Jeff

PR rtl-optimization/115877
gcc/
* ext-dce.cc (safe_for_live_propagation): Handle RTX_CONST_OBJ.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 6d4b8858ec6..cbecfc53dba 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -69,6 +69,7 @@ safe_for_live_propagation (rtx_code code)
   switch (GET_RTX_CLASS (code))
 {
   case RTX_OBJ:
+  case RTX_CONST_OBJ:
return true;
 
   case RTX_COMPARE:


Re: [PATCH] testsuite: fix pr115929-1.c with -Wformat-security

2024-07-21 Thread Richard Sandiford
Xi Ruoyao  writes:
> On Sat, 2024-07-20 at 06:52 +0100, Sam James wrote:
>> Some distributions like Gentoo make -Wformat and -Wformat-security
>> enabled by default. Pass -Wno-format to the test to avoid a spurious
>> fail in such environments.
>> 
>> gcc/testsuite/
>>  PR rtl-optimization/115929
>>  * gcc.dg/torture/pr115929-1.c: Pass -Wno-format.
>> ---
>
> IMO if you are patching GCC downstream to enable some options, you can
> patch the test case in the same .patch file anyway instead of pushing it
> upstream.
>
> If we take the responsibility to make the test suite anticipate random
> downstream changes, the test suite will ended up filled with different
> workarounds for 42 distros.

Yeah, I'm worried about that too.

> If we have to anticipate downstream changes we should make a policy
> about which changes we must anticipate (hmm and if we'll anticipate -
> Wformat by default why not add a configuration option for it by the
> way?), or do it in a more generic way (using a .spec file to explicitly
> give the "baseline" options for testing?)

Two systematic ways of dealing with this under the current testsuite
framework would be:

(1) Make dg-torture.exp add -w by default.  This is what gcc.c-torture
already does.  Then, tests that want to test for warnings can
enable them explicitly.

Some of the existing dg-warnings are already due to lack of -w,
rather than something that the test was originally designed for.
E.g. pr26565.c.

(2) Make dg-torture.exp add -Wall -Wextra by default, so that tests
have to suppress any warnings they don't want.

Personally, I'd prefer one of those two rather than patching upstream
tests for downstream changes.

Thanks,
Richard


Re: [PATCH] testsuite: fix pr115929-1.c with -Wformat-security

2024-07-21 Thread Sam James
Richard Sandiford  writes:

> Xi Ruoyao  writes:
>> On Sat, 2024-07-20 at 06:52 +0100, Sam James wrote:
>>> Some distributions like Gentoo make -Wformat and -Wformat-security
>>> enabled by default. Pass -Wno-format to the test to avoid a spurious
>>> fail in such environments.
>>> 
>>> gcc/testsuite/
>>> PR rtl-optimization/115929
>>> * gcc.dg/torture/pr115929-1.c: Pass -Wno-format.
>>> ---
>>
>> IMO if you are patching GCC downstream to enable some options, you can
>> patch the test case in the same .patch file anyway instead of pushing it
>> upstream.
>>
>> If we take the responsibility to make the test suite anticipate random
>> downstream changes, the test suite will ended up filled with different
>> workarounds for 42 distros.
>
> Yeah, I'm worried about that too.
>
>> If we have to anticipate downstream changes we should make a policy
>> about which changes we must anticipate (hmm and if we'll anticipate -
>> Wformat by default why not add a configuration option for it by the
>> way?), or do it in a more generic way (using a .spec file to explicitly
>> give the "baseline" options for testing?)
>
> Two systematic ways of dealing with this under the current testsuite
> framework would be:
>
> (1) Make dg-torture.exp add -w by default.  This is what gcc.c-torture
> already does.  Then, tests that want to test for warnings can
> enable them explicitly.
>
> Some of the existing dg-warnings are already due to lack of -w,
> rather than something that the test was originally designed for.
> E.g. pr26565.c.
>
> (2) Make dg-torture.exp add -Wall -Wextra by default, so that tests
> have to suppress any warnings they don't want.
>
> Personally, I'd prefer one of those two rather than patching upstream
> tests for downstream changes.

I don't mind doing the work once we have consensus. (1) feels more pure
but (2) is more progressive and lets us make things error out by default
in future upstream with a bit more freedom.

In the meantime, I'll return to other testsuite bits I have in mind.

thanks,
sam


[PATCH] doc: document all.cross and *.encap make targets

2024-07-21 Thread Etienne Buira
Information was taken from gcc/Makefile.in
---
 gcc/doc/sourcebuild.texi | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 66c4206bfc2..455836a583d 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -712,9 +712,12 @@ standard rule in @file{gcc/Makefile.in} to the variable
 
 @table @code
 @item all.cross
-@itemx start.encap
-@itemx rest.encap
-FIXME: exactly what goes in each of these targets?
+This is what to compile if making a cross-compiler.
+@item start.encap
+Build what must be done before installing GCC and converting libraries.
+@item rest.encap
+Build what must be done before installing GCC and converting libraries
+that cannot be done in @code{start.encap}.
 @item tags
 Build an @command{etags} @file{TAGS} file in the language subdirectory
 in the source tree.
-- 
2.44.2



Re: [PATCH] testsuite: powerpc: fix dg-do run typo

2024-07-21 Thread Kewen.Lin
Hi Sam,

on 2024/7/20 07:10, Sam James wrote:
> "Kewen.Lin"  writes:
> 
>> Hi Sam,
> 
> Hi Kewen,
> 
>>
>> on 2024/7/19 11:28, Sam James wrote:
>>> 'dg-run' is not a valid dejagnu directive, 'dg-do run' is needed here
>>> for the test to be executed.
>>>
>>> 2024-07-18  Sam James  
>>>
>>> PR target/108699
>>> * gcc.target/powerpc/pr108699.c: Fix 'dg-run' typo.
>>> ---
>>> Kewen, could you check this on powerpc to ensure it doesn't execute 
>>> beforehand
>>> and now it does? I could do it on powerpc but I don't have anything setup
>>> right now.
>>
>> Oops, thanks for catching and fixing this stupid typo!  Yes, I just 
>> confirmed that,
>> w/ this fix pr108699.exe gets generated and executed (# of expected passes 
>> is changed
>> from 1 to 2).
> 
> Many thanks! Could you push for me please?

Sure, pushed as r15-2190.

BR,
Kewen

> 
>>
>> BR,
>> Kewen
> 
> best,
> sam
> 
>>
>>>
>>>  gcc/testsuite/gcc.target/powerpc/pr108699.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108699.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr108699.c
>>> index f02bac130cc7..beb8b601fd51 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/pr108699.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108699.c
>>> @@ -1,4 +1,4 @@
>>> -/* { dg-run } */
>>> +/* { dg-do run } */
>>>  /* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
>>>  
>>>  #define N 16
>>>



[PATCHv2, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-07-21 Thread HAO CHEN GUI
Hi,
  This patch adds const0 move checking for CLEAR_BY_PIECES. The original
vec_duplicate handles duplicates of non-constant inputs. But 0 is a
constant. So even if a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports a const0 move in that mode.

  Compared to the previous version, the main change is to do const0
direct move for by-piece clear if the target supports const0 move by
that mode.
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643063.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. There are several regressions on aarch64. They could be
fixed by enhancing const0 move on V2x8QImode. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a
constant.  So even if a platform doesn't support vec_duplicate, it could
still do clear by pieces if it supports a const0 move.  This patch adds
the checking.
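As a rough model of what by-piece clearing amounts to once the const0 move is allowed (a sketch in plain C, not GCC internals; the 16-byte struct stands in for a vector mode whose mov pattern accepts CONST0_RTX):

```c
/* Plain-C model, not GCC internals: clearing a block with wide
   constant-zero stores, roughly the way CLEAR_BY_PIECES expands a
   memset-to-0 once the mode's mov pattern is known to accept const0. */
#include <string.h>
#include <stdint.h>

typedef struct { uint64_t v[2]; } chunk16;   /* stand-in for a vector mode */

void clear_by_pieces_model (unsigned char *p, size_t len)
{
  static const chunk16 zero;          /* analogue of CONST0_RTX (mode) */
  while (len >= sizeof zero)
    {
      memcpy (p, &zero, sizeof zero); /* one wide const0 store */
      p += sizeof zero;
      len -= sizeof zero;
    }
  while (len--)                       /* tail in narrower pieces */
    *p++ = 0;
}
```

Whether the wide store is legal is exactly what the new insn_operand_matches check on CONST0_RTX decides per mode.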

gcc/
* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
for CLEAR_BY_PIECES.
(op_by_pieces_d::run): Pass const0 to do the move if the target
supports direct const0 move by the mode.

patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index fc5e998e329..97764eb9ebe 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1006,14 +1006,21 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
 return false;

-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
   && VECTOR_MODE_P (mode)
   && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
 return false;

+  if (op == CLEAR_BY_PIECES
+  && VECTOR_MODE_P (mode)
+  && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing
+  && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+return false;
+
   if (op == COMPARE_BY_PIECES
   && !can_compare_p (EQ, mode, ccp_jump))
 return false;
@@ -1490,7 +1497,7 @@ op_by_pieces_d::run ()
   do
 {
   unsigned int size = GET_MODE_SIZE (mode);
-  rtx to1 = NULL_RTX, from1;
+  rtx to1 = NULL_RTX, from1 = NULL_RTX;

   while (length >= size)
{
@@ -1500,12 +1507,26 @@ op_by_pieces_d::run ()
  to1 = m_to.adjust (mode, m_offset, &to_prev);
  to_prev.data = to1;
  to_prev.mode = mode;
- from1 = m_from.adjust (mode, m_offset, &from_prev);
- from_prev.data = from1;
- from_prev.mode = mode;

  m_to.maybe_predec (-(HOST_WIDE_INT)size);
- m_from.maybe_predec (-(HOST_WIDE_INT)size);
+
+ /* Pass CONST0_RTX for memory clear when target supports CONST0
+direct move.  */
+ if (m_op == CLEAR_BY_PIECES
+ && VECTOR_MODE_P (mode)
+ && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
+   {
+ enum insn_code icode = optab_handler (mov_optab, mode);
+ if (insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+   from1 = CONST0_RTX (mode);
+   }
+ else
+   {
+ from1 = m_from.adjust (mode, m_offset, &from_prev);
+ from_prev.data = from1;
+ from_prev.mode = mode;
+ m_from.maybe_predec (-(HOST_WIDE_INT)size);
+   }

  generate (to1, from1, mode);



Ping [PATCH-1v4] Value Range: Add range op for builtin isinf

2024-07-21 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html

Thanks
Gui Haochen

在 2024/7/11 15:32, HAO CHEN GUI 写道:
> Hi,
>   The builtin isinf is not folded at front end if the corresponding optab
> exists. It causes the range evaluation failed on the targets which has
> optab_isinf. For instance, range-sincos.c will fail on the targets which
> has optab_isinf as it calls builtin_isinf.
> 
>   This patch fixed the problem by adding range op for builtin isinf. It
> also fixed the issue in PR114678.
> 
>   Compared with previous version, the main change is to remove xfail for
> s390 in range-sincos.c and vrp-float-abs-1.c.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653096.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> Value Range: Add range op for builtin isinf
> 
> The builtin isinf is not folded at the front end if the corresponding optab
> exists.  So the range op for isinf is needed for value range analysis.
> This patch adds range op for builtin isinf.
> 
> gcc/
>   PR target/114678
>   * gimple-range-op.cc (class cfn_isinf): New.
>   (op_cfn_isinf): New variables.
>   (gimple_range_op_handler::maybe_builtin_call): Handle
>   CASE_FLT_FN (BUILT_IN_ISINF).
> 
> gcc/testsuite/
>   PR target/114678
>   * gcc.dg/tree-ssa/range-isinf.c: New test.
>   * gcc.dg/tree-ssa/range-sincos.c: Remove xfail for s390.
>   * gcc.dg/tree-ssa/vrp-float-abs-1.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index a80b93cf063..24559951dd6 100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -1153,6 +1153,63 @@ private:
>bool m_is_pos;
>  } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);
> 
> +// Implement range operator for CFN_BUILT_IN_ISINF
> +class cfn_isinf : public range_operator
> +{
> +public:
> +  using range_operator::fold_range;
> +  using range_operator::op1_range;
> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
> +const irange &, relation_trio) const override
> +  {
> +if (op1.undefined_p ())
> +  return false;
> +
> +if (op1.known_isinf ())
> +  {
> + wide_int one = wi::one (TYPE_PRECISION (type));
> + r.set (type, one, one);
> + return true;
> +  }
> +
> +if (op1.known_isnan ()
> + || (!real_isinf (&op1.lower_bound ())
> + && !real_isinf (&op1.upper_bound (
> +  {
> + r.set_zero (type);
> + return true;
> +  }
> +
> +r.set_varying (type);
> +return true;
> +  }
> +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
> +   const frange &, relation_trio) const override
> +  {
> +if (lhs.undefined_p ())
> +  return false;
> +
> +if (lhs.zero_p ())
> +  {
> + nan_state nan (true);
> + r.set (type, real_min_representable (type),
> +real_max_representable (type), nan);
> + return true;
> +  }
> +
> +if (!range_includes_zero_p (lhs))
> +  {
> + // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
> + // Set range to [-INF,+INF]
> + r.set_varying (type);
> + r.clear_nan ();
> + return true;
> +  }
> +
> +r.set_varying (type);
> +return true;
> +  }
> +} op_cfn_isinf;
> 
>  // Implement range operator for CFN_BUILT_IN_
>  class cfn_parity : public range_operator
> @@ -1246,6 +1303,11 @@ gimple_range_op_handler::maybe_builtin_call ()
>m_operator = &op_cfn_signbit;
>break;
> 
> +CASE_FLT_FN (BUILT_IN_ISINF):
> +  m_op1 = gimple_call_arg (call, 0);
> +  m_operator = &op_cfn_isinf;
> +  break;
> +
>  CASE_CFN_COPYSIGN_ALL:
>m_op1 = gimple_call_arg (call, 0);
>m_op2 = gimple_call_arg (call, 1);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
> new file mode 100644
> index 000..468f1bcf5c7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-evrp" } */
> +
> +#include 
> +void link_error();
> +
> +void
> +test1 (double x)
> +{
> +  if (x > __DBL_MAX__ && !__builtin_isinf (x))
> +link_error ();
> +  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
> +link_error ();
> +}
> +
> +void
> +test2 (float x)
> +{
> +  if (x > __FLT_MAX__ && !__builtin_isinf (x))
> +link_error ();
> +  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
> +link_error ();
> +}
> +
> +void
> +test3 (double x)
> +{
> +  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
> +link_error ();
> +  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
> +link_error ();
> +}
> +
> +void
> +test4 (float x)
> +{
> +  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)

Ping^4 [PATCH-3v2] Value Range: Add range op for builtin isnormal

2024-07-21 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html

Thanks
Gui Haochen

在 2024/7/1 9:12, HAO CHEN GUI 写道:
> Hi,
>   Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
> 
> Thanks
> Gui Haochen
> 
> 在 2024/6/24 9:41, HAO CHEN GUI 写道:
>> Hi,
>>   Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
>>
>> Thanks
>> Gui Haochen
>>
>> 在 2024/6/20 14:58, HAO CHEN GUI 写道:
>>> Hi,
>>>   Gently ping it.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653095.html
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> 在 2024/5/30 10:46, HAO CHEN GUI 写道:
 Hi,
   This patch adds the range op for builtin isnormal. It also adds two
 help function in frange to detect range of normal floating-point and
 range of subnormal or zero.

   Compared to previous version, the main change is to set the range to
 1 if it's normal number otherwise to 0.
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
 regressions. Is it OK for the trunk?

 Thanks
 Gui Haochen

 ChangeLog
 Value Range: Add range op for builtin isnormal

 The former patch adds optab for builtin isnormal. Thus builtin isnormal
 might not be folded at front end.  So the range op for isnormal is needed
 for value range analysis.  This patch adds range op for builtin isnormal.

 gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_finite): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.
* value-range.h (class frange): Declare known_isnormal and
known_isdenormal_or_zero.
(frange::known_isnormal): Define.
(frange::known_isdenormal_or_zero): Define.

 gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c: New test.

 patch.diff
 diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
 index 5ec5c828fa4..6787f532f11 100644
 --- a/gcc/gimple-range-op.cc
 +++ b/gcc/gimple-range-op.cc
 @@ -1289,6 +1289,61 @@ public:
}
  } op_cfn_isfinite;

 +//Implement range operator for CFN_BUILT_IN_ISNORMAL
 +class cfn_isnormal :  public range_operator
 +{
 +public:
 +  using range_operator::fold_range;
 +  using range_operator::op1_range;
 +  virtual bool fold_range (irange &r, tree type, const frange &op1,
 + const irange &, relation_trio) const override
 +  {
 +if (op1.undefined_p ())
 +  return false;
 +
 +if (op1.known_isnormal ())
 +  {
 +  wide_int one = wi::one (TYPE_PRECISION (type));
 +  r.set (type, one, one);
 +  return true;
 +  }
 +
 +if (op1.known_isnan ()
 +  || op1.known_isinf ()
 +  || op1.known_isdenormal_or_zero ())
 +  {
 +  r.set_zero (type);
 +  return true;
 +  }
 +
 +r.set_varying (type);
 +return true;
 +  }
 +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
 +const frange &, relation_trio) const override
 +  {
 +if (lhs.undefined_p ())
 +  return false;
 +
 +if (lhs.zero_p ())
 +  {
 +  r.set_varying (type);
 +  return true;
 +  }
 +
 +if (!range_includes_zero_p (lhs))
 +  {
 +  nan_state nan (false);
 +  r.set (type, real_min_representable (type),
 + real_max_representable (type), nan);
 +  return true;
 +  }
 +
 +r.set_varying (type);
 +return true;
 +  }
 +} op_cfn_isnormal;
 +
  // Implement range operator for CFN_BUILT_IN_
  class cfn_parity : public range_operator
  {
 @@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call ()
m_operator = &op_cfn_isfinite;
break;

 +case CFN_BUILT_IN_ISNORMAL:
 +  m_op1 = gimple_call_arg (call, 0);
 +  m_operator = &op_cfn_isnormal;
 +  break;
 +
  CASE_CFN_COPYSIGN_ALL:
m_op1 = gimple_call_arg (call, 0);
m_op2 = gimple_call_arg (call, 1);
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
 new file mode 100644
 index 000..c4df4d839b0
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
 @@ -0,0 +1,37 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-evrp" } */
 +
 +#include <math.h>
 +void link_error();
 +
 +void test1 (double x)
 +{
 +  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
 +link_error ();
 +
 +  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
 +link_error ();
 +}

Ping^4 [PATCH-2v4] Value Range: Add range op for builtin isfinite

2024-07-21 Thread HAO CHEN GUI
Hi,
  Gently ping it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html

Thanks
Gui Haochen

On 2024/7/1 9:11, HAO CHEN GUI wrote:
> Hi,
>   Gently ping it.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
> 
> Thanks
> Gui Haochen
> 
> On 2024/6/24 9:41, HAO CHEN GUI wrote:
>> Hi,
>>   Gently ping it.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
>>
>> Thanks
>> Gui Haochen
>>
>> On 2024/6/20 14:57, HAO CHEN GUI wrote:
>>> Hi,
>>>   Gently ping it.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653094.html
>>>
>>> Thanks
>>> Gui Haochen
>>>
 On 2024/5/30 10:46, HAO CHEN GUI wrote:
 Hi,
   This patch adds the range op for builtin isfinite.

   Compared to previous version, the main change is to set the range to
 1 if it's finite number otherwise to 0.
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
 regressions. Is it OK for the trunk?

 Thanks
 Gui Haochen

 ChangeLog
 Value Range: Add range op for builtin isfinite

 The former patch adds optab for builtin isfinite. Thus builtin isfinite
 might not be folded at front end.  So the range op for isfinite is needed
 for value range analysis.  This patch adds range op for builtin isfinite.

 gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_finite): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

 gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.

 patch.diff
 diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
 index 4e60a42eaac..5ec5c828fa4 100644
 --- a/gcc/gimple-range-op.cc
 +++ b/gcc/gimple-range-op.cc
 @@ -1233,6 +1233,62 @@ public:
}
  } op_cfn_isinf;

 +//Implement range operator for CFN_BUILT_IN_ISFINITE
 +class cfn_isfinite : public range_operator
 +{
 +public:
 +  using range_operator::fold_range;
 +  using range_operator::op1_range;
 +  virtual bool fold_range (irange &r, tree type, const frange &op1,
 + const irange &, relation_trio) const override
 +  {
 +if (op1.undefined_p ())
 +  return false;
 +
 +if (op1.known_isfinite ())
 +  {
 +  wide_int one = wi::one (TYPE_PRECISION (type));
 +  r.set (type, one, one);
 +  return true;
 +  }
 +
 +if (op1.known_isnan ()
 +  || op1.known_isinf ())
 +  {
 +  r.set_zero (type);
 +  return true;
 +  }
 +
 +r.set_varying (type);
 +return true;
 +  }
 +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
 +const frange &, relation_trio) const override
 +  {
 +if (lhs.undefined_p ())
 +  return false;
 +
 +if (lhs.zero_p ())
 +  {
 +  // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
 +  // Set range to varying
 +  r.set_varying (type);
 +  return true;
 +  }
 +
 +if (!range_includes_zero_p (lhs))
 +  {
 +  nan_state nan (false);
 +  r.set (type, real_min_representable (type),
 + real_max_representable (type), nan);
 +  return true;
 +  }
 +
 +r.set_varying (type);
 +return true;
 +  }
 +} op_cfn_isfinite;
 +
  // Implement range operator for CFN_BUILT_IN_
  class cfn_parity : public range_operator
  {
 @@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call ()
m_operator = &op_cfn_isinf;
break;

 +case CFN_BUILT_IN_ISFINITE:
 +  m_op1 = gimple_call_arg (call, 0);
 +  m_operator = &op_cfn_isfinite;
 +  break;
 +
  CASE_CFN_COPYSIGN_ALL:
m_op1 = gimple_call_arg (call, 0);
m_op2 = gimple_call_arg (call, 1);
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
 new file mode 100644
 index 000..f5dce0a0486
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
 @@ -0,0 +1,31 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-evrp" } */
 +
 +#include <math.h>
 +void link_error();
 +
 +void test1 (double x)
 +{
 +  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
 +link_error ();
 +}
 +
 +void test2 (float x)
 +{
 +  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
 +link_error ();
 +}
 +
 +void test3 (double x)
 +{
 +  if (__builtin_isfinite (x) && __builtin_isinf (x))
 +link_error ();
 +}
 +
 +void test4 (float x)
 +{
 +  if (__builtin_isfinite (x) && __

[Bug fortran/59104] 15 Regression - Wrong result with SIZE specification expression

2024-07-21 Thread Paul Richard Thomas
After an OK from Harald,

commit r15-2187-g838999bb23303edc14e96b6034cd837fa4454cfd
Author: Paul Thomas 
Date:   Sun Jul 21 17:48:47 2024 +0100

Fortran: Fix regression caused by r14-10477 [PR59104]

2024-07-21  Paul Thomas  

gcc/fortran
PR fortran/59104
* gfortran.h : Add decl_order to gfc_symbol.
* symbol.cc : Add static next_decl_order.
(gfc_set_sym_referenced): Set symbol decl_order.
* trans-decl.cc : Include dependency.h.
(decl_order): Replace symbol declared_at.lb->location with
decl_order.

gcc/testsuite/
PR fortran/59104
* gfortran.dg/dependent_decls_3.f90: New test.



Re: [PATCH] LoongArch: Implement scalar isinf, isnormal, and isfinite via fclass

2024-07-21 Thread Andrew Pinski
On Sun, Jul 21, 2024 at 3:57 AM Xi Ruoyao  wrote:
>
> On Mon, 2024-07-15 at 15:53 +0800, Lulu Cheng wrote:
> > Hi,
> >
> > g++.dg/opt/pr107569.C and range-sincos.c vrp-float-abs-1.c is the same
> > issue, right?
> >
> > And I have no objection to code modifications. But I think it's better
> > to wait until this builtin
> >
> > function is fixed.
>
> Oops https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656937.html
> won't be enough for pr107569.C.  For pr107569.C I guess we need to add
> range ops for __builtin_isfinite but the patch only handles
> __builtin_isinf.

There is a patch for that; all 3 were pinged this morning:
isinf: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657879.html
isnormal: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657880.html
isfinite: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657881.html

Thanks,
Andrew Pinski


>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


[PATCH] i386: Change prefetchi output template

2024-07-21 Thread Haochen Jiang
Hi all,

For prefetchi instructions, RIP-relative address is explicitly mentioned
for operand and assembler obeys that rule strictly. This makes
instruction like:

prefetchit0 bar

got illegal for assembler, which should be a broad usage for prefetchi.

Explicitly add (%rip) after function label to make it legal in
assembler so that it could pass to linker to get the real address.

Ok for trunk and backport to GCC14 and GCC13 since prefetchi instructions
are introduced in GCC13?

Thx,
Haochen

gcc/ChangeLog:

* config/i386/i386.md (prefetchi): Add explicit (%rip) after
function label.

gcc/testsuite/ChangeLog:

* gcc.target/i386/prefetchi-1.c: Check (%rip).
---
 gcc/config/i386/i386.md | 2 +-
 gcc/testsuite/gcc.target/i386/prefetchi-1.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 90d3aa450f0..3ec51bad6fe 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -28004,7 +28004,7 @@
   "TARGET_PREFETCHI && TARGET_64BIT"
 {
   static const char * const patterns[2] = {
-"prefetchit1\t%0", "prefetchit0\t%0"
+"prefetchit1\t{%p0(%%rip)|%p0[rip]}", "prefetchit0\t{%p0(%%rip)|%p0[rip]}"
   };
 
   int locality = INTVAL (operands[1]);
diff --git a/gcc/testsuite/gcc.target/i386/prefetchi-1.c 
b/gcc/testsuite/gcc.target/i386/prefetchi-1.c
index 80f25e70e8e..03dfdc55e86 100644
--- a/gcc/testsuite/gcc.target/i386/prefetchi-1.c
+++ b/gcc/testsuite/gcc.target/i386/prefetchi-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-mprefetchi -O2" } */
-/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit0\[ \\t\]+" 2 } } */
-/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit1\[ \\t\]+" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit0\[ 
\\t\]+bar\\(%rip\\)" 2 } } */
+/* { dg-final { scan-assembler-times "\[ \\t\]+prefetchit1\[ 
\\t\]+bar\\(%rip\\)" 2 } } */
 
 #include <x86intrin.h>
 
-- 
2.31.1



[PATCH] regrename: Skip renaming register pairs [PR115860]

2024-07-21 Thread Stefan Schulze Frielinghaus
It is not trivial to decide when a write of a register pair terminates
or starts a new chain.  For example, prior regrename we have

(insn 91 38 36 5 (set (reg:FPRX2 16 %f0 [orig:76 x ] [76])
(const_double:FPRX2 0.0 [0x0.0p+0])) 
"float-cast-overflow-7-reduced.c":5:55 discrim 2 1507 {*movfprx2_64}
 (expr_list:REG_EQUAL (const_double:FPRX2 0.0 [0x0.0p+0])
(nil)))
(insn 36 91 37 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 0)
(mem/c:DF (plus:DI (reg/f:DI 15 %r15)
(const_int 160 [0xa0])) [7 %sfp+-32 S8 A64])) 
"float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
 (nil))
(insn 37 36 43 5 (set (subreg:DF (reg:FPRX2 16 %f0 [orig:76 x ] [76]) 8)
(mem/c:DF (plus:DI (reg/f:DI 15 %r15)
(const_int 168 [0xa8])) [7 %sfp+-24 S8 A64])) 
"float-cast-overflow-7-reduced.c":5:55 discrim 2 1512 {*movdf_64dfp}
 (nil))

where insn 91 writes both registers of a register pair and it is clear
that an existing chain must be terminated and a new started.  Insn 36
and 37 write only into one register of a corresponding register pair.
For each write on its own it is not obvious when to terminate an
existing chain and to start a new one.  In other words, once insn 36
materializes and 37 didn't we are kind of in a limbo state.  Tracking
this correctly is inherently hard and I'm not entirely sure whether
optimizations could even lead to more complicated cases where it is even
less clear when a chain terminates and a new has to be started.
Therefore, skip renaming of register pairs.

Bootstrapped and regtested on x86_64, aarch64, powerpc64le, and s390.
Ok for mainline?

This fixes on s390:
FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++23 execution test
FAIL: g++.dg/cpp23/ext-floating14.C  -std=gnu++26 execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-7.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -O3 -g  execution 
test
FAIL: gcc.dg/torture/fp-int-convert-float128-ieee-timode.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-float64x-timode.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O0  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O1  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -O3 -g  execution test
FAIL: gcc.dg/torture/fp-int-convert-timode.c   -Os  execution test
FAIL: gfortran.dg/pr96711.f90   -O0  execution test
FAIL: TestSignalForwardingExternal
FAIL: go test misc/cgo/testcarchive
FAIL: libffi.closures/nested_struct5.c -W -Wall -Wno-psabi -O2 output pattern 
test
FAIL: libphobos.phobos/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos/std/conv.d execution test
FAIL: libphobos.phobos/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos/std/variant.d execution test
FAIL: libphobos.phobos_shared/std/algorithm/mutation.d execution test
FAIL: libphobos.phobos_shared/std/conv.d execution test
FAIL: libphobos.phobos_shared/std/internal/math/errorfunction.d execution test
FAIL: libphobos.phobos_shared/std/variant.d execution test

gcc/ChangeLog:

PR rtl-optimization/115860
* regrename.cc (scan_rtx_reg): Do not try to rename register
pairs.
---
 gcc/regrename.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/regrename.cc b/gcc/regrename.cc
index 054e601740b..6ae5a2309d0 100644
---

[PATCH,c++,wwwdocs] bugs: Remove old "export" non-bug

2024-07-21 Thread Gerald Pfeifer
We have been carrying this note on the "original" export feature for ages, 
and I believe it's not actually a FAQ, if it ever was.

Jonathan moved this down when adding a note on ADL last fall.

I now propose to drop it.

Thoughts?

Gerald



diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index 40355911..7f2f485c 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -622,17 +622,6 @@ and the scope operator, or compile using C++11 or later. 
Defect report 1104
 changed the parser rules so that <:: works as expected.
 
 
-export
-Most C++ compilers (G++ included) never implemented C++98
-export, which was removed in C++11, and the keyword reused in
-C++20 by the Modules feature. The C++98 feature was intended to support
-separate compilation of template declarations and
-definitions. Without export, a template definition must be in
-scope to be used. The obvious workaround is simply to place all definitions in
-the header itself. Alternatively, the compilation unit containing template
-definitions may be included from the header.
-
-
 Common problems when upgrading the compiler
 
 ABI changes


Re: [PATCH] Reduce iteration counts of tsvc tests

2024-07-21 Thread Richard Biener
On Fri, Jul 19, 2024 at 4:25 AM Joern Wolfgang Rennecke
 wrote:
>
> As discussed before on gcc@gcc,gnu.org, this patch reduces the iteration
> counts of the tsvc tests to avoid timeouts when using simulators.
> A few tests needed special attention because they divided "iterations"
> by some constant, so putting 10 in there would lead to zero iteration
> count, and thus the to-be-vectorized code removed.  For nine of these
> files, that was a simple adjustment of iterations to 256 (AKA LEN_2D),
> but vect-tsvc-s176.c needed 3200 to avoid a zero outer loop iteration
> count, and then it took to long on a simulator, so I curtailed the inner
> loop unless run_expensive_tests is set; I targeted the inner loop
> because it already had a variable as the loop end bound, and it was just
> a matter of adjusting that variable.
>
> Regression tested in 9846b0916c1a9b9f3e9df4657670ef4419617134 on
> x86_64-pc-linux-gnu (--disable-multilibs) by running
> make check-gcc 'RUNTESTFLAGS=vect.exp' -j32
> and comparing gcc.sum without and with this patch.

OK.

Richard.