[PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-07-28 Thread Richard Biener via Gcc-patches
The following delays sinking of loads within the same innermost
loop when they were unconditional before sinking.  That is a not
uncommon issue preventing vectorization when masked loads are not
available.
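
As a minimal illustration (a hypothetical reduction, not the testcase from
the patch), the transform being delayed turns

  for (int i = 0; i < n; i++)
    {
      int tem = a[i];   /* load executed on every iteration */
      if (c[i])
        b[i] = tem + 1;
    }

into a loop where the load of a[i] sits inside the guarded block; the load
is then conditional and the loop only vectorizes on targets with masked
loads.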

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I have a followup patch improving sinking that, without this change,
would cause more of the problematic sinking - now that we have a second
sink pass after the loop optimizations this looks like a reasonable
approach?

OK?

Thanks,
Richard.

PR tree-optimization/92335
* tree-ssa-sink.cc (select_best_block): Before loop
optimizations avoid sinking unconditional loads/stores
in innermost loops to conditionally executed places.

* gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing.
* gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c,
expect predictive commoning to happen instead of sinking.
* gcc.dg/vect/pr65947-3.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c   | 20 ++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c |  2 +-
 gcc/testsuite/gcc.dg/vect/pr65947-3.c       |  6 +-----
 gcc/tree-ssa-sink.cc                        | 12 ++++++++++++
 4 files changed, 34 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c b/gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c
new file mode 100644
index 000..b0fb0e2d4c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-details -fdump-tree-pcom-details" } */
+
+int x[1024], y[1024], z[1024], w[1024];
+void foo (void)
+{
+  int i;
+  for (i = 1; i < 1024; ++i)
+    {
+      int a = x[i];
+      int b = y[i];
+      int c = x[i-1];
+      int d = y[i-1];
+      if (w[i])
+	z[i] = (a + b) + (c + d);
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Sinking # VUSE" "sink1" } } */
+/* { dg-final { scan-tree-dump "Executing predictive commoning without 
unrolling" "pcom" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c
index 535cb3208f5..a35014be038 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-sink-details -fno-tree-pre" } */
+/* { dg-options "-O2 -fdump-tree-sink-details -fno-tree-vectorize -fno-tree-pre" } */
 
 int x[1024], y[1024], z[1024], w[1024];
 void foo (void)
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-3.c b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
index f1bfad65c22..6b4077e1a62 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
@@ -51,10 +51,6 @@ main (void)
   return 0;
 }
 
-/* Since the fix for PR97307 which sinks the load of a[i], preventing
-   if-conversion to happen, targets that cannot do masked loads only
-   vectorize the inline copy.  */
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target vect_masked_load } } } */
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { ! vect_masked_load } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index cf0a32a954b..dcbe05b3b03 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -220,6 +220,18 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
 return best_bb;
 
+  /* Avoid turning an unconditional load/store into a conditional one when we
+ still might want to perform vectorization.  */
+  if (best_bb->loop_father == early_bb->loop_father
+  && loop_outer (best_bb->loop_father)
+  && !best_bb->loop_father->inner
+  && gimple_vuse (stmt)
+  && flag_tree_loop_vectorize
+  && !(cfun->curr_properties & PROP_loop_opts_done)
+  && dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, early_bb)
+  && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
+return early_bb;
+
   /* Get the sinking threshold.  If the statement to be moved has memory
  operands, then increase the threshold by 7% as those are even more
  profitable to avoid, clamping at 100%.  */
-- 
2.35.3


[PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-07-28 Thread Juzhe-Zhong
Hi, Richard and Richi.

Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch chooses approach (1) that Richard provided, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

This approach makes the code much cleaner and more reasonable.

Consider the following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + a[i];
}


Output of RISC-V (32-bit) gcc (trunk):
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, vect__6.13_55, _68, 0);
...

Both the len and mask operands of COND_LEN_ADD are real, not dummy.
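
For reference, a hedged sketch of how vectorizable_call can use the new
get_len_internal_fn query added by this patch (variable names here are
illustrative, not taken from the patch):

  internal_fn cond_fn = get_conditional_internal_fn (ifn);
  internal_fn cond_len_fn = get_len_internal_fn (cond_fn);
  if (cond_len_fn != IFN_LAST
      && direct_internal_fn_supported_p (cond_len_fn, vectype_out,
                                         OPTIMIZE_FOR_SPEED))
    {
      /* Emit a call to cond_len_fn with a real mask and real len/bias
         operands, as in the RVV IR above.  */
    }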

This patch has been fully tested on the RISC-V port with support for both
COND_* and COND_LEN_*.

Bootstrap and regression testing on x86 also passed.

OK for trunk?

gcc/ChangeLog:

* internal-fn.cc (FOR_EACH_LEN_FN_PAIR): New macro.
(get_len_internal_fn): New function.
(CASE): Ditto.
* internal-fn.h (get_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_call): Support CALL vectorization 
with COND_LEN_*.

---
 gcc/internal-fn.cc     | 46 ++++++++++++++++++++++++++
 gcc/internal-fn.h      |  1 +
 gcc/tree-vect-stmts.cc | 87 +++++++++++++++++++++++++++++++++++++---------
 3 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8e294286388..379220bebc7 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4443,6 +4443,52 @@ get_conditional_internal_fn (internal_fn fn)
 }
 }
 
+/* Invoke T(IFN) for each internal function IFN that also has an
+   IFN_COND_LEN_* or IFN_MASK_LEN_* form.  */
+#define FOR_EACH_LEN_FN_PAIR(T)                                                \
+  T (MASK_LOAD, MASK_LEN_LOAD)                                                 \
+  T (MASK_STORE, MASK_LEN_STORE)                                               \
+  T (MASK_GATHER_LOAD, MASK_LEN_GATHER_LOAD)                                   \
+  T (MASK_SCATTER_STORE, MASK_LEN_SCATTER_STORE)                               \
+  T (COND_ADD, COND_LEN_ADD)                                                   \
+  T (COND_SUB, COND_LEN_SUB)                                                   \
+  T (COND_MUL, COND_LEN_MUL)                                                   \
+  T (COND_DIV, COND_LEN_DIV)                                                   \
+  T (COND_MOD, COND_LEN_MOD)                                                   \
+  T (COND_RDIV, COND_LEN_RDIV)                                                 \
+  T (COND_FMIN, COND_LEN_FMIN)                                                 \
+  T (COND_FMAX, COND_LEN_FMAX)                                                 \
+  T (COND_MIN, COND_LEN_MIN)                                                   \
+  T (COND_MAX, COND_LEN_MAX)                                                   \
+  T (COND_AND, COND_LEN_AND)                                                   \
+  T (COND_IOR, COND_LEN_IOR)                                                   \
+  T (COND_XOR, COND_LEN_XOR)                                                   \
+  T (COND_SHL, COND_LEN_SHL)                                                   \
+  T (COND_SHR, COND_LEN_SHR)                                                   \
+  T (COND_NEG, COND_LEN_NEG)                                                   \
+  T (COND_FMA, COND_LEN_FMA)                                                   \
+  T (COND_FMS, COND_LEN_FMS)                                                   \
+  T (COND_FNMA, COND_LEN_FNMA)                                                 \
+  T (COND_FNMS, COND_LEN_FNMS)
+
+/* If there exists an internal function like IFN that operates on vectors,
+   but with additional length and bias parameters, return the internal_fn
+   for that function, otherwise return IFN_LAST.  */
+internal_fn
+get_len_internal_fn (internal_fn fn)
+{
+  switch (fn)
+    {
+#define CASE(NAME, LEN_NAME)                                                   \
+  case IFN_##NAME:                                                             \
+    return IFN_##LEN_NAME;
+      FOR_EACH_LEN_FN_PAIR (CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* If IFN implements the conditional form of an unconditional internal
   function, return that unconditional function, otherwise return IFN_LAST.  */

[PATCH] RISC-V: Support CALL conditional autovec patterns

2023-07-28 Thread Juzhe-Zhong
This patch is depending on middle-end support:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625696.html

Consider this following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + a[i];
}

Before this patch (**NO** -ffast-math):
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

After this patch:
foo:
ble a3,zero,.L5
mv  a6,a0
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a7,zero,e32,m1,ta,ma
slli    a4,a5,2
vmsne.vi        v0,v0,0
sub     a3,a3,a5
vsetvli zero,a5,e32,m1,tu,mu    --> must be TUMU
vle32.v v2,0(a0),v0.t
vle32.v v1,0(a1),v0.t
vfadd.vv        v1,v1,v2,v0.t   --> generated by COND_LEN_ADD with real mask and len.
vse32.v v1,0(a6),v0.t
add a2,a2,a4
add a1,a1,a4
add a0,a0,a4
add a6,a6,a4
bne a3,zero,.L3
.L5:
ret

gcc/ChangeLog:

* config/riscv/autovec.md (cond_): New pattern.
(cond_len_): Ditto.
(cond_fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New macro.
* config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): New function.
(emit_nonvlmax_tumu_insn): Ditto.
(emit_nonvlmax_fp_tumu_insn): Ditto.
(expand_cond_len_binop): Add conditional len patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add cond tests.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_

RE: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]

2023-07-28 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

From: Kito Cheng 
Sent: Friday, July 28, 2023 2:46 PM
To: 钟居哲 
Cc: Li Xu ; gcc-patches ; 
palmer ; Li, Pan2 
Subject: Re: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]

I didn't check with the RVV intrinsic spec, but I assume this was found
during testing with the API tests, so LGTM. Thanks for fixing this :)

juzhe.zh...@rivai.ai  wrote on Friday, July 28, 2023 at 14:43:
Thanks for fixing it.
LGTM from my side.



juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-07-28 13:52
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; pan2.li; xuli
Subject: [PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]
From: xuli <xu...@eswincomputing.com>

The computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need
the rounding mode; therefore the intrinsics for these instructions do not
have a parameter for rounding-mode control.
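
As a hedged illustration (the intrinsic spelling follows the
rvv-intrinsic-doc naming convention and is not part of this patch), the
saturating forms take no vxrm argument:

  #include <riscv_vector.h>

  vint32m1_t
  saturating_add (vint32m1_t a, vint32m1_t b, size_t vl)
  {
    /* vsadd saturates on overflow instead of rounding, so there is no
       rounding-mode (vxrm) parameter.  */
    return __riscv_vsadd_vv_i32m1 (a, b, vl);
  }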

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode of
vsadd[u] and vssub[u].
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-12.C: Adapt testcase.
* g++.target/riscv/rvv/base/bug-14.C: Ditto.
* g++.target/riscv/rvv/base/bug-18.C: Ditto.
* g++.target/riscv/rvv/base/bug-19.C: Ditto.
* g++.target/riscv/rvv/base/bug-20.C: Ditto.
* g++.target/riscv/rvv/base/bug-21.C: Ditto.
* g++.target/riscv/rvv/base/bug-22.C: Ditto.
* g++.target/riscv/rvv/base/bug-23.C: Ditto.
* g++.target/riscv/rvv/base/bug-3.C: Ditto.
* g++.target/riscv/rvv/base/bug-8.C: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
* gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test.
* gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  6 --
gcc/config/riscv/vector.md| 42 +++---
.../g++.target/riscv/rvv/base/bug-12.C|  2 +-
.../g++.target/riscv/rvv/base/bug-14.C|  2 +-
.../g++.target/riscv/rvv/base/bug-18.C|  2 +-
.../g++.target/riscv/rvv/base/bug-19.C|  2 +-
.../g++.target/riscv/rvv/base/bug-20.C|  2 +-
.../g++.target/riscv/rvv/base/bug-21.C|  2 +-
.../g++.target/riscv/rvv/base/bug-22.C|  2 +-
.../g++.target/riscv/rvv/base/bug-23.C|  2 +-
.../g++.target/riscv/rvv/base/bug-3.C |  2 +-
.../g++.target/riscv/rvv/base/bug-8.C |  2 +-
.../riscv/rvv/base/binop_vx_constraint-100.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-101.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-102.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-103.c  | 28 +++
.../riscv/rvv/base/binop_vx_constraint-104.c  | 16 ++--
.../riscv/rvv/base/binop_vx_constraint-105.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-106.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-107.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-108.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-109.c  | 28 +++
.../riscv/rvv/base/binop_vx_constraint-110.c  | 16 ++--
.../riscv/rvv/base/binop_vx_constraint-111.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-112.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-113.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-114.c  |  4 +-
.../riscv/rvv/base/binop_vx_constraint-115.c  | 16 ++--
.../risc

[COMMITTED] ada: Improve defense against illegal code in check for infinite loops

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

Fix crash occurring when attribute System'To_Address is used without
a WITH clause for package System.

gcc/ada/

* sem_warn.adb (Check_Infinite_Loop_Warning): Don't look at the type of
actual parameter when it has no type at all, e.g. because the entire
subprogram call is illegal.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_warn.adb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_warn.adb b/gcc/ada/sem_warn.adb
index 5dd7c17d4e2..7ecb4d9c4a6 100644
--- a/gcc/ada/sem_warn.adb
+++ b/gcc/ada/sem_warn.adb
@@ -591,7 +591,9 @@ package body Sem_Warn is
 begin
Actual := First_Actual (N);
while Present (Actual) loop
-  if Is_Access_Subprogram_Type (Etype (Actual)) then
+  if No (Etype (Actual))
+or else Is_Access_Subprogram_Type (Etype (Actual))
+  then
  return Abandon;
   else
  Next_Actual (Actual);
-- 
2.40.0



[COMMITTED] ada: Allow calls to Number_Formals when no formals are present

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

It is much simpler and safer for the routine Number_Formals to accept
subprogram entities that have no formals.

gcc/ada/

* einfo-utils.adb (Number_Formals): Change types in body.
* einfo-utils.ads (Number_Formals): Change type in spec.
* einfo.ads (Number_Formals): Change type in comment.
* sem_ch13.adb (Is_Property_Function): Fix style in a caller of
Number_Formals that was likely to crash because of missing guards.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/einfo-utils.adb | 4 ++--
 gcc/ada/einfo-utils.ads | 2 +-
 gcc/ada/einfo.ads   | 2 +-
 gcc/ada/sem_ch13.adb| 6 +-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb
index dad3a654743..7fe517124d9 100644
--- a/gcc/ada/einfo-utils.adb
+++ b/gcc/ada/einfo-utils.adb
@@ -2105,8 +2105,8 @@ package body Einfo.Utils is
    --------------------
    -- Number_Formals --
    --------------------
 
-   function Number_Formals (Id : E) return Pos is
-  N  : Int;
+   function Number_Formals (Id : E) return Nat is
+  N  : Nat;
   Formal : Entity_Id;
 
begin
diff --git a/gcc/ada/einfo-utils.ads b/gcc/ada/einfo-utils.ads
index fee771c20f4..20ca470d7ac 100644
--- a/gcc/ada/einfo-utils.ads
+++ b/gcc/ada/einfo-utils.ads
@@ -227,7 +227,7 @@ package Einfo.Utils is
function Next_Stored_Discriminant (Id : E) return Entity_Id;
function Number_Dimensions (Id : E) return Pos;
function Number_Entries (Id : E) return Nat;
-   function Number_Formals (Id : E) return Pos;
+   function Number_Formals (Id : E) return Nat;
function Object_Size_Clause (Id : E) return Node_Id;
function Parameter_Mode (Id : E) return Formal_Kind;
function Partial_Refinement_Constituents (Id : E) return L;
diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
index b356b76f0de..d7690d9f88a 100644
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -3832,7 +3832,7 @@ package Einfo is
 
 --Number_Formals (synthesized)
 --   Applies to subprograms and subprogram types. Yields the number of
---   formals as a value of type Pos.
+--   formals as a value of type Nat.
 
 --Object_Size_Clause (synthesized)
 --   Applies to entities for types and subtypes. If an object size clause
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 585c0f33d8b..7cd0800a56c 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -16544,7 +16544,11 @@ package body Sem_Ch13 is
 It : Interp;
 
 function Is_Property_Function (E : Entity_Id) return Boolean;
---  Implements RM 7.3.4 definition of "property function".
+--  Implements RM 7.3.4 definition of "property function"
+
+      --------------------------
+      -- Is_Property_Function --
+      --------------------------
 
 function Is_Property_Function (E : Entity_Id) return Boolean is
 begin
-- 
2.40.0



[COMMITTED] ada: Small refactor

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

gcc/ada/

* exp_util.adb (Find_Optional_Prim_Op): Use "No" instead of "= Empty".

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 66e1acbf65e..9f843d6d71e 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -6291,8 +6291,9 @@ package body Exp_Util is
 
   Typ := Underlying_Type (Typ);
 
-  --  We cannot find the operation if there is no full view available.
-  if Typ = Empty then
+  --  We cannot find the operation if there is no full view available
+
+  if No (Typ) then
  return Empty;
   end if;
 
-- 
2.40.0



[COMMITTED] ada: Add guard for detection of class-wide precondition subprograms

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

When skipping check on subprograms built for class-wide preconditions
we must deal with the current scope not being a subprogram, e.g. it
could be a declare-block.

gcc/ada/

* sem_res.adb (Resolve_Actuals): Add guard for the call to
Class_Preconditions_Subprogram.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_res.adb | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index 2c8efec524b..d3a0192fb09 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -5146,7 +5146,10 @@ package body Sem_Res is
 if Is_EVF_Expression (A)
   and then Extensions_Visible_Status (Nam) =
Extensions_Visible_True
-  and then No (Class_Preconditions_Subprogram (Current_Scope))
+  and then not
+   (Is_Subprogram (Current_Scope)
+  and then
+Present (Class_Preconditions_Subprogram (Current_Scope)))
 then
Error_Msg_N
  ("formal parameter cannot act as actual parameter when "
-- 
2.40.0



[COMMITTED] ada: Elide the copy in extended returns for nonlimited by-reference types

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/trans.cc (gnat_to_gnu): Restrict previous change to
the case where the simple return statement has got no storage pool.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index fd85facaf70..5d93060c6d8 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -8451,8 +8451,8 @@ gnat_to_gnu (Node_Id gnat_node)
 
5. If this is a reference to an unconstrained array which is used either
  as the prefix of an attribute reference that requires an lvalue or in
- a return statement, then return the result unmodified because we want
- to return the original bounds.
+ a return statement without storage pool, return the result unmodified
+ because we want to return the original bounds.
 
6. Finally, if the type of the result is already correct.  */
 
@@ -8518,7 +8518,8 @@ gnat_to_gnu (Node_Id gnat_node)
   && Present (Parent (gnat_node))
   && ((Nkind (Parent (gnat_node)) == N_Attribute_Reference
&& lvalue_required_for_attribute_p (Parent (gnat_node)))
-  || Nkind (Parent (gnat_node)) == N_Simple_Return_Statement))
+  || (Nkind (Parent (gnat_node)) == N_Simple_Return_Statement
+  && No (Storage_Pool (gnat_node)))))
 ;
 
   else if (TREE_TYPE (gnu_result) != gnu_result_type)
-- 
2.40.0



[COMMITTED] ada: Fix typo in comment of Ada.Exceptions.Save_Occurrence

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

Minor typo in comment.

gcc/ada/

* libgnat/a-except.ads (Save_Occurrence): Fix typo.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-except.ads | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/libgnat/a-except.ads b/gcc/ada/libgnat/a-except.ads
index 5583bf5504d..d618f78e97a 100644
--- a/gcc/ada/libgnat/a-except.ads
+++ b/gcc/ada/libgnat/a-except.ads
@@ -120,7 +120,7 @@ package Ada.Exceptions is
 
--  Ada 2005 (AI-438): The language revision introduces the following
--  subprograms and attribute definitions. We do not provide them
-   --  explicitly. instead, the corresponding stream attributes are made
+   --  explicitly. Instead, the corresponding stream attributes are made
--  available through a pragma Stream_Convert in the private part.
 
--  procedure Read_Exception_Occurrence
-- 
2.40.0



[COMMITTED] ada: Add missing SCO generation for quantified expressions in object decl

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Léo Creuse 

This change corrects the Has_Decision predicate in par_sco.adb to
properly consider predicates of quantified expressions as
decisions.

gcc/ada/

* par_sco.adb (Has_Decision): Consider that quantified expressions
contain decisions.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/par_sco.adb | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb
index c3aa2a5936e..ce7de7f3d79 100644
--- a/gcc/ada/par_sco.adb
+++ b/gcc/ada/par_sco.adb
@@ -398,7 +398,8 @@ package body Par_SCO is
   function Check_Node (N : Node_Id) return Traverse_Result;
   --  Determine if Nkind (N) indicates the presence of a decision (i.e. N
   --  is a logical operator, which is a decision in itself, or an
-  --  IF-expression whose Condition attribute is a decision).
+  --  IF-expression whose Condition attribute is a decision, or a
+  --  quantified expression, whose predicate is a decision).
 
   
      ----------------
      -- Check_Node --
      ----------------
@@ -409,10 +410,11 @@ package body Par_SCO is
  --  If we are not sure this is a logical operator (AND and OR may be
  --  turned into logical operators with the Short_Circuit_And_Or
  --  pragma), assume it is. Putative decisions will be discarded if
- --  needed in the secord pass.
+ --  needed in the second pass.
 
  if Is_Logical_Operator (N) /= False
or else Nkind (N) = N_If_Expression
+   or else Nkind (N) = N_Quantified_Expression
  then
 return Abandon;
  else
-- 
2.40.0



[COMMITTED] ada: Fix race condition in protected entry call

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

This patch only affects the single-entry implementation of protected
objects.

Before this patch, there was a race condition where a task that
called an entry could put itself to sleep right after another task
had executed the entry as a proxy and signalled the not-yet-waiting
first task, which caused the first task to enter a deadlock.

Note that this race condition has been identified and fixed before
for the implementations of the run-time that live under hie/.

This patch reworks the locking sequence so that it is closer to the
one that's used in the multiple-entry implementation of protected
objects. The code for the multiple-entry implementation is spread
across multiple subprograms. To draw a parallel with the section
this patch modifies, one can read the following subprograms:

- System.Tasking.Protected_Objects.Operations.Protected_Entry_Call
- System.Tasking.Entry_Calls.Wait_For_Completion
- System.Tasking.Entry_Calls.Check_Pending_Actions_For_Entry_Call

This patch also adds a comment that explicitly states the locking
constraint that must hold in the affected section.

gcc/ada/

* libgnarl/s-tposen.adb: Fix race condition. Add comment to justify
the locking timing.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-tposen.adb | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnarl/s-tposen.adb b/gcc/ada/libgnarl/s-tposen.adb
index 9dff6619295..a7447b9e2af 100644
--- a/gcc/ada/libgnarl/s-tposen.adb
+++ b/gcc/ada/libgnarl/s-tposen.adb
@@ -345,11 +345,17 @@ package body System.Tasking.Protected_Objects.Single_Entry is
 
   pragma Assert (Entry_Call.State /= Cancelled);
 
+  --  Note that we need to acquire Self_Id's lock before checking the value
+  --  of Entry_Call.State, even though the latter is specified as atomic
+  --  with a pragma. If we didn't, another task could execute the entry on
+  --  our behalf right between the check of Entry_Call.State and the call
+  --  to Wait_For_Completion, and that would cause a deadlock.
+
+  STPO.Write_Lock (Self_Id);
   if Entry_Call.State /= Done then
- STPO.Write_Lock (Self_Id);
  Wait_For_Completion (Entry_Call'Access);
- STPO.Unlock (Self_Id);
   end if;
+  STPO.Unlock (Self_Id);
 
   Check_Exception (Self_Id, Entry_Call'Access);
end Protected_Single_Entry_Call;
-- 
2.40.0



[COMMITTED] ada: Emit enums rather than defines for various constants

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Tom Tromey 

This patch changes xsnamest and gen_il-gen to emit various constants
as enums rather than a sequence of preprocessor defines.  This enables
better debugging and somewhat better type safety.

gcc/ada/

* fe.h (Convention): Now inline function.
* gen_il-gen.adb (Put_C_Type_And_Subtypes.Put_Enum_Lit)
(Put_C_Type_And_Subtypes.Put_Kind_Subtype, Put_C_Getter):
Emit enum.
* snames.h-tmpl (Name_Id, Name_, Attribute_Id, Attribute_)
(Convention_Id, Convention_, Pragma_Id, Pragma_): Now enum.
(Get_Attribute_Id, Get_Pragma_Id): Now inline functions.
* types.h (Node_Kind, Entity_Kind, Convention_Id, Name_Id):
Now enum.
* xsnamest.adb (Output_Header_Line, Make_Value): Emit enum.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/fe.h   |  8 --
 gcc/ada/gen_il-gen.adb | 11 ++---
 gcc/ada/snames.h-tmpl  | 56 +-
 gcc/ada/types.h|  8 +++---
 gcc/ada/xsnamest.adb   | 30 +-
 5 files changed, 69 insertions(+), 44 deletions(-)

diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h
index f283064c728..ca77f433cfa 100644
--- a/gcc/ada/fe.h
+++ b/gcc/ada/fe.h
@@ -683,8 +683,12 @@ Entity_Kind Parameter_Mode (E Id);
 // The following is needed because Convention in Sem_Util is a renaming
 // of Basic_Convention.
 
-#define Convention einfo__entities__basic_convention
-Convention_Id Convention (N Node);
+static inline Convention_Id
+Convention (N Node)
+{
+  extern Byte einfo__entities__basic_convention (N Node);
+  return (Convention_Id) einfo__entities__basic_convention (Node);
+}
 
 // See comments regarding Entity_Or_Associated_Node in Sinfo.Utils.
 
diff --git a/gcc/ada/gen_il-gen.adb b/gcc/ada/gen_il-gen.adb
index bf760f3d917..1cee17caf76 100644
--- a/gcc/ada/gen_il-gen.adb
+++ b/gcc/ada/gen_il-gen.adb
@@ -2957,9 +2957,9 @@ package body Gen_IL.Gen is
  --  Current Node_Kind'Pos or Entity_Kind'Pos to be printed
 
  procedure Put_Enum_Lit (T : Node_Or_Entity_Type);
- --  Print out the #define corresponding to the Ada enumeration literal
+ --  Print out the enumerator corresponding to the Ada enumeration literal
  --  for T in Node_Kind and Entity_Kind (i.e. concrete types).
- --  This looks like "#define Some_Kind <pos>", where Some_Kind
+ --  This looks like "Some_Kind = <pos>", where Some_Kind
  --  is the Node_Kind or Entity_Kind enumeration literal, and
  --  <pos> is Node_Kind'Pos or Entity_Kind'Pos of that literal.
 
@@ -2970,7 +2970,7 @@ package body Gen_IL.Gen is
  procedure Put_Enum_Lit (T : Node_Or_Entity_Type) is
  begin
 if T in Concrete_Type then
-   Put (S, "#define " & Image (T) & " " & Image (Cur_Pos) & LF);
+   Put (S, "  " & Image (T) & " = " & Image (Cur_Pos) & "," & LF);
Cur_Pos := Cur_Pos + 1;
 end if;
  end Put_Enum_Lit;
@@ -2990,7 +2990,9 @@ package body Gen_IL.Gen is
   begin
  Put_Union_Membership (S, Root, Only_Prototypes => True);
 
+ Put (S, "enum " & Node_Or_Entity (Root) & "_Kind : unsigned int {" & 
LF);
  Iterate_Types (Root, Pre => Put_Enum_Lit'Access);
+ Put (S, "};" & LF);
 
  Put (S, "#define Number_" & Node_Or_Entity (Root) & "_Kinds " &
   Image (Cur_Pos) & "" & LF & LF);
@@ -3046,7 +3048,8 @@ package body Gen_IL.Gen is
 Put (S, "unsigned int Raw = slot;" & LF);
  end if;
 
- Put (S, Get_Set_Id_Image (Rec.Field_Type) & " val = ");
+ Put (S, Get_Set_Id_Image (Rec.Field_Type) & " val = (" &
+Get_Set_Id_Image (Rec.Field_Type) & ") ");
 
  if Field_Has_Special_Default (Rec.Field_Type) then
 Increase_Indent (S, 2);
diff --git a/gcc/ada/snames.h-tmpl b/gcc/ada/snames.h-tmpl
index 95b3c776197..f01642ffbff 100644
--- a/gcc/ada/snames.h-tmpl
+++ b/gcc/ada/snames.h-tmpl
@@ -28,43 +28,55 @@
 
 /* Name_Id values */
 
-typedef Int Name_Id;
-#define  Name_ !! TEMPLATE INSERTION POINT
+enum Name_Id : Int
+{
+  Name_ !! TEMPLATE INSERTION POINT
+};
 
-/* Define the function to return one of the numeric values below. Note
-   that it actually returns a char since an enumeration value of less
-   than 256 entries is represented that way in Ada.  The operand is a Chars
-   field value.  */
+/* Define the numeric values for attributes.  */
 
-typedef Byte Attribute_Id;
-#define Get_Attribute_Id snames__get_attribute_id
-extern Attribute_Id Get_Attribute_Id (int);
+enum Attribute_Id : unsigned char
+{
+  Attr_ !! TEMPLATE INSERTION POINT
+};
 
-/* Define the numeric values for attributes.  */
+/* Define the function to return one of the numeric values above.  The operand
+   is a Chars field value.  */
 
-#define  Attr_ !! TEMPLATE INSERTION POINT
+static inline Attribute_Id
+Get_Attribute_Id (int id)
+{
+  extern unsigned char snames__get_attribute_id (int);
+  return (Attribute_Id) snames__get_attribute_id (id);
+}

[COMMITTED] ada: Add an assert in Posix Interrupt_Wait

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Clément Chigot 

All functions but Interrupt_Wait in s-inmaop__posix check the result of
their syscalls with an assert. However, any return code of sigwait
different from 0 means that something went wrong.

From the sigwait man page:
> RETURN VALUE
>  On success, sigwait() returns 0.  On  error,  it  returns  a
>  positive error number (listed in ERRORS).

gcc/ada/

* libgnarl/s-inmaop__posix.adb: Add assert after sigwait in
Interrupt_Wait

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-inmaop__posix.adb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/libgnarl/s-inmaop__posix.adb b/gcc/ada/libgnarl/s-inmaop__posix.adb
index 3171399f982..e4d07ee77eb 100644
--- a/gcc/ada/libgnarl/s-inmaop__posix.adb
+++ b/gcc/ada/libgnarl/s-inmaop__posix.adb
@@ -135,6 +135,7 @@ package body System.Interrupt_Management.Operations is
 
begin
   Result := sigwait (Mask, Sig'Access);
+  pragma Assert (Result = 0);
 
   if Result /= 0 then
  return 0;
-- 
2.40.0



[COMMITTED] ada: Leave detection of missing return in functions to GNATprove

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

GNAT has a heuristic to warn about missing return statements in
functions. This warning was escalated to errors when operating in
GNATprove mode and SPARK_Mode was On. However, this heuristic was
imprecise and caused spurious errors. Also, it was applied after the
Push_Scope/End_Scope, so for functions acting as compilation units it
was using the wrong SPARK_Mode.

It is better to simply leave this detection to GNATprove.

gcc/ada/

* sem_ch6.adb (Check_Statement_Sequence): Only warn about missing return
statements and let GNATprove emit a check when needed.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch6.adb | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index 62ca985bf87..4e64833b3f7 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -7315,18 +7315,11 @@ package body Sem_Ch6 is
 --  already, so the Assert_False is for the assertions off case.
 
 if not Raise_Exception_Call and then not Assert_False then
-
-   --  In GNATprove mode, it is an error to have a missing return
-
-   Error_Msg_Warn := SPARK_Mode /= On;
-
-   --  Issue error message or warning
-
Error_Msg_N
- ("RETURN statement missing following this statement<

[COMMITTED] ada: Fix unsupported dispatching constructor call

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Javier Miranda 

Add dummy build-in-place parameters when a BIP function does not
require the BIP parameters but it is a dispatching operation that
inherited them.

gcc/ada/

* einfo-utils.adb (Underlying_Type): Protect recursion call
against non-available attribute Etype.
* einfo.ads (Protected_Subprogram): Fix typo in documentation.
* exp_ch3.adb (BIP_Function_Call_Id): New subprogram.
(Expand_N_Object_Declaration): Improve code that evaluates if the
object is initialized with a BIP function call.
* exp_ch6.adb (Is_True_Build_In_Place_Function_Call): New
subprogram.
(Add_Task_Actuals_To_Build_In_Place_Call): Add dummy actuals if
the function does not require the BIP task actuals but it is a
dispatching operation that inherited them.
(Build_In_Place_Formal): Improve code to avoid never-ending loop
if the BIP formal is not found.
(Add_Dummy_Build_In_Place_Actuals): New subprogram.
(Expand_Call_Helper): Add calls to
Add_Dummy_Build_In_Place_Actuals.
(Expand_N_Extended_Return_Statement): Adjust assertion.
(Expand_Simple_Function_Return): Adjust assertion.
(Make_Build_In_Place_Call_In_Allocator): No action needed if the
called function inherited the BIP extra formals but it is not a
true BIP function.
(Make_Build_In_Place_Call_In_Assignment): Ditto.
* exp_intr.adb (Expand_Dispatching_Constructor_Call): Remove code
reporting unsupported case (since this patch adds support for it).
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Adding assertion
to ensure matching of BIP formals when setting the
Protected_Formal field of a protected subprogram to reference the
corresponding extra formal of the subprogram that implements it.
(Might_Need_BIP_Task_Actuals): New subprogram.
(Create_Extra_Formals): Improve code adding inherited extra
formals.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/einfo-utils.adb |   2 +-
 gcc/ada/einfo.ads   |   2 +-
 gcc/ada/exp_ch3.adb | 101 ++---
 gcc/ada/exp_ch6.adb | 234 +---
 gcc/ada/exp_intr.adb|  45 
 gcc/ada/sem_ch6.adb | 185 ++-
 6 files changed, 418 insertions(+), 151 deletions(-)

diff --git a/gcc/ada/einfo-utils.adb b/gcc/ada/einfo-utils.adb
index 7fe517124d9..cb9a00dc4bb 100644
--- a/gcc/ada/einfo-utils.adb
+++ b/gcc/ada/einfo-utils.adb
@@ -3019,7 +3019,7 @@ package body Einfo.Utils is
  --  Otherwise check for the case where we have a derived type or
  --  subtype, and if so get the Underlying_Type of the parent type.
 
- elsif Etype (Id) /= Id then
+ elsif Present (Etype (Id)) and then Etype (Id) /= Id then
 return Underlying_Type (Etype (Id));
 
  --  Otherwise we have an incomplete or private type that has no full
diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
index d7690d9f88a..977392899f9 100644
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -4112,7 +4112,7 @@ package Einfo is
 --Protected_Subprogram
 --   Defined in functions and procedures. Set for the pair of subprograms
 --   which emulate the runtime semantics of a protected subprogram. Denotes
---   the entity of the origial protected subprogram.
+--   the entity of the original protected subprogram.
 
 --Protection_Object
 --   Applies to protected entries, entry families and subprograms. Denotes
diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index db27a5f68b6..04c3ad8c631 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -6256,6 +6256,11 @@ package body Exp_Ch3 is
   --  temporary. Func_Id is the enclosing function. Ret_Typ is the return
   --  type of Func_Id. Alloc_Expr is the actual allocator.
 
+  function BIP_Function_Call_Id return Entity_Id;
+  --  If the object initialization expression is a call to a build-in-place
+  --  function, return the id of the called function; otherwise return
+  --  Empty.
+
   procedure Count_Default_Sized_Task_Stacks
 (Typ : Entity_Id;
  Pri_Stacks  : out Int;
@@ -6592,6 +6597,67 @@ package body Exp_Ch3 is
  end if;
   end Build_Heap_Or_Pool_Allocator;
 
+  --
+  -- BIP_Function_Call_Id --
+  --
+
+  function BIP_Function_Call_Id return Entity_Id is
+
+ function Func_Call_Id (Function_Call : Node_Id) return Entity_Id;
+ --  Return the id of the called function.
+
+ function Func_Call_Id (Function_Call : Node_Id) return Entity_Id is
+Call_Node : constant Node_Id := Unqual_Conv (Function_Call);
+
+ begin
+if Is_Entity_Name (Name (Call_Node)) then
+   return Entity (Name (Call_Node));
+
+elsif Nkin

[COMMITTED] ada: Fix memory explosion on aggregate of nested packed array type

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

It occurs at compile time on an aggregate of a 2-dimensional packed array
type whose component type is itself a packed array, because the compiler
is trying to pack the intermediate aggregate and ends up rewriting a bunch
of subcomponents.  This optimization was originally devised for the case of
a scalar component type so the change adds this restriction.

gcc/ada/

* exp_aggr.adb (Is_Two_Dim_Packed_Array): Return true only if the
component type of the array is scalar.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index dffc5ab721d..cd5cc0b7669 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -306,7 +306,7 @@ package body Exp_Aggr is
--  N is the N_Aggregate node to be expanded.
 
function Is_Two_Dim_Packed_Array (Typ : Entity_Id) return Boolean;
-   --  For two-dimensional packed aggregates with constant bounds and constant
+   --  For 2D packed array aggregates with constant bounds and constant scalar
--  components, it is preferable to pack the inner aggregates because the
--  whole matrix can then be presented to the back-end as a one-dimensional
--  list of literals. This is much more efficient than expanding into single
@@ -8563,9 +8563,11 @@ package body Exp_Aggr is
 
function Is_Two_Dim_Packed_Array (Typ : Entity_Id) return Boolean is
   C : constant Uint := Component_Size (Typ);
+
begin
   return Number_Dimensions (Typ) = 2
 and then Is_Bit_Packed_Array (Typ)
+and then Is_Scalar_Type (Component_Type (Typ))
 and then C in Uint_1 | Uint_2 | Uint_4; -- False if No_Uint
end Is_Two_Dim_Packed_Array;
 
-- 
2.40.0



[COMMITTED] ada: Add support for binding to a specific network interface controller.

2023-07-28 Thread Marc Poulhiès via Gcc-patches
From: Pascal Obry 

gcc/ada/

* s-oscons-tmplt.c: Add support for SO_BINDTODEVICE constant.
* libgnat/g-socket.ads (Set_Socket_Option): Handle SO_BINDTODEVICE option.
(Get_Socket_Option): Handle SO_BINDTODEVICE option.
* libgnat/g-socket.adb: Likewise.
(Get_Socket_Option): Handle the case where IF_NAMESIZE is not defined
and so equal to -1.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/g-socket.adb | 26 --
 gcc/ada/libgnat/g-socket.ads |  5 +
 gcc/ada/s-oscons-tmplt.c |  5 +
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/g-socket.adb b/gcc/ada/libgnat/g-socket.adb
index dca696f7c5f..c4e29075a0b 100644
--- a/gcc/ada/libgnat/g-socket.adb
+++ b/gcc/ada/libgnat/g-socket.adb
@@ -120,7 +120,8 @@ package body GNAT.Sockets is
 IPv6_Only   => SOSC.IPV6_V6ONLY,
 Send_Timeout=> SOSC.SO_SNDTIMEO,
 Receive_Timeout => SOSC.SO_RCVTIMEO,
-Busy_Polling=> SOSC.SO_BUSY_POLL];
+Busy_Polling=> SOSC.SO_BUSY_POLL,
+Bind_To_Device  => SOSC.SO_BINDTODEVICE];
--  ??? Note: for OpenSolaris, Receive_Packet_Info should be IP_RECVPKTINFO,
--  but for Linux compatibility this constant is the same as IP_PKTINFO.
 
@@ -1413,17 +1414,21 @@ package body GNAT.Sockets is
   use type C.unsigned;
   use type C.unsigned_char;
 
+  --  SOSC.IF_NAMESIZE may not be defined; ensure that we have at least
+  --  a valid range for VS declared below.
+  NS  : constant Interfaces.C.size_t :=
+  (if SOSC.IF_NAMESIZE = -1 then 256 else SOSC.IF_NAMESIZE);
   V8  : aliased Two_Ints;
   V4  : aliased C.int;
   U4  : aliased C.unsigned;
   V1  : aliased C.unsigned_char;
+  VS  : aliased C.char_array (1 .. NS); -- for devices name
   VT  : aliased Timeval;
   Len : aliased C.int;
   Add : System.Address;
   Res : C.int;
   Opt : Option_Type (Name);
   Onm : Interfaces.C.int;
-
begin
   if Name in Specific_Option_Name then
  Onm := Options (Name);
@@ -1491,6 +1496,11 @@ package body GNAT.Sockets is
  =>
 Len := V8'Size / 8;
 Add := V8'Address;
+
+ when Bind_To_Device
+ =>
+Len := VS'Length;
+Add := VS'Address;
   end case;
 
   Res :=
@@ -1589,6 +1599,9 @@ package body GNAT.Sockets is
 else
Opt.Timeout := To_Duration (VT);
 end if;
+
+ when Bind_To_Device =>
+Opt.Device := ASU.To_Unbounded_String (C.To_Ada (VS));
   end case;
 
   return Opt;
@@ -2616,6 +2629,10 @@ package body GNAT.Sockets is
   V4  : aliased C.int;
   U4  : aliased C.unsigned;
   V1  : aliased C.unsigned_char;
+  VS  : aliased C.char_array
+  (1 .. (if Option.Name = Bind_To_Device
+ then C.size_t (ASU.Length (Option.Device) + 1)
+ else 0));
   VT  : aliased Timeval;
   Len : C.int;
   Add : System.Address := Null_Address;
@@ -2754,6 +2771,11 @@ package body GNAT.Sockets is
Len := VT'Size / 8;
Add := VT'Address;
 end if;
+
+ when Bind_To_Device =>
+VS := C.To_C (ASU.To_String (Option.Device));
+Len := C.int (VS'Length);
+Add := VS'Address;
   end case;
 
   if Option.Name in Specific_Option_Name then
diff --git a/gcc/ada/libgnat/g-socket.ads b/gcc/ada/libgnat/g-socket.ads
index d49245290ce..90740ec65a4 100644
--- a/gcc/ada/libgnat/g-socket.ads
+++ b/gcc/ada/libgnat/g-socket.ads
@@ -841,6 +841,9 @@ package GNAT.Sockets is
   --  Sets the approximate time in microseconds to busy poll on a blocking
   --  receive when there is no data.
 
+  Bind_To_Device,  -- SO_BINDTODEVICE
+  --  Bind to a specific NIC (Network Interface Controller)
+
   ---
   -- IP_Protocol_For_TCP_Level --
   ---
@@ -986,6 +989,8 @@ package GNAT.Sockets is
   Receive_Timeout =>
 Timeout : Timeval_Duration;
 
+ when Bind_To_Device =>
+Device : Ada.Strings.Unbounded.Unbounded_String;
   end case;
end record;
 
diff --git a/gcc/ada/s-oscons-tmplt.c b/gcc/ada/s-oscons-tmplt.c
index 28d42c5a459..fb6bb0f043b 100644
--- a/gcc/ada/s-oscons-tmplt.c
+++ b/gcc/ada/s-oscons-tmplt.c
@@ -1545,6 +1545,11 @@ CND(SO_KEEPALIVE, "Enable keep-alive msgs")
 #endif
 CND(SO_LINGER, "Defer close to flush data")
 
+#ifndef SO_BINDTODEVICE
+# define SO_BINDTODEVICE -1
+#endif
+CND(SO_BINDTODEVICE, "Bind to a NIC - Network Interface Controller")
+
 #ifndef SO_BROADCAST
 # define SO_BROADCAST -1
 #endif
-- 
2.40.0



Loop-split improvements, part 2

2023-07-28 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes profile update in the first case of loop splitting.
The pass still gives up on very basic testcases:

__attribute__ ((noinline,noipa))
void test1 (int n)
{
  if (n <= 0 || n > 10)
    return;
  for (int i = 0; i <= n; i++)
    {
      if (i < n)
	do_something ();
      if (a[i])
	do_something2 ();
    }
}

Here I needed to add the conditional that enforces a sane value range of n.
The reason is that the pass gives up on:
  !number_of_iterations_exit (loop1, exit1, &niter, false, true)
and without the conditional we get the assumptions that n >= 0 and that n is
not INT_MAX.  I think from overflow we should derive that the INT_MAX test is
not needed, and since the loop does nothing for n < 0 it is also just paranoia.

I am not sure how to fix this though :(.  In general the pass does not really
need to compute the iteration count.  It only needs to know what direction the
IVs go so it can detect tests that fire in the first part of the iteration
space.

Rich, any idea what the correct test should be?

In testcase:
  for (int i = 0; i < 200; i++)
    if (i < 150)
      do_something ();
    else
      do_something2 ();
the old code did a wrong update of the exit condition probabilities.
We know that the first loop iterates 150 times and the second loop 50 times,
and we get that by simply scaling the loop body counts by the probability of
the inner test.
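
As a quick sanity check of those numbers (assuming the function is entered
1000 times, as in the dump below), the loop body executes 200 * 1000 = 200000
times in total, and scaling by the probability of the inner test gives:

  count (do_something)  = 200000 * 150/200 = 150000
  count (do_something2) = 200000 *  50/200 =  50000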

With the patch we now get:

   [count: 1000]:

   [count: 150000]:   <- loop 1 correctly iterates 149 times
  # i_10 = PHI 
  do_something ();
  i_7 = i_10 + 1;
  if (i_7 <= 149)
goto ; [99.33%]
  else
goto ; [0.67%]

   [count: 149000]:
  goto ; [100.00%]

   [count: 1000]:
  # i_15 = PHI 

   [count: 49975]:   <- loop 2 should iterate 50 times but we are slightly wrong
  # i_3 = PHI 
  do_something2 ();
  i_14 = i_3 + 1;
  if (i_14 != 200)
goto ; [98.00%]
  else
goto ; [2.00%]

   [count: 48975]:
  goto ; [100.00%]

   [count: 1000]:   <- this test is always true because it is reached from bb 3
  # i_18 = PHI 
  if (i_18 != 200)
goto ; [99.95%]
  else
goto ; [0.05%]

   [count: 1000]:
  return;

The reason why we are slightly wrong is the condition in bb17 that
is always true but the pass does not know it.

Rich, any idea how to do that?  I think connect_loops should work out
the case where the loop exit condition is never satisfied at the time
the split condition fails for the first time.

Also we do not update loop iteration expectancies.  If we were able to
work out that one of the loops has a constant iteration count, we could
do it perfectly.

Before the patch, on hmmer we get a lot of mismatches.  The profile
report claims:

dump id | static mismatch | dynamic mismatch          | time                  |
        | in count        | in count                  |                       |
lsplit  |  5  +5          |  8151850567   +8151850567 | 531506481006   +57.9% |
ldist   |  9  +4          | 15345493501   +7193642934 | 606848841056   +14.2% |
ifcvt   | 10  +1          | 15487514871    +142021370 | 689469797790   +13.6% |
vect    | 35 +25          | 17558425961   +2070911090 | 517375405715   -25.0% |
cunroll | 42  +7          | 16898736178    -659689783 | 452445796198    -4.9% |
loopdone| 33  -9          |  2678017188  -14220718990 | 330969127663          |
tracer  | 34  +1          |  2678018710         +1522 | 330613415364    +0.0% |
fre     | 33  -1          |  2676980249      -1038461 | 330465677073    -0.0% |
expand  | 28  -5          |  2497468467    -179511782 | --                    |

With the patch:

lsplit  |  0              |           0               | 328723360744    -2.3% |
ldist   |  0              |           0               | 396193562452   +20.6% |
ifcvt   |  1  +1          |    71010686     +71010686 | 478743508522   +20.8% |
vect    | 14 +13          |   697518955    +626508269 | 299398068323   -37.5% |
cunroll | 13  -1          |   489349408    -208169547 | 25839725       -10.5% |
loopdone| 11  -2          |   402558559     -86790849 | 201010712702          |
tracer  | 13  +2          |   402977200       +418641 | 200651036623    +0.0% |
fre     | 13              |   402622146       -355054 | 200344398654    -0.2% |
expand  | 11  -2          |   333608636     -69013510 | --                    |

So no mismatches for lsplit and ldist, and lsplit also thinks it improves
speed by 2.3% rather than regressing it by 57%.

The update is still not perfect since we do not work out that the second loop
never iterates.  Also ldist is still wrong, since time should not go up.

Ifcvt wrecks the profile by design since it inserts conditionals with both
arms 100% that will be eliminated later after vect.  It is not clear to me
what happens in vect though.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR middle-end/106293
* tree-ssa-loop-split.cc (connect_loops): Change probability
of the test preconditioning second loop to very_likely.
(fix_loop_bb_probability): Handle corre

Re: [PATCH] Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)

2023-07-28 Thread Kewen.Lin via Gcc-patches
Hi Mike,

on 2023/7/11 03:50, Michael Meissner wrote:
> This patch optimizes cases like:
> 
>   vector double v1, v2;
>   /* ... */
>   v2 = vec_splats (vec_extract (v1, 0);   /* or  */
>   v2 = vec_splats (vec_extract (v1, 1);
> 
> Previously:
> 
>   vector long long
>   splat_dup_l_0 (vector long long v)
>   {
> return __builtin_vec_splats (__builtin_vec_extract (v, 0));
>   }
> 
> would generate:
> 
> mfvsrld 9,34
> mtvsrdd 34,9,9
> blr
> 
> With this patch, GCC generates:
> 
> xxpermdi 34,34,34,3
>   blr
> > 2023-07-10  Michael Meissner  
> 
> gcc/
> 
>   PR target/99293
>   * gcc/config/rs6000/vsx.md (vsx_splat_extract_): New combiner
>   insn.
> 
> gcc/testsuite/
> 
>   PR target/108958
>   * gcc.target/powerpc/pr99293.c: New test.
>   * gcc.target/powerpc/builtins-1.c: Update insn count.
> ---
>  gcc/config/rs6000/vsx.md  | 18 ++
>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr99293.c| 55 +++
>  3 files changed, 74 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0c269e4e8d9..d34c3b21abe 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4600,6 +4600,24 @@ (define_insn "vsx_splat__mem"
>"lxvdsx %x0,%y1"
>[(set_attr "type" "vecload")])
>  
> +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant 
> element
> +(define_insn "*vsx_splat_extract_"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> + (vec_duplicate:VSX_D
> +  (vec_select:
> +   (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> +   (parallel [(match_operand 2 "const_0_to_1_operand" "n")]]
> +  "VECTOR_MEM_VSX_P (mode)"
> +{
> +  int which_word = INTVAL (operands[2]);
> +  if (!BYTES_BIG_ENDIAN)
> +which_word = 1 - which_word;
> +
> +  operands[3] = GEN_INT (which_word ? 3 : 0);
> +  return "xxpermdi %x0,%x1,%x1,%3";
> +}
> +  [(set_attr "type" "vecperm")])
> +
>  ;; V4SI splat support
>  (define_insn "vsx_splat_v4si"
>[(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> index 28cd1aa6b1a..98783668bce 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> @@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
>  /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
>  /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
>  /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
> -/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
> new file mode 100644
> index 000..e5f44bd7346
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
> @@ -0,0 +1,55 @@
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mpower8-vector" } */

Nit: IMHO -mdejagnu-cpu=power8 is preferred over -mpower8-vector, which is
considered a workaround option, and we plan to make it go away.

> +
> +/* Test for PR 99293, which wants to do:
> + __builtin_vec_splats (__builtin_vec_extract (v, n))

Nit: Maybe remove all "__builtin_" prefixes since vec_splats and vec_extract
are defined in PVIPR without __builtin_.

This also applies to the others below.

> +
> +   where v is a V2DF or V2DI vector and n is either 0 or 1.  Previously the 
> GCC
> +   compiler would do a direct move to the GPR registers to select the item 
> and a
> +   direct move from the GPR registers to do the splat.
> +
> +   Before the patch, splat_dup_ll_0 or splat_dup_dbl_0 below would generate:
> +
> +mfvsrld 9,34
> +mtvsrdd 34,9,9
> +blr
> +
> +   and now it generates:
> +
> +xxpermdi 34,34,34,3
> +blr  */
> +
> +#include 
> +
> +vector long long
> +splat_dup_ll_0 (vector long long v)
> +{
> +  /* xxpermdi 34,34,34,3 */
> +  return __builtin_vec_splats (vec_extract (v, 0));
> +}
> +
> +vector double
> +splat_dup_dbl_0 (vector double v)
> +{
> +  /* xxpermdi 34,34,34,3 */
> +  return __builtin_vec_splats (vec_extract (v, 0));
> +}
> +
> +vector long long
> +splat_dup_ll_1 (vector long long v)
> +{
> +  /* xxpermdi 34,34,34,0 */
> +  return __builtin_vec_splats (vec_extract (v, 1));
> +}
> +
> +vector double
> +splat_dup_dbl_1 (vector double v)
> +{
> +  /* xxpermdi 34,34,34,0 */
> +  return __builtin_vec_splats (vec_extract (v, 1));
> +}
> +
> +/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */

Nit: It's good to add \m..\M like the others, i.e.

   /* { dg-final { scan-assembler-times {\mxxpermdi\M} 4 } } */

Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 06:41:44PM +, Joseph Myers wrote:
> On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:
> 
> > - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd 
> > like
> >   to enable those incrementally, but don't really see details on how such
> >   bit-fields should be laid-out in memory nor passed inside of function
> >   arguments; LLVM implements something, but it is a question if that is what
> >   the various ABIs want
> 
> So if the x86-64 ABI (or any other _BitInt ABI that already exists) 
> doesn't specify this adequately then an issue should be filed (at 
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).
> 
> (Note that the language specifies that e.g. _BitInt(123):45 gets promoted 
> to _BitInt(123) by the integer promotions, rather than left as a type with 
> the bit-field width.)

Ok, I'll try to investigate in detail what LLVM does and what GCC would do
if I just enabled the bit-field support, and report back.  Still, I'd like
to handle this only as an incremental step after the rest of the _BitInt
support goes in.

> > - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
> >   aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
> >   to implement that
> 
> Doing things incrementally might indicate first doing this only for BID 
> (so sufficing for x86-64), with DPD support to be added when _BitInt 
> support is added for an architecture using DPD, i.e. powerpc / s390.
> 
> This conversion is a mix of base conversion and things specific to DFP 
> types.

I had a brief look at libbid and am totally unimpressed.
Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
conversions at all (we emit calls to __bid_* functions which don't exist),
the library (or the way we configure it) doesn't care about exceptions nor
rounding mode (see following testcase) and for integral <-> _Decimal32
conversions implement them as integral <-> _Decimal64 <-> _Decimal32
conversions.  While in the _Decimal32 -> _Decimal64 -> integral
direction that is probably ok, even if exceptions and rounding (other than
to nearest) were supported, the other direction I'm sure can suffer from
double rounding.

So, I wonder if it wouldn't be better to implement these in the soft-fp
infrastructure, which at least has the exception and rounding mode support.
Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
below the sign bit and doing some shifts, so not something one needs a 10MB
library for.  Now, sure, 5MB out of that are generated tables in
bid_binarydecimal.c, but unfortunately those are static and not in a form
which could be directly fed into multiplication (unless we'd want to go
through conversions to/from strings).
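
For illustration, the finite-value decode boils down to something like this
(names are mine, not libbid's; Inf/NaN, where the five bits below the sign
are 11110/11111, would have to be checked first):

#include <stdint.h>

static void
bid64_decode (uint64_t x, int *sign, int *biased_exp, uint64_t *coeff)
{
  *sign = x >> 63;
  if (((x >> 61) & 3) == 3)
    {
      /* Steering bits 11: 51-bit coefficient field with implicit 0b100
	 prepended, exponent field shifted down by two bits.  */
      *biased_exp = (x >> 51) & 0x3ff;
      *coeff = (UINT64_C (4) << 51) | (x & ((UINT64_C (1) << 51) - 1));
    }
  else
    {
      /* Common case: 10-bit exponent, 53-bit coefficient.  */
      *biased_exp = (x >> 53) & 0x3ff;
      *coeff = x & ((UINT64_C (1) << 53) - 1);
    }
}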
So, it seems easier to guess the needed power of 10 from the number of
binary digits or vice versa, have a small table of powers of 10 (say those
which fit into a limb) and construct larger powers of 10 by multiplying
those several times.  _Decimal128 has exponents up to 6144, and 10^6144 is
~ 2552 bytes or 319 64-bit limbs, but having a table with all the 6144
powers of ten would be just huge.  Powers of ten up to 10^19 fit into a
64-bit limb, so we might need say < 32 multiplications to cover it all
(but with the current 575-bit limitation far less).  Perhaps later on write
a few selected powers of 10 as _BitInt to decrease that number.

> For conversion *from _BitInt to DFP*, the _BitInt value needs to be 
> expressed in decimal.  In the absence of optimized multiplication / 
> division for _BitInt, it seems reasonable enough to do this naively 
> (repeatedly dividing by a power of 10 that fits in one limb to determine 
> base 10^N digits from the least significant end, for example), modulo 
> detecting obvious overflow cases up front (if the absolute value is at 

Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
and instead repeatedly multiply like in the other direction and then just
divide once with remainder?
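
For what it's worth, the 10^3 ~= 2^10 guess could be as simple as the
following (illustrative only, with a digit of slack that the final
division/comparison would correct):

static unsigned int
guess_decimal_digits (unsigned int nbits)
{
  /* 10^3 ~= 2^10, so an N-bit number has roughly N * 3 / 10 digits.  */
  return nbits * 3U / 10 + 2;
}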

Jakub
#include <fenv.h>

int
main ()
{
  volatile _Decimal64 d;
  volatile long long l;
  int e;

  feclearexcept (FE_ALL_EXCEPT);
  d = __builtin_infd64 ();
  l = d;
  e = fetestexcept (FE_INVALID);
  feclearexcept (FE_ALL_EXCEPT);
  __builtin_printf ("%016lx %d\n", l, e != 0);
  l = 50LL;
  fesetround (FE_TONEAREST);
  d = l;
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_UPWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_DOWNWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
  l = 01LL;
  fesetround (FE_TONEAREST);
  d = l;
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_UPWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
  fesetround (FE_DOWNWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%ld\n", (long long) d);
}


Re: [PATCH, rs6000] Skip redundant vector extract if the element is first element of dword0 [PR110429]

2023-07-28 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/7/5 11:22, HAO CHEN GUI wrote:
> Hi,
>   This patch skips redundant vector extract insn to be generated when
> the extracted element is the first element of dword0 and the destination

"The first element" is confusing, it's easy to be misunderstood as element
0, but in fact the extracted element index is: 
  - for byte, 7 on BE while 8 on LE;
  - for half word, 3 on BE while 4 on LE;

so maybe just say when the extracted index for byte and half word like above,
the element to be stored is already in the corresponding place for stxsi[hb]x,
we don't need a redundant vector extraction at all.

> is a memory operand. Only one 'stxsi[hb]x' instruction is enough.
> 
>   The V4SImode is fixed in a previous patch.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622101.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Skip redundant vector extract if the element is first element of
> dword0
> 
> gcc/
>   PR target/110429
>   * config/rs6000/vsx.md (*vsx_extract__store_p9): Skip vector
>   extract when the element is the first element of dword0.
> 
> gcc/testsuite/
>   PR target/110429
>   * gcc.target/powerpc/pr110429.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0c269e4e8d9..b3fec910eb6 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3855,7 +3855,22 @@ (define_insn_and_split "*vsx_extract_<mode>_store_p9"
>   (parallel [(match_dup 2)])))
> (clobber (match_dup 4))])
> (set (match_dup 0)
> - (match_dup 3))])
> + (match_dup 3))]
> +{
> +  enum machine_mode dest_mode = GET_MODE (operands[0]);

Nit: Move this line ...

> +
> +  if (which_alternative == 0
> +      && ((<MODE>mode == V16QImode
> +	   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 7 : 8))
> +	  || (<MODE>mode == V8HImode
> +	      && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 3 : 4))))
> +    {

... here.

> +  emit_move_insn (operands[0],
> +   gen_rtx_REG (dest_mode, REGNO (operands[3])));
> +  DONE;
> +}
> +})
> +
> 
>  (define_insn_and_split  "*vsx_extract_si"
>[(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110429.c b/gcc/testsuite/gcc.target/powerpc/pr110429.c
> new file mode 100644
> index 000..5a938f9f90a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110429.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +#include <altivec.h>
> +
> +#ifdef __BIG_ENDIAN__
> +#define DWORD0_FIRST_SHORT 3
> +#define DWORD0_FIRST_CHAR 7
> +#else
> +#define DWORD0_FIRST_SHORT 4
> +#define DWORD0_FIRST_CHAR 8
> +#endif
> +
> +void vec_extract_short (vector short v, short* p)
> +{
> +   *p = vec_extract(v, DWORD0_FIRST_SHORT);
> +}
> +
> +void vec_extract_char (vector char v, char* p)
> +{
> +   *p = vec_extract(v, DWORD0_FIRST_CHAR);
> +}
> +
> +/* { dg-final { scan-assembler-times "stxsi\[hb\]x" 2 } } */

Nit: Break this check into stxsihx and stxsibx, and surround
with \m and \M.

> +/* { dg-final { scan-assembler-not "vextractu\[hb\]" } } */

Also with \m and \M.

OK for trunk with these nits tweaked and testing goes well,
thanks!

BR,
Kewen


Re: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-28 Thread Robin Dapp via Gcc-patches
Hi Pan,

thanks for your patience and your work.  Apart from my general doubt
whether mode-changing intrinsics are a good idea, I don't have other
remarks that need fixing.  What I mentioned before:

 - Handling of asms wouldn't be a huge change.  It can be done
 in a follow-up patch of course but should be done eventually.

 - The code is still rather difficult to follow because we diverge
 from the usual mode-switching semantics e.g. in that we emit insns
 in mode_needed as well as in mode_set.  I would have preferred
 to stay close to the regular usage, document where and why we need
 to do something different and suggest future middle-end improvements
 to solve this more elegantly.

 - I hope non-local control flow like setjmp/longjmp, sibcall
 optimization and maybe others work fine.  I didn't see a reason
 why not but I haven't checked very closely either.

 - We can probably get away with not annotating every call with
 an FRM clobber because there isn't any pass that would make use
 of that anyway?


As to my general qualm, independent of this patch, quickly
summarized again one last time (the problem was latent before this
specific patch anyway):

I would prefer not to have mode-changing intrinsics at all but
have users call fesetround explicitly.  That way the exact point
where the rounding mode is changed would be obvious and not
subject to optimization as well as caching/backing up.
If at all necessary I would have preferred the LLVM way of
backing up, setting new mode, performing the instruction
and restoring directly after.
If the initial intent of mode-changing intrinsics was to give
users more control, I don't believe we achieve this by the "lazy"
restore mechanism which is rather an obfuscation.
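
For reference, that sequence would look something like the following around
each mode-changing instruction (registers and the actual rounding mode
picked arbitrarily for illustration):

    frrm     t0            # back up the dynamic rounding mode
    fsrmi    2             # set the mode the intrinsic asks for (here RDN)
    vfadd.vv v8,v8,v9      # the actual operation
    fsrm     t0            # restore FRM right away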

Pardon my frankness, but the whole mode-changing thing feels to me like
just getting a feature out of the door to solve "something"/appease users
rather than a well-thought-out feature.  It doesn't even seem clear whether
this optimization is worthwhile when changing the rounding mode is
prohibitively slow anyway.

That said, if the current status is what the majority of
contributors can live with, I'm not going to stand in the way,
but I'd ask Kito or somebody else to give the final OK.

Regards
 Robin


New template for 'gcc' made available

2023-07-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to the translation coordinator.)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/gcc-13.2.0.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/releases/gcc-13.2.0/gcc-13.2.0.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-28 Thread Hao Liu OS via Gcc-patches
Hi Richard,

I've updated the patch and tested on aarch64.  Is it OK?

---

The new costs should only count reduction latency by multiplying count for
single_defuse_cycle.  For other situations, this will increase the reduction
latency a lot and miss vectorization opportunities.

Tested on aarch64-linux-gnu.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (count_ops): Only '* count' for
single_defuse_cycle while counting reduction_latency.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110625_1.c: New testcase.
* gcc.target/aarch64/pr110625_2.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc | 13 --
 gcc/testsuite/gcc.target/aarch64/pr110625_1.c | 46 +++
 gcc/testsuite/gcc.target/aarch64/pr110625_2.c | 14 ++
 3 files changed, 69 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_2.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..10e7663cc42 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,15 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
 {
   unsigned int base
= aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-  /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-that's not yet the case.  */
-  ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+  if (STMT_VINFO_LIVE_P (stmt_info)
+ && STMT_VINFO_FORCE_SINGLE_CYCLE (
+   info_for_reduction (m_vinfo, stmt_info)))
+   /* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
+  and then accumulate that, but at the moment the loop-carried
+  dependency includes all copies.  */
+   ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+  else
+   ops->reduction_latency = MAX (ops->reduction_latency, base);
 }

   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_1.c b/gcc/testsuite/gcc.target/aarch64/pr110625_1.c
new file mode 100644
index 000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625_1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details 
-fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+Original vector body cost = 51
+Scalar issue estimate:
+  ...
+  reduction latency = 2
+  estimated min cycles per iteration = 2.00
+  estimated cycles per vector iteration (for VF 2) = 4.00
+Vector issue estimate:
+  ...
+  reduction latency = 8  <-- Too large
+  estimated min cycles per iteration = 8.00
+Increasing body cost to 102 because scalar code would issue more quickly
+  ...
+missed:  cost model: the vector iteration cost = 102 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
+missed:  not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+{
+  result.m1 += (*k) * the_struct[u].m1;
+  result.m2 += (*k) * the_struct[u].m2;
+  result.m3 += (*k) * the_struct[u].m3;
+  result.m4 += (*k) * the_struct[u].m4;
+}
+  return bar (&result);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_2.c b/gcc/testsuite/gcc.target/aarch64/pr110625_2.c
new file mode 100644
index 000..7a84aa8355e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625_2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details 
-fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump "reduction latency = 8" "vect" } } */
+
+/* The reduction latency should be multiplied by the count for
+   single_defuse_cycle.  */
+
+long
+f (long res, short *ptr1, short *ptr2, int n)
+{
+  for (int i = 0; i < n; ++i)
+res += (long) ptr1[i] << ptr2[i];
+  return res;
+}
--
2.34.1



From: Hao Liu OS 
Sent: Wednesday, July 26, 2023 20:54
To: Richard Sandiford; Richard Biener
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kin

[PATCH] gcse: Extract reg pressure handling into separate file.

2023-07-28 Thread Robin Dapp via Gcc-patches
Hi,

this patch extracts the hoist-pressure handling from gcse and puts it
into a separate file so it can be used by other passes in the future.
No functional change and I also abstained from c++ifying the code.
The naming with the regpressure_ prefix might be a bit clunky for
now and I'm open to a better scheme.

Some minor helper functions are added that just encapsulate BB aux
data manipulation.  All of this is in preparation for fwprop to
use register pressure data if needed.

Bootstrapped and regtested on x86, aarch64 and power. 

Regards
 Robin

From 65e69834eeb08ba093786e386ac16797cec4d8a7 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Mon, 24 Jul 2023 16:25:38 +0200
Subject: [PATCH] gcse: Extract reg pressure handling into separate file.

This patch extracts the hoist-pressure handling from gcse into a separate
file so it can be used by other passes in the future.  No functional change.

gcc/ChangeLog:

* Makefile.in: Add regpressure.o.
* gcse.cc (struct bb_data): Move to regpressure.cc.
(BB_DATA): Ditto.
(get_regno_pressure_class): Ditto.
(get_pressure_class_and_nregs): Ditto.
(record_set_data): Ditto.
(update_bb_reg_pressure): Ditto.
(should_hoist_expr_to_dom): Ditto.
(hoist_code): Ditto.
(change_pressure): Ditto.
(calculate_bb_reg_pressure): Ditto.
(one_code_hoisting_pass): Ditto.
* gcse.h (single_set_gcse): Export single_set_gcse.
* regpressure.cc: New file.
* regpressure.h: New file.
---
 gcc/Makefile.in|   1 +
 gcc/gcse.cc| 304 ++--
 gcc/gcse.h |   2 +
 gcc/regpressure.cc | 379 +
 gcc/regpressure.h  |  46 ++
 5 files changed, 445 insertions(+), 287 deletions(-)
 create mode 100644 gcc/regpressure.cc
 create mode 100644 gcc/regpressure.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 683774ad446..0a8e23e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1605,6 +1605,7 @@ OBJS = \
reg-stack.o \
regcprop.o \
reginfo.o \
+   regpressure.o \
regrename.o \
regstat.o \
reload.o \
diff --git a/gcc/gcse.cc b/gcc/gcse.cc
index f689c0c2687..5bafef7970f 100644
--- a/gcc/gcse.cc
+++ b/gcc/gcse.cc
@@ -160,6 +160,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcse.h"
 #include "gcse-common.h"
 #include "function-abi.h"
+#include "regpressure.h"
 
 /* We support GCSE via Partial Redundancy Elimination.  PRE optimizations
are a superset of those done by classic GCSE.
@@ -419,30 +420,6 @@ static bool doing_code_hoisting_p = false;
 /* For available exprs */
 static sbitmap *ae_kill;
 
-/* Data stored for each basic block.  */
-struct bb_data
-{
-  /* Maximal register pressure inside basic block for given register class
- (defined only for the pressure classes).  */
-  int max_reg_pressure[N_REG_CLASSES];
-  /* Recorded register pressure of basic block before trying to hoist
- an expression.  Will be used to restore the register pressure
- if the expression should not be hoisted.  */
-  int old_pressure;
-  /* Recorded register live_in info of basic block during code hoisting
- process.  BACKUP is used to record live_in info before trying to
- hoist an expression, and will be used to restore LIVE_IN if the
- expression should not be hoisted.  */
-  bitmap live_in, backup;
-};
-
-#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
-
-static basic_block curr_bb;
-
-/* Current register pressure for each pressure class.  */
-static int curr_reg_pressure[N_REG_CLASSES];
-
 
 static void compute_can_copy (void);
 static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
@@ -494,8 +471,6 @@ static bool should_hoist_expr_to_dom (basic_block, struct gcse_expr *,
  enum reg_class,
  int *, bitmap, rtx_insn *);
 static bool hoist_code (void);
-static enum reg_class get_regno_pressure_class (int regno, int *nregs);
-static enum reg_class get_pressure_class_and_nregs (rtx_insn *insn, int *nregs);
 static bool one_code_hoisting_pass (void);
 static rtx_insn *process_insert_insn (struct gcse_expr *);
 static bool pre_edge_insert (struct edge_list *, struct gcse_expr **);
@@ -2402,7 +2377,7 @@ record_set_data (rtx dest, const_rtx set, void *data)
 }
 }
 
-static const_rtx
+const_rtx
 single_set_gcse (rtx_insn *insn)
 {
   struct set_data s;
@@ -2804,72 +2779,6 @@ compute_code_hoist_data (void)
 fprintf (dump_file, "\n");
 }
 
-/* Update register pressure for BB when hoisting an expression from
-   instruction FROM, if live ranges of inputs are shrunk.  Also
-   maintain live_in information if live range of register referred
-   in FROM is shrunk.
-   
-   Return 0 if register pressure doesn't change, otherwise return
-   the number by which register pressure is decreased.
-   
-   NOTE: Register pressure won't be increased

Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-28 Thread Lewis Hyatt via Gcc-patches
On Thu, Jul 27, 2023 at 06:18:33PM -0700, Jason Merrill wrote:
> On 7/27/23 18:59, Lewis Hyatt wrote:
> > In order to support processing #pragma in preprocess-only mode (-E or
> > -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> > libcpp. In full compilation modes, this is accomplished by calling
> > pragma_lex (), which is a symbol that must be exported by the frontend, and
> > which is currently implemented for C and C++. Neither of those frontends
> > initializes its parser machinery in preprocess-only mode, and consequently
> > pragma_lex () does not work in this case.
> > 
> > Address that by adding a new function c_init_preprocess () for the frontends
> > to implement, which arranges for pragma_lex () to work in preprocess-only
> > mode, and adjusting pragma_lex () accordingly.
> > 
> > In preprocess-only mode, the preprocessor is accustomed to controlling the
> > interaction with libcpp, and it only knows about tokens that it has called
> > into libcpp itself to obtain. Since it still needs to see the tokens
> > obtained by pragma_lex () so that they can be streamed to the output, also
> > adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
> > inform the preprocessor about any tokens it won't be aware of.
> > 
> > Currently, there is one place where we are already supporting #pragma in
> > preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> > was done by directly interfacing with libcpp, rather than making use of
> > pragma_lex (). Now that pragma_lex () works, that code is no longer
> > necessary; remove it.
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c-common.h (c_init_preprocess): Declare.
> > (c_lex_enable_token_streaming): Declare.
> > * c-opts.cc (c_common_init): Call c_init_preprocess ().
> > * c-lex.cc (stream_tokens_to_preprocessor): New static variable.
> > (c_lex_enable_token_streaming): New function.
> > (cb_def_pragma): Add a comment.
> > (get_token): New function wrapping cpp_get_token.
> > (c_lex_with_flags): Use the new wrapper function to support
> > obtaining tokens in preprocess_only mode.
> > (lex_string): Likewise.
> > * c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
> > when needed.
> > * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> > (pragma_diagnostic_lex): ...this.
> > (pragma_diagnostic_lex_pp): Remove.
> > (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> > all modes.
> > (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> > usage.
> > * c-pragma.h (pragma_lex_discard_to_eol): Declare.
> > 
> > gcc/c/ChangeLog:
> > 
> > * c-parser.cc (pragma_lex_discard_to_eol): New function.
> > (c_init_preprocess): New function.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (c_init_preprocess): New function.
> > (maybe_read_tokens_for_pragma_lex): New function.
> > (pragma_lex): Support preprocess-only mode.
> > (pragma_lex_discard_to_eol): New function.
> > ---
> > 
> > Notes:
> >  Hello-
> >  Here is version 2 of the patch, incorporating Jason's feedback from
> >  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
> >  Thanks again, please let me know if it's OK? Bootstrap + regtest all
> >  languages on x86-64 Linux looks good.
> >  -Lewis
> > 
> >   gcc/c-family/c-common.h|  4 +++
> >   gcc/c-family/c-lex.cc  | 49 +
> >   gcc/c-family/c-opts.cc |  1 +
> >   gcc/c-family/c-ppoutput.cc | 17 +---
> >   gcc/c-family/c-pragma.cc   | 56 ++
> >   gcc/c-family/c-pragma.h|  2 ++
> >   gcc/c/c-parser.cc  | 21 ++
> >   gcc/cp/parser.cc   | 45 ++
> >   8 files changed, 138 insertions(+), 57 deletions(-)
> > 
> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index b5ef5ff6b2c..2fe2f194660 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -990,6 +990,9 @@ extern void c_parse_file (void);
> >   extern void c_parse_final_cleanups (void);
> > +/* This initializes for preprocess-only mode.  */
> > +extern void c_init_preprocess (void);
> > +
> >   /* These macros provide convenient access to the various _STMT nodes.  */
> >   /* Nonzero if a given STATEMENT_LIST represents the outermost binding
> > @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, 
> > tree);
> >   /* In c-lex.cc.  */
> >   extern enum cpp_ttype
> >   conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
> > +extern void c_lex_enable_token_streaming (bool enabled);
> >   /* In c-pch.cc  */
> >   extern void pch_init (void);
> > diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> > index dcd061c7cb1..ac4c018d863 100644
> > --- a/gcc/c-family/c-lex.cc
> > +++ b/gcc/c-family/c-lex.cc
> > @@ -57,6 +57,17 @@ static void c

[PATCH] gimple-fold: Handle _BitInt in __builtin_clear_padding [PR102989]

2023-07-28 Thread Jakub Jelinek via Gcc-patches
Hi!

The comments about _Atomic _BitInt made me figure out I forgot (although
earlier was planning to do that) to implement __builtin_clear_padding
support for _BitInt.

The following patch (incremental to the _BitInt series) does that.

2023-07-28  Jakub Jelinek  

PR c/102989
* gimple-fold.cc (clear_padding_unit): Mention in comment that
_BitInt types don't need to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.

* gcc.dg/bitint-16.c: New test.

--- gcc/gimple-fold.cc.jj   2023-07-11 15:28:54.704679510 +0200
+++ gcc/gimple-fold.cc  2023-07-28 12:37:18.971789595 +0200
@@ -4103,8 +4103,8 @@ gimple_fold_builtin_realloc (gimple_stmt
   return false;
 }
 
-/* Number of bytes into which any type but aggregate or vector types
-   should fit.  */
+/* Number of bytes into which any type but aggregate, vector or
+   _BitInt types should fit.  */
 static constexpr size_t clear_padding_unit
   = MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT;
 /* Buffer size on which __builtin_clear_padding folding code works.  */
@@ -4595,6 +4595,26 @@ clear_padding_real_needs_padding_p (tree
  && (fmt->signbit_ro == 79 || fmt->signbit_ro == 95));
 }
 
+/* _BitInt has padding bits if it isn't extended in the ABI and has smaller
+   precision than bits in limb or corresponding number of limbs.  */
+
+static bool
+clear_padding_bitint_needs_padding_p (tree type)
+{
+  struct bitint_info info;
+  gcc_assert (targetm.c.bitint_type_info (TYPE_PRECISION (type), &info));
+  if (info.extended)
+return false;
+  scalar_int_mode limb_mode = as_a <scalar_int_mode> (info.limb_mode);
+  if (TYPE_PRECISION (type) < GET_MODE_PRECISION (limb_mode))
+return true;
+  else if (TYPE_PRECISION (type) == GET_MODE_PRECISION (limb_mode))
+return false;
+  else
+return (((unsigned) TYPE_PRECISION (type))
+   % GET_MODE_PRECISION (limb_mode)) != 0;
+}
+
 /* Return true if TYPE might contain any padding bits.  */
 
 bool
@@ -4611,6 +4631,8 @@ clear_padding_type_may_have_padding_p (t
   return clear_padding_type_may_have_padding_p (TREE_TYPE (type));
 case REAL_TYPE:
   return clear_padding_real_needs_padding_p (type);
+case BITINT_TYPE:
+  return clear_padding_bitint_needs_padding_p (type);
 default:
   return false;
 }
@@ -4855,6 +4877,57 @@ clear_padding_type (clear_padding_struct
   memset (buf->buf + buf->size, ~0, sz);
   buf->size += sz;
   break;
+case BITINT_TYPE:
+  {
+   struct bitint_info info;
+   gcc_assert (targetm.c.bitint_type_info (TYPE_PRECISION (type), &info));
+   scalar_int_mode limb_mode = as_a <scalar_int_mode> (info.limb_mode);
+   if (TYPE_PRECISION (type) <= GET_MODE_PRECISION (limb_mode))
+ {
+   gcc_assert ((size_t) sz <= clear_padding_unit);
+   if ((unsigned HOST_WIDE_INT) sz + buf->size
+   > clear_padding_buf_size)
+ clear_padding_flush (buf, false);
+   if (!info.extended
+   && TYPE_PRECISION (type) < GET_MODE_PRECISION (limb_mode))
+ {
+   int tprec = GET_MODE_PRECISION (limb_mode);
+   int prec = TYPE_PRECISION (type);
+   tree t = build_nonstandard_integer_type (tprec, 1);
+   tree cst = wide_int_to_tree (t, wi::mask (prec, true, tprec));
+   int len = native_encode_expr (cst, buf->buf + buf->size, sz);
+   gcc_assert (len > 0 && (size_t) len == (size_t) sz);
+ }
+   else
+ memset (buf->buf + buf->size, 0, sz);
+   buf->size += sz;
+   break;
+ }
+   tree limbtype
+ = build_nonstandard_integer_type (GET_MODE_PRECISION (limb_mode), 1);
+   fldsz = int_size_in_bytes (limbtype);
+   nelts = int_size_in_bytes (type) / fldsz;
+   for (HOST_WIDE_INT i = 0; i < nelts; i++)
+ {
+   if (!info.extended
+   && i == (info.big_endian ? 0 : nelts - 1)
+   && (((unsigned) TYPE_PRECISION (type))
+   % TYPE_PRECISION (limbtype)) != 0)
+ {
+   int tprec = GET_MODE_PRECISION (limb_mode);
+   int prec = (((unsigned) TYPE_PRECISION (type)) % tprec);
+   tree cst = wide_int_to_tree (limbtype,
+wi::mask (prec, true, tprec));
+   int len = native_encode_expr (cst, buf->buf + buf->size,
+ fldsz);
+   gcc_assert (len > 0 && (size_t) len == (size_t) fldsz);
+   buf->size += fldsz;
+ }
+   else
+ clear_padding_type (buf, limbtype, fldsz, for_auto_init);
+ }
+   break;
+  }
 default:
   gcc_assert ((size_t) sz <= clear_padding_unit);
   if ((unsigned HOST_WIDE_INT) sz + buf->size > clear_padding_buf_size)
--- gcc

Re: _BitInt vs. _Atomic

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> I think there should be tests for _Atomic _BitInt types.  Hopefully atomic 
> compound assignment just works via the logic for compare-and-exchange 
> loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?

So, there are 2 issues.

One is something I haven't seen being handled for C at all so far, but
handled for C++ - padding bits.

Already e.g. x86 long double has some padding bits - 16 bits on ia32,
48 bits on x86_64, when one does
  _Atomic long double l;
...
  l += 2.0;
it will sometimes work and sometimes hang forever.
Similarly atomic_compare_exchange with structs which contain padding
(unions with padding bits are a lost case; there is nothing that can be
reliably done for them, because we don't know at runtime what the active
union member is, if any).  And a _BitInt that doesn't use all bits in all
containing limbs has padding as well - e.g. with 64-bit limbs, _BitInt(575)
uses only 575 of the 576 bits of its nine limbs (and the psABI doesn't say
whether the unused bits are sign or zero extended).

The C++ way of dealing with this is using __builtin_clear_padding,
done on atomic stores/updates of the atomic memory (padding is cleared
if any on the value to be stored, or on the expected and desired values).
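
I.e. for the x86 long double case above, something along these lines
(sketch only):

static void
atomic_store_clearing_padding (long double *p, long double v)
{
  __builtin_clear_padding (&v);	/* zero the 16/48 padding bits */
  __atomic_store (p, &v, __ATOMIC_SEQ_CST);
}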

I don't know enough about the C atomic requirements whether that is feasible
for it as well, or whether it is possible to make the padding bits partially
or fully set somehow non-atomically without invoking UB and then make it
never match.

If one ignores this or deals with it, then

_Atomic _BitInt(15) a;
_Atomic(_BitInt(15)) b;
_Atomic _BitInt(115) c;
_Atomic _BitInt(192) d;
_Atomic _BitInt(575) e;
_BitInt(575) f;

int
main ()
{
  a += 1wb;
  b -= 2wb;
  c += 3wb;
  d += 4wb;
  e -= 5wb;
//  f = __atomic_fetch_add (&e, 54342985743985743985743895743834298574985734895743895734895wb, __ATOMIC_SEQ_CST);
}

compiles fine with the patch set.

And another issue is that while __atomic_load, __atomic_store,
__atomic_exchange and __atomic_compare_exchange work on arbitrary _BitInt
sizes, others like __atomic_fetch_add only support _BitInt or other integral
types which have size of 1, 2, 4, 8 or 16 bytes, others emit an error
in c-family/c-common.cc (sync_resolve_size).  So, either
resolve_overloaded_builtin should for the case when pointer is pointer to
_BitInt which doesn't have 1, 2, 4, 8 or 16 bytes size lower those into
a loop using __atomic_compare_exchange (or perhaps also if there is
padding), or  should do that.
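
The loop would be the usual CAS expansion, something like (sketch only, and
modulo the padding clearing discussed above so that the comparison can't
spin forever):

_BitInt(575)
bitint_fetch_add (_BitInt(575) *p, _BitInt(575) val)
{
  _BitInt(575) old, desired;
  __atomic_load (p, &old, __ATOMIC_SEQ_CST);
  do
    desired = old + val;
  while (!__atomic_compare_exchange (p, &old, &desired, false,
				     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
  return old;
}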

Thoughts on that?

Jakub



[PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-28 Thread yanzhang.wang--- via Gcc-patches
From: Yanzhang Wang 

This patch will optimize the below mulh example,

vint32m1_t shortcut_for_riscv_vmulh_case_0(vint32m1_t v1, size_t vl) {
  return __riscv_vmulh_vx_i32m1(v1, 0, vl);
}

from mulh pattern

vsetvli   zero, a2, e32, m1, ta, ma
vmulh.vx  v24, v24, zero
vs1r.v    v24, 0(a0)

to below vmv.

vsetvli zero,a2,e32,m1,ta,ma
vmv.v.i v1,0
vs1r.v  v1,0(a0)

It will eliminate the mul with const 0 instruction in favor of the simple
mov instruction.

Signed-off-by: Yanzhang Wang 

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add a split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-121.c: The mul
  with 0 will be simplified to vmv.v.i.
* gcc.target/riscv/rvv/autovec/vmulh-with-zero.cc: New test.
---
 gcc/config/riscv/autovec-opt.md   | 58 +++
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv-v.cc   | 57 ++
 .../riscv/rvv/autovec/vmulh-with-zero.cc  | 19 ++
 .../riscv/rvv/base/binop_vx_constraint-121.c  |  3 +-
 5 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmulh-with-zero.cc

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 28040805b23..0d87572d1a4 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -405,3 +405,61 @@
   "vmv.x.s\t%0,%1"
   [(set_attr "type" "vimovvx")
(set_attr "mode" "")])
+
+;;; Simplify the mulh with 0 to move
+(define_split
+  [(set (match_operand:VI_QHS 0 "register_operand")
+        (if_then_else:VI_QHS
+          (unspec:<VM>
+            [(match_operand:<VM> 1 "vector_all_trues_mask_operand")
+             (match_operand 5 "vector_length_operand")
+             (match_operand 6 "const_int_operand")
+             (match_operand 7 "const_int_operand")
+             (match_operand 8 "const_int_operand")
+             (reg:SI VL_REGNUM)
+             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+          (unspec:VI_QHS
+            [(vec_duplicate:VI_QHS
+               (match_operand:<VEL> 4 "reg_or_0_operand"))
+             (match_operand:VI_QHS 3 "register_operand")] VMULH)
+          (match_operand:VI_QHS 2 "vector_merge_operand")))]
+  "TARGET_VECTOR
+    && rtx_equal_p (operands[4], CONST0_RTX (GET_MODE (operands[4])))"
+  [(const_int 0)]
+{
+  riscv_vector::simplify_unspec_operations (operands, UNSPEC,
+                                            , <MODE>mode);
+  DONE;
+})
+
+;;; Simplify vmadc + vadc with 0 to a simple move.
+(define_split
+  [(set (match_operand:VI 0 "register_operand")
+        (if_then_else:VI
+          (unspec:<VM>
+            [(match_operand 4 "vector_length_operand")
+             (match_operand 5 "const_int_operand")
+             (match_operand 6 "const_int_operand")
+             (reg:SI VL_REGNUM)
+             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+          (unspec:VI
+            [(match_operand:VI 2 "register_operand")
+             (unspec:<VM>
+               [(match_operand:VI 3 "register_operand")
+                (unspec:<VM>
+                  [(match_operand 7 "vector_length_operand")
+                   (match_operand 8 "const_int_operand")
+                   (reg:SI VL_REGNUM)
+                   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+                ] UNSPEC_OVERFLOW)
+             ] UNSPEC_VADC)
+          (match_operand:VI 1 "vector_merge_operand")))]
+  "TARGET_VECTOR"
+  [(const_int 0)]
+{
+  riscv_vector::simplify_unspec_operations (operands, PLUS, UNSPEC_VADC,
+                                            <MODE>mode);
+  DONE;
+})
+
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f052757cede..6a188a3d0ef 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -228,6 +228,8 @@ bool neg_simm5_p (rtx);
 bool has_vi_variant_p (rtx_code, rtx);
 void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
+void simplify_complement (rtx *, rtx_code, machine_mode);
+void simplify_unspec_operations (rtx*, rtx_code, int, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 839a2c6ba71..9a9428ce18d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2721,4 +2721,61 @@ expand_select_vl (rtx *ops)
   emit_insn (gen_no_side_effects_vsetvl_rtx (rvv_mode, ops[0], ops[1]));
 }
 
+void
+simplify_mulh (rtx *operands, machine_mode mode)
+{
+  rtx zero_operand = CONST0_RTX (GET_MODE (operands[4]));
+  if (rtx_equal_p (operands[4], zero_operand))
+    {
+      machine_mode mask_mode = riscv_vector::get_mask_mode (mode).require ();
+      emit_insn (gen_pred_mov (mode, operands[0], CONST1_RTX (mask_mode),
+                               RVV_VUNDEF (mode),
+                               CONST0_RTX (GET_MODE (operands[0])),
+                               ope

[patch] libgomp: cuda.h and omp_target_memcpy_rect cleanup (was: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect)

2023-07-28 Thread Tobias Burnus

Hi Thomas,

thanks for proof reading and the suggestions! – Do have comments to the
attached patch?

* * *

Crossref: For further optimizations, see also

https://gcc.gnu.org/PR101581 — [OpenMP] omp_target_memcpy – support
inter-device memcpy
https://gcc.gnu.org/PR110813 — [OpenMP] omp_target_memcpy_rect (+
strided 'target update'): Improve GCN performance and contiguous subranges

and just added based on Thomas' comment:

https://gcc.gnu.org/PR107424 — [OpenMP] Check whether device locking is
really needed for bare memcopy to/from devices (omp_target_memcpy...)

* * *

On 27.07.23 23:00, Thomas Schwinge wrote:

+++ b/include/cuda/cuda.h

I note that you're not actually using everything you're adding here.
(..., but I understand you're simply adding everying that relates to
these 'cuMemcpy[...]' routines -- OK as far as I'm concerned.)


Yes. That was on purpose to make it easier to pick something when needed
– especially as we might want to use some of those later on.

For symmetry, I now also added cuMemcpyPeer + ...Async, which also
remain unused. (But could be used as part of the PRs linked above.)


+  const void *dstHost;

That last one isn't 'const'.  ;-)

Fixed - three times.

A 'cuda.h' that I looked at calls that last one 'reserved0', with comment
"Must be NULL".

Seems to be unused in real world code and in the documentation. But
let's use this name as it might be exposed in the wild.

--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
+extern int GOMP_OFFLOAD_memcpy2d (int, int, size_t, size_t,
+   void*, size_t, size_t, size_t,
+   const void*, size_t, size_t, size_t);
+extern int GOMP_OFFLOAD_memcpy3d (int, int, size_t, size_t, size_t, void *,
+   size_t, size_t, size_t, size_t, size_t,
+   const void *, size_t, size_t, size_t, size_t,
+   size_t);

Oh, wow.  ;-)


Maybe this is not the best ABI. We can consider to modify it before the
GCC 14 release. (And in principle also afterwards, given that libgomp
and its plugins should™ be compiled and installed alongside.)

I think once we know how to implement GCN, we will see whether it was
done smartly or whether other arguments should be used or whether the
two functions should be combined.

[Regarding the reserved0/reserved1 values for cuMemcpy3D and whether they
should be NULL or not; quoting the usage in plugin-nvptx.c:]


I note that this doesn't adhere to the two "Must be NULL" remarks from
above -- but I'm confused, because, for example, on
capabilities
+& GOMP_OFFLOAD_CAP_SHARED_MEM)))

Are these 'GOMP_OFFLOAD_CAP_SHARED_MEM' actually reachable, given that
'omp_target_memcpy_check' (via 'omp_target_memcpy_rect_check') clears out
the device to 'NULL' for 'GOMP_OFFLOAD_CAP_SHARED_MEM'?


I have now undone this change – I did not dig deep enough into the
function calls.



+  else if (dst_devicep == NULL && src_devicep == NULL)
+ {
+   memcpy ((char *) dst + dst_off, (const char *) src + src_off,
+   length);
+   ret = 1;
+ }
else if (src_devicep == dst_devicep)
   ret = src_devicep->dev2dev_func (src_devicep->target_id,
(char *) dst + dst_off,
(const char *) src + src_off,
length);

..., but also left the intra-device case here -- which should now be dead
code here?


Why? Unless I missed something, the old, the current, and the proposed
(= old) code do still run this code.

I have not added an assert to confirm, but in any case, it is tested for
in my recently added testcase - thus, we could add a 'printf' to confirm.


+   else if (*tmp_size < length)
+ {
+   *tmp_size = length;
+   *tmp = realloc (*tmp, length);
+   if (*tmp == NULL)
+ return ENOMEM;

If 'realloc' returns 'NULL', we should 'free' the original '*tmp'?

Do we really need here the property here that if the re-allocation can't
be done in-place, 'realloc' copies the original content to the new?  In
other words, should we just unconditionally 'free' and re-'malloc' here,
instead of 'realloc'?

I have now done so - but I am not really sure which is faster on average.
If the buffer can be enlarged in place, 'realloc' is faster; if it cannot,
free+malloc is better.
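
I.e. roughly (tmp/tmp_size being the caller-owned buffer state as in the
patch):

static int
ensure_tmp_size (void **tmp, size_t *tmp_size, size_t length)
{
  if (*tmp_size < length)
    {
      free (*tmp);
      *tmp = malloc (length);
      if (*tmp == NULL)
	{
	  *tmp_size = 0;
	  return ENOMEM;
	}
      *tmp_size = length;
    }
  return 0;
}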

I haven't looked whether the re-use of 'tmp' for multiple calls

Re: [PATCH] gcc-ar: Handle response files properly [PR77576]

2023-07-28 Thread Costas Argyris via Gcc-patches
ping

On Fri, 14 Jul 2023 at 09:05, Costas Argyris wrote:

> Pinging to try and get this bug in gcc-ar fixed.
>
> Note that the patch posted as an attachment in
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623400.html
>
> is exactly the same as the patch embedded in
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623855.html
>
> and the one posted in the PR itself
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77576
>
> On Fri, 7 Jul 2023 at 13:00, Costas Argyris wrote:
>
>> Bootstrapped successfully on x86_64-pc-linux-gnu
>>
>> On Fri, 7 Jul 2023 at 11:33, Costas Argyris wrote:
>>
>>> Problem: gcc-ar fails when a @file is passed to it:
>>>
>>> $ cat rsp
>>> --version
>>> $ gcc-ar @rsp
>>> /usr/bin/ar: invalid option -- '@'
>>>
>>> This is because a dash '-' is prepended to the first
>>> argument if it doesn't start with one, resulting in
>>> the wrong call 'ar -@rsp'.
>>>
>>> Fix: Expand argv to get rid of any @files and if any
>>> expansions were made, pass everything through a
>>> temporary response file.
>>>
>>> $ gcc-ar @rsp
>>> GNU ar (GNU Binutils for Debian) 2.35.2
>>> ...
>>>
>>>
>>> PR gcc-ar/77576
>>> * gcc/gcc-ar.cc (main): Expand argv and use
>>> temporary response file to call ar if any
>>> expansions were made.
>>> ---
>>>  gcc/gcc-ar.cc | 47 +++
>>>  1 file changed, 47 insertions(+)
>>>
>>> diff --git a/gcc/gcc-ar.cc b/gcc/gcc-ar.cc
>>> index 4e4c525927d..417c4913793 100644
>>> --- a/gcc/gcc-ar.cc
>>> +++ b/gcc/gcc-ar.cc
>>> @@ -135,6 +135,10 @@ main (int ac, char **av)
>>>int k, status, err;
>>>const char *err_msg;
>>>const char **nargv;
>>> +  char **old_argv;
>>> +  const char *rsp_file = NULL;
>>> +  const char *rsp_arg = NULL;
>>> +  const char *rsp_argv[3];
>>>bool is_ar = !strcmp (PERSONALITY, "ar");
>>>int exit_code = FATAL_EXIT_CODE;
>>>int i;
>>> @@ -209,6 +213,13 @@ main (int ac, char **av)
>>>   }
>>>  }
>>>
>>> +  /* Expand any @files before modifying the command line
>>> + and use a temporary response file if there were any.  */
>>> +  old_argv = av;
>>> +  expandargv (&ac, &av);
>>> +  if (av != old_argv)
>>> +rsp_file = make_temp_file ("");
>>> +
>>>/* Prepend - if necessary.  */
>>>if (is_ar && av[1] && av[1][0] != '-')
>>>  av[1] = concat ("-", av[1], NULL);
>>> @@ -225,6 +236,39 @@ main (int ac, char **av)
>>>  nargv[j + k] = av[k];
>>>nargv[j + k] = NULL;
>>>
>>> +  /* If @file was passed, put nargv into the temporary response
>>> + file and then change it to a single @FILE argument, where
>>> + FILE is the temporary filename.  */
>>> +  if (rsp_file)
>>> +{
>>> +  FILE *f;
>>> +  int status;
>>> +  f = fopen (rsp_file, "w");
>>> +  if (f == NULL)
>>> +{
>>> +  fprintf (stderr, "Cannot open temporary file %s\n", rsp_file);
>>> +  exit (1);
>>> +}
>>> +  status = writeargv (
>>> +  CONST_CAST2 (char * const *, const char **, nargv) + 1, f);
>>> +  if (status)
>>> +{
>>> +  fprintf (stderr, "Cannot write to temporary file %s\n",
>>> rsp_file);
>>> +  exit (1);
>>> +}
>>> +  status = fclose (f);
>>> +  if (EOF == status)
>>> +{
>>> +  fprintf (stderr, "Cannot close temporary file %s\n",
>>> rsp_file);
>>> +  exit (1);
>>> +}
>>> +  rsp_arg = concat ("@", rsp_file, NULL);
>>> +  rsp_argv[0] = nargv[0];
>>> +  rsp_argv[1] = rsp_arg;
>>> +  rsp_argv[2] = NULL;
>>> +  nargv = rsp_argv;
>>> +}
>>> +
>>>/* Run utility */
>>>/* ??? the const is misplaced in pex_one's argv? */
>>>err_msg = pex_one (PEX_LAST|PEX_SEARCH,
>>> @@ -249,5 +293,8 @@ main (int ac, char **av)
>>>else
>>>  exit_code = SUCCESS_EXIT_CODE;
>>>
>>> +  if (rsp_file)
>>> +unlink (rsp_file);
>>> +
>>>return exit_code;
>>>  }
>>> --
>>> 2.30.2
>>>
>>


Re: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-28 Thread Kito Cheng via Gcc-patches
On Fri, 28 Jul 2023 at 19:50,  wrote:

> From: Yanzhang Wang 
>
> This patch will optimize the below mulh example,
>
> vint32m1_t shortcut_for_riscv_vmulh_case_0(vint32m1_t v1, size_t vl) {
>   return __riscv_vmulh_vx_i32m1(v1, 0, vl);
> }
>
> from mulh pattern
>
> vsetvli   zero, a2, e32, m1, ta, ma
> vmulh.vx  v24, v24, zero
> vs1r.v    v24, 0(a0)
>
> to below vmv.
>
> vsetvli zero,a2,e32,m1,ta,ma
> vmv.v.i v1,0
> vs1r.v  v1,0(a0)
>
> It will eliminate the mul with const 0 instruction in favor of the simple
> mov instruction.
>
> Signed-off-by: Yanzhang Wang 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add a split pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/binop_vx_constraint-121.c: The mul
>   with 0 will be simplified to vmv.v.i.
> * gcc.target/riscv/rvv/autovec/vmulh-with-zero.cc: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 58 +++
>  gcc/config/riscv/riscv-protos.h   |  2 +
>  gcc/config/riscv/riscv-v.cc   | 57 ++
>  .../riscv/rvv/autovec/vmulh-with-zero.cc  | 19 ++
>  .../riscv/rvv/base/binop_vx_constraint-121.c  |  3 +-
>  5 files changed, 138 insertions(+), 1 deletion(-)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmulh-with-zero.cc
>
> diff --git a/gcc/config/riscv/autovec-opt.md
> b/gcc/config/riscv/autovec-opt.md
> index 28040805b23..0d87572d1a4 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -405,3 +405,61 @@
>"vmv.x.s\t%0,%1"
>[(set_attr "type" "vimovvx")
> (set_attr "mode" "")])
> +
> +;;; Simplify the mulh with 0 to move
> +(define_split
> +  [(set (match_operand:VI_QHS 0 "register_operand")
> + (if_then_else:VI_QHS
> +   (unspec:
> +[(match_operand: 1 "vector_all_trues_mask_operand")
> +  (match_operand 5 "vector_length_operand")
> +  (match_operand 6 "const_int_operand")
> +  (match_operand 7 "const_int_operand")
> +  (match_operand 8 "const_int_operand")
> +  (reg:SI VL_REGNUM)
> +  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +   (unspec:VI_QHS
> +[(vec_duplicate:VI_QHS
> +   (match_operand: 4 "reg_or_0_operand"))
>

This could be just a const int zero rather than a match operand

+  (match_operand:VI_QHS 3 "register_operand")] VMULH)
> +   (match_operand:VI_QHS 2 "vector_merge_operand")
> +   ))]
> +  "TARGET_VECTOR
> + && rtx_equal_p (operands[4], CONST0_RTX (GET_MODE (operands[4])))"
>

Then no need to check here.


+  [(const_int 0)]
> +{
> +  riscv_vector::simplify_unspec_operations (operands, UNSPEC,
> +, mode) ;
> +  DONE;
> +})
> +
> +;;; Simplify vmadc + vadc with 0 to a simple move.
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> + (if_then_else:VI
> +   (unspec:
> +[(match_operand 4 "vector_length_operand")
> +  (match_operand 5 "const_int_operand")
> +  (match_operand 6 "const_int_operand")
> +  (reg:SI VL_REGNUM)
> +  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +   (unspec:VI
> +[(match_operand:VI 2 "register_operand")
> +  (unspec:
> +[(match_operand:VI 3 "register_operand")
> +  (unspec:
> +[(match_operand 7 "vector_length_operand")
> +  (match_operand 8 "const_int_operand")
> +  (reg:SI VL_REGNUM)
> +  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +  ] UNSPEC_OVERFLOW)
> +  ] UNSPEC_VADC)
> +   (match_operand:VI 1 "vector_merge_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +{
> +  riscv_vector::simplify_unspec_operations (operands, PLUS, UNSPEC_VADC,
> +   mode);
> +  DONE;
> +})
> +
> diff --git a/gcc/config/riscv/riscv-protos.h
> b/gcc/config/riscv/riscv-protos.h
> index f052757cede..6a188a3d0ef 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -228,6 +228,8 @@ bool neg_simm5_p (rtx);
>  bool has_vi_variant_p (rtx_code, rtx);
>  void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
>  bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
> +void simplify_complement (rtx *, rtx_code, machine_mode);
> +void simplify_unspec_operations (rtx*, rtx_code, int, machine_mode);
>  #endif
>  bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
>   bool, void (*)(rtx *, rtx));
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 839a2c6ba71..9a9428ce18d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -2721,4 +2721,61 @@ expand_select_vl (rtx *ops)
>emit_insn (gen_no_side_effects_vsetvl_rtx (rvv_mode, ops[0], ops[1]));
>  }
>
> +void simplify_mulh (rtx *operands,
> +   machine_mode mode)
> +{
> +  rtx zero_operand = CONST0_RTX(G

RE: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-28 Thread Wang, Yanzhang via Gcc-patches
This is a draft patch.  I would like to explain why it's hard to make the
simplification generic, and to ask for some help.

There are 2 categories we need to optimize.

- The op in optab, such as div / 1.
- The unspec operations, such as mulh * 0 and (vadc+vmadc) + 0.

Especially for the unspec operations, I found we need to write patterns one
by one to match each special case.  There seems to be no way to write a
generic pattern that will match mulh, (vadc+vmadc), sll, ...  This approach
is too complicated and not very elegant because it needs so many md
patterns.

Do you have any ideas?

> -Original Message-
> From: Wang, Yanzhang 
> Sent: Friday, July 28, 2023 7:50 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rdapp@gmail.com; Li,
> Pan2 ; Wang, Yanzhang 
> Subject: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.
> 
> From: Yanzhang Wang 
> 
> This patch will optimize the below mulh example,
> 
> vint32m1_t shortcut_for_riscv_vmulh_case_0(vint32m1_t v1, size_t vl) {
>   return __riscv_vmulh_vx_i32m1(v1, 0, vl); }
> 
> from mulh pattern
> 
> vsetvli   zero, a2, e32, m1, ta, ma
> vmulh.vx  v24, v24, zero
> vs1r.v    v24, 0(a0)
> 
> to below vmv.
> 
> vsetvli zero,a2,e32,m1,ta,ma
> vmv.v.i v1,0
> vs1r.v  v1,0(a0)
> 
> It will eliminate the mul with const 0 instruction in favor of the simple
> mov instruction.
> 
> Signed-off-by: Yanzhang Wang 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/autovec-opt.md: Add a split pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/binop_vx_constraint-121.c: The mul
> with 0 will be simplified to vmv.v.i.
>   * gcc.target/riscv/rvv/autovec/vmulh-with-zero.cc: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 58 +++
>  gcc/config/riscv/riscv-protos.h   |  2 +
>  gcc/config/riscv/riscv-v.cc   | 57 ++
>  .../riscv/rvv/autovec/vmulh-with-zero.cc  | 19 ++
>  .../riscv/rvv/base/binop_vx_constraint-121.c  |  3 +-
>  5 files changed, 138 insertions(+), 1 deletion(-)  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmulh-with-zero.cc
> 
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-
> opt.md index 28040805b23..0d87572d1a4 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -405,3 +405,61 @@
>"vmv.x.s\t%0,%1"
>[(set_attr "type" "vimovvx")
> (set_attr "mode" "")])
> +
> +;;; Simplify the mulh with 0 to move
> +(define_split
> +  [(set (match_operand:VI_QHS 0 "register_operand")
> + (if_then_else:VI_QHS
> +   (unspec:
> +  [(match_operand: 1 "vector_all_trues_mask_operand")
> +(match_operand 5 "vector_length_operand")
> +(match_operand 6 "const_int_operand")
> +(match_operand 7 "const_int_operand")
> +(match_operand 8 "const_int_operand")
> +(reg:SI VL_REGNUM)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +   (unspec:VI_QHS
> +  [(vec_duplicate:VI_QHS
> + (match_operand: 4 "reg_or_0_operand"))
> +(match_operand:VI_QHS 3 "register_operand")] VMULH)
> +   (match_operand:VI_QHS 2 "vector_merge_operand")
> +   ))]
> +  "TARGET_VECTOR
> + && rtx_equal_p (operands[4], CONST0_RTX (GET_MODE (operands[4])))"
> +  [(const_int 0)]
> +{
> +  riscv_vector::simplify_unspec_operations (operands, UNSPEC,
> +  , mode) ;
> +  DONE;
> +})
> +
> +;;; Simplify vmadc + vadc with 0 to a simple move.
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> + (if_then_else:VI
> +   (unspec:
> +  [(match_operand 4 "vector_length_operand")
> +(match_operand 5 "const_int_operand")
> +(match_operand 6 "const_int_operand")
> +(reg:SI VL_REGNUM)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +   (unspec:VI
> +  [(match_operand:VI 2 "register_operand")
> +(unspec:
> +  [(match_operand:VI 3 "register_operand")
> +(unspec:
> +  [(match_operand 7 "vector_length_operand")
> +(match_operand 8 "const_int_operand")
> +(reg:SI VL_REGNUM)
> +(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +] UNSPEC_OVERFLOW)
> +] UNSPEC_VADC)
> +   (match_operand:VI 1 "vector_merge_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +{
> +  riscv_vector::simplify_unspec_operations (operands, PLUS, UNSPEC_VADC,
> + mode);
> +  DONE;
> +})
> +
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-
> protos.h index f052757cede..6a188a3d0ef 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -228,6 +228,8 @@ bool neg_simm5_p (rtx);  bool has_vi_variant_p
> (rtx_code, rtx);  void expand_vec_cmp (rtx, rtx_code, rtx, rtx);  bool
> expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
> +void simplify_complement

New French PO file for 'gcc' (version 13.2.0)

2023-07-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-13.2.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




New Ukrainian PO file for 'gcc' (version 13.2.0)

2023-07-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-13.2.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-28 Thread Robin Dapp via Gcc-patches
> This is a draft patch. I would like to explain why it's hard to make the
> simplification generic, and to ask for some help.
> 
> There are two categories we need to optimize.
> 
> - The op in optab such as div / 1.
> - The unspec operation such as mulh * 0, (vadc+vmadc) + 0.
> 
> Especially for the unspec operations, I found we need to write them one
> by one to match each special pattern. There seems to be no way to write
> a generic pattern that will match mulh, (vadc+vmadc), sll... This way
> is too complicated and not so elegant because it needs so many
> md patterns.
> 
> Do you have any ideas?

Yes, it's cumbersome having to add the patterns individually
and it would be nicer to have the middle end optimize for us.

However, adding new RTL expressions, especially generic ones that
are useful for others, and the respective optimizations is a tedious
process as well.  Still, just recently Roger Sayle added bitreverse
and copysign.  You can refer to his patch as well as the follow-up
ones to get an idea of what would need to be done.
("Add RTX codes for BITREVERSE and COPYSIGN")

So if we have a few patterns that are really performance-critical
(like for some benchmark), my take is to add them in a similar way to
what you were proposing, but I would advise against using this excessively.
Is the mulh case somehow common or critical?
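
For readers following along: vmulh returns the high half of the
double-width product, so multiplying by zero always yields zero.  A
scalar model of the identity the proposed md splits exploit (an
illustration only, not code from the patch):

#include <stdint.h>

/* Scalar model of vmulh: the high 32 bits of the 64-bit product.  */
static int32_t mulh (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
}

/* mulh (x, 0) == 0 for every x, so a vmulh whose operand is a known
   zero vector can be rewritten as a move of zero into the
   destination register.  */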

Regards
 Robin


RE: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-28 Thread Li, Pan2 via Gcc-patches
Great! Thanks Robin for so many useful comments, as well as the
thought-provoking discussion with different insights.
I believe this kind of interactive discussion will empower all of us and
lead us to do the right things.

Back to this PATCH, I try to do only one thing at a time, and I totally agree
that there are some things we still need to try.
Thanks again, and let's wait for Kito's comments.

Pan

-Original Message-
From: Robin Dapp  
Sent: Friday, July 28, 2023 6:05 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang 
Subject: Re: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Hi Pan,

thanks for your patience and your work.  Apart from my general doubt
whether mode-changing intrinsics are a good idea, I don't have other
remarks that need fixing.  What I mentioned before:

 - Handling of asms wouldn't be a huge change.  It can be done
 in a follow-up patch of course but should be done eventually.

 - The code is still rather difficult to follow because we diverge
 from the usual mode-switching semantics e.g. in that we emit insns
 in mode_needed as well as in mode_set.  I would have preferred
 to stay close to the regular usage, document where and why we need
 to do something different and suggest future middle-end improvements
 to solve this more elegantly.

 - I hope non-local control flow like setjmp/longjmp, sibcall
 optimization and maybe others work fine.  I didn't see a reason
 why not but I haven't checked very closely either.

 - We can probably get away with not annotating every call with
 an FRM clobber because there isn't any pass that would make use
 of that anyway?


As to my general qualm, independent of this patch, quickly
summarized again one last time (the problem was latent before this
specific patch anyway):

I would prefer not to have mode-changing intrinsics at all but
have users call fesetround explicitly.  That way the exact point
where the rounding mode is changed would be obvious and not
subject to optimization as well as caching/backing up.
If at all necessary I would have preferred the LLVM way of
backing up, setting new mode, performing the instruction
and restoring directly after.
If the initial intent of mode-changing intrinsics was to give
users more control, I don't believe we achieve this by the "lazy"
restore mechanism which is rather an obfuscation.
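
To make the alternative concrete, here is a minimal sketch of the
explicit style I mean, where compute_rounded () is a purely
hypothetical stand-in for any FP operation:

#include <fenv.h>

extern double compute_rounded (double);  /* hypothetical operation */

double with_explicit_rounding (double x)
{
  int old = fegetround ();    /* back up the current rounding mode */
  fesetround (FE_UPWARD);     /* set the mode the operation needs */
  double r = compute_rounded (x);
  fesetround (old);           /* restore directly afterwards */
  return r;
}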

Pardon my frankness, but the whole mode-changing thing feels to me
like just getting a feature out of the door to solve "something" and
appease users rather than a well-thought-out feature.  It doesn't even
seem clear whether this optimization is worthwhile when changing the
rounding mode is prohibitively slow anyway.

That said, if the current status is what the majority of
contributors can live with, I'm not going to stand in the way,
but I'd ask Kito or somebody else to give the final OK.

Regards
 Robin


Re: Loop-split improvements, part 2

2023-07-28 Thread Richard Biener via Gcc-patches
On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> this patch fixes profile update in the first case of loop splitting.
> The pass still gives up on very basic testcases:
>
> __attribute__ ((noinline,noipa))
> void test1 (int n)
> {
>   if (n <= 0 || n > 10)
> return;
>   for (int i = 0; i <= n; i++)
> {
>   if (i < n)
> do_something ();
>   if (a[i])
> do_something2();
> }
> }
> Here I needed to add the conditional that enforces a sane value range of n.
> The reason is that it gives up on:
>   !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> and without the conditional we get the assumption that n>=0 and not INT_MAX.
> I think from overflow we should derive that the INT_MAX test is not needed,
> and since the loop does nothing for n<0 it is also just paranoia.

I only get n != 2147483647 (loop header copying does the n >= 0).  Indeed
this test looks odd.  It's because we turn i <= n into i < n + 1 and analyze
that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX.
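
As a sketch of the canonical form the analysis effectively sees (not
code from the patch):

void test1 (int n)
{
  if (n <= 0 || n > 10)
    return;
  /* The exit test i <= n is analyzed as i < n + 1, and n + 1 only has
     a defined value when n != INT_MAX, which is where the extra
     assumption comes from even though the guard already excludes it.  */
  for (int i = 0; i < n + 1; i++)
    ;
}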

> I am not sure how to fix this though :(.  In general the pass does not really
> need to compute the iteration count.  It only needs to know what direction the
> IVs go so it can detect tests that fire in the first part of the iteration space.
>
> Rich, any idea what the correct test should be?

In principle it could just look at the scalar evolution for the IV in
the exit test.
Aka use simple_iv () and check ->no_overflow?

> In testcase:
>   for (int i = 0; i < 200; i++)
> if (i < 150)
>   do_something ();
> else
>   do_something2 ();
> the old code did a wrong update of the exit condition probabilities.
> We know that the first loop iterates 150 times and the second loop 50 times,
> and we get that by simply scaling the loop body by the probability of the inner test.
>
> With the patch we now get:
>
>[count: 1000]:
>
>[count: 15]:<- loop 1 correctly iterates 149 times
>   # i_10 = PHI 
>   do_something ();
>   i_7 = i_10 + 1;
>   if (i_7 <= 149)
> goto ; [99.33%]
>   else
> goto ; [0.67%]
>
>[count: 149000]:
>   goto ; [100.00%]
>
>[count: 1000]:
>   # i_15 = PHI 
>
>[count: 49975]:<- loop 2 should iterate 50 times but
>we are slightly wrong
>   # i_3 = PHI 
>   do_something2 ();
>   i_14 = i_3 + 1;
>   if (i_14 != 200)
> goto ; [98.00%]
>   else
> goto ; [2.00%]
>
>[count: 48975]:
>   goto ; [100.00%]
>
>[count: 1000]:   <- this test is always true because it is
>   reached from bb 3
>   # i_18 = PHI 
>   if (i_18 != 200)
> goto ; [99.95%]
>   else
> goto ; [0.05%]
>
>[count: 1000]:
>   return;
>
> The reason why we are slightly wrong is the condition in bb17 that
> is always true, but the pass does not know it.
>
> Rich, any idea how to do that?  I think connect_loops should work out
> the case where the loop exit condition is never satisfied at the time
> the split condition fails for the first time.
>
> Also we do not update loop iteration expectancies.  If we were able to
> work out if one of the loop has constant iteration count, we could do it
> perfectly.
>
> Before patch on hmmer we get a lot of mismatches:
> Profile report here claims:
> dump id |static mismat|dynamic mismatch |
> |in count |in count  |time  |
> lsplit  |  5+5|   8151850567  +8151850567| 531506481006   +57.9%|
> ldist   |  9+4|  15345493501  +7193642934| 606848841056   +14.2%|
> ifcvt   | 10+1|  15487514871   +142021370| 689469797790   +13.6%|
> vect| 35   +25|  17558425961  +2070911090| 517375405715   -25.0%|
> cunroll | 42+7|  16898736178   -659689783| 452445796198-4.9%|
> loopdone| 33-9|   2678017188 -14220718990| 330969127663 |
> tracer  | 34+1|   2678018710+1522| 330613415364+0.0%|
> fre | 33-1|   2676980249 -1038461| 330465677073-0.0%|
> expand  | 28-5|   2497468467   -179511782|--|
>
> With patch
>
> lsplit  |  0  |0 | 328723360744-2.3%|
> ldist   |  0  |0 | 396193562452   +20.6%|
> ifcvt   |  1+1| 71010686+71010686| 478743508522   +20.8%|
> vect| 14   +13|697518955   +626508269| 299398068323   -37.5%|
> cunroll | 13-1|489349408   -208169547| 25839725   -10.5%|
> loopdone| 11-2|402558559-86790849| 201010712702 |
> tracer  | 13+2|402977200  +418641| 200651036623+0.0%|
> fre | 13  |402622146  -355054| 200344398654-0.2%|
> expand  | 11-2|333608636-69013510|--|
>
> So no mismatches for lsplit and ldist and also lsplit thinks it improves
> speed by 2.3% rather than 

New Croatian PO file for 'gcc' (version 13.2.0)

2023-07-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-13.2.0.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: Loop-split improvements, part 2

2023-07-28 Thread Jan Hubicka via Gcc-patches
> On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > this patch fixes profile update in the first case of loop splitting.
> > The pass still gives up on very basic testcases:
> >
> > __attribute__ ((noinline,noipa))
> > void test1 (int n)
> > {
> >   if (n <= 0 || n > 10)
> > return;
> >   for (int i = 0; i <= n; i++)
> > {
> >   if (i < n)
> > do_something ();
> >   if (a[i])
> > do_something2();
> > }
> > }
> > Here I needed to add the conditional that enforces a sane value range of n.
> > The reason is that it gives up on:
> >   !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> > and without the conditional we get the assumption that n>=0 and not INT_MAX.
> > I think from overflow we should derive that the INT_MAX test is not needed,
> > and since the loop does nothing for n<0 it is also just paranoia.
> 
> I only get n != 2147483647 (loop header copying does the n >= 0).  Indeed
> this test looks odd.  It's because we turn i <= n into i < n + 1 and analyze
> that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX.

Yep, I can't think of how that can disturb loop splitting.  The loop
above is similar to one in hmmer, so people do write loops like that.
We should be able to use the fact that i cannot overflow to get rid of
this assumption, but I am not that familiar with that code...

I think it would help elsewhere too?
> 
> In principle it could just look at the scalar evolution for the IV in
> the exit test.
> Aka use simple_iv () and check ->no_overflow?

Yep, I think that should be enough.  It uses simple_iv to analyze the
in-loop conditionals.  I will look into that.

Honza


Loop-split improvements, part 3

2023-07-28 Thread Jan Hubicka via Gcc-patches
Hi,
This patch extends tree-ssa-loop-split to understand tests of the form
 if (i==0)
and
 if (i!=0)
which trigger only during the first iteration.  Naturally we should
also be able to handle the last iteration, or split into 3 cases if
the test can indeed fire in the middle of the loop.
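
As a sketch of the intended transformation (an illustration with
placeholder calls, not generated code):

extern void first (void), rest (void);

void before (int n)
{
  for (int i = 0; i < n; i++)
    {
      if (i == 0)
        first ();
      else
        rest ();
    }
}

/* The i == 0 test is rewritten as i <= 0 (the step is positive), so
   the loop can be split at border 0 into a guarded first iteration
   followed by the remaining iterations:  */
void after (int n)
{
  if (n > 0)
    first ();
  for (int i = 1; i < n; i++)
    rest ();
}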

The last iteration needs a bit trickier pattern matching, so I want to do it
incrementally, but I implemented the easy case using value ranges, which
handles loops with constant iteration counts.

The testcase gets a mis-updated profile; I will also fix that incrementally.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

PR middle-end/77689
* tree-ssa-loop-split.cc: Include value-query.h.
(split_at_bb_p): Analyze cases where EQ/NE can be turned
into LT/LE/GT/GE; return updated guard code.
(split_loop): Use guard code.

gcc/testsuite/ChangeLog:

PR middle-end/77689
* g++.dg/tree-ssa/loop-split-1.C: New test.

diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-split-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/loop-split-1.C
new file mode 100644
index 000..9581438b536
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/loop-split-1.C
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-lsplit-details -std=c++11" } */
+#include <vector>
+#include <cmath>
+
+constexpr unsigned s = 1;
+
+int main()
+{
+std::vector a, b, c;
+a.reserve(s);
+b.reserve(s);
+c.reserve(s);
+
+for(unsigned i = 0; i < s; ++i)
+{
+if(i == 0)
+a[i] = b[i] * c[i];
+else
+a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
+}
+}
+/* { dg-final { scan-tree-dump-times "loop split" 1 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index 70cd0aaefa7..641346cba70 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify-me.h"
 #include "print-tree.h"
+#include "value-query.h"
 
 /* This file implements two kinds of loop splitting.
 
@@ -75,7 +76,8 @@ along with GCC; see the file COPYING3.  If not see
point in *BORDER and the comparison induction variable in IV.  */
 
 static tree
-split_at_bb_p (class loop *loop, basic_block bb, tree *border, affine_iv *iv)
+split_at_bb_p (class loop *loop, basic_block bb, tree *border, affine_iv *iv,
+  enum tree_code *guard_code)
 {
   gcond *stmt;
   affine_iv iv2;
@@ -87,19 +89,6 @@ split_at_bb_p (class loop *loop, basic_block bb, tree 
*border, affine_iv *iv)
 
   enum tree_code code = gimple_cond_code (stmt);
 
-  /* Only handle relational comparisons, for equality and non-equality
- we'd have to split the loop into two loops and a middle statement.  */
-  switch (code)
-{
-  case LT_EXPR:
-  case LE_EXPR:
-  case GT_EXPR:
-  case GE_EXPR:
-   break;
-  default:
-   return NULL_TREE;
-}
-
   if (loop_exits_from_bb_p (loop, bb))
 return NULL_TREE;
 
@@ -129,6 +118,56 @@ split_at_bb_p (class loop *loop, basic_block bb, tree 
*border, affine_iv *iv)
   if (!iv->no_overflow)
 return NULL_TREE;
 
+  /* Only handle relational comparisons, for equality and non-equality
+ we'd have to split the loop into two loops and a middle statement.  */
+  switch (code)
+{
+  case LT_EXPR:
+  case LE_EXPR:
+  case GT_EXPR:
+  case GE_EXPR:
+   break;
+  case NE_EXPR:
+  case EQ_EXPR:
+   /* If the test check for first iteration, we can handle NE/EQ
+  with only one split loop.  */
+   if (operand_equal_p (iv->base, iv2.base, 0))
+ {
+   if (code == EQ_EXPR)
+ code = !tree_int_cst_sign_bit (iv->step) ? LE_EXPR : GE_EXPR;
+   else
+ code = !tree_int_cst_sign_bit (iv->step) ? GT_EXPR : LT_EXPR;
+   break;
+ }
+   /* Similarly when the test checks for minimal or maximal
+  value range.  */
+   else
+ {
+   int_range<2> r;
+   get_global_range_query ()->range_of_expr (r, op0, stmt);
+   if (!r.varying_p () && !r.undefined_p ()
+   && TREE_CODE (op1) == INTEGER_CST)
+ {
+   wide_int val = wi::to_wide (op1);
+   if (known_eq (val, r.lower_bound ()))
+ {
+   code = (code == EQ_EXPR) ? LE_EXPR : GT_EXPR;
+   break;
+ }
+   else if (known_eq (val, r.upper_bound ()))
+ {
+   code = (code == EQ_EXPR) ? GE_EXPR : LT_EXPR;
+   break;
+ }
+ }
+ }
+   /* TODO: We can compare with exit condition; it seems that testing for
+  last iteration is common case.  */
+   return NULL_TREE;
+  default:
+   return NULL_TREE;
+}
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Found potential split point: ");

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-07-28 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
 wrote:
>
> Hi,
>
> Thanks for the rework and sorry for the slow review.
Hi Richard,
Thanks for the suggestions!  Please find my responses inline below.
>
> Prathamesh Kulkarni  writes:
> > Hi Richard,
> > This is reworking of patch to extend fold_vec_perm to handle VLA vectors.
> > The attached patch unifies handling of VLS and VLA vector_csts, while
> > using fallback code
> > for ctors.
> >
> > For VLS vector, the patch ignores underlying encoding, and
> > uses npatterns = nelts, and nelts_per_pattern = 1.
> >
> > For VLA patterns, if sel has a stepped sequence, then it
> > only chooses elements from a particular pattern of a particular
> > input vector.
> >
> > To make things simpler, the patch imposes following constraints:
> > (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
> > (b) The step size for a stepped sequence is a power of 2, and
> >   multiple of npatterns of chosen input vector.
> > (c) Runtime vector length of sel is a multiple of sel_npatterns.
> >  So, we don't handle sel.length = 2 + 2x and npatterns = 4.
> >
> > Eg:
> > op0, op1: npatterns = 2, nelts_per_pattern = 3
> > op0_len = op1_len = 16 + 16x.
> > sel = { 0, 0, 2, 0, 4, 0, ... }
> > npatterns = 2, nelts_per_pattern = 3.
> >
> > For pattern {0, 2, 4, ...}
> > Let,
> > a1 = 2
> > S = step size = 2
> >
> > Let Esel denote number of elements per pattern in sel at runtime.
> > Esel = (16 + 16x) / npatterns_sel
> > = (16 + 16x) / 2
> > = (8 + 8x)
> >
> > So, last element of pattern:
> > ae = a1 + (Esel - 2) * S
> >  = 2 + (8 + 8x - 2) * 2
> >  = 14 + 16x
> >
> > a1 /trunc arg0_len = 2 / (16 + 16x) = 0
> > ae /trunc arg0_len = (14 + 16x) / (16 + 16x) = 0
> > Since both are equal with quotient = 0, we select elements from op0.
> >
> > Since step size (S) is a multiple of npatterns(op0), we select
> > all elements from same pattern of op0.
> >
> > res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
> >= max (2, max (2, 2)
> >= 2
> >
> > res_nelts_per_pattern = max (op0_nelts_per_pattern,
> >                              max (op1_nelts_per_pattern,
> >                                   sel_nelts_per_pattern))
> >                       = max (3, max (3, 3))
> >                       = 3
> >
> > So res has encoding with npatterns = 2, nelts_per_pattern = 3.
> > res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
> >
> > Unfortunately, this results in an issue for poly_int_cst index:
> > For example,
> > op0, op1: npatterns = 1, nelts_per_pattern = 3
> > op0_len = op1_len = 4 + 4x
> >
> > sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
> >
> > In this case,
> > a1 = 5 + 4x
> > S = (6 + 4x) - (5 + 4x) = 1
> > Esel = 4 + 4x
> >
> > ae = a1 + (esel - 2) * S
> >  = (5 + 4x) + (4 + 4x - 2) * 1
> >  = 7 + 8x
> >
> > IIUC, 7 + 8x will always be index for last element of op1 ?
> > if x = 0, len = 4, 7 + 8x = 7
> > if x = 1, len = 8, 7 + 8x = 15, etc.
> > So the stepped sequence will always choose elements
> > from op1 regardless of vector length for above case ?
> >
> > However,
> > ae /trunc op0_len
> > = (7 + 8x) / (4 + 4x)
> > which is not defined because 7/4 != 8/4
> > and we return NULL_TREE, but I suppose the expected result would be:
> > res: { op1[0], op1[1], op1[2], ... } ?
> >
> > The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
> > and on x86_64-unknown-linux-gnu.
> > I would be grateful for suggestions on how to proceed.
> >
> > Thanks,
> > Prathamesh
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index a02ede79fed..8028b3e8e9a 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "vec-perm-indices.h"
> >  #include "asan.h"
> >  #include "gimple-range.h"
> > +#include 
> > +#include "tree-pretty-print.h"
> > +#include "gimple-pretty-print.h"
> > +#include "print-tree.h"
> >
> >  /* Nonzero if we are folding constants inside an initializer or a C++
> > manifestly-constant-evaluated context; zero otherwise.
> > @@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, tree 
> > expr)
> >  static bool
> >  vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
> >  {
> > -  unsigned HOST_WIDE_INT i, nunits;
> > +  unsigned HOST_WIDE_INT i;
> >
> > -  if (TREE_CODE (arg) == VECTOR_CST
> > -  && VECTOR_CST_NELTS (arg).is_constant (&nunits))
> > -{
> > -  for (i = 0; i < nunits; ++i)
> > - elts[i] = VECTOR_CST_ELT (arg, i);
> > -}
> > -  else if (TREE_CODE (arg) == CONSTRUCTOR)
> > +  if (TREE_CODE (arg) == CONSTRUCTOR)
> >  {
> >constructor_elt *elt;
> >
> > @@ -10519,6 +10517,230 @@ vec_cst_ctor_to_array (tree arg, unsigned int 
> > nelts, tree *elts)
> >return true;
> >  }
> >
>

[PATCH 1/4] openmp: Fix loop transformation tests

2023-07-28 Thread Frederik Harwath
libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction 
clause.
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize 
var.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add 
reduction
and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j

 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions

 integer :: i,j

+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions

 integer :: i,j

+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git 
a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions

 integer :: i,j

-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
--
2.36.1



[PATCH 0/4] openmp: loop transformation fixes

2023-07-28 Thread Frederik Harwath
Hi,
the following patches contain some fixes from the devel/omp/gcc-13 branch
to the patches that implement the OpenMP 5.1. loop transformation directives
which I have posted in March 2023.

Frederik



Frederik Harwath (4):
  openmp: Fix loop transformation tests
  openmp: Fix initialization for 'unroll full'
  openmp: Fix diagnostic message for "omp unroll"
  openmp: Fix number of iterations computation for "omp unroll full"

 gcc/omp-transform-loops.cc| 99 ++-
 .../gomp/loop-transforms/unroll-8.c   | 76 ++
 .../gomp/loop-transforms/unroll-8.f90 |  2 +-
 .../gomp/loop-transforms/unroll-9.f90 |  2 +-
 .../matrix-no-directive-unroll-full-1.C   | 13 +++
 .../loop-transforms/matrix-no-directive-1.c   |  2 +-
 .../matrix-no-directive-unroll-full-1.c   |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c|  2 +
 .../loop-transforms/matrix-omp-for-1.c|  2 +-
 .../matrix-omp-parallel-for-1.c   |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c   |  2 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |  2 +
 .../matrix-omp-target-parallel-for-1.c|  2 +-
 ...p-target-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |  2 +
 ...trix-omp-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-simd-1.c   |  2 +
 .../loop-transforms/unroll-1.c|  8 +-
 .../loop-transforms/unroll-non-rect-1.c   |  2 +
 .../loop-transforms/tile-2.f90|  2 +-
 .../loop-transforms/unroll-1.f90  |  2 +
 .../loop-transforms/unroll-6.f90  |  4 +-
 .../loop-transforms/unroll-simd-1.f90 |  3 +-
 23 files changed, 197 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
 create mode 100644 
libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

--
2.36.1



[PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full"

2023-07-28 Thread Frederik Harwath
gcc/ChangeLog:

* omp-transform-loops.cc (gomp_for_number_of_iterations):
Always compute "final - init" and do not take absolute value.
Identify non-iterating and infinite loops for constant init,
final, step values for better diagnostic messages, consistent
behaviour in those corner cases, and better testability.
(gomp_for_constant_iterations_p): Add new argument to pass
on information about infinite loops, and ...
(full_unroll): ... use from here to emit a warning and remove
unrolled, known infinite loops consistently.
(process_omp_for): Only print dump message if loop has not
been removed by transformation.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-8.c: New test.
---
 gcc/omp-transform-loops.cc| 94 ++-
 .../gomp/loop-transforms/unroll-8.c   | 76 +++
 2 files changed, 146 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index c8853bcee89..b0645397641 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -153,20 +153,27 @@ subst_defs (tree expr, gimple_seq seq)
   return expr;
 }

-/* Return an expression for the number of iterations of the outermost loop of
-   OMP_FOR. */
+/* Return an expression for the number of iterations of the loop at
+   the given LEVEL of OMP_FOR.
+
+   If the expression is a negative constant, this means that the loop
+   is infinite. This can only be recognized for loops with constant
+   initial, final, and step values.  In general, according to the
+   OpenMP specification, the behaviour is unspecified if the number of
+   iterations does not fit the types used for their computation, and
+   hence in particular if the loop is infinite. */

 tree
 gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level)
 {
   gcc_assert (!non_rectangular_p (omp_for));
-
   tree init = gimple_omp_for_initial (omp_for, level);
   tree final = gimple_omp_for_final (omp_for, level);
   tree_code cond = gimple_omp_for_cond (omp_for, level);
   tree index = gimple_omp_for_index (omp_for, level);
   tree type = gomp_for_iter_count_type (index, final);
-  tree step = TREE_OPERAND (gimple_omp_for_incr (omp_for, level), 1);
+  tree incr = gimple_omp_for_incr (omp_for, level);
+  tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr);

   init = subst_defs (init, gimple_omp_for_pre_body (omp_for));
   init = fold (init);
@@ -181,34 +188,64 @@ gomp_for_number_of_iterations (const gomp_for *omp_for, 
size_t level)
   diff_type = ptrdiff_type_node;
 }

-  tree diff;
-  if (cond == GT_EXPR)
-diff = fold_build2 (minus_code, diff_type, init, final);
-  else if (cond == LT_EXPR)
-diff = fold_build2 (minus_code, diff_type, final, init);
-  else
-gcc_unreachable ();

-  diff = fold_build2 (CEIL_DIV_EXPR, type, diff, step);
-  diff = fold_build1 (ABS_EXPR, type, diff);
+  /* Identify a simple case in which the loop does not iterate. The
+ computation below could not tell this apart from an infinite
+ loop, hence we handle this separately for better diagnostic
+ messages. */
+  gcc_assert (cond == GT_EXPR || cond == LT_EXPR);
+  if (TREE_CONSTANT (init) && TREE_CONSTANT (final)
+  && ((cond == GT_EXPR && tree_int_cst_le (init, final))
+ || (cond == LT_EXPR && tree_int_cst_le (final, init
+return build_int_cst (diff_type, 0);
+
+  tree diff = fold_build2 (minus_code, diff_type, final, init);
+
+  /* Divide diff by the step.
+
+ We could always use CEIL_DIV_EXPR since only non-negative results
+ correspond to valid number of iterations and the behaviour is
+ unspecified by the spec otherwise. But we try to get the rounding
+ right for constant negative values to identify infinite loops
+ more precisely for better warnings. */
+  tree_code div_expr = CEIL_DIV_EXPR;
+  if (TREE_CONSTANT (diff) && TREE_CONSTANT (step))
+{
+  bool diff_is_neg = tree_int_cst_lt (diff, size_zero_node);
+  bool step_is_neg = tree_int_cst_lt (step, size_zero_node);
+  if ((diff_is_neg && !step_is_neg)
+ || (!diff_is_neg && step_is_neg))
+   div_expr = FLOOR_DIV_EXPR;
+}

+  diff = fold_build2 (div_expr, type, diff, step);
   return diff;
 }

-/* Return true if the expression representing the number of iterations for
-   OMP_FOR is a constant expression, false otherwise. */
+/* Return true if the expression representing the number of iterations
+   for OMP_FOR is a non-negative constant and set ITERATIONS to the
+   value of that expression. Otherwise, return false.  Set INFINITE to
+   true if the number of iterations was recognized to be infinite. */

 bool
 gomp_for_constant_iterations_p (gomp_for *omp_for,
-   unsigned HOST_WIDE_INT *iterations)
+ 

[PATCH 3/4] openmp: Fix diagnostic message for "omp unroll"

2023-07-28 Thread Frederik Harwath
gcc/ChangeLog:

* omp-transform-loops.cc (print_optimized_unroll_partial_msg):
Output "omp unroll partial" instead of "omp unroll auto".
(optimize_transformation_clauses): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
---
 gcc/omp-transform-loops.cc| 4 ++--
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90   | 2 +-
 .../testsuite/libgomp.fortran/loop-transforms/unroll-6.f90| 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 275a5260dae..c8853bcee89 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -1423,7 +1423,7 @@ print_optimized_unroll_partial_msg (tree c)
   tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc,
   "replaced consecutive % directives by "
-  "%\n", tree_to_uhwi (unroll_factor));
 }

@@ -1483,7 +1483,7 @@ optimize_transformation_clauses (tree clauses)

  dump_printf_loc (
  MSG_OPTIMIZED_LOCATIONS, dump_loc,
- "removed useless % directives "
+ "removed useless % directives "
  "preceding 'omp unroll full'\n");
}
}
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 
b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
index fd687890ee6..dab3f0fb5cf 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
@@ -5,7 +5,7 @@ subroutine test1
   implicit none
   integer :: i
   !$omp parallel do collapse(1)
-  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' 
directives by 'omp unroll auto\(24\)'} }
+  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' 
directives by 'omp unroll partial\(24\)'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 
b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
index 928ca44e811..91e13ff1b37 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
@@ -4,7 +4,7 @@
 subroutine test1
   implicit none
   integer :: i
-  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' 
directives preceding 'omp unroll full'} }
+  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll partial' 
directives preceding 'omp unroll full'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
index 1df8ce8d5bb..b953ce31b5b 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
@@ -22,7 +22,7 @@ contains

 sum = 0
 !$omp parallel do reduction(+:sum) lastprivate(i)
-!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp 
unroll' directives by 'omp unroll auto\(50\)'} }
+!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp 
unroll' directives by 'omp unroll partial\(50\)'} }
 !$omp unroll partial(10)
 do i = 1,n,step
sum = sum + 1
@@ -36,7 +36,7 @@ contains
 sum = 0
 !$omp parallel do reduction(+:sum) lastprivate(i)
 do i = 1,n,step
-   !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' 
directives preceding 'omp unroll full'} }
+   !$omp unroll full ! { dg-optimized {removed useless 'omp unroll 
partial' directives preceding 'omp unroll full'} }
!$omp unroll partial(10)
do j = 1, 1000
   sum = sum + 1
--
2.36.1



[PATCH 2/4] openmp: Fix initialization for 'unroll full'

2023-07-28 Thread Frederik Harwath
The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.

Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.

gcc/ChangeLog:

* omp-transform-loops.cc (full_unroll): Add initialization of index 
variable.

libgomp/ChangeLog:

* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty 
pragmas.
Use -O2.
* 
testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
Copy of 
testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
but using -O0 which works only for C++.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use 
-Wall
and use -Wno-unknown-pragmas to disable warnings about empty pragmas.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
Likewise and fix broken function calls found by -Wall.
---
 gcc/omp-transform-loops.cc  |  1 +
 .../matrix-no-directive-unroll-full-1.C | 13 +
 .../loop-transforms/matrix-no-directive-1.c |  2 +-
 .../matrix-no-directive-unroll-full-1.c |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c  |  2 ++
 .../loop-transforms/matrix-omp-for-1.c  |  2 +-
 .../loop-transforms/matrix-omp-parallel-for-1.c |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c |  2 ++
 .../matrix-omp-parallel-masked-taskloop-simd-1.c|  2 ++
 .../matrix-omp-target-parallel-for-1.c  |  2 +-
 ...rix-omp-target-teams-distribute-parallel-for-1.c |  2 ++
 .../loop-transforms/matrix-omp-taskloop-1.c |  2 ++
 .../matrix-omp-teams-distribute-parallel-for-1.c|  2 ++
 .../loop-transforms/matrix-simd-1.c |  2 ++
 .../libgomp.c-c++-common/loop-transforms/unroll-1.c |  8 +---
 .../loop-transforms/unroll-non-rect-1.c |  2 ++
 16 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 517faea537c..275a5260dae 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -548,6 +548,7 @@ full_unroll (gomp_for *omp_for, location_t loc, walk_ctx 
*ctx ATTRIBUTE_UNUSED)

   gimple_seq unrolled = NULL;
   gimple_seq_add_seq (&unrolled, gimple_omp_for_pre_body (omp_for));
+  gimplify_assign (index, init, &unrolled);
   push_gimplify_context ();
   gimple_seq_add_seq (&unrolled,
  build_unroll_body (body, unroll_factor, index, incr));
diff --git 
a/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
 
b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
new file mode 100644
index 000..3a684219627
--- /dev/null
+++ 
b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
@@ -0,0 +1,13 @@
+/* { dg-additional-options { -O0 -fdump-tree-original -Wall 
-Wno-unknown-pragmas } } */
+
+#define COMMON_DIRECTIVE
+#define COMMON_TOP_TRANSFORM omp unroll full
+#define COLLAPSE_1
+#define COLLAPSE_2
+#define COLLAPSE_3
+#define IMPLEMENTATION_FILE 
"../../libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h"
+
+#include 
"../../libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "unroll_full" 13 "original" } } */
diff --git 
a/libgomp/testsuite/libgomp.c-c++-common/loop-tra

Re: Loop-split improvements, part 3

2023-07-28 Thread Richard Biener via Gcc-patches
On Fri, Jul 28, 2023 at 2:57 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> This patch extends tree-ssa-loop-split to understand tests of the form
>  if (i==0)
> and
>  if (i!=0)
> which trigger only during the first iteration.  Naturally we should
> also be able to handle the last iteration, or split into 3 cases if
> the test can indeed fire in the middle of the loop.
>
> The last iteration needs a bit trickier pattern matching, so I want to do it
> incrementally, but I implemented the easy case using value ranges, which
> handles loops with constant iteration counts.
>
> The testcase gets a mis-updated profile; I will also fix that incrementally.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK, though I think we can handle more loops by simply conservatively peeling
one iteration at the beginning/end with such conditions; those would not be
subject to all the other limitations the loop splitting pass has?

Richard.

> gcc/ChangeLog:
>
> PR middle-end/77689
> * tree-ssa-loop-split.cc: Include value-query.h.
> (split_at_bb_p): Analyze cases where EQ/NE can be turned
> into LT/LE/GT/GE; return updated guard code.
> (split_loop): Use guard code.
>
> gcc/testsuite/ChangeLog:
>
> PR middle-end/77689
> * g++.dg/tree-ssa/loop-split-1.C: New test.
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-split-1.C 
> b/gcc/testsuite/g++.dg/tree-ssa/loop-split-1.C
> new file mode 100644
> index 000..9581438b536
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-split-1.C
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details -std=c++11" } */
> +#include <vector>
> +#include <cmath>
> +
> +constexpr unsigned s = 1;
> +
> +int main()
> +{
> +std::vector a, b, c;
> +a.reserve(s);
> +b.reserve(s);
> +c.reserve(s);
> +
> +for(unsigned i = 0; i < s; ++i)
> +{
> +if(i == 0)
> +a[i] = b[i] * c[i];
> +else
> +a[i] = (b[i] + c[i]) * c[i-1] * std::log(i);
> +}
> +}
> +/* { dg-final { scan-tree-dump-times "loop split" 1 "lsplit" } } */
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index 70cd0aaefa7..641346cba70 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
>  #include "print-tree.h"
> +#include "value-query.h"
>
>  /* This file implements two kinds of loop splitting.
>
> @@ -75,7 +76,8 @@ along with GCC; see the file COPYING3.  If not see
> point in *BORDER and the comparison induction variable in IV.  */
>
>  static tree
> -split_at_bb_p (class loop *loop, basic_block bb, tree *border, affine_iv *iv)
> +split_at_bb_p (class loop *loop, basic_block bb, tree *border, affine_iv *iv,
> +  enum tree_code *guard_code)
>  {
>gcond *stmt;
>affine_iv iv2;
> @@ -87,19 +89,6 @@ split_at_bb_p (class loop *loop, basic_block bb, tree 
> *border, affine_iv *iv)
>
>enum tree_code code = gimple_cond_code (stmt);
>
> -  /* Only handle relational comparisons, for equality and non-equality
> - we'd have to split the loop into two loops and a middle statement.  */
> -  switch (code)
> -{
> -  case LT_EXPR:
> -  case LE_EXPR:
> -  case GT_EXPR:
> -  case GE_EXPR:
> -   break;
> -  default:
> -   return NULL_TREE;
> -}
> -
>if (loop_exits_from_bb_p (loop, bb))
>  return NULL_TREE;
>
> @@ -129,6 +118,56 @@ split_at_bb_p (class loop *loop, basic_block bb, tree 
> *border, affine_iv *iv)
>if (!iv->no_overflow)
>  return NULL_TREE;
>
> +  /* Only handle relational comparisons, for equality and non-equality
> + we'd have to split the loop into two loops and a middle statement.  */
> +  switch (code)
> +{
> +  case LT_EXPR:
> +  case LE_EXPR:
> +  case GT_EXPR:
> +  case GE_EXPR:
> +   break;
> +  case NE_EXPR:
> +  case EQ_EXPR:
> +   /* If the test check for first iteration, we can handle NE/EQ
> +  with only one split loop.  */
> +   if (operand_equal_p (iv->base, iv2.base, 0))
> + {
> +   if (code == EQ_EXPR)
> + code = !tree_int_cst_sign_bit (iv->step) ? LE_EXPR : GE_EXPR;
> +   else
> + code = !tree_int_cst_sign_bit (iv->step) ? GT_EXPR : LT_EXPR;
> +   break;
> + }
> +   /* Similarly when the test checks for minimal or maximal
> +  value range.  */
> +   else
> + {
> +   int_range<2> r;
> +   get_global_range_query ()->range_of_expr (r, op0, stmt);
> +   if (!r.varying_p () && !r.undefined_p ()
> +   && TREE_CODE (op1) == INTEGER_CST)
> + {
> +   wide_int val = wi::to_wide (op1);
> +   if (known_eq (val, r.lower_bound ()))
> + {
> +   code = (code == EQ_EXPR) ? LE_EXPR : GT_EXPR;
> +   break;
>

Re: Loop-split improvements, part 3

2023-07-28 Thread Jan Hubicka via Gcc-patches
> On Fri, Jul 28, 2023 at 2:57 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > This patch extends tree-ssa-loop-split to understand tests of the form
> >  if (i==0)
> > and
> >  if (i!=0)
> > which trigger only during the first iteration.  Naturally we should
> > also be able to handle the last iteration, or split into 3 cases if
> > the test can indeed fire in the middle of the loop.
> >
> > The last iteration needs a bit trickier pattern matching, so I want to do it
> > incrementally, but I implemented the easy case using value ranges, which
> > handles loops with constant iteration counts.
> >
> > The testcase gets a mis-updated profile; I will also fix that incrementally.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> OK, though I think we can handle more loops by simply conservatively peeling
> one iteration at the beginning/end with such conditions; those would not be
> subject to all the other limitations the loop splitting pass has?

I was also thinking of extending the loop peeling heuristics by this.
Loop-ch can already handle the case where the static test exits the loop, so we
could get this if I figure out how to merge the analysis.

To handle the last iteration (like in hmmer), we would need to extend loop
peeling to support that.

Even with that, tree-ssa-loop-split has a chance to be more informed and
have a better cost model.  Let me see how many restrictions can be dropped
from it.

Honza


Re: _BitInt vs. _Atomic

2023-07-28 Thread Martin Uecker



> On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > I think there should be tests for _Atomic _BitInt types.  Hopefully atomic 
> > compound assignment just works via the logic for compare-and-exchange 
> > loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?
> 
> So, there are 2 issues.
> 
> One is something I haven't seen being handled for C at all so far, but
> handled for C++ - padding bits.
> 
> Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> 48 bits on x86_64, when one does
>   _Atomic long double l;
> ...
>   l += 2.0;
> it will sometimes work and sometimes hang forever.
> Similarly atomic_compare_exchange with structs which contain padding
> (unions with padding bits are a lost cause, there is nothing that can be
> reliably done for that, because we don't know at runtime what is the active
> union member if any).  And _BitInt if it doesn't use all bits in
> all containing limbs has padding as well (and psABI doesn't say it is sign
> or zero extended).

What is the problem here?  In C, atomic_compare_exchange is defined in terms
of the memory content which includes padding.  So it may fail spuriously
due to padding differences (but it may fail anyway for arbitrary reasons
even without padding differences), but then it should work in the second
iteration.

Martin





Re: _BitInt vs. _Atomic

2023-07-28 Thread Martin Uecker
On Friday, 2023-07-28 at 16:03 +0200, Martin Uecker wrote:
> 
> > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > I think there should be tests for _Atomic _BitInt types.  Hopefully 
> > > atomic 
> > > compound assignment just works via the logic for compare-and-exchange 
> > > loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?
> > 
> > So, there are 2 issues.
> > 
> > One is something I haven't seen being handled for C at all so far, but
> > handled for C++ - padding bits.
> > 
> > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > 48 bits on x86_64, when one does
> >   _Atomic long double l;
> > ...
> >   l += 2.0;
> > it will sometimes work and sometimes hang forever.
> > Similarly atomic_compare_exchange with structs which contain padding
> > (unions with padding bits are a lost cause, there is nothing that can be
> > reliably done for that, because we don't know at runtime what is the active
> > union member if any).  And _BitInt if it doesn't use all bits in
> > all containing limbs has padding as well (and psABI doesn't say it is sign
> > or zero extended).
> 
> What is the problem here?  In C, atomic_compare_exchange is defined in terms
> of the memory content which includes padding.  So it may fail spuriously
> due to padding differences (but it may fail anyway for arbitrary reasons
> even without padding differences), but then it should work in the second
> iteration.

(only the weak version can fail spuriously, but the strong one can still
fail if there are differences in the padding)





Re: _BitInt vs. _Atomic

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 04:03:39PM +0200, Martin Uecker wrote:
> > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > I think there should be tests for _Atomic _BitInt types.  Hopefully 
> > > atomic 
> > > compound assignment just works via the logic for compare-and-exchange 
> > > loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?
> > 
> > So, there are 2 issues.
> > 
> > One is something I haven't seen being handled for C at all so far, but
> > handled for C++ - padding bits.
> > 
> > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > 48 bits on x86_64, when one does
> >   _Atomic long double l;
> > ...
> >   l += 2.0;
> > it will sometimes work and sometimes hang forever.
> > Similarly atomic_compare_exchange with structs which contain padding
> > (unions with padding bits are a lost cause, there is nothing that can be
> > reliably done for that, because we don't know at runtime what is the active
> > union member if any).  And _BitInt if it doesn't use all bits in
> > all containing limbs has padding as well (and psABI doesn't say it is sign
> > or zero extended).
> 
> What is the problem here?  In C, atomic_compare_exchange is defined in terms
> of the memory content which includes padding.  So it may fail spuriously
> due to padding differences (but it may fail anyway for arbitrary reasons
> even without padding differences), but then it should work in the second
> iteration.

See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html
for background.  The thing is that the user doesn't have much control over those
padding bits, so whether _Atomic operations on long double (when it is 80-bit
and stores from hw actually store 10 bytes rather than 12 or 16), on
_BitInt(37) or _BitInt(195), or on struct S { char a; int b; } work then
depends purely on luck.
atomic_compare_exchange location or whatever atomic_compare_exchange gave
back, if in the loop one e.g. adds something to it, then again it might get
different padding bits from what is originally in memory, so it isn't true
that it will always succeed at least in the second loop iteration.
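
To make the failure mode concrete, a minimal sketch of the usual
compare-exchange loop, assuming x86 extended-precision long double
with padding bytes:

#include <stdatomic.h>

_Atomic long double l;

void add_two (void)
{
  long double expected = atomic_load (&l);
  long double desired;
  do
    desired = expected + 2.0L;
  while (!atomic_compare_exchange_weak (&l, &expected, desired));
  /* The comparison is bytewise and includes the padding.  On failure
     `expected' is refreshed from memory, but if it is then spilled
     through an x87 register only the 10 value bytes survive, so its
     padding may never again match what is stored in memory and the
     loop can spin forever.  */
}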

Jakub



Re: _BitInt vs. _Atomic

2023-07-28 Thread Martin Uecker
On Friday, 2023-07-28 at 16:26 +0200, Jakub Jelinek wrote:
> On Fri, Jul 28, 2023 at 04:03:39PM +0200, Martin Uecker wrote:
> > > On Thu, Jul 27, 2023 at 07:06:03PM +, Joseph Myers wrote:
> > > > I think there should be tests for _Atomic _BitInt types.  Hopefully 
> > > > atomic 
> > > > compound assignment just works via the logic for compare-and-exchange 
> > > > loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?
> > > 
> > > So, there are 2 issues.
> > > 
> > > One is something I haven't seen being handled for C at all so far, but
> > > handled for C++ - padding bits.
> > > 
> > > Already e.g. x86 long double has some padding bits - 16 bits on ia32,
> > > 48 bits on x86_64, when one does
> > >   _Atomic long double l;
> > > ...
> > >   l += 2.0;
> > > it will sometimes work and sometimes hang forever.
> > > Similarly atomic_compare_exchange with structs which contain padding
> > > (unions with padding bits are a lost cause, there is nothing that can be
> > > reliably done for that, because we don't know at runtime what is the 
> > > active
> > > union member if any).  And _BitInt if it doesn't use all bits in
> > > all containing limbs has padding as well (and psABI doesn't say it is sign
> > > or zero extended).
> > 
> > What is the problem here?  In C, atomic_compare_exchange is defined in terms
> > of the memory content which includes padding.  So it may fail spuriously
> > due to padding differences (but it may fail anyway for arbitrary reasons
> > even without padding differences), but then it should work in the second
> > iteration.
> 
> See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0528r3.html
> for background.  

Thanks. I saw this at the time, but it seems to refer to
C++-specific problems. At least at that time, I concluded (maybe
incorrectly) that this is not a serious problem for how things
work in C.


> The thing is that user doesn't have much control over those
> padding bits, so whether _Atomic operations on long double (when it is 80
> bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> purely on luck.  If the expected value is based on atomic_load on the
> atomic_compare_exchange location or whatever atomic_compare_exchange gave
> back, if in the loop one e.g. adds something to it, then again it might get
> different padding bits from what is originally in memory, so it isn't true
> that it will always succeed at least in the second loop iteration.

Sorry, somehow I must be missing something here.

If you add something, you create a new value, and this may (in
an object) have random new padding.  But the "expected" value should
be updated by a failed atomic_compare_exchange cycle and then have
the same padding as the value stored in the atomic.  So the next cycle
should succeed.  The user would not change the representation of
the "expected" value but would create a new value for another object
by adding something.


Martin






[PATCH] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-07-28 Thread Carl Love via Gcc-patches
GCC maintainers:

The following patch cleans up the definitions of the
__builtin_altivec_vcmpne{b,h,w} built-ins.  The current implementation
implies that the built-ins are only supported on Power 9 since they are
defined under the Power 9 stanza.  However, the built-ins have no ISA
restrictions, as stated in the Power Vector Intrinsic Programming
Reference document.  The current built-ins work because they get
replaced during GIMPLE folding by a simple not-equal operator, so they
don't get expanded and checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 

--
rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.def.  This implies they are only
supported on Power 9 and above, when in fact they are defined and work on
Power 8 as well with the appropriate Power 8 instruction generation.

The vec_cmpne built-in should generate the vcmpequ{b,h,w} instruction on
Power 8 and generate vcmpne{b,h,w} on Power 9 and newer processors.

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
enables the vcmpequ{b,h,w} instruction to be generated on Power 8 and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.  Coverage for unsigned long long int and long long int
for Power 10 in int_128bit-runnable.c.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.
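
For reference, a minimal sketch of how the affected built-ins are
reached through the vec_cmpne intrinsic (compile with -maltivec; per
the description above, this should expand to vcmpequw plus a not on
Power 8 and to vcmpnew on Power 9 and later):

#include <altivec.h>

vector bool int
ne_mask (vector signed int a, vector signed int b)
{
  return vec_cmpne (a, b);
}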

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew,
vcmpnet): Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.
---
 gcc/config/rs6000/altivec.md          | 12 ++++++++++++
 gcc/config/rs6000/rs6000-builtins.def | 18 +++++++++---------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_<mode>"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+	(eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+			  (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+	(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+    operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts<VI_char>_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..6b06fa8b34d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,15 @@
   const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
 VCMPGTUW_P vector_gtu_v4si_p {pred}
 
+  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
+VCMPNEB altivec_vcmpne_v16qi {}
+
+  const vss __builtin_altivec_vcmpneh (vss, vss);
+VCMPNEH altivec_vcmpne_v8hi {}
+
+  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
+VCMPNEW altivec_vcmpne_v4si {}
+
   const vsi __builtin_altivec_vctsxs (vf, const int<5>);
 VCTSXS altivec_vctsxs {}
 
@@ -2599,9 +2608,6 @@
   const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
 VCMPAEW_P vector_ae_v4si_p {}
 
-  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
-VCMPNEB vcmpneb {}
-
   const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
 VCMPNEB_P vector_ne_v16qi_p {}
 
@@ -2614,15 +2620,9 @@
   const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
 VCMPNEFP_P vector_ne_v4sf_p {}
 
-  const vss __builtin_altivec_vcmpneh (vss, vss);
-VCMPNEH vcmpneh {}
-
   const signed int __builtin_altivec_vcmpneh_p (vss, vss);
 VCMPNEH_P vector_ne_v8hi_p {}
 
-  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
-VCMPNEW vcmpnew {}
-
   const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
 VCMPNEW_P vector_ne_v4si_p {}
 
-- 
2.37

Re: [PATCH 0/5] Recognize Zicond extension

2023-07-28 Thread Jeff Law via Gcc-patches




On 7/28/23 00:34, Xiao Zeng wrote:



Does that work for you?

I'm going to look at 3/5 today pretty closely.  Exposing zicond to
movcc is something we had implemented inside Ventana and I want to
compare/contrast your work with ours.


What a coincidence!
Zicond is a direct descendant of xventanacondops.  The only notable 
difference is in their encodings.






What I like about yours is it keeps all the logic in riscv.cc rather
than scattering it across riscv.cc and riscv.md.


Yes, when I used enough test cases, I could not find a concise way to optimize
all of them.  When I enumerated all possible cases in the movcc
function of the RISC-V backend, I found a method that satisfied me, which
is the method in patch [3/5].
I got pulled away to another task yesterday, so didn't get as far as I 
wanted.  The biggest insight from yesterday was determining that some of 
the cases you're handling in riscv_expand_conditional_move were things 
we were doing inside ifcvt.cc.


The difference is likely because the initial work on zicond here was 
primarily driven by changes to ifcvt.  It was only after evaluating that 
initial implementation that we started to the effort to use zicond at 
RTL expansion time.


I could make a case for either approach, but the more I ponder them the 
more I'm inclined to go with something like yours.  We want to capture 
the cases implementable as a conditional move as early as possible in 
the RTL pipeline rather than relying on ifcvt to catch it later.  It 
also avoids polluting ifcvt with transformations that are only likely 
needed on risc-v.







If it's just for the Zicond instruction set, is it necessary to handle
conditions other than eq/ne?  After all, it does not support comparisons
other than eq/ne.  Of course, it is also possible to use a special technique
to use Zicond for non-eq/ne comparisons.
It's not necessary, but it's definitely helpful to cover the other 
conditions.  In fact, we can even cover a variety of fp conditions by 
utilizing the sCC type insns.
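
For a concrete picture, an eq/ne select maps directly onto Zicond, and a
float condition can be reduced to ne by first materializing the compare
(a sketch; the instruction choices noted in comments are illustrative):

/* c == 0 ? a : 0 can become a single czero.nez; for the float variant,
   x < y is first materialized into a GPR (e.g. with flt.s), turning the
   select into a ne-against-zero that czero.eqz can implement.  */
long sel_eqz (long c, long a) { return c == 0 ? a : 0; }
long sel_flt (float x, float y, long a) { return x < y ? a : 0; }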



So what I'm looking at for patch #3 is to split out the costing bits 
into its own patch which can go forward immediately.  Then continue 
evaluating the best way to handle unifying the expander/canonicalization 
code.  Your testcases in patch #3 are particularly helpful to make sure 
we're not missing cases.


Jeff


Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-07-28 Thread Jeff Law via Gcc-patches




On 7/25/23 11:55, Andreas Schwab wrote:

On Jul 19 2023, Xiao Zeng wrote:


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..7e6b24bd232 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2448,6 +2448,17 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
  *total = COSTS_N_INSNS (1);
  return true;
}
+  else if (TARGET_ZICOND && outer_code == SET &&
+   ((GET_CODE (XEXP (x, 1)) == REG && XEXP (x, 2) == const0_rtx) ||
+   (GET_CODE (XEXP (x, 2)) == REG && XEXP (x, 1) == const0_rtx) ||
+   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
+XEXP (x, 1) == XEXP (XEXP (x, 0), 0)) ||
+   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
+XEXP (x, 2) == XEXP (XEXP (x, 0), 0


Line breaks before the operator, not after.

Also note the "&& GET_CODE (XEXP (x, 2)) &&" that appears twice.

That just verifies the code isn't RTX_UNKNOWN which I suspect isn't what 
the author intended.  It probably needs to be adjusted for SUBREGs and 
the pointer equality issues with REGs after reload.


I'll take care of these goofs since the costing ought to be able to move 
forward independently of the improvements Xiao made to generating 
conditional move sequences.


Jeff


Re: _BitInt vs. _Atomic

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 04:53:30PM +0200, Martin Uecker wrote:
> > The thing is that user doesn't have much control over those
> > padding bits, so whether _Atomic operations on long double (when it is 80
> > bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> > _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> > purely on luck.  If the expected value is based on atomic_load on the
> > atomic_compare_exchange location or whatever atomic_compare_exchange gave
> > back, if in the loop one e.g. adds something to it, then again it might get
> > different padding bits from what is originally in memory, so it isn't true
> > that it will always succeed at least in the second loop iteration.
> 
> Sorry, somehow I must be missing something here.
> 
> If you add something you would create a new value and this may (in
> an object) have random new padding.  But the "expected" value should
> be updated by a failed atomic_compare_exchange cycle and then have
> same padding as the value stored in the atomic. So the next cycle
> should succeed.  The user would not change the representation of
> the "expected" value but create a new value for another object
> by adding something.

You're right that it would pass the expected value not something after an
operation on it usually.  But still, expected type will be something like
_BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
atomic_compare_exchange copies back on failure is guaranteed to have the
padding bits preserved.
It is true that if it is larger than 16 bytes the libatomic
atomic_compare_exchange will memcpy the value back which copies the padding
bits, but is there a guarantee that the user code doesn't actually copy that
value further into some other variable?  Anyway, for smaller or equal
to 16 (or 8) bytes if atomic_compare_exchange is emitted inline I don't see
what would preserve the bits.

Jakub



Re: [PATCH] c++: devirtualization of array destruction [PR110057]

2023-07-28 Thread Jason Merrill via Gcc-patches

On 7/26/23 20:06, Ng YongXiang wrote:

Hi Jason,

I've made the following changes.

1. Add pr83054-2.C
2. Move the devirt tests to tree-ssa.
3. Remove dg do run for devirt tests
4. Add // PR c++/110057
5. Generate commit message with git gcc-commit-mklog
6. Check commit format with git gcc-verify

Thanks!


Thanks.  I added a comment and fixed another test that was breaking with 
the patch; here's what I pushed.


Jason

From a47e615fbf9c6f4b24e5032df5d720b6bf9b63b5 Mon Sep 17 00:00:00 2001
From: Ng YongXiang 
Date: Thu, 27 Jul 2023 08:06:14 +0800
Subject: [PATCH] c++: devirtualization of array destruction [PR110057]
To: gcc-patches@gcc.gnu.org

	PR c++/110057
	PR ipa/83054

gcc/cp/ChangeLog:

	* init.cc (build_vec_delete_1): Devirtualize array destruction.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/pr83054.C: Remove devirtualization warning.
	* g++.dg/lto/pr89335_0.C: Likewise.
	* g++.dg/tree-ssa/devirt-array-destructor-1.C: New test.
	* g++.dg/tree-ssa/devirt-array-destructor-2.C: New test.
	* g++.dg/warn/pr83054-2.C: New test.

Signed-off-by: Ng Yong Xiang 
---
 gcc/cp/init.cc| 11 +++--
 gcc/testsuite/g++.dg/lto/pr89335_0.C  |  2 +-
 .../tree-ssa/devirt-array-destructor-1.C  | 28 
 .../tree-ssa/devirt-array-destructor-2.C  | 29 
 gcc/testsuite/g++.dg/warn/pr83054-2.C | 44 +++
 gcc/testsuite/g++.dg/warn/pr83054.C   |  2 +-
 6 files changed, 110 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
 create mode 100644 gcc/testsuite/g++.dg/warn/pr83054-2.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index ff5014ca576..3b9a7783391 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4116,8 +4116,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
   if (type_build_dtor_call (type))
 	{
 	  tmp = build_delete (loc, ptype, base, sfk_complete_destructor,
-			  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR, 1,
-			  complain);
+			  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR|LOOKUP_NONVIRTUAL,
+			  1, complain);
 	  if (tmp == error_mark_node)
 	return error_mark_node;
 	}
@@ -4146,9 +4146,12 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
   if (tmp == error_mark_node)
 return error_mark_node;
   body = build_compound_expr (loc, body, tmp);
+  /* [expr.delete]/3: "In an array delete expression, if the dynamic type of
+ the object to be deleted is not similar to its static type, the behavior
+ is undefined."  So we can set LOOKUP_NONVIRTUAL.  */
   tmp = build_delete (loc, ptype, tbase, sfk_complete_destructor,
-		  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR, 1,
-		  complain);
+		  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR|LOOKUP_NONVIRTUAL,
+		  1, complain);
   if (tmp == error_mark_node)
 return error_mark_node;
   body = build_compound_expr (loc, body, tmp);
diff --git a/gcc/testsuite/g++.dg/lto/pr89335_0.C b/gcc/testsuite/g++.dg/lto/pr89335_0.C
index 95bf4b3b0cb..76382f8d742 100644
--- a/gcc/testsuite/g++.dg/lto/pr89335_0.C
+++ b/gcc/testsuite/g++.dg/lto/pr89335_0.C
@@ -9,7 +9,7 @@ public:
   virtual ~Container ();
 };
 
-class List : public Container // { dg-lto-message "final would enable devirtualization" }
+class List : public Container
 {
 };
 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
new file mode 100644
index 000..ce8dc2a57cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
@@ -0,0 +1,28 @@
+// PR c++/110057
+/* { dg-do compile } */
+/* Virtual calls should be devirtualized because we know dynamic type of object in array at compile time */
+/* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
+
+class A
+{
+public:
+  virtual ~A()
+  {
+  }
+};
+
+class B : public A
+{
+public:
+  virtual ~B()
+  {
+  }
+};
+
+int main()
+{
+  B b[10];
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized"} } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
new file mode 100644
index 000..6b44dc1a4ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
@@ -0,0 +1,29 @@
+// PR c++/110057
+/* { dg-do compile } */
+/* Virtual calls should be devirtualized because we know dynamic type of object in array at compile time */
+/* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
+
+class A
+{
+public:
+  virtual ~A()
+  {
+  }
+};
+
+class B : public A
+{
+public:
+  virtual ~B()
+  {
+  }
+};
+
+int main()
+{
+  B* ptr = new B[10];
+  delete[] ptr;
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized"} } */
diff --git a/gcc/testsuite/g++.dg/warn/pr83054-2.C b/gcc/testsuite/g++.dg/warn/pr83054-2.C
new file mode 100644

Re: [PATCH v2] RISC-V: testsuite: Add vector_hw and zvfh_hw checks.

2023-07-28 Thread Jeff Law via Gcc-patches




On 7/27/23 06:02, Robin Dapp wrote:

LGTM.  I just found this patch still on the list.  I mostly tested with
qemu, so I don't think that was a problem before, but I realize it's a
problem when we run on a real board that does not support those
extensions.


I think we can skip this one as I needed to introduce vector_hw and
zvfh_hw with another patch anyway.  What I still intended to do is an
-march-ext=... switch but that might be superseded already by Jörn's patch
that I wanted to have a look at soon anyway.

I'll drop it from patchwork then.

jeff


Re: _BitInt vs. _Atomic

2023-07-28 Thread Martin Uecker
Am Freitag, dem 28.07.2023 um 17:10 +0200 schrieb Jakub Jelinek:
> On Fri, Jul 28, 2023 at 04:53:30PM +0200, Martin Uecker wrote:
> > > The thing is that user doesn't have much control over those
> > > padding bits, so whether _Atomic operations on long double (when it is 80
> > > bit and stores from hw actually store 10 bytes rather than 12 or 16), or
> > > _BitInt(37) or _BitInt(195) or struct S { char a; int b; }; then depend
> > > purely on luck.  If the expected value is based on atomic_load on the
> > > atomic_compare_exchange location or whatever atomic_compare_exchange gave
> > > back, if in the loop one e.g. adds something to it, then again it might 
> > > get
> > > different padding bits from what is originally in memory, so it isn't true
> > > that it will always succeed at least in the second loop iteration.
> > 
> > Sorry, somehow I must be missing something here.
> > 
> > If you add something you would create a new value and this may (in
> > an object) have random new padding.  But the "expected" value should
> > be updated by a failed atomic_compare_exchange cycle and then have
> > same padding as the value stored in the atomic. So the next cycle
> > should succeed.  The user would not change the representation of
> > the "expected" value but create a new value for another object
> > by adding something.
> 
> You're right that it would pass the expected value not something after an
> operation on it usually.  But still, expected type will be something like
> _BitInt(37) or _BitInt(195) and so neither the atomic_load nor what
> atomic_compare_exchange copies back on failure is guaranteed to have the
> padding bits preserved.

For atomic_load in C a value is returned. A value does not care about
padding and when stored into a new object can produce new and different
padding.  

But for atomic_compare_exchange the memory content is copied into 
an object passed by pointer, so here the C standard requires that
the padding is preserved.  It explicitly states that the effect
is like:

if (memcmp(object, expected, sizeof(*object)) == 0)
  memcpy(object, &desired, sizeof(*object));
else
  memcpy(expected, object, sizeof(*object));


> It is true that if it is larger than 16 bytes the libatomic
> atomic_compare_exchange will memcpy the value back which copies the padding
> bits, but is there a guarantee that the user code doesn't actually copy that
> value further into some other variable?  

I do not think it would be surprising for a C user if
the next atomic_compare_exchange fails in this case.

> Anyway, for smaller or equal
> to 16 (or 8) bytes if atomic_compare_exchange is emitted inline I don't see
> what would preserve the bits.

This then seems to be incorrect for C.

Martin



[RFC WIP PATCH] _BitInt bit-field support [PR102989]

2023-07-28 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 11:05:42AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jul 27, 2023 at 06:41:44PM +, Joseph Myers wrote:
> > On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); 
> > > I'd like
> > >   to enable those incrementally, but don't really see details on how such
> > >   bit-fields should be laid-out in memory nor passed inside of function
> > >   arguments; LLVM implements something, but it is a question if that is 
> > > what
> > >   the various ABIs want
> > 
> > So if the x86-64 ABI (or any other _BitInt ABI that already exists) 
> > doesn't specify this adequately then an issue should be filed (at 
> > https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).
> > 
> > (Note that the language specifies that e.g. _BitInt(123):45 gets promoted 
> > to _BitInt(123) by the integer promotions, rather than left as a type with 
> > the bit-field width.)
> 
> Ok, I'll try to investigate in detail what LLVM does and what GCC would do
> if I just enabled the bitfield support and report.  Still, I'd like to
> handle this only in incremental step after the rest of _BitInt support goes
> in.

So, I've spent some time on this but after simply enabling _BitInt
bit-fields everything I've tried yields the same layout with both GCC and
LLVM trunk and as I didn't have to muck with stor-layout.cc for that
(haven't tried yet the function arg passing/returning), I assume it is just
the generic PCC_BITFIELD_TYPE_MATTERS behavior which works like that, so
while it wouldn't hurt if the psABI said something about those, perhaps it is
ok as is.

But I ran into a compiler divergence on _Generic with bit-field expressions.
My understanding is that _Generic controlling expression undergoes array
to pointer and function to pointer conversions, but not integral promotions
(otherwise it would never match for char, short etc. types).
C23 draft I have says:
"A bit-field is interpreted as having a signed or unsigned integer type
consisting of the specified number of bits"
but doesn't say which exact signed or unsigned integer type that is.
Additionally, I think at least for the larger widths, it would be really
strange if it was interpreted as an INTEGER_TYPE with say precision of 350
because that is more than INTEGER_TYPEs can really use.
So, in the patch, if the bit-field has a bit-precise underlying type, I try
to use a bit-precise integer type for the bit-field type where possible
(not possible only in the special case of a signed 1-bit field, since
_BitInt(1) is not allowed).

Now, in the testcase with GCC the
static_assert (expr_has_type (s4.a, _BitInt(195)));
static_assert (expr_has_type (s4.b, _BitInt(282)));
static_assert (expr_has_type (s4.c, _BitInt(389)));
static_assert (expr_has_type (s4.d, _BitInt(2)));
static_assert (expr_has_type (s5.a, _BitInt(192)));
static_assert (expr_has_type (s5.b, unsigned _BitInt(192)));
static_assert (expr_has_type (s5.c, _BitInt(192)));
static_assert (expr_has_type (s6.a, _BitInt(2)));
assertions all fail (and all the ones where integer promotions are performed
for binary operation succeed).  They would succeed with
static_assert (expr_has_type (s4.a, _BitInt(63)));
static_assert (expr_has_type (s4.b, _BitInt(280)));
static_assert (expr_has_type (s4.c, _BitInt(23)));
static_assert (!expr_has_type (s4.d, _BitInt(2)));
static_assert (expr_has_type (s5.a, _BitInt(191)));
static_assert (expr_has_type (s5.b, unsigned _BitInt(190)));
static_assert (expr_has_type (s5.c, _BitInt(189)));
static_assert (!expr_has_type (s6.a, _BitInt(2)));
The s4.d and s6.a cases for GCC with this patch actually have int:1 type,
something that can't be ever matched in _Generic except for default:.
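
For reference, expr_has_type here can be thought of as a _Generic-based
probe along these lines (an assumption; the actual testcase macro may
differ):

#define expr_has_type(e, t) _Generic (e, t: 1, default: 0)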

On the other side, all the above pass with LLVM, i.e. as if they have
undergone the integral promotion for the _BitInt bitfield case even for
_Generic.  And the
static_assert (expr_has_type (s4.c + 1uwb, _BitInt(389)));
static_assert (expr_has_type (s4.d * 0wb, _BitInt(2)));
static_assert (expr_has_type (s6.a + 0wb, _BitInt(2)));
assertions fail with LLVM.  That looks to me like an LLVM bug, because
"The value from a bit-field of a bit-precise integer type is converted to
the corresponding bit-precise integer type."
specifies that s4.c has _BitInt(389) type after integer promotions
and s4.d and s6.a have _BitInt(2) type.  Now, 1uwb has unsigned _BitInt(1)
type and 0wb has _BitInt(2) and the common type for those in all cases is
I believe the type of the left operand.

Thoughts on this?

The patch is obviously incomplete, I haven't added code for lowering
loads/stores from/to bit-fields for large/huge _BitInt nor added testcase
coverage for passing of small structs with _BitInt bit-fields as function
arguments/return values.

2023-07-28  Jakub Jelinek  

PR c/102989
* c-typeck.cc (perform_integral_promotions): Promote bit-fields
with bit-precise integral types to those types.
* c-decl.cc (check_bitfield_type_and_width)

[Committed] RISC-V: Specify -mabi in rv64 autovec testcase

2023-07-28 Thread Patrick O'Neill
On rv32 targets, this patch fixes:
FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test 
for excess errors)

cc1: error: ABI requires '-march=rv32'

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/madd-split2-1.c: Add -mabi=lp64d
to dg-options.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
index 14a9802667e..e10a9e9d0f5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce 
--param riscv-autovec-preference=scalable" } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-cprop-registers 
-fno-dce --param riscv-autovec-preference=scalable" } */

 long
 foo (long *__restrict a, long *__restrict b, long n)
--
2.34.1



Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-28 Thread Patrick O'Neill

Thanks!
Here's the comitted patch:
https://inbox.sourceware.org/gcc-patches/20230728163758.377962-1-patr...@rivosinc.com/T/#u

On 7/27/23 15:11, juzhe.zhong wrote:

LGTM.  Thanks.  You can go ahead and commit it.

 Replied Message 
From: Patrick O'Neill
Date: 07/28/2023 04:46
To: Kito Cheng, juzhe.zh...@rivai.ai
Cc: demin.han, gcc-patches
Subject: Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative


The newly added testcase fails on rv32 targets with this message:
FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test 
for excess errors)

verbose log:
compiler exited with status 1
output is:
cc1: error: ABI requires '-march=rv32'

Something like this appears to fix the issue:

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
index 14a9802667e..e10a9e9d0f5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
  
  long

  foo (long *__restrict a, long *__restrict b, long n)

On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:

My first impression was that those emit_insn (gen_rtx_SET ()) calls seemed
necessary, but I got the point after I checked vector.md :P

Committed to trunk, thanks :)


On Thu, Jul 27, 2023 at 6:23 pm juzhe.zh...@rivai.ai wrote:

Oh, YES.

Thanks for fixing it. It makes sense since the ternary operations in "vector.md"
generate "vmv.v.v" according to RA.

Thanks for fixing it.

@kito: Could you confirm it? If it's ok to you, commit it for Han (I am lazy to 
commit patches :).



juzhe.zh...@rivai.ai

From: demin.han
Date: 2023-07-27 17:48
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative
When the split2 pass starts, which_alternative is random, depending on
which pass last set it.

Even if initialized, the generated move is redundant.
The move can be generated by the assembly output template.

Signed-off-by: demin.han

gcc/ChangeLog:

* config/riscv/autovec.md: Delete which_alternative use in split

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.

---
 gcc/config/riscv/autovec.md                      | 12 ------------
 .../gcc.target/riscv/rvv/autovec/madd-split2-1.c | 13 +++++++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d899922586a..b7ea3101f5a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (<MODE>mode),
					    riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (<MODE>mode),
					    riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, <MODE>mode),
					      riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, <MODE>mode),
					      riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms<mode>"

Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-28 Thread Patrick O'Neill

No worries! I'm glad it was an easy fix ;)

On 7/27/23 19:55, Demin Han wrote:

Sorry for not considering the rv32 config.
The fix is OK.  If convenient, please commit it.

On 2023/7/28 4:46, Patrick O'Neill wrote:

The newly added testcase fails on rv32 targets with this message:
FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test 
for excess errors)

verbose log:
compiler exited with status 1
output is:
cc1: error: ABI requires '-march=rv32'

Something like this appears to fix the issue:

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
index 14a9802667e..e10a9e9d0f5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
  
  long

  foo (long *__restrict a, long *__restrict b, long n)

On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:


My first impression was that those emit_insn (gen_rtx_SET ()) calls seemed
necessary, but I got the point after I checked vector.md :P

Committed to trunk, thanks :)


On Thu, Jul 27, 2023 at 6:23 pm juzhe.zh...@rivai.ai wrote:

Oh, YES.

Thanks for fixing it. It makes sense since the ternary operations in "vector.md"
generate "vmv.v.v" according to RA.

Thanks for fixing it.

@kito: Could you confirm it? If it's ok to you, commit it for Han (I am lazy to 
commit patches :).



juzhe.zh...@rivai.ai

From: demin.han
Date: 2023-07-27 17:48
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative
When the split2 pass starts, which_alternative is random, depending on
which pass last set it.

Even if initialized, the generated move is redundant.
The move can be generated by the assembly output template.

Signed-off-by: demin.han

gcc/ChangeLog:

* config/riscv/autovec.md: Delete which_alternative use in split

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.

---
 gcc/config/riscv/autovec.md                      | 12 ------------
 .../gcc.target/riscv/rvv/autovec/madd-split2-1.c | 13 +++++++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d899922586a..b7ea3101f5a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (<MODE>mode),
					    riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (<MODE>mode),
					    riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, <MODE>mode),
					      riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, <MODE>mode),
					      riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms<mode>"
   [(const_int 0)]
   {
     riscv_vector::emit_vlmax_vsetvl (<MODE>mode, operands[4]);
-    if (which_alternative == 2)
-      emit_insn (gen_rtx_SET (operands[0], operands[3]));
     rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]};
     riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, <MODE>mode),

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-28 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response.

Hao Liu OS  writes:
>> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>   && vect_is_reduction (stmt_info))
>>
>> to:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>   && STMT_VINFO_LIVE_P (stmt_info)
>>   && vect_is_reduction (stmt_info))
>
> I tried this and it indeed can avoid the ICE.  But it seems the
> reduction_latency calculation is also skipped; after such a modification,
> the reduction_latency is 0 for this case.  Previously, it was 1 and 2 for
> scalar and vector respectively.

Which test case do you see this for?  The two tests in the patch still
seem to report correct latencies for me if I make the change above.

Thanks,
Richard

> IMHO, to keep it consistent with previous result, should we move 
> STMT_VINFO_LIVE_P check below and inside the if? such as:
>
>   /* Calculate the minimum cycles per iteration imposed by a reduction
>  operation.  */
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && vect_is_reduction (stmt_info))
> {
>   unsigned int base
> = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
>   if (STMT_VINFO_LIVE_P (stmt_info) && STMT_VINFO_FORCE_SINGLE_CYCLE (
> info_for_reduction (m_vinfo, stmt_info)))
> /* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
>and then accumulate that, but at the moment the loop-carried
>dependency includes all copies.  */
> ops->reduction_latency = MAX (ops->reduction_latency, base * count);
>   else
> ops->reduction_latency = MAX (ops->reduction_latency, base);
>
> Thanks,
> Hao
>
> 
> From: Richard Sandiford 
> Sent: Wednesday, July 26, 2023 17:14
> To: Richard Biener
> Cc: Hao Liu OS; GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
> multiplying count [PR110625]
>
> Richard Biener  writes:
>> On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>>  wrote:
>>>
>>> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that we're 
>>> > not papering over an issue elsewhere.
>>>
>>> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below is 
>>> the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>>>
>>>   <bb 3>:
>>>   # res_18 = PHI <res_15(7), 0(6)>
>>>   # i_20 = PHI <i_16(7), 0(6)>
>>>   _1 = (long unsigned int) i_20;
>>>   _2 = _1 * 2;
>>>   _3 = x_14(D) + _2;
>>>   _4 = *_3;
>>>   _5 = (unsigned short) _4;
>>>   res.0_6 = (unsigned short) res_18;
>>>   _7 = _5 + res.0_6; <-- The current stmt_info
>>>   res_15 = (short int) _7;
>>>   i_16 = i_20 + 1;
>>>   if (n_11(D) > i_16)
>>> goto <bb 7>;
>>>   else
>>> goto <bb 4>;
>>>
>>>   <bb 7>:
>>>   goto <bb 3>;
>>>
>>> It looks like STMT_VINFO_REDUC_DEF should be "res_18 = PHI <res_15(7),
>>> 0(6)>"?
>>> The status here is:
>>>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>>>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>>>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>>
>> Not all stmts in the SSA cycle forming the reduction have
>> STMT_VINFO_REDUC_DEF set,
>> only the last (latch def) and live stmts have at the moment.
>
> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && vect_is_reduction (stmt_info))
>
> to:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && STMT_VINFO_LIVE_P (stmt_info)
>   && vect_is_reduction (stmt_info))
>
> instead of using a null check.
>
> I see that vectorizable_reduction calculates a reduc_chain_length.
> Would it be OK to store that in the stmt_vec_info?  I suppose the
> AArch64 code should be multiplying by that as well.  (It would be a
> separate patch from this one though.)
>
> Richard
>
>
>>
>> Richard.
>>
>>> Thanks,
>>> Hao
>>>
>>> 
>>> From: Richard Sandiford 
>>> Sent: Tuesday, July 25, 2023 17:44
>>> To: Hao Liu OS
>>> Cc: GCC-patches@gcc.gnu.org
>>> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
>>> multiplying count [PR110625]
>>>
>>> Hao Liu OS  writes:
>>> > Hi,
>>> >
>>> > Thanks for the suggestion.  I tested it and found a gcc_assert failure:
>>> > gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in 
>>> > info_for_reduction, at tree-vect-loop.cc:5473)
>>> >
>>> > It is caused by empty STMT_VINFO_REDUC_DEF.
>>>
>>> When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that
>>> we're not papering over an issue elsewhere.
>>>
>>> Thanks,
>>> Richard
>>>
>>>   So, I added an extra check before checking single_defuse_cycle. The 
>>> updated patch is below.  Is it OK for trunk?
>>> >
>>> > ---
>>> >
>>> > The new costs should only count reduction latency 

New German PO file for 'gcc' (version 13.2.0)

2023-07-28 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-13.2.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-28 Thread Joseph Myers
On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> I had a brief look at libbid and am totally unimpressed.
> Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128}
> conversions at all (we emit calls to __bid_* functions which don't exist),

That's bug 65833.

> the library (or the way we configure it) doesn't care about exceptions nor
> rounding mode (see following testcase)

And this is related to the never-properly-resolved issue about the split 
of responsibility between libgcc, libdfp and glibc.

Decimal floating point has its own rounding mode, set with fe_dec_setround 
and read with fe_dec_getround (so this test is incorrect).  In some cases 
(e.g. Power), that's a hardware rounding mode.  In others, it needs to be 
implemented in software as a TLS variable.  In either case, it's part of 
the floating-point environment, so should be included in the state 
manipulated by functions using fenv_t or femode_t.  Exceptions are shared 
with binary floating point.
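
For instance (a sketch; the FE_DEC_* names are the Power/libdfp spellings):

#include <fenv.h>

/* The decimal rounding direction is set independently of the binary one.  */
void
set_modes (void)
{
  fe_dec_setround (FE_DEC_TOWARDZERO);	/* decimal FP rounding */
  fesetround (FE_TONEAREST);		/* binary FP rounding */
}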

libbid in libgcc has its own TLS rounding mode and exceptions state, but 
the former isn't connected to fe_dec_setround / fe_dec_getround functions, 
while the latter isn't the right way to do things when there's hardware 
exceptions state.

libdfp - https://github.com/libdfp/libdfp - is a separate library, not 
part of libgcc or glibc (and with its own range of correctness bugs) - 
maintained, but not very actively (maybe more so than the DFP support in 
GCC - we haven't had a listed DFP maintainer since 2019).  It has various 
standard DFP library functions - maybe not the full C23 set, though some 
of the TS 18661-2 functions did get added, so it's not just the old TR 
24732 set.  That includes its own version of the libgcc support, which I 
think has some more support for using exceptions and rounding modes.  It 
includes the fe_dec_getround and fe_dec_setround functions.  It doesn't do 
anything to help with the issue of including the DFP rounding state in the 
state manipulated by functions such as fegetenv.

Being a separate library probably in turn means that it's less likely to 
be used (although any code that uses DFP can probably readily enough 
choose to use a separate library if it wishes).  And it introduces issues 
with linker command line ordering, if the user intends to use libdfp's 
copy of the functions but the linker processes -lgcc first.

For full correctness, at least some functionality (such as the rounding 
modes and associated inclusion in fenv_t) would probably need to go in 
glibc.  See 
https://sourceware.org/pipermail/libc-alpha/2019-September/106579.html 
for more discussion.

But if you do put some things in glibc, maybe you still don't want the 
_BitInt conversions there?  Rather, if you keep the _BitInt conversions in 
libgcc (even when the other support is in glibc), you'd have some 
libc-provided interface for libgcc code to get the DFP rounding mode from 
glibc in the case where it's handled in software, like some interfaces 
already present in the soft-float powerpc case to provide access to its 
floating-point state from libc (and something along the lines of 
sfp-machine.h could tell libgcc how to use either that interface or 
hardware instructions to access the rounding mode and exceptions as 
needed).

> and for integral <-> _Decimal32
> conversions implement them as integral <-> _Decimal64 <-> _Decimal32
> conversions.  While in the _Decimal32 -> _Decimal64 -> integral
> direction that is probably ok, even if exceptions and rounding (other than
> to nearest) were supported, the other direction I'm sure can suffer from
> double rounding.

Yes, double rounding would be an issue for converting 64-bit integers to 
_Decimal32 via _Decimal64 (it would be fine to convert 32-bit integers 
like that since they can be exactly represented in _Decimal64; it would be 
fine to convert 64-bit integers via _Decimal128).
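
A concrete instance (digits chosen for illustration):

/* 17 significant digits: direct rounding to _Decimal32 (7 digits) gives
   1234567E+10, but rounding first to _Decimal64 (16 digits) yields
   12345675000000000, which then rounds half-even up to 1234568E+10.  */
long long n = 12345674999999999LL;
_Decimal32 direct = (_Decimal32) n;
_Decimal32 via64  = (_Decimal32) (_Decimal64) n;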

> So, wonder if it wouldn't be better to implement these in the soft-fp
> infrastructure which at least has the exception and rounding mode support.
> Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
> below the sign bit and doing some shifts, so not something one needs a 10MB
> of a library for.  Now, sure, 5MB out of that are generated tables in

Note that representations with too-large significand are defined to be 
noncanonical representations of zero, so you need to take care of that in 
decoding BID.
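
For reference, a sketch of BID32 significand extraction including that
canonicality check (field layout per IEEE 754-2008; this is not libbid code):

/* BID32: sign(1) | combination | significand.  If the two bits below the
   sign are both set, the significand has an implicit 0b100 prefix.  */
unsigned int
bid32_sig (unsigned int x)
{
  unsigned int sig;
  if (((x >> 29) & 3) == 3)
    sig = (x & 0x1fffff) | 0x800000;	/* 21 explicit bits + implicit 100 */
  else
    sig = x & 0x7fffff;			/* 23 explicit bits */
  return sig > 9999999 ? 0 : sig;	/* noncanonical means zero */
}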

> bid_binarydecimal.c, but unfortunately those are static and not in a form
> which could be directly fed into multiplication (unless we'd want to go
> through conversions to/from strings).
> So, it seems to be easier to guess needed power of 10 from number of binary
> digits or vice versa, have a small table of powers of 10 (say those which
> fit into a limb) and construct larger powers of 10 by multiplicating those
> several times, _Decimal128 has exponent up to 6144 which is ~ 2552 bytes
> or 319 64-bit limbs, but having a table with all the 6144 po

Re: [PATCH] Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]

2023-07-28 Thread Steve Kargl via Gcc-patches
On Thu, Jul 27, 2023 at 09:39:53PM +0200, Harald Anlauf via Fortran wrote:
> Dear all,
> 
> when passing a character actual argument to an assumed-type dummy
> (TYPE(*)), we should not pass the character length for that argument,
> as otherwise other hidden arguments that are passed as part of the
> gfortran ABI will not be interpreted correctly.  This is in line
> with the current way the procedure decl is generated.
> 
> The attached patch fixes the caller and clarifies the behavior
> in the documentation.
> 
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> 

OK.  Patch looks small enough that if you are so inclined
to backport that's ok as well.  Thanks for the quick response.

-- 
Steve


Re: _BitInt vs. _Atomic

2023-07-28 Thread Joseph Myers
On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> The C++ way of dealing with this is using __builtin_clear_padding,
> done on atomic stores/updates of the atomic memory (padding is cleared
> if any on the value to be stored, or on the expected and desired values).
> 
> I don't know enough about the C atomic requirements whether that is feasible
> for it as well, or whether it is possible to make the padding bits partially
> or fully set somehow non-atomically without invoking UB and then make it
> never match.

If padding bits not being reliably preserved causes problems for the 
compare-exchange loops in C in practice, then it would seem reasonable to 
use __builtin_clear_padding internally as part of implementing those cases 
of atomic compound assignment.
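
A minimal sketch of that approach (the struct and names are illustrative):

struct S { char a; int b; };	/* padding between a and b on common ABIs */

_Bool
cas_s (struct S *obj, struct S *expected, struct S desired)
{
  /* Canonicalize padding in both inputs so a padding-only mismatch cannot
     make the compare fail, provided stores clear padding as well.  */
  __builtin_clear_padding (expected);
  __builtin_clear_padding (&desired);
  return __atomic_compare_exchange (obj, expected, &desired, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}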

> And another issue is that while __atomic_load, __atomic_store,
> __atomic_exchange and __atomic_compare_exchange work on arbitrary _BitInt
> sizes, others like __atomic_fetch_add only support _BitInt or other integral
> types which have size of 1, 2, 4, 8 or 16 bytes, others emit an error
> in c-family/c-common.cc (sync_resolve_size).  So, either
> resolve_overloaded_builtin should for the case when pointer is pointer to
> _BitInt which doesn't have 1, 2, 4, 8 or 16 bytes size lower those into
> a loop using __atomic_compare_exchange (or perhaps also if there is
> padding), or  should do that.

The  interfaces definitely need to work with _BitInt.  My 
guess is that doing this with the built-in expansion would be more robust 
than putting more complicated definitions in the header that choose which 
built-in functions to use depending on properties of the type (and keeping 
the built-in functions limited to certain widths), but I don't know.

Note also that those  operations have no undefined behavior 
on signed integer overflow.

If any ABIs require sign / zero extension of _BitInt values in memory, 
care would also be needed in the case of (size of 1, 2, 4, 8 or 16 bytes, 
but also has high bits required to be sign / zero extended) to ensure that 
the operations are implemented so as to leave the high bits with the 
expected values in case of overflow, which wouldn't result from simply 
using the underlying operation for a type with the full precision of its 
memory size.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC WIP PATCH] _BitInt bit-field support [PR102989]

2023-07-28 Thread Joseph Myers
On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> But I ran into a compiler divergence on _Generic with bit-field expressions.
> My understanding is that _Generic controlling expression undergoes array
> to pointer and function to pointer conversions, but not integral promotions
> (otherwise it would never match for char, short etc. types).
> C23 draft I have says:
> "A bit-field is interpreted as having a signed or unsigned integer type
> consisting of the specified number of bits"
> but doesn't say which exact signed or unsigned integer type that is.

Yes, the type used in _Generic isn't fully specified, just the type after 
integer promotions in contexts where those occur.

> static_assert (expr_has_type (s4.c + 1uwb, _BitInt(389)));
> static_assert (expr_has_type (s4.d * 0wb, _BitInt(2)));
> static_assert (expr_has_type (s6.a + 0wb, _BitInt(2)));
> That looks to me like LLVM bug, because
> "The value from a bit-field of a bit-precise integer type is converted to
> the corresponding bit-precise integer type."
> specifies that s4.c has _BitInt(389) type after integer promotions
> and s4.d and s6.a have _BitInt(2) type.  Now, 1uwb has unsigned _BitInt(1)
> type and 0wb has _BitInt(2) and the common type for those in all cases is
> I believe the type of the left operand.

Indeed, I'd expect those to pass, since in those cases integer promotions 
(to the declared _BitInt type of the bit-field, without the bit-field 
width) are applied.

-- 
Joseph S. Myers
jos...@codesourcery.com


[COMMITTED] bpf: disable tail call optimization in BPF targets

2023-07-28 Thread Jose E. Marchesi via Gcc-patches
clang disables tail call optimizations in BPF targets.  Do the same in
GCC.

gcc/ChangeLog:

* config/bpf/bpf.cc (bpf_option_override): Disable tail-call
optimizations in BPF target.
---
 gcc/config/bpf/bpf.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index b5b5674edbb..57817cdf2f8 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -273,6 +273,9 @@ bpf_option_override (void)
  "on this architecture");
   flag_stack_protect = 0;
 }
+
+  /* The BPF target does not support tail call optimization.  */
+  flag_optimize_sibling_calls = 0;
 }
 
 #undef TARGET_OPTION_OVERRIDE
-- 
2.30.2



[PATCH v2] SARIF and -ftime-report's output [PR109361]

2023-07-28 Thread David Malcolm via Gcc-patches
On Fri, 2023-07-28 at 08:00 +0200, Richard Biener wrote:
> On Fri, Jul 28, 2023 at 12:23 AM David Malcolm via Gcc-patches
>  wrote:
> > 
> > On Tue, 2023-04-11 at 08:43 +, Richard Biener wrote:
> > > On Tue, 4 Apr 2023, David Malcolm wrote:
> > > 
> > > > Richi, Jakub: I can probably self-approve this, but it's
> > > > technically a
> > > > new feature.  OK if I push this to trunk in stage 4?  I believe
> > > > it's
> > > > low risk, and is very useful for benchmarking -fanalyzer.
> > > 
> > > Please wait for stage1 at this point.  One comment on the patch
> > > below ...
> > > 
> > > > 
> > > > This patch adds support for embedding profiling information about
> > > > the compiler itself into the SARIF output.

[...snip...]

> > > 
> > > 'sarif' is currently used only with -fdiagnostics-format= it
> > > seems.
> > > We already have
> > > 
> > > ftime-report
> > > Common Var(time_report)
> > > Report the time taken by each compiler pass.
> > > 
> > > ftime-report-details
> > > Common Var(time_report_details)
> > > Record times taken by sub-phases separately.
> > > 
> > > so -fsarif-time-report is not a) -ftime-report-sarif and b) it's
> > > unclear if it applies to -ftime-report or to both -ftime-report
> > > and -ftime-report-details?  (note -ftime-report-details needs
> > > -ftime-report to be effective)
> > > 
> > > I'd rather have a -ftime-report-format= (or -freport-format in
> > > case we want to cover -fmem-report, -fmem-report-wpa,
> > > -fpre-ipa-mem-report and -fpost-ipa-mem-report as well?)
> > > 
> > > ISTR there's a summer of code project in this are as well.
> > > 
> > > Thanks,
> > > Richard.
> > 
> > Revisiting this; sorry about the delay.
> > 
> > As I understand the status quo, we currently have:
> > * -ftime-report: enable capturing of timing information (with a
> > slight speed hit), and report it to stderr
> > * -ftime-report-details: tweak how that information is captured (if
> > -ftime-report is enabled), so that timevar->children is populated and
> > printed
> > 
> > There seem to be two things here:
> > - what timing data we capture
> > - where that timing data goes
> > 
> > What I need is some way to specify that the output should go to the
> > SARIF file, rather than to stderr.
> > 
> > Some ways we could do this:
> > (a) simply enforce that if SARIF diagnostics were requested with
> > -fdiagnostics-format=sarif-{file|stderr} that the time report goes
> > there in JSON form, rather than to stderr
> > (b) add an option to specify where the time report goes
> > (c) add options to allow the time report to potentially go to multiple
> > places (both stderr and SARIF, one or the other, neither); this seems
> > overcomplex to me.
> > (d) something else?
> > 
> > The patch I posted implements a form of (b), but right now I'm leaning
> > towards option (a): if the user requested SARIF output, then the time
> > report goes to the SARIF output, rather than stderr.
> 
> I'm fine with (a), but -fdiagnostics-format= doesn't naturally apply to
> -ftime-report (or -fmem-report); those are not "diagnostics" in my
> opinion but they are auxiliary data for the compilation process
> rather than the input to it.  But yes, -ftime-report-format= would be
> too specific, maybe -faux-format=.
>
> That said, we can go with (a) and do something else later if desired.
> I don't think preserving behavior in this area will be important so we
> don't have to get it right immediately.

Thanks.

Here's an updated version of the patch which implements (a).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
As before, I've tested this with my analyzer integration testsuite and
was able to use the .sarif data to generate reports about which source
files get slowed down by the analyzer [1]. I've validated the generated
.sarif files against the SARIF schema.

OK for trunk?
Dave
[1] https://github.com/davidmalcolm/gcc-analyzer-integration-tests/issues/5


This patch adds support for embedding profiling information about the
compiler itself into the SARIF output.

Specifically, if SARIF diagnostic output is requested, via
-fdiagnostics-format=sarif-file or -fdiagnostics-format=sarif-stderr,
then any -ftime-report output is written in JSON form into the SARIF
output, rather than to stderr.

In earlier versions of this patch I extended -ftime-report so that
*as well* as writing to stderr, it would embed the information in any
SARIF output.  This turned out to be awkward to use, in that I found
myself needing to get the data in JSON form without also having it
emitted on stderr (which was fouling my build scripts).

The timing information is written to the SARIF as a "gcc/timeReport"
property within a property bag of the "invocation" object.

Here's an example of the output:

  "invocations": [
  {
  "executionSuccessful": true,
  "toolExecutionNotifications": [],
  "properties": {
  "gcc/timeReport": {

Re: [PATCH RESEND] c: add -Wmissing-variable-declarations [PR65213]

2023-07-28 Thread Joseph Myers
On Tue, 18 Jul 2023, Hamza Mahfooz wrote:

> Resolves:
> PR c/65213 - Extend -Wmissing-declarations to variables [i.e. add
> -Wmissing-variable-declarations]
> 
> gcc/c-family/ChangeLog:
> 
>   PR c/65213
>   * c.opt (-Wmissing-variable-declarations): New option.
> 
> gcc/c/ChangeLog:
> 
>   PR c/65213
>   * c-decl.cc (start_decl): Handle
>   -Wmissing-variable-declarations.
> 
> gcc/ChangeLog:
> 
>   PR c/65213
>   * doc/invoke.texi (-Wmissing-variable-declarations): Document
>   new option.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/65213
>   * gcc.dg/Wmissing-variable-declarations.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[COMMITTED] PR tree-optimization/110205 -Fix some warnings

2023-07-28 Thread Andrew MacLeod via Gcc-patches

This patch simply fixes the code up a little to remove potential warnings.

Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew

From 7905c071c35070fff3397b1e24f140c128c08e64 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 10 Jul 2023 13:58:22 -0400
Subject: [PATCH 1/3] Fix some warnings

	PR tree-optimization/110205
	* gimple-range-cache.h (ranger_cache::m_estimate): Delete.
	* range-op-mixed.h (operator_bitwise_xor::op1_op2_relation_effect):
	Add final override.
	* range-op.cc (operator_lshift): Add missing final overrides.
	(operator_rshift): Ditto.
---
 gcc/gimple-range-cache.h |  1 -
 gcc/range-op-mixed.h |  2 +-
 gcc/range-op.cc  | 44 ++--
 3 files changed, 21 insertions(+), 26 deletions(-)

diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 93d16294d2e..a0f436b5723 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -137,7 +137,6 @@ private:
   void exit_range (vrange &r, tree expr, basic_block bb, enum rfd_mode);
   bool edge_range (vrange &r, edge e, tree name, enum rfd_mode);
 
-  phi_analyzer *m_estimate;
   vec m_workback;
   class update_list *m_update;
 };
diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 3cb904f9d80..b623a88cc71 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -574,7 +574,7 @@ public:
 	tree type,
 	const irange &op1_range,
 	const irange &op2_range,
-	relation_kind rel) const;
+	relation_kind rel) const final override;
   void update_bitmask (irange &r, const irange &lh,
 		   const irange &rh) const final override;
 private:
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 615e5fe0036..19fdff0eb64 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2394,22 +2394,21 @@ class operator_lshift : public cross_product_operator
   using range_operator::fold_range;
   using range_operator::op1_range;
 public:
-  virtual bool op1_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual bool fold_range (irange &r, tree type,
-			   const irange &op1,
-			   const irange &op2,
-			   relation_trio rel = TRIO_VARYING) const;
+  virtual bool op1_range (irange &r, tree type, const irange &lhs,
+			  const irange &op2, relation_trio rel = TRIO_VARYING)
+    const final override;
+  virtual bool fold_range (irange &r, tree type, const irange &op1,
+			   const irange &op2, relation_trio rel = TRIO_VARYING)
+    const final override;
 
   virtual void wi_fold (irange &r, tree type,
 			const wide_int &lh_lb, const wide_int &lh_ub,
-			const wide_int &rh_lb, const wide_int &rh_ub) const;
+			const wide_int &rh_lb,
+			const wide_int &rh_ub) const final override;
   virtual bool wi_op_overflows (wide_int &res,
                                 tree type,
                                const wide_int &,
-                                const wide_int &) const;
+                                const wide_int &) const final override;
   void update_bitmask (irange &r, const irange &lh,
 		   const irange &rh) const final override
     { update_known_bitmask (r, LSHIFT_EXPR, lh, rh); }
@@ -2421,27 +2420,24 @@ class operator_rshift : public cross_product_operator
   using range_operator::op1_range;
   using range_operator::lhs_op1_relation;
 public:
-  virtual bool fold_range (irange &r, tree type,
-			   const irange &op1,
-			   const irange &op2,
-			   relation_trio rel = TRIO_VARYING) const;
+  virtual bool fold_range (irange &r, tree type, const irange &op1,
+			   const irange &op2, relation_trio rel = TRIO_VARYING)
+    const final override;
   virtual void wi_fold (irange &r, tree type,
 			const wide_int &lh_lb,
 			const wide_int &lh_ub,
 			const wide_int &rh_lb,
-			const wide_int &rh_ub) const;
+			const wide_int &rh_ub) const final override;
   virtual bool wi_op_overflows (wide_int &res,
                                 tree type,
                                const wide_int &w0,
-                                const wide_int &w1) const;
-  virtual bool op1_range (irange &, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual relation_kind lhs_op1_relation (const irange &lhs,
-                                          const irange &op1,
-                                          const irange &op2,
-                                          relation_kind rel) const;
+                                const wide_int &w1) const final override;
+  virtual bool op1_range (irange &, tree type, const irange &lhs,
+                          const irange &op2, relation_trio rel = TRIO_VARYING)
+    const final override;
+  virtual relation_kind lhs_op1_relation (const irange &lhs, const irange &op1,
+                                          const irange &op2, relation_kind rel)
+    const final override;
   void update_bitmask (irange &r, const irange &lh,
 		   const irange &rh) const final override
     { update_known_bitmask (r, RSHIFT_EXPR, lh, rh); }
-- 
2.40.1



[COMMITTED] Add a merge_range to ssa_cache and use it.

2023-07-28 Thread Andrew MacLeod via Gcc-patches

This adds some tweaks to the ssa-range cache.

1)  Adds a new merge_range method, which works like set_range except that
if there is already a value, the two values are merged via intersection
and the result stored.  This avoids having to check whether there is a
value, load it, intersect it and then store the result in the client.
There is one usage pattern in the code base (with more to come); it is
changed to use the new method -- see the sketch after this list.


2)  The range_of_expr() method in ssa_cache does not give the stmt
parameter a default value of NULL.  Correct that oversight.


3)  The method empty_p() is added to the ssa_lazy_cache class so we can
detect whether the lazy cache has any active elements in it or not.
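
As a hedged sketch (not the committed hunk; 'cache', 'name', and 'r' are
illustrative names), the client-side pattern collapses roughly like this:

  // Before: check for an existing value, load, intersect, store.
  Value_Range cur (TREE_TYPE (name));
  if (cache.get_range (cur, name))
    r.intersect (cur);
  cache.set_range (name, r);

  // After: one call performs the check/intersect/store.
  cache.merge_range (name, r);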


Bootstrapped on x86_64-pc-linux-gnu with no regressions.   Pushed.

Andrew

From 72fb44ca53fda15024e0c272052b74b1f32735b1 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 28 Jul 2023 11:00:57 -0400
Subject: [PATCH 3/3] Add a merge_range to ssa_cache and use it.  add empty_p
 and param tweaks.

	* gimple-range-cache.cc (ssa_cache::merge_range): New.
	(ssa_lazy_cache::merge_range): New.
	* gimple-range-cache.h (class ssa_cache): Adjust prototypes.
	(class ssa_lazy_cache): Ditto.
	* gimple-range.cc (assume_query::calculate_op): Use merge_range.
---
 gcc/gimple-range-cache.cc | 45 +++
 gcc/gimple-range-cache.h  |  6 --
 gcc/gimple-range.cc   |  6 ++
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 52165d2405b..5b74681b61a 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -605,6 +605,32 @@ ssa_cache::set_range (tree name, const vrange &r)
   return m != NULL;
 }
 
+// If NAME has a range, intersect it with R, otherwise set it to R.
+// Return TRUE if there was already a range set, otherwise false.
+
+bool
+ssa_cache::merge_range (tree name, const vrange &r)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  if (v >= m_tab.length ())
+    m_tab.safe_grow_cleared (num_ssa_names + 1);
+
+  vrange_storage *m = m_tab[v];
+  if (m)
+    {
+      Value_Range curr (TREE_TYPE (name));
+      m->get_vrange (curr, TREE_TYPE (name));
+      curr.intersect (r);
+      if (m->fits_p (curr))
+        m->set_vrange (curr);
+      else
+        m_tab[v] = m_range_allocator->clone (curr);
+    }
+  else
+    m_tab[v] = m_range_allocator->clone (r);
+  return m != NULL;
+}
+
 // Set the range for NAME to R in the ssa cache.
 
 void
@@ -689,6 +715,25 @@ ssa_lazy_cache::set_range (tree name, const vrange &r)
   return false;
 }
 
+// If NAME has a range, intersect it with R, otherwise set it to R.
+// Return TRUE if there was already a range set, otherwise false.
+
+bool
+ssa_lazy_cache::merge_range (tree name, const vrange &r)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  if (!bitmap_set_bit (active_p, v))
+    {
+      // There is already an entry, simply merge it.
+      gcc_checking_assert (v < m_tab.length ());
+      return ssa_cache::merge_range (name, r);
+    }
+  if (v >= m_tab.length ())
+    m_tab.safe_grow (num_ssa_names + 1);
+  m_tab[v] = m_range_allocator->clone (r);
+  return false;
+}
+
 // Return TRUE if NAME has a range, and return it in R.
 
 bool
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index a0f436b5723..bbb9b18a10c 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -61,11 +61,11 @@ public:
   virtual bool has_range (tree name) const;
   virtual bool get_range (vrange &r, tree name) const;
   virtual bool set_range (tree name, const vrange &r);
+  virtual bool merge_range (tree name, const vrange &r);
   virtual void clear_range (tree name);
   virtual void clear ();
   void dump (FILE *f = stderr);
-  virtual bool range_of_expr (vrange &r, tree expr, gimple *stmt);
-
+  virtual bool range_of_expr (vrange &r, tree expr, gimple *stmt = NULL);
 protected:
   vec<vrange_storage *> m_tab;
   vrange_allocator *m_range_allocator;
@@ -80,8 +80,10 @@ class ssa_lazy_cache : public ssa_cache
 public:
   inline ssa_lazy_cache () { active_p = BITMAP_ALLOC (NULL); }
   inline ~ssa_lazy_cache () { BITMAP_FREE (active_p); }
+  inline bool empty_p () const { return bitmap_empty_p (active_p); }
   virtual bool has_range (tree name) const;
   virtual bool set_range (tree name, const vrange &r);
+  virtual bool merge_range (tree name, const vrange &r);
   virtual bool get_range (vrange &r, tree name) const;
   virtual void clear_range (tree name);
   virtual void clear ();
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 01e62d3ff39..01173c58f02 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -809,10 +809,8 @@ assume_query::calculate_op (tree op, gimple *s, vrange &lhs, fur_source &src)
   if (m_gori.compute_operand_range (op_range, s, lhs, op, src)
   && !op_range.varying_p ())
 {
-  Value_Range range (TREE_TYPE (op));
-  if (global.get_range (range, op))
-	op_range.intersect (range);
-  global.set_range (op, op_range);
+  // Set the global range, merging if there is already a range.
+

[COMMITTED] Remove value_query, push into sub&fold class.

2023-07-28 Thread Andrew MacLeod via Gcc-patches
When we first introduced range_query, we provided a base class for 
constant queries rather than range queries.  We then inherited from that 
and modified the value queries for a range-specific engine.  At the time, 
we figured there would be other consumers of the value_query class.


When all the dust settled, it turned out that substitute_and_fold is the 
only consumer, and all the other places we perceived to be value 
clients actually use substitute_and_fold.


This patch simplifies everything by providing only a range-query class, 
and moving the old value_range functionality into substitute_and_fold, 
the only place that uses it.
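
Paraphrasing the diff below as a hedged sketch, the resulting shape is:

  // range_query stays as the sole base class; the value_of_* virtuals
  // move into their one remaining consumer.
  class range_query { /* range_of_expr () and friends */ };

  class substitute_and_fold_engine : public range_query
  {
    virtual tree value_of_expr (tree expr, gimple * = NULL) = 0;
    // The value_on_edge () / value_of_stmt () defaults now live here.
  };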


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew

From 619641397a558bf65c24b99a4c52878bd940fcbe Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sun, 16 Jul 2023 12:46:00 -0400
Subject: [PATCH 2/3] Remove value_query, push into sub&fold class

	* tree-ssa-propagate.cc (substitute_and_fold_engine::value_on_edge):
	Move from value-query.cc.
	(substitute_and_fold_engine::value_of_stmt): Ditto.
	(substitute_and_fold_engine::range_of_expr): New.
	* tree-ssa-propagate.h (substitute_and_fold_engine): Inherit from
	range_query.  New prototypes.
	* value-query.cc (value_query::value_on_edge): Relocate.
	(value_query::value_of_stmt): Ditto.
	* value-query.h (class value_query): Remove.
	(class range_query): Remove base class.  Adjust prototypes.
---
 gcc/tree-ssa-propagate.cc | 28 
 gcc/tree-ssa-propagate.h  |  8 +++-
 gcc/value-query.cc| 21 -
 gcc/value-query.h | 30 --
 4 files changed, 39 insertions(+), 48 deletions(-)

diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index 174d19890f9..cb68b419b8c 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -532,6 +532,34 @@ struct prop_stats_d
 
 static struct prop_stats_d prop_stats;
 
+// range_query default methods to drive from a value_of_expr() rather than
+// range_of_expr.
+
+tree
+substitute_and_fold_engine::value_on_edge (edge, tree expr)
+{
+  return value_of_expr (expr);
+}
+
+tree
+substitute_and_fold_engine::value_of_stmt (gimple *stmt, tree name)
+{
+  if (!name)
+    name = gimple_get_lhs (stmt);
+
+  gcc_checking_assert (!name || name == gimple_get_lhs (stmt));
+
+  if (name)
+    return value_of_expr (name);
+  return NULL_TREE;
+}
+
+bool
+substitute_and_fold_engine::range_of_expr (vrange &, tree, gimple *)
+{
+  return false;
+}
+
 /* Replace USE references in statement STMT with the values stored in
PROP_VALUE. Return true if at least one reference was replaced.  */
 
diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h
index be4cb457873..29bde37add9 100644
--- a/gcc/tree-ssa-propagate.h
+++ b/gcc/tree-ssa-propagate.h
@@ -96,11 +96,17 @@ class ssa_propagation_engine
   void simulate_block (basic_block);
 };
 
-class substitute_and_fold_engine : public value_query
+class substitute_and_fold_engine : public range_query
 {
  public:
   substitute_and_fold_engine (bool fold_all_stmts = false)
 : fold_all_stmts (fold_all_stmts) { }
+
+  virtual tree value_of_expr (tree expr, gimple * = NULL) = 0;
+  virtual tree value_on_edge (edge, tree expr) override;
+  virtual tree value_of_stmt (gimple *, tree name = NULL) override;
+  virtual bool range_of_expr (vrange &r, tree expr, gimple * = NULL);
+
   virtual ~substitute_and_fold_engine (void) { }
   virtual bool fold_stmt (gimple_stmt_iterator *) { return false; }
 
diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index adef93415b7..0870d6c60a6 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -33,27 +33,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-range.h"
 #include "value-range-storage.h"
 
-// value_query default methods.
-
-tree
-value_query::value_on_edge (edge, tree expr)
-{
-  return value_of_expr (expr);
-}
-
-tree
-value_query::value_of_stmt (gimple *stmt, tree name)
-{
-  if (!name)
-    name = gimple_get_lhs (stmt);
-
-  gcc_checking_assert (!name || name == gimple_get_lhs (stmt));
-
-  if (name)
-    return value_of_expr (name);
-  return NULL_TREE;
-}
-
 // range_query default methods.
 
 bool
diff --git a/gcc/value-query.h b/gcc/value-query.h
index d10c3eac1e2..429446b32eb 100644
--- a/gcc/value-query.h
+++ b/gcc/value-query.h
@@ -37,28 +37,6 @@ along with GCC; see the file COPYING3.  If not see
 // Proper usage of the correct query in passes will enable other
 // valuation mechanisms to produce more precise results.
 
-class value_query
-{
-public:
-  value_query () { }
-  // Return the singleton expression for EXPR at a gimple statement,
-  // or NULL if none found.
-  virtual tree value_of_expr (tree expr, gimple * = NULL) = 0;
-  // Return the singleton expression for EXPR at an edge, or NULL if
-  // none found.
-  virtual tree value_on_edge (edge, tree expr);
-  // Return the singleton expression for the LHS of a gimple
-  // statemen

Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-07-28 Thread Jeff Law via Gcc-patches




On 7/19/23 04:11, Xiao Zeng wrote:


+  else if (TARGET_ZICOND
+   && (code == EQ || code == NE)
+   && GET_MODE_CLASS (mode) == MODE_INT)
+{
+  need_eq_ne_p = true;
+  /* 0 + imm  */
+  if (GET_CODE (cons) == CONST_INT && cons == const0_rtx
+  && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, alt);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+cons, alt)));
+  return true;
+}
+  /* imm + imm  */
+  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
+   && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, alt);
+  rtx temp1 = gen_reg_rtx (mode);
+  rtx temp2 = GEN_INT(-1 * INTVAL (cons));
+  riscv_emit_binary(PLUS, temp1, alt, temp2);
So in this sequence you're just computing a constant since both ALT and 
CONS are constants.  It's better to just form the constant directly, 
then force that into a register because it'll make the costing more 
correct, particularly if the resulting constant needs more than one 
instruction to synthesize.
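
As a hedged sketch of that suggestion (both ALT and CONS are CONST_INTs 
in this arm, so the difference folds immediately):

  /* Form ALT - CONS directly and let force_reg cost its synthesis.  */
  rtx temp1 = force_reg (mode, GEN_INT (INTVAL (alt) - INTVAL (cons)));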


And a nit.  There should always be a space between a function name and 
its argument list.





+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+const0_rtx, alt)));
+  riscv_emit_binary(PLUS, dest, dest, cons);
+  return true;
I don't see how this can be correct from a code generation standpoint. 
You compute ALT-CONS into TEMP1 earlier.  But you never use TEMP1 after 
that.  I think you meant to use TEMP1 instead of ALT as the false arm of 
the IF-THEN-ELSE you constructed.


In general you should be using CONST0_RTX (mode) rather than const0_rtx.
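
A brief illustration of the distinction (hedged, not from the patch):

  rtx z1 = const0_rtx;           /* the shared integer zero rtx */
  rtx z2 = CONST0_RTX (SFmode);  /* the zero of the requested mode,
                                    here a floating-point zero */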


+}
+  /* imm + reg  */
+  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
+   && GET_CODE (alt) == REG)
+{
+  /* Optimize for register value of 0.  */
+  if (op0 == alt && op1 == const0_rtx)
+{
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  cons = force_reg (mode, cons);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+cons, alt)));
+  return true;
+}
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  rtx temp1 = gen_reg_rtx (mode);
+  rtx temp2 = GEN_INT(-1 * INTVAL (cons));
+  riscv_emit_binary(PLUS, temp1, alt, temp2);
Here you have to be careful if CONS is -2048.  You negate it resulting 
in +2048 which can't be used in an addi.  This will cause the entire 
sequence to fail due to an unrecognized insn.  It would be better to 
handle that scenario directly so the generated sequence is still valid.
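
A hedged sketch of handling it directly (SMALL_OPERAND is the existing 
RISC-V predicate for 12-bit signed immediates; the surrounding names are 
illustrative):

  HOST_WIDE_INT k = -INTVAL (cons);
  /* CONS == -2048 negates to +2048, which is outside addi's
     [-2048, 2047] immediate range, so load it into a register.  */
  rtx addend = SMALL_OPERAND (k)
               ? GEN_INT (k) : force_reg (mode, GEN_INT (k));
  riscv_emit_binary (PLUS, temp1, alt, addend);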


By generating recognizable code in that case we let the costing model 
determine if the conditional move sequence is better than the branching 
sequence.




+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+const0_rtx, alt)));
I think we have the same problem with the use of ALT here rather than 
TEMP1 that we had in the previous case.





+  /* reg + imm  */
+  else if (GET_CODE (cons) == REG
+   && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
+{
+  /* Optimize for register value of 0.  */
+  if (op0 == cons && op1 == const0_rtx)
+{
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, alt);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+cons, alt)));
+  return true;
+}
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  rtx temp1 = gen_reg_rtx (mode);
+  rtx temp2 = GEN_INT(-1 * INTVAL (alt));
+  riscv_emit_binary(PLUS, temp1, cons, temp2);
+  emit_insn (gen_rtx_SET (dest,
+  gen_rtx_IF_THEN_ELSE (mode, cond,
+temp1, const0

Re: [PATCH] gcc-ar: Handle response files properly [PR77576]

2023-07-28 Thread Joseph Myers
This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-28 Thread Jeff Law via Gcc-patches




On 7/28/23 10:42, Patrick O'Neill wrote:

No worries! I'm glad it was an easy fix ;)
Note there's an effort underway via RISE to have some official RISC-V 
continuous testing for the compiler tools in place in the very near 
future (before Sept 1).


We're starting with a limited POC, so my thinking is to use it to 
augment what's already running in my tester.


Specifically I'm looking to add rv64gc and rv32gc cross testing for both 
the coordination branch and the trunk.  No multilibs in the immediate 
future, but once it's up and running and we have a good sense of the 
monthly cost we can look to add key multilibs such as vector or Zb*.


The hope is we can catch and resolve these minor problems quickly.

Jeff



Re: [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-07-28 Thread David Malcolm via Gcc-patches
On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> Hello-
> 
> This is an update to the v2 patch series last sent in January:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html
> 
> While I did not receive any feedback on the v2 patches yet, they did
> need some
> rebasing on top of other recent commits to input.cc, so I thought it
> would be
> helpful to send them again now. The patches have not otherwise
> changed from
> v2, and the above-linked message explains how all the patches fit in
> with the
> original v1 series sent last November.
> 
> Dave, I would appreciate it very much if you could please let me know
> what you
> think of this approach? I feel like the diagnostics we currently
> output for _Pragmas are worth improving. As a reminder, say for this
> example:
> 
> =
>  #define S "GCC diagnostic ignored \"oops"
>  _Pragma(S)
> =
> 
> We currently output:
> 
> =
> file.cpp:2:24: warning: missing terminating " character
>     2 | _Pragma(S)
>   |    ^
> =
> 
> While after these patches, we would output:
> 
> ==
> :1:24: warning: missing terminating " character
>     1 | GCC diagnostic ignored "oops
>   |    ^
> file.cpp:2:1: note: in <_Pragma directive>
>     2 | _Pragma(S)
>   | ^~~
> ==
> 
> Thanks!

Hi Lewis; sorry for not responding to the v2 patches.

I've started looking at the v3 patches in detail, but I have some high-
level questions about memory usage:

Am I right in thinking that the effect of this patch is that for every
_Pragma in the source we will create a new line_map_ordinary, and a new
buffer for the stringified content of that _Pragma, and that these
allocations will persist for the rest of the compilation?  (plus a
little extra allocation within the "location_t" space from 0 to
0x7fff).

It sounds like this will probably be a rounding error that won't be
noticable in profiling, but did you attempt any such measurement of the
memory usage before/after this patch on some real-world projects?

Thanks
Dave



Re: [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-07-28 Thread David Malcolm via Gcc-patches
On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> Add a new linemap reason LC_GEN which enables encoding the location
> of data
> that was generated during compilation and does not appear in any
> source file.
> There could be many use cases, such as, for instance, referring to
> the content
> of builtin macros (not yet implemented, but an easy lift after this
> one.) The
> first intended application is to create a place to store the input to
> a
> _Pragma directive, so that proper locations can be assigned to those
> tokens. This will be done in a subsequent commit.
> 
> The actual change needed to the line-maps API in libcpp is not too
> large and
> requires no space overhead in the line map data structures (on 64-bit
> systems
> that is; one newly added data member to class line_map_ordinary sits
> inside
> former padding bytes.) An LC_GEN map is just an ordinary map like any
> other,
> but the TO_FILE member that normally points to the file name points
> instead to
> the actual data.  This works automatically with PCH as well, for the
> same
> reason that the file name makes its way into a PCH.  In order to
> avoid
> confusion, the member has been renamed from TO_FILE to DATA, and
> associated
> accessors adjusted.
> 
> Outside libcpp, there are many small changes but most of them are to
> selftests, which are necessarily more sensitive to implementation
> details. From the perspective of the user (the "user", here, being a
> frontend
> using line maps or else the diagnostics infrastructure), the chief
> visible
> change is that the function location_get_source_line() should be
> passed an
> expanded_location object instead of a separate filename and line
> number.  This
> is not a big change because in most cases, this information came
> anyway from a
> call to expand_location and the needed expanded_location object is
> readily
> available. The new overload of location_get_source_line() uses the
> extra
> information in the expanded_location object to obtain the data from
> the
> in-memory buffer when it originated from an LC_GEN map.
> 
> Until the subsequent patch that starts using LC_GEN maps, none are
> yet
> generated within GCC, hence nothing is added to the testsuite here;
> but all
> relevant selftests have been extended to cover generated data maps in
> addition
> to normal files.

[..snip...]

Thanks for the updated patch.

Reading this patch, it felt a bit unnatural to me to have an
  (exploded location, source line) 
pair where the exploded location seems to be representing "which source
file or generated buffer", but the line/column info in that
exploded_location is to be ignored in favor of the 2nd source line.

I think we're missing a class: something that identifies either a
specific source file, or a specific generated buffer.

How about something like either:

class source_id
{
public:
  source_id (const char *filename)
  : m_filename_or_buffer (filename),
m_len (0)
  {
  }

  explicit source_id (const char *buffer, unsigned buffer_len)
  : m_filename_or_buffer (buffer),
m_len (buffer_len)
  {
linemap_assert (buffer_len > 0);
  }

private:
  const char *m_filename_or_buffer;
  unsigned m_len;  // where 0 means "it's a filename"
};

or:

class source_id
{
public:
  source_id (const char *filename)
  : m_ptr (filename),
m_is_buffer (false)
  {
  }

  explicit source_id (const linemap_ordinary *buffer_linemap)
  : m_ptr (buffer_linemap),
m_is_buffer (true)
  {
  }

private:
  const void *m_ptr;
  bool m_is_buffer;
};

and use one of these "source_id file" in place of "const char *file",
rather than replacing such things with expanded_location?
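
Presumably either variant would also want an equality operator, so that 
"same file or buffer?" checks become a plain comparison; a hedged sketch 
for the second variant:

  bool operator== (const source_id &other) const
  {
    return m_ptr == other.m_ptr && m_is_buffer == other.m_is_buffer;
  }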

> diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> index e8d3dece770..4164fa0b1ba 100644
> --- a/gcc/c-family/c-indentation.cc
> +++ b/gcc/c-family/c-indentation.cc
> @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
>  unsigned int *first_nws,
>  unsigned int tab_width)
>  {
> -  char_span line = location_get_source_line (exploc.file, exploc.line);
> +  char_span line = location_get_source_line (exploc);

...so this might continue to be:

  char_span line = location_get_source_line (exploc.file, exploc.line);

...but expanded_location's "file" field would become a source_id,
rather than a const char *.  It looks like doing so might make a lot of
"is this the same file or buffer?"  turn into comparisons of source_id
instances.

So I think expanded_location would become:

typedef struct
{
  /* Either the name of the source file involved, or the
 specific generated buffer.  */
  source_id file;

  /* The line-location in the source file.  */
  int line;

  int column;

  void *data;

  /* In a system header?. */
  bool sysp;
} expanded_location;

and we wouldn't need to add these extra fields:

> +
> +  /* If generated data, the data and its length.  The data may contain 
> embedded
> +   nulls and need not be null-terminated.  */
> +  unsigned in

Re: [PATCH v2] RISC-V: convert the mulh with 0 to mov 0 to the reg.

2023-07-28 Thread Jeff Law via Gcc-patches




On 7/28/23 06:31, Robin Dapp via Gcc-patches wrote:

This is a draft patch. I would like to explain why it's hard to make the
simplification generic and to ask for some help.

There are two categories we need to optimize:

- The op in an optab, such as div / 1.
- The unspec operations, such as mulh * 0 and (vadc+vmadc) + 0.

Especially for the unspec operations, I found we need to write patterns
one by one to match each special case.  There seems to be no way to write
a generic pattern that will match mulh, (vadc+vmadc), sll, and so on.
This approach is too complicated and not very elegant because it requires
writing so many md patterns.

Do you have any ideas?


Yes, it's cumbersome having to add the patterns individually
and it would be nicer to have the middle end optimize for us.

However, adding new rtl expressions, especially generic ones that
are useful for others and the respective optimizations is a tedious
process as well.  Still, just recently Roger Sayle added bitreverse
and copysign.  You can refer to his patch as well as the follow-up
ones to get an idea of what would need to be done.
("Add RTX codes for BITREVERSE and COPYSIGN")

So if we have few patterns that are really performance critical
(like for some benchmark) my take is to add them in a similar way you
were proposing but I would advise against using this excessively.
Is the mulh case somehow common or critical?
Well, I would actually back up even further.  What were the 
circumstances that led to the mulh with a zero operand?   That would 
tend to be an indicator of a problem earlier.  Perhaps in the gimple 
pipeline or the gimple->rtl conversion.  I'd be a bit surprised to see a 
const0_rtx propagate in during the RTL pipeline; I guess it's possible, 
but I'd expect it to be relatively rare.


The one case I could see happening would be cases from the builtin 
apis...  Of course one might call that user error ;-)



jeff


[r14-2834 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 1 on Linux/x86_64

2023-07-28 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb is the first bad commit
commit b9d7140c80bd3c7355b8291bb46f0895dcd8c3cb
Author: Jan Hubicka 
Date:   Fri Jul 28 09:16:09 2023 +0200

loop-split improvements, part 1

caused

FAIL: gcc.target/i386/pr87007-4.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 1
FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2834/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-4.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you encounter cascadelake-related problems, disabling AVX512F on the 
command line may avoid them.)
(However, please make sure there are no potential problems with AVX512.)


Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-28 Thread Jason Merrill via Gcc-patches

On 7/28/23 07:14, Lewis Hyatt wrote:

On Thu, Jul 27, 2023 at 06:18:33PM -0700, Jason Merrill wrote:

On 7/27/23 18:59, Lewis Hyatt wrote:

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.
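
As a small illustrative example (hedged, not taken from the patch): under 
-E or -save-temps, a directive such as

  #pragma GCC diagnostic ignored "-Wunused-variable"

must now both reach pragma_lex () for processing and still be streamed 
verbatim to the preprocessed output.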

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare.
(c_lex_enable_token_streaming): Declare.
* c-opts.cc (c_common_init): Call c_init_preprocess ().
* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
(c_lex_enable_token_streaming): New function.
(cb_def_pragma): Add a comment.
(get_token): New function wrapping cpp_get_token.
(c_lex_with_flags): Use the new wrapper function to support
obtaining tokens in preprocess_only mode.
(lex_string): Likewise.
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
when needed.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
---

Notes:
  Hello-
  Here is version 2 of the patch, incorporating Jason's feedback from
  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
  Thanks again, please let me know if it's OK? Bootstrap + regtest all
  languages on x86-64 Linux looks good.
  -Lewis

   gcc/c-family/c-common.h|  4 +++
   gcc/c-family/c-lex.cc  | 49 +
   gcc/c-family/c-opts.cc |  1 +
   gcc/c-family/c-ppoutput.cc | 17 +---
   gcc/c-family/c-pragma.cc   | 56 ++
   gcc/c-family/c-pragma.h|  2 ++
   gcc/c/c-parser.cc  | 21 ++
   gcc/cp/parser.cc   | 45 ++
   8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
   extern void c_parse_final_cleanups (void);
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
   /* These macros provide convenient access to the various _STMT nodes.  */
   /* Nonzero if a given STATEMENT_LIST represents the outermost binding
@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
   /* In c-lex.cc.  */
   extern enum cpp_ttype
   conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
   /* In c-pch.cc  */
   extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const 
cpp_string *);
   static void cb_def_pragma (cpp_reader *, unsigned int);
   static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
   static void cb_undef (cpp_reader *, unsigned int, cpp_h

Fix profile update after loop versioning in vectorizer

2023-07-28 Thread Jan Hubicka via Gcc-patches
Hi,
Vectorizer while loop versioning produces a versioned loop
guarded with two conditionals of the form

  if (cond1)
    goto scalar_loop
  else
    goto next_bb
next_bb:
  if (cond2)
    goto scalar_loop
  else
    goto vector_loop

It wants the combined test to be prob (which is set to likely)
and uses profile_probability::split to determine probability
of cond1 and cond2.

However, splitting turns:

 if (cond)
   goto lab; // ORIG probability
 into
 if (cond1)
   goto lab; // FIRST = ORIG * CPROB probability
 if (cond2)
   goto lab; // SECOND probability

That is an OR of the conditions instead of an AND.  As a result we get a
pretty low probability of entering the vectorized loop.

This patch fixes that by introducing a square root operation on profile
probabilities (which is the correct way to split this) and also adds the
pow operation that is needed elsewhere.
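
To restate why the square root is the right split (a hedged sketch, 
treating the two guards as independent and equally weighted): if the 
vector loop should be reached with probability p after falling through 
both tests, each fall-through edge gets probability s with

  s * s = p   ==>   s = sqrt (p)

which is exactly what the new profile_probability::sqrt computes.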

During loop versioning I now produce code as if there were only one combined
conditional and then update the probability of the conditional produced
(containing cond1).  Later the edge is split and the new conditional is added.
At that time it is necessary to update the probability of the BB containing
the second conditional so everything matches.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* profile-count.cc (profile_probability::sqrt): New member function.
(profile_probability::pow): Likewise.
* profile-count.h: (profile_probability::sqrt): Declare
(profile_probability::pow): Likewise.
* tree-vect-loop-manip.cc (vect_loop_versioning): Fix profile update.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vect-profile-upate-2.c: New test.

diff --git a/gcc/profile-count.cc b/gcc/profile-count.cc
index eaf0f0d787e..e63c9432388 100644
--- a/gcc/profile-count.cc
+++ b/gcc/profile-count.cc
@@ -471,3 +471,60 @@ profile_probability::to_sreal () const
   gcc_checking_assert (initialized_p ());
   return ((sreal)m_val) >> (n_bits - 2);
 }
+
+/* Compute square root.  */
+
+profile_probability
+profile_probability::sqrt () const
+{
+  if (!initialized_p () || *this == never () || *this == always ())
+    return *this;
+  profile_probability ret = *this;
+  ret.m_quality = MIN (ret.m_quality, ADJUSTED);
+  uint32_t min_range = m_val;
+  uint32_t max_range = max_probability;
+  if (!m_val)
+    max_range = 0;
+  if (m_val == max_probability)
+    min_range = max_probability;
+  while (min_range != max_range)
+    {
+      uint32_t val = (min_range + max_range) / 2;
+      uint32_t val2 = RDIV ((uint64_t)val * val, max_probability);
+      if (val2 == m_val)
+        min_range = max_range = m_val;
+      else if (val2 > m_val)
+        max_range = val - 1;
+      else if (val2 < m_val)
+        min_range = val + 1;
+    }
+  ret.m_val = min_range;
+  return ret;
+}
+
+/* Compute n-th power of THIS.  */
+
+profile_probability
+profile_probability::pow (int n) const
+{
+  if (n == 1 || !initialized_p ())
+    return *this;
+  if (!n)
+    return profile_probability::always ();
+  if (!nonzero_p ()
+      || !(profile_probability::always () - *this).nonzero_p ())
+    return *this;
+  profile_probability ret = profile_probability::always ();
+  profile_probability v = *this;
+  int p = 1;
+  while (true)
+    {
+      if (n & p)
+        ret = ret * v;
+      p <<= 1;
+      if (p > n)
+        break;
+      v = v * v;
+    }
+  return ret;
+}
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index 88a6431c21a..002bcb83481 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -650,6 +650,12 @@ public:
   return *this;
 }
 
+  /* Compute n-th power.  */
+  profile_probability pow (int) const;
+
+  /* Compute square root.  */
+  profile_probability sqrt () const;
+
   /* Get the value of the count.  */
   uint32_t value () const { return m_val; }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vect-profile-upate-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vect-profile-upate-2.c
new file mode 100644
index 000..4a5f6bc4e23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vect-profile-upate-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-details-blocks" } */
+void
+test (int *a, int *b, int n)
+{
+   for (int i = 0; i < n; i++)
+   a[i]+=b[i];
+}
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized"} } */
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 30baac6db44..e53a99e7c3c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3784,7 +3784,7 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
 }
 
   tree cost_name = NULL_TREE;
-  profile_probability prob2 = profile_probability::uninitialized ();
+  profile_probability prob2 = profile_probability::always ();
   if (cond_expr
   && EXPR_P (cond_expr)
   && (version_niter
@@ -3797,7 +3797,7 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
  is_gimple_val, NULL_TREE);
   /* Split prob () into two so that the overall prob