[PATCH v1] RISC-V: Bugfix for scalar move with merged operand

2023-09-17 Thread Pan Li via Gcc-patches
From: Pan Li 

Given the example below for VLS mode:

void
test (vl_t *u)
{
  vl_t t;
  long long *p = (long long *)&t;

  p[0] = p[1] = 2;

  *u = t;
}

When the index is 0, vec_set simplifies the insn to vmv.s.x without a
merged operand. That causes problems in DCE, namely:

1:  137[DI] = a0
2:  138[V2DI] = 134[V2DI]  // deleted by DCE
3:  139[DI] = #2   // deleted by DCE
4:  140[DI] = #2   // deleted by DCE
5:  141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE
6:  138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
7:  135[V2DI] = 138[V2DI]  // deleted by DCE
8:  142[V2DI] = 135[V2DI]  // deleted by DCE
9:  143[DI] = #2
10: 142[V2DI] = vec_dup:V2DI (143[DI])
11: (137[DI]) = 142[V2DI]

The upper 64 bits of 142[V2DI] are unknown here, so incorrect code is
generated when the value is stored back to memory. This patch fixes the
issue by adding a new SCALAR_MOVE_MERGED_OP for vec_set.
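
As a rough illustration of why the merged operand matters, here is a
minimal C model of a scalar move into element 0 that takes an explicit
merge operand. It is only a sketch: the v2di struct and the function
scalar_move_merged are hypothetical names for this example, not GCC or
RVV interfaces.

```c
#include <stdint.h>

/* Hypothetical two-element vector of 64-bit integers.  */
typedef struct { int64_t e[2]; } v2di;

/* Model of a scalar move into element 0 with a merge operand:
   element 0 receives X, every other element is taken from MERGE.
   Without a merge operand, element 1 would be unspecified.  */
v2di
scalar_move_merged (v2di merge, int64_t x)
{
  v2di r = merge;
  r.e[0] = x;
  return r;
}
```

With the destination passed as the merge operand, element 1 keeps its
previous value, so the insns that computed it are still used.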

Please note this patch doesn't enable VLS for vec_set, the underlying
patches will support this soon.

gcc/ChangeLog:

* config/riscv/autovec.md: Bugfix.
* config/riscv/riscv-protos.h (SCALAR_MOVE_MERGED_OP): New enum.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  4 +--
 gcc/config/riscv/riscv-protos.h   |  4 +++
 .../riscv/rvv/base/scalar-move-merged-run-1.c | 29 +++
 3 files changed, 35 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aca86554a94..01291ad9830 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1401,9 +1401,9 @@ (define_expand "vec_set"
   /* If we set the first element, emit an v(f)mv.s.[xf].  */
   if (operands[2] == const0_rtx)
 {
-  rtx ops[] = {operands[0], operands[1]};
+  rtx ops[] = {operands[0], operands[0], operands[1]};
   riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (mode),
- riscv_vector::SCALAR_MOVE_OP, ops, CONST1_RTX (Pmode));
+ riscv_vector::SCALAR_MOVE_MERGED_OP, ops, CONST1_RTX (Pmode));
 }
   else
 {
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a2d218d67b..6d9367d9602 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -345,6 +345,10 @@ enum insn_type : unsigned int
   SCALAR_MOVE_OP = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P | HAS_MERGE_P
   | USE_VUNDEF_MERGE_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
   | UNARY_OP_P,
+
+  SCALAR_MOVE_MERGED_OP = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
+ | HAS_MERGE_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
+ | UNARY_OP_P,
 };
 
 enum vlmul_type
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c
new file mode 100644
index 000..7aee75c6940
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c
@@ -0,0 +1,29 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-options "-O3 -Wno-psabi" } */
+
+#define TEST_VAL 2
+
+typedef long long vl_t __attribute__((vector_size(2 * sizeof (long long))));
+
+void init_vl (vl_t *u)
+{
+  vl_t t;
+  long long *p = (long long *)&t;
+
+  p[0] = p[1] = TEST_VAL;
+
+  *u = t;
+}
+
+int
+main ()
+{
+  vl_t vl = {};
+
+  init_vl (&vl);
+
+  if (vl[0] != TEST_VAL || vl[1] != TEST_VAL)
+__builtin_abort ();
+
+  return 0;
+}
-- 
2.34.1



[r14-4046 Regression] FAIL: 23_containers/vector/bool/110807.cc -std=gnu++17 (test for excess errors) on Linux/x86_64

2023-09-17 Thread Jiang, Haochen via Gcc-patches
On Linux/x86_64,

3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a is the first bad commit
commit 3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a
Author: Jonathan Wakely 
Date:   Fri Sep 1 21:27:57 2023 +0100

libstdc++: Add support for running tests with multiple -std options

caused

FAIL: 23_containers/vector/bool/110807.cc  -std=gnu++17 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4046/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH] internal-fn: Convert uninitialized SSA_NAME into SCRATCH rtx[PR110751]

2023-09-17 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong  writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and 
> convert it
> into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions of targets like RISC-V.
>
> Here we add the condition "insn_operand_matches (icode, opno, scratch)"
> Then, we will only create scratch rtx that target allow scratch rtx in 
> predicate.
> When the target doesn't allow scratch rtx in predicate, the later "else" 
> condition
> will create fresh pseudo for uninitialized SSA.
>
> I have verified it on the RISC-V port and it works well.
>
> Bootstrap and Regression on X86 passed.
>
> Ok for trunk ?
>  
> gcc/ChangeLog:
>
>   * internal-fn.cc (expand_fn_using_insn): Convert uninitialized SSA into 
> scratch.
>
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..fe4d86b3dbd 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -243,10 +243,16 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>tree rhs = gimple_call_arg (stmt, i);
>tree rhs_type = TREE_TYPE (rhs);
>rtx rhs_rtx = expand_normal (rhs);
> +  rtx scratch = gen_rtx_SCRATCH (TYPE_MODE (rhs_type));
>if (INTEGRAL_TYPE_P (rhs_type))
>   create_convert_operand_from (&ops[opno], rhs_rtx,
>TYPE_MODE (rhs_type),
>TYPE_UNSIGNED (rhs_type));
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +&& SSA_NAME_IS_DEFAULT_DEF (rhs)
> +&& VAR_P (SSA_NAME_VAR (rhs))
> +&& insn_operand_matches (icode, opno, scratch))

Rather than check insn_operand_matches here, I think we should create
the scratch operand regardless and leave optabs.cc to deal with it.
(This will need changes to optabs.cc.)

How about adding:

  create_undefined_input_operand (expand_operand *op, machine_mode mode)

that maps to a new EXPAND_UNDEFINED, then handle EXPAND_UNDEFINED in the
two case statements in optabs.cc.

Thanks,
Richard

> + create_input_operand (&ops[opno], scratch, TYPE_MODE (rhs_type));
>else
>   create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>opno += 1;


Re: [PATCH] AArch64: Improve immediate expansion [PR105928]

2023-09-17 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Support immediate expansion of immediates which can be created from 2 MOVKs
> and a shifted ORR or BIC instruction.  Change aarch64_split_dimode_const_store
> to apply if we save one instruction.
>
> This reduces the number of 4-instruction immediates in SPECINT/FP by 5%.
>
> Passes regress, OK for commit?
>
> gcc/ChangeLog:
> PR target/105928
> * config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)
> Add support for immediates using shifted ORR/BIC.
> (aarch64_split_dimode_const_store): Apply if we save one instruction.
> * config/aarch64/aarch64.md (_3):
> Make pattern global.
>
> gcc/testsuite:
> PR target/105928
> * gcc.target/aarch64/pr105928.c: Add new test.
> * gcc.target/aarch64/vect-cse-codegen.c: Fix test.

Looks good apart from a comment below about the test.

I was worried that reusing "dest" for intermediate results would
prevent CSE for cases like:

void g (long long, long long);
void
f (long long *ptr)
{
  g (0xee11ee22ee11ee22LL, 0xdc23dc44ee11ee22LL);
}

where the same 32-bit lowpart pattern is used for two immediates.
In principle, that could be avoided using:

if (generate)
  {
rtx tmp = aarch64_target_reg (dest, DImode);
emit_insn (gen_rtx_SET (tmp, GEN_INT (val2 & 0x)));
emit_insn (gen_insv_immdi (tmp, GEN_INT (16),
   GEN_INT (val2 >> 16)));
set_unique_reg_note (get_last_insn (), REG_EQUAL,
 GEN_INT (val2));
emit_insn (gen_ior_ashldi3 (dest, tmp, GEN_INT (i), tmp));
  }
return 3;

But it doesn't work, since we only expose the individual immediates
during split1, and nothing between split1 and ira is able to remove
redundancies.  There's no point complicating the code for a theoretical
future optimisation.
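
For reference, the first of the new checks in the patch quoted below can
be tried in isolation. The following C sketch (not GCC code) tests
whether a 64-bit immediate equals its low 32 bits ORed with a shifted
copy of themselves, over the same shift range of 17 to 47 as the quoted
loop. The name find_orr_shift is hypothetical, and the low-32-bit mask
is an assumption based on the patch comment about the bottom 32 bits.

```c
#include <stdint.h>

/* Return a shift amount I in [17, 47] such that
   LO | (LO << I) == VAL, where LO is the low 32 bits of VAL,
   or -1 if no such shift exists.  */
int
find_orr_shift (uint64_t val)
{
  uint64_t lo = val & 0xffffffffu;

  for (int i = 17; i < 48; i++)
    if ((lo | (lo << i)) == val)
      return i;

  return -1;
}
```

For the first constant in the example above, 0xee11ee22ee11ee22, this
sketch finds a shift of 32.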

> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c44c0b979d0cc3755c61dcf566cfddedccebf1ea..832f8197ac8d1a04986791e6f3e51861e41944b2 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -5639,7 +5639,7 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
> generate,
> machine_mode mode)
>  {
>int i;
> -  unsigned HOST_WIDE_INT val, val2, mask;
> +  unsigned HOST_WIDE_INT val, val2, val3, mask;
>int one_match, zero_match;
>int num_insns;
>
> @@ -5721,6 +5721,35 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, 
> bool generate,
> }
>   return 3;
> }
> +
> +  /* Try shifting and inserting the bottom 32-bits into the top bits.  */
> +  val2 = val & 0x;
> +  val3 = 0x;
> +  val3 = val2 | (val3 << 32);
> +  for (i = 17; i < 48; i++)
> +   if ((val2 | (val2 << i)) == val)
> + {
> +   if (generate)
> + {
> +   emit_insn (gen_rtx_SET (dest, GEN_INT (val2 & 0x)));
> +   emit_insn (gen_insv_immdi (dest, GEN_INT (16),
> +  GEN_INT (val2 >> 16)));
> +   emit_insn (gen_ior_ashldi3 (dest, dest, GEN_INT (i), dest));
> + }
> +   return 3;
> + }
> +   else if ((val3 & ~(val3 << i)) == val)
> + {
> +   if (generate)
> + {
> +   emit_insn (gen_rtx_SET (dest, GEN_INT (val3 | 0x)));
> +   emit_insn (gen_insv_immdi (dest, GEN_INT (16),
> +  GEN_INT (val2 >> 16)));
> +   emit_insn (gen_and_one_cmpl_ashldi3 (dest, dest, GEN_INT (i),
> + dest));
> + }
> +   return 3;
> + }
>  }
>
>/* Generate 2-4 instructions, skipping 16 bits of all zeroes or ones which
> @@ -25506,8 +25535,6 @@ aarch64_split_dimode_const_store (rtx dst, rtx src)
>rtx lo = gen_lowpart (SImode, src);
>rtx hi = gen_highpart_mode (SImode, DImode, src);
>
> -  bool size_p = optimize_function_for_size_p (cfun);
> -
>if (!rtx_equal_p (lo, hi))
>  return false;
>
> @@ -25526,14 +25553,8 @@ aarch64_split_dimode_const_store (rtx dst, rtx src)
>   MOV   w1, 49370
>   MOVK  w1, 0x140, lsl 16
>   STP   w1, w1, [x0]
> -   So we want to perform this only when we save two instructions
> -   or more.  When optimizing for size, however, accept any code size
> -   savings we can.  */
> -  if (size_p && orig_cost <= lo_cost)
> -return false;
> -
> -  if (!size_p
> -  && (orig_cost <= lo_cost + 1))
> +   So we want to perform this when we save at least one instruction.  */
> +  if (orig_cost <= lo_cost)
>  return false;
>
>rtx mem_lo = adjust_address (dst, SImode, 0);
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 97f

[PATCH v2] c++: Catch indirect change of active union member in constexpr [PR101631]

2023-09-17 Thread Nathaniel Shead via Gcc-patches
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629084.html

-- >8 --

This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.

To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extraneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.

Additionally, this patch ensures that constructors for a union field
mark that field as the active member before entering the call itself;
this ensures that modifications of the field within the constructor's
body don't cause false positives (as these will not appear to be member
access expressions). This means that we no longer need to start the
lifetime of empty union members after the constructor body completes.

PR c++/101631

gcc/cp/ChangeLog:

* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_store_expression): Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/call.cc|  11 +-
 gcc/cp/class.cc   |   8 ++
 gcc/cp/constexpr.cc   | 105 --
 gcc/cp/cp-tree.h  |   1 +
 gcc/testsuite/g++.dg/cpp2a/constexpr-union2.C |  30 +
 gcc/testsuite/g++.dg/cpp2a/constexpr-union3.C |  45 
 gcc/testsuite/g++.dg/cpp2a/constexpr-union4.C |  29 +
 gcc/testsuite/g++.dg/cpp2a/constexpr-union5.C |  55 +
 8 files changed, 246 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union5.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 23e458d3252..3372c88f182 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10358,10 +10358,7 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
   && DECL_OVERLOADED_OPERATOR_IS (fn, NOP_EXPR)
   && trivial_fn_p (fn))
 {
-  /* Don't use cp_build_fold_indirect_ref, op= returns an lvalue even if
-the object argument isn't one.  */
-  tree to = cp_build_indirect_ref (input_location, argarray[0],
-  RO_ARROW, complain);
+  tree to = cp_build_fold_indirect_ref (argarray[0]);
   tree type = TREE_TYPE (to);
   tree as_base = CLASSTYPE_AS_BASE (type);
   tree arg = argarray[1];
@@ -10369,7 +10366,11 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
 
   if (is_really_empty_class (type, /*ignore_vptr*/true))
{
- /* Avoid copying empty classes.  */
+ /* Avoid copying empty classes, but ensure op= returns an lvalue even
+if the object argument isn't one. This isn't needed in other cases
+since MODIFY_EXPR is always considered an lvalue.  */
+ to = cp_build_addr_expr (to, tf_none);
+ to = cp_build_indirect_ref (input_location, to, RO_ARROW, complain);
  val = build2 (COMPOUND_EXPR, type, arg, to);
  suppress_warning (val, OPT_Wunused);
}
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 778759237dc..43898dabbe7 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -5651,6 +5651,14 @@ type_has_virtual_destructor (tree type)
   return (dtor && DECL_VIRTUAL_P (dtor));
 }
 
+/* True iff class TYPE has a non-deleted trivial default
+   constructor.  */
+
+bool type_has_non_deleted_trivial_default_ctor (tree type)
+{
+  return TYPE_HAS_TRIVIAL_DFLT (type) && locate_ctor (type);
+}
+
 /* Returns true iff T, a class, has a move-assignment or
move-constructor.  Does not lazily declare either.
If USER_P is false, any move function will do.  If it is true, the
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8bd5c4a47f8..b82e87be974 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3141,40 +3141,34 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/false,
  non_constant_p, overflow_p);
 
+ /* If this is a constructor, we are beginning the lifetime of the
+object 

Re: [PATCH] RISC-V: Finish Typing Un-Typed Instructions and Turn on Assert

2023-09-17 Thread Jeff Law via Gcc-patches




On 9/12/23 00:18, Lehua Ding wrote:

Hi Jeff,

On 2023/9/12 11:47, Jeff Law wrote:
But that condition is _not_ generally sufficient to prevent these 
insns from existing during sched1.  ie, a pass between split1 and 
sched1 could create these patterns and successfully match them.  That 
in turn would trigger the assertion.


can_create_pseudo_p is true up through the register allocator.  As a 
result a condition like TARGET_VECTOR && can_create_pseudo_p() is 
_not_ sufficient to ensure the pattern does not exist during sched1.  
While no pass likely creates these kinds of insns right now in that 
window between split1 and sched1, there's no guarantee that will 
always be true.


But if a pass between split1 and sched1 creates these patterns, then an 
unrecognized error will be thrown after reload. Is that right? That is to 
say, these insn patterns are designed to exist only before split1, but now 
the conditions are a little looser; a little tighter is better if we 
can. If this is the case, I feel it makes no difference whether the 
error is thrown by the sched pass or a pass after reload.
If someone was to create one of these patterns without an associated 
insn type, then the assert would trigger during sched1, and that is 
good.  The earlier we can catch an inconsistency, the better.






The simple rule is easy to follow.  Every insn should have a type.  
That also gives us a degree of future-proofing, even if someone adds 
additional passes/capabilities between split1 and sched1.


However, adding content that you don't need feels even more difficult to 
understand, and this is just my feeling. It would be clearer if we could 
set the type according to the purpose of the insn pattern.
I understand your position, but respectfully disagree with the 
conclusion in this case.


jeff


Re: [pushed] [RA]: Improve cost calculation of pseudos with equivalences

2023-09-17 Thread Jeff Law via Gcc-patches




On 9/14/23 09:28, Vladimir Makarov via Gcc-patches wrote:
I've committed the following patch.  The reason for this patch is 
explained in its commit message.


The patch was successfully bootstrapped and tested on x86-64, aarch64, 
and ppc64le.



ra-equiv-cost.patch_ZN7cObject4dropEP12cOwnedObject-stores

commit 3c834d85f2ec42c60995c2b678196a06cb744959
Author: Vladimir N. Makarov
Date:   Thu Sep 14 10:26:48 2023 -0400

 [RA]: Improve cost calculation of pseudos with equivalences
 
 RISCV target developers reported that RA can spill pseudo used in a
 loop although there are enough registers to assign.  It happens when
 the pseudo has an equivalence outside the loop and the equivalence is
 not merged into insns using the pseudo.  IRA sets up that memory cost
 to zero when the pseudo has an equivalence and it means that the
 pseudo will be probably spilled.  This approach worked well for i686
 (different approaches were benchmarked long time ago on spec2k).
 Although common sense says that the code is wrong and this was
 confirmed by RISCV developers.

 I've tried the following patch on I7-9700k and it improved spec17 fp
 by 1.5% (21.1 vs 20.8) although spec17 int is a bit worse by 0.45%
 (8.54 vs 8.58).  The average generated code size is practically the
 same (0.001% difference).

 In the future we probably need to try more sophisticated cost
 calculation which should take into account that the equiv can not be
 combined in usage insns and the costs of reloads because of this.

 gcc/ChangeLog:

 * ira-costs.cc (find_costs_and_classes): Decrease memory cost
 by equiv savings.

Thanks for diving into this!

What's rather strange is when I do an A/B test with this patch on RISC-V 
it appears to be a pretty consistent loss for integer code.  This would 
seem to match your findings on x86 as well.


I still need to dig into it more deeply, but I see higher ALU as well as 
higher load/store traffic.  The load/store traffic in the one case I've 
looked at so far (omnetpp) appears to be prologue/epilogue related. 
Essentially we're using an additional callee saved register on paths 
that don't trigger at runtime.


Jeff



Re: [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen

2023-09-17 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> Hi,
> After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's a following 
> regression:
> FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
> ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
>
> This happens because for the following function from vect_copy_lane_1.c:
> float32x2_t
> __attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
> float32x2_t b)
> {
>   return vcopy_lane_f32 (a, 1, b, 0);
> }
>
> Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90,
> it got lowered to following sequence in .optimized dump:
>[local count: 1073741824]:
>   _4 = BIT_FIELD_REF ;
>   __a_5 = BIT_INSERT_EXPR ;
>   return __a_5;
>
> The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR
> to vector permutation and now thus gets lowered to:
>
>[local count: 1073741824]:
>   __a_4 = VEC_PERM_EXPR ;
>   return __a_4;
>
> Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
> in aarch64_expand_vec_perm_const_1, it now generates:
>
> test_copy_lane_f32:
> zip1 v0.2s, v0.2s, v1.2s
> ret
>
> Similarly for test_copy_lane_[us]32.

Yeah, I suppose this choice is at least as good as INS.  It has the advantage
that the source and destination don't need to be tied.  For example:

int32x2_t f(int32x2_t a, int32x2_t b, int32x2_t c) {
return vcopy_lane_s32 (b, 1, c, 0);
}

used to be:

f:
mov v0.8b, v1.8b
ins v0.s[1], v2.s[0]
ret

but is now:

f:
zip1 v0.2s, v1.2s, v2.2s
ret
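
To see why the two sequences are interchangeable for two-element
vectors, here is a small C model of both operations. It is a sketch
under stated assumptions: v2si, copy_lane_1_0 and zip1_2s are
hypothetical names that only model the lane semantics, not the real
arm_neon.h intrinsics.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit vector with two 32-bit lanes.  */
typedef struct { int32_t lane[2]; } v2si;

/* Model of copying lane 0 of B into lane 1 of A (the INS form).  */
v2si
copy_lane_1_0 (v2si a, v2si b)
{
  a.lane[1] = b.lane[0];
  return a;
}

/* Model of ZIP1 on .2s operands: interleave the low lanes of A and B,
   which for two lanes is simply { A[0], B[0] }.  */
v2si
zip1_2s (v2si a, v2si b)
{
  v2si r = {{ a.lane[0], b.lane[0] }};
  return r;
}
```

Both functions return { a[0], b[0] }, which is why either instruction
can implement this particular lane copy.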

> The attached patch adjusts the tests to reflect the change in code-gen
> and the tests pass.
> OK to commit ?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c 
> b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> index 2848be564d5..811dc678b92 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
>  BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>  BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
>  BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
> -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 
> } } */
> +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
>  BUILD_TEST (int64x1_t,   int64x1_t,   , , s64, 0, 0)
>  BUILD_TEST (uint64x1_t,  uint64x1_t,  , , u64, 0, 0)
>  BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)

OK, thanks.

Richard


[PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME

2023-09-17 Thread Juzhe-Zhong
According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

As Richard and Richi suggested, we recognize uninitialized SSA_NAME and convert
it into SCRATCH rtx if the target predicate allows SCRATCH.

It can help to reduce redundant data move instructions of targets like RISC-V.

gcc/ChangeLog:

* internal-fn.cc (expand_fn_using_insn): Support undefined rtx.
* optabs.cc (maybe_legitimize_operand): Ditto.
(can_reuse_operands_p): Ditto.
* optabs.h (enum expand_operand_type): Ditto.
(create_undefined_input_operand): Ditto.

---
 gcc/internal-fn.cc |  4 
 gcc/optabs.cc  | 16 
 gcc/optabs.h   | 14 +-
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0fd34359247..61d5a9e4772 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
unsigned int noutputs,
create_convert_operand_from (&ops[opno], rhs_rtx,
 TYPE_MODE (rhs_type),
 TYPE_UNSIGNED (rhs_type));
+  else if (TREE_CODE (rhs) == SSA_NAME
+  && SSA_NAME_IS_DEFAULT_DEF (rhs)
+  && VAR_P (SSA_NAME_VAR (rhs)))
+   create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
   else
create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
   opno += 1;
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..d8c771547a3 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -8102,6 +8102,21 @@ maybe_legitimize_operand (enum insn_code icode, unsigned 
int opno,
  goto input;
}
   break;
+
+case EXPAND_UNDEFINED:
+  {
+   mode = insn_data[(int) icode].operand[opno].mode;
+   rtx scratch = gen_rtx_SCRATCH (mode);
+   /* For SCRATCH rtx which is converted from uninitialized
+  SSA, we convert it as fresh pseudo when target doesn't
+  allow scratch rtx in predicate. Otherwise, return true.  */
+   if (!insn_operand_matches (icode, opno, scratch))
+ {
+   op->value = gen_reg_rtx (mode);
+   goto input;
+ }
+   return true;
+  }
 }
   return insn_operand_matches (icode, opno, op->value);
 }
@@ -8147,6 +8162,7 @@ can_reuse_operands_p (enum insn_code icode,
 case EXPAND_INPUT:
 case EXPAND_ADDRESS:
 case EXPAND_INTEGER:
+case EXPAND_UNDEFINED:
   return true;
 
 case EXPAND_CONVERT_TO:
diff --git a/gcc/optabs.h b/gcc/optabs.h
index c80b7f4dc1b..4eb1f9ee09a 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -37,7 +37,8 @@ enum expand_operand_type {
   EXPAND_CONVERT_TO,
   EXPAND_CONVERT_FROM,
   EXPAND_ADDRESS,
-  EXPAND_INTEGER
+  EXPAND_INTEGER,
+  EXPAND_UNDEFINED
 };
 
 /* Information about an operand for instruction expansion.  */
@@ -117,6 +118,17 @@ create_input_operand (class expand_operand *op, rtx value,
   create_expand_operand (op, EXPAND_INPUT, value, mode, false);
 }
 
+/* Make OP describe an undefined input operand for uninitialized
+   SSA.  It's the scratch operand with mode MODE; MODE cannot be
+   VOIDmode.  */
+
+inline void
+create_undefined_input_operand (class expand_operand *op, machine_mode mode)
+{
+  create_expand_operand (op, EXPAND_UNDEFINED, gen_rtx_SCRATCH (mode), mode,
+false);
+}
+
 /* Like create_input_operand, except that VALUE must first be converted
to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */
 
-- 
2.36.3



Re: Re: [PATCH] internal-fn: Convert uninitialized SSA_NAME into SCRATCH rtx[PR110751]

2023-09-17 Thread 钟居哲
Thanks Richard.

Addressed the comments in V2:
[PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME 
(gnu.org)




juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-09-17 18:29
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH] internal-fn: Convert uninitialized SSA_NAME into SCRATCH 
rtx[PR110751]
Juzhe-Zhong  writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and 
> convert it
> into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions of targets like RISC-V.
>
> Here we add the condition "insn_operand_matches (icode, opno, scratch)"
> Then, we will only create scratch rtx that target allow scratch rtx in 
> predicate.
> When the target doesn't allow scratch rtx in predicate, the later "else" 
> condition
> will create fresh pseudo for uninitialized SSA.
>
> I have verified it on the RISC-V port and it works well.
>
> Bootstrap and Regression on X86 passed.
>
> Ok for trunk ?
>  
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_fn_using_insn): Convert uninitialized SSA into 
> scratch.
>
> ---
>  gcc/internal-fn.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..fe4d86b3dbd 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -243,10 +243,16 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>tree rhs = gimple_call_arg (stmt, i);
>tree rhs_type = TREE_TYPE (rhs);
>rtx rhs_rtx = expand_normal (rhs);
> +  rtx scratch = gen_rtx_SCRATCH (TYPE_MODE (rhs_type));
>if (INTEGRAL_TYPE_P (rhs_type))
>  create_convert_operand_from (&ops[opno], rhs_rtx,
>   TYPE_MODE (rhs_type),
>   TYPE_UNSIGNED (rhs_type));
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +&& SSA_NAME_IS_DEFAULT_DEF (rhs)
> +&& VAR_P (SSA_NAME_VAR (rhs))
> +&& insn_operand_matches (icode, opno, scratch))
 
Rather than check insn_operand_matches here, I think we should create
the scratch operand regardless and leave optabs.cc to deal with it.
(This will need changes to optabs.cc.)
 
How about adding:
 
  create_undefined_input_operand (expand_operand *op, machine_mode mode)
 
that maps to a new EXPAND_UNDEFINED, then handle EXPAND_UNDEFINED in the
two case statements in optabs.cc.
 
Thanks,
Richard
 
> + create_input_operand (&ops[opno], scratch, TYPE_MODE (rhs_type));
>else
>  create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>opno += 1;
 


Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME

2023-09-17 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong  writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize uninitialized SSA_NAME and 
> convert it
> into SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions of targets like RISC-V.
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (expand_fn_using_insn): Support undefined rtx.
>   * optabs.cc (maybe_legitimize_operand): Ditto.
>   (can_reuse_operands_p): Ditto.
>   * optabs.h (enum expand_operand_type): Ditto.
>   (create_undefined_input_operand): Ditto.
>
> ---
>  gcc/internal-fn.cc |  4 
>  gcc/optabs.cc  | 16 
>  gcc/optabs.h   | 14 +-
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..61d5a9e4772 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> unsigned int noutputs,
>   create_convert_operand_from (&ops[opno], rhs_rtx,
>TYPE_MODE (rhs_type),
>TYPE_UNSIGNED (rhs_type));
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +&& SSA_NAME_IS_DEFAULT_DEF (rhs)
> +&& VAR_P (SSA_NAME_VAR (rhs)))
> + create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
>else
>   create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>opno += 1;
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 32ff379ffc3..d8c771547a3 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -8102,6 +8102,21 @@ maybe_legitimize_operand (enum insn_code icode, 
> unsigned int opno,
> goto input;
>   }
>break;
> +
> +case EXPAND_UNDEFINED:
> +  {
> + mode = insn_data[(int) icode].operand[opno].mode;
> + rtx scratch = gen_rtx_SCRATCH (mode);

A scratch of the right mode should already be available in op->value,
since it was created by create_undefined_input_operand.

If that doesn't work for some reason, then it would be better for
create_undefined_input_operand to pass NULL_RTX as the "value"
argument to create_expand_operand.

> + /* For SCRATCH rtx which is converted from uninitialized
> +SSA, we convert it as fresh pseudo when target doesn't
> +allow scratch rtx in predicate. Otherwise, return true.  */
> + if (!insn_operand_matches (icode, opno, scratch))
> +   {
> + op->value = gen_reg_rtx (mode);

The mode should come from op->mode.

> + goto input;
> +   }
> + return true;
> +  }
>  }
>return insn_operand_matches (icode, opno, op->value);
>  }
> @@ -8147,6 +8162,7 @@ can_reuse_operands_p (enum insn_code icode,
>  case EXPAND_INPUT:
>  case EXPAND_ADDRESS:
>  case EXPAND_INTEGER:
> +case EXPAND_UNDEFINED:
>return true;

I think this should be in the "return false" block instead.

>  
>  case EXPAND_CONVERT_TO:
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index c80b7f4dc1b..4eb1f9ee09a 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -37,7 +37,8 @@ enum expand_operand_type {
>EXPAND_CONVERT_TO,
>EXPAND_CONVERT_FROM,
>EXPAND_ADDRESS,
> -  EXPAND_INTEGER
> +  EXPAND_INTEGER,
> +  EXPAND_UNDEFINED

Sorry, this was my bad suggestion.  I should have suggested
EXPAND_UNDEFINED_INPUT, to match the name of the function.

Thanks,
Richard

>  };
>  
>  /* Information about an operand for instruction expansion.  */
> @@ -117,6 +118,17 @@ create_input_operand (class expand_operand *op, rtx 
> value,
>create_expand_operand (op, EXPAND_INPUT, value, mode, false);
>  }
>  
> +/* Make OP describe an undefined input operand for uninitialized
> +   SSA.  It's the scratch operand with mode MODE; MODE cannot be
> +   VOIDmode.  */
> +
> +inline void
> +create_undefined_input_operand (class expand_operand *op, machine_mode mode)
> +{
> +  create_expand_operand (op, EXPAND_UNDEFINED, gen_rtx_SCRATCH (mode), mode,
> +  false);
> +}
> +
>  /* Like create_input_operand, except that VALUE must first be converted
> to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */


Re: [PATCH v1] RISC-V: Bugfix for scalar move with merged operand

2023-09-17 Thread Jeff Law via Gcc-patches





On 9/17/23 01:42, Pan Li via Gcc-patches wrote:

From: Pan Li 

Given below example for VLS mode

void
test (vl_t *u)
{
   vl_t t;
   long long *p = (long long *)&t;

   p[0] = p[1] = 2;

   *u = t;
}

The vec_set will simplify the insn to vmv.s.x when index is 0, without
merged operand. That will result in some problems in DCE, aka:

1:  137[DI] = a0
2:  138[V2DI] = 134[V2DI]  // deleted by DCE
3:  139[DI] = #2   // deleted by DCE
4:  140[DI] = #2   // deleted by DCE
5:  141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE
6:  138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
7:  135[V2DI] = 138[V2DI]  // deleted by DCE
8:  142[V2DI] = 135[V2DI]  // deleted by DCE
9:  143[DI] = #2
10: 142[V2DI] = vec_dup:V2DI (143[DI])
11: (137[DI]) = 142[V2DI]

The higher 64 bits of 142[V2DI] is unknown here and it generated
incorrect code when store back to memory. This patch would like to
fix this issue by adding a new SCALAR_MOVE_MERGED_OP for vec_set.

I must be missing something.  Doesn't insn 10 broadcast the immediate 
0x2 to both elements of r142?!?  What am I missing?


JEff
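
For readers following the dump, here is a minimal scalar model of what the patch description says the sequence is meant to compute. It is only a sketch: the names v2di, scalar_move_merged and set_both_elements are made up for illustration, and it assumes (as the description states) that the move into element 0 is meant to take its other elements from a merge operand. It does not settle whether insn 10 in the dump already defines both elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-element vector of 64-bit integers.
using v2di = std::array<std::int64_t, 2>;

// Model of a scalar move into element 0 with a merge operand:
// every element other than element 0 is taken from `merge`.
v2di scalar_move_merged(const v2di &merge, std::int64_t x) {
  v2di r = merge;
  r[0] = x;
  return r;
}

// Model of `p[0] = p[1] = 2` from the test case: element 1 is written
// by a slide-up of a duplicated value, then element 0 is written by
// the merged scalar move.
v2di set_both_elements(std::int64_t x) {
  v2di t{};          // stands in for the uninitialised local `t`
  v2di dup{x, x};    // vec_dup
  t[1] = dup[0];     // slide-up by one element
  return scalar_move_merged(t, x);
}

// Small self-check of the model.
void check_model() {
  v2di r = set_both_elements(2);
  assert(r[0] == 2 && r[1] == 2);
}
```

In this model the value of element 1 survives only because the merge operand carries it through the final move, which is the dependency the description says DCE could not see.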


Re: [PATCH 1/2] RISC-V: Make bit manipulation value / round number and shift amount types for builtins unsigned

2023-09-17 Thread Jeff Law via Gcc-patches




On 9/11/23 19:28, Tsukasa OI wrote:

From: Tsukasa OI 

For bit manipulation operations, input(s) and the manipulated output are
better to be unsigned like other target-independent builtins like
__builtin_bswap32 and __builtin_popcount.

Although this is not completely compatible with the previous behavior (as the
types change), most code will run normally, even without warnings (with
-Wall -Wextra).

To make consistent to the LLVM commit 599421ae36c3 ("[RISCV] Use unsigned
instead of signed types for Zk* and Zb* builtins."), round numbers and
shift amount on the scalar crypto instructions are also changed
to unsigned.
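
As an illustration of why unsigned types fit bit manipulation, the sketch below gives portable reference versions of a 32-bit carry-less multiply and a rotate. These are not the RISC-V builtins themselves; clmul32_ref and ror32_ref are hypothetical helper names, and the code only assumes standard fixed-width types.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of the carry-less (XOR) product of a and b.
// Unsigned operands keep every shift well defined and wrap modulo 2^32.
std::uint32_t clmul32_ref(std::uint32_t a, std::uint32_t b) {
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      r ^= a << i;
  return r;
}

// Rotate x right by n bits; the shift amount is unsigned and masked
// to the range 0..31 so that neither shift count reaches 32.
std::uint32_t ror32_ref(std::uint32_t x, unsigned n) {
  n &= 31u;
  return (x >> n) | (x << ((32u - n) & 31u));
}

// Small self-check of the helpers.
void check_bitmanip() {
  assert(clmul32_ref(3u, 3u) == 5u);       // (x + 1)^2 = x^2 + 1 over GF(2)
  assert(ror32_ref(1u, 1u) == 0x80000000u);
}
```

With signed operands the same expressions would involve left shifts into the sign bit and arithmetic right shifts, which is the kind of surprise the unsigned prototypes avoid.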

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (RISCV_ATYPE_UQI): New for
uint8_t.  (RISCV_ATYPE_UHI): New for uint16_t.
(RISCV_ATYPE_QI, RISCV_ATYPE_HI, RISCV_ATYPE_SI, RISCV_ATYPE_DI):
Removed as no longer used.
(RISCV_ATYPE_UDI): New for uint64_t.
* config/riscv/riscv-cmo.def: Make types unsigned for not working
"zicbop_cbo_prefetchi" and working bit manipulation clmul builtin
argument/return types.
* config/riscv/riscv-ftypes.def: Make bit manipulation, round
number and shift amount types unsigned.
* config/riscv/riscv-scalar-crypto.def: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: Make signed type to unsigned.
* gcc.target/riscv/zbc64.c: Ditto.
* gcc.target/riscv/zbkb32.c: Ditto.
* gcc.target/riscv/zbkb64.c: Ditto.
* gcc.target/riscv/zbkc32.c: Ditto.
* gcc.target/riscv/zbkc64.c: Ditto.
* gcc.target/riscv/zbkx32.c: Ditto.
* gcc.target/riscv/zbkx64.c: Ditto.
* gcc.target/riscv/zknd32.c: Ditto.
* gcc.target/riscv/zknd64.c: Ditto.
* gcc.target/riscv/zkne32.c: Ditto.
* gcc.target/riscv/zkne64.c: Ditto.
* gcc.target/riscv/zknh-sha256.c: Ditto.
* gcc.target/riscv/zknh-sha512-32.c: Ditto.
* gcc.target/riscv/zknh-sha512-64.c: Ditto.
* gcc.target/riscv/zksed32.c: Ditto.
* gcc.target/riscv/zksed64.c: Ditto.
* gcc.target/riscv/zksh32.c: Ditto.
* gcc.target/riscv/zksh64.c: Ditto.

OK
Jeff

---


Re: [RFC PATCH 2/2] RISC-V: Update testsuite for type-changed builtins

2023-09-17 Thread Jeff Law via Gcc-patches




On 9/6/23 20:17, Tsukasa OI wrote:

From: Tsukasa OI 

This commit replaces the type of the builtin used in the testsuite.

Even without this commit, no tests fail, but the types are changed so
that no confusion occurs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: Make signed type to unsigned.
* gcc.target/riscv/zbc64.c: Ditto.
* gcc.target/riscv/zbkb32.c: Ditto.
* gcc.target/riscv/zbkb64.c: Ditto.
* gcc.target/riscv/zbkc32.c: Ditto.
* gcc.target/riscv/zbkc64.c: Ditto.
* gcc.target/riscv/zbkx32.c: Ditto.
* gcc.target/riscv/zbkx64.c: Ditto.
* gcc.target/riscv/zknd32.c: Ditto.
* gcc.target/riscv/zknd64.c: Ditto.
* gcc.target/riscv/zkne32.c: Ditto.
* gcc.target/riscv/zkne64.c: Ditto.
* gcc.target/riscv/zknh-sha256.c: Ditto.
* gcc.target/riscv/zknh-sha512-32.c: Ditto.
* gcc.target/riscv/zknh-sha512-64.c: Ditto.
* gcc.target/riscv/zksed32.c: Ditto.
* gcc.target/riscv/zksed64.c: Ditto.
* gcc.target/riscv/zksh32.c: Ditto.
* gcc.target/riscv/zksh64.c: Ditto.

OK
jeff


Re: RFC: Introduce -fhardened to enable security-related flags

2023-09-17 Thread Hans-Peter Nilsson via Gcc-patches
> From: Sam James 
> Date: Sun, 17 Sep 2023 05:00:37 +0100

> Hans-Peter Nilsson via Gcc-patches  writes:
> 
> >> Date: Tue, 29 Aug 2023 15:42:27 -0400
> >> From: Marek Polacek via Gcc-patches 
> >
> >> Surely, there must be no ABI impact, the option cannot cause
> >> severe performance issues,
> >
> >> Currently, -fhardened enables:
> > ...
> >>   -ftrivial-auto-var-init=zero
> >
> >> Thoughts?
> >
> > Regarding -ftrivial-auto-var-init=zero, I was consulted when
> > colleagues investigating a performance regression
> > pin-pointed it as *causing severe performance issues*;
> > cf. https://github.com/systemd/systemd.git commit 1a4e392760
> > (TL;DR: adds "-ftrivial-auto-var-init=zero" to the systemd
> > build).
> >
> > The situation was described as "we noticed that some test
> > suites takes 35% percent longer time to finish.  After
> > further investigation it was noticed that running systemctl
> > unmask x takes around 5s more time on [version including
> > patch vs. before that patch]" (timing out some tests).
> > Reverting that patch fixed the drop in performance.
> 
> Did some bug ever get filed for this to see if we can do a bit
> better here?

Not that I know of; neither for systemd nor gcc.

> Some slowdown doesn't mean it's of the expected magnitude.

Can you please rephrase that?

> > Just a data point, but I believe also exactly your intended
> > use.  IMO including -ftrivial-auto-var-init is worth extra
> > consideration.
> >
> > Alternatively, strike the whole "cannot cause severe
> > performance issues".
> >
> > brgds, H-P
> 
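
To make the cost model concrete, here is a small sketch of the kind of code where -ftrivial-auto-var-init=zero can add work. It is not taken from systemd; short_name_length is a hypothetical function, and whether the extra zeroing is actually emitted depends on what the compiler can prove about the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the length of `name`, truncated to at most 15 characters.
std::size_t short_name_length(const char *name) {
  // Only the first 16 bytes are ever used.  With
  // -ftrivial-auto-var-init=zero the whole 4096-byte buffer may be
  // zeroed on every call unless the compiler can show that is redundant.
  char buf[4096];
  std::strncpy(buf, name, 15);
  buf[15] = '\0';
  return std::strlen(buf);
}

// Small self-check of the helper.
void check_short_name() {
  assert(short_name_length("abc") == 3);
}
```

A function like this called in a hot loop pays for the full buffer each time, which is one plausible way a slowdown of the reported magnitude could arise.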


Re: [PATCH] tree-optimization/111294 - backwards threader PHI costing

2023-09-17 Thread Jeff Law via Gcc-patches




On 9/14/23 07:23, Richard Biener via Gcc-patches wrote:

This revives an earlier patch since the problematic code applying
extra costs to PHIs in copied blocks we couldn't make any sense of
prevents a required threading in this case.  Instead of coming up
with an artificial other costing the following simply removes the
bits.

As with all threading changes this requires a plethora of testsuite
adjustments, but only the last three are unfortunate as is the
libgomp team.c adjustment which is required to avoid a bogus -Werror
diagnostic during bootstrap.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any objections?

Thanks,
Richard.

PR tree-optimization/111294
gcc/
* tree-ssa-threadbackward.cc (back_threader_profitability::m_name):
Remove
(back_threader::find_paths_to_names): Adjust.
(back_threader::maybe_thread_block): Likewise.
(back_threader_profitability::possibly_profitable_path_p): Remove
code applying extra costs to copies PHIs.

libgomp/
* team.c (gomp_team_start): Guard gomp_alloca to avoid false
positive alloc-size diagnostic.

gcc/testsuite/
* gcc.dg/tree-ssa/pr111294.c: New test.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/tree-ssa/pr61839_2.c: Likewise.
* gcc.dg/tree-ssa/ssa-sink-18.c: Likewise.
* g++.dg/warn/Wstringop-overflow-4.C: XFAIL subtest on ilp32.
* gcc.dg/uninit-pred-9_b.c: XFAIL subtest everywhere.
* gcc.dg/vect/vect-117.c: Make scan for not Invalid sum
conditional on lp64.
No objections. IIRC this was all to deal with a code explosion problem 
seen on sparc.  As long as tree-ssa/pr69196-1.c hasn't gone crazy we're OK.


jeff


Re: [PATCH 2/2 v3] Ada: Finalization of constrained subtypes of unconstrained synchronized private extensions

2023-09-17 Thread Richard Wai
Hi Gary,

Thanks for finding that! I have made the recommended change and attached the 
revised patch, which is also rebased on trunk.

Additionally, I have added the “Signed-off-by” tag for legal compliance to the 
patch, as well as the change log entry as follows:

--  Begin change log entry –
 
ada: TSS finalize address subprogram generation for constrained subtypes of 
unconstrained synchronized private extensions should take care to designate the 
corresponding record of the underlying concurrent type.
 
When generating TSS finalize address subprograms for class-wide types of 
constrained root types, it follows the parent chain looking for the first 
“non-constrained” type. It is possible that such a type is a private extension 
with the “synchronized” keyword, in which case the underlying type is a 
concurrent type. When that happens, the designated type of the finalize address 
subprogram should be the corresponding record’s class-wide-type.
 
gcc/ada/
* exp_ch3(Expand_Freeze_Class_Wide_Type): Expanded comments 
explaining why TSS Finalize_Address is not generated for concurrent class-wide 
types.
* exp_ch7(Make_Finalize_Address_Stmts): Handle cases where the 
underlying non-constrained parent type is a concurrent type, and adjust the 
designated type to be the corresponding record’s class-wide type.

Signed-off-by: Richard Wai <rich...@annexi-strayline.com>

--  End change log entry –


See you at the next meeting!

Cheers,

Richard



> On Sep 15, 2023, at 12:38, Gary Dismukes  wrote:
> 
> Richard,
> 
> As a follow-on to my earlier message, additional testing has uncovered an 
> issue
> with your patch.  When run against a compiler built with assertions enabled,
> the test of "Present (Corresponding_Record_Type (Parent_Utyp))" can fail.
> An additional guard is needed prior to that test, as follows:
> 
>if Ekind (Parent_Utyp) in Concurrent_Kind
>  and then Present (Corresponding_Record_Type (Parent_Utyp))
>then
>   Parent_Utyp := Corresponding_Record_Type (Parent_Utyp);
>end if;
> 
> -- Gary
> 



[PATCH 1/2 v3] Ada: Synchronized private extensions are always limited

2023-09-17 Thread Richard Wai
Hi Arno,

I have added the required “Signed-off-by” tag to the patch and to the change 
log entry below. I believe for all other aspects I have followed the 
instructions. For getting the patch applied it states "If you do not have write 
access and a patch of yours has been approved, but not committed, please advise 
the approver of that fact.” So I think I have done that correctly. However let 
me know if there is someone else not included in the CC that should be handling 
that.

Of course, I’d love to work towards one day getting write access myself, but 
something tells me that’s a bit of a process.

--  Begin change log entry --
 
ada: Private extensions with the keyword “synchronized” are always limited.
 
GNAT was relying on synchronized private type extensions deriving from a 
concurrent interface to determine its limitedness. This does not cover the case 
where such an extension derives a limited interface. RM-7.6(6/2) makes it clear 
that “synchronized” in a private extension implies the derived type is limited. 
GNAT should explicitly check for the presence of “synchronized” in a private 
extension declaration, and it should have the same effect as the presence of 
“limited”.
 
gcc/ada/
* sem_ch3.adb (Build_Derived_Record_Type): Treat presence of 
keyword “synchronized” the same as “limited” when determining if a private 
extension is limited.

Signed-off-by: Richard Wai <rich...@annexi-strayline.com>

-- End change log entry --

Thanks,

Richard



> On Sep 13, 2023, at 03:54, Arnaud Charlet  wrote:
> 
>> No worries, and sorry for the trouble. I’m going to try using a different 
>> client for the gcc mailing list, it doesn’t seem to like Outlook. Thanks for 
>> catching that mistake!
>> 
>> Please advise how I can get this patch actually applied, given my lack of 
>> commit privilege.
> 
> You first need to follow instructions from https://gcc.gnu.org/contribute.html
> and in particular meet the legal requirements.
> 
> Then get someone with write approval to commit the change.
> 
> Arno



[PATCH] c++: non-dependent assignment checking [PR63198, PR18474]

2023-09-17 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Patch generated with -w to avoid noisy whitespace changes.

-- >8 --

This patch makes us recognize and check non-dependent simple assignments
ahead of time, like we already do for compound assignments.  This means
the templated representation of such assignments will now usually have
an implicit INDIRECT_REF (due to the reference return type), which the
-Wparentheses code needs to handle.  As a drive-by improvement, this
patch also makes maybe_convert_cond issue -Wparentheses warnings ahead
of time.

This revealed some libstdc++ tests were attempting to modify a data
member from an uninstantiated const member function; naively fixed by
making the data member mutable.
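
A minimal sketch of the two situations described above, with made-up names (counter, wrapper, assign_and_test) rather than the actual testsuite code. It shows the fixed forms: a mutable member that a const member function of a template may assign to, and an assignment used as a condition written with extra parentheses so that -Wparentheses stays quiet.

```cpp
#include <cassert>

struct counter {
  mutable int called = 0;   // `mutable` allows assignment via a const object
};

template <class T>
struct wrapper {
  counter c;
  // The assignment does not depend on T, so a compiler that checks
  // non-dependent assignments at definition time would reject it here,
  // even if touch() were never instantiated, were `called` not mutable.
  void touch() const { c.called = 1; }
};

template <class T>
bool assign_and_test(int &dst, int src) {
  // Non-dependent assignment used as a truth value; the extra
  // parentheses mark it as intentional.
  if ((dst = src))
    return true;
  return false;
}

// Small self-check of the templates.
void check_templates() {
  wrapper<int> w;
  w.touch();
  assert(w.c.called == 1);
}
```

Dropping `mutable`, or the inner parentheses, would be expected to produce an error or a warning respectively at the template definition once such checking is in place.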

PR c++/63198
PR c++/18474

gcc/cp/ChangeLog:

* semantics.cc (maybe_convert_cond): Look through implicit
INDIRECT_REF when deciding whether to issue a -Wparentheses
warning, and consider templated assignment expressions as well.
(finish_parenthesized_expr): Look through implicit INDIRECT_REF
when suppressing -Wparentheses warning.
* typeck.cc (build_x_modify_expr): Check simple assignments
ahead of time too, not just compound assignments.  Give the second
operand of MODOP_EXPR a non-null type so that it's not considered
always instantiation-dependent.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/static_assert15.C: Expect diagnostic for
non-constant static_assert condition.
* g++.dg/expr/unary2.C: Remove xfails.
* g++.dg/template/init7.C: Make initializer type-dependent to
preserve intent of test.
* g++.dg/template/recurse3.C: Likewise for the erroneous
statement.
* g++.dg/template/non-dependent26.C: New test.
* g++.dg/warn/Wparentheses-32.C: New test.

libstdc++/ChangeLog:

* testsuite/26_numerics/random/discard_block_engine/cons/seed_seq2.cc:
Make seed_seq::called member mutable.
* 
testsuite/26_numerics/random/independent_bits_engine/cons/seed_seq2.cc:
Likewise.
* 
testsuite/26_numerics/random/linear_congruential_engine/cons/seed_seq2.cc
Likewise.
* 
testsuite/26_numerics/random/mersenne_twister_engine/cons/seed_seq2.cc:
Likewise.
* testsuite/26_numerics/random/shuffle_order_engine/cons/seed_seq2.cc:
Likewise.
* 
testsuite/26_numerics/random/subtract_with_carry_engine/cons/seed_seq2.cc:
Likewise.
* 
testsuite/ext/random/simd_fast_mersenne_twister_engine/cons/seed_seq2.cc
Likewise.
---
 gcc/cp/semantics.cc   | 17 +++
 gcc/cp/typeck.cc  | 23 +++
 gcc/testsuite/g++.dg/cpp0x/static_assert15.C  |  2 +-
 gcc/testsuite/g++.dg/expr/unary2.C|  8 ++
 gcc/testsuite/g++.dg/template/init7.C |  2 +-
 .../g++.dg/template/non-dependent26.C | 25 +
 gcc/testsuite/g++.dg/template/recurse3.C  |  8 +++---
 gcc/testsuite/g++.dg/warn/Wparentheses-32.C   | 28 +++
 .../discard_block_engine/cons/seed_seq2.cc|  2 +-
 .../independent_bits_engine/cons/seed_seq2.cc |  2 +-
 .../cons/seed_seq2.cc |  2 +-
 .../mersenne_twister_engine/cons/seed_seq2.cc |  2 +-
 .../shuffle_order_engine/cons/seed_seq2.cc|  2 +-
 .../cons/seed_seq2.cc |  2 +-
 .../cons/seed_seq2.cc |  2 +-
 15 files changed, 91 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent26.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wparentheses-32.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0f7f4e87ae4..b57c1ac868b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -881,13 +881,17 @@ maybe_convert_cond (tree cond)
   /* Do the conversion.  */
   cond = convert_from_reference (cond);
 
-  if ((TREE_CODE (cond) == MODIFY_EXPR || is_assignment_op_expr_p (cond))
+  tree inner = REFERENCE_REF_P (cond) ? TREE_OPERAND (cond, 0) : cond;
+  if ((TREE_CODE (inner) == MODIFY_EXPR
+   || (TREE_CODE (inner) == MODOP_EXPR
+  && TREE_CODE (TREE_OPERAND (inner, 1)) == NOP_EXPR)
+   || is_assignment_op_expr_p (inner))
   && warn_parentheses
-  && !warning_suppressed_p (cond, OPT_Wparentheses)
-  && warning_at (cp_expr_loc_or_input_loc (cond),
+  && !warning_suppressed_p (inner, OPT_Wparentheses)
+  && warning_at (cp_expr_loc_or_input_loc (inner),
 OPT_Wparentheses, "suggest parentheses around "
   "assignment used as truth value"))
-suppress_warning (cond, OPT_Wparentheses);
+suppress_warning (inner, OPT_Wparentheses);
 
   return condition_conversion (cond);
 }
@@ -2155,8 +2159,11 @@ cp_expr
 finish_parenthesized_expr (cp_expr expr)
 {
   if (EXPR_P (expr))
+{
+  tree inner = REFERENCE_REF_P (e

[PATCH] c++: optimize tsubst_template_decl for function templates

2023-09-17 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

r14-2655-g92d1425ca78040 made instantiate_template avoid redundantly
performing a specialization lookup when instantiating a function or
alias template.  This patch applies the same optimization to
tsubst_template_decl when (partially) instantiating a function template,
which allows us to remove a check from register_specialization since
tsubst_function_decl no longer calls register_specialization for
a function template partial instantiation.
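
For context, a partial instantiation of a function template typically arises with member templates of class templates. The sketch below uses hypothetical names (A, size_sum) and only illustrates the language-level situation, not the compiler internals touched by the patch.

```cpp
#include <cassert>

template <class T>
struct A {
  // Once A<char> is instantiated, T is fixed to char while U is still a
  // template parameter; the member template is then only partially
  // instantiated.
  template <class U>
  static int size_sum(U) {
    return static_cast<int>(sizeof(T) + sizeof(U));
  }
};

// The call supplies U as well, completing the instantiation.
int use_partial_instantiation() {
  return A<char>::size_sum('x');
}

// Small self-check.
void check_partial() {
  assert(use_partial_instantiation() == 2);
}
```

Here A<char>::size_sum exists as a template in its own right before any call names a U, which is the intermediate state the commit message refers to.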

gcc/cp/ChangeLog:

* pt.cc (register_specialization): Remove now-unnecessary
early exit for FUNCTION_DECL partial instantiation.
(tsubst_template_decl): Pass use_spec_table=false to
tsubst_function_decl.  Set DECL_TI_ARGS of a non-lambda
FUNCTION_DECL specialization to the full set of arguments.
Simplify register_specialization call accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Expect two instead of three
duplicate diagnostics for A::bar() specialization.
---
 gcc/cp/pt.cc  | 29 +++
 gcc/testsuite/g++.dg/template/nontype12.C |  1 -
 2 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c311a6b88f5..a0296a1ea16 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1507,21 +1507,6 @@ register_specialization (tree spec, tree tmpl, tree 
args, bool is_friend,
  || (TREE_CODE (tmpl) == FIELD_DECL
  && TREE_CODE (spec) == NONTYPE_ARGUMENT_PACK));
 
-  if (TREE_CODE (spec) == FUNCTION_DECL
-  && uses_template_parms (DECL_TI_ARGS (spec)))
-/* This is the FUNCTION_DECL for a partial instantiation.  Don't
-   register it; we want the corresponding TEMPLATE_DECL instead.
-   We use `uses_template_parms (DECL_TI_ARGS (spec))' rather than
-   the more obvious `uses_template_parms (spec)' to avoid problems
-   with default function arguments.  In particular, given
-   something like this:
-
- template  void f(T t1, T t = T())
-
-   the default argument expression is not substituted for in an
-   instantiation unless and until it is actually needed.  */
-return spec;
-
   spec_entry elt;
   elt.tmpl = tmpl;
   elt.args = args;
@@ -14663,7 +14648,7 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t 
complain,
   tree in_decl = t;
   tree spec;
   tree tmpl_args;
-  tree full_args;
+  tree full_args = NULL_TREE;
   tree r;
   hashval_t hash = 0;
 
@@ -14754,7 +14739,8 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t 
complain,
   tree inner = decl;
   ++processing_template_decl;
   if (TREE_CODE (inner) == FUNCTION_DECL)
-inner = tsubst_function_decl (inner, args, complain, lambda_fntype);
+inner = tsubst_function_decl (inner, args, complain, lambda_fntype,
+ /*use_spec_table=*/false);
   else
 {
   if (TREE_CODE (inner) == TYPE_DECL && !TYPE_DECL_ALIAS_P (inner))
@@ -14792,6 +14778,11 @@ tsubst_template_decl (tree t, tree args, 
tsubst_flags_t complain,
 }
   else
 {
+  if (TREE_CODE (inner) == FUNCTION_DECL)
+   /* Set DECL_TI_ARGS to the full set of template arguments, which
+  tsubst_function_decl didn't do due to use_spec_table=false.  */
+   DECL_TI_ARGS (inner) = full_args;
+
   DECL_TI_TEMPLATE (inner) = r;
   DECL_TI_ARGS (r) = DECL_TI_ARGS (inner);
 }
@@ -14822,9 +14813,7 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t 
complain,
 
   if (TREE_CODE (decl) == FUNCTION_DECL && !lambda_fntype)
 /* Record this non-type partial instantiation.  */
-register_specialization (r, t,
-DECL_TI_ARGS (DECL_TEMPLATE_RESULT (r)),
-false, hash);
+register_specialization (r, t, full_args, false, hash);
 
   return r;
 }
diff --git a/gcc/testsuite/g++.dg/template/nontype12.C 
b/gcc/testsuite/g++.dg/template/nontype12.C
index 9a9c3ac1e66..e36a9f16f94 100644
--- a/gcc/testsuite/g++.dg/template/nontype12.C
+++ b/gcc/testsuite/g++.dg/template/nontype12.C
@@ -5,7 +5,6 @@ template struct A
 {
   template int foo();// { dg-error "double" "" { 
target c++17_down } }
   template class> int bar();// { dg-bogus 
{double[^\n]*\n[^\n]*C:7:[^\n]*double} "" { xfail c++17_down } }
-  // { dg-error "double" "" { target c++17_down } .-1 }
   template struct X; // { dg-error "double" "" { 
target c++17_down } }
 };
 
-- 
2.42.0.216.gbda494f404



[PATCH] Trivial typo fix in variadic

2023-09-17 Thread Marc Poulhiès via Gcc-patches
Fix all occurrences of varadic, except for Rust (will be part of another change).

gcc/ChangeLog:

* config/nvptx/nvptx.h (struct machine_function): Fix typo in variadic.
* config/nvptx/nvptx.cc (nvptx_function_arg_advance): Adjust to use 
fixed name.
(nvptx_declare_function_name): Likewise.
(nvptx_call_args): Likewise.
(nvptx_expand_call): Likewise.

gcc/cp/ChangeLog:

* lambda.cc (compare_lambda_sig): Fix typo in variadic.

libcpp/ChangeLog:

* macro.cc (parse_params): Fix typo in variadic.
(create_iso_definition): Likewise.

Signed-off-by: Marc Poulhiès 
---

Hi,

I came across this trivial typo and fixed it.

The compiler still builds correctly.
I've bootstrapped x86_64-linux.
As I don't really know how to set up nvptx correctly (and not sure
this trivial fix warrants learning the full setup...), I've simply
built the compiler for nvptx-none.

Ok for master?

 gcc/config/nvptx/nvptx.cc | 14 +++---
 gcc/config/nvptx/nvptx.h  |  4 ++--
 gcc/cp/lambda.cc  |  2 +-
 libcpp/macro.cc   | 20 ++--
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index edef39fb5e1..0de42408841 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -720,7 +720,7 @@ nvptx_function_arg_advance (cumulative_args_t cum_v, const 
function_arg_info &)
 
 /* Implement TARGET_FUNCTION_ARG_BOUNDARY.
 
-   For nvptx This is only used for varadic args.  The type has already
+   For nvptx This is only used for variadic args.  The type has already
been promoted and/or converted to invisible reference.  */
 
 static unsigned
@@ -1548,7 +1548,7 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)
   if (!TARGET_SOFT_STACK)
 {
   /* Declare a local var for outgoing varargs.  */
-  if (cfun->machine->has_varadic)
+  if (cfun->machine->has_variadic)
init_frame (file, STACK_POINTER_REGNUM,
UNITS_PER_WORD, crtl->outgoing_args_size);
 
@@ -1558,7 +1558,7 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)
init_frame (file, FRAME_POINTER_REGNUM, alignment,
ROUND_UP (sz, GET_MODE_SIZE (DImode)));
 }
-  else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca
+  else if (need_frameptr || cfun->machine->has_variadic || cfun->calls_alloca
   || (cfun->machine->has_simtreg && !crtl->is_leaf))
 init_softstack_frame (file, alignment, sz);
 
@@ -1795,13 +1795,13 @@ nvptx_call_args (rtx arg, tree fntype)
   if (!cfun->machine->doing_call)
 {
   cfun->machine->doing_call = true;
-  cfun->machine->is_varadic = false;
+  cfun->machine->is_variadic = false;
   cfun->machine->num_args = 0;
 
   if (fntype && stdarg_p (fntype))
{
- cfun->machine->is_varadic = true;
- cfun->machine->has_varadic = true;
+ cfun->machine->is_variadic = true;
+ cfun->machine->has_variadic = true;
  cfun->machine->num_args++;
}
 }
@@ -1871,7 +1871,7 @@ nvptx_expand_call (rtx retval, rtx address)
 }
 
   unsigned nargs = cfun->machine->num_args;
-  if (cfun->machine->is_varadic)
+  if (cfun->machine->is_variadic)
 {
   varargs = gen_reg_rtx (Pmode);
   emit_move_insn (varargs, stack_pointer_rtx);
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 129427e5654..666021283c2 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -209,8 +209,8 @@ struct GTY(()) machine_function
 {
   rtx_expr_list *call_args;  /* Arg list for the current call.  */
   bool doing_call; /* Within a CALL_ARGS ... CALL_ARGS_END sequence.  */
-  bool is_varadic;  /* This call is varadic  */
-  bool has_varadic;  /* Current function has a varadic call.  */
+  bool is_variadic;  /* This call is variadic  */
+  bool has_variadic;  /* Current function has a variadic call.  */
   bool has_chain; /* Current function has outgoing static chain.  */
   bool has_softstack; /* Current function has a soft stack frame.  */
   bool has_simtreg; /* Current function has an OpenMP SIMD region.  */
diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index a359bc6ee8d..34d0190a89b 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1619,7 +1619,7 @@ compare_lambda_sig (tree fn_a, tree fn_b)
 {
   if (!args_a || !args_b)
return false;
-  // This check also deals with differing varadicness
+  // This check also deals with differing variadicness
   if (!same_type_p (TREE_VALUE (args_a), TREE_VALUE (args_b)))
return false;
 }
diff --git a/libcpp/macro.cc b/libcpp/macro.cc
index dada8fea835..4f229c1501c 100644
--- a/libcpp/macro.cc
+++ b/libcpp/macro.cc
@@ -3431,7 +3431,7 @@ _cpp_unsave_parameters (cpp_reader *pfile, unsigned n)
 */
 
 static bool
-parse_params (cpp_reader *pfile, unsigned *n_ptr, bool *varadic_ptr)

Re: gcc-patches From rewriting mailman settings (Was: [Linaro-TCWG-CI] gcc patch #75674: FAIL: 68 regressions)

2023-09-17 Thread Mark Wielaard
Hi all,

On Tue, Sep 12, 2023 at 05:00:07PM +0200, Mark Wielaard wrote:
> Adding Jeff to CC who is the official gcc-patches mailinglist admin.
> [...] 
> Yes, it is expected for emails that come from domains with a dmarc
> policy. That is because the current settings of the gcc-patches
> mailinglist might slightly alter the message or headers in a way that
> invalidates the DKIM signature. Without From rewriting those messages
> would be bounced by recipients that check the dmarc policy/dkim
> signature.
> 
> As you noticed the glibc hackers have recently worked together with the
> sourceware overseers to upgrade mailman and alter the postfix and the
> libc-alpha mailinglist setting so it doesn't require From rewriting
> anymore (the message and header aren't altered anymore to invalidate
> the DKIM signatures).
> 
> We (Jeff or anyone else with mailman admin privs) could use the same
> settings for gcc-patches. The settings that need to be set are in that
> bug:
> 
> - subject_prefix (general): (empty)
> - from_is_list (general): No
> - anonymous_list (general): No
> - first_strip_reply_to (general): No
> - reply_goes_to_list (general): Poster
> - reply_to_address (general): (empty)
> - include_sender_header (general): No
> - drop_cc (general): No
> - msg_header (nondigest): (empty)
> - msg_footer (nondigest): (empty)
> - scrub_nondigest (nondigest): No
> - dmarc_moderation_action (privacy): Accept
> - filter_content (contentfilter): No
> 
> The only visible change (apart from no more From rewriting) is that
> HTML multi-parts aren't scrubbed anymore (that would be a message
> altering issue). The html part is still scrubbed from the
> inbox.sourceware.org archive, so b4 works just fine. But I don't know
> what patchwork.sourceware.org does with HTML attachments. Of course
> people really shouldn't send HTML attachments to gcc-patches, so maybe
> this is no real problem.

Although there were some positive responses (on list and on irc) it is
sometimes hard to know if there really is consensus for these kinds of
infrastructure tweaks. But I believe there is at least no sustained
opposition to changing the gcc-patches mailman setting as proposed
above.

So unless someone objects I would like to make this change Tuesday September
19 around 08:00 UTC.

And if there are no complaints at Cauldron we could do the same for
the other patch lists the week after.

Cheers,

Mark

> > [1] https://patchwork.sourceware.org/project/gcc/list/
> > [2] https://sourceware.org/bugzilla/show_bug.cgi?id=29713


[PATCH][_Hashtable] Avoid redundant usage of rehash policy

2023-09-17 Thread François Dumont via Gcc-patches

libstdc++: [_Hashtable] Avoid redundant usage of rehash policy

Bypass usage of __detail::__distance_fwd and check for need to rehash 
when assigning an initializer_list to

an unordered_multimap or unordered_multiset.

libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h
    (_Insert_base<>::_M_insert_range(_InputIte, _InputIte, _NodeGen&)): 
New.
    (_Insert_base<>::_M_insert_range(_InputIte, _InputIte, true_type)): 
Use latter.
    (_Insert_base<>::_M_insert_range(_InputIte, _InputIte, 
false_type)): Likewise.

    * include/bits/hashtable.h
(_Hashtable<>::operator=(initializer_list)): Likewise.
    (_Hashtable<>::_Hashtable(_InputIte, _InputIte, size_type, const 
_Hash&, const _Equal&,

    const allocator_type&, false_type)): Likewise.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 4c12dc895b2..c544094847d 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -614,7 +614,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	if (_M_bucket_count < __l_bkt_count)
 	  rehash(__l_bkt_count);
 
-	this->_M_insert_range(__l.begin(), __l.end(), __roan, __unique_keys{});
+	this->_M_insert_range(__l.begin(), __l.end(), __roan);
 	return *this;
   }
 
@@ -1254,8 +1254,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  }
 
 	__alloc_node_gen_t __node_gen(*this);
-	for (; __f != __l; ++__f)
-	  _M_insert(*__f, __node_gen, __uks);
+	this->_M_insert_range(__f, __l, __node_gen);
   }
 
   template
 	void
 	_M_insert_range(_InputIterator __first, _InputIterator __last,
-			const _NodeGetter&, true_type __uks);
+			const _NodeGetter&);
 
-  template
+  template
 	void
 	_M_insert_range(_InputIterator __first, _InputIterator __last,
-			const _NodeGetter&, false_type __uks);
+			true_type __uks);
+
+  template
+	void
+	_M_insert_range(_InputIterator __first, _InputIterator __last,
+			false_type __uks);
 
 public:
   using iterator = _Node_iterator<_Value, __constant_iterators::value,
@@ -966,11 +971,7 @@ namespace __detail
   template
 	void
 	insert(_InputIterator __first, _InputIterator __last)
-	{
-	  __hashtable& __h = _M_conjure_hashtable();
-	  __node_gen_type __node_gen(__h);
-	  return _M_insert_range(__first, __last, __node_gen, __unique_keys{});
-	}
+	{ _M_insert_range(__first, __last, __unique_keys{}); }
 };
 
   template::
   _M_insert_range(_InputIterator __first, _InputIterator __last,
-		  const _NodeGetter& __node_gen, true_type __uks)
+		  const _NodeGetter& __node_gen)
   {
 	__hashtable& __h = _M_conjure_hashtable();
 	for (; __first != __last; ++__first)
-	  __h._M_insert(*__first, __node_gen, __uks);
+	  __h._M_insert(*__first, __node_gen, __unique_keys{});
   }
 
   template
-template
+template
+  void
+  _Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
+		   _Hash, _RangeHash, _Unused,
+		   _RehashPolicy, _Traits>::
+  _M_insert_range(_InputIterator __first, _InputIterator __last,
+		  true_type /* __uks */)
+  {
+	__hashtable& __h = _M_conjure_hashtable();
+	__node_gen_type __node_gen(__h);
+	_M_insert_range(__first, __last, __node_gen);
+  }
+
+  template
+template
   void
   _Insert_base<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		   _Hash, _RangeHash, _Unused,
 		   _RehashPolicy, _Traits>::
   _M_insert_range(_InputIterator __first, _InputIterator __last,
-		  const _NodeGetter& __node_gen, false_type __uks)
+		  false_type /* __uks */)
   {
 	using __rehash_type = typename __hashtable::__rehash_type;
 	using __rehash_state = typename __hashtable::__rehash_state;
@@ -1020,8 +1038,8 @@ namespace __detail
 	if (__do_rehash.first)
 	  __h._M_rehash(__do_rehash.second, __saved_state);
 
-	for (; __first != __last; ++__first)
-	  __h._M_insert(*__first, __node_gen, __uks);
+	__node_gen_type __node_gen(__h);
+	_M_insert_range(__first, __last, __node_gen);
   }
 
   /**


[PATCH] MATCH: Make zero_one_valued_p non-recusive fully

2023-09-17 Thread Andrew Pinski via Gcc-patches
So it turns out VN can't handle any kind of recursion in match patterns. In
this case we have `b = a & -1`; we try to match a as being zero_one_valued_p,
VN returns b as being the value of a, and we just go into an infinite loop at
this point.
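
To make the failure mode concrete, here is a toy model of it. This is not GCC code: `ToyVN`, `Def`, `valueize` and the budget parameter are all hypothetical names invented for the sketch, and the budget only exists so that the recursive form stops instead of looping forever. The equivalence `a -> b` stands in for VN returning b as the value of a.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy IR: every name is either an integer constant or `lhs & rhs`.
struct Def {
  bool is_const;
  long value;            // meaningful when is_const
  std::string lhs, rhs;  // meaningful when !is_const
};

struct ToyVN {
  std::map<std::string, Def> defs;            // definitions by name
  std::map<std::string, std::string> leader;  // value-numbering equivalences

  // Stand-in for the valueization hook: map a name to its equivalent leader.
  std::string valueize(const std::string& name) const {
    auto it = leader.find(name);
    return it == leader.end() ? name : it->second;
  }

  bool is_one(const std::string& name) const {
    auto it = defs.find(name);
    return it != defs.end() && it->second.is_const && it->second.value == 1;
  }

  // Recursive form: `x & y` matches if either operand matches recursively.
  // `budget` is only here so the toy terminates; `gave_up` reports that the
  // recursion would otherwise have continued.
  bool recursive_match(const std::string& name, int budget, bool& gave_up) const {
    if (budget == 0) { gave_up = true; return false; }
    auto it = defs.find(valueize(name));
    if (it == defs.end()) return false;
    const Def& d = it->second;
    if (d.is_const) return d.value == 0 || d.value == 1;
    return recursive_match(d.lhs, budget - 1, gave_up)
           || recursive_match(d.rhs, budget - 1, gave_up);
  }

  // Non-recursive form: `x & y` matches only if an operand is the constant 1.
  bool non_recursive_match(const std::string& name) const {
    auto it = defs.find(valueize(name));
    if (it == defs.end() || it->second.is_const) return false;
    return is_one(it->second.lhs) || is_one(it->second.rhs);
  }
};

// b = a & -1, c = a & 1, and the equivalence a -> b.
ToyVN make_pr_example() {
  ToyVN t;
  t.defs["m1"] = Def{true, -1, "", ""};
  t.defs["one"] = Def{true, 1, "", ""};
  t.defs["b"] = Def{false, 0, "a", "m1"};
  t.defs["c"] = Def{false, 0, "a", "one"};
  t.leader["a"] = "b";
  return t;
}
```

In this model, matching b recursively keeps bouncing between a and b until the budget runs out, while the non-recursive form looks at the operands of b once and stops.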

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note genmatch should warn (or error out) if this gets detected, so I filed
PR 111446, which I will be looking into next week or the week after so we
don't run into this issue again.

PR tree-optimization/111442

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Have the bit_and match not be
recursive.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr111442-1.c: New test.
---
 gcc/match.pd |  5 -
 gcc/testsuite/gcc.c-torture/compile/pr111442-1.c | 13 +
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111442-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 887665633d4..773c3810f51 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2183,8 +2183,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* (a&1) is always [0,1] too. This is useful again when
the range is not known. */
+/* Note this can't be recusive due to VN handling of equivalents,
+   VN and would cause an infinite recusion. */
 (match zero_one_valued_p
- (bit_and:c@0 @1 zero_one_valued_p))
+ (bit_and:c@0 @1 integer_onep)
+ (if (INTEGRAL_TYPE_P (type))))
 
 /* A conversion from an zero_one_valued_p is still a [0,1].
This is useful when the range of a variable is not known */
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111442-1.c b/gcc/testsuite/gcc.c-torture/compile/pr111442-1.c
new file mode 100644
index 000..5814ee938de
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr111442-1.c
@@ -0,0 +1,13 @@
+
+int *a, b;
+int main() {
+  int d = 1, e;
+  if (d)
+    e = a ? 0 % 0 : 0;
+  if (d)
+    a = &d;
+  d = -1;
+  b = d & e;
+  b = 2 * e ^ 1;
+  return 0;
+}
-- 
2.31.1



[PATCH] Remove xfail from gcc.dg/tree-ssa/20040204-1.c

2023-09-17 Thread Andrew Pinski via Gcc-patches
So the xfail was there because at one point having
logical-op-non-short-circuit set to 1 or 0 made a
difference in being able to optimize a conditional away.
This has not been true for over 10 years in this case, so
instead of continuing to add to the xfail list, removing it
is the right thing to do.

Committed as obvious after a test on x86_64-linux-gnu.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/20040204-1.c: Remove xfail.
---
 gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
index b9f8fd21ac9..aa9f68b8b42 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
@@ -29,8 +29,4 @@ void test55 (int x, int y)
 
 /* There should be not link_error calls, if there is any the
optimization has failed */
-/* ??? Ug.  This one may or may not fail based on how fold decides
-   that the && should be emitted (based on BRANCH_COST).  Fix this
-   by teaching dom to look through && and register all components
-   as true.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* aarch64*-*-* powerpc*-*-* cris-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* nds32*-*-* s390*-*-* sh*-*-* sparc*-*-* visium-*-* x86_64-*-* riscv*-*-* or1k*-*-* msp430-*-* pru*-*-* nvptx*-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
-- 
2.31.1



[r14-4046 Regression] FAIL: 23_containers/vector/bool/110807.cc -std=gnu++17 (test for excess errors) on Linux/x86_64

2023-09-17 Thread Hu, Lin1 via Gcc-patches


On Linux/x86_64,

3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a is the first bad commit
commit 3a0e01f6bb1d6ec444001f2caea6ef43a4a83e3a
Author: Jonathan Wakely 
Date:   Fri Sep 1 21:27:57 2023 +0100

libstdc++: Add support for running tests with multiple -std options

caused

FAIL: 23_containers/vector/bool/110807.cc  -std=gnu++17 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4046/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at lin1 dot hu at intel.com.) (If you run into cascadelake-related problems,
disabling AVX512F on the command line might help.) (However, please make sure
that there are no potential problems with AVX512.)


RE: [PATCH v1] RISC-V: Bugfix for scalar move with merged operand

2023-09-17 Thread Li, Pan2 via Gcc-patches
> I must be missing something.  Doesn't insn 10 broadcast the immediate 
> 0x2 to both elements of r142?!?  What am I missing?

Thanks Jeff for the comments.

Insn 10 is a VECTOR_SCALAR_MOV, aka vmv.s.x in the asm code, so it only
writes element 0 rather than broadcasting to both elements.
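
As a rough illustration of the distinction, the sketch below models the two behaviours on a two-element vector. It is a plain C++ simulation under stated assumptions, not RVV intrinsics or GCC internals: `V2DI`, `broadcast` and `scalar_move` are hypothetical names, and the `merge` argument stands in for the merged operand that supplies the elements a scalar move does not write.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2DI = std::array<std::int64_t, 2>;

// Broadcast (what vec_dup suggests when read literally): every element
// receives the scalar.
V2DI broadcast(std::int64_t x) {
  return V2DI{{x, x}};
}

// Scalar move (vmv.s.x-like): only element 0 is written. In this model the
// remaining elements are taken from `merge`; without a merged operand their
// contents cannot be relied upon.
V2DI scalar_move(const V2DI& merge, std::int64_t x) {
  V2DI result = merge;
  result[0] = x;
  return result;
}
```

With a merged operand of {7, 9}, `scalar_move` yields {2, 9} while `broadcast(2)` yields {2, 2}, which is the difference between what the dump appears to say and what the emitted instruction does.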

Pan

-----Original Message-----
From: Jeff Law  
Sent: Sunday, September 17, 2023 11:53 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix for scalar move with merged operand




On 9/17/23 01:42, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> Given below example for VLS mode
> 
> void
> test (vl_t *u)
> {
>vl_t t;
>long long *p = (long long *)&t;
> 
>p[0] = p[1] = 2;
> 
>*u = t;
> }
> 
> The vec_set will simplify the insn to vmv.s.x when index is 0, without
> merged operand. That will result in some problems in DCE, aka:
> 
> 1:  137[DI] = a0
> 2:  138[V2DI] = 134[V2DI]  // deleted by DCE
> 3:  139[DI] = #2   // deleted by DCE
> 4:  140[DI] = #2   // deleted by DCE
> 5:  141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE
> 6:  138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
> 7:  135[V2DI] = 138[V2DI]  // deleted by DCE
> 8:  142[V2DI] = 135[V2DI]  // deleted by DCE
> 9:  143[DI] = #2
> 10: 142[V2DI] = vec_dup:V2DI (143[DI])
> 11: (137[DI]) = 142[V2DI]
> 
> The higher 64 bits of 142[V2DI] is unknown here and it generated
> incorrect code when store back to memory. This patch would like to
> fix this issue by adding a new SCALAR_MOVE_MERGED_OP for vec_set.
I must be missing something.  Doesn't insn 10 broadcast the immediate 
0x2 to both elements of r142?!?  What am I missing?

Jeff


[PATCH V3] internal-fn: Support undefined rtx for uninitialized SSA_NAME

2023-09-17 Thread Juzhe-Zhong
According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751

As Richard and Richi suggested, we recognize an uninitialized SSA_NAME and
convert it into a SCRATCH rtx if the target predicate allows SCRATCH.

This helps reduce redundant data move instructions on targets like RISC-V.

Bootstrap and regression tests passed on x86.

gcc/ChangeLog:

* internal-fn.cc (expand_fn_using_insn): Support undefined rtx value.
* optabs.cc (maybe_legitimize_operand): Ditto.
(can_reuse_operands_p): Ditto.
* optabs.h (enum expand_operand_type): Ditto.
(create_undefined_input_operand): Ditto.

---
 gcc/internal-fn.cc |  4 
 gcc/optabs.cc  | 14 +-
 gcc/optabs.h   | 14 +-
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0fd34359247..61d5a9e4772 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
create_convert_operand_from (&ops[opno], rhs_rtx,
 TYPE_MODE (rhs_type),
 TYPE_UNSIGNED (rhs_type));
+  else if (TREE_CODE (rhs) == SSA_NAME
+  && SSA_NAME_IS_DEFAULT_DEF (rhs)
+  && VAR_P (SSA_NAME_VAR (rhs)))
+   create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
   else
create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
   opno += 1;
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..75b1d54ad7c 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -8102,6 +8102,17 @@ maybe_legitimize_operand (enum insn_code icode, unsigned int opno,
  goto input;
}
   break;
+
+case EXPAND_UNDEFINED_INPUT:
+  /* For SCRATCH rtx which is converted from uninitialized
+SSA, we convert it as fresh pseudo when target doesn't
+allow scratch rtx in predicate. Otherwise, return true.  */
+  if (!insn_operand_matches (icode, opno, op->value))
+   {
+ op->value = gen_reg_rtx (op->mode);
+ goto input;
+   }
+  return true;
 }
   return insn_operand_matches (icode, opno, op->value);
 }
@@ -8140,7 +8151,8 @@ can_reuse_operands_p (enum insn_code icode,
   switch (op1->type)
 {
 case EXPAND_OUTPUT:
-  /* Outputs must remain distinct.  */
+case EXPAND_UNDEFINED_INPUT:
+  /* Outputs and undefined intputs must remain distinct.  */
   return false;
 
 case EXPAND_FIXED:
diff --git a/gcc/optabs.h b/gcc/optabs.h
index c80b7f4dc1b..6faebf7cb63 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -37,7 +37,8 @@ enum expand_operand_type {
   EXPAND_CONVERT_TO,
   EXPAND_CONVERT_FROM,
   EXPAND_ADDRESS,
-  EXPAND_INTEGER
+  EXPAND_INTEGER,
+  EXPAND_UNDEFINED_INPUT
 };
 
 /* Information about an operand for instruction expansion.  */
@@ -117,6 +118,17 @@ create_input_operand (class expand_operand *op, rtx value,
   create_expand_operand (op, EXPAND_INPUT, value, mode, false);
 }
 
+/* Make OP describe an undefined input operand for uninitialized
+   SSA.  It's the scratch operand with mode MODE; MODE cannot be
+   VOIDmode.  */
+
+inline void
+create_undefined_input_operand (class expand_operand *op, machine_mode mode)
+{
+  create_expand_operand (op, EXPAND_UNDEFINED_INPUT, gen_rtx_SCRATCH (mode),
+mode, false);
+}
+
 /* Like create_input_operand, except that VALUE must first be converted
to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */
 
-- 
2.36.3



Re: [PATCH] c++: overeager type completion in convert_to_void [PR111419]

2023-09-17 Thread Jason Merrill via Gcc-patches

On 9/16/23 17:41, Patrick Palka wrote:

On Sat, 16 Sep 2023, Jason Merrill wrote:


On 9/15/23 12:03, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here convert_to_void always completes the type of an INDIRECT_REF or
VAR_DECL expression, but according to [expr.context] an lvalue-to-rvalue
conversion is applied to a discarded-value expression only if "the
expression is a glvalue of volatile-qualified type".  This patch restricts
convert_to_void's type completion accordingly.

PR c++/111419

gcc/cp/ChangeLog:

* cvt.cc (convert_to_void) : Only call
complete_type if the type is volatile and the INDIRECT_REF
isn't an implicit one.


Hmm, what does implicit have to do with it?  The expression forms listed in
https://eel.is/c++draft/expr.context#2 include "id-expression"...


When there's an implicit INDIRECT_REF, I reckoned the type of the
id-expression is really a reference type, which can't be cv-qualified?


A name can have reference type, but its use as an expression doesn't:
https://eel.is/c++draft/expr.type#1.sentence-1
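
A small compile-time check of that point, as a sketch: `Widget` is a hypothetical stand-in type, and `b` is only declared, never defined, since it is used solely in unevaluated operands. `decltype(b)` reports the declared (reference) type of the name, whereas `decltype((b))` reflects the expression, an lvalue of type `volatile Widget`.

```cpp
#include <type_traits>

struct Widget { int t; };

// Declaration only: b is named in unevaluated operands below, so no
// definition is needed.
extern volatile Widget& b;

// The name b is declared with reference type...
static_assert(std::is_same<decltype(b), volatile Widget&>::value,
              "declared type of b is a reference");

// ...but decltype((b)) is "T&" for an lvalue expression of type T, so
// stripping the reference recovers the type of the expression itself.
using expr_type = std::remove_reference<decltype((b))>::type;

static_assert(std::is_same<expr_type, volatile Widget>::value,
              "the expression b has the volatile-qualified object type");

// True when the expression b is an lvalue of volatile-qualified type.
constexpr bool expression_is_volatile_glvalue() {
  return std::is_lvalue_reference<decltype((b))>::value
         && std::is_volatile<expr_type>::value;
}
```

So the id-expression b is a glvalue of volatile-qualified type even though the variable it names has reference type.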


diff --git a/gcc/testsuite/g++.dg/expr/discarded1a.C
b/gcc/testsuite/g++.dg/expr/discarded1a.C
new file mode 100644
index 000..5516ff46fe9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/discarded1a.C
@@ -0,0 +1,16 @@
+// PR c++/111419
+
+struct Incomplete;
+
+template struct Holder { T t; }; // { dg-error "incomplete" }
+
+extern volatile Holder a;
+extern volatile Holder& b;
+extern volatile Holder* c;
+
+int main() {
+  a; // { dg-message "required from here" }
+  b; // { dg-warning "implicit dereference will not access object" }
+ // { dg-bogus "required from here" "" { target *-*-* } .-1 }


...so it seems to me this line should get the lvalue-rvalue conversion (and
not the warning about no access).


+  *c; // { dg-message "required from here" }
+}




Re: [PATCH] c++: constness of decltype of NTTP object [PR98820]

2023-09-17 Thread Jason Merrill via Gcc-patches

On 9/16/23 18:00, Patrick Palka wrote:

On Sat, 16 Sep 2023, Jason Merrill wrote:


On 9/15/23 13:55, Patrick Palka wrote:

This corrects decltype of a (class) NTTP object as per
[dcl.type.decltype]/1.2 and [temp.param]/6 in the type-dependent case.
In the non-dependent case (nontype-class8.C) we resolve the decltype
ahead of time, and finish_decltype_type already made sure to drop the
const VIEW_CONVERT_EXPR wrapper around the TEMPLATE_PARM_INDEX.


Hmm, seems like dropping the VIEW_CONVERT_EXPR is wrong in this case? I'm not
sure why I added that.


Ah sorry, my commit message was a bit sloppy.

In the non-dependent case we resolve the decltype ahead of time, in
which case finish_decltype_type drops the const VIEW_CONVERT_EXPR
wrapper around the TEMPLATE_PARM_INDEX, and the latter has the
desired non-const type.

In the type-dependent case, tsubst drops the VIEW_CONVERT_EXPR
because the substituted class NTTP is the already const object
created by get_template_parm_object.  So finish_decltype_type
at instantiation time sees the bare const object, which this patch
now adds special handling for.

So we need to continue dropping the VIEW_CONVERT_EXPR to handle the
non-dependent case.


Aha.  The patch is OK, then.

Jason



Re: [PATCH] c++: optimize tsubst_template_decl for function templates

2023-09-17 Thread Jason Merrill via Gcc-patches

On 9/17/23 15:13, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

r14-2655-g92d1425ca78040 made instantiate_template avoid redundantly
performing a specialization lookup when instantiating a function or
alias template.  This patch applies the same optimization to
tsubst_template_decl when (partially) instantiating a function template,
which allows us to remove a check from register_specialization since
tsubst_function_decl no longer calls register_specialization for
a function template partial instantiation.

gcc/cp/ChangeLog:

* pt.cc (register_specialization): Remove now-unnecessary
early exit for FUNCTION_DECL partial instantiation.
(tsubst_template_decl): Pass use_spec_table=false to
tsubst_function_decl.  Set DECL_TI_ARGS of a non-lambda
FUNCTION_DECL specialization to the full set of arguments.
Simplify register_specialization call accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Expect two instead of three
duplicate diagnostics for A::bar() specialization.
---
  gcc/cp/pt.cc  | 29 +++
  gcc/testsuite/g++.dg/template/nontype12.C |  1 -
  2 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index c311a6b88f5..a0296a1ea16 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1507,21 +1507,6 @@ register_specialization (tree spec, tree tmpl, tree args, bool is_friend,
  || (TREE_CODE (tmpl) == FIELD_DECL
  && TREE_CODE (spec) == NONTYPE_ARGUMENT_PACK));
  
-  if (TREE_CODE (spec) == FUNCTION_DECL

-  && uses_template_parms (DECL_TI_ARGS (spec)))
-/* This is the FUNCTION_DECL for a partial instantiation.  Don't
-   register it; we want the corresponding TEMPLATE_DECL instead.
-   We use `uses_template_parms (DECL_TI_ARGS (spec))' rather than
-   the more obvious `uses_template_parms (spec)' to avoid problems
-   with default function arguments.  In particular, given
-   something like this:
-
- template  void f(T t1, T t = T())
-
-   the default argument expression is not substituted for in an
-   instantiation unless and until it is actually needed.  */
-return spec;
-
spec_entry elt;
elt.tmpl = tmpl;
elt.args = args;
@@ -14663,7 +14648,7 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
tree in_decl = t;
tree spec;
tree tmpl_args;
-  tree full_args;
+  tree full_args = NULL_TREE;
tree r;
hashval_t hash = 0;
  
@@ -14754,7 +14739,8 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,

tree inner = decl;
++processing_template_decl;
if (TREE_CODE (inner) == FUNCTION_DECL)
-inner = tsubst_function_decl (inner, args, complain, lambda_fntype);
+inner = tsubst_function_decl (inner, args, complain, lambda_fntype,
+ /*use_spec_table=*/false);
else
  {
if (TREE_CODE (inner) == TYPE_DECL && !TYPE_DECL_ALIAS_P (inner))
@@ -14792,6 +14778,11 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
  }
else
  {
+  if (TREE_CODE (inner) == FUNCTION_DECL)
+   /* Set DECL_TI_ARGS to the full set of template arguments, which
+  tsubst_function_decl didn't do due to use_spec_table=false.  */
+   DECL_TI_ARGS (inner) = full_args;
+
DECL_TI_TEMPLATE (inner) = r;
DECL_TI_ARGS (r) = DECL_TI_ARGS (inner);
  }
@@ -14822,9 +14813,7 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t complain,
  
if (TREE_CODE (decl) == FUNCTION_DECL && !lambda_fntype)

  /* Record this non-type partial instantiation.  */
-register_specialization (r, t,
-DECL_TI_ARGS (DECL_TEMPLATE_RESULT (r)),
-false, hash);
+register_specialization (r, t, full_args, false, hash);
  
return r;

  }
diff --git a/gcc/testsuite/g++.dg/template/nontype12.C b/gcc/testsuite/g++.dg/template/nontype12.C
index 9a9c3ac1e66..e36a9f16f94 100644
--- a/gcc/testsuite/g++.dg/template/nontype12.C
+++ b/gcc/testsuite/g++.dg/template/nontype12.C
@@ -5,7 +5,6 @@ template struct A
  {
template int foo();// { dg-error "double" "" { 
target c++17_down } }
template class> int bar();// { dg-bogus 
{double[^\n]*\n[^\n]*C:7:[^\n]*double} "" { xfail c++17_down } }
-  // { dg-error "double" "" { target c++17_down } .-1 }


Hmm, I thought this line was to check that we get one error even if we 
don't want two?


Jason



[Committed] RISC-V: Remove redundant codes of VLS patterns[NFC]

2023-09-17 Thread Juzhe-Zhong
These VLS patterns are the same as the corresponding VLA patterns.
So extend VI -> V_VLSI and VF -> V_VLSF in the VLA patterns,
then remove the now-redundant VLS patterns.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (3): Deleted.
(copysign3): Ditto.
(xorsign3): Ditto.
(2): Ditto.
* config/riscv/autovec.md: Extend VLS modes.

---
 gcc/config/riscv/autovec-vls.md | 138 
 gcc/config/riscv/autovec.md |  44 +-
 2 files changed, 22 insertions(+), 160 deletions(-)

diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
index d4ed2081537..3488f452e5d 100644
--- a/gcc/config/riscv/autovec-vls.md
+++ b/gcc/config/riscv/autovec-vls.md
@@ -194,141 +194,3 @@
   }
   [(set_attr "type" "vector")]
 )
-
-;; -
-;;  [INT] Binary operations
-;; -
-;; Includes:
-;; - vadd.vv/vsub.vv/...
-;; - vadd.vi/vsub.vi/...
-;; -
-
-(define_insn_and_split "3"
-  [(set (match_operand:VLSI 0 "register_operand")
-(any_int_binop_no_shift:VLSI
- (match_operand:VLSI 1 "")
- (match_operand:VLSI 2 "")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  riscv_vector::emit_vlmax_insn (code_for_pred (, mode),
- riscv_vector::BINARY_OP, operands);
-  DONE;
-}
-[(set_attr "type" "vector")]
-)
-
-;; -
-;;  [FP] Binary operations
-;; -
-;; Includes:
-;; - vfadd.vv/vfsub.vv/vfmul.vv/vfdiv.vv
-;; - vfadd.vf/vfsub.vf/vfmul.vf/vfdiv.vf
-;; -
-(define_insn_and_split "3"
-  [(set (match_operand:VLSF 0 "register_operand")
-(any_float_binop:VLSF
- (match_operand:VLSF 1 "")
- (match_operand:VLSF 2 "")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  riscv_vector::emit_vlmax_insn (code_for_pred (, mode),
-riscv_vector::BINARY_OP_FRM_DYN, operands);
-  DONE;
-}
-[(set_attr "type" "vector")]
-)
-
-;; -
-;; Includes:
-;; - vfmin.vv/vfmax.vv
-;; - vfmin.vf/vfmax.vf
-;; - fmax/fmaxf in math.h
-;; -
-(define_insn_and_split "3"
-  [(set (match_operand:VLSF 0 "register_operand")
-(any_float_binop_nofrm:VLSF
- (match_operand:VLSF 1 "")
- (match_operand:VLSF 2 "")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  riscv_vector::emit_vlmax_insn (code_for_pred (, mode),
-riscv_vector::BINARY_OP, operands);
-  DONE;
-}
-[(set_attr "type" "vector")]
-)
-
-;; -
-;; Includes:
-;; - vfsgnj.vv
-;; - vfsgnj.vf
-;; -
-(define_insn_and_split "copysign3"
-  [(set (match_operand:VLSF 0 "register_operand")
-(unspec:VLSF
-  [(match_operand:VLSF  1 "register_operand")
-   (match_operand:VLSF  2 "register_operand")] UNSPEC_VCOPYSIGN))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-  {
-riscv_vector::emit_vlmax_insn (code_for_pred (UNSPEC_VCOPYSIGN, mode),
-  riscv_vector::BINARY_OP, operands);
-DONE;
-  }
-  [(set_attr "type" "vector")]
-)
-
-;; -
-;; Includes:
-;; - vfsgnjx.vv
-;; - vfsgnjx.vf
-;; -
-(define_insn_and_split "xorsign3"
-  [(set (match_operand:VLSF 0 "register_operand")
-(unspec:VLSF
-  [(match_operand:VLSF  1 "register_operand")
-   (match_operand:VLSF  2 "register_operand")] UNSPEC_VXORSIGN))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-  {
-riscv_vector::emit_vlmax_insn (code_for_pred (UNSPEC_VXORSIGN, mode),
-  riscv_vector::BINARY_OP, operands);
-DONE;
-  }
-)
-
-;; ---
-;;  [INT] Unary operations
-;; ---
-;; Includes:
-;; - vneg.v/vnot.v
-;; ---
-
-(define_insn_and_split "2"
-  [(set (match_operand:VLSI 0 "register_operand")
-(any_int_unop:VLSI
- (match_operand:VLSI 1 "register_operand")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-

Re: [PATCH] c++: non-dependent assignment checking [PR63198, PR18474]

2023-09-17 Thread Jason Merrill via Gcc-patches

On 9/17/23 14:51, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Patch generated with -w to avoid noisy whitespace changes.

-- >8 --

This patch makes us recognize and check non-dependent simple assignments
ahead of time, like we already do for compound assignments.  This means
the templated representation of such assignments will now usually have
an implicit INDIRECT_REF (due to the reference return type), which the
-Wparentheses code needs to handle.  As a drive-by improvement, this
patch also makes maybe_convert_cond issue -Wparentheses warnings ahead
of time.

This revealed that some libstdc++ tests were attempting to modify a data
member from an uninstantiated const member function; naively fixed by
making the data member mutable.
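
For readers unfamiliar with the pattern, a minimal sketch of that kind of fix follows. `Probe` and `convert` are hypothetical names, not the actual seed_seq test type. Without `mutable`, the assignment inside the const member function template is invalid regardless of the template arguments, so a compiler that checks non-dependent assignments at definition time can reject it even if the template is never instantiated.

```cpp
#include <cassert>

struct Probe {
  // `mutable` permits modification through a const object or from within
  // a const member function.
  mutable bool called = false;

  // The assignment below does not depend on T, so it can be checked when
  // the template is defined rather than when it is instantiated.
  template<typename T>
  T convert() const {
    called = true;
    return T();
  }
};

// Calls the const member template on a const object and reports whether
// the flag was recorded.
bool probe_records_call() {
  const Probe p{};
  p.convert<int>();
  return p.called;
}
```

Dropping `mutable` from the sketch makes `called = true` an assignment to a member of a const object, which is the kind of error the tests ran into.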

PR c++/63198
PR c++/18474

gcc/cp/ChangeLog:

* semantics.cc (maybe_convert_cond): Look through implicit
INDIRECT_REF when deciding whether to issue a -Wparentheses
warning, and consider templated assignment expressions as well.
(finish_parenthesized_expr): Look through implicit INDIRECT_REF
when suppressing -Wparentheses warning.
* typeck.cc (build_x_modify_expr): Check simple assignments
ahead of time too, not just compound assignments.  Give the second
operand of MODOP_EXPR a non-null type so that it's not considered
always instantiation-dependent.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/static_assert15.C: Expect diagnostic for
non-constant static_assert condition.
* g++.dg/expr/unary2.C: Remove xfails.
* g++.dg/template/init7.C: Make initializer type-dependent to
preserve intent of test.
* g++.dg/template/recurse3.C: Likewise for the erroneous
statement.
* g++.dg/template/non-dependent26.C: New test.
* g++.dg/warn/Wparentheses-32.C: New test.

libstdc++/ChangeLog:

* testsuite/26_numerics/random/discard_block_engine/cons/seed_seq2.cc:
Make seed_seq::called member mutable.
* testsuite/26_numerics/random/independent_bits_engine/cons/seed_seq2.cc:
Likewise.
* testsuite/26_numerics/random/linear_congruential_engine/cons/seed_seq2.cc:
Likewise.
* testsuite/26_numerics/random/mersenne_twister_engine/cons/seed_seq2.cc:
Likewise.
* testsuite/26_numerics/random/shuffle_order_engine/cons/seed_seq2.cc:
Likewise.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/seed_seq2.cc:
Likewise.
* testsuite/ext/random/simd_fast_mersenne_twister_engine/cons/seed_seq2.cc:
Likewise.
---
  gcc/cp/semantics.cc   | 17 +++
  gcc/cp/typeck.cc  | 23 +++
  gcc/testsuite/g++.dg/cpp0x/static_assert15.C  |  2 +-
  gcc/testsuite/g++.dg/expr/unary2.C|  8 ++
  gcc/testsuite/g++.dg/template/init7.C |  2 +-
  .../g++.dg/template/non-dependent26.C | 25 +
  gcc/testsuite/g++.dg/template/recurse3.C  |  8 +++---
  gcc/testsuite/g++.dg/warn/Wparentheses-32.C   | 28 +++
  .../discard_block_engine/cons/seed_seq2.cc|  2 +-
  .../independent_bits_engine/cons/seed_seq2.cc |  2 +-
  .../cons/seed_seq2.cc |  2 +-
  .../mersenne_twister_engine/cons/seed_seq2.cc |  2 +-
  .../shuffle_order_engine/cons/seed_seq2.cc|  2 +-
  .../cons/seed_seq2.cc |  2 +-
  .../cons/seed_seq2.cc |  2 +-
  15 files changed, 91 insertions(+), 36 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent26.C
  create mode 100644 gcc/testsuite/g++.dg/warn/Wparentheses-32.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0f7f4e87ae4..b57c1ac868b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 459739d5866..74f5fced060 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -9739,15 +9739,15 @@ build_x_modify_expr (location_t loc, tree lhs, enum tree_code modifycode,
rhs = build_non_dependent_expr (rhs);
  }
  
-  if (modifycode != NOP_EXPR)

-{
-  tree op = build_nt (modifycode, NULL_TREE, NULL_TREE);
-  tree rval = build_new_op (loc, MODIFY_EXPR, LOOKUP_NORMAL,
+  tree rval;
+  if (modifycode == NOP_EXPR)
+rval = cp_build_modify_expr (loc, lhs, modifycode, rhs, complain);
+  else
+rval = build_new_op (loc, MODIFY_EXPR, LOOKUP_NORMAL,
 lhs, rhs, op, lookups, &overload, complain);
-  if (rval)
-   {
if (rval == error_mark_node)
-   return rval;
+return error_mark_node;
+  if (modifycode != NOP_EXPR)
  suppress_warning (rval /* What warning? */);


Did you try disabling this to see if it's still needed?

Jason



[PATCH v1] RISC-V: Support VLS mode for vec_set

2023-09-17 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds VLS mode support for vec_set; both INT
and FP are included.

Given the sample code below:

typedef long long vl_t \
  __attribute__((vector_size(2 * sizeof (long long))));

vl_t init_vl (vl_t v, unsigned index, unsigned value)
{
  v[index] = value;

  return v;
}

Before this patch:
init_vl:
  addi sp,sp,-16
  vsetivli zero,2,e64,m1,ta,ma
  vle64.v  v1,0(a1)
  vse64.v  v1,0(sp)
  slli a4,a2,32
  srli a2,a4,29
  add  a2,sp,a2
  slli a3,a3,32
  srli a3,a3,32
  sd   a3,0(a2)
  vle64.v  v1,0(sp)
  vse64.v  v1,0(a0)
  addi sp,sp,16
  jr   ra

After this patch:
init_vl:
  vsetivli    zero,2,e64,m1,ta,ma
  vle64.v     v1,0(a1)
  slli        a3,a3,32
  srli        a3,a3,32
  addi        a5,a2,1
  vsetvli     zero,a5,e64,m1,tu,ma
  vmv.v.x     v2,a3
  vslideup.vx v1,v2,a2
  vsetivli    zero,2,e64,m1,ta,ma
  vse64.v     v1,0(a0)
  ret
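
The new sequence sets element `index` by splatting the value, setting vl to index + 1 with the tail-undisturbed policy, and sliding the splat up by `index`. The sketch below simulates that in plain C++ as a reading aid. It is a model under stated assumptions, not RVV intrinsics: `V2DI`, `slideup_tu` and `vec_set_model` are hypothetical names, and the model follows the usual vslideup description (elements below the offset and at or beyond vl are left unchanged).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V2DI = std::array<std::int64_t, 2>;

// Model of vslideup.vx under a tail-undisturbed policy:
// dest[i] = src[i - offset] for offset <= i < vl; all other elements of
// dest keep their previous contents.
V2DI slideup_tu(V2DI dest, const V2DI& src, std::size_t offset, std::size_t vl) {
  for (std::size_t i = offset; i < vl && i < dest.size(); ++i)
    dest[i] = src[i - offset];
  return dest;
}

// Model of the emitted sequence for v[index] = value.
V2DI vec_set_model(const V2DI& v, std::size_t index, std::int64_t value) {
  // vmv.v.x v2,a3: splat the scalar (the model fills every element; only
  // element 0 is consumed by the slide below).
  V2DI splat = {{value, value}};
  // vsetvli zero,a5 (a5 = index + 1),...,tu,ma then vslideup.vx v1,v2,a2
  return slideup_tu(v, splat, index, index + 1);
}
```

Because vl is index + 1 and the offset is index, exactly one element is written, so `vec_set_model({10, 20}, 1, 5)` gives {10, 5} and the same call with index 0 gives {5, 20}.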

Please note this patch depends on the RVV SCALAR_MOVE_MERGED_OP bugfix.

gcc/ChangeLog:

* config/riscv/autovec.md: Extend to vls mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
* gcc.target/riscv/rvv/autovec/vls/vec-set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-13.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-14.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-15.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-16.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-17.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-18.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-19.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-20.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-21.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-22.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/vec-set-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  4 +--
 .../gcc.target/riscv/rvv/autovec/vls/def.h| 18 ++
 .../riscv/rvv/autovec/vls/vec-set-1.c | 35 +++
 .../riscv/rvv/autovec/vls/vec-set-10.c| 31 
 .../riscv/rvv/autovec/vls/vec-set-11.c| 29 +++
 .../riscv/rvv/autovec/vls/vec-set-12.c| 21 +++
 .../riscv/rvv/autovec/vls/vec-set-13.c| 20 +++
 .../riscv/rvv/autovec/vls/vec-set-14.c| 19 ++
 .../riscv/rvv/autovec/vls/vec-set-15.c| 18 ++
 .../riscv/rvv/autovec/vls/vec-set-16.c| 21 +++
 .../riscv/rvv/autovec/vls/vec-set-17.c| 20 +++
 .../riscv/rvv/autovec/vls/vec-set-18.c| 19 ++
 .../riscv/rvv/autovec/vls/vec-set-19.c| 18 ++
 .../riscv/rvv/autovec/vls/vec-set-2.c | 33 +
 .../riscv/rvv/autovec/vls/vec-set-20.c| 20 +++
 .../riscv/rvv/autovec/vls/vec-set-21.c| 19 ++
 .../riscv/rvv/autovec/vls/vec-set-22.c| 18 ++
 .../riscv/rvv/autovec/vls/vec-set-3.c | 31 
 .../riscv/rvv/autovec/vls/vec-set-4.c | 29 +++
 .../riscv/rvv/autovec/vls/vec-set-5.c | 35 +++
 .../riscv/rvv/autovec/vls/vec-set-6.c | 33 +
 .../riscv/rvv/autovec/vls/vec-set-7.c | 31 
 .../riscv/rvv/autovec/vls/vec-set-8.c | 29 +++
 .../riscv/rvv/autovec/vls/vec-set-9.c | 33 +
 24 files changed, 582 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-17.c
 create mod

Re: [pushed] c++: [[no_unique_address]] and cv-qualified type

2023-09-17 Thread Jason Merrill via Gcc-patches

On 9/5/23 23:19, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We were checking for overlap using same_type_p and therefore allocating two
Empty subobjects at the same offset because one was cv-qualified.

This gives the warning at the location of the class name rather than the
member declaration, but this should be a rare enough issue that it doesn't
seem worth trying to be more precise.


The ABI and language seem to be settling on referring to "similar types" 
here rather than same ignoring top-level cv-qualifiers.
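
To illustrate how the two predicates differ, here is a minimal sketch.  It is
not GCC's similar_type_p: strip_all_cv, similar_v and same_ignoring_top_cv_v
are hypothetical helpers, and the sketch only follows pointer levels (the
language rule for similar types also covers arrays and pointers to members).

```cpp
#include <type_traits>

// Hypothetical helper: remove cv-qualifiers at every pointer level.
template <typename T, typename U = std::remove_cv_t<T>>
struct strip_all_cv { using type = U; };

template <typename T, typename P>
struct strip_all_cv<T, P *> { using type = typename strip_all_cv<P>::type *; };

// Types compare as "similar" here when they differ only in cv-qualification
// somewhere along the pointer chain.
template <typename A, typename B>
constexpr bool similar_v =
    std::is_same<typename strip_all_cv<A>::type,
                 typename strip_all_cv<B>::type>::value;

// The older check: only the outermost cv-qualifiers are ignored.
template <typename A, typename B>
constexpr bool same_ignoring_top_cv_v =
    std::is_same<std::remove_cv_t<A>, std::remove_cv_t<B>>::value;

struct Empty {};

// For a plain class type both predicates agree.
static_assert(same_ignoring_top_cv_v<Empty, const Empty>, "top-level cv only");
static_assert(similar_v<Empty, const Empty>, "also similar");

// They diverge once the qualifier sits below the top level.
static_assert(!same_ignoring_top_cv_v<Empty *, const Empty *>, "not same");
static_assert(similar_v<Empty *, const Empty *>, "but similar");
```
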


Tested x86_64-pc-linux-gnu, applying to trunk.
From 3f65c1dc56f3a6dd4be85a064d0023b7be3fcd8a Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Tue, 12 Sep 2023 12:15:13 -0400
Subject: [PATCH] c++: overlapping subobjects tweak
To: gcc-patches@gcc.gnu.org

The ABI is settling on "similar" for this rule.

gcc/cp/ChangeLog:

	* class.cc (check_subobject_offset): Use similar_type_p.
---
 gcc/cp/class.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 9139a0075ab..d270dcbb14c 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -4065,7 +4065,7 @@ check_subobject_offset (tree type, tree offset, splay_tree offsets)
 	return 1;
 
   if (cv_check != ignore
-	  && same_type_ignoring_top_level_qualifiers_p (elt, type))
+	  && similar_type_p (elt, type))
 	{
 	  if (cv_check == fast)
 	return 1;
-- 
2.39.3



Re: [PATCH v1] RISC-V: Support VLS mode for vec_set

2023-09-17 Thread Kito Cheng via Gcc-patches
LGTM

On Mon, Sep 18, 2023 at 11:27 AM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> This patch adds VLS mode support for vec_set; both INT
> and FP are included.
>
> Given the sample code below:
>
> typedef long long vl_t \
>   __attribute__((vector_size(2 * sizeof (long long))));
>
> vl_t init_vl (vl_t v, unsigned index, unsigned value)
> {
>   v[index] = value;
>
>   return v;
> }
>
> Before this patch:
> init_vl:
>   addi sp,sp,-16
>   vsetivli zero,2,e64,m1,ta,ma
>   vle64.v  v1,0(a1)
>   vse64.v  v1,0(sp)
>   slli a4,a2,32
>   srli a2,a4,29
>   add  a2,sp,a2
>   slli a3,a3,32
>   srli a3,a3,32
>   sd   a3,0(a2)
>   vle64.v  v1,0(sp)
>   vse64.v  v1,0(a0)
>   addi sp,sp,16
>   jr   ra
>
> After this patch:
> init_vl:
>   vsetivli zero,2,e64,m1,ta,ma
>   vle64.v v1,0(a1)
>   slli a3,a3,32
>   srli a3,a3,32
>   addi a5,a2,1
>   vsetvli zero,a5,e64,m1,tu,ma
>   vmv.v.x v2,a3
>   vslideup.vx v1,v2,a2
>   vsetivli zero,2,e64,m1,ta,ma
>   vse64.v v1,0(a0)
>   ret
>
> Please note this patch depends on the RVV SCALAR_MOVE_MERGED_OP bugfix.
>
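
The "after" sequence can be read as a splat followed by a tail-undisturbed
slide-up.  Below is a rough scalar model of that idea, assuming two 64-bit
lanes as in the sample.  V2DI, splat, slideup_tu and vec_set_model are
hypothetical names for illustration; this is not what the compiler emits
internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 2;  // models a V2DI register
struct V2DI { std::int64_t lane[kLanes]; };

// Models vmv.v.x: write x into the first vl lanes of a fresh register.
// Lanes at or beyond vl are left as zero in this model.
V2DI splat(std::int64_t x, std::size_t vl) {
  V2DI r{};
  for (std::size_t i = 0; i < vl && i < kLanes; ++i)
    r.lane[i] = x;
  return r;
}

// Models vslideup.vx with the tail-undisturbed policy: lanes below offset
// and lanes at or beyond vl keep their old values.
void slideup_tu(V2DI &vd, const V2DI &vs2, std::size_t offset, std::size_t vl) {
  for (std::size_t i = offset; i < vl && i < kLanes; ++i)
    vd.lane[i] = vs2.lane[i - offset];
}

// Models v[index] = value for an in-range index.
V2DI vec_set_model(V2DI v, unsigned index, unsigned value) {
  const std::size_t vl = index + 1;  // addi a5,a2,1 ; vsetvli zero,a5,...
  const V2DI tmp = splat(static_cast<std::int64_t>(value), vl);
  slideup_tu(v, tmp, index, vl);
  return v;
}
```

With vl set to index + 1, the slide-up writes exactly one lane, so every other
lane of the destination is preserved.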
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Extend to vls mode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls/def.h: New macros.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-10.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-11.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-12.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-13.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-14.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-15.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-16.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-17.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-18.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-19.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-20.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-21.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-22.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-7.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-8.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/vec-set-9.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/autovec.md   |  4 +--
>  .../gcc.target/riscv/rvv/autovec/vls/def.h| 18 ++
>  .../riscv/rvv/autovec/vls/vec-set-1.c | 35 +++
>  .../riscv/rvv/autovec/vls/vec-set-10.c| 31 
>  .../riscv/rvv/autovec/vls/vec-set-11.c| 29 +++
>  .../riscv/rvv/autovec/vls/vec-set-12.c| 21 +++
>  .../riscv/rvv/autovec/vls/vec-set-13.c| 20 +++
>  .../riscv/rvv/autovec/vls/vec-set-14.c| 19 ++
>  .../riscv/rvv/autovec/vls/vec-set-15.c| 18 ++
>  .../riscv/rvv/autovec/vls/vec-set-16.c| 21 +++
>  .../riscv/rvv/autovec/vls/vec-set-17.c| 20 +++
>  .../riscv/rvv/autovec/vls/vec-set-18.c| 19 ++
>  .../riscv/rvv/autovec/vls/vec-set-19.c| 18 ++
>  .../riscv/rvv/autovec/vls/vec-set-2.c | 33 +
>  .../riscv/rvv/autovec/vls/vec-set-20.c| 20 +++
>  .../riscv/rvv/autovec/vls/vec-set-21.c| 19 ++
>  .../riscv/rvv/autovec/vls/vec-set-22.c| 18 ++
>  .../riscv/rvv/autovec/vls/vec-set-3.c | 31 
>  .../riscv/rvv/autovec/vls/vec-set-4.c | 29 +++
>  .../riscv/rvv/autovec/vls/vec-set-5.c | 35 +++
>  .../riscv/rvv/autovec/vls/vec-set-6.c | 33 +
>  .../riscv/rvv/autovec/vls/vec-set-7.c | 31 
>  .../riscv/rvv/autovec/vls/vec-set-8.c | 29 +++
>  .../riscv/rvv/autovec/vls/vec-set-9.c | 33 +
>  24 files changed, 582 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-10.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-11.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-12.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/vec-set-13.c
>  create mode 100644 
> gcc/testsui

[pushed] doc: GTY((cache)) documentation tweak

2023-09-17 Thread Jason Merrill via Gcc-patches
Applying to trunk as obvious (explaining existing behavior).
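
For readers unfamiliar with the behavior being documented, here is a toy model
of the hash-table rule described in the new text.  It is only a sketch: objects
are plain integer ids, the "marked" set stands in for the collector's mark
bits, and clear_cache_model is a hypothetical name, not a GCC function.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

using ObjId = int;  // hypothetical stand-in for a GC-managed object

// Runs between mark and sweep: drop entries whose key did not survive
// marking, and keep alive the value of every entry whose key did.
void clear_cache_model(std::unordered_map<ObjId, ObjId> &cache,
                       std::unordered_set<ObjId> &marked) {
  for (auto it = cache.begin(); it != cache.end();) {
    if (marked.count(it->first) == 0) {
      it = cache.erase(it);       // key is dead: discard the entry
    } else {
      marked.insert(it->second);  // key is live: mark the value too
      ++it;
    }
  }
}
```

A deletable cache, by contrast, would simply be emptied at collection time and
repopulated on demand, which is why it is the simpler default when values are
cheap to recompute.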

-- 8< --

gcc/ChangeLog:

* doc/gty.texi: Add discussion of cache vs. deletable.
---
 gcc/doc/gty.texi | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 15f9fa07405..1dfe4652644 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -306,6 +306,13 @@ called on that variable between the mark and sweep phases of garbage
 collection.  The gt_clear_cache function is free to mark blocks as used, or to
 clear pointers in the variable.
 
+In a hash table, the @samp{gt_cleare_cache} function discards entries
+if the key is not marked, or marks the value if the key is marked.
+
+Note that caches should generally use @code{deletable} instead;
+@code{cache} is only preferable if the value is impractical to
+recompute from the key when needed.
+
 @findex deletable
 @item deletable
 

base-commit: 68845f7c4d58186cc0a5b09f7511f3c0a8f07e88
-- 
2.39.3



Ping [PATCH] rs6000: mark tieable between INT and FLOAT

2023-09-17 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping.

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> For PowerPC, some INT mode and FLOAT modes can be marked as tieable,
> for example: DI<->DF.
> Note that SFmode is special: it is only tieable with itself.
>
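
As a reading aid, the INT/FLOAT rule added by the hunk below can be modeled in
a few lines.  This is a sketch only: Mode, ModeClass and the kXX constants are
hypothetical stand-ins for GCC's machine modes, and kUnitsPerFpWord = 8 is an
assumption matching 64-bit FP registers.

```cpp
#include <cassert>

enum class ModeClass { Int, Float };
struct Mode { ModeClass cls; unsigned size_bytes; };

constexpr unsigned kUnitsPerFpWord = 8;  // assumed size of an FP register word

const Mode kSI{ModeClass::Int, 4};
const Mode kDI{ModeClass::Int, 8};
const Mode kSF{ModeClass::Float, 4};
const Mode kDF{ModeClass::Float, 8};

// Mirrors only the two new checks: for one INT and one FLOAT mode, the pair
// is tieable when the FLOAT mode fills a whole FP register word.
bool int_float_tieable(const Mode &m1, const Mode &m2) {
  assert(m1.cls != m2.cls);  // precondition: exactly one of each class
  const Mode &fp = (m1.cls == ModeClass::Float) ? m1 : m2;
  return fp.size_bytes == kUnitsPerFpWord;
}
```

So DI<->DF is tieable in either argument order, while any pairing with SF is
rejected.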
> I updated the previous patch to be more reasonable:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609504.html
>
> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Mark more tieable
>   modes.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.target/powerpc/pr102024.C: Updated.
>
> ---
>  gcc/config/rs6000/rs6000.cc | 9 +
>  gcc/testsuite/g++.target/powerpc/pr102024.C | 3 ++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6ac3adcec6b..3cb0186089e 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1968,6 +1968,15 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>if (ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
>  return false;
>  
> +  /* SFmode format (IEEE DP) in register would not as required,
> + So SFmode is restrict here.  */
> +  if (GET_MODE_CLASS (mode1) == MODE_FLOAT
> +  && GET_MODE_CLASS (mode2) == MODE_INT)
> +return GET_MODE_SIZE (mode1) == UNITS_PER_FP_WORD;
> +  if (GET_MODE_CLASS (mode1) == MODE_INT
> +  && GET_MODE_CLASS (mode2) == MODE_FLOAT)
> +return GET_MODE_SIZE (mode2) == UNITS_PER_FP_WORD;
> +
>if (SCALAR_FLOAT_MODE_P (mode1))
>  return SCALAR_FLOAT_MODE_P (mode2);
>if (SCALAR_FLOAT_MODE_P (mode2))
> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
> index 769585052b5..27d2dc5e80b 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> @@ -5,7 +5,8 @@
>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>  // generates a psabi warning and passes arguments in GPRs.
>  
> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 { target has_arch_pwr8 } } }
> +// { dg-final { scan-assembler-times {\mstd\M} 4 { target { ! has_arch_pwr8 } } } }
>  
>  struct a_thing
>  {


Ping [PATCH V4 1/2] rs6000: optimize moving to sf from highpart di

2023-09-17 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping.

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> Currently, we have the pattern "movsf_from_si2", which tries
> to support moving the high part of a DI to SF.
>
> The pattern looks like: XX:SF=bitcast:SF(subreg(YY:DI>>32),0)
> It only accepts "ashiftrt" for ">>", but "lshiftrt" is also OK.
> The offset of the "subreg" is also hard-coded to 0, which only works for LE.
>
> "movsf_from_si2" is updated to cover BE for the "subreg", and to cover
> the logical shift for ":DI>>32".
>
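
The point that either shift works can be checked with a small sketch: after
shifting right by 32, the low 32 bits are the same whether the upper half is
filled with sign bits or zeros.  The function names below are hypothetical and
the arithmetic shift is modeled explicitly rather than relying on the
compiler's behavior for signed shifts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Low word of (x >> 32) with an arithmetic shift (sign bits shifted in).
std::uint32_t low32_after_ashiftrt(std::int64_t x) {
  std::uint64_t shifted = static_cast<std::uint64_t>(x) >> 32;
  if (x < 0)
    shifted |= 0xFFFFFFFF00000000ull;  // replicate the sign bit
  return static_cast<std::uint32_t>(shifted);
}

// Low word of (x >> 32) with a logical shift (zeros shifted in).
std::uint32_t low32_after_lshiftrt(std::int64_t x) {
  return static_cast<std::uint32_t>(static_cast<std::uint64_t>(x) >> 32);
}

// Bitcast the selected low word to float, as the pattern's SF result would be.
float sf_from_high_word(std::int64_t x, bool arithmetic) {
  const std::uint32_t bits =
      arithmetic ? low32_after_ashiftrt(x) : low32_after_lshiftrt(x);
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

Only the bits above the low word differ between the two shifts, and the
lowpart subreg discards exactly those bits.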
> Pass bootstrap and regression on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>   PR target/108338
>
> gcc/ChangeLog:
>
>   * config/rs6000/predicates.md (lowpart_subreg_operator): New
>   define_predicate.
>   * config/rs6000/rs6000.md (any_rshift): New code_iterator.
>   (movsf_from_si2): Rename to ...
>   (movsf_from_si2_): ... this.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr108338.c: New test.
>
> ---
>  gcc/config/rs6000/predicates.md |  5 +++
>  gcc/config/rs6000/rs6000.md | 11 +++---
>  gcc/testsuite/gcc.target/powerpc/pr108338.c | 40 +
>  3 files changed, 51 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c
>
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 3552d908e9d149a30993e3e6568466de537336be..e25b3b4864f681d47e9d5c2eb88bcde0aea6d17b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -2098,3 +2098,8 @@ (define_predicate "macho_pic_address"
>else
>  return false;
>  })
> +
> +(define_predicate "lowpart_subreg_operator"
> +  (and (match_code "subreg")
> +   (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op)))
> + == SUBREG_BYTE (op)")))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 1a9a7b1a47918f39fc91038607f21a8ba9a2e740..8c92cbf976de915136ad5dba24e69a363d21438d 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8299,18 +8299,19 @@ (define_insn_and_split "movsf_from_si"
>   "*,  *, p9v,   p8v,   *, *,
>p8v,p8v,   p8v,   *")])
>  
> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
> +
>  ;; For extracting high part element from DImode register like:
>  ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>  ;; split it before reload with "and mask" to avoid generating shift right
>  ;; 32 bit then shift left 32 bit.
> -(define_insn_and_split "movsf_from_si2"
> +(define_insn_and_split "movsf_from_si2_"
>[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>   (unspec:SF
> -  [(subreg:SI
> -(ashiftrt:DI
> +  [(match_operator:SI 3 "lowpart_subreg_operator"
> +[(any_rshift:DI
>   (match_operand:DI 1 "input_operand" "r")
> - (const_int 32))
> -0)]
> + (const_int 32))])]
>UNSPEC_SF_FROM_SI))
>(clobber (match_scratch:DI 2 "=r"))]
>"TARGET_NO_SF_SUBREG"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> new file mode 100644
> index ..6db65595343c2407fc32f68f5f52a1f7196c371d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> @@ -0,0 +1,40 @@
> +// { dg-do run }
> +// { dg-options "-O2 -save-temps" }
> +
> +float __attribute__ ((noipa)) sf_from_di_off0 (long long l)
> +{
> +  char buff[16];
> +  *(long long*)buff = l;
> +  float f = *(float*)(buff);
> +  return f;
> +}
> +
> +float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
> +{
> +  char buff[16];
> +  *(long long*)buff = l;
> +  float f = *(float*)(buff + 4);
> +  return f; 
> +}
> +
> +/* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
> +
> +union di_sf_sf
> +{
> +  struct {float f1; float f2;};
> +  long long l;
> +};
> +
> +int main()
> +{
> +  union di_sf_sf v;
> +  v.f1 = 1.0f;
> +  v.f2 = 2.0f;
> +  if (sf_from_di_off0 (v.l) != 1.0f || sf_from_di_off4 (v.l) != 2.0f )
> +__builtin_abort ();
> +  return 0;
> +}


Ping [PATCH V4 2/2] rs6000: use mtvsrws to move sf from si p9

2023-09-17 Thread Jiufu Guo via Gcc-patches
Hi,

I would like to have a ping.

BR,
Jeff (Jiufu Guo)

Jiufu Guo  writes:

> Hi,
>
> As mentioned in PR108338, on p9, we could use mtvsrws to implement
> the bitcast from SI to SF (or lowpart DI to SF).
>
> For code:
>   *(long long*)buff = di;
>   float f = *(float*)(buff);
>
> "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated.
> A better one would be "mtvsrws 1,3 ; xscvspdpn 1,1".
>
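
Why the shorter sequence is equivalent can be sketched with a scalar model of
the three instructions.  This is an illustration under assumptions: Vsr is a
hypothetical four-word view of a VSX register with word[0] as the most
significant word, doubleword 1 after mtvsrd is modeled as zero, and xscvspdpn
is reduced to "interpret word 0 as a float".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Vsr { std::uint32_t word[4]; };  // word[0] is the most significant

// mtvsrd: copy the 64-bit GPR into doubleword 0 (words 0 and 1).
Vsr mtvsrd(std::uint64_t gpr) {
  return Vsr{{static_cast<std::uint32_t>(gpr >> 32),
              static_cast<std::uint32_t>(gpr), 0u, 0u}};
}

// mtvsrws: splat the low 32 bits of the GPR into all four words.
Vsr mtvsrws(std::uint64_t gpr) {
  const std::uint32_t w = static_cast<std::uint32_t>(gpr);
  return Vsr{{w, w, w, w}};
}

// xscvspdpn, reduced to what matters here: read word 0 as single precision.
float xscvspdpn(const Vsr &v) {
  float f;
  std::memcpy(&f, &v.word[0], sizeof f);
  return f;
}

// "sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1"
float sf_from_si_p8(std::uint64_t gpr) { return xscvspdpn(mtvsrd(gpr << 32)); }

// "mtvsrws 1,3 ; xscvspdpn 1,1"
float sf_from_si_p9(std::uint64_t gpr) { return xscvspdpn(mtvsrws(gpr)); }
```

Both paths place the low word of the GPR in word 0, which is the only word the
conversion consumes in this model, so the explicit shift becomes unnecessary.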
> Compared with the previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623533.html
> "highpart DI-->SF" has been moved to a separate patch.
>
> Pass bootstrap and regression on ppc64{,le}.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws
>   for P9.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
>
> ---
>  gcc/config/rs6000/rs6000.md | 25 -
>  gcc/testsuite/gcc.target/powerpc/pr108338.c |  6 +++--
>  2 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 8c92cbf976de915136ad5dba24e69a363d21438d..c03e677bca79e8fb1acb276d07d0acfae009f6d8 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -8280,13 +8280,26 @@ (define_insn_and_split "movsf_from_si"
>  {
>rtx op0 = operands[0];
>rtx op1 = operands[1];
> -  rtx op2 = operands[2];
> -  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>  
> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +  /* Move lowpart 32-bits from register for SFmode.  */
> +  if (TARGET_P9_VECTOR)
> +{
> +  /* Using mtvsrws;xscvspdpn.  */
> +  rtx op0_v = gen_rtx_REG (V4SImode, REGNO (op0));
> +  emit_insn (gen_vsx_splat_v4si (op0_v, op1));
> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}
> +  else
> +{
> +  rtx op2 = operands[2];
> +  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
> +
> +  /* Using ashl;mtvsrd;xscvspdpn.  */
> +  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> +  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> +  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}
> +
>DONE;
>  }
>[(set_attr "length"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> index 6db65595343c2407fc32f68f5f52a1f7196c371d..0565e5254ed0a8cc579cf505a3f865426dcf62ae 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr108338.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
> @@ -19,9 +19,11 @@ float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
>  
>  /* Under lp64, parameter 'l' is in one DI reg, then bitcast sub DI to SF. */
>  /* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
> +/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && { has_arch_pwr8 && { ! has_arch_pwr9 } } } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { lp64 && has_arch_pwr9 } } } } */
>  /* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
> -/* { dg-final { scan-assembler-times {\msldi\M} 1 { target { lp64 && has_arch_pwr8 } } } } */
>  
>  union di_sf_sf
>  {


[PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]

2023-09-17 Thread Li Xu
From: xuli 

The vsetvl pass has been refactored in gcc14, where the optimization
is more robust than in releases/gcc-13.  This problem does not
exist in gcc14.

Phase 6 in gcc13 is purely an optimization.  It does not consider all
cases and hides some bugs, so we decided to remove phase 6.
The generated code will contain some redundancy, but the program is correct.

PR target/111412

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_infos_manager::release): Remove.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::propagate_avl): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-79.c: Adjust case.
* gcc.target/riscv/rvv/vsetvl/avl_single-80.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-87.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
* gcc.target/riscv/rvv/base/pr111412.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc  | 153 +-
 gcc/config/riscv/riscv-vsetvl.h   |   2 -
 .../gcc.target/riscv/rvv/base/pr111412.c  |  41 +
 .../riscv/rvv/vsetvl/avl_single-79.c  |   4 +-
 .../riscv/rvv/vsetvl/avl_single-80.c  |   4 +-
 .../riscv/rvv/vsetvl/avl_single-86.c  |   4 +-
 .../riscv/rvv/vsetvl/avl_single-87.c  |   4 +-
 .../riscv/rvv/vsetvl/avl_single-88.c  |   4 +-
 .../riscv/rvv/vsetvl/avl_single-89.c  |   4 +-
 .../riscv/rvv/vsetvl/avl_single-90.c  |   4 +-
 .../riscv/rvv/vsetvl/vlmax_back_prop-25.c |  10 +-
 .../riscv/rvv/vsetvl/vlmax_back_prop-26.c |  10 +-
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-14.c  |   6 +-
 .../riscv/rvv/vsetvl/vlmax_switch_vtype-15.c  |   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c|   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-5.c|   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-6.c|   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-7.c|   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-8.c|   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c |   4 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c |   4 +-
 21 files changed, 80 insertions(+), 190 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111412.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 0cf4bc818e2..9dca2ce709d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2494,8 +2494,6 @@ vector_infos_manager::release (void)
   if (!vector_exprs.is_empty ())
 vector_exprs.release ();
 
-  gcc_assert (to_refine_vsetvls.is_empty ());
-  gcc_assert (to_delete_vsetvls.is_empty ());
   if (optimize > 0)
 free_bitmap_vectors ();
 }
@@ -2702,9 +2700,6 @@ private:
   /* Phase 5.  */
   void cleanup_insns (void) const;
 
-  /* Phase 6.  */
-  void propagate_avl (void) const;
-
   void init (void);
   void done (void);
   void compute_probabilities (void);
@@ -3823,10 +3818,8 @@ pass_vsetvl::refine_vsetvls (void) const
   /* We can't refine user vsetvl into vsetvl zero,zero since the dest
 will be used by the following instructions.  */
   if (vector_config_insn_p (rinsn))
-   {
- m_vector_manager->to_refine_vsetvls.add (rinsn);
  continue;
-   }
+
   rinsn = PREV_INSN (rinsn);
   rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
   change_insn (rinsn, new_pat);
@@ -3862,10 +3855,7 @@ pass_vsetvl::cleanup_vsetvls ()
  /* We can't eliminate user vsetvl since the dest will be used
   * by the following instructions.  */
  if (vector_config_insn_p (insn->rtl ()))
-   {
- m_vector_manager->to_delete_vsetvls.add (insn->rtl ());
- continue;
-   }
+   continue;
 
  gcc_assert (has_vtype_op (insn->rtl ()));
  rinsn = PREV_INSN (insn->rtl ());
@@ -4067,139 +4057,6 @@ p

[PATCH v6 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-09-17 Thread Ajit Agarwal via Gcc-patches
This new version (v6) of the patch improves the ree pass for the rs6000 target
using defined ABI interfaces.
Bootstrapped and regtested on powerpc64-linux-gnu.

Review comments incorporated.

Thanks & Regards
Ajit


ree: Improve ree pass for rs6000 target using defined abi interfaces

For the rs6000 target we see redundant zero and sign extensions.  This patch
improves the ree pass to eliminate such redundant zero and sign extensions
using defined ABI interfaces.

2023-09-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Use of  zero_extend and sign_extend
defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs_without_defs_p): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C
---
 gcc/ree.cc| 145 +-
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 155 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..e395af6b1bd 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,118 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode =
+targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an return  registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) != ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) != ZERO_EXTEND)
+return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+= (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code != ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and have
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs_without_defs_p (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses = get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+   return false;
+
+  if (BLOCK_FOR_INSN (insn) != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+   return false;
+
+  rtx_insn *use_insn = DF_REF_INSN (use->ref);
+
+  if (GET_CODE (PATTERN (use_insn)) == SET)
+   {
+ rtx_code code = GET_CODE (SET_SRC (PATTERN (use_insn)));
+
+ if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
+ || GET_RTX_CLASS (code) == RTX_COMM_ARITH
+ || GET_RTX_CLASS (code) == RTX_UNARY)
+   return false;
+   }
+ }
+  return true;
+}
+
 /* This function goes through all reaching defs of the source
of the candidate for elimination (CAND) and tries to combine
the extension with the definition instruction.  The changes
@@ -770,6 +883,11 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 
   state->defs_list.truncate (0);
   state->copies_list.truncate (0);
+  rtx orig_src = XEXP (SET_SRC (cand->expr),0)

[PATCH] rs6000: Use default target option node for callee by default [PR111380]

2023-09-17 Thread Kewen.Lin via Gcc-patches
Hi,

As PR111380 (and the discussion in related PRs) shows, the
way function rs6000_can_inline_p currently treats a callee
without any target option node is wrong.  It considers it
always safe to inline this kind of callee, but the callee's
target flags actually come from the command line options
(target_option_default_node).  Those flags may not satisfy
the conditions for inlining, yet the callee is still
inlined, which leads to unexpected results.

As the associated test case pr111380-1.c shows, the caller
main is attributed with power8, but the callee foo is
compiled with power9 from the command line.  It's unexpected
for main to inline foo, since foo can contain something that
requires power9 capability.  Without this patch, for LTO
(with -flto) we get an error message (as it forces the
callee to have a target option node), but for non-LTO, foo
is inlined unexpectedly.

This patch makes the callee adopt target_option_default_node
when it doesn't have a target option node.  This avoids wrong
inlining decisions and fixes the inconsistency between LTO and
non-LTO.  It also aligns with what the other ports do.
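
To make the change in behavior concrete, here is a reduced model of the
decision.  It is a sketch under assumptions: TargetOpts, kIsaP8, kIsaP9 and
both can_inline_* functions are hypothetical, a null pointer stands for "no
target option node", and only the subset rule is modeled (the real function
also adjusts the HTM and fusion bits and looks at explicitly set flags).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kIsaP8 = 1u << 0;  // hypothetical ISA bits
constexpr std::uint64_t kIsaP9 = 1u << 1;

struct TargetOpts { std::uint64_t isa_flags; };

// The callee's ISA flags must be a subset of the caller's.
bool subset_ok(const TargetOpts &caller, const TargetOpts &callee) {
  return (caller.isa_flags & callee.isa_flags) == callee.isa_flags;
}

// Old behavior: a callee without a target option node is always inlinable.
bool can_inline_old(const TargetOpts *caller, const TargetOpts *callee,
                    const TargetOpts &cmdline) {
  if (!callee)
    return true;
  return subset_ok(caller ? *caller : cmdline, *callee);
}

// New behavior: a missing node on either side falls back to the command
// line defaults before the subset check.
bool can_inline_new(const TargetOpts *caller, const TargetOpts *callee,
                    const TargetOpts &cmdline) {
  const TargetOpts &callee_opts = callee ? *callee : cmdline;
  const TargetOpts &caller_opts = caller ? *caller : cmdline;
  return subset_ok(caller_opts, callee_opts);
}
```

In the pr111380-1.c scenario (power8 caller, attribute-free callee, power9 on
the command line) the old model accepts the inline and the new one rejects it.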

Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

PR target/111380

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_can_inline_p): Adopt
target_option_default_node when the callee has no option
attributes, also simplify the existing code accordingly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr111380-1.c: New test.
* gcc.target/powerpc/pr111380-2.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 65 +--
 gcc/testsuite/gcc.target/powerpc/pr111380-1.c | 20 ++
 gcc/testsuite/gcc.target/powerpc/pr111380-2.c | 20 ++
 3 files changed, 70 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr111380-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr111380-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..d48134b35f8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25508,49 +25508,44 @@ rs6000_can_inline_p (tree caller, tree callee)
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);

-  /* If the callee has no option attributes, then it is ok to inline.  */
+  /* If the caller/callee has option attributes, then use them.
+ Otherwise, use the command line options.  */
   if (!callee_tree)
-ret = true;
+callee_tree = target_option_default_node;
+  if (!caller_tree)
+caller_tree = target_option_default_node;

-  else
-{
-  HOST_WIDE_INT caller_isa;
-  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
-  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);

-  /* If the caller has option attributes, then use them.
-Otherwise, use the command line options.  */
-  if (caller_tree)
-   caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
-  else
-   caller_isa = rs6000_isa_flags;
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;

-  cgraph_node *callee_node = cgraph_node::get (callee);
-  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
+  cgraph_node *callee_node = cgraph_node::get (callee);
+  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
+{
+  unsigned int info = ipa_fn_summaries->get (callee_node)->target_info;
+  if ((info & RS6000_FN_TARGET_INFO_HTM) == 0)
{
- unsigned int info = ipa_fn_summaries->get (callee_node)->target_info;
- if ((info & RS6000_FN_TARGET_INFO_HTM) == 0)
-   {
- callee_isa &= ~OPTION_MASK_HTM;
- explicit_isa &= ~OPTION_MASK_HTM;
-   }
+ callee_isa &= ~OPTION_MASK_HTM;
+ explicit_isa &= ~OPTION_MASK_HTM;
}
+}

-  /* Ignore -mpower8-fusion and -mpower10-fusion options for inlining
-purposes.  */
-  callee_isa &= ~(OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION);
-  explicit_isa &= ~(OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION);
+  /* Ignore -mpower8-fusion and -mpower10-fusion options for inlining
+ purposes.  */
+  callee_isa &= ~(OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION);
+  explicit_isa &= ~(OPTION_MASK_P8_FUSION | OPTION_MASK_P10_FUSION);

-  /* The callee's options must be a subset of the caller's options, i.e.
-a vsx function m

[PATCH] rs6000: Skip empty inline asm in rs6000_update_ipa_fn_target_info [PR111366]

2023-09-17 Thread Kewen.Lin via Gcc-patches
Hi,

PR111366 exposes one thing that can be improved in function
rs6000_update_ipa_fn_target_info: it should skip an empty
inline asm string, since an empty string cannot use any
hardware features (so far HTM).
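
The intended rule is small enough to sketch in isolation.  The names here are
hypothetical: kFnTargetInfoHtm stands in for RS6000_FN_TARGET_INFO_HTM and
note_inline_asm for the GIMPLE_ASM branch of the real function.

```cpp
#include <cassert>
#include <cstring>

constexpr unsigned kFnTargetInfoHtm = 1u << 0;  // hypothetical stand-in bit

// Conservatively assume a non-empty inline asm may use HTM; an empty asm
// string contributes nothing.
void note_inline_asm(unsigned &info, const char *asm_str) {
  if (std::strlen(asm_str) > 0)
    info |= kFnTargetInfoHtm;
}
```

An empty asm such as a pure compiler barrier therefore no longer blocks
inlining into callers that lack HTM.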

Since this rs6000_update_ipa_fn_target_info related approach
exists in GCC12 and later, the affected project highway has
updated its target pragma with ",htm", see the link:
https://github.com/google/highway/commit/15e63d61eb535f478bc
I won't bother with an inline asm parser for now, but
will file a separate PR for further enhancement.

Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon.

BR,
Kewen
-
PR target/111366

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_update_ipa_fn_target_info): Skip
empty inline asm.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr111366.C: New test.
---
 gcc/config/rs6000/rs6000.cc |  9 ++--
 gcc/testsuite/g++.target/powerpc/pr111366.C | 48 +
 2 files changed, 54 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr111366.C

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d48134b35f8..40925407a99 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25475,9 +25475,12 @@ rs6000_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt)
   /* Assume inline asm can use any instruction features.  */
   if (gimple_code (stmt) == GIMPLE_ASM)
 {
-  /* Should set any bits we concerned, for now OPTION_MASK_HTM is
-the only bit we care about.  */
-  info |= RS6000_FN_TARGET_INFO_HTM;
+  const char *asm_str = gimple_asm_string (as_a<const gasm *> (stmt));
+  /* Ignore empty inline asm string.  */
+  if (strlen (asm_str) > 0)
+   /* Should set any bits we concerned, for now OPTION_MASK_HTM is
+  the only bit we care about.  */
+   info |= RS6000_FN_TARGET_INFO_HTM;
   return false;
 }
   else if (gimple_code (stmt) == GIMPLE_CALL)
diff --git a/gcc/testsuite/g++.target/powerpc/pr111366.C b/gcc/testsuite/g++.target/powerpc/pr111366.C
new file mode 100644
index 000..6d3d8ebc552
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr111366.C
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* Use -Wno-attributes to suppress the possible warning on always_inline.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -Wno-attributes" } */
+
+/* Verify it doesn't emit any error messages.  */
+
+#include 
+#define HWY_PRAGMA(tokens) _Pragma (#tokens)
+#define HWY_PUSH_ATTRIBUTES(targets_str) HWY_PRAGMA (GCC target targets_str)
+__attribute__ ((always_inline)) void
+PreventElision ()
+{
+  asm("");
+}
+#define HWY_BEFORE_NAMESPACE() HWY_PUSH_ATTRIBUTES (",cpu=power10")
+HWY_BEFORE_NAMESPACE () namespace detail
+{
+  template  struct CappedTagChecker
+  {
+  };
+}
+template 
+using CappedTag = detail::CappedTagChecker;
+template  struct ForeachCappedR
+{
+  static void Do (size_t, size_t)
+  {
+CappedTag d;
+Test () (int(), d);
+  }
+};
+template  struct ForPartialVectors
+{
+  template  void operator() (T)
+  {
+ForeachCappedR::Do (1, 1);
+  }
+};
+struct TestFloorLog2
+{
+  template  void operator() (T, DF) { PreventElision (); }
+};
+void
+TestAllFloorLog2 ()
+{
+  ForPartialVectors () (float());
+}
+
--
2.31.1


Re: [PATCH] Harmonize headers between both dg-extract-results scripts

2023-09-17 Thread Paul Iannetta via Gcc-patches
On Thu, Sep 14, 2023 at 04:24:33PM +0200, Paul Iannetta wrote:
> Hi,
> 
> This is a small patch so that both dg-extract-results.py and
> dg-extract-results.sh share the same header.  In particular, it fixes
> the fact that the regexp r'^Test Run By (\S+) on (.*)$' was never
> matched in the python file.
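
The mismatch is easy to reproduce in isolation. The snippet below is a minimal sketch using the two regexps from the patch; the sample header line (user name and date) is made up for illustration.

```python
import re

# The regexp before and after the patch.
old_re = re.compile(r'^Test Run By (\S+) on (.*)$')
new_re = re.compile(r'^Test run by (\S+) on (.*)$')

# Hypothetical header line in the "Test run by ... on ..." form.
line = 'Test run by paul on Thu Sep 14 15:43:58 2023'

print(old_re.match(line))           # None: "Run By" does not match "run by"
print(new_re.match(line).groups())  # the user and the date are captured
```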

By the way, the bash script dg-extract-results.sh checks whether
python is available by invoking python.  However, it seems that the
policy on newer machines is to not provide python as a symlink (at
least on Ubuntu 22.04 and above; and RHEL 8).  Therefore, we might
want to also check against python3 so that the bash script does not
fail to find python even though it is available.
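
The fallback I have in mind is sketched below in Python for brevity (the actual check lives in the bash script). The function name find_python and the injectable lookup parameter are assumptions made for this example only.

```python
import shutil


def find_python(candidates=("python", "python3"), which=shutil.which):
    """Return the path of the first interpreter name found, or None."""
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    return None


# Simulate a machine that only ships python3 (no "python" symlink).
only_python3 = {"python3": "/usr/bin/python3"}.get
print(find_python(which=only_python3))
```

With only "python" in the candidate list, the same simulated machine would report no interpreter at all, which is the failure described above.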

Thanks,
Paul


> Author: Paul Iannetta 
> Date:   Thu Sep 14 15:43:58 2023 +0200
> 
> Harmonize headers between both dg-extract-results scripts
> 
> The header of the python version looked like:
> Target is ...
> Host   is ...
> The header of the bash version looked like:
> Test run by ... on ...
> Target is ...
> 
> After this change both headers look like:
> Test run by ... on ...
> Target is ...
> Host   is ...
> 
> The order of the tests is not the same but since dg-cmp-results.sh it
> does not matter much.
> 
> contrib/ChangeLog:
> 
> 2023-09-14  Paul Iannetta  
> 
> * dg-extract-results.py: Print the "Test run" line.
> * dg-extract-results.sh: Print the "Host" line.
> 
> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
> index 30aa68771d4..34da1808c5f 100644
> --- a/contrib/dg-extract-results.py
> +++ b/contrib/dg-extract-results.py
> @@ -113,7 +113,7 @@ class Prog:
>  # Whether to create .sum rather than .log output.
>  self.do_sum = True
>  # Regexps used while parsing.
> -self.test_run_re = re.compile (r'^Test Run By (\S+) on (.*)$')
> +self.test_run_re = re.compile (r'^Test run by (\S+) on (.*)$')
>  self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>  self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>   r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
> diff --git a/contrib/dg-extract-results.sh b/contrib/dg-extract-results.sh
> index ff6c50d029c..57f6fe0e997 100755
> --- a/contrib/dg-extract-results.sh
> +++ b/contrib/dg-extract-results.sh
> @@ -271,7 +271,7 @@ cat $SUM_FILES \
>  
>  # Write the begining of the combined summary file.
>  
> -head -n 2 $FIRST_SUM
> +head -n 3 $FIRST_SUM
>  echo
>  echo "   === $TOOL tests ==="
>  echo






Re: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]

2023-09-17 Thread juzhe.zh...@rivai.ai
Thanks for fixing it.
I am OK with removing the phase 6 optimization, which has many latent bugs
(Kito has refactored it in GCC 14).
But I think we need more comments from Kito about that.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-09-18 12:19
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]
From: xuli 
 
The vsetvl pass has been refactored in GCC 14, and its optimization
is more reasonable than on releases/gcc-13.  This problem does not
exist in GCC 14.
 
Phase 6 in GCC 13 is an optimization phase.  It was not thought through
carefully and has some latent bugs, so we decided to remove it.
Although the generated code will contain some redundancy, the program is correct.
 
PR target/111412
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (vector_infos_manager::release): Remove.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::propagate_avl): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/avl_single-79.c: Adjust case.
* gcc.target/riscv/rvv/vsetvl/avl_single-80.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-87.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-6.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c: Ditto.
* gcc.target/riscv/rvv/base/pr111412.c: New test.
---
gcc/config/riscv/riscv-vsetvl.cc  | 153 +-
gcc/config/riscv/riscv-vsetvl.h   |   2 -
.../gcc.target/riscv/rvv/base/pr111412.c  |  41 +
.../riscv/rvv/vsetvl/avl_single-79.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-80.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-86.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-87.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-88.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-89.c  |   4 +-
.../riscv/rvv/vsetvl/avl_single-90.c  |   4 +-
.../riscv/rvv/vsetvl/vlmax_back_prop-25.c |  10 +-
.../riscv/rvv/vsetvl/vlmax_back_prop-26.c |  10 +-
.../riscv/rvv/vsetvl/vlmax_switch_vtype-14.c  |   6 +-
.../riscv/rvv/vsetvl/vlmax_switch_vtype-15.c  |   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-5.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-6.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-7.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-8.c|   2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvlmax-2.c |   4 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvlmax-4.c |   4 +-
21 files changed, 80 insertions(+), 190 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111412.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 0cf4bc818e2..9dca2ce709d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2494,8 +2494,6 @@ vector_infos_manager::release (void)
   if (!vector_exprs.is_empty ())
 vector_exprs.release ();
-  gcc_assert (to_refine_vsetvls.is_empty ());
-  gcc_assert (to_delete_vsetvls.is_empty ());
   if (optimize > 0)
 free_bitmap_vectors ();
}
@@ -2702,9 +2700,6 @@ private:
   /* Phase 5.  */
   void cleanup_insns (void) const;
-  /* Phase 6.  */
-  void propagate_avl (void) const;
-
   void init (void);
   void done (void);
   void compute_probabilities (void);
@@ -3823,10 +3818,8 @@ pass_vsetvl::refine_vsetvls (void) const
   /* We can't refine user vsetvl into vsetvl zero,zero since the dest
will be used by the following instructions.  */
   if (vector_config_insn_p (rinsn))
- {
-   m_vector_manager->to_refine_vsetvls.add (rinsn);
  continue;
- }
+
   rinsn = PREV_INSN (rinsn);
   rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
   change_insn (rinsn, new_pat);
@@ -3862,10 +3855,7 @@ pass_vsetvl::cleanup_vsetvls ()
  /* We can't eliminate user vsetvl since the dest will be used
   * by the following instructions.  */
  if (vector_config_insn_p (insn->rtl ()))
- {
-   m_vector_manager->to_delete_vsetvls.

Re: Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME

2023-09-17 Thread juzhe.zh...@rivai.ai
Thanks Richard.
Comments addressed in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630699.html 




juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-09-17 23:29
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME
Juzhe-Zhong  writes:
> According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751
>
> As Richard and Richi suggested, we recognize an uninitialized SSA_NAME and
> convert it into a SCRATCH rtx if the target predicate allows SCRATCH.
>
> It can help to reduce redundant data move instructions on targets like RISC-V.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_fn_using_insn): Support undefined rtx.
> * optabs.cc (maybe_legitimize_operand): Ditto.
> (can_reuse_operands_p): Ditto.
> * optabs.h (enum expand_operand_type): Ditto.
> (create_undefined_input_operand): Ditto.
>
> ---
>  gcc/internal-fn.cc |  4 
>  gcc/optabs.cc  | 16 
>  gcc/optabs.h   | 14 +-
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0fd34359247..61d5a9e4772 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -247,6 +247,10 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>  create_convert_operand_from (&ops[opno], rhs_rtx,
>   TYPE_MODE (rhs_type),
>   TYPE_UNSIGNED (rhs_type));
> +  else if (TREE_CODE (rhs) == SSA_NAME
> +&& SSA_NAME_IS_DEFAULT_DEF (rhs)
> +&& VAR_P (SSA_NAME_VAR (rhs)))
> + create_undefined_input_operand (&ops[opno], TYPE_MODE (rhs_type));
>else
>  create_input_operand (&ops[opno], rhs_rtx, TYPE_MODE (rhs_type));
>opno += 1;
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 32ff379ffc3..d8c771547a3 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -8102,6 +8102,21 @@ maybe_legitimize_operand (enum insn_code icode, unsigned int opno,
>goto input;
>  }
>break;
> +
> +case EXPAND_UNDEFINED:
> +  {
> + mode = insn_data[(int) icode].operand[opno].mode;
> + rtx scratch = gen_rtx_SCRATCH (mode);
 
A scratch of the right mode should already be available in op->value,
since it was created by create_undefined_input_operand.
 
If that doesn't work for some reason, then it would be better for
create_undefined_input_operand to pass NULL_RTX as the "value"
argument to create_expand_operand.
 
> + /* For SCRATCH rtx which is converted from uninitialized
> +SSA, we convert it as fresh pseudo when target doesn't
> +allow scratch rtx in predicate. Otherwise, return true.  */
> + if (!insn_operand_matches (icode, opno, scratch))
> +   {
> + op->value = gen_reg_rtx (mode);
 
The mode should come from op->mode.
 
> + goto input;
> +   }
> + return true;
> +  }
>  }
>return insn_operand_matches (icode, opno, op->value);
>  }
> @@ -8147,6 +8162,7 @@ can_reuse_operands_p (enum insn_code icode,
>  case EXPAND_INPUT:
>  case EXPAND_ADDRESS:
>  case EXPAND_INTEGER:
> +case EXPAND_UNDEFINED:
>return true;
 
I think this should be in the "return false" block instead.
 
>  
>  case EXPAND_CONVERT_TO:
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index c80b7f4dc1b..4eb1f9ee09a 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -37,7 +37,8 @@ enum expand_operand_type {
>EXPAND_CONVERT_TO,
>EXPAND_CONVERT_FROM,
>EXPAND_ADDRESS,
> -  EXPAND_INTEGER
> +  EXPAND_INTEGER,
> +  EXPAND_UNDEFINED
 
Sorry, this was my bad suggestion.  I should have suggested
EXPAND_UNDEFINED_INPUT, to match the name of the function.
 
Thanks,
Richard
 
>  };
>  
>  /* Information about an operand for instruction expansion.  */
> @@ -117,6 +118,17 @@ create_input_operand (class expand_operand *op, rtx value,
>create_expand_operand (op, EXPAND_INPUT, value, mode, false);
>  }
>  
> +/* Make OP describe an undefined input operand for uninitialized
> +   SSA.  It's the scratch operand with mode MODE; MODE cannot be
> +   VOIDmode.  */
> +
> +inline void
> +create_undefined_input_operand (class expand_operand *op, machine_mode mode)
> +{
> +  create_expand_operand (op, EXPAND_UNDEFINED, gen_rtx_SCRATCH (mode), mode,
> + false);
> +}
> +
>  /* Like create_input_operand, except that VALUE must first be converted
> to mode MODE.  UNSIGNED_P says whether VALUE is unsigned.  */
 


Re: [PATCH] RISC-V: Remove phase 6 of vsetvl pass in GCC13[PR111412]

2023-09-17 Thread Kito Cheng via Gcc-patches
I don't think it makes much sense to backport the GCC 14 change to
GCC 13; removing the phase 6 optimization seems reasonable to me, so LGTM :)

On Mon, Sep 18, 2023 at 2:44 PM juzhe.zh...@rivai.ai
 wrote:
>
> Thanks for fixing it.
> I am OK with removing the phase 6 optimization, which has many latent bugs
> (Kito has refactored it in GCC 14).
> But I think we need more comments from Kito about that.
>
>
>
> juzhe.zh...@rivai.ai

Re: [PATCH] [RFC] New early __builtin_unreachable processing.

2023-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 4:45 PM Andrew MacLeod  wrote:
>
> I've been looking at __builtin_unreachable () regressions.  The
> fundamental problem seems to be a lack of consistent expectation for
> when we remove it earlier than the final pass of VRP.  After looking
> through them, I think this provides a sensible approach.
>
> Ranger is pretty good at providing ranges in blocks dominated by the
> __builtin_unreachable branch, so removing it isn't quite as critical as
> it once was.  It's also pretty good at identifying what in the block can
> be affected by the branch.
>
> This patch provides an alternate removal algorithm for earlier passes.
> It looks at *all* the exports from the block, and if the branch
> dominates every use of all the exports, AND none of those values access
> memory, VRP will remove the unreachable call, rewrite the branch, update
> all the values globally, and finally perform the simple DCE on the
> branch's ssa-name.  This is kind of what it did before, but it wasn't
> as stringent on the requirements.
>
> The memory access check is required because there are a couple of test
> cases for PRE in which there is a series of instructions leading to an
> unreachable call, and none of those ssa names are ever used in the IL
> again.  The whole chunk is dead, and we update globals, however
> pointlessly.  However, one of the ssa_names loads from memory, and a later
> pass commons this value with a later load, and then the unreachable
> call provides additional information about the later load.  This is
> evident in tree-ssa/ssa-pre-34.c.  The only way I see to avoid this
> situation is to not remove the unreachable if there is a load feeding it.
>
> What this does is a more sophisticated version of what DOM does in
> all_uses_feed_or_dominated_by_stmt.  The feeding instructions don't have
> to be single use, but they do have to be dominated by the branch or be
> single use within the branch's block.
>
> If there are multiple uses in the same block as the branch, this does
> not remove the unreachable call.  If we could be sure there are no
> intervening calls or side effects, it would be allowable, but this is a
> more expensive checking operation.  Ranger gets the ranges right anyway,
> so with more passes using ranger, I'm not sure we'd see much benefit from
> the additional analysis.  It could always be added later.
>
> This fixes at least 110249 and 110080 (and probably others).  The only
> regression is 93917 for which I changed the testcase to adjust
> expectations:
>
> // PR 93917
> void f1(int n)
> {
>if(n<0)
>  __builtin_unreachable();
>f3(n);
> }
>
> void f2(int*n)
> {
>if(*n<0)
>  __builtin_unreachable();
>f3 (*n);
> }
>
> We were removing both unreachable calls in VRP1, but only updating the
> global values in the first case, meaning we lose information.   With the
> change in semantics, we only update the global in the first case, but we
> leave the unreachable call in the second case now (due to the load from
> memory).  Ranger still calculates the contextual range correctly as [0,
> +INF] in the second case, it just doesn't set the global value until
> VRP2 when it is removed.
>
> Does this seem reasonable?
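
For readers following along, the removal condition described above can be summarised with a toy model. This is only a sketch in Python, not the patch itself: the Export class, its field names and can_remove_unreachable_early are hypothetical, and the model flattens the "dominated by the branch or single use within the branch's block" rule into one flag.

```python
from dataclasses import dataclass


@dataclass
class Export:
    """One value exported from the block that ends in the branch."""
    name: str
    uses_dominated_by_branch: bool  # every use is dominated by the branch
    reads_memory: bool              # the value is fed by a load


def can_remove_unreachable_early(exports):
    """Early removal is allowed only if every export passes both checks."""
    return all(e.uses_dominated_by_branch and not e.reads_memory
               for e in exports)


# f1 from PR 93917: the condition tests the parameter n directly.
f1_exports = [Export("n", True, False)]
# f2 from PR 93917: the condition tests *n, which is a load from memory.
f2_exports = [Export("*n", True, True)]

print(can_remove_unreachable_early(f1_exports))  # unreachable can go early
print(can_remove_unreachable_early(f2_exports))  # unreachable is kept
```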

I wonder how this addresses the fundamental issue we always faced
in that when we apply the range this range info in itself allows the
branch to the __builtin_unreachable () to be statically determined,
so when the first VRP pass sets the range the next pass evaluating
the condition will remove it (and the guarded __builtin_unreachable ()).

In principle there's nothing wrong with that if we don't lose the range
info during optimizations, but that unfortunately happens more often
than wanted and with the __builtin_unreachable () gone we've lost
the ability to re-compute them.

I think it's good to explicitly remove the branch at the point we want
rather than relying on the "next" visitor to pick up the global range.

As I read the patch we now remove __builtin_unreachable () explicitly
as soon as possible but don't really address the fundamental issue
in any way?

> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK?
>
> Andrew
>
>


Re: [PATCH] gcc: Introduce -fhardened

2023-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
 wrote:
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, powerpc64le-unknown-linux-gnu,
> and aarch64-unknown-linux-gnu; ok for trunk?
>
> -- >8 --
> In 
> I proposed -fhardened, a new umbrella option that enables a reasonable set
> of hardening flags.  The read of the room seems to be that the option
> would be useful.  So here's a patch implementing that option.
>
> Currently, -fhardened enables:
>
>   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
>   -D_GLIBCXX_ASSERTIONS
>   -ftrivial-auto-var-init=pattern
>   -fPIE  -pie  -Wl,-z,relro,-z,now
>   -fstack-protector-strong
>   -fstack-clash-protection
>   -fcf-protection=full (x86 GNU/Linux only)
>
> -fhardened will not override options that were specified on the command line
> (before or after -fhardened).  For example,
>
>  -D_FORTIFY_SOURCE=1 -fhardened
>
> means that _FORTIFY_SOURCE=1 will be used.  Similarly,
>
>   -fhardened -fstack-protector
>
> will not enable -fstack-protector-strong.
>
> In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
> to anything.  I think we need a better way to show what it actually
> enables.

I do think we need to find a solution here to solve asserting compliance.
Maybe we can have -Whardened that will diagnose any altering of
-fhardened by other options on the command-line or by missed target
implementations?  People might for example use -fstack-protector
but don't really want to make protection lower than requested with -fhardened.

Any such conflict is much less apparent than when you use the
flags -fhardened composes.
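
To illustrate the kind of report I mean, here is a toy model in Python. It is a sketch under stated assumptions: the table covers only two of the settings listed above, the function name resolve is made up, and the returned list is merely what a hypothetical -Whardened could diagnose; none of this is GCC's option handling code.

```python
# Two of the settings -fhardened enables, taken from the list above.
HARDENED_DEFAULTS = {
    "_FORTIFY_SOURCE": "3",
    "stack-protector": "strong",
}


def resolve(explicit, hardened=True):
    """Combine explicit command-line settings with -fhardened defaults.

    Returns (effective settings, settings where the explicit value
    differs from what -fhardened would have chosen).
    """
    effective = dict(explicit)
    overridden = []
    if hardened:
        for key, value in HARDENED_DEFAULTS.items():
            if key in explicit:
                # Explicit options always win over -fhardened.
                if explicit[key] != value:
                    overridden.append(key)
            else:
                effective[key] = value
    return effective, overridden


# -D_FORTIFY_SOURCE=1 -fhardened
print(resolve({"_FORTIFY_SOURCE": "1"}))
# -fhardened -fstack-protector
print(resolve({"stack-protector": "default"}))
```

The second element of each result is the part that is invisible today when only -fhardened shows up in DW_AT_producer.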

Richard.

>
> gcc/c-family/ChangeLog:
>
> * c-opts.cc (c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
> and _GLIBCXX_ASSERTIONS.
>
> gcc/ChangeLog:
>
> * common.opt (fhardened): New option.
> * config.in: Regenerate.
> * config/bpf/bpf.cc: Include "opts.h".
> (bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
> not inform that -fstack-protector does not work.
> * config/i386/i386-options.cc (ix86_option_override_internal): When
> -fhardened, maybe enable -fcf-protection=full.
> * configure: Regenerate.
> * configure.ac: Check if the linker supports '-z now' and '-z relro'.
> * doc/invoke.texi: Document -fhardened.
> * gcc.cc (driver_handle_option): Remember if any link options or 
> -static
> were specified on the command line.
> (process_command): When -fhardened, maybe enable -pie and
> -Wl,-z,relro,-z,now.
> * opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
> (finish_options): When -fhardened, enable
> -ftrivial-auto-var-init=pattern and -fstack-protector-strong.
> (print_help_hardened): New.
> (print_help): Call it.
> * toplev.cc (process_options): When -fhardened, enable
> -fstack-clash-protection.  If flag_stack_protector_set_by_fhardened_p,
> do not warn that -fstack-protector not supported for this target.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.misc-tests/help.exp: Test -fhardened.
> * c-c++-common/fhardened-1.S: New test.
> * c-c++-common/fhardened-1.c: New test.
> * c-c++-common/fhardened-10.c: New test.
> * c-c++-common/fhardened-11.c: New test.
> * c-c++-common/fhardened-12.c: New test.
> * c-c++-common/fhardened-13.c: New test.
> * c-c++-common/fhardened-14.c: New test.
> * c-c++-common/fhardened-2.c: New test.
> * c-c++-common/fhardened-3.c: New test.
> * c-c++-common/fhardened-5.c: New test.
> * c-c++-common/fhardened-6.c: New test.
> * c-c++-common/fhardened-7.c: New test.
> * c-c++-common/fhardened-8.c: New test.
> * c-c++-common/fhardened-9.c: New test.
> * gcc.target/i386/cf_check-6.c: New test.
> ---
>  gcc/c-family/c-opts.cc | 29 
>  gcc/common.opt |  4 ++
>  gcc/config.in  | 12 +
>  gcc/config/bpf/bpf.cc  |  8 ++--
>  gcc/config/i386/i386-options.cc| 11 -
>  gcc/configure  | 50 +++-
>  gcc/configure.ac   | 42 -
>  gcc/doc/invoke.texi| 29 +++-
>  gcc/gcc.cc | 39 +++-
>  gcc/opts.cc| 53 --
>  gcc/opts.h |  1 +
>  gcc/testsuite/c-c++-common/fhardened-1.S   |  6 +++
>  gcc/testsuite/c-c++-common/fhardened-1.c   | 14 ++
>  gcc/testsuite/c-c++-common/fhardened-10.c  | 10 
>  gcc/testsuite/c-c++-common/fhardened-11.c  | 10 
>  gcc/testsuite/c-c++-common/fhardened-12.c  | 11 +
>  gcc/testsuite/c