[PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-13 Thread Juzhe-Zhong
Hi, before this patch, RVV codegen for a simple conversion case was:

foo:
        ble     a2,zero,.L8
        addiw   a5,a2,-1
        li      a4,6
        bleu    a5,a4,.L6
        srliw   a3,a2,3
        slli    a3,a3,3
        add     a3,a3,a0
        mv      a5,a0
        mv      a4,a1
        vsetivli        zero,8,e16,m1,ta,ma
.L4:
        vle8.v  v2,0(a5)
        addi    a5,a5,8
        vzext.vf2       v1,v2
        vse16.v v1,0(a4)
        addi    a4,a4,16
        bne     a3,a5,.L4
        andi    a5,a2,-8
        beq     a2,a5,.L10
.L3:
        slli    a4,a5,32
        srli    a4,a4,32
        subw    a2,a2,a5
        slli    a2,a2,32
        slli    a5,a4,1
        srli    a2,a2,32
        add     a0,a0,a4
        add     a1,a1,a5
        vsetvli zero,a2,e16,m1,ta,ma
        vle8.v  v2,0(a0)
        vzext.vf2       v1,v2
        vse16.v v1,0(a1)
.L8:
        ret
.L10:
        ret
.L6:
        li      a5,0
        j       .L3

This vectorization goes through the first loop:

        vsetivli        zero,8,e16,m1,ta,ma
.L4:
        vle8.v  v2,0(a5)
        addi    a5,a5,8
        vzext.vf2       v1,v2
        vse16.v v1,0(a4)
        addi    a4,a4,16
        bne     a3,a5,.L4

Each iteration processes 8 elements.

For scalable vectorization on a CPU with VLEN >= 128 bits, this is fine when
VLEN = 128.  But as soon as VLEN > 128 bits, it wastes CPU resources: with
VLEN = 256 bits, for example, only half of the vector unit is working and the
other half is idle.

After investigation, I realized that I forgot to adjust the COST for
SELECT_VL.  So, adjust the COST for SELECT_VL style length vectorization
from 3 to 2, since after this patch:

foo:
        ble     a2,zero,.L5
.L3:
        vsetvli a5,a2,e16,m1,ta,ma   -> SELECT_VL cost.
        vle8.v  v2,0(a0)
        slli    a4,a5,1              -> additional shift of the SELECT_VL
                                        outcome for memory address calculation.
        vzext.vf2       v1,v2
        sub     a2,a2,a5
        vse16.v v1,0(a1)
        add     a0,a0,a5
        add     a1,a1,a4
        bne     a2,zero,.L3
.L5:
        ret

This patch is a simple fix for something I previously forgot.

Ok for trunk ?

If not, I am going to adjust the cost in the backend cost model.

PR target/111317

gcc/ChangeLog:

* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust
COST for decrement IV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.

---
 .../gcc.dg/vect/costmodel/riscv/rvv/pr111317.c  | 12 
 gcc/tree-vect-loop.cc   | 17 ++---
 2 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
new file mode 100644
index 000..d4bea242a9a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param=riscv-autovec-lmul=m1" } */
+
+void
+foo (char *__restrict a, short *__restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+    b[i] = (short) a[i];
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6261cd1be1d..19e38b8637b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4870,10 +4870,21 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
 	    if (partial_load_store_bias != 0)
 	      body_stmts += 1;
 
-	/* Each may need two MINs and one MINUS to update lengths in body
-	   for next iteration.  */
+	unsigned int length_update_cost = 0;
+	if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+	  /* For decrement IV style, we use a single SELECT_VL from the
+	     beginning to calculate the number of elements that need to
+	     be processed in the current iteration, and a SHIFT operation
+	     to compute the next memory address instead of adding the
+	     vectorization factor.  */
+	  length_update_cost = 2;
+	else
+	  /* For increment IV style, each may need two MINs and one MINUS
+	     to update lengths in the body for the next iteration.  */
+	  length_update_cost = 3;
+
 	if (need_iterate_p)
-	  body_stmts += 3 * num_vectors;
+	  body_stmts += length_update_cost * num_vectors;
 	}
 
   (void) add_stmt_cost (target_cost_data, prologue_stmts,
-- 
2.36.3



[PATCH] i386: Make most MD builtins nothrow, leaf [PR112962]

2023-12-13 Thread Jakub Jelinek
Hi!

The following patch makes most of the x86 MD builtins nothrow,leaf
(like most middle-end builtins are).  For -fnon-call-exceptions it
doesn't add nothrow; better might be to still add it if the builtins
don't read or write memory and can't raise floating point exceptions,
but we don't have such information readily available, so the patch
just uses !flag_non_call_exceptions for now.
I'm not sure if we shouldn't have some exceptions for the leaf attribute;
e.g., I wonder about EMMS/FEMMS and the various xsave/xrstor etc. builtins.
Pedantically, none of those builtins do anything that leaf functions
are forbidden to do (having callbacks, calling functions from the current TU,
longjmp into the current TU), but sometimes non-leaf is also used on
really complex functions to prevent some unwanted optimizations.
That said, I haven't run into any problems as is with the patch.
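
For reference, the net effect on each builtin is roughly what this
source-level declaration expresses (a sketch only; example_builtin is a
made-up name, and the patch sets TREE_NOTHROW and the "leaf" attribute on
the FUNCTION_DECLs directly rather than going through attribute parsing):

/* Sketch: "nothrow" lets the EH machinery assume no exception escapes
   the call (skipped under -fnon-call-exceptions), and "leaf" promises
   the callee won't call back into the current translation unit.  */
extern int example_builtin (int) __attribute__ ((nothrow, leaf));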

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-13  Jakub Jelinek  

PR target/112962
* config/i386/i386-builtins.cc (ix86_builtins): Increase by one
element.
(def_builtin): If not -fnon-call-exceptions, set TREE_NOTHROW on
the builtin FUNCTION_DECL.  Add leaf attribute to DECL_ATTRIBUTES.
(ix86_add_new_builtins): Likewise.

--- gcc/config/i386/i386-builtins.cc.jj 2023-10-13 19:34:43.767837029 +0200
+++ gcc/config/i386/i386-builtins.cc	2023-12-12 12:20:50.980071085 +0100
@@ -221,7 +221,7 @@ ix86_get_builtin_func_type (enum ix86_bu
 }
 
 /* Table for the ix86 builtin decls.  */
-static GTY(()) tree ix86_builtins[(int) IX86_BUILTIN_MAX];
+static GTY(()) tree ix86_builtins[(int) IX86_BUILTIN_MAX + 1];
 
 struct builtin_isa ix86_builtins_isa[(int) IX86_BUILTIN_MAX];
 
@@ -295,6 +295,12 @@ def_builtin (HOST_WIDE_INT mask, HOST_WI
 					   NULL, NULL_TREE);
 	  ix86_builtins[(int) code] = decl;
 	  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+	  if (!flag_non_call_exceptions)
+	    TREE_NOTHROW (decl) = 1;
+	  if (ix86_builtins[(int) IX86_BUILTIN_MAX] == NULL_TREE)
+	    ix86_builtins[(int) IX86_BUILTIN_MAX]
+	      = build_tree_list (get_identifier ("leaf"), NULL_TREE);
+	  DECL_ATTRIBUTES (decl) = ix86_builtins[(int) IX86_BUILTIN_MAX];
 	}
       else
 	{
@@ -393,6 +399,12 @@ ix86_add_new_builtins (HOST_WIDE_INT isa
 	    TREE_READONLY (decl) = 1;
 	  if (ix86_builtins_isa[i].pure_p)
 	    DECL_PURE_P (decl) = 1;
+	  if (!flag_non_call_exceptions)
+	    TREE_NOTHROW (decl) = 1;
+	  if (ix86_builtins[(int) IX86_BUILTIN_MAX] == NULL_TREE)
+	    ix86_builtins[(int) IX86_BUILTIN_MAX]
+	      = build_tree_list (get_identifier ("leaf"), NULL_TREE);
+	  DECL_ATTRIBUTES (decl) = ix86_builtins[(int) IX86_BUILTIN_MAX];
 	}
 }
 

Jakub



Re: [PATCH v3 3/4] RISC-V: Add crypto machine descriptions

2023-12-13 Thread juzhe.zh...@rivai.ai
+(define_insn "@pred_vandn_scalar"
+  [(set (match_operand:VI 0 "register_operand"   "=vd, vr,vd, vr")
+(if_then_else:VI
+  (unspec:
+[(match_operand: 1 "vector_mask_operand" " vm,Wc1,vm,Wc1")
+ (match_operand 5 "vector_length_operand"" rK, rK,rK, rK")
+ (match_operand 6 "const_int_operand""  i,  i, i,  i")
+ (match_operand 7 "const_int_operand""  i,  i, i,  i")
+ (match_operand 8 "const_int_operand""  i,  i, i,  i")
+ (reg:SI VL_REGNUM)
+ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (and:VI
+ (match_operand:VI 3 "register_operand"   "vr, vr,vr, vr")
+ (not:
+   (match_operand: 4 "register_operand"  " r,  r, r,  r")))
+  (match_operand:VI 2 "vector_merge_operand"  "vu, vu, 0,  0")))]
+  "TARGET_ZVBB || TARGET_ZVKB"
+  "vandn.vx\t%0,%3,%4%p1"
+  [(set_attr "type" "vandn")
+   (set_attr "mode" "")])
Is the handling for EEW = 64 on RV32 systems missing?
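
For reference, a C sketch of what the vx form computes in the SEW = 64 case
the question is about (reference semantics only, not the GCC pattern;
vandn_vx_ref and its parameters are made-up names):

#include <stdint.h>

/* vandn.vx: vd[i] = vs2[i] & ~rs1, with the scalar extended to the
   element width first.  On RV32 the scalar register is only 32 bits,
   so SEW = 64 needs an explicit sign-extension (or a different
   strategy), which is the special handling being asked about.  */
static void
vandn_vx_ref (uint64_t *vd, const uint64_t *vs2, int32_t rs1, int vl)
{
  uint64_t s = (uint64_t) (int64_t) rs1; /* the RV32, EEW = 64 step */
  for (int i = 0; i < vl; i++)
    vd[i] = vs2[i] & ~s;
}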





juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2023-12-13 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; zhusonghe; panciyan; Feng Wang
Subject: [PATCH v3 3/4] RISC-V: Add crypto machine descriptions
Patch v3: Modify constraints for the crypto vector insns.
Patch v2: Add the crypto vector insns into the RATIO attr and use vr as
the destination register.
 
This patch adds the crypto machine descriptions (vector-crypto.md) and
some new iterators which are used by the crypto vector extensions.
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
 
gcc/ChangeLog:
 
* config/riscv/iterators.md: Add rotate insn name.
* config/riscv/riscv.md: Add new insns name for crypto vector.
* config/riscv/vector-iterators.md: Add new iterators for crypto vector.
* config/riscv/vector.md: Add the corresponding attr for crypto vector.
* config/riscv/vector-crypto.md: New file; the machine descriptions for
crypto vector.
---
gcc/config/riscv/iterators.md|   4 +-
gcc/config/riscv/riscv.md|  33 +-
gcc/config/riscv/vector-crypto.md| 502 +++
gcc/config/riscv/vector-iterators.md |  41 +++
gcc/config/riscv/vector.md   |  55 ++-
5 files changed, 614 insertions(+), 21 deletions(-)
create mode 100755 gcc/config/riscv/vector-crypto.md
 
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index ecf033f2fa7..f332fba7031 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -304,7 +304,9 @@
(umax "maxu")
(clz "clz")
(ctz "ctz")
- (popcount "cpop")])
+ (popcount "cpop")
+ (rotate "rol")
+ (rotatert "ror")])
;; ---
;; Int Iterators.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index eed997116b0..572ad381d65 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -427,6 +427,34 @@
;; vcompressvector compress instruction
;; vmov whole vector register move
;; vector   unknown vector instruction
+;; 17. Crypto Vector instructions
+;; vandn    crypto vector bitwise and-not instructions
+;; vbrev    crypto vector reverse bits in elements instructions
+;; vbrev8   crypto vector reverse bits in bytes instructions
+;; vrev8    crypto vector reverse bytes instructions
+;; vclz     crypto vector count leading zeros instructions
+;; vctz     crypto vector count trailing zeros instructions
+;; vrol     crypto vector rotate left instructions
+;; vror     crypto vector rotate right instructions
+;; vwsll    crypto vector widening shift left logical instructions
+;; vclmul   crypto vector carry-less multiply - return low half instructions
+;; vclmulh  crypto vector carry-less multiply - return high half instructions
+;; vghsh    crypto vector add-multiply over GHASH Galois-Field instructions
+;; vgmul    crypto vector multiply over GHASH Galois-Field instructions
+;; vaesef   crypto vector AES final-round encryption instructions
+;; vaesem   crypto vector AES middle-round encryption instructions
+;; vaesdf   crypto vector AES final-round decryption instructions
+;; vaesdm   crypto vector AES middle-round decryption instructions
+;; vaeskf1  crypto vector AES-128 Forward KeySchedule generation instructions
+;; vaeskf2  crypto vector AES-256 Forward KeySchedule generation instructions
+;; vaesz    crypto vector AES round zero encryption/decryption instructions
+;; vsha2ms  crypto vector SHA-2 message schedule instructions
+;; vsha2ch  crypto vector SHA-2 two rounds of compression instructions
+;; vsha2cl  crypto vector SHA-2 two rounds of compression instructions
+;; vsm4k    crypto vector SM4 KeyExpansion instructions
+;; vsm4r    crypto vector SM4 Rounds instructions
+;; vsm3me   crypto vector SM3 Message Expansion instructions
+;; vsm3c    crypto vector SM3 Compression instructions
(define_attr "type"
   

[PATCH] c++: Fix tinst_level::to_list [PR112968]

2023-12-13 Thread Jakub Jelinek
Hi!

With valgrind checking, there are various errors reported on some C++26
libstdc++ tests, like:
==2009913== Conditional jump or move depends on uninitialised value(s)
==2009913==    at 0x914C59: gt_ggc_mx_lang_tree_node(void*) (gt-cp-tree.h:107)
==2009913==    by 0x8AB7A5: gt_ggc_mx_tinst_level(void*) (gt-cp-pt.h:32)
==2009913==    by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const*) (ggc-common.cc:75)
==2009913==    by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104)
==2009913==    by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227)
==2009913==    by 0xDB70F6: execute_one_pass(opt_pass*) (passes.cc:2738)
==2009913==    by 0xDB721F: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==2009913==    by 0xDB7258: execute_pass_list(function*, opt_pass*) (passes.cc:2766)
==2009913==    by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695)
==2009913==    by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248)
==2009913==    by 0xA5890D: symbol_table::finalize_compilation_unit() (cgraphunit.cc:2555)
==2009913==    by 0xEB02A1: compile_file() (toplev.cc:473)

I think the problem is in the tinst_level::to_list optimization from 2018.
That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in.
Either it freshly allocates using build_tree_list (NULL, NULL) and stores
TREE_PURPOSE/TREE_VALUE; that case is fine (the whole tree_list object
is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE
modified later; the above also means in particular that TREE_TYPE is NULL
and TREE_CHAIN is NULL, and both are accessible/initialized even in the
valgrind annotations).
Or it grabs a TREE_LIST node from a freelist.
If defined(ENABLE_GC_CHECKING), the object is still all zeros except
for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case
(but unlike the build_tree_list case in the valgrind annotations
TREE_TYPE and TREE_CHAIN are marked as uninitialized).
If !defined(ENABLE_GC_CHECKING), I believe the actual memory content
is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is
zeros and TREE_CHAIN is something random (whatever next entry is in the
freelist, nothing overwrote it) and from valgrind POV again,
TREE_TYPE and TREE_CHAIN are marked as uninitialized.
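
To see why TREE_CHAIN is the odd one out, here is a minimal C sketch of the
underlying intrusive-freelist pattern (hypothetical node type and names, not
the actual pt.cc code):

/* The chain field doubles as the freelist's next pointer, so a node
   popped off the freelist still holds the address of the next free
   node in it -- stale data the caller must overwrite, which is the
   TREE_CHAIN situation described above.  */
struct node { struct node *chain; void *purpose, *value; };

static struct node *freelist_head;

static void
node_push (struct node *n)
{
  n->chain = freelist_head; /* reuse the field as the next pointer */
  freelist_head = n;
}

static struct node *
node_pop (void)
{
  struct node *n = freelist_head;
  if (n)
    freelist_head = n->chain; /* n->chain now dangles into the freelist */
  return n;
}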

When using the other freelist instantiations (pending_template and
tinst_level), I believe everything is correct: from the valgrind POV it
marks the whole pending_template or tinst_level as uninitialized, but the
caller initializes it all.

One way to fix this would be let tinst_level::to_list not store just
  TREE_PURPOSE (ret) = tldcl;
  TREE_VALUE (ret) = targs;
but also
  TREE_TYPE (ret) = NULL_TREE;
  TREE_CHAIN (ret) = NULL_TREE;
Though, that seems like wasted effort in the build_tree_list case to me.

So, the following patch instead does that TREE_CHAIN = NULL_TREE store only
in the case where it isn't already done, and marks both TREE_CHAIN and
TREE_TYPE as initialized (the former is at that spot; the latter because we
never really touch TREE_TYPE of a TREE_LIST anywhere, and so the NULL gets
stored into the freelist and restored from there, except for
ENABLE_GC_CHECKING where it is poisoned and then cleared again).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-13  Jakub Jelinek  

PR c++/112968
* pt.cc (freelist::reinit): Make whole obj->common
defined for valgrind annotations rather than just obj->base,
and do it even for ENABLE_GC_CHECKING.  If not ENABLE_GC_CHECKING,
clear TREE_CHAIN (obj).

--- gcc/cp/pt.cc.jj 2023-12-11 23:52:03.592513063 +0100
+++ gcc/cp/pt.cc2023-12-12 16:40:09.259903877 +0100
@@ -9525,7 +9525,7 @@ template <>
 inline void
 freelist::reinit (tree obj ATTRIBUTE_UNUSED)
 {
-  tree_base *b ATTRIBUTE_UNUSED = &obj->base;
+  tree_common *c ATTRIBUTE_UNUSED = &obj->common;
 
 #ifdef ENABLE_GC_CHECKING
   gcc_checking_assert (TREE_CODE (obj) == TREE_LIST);
@@ -9540,8 +9540,9 @@ freelist::reinit (tree obj AT
 #ifdef ENABLE_GC_CHECKING
   TREE_SET_CODE (obj, TREE_LIST);
 #else
-  VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (b, sizeof (*b)));
+  TREE_CHAIN (obj) = NULL_TREE;
 #endif
+  VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (c, sizeof (*c)));
 }
 
 /* Point to the first object in the TREE_LIST freelist.  */

Jakub



[PATCH] lower-bitint: Fix lowering of non-_BitInt to _BitInt cast merged with some wider cast [PR112940]

2023-12-13 Thread Jakub Jelinek
Hi!

The following testcase ICEs because a PHI argument from the latch edge
uses an SSA_NAME set only in a conditionally executed block inside of
the loop.
This happens when we have some outer cast which lowers its operand several
times, under some condition with variable index, under different condition
with some constant index, otherwise something else, and then there is
an inner cast from non-_BitInt integer (or small/middle one).  Such cast
in certain conditions is emitted by initializing some SSA_NAMEs in the
initialization statements before loops (say for casts from <= limb size
precision by computing a SSA_NAME for the first limb and then extension
of it for the later limbs) and uses the prepare_data_in_out function
to create a PHI node.  Such function is passed the value (constant or
SSA_NAME) to use in the PHI argument from the pre-header edge, but for
the latch edge it always created a new SSA_NAME and then caller emitted
in the following 3 spots an extra assignment to set that SSA_NAME to
whatever value we want from the latch edge.  In all these 3 cases
the argument from the latch edge is known already before the loop though,
either constant or SSA_NAME computed in pre-header as well.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.

The following patch fixes it by extending the prepare_data_in_out method,
so that when the latch edge argument is known beforehand (constant or
computed in the pre-header), we can just use it directly and avoid the
extra assignment, which would normally hopefully be optimized away later
to what we now emit directly.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-13  Jakub Jelinek  

PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.

* gcc.dg/bitint-53.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2023-12-08 09:03:06.644475539 +0100
+++ gcc/gimple-lower-bitint.cc  2023-12-12 18:31:24.440394239 +0100
@@ -405,7 +405,7 @@ struct bitint_large_huge
 profile_probability, profile_probability,
 edge &, edge &, edge &);
   tree handle_operand (tree, tree);
-  tree prepare_data_in_out (tree, tree, tree *);
+  tree prepare_data_in_out (tree, tree, tree *, tree = NULL_TREE);
   tree add_cast (tree, tree);
   tree handle_plus_minus (tree_code, tree, tree, tree);
   tree handle_lshift (tree, tree, tree);
@@ -872,11 +872,8 @@ bitint_large_huge::handle_operand (tree
  gcc_assert (m_first);
  m_data.pop ();
  m_data.pop ();
- prepare_data_in_out (fold_convert (m_limb_type, op), idx, &out);
- g = gimple_build_assign (m_data[m_data_cnt + 1],
-  build_int_cst (m_limb_type, ext));
- insert_before (g);
- m_data[m_data_cnt + 1] = gimple_assign_rhs1 (g);
+ prepare_data_in_out (fold_convert (m_limb_type, op), idx, &out,
+  build_int_cst (m_limb_type, ext));
}
  else if (min_prec > prec - rem - 2 * limb_prec)
{
@@ -1008,10 +1005,13 @@ bitint_large_huge::handle_operand (tree
 
 /* Helper method, add a PHI node with VAL from preheader edge if
inside of a loop and m_first.  Keep state in a pair of m_data
-   elements.  */
+   elements.  If VAL_OUT is non-NULL, use that as PHI argument from
+   the latch edge, otherwise create a new SSA_NAME for it and let
+   caller initialize it.  */
 
 tree
-bitint_large_huge::prepare_data_in_out (tree val, tree idx, tree *data_out)
+bitint_large_huge::prepare_data_in_out (tree val, tree idx, tree *data_out,
+   tree val_out)
 {
   if (!m_first)
 {
@@ -1034,7 +1034,7 @@ bitint_large_huge::prepare_data_in_out (
   if (e1 == e2)
 e2 = EDGE_PRED (m_bb, 1);
   add_phi_arg (phi, val, e1, UNKNOWN_LOCATION);
-  tree out = make_ssa_name (TREE_TYPE (val));
+  tree out = val_out ? val_out : make_ssa_name (TREE_TYPE (val));
   add_phi_arg (phi, out, e2, UNKNOWN_LOCATION);
   m_data.safe_push (in);
   m_data.safe_push (out);
@@ -1541,14 +1541,10 @@ bitint_large_huge::handle_cast (tree lhs
  if (m_first)
{
  tree out1, out2;
- prepare_data_in_out (r1, idx, &out1);
- g = gimple_build_assign (m_data[m_data_cnt + 1],

Re: [PATCH v2 1/4] RISC-V:Add crypto vector implied ISA info.

2023-12-13 Thread Kito Cheng
LGTM

On Wed, Dec 13, 2023 at 5:14 PM Feng Wang  wrote:
>
> Patch v2: Change the implied ISA info using the minimum set and add
> dependency info into the python script.
>
> Because the crypto vector extensions depend on the Vector extension,
> the "v" info is added into the implied ISA info with the corresponding
> crypto vector extension.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Modify implied ISA info.
> * config/riscv/arch-canonicalize: Add crypto vector implied info.
> ---
>  gcc/common/config/riscv/riscv-common.cc |  9 +
>  gcc/config/riscv/arch-canonicalize  | 21 +++--
>  2 files changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 4d5a2f874a2..76987598143 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -145,6 +145,15 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"zvksc", "zvbc"},
>{"zvksg", "zvks"},
>{"zvksg", "zvkg"},
> +  {"zvbb",  "zvkb"},
> +  {"zvbc",   "zve64x"},
> +  {"zvkb",   "zve32x"},
> +  {"zvkg",   "zve32x"},
> +  {"zvkned", "zve32x"},
> +  {"zvknha", "zve32x"},
> +  {"zvknhb", "zve64x"},
> +  {"zvksed", "zve32x"},
> +  {"zvksh",  "zve32x"},
>
>{"zfh", "zfhmin"},
>{"zfhmin", "f"},
> diff --git a/gcc/config/riscv/arch-canonicalize 
> b/gcc/config/riscv/arch-canonicalize
> index ea2f67a0944..a8f47a1752b 100755
> --- a/gcc/config/riscv/arch-canonicalize
> +++ b/gcc/config/riscv/arch-canonicalize
> @@ -69,12 +69,21 @@ IMPLIED_EXT = {
>"zvl32768b" : ["zvl16384b"],
>"zvl65536b" : ["zvl32768b"],
>
> -  "zvkn" : ["zvkned", "zvknhb", "zvbb", "zvkt"],
> -  "zvknc" : ["zvkn", "zvbc"],
> -  "zvkng" : ["zvkn", "zvkg"],
> -  "zvks" : ["zvksed", "zvksh", "zvbb", "zvkt"],
> -  "zvksc" : ["zvks", "zvbc"],
> -  "zvksg" : ["zvks", "zvkg"],
> +  "zvkn"   : ["zvkned", "zvknhb", "zvkb", "zvkt"],
> +  "zvknc"  : ["zvkn", "zvbc"],
> +  "zvkng"  : ["zvkn", "zvkg"],
> +  "zvks"   : ["zvksed", "zvksh", "zvkb", "zvkt"],
> +  "zvksc"  : ["zvks", "zvbc"],
> +  "zvksg"  : ["zvks", "zvkg"],
> +  "zvbb"   : ["zvkb"],
> +  "zvbc"   : ["zve64x"],
> +  "zvkb"   : ["zve32x"],
> +  "zvkg"   : ["zve32x"],
> +  "zvkned" : ["zve32x"],
> +  "zvknha" : ["zve32x"],
> +  "zvknhb" : ["zve64x"],
> +  "zvksed" : ["zve32x"],
> +  "zvksh"  : ["zve32x"],
>  }
>
>  def arch_canonicalize(arch, isa_spec):
> --
> 2.17.1
>


Re: [PATCH] lower-bitint: Fix lowering of non-_BitInt to _BitInt cast merged with some wider cast [PR112940]

2023-12-13 Thread Richard Biener
On Wed, 13 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs because a PHI argument from the latch edge
> uses an SSA_NAME set only in a conditionally executed block inside of
> the loop.
> This happens when we have some outer cast which lowers its operand several
> times, under some condition with variable index, under different condition
> with some constant index, otherwise something else, and then there is
> an inner cast from non-_BitInt integer (or small/middle one).  Such cast
> in certain conditions is emitted by initializing some SSA_NAMEs in the
> initialization statements before loops (say for casts from <= limb size
> precision by computing a SSA_NAME for the first limb and then extension
> of it for the later limbs) and uses the prepare_data_in_out function
> to create a PHI node.  Such function is passed the value (constant or
> SSA_NAME) to use in the PHI argument from the pre-header edge, but for
> the latch edge it always created a new SSA_NAME and then caller emitted
> in the following 3 spots an extra assignment to set that SSA_NAME to
> whatever value we want from the latch edge.  In all these 3 cases
> the argument from the latch edge is known already before the loop though,
> either constant or SSA_NAME computed in pre-header as well.
> But the need to emit an assignment combined with the handle_operand done
> in a conditional basic block results in the SSA verification failure.
> 
> The following patch fixes it by extending the prepare_data_in_out method,
> so that when the latch edge argument is known beforehand (constant or
> computed in the pre-header), we can just use it directly and avoid the
> extra assignment, which would normally hopefully be optimized away later
> to what we now emit directly.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2023-12-13  Jakub Jelinek  
> 
>   PR tree-optimization/112940
>   * gimple-lower-bitint.cc (struct bitint_large_huge): Add another
>   argument to prepare_data_in_out method defaulted to NULL_TREE.
>   (bitint_large_huge::handle_operand): Pass another argument to
>   prepare_data_in_out instead of emitting an assignment to set it.
>   (bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
>   If non-NULL, use it as PHI argument instead of creating a new
>   SSA_NAME.
>   (bitint_large_huge::handle_cast): Pass rext as another argument
>   to 2 prepare_data_in_out calls instead of emitting assignments
>   to set them.
> 
>   * gcc.dg/bitint-53.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2023-12-08 09:03:06.644475539 +0100
> +++ gcc/gimple-lower-bitint.cc2023-12-12 18:31:24.440394239 +0100
> @@ -405,7 +405,7 @@ struct bitint_large_huge
>profile_probability, profile_probability,
>edge &, edge &, edge &);
>tree handle_operand (tree, tree);
> -  tree prepare_data_in_out (tree, tree, tree *);
> +  tree prepare_data_in_out (tree, tree, tree *, tree = NULL_TREE);
>tree add_cast (tree, tree);
>tree handle_plus_minus (tree_code, tree, tree, tree);
>tree handle_lshift (tree, tree, tree);
> @@ -872,11 +872,8 @@ bitint_large_huge::handle_operand (tree
> gcc_assert (m_first);
> m_data.pop ();
> m_data.pop ();
> -   prepare_data_in_out (fold_convert (m_limb_type, op), idx, &out);
> -   g = gimple_build_assign (m_data[m_data_cnt + 1],
> -build_int_cst (m_limb_type, ext));
> -   insert_before (g);
> -   m_data[m_data_cnt + 1] = gimple_assign_rhs1 (g);
> +   prepare_data_in_out (fold_convert (m_limb_type, op), idx, &out,
> +build_int_cst (m_limb_type, ext));
>   }
> else if (min_prec > prec - rem - 2 * limb_prec)
>   {
> @@ -1008,10 +1005,13 @@ bitint_large_huge::handle_operand (tree
>  
>  /* Helper method, add a PHI node with VAL from preheader edge if
> inside of a loop and m_first.  Keep state in a pair of m_data
> -   elements.  */
> +   elements.  If VAL_OUT is non-NULL, use that as PHI argument from
> +   the latch edge, otherwise create a new SSA_NAME for it and let
> +   caller initialize it.  */
>  
>  tree
> -bitint_large_huge::prepare_data_in_out (tree val, tree idx, tree *data_out)
> +bitint_large_huge::prepare_data_in_out (tree val, tree idx, tree *data_out,
> + tree val_out)
>  {
>if (!m_first)
>  {
> @@ -1034,7 +1034,7 @@ bitint_large_huge::prepare_data_in_out (
>if (e1 == e2)
>  e2 = EDGE_PRED (m_bb, 1);
>add_phi_arg (phi, val, e1, UNKNOWN_LOCATION);
> -  tree out = make_ssa_name (TREE_TYPE (val));
> +  tree out = val_out ? val_out : make_ssa_name (TREE_TYPE (val));
>add_phi_arg (phi, out, e2, UNKNOWN_LOCATION);
>m_data.safe_push (in);
>m_data.safe_push (out);
> @@ -1541,14 +1541,10 @@ bitint_large_huge::hand

Re: [PATCH] Middle-end: Adjust decrement IV style partial vectorization COST model

2023-12-13 Thread Richard Biener
On Wed, 13 Dec 2023, Juzhe-Zhong wrote:

> Hi, before this patch, RVV codegen for a simple conversion case was:
> 
> foo:
>         ble     a2,zero,.L8
>         addiw   a5,a2,-1
>         li      a4,6
>         bleu    a5,a4,.L6
>         srliw   a3,a2,3
>         slli    a3,a3,3
>         add     a3,a3,a0
>         mv      a5,a0
>         mv      a4,a1
>         vsetivli        zero,8,e16,m1,ta,ma
> .L4:
>         vle8.v  v2,0(a5)
>         addi    a5,a5,8
>         vzext.vf2       v1,v2
>         vse16.v v1,0(a4)
>         addi    a4,a4,16
>         bne     a3,a5,.L4
>         andi    a5,a2,-8
>         beq     a2,a5,.L10
> .L3:
>         slli    a4,a5,32
>         srli    a4,a4,32
>         subw    a2,a2,a5
>         slli    a2,a2,32
>         slli    a5,a4,1
>         srli    a2,a2,32
>         add     a0,a0,a4
>         add     a1,a1,a5
>         vsetvli zero,a2,e16,m1,ta,ma
>         vle8.v  v2,0(a0)
>         vzext.vf2       v1,v2
>         vse16.v v1,0(a1)
> .L8:
>         ret
> .L10:
>         ret
> .L6:
>         li      a5,0
>         j       .L3
> 
> This vectorization goes through the first loop:
> 
>         vsetivli        zero,8,e16,m1,ta,ma
> .L4:
>         vle8.v  v2,0(a5)
>         addi    a5,a5,8
>         vzext.vf2       v1,v2
>         vse16.v v1,0(a4)
>         addi    a4,a4,16
>         bne     a3,a5,.L4
> 
> Each iteration processes 8 elements.
> 
> For scalable vectorization on a CPU with VLEN >= 128 bits, this is fine
> when VLEN = 128.  But as soon as VLEN > 128 bits, it wastes CPU resources:
> with VLEN = 256 bits, for example, only half of the vector unit is working
> and the other half is idle.
> 
> After investigation, I realized that I forgot to adjust the COST for
> SELECT_VL.  So, adjust the COST for SELECT_VL style length vectorization
> from 3 to 2, since after this patch:
> 
> foo:
>         ble     a2,zero,.L5
> .L3:
>         vsetvli a5,a2,e16,m1,ta,ma   -> SELECT_VL cost.
>         vle8.v  v2,0(a0)
>         slli    a4,a5,1              -> additional shift of the SELECT_VL
>                                         outcome for memory address calculation.
>         vzext.vf2       v1,v2
>         sub     a2,a2,a5
>         vse16.v v1,0(a1)
>         add     a0,a0,a5
>         add     a1,a1,a4
>         bne     a2,zero,.L3
> .L5:
>         ret
> 
> This patch is a simple fix for something I previously forgot.
> 
> Ok for trunk ?

OK.

Richard.

> If not, I am going to adjust the cost in the backend cost model.
> 
>   PR target/111317
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust
> 	COST for decrement IV.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
> 
> ---
>  .../gcc.dg/vect/costmodel/riscv/rvv/pr111317.c  | 12 
>  gcc/tree-vect-loop.cc   | 17 ++---
>  2 files changed, 26 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> new file mode 100644
> index 000..d4bea242a9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param=riscv-autovec-lmul=m1" } */
> +
> +void
> +foo (char *__restrict a, short *__restrict b, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    b[i] = (short) a[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6261cd1be1d..19e38b8637b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4870,10 +4870,21 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> 	    if (partial_load_store_bias != 0)
> 	      body_stmts += 1;
>  
> -	/* Each may need two MINs and one MINUS to update lengths in body
> -	   for next iteration.  */
> +	unsigned int length_update_cost = 0;
> +	if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +	  /* For decrement IV style, we use a single SELECT_VL from the
> +	     beginning to calculate the number of elements that need to
> +	     be processed in the current iteration, and a SHIFT operation
> +	     to compute the next memory address instead of adding the
> +	     vectorization factor.  */
> +	  length_update_cost = 2;
> +	else
> +	  /* For increment IV style, each may need two MINs and one MINUS
> +	     to update lengths in the body for the next iteration.  */
> +	  length_update_cost = 3;
> +
> 	if (need_iterate_p)
> -	  body_stmts += 3 * num_vectors;
> +	  body_stmts += length_update_cost * num_vectors;
> 	}
>  
> 

Re: [PATCH v3 2/4] RISC-V: Add crypto vector builtin function.

2023-12-13 Thread juzhe.zh...@rivai.ai


+  multiple_p (GET_MODE_BITSIZE (e.arg_mode (0)),
+	      GET_MODE_BITSIZE (e.arg_mode (1)), &nunits);

Change it into gcc_assert (multiple_p (...))
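
In plain C terms, the suggestion is the usual assert-the-precondition
pattern (a sketch with made-up names, not GCC code):

#include <assert.h>

/* Instead of discarding the "is it an exact multiple?" result, fail
   loudly when the assumption doesn't hold -- the analogue of wrapping
   the call in gcc_assert (multiple_p (...)).  */
static unsigned
exact_ratio (unsigned a, unsigned b)
{
  assert (a % b == 0);
  return a / b;
}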

+/* A list of all Vector Crypto intrinsic functions.  */
+static function_group_info cryoto_function_groups[] = {
+#define DEF_RVV_FUNCTION(NAME, SHAPE, PREDS, OPS_INFO, AVAIL) \
+  {#NAME, &bases::NAME, &shapes::SHAPE, PREDS, OPS_INFO,\
+   riscv_vector_avail_ ## AVAIL},
+#include "riscv-vector-crypto-builtins-functions.def"
+};
Why do you add this?  I think it should belong in function_groups.

+  /* Define the crypto vector builtin functions.  */
+  for (unsigned int i = 0; i < ARRAY_SIZE (cryoto_function_groups); ++i)
+    {
+      function_group_info *f = &cryoto_function_groups[i];
+      if (f->avail && f->avail ())
+	builder.register_function_group (cryoto_function_groups[i]);
+    }


I think it should be:

for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
  if (avail)
    builder.register_function_group (function_groups[i]);




juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2023-12-13 17:12
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; zhusonghe; panciyan; Feng Wang
Subject: [PATCH v3 2/4] RISC-V: Add crypto vector builtin function.
Patch v3: Define a shape for vaesz and merge vector-crypto-types.def
into riscv-vector-builtins-types.def.
Patch v2: Optimize the function_shape class for crypto_vector.
 
This patch adds the intrinsic functions of the crypto vector extensions,
based on the intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-avail.h (AVAIL): Add AVAIL macro 
definition.
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto. 
(class b_reverse):Ditto. 
(class vwsll):   Ditto. 
(class clmul):   Ditto. 
(class vg_nhab):  Ditto. 
(class crypto_vv):Ditto. 
(class crypto_vi):Ditto. 
(class vaeskf2_vsm3c):Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(DEF_RVV_FUNCTION): Redefine DEF_RVV_FUNCTION macro for crypto vector.
(registered_function::overloaded_hash): Process size_t uimm for C overloaded
functions.
(handle_pragma_vector): Add registration for crypto vector.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
* config/riscv/t-riscv: Add building dependency files.
* config/riscv/riscv-vector-crypto-builtins-functions.def: New file.
---
.../riscv/riscv-vector-builtins-avail.h   |  10 +
.../riscv/riscv-vector-builtins-bases.cc  | 259 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
.../riscv/riscv-vector-builtins-types.def |  25 ++
gcc/config/riscv/riscv-vector-builtins.cc | 149 +-
gcc/config/riscv/riscv-vector-builtins.def|   1 +
gcc/config/riscv/riscv-vector-builtins.h  |   8 +
...riscv-vector-crypto-builtins-functions.def |  78 ++
gcc/config/riscv/t-riscv  |   1 +
11 files changed, 647 insertions(+), 3 deletions(-)
create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-functions.def
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-avail.h 
b/gcc/config/riscv/riscv-vector-builtins-avail.h
index b11a8bcbc7f..4079fa1423a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-avail.h
+++ b/gcc/config/riscv/riscv-vector-builtins-avail.h
@@ -8,5 +8,15 @@ namespace riscv_vector {
#define AVAIL(NAME, COND)  \
   static unsigned int riscv_vector_avail_##NAME(void) { return (COND); }
+AVAIL (zvbb, TARGET_ZVBB)
+AVAIL (zvbc, TARGET_ZVBC)
+AVAIL (zvkb_or_zvbb, TARGET_ZVKB || TARGET_ZVBB)
+AVAIL (zvkg, TARGET_ZVKG)
+AVAIL (zvkned, TARGET_ZVKNED)
+AVAIL (zvknha_or_zvknhb, TARGET_ZVKNHA || TARGET_ZVKNHB)
+AVAIL (zvknhb, TARGET_ZVKNHB)
+AVAIL (zvksed, TARGET_ZVKSED)
+AVAIL (zvksh, TARGET_ZVKSH)
+
} // namespace riscv_vector
#endif
diff --git a/gcc/c

Re: Re: [PATCH v2 1/4] RISC-V:Add crypto vector implied ISA info.

2023-12-13 Thread juzhe.zh...@rivai.ai
Hi, Kito.

The vector crypto ISA is ratified, but the intrinsics are not.

I wonder what the schedule for the vector crypto intrinsics is.

Will they be ratified before the GCC-14 release?  (I personally think
intrinsics stuff can be considered for merging until the end of GCC-14,
like when I pushed rvv-intrinsic v0.11 in GCC-13.)  Intrinsics stuff
should be very safe.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-12-13 18:09
To: Feng Wang
CC: gcc-patches; jeffreyalaw; juzhe.zhong; zhusonghe; panciyan
Subject: Re: [PATCH v2 1/4] RISC-V:Add crypto vector implied ISA info.
LGTM
 
On Wed, Dec 13, 2023 at 5:14 PM Feng Wang  wrote:
>
> Patch v2: Change the implied ISA info using the minimum set and add
> dependency info into the python script.
>
> Because the crypto vector extensions depend on the Vector extension,
> the "v" info is added into the implied ISA info with the corresponding
> crypto vector extension.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Modify implied ISA info.
> * config/riscv/arch-canonicalize: Add crypto vector implied info.
> ---
>  gcc/common/config/riscv/riscv-common.cc |  9 +
>  gcc/config/riscv/arch-canonicalize  | 21 +++--
>  2 files changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 4d5a2f874a2..76987598143 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -145,6 +145,15 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"zvksc", "zvbc"},
>{"zvksg", "zvks"},
>{"zvksg", "zvkg"},
> +  {"zvbb",  "zvkb"},
> +  {"zvbc",   "zve64x"},
> +  {"zvkb",   "zve32x"},
> +  {"zvkg",   "zve32x"},
> +  {"zvkned", "zve32x"},
> +  {"zvknha", "zve32x"},
> +  {"zvknhb", "zve64x"},
> +  {"zvksed", "zve32x"},
> +  {"zvksh",  "zve32x"},
>
>{"zfh", "zfhmin"},
>{"zfhmin", "f"},
> diff --git a/gcc/config/riscv/arch-canonicalize 
> b/gcc/config/riscv/arch-canonicalize
> index ea2f67a0944..a8f47a1752b 100755
> --- a/gcc/config/riscv/arch-canonicalize
> +++ b/gcc/config/riscv/arch-canonicalize
> @@ -69,12 +69,21 @@ IMPLIED_EXT = {
>"zvl32768b" : ["zvl16384b"],
>"zvl65536b" : ["zvl32768b"],
>
> -  "zvkn" : ["zvkned", "zvknhb", "zvbb", "zvkt"],
> -  "zvknc" : ["zvkn", "zvbc"],
> -  "zvkng" : ["zvkn", "zvkg"],
> -  "zvks" : ["zvksed", "zvksh", "zvbb", "zvkt"],
> -  "zvksc" : ["zvks", "zvbc"],
> -  "zvksg" : ["zvks", "zvkg"],
> +  "zvkn"   : ["zvkned", "zvknhb", "zvkb", "zvkt"],
> +  "zvknc"  : ["zvkn", "zvbc"],
> +  "zvkng"  : ["zvkn", "zvkg"],
> +  "zvks"   : ["zvksed", "zvksh", "zvkb", "zvkt"],
> +  "zvksc"  : ["zvks", "zvbc"],
> +  "zvksg"  : ["zvks", "zvkg"],
> +  "zvbb"   : ["zvkb"],
> +  "zvbc"   : ["zve64x"],
> +  "zvkb"   : ["zve32x"],
> +  "zvkg"   : ["zve32x"],
> +  "zvkned" : ["zve32x"],
> +  "zvknha" : ["zve32x"],
> +  "zvknhb" : ["zve64x"],
> +  "zvksed" : ["zve32x"],
> +  "zvksh"  : ["zve32x"],
>  }
>
>  def arch_canonicalize(arch, isa_spec):
> --
> 2.17.1
>
 


RE: [ARC PATCH] Add *extvsi_n_0 define_insn_and_split for PR 110717.

2023-12-13 Thread Claudiu Zissulescu
Hi Roger,

It looks good to me.

Thank you for your contribution,
Claudiu

-Original Message-
From: Roger Sayle  
Sent: Tuesday, December 5, 2023 4:00 PM
To: gcc-patches@gcc.gnu.org
Cc: 'Claudiu Zissulescu' 
Subject: [ARC PATCH] Add *extvsi_n_0 define_insn_and_split for PR 110717.


This patch improves the code generated for bitfield sign extensions on ARC
CPUs without a barrel shifter.


Compiling the following test case:

int foo(int x) { return (x<<27)>>27; }

with -O2 -mcpu=em, generates two loops:

foo:    mov     lp_count,27
        lp      2f
        add     r0,r0,r0
        nop
2:      # end single insn loop
        mov     lp_count,27
        lp      2f
        asr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]


and the closely related test case:

struct S { int a : 5; };
int bar (struct S *p) { return p->a; }

generates the slightly better:

bar:    ldb_s   r0,[r0]
        mov_s   r2,0    ;3
        add3    r0,r2,r0
        sexb_s  r0,r0
        asr_s   r0,r0
        asr_s   r0,r0
        j_s.d   [blink]
        asr_s   r0,r0

which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at most three 
instructions on ARC (without a barrel shifter) using the idiom 
((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"].  Using this, the sign extensions 
above on ARC's EM both become:

        bmsk_s  r0,r0,4
        xor     r0,r0,16
        sub     r0,r0,16

which takes about 3 cycles, compared to the ~112 cycles for the loops in foo.
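
The idiom is easy to check in C (a quick sketch for a parameterized field
width; sext and the test values below are mine, not part of the patch):

#include <assert.h>
#include <stdint.h>

/* Sign-extend the low WIDTH bits of x via ((x & mask) ^ msb) - msb:
   masking clears the upper bits, XOR-ing in the sign bit flips it, and
   the subtraction propagates the sign through the upper bits.  */
static int32_t
sext (int32_t x, int width)
{
  int32_t mask = ((int32_t) 1 << width) - 1; /* bmsk_s */
  int32_t msb = (int32_t) 1 << (width - 1);
  return ((x & mask) ^ msb) - msb;           /* xor; sub */
}

int
main (void)
{
  assert (sext (0x1f, 5) == -1);  /* all ones in a 5-bit field */
  assert (sext (0x0f, 5) == 15);  /* positive values unchanged */
  assert (sext (0x10, 5) == -16); /* just the sign bit */
  return 0;
}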


Tested with a cross-compiler to arc-linux hosted on x86_64, with no new 
(compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?


2023-12-05  Roger Sayle  

gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using a AND, XOR and MINUS sequence.

gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.


Thanks in advance,
Roger
--



Re: [PATCH v2 09/11] aarch64: Rewrite non-writeback ldp/stp patterns

2023-12-13 Thread Alex Coplan
On 12/12/2023 15:58, Richard Sandiford wrote:
> Alex Coplan  writes:
> > Hi,
> >
> > This is a v2 version which addresses feedback from Richard's review
> > here:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637648.html
> >
> > I'll reply inline to address specific comments.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > This patch overhauls the load/store pair patterns with two main goals:
> >
> > 1. Fixing a correctness issue (the current patterns are not RA-friendly).
> > 2. Allowing more flexibility in which operand modes are supported, and which
> >combinations of modes are allowed in the two arms of the load/store pair,
> >while reducing the number of patterns required both in the source and in
> >the generated code.
> >
> > The correctness issue (1) is due to the fact that the current patterns have
> > two independent memory operands tied together only by a predicate on the 
> > insns.
> > Since LRA only looks at the constraints, one of the memory operands can get
> > reloaded without the other one being changed, leading to the insn becoming
> > unrecognizable after reload.
> >
> > We fix this issue by changing the patterns such that they only ever have one
> > memory operand representing the entire pair.  For the store case, we use an
> > unspec to logically concatenate the register operands before storing them.
> > For the load case, we use unspecs to extract the "lanes" from the pair mem,
> > with the second occurrence of the mem matched using a match_dup (such that 
> > there
> > is still really only one memory operand as far as the RA is concerned).
> >
> > In terms of the modes used for the pair memory operands, we canonicalize
> > these to V2x4QImode, V2x8QImode, and V2x16QImode.  These modes have not
> > only the correct size but also correct alignment requirement for a
> > memory operand representing an entire load/store pair.  Unlike the other
> > two, V2x4QImode didn't previously exist, so had to be added with the
> > patch.
> >
> > As with the previous patch generalizing the writeback patterns, this
> > patch aims to be flexible in the combinations of modes supported by the
> > patterns without requiring a large number of generated patterns by using
> > distinct mode iterators.
> >
> > The new scheme means we only need a single (generated) pattern for each
> > load/store operation of a given operand size.  For the 4-byte and 8-byte
> > operand cases, we use the GPI iterator to synthesize the two patterns.
> > The 16-byte case is implemented as a separate pattern in the source (due
> > to only having a single possible alternative).
> >
> > Since the UNSPEC patterns can't be interpreted by the dwarf2cfi code,
> > we add REG_CFA_OFFSET notes to the store pair insns emitted by
> > aarch64_save_callee_saves, so that correct CFI information can still be
> > generated.  Furthermore, we now unconditionally generate these CFA
> > notes on frame-related insns emitted by aarch64_save_callee_saves.
> > This is done in case that the load/store pair pass forms these into
> > pairs, in which case the CFA notes would be needed.
> >
> > We also adjust the ldp/stp peepholes to generate the new form.  This is
> > done by switching the generation to use the
> > aarch64_gen_{load,store}_pair interface, making it easier to change the
> > form in the future if needed.  (Likewise, the upcoming aarch64
> > load/store pair pass also makes use of this interface).
> >
> > This patch also adds an "ldpstp" attribute to the non-writeback
> > load/store pair patterns, which is used by the post-RA load/store pair
> > pass to identify existing patterns and see if they can be promoted to
> > writeback variants.
> >
> > One potential concern with using unspecs for the patterns is that it can 
> > block
> > optimization by the generic RTL passes.  This patch series tries to mitigate
> > this in two ways:
> >  1. The pre-RA load/store pair pass runs very late in the pre-RA pipeline.
> >  2. A later patch in the series adjusts the aarch64 mem{cpy,set} expansion 
> > to
> > emit individual loads/stores instead of ldp/stp.  These should then be
> > formed back into load/store pairs much later in the RTL pipeline by the
> > new load/store pair pass.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-ldpstp.md: Abstract ldp/stp
> > representation from peepholes, allowing use of new form.
> > * config/aarch64/aarch64-modes.def (V2x4QImode): Define.
> > * config/aarch64/aarch64-protos.h
> > (aarch64_finish_ldpstp_peephole): Declare.
> > (aarch64_swap_ldrstr_operands): Delete declaration.
> > (aarch64_gen_load_pair): Adjust parameters.
> > (aarch64_gen_store_pair): Likewise.
> > * config/aarch64/aarch64-simd.md (load_pair):
> > Delete.
> > (vec_store_pair): Delete.
> > (load_pair): Delete.
> > (vec_store_pair): Del

[PATCH] Fix tests for gomp

2023-12-13 Thread Andre Vieira (lists)

Hi,

Apologies for the delay and this mixup.  I need to do something different.

This is to fix testisms initially introduced by:
commit f5fc001a84a7dbb942a6252b3162dd38b4aae311
Author: Andre Vieira 
Date:   Mon Dec 11 14:24:41 2023 +

aarch64: enable mixed-types for aarch64 simdclones

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/pr87887-1.c: Fixed test.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-1.c: Fixed test.
* testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.

OK for trunk? I was intending to commit as obvious, but jakub had made a 
comment about declare-simd-1.f90 so I thought it might be worth just 
sending it up to the mailing list first.


Kind regards,
Andre

diff --git a/gcc/testsuite/gcc.dg/gomp/pr87887-1.c b/gcc/testsuite/gcc.dg/gomp/pr87887-1.c
index 281898300c7794d862e62c70a83a33d5aaa8f89e..8b04ffd0809be4e6f5ab97c2e32e800edffbee4f 100644
--- a/gcc/testsuite/gcc.dg/gomp/pr87887-1.c
+++ b/gcc/testsuite/gcc.dg/gomp/pr87887-1.c
@@ -10,7 +10,6 @@ foo (int x)
 {
   return (struct S) { x };
 }
-/* { dg-warning "unsupported return type ‘struct S’ for ‘simd’ functions" "" { target aarch64*-*-* } .-4 } */
 
 #pragma omp declare simd
 int
@@ -18,7 +17,6 @@ bar (struct S x)
 {
   return x.n;
 }
-/* { dg-warning "unsupported argument type ‘struct S’ for ‘simd’ functions" "" { target aarch64*-*-* } .-4 } */
 
 #pragma omp declare simd uniform (x)
 int
diff --git a/gcc/testsuite/gcc.dg/gomp/pr89246-1.c b/gcc/testsuite/gcc.dg/gomp/pr89246-1.c
index 4a0fd74f0639b2832dcb9101e006d127568fbcbd..dfe629c1c6a51624cd94878c638606220cfe94eb 100644
--- a/gcc/testsuite/gcc.dg/gomp/pr89246-1.c
+++ b/gcc/testsuite/gcc.dg/gomp/pr89246-1.c
@@ -8,7 +8,6 @@ int foo (__int128 x)
 {
   return x;
 }
-/* { dg-warning "unsupported argument type ‘__int128’ for ‘simd’ functions" "" { target aarch64*-*-* } .-4 } */
 
 #pragma omp declare simd
 extern int bar (int x);
diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c b/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c
index f12244054bd46fa10e51cc3a688c4cf683689994..354078acd9f3073b8400621a0e7149aee571594b 100644
--- a/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c
+++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c
@@ -19,7 +19,6 @@ float setArray(float *a, float x, int k)
 /* { dg-final { scan-tree-dump "_ZGVnN2ua32vl_setArray" "optimized" { target aarch64*-*-* } } } */
 /* { dg-final { scan-tree-dump "_ZGVnN4ua32vl_setArray" "optimized" { target aarch64*-*-* } } } */
 /* { dg-final { scan-tree-dump "_ZGVnN2vvva32_addit" "optimized" { target aarch64*-*-* } } } */
-/* { dg-final { scan-tree-dump "_ZGVnN4vvva32_addit" "optimized" { target aarch64*-*-* } } } */
 /* { dg-final { scan-tree-dump "_ZGVnM2vl66u_addit" "optimized" { target aarch64*-*-* } } } */
 /* { dg-final { scan-tree-dump "_ZGVnM4vl66u_addit" "optimized" { target aarch64*-*-* } } } */
 
diff --git a/libgomp/testsuite/libgomp.c/declare-variant-1.c b/libgomp/testsuite/libgomp.c/declare-variant-1.c
index 6129f23a0f80585246957022d63608dc3a68f1ff..790e9374054fe3e0ae609796640ff295b61e8389 100644
--- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c
@@ -40,16 +40,17 @@ f04 (int a)
 int
 test1 (int x)
 {
-  /* At gimplification time, we can't decide yet which function to call.  */
-  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
+  /* At gimplification time, we can't decide yet which function to call for
+     x86_64 targets, given the f01 variant.  */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target x86_64-*-* } } } */
   /* After simd clones are created, the original non-clone test1 shall
      call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
      shall call f01 with score 8.  */
   /* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { aarch64*-*-* } } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { aarch64*-*-* } } } } */
   int a = f04 (x);
   int b = f04 (x);
   return a + b;
diff --git a/libgomp/testsuite/libgomp.fortran/declare-simd-1.f90 
b/libgomp/testsuite/libg

[r14-6468 Regression] FAIL: std/time/year/io.cc -std=gnu++26 execution test on Linux/x86_64

2023-12-13 Thread haochen.jiang
On Linux/x86_64,

a01462ae8bafa86e7df47a252917ba6899d587cf is the first bad commit
commit a01462ae8bafa86e7df47a252917ba6899d587cf
Author: Jonathan Wakely 
Date:   Mon Dec 11 15:33:59 2023 +

libstdc++: Fix std::format output of %C for negative years

caused

FAIL: std/time/year/io.cc  -std=gnu++20 execution test
FAIL: std/time/year/io.cc  -std=gnu++26 execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6468/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


[r14-6470 Regression] FAIL: g++.dg/pr112822.C -std=gnu++98 (test for excess errors) on Linux/x86_64

2023-12-13 Thread haochen.jiang
On Linux/x86_64,

788e0d48ec639d44294434f4f20ae94023c3759d is the first bad commit
commit 788e0d48ec639d44294434f4f20ae94023c3759d
Author: Peter Bergner 
Date:   Tue Dec 12 16:46:16 2023 -0600

testsuite: Add testcase for already fixed PR [PR112822]

caused

FAIL: g++.dg/pr112822.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/pr112822.C  -std=gnu++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6470/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr112822.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr112822.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr112822.C --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr112822.C --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


Re: [PATCH] Fix tests for gomp

2023-12-13 Thread Jakub Jelinek
On Wed, Dec 13, 2023 at 10:43:16AM +, Andre Vieira (lists) wrote:
> Hi,
> 
> Apologies for the delay and this mixup. I need to do something different
> 
> This is to fix testisms initially introduced by:
> commit f5fc001a84a7dbb942a6252b3162dd38b4aae311
> Author: Andre Vieira 
> Date:   Mon Dec 11 14:24:41 2023 +
> 
> aarch64: enable mixed-types for aarch64 simdclones
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/gomp/pr87887-1.c: Fixed test.
>   * gcc.dg/gomp/pr89246-1.c: Likewise.
>   * gcc.dg/gomp/simd-clones-2.c: Likewise.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.c/declare-variant-1.c: Fixed test.
>   * testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.
> 
> OK for trunk? I was intending to commit as obvious, but jakub had made a
> comment about declare-simd-1.f90 so I thought it might be worth just sending
> it up to the mailing list first.

> --- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
> +++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c
> @@ -40,16 +40,17 @@ f04 (int a)
>  int
>  test1 (int x)
>  {
> -  /* At gimplification time, we can't decide yet which function to call.  */
> -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
> +  /* At gimplification time, we can't decide yet which function to call for
> +     x86_64 targets, given the f01 variant.  */
> +  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target x86_64-*-* } } } */
>    /* After simd clones are created, the original non-clone test1 shall
>       call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
>       shall call f01 with score 8.  */
>    /* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
> -  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } } */
> -  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } } */
> -  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } } */
> -  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { aarch64*-*-* } } } } } */
> +  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
> +  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
> +  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
> +  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { aarch64*-*-* } } } } */

The changes in this test look all wrong.  The differences are
i?86-*-* x86_64-*-* (which can support avx512f isa) vs. other targets (which
can't).
So, there is nothing aarch64 specific in there and { target x86_64-*-* }
is also incorrect.  It should be simply
{ target i?86-*-* x86_64-*-* }
vs.
{ target { ! { i?86-*-* x86_64-*-* } } }
(never sure about the ! syntaxes).
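
For reference, a sketch of the directives with those selectors (counts
copied from the patch above; untested):

/* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target i?86-*-* x86_64-*-* } } } */
/* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { target i?86-*-* x86_64-*-* } } } */
/* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { ! { i?86-*-* x86_64-*-* } } } } } */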

The other changes LGTM.

Jakub



Re: [PATCH] Fix tests for gomp

2023-12-13 Thread Jakub Jelinek
On Wed, Dec 13, 2023 at 11:55:52AM +0100, Jakub Jelinek wrote:
> On Wed, Dec 13, 2023 at 10:43:16AM +, Andre Vieira (lists) wrote:
> > --- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
> > +++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c
> > @@ -40,16 +40,17 @@ f04 (int a)
> >  int
> >  test1 (int x)
> >  {
> > -  /* At gimplification time, we can't decide yet which function to call.  
> > */
> > -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
> > +  /* At gimplification time, we can't decide yet which function to call for
> > + x86_64 targets, given the f01 variant.  */
> > +  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target 
> > x86_64-*-* } } } */
> >/* After simd clones are created, the original non-clone test1 shall
> >   call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
> >   shall call f01 with score 8.  */
> >/* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
> > -  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { 
> > target { !aarch64*-*-* } } } } } */
> > -  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { 
> > target { !aarch64*-*-* } } } } } */
> > -  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { 
> > target { aarch64*-*-* } } } } } */
> > -  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { 
> > target { aarch64*-*-* } } } } } */
> > +  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { 
> > target { !aarch64*-*-* } } } } */
> > +  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { 
> > target { !aarch64*-*-* } } } } */
> > +  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { 
> > target { aarch64*-*-* } } } } */
> > +  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { 
> > target { aarch64*-*-* } } } } */
> 
> The changes in this test look all wrong.  The differences are
> i?86-*-* x86_64-*-* (which can support avx512f isa) vs. other targets (which
> can't).
> So, there is nothing aarch64 specific in there and { target x86_64-*-* }
> is also incorrect.  It should be simply
> { target i?86-*-* x86_64-*-* }
> vs.
> { target { ! { i?86-*-* x86_64-*-* } } }
> (never sure about the ! syntaxes).

Or even better, just make the whole test i?86-*-* x86_64-*-* specific.
It is really testing the details of how many clones are created and what is
called in them, so nothing in it applies to other architectures: how
many clones those have will differ, even if what exactly is called in them
stays the same.

Jakub



Re: [PATCH] Fix tests for gomp

2023-12-13 Thread Andre Vieira (lists)




On 13/12/2023 10:55, Jakub Jelinek wrote:

On Wed, Dec 13, 2023 at 10:43:16AM +, Andre Vieira (lists) wrote:

Hi,

Apologies for the delay and this mixup. I need to do something different

This is to fix testisms initially introduced by:
commit f5fc001a84a7dbb942a6252b3162dd38b4aae311
Author: Andre Vieira 
Date:   Mon Dec 11 14:24:41 2023 +

 aarch64: enable mixed-types for aarch64 simdclones

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/pr87887-1.c: Fixed test.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-1.c: Fixed test.
* testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.

OK for trunk? I was intending to commit as obvious, but jakub had made a
comment about declare-simd-1.f90 so I thought it might be worth just sending
it up to the mailing list first.



--- a/libgomp/testsuite/libgomp.c/declare-variant-1.c
+++ b/libgomp/testsuite/libgomp.c/declare-variant-1.c
@@ -40,16 +40,17 @@ f04 (int a)
  int
  test1 (int x)
  {
-  /* At gimplification time, we can't decide yet which function to call.  */
-  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
+  /* At gimplification time, we can't decide yet which function to call for
+ x86_64 targets, given the f01 variant.  */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target 
x86_64-*-* } } } */
/* After simd clones are created, the original non-clone test1 shall
   call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
   shall call f01 with score 8.  */
/* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { 
target { !aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { 
target { !aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { 
target { aarch64*-*-* } } } } } */
-  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { 
aarch64*-*-* } } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" { 
target { !aarch64*-*-* } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" { 
target { !aarch64*-*-* } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 10 "optimized" { 
target { aarch64*-*-* } } } } */
+  /* { dg-final { scan-ltrans-tree-dump-not "f01 \\\(x" "optimized" { target { 
aarch64*-*-* } } } } */


The changes in this test look all wrong.  The differences are
i?86-*-* x86_64-*-* (which can support avx512f isa) vs. other targets (which
can't).
So, there is nothing aarch64 specific in there and { target x86_64-*-* }
is also incorrect.  It should be simply
{ target i?86-*-* x86_64-*-* }
vs.
{ target { ! { i?86-*-* x86_64-*-* } } }
(never sure about the ! syntaxes).



Hmm, I think I understand what you are saying, but I'm not sure I 
agree. Before I enabled simdclone testing for aarch64, this test had 
no target selectors, so it checked the same things for all simdclone 
test targets, which seem to be x86 and amdgcn:


@@ -4321,7 +4321,8 @@ proc check_effective_target_vect_simd_clones { } {
 return [check_cached_effective_target_indexed vect_simd_clones {
   expr { (([istarget i?86-*-*] || [istarget x86_64-*-*])
  && [check_effective_target_avx512f])
|| [istarget amdgcn-*-*]
|| [istarget aarch64*-*-*] }}]
 }

I haven't checked what amdgcn does with this test, but I'd have to 
assume it was passing before? Though I'm not sure how amdgcn would 
pass the original:

 -  /* At gimplification time, we can't decide yet which function to call.  */
 -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */

I've added Andrew to the mail to see if he can comment on that. Either 
way, I'd suggest we either add scans per target with the expected values 
or stick with my original change of aarch64 vs non-aarch64, as I think 
that would better reflect the change of enabling this for aarch64 where 
it wasn't run before.


[PATCH] extend.texi: Fix typos in LSX intrinsics

2023-12-13 Thread Jiajie Chen
Several typos have been found and fixed: missing semicolons, a variable
name used instead of a type, and wrong types.

gcc/ChangeLog:

* doc/extend.texi (__lsx_vabsd_di): Remove extra `i' in name.
(__lsx_vfrintrm_d, __lsx_vfrintrm_s, __lsx_vfrintrne_d,
__lsx_vfrintrne_s, __lsx_vfrintrp_d, __lsx_vfrintrp_s, __lsx_vfrintrz_d,
__lsx_vfrintrz_s): Fix return types.
(__lsx_vld, __lsx_vldi, __lsx_vldrepl_b, __lsx_vldrepl_d,
__lsx_vldrepl_h, __lsx_vldrepl_w, __lsx_vmaxi_b, __lsx_vmaxi_d,
__lsx_vmaxi_h, __lsx_vmaxi_w, __lsx_vmini_b, __lsx_vmini_d,
__lsx_vmini_h, __lsx_vmini_w, __lsx_vsrani_d_q, __lsx_vsrarni_d_q,
__lsx_vsrlni_d_q, __lsx_vsrlrni_d_q, __lsx_vssrani_d_q, 
__lsx_vssrarni_d_q,
__lsx_vssrarni_du_q, __lsx_vssrlni_d_q, __lsx_vssrlrni_du_q, __lsx_vst,
__lsx_vstx): Add missing semicolons.
(__lsx_vpickve2gr_bu, __lsx_vpickve2gr_hu): Fix typo in return
type.
(__lsx_vstelm_b, __lsx_vstelm_d, __lsx_vstelm_h,
__lsx_vstelm_w): Use imm type for the last argument.
---
 gcc/doc/extend.texi | 80 ++---
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f0c789f6cb4..cb114f9cacf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17563,7 +17563,7 @@ int __lsx_bz_v (__m128i);
 int __lsx_bz_w (__m128i);
 __m128i __lsx_vabsd_b (__m128i, __m128i);
 __m128i __lsx_vabsd_bu (__m128i, __m128i);
-__m128i __lsx_vabsd_di (__m128i, __m128i);
+__m128i __lsx_vabsd_d (__m128i, __m128i);
 __m128i __lsx_vabsd_du (__m128i, __m128i);
 __m128i __lsx_vabsd_h (__m128i, __m128i);
 __m128i __lsx_vabsd_hu (__m128i, __m128i);
@@ -17769,14 +17769,14 @@ __m128 __lsx_vfnmsub_s (__m128, __m128, __m128);
 __m128d __lsx_vfrecip_d (__m128d);
 __m128 __lsx_vfrecip_s (__m128);
 __m128d __lsx_vfrint_d (__m128d);
-__m128i __lsx_vfrintrm_d (__m128d);
-__m128i __lsx_vfrintrm_s (__m128);
-__m128i __lsx_vfrintrne_d (__m128d);
-__m128i __lsx_vfrintrne_s (__m128);
-__m128i __lsx_vfrintrp_d (__m128d);
-__m128i __lsx_vfrintrp_s (__m128);
-__m128i __lsx_vfrintrz_d (__m128d);
-__m128i __lsx_vfrintrz_s (__m128);
+__m128d __lsx_vfrintrm_d (__m128d);
+__m128 __lsx_vfrintrm_s (__m128);
+__m128d __lsx_vfrintrne_d (__m128d);
+__m128 __lsx_vfrintrne_s (__m128);
+__m128d __lsx_vfrintrp_d (__m128d);
+__m128 __lsx_vfrintrp_s (__m128);
+__m128d __lsx_vfrintrz_d (__m128d);
+__m128 __lsx_vfrintrz_s (__m128);
 __m128 __lsx_vfrint_s (__m128);
 __m128d __lsx_vfrsqrt_d (__m128d);
 __m128 __lsx_vfrsqrt_s (__m128);
@@ -17845,12 +17845,12 @@ __m128i __lsx_vinsgr2vr_b (__m128i, int, imm0_15);
 __m128i __lsx_vinsgr2vr_d (__m128i, long int, imm0_1);
 __m128i __lsx_vinsgr2vr_h (__m128i, int, imm0_7);
 __m128i __lsx_vinsgr2vr_w (__m128i, int, imm0_3);
-__m128i __lsx_vld (void *, imm_n2048_2047)
-__m128i __lsx_vldi (imm_n1024_1023)
-__m128i __lsx_vldrepl_b (void *, imm_n2048_2047)
-__m128i __lsx_vldrepl_d (void *, imm_n256_255)
-__m128i __lsx_vldrepl_h (void *, imm_n1024_1023)
-__m128i __lsx_vldrepl_w (void *, imm_n512_511)
+__m128i __lsx_vld (void *, imm_n2048_2047);
+__m128i __lsx_vldi (imm_n1024_1023);
+__m128i __lsx_vldrepl_b (void *, imm_n2048_2047);
+__m128i __lsx_vldrepl_d (void *, imm_n256_255);
+__m128i __lsx_vldrepl_h (void *, imm_n1024_1023);
+__m128i __lsx_vldrepl_w (void *, imm_n512_511);
 __m128i __lsx_vldx (void *, long int);
 __m128i __lsx_vmadd_b (__m128i, __m128i, __m128i);
 __m128i __lsx_vmadd_d (__m128i, __m128i, __m128i);
@@ -17886,13 +17886,13 @@ __m128i __lsx_vmax_d (__m128i, __m128i);
 __m128i __lsx_vmax_du (__m128i, __m128i);
 __m128i __lsx_vmax_h (__m128i, __m128i);
 __m128i __lsx_vmax_hu (__m128i, __m128i);
-__m128i __lsx_vmaxi_b (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_b (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_bu (__m128i, imm0_31);
-__m128i __lsx_vmaxi_d (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_d (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_du (__m128i, imm0_31);
-__m128i __lsx_vmaxi_h (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_h (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_hu (__m128i, imm0_31);
-__m128i __lsx_vmaxi_w (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_w (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_wu (__m128i, imm0_31);
 __m128i __lsx_vmax_w (__m128i, __m128i);
 __m128i __lsx_vmax_wu (__m128i, __m128i);
@@ -17902,13 +17902,13 @@ __m128i __lsx_vmin_d (__m128i, __m128i);
 __m128i __lsx_vmin_du (__m128i, __m128i);
 __m128i __lsx_vmin_h (__m128i, __m128i);
 __m128i __lsx_vmin_hu (__m128i, __m128i);
-__m128i __lsx_vmini_b (__m128i, imm_n16_15)
+__m128i __lsx_vmini_b (__m128i, imm_n16_15);
 __m128i __lsx_vmini_bu (__m128i, imm0_31);
-__m128i __lsx_vmini_d (__m128i, imm_n16_15)
+__m128i __lsx_vmini_d (__m128i, imm_n16_15);
 __m128i __lsx_vmini_du (__m128i, imm0_31);
-__m128i __lsx_vmini_h (__m128i, imm_n16_15)
+__m128i __lsx_vmini_h (__m128i, imm_n16_15);
 __m128i __lsx_vmini_hu (__m128i, imm0_31);
-__m128i __lsx_vmini_w (__m128i, imm_n16_15)
+__m128i __lsx_vmini_w (__m128i, imm_n16_15);

Re: [PATCH] Fix tests for gomp

2023-12-13 Thread Jakub Jelinek
On Wed, Dec 13, 2023 at 11:03:50AM +, Andre Vieira (lists) wrote:
> Hmm I think I understand what you are saying, but I'm not sure I agree.
> So before I enabled simdclone testing for aarch64, this test had no target
> selectors. So it checked the same for 'all simdclone test targets'. Which
> seem to be x86 and amdgcn:
> 
> @@ -4321,7 +4321,8 @@ proc check_effective_target_vect_simd_clones { } {
>  return [check_cached_effective_target_indexed vect_simd_clones {
>expr { (([istarget i?86-*-*] || [istarget x86_64-*-*])
>   && [check_effective_target_avx512f])
> || [istarget amdgcn-*-*]
> || [istarget aarch64*-*-*] }}]
>  }
> 
> I haven't checked what amdgcn does with this test, but I'd have to assume
> they were passing before? Though I'm not sure how amdgcn would pass the
> original:
>  -  /* At gimplification time, we can't decide yet which function to call.
> */
>  -  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */

It can't really pass there.  amdgcn certainly doesn't create 4 different
simd clones where one has avx512f isa and others don't.
gcn creates just one simd clone with simdlen 64 and that clone will never
support avx512f isa and we know that already at gimplification time.

Jakub



[PATCH v2] RISC-V: Fix dynamic lmul tests depended on abi

2023-12-13 Thread demin . han
Some toolchain configs would report:
fatal error: gnu/stubs-ilp32.h: No such file or directory

The fix method was suggested by Juzhe-Zhong.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h: New file.

Signed-off-by: demin.han 
---
 .../gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h| 11 +++
 1 file changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h
new file mode 100644
index 000..fbb4858fc86
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h
@@ -0,0 +1,11 @@
+/* Wrapper for riscv_vector.h that prevents riscv_vector.h from including
+   stdint.h from the C library, which might cause problems when testing
+   RV32-related testcases with multilib disabled.  */
+#ifndef _RISCV_VECTOR_WRAP_H
+
+#define _GCC_WRAP_STDINT_H
+#include "stdint-gcc.h"
+#include_next <riscv_vector.h>
+#define _RISCV_VECTOR_WRAP_H
+
+#endif
-- 
2.43.0
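
As a side note, a hypothetical testcase (not part of the patch) showing
how the wrapper takes effect, assuming the harness searches the testcase
directory before the installed headers so that this file shadows the real
riscv_vector.h:

#include <riscv_vector.h>  /* resolves to the wrapper above, which pulls in
                              stdint-gcc.h instead of the C library's
                              stdint.h before chaining to the real header  */

vint32m1_t
double_it (vint32m1_t v, unsigned long vl)
{
  return __riscv_vadd_vv_i32m1 (v, v, vl);
}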



Re: [PATCH v2] RISC-V: Fix dynamic lmul tests depended on abi

2023-12-13 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 


Re: [PATCH v3] A new copy propagation and PHI elimination pass

2023-12-13 Thread Richard Biener
On Fri, 8 Dec 2023, Filip Kastl wrote:

> > > Hi,
> > > 
> > > this is a patch that I submitted two months ago as an RFC. I added some 
> > > polish
> > > since.
> > > 
> > > It is a new lightweight pass that removes redundant PHI functions and as a
> > > bonus does basic copy propagation. With Jan Hubička we measured that it 
> > > is able
> > > to remove usually more than 5% of all PHI functions when run among early 
> > > passes
> > > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > > later optimized away but with this pass it is possible to remove them 
> > > early
> > > enough so that they don't get streamed when runing LTO (and also 
> > > potentially
> > > inlined at multiple places). It is also able to remove some redundant PHIs
> > > that otherwise would still be present during RTL expansion.
> > > 
> > > Jakub Jelínek was concerned about debug info coverage so I compiled 
> > > cc1plus
> > > with and without this patch. These are the sizes of .debug_info and
> > > .debug_loclists
> > > 
> > > .debug_info without patch 181694311
> > > .debug_infowith patch 181692320
> > > +0.0011% change
> > > 
> > > .debug_loclists without patch 47934753
> > > .debug_loclists with patch 47934966
> > > -0.0004% change
> > > 
> > > I wanted to use dwlocstat to compare debug coverages but didn't manage to 
> > > get
> > > the program working on my machine sadly. Hope this suffices. Seems to me 
> > > that
> > > my patch doesn't have a significant impact on debug info.
> > > 
> > > Bootstraped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test 
> > > is
> > > dependent on a certain variable not being copy propagated. I will go into 
> > > more
> > > detail about this in a reply to this mail.
> > > 
> > > Ok to commit?
> > 
> > This is a second version of the patch.  In this version, I modified the
> > pr79691.c testcase so that it works as intended with other changes from the
> > patch.
> > 
> > The pr79691.c testcase checks that we get constants from snprintf calls and
> > that they simplify into a single constant.  The testcase doesn't account for
> > the fact that this constant may be further copy propagated which is exactly
> > what happens with this patch applied.
> > 
> > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > 
> > Ok to commit?
> 
> This is the third version of the patch. In this version, I addressed most of
> Richard's remarks about the second version. Here is a summary of changes I 
> made:
> 
> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
> - Use simple_dce_from_worklist to remove propagated statements
> - Use existing replace_uses API instead of reinventing it
>   - This allowed me to get rid of some now redundant cleanup code
> - Encapsulate the SCC finding into a class
> - Rework stmt_may_generate_copy to get rid of redundant checks
> - Add check that PHI doesn't contain two non-SSA-name values to
>   stmt_may_generate_copy
> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>   the conservative check that Richard suggested
> - Index array of vertices that SCC discovery uses by SSA name version numbers
>   instead of numbering statements myself.
> 
> 
> I didn't make any changes based on these remarks:
> 
> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>   many times these appear - possibly prevent them from even entering
>   the SCC discovery?
> 
> It would be nice. But the only way to do this that I see right now is to first
> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
> pass relies on having SCCs topologically sorted so this procedure would have to
> implement some topological sort algorithm.
> 
> This could be done. It could save allocating some vec<>s (right now, SCCs of
> size 1 are represented by a vec<> with a single element). But is it worth it 
> to
> optimize the pass this way right now? If possible, I'd like to see that the
> pass works and sort out any problems people encounter with it before I start
> optimizing it.
> 
> 2 Instead of collecting all stmts that may generate a copy at the beginning of
>   the pass into a vec<>, let the SCC discovery check that statements may
>   generate a copy on the fly.
> 
> This would be a big change to the pass, it would require a lot of reworking.
> I'm also not sure if this would help reduce the number of allocated vec<>s 
> that
> much because I'll still want to represent SCCs by vec<>s.
> 
> Again - its possible I'll want to rework the pass in this way in the future 
> but
> I'd like to leave it as it is for now.
> 
> 3 Add a comment saying that the pass is doing optimistic copy propagation
> 
> I don't think the pass works in an optimistic way. It doesn't assume that all
> variables are copies of each other at any point. It instead identifies copy
> stat

RE: [PATCH v2] RISC-V: Fix dynamic lmul tests depended on abi

2023-12-13 Thread Li, Pan2
Committed, thanks all.

Pan





Re: [PATCH] [ICE] Support vpcmov for V4HF/V4BF/V2HF/V2BF under TARGET_XOP.

2023-12-13 Thread Jakub Jelinek
On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.
> 
> gcc/ChangeLog:
> 
>   PR target/112904
>   * config/i386/mmx.md (*xop_pcmov_): New define_insn.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/pr112904.C: New test.

The new test FAILs on i686-linux, and even on x86_64-linux I think
it doesn't actually test what was reported unless one performs testing
with -march= for some XOP-enabled CPU or with -mxop.

The following patch fixes that, tested on x86_64-linux with
make check-g++ 
RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} 
i386.exp=pr112904.C'
Ok for trunk?

2023-12-13  Jakub Jelinek  

* g++.target/i386/pr112904.C: Add dg-do compile, dg-options -mxop
and for ia32 also dg-additional-options -mmmx.

--- gcc/testsuite/g++.target/i386/pr112904.C.jj 2023-12-11 08:31:59.001938798 
+0100
+++ gcc/testsuite/g++.target/i386/pr112904.C2023-12-13 12:54:50.318521637 
+0100
@@ -1,3 +1,8 @@
+// PR target/112904
+// { dg-do compile }
+// { dg-options "-mxop" }
+// { dg-additional-options "-mmmx" { target ia32 } }
+
 typedef _Float16 v4hf __attribute__((vector_size(8)));
 typedef short v4hi __attribute__((vector_size(8)));
 typedef _Float16 v2hf __attribute__((vector_size(4)));


Jakub
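
For context, a guess at the shape of code that exercises the new pattern
(the reduced testcase in the PR may differ; vpcmov is XOP's vector
conditional move):

typedef _Float16 v4hf __attribute__((vector_size(8)));
typedef short v4hi __attribute__((vector_size(8)));

/* Select between two half-float vectors under an integer mask; with
   -mxop this kind of vector ?: can map to vpcmov.  */
v4hf
sel (v4hi m, v4hf a, v4hf b)
{
  return m ? a : b;
}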



Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread Robin Dapp
Hi Juzhe,

in general looks OK to me.

Just a question for understanding:

> -  if (header_info.valid_p ()
> -   && (anticipated_exp_p (header_info) || block_info.full_available))

Why is full_available true if we cannot use it?

> +/* { dg-do compile } */

It would be nice if we could make this a run test as well.

Regards
 Robin


Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread juzhe.zhong
I didn't choose a run test since I didn't see the issue when running on my local simulator, no matter whether qemu or spike, so it's better to check the vsetvl asm. full_available is not consistent between LCM analysis and earliest fusion, so it's safe to postpone it.




Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread Robin Dapp


> I didn't choose a run test since I didn't see the issue when running on
> my local simulator, no matter whether qemu or spike.

Yes it was flaky.  That's kind of expected with the out-of-bounds
writes we did.  They can depend on runtime environment and other
factors.  Of course it's a bit counterintuitive to add a (before)
passing test but, with the proper comment, if it ever FAILed at some
point in the future we'd have a point of reference.  

Regards
 Robin


Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread juzhe.zhong
Do you mean adding some comments to the tests?



Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-13 Thread Robin Dapp
Thanks.  The attached v2 goes with your suggestion and adds a
vec_extractbi expander.  Apart from that it keeps the
MODE_PRECISION changes from before and uses
insn_data[icode].operand[0]'s mode.

Otherwise there are no changes on the riscv side.

Bootstrapped and regtested on x86 and aarch64.  On cfarm185 (aarch64)
I saw several segmentation faults that were only internal compiler
errors in the unpatched run, but they didn't seem related to my patch
at all.  It looks like several aarch64 testsuite patches have
landed in the past days, so I can re-try tomorrow or so.
Regtested on riscv64.

Regards
 Robin

Subject: [PATCH v2] expmed: Use GET_MODE_PRECISION and expander's output mode.

This changes the vec_extract path of extract_bit_field to use
GET_MODE_PRECISION instead of GET_MODE_BITSIZE and uses
the mode obtained from insn_data[icode].operand[0] as target mode.

Also, it adds a vec_extractbi expander for riscv that maps
to vec_extractqi.  This fixes an ICE on riscv where we did
not find a vec_extract optab and continued with the generic code
which requires 1-byte alignment that riscv mask modes do not provide.

Apart from that it adds poly_int support to riscv's vec_extract
expander and makes the RVV..BImode -> QImode expander call
emit_vec_extract in order not to duplicate code.
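
A guess at the kind of loop that hits this path (the committed testcase
pr112773.c is not reproduced here): the comparison result that is live
after the loop must be extracted from a mask vector, i.e. a vec_extract
on a BImode vector mode.

int
last_cmp (int *a, int *b, int n)
{
  int last = 0;
  for (int i = 0; i < n; i++)
    last = a[i] < b[i];   /* live-out: extract the final mask element */
  return last;
}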

gcc/ChangeLog:

PR target/112773

* config/riscv/autovec.md (vec_extractbi): New expander
calling vec_extractqi.
* config/riscv/riscv-protos.h (riscv_legitimize_poly_move):
Export.
(emit_vec_extract): Change argument from poly_int64 to rtx.
* config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns):
Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Export.
(riscv_legitimize_move): Use rtx instead of poly_int64.
* expmed.cc (store_bit_field_1): Change BITSIZE to PRECISION.
(extract_bit_field_1): Change BITSIZE to PRECISION and use
return mode from insn_data as target mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/pr112773.c: New test.
---
 gcc/config/riscv/autovec.md   | 46 +--
 gcc/config/riscv/riscv-protos.h   |  3 +-
 gcc/config/riscv/riscv-v.cc   | 14 +++---
 gcc/config/riscv/riscv.cc |  6 +--
 gcc/expmed.cc | 18 +---
 .../riscv/rvv/autovec/partial/pr112773.c  | 20 
 6 files changed, 75 insertions(+), 32 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 55d3ae50c8b..8b8a92f10a1 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1396,12 +1396,23 @@ (define_expand "vec_extract"
   rtx tmp = NULL_RTX;
   if (operands[2] != const0_rtx)
 {
-  /* Emit the slide down to index 0 in a new vector.  */
-  tmp = gen_reg_rtx (mode);
-  operands[2] = gen_lowpart (Pmode, operands[2]);
-  rtx ops[] = {tmp, operands[1], operands[2]};
-  riscv_vector::emit_vlmax_insn
-   (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode), 
riscv_vector::BINARY_OP, ops);
+  /* Properly convert a poly_int value and put the result into a
+register.  */
+  if (CONST_POLY_INT_P (operands[2]))
+   {
+ rtx pos = gen_reg_rtx (Pmode);
+ riscv_legitimize_poly_move (Pmode, pos, gen_reg_rtx (Pmode),
+ operands[2]);
+ operands[2] = pos;
+   }
+
+/* Emit the slide down to index 0 in a new vector.  */
+tmp = gen_reg_rtx (mode);
+operands[2] = gen_lowpart (Pmode, operands[2]);
+rtx ops[] = {tmp, operands[1], operands[2]};
+riscv_vector::emit_vlmax_insn
+  (code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode),
+   riscv_vector::BINARY_OP, ops);
 }
 
   /* Emit v(f)mv.[xf].s.  */
@@ -1433,16 +1444,21 @@ (define_expand "vec_extractqi"
   riscv_vector::emit_vlmax_insn (code_for_pred_merge (qimode),
 riscv_vector::MERGE_OP, ops1);
 
-  /* Slide down the requested byte element.  */
-  rtx tmp2 = gen_reg_rtx (qimode);
-
-  rtx ops2[] = {tmp2, tmp1, operands[2]};
-  riscv_vector::emit_vlmax_insn
-(code_for_pred_slide (UNSPEC_VSLIDEDOWN, qimode),
- riscv_vector::BINARY_OP, ops2);
+  /* Extract from it.  */
+  riscv_vector::emit_vec_extract (operands[0], tmp1, operands[2]);
+  DONE;
+})
 
-  /* Extract it.  */
-  emit_insn (gen_pred_extract_first (qimode, operands[0], tmp2));
+;; Same for a BImode but still return a QImode.
+(define_expand "vec_extractbi"
+  [(set (match_operand:QI0 "register_operand")
+ (vec_select:QI
+   (match_operand:VB 1 "register_operand")
+   (parallel
+[(match_operand  2 "nonmemory_operand")])))]
+  "TARGET_VECTOR"
+{
+  emit_insn (gen_vec_extractqi (operands[0], operands[1], operands[2]));
   DONE;
 })
 
diff --git 

Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]

2023-12-13 Thread chenglulu



On 2023/12/10 at 1:03 AM, Xi Ruoyao wrote:

Replace the instruction costs in loongarch_rtx_cost_data constructor
based on micro-benchmark results on LA464 and LA664.

This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
and slli.
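
To illustrate the decompositions (a sketch; this assumes alsl.d computes
(rj << sa) + rk with sa in 1..4, and the register names are illustrative):

/* x * 17 = (x << 4) + x         ->  alsl.d  $a0, $a0, $a0, 4
   x * 68 = ((x << 4) + x) << 2  ->  alsl.d  $a0, $a0, $a0, 4
                                     slli.d  $a0, $a0, 2        */
long mul17 (long x) { return x * 17; }
long mul68 (long x) { return x * 68; }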

gcc/ChangeLog:

PR target/112936
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
instruction costs per micro-benchmark results.
(loongarch_rtx_cost_optimize_size): Set all instruction costs
to (COSTS_N_INSNS (1) + 1).
* config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
special case for multiplication when optimizing for size.
Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
Account the extra cost when TARGET_CHECK_ZERO_DIV and
optimizing for speed.

gcc/testsuite/ChangeLog

PR target/112936
* gcc.target/loongarch/mul-const-reduction.c: New test.
---
  gcc/config/loongarch/loongarch-def.cc | 39 ++-
  gcc/config/loongarch/loongarch.cc | 22 +--
  .../loongarch/mul-const-reduction.c   | 11 ++
  3 files changed, 43 insertions(+), 29 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c


Well, I'm curious about how these cost values were obtained.



Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread Robin Dapp
> Do you mean adding some comments to the tests?

I meant add it as a run test as well and comment that the test
has caused out-of-bounds writes before and passed by the time of
adding it (or so) and is kept regardless.

Regards
 Robin


Re: [PATCH] RISC-V: Postpone full available optimization [VSETVL PASS]

2023-12-13 Thread juzhe.zhong
OK, will add it later.




[PATCH][0/6][RFC] Relax single-vector-size restriction

2023-12-13 Thread Richard Biener


I've been asked to look into how to best relax the current restriction
of the vectorizer that it prefers to use a single vector size throughout
loop vectorization.  That size is determined by the preferred_simd_mode
and the autovectorize_vector_modes hook for other-than-first iterations.

The target does have some leeway with its related_mode hook which you
can see in the aarch64 backend which has a "hack" prefering
"1 128-bit vector instead of 2 64-bit vectors" (for ADVSIMD).  
Incidentally that hack allows it to vectorize gcc.dg/vect/pr65947-7.c
which uses a condition reduction that is generally unhappy about the
ncopies > 1 case.

The first roadblock you hit when trying to relax things is that we
are assigning vector types very early - during data reference
analysis and during pattern matching and then for the rest of stmts
as part of determining the vectorization factor.

The patch series starts pushing that back (with some exceptions - it's
a proof-of-concept), trying to get us to the point of determining
the vectorization factor to use and only after that assign vector
types (with that VF as one of the constraints).  In particular the
patch tries to avoid altering the VF choice as we're still iterating
over the SIMD modes (possibly iterating over { VF, mode } pairs
where 'mode' would be VLA or VLS might be a future improvement).

Apart from gcc.dg/vect/pr65947-7.c which I'd like to see vectorized on
x86_64 there is a motivational testcase like

double x[1024];
char y[1024]; 
void foo  ()
{
  for (int i = 0 ; i < 16; ++i) 
{
  x[i] = i;
  y[i] = i;
}
}

where the iteration domain constrains the VF and we currently end
up vectorizing this with SSE vectors, causing 8 vector stores to x[]
even when AVX2 or AVX512 widths would be available there.
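
Back-of-the-envelope for this loop with VF = 16 (a sketch; mode names as
on x86):

/* x[]: 16 doubles = 128 bytes -> 8 V2DFmode stores (SSE),
                                  vs. only 2 V8DFmode stores (AVX512)
   y[]: 16 chars   =  16 bytes -> a single V16QImode store (SSE)  */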

After a lot of different experiments I finally settled on the following
minimal impact solution - after determining the VF we assign vector
types but allow larger than the current active vector modes up to
the size of the mode of the preferred_simd_mode when that stays within
the constraint of the VF.  For the second example above on x86
with -march=znver4 we then fail to vectorize with V64QImode
(AVX512, the preferred_simd_mode) and with V32QImode (AVX2) because
of the low iteration count, but we succeed with V16QImode (SSE, as
with current GCC) and are able to choose V8DFmode for the accesses
to x[] (AVX512, the preferred_simd_mode).  The condition reduction
case works in a similar way - with just SSE we succeed with V4HImode
but use V4SImode for the condition, keeping ncopies == 1 and making
the vectorizer happy.

The patch series prototypes this for non-SLP loop vectorization
(because the testcases above do not use SLP) - the prototype doesn't
pass testing and I won't pursue this further until we get rid of
the non-SLP path.

The series starts with some cleanups that might still be applicable
though, reducing calls to get_vectype_for_scalar_type where the
vector types should be known already (all of the constant/external
def kinds will go away with SLP-only anyway).  Then as I first
tried to vary VF it makes LOOP_VINFO_VECT_FACTOR an rvalue to
make sure we're nowhere rely on its value before it's really
final.

Gathers/scatters also complicate matters right now since we're
analyzing them very early (and that analysis needs a vector type),
but the actual offset def we need to mark relevant is tightly
coupled with the vector type chosen for it (and what the target
actually supports).  That's going to be tricky.  I also noticed
that we might no longer need the gather/scatter pattern support
as SLP can handle them without the IFNs(?)  There is some general
API cleanup wrt. unsigned vs. poly-uint, and finally the last
patch in the series defers setting STMT_VINFO_VECTYPE (with
exceptions, as I said) and has a cobbled-up loop to assign
vector types after the VF is determined with the above-described
scheme.

There's complication around mask types, so the patch goes one
step further and makes vectorizable_operation determine
the vector type of the def from the vector types of the
operands.  I think that in the end we want to "force" as few
vector types as possible and perform upward/downward propagation
from within vectorizable_* which would need a new mode of
operation for this (figure either output or input vector types
from what is present, possibly signaling DEFER and queuing
either uses of the output or the fixed inputs for further
processing).

I'd like to get some feedback on the way I chose to wire the
new flexibility into the existing mode iteration and whether
that's sound both for SVE/NEON or whether any of you have
concerns around that or ideas how to instead exploit such
flexibility.

Thanks,
Richard.


[PATCH 1/6] Reduce the number of get_vectype_for_scalar_type calls

2023-12-13 Thread Richard Biener
The following removes get_vectype_for_scalar_type calls when we
already have the vector type computed.  It also avoids some
premature and possibly redundant or unnecessary checks during
data-ref analysis for gathers.

* tree-vect-data-refs.cc (vect_analyze_data_refs): Do
not check for a vector type for gather/scatter offset.
vect_check_gather_scatter does that already.
* tree-vect-loop.cc (get_initial_def_for_reduction): Use
the computed vector type.
* tree-vect-stmts.cc (vectorizable_operation): Use the
known vector type for constants/externs.
---
 gcc/tree-vect-data-refs.cc | 4 +---
 gcc/tree-vect-loop.cc  | 2 +-
 gcc/tree-vect-stmts.cc | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2..107dffe0a64 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4721,9 +4721,7 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 
*min_vf, bool *fatal)
  gather_scatter_info gs_info;
  if (!vect_check_gather_scatter (stmt_info,
  as_a  (vinfo),
- &gs_info)
- || !get_vectype_for_scalar_type (vinfo,
-  TREE_TYPE (gs_info.offset)))
+ &gs_info))
{
  if (fatal)
*fatal = false;
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6261cd1be1d..3a0731f3bea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5466,7 +5466,7 @@ get_initial_def_for_reduction (loop_vec_info loop_vinfo,
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree scalar_type = TREE_TYPE (init_val);
-  tree vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
+  tree vectype = STMT_VINFO_VECTYPE (reduc_info);
   tree init_def;
   gimple_seq stmts = NULL;
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 390c8472fd6..5ad306e2b08 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6914,7 +6914,8 @@ vectorizable_operation (vec_info *vinfo,
 S2: z = x + 1   -   VS2_0  */
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
-op0, &vec_oprnds0, op1, &vec_oprnds1, op2, &vec_oprnds2);
+op0, vectype, &vec_oprnds0, op1, vectype, &vec_oprnds1,
+op2, vectype, &vec_oprnds2);
   /* Arguments are ready.  Create the new vector stmt.  */
   FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
 {
-- 
2.35.3



[committed] libstdc++: Fix regression in std::format output of %Y for negative years

2023-12-13 Thread Jonathan Wakely
It seems that what I pushed didn't match what I tested, due to testing
on a different machine!

Tested x86_64-linux, on the right machine this time. Pushed to trunk.

-- >8 --

The change in r14-6468-ga01462ae8bafa8 was only supposed to apply to %C
formats, not %Y.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Do
not round century down for %Y formats.
---
 libstdc++-v3/include/bits/chrono_io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index b63b8592eba..bcd76e4ab7b 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -825,7 +825,7 @@ namespace __format
{
  __s.assign(1, _S_plus_minus[1]);
  // For floored division -123//100 is -2 and -100//100 is -1
- if ((__ci * 100) != __yi)
+ if (__conv == 'C' && (__ci * 100) != __yi)
++__ci;
}
  if (__ci >= 100) [[unlikely]]
-- 
2.43.0
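
A quick check one could run against the fixed library (illustrative; the
expected values follow [time.format], they are not taken from the commit):

#include <chrono>
#include <format>
#include <iostream>

int main()
{
  using std::chrono::year;
  std::cout << std::format("{:%Y}", year{-123}) << '\n'; // -0123
  std::cout << std::format("{:%C}", year{-123}) << '\n'; // -02, floored division
  std::cout << std::format("{:%C}", year{-100}) << '\n'; // -01, -100/100 is exactly -1
}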



[PATCH 4/6] More explicit vector types

2023-12-13 Thread Richard Biener
This reduces more calls to get_vectype_for_scalar_type.

* tree-vect-loop.cc (vect_transform_cycle_phi): Specify
the vector type for invariant/external defs.
* tree-vect-stmts.cc (vectorizable_shift): For invariant
or external shifted operands use the result vector type.
Specify the vector type for invariant/external defs.
(vectorizable_store): Likewise.
---
 gcc/tree-vect-loop.cc  |  2 +-
 gcc/tree-vect-stmts.cc | 20 ++--
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3af4160426b..9e531921e29 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8749,7 +8749,7 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
 correctly if ncopies is not one.  */
  vect_get_vec_defs_for_operand (loop_vinfo, reduc_stmt_info,
 ncopies, initial_def,
-&vec_initial_defs);
+&vec_initial_defs, vectype_out);
}
   else if (STMT_VINFO_REDUC_TYPE (reduc_info) == CONST_COND_REDUCTION
   || STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5ad306e2b08..88401a2a00b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6126,19 +6126,11 @@ vectorizable_shift (vec_info *vinfo,
  "use not simple.\n");
   return false;
 }
-  /* If op0 is an external or constant def, infer the vector type
- from the scalar type.  */
+  /* If op0 is an external or constant def, use the output vector type.  */
   if (!vectype)
-vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0), slp_node);
+vectype = vectype_out;
   if (vec_stmt)
 gcc_assert (vectype);
-  if (!vectype)
-{
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "no vectype for scalar type\n");
-  return false;
-}
 
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
@@ -6426,8 +6418,8 @@ vectorizable_shift (vec_info *vinfo,
  (a special case for certain kind of vector shifts); otherwise,
  operand 1 should be of a vector type (the usual case).  */
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
-op0, &vec_oprnds0,
-vec_oprnd1 ? NULL_TREE : op1, &vec_oprnds1);
+op0, vectype, &vec_oprnds0,
+vec_oprnd1 ? NULL_TREE : op1, op1_vectype, &vec_oprnds1);
 
   /* Arguments are ready.  Create the new vector stmt.  */
   FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
@@ -8537,7 +8529,7 @@ vectorizable_store (vec_info *vinfo,
op = vect_get_store_rhs (next_stmt_info);
  if (!costing_p)
vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, op,
-  &vec_oprnds);
+  vectype, &vec_oprnds);
  else
update_prologue_cost (&prologue_cost, op);
  unsigned int group_el = 0;
@@ -9303,7 +9295,7 @@ vectorizable_store (vec_info *vinfo,
{
  vect_get_vec_defs_for_operand (vinfo, next_stmt_info,
 ncopies, op,
-gvec_oprnds[i]);
+gvec_oprnds[i], vectype);
  vec_oprnd = (*gvec_oprnds[i])[0];
  dr_chain.quick_push (vec_oprnd);
}
-- 
2.35.3



[PATCH 2/6] Set LOOP_VINFO_VECT_FACTOR only when it is final

2023-12-13 Thread Richard Biener
The following makes sure to keep LOOP_VINFO_VECT_FACTOR at the
indeterminate value zero until it is final, making LOOP_VINFO_VECT_FACTOR
an rvalue and changing some direct references to use the macro.
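
One way to make the macro an rvalue (an assumed form; the actual
tree-vectorizer.h hunk is not shown above) is to append "+ 0" so the
expression can no longer be assigned through:

/* before: an lvalue, can be written through the macro */
#define LOOP_VINFO_VECT_FACTOR(L) (L)->vectorization_factor
/* after: a prvalue, writes must go through the field directly */
#define LOOP_VINFO_VECT_FACTOR(L) ((L)->vectorization_factor + 0)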

* tree-vectorizer.h (LOOP_VINFO_VECT_FACTOR): Make an rvalue.
* tree-vect-loop.cc (vect_determine_vectorization_factor):
Do not set LOOP_VINFO_VECT_FACTOR, return value via reference.
(vect_update_vf_for_slp): Likewise.
(vect_analyze_loop_2): Set LOOP_VINFO_VECT_FACTOR only
ever to its final value.  Perform SLP optimization after
setting the vectorization factor.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
Use LOOP_VINFO_VECT_FACTOR.
(vect_slp_analyze_node_operations): Likewise.
* tree-vectorizer.cc (vect_transform_loops): Likewise.
---
 gcc/tree-vect-loop.cc  | 43 +++---
 gcc/tree-vect-slp.cc   |  4 ++--
 gcc/tree-vectorizer.cc |  2 +-
 gcc/tree-vectorizer.h  |  2 +-
 4 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3a0731f3bea..3af4160426b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -282,12 +282,12 @@ vect_determine_vf_for_stmt (vec_info *vinfo,
 */
 
 static opt_result
-vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
+vect_determine_vectorization_factor (loop_vec_info loop_vinfo,
+poly_uint64 &vectorization_factor)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   unsigned nbbs = loop->num_nodes;
-  poly_uint64 vectorization_factor = 1;
   tree scalar_type = NULL_TREE;
   gphi *phi;
   tree vectype;
@@ -296,6 +296,8 @@ vect_determine_vectorization_factor (loop_vec_info 
loop_vinfo)
 
   DUMP_VECT_SCOPE ("vect_determine_vectorization_factor");
 
+  vectorization_factor = 1;
+
   for (i = 0; i < nbbs; i++)
 {
   basic_block bb = bbs[i];
@@ -370,7 +372,6 @@ vect_determine_vectorization_factor (loop_vec_info 
loop_vinfo)
   if (known_le (vectorization_factor, 1U))
 return opt_result::failure_at (vect_location,
   "not vectorized: unsupported data-type\n");
-  LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
   return opt_result::success ();
 }
 
@@ -1937,17 +1938,16 @@ vect_create_loop_vinfo (class loop *loop, 
vec_info_shared *shared,
statements update the vectorization factor.  */
 
 static void
-vect_update_vf_for_slp (loop_vec_info loop_vinfo)
+vect_update_vf_for_slp (loop_vec_info loop_vinfo,
+   poly_uint64 &vectorization_factor)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
-  poly_uint64 vectorization_factor;
   int i;
 
   DUMP_VECT_SCOPE ("vect_update_vf_for_slp");
 
-  vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   gcc_assert (known_ne (vectorization_factor, 0U));
 
   /* If all the stmts in the loop can be SLPed, we perform only SLP, and
@@ -2006,7 +2006,6 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
 LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo));
 }
 
-  LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
   if (dump_enabled_p ())
 {
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -2809,7 +2808,8 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool 
&fatal,
 return opt_result::failure_at (vect_location, "bad data dependence.\n");
   LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf;
 
-  ok = vect_determine_vectorization_factor (loop_vinfo);
+  poly_uint64 vectorization_factor;
+  ok = vect_determine_vectorization_factor (loop_vinfo, vectorization_factor);
   if (!ok)
 {
   if (dump_enabled_p ())
@@ -2821,7 +2821,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool 
&fatal,
   /* Compute the scalar iteration cost.  */
   vect_compute_single_scalar_iteration_cost (loop_vinfo);
 
-  poly_uint64 saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 saved_vectorization_factor = vectorization_factor;
 
   if (slp)
 {
@@ -2839,13 +2839,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool 
&fatal,
  vect_detect_hybrid_slp (loop_vinfo);
 
  /* Update the vectorization factor based on the SLP decision.  */
- vect_update_vf_for_slp (loop_vinfo);
-
- /* Optimize the SLP graph with the vectorization factor fixed.  */
- vect_optimize_slp (loop_vinfo);
-
- /* Gather the loads reachable from the SLP graph entries.  */
- vect_gather_slp_loads (loop_vinfo);
+ vect_update_vf_for_slp (loop_vinfo, vectorization_factor);
}
 }
 
@@ -2863,11 +2857,12 @@ start_over:
  during finish_cost the first time we ran the analyzis for this
  vector mode.  */
   if (applying_suggested_uf)
-LOOP_VINFO_VECT_FACTOR (loop_vinf

[PATCH 5/6] Allow poly_uint64 for group_size args to vector type query routines

2023-12-13 Thread Richard Biener
The following changes the unsigned group_size argument to a poly_uint64
one to avoid too much special-casing in callers for VLA vectors when
passing down the effective maximum desirable vector size to vector
type query routines.  The intent is to be able to pass down
the vectorization factor (times the SLP group size) eventually.

* tree-vectorizer.h (get_vectype_for_scalar_type,
get_mask_type_for_scalar_type, vect_get_vector_types_for_stmt):
Change group_size argument to poly_uint64 type.
(vect_get_mask_type_for_stmt): Remove prototype for no longer
existing function.
* tree-vect-stmts.cc (get_vectype_for_scalar_type): Change
group_size argument to poly_uint64.
(get_mask_type_for_scalar_type): Likewise.
(vect_get_vector_types_for_stmt): Likewise.
---
 gcc/tree-vect-stmts.cc | 25 ++---
 gcc/tree-vectorizer.h  |  7 +++
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 88401a2a00b..a5e26b746fb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13297,14 +13297,14 @@ get_related_vectype_for_scalar_type (machine_mode 
prevailing_mode,
 
 tree
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
-unsigned int group_size)
+poly_uint64 group_size)
 {
   /* For BB vectorization, we should always have a group size once we've
  constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
  are tentative requests during things like early data reference
  analysis and pattern recognition.  */
   if (is_a  (vinfo))
-gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
+gcc_assert (vinfo->slp_instances.is_empty () || known_ne (group_size, 0));
   else
 group_size = 0;
 
@@ -13320,9 +13320,11 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree 
scalar_type,
 
   /* If the natural choice of vector type doesn't satisfy GROUP_SIZE,
  try again with an explicit number of elements.  */
+  uint64_t cst_group_size;
   if (vectype
-  && group_size
-  && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), group_size))
+  && group_size.is_constant (&cst_group_size)
+  && cst_group_size != 0
+  && maybe_ge (TYPE_VECTOR_SUBPARTS (vectype), cst_group_size))
 {
   /* Start with the biggest number of units that fits within
 GROUP_SIZE and halve it until we find a valid vector type.
@@ -13336,7 +13338,7 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree 
scalar_type,
 even though the group is not a multiple of that vector size.
 The BB vectorizer will then try to carve up the group into
 smaller pieces.  */
-  unsigned int nunits = 1 << floor_log2 (group_size);
+  unsigned int nunits = 1 << floor_log2 (cst_group_size);
   do
{
  vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
@@ -13372,7 +13374,7 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree 
scalar_type, slp_tree node)
 
 tree
 get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
-  unsigned int group_size)
+  poly_uint64 group_size)
 {
   tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
 
@@ -14243,7 +14245,7 @@ opt_result
 vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
tree *stmt_vectype_out,
tree *nunits_vectype_out,
-   unsigned int group_size)
+   poly_uint64 group_size)
 {
   gimple *stmt = stmt_info->stmt;
 
@@ -14252,7 +14254,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
  are tentative requests during things like early data reference
  analysis and pattern recognition.  */
   if (is_a  (vinfo))
-gcc_assert (vinfo->slp_instances.is_empty () || group_size != 0);
+gcc_assert (vinfo->slp_instances.is_empty () || known_ne (group_size, 0));
   else
 group_size = 0;
 
@@ -14281,7 +14283,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   tree vectype;
   tree scalar_type = NULL_TREE;
-  if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
+  if (known_eq (group_size, 0U) && STMT_VINFO_VECTYPE (stmt_info))
 {
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   if (dump_enabled_p ())
@@ -14310,10 +14312,11 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   if (dump_enabled_p ())
{
- if (group_size)
+ if (known_ne (group_size, 0U))
dump_printf_loc (MSG_NOTE, vect_location,
 "get vectype for scalar type (group size %d):"
-" %T\n", group_size, scalar_type);
+" %T\n", (int)constant_lower_bound (group_si

[PATCH 3/6] Query an appropriate offset vector type in vect_gather_scatter_fn_p

2023-12-13 Thread Richard Biener
The gather_load optab and friends require the offset vector mode to
have the same number of lanes as the data vector mode.  Restrict the
vector type query to that when searching for a proper offset type.
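
As an illustration (not from the patch), a loop that vectorizes as a
gather, where the doubles loaded and the int offsets must end up with the
same lane count (e.g. V8DFmode data with V8SImode offsets):

void
g (double *__restrict out, const double *__restrict in,
   const int *__restrict idx, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[idx[i]];
}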

* tree-vect-data-refs.cc (vect_gather_scatter_fn_p):
Use get_related_vectype_for_scalar_type to get at the
offset vector type.
---
 gcc/tree-vect-data-refs.cc | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 107dffe0a64..59e296e7976 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3913,7 +3913,7 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
loop_vinfo)
*IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT.  */
 
 bool
-vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
+vect_gather_scatter_fn_p (vec_info *, bool read_p, bool masked_p,
  tree vectype, tree memory_type, tree offset_type,
  int scale, internal_fn *ifn_out,
  tree *offset_vectype_out)
@@ -3948,13 +3948,18 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, 
bool masked_p,
 
   for (;;)
 {
-  tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
-  if (!offset_vectype)
-   return false;
+  /* The optabs require the same number of elements in the offset
+vector as in the data vector.  */
+  tree offset_vectype
+   = get_related_vectype_for_scalar_type (TYPE_MODE (vectype), offset_type,
+  TYPE_VECTOR_SUBPARTS (vectype));
 
   /* Test whether the target supports this combination.  */
-  if (internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type,
- offset_vectype, scale))
+  if (!offset_vectype)
+   ;
+  else if (internal_gather_scatter_fn_supported_p (ifn, vectype,
+  memory_type,
+  offset_vectype, scale))
{
  *ifn_out = ifn;
  *offset_vectype_out = offset_vectype;
-- 
2.35.3



[PATCH 6/6] Defer assigning vector types until after VF is determined

2023-12-13 Thread Richard Biener
The following defers, for non-gather/scatter and non-pattern stmts,
setting of STMT_VINFO_VECTYPE until after we computed the desired
vectorization factor.  This allows us to use larger vector types
when the vectorization factor and the preferred vector mode allow,
reducing the number of vector stmt copies and enabling vectorization
in the first place if ncopies restrictions require the use of
different size vector types like for PR65947.

vectorizable_operation handles some of the required vector type
inference.

* tree-vect-data-refs.cc (vect_analyze_data_refs): Do not
set STMT_VINFO_VECTYPE unless this is a gather/scatter.
* tree-vect-loop.cc (vect_determine_vf_for_stmt_1): Do not
set STMT_VINFO_VECTYPE, only determine the VF.
(vect_determine_vectorization_factor): Likewise.
(vect_analyze_loop_2): Set STMT_VINFO_VECTYPE where missing
and non-mask.  Choose larger vectors to reduce the number of
stmt copies.
* tree-vect-stmts.cc (vect_analyze_stmt): Allow not
specified vector type for mask producers.
(vectorizable_operation): Refactor to handle
STMT_VINFO_VECTYPE inference from operands.

* gcc.dg/vect/pr65947-7.c: Adjust.
* gcc.target/i386/vect-multi-size-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr65947-7.c |   2 +-
 .../gcc.target/i386/vect-multi-size-1.c   |  17 ++
 gcc/tree-vect-data-refs.cc|  11 +-
 gcc/tree-vect-loop.cc | 148 +++---
 gcc/tree-vect-stmts.cc| 121 +++---
 5 files changed, 202 insertions(+), 97 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-multi-size-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
index 58c46df5c54..8f8adce3d91 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-7.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -53,4 +53,4 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump "optimizing condition reduction with 
FOLD_EXTRACT_LAST" "vect" { target vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* 
} } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target { 
aarch64*-*-* } || { vect_multiple_sizes } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-multi-size-1.c 
b/gcc/testsuite/gcc.target/i386/vect-multi-size-1.c
new file mode 100644
index 000..a0dd3cf9801
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-multi-size-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=znver4 -fdump-tree-vect" } */
+
+double x[1024];
+char y[1024];
+void foo  ()
+{
+  for (int i = 0 ; i < 16; ++i)
+{
+  x[i] = i;
+  y[i] = i;
+}
+}
+
+/* We expect to see AVX512 vectors for x[] and a SSE vector for y[].  */
+/* { dg-final { scan-tree-dump-times "MEM <vector(8) double>" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "MEM <vector(16) char>" 1 "vect" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 59e296e7976..80057474af9 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4716,18 +4716,19 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 
*min_vf, bool *fatal)
   vf = TYPE_VECTOR_SUBPARTS (vectype);
   *min_vf = upper_bound (*min_vf, vf);
 
-  /* Leave the BB vectorizer to pick the vector type later, based on
-the final dataref group size and SLP node size.  */
-  if (is_a <bb_vec_info> (vinfo))
-   STMT_VINFO_VECTYPE (stmt_info) = vectype;
-
   if (gatherscatter != SG_NONE)
{
+ /* ???  We should perform a coarser check here, or none at all.
+We're checking this again later, in particular during
+relevancy analysis where we hook on the discovered offset
+operand.  */
+ STMT_VINFO_VECTYPE (stmt_info) = vectype;
  gather_scatter_info gs_info;
  if (!vect_check_gather_scatter (stmt_info,
  as_a <loop_vec_info> (vinfo),
  &gs_info))
{
+ STMT_VINFO_VECTYPE (stmt_info) = NULL_TREE;
  if (fatal)
*fatal = false;
  return opt_result::failure_at
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9e531921e29..f226135cb1d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -189,22 +189,19 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (!res)
 return res;
 
-  if (stmt_vectype)
+  if (nunits_vectype)
 {
-  if (STMT_VINFO_VECTYPE (stmt_info))
-   /* The only case when a vectype had been already set is for stmts
-  that contain a data ref, or for "pattern-stmts" (stmts generated
-  by the vectorizer to represent/replace a certain idiom).  */
-   gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)
-|| vectype_maybe_set_p)

Re: [PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own

2023-12-13 Thread chenglulu

LGTM!

Thanks.

On 2023/12/10 1:03 AM, Xi Ruoyao wrote:

With loongarch-def.cc switched from C to C++, we can include rtl.h for
COSTS_N_INSNS, instead of hard coding our own.

This is a non-functional change for now, but it will make the code more
future-proof in case COSTS_N_INSNS in rtl.h is ever changed.

gcc/ChangeLog:

* config/loongarch/loongarch-def.cc (rtl.h): Include.
(COSTS_N_INSNS): Remove the macro definition.
---
  gcc/config/loongarch/loongarch-def.cc | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index c41804a180e..6217b19268c 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "system.h"
  #include "coretypes.h"
  #include "tm.h"
+#include "rtl.h"
  
  #include "loongarch-def.h"

  #include "loongarch-str.h"
@@ -89,8 +90,6 @@ array_tune loongarch_cpu_align =
  .set (CPU_LA464, la464_align ())
  .set (CPU_LA664, la464_align ());
  
-#define COSTS_N_INSNS(N) ((N) * 4)

-
  /* Default RTX cost initializer.  */
  loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
: fp_add (COSTS_N_INSNS (1)),




Re: [PATCH 3/3] LoongArch: Add alslsi3_extend

2023-12-13 Thread chenglulu

LGTM!

Thanks!

On 2023/12/10 1:03 AM, Xi Ruoyao wrote:

Following the instruction cost fix, we are generating

 alsl.w $a0, $a0, $a0, 4

instead of

 li.w  $t0, 17
 mul.w $a0, $a0, $t0

for "x * 17", because alsl.w is 4 times faster than mul.w.  But we didn't
have a sign-extending pattern for alsl.w, causing an extra slli.w
instruction to be generated to sign-extend $a0.  Add the pattern to remove
the redundant extension.
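
A minimal illustration (constructed example, not from the original mail) of
the case the new pattern catches:

/* The SImode multiply-by-17 feeds a DImode sign extension; alsl.w already
   sign-extends its 32-bit result, so alslsi3_extend removes the slli.w.  */
long
f (int x)
{
  return x * 17;
}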

gcc/ChangeLog:

* config/loongarch/loongarch.md (alslsi3_extend): New
define_insn.
---
  gcc/config/loongarch/loongarch.md | 12 
  1 file changed, 12 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index afbf201d4d0..7b26d15aa4e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3"
[(set_attr "type" "arith")
 (set_attr "mode" "<MODE>")])
  
+(define_insn "alslsi3_extend"

+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+ (plus:SI
+   (ashift:SI (match_operand:SI 1 "register_operand" "r")
+  (match_operand 2 "const_immalsl_operand" ""))
+   (match_operand:SI 3 "register_operand" "r"))))]
+  ""
+  "alsl.w\t%0,%1,%3,%2"
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")])
+
  
  
  ;; Reverse the order of bytes of operand 1 and store the result in operand 0.




Re: Re: [PATCH v3 2/4] RISC-V: Add crypto vector builtin function.

2023-12-13 Thread Feng Wang
2023-12-13 18:18 juzhe.zhong  wrote:
>
>
>+    multiple_p (GET_MODE_BITSIZE (e.arg_mode (0)),
>+    GET_MODE_BITSIZE (e.arg_mode (1)), &nunits);
>
>Change it into gcc_assert (multiple_p (...))
>
>+/* A list of all Vector Crypto intrinsic functions.  */
>+static function_group_info cryoto_function_groups[] = {
>+#define DEF_RVV_FUNCTION(NAME, SHAPE, PREDS, OPS_INFO, AVAIL) \
>+  {#NAME, &bases::NAME, &shapes::SHAPE, PREDS, OPS_INFO,\
>+   riscv_vector_avail_ ## AVAIL},
>+#include "riscv-vector-crypto-builtins-functions.def"
>+};
>Why do you add this ? I think it should belong to function_groups.

The original intention of this modification was to make the processing flow of 
the crypto vector clearer.
If you think it should be merged into the V extension handling, I will do it.
Thanks.

Feng Wang

>
>+  /* Dfine the crypto vector builtin functions. */
>+  for (unsigned int i = 0; i < ARRAY_SIZE (cryoto_function_groups); ++i)
>+  {
>+    function_group_info  *f = &cryoto_function_groups[i];
>+    if (f->avail && f->avail ())
>+  builder.register_function_group (cryoto_function_groups[i]);
>+  }
>
>
>I think it should be:
>
>for (unsigned int i = 0; i < ARRAY_SIZE (function_groups); ++i)
>    if (avail)
> builder.register_function_group (function_groups[i]);
>
>
>
>
>juzhe.zh...@rivai.ai
> 




[committed] RISC-V:Add crypto vector implied ISA info.

2023-12-13 Thread Feng Wang
Because the crypto vector extensions depend on the Vector extension,
add the implied ISA info for the corresponding crypto vector extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Modify implied ISA info.
* config/riscv/arch-canonicalize: Add crypto vector implied info.
---
 gcc/common/config/riscv/riscv-common.cc |  9 +
 gcc/config/riscv/arch-canonicalize  | 21 +++--
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4d5a2f874a2..76987598143 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -145,6 +145,15 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvksc", "zvbc"},
   {"zvksg", "zvks"},
   {"zvksg", "zvkg"},
+  {"zvbb",  "zvkb"},
+  {"zvbc",   "zve64x"},
+  {"zvkb",   "zve32x"},
+  {"zvkg",   "zve32x"},
+  {"zvkned", "zve32x"},
+  {"zvknha", "zve32x"},
+  {"zvknhb", "zve64x"},
+  {"zvksed", "zve32x"},
+  {"zvksh",  "zve32x"},
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
diff --git a/gcc/config/riscv/arch-canonicalize 
b/gcc/config/riscv/arch-canonicalize
index ea2f67a0944..a8f47a1752b 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -69,12 +69,21 @@ IMPLIED_EXT = {
   "zvl32768b" : ["zvl16384b"],
   "zvl65536b" : ["zvl32768b"],
 
-  "zvkn" : ["zvkned", "zvknhb", "zvbb", "zvkt"],
-  "zvknc" : ["zvkn", "zvbc"],
-  "zvkng" : ["zvkn", "zvkg"],
-  "zvks" : ["zvksed", "zvksh", "zvbb", "zvkt"],
-  "zvksc" : ["zvks", "zvbc"],
-  "zvksg" : ["zvks", "zvkg"],
+  "zvkn"   : ["zvkned", "zvknhb", "zvkb", "zvkt"],
+  "zvknc"  : ["zvkn", "zvbc"],
+  "zvkng"  : ["zvkn", "zvkg"],
+  "zvks"   : ["zvksed", "zvksh", "zvkb", "zvkt"],
+  "zvksc"  : ["zvks", "zvbc"],
+  "zvksg"  : ["zvks", "zvkg"],
+  "zvbb"   : ["zvkb"],
+  "zvbc"   : ["zve64x"],
+  "zvkb"   : ["zve32x"],
+  "zvkg"   : ["zve32x"],
+  "zvkned" : ["zve32x"],
+  "zvknha" : ["zve32x"],
+  "zvknhb" : ["zve64x"],
+  "zvksed" : ["zve32x"],
+  "zvksh"  : ["zve32x"],
 }
 
 def arch_canonicalize(arch, isa_spec):
-- 
2.17.1
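
To make the effect of the new entries concrete, the implied-extension
closure computed over IMPLIED_EXT can be sketched as follows (illustrative
code, not the script's actual implementation):

def implied_closure (exts):
  """Repeatedly add implied extensions until a fixed point is reached."""
  result = set (exts)
  worklist = list (exts)
  while worklist:
    for dep in IMPLIED_EXT.get (worklist.pop (), []):
      if dep not in result:
        result.add (dep)
        worklist.append (dep)
  return result

# e.g. implied_closure ({"zvksc"}) pulls in zvks and zvbc directly, then
# zvksed/zvksh/zvkb/zvkt via zvks, and finally zve64x and zve32x via the
# entries added above.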



Re: [PATCH 2/3] LoongArch: Fix instruction costs [PR112936]

2023-12-13 Thread Xi Ruoyao
On Wed, 2023-12-13 at 20:22 +0800, chenglulu wrote:

On 2023/12/10 1:03 AM, Xi Ruoyao wrote:
Replace the instruction costs in loongarch_rtx_cost_data constructor
based on micro-benchmark results on LA464 and LA664.

This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl
and slli.

gcc/ChangeLog:

    PR target/112936
    * config/loongarch/loongarch-def.cc
    (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update
    instruction costs per micro-benchmark results.
    (loongarch_rtx_cost_optimize_size): Set all instruction costs
    to (COSTS_N_INSNS (1) + 1).
    * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove
    special case for multiplication when optimizing for size.
    Adjust division cost when TARGET_64BIT && !TARGET_DIV32.
    Account the extra cost when TARGET_CHECK_ZERO_DIV and
    optimizing for speed.

gcc/testsuite/ChangeLog

    PR target/112936
    * gcc.target/loongarch/mul-const-reduction.c: New test.
---
   gcc/config/loongarch/loongarch-def.cc | 39 ++-
   gcc/config/loongarch/loongarch.cc | 22 +--
   .../loongarch/mul-const-reduction.c   | 11 ++
   3 files changed, 43 insertions(+), 29 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c

Well, I'm curious about how the value of this cost is obtained.

I just made a loop containing 1000 mul.w instructions, then ran the loop
100 times and compared the time usage with running another loop
containing 1000 addi.w instructions, iterated 100 times too.

Likewise for other instructions...
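
For reference, the harness described might look like this (my
reconstruction under stated assumptions -- GNU C inline asm, timing via
clock_gettime -- not the actual benchmark used):

/* 100 iterations over 1000 back-to-back mul.w instructions; swap the asm
   template for "addi.w %0, %0, 1" to measure addi.w instead.  */
#include <stdio.h>
#include <time.h>

#define REP10(x) x x x x x x x x x x
#define REP1000(x) REP10 (REP10 (REP10 (x)))

int
main (void)
{
  struct timespec t0, t1;
  int acc = 1, k = 3;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < 100; i++)
    {
      REP1000 (asm volatile ("mul.w %0, %0, %1" : "+r" (acc) : "r" (k));)
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);
  printf ("%lld ns\n", (long long) (t1.tv_sec - t0.tv_sec) * 1000000000LL
		       + (t1.tv_nsec - t0.tv_nsec));
  return acc == 42;  /* keep acc live */
}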

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988

2023-12-13 Thread pan2 . li
From: Pan Li 

Refine the test cases for:

* Naming convention.
* Adding run cases.

PR target/112929
PR target/112988

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test.

Signed-off-by: Pan Li 
---
 .../rvv/vsetvl/{pr112929.c => pr112929-1.c}   |  0
 .../gcc.target/riscv/rvv/vsetvl/pr112929-2.c  | 57 +++
 .../rvv/vsetvl/{pr112988.c => pr112988-1.c}   |  0
 .../gcc.target/riscv/rvv/vsetvl/pr112988-2.c  | 53 +
 4 files changed, 110 insertions(+)
 rename gcc/testsuite/gcc.target/riscv/rvv/vsetvl/{pr112929.c => pr112929-1.c} 
(100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-2.c
 rename gcc/testsuite/gcc.target/riscv/rvv/vsetvl/{pr112988.c => pr112988-1.c} 
(100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929.c
rename to gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-2.c
new file mode 100644
index 000..f2022026639
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112929-2.c
@@ -0,0 +1,57 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -fno-vect-cost-model" } */
+
+int printf(char *, ...);
+int a, l, i, p, q, t, n, o;
+int *volatile c;
+static int j;
+static struct pack_1_struct d;
+long e;
+char m = 5;
+short s;
+
+#pragma pack(1)
+struct pack_1_struct {
+  long c;
+  int d;
+  int e;
+  int f;
+  int g;
+  int h;
+  int i;
+} h, r = {1}, *f = &h, *volatile g;
+
+void add_em_up(int count, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, count);
+  __builtin_va_end(ap);
+}
+
+int main() {
+  int u;
+  j = 0;
+
+  for (; j < 9; ++j) {
+u = ++t ? a : 0;
+if (u) {
+  int *v = &d.d;
+  *v = g || e;
+  *c = 0;
+  *f = h;
+}
+s = l && c;
+o = i;
+d.f || (p = 0);
+q |= n;
+  }
+
+  r = *f;
+
+  add_em_up(1, 1);
+  printf("%d\n", m);
+
+  if (m != 5)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988.c
rename to gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-2.c
new file mode 100644
index 000..e952b85b630
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112988-2.c
@@ -0,0 +1,53 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -fno-vect-cost-model" } */
+
+int a = 0;
+int p, q, r, x = 230;
+short d;
+int e[256];
+static struct f w;
+int *c = &r;
+
+short y(short z) {
+  return z * d;
+}
+
+#pragma pack(1)
+struct f {
+  int g;
+  short h;
+  int j;
+  char k;
+  char l;
+  long m;
+  long n;
+  int o;
+} s = {1}, v, t, *u = &v, *b = &s;
+
+void add_em_up(int count, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, count);
+  __builtin_va_end(ap);
+}
+
+int main() {
+  int i = 0;
+  for (; i < 256; i++)
+e[i] = i;
+
+  p = 0;
+  for (; p <= 0; p++) {
+*c = 4;
+*u = t;
+x |= y(6 >= q);
+  }
+
+  *b = w;
+
+  add_em_up(1, 1);
+
+  if (a != 0 || q != 0 || p != 1 || r != 4 || x != 0xE6 || d != 0)
+__builtin_abort ();
+
+  return 0;
+}
-- 
2.34.1



Re: [PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988

2023-12-13 Thread juzhe.zhong
LGTM from my side.  But I'd like to see Robin's comments.  Thanks.


Re: [PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988

2023-12-13 Thread Robin Dapp
Thanks, LGTM but please add a comment like:

These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior.  Depending on the
execution environment they can either pass or fail.  As of now,
with the latest QEMU version, they will pass even without the
underlying issue fixed.  As the test case is known to have
caused the problem before we keep it as a run test case for
future reference.

Regards
 Robin



[committed] aarch64 testsuite: Only run aarch64-ssve tests once

2023-12-13 Thread Andrew Carlotti
Results verified by running
`RUNTESTFLAGS="aarch64-ssve.exp=*" make -k -j 56 check-gcc`
before and after the change.  I initially spotted the issue because the tests
were being run a nondeterministic number of times during unrelated regression
testing.

Committed as obvious.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/aarch64-ssve.exp: Return early in parallel
test instances that aren't assigned these tests.


diff --git a/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp 
b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
index 
d6a5a561a33ea98d7363af0cfa4d73955baabd1b..98242a97b46e9793f34a26f4365a3d1f39d58da5
 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
+++ b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp
@@ -27,6 +27,10 @@ if {![istarget aarch64*-*-*] } {
 
 load_lib gcc-defs.exp
 
+if ![gcc_parallel_test_run_p aarch64-ssve] {
+  return
+}
+
 gcc_parallel_test_enable 0
 
 # Code shared by all tests.


Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-13 Thread Richard Sandiford
Robin Dapp  writes:
> @@ -1758,16 +1759,19 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> bitsize, poly_uint64 bitnum,
>if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
>  {
>scalar_mode innermode = GET_MODE_INNER (outermode);
>enum insn_code icode
>   = convert_optab_handler (vec_extract_optab, outermode, innermode);
>poly_uint64 pos;
> -  if (icode != CODE_FOR_nothing
> -   && known_eq (bitsize, GET_MODE_BITSIZE (innermode))
> -   && multiple_p (bitnum, GET_MODE_BITSIZE (innermode), &pos))
> +  if ((icode != CODE_FOR_nothing
> +&& known_eq (bitsize, GET_MODE_PRECISION (innermode))
> +&& multiple_p (bitnum, GET_MODE_PRECISION (innermode), &pos)))

This adds an extra, unnecessary layer of bracketing.  OK for the
target-independent parts without that.

Thanks,
Richard

>   {
> class expand_operand ops[3];
>  
> -   create_output_operand (&ops[0], target, innermode);
> +   create_output_operand (&ops[0], target,
> +  insn_data[icode].operand[0].mode);
> ops[0].target = 1;
> create_input_operand (&ops[1], op0, outermode);
> create_integer_operand (&ops[2], pos);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
> new file mode 100644
> index 000..5f7374b0040
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3" } */
> +
> +long long a;
> +int b, c;
> +int *d;
> +void e(unsigned f) {
> +  for (;; ++c)
> +if (f) {
> +  a = 0;
> +  for (; a <= 3; a++) {
> +f = 0;
> +for (; f <= 0; f++)
> +  if ((long)a)
> +break;
> +  }
> +  if (b)
> +*d = f;
> +}
> +}


Re: Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-13 Thread 钟居哲
Thanks Richard.

LGTM for RISC-V part.

Thanks Robin for fixing it.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-12-13 22:05
To: Robin Dapp
CC: Richard Biener; gcc-patches; juzhe.zhong\@rivai.ai
Subject: Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].
Robin Dapp  writes:
> @@ -1758,16 +1759,19 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> bitsize, poly_uint64 bitnum,
>if (VECTOR_MODE_P (outermode) && !MEM_P (op0))
>  {
>scalar_mode innermode = GET_MODE_INNER (outermode);
>enum insn_code icode
>  = convert_optab_handler (vec_extract_optab, outermode, innermode);
>poly_uint64 pos;
> -  if (icode != CODE_FOR_nothing
> -   && known_eq (bitsize, GET_MODE_BITSIZE (innermode))
> -   && multiple_p (bitnum, GET_MODE_BITSIZE (innermode), &pos))
> +  if ((icode != CODE_FOR_nothing
> +&& known_eq (bitsize, GET_MODE_PRECISION (innermode))
> +&& multiple_p (bitnum, GET_MODE_PRECISION (innermode), &pos)))
 
This adds an extra, unnecessary layer of bracketing.  OK for the
target-independent parts without that.
 
Thanks,
Richard
 
>  {
>class expand_operand ops[3];
>  
> -   create_output_operand (&ops[0], target, innermode);
> +   create_output_operand (&ops[0], target,
> + insn_data[icode].operand[0].mode);
>ops[0].target = 1;
>create_input_operand (&ops[1], op0, outermode);
>create_integer_operand (&ops[2], pos);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
> new file mode 100644
> index 000..5f7374b0040
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/pr112773.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvl256b -mabi=ilp32d -O3" } */
> +
> +long long a;
> +int b, c;
> +int *d;
> +void e(unsigned f) {
> +  for (;; ++c)
> +if (f) {
> +  a = 0;
> +  for (; a <= 3; a++) {
> +f = 0;
> +for (; f <= 0; f++)
> +  if ((long)a)
> +break;
> +  }
> +  if (b)
> +*d = f;
> +}
> +}
 


RE: [PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988

2023-12-13 Thread Li, Pan2
Committed with below comments, thanks Juzhe and Robin.

Pan

-Original Message-
From: Robin Dapp  
Sent: Wednesday, December 13, 2023 9:56 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai
Subject: Re: [PATCH v1] RISC-V: Refine test cases for both PR112929 and PR112988

Thanks, LGTM but please add a comment like:

These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior.  Depending on the
execution environment they can either pass or fail.  As of now,
with the latest QEMU version, they will pass even without the
underlying issue fixed.  As the test case is known to have
caused the problem before we keep it as a run test case for
future reference.

Regards
 Robin



RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-13 Thread Tamar Christina
> > >   else if (vect_use_mask_type_p (stmt_info))
> > > {
> > >   unsigned int precision = stmt_info->mask_precision;
> > >   scalar_type = build_nonstandard_integer_type (precision, 1);
> > >   vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > group_size);
> > >   if (!vectype)
> > > return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > >" data-type %T\n", scalar_type);
> > >
> > > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > > needs to handle the gcond case with bool != 0 somehow and for the
> > > extra mask producer we add here we have to emulate what it would have
> > > done, right?
> >
> > How about handling gconds directly in vect_determine_mask_precision?
> > In a sense it's not needed, since gconds are always roots, and so we
> > could calculate their precision on the fly instead.  But handling it in
> > vect_determine_mask_precision feels like it should reduce the number
> > of special cases.
> 
> Yeah, that sounds worth trying.
> 
> Richard.

So here's a respin with this suggestion and the other issues fixed.
Note that the testcases still need to be updated with the right stanzas.

The patch is much smaller, I still have a small change to
vect_get_vector_types_for_stmt  in case we get there on a gcond where
vect_recog_gcond_pattern couldn't apply due to the target missing an
appropriate vectype.  The change only gracefully rejects the gcond.

Since patterns cannot apply to the same root twice I've had to also do
the split of the condition out of the gcond in bitfield lowering.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.

Ok for master?

Thanks,
Tamar
gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond
(vect_recog_bitfield_ref_pattern): Update to split out bool.
(vect_recog_gcond_pattern): New.
(possible_vector_mask_operation_p): Support gcond.
(vect_determine_mask_precision): Likewise.
* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_get_vector_types_for_stmt): Rejects gcond if not lowered by
vect_recog_gcond_pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_84.c: New test.
* gcc.dg/vect/vect-early-break_85.c: New test.
* gcc.dg/vect/vect-early-break_86.c: New test.
* gcc.dg/vect/vect-early-break_87.c: New test.
* gcc.dg/vect/vect-early-break_88.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
new file mode 100644
index 
..0622339491d333b07c2ce895785b5216713097a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 17
+#endif
+bool vect_a[N] = { false, false, true, false, false, false,
+   false, false, false, false, false, false,
+   false, false, false, false, false };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(bool x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] == x)
+ return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (true) != 1)
+abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
new file mode 100644
index 
..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+int vect_a[N] = { 5, 4, 8, 4, 6 };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(int x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+ return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7) != 1)
+abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
new file mode 100644
index 
..66eb570f4028bca4b631329d7af50c646d3c0cb3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.

Re: [PATCH] SRA: Force gimple operand in an additional corner case (PR 112822)

2023-12-13 Thread Peter Bergner
On 12/13/23 2:05 AM, Jakub Jelinek wrote:
> On Wed, Dec 13, 2023 at 08:51:16AM +0100, Richard Biener wrote:
>> On Tue, 12 Dec 2023, Peter Bergner wrote:
>>
>>> On 12/12/23 8:36 PM, Jason Merrill wrote:
 This test is failing for me below C++17, I think you need

 // { dg-do compile { target c++17 } }
 or
 // { dg-require-effective-target c++17 }
>>>
>>> Sorry about that.  Should we do the above or should we just add
>>> -std=c++17 to dg-options?  ...or do we need to do both?
>>
>> Just do the above, the C++ testsuite iterates over all standards,
>> adding -std=c++17 would just run that 5 times.  But the above
>> properly skips unsupported cases.
> 
> I believe if one uses explicit -std=gnu++17 or -std=c++17 in dg-options
> then it will not iterate:
> # If the testcase specifies a standard, use that one.
> # If not, run it under several standards, allowing GNU extensions
> # if there's a dg-options line.
> if ![search_for $test "-std=*++"] {
> and otherwise how many times exactly it iterates depends on what the user
> asked for or what effective target is there (normally the default is
> to iterate 4 times (98,14,17,20), one can use e.g.
> GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 to iterate 7 times, or the
> default also changes if c++23, c++26 or c++11_only effective targets
> are present somewhere in the test.
> 
> But sure, if the test is valid in C++17, 20, 23, 26, then
> // { dg-do compile { target c++17 } }
> is best (unless the test is mostly language version independent and
> very expensive to compile or run).

I confirmed the test case builds with C++17, 20, 23, 26 and errors out
with C++11, so I went with your solution.  Thanks for the input and
sorry for the breakage.  Pushed.

Peter


testsuite: Add dg-do compile target c++17 directive for testcase [PR112822]

Add dg-do compile target directive that limits the test case to being built
on c++17 compiles or greater.

2023-12-13  Peter Bergner  

gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: Add dg-do compile target c++17 directive.

diff --git a/gcc/testsuite/g++.dg/pr112822.C b/gcc/testsuite/g++.dg/pr112822.C
index d1490405493..a8557522467 100644
--- a/gcc/testsuite/g++.dg/pr112822.C
+++ b/gcc/testsuite/g++.dg/pr112822.C
@@ -1,4 +1,5 @@
 /* PR tree-optimization/112822 */
+/* { dg-do compile { target c++17 } } */
 /* { dg-options "-w -O2" } */
 
 /* Verify we do not ICE on the following noisy creduced test case.  */




[PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4

2023-12-13 Thread Xi Ruoyao
We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.

Use the movcf2gr instruction to implement cstore4 if movcf2gr
is fast enough.
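
For illustration, for something like "int lt (double a, double b) { return
a < b; }" the intended shape of the change is roughly (sketched by hand,
not actual compiler output):

        fcmp.slt.d  $fcc0,$fa0,$fa1
        # before: a branch over $fcc0 materializes the 0/1 result
        # after, with -muse-movcf2gr:
        movcf2gr    $a0,$fcc0
        ret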

gcc/ChangeLog:

* config/loongarch/genopts/loongarch.opt.in (muse-movcf2gr): New
option.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-tune.h
(loongarch_rtx_cost_data::movcf2gr): New field.
(loongarch_rtx_cost_data::movcf2gr_): New method.
(loongarch_rtx_cost_data::use_movcf2gr): New method.
(simple_insn_cost): Declare.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Set movcf2gr
to COSTS_N_INSNS (7).
(loongarch_cpu_rtx_cost_data): Set movcf2gr to COSTS_N_INSNS (1)
for LA664.
(loongarch_rtx_cost_optimize_size): Set movcf2gr to
COSTS_N_INSNS (1) + 1.
(simple_insn_cost): Define and initialize to COSTS_N_INSNS (1).
* doc/invoke.texi (-muse-movcf2gr): Document the new option.
* config/loongarch/predicates.md (loongarch_fcmp_operator): New
predicate.
* config/loongarch/loongarch.md (movcf2gr): New
define_insn.
(cstore4): New define_expand.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the default of
-muse-movcf2gr based on -mtune=.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/movcf2gr.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu (twice, with
BOOT_CFLAGS and {C,CXX}FLAGS_FOR_TARGET set to "-O2 -muse-movcf2gr" and
"-O2 -mno-use-movcf2gr").  Ok for trunk?

 gcc/config/loongarch/genopts/loongarch.opt.in |  4 +++
 gcc/config/loongarch/loongarch-def.cc | 12 +--
 gcc/config/loongarch/loongarch-tune.h | 14 
 gcc/config/loongarch/loongarch.cc |  3 ++
 gcc/config/loongarch/loongarch.md | 36 +++
 gcc/config/loongarch/loongarch.opt|  4 +++
 gcc/config/loongarch/predicates.md|  4 +++
 gcc/doc/invoke.texi   |  8 +
 gcc/testsuite/gcc.target/loongarch/movcf2gr.c |  9 +
 9 files changed, 91 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/movcf2gr.c

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index c3848d02fd3..a87915d9b5a 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -245,6 +245,10 @@ mpass-mrelax-to-as
 Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
 Pass -mrelax or -mno-relax option to the assembler.
 
+muse-movcf2gr
+Target Var(loongarch_use_movcf2gr) Init(M_OPT_UNSET)
+Emit the movcf2gr instruction.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) 
IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/loongarch-def.cc 
b/gcc/config/loongarch/loongarch-def.cc
index 4a8885e8343..6da085d375e 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -36,6 +36,8 @@ using array_tune = array<T, N_TUNE_TYPES>;
 template <class T>
 using array_arch = array<T, N_ARCH_TYPES>;
 
+const int simple_insn_cost = COSTS_N_INSNS (1);
+
 /* CPU property tables.  */
 array_tune<const char *> loongarch_cpu_strings = array_tune<const char *> ()
   .set (CPU_NATIVE, STR_CPU_NATIVE)
@@ -101,15 +103,18 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
 int_mult_di (COSTS_N_INSNS (4)),
 int_div_si (COSTS_N_INSNS (5)),
 int_div_di (COSTS_N_INSNS (5)),
+movcf2gr (COSTS_N_INSNS (7)),
 branch_cost (6),
 memory_latency (4) {}
 
 /* The following properties cannot be looked up directly using "cpucfg".
  So it is necessary to provide a default value for "unknown native"
  tune targets (i.e. -mtune=native while PRID does not correspond to
- any known "-mtune" type).  Currently all numbers are default.  */
+ any known "-mtune" type).  */
 array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
-  array_tune<loongarch_rtx_cost_data> ();
+  array_tune<loongarch_rtx_cost_data> ()
+.set (CPU_LA664,
+ loongarch_rtx_cost_data ().movcf2gr_ (COSTS_N_INSNS (1)));
 
 /* RTX costs to use when optimizing for size.
We use a value slightly larger than COSTS_N_INSNS (1) for all of them
@@ -125,7 +130,8 @@ const loongarch_rtx_cost_data 
loongarch_rtx_cost_optimize_size =
 .int_mult_si_ (COST_COMPLEX_INSN)
 .int_mult_di_ (COST_COMPLEX_INSN)
 .int_div_si_ (COST_COMPLEX_INSN)
-.int_div_di_ (COST_COMPLEX_INSN);
+.int_div_di_ (COST_COMPLEX_INSN)
+.movcf2gr_ (COST_COMPLEX_INSN);
 
 array_tune<int> loongarch_cpu_issue_rate = array_tune<int> ()
   .set (CPU_NATIVE, 4)
diff --git a/gcc/config/loongarch/loongarch-tune.h 
b/gcc/config/loongarch/loongarch-tune.h
index 4aa01c54c08..7f478e009cd 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gc

Re: [PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-13 Thread Andrew Stubbs

On 12/12/2023 09:02, Tobias Burnus wrote:

On 11.12.23 18:04, Andrew Stubbs wrote:

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to
ensure that they can be unpinned safely when freed.

This implementation will work OK for page-scale allocations, and
finer-grained allocations will be implemented in a future patch.
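
The core idea can be sketched like this (illustrative only; libgomp's
actual implementation plugs into its allocator machinery and differs in
detail):

#include <stddef.h>
#include <sys/mman.h>

/* Allocate SIZE bytes of pinned memory, or return NULL on failure.  */
static void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Pinning failed (e.g. RLIMIT_MEMLOCK too low); hand it back.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Unpinning on free is safe because the region came from mmap.  */
static void
pinned_free (void *p, size_t size)
{
  munlock (p, size);
  munmap (p, size);
}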


LGTM.

Thanks,

Tobias


Thank you, this one is now pushed.

Andrew


Re: [PATCH v4] aarch64: SVE/NEON Bridging intrinsics

2023-12-13 Thread Richard Sandiford
Richard Ball  writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
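
For context, usage of the three intrinsics looks roughly like this
(illustrative sketch; requires an SVE-enabled target):

#include <arm_neon_sve_bridge.h>

svfloat32_t
demo (svfloat32_t sv, float32x4_t nv)
{
  sv = svset_neonq_f32 (sv, nv);          /* NEON vector -> low 128 bits */
  float32x4_t lo = svget_neonq_f32 (sv);  /* low 128 bits -> NEON vector */
  (void) lo;
  return svdup_neonq_f32 (nv);            /* broadcast NEON vector to SVE */
}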
>
> gcc/ChangeLog:
>
>   * config.gcc: Adds new header to config.
>   * config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
>   Moved to header file.
>   (ENTRY): Likewise.
>   (enum aarch64_simd_type): Likewise.
>   (struct aarch64_simd_type_info): Remove static.
>   (GTY): Likewise.
>   * config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
>   Defines pragma for arm_neon_sve_bridge.h.
>   * config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
>   * config/aarch64/aarch64-sve-builtins-base.cc
>   (class svget_neonq_impl): New intrinsic implementation.
>   (class svset_neonq_impl): Likewise.
>   (class svdup_neonq_impl): Likewise.
>   (NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
>   * config/aarch64/aarch64-sve-builtins-functions.h
>   (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
>   functions.
>   * config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
>   * config/aarch64/aarch64-sve-builtins-shapes.cc
>   (parse_element_type): Add NEON element types.
>   (parse_type): Likewise.
>   (struct get_neonq_def): Defines function shape for get_neonq.
>   (struct set_neonq_def): Defines function shape for set_neonq.
>   (struct dup_neonq_def): Defines function shape for dup_neonq.
>   * config/aarch64/aarch64-sve-builtins.cc 
>   (DEF_SVE_TYPE_SUFFIX): Changed to be called through
>   SVE_NEON macro.
>   (DEF_SVE_NEON_TYPE_SUFFIX): Defines 
> macro for NEON_SVE_BRIDGE type suffixes.
>   (DEF_NEON_SVE_FUNCTION): Defines 
> macro for NEON_SVE_BRIDGE functions.
>   (function_resolver::infer_neon128_vector_type): Infers type suffix
>   for overloaded functions.
>   (init_neon_sve_builtins): Initialise neon_sve_bridge_builtins for LTO.
>   (handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
>   * config/aarch64/aarch64-sve-builtins.def
>   (DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
>   (bf16): Replace entry with neon-sve entry.
>   (f16): Likewise.
>   (f32): Likewise.
>   (f64): Likewise.
>   (s8): Likewise.
>   (s16): Likewise.
>   (s32): Likewise.
>   (s64): Likewise.
>   (u8): Likewise.
>   (u16): Likewise.
>   (u32): Likewise.
>   (u64): Likewise.
>   * config/aarch64/aarch64-sve-builtins.h
>   (GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
>   (ENTRY): Add aarch64_simd_type definiton.
>   (enum aarch64_simd_type): Add neon information to type_suffix_info.
>   (struct type_suffix_info): New function.
>   * config/aarch64/aarch64-sve.md
>   (@aarch64_sve_get_neonq_): New intrinsic insn for big endian.
>   (@aarch64_sve_set_neonq_): Likewise.
>   * config/aarch64/aarch64.cc 
>   (aarch64_init_builtins): Add call to init_neon_sve_builtins.
>   * config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
>   * config/aarch64/aarch64-builtins.h: New file.
>   * config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
>   * config/aarch64/arm_neon_sve_bridge.h: New file.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include 
>   arm_neon_sve_bridge header file
>   * gcc.dg/torture/neon-sve-bridge.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
>   * gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
>   * gcc.target/aarch64/s

Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-12-13 Thread Mark Rutland
On Wed, Dec 13, 2023 at 05:01:07PM +0800, Wang wrote:
> On 2023/12/13 16:48, Dan Li wrote:
> > + Likun
> >
> > On Tue, 28 Mar 2023 at 06:18, Sami Tolvanen wrote:
> >> On Mon, Mar 27, 2023 at 2:30 AM Peter Zijlstra wrote:
> >>> On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:
> >>>
>  In the compiler part[4], most of the content is the same as Sami's
>  implementation[3], except for some minor differences, mainly including:
> 
>  1. The function typeid is calculated differently and it is difficult
>  to be consistent.
> >>> This means there is an effective ABI break between the compilers, which
> >>> is sad :-( Is there really nothing to be done about this?
> >> I agree, this would be unfortunate, and would also be a compatibility
> >> issue with rustc where there's ongoing work to support
> >> clang-compatible CFI type hashes:
> >>
> >> https://github.com/rust-lang/rust/pull/105452
> >>
> >> Sami
> 
> Hi Peter and Sami
> 
> I am Dan Li's colleague, and I will take over and continue the work of CFI.
> 
> Regarding the issue of gcc cfi type id being compatible with clang, we
> have analyzed and verified:
> 
> 1. clang uses the mangling scheme defined in the Itanium C++ ABI to encode
> the function prototype, and uses the encoding result as input to generate
> the cfi type id;
> 2. Currently, gcc only implements mangling in the C++ compiler, and the
> function prototype encoding generated by those interfaces is compatible
> with clang, but gcc's C compiler does not support mangling;
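
For illustration of the scheme being described (details assumed here, not
verified against either compiler):

  /* For a C function      void callee (int *);
     the Itanium mangling of its function type is "FvPiE", and the CFI
     type id is derived from a hash of the corresponding type string,
     e.g. "_ZTSFvPiE".  Every indirect call through a void (*)(int *)
     pointer checks that id, so gcc would need to produce bit-identical
     ids for mixed-compiler kernels to work.  */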
> 
> Adding mangling to gcc's C compiler is a huge and difficult task, because
> we would have to refactor the mangling of C++, splitting it into basic
> mangling and language-specific mangling, and add support for the C
> language, which requires a deep understanding of the compiler and
> language processing parts.
> 
> And for the kernel CFI, I suggest separating type compatibility from the
> CFI basic functions.  Type compatibility is independent of the CFI basic
> functions and should be dealt with under another topic.  Should we focus
> on the main issues of CFI, let it work first on the Linux kernel, and
> leave the compatibility issue to be solved later?

I'm not sure what you're suggesting here exactly, do you mean to add a type ID
scheme that's incompatible with clang, leaving everything else the same? If so,
what sort of scheme are you proposing?

It seems unfortunate to have a different scheme, but IIUC we expect all kernel
objects to be built with the same compiler.

Mark.


[committed v2] aarch64: Add missing driver-aarch64 dependencies

2023-12-13 Thread Andrew Carlotti
On Sat, Dec 09, 2023 at 06:42:17PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> The .def files are included in TM_H by:
> 
> TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
>   $(srcdir)/config/aarch64/aarch64-tuning-flags.def \
>   $(srcdir)/config/aarch64/aarch64-option-extensions.def \
>   $(srcdir)/config/aarch64/aarch64-cores.def \
>   $(srcdir)/config/aarch64/aarch64-isa-modes.def \
>   $(srcdir)/config/aarch64/aarch64-arches.def

They are included now, but only because you added them last week.

I've removed them in v2 of the patch, committed as below:

---

gcc/ChangeLog:

* config/aarch64/x-aarch64: Add missing dependencies.


diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
index 
3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..ee828c9af53a11885c2bcef8f112c0ebaf161c59
 100644
--- a/gcc/config/aarch64/x-aarch64
+++ b/gcc/config/aarch64/x-aarch64
@@ -1,3 +1,5 @@
 driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \
-  $(CONFIG_H) $(SYSTEM_H)
+  $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \
+  $(srcdir)/config/aarch64/aarch64-protos.h \
+  $(srcdir)/config/aarch64/aarch64-feature-deps.h
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<


Re: [PATCH v2 09/11] aarch64: Rewrite non-writeback ldp/stp patterns

2023-12-13 Thread Richard Sandiford
Alex Coplan  writes:
> On 12/12/2023 15:58, Richard Sandiford wrote:
>> Alex Coplan  writes:
>> > Hi,
>> >
>> > This is a v2 version which addresses feedback from Richard's review
>> > here:
>> >
>> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637648.html
>> >
>> > I'll reply inline to address specific comments.
>> >
>> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>> >
>> > Thanks,
>> > Alex
>> >
>> > -- >8 --
>> >
>> > This patch overhauls the load/store pair patterns with two main goals:
>> >
>> > 1. Fixing a correctness issue (the current patterns are not RA-friendly).
>> > 2. Allowing more flexibility in which operand modes are supported, and 
>> > which
>> >combinations of modes are allowed in the two arms of the load/store 
>> > pair,
>> >while reducing the number of patterns required both in the source and in
>> >the generated code.
>> >
>> > The correctness issue (1) is due to the fact that the current patterns have
>> > two independent memory operands tied together only by a predicate on the 
>> > insns.
>> > Since LRA only looks at the constraints, one of the memory operands can get
>> > reloaded without the other one being changed, leading to the insn becoming
>> > unrecognizable after reload.
>> >
>> > We fix this issue by changing the patterns such that they only ever have 
>> > one
>> > memory operand representing the entire pair.  For the store case, we use an
>> > unspec to logically concatenate the register operands before storing them.
>> > For the load case, we use unspecs to extract the "lanes" from the pair mem,
>> > with the second occurrence of the mem matched using a match_dup (such that 
>> > there
>> > is still really only one memory operand as far as the RA is concerned).
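
As a sketch of the shape being described (simplified, with an invented
unspec name -- not the patch's actual pattern):

(define_insn "*stp_sketch"
  [(set (match_operand:V2x8QI 0 "memory_operand" "=m")
	(unspec:V2x8QI
	  [(match_operand:DI 1 "register_operand" "r")
	   (match_operand:DI 2 "register_operand" "r")] UNSPEC_STP_SKETCH))]
  ""
  "stp\t%1, %2, %0")

The single V2x8QI mem covers both 8-byte halves, so reload can only ever
replace the pair address as a whole.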
>> >
>> > In terms of the modes used for the pair memory operands, we canonicalize
>> > these to V2x4QImode, V2x8QImode, and V2x16QImode.  These modes have not
>> > only the correct size but also correct alignment requirement for a
>> > memory operand representing an entire load/store pair.  Unlike the other
>> > two, V2x4QImode didn't previously exist, so had to be added with the
>> > patch.
>> >
>> > As with the previous patch generalizing the writeback patterns, this
>> > patch aims to be flexible in the combinations of modes supported by the
>> > patterns without requiring a large number of generated patterns by using
>> > distinct mode iterators.
>> >
>> > The new scheme means we only need a single (generated) pattern for each
>> > load/store operation of a given operand size.  For the 4-byte and 8-byte
>> > operand cases, we use the GPI iterator to synthesize the two patterns.
>> > The 16-byte case is implemented as a separate pattern in the source (due
>> > to only having a single possible alternative).
>> >
>> > Since the UNSPEC patterns can't be interpreted by the dwarf2cfi code,
>> > we add REG_CFA_OFFSET notes to the store pair insns emitted by
>> > aarch64_save_callee_saves, so that correct CFI information can still be
>> > generated.  Furthermore, we now unconditionally generate these CFA
>> > notes on frame-related insns emitted by aarch64_save_callee_saves.
>> > This is done in case that the load/store pair pass forms these into
>> > pairs, in which case the CFA notes would be needed.
>> >
>> > We also adjust the ldp/stp peepholes to generate the new form.  This is
>> > done by switching the generation to use the
>> > aarch64_gen_{load,store}_pair interface, making it easier to change the
>> > form in the future if needed.  (Likewise, the upcoming aarch64
>> > load/store pair pass also makes use of this interface).
>> >
>> > This patch also adds an "ldpstp" attribute to the non-writeback
>> > load/store pair patterns, which is used by the post-RA load/store pair
>> > pass to identify existing patterns and see if they can be promoted to
>> > writeback variants.
>> >
>> > One potential concern with using unspecs for the patterns is that it can 
>> > block
>> > optimization by the generic RTL passes.  This patch series tries to 
>> > mitigate
>> > this in two ways:
>> >  1. The pre-RA load/store pair pass runs very late in the pre-RA pipeline.
>> >  2. A later patch in the series adjusts the aarch64 mem{cpy,set} expansion 
>> > to
>> > emit individual loads/stores instead of ldp/stp.  These should then be
>> > formed back into load/store pairs much later in the RTL pipeline by the
>> > new load/store pair pass.
>> >
>> > gcc/ChangeLog:
>> >
>> > * config/aarch64/aarch64-ldpstp.md: Abstract ldp/stp
>> > representation from peepholes, allowing use of new form.
>> > * config/aarch64/aarch64-modes.def (V2x4QImode): Define.
>> > * config/aarch64/aarch64-protos.h
>> > (aarch64_finish_ldpstp_peephole): Declare.
>> > (aarch64_swap_ldrstr_operands): Delete declaration.
>> > (aarch64_gen_load_pair): Adjust parameters.
>> > (aarch64_gen_store_pair): Likewise.
>> > * config/aarch64/aarch64

[committed v2] aarch64 testsuite: Check entire .arch string

2023-12-13 Thread Andrew Carlotti
Add a terminating newline to various tests, and add missing
extensions to some test strings.  The current output is broken for
options_set_4.c, so this test is left unchanged, to be fixed in a
subsequent patch.

Committed as obvious, with options_set_4.c removed compared to v1.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n
* gcc.target/aarch64/options_set_7.c: Add \+crc\n
* gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n
* gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n
* gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/options_set_1.c: Ditto.
* gcc.target/aarch64/options_set_2.c: Ditto.
* gcc.target/aarch64/options_set_3.c: Ditto.
* gcc.target/aarch64/options_set_5.c: Ditto.
* gcc.target/aarch64/options_set_6.c: Ditto.
* gcc.target/aarch64/options_set_9.c: Ditto.
* gcc.target/aarch64/options_set_11.c: Ditto.
* gcc.target/aarch64/options_set_12.c: Ditto.
* gcc.target/aarch64/options_set_13.c: Ditto.
* gcc.target/aarch64/options_set_14.c: Ditto.
* gcc.target/aarch64/options_set_15.c: Ditto.
* gcc.target/aarch64/options_set_16.c: Ditto.
* gcc.target/aarch64/options_set_17.c: Ditto.
* gcc.target/aarch64/options_set_18.c: Ditto.
* gcc.target/aarch64/options_set_19.c: Ditto.
* gcc.target/aarch64/options_set_20.c: Ditto.
* gcc.target/aarch64/options_set_21.c: Ditto.
* gcc.target/aarch64/options_set_22.c: Ditto.
* gcc.target/aarch64/options_set_23.c: Ditto.
* gcc.target/aarch64/options_set_24.c: Ditto.
* gcc.target/aarch64/options_set_25.c: Ditto.
* gcc.target/aarch64/options_set_26.c: Ditto.


diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
index 
8499f87c39b173491a89626af56f4e193b1d12b5..fb5a7a18ad1a2d09ac4b231150a1bd9e72d6fab6
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\n} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
index 
2cf0e89994b1cc0dc9fac67f4dc431c003498048..cb50e3b73057994432cc3ed15e3d5b57c7a3cb7b
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_1.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nosimd\n} } } */
 
 /* Test one where fp is on by default so turn off simd.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
index 
ddb06b8227576807fe068b76dabed91a0223e4fa..6a524bad371c55fc32698ff0994f4ad431be49ca
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_10.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+nofp} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+nofp\n} } } */
 
 /* Test one with no entry in feature list.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c 
b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
index 
96b9ca434ebbf007ddaa45d55a8c2b8e7a19a715..644f4792275bdd32a9f84241f0c329b046cbd909
 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_11.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }

[PATCH v2] aarch64: Fix +nocrypto handling

2023-12-13 Thread Andrew Carlotti
Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with
checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead.  The value of the
AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is
retained because removing it would make processing the data in
option-extensions.def significantly more complex.

This bug should have been picked up by an existing test, but a missing
newline meant that the pattern incorrectly allowed "+crypto+nocrypto".

Ok for master?

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(aarch64_get_extension_string_for_isa_flags): Fix generation of
the "+nocrypto" extension.
* config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove.
(TARGET_CRYPTO): Remove.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Don't use TARGET_CRYPTO.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_4.c: Add terminating newline.
* gcc.target/aarch64/options_set_27.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
8fb901029ec2980a048177586b84201b3b398f9e..c2a6d357c0bc17996a25ea5c3a40f69d745c7931
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -311,6 +311,7 @@ aarch64_get_extension_string_for_isa_flags
  But in order to make the output more readable, it seems better
  to add the strings in definition order.  */
   aarch64_feature_flags added = 0;
+  auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2;
   for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; )
 {
   auto &opt = all_extensions[i];
@@ -320,7 +321,7 @@ aarch64_get_extension_string_for_isa_flags
 per-feature crypto flags.  */
   auto flags = opt.flag_canonical;
   if (flags == AARCH64_FL_CRYPTO)
-   flags = AARCH64_FL_AES | AARCH64_FL_SHA2;
+   flags = flags_crypto;
 
   if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags)
{
@@ -339,14 +340,32 @@ aarch64_get_extension_string_for_isa_flags
  not have an HWCAPs then it shouldn't be taken into account for feature
  detection because one way or another we can't tell if it's available
  or not.  */
+
   for (auto &opt : all_extensions)
-if (opt.native_detect_p
-   && (opt.flag_canonical & current_flags & ~isa_flags))
-  {
-   current_flags &= ~opt.flags_off;
-   outstr += "+no";
-   outstr += opt.name;
-  }
+{
+  auto flags = opt.flag_canonical;
+  /* As a special case, don't emit "+noaes" or "+nosha2" when we could emit
+"+nocrypto" instead, in order to support assemblers that predate the
+separate per-feature crypto flags.  Only allow "+nocrypto" when "sm4"
+is not already enabled (to avoid depending on whether "+nocrypto"
+also disables "sm4").  */
+  if (flags & flags_crypto
+ && (flags_crypto & current_flags & ~isa_flags) == flags_crypto
+ && !(current_flags & AARCH64_FL_SM4))
+ continue;
+
+  if (flags == AARCH64_FL_CRYPTO)
+   /* If either crypto flag needs removing here, then both do.  */
+   flags = flags_crypto;
+
+  if (opt.native_detect_p
+ && (flags & current_flags & ~isa_flags))
+   {
+ current_flags &= ~opt.flags_off;
+ outstr += "+no";
+ outstr += opt.name;
+   }
+}
 
   return outstr;
 }
diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 
115a2a8b7568c43a712d819e03147ff84ff182c0..cdc4e453a2054b1a1d2c70bf0b528e497ae0b9ad
 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -188,7 +188,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile);
   aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
 
-  aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
+  aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO",
+                        pfile);
   aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
   aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
   cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
2cd0bc552ebadac06a2838ae2767852c036d0db4..501bb7478a0755fa76c488ec03dcfab6c272851c
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -204,7 +204,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 
 #endif
 
-/* Macros to test ISA flags.  */
+/* Macros to test ISA flags.
+
+   There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
+   is not always set when its constituent features are present.
+   Check (TARGET_AES && TARGET_SHA2) instead.  */
 
 #define AARCH64_ISA_SM_OFF (aarch64_isa_flags & AARCH64_FL_SM_OFF)
 #define AARCH64_ISA_SM_ON  (aarch64_isa_flags & AARCH64_FL_SM_ON)

[PATCH v2] aarch64: Fix +nopredres, +nols64 and +nomops

2023-12-13 Thread Andrew Carlotti
On Sat, Dec 09, 2023 at 07:22:49PM +, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > ...
> 
> This is the only use of native_detect_p, so it'd be good to remove
> the field itself.

Done
 
> > ...
> >
> > @@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv)
> >if (tune)
> >  return res;
> >  
> > +  if (!processed_exts)
> > +goto not_found;
> 
> Could you explain this part?  It seems like more of a parsing change
> (i.e. being more strict about what we accept).
> 
> If that's the intention, it probably belongs in:
> 
>   if (n_cores == 0
>   || n_cores > 2
>   || (n_cores == 1 && n_variants != 1)
>   || imp == INVALID_IMP)
> goto not_found;
> 
> But maybe it should be a separate patch.

I added it because I realised that the parsing behaviour didn't make sense in
that case, and my patch happens to change the behaviour as well (the outcome
without the check would be no enabled features, whereas previously it would
enable only the features with no native detection).

I agree that it makes sense to put it with the original check, so I've made 
that change.

> Looks good otherwise, thanks.
> 
> Richard

New patch version below, ok for master?

---

For native cpu feature detection, certain features have no entry in
/proc/cpuinfo, so have to be assumed to be present whenever the detected
cpu is supposed to support that feature.

However, the logic for this was mistakenly implemented by excluding
these features from part of aarch64_get_extension_string_for_isa_flags.
This function is also used elsewhere when canonicalising explicit
feature sets, which may require removing features that are normally
implied by the specified architecture version.

This change reenables generation of +nopredres, +nols64 and +nomops
during canonicalisation, by relocating the misplaced native cpu
detection logic.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.cc
(struct aarch64_option_extension): Remove unused field.
(all_extensions): Ditto.
(aarch64_get_extension_string_for_isa_flags): Remove filtering
of features without native detection.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu):
Explicitly add expected features that lack cpuinfo detection.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_28.c: New test.


diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
b/gcc/common/config/aarch64/aarch64-common.cc
index 
c2a6d357c0bc17996a25ea5c3a40f69d745c7931..4d0431d3a2cad5414790646bce0c09877c0366b2
 100644
--- a/gcc/common/config/aarch64/aarch64-common.cc
+++ b/gcc/common/config/aarch64/aarch64-common.cc
@@ -149,9 +149,6 @@ struct aarch64_option_extension
   aarch64_feature_flags flags_on;
   /* If this feature is turned off, these bits also need to be turned off.  */
   aarch64_feature_flags flags_off;
-  /* Indicates whether this feature is taken into account during native cpu
- detection.  */
-  bool native_detect_p;
 };
 
 /* ISA extensions in AArch64.  */
@@ -159,10 +156,9 @@ static constexpr aarch64_option_extension all_extensions[] 
=
 {
 #define AARCH64_OPT_EXTENSION(NAME, IDENT, C, D, E, FEATURE_STRING) \
   {NAME, AARCH64_FL_##IDENT, feature_deps::IDENT ().explicit_on, \
-   feature_deps::get_flags_off (feature_deps::root_off_##IDENT), \
-   FEATURE_STRING[0]},
+   feature_deps::get_flags_off (feature_deps::root_off_##IDENT)},
 #include "config/aarch64/aarch64-option-extensions.def"
-  {NULL, 0, 0, 0, false}
+  {NULL, 0, 0, 0}
 };
 
 struct processor_name_to_arch
@@ -358,8 +354,7 @@ aarch64_get_extension_string_for_isa_flags
/* If either crypto flag needs removing here, then both do.  */
flags = flags_crypto;
 
-  if (opt.native_detect_p
- && (flags & current_flags & ~isa_flags))
+  if (flags & current_flags & ~isa_flags)
{
  current_flags &= ~opt.flags_off;
  outstr += "+no";
diff --git a/gcc/config/aarch64/driver-aarch64.cc 
b/gcc/config/aarch64/driver-aarch64.cc
index 
8e318892b10aa2288421fad418844744a2f5a3b4..c18f065aa41e7328d71b45a53c82a3b703ae44d5
 100644
--- a/gcc/config/aarch64/driver-aarch64.cc
+++ b/gcc/config/aarch64/driver-aarch64.cc
@@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv)
   unsigned int n_variants = 0;
   bool processed_exts = false;
   aarch64_feature_flags extension_flags = 0;
+  aarch64_feature_flags unchecked_extension_flags = 0;
   aarch64_feature_flags default_flags = 0;
   std::string buf;
   size_t sep_pos = -1;
@@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv)
  /* If the feature contains no HWCAPS string then ignore it for the
 auto detection.  */
  if (val.empty ())
-   continue;
+   {
+ unchecked_extension_flags |= aarch64_extensions[i].flag;
+ continue;
+   }
 
  bool enabled = true;
 
@@ -380,7 +384,8 @@ h

[wwwdocs][patch] gcc-14/changes.html + project/gomp/: Update OpenMP status

2023-12-13 Thread Tobias Burnus

Attached is an in-between update for the release notes and also for the project 
status page.

The latter contains an implementation-status page that is updated based on the 
libgomp.texi entries;
I think there are more issues, but I found an incomplete update which is now 
fixed. I probably need to
go through that list and check it again.

Comments, suggestions, remarks before (or after) I commit it?

Tobias

PS: I hope that we can still review, revise, commit some more pending patches
(in general, but mostly talking about OpenMP, OpenACC and nvptx/gcn here).

PPS: I or someone else also needs to update the Fortran part, now that there is
-std=f2023 (besides -std=c23 and -std=c++23). I have not checked for additional
features.
gcc-14/changes.html + project/gomp/: Update OpenMP status

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c49c13e9..516f3954 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -52,10 +53,22 @@ a work-in-progress.
 
   <li>The <code>requires</code> directive's <code>unified_address</code>
   requirement is now fulfilled by both AMD GCN and nvptx devices.</li>
+  <li>AMD GCN and nvptx devices now support low-latency allocators as
+  <a href="https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html"
+  >detailed in the manual</a>. Initial support for pinned-memory
+  allocators has been added (<a
+  href="https://gcc.gnu.org/onlinedocs/libgomp/Memory-allocation.html"
+  >as detailed in the manual</a>)</li>
 
 <li>OpenMP 5.0: The <code>allocate</code> directive is now
   supported for stack variables in C and Fortran, including the OpenMP 5.1
-  <code>align</code> modifier.</li>
+  <code>align</code> modifier. For Fortran, OpenMP allocators can now be
+  used for allocatables and pointers using the <code>allocate</code>
+  directive and its OpenMP 5.2 replacement, the <code>allocators</code>
+  directive; files using this allocator and all files that might directly
+  or indirectly (intrinsic assignment, <code>intent(out)</code>, ...)
+  de- or reallocate such-allocated variables must be compiled with the
+  <a href="https://gcc.gnu.org/onlinedocs/gfortran/Fortran-Dialect-Options.html#index-fopenmp-allocators"
+  >-fopenmp-allocators</a> option.</li>
 
 <li>
   OpenMP 5.1: Support was added for collapsing imperfectly nested loops and

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 03f74a88..bf20bb88 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -480,7 +480,7 @@ than listed, depending on resolved corner cases and optimizations.
   
 allocate directive
 GCC 14
-Only C and Fortran, only stack variables
+Only C for stack/automatic and Fortran for stack/automatic and allocatable/pointer variables
   
   
 Discontiguous array section with target update construct
@@ -555,7 +555,7 @@ than listed, depending on resolved corner cases and optimizations.
   
 align clause in allocate directive
 GCC 14
-Only C and Fortran (and only stack variables)
+Only C and Fortran (and not for static variables)
   
   
 align modifier in allocate clause
@@ -820,10 +820,15 @@ than listed, depending on resolved corner cases and optimizations.
 
   
   
-Deprecation of no-argument destroy clause on depobj
+destroy clause with destroy-var argument on depobj
 GCC 14
 
   
+  
+Deprecation of no-argument destroy clause on depobj
+No
+
+  
   
 linear clause syntax changes and step modifier
 GCC 13

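For reference, a minimal C sketch of the allocate-directive support
described in the changes.html hunk above (the allocator choice is
illustrative; compile with -fopenmp):

#include <omp.h>

void
f (void)
{
  int buf[64];
  /* OpenMP 5.0 allocate directive on a stack variable, with the
     OpenMP 5.1 align modifier.  */
  #pragma omp allocate(buf) allocator(omp_low_lat_mem_alloc) align(64)
  buf[0] = 1;
}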

[PATCH v2] extend.texi: Fix typos in LSX intrinsics

2023-12-13 Thread Jiajie Chen
Several typos have been found and fixed: missing semicolons, using
variable name instead of type, duplicate functions and wrong types.

gcc/ChangeLog:

* doc/extend.texi (__lsx_vabsd_di): Remove extra `i' in name.
(__lsx_vfrintrm_d, __lsx_vfrintrm_s, __lsx_vfrintrne_d,
__lsx_vfrintrne_s, __lsx_vfrintrp_d, __lsx_vfrintrp_s, __lsx_vfrintrz_d,
__lsx_vfrintrz_s): Fix return types.
(__lsx_vld, __lsx_vldi, __lsx_vldrepl_b, __lsx_vldrepl_d,
__lsx_vldrepl_h, __lsx_vldrepl_w, __lsx_vmaxi_b, __lsx_vmaxi_d,
__lsx_vmaxi_h, __lsx_vmaxi_w, __lsx_vmini_b, __lsx_vmini_d,
__lsx_vmini_h, __lsx_vmini_w, __lsx_vsrani_d_q, __lsx_vsrarni_d_q,
__lsx_vsrlni_d_q, __lsx_vsrlrni_d_q, __lsx_vssrani_d_q,
__lsx_vssrarni_d_q, __lsx_vssrarni_du_q, __lsx_vssrlni_d_q,
__lsx_vssrlrni_du_q, __lsx_vst, __lsx_vstx, __lsx_vssrani_du_q,
__lsx_vssrlni_du_q, __lsx_vssrlrni_d_q): Add missing semicolon.
(__lsx_vpickve2gr_bu, __lsx_vpickve2gr_hu): Fix typo in return
type.
(__lsx_vstelm_b, __lsx_vstelm_d, __lsx_vstelm_h,
__lsx_vstelm_w): Use imm type for the last argument.
(__lsx_vsigncov_b, __lsx_vsigncov_h, __lsx_vsigncov_w,
__lsx_vsigncov_d): Remove duplicate definitions.
---
 gcc/doc/extend.texi | 90 ++---
 1 file changed, 43 insertions(+), 47 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f0c789f6cb4..ba1317c3510 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -17563,7 +17563,7 @@ int __lsx_bz_v (__m128i);
 int __lsx_bz_w (__m128i);
 __m128i __lsx_vabsd_b (__m128i, __m128i);
 __m128i __lsx_vabsd_bu (__m128i, __m128i);
-__m128i __lsx_vabsd_di (__m128i, __m128i);
+__m128i __lsx_vabsd_d (__m128i, __m128i);
 __m128i __lsx_vabsd_du (__m128i, __m128i);
 __m128i __lsx_vabsd_h (__m128i, __m128i);
 __m128i __lsx_vabsd_hu (__m128i, __m128i);
@@ -17769,14 +17769,14 @@ __m128 __lsx_vfnmsub_s (__m128, __m128, __m128);
 __m128d __lsx_vfrecip_d (__m128d);
 __m128 __lsx_vfrecip_s (__m128);
 __m128d __lsx_vfrint_d (__m128d);
-__m128i __lsx_vfrintrm_d (__m128d);
-__m128i __lsx_vfrintrm_s (__m128);
-__m128i __lsx_vfrintrne_d (__m128d);
-__m128i __lsx_vfrintrne_s (__m128);
-__m128i __lsx_vfrintrp_d (__m128d);
-__m128i __lsx_vfrintrp_s (__m128);
-__m128i __lsx_vfrintrz_d (__m128d);
-__m128i __lsx_vfrintrz_s (__m128);
+__m128d __lsx_vfrintrm_d (__m128d);
+__m128 __lsx_vfrintrm_s (__m128);
+__m128d __lsx_vfrintrne_d (__m128d);
+__m128 __lsx_vfrintrne_s (__m128);
+__m128d __lsx_vfrintrp_d (__m128d);
+__m128 __lsx_vfrintrp_s (__m128);
+__m128d __lsx_vfrintrz_d (__m128d);
+__m128 __lsx_vfrintrz_s (__m128);
 __m128 __lsx_vfrint_s (__m128);
 __m128d __lsx_vfrsqrt_d (__m128d);
 __m128 __lsx_vfrsqrt_s (__m128);
@@ -17845,12 +17845,12 @@ __m128i __lsx_vinsgr2vr_b (__m128i, int, imm0_15);
 __m128i __lsx_vinsgr2vr_d (__m128i, long int, imm0_1);
 __m128i __lsx_vinsgr2vr_h (__m128i, int, imm0_7);
 __m128i __lsx_vinsgr2vr_w (__m128i, int, imm0_3);
-__m128i __lsx_vld (void *, imm_n2048_2047)
-__m128i __lsx_vldi (imm_n1024_1023)
-__m128i __lsx_vldrepl_b (void *, imm_n2048_2047)
-__m128i __lsx_vldrepl_d (void *, imm_n256_255)
-__m128i __lsx_vldrepl_h (void *, imm_n1024_1023)
-__m128i __lsx_vldrepl_w (void *, imm_n512_511)
+__m128i __lsx_vld (void *, imm_n2048_2047);
+__m128i __lsx_vldi (imm_n1024_1023);
+__m128i __lsx_vldrepl_b (void *, imm_n2048_2047);
+__m128i __lsx_vldrepl_d (void *, imm_n256_255);
+__m128i __lsx_vldrepl_h (void *, imm_n1024_1023);
+__m128i __lsx_vldrepl_w (void *, imm_n512_511);
 __m128i __lsx_vldx (void *, long int);
 __m128i __lsx_vmadd_b (__m128i, __m128i, __m128i);
 __m128i __lsx_vmadd_d (__m128i, __m128i, __m128i);
@@ -17886,13 +17886,13 @@ __m128i __lsx_vmax_d (__m128i, __m128i);
 __m128i __lsx_vmax_du (__m128i, __m128i);
 __m128i __lsx_vmax_h (__m128i, __m128i);
 __m128i __lsx_vmax_hu (__m128i, __m128i);
-__m128i __lsx_vmaxi_b (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_b (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_bu (__m128i, imm0_31);
-__m128i __lsx_vmaxi_d (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_d (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_du (__m128i, imm0_31);
-__m128i __lsx_vmaxi_h (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_h (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_hu (__m128i, imm0_31);
-__m128i __lsx_vmaxi_w (__m128i, imm_n16_15)
+__m128i __lsx_vmaxi_w (__m128i, imm_n16_15);
 __m128i __lsx_vmaxi_wu (__m128i, imm0_31);
 __m128i __lsx_vmax_w (__m128i, __m128i);
 __m128i __lsx_vmax_wu (__m128i, __m128i);
@@ -17902,13 +17902,13 @@ __m128i __lsx_vmin_d (__m128i, __m128i);
 __m128i __lsx_vmin_du (__m128i, __m128i);
 __m128i __lsx_vmin_h (__m128i, __m128i);
 __m128i __lsx_vmin_hu (__m128i, __m128i);
-__m128i __lsx_vmini_b (__m128i, imm_n16_15)
+__m128i __lsx_vmini_b (__m128i, imm_n16_15);
 __m128i __lsx_vmini_bu (__m128i, imm0_31);
-__m128i __lsx_vmini_d (__m128i, imm_n16_15)
+__m128i __lsx_vmini_d (__m128i, imm_n16_15);
 __m128i __lsx_vmini_du (__m128i, imm0_31);

[committed] amdgcn: XNACK support

2023-12-13 Thread Andrew Stubbs
Some AMD GCN devices support an "XNACK" mode in which the device can 
handle page-misses (and maybe other traps in memory instructions), but 
it's not completely invisible to software.


We need this now to support OpenMP Unified Shared Memory (I plan to post 
updated patches for that in January), and in future it may enable 
support for APU devices (such as MI300).


The first patch ensures that load instructions are "restartable", 
meaning that the outputs do not overwrite the input registers (address 
and offsets). This maps pretty much exactly to the GCC "early-clobber" 
concept, so we just need to add additional alternatives and then not 
generate problem instructions explicitly.


The second patch is a workaround for the register allocation problem I 
asked about on gcc@ yesterday.  The early clobber increases register 
pressure, which causes a compile failure when LRA is unable to spill 
additional registers without needing yet more registers.  The problem 
does not bite so soon on gfx90a (MI200) thanks to the additional AVGPR 
spill registers, and that's the only device that really supports USM so 
far, so limiting XNACK to that device will work for now.


The -mxnack option was already added as a placeholder, so not much is 
needed there.


Committed to master. An older version of these patches is already 
committed to devel/omp/gcc-13 (OG13).


Andrew

amdgcn: Work around XNACK register allocation problem

The extra register pressure is causing infinite loops in some cases, especially
at -O0.  I have not yet observed any issue on devices that have AVGPRs for
spilling, and XNACK is only really useful on those devices anyway, so change
the defaults.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (NO_XNACK): Change the defaults.
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add HSACO_ATTR_DEFAULT.
* config/gcn/gcn.cc (gcn_option_override): Set the default flag_xnack.
* config/gcn/gcn.opt: Add -mxnack=default.
* doc/invoke.texi: Document the -mxnack default.

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index bfb104526c5..b44d42b02d6 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -75,7 +75,9 @@ extern unsigned int gcn_local_sym_hash (const char *name);
supported for gcn.  */
 #define GOMP_SELF_SPECS ""
 
-#define NO_XNACK "march=fiji:;march=gfx1030:;"
+#define NO_XNACK "march=fiji:;march=gfx1030:;" \
+/* These match the defaults set in gcn.cc.  */ \
+"!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
 #define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
 
 /* In HSACOv4 no attribute setting means the binary supports "any" hardware
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index b4f494d868c..634cec6d832 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -65,7 +65,8 @@ enum hsaco_attr_type
 {
   HSACO_ATTR_OFF,
   HSACO_ATTR_ON,
-  HSACO_ATTR_ANY
+  HSACO_ATTR_ANY,
+  HSACO_ATTR_DEFAULT
 };
 
 #endif
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d92cd01d03f..b67551a2e8e 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -172,6 +172,29 @@ gcn_option_override (void)
   /* Allow HSACO_ATTR_ANY silently because that's the default.  */
   flag_xnack = HSACO_ATTR_OFF;
 }
+
+  /* There's no need for XNACK on devices without USM, and there are register
+ allocation problems caused by the early-clobber when AVGPR spills are not
+ available.
+ FIXME: can the regalloc mean the default can be really "any"?  */
+  if (flag_xnack == HSACO_ATTR_DEFAULT)
+switch (gcn_arch)
+  {
+  case PROCESSOR_FIJI:
+  case PROCESSOR_VEGA10:
+  case PROCESSOR_VEGA20:
+  case PROCESSOR_GFX908:
+   flag_xnack = HSACO_ATTR_OFF;
+   break;
+  case PROCESSOR_GFX90a:
+   flag_xnack = HSACO_ATTR_ANY;
+   break;
+  default:
+   gcc_unreachable ();
+  }
+
+  if (flag_sram_ecc == HSACO_ATTR_DEFAULT)
+flag_sram_ecc = HSACO_ATTR_ANY;
 }
 
 /* }}}  */
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index c356a0cbb08..32486d9615f 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -97,9 +97,12 @@ Enum(hsaco_attr_type) String(on) Value(HSACO_ATTR_ON)
 EnumValue
 Enum(hsaco_attr_type) String(any) Value(HSACO_ATTR_ANY)
 
+EnumValue
+Enum(hsaco_attr_type) String(default) Value(HSACO_ATTR_DEFAULT)
+
 mxnack=
-Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_xnack) Init(HSACO_ATTR_ANY)
-Compile for devices requiring XNACK enabled. Default \"any\".
+Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_xnack) Init(HSACO_ATTR_DEFAULT)
+Compile for devices requiring XNACK enabled. Default \"any\" if USM is supported.
 
 msram-ecc=
 Target RejectNegative Joined ToLower Enum(hsaco_attr_type) Var(flag_sram_ecc) Init(HSACO_ATTR_ANY)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index db

Re: Disable FMADD in chains for Zen4 and generic

2023-12-13 Thread Jan Hubicka
> > The difference is that Cores understand the fact that fmadd does not need
> > all three parameters to start computation, while Zen cores don't.
> >
> > Since this seems a noticeable win on Zen and not a loss on Core, it seems
> > like a good default for generic.
> >
> > I plan to commit the patch next week if there are no complaints.
> The generic part LGTM. (It's exactly what we proposed in [1])
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637721.html

Thanks.  I wonder if we can think of other generic changes that would make
sense to do?
Concerning zen4 and FMA, it is not really a win with AVX512 enabled
(which is what I was benchmarking for znver4 tuning), but it is indeed a
win with AVX256, where the extra latency is not hidden by the parallelism
exposed by doing everything twice.

I re-benchmarked zen4 and it behaves similarly to zen3 with avx256, so
for x86-64-v3 this makes sense.
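
For reference, the shape being tuned is a loop-carried FMA chain; a
minimal sketch (illustrative, compile with e.g. -O2 -mfma):

/* acc carries a dependence between iterations: on cores where fmadd
   waits for all three inputs (the Zen behavior described above), the
   chain serializes on the FMA latency; a split mul + add lets the
   multiply start before acc is ready.  */
float
dot (const float *a, const float *b, int n)
{
  float acc = 0.0f;
  for (int i = 0; i < n; i++)
    acc += a[i] * b[i];
  return acc;
}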

Honza
> >
> > Honza
> >
> > #include <stdio.h>
> > #include <time.h>
> >
> > #define SIZE 1000
> >
> > float a[SIZE][SIZE];
> > float b[SIZE][SIZE];
> > float c[SIZE][SIZE];
> >
> > void init(void)
> > {
> >int i, j, k;
> >    for(i=0; i<SIZE; i++)
> >    {
> >       for(j=0; j<SIZE; j++)
> >       {
> >  a[i][j] = (float)i + j;
> >  b[i][j] = (float)i - j;
> >  c[i][j] = 0.0f;
> >   }
> >}
> > }
> >
> > void mult(void)
> > {
> >int i, j, k;
> >
> >    for(i=0; i<SIZE; i++)
> >    {
> >       for(j=0; j<SIZE; j++)
> >       {
> >          for(k=0; k<SIZE; k++)
> >          {
> > c[i][j] += a[i][k] * b[k][j];
> >  }
> >   }
> >}
> > }
> >
> > int main(void)
> > {
> >clock_t s, e;
> >
> >init();
> >s=clock();
> >mult();
> >e=clock();
> >printf("mult took %10d clocks\n", (int)(e-s));
> >
> >return 0;
> >
> > }
> >
> > * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS,
> > X86_TUNE_AVOID_256FMA_CHAINS): Enable for znver4 and generic.
> >
> > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> > index 43fa9e8fd6d..74b03cbcc60 100644
> > --- a/gcc/config/i386/x86-tune.def
> > +++ b/gcc/config/i386/x86-tune.def
> > @@ -515,13 +515,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, 
> > "use_scatter_8parts",
> >
> >  /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
> > smaller FMA chain.  */
> > -DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | 
> > m_ZNVER2 | m_ZNVER3
> > -  | m_YONGFENG)
> > +DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | 
> > m_ZNVER2 | m_ZNVER3 | m_ZNVER4
> > +  | m_YONGFENG | m_GENERIC)
> >
> >  /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
> > smaller FMA chain.  */
> > -DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | 
> > m_ZNVER3
> > - | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
> > +DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | 
> > m_ZNVER3 | m_ZNVER4
> > + | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
> >
> >  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
> > smaller FMA chain.  */
> 
> 
> 
> -- 
> BR,
> Hongtao


Re: [PATCH] tree-optimization/111807 - ICE in verify_sra_access_forest

2023-12-13 Thread Martin Jambor
Hi,

sorry for getting to this only so late; my email backlog from my medical
leave still isn't empty.

On Mon, Oct 16 2023, Richard Biener wrote:
> The following addresses build_reconstructed_reference failing to
> build references with a different offset than the models and thus
> the caller conditional being off.  This manifests when attempting
> to build a ref with offset 160 from the model BIT_FIELD_REF 
> onto the same base l_4827 but the models offset being 288.  This
> cannot work for any kind of ref I can think of, not just with
> BIT_FIELD_REFs.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, will push
> later.
>
> Martin - do you remember which case was supposed to be allowed
> with offset < model->offset?
>

It happens quite often, even in our testsuite.  In fact, the condition
is not even supposed to be necessary and is there just as an early exit
in hopeless cases with malformed programs.

The problem is that the function is used in two contexts and in one of
them we are not careful about ARRAY_REFs, as explained below in a commit
message of a patch I'd like to push.  Needless to say, it has passed
bootstrap and testing on x86_64-linux; I'm running the same on
aarch64-linux.

What do you think?

Martin



Subject: [PATCH] SRA: Relax requirements to use build_reconstructed_reference 
(PR 111807)

This patch half-reverts 3aaf704bca3e and replaces it with a fix with
relaxed requirements for invoking build_reconstructed_reference in
build_ref_for_model.

build_ref_for_model/build_ref_for_offset is used in two slightly
different contexts. The first is when we are looking at an assignment
like

   p->field_A.field_B = s.field_B;

and we have replacements for e.g. s.field_B.field_C.field_D and we
want to store them directly to p->field_A.field_B.field_C.field_D (as
opposed to going through s or using a MEM_REF based in
p->field_A.field_B).  In this case, the offset of the
"model" (s.field_B.field_C.field_D) within this can be different than
offset within the LHS that we want to reach (field_C.field_D within
the "base" p->field_A.field_B).  Patch 3aaf704bca3e has caused us to
unnecessarily create MEM_REFs for these situations.  These uses of
build_ref_for_model work with the relaxed condition just fine.

The second, problematic, context is when somewhere in the function we
have an assignment

  s.field_A = t.field_A.field_B;

and we are creating an access structure to represent s.field_A.field_B
even if it is not actually accessed in the original input.  This is
done after scanning the entire function body and we need to construct
a "universal" reference to s.field_A.field_B.  In this case the "base"
is "s" and it has to be the DECL itself and not some reference for it
because for arbitrary references we need a GSI pointing to a statement
which we don't have, the reference is supposed to be universal.

But then using build_ref_for_model and within it
build_reconstructed_reference misbehaves if the expression contains
any ARRAY_REFs.  In the first case those are fine because as we
eventually reach the aggregate type that matches a real LHS or RHS, we
know we can just bolt the rest of the references onto it and end up
with the correct overall reference.  However when dealing with

   s.array[1].field_A = s.array[2].field_B;

we cannot just bolt the array[2] reference on when we want array[1], but
that is exactly what happens when we use build_reconstructed_reference
and keep it walking all the way to s.
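
A self-contained sketch of that problematic shape (types are illustrative):

struct B { int field_A; int field_B; };
struct S { struct B array[4]; };
struct S s;

void
f (void)
{
  /* Walking up from the RHS would wrongly reuse the array[2] index
     when reconstructing a reference to s.array[1].field_A.  */
  s.array[1].field_A = s.array[2].field_B;
}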

I was considering making all users of the second kind use directly
build_ref_for_offset instead of build_ref_for_model but the latter
also handles COMPONENT_REFs to bit-fields which the former does not.
Therefore I have decided to use the NULL-ness of GSI as an indicator of
how strict we need to be.  I have changed the function comment to
reflect that.

I have been able to observe disambiguation improvements with this patch
over current master, we do successfully manage a few more
aliasing_component_refs_p disambiguations when compiling cc1, going
from:

  Alias oracle query stats:
refs_may_alias_p: 94354287 disambiguations, 106279231 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407309 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must 
overlaps, 68893 queries
aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
TBAA oracle: 22675296 disambiguations 61781978 queries
 14045969 are in alias set 0
 10997085 queries asked about the same object
 153 queries asked about the same alias set
 0 access volatile
 12485774 are dependent in the DAG
 1577701 are aritificially in conflict with void *

  Modref stats:
modref kill: 832 kills, 19399 queries
m

[PATCH v4] A new copy propagation and PHI elimination pass

2023-12-13 Thread Filip Kastl
> > > Hi,
> > > 
> > > this is a patch that I submitted two months ago as an RFC. I added some 
> > > polish
> > > since.
> > > 
> > > It is a new lightweight pass that removes redundant PHI functions and as a
> > > bonus does basic copy propagation. With Jan Hubička we measured that it 
> > > is able
> > > to remove usually more than 5% of all PHI functions when run among early 
> > > passes
> > > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > > later optimized away but with this pass it is possible to remove them 
> > > early
> > > enough so that they don't get streamed when running LTO (and also 
> > > potentially
> > > inlined at multiple places). It is also able to remove some redundant PHIs
> > > that otherwise would still be present during RTL expansion.
> > > 
> > > Jakub Jelínek was concerned about debug info coverage so I compiled 
> > > cc1plus
> > > with and without this patch. These are the sizes of .debug_info and
> > > .debug_loclists
> > > 
> > > .debug_info without patch 181694311
> > > .debug_infowith patch 181692320
> > > +0.0011% change
> > > 
> > > .debug_loclists without patch 47934753
> > > .debug_loclistswith patch 47934966
> > > -0.0004% change
> > > 
> > > I wanted to use dwlocstat to compare debug coverages but didn't manage to 
> > > get
> > > the program working on my machine sadly. Hope this suffices. Seems to me 
> > > that
> > > my patch doesn't have a significant impact on debug info.
> > > 
> > > Bootstrapped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test 
> > > is
> > > dependent on a certain variable not being copy propagated. I will go into 
> > > more
> > > detail about this in a reply to this mail.
> > > 
> > > Ok to commit?
> > 
> > This is a second version of the patch.  In this version, I modified the
> > pr79691.c testcase so that it works as intended with other changes from the
> > patch.
> > 
> > The pr79691.c testcase checks that we get constants from snprintf calls and
> > that they simplify into a single constant.  The testcase doesn't account for
> > the fact that this constant may be further copy propagated which is exactly
> > what happens with this patch applied.
> > 
> > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > 
> > Ok to commit?
> 
> This is the third version of the patch. In this version, I addressed most of
> Richards remarks about the second version. Here is a summary of changes I 
> made:
> 
> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
> - Use simple_dce_from_worklist to remove propagated statements
> - Use existing replace_uses API instead of reinventing it
>   - This allowed me to get rid of some now redundant cleanup code
> - Encapsulate the SCC finding into a class
> - Rework stmt_may_generate_copy to get rid of redundant checks
> - Add check that PHI doesn't contain two non-SSA-name values to
>   stmt_may_generate_copy
> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>   the conservative check that Richard suggested
> - Index array of vertices that SCC discovery uses by SSA name version numbers
>   instead of numbering statements myself.
> 
> 
> I didn't make any changes based on these remarks:
> 
> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>   many times these appear - possibly prevent them from even entering
>   the SCC discovery?
> 
> It would be nice. But the only way to do this that I see right now is to first
> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
> pass relies on having SCCs topologically sorted so this procedure would have to
> implement some topological sort algorithm.
> 
> This could be done. It could save allocating some vec<>s (right now, SCCs of
> size 1 are represented by a vec<> with a single element). But is it worth it 
> to
> optimize the pass this way right now? If possible, I'd like to see that the
> pass works and sort out any problems people encounter with it before I start
> optimizing it.
> 
> 2 Instead of collecting all stmts that may generate a copy at the beginning of
>   the pass into a vec<>, let the SCC discovery check that statements may
>   generate a copy on the fly.
> 
> This would be a big change to the pass, it would require a lot of reworking.
> I'm also not sure if this would help reduce the number of allocated vec<>s 
> that
> much because I'll still want to represent SCCs by vec<>s.
> 
> Again - its possible I'll want to rework the pass in this way in the future 
> but
> I'd like to leave it as it is for now.
> 
> 3 Add a comment saying that the pass is doing optimistic copy propagation
> 
> I don't think the pass works in an optimistic way. It doesn't assume that all
> variables are copies of each other at any point. It instead identifies copy
> statements (or PHI SCCs that act as copy sta

Re: [PATCH] tree-optimization/111807 - ICE in verify_sra_access_forest

2023-12-13 Thread Richard Biener



> On 13.12.2023 at 17:07, Martin Jambor wrote:
> 
> Hi,
> 
> sorry for getting to this only so late; my email backlog from my medical
> leave still isn't empty.
> 
>> On Mon, Oct 16 2023, Richard Biener wrote:
>> The following addresses build_reconstructed_reference failing to
>> build references with a different offset than the models and thus
>> the caller conditional being off.  This manifests when attempting
>> to build a ref with offset 160 from the model BIT_FIELD_REF 
>> onto the same base l_4827 but the models offset being 288.  This
>> cannot work for any kind of ref I can think of, not just with
>> BIT_FIELD_REFs.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, will push
>> later.
>> 
>> Martin - do you remember which case was supposed to be allowed
>> with offset < model->offset?
>> 
> 
> It happens quite often, even in our testsuite.  In fact, the condition
> is not even supposed to be necessary and is there just an early exit in
> hopeless cases with malformed programs.
> 
> The problem is that the function is used in two contexts and in one of
> them we are not careful about ARRAY_REFs, as explained below in a commit
> message of a patch I'd like to push.  Needless to say, it has passed
> bootstrap and testing on x86_64-linux; I'm running the same on
> aarch64-linux.
> 
> What do you think?

Thanks for the explanation. This wasn’t obvious.  The patch is OK from my side.

Thanks,
Richard 


> Martin
> 
> 
> 
> Subject: [PATCH] SRA: Relax requirements to use build_reconstructed_reference 
> (PR 111807)
> 
> This patch half-reverts 3aaf704bca3e and replaces it with a fix with
> relaxed requirements for invoking build_reconstructed_reference in
> build_ref_for_model.
> 
> build_ref_for_model/build_ref_for_offset is used in two slightly
> different contexts. The first is when we are looking at an assignment
> like
> 
>   p->field_A.field_B = s.field_B;
> 
> and we have replacements for e.g. s.field_B.field_C.field_D and we
> want to store them directly to p->field_A.field_B.field_C.field_D (as
> opposed to going through s or using a MEM_REF based in
> p->field_A.field_B).  In this case, the offset of the
> "model" (s.field_B.field_C.field_D) within this can be different than
> offset within the LHS that we want to reach (field_C.field_D within
> the "base" p->field_A.field_B).  Patch 3aaf704bca3e has caused us to
> unnecessarily create MEM_REFs for these situations.  These uses of
> build_ref_for_model work with the relaxed condition just fine.
> 
> The second, problematic, context is when somewhere in the function we
> have an assignment
> 
>  s.field_A = t.field_A.field_B;
> 
> and we are creating an access structure to represent s.field_A.field_B
> even if it is not actually accessed in the original input.  This is
> done after scanning the entire function body and we need to construct
> a "universal" reference to s.field_A.field_B.  In this case the "base"
> is "s" and it has to be the DECL itself and not some reference for it
> because for arbitrary references we need a GSI pointing to a statement
> which we don't have, the reference is supposed to be universal.
> 
> But then using build_ref_for_model and within it
> build_reconstructed_reference misbehaves if the expression contains
> any ARRAY_REFs.  In the first case those are fine because as we
> eventually reach the aggregate type that matches a real LHS or RHS, we
> know we can just bolt the rest of the references onto it and end up
> with the correct overall reference.  However when dealing with
> 
>   s.array[1].field_A = s.array[2].field_B;
> 
> we cannot just bolt the array[2] reference on when we want array[1], but
> that is exactly what happens when we use build_reconstructed_reference
> and keep it walking all the way to s.
> 
> I was considering making all users of the second kind use directly
> build_ref_for_offset instead of build_ref_for_model but the latter
> also handles COMPONENT_REFs to bit-fields which the former does not.
> Therefore I have decided to use the NULL-ness of GSI as an indicator of
> how strict we need to be.  I have changed the function comment to
> reflect that.
> 
> I have been able to observe disambiguation improvements with this patch
> over current master, we do successfully manage a few more
> aliasing_component_refs_p disambiguations when compiling cc1, going
> from:
> 
>  Alias oracle query stats:
>refs_may_alias_p: 94354287 disambiguations, 106279231 queries
>ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
>call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
>stmt_kills_ref_p: 142342 kills, 8407309 queries
>nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
>nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must 
> overlaps, 68893 queries
>aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
>TBAA oracle: 22675296 disambiguations 61781978 queries
> 14045969 are in ali

Re: [PATCH v4] A new copy propagation and PHI elimination pass

2023-12-13 Thread Richard Biener



> On 13.12.2023 at 17:12, Filip Kastl wrote:
> 
> 
>> 
 Hi,
 
 this is a patch that I submitted two months ago as an RFC. I added some 
 polish
 since.
 
 It is a new lightweight pass that removes redundant PHI functions and as a
 bonus does basic copy propagation. With Jan Hubička we measured that it is 
 able
 to remove usually more than 5% of all PHI functions when run among early 
 passes
 (sometimes even 13% or more). Those are mostly PHI functions that would be
 later optimized away but with this pass it is possible to remove them early
 enough so that they don't get streamed when running LTO (and also 
 potentially
 inlined at multiple places). It is also able to remove some redundant PHIs
 that otherwise would still be present during RTL expansion.
 
 Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
 with and without this patch. These are the sizes of .debug_info and
 .debug_loclists
 
 .debug_info without patch 181694311
 .debug_infowith patch 181692320
 +0.0011% change
 
 .debug_loclists without patch 47934753
 .debug_loclistswith patch 47934966
 -0.0004% change
 
 I wanted to use dwlocstat to compare debug coverages but didn't manage to 
 get
 the program working on my machine sadly. Hope this suffices. Seems to me 
 that
 my patch doesn't have a significant impact on debug info.
 
 Bootstrapped and tested* on x86_64-pc-linux-gnu.
 
 * One testcase (pr79691.c) did regress. However that is because the test is
 dependent on a certain variable not being copy propagated. I will go into 
 more
 detail about this in a reply to this mail.
 
 Ok to commit?
>>> 
>>> This is a second version of the patch.  In this version, I modified the
>>> pr79691.c testcase so that it works as intended with other changes from the
>>> patch.
>>> 
>>> The pr79691.c testcase checks that we get constants from snprintf calls and
>>> that they simplify into a single constant.  The testcase doesn't account for
>>> the fact that this constant may be further copy propagated which is exactly
>>> what happens with this patch applied.
>>> 
>>> Bootstrapped and tested on x86_64-pc-linux-gnu.
>>> 
>>> Ok to commit?
>> 
>> This is the third version of the patch. In this version, I addressed most of
>> Richards remarks about the second version. Here is a summary of changes I 
>> made:
>> 
>> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
>> - Use simple_dce_from_worklist to remove propagated statements
>> - Use existing replace_uses API instead of reinventing it
>>  - This allowed me to get rid of some now redundant cleanup code
>> - Encapsulate the SCC finding into a class
>> - Rework stmt_may_generate_copy to get rid of redundant checks
>> - Add check that PHI doesn't contain two non-SSA-name values to
>>  stmt_may_generate_copy
>> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>>  the conservative check that Richard suggested
>> - Index array of vertices that SCC discovery uses by SSA name version numbers
>>  instead of numbering statements myself.
>> 
>> 
>> I didn't make any changes based on these remarks:
>> 
>> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>>  many times these appear - possibly prevent them from even entering
>>  the SCC discovery?
>> 
>> It would be nice. But the only way to do this that I see right now is to 
>> first
>> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
>> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
>> pass relies on having SCCs topologically sorted so this procedure would have to
>> implement some topological sort algorithm.
>> 
>> This could be done. It could save allocating some vec<>s (right now, SCCs of
>> size 1 are represented by a vec<> with a single element). But is it worth it 
>> to
>> optimize the pass this way right now? If possible, I'd like to see that the
>> pass works and sort out any problems people encounter with it before I start
>> optimizing it.
>> 
>> 2 Instead of collecting all stmts that may generate a copy at the beginning 
>> of
>>  the pass into a vec<>, let the SCC discovery check that statements may
>>  generate a copy on the fly.
>> 
>> This would be a big change to the pass, it would require a lot of reworking.
>> I'm also not sure if this would help reduce the number of allocated vec<>s 
>> that
>> much because I'll still want to represent SCCs by vec<>s.
>> 
>> Again - its possible I'll want to rework the pass in this way in the future 
>> but
>> I'd like to leave it as it is for now.
>> 
>> 3 Add a comment saying that the pass is doing optimistic copy propagation
>> 
>> I don't think the pass works in an optimistic way. It doesn't assume that all
>> variables are copies of each other at any point. It instead identifies 

Re: [PATCH] SRA: Force gimple operand in an additional corner case (PR 112822)

2023-12-13 Thread Jason Merrill

On 12/12/23 21:36, Jason Merrill wrote:

On 12/12/23 17:50, Peter Bergner wrote:

On 12/12/23 1:26 PM, Richard Biener wrote:

Am 12.12.2023 um 19:51 schrieb Peter Bergner :

On 12/12/23 12:45 PM, Peter Bergner wrote:

+/* PR target/112822 */


Oops, this should be:

/* PR tree-optimization/112822 */

It's fixed on my end.


Ok


Pushed now that Martin has pushed his fix.  Thanks!


This test is failing for me below C++17; I think you need

// { dg-do compile { target c++17 } }
or
// { dg-require-effective-target c++17 }


Fixed thus.


From d2b269ce30d77dbfc6c28c75887c330d4698b132 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Tue, 12 Dec 2023 21:33:11 -0500
Subject: [PATCH] testsuite: fix g++.dg/pr112822.C
To: gcc-patches@gcc.gnu.org

gcc/testsuite/ChangeLog:

	* g++.dg/pr112822.C: Require C++17.
---
 gcc/testsuite/g++.dg/pr112822.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/pr112822.C b/gcc/testsuite/g++.dg/pr112822.C
index a8557522467..9949fbb08ac 100644
--- a/gcc/testsuite/g++.dg/pr112822.C
+++ b/gcc/testsuite/g++.dg/pr112822.C
@@ -1,6 +1,7 @@
 /* PR tree-optimization/112822 */
 /* { dg-do compile { target c++17 } } */
 /* { dg-options "-w -O2" } */
+// { dg-do compile { target c++17 } }
 
 /* Verify we do not ICE on the following noisy creduced test case.  */
 
-- 
2.39.3



Re: [PATCH] SRA: Force gimple operand in an additional corner case (PR 112822)

2023-12-13 Thread Jakub Jelinek
On Wed, Dec 13, 2023 at 11:24:42AM -0500, Jason Merrill wrote:
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/pr112822.C: Require C++17.
> ---
>  gcc/testsuite/g++.dg/pr112822.C | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/g++.dg/pr112822.C b/gcc/testsuite/g++.dg/pr112822.C
> index a8557522467..9949fbb08ac 100644
> --- a/gcc/testsuite/g++.dg/pr112822.C
> +++ b/gcc/testsuite/g++.dg/pr112822.C
> @@ -1,6 +1,7 @@
>  /* PR tree-optimization/112822 */
>  /* { dg-do compile { target c++17 } } */
>  /* { dg-options "-w -O2" } */
> +// { dg-do compile { target c++17 } }

2 dg-do compile directives?

Jakub



[pushed 1/4] c++: copy location to AGGR_INIT_EXPR

2023-12-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

When building an AGGR_INIT_EXPR from a CALL_EXPR, we shouldn't lose location
information.

gcc/cp/ChangeLog:

* tree.cc (build_aggr_init_expr): Copy EXPR_LOCATION.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-nsdmi7b.C: Adjust line.
* g++.dg/template/copy1.C: Likewise.
---
 gcc/cp/tree.cc | 1 +
 gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C | 4 ++--
 gcc/testsuite/g++.dg/template/copy1.C  | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index da4d5c51f07..c4e41fd7b5c 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -689,6 +689,7 @@ build_aggr_init_expr (tree type, tree init)
   CALL_EXPR_OPERATOR_SYNTAX (rval) = CALL_EXPR_OPERATOR_SYNTAX (init);
   CALL_EXPR_ORDERED_ARGS (rval) = CALL_EXPR_ORDERED_ARGS (init);
   CALL_EXPR_REVERSE_ARGS (rval) = CALL_EXPR_REVERSE_ARGS (init);
+  SET_EXPR_LOCATION (rval, EXPR_LOCATION (init));
 }
   else
 rval = init;
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
index a410e482664..586ee54124c 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
@@ -20,8 +20,8 @@ bar()
 {
   A a = foo();
   a.p->n = 5;
-  return a;
-} // { dg-error "non-.constexpr." "" { target c++20_down } }
+  return a; // { dg-error "non-.constexpr." "" { target c++20_down } }
+}
 
 constexpr int
 baz()
diff --git a/gcc/testsuite/g++.dg/template/copy1.C 
b/gcc/testsuite/g++.dg/template/copy1.C
index eacd9e2c025..7e0a3805a77 100644
--- a/gcc/testsuite/g++.dg/template/copy1.C
+++ b/gcc/testsuite/g++.dg/template/copy1.C
@@ -6,10 +6,10 @@
 
 struct A
 {
-  // { dg-error "reference" "" { target c++14_down } .+1 }
   A(A&);   // { dg-message "A::A" "" { target c++14_down } 
}
   template  A(T); // { dg-message "A::A" "" { target c++14_down } 
}
 };
 
+// { dg-error "reference" "" { target c++14_down } .+1 }
 A a = 0; // { dg-error "no match" "" { target c++14_down } }
 

base-commit: d2b269ce30d77dbfc6c28c75887c330d4698b132
-- 
2.39.3



[pushed 3/4] c++: fix in-charge parm in constexpr

2023-12-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I was puzzled by the proposed patch for PR71093 specifically ignoring the
in-charge parameter; the problem turned out to be that when
cxx_eval_call_expression jumps from the clone to the cloned function, it
assumes that the latter has the same parameters, and so the in-charge parm
doesn't get an argument.  Since a class with vbases can't have constexpr
'tors there isn't actually a need for an in-charge parameter in a
destructor, but we used to use it for deleting destructors and never removed
it.  I have a patch to do that for GCC 15, but for now let's work around it.
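
For illustration, a reduced sketch (not the PR testcase) of the kind of
evaluation involved; the complete-object destructor is a clone, and
constant evaluation jumps from it to the cloned-from destructor, whose
__in_chrg parameter previously received no argument:

struct A
{
  constexpr ~A () {}   // constexpr destructors require C++20
};

constexpr bool f () { A a; return true; }  // ~A runs during evaluation
static_assert (f ());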

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Handle missing in-charge
argument.
---
 gcc/cp/constexpr.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 4cf9dd71b05..9d9e96c2afd 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3169,6 +3169,19 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  ctx->global->put_value (remapped, arg);
  remapped = DECL_CHAIN (remapped);
}
+ for (; remapped; remapped = TREE_CHAIN (remapped))
+   if (DECL_NAME (remapped) == in_charge_identifier)
+ {
+   /* FIXME destructors unnecessarily have in-charge parameters
+  even in classes without vbases, map it to 0 for now.  */
+   gcc_assert (!CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fun)));
+   ctx->global->put_value (remapped, integer_zero_node);
+ }
+   else
+ {
+   gcc_assert (seen_error ());
+   *non_constant_p = true;
+ }
  /* Add the RESULT_DECL to the values map, too.  */
  gcc_assert (!DECL_BY_REFERENCE (res));
  ctx->global->put_value (res, NULL_TREE);
-- 
2.39.3



[pushed 4/4] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

This is modified from Nathaniel's last version by adjusting for my recent
CLOBBER changes and removing the special handling of __in_chrg which is no
longer needed since my previous commit.

-- 8< --

This patch adds checks for using objects after they've been manually
destroyed via explicit destructor call. Currently this is only
implemented for 'top-level' objects; FIELD_DECLs and individual elements
of arrays will need a lot more work to track correctly and are left for
a future patch.
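
A minimal sketch of what is now diagnosed (reduced; the new
constexpr-lifetime* tests are the authoritative ones):

struct S
{
  int i;
  constexpr ~S () {}
};

constexpr int
f ()
{
  S s{42};
  s.~S ();      // lifetime of s ends here
  return s.i;   // use after explicit destruction: now rejected
}

constexpr int x = f ();  // error: f () is not a constant expression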

The other limitation is that destruction of parameter objects is checked
too 'early', happening at the end of the function call rather than the
end of the owning full-expression as they should be for consistency;
see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
good way to link the constructed parameter declarations with the
variable declarations that are actually destroyed later on to propagate
their lifetime status, so I'm leaving this for a later patch.

PR c++/71093

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
return NULL_TREE for objects we're initializing.
(constexpr_global_ctx::destroy_value): Rename from remove_value.
Only mark real variables as outside lifetime.
(constexpr_global_ctx::clear_value): New function.
(destroy_value_checked): New function.
(cxx_eval_call_expression): Defer complaining about non-constant
arg0 for operator delete. Use remove_value_safe.
(cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
(outside_lifetime_error): Include name of object we're
accessing.
(cxx_eval_store_expression): Handle clobbers. Improve error
messages.
(cxx_eval_constant_expression): Use remove_value_safe. Clear
bind variables before entering body.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp2a/bitfield2.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
* g++.dg/cpp1y/constexpr-lifetime7.C: New test.
* g++.dg/cpp2a/constexpr-lifetime1.C: New test.
* g++.dg/cpp2a/constexpr-lifetime2.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 148 +++---
 .../g++.dg/cpp1y/constexpr-lifetime1.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime2.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime3.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime4.C|   2 +-
 .../g++.dg/cpp1y/constexpr-lifetime7.C|  93 +++
 gcc/testsuite/g++.dg/cpp2a/bitfield2.C|   2 +-
 .../g++.dg/cpp2a/constexpr-lifetime1.C|  21 +++
 .../g++.dg/cpp2a/constexpr-lifetime2.C|  23 +++
 gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  17 +-
 10 files changed, 284 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-lifetime2.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 9d9e96c2afd..e1b2d27fc36 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1193,13 +1193,20 @@ public:
return *p;
 return NULL_TREE;
   }
-  tree *get_value_ptr (tree t)
+  tree *get_value_ptr (tree t, bool initializing)
   {
 if (modifiable && !modifiable->contains (t))
   return nullptr;
 if (tree *p = values.get (t))
-  if (*p != void_node)
-   return p;
+  {
+   if (*p != void_node)
+ return p;
+   else if (initializing)
+ {
+   *p = NULL_TREE;
+   return p;
+ }
+  }
 return nullptr;
   }
   void put_value (tree t, tree v)
@@ -1208,13 +1215,19 @@ public:
 if (!already_in_map && modifiable)
   modifiable->add (t);
   }
-  void remove_value (tree t)
+  void destroy_value (tree t)
   {
-if (DECL_P (t))
+if (TREE_CODE (t) == VAR_DECL
+   || TREE_CODE (t) == PARM_DECL
+   || TREE_CODE (t) == RESULT_DECL)
   values.put (t, void_node);
 else
   values.remove (t);
   }
+  void clear_value (tree t)
+  {
+values.remove (t);
+  }
 };
 
 /* Helper class for constexpr_global_ctx.  In some cases we want to avoid
@@ -1238,7 +1251,7 @@ public:
   ~modifiable_tracker ()
   {
 for (tree t: set)
-  global->remove_value (t);
+  global->clear_value (t);
 global->modifiable = nullptr;
   }
 };
@@ -1278,6 +1291,40 @@ struct constexpr_ctx {
   mce_value manifestly_const_eval;
 };
 
+/* Remove T from the global values map, checking for attempts to destroy
+   a value that has already finished its lifetime.  */
+
+static void
+destroy_val

[pushed 2/4] c++: constant direct-initialization [PR108243]

2023-12-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

When testing the proposed patch for PR71093 I noticed that it changed the
diagnostic for consteval-prop6.C.  I then noticed that the diagnostic wasn't
very helpful either way; it was complaining about modification of the 'x'
variable, but it's not a problem to initialize a local variable with a
consteval constructor as long as the value is actually constant; we want to
know why the value isn't constant.  And then it turned out that this also
fixed a missed-optimization bug in the testsuite.
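
A sketch of the distinction (illustrative, not consteval-prop6.C itself):

struct A
{
  int n;
  consteval A (int i) : n (i) {}
};

void
f ()
{
  A a (42);   // OK: immediate invocation, the value is actually constant
}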

PR c++/108243

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Turn
a constructor CALL_EXPR into a TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval-prop6.C: Adjust diagnostic.
* g++.dg/opt/is_constant_evaluated3.C: Remove xfails.
---
 gcc/cp/constexpr.cc  | 16 +++-
 gcc/testsuite/g++.dg/cpp2a/consteval-prop6.C |  2 +-
 .../g++.dg/opt/is_constant_evaluated3.C  |  8 
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 58187a4fd12..4cf9dd71b05 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8651,7 +8651,21 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
}
   if (!object)
{
- if (TREE_CODE (t) == TARGET_EXPR)
+ if (TREE_CODE (t) == CALL_EXPR)
+   {
+ /* If T is calling a constructor to initialize an object, reframe
+it as an AGGR_INIT_EXPR to avoid trying to modify an object
+from outside the constant evaluation, which will fail even if
+the value is actually constant (is_constant_evaluated3.C).  */
+ tree fn = cp_get_callee_fndecl_nofold (t);
+ if (fn && DECL_CONSTRUCTOR_P (fn))
+   {
+ object = CALL_EXPR_ARG (t, 0);
+ object = build_fold_indirect_ref (object);
+ r = build_aggr_init_expr (type, r);
+   }
+   }
+ else if (TREE_CODE (t) == TARGET_EXPR)
object = TARGET_EXPR_SLOT (t);
  else if (TREE_CODE (t) == AGGR_INIT_EXPR)
object = AGGR_INIT_EXPR_SLOT (t);
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval-prop6.C 
b/gcc/testsuite/g++.dg/cpp2a/consteval-prop6.C
index 93ed398d9bf..ca7db7c63d3 100644
--- a/gcc/testsuite/g++.dg/cpp2a/consteval-prop6.C
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval-prop6.C
@@ -48,7 +48,7 @@ struct X {
   int a = sizeof(undef(0));
   int x = undef(0);
 
-  X() = default; // { dg-error "modification of .x. is not a constant 
expression" }
+  X() = default; // { dg-error {'consteval int undef\(int\)' used before its 
definition} }
 };
 
 void
diff --git a/gcc/testsuite/g++.dg/opt/is_constant_evaluated3.C 
b/gcc/testsuite/g++.dg/opt/is_constant_evaluated3.C
index 0a1e46e5638..783127cf909 100644
--- a/gcc/testsuite/g++.dg/opt/is_constant_evaluated3.C
+++ b/gcc/testsuite/g++.dg/opt/is_constant_evaluated3.C
@@ -17,7 +17,7 @@ int main() {
 }
 
 // { dg-final { scan-tree-dump "a1 = {\\.n=42, \\.m=0}" "original" } }
-// { dg-final { scan-tree-dump "a2 = {\\.n=42, \\.m=0}" "original" { xfail 
*-*-* } } }
-// { dg-final { scan-tree-dump "a3 = {\\.n=42, \\.m=0}" "original" { xfail 
*-*-* } } }
-// { dg-final { scan-tree-dump "a4 = {\\.n=42, \\.m=0}" "original" { xfail 
*-*-* } } }
-// { dg-final { scan-tree-dump "a5 = {\\.n=42, \\.m=0}" "original" { xfail 
*-*-* } } }
+// { dg-final { scan-tree-dump "a2 = {\\.n=42, \\.m=0}" "original" } }
+// { dg-final { scan-tree-dump "a3 = {\\.n=42, \\.m=0}" "original" } }
+// { dg-final { scan-tree-dump "a4 = {\\.n=42, \\.m=0}" "original" } }
+// { dg-final { scan-tree-dump "a5 = {\\.n=42, \\.m=0}" "original" } }
-- 
2.39.3



[PATCH] middle-end: Fix up constant handling in emit_conditional_move [PR111260]

2023-12-13 Thread Andrew Pinski
After r14-2667-gceae1400cf24f329393e96dd9720, we force a constant to a register
if it is shared with one of the other operands.  The problem is that we used
the comparison mode for the register, but that mode can be different from the
operand mode, which causes issues on some targets.
To fix it, we either need to have the modes match, or, if both are integer
modes, we can force the constant into the wider mode and use the lower part
for the smaller mode.
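
Roughly, the fix changes the forcing logic like this (a simplified
sketch of the change below, not the patch verbatim):

  /* Before: the register was created in the comparison mode, even
     though it is also substituted for an operand of mode MODE.  */
  op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);

  /* After: force the constant into the wider of the two integer modes
     once, then take the lowpart in whichever mode each use needs.  */
  machine_mode new_mode
    = known_le (GET_MODE_PRECISION (cmpmode), GET_MODE_PRECISION (mode))
      ? mode : cmpmode;
  rtx r = force_reg (new_mode, orig_op0);
  op2p = gen_lowpart (mode, r);
  XEXP (comparison, 0) = gen_lowpart (cmpmode, r);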

Bootstrapped and tested on both aarch64-linux-gnu and x86_64-linux.

PR middle-end/111260

gcc/ChangeLog:

* optabs.cc (emit_conditional_move): Fix up mode handling for
forcing the constant to a register.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/condmove-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/optabs.cc | 40 +--
 .../gcc.c-torture/compile/condmove-1.c|  9 +
 2 files changed, 45 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/condmove-1.c

diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index f0a048a6bdb..573cf22760e 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -5131,26 +5131,58 @@ emit_conditional_move (rtx target, struct 
rtx_comparison comp,
  /* If we are optimizing, force expensive constants into a register
 but preserve an eventual equality with op2/op3.  */
  if (CONSTANT_P (orig_op0) && optimize
+ && (cmpmode == mode
+ || (GET_MODE_CLASS (cmpmode) == MODE_INT
+ && GET_MODE_CLASS (mode) == MODE_INT))
  && (rtx_cost (orig_op0, mode, COMPARE, 0,
optimize_insn_for_speed_p ())
  > COSTS_N_INSNS (1))
  && can_create_pseudo_p ())
{
+ machine_mode new_mode;
+ if (known_le (GET_MODE_PRECISION (cmpmode), GET_MODE_PRECISION 
(mode)))
+   new_mode = mode;
+ else
+   new_mode = cmpmode;
  if (rtx_equal_p (orig_op0, op2))
-   op2p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
+   {
+ rtx r = force_reg (new_mode, orig_op0);
+ op2p = gen_lowpart (mode, r);
+ XEXP (comparison, 0) = gen_lowpart (cmpmode, r);
+   }
  else if (rtx_equal_p (orig_op0, op3))
-   op3p = XEXP (comparison, 0) = force_reg (cmpmode, orig_op0);
+   {
+ rtx r = force_reg (new_mode, orig_op0);
+ op3p = gen_lowpart (mode, r);
+ XEXP (comparison, 0) = gen_lowpart (cmpmode, r);
+   }
}
  if (CONSTANT_P (orig_op1) && optimize
+ && (cmpmode == mode
+ || (GET_MODE_CLASS (cmpmode) == MODE_INT
+ && GET_MODE_CLASS (mode) == MODE_INT))
  && (rtx_cost (orig_op1, mode, COMPARE, 0,
optimize_insn_for_speed_p ())
  > COSTS_N_INSNS (1))
  && can_create_pseudo_p ())
{
+ machine_mode new_mode;
+ if (known_le (GET_MODE_PRECISION (cmpmode), GET_MODE_PRECISION 
(mode)))
+   new_mode = mode;
+ else
+   new_mode = cmpmode;
  if (rtx_equal_p (orig_op1, op2))
-   op2p = XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
+   {
+ rtx r = force_reg (new_mode, orig_op1);
+ op2p = gen_lowpart (mode, r);
+ XEXP (comparison, 1) = gen_lowpart (cmpmode, r);
+   }
  else if (rtx_equal_p (orig_op1, op3))
-   op3p = XEXP (comparison, 1) = force_reg (cmpmode, orig_op1);
+   {
+ rtx r = force_reg (new_mode, orig_op1);
+ op3p = gen_lowpart (mode, r);
+ XEXP (comparison, 1) = gen_lowpart (cmpmode, r);
+   }
}
  prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
GET_CODE (comparison), NULL_RTX, unsignedp,
diff --git a/gcc/testsuite/gcc.c-torture/compile/condmove-1.c 
b/gcc/testsuite/gcc.c-torture/compile/condmove-1.c
new file mode 100644
index 000..3fcc591af00
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/condmove-1.c
@@ -0,0 +1,9 @@
+/* PR middle-end/111260 */
+
+/* Used to ICE during expansion of `(a == b) ? b : 0;`.  */
+int f1(long long a)
+{
+  int b = 822920;
+  int t = a == b;
+  return t * (int)b;
+}
-- 
2.39.3



Re: [r14-6468 Regression] FAIL: std/time/year/io.cc -std=gnu++26 execution test on Linux/x86_64

2023-12-13 Thread Jonathan Wakely
On Wed, 13 Dec 2023 at 10:51, haochen.jiang
 wrote:
>
> On Linux/x86_64,
>
> a01462ae8bafa86e7df47a252917ba6899d587cf is the first bad commit
> commit a01462ae8bafa86e7df47a252917ba6899d587cf
> Author: Jonathan Wakely 
> Date:   Mon Dec 11 15:33:59 2023 +
>
> libstdc++: Fix std::format output of %C for negative years
>
> caused
>
> FAIL: std/time/year/io.cc  -std=gnu++20 execution test
> FAIL: std/time/year/io.cc  -std=gnu++26 execution test
>
> with GCC configured with
>
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6468/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
> RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
> RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m32\ 
> -march=cascadelake}'"
> $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
> RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m64}'"
> $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
> RUNTESTFLAGS="conformance.exp=std/time/year/io.cc --target_board='unix{-m64\ 
> -march=cascadelake}'"
>
> (Please do not reply to this email; for questions about this report, contact 
> me at haochen dot jiang at intel.com.)
> (If you hit problems related to cascadelake, disabling AVX512F on the command 
> line might help.)
> (However, please make sure that there are no potential problems with AVX512.)

Thanks, fixed at r14-6493-gad537ccd525fd3



Re: [PATCH] c++: Fix tinst_level::to_list [PR112968]

2023-12-13 Thread Jason Merrill

On 12/13/23 04:49, Jakub Jelinek wrote:

Hi!

With valgrind checking, there are various errors reported on some C++26
libstdc++ tests, like:
==2009913== Conditional jump or move depends on uninitialised value(s)
==2009913==at 0x914C59: gt_ggc_mx_lang_tree_node(void*) (gt-cp-tree.h:107)
==2009913==by 0x8AB7A5: gt_ggc_mx_tinst_level(void*) (gt-cp-pt.h:32)
==2009913==by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const*) 
(ggc-common.cc:75)
==2009913==by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104)
==2009913==by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227)
==2009913==by 0xDB70F6: execute_one_pass(opt_pass*) (passes.cc:2738)
==2009913==by 0xDB721F: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==2009913==by 0xDB7258: execute_pass_list(function*, opt_pass*) 
(passes.cc:2766)
==2009913==by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695)
==2009913==by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248)
==2009913==by 0xA5890D: symbol_table::finalize_compilation_unit() 
(cgraphunit.cc:2555)
==2009913==by 0xEB02A1: compile_file() (toplev.cc:473)

I think the problem is in the tinst_level::to_list optimization from 2018.
That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in.
Either it freshly allocates using build_tree_list (NULL, NULL) and stores
TREE_PURPOSE/TREE_VALUE; that case is fine (the whole tree_list object
is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE
modified later; the above also means in particular that TREE_TYPE of it is
NULL and TREE_CHAIN is NULL, and both are accessible/initialized even in the
valgrind annotations).
Or it grabs a TREE_LIST node from a freelist.
If defined(ENABLE_GC_CHECKING), the object is still all zeros except
for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case
(but unlike the build_tree_list case in the valgrind annotations
TREE_TYPE and TREE_CHAIN are marked as uninitialized).
If !defined(ENABLE_GC_CHECKING), I believe the actual memory content
is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is
zeros and TREE_CHAIN is something random (whatever next entry is in the
freelist, nothing overwrote it) and from valgrind POV again,
TREE_TYPE and TREE_CHAIN are marked as uninitialized.

When using the other freelist instantiations (pending_template and
tinst_level) I believe everything is correct: from valgrind's POV the
whole pending_template or tinst_level is marked as uninitialized, but the
caller initializes it all.

One way to fix this would be let tinst_level::to_list not store just
   TREE_PURPOSE (ret) = tldcl;
   TREE_VALUE (ret) = targs;
but also
   TREE_TYPE (ret) = NULL_TREE;
   TREE_CHAIN (ret) = NULL_TREE;
Though, that seems like wasted effort in the build_tree_list case to me.

So, the following patch instead does that TREE_CHAIN = NULL_TREE store only
in the case where it isn't already done, and marks both TREE_CHAIN and
TREE_TYPE as initialized (the former is initialized at that spot, the latter
because we never really touch TREE_TYPE of a TREE_LIST anywhere, so the NULL
gets stored into the freelist and restored from there (except for
ENABLE_GC_CHECKING, where it is poisoned and then cleared again)).


We sometimes do put things in the TREE_TYPE of a TREE_LIST, so I would 
be more comfortable setting it here as well.  OK with that change.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-13  Jakub Jelinek  

PR c++/112968
* pt.cc (freelist::reinit): Make whole obj->common
defined for valgrind annotations rather than just obj->base,
and do it even for ENABLE_GC_CHECKING.  If not ENABLE_GC_CHECKING,
clear TREE_CHAIN (obj).

--- gcc/cp/pt.cc.jj 2023-12-11 23:52:03.592513063 +0100
+++ gcc/cp/pt.cc2023-12-12 16:40:09.259903877 +0100
@@ -9525,7 +9525,7 @@ template <>
  inline void
  freelist::reinit (tree obj ATTRIBUTE_UNUSED)
  {
-  tree_base *b ATTRIBUTE_UNUSED = &obj->base;
+  tree_common *c ATTRIBUTE_UNUSED = &obj->common;
  
  #ifdef ENABLE_GC_CHECKING

gcc_checking_assert (TREE_CODE (obj) == TREE_LIST);
@@ -9540,8 +9540,9 @@ freelist::reinit (tree obj AT
  #ifdef ENABLE_GC_CHECKING
TREE_SET_CODE (obj, TREE_LIST);
  #else
-  VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (b, sizeof (*b)));
+  TREE_CHAIN (obj) = NULL_TREE;
  #endif
+  VALGRIND_DISCARD (VALGRIND_MAKE_MEM_DEFINED (c, sizeof (*c)));
  }
  
  /* Point to the first object in the TREE_LIST freelist.  */


Jakub





Re: [PATCH v2] aarch64: Fix +nocrypto handling

2023-12-13 Thread Richard Sandiford
Andrew Carlotti  writes:
> Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with
> checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead.  The value of the
> AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is
> retained because removing it would make processing the data in
> option-extensions.def significantly more complex.
>
> This bug should have been picked up by an existing test, but a missing
> newline meant that the pattern incorrectly allowed "+crypto+nocrypto".
>
> Ok for master?
>
> gcc/ChangeLog:
>
>   * common/config/aarch64/aarch64-common.cc
>   (aarch64_get_extension_string_for_isa_flags): Fix generation of
>   the "+nocrypto" extension.
>   * config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove.
>   (TARGET_CRYPTO): Remove.
>   * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
>   Don't use TARGET_CRYPTO.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/options_set_4.c: Add terminating newline.
>   * gcc.target/aarch64/options_set_27.c: New test.

OK, thanks.

Richard

> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> 8fb901029ec2980a048177586b84201b3b398f9e..c2a6d357c0bc17996a25ea5c3a40f69d745c7931
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -311,6 +311,7 @@ aarch64_get_extension_string_for_isa_flags
>   But in order to make the output more readable, it seems better
>   to add the strings in definition order.  */
>aarch64_feature_flags added = 0;
> +  auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2;
>for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; )
>  {
>auto &opt = all_extensions[i];
> @@ -320,7 +321,7 @@ aarch64_get_extension_string_for_isa_flags
>per-feature crypto flags.  */
>auto flags = opt.flag_canonical;
>if (flags == AARCH64_FL_CRYPTO)
> - flags = AARCH64_FL_AES | AARCH64_FL_SHA2;
> + flags = flags_crypto;
>  
>if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags)
>   {
> @@ -339,14 +340,32 @@ aarch64_get_extension_string_for_isa_flags
>   not have an HWCAPs then it shouldn't be taken into account for feature
>   detection because one way or another we can't tell if it's available
>   or not.  */
> +
>for (auto &opt : all_extensions)
> -if (opt.native_detect_p
> - && (opt.flag_canonical & current_flags & ~isa_flags))
> -  {
> - current_flags &= ~opt.flags_off;
> - outstr += "+no";
> - outstr += opt.name;
> -  }
> +{
> +  auto flags = opt.flag_canonical;
> +  /* As a special case, don't emit "+noaes" or "+nosha2" when we could 
> emit
> +  "+nocrypto" instead, in order to support assemblers that predate the
> +  separate per-feature crypto flags.  Only allow "+nocrypto" when "sm4"
> +  is not already enabled (to avoid depending on whether "+nocrypto"
> +  also disables "sm4").  */
> +  if (flags & flags_crypto
> +   && (flags_crypto & current_flags & ~isa_flags) == flags_crypto
> +   && !(current_flags & AARCH64_FL_SM4))
> +   continue;
> +
> +  if (flags == AARCH64_FL_CRYPTO)
> + /* If either crypto flag needs removing here, then both do.  */
> + flags = flags_crypto;
> +
> +  if (opt.native_detect_p
> +   && (flags & current_flags & ~isa_flags))
> + {
> +   current_flags &= ~opt.flags_off;
> +   outstr += "+no";
> +   outstr += opt.name;
> + }
> +}
>  
>return outstr;
>  }
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index 
> 115a2a8b7568c43a712d819e03147ff84ff182c0..cdc4e453a2054b1a1d2c70bf0b528e497ae0b9ad
>  100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -188,7 +188,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile);
>aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile);
>  
> -  aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
> +  aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO", 
> pfile);
>aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
>aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
>cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> 2cd0bc552ebadac06a2838ae2767852c036d0db4..501bb7478a0755fa76c488ec03dcfab6c272851c
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -204,7 +204,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
> AARCH64_FL_SM_OFF;
>  
>  #endif
>  
> -/* Macros to test ISA flags.  */
> +/* Macros to test ISA flags.
> +
> +   There is intentionally no macro for AARCH64_FL_CRYPTO, since this flag bit
> +   is not always set when its constitu

Re: [committed v2] aarch64: Add missing driver-aarch64 dependencies

2023-12-13 Thread Richard Sandiford
Andrew Carlotti  writes:
> On Sat, Dec 09, 2023 at 06:42:17PM +, Richard Sandiford wrote:
>> Andrew Carlotti  writes:
>> The .def files are included in TM_H by:
>> 
>> TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
>>  $(srcdir)/config/aarch64/aarch64-tuning-flags.def \
>>  $(srcdir)/config/aarch64/aarch64-option-extensions.def \
>>  $(srcdir)/config/aarch64/aarch64-cores.def \
>>  $(srcdir)/config/aarch64/aarch64-isa-modes.def \
>>  $(srcdir)/config/aarch64/aarch64-arches.def
>
> They are included now, but only because you added them last week.

Ah, right.  I'd already forgotten that I'd only done that recently, sorry...

> I've removed them in v2 of the patch, committed as below:

Thanks.

> ---
>
> gcc/ChangeLog:
>
>   * config/aarch64/x-aarch64: Add missing dependencies.
>
>
> diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
> index 
> 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..ee828c9af53a11885c2bcef8f112c0ebaf161c59
>  100644
> --- a/gcc/config/aarch64/x-aarch64
> +++ b/gcc/config/aarch64/x-aarch64
> @@ -1,3 +1,5 @@
>  driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \
> -  $(CONFIG_H) $(SYSTEM_H)
> +  $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \
> +  $(srcdir)/config/aarch64/aarch64-protos.h \
> +  $(srcdir)/config/aarch64/aarch64-feature-deps.h
>   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<


Re: [pushed 1/4] c++: copy location to AGGR_INIT_EXPR

2023-12-13 Thread Patrick Palka
On Wed, 13 Dec 2023, Jason Merrill wrote:

> Tested x86_64-pc-linux-gnu, applying to trunk.
> 
> -- 8< --
> 
> When building an AGGR_INIT_EXPR from a CALL_EXPR, we shouldn't lose location
> information.
> 
> gcc/cp/ChangeLog:
> 
>   * tree.cc (build_aggr_init_expr): Copy EXPR_LOCATION.

I made a similar change in the past which caused the debug regression
PR96997, which I fixed by reverting the change in r11-7263-g78a6d0e30d7950
(I didn't do much deeper analysis than that).  Unfortunately it seems this
regression is back now.

> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1y/constexpr-nsdmi7b.C: Adjust line.
>   * g++.dg/template/copy1.C: Likewise.
> ---
>  gcc/cp/tree.cc | 1 +
>  gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C | 4 ++--
>  gcc/testsuite/g++.dg/template/copy1.C  | 2 +-
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> index da4d5c51f07..c4e41fd7b5c 100644
> --- a/gcc/cp/tree.cc
> +++ b/gcc/cp/tree.cc
> @@ -689,6 +689,7 @@ build_aggr_init_expr (tree type, tree init)
>CALL_EXPR_OPERATOR_SYNTAX (rval) = CALL_EXPR_OPERATOR_SYNTAX (init);
>CALL_EXPR_ORDERED_ARGS (rval) = CALL_EXPR_ORDERED_ARGS (init);
>CALL_EXPR_REVERSE_ARGS (rval) = CALL_EXPR_REVERSE_ARGS (init);
> +  SET_EXPR_LOCATION (rval, EXPR_LOCATION (init));
>  }
>else
>  rval = init;
> diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C 
> b/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
> index a410e482664..586ee54124c 100644
> --- a/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
> +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
> @@ -20,8 +20,8 @@ bar()
>  {
>A a = foo();
>a.p->n = 5;
> -  return a;
> -} // { dg-error "non-.constexpr." "" { target c++20_down } }
> +  return a; // { dg-error "non-.constexpr." "" { target c++20_down } }
> +}
>  
>  constexpr int
>  baz()
> diff --git a/gcc/testsuite/g++.dg/template/copy1.C 
> b/gcc/testsuite/g++.dg/template/copy1.C
> index eacd9e2c025..7e0a3805a77 100644
> --- a/gcc/testsuite/g++.dg/template/copy1.C
> +++ b/gcc/testsuite/g++.dg/template/copy1.C
> @@ -6,10 +6,10 @@
>  
>  struct A
>  {
> -  // { dg-error "reference" "" { target c++14_down } .+1 }
>A(A&); // { dg-message "A::A" "" { target c++14_down } 
> }
>template  A(T);   // { dg-message "A::A" "" { target c++14_down } 
> }
>  };
>  
> +// { dg-error "reference" "" { target c++14_down } .+1 }
>  A a = 0; // { dg-error "no match" "" { target c++14_down } }
>  
> 
> base-commit: d2b269ce30d77dbfc6c28c75887c330d4698b132
> -- 
> 2.39.3
> 
> 



Re: [PATCH] SRA: Force gimple operand in an additional corner case (PR 112822)

2023-12-13 Thread Jason Merrill

On 12/13/23 11:26, Jakub Jelinek wrote:

On Wed, Dec 13, 2023 at 11:24:42AM -0500, Jason Merrill wrote:

gcc/testsuite/ChangeLog:

* g++.dg/pr112822.C: Require C++17.
---
  gcc/testsuite/g++.dg/pr112822.C | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/pr112822.C b/gcc/testsuite/g++.dg/pr112822.C
index a8557522467..9949fbb08ac 100644
--- a/gcc/testsuite/g++.dg/pr112822.C
+++ b/gcc/testsuite/g++.dg/pr112822.C
@@ -1,6 +1,7 @@
  /* PR tree-optimization/112822 */
  /* { dg-do compile { target c++17 } } */
  /* { dg-options "-w -O2" } */
+// { dg-do compile { target c++17 } }


2 dg-do compile directives?


Oops, I assumed that since my commit still applied on rebase that it 
hadn't been fixed yet, and didn't see the additional discussion this 
morning.  Reverted.


Jason



Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-12-13 Thread Kees Cook
On Wed, Dec 13, 2023 at 05:01:07PM +0800, Wang wrote:
> On 2023/12/13 16:48, Dan Li wrote:
> > + Likun
> >
> > On Tue, 28 Mar 2023 at 06:18, Sami Tolvanen  wrote:
> >> On Mon, Mar 27, 2023 at 2:30 AM Peter Zijlstra  
> >> wrote:
> >>> On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:
> >>>
>  In the compiler part[4], most of the content is the same as Sami's
>  implementation[3], except for some minor differences, mainly including:
> 
>  1. The function typeid is calculated differently and it is difficult
>  to be consistent.
> >>> This means there is an effective ABI break between the compilers, which
> >>> is sad :-( Is there really nothing to be done about this?
> >> I agree, this would be unfortunate, and would also be a compatibility
> >> issue with rustc where there's ongoing work to support
> >> clang-compatible CFI type hashes:
> >>
> >> https://github.com/rust-lang/rust/pull/105452
> >>
> >> Sami
> 
> 
> Hi Peter and Sami
> 
> I am Dan Li's colleague, and I will take over and continue the work of CFI.

Welcome; this is great news! :) Thanks for picking up the work.

> 
> Regarding the issue of gcc cfi type id being compatible with clang, we 
> have analyzed and verified:
> 
> 1. clang uses the name mangling defined by the Itanium C++ ABI to encode 
> the function prototype, and uses the encoding result as input to generate 
> the cfi type id;
> 2. Currently, gcc only implements mangling in the C++ compiler; the 
> function prototype encoding generated by these interfaces is compatible 
> with clang, but gcc's c compiler does not support mangling.
> 
> Adding mangling to gcc's c compiler is a huge and difficult task, because 
> we would have to refactor the C++ mangling, splitting it into basic 
> mangling and language-specific mangling, and then add support for the c 
> language, which requires a deep understanding of the compiler and its 
> language processing parts.
> 
> And for the kernel cfi, I suggest separating type compatibility from the 
> basic CFI functionality.  Type compatibility is independent of the basic 
> CFI functionality and should be dealt with under another topic.  Should we 
> focus on the main issues of cfi, let it work first on the linux kernel, and 
> leave the compatibility issue to be solved later?

If you mean keeping the hashes identical between Clang/LLVM and GCC,
I think this is going to be a requirement due to adding Rust to the
build environment (which uses the LLVM mangling and hashing).

FWIW, I think the subset of type mangling needed isn't the entire C++
language spec, so it shouldn't be hard to add this to GCC.
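
For illustration, the scheme being discussed boils down to something
like the following sketch (not clang's actual implementation; the hash
here is a stand-in -- clang derives its IDs from xxHash64 of the
mangled type, if I remember correctly):

  #include <cstdint>
  #include <functional>
  #include <string>

  // E.g. void (const char *, int) mangles to "FvPKciE" under the
  // Itanium C++ ABI rules.
  static uint32_t
  cfi_type_id (const std::string &mangled)
  {
    // Hash the mangled prototype and truncate to 32 bits.
    return static_cast<uint32_t> (std::hash<std::string>{} (mangled));
  }

  // cfi_type_id ("FvPKciE") != cfi_type_id ("FviE"), so an indirect
  // call through a mismatched prototype traps.

Both compilers (and rustc) must agree on the mangling *and* on the
hash for the IDs to interoperate.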

-Kees

-- 
Kees Cook



Re: [PATCH] libgccjit: Add ability to get CPU features

2023-12-13 Thread Antoni Boucher
David: Ping.
I guess if we want to have this merged for this release, it should be
sooner rather than later (if it's still an option).
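
For context, the CStringHash helper mentioned below is shaped roughly
like this (a sketch, not the patch's exact code; FNV-1a is just one
reasonable choice).  As far as I know there is no built-in equivalent:
std::hash<const char *> hashes the pointer value, not the contents:

  #include <cstddef>
  #include <cstring>
  #include <unordered_set>

  struct CStringHash
  {
    std::size_t operator() (const char *s) const
    {
      // FNV-1a over the bytes of the string.
      std::size_t h = 2166136261u;
      for (; *s; ++s)
        h = (h ^ (unsigned char) *s) * 16777619u;
      return h;
    }
  };

  struct CStringEqual
  {
    bool operator() (const char *a, const char *b) const
    { return strcmp (a, b) == 0; }
  };

  typedef std::unordered_set<const char *, CStringHash, CStringEqual>
    string_set;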

On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:
> On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:
> > Hi.
> > This patch adds support for getting the CPU features in libgccjit
> > (bug
> > 112466)
> > 
> > There's a TODO in the test:
> > I'm not sure how to test that gcc_jit_target_info_arch returns the
> > correct value since it is dependent on the CPU.
> > Any idea on how to improve this?
> > 
> > Also, I created a CStringHash to be able to have a
> > std::unordered_set. Is there any built-in way of
> > doing
> > this?
> 
> Thanks for the patch.
> 
> Some high-level questions:
> 
> Is this specifically about detecting capabilities of the host that
> libgccjit is currently running on? or how the target was configured
> when libgccjit was built?
> 
> One of the benefits of libgccjit is that, in theory, we support all
> of
> the targets that GCC already supports.  Does this patch change that,
> or
> is this more about giving client code the ability to determine
> capabilities of the specific host being compiled for?
> 
> I'm nervous about having per-target jit code.  Presumably there's a
> reason that we can't reuse existing target logic here - can you
> please
> describe what the problem is.  I see that the ChangeLog has:
> 
> > * config/i386/i386-jit.cc: New file.
> 
> where i386-jit.cc has almost 200 lines of nontrivial code.  Where did
> this come from?  Did you base it on existing code in our source tree,
> making modifications to fit the new internal API, or did you write it
> from scratch?  In either case, how onerous would this be for other
> targets?
> 
> I'm not at expert at target hooks (or at the i386 backend), so if we
> do
> go with this approach I'd want someone else to review those parts of
> the patch.
> 
> Have you verified that GCC builds with this patch with jit *not*
> enabled in the enabled languages?
> 
> [...snip...]
> 
> A nitpick:
> 
> > +.. function:: const char * \
> > +  gcc_jit_target_info_arch (gcc_jit_target_info *info)
> > +
> > +   Get the architecture of the currently running CPU.
> 
> What does this string look like?
> How long does the pointer remain valid?
> 
> Thanks again; hope the above makes sense
> Dave
> 



[pushed] c++: TARGET_EXPR location in default arg [PR96997]

2023-12-13 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

My r14-6505-g52b4b7d7f5c7c0 change to copy the location in
build_aggr_init_expr reopened PR96997; let's fix it properly this time, by
clearing the location like we do for other trees.

PR c++/96997

gcc/cp/ChangeLog:

* tree.cc (bot_manip): Check data.clear_location for TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/debug/cleanup2.C: New test.
---
 gcc/cp/tree.cc|  3 +++
 gcc/testsuite/g++.dg/debug/cleanup2.C | 10 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/debug/cleanup2.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index c4e41fd7b5c..d26e73aaf95 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -3170,6 +3170,9 @@ bot_manip (tree* tp, int* walk_subtrees, void* data_)
   if (TREE_OPERAND (u, 1) == error_mark_node)
return error_mark_node;
 
+  if (data.clear_location)
+   SET_EXPR_LOCATION (u, input_location);
+
   /* Replace the old expression with the new version.  */
   *tp = u;
   /* We don't have to go below this point; the recursive call to
diff --git a/gcc/testsuite/g++.dg/debug/cleanup2.C 
b/gcc/testsuite/g++.dg/debug/cleanup2.C
new file mode 100644
index 000..03bf92c8424
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/cleanup2.C
@@ -0,0 +1,10 @@
+// PR c++/96997
+// { dg-additional-options "-g -fdump-tree-gimple-lineno" }
+
+struct A { A(); ~A(); };
+void f(const A& = A());
+int main() { f(); }
+
+// The destructor call for the A temporary should not have the location of the
+// f declaration.
+// { dg-final { scan-tree-dump-not ".C:5" "gimple" } }

base-commit: da730b29f10fb48d5ed812535768c69ff7d74248
-- 
2.39.3



Re: [PATCH] libcpp: Fix valgrind errors on pr88974.c [PR112956]

2023-12-13 Thread Jason Merrill

On 12/13/23 03:39, Jakub Jelinek wrote:

Hi!

On the c-c++-common/cpp/pr88974.c testcase I'm seeing
==600549== Conditional jump or move depends on uninitialised value(s)
==600549==at 0x1DD3A05: cpp_get_token_1(cpp_reader*, unsigned int*) 
(macro.cc:3050)
==600549==by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392)
==600549==by 0x1DB9471: do_if(cpp_reader*) (directives.cc:2087)
==600549==by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572)
==600549==by 0x1DCD488: _cpp_lex_token (lex.cc:3682)
==600549==by 0x1DD3A97: cpp_get_token_1(cpp_reader*, unsigned int*) 
(macro.cc:2936)
==600549==by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350)
==600549==by 0x7F7EE4: preprocess_file(cpp_reader*) (c-ppoutput.cc:106)
==600549==by 0x7F6235: c_common_init() (c-opts.cc:1280)
==600549==by 0x704C8B: lang_dependent_init (toplev.cc:1837)
==600549==by 0x704C8B: do_compile (toplev.cc:2135)
==600549==by 0x704C8B: toplev::main(int, char**) (toplev.cc:2306)
==600549==by 0x7064BA: main (main.cc:39)
error.  The problem is that _cpp_lex_direct can leave result->src_loc
uninitialized in some cases and later on we use that location_t.

_cpp_lex_direct essentially does:
   cppchar_t c;
...
   cpp_token *result = pfile->cur_token++;

  fresh_line:
   result->flags = 0;
...
   if (buffer->need_line)
 {
   if (pfile->state.in_deferred_pragma)
{
  result->type = CPP_PRAGMA_EOL;
  ... // keeps result->src_loc uninitialized;
  return result;
}
   if (!_cpp_get_fresh_line (pfile))
 {
   result->type = CPP_EOF;
   if (!pfile->state.in_directive && !pfile->state.parsing_args)
{
  result->src_loc = pfile->line_table->highest_line;
  ...
}
  ... // otherwise result->src_loc is sometimes uninitialized here
  return result;
}
   ...
 }
...
   result->src_loc = pfile->line_table->highest_line;
...
   c = *buffer->cur++;
   switch (c)
 {
...
 case '\n':
...
   buffer->need_line = true;
   if (pfile->state.in_deferred_pragma)
 {
   result->type = CPP_PRAGMA_EOL;
...
  return result;
}
   goto fresh_line;
...
 }
...
So, if _cpp_lex_direct is called without buffer->need_line initially set,
result->src_loc is always initialized (and actually hundreds of tests rely
on that exact value it has), even when c == '\n' and we set that flag later
on and goto fresh_line.  For CPP_PRAGMA_EOL case we have in that case
separate handling and don't goto.
But if _cpp_lex_direct is called with buffer->need_line initially set, and we
either decide to return a CPP_PRAGMA_EOL token, or getting a new line fails
for some reason and we return a CPP_EOF token while we are in a directive or
parsing-args state, then it is kept uninitialized and can be whatever the
allocation left there.

The following patch attempts to keep the status quo: use the value that was
stored previously if it was initialized (i.e. we went through the
goto fresh_line; statement in the c == '\n' handling) and only initialize
result->src_loc if it was uninitialized before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2023-12-13  Jakub Jelinek  

PR preprocessor/112956
* lex.cc (_cpp_lex_direct): Initialize c to 0.
For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF
set result->src_loc to highest locus.

--- libcpp/lex.cc.jj2023-12-11 12:39:23.353442196 +0100
+++ libcpp/lex.cc   2023-12-12 13:15:07.154019695 +0100
@@ -3809,7 +3809,7 @@ _cpp_get_fresh_line (cpp_reader *pfile)
  cpp_token *
  _cpp_lex_direct (cpp_reader *pfile)
  {
-  cppchar_t c;
+  cppchar_t c = 0;
cpp_buffer *buffer;
const unsigned char *comment_start;
bool fallthrough_comment = false;
@@ -3833,6 +3833,7 @@ _cpp_lex_direct (cpp_reader *pfile)
  pfile->state.in_deferred_pragma = false;
  if (!pfile->state.pragma_allow_expansion)
pfile->state.prevent_expansion--;
+ result->src_loc = pfile->line_table->highest_line;
  return result;
}
if (!_cpp_get_fresh_line (pfile))
@@ -3849,6 +3850,8 @@ _cpp_lex_direct (cpp_reader *pfile)
  /* Now pop the buffer that _cpp_get_fresh_line did not.  */
  _cpp_pop_buffer (pfile);
}
+ else if (c == 0)
+   result->src_loc = pfile->line_table->highest_line;
  return result;
}
if (buffer != pfile->buffer)

Jakub





Re: [PATCH] c++: unifying constants vs their type [PR99186, PR104867]

2023-12-13 Thread Jason Merrill

On 12/12/23 16:21, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


-- >8 --

When unifying constants we need to generally treat constants of
different types but same value as different, in light of auto template
parameters.  This patch fixes this in a minimal way; it seems we could
get away with just using template_args_equal here, as we do in the
default case, but that's a simplification we could look into during next
stage 1.
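
For instance (a distilled illustration, not one of the new testcases):

  #include <type_traits>

  template <auto N> struct S {};

  // 1 and 1u have the same value but deduce different types for N,
  // so these must be two distinct specializations.
  static_assert (!std::is_same_v<S<1>, S<1u>>);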

PR c++/99186
PR c++/104867

gcc/cp/ChangeLog:

* pt.cc (unify) : Compare types as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype-auto23.C: New test.
* g++.dg/cpp1z/nontype-auto24.C: New test.
---
  gcc/cp/pt.cc|  2 ++
  gcc/testsuite/g++.dg/cpp1z/nontype-auto23.C | 23 +
  gcc/testsuite/g++.dg/cpp1z/nontype-auto24.C | 18 
  3 files changed, 43 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto23.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype-auto24.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a8966e223f1..602dd02d29d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24709,6 +24709,8 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
/* Type INTEGER_CST can come from ordinary constant template args.  */
  case INTEGER_CST:
  case REAL_CST:
+  if (!same_type_p (TREE_TYPE (parm), TREE_TYPE (arg)))
+   return unify_template_argument_mismatch (explain_p, parm, arg);
while (CONVERT_EXPR_P (arg))
arg = TREE_OPERAND (arg, 0);
  
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto23.C b/gcc/testsuite/g++.dg/cpp1z/nontype-auto23.C

new file mode 100644
index 000..467559ffdda
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto23.C
@@ -0,0 +1,23 @@
+// PR c++/99186
+// { dg-do compile { target c++17 } }
+
+template
+struct tuple_impl : tuple_impl { };
+
+template
+struct tuple_impl { };
+
+template
+struct tuple : tuple_impl<0, T, U> { };
+
+template
+void get(const tuple_impl&);
+
+template
+struct S;
+
+int main() {
+   tuple,S<1U>> x;
+   get>(x);
+   get>(x);
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto24.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype-auto24.C
new file mode 100644
index 000..52e4c134ccd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto24.C
@@ -0,0 +1,18 @@
+// PR c++/104867
+// { dg-do compile { target c++17 } }
+
+enum class Foo { A1 };
+
+enum class Bar { B1 };
+
+template struct enum_;
+
+template struct list { };
+
+template void f(list, V>);
+
+struct enum_type_map : list, int>, list, double> 
{ };
+
+int main() {
+  f(enum_type_map());
+}




Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch (was: Build breakage)

2023-12-13 Thread Thomas Schwinge
Hi!

On 2023-12-13T20:36:40+0100, I wrote:
> On 2023-12-13T11:15:54-0800, Jerry D via Gcc  wrote:
>> I am getting this failure to build from clean trunk.
>
> This is due to commit r14-6499-g348874f0baac0f22c98ab11abbfa65fd172f6bdd
> "libgomp: basic pinned memory on Linux", which supposedly was only tested
> with '--disable-multilib' or so.  As Andrew's now on vacation --
> conveniently ;-P -- I'll soon push a fix.

Pushed to master branch commit 5445ff4a51fcee4d281f79b5f54b349290d0327d
"Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string 
mismatch",
see attached.


Grüße
 Thomas


>> In file included from ../../../../trunk/libgomp/config/linux/allocator.c:31:
>> ../../../../trunk/libgomp/config/linux/allocator.c: In function
>> ‘linux_memspace_alloc’:
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:26: error: format
>> ‘%ld’ expects argument of type ‘long int’, but argument 3 has type
>> ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ^
>> 71 |   " memory (ulimit too low?)\n", size);
>>|  
>>|  |
>>|  size_t
>> {aka unsigned int}
>> ../../../../trunk/libgomp/libgomp.h:186:29: note: in definition of macro
>> ‘gomp_debug’
>>186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
>>| ^~~
>> ../../../../trunk/libgomp/config/linux/allocator.c:70:52: note: format
>> string is defined here
>> 70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
>>|  ~~^
>>||
>>|long int
>>|  %d


From 5445ff4a51fcee4d281f79b5f54b349290d0327d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Dec 2023 17:48:11 +0100
Subject: [PATCH] Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld'
 format string mismatch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follows, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:

In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
[...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
[...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ^
   71 |   " memory (ulimit too low?)\n", size);
  |  
  |  |
  |  size_t {aka unsigned int}
[...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
  186 |   (gomp_debug) ((KIND), __VA_ARGS__); \
  | ^~~
[...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
   70 |   gomp_debug (0, "libgomp: failed to pin %ld bytes of"
  |  ~~^
  ||
  |long int
  |  %d
cc1: all warnings being treated as errors
make[9]: *** [allocator.lo] Error 1
make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
[...]

Fix this in the same way as used elsewhere in libgomp.

	libgomp/
	* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
	vs. '%ld' format string mismatch.
---
 libgomp/config/linux/allocator.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 269d0d607d8..6ffa2417913 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -50,6 +50,9 @@
 #include 
 #includ

Re: [PATCH v3] c++: fix ICE with sizeof in a template [PR112869]

2023-12-13 Thread Jason Merrill

On 12/12/23 17:48, Marek Polacek wrote:

On Fri, Dec 08, 2023 at 11:09:15PM -0500, Jason Merrill wrote:

On 12/8/23 16:15, Marek Polacek wrote:

On Fri, Dec 08, 2023 at 12:09:18PM -0500, Jason Merrill wrote:

On 12/5/23 15:31, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

 min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
   (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

 min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for unevaluated operands.


I agree that we want this change for in_immediate_context (), but I don't
see why we want it for TYPE_P or unevaluated_p (code) or
cp_unevaluated_operand?


No particular reason, just paranoia.  How's this?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
  (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
for in_immediate_context.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
   gcc/cp/cp-gimplify.cc| 6 +-
   gcc/testsuite/g++.dg/template/sizeof18.C | 8 
   2 files changed, 13 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/template/sizeof18.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 5abb91bbdd3..6af7c787372 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1179,11 +1179,15 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, 
void *data_)
 /* No need to look into types or unevaluated operands.
NB: This affects cp_fold_r as well.  */
-  if (TYPE_P (stmt) || unevaluated_p (code) || in_immediate_context ())
+  if (TYPE_P (stmt) || unevaluated_p (code))
   {
 *walk_subtrees = 0;
 return NULL_TREE;
   }
+  else if (in_immediate_context ())
+/* Don't clear *walk_subtrees here: we still need to walk the subtrees
+   of SIZEOF_EXPR and similar.  */
+return NULL_TREE;
 tree decl = NULL_TREE;
 bool call_p = false;
diff --git a/gcc/testsuite/g++.dg/template/sizeof18.C 
b/gcc/testsuite/g++.dg/template/sizeof18.C
new file mode 100644
index 000..afba9946258
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof18.C
@@ -0,0 +1,8 @@
+// PR c++/112869
+// { dg-do compile }
+
+void min(long, long);
+template  void Binaryread(int &, T, unsigned long);
+template <> void Binaryread(int &, float, unsigned long bytecount) {
+  min(bytecount, sizeof(int));
+}


Hmm, actually, why does the above make a difference for this testcase?

...

It seems that in_immediate_context always returns true in cp_fold_function
because current_binding_level->kind == sk_template_parms.  That seems like a
problem.  Maybe for cp_fold_immediate_r we only want to check
cp_unevaluated_operand or DECL_IMMEDIATE_CONTEXT (current_function_decl)?


Yeah, I suppose that could become an issue.  How about this, then?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
-- >8 --
This test shows that we cannot clear *walk_subtrees in
cp_fold_immediate_r when we're in_immediate_context, because that,
as the comment says, affects cp_fold_r as well.  Here we had an
expression with

   min ((long int) VIEW_CONVERT_EXPR(bytecount), (long int) 
<<< Unknown tree: sizeof_expr
 (int) <<< error >>> >>>)

as its sub-expression, and we never evaluated that into

   min ((long int) bytecount, 4)

so the SIZEOF_EXPR leaked into the middle end.

(There's still one *walk_subtrees = 0; in cp_fold_immediate_r, but that
one should be OK.)

PR c++/112869

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_immediate_r): Don't clear *walk_subtrees
in an unevaluated operand or immediate function.

gcc/testsuite/ChangeLog:

* g++.dg/template/sizeof18.C: New test.
---
  gcc/cp/cp-gimplify.cc| 8 +++-
  gcc/testsuite/g++.dg/template/sizeof18.C | 8 
  2 files ch

RE: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-13 Thread Thomas Schwinge
Hi Lipeng!

On 2023-12-12T02:05:26+, "Zhu, Lipeng"  wrote:
> On 2023/12/12 1:45, H.J. Lu wrote:
>> On Sat, Dec 9, 2023 at 7:25 PM Zhu, Lipeng  wrote:
>> > On 2023/12/9 23:23, Jakub Jelinek wrote:
>> > > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote:
>> > > > This patch try to introduce the rwlock and split the read/write to
>> > > > unit_root tree and unit_cache with rwlock instead of the mutex to
>> > > > increase CPU efficiency. In the get_gfc_unit function, the
>> > > > percentage to step into the insert_unit function is around 30%, in
>> > > > most instances, we can get the unit in the phase of reading the
>> > > > unit_cache or unit_root tree. So split the read/write phase by
>> > > > rwlock would be an approach to make it more parallel.
>> > > >
>> > > > BTW, the IPC metrics can gain around 9x in our test server with
>> > > > 220 cores. The benchmark we used is
>> > > > https://github.com/rwesson/NEAT

>> > > Ok for trunk, thanks.

>> > Thanks! Looking forward to landing to trunk.

>> Pushed for you.

> Thanks for everyone's patience and help, really appreciate that!

Congratulations on your first contribution to GCC (as far as I can tell)!
:-)


I've just filed 
"'libgomp.fortran/rwlock_1.f90', 'libgomp.fortran/rwlock_3.f90' execution test 
timeouts".
Would you be able to look into that?


Grüße
 Thomas


[PATCH 1/2] emit-rtl, lra: Move lra's emit_inc to emit-rtl.cc

2023-12-13 Thread Alex Coplan
Hi,

In PR112906 we ICE because we try to use force_reg to reload an
auto-increment address, but force_reg can't do this.

With the aim of fixing the PR by supporting reloading arbitrary
addresses in pre-RA splitters, this patch generalizes
lra-constraints.cc:emit_inc and makes it available to the rest of the
compiler by moving the generalized version to emit-rtl.cc.

We observe that the separate IN parameter to LRA's emit_inc is
redundant, since the function is static and is only (statically) called
once in lra-constraints.cc, with in == value.  As such, we drop the IN
parameter and simplify the code accordingly.

We wrap the emit_inc code in a virtual class to allow LRA to override
how reload pseudos are created, thereby preserving the existing LRA
behaviour as much as possible.

We then add a second (higher-level) routine to emit-rtl.cc,
force_reload_address, which can reload arbitrary addresses.  This uses
the generalized emit_inc code to handle the RTX_AUTOINC case.  The
second patch in this series uses force_reload_address to fix PR112906.

Since we intend to call address_reload_context::emit_autoinc from within
splitters, and the code lifted from LRA calls recog, we have to avoid
clobbering recog_data.  We do this by introducing a new RAII class for
saving/restoring recog_data on the stack.
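
The RAII class is the usual save/restore-a-global idiom; its shape is
roughly the following (a sketch only -- see the recog.h change for the
real class; recog_data_d/recog_data are the existing globals there):

  class recog_data_saver
  {
    recog_data_d m_saved;
  public:
    recog_data_saver () : m_saved (recog_data) {}
    ~recog_data_saver () { recog_data = m_saved; }
  };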

Bootstrapped/regtested on aarch64-linux-gnu, bootstrapped on
x86_64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/112906
* emit-rtl.cc (address_reload_context::emit_autoinc): New.
(force_reload_address): New.
* emit-rtl.h (struct address_reload_context): Declare.
(force_reload_address): Declare.
* lra-constraints.cc (class lra_autoinc_reload_context): New.
(emit_inc): Drop IN parameter, invoke
code moved to emit-rtl.cc:address_reload_context::emit_autoinc.
(curr_insn_transform): Drop redundant IN parameter in call to
emit_inc.
* recog.h (class recog_data_saver): New.
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 4a7e420e7c0..ce7b98bf006 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "stor-layout.h"
 #include "opts.h"
+#include "optabs.h"
 #include "predict.h"
 #include "rtx-vector-builder.h"
 #include "gimple.h"
@@ -2576,6 +2577,140 @@ replace_equiv_address_nv (rtx memref, rtx addr, bool 
inplace)
   return change_address_1 (memref, VOIDmode, addr, 0, inplace);
 }
 
+
+/* Emit insns to reload VALUE into a new register.  VALUE is an
+   auto-increment or auto-decrement RTX whose operand is a register or
+   memory location; so reloading involves incrementing that location.
+
+   INC_AMOUNT is the number to increment or decrement by (always
+   positive and ignored for POST_MODIFY/PRE_MODIFY).
+
+   Return a pseudo containing the result.  */
+rtx
+address_reload_context::emit_autoinc (rtx value, poly_int64 inc_amount)
+{
+  /* Since we're going to call recog, and might be called within recog,
+ we need to ensure we save and restore recog_data.  */
+  recog_data_saver recog_save;
+
+  /* REG or MEM to be copied and incremented.  */
+  rtx incloc = XEXP (value, 0);
+
+  const rtx_code code = GET_CODE (value);
+  const bool post_p
+= code == POST_DEC || code == POST_INC || code == POST_MODIFY;
+
+  bool plus_p = true;
+  rtx inc;
+  if (code == PRE_MODIFY || code == POST_MODIFY)
+{
+  gcc_assert (GET_CODE (XEXP (value, 1)) == PLUS
+ || GET_CODE (XEXP (value, 1)) == MINUS);
+  gcc_assert (rtx_equal_p (XEXP (XEXP (value, 1), 0), XEXP (value, 0)));
+  plus_p = GET_CODE (XEXP (value, 1)) == PLUS;
+  inc = XEXP (XEXP (value, 1), 1);
+}
+  else
+{
+  if (code == PRE_DEC || code == POST_DEC)
+   inc_amount = -inc_amount;
+
+  inc = gen_int_mode (inc_amount, GET_MODE (value));
+}
+
+  rtx result;
+  if (!post_p && REG_P (incloc))
+result = incloc;
+  else
+{
+  result = get_reload_reg ();
+  /* First copy the location to the result register.  */
+  emit_insn (gen_move_insn (result, incloc));
+}
+
+  /* See if we can directly increment INCLOC.  */
+  rtx_insn *last = get_last_insn ();
+  rtx_insn *add_insn = emit_insn (plus_p
+ ? gen_add2_insn (incloc, inc)
+ : gen_sub2_insn (incloc, inc));
+  const int icode = recog_memoized (add_insn);
+  if (icode >= 0)
+{
+  if (!post_p && result != incloc)
+   emit_insn (gen_move_insn (result, incloc));
+  return result;
+}
+  delete_insns_since (last);
+
+  /* If couldn't do the increment directly, must increment in RESULT.
+ The way we do this depends on whether this is pre- or
+ post-increment.  For pre-increment, copy INCLOC to the reload
+ register, increment it there, then save back.  */
+  if (!post_p)
+{
+  if (incloc != result)
+   emit_insn (gen_move_insn (result, 

[PATCH 2/2] aarch64: Handle autoinc addresses in ld1rq splitter [PR112906]

2023-12-13 Thread Alex Coplan
This patch uses the new force_reload_address routine added by the
previous patch to fix PR112906.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/112906
* config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq_le):
Use force_reload_address to reload addresses that aren't suitable for
ld1rq in the pre-RA splitter.

gcc/testsuite/ChangeLog:

PR target/112906
* gcc.target/aarch64/sve/acle/general/pr112906.c: New test.
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index fdd14d15096..319bc01cae9 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2690,10 +2690,7 @@ (define_insn_and_split 
"@aarch64_vec_duplicate_vq_le"
   {
 if (can_create_pseudo_p ()
 && !aarch64_sve_ld1rq_operand (operands[1], mode))
-  {
-   rtx addr = force_reg (Pmode, XEXP (operands[1], 0));
-   operands[1] = replace_equiv_address (operands[1], addr);
-  }
+  operands[1] = force_reload_address (operands[1]);
 if (GET_CODE (operands[2]) == SCRATCH)
   operands[2] = gen_reg_rtx (VNx16BImode);
 emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr112906.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr112906.c
new file mode 100644
index 000..69b653f1a71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr112906.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+#include 
+unsigned c;
+long d;
+void f() {
+  unsigned char *b;
+  svbool_t x = svptrue_b8();
+  svuint32_t g;
+  svuint8_t h, i;
+  d = 0;
+  for (; (unsigned *)d < &c; d += 16) {
+h = svld1rq(x, &b[d]);
+g = svdot_lane(g, i, h, 3);
+  }
+  svst1_vnum(x, &c, 8, g);
+}


[PATCH V3] RISC-V: XFAIL scan dump fails for autovec PR111311

2023-12-13 Thread Edwin Lu
Clean up scan dump failures on linux rv64 vector targets that Juzhe mentioned
could be ignored for now.  This will help reduce noise and make it more obvious
if a bug or regression is introduced.  The failures that are still reported
are either execution failures or failures that are also present on armv8-a+sve.

gcc/testsuite/ChangeLog:

* c-c++-common/vector-subscript-4.c: xfail testcase
* g++.dg/tree-ssa/pr83518.C: ditto
* gcc.dg/attr-alloc_size-11.c: remove xfail
* gcc.dg/signbit-2.c: xfail testcase
* gcc.dg/signbit-5.c: ditto
* gcc.dg/tree-ssa/cunroll-16.c: ditto
* gcc.dg/tree-ssa/gen-vect-34.c: ditto
* gcc.dg/tree-ssa/loop-bound-1.c: ditto
* gcc.dg/tree-ssa/loop-bound-2.c: ditto
* gcc.dg/tree-ssa/pr84512.c: remove xfail
* gcc.dg/tree-ssa/predcom-4.c: xfail testcase
* gcc.dg/tree-ssa/predcom-5.c: ditto
* gcc.dg/tree-ssa/predcom-9.c: ditto
* gcc.dg/tree-ssa/reassoc-46.c: ditto
* gcc.dg/tree-ssa/scev-10.c: ditto
* gcc.dg/tree-ssa/scev-11.c: ditto
* gcc.dg/tree-ssa/scev-14.c: ditto
* gcc.dg/tree-ssa/scev-9.c: ditto
* gcc.dg/tree-ssa/split-path-11.c: ditto
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: ditto
* gcc.dg/tree-ssa/update-threading.c: ditto
* gcc.dg/unroll-8.c: ditto
* gcc.dg/var-expand1.c: ditto
* gcc.dg/vect/pr103116-1.c: ditto
* gcc.dg/vect/pr103116-2.c: ditto
* gcc.dg/vect/pr65310.c: ditto
* gfortran.dg/vect/vect-8.f90: ditto

Signed-off-by: Edwin Lu 
---
V2 changes:
- added attr-alloc_size-11.c and update-threading.c which were missed in
  previous patch
- remove pr83232.f90 xfail since it was fixed in a recent trunk update
- adjust xfail on split-path-11.c to only apply to rv64

V3 changes:
- swapped to only xfailing riscv specifically (pr84512.c and pr83518.c)
- removed modifications to target-supports.exp as it was accidentally added
---
 gcc/testsuite/c-c++-common/vector-subscript-4.c  | 3 ++-
 gcc/testsuite/g++.dg/tree-ssa/pr83518.C  | 2 +-
 gcc/testsuite/gcc.dg/attr-alloc_size-11.c| 4 ++--
 gcc/testsuite/gcc.dg/signbit-2.c | 5 +++--
 gcc/testsuite/gcc.dg/signbit-5.c | 1 +
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-16.c   | 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c  | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/loop-bound-2.c | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c  | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/predcom-4.c| 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/predcom-5.c| 5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/predcom-9.c| 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c   | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-10.c  | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-11.c  | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-14.c  | 4 +++-
 gcc/testsuite/gcc.dg/tree-ssa/scev-9.c   | 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/split-path-11.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c| 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/update-threading.c | 2 +-
 gcc/testsuite/gcc.dg/unroll-8.c  | 8 +---
 gcc/testsuite/gcc.dg/var-expand1.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr103116-1.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr103116-2.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr65310.c  | 4 ++--
 gcc/testsuite/gfortran.dg/vect/vect-8.f90| 3 ++-
 27 files changed, 56 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/vector-subscript-4.c 
b/gcc/testsuite/c-c++-common/vector-subscript-4.c
index 2c2481f88b7..eb0bca1c19e 100644
--- a/gcc/testsuite/c-c++-common/vector-subscript-4.c
+++ b/gcc/testsuite/c-c++-common/vector-subscript-4.c
@@ -25,5 +25,6 @@ foobar(16)
 foobar(32)
 foobar(64)
 
+/* Xfail riscv PR112531.  */
 /* Verify we don't have any vector temporaries in the IL.  */
-/* { dg-final { scan-tree-dump-not "vector" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "vector" "optimized" { xfail { riscv_v && 
vect_variable_length } } } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
index b8a2bd1ebbd..dcb9279abc2 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
@@ -24,4 +24,4 @@ unsigned test()
   return sum;
 }
 
-/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail 
vect_variable_length } } } */
+/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail { 
vect_variable_length && { ! riscv_v } } } } */
diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c 
b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
index a2efe128915..2828db12e05 100644
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
@@ -47,8 +47,8 @@ typedef __SIZE_TYPE__size_t;
 
 /* The following tests fai

Re: [PATCH] rs6000: Disassemble opaque modes using subregs to allow optimizations [PR109116]

2023-12-13 Thread Peter Bergner
On 11/24/23 3:28 AM, Kewen.Lin wrote:
>> +  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);
> 
> Is it intentional to keep GET_MODE_SIZE (V16QImode) instead of 16?
> I think if one day NUM_POLY_INT_COEFFS isn't 1 on rs6000 any more,
> we would have to add an explicit .to_constant () here.  So I'd prefer
> to use 16 directly, perhaps with a comment above to indicate what the
> value 16 stands for.

I normally don't like hard coding constants in the code, so used
GET_MODE_SIZE (V16QImode) as the number of bytes of a vector register,
but if that's going to cause an issue in the future, I'm fine using 16.
Changed.
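
For illustration (a sketch, not part of the patch; a fragment of
GCC-internal code, so not standalone), the spellings under discussion
side by side:

  /* Relies on the implicit poly_int -> int conversion that exists
     only while NUM_POLY_INT_COEFFS == 1 on rs6000.  */
  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);

  /* What would be needed if the port ever used poly-int mode sizes.  */
  int regoff_poly
    = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode).to_constant ();

  /* The spelling Kewen suggests, with a comment carrying the intent.  */
  int regoff_lit = INTVAL (operands[2]) * 16;  /* Vector register bytes.  */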



>> +  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);
> 
> Likewise.

Changed.




>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 5f56c3ed85b..f2efa46c147 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1964,9 +1964,12 @@ rs6000_hard_regno_mode_ok (unsigned int regno, 
>> machine_mode mode)
>>  static bool
>>  rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
>>  {
>> -  if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
>> -  || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
>> -return mode1 == mode2;
>> +   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
>> +   || mode2 == PTImode || mode2 == XOmode)
>> + return mode1 == mode2;
>> + 
>> +  if (mode2 == OOmode)
>> +return ALTIVEC_OR_VSX_VECTOR_MODE (mode1);
> 
> I vaguely remembered that Segher mentioned it's unexpected for opaque
> modes to have tieable modes excepting for themselves, but if this is the
> only way to get rid of those extra moves, I guess we can special-case
> them here.  Looking forward to Segher's comments on this part.

To be honest, my original patch didn't have this.  I think it was Mike who
said we want or need this.  Mike, why do we want/need this again?

That said, the documentation for TARGET_MODES_TIEABLE_P says:

  This hook returns true if a value of mode mode1 is accessible in
  mode mode2 without copying.

Given that OOmode (i.e., __vector_pair) is, under the covers, two
consecutive vector registers which we initialize with two vectors, a
mode1 that is a vector mode could be accessible from an OOmode mode2
without copying: we could read it directly from the registers holding
mode2.
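
A minimal source-level sketch (not from the patch; assumes
-mcpu=power10 -mmma) of the kind of code where this matters: the
disassembled halves can become subregs of the pair instead of
register copies.

  #include <altivec.h>

  typedef vector unsigned char vec_t;

  vec_t
  sum_pair_halves (vec_t a, vec_t b)
  {
    __vector_pair pair;
    vec_t halves[2];
    /* Build the pair; it occupies two consecutive VSRs.  */
    __builtin_vsx_build_pair (&pair, a, b);
    /* Pull the halves back out; with the tieable change these can be
       plain subreg accesses rather than vector moves.  */
    __builtin_vsx_disassemble_pair (halves, &pair);
    return vec_add (halves[0], halves[1]);
  }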

Segher, your input on the above and on the subreg portion of the patch in general?

Peter





Re: [PATCH v3 10/11] aarch64: Add new load/store pair fusion pass

2023-12-13 Thread Richard Sandiford
Thanks for the update.  The new comments are really nice, and I think
make the implementation much easier to follow.

I was going to say OK with the changes below, but there's one question/
comment near the end about the double list walk.

Alex Coplan  writes:
> +// Convenience wrapper around strip_offset that can also look through
> +// RTX_AUTOINC addresses.  The interface is like strip_offset except we take 
> a
> +// MEM so that we know the mode of the access.
> +static rtx ldp_strip_offset (rtx mem, poly_int64 *offset)

Formatting nit, should be:

static rtx
ldp_strip_offset (rtx mem, poly_int64 *offset)

> +{
> +  rtx addr = XEXP (mem, 0);
> +
> +  switch (GET_CODE (addr))
> +{
> +case PRE_MODIFY:
> +case POST_MODIFY:
> +  addr = strip_offset (XEXP (addr, 1), offset);
> +  gcc_checking_assert (REG_P (addr));
> +  gcc_checking_assert (rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr));
> +  break;
> +case PRE_INC:
> +case POST_INC:
> +  addr = XEXP (addr, 0);
> +  *offset = GET_MODE_SIZE (GET_MODE (mem));
> +  gcc_checking_assert (REG_P (addr));
> +  break;
> +case PRE_DEC:
> +case POST_DEC:
> +  addr = XEXP (addr, 0);
> +  *offset = -GET_MODE_SIZE (GET_MODE (mem));
> +  gcc_checking_assert (REG_P (addr));
> +  break;
> +
> +default:
> +  addr = strip_offset (addr, offset);
> +}
> +
> +  return addr;
> +}
> [...]
> +// Main function to begin pair discovery.  Given a memory access INSN,
> +// determine whether it could be a candidate for fusing into an ldp/stp,
> +// and if so, track it in the appropriate data structure for this basic
> +// block.  LOAD_P is true if the access is a load, and MEM is the mem
> +// rtx that occurs in INSN.
> +void
> +ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
> +{
> +  // We can't combine volatile MEMs, so punt on these.
> +  if (MEM_VOLATILE_P (mem))
> +return;
> +
> +  // Ignore writeback accesses if the param says to do so.
> +  if (!aarch64_ldp_writeback
> +  && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC)
> +return;
> +
> +  const machine_mode mem_mode = GET_MODE (mem);
> +  if (!ldp_operand_mode_ok_p (mem_mode))
> +return;
> +
> +  // Note ldp_operand_mode_ok_p already rejected VL modes.
> +  const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant ();
> +
> +  rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p);
> +
> +  // We want to segregate FP/SIMD accesses from GPR accesses.
> +  //
> +  // Before RA, we use the modes, noting that stores of constant zero
> +  // operands use GPRs (even in non-integer modes).  After RA, we use
> +  // the hard register numbers.
> +  const bool fpsimd_op_p
> += reload_completed
> +? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op)))
> +: (GET_MODE_CLASS (mem_mode) != MODE_INT
> +   && (load_p || !aarch64_const_zero_rtx_p (reg_op)));
> +
> +  const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size };
> +
> +  if (track_via_mem_expr (insn, mem, lfs))
> +return;
> +
> +  poly_int64 mem_off;
> +  rtx addr = XEXP (mem, 0);
> +  const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
> +  rtx base = ldp_strip_offset (mem, &mem_off);
> +  if (!REG_P (base))
> +return;
> +
> +  // Need to calculate two (possibly different) offsets:
> +  //  - Offset at which the access occurs.
> +  //  - Offset of the new base def.
> +  poly_int64 access_off;
> +  if (autoinc_p && any_post_modify_p (addr))
> +access_off = 0;
> +  else
> +access_off = mem_off;
> +
> +  poly_int64 new_def_off = mem_off;
> +
> +  // Punt on accesses relative to the eliminable regs: since we don't
> +  // know the elimination offset pre-RA, we should postpone forming
> +  // pairs on such accesses until after RA.
> +  if (!reload_completed
> +  && (REGNO (base) == FRAME_POINTER_REGNUM
> +   || REGNO (base) == ARG_POINTER_REGNUM))
> +return;

FTR, this reason still doesn't feel entirely convincing.  Although we don't
know the offset, we do know that the offset will be consistent for both
accesses, and LRA will/should be able to reload the pair address if
necessary.  In fact we could even end up with fewer reloads (given the
new single-mem representation of the pair).

But perhaps one reason to defer is that elimination reduces the number
of fixed base registers in play, and so there should be strictly more
fusing opportunities after register allocation.

No need to change anything though, just noting it in passing.
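
An illustrative testcase (not from the thread): pre-RA, the two loads
below are addressed off the eliminable frame pointer, so the pass
defers pairing them; after RA they have known sp-relative offsets and
can fuse into a single ldp.

  void ext (long *);

  long
  sum_stack (void)
  {
    long buf[2];
    ext (buf);               /* Escapes, so the loads survive.  */
    return buf[0] + buf[1];  /* Two adjacent 8-byte stack loads.  */
  }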

> +
> +  // Now need to find def of base register.
> +  def_info *base_def;
> +  use_info *base_use = find_access (insn->uses (), REGNO (base));
> +  gcc_assert (base_use);
> +  base_def = base_use->def ();

Think this'd be more natural as:

  use_info *base_use = find_access (insn->uses (), REGNO (base));
  gcc_assert (base_use);
  def_info *base_def = base_use->def ();

> +  if (!base_def)
> +{
> +  if (dump_file)
> + fprintf (dump_file,
> +

Re: [PATCH 1/2] emit-rtl, lra: Move lra's emit_inc to emit-rtl.cc

2023-12-13 Thread Richard Sandiford
Alex Coplan  writes:
> Hi,
>
> In PR112906 we ICE because we try to use force_reg to reload an
> auto-increment address, but force_reg can't do this.
>
> With the aim of fixing the PR by supporting reloading arbitrary
> addresses in pre-RA splitters, this patch generalizes
> lra-constraints.cc:emit_inc and makes it available to the rest of the
> compiler by moving the generalized version to emit-rtl.cc.
>
> We observe that the separate IN parameter to LRA's emit_inc is
> redundant, since the function is static and is only (statically) called
> once in lra-constraints.cc, with in == value.  As such, we drop the IN
> parameter and simplify the code accordingly.
>
> We wrap the emit_inc code in a virtual class to allow LRA to override
> how reload pseudos are created, thereby preserving the existing LRA
> behaviour as much as possible.
>
> We then add a second (higher-level) routine to emit-rtl.cc,
> force_reload_address, which can reload arbitrary addresses.  This uses
> the generalized emit_inc code to handle the RTX_AUTOINC case.  The
> second patch in this series uses force_reload_address to fix PR112906.
>
> Since we intend to call address_reload_context::emit_autoinc from within
> splitters, and the code lifted from LRA calls recog, we have to avoid
> clobbering recog_data.  We do this by introducing a new RAII class for
> saving/restoring recog_data on the stack.
>
> Bootstrapped/regtested on aarch64-linux-gnu, bootstrapped on
> x86_64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/112906
>   * emit-rtl.cc (address_reload_context::emit_autoinc): New.
>   (force_reload_address): New.
>   * emit-rtl.h (struct address_reload_context): Declare.
>   (force_reload_address): Declare.
>   * lra-constraints.cc (class lra_autoinc_reload_context): New.
>   (emit_inc): Drop IN parameter, invoke
>   code moved to emit-rtl.cc:address_reload_context::emit_autoinc.
>   (curr_insn_transform): Drop redundant IN parameter in call to
>   emit_inc.
>   * recog.h (class recog_data_saver): New.

LGTM, and looks like it should functionally be a no-op for LRA.

So OK, but please give Vlad 24 hours to comment, or to ask for more time.

Thanks,
Richard
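
For readers unfamiliar with the idiom: recog_data_saver is an RAII
guard.  A self-contained sketch of the pattern (hypothetical stand-in
struct, not GCC's real recog_data):

  struct recog_data_d { int n_operands; /* ... */ };
  static recog_data_d recog_data;  /* Global scratch, as in GCC.  */

  class recog_data_saver
  {
    recog_data_d m_saved;  /* Snapshot taken on construction.  */
  public:
    recog_data_saver () : m_saved (recog_data) {}
    /* Restore the global when the guard goes out of scope.  */
    ~recog_data_saver () { recog_data = m_saved; }
  };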

>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 4a7e420e7c0..ce7b98bf006 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "rtl-iter.h"
>  #include "stor-layout.h"
>  #include "opts.h"
> +#include "optabs.h"
>  #include "predict.h"
>  #include "rtx-vector-builder.h"
>  #include "gimple.h"
> @@ -2576,6 +2577,140 @@ replace_equiv_address_nv (rtx memref, rtx addr, bool 
> inplace)
>return change_address_1 (memref, VOIDmode, addr, 0, inplace);
>  }
>  
> +
> +/* Emit insns to reload VALUE into a new register.  VALUE is an
> +   auto-increment or auto-decrement RTX whose operand is a register or
> +   memory location; so reloading involves incrementing that location.
> +
> +   INC_AMOUNT is the number to increment or decrement by (always
> +   positive and ignored for POST_MODIFY/PRE_MODIFY).
> +
> +   Return a pseudo containing the result.  */
> +rtx
> +address_reload_context::emit_autoinc (rtx value, poly_int64 inc_amount)
> +{
> +  /* Since we're going to call recog, and might be called within recog,
> + we need to ensure we save and restore recog_data.  */
> +  recog_data_saver recog_save;
> +
> +  /* REG or MEM to be copied and incremented.  */
> +  rtx incloc = XEXP (value, 0);
> +
> +  const rtx_code code = GET_CODE (value);
> +  const bool post_p
> += code == POST_DEC || code == POST_INC || code == POST_MODIFY;
> +
> +  bool plus_p = true;
> +  rtx inc;
> +  if (code == PRE_MODIFY || code == POST_MODIFY)
> +{
> +  gcc_assert (GET_CODE (XEXP (value, 1)) == PLUS
> +   || GET_CODE (XEXP (value, 1)) == MINUS);
> +  gcc_assert (rtx_equal_p (XEXP (XEXP (value, 1), 0), XEXP (value, 0)));
> +  plus_p = GET_CODE (XEXP (value, 1)) == PLUS;
> +  inc = XEXP (XEXP (value, 1), 1);
> +}
> +  else
> +{
> +  if (code == PRE_DEC || code == POST_DEC)
> + inc_amount = -inc_amount;
> +
> +  inc = gen_int_mode (inc_amount, GET_MODE (value));
> +}
> +
> +  rtx result;
> +  if (!post_p && REG_P (incloc))
> +result = incloc;
> +  else
> +{
> +  result = get_reload_reg ();
> +  /* First copy the location to the result register.  */
> +  emit_insn (gen_move_insn (result, incloc));
> +}
> +
> +  /* See if we can directly increment INCLOC.  */
> +  rtx_insn *last = get_last_insn ();
> +  rtx_insn *add_insn = emit_insn (plus_p
> +   ? gen_add2_insn (incloc, inc)
> +   : gen_sub2_insn (incloc, inc));
> +  const int icode = recog_memoized (add_insn);
> +  if (icode >= 0)
> +{
> +  if (!post_p && result != incloc)
> + emit_insn (gen_move_insn (result, incloc));
> +  return res

Re: [PATCH 2/2] aarch64: Handle autoinc addresses in ld1rq splitter [PR112906]

2023-12-13 Thread Richard Sandiford
Alex Coplan  writes:
> This patch uses the new force_reload_address routine added by the
> previous patch to fix PR112906.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

OK, thanks, and sorry for the breakage.

Richard

>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/112906
>   * config/aarch64/aarch64-sve.md (@aarch64_vec_duplicate_vq_le):
>   Use force_reload_address to reload addresses that aren't suitable for
>   ld1rq in the pre-RA splitter.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/112906
>   * gcc.target/aarch64/sve/acle/general/pr112906.c: New test.

