[Committed] RISC-V: Fix VLS mode movmisalign bug
PR112932 made me notice a bug in the current VLS mode misalign pattern. Adapt it to match the VLA mode handling. Committed as an obvious fix. PR target/112932 gcc/ChangeLog: * config/riscv/vector.md (movmisalign): Fix VLSmode bugs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto. * gcc.target/riscv/rvv/autovec/pr112932.c: New test. --- gcc/config/riscv/vector.md| 23 +-- .../gcc.target/riscv/rvv/autovec/pr112932.c | 66 +++ .../riscv/rvv/autovec/vls/misalign-1.c| 6 +- 3 files changed, 70 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 31c13a6dcca..a1284fd3251 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -1334,31 +1334,12 @@ [(set_attr "type" "vmov") (set_attr "mode" "")]) -(define_expand "movmisalign" - [(set (match_operand:VLS 0 "nonimmediate_operand") - (match_operand:VLS 1 "general_operand"))] - "TARGET_VECTOR" - { -/* To support misalign data movement, we should use - minimum element alignment load/store. */ -unsigned int size = GET_MODE_SIZE (GET_MODE_INNER (mode)); -poly_int64 nunits = GET_MODE_NUNITS (mode) * size; -machine_mode mode = riscv_vector::get_vector_mode (QImode, nunits).require (); -operands[0] = gen_lowpart (mode, operands[0]); -operands[1] = gen_lowpart (mode, operands[1]); -if (MEM_P (operands[0]) && !register_operand (operands[1], mode)) - operands[1] = force_reg (mode, operands[1]); -riscv_vector::emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::UNARY_OP, operands); -DONE; - } -) - ;; According to RVV ISA: ;; If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, ;; either the element is transferred successfully or an address misaligned exception is raised on that element. 
(define_expand "movmisalign" - [(set (match_operand:V 0 "nonimmediate_operand") - (match_operand:V 1 "general_operand"))] + [(set (match_operand:V_VLS 0 "nonimmediate_operand") + (match_operand:V_VLS 1 "general_operand"))] "TARGET_VECTOR && TARGET_VECTOR_MISALIGN_SUPPORTED" { emit_move_insn (operands[0], operands[1]); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c new file mode 100644 index 000..4ae6ec02817 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c @@ -0,0 +1,66 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ +/* { dg-require-effective-target riscv_v } */ + +#include +int a, j, n, b, c, o, d, g, h; +int e[8]; +long f[8][6]; +void l() { + o = -27; + for (; o; o++) { +*e = 1; +if (a >= n) { + d = 0; + for (; d <= 7; d++) +e[d] = c; +} + } + j = 0; + for (; j < 8; j++) { +g = 0; +for (; g < 2; g++) { + h = 1; + for (; h < 3; h++) +f[j][g * 2 + h] = 1; +} + } + unsigned long *m = &f[1][1]; + *m = 0; +} +int main() { + l(); + assert (f[0][1] == 1); + assert (f[0][2] == 1); + assert (f[0][3] == 1); + assert (f[0][4] == 1); + assert (f[1][1] == 0); + assert (f[1][2] == 1); + assert (f[1][3] == 1); + assert (f[1][4] == 1); + assert (f[2][1] == 1); + assert (f[2][2] == 1); + assert (f[2][3] == 1); + assert (f[2][4] == 1); + assert (f[3][1] == 1); + assert (f[3][2] == 1); + assert (f[3][3] == 1); + assert (f[3][4] == 1); + assert (f[4][1] == 1); + assert (f[4][2] == 1); + assert (f[4][3] == 1); + assert (f[4][4] == 1); + assert (f[5][1] == 1); + assert (f[5][2] == 1); + assert (f[5][3] == 1); + assert (f[5][4] == 1); + assert (f[6][1] == 1); + assert (f[6][2] == 1); + assert (f[6][3] == 1); + assert (f[6][4] == 1); + assert (f[7][1] == 1); + assert (f[7][2] == 1); + assert (f[7][3] == 1); + assert (f[7][4] == 1); +} + diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c index b602ffd69bb..6e08f77921a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c @@ -21,7 +21,5 @@ foo () abort (); } -/* { dg-final { scan-assembler-times {vle8\.v} 1 } } */ -/* { dg-final { scan-assembler-times {vle8\.v} 1 } } */ -/* { dg-final { scan-assembler-not {vle16\.v} } } */ -/* { dg-final { scan-assembler-not {vle16\.v} } } */ +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */ + -- 2.36.3
[PATCH] phiopt: Fix ICE with large --param l1-cache-line-size= [PR112887]
Hi! This function is never called when param_l1_cache_line_size is 0, but it uses int and unsigned int variables to hold alignment in bits, so for large param_l1_cache_line_size it is zero and e.g. DECL_ALIGN () % param_align_bits can divide by zero. Looking at the code, the function uses tree_fits_uhwi_p on the trees before converting them using tree_to_uhwi to int variables, which looks just wrong, either it would need to punt if it doesn't fit into those and also check for overflows during the computation, or use unsigned HOST_WIDE_INT for all of this. That also fixes the division by zero, as param_l1_cache_line_size maximum is INT_MAX, that multiplied by 8 will always fit. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2023-12-09 Jakub Jelinek PR tree-optimization/112887 * tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of param_align, param_align_bits, offset1, offset2, size2 and align1 variables from int or unsigned int to unsigned HOST_WIDE_INT. * gcc.dg/pr112887.c: New test. --- gcc/tree-ssa-phiopt.cc.jj 2023-11-14 10:52:16.195275972 +0100 +++ gcc/tree-ssa-phiopt.cc 2023-12-08 16:25:29.166747347 +0100 @@ -3757,8 +3757,8 @@ static void hoist_adjacent_loads (basic_block bb0, basic_block bb1, basic_block bb2, basic_block bb3) { - int param_align = param_l1_cache_line_size; - unsigned param_align_bits = (unsigned) (param_align * BITS_PER_UNIT); + unsigned HOST_WIDE_INT param_align = param_l1_cache_line_size; + unsigned HOST_WIDE_INT param_align_bits = param_align * BITS_PER_UNIT; gphi_iterator gsi; /* Walk the phis in bb3 looking for an opportunity. We are looking @@ -3770,8 +3770,7 @@ hoist_adjacent_loads (basic_block bb0, b gimple *def1, *def2; tree arg1, arg2, ref1, ref2, field1, field2; tree tree_offset1, tree_offset2, tree_size2, next; - int offset1, offset2, size2; - unsigned align1; + unsigned HOST_WIDE_INT offset1, offset2, size2, align1; gimple_stmt_iterator gsi2; basic_block bb_for_def1, bb_for_def2; --- gcc/testsuite/gcc.dg/pr112887.c.jj 2023-12-08 16:31:30.708697160 +0100 +++ gcc/testsuite/gcc.dg/pr112887.c 2023-12-08 16:27:06.662385487 +0100 @@ -0,0 +1,13 @@ +/* PR tree-optimization/112887 */ +/* { dg-do compile } */ +/* { dg-options "-O2 --param=l1-cache-line-size=0x2000" } */ + +void bar (long); +long c; +struct S { long a, b; } s; + +void +foo (void) +{ + bar (c ? s.a : s.b); +} Jakub
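For illustration, here is a minimal standalone sketch (not GCC code; BITS_PER_UNIT assumed to be 8, and the parameter value chosen only to demonstrate the wraparound described above) of why 32-bit unsigned arithmetic can turn the alignment-in-bits value into zero, while a 64-bit unsigned HOST_WIDE_INT cannot:

#include <cstdint>
#include <cstdio>

int main ()
{
  const int64_t line_size = 0x20000000;  /* a large --param l1-cache-line-size= value */
  const unsigned bits_per_unit = 8;      /* BITS_PER_UNIT on typical targets */

  /* Old code kept the product in 32 bits: 0x20000000 * 8 == 2^32 wraps to 0,
     so a later "x % param_align_bits" divides by zero.  */
  unsigned int align_bits_32 = (unsigned int) line_size * bits_per_unit;

  /* New code uses unsigned HOST_WIDE_INT (64-bit): INT_MAX * 8 always fits.  */
  uint64_t align_bits_64 = (uint64_t) line_size * bits_per_unit;

  printf ("32-bit: %u, 64-bit: %llu\n", align_bits_32,
          (unsigned long long) align_bits_64);
  return 0;
}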
Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
Tamar Christina writes: > Hi All, > > What do people think about having the ability to force only the latch > connected > exit as the exit as a param? I.e. what's in the patch but as a param. > > I found this useful when debugging large example failures as it tells me where > I should be looking. No hard requirement but just figured I'd ask if we > should. If it's useful for that, then perhaps it would be worth making it a DEBUG_COUNTER instead of a --param, for easy bisection. Thanks, Richard > > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit. > > --- inline copy of patch -- > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index > 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe > 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop) >if (exits.length () == 1) > return exits[0]; > > +#if 0 >/* If we have multiple exits we only support counting IV at the moment. > Analyze > all exits and return one */ >class tree_niter_desc niter_desc; > @@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop) > } > >return candidate; > +#else > + basic_block bb = ip_normal_pos (loop); > + if (!bb) > +return NULL; > + > + edge exit = EDGE_SUCC (bb, 0); > + if (exit->dest == loop->latch) > +return EDGE_SUCC (bb, 1); > + return exit; > +#endif > } > > /* Function bb_in_loop_p
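If this goes the DEBUG_COUNTER route Richard suggests, a rough sketch could look like the following. The counter name is hypothetical and this is not part of the posted patch; the idea is to keep the multi-exit analysis as the default and let -fdbg-cnt bisect which loops fall back to the latch-connected exit that the #if 0 hack above forces unconditionally:

/* gcc/dbg-cnt.def (hypothetical counter name):  */
DEBUG_COUNTER (vect_force_single_exit)

/* gcc/tree-vect-loop.cc, in vec_init_loop_exit_info, with dbg-cnt.h included:  */
  if (exits.length () == 1)
    return exits[0];

  if (dbg_cnt (vect_force_single_exit))
    {
      /* ... existing code: analyze all exits and return the candidate ...  */
    }

  /* Counter exhausted: force the latch-connected exit for debugging.  */
  basic_block bb = ip_normal_pos (loop);
  if (!bb)
    return NULL;
  edge exit = EDGE_SUCC (bb, 0);
  if (exit->dest == loop->latch)
    return EDGE_SUCC (bb, 1);
  return exit;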
Re: [PATCH] Add support for function attributes and variable attributes
Added it. Le jeu. 7 déc. 2023 à 18:13, Antoni Boucher a écrit : > > It seems like you forgot to prefix the commit message with "libgccjit: > ". > > On Thu, 2023-11-30 at 10:55 +0100, Guillaume Gomez wrote: > > Ping David. :) > > > > Le jeu. 23 nov. 2023 à 22:59, Antoni Boucher a > > écrit : > > > David: I found back the comment you made. Here it is: > > > > > >I see you have patches to add function and variable attributes; > > > I > > >wonder if this would be cleaner internally if there was a > > >recording::attribute class, rather than the std::pair currently > > > in > > >use > > >(some attributes have int arguments rather than string, others > > > have > > >multiple args). > > > > > >I also wondered if a "gcc_jit_attribute" type could be exposed > > > to > > >the > > >user, e.g.: > > > > > > attr1 = gcc_jit_context_new_attribute (ctxt, "noreturn"); > > > attr2 = gcc_jit_context_new_attribute_with_string (ctxt, > > > "alias", > > >"__foo"); > > > gcc_jit_function_add_attribute (ctxt, attr1); > > > gcc_jit_function_add_attribute (ctxt, attr2); > > > > > >or somesuch? But I think the API you currently have is OK. > > > > > > On Thu, 2023-11-23 at 22:52 +0100, Guillaume Gomez wrote: > > > > Ping David. :) > > > > > > > > Le mer. 15 nov. 2023 à 17:56, Antoni Boucher a > > > > écrit : > > > > > > > > > > David: another thing I remember you mentioned when you reviewed > > > > > an > > > > > earlier version of this patch is the usage of `std::pair`. > > > > > I can't find where you said that, but I remember you mentioned > > > > > that > > > > > we > > > > > should use a struct instead. > > > > > Can you please elaborate again? > > > > > Thanks. > > > > > > > > > > On Wed, 2023-11-15 at 17:53 +0100, Guillaume Gomez wrote: > > > > > > Hi, > > > > > > > > > > > > This patch adds the (incomplete) support for function and > > > > > > variable > > > > > > attributes. The added attributes are the ones we're using in > > > > > > rustc_codegen_gcc but all the groundwork is done to add more > > > > > > (and > > > > > > we > > > > > > will very likely add more as we didn't add all the ones we > > > > > > use in > > > > > > rustc_codegen_gcc yet). > > > > > > > > > > > > The only big question with this patch is about `inline`. We > > > > > > currently > > > > > > handle it as an attribute because it is more convenient for > > > > > > us > > > > > > but is > > > > > > it ok or should we create a separate function to mark a > > > > > > function > > > > > > as > > > > > > inlined? > > > > > > > > > > > > Thanks in advance for the review. > > > > > > > > > From df75f0eb8aacba249b6e791603752e35778951a4 Mon Sep 17 00:00:00 2001 From: Guillaume Gomez Date: Mon, 20 Jun 2022 14:34:39 -0400 Subject: [PATCH] libgccjit: Add support for function attributes and variable attributes. gcc/jit/ChangeLog: * dummy-frontend.cc (handle_alias_attribute): New function. (handle_always_inline_attribute): New function. (handle_cold_attribute): New function. (handle_fnspec_attribute): New function. (handle_format_arg_attribute): New function. (handle_format_attribute): New function. (handle_noinline_attribute): New function. (handle_target_attribute): New function. (handle_used_attribute): New function. (handle_visibility_attribute): New function. (handle_weak_attribute): New function. (handle_alias_ifunc_attribute): New function. * jit-playback.cc (fn_attribute_to_string): New function. (variable_attribute_to_string): New function. (global_new_decl): Add attributes support. (set_variable_attribute): New function. 
(new_global): Add attributes support. (new_global_initialized): Add attributes support. (new_local): Add attributes support. * jit-playback.h (fn_attribute_to_string): New function. (set_variable_attribute): New function. * jit-recording.cc (recording::lvalue::add_attribute): New function. (recording::function::function): New function. (recording::function::write_to_dump): Add attributes support. (recording::function::add_attribute): New function. (recording::function::add_string_attribute): New function. (recording::function::add_integer_array_attribute): New function. (recording::global::replay_into): Add attributes support. (recording::local::replay_into): Add attributes support. * libgccjit.cc (gcc_jit_function_add_attribute): New function. (gcc_jit_function_add_string_attribute): New function. (gcc_jit_function_add_integer_array_attribute): New function. (gcc_jit_lvalue_add_attribute): New function. * libgccjit.h (enum gcc_jit_fn_attribute): New enum. (gcc_jit_function_add_attribute): New function. (gcc_jit_function_add_string_attribute): New function. (gcc_jit_function_add_integer_array_attribute): New function. (enum gcc_jit_variable_attribute): New function. (gcc_jit_lvalue_add_string_attribute): New function. * libgccjit.map: Declare new functions. gcc/testsuite/ChangeLog: * jit.dg/jit.exp: Add `jit-verify-assembler-
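For context, client code for the entry points named in the ChangeLog would presumably look something like the sketch below. The exact signatures and enumerator spellings (GCC_JIT_FN_ATTRIBUTE_NORETURN and friends) are guesses based on the ChangeLog and on David's earlier review comments quoted above, not taken from the patch itself:

/* Hypothetical usage sketch only -- see the caveat above.  */
#include <libgccjit.h>

static void
add_attrs (gcc_jit_function *fn, gcc_jit_lvalue *var)
{
  /* Plain function attribute, e.g. __attribute__((noreturn)).  */
  gcc_jit_function_add_attribute (fn, GCC_JIT_FN_ATTRIBUTE_NORETURN);

  /* Function attribute with a string argument, e.g. alias ("__foo").  */
  gcc_jit_function_add_string_attribute (fn, GCC_JIT_FN_ATTRIBUTE_ALIAS, "__foo");

  /* Variable attribute with a string argument, e.g. visibility ("hidden").  */
  gcc_jit_lvalue_add_string_attribute (var, GCC_JIT_VARIABLE_ATTRIBUTE_VISIBILITY,
                                       "hidden");
}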
Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables
Hi Tobias! On 2023-11-08T17:58:10+0100, Tobias Burnus wrote: > OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables Nice work! > This commit adds -fopenmp-allocators which enables support for > 'omp allocators' and 'omp allocate' that are associated with a Fortran > allocate-stmt. If such a construct is encountered, an error is shown, > unless the -fopenmp-allocators flag is present. > > With -fopenmp -fopenmp-allocators, those constructs get turned into > GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp) > ensures deallocation and reallocation (via intrinsic assignments) are > properly directed to GOMP_free/omp_realloc - while normal Fortran > allocations are processed by free/realloc. > > In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the > version field of the Fortran array discriptor is (mis)used: 0 indicates > the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars, > there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the > pointer address to a splay_tree while GOMP_is_alloc(ptr) will return > true it was previously added but also removes it from the list. > > Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of > omp-builtins.def and libgomp gains the mentioned two new function. Minor comments: > --- a/gcc/fortran/trans-openmp.cc > +++ b/gcc/fortran/trans-openmp.cc > +/* Add ptr for tracking as being allocated by GOMP_alloc. */ > + > +tree > +gfc_omp_call_add_alloc (tree ptr) > +{ > + static tree fn = NULL_TREE; > + if (fn == NULL_TREE) > +{ > + fn = build_function_type_list (void_type_node, ptr_type_node, > NULL_TREE); > + fn = build_fn_decl ("GOMP_add_alloc", fn); > +/* FIXME: attributes. */ > +} > + return build_call_expr_loc (input_location, fn, 1, ptr); > +} > + > +/* Generated function returns true when it was tracked via GOMP_add_alloc and > + removes it from the tracking. As called just before GOMP_free or > omp_realloc > + the pointer is or might become invalid, thus, it is always removed. */ > + > +tree > +gfc_omp_call_is_alloc (tree ptr) > +{ > + static tree fn = NULL_TREE; > + if (fn == NULL_TREE) > +{ > + fn = build_function_type_list (boolean_type_node, ptr_type_node, > + NULL_TREE); > + fn = build_fn_decl ("GOMP_is_alloc", fn); > +/* FIXME: attributes. */ > +} > + return build_call_expr_loc (input_location, fn, 1, ptr); > +} Why not define 'GOMP_add_alloc', 'GOMP_is_alloc' via 'gcc/omp-builtins.def'? > --- a/gcc/omp-builtins.def > +++ b/gcc/omp-builtins.def > @@ -467,6 +467,9 @@ DEF_GOMP_BUILTIN > (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER, > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC, > "GOMP_alloc", BT_FN_PTR_SIZE_SIZE_PTRMODE, > ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST) > +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_REALLOC, > + "omp_realloc", BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE, > + ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST) > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE, > "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST) > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", Should this either be 'BUILT_IN_OMP_REALLOC' ('OMP' instead of 'GOMP'), or otherwise a 'GOMP_realloc' be added to 'libgomp/allocator.c', like for 'GOMP_alloc', 'GOMP_free'; 'ialias_call'ing the respective 'omp_[...]' functions? (Verbatim 'omp_realloc' also mentioned in the comment before 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'.) > --- a/libgomp/allocator.c > +++ b/libgomp/allocator.c > +/* Add pointer as being alloced by GOMP_alloc. 
*/ > +void > +GOMP_add_alloc (void *ptr) > +{ > + [...] > +} > + > +/* Remove pointer, either called by FREE or by REALLOC, > + either of them can change the allocation status. */ > +bool > +GOMP_is_alloc (void *ptr) > +{ > + [...] > +} > --- a/libgomp/libgomp.map > +++ b/libgomp/libgomp.map > +GOMP_5.1.2 { > + global: > + GOMP_add_alloc; > + GOMP_is_alloc; > + [...] > +} GOMP_5.1.1; 'GOMP_add_alloc', 'GOMP_is_alloc' should get prototyped in 'libgomp/libgomp_g.h'. Regards, Thomas
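To restate the tracking contract described in the patch (GOMP_add_alloc records that a scalar was allocated by GOMP_alloc; GOMP_is_alloc answers whether it was, and always forgets the pointer because it is about to be freed or reallocated), here is a minimal illustration. It deliberately uses a std::set instead of libgomp's splay tree and ignores locking:

#include <set>

static std::set<void *> tracked;   /* stand-in for libgomp's splay tree */

void
add_alloc_sketch (void *ptr)       /* cf. GOMP_add_alloc */
{
  tracked.insert (ptr);
}

bool
is_alloc_sketch (void *ptr)        /* cf. GOMP_is_alloc */
{
  /* Called just before GOMP_free/omp_realloc, so the old address is always
     dropped; returns whether it had been recorded.  */
  return tracked.erase (ptr) != 0;
}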
Re: [PATCH] Reimplement __gnu_cxx::__ops operators
On 07/12/2023 14:41, Jonathan Wakely wrote: On Wed, 6 Dec 2023 at 20:55, François Dumont wrote: I think I still got no feedback about this cleanup proposal. Can you remind me why we have all those different functions in predefined_ops.h in the first place? I think it was to avoid having two versions of every algorithm, one that does *l < *r and one that does pred(*l, *r), right? Yes, that was the purpose. One property of the current code is that _Iter_less_iter will compare exactly *lhs < *rhs and so works even with this type, where its operator< only accepts non-const arguments: struct X { bool operator<(X&); }; Doesn't your simplification break that, because the _Less function only accepts const references now? Indeed, I thought more about the problem of const qualification on the operator(). This is why _Comp_val is mimicking what _Not_fn does. To be honest I even thought that this kind of operator above was more a user-code issue than a real use case we need to handle. But it looks like you, or rather the C++ standard, require us to support it. I'll rework it then, thanks for the proposal below and in your other email.
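To spell out the constraint Jonathan describes with a standalone example (not libstdc++ code): comparing the elements directly, as _Iter_less_iter does today, still accepts a type whose operator< takes a non-const reference, whereas a comparator written against const references cannot call it.

struct X { bool operator<(X&) { return true; } };   // non-const everything

// What the current _Iter_less_iter effectively does: *lhs < *rhs.
template<typename It>
bool less_direct(It lhs, It rhs) { return *lhs < *rhs; }   // fine for X

// A comparator taking const references would reject X if instantiated,
// because `const X&` cannot bind to the `X&` parameter of X::operator<.
struct LessConst
{
  template<typename T>
  bool operator()(const T& l, const T& r) const { return l < r; }
};

int main()
{
  X a, b;
  X* p = &a;
  X* q = &b;
  bool ok = less_direct(p, q);   // compiles and works
  // LessConst{}(a, b);          // would not compile for X
  return ok ? 0 : 1;
}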
[PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.
Hi, we currently try to recognize various forms of stepped (const_vector) sequence variants in expand_const_vector. Because of complications with canonicalization and encoding it is easier to identify such patterns in expand_vec_perm_const_1 already where perm.series_p () is available. This patch introduces shuffle_series as new permutation pattern and tries to recognize series like [base0 base1 base1 + step ...]. If such a series is found the series is expanded by expand_vec_series and a gather is emitted. On top the patch fixes the step recognition in expand_const_vector for stepped series where such a series would end up before. This fixes several execution failures when running code compiled for a scalable vector size of 128 on a target with vlen = 256 or higher. The problem was only noticed there because the encoding for a reversed [2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }. Some testcases that failed were: vect-alias-check-18.c vect-alias-check-1.F90 pr64365.c On a 128-bit target, only the first two elements are used. The third element causing the complications only comes into effect at vlen = 256. With this patch the testsuite results are similar with vlen = 128 and vlen = 256 (when built with -march=rv64gcv_zvl128b). Regards Robin gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Fix step calculation. (modulo_sel_indices): Also perform modulo for variable-length constants. (shuffle_series): Recognize series permutations. (expand_vec_perm_const_1): Add shuffle_series. --- gcc/config/riscv/riscv-v.cc | 66 +++-- 1 file changed, 64 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9b99d0aca84..fd6ef0660a2 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1378,12 +1378,15 @@ expand_const_vector (rtx target, rtx src) { base0, base1, base1 + step, base1 + step * 2, ... } */ rtx base0 = builder.elt (0); rtx base1 = builder.elt (1); - rtx step = builder.elt (2); + rtx base2 = builder.elt (2); + + scalar_mode elem_mode = GET_MODE_INNER (mode); + rtx step = simplify_binary_operation (MINUS, elem_mode, base2, base1); + /* Step 1 - { base1, base1 + step, base1 + step * 2, ... } */ rtx tmp = gen_reg_rtx (mode); expand_vec_series (tmp, base1, step); /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... } */ - scalar_mode elem_mode = GET_MODE_INNER (mode); if (!rtx_equal_p (base0, const0_rtx)) base0 = force_reg (elem_mode, base0); @@ -3395,6 +3398,63 @@ shuffle_extract_and_slide1up_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_series (struct expand_vec_perm_d *d) +{ + if (!d->one_vector_p || d->perm.encoding ().npatterns () != 1) +return false; + + poly_int64 el1 = d->perm[0]; + poly_int64 el2 = d->perm[1]; + poly_int64 el3 = d->perm[2]; + + poly_int64 step1 = el2 - el1; + poly_int64 step2 = el3 - el2; + + bool need_insert = false; + bool have_series = false; + + /* Check for a full series. */ + if (known_ne (step1, 0) && d->perm.series_p (0, 1, el1, step1)) +have_series = true; + + /* Check for a series starting at the second element. */ + else if (known_ne (step2, 0) && d->perm.series_p (1, 1, el2, step2)) +{ + have_series = true; + need_insert = true; +} + + if (!have_series) +return false; + + /* Get a vector int-mode to be used for the permute selector. 
*/ + machine_mode sel_mode = related_int_vector_mode (d->vmode).require (); + insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode); + + /* We need to be able to insert an element and shift the vector. */ + if (need_insert && icode == CODE_FOR_nothing) +return false; + + /* Success! */ + if (d->testing_p) +return true; + + /* Create the series. */ + machine_mode eltmode = Pmode; + rtx series = gen_reg_rtx (sel_mode); + expand_vec_series (series, gen_int_mode (need_insert ? el2 : el1, eltmode), +gen_int_mode (need_insert ? step2 : step1, eltmode)); + + /* Insert the remaining element if necessary. */ + if (need_insert) +emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode))); + + emit_vlmax_gather_insn (d->target, d->op0, series); + + return true; +} + /* Recognize the pattern that can be shuffled by generic approach. */ static bool @@ -3475,6 +3535,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d) return true; if (shuffle_extract_and_slide1up_patterns (d)) return true; + if (shuffle_series (d)) + return true; if (shuffle_generic_patterns (d)) return true; return false; -- 2.43.0
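As a standalone illustration of the two shapes shuffle_series recognizes (plain integers instead of poly_int64/vec_perm_indices; not GCC code):

#include <cstddef>
#include <cstdio>
#include <vector>

enum series_kind { NOT_A_SERIES, FULL_SERIES, SERIES_FROM_SECOND };

static series_kind
classify (const std::vector<long> &perm)
{
  if (perm.size () < 3)
    return NOT_A_SERIES;

  long step1 = perm[1] - perm[0];
  long step2 = perm[2] - perm[1];

  auto is_series = [&] (std::size_t start, long base, long step)
  {
    if (step == 0)
      return false;
    for (std::size_t i = start; i < perm.size (); i++)
      if (perm[i] != base + (long) (i - start) * step)
	return false;
    return true;
  };

  if (is_series (0, perm[0], step1))
    return FULL_SERIES;            /* { base0, base0 + step, ... } */
  if (is_series (1, perm[1], step2))
    return SERIES_FROM_SECOND;     /* { base0, base1, base1 + step, ... } */
  return NOT_A_SERIES;
}

int main ()
{
  /* Reversed 4-element permutation: a full series with step -1.  */
  printf ("%d\n", classify ({3, 2, 1, 0}));   /* FULL_SERIES */
  /* First element off-series: needs the vec_shl_insert path.  */
  printf ("%d\n", classify ({0, 7, 5, 3}));   /* SERIES_FROM_SECOND */
  return 0;
}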
Re: [PATCH] RISC-V: Add vectorized strcmp.
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Thanks, which config? For me everything under builtin passes on rv64gcv and rv32gcv: PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Regards Robin
Re: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.
It's more reasonable to fix it in vec_perm_const instead of fix it in middle-end. LGTM. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-12-09 21:18 To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai CC: rdapp.gcc Subject: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const. Hi, we currently try to recognize various forms of stepped (const_vector) sequence variants in expand_const_vector. Because of complications with canonicalization and encoding it is easier to identify such patterns in expand_vec_perm_const_1 already where perm.series_p () is available. This patch introduces shuffle_series as new permutation pattern and tries to recognize series like [base0 base1 base1 + step ...]. If such a series is found the series is expanded by expand_vec_series and a gather is emitted. On top the patch fixes the step recognition in expand_const_vector for stepped series where such a series would end up before. This fixes several execution failures when running code compiled for a scalable vector size of 128 on a target with vlen = 256 or higher. The problem was only noticed there because the encoding for a reversed [2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }. Some testcases that failed were: vect-alias-check-18.c vect-alias-check-1.F90 pr64365.c On a 128-bit target, only the first two elements are used. The third element causing the complications only comes into effect at vlen = 256. With this patch the testsuite results are similar with vlen = 128 and vlen = 256 (when built with -march=rv64gcv_zvl128b). Regards Robin gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Fix step calculation. (modulo_sel_indices): Also perform modulo for variable-length constants. (shuffle_series): Recognize series permutations. (expand_vec_perm_const_1): Add shuffle_series. --- gcc/config/riscv/riscv-v.cc | 66 +++-- 1 file changed, 64 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9b99d0aca84..fd6ef0660a2 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1378,12 +1378,15 @@ expand_const_vector (rtx target, rtx src) { base0, base1, base1 + step, base1 + step * 2, ... } */ rtx base0 = builder.elt (0); rtx base1 = builder.elt (1); - rtx step = builder.elt (2); + rtx base2 = builder.elt (2); + + scalar_mode elem_mode = GET_MODE_INNER (mode); + rtx step = simplify_binary_operation (MINUS, elem_mode, base2, base1); + /* Step 1 - { base1, base1 + step, base1 + step * 2, ... } */ rtx tmp = gen_reg_rtx (mode); expand_vec_series (tmp, base1, step); /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... } */ - scalar_mode elem_mode = GET_MODE_INNER (mode); if (!rtx_equal_p (base0, const0_rtx)) base0 = force_reg (elem_mode, base0); @@ -3395,6 +3398,63 @@ shuffle_extract_and_slide1up_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_series (struct expand_vec_perm_d *d) +{ + if (!d->one_vector_p || d->perm.encoding ().npatterns () != 1) +return false; + + poly_int64 el1 = d->perm[0]; + poly_int64 el2 = d->perm[1]; + poly_int64 el3 = d->perm[2]; + + poly_int64 step1 = el2 - el1; + poly_int64 step2 = el3 - el2; + + bool need_insert = false; + bool have_series = false; + + /* Check for a full series. */ + if (known_ne (step1, 0) && d->perm.series_p (0, 1, el1, step1)) +have_series = true; + + /* Check for a series starting at the second element. 
*/ + else if (known_ne (step2, 0) && d->perm.series_p (1, 1, el2, step2)) +{ + have_series = true; + need_insert = true; +} + + if (!have_series) +return false; + + /* Get a vector int-mode to be used for the permute selector. */ + machine_mode sel_mode = related_int_vector_mode (d->vmode).require (); + insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode); + + /* We need to be able to insert an element and shift the vector. */ + if (need_insert && icode == CODE_FOR_nothing) +return false; + + /* Success! */ + if (d->testing_p) +return true; + + /* Create the series. */ + machine_mode eltmode = Pmode; + rtx series = gen_reg_rtx (sel_mode); + expand_vec_series (series, gen_int_mode (need_insert ? el2 : el1, eltmode), + gen_int_mode (need_insert ? step2 : step1, eltmode)); + + /* Insert the remaining element if necessary. */ + if (need_insert) +emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode))); + + emit_vlmax_gather_insn (d->target, d->op0, series); + + return true; +} + /* Recognize the pattern that can be shuffled by generic approach. */ static bool @@ -3475,6 +3535,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d) return true; if (shuffle_extract_and_slide1up_patterns (d)) return true; + if (shuffle_series (d)) + return true; if (shuffle_generic_patterns (d)) return true;
Re: Re: [PATCH] RISC-V: Add vectorized strcmp.
rv64gcv juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-12-09 21:51 To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law CC: rdapp.gcc Subject: Re: [PATCH] RISC-V: Add vectorized strcmp. > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Thanks, which config? For me everything under builtin passes on rv64gcv and rv32gcv: PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Regards Robin
[committed] libstdc++: Fix resolution of LWG 4016 for std::ranges::to [PR112876]
Tested x86_64-linux. Pushed to trunk. -- >8 -- What I implemented in r14-6199-g45630fbcf7875b does not match what I proposed for LWG 4016, and it imposes additional, unwanted requirements on the emplace and insert member functions of the container being populated. libstdc++-v3/ChangeLog: PR libstdc++/112876 * include/std/ranges (ranges::to): Do not try to use an iterator returned by the container's emplace or insert member functions. * testsuite/std/ranges/conv/1.cc (Cont4::emplace, Cont4::insert): Use the iterator parameter. Do not return an iterator. --- libstdc++-v3/include/std/ranges | 10 +++--- libstdc++-v3/testsuite/std/ranges/conv/1.cc | 12 ++-- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index fb9df3d3e79..be8475c0cb1 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -9300,14 +9300,10 @@ namespace __detail __c.emplace_back(*__it); else if constexpr (requires { __c.push_back(*__it); }) __c.push_back(*__it); + else if constexpr (requires { __c.emplace(__c.end(), *__it); }) + __c.emplace(__c.end(), *__it); else - { - auto __end = __c.end(); - if constexpr (requires { __c.emplace(__end, *__it); }) - __end = __c.emplace(__end, *__it); - else - __end = __c.insert(__end, *__it); - } + __c.insert(__c.end(), *__it); ++__it; } return __c; diff --git a/libstdc++-v3/testsuite/std/ranges/conv/1.cc b/libstdc++-v3/testsuite/std/ranges/conv/1.cc index b5f861dedb3..6d6a708ab64 100644 --- a/libstdc++-v3/testsuite/std/ranges/conv/1.cc +++ b/libstdc++-v3/testsuite/std/ranges/conv/1.cc @@ -236,19 +236,19 @@ struct Cont4 template requires (Kind <= Emplace) && requires(C& c, T&& t) { c.emplace(c.end(), std::forward(t)); } -typename C::iterator -emplace(typename C::iterator, T&& t) +void +emplace(typename C::iterator pos, T&& t) { kind = Emplace; - return c.emplace(c.end(), std::forward(t)); + c.emplace(pos, std::forward(t)); } template -typename C::iterator -insert(typename C::iterator, T&& t) +void +insert(typename C::iterator pos, T&& t) { kind = Insert; - return c.insert(c.end(), std::forward(t)); + c.insert(pos, std::forward(t)); } // Required to satisfy reservable-container -- 2.43.0
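For readers unfamiliar with the facility being fixed, typical std::ranges::to usage looks like this (C++23; the example does not exercise the void-returning insert/emplace corner case addressed above, it only shows the API in context):

#include <ranges>
#include <set>
#include <vector>

int main()
{
  auto odd = [](int i) { return i % 2 != 0; };

  // Pipeline form: materialize a filtered view into a concrete container.
  std::vector<int> v = std::views::iota(1, 10)
                         | std::views::filter(odd)
                         | std::ranges::to<std::vector>();

  // Function form with an explicit container type.
  auto s = std::ranges::to<std::set<int>>(v);

  return (v.size() == 5 && s.size() == 5) ? 0 : 1;
}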
Re: [PATCH] RISC-V: Add vectorized strcmp.
> rv64gcv With -minline-strcmp I assume? Regards Robin
[committed] libstdc++: Fix value of __cpp_lib_format macro [PR111826]
Tested x86_64-linux. Pushed to trunk. I'll check, but I think it should be backported to gcc-13 too. -- >8 -- As noted in the PR, we support both features required for the 202110L value, so we should define it with that value. libstdc++-v3/ChangeLog: PR libstdc++/111826 * include/bits/version.def (format): Update value. * include/bits/version.h: Regenerate. * testsuite/std/format/functions/format.cc: Adjust expected value. --- libstdc++-v3/include/bits/version.def | 4 +- libstdc++-v3/include/bits/version.h | 128 +- .../testsuite/std/format/functions/format.cc | 4 +- 3 files changed, 67 insertions(+), 69 deletions(-) diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 140777832ed..38b73ec9b5d 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1160,14 +1160,12 @@ ftms = { // TODO: #define __cpp_lib_format_ranges 202207L name = format; values = { -v = 202106; +v = 202110; cxxmin = 20; hosted = yes; }; }; -// #undef __glibcxx_chrono -// #define __glibcxx_chrono 201907L // FIXME: #define __glibcxx_execution 201902L ftms = { diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc b/libstdc++-v3/testsuite/std/format/functions/format.cc index dacc276e03c..9328dec8875 100644 --- a/libstdc++-v3/testsuite/std/format/functions/format.cc +++ b/libstdc++-v3/testsuite/std/format/functions/format.cc @@ -5,7 +5,7 @@ #ifndef __cpp_lib_format # error "Feature test macro for std::format is missing in " -#elif __cpp_lib_format < 202106L +#elif __cpp_lib_format < 202110L # error "Feature test macro for std::format has wrong value in " #endif @@ -13,7 +13,7 @@ #include #ifndef __cpp_lib_format # error "Feature test macro for std::format is missing in " -#elif __cpp_lib_format < 202106L +#elif __cpp_lib_format < 202110L # error "Feature test macro for std::format has wrong value in " #endif -- 2.43.0
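A minimal sketch of what the adjusted check verifies, outside the testsuite harness (requires a compiler/library combination that implements std::format at the 202110L level):

#include <version>
#include <format>
#include <string>

#if !defined(__cpp_lib_format) || __cpp_lib_format < 202110L
# error "std::format not advertised at the expected level"
#endif

int main()
{
  // Format strings are checked at compile time since the 202106L features.
  std::string s = std::format("{:>6}", 42);
  return s == "    42" ? 0 : 1;
}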
Re: Re: [PATCH] RISC-V: Add vectorized strcmp.
I didn't use any special configuration: --with-arch=rv64gcv_zvl256b --with-abi=lp64d --test --jobs=64 --with-sim=qemu --enable-gcc-checking=yes,assert,extra,rtlflag,rtl,gimple juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-12-09 22:07 To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law CC: rdapp.gcc Subject: Re: [PATCH] RISC-V: Add vectorized strcmp. > rv64gcv With -minline-strcmp I assume? Regards Robin
[committed] testsuite: Add testcase for already fixed PR [PR112924]
Hi! This testcase got fixed with r14-6132-g50f2a3370d177f8fe9bea0461feb710523e048a2 . I'm just adding a testcase so that it doesn't reappear. Tested on x86_64-linux, with -m32/-m64, current trunk as well as r14-6131 where it ICEd with -m32, committed to trunk as obvious. 2023-12-09 Jakub Jelinek PR tree-optimization/112924 * gcc.dg/pr112924.c: New test. --- gcc/testsuite/gcc.dg/pr112924.c.jj 2023-12-09 15:28:00.848388123 +0100 +++ gcc/testsuite/gcc.dg/pr112924.c 2023-12-09 15:27:34.401752018 +0100 @@ -0,0 +1,26 @@ +/* PR tree-optimization/112924 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -w" } */ +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */ + +struct S { long a; char b[64]; }; +void foo (struct S a); +char c; +int d[3541]; + +static void +bar (struct S *s, char *p) +{ + unsigned int a = sizeof (d) - sizeof (int) - s->a; + long c = __builtin_object_size (s, 0); + for (; a >= 64; a -= 64, p += 4); +__builtin___memcpy_chk (s, p, a, c); +} + +void +baz (void) +{ + struct S s = {}; + bar (&s, &c); + foo (s); +} Jakub
RE: [PATCH v6] libgfortran: Replace mutex with rwlock
On 2023/12/8 18:19, Jakub Jelinek wrote: > On Fri, Aug 18, 2023 at 11:18:19AM +0800, Zhu, Lipeng wrote: > > From: Lipeng Zhu > > > > This patch try to introduce the rwlock and split the read/write to > > unit_root tree and unit_cache with rwlock instead of the mutex to > > increase CPU efficiency. In the get_gfc_unit function, the percentage > > to step into the insert_unit function is around 30%, in most > > instances, we can get the unit in the phase of reading the unit_cache > > or unit_root tree. So split the read/write phase by rwlock would be an > > approach to make it more parallel. > > > > BTW, the IPC metrics can gain around 9x in our test server with 220 > > cores. The benchmark we used is https://github.com/rwesson/NEAT > > > > libgcc/ChangeLog: > > > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro > > (__gthrw): New function > > (__gthread_rwlock_rdlock): New function > > (__gthread_rwlock_tryrdlock): New function > > (__gthread_rwlock_wrlock): New function > > (__gthread_rwlock_trywrlock): New function > > (__gthread_rwlock_unlock): New function > > > > libgfortran/ChangeLog: > > > > * io/async.c (DEBUG_LINE): New > > * io/async.h (RWLOCK_DEBUG_ADD): New macro > > (CHECK_RDLOCK): New macro > > (CHECK_WRLOCK): New macro > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro > > (IN_RWLOCK_DEBUG_QUEUE): New macro > > (RDLOCK): New macro > > (WRLOCK): New macro > > (RWUNLOCK): New macro > > (RD_TO_WRLOCK): New macro > > (INTERN_RDLOCK): New macro > > (INTERN_WRLOCK): New macro > > (INTERN_RWUNLOCK): New macro > > * io/io.h (internal_proto): Define unit_rwlock > > * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock > > (st_write_done_worker): Relace unit_lock with unit_rwlock > > * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock > > (if): Relace unit_lock with unit_rwlock > > (close_unit_1): Relace unit_lock with unit_rwlock > > (close_units): Relace unit_lock with unit_rwlock > > (newunit_alloc): Relace unit_lock with unit_rwlock > > * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock > > The changeLog entries are all incorrect. > 1) they should be indented by a tab, not 4 spaces, and should end with >a dot > 2) when several consecutive descriptions have the same text, especially >when it is long, it should use Likewise. for the 2nd and following > 3) (internal_proto) is certainly not what you've changed, from what I can >see in io.h you've done: > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > a comment. > (unit_lock): Remove including associated internal_proto. > (unit_rwlock): New declarations including associated internal_proto. > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > unit_lock. >(if) is certainly not what you've changed either, always find what >function or macro the change was in, or if you remove something, state >it, if you add something, state it. > 4) all the >Replace unit_lock with unit_rwlock. descriptions only partially match >reality, you've also changed the operations on those variables. > Hi Jakub, Thanks for your help, very appreciated! I just updated the patch according to your comments. A new version [PATCH V7] is sent out for your review which update the change log and formatting code according to your following comments. 
Lipeng Zhu > > --- a/libgfortran/io/async.h > > +++ b/libgfortran/io/async.h > > @@ -207,9 +207,132 @@ > > INTERN_LOCK (&debug_queue_lock); > \ > > MUTEX_DEBUG_ADD (mutex); > \ > > INTERN_UNLOCK (&debug_queue_lock); > \ > > -DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %- > 30s %78p\n", aio_prefix, #mutex, mutex); \ > > +DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %- > 30s %78p\n", aio_prefix, #mutex, \ > > +mutex); \ > > Why are you changing this at all? > > > +#define RD_TO_WRLOCK(rwlock) \ > > + RWUNLOCK (rwlock);\ > > At least a space before \ (or better tab > > > +#define RD_TO_WRLOCK(rwlock) \ > > + RWUNLOCK (rwlock);\ > > Likewise. > > > + WRLOCK (rwlock); > > +#endif > > +#endif > > + > > +#ifndef __GTHREAD_RWLOCK_INIT > > +#define RDLOCK(rwlock) LOCK (rwlock) > > +#define WRLOCK(rwlock) LOCK (rwlock) > > +#define RWUNLOCK(rwlock) UNLOCK (rwlock) #define > RD_TO_WRLOCK(rwlock) > > +{} > > do {} while (0) > instead of {} > ? > > > #endif > > > > #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex); > > > > #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, > mutex); > > > > +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, > > +rwlock); #define INTERN_WRLOCK(rwlock) T_ERROR > > +(__gthread_rwlock_wrlock, rwlock); #define INTERN_RWUNLOCK(rwlock) > > +T_ERROR (__gthread_rwlock_unlock, rwlock); > > Admit
Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables
On Sat, Dec 09, 2023 at 12:19:10PM +0100, Thomas Schwinge wrote: > > --- a/gcc/omp-builtins.def > > +++ b/gcc/omp-builtins.def > > @@ -467,6 +467,9 @@ DEF_GOMP_BUILTIN > > (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER, > > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC, > > "GOMP_alloc", BT_FN_PTR_SIZE_SIZE_PTRMODE, > > ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST) > > +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_REALLOC, > > + "omp_realloc", BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE, > > + ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST) > > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE, > > "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST) > > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", > > Should this either be 'BUILT_IN_OMP_REALLOC' ('OMP' instead of 'GOMP'), > or otherwise a 'GOMP_realloc' be added to 'libgomp/allocator.c', like for > 'GOMP_alloc', 'GOMP_free'; 'ialias_call'ing the respective 'omp_[...]' > functions? (Verbatim 'omp_realloc' also mentioned in the comment before > 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'.) There were 3 reasons to add GOMP_alloc (and 1 for GOMP_free): 1) when it was added, omp_aligned_alloc was still not exported from the library because we thought we shouldn't expose 5.1 features until we have 5.0 implemented (then changed mind) 2) unline omp_aligned_alloc, GOMP_alloc issues fatal error on allocation failure 3) the omp_* functions have omp_allocator_handle_t arguments, which is hard to provide for builtins (I think this is the only reason for GOMP_free addition, maybe together with wanting those to be paired) Now, 1) is a non-issue anymore, I don't know what Fortran wants for allocation failures, if it is better to have diagnostics on the libgomp side or if wants to emit it inline. And yes, 3) would be an argument to add GOMP_realloc. Jakub
[PATCH v7] libgfortran: Replace mutex with rwlock
This patch tries to introduce an rwlock and split the read/write accesses to the unit_root tree and unit_cache, using the rwlock instead of the mutex, to increase CPU efficiency. In the get_gfc_unit function, only around 30% of the calls step into the insert_unit function; in most instances we can get the unit in the phase of reading the unit_cache or unit_root tree. So splitting the read and write phases with an rwlock is an approach to make it more parallel. BTW, the IPC metric gains around 9x on our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. --- v1 -> v2: Limit the pthread_rwlock usage in libgcc to when __cplusplus isn't defined. v2 -> v3: Rebase the patch on the trunk branch. v3 -> v4: Update the comments. v4 -> v5: Fix typos and code formatting. v5 -> v6: Add unit tests. v6 -> v7: Update ChangeLog and code formatting. 
Reviewed-by: Hongjiu Lu Reviewed-by: Bernhard Reutner-Fischer Reviewed-by: Thomas Koenig Reviewed-by: Jakub Jelinek Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 +- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 117 +- libgfortran/io/unix.c | 16 +- .../testsuite/libgomp.fortran/rwlock_1.f90| 33 .../testsuite/libgomp.fortran/rwlock_2.f90| 22 +++ .../testsuite/libgomp.fortran/rwlock_3.f90| 18 +++ 10 files changed, 386 insertions(+), 58 deletions(-) create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90 diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUT
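The locking discipline the patch introduces can be illustrated with standard C++ primitives (std::shared_mutex standing in for the gthreads rwlock; this mirrors RDLOCK/WRLOCK/RD_TO_WRLOCK, where "upgrading" is really unlock-then-write-lock, so the lookup has to be redone under the write lock):

#include <map>
#include <shared_mutex>

static std::shared_mutex unit_rwlock;   // stand-in for the gthreads rwlock
static std::map<int, int> units;        // stand-in for unit_cache/unit_root

int *
get_unit_sketch (int n, bool create)
{
  {
    std::shared_lock rd (unit_rwlock);  // RDLOCK: the common, parallel path
    auto it = units.find (n);
    if (it != units.end ())
      return &it->second;
  }                                     // RWUNLOCK

  if (!create)
    return nullptr;

  std::unique_lock wr (unit_rwlock);    // WRLOCK (cf. RD_TO_WRLOCK)
  // Must re-check under the write lock: another thread may have inserted the
  // unit between dropping the read lock and acquiring the write lock.
  return &units.try_emplace (n, 0).first->second;
}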
Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: > This patch try to introduce the rwlock and split the read/write to > unit_root tree and unit_cache with rwlock instead of the mutex to > increase CPU efficiency. In the get_gfc_unit function, the percentage > to step into the insert_unit function is around 30%, in most instances, > we can get the unit in the phase of reading the unit_cache or unit_root > tree. So split the read/write phase by rwlock would be an approach to > make it more parallel. > > BTW, the IPC metrics can gain around 9x in our test > server with 220 cores. The benchmark we used is > https://github.com/rwesson/NEAT > > libgcc/ChangeLog: > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. > (__gthrw): New function. > (__gthread_rwlock_rdlock): New function. > (__gthread_rwlock_tryrdlock): New function. > (__gthread_rwlock_wrlock): New function. > (__gthread_rwlock_trywrlock): New function. > (__gthread_rwlock_unlock): New function. > > libgfortran/ChangeLog: > > * io/async.c (DEBUG_LINE): New macro. > * io/async.h (RWLOCK_DEBUG_ADD): New macro. > (CHECK_RDLOCK): New macro. > (CHECK_WRLOCK): New macro. > (TAIL_RWLOCK_DEBUG_QUEUE): New macro. > (IN_RWLOCK_DEBUG_QUEUE): New macro. > (RDLOCK): New macro. > (WRLOCK): New macro. > (RWUNLOCK): New macro. > (RD_TO_WRLOCK): New macro. > (INTERN_RDLOCK): New macro. > (INTERN_WRLOCK): New macro. > (INTERN_RWUNLOCK): New macro. > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > a comment. > (unit_lock): Remove including associated internal_proto. > (unit_rwlock): New declarations including associated internal_proto. > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > unit_lock. > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on > unit_rwlock instead of LOCK and UNLOCK on unit_lock. > (st_write_done_worker): Likewise. > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' > comment. Use unit_rwlock variable instead of unit_lock variable. > (get_gfc_unit_from_unit_root): New function. > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock > instead of LOCK and UNLOCK on unit_lock. > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of > LOCK and UNLOCK on unit_lock. > (close_units): Likewise. > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on > unit_lock. > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock > instead of LOCK and UNLOCK on unit_lock. > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead > of LOCK and UNLOCK on unit_lock. Ok for trunk, thanks. Jakub
[PATCH] LoongArch: Fix warnings building libgcc
We are excluding loongarch-opts.h from target libraries, but now struct loongarch_target and gcc_options are not declared in the target libraries, causing: In file included from ../.././gcc/options.h:8, from ../.././gcc/tm.h:49, from ../../../gcc/libgcc/fixed-bit.c:48: ../../../gcc/libgcc/../gcc/config/loongarch/loongarch-opts.h:57:41: warning: 'struct gcc_options' declared inside parameter list will not be visible outside of this definition or declaration 57 | struct gcc_options *opts, | ^~~ So exclude the declarations referring to the C++ structs as well. gcc/ChangeLog: * config/loongarch/loongarch-opts.h (la_target): Move into #if for loongarch-def.h. (loongarch_init_target): Likewise. (loongarch_config_target): Likewise. (loongarch_update_gcc_opt_status): Likewise. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch-opts.h | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h index 651c1c18ca8..d091359300a 100644 --- a/gcc/config/loongarch/loongarch-opts.h +++ b/gcc/config/loongarch/loongarch-opts.h @@ -21,22 +21,15 @@ along with GCC; see the file COPYING3. If not see #ifndef LOONGARCH_OPTS_H #define LOONGARCH_OPTS_H -/* This is a C++ header and it shouldn't be used by target libraries. */ +/* The loongarch-def.h file is a C++ header and it shouldn't be used by + target libraries. Exclude it and everything using the C++ structs + (struct loongarch_target and gcc_options) from target libraries. */ #if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) #include "loongarch-def.h" -#endif /* Target configuration */ extern struct loongarch_target la_target; -/* Flag status */ -struct loongarch_flags { -int flt; const char* flt_str; -#define SX_FLAG_TYPE(x) ((x) < 0 ? -(x) : (x)) -int sx[2]; -}; - - /* Initialize loongarch_target from separate option variables. */ void loongarch_init_target (struct loongarch_target *target, @@ -56,7 +49,14 @@ void loongarch_update_gcc_opt_status (struct loongarch_target *target, struct gcc_options *opts, struct gcc_options *opts_set); +#endif +/* Flag status */ +struct loongarch_flags { +int flt; const char* flt_str; +#define SX_FLAG_TYPE(x) ((x) < 0 ? -(x) : (x)) +int sx[2]; +}; /* Macros for common conditional expressions used in loongarch.{c,h,md} */ #define TARGET_CMODEL_NORMAL (la_target.cmodel == CMODEL_NORMAL) -- 2.43.0
[PATCH 0/3] LoongArch: Fix instruction costs
Update LoongArch instruction costs based on the micro-benchmark results on LA464 and LA664. In particular, this allows generating alsl/slli or alsl/slli + add pairs for multiplying by some constants, as on LA464/LA664 a mul instruction is 4x slower than an alsl, slli, or add instruction. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (3): LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own LoongArch: Fix instruction costs [PR112936] LoongArch: Add alslsi3_extend gcc/config/loongarch/loongarch-def.cc | 42 ++- gcc/config/loongarch/loongarch.cc | 22 +- gcc/config/loongarch/loongarch.md | 12 ++ .../loongarch/mul-const-reduction.c | 11 + 4 files changed, 56 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c -- 2.43.0
[PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
Replace the instruction costs in loongarch_rtx_cost_data constructor based on micro-benchmark results on LA464 and LA664. This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl and slli. gcc/ChangeLog: PR target/112936 * config/loongarch/loongarch-def.cc (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update instruction costs per micro-benchmark results. (loongarch_rtx_cost_optimize_size): Set all instruction costs to (COSTS_N_INSNS (1) + 1). * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove special case for multiplication when optimizing for size. Adjust division cost when TARGET_64BIT && !TARGET_DIV32. Account the extra cost when TARGET_CHECK_ZERO_DIV and optimizing for speed. gcc/testsuite/ChangeLog PR target/112936 * gcc.target/loongarch/mul-const-reduction.c: New test. --- gcc/config/loongarch/loongarch-def.cc | 39 ++- gcc/config/loongarch/loongarch.cc | 22 +-- .../loongarch/mul-const-reduction.c | 11 ++ 3 files changed, 43 insertions(+), 29 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index 6217b19268c..4a8885e8343 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -92,15 +92,15 @@ array_tune loongarch_cpu_align = /* Default RTX cost initializer. */ loongarch_rtx_cost_data::loongarch_rtx_cost_data () - : fp_add (COSTS_N_INSNS (1)), -fp_mult_sf (COSTS_N_INSNS (2)), -fp_mult_df (COSTS_N_INSNS (4)), -fp_div_sf (COSTS_N_INSNS (6)), + : fp_add (COSTS_N_INSNS (5)), +fp_mult_sf (COSTS_N_INSNS (5)), +fp_mult_df (COSTS_N_INSNS (5)), +fp_div_sf (COSTS_N_INSNS (8)), fp_div_df (COSTS_N_INSNS (8)), -int_mult_si (COSTS_N_INSNS (1)), -int_mult_di (COSTS_N_INSNS (1)), -int_div_si (COSTS_N_INSNS (4)), -int_div_di (COSTS_N_INSNS (6)), +int_mult_si (COSTS_N_INSNS (4)), +int_mult_di (COSTS_N_INSNS (4)), +int_div_si (COSTS_N_INSNS (5)), +int_div_di (COSTS_N_INSNS (5)), branch_cost (6), memory_latency (4) {} @@ -111,18 +111,21 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data () array_tune loongarch_cpu_rtx_cost_data = array_tune (); -/* RTX costs to use when optimizing for size. */ +/* RTX costs to use when optimizing for size. + We use a value slightly larger than COSTS_N_INSNS (1) for all of them + because they are slower than simple instructions. */ +#define COST_COMPLEX_INSN (COSTS_N_INSNS (1) + 1) const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size = loongarch_rtx_cost_data () -.fp_add_ (4) -.fp_mult_sf_ (4) -.fp_mult_df_ (4) -.fp_div_sf_ (4) -.fp_div_df_ (4) -.int_mult_si_ (4) -.int_mult_di_ (4) -.int_div_si_ (4) -.int_div_di_ (4); +.fp_add_ (COST_COMPLEX_INSN) +.fp_mult_sf_ (COST_COMPLEX_INSN) +.fp_mult_df_ (COST_COMPLEX_INSN) +.fp_div_sf_ (COST_COMPLEX_INSN) +.fp_div_df_ (COST_COMPLEX_INSN) +.int_mult_si_ (COST_COMPLEX_INSN) +.int_mult_di_ (COST_COMPLEX_INSN) +.int_div_si_ (COST_COMPLEX_INSN) +.int_div_di_ (COST_COMPLEX_INSN); array_tune loongarch_cpu_issue_rate = array_tune () .set (CPU_NATIVE, 4) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 754aeb8bfb7..f04b5798f39 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3787,8 +3787,6 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = (speed ? 
loongarch_cost->int_mult_si * 3 + 6 : COSTS_N_INSNS (7)); - else if (!speed) - *total = COSTS_N_INSNS (1) + 1; else if (mode == DImode) *total = loongarch_cost->int_mult_di; else @@ -3823,14 +3821,18 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, case UDIV: case UMOD: - if (!speed) - { - *total = COSTS_N_INSNS (loongarch_idiv_insns (mode)); - } - else if (mode == DImode) + if (mode == DImode) *total = loongarch_cost->int_div_di; else - *total = loongarch_cost->int_div_si; + { + *total = loongarch_cost->int_div_si; + if (TARGET_64BIT && !TARGET_DIV32) + *total += COSTS_N_INSNS (2); + } + + if (TARGET_CHECK_ZERO_DIV) + *total += COSTS_N_INSNS (2); + return false; case SIGN_EXTEND: @@ -3862,9 +3864,7 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, && (GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1)) == ZERO_EXTEND { - if (!speed) - *total = COSTS_N_INSNS (1) + 1; - else if (mode == DImode) + if (mode == DImode) *total = loonga
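To make the effect of the new numbers concrete, a small sketch of the cost comparison the middle end ends up doing (local names; COSTS_N_INSNS (N) is N * 4, see patch 1/3):

/* A sketch of the comparison behind the "x * 17" / "x * 68" reduction,
   using the values from the hunk above.  Names are local to this example.  */
#define COSTS_N_INSNS(N) ((N) * 4)

enum { la_int_mult_si = COSTS_N_INSNS (4),    /* one mul.w          */
       la_simple_insn  = COSTS_N_INSNS (1) }; /* alsl.w or slli.w   */

/* Two simple instructions (alsl + slli, cost 8) now beat one multiply
   (cost 16), so suitable constants are expanded without mul.w.  */
static_assert (2 * la_simple_insn < la_int_mult_si, "shift-add wins");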
[PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own
With loongarch-def.cc switched from C to C++, we can include rtl.h for COSTS_N_INSNS, instead of hard coding our own. This is a non-functional change for now, but it will make the code more future-proof in case COSTS_N_INSNS in rtl.h is ever changed. gcc/ChangeLog: * config/loongarch/loongarch-def.cc (rtl.h): Include. (COSTS_N_INSNS): Remove the macro definition. --- gcc/config/loongarch/loongarch-def.cc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index c41804a180e..6217b19268c 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see #include "system.h" #include "coretypes.h" #include "tm.h" +#include "rtl.h" #include "loongarch-def.h" #include "loongarch-str.h" @@ -89,8 +90,6 @@ array_tune loongarch_cpu_align = .set (CPU_LA464, la464_align ()) .set (CPU_LA664, la464_align ()); -#define COSTS_N_INSNS(N) ((N) * 4) - /* Default RTX cost initializer. */ loongarch_rtx_cost_data::loongarch_rtx_cost_data () : fp_add (COSTS_N_INSNS (1)), -- 2.43.0
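For reference, since the change is stated to be non-functional, the definition picked up from rtl.h is the same as the macro deleted here:

/* In gcc/rtl.h (same value the removed local macro used):  */
#define COSTS_N_INSNS(N) ((N) * 4)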
[PATCH 3/3] LoongArch: Add alslsi3_extend
Following the instruction cost fix, we are generating alsl.w $a0, $a0, $a0, 4 instead of li.w $t0, 17 mul.w $a0, $t0 for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't have a sign-extending pattern for alsl.w, causing an extra slli.w instruction to be generated to sign-extend $a0. Add the pattern to remove the redundant extension. gcc/ChangeLog: * config/loongarch/loongarch.md (alslsi3_extend): New define_insn. --- gcc/config/loongarch/loongarch.md | 12 1 file changed, 12 insertions(+) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index afbf201d4d0..7b26d15aa4e 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3" [(set_attr "type" "arith") (set_attr "mode" "<MODE>")]) +(define_insn "alslsi3_extend" + [(set (match_operand:DI 0 "register_operand" "=r") + (sign_extend:DI + (plus:SI + (ashift:SI (match_operand:SI 1 "register_operand" "r") + (match_operand 2 "const_immalsl_operand" "")) + (match_operand:SI 3 "register_operand" "r"))))] + "" + "alsl.w\t%0,%1,%3,%2" + [(set_attr "type" "arith") + (set_attr "mode" "SI")]) + ;; Reverse the order of bytes of operand 1 and store the result in operand 0. -- 2.43.0
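A minimal example of the redundant extension being removed; the assembly in the comments is an illustration of the description above, not captured compiler output:

/* Under the LP64 ABI the int result must be sign-extended in $a0.  */
int
scale (int x)
{
  return x * 17;
  /* Before this patch (illustrative):
         alsl.w  $a0,$a0,$a0,4
         slli.w  $a0,$a0,0        # redundant: re-sign-extend $a0
     With alslsi3_extend the alsl.w alone suffices, since 32-bit
     operations already sign-extend their result on LA64.  */
}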
Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Sorry for the slow review. Stamatis Markianos-Wright writes: > [...] > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > index > 44a04b86cb5806fcf50917826512fd203d42106c..c083f965fa9a40781bc86beb6e63654afd14eac4 > 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md > @@ -6922,23 +6922,24 @@ > ;; Originally expanded by 'predicated_doloop_end'. > ;; In the rare situation where the branch is too far, we do also need to > ;; revert FPSCR.LTPSIZE back to 0x100 after the last iteration. > -(define_insn "*predicated_doloop_end_internal" > +(define_insn "predicated_doloop_end_internal" >[(set (pc) > (if_then_else > -(ge (plus:SI (reg:SI LR_REGNUM) > - (match_operand:SI 0 "const_int_operand" "")) > - (const_int 0)) > - (label_ref (match_operand 1 "" "")) > +(gtu (unspec:SI [(plus:SI (match_operand:SI 0 "s_register_operand" > "=r") > + (const_int ))] > + LETP) > + (const_int )) Is there any need for the unspec? I couldn't see why this wasn't simply: (gtu (match_operand:SI 0 "s_register_operand" "=r") (const_int )) But I agree that using gtu rather than ge is nicer if it's what the instruction does. > diff --git a/gcc/df-core.cc b/gcc/df-core.cc > index > d4812b04a7cb97ea1606082e26e910472da5bcc1..4fcc14bf790d43e792b3c926fe1f80073d908c17 > 100644 > --- a/gcc/df-core.cc > +++ b/gcc/df-core.cc > @@ -1964,6 +1964,21 @@ df_bb_regno_last_def_find (basic_block bb, unsigned > int regno) >return NULL; > } > > +/* Return the one and only def of REGNO within BB. If there is no def or > + there are multiple defs, return NULL. */ > + > +df_ref > +df_bb_regno_only_def_find (basic_block bb, unsigned int regno) > +{ > + df_ref temp = df_bb_regno_first_def_find (bb, regno); > + if (!temp) > +return NULL; > + else if (temp == df_bb_regno_last_def_find (bb, regno)) > +return temp; > + else > +return NULL; > +} > + > /* Finds the reference corresponding to the definition of REG in INSN. > DF is the dataflow object. */ > > diff --git a/gcc/df.h b/gcc/df.h > index > 402657a7076f1bcad24e9c50682e033e57f432f9..98623637f9c839c799222e99df2a7173a770b2ac > 100644 > --- a/gcc/df.h > +++ b/gcc/df.h > @@ -987,6 +987,7 @@ extern void df_check_cfg_clean (void); > #endif > extern df_ref df_bb_regno_first_def_find (basic_block, unsigned int); > extern df_ref df_bb_regno_last_def_find (basic_block, unsigned int); > +extern df_ref df_bb_regno_only_def_find (basic_block, unsigned int); > extern df_ref df_find_def (rtx_insn *, rtx); > extern bool df_reg_defined (rtx_insn *, rtx); > extern df_ref df_find_use (rtx_insn *, rtx); > diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc > index > 4feb0a25ab9331b7124df900f73c9fc6fb3eb10b..d919207505c472c8a54a2c9c982a09061584177b > 100644 > --- a/gcc/loop-doloop.cc > +++ b/gcc/loop-doloop.cc > @@ -85,10 +85,10 @@ doloop_condition_get (rtx_insn *doloop_pat) > forms: > > 1) (parallel [(set (pc) (if_then_else (condition) > - (label_ref (label)) > - (pc))) > - (set (reg) (plus (reg) (const_int -1))) > - (additional clobbers and uses)]) > + (label_ref (label)) > + (pc))) > + (set (reg) (plus (reg) (const_int -1))) > + (additional clobbers and uses)]) > > The branch must be the first entry of the parallel (also required > by jump.cc), and the second entry of the parallel must be a set of > @@ -96,19 +96,34 @@ doloop_condition_get (rtx_insn *doloop_pat) > the loop counter in an if_then_else too. > > 2) (set (reg) (plus (reg) (const_int -1)) > - (set (pc) (if_then_else (reg != 0) > - (label_ref (label)) > - (pc))). 
> + (set (pc) (if_then_else (reg != 0) > + (label_ref (label)) > + (pc))). > > Some targets (ARM) do the comparison before the branch, as in the > following form: > > - 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0))) > - (set (reg) (plus (reg) (const_int -1)))]) > -(set (pc) (if_then_else (cc == NE) > -(label_ref (label)) > -(pc))) */ > - > + 3) (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0)) > +(set (reg) (plus (reg) (const_int -1)))]) > + (set (pc) (if_then_else (cc == NE) > + (label_ref (label)) > + (pc))) > + > + The ARM target also supports a special case of a counter that > decrements > + by `n` and terminating in a GTU condition. In that case, the compare > and > +
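Returning to the new df helper added above, a usage sketch (the surrounding function and names are hypothetical and assume the usual GCC includes; only df_bb_regno_only_def_find and the DF_REF_INSN accessor come from existing code):

/* Hypothetical caller: check that the only def of REGNO in BB is the
   decrement instruction DEC_INSN, i.e. nothing else clobbers the counter.  */
static bool
decrement_defines_counter_p (basic_block bb, rtx_insn *dec_insn,
			     unsigned int regno)
{
  /* Returns NULL both when there is no def and when there are several.  */
  df_ref def = df_bb_regno_only_def_find (bb, regno);
  return def && DF_REF_INSN (def) == dec_insn;
}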
Re: [PATCH] aarch64: Add missing driver-aarch64 dependencies
Andrew Carlotti writes: > Ok for master? > > gcc/ChangeLog: > > * config/aarch64/x-aarch64: Add missing dependencies. > > > diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64 > index > 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..6fd638faaab7cb5bb2309d36d6dea2adf1fb8d32 > 100644 > --- a/gcc/config/aarch64/x-aarch64 > +++ b/gcc/config/aarch64/x-aarch64 > @@ -1,3 +1,7 @@ > driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \ > - $(CONFIG_H) $(SYSTEM_H) > + $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \ > + $(srcdir)/config/aarch64/aarch64-protos.h \ > + $(srcdir)/config/aarch64/aarch64-feature-deps.h \ > + $(srcdir)/config/aarch64/aarch64-cores.def \ > + $(srcdir)/config/aarch64/aarch64-arches.def The .def files are included in TM_H by: TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \ $(srcdir)/config/aarch64/aarch64-tuning-flags.def \ $(srcdir)/config/aarch64/aarch64-option-extensions.def \ $(srcdir)/config/aarch64/aarch64-cores.def \ $(srcdir)/config/aarch64/aarch64-isa-modes.def \ $(srcdir)/config/aarch64/aarch64-arches.def so they aren't strictly needed. If you'd prefer to include the directly-included files anyway (can see the argument in favour of that), then the list ought to include aarch64-option-extensions.def too. OK whichever way you prefer. Thanks, Richard > $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
Re: aarch64: Fix +nocrypto handling
Andrew Carlotti writes: > Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with > checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead. The value of the > AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is > retained because removing it would make processing the data in > option-extensions.def significantly more complex. > > Ok for master? > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (aarch64_get_extension_string_for_isa_flags): Fix generation of > the "+nocrypto" extension. > * config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove. > (TARGET_CRYPTO): Remove. > * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): > Don't use TARGET_CRYPTO. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/options_set_27.c: New test. > * gcc.target/aarch64/options_set_28.c: New test. > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/gcc/common/config/aarch64/aarch64-common.cc > index > 20bc4e1291bba9b73798398fea659f1154afa205..6d12454143cd64ebaafa7f5e6c23869ee0bfa543 > 100644 > --- a/gcc/common/config/aarch64/aarch64-common.cc > +++ b/gcc/common/config/aarch64/aarch64-common.cc > @@ -310,6 +310,7 @@ aarch64_get_extension_string_for_isa_flags > But in order to make the output more readable, it seems better > to add the strings in definition order. */ >aarch64_feature_flags added = 0; > + auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2; >for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; ) > { >auto &opt = all_extensions[i]; > @@ -319,7 +320,7 @@ aarch64_get_extension_string_for_isa_flags >per-feature crypto flags. */ >auto flags = opt.flag_canonical; >if (flags == AARCH64_FL_CRYPTO) > - flags = AARCH64_FL_AES | AARCH64_FL_SHA2; > + flags = flags_crypto; > >if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags) > { > @@ -337,9 +338,27 @@ aarch64_get_extension_string_for_isa_flags >/* Remove the features in current_flags & ~isa_flags. If the feature does > not have an HWCAPs then it shouldn't be taken into account for feature > detection because one way or another we can't tell if it's available > - or not. */ > + or not. > + > + As a special case, emit "+nocrypto" instead of "+noaes+nosha2", in order > + to support assemblers that predate the separate per-feature crypto > flags. > + Only use "+nocrypto" when "simd" is enabled (to avoid redundant feature > + removal), and when "sm4" is not already enabled (to avoid depending > on > + whether "+nocrypto" also disables "sm4") */ > + for (auto &opt : all_extensions) > +if ((opt.flag_canonical == AARCH64_FL_CRYPTO) > + && ((flags_crypto & current_flags & ~isa_flags) == flags_crypto) > + && (current_flags & AARCH64_FL_SIMD) > + && !(current_flags & AARCH64_FL_SM4)) > + { > + current_flags &= ~opt.flags_off; > + outstr += "+no"; > + outstr += opt.name; > + } > + Is it an important part of the patch that we do this ahead of time, rather than in the main loop? Doing it in the main loop feels more natural, and should avoid the need for the SIMD test. If we do use an in-loop test, I assume the test would need to be something like: (opt.flag_canonical & flags_crypto) && (flags_crypto & current_flags & ~isa_flags) == flags_crypto && !(current_flags & AARCH64_FL_SM4) so that the new code is applied when the loop first sees a crypto flag. The set of flags to disable would be: current_flags &= ~feature_deps::get_flags_off (flags_crypto); Otherwise it looks good, thanks.
As a general formatting note, GCC style is not to put individual comparisons in parentheses in && and || combos. Richard >for (auto &opt : all_extensions) > if (opt.native_detect_p > + && (opt.flag_canonical != AARCH64_FL_CRYPTO) > && (opt.flag_canonical & current_flags & ~isa_flags)) >{ > current_flags &= ~opt.flags_off; > diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc > index > ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..4f9ee01d52f3ac42f95edbb030bdb2d09fc36d16 > 100644 > --- a/gcc/config/aarch64/aarch64-c.cc > +++ b/gcc/config/aarch64/aarch64-c.cc > @@ -140,7 +140,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) >aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile); >aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile); > > - aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile); > + aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO", > pfile); >aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile); >aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile); >cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS"); > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index > 1ac298926ce
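For concreteness, the in-loop variant suggested above might look roughly like this inside aarch64_get_extension_string_for_isa_flags (an untested sketch assembled from the review comments, not the committed code):

  for (auto &opt : all_extensions)
    {
      /* First time we see a crypto feature that has to be turned off,
         emit "+nocrypto" once and clear everything it disables.  */
      if ((opt.flag_canonical & flags_crypto)
          && (flags_crypto & current_flags & ~isa_flags) == flags_crypto
          && !(current_flags & AARCH64_FL_SM4))
        {
          current_flags &= ~feature_deps::get_flags_off (flags_crypto);
          outstr += "+nocrypto";
          continue;
        }
      if (opt.native_detect_p
          && (opt.flag_canonical & current_flags & ~isa_flags))
        {
          current_flags &= ~opt.flags_off;
          outstr += "+no";
          outstr += opt.name;
        }
    }

The continue skips the generic "+no<name>" handling for the aes/sha2 entries once "+nocrypto" has been emitted, and the current_flags update keeps the condition from firing a second time.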
Re: aarch64: Fix +nopredres, +nols64 and +nomops
Andrew Carlotti writes: > For native cpu feature detection, certain features have no entry in > /proc/cpuinfo, so have to be assumed to be present whenever the detected > cpu is supposed to support that feature. > > However, the logic for this was mistakenly implemented by excluding > these features from part of aarch64_get_extension_string_for_isa_flags. > This function is also used elsewhere when canonicalising explicit > feature sets, which may require removing features that are normally > implied by the specified architecture version. > > This change reenables generation of +nopredres, +nols64 and +nomops > during canonicalisation, by relocating the misplaced native cpu > detection logic. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (aarch64_get_extension_string_for_isa_flags): Remove filtering > of features without native detection. > * config/aarch64/driver-aarch64.cc (host_detect_local_cpu): > Explicitly add expected features that lack cpuinfo detection. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/options_set_29.c: New test. > > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/gcc/common/config/aarch64/aarch64-common.cc > index > ee2ea7eae105d19ec906ef8d25d3a237fbeac4b4..37e60d6083e290b18b1f4c6274123b0a58de5476 > 100644 > --- a/gcc/common/config/aarch64/aarch64-common.cc > +++ b/gcc/common/config/aarch64/aarch64-common.cc > @@ -357,8 +357,7 @@ aarch64_get_extension_string_for_isa_flags >} > >for (auto &opt : all_extensions) > -if (opt.native_detect_p > - && (opt.flag_canonical != AARCH64_FL_CRYPTO) > +if ((opt.flag_canonical != AARCH64_FL_CRYPTO) > && (opt.flag_canonical & current_flags & ~isa_flags)) >{ > current_flags &= ~opt.flags_off; This is the only use of native_detect_p, so it'd be good to remove the field itself. > diff --git a/gcc/config/aarch64/driver-aarch64.cc > b/gcc/config/aarch64/driver-aarch64.cc > index > 8e318892b10aa2288421fad418844744a2f5a3b4..470c19b650f1ae953918eaeddbf0f768c12a99d9 > 100644 > --- a/gcc/config/aarch64/driver-aarch64.cc > +++ b/gcc/config/aarch64/driver-aarch64.cc > @@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv) >unsigned int n_variants = 0; >bool processed_exts = false; >aarch64_feature_flags extension_flags = 0; > + aarch64_feature_flags unchecked_extension_flags = 0; >aarch64_feature_flags default_flags = 0; >std::string buf; >size_t sep_pos = -1; > @@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv) > /* If the feature contains no HWCAPS string then ignore it for the >auto detection. */ > if (val.empty ()) > - continue; > + { > + unchecked_extension_flags |= aarch64_extensions[i].flag; > + continue; > + } > > bool enabled = true; > > @@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv) >if (tune) > return res; > > + if (!processed_exts) > +goto not_found; Could you explain this part? It seems like more of a parsing change (i.e. being more strict about what we accept). If that's the intention, it probably belongs in: if (n_cores == 0 || n_cores > 2 || (n_cores == 1 && n_variants != 1) || imp == INVALID_IMP) goto not_found; But maybe it should be a separate patch. Looks good otherwise, thanks. Richard > + > + /* Add any features that should be be present, but can't be verified using > + the /proc/cpuinfo "Features" list. 
*/ > + extension_flags |= unchecked_extension_flags & default_flags; > + >{ > std::string extension >= aarch64_get_extension_string_for_isa_flags (extension_flags, > diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_29.c > b/gcc/testsuite/gcc.target/aarch64/options_set_29.c > new file mode 100644 > index > ..01bb73c02e232bdfeca5f16dad3fa2a6484843d5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/options_set_29.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-march=armv9.3-a+nopredres+nols64+nomops" } */ > + > +int main () > +{ > + return 0; > +} > + > +/* { dg-final { scan-assembler-times {\.arch > armv9\.3\-a\+crc\+nopredres\+nols64\+nomops\n} 1 } } */ > + > +/* Checking if enabling default features drops the superfluous bits. */
v2 [C PATCH] Fix regression causing ICE for structs with VLAs [PR 112488]
This is a revised version which fixes a problem with breaking other callers of finish_struct. Please ignore the previous one. Bootstrapped and regression tested on x86_64. Fix regression causing ICE for structs with VLAs [PR 112488] A previous patch that fixed several ICEs related to size expressions of VM types (PR c/70418, ...) caused a regression for structs where a DECL_EXPR is not generated anymore although required. We now call add_decl_expr, introduced by the previous patch, from finish_struct. The function is revised with a new argument to not set the TYPE_NAME for the type to the DECL_EXPR in this specific case. PR c/112488 gcc/c * c-decl.cc (add_decl_expr): Revise. (finish_struct): Create DECL_EXPR. * c-parser.cc (c_parser_struct_or_union_specifier): Call finish_struct with expression for VLA sizes. * c-tree.h (finish_struct): Add argument. gcc/testsuite * gcc.dg/pr112488-1.c: New test. * gcc.dg/pr112488-2.c: New test. * gcc.dg/pr112898.c: New test. * gcc.misc-tests/gcov-pr85350.c: Adapt. --- gcc/c/c-decl.cc | 33 - gcc/c/c-parser.cc | 2 +- gcc/c/c-tree.h | 3 +- gcc/testsuite/gcc.dg/pr112488-1.c | 14 + gcc/testsuite/gcc.dg/pr112488-2.c | 13 gcc/testsuite/gcc.dg/pr112898.c | 9 ++ gcc/testsuite/gcc.misc-tests/gcov-pr85350.c | 2 +- 7 files changed, 65 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr112488-1.c create mode 100644 gcc/testsuite/gcc.dg/pr112488-2.c create mode 100644 gcc/testsuite/gcc.dg/pr112898.c diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc index 92c83e1bf10..039a66fef09 100644 --- a/gcc/c/c-decl.cc +++ b/gcc/c/c-decl.cc @@ -6618,12 +6618,10 @@ smallest_type_quals_location (const location_t *locations, the size evaluation prior to the side effects. We therefore use BIND_EXPRs in TYPENAME contexts too. */ static void -add_decl_expr (location_t loc, enum decl_context decl_context, tree type, - tree *expr) +add_decl_expr (location_t loc, tree type, tree *expr, bool set_name_p) { tree bind = NULL_TREE; - if (decl_context == TYPENAME || decl_context == PARM - || decl_context == FIELD) + if (expr) { bind = build3 (BIND_EXPR, void_type_node, NULL_TREE, NULL_TREE, NULL_TREE); @@ -6636,7 +6634,8 @@ add_decl_expr (location_t loc, enum decl_context decl_context, tree type, pushdecl (decl); DECL_ARTIFICIAL (decl) = 1; add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl)); - TYPE_NAME (type) = decl; + if (set_name_p) +TYPE_NAME (type) = decl; if (bind) { @@ -7635,7 +7634,12 @@ grokdeclarator (const struct c_declarator *declarator, type has a name/declaration of it's own, but special attention is required if the type is anonymous. */ if (!TYPE_NAME (type) && c_type_variably_modified_p (type)) - add_decl_expr (loc, decl_context, type, expr); + { + bool bind_p = decl_context == TYPENAME + || decl_context == FIELD + || decl_context == PARM; + add_decl_expr (loc, type, bind_p ? expr : NULL, true); + } type = c_build_pointer_type (type); @@ -7900,7 +7904,12 @@ grokdeclarator (const struct c_declarator *declarator, /* The pointed-to type may need a decl expr (see above). */ if (!TYPE_NAME (type) && c_type_variably_modified_p (type)) - add_decl_expr (loc, decl_context, type, expr); + { + bool bind_p = decl_context == TYPENAME + || decl_context == FIELD + || decl_context == PARM; + add_decl_expr (loc, type, bind_p ? 
expr : NULL, true); + } type = c_build_pointer_type (type); type_quals = array_ptr_quals; @@ -9257,7 +9266,8 @@ is_flexible_array_member_p (bool is_last_field, tree finish_struct (location_t loc, tree t, tree fieldlist, tree attributes, - class c_struct_parse_info *enclosing_struct_parse_info) + class c_struct_parse_info *enclosing_struct_parse_info, + tree *expr) { tree x; bool toplevel = file_scope == current_scope; @@ -9595,6 +9605,13 @@ finish_struct (location_t loc, tree t, tree fieldlist, tree attributes, finish_incomplete_vars (incomplete_vars, toplevel); + /* Make sure a DECL_EXPR is created for structs with VLA members. + Because we do not know the context, we always pass expr + to force creation of a BIND_EXPR which is required in some + contexts. */ + if (c_type_variably_modified_p (t)) +add_decl_expr (loc, t, expr, false); + if (warn_cxx_compat) warn_cxx_compat_finish_struct (fie
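For context, the kind of code involved looks roughly like this; this is a hypothetical illustration relying on the GNU VLA-in-struct extension, not one of the new pr112488 testcases:

void
f (int n)
{
  /* The struct type is variably modified, so finish_struct must now make
     sure a DECL_EXPR is created for it so the size expression is evaluated.  */
  struct s
  {
    int a[n];
  } v;
  v.a[0] = 0;
}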
Re: [PATCH] phiopt: Fix ICE with large --param l1-cache-line-size= [PR112887]
> On 09.12.2023 at 10:35, Jakub Jelinek wrote: > > Hi! > > This function is never called when param_l1_cache_line_size is 0, > but it uses int and unsigned int variables to hold alignment in > bits, so for large param_l1_cache_line_size it is zero and e.g. > DECL_ALIGN () % param_align_bits can divide by zero. > Looking at the code, the function uses tree_fits_uhwi_p on the trees > before converting them using tree_to_uhwi to int variables, which > looks just wrong, either it would need to punt if it doesn't fit > into those and also check for overflows during the computation, > or use unsigned HOST_WIDE_INT for all of this. That also fixes > the division by zero, as param_l1_cache_line_size maximum is INT_MAX, > that multiplied by 8 will always fit. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok > 2023-12-09 Jakub Jelinek >PR tree-optimization/112887 >* tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of >param_align, param_align_bits, offset1, offset2, size2 and align1 >variables from int or unsigned int to unsigned HOST_WIDE_INT. > >* gcc.dg/pr112887.c: New test. > > --- gcc/tree-ssa-phiopt.cc.jj2023-11-14 10:52:16.195275972 +0100 > +++ gcc/tree-ssa-phiopt.cc2023-12-08 16:25:29.166747347 +0100 > @@ -3757,8 +3757,8 @@ static void > hoist_adjacent_loads (basic_block bb0, basic_block bb1, > basic_block bb2, basic_block bb3) > { > - int param_align = param_l1_cache_line_size; > - unsigned param_align_bits = (unsigned) (param_align * BITS_PER_UNIT); > + unsigned HOST_WIDE_INT param_align = param_l1_cache_line_size; > + unsigned HOST_WIDE_INT param_align_bits = param_align * BITS_PER_UNIT; > gphi_iterator gsi; > > /* Walk the phis in bb3 looking for an opportunity. We are looking > @@ -3770,8 +3770,7 @@ hoist_adjacent_loads (basic_block bb0, b > gimple *def1, *def2; > tree arg1, arg2, ref1, ref2, field1, field2; > tree tree_offset1, tree_offset2, tree_size2, next; > - int offset1, offset2, size2; > - unsigned align1; > + unsigned HOST_WIDE_INT offset1, offset2, size2, align1; > gimple_stmt_iterator gsi2; > basic_block bb_for_def1, bb_for_def2; > > --- gcc/testsuite/gcc.dg/pr112887.c.jj2023-12-08 16:31:30.708697160 +0100 > +++ gcc/testsuite/gcc.dg/pr112887.c2023-12-08 16:27:06.662385487 +0100 > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/112887 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 --param=l1-cache-line-size=0x2000" } */ > + > +void bar (long); > +long c; > +struct S { long a, b; } s; > + > +void > +foo (void) > +{ > + bar (c ? s.a : s.b); > +} > >Jakub >
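The wrap-around being avoided, with an example value (not the one from the PR):

/* With a 32-bit unsigned int the product can wrap to zero, after which
   DECL_ALIGN () % param_align_bits divides by zero; an unsigned
   HOST_WIDE_INT keeps the exact value.  */
unsigned int narrow = (unsigned int) 0x20000000 * 8;            /* wraps to 0 */
unsigned long long wide = (unsigned long long) 0x20000000 * 8;  /* 0x100000000 */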
Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]
On 11/2/23 21:18, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu. I'm not entirely sure if the change I made to have destructors clobber with CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to have broken by doing this and I wasn't able to find anything else that really depended on this distinction other than a warning pass. Otherwise I could experiment with a new clobber kind for destructor calls. It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is expiring at that point as well, which a (pseudo-)destructor does not imply; it's perfectly valid to destroy an object and then create another in the same storage. We probably do want another clobber kind for end of object lifetime. And/or one for beginning of object lifetime. Jason
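The point about storage outliving the object, in a short example (illustrative, not from the patch):

#include <new>

struct S { int i; };

void
reuse_storage ()
{
  S s{1};
  s.~S ();          /* the object's lifetime ends, the storage does not */
  new (&s) S{2};    /* fine: a new object is created in the same storage */
}                   /* the new object is destroyed at scope exit */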
Re: [PATCH] c++: fix noexcept checking for trivial operations [PR96090]
On 11/27/23 06:07, Nathaniel Shead wrote: Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634626.html. I've been made aware since constructing this patch of CWG2820, which has a proposed resolution that would change the result of the testcase 'noexcept(yesthrow_t())' (and similarly for the library builtin), but as it hasn't yet been accepted I think at least ensuring the builtin matches the behaviour of the operator is probably still sensible. OK. On Sun, Oct 29, 2023 at 12:43:28PM +1100, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu. -- >8 -- This patch stops eager folding of trivial operations (construction and assignment) from occurring when checking for noexceptness. This was previously done in PR c++/53025, but only for copy/move construction, and the __is_nothrow_xible builtins did not receive the same treatment when they were added. To handle `is_nothrow_default_constructible`, the patch also ensures that when no parameters are passed we do value initialisation instead of just building the constructor call: in particular, value-initialisation doesn't necessarily actually invoke the constructor for trivial default constructors, and so we need to handle this case as well. PR c++/96090 PR c++/100470 gcc/cp/ChangeLog: * call.cc (build_over_call): Prevent folding of trivial special members when checking for noexcept. * method.cc (constructible_expr): Perform value-initialisation for empty parameter lists. (is_nothrow_xible): Treat as noexcept operator. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept81.C: New test. * g++.dg/ext/is_nothrow_constructible7.C: New test. * g++.dg/ext/is_nothrow_constructible8.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/call.cc| 17 ++--- gcc/cp/method.cc | 19 -- gcc/testsuite/g++.dg/cpp0x/noexcept81.C | 36 +++ .../g++.dg/ext/is_nothrow_constructible7.C| 20 ++ .../g++.dg/ext/is_nothrow_constructible8.C| 63 +++ 5 files changed, 141 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept81.C create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible7.C create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible8.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index c1fb8807d3f..ac02b0633ed 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -10231,15 +10231,16 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) /* Avoid actually calling copy constructors and copy assignment operators, if possible. */ - if (! flag_elide_constructors && !force_elide) + if (!force_elide + && (!flag_elide_constructors + /* It's unsafe to elide the operation when handling +a noexcept-expression, it may evaluate to the wrong +value (c++/53025, c++/96090). */ + || cp_noexcept_operand != 0)) /* Do things the hard way. */; - else if (cand->num_convs == 1 - && (DECL_COPY_CONSTRUCTOR_P (fn) - || DECL_MOVE_CONSTRUCTOR_P (fn)) - /* It's unsafe to elide the constructor when handling - a noexcept-expression, it may evaluate to the wrong - value (c++/53025). 
*/ - && (force_elide || cp_noexcept_operand == 0)) + else if (cand->num_convs == 1 + && (DECL_COPY_CONSTRUCTOR_P (fn) + || DECL_MOVE_CONSTRUCTOR_P (fn))) { tree targ; tree arg = argarray[num_artificial_parms_for (fn)]; diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc index a70dd5d6adc..3c978e2369d 100644 --- a/gcc/cp/method.cc +++ b/gcc/cp/method.cc @@ -2091,6 +2091,7 @@ constructible_expr (tree to, tree from) { tree expr; cp_unevaluated cp_uneval_guard; + const int len = TREE_VEC_LENGTH (from); if (CLASS_TYPE_P (to)) { tree ctype = to; @@ -2098,11 +2099,16 @@ constructible_expr (tree to, tree from) if (!TYPE_REF_P (to)) to = cp_build_reference_type (to, /*rval*/false); tree ob = build_stub_object (to); - vec_alloc (args, TREE_VEC_LENGTH (from)); - for (tree arg : tree_vec_range (from)) - args->quick_push (build_stub_object (arg)); - expr = build_special_member_call (ob, complete_ctor_identifier, &args, - ctype, LOOKUP_NORMAL, tf_none); + if (len == 0) + expr = build_value_init (ctype, tf_none); + else + { + vec_alloc (args, TREE_VEC_LENGTH (from)); + for (tree arg : tree_vec_range (from)) + args->quick_push (build_stub_object (arg)); + expr = build_special_member_call (ob, complete_ctor_identifier, &args, + ctype, LOOKUP_NORMAL, tf_none); + } if (expr == error_mark_node) return error_mark_node;
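A small illustration of the invariant the patch is after, namely that the builtin and the operator agree even for trivial operations (illustrative, not one of the new tests; assumes the __is_nothrow_constructible builtin trait is available):

template <class T> T &&declval () noexcept;   /* unevaluated contexts only */

struct S { int i; };                          /* trivial copy constructor */

static_assert (__is_nothrow_constructible (S, S &)
               == noexcept (S (declval<S &> ())),
               "builtin and operator should agree");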
Re: [PATCH] c++/modules: alias CTAD and specializations table
On 11/24/23 13:09, Patrick Palka wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? OK. -- >8 -- A rewritten guide for alias CTAD isn't really a specialization of the original guide, so we shouldn't register it as such. This avoids an ICE in the below modules testcase which otherwise tries to inspect the rewritten guide's empty DECL_CONTEXT. It also preemptively avoids an ICE in modules/concept-6 in C++23 mode with the inherited CTAD patch. * pt.cc (alias_ctad_tweaks): Pass use_spec_table=false to tsubst_decl. gcc/testsuite/ChangeLog: * g++.dg/modules/concept-8.h: New test. * g++.dg/modules/concept-8_a.H: New test. * g++.dg/modules/concept-8_b.C: New test. --- gcc/cp/pt.cc | 3 ++- gcc/testsuite/g++.dg/modules/concept-8.h | 14 ++ gcc/testsuite/g++.dg/modules/concept-8_a.H | 5 + gcc/testsuite/g++.dg/modules/concept-8_b.C | 8 4 files changed, 29 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/modules/concept-8.h create mode 100644 gcc/testsuite/g++.dg/modules/concept-8_a.H create mode 100644 gcc/testsuite/g++.dg/modules/concept-8_b.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 4f93150c5d7..2cfe1da5e07 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -30015,7 +30015,8 @@ alias_ctad_tweaks (tree tmpl, tree uguides) /* Parms are to have DECL_CHAIN tsubsted, which would be skipped if cp_unevaluated_operand. */ cp_evaluated ev; - g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain); + g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain, + /*use_spec_table=*/false); } if (g == error_mark_node) continue; diff --git a/gcc/testsuite/g++.dg/modules/concept-8.h b/gcc/testsuite/g++.dg/modules/concept-8.h new file mode 100644 index 000..a25f9b752fd --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/concept-8.h @@ -0,0 +1,14 @@ +// A version of concept-6.h using an alias template + alias CTAD + +template +struct Base +{ + Base (const _Callable &) +requires true + {} +}; + +template requires true +using Derived = Base<_Callable>; + +inline Derived all = [] (auto&& __r) {}; diff --git a/gcc/testsuite/g++.dg/modules/concept-8_a.H b/gcc/testsuite/g++.dg/modules/concept-8_a.H new file mode 100644 index 000..da0467781c1 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/concept-8_a.H @@ -0,0 +1,5 @@ +// { dg-require-effective-target c++20 } +// { dg-additional-options "-fmodule-header -fconcepts" } +// { dg-module-cmi {} } + +#include "concept-8.h" diff --git a/gcc/testsuite/g++.dg/modules/concept-8_b.C b/gcc/testsuite/g++.dg/modules/concept-8_b.C new file mode 100644 index 000..9a9f014ee09 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/concept-8_b.C @@ -0,0 +1,8 @@ +// { dg-require-effective-target c++20 } +// { dg-additional-options "-fmodules-ts -fconcepts -fdump-lang-module-alias -fno-module-lazy" } + +#include "concept-8.h" +import "concept-8_a.H"; + +// { dg-final { scan-lang-dump-times {named merge key \(matched\) function_decl:'::Base<::._anon_0>::__ct '} 2 module } } +// { dg-final { scan-lang-dump-not {merge key \(new\)} module } }
[PATCH] Add some new DW_IDX_* constants
I've reimplemented the .debug_names code in GDB -- it was quite far from being correct, and the new implementation is much closer to what is specified by DWARF. However, the new writer in GDB needs to emit some symbol properties, so that the reader can be fully functional. This patch adds a few new DW_IDX_* constants, and tries to document the existing extensions as well. (My patch series add more documentation of these to the GDB manual as well.) --- include/dwarf2.def | 9 + 1 file changed, 9 insertions(+) diff --git a/include/dwarf2.def b/include/dwarf2.def index 7ab3ee611fd4..75b75d901884 100644 --- a/include/dwarf2.def +++ b/include/dwarf2.def @@ -802,8 +802,17 @@ DW_IDX (DW_IDX_parent, 4) DW_IDX (DW_IDX_type_hash, 5) DW_IDX_DUP (DW_IDX_lo_user, 0x2000) DW_IDX (DW_IDX_hi_user, 0x3fff) +/* Internal linkage. A flag. */ DW_IDX (DW_IDX_GNU_internal, 0x2000) +/* External linkage. A flag. Note that gdb no longer generates this; + the default is to assume external linkage. */ DW_IDX (DW_IDX_GNU_external, 0x2001) +/* This entry is the program's entry point. A flag. */ +DW_IDX (DW_IDX_GNU_main, 0x2002) +/* Language for this entry. A DW_LANG_* value. */ +DW_IDX (DW_IDX_GNU_language, 0x2003) +/* This entry is a linkage name. A flag. */ +DW_IDX (DW_IDX_GNU_linkage_name, 0x2004) DW_END_IDX /* DWARF5 Unit type header encodings */ -- 2.43.0
[PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed long. This means that when the `arm_neon.h' header is used by the kernel, any use of the `uint64_t' / `in64_t' types needs to be correctly cast to the correct `__builtin_aarch64_simd_di' / `__builtin_aarch64_simd_df' types when calling the relevant ACLE builtins. This patch adds the necessary fixes to ensure that `vstl1_*' and `vldap1_*' intrinsics are correctly defined for use by the kernel. gcc/ChangeLog: * config/aarch64/arm_neon.h (vldap1_lane_u64): Add `const' to `__builtin_aarch64_simd_di *' cast. (vldap1q_lane_u64): Likewise. (vldap1_lane_s64): Cast __src to `const __builtin_aarch64_simd_di *'. (vldap1q_lane_s64): Likewise. (vldap1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'. (vldap1q_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'. (vldap1_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast. (vldap1q_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast. (vstl1_lane_u64): remove stray `const'. (vstl1_lane_s64): Cast __src to `__builtin_aarch64_simd_di *'. (vstl1q_lane_s64): Likewise. (vstl1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'. (vstl1q_lane_f64): Likewise. --- gcc/config/aarch64/arm_neon.h | 34 +- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ef0d75e07ce..f394de595f7 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -13456,7 +13456,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev1di_usus ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline uint64x2_t @@ -13464,35 +13464,39 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev2di_usus ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline int64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane) { - return __builtin_aarch64_vec_ldap1_lanev1di (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev1di ( + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline int64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane) { - return __builtin_aarch64_vec_ldap1_lanev2di (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev2di ( + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline float64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane) { - return __builtin_aarch64_vec_ldap1_lanev1df (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev1df ( + (const __builtin_aarch64_simd_df *) __src, __vec, __lane); } __extension__ extern __inline float64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane) { - return 
__builtin_aarch64_vec_ldap1_lanev2df (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev2df ( + (const __builtin_aarch64_simd_df *) __src, __vec, __lane); } __extension__ extern __inline poly64x1_t @@ -13500,7 +13504,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_p64 (const poly64_t *__src, poly64x1_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev1di_psps ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline poly64x2_t @@ -13508,14 +13512,14 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_p64 (const poly64_t *__src, poly64x2_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev2di_psps ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } /* vstl1_lane. */ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -vstl1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane) +vstl1_lane_u64 (
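A minimal usage sketch; the intrinsic signatures come from the header changes above, while the march string needed to enable them is an assumption (something like -march=armv8.2-a+rcpc3):

#include <arm_neon.h>

/* When uint64_t is 'unsigned long long' (as with the kernel's own types),
   the missing casts inside arm_neon.h made calls like these trip over
   -Wincompatible-pointer-types; with the patch they compile cleanly.  */
uint64x1_t
load_acquire_lane (const uint64_t *p, uint64x1_t v)
{
  return vldap1_lane_u64 (p, v, 0);
}

void
store_release_lane (uint64_t *p, uint64x1_t v)
{
  vstl1_lane_u64 (p, v, 0);
}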
RE: [PATCH v7] libgfortran: Replace mutex with rwlock
On 2023/12/9 23:23, Jakub Jelinek wrote: > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: > > This patch tries to introduce the rwlock and split the read/write to > > unit_root tree and unit_cache with rwlock instead of the mutex to > > increase CPU efficiency. In the get_gfc_unit function, the percentage > > of calls that step into the insert_unit function is around 30%; in most > > instances, we can get the unit in the phase of reading the unit_cache > > or unit_root tree. So splitting the read/write phases with an rwlock would be an > > approach to make it more parallel. > > > > BTW, the IPC metric improves by around 9x on our test server with 220 > > cores. The benchmark we used is https://github.com/rwesson/NEAT > > > > libgcc/ChangeLog: > > > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. > > (__gthrw): New function. > > (__gthread_rwlock_rdlock): New function. > > (__gthread_rwlock_tryrdlock): New function. > > (__gthread_rwlock_wrlock): New function. > > (__gthread_rwlock_trywrlock): New function. > > (__gthread_rwlock_unlock): New function. > > > > libgfortran/ChangeLog: > > > > * io/async.c (DEBUG_LINE): New macro. > > * io/async.h (RWLOCK_DEBUG_ADD): New macro. > > (CHECK_RDLOCK): New macro. > > (CHECK_WRLOCK): New macro. > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro. > > (IN_RWLOCK_DEBUG_QUEUE): New macro. > > (RDLOCK): New macro. > > (WRLOCK): New macro. > > (RWUNLOCK): New macro. > > (RD_TO_WRLOCK): New macro. > > (INTERN_RDLOCK): New macro. > > (INTERN_WRLOCK): New macro. > > (INTERN_RWUNLOCK): New macro. > > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > > a comment. > > (unit_lock): Remove including associated internal_proto. > > (unit_rwlock): New declarations including associated internal_proto. > > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > > unit_lock. > > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK > on > > unit_rwlock instead of LOCK and UNLOCK on unit_lock. > > (st_write_done_worker): Likewise. > > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' > > comment. Use unit_rwlock variable instead of unit_lock variable. > > (get_gfc_unit_from_unit_root): New function. > > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock > > instead of LOCK and UNLOCK on unit_lock. > > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead > of > > LOCK and UNLOCK on unit_lock. > > (close_units): Likewise. > > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on > > unit_lock. > > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock > > instead of LOCK and UNLOCK on unit_lock. > > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead > > of LOCK and UNLOCK on unit_lock. > > Ok for trunk, thanks. > > Jakub Thanks! Looking forward to landing to trunk. Lipeng Zhu
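For readers unfamiliar with the pattern, a rough sketch of the read-mostly scheme described above, using plain POSIX rwlocks instead of the libgfortran RDLOCK/WRLOCK macros (tree_lookup is a hypothetical stand-in for the unit_cache/unit_root search):

#include <pthread.h>

struct gfc_unit;                            /* opaque here */
extern struct gfc_unit *tree_lookup (int);  /* hypothetical helpers */
extern struct gfc_unit *insert_unit (int);

static pthread_rwlock_t unit_rwlock = PTHREAD_RWLOCK_INITIALIZER;

struct gfc_unit *
get_gfc_unit_sketch (int n)
{
  /* Common case (~70%): the unit already exists, so a shared read lock is
     enough and concurrent readers do not serialize against each other.  */
  pthread_rwlock_rdlock (&unit_rwlock);
  struct gfc_unit *u = tree_lookup (n);
  pthread_rwlock_unlock (&unit_rwlock);
  if (u)
    return u;

  /* Miss: take the write lock and re-check before inserting, since another
     thread may have inserted the unit in the meantime.  */
  pthread_rwlock_wrlock (&unit_rwlock);
  u = tree_lookup (n);
  if (!u)
    u = insert_unit (n);
  pthread_rwlock_unlock (&unit_rwlock);
  return u;
}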