[Committed] RISC-V: Fix VLS mode movmisalign bug
PR112932 made me notice a bug in the current VLS mode misalign pattern. Adapt it to match the VLA mode handling. Committed as an obvious fix. PR target/112932 gcc/ChangeLog: * config/riscv/vector.md (movmisalign): Fix VLSmode bugs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto. * gcc.target/riscv/rvv/autovec/pr112932.c: New test. --- gcc/config/riscv/vector.md| 23 +-- .../gcc.target/riscv/rvv/autovec/pr112932.c | 66 +++ .../riscv/rvv/autovec/vls/misalign-1.c| 6 +- 3 files changed, 70 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 31c13a6dcca..a1284fd3251 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -1334,31 +1334,12 @@ [(set_attr "type" "vmov") (set_attr "mode" "")]) -(define_expand "movmisalign" - [(set (match_operand:VLS 0 "nonimmediate_operand") - (match_operand:VLS 1 "general_operand"))] - "TARGET_VECTOR" - { -/* To support misalign data movement, we should use - minimum element alignment load/store. */ -unsigned int size = GET_MODE_SIZE (GET_MODE_INNER (mode)); -poly_int64 nunits = GET_MODE_NUNITS (mode) * size; -machine_mode mode = riscv_vector::get_vector_mode (QImode, nunits).require (); -operands[0] = gen_lowpart (mode, operands[0]); -operands[1] = gen_lowpart (mode, operands[1]); -if (MEM_P (operands[0]) && !register_operand (operands[1], mode)) - operands[1] = force_reg (mode, operands[1]); -riscv_vector::emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::UNARY_OP, operands); -DONE; - } -) - ;; According to RVV ISA: ;; If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, ;; either the element is transferred successfully or an address misaligned exception is raised on that element. 
(define_expand "movmisalign" - [(set (match_operand:V 0 "nonimmediate_operand") - (match_operand:V 1 "general_operand"))] + [(set (match_operand:V_VLS 0 "nonimmediate_operand") + (match_operand:V_VLS 1 "general_operand"))] "TARGET_VECTOR && TARGET_VECTOR_MISALIGN_SUPPORTED" { emit_move_insn (operands[0], operands[1]); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c new file mode 100644 index 000..4ae6ec02817 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112932.c @@ -0,0 +1,66 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ +/* { dg-require-effective-target riscv_v } */ + +#include +int a, j, n, b, c, o, d, g, h; +int e[8]; +long f[8][6]; +void l() { + o = -27; + for (; o; o++) { +*e = 1; +if (a >= n) { + d = 0; + for (; d <= 7; d++) +e[d] = c; +} + } + j = 0; + for (; j < 8; j++) { +g = 0; +for (; g < 2; g++) { + h = 1; + for (; h < 3; h++) +f[j][g * 2 + h] = 1; +} + } + unsigned long *m = &f[1][1]; + *m = 0; +} +int main() { + l(); + assert (f[0][1] == 1); + assert (f[0][2] == 1); + assert (f[0][3] == 1); + assert (f[0][4] == 1); + assert (f[1][1] == 0); + assert (f[1][2] == 1); + assert (f[1][3] == 1); + assert (f[1][4] == 1); + assert (f[2][1] == 1); + assert (f[2][2] == 1); + assert (f[2][3] == 1); + assert (f[2][4] == 1); + assert (f[3][1] == 1); + assert (f[3][2] == 1); + assert (f[3][3] == 1); + assert (f[3][4] == 1); + assert (f[4][1] == 1); + assert (f[4][2] == 1); + assert (f[4][3] == 1); + assert (f[4][4] == 1); + assert (f[5][1] == 1); + assert (f[5][2] == 1); + assert (f[5][3] == 1); + assert (f[5][4] == 1); + assert (f[6][1] == 1); + assert (f[6][2] == 1); + assert (f[6][3] == 1); + assert (f[6][4] == 1); + assert (f[7][1] == 1); + assert (f[7][2] == 1); + assert (f[7][3] == 1); + assert (f[7][4] == 1); +} + diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c index b602ffd69bb..6e08f77921a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/misalign-1.c @@ -21,7 +21,5 @@ foo () abort (); } -/* { dg-final { scan-assembler-times {vle8\.v} 1 } } */ -/* { dg-final { scan-assembler-times {vle8\.v} 1 } } */ -/* { dg-final { scan-assembler-not {vle16\.v} } } */ -/* { dg-final { scan-assembler-not {vle16\.v} } } */ +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */ + -- 2.36.3
[PATCH] phiopt: Fix ICE with large --param l1-cache-line-size= [PR112887]
Hi! This function is never called when param_l1_cache_line_size is 0, but it uses int and unsigned int variables to hold alignment in bits, so for large param_l1_cache_line_size it is zero and e.g. DECL_ALIGN () % param_align_bits can divide by zero. Looking at the code, the function uses tree_fits_uhwi_p on the trees before converting them using tree_to_uhwi to int variables, which looks just wrong, either it would need to punt if it doesn't fit into those and also check for overflows during the computation, or use unsigned HOST_WIDE_INT for all of this. That also fixes the division by zero, as param_l1_cache_line_size maximum is INT_MAX, that multiplied by 8 will always fit. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2023-12-09 Jakub Jelinek PR tree-optimization/112887 * tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of param_align, param_align_bits, offset1, offset2, size2 and align1 variables from int or unsigned int to unsigned HOST_WIDE_INT. * gcc.dg/pr112887.c: New test. --- gcc/tree-ssa-phiopt.cc.jj 2023-11-14 10:52:16.195275972 +0100 +++ gcc/tree-ssa-phiopt.cc 2023-12-08 16:25:29.166747347 +0100 @@ -3757,8 +3757,8 @@ static void hoist_adjacent_loads (basic_block bb0, basic_block bb1, basic_block bb2, basic_block bb3) { - int param_align = param_l1_cache_line_size; - unsigned param_align_bits = (unsigned) (param_align * BITS_PER_UNIT); + unsigned HOST_WIDE_INT param_align = param_l1_cache_line_size; + unsigned HOST_WIDE_INT param_align_bits = param_align * BITS_PER_UNIT; gphi_iterator gsi; /* Walk the phis in bb3 looking for an opportunity. We are looking @@ -3770,8 +3770,7 @@ hoist_adjacent_loads (basic_block bb0, b gimple *def1, *def2; tree arg1, arg2, ref1, ref2, field1, field2; tree tree_offset1, tree_offset2, tree_size2, next; - int offset1, offset2, size2; - unsigned align1; + unsigned HOST_WIDE_INT offset1, offset2, size2, align1; gimple_stmt_iterator gsi2; basic_block bb_for_def1, bb_for_def2; --- gcc/testsuite/gcc.dg/pr112887.c.jj 2023-12-08 16:31:30.708697160 +0100 +++ gcc/testsuite/gcc.dg/pr112887.c 2023-12-08 16:27:06.662385487 +0100 @@ -0,0 +1,13 @@ +/* PR tree-optimization/112887 */ +/* { dg-do compile } */ +/* { dg-options "-O2 --param=l1-cache-line-size=0x2000" } */ + +void bar (long); +long c; +struct S { long a, b; } s; + +void +foo (void) +{ + bar (c ? s.a : s.b); +} Jakub
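For illustration, here is a minimal standalone sketch (not GCC code; BITS_PER_UNIT assumed to be 8, and the parameter value chosen only to demonstrate the wraparound described above) of why 32-bit unsigned arithmetic can turn the alignment-in-bits value into zero, while a 64-bit unsigned HOST_WIDE_INT cannot:

#include <cstdint>
#include <cstdio>

int main ()
{
  const int64_t line_size = 0x20000000;  /* a large --param l1-cache-line-size= value */
  const unsigned bits_per_unit = 8;      /* BITS_PER_UNIT on typical targets */

  /* Old code kept the product in 32 bits: 0x20000000 * 8 == 2^32 wraps to 0,
     so a later "x % param_align_bits" divides by zero.  */
  unsigned int align_bits_32 = (unsigned int) line_size * bits_per_unit;

  /* New code uses unsigned HOST_WIDE_INT (64-bit): INT_MAX * 8 always fits.  */
  uint64_t align_bits_64 = (uint64_t) line_size * bits_per_unit;

  printf ("32-bit: %u, 64-bit: %llu\n", align_bits_32,
          (unsigned long long) align_bits_64);
  return 0;
}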
Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
Tamar Christina writes: > Hi All, > > What do people think about having the ability to force only the latch > connected > exit as the exit as a param? I.e. what's in the patch but as a param. > > I found this useful when debugging large example failures as it tells me where > I should be looking. No hard requirement but just figured I'd ask if we > should. If it's useful for that, then perhaps it would be worth making it a DEBUG_COUNTER instead of a --param, for easy bisection. Thanks, Richard > > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit. > > --- inline copy of patch -- > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index > 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe > 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop) >if (exits.length () == 1) > return exits[0]; > > +#if 0 >/* If we have multiple exits we only support counting IV at the moment. > Analyze > all exits and return one */ >class tree_niter_desc niter_desc; > @@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop) > } > >return candidate; > +#else > + basic_block bb = ip_normal_pos (loop); > + if (!bb) > +return NULL; > + > + edge exit = EDGE_SUCC (bb, 0); > + if (exit->dest == loop->latch) > +return EDGE_SUCC (bb, 1); > + return exit; > +#endif > } > > /* Function bb_in_loop_p
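If this goes the DEBUG_COUNTER route Richard suggests, a rough sketch could look like the following. The counter name is hypothetical and this is not part of the posted patch; the idea is to keep the multi-exit analysis as the default and let -fdbg-cnt bisect which loops fall back to the latch-connected exit that the #if 0 hack above forces unconditionally:

/* gcc/dbg-cnt.def (hypothetical counter name):  */
DEBUG_COUNTER (vect_force_single_exit)

/* gcc/tree-vect-loop.cc, in vec_init_loop_exit_info, with dbg-cnt.h included:  */
  if (exits.length () == 1)
    return exits[0];

  if (dbg_cnt (vect_force_single_exit))
    {
      /* ... existing code: analyze all exits and return the candidate ...  */
    }

  /* Counter exhausted: force the latch-connected exit for debugging.  */
  basic_block bb = ip_normal_pos (loop);
  if (!bb)
    return NULL;
  edge exit = EDGE_SUCC (bb, 0);
  if (exit->dest == loop->latch)
    return EDGE_SUCC (bb, 1);
  return exit;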
Re: [PATCH] Add support for function attributes and variable attributes
Added it. Le jeu. 7 déc. 2023 à 18:13, Antoni Boucher a écrit : > > It seems like you forgot to prefix the commit message with "libgccjit: > ". > > On Thu, 2023-11-30 at 10:55 +0100, Guillaume Gomez wrote: > > Ping David. :) > > > > Le jeu. 23 nov. 2023 à 22:59, Antoni Boucher a > > écrit : > > > David: I found back the comment you made. Here it is: > > > > > >I see you have patches to add function and variable attributes; > > > I > > >wonder if this would be cleaner internally if there was a > > >recording::attribute class, rather than the std::pair currently > > > in > > >use > > >(some attributes have int arguments rather than string, others > > > have > > >multiple args). > > > > > >I also wondered if a "gcc_jit_attribute" type could be exposed > > > to > > >the > > >user, e.g.: > > > > > > attr1 = gcc_jit_context_new_attribute (ctxt, "noreturn"); > > > attr2 = gcc_jit_context_new_attribute_with_string (ctxt, > > > "alias", > > >"__foo"); > > > gcc_jit_function_add_attribute (ctxt, attr1); > > > gcc_jit_function_add_attribute (ctxt, attr2); > > > > > >or somesuch? But I think the API you currently have is OK. > > > > > > On Thu, 2023-11-23 at 22:52 +0100, Guillaume Gomez wrote: > > > > Ping David. :) > > > > > > > > Le mer. 15 nov. 2023 à 17:56, Antoni Boucher a > > > > écrit : > > > > > > > > > > David: another thing I remember you mentioned when you reviewed > > > > > an > > > > > earlier version of this patch is the usage of `std::pair`. > > > > > I can't find where you said that, but I remember you mentioned > > > > > that > > > > > we > > > > > should use a struct instead. > > > > > Can you please elaborate again? > > > > > Thanks. > > > > > > > > > > On Wed, 2023-11-15 at 17:53 +0100, Guillaume Gomez wrote: > > > > > > Hi, > > > > > > > > > > > > This patch adds the (incomplete) support for function and > > > > > > variable > > > > > > attributes. The added attributes are the ones we're using in > > > > > > rustc_codegen_gcc but all the groundwork is done to add more > > > > > > (and > > > > > > we > > > > > > will very likely add more as we didn't add all the ones we > > > > > > use in > > > > > > rustc_codegen_gcc yet). > > > > > > > > > > > > The only big question with this patch is about `inline`. We > > > > > > currently > > > > > > handle it as an attribute because it is more convenient for > > > > > > us > > > > > > but is > > > > > > it ok or should we create a separate function to mark a > > > > > > function > > > > > > as > > > > > > inlined? > > > > > > > > > > > > Thanks in advance for the review. > > > > > > > > > From df75f0eb8aacba249b6e791603752e35778951a4 Mon Sep 17 00:00:00 2001 From: Guillaume Gomez Date: Mon, 20 Jun 2022 14:34:39 -0400 Subject: [PATCH] libgccjit: Add support for function attributes and variable attributes. gcc/jit/ChangeLog: * dummy-frontend.cc (handle_alias_attribute): New function. (handle_always_inline_attribute): New function. (handle_cold_attribute): New function. (handle_fnspec_attribute): New function. (handle_format_arg_attribute): New function. (handle_format_attribute): New function. (handle_noinline_attribute): New function. (handle_target_attribute): New function. (handle_used_attribute): New function. (handle_visibility_attribute): New function. (handle_weak_attribute): New function. (handle_alias_ifunc_attribute): New function. * jit-playback.cc (fn_attribute_to_string): New function. (variable_attribute_to_string): New function. (global_new_decl): Add attributes support. (set_variable_attribute): New function. 
(new_global): Add attributes support. (new_global_initialized): Add attributes support. (new_local): Add attributes support. * jit-playback.h (fn_attribute_to_string): New function. (set_variable_attribute): New function. * jit-recording.cc (recording::lvalue::add_attribute): New function. (recording::function::function): New function. (recording::function::write_to_dump): Add attributes support. (recording::function::add_attribute): New function. (recording::function::add_string_attribute): New function. (recording::function::add_integer_array_attribute): New function. (recording::global::replay_into): Add attributes support. (recording::local::replay_into): Add attributes support. * libgccjit.cc (gcc_jit_function_add_attribute): New function. (gcc_jit_function_add_string_attribute): New function. (gcc_jit_function_add_integer_array_attribute): New function. (gcc_jit_lvalue_add_attribute): New function. * libgccjit.h (enum gcc_jit_fn_attribute): New enum. (gcc_jit_function_add_attribute): New function. (gcc_jit_function_add_string_attribute): New function. (gcc_jit_function_add_integer_array_attribute): New function. (enum gcc_jit_variable_attribute): New function. (gcc_jit_lvalue_add_string_attribute): New function. * libgccjit.map: Declare new functions. gcc/testsuite/ChangeLog: * jit.dg/jit.exp: Add `jit-verify-assembler-
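For context, client code for the entry points named in the ChangeLog would presumably look something like the sketch below. The exact signatures and enumerator spellings (GCC_JIT_FN_ATTRIBUTE_NORETURN and friends) are guesses based on the ChangeLog and on David's earlier review comments quoted above, not taken from the patch itself:

/* Hypothetical usage sketch only -- see the caveat above.  */
#include <libgccjit.h>

static void
add_attrs (gcc_jit_function *fn, gcc_jit_lvalue *var)
{
  /* Plain function attribute, e.g. __attribute__((noreturn)).  */
  gcc_jit_function_add_attribute (fn, GCC_JIT_FN_ATTRIBUTE_NORETURN);

  /* Function attribute with a string argument, e.g. alias ("__foo").  */
  gcc_jit_function_add_string_attribute (fn, GCC_JIT_FN_ATTRIBUTE_ALIAS, "__foo");

  /* Variable attribute with a string argument, e.g. visibility ("hidden").  */
  gcc_jit_lvalue_add_string_attribute (var, GCC_JIT_VARIABLE_ATTRIBUTE_VISIBILITY,
                                       "hidden");
}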
Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables
Hi Tobias! On 2023-11-08T17:58:10+0100, Tobias Burnus wrote: > OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables Nice work! > This commit adds -fopenmp-allocators which enables support for > 'omp allocators' and 'omp allocate' that are associated with a Fortran > allocate-stmt. If such a construct is encountered, an error is shown, > unless the -fopenmp-allocators flag is present. > > With -fopenmp -fopenmp-allocators, those constructs get turned into > GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp) > ensures deallocation and reallocation (via intrinsic assignments) are > properly directed to GOMP_free/omp_realloc - while normal Fortran > allocations are processed by free/realloc. > > In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the > version field of the Fortran array discriptor is (mis)used: 0 indicates > the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars, > there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the > pointer address to a splay_tree while GOMP_is_alloc(ptr) will return > true it was previously added but also removes it from the list. > > Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of > omp-builtins.def and libgomp gains the mentioned two new function. Minor comments: > --- a/gcc/fortran/trans-openmp.cc > +++ b/gcc/fortran/trans-openmp.cc > +/* Add ptr for tracking as being allocated by GOMP_alloc. */ > + > +tree > +gfc_omp_call_add_alloc (tree ptr) > +{ > + static tree fn = NULL_TREE; > + if (fn == NULL_TREE) > +{ > + fn = build_function_type_list (void_type_node, ptr_type_node, > NULL_TREE); > + fn = build_fn_decl ("GOMP_add_alloc", fn); > +/* FIXME: attributes. */ > +} > + return build_call_expr_loc (input_location, fn, 1, ptr); > +} > + > +/* Generated function returns true when it was tracked via GOMP_add_alloc and > + removes it from the tracking. As called just before GOMP_free or > omp_realloc > + the pointer is or might become invalid, thus, it is always removed. */ > + > +tree > +gfc_omp_call_is_alloc (tree ptr) > +{ > + static tree fn = NULL_TREE; > + if (fn == NULL_TREE) > +{ > + fn = build_function_type_list (boolean_type_node, ptr_type_node, > + NULL_TREE); > + fn = build_fn_decl ("GOMP_is_alloc", fn); > +/* FIXME: attributes. */ > +} > + return build_call_expr_loc (input_location, fn, 1, ptr); > +} Why not define 'GOMP_add_alloc', 'GOMP_is_alloc' via 'gcc/omp-builtins.def'? > --- a/gcc/omp-builtins.def > +++ b/gcc/omp-builtins.def > @@ -467,6 +467,9 @@ DEF_GOMP_BUILTIN > (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER, > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC, > "GOMP_alloc", BT_FN_PTR_SIZE_SIZE_PTRMODE, > ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST) > +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_REALLOC, > + "omp_realloc", BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE, > + ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST) > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE, > "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST) > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", Should this either be 'BUILT_IN_OMP_REALLOC' ('OMP' instead of 'GOMP'), or otherwise a 'GOMP_realloc' be added to 'libgomp/allocator.c', like for 'GOMP_alloc', 'GOMP_free'; 'ialias_call'ing the respective 'omp_[...]' functions? (Verbatim 'omp_realloc' also mentioned in the comment before 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'.) > --- a/libgomp/allocator.c > +++ b/libgomp/allocator.c > +/* Add pointer as being alloced by GOMP_alloc. 
*/ > +void > +GOMP_add_alloc (void *ptr) > +{ > + [...] > +} > + > +/* Remove pointer, either called by FREE or by REALLOC, > + either of them can change the allocation status. */ > +bool > +GOMP_is_alloc (void *ptr) > +{ > + [...] > +} > --- a/libgomp/libgomp.map > +++ b/libgomp/libgomp.map > +GOMP_5.1.2 { > + global: > + GOMP_add_alloc; > + GOMP_is_alloc; > + [...] > +} GOMP_5.1.1; 'GOMP_add_alloc', 'GOMP_is_alloc' should get prototyped in 'libgomp/libgomp_g.h'. Regards, Thomas
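To restate the tracking contract described in the patch (GOMP_add_alloc records that a scalar was allocated by GOMP_alloc; GOMP_is_alloc answers whether it was, and always forgets the pointer because it is about to be freed or reallocated), here is a minimal illustration. It deliberately uses a std::set instead of libgomp's splay tree and ignores locking:

#include <set>

static std::set<void *> tracked;   /* stand-in for libgomp's splay tree */

void
add_alloc_sketch (void *ptr)       /* cf. GOMP_add_alloc */
{
  tracked.insert (ptr);
}

bool
is_alloc_sketch (void *ptr)        /* cf. GOMP_is_alloc */
{
  /* Called just before GOMP_free/omp_realloc, so the old address is always
     dropped; returns whether it had been recorded.  */
  return tracked.erase (ptr) != 0;
}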
Re: [PATCH] Reimplement __gnu_cxx::__ops operators
On 07/12/2023 14:41, Jonathan Wakely wrote: On Wed, 6 Dec 2023 at 20:55, François Dumont wrote: I think I still got no feedback about this cleanup proposal. Can you remind me why we have all those different functions in predefined_ops.h in the first place? I think it was to avoid having two versions of every algorithm, one that does *l < *r and one that does pred(*l, *r), right? Yes, that was the purpose. One property of the current code is that _Iter_less_iter will compare exactly *lhs < *rhs and so works even with this type, where its operator< only accepts non-const arguments: struct X { bool operator<(X&); }; Doesn't your simplification break that, because the _Less function only accepts const references now? Indeed, I thought more about the problem of const qualification on the operator(). This is why _Comp_val is mimicking what _Not_fn does. To be honest I even thought that this kind of operator above was more a user-code issue than a real use case we need to handle. But it looks like you, or rather the C++ standard, require us to support it. I'll rework it then, thanks for the proposal below and in your other email.
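To spell out the constraint Jonathan describes with a standalone example (not libstdc++ code): comparing the elements directly, as _Iter_less_iter does today, still accepts a type whose operator< takes a non-const reference, whereas a comparator written against const references cannot call it.

struct X { bool operator<(X&) { return true; } };   // non-const everything

// What the current _Iter_less_iter effectively does: *lhs < *rhs.
template<typename It>
bool less_direct(It lhs, It rhs) { return *lhs < *rhs; }   // fine for X

// A comparator taking const references would reject X if instantiated,
// because `const X&` cannot bind to the `X&` parameter of X::operator<.
struct LessConst
{
  template<typename T>
  bool operator()(const T& l, const T& r) const { return l < r; }
};

int main()
{
  X a, b;
  X* p = &a;
  X* q = &b;
  bool ok = less_direct(p, q);   // compiles and works
  // LessConst{}(a, b);          // would not compile for X
  return ok ? 0 : 1;
}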
[PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.
Hi, we currently try to recognize various forms of stepped (const_vector) sequence variants in expand_const_vector. Because of complications with canonicalization and encoding it is easier to identify such patterns in expand_vec_perm_const_1 already where perm.series_p () is available. This patch introduces shuffle_series as new permutation pattern and tries to recognize series like [base0 base1 base1 + step ...]. If such a series is found the series is expanded by expand_vec_series and a gather is emitted. On top the patch fixes the step recognition in expand_const_vector for stepped series where such a series would end up before. This fixes several execution failures when running code compiled for a scalable vector size of 128 on a target with vlen = 256 or higher. The problem was only noticed there because the encoding for a reversed [2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }. Some testcases that failed were: vect-alias-check-18.c vect-alias-check-1.F90 pr64365.c On a 128-bit target, only the first two elements are used. The third element causing the complications only comes into effect at vlen = 256. With this patch the testsuite results are similar with vlen = 128 and vlen = 256 (when built with -march=rv64gcv_zvl128b). Regards Robin gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Fix step calculation. (modulo_sel_indices): Also perform modulo for variable-length constants. (shuffle_series): Recognize series permutations. (expand_vec_perm_const_1): Add shuffle_series. --- gcc/config/riscv/riscv-v.cc | 66 +++-- 1 file changed, 64 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9b99d0aca84..fd6ef0660a2 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1378,12 +1378,15 @@ expand_const_vector (rtx target, rtx src) { base0, base1, base1 + step, base1 + step * 2, ... } */ rtx base0 = builder.elt (0); rtx base1 = builder.elt (1); - rtx step = builder.elt (2); + rtx base2 = builder.elt (2); + + scalar_mode elem_mode = GET_MODE_INNER (mode); + rtx step = simplify_binary_operation (MINUS, elem_mode, base2, base1); + /* Step 1 - { base1, base1 + step, base1 + step * 2, ... } */ rtx tmp = gen_reg_rtx (mode); expand_vec_series (tmp, base1, step); /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... } */ - scalar_mode elem_mode = GET_MODE_INNER (mode); if (!rtx_equal_p (base0, const0_rtx)) base0 = force_reg (elem_mode, base0); @@ -3395,6 +3398,63 @@ shuffle_extract_and_slide1up_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_series (struct expand_vec_perm_d *d) +{ + if (!d->one_vector_p || d->perm.encoding ().npatterns () != 1) +return false; + + poly_int64 el1 = d->perm[0]; + poly_int64 el2 = d->perm[1]; + poly_int64 el3 = d->perm[2]; + + poly_int64 step1 = el2 - el1; + poly_int64 step2 = el3 - el2; + + bool need_insert = false; + bool have_series = false; + + /* Check for a full series. */ + if (known_ne (step1, 0) && d->perm.series_p (0, 1, el1, step1)) +have_series = true; + + /* Check for a series starting at the second element. */ + else if (known_ne (step2, 0) && d->perm.series_p (1, 1, el2, step2)) +{ + have_series = true; + need_insert = true; +} + + if (!have_series) +return false; + + /* Get a vector int-mode to be used for the permute selector. 
*/ + machine_mode sel_mode = related_int_vector_mode (d->vmode).require (); + insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode); + + /* We need to be able to insert an element and shift the vector. */ + if (need_insert && icode == CODE_FOR_nothing) +return false; + + /* Success! */ + if (d->testing_p) +return true; + + /* Create the series. */ + machine_mode eltmode = Pmode; + rtx series = gen_reg_rtx (sel_mode); + expand_vec_series (series, gen_int_mode (need_insert ? el2 : el1, eltmode), +gen_int_mode (need_insert ? step2 : step1, eltmode)); + + /* Insert the remaining element if necessary. */ + if (need_insert) +emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode))); + + emit_vlmax_gather_insn (d->target, d->op0, series); + + return true; +} + /* Recognize the pattern that can be shuffled by generic approach. */ static bool @@ -3475,6 +3535,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d) return true; if (shuffle_extract_and_slide1up_patterns (d)) return true; + if (shuffle_series (d)) + return true; if (shuffle_generic_patterns (d)) return true; return false; -- 2.43.0
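As a standalone illustration of the two shapes shuffle_series recognizes (plain integers instead of poly_int64/vec_perm_indices; not GCC code):

#include <cstddef>
#include <cstdio>
#include <vector>

enum series_kind { NOT_A_SERIES, FULL_SERIES, SERIES_FROM_SECOND };

static series_kind
classify (const std::vector<long> &perm)
{
  if (perm.size () < 3)
    return NOT_A_SERIES;

  long step1 = perm[1] - perm[0];
  long step2 = perm[2] - perm[1];

  auto is_series = [&] (std::size_t start, long base, long step)
  {
    if (step == 0)
      return false;
    for (std::size_t i = start; i < perm.size (); i++)
      if (perm[i] != base + (long) (i - start) * step)
	return false;
    return true;
  };

  if (is_series (0, perm[0], step1))
    return FULL_SERIES;            /* { base0, base0 + step, ... } */
  if (is_series (1, perm[1], step2))
    return SERIES_FROM_SECOND;     /* { base0, base1, base1 + step, ... } */
  return NOT_A_SERIES;
}

int main ()
{
  /* Reversed 4-element permutation: a full series with step -1.  */
  printf ("%d\n", classify ({3, 2, 1, 0}));   /* FULL_SERIES */
  /* First element off-series: needs the vec_shl_insert path.  */
  printf ("%d\n", classify ({0, 7, 5, 3}));   /* SERIES_FROM_SECOND */
  return 0;
}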
Re: [PATCH] RISC-V: Add vectorized strcmp.
> FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Thanks, which config? For me everything under builtin passes on rv64gcv and rv32gcv: PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Regards Robin
Re: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.
It's more reasonable to fix it in vec_perm_const instead of fix it in middle-end. LGTM. juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-12-09 21:18 To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai CC: rdapp.gcc Subject: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const. Hi, we currently try to recognize various forms of stepped (const_vector) sequence variants in expand_const_vector. Because of complications with canonicalization and encoding it is easier to identify such patterns in expand_vec_perm_const_1 already where perm.series_p () is available. This patch introduces shuffle_series as new permutation pattern and tries to recognize series like [base0 base1 base1 + step ...]. If such a series is found the series is expanded by expand_vec_series and a gather is emitted. On top the patch fixes the step recognition in expand_const_vector for stepped series where such a series would end up before. This fixes several execution failures when running code compiled for a scalable vector size of 128 on a target with vlen = 256 or higher. The problem was only noticed there because the encoding for a reversed [2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }. Some testcases that failed were: vect-alias-check-18.c vect-alias-check-1.F90 pr64365.c On a 128-bit target, only the first two elements are used. The third element causing the complications only comes into effect at vlen = 256. With this patch the testsuite results are similar with vlen = 128 and vlen = 256 (when built with -march=rv64gcv_zvl128b). Regards Robin gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Fix step calculation. (modulo_sel_indices): Also perform modulo for variable-length constants. (shuffle_series): Recognize series permutations. (expand_vec_perm_const_1): Add shuffle_series. --- gcc/config/riscv/riscv-v.cc | 66 +++-- 1 file changed, 64 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 9b99d0aca84..fd6ef0660a2 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1378,12 +1378,15 @@ expand_const_vector (rtx target, rtx src) { base0, base1, base1 + step, base1 + step * 2, ... } */ rtx base0 = builder.elt (0); rtx base1 = builder.elt (1); - rtx step = builder.elt (2); + rtx base2 = builder.elt (2); + + scalar_mode elem_mode = GET_MODE_INNER (mode); + rtx step = simplify_binary_operation (MINUS, elem_mode, base2, base1); + /* Step 1 - { base1, base1 + step, base1 + step * 2, ... } */ rtx tmp = gen_reg_rtx (mode); expand_vec_series (tmp, base1, step); /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... } */ - scalar_mode elem_mode = GET_MODE_INNER (mode); if (!rtx_equal_p (base0, const0_rtx)) base0 = force_reg (elem_mode, base0); @@ -3395,6 +3398,63 @@ shuffle_extract_and_slide1up_patterns (struct expand_vec_perm_d *d) return true; } +static bool +shuffle_series (struct expand_vec_perm_d *d) +{ + if (!d->one_vector_p || d->perm.encoding ().npatterns () != 1) +return false; + + poly_int64 el1 = d->perm[0]; + poly_int64 el2 = d->perm[1]; + poly_int64 el3 = d->perm[2]; + + poly_int64 step1 = el2 - el1; + poly_int64 step2 = el3 - el2; + + bool need_insert = false; + bool have_series = false; + + /* Check for a full series. */ + if (known_ne (step1, 0) && d->perm.series_p (0, 1, el1, step1)) +have_series = true; + + /* Check for a series starting at the second element. 
*/ + else if (known_ne (step2, 0) && d->perm.series_p (1, 1, el2, step2)) +{ + have_series = true; + need_insert = true; +} + + if (!have_series) +return false; + + /* Get a vector int-mode to be used for the permute selector. */ + machine_mode sel_mode = related_int_vector_mode (d->vmode).require (); + insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode); + + /* We need to be able to insert an element and shift the vector. */ + if (need_insert && icode == CODE_FOR_nothing) +return false; + + /* Success! */ + if (d->testing_p) +return true; + + /* Create the series. */ + machine_mode eltmode = Pmode; + rtx series = gen_reg_rtx (sel_mode); + expand_vec_series (series, gen_int_mode (need_insert ? el2 : el1, eltmode), + gen_int_mode (need_insert ? step2 : step1, eltmode)); + + /* Insert the remaining element if necessary. */ + if (need_insert) +emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode))); + + emit_vlmax_gather_insn (d->target, d->op0, series); + + return true; +} + /* Recognize the pattern that can be shuffled by generic approach. */ static bool @@ -3475,6 +3535,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d) return true; if (shuffle_extract_and_slide1up_patterns (d)) return true; + if (shuffle_series (d)) + return true; if (shuffle_generic_patterns (d)) return true;
Re: Re: [PATCH] RISC-V: Add vectorized strcmp.
rv64gcv juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-12-09 21:51 To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law CC: rdapp.gcc Subject: Re: [PATCH] RISC-V: Add vectorized strcmp. > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test > FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Thanks, which config? For me everything under builtin passes on rv64gcv and rv32gcv: PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test Regards Robin
[committed] libstdc++: Fix resolution of LWG 4016 for std::ranges::to [PR112876]
Tested x86_64-linux. Pushed to trunk. -- >8 -- What I implemented in r14-6199-g45630fbcf7875b does not match what I proposed for LWG 4016, and it imposes additional, unwanted requirements on the emplace and insert member functions of the container being populated. libstdc++-v3/ChangeLog: PR libstdc++/112876 * include/std/ranges (ranges::to): Do not try to use an iterator returned by the container's emplace or insert member functions. * testsuite/std/ranges/conv/1.cc (Cont4::emplace, Cont4::insert): Use the iterator parameter. Do not return an iterator. --- libstdc++-v3/include/std/ranges | 10 +++--- libstdc++-v3/testsuite/std/ranges/conv/1.cc | 12 ++-- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index fb9df3d3e79..be8475c0cb1 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -9300,14 +9300,10 @@ namespace __detail __c.emplace_back(*__it); else if constexpr (requires { __c.push_back(*__it); }) __c.push_back(*__it); + else if constexpr (requires { __c.emplace(__c.end(), *__it); }) + __c.emplace(__c.end(), *__it); else - { - auto __end = __c.end(); - if constexpr (requires { __c.emplace(__end, *__it); }) - __end = __c.emplace(__end, *__it); - else - __end = __c.insert(__end, *__it); - } + __c.insert(__c.end(), *__it); ++__it; } return __c; diff --git a/libstdc++-v3/testsuite/std/ranges/conv/1.cc b/libstdc++-v3/testsuite/std/ranges/conv/1.cc index b5f861dedb3..6d6a708ab64 100644 --- a/libstdc++-v3/testsuite/std/ranges/conv/1.cc +++ b/libstdc++-v3/testsuite/std/ranges/conv/1.cc @@ -236,19 +236,19 @@ struct Cont4 template requires (Kind <= Emplace) && requires(C& c, T&& t) { c.emplace(c.end(), std::forward(t)); } -typename C::iterator -emplace(typename C::iterator, T&& t) +void +emplace(typename C::iterator pos, T&& t) { kind = Emplace; - return c.emplace(c.end(), std::forward(t)); + c.emplace(pos, std::forward(t)); } template -typename C::iterator -insert(typename C::iterator, T&& t) +void +insert(typename C::iterator pos, T&& t) { kind = Insert; - return c.insert(c.end(), std::forward(t)); + c.insert(pos, std::forward(t)); } // Required to satisfy reservable-container -- 2.43.0
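For readers unfamiliar with the facility being fixed, typical std::ranges::to usage looks like this (C++23; the example does not exercise the void-returning insert/emplace corner case addressed above, it only shows the API in context):

#include <ranges>
#include <set>
#include <vector>

int main()
{
  auto odd = [](int i) { return i % 2 != 0; };

  // Pipeline form: materialize a filtered view into a concrete container.
  std::vector<int> v = std::views::iota(1, 10)
                         | std::views::filter(odd)
                         | std::ranges::to<std::vector>();

  // Function form with an explicit container type.
  auto s = std::ranges::to<std::set<int>>(v);

  return (v.size() == 5 && s.size() == 5) ? 0 : 1;
}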
Re: [PATCH] RISC-V: Add vectorized strcmp.
> rv64gcv With -minline-strcmp I assume? Regards Robin
[committed] libstdc++: Fix value of __cpp_lib_format macro [PR111826]
Tested x86_64-linux. Pushed to trunk. I'll check, but I think it should be backported to gcc-13 too. -- >8 -- As noted in the PR, we support both features required for the 202110L value, so we should define it with that value. libstdc++-v3/ChangeLog: PR libstdc++/111826 * include/bits/version.def (format): Update value. * include/bits/version.h: Regenerate. * testsuite/std/format/functions/format.cc: Adjust expected value. --- libstdc++-v3/include/bits/version.def | 4 +- libstdc++-v3/include/bits/version.h | 128 +- .../testsuite/std/format/functions/format.cc | 4 +- 3 files changed, 67 insertions(+), 69 deletions(-) diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 140777832ed..38b73ec9b5d 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1160,14 +1160,12 @@ ftms = { // TODO: #define __cpp_lib_format_ranges 202207L name = format; values = { -v = 202106; +v = 202110; cxxmin = 20; hosted = yes; }; }; -// #undef __glibcxx_chrono -// #define __glibcxx_chrono 201907L // FIXME: #define __glibcxx_execution 201902L ftms = { diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc b/libstdc++-v3/testsuite/std/format/functions/format.cc index dacc276e03c..9328dec8875 100644 --- a/libstdc++-v3/testsuite/std/format/functions/format.cc +++ b/libstdc++-v3/testsuite/std/format/functions/format.cc @@ -5,7 +5,7 @@ #ifndef __cpp_lib_format # error "Feature test macro for std::format is missing in " -#elif __cpp_lib_format < 202106L +#elif __cpp_lib_format < 202110L # error "Feature test macro for std::format has wrong value in " #endif @@ -13,7 +13,7 @@ #include #ifndef __cpp_lib_format # error "Feature test macro for std::format is missing in " -#elif __cpp_lib_format < 202106L +#elif __cpp_lib_format < 202110L # error "Feature test macro for std::format has wrong value in " #endif -- 2.43.0
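A minimal sketch of what the adjusted check verifies, outside the testsuite harness (requires a compiler/library combination that implements std::format at the 202110L level):

#include <version>
#include <format>
#include <string>

#if !defined(__cpp_lib_format) || __cpp_lib_format < 202110L
# error "std::format not advertised at the expected level"
#endif

int main()
{
  // Format strings are checked at compile time since the 202106L features.
  std::string s = std::format("{:>6}", 42);
  return s == "    42" ? 0 : 1;
}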
Re: Re: [PATCH] RISC-V: Add vectorized strcmp.
I didn't use any special configuration: --with-arch=rv64gcv_zvl256b --with-abi=lp64d --test --jobs=64 --with-sim=qemu --enable-gcc-checking=yes,assert,extra,rtlflag,rtl,gimple juzhe.zh...@rivai.ai From: Robin Dapp Date: 2023-12-09 22:07 To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law CC: rdapp.gcc Subject: Re: [PATCH] RISC-V: Add vectorized strcmp. > rv64gcv With -minline-strcmp I assume? Regards Robin
[committed] testsuite: Add testcase for already fixed PR [PR112924]
Hi! This testcase got fixed with r14-6132-g50f2a3370d177f8fe9bea0461feb710523e048a2 . I'm just adding a testcase so that it doesn't reappear. Tested on x86_64-linux, with -m32/-m64, current trunk as well as r14-6131 where it ICEd with -m32, committed to trunk as obvious. 2023-12-09 Jakub Jelinek PR tree-optimization/112924 * gcc.dg/pr112924.c: New test. --- gcc/testsuite/gcc.dg/pr112924.c.jj 2023-12-09 15:28:00.848388123 +0100 +++ gcc/testsuite/gcc.dg/pr112924.c 2023-12-09 15:27:34.401752018 +0100 @@ -0,0 +1,26 @@ +/* PR tree-optimization/112924 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -w" } */ +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */ + +struct S { long a; char b[64]; }; +void foo (struct S a); +char c; +int d[3541]; + +static void +bar (struct S *s, char *p) +{ + unsigned int a = sizeof (d) - sizeof (int) - s->a; + long c = __builtin_object_size (s, 0); + for (; a >= 64; a -= 64, p += 4); +__builtin___memcpy_chk (s, p, a, c); +} + +void +baz (void) +{ + struct S s = {}; + bar (&s, &c); + foo (s); +} Jakub
RE: [PATCH v6] libgfortran: Replace mutex with rwlock
On 2023/12/8 18:19, Jakub Jelinek wrote: > On Fri, Aug 18, 2023 at 11:18:19AM +0800, Zhu, Lipeng wrote: > > From: Lipeng Zhu > > > > This patch try to introduce the rwlock and split the read/write to > > unit_root tree and unit_cache with rwlock instead of the mutex to > > increase CPU efficiency. In the get_gfc_unit function, the percentage > > to step into the insert_unit function is around 30%, in most > > instances, we can get the unit in the phase of reading the unit_cache > > or unit_root tree. So split the read/write phase by rwlock would be an > > approach to make it more parallel. > > > > BTW, the IPC metrics can gain around 9x in our test server with 220 > > cores. The benchmark we used is https://github.com/rwesson/NEAT > > > > libgcc/ChangeLog: > > > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro > > (__gthrw): New function > > (__gthread_rwlock_rdlock): New function > > (__gthread_rwlock_tryrdlock): New function > > (__gthread_rwlock_wrlock): New function > > (__gthread_rwlock_trywrlock): New function > > (__gthread_rwlock_unlock): New function > > > > libgfortran/ChangeLog: > > > > * io/async.c (DEBUG_LINE): New > > * io/async.h (RWLOCK_DEBUG_ADD): New macro > > (CHECK_RDLOCK): New macro > > (CHECK_WRLOCK): New macro > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro > > (IN_RWLOCK_DEBUG_QUEUE): New macro > > (RDLOCK): New macro > > (WRLOCK): New macro > > (RWUNLOCK): New macro > > (RD_TO_WRLOCK): New macro > > (INTERN_RDLOCK): New macro > > (INTERN_WRLOCK): New macro > > (INTERN_RWUNLOCK): New macro > > * io/io.h (internal_proto): Define unit_rwlock > > * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock > > (st_write_done_worker): Relace unit_lock with unit_rwlock > > * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock > > (if): Relace unit_lock with unit_rwlock > > (close_unit_1): Relace unit_lock with unit_rwlock > > (close_units): Relace unit_lock with unit_rwlock > > (newunit_alloc): Relace unit_lock with unit_rwlock > > * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock > > The changeLog entries are all incorrect. > 1) they should be indented by a tab, not 4 spaces, and should end with >a dot > 2) when several consecutive descriptions have the same text, especially >when it is long, it should use Likewise. for the 2nd and following > 3) (internal_proto) is certainly not what you've changed, from what I can >see in io.h you've done: > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > a comment. > (unit_lock): Remove including associated internal_proto. > (unit_rwlock): New declarations including associated internal_proto. > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > unit_lock. >(if) is certainly not what you've changed either, always find what >function or macro the change was in, or if you remove something, state >it, if you add something, state it. > 4) all the >Replace unit_lock with unit_rwlock. descriptions only partially match >reality, you've also changed the operations on those variables. > Hi Jakub, Thanks for your help, very appreciated! I just updated the patch according to your comments. A new version [PATCH V7] is sent out for your review which update the change log and formatting code according to your following comments. 
Lipeng Zhu > > --- a/libgfortran/io/async.h > > +++ b/libgfortran/io/async.h > > @@ -207,9 +207,132 @@ > > INTERN_LOCK (&debug_queue_lock); > \ > > MUTEX_DEBUG_ADD (mutex); > \ > > INTERN_UNLOCK (&debug_queue_lock); > \ > > -DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %- > 30s %78p\n", aio_prefix, #mutex, mutex); \ > > +DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %- > 30s %78p\n", aio_prefix, #mutex, \ > > +mutex); \ > > Why are you changing this at all? > > > +#define RD_TO_WRLOCK(rwlock) \ > > + RWUNLOCK (rwlock);\ > > At least a space before \ (or better tab > > > +#define RD_TO_WRLOCK(rwlock) \ > > + RWUNLOCK (rwlock);\ > > Likewise. > > > + WRLOCK (rwlock); > > +#endif > > +#endif > > + > > +#ifndef __GTHREAD_RWLOCK_INIT > > +#define RDLOCK(rwlock) LOCK (rwlock) > > +#define WRLOCK(rwlock) LOCK (rwlock) > > +#define RWUNLOCK(rwlock) UNLOCK (rwlock) #define > RD_TO_WRLOCK(rwlock) > > +{} > > do {} while (0) > instead of {} > ? > > > #endif > > > > #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex); > > > > #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, > mutex); > > > > +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, > > +rwlock); #define INTERN_WRLOCK(rwlock) T_ERROR > > +(__gthread_rwlock_wrlock, rwlock); #define INTERN_RWUNLOCK(rwlock) > > +T_ERROR (__gthread_rwlock_unlock, rwlock); > > Admit
Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables
On Sat, Dec 09, 2023 at 12:19:10PM +0100, Thomas Schwinge wrote: > > --- a/gcc/omp-builtins.def > > +++ b/gcc/omp-builtins.def > > @@ -467,6 +467,9 @@ DEF_GOMP_BUILTIN > > (BUILT_IN_GOMP_WORKSHARE_TASK_REDUCTION_UNREGISTER, > > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ALLOC, > > "GOMP_alloc", BT_FN_PTR_SIZE_SIZE_PTRMODE, > > ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST) > > +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_REALLOC, > > + "omp_realloc", BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE, > > + ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST) > > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_FREE, > > "GOMP_free", BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST) > > DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", > > Should this either be 'BUILT_IN_OMP_REALLOC' ('OMP' instead of 'GOMP'), > or otherwise a 'GOMP_realloc' be added to 'libgomp/allocator.c', like for > 'GOMP_alloc', 'GOMP_free'; 'ialias_call'ing the respective 'omp_[...]' > functions? (Verbatim 'omp_realloc' also mentioned in the comment before > 'gcc/fortran/trans-openmp.cc:gfc_omp_call_is_alloc'.) There were 3 reasons to add GOMP_alloc (and 1 for GOMP_free): 1) when it was added, omp_aligned_alloc was still not exported from the library because we thought we shouldn't expose 5.1 features until we have 5.0 implemented (then changed mind) 2) unline omp_aligned_alloc, GOMP_alloc issues fatal error on allocation failure 3) the omp_* functions have omp_allocator_handle_t arguments, which is hard to provide for builtins (I think this is the only reason for GOMP_free addition, maybe together with wanting those to be paired) Now, 1) is a non-issue anymore, I don't know what Fortran wants for allocation failures, if it is better to have diagnostics on the libgomp side or if wants to emit it inline. And yes, 3) would be an argument to add GOMP_realloc. Jakub
[PATCH v7] libgfortran: Replace mutex with rwlock
This patch tries to introduce an rwlock and split the read/write accesses to the unit_root tree and unit_cache, using the rwlock instead of the mutex, to increase CPU efficiency. In the get_gfc_unit function, only around 30% of the calls step into the insert_unit function; in most instances we can get the unit in the phase of reading the unit_cache or unit_root tree. So splitting the read and write phases with an rwlock is an approach to make it more parallel. BTW, the IPC metric gains around 9x on our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. --- v1 -> v2: Limit the pthread_rwlock usage in libgcc to when __cplusplus isn't defined. v2 -> v3: Rebase the patch on the trunk branch. v3 -> v4: Update the comments. v4 -> v5: Fix typos and code formatting. v5 -> v6: Add unit tests. v6 -> v7: Update ChangeLog and code formatting. 
Reviewed-by: Hongjiu Lu Reviewed-by: Bernhard Reutner-Fischer Reviewed-by: Thomas Koenig Reviewed-by: Jakub Jelinek Signed-off-by: Lipeng Zhu --- libgcc/gthr-posix.h | 60 +++ libgfortran/io/async.c| 4 + libgfortran/io/async.h| 151 ++ libgfortran/io/io.h | 15 +- libgfortran/io/transfer.c | 8 +- libgfortran/io/unit.c | 117 +- libgfortran/io/unix.c | 16 +- .../testsuite/libgomp.fortran/rwlock_1.f90| 33 .../testsuite/libgomp.fortran/rwlock_2.f90| 22 +++ .../testsuite/libgomp.fortran/rwlock_3.f90| 18 +++ 10 files changed, 386 insertions(+), 58 deletions(-) create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_1.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_2.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/rwlock_3.f90 diff --git a/libgcc/gthr-posix.h b/libgcc/gthr-posix.h index aebcfdd9f4c..73283082997 100644 --- a/libgcc/gthr-posix.h +++ b/libgcc/gthr-posix.h @@ -48,6 +48,9 @@ typedef pthread_t __gthread_t; typedef pthread_key_t __gthread_key_t; typedef pthread_once_t __gthread_once_t; typedef pthread_mutex_t __gthread_mutex_t; +#ifndef __cplusplus +typedef pthread_rwlock_t __gthread_rwlock_t; +#endif typedef pthread_mutex_t __gthread_recursive_mutex_t; typedef pthread_cond_t __gthread_cond_t; typedef struct timespec __gthread_time_t; @@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t; #define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER #define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function +#ifndef __cplusplus +#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER +#endif #define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT #if defined(PTHREAD_RECURSIVE_MUT
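The locking discipline the patch introduces can be illustrated with standard C++ primitives (std::shared_mutex standing in for the gthreads rwlock; this mirrors RDLOCK/WRLOCK/RD_TO_WRLOCK, where "upgrading" is really unlock-then-write-lock, so the lookup has to be redone under the write lock):

#include <map>
#include <shared_mutex>

static std::shared_mutex unit_rwlock;   // stand-in for the gthreads rwlock
static std::map<int, int> units;        // stand-in for unit_cache/unit_root

int *
get_unit_sketch (int n, bool create)
{
  {
    std::shared_lock rd (unit_rwlock);  // RDLOCK: the common, parallel path
    auto it = units.find (n);
    if (it != units.end ())
      return &it->second;
  }                                     // RWUNLOCK

  if (!create)
    return nullptr;

  std::unique_lock wr (unit_rwlock);    // WRLOCK (cf. RD_TO_WRLOCK)
  // Must re-check under the write lock: another thread may have inserted the
  // unit between dropping the read lock and acquiring the write lock.
  return &units.try_emplace (n, 0).first->second;
}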
Re: [PATCH v7] libgfortran: Replace mutex with rwlock
On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: > This patch try to introduce the rwlock and split the read/write to > unit_root tree and unit_cache with rwlock instead of the mutex to > increase CPU efficiency. In the get_gfc_unit function, the percentage > to step into the insert_unit function is around 30%, in most instances, > we can get the unit in the phase of reading the unit_cache or unit_root > tree. So split the read/write phase by rwlock would be an approach to > make it more parallel. > > BTW, the IPC metrics can gain around 9x in our test > server with 220 cores. The benchmark we used is > https://github.com/rwesson/NEAT > > libgcc/ChangeLog: > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. > (__gthrw): New function. > (__gthread_rwlock_rdlock): New function. > (__gthread_rwlock_tryrdlock): New function. > (__gthread_rwlock_wrlock): New function. > (__gthread_rwlock_trywrlock): New function. > (__gthread_rwlock_unlock): New function. > > libgfortran/ChangeLog: > > * io/async.c (DEBUG_LINE): New macro. > * io/async.h (RWLOCK_DEBUG_ADD): New macro. > (CHECK_RDLOCK): New macro. > (CHECK_WRLOCK): New macro. > (TAIL_RWLOCK_DEBUG_QUEUE): New macro. > (IN_RWLOCK_DEBUG_QUEUE): New macro. > (RDLOCK): New macro. > (WRLOCK): New macro. > (RWUNLOCK): New macro. > (RD_TO_WRLOCK): New macro. > (INTERN_RDLOCK): New macro. > (INTERN_WRLOCK): New macro. > (INTERN_RWUNLOCK): New macro. > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > a comment. > (unit_lock): Remove including associated internal_proto. > (unit_rwlock): New declarations including associated internal_proto. > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > unit_lock. > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on > unit_rwlock instead of LOCK and UNLOCK on unit_lock. > (st_write_done_worker): Likewise. > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' > comment. Use unit_rwlock variable instead of unit_lock variable. > (get_gfc_unit_from_unit_root): New function. > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock > instead of LOCK and UNLOCK on unit_lock. > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of > LOCK and UNLOCK on unit_lock. > (close_units): Likewise. > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on > unit_lock. > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock > instead of LOCK and UNLOCK on unit_lock. > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead > of LOCK and UNLOCK on unit_lock. Ok for trunk, thanks. Jakub
[PATCH] LoongArch: Fix warnings building libgcc
We are excluding loongarch-opts.h from target libraries, but now struct loongarch_target and gcc_options are not declared in the target libraries, causing: In file included from ../.././gcc/options.h:8, from ../.././gcc/tm.h:49, from ../../../gcc/libgcc/fixed-bit.c:48: ../../../gcc/libgcc/../gcc/config/loongarch/loongarch-opts.h:57:41: warning: 'struct gcc_options' declared inside parameter list will not be visible outside of this definition or declaration 57 | struct gcc_options *opts, | ^~~ So exclude the declarations referring to the C++ structs as well. gcc/ChangeLog: * config/loongarch/loongarch-opts.h (la_target): Move into #if for loongarch-def.h. (loongarch_init_target): Likewise. (loongarch_config_target): Likewise. (loongarch_update_gcc_opt_status): Likewise. --- Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/config/loongarch/loongarch-opts.h | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/config/loongarch/loongarch-opts.h b/gcc/config/loongarch/loongarch-opts.h index 651c1c18ca8..d091359300a 100644 --- a/gcc/config/loongarch/loongarch-opts.h +++ b/gcc/config/loongarch/loongarch-opts.h @@ -21,22 +21,15 @@ along with GCC; see the file COPYING3. If not see #ifndef LOONGARCH_OPTS_H #define LOONGARCH_OPTS_H -/* This is a C++ header and it shouldn't be used by target libraries. */ +/* The loongarch-def.h file is a C++ header and it shouldn't be used by + target libraries. Exclude it and everything using the C++ structs + (struct loongarch_target and gcc_options) from target libraries. */ #if !defined(IN_LIBGCC2) && !defined(IN_TARGET_LIBS) && !defined(IN_RTS) #include "loongarch-def.h" -#endif /* Target configuration */ extern struct loongarch_target la_target; -/* Flag status */ -struct loongarch_flags { -int flt; const char* flt_str; -#define SX_FLAG_TYPE(x) ((x) < 0 ? -(x) : (x)) -int sx[2]; -}; - - /* Initialize loongarch_target from separate option variables. */ void loongarch_init_target (struct loongarch_target *target, @@ -56,7 +49,14 @@ void loongarch_update_gcc_opt_status (struct loongarch_target *target, struct gcc_options *opts, struct gcc_options *opts_set); +#endif +/* Flag status */ +struct loongarch_flags { +int flt; const char* flt_str; +#define SX_FLAG_TYPE(x) ((x) < 0 ? -(x) : (x)) +int sx[2]; +}; /* Macros for common conditional expressions used in loongarch.{c,h,md} */ #define TARGET_CMODEL_NORMAL (la_target.cmodel == CMODEL_NORMAL) -- 2.43.0
[PATCH 0/3] LoongArch: Fix instruction costs
Update LoongArch instruction costs based on the micro-benchmark results on LA464 and LA664. In particular, this allows generating alsl/slli or alsl/slli + add pairs for multiplying by some constants, as on LA464/LA664 a mul instruction is 4x slower than an alsl, slli, or add instruction. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (3): LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own LoongArch: Fix instruction costs [PR112936] LoongArch: Add alslsi3_extend gcc/config/loongarch/loongarch-def.cc | 42 ++- gcc/config/loongarch/loongarch.cc | 22 +- gcc/config/loongarch/loongarch.md | 12 ++ .../loongarch/mul-const-reduction.c | 11 + 4 files changed, 56 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c -- 2.43.0
[PATCH 2/3] LoongArch: Fix instruction costs [PR112936]
Replace the instruction costs in loongarch_rtx_cost_data constructor based on micro-benchmark results on LA464 and LA664. This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl and slli. gcc/ChangeLog: PR target/112936 * config/loongarch/loongarch-def.cc (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update instruction costs per micro-benchmark results. (loongarch_rtx_cost_optimize_size): Set all instruction costs to (COSTS_N_INSNS (1) + 1). * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove special case for multiplication when optimizing for size. Adjust division cost when TARGET_64BIT && !TARGET_DIV32. Account the extra cost when TARGET_CHECK_ZERO_DIV and optimizing for speed. gcc/testsuite/ChangeLog PR target/112936 * gcc.target/loongarch/mul-const-reduction.c: New test. --- gcc/config/loongarch/loongarch-def.cc | 39 ++- gcc/config/loongarch/loongarch.cc | 22 +-- .../loongarch/mul-const-reduction.c | 11 ++ 3 files changed, 43 insertions(+), 29 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/mul-const-reduction.c diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index 6217b19268c..4a8885e8343 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -92,15 +92,15 @@ array_tune loongarch_cpu_align = /* Default RTX cost initializer. */ loongarch_rtx_cost_data::loongarch_rtx_cost_data () - : fp_add (COSTS_N_INSNS (1)), -fp_mult_sf (COSTS_N_INSNS (2)), -fp_mult_df (COSTS_N_INSNS (4)), -fp_div_sf (COSTS_N_INSNS (6)), + : fp_add (COSTS_N_INSNS (5)), +fp_mult_sf (COSTS_N_INSNS (5)), +fp_mult_df (COSTS_N_INSNS (5)), +fp_div_sf (COSTS_N_INSNS (8)), fp_div_df (COSTS_N_INSNS (8)), -int_mult_si (COSTS_N_INSNS (1)), -int_mult_di (COSTS_N_INSNS (1)), -int_div_si (COSTS_N_INSNS (4)), -int_div_di (COSTS_N_INSNS (6)), +int_mult_si (COSTS_N_INSNS (4)), +int_mult_di (COSTS_N_INSNS (4)), +int_div_si (COSTS_N_INSNS (5)), +int_div_di (COSTS_N_INSNS (5)), branch_cost (6), memory_latency (4) {} @@ -111,18 +111,21 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data () array_tune loongarch_cpu_rtx_cost_data = array_tune (); -/* RTX costs to use when optimizing for size. */ +/* RTX costs to use when optimizing for size. + We use a value slightly larger than COSTS_N_INSNS (1) for all of them + because they are slower than simple instructions. */ +#define COST_COMPLEX_INSN (COSTS_N_INSNS (1) + 1) const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size = loongarch_rtx_cost_data () -.fp_add_ (4) -.fp_mult_sf_ (4) -.fp_mult_df_ (4) -.fp_div_sf_ (4) -.fp_div_df_ (4) -.int_mult_si_ (4) -.int_mult_di_ (4) -.int_div_si_ (4) -.int_div_di_ (4); +.fp_add_ (COST_COMPLEX_INSN) +.fp_mult_sf_ (COST_COMPLEX_INSN) +.fp_mult_df_ (COST_COMPLEX_INSN) +.fp_div_sf_ (COST_COMPLEX_INSN) +.fp_div_df_ (COST_COMPLEX_INSN) +.int_mult_si_ (COST_COMPLEX_INSN) +.int_mult_di_ (COST_COMPLEX_INSN) +.int_div_si_ (COST_COMPLEX_INSN) +.int_div_di_ (COST_COMPLEX_INSN); array_tune loongarch_cpu_issue_rate = array_tune () .set (CPU_NATIVE, 4) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 754aeb8bfb7..f04b5798f39 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3787,8 +3787,6 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = (speed ? 
loongarch_cost->int_mult_si * 3 + 6 : COSTS_N_INSNS (7)); - else if (!speed) - *total = COSTS_N_INSNS (1) + 1; else if (mode == DImode) *total = loongarch_cost->int_mult_di; else @@ -3823,14 +3821,18 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, case UDIV: case UMOD: - if (!speed) - { - *total = COSTS_N_INSNS (loongarch_idiv_insns (mode)); - } - else if (mode == DImode) + if (mode == DImode) *total = loongarch_cost->int_div_di; else - *total = loongarch_cost->int_div_si; + { + *total = loongarch_cost->int_div_si; + if (TARGET_64BIT && !TARGET_DIV32) + *total += COSTS_N_INSNS (2); + } + + if (TARGET_CHECK_ZERO_DIV) + *total += COSTS_N_INSNS (2); + return false; case SIGN_EXTEND: @@ -3862,9 +3864,7 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code, && (GET_CODE (XEXP (XEXP (XEXP (x, 0), 0), 1)) == ZERO_EXTEND { - if (!speed) - *total = COSTS_N_INSNS (1) + 1; - else if (mode == DImode) + if (mode == DImode) *total = loonga
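To make the effect of the new numbers concrete, a small sketch of the cost comparison the middle end ends up doing (local names; COSTS_N_INSNS (N) is N * 4, see patch 1/3):

/* A sketch of the comparison behind the "x * 17" / "x * 68" reduction,
   using the values from the hunk above.  Names are local to this example.  */
#define COSTS_N_INSNS(N) ((N) * 4)

enum { la_int_mult_si = COSTS_N_INSNS (4),    /* one mul.w          */
       la_simple_insn  = COSTS_N_INSNS (1) }; /* alsl.w or slli.w   */

/* Two simple instructions (alsl + slli, cost 8) now beat one multiply
   (cost 16), so suitable constants are expanded without mul.w.  */
static_assert (2 * la_simple_insn < la_int_mult_si, "shift-add wins");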
[PATCH 1/3] LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own
With loongarch-def.cc switched from C to C++, we can include rtl.h for COSTS_N_INSNS, instead of hard coding our own. This is a non-functional change for now, but it will make the code more future-proof in case COSTS_N_INSNS in rtl.h is ever changed. gcc/ChangeLog: * config/loongarch/loongarch-def.cc (rtl.h): Include. (COSTS_N_INSNS): Remove the macro definition. --- gcc/config/loongarch/loongarch-def.cc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index c41804a180e..6217b19268c 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see #include "system.h" #include "coretypes.h" #include "tm.h" +#include "rtl.h" #include "loongarch-def.h" #include "loongarch-str.h" @@ -89,8 +90,6 @@ array_tune loongarch_cpu_align = .set (CPU_LA464, la464_align ()) .set (CPU_LA664, la464_align ()); -#define COSTS_N_INSNS(N) ((N) * 4) - /* Default RTX cost initializer. */ loongarch_rtx_cost_data::loongarch_rtx_cost_data () : fp_add (COSTS_N_INSNS (1)), -- 2.43.0
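For reference, since the change is stated to be non-functional, the definition picked up from rtl.h is the same as the macro deleted here:

/* In gcc/rtl.h (same value the removed local macro used):  */
#define COSTS_N_INSNS(N) ((N) * 4)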
[PATCH 3/3] LoongArch: Add alslsi3_extend
Following the instruction cost fix, we are generating alsl.w $a0, $a0, $a0, 4 instead of li.w $t0, 17 mul.w $a0, $t0 for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't have a sign-extending pattern for alsl.w, causing an extra slli.w instruction to be generated to sign-extend $a0. Add the pattern to remove the redundant extension. gcc/ChangeLog: * config/loongarch/loongarch.md (alslsi3_extend): New define_insn. --- gcc/config/loongarch/loongarch.md | 12 1 file changed, 12 insertions(+) diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index afbf201d4d0..7b26d15aa4e 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -2869,6 +2869,18 @@ (define_insn "alsl<mode>3" [(set_attr "type" "arith") (set_attr "mode" "<MODE>")]) +(define_insn "alslsi3_extend" + [(set (match_operand:DI 0 "register_operand" "=r") + (sign_extend:DI + (plus:SI + (ashift:SI (match_operand:SI 1 "register_operand" "r") + (match_operand 2 "const_immalsl_operand" "")) + (match_operand:SI 3 "register_operand" "r"))))] + "" + "alsl.w\t%0,%1,%3,%2" + [(set_attr "type" "arith") + (set_attr "mode" "SI")]) + ;; Reverse the order of bytes of operand 1 and store the result in operand 0. -- 2.43.0
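A minimal example of the redundant extension being removed; the assembly in the comments is an illustration of the description above, not captured compiler output:

/* Under the LP64 ABI the int result must be sign-extended in $a0.  */
int
scale (int x)
{
  return x * 17;
  /* Before this patch (illustrative):
         alsl.w  $a0,$a0,$a0,4
         slli.w  $a0,$a0,0        # redundant: re-sign-extend $a0
     With alslsi3_extend the alsl.w alone suffices, since 32-bit
     operations already sign-extend their result on LA64.  */
}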
Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Sorry for the slow review. Stamatis Markianos-Wright writes: > [...] > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > index > 44a04b86cb5806fcf50917826512fd203d42106c..c083f965fa9a40781bc86beb6e63654afd14eac4 > 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md > @@ -6922,23 +6922,24 @@ > ;; Originally expanded by 'predicated_doloop_end'. > ;; In the rare situation where the branch is too far, we do also need to > ;; revert FPSCR.LTPSIZE back to 0x100 after the last iteration. > -(define_insn "*predicated_doloop_end_internal" > +(define_insn "predicated_doloop_end_internal" >[(set (pc) > (if_then_else > -(ge (plus:SI (reg:SI LR_REGNUM) > - (match_operand:SI 0 "const_int_operand" "")) > - (const_int 0)) > - (label_ref (match_operand 1 "" "")) > +(gtu (unspec:SI [(plus:SI (match_operand:SI 0 "s_register_operand" > "=r") > + (const_int ))] > + LETP) > + (const_int )) Is there any need for the unspec? I couldn't see why this wasn't simply: (gtu (match_operand:SI 0 "s_register_operand" "=r") (const_int )) But I agree that using gtu rather than ge is nicer if it's what the instruction does. > diff --git a/gcc/df-core.cc b/gcc/df-core.cc > index > d4812b04a7cb97ea1606082e26e910472da5bcc1..4fcc14bf790d43e792b3c926fe1f80073d908c17 > 100644 > --- a/gcc/df-core.cc > +++ b/gcc/df-core.cc > @@ -1964,6 +1964,21 @@ df_bb_regno_last_def_find (basic_block bb, unsigned > int regno) >return NULL; > } > > +/* Return the one and only def of REGNO within BB. If there is no def or > + there are multiple defs, return NULL. */ > + > +df_ref > +df_bb_regno_only_def_find (basic_block bb, unsigned int regno) > +{ > + df_ref temp = df_bb_regno_first_def_find (bb, regno); > + if (!temp) > +return NULL; > + else if (temp == df_bb_regno_last_def_find (bb, regno)) > +return temp; > + else > +return NULL; > +} > + > /* Finds the reference corresponding to the definition of REG in INSN. > DF is the dataflow object. */ > > diff --git a/gcc/df.h b/gcc/df.h > index > 402657a7076f1bcad24e9c50682e033e57f432f9..98623637f9c839c799222e99df2a7173a770b2ac > 100644 > --- a/gcc/df.h > +++ b/gcc/df.h > @@ -987,6 +987,7 @@ extern void df_check_cfg_clean (void); > #endif > extern df_ref df_bb_regno_first_def_find (basic_block, unsigned int); > extern df_ref df_bb_regno_last_def_find (basic_block, unsigned int); > +extern df_ref df_bb_regno_only_def_find (basic_block, unsigned int); > extern df_ref df_find_def (rtx_insn *, rtx); > extern bool df_reg_defined (rtx_insn *, rtx); > extern df_ref df_find_use (rtx_insn *, rtx); > diff --git a/gcc/loop-doloop.cc b/gcc/loop-doloop.cc > index > 4feb0a25ab9331b7124df900f73c9fc6fb3eb10b..d919207505c472c8a54a2c9c982a09061584177b > 100644 > --- a/gcc/loop-doloop.cc > +++ b/gcc/loop-doloop.cc > @@ -85,10 +85,10 @@ doloop_condition_get (rtx_insn *doloop_pat) > forms: > > 1) (parallel [(set (pc) (if_then_else (condition) > - (label_ref (label)) > - (pc))) > - (set (reg) (plus (reg) (const_int -1))) > - (additional clobbers and uses)]) > + (label_ref (label)) > + (pc))) > + (set (reg) (plus (reg) (const_int -1))) > + (additional clobbers and uses)]) > > The branch must be the first entry of the parallel (also required > by jump.cc), and the second entry of the parallel must be a set of > @@ -96,19 +96,34 @@ doloop_condition_get (rtx_insn *doloop_pat) > the loop counter in an if_then_else too. > > 2) (set (reg) (plus (reg) (const_int -1)) > - (set (pc) (if_then_else (reg != 0) > - (label_ref (label)) > - (pc))). 
> + (set (pc) (if_then_else (reg != 0) > + (label_ref (label)) > + (pc))). > > Some targets (ARM) do the comparison before the branch, as in the > following form: > > - 3) (parallel [(set (cc) (compare ((plus (reg) (const_int -1), 0))) > - (set (reg) (plus (reg) (const_int -1)))]) > -(set (pc) (if_then_else (cc == NE) > -(label_ref (label)) > -(pc))) */ > - > + 3) (parallel [(set (cc) (compare (plus (reg) (const_int -1)) 0)) > +(set (reg) (plus (reg) (const_int -1)))]) > + (set (pc) (if_then_else (cc == NE) > + (label_ref (label)) > + (pc))) > + > + The ARM target also supports a special case of a counter that > decrements > + by `n` and terminating in a GTU condition. In that case, the compare > and > +
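Returning to the new df helper added above, a usage sketch (the surrounding function and names are hypothetical and assume the usual GCC includes; only df_bb_regno_only_def_find and the DF_REF_INSN accessor come from existing code):

/* Hypothetical caller: check that the only def of REGNO in BB is the
   decrement instruction DEC_INSN, i.e. nothing else clobbers the counter.  */
static bool
decrement_defines_counter_p (basic_block bb, rtx_insn *dec_insn,
			     unsigned int regno)
{
  /* Returns NULL both when there is no def and when there are several.  */
  df_ref def = df_bb_regno_only_def_find (bb, regno);
  return def && DF_REF_INSN (def) == dec_insn;
}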
Re: [PATCH] aarch64: Add missing driver-aarch64 dependencies
Andrew Carlotti writes: > Ok for master? > > gcc/ChangeLog: > > * config/aarch64/x-aarch64: Add missing dependencies. > > > diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64 > index > 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..6fd638faaab7cb5bb2309d36d6dea2adf1fb8d32 > 100644 > --- a/gcc/config/aarch64/x-aarch64 > +++ b/gcc/config/aarch64/x-aarch64 > @@ -1,3 +1,7 @@ > driver-aarch64.o: $(srcdir)/config/aarch64/driver-aarch64.cc \ > - $(CONFIG_H) $(SYSTEM_H) > + $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(CORETYPES_H) \ > + $(srcdir)/config/aarch64/aarch64-protos.h \ > + $(srcdir)/config/aarch64/aarch64-feature-deps.h \ > + $(srcdir)/config/aarch64/aarch64-cores.def \ > + $(srcdir)/config/aarch64/aarch64-arches.def The .def files are included in TM_H by: TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \ $(srcdir)/config/aarch64/aarch64-tuning-flags.def \ $(srcdir)/config/aarch64/aarch64-option-extensions.def \ $(srcdir)/config/aarch64/aarch64-cores.def \ $(srcdir)/config/aarch64/aarch64-isa-modes.def \ $(srcdir)/config/aarch64/aarch64-arches.def so they aren't strictly needed. If you'd prefer to include the directly-included files anyway (can see the argument in favour of that), then the list ought to include aarch64-option-extensions.def too. OK whichever way you prefer. Thanks, Richard > $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
Re: aarch64: Fix +nocrypto handling
Andrew Carlotti writes: > Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with > checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead. The value of the > AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is > retained because removing it would make processing the data in > option-extensions.def significantly more complex. > > Ok for master? > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (aarch64_get_extension_string_for_isa_flags): Fix generation of > the "+nocrypto" extension. > * config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove. > (TARGET_CRYPTO): Remove. > * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): > Don't use TARGET_CRYPTO. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/options_set_27.c: New test. > * gcc.target/aarch64/options_set_28.c: New test. > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/gcc/common/config/aarch64/aarch64-common.cc > index > 20bc4e1291bba9b73798398fea659f1154afa205..6d12454143cd64ebaafa7f5e6c23869ee0bfa543 > 100644 > --- a/gcc/common/config/aarch64/aarch64-common.cc > +++ b/gcc/common/config/aarch64/aarch64-common.cc > @@ -310,6 +310,7 @@ aarch64_get_extension_string_for_isa_flags > But in order to make the output more readable, it seems better > to add the strings in definition order. */ >aarch64_feature_flags added = 0; > + auto flags_crypto = AARCH64_FL_AES | AARCH64_FL_SHA2; >for (unsigned int i = ARRAY_SIZE (all_extensions); i-- > 0; ) > { >auto &opt = all_extensions[i]; > @@ -319,7 +320,7 @@ aarch64_get_extension_string_for_isa_flags >per-feature crypto flags. */ >auto flags = opt.flag_canonical; >if (flags == AARCH64_FL_CRYPTO) > - flags = AARCH64_FL_AES | AARCH64_FL_SHA2; > + flags = flags_crypto; > >if ((flags & isa_flags & (explicit_flags | ~current_flags)) == flags) > { > @@ -337,9 +338,27 @@ aarch64_get_extension_string_for_isa_flags >/* Remove the features in current_flags & ~isa_flags. If the feature does > not have an HWCAPs then it shouldn't be taken into account for feature > detection because one way or another we can't tell if it's available > - or not. */ > + or not. > + > + As a special case, emit "+nocrypto" instead of "+noaes+nosha2", in order > + to support assemblers that predate the separate per-feature crypto > flags. > + Only use "+nocrypto" when "simd" is enabled (to avoid redundant feature > + removal), and when "sm4" is not already enabled (to avoid depending > on > + whether "+nocrypto" also disables "sm4") */ > + for (auto &opt : all_extensions) > +if ((opt.flag_canonical == AARCH64_FL_CRYPTO) > + && ((flags_crypto & current_flags & ~isa_flags) == flags_crypto) > + && (current_flags & AARCH64_FL_SIMD) > + && !(current_flags & AARCH64_FL_SM4)) > + { > + current_flags &= ~opt.flags_off; > + outstr += "+no"; > + outstr += opt.name; > + } > + Is it an important part of the patch that we do this ahead of time, rather than in the main loop? Doing it in the main loop feels more natural, and should avoid the need for the SIMD test. If we do use an in-loop test, I assume the test would need to be something like: (opt.flag_canonical & flags_crypto) && (flags_crypto & current_flags & ~isa_flags) == flags_crypto && !(current_flags & AARCH64_FL_SM4) so that the new code is applied when the loop first sees a crypto flag. The set of flags to disable would be: current_flags &= ~feature_deps::get_flags_off (flags_crypto); Otherwise it looks good, thanks.
As a general formatting note, GCC style is not to put individual comparisons in parentheses in && and || combos. Richard >for (auto &opt : all_extensions) > if (opt.native_detect_p > + && (opt.flag_canonical != AARCH64_FL_CRYPTO) > && (opt.flag_canonical & current_flags & ~isa_flags)) >{ > current_flags &= ~opt.flags_off; > diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc > index > ab8844f6049dc95b97648b651bfcd3a4ccd3ca0b..4f9ee01d52f3ac42f95edbb030bdb2d09fc36d16 > 100644 > --- a/gcc/config/aarch64/aarch64-c.cc > +++ b/gcc/config/aarch64/aarch64-c.cc > @@ -140,7 +140,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) >aarch64_def_or_undef (TARGET_ILP32, "_ILP32", pfile); >aarch64_def_or_undef (TARGET_ILP32, "__ILP32__", pfile); > > - aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile); > + aarch64_def_or_undef (TARGET_AES && TARGET_SHA2, "__ARM_FEATURE_CRYPTO", > pfile); >aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile); >aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile); >cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS"); > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index > 1ac298926ce
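For concreteness, the in-loop variant suggested above might look roughly like this inside aarch64_get_extension_string_for_isa_flags (an untested sketch assembled from the review comments, not the committed code):

  for (auto &opt : all_extensions)
    {
      /* First time we see a crypto feature that has to be turned off,
         emit "+nocrypto" once and clear everything it disables.  */
      if ((opt.flag_canonical & flags_crypto)
          && (flags_crypto & current_flags & ~isa_flags) == flags_crypto
          && !(current_flags & AARCH64_FL_SM4))
        {
          current_flags &= ~feature_deps::get_flags_off (flags_crypto);
          outstr += "+nocrypto";
          continue;
        }
      if (opt.native_detect_p
          && (opt.flag_canonical & current_flags & ~isa_flags))
        {
          current_flags &= ~opt.flags_off;
          outstr += "+no";
          outstr += opt.name;
        }
    }

The continue skips the generic "+no<name>" handling for the aes/sha2 entries once "+nocrypto" has been emitted, and the current_flags update keeps the condition from firing a second time.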
Re: aarch64: Fix +nopredres, +nols64 and +nomops
Andrew Carlotti writes: > For native cpu feature detection, certain features have no entry in > /proc/cpuinfo, so have to be assumed to be present whenever the detected > cpu is supposed to support that feature. > > However, the logic for this was mistakenly implemented by excluding > these features from part of aarch64_get_extension_string_for_isa_flags. > This function is also used elsewhere when canonicalising explicit > feature sets, which may require removing features that are normally > implied by the specified architecture version. > > This change reenables generation of +nopredres, +nols64 and +nomops > during canonicalisation, by relocating the misplaced native cpu > detection logic. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.cc > (aarch64_get_extension_string_for_isa_flags): Remove filtering > of features without native detection. > * config/aarch64/driver-aarch64.cc (host_detect_local_cpu): > Explicitly add expected features that lack cpuinfo detection. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/options_set_29.c: New test. > > > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > b/gcc/common/config/aarch64/aarch64-common.cc > index > ee2ea7eae105d19ec906ef8d25d3a237fbeac4b4..37e60d6083e290b18b1f4c6274123b0a58de5476 > 100644 > --- a/gcc/common/config/aarch64/aarch64-common.cc > +++ b/gcc/common/config/aarch64/aarch64-common.cc > @@ -357,8 +357,7 @@ aarch64_get_extension_string_for_isa_flags >} > >for (auto &opt : all_extensions) > -if (opt.native_detect_p > - && (opt.flag_canonical != AARCH64_FL_CRYPTO) > +if ((opt.flag_canonical != AARCH64_FL_CRYPTO) > && (opt.flag_canonical & current_flags & ~isa_flags)) >{ > current_flags &= ~opt.flags_off; This is the only use of native_detect_p, so it'd be good to remove the field itself. > diff --git a/gcc/config/aarch64/driver-aarch64.cc > b/gcc/config/aarch64/driver-aarch64.cc > index > 8e318892b10aa2288421fad418844744a2f5a3b4..470c19b650f1ae953918eaeddbf0f768c12a99d9 > 100644 > --- a/gcc/config/aarch64/driver-aarch64.cc > +++ b/gcc/config/aarch64/driver-aarch64.cc > @@ -262,6 +262,7 @@ host_detect_local_cpu (int argc, const char **argv) >unsigned int n_variants = 0; >bool processed_exts = false; >aarch64_feature_flags extension_flags = 0; > + aarch64_feature_flags unchecked_extension_flags = 0; >aarch64_feature_flags default_flags = 0; >std::string buf; >size_t sep_pos = -1; > @@ -348,7 +349,10 @@ host_detect_local_cpu (int argc, const char **argv) > /* If the feature contains no HWCAPS string then ignore it for the >auto detection. */ > if (val.empty ()) > - continue; > + { > + unchecked_extension_flags |= aarch64_extensions[i].flag; > + continue; > + } > > bool enabled = true; > > @@ -447,6 +451,13 @@ host_detect_local_cpu (int argc, const char **argv) >if (tune) > return res; > > + if (!processed_exts) > +goto not_found; Could you explain this part? It seems like more of a parsing change (i.e. being more strict about what we accept). If that's the intention, it probably belongs in: if (n_cores == 0 || n_cores > 2 || (n_cores == 1 && n_variants != 1) || imp == INVALID_IMP) goto not_found; But maybe it should be a separate patch. Looks good otherwise, thanks. Richard > + > + /* Add any features that should be be present, but can't be verified using > + the /proc/cpuinfo "Features" list. 
*/ > + extension_flags |= unchecked_extension_flags & default_flags; > + >{ > std::string extension >= aarch64_get_extension_string_for_isa_flags (extension_flags, > diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_29.c > b/gcc/testsuite/gcc.target/aarch64/options_set_29.c > new file mode 100644 > index > ..01bb73c02e232bdfeca5f16dad3fa2a6484843d5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/options_set_29.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-march=armv9.3-a+nopredres+nols64+nomops" } */ > + > +int main () > +{ > + return 0; > +} > + > +/* { dg-final { scan-assembler-times {\.arch > armv9\.3\-a\+crc\+nopredres\+nols64\+nomops\n} 1 } } */ > + > +/* Checking if enabling default features drops the superfluous bits. */
v2 [C PATCH] Fix regression causing ICE for structs with VLAs [PR 112488]
This is a revised version which fixes a problem with breaking other callers of finish_struct. Please ignore the previous one. Bootstrapped and regression tested on x86_64. Fix regression causing ICE for structs with VLAs [PR 112488] A previous patch that fixed several ICEs related to size expressions of VM types (PR c/70418, ...) caused a regression for structs where a DECL_EXPR is not generated anymore although required. We now call add_decl_expr, introduced by the previous patch, from finish_struct. The function is revised with a new argument to not set the TYPE_NAME for the type to the DECL_EXPR in this specific case. PR c/112488 gcc/c * c-decl.cc (add_decl_expr): Revise. (finish_struct): Create DECL_EXPR. * c-parser.cc (c_parser_struct_or_union_specifier): Call finish_struct with expression for VLA sizes. * c-tree.h (finish_struct): Add argument. gcc/testsuite * gcc.dg/pr112488-1.c: New test. * gcc.dg/pr112488-2.c: New test. * gcc.dg/pr112898.c: New test. * gcc.misc-tests/gcov-pr85350.c: Adapt. --- gcc/c/c-decl.cc | 33 - gcc/c/c-parser.cc | 2 +- gcc/c/c-tree.h | 3 +- gcc/testsuite/gcc.dg/pr112488-1.c | 14 + gcc/testsuite/gcc.dg/pr112488-2.c | 13 gcc/testsuite/gcc.dg/pr112898.c | 9 ++ gcc/testsuite/gcc.misc-tests/gcov-pr85350.c | 2 +- 7 files changed, 65 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr112488-1.c create mode 100644 gcc/testsuite/gcc.dg/pr112488-2.c create mode 100644 gcc/testsuite/gcc.dg/pr112898.c diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc index 92c83e1bf10..039a66fef09 100644 --- a/gcc/c/c-decl.cc +++ b/gcc/c/c-decl.cc @@ -6618,12 +6618,10 @@ smallest_type_quals_location (const location_t *locations, the size evaluation prior to the side effects. We therefore use BIND_EXPRs in TYPENAME contexts too. */ static void -add_decl_expr (location_t loc, enum decl_context decl_context, tree type, - tree *expr) +add_decl_expr (location_t loc, tree type, tree *expr, bool set_name_p) { tree bind = NULL_TREE; - if (decl_context == TYPENAME || decl_context == PARM - || decl_context == FIELD) + if (expr) { bind = build3 (BIND_EXPR, void_type_node, NULL_TREE, NULL_TREE, NULL_TREE); @@ -6636,7 +6634,8 @@ add_decl_expr (location_t loc, enum decl_context decl_context, tree type, pushdecl (decl); DECL_ARTIFICIAL (decl) = 1; add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl)); - TYPE_NAME (type) = decl; + if (set_name_p) +TYPE_NAME (type) = decl; if (bind) { @@ -7635,7 +7634,12 @@ grokdeclarator (const struct c_declarator *declarator, type has a name/declaration of it's own, but special attention is required if the type is anonymous. */ if (!TYPE_NAME (type) && c_type_variably_modified_p (type)) - add_decl_expr (loc, decl_context, type, expr); + { + bool bind_p = decl_context == TYPENAME + || decl_context == FIELD + || decl_context == PARM; + add_decl_expr (loc, type, bind_p ? expr : NULL, true); + } type = c_build_pointer_type (type); @@ -7900,7 +7904,12 @@ grokdeclarator (const struct c_declarator *declarator, /* The pointed-to type may need a decl expr (see above). */ if (!TYPE_NAME (type) && c_type_variably_modified_p (type)) - add_decl_expr (loc, decl_context, type, expr); + { + bool bind_p = decl_context == TYPENAME + || decl_context == FIELD + || decl_context == PARM; + add_decl_expr (loc, type, bind_p ? 
expr : NULL, true); + } type = c_build_pointer_type (type); type_quals = array_ptr_quals; @@ -9257,7 +9266,8 @@ is_flexible_array_member_p (bool is_last_field, tree finish_struct (location_t loc, tree t, tree fieldlist, tree attributes, - class c_struct_parse_info *enclosing_struct_parse_info) + class c_struct_parse_info *enclosing_struct_parse_info, + tree *expr) { tree x; bool toplevel = file_scope == current_scope; @@ -9595,6 +9605,13 @@ finish_struct (location_t loc, tree t, tree fieldlist, tree attributes, finish_incomplete_vars (incomplete_vars, toplevel); + /* Make sure a DECL_EXPR is created for structs with VLA members. + Because we do not know the context, we always pass expr + to force creation of a BIND_EXPR which is required in some + contexts. */ + if (c_type_variably_modified_p (t)) +add_decl_expr (loc, t, expr, false); + if (warn_cxx_compat) warn_cxx_compat_finish_struct (fie
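For context, the kind of code involved looks roughly like this; this is a hypothetical illustration relying on the GNU VLA-in-struct extension, not one of the new pr112488 testcases:

void
f (int n)
{
  /* The struct type is variably modified, so finish_struct must now make
     sure a DECL_EXPR is created for it so the size expression is evaluated.  */
  struct s
  {
    int a[n];
  } v;
  v.a[0] = 0;
}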
Re: [PATCH] phiopt: Fix ICE with large --param l1-cache-line-size= [PR112887]
> On 09.12.2023 at 10:35, Jakub Jelinek wrote: > > Hi! > > This function is never called when param_l1_cache_line_size is 0, > but it uses int and unsigned int variables to hold alignment in > bits, so for large param_l1_cache_line_size it is zero and e.g. > DECL_ALIGN () % param_align_bits can divide by zero. > Looking at the code, the function uses tree_fits_uhwi_p on the trees > before converting them using tree_to_uhwi to int variables, which > looks just wrong, either it would need to punt if it doesn't fit > into those and also check for overflows during the computation, > or use unsigned HOST_WIDE_INT for all of this. That also fixes > the division by zero, as param_l1_cache_line_size maximum is INT_MAX, > that multiplied by 8 will always fit. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok > 2023-12-09 Jakub Jelinek >PR tree-optimization/112887 >* tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of >param_align, param_align_bits, offset1, offset2, size2 and align1 >variables from int or unsigned int to unsigned HOST_WIDE_INT. > >* gcc.dg/pr112887.c: New test. > > --- gcc/tree-ssa-phiopt.cc.jj2023-11-14 10:52:16.195275972 +0100 > +++ gcc/tree-ssa-phiopt.cc2023-12-08 16:25:29.166747347 +0100 > @@ -3757,8 +3757,8 @@ static void > hoist_adjacent_loads (basic_block bb0, basic_block bb1, > basic_block bb2, basic_block bb3) > { > - int param_align = param_l1_cache_line_size; > - unsigned param_align_bits = (unsigned) (param_align * BITS_PER_UNIT); > + unsigned HOST_WIDE_INT param_align = param_l1_cache_line_size; > + unsigned HOST_WIDE_INT param_align_bits = param_align * BITS_PER_UNIT; > gphi_iterator gsi; > > /* Walk the phis in bb3 looking for an opportunity. We are looking > @@ -3770,8 +3770,7 @@ hoist_adjacent_loads (basic_block bb0, b > gimple *def1, *def2; > tree arg1, arg2, ref1, ref2, field1, field2; > tree tree_offset1, tree_offset2, tree_size2, next; > - int offset1, offset2, size2; > - unsigned align1; > + unsigned HOST_WIDE_INT offset1, offset2, size2, align1; > gimple_stmt_iterator gsi2; > basic_block bb_for_def1, bb_for_def2; > > --- gcc/testsuite/gcc.dg/pr112887.c.jj2023-12-08 16:31:30.708697160 +0100 > +++ gcc/testsuite/gcc.dg/pr112887.c2023-12-08 16:27:06.662385487 +0100 > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/112887 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 --param=l1-cache-line-size=0x2000" } */ > + > +void bar (long); > +long c; > +struct S { long a, b; } s; > + > +void > +foo (void) > +{ > + bar (c ? s.a : s.b); > +} > >Jakub >
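The wrap-around being avoided, with an example value (not the one from the PR):

/* With a 32-bit unsigned int the product can wrap to zero, after which
   DECL_ALIGN () % param_align_bits divides by zero; an unsigned
   HOST_WIDE_INT keeps the exact value.  */
unsigned int narrow = (unsigned int) 0x20000000 * 8;            /* wraps to 0 */
unsigned long long wide = (unsigned long long) 0x20000000 * 8;  /* 0x100000000 */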
Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]
On 11/2/23 21:18, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu. I'm not entirely sure if the change I made to have destructors clobber with CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to have broken by doing this and I wasn't able to find anything else that really depended on this distinction other than a warning pass. Otherwise I could experiment with a new clobber kind for destructor calls. It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is expiring at that point as well, which a (pseudo-)destructor does not imply; it's perfectly valid to destroy an object and then create another in the same storage. We probably do want another clobber kind for end of object lifetime. And/or one for beginning of object lifetime. Jason
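The point about storage outliving the object, in a short example (illustrative, not from the patch):

#include <new>

struct S { int i; };

void
reuse_storage ()
{
  S s{1};
  s.~S ();          /* the object's lifetime ends, the storage does not */
  new (&s) S{2};    /* fine: a new object is created in the same storage */
}                   /* the new object is destroyed at scope exit */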
Re: [PATCH] c++: fix noexcept checking for trivial operations [PR96090]
On 11/27/23 06:07, Nathaniel Shead wrote: Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634626.html. I've been made aware since constructing this patch of CWG2820, which has a proposed resolution that would change the result of the testcase 'noexcept(yesthrow_t())' (and similarly for the library builtin), but as it hasn't yet been accepted I think at least ensuring the builtin matches the behaviour of the operator is probably still sensible. OK. On Sun, Oct 29, 2023 at 12:43:28PM +1100, Nathaniel Shead wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu. -- >8 -- This patch stops eager folding of trivial operations (construction and assignment) from occurring when checking for noexceptness. This was previously done in PR c++/53025, but only for copy/move construction, and the __is_nothrow_xible builtins did not receive the same treatment when they were added. To handle `is_nothrow_default_constructible`, the patch also ensures that when no parameters are passed we do value initialisation instead of just building the constructor call: in particular, value-initialisation doesn't necessarily actually invoke the constructor for trivial default constructors, and so we need to handle this case as well. PR c++/96090 PR c++/100470 gcc/cp/ChangeLog: * call.cc (build_over_call): Prevent folding of trivial special members when checking for noexcept. * method.cc (constructible_expr): Perform value-initialisation for empty parameter lists. (is_nothrow_xible): Treat as noexcept operator. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept81.C: New test. * g++.dg/ext/is_nothrow_constructible7.C: New test. * g++.dg/ext/is_nothrow_constructible8.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/call.cc| 17 ++--- gcc/cp/method.cc | 19 -- gcc/testsuite/g++.dg/cpp0x/noexcept81.C | 36 +++ .../g++.dg/ext/is_nothrow_constructible7.C| 20 ++ .../g++.dg/ext/is_nothrow_constructible8.C| 63 +++ 5 files changed, 141 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept81.C create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible7.C create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible8.C diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc index c1fb8807d3f..ac02b0633ed 100644 --- a/gcc/cp/call.cc +++ b/gcc/cp/call.cc @@ -10231,15 +10231,16 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) /* Avoid actually calling copy constructors and copy assignment operators, if possible. */ - if (! flag_elide_constructors && !force_elide) + if (!force_elide + && (!flag_elide_constructors + /* It's unsafe to elide the operation when handling +a noexcept-expression, it may evaluate to the wrong +value (c++/53025, c++/96090). */ + || cp_noexcept_operand != 0)) /* Do things the hard way. */; - else if (cand->num_convs == 1 - && (DECL_COPY_CONSTRUCTOR_P (fn) - || DECL_MOVE_CONSTRUCTOR_P (fn)) - /* It's unsafe to elide the constructor when handling - a noexcept-expression, it may evaluate to the wrong - value (c++/53025). 
*/ - && (force_elide || cp_noexcept_operand == 0)) + else if (cand->num_convs == 1 + && (DECL_COPY_CONSTRUCTOR_P (fn) + || DECL_MOVE_CONSTRUCTOR_P (fn))) { tree targ; tree arg = argarray[num_artificial_parms_for (fn)]; diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc index a70dd5d6adc..3c978e2369d 100644 --- a/gcc/cp/method.cc +++ b/gcc/cp/method.cc @@ -2091,6 +2091,7 @@ constructible_expr (tree to, tree from) { tree expr; cp_unevaluated cp_uneval_guard; + const int len = TREE_VEC_LENGTH (from); if (CLASS_TYPE_P (to)) { tree ctype = to; @@ -2098,11 +2099,16 @@ constructible_expr (tree to, tree from) if (!TYPE_REF_P (to)) to = cp_build_reference_type (to, /*rval*/false); tree ob = build_stub_object (to); - vec_alloc (args, TREE_VEC_LENGTH (from)); - for (tree arg : tree_vec_range (from)) - args->quick_push (build_stub_object (arg)); - expr = build_special_member_call (ob, complete_ctor_identifier, &args, - ctype, LOOKUP_NORMAL, tf_none); + if (len == 0) + expr = build_value_init (ctype, tf_none); + else + { + vec_alloc (args, TREE_VEC_LENGTH (from)); + for (tree arg : tree_vec_range (from)) + args->quick_push (build_stub_object (arg)); + expr = build_special_member_call (ob, complete_ctor_identifier, &args, + ctype, LOOKUP_NORMAL, tf_none); + } if (expr == error_mark_node) return error_mark_node;
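A small illustration of the invariant the patch is after, namely that the builtin and the operator agree even for trivial operations (illustrative, not one of the new tests; assumes the __is_nothrow_constructible builtin trait is available):

template <class T> T &&declval () noexcept;   /* unevaluated contexts only */

struct S { int i; };                          /* trivial copy constructor */

static_assert (__is_nothrow_constructible (S, S &)
               == noexcept (S (declval<S &> ())),
               "builtin and operator should agree");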
Re: [PATCH] c++/modules: alias CTAD and specializations table
On 11/24/23 13:09, Patrick Palka wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? OK. -- >8 -- A rewritten guide for alias CTAD isn't really a specialization of the original guide, so we shouldn't register it as such. This avoids an ICE in the below modules testcase which otherwise tries to inspect the rewritten guide's empty DECL_CONTEXT. It also preemptively avoids an ICE in modules/concept-6 in C++23 mode with the inherited CTAD patch. * pt.cc (alias_ctad_tweaks): Pass use_spec_table=false to tsubst_decl. gcc/testsuite/ChangeLog: * g++.dg/modules/concept-8.h: New test. * g++.dg/modules/concept-8_a.H: New test. * g++.dg/modules/concept-8_b.C: New test. --- gcc/cp/pt.cc | 3 ++- gcc/testsuite/g++.dg/modules/concept-8.h | 14 ++ gcc/testsuite/g++.dg/modules/concept-8_a.H | 5 + gcc/testsuite/g++.dg/modules/concept-8_b.C | 8 4 files changed, 29 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/modules/concept-8.h create mode 100644 gcc/testsuite/g++.dg/modules/concept-8_a.H create mode 100644 gcc/testsuite/g++.dg/modules/concept-8_b.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 4f93150c5d7..2cfe1da5e07 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -30015,7 +30015,8 @@ alias_ctad_tweaks (tree tmpl, tree uguides) /* Parms are to have DECL_CHAIN tsubsted, which would be skipped if cp_unevaluated_operand. */ cp_evaluated ev; - g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain); + g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain, + /*use_spec_table=*/false); } if (g == error_mark_node) continue; diff --git a/gcc/testsuite/g++.dg/modules/concept-8.h b/gcc/testsuite/g++.dg/modules/concept-8.h new file mode 100644 index 000..a25f9b752fd --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/concept-8.h @@ -0,0 +1,14 @@ +// A version of concept-6.h using an alias template + alias CTAD + +template +struct Base +{ + Base (const _Callable &) +requires true + {} +}; + +template requires true +using Derived = Base<_Callable>; + +inline Derived all = [] (auto&& __r) {}; diff --git a/gcc/testsuite/g++.dg/modules/concept-8_a.H b/gcc/testsuite/g++.dg/modules/concept-8_a.H new file mode 100644 index 000..da0467781c1 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/concept-8_a.H @@ -0,0 +1,5 @@ +// { dg-require-effective-target c++20 } +// { dg-additional-options "-fmodule-header -fconcepts" } +// { dg-module-cmi {} } + +#include "concept-8.h" diff --git a/gcc/testsuite/g++.dg/modules/concept-8_b.C b/gcc/testsuite/g++.dg/modules/concept-8_b.C new file mode 100644 index 000..9a9f014ee09 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/concept-8_b.C @@ -0,0 +1,8 @@ +// { dg-require-effective-target c++20 } +// { dg-additional-options "-fmodules-ts -fconcepts -fdump-lang-module-alias -fno-module-lazy" } + +#include "concept-8.h" +import "concept-8_a.H"; + +// { dg-final { scan-lang-dump-times {named merge key \(matched\) function_decl:'::Base<::._anon_0>::__ct '} 2 module } } +// { dg-final { scan-lang-dump-not {merge key \(new\)} module } }
[PATCH] Add some new DW_IDX_* constants
I've reimplemented the .debug_names code in GDB -- it was quite far from being correct, and the new implementation is much closer to what is specified by DWARF. However, the new writer in GDB needs to emit some symbol properties, so that the reader can be fully functional. This patch adds a few new DW_IDX_* constants, and tries to document the existing extensions as well. (My patch series add more documentation of these to the GDB manual as well.) --- include/dwarf2.def | 9 + 1 file changed, 9 insertions(+) diff --git a/include/dwarf2.def b/include/dwarf2.def index 7ab3ee611fd4..75b75d901884 100644 --- a/include/dwarf2.def +++ b/include/dwarf2.def @@ -802,8 +802,17 @@ DW_IDX (DW_IDX_parent, 4) DW_IDX (DW_IDX_type_hash, 5) DW_IDX_DUP (DW_IDX_lo_user, 0x2000) DW_IDX (DW_IDX_hi_user, 0x3fff) +/* Internal linkage. A flag. */ DW_IDX (DW_IDX_GNU_internal, 0x2000) +/* External linkage. A flag. Note that gdb no longer generates this; + the default is to assume external linkage. */ DW_IDX (DW_IDX_GNU_external, 0x2001) +/* This entry is the program's entry point. A flag. */ +DW_IDX (DW_IDX_GNU_main, 0x2002) +/* Language for this entry. A DW_LANG_* value. */ +DW_IDX (DW_IDX_GNU_language, 0x2003) +/* This entry is a linkage name. A flag. */ +DW_IDX (DW_IDX_GNU_linkage_name, 0x2004) DW_END_IDX /* DWARF5 Unit type header encodings */ -- 2.43.0
[PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed long. This means that when the `arm_neon.h' header is used by the kernel, any use of the `uint64_t' / `in64_t' types needs to be correctly cast to the correct `__builtin_aarch64_simd_di' / `__builtin_aarch64_simd_df' types when calling the relevant ACLE builtins. This patch adds the necessary fixes to ensure that `vstl1_*' and `vldap1_*' intrinsics are correctly defined for use by the kernel. gcc/ChangeLog: * config/aarch64/arm_neon.h (vldap1_lane_u64): Add `const' to `__builtin_aarch64_simd_di *' cast. (vldap1q_lane_u64): Likewise. (vldap1_lane_s64): Cast __src to `const __builtin_aarch64_simd_di *'. (vldap1q_lane_s64): Likewise. (vldap1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'. (vldap1q_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'. (vldap1_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast. (vldap1q_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast. (vstl1_lane_u64): remove stray `const'. (vstl1_lane_s64): Cast __src to `__builtin_aarch64_simd_di *'. (vstl1q_lane_s64): Likewise. (vstl1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'. (vstl1q_lane_f64): Likewise. --- gcc/config/aarch64/arm_neon.h | 34 +- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ef0d75e07ce..f394de595f7 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -13456,7 +13456,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev1di_usus ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline uint64x2_t @@ -13464,35 +13464,39 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev2di_usus ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline int64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane) { - return __builtin_aarch64_vec_ldap1_lanev1di (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev1di ( + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline int64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane) { - return __builtin_aarch64_vec_ldap1_lanev2di (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev2di ( + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline float64x1_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane) { - return __builtin_aarch64_vec_ldap1_lanev1df (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev1df ( + (const __builtin_aarch64_simd_df *) __src, __vec, __lane); } __extension__ extern __inline float64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane) { - return 
__builtin_aarch64_vec_ldap1_lanev2df (__src, __vec, __lane); + return __builtin_aarch64_vec_ldap1_lanev2df ( + (const __builtin_aarch64_simd_df *) __src, __vec, __lane); } __extension__ extern __inline poly64x1_t @@ -13500,7 +13504,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1_lane_p64 (const poly64_t *__src, poly64x1_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev1di_psps ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } __extension__ extern __inline poly64x2_t @@ -13508,14 +13512,14 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vldap1q_lane_p64 (const poly64_t *__src, poly64x2_t __vec, const int __lane) { return __builtin_aarch64_vec_ldap1_lanev2di_psps ( - (__builtin_aarch64_simd_di *) __src, __vec, __lane); + (const __builtin_aarch64_simd_di *) __src, __vec, __lane); } /* vstl1_lane. */ __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) -vstl1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane) +vstl1_lane_u64 (
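A minimal usage sketch; the intrinsic signatures come from the header changes above, while the march string needed to enable them is an assumption (something like -march=armv8.2-a+rcpc3):

#include <arm_neon.h>

/* When uint64_t is 'unsigned long long' (as with the kernel's own types),
   the missing casts inside arm_neon.h made calls like these trip over
   -Wincompatible-pointer-types; with the patch they compile cleanly.  */
uint64x1_t
load_acquire_lane (const uint64_t *p, uint64x1_t v)
{
  return vldap1_lane_u64 (p, v, 0);
}

void
store_release_lane (uint64_t *p, uint64x1_t v)
{
  vstl1_lane_u64 (p, v, 0);
}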
RE: [PATCH v7] libgfortran: Replace mutex with rwlock
On 2023/12/9 23:23, Jakub Jelinek wrote: > On Sat, Dec 09, 2023 at 10:39:45AM -0500, Lipeng Zhu wrote: > > This patch tries to introduce the rwlock and split the read/write to > > unit_root tree and unit_cache with rwlock instead of the mutex to > > increase CPU efficiency. In the get_gfc_unit function, the percentage > > of calls that step into the insert_unit function is around 30%; in most > > instances, we can get the unit in the phase of reading the unit_cache > > or unit_root tree. So splitting the read/write phases with an rwlock would be an > > approach to make it more parallel. > > > > BTW, the IPC metric improves by around 9x on our test server with 220 > > cores. The benchmark we used is https://github.com/rwesson/NEAT > > > > libgcc/ChangeLog: > > > > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. > > (__gthrw): New function. > > (__gthread_rwlock_rdlock): New function. > > (__gthread_rwlock_tryrdlock): New function. > > (__gthread_rwlock_wrlock): New function. > > (__gthread_rwlock_trywrlock): New function. > > (__gthread_rwlock_unlock): New function. > > > > libgfortran/ChangeLog: > > > > * io/async.c (DEBUG_LINE): New macro. > > * io/async.h (RWLOCK_DEBUG_ADD): New macro. > > (CHECK_RDLOCK): New macro. > > (CHECK_WRLOCK): New macro. > > (TAIL_RWLOCK_DEBUG_QUEUE): New macro. > > (IN_RWLOCK_DEBUG_QUEUE): New macro. > > (RDLOCK): New macro. > > (WRLOCK): New macro. > > (RWUNLOCK): New macro. > > (RD_TO_WRLOCK): New macro. > > (INTERN_RDLOCK): New macro. > > (INTERN_WRLOCK): New macro. > > (INTERN_RWUNLOCK): New macro. > > * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in > > a comment. > > (unit_lock): Remove including associated internal_proto. > > (unit_rwlock): New declarations including associated internal_proto. > > (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock > > instead of __gthread_mutex_lock and __gthread_mutex_unlock on > > unit_lock. > > * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK > on > > unit_rwlock instead of LOCK and UNLOCK on unit_lock. > > (st_write_done_worker): Likewise. > > * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' > > comment. Use unit_rwlock variable instead of unit_lock variable. > > (get_gfc_unit_from_unit_root): New function. > > (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock > > instead of LOCK and UNLOCK on unit_lock. > > (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead > of > > LOCK and UNLOCK on unit_lock. > > (close_units): Likewise. > > (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on > > unit_lock. > > * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock > > instead of LOCK and UNLOCK on unit_lock. > > (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead > > of LOCK and UNLOCK on unit_lock. > > Ok for trunk, thanks. > > Jakub Thanks! Looking forward to landing to trunk. Lipeng Zhu
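For readers unfamiliar with the pattern, a rough sketch of the read-mostly scheme described above, using plain POSIX rwlocks instead of the libgfortran RDLOCK/WRLOCK macros (tree_lookup is a hypothetical stand-in for the unit_cache/unit_root search):

#include <pthread.h>

struct gfc_unit;                            /* opaque here */
extern struct gfc_unit *tree_lookup (int);  /* hypothetical helpers */
extern struct gfc_unit *insert_unit (int);

static pthread_rwlock_t unit_rwlock = PTHREAD_RWLOCK_INITIALIZER;

struct gfc_unit *
get_gfc_unit_sketch (int n)
{
  /* Common case (~70%): the unit already exists, so a shared read lock is
     enough and concurrent readers do not serialize against each other.  */
  pthread_rwlock_rdlock (&unit_rwlock);
  struct gfc_unit *u = tree_lookup (n);
  pthread_rwlock_unlock (&unit_rwlock);
  if (u)
    return u;

  /* Miss: take the write lock and re-check before inserting, since another
     thread may have inserted the unit in the meantime.  */
  pthread_rwlock_wrlock (&unit_rwlock);
  u = tree_lookup (n);
  if (!u)
    u = insert_unit (n);
  pthread_rwlock_unlock (&unit_rwlock);
  return u;
}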