[PATCH] Improve sequence logic in cxx_init_decl_processing
Hello, commit aa2c978400f3b3ca6e9f2d18598a379589e77ba0, introduced per https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545552.html makes references to __cxa_pure_virtual weak and this is causing issues on some VxWorks configurations, where weak symbols are only supported for one of the two major operating modes, and not on all versions. While trying to circumvent that, I noticed that the current code in cxx_init_decl_processing does something like: if (flag_weak) /* If no definition is available, resolve references to NULL. */ declare_weak (abort_fndecl); ... if (! supports_one_only ()) flag_weak = 0; The code possibly resetting flag_weak should presumlably execute before the test checking the flag, or we'd need a comment explaining why this surprising order is on purpose. The attached patch just moves the reset above the test. It bootstraps/regtests fine on x86_64-linux and allows better control on vxWorks. I'm not yet clear on some of the ramifications there (tigthening the definitions of SUPPORTS_ONE_ONLY and TARGET_SUPPORTS_WEAK yields lots of dg test failures) but that's another story. Ok to commit? Thanks in advance! 2021-12-30 Olivier Hainque gcc/ * cp/decl.c (cxx_init_decl_processing): Move code possibly altering flag_weak before code testing it. Olivier 0001-Improve-sequence-logic-in-cxx_init_decl_processing.patch Description: Binary data
[PATCH] expr: Workaround profiledbootstrap uninit false positive [PR103899]
Hi! The threader changes resulted in a false positive warning during profiledbootstrap: In file included from ../../gcc/expr.c:26: ../../gcc/tree.h: In function ‘rtx_def* expand_expr_real_1(tree, rtx, machine_mode, expand_modifier, rtx_def**, bool)’: ../../gcc/tree.h:244:56: error: ‘context’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code) |^~~~ ../../gcc/expr.c:10343:8: note: ‘context’ was declared here 10343 | tree context; |^~~ While it will be nice to improve the uninit pass to handle it if possible (I do not want to close the PR until that is done), doing profiledbootstrap is a common thing to do, so a workaround is handy, especially as in this case when the workaround seems to be the right thing to do, as it moves a variable declaration to the only place where it is set and used and avoids the weird and for uninit asking tree context; ... if (exp) context = ...; gcc_assert (!exp || use (context) || use_some_more (context)); Bootstrapped/regtested on x86_64-linux and i686-linux with normal bootstrap and on aarch64-linux and powerpc64le-linux with profiledbootstrap, ok for trunk? 2022-01-06 Jakub Jelinek PR tree-optimization/103899 * expr.c (expand_expr_real_1): Add a workaround for bogus uninit warning by moving context variable to the only spot where it is used and moving gcc_assert into if body. --- gcc/expr.c.jj 2022-01-03 10:40:41.203164211 +0100 +++ gcc/expr.c 2022-01-05 14:47:47.684660031 +0100 @@ -10340,7 +10340,6 @@ expand_expr_real_1 (tree exp, rtx target enum tree_code code = TREE_CODE (exp); rtx subtarget, original_target; int ignore; - tree context; bool reduce_bit_field; location_t loc = EXPR_LOCATION (exp); struct separate_ops ops; @@ -10579,14 +10578,16 @@ expand_expr_real_1 (tree exp, rtx target /* Variables inherited from containing functions should have been lowered by this point. */ if (exp) - context = decl_function_context (exp); - gcc_assert (!exp - || SCOPE_FILE_SCOPE_P (context) - || context == current_function_decl - || TREE_STATIC (exp) - || DECL_EXTERNAL (exp) - /* ??? C++ creates functions that are not TREE_STATIC. */ - || TREE_CODE (exp) == FUNCTION_DECL); + { + tree context = decl_function_context (exp); + gcc_assert (SCOPE_FILE_SCOPE_P (context) + || context == current_function_decl + || TREE_STATIC (exp) + || DECL_EXTERNAL (exp) + /* ??? C++ creates functions that are not +TREE_STATIC. */ + || TREE_CODE (exp) == FUNCTION_DECL); + } /* This is the case of an array whose size is to be determined from its initializer, while the initializer is still being parsed. Jakub
[PATCH] c++: Ensure some more that immediate functions aren't gimplified [PR103912]
Hi! Immediate functions should never be emitted into assembly, the FE doesn't genericize them and does various things to ensure they aren't gimplified. But the following testcase ICEs anyway due to that, because the consteval function returns a lambda, and operator() of the lambda has decl_function_context of the consteval function. cgraphunit.c then does: /* Preserve a functions function context node. It will later be needed to output debug info. */ if (tree fn = decl_function_context (decl)) { cgraph_node *origin_node = cgraph_node::get_create (fn); enqueue_node (origin_node); } which enqueues the immediate function and then tries to gimplify it, which results in ICE because it hasn't been genericized. When I try similar testcase with constexpr instead of consteval and static constinit auto instead of auto in main, what happens is that the functions are gimplified, later ipa.c discovers they aren't reachable and sets body_removed to true for them (and clears other flags) and we end up with a debug info which has the foo and bar functions without DW_AT_low_pc and other code specific attributes, just stuff from its BLOCK structure and in there the lambda with DW_AT_low_pc etc. The following patch attempts to emulate that behavior early, so that cgraph doesn't try to gimplify those and pretends they were already gimplified and found unused and optimized away. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-01-06 Jakub Jelinek PR c++/103912 * semantics.c (expand_or_defer_fn): For immediate functions, set node->body_removed to true and clear analyzed, definition and force_output. * decl2.c (c_parse_final_cleanups): Ignore immediate functions for expand_or_defer_fn. * g++.dg/cpp2a/consteval26.C: New test. --- gcc/cp/semantics.c.jj 2022-01-03 10:40:48.0 +0100 +++ gcc/cp/semantics.c 2022-01-05 12:52:11.484379138 +0100 @@ -4785,6 +4785,18 @@ expand_or_defer_fn (tree fn) emit_associated_thunks (fn); function_depth--; + + if (DECL_IMMEDIATE_FUNCTION_P (fn)) + { + cgraph_node *node = cgraph_node::get (fn); + if (node) + { + node->body_removed = true; + node->analyzed = false; + node->definition = false; + node->force_output = false; + } + } } } --- gcc/cp/decl2.c.jj 2022-01-03 10:40:48.083068010 +0100 +++ gcc/cp/decl2.c 2022-01-05 12:53:34.930204119 +0100 @@ -5272,6 +5272,7 @@ c_parse_final_cleanups (void) if (!DECL_EXTERNAL (decl) && decl_needed_p (decl) && !TREE_ASM_WRITTEN (decl) + && !DECL_IMMEDIATE_FUNCTION_P (decl) && !node->definition) { /* We will output the function; no longer consider it in this --- gcc/testsuite/g++.dg/cpp2a/consteval26.C.jj 2022-01-05 12:42:07.918878074 +0100 +++ gcc/testsuite/g++.dg/cpp2a/consteval26.C2022-01-05 12:40:18.853416637 +0100 @@ -0,0 +1,39 @@ +// PR c++/103912 +// { dg-do run { target c++20 } } +// { dg-additional-options "-O2 -g -fkeep-inline-functions" } + +extern "C" void abort (); + +struct A { A () {} }; + +consteval auto +foo () +{ + if (1) +; + return [] (A x) { return 1; }; +} + +consteval auto +bar (int a) +{ + int b = a + 4; + if (1) +; + return [=] (A x) { return a + b; }; +} + +int +main () +{ + A x; + auto h = foo (); + if (h (x) != 1) +abort (); + auto i = bar (5); + if (i (x) != 14) +abort (); + auto j = bar (42); + if (j (x) != 88) +abort (); +} Jakub
[PATCH] [i386] Optimize V16HF vector insert to element 0 for AVX2.
Also remove mode attribute blendsuf, use ssemodesuf instead. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: PR target/103753 * config/i386/i386-expand.c (ix86_expand_vector_set): Not use gen_avx2_pblendph_pblendd when elt == 0. * config/i386/sse.md (avx2_pblendph): Rename to .. (avx2_pblend_pblendd).. this, and extend to V16HI. (*avx2_pblendw): Rename to .. (*avx2_pblend): .. this, and extend to V16HF. (avx2_pblendw): Rename to .. (*avx2_pblend): .. this, and extend to V16HF. (blendsuf): Removed. (sse4_1_pblend): Renamed to .. (sse4_1_pblend): .. this. gcc/testsuite/ChangeLog: * gcc.target/i386/pr103753.c: New test. --- gcc/config/i386/i386-expand.c| 5 ++- gcc/config/i386/sse.md | 48 +++- gcc/testsuite/gcc.target/i386/pr103753.c | 17 + 3 files changed, 42 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr103753.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index e93ef1cafa6..0d219d6bb69 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -16245,10 +16245,11 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt) goto half; case E_V16HFmode: - if (TARGET_AVX2) + /* For ELT == 0, vec_setv8hf_0 can save 1 vpbroadcastw. */ + if (TARGET_AVX2 && elt != 0) { mmode = SImode; - gen_blendm = gen_avx2_pblendph; + gen_blendm = gen_avx2_pblendph_pblendd; blendm_const = true; break; } diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 033b60d9aa2..c986c73bef8 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -21291,10 +21291,7 @@ (define_insn_and_split "*_pblendvb_lt_subreg_not" (lt:VI1_AVX2 (match_dup 3) (match_dup 4))] UNSPEC_BLENDV))] "operands[3] = gen_lowpart (mode, operands[3]);") -(define_mode_attr blendsuf - [(V8HI "w") (V8HF "ph")]) - -(define_insn "sse4_1_pblend" +(define_insn "sse4_1_pblend" [(set (match_operand:V8_128 0 "register_operand" "=Yr,*x,x") (vec_merge:V8_128 (match_operand:V8_128 2 "vector_operand" "YrBm,*xBm,xm") @@ -21313,11 +21310,11 @@ (define_insn "sse4_1_pblend" (set_attr "mode" "TI")]) ;; The builtin uses an 8-bit immediate. Expand that. -(define_expand "avx2_pblendw" - [(set (match_operand:V16HI 0 "register_operand") - (vec_merge:V16HI - (match_operand:V16HI 2 "nonimmediate_operand") - (match_operand:V16HI 1 "register_operand") +(define_expand "avx2_pblend" + [(set (match_operand:V16_256 0 "register_operand") + (vec_merge:V16_256 + (match_operand:V16_256 2 "nonimmediate_operand") + (match_operand:V16_256 1 "register_operand") (match_operand:SI 3 "const_0_to_255_operand")))] "TARGET_AVX2" { @@ -21325,11 +21322,11 @@ (define_expand "avx2_pblendw" operands[3] = GEN_INT (val << 8 | val); }) -(define_expand "avx2_pblendph" - [(set (match_operand:V16HF 0 "register_operand") - (vec_merge:V16HF - (match_operand:V16HF 2 "register_operand") - (match_operand:V16HF 1 "register_operand") +(define_expand "avx2_pblend_pblendd" + [(set (match_operand:V16_256 0 "register_operand") + (vec_merge:V16_256 + (match_operand:V16_256 2 "register_operand") + (match_operand:V16_256 1 "register_operand") (match_operand:SI 3 "const_int_operand")))] "TARGET_AVX2 && !((INTVAL (operands[3]) & 0xff) && (INTVAL (operands[3]) & 0xff00))" @@ -21339,7 +21336,7 @@ (define_expand "avx2_pblendph" emit_move_insn (operands[0], operands[1]); else { - rtx tmp = gen_reg_rtx (V16HImode); + rtx tmp = gen_reg_rtx (mode); rtx blendw_idx, blendd_idx; if (mask & 0xff) @@ -21352,13 +21349,12 @@ (define_expand "avx2_pblendph" blendw_idx = GEN_INT (mask >> 8 & 0xff); blendd_idx = GEN_INT (240); } - operands[1] = lowpart_subreg (V16HImode, operands[1], V16HFmode); - operands[2] = lowpart_subreg (V16HImode, operands[2], V16HFmode); - emit_insn (gen_avx2_pblendw (tmp, operands[1], operands[2], blendw_idx)); + emit_insn (gen_avx2_pblend (tmp, operands[1], + operands[2], blendw_idx)); - operands[0] = lowpart_subreg (V8SImode, operands[0], V16HFmode); - tmp = lowpart_subreg (V8SImode, tmp, V16HImode); - operands[1] = lowpart_subreg (V8SImode, operands[1], V16HImode); + operands[0] = lowpart_subreg (V8SImode, operands[0], mode); + tmp = lowpart_subreg (V8SImode, tmp, mode); + operands[1] = lowpart_subreg (V8SImode, operands[1], mode); emit_insn (gen_avx2_pblenddv8si (operands[0], operands[1], tmp, blendd_idx)); } @@ -21366,11 +21362,11 @@ (
[PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074]
Hi! The following testcase used to be incorrectly accepted. The match.pd optimization that uses address_compare punts on folding comparison of start of one object and end of another one only when those addresses are cast to integral types, when the comparison is done on pointer types it assumes undefined behavior and decides to fold the comparison such that the addresses don't compare equal even when they at runtime they could be equal. But C++ says it is undefined behavior and so during constant evaluation we should reject those, so this patch adds !folding_initializer && check to that spot. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Note, address_compare has some special cases, e.g. it assumes that static vars are never adjacent to automatic vars, which is the case for the usual layout where automatic vars are on the stack and after .rodata/.data sections there is heap: /* Assume that automatic variables can't be adjacent to global variables. */ else if (is_global_var (base0) != is_global_var (base1)) ; Is it ok that during constant evaluation we don't treat those as undefined behavior, or shall that be with !folding_initializer && too? Another special case is: if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST) || (TREE_CODE (base0) == STRING_CST && DECL_P (base1)) || (TREE_CODE (base0) == STRING_CST && TREE_CODE (base1) == STRING_CST && ioff0 >= 0 && ioff1 >= 0 && ioff0 < TREE_STRING_LENGTH (base0) && ioff1 < TREE_STRING_LENGTH (base1) /* This is a too conservative test that the STRING_CSTs will not end up being string-merged. */ && strncmp (TREE_STRING_POINTER (base0) + ioff0, TREE_STRING_POINTER (base1) + ioff1, MIN (TREE_STRING_LENGTH (base0) - ioff0, TREE_STRING_LENGTH (base1) - ioff1)) != 0)) ; else if (!DECL_P (base0) || !DECL_P (base1)) return 2; Here we similarly assume that vars aren't adjacent to string literals or vice versa. Do we need to stick !folding_initializer && to those DECL_P vs. STRING_CST cases? Though, because of the return 2; for non-DECL_P that would mean rejecting comparisons like &var == &"foobar"[3] etc. which ought to be fine, no? So perhaps we need to watch for decls. vs. STRING_CSTs like for DECLs whether the address is at the start or at the end of the string literal or somewhere in between (at least for folding_initializer)? And yet another chapter but probably unsolvable is comparison of string literal addresses. I think pedantically in C++ &"foo"[0] == &"foo"[0] is undefined behavior, different occurences of the same string literals might still not be merged in some implementations. But constexpr const char *s = "foo"; &s[0] == &s[0] should be well defined, and we aren't tracking anywhere whether the string literal was the same one or different (and I think other compilers don't track that either). 2022-01-06 Jakub Jelinek PR c++/89074 * fold-const.c (address_compare): Punt on comparison of address of one object with address of end of another object if folding_initializer. * g++.dg/cpp1y/constexpr-89074-1.C: New test. --- gcc/fold-const.c.jj 2022-01-05 20:30:08.731806756 +0100 +++ gcc/fold-const.c2022-01-05 20:34:52.277822349 +0100 @@ -16627,7 +16627,7 @@ address_compare (tree_code code, tree ty /* If this is a pointer comparison, ignore for now even valid equalities where one pointer is the offset zero of one object and the other to one past end of another one. */ - else if (!INTEGRAL_TYPE_P (type)) + else if (!folding_initializer && !INTEGRAL_TYPE_P (type)) ; /* Assume that automatic variables can't be adjacent to global variables. */ --- gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C.jj 2022-01-05 20:43:03.696917484 +0100 +++ gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C 2022-01-05 20:42:12.676634044 +0100 @@ -0,0 +1,28 @@ +// PR c++/89074 +// { dg-do compile { target c++14 } } + +constexpr bool +foo () +{ + int a[] = { 1, 2 }; + int b[] = { 3, 4 }; + + if (&a[0] == &b[0]) +return false; + + if (&a[1] == &b[0]) +return false; + + if (&a[1] == &b[1]) +return false; + + if (&a[2] == &b[1]) +return false; + + if (&a[2] == &b[0]) // { dg-error "is not a constant expression" } +return false; + + return true; +} + +constexpr bool a = foo (); Jakub
Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator
On 1/5/22 15:36, Andrew Stubbs wrote: On 05/01/2022 13:04, Tom de Vries wrote: On 1/5/22 12:08, Tom de Vries wrote: The allocators-1.c test-case doesn't compile because: ... FAIL: libgomp.c/allocators-1.c (test for excess errors) Excess errors: /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: sorry, unimplemented: ' ' clause on 'requires' directive not supported yet UNRESOLVED: libgomp.c/allocators-1.c compilation failed to produce executable ... So, I suppose I need "[PATCH] OpenMP front-end: allow requires dynamic_allocators" as well, I'll try again with that applied. After applying that, I get: ... WARNING: program timed out. FAIL: libgomp.c/allocators-2.c execution test WARNING: program timed out. FAIL: libgomp.c/allocators-3.c execution test ... It works for me. Those tests are doing some large number of allocations repeatedly and in parallel to stress the atomics. They're also slightly longer running than the other tests. - allocators-2 calls omp_alloc 8080 times, over 16 kernel launches, some of which will fall back to PTX malloc. I've minimized the test-case by enabling a single call in main at a time. All but the last 4 take about two seconds, the last 4 hang (and time out at 5min). So, this already times out for me: ... int main () { test (1000, omp_low_lat_mem_alloc); return 0; } ... I tried playing around with the n, and roughly there's no hang below 100, and a hang above 200, and inbetween there may or may not be a hang. Again the same dynamic: if there's no hang, it just takes a few seconds. - allocators-3 calls omp_alloc and omp_free 8 million times each, over 8 kernel launches, and takes about a minute to run on my device (whether that falls back depends entirely on how the free calls interleave). Either there is a flaw in the concurrency causing some kind of deadlock, or else your timeout is set too short for your device. I hope it's the latter. We may need to tweak this. At first glance, the above behaviour doesn't look like a too short timeout. [ FTR, I'm using a GT 1030 with production branch driver version 470.86 (which is one version behind the latest 470.94) ] Thanks, - Tom
Re: [PATCH] match.pd: Simplify 1 / X for integer X [PR95424]
On Wed, 5 Jan 2022 at 17:55, Richard Biener wrote: > On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek wrote: > > > > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches > wrote: > > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew > wrote: > > > > > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > > > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > > > > here as well? > > > > > > > > Yup, GCC does simplify it that way in the end, so I didn't really > bother to simplify it here. That said, I'm open to simplifying it here as > well, but I'm not sure how to do the unsigned cast. > > > > > > You'd do sth like > > > (with > > > { tree utype = unsigned_type_for (type); } > > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { > > > build_int_cst (utype, 2); }) ...) > > > > > > extra tricky will be 1 bit integer types, I guess it might be easiest > > > to exclude them > > > and special case them separately - X / Y is always X for them I think, > > > > Note, we already have: > > /* X / bool_range_Y is X. */ > > (simplify > > (div @0 SSA_NAME@1) > > (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > >@0)) > > for those. > > Ah, it might not handle the signed : 1 case though since -1 is not in the > bool range. We could generalize the above though. > > Richard. > > > > > Jakub > > > Perhaps it is possible to exclude 1-bit cases and rely on other existing simplifications to deal with them? In particular, GCC seems to already optimally simplify the signed and unsigned 1-bit cases [1]. [1]: struct S { unsigned int x: 1; int y: 1; }; unsigned int fu(S s) { return 1 / s.x; } int fi(S s) { return 1 / s.y; } fu(S): mov eax, 1 ret fi(S): mov eax, -1 ret
Re: [committed] libstdc++: Reduce template instantiations in
On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote: Tested powerpc64le-linux, pushed to trunk. This moves the last two template parameters of __regex_algo_impl to be runtime function parameters instead, so that we don't need four different instantiations for the possible ways to call it. Most of the function (and what it instantiates) is the same in all cases, so making them compile-time choices doesn't really have much benefit. Use 'if constexpr' for conditions that check template parameters, so that when we do depend on a compile-time condition we only instantiate what we need to. libstdc++-v3/ChangeLog: * include/bits/regex.h (__regex_algo_impl): Change __policy and __match_mode template parameters to be function parameters. (regex_match, regex_search): Pass policy and match mode as function arguments. * include/bits/regex.tcc (__regex_algo_impl): Change template parameters to function parameters. * include/bits/regex_compiler.h (_RegexTranslatorBase): Use 'if constexpr' for conditions using template parameters. (_RegexTranslator): Likewise. * include/bits/regex_executor.tcc (_Executor::_M_handle_accept): Likewise. * testsuite/util/testsuite_regex.h (regex_match_debug) (regex_search_debug): Move template arguments to function arguments. --- libstdc++-v3/include/bits/regex.h | 33 +-- libstdc++-v3/include/bits/regex.tcc | 8 ++--- libstdc++-v3/include/bits/regex_compiler.h| 9 ++--- libstdc++-v3/include/bits/regex_executor.tcc | 2 +- libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++--- 5 files changed, 37 insertions(+), 39 deletions(-) Clang now fails #include with In file included from gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66: gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9: error: unknown type name '_RegexExecutorPolicy'; did you mean '__detail::_RegexExecutorPolicy'? _RegexExecutorPolicy, bool); ^ gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: note: '__detail::_RegexExecutorPolicy' declared here enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; ^ gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9: error: unknown type name '_RegexExecutorPolicy'; did you mean '__detail::_RegexExecutorPolicy'? _RegexExecutorPolicy, bool); ^ gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: note: '__detail::_RegexExecutorPolicy' declared here enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; ^ and diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h index 7480b0a5f97..46c168010bf 100644 --- a/libstdc++-v3/include/bits/regex.h +++ b/libstdc++-v3/include/bits/regex.h @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, const basic_regex<_Cp, _Rp>&, regex_constants::match_flag_type, - _RegexExecutorPolicy, bool); + __detail::_RegexExecutorPolicy, bool); template friend class __detail::_Executor; @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, const basic_regex<_Cp, _Rp>&, regex_constants::match_flag_type, - _RegexExecutorPolicy, bool); + __detail::_RegexExecutorPolicy, bool); // Reset contents to __size unmatched sub_match objects // (plus additional objects for prefix, suffix and unmatched sub). would fix that.
Re: [PATCH] match.pd: Simplify 1 / X for integer X [PR95424]
On Thu, Jan 06, 2022 at 05:52:03PM +0800, Zhao Wei Liew wrote: > On Wed, 5 Jan 2022 at 17:55, Richard Biener > wrote: > > > On Wed, Jan 5, 2022 at 10:42 AM Jakub Jelinek wrote: > > > > > > On Wed, Jan 05, 2022 at 10:38:53AM +0100, Richard Biener via Gcc-patches > > wrote: > > > > On Wed, Jan 5, 2022 at 10:18 AM Zhao Wei Liew > > wrote: > > > > > > > > > > > X >= -1 && X <= 1 is (hopefully?) going to be simplified > > > > > > as (unsigned)X + 1 <= 2, it might be good to simplify it this way > > > > > > here as well? > > > > > > > > > > Yup, GCC does simplify it that way in the end, so I didn't really > > bother to simplify it here. That said, I'm open to simplifying it here as > > well, but I'm not sure how to do the unsigned cast. > > > > > > > > You'd do sth like > > > > (with > > > > { tree utype = unsigned_type_for (type); } > > > > (cond (le (plus (convert:utype @0) { build_one_cst (utype); }) { > > > > build_int_cst (utype, 2); }) ...) > > > > > > > > extra tricky will be 1 bit integer types, I guess it might be easiest > > > > to exclude them > > > > and special case them separately - X / Y is always X for them I think, > > > > > > Note, we already have: > > > /* X / bool_range_Y is X. */ > > > (simplify > > > (div @0 SSA_NAME@1) > > > (if (INTEGRAL_TYPE_P (type) && ssa_name_has_boolean_range (@1)) > > >@0)) > > > for those. > > > > Ah, it might not handle the signed : 1 case though since -1 is not in the > > bool range. We could generalize the above though. > > > > Richard. > > > > > > > > Jakub > > > > > > > Perhaps it is possible to exclude 1-bit cases and rely on other existing > simplifications to deal with them? > In particular, GCC seems to already optimally simplify the signed and > unsigned 1-bit cases [1]. Sure, I thought that was what Richi was suggesting, so ad some && TYPE_PRECISION (...) > 1 somewhere (after making sure it is INTEGRAL_TYPE_P), or element_precision if it is ANY_INTEGRAL_TYPE_P and could be vector type. The discussion was about whether we optimize even those : 1 cases sufficiently but that can be done in different simplifications... Jakub
Re: [committed] libstdc++: Reduce template instantiations in
On Thu, 6 Jan 2022 at 10:00, Stephan Bergmann wrote: > On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote: > > Tested powerpc64le-linux, pushed to trunk. > > > > > > This moves the last two template parameters of __regex_algo_impl to be > > runtime function parameters instead, so that we don't need four > > different instantiations for the possible ways to call it. Most of the > > function (and what it instantiates) is the same in all cases, so making > > them compile-time choices doesn't really have much benefit. > > > > Use 'if constexpr' for conditions that check template parameters, so > > that when we do depend on a compile-time condition we only instantiate > > what we need to. > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/regex.h (__regex_algo_impl): Change __policy and > > __match_mode template parameters to be function parameters. > > (regex_match, regex_search): Pass policy and match mode as > > function arguments. > > * include/bits/regex.tcc (__regex_algo_impl): Change template > > parameters to function parameters. > > * include/bits/regex_compiler.h (_RegexTranslatorBase): Use > > 'if constexpr' for conditions using template parameters. > > (_RegexTranslator): Likewise. > > * include/bits/regex_executor.tcc (_Executor::_M_handle_accept): > > Likewise. > > * testsuite/util/testsuite_regex.h (regex_match_debug) > > (regex_search_debug): Move template arguments to function > > arguments. > > --- > > libstdc++-v3/include/bits/regex.h | 33 +-- > > libstdc++-v3/include/bits/regex.tcc | 8 ++--- > > libstdc++-v3/include/bits/regex_compiler.h| 9 ++--- > > libstdc++-v3/include/bits/regex_executor.tcc | 2 +- > > libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++--- > > 5 files changed, 37 insertions(+), 39 deletions(-) > > Clang now fails #include with > > > In file included from > gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66: > > > gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9: > error: unknown type name '_RegexExecutorPolicy'; did you mean > '__detail::_RegexExecutorPolicy'? > > _RegexExecutorPolicy, bool); > > ^ > > > gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: > note: '__detail::_RegexExecutorPolicy' declared here > > enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; > > ^ > > > gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9: > error: unknown type name '_RegexExecutorPolicy'; did you mean > '__detail::_RegexExecutorPolicy'? > > _RegexExecutorPolicy, bool); > > ^ > > > gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: > note: '__detail::_RegexExecutorPolicy' declared here > > enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; > > ^ > > and > > > diff --git a/libstdc++-v3/include/bits/regex.h > b/libstdc++-v3/include/bits/regex.h > > index 7480b0a5f97..46c168010bf 100644 > > --- a/libstdc++-v3/include/bits/regex.h > > +++ b/libstdc++-v3/include/bits/regex.h > > @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, > > const basic_regex<_Cp, _Rp>&, > > regex_constants::match_flag_type, > > - _RegexExecutorPolicy, bool); > > + __detail::_RegexExecutorPolicy, > bool); > > > >template > > friend class __detail::_Executor; > > @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 > > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, > > const basic_regex<_Cp, _Rp>&, > > regex_constants::match_flag_type, > > - _RegexExecutorPolicy, bool); > > + __detail::_RegexExecutorPolicy, > bool); > > > >// Reset contents to __size unmatched sub_match objects > >// (plus additional objects for prefix, suffix and unmatched sub). > > would fix that. > I'll make the change, but this looks like a clang bug: https://godbolt.org/z/bozxYErrc
[PATCH v2] match.pd: Simplify 1 / X for integer X [PR95424]
This patch implements an optimization for the following C++ code: int f(int x) { return 1 / x; } int f(unsigned int x) { return 1 / x; } Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly: f(int): xor edx, edx mov eax, 1 idiv edi ret f(unsigned int): xor edx, edx mov eax, 1 div edi ret In comparison, clang++ -std=c++20 -O3 produces the following assembly: f(int): lea ecx, [rdi + 1] xor eax, eax cmp ecx, 3 cmovb eax, edi ret f(unsigned int): xor eax, eax cmp edi, 1 sete al ret Clang's output is more efficient as it avoids expensive div operations. With this patch, GCC now produces the following assembly: f(int): lea eax, [rdi + 1] cmp eax, 2 mov eax, 0 cmovbe eax, edi ret f(unsigned int): xor eax, eax cmp edi, 1 sete al ret which is virtually identical to Clang's assembly output. Any slight differences in the output for f(int) is possibly related to a different missed optimization. v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html Changes from v1: 1. Refactor common if conditions. 2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type. 3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1. gcc/ChangeLog: * match.pd: Simplify 1 / X where X is an integer. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/divide-6.c: New test. * gcc.dg/tree-ssa/divide-7.c: New test. --- gcc/match.pd | 15 +++ gcc/testsuite/gcc.dg/tree-ssa/divide-6.c | 9 + gcc/testsuite/gcc.dg/tree-ssa/divide-7.c | 9 + 3 files changed, 33 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-6.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-7.c diff --git a/gcc/match.pd b/gcc/match.pd index 84c9b918041..52a0f77f455 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -432,6 +432,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && TYPE_UNSIGNED (type)) (trunc_div @0 @1))) + /* 1 / X -> X == 1 for unsigned integer X. +1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X. +But not for 1 / 0 so that we can get proper warnings and errors, +and not for 1-bit integers as they are edge cases better handled elsewhere. */ +(simplify + (trunc_div integer_onep@0 @1) + (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION (type) > 1) +(switch + (if (TYPE_UNSIGNED (type)) +(eq @1 { build_one_cst (type); })) + (if (!TYPE_UNSIGNED (type)) +(with { tree utype = unsigned_type_for (type); } + (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) { build_int_cst (utype, 2); }) +@1 { build_zero_cst (type); })) + /* Combine two successive divisions. Note that combining ceil_div and floor_div is trickier and combining round_div even more so. */ (for div (trunc_div exact_div) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c new file mode 100644 index 000..a9fc4c04058 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +unsigned int f(unsigned int x) { + return 1 / x; +} + +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c new file mode 100644 index 000..285279af7c2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +int f(int x) { + return 1 / x; +} + +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */ +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */ -- 2.17.1
Re: [committed] libstdc++: Reduce template instantiations in
On Thu, 6 Jan 2022 at 10:33, Jonathan Wakely wrote: > > > On Thu, 6 Jan 2022 at 10:00, Stephan Bergmann wrote: > >> On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote: >> > Tested powerpc64le-linux, pushed to trunk. >> > >> > >> > This moves the last two template parameters of __regex_algo_impl to be >> > runtime function parameters instead, so that we don't need four >> > different instantiations for the possible ways to call it. Most of the >> > function (and what it instantiates) is the same in all cases, so making >> > them compile-time choices doesn't really have much benefit. >> > >> > Use 'if constexpr' for conditions that check template parameters, so >> > that when we do depend on a compile-time condition we only instantiate >> > what we need to. >> > >> > libstdc++-v3/ChangeLog: >> > >> > * include/bits/regex.h (__regex_algo_impl): Change __policy and >> > __match_mode template parameters to be function parameters. >> > (regex_match, regex_search): Pass policy and match mode as >> > function arguments. >> > * include/bits/regex.tcc (__regex_algo_impl): Change template >> > parameters to function parameters. >> > * include/bits/regex_compiler.h (_RegexTranslatorBase): Use >> > 'if constexpr' for conditions using template parameters. >> > (_RegexTranslator): Likewise. >> > * include/bits/regex_executor.tcc (_Executor::_M_handle_accept): >> > Likewise. >> > * testsuite/util/testsuite_regex.h (regex_match_debug) >> > (regex_search_debug): Move template arguments to function >> > arguments. >> > --- >> > libstdc++-v3/include/bits/regex.h | 33 +-- >> > libstdc++-v3/include/bits/regex.tcc | 8 ++--- >> > libstdc++-v3/include/bits/regex_compiler.h| 9 ++--- >> > libstdc++-v3/include/bits/regex_executor.tcc | 2 +- >> > libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++--- >> > 5 files changed, 37 insertions(+), 39 deletions(-) >> >> Clang now fails #include with >> >> > In file included from >> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66: >> > >> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9: >> error: unknown type name '_RegexExecutorPolicy'; did you mean >> '__detail::_RegexExecutorPolicy'? >> > _RegexExecutorPolicy, bool); >> > ^ >> > >> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: >> note: '__detail::_RegexExecutorPolicy' declared here >> > enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; >> > ^ >> > >> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9: >> error: unknown type name '_RegexExecutorPolicy'; did you mean >> '__detail::_RegexExecutorPolicy'? >> > _RegexExecutorPolicy, bool); >> > ^ >> > >> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: >> note: '__detail::_RegexExecutorPolicy' declared here >> > enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; >> > ^ >> >> and >> >> > diff --git a/libstdc++-v3/include/bits/regex.h >> b/libstdc++-v3/include/bits/regex.h >> > index 7480b0a5f97..46c168010bf 100644 >> > --- a/libstdc++-v3/include/bits/regex.h >> > +++ b/libstdc++-v3/include/bits/regex.h >> > @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 >> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, >> > const basic_regex<_Cp, _Rp>&, >> > regex_constants::match_flag_type, >> > - _RegexExecutorPolicy, bool); >> > + __detail::_RegexExecutorPolicy, >> bool); >> > >> >template >> > friend class __detail::_Executor; >> > @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 >> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, >> > const basic_regex<_Cp, _Rp>&, >> > regex_constants::match_flag_type, >> > - _RegexExecutorPolicy, bool); >> > + __detail::_RegexExecutorPolicy, >> bool); >> > >> >// Reset contents to __size unmatched sub_match objects >> >// (plus additional objects for prefix, suffix and unmatched >> sub). >> >> would fix that. >> > > > I'll make the change, but this looks like a clang bug: > https://godbolt.org/z/bozxYErrc > Maybe this one: https://github.com/llvm/llvm-project/issues/38230
Re: [PATCH] expr: Workaround profiledbootstrap uninit false positive [PR103899]
On January 6, 2022 9:38:40 AM GMT+01:00, Jakub Jelinek via Gcc-patches wrote: >Hi! > >The threader changes resulted in a false positive warning during >profiledbootstrap: >In file included from ../../gcc/expr.c:26: >../../gcc/tree.h: In function ‘rtx_def* expand_expr_real_1(tree, rtx, >machine_mode, expand_modifier, rtx_def**, bool)’: >../../gcc/tree.h:244:56: error: ‘context’ may be used uninitialized in this >function [-Werror=maybe-uninitialized] > 244 | #define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code) > |^~~~ >../../gcc/expr.c:10343:8: note: ‘context’ was declared here >10343 | tree context; > |^~~ >While it will be nice to improve the uninit pass to handle it if possible >(I do not want to close the PR until that is done), doing profiledbootstrap >is a common thing to do, so a workaround is handy, especially as in this >case when the workaround seems to be the right thing to do, as it moves >a variable declaration to the only place where it is set and used and avoids >the weird and for uninit asking > tree context; >... > if (exp) >context = ...; > gcc_assert (!exp > || use (context) > || use_some_more (context)); > >Bootstrapped/regtested on x86_64-linux and i686-linux with normal bootstrap >and on aarch64-linux and powerpc64le-linux with profiledbootstrap, ok for >trunk? Ok. Richard. >2022-01-06 Jakub Jelinek > > PR tree-optimization/103899 > * expr.c (expand_expr_real_1): Add a workaround for bogus uninit > warning by moving context variable to the only spot where it is used > and moving gcc_assert into if body. > >--- gcc/expr.c.jj 2022-01-03 10:40:41.203164211 +0100 >+++ gcc/expr.c 2022-01-05 14:47:47.684660031 +0100 >@@ -10340,7 +10340,6 @@ expand_expr_real_1 (tree exp, rtx target > enum tree_code code = TREE_CODE (exp); > rtx subtarget, original_target; > int ignore; >- tree context; > bool reduce_bit_field; > location_t loc = EXPR_LOCATION (exp); > struct separate_ops ops; >@@ -10579,14 +10578,16 @@ expand_expr_real_1 (tree exp, rtx target > /* Variables inherited from containing functions should have >been lowered by this point. */ > if (exp) >- context = decl_function_context (exp); >- gcc_assert (!exp >-|| SCOPE_FILE_SCOPE_P (context) >-|| context == current_function_decl >-|| TREE_STATIC (exp) >-|| DECL_EXTERNAL (exp) >-/* ??? C++ creates functions that are not TREE_STATIC. */ >-|| TREE_CODE (exp) == FUNCTION_DECL); >+ { >+tree context = decl_function_context (exp); >+gcc_assert (SCOPE_FILE_SCOPE_P (context) >+|| context == current_function_decl >+|| TREE_STATIC (exp) >+|| DECL_EXTERNAL (exp) >+/* ??? C++ creates functions that are not >+ TREE_STATIC. */ >+|| TREE_CODE (exp) == FUNCTION_DECL); >+ } > > /* This is the case of an array whose size is to be determined >from its initializer, while the initializer is still being parsed. > > Jakub >
Re: [PATCH] [RTL/fwprop] Allow propagations from inner loop to outer loop.
On January 6, 2022 7:51:48 AM GMT+01:00, liuhongt wrote: >> that's flow_loop_nested_p (loop *outer, loop *inner) which >> is implemented in O(1). Note behavior for outer == inner >> might be different (didn't check your implementation too hard) >> >Thanks, it seems flow_loop_nested_p assume outer and inner not to be >NULL. So I add some conditions to check NULL which is considered as an outer > loop of any other loop. Huh, loop_father should never be NULL. Maybe when fwprop is run after RTL loop opts you instead want to add a check for current_loops or alternelatively initialize loops in fwprop. > > >gcc/ChangeLog: > > PR rtl/103750 > * fwprop.c (forward_propagate_into): Allow propagations from > inner loop to outer loop. > >gcc/testsuite/ChangeLog: > > * g++.target/i386/pr103750-fwprop-1.C: New test. >--- > gcc/fwprop.c | 7 +++-- > .../g++.target/i386/pr103750-fwprop-1.C | 26 +++ > 2 files changed, 31 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C > >diff --git a/gcc/fwprop.c b/gcc/fwprop.c >index 2eab4fd4614..af2e9d1c189 100644 >--- a/gcc/fwprop.c >+++ b/gcc/fwprop.c >@@ -866,10 +866,13 @@ forward_propagate_into (use_info *use, bool >reg_prop_only = false) > rtx src = SET_SRC (def_set); > > /* Allow propagations into a loop only for reg-to-reg copies, since >- replacing one register by another shouldn't increase the cost. */ >+ replacing one register by another shouldn't increase the cost. >+ Propagations from inner loop to outer loop should be also ok. */ > struct loop *def_loop = def_insn->bb ()->cfg_bb ()->loop_father; > struct loop *use_loop = use->bb ()->cfg_bb ()->loop_father; >- if ((reg_prop_only || def_loop != use_loop) >+ if ((reg_prop_only >+ || (use_loop && def_loop != use_loop >+ &&(!def_loop || !flow_loop_nested_p (use_loop, def_loop > && (!reg_single_def_p (dest) || !reg_single_def_p (src))) > return false; > >diff --git a/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C >b/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C >new file mode 100644 >index 000..26987d307aa >--- /dev/null >+++ b/gcc/testsuite/g++.target/i386/pr103750-fwprop-1.C >@@ -0,0 +1,26 @@ >+/* PR target/103750. */ >+/* { dg-do compile } */ >+/* { dg-options "-O2 -std=c++1y -march=cannonlake -fdump-rtl-fwprop1" } */ >+/* { dg-final { scan-rtl-dump-not "subreg:HI\[ >\\\(\]*reg:SI\[^\n]*\n\[^\n]*UNSPEC_TZCNT" "fwprop1" } } */ >+ >+#include >+const char16_t *qustrchr(char16_t *n, char16_t *e, char16_t c) noexcept >+{ >+ __m256i mch256 = _mm256_set1_epi16(c); >+ for ( ; n < e; n += 32) { >+__m256i data1 = _mm256_loadu_si256(reinterpret_cast(n)); >+__m256i data2 = _mm256_loadu_si256(reinterpret_cast(n) + >1); >+__mmask16 mask1 = _mm256_cmpeq_epu16_mask(data1, mch256); >+__mmask16 mask2 = _mm256_cmpeq_epu16_mask(data2, mch256); >+if (_kortestz_mask16_u8(mask1, mask2)) >+ continue; >+ >+unsigned idx = _tzcnt_u32(mask1); >+if (mask1 == 0) { >+ idx = __tzcnt_u16(mask2); >+ n += 16; >+} >+return n + idx; >+ } >+ return e; >+}
Re: [PATCH] LTO plugin: modernize a bit.
On 1/5/22 18:43, Jeff Law wrote: But the timing isn't great. We're in stage3 and presumably moving into stage4 shortly. Shouldn't this wait until stage1 re-opens? Agree, I'm going to prepare the changes for GCC 13, same as LTO plugin support for ld.mold. Cheers, Martin
[PATCH] [i386] Support commutative alternative for AVX512 vpcmpeq{b, w, d, q}
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/sse.md (*_eq3_1): Extend to UNSPEC_PCMP_UNSIGNED. gcc/testsuite/ChangeLog: * gcc.target/i386/pr103774.c: New test. * gcc.target/i386/avx512bw-vpcmpequb-1.c: Adjust scan assembler from vpcmpub to (?:vpcmpub|vpcmpeqb). * gcc.target/i386/avx512bw-vpcmpequw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpub-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpuw-1.c: Ditto. * gcc.target/i386/avx512f-vpcmpequd-1.c: Ditto. * gcc.target/i386/avx512f-vpcmpequq-1.c: Ditto. * gcc.target/i386/avx512f-vpcmpud-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpequd-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpequq-1.c: Ditto. * gcc.target/i386/avx512vl-vpcmpuq-1.c: Ditto. --- gcc/config/i386/sse.md| 64 +-- .../gcc.target/i386/avx512bw-vpcmpequb-1.c| 12 ++-- .../gcc.target/i386/avx512bw-vpcmpequw-1.c| 12 ++-- .../gcc.target/i386/avx512bw-vpcmpub-1.c | 2 +- .../gcc.target/i386/avx512bw-vpcmpuw-1.c | 2 +- .../gcc.target/i386/avx512f-vpcmpequd-1.c | 4 +- .../gcc.target/i386/avx512f-vpcmpequq-1.c | 4 +- .../gcc.target/i386/avx512f-vpcmpud-1.c | 2 +- .../gcc.target/i386/avx512vl-vpcmpequd-1.c| 8 +-- .../gcc.target/i386/avx512vl-vpcmpequq-1.c| 8 +-- .../gcc.target/i386/avx512vl-vpcmpuq-1.c | 2 +- gcc/testsuite/gcc.target/i386/pr103774.c | 25 12 files changed, 85 insertions(+), 60 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr103774.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index fa1d56ae3e3..35360004818 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3895,6 +3895,22 @@ (define_insn_and_split "*_cmp3" UNSPEC_PCMP_ITER))] "operands[4] = GEN_INT (INTVAL (operands[3]) ^ 4);") +(define_insn "*_eq3_1" + [(set (match_operand: 0 "register_operand" "=k,k") + (unspec: + [(match_operand:VI12_AVX512VL 1 "nonimm_or_0_operand" "%v,v") + (match_operand:VI12_AVX512VL 2 "nonimm_or_0_operand" "vm,C") + (const_int 0)] + UNSPEC_PCMP_ITER))] + "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))" + "@ + vpcmpeq\t{%2, %1, %0|%0, %1, %2} + vptestnm\t{%1, %1, %0|%0, %1, %1}" + [(set_attr "type" "ssecmp") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_ucmp3" [(set (match_operand: 0 "register_operand" "=k") (unspec: @@ -3977,6 +3993,22 @@ (define_insn_and_split "*_ucmp3_zero_extend")]) +(define_insn "*_eq3_1" + [(set (match_operand: 0 "register_operand" "=k,k") + (unspec: + [(match_operand:VI48_AVX512VL 1 "nonimm_or_0_operand" "%v,v") + (match_operand:VI48_AVX512VL 2 "nonimm_or_0_operand" "vm,C") + (const_int 0)] + UNSPEC_PCMP_ITER))] + "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" + "@ + vpcmpeq\t{%2, %1, %0|%0, %1, %2} + vptestnm\t{%1, %1, %0|%0, %1, %1}" + [(set_attr "type" "ssecmp") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_ucmp3" [(set (match_operand: 0 "register_operand" "=k") (unspec: @@ -16248,38 +16280,6 @@ (define_expand "_eq3" "TARGET_AVX512F" "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);") -(define_insn "*_eq3_1" - [(set (match_operand: 0 "register_operand" "=k,k") - (unspec: - [(match_operand:VI12_AVX512VL 1 "nonimm_or_0_operand" "%v,v") - (match_operand:VI12_AVX512VL 2 "nonimm_or_0_operand" "vm,C") - (const_int 0)] - UNSPEC_PCMP))] - "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))" - "@ - vpcmpeq\t{%2, %1, %0|%0, %1, %2} - vptestnm\t{%1, %1, %0|%0, %1, %1}" - [(set_attr "type" "ssecmp") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "evex") - (set_attr "mode" "")]) - -(define_insn "*_eq3_1" - [(set (match_operand: 0 "register_operand" "=k,k") - (unspec: - [(match_operand:VI48_AVX512VL 1 "nonimm_or_0_operand" "%v,v") - (match_operand:VI48_AVX512VL 2 "nonimm_or_0_operand" "vm,C") - (const_int 0)] - UNSPEC_PCMP))] - "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" - "@ - vpcmpeq\t{%2, %1, %0|%0, %1, %2} - vptestnm\t{%1, %1, %0|%0, %1, %1}" - [(set_attr "type" "ssecmp") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "evex") - (set_attr "mode" "")]) - (define_insn "*sse4_1_eqv2di3" [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x") (eq:V2DI diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-vpcmpequb-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-vpcmpequb-1.c index 31a0afc2f17..9a85753c9fb 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bw-vpcmpequb-1.c +++
Re: [PATCH] Libquadmath: add nansq() function
Hi Joseph, > All targets with _Float128 should have __builtin_nansf128, since we have > DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NANS, "nans", NAN_TYPE, > ATTR_CONST_NOTHROW_NONNULL) > in builtins.def. Hum, I see, I didn’t know that version existed. To be honest, I find the “other built-ins” doc page to be quite confusing to read: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html Thanks! FX
Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]
On Sun, 2021-12-19 at 22:30 +0100, Marc Nieper-Wißkirchen wrote: > This patch fixes a memory leak in the pass manager. In the existing > code, > the m_name_to_pass_map is allocated in > pass_manager::register_pass_name, but > never deallocated. This is fixed by adding a deletion in > pass_manager::~pass_manager. Moreover the string keys in > m_name_to_pass_map are > all dynamically allocated. To free them, this patch adds a new hash > trait for > string hashes that are to be freed when the corresponding hash entry > is removed. > > This fix is particularly relevant for using GCC as a library through > libgccjit. > The memory leak also occurs when libgccjit is instructed to use an > external > driver. > > Before the patch, compiling the hello world example of libgccjit with > the external driver under Valgrind shows a loss of 12,611 (48 direct) > bytes. After the patch, no memory leaks are reported anymore. > (Memory leaks occurring when using the internal driver are mostly in > the driver code in gcc/gcc.c and have to be fixed separately.) > > The patch has been tested by fully bootstrapping the compiler with > the > frontends C, C++, Fortran, LTO, ObjC, JIT and running the test suite > under a x86_64-pc-linux-gnu host. Thanks for the patch. It looks correct to me, given that pass_manager::register_pass_name does an xstrdup and puts the result in the map. That said: - I'm not officially a reviewer for this part of gcc (though I probably touched this code last) - is it cleaner to instead change m_name_to_pass_map's key type from const char * to char *, to convey that the map "owns" the name? That way we probably wouldn't need struct typed_const_free_remove, and (I hope) works better with the type system. Dave > > gcc/ChangeLog: > > PR jit/63854 > * hash-traits.h (struct typed_const_free_remove): New. > (struct free_string_hash): New. > * pass_manager.h: Use free_string_hash. > * passes.c (pass_manager::register_pass_name): Use > free_string_hash. > (pass_manager::~pass_manager): Delete allocated > m_name_to_pass_map. > --- > gcc/hash-traits.h | 17 + > gcc/pass_manager.h | 3 +-- > gcc/passes.c | 5 +++-- > 3 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/gcc/hash-traits.h b/gcc/hash-traits.h > index 6f0373ec27f..1c08d6874ab 100644 > --- a/gcc/hash-traits.h > +++ b/gcc/hash-traits.h > @@ -28,6 +28,11 @@ struct typed_free_remove > static inline void remove (Type *p); > }; > > +template > +struct typed_const_free_remove > +{ > + static inline void remove (const Type *p); > +}; > > /* Remove with free. */ > > @@ -38,6 +43,13 @@ typed_free_remove ::remove (Type *p) > free (p); > } > > +template > +inline void > +typed_const_free_remove ::remove (const Type *p) > +{ > + free (const_cast (p)); > +} > + > /* Helpful type for removing with delete. */ > > template > @@ -305,6 +317,11 @@ struct ggc_ptr_hash : pointer_hash , > ggc_remove {}; > template > struct ggc_cache_ptr_hash : pointer_hash , ggc_cache_remove > {}; > > +/* Traits for string elements that should be freed when an element > is > + deleted. */ > + > +struct free_string_hash : string_hash, typed_const_free_remove > {}; > + > /* Traits for string elements that should not be freed when an > element > is deleted. */ > > diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h > index aaf72cf6803..f5615e1fda8 100644 > --- a/gcc/pass_manager.h > +++ b/gcc/pass_manager.h > @@ -106,7 +106,7 @@ private: > > private: > context *m_ctxt; > - hash_map *m_name_to_pass_map; > + hash_map *m_name_to_pass_map; > > /* References to all of the individual passes. > These fields are generated via macro expansion. > @@ -146,4 +146,3 @@ private: > } // namespace gcc > > #endif /* ! GCC_PASS_MANAGER_H */ > - > diff --git a/gcc/passes.c b/gcc/passes.c > index 4bea6ae5b6a..0c70ece5321 100644 > --- a/gcc/passes.c > +++ b/gcc/passes.c > @@ -903,7 +903,7 @@ void > pass_manager::register_pass_name (opt_pass *pass, const char *name) > { > if (!m_name_to_pass_map) > - m_name_to_pass_map = new hash_map *> (256); > + m_name_to_pass_map = new hash_map > (256); > > if (m_name_to_pass_map->get (name)) > return; /* Ignore plugin passes. */ > @@ -1674,6 +1674,7 @@ pass_manager::~pass_manager () > GCC_PASS_LISTS > #undef DEF_PASS_LIST > > + delete m_name_to_pass_map; > } > > /* If we are in IPA mode (i.e., current_function_decl is NULL), call > @@ -1943,7 +1944,7 @@ pass_manager::dump_profile_report () const > " |in count |out > prob " > "|in count |out prob " > "|size |time |\n"); > - > + > for (int i = 1; i < passes_by_id_size; i++) > if (profile_record[i].run) > {
Re: [PATCH] gcc: pass-manager: Fix memory leak. [PR jit/63854]
On Thu, 2022-01-06 at 08:53 -0500, David Malcolm wrote: > On Sun, 2021-12-19 at 22:30 +0100, Marc Nieper-Wißkirchen wrote: > > This patch fixes a memory leak in the pass manager. In the existing > > code, > > the m_name_to_pass_map is allocated in > > pass_manager::register_pass_name, but > > never deallocated. This is fixed by adding a deletion in > > pass_manager::~pass_manager. Moreover the string keys in > > m_name_to_pass_map are > > all dynamically allocated. To free them, this patch adds a new hash > > trait for > > string hashes that are to be freed when the corresponding hash entry > > is removed. > > > > This fix is particularly relevant for using GCC as a library through > > libgccjit. > > The memory leak also occurs when libgccjit is instructed to use an > > external > > driver. > > > > Before the patch, compiling the hello world example of libgccjit with > > the external driver under Valgrind shows a loss of 12,611 (48 direct) > > bytes. After the patch, no memory leaks are reported anymore. > > (Memory leaks occurring when using the internal driver are mostly in > > the driver code in gcc/gcc.c and have to be fixed separately.) > > > > The patch has been tested by fully bootstrapping the compiler with > > the > > frontends C, C++, Fortran, LTO, ObjC, JIT and running the test suite > > under a x86_64-pc-linux-gnu host. > > Thanks for the patch. > > It looks correct to me, given that pass_manager::register_pass_name > does an xstrdup and puts the result in the map. > > That said: > - I'm not officially a reviewer for this part of gcc (though I probably > touched this code last) > - is it cleaner to instead change m_name_to_pass_map's key type from > const char * to char *, to convey that the map "owns" the name? That > way we probably wouldn't need struct typed_const_free_remove, and (I > hope) works better with the type system. [...snip...] > > > diff --git a/gcc/passes.c b/gcc/passes.c > > index 4bea6ae5b6a..0c70ece5321 100644 > > --- a/gcc/passes.c > > +++ b/gcc/passes.c [...snip...] > > @@ -1943,7 +1944,7 @@ pass_manager::dump_profile_report () const > > " |in count |out > > prob " > > "|in count |out prob " > > "|size |time |\n"); > > - > > + > > for (int i = 1; i < passes_by_id_size; i++) > > if (profile_record[i].run) > > { > ...and there's a stray whitespace change here (in pass_manager::dump_profile_report), which probably shouldn't be in the patch. Dave
PING (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))
On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote: > This patch adds a new __attribute__ ((tainted)) to the C/C++ > frontends. Ping for GCC C/C++ mantainers for review of the C/C++ FE parts of this patch (attribute registration, documentation, the name of the attribute, etc). (I believe it's independent of the rest of the patch kit, in that it could go into trunk without needing the prior patches) Thanks Dave > > It can be used on function decls: the analyzer will treat as tainted > all parameters to the function and all buffers pointed to by > parameters > to the function. Adding this in one place to the Linux kernel's > __SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as > having tainted inputs. This gives additional testing beyond e.g. > __user > pointers added by earlier patches - an example of the use of this can > be > seen in CVE-2011-2210, where given: > > SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, > buffer, > unsigned long, nbytes, int __user *, start, void > __user *, arg) > > the analyzer will treat the nbytes param as under attacker control, > and > can complain accordingly: > > taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’: > taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled > value > ‘nbytes’ as size without upper-bounds checking [CWE-129] [- > Wanalyzer-tainted-size] > 69 | if (copy_to_user(buffer, hwrpb, nbytes) != 0) > | ^~~ > > Additionally, the patch allows the attribute to be used on field > decls: > specifically function pointers. Any function used as an initializer > for such a field gets treated as tainted. An example can be seen in > CVE-2020-13143, where adding __attribute__((tainted)) to the "store" > callback of configfs_attribute: > > struct configfs_attribute { > /* [...snip...] */ > ssize_t (*store)(struct config_item *, const char *, size_t) > __attribute__((tainted)); > /* [...snip...] */ > }; > > allows the analyzer to see: > > CONFIGFS_ATTR(gadget_dev_desc_, UDC); > > and treat gadget_dev_desc_UDC_store as tainted, so that it complains: > > taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’: > taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled > value > ‘len + 18446744073709551615’ as offset without upper-bounds > checking [CWE-823] [-Wanalyzer-tainted-offset] > 33 | if (name[len - 1] == '\n') > | ^ > > Similarly, the attribute could be used on the ioctl callback field, > USB device callbacks, network-handling callbacks etc. This > potentially > gives a lot of test coverage with relatively little code annotation, > and > without necessarily needing link-time analysis (which -fanalyzer can > only do at present on trivial examples). > > I believe this is the first time we've had an attribute on a field. > If that's an issue, I could prepare a version of the patch that > merely allowed it on functions themselves. > > As before this currently still needs -fanalyzer-checker=taint (in > addition to -fanalyzer). > > gcc/analyzer/ChangeLog: > * engine.cc: Include "stringpool.h", "attribs.h", and > "tree-dfa.h". > (mark_params_as_tainted): New. > (class tainted_function_custom_event): New. > (class tainted_function_info): New. > (exploded_graph::add_function_entry): Handle functions with > "tainted" attribute. > (class tainted_field_custom_event): New. > (class tainted_callback_custom_event): New. > (class tainted_call_info): New. > (add_tainted_callback): New. > (add_any_callbacks): New. > (exploded_graph::build_initial_worklist): Find callbacks that > are > reachable from global initializers, calling add_any_callbacks > on > them. > > gcc/c-family/ChangeLog: > * c-attribs.c (c_common_attribute_table): Add "tainted". > (handle_tainted_attribute): New. > > gcc/ChangeLog: > * doc/extend.texi (Function Attributes): Note that "tainted" > can > be used on field decls. > (Common Function Attributes): Add entry on "tainted" > attribute. > > gcc/testsuite/ChangeLog: > * gcc.dg/analyzer/attr-tainted-1.c: New test. > * gcc.dg/analyzer/attr-tainted-misuses.c: New test. > * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test. > * gcc.dg/analyzer/taint-CVE-2020-13143-1.c: New test. > * gcc.dg/analyzer/taint-CVE-2020-13143-2.c: New test. > * gcc.dg/analyzer/taint-CVE-2020-13143.h: New test. > * gcc.dg/analyzer/taint-alloc-3.c: New test. > * gcc.dg/analyzer/taint-alloc-4.c: New test. > > Signed-off-by: David Malcolm > --- > gcc/analyzer/engine.cc | 317 > +- > gcc/c-family/c-attribs.c | 36 ++ > gcc/doc/extend.texi
[PATCH 0/6] ira: Fix performance regression in exchange2 [PR98782]
This series of patches recovers the exchange2 performance lost in the GCC 11 timeframe (at least on aarch64 and Power9 -- thanks Pat for testing the latter). There are 6 patches, split into two groups of 3. The first 3 are just preparatory patches, although patch 2 does contain a minor bug fix. The other 3 patches are the ones that together fix the regression. I realise this is a bit invasive for stage 3. However, the series is fixing a large regression in an important benchmark and AFAIK there are no known acceptable mitigations that we could apply instead. I think the series is also working with concepts that already exist in IRA: it's really about tweaking the cost model used to control them. We also still have at least 3 months (more realistically 4 months) of testing before GCC 12 is released. So perhaps one option would be to apply any approved version of the series now, but with the understanding that if there's significant fallout (more than a handful of small tweaks or fixes), we would simply revert the patches rather than trying to rework them in-situ. The series is confined to IRA so reverting it should be simple. Would that be OK? Each patch bootstrapped & regression-tested individually on aarch64-linux-gnu. Also tested as a series on aarch64_be-elf, arm-linux-gnueabihf, powerpc64le-linux-gnu, and x86_64-linux-gnu. Richard
[PATCH 1/6] ira: Add a ira_loop_border_costs class
The final index into (ira_)memory_move_cost is 1 for loads and 0 for stores. Thus the combination: entry_freq * memory_cost[1] + exit_freq * memory_cost[0] is the cost of loading a register on entry to a loop and storing it back on exit from the loop. This is the cost to use if the register is successfully allocated within the loop but is spilled in the parent loop. Similarly: entry_freq * memory_cost[0] + exit_freq * memory_cost[1] is the cost of storing a register on entry to the loop and restoring it on exit from the loop. This is the cost to use if the register is spilled within the loop but is successfully allocated in the parent loop. The patch adds a helper class for calculating these values and mechanically replaces the existing instances. There is no attempt to editorialise the choice between using “spill inside” and “spill outside” costs. (I think one of them is the wrong way round, but a later patch deals with that.) No functional change intended. gcc/ PR rtl-optimization/98782 * ira-int.h (ira_loop_border_costs): New class. * ira-color.c (ira_loop_border_costs::ira_loop_border_costs): New constructor. (calculate_allocno_spill_cost): Use ira_loop_border_costs. (color_pass): Likewise. (move_spill_restore): Likewise. --- gcc/ira-color.c | 76 +++-- gcc/ira-int.h | 56 2 files changed, 86 insertions(+), 46 deletions(-) diff --git a/gcc/ira-color.c b/gcc/ira-color.c index e0b4e490043..66c11710b97 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -2567,13 +2567,23 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p) return REG_FREQ_FROM_EDGE_FREQ (freq); } +/* Construct an object that describes the boundary between A and its + parent allocno. */ +ira_loop_border_costs::ira_loop_border_costs (ira_allocno_t a) + : m_mode (ALLOCNO_MODE (a)), +m_class (ALLOCNO_CLASS (a)), +m_entry_freq (ira_loop_edge_freq (ALLOCNO_LOOP_TREE_NODE (a), + ALLOCNO_REGNO (a), false)), +m_exit_freq (ira_loop_edge_freq (ALLOCNO_LOOP_TREE_NODE (a), +ALLOCNO_REGNO (a), true)) +{ +} + /* Calculate and return the cost of putting allocno A into memory. */ static int calculate_allocno_spill_cost (ira_allocno_t a) { int regno, cost; - machine_mode mode; - enum reg_class rclass; ira_allocno_t parent_allocno; ira_loop_tree_node_t parent_node, loop_node; @@ -2586,24 +2596,12 @@ calculate_allocno_spill_cost (ira_allocno_t a) return cost; if ((parent_allocno = parent_node->regno_allocno_map[regno]) == NULL) return cost; - mode = ALLOCNO_MODE (a); - rclass = ALLOCNO_CLASS (a); + ira_loop_border_costs border_costs (a); if (ALLOCNO_HARD_REGNO (parent_allocno) < 0) -cost -= (ira_memory_move_cost[mode][rclass][0] -* ira_loop_edge_freq (loop_node, regno, true) -+ ira_memory_move_cost[mode][rclass][1] -* ira_loop_edge_freq (loop_node, regno, false)); +cost -= border_costs.spill_outside_loop_cost (); else -{ - ira_init_register_move_cost_if_necessary (mode); - cost += ((ira_memory_move_cost[mode][rclass][1] - * ira_loop_edge_freq (loop_node, regno, true) - + ira_memory_move_cost[mode][rclass][0] - * ira_loop_edge_freq (loop_node, regno, false)) - - (ira_register_move_cost[mode][rclass][rclass] - * (ira_loop_edge_freq (loop_node, regno, false) -+ ira_loop_edge_freq (loop_node, regno, true; -} +cost += (border_costs.spill_inside_loop_cost () +- border_costs.move_between_loops_cost ()); return cost; } @@ -3342,7 +3340,7 @@ static void color_pass (ira_loop_tree_node_t loop_tree_node) { int regno, hard_regno, index = -1, n; - int cost, exit_freq, enter_freq; + int cost; unsigned int j; bitmap_iterator bi; machine_mode mode; @@ -3466,8 +3464,6 @@ color_pass (ira_loop_tree_node_t loop_tree_node) } continue; } - exit_freq = ira_loop_edge_freq (subloop_node, regno, true); - enter_freq = ira_loop_edge_freq (subloop_node, regno, false); ira_assert (regno < ira_reg_equiv_len); if (ira_equiv_no_lvalue_p (regno)) { @@ -3483,16 +3479,16 @@ color_pass (ira_loop_tree_node_t loop_tree_node) } else if (hard_regno < 0) { + ira_loop_border_costs border_costs (subloop_allocno); ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno) - -= ((ira_memory_move_cost[mode][rclass][1] * enter_freq) - + (ira_memory_move_cost[mode][rclass][0] * exit_freq)); + -= border_costs.spill_outside_loop_cost (); } else { + ira_loop_border_costs border_c
[PATCH 2/6] ira: Add comments and fix move_spill_restore calculation
This patch adds comments to describe each use of ira_loop_border_costs. I think this highlights that move_spill_restore was using the wrong cost in one case, which came from tranposing [0] and [1] in the original (pre-ira_loop_border_costs) ira_memory_move_cost expressions. The difference would only be noticeable on targets that distinguish between load and store costs. gcc/ PR rtl-optimization/98782 * ira-color.c (color_pass): Add comments to describe the spill costs. (move_spill_restore): Likewise. Fix reversed calculation. --- gcc/ira-color.c | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/gcc/ira-color.c b/gcc/ira-color.c index 66c11710b97..e7433312675 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -3479,6 +3479,13 @@ color_pass (ira_loop_tree_node_t loop_tree_node) } else if (hard_regno < 0) { + /* If we allocate a register to SUBLOOP_ALLOCNO, we'll need +to load the register on entry to the subloop and store +the register back on exit from the subloop. This incurs +a fixed cost for all registers. Since UPDATED_MEMORY_COST +is (and should only be) used relative to the register costs +for the same allocno, we can subtract this shared register +cost from the memory cost. */ ira_loop_border_costs border_costs (subloop_allocno); ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno) -= border_costs.spill_outside_loop_cost (); @@ -3503,6 +3510,9 @@ color_pass (ira_loop_tree_node_t loop_tree_node) > ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno)[index]) ALLOCNO_UPDATED_CLASS_COST (subloop_allocno) = ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno)[index]; + /* If we spill SUBLOOP_ALLOCNO, we'll need to store HARD_REGNO +on entry to the subloop and restore HARD_REGNO on exit from +the subloop. */ ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno) += border_costs.spill_inside_loop_cost (); } @@ -3601,9 +3611,17 @@ move_spill_restore (void) : ALLOCNO_HARD_REG_COSTS (subloop_allocno)[index])); ira_loop_border_costs border_costs (subloop_allocno); if ((hard_regno2 = ALLOCNO_HARD_REGNO (subloop_allocno)) < 0) - cost -= border_costs.spill_outside_loop_cost (); + /* The register was spilled in the subloop. If we spill + it in the outer loop too then we'll no longer need to + save the register on entry to the subloop and restore + the register on exit from the subloop. */ + cost -= border_costs.spill_inside_loop_cost (); else { + /* The register was also allocated in the subloop. If we +spill it in the outer loop then we'll need to load the +register on entry to the subloop and store the register +back on exit from the subloop. */ cost += border_costs.spill_outside_loop_cost (); if (hard_regno2 != hard_regno) cost -= border_costs.move_between_loops_cost (); @@ -3615,9 +3633,17 @@ move_spill_restore (void) ira_assert (rclass == ALLOCNO_CLASS (parent_allocno)); ira_loop_border_costs border_costs (a); if ((hard_regno2 = ALLOCNO_HARD_REGNO (parent_allocno)) < 0) + /* The register was spilled in the parent loop. If we spill + it in this loop too then we'll no longer need to load the + register on entry to this loop and save the register back + on exit from this loop. */ cost -= border_costs.spill_outside_loop_cost (); else { + /* The register was also allocated in the parent loop. +If we spill it in this loop then we'll need to save +the register on entry to this loop and restore the +register on exit from this loop. */ cost += border_costs.spill_inside_loop_cost (); if (hard_regno2 != hard_regno) cost -= border_costs.move_between_loops_cost (); -- 2.25.1
[PATCH 3/6] ira: Add ira_subloop_allocnos_can_differ_p
color_pass has two instances of the same code for propagating non-cap assignments from parent loops to subloops. This patch adds a helper function for testing when such propagations are required for correctness and uses it to remove the duplicated code. A later patch will use this in ira-build.c too, which is why the function is exported to ira-int.h. No functional change intended. gcc/ PR rtl-optimization/98782 * ira-int.h (ira_subloop_allocnos_can_differ_p): New function, extracted from... * ira-color.c (color_pass): ...here. --- gcc/ira-color.c | 21 + gcc/ira-int.h | 28 2 files changed, 29 insertions(+), 20 deletions(-) diff --git a/gcc/ira-color.c b/gcc/ira-color.c index e7433312675..ae0f08af4b3 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -3446,26 +3446,7 @@ color_pass (ira_loop_tree_node_t loop_tree_node) if ((flag_ira_region == IRA_REGION_MIXED && (loop_tree_node->reg_pressure[pclass] <= ira_class_hard_regs_num[pclass])) - || (pic_offset_table_rtx != NULL - && regno == (int) REGNO (pic_offset_table_rtx)) - /* Avoid overlapped multi-registers. Moves between them -might result in wrong code generation. */ - || (hard_regno >= 0 - && ira_reg_class_max_nregs[pclass][mode] > 1)) - { - if (! ALLOCNO_ASSIGNED_P (subloop_allocno)) - { - ALLOCNO_HARD_REGNO (subloop_allocno) = hard_regno; - ALLOCNO_ASSIGNED_P (subloop_allocno) = true; - if (hard_regno >= 0) - update_costs_from_copies (subloop_allocno, true, true); - /* We don't need updated costs anymore. */ - ira_free_allocno_updated_costs (subloop_allocno); - } - continue; - } - ira_assert (regno < ira_reg_equiv_len); - if (ira_equiv_no_lvalue_p (regno)) + || !ira_subloop_allocnos_can_differ_p (a, hard_regno >= 0)) { if (! ALLOCNO_ASSIGNED_P (subloop_allocno)) { diff --git a/gcc/ira-int.h b/gcc/ira-int.h index b32c80d4c9e..c5b1a131abd 100644 --- a/gcc/ira-int.h +++ b/gcc/ira-int.h @@ -1595,4 +1595,32 @@ ira_loop_border_costs::move_between_loops_cost () const return move_cost * (m_entry_freq + m_exit_freq); } +/* Return true if subloops that contain allocnos for A's register can + use a different assignment from A. ALLOCATED_P is true for the case + in which allocation succeeded for A. */ +inline bool +ira_subloop_allocnos_can_differ_p (ira_allocno_t a, bool allocated_p = true) +{ + auto regno = ALLOCNO_REGNO (a); + + if (pic_offset_table_rtx != NULL + && regno == (int) REGNO (pic_offset_table_rtx)) +return false; + + ira_assert (regno < ira_reg_equiv_len); + if (ira_equiv_no_lvalue_p (regno)) +return false; + + /* Avoid overlapping multi-registers. Moves between them might result + in wrong code generation. */ + if (allocated_p) +{ + auto pclass = ira_pressure_class_translate[ALLOCNO_CLASS (a)]; + if (ira_reg_class_max_nregs[pclass][ALLOCNO_MODE (a)] > 1) + return false; +} + + return true; +} + #endif /* GCC_IRA_INT_H */ -- 2.25.1
[PATCH 4/6] ira: Try to avoid propagating conflicts
Suppose that: - an inner loop L contains an allocno A - L clobbers hard register R while A is live - A's parent allocno is AP Previously, propagate_allocno_info would propagate conflict sets up the loop tree, so that the conflict between A and R would become a conflict between AP and R (and so on for ancestors of AP). However, when IRA treats loops as separate allocation regions, it can decide on a loop-by-loop basis whether to allocate a register or spill to memory. Conflicts in inner loops therefore don't need to become hard conflicts in parent loops. Instead we can record that using the “conflicting” registers for the parent allocnos has a higher cost. In the example above, this higher cost is the sum of: - the cost of saving R on entry to L - the cost of keeping the pseudo register in memory throughout L - the cost of reloading R on exit from L This value is also a cap on the hard register cost that A can contribute to AP in general (not just for conflicts). Whatever allocation we pick for AP, there is always the option of spilling that register to memory throughout L, so the cost to A of allocating a register to AP can't be more than the cost of spilling A. To take an extreme example: if allocating a register R2 to A is more expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2] could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100 times greater than ALLOCNO_MEMORY_COST (A). But this scale factor doesn't matter to AP. All that matters is that R2 is more expensive than memory for A, so that allocating R2 to AP should be costed as spilling A to memory (again assuming that A and AP are in different allocation regions). Propagating a factor of 100 would distort the register costs for AP. move_spill_restore tries to undo the propagation done by propagate_allocno_info, so we need some extra processing there. gcc/ PR rtl-optimization/98782 * ira-int.h (ira_allocno::might_conflict_with_parent_p): New field. (ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro. (ira_single_region_allocno_p): New function. (ira_total_conflict_hard_regs): Likewise. * ira-build.c (ira_create_allocno): Initialize ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P. (ira_propagate_hard_reg_costs): New function. (propagate_allocno_info): Use it. Try to avoid propagating hard register conflicts to parent allocnos if we can handle the conflicts by spilling instead. Limit the propagated register costs to the cost of spilling throughout the child loop. * ira-color.c (color_pass): Use ira_single_region_allocno_p to test whether a child and parent allocno can share the same register. (move_spill_restore): Adjust for the new behavior of propagate_allocno_info. gcc/testsuite/ * gcc.target/aarch64/reg-alloc-2.c: New test. --- gcc/ira-build.c | 55 +-- gcc/ira-color.c | 53 -- gcc/ira-int.h | 37 + .../gcc.target/aarch64/reg-alloc-2.c | 47 4 files changed, 169 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-2.c diff --git a/gcc/ira-build.c b/gcc/ira-build.c index 0547edfde55..875b4d8ed7c 100644 --- a/gcc/ira-build.c +++ b/gcc/ira-build.c @@ -497,6 +497,7 @@ ira_create_allocno (int regno, bool cap_p, bitmap_set_bit (loop_tree_node->all_allocnos, ALLOCNO_NUM (a)); ALLOCNO_NREFS (a) = 0; ALLOCNO_FREQ (a) = 0; + ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P (a) = false; ALLOCNO_HARD_REGNO (a) = -1; ALLOCNO_CALL_FREQ (a) = 0; ALLOCNO_CALLS_CROSSED_NUM (a) = 0; @@ -1991,6 +1992,35 @@ propagate_modified_regnos (ira_loop_tree_node_t loop_tree_node) loop_tree_node->modified_regnos); } +/* Propagate ALLOCNO_HARD_REG_COSTS from A to PARENT_A. Use SPILL_COST + as the cost of spilling a register throughout A (which we have to do + for PARENT_A allocations that conflict with A). */ +static void +ira_propagate_hard_reg_costs (ira_allocno_t parent_a, ira_allocno_t a, + int spill_cost) +{ + HARD_REG_SET conflicts = ira_total_conflict_hard_regs (a); + conflicts &= ~ira_total_conflict_hard_regs (parent_a); + + auto costs = ALLOCNO_HARD_REG_COSTS (a); + if (!hard_reg_set_empty_p (conflicts)) +ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P (a) = true; + else if (!costs) +return; + + auto aclass = ALLOCNO_CLASS (a); + ira_allocate_and_set_costs (&ALLOCNO_HARD_REG_COSTS (parent_a), + aclass, ALLOCNO_CLASS_COST (parent_a)); + auto parent_costs = ALLOCNO_HARD_REG_COSTS (parent_a); + for (int i = 0; i < ira_class_hard_regs_num[aclass]; ++i) +if (TEST_HARD_REG_BIT (conflicts, ira_class_hard_regs[aclass][i])) + parent_costs[i] += spill_cost; +else if (costs) + /* The cost to A of
[PATCH 5/6] ira: Consider modelling caller-save allocations as loop spills
If an allocno A in an inner loop L spans a call, a parent allocno AP can choose to handle a call-clobbered/caller-saved hard register R in one of two ways: (1) save R before each call in L and restore R after each call (2) spill R to memory throughout L (2) can be cheaper than (1) in some cases, particularly if L does not reference A. Before the patch we always did (1). The patch adds support for picking (2) instead, when it seems cheaper. It builds on the earlier support for not propagating conflicts to parent allocnos. gcc/ PR rtl-optimization/98782 * ira-int.h (ira_caller_save_cost): New function. (ira_caller_save_loop_spill_p): Likewise. * ira-build.c (ira_propagate_hard_reg_costs): Test whether it is cheaper to spill a call-clobbered register throughout a loop rather than spill it around each individual call. If so, treat all call-clobbered registers as conflicts and... (propagate_allocno_info): ...do not propagate call information from the child to the parent. * ira-color.c (move_spill_restore): Update accordingly. * ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost. gcc/testsuite/ * gcc.target/aarch64/reg-alloc-3.c: New test. --- gcc/ira-build.c | 23 --- gcc/ira-color.c | 13 ++-- gcc/ira-costs.c | 7 +- gcc/ira-int.h | 39 +++ .../gcc.target/aarch64/reg-alloc-3.c | 65 +++ 5 files changed, 129 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-3.c diff --git a/gcc/ira-build.c b/gcc/ira-build.c index 875b4d8ed7c..ab3e87164e1 100644 --- a/gcc/ira-build.c +++ b/gcc/ira-build.c @@ -2000,6 +2000,8 @@ ira_propagate_hard_reg_costs (ira_allocno_t parent_a, ira_allocno_t a, int spill_cost) { HARD_REG_SET conflicts = ira_total_conflict_hard_regs (a); + if (ira_caller_save_loop_spill_p (parent_a, a, spill_cost)) +conflicts |= ira_need_caller_save_regs (a); conflicts &= ~ira_total_conflict_hard_regs (parent_a); auto costs = ALLOCNO_HARD_REG_COSTS (a); @@ -2069,15 +2071,18 @@ propagate_allocno_info (void) if (!ira_subloop_allocnos_can_differ_p (parent_a)) merge_hard_reg_conflicts (a, parent_a, true); - ALLOCNO_CALL_FREQ (parent_a) += ALLOCNO_CALL_FREQ (a); - ALLOCNO_CALLS_CROSSED_NUM (parent_a) - += ALLOCNO_CALLS_CROSSED_NUM (a); - ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a) - += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a); - ALLOCNO_CROSSED_CALLS_ABIS (parent_a) - |= ALLOCNO_CROSSED_CALLS_ABIS (a); - ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a) - |= ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a); + if (!ira_caller_save_loop_spill_p (parent_a, a, spill_cost)) + { + ALLOCNO_CALL_FREQ (parent_a) += ALLOCNO_CALL_FREQ (a); + ALLOCNO_CALLS_CROSSED_NUM (parent_a) + += ALLOCNO_CALLS_CROSSED_NUM (a); + ALLOCNO_CHEAP_CALLS_CROSSED_NUM (parent_a) + += ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a); + ALLOCNO_CROSSED_CALLS_ABIS (parent_a) + |= ALLOCNO_CROSSED_CALLS_ABIS (a); + ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (parent_a) + |= ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a); + } ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (parent_a) += ALLOCNO_EXCESS_PRESSURE_POINTS_NUM (a); aclass = ALLOCNO_CLASS (a); diff --git a/gcc/ira-color.c b/gcc/ira-color.c index 4344ee6689e..1487afc5ef1 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -3597,11 +3597,16 @@ move_spill_restore (void) propagate_allocno_info will have propagated the cost of spilling HARD_REGNO in SUBLOOP_NODE. (ira_subloop_allocnos_can_differ_p must be true -in that case.) Otherwise, SPILL_COST acted as -a cap on the propagated register cost, in cases -where the allocations can differ. */ +in that case.) If HARD_REGNO is a caller-saved +register, we might have modelled it in the same way. + +Otherwise, SPILL_COST acted as a cap on the propagated +register cost, in cases where the allocations can differ. */ auto conflicts = ira_total_conflict_hard_regs (subloop_allocno); - if (TEST_HARD_REG_BIT (conflicts, hard_regno)) + if (TEST_HARD_REG_BIT (conflicts, hard_regno) + || (ira_need_caller_save_p (subloop_allocno, hard_regno) + && ira_caller_save_loop_spill_p (a, subloop_allocno, + spill_cost))) reg_cost = spill_cost; else if (ira_subloop_allocno
[PATCH 6/6] ira: Handle "soft" conflicts between cap and non-cap allocnos
This patch looks for allocno conflicts of the following form: - One allocno (X) is a cap allocno for some non-cap allocno X2. - X2 belongs to some loop L2. - The other allocno (Y) is a non-cap allocno. - Y is an ancestor of some allocno Y2 in L2. - Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0). - Y can use a different allocation from Y2. In this case, Y's register is live across L2 but is not used within it, whereas X's register is used only within L2. The conflict is therefore only "soft", in that it can easily be avoided by spilling Y2 inside L2 without affecting any insn references. In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the callers then taking Y2's ALLOCNO_MEMORY_COST into account. There would then be no "cliff edge" between a Y2 that has no references and a Y2 that has (say) a single cold reference. However, doing that isn't necessary for the PR and seems to give variable results in practice. (fotonik3d_r improves slightly but namd_r regresses slightly.) It therefore seemed better to start with the higher-value zero-reference case and see how things go. On top of the previous patches in the series, this fixes the exchange2 regression seen in GCC 11. gcc/ PR rtl-optimization/98782 * ira-int.h (ira_soft_conflict): Declare. * ira-costs.c (max_soft_conflict_loop_depth): New constant. (ira_soft_conflict): New function. (spill_soft_conflicts): Likewise. (assign_hard_reg): Use them to handle the case described by the comment above ira_soft_conflict. (improve_allocation): Likewise. * ira.c (check_allocation): Allow allocnos with "soft" conflicts to share the same register. gcc/testsuite/ * gcc.target/aarch64/reg-alloc-4.c: New test. --- gcc/ira-color.c | 284 -- gcc/ira-int.h | 1 + gcc/ira.c | 2 + .../gcc.target/aarch64/reg-alloc-4.c | 69 + 4 files changed, 326 insertions(+), 30 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/reg-alloc-4.c diff --git a/gcc/ira-color.c b/gcc/ira-color.c index 1487afc5ef1..36f3f4d70f3 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -36,6 +36,11 @@ along with GCC; see the file COPYING3. If not see #include "reload.h" #include "cfgloop.h" +/* To prevent soft conflict detection becoming quadratic in the + loop depth. Only for very pathological cases, so it hardly + seems worth a --param. */ +const int max_soft_conflict_loop_depth = 64; + typedef struct allocno_hard_regs *allocno_hard_regs_t; /* The structure contains information about hard registers can be @@ -1707,6 +1712,167 @@ calculate_saved_nregs (int hard_regno, machine_mode mode) return nregs; } +/* Allocnos A1 and A2 are known to conflict. Check whether, in some loop L + that is either the current loop or a nested subloop, the conflict is of + the following form: + + - One allocno (X) is a cap allocno for some non-cap allocno X2. + + - X2 belongs to some loop L2. + + - The other allocno (Y) is a non-cap allocno. + + - Y is an ancestor of some allocno Y2 in L2. (Note that such a Y2 + must exist, given that X and Y conflict.) + + - Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0). + + - Y can use a different allocation from Y2. + + In this case, Y's register is live across L2 but is not used within it, + whereas X's register is used only within L2. The conflict is therefore + only "soft", in that it can easily be avoided by spilling Y2 inside L2 + without affecting any insn references. + + If the conflict does have this form, return the Y2 that would need to be + spilled in order to allow X and Y (and thus A1 and A2) to use the same + register. Return null otherwise. Returning null is conservatively correct; + any nonnnull return value is an optimization. */ +ira_allocno_t +ira_soft_conflict (ira_allocno_t a1, ira_allocno_t a2) +{ + /* Search for the loop L and its associated allocnos X and Y. */ + int search_depth = 0; + while (ALLOCNO_CAP_MEMBER (a1) && ALLOCNO_CAP_MEMBER (a2)) +{ + a1 = ALLOCNO_CAP_MEMBER (a1); + a2 = ALLOCNO_CAP_MEMBER (a2); + if (search_depth++ > max_soft_conflict_loop_depth) + return nullptr; +} + /* This must be true if A1 and A2 conflict. */ + ira_assert (ALLOCNO_LOOP_TREE_NODE (a1) == ALLOCNO_LOOP_TREE_NODE (a2)); + + /* Make A1 the cap allocno (X in the comment above) and A2 the + non-cap allocno (Y in the comment above). */ + if (ALLOCNO_CAP_MEMBER (a2)) +std::swap (a1, a2); + if (!ALLOCNO_CAP_MEMBER (a1)) +return nullptr; + + /* Search for the real allocno that A1 caps (X2 in the comment above). */ + do +{ + a1 = ALLOCNO_CAP_MEMBER (a1); + if (search_depth++ > max_soft_conflict_loop_depth) + return nullptr; +} + while (ALLOCNO_CAP_MEMBER (a1))
[committed] libstdc++: Do not use std::isdigit in [PR103911]
Tested powerpc64le-linux, pushed to trunk. This avoids a potential race condition if std::setlocale is used concurrently with std::from_chars. libstdc++-v3/ChangeLog: PR libstdc++/103911 * include/std/charconv (__from_chars_alpha_to_num): Return char instead of unsigned char. Change invalid return value to 127 instead of using numeric trait. (__from_chars_alnum): Fix comment. Do not use std::isdigit. Change type of variable to char. --- libstdc++-v3/include/std/charconv | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index 4337a9e52f9..a3f8c7718b2 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -39,7 +39,6 @@ #include #include // for __bit_width -#include // for isdigit #include // for __to_chars_len, __to_chars_10_impl #include // for std::errc #include @@ -466,7 +465,7 @@ namespace __detail return true; } - constexpr unsigned char + constexpr char __from_chars_alpha_to_num(char __c) { switch (__c) @@ -550,10 +549,10 @@ namespace __detail case 'Z': return 35; } -return __gnu_cxx::__int_traits::__max; +return 127; } - /// std::from_chars implementation for integers in bases 11 to 26. + /// std::from_chars implementation for integers in bases 11 to 36. template bool __from_chars_alnum(const char*& __first, const char* __last, _Tp& __val, @@ -562,8 +561,8 @@ namespace __detail bool __valid = true; while (__first != __last) { - unsigned char __c = *__first; - if (std::isdigit(__c)) + char __c = *__first; + if ('0' <= __c && __c <= '9') // isdigit __c -= '0'; else { -- 2.31.1
[committed] libstdc++: Increase timeout for pthread7-rope.cc test
Tested powerpc64le-linux, pushed to trunk. This test spawns thousands of threads and so times out if the tests are run with a low timeout value and the machine is busy. libstdc++-v3/ChangeLog: * testsuite/ext/rope/pthread7-rope.cc: Add dg-timeout-factor. --- libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc b/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc index 9a870d1fc2e..b0b92adcfa9 100644 --- a/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc +++ b/libstdc++-v3/testsuite/ext/rope/pthread7-rope.cc @@ -20,6 +20,7 @@ // { dg-do run } // { dg-options "-pthread" } // { dg-require-effective-target pthread } +// { dg-timeout-factor 2 } #include #include @@ -33,7 +34,7 @@ typedef __gnu_cxx::rope > rope_type; rope_type foo2; rope_type foo4; -void* thread_main(void *) +void* thread_main(void *) { // To see a problem with gcc 3.3 and before, set a break point here. // Single step through c_str implementation, call sched_yield after -- 2.31.1
Re: [committed] libstdc++: Reduce template instantiations in
On Thu, 6 Jan 2022 at 10:43, Jonathan Wakely wrote: > > > On Thu, 6 Jan 2022 at 10:33, Jonathan Wakely wrote: > >> >> >> On Thu, 6 Jan 2022 at 10:00, Stephan Bergmann >> wrote: >> >>> On 05/01/2022 14:47, Jonathan Wakely via Libstdc++ wrote: >>> > Tested powerpc64le-linux, pushed to trunk. >>> > >>> > >>> > This moves the last two template parameters of __regex_algo_impl to be >>> > runtime function parameters instead, so that we don't need four >>> > different instantiations for the possible ways to call it. Most of the >>> > function (and what it instantiates) is the same in all cases, so making >>> > them compile-time choices doesn't really have much benefit. >>> > >>> > Use 'if constexpr' for conditions that check template parameters, so >>> > that when we do depend on a compile-time condition we only instantiate >>> > what we need to. >>> > >>> > libstdc++-v3/ChangeLog: >>> > >>> > * include/bits/regex.h (__regex_algo_impl): Change __policy and >>> > __match_mode template parameters to be function parameters. >>> > (regex_match, regex_search): Pass policy and match mode as >>> > function arguments. >>> > * include/bits/regex.tcc (__regex_algo_impl): Change template >>> > parameters to function parameters. >>> > * include/bits/regex_compiler.h (_RegexTranslatorBase): Use >>> > 'if constexpr' for conditions using template parameters. >>> > (_RegexTranslator): Likewise. >>> > * include/bits/regex_executor.tcc (_Executor::_M_handle_accept): >>> > Likewise. >>> > * testsuite/util/testsuite_regex.h (regex_match_debug) >>> > (regex_search_debug): Move template arguments to function >>> > arguments. >>> > --- >>> > libstdc++-v3/include/bits/regex.h | 33 >>> +-- >>> > libstdc++-v3/include/bits/regex.tcc | 8 ++--- >>> > libstdc++-v3/include/bits/regex_compiler.h| 9 ++--- >>> > libstdc++-v3/include/bits/regex_executor.tcc | 2 +- >>> > libstdc++-v3/testsuite/util/testsuite_regex.h | 24 +++--- >>> > 5 files changed, 37 insertions(+), 39 deletions(-) >>> >>> Clang now fails #include with >>> >>> > In file included from >>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/regex:66: >>> > >>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:799:9: >>> error: unknown type name '_RegexExecutorPolicy'; did you mean >>> '__detail::_RegexExecutorPolicy'? >>> > _RegexExecutorPolicy, bool); >>> > ^ >>> > >>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: >>> note: '__detail::_RegexExecutorPolicy' declared here >>> > enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; >>> > ^ >>> > >>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:2070:9: >>> error: unknown type name '_RegexExecutorPolicy'; did you mean >>> '__detail::_RegexExecutorPolicy'? >>> > _RegexExecutorPolicy, bool); >>> > ^ >>> > >>> gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/regex.h:45:14: >>> note: '__detail::_RegexExecutorPolicy' declared here >>> > enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate }; >>> > ^ >>> >>> and >>> >>> > diff --git a/libstdc++-v3/include/bits/regex.h >>> b/libstdc++-v3/include/bits/regex.h >>> > index 7480b0a5f97..46c168010bf 100644 >>> > --- a/libstdc++-v3/include/bits/regex.h >>> > +++ b/libstdc++-v3/include/bits/regex.h >>> > @@ -796,7 +796,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 >>> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, >>> > const basic_regex<_Cp, _Rp>&, >>> > regex_constants::match_flag_type, >>> > - _RegexExecutorPolicy, bool); >>> > + __detail::_RegexExecutorPolicy, >>> bool); >>> > >>> >template >>> > friend class __detail::_Executor; >>> > @@ -2067,7 +2067,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 >>> > __detail::__regex_algo_impl(_Bp, _Bp, match_results<_Bp, _Ap>&, >>> > const basic_regex<_Cp, _Rp>&, >>> > regex_constants::match_flag_type, >>> > - _RegexExecutorPolicy, bool); >>> > + __detail::_RegexExecutorPolicy, >>> bool); >>> > >>> >// Reset contents to __size unmatched sub_match objects >>> >// (plus additional objects for prefix, suffix and unmatched >>> sub). >>> >>> would fix that. >>> >> >> >> I'll make the change, but this looks like a clang bug: >> https://godbolt.org/z/bozxYErrc >> > > Maybe this one: https:/
[committed] libstdc++: Add self-merge check to std::forward_list::merge [PR103853]
From: "Pavel I. Kryukov" Tested powerpc64le-linux, pushed to trunk. This implements the proposed resolution of LWG 3088, so that x.merge(x) is a no-op, consistent with std::list::merge. Signed-off-by: Pavel I. Kryukov Co-authored-by: Jonathan Wakely libstdc++-v3/ChangeLog: PR libstdc++/103853 * include/bits/forward_list.tcc (forward_list::merge): Check for self-merge. * testsuite/23_containers/forward_list/operations/merge.cc: New test. --- libstdc++-v3/include/bits/forward_list.tcc| 5 ++ .../forward_list/operations/merge.cc | 48 +++ 2 files changed, 53 insertions(+) create mode 100644 libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc diff --git a/libstdc++-v3/include/bits/forward_list.tcc b/libstdc++-v3/include/bits/forward_list.tcc index 7d90e82de39..b0a65457404 100644 --- a/libstdc++-v3/include/bits/forward_list.tcc +++ b/libstdc++-v3/include/bits/forward_list.tcc @@ -367,6 +367,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER forward_list<_Tp, _Alloc>:: merge(forward_list&& __list, _Comp __comp) { + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3088. forward_list::merge behavior unclear when passed *this + if (std::__addressof(__list) == this) + return; + _Node_base* __node = &this->_M_impl._M_head; while (__node->_M_next && __list._M_impl._M_head._M_next) { diff --git a/libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc b/libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc new file mode 100644 index 000..0f6f520c33b --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/forward_list/operations/merge.cc @@ -0,0 +1,48 @@ +// { dg-do run { target c++11 } } +// C++11 23.3.4.6 Operations [forwardlist.ops] + +#include +#include + +void +test_stable() +{ + std::forward_list a{1.5, 2.0, 3.5, 4.1}; + std::forward_list b{1.0, 2.5, 3.0, 4.3, 4.2, 5.0}; + + a.merge(b, std::less{}); + + // result is sorted with respect to std::less, so 1.0 and 1.5 are + // equivalent, and stability guarantee means the element from a comes first. + const std::forward_list r { 1.5, 1.0, + 2.0, 2.5, + 3.5, 3.0, + 4.1, 4.3, 4.2, + 5.0}; + + VERIFY(a == r); +} + +void +test_lwg3088() +{ + // LWG 3088: forward_list::merge behavior unclear when passed *this + // PR libstdc++/103853 + std::forward_list c1{ 1, 2, 3 }; + const std::forward_list c2 = c1; + c1.merge(c1); + VERIFY( c1 == c2 ); + c1.merge(c1, std::less{}); + VERIFY( c1 == c2 ); + c1.merge(std::move(c1)); + VERIFY( c1 == c2 ); + c1.merge(std::move(c1), std::less{}); + VERIFY( c1 == c2 ); +} + +int +main() +{ + test_stable(); + test_lwg3088(); +} -- 2.31.1
Re: [PATCH 1/6] ira: Add a ira_loop_border_costs class
> The final index into (ira_)memory_move_cost is 1 for loads and > 0 for stores. Thus the combination: > > entry_freq * memory_cost[1] + exit_freq * memory_cost[0] > > is the cost of loading a register on entry to a loop and > storing it back on exit from the loop. This is the cost to > use if the register is successfully allocated within the > loop but is spilled in the parent loop. Similarly: > > entry_freq * memory_cost[0] + exit_freq * memory_cost[1] > > is the cost of storing a register on entry to the loop and > restoring it on exit from the loop. This is the cost to > use if the register is spilled within the loop but is > successfully allocated in the parent loop. > > The patch adds a helper class for calculating these values and > mechanically replaces the existing instances. There is no attempt to > editorialise the choice between using “spill inside” and “spill outside” > costs. (I think one of them is the wrong way round, but a later patch > deals with that.) > > No functional change intended. > > gcc/ > PR rtl-optimization/98782 > * ira-int.h (ira_loop_border_costs): New class. > * ira-color.c (ira_loop_border_costs::ira_loop_border_costs): > New constructor. > (calculate_allocno_spill_cost): Use ira_loop_border_costs. > (color_pass): Likewise. > (move_spill_restore): Likewise. Thanks for working on this. For profile bits, the patch looks good to me. In general I am trying to move away from the integer frequencies. It would be more precise to calculate the m_entry_freq and m_exit_freq as profile_count m_entry_count, m_exit_count and do conversion only later. count->to_frequency method basically scales it to the range 0...BB_FREQ_MAX. Honza
Re: [PATCH 1b/6] Add __attribute__((untrusted))
On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote: > On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote: > > This patch adds a new: > > > > __attribute__((untrusted)) > > > > for use by the C front-end, intended for use by the Linux kernel for > > use with "__user", but which could be used by other operating system > > kernels, and potentialy by other projects. > > It looks like untrusted is a type attribute (rather than one > that applies to variables and/or function return values or > writeable by-reference arguments). I find that quite surprising. FWIW I initially tried implementing it on pointer types, but doing it on the underlying type was much cleaner. > I'm used to thinking of trusted vs tainted as dynamic properties > of data so I'm having trouble deciding what to think about > the attribute applying to types. Can you explain why it's > useful on types? A type system *is* a way of detecting problems involving dynamic properties of data. Ultimately all we have at runtime is a collection of bits; the toolchain has the concept of types as a way to allow us to reason about properies of those bits without requiring a full cross-TU analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned integer), and to document these properties clearly to human readers of the code. I see this as working like a qualifier (rather like "const" and "volatile"), in that an untrusted char * when dereferenced gives you an untrusted char The intent is to have a way of treating the values as "actively hostile", so that code analyzers can assume the worst possible values for such types (or more glibly, that we're dealing with data from Satan rather than from Murphy). Such types are also relevant to infoleaks: writing sensitive information to an untrusted value can be detected relatively easily with this approach, by checking the type of the value - the types express the trust boundary Doing this with qualifiers allows us to use the C type system to detect these kinds of issues without having to add a full cross-TU interprocedural analysis, and documents it to human readers of the code. Compare with const-correctness; we can have an analogous "trust-correctness". > > I'd expect the taint property of a type to be quickly lost as > an object of the type is passed through existing APIs (e.g., > a char array manipulated by string functions like strchr). FWIW you can't directly pass an attacker-controlled buffer to strchr: strchr requires there to be a 0-terminator to the array; if the array's content is untrusted then the attacker might not have 0-terminated it. As implemented, the patch doesn't complain about this, though maybe it should. The main point here is to support the existing __user annotation used by the Linux kernel, in particular, copy_from_user and copy_to_user. > > (I usually look at tests to help me understand the design of > a change but I couldn't find an answer to my question in those > in the patch.) The patch kit was rather unclear on this, due to the use of two different approaches (custom address spaces vs this untrusted attribute). Sorry about this. Patches 4a and 4b in the kit add test-uaccess.h (to gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests that use "test-uaccess.h" in patch 3: [PATCH 3/6] analyzer: implement infoleak detection https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html and in patch 5: [PATCH 5/6] analyzer: use region::untrusted_p in taint detection https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584374.html (sorry about messing up the order of the patches). Patch 4a here: [PATCH 4a/6] analyzer: implement region::untrusted_p in terms of custom address spaces https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584371.html implements "__user" as a custom address space, whereas patch 4b here: [PATCH 4b/6] analyzer: implement region::untrusted_p in terms of __attribute__((untrusted)) https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584373.html implements "__user" to be __attribute__((untrusted)). Perhaps I should drop the custom address space versions of the patches and post a version of the kit that simply uses the attribute? Dave > > Thanks > Martin > > PS I found one paper online that discusses type-based taint > analysis in Java but not much more. I only quickly skimmed > the paper and although it conceptually makes sense I'm still > having difficulties seeing how it would be useful in C. > > > > > Known issues: > > - at least one TODO in handle_untrusted_attribute > > - should it be permitted to dereference an untrusted pointer? The > > patch > > currently allows this > > > > gcc/c-family/ChangeLog: > > * c-attribs.c (c_common_attribute_table): Add "untrusted". > > (build_untrusted_type): New. > > (handle_untrusted_attribute): New. > > * c-pretty-print.c (pp_c_cv_qualifiers): Handle > > TYPE_QUAL_UNTRU
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/5/22 07:34, Richard Biener wrote: On Thu, Dec 9, 2021 at 2:02 PM Martin Liška wrote: On 11/30/21 12:17, Richard Biener wrote: + unswitch_predicate *predicate + = new unswitch_predicate (expr, idx, edge_index); + ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, +idx, *get_global_range_query ()); + /* Huge switches are not supported by Ranger. */ + if (predicate->true_range.undefined_p ()) I hope ranger will set the range to varying_p () in that case, not undefined? But even then, is that a reason for punting? I guess we fail to prune cases in that case but the cost modeling should then account for those and thus we are at least consistent? huge switches not supported? I don't know what you mean, either that or I forget :-) If there are more edges than multi-ranges support, then things will start getting merged because they cant be represented.. and the default case may then also contain some values that also have cases.. but all inconsistencies will move towards varying.. not undefined. Andrew
[committed] c++: Add testcase for recently fixed PR [PR69681]
Fixed ever since r12-6188. PR c++/69681 gcc/testsuite/ChangeLog: * g++.dg/cpp0x/constexpr-compare2.C: New test. --- gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C | 10 ++ 1 file changed, 10 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C new file mode 100644 index 000..b1bc4720805 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-compare2.C @@ -0,0 +1,10 @@ +// PR c++/69681 +// { dg-do compile { target c++11 } } + +void f(); +void g(); +static_assert(f != g, ""); + +#if __cpp_constexpr >= 201603L +static_assert([]{} != []{}, ""); +#endif -- 2.34.1.493.ge83ba647f7
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/6/22 16:11, Andrew MacLeod wrote: On 1/5/22 07:34, Richard Biener wrote: On Thu, Dec 9, 2021 at 2:02 PM Martin Liška wrote: On 11/30/21 12:17, Richard Biener wrote: + unswitch_predicate *predicate + = new unswitch_predicate (expr, idx, edge_index); + ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, + idx, *get_global_range_query ()); + /* Huge switches are not supported by Ranger. */ + if (predicate->true_range.undefined_p ()) I hope ranger will set the range to varying_p () in that case, not undefined? But even then, is that a reason for punting? I guess we fail to prune cases in that case but the cost modeling should then account for those and thus we are at least consistent? huge switches not supported? I don't know what you mean, either that or I forget :-) If there are more edges than multi-ranges support, then things will start getting merged because they cant be represented.. and the default case may then also contain some values that also have cases.. but all inconsistencies will move towards varying.. not undefined. Andrew Hello. If you consider the attached test-case, then I get for: (gdb) p debug_bb(e->src) [local count: 955630225]: # i_489 = PHI # tmp_490 = PHI _1329 = (long unsigned int) i_489; _1330 = _1329 * 8; switch (order_385(D)) [1.08%], case 0: [1.08%], case 1: [1.08%], case 2: [1.08%], case 3: [1.08%], case 4: [1.08%], case 5: [1.08%], case 6: [1.08%], case 7: [1.08%], case 8: [1.08%], case 9: [1.08%], case 10: [1.08%], case 11: [1.08%], case 12: [1.08%], case 13: [1.08%], case 14: [1.08%], case 15: [1.08%], case 16: [1.08%], case 17: [1.08%], case 18: [1.08%], case 19: [1.08%], case 20: [1.08%], case 21: [1.08%], case 22: [1.08%], case 23: [1.08%], case 24: [1.08%], case 25: [1.08%], case 26: [1.08%], case 27: [1.08%], case 28: [1.08%], case 29: [1.08%], case 30: [1.08%], case 31: [1.08%], case 32: [1.08%], case 33: [1.08%], case 34: [1.08%], case 35: [1.08%], case 36: [1.08%], case 37: [1.08%], case 38: [1.08%], case 39: [1.08%], case 40: [1.08%], case 41: [1.08%], case 42: [1.08%], case 43: [1.08%], case 44: [1.08%], case 45: [1.08%], case 46: [1.08%], case 47: [1.08%], case 48: [1.08%], case 49: [1.08%], case 50: [1.08%], case 51: [1.08%], case 52: [1.08%], case 53: [1.08%], case 54: [1.08%], case 55: [1.08%], case 56: [1.08%], case 57: [1.08%], case 58: [1.08%], case 59: [1.08%], case 60: [1.08%], case 61: [1.08%], case 62: [1.08%], case 63: [1.08%], case 64: [1.08%], case 65: [1.08%], case 66: [1.08%], case 67: [1.08%], case 68: [1.08%], case 69: [1.08%], case 70: [1.08%], case 71: [1.08%], case 72: [1.08%], case 73: [1.08%], case 74: [1.08%], case 75: [1.08%], case 76: [1.08%], case 77: [1.08%], case 78: [1.08%], case 79: [1.08%], case 80: [1.08%], case 81: [1.08%], case 82: [1.08%], case 83: [1.08%], case 84: [1.08%], case 85: [1.08%], case 86: [1.08%], case 87: [1.08%], case 88: [1.08%], case 89: [1.08%], case 90: [1.08%], case 91: [1.08%]> $7 = void (gdb) p debug_bb(e->dest) [local count: 10275591]: : _3 = a_386(D) + _1330; _4 = *_3; tmp_478 = _4 * 0.0; goto ; [100.00%] │ 480ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, │ 481 idx, *get_global_range_query ()); Note the return value is false. But I think the comment is not precise: │ 1233 // Calculate a range on edge E and return it in R. Try to evaluate a │ 1234 // range for NAME on this edge. Return FALSE if this is either not a │ 1235 // control edge or NAME is not defined by this edge. and (gdb) p predicate->true_range.debug() UNDEFINED So is the behavior correct Andrew? Thanks, Martin int __attribute__((noipa)) foo(double *a, double *b, double *c, double *d, double *r, int size, int order) { for (int i = 0; i < size; i++) { double tmp, tmp2; switch(order) { case 0: tmp = 0 * a[i]; break; case 1: tmp = 1 * a[i]; break; case 2: tmp = 2 * a[i]; break; case 3: tmp = 3 * a[i]; break; case 4: tmp = 4 * a[i]; break; case 5: tmp = 5 * a[i]; break; case 6: tmp = 6 * a[i]; break; case 7: tmp = 7 * a[i]; break; case 8: tmp = 8 * a[i]; break; case 9: tmp = 9 * a[i]; break; case 10: tmp = 10 * a[i]; break; case 11: tmp = 11 * a[i]; break; case 12: tmp = 12 * a[i]; break; case 13: tmp = 13 * a[i]; break; case 14: tmp = 14 * a[i]; break; case 15: tmp = 15 * a[i]; break; case 16: tmp = 16 * a[i]; break; case 17: tmp = 17 * a[i]; break; case 18: tmp = 18 * a[i]; break; case 19: tmp = 19 * a[i]; break; case 20: tmp = 20 * a[i]; break; case 21: tmp = 21 * a[i]; break; case 22:
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/6/22 11:02, Martin Liška wrote: On 1/6/22 16:11, Andrew MacLeod wrote: On 1/5/22 07:34, Richard Biener wrote: On Thu, Dec 9, 2021 at 2:02 PM Martin Liška wrote: On 11/30/21 12:17, Richard Biener wrote: + unswitch_predicate *predicate + = new unswitch_predicate (expr, idx, edge_index); + ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, + idx, *get_global_range_query ()); + /* Huge switches are not supported by Ranger. */ + if (predicate->true_range.undefined_p ()) I hope ranger will set the range to varying_p () in that case, not undefined? But even then, is that a reason for punting? I guess we fail to prune cases in that case but the cost modeling should then account for those and thus we are at least consistent? huge switches not supported? I don't know what you mean, either that or I forget :-) If there are more edges than multi-ranges support, then things will start getting merged because they cant be represented.. and the default case may then also contain some values that also have cases.. but all inconsistencies will move towards varying.. not undefined. Andrew Hello. If you consider the attached test-case, then I get for: (gdb) p debug_bb(e->src) [local count: 955630225]: # i_489 = PHI # tmp_490 = PHI _1329 = (long unsigned int) i_489; _1330 = _1329 * 8; switch (order_385(D)) [1.08%], case 0: [1.08%], case 1: [1.08%], case 2: [1.08%], case 3: [1.08%], case 4: [1.08%], case 5: [1.08%], case 6: [1.08%], case 7: [1.08%], case 8: [1.08%], case 9: [1.08%], case 10: [1.08%], case 11: [1.08%], case 12: [1.08%], case 13: [1.08%], case 14: [1.08%], case 15: [1.08%], case 16: [1.08%], case 17: [1.08%], case 18: [1.08%], case 19: [1.08%], case 20: [1.08%], case 21: [1.08%], case 22: [1.08%], case 23: [1.08%], case 24: [1.08%], case 25: [1.08%], case 26: [1.08%], case 27: [1.08%], case 28: [1.08%], case 29: [1.08%], case 30: [1.08%], case 31: [1.08%], case 32: [1.08%], case 33: [1.08%], case 34: [1.08%], case 35: [1.08%], case 36: [1.08%], case 37: [1.08%], case 38: [1.08%], case 39: [1.08%], case 40: [1.08%], case 41: [1.08%], case 42: [1.08%], case 43: [1.08%], case 44: [1.08%], case 45: [1.08%], case 46: [1.08%], case 47: [1.08%], case 48: [1.08%], case 49: [1.08%], case 50: [1.08%], case 51: [1.08%], case 52: [1.08%], case 53: [1.08%], case 54: [1.08%], case 55: [1.08%], case 56: [1.08%], case 57: [1.08%], case 58: [1.08%], case 59: [1.08%], case 60: [1.08%], case 61: [1.08%], case 62: [1.08%], case 63: [1.08%], case 64: [1.08%], case 65: [1.08%], case 66: [1.08%], case 67: [1.08%], case 68: [1.08%], case 69: [1.08%], case 70: [1.08%], case 71: [1.08%], case 72: [1.08%], case 73: [1.08%], case 74: [1.08%], case 75: [1.08%], case 76: [1.08%], case 77: [1.08%], case 78: [1.08%], case 79: [1.08%], case 80: [1.08%], case 81: [1.08%], case 82: [1.08%], case 83: [1.08%], case 84: [1.08%], case 85: [1.08%], case 86: [1.08%], case 87: [1.08%], case 88: [1.08%], case 89: [1.08%], case 90: [1.08%], case 91: [1.08%]> $7 = void (gdb) p debug_bb(e->dest) [local count: 10275591]: : _3 = a_386(D) + _1330; _4 = *_3; tmp_478 = _4 * 0.0; goto ; [100.00%] │ 480 ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, │ 481 idx, *get_global_range_query ()); Note the return value is false. But I think the comment is not precise: So if you get a FALSE back, you cannot use any results because GORI is claiming that it cant figure something out... or there is nothing to figure out. Most of rangers routines are implemented so that if they return FALSE, the result is meaningless. what is IDX you are passing? order_385? As a side note, theres a typo in the testcase: .. Im not sure how that affects things, but : defaut: __builtin_unreachable (); default is misspelled... maybe it thinks that some kind of runtime value? I am surprised it even compiles. maybe that is mucking up what GORI thiunks it can calculate? │ 1233 // Calculate a range on edge E and return it in R. Try to evaluate a │ 1234 // range for NAME on this edge. Return FALSE if this is either not a │ 1235 // control edge or NAME is not defined by this edge. and (gdb) p predicate->true_range.debug() UNDEFINED So is the behavior correct Andrew? Thanks, Martin
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/5/22 13:34, Richard Biener wrote: On Thu, Dec 9, 2021 at 2:02 PM Martin Liška wrote: On 11/30/21 12:17, Richard Biener wrote: I'd like to see the gswitch support - that's what was posted before stage3 close, this patch on its own doesn't seem worth pushing for. That said, I have some comments below (and the already raised ones about how things might need to change with gswitch support). Is it so difficult to develop gswitch support as a separate change ontop of this? Hello. Took me some time, but I have a working version of the patch that makes both refactoring of the costing model and adds support for gswitch. For quite some time, I maintained 2 patches, but as commonly happens, I was forced doing quite some rework. If we really want a separation for bisection purpose, I suggest simple disabling of gswitch support? It was really meant to ease review. Sure, but keeping a separation if the final shape is changing is difficult :P I'm now looking at the combined patch, comments will follow. +static void +clean_up_after_unswitching (const auto_edge_flag &ignored_edge_flag) +{ + basic_block bb; + ... + /* Clean up the ignored_edge_flag from edges. */ + FOR_EACH_BB_FN (bb, cfun) +{ + edge e; you can probably clean up outgoing edge flags in the loop above? Done. (I'm randomly jumping) + /* Build compound expression for all cases leading +to this edge. */ + tree expr = NULL_TREE; + for (unsigned i = 1; i < gimple_switch_num_labels (stmt); ++i) + { + tree lab = gimple_switch_label (stmt, i); + basic_block dest = label_to_block (cfun, CASE_LABEL (lab)); + edge e2 = find_edge (gimple_bb (stmt), dest); + if (e == e2) just as a style note I prefer if (e != e2) continue; so the following code needs less indentation Likewise. + tree cmp1 = build2 (GE_EXPR, boolean_type_node, idx, + CASE_LOW (lab)); is there a reason you are not using fold_build2? Changed. Do we want to somehow account for the built expression size or maybe have a separate limit for the number of cases we want to combine this way? Huh, I would ignore it for now, but yes, can be accounted as well. + unswitch_predicate *predicate + = new unswitch_predicate (expr, idx, edge_index); + ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, +idx, *get_global_range_query ()); + /* Huge switches are not supported by Ranger. */ + if (predicate->true_range.undefined_p ()) I hope ranger will set the range to varying_p () in that case, not undefined? ... it's been discussed in a different sub-thread. But even then, is that a reason for punting? I guess we fail to prune cases in that case but the cost modeling should then account for those and thus we are at least consistent? Yes, but I think the situation (huge switches) will be rarely an interesting optimization opportunity. Plus we would have to find a hot case in such switch. That's currently ignored by the pass. + { + delete predicate; + return; In this context, when we do + if (operand_equal_p (predicate->lhs, last_predicate->lhs, 0)) + { + irange &other = (true_edge ? predicate->merged_true_range + : predicate->merged_false_range); + last_predicate->merged_true_range.intersect (other); + last_predicate->merged_false_range.intersect (other); + return; ranger may be conservative when intersecting (and hopefully not end up with undefined_p - heh). I also am confused about intersecting both merged_true_range and merged_false_range with 'other'? other is a predicate for a SSA_NAME _x_ that we already switched on. And thus we can improve both predicates for true/false edge that are also related to the same SSA_NAME _x_. I would have expected to merge true edge info with true edge info and thus only "swap" things somehow? OTOH "path" suggests we're dealing with more than one edge and associated ranges? Yep, path means the we already did a series of unswitching like: - if (a > 10) TRUE - if (b == 10) FALSE - if (c > 0) TRUE Maybe expanding the comment on the predicate_vector would be useful. AFAIR we Sure, will do that. there store the sequence of unswitchings done with pairs of the predicate unswitched on and a bool indicating whether we're dealing with the copy that had the predicate true or the one that had it false. Exactly. + unswitch_predicate *predicate = NULL; + basic_block bb = NULL; + if (num > param_max_unswitch_level) +{ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, +"Not unswitc
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/6/22 11:02, Martin Liška wrote: On 1/6/22 16:11, Andrew MacLeod wrote: On 1/5/22 07:34, Richard Biener wrote: On Thu, Dec 9, 2021 at 2:02 PM Martin Liška wrote: On 11/30/21 12:17, Richard Biener wrote: + unswitch_predicate *predicate + = new unswitch_predicate (expr, idx, edge_index); + ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, + idx, *get_global_range_query ()); + /* Huge switches are not supported by Ranger. */ + if (predicate->true_range.undefined_p ()) I hope ranger will set the range to varying_p () in that case, not undefined? But even then, is that a reason for punting? I guess we fail to prune cases in that case but the cost modeling should then account for those and thus we are at least consistent? huge switches not supported? I don't know what you mean, either that or I forget :-) If there are more edges than multi-ranges support, then things will start getting merged because they cant be represented.. and the default case may then also contain some values that also have cases.. but all inconsistencies will move towards varying.. not undefined. Andrew oh oh oh. EVRp was spending too much time in very large switches, so we added a param to say "don't look at switches too big" You are probably tripping over that -param=evrp-switch-limit= Common Joined UInteger Var(param_evrp_switch_limit) Init(50) Optimization Param Maximum number of outgoing edges in a switch before EVRP will not process it. and in GORI we check: // Do not process switches if they are too large. if (EDGE_COUNT (bb->succs) > (unsigned)param_evrp_switch_limit) return; so maybe you want to temporarily override param_evrp_switch_limit to some big number for your purposes... just set it back when you are done :-) Andrew
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/6/22 17:20, Andrew MacLeod wrote: So if you get a FALSE back, you cannot use any results because GORI is claiming that it cant figure something out... or there is nothing to figure out. Most of rangers routines are implemented so that if they return FALSE, the result is meaningless. All right, then it's my bad, sorry for the noise. what is IDX you are passing? order_385? Yep. (gdb) p idx $1 = As a side note, theres a typo in the testcase: .. Im not sure how that affects things, but : Oh, yeah, that's typo :) defaut: __builtin_unreachable (); default is misspelled... maybe it thinks that some kind of runtime value? I am surprised it even compiles. maybe that is mucking up what GORI thiunks it can calculate? But it does not affect anything. The bailout happens due to: │ 199// Only process switches if it within the size limit. │ 200if (EDGE_COUNT (e->src->succs) > (unsigned)m_max_edges) │ > 201 return NULL; Cheers, Martin
[PATCH] nvptx: Add support for PTX's cnot instruction.
Happy New Year for 2022. This is a simple patch, now that the nvptx backend has transitioned to STORE_FLAG_VALUE=1, that adds support for NVidia's cnot instruction, that implements C/C++ style logical negation. Previously, the simple function: int foo(int x) { return !x; } on nvptx-none with -O2 would generate: mov.u32 %r24, %ar0; setp.eq.u32 %r28, %r24, 0; selp.u32%value, 1, 0, %r28; with this patch, GCC now generates: mov.u32 %r24, %ar0; cnot.b32%value, %r24; This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a make and make -k check with no new failures. Ok for mainline? 2022-01-06 Roger Sayle gcc/ChangeLog * config/nvptx/nvptx.md (*cnot2): New define_insn. gcc/testsuite/ChangeLog * gcc.target/nvptx/cnot-1.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index a4c14a3..ce74672 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -592,6 +592,13 @@ "" "%.\\tnot.b%T0\\t%0, %1;") +(define_insn "*cnot2" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (eq:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (const_int 0)))] + "" + "%.\\tcnot.b%T0\\t%0, %1;") + (define_insn "bitrev2" [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")] diff --git a/gcc/testsuite/gcc.target/nvptx/cnot-1.c b/gcc/testsuite/gcc.target/nvptx/cnot-1.c new file mode 100644 index 000..d0bdccd --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/cnot-1.c @@ -0,0 +1,94 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int test1(int x) +{ + return !x; +} + +int test2(int x) +{ + return x ? 0 : 1; +} + +int test3(int x) +{ + return (x == 0) ? 1 : 0; +} + +unsigned int test4(unsigned int x) +{ + return !x; +} + +unsigned int test5(unsigned int x) +{ + return x ? 0 : 1; +} + +unsigned int test6(unsigned int x) +{ + return (x == 0) ? 1 : 0; +} + +short test7(short x) +{ + return !x; +} + +short test8(short x) +{ + return x ? 0 : 1; +} + +short test9(short x) +{ + return (x == 0) ? 1 : 0; +} + +unsigned short test10(unsigned short x) +{ + return !x; +} + +unsigned short test11(unsigned short x) +{ + return x ? 0 : 1; +} + +unsigned short test12(unsigned short x) +{ + return (x == 0) ? 1 : 0; +} + +long test13(long x) +{ + return !x; +} + +long test14(long x) +{ + return x ? 0 : 1; +} + +long test15(long x) +{ + return (x == 0) ? 1: 0; +} + +unsigned long test16(unsigned long x) +{ + return !x; +} + +unsigned long test17(unsigned long x) +{ + return x ? 0 : 1; +} + +unsigned long test18(unsigned long x) +{ + return (x == 0) ? 1 : 0; +} + +/* { dg-final { scan-assembler-times "cnot.b" 18 } } */
Re: [PATCH] Loop unswitching: support gswitch statements.
On 1/6/22 11:35, Martin Liška wrote: On 1/6/22 17:20, Andrew MacLeod wrote: So if you get a FALSE back, you cannot use any results because GORI is claiming that it cant figure something out... or there is nothing to figure out. Most of rangers routines are implemented so that if they return FALSE, the result is meaningless. All right, then it's my bad, sorry for the noise. what is IDX you are passing? order_385? Yep. (gdb) p idx $1 = As a side note, theres a typo in the testcase: .. Im not sure how that affects things, but : Oh, yeah, that's typo :) defaut: __builtin_unreachable (); default is misspelled... maybe it thinks that some kind of runtime value? I am surprised it even compiles. maybe that is mucking up what GORI thiunks it can calculate? But it does not affect anything. The bailout happens due to: │ 199 // Only process switches if it within the size limit. │ 200 if (EDGE_COUNT (e->src->succs) > (unsigned)m_max_edges) │ > 201 return NULL; Cheers, Martin yeah its created by: gori_compute::gori_compute (int not_executable_flag) : outgoing (param_evrp_switch_limit), tracer ("GORI ") so as long as you change it to whatever you want before you create a ranger, you should get whatever limit you want. anything above 255 may produce imprecise default values, if there are big holes in the switch because multi ranges only currently support up to 255 ranges.. so if there are more than 255 distinct subranges, it wont be exact. Andrew
Re: [PATCH 1/2] RISC-V: Allow extension name contain digit
Committed On Fri, Dec 3, 2021 at 11:51 PM Kito Cheng wrote: > > RISC-V spec only allow alphabetical name for extension before, however > vector extension add several extension named with digits, so we try to > extend the naming rule. > > Ref: > https://github.com/riscv/riscv-isa-manual/pull/718 > > gcc/ChangeLog: > > * common/config/riscv/riscv-common.c > (riscv_subset_list::parse_multiletter_ext): Allow ext. name has > digit. > --- > gcc/common/config/riscv/riscv-common.c | 42 +++--- > 1 file changed, 38 insertions(+), 4 deletions(-) > > diff --git a/gcc/common/config/riscv/riscv-common.c > b/gcc/common/config/riscv/riscv-common.c > index b8dd0aeac3e..31b1c833965 100644 > --- a/gcc/common/config/riscv/riscv-common.c > +++ b/gcc/common/config/riscv/riscv-common.c > @@ -760,24 +760,58 @@ riscv_subset_list::parse_multiletter_ext (const char *p, >bool explicit_version_p = false; >char *ext; >char backup; > + size_t len; > + size_t end_of_version_pos, i; > + bool found_any_number = false; > + bool found_minor_version = false; > > - while (*++q != '\0' && *q != '_' && !ISDIGIT (*q)) > + /* Parse until end of this extension including version number. */ > + while (*++q != '\0' && *q != '_') > ; > >backup = *q; >*q = '\0'; > - ext = xstrdup (subset); > + len = q - subset; >*q = backup; > > + end_of_version_pos = len; > + /* Find the begin of version string. */ > + for (i = len -1; i > 0; --i) > + { > + if (ISDIGIT (subset[i])) > + { > + found_any_number = true; > + continue; > + } > + /* Might be version seperator, but need to check one more char, > +we only allow p, so we could stop parsing if found > +any more `p`. */ > + if (subset[i] == 'p' && > + !found_minor_version && > + found_any_number && ISDIGIT (subset[i-1])) > + { > + found_minor_version = true; > + continue; > + } > + > + end_of_version_pos = i + 1; > + break; > + } > + > + backup = subset[end_of_version_pos]; > + subset[end_of_version_pos] = '\0'; > + ext = xstrdup (subset); > + subset[end_of_version_pos] = backup; > + >end_of_version > - = parsing_subset_version (ext, q, &major_version, &minor_version, > + = parsing_subset_version (ext, subset + end_of_version_pos, > &major_version, &minor_version, > /* std_ext_p= */ false, > &explicit_version_p); >free (ext); > >if (end_of_version == NULL) > return NULL; > > - *q = '\0'; > + subset[end_of_version_pos] = '\0'; > >if (strlen (subset) == 1) > { > -- > 2.34.0 >
Re: [PATCH 2/2] RISC-V: Minimal support of vector extensions
Committed with minor changelog fix On Fri, Dec 3, 2021 at 11:52 PM Kito Cheng wrote: > > gcc/ChangeLog: > > * common/config/riscv/riscv-common.c (riscv_implied_info): Add > vector extensions. > (riscv_ext_version_table): Add version info for vector extensions. > (riscv_ext_flag_table): Add option mask for vector extensions. > * config/riscv/riscv-opts.h (MASK_VECTOR_EEW_32): New. > (MASK_VECTOR_EEW_64): New. > (MASK_VECTOR_EEW_FP_32): New. > (MASK_VECTOR_EEW_FP_64): New. > (MASK_ZVL32B: New. > (MASK_ZVL64B: New. > (MASK_ZVL128B: New. > (MASK_ZVL256B: New. > (MASK_ZVL512B: New. > (MASK_ZVL1024B): New. > (MASK_ZVL2048B): New. > (MASK_ZVL4096B): New. > (MASK_ZVL8192B): New. > (MASK_ZVL16384B): New. > (MASK_ZVL32768B): New. > (MASK_ZVL65536B): New. > (TARGET_ZVL32B): New. > (TARGET_ZVL64B): New. > (TARGET_ZVL128B): New. > (TARGET_ZVL256B): New. > (TARGET_ZVL512B): New. > (TARGET_ZVL1024B): New. > (TARGET_ZVL2048B): New. > (TARGET_ZVL4096B): New. > (TARGET_ZVL8192B): New. > (TARGET_ZVL16384B): New. > (TARGET_ZVL32768B): New. > (TARGET_ZVL65536B): New. > * config/riscv/riscv.opt (Mask(VECTOR)): New. > (riscv_vector_eew_flags): New. > (riscv_zvl_flags): New. > > gcc/testsuite/ChangeLog: > > * testsuite/gcc.target/riscv/predef-14.c: New. > * testsuite/gcc.target/riscv/predef-15.c: Ditto. > * testsuite/gcc.target/riscv/predef-16.c: Ditto. > --- > gcc/common/config/riscv/riscv-common.c | 86 > gcc/config/riscv/riscv-opts.h | 31 > gcc/config/riscv/riscv.opt | 8 ++ > gcc/testsuite/gcc.target/riscv/predef-14.c | 83 > gcc/testsuite/gcc.target/riscv/predef-15.c | 91 ++ > gcc/testsuite/gcc.target/riscv/predef-16.c | 91 ++ > 6 files changed, 390 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/predef-14.c > create mode 100644 gcc/testsuite/gcc.target/riscv/predef-15.c > create mode 100644 gcc/testsuite/gcc.target/riscv/predef-16.c > > diff --git a/gcc/common/config/riscv/riscv-common.c > b/gcc/common/config/riscv/riscv-common.c > index 31b1c833965..5cf15024aef 100644 > --- a/gcc/common/config/riscv/riscv-common.c > +++ b/gcc/common/config/riscv/riscv-common.c > @@ -50,6 +50,38 @@ static const riscv_implied_info_t riscv_implied_info[] = >{"d", "f"}, >{"f", "zicsr"}, >{"d", "zicsr"}, > + > + {"v", "zvl128b"}, > + {"v", "zve64d"}, > + > + {"zve32f", "f"}, > + {"zve64f", "f"}, > + {"zve64d", "d"}, > + > + {"zve32x", "zvl32b"}, > + {"zve32f", "zve32x"}, > + {"zve32f", "zvl32b"}, > + > + {"zve64x", "zve32x"}, > + {"zve64x", "zvl64b"}, > + {"zve64f", "zve32f"}, > + {"zve64f", "zve64x"}, > + {"zve64f", "zvl64b"}, > + {"zve64d", "zve64f"}, > + {"zve64d", "zvl64b"}, > + > + {"zvl64b", "zvl32b"}, > + {"zvl128b", "zvl64b"}, > + {"zvl256b", "zvl128b"}, > + {"zvl512b", "zvl256b"}, > + {"zvl1024b", "zvl512b"}, > + {"zvl2048b", "zvl1024b"}, > + {"zvl4096b", "zvl2048b"}, > + {"zvl8192b", "zvl4096b"}, > + {"zvl16384b", "zvl8192b"}, > + {"zvl32768b", "zvl16384b"}, > + {"zvl65536b", "zvl32768b"}, > + >{NULL, NULL} > }; > > @@ -101,11 +133,34 @@ static const struct riscv_ext_version > riscv_ext_version_table[] = >{"zifencei", ISA_SPEC_CLASS_20191213, 2, 0}, >{"zifencei", ISA_SPEC_CLASS_20190608, 2, 0}, > > + > + {"v", ISA_SPEC_CLASS_NONE, 1, 0}, > + >{"zba", ISA_SPEC_CLASS_NONE, 1, 0}, >{"zbb", ISA_SPEC_CLASS_NONE, 1, 0}, >{"zbc", ISA_SPEC_CLASS_NONE, 1, 0}, >{"zbs", ISA_SPEC_CLASS_NONE, 1, 0}, > > + {"zve32x", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zve32f", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zve32d", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zve64x", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zve64f", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zve64d", ISA_SPEC_CLASS_NONE, 1, 0}, > + > + {"zvl32b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl64b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl128b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl256b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl512b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl1024b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl2048b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl4096b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl8192b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl16384b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0}, > + {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0}, > + >/* Terminate the list. */ >{NULL, ISA_SPEC_CLASS_NONE, 0, 0} > }; > @@ -940,6 +995,7 @@ static const riscv_ext_flag_table_t > riscv_ext_flag_table[] = >{"f", &gcc_options::x_target_flags, MASK_HARD_FLOAT}, >{"d", &gcc_options::x_target_flags, MASK_DOUBLE_FLOAT}, >{"c", &gcc_options::x_target_flags, MA
[Ada] Proof of runtime units for binary modular exponentiation
This proves the generic unit System.Exponu instantiated for Unsigned, Long_Long_Unsigned and Long_Long_Long_Unsigned. The proof is simpler than the one for signed integers, as there are no possible overflows here. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/s-explllu.ads: Mark in SPARK. * libgnat/s-expllu.ads: Mark in SPARK. * libgnat/s-exponu.adb: Add loop invariants and needed assertions. * libgnat/s-exponu.ads: Add functional contract. * libgnat/s-expuns.ads: Mark in SPARK.diff --git a/gcc/ada/libgnat/s-explllu.ads b/gcc/ada/libgnat/s-explllu.ads --- a/gcc/ada/libgnat/s-explllu.ads +++ b/gcc/ada/libgnat/s-explllu.ads @@ -34,10 +34,23 @@ -- The result is always full width, the caller must do a masking operation if -- the modulus is less than 2 ** Long_Long_Long_Unsigned'Size. +-- Preconditions in this unit are meant for analysis only, not for run-time +-- checking, so that the expected exceptions are raised. This is enforced +-- by setting the corresponding assertion policy to Ignore. Postconditions +-- and contract cases should not be executed at runtime as well, in order +-- not to slow down the execution of these functions. + +pragma Assertion_Policy (Pre=> Ignore, + Post => Ignore, + Contract_Cases => Ignore, + Ghost => Ignore); + with System.Exponu; with System.Unsigned_Types; -package System.Exp_LLLU is +package System.Exp_LLLU + with SPARK_Mode +is subtype Long_Long_Long_Unsigned is Unsigned_Types.Long_Long_Long_Unsigned; diff --git a/gcc/ada/libgnat/s-expllu.ads b/gcc/ada/libgnat/s-expllu.ads --- a/gcc/ada/libgnat/s-expllu.ads +++ b/gcc/ada/libgnat/s-expllu.ads @@ -34,10 +34,23 @@ -- The result is always full width, the caller must do a masking operation if -- the modulus is less than 2 ** Long_Long_Unsigned'Size. +-- Preconditions in this unit are meant for analysis only, not for run-time +-- checking, so that the expected exceptions are raised. This is enforced +-- by setting the corresponding assertion policy to Ignore. Postconditions +-- and contract cases should not be executed at runtime as well, in order +-- not to slow down the execution of these functions. + +pragma Assertion_Policy (Pre=> Ignore, + Post => Ignore, + Contract_Cases => Ignore, + Ghost => Ignore); + with System.Exponu; with System.Unsigned_Types; -package System.Exp_LLU is +package System.Exp_LLU + with SPARK_Mode +is subtype Long_Long_Unsigned is Unsigned_Types.Long_Long_Unsigned; diff --git a/gcc/ada/libgnat/s-exponu.adb b/gcc/ada/libgnat/s-exponu.adb --- a/gcc/ada/libgnat/s-exponu.adb +++ b/gcc/ada/libgnat/s-exponu.adb @@ -29,7 +29,19 @@ -- -- -- -function System.Exponu (Left : Int; Right : Natural) return Int is +function System.Exponu (Left : Int; Right : Natural) return Int + with SPARK_Mode +is + -- Preconditions, postconditions, ghost code, loop invariants and + -- assertions in this unit are meant for analysis only, not for run-time + -- checking, as it would be too costly otherwise. This is enforced by + -- setting the assertion policy to Ignore. + + pragma Assertion_Policy (Pre=> Ignore, +Post => Ignore, +Ghost => Ignore, +Loop_Invariant => Ignore, +Assert => Ignore); -- Note that negative exponents get a constraint error because the -- subtype of the Right argument (the exponent) is Natural. @@ -49,7 +61,16 @@ begin if Exp /= 0 then loop + pragma Loop_Invariant (Exp > 0); + pragma Loop_Invariant (Result * Factor ** Exp = Left ** Right); + pragma Loop_Variant (Decreases => Exp); + if Exp rem 2 /= 0 then +pragma Assert + (Result * (Factor * Factor ** (Exp - 1)) = Left ** Right); +pragma Assert + ((Result * Factor) * Factor ** (Exp - 1) = Left ** Right); + Result := Result * Factor; end if; diff --git a/gcc/ada/libgnat/s-exponu.ads b/gcc/ada/libgnat/s-exponu.ads --- a/gcc/ada/libgnat/s-exponu.ads +++ b/gcc/ada/libgnat/s-exponu.ads @@ -31,8 +31,22 @@ -- Modular integer exponentiation +-- Preconditions in this unit are meant for analysis only, not for run-time +-- checking, so that the expected exceptions are raised. This is enforced +-- by setting the corresponding assertion policy to Ignore. Postconditions +-- and contract cases should not be executed at runtime as well, in order +-- not to slow down the execution of these functi
[Ada] Proof of runtime unit for non-binary modular exponentiation
This proof combines the difficulties of proving signed exponentiation, as we need to compare the result to the mathematical one using big integers, with pervasive use of modulo operation. This requires lemmas which should later be isolated in a shared library, for possible reuse in other runtime units. The postcondition previously stated on Exp_Modular was wrong, as it performed exponentiation in the Unsigned type, which was silently performing already a different modulo operation. This is fixed here by using big integers instead. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/s-expmod.adb: Mark in SPARK. Add ghost code for proof. * libgnat/s-expmod.ads: Mark in SPARK. Add ghost specifications.diff --git a/gcc/ada/libgnat/s-expmod.adb b/gcc/ada/libgnat/s-expmod.adb --- a/gcc/ada/libgnat/s-expmod.adb +++ b/gcc/ada/libgnat/s-expmod.adb @@ -29,9 +29,168 @@ -- -- -- -package body System.Exp_Mod is +-- Preconditions, postconditions, ghost code, loop invariants and assertions +-- in this unit are meant for analysis only, not for run-time checking, as it +-- would be too costly otherwise. This is enforced by setting the assertion +-- policy to Ignore. + +pragma Assertion_Policy (Pre=> Ignore, + Post => Ignore, + Ghost => Ignore, + Loop_Invariant => Ignore, + Assert => Ignore); + +with Ada.Numerics.Big_Numbers.Big_Integers_Ghost; +use Ada.Numerics.Big_Numbers.Big_Integers_Ghost; + +package body System.Exp_Mod + with SPARK_Mode +is use System.Unsigned_Types; + -- Local lemmas + + procedure Lemma_Add_Mod (X, Y : Big_Natural; B : Big_Positive) + with + Ghost, + Post => (X + Y) mod B = ((X mod B) + (Y mod B)) mod B; + + procedure Lemma_Exp_Expand (A : Big_Integer; Exp : Natural) + with + Ghost, + Post => + (if Exp rem 2 = 0 then + A ** Exp = A ** (Exp / 2) * A ** (Exp / 2) +else + A ** Exp = A ** (Exp / 2) * A ** (Exp / 2) * A); + + procedure Lemma_Exp_Mod (A : Big_Natural; Exp : Natural; B : Big_Positive) + with + Ghost, + Subprogram_Variant => (Decreases => Exp), + Post => ((A mod B) ** Exp) mod B = (A ** Exp) mod B; + + procedure Lemma_Mod_Ident (A : Big_Natural; B : Big_Positive) + with + Ghost, + Pre => A < B, + Post => A mod B = A; + + procedure Lemma_Mod_Mod (A : Big_Integer; B : Big_Positive) + with + Ghost, + Post => A mod B mod B = A mod B; + + procedure Lemma_Mult_Div (X : Big_Natural; Y : Big_Positive) + with + Ghost, + Post => X * Y / Y = X; + + procedure Lemma_Mult_Mod (X, Y : Big_Natural; B : Big_Positive) + with + Ghost, + -- The following subprogram variant can be added as soon as supported + -- Subprogram_Variant => (Decreases => Y), + Post => (X * Y) mod B = ((X mod B) * (Y mod B)) mod B; + + - + -- Local lemma null bodies -- + - + + procedure Lemma_Mod_Ident (A : Big_Natural; B : Big_Positive) is null; + procedure Lemma_Mod_Mod (A : Big_Integer; B : Big_Positive) is null; + procedure Lemma_Mult_Div (X : Big_Natural; Y : Big_Positive) is null; + + --- + -- Lemma_Add_Mod -- + --- + + procedure Lemma_Add_Mod (X, Y : Big_Natural; B : Big_Positive) is + Left : constant Big_Natural := (X + Y) mod B; + Right : constant Big_Natural := ((X mod B) + (Y mod B)) mod B; + XQuot : constant Big_Natural := X / B; + YQuot : constant Big_Natural := Y / B; + AQuot : constant Big_Natural := (X mod B + Y mod B) / B; + begin + if Y /= 0 and B > 1 then + pragma Assert (X = XQuot * B + X mod B); + pragma Assert (Y = YQuot * B + Y mod B); + pragma Assert + (Left = ((XQuot + YQuot) * B + X mod B + Y mod B) mod B); + pragma Assert (X mod B + Y mod B = AQuot * B + Right); + pragma Assert (Left = ((XQuot + YQuot + AQuot) * B + Right) mod B); + pragma Assert (Left = Right); + end if; + end Lemma_Add_Mod; + + -- + -- Lemma_Exp_Expand -- + -- + + procedure Lemma_Exp_Expand (A : Big_Integer; Exp : Natural) is + begin + if Exp rem 2 = 0 then + pragma Assert (Exp = Exp / 2 + Exp / 2); + else + pragma Assert (Exp = Exp / 2 + Exp / 2 + 1); + pragma Assert (A ** Exp = A ** (Exp / 2) * A ** (Exp / 2 + 1)); + pragma Assert (A ** (Exp / 2 + 1) = A ** (Exp / 2) * A); + pragma Assert (A ** Exp = A ** (Exp / 2) * A ** (Exp / 2) * A); + end if; + end Lemma_Exp_Expand; + + --- + -- Lemma_Exp_Mod -- + --- + + procedure Lemma_Exp_
[Ada] Simplify GNAT AST printing with simple GNAT hash table
For pretty-printing of GNAT AST we had a custom hash table which stored visited nodes. Now this custom hash table is replaced with an instance of GNAT.Dynamic_Tables.Dynamic_Hash_Tables. Expansion and compression factors for this table are the same as for all other instances of Dynamic_Hash_Tables in the frontend. Code cleanup; behaviour is unaffected; no noticeable difference in performance either (when comparing the running time with AST dump for a reasonably large file, i.e. "gcc -c sem_util.adb -gnatdt"). Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * treepr.ads (Treepr, Print_Tree_List, Print_Tree_Elist): Fix style in comments. * treepr.adb (Serial_Numbers): Hash table instance. (Hash): Hashing routine. (Print_Field): Fix style. (Print_Init): Adapt to simple hash table. (Print_Term): Likewise. (Serial_Numbers): Likewise. (Set_Serial_Number): Likewise.diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb --- a/gcc/ada/treepr.adb +++ b/gcc/ada/treepr.adb @@ -23,32 +23,32 @@ -- -- -- -with Aspects;use Aspects; -with Atree; use Atree; -with Csets; use Csets; -with Debug; use Debug; -with Einfo; use Einfo; -with Einfo.Entities; use Einfo.Entities; -with Einfo.Utils;use Einfo.Utils; -with Elists; use Elists; -with Lib;use Lib; -with Namet; use Namet; -with Nlists; use Nlists; -with Output; use Output; -with Seinfo; use Seinfo; -with Sinfo; use Sinfo; -with Sinfo.Nodes;use Sinfo.Nodes; -with Sinfo.Utils;use Sinfo.Utils; -with Snames; use Snames; -with Sinput; use Sinput; -with Stand; use Stand; -with Stringt;use Stringt; -with SCIL_LL;use SCIL_LL; -with Uintp; use Uintp; -with Urealp; use Urealp; -with Uname; use Uname; +with Aspects; use Aspects; +with Atree;use Atree; +with Csets;use Csets; +with Debug;use Debug; +with Einfo;use Einfo; +with Einfo.Entities; use Einfo.Entities; +with Einfo.Utils; use Einfo.Utils; +with Elists; use Elists; +with GNAT.Dynamic_HTables; use GNAT.Dynamic_HTables; +with Lib; use Lib; +with Namet;use Namet; +with Nlists; use Nlists; +with Output; use Output; +with Seinfo; use Seinfo; +with Sinfo;use Sinfo; +with Sinfo.Nodes; use Sinfo.Nodes; +with Sinfo.Utils; use Sinfo.Utils; +with Snames; use Snames; +with Sinput; use Sinput; +with Stand;use Stand; +with Stringt; use Stringt; +with SCIL_LL; use SCIL_LL; +with Uintp;use Uintp; +with Urealp; use Urealp; +with Uname;use Uname; with Unchecked_Conversion; -with Unchecked_Deallocation; package body Treepr is @@ -80,24 +80,30 @@ package body Treepr is -- Set True to print low-level information useful for debugging Atree and -- the like. - type Hash_Record is record - Serial : Nat; - -- Serial number for hash table entry. A value of zero means that - -- the entry is currently unused. - - Id : Int; - -- If serial number field is non-zero, contains corresponding Id value - end record; - - type Hash_Table_Type is array (Nat range <>) of Hash_Record; - type Access_Hash_Table_Type is access Hash_Table_Type; - Hash_Table : Access_Hash_Table_Type; + function Hash (Key : Int) return GNAT.Bucket_Range_Type; + -- Simple Hash function for Node_Ids, List_Ids and Elist_Ids + + procedure Destroy (Value : in out Nat) is null; + -- Dummy routine for destroing hashed values + + package Serial_Numbers is new Dynamic_Hash_Tables + (Key_Type => Int, + Value_Type=> Nat, + No_Value => 0, + Expansion_Threshold => 1.5, + Expansion_Factor => 2, + Compression_Threshold => 0.3, + Compression_Factor=> 2, + "=" => "=", + Destroy_Value => Destroy, + Hash => Hash); + -- Hash tables with dynamic resizing based on load factor. They provide + -- reasonable performance both when the printed AST is small (e.g. when + -- printing from debugger) and large (e.g. when printing with -gnatdt). + + Hash_Table : Serial_Numbers.Dynamic_Hash_Table; -- The hash table itself, see Serial_Number function for details of use - Hash_Table_Len : Nat; - -- Range of Hash_Table is from 0 .. Hash_Table_Len - 1 so that dividing - -- by Hash_Table_Len gives a remainder that is in Hash_Table'Range. - Next_Serial_Number
[Ada] Simplify repeated calls in printing of GNAT AST
Code cleanup; behaviour is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * treepr.adb (Visit_Node): Simplify repeated call to Next_Entity.diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb --- a/gcc/ada/treepr.adb +++ b/gcc/ada/treepr.adb @@ -2305,8 +2305,8 @@ package body Treepr is begin Nod := N; while Present (Nod) loop - Visit_Descendant (Union_Id (Next_Entity (Nod))); Next_Entity (Nod); + Visit_Descendant (Union_Id (Nod)); end loop; end; end if;
[Ada] Crash in class-wide pre/postconditions
The compiler may crash processing a class-wide pre/postcondition that has dispatching calls using the Object.Operation notation. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * contracts.adb (Restore_Original_Selected_Component): New subprogram that traverses a preanalyzed expression searching for dispatching calls to functions whose original node was a selected component, and replacing them with their original node. This functionality is required because the preanalyis of dispatching calls using the Object.Operation notation transforms such calls, and we need the original condition to properly inherit and extend the condition expression on tagged type derivations. This functionality was previously provided by the routine Install_Original_Selected_Component (as part of inheriting conditions); now it is performed as part of the preanalysis of the condition, thus avoiding repeatedly installing and restoring such nodes. (Install_Original_Selected_Component): Removed. (Restore_Dispatching_Calls): Removed.diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb --- a/gcc/ada/contracts.adb +++ b/gcc/ada/contracts.adb @@ -4254,6 +4254,11 @@ package body Contracts is procedure Remove_Formals (Id : Entity_Id); -- Remove formals from homonym chains and make them not visible + procedure Restore_Original_Selected_Component; + -- Traverse Expr searching for dispatching calls to functions whose + -- original node was a selected component, and replace them with + -- their original node. + -- Clear_Unset_References -- @@ -4313,6 +4318,46 @@ package body Contracts is end loop; end Remove_Formals; + - + -- Restore_Original_Selected_Component -- + - + + procedure Restore_Original_Selected_Component is + +function Restore_Node (N : Node_Id) return Traverse_Result; +-- Process a single node + +-- +-- Restore_Node -- +-- + +function Restore_Node (N : Node_Id) return Traverse_Result is +begin + if Nkind (N) = N_Function_Call + and then Nkind (Original_Node (N)) = N_Selected_Component + and then Is_Dispatching_Operation (Entity (Name (N))) + then + Rewrite (N, Original_Node (N)); + Set_Original_Node (N, N); + + -- Restore decoration of its child nodes; required to ensure + -- proper copies of this subtree (if required) by subsequent + -- calls to New_Copy_Tree (since otherwise these child nodes + -- are not duplicated). + + Set_Parent (Prefix (N), N); + Set_Parent (Selector_Name (N), N); + end if; + + return OK; +end Restore_Node; + +procedure Restore_Nodes is new Traverse_Proc (Restore_Node); + + begin +Restore_Nodes (Expr); + end Restore_Original_Selected_Component; + -- Start of processing for Preanalyze_Condition begin @@ -4329,6 +4374,16 @@ package body Contracts is Remove_Formals (Subp); Pop_Scope; + -- If this preanalyzed condition has occurrences of dispatching calls + -- using the Object.Operation notation, during preanalysis such calls + -- are rewritten as dispatching function calls; if at later stages + -- this condition is inherited we must have restored the original + -- selected-component node to ensure that the preanalysis of the + -- inherited condition rewrites these dispatching calls in the + -- correct context to avoid reporting spurious errors. + + Restore_Original_Selected_Component; + -- Traverse Expr and clear the Controlling_Argument of calls to -- nonabstract functions. Required since the preanalyzed condition -- is not yet installed on its definite context and will be cloned @@ -4373,103 +4428,9 @@ package body Contracts is (Par_Subp : Entity_Id; Subp : Entity_Id) return Node_Id is -Installed_Calls : constant Elist_Id := New_Elmt_List; - -procedure Install_Original_Selected_Component (Expr : Node_Id); --- Traverse the given expression searching for dispatching calls --- to functions whose original nodes was a selected component, --- and replacing them temporarily by a copy of their original --- node. Modified calls are stored in the list Installed
[Ada] Justify false positive message from CodePeer analysis of GNAT
Analysis of loop variant is known to lead to false alarms with CodePeer. Add pragma Annotate in such a case which can be justified. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/s-exponu.adb (Exponu): Add annotation.diff --git a/gcc/ada/libgnat/s-exponu.adb b/gcc/ada/libgnat/s-exponu.adb --- a/gcc/ada/libgnat/s-exponu.adb +++ b/gcc/ada/libgnat/s-exponu.adb @@ -64,6 +64,9 @@ begin pragma Loop_Invariant (Exp > 0); pragma Loop_Invariant (Result * Factor ** Exp = Left ** Right); pragma Loop_Variant (Decreases => Exp); + pragma Annotate + (CodePeer, False_Positive, +"validity check", "confusion on generated code"); if Exp rem 2 /= 0 then pragma Assert
[Ada] Removal of technical debt
This patch removes various technical debt in the form of "???" comments throughout the GNAT sources. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * exp_ch6.adb (Add_Simple_Call_By_Copy_Code): Add comments regarding special handling of components which depend on discriminants. * exp_dist.adb (Build_From_Any_Function): Add Real_Rep actual for calls to Has_Stream_Attribute_Definition. (Build_To_Any_Function): Likewise. (Build_TypeCode_Function): Likewise. * freeze.adb (Freeze_Entity): Add missing comment for Test_E. * libgnat/s-utf_32.adb: Remove disabled warning comments and temporarily inserted pragma warnings. Remove very old (2006 and 2012) comments about bootstrapping older versions. * par.adb (P_Identifier): Add new parameter Force_Msg. * par-ch2.adb (P_Identifier): Restructure and clean up function. * par-ch3.adb (P_Defining_Identifier): Remove code duplication for parsing identifiers. * sem_attr.adb (Stream_Attribute_Available): Add missing comments and add Real_Rep actual for calls to Has_Stream_Attribute_Definition. * sem_cat.adb (Has_Read_Write_Attribute): Add Real_Rep actual for calls to Has_Stream_Attribute_Definition. (Has_Stream_Attribute_Definition): Remove local Real_Rep and fix recursive calls. Add default value for Real_Rep. * sem_cat.ads (Has_Stream_Attribute_Definition): Add new out parameter "Real_Rep". * sem_type.adb (Add_Entry): Add condition to avoid passing non-function calls to Function_Interp_Has_Abstract_Op. (Function_Interp_Has_Abstract_Op): Add missing comments and remove check for Is_Overloadable. * sem_util.adb (Derivation_Too_Early_To_Inherit): Remove duplicated code.diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb --- a/gcc/ada/exp_ch6.adb +++ b/gcc/ada/exp_ch6.adb @@ -1899,7 +1899,7 @@ package body Exp_Ch6 is Reset_Packed_Prefix; - Temp := Make_Temporary (Loc, 'T', Actual); + Temp := Make_Temporary (Loc, 'T', Actual); Incod := Relocate_Node (Actual); Outcod := New_Copy_Tree (Incod); @@ -1921,7 +1921,10 @@ package body Exp_Ch6 is elsif Inside_Init_Proc then --- Could use a comment here to match comment below ??? +-- Skip using the actual as the expression in Decl if we are in +-- an init proc and it is not a component which depends on a +-- discriminant, because, in this case, we need to use the actual +-- type of the component instead. if Nkind (Actual) /= N_Selected_Component or else @@ -1930,8 +1933,9 @@ package body Exp_Ch6 is then Incod := Empty; --- Otherwise, keep the component in order to generate the proper --- actual subtype, that depends on enclosing discriminants. +-- Otherwise, keep the component so we can generate the proper +-- actual subtype - since the subtype depends on enclosing +-- discriminants. else null; diff --git a/gcc/ada/exp_dist.adb b/gcc/ada/exp_dist.adb --- a/gcc/ada/exp_dist.adb +++ b/gcc/ada/exp_dist.adb @@ -8600,6 +8600,8 @@ package body Exp_Dist is Use_Opaque_Representation : Boolean; +Real_Rep : Node_Id; + begin -- For a derived type, we can't go past the base type (to the -- parent type) here, because that would cause the attribute's @@ -8634,10 +8636,10 @@ package body Exp_Dist is Use_Opaque_Representation := False; if Has_Stream_Attribute_Definition - (Typ, TSS_Stream_Output, At_Any_Place => True) + (Typ, TSS_Stream_Output, Real_Rep, At_Any_Place => True) or else Has_Stream_Attribute_Definition - (Typ, TSS_Stream_Write, At_Any_Place => True) + (Typ, TSS_Stream_Write, Real_Rep, At_Any_Place => True) then -- If user-defined stream attributes are specified for this -- type, use them and transmit data as an opaque sequence of @@ -9438,6 +9440,8 @@ package body Exp_Dist is -- When True, use stream attributes and represent type as an -- opaque sequence of bytes. +Real_Rep : Node_Id; + begin -- For a derived type, we can't go past the base type (to the -- parent type) here, because that would cause the attribute's @@ -9492,10 +9496,10 @@ package body Exp_Dist is Use_Opaque_Representation := False; if Has_Stream_Attribute_Definition - (Typ, TSS_Stream_Output, At_Any_Place => True) + (Typ, TSS_Stream_Output, Real_Rep, At_Any_Place
[Ada] Suppress spurious CodePeer check on generic actual subprogram
Procedure Destroy is intentionally doing nothing and needs an IN OUT parameter, because to match the profile of a generic formal subprogram. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * treepr.adb (Destroy): Prevent spurious check from CodePeer.diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb --- a/gcc/ada/treepr.adb +++ b/gcc/ada/treepr.adb @@ -84,6 +84,8 @@ package body Treepr is -- Simple Hash function for Node_Ids, List_Ids and Elist_Ids procedure Destroy (Value : in out Nat) is null; + pragma Annotate (CodePeer, False_Positive, "unassigned parameter", +"in out parameter is required to instantiate generic"); -- Dummy routine for destroing hashed values package Serial_Numbers is new Dynamic_Hash_Tables
[Ada] Rename Any_Access into Universal_Access
The front-end defines an Any_Access entity which is only used as the type of the literal null. Now, since AI95-0230, the RM 4.2(8/2) clause reads: "An integer literal is of type universal_integer. A real literal is of type universal_real. The literal null is of type universal_access." and e.g. Find_Non_Universal_Interpretations deals with Any_Access as if it was an universal type, so it is more consistent to rename it into Universal_Access. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * stand.ads (Any_Access): Delete. (Universal_Access): New entity. * einfo.ads: Remove obsolete reference to Any_Access. * gen_il-gen-gen_entities.adb: Likewise. * cstand.adb (Create_Standard): Do not create Any_Access and create Universal_Access as a full type instead. * errout.adb (Set_Msg_Insertion_Type_Reference): Do not deal with Any_Access and deal with Universal_Access instead. * sem_ch3.adb (Analyze_Object_Declaration): Replace Any_Access with Universal_Access. * sem_ch4.adb (Analyze_Null): Likewise. (Find_Non_Universal_Interpretations): Likewise. (Find_Equality_Types.Try_One_Interp): Likewise and avoid shadowing by renaming a local variable of the same name. * sem_res.adb (Make_Call_Into_Operato): Likewise. (Resolve_Equality_Op): Likewise. * sem_type.adb (Covers): Likewise. (Specific_Type): Likewise.diff --git a/gcc/ada/cstand.adb b/gcc/ada/cstand.adb --- a/gcc/ada/cstand.adb +++ b/gcc/ada/cstand.adb @@ -1191,15 +1191,6 @@ package body CStand is pragma Assert (not Known_Esize (Any_Id)); pragma Assert (not Known_Alignment (Any_Id)); - Any_Access := New_Standard_Entity ("an access type"); - Mutate_Ekind (Any_Access, E_Access_Type); - Set_Scope (Any_Access, Standard_Standard); - Set_Etype (Any_Access, Any_Access); - Init_Size (Any_Access, System_Address_Size); - Set_Elem_Alignment(Any_Access); - Set_Directly_Designated_Type -(Any_Access, Any_Type); - Any_Character := New_Standard_Entity ("a character type"); Mutate_Ekind (Any_Character, E_Enumeration_Type); Set_Scope (Any_Character, Standard_Standard); @@ -1416,6 +1407,16 @@ package body CStand is Set_Size_Known_At_Compile_Time (Universal_Fixed); + Universal_Access := New_Standard_Entity ("universal_access"); + Decl := New_Node (N_Full_Type_Declaration, Stloc); + Set_Defining_Identifier (Decl, Universal_Access); + Mutate_Ekind (Universal_Access, E_Access_Type); + Set_Etype(Universal_Access, Universal_Access); + Set_Scope(Universal_Access, Standard_Standard); + Init_Size(Universal_Access, System_Address_Size); + Set_Elem_Alignment (Universal_Access); + Set_Directly_Designated_Type (Universal_Access, Any_Type); + -- Create type declaration for Duration, using a 64-bit size. The -- delta and size values depend on the mode set in system.ads. diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads --- a/gcc/ada/einfo.ads +++ b/gcc/ada/einfo.ads @@ -4864,10 +4864,6 @@ package Einfo is -- associated with an access attribute. After resolution a specific access -- type will be established as determined by the context. --- Finally, the type Any_Access is used to label -null- during type --- resolution. Any_Access is also replaced by the context type after --- resolution. - -- Description of Defined Attributes for Entity_Kinds -- diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb --- a/gcc/ada/errout.adb +++ b/gcc/ada/errout.adb @@ -3622,8 +3622,7 @@ package body Errout is Set_Msg_Str ("exception name"); return; - elsif Error_Msg_Node_1 = Any_Access -or else Error_Msg_Node_1 = Any_Array + elsif Error_Msg_Node_1 = Any_Array or else Error_Msg_Node_1 = Any_Boolean or else Error_Msg_Node_1 = Any_Character or else Error_Msg_Node_1 = Any_Composite @@ -3640,17 +3639,21 @@ package body Errout is Set_Msg_Name_Buffer; return; - elsif Error_Msg_Node_1 = Universal_Real then - Set_Msg_Str ("type universal real"); - return; - elsif Error_Msg_Node_1 = Universal_Integer then Set_Msg_Str ("type universal integer"); return; + elsif Error_Msg_Node_1 = Universal_Real then + Set_Msg_Str ("type universal real"); + return; + elsif Error_Msg_Node_1 = Universal_Fixed then Set_Msg_Str ("type universal fixed"); return; + + elsif Error_Msg_Node_1 = Universal_Access then +
[Ada] Remove duplicates of empty strings
In package Stringt we already have a Null_String_Id, which represents a null string with length zero. There is no need to duplicate it in other packages. Cleanup originating from enabling expansion of dispatching wrappers for GNATprove; semantics is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * par-ch4.adb (P_Simple_Expression): Reuse Null_String_Id. * prep.adb (Parse_Def_File): Likewise; remove Empty_String.diff --git a/gcc/ada/par-ch4.adb b/gcc/ada/par-ch4.adb --- a/gcc/ada/par-ch4.adb +++ b/gcc/ada/par-ch4.adb @@ -2471,9 +2471,6 @@ package body Ch4 is and then Num_Concats >= Num_Concats_Threshold then declare - Empty_String_Val : String_Id; - -- String_Id for "" - Strlit_Concat_Val : String_Id; -- Contains the folded value (which will be correct if the -- "&" operators are the predefined ones). @@ -2509,11 +2506,9 @@ package body Ch4 is -- Create new folded node, and rewrite result with a concat- -- enation of an empty string literal and the folded node. - Start_String; - Empty_String_Val := End_String; New_Node := Make_Op_Concat (Loc, - Make_String_Literal (Loc, Empty_String_Val), + Make_String_Literal (Loc, Null_String_Id), Make_String_Literal (Loc, Strlit_Concat_Val, Is_Folded_In_Parser => True)); Rewrite (Node1, New_Node); diff --git a/gcc/ada/prep.adb b/gcc/ada/prep.adb --- a/gcc/ada/prep.adb +++ b/gcc/ada/prep.adb @@ -114,9 +114,6 @@ package body Prep is -- Used to avoid repetition of the part of the initialisation that needs -- to be done only once. - Empty_String : String_Id; - -- "", as a string_id - String_False : String_Id; -- "false", as a string_id @@ -809,9 +806,6 @@ package body Prep is Store_String_Chars ("True"); True_Value.Value := End_String; - Start_String; - Empty_String := End_String; - Start_String; Store_String_Chars ("False"); String_False := End_String; @@ -1072,7 +1066,7 @@ package body Prep is Original=> Original_Name, On_The_Command_Line => False, Is_A_String => False, -Value => Empty_String); +Value => Null_String_Id); else Value_Start := Token_Ptr;
[Ada] New restriction No_Tagged_Type_Registration
This patch implements the No_Tagged_Type_Registration restriction, analogous to No_Exception_Registration, but for tagged types. Fix several violations of the RTE_Available protocol, as documented in rtsfind.ads: -- If we call this and it returns True, we should generate a reference to -- E (usually a call). In other words, for a subprogram E, the compiler -- should not call RTE_Available (E) until it has decided it wants to -- generate a call to E. This implies that RTE_Available must be called last in certain if statements. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/s-rident.ads (No_Tagged_Type_Registration): New restriction identifier. * restrict.ads (Implementation_Restriction): Add restriction. * exp_ch7.adb (Process_Declarations): Suppress tagged-type-related finalization actions if the restriction is active. Call RTE_Available last. * exp_disp.adb (Make_DT): Likewise. * exp_util.adb (Requires_Cleanup_Actions): Return False for a tagged type declaration if No_Tagged_Type_Registration is active. * sem_attr.adb (Check_Stream_Attribute): Check restriction No_Tagged_Type_Registration. * libgnat/a-except.ads (Null_Occurrence): Minor: Initialize, to avoid stopping at a warning in gdb. * doc/gnat_rm/standard_and_implementation_defined_restrictions.rst: Document new restriction. * gnat_rm.texi: Regenerate. patch.diff.gz Description: application/gzip
[Ada] Spurious error when using current instance of type
This patch fixes an issue in the compiler whereby it fails to recognize the presence of a current instance of an incomplete type when the instance is used within a default expression for a record component. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * exp_ch3.adb (Build_Assignment): Replace current instance of type with Init_Proc formal. * sem_attr.adb (OK_Self_Reference): Handle recognition of Current_Instance to detect certain expansion. * sem_ch4.adb (Analyze_One_Call): Set actual's type when the actual in question is a current instance and its corresponding formal is an incomplete type. * sem_util.adb (Is_Current_Instance): Add check for incomplete views and add comment.diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb --- a/gcc/ada/exp_ch3.adb +++ b/gcc/ada/exp_ch3.adb @@ -2022,6 +2022,25 @@ package body Exp_Ch3 is Elmt2 => Defining_Identifier (First (Parameter_Specifications (Parent (Proc_Id); + + -- If the type has an incomplete view, a current instance + -- may have an incomplete type. In that case, it must also be + -- replaced by the formal of the Init_Proc. + + if Nkind (Parent (Rec_Type)) = N_Full_Type_Declaration + and then Present (Incomplete_View (Parent (Rec_Type))) + then + Append_Elmt ( +N => Defining_Identifier +(Incomplete_View (Parent (Rec_Type))), +To => Map); + Append_Elmt ( +N => Defining_Identifier +(First + (Parameter_Specifications +(Parent (Proc_Id, +To => Map); + end if; end if; Exp := New_Copy_Tree (Exp, New_Scope => Proc_Id, Map => Map); diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb --- a/gcc/ada/sem_attr.adb +++ b/gcc/ada/sem_attr.adb @@ -989,7 +989,15 @@ package body Sem_Attr is Set_Etype (P, Typ); end if; - if Typ = Scop then + -- A current instance typically appears immediately within + -- the type declaration, but may be nested within an internally + -- generated temporary scope - as for an aggregate of a + -- discriminated component. + + if Typ = Scop + or else (In_Open_Scopes (Typ) + and then not Comes_From_Source (Scop)) + then declare Q : Node_Id := Parent (N); diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb --- a/gcc/ada/sem_ch4.adb +++ b/gcc/ada/sem_ch4.adb @@ -3727,6 +3727,24 @@ package body Sem_Ch4 is Next_Actual (Actual); Next_Formal (Formal); + -- A current instance used as an actual of a function, + -- whose body has not been seen, may include a formal + -- whose type is an incomplete view of an enclosing + -- type declaration containing the current call (e.g. + -- in the Expression for a component declaration). + + -- In this case, update the signature of the subprogram + -- so the formal has the type of the full view. + + elsif Inside_Init_Proc + and then Nkind (Actual) = N_Identifier + and then Ekind (Etype (Formal)) = E_Incomplete_Type + and then Etype (Actual) = Full_View (Etype (Formal)) + then + Set_Etype (Formal, Etype (Actual)); + Next_Actual (Actual); + Next_Formal (Formal); + -- Handle failed type check else diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -16680,7 +16680,18 @@ package body Sem_Util is | N_Private_Type_Declaration | N_Subtype_Declaration and then Comes_From_Source (P) - and then Defining_Entity (P) = Typ + + -- If the type has a previous incomplete declaration, the + -- reference in the type definition may have the incomplete + -- view. So, here we detect if this incomplete view is a current + -- instance by checking if its full view is the entity of the + -- full declaration begin analyzed. + + and then +(Defining_Entity (P) = Typ + or else + (Ekind (Typ) = E_Incomplete_Type + and then Full_View (Typ) = Defining_Entity (P)))
[Ada] Avoid building malformed component constraints
Given a discriminated type T1 with discriminant D1 having a component C1 of another discriminated type T2 with discriminant D2 and a propagated discriminant constraint (that is, "C1 : T2 (D2 => D1);" and, for example, a parameter of type T1, the compiler will sometimes build an anonymous subtype to describe the constraints of the C1 component of that parameter. In some cases, these constraints were malformed; this could result in either internal errors during compilation or the generation of incorrect constraint checks. This error is corrected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_util.adb (Build_Actual_Subtype_Of_Component): Define a new local function, Build_Discriminant_Reference, and call it in each of the three cases where Make_Selected_Component was previously being called to construct a discriminant reference (2 in Build_Actual_Array_Constraint and 1 in Build_Actual_Record_Constraint). Instead of unconditionally using the passed-in object name as the prefix for the new selected component node, this new function checks to see if perhaps a prefix of that name should be used instead.diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -1970,6 +1970,12 @@ package body Sem_Util is -- Similar to previous one, for discriminated components constrained -- by the discriminant of the enclosing object. + function Build_Discriminant_Reference +(Discrim_Name : Node_Id; Obj : Node_Id := P) return Node_Id; + -- Build a reference to the discriminant denoted by Discrim_Name. + -- The prefix of the result is usually Obj, but it could be + -- a prefix of Obj in some corner cases. + function Copy_And_Maybe_Dereference (N : Node_Id) return Node_Id; -- Copy the subtree rooted at N and insert an explicit dereference if it -- is of an access type. @@ -1993,11 +1999,7 @@ package body Sem_Util is Old_Hi := Type_High_Bound (Etype (Indx)); if Denotes_Discriminant (Old_Lo) then - Lo := - Make_Selected_Component (Loc, - Prefix => Copy_And_Maybe_Dereference (P), - Selector_Name => New_Occurrence_Of (Entity (Old_Lo), Loc)); - + Lo := Build_Discriminant_Reference (Old_Lo); else Lo := New_Copy_Tree (Old_Lo); @@ -2011,11 +2013,7 @@ package body Sem_Util is end if; if Denotes_Discriminant (Old_Hi) then - Hi := - Make_Selected_Component (Loc, - Prefix => Copy_And_Maybe_Dereference (P), - Selector_Name => New_Occurrence_Of (Entity (Old_Hi), Loc)); - + Hi := Build_Discriminant_Reference (Old_Hi); else Hi := New_Copy_Tree (Old_Hi); Set_Analyzed (Hi, False); @@ -2041,10 +2039,7 @@ package body Sem_Util is D := First_Elmt (Discriminant_Constraint (Desig_Typ)); while Present (D) loop if Denotes_Discriminant (Node (D)) then - D_Val := Make_Selected_Component (Loc, - Prefix => Copy_And_Maybe_Dereference (P), -Selector_Name => New_Occurrence_Of (Entity (Node (D)), Loc)); - + D_Val := Build_Discriminant_Reference (Node (D)); else D_Val := New_Copy_Tree (Node (D)); end if; @@ -2056,6 +2051,89 @@ package body Sem_Util is return Constraints; end Build_Actual_Record_Constraint; + -- + -- Build_Discriminant_Reference -- + -- + + function Build_Discriminant_Reference +(Discrim_Name : Node_Id; Obj : Node_Id := P) return Node_Id + is + Discrim : constant Entity_Id := Entity (Discrim_Name); + + function Obj_Is_Good_Prefix return Boolean; + -- Returns True if Obj.Discrim makes sense; that is, if + -- Obj has Discrim as one of its discriminants (or is an + -- access value that designates such an object). + + + -- Obj_Is_Good_Prefix -- + + + function Obj_Is_Good_Prefix return Boolean is +Obj_Type : Entity_Id := + Implementation_Base_Type (Etype (Obj)); + +Discriminated_Type : constant Entity_Id := + Implementation_Base_Type +(Scope (Original_Record_Component (Discrim))); + begin +-- The order of the following two tests matters in the +-- access-to-class-wide case. + +if Is_Access_Type (Obj_Type) then + Obj_Type := Implementation_Base_Type + (Designated_Type (Obj_Type)); +end if; + +if I
[Ada] Avoid building malformed component constraints
The previous fix introduced a not-yet-understood regression in compiling CodePeer. For now, we attempt a quick workaround for the problem. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_util.adb (Build_Discriminant_Reference): In the unexpected case where we previously would fail an assertion, we instead revert to the old behavior.diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -2107,11 +2107,7 @@ package body Sem_Util is -- Start of processing for Build_Discriminant_Reference begin - if Obj_Is_Good_Prefix then -return Make_Selected_Component (Loc, - Prefix => Copy_And_Maybe_Dereference (Obj), - Selector_Name => New_Occurrence_Of (Discrim, Loc)); - else + if not Obj_Is_Good_Prefix then -- If the given discriminant is not a component of the given -- object, then try the enclosing object. @@ -2128,10 +2124,15 @@ package body Sem_Util is (Discrim_Name => Discrim_Name, Obj => Name (Parent (Entity (Obj; else - pragma Assert (False); - raise Program_Error; + -- We are in some unexpected case here, so revert to the + -- old behavior (by falling through to it). + null; end if; end if; + + return Make_Selected_Component (Loc, + Prefix => Copy_And_Maybe_Dereference (Obj), + Selector_Name => New_Occurrence_Of (Discrim, Loc)); end Build_Discriminant_Reference;
[Ada] Fix spurious error on instantiation with Text_IO name
This gets rid of a spurious error given by the compiler on the instantiation of a generic package, when the instance happens to be an homonym of one of the subpackages of Text_IO (Fixed_IO, Float_IO, etc) and when it is placed in a context where Text_IO itself is also visible. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_ch8.adb (Analyze_Package_Renaming): Do not check for Text_IO special units when the name of the renaming is a generic instance, which is the case for package instantiations in the GNAT model.diff --git a/gcc/ada/sem_ch8.adb b/gcc/ada/sem_ch8.adb --- a/gcc/ada/sem_ch8.adb +++ b/gcc/ada/sem_ch8.adb @@ -1594,9 +1594,18 @@ package body Sem_Ch8 is return; end if; - -- Check for Text_IO special unit (we may be renaming a Text_IO child) + -- Check for Text_IO special units (we may be renaming a Text_IO child), + -- but make sure not to catch renamings generated for package instances + -- that have nothing to do with them but are nevertheless homonyms. - Check_Text_IO_Special_Unit (Name (N)); + if Is_Entity_Name (Name (N)) +and then Present (Entity (Name (N))) +and then Is_Generic_Instance (Entity (Name (N))) + then + null; + else + Check_Text_IO_Special_Unit (Name (N)); + end if; if Current_Scope /= Standard_Standard then Set_Is_Pure (New_P, Is_Pure (Current_Scope));
[Ada] Fix style in calls to Compile_Time_Constraint_Error
Cleanup related to handling of -gnatwE (warnings-as-errors) in instances of generic units. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * checks.adb (Null_Exclusion_Static_Checks, Selected_Range_Checks): Fix style.diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb --- a/gcc/ada/checks.adb +++ b/gcc/ada/checks.adb @@ -4455,8 +4455,8 @@ package body Checks is Discard_Node (Compile_Time_Constraint_Error - (N => N, - Msg=> + (N => N, + Msg => "(Ada 2005) null-excluding component % of object % must " & "be initialized??", Ent => Defining_Identifier (Comp))); @@ -4467,8 +4467,8 @@ package body Checks is elsif Array_Comp then Discard_Node (Compile_Time_Constraint_Error - (N => N, - Msg=> + (N => N, + Msg => "(Ada 2005) null-excluding array components must " & "be initialized??", Ent => Defining_Identifier (N))); @@ -10896,8 +10896,8 @@ package body Checks is Add_Check (Compile_Time_Constraint_Error ((if Present (Warn_Node) -then Warn_Node else Low_Bound (Expr)), -"static value does not equal lower bound of}??", + then Warn_Node else Low_Bound (Expr)), + "static value does not equal lower bound of}??", T_Typ)); Check_Added := True; end if;
[Ada] Refactor repeated implicit conversion from Char_Code to Uint
When resolving a string literal we examine each character against low and high bounds of the expected type. We stored each character as an Int and implicitly converted it to Uint twice: for "<" and ">" operators. Now we store convert it to Uint explicitly and only once. Cleanup related to handling of compile-time constraints errors. Semantics is unaffacted. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_res.adb (Resolve_String_Literal): Avoid unnecessary conversions inside "<" and ">" bodies.diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb --- a/gcc/ada/sem_res.adb +++ b/gcc/ada/sem_res.adb @@ -11722,14 +11722,14 @@ package body Sem_Res is Comp_Typ_Hi : constant Node_Id := Type_High_Bound (Component_Type (Typ)); - Char_Val : Int; + Char_Val : Uint; begin if Compile_Time_Known_Value (Comp_Typ_Lo) and then Compile_Time_Known_Value (Comp_Typ_Hi) then for J in 1 .. Strlen loop - Char_Val := Int (Get_String_Char (Str, J)); + Char_Val := UI_From_CC (Get_String_Char (Str, J)); if Char_Val < Expr_Value (Comp_Typ_Lo) or else Char_Val > Expr_Value (Comp_Typ_Hi)
[Ada] Simplify type conversions in source pointer arithmetic
Code cleanup; semantics is unaffacted. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_res.adb (Resolve_String_Literal): Simplify pointer arithmetic.diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb --- a/gcc/ada/sem_res.adb +++ b/gcc/ada/sem_res.adb @@ -11737,7 +11737,7 @@ package body Sem_Res is Apply_Compile_Time_Constraint_Error (N, "character out of range??", CE_Range_Check_Failed, - Loc => Source_Ptr (Int (Loc) + J)); + Loc => Loc + Source_Ptr (J)); end if; end loop;
[Ada] Fix style in comments about warning messages
Cleanup related to handling of messages for compile-time known constraint errors. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * errout.adb (Error_Msg_Internal): Reorder words. * erroutc.ads (Is_Warning_Msg): Add closing paren. * sem_util.adb (Compile_Time_Constraint_Error): Fix casing.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb --- a/gcc/ada/errout.adb +++ b/gcc/ada/errout.adb @@ -1049,7 +1049,7 @@ package body Errout is elsif No (Cunit (Main_Unit)) then null; - -- If the flag location is not in the main extended source unit, then + -- If the flag location is not in the extended main source unit, then -- we want to eliminate the warning, unless it is in the extended -- main code unit and we want warnings on the instance. diff --git a/gcc/ada/erroutc.ads b/gcc/ada/erroutc.ads --- a/gcc/ada/erroutc.ads +++ b/gcc/ada/erroutc.ads @@ -69,7 +69,7 @@ package Erroutc is Is_Warning_Msg : Boolean := False; -- Set True to indicate if current message is warning message (contains ? - -- or contains < and Error_Msg_Warn is True. + -- or contains < and Error_Msg_Warn is True). Is_Info_Msg : Boolean := False; -- Set True to indicate that the current message starts with the characters diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb --- a/gcc/ada/sem_util.adb +++ b/gcc/ada/sem_util.adb @@ -6604,7 +6604,7 @@ package body Sem_Util is Error_Msg_Warn := Warn or SPARK_Mode /= On; -- A static constraint error in an instance body is not a fatal error. - -- we choose to inhibit the message altogether, because there is no + -- We choose to inhibit the message altogether, because there is no -- obvious node (for now) on which to post it. On the other hand the -- offending node must be replaced with a constraint_error in any case.
[Ada] Remove unreferenced Warn_On_Instance
Cleanup related to handling of -gnatwE (warnings-as-errors) in instances of generic units. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * err_vars.ads (Warn_On_Instance): Remove; it was a relic from the previous handling of warning in instances that was removed decades ago.diff --git a/gcc/ada/err_vars.ads b/gcc/ada/err_vars.ads --- a/gcc/ada/err_vars.ads +++ b/gcc/ada/err_vars.ads @@ -61,14 +61,6 @@ package Err_Vars is -- not get reset by any Error_Msg call, so the caller is responsible -- for resetting it. - Warn_On_Instance : Boolean := False; - -- Normally if a warning is generated in a generic template from the - -- analysis of the template, then the warning really belongs in the - -- template, and the default value of False for this Boolean achieves - -- that effect. If Warn_On_Instance is set True, then the warnings are - -- generated on the instantiation (referring to the template) rather - -- than on the template itself. - Raise_Exception_On_Error : Nat := 0; -- If this value is non-zero, then any attempt to generate an error -- message raises the exception Error_Msg_Exception, and the error
[Ada] Fix regression in freezing code for instantiations
When going to the outer level for the placement of a freeze node in the case where the current package has no body, the previous change would overlook instantiations whose body has not materialized yet. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_ch12.adb (Insert_Freeze_Node_For_Instance): When going to the outer level, do not jump over following instantiations in the list.diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb --- a/gcc/ada/sem_ch12.adb +++ b/gcc/ada/sem_ch12.adb @@ -9703,6 +9703,7 @@ package body Sem_Ch12 is Decl : Node_Id; Decls: List_Id; Inst : Entity_Id; + Origin : Entity_Id; Par_Inst : Node_Id; Par_N: Node_Id; @@ -9793,9 +9794,10 @@ package body Sem_Ch12 is end; end if; - Decl := N; - Decls := List_Containing (N); - Par_N := Parent (Decls); + Decl := N; + Decls := List_Containing (N); + Par_N := Parent (Decls); + Origin := Empty; -- Determine the proper freeze point of an instantiation @@ -9824,10 +9826,13 @@ package body Sem_Ch12 is not In_Same_Source_Unit (Generic_Parent (Par_Inst), Inst) then while Present (Decl) loop - if (Nkind (Decl) in N_Unit_Body + if ((Nkind (Decl) in N_Unit_Body or else - Nkind (Decl) in N_Body_Stub) -and then Comes_From_Source (Decl) + Nkind (Decl) in N_Body_Stub) + and then Comes_From_Source (Decl)) +or else (Present (Origin) + and then Nkind (Decl) in N_Generic_Instantiation + and then Instance_Spec (Decl) /= Origin) then Set_Sloc (F_Node, Sloc (Decl)); Insert_Before (Decl, F_Node); @@ -9851,16 +9856,19 @@ package body Sem_Ch12 is return; -- When the instantiation occurs in a package spec and there is --- no source body which follows, not even of the package itself --- then insert into the declaration list of the outer level. +-- no source body which follows, not even of the package itself, +-- then insert into the declaration list of the outer level, but +-- do not jump over following instantiations in this list because +-- they may have a body that has not materialized yet, see above. elsif Nkind (Par_N) = N_Package_Specification and then No (Corresponding_Body (Parent (Par_N))) and then Is_List_Member (Parent (Par_N)) then - Decl := Parent (Par_N); - Decls := List_Containing (Decl); - Par_N := Parent (Decls); + Decl := Parent (Par_N); + Decls := List_Containing (Decl); + Par_N := Parent (Decls); + Origin := Decl; -- In a package declaration, or if no source body which follows -- and at library level, then insert at end of list.
[Ada] Remove unnecessary guards for non-empty lists
All node lists can be safely iterated with First/Present/Next. There is no need for explicit guard against empty lists. Code cleanup. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * errout.adb (Remove_Warning_Messages): Remove unnecessary guard. * exp_util.adb (Kill_Dead_Code): Likewise. * par_sco.adb (Traverse_Declarations_Or_Statements): Likewise. * sem_ch3.adb (Build_Derived_Record_Type): Likewise. * sem_ch4.adb (Traverse_Interfaces): Likewise. * sem_eval.adb (Traverse_Interfaces): Likewise. * sem_util.adb (Collect_Interfaces): Likewise. (Has_Non_Null_Statements, Side_Effect_Free_Statements): Likewise; turn into WHILE loops, for consistency.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb --- a/gcc/ada/errout.adb +++ b/gcc/ada/errout.adb @@ -3378,13 +3378,11 @@ package body Errout is procedure Remove_Warning_Messages (L : List_Id) is Stat : Node_Id; begin - if Is_Non_Empty_List (L) then - Stat := First (L); - while Present (Stat) loop -Remove_Warning_Messages (Stat); -Next (Stat); - end loop; - end if; + Stat := First (L); + while Present (Stat) loop + Remove_Warning_Messages (Stat); + Next (Stat); + end loop; end Remove_Warning_Messages; diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb --- a/gcc/ada/exp_util.adb +++ b/gcc/ada/exp_util.adb @@ -9495,14 +9495,12 @@ package body Exp_Util is begin W := Warn; - if Is_Non_Empty_List (L) then - N := First (L); - while Present (N) loop -Kill_Dead_Code (N, W); -W := False; -Next (N); - end loop; - end if; + N := First (L); + while Present (N) loop + Kill_Dead_Code (N, W); + W := False; + Next (N); + end loop; end Kill_Dead_Code; - diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb --- a/gcc/ada/par_sco.adb +++ b/gcc/ada/par_sco.adb @@ -2417,21 +2417,18 @@ package body Par_SCO is -- Loop through statements or declarations - if Is_Non_Empty_List (L) then - N := First (L); - while Present (N) loop + N := First (L); + while Present (N) loop --- Note: For separate bodies, we see the tree after Par.Labl has --- introduced implicit labels, so we need to ignore those nodes. + -- Note: For separate bodies, we see the tree after Par.Labl has + -- introduced implicit labels, so we need to ignore those nodes. -if Nkind (N) /= N_Implicit_Label_Declaration then - Traverse_One (N); -end if; - -Next (N); - end loop; + if Nkind (N) /= N_Implicit_Label_Declaration then +Traverse_One (N); + end if; - end if; + Next (N); + end loop; -- End sequence of statements and flush deferred decisions diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -9351,13 +9351,11 @@ package body Sem_Ch3 is declare Iface : Node_Id; begin -if Is_Non_Empty_List (Interface_List (Type_Def)) then - Iface := First (Interface_List (Type_Def)); - while Present (Iface) loop - Freeze_Before (N, Etype (Iface)); - Next (Iface); - end loop; -end if; +Iface := First (Interface_List (Type_Def)); +while Present (Iface) loop + Freeze_Before (N, Etype (Iface)); + Next (Iface); +end loop; end; end if; diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb --- a/gcc/ada/sem_ch4.adb +++ b/gcc/ada/sem_ch4.adb @@ -9597,31 +9597,29 @@ package body Sem_Ch4 is begin Error := False; -if Is_Non_Empty_List (Intface_List) then - Intface := First (Intface_List); - while Present (Intface) loop +Intface := First (Intface_List); +while Present (Intface) loop - -- Look for acceptable class-wide homonyms associated with - -- the interface. + -- Look for acceptable class-wide homonyms associated with the + -- interface. - Traverse_Homonyms (Etype (Intface), Error); + Traverse_Homonyms (Etype (Intface), Error); - if Error then - return; - end if; + if Error then + return; + end if; - -- Continue the search by looking at each of the interface's - -- associated interface ancestors. + -- Continue the search by looking at each of the
[Ada] Move messages on division by zero to the right operand
All compile-time messages about division by zero are now located at the right operand. Previously some of them were located at the division operator, which was inconsistent. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_eval.adb (Eval_Arithmetic_Op): Add Loc parameter to all calls to Apply_Compile_Time_Constraint_Error related to division by zero.diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb --- a/gcc/ada/sem_eval.adb +++ b/gcc/ada/sem_eval.adb @@ -2117,6 +2117,7 @@ package body Sem_Eval is Apply_Compile_Time_Constraint_Error (N, "division by zero", CE_Divide_By_Zero, +Loc => Sloc (Right), Warn => not Stat or SPARK_Mode = On); return; @@ -2139,6 +2140,7 @@ package body Sem_Eval is Apply_Compile_Time_Constraint_Error (N, "mod with zero divisor", CE_Divide_By_Zero, +Loc => Sloc (Right), Warn => not Stat or SPARK_Mode = On); return; @@ -2159,6 +2161,7 @@ package body Sem_Eval is Apply_Compile_Time_Constraint_Error (N, "rem with zero divisor", CE_Divide_By_Zero, +Loc => Sloc (Right), Warn => not Stat or SPARK_Mode = On); return; @@ -2218,7 +2221,8 @@ package body Sem_Eval is else pragma Assert (Nkind (N) = N_Op_Divide); if UR_Is_Zero (Right_Real) then Apply_Compile_Time_Constraint_Error -(N, "division by zero", CE_Divide_By_Zero); +(N, "division by zero", CE_Divide_By_Zero, + Loc => Sloc (Right)); return; end if;
[Ada] Remove a locally handled exception
Code cleanup related to handling of warnings-as-errors. Semantics is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * exp_ch4.adb (Expand_Concatenate): There is no reason for using declaring, raising and catching an exception; a simple return statement is enough.diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb --- a/gcc/ada/exp_ch4.adb +++ b/gcc/ada/exp_ch4.adb @@ -2695,9 +2695,6 @@ package body Exp_Ch4 is -- lengths of operands. The choice of this type is a little subtle and -- is discussed in a separate section at the start of the body code. - Concatenation_Error : exception; - -- Raised if concatenation is sure to raise a CE - Result_May_Be_Null : Boolean := True; -- Reset to False if at least one operand is encountered which is known -- at compile time to be non-null. Used for handling the special case @@ -3460,7 +3457,16 @@ package body Exp_Ch4 is -- Catch the static out of range case now if Raises_Constraint_Error (High_Bound) then - raise Concatenation_Error; + -- Kill warning generated for the declaration of the static out of + -- range high bound, and instead generate a Constraint_Error with + -- an appropriate specific message. + + Kill_Dead_Code (Declaration_Node (Entity (High_Bound))); + Apply_Compile_Time_Constraint_Error + (N => Cnode, +Msg=> "concatenation result upper bound out of range??", +Reason => CE_Range_Check_Failed); + return; end if; -- Now we will generate the assignments to do the actual concatenation @@ -3629,19 +3635,6 @@ package body Exp_Ch4 is pragma Assert (Present (Result)); Rewrite (Cnode, Result); Analyze_And_Resolve (Cnode, Atyp); - - exception - when Concatenation_Error => - - -- Kill warning generated for the declaration of the static out of - -- range high bound, and instead generate a Constraint_Error with - -- an appropriate specific message. - - Kill_Dead_Code (Declaration_Node (Entity (High_Bound))); - Apply_Compile_Time_Constraint_Error - (N => Cnode, -Msg=> "concatenation result upper bound out of range??", -Reason => CE_Range_Check_Failed); end Expand_Concatenate; ---
[Ada] Simplify traversal for removing warnings from dead code
Cleanup related to handling of warnings-as-errors. Semantics is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * errout.adb (Remove_Warning_Messages): Use traversal procedure instead of traversal function, since we discard status of each step anyway.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb --- a/gcc/ada/errout.adb +++ b/gcc/ada/errout.adb @@ -3254,7 +3254,7 @@ package body Errout is function Check_For_Warning (N : Node_Id) return Traverse_Result; -- This function checks one node for a possible warning message - function Check_All_Warnings is new Traverse_Func (Check_For_Warning); + procedure Check_All_Warnings is new Traverse_Proc (Check_For_Warning); -- This defines the traversal operation --- @@ -3341,37 +3341,23 @@ package body Errout is -- the current tree. Given that we are in unreachable code, this -- modification to the tree is harmless. -declare - Status : Traverse_Final_Result; - -begin - if Is_List_Member (N) then - Set_Condition (N, Original_Node (N)); - Status := Check_All_Warnings (Condition (N)); - else - Rewrite (N, Original_Node (N)); - Status := Check_All_Warnings (N); - end if; - - return Status; -end; - - else -return OK; +if Is_List_Member (N) then + Set_Condition (N, Original_Node (N)); + Check_All_Warnings (Condition (N)); +else + Rewrite (N, Original_Node (N)); + Check_All_Warnings (N); +end if; end if; + + return OK; end Check_For_Warning; -- Start of processing for Remove_Warning_Messages begin if Warnings_Detected /= 0 then - declare -Discard : Traverse_Final_Result; -pragma Warnings (Off, Discard); - - begin -Discard := Check_All_Warnings (N); - end; + Check_All_Warnings (N); end if; end Remove_Warning_Messages;
[Ada] Proof of System.Generic_Array_Operations at silver level
Proof of the generic unit to array operations (vector/matrix), only at silver level, for runtime errors that come from the generic part of the unit. This does not prove e.g. absence of overflows in an instantiation related to arithmetic operations passed as formal generic subprogram parameters. Justify 2 checks with pragma Annotate as False_Positive (related to absence of aliasing) and 3 checks as Intentional (possible overflow and explicit exception). Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * libgnat/a-ngcoar.adb: Add pragma to ignore assertions in instance. * libgnat/a-ngrear.adb: Likewise. * libgnat/s-gearop.adb: Prove implementation is free of runtime errors. * libgnat/s-gearop.ads: Add contracts to protect against runtime errors in the generic part. patch.diff.gz Description: application/gzip
[Ada] Remove unnecessary declare block
Code cleanup related to handling of warnings-as-errors. Semantics is unaffected. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * errout.adb (Adjust_Name_Case): Remove unnecessary declare block.diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb --- a/gcc/ada/errout.adb +++ b/gcc/ada/errout.adb @@ -3391,60 +3391,57 @@ package body Errout is (Buf : in out Bounded_String; Loc : Source_Ptr) is + Src_Ind : constant Source_File_Index := Get_Source_File_Index (Loc); + Sbuffer : Source_Buffer_Ptr; + Ref_Ptr : Integer; + Src_Ptr : Source_Ptr; + begin -- We have an all lower case name from Namet, and now we want to set -- the appropriate case. If possible we copy the actual casing from -- the source. If not we use standard identifier casing. - declare - Src_Ind : constant Source_File_Index := Get_Source_File_Index (Loc); - Sbuffer : Source_Buffer_Ptr; - Ref_Ptr : Integer; - Src_Ptr : Source_Ptr; + Ref_Ptr := 1; + Src_Ptr := Loc; - begin - Ref_Ptr := 1; - Src_Ptr := Loc; + -- For standard locations, always use mixed case - -- For standard locations, always use mixed case + if Loc <= No_Location then + Set_Casing (Buf, Mixed_Case); - if Loc <= No_Location then -Set_Casing (Buf, Mixed_Case); + else + -- Determine if the reference we are dealing with corresponds to text + -- at the point of the error reference. This will often be the case + -- for simple identifier references, and is the case where we can + -- copy the casing from the source. + + Sbuffer := Source_Text (Src_Ind); + + while Ref_Ptr <= Buf.Length loop +exit when + Fold_Lower (Sbuffer (Src_Ptr)) /= +Fold_Lower (Buf.Chars (Ref_Ptr)); +Ref_Ptr := Ref_Ptr + 1; +Src_Ptr := Src_Ptr + 1; + end loop; - else --- Determine if the reference we are dealing with corresponds to --- text at the point of the error reference. This will often be --- the case for simple identifier references, and is the case --- where we can copy the casing from the source. + -- If we get through the loop without a mismatch, then output the + -- name the way it is cased in the source program. -Sbuffer := Source_Text (Src_Ind); + if Ref_Ptr > Buf.Length then +Src_Ptr := Loc; -while Ref_Ptr <= Buf.Length loop - exit when - Fold_Lower (Sbuffer (Src_Ptr)) /= - Fold_Lower (Buf.Chars (Ref_Ptr)); - Ref_Ptr := Ref_Ptr + 1; +for J in 1 .. Buf.Length loop + Buf.Chars (J) := Sbuffer (Src_Ptr); Src_Ptr := Src_Ptr + 1; end loop; --- If we get through the loop without a mismatch, then output the --- name the way it is cased in the source program - -if Ref_Ptr > Buf.Length then - Src_Ptr := Loc; + -- Otherwise set the casing using the default identifier casing - for J in 1 .. Buf.Length loop - Buf.Chars (J) := Sbuffer (Src_Ptr); - Src_Ptr := Src_Ptr + 1; - end loop; - --- Otherwise set the casing using the default identifier casing - -else - Set_Casing (Buf, Identifier_Casing (Src_Ind)); -end if; + else +Set_Casing (Buf, Identifier_Casing (Src_Ind)); end if; - end; + end if; end Adjust_Name_Case; ---
[Ada] Warn on subtype declaration of null range
This patch adds a warning on a subtype declaration with a compile-time-known range constraint that is a null range. Tested on x86_64-pc-linux-gnu, committed on trunk gcc/ada/ * sem_res.adb (Resolve_Range): Warn on null range, unless we are inside a generic unit or an instance thereof. * sem_ch3.adb (Analyze_Subtype_Indication): Minor: avoid double negative.diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb --- a/gcc/ada/sem_ch3.adb +++ b/gcc/ada/sem_ch3.adb @@ -6048,13 +6048,13 @@ package body Sem_Ch3 is begin Analyze (T); - if R /= Error then + if R = Error then + Set_Error_Posted (R); + Set_Error_Posted (T); + else Analyze (R); Set_Etype (N, Etype (R)); Resolve (R, Entity (T)); - else - Set_Error_Posted (R); - Set_Error_Posted (T); end if; end Analyze_Subtype_Indication; diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb --- a/gcc/ada/sem_res.adb +++ b/gcc/ada/sem_res.adb @@ -10754,6 +10754,30 @@ package body Sem_Res is Fold_Uint (H, Expr_Value (H), Static => True); end if; end if; + + -- If we have a compile-time-known null range, we warn, because that is + -- likely to be a mistake. (Dynamic null ranges make sense, but often + -- compile-time-known ones do not.) Warn only if this is in a subtype + -- declaration. We do this here, rather than while analyzing a subtype + -- declaration, in case we decide to expand the cases. We do not want to + -- warn in all cases, because some are idiomatic, such as an empty + -- aggregate (1 .. 0 => <>). + + -- We don't warn in generics or their instances, because there might be + -- some instances where the range is null, and some where it is not, + -- which would lead to false alarms. + + if not (Inside_A_Generic or In_Instance) +and then Comes_From_Source (N) +and then Compile_Time_Compare + (Low_Bound (N), High_Bound (N), Assume_Valid => True) = GT +and then Nkind (Parent (N)) = N_Range_Constraint +and then Nkind (Parent (Parent (N))) = N_Subtype_Indication +and then Nkind (Parent (Parent (Parent (N = N_Subtype_Declaration +and then Is_OK_Static_Range (N) + then + Error_Msg_N ("null range??", N); + end if; end Resolve_Range; --
Re: [PATCH] [i386] Optimize V16HF vector insert to element 0 for AVX2.
On Thu, Jan 6, 2022 at 10:22 AM liuhongt via Gcc-patches wrote: > > Also remove mode attribute blendsuf, use ssemodesuf instead. > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready to push to trunk. > > gcc/ChangeLog: > > PR target/103753 > * config/i386/i386-expand.c (ix86_expand_vector_set): Not use > gen_avx2_pblendph_pblendd when elt == 0. > * config/i386/sse.md (avx2_pblendph): Rename to .. > (avx2_pblend_pblendd).. this, and extend to V16HI. > (*avx2_pblendw): Rename to .. > (*avx2_pblend): .. this, and extend to V16HF. > (avx2_pblendw): Rename to .. > (*avx2_pblend): .. this, and extend to V16HF. > (blendsuf): Removed. > (sse4_1_pblend): Renamed to .. > (sse4_1_pblend): .. this. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr103753.c: New test. LGTM, with a small nit below. Thanks, Uros. > --- > gcc/config/i386/i386-expand.c| 5 ++- > gcc/config/i386/sse.md | 48 +++- > gcc/testsuite/gcc.target/i386/pr103753.c | 17 + > 3 files changed, 42 insertions(+), 28 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr103753.c > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c > index e93ef1cafa6..0d219d6bb69 100644 > --- a/gcc/config/i386/i386-expand.c > +++ b/gcc/config/i386/i386-expand.c > @@ -16245,10 +16245,11 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, > rtx val, int elt) >goto half; > > case E_V16HFmode: > - if (TARGET_AVX2) > + /* For ELT == 0, vec_setv8hf_0 can save 1 vpbroadcastw. */ > + if (TARGET_AVX2 && elt != 0) > { > mmode = SImode; > - gen_blendm = gen_avx2_pblendph; > + gen_blendm = gen_avx2_pblendph_pblendd; > blendm_const = true; > break; > } > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index 033b60d9aa2..c986c73bef8 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -21291,10 +21291,7 @@ (define_insn_and_split > "*_pblendvb_lt_subreg_not" > (lt:VI1_AVX2 (match_dup 3) (match_dup 4))] UNSPEC_BLENDV))] >"operands[3] = gen_lowpart (mode, operands[3]);") > > -(define_mode_attr blendsuf > - [(V8HI "w") (V8HF "ph")]) > - > -(define_insn "sse4_1_pblend" > +(define_insn "sse4_1_pblend" >[(set (match_operand:V8_128 0 "register_operand" "=Yr,*x,x") > (vec_merge:V8_128 > (match_operand:V8_128 2 "vector_operand" "YrBm,*xBm,xm") > @@ -21313,11 +21310,11 @@ (define_insn "sse4_1_pblend" > (set_attr "mode" "TI")]) > > ;; The builtin uses an 8-bit immediate. Expand that. > -(define_expand "avx2_pblendw" > - [(set (match_operand:V16HI 0 "register_operand") > - (vec_merge:V16HI > - (match_operand:V16HI 2 "nonimmediate_operand") > - (match_operand:V16HI 1 "register_operand") > +(define_expand "avx2_pblend" > + [(set (match_operand:V16_256 0 "register_operand") > + (vec_merge:V16_256 > + (match_operand:V16_256 2 "nonimmediate_operand") > + (match_operand:V16_256 1 "register_operand") > (match_operand:SI 3 "const_0_to_255_operand")))] >"TARGET_AVX2" > { > @@ -21325,11 +21322,11 @@ (define_expand "avx2_pblendw" >operands[3] = GEN_INT (val << 8 | val); > }) > > -(define_expand "avx2_pblendph" > - [(set (match_operand:V16HF 0 "register_operand") > - (vec_merge:V16HF > - (match_operand:V16HF 2 "register_operand") > - (match_operand:V16HF 1 "register_operand") > +(define_expand "avx2_pblend_pblendd" Just name the above avx2_pblend_1. > + [(set (match_operand:V16_256 0 "register_operand") > + (vec_merge:V16_256 > + (match_operand:V16_256 2 "register_operand") > + (match_operand:V16_256 1 "register_operand") > (match_operand:SI 3 "const_int_operand")))] >"TARGET_AVX2 >&& !((INTVAL (operands[3]) & 0xff) && (INTVAL (operands[3]) & 0xff00))" > @@ -21339,7 +21336,7 @@ (define_expand "avx2_pblendph" > emit_move_insn (operands[0], operands[1]); >else > { > - rtx tmp = gen_reg_rtx (V16HImode); > + rtx tmp = gen_reg_rtx (mode); > rtx blendw_idx, blendd_idx; > > if (mask & 0xff) > @@ -21352,13 +21349,12 @@ (define_expand "avx2_pblendph" > blendw_idx = GEN_INT (mask >> 8 & 0xff); > blendd_idx = GEN_INT (240); > } > - operands[1] = lowpart_subreg (V16HImode, operands[1], V16HFmode); > - operands[2] = lowpart_subreg (V16HImode, operands[2], V16HFmode); > - emit_insn (gen_avx2_pblendw (tmp, operands[1], operands[2], > blendw_idx)); > + emit_insn (gen_avx2_pblend (tmp, operands[1], > + operands[2], blendw_idx)); > > - operands[0] = lowpart_subreg (V8SImode, operands[0], V16HFmode); > - tmp = lowpart_subreg (V8SImode, tmp, V16HImode); > - operands[1] = lowpart_
Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator
On 1/6/22 10:29, Tom de Vries wrote: At first glance, the above behaviour doesn't look like a too short timeout. Using patch below, this passes for me, I'm currently doing a full build and test to confirm. Looks like it has to do with: ... For sm_6x and earlier architectures, atom operations on .shared state space do not guarantee atomicity with respect to normal store instructions to the same address. It is the programmer's responsibility to guarantee correctness of programs that use shared memory atomic instructions, e.g., by inserting barriers between normal stores and atomic operations to a common address, or by using atom.exch to store to locations accessed by other atomic operations. ... My current understanding is that this is a backend problem, and needs to be fixed by defining atomic_store patterns which take care of this peculiarity. Thanks, - Tom diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index 6bc2ea48043..4524122b3e7 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -122,7 +122,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size) } /* Update the free chain root and release the lock. */ - __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); + __atomic_exchange_n (&__nvptx_lowlat_heap_root, + root.raw, MEMMODEL_RELEASE); return result; } else- @@ -221,7 +222,8 @@ nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, s ize_t size) root.raw = newfreechunk.raw; /* Update the free chain root and release the lock. */ - __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); + __atomic_exchange_n (&__nvptx_lowlat_heap_root, + root.raw, MEMMODEL_RELEASE); } else free (addr); @@ -331,7 +333,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr , /* Else realloc in-place has failed and result remains NULL. */ /* Update the free chain root and release the lock. */ - __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE); + __atomic_exchange_n (&__nvptx_lowlat_heap_root, + root.raw, MEMMODEL_RELEASE); if (result == NULL) {
Re: [PATCH v4 04/12] LoongArch Port: Machine Decsription files.
On Fri, 2021-12-24 at 17:28 +0800, chenglulu wrote: > +(define_insn "*zero_extendsidi2_internal" > + [(set (match_operand:DI 0 "register_operand" "=r,r,r") > + (subreg:DI (match_operand:SI 1 "nonimmediate_operand" "r,ZC,W") 0))] > + "TARGET_64BIT" > + "@ > + bstrpick.d\t%0,%1,31,0 > + ldptr.w\t%0,%1\n\tlu32i.d\t%0,0 > + ld.wu\t%0,%1" > + [(set_attr "move_type" "arith,load,load") > + (set_attr "mode" "DI") > + (set_attr "insn_count" "1,2,1")]) This pattern is generating wrong code, causing FAIL: gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute This failure is a real bug, the reduced testcase is attached. In the assembly: # ... bstrins.d $r5,$r13,31,0 addi.d $r12,$r22,-228 bstrpick.d $r12,$r12,31,0 bstrins.d $r5,$r12,63,32 addi.w $r4,$r0,14 # 0xe bl testvaci This obviously does not make any sense: it calculates the *address* of g[0]'s imaginary part, truncates it to 32-bit, then pass it as the *value* of the imaginary part to testvaci(). The problem is: before the reload pass, the compiler consider g[0] a (virtual) register. It only becomes MEM after the reload pass. Adding reload_completed as the condition of this insn seems able to fix the issue. However I'm not sure if other insns need the change too. > +;; See the comment before the *and3 pattern why this is generated by > +;; combine. A minor issue: the comment before 'and3' does not exist. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University #include #include extern void testvaci (int n, ...); int main() { int i; volatile _Complex int g[14]; for (i = 0; i < 14; i++) g[i] = i + 3 + 3i; testvaci (14, g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7], g[8], g[9], g[10], g[11], g[12], g[13]); return 0; } void testvaci (int n, ...) { int i; va_list ap; va_start(ap, n); for (i = 0; i < n; i++) { _Complex int t = __builtin_va_arg(ap, _Complex int); assert(t == i + 3 + 3i); } va_end(ap); }
[PATCH] x86_64: Improve (interunit) moves from TImode to V1TImode.
This patch improves the code generated when moving a 128-bit value in TImode, represented by two 64-bit registers, to V1TImode, which is a single SSE register. Currently, the simple move: typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); uv1ti foo(__int128 x) { return (uv1ti)x; } is always transferred via memory, as: foo:movq%rdi, -24(%rsp) movq%rsi, -16(%rsp) movdqa -24(%rsp), %xmm0 ret with this patch, we now generate (with -msse2): foo:movq%rdi, %xmm1 movq%rsi, %xmm2 punpcklqdq %xmm2, %xmm1 movdqa %xmm1, %xmm0 ret and with -mavx2: foo:vmovq %rdi, %xmm1 vpinsrq $1, %rsi, %xmm1, %xmm0 ret Even more dramatic is the improvement of zero extended transfers. uv1ti bar(unsigned char c) { return (uv1ti)(__int128)c; } Previously generated: bar:movq$0, -16(%rsp) movzbl %dil, %eax movq%rax, -24(%rsp) vmovdqa -24(%rsp), %xmm0 ret Now generates: bar:movzbl %dil, %edi movq%rdi, %xmm0 ret My first attempt at this functionality attempted to use a simple define_split: +;; Move TImode to V1TImode via V2DImode instead of memory. +(define_split + [(set (match_operand:V1TI 0 "register_operand") +(subreg:V1TI (match_operand:TI 1 "register_operand") 0))] + "TARGET_64BIT && TARGET_SSE2 && can_create_pseudo_p ()" + [(set (match_dup 2) (vec_concat:V2DI (match_dup 3) (match_dup 4))) + (set (match_dup 0) (subreg:V1TI (match_dup 2) 0))] +{ + operands[2] = gen_reg_rtx (V2DImode); + operands[3] = gen_lowpart (DImode, operands[1]); + operands[4] = gen_highpart (DImode, operands[1]); +}) + Unfortunately, this triggers very late during the compilation preventing some of the simplification's we'd like (in combine). For example the foo case above becomes: foo:movq%rsi, -16(%rsp) movq%rdi, %xmm0 movhps -16(%rsp), %xmm0 transferring half directly, and the other half via memory. And for the bar case above, GCC fails to appreciate that movq/vmovq clears the high bits, resulting in: bar:movzbl %dil, %eax xorl%edx, %edx vmovq %rax, %xmm1 vpinsrq $1, %rdx, %xmm1, %xmm0 ret Hence the solution (i.e. this patch) is to add a special case to ix86_expand_vector_move for TImode to V1TImode transfers. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline? 2022-01-06 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.c (ix86_expand_vector_move): Add special case for TImode to V1TImode moves, going via V2DImode. gcc/testsuite/ChangeLog * gcc.target/i386/sse2-v1ti-mov-1.c: New test case. * gcc.target/i386/sse2-v1ti-zext.c: New test case. Many thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 7013c20..47c4870 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -615,6 +615,23 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) return; } + /* Special case TImode to V1TImode conversions, via V2DI. */ + if (mode == V1TImode + && SUBREG_P (op1) + && GET_MODE (SUBREG_REG (op1)) == TImode + && TARGET_64BIT && TARGET_SSE + && can_create_pseudo_p ()) +{ + rtx tmp = gen_reg_rtx (V2DImode); + rtx lo = gen_reg_rtx (DImode); + rtx hi = gen_reg_rtx (DImode); + emit_move_insn (lo, gen_lowpart (DImode, SUBREG_REG (op1))); + emit_move_insn (hi, gen_highpart (DImode, SUBREG_REG (op1))); + emit_insn (gen_vec_concatv2di (tmp, lo, hi)); + emit_move_insn (op0, gen_lowpart (V1TImode, tmp)); + return; +} + /* If operand0 is a hard register, make operand1 a pseudo. */ if (can_create_pseudo_p () && !ix86_hardreg_mov_ok (op0, op1)) diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-1.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-1.c new file mode 100644 index 000..a1ef7b7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2" } */ + +typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); + +uv1ti foo(__int128 x) +{ + return (uv1ti)x; +} + +/* { dg-final { scan-assembler-not "%\[er\]sp" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-zext.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-zext.c new file mode 100644 index 000..4dfc655 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-zext.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse2" } */ + +typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16))); + +uv1ti extqi(unsigned char c) { return (uv1ti)(__int128)c; } +uv1ti ext
[PATCH] libgcc: Fix a broken call/return address prediction
This patch fixed a broken call/return address prediction in segmented stack implementation on x86_64 by leveraging the red-zone under the stack pointer. 2022-01-06 Zhiyao Ma libgcc/ChangeLog: * config/i386/morestack.S: Modified instructions. --- libgcc/config/i386/morestack.S | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/libgcc/config/i386/morestack.S b/libgcc/config/i386/morestack.S index f1cf32dec9f..61c91ce4a35 100644 --- a/libgcc/config/i386/morestack.S +++ b/libgcc/config/i386/morestack.S @@ -213,14 +213,19 @@ __morestack_non_split: cmpl $0x185d8d4c,(%rax) je2f - # This breaks call/return prediction, as described above. - incq8(%rsp) # Increment the return address. + movq %rax,-8(%rsp)# Save the incremented return + # address to the red-zone below + # the stack pointer. It is + # guaranteed not to be corrupted. popq %rax # Restore register. .cfi_adjust_cfa_offset -8# Adjust for popped register. - ret # Return to caller. + callq *-16(%rsp) # Call into the caller's + # function body. + + ret # Return to caller's prologue. 2: popq %rax # Restore register. -- 2.25.1
Re: [PATCH] c++: CTAD within alias template [PR91911]
On Mon, 3 Jan 2022, Patrick Palka wrote: > On Wed, 22 Dec 2021, Jason Merrill wrote: > > > On 12/21/21 14:08, Patrick Palka wrote: > > > On Tue, Dec 21, 2021 at 2:03 PM Patrick Palka wrote: > > > > > > > > On Wed, Jun 30, 2021 at 4:23 PM Jason Merrill wrote: > > > > > > > > > > On 6/30/21 4:18 PM, Patrick Palka wrote: > > > > > > On Wed, Jun 30, 2021 at 3:51 PM Jason Merrill > > > > > > wrote: > > > > > > > > > > > > > > On 6/30/21 11:58 AM, Patrick Palka wrote: > > > > > > > > On Wed, 30 Jun 2021, Patrick Palka wrote: > > > > > > > > > > > > > > > > > On Fri, 25 Jun 2021, Jason Merrill wrote: > > > > > > > > > > > > > > > > > > > On 6/25/21 1:11 PM, Patrick Palka wrote: > > > > > > > > > > > On Fri, 25 Jun 2021, Jason Merrill wrote: > > > > > > > > > > > > > > > > > > > > > > > On 6/24/21 4:45 PM, Patrick Palka wrote: > > > > > > > > > > > > > In the first testcase below, during parsing of the > > > > > > > > > > > > > alias > > > > > > > > > > > > > template > > > > > > > > > > > > > ConstSpanType, transparency of alias template > > > > > > > > > > > > > specializations means we > > > > > > > > > > > > > replace SpanType with SpanType's substituted > > > > > > > > > > > > > definition. But this > > > > > > > > > > > > > substitution lowers the level of the CTAD placeholder > > > > > > > > > > > > > for span(T()) from > > > > > > > > > > > > > 2 to 1, and so the later instantiantion of > > > > > > > > > > > > > ConstSpanType > > > > > > > > > > > > > erroneously substitutes this CTAD placeholder with the > > > > > > > > > > > > > template argument > > > > > > > > > > > > > at level 1 index 0, i.e. with int, before we get a > > > > > > > > > > > > > chance to perform the > > > > > > > > > > > > > CTAD. > > > > > > > > > > > > > > > > > > > > > > > > > > In light of this, it seems we should avoid level > > > > > > > > > > > > > lowering when > > > > > > > > > > > > > substituting through through the type-id of a > > > > > > > > > > > > > dependent > > > > > > > > > > > > > alias template > > > > > > > > > > > > > specialization. To that end this patch makes > > > > > > > > > > > > > lookup_template_class_1 > > > > > > > > > > > > > pass tf_partial to tsubst in this situation. > > > > > > > > > > > > > > > > > > > > > > > > This makes sense, but what happens if SpanType is a > > > > > > > > > > > > member > > > > > > > > > > > > template, so > > > > > > > > > > > > that > > > > > > > > > > > > the levels of it and ConstSpanType don't match? Or the > > > > > > > > > > > > other way around? > > > > > > > > > > > > > > > > > > > > > > If SpanType is a member template of say the class > > > > > > > > > > > template A (and > > > > > > > > > > > thus its level is greater than ConstSpanType): > > > > > > > > > > > > > > > > > > > > > > template > > > > > > > > > > > struct A { > > > > > > > > > > > template > > > > > > > > > > > using SpanType = decltype(span(T())); > > > > > > > > > > > }; > > > > > > > > > > > > > > > > > > > > > > template > > > > > > > > > > > using ConstSpanType = span > > > > > > > > > > A::SpanType::value_type>; > > > > > > > > > > > > > > > > > > > > > > using type = ConstSpanType; > > > > > > > > > > > > > > > > > > > > > > then this case luckily works even without the patch > > > > > > > > > > > because > > > > > > > > > > > instantiate_class_template now reuses the specialization > > > > > > > > > > > A::SpanType > > > > > > > > > > > that was formed earlier during instantiation of A, > > > > > > > > > > > where we > > > > > > > > > > > substitute only a single level of template arguments, so > > > > > > > > > > > the > > > > > > > > > > > level of > > > > > > > > > > > the CTAD placeholder inside the defining-type-id of this > > > > > > > > > > > specialization > > > > > > > > > > > dropped from 3 to 2, so still more than the level of > > > > > > > > > > > ConstSpanType. > > > > > > > > > > > > > > > > > > > > > > This luck is short-lived though, because if we replace > > > > > > > > > > > A::SpanType with say A::SpanType > > > > > > > > > > > then > > > > > > > > > > > the testcase > > > > > > > > > > > breaks again (without the patch) because we no longer can > > > > > > > > > > > reuse that > > > > > > > > > > > specialization, so we instead form it on the spot by > > > > > > > > > > > substituting two > > > > > > > > > > > levels of template arguments (U=int,T=T) into the > > > > > > > > > > > defining-type-id, > > > > > > > > > > > causing the level of the placeholder to drop to 1. I > > > > > > > > > > > think > > > > > > > > > > > the patch > > > > > > > > > > > causes its level to remain 3 (though I guess it should > > > > > > > > > > > really be 2). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For the other way around, if ConstSpanType is a member > > > > > > > > > > > template of > > > > > > > > > > > say the class template B (and thus its level is greater > > > > > > > > >
[PATCH] x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp
Indirect branch also includes indirect call instructions. Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its intended behavior. PR target/102952 * config/i386/i386-opts.h (harden_sls): Replace harden_sls_indirect_branch with harden_sls_indirect_jmp. * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Likewise. (ix86_output_indirect_jmp): Likewise. (ix86_output_call_insn): Likewise. * config/i386/i386.opt: Replace indirect-branch with indirect-jmp. Replace harden_sls_indirect_branch with harden_sls_indirect_jmp. * doc/invoke.texi (-harden-sls=): Replace indirect-branch with indirect-jmp. --- gcc/config/i386/i386-opts.h | 4 ++-- gcc/config/i386/i386.c | 6 +++--- gcc/config/i386/i386.opt| 2 +- gcc/doc/invoke.texi | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h index b31f9506c35..8f71e89fa9a 100644 --- a/gcc/config/i386/i386-opts.h +++ b/gcc/config/i386/i386-opts.h @@ -124,8 +124,8 @@ enum instrument_return { enum harden_sls { harden_sls_none = 0, harden_sls_return = 1 << 0, - harden_sls_indirect_branch = 1 << 1, - harden_sls_all = harden_sls_return | harden_sls_indirect_branch + harden_sls_indirect_jmp = 1 << 1, + harden_sls_all = harden_sls_return | harden_sls_indirect_jmp }; #endif diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1a964fe00f4..835ec2cafcb 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -16037,7 +16037,7 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno) fprintf (asm_out_file, "\tjmp\t"); assemble_name (asm_out_file, thunk_name); putc ('\n', asm_out_file); - if ((ix86_harden_sls & harden_sls_indirect_branch)) + if ((ix86_harden_sls & harden_sls_indirect_jmp)) fputs ("\tint3\n", asm_out_file); } else @@ -16263,7 +16263,7 @@ ix86_output_indirect_jmp (rtx call_op) } else output_asm_insn ("%!jmp\t%A0", &call_op); - return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : ""; + return (ix86_harden_sls & harden_sls_indirect_jmp) ? "int3" : ""; } /* Output return instrumentation for current function if needed. */ @@ -16430,7 +16430,7 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op) { output_asm_insn (xasm, &call_op); if (!direct_p - && (ix86_harden_sls & harden_sls_indirect_branch)) + && (ix86_harden_sls & harden_sls_indirect_jmp)) return "int3"; } return ""; diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index e69955bd1a8..eb829d13d40 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1144,7 +1144,7 @@ EnumValue Enum(harden_sls) String(return) Value(harden_sls_return) EnumValue -Enum(harden_sls) String(indirect-branch) Value(harden_sls_indirect_branch) +Enum(harden_sls) String(indirect-jmp) Value(harden_sls_indirect_jmp) EnumValue Enum(harden_sls) String(all) Value(harden_sls_all) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 04820656ffa..6b84228f142 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -32507,8 +32507,8 @@ Force indirect call and jump via register. @opindex mharden-sls Generate code to mitigate against straight line speculation (SLS) with @var{choice}. The default is @samp{none} which disables all SLS -hardening. @samp{return} enables SLS hardening for function return. -@samp{indirect-branch} enables SLS hardening for indirect branch. +hardening. @samp{return} enables SLS hardening for function returns. +@samp{indirect-jmp} enables SLS hardening for indirect jumps. @samp{all} enables all SLS hardening. @item -mindirect-branch-cs-prefix -- 2.33.1
[PATCH] x86: Generate INT3 for __builtin_eh_return
Generate INT3 after indirect jmp in exception return for -fcf-protection with -mharden-sls=indirect-jmp. gcc/ PR target/103925 * config/i386/i386.c (ix86_output_indirect_function_return): Generate INT3 after indirect jmp for -mharden-sls=indirect-jmp. gcc/testsuite/ PR target/103925 * gcc.target/i386/harden-sls-6.c: New test. --- gcc/config/i386/i386.c | 9 ++--- gcc/testsuite/gcc.target/i386/harden-sls-6.c | 18 ++ 2 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-6.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 835ec2cafcb..35fb5adf2ef 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -16367,11 +16367,14 @@ ix86_output_indirect_function_return (rtx ret_op) } else output_indirect_thunk (regno); - - return ""; } else -return "%!jmp\t%A0"; +{ + output_asm_insn ("%!jmp\t%A0", &ret_op); + if (ix86_harden_sls & harden_sls_indirect_jmp) + fputs ("\tint3\n", asm_out_file); +} + return ""; } /* Output the assembly for a call instruction. */ diff --git a/gcc/testsuite/gcc.target/i386/harden-sls-6.c b/gcc/testsuite/gcc.target/i386/harden-sls-6.c new file mode 100644 index 000..9068eb64008 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/harden-sls-6.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -fcf-protection -mharden-sls=indirect-jmp" } */ + +struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context; + +void offset (int); + +struct _Unwind_Context { + void *reg[7]; +} _Unwind_Resume_or_Rethrow() { + struct _Unwind_Context cur_contextcur_context = + _Unwind_Resume_or_Rethrow_this_context; + offset(0); + __builtin_eh_return ((long) offset, 0); +} + +/* { dg-final { scan-assembler "jmp\[ \t\]+\\*%rcx" } } */ +/* { dg-final { scan-assembler-times "int3" 1 } } */ -- 2.33.1
Re: [PATCH 1b/6] Add __attribute__((untrusted))
On 1/6/22 8:10 AM, David Malcolm wrote: On Thu, 2021-12-09 at 15:54 -0700, Martin Sebor wrote: On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote: This patch adds a new: __attribute__((untrusted)) for use by the C front-end, intended for use by the Linux kernel for use with "__user", but which could be used by other operating system kernels, and potentialy by other projects. It looks like untrusted is a type attribute (rather than one that applies to variables and/or function return values or writeable by-reference arguments). I find that quite surprising. FWIW I initially tried implementing it on pointer types, but doing it on the underlying type was much cleaner. I'm used to thinking of trusted vs tainted as dynamic properties of data so I'm having trouble deciding what to think about the attribute applying to types. Can you explain why it's useful on types? A type system *is* a way of detecting problems involving dynamic properties of data. Ultimately all we have at runtime is a collection of bits; the toolchain has the concept of types as a way to allow us to reason about properies of those bits without requiring a full cross-TU analysis (to try to figure out that e.g. x is, say, a 32 bit unsigned integer), and to document these properties clearly to human readers of the code. I understand that relying on the type system is a way to do it. It just doesn't seem like a very good way in a loosely typed language like C (or C++). I see this as working like a qualifier (rather like "const" and "volatile"), in that an untrusted char * when dereferenced gives you an untrusted char Dereferencing a const char* yields a const char lvalue that implicitly converts to an unqualified value of the referenced object. The qualifier is lost in the conversion, so modeling taint/trust this way will also lose the property in the same contexts. It sounds to me like the concept you're modeling might be more akin to a type specifier (maybe like _Atomic, although that still converts to the underlying type). The intent is to have a way of treating the values as "actively hostile", so that code analyzers can assume the worst possible values for such types (or more glibly, that we're dealing with data from Satan rather than from Murphy). Such types are also relevant to infoleaks: writing sensitive information to an untrusted value can be detected relatively easily with this approach, by checking the type of the value - the types express the trust boundary Doing this with qualifiers allows us to use the C type system to detect these kinds of issues without having to add a full cross-TU interprocedural analysis, and documents it to human readers of the code. Compare with const-correctness; we can have an analogous "trust-correctness". The problem with const-correctness in C is that it's so easily lost (like with strchr, or in the lvalue-rvalue conversion). This is also why I'm skeptical of the type-based approach here. I'd expect the taint property of a type to be quickly lost as an object of the type is passed through existing APIs (e.g., a char array manipulated by string functions like strchr). FWIW you can't directly pass an attacker-controlled buffer to strchr: strchr requires there to be a 0-terminator to the array; if the array's content is untrusted then the attacker might not have 0-terminated it. strchr is just an example of the many functions that in my mind make the type-based approach less than ideal. If the untrusted string was known to be nul-teminated, strchr still couldn't be used without losing the property. Ditto for memchr. It seems that all sanitization would either have to be written from scratch, without relying on existing utility functions, or by providing wrappers that called the common utility functions after removing the qualifier from the tainted data even before the santization was complete. That would obviously be error- prone, but it's something that would be made much more robust by tracking the taint independently of the data type. Martin As implemented, the patch doesn't complain about this, though maybe it should. The main point here is to support the existing __user annotation used by the Linux kernel, in particular, copy_from_user and copy_to_user. (I usually look at tests to help me understand the design of a change but I couldn't find an answer to my question in those in the patch.) The patch kit was rather unclear on this, due to the use of two different approaches (custom address spaces vs this untrusted attribute). Sorry about this. Patches 4a and 4b in the kit add test-uaccess.h (to gcc/testsuite/gcc.dg/analyzer) which supplies "__user"; see the tests that use "test-uaccess.h" in patch 3: [PATCH 3/6] analyzer: implement infoleak detection https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584377.html and in patch 5: [PATCH 5/6] analyzer: use region::untrusted_p in taint detection https://gcc.gn
[PATCH] i386: Improve HImode interunit moves
Currently, the compiler moves HImode values between GPR and XMM registers with: %vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0} %vpextrw\t{$0, %1, %k0|%k0, %1, 0} but it could use slightly faster and shorter: %vmovd\t{%k1, %0|%0, %k1} %vmovd\t{%1, %k0|%k0, %1} 2022-01-06 Uroš Bizjak gcc/ChangeLog: * config/i386/i386.c (ix86_output_ssemov) : Add %q modifier for operands in general registers. : Add %q modifier for operands in general registers. * config/i386/i386.md (*movhi_internal): Change type attribute of xmm-gpr interunit alternatives 9,10 to ssemov and mode attribute to SImode for non-avx512fp16 targets. (*movhf_internal): Ditto for xmm-gpr interunit alternatives 6,8. * config/i386/mmx.md (*movv2qi_internal): Ditto for xmm-gpr interunit alternatives 8,9. gcc/testsuite/ChangeLog: * gcc.target/i386/pr102811-2.c (dg-final): Update scan-assembler-times directives. * gcc.target/i386/sse2-float16-2.c (dg-final): Update scan-assembler directives. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1a964fe00f4..aeb7db5a5e3 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5535,15 +5535,30 @@ ix86_output_ssemov (rtx_insn *insn, rtx *operands) case MODE_DI: /* Handle broken assemblers that require movd instead of movq. */ - if (!HAVE_AS_IX86_INTERUNIT_MOVQ - && (GENERAL_REG_P (operands[0]) - || GENERAL_REG_P (operands[1]))) - return "%vmovd\t{%1, %0|%0, %1}"; + if (GENERAL_REG_P (operands[0])) + { + if (HAVE_AS_IX86_INTERUNIT_MOVQ) + return "%vmovq\t{%1, %q0|%q0, %1}"; + else + return "%vmovd\t{%1, %q0|%q0, %1}"; + } + else if (GENERAL_REG_P (operands[1])) + { + if (HAVE_AS_IX86_INTERUNIT_MOVQ) + return "%vmovq\t{%q1, %0|%0, %q1}"; + else + return "%vmovd\t{%q1, %0|%0, %q1}"; + } else return "%vmovq\t{%1, %0|%0, %1}"; case MODE_SI: - return "%vmovd\t{%1, %0|%0, %1}"; + if (GENERAL_REG_P (operands[0])) + return "%vmovd\t{%1, %k0|%k0, %1}"; + else if (GENERAL_REG_P (operands[1])) + return "%vmovd\t{%k1, %0|%0, %k1}"; + else + return "%vmovd\t{%1, %0|%0, %1}"; case MODE_HI: if (GENERAL_REG_P (operands[0])) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 9b424a3935b..376df1d51d1 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2580,13 +2580,9 @@ return standard_sse_constant_opcode (insn, operands); if (SSE_REG_P (operands[0])) - return MEM_P (operands[1]) - ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}" - : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}"; + return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"; else - return MEM_P (operands[0]) - ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}" - : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}"; + return "%vpextrw\t{$0, %1, %0|%0, %1, 0}"; case TYPE_MSKLOG: if (operands[1] == const0_rtx) @@ -2614,13 +2610,13 @@ (const_string "mskmov") (eq_attr "alternative" "8") (const_string "msklog") - (eq_attr "alternative" "9,10,13,14") + (eq_attr "alternative" "13,14") (if_then_else (match_test "TARGET_AVX512FP16") (const_string "ssemov") (const_string "sselog1")) (eq_attr "alternative" "11") (const_string "sselog1") - (eq_attr "alternative" "12") + (eq_attr "alternative" "9,10,12") (const_string "ssemov") (match_test "optimize_function_for_size_p (cfun)") (const_string "imov") @@ -2644,7 +2640,11 @@ ] (const_string "orig"))) (set (attr "mode") - (cond [(eq_attr "alternative" "9,10,13,14") + (cond [(eq_attr "alternative" "9,10") + (if_then_else (match_test "TARGET_AVX512FP16") + (const_string "HI") + (const_string "SI")) + (eq_attr "alternative" "13,14") (if_then_else (match_test "TARGET_AVX512FP16") (const_string "HI") (const_string "TI")) @@ -3876,13 +3876,9 @@ return standard_sse_constant_opcode (insn, operands); if (SSE_REG_P (operands[0])) - return MEM_P (operands[1]) - ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}" - : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}"; + return "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"; else - return MEM_P (operands[0]) - ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}" - : "%vpextrw\t{$0, %1, %k0|%k0, %1, 0}"; + return "%vpextrw\t{$0, %1, %0|%0, %1, 0}"; default: if (get_attr_mode (insn) == MODE_SI) @@ -3901,9 +3897,9 @@ (set (attr "type")
Re: [PATCH] x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp
On Thu, Jan 6, 2022 at 7:57 PM H.J. Lu wrote: > > Indirect branch also includes indirect call instructions. Rename > -harden-sls=indirect-branch to -harden-sls=indirect-jmp to match its > intended behavior. > > PR target/102952 > * config/i386/i386-opts.h (harden_sls): Replace > harden_sls_indirect_branch with harden_sls_indirect_jmp. > * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): > Likewise. > (ix86_output_indirect_jmp): Likewise. > (ix86_output_call_insn): Likewise. > * config/i386/i386.opt: Replace indirect-branch with > indirect-jmp. Replace harden_sls_indirect_branch with > harden_sls_indirect_jmp. > * doc/invoke.texi (-harden-sls=): Replace indirect-branch with > indirect-jmp. OK. Thanks, Uros. > --- > gcc/config/i386/i386-opts.h | 4 ++-- > gcc/config/i386/i386.c | 6 +++--- > gcc/config/i386/i386.opt| 2 +- > gcc/doc/invoke.texi | 4 ++-- > 4 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h > index b31f9506c35..8f71e89fa9a 100644 > --- a/gcc/config/i386/i386-opts.h > +++ b/gcc/config/i386/i386-opts.h > @@ -124,8 +124,8 @@ enum instrument_return { > enum harden_sls { >harden_sls_none = 0, >harden_sls_return = 1 << 0, > - harden_sls_indirect_branch = 1 << 1, > - harden_sls_all = harden_sls_return | harden_sls_indirect_branch > + harden_sls_indirect_jmp = 1 << 1, > + harden_sls_all = harden_sls_return | harden_sls_indirect_jmp > }; > > #endif > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 1a964fe00f4..835ec2cafcb 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -16037,7 +16037,7 @@ ix86_output_jmp_thunk_or_indirect (const char > *thunk_name, const int regno) >fprintf (asm_out_file, "\tjmp\t"); >assemble_name (asm_out_file, thunk_name); >putc ('\n', asm_out_file); > - if ((ix86_harden_sls & harden_sls_indirect_branch)) > + if ((ix86_harden_sls & harden_sls_indirect_jmp)) > fputs ("\tint3\n", asm_out_file); > } >else > @@ -16263,7 +16263,7 @@ ix86_output_indirect_jmp (rtx call_op) > } >else > output_asm_insn ("%!jmp\t%A0", &call_op); > - return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : ""; > + return (ix86_harden_sls & harden_sls_indirect_jmp) ? "int3" : ""; > } > > /* Output return instrumentation for current function if needed. */ > @@ -16430,7 +16430,7 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op) > { > output_asm_insn (xasm, &call_op); > if (!direct_p > - && (ix86_harden_sls & harden_sls_indirect_branch)) > + && (ix86_harden_sls & harden_sls_indirect_jmp)) > return "int3"; > } >return ""; > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt > index e69955bd1a8..eb829d13d40 100644 > --- a/gcc/config/i386/i386.opt > +++ b/gcc/config/i386/i386.opt > @@ -1144,7 +1144,7 @@ EnumValue > Enum(harden_sls) String(return) Value(harden_sls_return) > > EnumValue > -Enum(harden_sls) String(indirect-branch) Value(harden_sls_indirect_branch) > +Enum(harden_sls) String(indirect-jmp) Value(harden_sls_indirect_jmp) > > EnumValue > Enum(harden_sls) String(all) Value(harden_sls_all) > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 04820656ffa..6b84228f142 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -32507,8 +32507,8 @@ Force indirect call and jump via register. > @opindex mharden-sls > Generate code to mitigate against straight line speculation (SLS) with > @var{choice}. The default is @samp{none} which disables all SLS > -hardening. @samp{return} enables SLS hardening for function return. > -@samp{indirect-branch} enables SLS hardening for indirect branch. > +hardening. @samp{return} enables SLS hardening for function returns. > +@samp{indirect-jmp} enables SLS hardening for indirect jumps. > @samp{all} enables all SLS hardening. > > @item -mindirect-branch-cs-prefix > -- > 2.33.1 >
Re: [PATCH] x86: Generate INT3 for __builtin_eh_return
On Thu, Jan 6, 2022 at 7:58 PM H.J. Lu wrote: > > Generate INT3 after indirect jmp in exception return for -fcf-protection > with -mharden-sls=indirect-jmp. > > gcc/ > > PR target/103925 > * config/i386/i386.c (ix86_output_indirect_function_return): > Generate INT3 after indirect jmp for -mharden-sls=indirect-jmp. > > gcc/testsuite/ > > PR target/103925 > * gcc.target/i386/harden-sls-6.c: New test. OK. Thanks, Uros. > --- > gcc/config/i386/i386.c | 9 ++--- > gcc/testsuite/gcc.target/i386/harden-sls-6.c | 18 ++ > 2 files changed, 24 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-6.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 835ec2cafcb..35fb5adf2ef 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -16367,11 +16367,14 @@ ix86_output_indirect_function_return (rtx ret_op) > } >else > output_indirect_thunk (regno); > - > - return ""; > } >else > -return "%!jmp\t%A0"; > +{ > + output_asm_insn ("%!jmp\t%A0", &ret_op); > + if (ix86_harden_sls & harden_sls_indirect_jmp) > + fputs ("\tint3\n", asm_out_file); > +} > + return ""; > } > > /* Output the assembly for a call instruction. */ > diff --git a/gcc/testsuite/gcc.target/i386/harden-sls-6.c > b/gcc/testsuite/gcc.target/i386/harden-sls-6.c > new file mode 100644 > index 000..9068eb64008 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/harden-sls-6.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile { target { ! ia32 } } } */ > +/* { dg-options "-O2 -fcf-protection -mharden-sls=indirect-jmp" } */ > + > +struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context; > + > +void offset (int); > + > +struct _Unwind_Context { > + void *reg[7]; > +} _Unwind_Resume_or_Rethrow() { > + struct _Unwind_Context cur_contextcur_context = > + _Unwind_Resume_or_Rethrow_this_context; > + offset(0); > + __builtin_eh_return ((long) offset, 0); > +} > + > +/* { dg-final { scan-assembler "jmp\[ \t\]+\\*%rcx" } } */ > +/* { dg-final { scan-assembler-times "int3" 1 } } */ > -- > 2.33.1 >
Go patch committed: Permit unnamed types when ignoring struct tags
This patch to the Go frontend permits converting unnamed types when ignoring struct tags. I think the code was written this way because before conversion could ignore struct tags this case could only arise with named types. Now that conversions permit struct tags to change, this can occur with unnamed types as well. The test case is https://golang.org/cl/375796. This fixes https://golang.org/issue/50439. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 89ce08665e128d2f8e9ae148044134d9e94a9509 diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index a18f3a37349..9cc6a1c63c6 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -9732b077b9235d0f35d0fb0abfe406b94d49 +799e9807c36fc661b14dfff136369556f09a5ebf The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc index 57c02a97ef0..1c67ea099eb 100644 --- a/gcc/go/gofrontend/types.cc +++ b/gcc/go/gofrontend/types.cc @@ -791,8 +791,7 @@ Type::are_convertible(const Type* lhs, const Type* rhs, std::string* reason) // The types are convertible if they have identical underlying // types, ignoring struct field tags. - if ((lhs->named_type() != NULL || rhs->named_type() != NULL) - && Type::are_identical(lhs->base(), rhs->base(), 0, reason)) + if (Type::are_identical(lhs->base(), rhs->base(), 0, reason)) return true; // The types are convertible if they are both unnamed pointer types
Re: [PATCH] Fortran: Fix ICE in argument_rank_mismatch [PR103287]
Am 05.01.22 um 22:34 schrieb Sandra Loosemore: One of my previous TS29113-related patches from last fall introduced an ICE due to a bogus assertion. This is the fix Steve suggested in the issue, bundled with the test cases, regression-tested, etc. OK to check in? OK. -Sandra Thanks, Harald
Re: [PATCH] PR fortran/103777 - ICE in gfc_simplify_maskl, at fortran/simplify.c:4918
Hi Mikael, Am 21.12.21 um 13:38 schrieb Mikael Morin: Le 20/12/2021 à 23:05, Harald Anlauf via Fortran a écrit : Dear all, we need to check the arguments of the elemental MASKL and MASKR intrinsics also before simplifying. Testcase by Gerhard. The fix is almost obvious, but I'm happy to get feedback in case there is something I overlooked. (There is already a check on scalar arguments to MASKL/MASKR, which however misses the case of array arguments.) Regtested on x86_64-pc-linux-gnu. OK for mainline? Your patch looks reasonable and safe. However, I find it surprising that it’s actually needed, as gfc_check mask is already the check function associated to maskl and maskr in the definition of the symbols. The simplification function should be called only when the associated check function has returned successfully, so it shouldn’t be necessary to call it again at simplification time. Looking at the backtrace, it is the do_simplify call at the beginning of gfc_intrinsic_func_interface that seems dubious to me, as it comes before all the check further down in the function and it looks redundant with the other simplification code after the checks. So I’m inclined to test whether by any chance removing that call works, and if it doesn’t, let’s go with this patch. Did you find the time to try your version? Mikael Thanks, Harald
Re: [power-ieee128] RFH: LTO broken
On 06.01.22 06:00, Michael Meissner via Fortran wrote: I pushed the patch to the branch. Test results are looking quite good right now. What is still missing is the conversion for unformatted I/O, both ways. I'll start doing some stuff on it. Just one question: What are functions that I can use to convert from IBM long double to IEEE and long double and vice versa? It was in an e-mail somewhere, but I cannot find it at the moment. Best regards Thomas
Re: [power-ieee128] RFH: LTO broken
On Thu, Jan 06, 2022 at 09:01:54PM +0100, Thomas Koenig wrote: > On 06.01.22 06:00, Michael Meissner via Fortran wrote: > > I pushed the patch to the branch. > > Test results are looking quite good right now. > > What is still missing is the conversion for unformatted I/O, both > ways. I'll start doing some stuff on it. Just one question: > What are functions that I can use to convert from IBM long double > to IEEE and long double and vice versa? It was in an e-mail somewhere, > but I cannot find it at the moment. Under the hood __extendkftf2 and __trunctfkf2, as can be seen on: __ibm128 foo (__float128 x) { return x; } __float128 bar (__ibm128 x) { return x; } But, I really don't think libgfortran should call those by hand, just use C casts, (GFC_REAL_17) var_with_GFC_REAL_16_type or (GFC_REAL_16) var_with_GFC_REAL_17_type. Jakub
[PATCH] Fortran: Fix handling of optional argument to SIZE intrinsic [PR103898]
This patch fixes an ICE introduced with the recent-ish rewrite to inline the SIZE intrinsic, using a helper function to do the bulk of the work in producing the expansion. It turns out to be a simple think-o type mistake in the wrapper around the helper rather than anything deeply wrong with the logic. OK to check in? -Sandra commit 0e5b74440572f988dd96a6e9c33c11b59323d6cf Author: Sandra Loosemore Date: Thu Jan 6 11:23:18 2022 -0800 Fortran: Fix handling of optional argument to SIZE intrinsic [PR103898] This patch fixes a think-o in the code that triggered an ICE in the test case. 2021-01-06 Sandra Loosemore gcc/fortran/ * trans-intrinsic.c (gfc_conv_intrinsic_size): Make size_var actually be a variable and fix surrounding code. gcc/testsuite/ * gfortran.dg/pr103898.f90: New test. diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index 41252c9..aae34b0 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -8006,10 +8006,14 @@ gfc_conv_intrinsic_size (gfc_se * se, gfc_expr * expr) cond = gfc_evaluate_now (cond, &se->pre); /* 'block2' contains the arg2 absent case, 'block' the arg2 present case; size_var can be used in both blocks. */ - tree size_var = gfc_tree_array_size (&block2, arg1, e, NULL_TREE); + tree size_var = gfc_create_var (TREE_TYPE (size), "size"); tmp = fold_build2_loc (input_location, MODIFY_EXPR, TREE_TYPE (size_var), size_var, size); gfc_add_expr_to_block (&block, tmp); + size = gfc_tree_array_size (&block2, arg1, e, NULL_TREE); + tmp = fold_build2_loc (input_location, MODIFY_EXPR, + TREE_TYPE (size_var), size_var, size); + gfc_add_expr_to_block (&block2, tmp); tmp = build3_v (COND_EXPR, cond, gfc_finish_block (&block), gfc_finish_block (&block2)); gfc_add_expr_to_block (&se->pre, tmp); diff --git a/gcc/testsuite/gfortran.dg/pr103898.f90 b/gcc/testsuite/gfortran.dg/pr103898.f90 new file mode 100644 index 000..6b4bb30 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr103898.f90 @@ -0,0 +1,15 @@ +! { dg-do compile } + +! This test used to ICE during gimplification (PR103898). + +Module g +contains + function mysize(array, dim) +integer :: mysize +integer, dimension(:), intent(in) :: array +integer, optional, intent(in) :: dim +if (present(dim)) then + mysize = size(array, dim=dim) +endif + end function mysize +end module
Re: [PATCH] Fortran: Fix handling of optional argument to SIZE intrinsic [PR103898]
Hi Sandra, Am 06.01.22 um 21:11 schrieb Sandra Loosemore: This patch fixes an ICE introduced with the recent-ish rewrite to inline the SIZE intrinsic, using a helper function to do the bulk of the work in producing the expansion. It turns out to be a simple think-o type mistake in the wrapper around the helper rather than anything deeply wrong with the logic. OK to check in? LGTM. -Sandra Thanks for addressing this swiftly!
Re: [PATCH] PR fortran/103777 - ICE in gfc_simplify_maskl, at fortran/simplify.c:4918
Le 06/01/2022 à 20:50, Harald Anlauf a écrit : Did you find the time to try your version? Not yet. But I have not (yet) forgotten about this.
[wwwdocs] Update C++ DR table
A new batch of DRs. Pushed. commit ecbd709b2ecb1e3c35fa4ede09bcb7ceb443def6 Author: Marek Polacek Date: Thu Jan 6 17:06:24 2022 -0500 Update C++ DR table diff --git a/htdocs/projects/cxx-dr-status.html b/htdocs/projects/cxx-dr-status.html index e8002b27..cc2afc03 100644 --- a/htdocs/projects/cxx-dr-status.html +++ b/htdocs/projects/cxx-dr-status.html @@ -12113,7 +12113,7 @@ https://wg21.link/cwg1726";>1726 - drafting + ready Declarator operators and conversion function - https://gcc.gnu.org/PR79318";>PR79318 @@ -17406,7 +17406,7 @@ https://wg21.link/cwg2482";>2482 - review + accepted bit_cast and indeterminate values ? @@ -17481,16 +17481,16 @@ - - + https://wg21.link/cwg2493";>2493 - open + dup auto as a conversion-type-id - - + Dup of issue 1670 https://wg21.link/cwg2494";>2494 - drafting + ready Multiple definitions of non-odr-used entities - @@ -17567,16 +17567,100 @@ https://wg21.link/cwg2505";>2505 - open + drafting Nested unnamed namespace of inline unnamed namespace - + + https://wg21.link/cwg2506";>2506 + ready + Structured bindings and array cv-qualifiers + - + + + + https://wg21.link/cwg2507";>2507 + review + Default arguments for operator[] + ? + + + + https://wg21.link/cwg2508";>2508 + review + Restrictions on uses of template parameter names + ? + + + + https://wg21.link/cwg2509";>2509 + ready + decl-specifier-seq in lambda-specifiers + - + + + + https://wg21.link/cwg2510";>2510 + open + noexcept-specifier of friend function vs class completeness + - + + + + https://wg21.link/cwg2511";>2511 + ready + cv-qualified bit-fields + - + + + + https://wg21.link/cwg2512";>2512 + NAD + typeid and incomplete class types + N/A + + + + https://wg21.link/cwg2513";>2513 + open + Ambiguity with requires-clause and operator-function-id + - + + + + https://wg21.link/cwg2514";>2514 + open + Modifying const subobjects + - + + + + https://wg21.link/cwg2515";>2515 + open + Result of a function call + - + + + + https://wg21.link/cwg2516";>2516 + open + Locus of enum-specifier or opaque-enum-declaration + - + + + + https://wg21.link/cwg2517";>2517 + open + Useless restriction on use of parameter in constraint-expression + - + + This page is currently maintained by mailto:pola...@redhat.com";>pola...@redhat.com. Last update: -Tue Nov 23 04:16:08 PM EST 2021 +Thu Jan 6 04:42:28 PM EST 2022
[PATCH take #3] Recognize MULT_HIGHPART_EXPR in tree-ssa-math-opts pass.
This is the third iteration of a patch to perceive MULT_HIGHPART_EXPR in the middle-end. As they say "the third time's a charm". The first version implemented this in match.pd, which was considered too early. https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551316.html The second version attempted to do this during RTL expansion, and was considered to be too late in the middle-end. https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576922.html https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576923.html This latest version incorporates Richard Biener's feedback/suggestion to perceive MULT_HIGHPART_EXPR in one of the "instruction selection passes", specifically tree-ssa-math-opts, where the recognition of highpart multiplications takes place in the same pass as widening multiplications. With each rewrite, the patch is also getting more aggressive in the set of widening multiplications that it recognizes as highpart multiplies. Currently any widening multiplication followed by a right shift (either signed or unsigned) by a bit count sufficient to eliminate the lowpart is recognized. The result of this shift doesn't need to be truncated. As previously, this patch confirms the target provides a suitable optab before introducing the MULT_HIGHPART_EXPR. This is the reason the testcase is restricted to x86_64, as this pass doesn't do anything on some platforms, but x86_64 should be sufficient to confirm that the pass is working/continues to work. This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap and make -k check (both with and without --target_board='unix{-m32}') with no new regressions. Ok for mainline? 2022-01-06 Roger Sayle gcc/ChangeLog * tree-ssa-math-opts.c (struct widen_mul_stats): Add a highpart_mults_inserted field. (convert_mult_to_highpart): New function to convert right shift of a widening multiply into a MULT_HIGHPART_EXPR. (math_opts_dom_walker::after_dom_children) [RSHIFT_EXPR]: Call new convert_mult_to_highpart function. (pass_optimize_widening_mul::execute): Add a statistics counter for tracking "highpart multiplications inserted" events. gcc/testsuite/ChangeLog * gcc.target/i386/mult-highpart.c: New test case. Thanks in advance, Roger -- diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 6131824..2014139 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -207,6 +207,9 @@ static struct /* Number of divmod calls inserted. */ int divmod_calls_inserted; + + /* Number of highpart multiplication ops inserted. */ + int highpart_mults_inserted; } widen_mul_stats; /* The instance of "struct occurrence" representing the highest @@ -4548,9 +4551,129 @@ convert_to_divmod (gassign *stmt) return true; } +/* Process a single gimple statement STMT, which has a RSHIFT_EXPR as + its rhs, and try to convert it into a MULT_HIGHPART_EXPR. The return + value is true iff we converted the statement. */ + +static bool +convert_mult_to_highpart (gimple *stmt, gimple_stmt_iterator *gsi) +{ + tree lhs = gimple_assign_lhs (stmt); + tree stype = TREE_TYPE (lhs); + tree sarg0 = gimple_assign_rhs1 (stmt); + tree sarg1 = gimple_assign_rhs2 (stmt); + + if (TREE_CODE (stype) != INTEGER_TYPE + || TREE_CODE (sarg1) != INTEGER_CST + || TREE_CODE (sarg0) != SSA_NAME + || !tree_fits_uhwi_p (sarg1) + || !has_single_use (sarg0)) +return false; + + gimple *def = SSA_NAME_DEF_STMT (sarg0); + if (!is_gimple_assign (def)) +return false; + + enum tree_code mcode = gimple_assign_rhs_code (def); + if (mcode == NOP_EXPR) +{ + tree tmp = gimple_assign_rhs1 (def); + if (TREE_CODE (tmp) != SSA_NAME || !has_single_use (tmp)) + return false; + def = SSA_NAME_DEF_STMT (tmp); + if (!is_gimple_assign (def)) + return false; + mcode = gimple_assign_rhs_code (def); +} + + if (mcode != WIDEN_MULT_EXPR + || gimple_bb (def) != gimple_bb (stmt)) +return false; + tree mtype = TREE_TYPE (gimple_assign_lhs (def)); + if (TREE_CODE (mtype) != INTEGER_TYPE + || TYPE_PRECISION (mtype) != TYPE_PRECISION (stype)) +return false; + + tree mop1 = gimple_assign_rhs1 (def); + tree mop2 = gimple_assign_rhs2 (def); + tree optype = TREE_TYPE (mop1); + bool unsignedp = TYPE_UNSIGNED (optype); + unsigned int prec = TYPE_PRECISION (optype); + + if (optype != TREE_TYPE (mop2) + || unsignedp != TYPE_UNSIGNED (mtype) + || TYPE_PRECISION (mtype) != 2 * prec) +return false; + + unsigned HOST_WIDE_INT bits = tree_to_uhwi (sarg1); + if (bits < prec || bits >= 2 * prec) +return false; + + machine_mode mode = TYPE_MODE (optype); + optab tab = unsignedp ? umul_highpart_optab : smul_highpart_optab; + if (optab_handler (tab, mode) == CODE_FOR_nothing) +return false; + + location_t loc = gimple_location (stmt); + tree highpart1 = build_and_insert_binop (gsi,
[PATCH] rs6000: More factoring of overload processing
Hi! This patch continues the refactoring started with r12-6014. I had previously noted that the resolve_vec* routines can be further simplified by processing the argument list earlier, so that all routines can use the arrays of arguments and types. I found that this was useful for some of the routines, but not for all of them. For several of the special-cased overloads, we don't specify all of the possible type combinations in rs6000-overload.def, because the types don't matter for the expansion we do. For these, we can't use generic error message handling when the number of arguments is incorrect, because the result is misleading error messages that indicate argument types are wrong. So this patch goes halfway and improves the factoring on the remaining special cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and vec_step alone. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this okay for trunk? Thanks! Bill 2022-01-06 Bill Schmidt gcc/ * config/rs6000/rs6000-c.c (resolve_vec_mul): Accept args and types parameters instead of arglist and nargs. Simplify accordingly. Remove unnecessary test for argument count mismatch. (resolve_vec_cmpne): Likewise. (resolve_vec_adde_sube): Likewise. (resolve_vec_addec_subec): Likewise. (altivec_resolve_overloaded_builtin): Move overload special handling after the gathering of arguments into args[] and types[] and the test for correct number of arguments. Don't perform the test for correct number of arguments for certain special cases. Call the other special cases with args and types instead of arglist and nargs. --- gcc/config/rs6000/rs6000-c.c | 304 +++ 1 file changed, 127 insertions(+), 177 deletions(-) diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c index 24a081ced37..189a70d89bf 100644 --- a/gcc/config/rs6000/rs6000-c.c +++ b/gcc/config/rs6000/rs6000-c.c @@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type, enum resolution { unresolved, resolved, resolved_bad }; /* Resolve an overloaded vec_mul call and return a tree expression for the - resolved call if successful. NARGS is the number of arguments to the call. - ARGLIST contains the arguments. RES must be set to indicate the status of + resolved call if successful. ARGS contains the arguments to the call. + TYPES contains their types. RES must be set to indicate the status of the resolution attempt. LOC contains statement location information. */ static tree -resolve_vec_mul (resolution *res, vec *arglist, unsigned nargs, -location_t loc) +resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc) { /* vec_mul needs to be special cased because there are no instructions for it for the {un}signed char, {un}signed short, and {un}signed int types. */ - if (nargs != 2) -{ - error ("builtin %qs only accepts 2 arguments", "vec_mul"); - *res = resolved; - return error_mark_node; -} - - tree arg0 = (*arglist)[0]; - tree arg0_type = TREE_TYPE (arg0); - tree arg1 = (*arglist)[1]; - tree arg1_type = TREE_TYPE (arg1); /* Both arguments must be vectors and the types must be compatible. */ - if (TREE_CODE (arg0_type) != VECTOR_TYPE - || !lang_hooks.types_compatible_p (arg0_type, arg1_type)) + if (TREE_CODE (types[0]) != VECTOR_TYPE + || !lang_hooks.types_compatible_p (types[0], types[1])) { *res = resolved_bad; return error_mark_node; } - switch (TYPE_MODE (TREE_TYPE (arg0_type))) + switch (TYPE_MODE (TREE_TYPE (types[0]))) { case E_QImode: case E_HImode: @@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec *arglist, unsigned nargs, case E_TImode: /* For scalar types just use a multiply expression. */ *res = resolved; - return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0, - fold_convert (TREE_TYPE (arg0), arg1)); + return fold_build2_loc (loc, MULT_EXPR, types[0], args[0], + fold_convert (types[0], args[1])); case E_SFmode: { /* For floats use the xvmulsp instruction directly. */ *res = resolved; tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP]; - return build_call_expr (call, 2, arg0, arg1); + return build_call_expr (call, 2, args[0], args[1]); } case E_DFmode: { /* For doubles use the xvmuldp instruction directly. */ *res = resolved; tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP]; - return build_call_expr (call, 2, arg0, arg1); + return build_call_expr (call, 2, args[0], args[1]); } /* Other types are errors. */ default: @@ -1002,37 +990,25 @@ resolve_vec_mul (resolution *res, vec *argli
[committed] analyzer: make use of may_be_aliased in alias detection [PR103546]
Whilst debugging PR analyzer/103546 (false +ve in flex-generated lexers) I noticed that the analyzer was considering that writes through symbolic pointers could be treated as clobbering static globals such as: static YY_BUFFER_STATE * yy_buffer_stack = NULL; even for such variables that never have their address taken. This patch fixes this issue at least, so that the analyzer can preserve knowledge of such globals on code paths with writes through symbolic pointers. It does not fix the false +ve in the lexer code. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r12-6323-gd564a83d14252d7db01381f71900b7a68357803b. gcc/analyzer/ChangeLog: PR analyzer/103546 * store.cc (store::eval_alias_1): Refactor handling of decl regions, adding a test for may_be_aliased, rejecting those for which it returns false. gcc/testsuite/ChangeLog: PR analyzer/103546 * gcc.dg/analyzer/aliasing-3.c: New test. --- gcc/analyzer/store.cc | 18 -- gcc/testsuite/gcc.dg/analyzer/aliasing-3.c | 75 ++ 2 files changed, 86 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/aliasing-3.c diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc index 8729aa8dab6..3f91b6107a9 100644 --- a/gcc/analyzer/store.cc +++ b/gcc/analyzer/store.cc @@ -2456,13 +2456,17 @@ store::eval_alias_1 (const region *base_reg_a, = base_reg_a->dyn_cast_symbolic_region ()) { const svalue *sval_a = sym_reg_a->get_pointer (); - if (sval_a->get_kind () == SK_INITIAL) - if (tree decl_b = base_reg_b->maybe_get_decl ()) - if (!is_global_var (decl_b)) - { - /* The initial value of a pointer can't point to a local. */ - return tristate::TS_FALSE; - } + if (tree decl_b = base_reg_b->maybe_get_decl ()) + { + if (!may_be_aliased (decl_b)) + return tristate::TS_FALSE; + if (sval_a->get_kind () == SK_INITIAL) + if (!is_global_var (decl_b)) + { + /* The initial value of a pointer can't point to a local. */ + return tristate::TS_FALSE; + } + } if (sval_a->get_kind () == SK_INITIAL && base_reg_b->get_kind () == RK_HEAP_ALLOCATED) { diff --git a/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c b/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c new file mode 100644 index 000..003077ad5c1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/analyzer/aliasing-3.c @@ -0,0 +1,75 @@ +#include "analyzer-decls.h" + +#define NULL ((void *)0) + +struct s1 +{ + int f1; +}; + +static struct s1 *p1_glob = NULL; + +void test_1 (struct s1 **pp1, struct s1 *p1_parm) +{ + struct s1 *init_p1_glob = p1_glob; + + __analyzer_eval (p1_glob == init_p1_glob); /* { dg-warning "TRUE" } */ + + if (!p1_glob) +return; + + __analyzer_eval (p1_glob == init_p1_glob); /* { dg-warning "TRUE" } */ + __analyzer_eval (p1_glob != NULL); /* { dg-warning "TRUE" } */ + + *pp1 = p1_parm; + + /* The write through *pp1 can't have changed p1_glob, because + we never take a pointer to p1_glob (and it's static to this TU). */ + __analyzer_eval (p1_glob == init_p1_glob); /* { dg-warning "TRUE" } */ + __analyzer_eval (p1_glob != NULL); /* { dg-warning "TRUE" } */ +} + +struct s2 +{ + int f1; +}; + +static struct s2 *p2_glob = NULL; + +void test_2 (struct s2 **pp2, struct s2 *p2_parm) +{ + /* Ensure that p2_glob is modified. */ + p2_glob = __builtin_malloc (sizeof (struct s2)); + if (!p2_glob) +return; + + __analyzer_eval (p2_glob != NULL); /* { dg-warning "TRUE" } */ + + *pp2 = p2_parm; + + /* The write through *pp2 can't have changed p2_glob, because + we never take a pointer to p2_glob (and it's static to this TU). */ + __analyzer_eval (p2_glob != NULL); /* { dg-warning "TRUE" } */ +} + +struct s3 +{ + int f1; +}; + +struct s3 *p3_glob = NULL; + +void test_3 (struct s3 **pp3, struct s3 *p3_parm) +{ + p3_glob = __builtin_malloc (sizeof (struct s3)); + if (!p3_glob) +return; + + __analyzer_eval (p3_glob != NULL); /* { dg-warning "TRUE" } */ + + *pp3 = p3_parm; + + /* The write through *pp3 could have changed p3_glob, because + another TU could take a pointer to p3_glob. */ + __analyzer_eval (p3_glob != NULL); /* { dg-warning "UNKNOWN" } */ +} -- 2.26.3
[PATCH 1/1] [PATCH] Fix canadian compile for mingw-w64 copies the wrong dlls for mingw-w64 multilibs [PR100427]
When building GCC hosted on windows with Canadian/native compilation (host==target), the build scripts in GCC would override DLLs with each other. For example, for MinGW-w64, 32-bit DLLs would override 64 bits because build scripts copy them both to /bin. This patch fixes the issue by avoiding copying DLLs with multilibs. However, it would still copy when we do not build multilibs, usually the native build for GCC on windows. --- gcc/configure | 26 ++ libatomic/configure| 13 + libbacktrace/configure | 13 + libcc1/configure | 26 ++ libffi/configure | 26 ++ libgfortran/configure | 26 ++ libgo/config/libtool.m4| 13 + libgo/configure| 13 + libgomp/configure | 26 ++ libitm/configure | 26 ++ libobjc/configure | 13 + liboffloadmic/configure| 26 ++ liboffloadmic/plugin/configure | 26 ++ libphobos/configure| 13 + libquadmath/configure | 13 + libsanitizer/configure | 26 ++ libssp/configure | 13 + libstdc++-v3/configure | 26 ++ libtool.m4 | 13 + libvtv/configure | 26 ++ lto-plugin/configure | 13 + zlib/configure | 13 + 22 files changed, 429 insertions(+) diff --git a/gcc/configure b/gcc/configure index 992a9d70092..d66ccc2a7f4 100755 --- a/gcc/configure +++ b/gcc/configure @@ -18642,6 +18642,18 @@ cygwin* | mingw* | pw32* | cegcc*) yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds +# If user builds GCC with mulitlibs enabled, it should just install on $(libdir) +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones. +if test ${multilib} = yes; then +postinstall_cmds='base_file=`basename \${file}`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + $install_prog $dir/$dlname $destdir/$dlname~ + chmod a+x $destdir/$dlname~ + if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then +eval '\''$striplib $destdir/$dlname'\'' || exit \$?; + fi' +else postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ @@ -18651,6 +18663,7 @@ cygwin* | mingw* | pw32* | cegcc*) if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; fi' +fi postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $RM \$dlpath' @@ -22299,6 +22312,18 @@ cygwin* | mingw* | pw32* | cegcc*) yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds +# If user builds GCC with mulitlibs enabled, it should just install on $(libdir) +# not on $(libdir)/../bin or 32 bits dlls would override 64 bit ones. +if test ${multilib} = yes; then +postinstall_cmds='base_file=`basename \${file}`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + $install_prog $dir/$dlname $destdir/$dlname~ + chmod a+x $destdir/$dlname~ + if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then +eval '\''$striplib $destdir/$dlname'\'' || exit \$?; + fi' +else postinstall_cmds='base_file=`basename \${file}`~ dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\${base_file}'\''i; echo \$dlname'\''`~ dldir=$destdir/`dirname \$dlpath`~ @@ -22308,6 +22333,7 @@ cygwin* | mingw* | pw32* | cegcc*) if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; fi' +fi postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ dlpath=$dir/\$dldll~ $RM \$dlpath' diff --git a/libatomic/configure b/libatomic/configure index 5867e69ac14..4fd8cb34cd4 100755 --- a/libatomic/configure +++ b/libatomic/configure @@ -10461,6 +10461,18 @@ cygwin* | mingw* | pw32* | cegcc*) yes,cygwin* | yes,mingw* | yes,pw32* | yes,cegcc*) library_names_spec='$libname.dll.a' # DLL is installed to $(libdir)/../bin by postinstall_cmds +# If user builds GCC with mulitlibs enabled,
[pushed 01/11] c++: don't preevaluate new-initializer
Here begins a series of EH cleanup bugfixes. The preevaluation code was causing trouble with my fix for PR94041, and now I see that it's actually wrong since P0145 was adopted for C++17, mandating order of evaluation for many expressions that were previously unspecified. I don't see a need to preserve the preevaluation code for older standard modes. gcc/cp/ChangeLog: * init.c (build_new_1): Remove preevaluation code. gcc/testsuite/ChangeLog: * g++.old-deja/g++.martin/new1.C: Don't expect preeval. --- gcc/cp/init.c| 38 +--- gcc/testsuite/g++.dg/tree-ssa/stabilize1.C | 13 --- gcc/testsuite/g++.old-deja/g++.martin/new1.C | 18 +- 3 files changed, 17 insertions(+), 52 deletions(-) delete mode 100644 gcc/testsuite/g++.dg/tree-ssa/stabilize1.C diff --git a/gcc/cp/init.c b/gcc/cp/init.c index 9d616f3f5e9..2cab4b4cdce 100644 --- a/gcc/cp/init.c +++ b/gcc/cp/init.c @@ -3047,7 +3047,6 @@ build_new_1 (vec **placement, tree type, tree nelts, address of the first array element. This node is a VAR_DECL, and is therefore reusable. */ tree data_addr; - tree init_preeval_expr = NULL_TREE; tree orig_type = type; if (nelts) @@ -3561,7 +3560,6 @@ build_new_1 (vec **placement, tree type, tree nelts, placement delete. */ if (is_initialized) { - bool stable; bool explicit_value_init_p = false; if (*init != NULL && (*init)->is_empty ()) @@ -3587,7 +3585,6 @@ build_new_1 (vec **placement, tree type, tree nelts, init, elt_type, LOOKUP_NORMAL, complain); - stable = stabilize_init (init_expr, &init_preeval_expr); } else if (array_p) { @@ -3633,11 +3630,6 @@ build_new_1 (vec **placement, tree type, tree nelts, explicit_value_init_p, /*from_array=*/0, complain); - - /* An array initialization is stable because the initialization -of each element is a full-expression, so the temporaries don't -leak out. */ - stable = true; } else { @@ -3694,8 +3686,6 @@ build_new_1 (vec **placement, tree type, tree nelts, = replace_placeholders (TREE_OPERAND (init_expr, 1), TREE_OPERAND (init_expr, 0), &had_placeholder); - stable = (!had_placeholder - && stabilize_init (init_expr, &init_preeval_expr)); } if (init_expr == error_mark_node) @@ -3726,20 +3716,7 @@ build_new_1 (vec **placement, tree type, tree nelts, alloc_fn, complain)); - if (!cleanup) - /* We're done. */; - else if (stable) - /* This is much simpler if we were able to preevaluate all of - the arguments to the constructor call. */ - { - /* CLEANUP is compiler-generated, so no diagnostics. */ - suppress_warning (cleanup); - init_expr = build2 (TRY_CATCH_EXPR, void_type_node, - init_expr, cleanup); - /* Likewise, this try-catch is compiler-generated. */ - suppress_warning (init_expr); - } - else + if (cleanup && !processing_template_decl) /* Ack! First we allocate the memory. Then we set our sentry variable to true, and expand a cleanup that deletes the memory if sentry is true. Then we run the constructor, and @@ -3747,9 +3724,13 @@ build_new_1 (vec **placement, tree type, tree nelts, We need to do this because we allocate the space first, so if there are any temporaries with cleanups in the - constructor args and we weren't able to preevaluate them, we - need this EH region to extend until end of full-expression - to preserve nesting. */ + constructor args, we need this EH region to extend until + end of full-expression to preserve nesting. + + We used to try to evaluate the args first to avoid this, but + since C++17 [expr.new] says that "The invocation of the + allocation function is sequenced before the evaluations of + expressions in the new-initializer." */ { tree end, sentry, begin; @@ -3810,9 +3791,6 @@ build_new_1 (vec **placement, tree type, tree nelts, rval = build2 (COMPOUND_EXPR, TREE_TYPE (rval), alloc_expr, rval); } - if (init_preeval_expr) -rval = build2 (COMPOUND_EXPR, TREE_TYPE (rval), init_preeval_expr, rval); - /* A new-expression is never an lvalue. */ gcc_assert (!obvalue_p (rval));
[pushed 02/11] c++: loop over array elts w/o explicit init [PR92385]
The PR complains that initializing a large array with {} takes a long time to compile; this was because digest_init would turn {} into a long CONSTRUCTOR with an initializer for each element, instead of more sensibly generating a loop. The standard doesn't specify this implementation, but it does allow for it by specifying that a temporary created "when a default constructor is called to initialize an element of an array with no corresponding initializer" is destroyed "before the construction of the next array element, if any." rather than living until the end of the complete object initialization as usual. This change is also needed before the PR94041 fix extends the lifetime of temporaries from elements with explicit initializers. To implement this, I change digest_init so that in cases where initialization of trailing array elements isn't constant, we return a VEC_INIT_EXPR instead of a bare CONSTRUCTOR; when it is encountered later, we call build_vec_init to generate the actual initialization code. PR c++/92385 gcc/cp/ChangeLog: * typeck2.c (PICFLAG_VEC_INIT): New. (process_init_constructor_array): Set it. (process_init_constructor): Handle it. (split_nonconstant_init_1): Handle VEC_INIT_EXPR. * init.c (build_vec_init): Likewise. * cp-gimplify.c (cp_gimplify_expr): Factor out... * tree.c (expand_vec_init_expr): ...this function. (build_vec_init_elt): Handle BRACE_ENCLOSED_INITIALIZER_P. (build_vec_init_expr): Likewise. * constexpr.c (cxx_eval_vec_init): Likewise. (reduced_constant_expression_p): Check arrays before C++20. * cp-tree.h (expand_vec_init_expr): Declare. gcc/testsuite/ChangeLog: * g++.dg/init/array61.C: New test. --- gcc/cp/cp-tree.h| 1 + gcc/cp/constexpr.c | 35 --- gcc/cp/cp-gimplify.c| 13 ++--- gcc/cp/init.c | 11 +++- gcc/cp/tree.c | 44 - gcc/cp/typeck2.c| 23 +-- gcc/testsuite/g++.dg/init/array61.C | 16 +++ 7 files changed, 112 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/g++.dg/init/array61.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 10ca8098a85..0a3697f2f98 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -7779,6 +7779,7 @@ extern bool array_of_runtime_bound_p (tree); extern bool vla_type_p (tree); extern tree build_array_copy (tree); extern tree build_vec_init_expr(tree, tree, tsubst_flags_t); +extern tree expand_vec_init_expr (tree, tree, tsubst_flags_t); extern void diagnose_non_constexpr_vec_init(tree); extern tree hash_tree_cons (tree, tree, tree); extern tree hash_tree_chain(tree, tree); diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 72be45c9e87..4a4b347c31d 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -3020,8 +3020,7 @@ reduced_constant_expression_p (tree t) if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE) /* An initialized vector would have a VECTOR_CST. */ return false; - else if (cxx_dialect >= cxx20 - && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE) + else if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE) { /* There must be a valid constant initializer at every array index. */ @@ -4955,8 +4954,36 @@ cxx_eval_vec_init (const constexpr_ctx *ctx, tree t, { tree atype = TREE_TYPE (t); tree init = VEC_INIT_EXPR_INIT (t); - tree r = cxx_eval_vec_init_1 (ctx, atype, init, - VEC_INIT_EXPR_VALUE_INIT (t), + bool value_init = VEC_INIT_EXPR_VALUE_INIT (t); + if (!init || !BRACE_ENCLOSED_INITIALIZER_P (init)) +; + else if (CONSTRUCTOR_NELTS (init) == 0) +{ + /* Handle {} as value-init. */ + init = NULL_TREE; + value_init = true; +} + else +{ + /* This is a more complicated case, like needing to loop over trailing +elements; call build_vec_init and evaluate the result. */ + tsubst_flags_t complain = ctx->quiet ? tf_none : tf_warning_or_error; + constexpr_ctx new_ctx = *ctx; + if (!ctx->object) + { + /* We want to have an initialization target for an VEC_INIT_EXPR. +If we don't already have one in CTX, use the VEC_INIT_EXPR_SLOT. */ + new_ctx.object = VEC_INIT_EXPR_SLOT (t); + tree ctor = new_ctx.ctor = build_constructor (atype, NULL); + CONSTRUCTOR_NO_CLEARING (ctor) = true; + ctx->global->values.put (new_ctx.object, ctor); + ctx = &new_ctx; + } + init = expand_vec_init_expr (ctx->object, t, complain); + return cxx_eval_constant_expression (ctx, init, lval, non_constant_p, +
[pushed 03/11] c++: temporary lifetime with aggregate init [PR94041]
In C++98 the lifetime of temporaries in aggregate initialization was unclear, but C++11 DR201 clarified that only temporaries created for default-initialization of an array element with no corresponding initializer-clause are destroyed immediately; all others persist until the end of the full-expression. But we never implemented that, and continued treating individual element initializations as full-expressions, such as in my patch for PR50866 in r180442. This blocked my attempted fix for PR66139, which extended the use of split_nonconstant_init, and thus the bug, to aggregate initialization of temporaries within an expression. The longer temporary lifetime creates further EH region overlap problems like the ones that wrap_temporary_cleanups addresses: in aggr7.C, we start out with a normal nesting of A1 c.b1 A2 c.b2 ... ~A2 ~A1 where on the way in, throwing from one of the inits will clean up from the previous inits. But once c.b2 is initialized, throwing from ~A2 must not clean up c.b1; instead it needs to clean up c. So as in build_new_1, we deal with this by guarding the B cleanups with flags that are cleared once c is fully constructed; throwing from one of the ~A still hits that region, but does not call ~B. And then wrap_temporary_cleanups deals with calling ~C, but we need to tell it not to wrap the subobject cleanups. The change from expressing the subobject cleanups with CLEANUP_STMT to TARGET_EXPR was also necessary because we need them to collate with the ~A in gimplify_cleanup_point_expr; the CLEANUP_STMT representation only worked with treating subobject initializations as full-expressions. PR c++/94041 gcc/cp/ChangeLog: * decl.c (check_initializer): Remove obsolete comment. (wrap_cleanups_r): Don't wrap CLEANUP_EH_ONLY. (initialize_local_var): Change assert to test. * typeck2.c (maybe_push_temp_cleanup): New. (split_nonconstant_init_1): Use it. (split_nonconstant_init): Clear cleanup flags. gcc/testsuite/ChangeLog: * g++.dg/init/aggr7-eh.C: New test. * g++.dg/cpp0x/initlist122.C: Also test aggregate variable. --- gcc/cp/decl.c| 12 ++--- gcc/cp/typeck2.c | 55 + gcc/testsuite/g++.dg/cpp0x/initlist122.C | 12 - gcc/testsuite/g++.dg/init/aggr7-eh.C | 62 4 files changed, 121 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/g++.dg/init/aggr7-eh.C diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 0b71c00f5ab..9f759ceb1c8 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -7257,11 +7257,6 @@ check_initializer (tree decl, tree init, int flags, vec **cleanups) if (init && TREE_CODE (init) != TREE_VEC) { - /* In aggregate initialization of a variable, each element -initialization is a full-expression because there is no -enclosing expression. */ - gcc_assert (stmts_are_full_exprs_p ()); - init_code = store_init_value (decl, init, cleanups, flags); if (DECL_INITIAL (decl) @@ -7428,7 +7423,8 @@ wrap_cleanups_r (tree *stmt_p, int *walk_subtrees, void *data) tree guard = (tree)data; tree tcleanup = TARGET_EXPR_CLEANUP (*stmt_p); - if (tcleanup && !expr_noexcept_p (tcleanup, tf_none)) + if (tcleanup && !CLEANUP_EH_ONLY (*stmt_p) + && !expr_noexcept_p (tcleanup, tf_none)) { tcleanup = build2 (TRY_CATCH_EXPR, void_type_node, tcleanup, guard); /* Tell honor_protect_cleanup_actions to handle this as a separate @@ -7500,11 +7496,11 @@ initialize_local_var (tree decl, tree init) { tree rinit = (TREE_CODE (init) == INIT_EXPR ? TREE_OPERAND (init, 1) : NULL_TREE); - if (rinit && !TREE_SIDE_EFFECTS (rinit)) + if (rinit && !TREE_SIDE_EFFECTS (rinit) + && TREE_OPERAND (init, 0) == decl) { /* Stick simple initializers in DECL_INITIAL so that -Wno-init-self works (c++/34772). */ - gcc_assert (TREE_OPERAND (init, 0) == decl); DECL_INITIAL (decl) = rinit; if (warn_init_self && TYPE_REF_P (type)) diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index 3d4d35e13c6..54b1d0da129 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -459,13 +459,34 @@ cxx_incomplete_type_error (location_t loc, const_tree value, const_tree type) } +/* We've just initialized subobject SUB; also insert a TARGET_EXPR with an + EH-only cleanup for SUB. Because of EH region nesting issues, we need to + make the cleanup conditional on a flag that we will clear once the object is + fully initialized, so push a new flag onto FLAGS. */ + +static void +maybe_push_temp_cleanup (tree sub, vec **flags) +{ + if (tree cleanup + = cxx_maybe_build_cleanup (sub, tf_warning_or_error)) +{ + tree tx = get_target_expr (boolean_true_node); + tree flag = TARGE