Re: [PATCH] testsuite: Disable colorization for ubsan test
ping

On Wed, May 20, 2020 at 3:01 PM Kito Cheng wrote:
>
>  - Run gcc testsuite with qemu will print out ascii color code for
>    ubsan related testcase, however several testcase didn't consider
>    that, so disable colorization prevent such problem and simplify the
>    process when adding testcase in future.
>
>  - Verified on native X86 and RISC-V qemu full system mode and user mode.
>
> ChangeLog:
>
> gcc/testsuite/
>
> Kito Cheng
>
>     * ubsan-dg.exp (orig_ubsan_options_saved): New
>     (orig_ubsan_options): Ditto.
>     (ubsan_init): Store UBSAN_OPTIONS and set UBSAN_OPTIONS.
>     (ubsan_finish): Restore UBSAN_OPTIONS.
> ---
>  gcc/testsuite/lib/ubsan-dg.exp | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/gcc/testsuite/lib/ubsan-dg.exp b/gcc/testsuite/lib/ubsan-dg.exp
> index 015601cd404..f4ab29e2add 100644
> --- a/gcc/testsuite/lib/ubsan-dg.exp
> +++ b/gcc/testsuite/lib/ubsan-dg.exp
> @@ -17,6 +17,9 @@
>  # Return 1 if compilation with -fsanitize=undefined is error-free for trivial
>  # code, 0 otherwise.
>
> +set orig_ubsan_options_saved 0
> +set orig_ubsan_options 0
> +
>  proc check_effective_target_fsanitize_undefined {} {
>      return [check_runtime fsanitize_undefined {
>          int main (void) { return 0; }
> @@ -74,6 +77,17 @@ proc ubsan_init { args } {
>      global TOOL_OPTIONS
>      global ubsan_saved_TEST_ALWAYS_FLAGS
>      global ubsan_saved_ALWAYS_CXXFLAGS
> +    global orig_ubsan_options_saved
> +    global orig_ubsan_options
> +
> +    if { $orig_ubsan_options_saved == 0 } {
> +        # Save the original environment.
> +        if [info exists env(UBSAN_OPTIONS)] {
> +            set orig_ubsan_options "$env(UBSAN_OPTIONS)"
> +            set orig_ubsan_options_saved 1
> +        }
> +    }
> +    setenv UBSAN_OPTIONS color=never
>
>      set link_flags ""
>      if ![is_remote host] {
> @@ -109,6 +123,14 @@ proc ubsan_finish { args } {
>      global ubsan_saved_ALWAYS_CXXFLAGS
>      global ubsan_saved_library_path
>      global ld_library_path
> +    global orig_ubsan_options_saved
> +    global orig_ubsan_options
> +
> +    if { $orig_ubsan_options_saved } {
> +        setenv UBSAN_OPTIONS "$orig_ubsan_options"
> +    } elseif [info exists env(UBSAN_OPTIONS)] {
> +        unsetenv UBSAN_OPTIONS
> +    }
>
>      if [info exists ubsan_saved_ALWAYS_CXXFLAGS ] {
>          set ALWAYS_CXXFLAGS $ubsan_saved_ALWAYS_CXXFLAGS
> --
> 2.26.2
>
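For readers less familiar with Tcl, the save/override/restore idea in the quoted patch can be sketched in C++ as a scope guard. This is only an illustration under stated assumptions: `ScopedEnv` and `env_or_empty` are hypothetical names that appear nowhere in GCC or DejaGnu, and the sketch relies on the POSIX `setenv`/`unsetenv` calls. Like the patch, it replaces the variable's value rather than appending to it.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical helper (not part of GCC or DejaGnu): overrides an
// environment variable for the lifetime of the object, then puts the
// previous state back, unsetting the variable if it was unset before.
class ScopedEnv {
public:
  ScopedEnv(std::string name, const std::string& value)
      : name_(std::move(name)), had_value_(false) {
    // Remember the original value, if there was one.
    if (const char* old = std::getenv(name_.c_str())) {
      had_value_ = true;
      saved_ = old;
    }
    ::setenv(name_.c_str(), value.c_str(), 1 /* overwrite */);
  }

  ~ScopedEnv() {
    if (had_value_)
      ::setenv(name_.c_str(), saved_.c_str(), 1);
    else
      ::unsetenv(name_.c_str());
  }

  ScopedEnv(const ScopedEnv&) = delete;
  ScopedEnv& operator=(const ScopedEnv&) = delete;

private:
  std::string name_;
  bool had_value_;
  std::string saved_;
};

// Returns the current value of a variable, or "" when it is unset.
std::string env_or_empty(const char* name) {
  const char* v = std::getenv(name);
  return v ? v : "";
}
```

A caller would construct `ScopedEnv guard("UBSAN_OPTIONS", "color=never");` around the test run, which roughly corresponds to the `ubsan_init`/`ubsan_finish` pair above.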
Re: [PATCH] testsuite: Disable colorization for ubsan test
On Mon, Jun 01, 2020 at 03:43:00PM +0800, Kito Cheng wrote:
> ping
>
> On Wed, May 20, 2020 at 3:01 PM Kito Cheng wrote:
> >
> >  - Run gcc testsuite with qemu will print out ascii color code for
> >    ubsan related testcase, however several testcase didn't consider
> >    that, so disable colorization prevent such problem and simplify the
> >    process when adding testcase in future.
> >
> >  - Verified on native X86 and RISC-V qemu full system mode and user mode.
> >
> > ChangeLog:
> >
> > gcc/testsuite/
> >
> > Kito Cheng
> >
> >     * ubsan-dg.exp (orig_ubsan_options_saved): New
> >     (orig_ubsan_options): Ditto.
> >     (ubsan_init): Store UBSAN_OPTIONS and set UBSAN_OPTIONS.
> >     (ubsan_finish): Restore UBSAN_OPTIONS.

Ok, thanks.

Jakub
[PATCH] coroutines: Wrap co_await in a target expr where needed [PR95050]
Hi, Since the co_await expression is mostly opaque to the existing machinery, we were hiding the details of the await_resume return value. If that needs to be wrapped in a target expression, then emulate this with the whole co_await. Similarly, if the await expression we build in response to co_await p.yield_value (e) is wrapped in a target expression, then we need to transfer that wrapper to the resultant CO_YIELD_EXPR (which is, itself, just a proxy for the underlying co_await). tested on x86_64,powerpc64-linux, x86_64-darwin OK for master? OK for 10.2? thanks Iain gcc/cp/ChangeLog: PR c++/95050 * coroutines.cc (build_co_await): Wrap the co_await expression in a TARGET_EXPR, where needed. (finish_co_yield_expr): Likewise. gcc/testsuite/ChangeLog: PR c++/95050 * g++.dg/coroutines/pr95050.C: New test. --- gcc/cp/coroutines.cc | 29 +- gcc/testsuite/g++.dg/coroutines/pr95050.C | 49 +++ 2 files changed, 76 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95050.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 8746927577a..cc685ca73b2 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -816,6 +816,12 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind) tree awaiter_calls = make_tree_vec (3); TREE_VEC_ELT (awaiter_calls, 0) = awrd_call; /* await_ready(). */ TREE_VEC_ELT (awaiter_calls, 1) = awsp_call; /* await_suspend(). */ + tree te = NULL_TREE; + if (TREE_CODE (awrs_call) == TARGET_EXPR) +{ + te = awrs_call; + awrs_call = TREE_OPERAND (awrs_call, 1); +} TREE_VEC_ELT (awaiter_calls, 2) = awrs_call; /* await_resume(). 
*/ tree await_expr = build5_loc (loc, CO_AWAIT_EXPR, @@ -823,7 +829,13 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind) a, e_proxy, o, awaiter_calls, build_int_cst (integer_type_node, (int) suspend_kind)); - return convert_from_reference (await_expr); + if (te) +{ + TREE_OPERAND (te, 1) = await_expr; + await_expr = te; +} + tree t = convert_from_reference (await_expr); + return t; } tree @@ -960,8 +972,21 @@ finish_co_yield_expr (location_t kw, tree expr) tree op = build_co_await (kw, yield_call, CO_YIELD_SUSPEND_POINT); if (op != error_mark_node) { - op = build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (op), expr, op); + if (REFERENCE_REF_P (op)) + op = TREE_OPERAND (op, 0); + /* If the await expression is wrapped in a TARGET_EXPR, then transfer +that wrapper to the CO_YIELD_EXPR, since this is just a proxy for +its contained await. Otherwise, just build the CO_YIELD_EXPR. */ + if (TREE_CODE (op) == TARGET_EXPR) + { + tree t = TREE_OPERAND (op, 1); + t = build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (t), expr, t); + TREE_OPERAND (op, 1) = t; + } + else + op = build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (op), expr, op); TREE_SIDE_EFFECTS (op) = 1; + op = convert_from_reference (op); } return op; diff --git a/gcc/testsuite/g++.dg/coroutines/pr95050.C b/gcc/testsuite/g++.dg/coroutines/pr95050.C new file mode 100644 index 000..fd1516d32f0 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/pr95050.C @@ -0,0 +1,49 @@ +#if __has_include () +#include +using namespace std; +#elif defined (__clang__) && __has_include () +#include +using namespace std::experimental; +#endif +#include + +struct ret_type +{ + ret_type () = default; + ret_type (const ret_type&) = delete; + //ret_type (ret_type&&) = default; + ~ret_type() {} +}; + +struct task +{ + struct promise_type + { +auto get_return_object () -> task { return {}; } +auto initial_suspend () -> suspend_always { return {}; } +auto final_suspend () -> suspend_always { return {}; } +void return_void () {} +void 
unhandled_exception () { } +void thing (ret_type x) {} + }; +}; + +struct awaiter +{ + bool await_ready() const { return true; } + void await_suspend (coroutine_handle<>) {} + ret_type await_resume() { return {}; } +}; + +task +my_coro () +{ + ret_type r2{co_await awaiter{}}; + //ret_type r3 (std::move(r2)); +} + +int main() +{ + auto x = my_coro (); + return 0; +} -- 2.24.1
[PATCH] coroutines: Correct handling of references in parm copies [PR95350].
(resending, this didn’t appear to make it to the list)

Hi,

I had implemented a move out of rvalue refs for such ramp values (since these are most likely to be dangling references). However this does cause a divergence with the clang implementation - and the patch fixes that.

tested on x86_64,powerpc64-linux, x86_64-darwin
OK for master?
OK for 10.2?
thanks
Iain

---

Adjust to handle rvalue refs the same way as clang, and to correct the handling of moves when a copy CTOR is present. This is one area where we could make things easier for the end-user (as was implemented before this change), however there needs to be agreement about when the full statement containing a coroutine call ends (i.e. when the ramp terminates or when the coroutine terminates).

gcc/cp/ChangeLog:

	PR c++/95350
	* coroutines.cc (struct param_info): Remove rv_ref field.
	(build_actor_fn): Remove special rvalue ref handling.
	(morph_fn_to_coro): Likewise.

gcc/testsuite/ChangeLog:

	PR c++/95350
	* g++.dg/coroutines/torture/func-params-08.C: Adjust test to
	reflect that all rvalue refs are dangling.
	* g++.dg/coroutines/torture/func-params-09-awaitable-parms.C: Likewise.
	* g++.dg/coroutines/pr95350.C: New test.
---
 gcc/cp/coroutines.cc | 41 +--
 gcc/testsuite/g++.dg/coroutines/pr95350.C | 28 +
 .../coroutines/torture/func-params-08.C | 11 ++---
 .../torture/func-params-09-awaitable-parms.C | 11 ++---
 4 files changed, 50 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95350.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 969f4a66f2f..8746927577a 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1807,7 +1807,6 @@ struct param_info
   tree frame_type; /* The type used to represent this parm in the frame. */
   tree orig_type;  /* The original type of the parm (not as passed). */
   bool by_ref;     /* Was passed by reference. */
-  bool rv_ref;     /* Was an rvalue reference. */
   bool pt_ref;     /* Was a pointer to object.
*/ bool trivial_dtor; /* The frame type has a trivial DTOR. */ bool this_ptr; /* Is 'this' */ @@ -2077,12 +2076,6 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody, if (parm.pt_ref) fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx); - /* We expect an rvalue ref. here. */ - if (parm.rv_ref) - fld_idx = convert_to_reference (DECL_ARG_TYPE (arg), fld_idx, - CONV_STATIC, LOOKUP_NORMAL, - NULL_TREE, tf_warning_or_error); - int i; tree *puse; FOR_EACH_VEC_ELT (*parm.body_uses, i, puse) @@ -3770,15 +3763,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) if (actual_type == NULL_TREE) actual_type = error_mark_node; parm.orig_type = actual_type; - parm.by_ref = parm.rv_ref = parm.pt_ref = false; - if (TREE_CODE (actual_type) == REFERENCE_TYPE - && TYPE_REF_IS_RVALUE (DECL_ARG_TYPE (arg))) - { - parm.rv_ref = true; - actual_type = TREE_TYPE (actual_type); - parm.frame_type = actual_type; - } - else if (TREE_CODE (actual_type) == REFERENCE_TYPE) + parm.by_ref = parm.pt_ref = false; + if (TREE_CODE (actual_type) == REFERENCE_TYPE) { /* If the user passes by reference, then we will save the pointer to the original. As noted in @@ -3786,16 +3772,12 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) referenced item ends and then the coroutine is resumed, we have UB; well, the user asked for it. 
*/ actual_type = build_pointer_type (TREE_TYPE (actual_type)); - parm.frame_type = actual_type; parm.pt_ref = true; } else if (TYPE_REF_P (DECL_ARG_TYPE (arg))) - { - parm.by_ref = true; - parm.frame_type = actual_type; - } - else - parm.frame_type = actual_type; + parm.by_ref = true; + + parm.frame_type = actual_type; parm.this_ptr = is_this_parameter (arg); if (lambda_p) @@ -4170,17 +4152,16 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) } else if (parm.by_ref) vec_safe_push (promise_args, fld_idx); - else if (parm.rv_ref) - vec_safe_push (promise_args, rvalue (fld_idx)); else vec_safe_push (promise_args, arg); if (TYPE_NEEDS_CONSTRUCTING (parm.frame_type)) { vec *p_in; - if (parm.by_ref - && classtype_has_non_deleted_move_ctor (parm.frame_typ
[PATCH] coroutines: Allow parameter packs in co_await/yield expressions [PR95345]
Hi This corrects a pasto, where I copied the constraint on bare parameter packs from the co_return to co_yield/await without properly reviewing it. tested on x86_64,powerpc64-linux, x86_64-darwin OK for master? OK for 10.2? thanks Iain gcc/cp/ChangeLog: PR c++/95345 * coroutines.cc (finish_co_await_expr): Revise to allow for parameter packs. (finish_co_yield_expr): Likewise. gcc/testsuite/ChangeLog: PR c++/95345 * g++.dg/coroutines/pr95345.C: New test. --- gcc/cp/coroutines.cc | 45 +++ gcc/testsuite/g++.dg/coroutines/pr95345.C | 32 2 files changed, 53 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95345.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index cc685ca73b2..7afa550037c 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -851,19 +851,18 @@ finish_co_await_expr (location_t kw, tree expr) /* The current function has now become a coroutine, if it wasn't already. */ DECL_COROUTINE_P (current_function_decl) = 1; - if (processing_template_decl) -{ - current_function_returns_value = 1; - - if (check_for_bare_parameter_packs (expr)) - return error_mark_node; + /* This function will appear to have no return statement, even if it + is declared to return non-void (most likely). This is correct - we + synthesize the return for the ramp in the compiler. So suppress any + extraneous warnings during substitution. */ + TREE_NO_WARNING (current_function_decl) = true; - /* If we don't know the promise type, we can't proceed. */ - tree functype = TREE_TYPE (current_function_decl); - if (dependent_type_p (functype) || type_dependent_expression_p (expr)) - return build5_loc (kw, CO_AWAIT_EXPR, unknown_type_node, expr, - NULL_TREE, NULL_TREE, NULL_TREE, integer_zero_node); -} + /* If we don't know the promise type, we can't proceed, build the + co_await with the expression unchanged. 
*/ + tree functype = TREE_TYPE (current_function_decl); + if (dependent_type_p (functype) || type_dependent_expression_p (expr)) +return build5_loc (kw, CO_AWAIT_EXPR, unknown_type_node, expr, + NULL_TREE, NULL_TREE, NULL_TREE, integer_zero_node); /* We must be able to look up the "await_transform" method in the scope of the promise type, and obtain its return type. */ @@ -928,19 +927,17 @@ finish_co_yield_expr (location_t kw, tree expr) /* The current function has now become a coroutine, if it wasn't already. */ DECL_COROUTINE_P (current_function_decl) = 1; - if (processing_template_decl) -{ - current_function_returns_value = 1; - - if (check_for_bare_parameter_packs (expr)) - return error_mark_node; + /* This function will appear to have no return statement, even if it + is declared to return non-void (most likely). This is correct - we + synthesize the return for the ramp in the compiler. So suppress any + extraneous warnings during substitution. */ + TREE_NO_WARNING (current_function_decl) = true; - tree functype = TREE_TYPE (current_function_decl); - /* If we don't know the promise type, we can't proceed. */ - if (dependent_type_p (functype) || type_dependent_expression_p (expr)) - return build2_loc (kw, CO_YIELD_EXPR, unknown_type_node, expr, - NULL_TREE); -} + /* If we don't know the promise type, we can't proceed, build the + co_await with the expression unchanged. 
*/ + tree functype = TREE_TYPE (current_function_decl); + if (dependent_type_p (functype) || type_dependent_expression_p (expr)) +return build2_loc (kw, CO_YIELD_EXPR, unknown_type_node, expr, NULL_TREE); if (!coro_promise_type_found_p (current_function_decl, kw)) /* We must be able to look up the "yield_value" method in the scope of diff --git a/gcc/testsuite/g++.dg/coroutines/pr95345.C b/gcc/testsuite/g++.dg/coroutines/pr95345.C new file mode 100644 index 000..90e946d91c2 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/pr95345.C @@ -0,0 +1,32 @@ +#if __has_include () +#include +using namespace std; +#elif defined (__clang__) && __has_include () +#include +using namespace std::experimental; +#endif + +struct dummy_coro +{ + using promise_type = dummy_coro; + bool await_ready() { return false; } + void await_suspend(std::coroutine_handle<>) { } + void await_resume() { } + dummy_coro get_return_object() { return {}; } + dummy_coro initial_suspend() { return {}; } + dummy_coro final_suspend() { return {}; } + void return_void() { } + void unhandled_exception() { } +}; + +template +dummy_coro +foo() +{ + ((co_await [](int){ return std::suspend_never{}; }(I)), ...); + co_return; +} + +void bar() { + foo<1>(); +} -- 2.24.1
[PATCH] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)
This patch is meant to add match rules to simplify patterns as: o. (pointer + offset_a) - (pointer + offset_b) -> (ptrdiff_t) (offset_a - offset_b) o. (pointer_a + offset) - (pointer_b + offset) -> (pointer_a - pointer_b) Bootstrapped/regtested on x86_64-linux and aarch64-linux. Feng --- 2020-06-01 Feng Xue gcc/ PR tree-optimization/94234 * match.pd ((PTR + A) - (PTR + B)) -> (ptrdiff_t)(A - B): New simplification. * ((PTR_A + O) - (PTR_B + O)) -> (PTR_A - PTR_B): New simplification. gcc/testsuite/ PR tree-optimization/94234 * gcc.dg/pr94234.c: New test. --- gcc/match.pd | 19 +-- gcc/testsuite/gcc.dg/pr94234.c | 24 2 files changed, 33 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr94234.c diff --git a/gcc/match.pd b/gcc/match.pd index 33ee1a920bf..6553be4822e 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2515,16 +2515,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && TREE_CODE (@2) == INTEGER_CST && tree_int_cst_sign_bit (@2) == 0)) (minus (convert @1) (convert @2) - (simplify -(pointer_diff (pointer_plus @@0 @1) (pointer_plus @0 @2)) -/* The second argument of pointer_plus must be interpreted as signed, and - thus sign-extended if necessary. */ -(with { tree stype = signed_type_for (TREE_TYPE (@1)); } - /* Use view_convert instead of convert here, as POINTER_PLUS_EXPR - second arg is unsigned even when we need to consider it as signed, - we don't want to diagnose overflow here. */ - (minus (convert (view_convert:stype @1)) - (convert (view_convert:stype @2))) + (simplify + (pointer_diff (pointer_plus@3 @0 @1) (pointer_plus @0 @2)) +(if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))) + (convert (minus @1 @2 + (simplify + (pointer_diff (pointer_plus@3 @0 @2) (pointer_plus @1 @2)) +(if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3)) +&& !integer_zerop (@2)) + (pointer_diff @0 @1) /* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1). Modeled after fold_plusminus_mult_expr. 
*/ diff --git a/gcc/testsuite/gcc.dg/pr94234.c b/gcc/testsuite/gcc.dg/pr94234.c new file mode 100644 index 000..ef9076c80da --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr94234.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ccp1" } */ + +typedef __SIZE_TYPE__ size_t; +typedef __PTRDIFF_TYPE__ ptrdiff_t; + +ptrdiff_t foo (char *a, size_t n) +{ + char *b1 = a + 8 * n; + char *b2 = a + 8 * (n - 1); + + return b1 - b2; +} + +ptrdiff_t goo (char *a, size_t n, size_t m) +{ + char *b1 = a + 8 * n; + char *b2 = a + 8 * (n + 1); + + return (b1 + m) - (b2 + m); +} + +/* { dg-final { scan-tree-dump-times "return 8;" 1 "ccp1" } } */ +/* { dg-final { scan-tree-dump-times "return -8;" 1 "ccp1" } } */
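As a source-level illustration of the two identities the new match.pd rules rely on, here is a small C++ sketch. It says nothing about how the rules are applied inside the compiler; `same_base` and `same_offset` are hypothetical names chosen for this example, and all pointers are assumed to stay within one array so the arithmetic is well defined.

```cpp
#include <cassert>
#include <cstddef>

// (p + a) - (p + b): the common base cancels, leaving (ptrdiff_t)(a - b).
std::ptrdiff_t same_base(const char* p, std::size_t a, std::size_t b) {
  return (p + a) - (p + b);
}

// (pa + o) - (pb + o): the common offset cancels, leaving pa - pb.
std::ptrdiff_t same_offset(const char* pa, const char* pb, std::size_t o) {
  return (pa + o) - (pb + o);
}
```

With offsets of 16 and 8 from one base, `same_base` yields 8, matching the `return 8;` the test expects for `foo`. Swapping the operands, or using bases 8 bytes apart in `same_offset`, gives -8 as in `goo`.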
[patch] Make memory copy functions scalar storage order barriers
Hi,

this addresses the issue raised by Andrew a few weeks ago about the usage of memory copy functions to toggle the scalar storage order. Recall that you cannot (the compiler errors out) take the address of a scalar which is stored in reverse order, but you can do it for the enclosing aggregate type, which means that you can also pass it to the memory copy functions. In this case, the optimizer may rewrite the copy into a scalar copy, which is a no-no.

The patch also contains an unrelated hunk for the tree pretty printer.

Tested on x86-64/Linux, OK for the mainline?

2020-06-01  Eric Botcazou

	* gimple-fold.c (gimple_fold_builtin_memory_op): Do not replace with a
	scalar copy if either type has reverse scalar storage order.
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Do not propagate through a
	memory copy if either type has reverse scalar storage order.
	* tree-pretty-print.c (dump_generic_node) : Print quals.

2020-06-01  Eric Botcazou

	* gcc.c-torture/execute/sso-1.c: New test.

--
Eric Botcazou

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..64a9221f8cf 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -741,7 +741,8 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
     }
   else
     {
-      tree srctype, desttype;
+      tree srctype = TREE_TYPE (TREE_TYPE (src));
+      tree desttype = TREE_TYPE (TREE_TYPE (dest));
       unsigned int src_align, dest_align;
       tree off0;
       const char *tmp_str;
@@ -767,7 +768,11 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
          hack can be removed.
*/ && !c_strlen (src, 1) && !((tmp_str = c_getstr (src, &tmp_len)) != NULL - && memchr (tmp_str, 0, tmp_len) == NULL)) + && memchr (tmp_str, 0, tmp_len) == NULL) + && !(AGGREGATE_TYPE_P (srctype) + && TYPE_REVERSE_STORAGE_ORDER (srctype)) + && !(AGGREGATE_TYPE_P (desttype) + && TYPE_REVERSE_STORAGE_ORDER (desttype))) { unsigned ilen = tree_to_uhwi (len); if (pow2p_hwi (ilen)) @@ -957,10 +962,15 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, but that only gains us that the destination and source possibly no longer will have their address taken. */ srctype = TREE_TYPE (TREE_TYPE (src)); + desttype = TREE_TYPE (TREE_TYPE (dest)); + if ((AGGREGATE_TYPE_P (srctype) + && TYPE_REVERSE_STORAGE_ORDER (srctype)) + || (AGGREGATE_TYPE_P (desttype) + && TYPE_REVERSE_STORAGE_ORDER (desttype))) + return false; if (TREE_CODE (srctype) == ARRAY_TYPE && !tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len)) srctype = TREE_TYPE (srctype); - desttype = TREE_TYPE (TREE_TYPE (dest)); if (TREE_CODE (desttype) == ARRAY_TYPE && !tree_int_cst_equal (TYPE_SIZE_UNIT (desttype), len)) desttype = TREE_TYPE (desttype); diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index f04fd65091a..7d581214022 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -1899,8 +1899,16 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, case ARRAY_TYPE: { + unsigned int quals = TYPE_QUALS (node); tree tmp; + if (quals & TYPE_QUAL_ATOMIC) + pp_string (pp, "atomic "); + if (quals & TYPE_QUAL_CONST) + pp_string (pp, "const "); + if (quals & TYPE_QUAL_VOLATILE) + pp_string (pp, "volatile "); + /* Print the innermost component type. 
*/ for (tmp = TREE_TYPE (node); TREE_CODE (tmp) == ARRAY_TYPE; tmp = TREE_TYPE (tmp)) diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index 4b3f31c12cb..17867b65ecb 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -3275,6 +3275,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, } if (TREE_CODE (lhs) == ADDR_EXPR) { + if (AGGREGATE_TYPE_P (TREE_TYPE (TREE_TYPE (lhs))) + && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_TYPE (lhs + return (void *)-1; tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (lhs, 0), &lhs_offset); if (!tem) @@ -3303,6 +3306,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, rhs = vn_valueize (rhs); if (TREE_CODE (rhs) == ADDR_EXPR) { + if (AGGREGATE_TYPE_P (TREE_TYPE (TREE_TYPE (rhs))) + && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_TYPE (rhs + return (void *)-1; tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (rhs, 0), &rhs_offset); if (!tem) typedef unsigned char uint8_t; typedef unsigned int uint32_t; #define __big_endian__ scalar_storage_order("big-endian") #define __little_endian__ scalar_storage_order("little-endian") typedef union { uint32_t val; uint8_t v[4]; } __attribute__((__big_endian__)) upal_u32be_t; typedef union { uint32_t val; uint8_t v[4]; } __attribute__((__little_endian__)) upal_u32le_t; static inline uint32_t native_to_big_endian(uint32_t t) { #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ re
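To make the underlying concern concrete, here is a portable C++ sketch of why a memory copy between two aggregates with opposite storage orders has to remain a byte copy. It deliberately does not use the `scalar_storage_order` attribute; `BE32`, `LE32`, `store_be`, `load_le` and `swap_via_memcpy` are hypothetical stand-ins that model the two layouts by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hand-written models of a 32-bit scalar stored in each byte order.
struct BE32 { unsigned char b[4]; };  // most significant byte first
struct LE32 { unsigned char b[4]; };  // least significant byte first

// Store a value in big-endian layout.
BE32 store_be(std::uint32_t v) {
  BE32 r;
  r.b[0] = static_cast<unsigned char>(v >> 24);
  r.b[1] = static_cast<unsigned char>(v >> 16);
  r.b[2] = static_cast<unsigned char>(v >> 8);
  r.b[3] = static_cast<unsigned char>(v);
  return r;
}

// Read a value back from little-endian layout.
std::uint32_t load_le(const LE32& x) {
  return static_cast<std::uint32_t>(x.b[0])
         | (static_cast<std::uint32_t>(x.b[1]) << 8)
         | (static_cast<std::uint32_t>(x.b[2]) << 16)
         | (static_cast<std::uint32_t>(x.b[3]) << 24);
}

// memcpy moves bytes, not values: the value read back is byte-swapped.
// Folding this copy into a scalar assignment would lose the swap.
std::uint32_t swap_via_memcpy(std::uint32_t v) {
  BE32 src = store_be(v);
  LE32 dst;
  std::memcpy(&dst, &src, sizeof dst);
  return load_le(dst);
}
```

In this model the copy acts as a byte swap, which appears to be the "toggle" usage the message refers to, and a value-preserving scalar copy would change the program's behaviour.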
[PATCH] coroutines: Fix missed ramp function return copy elision [PR95346].
Hi Confusingly, "get_return_object ()" can do two things: - Firstly it can provide the return object for the ramp function (as the name suggests). - Secondly if the type of the ramp function is different from that of the get_return_object call, this is used as a single parameter to a CTOR for the ramp's return type. In the first case we can rely on finish_return_stmt () to do the necessary processing for copy elision. In the second case, we should have passed a prvalue to the CTOR as per the standard comment, but I had omitted the rvalue () call. Fixed thus. tested on x86_64-darwin, x86_64-linux, powerpc64-linux OK for master? OK for 10.2? thanks Iain gcc/cp/ChangeLog: PR c++/95346 * coroutines.cc (morph_fn_to_coro): Ensure that the get- return-object is constructed correctly; When it is not the final return value, pass it to the CTOR of the return type as an rvalue, per the standard comment. gcc/testsuite/ChangeLog: PR c++/95346 * g++.dg/coroutines/pr95346.C: New test. --- gcc/cp/coroutines.cc | 70 +++ gcc/testsuite/g++.dg/coroutines/pr95346.C | 26 + 2 files changed, 71 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95346.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 7afa550037c..d1c2b437ade 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -4279,7 +4279,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) } tree gro_context_body = push_stmt_list (); - bool gro_is_void_p = VOID_TYPE_P (TREE_TYPE (get_ro)); + tree gro_type = TREE_TYPE (get_ro); + bool gro_is_void_p = VOID_TYPE_P (gro_type); tree gro = NULL_TREE; tree gro_bind_vars = NULL_TREE; @@ -4289,17 +4290,23 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) finish_expr_stmt (get_ro); else { - gro = build_lang_decl (VAR_DECL, get_identifier ("coro.gro"), - TREE_TYPE (get_ro)); + gro = build_lang_decl (VAR_DECL, get_identifier ("coro.gro"), gro_type); DECL_CONTEXT (gro) = current_scope (); DECL_ARTIFICIAL (gro) = 
true; DECL_IGNORED_P (gro) = true; add_decl_expr (gro); gro_bind_vars = gro; - - r = build2_loc (fn_start, INIT_EXPR, TREE_TYPE (gro), gro, get_ro); - r = coro_build_cvt_void_expr_stmt (r, fn_start); - add_stmt (r); + if (TYPE_NEEDS_CONSTRUCTING (gro_type)) + { + vec *arg = make_tree_vector_single (get_ro); + r = build_special_member_call (gro, complete_ctor_identifier, +&arg, gro_type, LOOKUP_NORMAL, +tf_warning_or_error); + release_tree_vector (arg); + } + else + r = build2_loc (fn_start, INIT_EXPR, gro_type, gro, get_ro); + finish_expr_stmt (r); } /* Initialize the resume_idx_name to 0, meaning "not started". */ @@ -4333,28 +4340,41 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) /* Switch to using 'input_location' as the loc, since we're now more logically doing things related to the end of the function. */ - /* The ramp is done, we just need the return value. */ - if (!same_type_p (TREE_TYPE (get_ro), fn_return_type)) + /* The ramp is done, we just need the return value. + [dcl.fct.def.coroutine] / 7 + The expression promise.get_return_object() is used to initialize the + glvalue result or prvalue result object of a call to a coroutine. + + If the 'get return object' is non-void, then we built it before the + promise was constructed. We now supply a reference to that var, + either as the return value (if it's the same type) or to the CTOR + for an object of the return type. */ + if (gro_is_void_p) +r = NULL_TREE; + else +r = rvalue (gro); + + if (!same_type_p (gro_type, fn_return_type)) { - /* construct the return value with a single GRO param, if it's not -void. */ - vec *args = NULL; - vec **arglist = NULL; - if (!gro_is_void_p) + /* The return object is , even if the gro is void. 
*/ + if (CLASS_TYPE_P (fn_return_type)) { - args = make_tree_vector_single (gro); - arglist = &args; + vec *args = NULL; + vec **arglist = NULL; + if (!gro_is_void_p) + { + args = make_tree_vector_single (r); + arglist = &args; + } + r = build_special_member_call (NULL_TREE, +complete_ctor_identifier, arglist, +fn_return_type, LOOKUP_NORMAL, +tf_warning_or_error); + r = build_cplus_new (fn_return_type, r, tf_warning_or_error); } - r = build_special_member_call (NULL_TREE, -complete_ctor_i
Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length
"Yangfei (Felix)" writes: > Hi, > >> -Original Message- >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> Sent: Sunday, May 31, 2020 12:01 AM >> To: Yangfei (Felix) >> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Jakub >> Jelinek ; Hongtao Liu ; H.J. Lu >> >> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with >> fixed sve vector length >> > > Snip... > >> > >> > The v5 patch attached addressed this issue. >> > >> > There two added changes compared with the v4 patch: >> > 1. In candidate_mem_p, mov_optab for innermode should be available. >> > In this case, mov_optab for SDmode is not there and subreg are added >> back by emit_move_insn_1. So we won't get the benefit with the patch. >> >> I agree we should have this check. I think the rule applies to all of the >> transforms though, not just the mem one, so we should add the check to the >> register and constant cases too. > > OK. I changed to make this an extra condition for calculating x_inner & y > _inner. Sounds good. Maybe at this point the x_inner and y_inner code is getting complicated enough to put into a lambda too: x_inner = ... (x); y_inner = ... (y); Just a suggestion though. >> > 2. Instead of using adjust_address, I changed to use adjust_address_nv to >> avoid the emit of invalid insn 13. >> > The latter call to validize_mem() in emit_move_insn will take care of >> > the >> address for us. >> >> The validation performed by validize_mem is the same as that performed by >> adjust_address, so the only case this should make a difference is for >> push_operands: > > True. > >> /* If X or Y are memory references, verify that their addresses are valid >> for the machine. */ >> if (MEM_P (x) >> && (! memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0), >> MEM_ADDR_SPACE (x)) >>&& ! push_operand (x, GET_MODE (x >> x = validize_mem (x); >> >> if (MEM_P (y) >> && ! 
memory_address_addr_space_p (GET_MODE (y), XEXP (y, 0),
>>                                        MEM_ADDR_SPACE (y)))
>>     y = validize_mem (y);
>>
>> So I think the fix is to punt on push_operands instead (and continue to use
>> adjust_address rather than adjust_address_nv).
>
> Not sure if I understand it correctly.
> Do you mean excluding push_operand in candidate_mem_p? Like:
>
>   auto candidate_mem_p = [&](machine_mode innermode, rtx mem) {
>     return !targetm.can_change_mode_class (innermode, GET_MODE (mem), ALL_REGS)
>            && !push_operand (mem, GET_MODE (mem))
>            /* Not a candidate if innermode requires too much alignment. */
>            && (MEM_ALIGN (mem) >= GET_MODE_ALIGNMENT (innermode)
>                || targetm.slow_unaligned_access (GET_MODE (mem),
>                                                  MEM_ALIGN (mem))
>                || !targetm.slow_unaligned_access (innermode, MEM_ALIGN (mem)));
>   };

Yeah, looks good. Formatting nit though: multi-line conditions should be wrapped in (...), i.e.:

  return (...
          && ...
          && ...);

Thanks,
Richard
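The formatting nit can be illustrated with a self-contained C++ sketch. The lambda below only mimics the shape of `candidate_mem_p`: `MemInfo`, its fields and `is_candidate` are hypothetical stand-ins, not the real RTL queries, and the condition is simplified.

```cpp
#include <cassert>

// Hypothetical stand-in for the properties the real predicate queries.
struct MemInfo {
  bool mode_change_ok;  // the mode change is acceptable
  bool is_push;         // the operand is a push operand
  unsigned align;       // known alignment in bits
};

bool is_candidate(const MemInfo& mem, unsigned inner_align) {
  // The multi-line condition is wrapped in one outer pair of parentheses,
  // so the continuation lines line up under the first operand.
  auto candidate_p = [&](const MemInfo& m) {
    return (m.mode_change_ok
            && !m.is_push
            && m.align >= inner_align);
  };
  return candidate_p(mem);
}
```

The outer parentheses do not change the meaning. They mainly help editors indent the continuation lines consistently.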
[PATCH] coroutines: co_returns are statements, not expressions.
Hi,

This corrects a typo in the CO_RETURN_EXPR tree class.

Although it doesn't fix any PR or regression, it seems to me that it would be
sensible to apply this to 10.2 as well as master (otherwise it's an accident
waiting to happen).

OK for master?
OK for 10.2 (after some bake)?

thanks
Iain

gcc/cp/ChangeLog:

	* cp-tree.def (CO_RETURN_EXPR): Correct the class to use
	tcc_statement.
---
 gcc/cp/cp-tree.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 1454802bf68..99851eb780f 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -594,9 +594,9 @@ DEFTREECODE (CO_YIELD_EXPR, "co_yield", tcc_expression, 2)
 
 /* The co_return expression is used to support coroutines.
    Op0 is the original expr, can be void (for use in diagnostics)
-   Op2 is the promise return_ call for Op0. */
+   Op1 is the promise return_ call for the expression given. */
 
-DEFTREECODE (CO_RETURN_EXPR, "co_return", tcc_expression, 2)
+DEFTREECODE (CO_RETURN_EXPR, "co_return", tcc_statement, 2)
 
 /* Local variables:
-- 
2.24.1
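To illustrate why the class argument matters, here is a small sketch of the X-macro pattern behind DEFTREECODE. It is a simplified imitation with only two codes and two classes, not GCC's real tables, and `statement_code_p` is a hypothetical helper. Generic code tends to ask about the class rather than the individual code, so a wrong class silently changes how a node is treated.

```cpp
// Simplified, hypothetical imitation of GCC's tree-code tables; the real
// definitions live in tree.def and cp-tree.def and have many more classes.
enum tree_code_class { tcc_statement, tcc_expression };

#define MINI_TREE_CODES \
  DEFTREECODE (CO_YIELD_EXPR, "co_yield", tcc_expression, 2) \
  DEFTREECODE (CO_RETURN_EXPR, "co_return", tcc_statement, 2)

// First expansion: the enumerators.
#define DEFTREECODE(SYM, NAME, CLASS, NOPS) SYM,
enum tree_code { MINI_TREE_CODES LAST_TREE_CODE };
#undef DEFTREECODE

// Second expansion: a parallel table of classes indexed by tree_code.
#define DEFTREECODE(SYM, NAME, CLASS, NOPS) CLASS,
static const tree_code_class code_class[] = { MINI_TREE_CODES };
#undef DEFTREECODE

// Generic code queries the class, not the individual code.
inline bool statement_code_p(tree_code code) {
  return code_class[code] == tcc_statement;
}
```

With the class corrected, `statement_code_p (CO_RETURN_EXPR)` holds while `CO_YIELD_EXPR` remains an expression.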
[PATCH] Fix some improper debug dump in clone materialization
Clone materialization might produce improper debug output such as the following.

Original:

  cloning foo/271 to foo.constprop/334
       replace map: 0 -> xxx1->yyy
  m_always_copy_start: 1
  IPA adjusted parameters: foo (...)
  {
    ...
  }

A better output could be:

  cloning foo/271 to foo.constprop/334
  replace map: 0 -> xxx, 1->yyy    /* separate 1 from xxx with a comma */
  m_always_copy_start: 1           /* align with replace map */
  IPA adjusted parameters:         /* if there is no adjusted parameter,
                                      start a new line or omit this line */
  foo (...)
  {
    ...
  }

Feng
---
2020-06-01  Feng Xue

gcc/
	* cgraphclones.c (materialize_all_clones): Adjust replace map dump.
	* ipa-param-manipulation.c (ipa_dump_adjusted_parameters): Do not
	dump information if there is no adjusted parameter.
	* (ipa_param_adjustments::dump): Adjust prefix spaces for dump string.
---
 gcc/cgraphclones.c           | 6 +++---
 gcc/ipa-param-manipulation.c | 5 ++++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index e4f1c1d4b5e..db61c218297 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -1160,15 +1160,15 @@ symbol_table::materialize_all_clones (void)
               if (node->clone.tree_map)
                 {
                   unsigned int i;
-                  fprintf (symtab->dump_file, " replace map: ");
+                  fprintf (symtab->dump_file, "replace map:");
                   for (i = 0;
                        i < vec_safe_length (node->clone.tree_map);
                        i++)
                     {
                       ipa_replace_map *replace_info;
                       replace_info = (*node->clone.tree_map)[i];
-                      fprintf (symtab->dump_file, "%i -> ",
-                               (*node->clone.tree_map)[i]->parm_num);
+                      fprintf (symtab->dump_file, "%s %i -> ",
+                               i ? "," : "", replace_info->parm_num);
                       print_generic_expr (symtab->dump_file,
                                           replace_info->new_tree);
                     }
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 978916057f0..2cc4bc79dc1 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -111,6 +111,9 @@ ipa_dump_adjusted_parameters (FILE *f,
   unsigned i, len = vec_safe_length (adj_params);
   bool first = true;
 
+  if (!len)
+    return;
+
   fprintf (f, "IPA adjusted parameters: ");
   for (i = 0; i < len; i++)
     {
@@ -899,7 +902,7 @@ ipa_param_adjustments::dump (FILE *f)
   fprintf (f, "m_always_copy_start: %i\n", m_always_copy_start);
   ipa_dump_adjusted_parameters (f, m_adj_params);
   if (m_skip_return)
-    fprintf (f, " Will SKIP return.\n");
+    fprintf (f, "Will SKIP return.\n");
 }
 
 /* Dump information contained in the object in textual form to stderr. */
--
Re: [PATCH 2/2] x86: Add cmpmemsi for -minline-all-stringops
On Sun, 31 May 2020, H.J. Lu via Gcc-patches wrote: > --- a/gcc/config/i386/i386-expand.c > +++ b/gcc/config/i386/i386-expand.c > @@ -7656,6 +7656,90 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx > count_exp, rtx val_exp, >return true; > } > > +/* Expand cmpstrn or memcmp. */ > + > +bool > +ix86_expand_cmpstrn_or_cmpmem (rtx result, rtx src1, rtx src2, > +rtx length, rtx align, bool is_cmpstrn) > +{ > + if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS) > +return false; > + > + /* Can't use this if the user has appropriated ecx, esi or edi. */ > + if (fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG]) > +return false; > + > + if (is_cmpstrn) > +{ > + /* For strncmp, length is the maximum length, which can be larger > + than actual string lengths. We can expand the cmpstrn pattern > + to "repz cmpsb" only if one of the strings is a constant so > + that expand_builtin_strncmp() can write the length argument to > + be the minimum of the const string length and the actual length > + argument. Otherwise, "repz cmpsb" may pass the 0 byte. */ > + tree t1 = MEM_EXPR (src1); > + tree t2 = MEM_EXPR (src2); > + if (!((t1 && TREE_CODE (t1) == MEM_REF > + && TREE_CODE (TREE_OPERAND (t1, 0)) == ADDR_EXPR > + && (TREE_CODE (TREE_OPERAND (TREE_OPERAND (t1, 0), 0)) > + == STRING_CST)) > + || (t2 && TREE_CODE (t2) == MEM_REF > + && TREE_CODE (TREE_OPERAND (t2, 0)) == ADDR_EXPR > + && (TREE_CODE (TREE_OPERAND (TREE_OPERAND (t2, 0), 0)) > + == STRING_CST > + return false; > +} > + else > +{ > + /* Expand memcmp to "repz cmpsb" only for -minline-all-stringops > + since "repz cmpsb" can be much slower than memcmp function > + implemented with vector instructions, see > + > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > + */ > + if (!TARGET_INLINE_ALL_STRINGOPS) > + return false; > +} This check seems to be misplaced, "rep cmps" is slower than either memcmp or strcmp. 
The test for TARGET_INLINE_ALL_STRINGOPS should happen regardless of is_cmpstrn, so it should go earlier in the function. Alexander
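The comment in the quoted patch about "repz cmpsb" passing the 0 byte can be shown with a small model. This is only a sketch: `bytes_examined_by_rep_cmps` and `clamped_length` are hypothetical helpers, and adding 1 for the terminating NUL is an assumption of the sketch rather than something taken from expand_builtin_strncmp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Models an inline "repz cmpsb": compare up to N bytes and stop only at the
// first difference.  Unlike strncmp it does not stop at a NUL byte.
// Returns the number of bytes examined.
inline std::size_t bytes_examined_by_rep_cmps(const char *a, const char *b,
                                              std::size_t n) {
  for (std::size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return i + 1;
  return n;
}

// When one operand is a string constant, the length handed to the inline
// expansion can be clamped so the scan never runs past the terminating NUL.
inline std::size_t clamped_length(const char *constant, std::size_t n) {
  return std::min(n, std::strlen(constant) + 1);
}
```

For two equal strings "abc" compared with a limit of 16, the unclamped scan examines all 16 bytes, whereas the clamped one stops after 4. This is why the strncmp expansion is restricted to the case where one string is constant.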
Re: [PATCH] Fix some improper debug dump in clone materialization
Hi Feng, On Mon, Jun 01 2020, Feng Xue OS wrote: > Clone materialization might produce some improper debug output as: > > Original-- > > cloning foo/271 to foo.constprop/334 >replace map: 0 -> xxx1->yyy > m_always_copy_start: 1 > IPA adjusted parameters: foo (...) > { > ... > } > > And a better output could be: > > cloning foo/271 to foo.constprop/334 > replace map: 0 -> xxx, 1->yyy /* separate 1 with xxx, */ > m_always_copy_start: 1 /* Align with replace map */ > IPA adjusted parameters:/* If no adjusted parameter, > start a new line or omit this line */ > foo (...) > { > ... > } > > Feng > --- > 2020-06-01 Feng Xue > > gcc/ > * cgraphclones.c (materialize_all_clones): Adjust replace map dump. > * ipa-param-manipulation.c (ipa_dump_adjusted_parameters): Do not > dump infomation if there is no adjusted parameter. > * (ipa_param_adjustments::dump): Adjust prefix spaces for dump string. This is OK, thank you. Martin
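For reference, the separator logic behind the improved "replace map" line can be sketched on its own. This is a hypothetical helper that builds a string instead of writing to the dump file, and it is not part of the patch. It emits "," before every entry except the first, which is what the `i ? "," : ""` idiom achieves.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: formats (parameter number, replacement) pairs in the
// style of the adjusted dump, with "," before every entry except the first.
inline std::string format_replace_map(
    const std::vector<std::pair<int, std::string>> &map) {
  std::string out = "replace map:";
  for (std::size_t i = 0; i < map.size(); i++) {
    out += (i ? "," : "");
    out += " " + std::to_string(map[i].first) + " -> " + map[i].second;
  }
  return out;
}
```

For the two-entry example in this thread, this yields `replace map: 0 -> xxx, 1 -> yyy`.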
[PATCH 1/2] Re-format zen memcpy/memset costs.
The patch improves readability of the memcpy and memset expansion strategies.

gcc/ChangeLog:

	* config/i386/x86-tune-costs.h: Change code formatting.
---
 gcc/config/i386/x86-tune-costs.h | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index c73917e5a62..1169178433f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1311,14 +1311,23 @@ const struct processor_costs bdver_cost = {
    very small blocks it is better to use loop.  For large blocks, libcall
    can do nontemporary accesses and beat inline considerably.  */
 static stringop_algs znver1_memcpy[2] = {
-  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
+  /* 32-bit tuning.  */
+  {libcall, {{6, loop, false},
+             {14, unrolled_loop, false},
              {-1, rep_prefix_4_byte, false}}},
-  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
+  /* 64-bit tuning.  */
+  {libcall, {{16, loop, false},
+             {8192, rep_prefix_8_byte, false},
              {-1, libcall, false}}}};
 static stringop_algs znver1_memset[2] = {
-  {libcall, {{8, loop, false}, {24, unrolled_loop, false},
-             {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
-  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
+  /* 32-bit tuning.  */
+  {libcall, {{8, loop, false},
+             {24, unrolled_loop, false},
+             {2048, rep_prefix_4_byte, false},
+             {-1, libcall, false}}},
+  /* 64-bit tuning.  */
+  {libcall, {{48, unrolled_loop, false},
+             {8192, rep_prefix_8_byte, false},
              {-1, libcall, false}}}};
 struct processor_costs znver1_cost = {
   {
@@ -1448,14 +1457,23 @@ struct processor_costs znver1_cost = {
    very small blocks it is better to use loop.  For large blocks, libcall
    can do nontemporary accesses and beat inline considerably.  */
 static stringop_algs znver2_memcpy[2] = {
-  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
+  /* 32-bit tuning.  */
+  {libcall, {{6, loop, false},
+             {14, unrolled_loop, false},
              {-1, rep_prefix_4_byte, false}}},
-  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
+  /* 64-bit tuning.  */
+  {libcall, {{16, loop, false},
+             {64, rep_prefix_4_byte, false},
              {-1, libcall, false}}}};
 static stringop_algs znver2_memset[2] = {
-  {libcall, {{8, loop, false}, {24, unrolled_loop, false},
-             {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
-  {libcall, {{24, rep_prefix_4_byte, false}, {128, rep_prefix_8_byte, false},
+  /* 32-bit tuning.  */
+  {libcall, {{8, loop, false},
+             {24, unrolled_loop, false},
+             {2048, rep_prefix_4_byte, false}
+             {-1, libcall, false}}},
+  /* 64-bit tuning.  */
+  {libcall, {{24, rep_prefix_4_byte, false},
+             {128, rep_prefix_8_byte, false},
              {-1, libcall, false}}}};
 struct processor_costs znver2_cost = {
-- 
2.26.2
[PATCH 2/2] Tune memcpy and memset for Zen cores.
Based on the collected numbers in PR95435, I suggest the following tuning changes:

gcc/ChangeLog:

	PR target/95435
	* config/i386/x86-tune-costs.h: Use libcall for large sizes for
	-m32.  Start using libcall from 128+ bytes.
---
 gcc/config/i386/x86-tune-costs.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 1169178433f..3207404e514 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1314,20 +1314,20 @@ static stringop_algs znver1_memcpy[2] = {
   /* 32-bit tuning.  */
   {libcall, {{6, loop, false},
              {14, unrolled_loop, false},
-             {-1, rep_prefix_4_byte, false}}},
+             {-1, libcall, false}}},
   /* 64-bit tuning.  */
   {libcall, {{16, loop, false},
-             {8192, rep_prefix_8_byte, false},
+             {128, rep_prefix_8_byte, false},
              {-1, libcall, false}}}};
 static stringop_algs znver1_memset[2] = {
   /* 32-bit tuning.  */
   {libcall, {{8, loop, false},
              {24, unrolled_loop, false},
-             {2048, rep_prefix_4_byte, false},
+             {128, rep_prefix_4_byte, false},
              {-1, libcall, false}}},
   /* 64-bit tuning.  */
   {libcall, {{48, unrolled_loop, false},
-             {8192, rep_prefix_8_byte, false},
+             {128, rep_prefix_8_byte, false},
              {-1, libcall, false}}}};
 struct processor_costs znver1_cost = {
   {
@@ -1460,7 +1460,7 @@ static stringop_algs znver2_memcpy[2] = {
   /* 32-bit tuning.  */
   {libcall, {{6, loop, false},
              {14, unrolled_loop, false},
-             {-1, rep_prefix_4_byte, false}}},
+             {-1, libcall, false}}},
   /* 64-bit tuning.  */
   {libcall, {{16, loop, false},
              {64, rep_prefix_4_byte, false},
@@ -1469,7 +1469,7 @@ static stringop_algs znver2_memset[2] = {
   /* 32-bit tuning.  */
   {libcall, {{8, loop, false},
              {24, unrolled_loop, false},
-             {2048, rep_prefix_4_byte, false}
+             {128, rep_prefix_4_byte, false},
              {-1, libcall, false}}},
   /* 64-bit tuning.  */
   {libcall, {{24, rep_prefix_4_byte, false},
-- 
2.26.2
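As a rough illustration of how such a strategy row reads, the sketch below walks the entries in order and picks the first one whose size limit covers the block. `Strategy` and `choose_alg` are hypothetical simplifications, and the first-match rule is my reading of the table layout. The actual selection in the i386 backend takes more inputs into account.

```cpp
#include <cstddef>

// Hypothetical, simplified mirror of the algorithm names used in the tables.
enum Alg { loop, unrolled_loop, rep_prefix_4_byte, rep_prefix_8_byte, libcall };

struct Strategy {
  long max;  // largest block size handled by this entry; -1 means "no limit"
  Alg alg;
};

// The 64-bit znver1 memcpy row as proposed in this patch.
static const Strategy znver1_memcpy_64[] = {
  {16, loop}, {128, rep_prefix_8_byte}, {-1, libcall}};

// Picks the first entry whose limit covers SIZE.
inline Alg choose_alg(const Strategy *table, std::size_t n_entries, long size) {
  for (std::size_t i = 0; i < n_entries; i++)
    if (table[i].max == -1 || size <= table[i].max)
      return table[i].alg;
  return libcall;  // fallback if the table has no catch-all entry
}
```

Under this reading, blocks of up to 16 bytes use the loop, blocks of up to 128 bytes use `rep_prefix_8_byte`, and anything larger goes to the library call.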
Re: [PATCH 2/2] Tune memcpy and memset for Zen cores.
Adding Honza as Uros recommended him for a review. Martin On 6/1/20 1:35 PM, Martin Liška wrote: Based on the collected numbers in PR95435, I suggest the following tuning changes: gcc/ChangeLog: PR target/95435 * config/i386/x86-tune-costs.h: Use libcall for large sizes for -m32. Start using libcall from 128+ bytes. --- gcc/config/i386/x86-tune-costs.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h index 1169178433f..3207404e514 100644 --- a/gcc/config/i386/x86-tune-costs.h +++ b/gcc/config/i386/x86-tune-costs.h @@ -1314,20 +1314,20 @@ static stringop_algs znver1_memcpy[2] = { /* 32-bit tuning. */ {libcall, {{6, loop, false}, {14, unrolled_loop, false}, - {-1, rep_prefix_4_byte, false}}}, + {-1, libcall, false}}}, /* 64-bit tuning. */ {libcall, {{16, loop, false}, - {8192, rep_prefix_8_byte, false}, + {128, rep_prefix_8_byte, false}, {-1, libcall, false; static stringop_algs znver1_memset[2] = { /* 32-bit tuning. */ {libcall, {{8, loop, false}, {24, unrolled_loop, false}, - {2048, rep_prefix_4_byte, false}, + {128, rep_prefix_4_byte, false}, {-1, libcall, false}}}, /* 64-bit tuning. */ {libcall, {{48, unrolled_loop, false}, - {8192, rep_prefix_8_byte, false}, + {128, rep_prefix_8_byte, false}, {-1, libcall, false; struct processor_costs znver1_cost = { { @@ -1460,7 +1460,7 @@ static stringop_algs znver2_memcpy[2] = { /* 32-bit tuning. */ {libcall, {{6, loop, false}, {14, unrolled_loop, false}, - {-1, rep_prefix_4_byte, false}}}, + {-1, libcall, false}}}, /* 64-bit tuning. */ {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false}, @@ -1469,7 +1469,7 @@ static stringop_algs znver2_memset[2] = { /* 32-bit tuning. */ {libcall, {{8, loop, false}, {24, unrolled_loop, false}, - {2048, rep_prefix_4_byte, false} + {128, rep_prefix_4_byte, false}, {-1, libcall, false}}}, /* 64-bit tuning. */ {libcall, {{24, rep_prefix_4_byte, false},
Re: [PATCH 1/2] Provide diagnostic hints for missing C inttypes.h string constants.
On Sun, May 24, 2020 at 02:30:13AM +0200, Mark Wielaard wrote:
> This adds a flag to c_parser so we know when we were trying to
> construct a string literal. If there is a parse error and we were
> constructing a string literal, and the next token is an unknown
> identifier name, and we know there is a standard header that defines
> that name as a string literal, then add a missing header hint to
> the error message.
>
> The list of macro names is also used when providing a hint for
> missing identifiers.

Ping. Note that the followup patch that introduces the same functionality
for the C++ parser was already approved. This patch (as attached) only
needs review/approval from a C-frontend maintainer for some of the
gcc/c/c-parser.c bits.

Thanks,

Mark

From 1aceca275a73b4c7991a6fbde45f4d6da1a9daf5 Mon Sep 17 00:00:00 2001
From: Mark Wielaard
Date: Fri, 22 May 2020 01:10:50 +0200
Subject: [PATCH] Provide diagnostic hints for missing C inttypes.h string
 constants.

This adds a flag to c_parser so we know when we were trying to
construct a string literal. If there is a parse error and we were
constructing a string literal, and the next token is an unknown
identifier name, and we know there is a standard header that defines
that name as a string literal, then add a missing header hint to
the error message.

The list of macro names is also used when providing a hint for
missing identifiers.

gcc/c-family/ChangeLog:

	* known-headers.cc (get_string_macro_hint): New function.
	(get_stdlib_header_for_name): Use get_string_macro_hint.
	(get_c_stdlib_header_for_string_macro_name): New function.
	* known-headers.h (get_c_stdlib_header_for_string_macro_name):
	New function declaration.

gcc/c/ChangeLog:

	* c-parser.c (struct c_parser): Add seen_string_literal
	bitfield.
	(c_parser_consume_token): Reset seen_string_literal.
	(c_parser_error_richloc): Add name_hint if seen_string_literal
	and next token is a CPP_NAME and we have a missing header
	suggestion for the name.
	(c_parser_string_literal): Set seen_string_literal.

gcc/testsuite/ChangeLog:

	* gcc.dg/spellcheck-inttypes.c: New test.
	* g++.dg/spellcheck-inttypes.C: Likewise.
---
 gcc/c-family/known-headers.cc              | 53 ++++++++++++++-
 gcc/c-family/known-headers.h               |  2 +
 gcc/c/c-parser.c                           | 29 ++++++++
 gcc/testsuite/g++.dg/spellcheck-inttypes.C | 41 ++++++++++++
 gcc/testsuite/gcc.dg/spellcheck-inttypes.c | 78 ++++++++++++++++++++++
 5 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-inttypes.C
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-inttypes.c

diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
index 1e2bf49c439a..c07cfd1db815 100644
--- a/gcc/c-family/known-headers.cc
+++ b/gcc/c-family/known-headers.cc
@@ -46,6 +46,49 @@ struct stdlib_hint
   const char *header[NUM_STDLIBS];
 };
 
+/* Given non-NULL NAME, return the header name defining it (as literal
+   string) within either the standard library (with '<' and '>'), or
+   NULL.
+
+   Only handle string macros, so that this can be used for
+   get_stdlib_header_for_name and
+   get_c_stdlib_header_for_string_macro_name.  */
+
+static const char *
+get_string_macro_hint (const char *name, enum stdlib lib)
+{
+  /* <inttypes.h> and <cinttypes>.  */
+  static const char *c99_cxx11_macros[] =
+    { "PRId8", "PRId16", "PRId32", "PRId64",
+      "PRIi8", "PRIi16", "PRIi32", "PRIi64",
+      "PRIo8", "PRIo16", "PRIo32", "PRIo64",
+      "PRIu8", "PRIu16", "PRIu32", "PRIu64",
+      "PRIx8", "PRIx16", "PRIx32", "PRIx64",
+      "PRIX8", "PRIX16", "PRIX32", "PRIX64",
+
+      "PRIdPTR", "PRIiPTR", "PRIoPTR", "PRIuPTR", "PRIxPTR", "PRIXPTR",
+
+      "SCNd8", "SCNd16", "SCNd32", "SCNd64",
+      "SCNi8", "SCNi16", "SCNi32", "SCNi64",
+      "SCNo8", "SCNo16", "SCNo32", "SCNo64",
+      "SCNu8", "SCNu16", "SCNu32", "SCNu64",
+      "SCNx8", "SCNx16", "SCNx32", "SCNx64",
+
+      "SCNdPTR", "SCNiPTR", "SCNoPTR", "SCNuPTR", "SCNxPTR" };
+
+  if ((lib == STDLIB_C && flag_isoc99)
+      || (lib == STDLIB_CPLUSPLUS && cxx_dialect >= cxx11 ))
+    {
+      const size_t num_c99_cxx11_macros
+        = sizeof (c99_cxx11_macros) / sizeof (c99_cxx11_macros[0]);
+      for (size_t i = 0; i < num_c99_cxx11_macros; i++)
+        if (strcmp (name, c99_cxx11_macros[i]) == 0)
+          return lib == STDLIB_C ? "<inttypes.h>" : "<cinttypes>";
+    }
+
+  return NULL;
+}
+
 /* Given non-NULL NAME, return the header name defining it within either
    the standard library (with '<' and '>'), or NULL.
    Only handles a subset of the most common names within the stdlibs.  */
@@ -196,7 +239,7 @@ get_stdlib_header_for_name (const char *name, enum stdlib lib)
     if (strcmp (name, c99_cxx11_hints[i].name) == 0)
       return c99_cxx11_hints[i].header[lib];
 
-  return NULL;
+  return get_string_macro_hint (name, lib);
 }
 
 /* Given non-NULL NAME, return the header name defining it within the C
@@ -217,6 +260,14 @@ get_cp_stdlib_header_for_name (const char *name)
 retu
Re: [PATCH 2/2] x86: Add cmpmemsi for -minline-all-stringops
On Mon, Jun 1, 2020 at 3:19 AM Alexander Monakov wrote: > > On Sun, 31 May 2020, H.J. Lu via Gcc-patches wrote: > > > --- a/gcc/config/i386/i386-expand.c > > +++ b/gcc/config/i386/i386-expand.c > > @@ -7656,6 +7656,90 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx > > count_exp, rtx val_exp, > >return true; > > } > > > > +/* Expand cmpstrn or memcmp. */ > > + > > +bool > > +ix86_expand_cmpstrn_or_cmpmem (rtx result, rtx src1, rtx src2, > > +rtx length, rtx align, bool is_cmpstrn) > > +{ > > + if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS) > > +return false; > > + > > + /* Can't use this if the user has appropriated ecx, esi or edi. */ > > + if (fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG]) > > +return false; > > + > > + if (is_cmpstrn) > > +{ > > + /* For strncmp, length is the maximum length, which can be larger > > + than actual string lengths. We can expand the cmpstrn pattern > > + to "repz cmpsb" only if one of the strings is a constant so > > + that expand_builtin_strncmp() can write the length argument to > > + be the minimum of the const string length and the actual length > > + argument. Otherwise, "repz cmpsb" may pass the 0 byte. 
*/ > > + tree t1 = MEM_EXPR (src1); > > + tree t2 = MEM_EXPR (src2); > > + if (!((t1 && TREE_CODE (t1) == MEM_REF > > + && TREE_CODE (TREE_OPERAND (t1, 0)) == ADDR_EXPR > > + && (TREE_CODE (TREE_OPERAND (TREE_OPERAND (t1, 0), 0)) > > + == STRING_CST)) > > + || (t2 && TREE_CODE (t2) == MEM_REF > > + && TREE_CODE (TREE_OPERAND (t2, 0)) == ADDR_EXPR > > + && (TREE_CODE (TREE_OPERAND (TREE_OPERAND (t2, 0), 0)) > > + == STRING_CST > > + return false; > > +} > > + else > > +{ > > + /* Expand memcmp to "repz cmpsb" only for -minline-all-stringops > > + since "repz cmpsb" can be much slower than memcmp function > > + implemented with vector instructions, see > > + > > + https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > + */ > > + if (!TARGET_INLINE_ALL_STRINGOPS) > > + return false; > > +} > > This check seems to be misplaced, "rep cmps" is slower than either memcmp or > strcmp. The test for TARGET_INLINE_ALL_STRINGOPS should happen regardless of > is_cmpstrn, so it should go earlier in the function. > My patch doesn't change strncmp at all. I opened: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95458 -- H.J.
Re: [PATCH] Add missing store in emission of asan_stack_free.
On 5/20/20 1:03 PM, Franz Sirl wrote: Am 2020-05-19 um 21:05 schrieb Martin Liška: Hi. We make direct emission for asan_emit_stack_protection for smaller stacks. That's fine but we're missing the piece that marks the stack as released and we run out of pre-allocated stacks. I also included some stack-related constants that were used in asan.c. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/ChangeLog: 2020-05-19 Martin Liska PR sanitizer/94910 * asan.c (asan_emit_stack_protection): Emit also **SavedFlagPtr(FakeStack) = 0 in order to release a stack frame. * asan.h (ASAN_MIN_STACK_FRAME_SIZE_LOG): New. (ASAN_MAX_STACK_FRAME_SIZE_LOG): Likewise. (ASAN_MIN_STACK_FRAME_SIZE): Likewise. (ASAN_MAX_STACK_FRAME_SIZE): Likewise. --- gcc/asan.c | 26 ++ gcc/asan.h | 8 2 files changed, 30 insertions(+), 4 deletions(-) >- if (asan_frame_size > 32 && asan_frame_size <= 65536 && pbase >+ if (asan_frame_size >= ASAN_MIN_STACK_FRAME_SIZE Hi, is the change from > to >= and from 32 to 64 for ASAN_MIN_STACK_FRAME_SIZE intentional? Just asking because it doesn't look obvious from Changelog or patch. Also a few lines below the "5" in use_after_return_class = floor_log2 (asan_frame_size - 1) - 5; looks like it may be related to ASAN_MIN_STACK_FRAME_SIZE_LOG. Hello. Thank you very much for the useful feedback. I really made the refactoring in a wrong way. I'm suggesting to only change the emission of asan_emit_stack_protection. Tested locally with asan.exp file. Ready for master? Thanks, Martin regards, Franz >From 5d0c64b2f4028af3ed575934ecc0c3378cca3de1 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Tue, 19 May 2020 16:57:56 +0200 Subject: [PATCH] Add missing store in emission of asan_stack_free. gcc/ChangeLog: 2020-05-19 Martin Liska PR sanitizer/94910 * asan.c (asan_emit_stack_protection): Emit also **SavedFlagPtr(FakeStack) = 0 in order to release a stack frame. 
--- gcc/asan.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/gcc/asan.c b/gcc/asan.c index c9872f1b007..e8d2a25ff79 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (use_after_return_class < 5 && can_store_by_pieces (sz, builtin_memset_read_str, &c, BITS_PER_UNIT, true)) - store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c, - BITS_PER_UNIT, true, RETURN_BEGIN); + { + /* Emit: + memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize); + **SavedFlagPtr(FakeStack) = 0 + */ + store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c, + BITS_PER_UNIT, true, RETURN_BEGIN); + + unsigned HOST_WIDE_INT offset + = (1 << (use_after_return_class + 6)); + offset -= GET_MODE_SIZE (ptr_mode); + mem = adjust_address (mem, Pmode, offset); + mem = gen_rtx_MEM (ptr_mode, mem); + rtx tmp_reg = gen_reg_rtx (Pmode); + emit_move_insn (tmp_reg, mem); + mem = adjust_address (mem, QImode, 0); + emit_move_insn (mem, const0_rtx); + } else if (use_after_return_class >= 5 || !set_storage_via_setmem (shadow_mem, GEN_INT (sz), -- 2.26.2
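Regarding Franz's remark about the "5" in the class computation, the arithmetic can be checked with a small sketch. The helpers below are hypothetical stand-ins (`floor_log2_sketch` imitates GCC's floor_log2), and the frame size per class is taken from the `1 << (use_after_return_class + 6)` expression in the patch.

```cpp
// floor(log2(x)) for x > 0; a stand-in for GCC's floor_log2.
inline int floor_log2_sketch(unsigned long x) {
  int r = -1;
  while (x) {
    x >>= 1;
    r++;
  }
  return r;
}

// Size class of a fake-stack frame, following the expression quoted above.
inline int use_after_return_class(unsigned long frame_size) {
  return floor_log2_sketch(frame_size - 1) - 5;
}

// Bytes reserved for a frame of the given class: 64 << class_id.
inline unsigned long class_frame_bytes(int class_id) {
  return 1UL << (class_id + 6);
}
```

Class 0 frames are 64 bytes, i.e. 1 << 6, so the "- 5" applied to floor_log2 (size - 1) amounts to rounding the size up to a power of two and subtracting a minimum log of 6. That supports the observation that the constant is tied to the minimum frame size.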
Re: [PATCH] Add missing store in emission of asan_stack_free.
On Mon, Jun 01, 2020 at 02:28:51PM +0200, Martin Liška wrote: > --- a/gcc/asan.c > +++ b/gcc/asan.c > @@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, > unsigned int alignb, >if (use_after_return_class < 5 > && can_store_by_pieces (sz, builtin_memset_read_str, &c, > BITS_PER_UNIT, true)) > - store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c, > - BITS_PER_UNIT, true, RETURN_BEGIN); > + { > + /* Emit: > +memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize); > +**SavedFlagPtr(FakeStack) = 0 > + */ > + store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c, > +BITS_PER_UNIT, true, RETURN_BEGIN); > + > + unsigned HOST_WIDE_INT offset > + = (1 << (use_after_return_class + 6)); > + offset -= GET_MODE_SIZE (ptr_mode); > + mem = adjust_address (mem, Pmode, offset); > + mem = gen_rtx_MEM (ptr_mode, mem); > + rtx tmp_reg = gen_reg_rtx (Pmode); > + emit_move_insn (tmp_reg, mem); > + mem = adjust_address (mem, QImode, 0); > + emit_move_insn (mem, const0_rtx); This doesn't look correct to me. I'd think the first adjust_address should be mem = adjust_address (mem, ptr_mode, offset); which will give you a MEM with ptr_mode which has SavedFlagPtr(FakeStack) address, i.e. *SavedFlagPtr(FakeStack). Next, you want to load that into some temporary, so e.g. rtx addr = gen_reg_rtx (ptr_mode); emit_move_insn (addr, mem); next you need to convert that ptr_mode to Pmode if needed, so something like addr = convert_memory_address (Pmode, addr); and finally: mem = gen_rtx_MEM (QImode, addr); emit_move_insn (mem, const0_rtx); Completely untested. Jakub
Re: [PATCH] More c++ math reject macros
Please CC libstd...@gcc.gnu.org on all libstdc++ patches, even if the approval is coming from a target port maintainer, not from a libstdc++ maintainer. Thanks.
Re: [PATCH] Add missing store in emission of asan_stack_free.
On 6/1/20 2:52 PM, Jakub Jelinek wrote: On Mon, Jun 01, 2020 at 02:28:51PM +0200, Martin Liška wrote: --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1598,8 +1598,24 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (use_after_return_class < 5 && can_store_by_pieces (sz, builtin_memset_read_str, &c, BITS_PER_UNIT, true)) - store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c, -BITS_PER_UNIT, true, RETURN_BEGIN); + { + /* Emit: + memset(ShadowBase, kAsanStackAfterReturnMagic, ShadowSize); + **SavedFlagPtr(FakeStack) = 0 + */ + store_by_pieces (shadow_mem, sz, builtin_memset_read_str, &c, + BITS_PER_UNIT, true, RETURN_BEGIN); + + unsigned HOST_WIDE_INT offset + = (1 << (use_after_return_class + 6)); + offset -= GET_MODE_SIZE (ptr_mode); + mem = adjust_address (mem, Pmode, offset); + mem = gen_rtx_MEM (ptr_mode, mem); + rtx tmp_reg = gen_reg_rtx (Pmode); + emit_move_insn (tmp_reg, mem); + mem = adjust_address (mem, QImode, 0); + emit_move_insn (mem, const0_rtx); This doesn't look correct to me. I'd think the first adjust_address should be mem = adjust_address (mem, ptr_mode, offset); which will give you a MEM with ptr_mode which has SavedFlagPtr(FakeStack) address, i.e. *SavedFlagPtr(FakeStack). Next, you want to load that into some temporary, so e.g. rtx addr = gen_reg_rtx (ptr_mode); emit_move_insn (addr, mem); next you need to convert that ptr_mode to Pmode if needed, so something like addr = convert_memory_address (Pmode, addr); and finally: mem = gen_rtx_MEM (QImode, addr); emit_move_insn (mem, const0_rtx); Completely untested. This is not correct. 
With your suggestion I have: int foo(int index) { int a[100]; return a[index]; } $ diff -u before.s after.s --- before.s2020-06-01 15:15:22.634337654 +0200 +++ after.s 2020-06-01 15:16:32.205711511 +0200 @@ -81,8 +81,7 @@ movq%rdi, 2147450920(%rax) movq%rsi, 2147450928(%rax) movq%rdi, 2147450936(%rax) - movq504(%rbx), %rax - movb$0, (%rax) + movb$0, 504(%rbx) jmp .L3 .L2: movq$0, 2147450880(%rax) There's missing one level of de-reference. Looking at clang: movq%rsi, 2147450928(%rax) movq%rdi, 2147450936(%rax) movq504(%rbx), %rax movb$0, (%rax) jmp .L3 .L2: It does the same as my patch. Martin Jakub
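The two levels of dereference under discussion can be modelled in plain C++. This is only a sketch under an assumed layout, namely that the last pointer-sized slot of a fake-stack frame holds the address of a one-byte flag; `release_fake_frame` is a hypothetical name, and nothing here is the RTL that asan.c emits.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical model of a fake-stack frame of size class CLASS_ID: the last
// pointer-sized slot of the frame stores the address of a one-byte "in use"
// flag.  Releasing the frame means loading that address and storing 0
// through it, i.e. **SavedFlagPtr(FakeStack) = 0.
inline void release_fake_frame(unsigned char *frame, int class_id) {
  std::size_t offset
      = (std::size_t{1} << (class_id + 6)) - sizeof(unsigned char *);
  unsigned char *flag;
  // First dereference: load the saved pointer from the end of the frame.
  std::memcpy(&flag, frame + offset, sizeof flag);
  // Second dereference: clear the flag byte it points to.
  *flag = 0;
}
```

Storing the zero byte directly at `frame + offset` would instead overwrite part of the saved pointer and leave the flag set. That is the difference between the two instruction sequences in the assembly diff above.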
Discussion about the medium code model in aarch64
Hi,

I reported a PR in GCC Bugzilla about the medium code model in aarch64. A
solution has been proposed and some discussion has been posted. The details
of the discussion can be found here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95285

Wilco suggested that I make a PIC 48-bit code model by introducing a new
relocation type "high32_47" combined with the ADRP instruction, which I think
is feasible and more efficient than my solution. But this kind of relocation
hasn't been defined in Arm's ABI. He also doubts the necessity of the medium
or large-pic code model.

My solution, on the other hand, only uses the existing relocation types
R__MOVW_PREL_G0-3, which is also how LLVM solves similar problems. Although
it is less efficient, it is currently easier to implement.

As for the necessity concern: I need to optimize CESM in my work, and I
happened to need this kind of large-pic code model. The abstracted test case
is also provided in the bug report.

I would very much like to know your opinion on this issue. Which solution do
you think is more appropriate for the current situation?

Regarding the necessity problem, I admit it is not a critical issue. But some
applications in the HPC field do need this code model. Personally, I think it
doesn't hurt for us to upstream a prototype first so that customers can use
it. Later, if Arm publishes an official document regarding this code model,
we can then make a standard model. What is your opinion regarding this
necessity problem?

Thanks a lot.

Regards,
Bu Le (Bruce)
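To make the MOVW_PREL approach a little more concrete, here is a sketch of the bit slicing involved, assuming (as I understand the group relocations) that G0 through G3 select successive 16-bit fields of the PC-relative displacement. `MovwGroups`, `split_prel` and `join_prel` are hypothetical names. A real instruction sequence also has to choose between MOVZ, MOVN and MOVK, which this sketch ignores.

```cpp
#include <cstdint>

// Four 16-bit groups of a displacement: g[0] holds bits 0-15 (G0) and
// g[3] holds bits 48-63 (G3).
struct MovwGroups {
  std::uint16_t g[4];
};

// Splits a PC-relative displacement into its four 16-bit groups.
inline MovwGroups split_prel(std::int64_t displacement) {
  MovwGroups out{};
  std::uint64_t bits = static_cast<std::uint64_t>(displacement);
  for (int i = 0; i < 4; i++)
    out.g[i] = static_cast<std::uint16_t>(bits >> (16 * i));
  return out;
}

// Reassembles the displacement; adding it to the PC gives the target address.
inline std::int64_t join_prel(const MovwGroups &groups) {
  std::uint64_t bits = 0;
  for (int i = 0; i < 4; i++)
    bits |= static_cast<std::uint64_t>(groups.g[i]) << (16 * i);
  return static_cast<std::int64_t>(bits);
}
```

Four move-wide instructions are needed to materialize the full displacement this way. That is where the efficiency gap relative to an ADRP-based sequence comes from.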
Re: [PATCH] More c++ math reject macros
Apologies! Will do so in the future. Doug On 6/1/20 6:15 AM, Jonathan Wakely wrote: Please CC libstd...@gcc.gnu.org on all libstdc++ patches, even if the approval is coming from a target port maintainer, not from a libstdc++ maintainer. Thanks.
Cleanup global decl stream reference streaming, part 1
Hi,
this patch further simplifies the way we refer to the global stream. Every
function section has a vector of references to global trees which is populated
during streaming. This vector is for some reason divided into field_decls,
fn_decls, type_decls, types, namespace_decls, labels_decls and var_decls, which
also contains other things.

There is no benefit to this split except perhaps making the indexes a bit
smaller and possibly better encodable by ulebs. This however does not pay off
and makes things unnecessarily complex. We may want to re-add multiple tables
if we start streaming something other than trees into the global stream, but
that would not work with the current infrastructure anyway.

The patch drops the different streams, and I checked that it results in a
reduction of the global stream and apparently a very small increase in function
streams, but that may be just because I updated the tree in between the tests.
This will be fixed by an incremental patch.

[WPA] Compression: 86220483 input bytes, 217762146 uncompressed bytes (ratio: 2.525643)
[WPA] Compression: 111735464 input bytes, 297410918 uncompressed bytes (ratio: 2.661741)
[WPA] Size of mmap'd section decls: 86220483 bytes
[WPA] Size of mmap'd section function_body: 14353447 bytes

to:

[WPA] Compression: 85754594 input bytes, 216006049 uncompressed bytes (ratio: 2.518886)
[WPA] Compression: 111370381 input bytes, 295746052 uncompressed bytes (ratio: 2.655518)
[WPA] Size of mmap'd section decls: 85754594 bytes
[WPA] Size of mmap'd section function_body: 14447946 bytes

The patch also removes some of the ugly macro generators of accessor functions
and makes it easier to further optimize the way we stream references to trees,
which I plan to do incrementally.

I also made the API for streaming references symmetric. I.e.
you stream out by lto_output_var_decl_ref and stream in by lto_input_var_decl_ref instead streaming out by lto_output_var_decl_index and streaming in by decl_index = streamer_read_uhwi (ib); lto_file_decl_data_get_fn_decl (file_data, decl_index); lto-bootstrapped/regtested x86_64-linux, will commit it shortly. gcc/ChangeLog: 2020-06-01 Jan Hubicka * ipa-reference.c (stream_out_bitmap): Use lto_output_var_decl_ref. (ipa_reference_read_optimization_summary): Use lto_intput_var_decl_ref. * lto-cgraph.c (lto_output_node): Likewise. (lto_output_varpool_node): Likewise. (output_offload_tables): Likewise. (input_node): Likewise. (input_varpool_node): Likewise. (input_offload_tables): Likewise. * lto-streamer-in.c (lto_input_tree_ref): Declare. (lto_input_var_decl_ref): Declare. (lto_input_fn_decl_ref): Declare. * lto-streamer-out.c (lto_indexable_tree_ref): Use only one decl stream. (lto_output_var_decl_index): Rename to .. (lto_output_var_decl_ref): ... this. (lto_output_fn_decl_index): Rename to ... (lto_output_fn_decl_ref): ... this. * lto-streamer.h (enum lto_decl_stream_e_t): Remove per-type streams. (DEFINE_DECL_STREAM_FUNCS): Remove. (lto_output_var_decl_index): Remove. (lto_output_fn_decl_index): Remove. (lto_output_var_decl_ref): Declare. (lto_output_fn_decl_ref): Declare. (lto_input_var_decl_ref): Declare. (lto_input_fn_decl_ref): Declare. diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 6ab0505c3fd..fc7e4312420 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -249,38 +249,12 @@ enum lto_section_type /* Indices to the various function, type and symbol streams. */ enum lto_decl_stream_e_t { - LTO_DECL_STREAM_TYPE = 0,/* Must be first. */ - LTO_DECL_STREAM_FIELD_DECL, - LTO_DECL_STREAM_FN_DECL, - LTO_DECL_STREAM_VAR_DECL, - LTO_DECL_STREAM_TYPE_DECL, - LTO_DECL_STREAM_NAMESPACE_DECL, - LTO_DECL_STREAM_LABEL_DECL, + LTO_DECL_STREAM = 0, /* Must be first. 
*/ LTO_N_DECL_STREAMS }; typedef enum ld_plugin_symbol_resolution ld_plugin_symbol_resolution_t; - -/* Macro to define convenience functions for type and decl streams - in lto_file_decl_data. */ -#define DEFINE_DECL_STREAM_FUNCS(UPPER_NAME, name) \ -static inline tree \ -lto_file_decl_data_get_ ## name (struct lto_file_decl_data *data, \ -unsigned int idx) \ -{ \ - struct lto_in_decl_state *state = data->current_decl_state; \ - return (*state->streams[LTO_DECL_STREAM_## UPPER_NAME])[idx]; \ -} \ -\ -static inline unsigned int \ -lto_file_decl_data_num_ ## name ## s (struct lto_file_decl_data *data) \ -{ \ - struct lto_in_decl_state *state = data->current_decl_state; \ - return vec_safe_length (state->streams[LTO_DECL_STREAM_## UPPER_NAME]); \ -} - - /* Return a char pointer to the start of a data stream for an lto pass or function. The first parameter is the file data that contains the information. The second parameter is the type of information @@ -908,10 +882,12 @@ extern struct
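To illustrate the idea described in the message above (one table of tree references per section, written and read through a symmetric pair of functions), here is a minimal, self-contained sketch. It is a toy model and not GCC code: `Tree`, `OutDeclState`, `output_tree_ref`, `input_tree_ref`, `write_uleb` and `read_uleb` are all hypothetical names invented for this example, and strings stand in for tree nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a global tree node (not a GCC type).
using Tree = std::string;

// Writer-side state: a single table of referenced trees, instead of
// one table per kind (types, field decls, fn decls, var decls, ...).
struct OutDeclState {
  std::vector<Tree> table;                      // the one reference stream
  std::unordered_map<Tree, uint64_t> index_of;  // tree -> slot in table

  // Return the slot of T, appending it on first use.
  uint64_t ref_index(const Tree &t) {
    auto it = index_of.find(t);
    if (it != index_of.end())
      return it->second;
    uint64_t idx = table.size();
    table.push_back(t);
    index_of.emplace(t, idx);
    return idx;
  }
};

// Minimal unsigned LEB128 encoder: 7 payload bits per byte,
// high bit set on every byte except the last.
void write_uleb(std::vector<uint8_t> &out, uint64_t v) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    if (v != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (v != 0);
}

// Matching decoder; POS is advanced past the bytes consumed.
uint64_t read_uleb(const std::vector<uint8_t> &in, std::size_t &pos) {
  uint64_t result = 0;
  unsigned shift = 0;
  for (;;) {
    uint8_t byte = in.at(pos++);
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      break;
    shift += 7;
  }
  return result;
}

// Symmetric pair: the writer emits an index into the single table,
// the reader turns that index back into the tree.
void output_tree_ref(std::vector<uint8_t> &ob, OutDeclState &st,
                     const Tree &t) {
  write_uleb(ob, st.ref_index(t));
}

const Tree &input_tree_ref(const std::vector<uint8_t> &ib, std::size_t &pos,
                           const std::vector<Tree> &table) {
  return table.at(read_uleb(ib, pos));
}
```

With a single table the indexes can grow larger than they would in per-kind tables, which is the uleb encoding trade-off the message mentions; in exchange the reader no longer needs to know which kind of reference it is decoding.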
Re: [PATCH] coroutines: Correct handling of references in parm copies [PR95350].
On 6/1/20 3:44 AM, Iain Sandoe wrote:
Hi,

I had implemented a move out of rvalue refs for such ramp values (since these are most likely to be dangling references). However this does cause a divergence with the clang implementation - and the patch fixes that.

ok for both

tested on x86_64,powerpc64-linux, x86_64-darwin
OK for master?
OK for 10.2?
thanks
Iain

---
Adjust to handle rvalue refs the same way as clang, and to correct the handling of moves when a copy CTOR is present. This is one area where we could make things easier for the end-user (as was implemented before this change), however there needs to be agreement about when the full statement containing a coroutine call ends (i.e. when the ramp terminates or when the coroutine terminates).

gcc/cp/ChangeLog:

	PR c++/95350
	* coroutines.cc (struct param_info): Remove rv_ref field.
	(build_actor_fn): Remove special rvalue ref handling.
	(morph_fn_to_coro): Likewise.

gcc/testsuite/ChangeLog:

	PR c++/95350
	* g++.dg/coroutines/torture/func-params-08.C: Adjust test to reflect that all rvalue refs are dangling.
	* g++.dg/coroutines/torture/func-params-09-awaitable-parms.C: Likewise.
	* g++.dg/coroutines/pr95350.C: New test.
---
 gcc/cp/coroutines.cc | 41 +--
 gcc/testsuite/g++.dg/coroutines/pr95350.C | 28 +
 .../coroutines/torture/func-params-08.C | 11 ++---
 .../torture/func-params-09-awaitable-parms.C | 11 ++---
 4 files changed, 50 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95350.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 969f4a66f2f..8746927577a 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1807,7 +1807,6 @@ struct param_info
   tree frame_type; /* The type used to represent this parm in the frame. */
   tree orig_type; /* The original type of the parm (not as passed). */
   bool by_ref; /* Was passed by reference. */
-  bool rv_ref; /* Was an rvalue reference. */
   bool pt_ref; /* Was a pointer to object.
*/ bool trivial_dtor; /* The frame type has a trivial DTOR. */ bool this_ptr; /* Is 'this' */ @@ -2077,12 +2076,6 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody, if (parm.pt_ref) fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx); - /* We expect an rvalue ref. here. */ - if (parm.rv_ref) - fld_idx = convert_to_reference (DECL_ARG_TYPE (arg), fld_idx, - CONV_STATIC, LOOKUP_NORMAL, - NULL_TREE, tf_warning_or_error); - int i; tree *puse; FOR_EACH_VEC_ELT (*parm.body_uses, i, puse) @@ -3770,15 +3763,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) if (actual_type == NULL_TREE) actual_type = error_mark_node; parm.orig_type = actual_type; - parm.by_ref = parm.rv_ref = parm.pt_ref = false; - if (TREE_CODE (actual_type) == REFERENCE_TYPE - && TYPE_REF_IS_RVALUE (DECL_ARG_TYPE (arg))) - { - parm.rv_ref = true; - actual_type = TREE_TYPE (actual_type); - parm.frame_type = actual_type; - } - else if (TREE_CODE (actual_type) == REFERENCE_TYPE) + parm.by_ref = parm.pt_ref = false; + if (TREE_CODE (actual_type) == REFERENCE_TYPE) { /* If the user passes by reference, then we will save the pointer to the original. As noted in @@ -3786,16 +3772,12 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) referenced item ends and then the coroutine is resumed, we have UB; well, the user asked for it. 
*/ actual_type = build_pointer_type (TREE_TYPE (actual_type)); - parm.frame_type = actual_type; parm.pt_ref = true; } else if (TYPE_REF_P (DECL_ARG_TYPE (arg))) - { - parm.by_ref = true; - parm.frame_type = actual_type; - } - else - parm.frame_type = actual_type; + parm.by_ref = true; + + parm.frame_type = actual_type; parm.this_ptr = is_this_parameter (arg); if (lambda_p) @@ -4170,17 +4152,16 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) } else if (parm.by_ref) vec_safe_push (promise_args, fld_idx); - else if (parm.rv_ref) - vec_safe_push (promise_args, rvalue (fld_idx)); else vec_safe_push (promise_args, arg); if (TYPE_NEEDS_CONSTRUCTING (parm.frame_type)) { vec *p_in; - if (parm.by_ref - && classtype_has_non_deleted_move_ctor (parm.frame_type) - && !classtype_has_non_deleted_copy_ctor (parm.frame_type)) + if (CLASS_TYPE_P (parm.frame_type) + && classtype_has_non_deleted_move_ctor (parm.frame_type)) +p_in = make_tree_vector_single (move (arg)); + else if (lvalue_p (arg)) p_in = make_tree_vector_single (rvalue (arg)); else p_in = make_tree_vector_single (arg); @@ -4193,9 +4174,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) } else { - if (parm.rv_ref) -r = convert_from_reference (arg); - else if (!same_type_p (parm.frame_type, DECL_ARG_TYPE (arg))) + if (!same_type_p (parm.frame_type, DECL_ARG_TYP
Re: [PATCH] coroutines: Wrap co_await in a target expr where needed [PR95050]
On 6/1/20 3:55 AM, Iain Sandoe wrote: Hi, Since the co_await expression is mostly opaque to the existing machinery, we were hiding the details of the await_resume return value. If that needs to be wrapped in a target expression, then emulate this with the whole co_await. Similarly, if the await expression we build in response to co_await p.yield_value (e) is wrapped in a target expression, then we need to transfer that wrapper to the resultant CO_YIELD_EXPR (which is, itself, just a proxy for the underlying co_await). tested on x86_64,powerpc64-linux, x86_64-darwin OK for master? OK for 10.2? thanks Iain ok for both gcc/cp/ChangeLog: PR c++/95050 * coroutines.cc (build_co_await): Wrap the co_await expression in a TARGET_EXPR, where needed. (finish_co_yield_expr): Likewise. gcc/testsuite/ChangeLog: PR c++/95050 * g++.dg/coroutines/pr95050.C: New test. --- gcc/cp/coroutines.cc | 29 +- gcc/testsuite/g++.dg/coroutines/pr95050.C | 49 +++ 2 files changed, 76 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95050.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 8746927577a..cc685ca73b2 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -816,6 +816,12 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind) tree awaiter_calls = make_tree_vec (3); TREE_VEC_ELT (awaiter_calls, 0) = awrd_call; /* await_ready(). */ TREE_VEC_ELT (awaiter_calls, 1) = awsp_call; /* await_suspend(). */ + tree te = NULL_TREE; + if (TREE_CODE (awrs_call) == TARGET_EXPR) +{ + te = awrs_call; + awrs_call = TREE_OPERAND (awrs_call, 1); +} TREE_VEC_ELT (awaiter_calls, 2) = awrs_call; /* await_resume(). 
*/ tree await_expr = build5_loc (loc, CO_AWAIT_EXPR, @@ -823,7 +829,13 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind) a, e_proxy, o, awaiter_calls, build_int_cst (integer_type_node, (int) suspend_kind)); - return convert_from_reference (await_expr); + if (te) +{ + TREE_OPERAND (te, 1) = await_expr; + await_expr = te; +} + tree t = convert_from_reference (await_expr); + return t; } tree @@ -960,8 +972,21 @@ finish_co_yield_expr (location_t kw, tree expr) tree op = build_co_await (kw, yield_call, CO_YIELD_SUSPEND_POINT); if (op != error_mark_node) { - op = build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (op), expr, op); + if (REFERENCE_REF_P (op)) + op = TREE_OPERAND (op, 0); + /* If the await expression is wrapped in a TARGET_EXPR, then transfer +that wrapper to the CO_YIELD_EXPR, since this is just a proxy for +its contained await. Otherwise, just build the CO_YIELD_EXPR. */ + if (TREE_CODE (op) == TARGET_EXPR) + { + tree t = TREE_OPERAND (op, 1); + t = build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (t), expr, t); + TREE_OPERAND (op, 1) = t; + } + else + op = build2_loc (kw, CO_YIELD_EXPR, TREE_TYPE (op), expr, op); TREE_SIDE_EFFECTS (op) = 1; + op = convert_from_reference (op); } return op; diff --git a/gcc/testsuite/g++.dg/coroutines/pr95050.C b/gcc/testsuite/g++.dg/coroutines/pr95050.C new file mode 100644 index 000..fd1516d32f0 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/pr95050.C @@ -0,0 +1,49 @@ +#if __has_include () +#include +using namespace std; +#elif defined (__clang__) && __has_include () +#include +using namespace std::experimental; +#endif +#include + +struct ret_type +{ + ret_type () = default; + ret_type (const ret_type&) = delete; + //ret_type (ret_type&&) = default; + ~ret_type() {} +}; + +struct task +{ + struct promise_type + { +auto get_return_object () -> task { return {}; } +auto initial_suspend () -> suspend_always { return {}; } +auto final_suspend () -> suspend_always { return {}; } +void return_void () {} +void 
unhandled_exception () { } +void thing (ret_type x) {} + }; +}; + +struct awaiter +{ + bool await_ready() const { return true; } + void await_suspend (coroutine_handle<>) {} + ret_type await_resume() { return {}; } +}; + +task +my_coro () +{ + ret_type r2{co_await awaiter{}}; + //ret_type r3 (std::move(r2)); +} + +int main() +{ + auto x = my_coro (); + return 0; +} -- Nathan Sidwell
Re: [PATCH] coroutines: Correct handling of references in parm copies [PR95350].
On 6/1/20 3:59 AM, Iain Sandoe wrote:
(resending, this didn’t appear to make it to the list)

Hi,

I had implemented a move out of rvalue refs for such ramp values (since these are most likely to be dangling references). However this does cause a divergence with the clang implementation - and the patch fixes that.

tested on x86_64,powerpc64-linux, x86_64-darwin
OK for master?
OK for 10.2?

ok for both

Iain

---
Adjust to handle rvalue refs the same way as clang, and to correct the handling of moves when a copy CTOR is present. This is one area where we could make things easier for the end-user (as was implemented before this change), however there needs to be agreement about when the full statement containing a coroutine call ends (i.e. when the ramp terminates or when the coroutine terminates).

gcc/cp/ChangeLog:

	PR c++/95350
	* coroutines.cc (struct param_info): Remove rv_ref field.
	(build_actor_fn): Remove special rvalue ref handling.
	(morph_fn_to_coro): Likewise.

gcc/testsuite/ChangeLog:

	PR c++/95350
	* g++.dg/coroutines/torture/func-params-08.C: Adjust test to reflect that all rvalue refs are dangling.
	* g++.dg/coroutines/torture/func-params-09-awaitable-parms.C: Likewise.
	* g++.dg/coroutines/pr95350.C: New test.
---
 gcc/cp/coroutines.cc | 41 +--
 gcc/testsuite/g++.dg/coroutines/pr95350.C | 28 +
 .../coroutines/torture/func-params-08.C | 11 ++---
 .../torture/func-params-09-awaitable-parms.C | 11 ++---
 4 files changed, 50 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95350.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 969f4a66f2f..8746927577a 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1807,7 +1807,6 @@ struct param_info
   tree frame_type; /* The type used to represent this parm in the frame. */
   tree orig_type; /* The original type of the parm (not as passed). */
   bool by_ref; /* Was passed by reference. */
-  bool rv_ref; /* Was an rvalue reference. */
   bool pt_ref; /* Was a pointer to object.
*/ bool trivial_dtor; /* The frame type has a trivial DTOR. */ bool this_ptr; /* Is 'this' */ @@ -2077,12 +2076,6 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody, if (parm.pt_ref) fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx); - /* We expect an rvalue ref. here. */ - if (parm.rv_ref) - fld_idx = convert_to_reference (DECL_ARG_TYPE (arg), fld_idx, - CONV_STATIC, LOOKUP_NORMAL, - NULL_TREE, tf_warning_or_error); - int i; tree *puse; FOR_EACH_VEC_ELT (*parm.body_uses, i, puse) @@ -3770,15 +3763,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) if (actual_type == NULL_TREE) actual_type = error_mark_node; parm.orig_type = actual_type; - parm.by_ref = parm.rv_ref = parm.pt_ref = false; - if (TREE_CODE (actual_type) == REFERENCE_TYPE - && TYPE_REF_IS_RVALUE (DECL_ARG_TYPE (arg))) - { - parm.rv_ref = true; - actual_type = TREE_TYPE (actual_type); - parm.frame_type = actual_type; - } - else if (TREE_CODE (actual_type) == REFERENCE_TYPE) + parm.by_ref = parm.pt_ref = false; + if (TREE_CODE (actual_type) == REFERENCE_TYPE) { /* If the user passes by reference, then we will save the pointer to the original. As noted in @@ -3786,16 +3772,12 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) referenced item ends and then the coroutine is resumed, we have UB; well, the user asked for it. 
*/ actual_type = build_pointer_type (TREE_TYPE (actual_type)); - parm.frame_type = actual_type; parm.pt_ref = true; } else if (TYPE_REF_P (DECL_ARG_TYPE (arg))) - { - parm.by_ref = true; - parm.frame_type = actual_type; - } - else - parm.frame_type = actual_type; + parm.by_ref = true; + + parm.frame_type = actual_type; parm.this_ptr = is_this_parameter (arg); if (lambda_p) @@ -4170,17 +4152,16 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer) } else if (parm.by_ref) vec_safe_push (promise_args, fld_idx); - else if (parm.rv_ref) - vec_safe_push (promise_args, rvalue (fld_idx)); else vec_safe_push (promise_args, arg); if (TYPE_NEEDS_CONSTRUCTING (parm.frame_type)) { vec *p_in; - if (parm.by_ref -
Re: [PATCH] coroutines: Allow parameter packs in co_await/yield expressions [PR95345]
On 6/1/20 4:09 AM, Iain Sandoe wrote: Hi This corrects a pasto, where I copied the constraint on bare parameter packs from the co_return to co_yield/await without properly reviewing it. tested on x86_64,powerpc64-linux, x86_64-darwin OK for master? OK for 10.2? ok for both thanks Iain gcc/cp/ChangeLog: PR c++/95345 * coroutines.cc (finish_co_await_expr): Revise to allow for parameter packs. (finish_co_yield_expr): Likewise. gcc/testsuite/ChangeLog: PR c++/95345 * g++.dg/coroutines/pr95345.C: New test. --- gcc/cp/coroutines.cc | 45 +++ gcc/testsuite/g++.dg/coroutines/pr95345.C | 32 2 files changed, 53 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95345.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index cc685ca73b2..7afa550037c 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -851,19 +851,18 @@ finish_co_await_expr (location_t kw, tree expr) /* The current function has now become a coroutine, if it wasn't already. */ DECL_COROUTINE_P (current_function_decl) = 1; - if (processing_template_decl) -{ - current_function_returns_value = 1; - - if (check_for_bare_parameter_packs (expr)) - return error_mark_node; + /* This function will appear to have no return statement, even if it + is declared to return non-void (most likely). This is correct - we + synthesize the return for the ramp in the compiler. So suppress any + extraneous warnings during substitution. */ + TREE_NO_WARNING (current_function_decl) = true; - /* If we don't know the promise type, we can't proceed. */ - tree functype = TREE_TYPE (current_function_decl); - if (dependent_type_p (functype) || type_dependent_expression_p (expr)) - return build5_loc (kw, CO_AWAIT_EXPR, unknown_type_node, expr, - NULL_TREE, NULL_TREE, NULL_TREE, integer_zero_node); -} + /* If we don't know the promise type, we can't proceed, build the + co_await with the expression unchanged. 
*/ + tree functype = TREE_TYPE (current_function_decl); + if (dependent_type_p (functype) || type_dependent_expression_p (expr)) +return build5_loc (kw, CO_AWAIT_EXPR, unknown_type_node, expr, + NULL_TREE, NULL_TREE, NULL_TREE, integer_zero_node); /* We must be able to look up the "await_transform" method in the scope of the promise type, and obtain its return type. */ @@ -928,19 +927,17 @@ finish_co_yield_expr (location_t kw, tree expr) /* The current function has now become a coroutine, if it wasn't already. */ DECL_COROUTINE_P (current_function_decl) = 1; - if (processing_template_decl) -{ - current_function_returns_value = 1; - - if (check_for_bare_parameter_packs (expr)) - return error_mark_node; + /* This function will appear to have no return statement, even if it + is declared to return non-void (most likely). This is correct - we + synthesize the return for the ramp in the compiler. So suppress any + extraneous warnings during substitution. */ + TREE_NO_WARNING (current_function_decl) = true; - tree functype = TREE_TYPE (current_function_decl); - /* If we don't know the promise type, we can't proceed. */ - if (dependent_type_p (functype) || type_dependent_expression_p (expr)) - return build2_loc (kw, CO_YIELD_EXPR, unknown_type_node, expr, - NULL_TREE); -} + /* If we don't know the promise type, we can't proceed, build the + co_await with the expression unchanged. 
*/ + tree functype = TREE_TYPE (current_function_decl); + if (dependent_type_p (functype) || type_dependent_expression_p (expr)) +return build2_loc (kw, CO_YIELD_EXPR, unknown_type_node, expr, NULL_TREE); if (!coro_promise_type_found_p (current_function_decl, kw)) /* We must be able to look up the "yield_value" method in the scope of diff --git a/gcc/testsuite/g++.dg/coroutines/pr95345.C b/gcc/testsuite/g++.dg/coroutines/pr95345.C new file mode 100644 index 000..90e946d91c2 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/pr95345.C @@ -0,0 +1,32 @@ +#if __has_include () +#include +using namespace std; +#elif defined (__clang__) && __has_include () +#include +using namespace std::experimental; +#endif + +struct dummy_coro +{ + using promise_type = dummy_coro; + bool await_ready() { return false; } + void await_suspend(std::coroutine_handle<>) { } + void await_resume() { } + dummy_coro get_return_object() { return {}; } + dummy_coro initial_suspend() { return {}; } + dummy_coro final_suspend() { return {}; } + void return_void() { } + void unhandled_exception() { } +}; + +template +dummy_coro +foo() +{ + ((co_await [](int){ return std::suspend_never{}; }(I)), ...); + co_return; +} + +void bar() { + foo<1>(); +} -- Nathan Sidw
[committed] libstdc++: Update/streamline Valgrind references
Like many sites over the last year(s) valgrind.org has now moved to https. While there, replace the second of two links in the same vicinity by a purely textual reference -- easier to maintain, and in particular also better from a user experience perspective. Gerald * doc/xml/faq.xml: Adjust Valgrind reference and remove another. * doc/html/faq.html: Regenerate. --- libstdc++-v3/doc/html/faq.html | 4 ++-- libstdc++-v3/doc/xml/faq.xml | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/libstdc++-v3/doc/html/faq.html b/libstdc++-v3/doc/html/faq.html index 18407225d7a..967e5f5f348 100644 --- a/libstdc++-v3/doc/html/faq.html +++ b/libstdc++-v3/doc/html/faq.html @@ -700,7 +700,7 @@ of a few dozen kilobytes on startup. This pool is used to ensure it's possible to throw exceptions (such as bad_alloc) even when malloc is unable to allocate any more memory. -With some versions of http://valgrind.org/"; target="_top">valgrind +With some versions of https://valgrind.org"; target="_top">valgrind this pool will be shown as "still reachable" when the process exits, e.g. still reachable: 72,704 bytes in 1 blocks. This memory is not a leak, because it's still in use by libstdc++, @@ -710,7 +710,7 @@ In the past, a few people reported that the standard containers appear to leak memory when tested with memory checkers such as -http://valgrind.org/"; target="_top">valgrind. +valgrind. Under some (non-default) configurations the library's allocators keep free memory in a pool for later reuse, rather than deallocating it with delete diff --git a/libstdc++-v3/doc/xml/faq.xml b/libstdc++-v3/doc/xml/faq.xml index cf8684e1cea..e419d3c22a0 100644 --- a/libstdc++-v3/doc/xml/faq.xml +++ b/libstdc++-v3/doc/xml/faq.xml @@ -993,7 +993,7 @@ of a few dozen kilobytes on startup. This pool is used to ensure it's possible to throw exceptions (such as bad_alloc) even when malloc is unable to allocate any more memory. 
-With some versions of http://www.w3.org/1999/xlink"; xlink:href="http://valgrind.org/";>valgrind +With some versions of http://www.w3.org/1999/xlink"; xlink:href="https://valgrind.org";>valgrind this pool will be shown as "still reachable" when the process exits, e.g. still reachable: 72,704 bytes in 1 blocks. This memory is not a leak, because it's still in use by libstdc++, @@ -1004,7 +1004,7 @@ In the past, a few people reported that the standard containers appear to leak memory when tested with memory checkers such as -http://www.w3.org/1999/xlink"; xlink:href="http://valgrind.org/";>valgrind. +valgrind. Under some (non-default) configurations the library's allocators keep free memory in a pool for later reuse, rather than deallocating it with delete -- 2.26.2
[PATCH 0/6] Permute Class Operations
GCC maintainers:

The following patch set adds builtins for the various Permute Class Operations specified in IBM RFC 2609. Based on previous IBM internal reviews of the patch set, the desire is for all of the vector insert and extract support to be in vsx.md, as there is a longer term plan to re-work this support for PPC.

The first patch moves the existing extract support in altivec.md to vsx.md. Additionally, the documentation for the existing vector extract builtins has been updated to match the latest documentation and builtin names in the code. Specifically, the builtin name vec_extractr has been changed to vec_extracth. The description of the two builtins has been changed to match the latest description of the builtins, with a few minor edits to address typos in the descriptions.

The subsequent patches add additional vector insert, vector replace, vector shift, vector splat and vector blend builtin support.

Carl Love
[PATCH 4/6] rs6000, Add vector shift double builtin support
GCC maintainers:

The following patch adds support for the vector shift double builtins for RFC2609.

The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) and Mambo with no regression errors.

Please let me know if this patch is acceptable for the mainline branch. Thanks.

Carl Love
---

gcc/ChangeLog

2020-05-30  Carl Love

	* config/rs6000/altivec.h: Add define for vec_sldb and vec_srdb.
	* config/rs6000/altivec.md: Add unspec definitions UNSPEC_SLDB and UNSPEC_SRDB.
	(define_int_attr): Add SLDB_LR attribute.
	(define_int_iterator): Add VSHIFT_DBL_LR iterator.
	(define_insn): Add vsdb_.
	* config/rs6000/rs6000-builtin.def (BU_FUTURE_V_3): Add definitions for VSLDB_V16QI, VSLDB_V8HI, VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI and VSRDB_V2DI.
	(BU_FUTURE_OVERLOAD_3): Add overload definitions for SLDB and SRDB.
	* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add entries for FUTURE_BUILTIN_VEC_SLDB and FUTURE_BUILTIN_VEC_SRDB.
	(rs6000_expand_ternop_builtin): Add else if clause for CODE_FOR_vsldb_v16qi, CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di, CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si, CODE_FOR_vsrdb_v2di.
	* doc/extend.texi: Add description for vec_sldb and vec_srdb.
	* testsuite/gcc.target/powerpc/vec-shift-double-runnable.c: Add runnable test case.

gcc/ChangeLog

2020-05-26  Carl Love

	* config/rs6000/altivec.h: Add define for vec_sldb and vec_srdb.
	* config/rs6000/altivec.md: Add unspec definitions UNSPEC_SLDB and UNSPEC_SRDB.
	(define_int_attr): Add SLDB_LR attribute.
	(define_int_iterator): Add VSHIFT_DBL_LR iterator.
	(define_insn): Add vsdb_.
	* config/rs6000/rs6000-builtin.def (BU_FUTURE_V_3): Add definitions for VSLDB_V16QI, VSLDB_V8HI, VSLDB_V4SI, VSLDB_V2DI, VSRDB_V16QI, VSRDB_V8HI, VSRDB_V4SI and VSRDB_V2DI.
	(BU_FUTURE_OVERLOAD_3): Add overload definitions for SLDB and SRDB.
	* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add entries for FUTURE_BUILTIN_VEC_SLDB and FUTURE_BUILTIN_VEC_SRDB.
	(rs6000_expand_ternop_builtin): Add else if clause for CODE_FOR_vsldb_v16qi, CODE_FOR_vsldb_v8hi, CODE_FOR_vsldb_v4si, CODE_FOR_vsldb_v2di, CODE_FOR_vsrdb_v16qi, CODE_FOR_vsrdb_v8hi, CODE_FOR_vsrdb_v4si, CODE_FOR_vsrdb_v2di.
	* doc/extend.texi: Add description for vec_sldb and vec_srdb.
	* testsuite/gcc.target/powerpc/vec-shift-double-runnable.c: Add runnable test case.
---
 gcc/config/rs6000/altivec.h | 2 +
 gcc/config/rs6000/altivec.md | 18 +
 gcc/config/rs6000/rs6000-builtin.def | 11 +
 gcc/config/rs6000/rs6000-call.c | 70
 gcc/doc/extend.texi | 53 +++
 .../powerpc/vec-shift-double-runnable.c | 384 ++
 6 files changed, 538 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 435ffb8158f..0be68892aad 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -703,6 +703,8 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c)
 #define vec_replace_elt(a, b, c) __builtin_vec_replace_elt (a, b, c)
 #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c)
+#define vec_sldb(a, b, c) __builtin_vec_sldb (a, b, c)
+#define vec_srdb(a, b, c) __builtin_vec_srdb (a, b, c)
 #define vec_gnb(a, b) __builtin_vec_gnb (a, b)
 #define vec_clrl(a, b) __builtin_vec_clrl (a, b)
 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2fadb442eca..de79ae22fd4 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -171,6 +171,8 @@
    UNSPEC_XXEVAL
    UNSPEC_VSTRIR
    UNSPEC_VSTRIL
+   UNSPEC_SLDB
+   UNSPEC_SRDB
 ])
 
 (define_c_enum "unspecv"
@@ -781,6 +783,22 @@
   DONE;
 })
 
+;; Map UNSPEC_SLDB to "l" and UNSPEC_SRDB to "r".
+(define_int_attr SLDB_LR [(UNSPEC_SLDB "l")
+                          (UNSPEC_SRDB "r")])
+
+(define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
+
+(define_insn "vsdb_"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+        (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
+                     (match_operand:VI2 2 "register_operand" "v")
+                     (match_operand:QI 3 "const_0_to_12_operand" "n")]
+                    VSHIFT_DBL_LR))]
+  "TARGET_FUTURE"
+  "vsdbi %0,%1,%2,%3"
+  [(set_attr "type" "vecsimple")])
+
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
         (unspec:VIshort
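For readers unfamiliar with a "shift double" operation, the following portable sketch models what vec_sldb and vec_srdb are expected to compute. It is only an illustration under stated assumptions, not the patch's implementation: `V128`, `model_sldb` and `model_srdb` are hypothetical names, the two inputs are treated as one 256-bit value with the first operand on the left, and the shift count is assumed to be taken from the low-order three bits (0 to 7). That range is my understanding of the builtin description and is not taken from the predicate used in the pattern above. Element and bit ordering on a real target is endian-sensitive.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit vector model: hi holds the leftmost
// (most significant) 64 bits, lo the rightmost 64 bits.
struct V128 {
  uint64_t hi;
  uint64_t lo;
};

// Model of "shift left double by bits": shift the 256-bit value a||b
// left by sh bits and keep the leftmost 128 bits.
V128 model_sldb(V128 a, V128 b, unsigned sh) {
  sh &= 7;                 // assumption: only the low three bits are used
  if (sh == 0)
    return a;              // avoid an undefined 64-bit shift by 64
  return {(a.hi << sh) | (a.lo >> (64 - sh)),
          (a.lo << sh) | (b.hi >> (64 - sh))};
}

// Model of "shift right double by bits": shift the 256-bit value a||b
// right by sh bits and keep the rightmost 128 bits.
V128 model_srdb(V128 a, V128 b, unsigned sh) {
  sh &= 7;                 // assumption: only the low three bits are used
  if (sh == 0)
    return b;
  return {(b.hi >> sh) | (a.lo << (64 - sh)),
          (b.lo >> sh) | (b.hi << (64 - sh))};
}
```

The point of the "double" form is that bits shifted out of one operand are filled from the neighbouring operand rather than with zeros, which is what makes it usable as a building block for wider shifts.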
[PATCH 2/6] rs6000 Add vector insert builtin support
GCC maintainers:

This patch adds support for vec_insertl and vec_inserth builtins.

The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) and Mambo with no regression errors.

Please let me know if this patch is acceptable for the mainline branch. Thanks.

Carl Love
--

gcc/ChangeLog

2020-05-30  Carl Love

	* config/rs6000/altivec.h: Add define vec_insertl, vec_inserth.
	* config/rs6000/rs6000-builtin.def (BU_FUTURE_V_3): Add definition for VINSERTGPRBL, VINSERTGPRHL, VINSERTGPRWL, VINSERTGPRDL, VINSERTVPRBL, VINSERTVPRHL, VINSERTVPRWL, VINSERTGPRBR, VINSERTGPRHR, VINSERTGPRWR, VINSERTGPRDR, VINSERTVPRBR, VINSERTVPRHR, VINSERTVPRWR.
	(BU_FUTURE_OVERLOAD_3): Add definition for INSERTL, INSERTH.
	* config/rs6000/rs6000-call.c (FUTURE_BUILTIN_VEC_INSERTL): Add overloaded argument declarations.
	(FUTURE_BUILTIN_VEC_INSERTH): Add overloaded argument declarations.
	(builtin_function_type): Add case entries for FUTURE_BUILTIN_VINSERTGPRBL, FUTURE_BUILTIN_VINSERTGPRHL, FUTURE_BUILTIN_VINSERTGPRWL, FUTURE_BUILTIN_VINSERTGPRDL, FUTURE_BUILTIN_VINSERTVPRBL, FUTURE_BUILTIN_VINSERTVPRHL, FUTURE_BUILTIN_VINSERTVPRWL.
	* config/rs6000/vsx.md (define_c_enum): Add UNSPEC_INSERTL, UNSPEC_INSERTR.
	(define_expand): Add vinsertvl_, vinsertvr_, vinsertgl_, vinsertgr_, mode is VI2.
	(define_insn): vinsertvl_internal_, vinsertvr_internal_, vinsertgl_internal_, vinsertgr_internal_, mode VEC_I.
	* doc/extend.texi: Add documentation for vec_insertl, vec_inserth.
	* gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c: New test case.
--- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/rs6000-builtin.def | 18 + gcc/config/rs6000/rs6000-call.c | 51 +++ gcc/config/rs6000/vsx.md | 110 ++ gcc/doc/extend.texi | 68 .../powerpc/vec-insert-word-runnable.c| 345 ++ 6 files changed, 594 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-insert-word-runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 0a7e8ab3647..936aeb1ee09 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -699,6 +699,8 @@ __altivec_scalar_pred(vec_any_nle, /* Overloaded built-in functions for future architecture. */ #define vec_extractl(a, b, c) __builtin_vec_extractl (a, b, c) #define vec_extracth(a, b, c) __builtin_vec_extracth (a, b, c) +#define vec_insertl(a, b, c) __builtin_vec_insertl (a, b, c) +#define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 8b1ddb00045..c5bd4f86555 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2627,6 +2627,22 @@ BU_FUTURE_V_3 (VEXTRACTHR, "vextduhvhx", CONST, vextractrv8hi) BU_FUTURE_V_3 (VEXTRACTWR, "vextduwvhx", CONST, vextractrv4si) BU_FUTURE_V_3 (VEXTRACTDR, "vextddvhx", CONST, vextractrv2di) +BU_FUTURE_V_3 (VINSERTGPRBL, "vinsgubvlx", CONST, vinsertgl_v16qi) +BU_FUTURE_V_3 (VINSERTGPRHL, "vinsguhvlx", CONST, vinsertgl_v8hi) +BU_FUTURE_V_3 (VINSERTGPRWL, "vinsguwvlx", CONST, vinsertgl_v4si) +BU_FUTURE_V_3 (VINSERTGPRDL, "vinsgudvlx", CONST, vinsertgl_v2di) +BU_FUTURE_V_3 (VINSERTVPRBL, "vinsvubvlx", CONST, vinsertvl_v16qi) +BU_FUTURE_V_3 (VINSERTVPRHL, "vinsvuhvlx", CONST, vinsertvl_v8hi) +BU_FUTURE_V_3 (VINSERTVPRWL, "vinsvuwvlx", CONST, vinsertvl_v4si) + +BU_FUTURE_V_3 (VINSERTGPRBR, "vinsgubvrx", CONST, vinsertgr_v16qi) +BU_FUTURE_V_3 (VINSERTGPRHR, "vinsguhvrx", CONST, 
vinsertgr_v8hi) +BU_FUTURE_V_3 (VINSERTGPRWR, "vinsguwvrx", CONST, vinsertgr_v4si) +BU_FUTURE_V_3 (VINSERTGPRDR, "vinsgudvrx", CONST, vinsertgr_v2di) +BU_FUTURE_V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi) +BU_FUTURE_V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi) +BU_FUTURE_V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si) + BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi) BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi) BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi) @@ -2646,6 +2662,8 @@ BU_FUTURE_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm") BU_FUTURE_OVERLOAD_3 (EXTRACTL, "extractl") BU_FUTURE_OVERLOAD_3 (EXTRACTH, "extracth") +BU_FUTURE_OVERLOAD_3 (INSERTL, "insertl") +BU_FUTURE_OVERLOAD_3 (INSERTH, "inserth") BU_FUTURE_OVERLOAD_1 (VSTRIR, "strir") BU_FUTURE_OVERLOAD_1 (VSTRIL, "stril") diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index 0ac8054d030..a265e30d1d9 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc
[PATCH 5/6] rs6000, Add vector splat builtin support
GCC maintainers: The following patch adds support for the vec_splati, vec_splatid and vec_splati_ins builtins. Note, this patch adds support for instructions that take a 32-bit immediate value that represents a floating point value. This support adds new predicates and a support function to properly handle the immediate value. The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) with no regression errors. The test case was compiled on a Power 9 system and then tested on Mambo. Please let me know if this patch is acceptable for the mainline branch. Thanks. Carl Love gcc/ChangeLog 2020-05-30 Carl Love * config/rs6000/altivec.h: Add define for vec_splati, vec_splatid and vec_splati_ins. * config/rs6000/vsx.md: Add UNSPEC_XXSPLTIW, UNSPEC_XXSPLTID and UNSPEC_XXSPLTI32DX. (define_insn): Add vxxspltiw_v4si, vxxspltiw_v4sf_inst, vxxspltidp_v2df_inst, vxxsplti32dx_v4si_inst, and vxxsplti32dx_v4sf_inst. (define_expand): vxxspltiw_v4sf, vxxspltidp_v2df, vxxsplti32dx_v4si, vxxsplti32dx_v4sf. * config/rs6000/predicates: Add predicates u1bit_cint_operand, s32bit_cint_operand, c32bit_cint_operand, and f32bit_const_operand. * config/rs6000/rs6000-builtin.def (BU_FUTURE_V_1): Add definitions for VXXSPLTIW_V4SI, VXXSPLTIW_V4SF and VXXSPLTID. (BU_FUTURE_V_3): Add definitions for VXXSPLTI32DX_V4SI and VXXSPLTI32DX_V4SF. (BU_FUTURE_OVERLOAD_1): Add definitions XXSPLTIW and XXSPLTID. (BU_FUTURE_OVERLOAD_3): Add definition XXSPLTI32DX. * config/rs6000/rs6000-call.c: Add overloaded definitions for FUTURE_BUILTIN_VEC_XXSPLTIW, FUTURE_BUILTIN_VEC_XXSPLTID and FUTURE_BUILTIN_VEC_XXSPLTI32DX. * config/rs6000/rs6000-protos.h: Add prototype definition for rs6000_constF32toI32. (builtin_function_type): Add cases for FUTURE_BUILTIN_VXXSPLTI32DX_V4SI and FUTURE_BUILTIN_VXXSPLTI32DX_V4SF. * config/rs6000/rs6000.c: Add function rs6000_constF32toI32. * config/doc/extend.texi: Add documentation for vec_splati, vec_splatid, and vec_splati_ins. 
* testsuite/gcc.target/powerpc/vec-splati-runnable: New test. --- gcc/config/rs6000/altivec.h | 3 + gcc/config/rs6000/altivec.md | 109 + gcc/config/rs6000/predicates.md | 20 +++ gcc/config/rs6000/rs6000-builtin.def | 13 ++ gcc/config/rs6000/rs6000-call.c | 19 +++ gcc/config/rs6000/rs6000-protos.h | 1 + gcc/config/rs6000/rs6000.c| 16 ++ gcc/doc/extend.texi | 35 + .../gcc.target/powerpc/vec-splati-runnable.c | 145 ++ 9 files changed, 361 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splati- runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 0be68892aad..9ed41b1cbf1 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -705,6 +705,9 @@ __altivec_scalar_pred(vec_any_nle, #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c) #define vec_sldb(a, b, c) __builtin_vec_sldb (a, b, c) #define vec_srdb(a, b, c) __builtin_vec_srdb (a, b, c) +#define vec_splati(a) __builtin_vec_xxspltiw (a) +#define vec_splatid(a) __builtin_vec_xxspltid (a) +#define vec_splati_ins(a, b, c)__builtin_vec_xxsplti32dx (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index de79ae22fd4..47e8148029b 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -173,6 +173,9 @@ UNSPEC_VSTRIL UNSPEC_SLDB UNSPEC_SRDB + UNSPEC_XXSPLTIW + UNSPEC_XXSPLTID + UNSPEC_XXSPLTI32DX ]) (define_c_enum "unspecv" @@ -799,6 +802,112 @@ "vsdbi %0,%1,%2,%3" [(set_attr "type" "vecsimple")]) +(define_insn "vxxspltiw_v4si" + [(set (match_operand:V4SI 0 "register_operand" "=wa") + (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")] +UNSPEC_XXSPLTIW))] + "TARGET_FUTURE" + "xxspltiw %x0,%1" + [(set_attr "type" "vecsimple")]) + +(define_expand "vxxspltiw_v4sf" + [(set (match_operand:V4SF 0 "register_operand" "=wa") + (unspec:V4SF [(match_operand:SF 1 
"f32bit_const_operand" "n")] +UNSPEC_XXSPLTIW))] + "TARGET_FUTURE" +{ + long long value = rs6000_constF32toI32 (operands[1]); + emit_insn (gen_vxxspltiw_v4sf_inst (operands[0], GEN_INT (value))); + DONE; +}) + +(define_insn "vxxspltiw_v4sf_inst" + [(set (match_operand:V4SF 0 "register_operand" "=wa") + (unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")] +UNSPEC_XXSPLTIW))] + "TARGET_
[PATCH 1/6] rs6000, Update support for vec_extract
GCC maintainers: Move the existing vector extract support in altivec.md to vsx.md so all of the vector insert and extract support is in the same file. The patch also updates the name of the builtins and descriptions for the builtins in the documentation file so they match the approved builtin names and descriptions. The patch does not make any functional changes. Please let me know if the changes are acceptable for the mainline branch. Thanks. Carl Love -- gcc/ChangeLog 2020-05-30 Carl Love * config/rs6000/altivec.md: Move UNSPEC_EXTRACTL, UNSPEC_EXTRACTR declarations to gcc/config/rs6000/vsx.md. (define_expand): Move vextractl and vextractr to gcc/config/rs6000/vsx.md. (define_insn): Move vextractl_internal and vextractr_internal to gcc/config/rs6000/vsx.md. * config/rs6000/vsx.md: Code moved from file config/rs6000/altivec.md. * gcc/doc/extend.texi: Update documentation for vec_extractl. Replace builtin name vec_extractr with vec_extracth. Update description of vec_extracth. --- gcc/config/rs6000/altivec.md | 64 --- gcc/config/rs6000/vsx.md | 66 gcc/doc/extend.texi | 73 +--- 3 files changed, 101 insertions(+), 102 deletions(-) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 792ca4f488e..2fadb442eca 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -171,8 +171,6 @@ UNSPEC_XXEVAL UNSPEC_VSTRIR UNSPEC_VSTRIL - UNSPEC_EXTRACTL - UNSPEC_EXTRACTR ]) (define_c_enum "unspecv" @@ -183,8 +181,6 @@ UNSPECV_DSS ]) -;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops -(define_mode_iterator VI2 [V4SI V8HI V16QI V2DI]) ;; Short vec int modes (define_mode_iterator VIshort [V8HI V16QI]) ;; Longer vec int modes for rotate/mask ops @@ -785,66 +781,6 @@ DONE; }) -(define_expand "vextractl" - [(set (match_operand:V2DI 0 "altivec_register_operand") - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") - (match_operand:VI2 2 "altivec_register_operand") - (match_operand:SI 3 "register_operand")] 
-UNSPEC_EXTRACTL))] - "TARGET_FUTURE" -{ - if (BYTES_BIG_ENDIAN) -{ - emit_insn (gen_vextractl_internal (operands[0], operands[1], - operands[2], operands[3])); - emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); -} - else -emit_insn (gen_vextractr_internal (operands[0], operands[2], -operands[1], operands[3])); - DONE; -}) - -(define_insn "vextractl_internal" - [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") - (match_operand:VEC_I 2 "altivec_register_operand" "v") - (match_operand:SI 3 "register_operand" "r")] -UNSPEC_EXTRACTL))] - "TARGET_FUTURE" - "vextvlx %0,%1,%2,%3" - [(set_attr "type" "vecsimple")]) - -(define_expand "vextractr" - [(set (match_operand:V2DI 0 "altivec_register_operand") - (unspec:V2DI [(match_operand:VI2 1 "altivec_register_operand") - (match_operand:VI2 2 "altivec_register_operand") - (match_operand:SI 3 "register_operand")] -UNSPEC_EXTRACTR))] - "TARGET_FUTURE" -{ - if (BYTES_BIG_ENDIAN) -{ - emit_insn (gen_vextractr_internal (operands[0], operands[1], - operands[2], operands[3])); - emit_insn (gen_xxswapd_v2di (operands[0], operands[0])); -} - else -emit_insn (gen_vextractl_internal (operands[0], operands[2], -operands[1], operands[3])); - DONE; -}) - -(define_insn "vextractr_internal" - [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") - (unspec:V2DI [(match_operand:VEC_I 1 "altivec_register_operand" "v") - (match_operand:VEC_I 2 "altivec_register_operand" "v") - (match_operand:SI 3 "register_operand" "r")] -UNSPEC_EXTRACTR))] - "TARGET_FUTURE" - "vextvrx %0,%1,%2,%3" - [(set_attr "type" "vecsimple")]) - (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") (unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")] diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 2a28215ac5b..51ffe2d2000 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -344,8 +344,13 @@ 
UNSPEC_VSX_FIRST_MISMATCH_INDEX UNSPEC_VSX_FIRST_MISMATCH_EOS_INDEX UNSPEC_XXGENPCV + UNSPEC_EXTRACTL + UNSPEC_EXTRACTR ]) +;; Like VI, defined in v
[PATCH 6/6] rs6000 Add vector blend, permute builtin support
GCC maintainers: The following patch adds support for the vec_blendv and vec_permx builtins. The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) with no regression errors. The test cases were compiled on a Power 9 system and then tested on Mambo. Please let me know if this patch is acceptable for the mainline branch. Thanks. Carl Love --- rs6000 RFC2609 vector blend, permute instructions gcc/ChangeLog 2020-05-30 Carl Love * config/rs6000/altivec.h: Add define for vec_blendv and vec_permx. * config/rs6000/altivec.md: Add unspec UNSPEC_XXBLEND, UNSPEC_XXPERMX. New define_mode VM3. New define_attr VM3_char. New define_insn xxblend_ mode is VM3. New define_expand xxpermx. New define_insn xxpermx_inst. * config/rs6000/rs6000-builtin.def (BU_FUTURE_V_3): New definitions VXXBLEND_V16QI, VXXBLEND_V8HI, VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF. (BU_FUTURE_OVERLOAD_3): New definition XXBLEND, (BU_FUTURE_OVERLOAD_4): New definition XXPERMX. * config/rs6000/rs6000-c.c: (altivecaltivec_resolve_overloaded_builtin): Add if case support for FUTURE_BUILTIN_VXXPERMX * config/rs6000/rs6000-call.c: Define overloaded arguments for FUTURE_BUILTIN_VXXBLEND_V16QI, FUTURE_BUILTIN_VXXBLEND_V8HI, FUTURE_BUILTIN_VXXBLEND_V4SI, FUTURE_BUILTIN_VXXBLEND_V2DI, FUTURE_BUILTIN_VXXBLEND_V4SF, FUTURE_BUILTIN_VXXBLEND_V2DF, FUTURE_BUILTIN_VXXPERMX. (rs6000_expand_quaternop_builtin): Add if case for CODE_FOR_xxpermx. (builtin_quaternary_function_type): Add v16uqi_type and xxpermx_type variables, case for FUTURE_BUILTIN_VXXPERMX. (builtin_function_type): Add case for FUTURE_BUILTIN_VXXBLEND_V16QI, FUTURE_BUILTIN_VXXBLEND_V8HI, FUTURE_BUILTIN_VXXBLEND_V4SI, FUTURE_BUILTIN_VXXBLEND_V2DI. * doc/extend.texi: Add documentation for vec_blendv and vec_permx. testsuite/gcc.target/powerpc/vec-blend-runnable.c: New test. testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c: New test. 
--- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/altivec.md | 82 + gcc/config/rs6000/rs6000-builtin.def | 13 + gcc/config/rs6000/rs6000-c.c | 25 +- gcc/config/rs6000/rs6000-call.c | 94 ++ gcc/doc/extend.texi | 62 .../gcc.target/powerpc/vec-blend-runnable.c | 276 .../powerpc/vec-permute-ext-runnable.c| 294 ++ 8 files changed, 843 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-blend- runnable.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-permute-ext- runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 9ed41b1cbf1..1b532effebe 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -708,6 +708,8 @@ __altivec_scalar_pred(vec_any_nle, #define vec_splati(a) __builtin_vec_xxspltiw (a) #define vec_splatid(a) __builtin_vec_xxspltid (a) #define vec_splati_ins(a, b, c)__builtin_vec_xxsplti32dx (a, b, c) +#define vec_blendv(a, b, c)__builtin_vec_xxblend (a, b, c) +#define vec_permx(a, b, c, d) __builtin_vec_xxpermx (a, b, c, d) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 47e8148029b..92c52eae78d 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -176,6 +176,8 @@ UNSPEC_XXSPLTIW UNSPEC_XXSPLTID UNSPEC_XXSPLTI32DX + UNSPEC_XXBLEND + UNSPEC_XXPERMX ]) (define_c_enum "unspecv" @@ -218,6 +220,21 @@ (KF "FLOAT128_VECTOR_P (KFmode)") (TF "FLOAT128_VECTOR_P (TFmode)")]) +;; Like VM2, just do char, short, int, long, float and double +(define_mode_iterator VM3 [V4SI + V8HI + V16QI + V4SF + V2DF + V2DI]) + +(define_mode_attr VM3_char [(V2DI "d") + (V4SI "w") + (V8HI "h") + (V16QI "b") + (V2DF "d") + (V4SF "w")]) + ;; Map the Vector convert single precision to double precision for integer ;; versus floating point (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")]) @@ -908,6 +925,71 @@ "xxsplti32dx %x0,%1,%2" 
[(set_attr "type" "vecsimple")]) +(define_insn "xxblend_" + [(set (match_operand:VM3 0 "register_operand" "=wa") + (unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa") +(match
[PATCH 3/6] rs6000, Add vector replace builtin support
GCC maintainers: The following patch adds support for builtins vec_replace_elt and vec_replace_unaligned. The patch has been compiled and tested on powerpc64le-unknown-linux-gnu (Power 9 LE) and mambo with no regression errors. Please let me know if this patch is acceptable for the mainline branch. Thanks. Carl Love --- gcc/ChangeLog 2020-05-30 Carl Love * config/rs6000/altivec.h: Add define for vec_replace_elt and vec_replace_unaligned. * config/rs6000/vsx.md: Add unspec UNSPEC_REPLACE_ELT and UNSPEC_REPLACE_UN. Add mode iterator REPLACE_ELT. Add mode attributes REPLACE_ELT_atr, REPLACE_ELT_inst, REPLACE_ELT_char, REPLACE_ELT_sh, REPLACE_ELT_max. Add define_expand vreplace_elt_, mode REPLACE_ELT. Add define_expand vreplace_un_, mode REPLACE_ELT. Add define_insn vreplace_elt__inst, mode REPLACE_ELT. * config/rs6000/rs6000-builtin.def (BU_FUTURE_V_3): Add VREPLACE_ELT_V4SI, VREPLACE_ELT_UV4SI, VREPLACE_ELT_V4SF, VREPLACE_ELT_UV2DI, VREPLACE_ELT_V2DF,VREPLACE_UN_V4SI, VREPLACE_UN_UV4SI, VREPLACE_UN_V4SF, VREPLACE_UN_V2DI, VREPLACE_UN_UV2DI, VREPLACE_UN_V2DF. (BU_FUTURE_OVERLOAD_3): Add REPLACE_ELT, REPLACE_UN. * config/rs6000/rs6000-call.c: Add FUTURE_BUILTIN_VEC_REPLACE_ELT, FUTURE_BUILTIN_VEC_REPLACE_UN specifications. (rs6000_expand_ternop_builtin): Add 3rd argument checks for CODE_FOR_vreplace_elt_v4si, CODE_FOR_vreplace_elt_v4sf, CODE_FOR_vreplace_un_v4si, CODE_FOR_vreplace_un_v4sf. (builtin_function_type): Add case statements for FUTURE_BUILTIN_VREPLACE_ELT_UV4SI, FUTURE_BUILTIN_VREPLACE_ELT_UV2DI, FUTURE_BUILTIN_VREPLACE_UN_UV4SI, FUTURE_BUILTIN_VREPLACE_UN_UV2DI. * doc/extend.texi: Add description for vec_replace_elt and vec_replace_unaligned builtins. * testsuite/gcc.target/powerpc/vec-replace-word.c: Add new test. 
--- gcc/config/rs6000/altivec.h | 2 + gcc/config/rs6000/rs6000-builtin.def | 16 + gcc/config/rs6000/rs6000-call.c | 59 gcc/config/rs6000/vsx.md | 61 gcc/doc/extend.texi | 50 +++ .../powerpc/vec-replace-word-runnable.c | 288 ++ 6 files changed, 476 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-replace-word- runnable.c diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h index 936aeb1ee09..435ffb8158f 100644 --- a/gcc/config/rs6000/altivec.h +++ b/gcc/config/rs6000/altivec.h @@ -701,6 +701,8 @@ __altivec_scalar_pred(vec_any_nle, #define vec_extracth(a, b, c) __builtin_vec_extracth (a, b, c) #define vec_insertl(a, b, c) __builtin_vec_insertl (a, b, c) #define vec_inserth(a, b, c) __builtin_vec_inserth (a, b, c) +#define vec_replace_elt(a, b, c) __builtin_vec_replace_elt (a, b, c) +#define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c) #define vec_gnb(a, b) __builtin_vec_gnb (a, b) #define vec_clrl(a, b) __builtin_vec_clrl (a, b) diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index c5bd4f86555..91821f29a6f 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2643,6 +2643,20 @@ BU_FUTURE_V_3 (VINSERTVPRBR, "vinsvubvrx", CONST, vinsertvr_v16qi) BU_FUTURE_V_3 (VINSERTVPRHR, "vinsvuhvrx", CONST, vinsertvr_v8hi) BU_FUTURE_V_3 (VINSERTVPRWR, "vinsvuwvrx", CONST, vinsertvr_v4si) +BU_FUTURE_V_3 (VREPLACE_ELT_V4SI, "vreplace_v4si", CONST, vreplace_elt_v4si) +BU_FUTURE_V_3 (VREPLACE_ELT_UV4SI, "vreplace_uv4si", CONST, vreplace_elt_v4si) +BU_FUTURE_V_3 (VREPLACE_ELT_V4SF, "vreplace_v4sf", CONST, vreplace_elt_v4sf) +BU_FUTURE_V_3 (VREPLACE_ELT_V2DI, "vreplace_v2di", CONST, vreplace_elt_v2di) +BU_FUTURE_V_3 (VREPLACE_ELT_UV2DI, "vreplace_uv2di", CONST, vreplace_elt_v2di) +BU_FUTURE_V_3 (VREPLACE_ELT_V2DF, "vreplace_v2df", CONST, vreplace_elt_v2df) + +BU_FUTURE_V_3 (VREPLACE_UN_V4SI, "vreplace_un_v4si", CONST, vreplace_un_v4si) 
+BU_FUTURE_V_3 (VREPLACE_UN_UV4SI, "vreplace_un_uv4si", CONST, vreplace_un_v4si) +BU_FUTURE_V_3 (VREPLACE_UN_V4SF, "vreplace_un_v4sf", CONST, vreplace_un_v4sf) +BU_FUTURE_V_3 (VREPLACE_UN_V2DI, "vreplace_un_v2di", CONST, vreplace_un_v2di) +BU_FUTURE_V_3 (VREPLACE_UN_UV2DI, "vreplace_un_uv2di", CONST, vreplace_un_v2di) +BU_FUTURE_V_3 (VREPLACE_UN_V2DF, "vreplace_un_v2df", CONST, vreplace_un_v2df) + BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi) BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi) BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi) @@ -2664,6 +2678,8 @@ BU_FUTURE_OVERLOAD_3 (EXTRACTL, "extractl") BU_FUTURE_OVERLOAD_3 (EXTRACTH, "extracth") BU_FUTURE_OVERLOAD_3 (INSERTL, "insertl") BU_FUTURE_OVERLOAD_3 (INSERTH, "inserth") +BU_FUTURE_OVERLOAD_3
Re: [PATCH] Prefer simple case changes in spelling suggestions
> "David" == David Malcolm writes:

>> I tested this using the self-tests. A new self-test is also
>> included.

> Did the full DejaGnu testsuite get run? There are a lot of tests in it
> that make use of this code.

I didn't try it, but I can.

> The patch should probably update the leading comment to
> get_edit_distance.

Will do.

>> test_get_edit_distance_both_ways ("foo", "FOO", 3);
[...]

> If I'm reading things correctly, the patch here updates the existing
> tests to apply the BASE_COST scale factor, but I don't think it adds
> any direct checks of the cost of case-conversion. It would be good to
> add those.

It isn't obvious but the foo/FOO test did change.

Tom
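
For readers following the foo/FOO remark, here is a minimal sketch of a case-aware edit distance. It is not the GCC implementation: the values BASE_COST = 2 and CASE_COST = 1 are assumptions chosen to be consistent with the quoted test (three case changes costing 3), and transpositions are left out for brevity.

```python
# Illustrative only: a plain Levenshtein distance in which a pure case
# change is cheaper than any other edit.  BASE_COST and CASE_COST are
# assumed values, not taken from the patch under review.
BASE_COST = 2   # insertion, deletion, or substitution of a different letter
CASE_COST = 1   # substitution that only changes the case of a letter


def edit_distance(s, t):
    """Return the weighted edit distance between strings s and t."""
    # prev[j] holds the cost of turning the processed prefix of s into t[:j].
    prev = [j * BASE_COST for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * BASE_COST]
        for j, b in enumerate(t, 1):
            if a == b:
                sub = 0
            elif a.lower() == b.lower():
                sub = CASE_COST
            else:
                sub = BASE_COST
            cur.append(min(prev[j] + BASE_COST,      # delete a
                           cur[j - 1] + BASE_COST,   # insert b
                           prev[j - 1] + sub))       # substitute or keep
        prev = cur
    return prev[-1]
```

Under these assumed costs an unscaled test value such as 3 for foo/FOO can stay numerically the same while its meaning changes: it is now three cheap case changes rather than three full substitutions, which would cost 6.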
Re: [PATCH] favor bcrypt over wincrypt for the random generator on Windows
Steve Lhomme writes: > Hello, > > Any update on this ? This prevents libssp from being usable in UWP apps. > > (BTW the name of the old API is not wincrypt, the header, but CryptoAPI > or CAPI) Sorry for the slow review. I fear most global reviewers would have no idea whether the patch is right or not. Maybe Jon (cc:ed) could comment. Thanks, Richard > > On 2020-04-21 9:48, Steve Lhomme wrote: >> BCrypt is more modern and supported in Universal Apps, Wincrypt is not and >> CryptGenRandom is deprecated: >> https://docs.microsoft.com/en-us/windows/win32/api/wincrypt/nf-wincrypt-cryptgenrandom >> >> BCrypt is available since Vista >> https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptopenalgorithmprovider >> >> It requires linking with bcrypt rather than advapi32 for wincrypt. >> --- >> libssp/configure.ac | 16 >> libssp/ssp.c| 20 >> 2 files changed, 36 insertions(+) >> >> diff --git a/libssp/configure.ac b/libssp/configure.ac >> index f30f81c54f6..a39d9e9c992 100644 >> --- a/libssp/configure.ac >> +++ b/libssp/configure.ac >> @@ -158,6 +158,22 @@ else >> fi >> AC_SUBST(ssp_have_usable_vsnprintf) >> >> +AC_ARG_ENABLE(bcrypt, >> +AS_HELP_STRING([--disable-bcrypt], >> + [use bcrypt for random generator on Windows (otherwise wincrypt)]), >> + use_win_bcrypt=$enableval, >> + use_win_bcrypt=yes) >> +if test "x$use_win_bcrypt" != xno; then >> + case "$target_os" in >> +win32 | pe | mingw32*) >> + AC_CHECK_TYPES([BCRYPT_ALG_HANDLE],[ >> + LDFLAGS="$LDFLAGS -lbcrypt" >> +],[],[#include >> +#include ]) >> +;; >> + esac >> +fi >> + >> AM_PROG_LIBTOOL >> ACX_LT_HOST_FLAGS >> AC_SUBST(enable_shared) >> diff --git a/libssp/ssp.c b/libssp/ssp.c >> index 28f3e9cc64a..f07cc41fd4f 100644 >> --- a/libssp/ssp.c >> +++ b/libssp/ssp.c >> @@ -56,7 +56,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
>> If not, see >> to the console using "CONOUT$" */ >> #if defined (_WIN32) && !defined (__CYGWIN__) >> #include >> +#ifdef HAVE_BCRYPT_ALG_HANDLE >> +#include >> +#else >> #include >> +#endif >> # define _PATH_TTY "CONOUT$" >> #else >> # define _PATH_TTY "/dev/tty" >> @@ -77,6 +81,21 @@ __guard_setup (void) >> return; >> >> #if defined (_WIN32) && !defined (__CYGWIN__) >> +#ifdef HAVE_BCRYPT_ALG_HANDLE >> + BCRYPT_ALG_HANDLE algo = 0; >> + NTSTATUS err = BCryptOpenAlgorithmProvider(&algo, BCRYPT_RNG_ALGORITHM, >> + NULL, 0); >> + if (BCRYPT_SUCCESS(err)) >> +{ >> + if (BCryptGenRandom(algo, (BYTE *)&__stack_chk_guard, >> + sizeof (__stack_chk_guard), 0) && >> __stack_chk_guard != 0) >> +{ >> + BCryptCloseAlgorithmProvider(algo, 0); >> + return; >> +} >> + BCryptCloseAlgorithmProvider(algo, 0); >> +} >> +#else /* !HAVE_BCRYPT_ALG_HANDLE */ >> HCRYPTPROV hprovider = 0; >> if (CryptAcquireContext(&hprovider, NULL, NULL, PROV_RSA_FULL, >> CRYPT_VERIFYCONTEXT | CRYPT_SILENT)) >> @@ -89,6 +108,7 @@ __guard_setup (void) >> } >> CryptReleaseContext(hprovider, 0); >> } >> +#endif /* !HAVE_BCRYPT_ALG_HANDLE */ >> #else >> int fd = open ("/dev/urandom", O_RDONLY); >> if (fd != -1) >> -- >> 2.17.1 >>
Re: [PATCH] Fix unrecognised -mcpu target: armv7-a on arm-wrs-vxworks7 (PR95420)
On 31/05/2020 23:40, Iain Buclaw via Gcc-patches wrote:
> Hi,
>
> In the removal of arm-wrs-vxworks, the default cpu was updated from arm8
> to armv7-a, but this is not recognized as a valid -mcpu target. There
> is however generic-armv7-a, which was likely the intended cpu that
> should have been used instead.
>
> Tested by building a cross-compiler targetting arm-wrs-vxworks7, running
> make all-gcc and ensuring it succeeds.
>
> OK? This affects release/gcc-10 branch as well, so should be
> backported too.
>
> Regards
> Iain.
>
>
> gcc/ChangeLog:
>
> PR target/95420
> * config.gcc (arm-wrs-vxworks7*): Set default cpu to generic-armv7-a.
> ---
>  gcc/config.gcc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f544932fc39..06ad813ad39 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1193,7 +1193,7 @@ arm-wrs-vxworks7*)
>   tmake_file="${tmake_file} arm/t-arm arm/t-vxworks arm/t-bpabi"
>   tm_file="elfos.h arm/elf.h arm/bpabi.h arm/aout.h ${tm_file}"
>   tm_file="${tm_file} vx-common.h vxworks.h arm/vxworks.h"
> - target_cpu_cname="armv7-a"
> + target_cpu_cname="generic-armv7-a"
>   need_64bit_hwint=yes
>   ;;
>  arm*-*-freebsd*)# ARM FreeBSD EABI
>

OK all.

Thanks,

R.
[PATCH] c++: constrained lambda inside template [PR92633]
When regenerating a constrained lambda during instantiation of an enclosing template, we are forgetting to substitute into the lambda's constraints. Fix this by substituting through the constraints during tsubst_lambda_expr. Passes 'make check-c++', and also tested by building the testsuites of cmcstl2 and range-v3. Does this look OK to commit to master and to the 10 branch after a full bootstrap and regtest? gcc/cp/ChangeLog: PR c++/92633 PR c++/92838 * pt.c (tsubst_function_decl): Don't do set_constraints when regenerating a lambda. (tsubst_lambda_expr): Substitute into the lambda's constraints and do set_constraints here. gcc/testsuite/ChangeLog: PR c++/92633 PR c++/92838 * g++.dg/cpp2a/concepts-lambda11.C: New test. * g++.dg/cpp2a/concepts-lambda12.C: New test. --- gcc/cp/pt.c| 16 +++- gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C | 17 + gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C | 15 +++ 3 files changed, 47 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index df647af7b46..907ca879c73 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -13854,7 +13854,10 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain, don't substitute through the constraints; that's only done when they are checked. */ if (tree ci = get_constraints (t)) -set_constraints (r, ci); +/* Unless we're regenerating a lambda, in which case we'll set the + lambda's constraints in tsubst_lambda_expr. */ +if (!lambda_fntype) + set_constraints (r, ci); if (DECL_FRIEND_P (t) && DECL_FRIEND_CONTEXT (t)) SET_DECL_FRIEND_CONTEXT (r, @@ -19029,6 +19032,17 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) finish_member_declaration (fn); } + if (tree ci = get_constraints (oldfn)) + { + /* Substitute into the lambda's constraints. 
*/ + if (oldtmpl) + ++processing_template_decl; + ci = tsubst_constraint_info (ci, args, complain, in_decl); + if (oldtmpl) + --processing_template_decl; + set_constraints (fn, ci); + } + /* Let finish_function set this. */ DECL_DECLARED_CONSTEXPR_P (fn) = false; diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C new file mode 100644 index 000..dd9cd4e2344 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C @@ -0,0 +1,17 @@ +// PR c++/92838 +// { dg-do compile { target c++20 } } + +template +auto foo() +{ + [] () requires (N != 0) { }(); // { dg-error "no match" } + [] () requires (N == 0) { }(); + + [] () requires (N == M) { }(); // { dg-error "no match" } + [] () requires (N != M) { }(); +} + +void bar() +{ + foo<0>(); +} diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C new file mode 100644 index 000..2bc9fd0bb25 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C @@ -0,0 +1,15 @@ +// PR c++/92633 +// { dg-do compile { target c++20 } } + +template +concept different_than = !__is_same_as(A, B); + +template +auto diff(B) { +return [](different_than auto a) {}; +} + +int main() { +diff(42)(""); +diff(42)(42); // { dg-error "no match" } +} -- 2.27.0.rc1.5.gae92ac8ae3
Re: [PATCH] Fix unrecognised -mcpu target: armv7-a on arm-wrs-vxworks7 (PR95420)
Hello Iain,

> On 01 Jun 2020, at 00:40, Iain Buclaw wrote:
>
> Hi,
>
> In the removal of arm-wrs-vxworks, the default cpu was updated from arm8
> to armv7-a, but this is not recognized as a valid -mcpu target. There
> is however generic-armv7-a, which was likely the intended cpu that
> should have been used instead.

Yes, indeed.

> Tested by building a cross-compiler targetting arm-wrs-vxworks7, running
> make all-gcc and ensuring it succeeds.
>
> OK?

Yes, OK.

> This affects release/gcc-10 branch as well, so should be
> backported too.

Certainly. Could you please ?

Thanks!

Olivier
Re: [PATCH ping] ppc64 check for incompatible setting of minimal-toc
Greetings,

Curious if you've had a chance to look at this patch yet?

--Doug

On 5/18/20 4:02 PM, Douglas B Rupp wrote:

Greetings,

The attached patch is proposed for rs6000/linux64.h. The problem it addresses is that the current checking only tests whether the switch is present, not whether its setting is actually incompatible. For example:

$ powerpc64-linux-gnu-gcc -mcmodel=medium -mminimal-toc foo.c

is an incompatible set of switches; however,

$ powerpc64-linux-gnu-gcc -mcmodel=medium -mno-minimal-toc foo.c

is ok. Currently both are reported as incompatible.

--Douglas Rupp, AdaCore
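
The difference between testing for the existence of a switch and testing its setting can be sketched as follows. This is only a model of the behaviour described above, not the code in rs6000/linux64.h; the function names are hypothetical, and treating any explicit -mcmodel other than small as conflicting with an enabled minimal TOC is an assumption drawn from the two example command lines.

```python
# Illustrative model only; names are hypothetical and do not appear in GCC.

def minimal_toc_enabled(args):
    """Return the effective setting: the last of the two switches wins."""
    enabled = False
    for arg in args:
        if arg == "-mminimal-toc":
            enabled = True
        elif arg == "-mno-minimal-toc":
            enabled = False
    return enabled


def explicit_cmodel(args):
    """Return the last explicit -mcmodel= value, or None if none is given."""
    model = None
    for arg in args:
        if arg.startswith("-mcmodel="):
            model = arg[len("-mcmodel="):]
    return model


def existence_check(args):
    """The behaviour described as current: any mention of the switch counts."""
    mentioned = any(a in ("-mminimal-toc", "-mno-minimal-toc") for a in args)
    return explicit_cmodel(args) not in (None, "small") and mentioned


def setting_check(args):
    """The proposed behaviour: only an enabled minimal TOC conflicts."""
    return (explicit_cmodel(args) not in (None, "small")
            and minimal_toc_enabled(args))
```

With the existence test both example command lines are flagged; with the setting test only the -mminimal-toc one is.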
Cleanup global decl stream reference streaming, part 2
Hi,
this patch removes unnecessary ref tags and replaces them by one tag for all references to the global stream.

lto-bootstrapped/regtested x86_64-linux, committed.

Honza

gcc/ChangeLog:

2020-06-01 Jan Hubicka

* lto-streamer.h (enum LTO_tags): Remove LTO_field_decl_ref, LTO_function_decl_ref, LTO_label_decl_ref, LTO_namespace_decl_ref, LTO_result_decl_ref, LTO_type_decl_ref, LTO_type_ref, LTO_const_decl_ref, LTO_imported_decl_ref, LTO_translation_unit_decl_ref, LTO_global_decl_ref and LTO_namelist_decl_ref; add LTO_global_stream_ref.
* lto-streamer-in.c (lto_input_tree_ref): Simplify.
(lto_input_scc): Update.
(lto_input_tree_1): Update.
* lto-streamer-out.c (lto_indexable_tree_ref): Simplify.
* lto-streamer.c (lto_tag_name): Update.

diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index d77b4f5e9ff..5eaba7d16d4 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -316,34 +316,17 @@ lto_input_tree_ref (class lto_input_block *ib, class data_in *data_in,
   unsigned HOST_WIDE_INT ix_u;
   tree result = NULL_TREE;
 
-  lto_tag_check_range (tag, LTO_field_decl_ref, LTO_namelist_decl_ref);
-
-  switch (tag)
+  if (tag == LTO_ssa_name_ref)
     {
-    case LTO_ssa_name_ref:
       ix_u = streamer_read_uhwi (ib);
       result = (*SSANAMES (fn))[ix_u];
-      break;
-
-    case LTO_type_ref:
-    case LTO_field_decl_ref:
-    case LTO_function_decl_ref:
-    case LTO_type_decl_ref:
-    case LTO_namespace_decl_ref:
-    case LTO_global_decl_ref:
-    case LTO_result_decl_ref:
-    case LTO_const_decl_ref:
-    case LTO_imported_decl_ref:
-    case LTO_label_decl_ref:
-    case LTO_translation_unit_decl_ref:
-    case LTO_namelist_decl_ref:
+    }
+  else
+    {
+      gcc_checking_assert (tag == LTO_global_stream_ref);
       ix_u = streamer_read_uhwi (ib);
       result = (*data_in->file_data->current_decl_state
                 ->streams[LTO_DECL_STREAM])[ix_u];
-      break;
-
-    default:
-      gcc_unreachable ();
     }
 
   gcc_assert (result);
@@ -1485,7 +1468,7 @@ lto_input_scc (class lto_input_block *ib, class data_in *data_in,
         {
           enum LTO_tags tag = streamer_read_record_start (ib);
           if (tag == LTO_null
-              || (tag >= LTO_field_decl_ref && tag <= LTO_global_decl_ref)
+              || tag == LTO_global_stream_ref
               || tag == LTO_tree_pickle_reference
               || tag == LTO_integer_cst
               || tag == LTO_tree_scc
@@ -1549,7 +1532,7 @@ lto_input_tree_1 (class lto_input_block *ib, class data_in *data_in,
 
   if (tag == LTO_null)
     result = NULL_TREE;
-  else if (tag >= LTO_field_decl_ref && tag <= LTO_namelist_decl_ref)
+  else if (tag == LTO_global_stream_ref || tag == LTO_ssa_name_ref)
     {
       /* If TAG is a reference to an indexable tree, the next value
          in IB is the index into the table where we expect to find
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index a44ed0037ee..dfc4603d7ae 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -252,84 +252,18 @@ static void
 lto_indexable_tree_ref (struct output_block *ob, tree expr,
                         enum LTO_tags *tag, unsigned *index)
 {
-  enum tree_code code;
-  enum lto_decl_stream_e_t encoder;
-
   gcc_checking_assert (tree_is_indexable (expr));
 
-  if (TYPE_P (expr))
+  if (TREE_CODE (expr) == SSA_NAME)
     {
-      *tag = LTO_type_ref;
-      encoder = LTO_DECL_STREAM;
+      *tag = LTO_ssa_name_ref;
+      *index = SSA_NAME_VERSION (expr);
     }
   else
     {
-      code = TREE_CODE (expr);
-      switch (code)
-        {
-        case SSA_NAME:
-          *tag = LTO_ssa_name_ref;
-          *index = SSA_NAME_VERSION (expr);
-          return;
-          break;
-
-        case FIELD_DECL:
-          *tag = LTO_field_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case FUNCTION_DECL:
-          *tag = LTO_function_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case VAR_DECL:
-        case DEBUG_EXPR_DECL:
-          gcc_checking_assert (decl_function_context (expr) == NULL
-                               || TREE_STATIC (expr));
-          /* FALLTHRU */
-        case PARM_DECL:
-          *tag = LTO_global_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case CONST_DECL:
-          *tag = LTO_const_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case TYPE_DECL:
-          *tag = LTO_type_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case NAMESPACE_DECL:
-          *tag = LTO_namespace_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case LABEL_DECL:
-          *tag = LTO_label_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case RESULT_DECL:
-          *tag = LTO_result_decl_ref;
-          encoder = LTO_DECL_STREAM;
-          break;
-
-        case TRANSLAT
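
As a reading aid for the reader-side change, the two-way dispatch that replaces the per-kind tags can be modelled in a few lines. This is a toy model, not GCC code: the tables are plain Python lists standing in for SSANAMES (fn) and the LTO_DECL_STREAM table, and the function name is hypothetical.

```python
# Toy model of the simplified lto_input_tree_ref dispatch.  The two tag
# names come from the patch; everything else here is illustrative.
LTO_SSA_NAME_REF = "LTO_ssa_name_ref"
LTO_GLOBAL_STREAM_REF = "LTO_global_stream_ref"


def input_tree_ref(tag, index, ssa_names, global_stream):
    """Resolve an indexed reference using one of two tables."""
    if tag == LTO_SSA_NAME_REF:
        # SSA names are looked up in the per-function table.
        result = ssa_names[index]
    else:
        # Every other indexable tree now shares one tag and one table.
        assert tag == LTO_GLOBAL_STREAM_REF, tag
        result = global_stream[index]
    assert result is not None
    return result
```

The point of the model is only that the tag no longer encodes what kind of tree is referenced; the kind is recovered from the table entry itself.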
Re: [PATCH] coroutines: Fix missed ramp function return copy elision [PR95346].
On 6/1/20 4:46 AM, Iain Sandoe wrote:

Hi

Confusingly, "get_return_object ()" can do two things:

- Firstly it can provide the return object for the ramp function (as the name suggests).
- Secondly if the type of the ramp function is different from that of the get_return_object call, this is used as a single parameter to a CTOR for the ramp's return type.

In the first case we can rely on finish_return_stmt () to do the necessary processing for copy elision.

In the second case, we should have passed a prvalue to the CTOR as per the standard comment, but I had omitted the rvalue () call. Fixed thus.

tested on x86_64-darwin, x86_64-linux, powerpc64-linux
OK for master?
OK for 10.2?

ok for both, but I think there's an existing nit ...

thanks
Iain

gcc/cp/ChangeLog:

PR c++/95346
* coroutines.cc (morph_fn_to_coro): Ensure that the get-return-object is constructed correctly; When it is not the final return value, pass it to the CTOR of the return type as an rvalue, per the standard comment.

gcc/testsuite/ChangeLog:

PR c++/95346
* g++.dg/coroutines/pr95346.C: New test.
---
 gcc/cp/coroutines.cc | 70 +++
 gcc/testsuite/g++.dg/coroutines/pr95346.C | 26 +
 2 files changed, 71 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95346.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 7afa550037c..d1c2b437ade 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc

 {
-  args = make_tree_vector_single (gro);
-  arglist = &args;
+  vec *args = NULL;
+  vec **arglist = NULL;
+  if (!gro_is_void_p)
+    {
+      args = make_tree_vector_single (r);
+      arglist = &args;
+    }
+  r = build_special_member_call (NULL_TREE,
+                                 complete_ctor_identifier, arglist,
+                                 fn_return_type, LOOKUP_NORMAL,
+                                 tf_warning_or_error);
+  r = build_cplus_new (fn_return_type, r, tf_warning_or_error);

missing release_tree_vector (arg) call here?

--
Nathan Sidwell
Re: [PATCH] coroutines: co_returns are statements, not expressions.
On 6/1/20 4:56 AM, Iain Sandoe wrote:

Hi

This corrects a typo in the CO_RETURN_EXPR tree class.

Although it doesn’t fix any PR or regression - it seems to me that it would be sensible to apply this to 10.2 as well as master (or it’s an accident waiting to happen).

OK for master?
10.2 (after some bake)?
thanks
Iain

gcc/cp/ChangeLog:

* cp-tree.def (CO_RETURN_EXPR): Correct the class to use tcc_statement.

ok

---
 gcc/cp/cp-tree.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 1454802bf68..99851eb780f 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -594,9 +594,9 @@ DEFTREECODE (CO_YIELD_EXPR, "co_yield", tcc_expression, 2)
 
 /* The co_return expression is used to support coroutines.
 
    Op0 is the original expr, can be void (for use in diagnostics)
-   Op2 is the promise return_ call for Op0. */
+   Op1 is the promise return_ call for for the expression given. */
 
-DEFTREECODE (CO_RETURN_EXPR, "co_return", tcc_expression, 2)
+DEFTREECODE (CO_RETURN_EXPR, "co_return", tcc_statement, 2)
 
 /* Local variables:

--
Nathan Sidwell
Re: [IMPORTANT] ChangeLog related changes
On Mon, 25 May 2020 at 23:50, Jakub Jelinek via Gcc wrote: > > Hi! > > I've turned the strict mode of Martin Liška's hook changes, > which means that from now on no commits to the trunk or release branches > should be changing any ChangeLog files together with the other files, > ChangeLog entry should be solely in the commit message. > The DATESTAMP bumping script will be updating the ChangeLog files for you. > If somebody makes a mistake in that, please wait 24 hours (at least until > after 00:16 UTC after your commit) so that the script will create the > ChangeLog entries, and afterwards it can be fixed by adjusting the ChangeLog > files. But you can only touch the ChangeLog files in that case (and > shouldn't write a ChangeLog entry for that in the commit message). > > If anything goes wrong, please let me, other RMs and Martin Liška know. The libstdc++ manual is written in Docbook XML, but we commit both the XML and generated HTML pages to Git. Sometimes a small XML change can result in dozens of mechanical changes to the generated HTML files, which we record in the ChangeLog as: * doc/html/*: Regenerated. With the new checks we need to name every generated file individually. If we add that directory to the ignored_prefixes list, we won't need to name them. But then the doc/html/* entry will give an error, and changes to the HTML files can be committed without any ChangeLog entry. Should we just stop mentioning the HTML in the ChangeLog? We could do something like the attached patch, but it seems overkill for this one special case. 
diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py index 4f82b58f64b..add0defaeed 100755 --- a/contrib/gcc-changelog/git_commit.py +++ b/contrib/gcc-changelog/git_commit.py @@ -501,6 +501,7 @@ class GitCommit: assert folder_count == len(self.changelog_entries) mentioned_files = set() +libstdcxx_html_regenerated = False for entry in self.changelog_entries: if not entry.files: msg = 'ChangeLog must contain a file entry' @@ -508,16 +509,33 @@ class GitCommit: assert not entry.folder.endswith('/') for file in entry.files: if not self.is_changelog_filename(file): -mentioned_files.add(os.path.join(entry.folder, file)) +file = os.path.join(entry.folder, file) +if file == 'libstdc++-v3/doc/html/*': +libstdcxx_html_regenerated = True +else: +mentioned_files.add(file) cand = [x[0] for x in self.modified_files if not self.is_changelog_filename(x[0])] changed_files = set(cand) +if libstdcxx_html_regenerated: +libstdcxx_html_regenerated = False +for c in changed_files: +if c.startswith('libstdc++-v3/doc/html/'): +libstdcxx_html_regenerated = True +break +if not libstdcxx_html_regenerated: +self.errors.append(Error('No libstdc++ HTML changes found')) + for file in sorted(mentioned_files - changed_files): self.errors.append(Error('file not changed in a patch', file)) for file in sorted(changed_files - mentioned_files): if not self.in_ignored_location(file): -if file in self.new_files: +if file.startswith('libstdc++-v3/doc/html/'): +if not libstdcxx_html_regenerated: +msg = 'libstdc++ HTML changes not in ChangeLog' +self.errors.append(Error(msg, file)) +elif file in self.new_files: changelog_location = self.get_changelog_by_path(file) # Python2: we cannot use next(filter(...)) entries = filter(lambda x: x.folder == changelog_location,
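For readers following the proposal, the intended behaviour of the wildcard entry can be sketched in isolation. This is only an illustration under assumptions, not the code in git_commit.py: the function name check_mentioned_files, its two set arguments and the last error string are hypothetical, and the real checker's handling of ignored locations and new files is left out. The idea is that a "doc/html/*" entry is accepted only if at least one generated HTML file actually changed, and that changed HTML files are excused only if that entry is present.

```python
HTML_PREFIX = 'libstdc++-v3/doc/html/'
HTML_WILDCARD = HTML_PREFIX + '*'


def check_mentioned_files(mentioned, changed):
    """Return a list of (message, file) errors.

    `mentioned` is the set of paths named in the ChangeLog entry and
    `changed` is the set of paths touched by the commit. Both are
    hypothetical stand-ins for state kept by the real GitCommit class.
    """
    errors = []
    wildcard_used = HTML_WILDCARD in mentioned
    # The wildcard is not a real file, so keep it out of the set comparison.
    explicit = {f for f in mentioned if f != HTML_WILDCARD}
    html_changed = any(f.startswith(HTML_PREFIX) for f in changed)

    if wildcard_used and not html_changed:
        errors.append(('No libstdc++ HTML changes found', HTML_WILDCARD))
    for f in sorted(explicit - changed):
        errors.append(('file not changed in a patch', f))
    for f in sorted(changed - explicit):
        if f.startswith(HTML_PREFIX):
            # Generated pages are covered by the wildcard entry, if present.
            if not wildcard_used:
                errors.append(('libstdc++ HTML changes not in ChangeLog', f))
        else:
            # Placeholder message; the real checker does more work here.
            errors.append(('changed file not mentioned in a ChangeLog', f))
    return errors
```

With this shape, a commit that touches evolution.xml and api.html passes when the ChangeLog names the XML file plus the wildcard, and fails on api.html when the wildcard is omitted.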
[PATCH] contrib: Improve comments and error text
* gcc-changelog/git_commit.py (GitCommit.check_mentioned_files): Improve error text. OK for master? commit b0eb103fc6a8b12905ce8feea299e02048b7f820 Author: Jonathan Wakely Date: Mon Jun 1 18:28:35 2020 +0100 contrib: Improve comments and error text * gcc-changelog/git_commit.py (GitCommit.check_mentioned_files): Improve error text. diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py index 4f82b58f64b..7fbc029408d 100755 --- a/contrib/gcc-changelog/git_commit.py +++ b/contrib/gcc-changelog/git_commit.py @@ -176,7 +176,7 @@ class Error: class ChangeLogEntry: def __init__(self, folder, authors, prs): self.folder = folder -# Python2 has not 'copy' function +# The 'list.copy()' function is not available before Python 3.3 self.author_lines = list(authors) self.initial_prs = list(prs) self.prs = list(prs) @@ -209,7 +209,7 @@ class ChangeLogEntry: line = line[:line.index(':')] in_location = False -# At this point, all that 's left is a list of filenames +# At this point, all that's left is a list of filenames # separated by commas and whitespaces. for file in line.split(','): file = file.strip() @@ -503,7 +503,7 @@ class GitCommit: mentioned_files = set() for entry in self.changelog_entries: if not entry.files: -msg = 'ChangeLog must contain a file entry' +msg = 'ChangeLog must contain at least one file entry' self.errors.append(Error(msg, entry.folder)) assert not entry.folder.endswith('/') for file in entry.files:
[committed] libstdc++: Fix __gnu_test::input_iterator_wrapper::operator++(int)
I noticed recently that our input_iterator_wrapper utility for writing tests has the following post-increment operator: void operator++(int) { ++*this; } That fails to meet the Cpp17InputIterator requirement that *r++ is valid. This change makes it return a non-void proxy type that can be dereferenced to produce another proxy, which is convertible to the value_type. The second proxy converts to const T& to ensure it can't be written to. * testsuite/util/testsuite_iterators.h: (input_iterator_wrapper::operator++(int)): Return proxy object. Tested powerpc64le-linux, committed to master. commit 118158b646d402b0fb5d760e4827611b731fe6f3 Author: Jonathan Wakely Date: Mon Jun 1 18:30:47 2020 +0100 libstdc++: Fix __gnu_test::input_iterator_wrapper::operator++(int) I noticed recently that our input_iterator_wrapper utility for writing tests has the following post-increment operator: void operator++(int) { ++*this; } That fails to meet the Cpp17InputIterator requirement that *r++ is valid. This change makes it return a non-void proxy type that can be dereferenced to produce another proxy, which is convertible to the value_type. The second proxy converts to const T& to ensure it can't be written to. * testsuite/util/testsuite_iterators.h: (input_iterator_wrapper::operator++(int)): Return proxy object. 
diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h b/libstdc++-v3/testsuite/util/testsuite_iterators.h index 5be47f47915..71b672c85fa 100644 --- a/libstdc++-v3/testsuite/util/testsuite_iterators.h +++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h @@ -208,6 +208,17 @@ namespace __gnu_test : public std::iterator::type, std::ptrdiff_t, T*, T&> { +struct post_inc_proxy +{ + struct deref_proxy + { + T* ptr; + operator const T&() const { return *ptr; } + } p; + + deref_proxy operator*() const { return p; } +}; + protected: input_iterator_wrapper() : ptr(0), SharedInfo(0) { } @@ -266,10 +277,12 @@ namespace __gnu_test return *this; } -void +post_inc_proxy operator++(int) { + post_inc_proxy tmp = { { ptr } }; ++*this; + return tmp; } #if __cplusplus >= 201103L
Re: [committed] libstdc++: Update/streamline Valgrind references
On 01/06/20 17:06 +0200, Gerald Pfeifer wrote: Like many sites over the last year(s) valgrind.org has now moved to https. While there, replace the second of two links in the same vicinity by a purely textual reference -- easier to maintain, and in particular also better from a user experience perspective. Thanks. I've also committed a couple more doc improvements, as attached. commit 258059d91bd0e27cc335312f4558e1b339a2e77d Author: Jonathan Wakely Date: Mon Jun 1 16:43:01 2020 +0100 libstdc++: Document API changes in GCC 10 * doc/xml/manual/evolution.xml: Document deprecation of __is_nullptr_t and removal of std::allocator members. * doc/html/manual/api.html: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/evolution.xml b/libstdc++-v3/doc/xml/manual/evolution.xml index ab04c1ad272..623d53e7faf 100644 --- a/libstdc++-v3/doc/xml/manual/evolution.xml +++ b/libstdc++-v3/doc/xml/manual/evolution.xml @@ -955,11 +955,23 @@ now defaults to zero. + + The non-standard std::__is_nullptr_t type trait + was deprecated. + + The std::packaged_task constructors taking an allocator argument are only defined for C++11 and C++14. + + Several members of std::allocator were removed + for C++20 mode. The removed functionality has been provided by + std::allocator_traits since C++11 and that should + be used instead. + + commit a1ffe9b6f4d0e2dd9493c5bd669fc5a2ea24a6f9 Author: Jonathan Wakely Date: Mon Jun 1 16:40:13 2020 +0100 libstdc++: Fix incorrect Docbook links The <xref> element creates the link text automatically from the link target, rather than using the text node child of the element. This can be changed by using an endterm attribute, but it's simpler to just use the <link> element instead. * doc/xml/manual/containers.xml: Replace <xref> with <link>. * doc/xml/manual/evolution.xml: Likewise. * doc/html/manual/api.html: Regenerate. * doc/html/manual/containers.html: Regenerate. 
diff --git a/libstdc++-v3/doc/xml/manual/containers.xml b/libstdc++-v3/doc/xml/manual/containers.xml index 5c9854efbdd..6d568164b47 100644 --- a/libstdc++-v3/doc/xml/manual/containers.xml +++ b/libstdc++-v3/doc/xml/manual/containers.xml @@ -25,8 +25,8 @@ list::size() is O(n) - Yes it is, at least using the old - ABI, and that's okay. This is a decision that we preserved + Yes it is, at least using the old + ABI, and that's okay. This is a decision that we preserved when we imported SGI's STL implementation. The following is quoted from http://www.w3.org/1999/xlink"; xlink:href="https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/FAQ.html";>their FAQ: diff --git a/libstdc++-v3/doc/xml/manual/evolution.xml b/libstdc++-v3/doc/xml/manual/evolution.xml index 1bd7bb1bb9f..ab04c1ad272 100644 --- a/libstdc++-v3/doc/xml/manual/evolution.xml +++ b/libstdc++-v3/doc/xml/manual/evolution.xml @@ -784,8 +784,8 @@ now defaults to zero. Assertions to check function preconditions can be enabled by defining the - _GLIBCXX_ASSERTIONS - macro. + _GLIBCXX_ASSERTIONS + macro. The initial set of assertions are a subset of the checks enabled by the Debug Mode, but without the ABI changes and changes to algorithmic complexity that are caused by enabling the full Debug Mode.
Re: [PATCH] c++: premature requires-expression folding [PR95020]
On 5/30/20 12:37 AM, Patrick Palka wrote: On Wed, 13 May 2020, Jason Merrill wrote: On 5/11/20 6:43 PM, Patrick Palka wrote: In the testcase below we're prematurely folding away the requires-expression to 'true' after substituting in the function's template arguments, but before substituting in the lambda's deduced template arguments. This happens because during the first tsubst_requires_expr, processing_template_decl is 1 but 'args' is just {void} and therefore non-dependent, so we end up folding away the requires-expression to boolean_true_node before we could substitute in the lambda's template arguments and determine that '*v' is ill-formed. This patch removes the uses_template_parms check when deciding in tsubst_requires_expr whether to keep around a new requires-expression. Regardless of whether the template arguments are dependent, there still might be more template parameters to later substitute in -- as in the testcase below -- and even if not, tsubst_expr doesn't perform full semantic processing unless !processing_template_decl, so it seems we should wait until then to fold away the requires-expression. Passes 'make check-c++', does this look OK to commit after a full bootstrap/regtest? OK. Would the same patch be OK to backport to the GCC 10 branch? Yes. gcc/cp/ChangeLog: PR c++/95020 * constraint.cc (tsubst_requires_expr): Produce a new requires-expression when processing_template_decl, even if template arguments are not dependent. gcc/testsuite/ChangeLog: PR c++/95020 * g++.dg/cpp2a/concepts-lambda7.C: New test. 
--- gcc/cp/constraint.cc | 4 +--- gcc/testsuite/g++.dg/cpp2a/concepts-lambda7.C | 14 ++ 2 files changed, 15 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda7.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 4ad17f3b7d8..8ee347cae60 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -2173,9 +2173,7 @@ tsubst_requires_expr (tree t, tree args, if (reqs == error_mark_node) return boolean_false_node; - /* In certain cases, produce a new requires-expression. - Otherwise the value of the expression is true. */ - if (processing_template_decl && uses_template_parms (args)) + if (processing_template_decl) return finish_requires_expr (cp_expr_location (t), parms, reqs); return boolean_true_node; diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda7.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda7.C new file mode 100644 index 000..50746b777a3 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda7.C @@ -0,0 +1,14 @@ +// PR c++/95020 +// { dg-do compile { target c++2a } } + +template +void foo() { + auto t = [](auto v) { +static_assert(requires { *v; }); // { dg-error "static assertion failed" } + }; + t(0); +} + +void bar() { + foo(); +}
Re: [PATCH] c++: constrained lambda inside template [PR92633]
On 6/1/20 12:47 PM, Patrick Palka wrote: When regenerating a constrained lambda during instantiation of an enclosing template, we are forgetting to substitute into the lambda's constraints. Fix this by substituting through the constraints during tsubst_lambda_expr. Passes 'make check-c++', and also tested by building the testsuites of cmcstl2 and range-v3. Does this look OK to commit to master and to the 10 branch after a full bootstrap and regtest? OK for both. gcc/cp/ChangeLog: PR c++/92633 PR c++/92838 * pt.c (tsubst_function_decl): Don't do set_constraints when regenerating a lambda. (tsubst_lambda_expr): Substitute into the lambda's constraints and do set_constraints here. gcc/testsuite/ChangeLog: PR c++/92633 PR c++/92838 * g++.dg/cpp2a/concepts-lambda11.C: New test. * g++.dg/cpp2a/concepts-lambda12.C: New test. --- gcc/cp/pt.c| 16 +++- gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C | 17 + gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C | 15 +++ 3 files changed, 47 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index df647af7b46..907ca879c73 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -13854,7 +13854,10 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain, don't substitute through the constraints; that's only done when they are checked. */ if (tree ci = get_constraints (t)) -set_constraints (r, ci); +/* Unless we're regenerating a lambda, in which case we'll set the + lambda's constraints in tsubst_lambda_expr. */ +if (!lambda_fntype) + set_constraints (r, ci); if (DECL_FRIEND_P (t) && DECL_FRIEND_CONTEXT (t)) SET_DECL_FRIEND_CONTEXT (r, @@ -19029,6 +19032,17 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) finish_member_declaration (fn); } + if (tree ci = get_constraints (oldfn)) + { + /* Substitute into the lambda's constraints. 
*/ + if (oldtmpl) + ++processing_template_decl; + ci = tsubst_constraint_info (ci, args, complain, in_decl); + if (oldtmpl) + --processing_template_decl; + set_constraints (fn, ci); + } + /* Let finish_function set this. */ DECL_DECLARED_CONSTEXPR_P (fn) = false; diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C new file mode 100644 index 000..dd9cd4e2344 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda11.C @@ -0,0 +1,17 @@ +// PR c++/92838 +// { dg-do compile { target c++20 } } + +template +auto foo() +{ + [] () requires (N != 0) { }(); // { dg-error "no match" } + [] () requires (N == 0) { }(); + + [] () requires (N == M) { }(); // { dg-error "no match" } + [] () requires (N != M) { }(); +} + +void bar() +{ + foo<0>(); +} diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C new file mode 100644 index 000..2bc9fd0bb25 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda12.C @@ -0,0 +1,15 @@ +// PR c++/92633 +// { dg-do compile { target c++20 } } + +template +concept different_than = !__is_same_as(A, B); + +template +auto diff(B) { +return [](different_than auto a) {}; +} + +int main() { +diff(42)(""); +diff(42)(42); // { dg-error "no match" } +}
Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
Could you go into more detail about this choice of cost calculation? It looks like we first calculate per-group flags, which are true only if the unrolled offsets are valid for all uses in the group. Then we create per-candidate flags when associating candidates with groups. Instead, couldn't we take this into account in get_address_cost, which calculates the cost of an address use for a given candidate? E.g. after the main if-else at the start of the function, perhaps it would make sense to add the worst-case offset to the address in “parts”, check whether that too is a valid address, and if not, increase var_cost by the cost of one add instruction. I guess there are two main sources of inexactness if we do that: (1) It might underestimate the cost because it assumes that vuse[0] stands for all vuses in the group. (2) It might overestimate the cost because it treats all unrolled iterations as having the cost of the final unrolled iteration. (1) could perhaps be avoided by adding a flag to the iv_use to say whether it wants this treatment. I think the flag approach suffers from (2) too, and I'd be surprised if it makes a difference in practice. Thanks, Richard
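To make the suggestion above concrete, here is a rough sketch of the alternative cost rule. It is not ivopts code: every name (address_cost_with_unroll, fits_signed_16, the parameters) is hypothetical, the worst-case offset is assumed to be the offset reached in the last unrolled iteration, and a signed 16-bit displacement is assumed purely for the example.

```python
def fits_signed_16(offset):
    """Example validity test: a signed 16-bit displacement (an assumption)."""
    return -32768 <= offset <= 32767


def address_cost_with_unroll(var_cost, offset, step, unroll_factor,
                             offset_is_valid=fits_signed_16, add_cost=1):
    """Charge one extra add if the worst-case unrolled offset is invalid.

    The worst case is taken to be the last unrolled copy, i.e. the
    original offset plus (unroll_factor - 1) steps.
    """
    worst_offset = offset + step * (unroll_factor - 1)
    if offset_is_valid(worst_offset):
        return var_cost
    return var_cost + add_cost
```

This also shows inexactness (2): the extra add is charged once for the whole use, even though only the later unrolled copies may actually need it.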
PowerPC tests for -mcpu=future
This thread adds seven patches to add tests for the -mcpu=future code generation. These patches are an update to the patches I sent out in April. https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544653.html I have done bootstrap builds with/without the patches on a little endian power9 box, and there were no regressions with any of the tests run. I verified that these tests do run and succeed. Can I check them into the master branch?
[PATCH 1/7] PowerPC tests: Add prefixed/pcrel tests.
2020-06-01 Michael Meissner * lib/target-supports.exp (check_effective_target_powerpc_pcrel): New. (check_effective_target_powerpc_prefixed_addr): New. --- gcc/testsuite/lib/target-supports.exp | 19 +++ 1 file changed, 19 insertions(+) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index b335108..9d880f4 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2163,6 +2163,25 @@ proc check_p9vector_hw_available { } { }] } +# Return 1 if the target generates PC-relative instructions automatically for +# the PowerPC 'future' machine. +proc check_effective_target_powerpc_pcrel { } { +return [check_no_messages_and_pattern powerpc_pcrel \ + {\mpla\M} assembly { + static unsigned short s; + unsigned short *p_foo (void) { return &s; } + } {-O2 -mcpu=future}] +} + +# Return 1 if the target generates prefixed instructions automatically for the +# PowerPC 'future' machine. +proc check_effective_target_powerpc_prefixed_addr { } { +return [check_no_messages_and_pattern powerpc_prefixed_addr \ + {\mplwz\M} assembly { + unsigned int foo (unsigned int *p) { return p[0x12345]; } + } {-O2 -mcpu=future}] +} + # Return 1 if the target supports executing power9 modulo instructions, 0 # otherwise. Cache the result. -- 1.8.3.1
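The patterns {\mpla\M} and {\mplwz\M} use the Tcl word anchors \m and \M so that only the whole mnemonic matches. A rough Python analogue is sketched below; the helper name asm_mentions is made up for illustration, Python's \b is only an approximation of the Tcl anchors, and the real effective-target check also compiles the snippet and requires that no diagnostics are produced.

```python
import re


def asm_mentions(asm_text, mnemonic):
    """Return True if `mnemonic` occurs as a whole word in the assembly text."""
    pattern = r'\b' + re.escape(mnemonic) + r'\b'
    return re.search(pattern, asm_text) is not None
```

The whole-word match matters for the later tests in this series: output containing "paddi" must not count as a use of "addi".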
[PATCH 3/7] PowerPC tests: Add prefixed vs. DS/DQ instruction tests.
Add test to make sure prefixed load/store instructions are generated if the offset would not fit in the DS/DQ encodings. 2020-06-01 Michael Meissner * gcc.target/powerpc/prefix-ds-dq.c: New test. --- gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c | 159 1 file changed, 159 insertions(+) diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c b/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c index e69de29..68fbad3 100644 --- a/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c +++ b/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c @@ -0,0 +1,159 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests whether we generate a prefixed load/store operation for addresses that + don't meet DS/DQ offset constraints. */ + +struct packed_struct +{ + long long pad; /* offset 0 bytes. */ + unsigned char pad_uc;/* offset 8 bytes. */ + unsigned char uc;/* offset 9 bytes. */ + + unsigned char pad_sc[sizeof (long long) - sizeof (unsigned char)]; + unsigned char sc;/* offset 17 bytes. */ + + unsigned char pad_us[sizeof (long long) - sizeof (signed char)]; + unsigned short us; /* offset 25 bytes. */ + + unsigned char pad_ss[sizeof (long long) - sizeof (unsigned short)]; + short ss;/* offset 33 bytes. */ + + unsigned char pad_ui[sizeof (long long) - sizeof (short)]; + unsigned int ui; /* offset 41 bytes. */ + + unsigned char pad_si[sizeof (long long) - sizeof (unsigned int)]; + unsigned int si; /* offset 49 bytes. */ + + unsigned char pad_f[sizeof (long long) - sizeof (int)]; + float f; /* offset 57 bytes. */ + + unsigned char pad_d[sizeof (long long) - sizeof (float)]; + double d;/* offset 65 bytes. */ + __float128 f128; /* offset 73 bytes. */ +} __attribute__((packed)); + +unsigned char +load_uc (struct packed_struct *p) +{ + return p->uc;/* LBZ 3,9(3). */ +} + +signed char +load_sc (struct packed_struct *p) +{ + return p->sc;/* LBZ 3,17(3) + EXTSB 3,3. 
*/ +} + +unsigned short +load_us (struct packed_struct *p) +{ + return p->us;/* LHZ 3,25(3). */ +} + +short +load_ss (struct packed_struct *p) +{ + return p->ss;/* LHA 3,33(3). */ +} + +unsigned int +load_ui (struct packed_struct *p) +{ + return p->ui;/* LWZ 3,41(3). */ +} + +int +load_si (struct packed_struct *p) +{ + return p->si;/* PLWA 3,49(3). */ +} + +float +load_float (struct packed_struct *p) +{ + return p->f; /* LFS 1,57(3). */ +} + +double +load_double (struct packed_struct *p) +{ + return p->d; /* LFD 1,65(3). */ +} + +__float128 +load_float128 (struct packed_struct *p) +{ + return p->f128; /* PLXV 34,73(3). */ +} + +void +store_uc (struct packed_struct *p, unsigned char uc) +{ + p->uc = uc; /* STB 4,9(3). */ +} + +void +store_sc (struct packed_struct *p, signed char sc) +{ + p->sc = sc; /* STB 4,17(3). */ +} + +void +store_us (struct packed_struct *p, unsigned short us) +{ + p->us = us; /* STH 4,25(3). */ +} + +void +store_ss (struct packed_struct *p, signed short ss) +{ + p->ss = ss; /* STH 4,33(3). */ +} + +void +store_ui (struct packed_struct *p, unsigned int ui) +{ + p->ui = ui; /* STW 4,41(3). */ +} + +void +store_si (struct packed_struct *p, signed int si) +{ + p->si = si; /* STW 4,49(3). */ +} + +void +store_float (struct packed_struct *p, float f) +{ + p->f = f;/* STFS 1,57(3). */ +} + +void +store_double (struct packed_struct *p, double d) +{ + p->d = d;/* STFD 1,65(3). */ +} + +void +store_float128 (struct packed_struct *p, __float128 f128) +{ + p->f128 = f128; /* PSTXV 34,1(3). 
*/ +} + +/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlbz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlfd\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlfs\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlha\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlhz\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlwz\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mplwa\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mplxv\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstb\M} 2 } } */ +/
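The expectations in the test comments follow from the displacement rules of the instruction forms. As a sketch only (the function names are made up, and the mapping of a mnemonic to its form has to be supplied by the caller): D-form offsets are any signed 16-bit value, DS-form offsets must additionally be a multiple of 4, DQ-form offsets a multiple of 16, and anything else within a signed 34-bit range needs the prefixed instruction.

```python
def fits_signed(value, bits):
    """True if `value` is representable as a signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def needs_prefixed(offset, form):
    """Decide whether an offset forces the prefixed variant.

    `form` is 'D', 'DS' or 'DQ', the encoding of the non-prefixed
    instruction that would otherwise be used.
    """
    alignment = {'D': 1, 'DS': 4, 'DQ': 16}[form]
    if not fits_signed(offset, 34):
        raise ValueError('offset does not fit a prefixed instruction either')
    return not (fits_signed(offset, 16) and offset % alignment == 0)
```

For the packed struct above, the int at offset 49 would normally be loaded with the DS-form LWA, so needs_prefixed(49, 'DS') is true and PLWA is expected, while the unsigned int at offset 41 uses the D-form LWZ and stays unprefixed.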
[PATCH 5/7] PowerPC tests: Prefixed insn with large offsets
Add tests to make sure for -mcpu=future that prefixed load/store instructions are generated if the offset is larger than 16 bits. 2020-06-01 Michael Meissner * gcc.target/powerpc/prefix-large-dd.c: New test. * gcc.target/powerpc/prefix-large-df.c: New test. * gcc.target/powerpc/prefix-large-di.c: New test. * gcc.target/powerpc/prefix-large-hi.c: New test. * gcc.target/powerpc/prefix-large-kf.c: New test. * gcc.target/powerpc/prefix-large-qi.c: New test. * gcc.target/powerpc/prefix-large-sd.c: New test. * gcc.target/powerpc/prefix-large-sf.c: New test. * gcc.target/powerpc/prefix-large-si.c: New test. * gcc.target/powerpc/prefix-large-udi.c: New test. * gcc.target/powerpc/prefix-large-uhi.c: New test. * gcc.target/powerpc/prefix-large-uqi.c: New test. * gcc.target/powerpc/prefix-large-usi.c: New test. * gcc.target/powerpc/prefix-large-v2df.c: New test. * gcc.target/powerpc/prefix-large.h: Include file for new tests. --- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-di.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c | 16 +++ gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large-si.c | 13 ++ .../gcc.target/powerpc/prefix-large-udi.c | 14 ++ .../gcc.target/powerpc/prefix-large-uhi.c | 14 ++ .../gcc.target/powerpc/prefix-large-uqi.c | 14 ++ .../gcc.target/powerpc/prefix-large-usi.c | 14 ++ .../gcc.target/powerpc/prefix-large-v2df.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-large.h| 51 ++ 15 files changed, 240 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-df.c create mode 100644 
gcc/testsuite/gcc.target/powerpc/prefix-large-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-si.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large.h diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c b/gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c new file mode 100644 index 000..2000fdd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether we can generate a prefixed + load/store instruction that has a 34-bit offset for _Decimal64 objects. 
*/ + +#define TYPE _Decimal64 + +#include "prefix-large.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-large-df.c b/gcc/testsuite/gcc.target/powerpc/prefix-large-df.c new file mode 100644 index 000..48c497b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-large-df.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether we can generate a prefixed + load/store instruction that has a 34-bit offset for double objects. */ + +#define TYPE double + +#include "prefix-large.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-large-di.c b/gcc/testsuite/gcc.target/powerpc/prefix-large-di.c new file mode 100644 index 000..aeb879e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-large-di.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions
[PATCH 2/7] PowerPC tests: Add PLI/PADDI tests.
Add tests for -mcpu=future that test the generation of PADDI (and PLI which becomes PADDI). 2020-06-01 Michael Meissner * gcc.target/powerpc/prefix-add.c: New test. * gcc.target/powerpc/prefix-si-constant.c: New test. * gcc.target/powerpc/prefix-di-constant.c: New test. --- gcc/testsuite/gcc.target/powerpc/prefix-add.c | 14 ++ gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c | 13 + gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c | 0 gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c | 12 4 files changed, 39 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-add.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-add.c b/gcc/testsuite/gcc.target/powerpc/prefix-add.c new file mode 100644 index 000..26ef23e0 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-add.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PADDI is generated to add a large constant. */ +unsigned long +add (unsigned long a) +{ + return a + 0x12345U; +} + +/* { dg-final { scan-assembler {\mpaddi\M} } } */ +/* { dg-final { scan-assembler-not {\maddi\M} } } */ +/* { dg-final { scan-assembler-not {\maddis\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c b/gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c new file mode 100644 index 000..389fdaa --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PLI (PADDI) is generated to load a large constant. 
*/ +unsigned long long +large (void) +{ + return 0x12345678ULL; +} + +/* { dg-final { scan-assembler {\mpli\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c b/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c new file mode 100644 index 000..e69de29 diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c b/gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c new file mode 100644 index 000..269fc0f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-si-constant.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Test that PLI (PADDI) is generated to load a large constant for SImode. */ +void +large_si (unsigned int *p) +{ + *p = 0x12345U; +} + +/* { dg-final { scan-assembler {\mpli\M} } } */ -- 1.8.3.1
[PATCH 4/7] PowerPC test: Add prefixed no update test
This test makes sure we do not generate a prefixed instruction with an update form. 2020-06-01 Michael Meissner * gcc.target/powerpc/prefix-no-update.c: New test. --- .../gcc.target/powerpc/prefix-no-update.c | 50 ++ 1 file changed, 50 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-no-update.c diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-no-update.c b/gcc/testsuite/gcc.target/powerpc/prefix-no-update.c new file mode 100644 index 000..e3c2e5e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-no-update.c @@ -0,0 +1,50 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Make sure that we don't generate a prefixed form of the load and store with + update instructions (i.e. instead of generating LWZU we have to generate + PLWZ plus a PADDI). */ + +#ifndef SIZE +#define SIZE 5 +#endif + +struct foo { + unsigned int field; + char pad[SIZE]; +}; + +struct foo *inc_load (struct foo *p, unsigned int *q) +{ + *q = (++p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *dec_load (struct foo *p, unsigned int *q) +{ + *q = (--p)->field; /* PLWZ, PADDI, STW. */ + return p; +} + +struct foo *inc_store (struct foo *p, unsigned int *q) +{ + (++p)->field = *q; /* LWZ, PADDI, PSTW. */ + return p; +} + +struct foo *dec_store (struct foo *p, unsigned int *q) +{ + (--p)->field = *q; /* LWZ, PADDI, PSTW. */ + return p; +} + +/* { dg-final { scan-assembler-times {\mlwz\M}2 } } */ +/* { dg-final { scan-assembler-times {\mstw\M}2 } } */ +/* { dg-final { scan-assembler-times {\mpaddi\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mplwz\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstw\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mplwzu\M}} } */ +/* { dg-final { scan-assembler-not {\mpstwu\M}} } */ +/* { dg-final { scan-assembler-not {\maddis\M}} } */ +/* { dg-final { scan-assembler-not {\maddi\M} } } */ -- 1.8.3.1
[PATCH 7/7] PowerPC test: Add prefixed stack protect test
Test that stack protection generates prefixed stack instructions when a large stack frame is used with -mcpu=future. 2020-06-01 Michael Meissner * gcc.target/powerpc/prefix-stack-protect.c: New test. --- .../gcc.target/powerpc/prefix-stack-protect.c| 20 1 file changed, 20 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c b/gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c new file mode 100644 index 000..d0d291b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_prefixed_addr } */ +/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */ + +/* Test that we can handle large stack frames with -fstack-protector-strong and + prefixed addressing. This was originally discovered when trying to build + glibc with -mcpu=future, and vfwprintf.c failed because it used + -fstack-protector-strong. */ + +extern long foo (char *); + +long +bar (void) +{ + char buffer[0x2]; + return foo (buffer) + 1; +} + +/* { dg-final { scan-assembler {\mpld\M} } } */ +/* { dg-final { scan-assembler {\mpstd\M} } } */ -- 1.8.3.1
[PATCH 6/7] PowerPC tests: Add PC-relative tests.
These tests make sure that PC-relative variant is generated for -mcpu=future on systems that support PC-relative addressing. 2020-06-01 Michael Meissner * gcc.target/powerpc/prefix-pcrel-dd.c: New test. * gcc.target/powerpc/prefix-pcrel-df.c: New test. * gcc.target/powerpc/prefix-pcrel-di.c: New test. * gcc.target/powerpc/prefix-pcrel-hi.c: New test. * gcc.target/powerpc/prefix-pcrel-kf.c: New test. * gcc.target/powerpc/prefix-pcrel-qi.c: New test. * gcc.target/powerpc/prefix-pcrel-sd.c: New test. * gcc.target/powerpc/prefix-pcrel-sf.c: New test. * gcc.target/powerpc/prefix-pcrel-si.c: New test. * gcc.target/powerpc/prefix-pcrel-udi.c: New test. * gcc.target/powerpc/prefix-pcrel-uhi.c: New test. * gcc.target/powerpc/prefix-pcrel-uqi.c: New test. * gcc.target/powerpc/prefix-pcrel-usi.c: New test. * gcc.target/powerpc/prefix-pcrel-v2df.c: New test. * gcc.target/powerpc/prefix-pcrel.h: Include file for new tests. --- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c | 16 +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c | 13 ++ .../gcc.target/powerpc/prefix-pcrel-udi.c | 13 ++ .../gcc.target/powerpc/prefix-pcrel-uhi.c | 13 ++ .../gcc.target/powerpc/prefix-pcrel-uqi.c | 13 ++ .../gcc.target/powerpc/prefix-pcrel-usi.c | 13 ++ .../gcc.target/powerpc/prefix-pcrel-v2df.c | 13 ++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h| 52 ++ 15 files changed, 237 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c create mode 100644 
gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c new file mode 100644 index 000..f100c24 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the _Decimal64 type. 
*/ + +#define TYPE _Decimal64 + +#include "prefix-pcrel.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c new file mode 100644 index 000..a9a0711 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the double type. */ + +#define TYPE double + +#include "prefix-pcrel.h" + +/* { dg-final { scan-assembler-times {\mplfd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c new file mode 100644 index 000..850c28b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* Tests for prefixed instructions testing whether pc-relative prefixed + instructions are generated for the long
[PATCH 3/3] PowerPC future: Add IEEE 128-bit min, max, compare.
Add support for the new IEEE 128-bit minimum, maximum, and set compare mask instructions when -mcpu=future was used. gcc/ 2020-06-01 Michael Meissner * config/rs6000/rs6000.c (rs6000_emit_hw_fp_minmax): Update comment. (rs6000_emit_hw_fp_cmove): Update comment. (rs6000_emit_cmove): Add support for IEEE 128-bit min, max, and comparisons with -mcpu=future. (rs6000_emit_minmax): Add support for IEEE 128-bit min/max with -mcpu=future. * config/rs6000/rs6000.md (s3, IEEE128 iterator): New insns for IEEE 128-bit min/max. (movcc, IEEE128 iterator): New insns for IEEE 128-bit conditional move. (movcc_future, IEEE128 iterator): New insns for IEEE 128-bit conditional move. (movcc_invert_future, IEEE128 iterator): New insns for IEEE 128-bit conditional move. (fpmask, IEEE128 iterator): New insns for IEEE 128-bit conditional move. testsuite/ 2020-06-01 Michael Meissner * gcc.target/powerpc/float128-minmax-2.c: New test. --- gcc/config/rs6000/rs6000.c | 26 - gcc/config/rs6000/rs6000.md| 121 + .../gcc.target/powerpc/float128-minmax-2.c | 70 3 files changed, 214 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 0921328..bbba8f1 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -14847,7 +14847,9 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false, /* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction for SF/DF scalars. Move TRUE_COND to DEST if OP of the operands of the last comparison is nonzero/true, FALSE_COND if it is zero/false. Return 0 if the - hardware has no such operation. */ + hardware has no such operation. + + Under FUTURE, also handle IEEE 128-bit floating point. 
*/ static int rs6000_emit_hw_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond) @@ -14889,7 +14891,9 @@ rs6000_emit_hw_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond) /* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and XXSEL instructions for SF/DF scalars. Move TRUE_COND to DEST if OP of the operands of the last comparison is nonzero/true, FALSE_COND if it is - zero/false. Return 0 if the hardware has no such operation. */ + zero/false. Return 0 if the hardware has no such operation. + + Under FUTURE, also handle IEEE 128-bit conditional moves. */ static int rs6000_emit_hw_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) @@ -14981,6 +14985,21 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) return 1; } + /* See if we can use the FUTURE min/max/compare instructions for IEEE 128-bit + floating point. At present, don't worry about doing conditional moves + with different types for the comparison and movement (unlike SF/DF, where + you can do a conditional test between double and use float as the if/then + parts. */ + if (TARGET_FUTURE && FLOAT128_IEEE_P (compare_mode) + && compare_mode == result_mode) +{ + if (rs6000_emit_hw_fp_minmax (dest, op, true_cond, false_cond)) + return 1; + + if (rs6000_emit_hw_fp_cmove (dest, op, true_cond, false_cond)) + return 1; +} + /* Don't allow using floating point comparisons for integer results for now. */ if (FLOAT_MODE_P (compare_mode) && !FLOAT_MODE_P (result_mode)) @@ -15204,7 +15223,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx op0, rtx op1) /* VSX/altivec have direct min/max insns. 
*/ if ((code == SMAX || code == SMIN) && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) - || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode + || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode)) + || (TARGET_FUTURE && FLOAT128_IEEE_P (mode { emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1))); return; diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 3310b4b..ef82f11 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -14645,6 +14645,127 @@ (define_insn "*cmp_hw" "xscmpuqp %0,%1,%2" [(set_attr "type" "veccmp") (set_attr "size" "128")]) + +;; IEEE 128-bit min/max +(define_insn "s3" + [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v") + (fp_minmax:IEEE128 +(match_operand:IEEE128 1 "altivec_register_operand" "v") +(match_operand:IEEE128 2 "altivec_register_operand" "v")))] + "TARGET_FUTURE && FLOAT128_IEEE_P (mode)" + "xscqp %0,%1,%2" + [(set_attr "type" "fp") + (set_attr "size" "128")]) + +;; IEEE 128-bit conditional move. At present, don't worry about doing +;; conditional moves with different types for
[PATCH 1/3] PowerPC future: Add byte swap insns
Add support for generating BRH/BRW/BRD when -mcpu=future is used. gcc/ 2020-06-01 Michael Meissner * config/rs6000/rs6000.md (bswaphi2_reg): If -mcpu=future, generate the BRH instruction. (bswapsi2_reg): If -mcpu=future, generate the BRW instruction. (bswapdi2): Rename bswapdi2_xxbrd to bswapdi2_hw. (bswapdi2_hw): Rename from bswapdi2_xxbrd. If -mcpu=future, generate the BRD instruction. testsuite/ 2020-06-01 Michael Meissner * gcc.target/powerpc/bswap64-5.c: New test. --- gcc/config/rs6000/rs6000.md | 44 +++- gcc/testsuite/gcc.target/powerpc/bswap64-5.c | 42 ++ 2 files changed, 66 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap64-5.c diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 0aa5265..3310b4b 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -2585,15 +2585,16 @@ (define_insn "bswap2_store" [(set_attr "type" "store")]) (define_insn_and_split "bswaphi2_reg" - [(set (match_operand:HI 0 "gpc_reg_operand" "=&r,wa") + [(set (match_operand:HI 0 "gpc_reg_operand" "=r,&r,wa") (bswap:HI -(match_operand:HI 1 "gpc_reg_operand" "r,wa"))) - (clobber (match_scratch:SI 2 "=&r,X"))] +(match_operand:HI 1 "gpc_reg_operand" "r,r,wa"))) + (clobber (match_scratch:SI 2 "=X,&r,X"))] "" "@ + brh %0,%1 # xxbrh %x0,%x1" - "reload_completed && int_reg_operand (operands[0], HImode)" + "reload_completed && !TARGET_FUTURE && int_reg_operand (operands[0], HImode)" [(set (match_dup 3) (and:SI (lshiftrt:SI (match_dup 4) (const_int 8)) @@ -2609,21 +2610,22 @@ (define_insn_and_split "bswaphi2_reg" operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0); operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0); } - [(set_attr "length" "12,4") - (set_attr "type" "*,vecperm") - (set_attr "isa" "*,p9v")]) + [(set_attr "length" "4,12,4") + (set_attr "type" "shift,*,vecperm") + (set_attr "isa" "fut,*,p9v")]) ;; We are always BITS_BIG_ENDIAN, so the bit positions below in ;; 
zero_extract insns do not change for -mlittle. (define_insn_and_split "bswapsi2_reg" - [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,wa") + [(set (match_operand:SI 0 "gpc_reg_operand" "=r,&r,wa") (bswap:SI -(match_operand:SI 1 "gpc_reg_operand" "r,wa")))] +(match_operand:SI 1 "gpc_reg_operand" "r,r,wa")))] "" "@ + brw %0,%1 # xxbrw %x0,%x1" - "reload_completed && int_reg_operand (operands[0], SImode)" + "reload_completed && !TARGET_FUTURE && int_reg_operand (operands[0], SImode)" [(set (match_dup 0) ; DABC (rotate:SI (match_dup 1) (const_int 24))) @@ -2640,9 +2642,9 @@ (define_insn_and_split "bswapsi2_reg" (and:SI (match_dup 0) (const_int -256] "" - [(set_attr "length" "12,4") - (set_attr "type" "*,vecperm") - (set_attr "isa" "*,p9v")]) + [(set_attr "length" "4,12,4") + (set_attr "type" "shift,*,vecperm") + (set_attr "isa" "fut,*,p9v")]) ;; On systems with LDBRX/STDBRX generate the loads/stores directly, just like ;; we do for L{H,W}BRX and ST{H,W}BRX above. If not, we have to generate more @@ -2675,7 +2677,7 @@ (define_expand "bswapdi2" emit_insn (gen_bswapdi2_store (dest, src)); } else if (TARGET_P9_VECTOR) - emit_insn (gen_bswapdi2_xxbrd (dest, src)); + emit_insn (gen_bswapdi2_hw (dest, src)); else emit_insn (gen_bswapdi2_reg (dest, src)); DONE; @@ -2706,13 +2708,15 @@ (define_insn "bswapdi2_store" "stdbrx %1,%y0" [(set_attr "type" "store")]) -(define_insn "bswapdi2_xxbrd" - [(set (match_operand:DI 0 "gpc_reg_operand" "=wa") - (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wa")))] +(define_insn "bswapdi2_hw" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r,wa") + (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "r,wa")))] "TARGET_P9_VECTOR" - "xxbrd %x0,%x1" - [(set_attr "type" "vecperm") - (set_attr "isa" "p9v")]) + "@ + brd %0,%1 + xxbrd %x0,%x1" + [(set_attr "type" "shift,vecperm") + (set_attr "isa" "fut,p9v")]) (define_insn "bswapdi2_reg" [(set (match_operand:DI 0 "gpc_reg_operand" "=&r") diff --git a/gcc/testsuite/gcc.target/powerpc/bswap64-5.c 
b/gcc/testsuite/gcc.target/powerpc/bswap64-5.c new file mode 100644 index 000..9183e16 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/bswap64-5.c @@ -0,0 +1,42 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-require-effective-target powerpc_future_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=future" } */ + +/* This tests whether -mcpu=future generates the new byte swap + instructions (brd, brw, brh). */ + +unsigned short +bswap_short (unsigned short
PowerPC new instructions for -mcpu=future
These 3 patches add support for some new instructions in the 'future' processor. The first patch adds support for the new byte swap instructions that byte swap values in the GPRs. The second patch renames some functions from _p9 to _hw. This is in preparation for the third patch, which adds support for IEEE 128-bit minimum, maximum, and set compare mask instructions. The third patch implements the new instructions. I have built bootstrap compilers with and without the patches, and there are no regressions. I verified that the two new tests pass. Can I check these into the master branch?
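As a rough illustration of the kind of source code the first patch is aimed at, the functions below byte swap 16-, 32-, and 64-bit values held in registers using GCC's __builtin_bswap16/32/64. With -mcpu=future one would expect swaps like these to be implemented with BRH, BRW, and BRD. The function names are hypothetical and are not taken from the new test file, and the portable reference routine is only there to cross-check the builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the __builtin_bswap* calls are GCC's.  */

uint16_t
swap_u16 (uint16_t x)
{
  return __builtin_bswap16 (x);
}

uint32_t
swap_u32 (uint32_t x)
{
  return __builtin_bswap32 (x);
}

uint64_t
swap_u64 (uint64_t x)
{
  return __builtin_bswap64 (x);
}

/* Portable shift-and-mask version of the 32-bit swap, used as a
   reference for the builtin.  */
uint32_t
swap_u32_reference (uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
	 | ((x & 0x0000ff00u) << 8)
	 | ((x & 0x00ff0000u) >> 8)
	 | ((x & 0xff000000u) >> 24);
}

/* Cross-check the builtin against the reference for one value.  */
void
check_swap_u32 (uint32_t x)
{
  assert (swap_u32 (x) == swap_u32_reference (x));
}
```

Which instruction is actually selected depends on the target options and on whether the value lives in a GPR or a vector register, so this is only a sketch of the source-level pattern.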
[PATCH 2/3] PowerPC future: Rename some p9 hardware functions.
This patch renames some functions that were added for power9 support from '_p9' to '_hw'. This is preparation for the next patch, which extends these functions for -mcpu=future support. 2020-06-01 Michael Meissner * config/rs6000/rs6000.c (rs6000_emit_hw_fp_minmax): Rename rs6000_emit_p9_fp_minmax. (rs6000_emit_hw_fp_cmove): Rename rs6000_emit_p9_fp_cmove. (rs6000_emit_cmove): Update calls to rs6000_emit_hw_fp_minmax and rs6000_emit_hw_fp_cmove. --- gcc/config/rs6000/rs6000.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 8435bc1..0921328 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -14850,7 +14850,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false, hardware has no such operation. */ static int -rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond) +rs6000_emit_hw_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond) { enum rtx_code code = GET_CODE (op); rtx op0 = XEXP (op, 0); @@ -14892,7 +14892,7 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond) zero/false. Return 0 if the hardware has no such operation. */ static int -rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) +rs6000_emit_hw_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) { enum rtx_code code = GET_CODE (op); rtx op0 = XEXP (op, 0); @@ -14974,10 +14974,10 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) && (compare_mode == SFmode || compare_mode == DFmode) && (result_mode == SFmode || result_mode == DFmode)) { - if (rs6000_emit_p9_fp_minmax (dest, op, true_cond, false_cond)) + if (rs6000_emit_hw_fp_minmax (dest, op, true_cond, false_cond)) return 1; - if (rs6000_emit_p9_fp_cmove (dest, op, true_cond, false_cond)) + if (rs6000_emit_hw_fp_cmove (dest, op, true_cond, false_cond)) return 1; } -- 1.8.3.1
Re: [PATCH] Prefer simple case changes in spelling suggestions
> Did the full DejaGnu testsuite get run? There are a lot of tests in it > that make use of this code. I did "make check" and only saw some XFAILs. Here's v2 of the patch, which I think addresses your comments. I did not add a new test of get_edit_distance, because as I mentioned earlier, an existing test already does what you asked for. Tom commit e897a99dada8d3935343ebf7b14ad7ec36515b3d Author: Tom Tromey Date: Fri May 29 10:46:57 2020 -0600 Prefer simple case changes in spelling suggestions I got this error message when editing gcc and recompiling: ../../gcc/gcc/ada/gcc-interface/decl.c:7714:39: error: ‘DWARF_GNAT_ENCODINGS_all’ was not declared in this scope; did you mean ‘DWARF_GNAT_ENCODINGS_GDB’? 7714 | = debug_info && gnat_encodings == DWARF_GNAT_ENCODINGS_all; | ^~~~ | DWARF_GNAT_ENCODINGS_GDB This suggestion could be improved -- what happened here is that I failed to upper-case the word, and DWARF_GNAT_ENCODINGS_ALL was the correct spelling. This patch changes gcc's spell checker to prefer simple case changes when possible. I tested this using the self-tests. A new self-test is also included. gcc/ChangeLog: * spellcheck.c (CASE_COST): New define. (BASE_COST): New define. (get_edit_distance): Recognize case changes. (get_edit_distance_cutoff): Update. (test_edit_distances): Update. (get_old_cutoff): Update. (test_find_closest_string): Add case sensitivity test. diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c index 7891260a258..9f7351f364f 100644 --- a/gcc/spellcheck.c +++ b/gcc/spellcheck.c @@ -25,14 +25,22 @@ along with GCC; see the file COPYING3. If not see #include "spellcheck.h" #include "selftest.h" +/* Cost of a case transformation. */ +#define CASE_COST 1 + +/* Cost of another kind of edit. 
*/ +#define BASE_COST 2 + /* Get the edit distance between the two strings: the minimal number of edits that are needed to change one string into another, where edits can be one-character insertions, removals, or substitutions, or transpositions of two adjacent characters (counting as one "edit"). - This implementation uses the Wagner-Fischer algorithm for the - Damerau-Levenshtein distance; specifically, the "optimal string alignment - distance" or "restricted edit distance" variant. */ + This implementation uses a modified variant of the Wagner-Fischer + algorithm for the Damerau-Levenshtein distance; specifically, the + "optimal string alignment distance" or "restricted edit distance" + variant. This implementation has been further modified to take + case into account. */ edit_distance_t get_edit_distance (const char *s, int len_s, @@ -47,9 +55,9 @@ get_edit_distance (const char *s, int len_s, } if (len_s == 0) -return len_t; +return BASE_COST * len_t; if (len_t == 0) -return len_s; +return BASE_COST * len_s; /* We effectively build a matrix where each (i, j) contains the distance between the prefix strings s[0:j] and t[0:i]. @@ -67,7 +75,7 @@ get_edit_distance (const char *s, int len_s, /* The first row is for the case of an empty target string, which we can reach by deleting every character in the source string. */ for (int i = 0; i < len_s + 1; i++) -v_one_ago[i] = i; +v_one_ago[i] = i * BASE_COST; /* Build successive rows. */ for (int i = 0; i < len_t; i++) @@ -83,21 +91,28 @@ get_edit_distance (const char *s, int len_s, /* The initial column is for the case of an empty source string; we can reach prefixes of the target string of length i by inserting i characters. */ - v_next[0] = i + 1; + v_next[0] = (i + 1) * BASE_COST; /* Build the rest of the row by considering neighbors to the north, west and northwest. */ for (int j = 0; j < len_s; j++) { - edit_distance_t cost = (s[j] == t[i] ? 
0 : 1); - edit_distance_t deletion = v_next[j] + 1; - edit_distance_t insertion= v_one_ago[j + 1] + 1; + edit_distance_t cost; + + if (s[j] == t[i]) + cost = 0; + else if (TOLOWER (s[j]) == TOLOWER (t[i])) + cost = CASE_COST; + else + cost = BASE_COST; + edit_distance_t deletion = v_next[j] + BASE_COST; + edit_distance_t insertion= v_one_ago[j + 1] + BASE_COST; edit_distance_t substitution = v_one_ago[j] + cost; edit_distance_t cheapest = MIN (deletion, insertion); cheapest = MIN (cheapest, substitution); if (i > 0 && j > 0 && s[j] == t[i - 1] && s[j - 1] == t[i]) { - edit_distance_t transposition = v_two_ago[j - 1] + 1; + edit_distance_t transpositio
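To make the cost scheme in the patch easier to follow, here is a minimal standalone sketch of a case-aware restricted edit distance. It keeps the patch's CASE_COST and BASE_COST values, but it is not the spellcheck.c code: it uses a full matrix instead of three rolling rows, the function name is hypothetical, plain tolower stands in for GCC's TOLOWER, and charging BASE_COST for a transposition is an assumption, since the quoted diff is cut off before that line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define CASE_COST 1	/* Cost of a pure case change, as in the patch.  */
#define BASE_COST 2	/* Cost of any other single edit, as in the patch.  */

/* Hypothetical standalone helper: return the case-aware restricted edit
   distance between S and T, or -1 if memory cannot be allocated.  */
int
case_aware_distance (const char *s, const char *t)
{
  assert (s != NULL && t != NULL);

  size_t len_s = strlen (s);
  size_t len_t = strlen (t);
  size_t cols = len_s + 1;
  int *d = malloc ((len_t + 1) * cols * sizeof *d);
  if (d == NULL)
    return -1;

  /* Row 0: reach the empty target by deleting every source character.  */
  for (size_t j = 0; j <= len_s; j++)
    d[j] = (int) j * BASE_COST;

  for (size_t i = 1; i <= len_t; i++)
    {
      /* Column 0: reach a target prefix by inserting i characters.  */
      d[i * cols] = (int) i * BASE_COST;

      for (size_t j = 1; j <= len_s; j++)
	{
	  unsigned char a = (unsigned char) s[j - 1];
	  unsigned char b = (unsigned char) t[i - 1];
	  int cost;

	  if (a == b)
	    cost = 0;
	  else if (tolower (a) == tolower (b))
	    cost = CASE_COST;
	  else
	    cost = BASE_COST;

	  int best = d[i * cols + (j - 1)] + BASE_COST;	    /* west */
	  int north = d[(i - 1) * cols + j] + BASE_COST;
	  int diag = d[(i - 1) * cols + (j - 1)] + cost;
	  if (north < best)
	    best = north;
	  if (diag < best)
	    best = diag;

	  /* Adjacent transposition; BASE_COST here is an assumption.  */
	  if (i > 1 && j > 1 && s[j - 1] == t[i - 2] && s[j - 2] == t[i - 1])
	    {
	      int trans = d[(i - 2) * cols + (j - 2)] + BASE_COST;
	      if (trans < best)
		best = trans;
	    }

	  d[i * cols + j] = best;
	}
    }

  int result = d[len_t * cols + len_s];
  free (d);
  return result;
}
```

Under this scheme the misspelling from the commit message, DWARF_GNAT_ENCODINGS_all, is closer to DWARF_GNAT_ENCODINGS_ALL (three case changes) than to DWARF_GNAT_ENCODINGS_GDB (three full substitutions), which is the preference the patch is after.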
[committed] i386: Add __attribute__ ((gcc_struct)) to struct fenv [PR95418]
The Windows ABI (MinGW) differs from the Linux ABI when bitfields are involved. The following patch adds __attribute__ ((gcc_struct)) to struct fenv in order to match the layout of the x87 state image in memory. 2020-06-01 Uroš Bizjak libatomic/ChangeLog: * config/x86/fenv.c (struct fenv): Add __attribute__ ((gcc_struct)). libgcc/ChangeLog: * config/i386/sfp-exceptions.c (struct fenv): Add __attribute__ ((gcc_struct)). libgfortran/ChangeLog: PR libfortran/95418 * config/fpu-387.h (struct fenv): Add __attribute__ ((gcc_struct)). Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}, and as stated in the PR, also tested by Markus on MinGW. Uros. diff --git a/libatomic/config/x86/fenv.c b/libatomic/config/x86/fenv.c index 88622c613f3..138a67ff217 100644 --- a/libatomic/config/x86/fenv.c +++ b/libatomic/config/x86/fenv.c @@ -45,7 +45,7 @@ struct fenv unsigned int __data_offset; unsigned short int __data_selector; unsigned short int __unused5; -}; +} __attribute__ ((gcc_struct)); #ifdef __SSE_MATH__ # define __math_force_eval_div(x, y) \ diff --git a/libgcc/config/i386/sfp-exceptions.c b/libgcc/config/i386/sfp-exceptions.c index 72cb0f4d3bb..3aed0af7c46 100644 --- a/libgcc/config/i386/sfp-exceptions.c +++ b/libgcc/config/i386/sfp-exceptions.c @@ -39,7 +39,7 @@ struct fenv unsigned int __data_offset; unsigned short int __data_selector; unsigned short int __unused5; -}; +} __attribute__ ((gcc_struct)); #ifdef __SSE_MATH__ # define __math_force_eval_div(x, y) \ diff --git a/libgfortran/config/fpu-387.h b/libgfortran/config/fpu-387.h index 8b5e758c2ca..7ff5acdc933 100644 --- a/libgfortran/config/fpu-387.h +++ b/libgfortran/config/fpu-387.h @@ -85,7 +85,7 @@ struct fenv unsigned short int __data_selector; unsigned short int __unused5; unsigned int __mxcsr; -}; +} __attribute__ ((gcc_struct)); /* Check we can actually store the FPU state in the allocated size. */ _Static_assert (sizeof(struct fenv) <= (size_t) GFC_FPE_STATE_BUFFER_SIZE,
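The layout concern can be sketched with a small standalone struct. The example below is an illustration, not the libgcc or libatomic source: the struct name, the macro, and the check function are hypothetical, and the member list is an assumption modelled on the 28-byte x87 environment image, in which an 11-bit opcode field shares a 32-bit word with the preceding 16-bit selector. The attribute is applied only where the compiler reports support for it via __has_attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Use gcc_struct only when the compiler says it knows the attribute;
   elsewhere the macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(gcc_struct)
#  define X87_ENV_LAYOUT __attribute__ ((gcc_struct))
# endif
#endif
#ifndef X87_ENV_LAYOUT
# define X87_ENV_LAYOUT
#endif

/* Hypothetical mirror of the x87 environment image (an assumption for
   illustration; member names are not the ones used in GCC).  */
struct x87_env
{
  unsigned short int control_word;
  unsigned short int unused1;
  unsigned short int status_word;
  unsigned short int unused2;
  unsigned short int tags;
  unsigned short int unused3;
  unsigned int eip;
  unsigned short int cs_selector;
  unsigned int opcode:11;	/* Packed after cs_selector under GCC rules.  */
  unsigned int unused4:5;
  unsigned int data_offset;
  unsigned short int data_selector;
  unsigned short int unused5;
} X87_ENV_LAYOUT;

/* Check that the bitfields were packed into the same 32-bit word as
   cs_selector, which is what the hardware image requires.  */
void
check_x87_env_layout (void)
{
  assert (offsetof (struct x87_env, data_offset) == 20);
  assert (sizeof (struct x87_env) == 28);
}
```

With a layout that starts a new storage unit for the unsigned int bitfields, data_offset would move and the struct would grow, so the image written by the hardware would no longer line up with the members. That is the kind of mismatch the attribute is meant to rule out.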
[pushed] c++: vptr ubsan and object of known type [PR95466]
Another case where we can't find the OBJ_TYPE_REF_OBJECT in the OBJ_TYPE_REF_EXPR. So let's just evaluate the sanitize call first. Tested x86_64-pc-linux-gnu, applying to trunk. gcc/cp/ChangeLog: PR c++/95466 PR c++/95311 PR c++/95221 * class.c (build_vfn_ref): Revert 95311 change. * cp-ubsan.c (cp_ubsan_maybe_instrument_member_call): Build a COMPOUND_EXPR. gcc/testsuite/ChangeLog: PR c++/95466 * g++.dg/ubsan/vptr-17.C: New test. --- gcc/cp/class.c | 8 ++-- gcc/cp/cp-ubsan.c| 17 - gcc/testsuite/g++.dg/ubsan/vptr-17.C | 15 +++ 3 files changed, 25 insertions(+), 15 deletions(-) create mode 100644 gcc/testsuite/g++.dg/ubsan/vptr-17.C diff --git a/gcc/cp/class.c b/gcc/cp/class.c index c818826a108..757e010b6b7 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -729,13 +729,9 @@ build_vtbl_ref (tree instance, tree idx) tree build_vfn_ref (tree instance_ptr, tree idx) { - tree obtype = TREE_TYPE (TREE_TYPE (instance_ptr)); + tree aref; - /* Leave the INDIRECT_REF unfolded so cp_ubsan_maybe_instrument_member_call - can find instance_ptr. */ - tree ind = build1 (INDIRECT_REF, obtype, instance_ptr); - - tree aref = build_vtbl_ref (ind, idx); + aref = build_vtbl_ref (cp_build_fold_indirect_ref (instance_ptr), idx); /* When using function descriptors, the address of the vtable entry is treated as a function pointer. */ diff --git a/gcc/cp/cp-ubsan.c b/gcc/cp/cp-ubsan.c index c40dac72b42..183bd238aff 100644 --- a/gcc/cp/cp-ubsan.c +++ b/gcc/cp/cp-ubsan.c @@ -125,16 +125,11 @@ cp_ubsan_maybe_instrument_member_call (tree stmt) { /* Virtual function call: Sanitize the use of the object pointer in the OBJ_TYPE_REF, since the vtable reference will SEGV otherwise (95221). -OBJ_TYPE_REF_EXPR is ptr->vptr[N] and OBJ_TYPE_REF_OBJECT is ptr. */ +OBJ_TYPE_REF_EXPR is ptr->vptr[N] and OBJ_TYPE_REF_OBJECT is ptr. But +we can't be sure of finding OBJ_TYPE_REF_OBJECT in OBJ_TYPE_REF_EXPR +if the latter has been optimized, so we use a COMPOUND_EXPR below. 
*/ opp = &OBJ_TYPE_REF_EXPR (fn); op = OBJ_TYPE_REF_OBJECT (fn); - while (*opp != op) - { - if (TREE_CODE (*opp) == COMPOUND_EXPR) - opp = &TREE_OPERAND (*opp, 1); - else - opp = &TREE_OPERAND (*opp, 0); - } } else { @@ -150,7 +145,11 @@ cp_ubsan_maybe_instrument_member_call (tree stmt) op = cp_ubsan_maybe_instrument_vptr (EXPR_LOCATION (stmt), op, TREE_TYPE (TREE_TYPE (op)), true, UBSAN_MEMBER_CALL); - if (op) + if (!op) +/* No change. */; + else if (fn && TREE_CODE (fn) == OBJ_TYPE_REF) +*opp = cp_build_compound_expr (op, *opp, tf_none); + else *opp = op; } diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-17.C b/gcc/testsuite/g++.dg/ubsan/vptr-17.C new file mode 100644 index 000..b7f6a4cb4df --- /dev/null +++ b/gcc/testsuite/g++.dg/ubsan/vptr-17.C @@ -0,0 +1,15 @@ +// PR c++/95466 +// { dg-additional-options -fsanitize=vptr } + +class A { + virtual void m_fn1(); +}; +class C { +public: + virtual void m_fn2(); +}; +class B : A, public C {}; +int main() { + B b; + static_cast(&b)->m_fn2(); +} base-commit: 88f48e2967ead9be262483618238efa9c7c842ec -- 2.18.1
[committed] Fix pr92085-2.c regressions on msp430-elf
msp430-elf has had regressions for a while. There was other instability at the time the regression started, so I waited to see if it'd correct itself, but it didn't and I finally took a looksie. We're processing this in lower-subreg: (insn 30 64 42 6 (set (subreg:SI (reg/v:HI 33 [ oz ]) 0) (concatn:SI [ (reg:HI 45 [ _5 ]) (reg:HI 46 [ _5+2 ]) ])) "j.c":22:19 14 {movsi_x} (nil)) Note the paradoxical subreg destination. There's nothing inherently wrong with that. But if lower-subreg wants to decompose it, it'll use simplify_gen_subreg_concatn which has this gem: /* If we see an insn like (set (reg:DI) (subreg:DI (reg:SI) 0)) then resolve_simple_move will ask for the high part of the paradoxical subreg, which does not have a value. Just return a zero. */ if (ret == NULL_RTX && paradoxical_subreg_p (op)) return CONST0_RTX (outermode); That's fine and good for the source of a set, but it's not good for the destination of a set. For the destination we might as well just not emit anything. The bits we're setting are don't cares and leaving them uninitialized should be fine. And that's exactly what this patch does. When simplify_gen_subreg_concatn returns CONST0_RTX for a destination operand, we assume that we need not actually assign anything to the destination and leave it as-is. This fixed the regression on the msp430-elf port and has bootstrapped and successfully regression tested on x86_64-linux-gnu. It's built on a few other targets as well, but I haven't tried to enumerate them -- I just know there aren't new failures since dropping this patch into my tester :-) Installing on the trunk, Jeff commit c7969df1c5d3785c0b409f97e7682a6f0d2637ec Author: Jeff Law Date: Mon Jun 1 17:14:50 2020 -0400 Fix 92085-2.c ICE due to having (const_int 0) as the destination of a set. gcc/ * lower-subreg.c (resolve_simple_move): If simplify_gen_subreg_concatn returns (const_int 0) for the destination, then emit nothing.
diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index a11e535b5bf..abe7180c686 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1087,12 +1087,21 @@ resolve_simple_move (rtx set, rtx_insn *insn) emit_clobber (dest); for (i = 0; i < words; ++i) - emit_move_insn (simplify_gen_subreg_concatn (word_mode, dest, -dest_mode, -i * UNITS_PER_WORD), - simplify_gen_subreg_concatn (word_mode, src, -orig_mode, -i * UNITS_PER_WORD)); + { + rtx t = simplify_gen_subreg_concatn (word_mode, dest, + dest_mode, + i * UNITS_PER_WORD); + /* simplify_gen_subreg_concatn can return (const_int 0) for +some sub-objects of paradoxical subregs. As a source operand, +that's fine. As a destination it must be avoided. Those are +supposed to be don't care bits, so we can just drop that store +on the floor. */ + if (t != CONST0_RTX (word_mode)) + emit_move_insn (t, + simplify_gen_subreg_concatn (word_mode, src, +orig_mode, +i * UNITS_PER_WORD)); + } } if (real_dest != NULL_RTX)
[PATCH] Practical Improvement to Double Precision Complex Divide
The following patch to libgcc/libgcc2.c __divdc3 improves the quality of answers from the default double precision complex divide routine when dealing with very large or very small exponents. The current code correctly implements Smith's method (1962) [1] further modified by C99's requirements for dealing with NaN (not a number) results. When working with input values where the exponents are greater than 512 (i.e. 2.0^512) or less than -512 (i.e. 2.0^-512), results are substantially different from the answers provided by quad precision more than 1% of the time. Since the allowed exponent range for double precision numbers is -1076 to +1023, the error rate may be unacceptable for many applications. The proposed method reduces the frequency of "substantially different" answers by more than 99% at a modest cost in performance. Differences between current gcc methods and the new method will be described. Then accuracy and performance differences will be discussed. NOTATION For all of the following, the notation is: Input complex values: a+bi (a= 64bit real part, b= 64bit imaginary part) c+di Output complex value: e+fi = (a+bi)/(c+di) DESCRIPTIONS of different complex divide methods: NAIVE COMPUTATION (-fcx-limited-range): e = (a*c + b*d)/(c*c + d*d) f = (b*c - a*d)/(c*c + d*d) Note that c*c and d*d will overflow or underflow if either c or d is outside the range 2^-538 to 2^512. This method is available in gcc when the switch -fcx-limited-range is used. That switch is also enabled by -ffast-math. Only someone with a clear understanding of the maximum range of intermediate values generated by a computation should consider using this switch.
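The overflow problem with the naive formula can be reproduced with a short standalone sketch. The code below is an illustration only: the struct and function names are hypothetical, the second routine is the textbook form of Smith's method without the C99 NaN and infinity recovery that libgcc adds, and neither routine is the libgcc implementation. Plain comparisons are used in place of fabs and isnan so that nothing from the math library is needed.

```c
#include <assert.h>

/* Hypothetical result type for this sketch.  */
struct cplx
{
  double re;
  double im;
};

static double
abs_double (double x)
{
  return x < 0.0 ? -x : x;
}

/* Naive formula: c*c + d*d overflows once |c| or |d| exceeds about
   2^512, even when the true quotient is perfectly representable.  */
struct cplx
div_naive (double a, double b, double c, double d)
{
  double denom = c * c + d * d;
  struct cplx q = { (a * c + b * d) / denom, (b * c - a * d) / denom };
  return q;
}

/* Textbook Smith (1962): divide through by the larger of |c| and |d|
   so the denominator stays in range.  */
struct cplx
div_smith (double a, double b, double c, double d)
{
  struct cplx q;

  if (abs_double (c) < abs_double (d))
    {
      double r = c / d;
      double denom = c * r + d;
      q.re = (a * r + b) / denom;
      q.im = (b * r - a) / denom;
    }
  else
    {
      double r = d / c;
      double denom = c + d * r;
      q.re = (a + b * r) / denom;
      q.im = (b - a * r) / denom;
    }
  return q;
}

/* (1e200 + 1e200i) / (1e200 + 1e200i) is exactly 1, but the naive
   formula produces NaN because its intermediates overflow.  */
void
demo_large_exponents (void)
{
  struct cplx bad = div_naive (1e200, 1e200, 1e200, 1e200);
  struct cplx good = div_smith (1e200, 1e200, 1e200, 1e200);

  assert (bad.re != bad.re);	/* NaN compares unequal to itself.  */
  assert (good.re == 1.0 && good.im == 0.0);
}
```

For moderate inputs such as (1+2i)/(3+4i) both routines agree on 0.44+0.08i; they only part ways once intermediate products leave the double range.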
SMITH's METHOD (current libgcc):

  if(fabs(c) RBIG) || (FABS (a) > RBIG) || (FABS (b) > RBIG) ) {
      a = a * 0.5;
      b = b * 0.5;
      c = c * 0.5;
      d = d * 0.5;
  }
  /* minimize overflow/underflow issues when c and d are small */
  else if (FABS (d) < RMIN2) {
      a = a * RMINSCAL;
      b = b * RMINSCAL;
      c = c * RMINSCAL;
      d = d * RMINSCAL;
  }
  r = c/d;
  denom = (c*r) + d;
  if( r > RMIN ) {
      e = (a*r + b) / denom;
      f = (b*r - a) / denom;
  } else {
      e = (c * (a/d) + b) / denom;
      f = (c * (b/d) - a) / denom;
  }

[ only presenting the fabs(c) < fabs(d) case here, full code in patch. ]

Before any computation of the answer, the code checks for near maximum
or near minimum inputs and scales the values to move them away from the
extremes. If the complex divide can be computed at all without
generating infinities, these scalings will not affect the accuracy
since they are by a power of 2. Values that are over RBIG are
relatively rare, but it is easy to test for them and the test is
required to avoid unnecessary overflows.

Testing for RMIN2 reveals when both c and d are less than 2^-512. By
scaling all values by 2^510, the code avoids many underflows in
intermediate computations that otherwise might occur. If scaling a and
b by 2^510 causes either to overflow, then the computation will
overflow whatever method is used.

Next, r (the ratio of c to d) is checked for being near zero. Baudin
and Smith checked r for zero. Checking for values less than DBL_MIN
covers more cases and improves overall accuracy. If r is near zero,
then when it is used in a multiplication, there is a high chance that
the result will underflow to zero, losing significant accuracy. That
underflow can be avoided if the computation is done in a different
order. When r is subnormal, the code replaces a*r (= a*(c/d)) with
((a/d)*c), which is mathematically the same but avoids the unnecessary
underflow.

TEST Data

Two sets of data are presented to test these methods. Both sets
contain 10 million pairs of 64bit complex values.
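As an aside before the test data details continue, the scaling and reordering steps described before the TEST Data heading can be sketched in C for the fabs(c) < fabs(d) case. This is only a sketch under stated assumptions, not the code from the patch: the function name is hypothetical, the constant values are inferred from the prose (RMIN2 = 2^-512, RMINSCAL = 2^510, RMIN = DBL_MIN) or guessed (RBIG = DBL_MAX/2), the first term of the overflow test is assumed to check d, and the ratio test uses fabs(r) so that negative ratios are treated the same way.

```c
#include <float.h>
#include <math.h>

/* Assumed values, inferred from the description rather than the patch.  */
#define RBIG     (DBL_MAX / 2)
#define RMIN     DBL_MIN
#define RMIN2    0x1p-512
#define RMINSCAL 0x1p510

/* Hypothetical helper: e + f*i = (a + b*i) / (c + d*i),
   valid only when fabs (c) < fabs (d).  */
static void
cdiv_scaled_c_lt_d (double a, double b, double c, double d,
                    double *e, double *f)
{
  if (fabs (d) > RBIG || fabs (a) > RBIG || fabs (b) > RBIG)
    {
      /* Near-maximum input: halve everything (exact, power of 2).  */
      a *= 0.5; b *= 0.5; c *= 0.5; d *= 0.5;
    }
  else if (fabs (d) < RMIN2)
    {
      /* c and d are both tiny: scale up to avoid underflow.  */
      a *= RMINSCAL; b *= RMINSCAL; c *= RMINSCAL; d *= RMINSCAL;
    }

  double r = c / d;
  double denom = c * r + d;

  if (fabs (r) > RMIN)
    {
      *e = (a * r + b) / denom;
      *f = (b * r - a) / denom;
    }
  else
    {
      /* r is subnormal or zero: reorder so a*r does not underflow.  */
      *e = (c * (a / d) + b) / denom;
      *f = (c * (b / d) - a) / denom;
    }
}
```

For example, dividing 2^-600 + 2^-600 i by 2^-600 + 2^-599 i takes the RMIN2 branch; after scaling, the quotient is computed as 0.6 - 0.2i, which matches (1+i)/(1+2i).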
The exponents and mantissas are generated using multiple calls to
random() and then combining the results. Only values which give
results to complex divide that are representable in 64-bits after
being computed in quad precision are used.

The first data set is labeled "moderate exponents". The exponent range
is limited to -512 to +511. The second data set is labeled "full
exponents". The exponent range is -1076 to +1024.

ACCURACY Test results:

Note: All results are based on use of fused multiply-add. If fused
multiply-add is not used, the error rate increases slightly for the
2 ulp and 8 ulp cases.

The complex divide methods are evaluated by determining what
percentage of values exceed different ulp (units in last place)
levels. If a "2 ulp" test result shows 1%, that would mean that 1% of
the 10,000,000 values (100,000) have either a real or imaginary part
with a greater than 2 bit difference from the quad precision result.
Results are reported for differences greater than or equal to 2 ulp,
8 ulp, 16 ulp, 24 ulp, and 52 ulp.

Even when the patch avoids overflows and underflows, some input values are expected to have errors due to normal limitations of floating point sub
Re: [PATCH 6/7] PowerPC tests: Add PC-relative tests.
On Mon, 2020-06-01 at 15:53 -0400, Michael Meissner via Gcc-patches wrote: > These tests make sure that PC-relative variant is generated for -mcpu=future > on > systems that support PC-relative addressing. > > 2020-06-01 Michael Meissner > > * gcc.target/powerpc/prefix-pcrel-dd.c: New test. > * gcc.target/powerpc/prefix-pcrel-df.c: New test. > * gcc.target/powerpc/prefix-pcrel-di.c: New test. > * gcc.target/powerpc/prefix-pcrel-hi.c: New test. > * gcc.target/powerpc/prefix-pcrel-kf.c: New test. > * gcc.target/powerpc/prefix-pcrel-qi.c: New test. > * gcc.target/powerpc/prefix-pcrel-sd.c: New test. > * gcc.target/powerpc/prefix-pcrel-sf.c: New test. > * gcc.target/powerpc/prefix-pcrel-si.c: New test. > * gcc.target/powerpc/prefix-pcrel-udi.c: New test. > * gcc.target/powerpc/prefix-pcrel-uhi.c: New test. > * gcc.target/powerpc/prefix-pcrel-uqi.c: New test. > * gcc.target/powerpc/prefix-pcrel-usi.c: New test. > * gcc.target/powerpc/prefix-pcrel-v2df.c: New test. > * gcc.target/powerpc/prefix-pcrel.h: Include file for new tests. 
> --- > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c | 16 +++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c | 13 ++ > .../gcc.target/powerpc/prefix-pcrel-udi.c | 13 ++ > .../gcc.target/powerpc/prefix-pcrel-uhi.c | 13 ++ > .../gcc.target/powerpc/prefix-pcrel-uqi.c | 13 ++ > .../gcc.target/powerpc/prefix-pcrel-usi.c | 13 ++ > .../gcc.target/powerpc/prefix-pcrel-v2df.c | 13 ++ > gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h| 52 > ++ > 15 files changed, 237 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h > 
> diff --git a/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
> b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
> new file mode 100644
> index 000..f100c24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_pcrel } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Tests for prefixed instructions testing whether pc-relative prefixed
> +   instructions are generated for the _Decimal64 type.  */

Similar/same comment as was made in Apr. I recommend something like
"Test whether pc-relative prefixed instructions are generated for the
_Decimal64 type."
Re: PowerPC tests for -mcpu=future
On Mon, 2020-06-01 at 15:53 -0400, Michael Meissner via Gcc-patches wrote:
> This thread adds seven patches to add tests for the -mcpu=future code
> generation.  These patches are an update to the patches I sent out in
> April.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544653.html
>
> I have done bootstrap builds with/without the patches on a little end
> power9 box, and there were no regressions with any of the tests ran.  I
> verified that these tests do run and succeed.  Can I check them into the
> master branch?
>

One nit in #6, mentioned separately. Otherwise this patch series lgtm.

thanks
Re: [PATCH] c++: Reject some further reinterpret casts in constexpr [PR82304, PR95307]
On Fri, May 29, 2020 at 01:26:32PM -0400, Jason Merrill via Gcc-patches wrote: > This is a diagnostic quality regression, moving the error message away from > the line where the actual problem is. > > Maybe use error_at (loc, ...)? That works fine, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2020-06-02 Jakub Jelinek PR c++/82304 PR c++/95307 * constexpr.c (cxx_eval_constant_expression): Diagnose CONVERT_EXPR conversions from pointer types to arithmetic types here... (cxx_eval_outermost_constant_expr): ... instead of here. * g++.dg/template/pr79650.C: Expect different diagnostics and expect it on all lines that do pointer to integer casts. * g++.dg/cpp1y/constexpr-shift1.C: Expect different diagnostics. * g++.dg/cpp1y/constexpr-82304.C: New test. * g++.dg/cpp0x/constexpr-95307.C: New test. --- gcc/cp/constexpr.c.jj 2020-05-29 23:49:25.479087388 +0200 +++ gcc/cp/constexpr.c 2020-06-01 12:53:30.348337388 +0200 @@ -6210,6 +6210,18 @@ cxx_eval_constant_expression (const cons if (VOID_TYPE_P (type)) return void_node; + if (TREE_CODE (t) == CONVERT_EXPR + && ARITHMETIC_TYPE_P (type) + && INDIRECT_TYPE_P (TREE_TYPE (op))) + { + if (!ctx->quiet) + error_at (loc, + "conversion from pointer type %qT to arithmetic type " + "%qT in a constant expression", TREE_TYPE (op), type); + *non_constant_p = true; + return t; + } + if (TREE_CODE (op) == PTRMEM_CST && !TYPE_PTRMEM_P (type)) op = cplus_expand_constant (op); @@ -6811,19 +6823,6 @@ cxx_eval_outermost_constant_expr (tree t non_constant_p = true; } - /* Technically we should check this for all subexpressions, but that - runs into problems with our internal representation of pointer - subtraction and the 5.19 rules are still in flux. 
*/ - if (CONVERT_EXPR_CODE_P (TREE_CODE (r)) - && ARITHMETIC_TYPE_P (TREE_TYPE (r)) - && TREE_CODE (TREE_OPERAND (r, 0)) == ADDR_EXPR) -{ - if (!allow_non_constant) - error ("conversion from pointer type %qT " - "to arithmetic type %qT in a constant expression", - TREE_TYPE (TREE_OPERAND (r, 0)), TREE_TYPE (r)); - non_constant_p = true; -} if (!non_constant_p && overflow_p) non_constant_p = true; --- gcc/testsuite/g++.dg/template/pr79650.C.jj 2020-05-29 23:49:19.040183088 +0200 +++ gcc/testsuite/g++.dg/template/pr79650.C 2020-06-01 12:53:30.348337388 +0200 @@ -11,10 +11,10 @@ foo () static int a, b; lab1: lab2: - A<(intptr_t)&&lab1 - (__INTPTR_TYPE__)&&lab2> c; // { dg-error "not a constant integer" } - A<(intptr_t)&&lab1 - (__INTPTR_TYPE__)&&lab1> d; - A<(intptr_t)&a - (intptr_t)&b> e;// { dg-error "is not a constant expression" } - A<(intptr_t)&a - (intptr_t)&a> f; - A<(intptr_t)sizeof(a) + (intptr_t)&a> g; // { dg-error "not a constant integer" } + A<(intptr_t)&&lab1 - (__INTPTR_TYPE__)&&lab2> c; // { dg-error "conversion from pointer type" } + A<(intptr_t)&&lab1 - (__INTPTR_TYPE__)&&lab1> d; // { dg-error "conversion from pointer type" } + A<(intptr_t)&a - (intptr_t)&b> e;// { dg-error "conversion from pointer type" } + A<(intptr_t)&a - (intptr_t)&a> f;// { dg-error "conversion from pointer type" } + A<(intptr_t)sizeof(a) + (intptr_t)&a> g; // { dg-error "conversion from pointer type" } A<(intptr_t)&a> h; // { dg-error "conversion from pointer type" } } --- gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C.jj2020-05-29 23:49:19.036183148 +0200 +++ gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C 2020-06-01 13:55:22.607594689 +0200 @@ -3,7 +3,7 @@ constexpr int p = 1; constexpr __PTRDIFF_TYPE__ bar (int a) { - return ((__PTRDIFF_TYPE__) &p) << a; // { dg-error "is not a constant expression" } + return ((__PTRDIFF_TYPE__) &p) << a; // { dg-error "conversion from pointer" } } constexpr __PTRDIFF_TYPE__ r = bar (2); // { dg-message "in .constexpr. 
expansion of" } -constexpr __PTRDIFF_TYPE__ s = bar (0); // { dg-error "conversion from pointer" } +constexpr __PTRDIFF_TYPE__ s = bar (0); --- gcc/testsuite/g++.dg/cpp1y/constexpr-82304.C.jj 2020-06-01 12:53:30.349337373 +0200 +++ gcc/testsuite/g++.dg/cpp1y/constexpr-82304.C2020-06-01 13:03:40.668227604 +0200 @@ -0,0 +1,14 @@ +// PR c++/82304 +// { dg-do compile { target c++14 } } + +typedef __UINTPTR_TYPE__ uintptr_t; + +constexpr const char * +foo (const char *p) +{ + auto l = reinterpret_cast(p); // { dg-error "conversion from pointer" } + ++l; + return reinterpret_cast(l); +} + +constexpr auto s = foo ("Hello"); --- gcc/testsuite/g++.dg/cpp0x/constexpr-95307.C.jj 2020-06-01 12:53:30.349337373 +0200 +++ gcc/tests
Re: [PATCH 1/3] PowerPC future: Add byte swap insns
On Mon, 2020-06-01 at 16:01 -0400, Michael Meissner via Gcc-patches wrote:
> Add support for generating BRH/BRW/BRD when -mcpu=future is used.
>

Hi,

> gcc/
> 2020-06-01  Michael Meissner
>
>     * config/rs6000/rs6000.md (bswaphi2_reg): If -mcpu=future,
>     generate the BRH instruction.
>     (bswapsi2_reg): If -mcpu=future, generate the BRW instruction.
>     (bswapdi2): Rename bswapdi2_xxbrd to bswapdi2_hw.
>     (bswapdi2_hw): Rename from bswapdi2_xxbrd.  If -mcpu=future,
>     generate the BRD instruction.

The "If -mcpu=future" blurbs there could probably be dropped.

>
> testsuite/
> 2020-06-01  Michael Meissner
>
>     * gcc.target/powerpc/bswap64-5.c: New test.
> ---
>  gcc/config/rs6000/rs6000.md                  | 44 +++-
>  gcc/testsuite/gcc.target/powerpc/bswap64-5.c | 42 ++
>  2 files changed, 66 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap64-5.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/bswap64-5.c
> b/gcc/testsuite/gcc.target/powerpc/bswap64-5.c
> new file mode 100644
> index 000..9183e16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/bswap64-5.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> +/* { dg-require-effective-target powerpc_future_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* This tests whether -mcpu=future generates the new byte swap
> +   instructions (brd, brw, brh).  */

s/new// (It's only new until it's not).

Aside from those nits, lgtm.

thanks
-Will
Re: [committed 3/3] libstdc++: Refactor filesystem::path string conversions
On 23/05/20 09:44 +0100, Jonathan Wakely wrote: This simplifies the logic of converting Source arguments and pairs of InputIterator arguments into the native string format. For any input that is a contiguous range of path::value_type (or char8_t for POSIX) a string view can be created and the conversion can be done directly, with no intermediate allocation. Previously some cases created a basic_string unnecessarily, for example construction from a pair of path::string_type::iterators, or a pair of non-const value_type* pointers. * include/bits/fs_path.h (__detail::_S_range_begin) (__detail::_S_range_end, path::_S_string_from_iter): Replace with overloaded function template __detail::__effective_range. (__detail::__effective_range): New overloaded function template to create a basic_string or basic_string_view for an effective range. (__detail::__value_type_is_char): Use __detail::__effective_range. Do not use remove_const on value type. (__detail::__value_type_is_char_or_char8_t): Likewise. (path::path(const Source&, format)) (path::path(const Source&, const locale&)) (path::operator/=(const Source&), path::append(const Source&)) (path::concat(const Source&)): Use __detail::__effective_range. (path::_S_to_string(InputIterator, InputIterator)): New function template to create a string view if possible, or string otherwise. (path::_S_convert): Add overloads that convert a string returned by __detail::__effective_range. Use if-constexpr to inline conversion logic from all overloads of _Cvt::_S_convert. (path::_S_convert_loc): Add overload that converts a string. Use _S_to_string to avoid allocation when possible. (path::_Cvt): Remove. (path::operator+=(CharT)): Remove indirection through path::concat. * include/experimental/bits/fs_path.h (path::_S_convert_loc): Add overload for non-const pointers, to avoid constructing a std::string. * src/c++17/fs_path.cc (path::_S_convert_loc): Replace conditional compilation with call to _S_convert. 
This commit broke *-*-mingw* bootstrap. Fixed with the attached patch. Tested powerpc64le-linux and x86_64-w64-mingw32, committed to master. commit cd3f067b82a1331f5fb695879ba5c3d9bb2cca3a Author: Jonathan Wakely Date: Tue Jun 2 00:07:05 2020 +0100 libstdc++: Fix filesystem::u8path for mingw targets (PR 95392) When I refactored filesystem::path string conversions in r11-587-584d52b088f9fcf78704b504c3f1f07e17c1cded I failed to update the mingw-specific code in filesystem::u8path, causing a bootstrap failure. This fixes it, and further refactors the mingw-specific code along the same lines as the previous commit. All conversions from UTF-8 strings to wide strings now use the same helper function, __wstr_from_utf8. PR libstdc++/95392 * include/bits/fs_path.h (path::_S_to_string): Move to namespace-scope and rename to ... (__detail::__string_from_range): ... this. [WINDOWS] (__detail::__wstr_from_utf8): New function template to convert a char sequence containing UTF-8 to wstring. (path::_S_convert(Iter, Iter)): Adjust call to _S_to_string. (path::_S_convert_loc(Iter, Iter, const locale&)): Likewise. (u8path(InputIterator, InputIterator)) [WINDOWS]: Use __string_from_range to obtain a contiguous range and __wstr_from_utf8 to obtain a wide string. (u8path(const Source&)) [WINDOWS]: Use __effective_range to obtain a contiguous range and __wstr_from_utf8 to obtain a wide string. (path::_S_convert(const _EcharT*, const _EcharT)) [WINDOWS]: Use __wstr_from_utf8. diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h index 2d2766ec62e..26ddf0afec4 100644 --- a/libstdc++-v3/include/bits/fs_path.h +++ b/libstdc++-v3/include/bits/fs_path.h @@ -211,6 +211,51 @@ namespace __detail #endif , _Val>; + // Create a string or string view from an iterator range. 
+ template +inline auto +__string_from_range(_InputIterator __first, _InputIterator __last) +{ + using _EcharT + = typename std::iterator_traits<_InputIterator>::value_type; + static_assert(__is_encoded_char<_EcharT>); + +#if __cpp_lib_concepts + constexpr bool __contiguous = std::contiguous_iterator<_InputIterator>; +#else + constexpr bool __contiguous + = is_pointer_v; +#endif + if constexpr (__contiguous) + { + // For contiguous iterators we can just return a string view. + const auto* __f = std::__to_address(std::__niter_base(__first)); + const auto* __l = std::__to_address(std::__niter_base(__last)); + return basic_string_view<_EcharT>(__f, __l - __f); + } + else + // Conversion requires contiguous characters, so create a string. +
Re: [PATCH 3/3] PowerPC future: Add IEEE 128-bit min, max, compare.
On Mon, 2020-06-01 at 16:01 -0400, Michael Meissner via Gcc-patches wrote: > Add support for the new IEEE 128-bit minimum, maximum, and set compare mask > instructions when -mcpu=future was used. > > gcc/ > 2020-06-01 Michael Meissner > > * config/rs6000/rs6000.c (rs6000_emit_hw_fp_minmax): Update > comment. > (rs6000_emit_hw_fp_cmove): Update comment. > (rs6000_emit_cmove): Add support for IEEE 128-bit min, max, and > comparisons with -mcpu=future. > (rs6000_emit_minmax): Add support for IEEE 128-bit min/max with > -mcpu=future. > * config/rs6000/rs6000.md (s3, IEEE128 iterator): > New insns for IEEE 128-bit min/max. > (movcc, IEEE128 iterator): New insns for IEEE 128-bit > conditional move. > (movcc_future, IEEE128 iterator): New insns for IEEE 128-bit > conditional move. > (movcc_invert_future, IEEE128 iterator): New insns for IEEE > 128-bit conditional move. > (fpmask, IEEE128 iterator): New insns for IEEE 128-bit > conditional move. Include the leading wildcard here? (*fpmask ... and missing an entry for this one: (*xxsel ... > > testsuite/ > 2020-06-01 Michael Meissner > > * gcc.target/powerpc/float128-minmax-2.c: New test. > --- > gcc/config/rs6000/rs6000.c | 26 - > gcc/config/rs6000/rs6000.md| 121 > + > .../gcc.target/powerpc/float128-minmax-2.c | 70 > 3 files changed, 214 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 0921328..bbba8f1 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -14847,7 +14847,9 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, > rtx op_false, > /* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction > for SF/DF scalars. Move TRUE_COND to DEST if OP of the operands of the > last > comparison is nonzero/true, FALSE_COND if it is zero/false. Return 0 if > the > - hardware has no such operation. */ > + hardware has no such operation. 
> + > + Under FUTURE, also handle IEEE 128-bit floating point. */ > > static int > rs6000_emit_hw_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond) > @@ -14889,7 +14891,9 @@ rs6000_emit_hw_fp_minmax (rtx dest, rtx op, rtx > true_cond, rtx false_cond) > /* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and > XXSEL instructions for SF/DF scalars. Move TRUE_COND to DEST if OP of the > operands of the last comparison is nonzero/true, FALSE_COND if it is > - zero/false. Return 0 if the hardware has no such operation. */ > + zero/false. Return 0 if the hardware has no such operation. > + > + Under FUTURE, also handle IEEE 128-bit conditional moves. */ > > static int > rs6000_emit_hw_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond) > @@ -14981,6 +14985,21 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, > rtx false_cond) > return 1; > } > > + /* See if we can use the FUTURE min/max/compare instructions for IEEE > 128-bit > + floating point. At present, don't worry about doing conditional moves > + with different types for the comparison and movement (unlike SF/DF, > where > + you can do a conditional test between double and use float as the > if/then > + parts. */ Why don't we worry about that now? Should this be a 'future todo' comment here? Beyond those nits and questions, lgtm, Thanks, -Will
Re: PowerPC new instructions for -mcpu=future
On Mon, 2020-06-01 at 16:01 -0400, Michael Meissner via Gcc-patches wrote:
> These 3 patches add support for some new instructions in the 'future'
> processor.
>
> The first patch adds support for the new byte swap instructions that
> byte swap valies in the GPRs.

values

>
> The second patch renames some functions from _p9 to _hw.  This is in
> preparation for the third patch that adds support for IEEE 128-bit
> minimum, maximum, and set compare masks.
>
> The third patch implements the new instructions.
>
> I have built bootstrap compilers with/without the patches, and there are
> no regressions.  I verified that the two new tests pass.  Can I check
> these into the master branch?

A couple cosmetic nits, in a followup email. otherwise this series lgtm.

Thanks
-Will
Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll
Jiufu Guo writes:

Hi,

I updated the patch just a little accordingly.  Thanks!

diff --git a/gcc/common.opt b/gcc/common.opt
index 4464049fc1f..570e2aa53c8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2856,6 +2856,10 @@ funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.
 
+funroll-completely-grow-size
+Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
+; Internal undocumented flag, allow size growth during complete unrolling
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 96316fbd23b..8d52358efdd 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1474,6 +1474,10 @@ process_options (void)
   if (flag_unroll_all_loops)
     flag_unroll_loops = 1;
 
+  /* Allow cunroll to grow size accordingly.  */
+  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
+    flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
+
   /* web and rename-registers help when run after loop unrolling.  */
   if (flag_web == AUTODETECT_VALUE)
     flag_web = flag_unroll_loops;
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c..298ab215530 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
      re-peeling the same loop multiple times.  */
   if (flag_peel_loops)
     peeled_loops = BITMAP_ALLOC (NULL);
-  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
-                                                   || flag_peel_loops
-                                                   || optimize >= 3, true);
+  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
+                                                   true);
   if (peeled_loops)
     {
       BITMAP_FREE (peeled_loops);

BR,
Jiufu

> Richard Biener writes:
>
>>> >> From: Jiufu Guo
>>> >>
>>> >> Currently GIMPLE complete unroller(cunroll) is checking
>>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>>> >> Beside affects cunroll, flag_unroll_loops also controls RTL unroller.
>>> >> To have more freedom to control cunroll and RTL unroller, this patch
>>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
>>> >> cunroll and RTL unroller independently.
>>> >>
>>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>>> >> GCC10 after week?
>>> >>
>>> >>
>>> >> +funroll-completely-grow-size
>>> >> +Var(flag_cunroll_grow_size) Init(2)
>>> >> +; Control cunroll to allow size growth during complete unrolling
>>> >> +
>>> >
>>
>> It won't work without adjusting the awk scripts.  So go with
>>
>> funroll-completely-grow-size
>> Undocumented Optimization Var(flag_cunroll_grow_size)
>> EnabledBy(funroll-loops || fpeel-loops)
>> ; ...
>>
> EnabledBy(funroll-loops || fpeel-loops) does not work as we expected:
> "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
>
> Through "EnabledBy", a flag can be turned on, and also can be turned off,
> by the "EnabledBy option", only if the flag is not specified through the
> command line.
>
>> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
>> an option not supposed to be set by users?
>>
>
> global_options_set.x_flagxxx can be used to check if an option is set by
> the user.  But it does not work well here either, because we also care
> whether the flag is overridden by OPTION_OPTIMIZATION_TABLE or
> OPTION_OVERRIDE.
>
> AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
> flag_rename_registers, flag_var_tracking, flag_tree_cselim...
> And this way could be used to check if the flag is effective(on/off)
> either explicitly set by the command line or implicitly set through
> OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
> So, I use it here.
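The difference between the AUTODETECT_VALUE approach and EnabledBy discussed in this thread can be shown with a small stand-alone C sketch. The struct and function names here are hypothetical stand-ins, not GCC code; in GCC the flags are option variables and the resolution happens in process_options, as in the toplev.c hunk of the patch.

```c
#define AUTODETECT_VALUE 2

/* Hypothetical stand-in for the relevant option variables.  */
struct unroll_flags
{
  int unroll_loops;       /* -funroll-loops */
  int peel_loops;         /* -fpeel-loops */
  int cunroll_grow_size;  /* AUTODETECT_VALUE until something sets it */
};

/* Resolve the flag the way the toplev.c hunk does: only fill in a
   default when nothing (command line or target override) has set it.  */
static void
resolve_cunroll_grow_size (struct unroll_flags *f)
{
  if (f->cunroll_grow_size == AUTODETECT_VALUE)
    f->cunroll_grow_size = f->unroll_loops || f->peel_loops;
}
```

With this scheme "-funroll-loops -fno-peel-loops" still ends up with the flag on, because the value is derived from the final state of both flags, while an explicit setting of 0 or 1 is left untouched.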
Re: [PATCH] testsuite: Disable colorization for ubsan test
Committed, thanks :)

On Mon, Jun 1, 2020 at 4:10 PM Jakub Jelinek via Gcc-patches wrote:
>
> On Mon, Jun 01, 2020 at 03:43:00PM +0800, Kito Cheng wrote:
> > ping
> >
> >
> > On Wed, May 20, 2020 at 3:01 PM Kito Cheng wrote:
> > >
> > > - Run gcc testsuite with qemu will print out ascii color code for
> > >   ubsan related testcase, however several testcase didn't consider
> > >   that, so disable colorization prevent such problem and simplify the
> > >   process when adding testcase in future.
> > >
> > > - Verified on native X86 and RISC-V qemu full system mode and user mode.
> > >
> > > ChangeLog:
> > >
> > > gcc/testsuite/
> > >
> > > Kito Cheng
> > >
> > >     * ubsan-dg.exp (orig_ubsan_options_saved): New
> > >     (orig_ubsan_options): Ditto.
> > >     (ubsan_init): Store UBSAN_OPTIONS and set UBSAN_OPTIONS.
> > >     (ubsan_finish): Restore UBSAN_OPTIONS.
>
> Ok, thanks.
>
>         Jakub
>
RE: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length
Hi,

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Monday, June 1, 2020 4:47 PM
> To: Yangfei (Felix)
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Jakub Jelinek ; Hongtao Liu ; H.J. Lu
> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with
> fixed sve vector length

Snip...

> Sounds good.  Maybe at this point the x_inner and y_inner code is getting
> complicated enough to put into a lambda too:
>
>   x_inner = ... (x);
>   y_inner = ... (y);
>
> Just a suggestion though.

Yes, that's a good suggestion.  The code becomes cleaner with another lambda.

> Yeah, looks good.
>
> Formatting nit though: multi-line conditions should be wrapped in (...),
> i.e.:
>
>   return (...
>           && ...
>           && ...);
>

Done.  The v6 patch is based on trunk 20200601.
Bootstrapped and tested on aarch64-linux-gnu.
Also bootstrapped on x86-64-linux-gnu with --enable-multilib (for building -m32 x86 libgcc).
Regression testing on x86-64-linux-gnu looks good except for the following failures, which have been confirmed by x86 devs:

> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512f-vcvtps2ph-2.c compilation failed to produce executable
154803c154803

Thanks,
Felix

pr95254-v6.diff
Description: pr95254-v6.diff
Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
Hi Richard,

Thanks for the comments!

on 2020/6/2 1:59 AM, Richard Sandiford wrote:
> Could you go into more detail about this choice of cost calculation?
> It looks like we first calculate per-group flags, which are true only if
> the unrolled offsets are valid for all uses in the group.  Then we create
> per-candidate flags when associating candidates with groups.
>

Sure.  It checks every address type IV group to determine whether the
group is valid for the reg offset addressing mode.  Here we only need to
check the first use and the last one, since the intermediate ones should
have been handled by split_address_groups.  With unrolling, the
displacement of the address can be offset by (UF-1)*step, so we check
whether the address with this maximum offset is still valid.  If the
check finds that reg offset mode is valid for the whole group, we flag
the group.  Later, when we create an IV candidate for a flagged address
group, we flag the candidate as well.  This flag is mainly for IV
candidate costing: we don't need to scale up the step cost for this kind
of candidate.

Imagine the loop being unrolled: all the statements are duplicated UF
times.  For the cost modeling against an IV group, the cost is scaled up
by UF (here I simply excluded the compare_type since in most cases it is
the loop ending check).  For the cost modeling against an IV candidate,
the focus is on step costs: a candidate we flagged before is charged the
step cost once, while for the others the step cost is scaled up, since
unrolling makes the step calculation happen UF times.

This cost modeling tries to simulate the cost change after unrolling by
scaling up the costs accordingly.  There are some things to be improved,
such as distinguishing the loop ending compare from other compares, and
whether the other costs need tweaking, since the scaling up probably
unbalances the existing cost framework.  But during benchmarking I
didn't find that these matter, so I kept it as simple as possible for
now.

> Instead, couldn't we take this into account in get_address_cost,
> which calculates the cost of an address use for a given candidate?
> E.g. after the main if-else at the start of the function,
> perhaps it would make sense to add the worst-case offset to
> the address in “parts”, check whether that too is a valid address,
> and if not, increase var_cost by the cost of one add instruction.
>

IIUC, what you suggest is to tweak the IV group cost: if we find that an
address group is valid for reg offset mode, we charge more for the pairs
between this group and the other non address-based IV candidates.  The
question is how we decide this add-on cost.  For the test case I was
working on initially, adding one cost (of an add) doesn't work; the
normal IV still won.  We could charge more, say two, but what is the
justification for this value?  Heuristics?

> I guess there are two main sources of inexactness if we do that:
>
> (1) It might underestimate the cost because it assumes that vuse[0]
>     stands for all vuses in the group.
>

Do you mean we don't need a check function like mark_reg_offset_groups?
Without it, vuse[0] might not be enough, since we can't ensure the
others are fine with the additional displacement from unrolling.  If we
still have it, I think it's fine to just use vuse[0].

> (2) It might overestimate the cost because it treats all unrolled
>     iterations as having the cost of the final unrolled iteration.
>
> (1) could perhaps be avoided by adding a flag to the iv_use to say
> whether it wants this treatment.  I think the flag approach suffers
> from (2) too, and I'd be surprised if it makes a difference in practice.
>

Sorry, I don't have the whole picture of how to deal with UF in your
proposal.  But the flag approach considers UF in the IV group cost
calculation as well as in the IV candidate step cost calculation.

BR,
Kewen

> Thanks,
> Richard
>
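The flag-based scheme described in this reply can be sketched as stand-alone C. Everything here is hypothetical and simplified: the function names are invented, the signed 16-bit displacement range is an assumption chosen for the example rather than something stated in the thread, and the real patch works on ivopts data structures instead of plain integers. The sketch also assumes uf >= 1.

```c
#include <stdbool.h>

/* Assumed target limit: displacement must fit in a signed 16-bit field.  */
static bool
disp_valid_p (long disp)
{
  return disp >= -32768 && disp <= 32767;
}

/* A group is flagged when its first and last uses remain valid after
   the displacement grows by (UF-1)*step due to unrolling.  */
static bool
group_reg_offset_ok (long first_disp, long last_disp, long step, unsigned uf)
{
  long extra = (long) (uf - 1) * step;

  return (disp_valid_p (first_disp) && disp_valid_p (last_disp)
          && disp_valid_p (first_disp + extra)
          && disp_valid_p (last_disp + extra));
}

/* A flagged candidate pays the step cost once per unrolled body;
   any other candidate pays it UF times.  */
static unsigned
cand_step_cost (unsigned step_cost, unsigned uf, bool reg_offset_ok)
{
  return reg_offset_ok ? step_cost : step_cost * uf;
}
```

For instance, with step 8 and UF 4 a group whose displacements span 0 to 24 stays valid (the largest displacement becomes 48), so its candidate keeps a single step cost, while an unflagged candidate with step cost 4 would be charged 16.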
[PATCH] Reapply all revisions mentioned in LOCAL_PATCHES.
(cherry picked from commit 21bb1625bd4f183984223ce31bd03ba47ed62f27)
---
 libsanitizer/asan/asan_globals.cpp            | 19 ---
 libsanitizer/asan/asan_interceptors.h         |  7 ++-
 libsanitizer/asan/asan_mapping.h              |  2 +-
 .../sanitizer_linux_libcdep.cpp               |  4
 .../sanitizer_common/sanitizer_mac.cpp        |  2 +-
 .../sanitizer_platform_limits_linux.cpp       |  7 +--
 .../sanitizer_platform_limits_posix.h         |  2 +-
 .../sanitizer_common/sanitizer_stacktrace.cpp | 17 -
 libsanitizer/tsan/tsan_rtl_ppc64.S            |  1 +
 libsanitizer/ubsan/ubsan_flags.cpp            |  1 +
 libsanitizer/ubsan/ubsan_handlers.cpp         | 15 +++
 libsanitizer/ubsan/ubsan_handlers.h           |  8
 libsanitizer/ubsan/ubsan_platform.h           |  2 ++
 13 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/libsanitizer/asan/asan_globals.cpp b/libsanitizer/asan/asan_globals.cpp
index 9d7dbc6f264..e045c31cd1c 100644
--- a/libsanitizer/asan/asan_globals.cpp
+++ b/libsanitizer/asan/asan_globals.cpp
@@ -154,23 +154,6 @@ static void CheckODRViolationViaIndicator(const Global *g) {
   }
 }
 
-// Check ODR violation for given global G by checking if it's already poisoned.
-// We use this method in case compiler doesn't use private aliases for global
-// variables.
-static void CheckODRViolationViaPoisoning(const Global *g) {
-  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
-    // This check may not be enough: if the first global is much larger
-    // the entire redzone of the second global may be within the first global.
-    for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) {
-      if (g->beg == l->g->beg &&
-          (flags()->detect_odr_violation >= 2 || g->size != l->g->size) &&
-          !IsODRViolationSuppressed(g->name))
-        ReportODRViolation(g, FindRegistrationSite(g),
-                           l->g, FindRegistrationSite(l->g));
-    }
-  }
-}
-
 // Clang provides two different ways for global variables protection:
 // it can poison the global itself or its private alias. In former
 // case we may poison same symbol multiple times, that can help us to
@@ -216,8 +199,6 @@ static void RegisterGlobal(const Global *g) {
     // where two globals with the same name are defined in different modules.
     if (UseODRIndicator(g))
       CheckODRViolationViaIndicator(g);
-    else
-      CheckODRViolationViaPoisoning(g);
   }
   if (CanPoisonMemory())
     PoisonRedZones(*g);
diff --git a/libsanitizer/asan/asan_interceptors.h b/libsanitizer/asan/asan_interceptors.h
index 344a64bd83d..b7a85fedbdf 100644
--- a/libsanitizer/asan/asan_interceptors.h
+++ b/libsanitizer/asan/asan_interceptors.h
@@ -80,7 +80,12 @@ void InitializePlatformInterceptors();
 #if ASAN_HAS_EXCEPTIONS && !SANITIZER_WINDOWS && !SANITIZER_SOLARIS && \
     !SANITIZER_NETBSD
 # define ASAN_INTERCEPT___CXA_THROW 1
-# define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# if ! defined(ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION) \
+     || ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION
+# define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# else
+# define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 0
+# endif
 # if defined(_GLIBCXX_SJLJ_EXCEPTIONS) || (SANITIZER_IOS && defined(__arm__))
 # define ASAN_INTERCEPT__UNWIND_SJLJ_RAISEEXCEPTION 1
 # else
diff --git a/libsanitizer/asan/asan_mapping.h b/libsanitizer/asan/asan_mapping.h
index 41fb49ee46d..09be904270c 100644
--- a/libsanitizer/asan/asan_mapping.h
+++ b/libsanitizer/asan/asan_mapping.h
@@ -163,7 +163,7 @@ static const u64 kDefaultShort64bitShadowOffset =
 static const u64 kAArch64_ShadowOffset64 = 1ULL << 36;
 static const u64 kMIPS32_ShadowOffset32 = 0x0aaa;
 static const u64 kMIPS64_ShadowOffset64 = 1ULL << 37;
-static const u64 kPPC64_ShadowOffset64 = 1ULL << 44;
+static const u64 kPPC64_ShadowOffset64 = 1ULL << 41;
 static const u64 kSystemZ_ShadowOffset64 = 1ULL << 52;
 static const u64 kSPARC64_ShadowOffset64 = 1ULL << 43; // 0x800
 static const u64 kFreeBSD_ShadowOffset32 = 1ULL << 30; // 0x4000
diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
index 4d17c9686e4..0a549021821 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
@@ -697,9 +697,13 @@ u32 GetNumberOfCPUs() {
 #elif SANITIZER_SOLARIS
   return sysconf(_SC_NPROCESSORS_ONLN);
 #else
+#if defined(CPU_COUNT)
   cpu_set_t CPUs;
   CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
   return CPU_COUNT(&CPUs);
+#else
+  return 1;
+#endif
 #endif
 }
 
diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cpp b/libsanitizer/sanitizer_common/sanitizer_mac.cpp
index 7550545ea6f..648236ea83d 100644
--- a/libsanitizer/sanitizer_common/sanitizer_mac.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_mac
[PATCH] Update link to LOCAL_PATCHES.
libsanitizer/ChangeLog:

	* LOCAL_PATCHES: Update hash of local patches.
---
 libsanitizer/LOCAL_PATCHES | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/LOCAL_PATCHES b/libsanitizer/LOCAL_PATCHES
index 7732de3d436..f0c99a3b7ea 100644
--- a/libsanitizer/LOCAL_PATCHES
+++ b/libsanitizer/LOCAL_PATCHES
@@ -1 +1 @@
-21bb1625bd4f183984223ce31bd03ba47ed62f27
+f18ab18032031d1e5540dcc85b396cd2d0166c5b
-- 
2.26.2
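As the hunk shows, LOCAL_PATCHES holds full commit hashes, one per line. A minimal Python sketch of how such a file could drive the "reapply" step follows. It assumes the revisions are reapplied with `git cherry-pick -x`, which the "(cherry picked from commit ...)" trailer in the reapply patch suggests but the thread does not state; `parse_local_patches` and `cherry_pick_commands` are hypothetical helpers, and the commands are only built, never executed.

```python
import re

# A full SHA-1 commit hash: exactly 40 lowercase hex digits.
HASH_RE = re.compile(r"^[0-9a-f]{40}$")


def parse_local_patches(text):
    """Return the commit hashes listed one per line, skipping blanks."""
    hashes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not HASH_RE.match(line):
            raise ValueError(f"not a full commit hash: {line!r}")
        hashes.append(line)
    return hashes


def cherry_pick_commands(hashes):
    """Build one 'git cherry-pick -x' argument list per hash, in order."""
    return [["git", "cherry-pick", "-x", h] for h in hashes]
```

Each returned argument list could be handed to `subprocess.run` from inside a GCC checkout once a new libsanitizer merge has been committed.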
Re: [PATCH] contrib: Improve comments and error text
On 6/1/20 7:30 PM, Jonathan Wakely wrote:
> 	* gcc-changelog/git_commit.py (GitCommit.check_mentioned_files):
> 	Improve error text.
>
> OK for master?

Yes, I've just pushed the patch with your authorship.

Martin
Re: [IMPORTANT] ChangeLog related changes
On 6/1/20 7:24 PM, Jonathan Wakely wrote:
> On Mon, 25 May 2020 at 23:50, Jakub Jelinek via Gcc wrote:
>> Hi!
>>
>> I've turned on the strict mode of Martin Liška's hook changes, which
>> means that from now on no commits to the trunk or release branches
>> should be changing any ChangeLog files together with the other files;
>> the ChangeLog entry should be solely in the commit message.
>> The DATESTAMP bumping script will be updating the ChangeLog files for
>> you.  If somebody makes a mistake in that, please wait 24 hours (at
>> least until after 00:16 UTC after your commit) so that the script will
>> create the ChangeLog entries, and afterwards it can be fixed by
>> adjusting the ChangeLog files.  But you can only touch the ChangeLog
>> files in that case (and shouldn't write a ChangeLog entry for that in
>> the commit message).
>>
>> If anything goes wrong, please let me, other RMs and Martin Liška know.
>
> The libstdc++ manual is written in Docbook XML, but we commit both the
> XML and generated HTML pages to Git.  Sometimes a small XML change can
> result in dozens of mechanical changes to the generated HTML files,
> which we record in the ChangeLog as:
>
> 	* doc/html/*: Regenerated.
>
> With the new checks we need to name every generated file individually.
>
> If we add that directory to the ignored_prefixes list, we won't need
> to name them.  But then the doc/html/* entry will give an error, and
> changes to the HTML files can be committed without any ChangeLog
> entry.
>
> Should we just stop mentioning the HTML in the ChangeLog?
>
> We could do something like the attached patch, but it seems overkill
> for this one special case.

The patch is fine with me.  Can you please add a pytest for the
situation to contrib/gcc-changelog/test_email.py?

Martin
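The trade-off Jonathan describes can be illustrated with a small, self-contained Python sketch. It is not the code in contrib/gcc-changelog/git_commit.py and not the attached patch, which is not shown in this thread; `unmentioned_files` and the `wildcard_prefixes` parameter are hypothetical, and only the name `ignored_prefixes` comes from the message above. Paths are written relative to the repository root for simplicity.

```python
from fnmatch import fnmatchcase


def unmentioned_files(changed, mentioned, wildcard_prefixes=(),
                      ignored_prefixes=()):
    """Return the changed files that no ChangeLog entry covers.

    changed:           paths touched by the commit
    mentioned:         paths or 'dir/*' patterns named in the ChangeLog
    wildcard_prefixes: directories where a trailing '*' entry is accepted
    ignored_prefixes:  directories that need no ChangeLog entry at all
    """
    missing = []
    for path in changed:
        if any(path.startswith(p) for p in ignored_prefixes):
            continue
        if path in mentioned:
            continue
        covered = any(
            pattern.endswith("*")
            and any(pattern.startswith(p) for p in wildcard_prefixes)
            and fnmatchcase(path, pattern)
            for pattern in mentioned
        )
        if not covered:
            missing.append(path)
    return missing
```

Listing the HTML directory in `wildcard_prefixes` keeps the single "Regenerated" entry meaningful, whereas listing it in `ignored_prefixes` lets HTML changes through with no entry at all, which is the drawback raised in the quoted message.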