Re: [PATCH] Improve __builtin_add_overflow on x86 for double-word types (PR target/93141)
On Sat, Jan 4, 2020 at 12:39 PM Jakub Jelinek wrote: > > On Sat, Jan 04, 2020 at 12:13:50PM +0100, Uros Bizjak wrote: > > LGTM, but I wonder if *addcarry_1 gets overmacroized, the insn > > condition is really hard to comprehend. Perhaps it should be written > > as a separate pattern for SImode and DImode instead? > > > > +(define_insn "*addcarry_1" > > > + [(set (reg:CCC FLAGS_REG) > > > + (compare:CCC > > > + (zero_extend: > > > + (plus:SWI48 > > > + (plus:SWI48 > > > + (match_operator:SWI48 5 "ix86_carry_flag_operator" > > > + [(match_operand 3 "flags_reg_operand") (const_int 0)]) > > > + (match_operand:SWI48 1 "nonimmediate_operand" "%0")) > > > + (match_operand:SWI48 2 "x86_64_immediate_operand" "e"))) > > > + (plus: > > > + (match_operand: 6 "const_scalar_int_operand" "") > > > + (match_operator: 4 "ix86_carry_flag_operator" > > > + [(match_dup 3) (const_int 0)] > > > + (set (match_operand:SWI48 0 "register_operand" "=r") > > > + (plus:SWI48 (plus:SWI48 (match_op_dup 5 > > > +[(match_dup 3) (const_int 0)]) > > > + (match_dup 1)) > > > + (match_dup 2)))] > > > + "ix86_binary_operator_ok (PLUS, mode, operands) > > > + && CONST_INT_P (operands[2]) > > > + /* Check that operands[6] is operands[2] zero extended from > > > + mode to mode. */ > > > + && ((mode == SImode || INTVAL (operands[2]) >= 0) > > > + ? (CONST_INT_P (operands[6]) > > > + && UINTVAL (operands[6]) == (UINTVAL (operands[2]) > > > + & GET_MODE_MASK (mode))) > > > + : (CONST_WIDE_INT_P (operands[6]) > > > + && CONST_WIDE_INT_NUNITS (operands[6]) == 2 > > > + && ((unsigned HOST_WIDE_INT) CONST_WIDE_INT_ELT (operands[6], 0) > > > + == UINTVAL (operands[2])) > > > + && CONST_WIDE_INT_ELT (operands[6], 1) == 0))" > > I'm afraid it wouldn't help. For SImode it would remove the : part, sure, > but any optimizing compiler will optimize that away too, for DImode we need > both branches - if INTVAL isn't negative, its zero extension will still be > a CONST_INT with the same value, if it is negative, zero extension will be a > CONST_WIDE_INT. It would be shorter to write > rtx_equal_p (operands[6], >simplify_unary_operation (ZERO_EXTEND, mode, operands[2], > mode)) > but I wanted to avoid that because it is quite costly and will create a new > rtx each time the condition is checked. > Anyway, attached is in one file the *addcarry_1 from the patch and > in another one the demacroized variant, the macroized one is 41 lines > long, the demacroized one is 75 lines long, so not fully 2 times, but > almost. Yes, in the SImode case we can just use const_int_operand and don't > need to test it in the condition. Thanks for the examples! Yes, I agree that the expanded version is not much better than macroized (it is even worse...). So, OK. Thanks, Uros.
[C++ PATCH] Avoid caching constexpr calls that allocate something that they don't deallocate or vice versa (PR c++/91369)
Hi! The caching of constexpr calls breaks the following testcase. The problem is that constexpr calls that allocate from heap something that they don't deallocate or calls that deallocate something they haven't allocated aren't stateless for the constexpr context and in each outermost constexpr evaluation the allocations will produce different artificial heap VAR_DECLs. In the testcase a function allocates something that the caller is supposed to deallocate, so when we call it second time (e.g. from different function) with the same constant arguments, we get the cached result that contains address of the heap VAR_DECL from the earlier evaluation and error trying to free something we haven't allocated (in the same outermost constexpr evaluation). Similarly, a constexpr call could deallocate something the caller is supposed to allocate, if we cache such call, we wouldn't actually deallocate it and complain the allocation has not been deallocated. Fixed by making sure we only cache calls that either don't perform any allocations/deallocations, or perform exactly as many allocations and deallocations and all those new allocations (new additions to heap_vars, to which we only add, never remove) have been deallocated (checking just numbers is not good enough, we might perform 3 allocations and 3 deallocations, e.g. deallocate 2 of those allocations and one passed to the call from the caller and pass the still allocated address to the caller. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2020-01-05 Jakub Jelinek PR c++/91369 * constexpr.c (struct constexpr_global_ctx): Add heap_alloc_count member, initialize it to zero in ctor. (cxx_eval_call_expression): Bump heap_dealloc_count when deleting a heap object. Don't cache calls to functions which allocate some heap objects and don't deallocate them or deallocate some heap objects they didn't allocate. * g++.dg/cpp1y/constexpr-new.C: Expect an error explaining why static_assert failed for C++2a. * g++.dg/cpp2a/constexpr-new9.C: New test. --- gcc/cp/constexpr.c.jj 2020-01-01 12:22:34.002475595 +0100 +++ gcc/cp/constexpr.c 2020-01-04 11:56:07.619124987 +0100 @@ -1041,8 +1041,11 @@ struct constexpr_global_ctx { auto_vec heap_vars; /* Cleanups that need to be evaluated at the end of CLEANUP_POINT_EXPR. */ vec *cleanups; + /* Number of heap VAR_DECL deallocations. */ + unsigned heap_dealloc_count; /* Constructor. */ - constexpr_global_ctx () : constexpr_ops_count (0), cleanups (NULL) {} + constexpr_global_ctx () +: constexpr_ops_count (0), cleanups (NULL), heap_dealloc_count (0) {} }; /* The constexpr expansion context. CALL is the current function @@ -2056,6 +2059,7 @@ cxx_eval_call_expression (const constexp { DECL_NAME (var) = heap_deleted_identifier; ctx->global->values.remove (var); + ctx->global->heap_dealloc_count++; return void_node; } else if (DECL_NAME (var) == heap_deleted_identifier) @@ -2281,6 +2285,7 @@ cxx_eval_call_expression (const constexp } else { + bool cacheable = true; if (result && result != error_mark_node) /* OK */; else if (!DECL_SAVED_TREE (fun)) @@ -2346,6 +2351,8 @@ cxx_eval_call_expression (const constexp auto_vec save_exprs; ctx_with_save_exprs.save_exprs = &save_exprs; ctx_with_save_exprs.call = &new_call; + unsigned save_heap_alloc_count = ctx->global->heap_vars.length (); + unsigned save_heap_dealloc_count = ctx->global->heap_dealloc_count; tree jump_target = NULL_TREE; cxx_eval_constant_expression (&ctx_with_save_exprs, body, @@ -2417,6 +2424,33 @@ cxx_eval_call_expression (const constexp /* Make the unshared function copy we used available for re-use. */ save_fundef_copy (fun, copy); + + /* If the call allocated some heap object that hasn't been +deallocated during the call, or if it deallocated some heap +object it has not allocated, the call isn't really stateless +for the constexpr evaluation and should not be cached. +It is fine if the call allocates something and deallocates it +too. */ + if (entry + && (save_heap_alloc_count != ctx->global->heap_vars.length () + || (save_heap_dealloc_count + != ctx->global->heap_dealloc_count))) + { + tree heap_var; + unsigned int i; + if ((ctx->global->heap_vars.length () + - ctx->global->heap_dealloc_count) + != save_heap_alloc_count - save_heap_dealloc_count) + cacheable = false; + else + FOR_EACH_VEC_ELT_FROM (ctx->global->heap_vars, i
[testsuite, Darwin, committed] Fix fail of darwin-version-1.c.
Recent platform linkers will no longer accept linking for a target OS version less than 10.4. Recent SDKs no longer have the libgcc_s shims used for 10.4 and 10.5. So we need to adjust tests that expect these. Added some explanation too. tested on darwin9, darwin16 and darwin18, applied to mainline, thanks Iain gcc/testsuite/ChangeLog: 2020-01-05 Iain Sandoe * gcc.dg/darwin-version-1.c: Adjust test to use different options for Darwin4-9 and Darwin10+. diff --git a/gcc/testsuite/gcc.dg/darwin-version-1.c b/gcc/testsuite/gcc.dg/darwin-version-1.c index 11cfceff39..ad7f7da3b6 100644 --- a/gcc/testsuite/gcc.dg/darwin-version-1.c +++ b/gcc/testsuite/gcc.dg/darwin-version-1.c @@ -1,10 +1,14 @@ /* Basic test of the -mmacosx-version-min option. */ -/* { dg-options "-mmacosx-version-min=10.1" } */ +/* Darwin4 corresponds to MacOS 10.0. */ +/* { dg-options "-mmacosx-version-min=10.1" { target *-*-darwin[456789]* } } */ +/* Later Darwin linkers decline to link for less than Darwin8/MacOS 10.4. + However, we need to make the link for 10.6 because the relevant libgcc_s + shim files for 10.4 and 10.5 are also not installed in later SDKs. */ +/* { dg-options "-mmacosx-version-min=10.6" { target *-*-darwin[123]* } } */ /* { dg-do link { target *-*-darwin* } } */ int main() { return 0; } -
[PATCH] optimize costly division in rtx_cost
Hi, I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4. It's also evident that rtx_cost does redundant work for a SET rtx argument. Obviously the variable named 'factor' rarely exceeds 1, so in the majority of cases it can be computed with a well-predictable branch rather than a division. This patch makes rtx_cost do the division only in case mode is wider than UNITS_PER_WORD, and also moves a test for a SET up front to avoid redundancy. No functional change. Bootstrapped on x86_64, ok for trunk? To illustrate the improvement this buys, for tramp3d -O2 compilation, I got before: 73887675319 instructions:u 72438432200 cycles:u 924298569 idq.ms_uops:u 102603799255 uops_dispatched.thread:u after: 73888371724 instructions:u 72386986612 cycles:u 802744775 idq.ms_uops:u 102096987220 uops_dispatched.thread:u (this is on Sandybridge, idq.ms_uops are uops going via the microcode sequencer, so the unneeded division is responsible for a good fraction of them) * rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if the mode is not wider than UNITS_PER_WORD. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 9a7afccefb8..c7ab86e228b 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code, const char *fmt; int total; int factor; + unsigned mode_size; if (x == 0) return 0; - if (GET_MODE (x) != VOIDmode) + if (GET_CODE (x) == SET) +/* A SET doesn't have a mode, so let's look at the SET_DEST to get + the mode for the factor. */ +mode = GET_MODE (SET_DEST (x)); + else if (GET_MODE (x) != VOIDmode) mode = GET_MODE (x); + mode_size = estimated_poly_value (GET_MODE_SIZE (mode)); + /* A size N times larger than UNITS_PER_WORD likely needs N times as many insns, taking N times as long. */ - factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD; - if (factor == 0) -factor = 1; + factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1; /* Compute the default costs of certain things. Note that targetm.rtx_costs can override the defaults. */ @@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code, /* Used in combine.c as a marker. */ total = 0; break; -case SET: - /* A SET doesn't have a mode, so let's look at the SET_DEST to get -the mode for the factor. */ - mode = GET_MODE (SET_DEST (x)); - factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD; - if (factor == 0) - factor = 1; - /* FALLTHRU */ default: total = factor * COSTS_N_INSNS (1); }
[Committed] New bit-field testcases Part 3/N
Hi, I found some more testcases that would cause an internal compiler error while working on my bit-field lowering pass. Since there was no testcase already done, I thought it would be best if it was applied to the trunk. Thanks, Andrew Pinski testsuite/ChangeLog: * gcc.c-torture/compile/20200105-1.c: New testcase. * gcc.c-torture/compile/20200105-2.c: New testcase. * gcc.c-torture/compile/20200105-3.c: New testcase. Index: gcc.c-torture/compile/20200105-1.c === --- gcc.c-torture/compile/20200105-1.c (nonexistent) +++ gcc.c-torture/compile/20200105-1.c (working copy) @@ -0,0 +1,12 @@ +struct mouse_button_str { +unsigned char left : 1; +unsigned char right : 1; +unsigned char middle: 1; +}; +int g(void) +{ + signed char a = 0; + struct mouse_button_str *newbutton1 = (struct mouse_button_str*)&a; + newbutton1->left = 1; + return a; +} Index: gcc.c-torture/compile/20200105-2.c === --- gcc.c-torture/compile/20200105-2.c (nonexistent) +++ gcc.c-torture/compile/20200105-2.c (working copy) @@ -0,0 +1,12 @@ +struct mouse_button_str { +signed char left : 1; +signed char right : 1; +signed char middle: 1; +}; +int g(void) +{ + unsigned char a = 0; + struct mouse_button_str *newbutton1 = (struct mouse_button_str*)&a; + newbutton1->left = 1; + return a; +} Index: gcc.c-torture/compile/20200105-3.c === --- gcc.c-torture/compile/20200105-3.c (nonexistent) +++ gcc.c-torture/compile/20200105-3.c (working copy) @@ -0,0 +1,12 @@ +struct mouse_button_str { +unsigned char left : 1; +unsigned char right : 1; +unsigned char middle: 1; +}; +int g(void) +{ + unsigned char a = 0; + struct mouse_button_str *newbutton1 = (struct mouse_button_str*)&a; + newbutton1->left = 1; + return a; +}
[PATCH] ipa-inline: Adjust condition for caller_growth_limits
Inline should return failure either (newsize > param_large_function_insns) OR (newsize > limit). Sometimes newsize is larger than param_large_function_insns, but smaller than limit, inline doesn't return failure even if the new function is a large function. Therefore, param_large_function_insns and param_large_function_growth should be OR instead of AND, otherwise --param large-function-growth won't work correctly with --param large-function-insns. gcc/ChangeLog: 2020-01-06 Luo Xiong Hu ipa-inline-analysis.c (estimate_growth): Fix typo. ipa-inline.c (caller_growth_limits): Use OR instead of AND. --- gcc/ipa-inline-analysis.c | 2 +- gcc/ipa-inline.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index 711b85bd1a9..0e584a5a401 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -462,7 +462,7 @@ offline_size (struct cgraph_node *node, ipa_size_summary *info) return 0; } -/* Estimate the growth caused by inlining NODE into all callees. */ +/* Estimate the growth caused by inlining NODE into all callers. */ int estimate_growth (struct cgraph_node *node) diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c index 0f87c476dde..33945538def 100644 --- a/gcc/ipa-inline.c +++ b/gcc/ipa-inline.c @@ -184,8 +184,8 @@ caller_growth_limits (struct cgraph_edge *e) the function to shrink if it went over the limits by forced inlining. */ newsize = estimate_size_after_inlining (to, e); if (newsize >= ipa_size_summaries->get (what)->size - && newsize > opt_for_fn (to->decl, param_large_function_insns) - && newsize > limit) + && (newsize > opt_for_fn (to->decl, param_large_function_insns) + || newsize > limit)) { e->inline_failed = CIF_LARGE_FUNCTION_GROWTH_LIMIT; return false; -- 2.21.0.777.g83232e3864